fio.vger.kernel.org archive mirror
* Recent changes (master)
@ 2022-06-02 12:00 Jens Axboe
From: Jens Axboe @ 2022-06-02 12:00 UTC
  To: fio

The following changes since commit 5ceed0be62f3ce8903d5747674f9f70f44e736d6:

  docs: update language setting for Sphinx build (2022-05-31 20:58:00 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 26faead0f3c6e7608b89a51373f1455b91377fcb:

  t/zbd: skip test case #13 when max_open_zones is too small (2022-06-02 03:58:31 -0600)

----------------------------------------------------------------
Ankit Kumar (5):
      configure: check nvme uring command support
      nvme: add nvme opcodes, structures and helper functions
      docs: document options for io_uring_cmd I/O engine
      zbd: Check for direct flag only if its block device
      engines/io_uring: Enable zone device support for io_uring_cmd I/O engine

Anuj Gupta (4):
      io_uring.h: add IORING_SETUP_SQE128 and IORING_SETUP_CQE32
      init: return error incase an invalid value is passed as option
      engines/io_uring: add new I/O engine for uring passthrough support
      examples: add 2 example job file for io_uring_cmd engine

Jens Axboe (3):
      engines/io_uring: cleanup supported case
      engines/nvme: fix 'fd' leak in error handling
      engines/nvme: ioctl return value is an int

Shin'ichiro Kawasaki (1):
      t/zbd: skip test case #13 when max_open_zones is too small

 HOWTO.rst                    |  41 +++--
 Makefile                     |   4 +-
 configure                    |  21 +++
 engines/io_uring.c           | 346 +++++++++++++++++++++++++++++++++++++++++-
 engines/nvme.c               | 347 +++++++++++++++++++++++++++++++++++++++++++
 engines/nvme.h               | 214 ++++++++++++++++++++++++++
 examples/uring-cmd-ng.fio    |  25 ++++
 examples/uring-cmd-zoned.fio |  31 ++++
 file.h                       |  12 +-
 fio.1                        |  33 +++-
 init.c                       |   9 ++
 os/linux/io_uring.h          |  45 +++++-
 t/zbd/test-zbd-support       |  23 ++-
 zbd.c                        |   4 +-
 14 files changed, 1123 insertions(+), 32 deletions(-)
 create mode 100644 engines/nvme.c
 create mode 100644 engines/nvme.h
 create mode 100644 examples/uring-cmd-ng.fio
 create mode 100644 examples/uring-cmd-zoned.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 8ab3ac4b..28ac2b7c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1952,6 +1952,10 @@ I/O engine
 			for both direct and buffered IO.
 			This engine defines engine specific options.
 
+		**io_uring_cmd**
+			Fast Linux native asynchronous I/O for pass through commands.
+			This engine defines engine specific options.
+
 		**libaio**
 			Linux native asynchronous I/O. Note that Linux may only support
 			queued behavior with non-buffered I/O (set ``direct=1`` or
@@ -2255,22 +2259,34 @@ with the caveat that when used on the command line, they must come after the
 	values for trim IOs are ignored. This option is mutually exclusive with
 	the :option:`cmdprio_percentage` option.
 
-.. option:: fixedbufs : [io_uring]
+.. option:: fixedbufs : [io_uring] [io_uring_cmd]
+
+	If fio is asked to do direct IO, then Linux will map pages for each
+	IO call, and release them when IO is done. If this option is set, the
+	pages are pre-mapped before IO is started. This eliminates the need to
+	map and release for each IO. This is more efficient, and reduces the
+	IO latency as well.
+
+.. option:: nonvectored : [io_uring] [io_uring_cmd]
 
-    If fio is asked to do direct IO, then Linux will map pages for each
-    IO call, and release them when IO is done. If this option is set, the
-    pages are pre-mapped before IO is started. This eliminates the need to
-    map and release for each IO. This is more efficient, and reduces the
-    IO latency as well.
+	With this option, fio will use non-vectored read/write commands, where
+	address must contain the address directly. Default is -1.
 
-.. option:: registerfiles : [io_uring]
+.. option:: force_async=int : [io_uring] [io_uring_cmd]
+
+	Normal operation for io_uring is to try and issue an sqe as
+	non-blocking first, and if that fails, execute it in an async manner.
+	With this option set to N, then every N request fio will ask sqe to
+	be issued in an async manner. Default is 0.
+
+.. option:: registerfiles : [io_uring] [io_uring_cmd]
 
 	With this option, fio registers the set of files being used with the
 	kernel. This avoids the overhead of managing file counts in the kernel,
 	making the submission and completion part more lightweight. Required
 	for the below :option:`sqthread_poll` option.
 
-.. option:: sqthread_poll : [io_uring] [xnvme]
+.. option:: sqthread_poll : [io_uring] [io_uring_cmd] [xnvme]
 
 	Normally fio will submit IO by issuing a system call to notify the
 	kernel of available items in the SQ ring. If this option is set, the
@@ -2278,14 +2294,19 @@ with the caveat that when used on the command line, they must come after the
 	This frees up cycles for fio, at the cost of using more CPU in the
 	system.
 
-.. option:: sqthread_poll_cpu : [io_uring]
+.. option:: sqthread_poll_cpu : [io_uring] [io_uring_cmd]
 
 	When :option:`sqthread_poll` is set, this option provides a way to
 	define which CPU should be used for the polling thread.
 
+.. option:: cmd_type=str : [io_uring_cmd]
+
+	Specifies the type of uring passthrough command to be used. Supported
+	value is nvme. Default is nvme.
+
 .. option:: hipri
 
-   [io_uring], [xnvme]
+   [io_uring] [io_uring_cmd] [xnvme]
 
         If this option is set, fio will attempt to use polled IO completions.
         Normal IO completions generate interrupts to signal the completion of
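The force_async option documented in the hunk above is implemented as a plain counter in fio_ioring_cmd_prep() (see the engines/io_uring.c hunk below). Here is a minimal sketch of that counter in isolation. The prep_state struct and should_force_async() are hypothetical stand-ins for fio's ioring_data and the inline check; they are not part of fio.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the 'prepped' counter kept in ioring_data. */
struct prep_state {
	unsigned int prepped;
};

/*
 * Mirrors the check in fio_ioring_cmd_prep(): returns true when this
 * request should carry IOSQE_ASYNC, i.e. on every force_async-th prep.
 * force_async == 0 short-circuits, so the counter never advances.
 */
static bool should_force_async(struct prep_state *s, unsigned int force_async)
{
	if (force_async && ++s->prepped == force_async) {
		s->prepped = 0;
		return true;
	}
	return false;
}
```

With force_async=3, requests 3, 6, 9, ... are flagged async and all others are first attempted non-blocking.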
diff --git a/Makefile b/Makefile
index ed66305a..188a74d7 100644
--- a/Makefile
+++ b/Makefile
@@ -231,7 +231,7 @@ ifdef CONFIG_LIBXNVME
 endif
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
-		oslib/linux-dev-lookup.c engines/io_uring.c
+		oslib/linux-dev-lookup.c engines/io_uring.c engines/nvme.c
   cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
@@ -241,7 +241,7 @@ endif
 endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
-		oslib/linux-dev-lookup.c engines/io_uring.c
+		oslib/linux-dev-lookup.c engines/io_uring.c engines/nvme.c
   cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
diff --git a/configure b/configure
index 4ee536a0..8182322b 100755
--- a/configure
+++ b/configure
@@ -2587,6 +2587,27 @@ if test "$libzbc" != "no" ; then
 fi
 print_config "libzbc engine" "$libzbc"
 
+if test "$targetos" = "Linux" ; then
+##########################################
+# Check NVME_URING_CMD support
+cat > $TMPC << EOF
+#include <linux/nvme_ioctl.h>
+int main(void)
+{
+  struct nvme_uring_cmd *cmd;
+
+  return sizeof(struct nvme_uring_cmd);
+}
+EOF
+if compile_prog "" "" "nvme uring cmd"; then
+  output_sym "CONFIG_NVME_URING_CMD"
+  nvme_uring_cmd="yes"
+else
+  nvme_uring_cmd="no"
+fi
+print_config "NVMe uring command support" "$nvme_uring_cmd"
+fi
+
 ##########################################
 # Check if we have xnvme
 if test "$xnvme" != "yes" ; then
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 1e15647e..cceafe69 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -24,6 +24,13 @@
 #include "../lib/types.h"
 #include "../os/linux/io_uring.h"
 #include "cmdprio.h"
+#include "nvme.h"
+
+#include <sys/stat.h>
+
+enum uring_cmd_type {
+	FIO_URING_CMD_NVME = 1,
+};
 
 struct io_sq_ring {
 	unsigned *head;
@@ -85,6 +92,7 @@ struct ioring_options {
 	unsigned int uncached;
 	unsigned int nowait;
 	unsigned int force_async;
+	enum uring_cmd_type cmd_type;
 };
 
 static const int ddir_to_op[2][2] = {
@@ -270,6 +278,22 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "cmd_type",
+		.lname	= "Uring cmd type",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct ioring_options, cmd_type),
+		.help	= "Specify uring-cmd type",
+		.def	= "nvme",
+		.posval = {
+			  { .ival = "nvme",
+			    .oval = FIO_URING_CMD_NVME,
+			    .help = "Issue nvme-uring-cmd",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= NULL,
 	},
@@ -373,6 +397,48 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
+static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct fio_file *f = io_u->file;
+	struct nvme_uring_cmd *cmd;
+	struct io_uring_sqe *sqe;
+
+	/* only supports nvme_uring_cmd */
+	if (o->cmd_type != FIO_URING_CMD_NVME)
+		return -EINVAL;
+
+	sqe = &ld->sqes[(io_u->index) << 1];
+
+	if (o->registerfiles) {
+		sqe->fd = f->engine_pos;
+		sqe->flags = IOSQE_FIXED_FILE;
+	} else {
+		sqe->fd = f->fd;
+	}
+	sqe->rw_flags = 0;
+	if (!td->o.odirect && o->uncached)
+		sqe->rw_flags |= RWF_UNCACHED;
+	if (o->nowait)
+		sqe->rw_flags |= RWF_NOWAIT;
+
+	sqe->opcode = IORING_OP_URING_CMD;
+	sqe->user_data = (unsigned long) io_u;
+	if (o->nonvectored)
+		sqe->cmd_op = NVME_URING_CMD_IO;
+	else
+		sqe->cmd_op = NVME_URING_CMD_IO_VEC;
+	if (o->force_async && ++ld->prepped == o->force_async) {
+		ld->prepped = 0;
+		sqe->flags |= IOSQE_ASYNC;
+	}
+
+	cmd = (struct nvme_uring_cmd *)sqe->cmd;
+	return fio_nvme_uring_cmd_prep(cmd, io_u,
+			o->nonvectored ? NULL : &ld->iovecs[io_u->index]);
+}
+
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -396,6 +462,29 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	return io_u;
 }
 
+static struct io_u *fio_ioring_cmd_event(struct thread_data *td, int event)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct io_uring_cqe *cqe;
+	struct io_u *io_u;
+	unsigned index;
+
+	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
+	if (o->cmd_type == FIO_URING_CMD_NVME)
+		index <<= 1;
+
+	cqe = &ld->cq_ring.cqes[index];
+	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
+
+	if (cqe->res != 0)
+		io_u->error = -cqe->res;
+	else
+		io_u->error = 0;
+
+	return io_u;
+}
+
 static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 				   unsigned int max)
 {
@@ -622,14 +711,22 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	sring->array = ptr + p->sq_off.array;
 	ld->sq_ring_mask = *sring->ring_mask;
 
-	ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_sqe);
+	if (p->flags & IORING_SETUP_SQE128)
+		ld->mmap[1].len = 2 * p->sq_entries * sizeof(struct io_uring_sqe);
+	else
+		ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_sqe);
 	ld->sqes = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
 				MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 				IORING_OFF_SQES);
 	ld->mmap[1].ptr = ld->sqes;
 
-	ld->mmap[2].len = p->cq_off.cqes +
-				p->cq_entries * sizeof(struct io_uring_cqe);
+	if (p->flags & IORING_SETUP_CQE32) {
+		ld->mmap[2].len = p->cq_off.cqes +
+					2 * p->cq_entries * sizeof(struct io_uring_cqe);
+	} else {
+		ld->mmap[2].len = p->cq_off.cqes +
+					p->cq_entries * sizeof(struct io_uring_cqe);
+	}
 	ptr = mmap(0, ld->mmap[2].len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 			IORING_OFF_CQ_RING);
@@ -728,6 +825,61 @@ retry:
 	return fio_ioring_mmap(ld, &p);
 }
 
+static int fio_ioring_cmd_queue_init(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	int depth = td->o.iodepth;
+	struct io_uring_params p;
+	int ret;
+
+	memset(&p, 0, sizeof(p));
+
+	if (o->hipri)
+		p.flags |= IORING_SETUP_IOPOLL;
+	if (o->sqpoll_thread) {
+		p.flags |= IORING_SETUP_SQPOLL;
+		if (o->sqpoll_set) {
+			p.flags |= IORING_SETUP_SQ_AFF;
+			p.sq_thread_cpu = o->sqpoll_cpu;
+		}
+	}
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		p.flags |= IORING_SETUP_SQE128;
+		p.flags |= IORING_SETUP_CQE32;
+	}
+
+	/*
+	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
+	 * than that.
+	 */
+	p.flags |= IORING_SETUP_CQSIZE;
+	p.cq_entries = depth;
+
+retry:
+	ret = syscall(__NR_io_uring_setup, depth, &p);
+	if (ret < 0) {
+		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
+			p.flags &= ~IORING_SETUP_CQSIZE;
+			goto retry;
+		}
+		return ret;
+	}
+
+	ld->ring_fd = ret;
+
+	fio_ioring_probe(td);
+
+	if (o->fixedbufs) {
+		ret = syscall(__NR_io_uring_register, ld->ring_fd,
+				IORING_REGISTER_BUFFERS, ld->iovecs, depth);
+		if (ret < 0)
+			return ret;
+	}
+
+	return fio_ioring_mmap(ld, &p);
+}
+
 static int fio_ioring_register_files(struct thread_data *td)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -811,6 +963,52 @@ static int fio_ioring_post_init(struct thread_data *td)
 	return 0;
 }
 
+static int fio_ioring_cmd_post_init(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct io_u *io_u;
+	int err, i;
+
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct iovec *iov = &ld->iovecs[i];
+
+		io_u = ld->io_u_index[i];
+		iov->iov_base = io_u->buf;
+		iov->iov_len = td_max_bs(td);
+	}
+
+	err = fio_ioring_cmd_queue_init(td);
+	if (err) {
+		int init_err = errno;
+
+		td_verror(td, init_err, "io_queue_init");
+		return 1;
+	}
+
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct io_uring_sqe *sqe;
+
+		if (o->cmd_type == FIO_URING_CMD_NVME) {
+			sqe = &ld->sqes[i << 1];
+			memset(sqe, 0, 2 * sizeof(*sqe));
+		} else {
+			sqe = &ld->sqes[i];
+			memset(sqe, 0, sizeof(*sqe));
+		}
+	}
+
+	if (o->registerfiles) {
+		err = fio_ioring_register_files(td);
+		if (err) {
+			td_verror(td, errno, "ioring_register_files");
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
@@ -868,6 +1066,38 @@ static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
+static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		struct nvme_data *data = NULL;
+		unsigned int nsid, lba_size = 0;
+		unsigned long long nlba = 0;
+		int ret;
+
+		/* Store the namespace-id and lba size. */
+		data = FILE_ENG_DATA(f);
+		if (data == NULL) {
+			ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+			if (ret)
+				return ret;
+
+			data = calloc(1, sizeof(struct nvme_data));
+			data->nsid = nsid;
+			data->lba_shift = ilog2(lba_size);
+
+			FILE_SET_ENG_DATA(f, data);
+		}
+	}
+	if (!ld || !o->registerfiles)
+		return generic_open_file(td, f);
+
+	f->fd = ld->fds[f->engine_pos];
+	return 0;
+}
+
 static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -880,7 +1110,85 @@ static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+static int fio_ioring_cmd_close_file(struct thread_data *td,
+				     struct fio_file *f)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		struct nvme_data *data = FILE_ENG_DATA(f);
+
+		FILE_SET_ENG_DATA(f, NULL);
+		free(data);
+	}
+	if (!ld || !o->registerfiles)
+		return generic_close_file(td, f);
+
+	f->fd = -1;
+	return 0;
+}
+
+static int fio_ioring_cmd_get_file_size(struct thread_data *td,
+					struct fio_file *f)
+{
+	struct ioring_options *o = td->eo;
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		struct nvme_data *data = NULL;
+		unsigned int nsid, lba_size = 0;
+		unsigned long long nlba = 0;
+		int ret;
+
+		ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+		if (ret)
+			return ret;
+
+		data = calloc(1, sizeof(struct nvme_data));
+		data->nsid = nsid;
+		data->lba_shift = ilog2(lba_size);
+
+		f->real_file_size = lba_size * nlba;
+		fio_file_set_size_known(f);
+
+		FILE_SET_ENG_DATA(f, data);
+		return 0;
+	}
+	return generic_get_file_size(td, f);
+}
+
+static int fio_ioring_cmd_get_zoned_model(struct thread_data *td,
+					  struct fio_file *f,
+					  enum zbd_zoned_model *model)
+{
+	return fio_nvme_get_zoned_model(td, f, model);
+}
+
+static int fio_ioring_cmd_report_zones(struct thread_data *td,
+				       struct fio_file *f, uint64_t offset,
+				       struct zbd_zone *zbdz,
+				       unsigned int nr_zones)
+{
+	return fio_nvme_report_zones(td, f, offset, zbdz, nr_zones);
+}
+
+static int fio_ioring_cmd_reset_wp(struct thread_data *td, struct fio_file *f,
+				   uint64_t offset, uint64_t length)
+{
+	return fio_nvme_reset_wp(td, f, offset, length);
+}
+
+static int fio_ioring_cmd_get_max_open_zones(struct thread_data *td,
+					     struct fio_file *f,
+					     unsigned int *max_open_zones)
+{
+	return fio_nvme_get_max_open_zones(td, f, max_open_zones);
+}
+
+static struct ioengine_ops ioengine_uring = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD,
@@ -900,13 +1208,39 @@ static struct ioengine_ops ioengine = {
 	.option_struct_size	= sizeof(struct ioring_options),
 };
 
+static struct ioengine_ops ioengine_uring_cmd = {
+	.name			= "io_uring_cmd",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD | FIO_MEMALIGN | FIO_RAWIO,
+	.init			= fio_ioring_init,
+	.post_init		= fio_ioring_cmd_post_init,
+	.io_u_init		= fio_ioring_io_u_init,
+	.prep			= fio_ioring_cmd_prep,
+	.queue			= fio_ioring_queue,
+	.commit			= fio_ioring_commit,
+	.getevents		= fio_ioring_getevents,
+	.event			= fio_ioring_cmd_event,
+	.cleanup		= fio_ioring_cleanup,
+	.open_file		= fio_ioring_cmd_open_file,
+	.close_file		= fio_ioring_cmd_close_file,
+	.get_file_size		= fio_ioring_cmd_get_file_size,
+	.get_zoned_model	= fio_ioring_cmd_get_zoned_model,
+	.report_zones		= fio_ioring_cmd_report_zones,
+	.reset_wp		= fio_ioring_cmd_reset_wp,
+	.get_max_open_zones	= fio_ioring_cmd_get_max_open_zones,
+	.options		= options,
+	.option_struct_size	= sizeof(struct ioring_options),
+};
+
 static void fio_init fio_ioring_register(void)
 {
-	register_ioengine(&ioengine);
+	register_ioengine(&ioengine_uring);
+	register_ioengine(&ioengine_uring_cmd);
 }
 
 static void fio_exit fio_ioring_unregister(void)
 {
-	unregister_ioengine(&ioengine);
+	unregister_ioengine(&ioengine_uring);
+	unregister_ioengine(&ioengine_uring_cmd);
 }
 #endif
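With IORING_SETUP_SQE128 and IORING_SETUP_CQE32, every submission entry is twice the size of struct io_uring_sqe (64 bytes) and every completion entry is twice the size of struct io_uring_cqe (16 bytes). That is why the patch doubles the mmap lengths and shifts indices left by one (`<< 1`). The sketch below isolates that arithmetic. The helper names are hypothetical, and the cqes offset used in the usage note is an illustrative value, not what a kernel necessarily reports in cq_off.cqes.

```c
#include <stddef.h>

/* Base entry sizes: sizeof(struct io_uring_sqe) and sizeof(struct io_uring_cqe). */
#define SQE_BASE_SIZE 64u
#define CQE_BASE_SIZE 16u

/* Bytes to map for the SQE array, as computed in fio_ioring_mmap(). */
static size_t sqe_array_bytes(unsigned int sq_entries, int sqe128)
{
	return (size_t)(sqe128 ? 2u : 1u) * sq_entries * SQE_BASE_SIZE;
}

/* Bytes to map for the CQ ring: offset of the cqes array plus the entries. */
static size_t cq_ring_bytes(size_t cqes_off, unsigned int cq_entries, int cqe32)
{
	return cqes_off + (size_t)(cqe32 ? 2u : 1u) * cq_entries * CQE_BASE_SIZE;
}

/* Index into an array typed as 64-byte SQEs for request slot i. */
static unsigned int sqe_slot(unsigned int i, int sqe128)
{
	return sqe128 ? i << 1 : i;
}

/* Index into an array typed as 16-byte CQEs, as in fio_ioring_cmd_event(). */
static unsigned int cqe_slot(unsigned int head, unsigned int event,
			     unsigned int mask, int cqe32)
{
	unsigned int index = (event + head) & mask;

	return cqe32 ? index << 1 : index;
}
```

For example, with iodepth=32 the SQE array grows from 2048 to 4096 bytes, and request slot 5 lives at sqes[10] when SQE128 is set.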
diff --git a/engines/nvme.c b/engines/nvme.c
new file mode 100644
index 00000000..9ffc5303
--- /dev/null
+++ b/engines/nvme.c
@@ -0,0 +1,347 @@
+/*
+ * nvme structure declarations and helper functions for the
+ * io_uring_cmd engine.
+ */
+
+#include "nvme.h"
+
+int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+			    struct iovec *iov)
+{
+	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+	__u64 slba;
+	__u32 nlb;
+
+	memset(cmd, 0, sizeof(struct nvme_uring_cmd));
+
+	if (io_u->ddir == DDIR_READ)
+		cmd->opcode = nvme_cmd_read;
+	else if (io_u->ddir == DDIR_WRITE)
+		cmd->opcode = nvme_cmd_write;
+	else
+		return -ENOTSUP;
+
+	slba = io_u->offset >> data->lba_shift;
+	nlb = (io_u->xfer_buflen >> data->lba_shift) - 1;
+
+	/* cdw10 and cdw11 represent starting lba */
+	cmd->cdw10 = slba & 0xffffffff;
+	cmd->cdw11 = slba >> 32;
+	/* cdw12 represent number of lba's for read/write */
+	cmd->cdw12 = nlb;
+	if (iov) {
+		iov->iov_base = io_u->xfer_buf;
+		iov->iov_len = io_u->xfer_buflen;
+		cmd->addr = (__u64)(uintptr_t)iov;
+		cmd->data_len = 1;
+	} else {
+		cmd->addr = (__u64)(uintptr_t)io_u->xfer_buf;
+		cmd->data_len = io_u->xfer_buflen;
+	}
+	cmd->nsid = data->nsid;
+	return 0;
+}
+
+static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
+			 enum nvme_csi csi, void *data)
+{
+	struct nvme_passthru_cmd cmd = {
+		.opcode         = nvme_admin_identify,
+		.nsid           = nsid,
+		.addr           = (__u64)(uintptr_t)data,
+		.data_len       = NVME_IDENTIFY_DATA_SIZE,
+		.cdw10          = cns,
+		.cdw11          = csi << NVME_IDENTIFY_CSI_SHIFT,
+		.timeout_ms     = NVME_DEFAULT_IOCTL_TIMEOUT,
+	};
+
+	return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
+}
+
+int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
+		      __u64 *nlba)
+{
+	struct nvme_id_ns ns;
+	int namespace_id;
+	int fd, err;
+
+	if (f->filetype != FIO_TYPE_CHAR) {
+		log_err("ioengine io_uring_cmd only works with nvme ns "
+			"generic char devices (/dev/ngXnY)\n");
+		return 1;
+	}
+
+	fd = open(f->file_name, O_RDONLY);
+	if (fd < 0)
+		return -errno;
+
+	namespace_id = ioctl(fd, NVME_IOCTL_ID);
+	if (namespace_id < 0) {
+		log_err("failed to fetch namespace-id");
+		close(fd);
+		return -errno;
+	}
+
+	/*
+	 * Identify namespace to get namespace-id, namespace size in LBA's
+	 * and LBA data size.
+	 */
+	err = nvme_identify(fd, namespace_id, NVME_IDENTIFY_CNS_NS,
+				NVME_CSI_NVM, &ns);
+	if (err) {
+		log_err("failed to fetch identify namespace\n");
+		close(fd);
+		return err;
+	}
+
+	*nsid = namespace_id;
+	*lba_sz = 1 << ns.lbaf[(ns.flbas & 0x0f)].ds;
+	*nlba = ns.nsze;
+
+	close(fd);
+	return 0;
+}
+
+int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			     enum zbd_zoned_model *model)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_id_ns ns;
+	struct nvme_passthru_cmd cmd;
+	int fd, ret = 0;
+
+	if (f->filetype != FIO_TYPE_CHAR)
+		return -EINVAL;
+
+	/* File is not yet opened */
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	/* Using nvme_id_ns for data as sizes are same */
+	ret = nvme_identify(fd, data->nsid, NVME_IDENTIFY_CNS_CSI_CTRL,
+				NVME_CSI_ZNS, &ns);
+	if (ret) {
+		*model = ZBD_NONE;
+		goto out;
+	}
+
+	memset(&cmd, 0, sizeof(struct nvme_passthru_cmd));
+
+	/* Using nvme_id_ns for data as sizes are same */
+	ret = nvme_identify(fd, data->nsid, NVME_IDENTIFY_CNS_CSI_NS,
+				NVME_CSI_ZNS, &ns);
+	if (ret) {
+		*model = ZBD_NONE;
+		goto out;
+	}
+
+	*model = ZBD_HOST_MANAGED;
+out:
+	close(fd);
+	return 0;
+}
+
+static int nvme_report_zones(int fd, __u32 nsid, __u64 slba, __u32 zras_feat,
+			     __u32 data_len, void *data)
+{
+	struct nvme_passthru_cmd cmd = {
+		.opcode         = nvme_zns_cmd_mgmt_recv,
+		.nsid           = nsid,
+		.addr           = (__u64)(uintptr_t)data,
+		.data_len       = data_len,
+		.cdw10          = slba & 0xffffffff,
+		.cdw11          = slba >> 32,
+		.cdw12		= (data_len >> 2) - 1,
+		.cdw13		= NVME_ZNS_ZRA_REPORT_ZONES | zras_feat,
+		.timeout_ms     = NVME_DEFAULT_IOCTL_TIMEOUT,
+	};
+
+	return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
+}
+
+int fio_nvme_report_zones(struct thread_data *td, struct fio_file *f,
+			  uint64_t offset, struct zbd_zone *zbdz,
+			  unsigned int nr_zones)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_zone_report *zr;
+	struct nvme_zns_id_ns zns_ns;
+	struct nvme_id_ns ns;
+	unsigned int i = 0, j, zones_fetched = 0;
+	unsigned int max_zones, zones_chunks = 1024;
+	int fd, ret = 0;
+	__u32 zr_len;
+	__u64 zlen;
+
+	/* File is not yet opened */
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	zones_fetched = 0;
+	zr_len = sizeof(*zr) + (zones_chunks * sizeof(struct nvme_zns_desc));
+	zr = calloc(1, zr_len);
+	if (!zr) {
+		close(fd);
+		return -ENOMEM;
+	}
+
+	ret = nvme_identify(fd, data->nsid, NVME_IDENTIFY_CNS_NS,
+				NVME_CSI_NVM, &ns);
+	if (ret) {
+		log_err("%s: nvme_identify_ns failed, err=%d\n", f->file_name,
+			ret);
+		goto out;
+	}
+
+	ret = nvme_identify(fd, data->nsid, NVME_IDENTIFY_CNS_CSI_NS,
+				NVME_CSI_ZNS, &zns_ns);
+	if (ret) {
+		log_err("%s: nvme_zns_identify_ns failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+	zlen = zns_ns.lbafe[ns.flbas & 0x0f].zsze << data->lba_shift;
+
+	max_zones = (f->real_file_size - offset) / zlen;
+	if (max_zones < nr_zones)
+		nr_zones = max_zones;
+
+	if (nr_zones < zones_chunks)
+		zones_chunks = nr_zones;
+
+	while (zones_fetched < nr_zones) {
+		if (zones_fetched + zones_chunks >= nr_zones) {
+			zones_chunks = nr_zones - zones_fetched;
+			zr_len = sizeof(*zr) + (zones_chunks * sizeof(struct nvme_zns_desc));
+		}
+		ret = nvme_report_zones(fd, data->nsid, offset >> data->lba_shift,
+					NVME_ZNS_ZRAS_FEAT_ERZ, zr_len, (void *)zr);
+		if (ret) {
+			log_err("%s: nvme_zns_report_zones failed, err=%d\n",
+				f->file_name, ret);
+			goto out;
+		}
+
+		/* Transform the zone-report */
+		for (j = 0; j < zr->nr_zones; j++, i++) {
+			struct nvme_zns_desc *desc = (struct nvme_zns_desc *)&(zr->entries[j]);
+
+			zbdz[i].start = desc->zslba << data->lba_shift;
+			zbdz[i].len = zlen;
+			zbdz[i].wp = desc->wp << data->lba_shift;
+			zbdz[i].capacity = desc->zcap << data->lba_shift;
+
+			/* Zone Type is stored in first 4 bits. */
+			switch (desc->zt & 0x0f) {
+			case NVME_ZONE_TYPE_SEQWRITE_REQ:
+				zbdz[i].type = ZBD_ZONE_TYPE_SWR;
+				break;
+			default:
+				log_err("%s: invalid type for zone at offset %llu.\n",
+					f->file_name, desc->zslba);
+				ret = -EIO;
+				goto out;
+			}
+
+			/* Zone State is stored in last 4 bits. */
+			switch (desc->zs >> 4) {
+			case NVME_ZNS_ZS_EMPTY:
+				zbdz[i].cond = ZBD_ZONE_COND_EMPTY;
+				break;
+			case NVME_ZNS_ZS_IMPL_OPEN:
+				zbdz[i].cond = ZBD_ZONE_COND_IMP_OPEN;
+				break;
+			case NVME_ZNS_ZS_EXPL_OPEN:
+				zbdz[i].cond = ZBD_ZONE_COND_EXP_OPEN;
+				break;
+			case NVME_ZNS_ZS_CLOSED:
+				zbdz[i].cond = ZBD_ZONE_COND_CLOSED;
+				break;
+			case NVME_ZNS_ZS_FULL:
+				zbdz[i].cond = ZBD_ZONE_COND_FULL;
+				break;
+			case NVME_ZNS_ZS_READ_ONLY:
+			case NVME_ZNS_ZS_OFFLINE:
+			default:
+				/* Treat all these conditions as offline (don't use!) */
+				zbdz[i].cond = ZBD_ZONE_COND_OFFLINE;
+				zbdz[i].wp = zbdz[i].start;
+			}
+		}
+		zones_fetched += zr->nr_zones;
+		offset += zr->nr_zones * zlen;
+	}
+
+	ret = zones_fetched;
+out:
+	free(zr);
+	close(fd);
+
+	return ret;
+}
+
+int fio_nvme_reset_wp(struct thread_data *td, struct fio_file *f,
+		      uint64_t offset, uint64_t length)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	unsigned int nr_zones;
+	unsigned long long zslba;
+	int i, fd, ret = 0;
+
+	/* If the file is not yet opened, open it for this function. */
+	fd = f->fd;
+	if (fd < 0) {
+		fd = open(f->file_name, O_RDWR | O_LARGEFILE);
+		if (fd < 0)
+			return -errno;
+	}
+
+	zslba = offset >> data->lba_shift;
+	nr_zones = (length + td->o.zone_size - 1) / td->o.zone_size;
+
+	for (i = 0; i < nr_zones; i++, zslba += (td->o.zone_size >> data->lba_shift)) {
+		struct nvme_passthru_cmd cmd = {
+			.opcode         = nvme_zns_cmd_mgmt_send,
+			.nsid           = data->nsid,
+			.cdw10          = zslba & 0xffffffff,
+			.cdw11          = zslba >> 32,
+			.cdw13          = NVME_ZNS_ZSA_RESET,
+			.addr           = (__u64)(uintptr_t)NULL,
+			.data_len       = 0,
+			.timeout_ms     = NVME_DEFAULT_IOCTL_TIMEOUT,
+		};
+
+		ret = ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
+	}
+
+	if (f->fd < 0)
+		close(fd);
+	return -ret;
+}
+
+int fio_nvme_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				unsigned int *max_open_zones)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_zns_id_ns zns_ns;
+	int fd, ret = 0;
+
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	ret = nvme_identify(fd, data->nsid, NVME_IDENTIFY_CNS_CSI_NS,
+				NVME_CSI_ZNS, &zns_ns);
+	if (ret) {
+		log_err("%s: nvme_zns_identify_ns failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+
+	*max_open_zones = zns_ns.mor + 1;
+out:
+	close(fd);
+	return ret;
+}
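fio_nvme_uring_cmd_prep() converts fio's byte offset and length into NVMe dwords. The starting LBA is split across cdw10 (low 32 bits) and cdw11 (high 32 bits). cdw12 carries the block count as a 0's based value, so one block is encoded as 0. The LBA size comes from Identify Namespace as 2^ds of the active format and is stored as a shift via ilog2(). The following sketch reproduces that arithmetic on its own. The rw_dwords struct and the function names are hypothetical; the real code writes directly into struct nvme_uring_cmd.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down view of the dwords fio fills in. */
struct rw_dwords {
	uint32_t cdw10;	/* starting LBA, low 32 bits */
	uint32_t cdw11;	/* starting LBA, high 32 bits */
	uint32_t cdw12;	/* number of logical blocks, 0's based */
};

/* Same loop as the ilog2() helper in engines/nvme.h. */
static int ilog2_u32(uint32_t i)
{
	int log = -1;

	while (i) {
		i >>= 1;
		log++;
	}
	return log;
}

/* LBA data size of the active format: 1 << lbaf[flbas & 0x0f].ds. */
static uint32_t lba_size_from_format(uint8_t ds)
{
	return UINT32_C(1) << ds;
}

/* Byte offset/length to NVMe read/write dwords, as in the prep helper. */
static struct rw_dwords encode_rw(uint64_t offset, uint32_t len,
				  unsigned int lba_shift)
{
	uint64_t slba = offset >> lba_shift;
	struct rw_dwords d = {
		.cdw10 = (uint32_t)(slba & 0xffffffff),
		.cdw11 = (uint32_t)(slba >> 32),
		.cdw12 = (len >> lba_shift) - 1,
	};

	return d;
}
```

On a 4 KiB-formatted namespace (ds = 12), a 4 KiB I/O at byte offset 8192 becomes slba = 2 and cdw12 = 0.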
diff --git a/engines/nvme.h b/engines/nvme.h
new file mode 100644
index 00000000..70a89b74
--- /dev/null
+++ b/engines/nvme.h
@@ -0,0 +1,214 @@
+/*
+ * nvme structure declarations and helper functions for the
+ * io_uring_cmd engine.
+ */
+
+#ifndef FIO_NVME_H
+#define FIO_NVME_H
+
+#include <linux/nvme_ioctl.h>
+#include "../fio.h"
+
+/*
+ * If the uapi headers installed on the system lacks nvme uring command
+ * support, use the local version to prevent compilation issues.
+ */
+#ifndef CONFIG_NVME_URING_CMD
+struct nvme_uring_cmd {
+	__u8	opcode;
+	__u8	flags;
+	__u16	rsvd1;
+	__u32	nsid;
+	__u32	cdw2;
+	__u32	cdw3;
+	__u64	metadata;
+	__u64	addr;
+	__u32	metadata_len;
+	__u32	data_len;
+	__u32	cdw10;
+	__u32	cdw11;
+	__u32	cdw12;
+	__u32	cdw13;
+	__u32	cdw14;
+	__u32	cdw15;
+	__u32	timeout_ms;
+	__u32   rsvd2;
+};
+
+#define NVME_URING_CMD_IO	_IOWR('N', 0x80, struct nvme_uring_cmd)
+#define NVME_URING_CMD_IO_VEC	_IOWR('N', 0x81, struct nvme_uring_cmd)
+#endif /* CONFIG_NVME_URING_CMD */
+
+#define NVME_DEFAULT_IOCTL_TIMEOUT 0
+#define NVME_IDENTIFY_DATA_SIZE 4096
+#define NVME_IDENTIFY_CSI_SHIFT 24
+
+#define NVME_ZNS_ZRA_REPORT_ZONES 0
+#define NVME_ZNS_ZRAS_FEAT_ERZ (1 << 16)
+#define NVME_ZNS_ZSA_RESET 0x4
+#define NVME_ZONE_TYPE_SEQWRITE_REQ 0x2
+
+enum nvme_identify_cns {
+	NVME_IDENTIFY_CNS_NS		= 0x00,
+	NVME_IDENTIFY_CNS_CSI_NS	= 0x05,
+	NVME_IDENTIFY_CNS_CSI_CTRL	= 0x06,
+};
+
+enum nvme_csi {
+	NVME_CSI_NVM			= 0,
+	NVME_CSI_KV			= 1,
+	NVME_CSI_ZNS			= 2,
+};
+
+enum nvme_admin_opcode {
+	nvme_admin_identify		= 0x06,
+};
+
+enum nvme_io_opcode {
+	nvme_cmd_write			= 0x01,
+	nvme_cmd_read			= 0x02,
+	nvme_zns_cmd_mgmt_send		= 0x79,
+	nvme_zns_cmd_mgmt_recv		= 0x7a,
+};
+
+enum nvme_zns_zs {
+	NVME_ZNS_ZS_EMPTY		= 0x1,
+	NVME_ZNS_ZS_IMPL_OPEN		= 0x2,
+	NVME_ZNS_ZS_EXPL_OPEN		= 0x3,
+	NVME_ZNS_ZS_CLOSED		= 0x4,
+	NVME_ZNS_ZS_READ_ONLY		= 0xd,
+	NVME_ZNS_ZS_FULL		= 0xe,
+	NVME_ZNS_ZS_OFFLINE		= 0xf,
+};
+
+struct nvme_data {
+	__u32 nsid;
+	__u32 lba_shift;
+};
+
+struct nvme_lbaf {
+	__le16			ms;
+	__u8			ds;
+	__u8			rp;
+};
+
+struct nvme_id_ns {
+	__le64			nsze;
+	__le64			ncap;
+	__le64			nuse;
+	__u8			nsfeat;
+	__u8			nlbaf;
+	__u8			flbas;
+	__u8			mc;
+	__u8			dpc;
+	__u8			dps;
+	__u8			nmic;
+	__u8			rescap;
+	__u8			fpi;
+	__u8			dlfeat;
+	__le16			nawun;
+	__le16			nawupf;
+	__le16			nacwu;
+	__le16			nabsn;
+	__le16			nabo;
+	__le16			nabspf;
+	__le16			noiob;
+	__u8			nvmcap[16];
+	__le16			npwg;
+	__le16			npwa;
+	__le16			npdg;
+	__le16			npda;
+	__le16			nows;
+	__le16			mssrl;
+	__le32			mcl;
+	__u8			msrc;
+	__u8			rsvd81[11];
+	__le32			anagrpid;
+	__u8			rsvd96[3];
+	__u8			nsattr;
+	__le16			nvmsetid;
+	__le16			endgid;
+	__u8			nguid[16];
+	__u8			eui64[8];
+	struct nvme_lbaf	lbaf[16];
+	__u8			rsvd192[192];
+	__u8			vs[3712];
+};
+
+static inline int ilog2(uint32_t i)
+{
+	int log = -1;
+
+	while (i) {
+		i >>= 1;
+		log++;
+	}
+	return log;
+}
+
+struct nvme_zns_lbafe {
+	__le64	zsze;
+	__u8	zdes;
+	__u8	rsvd9[7];
+};
+
+struct nvme_zns_id_ns {
+	__le16			zoc;
+	__le16			ozcs;
+	__le32			mar;
+	__le32			mor;
+	__le32			rrl;
+	__le32			frl;
+	__le32			rrl1;
+	__le32			rrl2;
+	__le32			rrl3;
+	__le32			frl1;
+	__le32			frl2;
+	__le32			frl3;
+	__le32			numzrwa;
+	__le16			zrwafg;
+	__le16			zrwasz;
+	__u8			zrwacap;
+	__u8			rsvd53[2763];
+	struct nvme_zns_lbafe	lbafe[64];
+	__u8			vs[256];
+};
+
+struct nvme_zns_desc {
+	__u8	zt;
+	__u8	zs;
+	__u8	za;
+	__u8	zai;
+	__u8	rsvd4[4];
+	__le64	zcap;
+	__le64	zslba;
+	__le64	wp;
+	__u8	rsvd32[32];
+};
+
+struct nvme_zone_report {
+	__le64			nr_zones;
+	__u8			rsvd8[56];
+	struct nvme_zns_desc	entries[];
+};
+
+int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
+		      __u64 *nlba);
+
+int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+			    struct iovec *iov);
+
+int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			     enum zbd_zoned_model *model);
+
+int fio_nvme_report_zones(struct thread_data *td, struct fio_file *f,
+			  uint64_t offset, struct zbd_zone *zbdz,
+			  unsigned int nr_zones);
+
+int fio_nvme_reset_wp(struct thread_data *td, struct fio_file *f,
+		      uint64_t offset, uint64_t length);
+
+int fio_nvme_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				unsigned int *max_open_zones);
+
+#endif
diff --git a/examples/uring-cmd-ng.fio b/examples/uring-cmd-ng.fio
new file mode 100644
index 00000000..b2888a00
--- /dev/null
+++ b/examples/uring-cmd-ng.fio
@@ -0,0 +1,25 @@
+# io_uring_cmd I/O engine for nvme-ns generic character device
+
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+size=1G
+iodepth=32
+bs=4K
+thread=1
+stonewall=1
+
+[rand-write]
+rw=randwrite
+sqthread_poll=1
+
+[rand-read]
+rw=randread
+
+[write-opts]
+rw=write
+sqthread_poll=1
+sqthread_poll_cpu=0
+nonvectored=1
+registerfiles=1
diff --git a/examples/uring-cmd-zoned.fio b/examples/uring-cmd-zoned.fio
new file mode 100644
index 00000000..58e8f79e
--- /dev/null
+++ b/examples/uring-cmd-zoned.fio
@@ -0,0 +1,31 @@
+# io_uring_cmd I/O engine for nvme-ns generic zoned character device
+#
+# NOTE: with write workload iodepth must be set to 1 as there is no IO
+# scheduler.
+
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+zonemode=zbd
+size=1G
+iodepth=1
+bs=256K
+verify=crc32c
+stonewall=1
+
+[rand-write]
+rw=randwrite
+
+[write-opts]
+rw=write
+registerfiles=1
+sqthread_poll=1
+sqthread_poll_cpu=0
+
+[randwrite-opts]
+rw=randwrite
+sqthread_poll=1
+sqthread_poll_cpu=0
+nonvectored=1
+registerfiles=1
diff --git a/file.h b/file.h
index faf65a2a..da1b8947 100644
--- a/file.h
+++ b/file.h
@@ -126,12 +126,14 @@ struct fio_file {
 	unsigned int last_write_idx;
 
 	/*
-	 * For use by the io engine for offset or private data storage
+	 * For use by the io engine to store offset
 	 */
-	union {
-		uint64_t engine_pos;
-		void *engine_data;
-	};
+	uint64_t engine_pos;
+
+	/*
+	 * For use by the io engine for private data storage
+	 */
+	void *engine_data;
 
 	/*
 	 * if io is protected by a semaphore, this is set
diff --git a/fio.1 b/fio.1
index bdba3142..948c01f9 100644
--- a/fio.1
+++ b/fio.1
@@ -1739,6 +1739,15 @@ Basic \fBpreadv\fR\|(2) or \fBpwritev\fR\|(2) I/O.
 .B pvsync2
 Basic \fBpreadv2\fR\|(2) or \fBpwritev2\fR\|(2) I/O.
 .TP
+.B io_uring
+Fast Linux native asynchronous I/O. Supports async IO
+for both direct and buffered IO.
+This engine defines engine specific options.
+.TP
+.B io_uring_cmd
+Fast Linux native asynchronous I/O for passthrough commands.
+This engine defines engine specific options.
+.TP
 .B libaio
 Linux native asynchronous I/O. Note that Linux may only support
 queued behavior with non-buffered I/O (set `direct=1' or
@@ -2040,35 +2049,49 @@ for trim IOs are ignored. This option is mutually exclusive with the
 \fBcmdprio_percentage\fR option.
 .RE
 .TP
-.BI (io_uring)fixedbufs
+.BI (io_uring,io_uring_cmd)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
 release them when IO is done. If this option is set, the pages are pre-mapped
 before IO is started. This eliminates the need to map and release for each IO.
 This is more efficient, and reduces the IO latency as well.
 .TP
-.BI (io_uring,xnvme)hipri
+.BI (io_uring,io_uring_cmd)nonvectored
+With this option, fio will use non-vectored read/write commands, where the
+address field holds the buffer address directly. Default is -1.
+.TP
+.BI (io_uring,io_uring_cmd)force_async
+Normal operation for io_uring is to try to issue an sqe as non-blocking first,
+and if that fails, execute it in an async manner. With this option set to N,
+fio will ask for every Nth request's sqe to be issued in an async manner.
+Default is 0.
+.TP
+.BI (io_uring,io_uring_cmd,xnvme)hipri
 If this option is set, fio will attempt to use polled IO completions. Normal IO
 completions generate interrupts to signal the completion of IO, polled
 completions do not. Hence they are require active reaping by the application.
 The benefits are more efficient IO for high IOPS scenarios, and lower latencies
 for low queue depth IO.
 .TP
-.BI (io_uring)registerfiles
+.BI (io_uring,io_uring_cmd)registerfiles
 With this option, fio registers the set of files being used with the kernel.
 This avoids the overhead of managing file counts in the kernel, making the
 submission and completion part more lightweight. Required for the below
 sqthread_poll option.
 .TP
-.BI (io_uring,xnvme)sqthread_poll
+.BI (io_uring,io_uring_cmd,xnvme)sqthread_poll
 Normally fio will submit IO by issuing a system call to notify the kernel of
 available items in the SQ ring. If this option is set, the act of submitting IO
 will be done by a polling thread in the kernel. This frees up cycles for fio, at
 the cost of using more CPU in the system.
 .TP
-.BI (io_uring)sqthread_poll_cpu
+.BI (io_uring,io_uring_cmd)sqthread_poll_cpu
 When `sqthread_poll` is set, this option provides a way to define which CPU
 should be used for the polling thread.
 .TP
+.BI (io_uring_cmd)cmd_type \fR=\fPstr
+Specifies the type of uring passthrough command to be used. Supported
+value is nvme. Default is nvme.
+.TP
 .BI (libaio)userspace_reap
 Normally, with the libaio engine in use, fio will use the
 \fBio_getevents\fR\|(3) system call to reap newly returned events. With
diff --git a/init.c b/init.c
index f7d702f8..da800776 100644
--- a/init.c
+++ b/init.c
@@ -2810,6 +2810,15 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				break;
 
 			ret = fio_cmd_ioengine_option_parse(td, opt, val);
+
+			if (ret) {
+				if (td) {
+					put_job(td);
+					td = NULL;
+				}
+				do_exit++;
+				exit_val = 1;
+			}
 			break;
 		}
 		case 'w':
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 42b2fe84..929997f8 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -22,6 +22,7 @@ struct io_uring_sqe {
 	union {
 		__u64	off;	/* offset into file */
 		__u64	addr2;
+		__u32	cmd_op;
 	};
 	union {
 		__u64	addr;	/* pointer to buffer or iovecs */
@@ -60,7 +61,17 @@ struct io_uring_sqe {
 		__s32	splice_fd_in;
 		__u32	file_index;
 	};
-	__u64	__pad2[2];
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		/*
+		 * If the ring is initialized with IORING_SETUP_SQE128, then
+		 * this field is used for 80 bytes of arbitrary command data
+		 */
+		__u8	cmd[0];
+	};
 };
 
 enum {
@@ -101,6 +112,24 @@ enum {
 #define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
 #define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
 #define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
+#define IORING_SETUP_SUBMIT_ALL	(1U << 7)	/* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN	(1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
+
+#define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
+#define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
 
 enum {
 	IORING_OP_NOP,
@@ -143,6 +172,14 @@ enum {
 	IORING_OP_MKDIRAT,
 	IORING_OP_SYMLINKAT,
 	IORING_OP_LINKAT,
+	IORING_OP_MSG_RING,
+	IORING_OP_FSETXATTR,
+	IORING_OP_SETXATTR,
+	IORING_OP_FGETXATTR,
+	IORING_OP_GETXATTR,
+	IORING_OP_SOCKET,
+	IORING_OP_URING_CMD,
+
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
@@ -192,6 +229,12 @@ struct io_uring_cqe {
 	__u64	user_data;	/* sqe->data submission passed back */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
+
+	/*
+	 * If the ring is initialized with IORING_SETUP_CQE32, then this field
+	 * contains 16-bytes of padding, doubling the size of the CQE.
+	 */
+	__u64 big_cqe[];
 };
 
 /*
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 7e2fff00..d4aaa813 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -229,6 +229,14 @@ require_regular_block_dev() {
 	return 0
 }
 
+require_block_dev() {
+	if [[ -b "$realdev" ]]; then
+		return 0
+	fi
+	SKIP_REASON="$dev is not a block device"
+	return 1
+}
+
 require_seq_zones() {
 	local req_seq_zones=${1}
 	local seq_bytes=$((disk_size - first_sequential_zone_sector * 512))
@@ -251,8 +259,19 @@ require_conv_zones() {
 	return 0
 }
 
-# Check whether buffered writes are refused.
+require_max_open_zones() {
+	local min=${1}
+
+	if ((max_open_zones !=0 && max_open_zones < min)); then
+		SKIP_REASON="max_open_zones of $dev is smaller than $min"
+		return 1
+	fi
+	return 0
+}
+
+# Check whether buffered writes are refused for block devices.
 test1() {
+    require_block_dev || return $SKIP_TESTCASE
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
 	    "$(ioengine "psync")" --size="${zone_size}" --thread=1	\
 	    --zonemode=zbd --zonesize="${zone_size}" 2>&1 |
@@ -453,6 +472,8 @@ test12() {
 test13() {
     local size off capacity
 
+    require_max_open_zones 4 || return $SKIP_TESTCASE
+
     prep_write
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
diff --git a/zbd.c b/zbd.c
index b1fd6b4b..627fb968 100644
--- a/zbd.c
+++ b/zbd.c
@@ -466,7 +466,7 @@ out:
 	return res;
 }
 
-/* Verify whether direct I/O is used for all host-managed zoned drives. */
+/* Verify whether direct I/O is used for all host-managed zoned block drives. */
 static bool zbd_using_direct_io(void)
 {
 	struct thread_data *td;
@@ -477,7 +477,7 @@ static bool zbd_using_direct_io(void)
 		if (td->o.odirect || !(td->o.td_ddir & TD_DDIR_WRITE))
 			continue;
 		for_each_file(td, f, j) {
-			if (f->zbd_info &&
+			if (f->zbd_info && f->filetype == FIO_TYPE_BLOCK &&
 			    f->zbd_info->model == ZBD_HOST_MANAGED)
 				return false;
 		}

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2024-04-03 12:00 Jens Axboe
From: Jens Axboe @ 2024-04-03 12:00 UTC
  To: fio

The following changes since commit 9213e16d98b0e9d2f8d4f7e760ed0fd45c8960f6:

  Fio 3.37 (2024-03-26 15:13:51 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4eef23f627d103d7092b4141bd6b0c8f95309ee9:

  howto: fix zonemode formatting (2024-04-02 11:10:58 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      howto: fix zonemode formatting

 HOWTO.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 6a204072..25fdfbc4 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -985,14 +985,14 @@ Target file/device
 
 		**none**
 				The :option:`zonerange`, :option:`zonesize`,
-				:option `zonecapacity` and option:`zoneskip`
+				:option:`zonecapacity` and :option:`zoneskip`
 				parameters are ignored.
 		**strided**
 				I/O happens in a single zone until
 				:option:`zonesize` bytes have been transferred.
 				After that number of bytes has been
 				transferred processing of the next zone
-				starts. :option `zonecapacity` is ignored.
+				starts. :option:`zonecapacity` is ignored.
 		**zbd**
 				Zoned block device mode. I/O happens
 				sequentially in each zone, even if random I/O

* Recent changes (master)
@ 2024-03-27 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-27 12:00 UTC
  To: fio

The following changes since commit b2403d413ee734e8835539319d8bc3429a0777ac:

  Merge branch 'delete-instead-of-unlink' of https://github.com/edigaryev/fio (2024-03-25 10:45:13 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9213e16d98b0e9d2f8d4f7e760ed0fd45c8960f6:

  Fio 3.37 (2024-03-26 15:13:51 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.37

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index cf8dbb0e..be0d7620 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.36
+DEF_VER=fio-3.37
 
 LF='
 '

* Recent changes (master)
@ 2024-03-26 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-26 12:00 UTC
  To: fio

The following changes since commit 2b03792ceb7ed00bd50db5b59486fab902295df8:

  Merge branch 'issue-1735' of https://github.com/yygcode/fio (2024-03-22 10:39:37 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b2403d413ee734e8835539319d8bc3429a0777ac:

  Merge branch 'delete-instead-of-unlink' of https://github.com/edigaryev/fio (2024-03-25 10:45:13 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'delete-instead-of-unlink' of https://github.com/edigaryev/fio

Nikolay Edigaryev (1):
      docs: use "delete" term instead of "unlink", which is less common

 HOWTO.rst | 4 ++--
 fio.1     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index fb067fe5..6a204072 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -971,13 +971,13 @@ Target file/device
 
 .. option:: unlink=bool
 
-	Unlink the job files when done. Not the default, as repeated runs of that
+	Unlink (delete) the job files when done. Not the default, as repeated runs of that
 	job would then waste time recreating the file set again and again. Default:
 	false.
 
 .. option:: unlink_each_loop=bool
 
-	Unlink job files after each iteration or loop.  Default: false.
+	Unlink (delete) job files after each iteration or loop.  Default: false.
 
 .. option:: zonemode=str
 
diff --git a/fio.1 b/fio.1
index 63375c62..545bb872 100644
--- a/fio.1
+++ b/fio.1
@@ -749,12 +749,12 @@ same data multiple times. Thus it will not work on non-seekable I/O engines
 (e.g. network, splice). Default: false.
 .TP
 .BI unlink \fR=\fPbool
-Unlink the job files when done. Not the default, as repeated runs of that
+Unlink (delete) the job files when done. Not the default, as repeated runs of that
 job would then waste time recreating the file set again and again. Default:
 false.
 .TP
 .BI unlink_each_loop \fR=\fPbool
-Unlink job files after each iteration or loop. Default: false.
+Unlink (delete) job files after each iteration or loop. Default: false.
 .TP
 .BI zonemode \fR=\fPstr
 Accepted values are:

* Recent changes (master)
@ 2024-03-23 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-23 12:00 UTC
  To: fio

The following changes since commit 20f42c101f7876648705a4fb8a9e2a647dc936ce:

  t/run-fio-tests: restrict t0031 to Linux only (2024-03-21 08:36:14 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b03792ceb7ed00bd50db5b59486fab902295df8:

  Merge branch 'issue-1735' of https://github.com/yygcode/fio (2024-03-22 10:39:37 -0400)

----------------------------------------------------------------
Vincent Fu (4):
      engines/fileoperations: remove extra blank lines
      engines/fileoperations: use local var for ioengine data
      examples: fiograph plots for dir operation ioengines
      Merge branch 'issue-1735' of https://github.com/yygcode/fio

friendy-su (1):
      ioengines: implement dircreate, dirstat, dirdelete engines to fileoperations.c

yonggang.yyg (1):
      iolog: fix disk stats issue

 HOWTO.rst                       |  15 +++++
 engines/fileoperations.c        | 119 ++++++++++++++++++++++++++++++++++++++--
 examples/dircreate-ioengine.fio |  25 +++++++++
 examples/dircreate-ioengine.png | Bin 0 -> 42659 bytes
 examples/dirdelete-ioengine.fio |  18 ++++++
 examples/dirdelete-ioengine.png | Bin 0 -> 45530 bytes
 examples/dirstat-ioengine.fio   |  18 ++++++
 examples/dirstat-ioengine.png   | Bin 0 -> 33597 bytes
 fio.1                           |  15 +++++
 iolog.c                         |   2 +
 10 files changed, 206 insertions(+), 6 deletions(-)
 create mode 100644 examples/dircreate-ioengine.fio
 create mode 100644 examples/dircreate-ioengine.png
 create mode 100644 examples/dirdelete-ioengine.fio
 create mode 100644 examples/dirdelete-ioengine.png
 create mode 100644 examples/dirstat-ioengine.fio
 create mode 100644 examples/dirstat-ioengine.png

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 4c8ac331..fb067fe5 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2192,6 +2192,21 @@ I/O engine
 			and 'nrfiles', so that the files will be created.
 			This engine is to measure file delete.
 
+		**dircreate**
+			Simply create the directories and do no I/O to them.  You still need to
+			set `filesize` so that all the accounting still occurs, but no
+			actual I/O will be done other than creating the directories.
+
+		**dirstat**
+			Simply do stat() and do no I/O to the directories. You need to set 'filesize'
+			and 'nrfiles', so that directories will be created.
+			This engine is to measure directory lookup and meta data access.
+
+		**dirdelete**
+			Simply delete the directories by rmdir() and do no I/O to them. You need to set 'filesize'
+			and 'nrfiles', so that the directories will be created.
+			This engine is to measure directory delete.
+
 		**libpmem**
 			Read and write using mmap I/O to a file on a filesystem
 			mounted with DAX on a persistent memory device through the PMDK
diff --git a/engines/fileoperations.c b/engines/fileoperations.c
index 1db60da1..c52f0900 100644
--- a/engines/fileoperations.c
+++ b/engines/fileoperations.c
@@ -1,8 +1,8 @@
 /*
- * fileoperations engine
+ * file/directory operations engine
  *
- * IO engine that doesn't do any IO, just operates files and tracks the latency
- * of the file operation.
+ * IO engine that doesn't do any IO, just operates files/directories
+ * and tracks the latency of the operation.
  */
 #include <stdio.h>
 #include <stdlib.h>
@@ -15,9 +15,15 @@
 #include "../optgroup.h"
 #include "../oslib/statx.h"
 
+enum fio_engine {
+	UNKNOWN_OP_ENGINE = 0,
+	FILE_OP_ENGINE = 1,
+	DIR_OP_ENGINE = 2,
+};
 
 struct fc_data {
 	enum fio_ddir stat_ddir;
+	enum fio_engine op_engine;
 };
 
 struct filestat_options {
@@ -61,11 +67,30 @@ static struct fio_option options[] = {
 	},
 };
 
+static int setup_dirs(struct thread_data *td)
+{
+	int ret = 0;
+	int i;
+	struct fio_file *f;
+
+	for_each_file(td, f, i) {
+		dprint(FD_FILE, "setup directory %s\n", f->file_name);
+		ret = fio_mkdir(f->file_name, 0700);
+		if ((ret && errno != EEXIST)) {
+			log_err("create directory %s failed with %d\n",
+				f->file_name, errno);
+			break;
+		}
+		ret = 0;
+	}
+	return ret;
+}
 
 static int open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct timespec start;
 	int do_lat = !td->o.disable_lat;
+	struct fc_data *fcd = td->io_ops_data;
 
 	dprint(FD_FILE, "fd open %s\n", f->file_name);
 
@@ -81,7 +106,14 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 	if (do_lat)
 		fio_gettime(&start, NULL);
 
-	f->fd = open(f->file_name, O_CREAT|O_RDWR, 0600);
+	if (fcd->op_engine == FILE_OP_ENGINE)
+		f->fd = open(f->file_name, O_CREAT|O_RDWR, 0600);
+	else if (fcd->op_engine == DIR_OP_ENGINE)
+		f->fd = fio_mkdir(f->file_name, S_IFDIR);
+	else {
+		log_err("fio: unknown file/directory operation engine\n");
+		return 1;
+	}
 
 	if (f->fd == -1) {
 		char buf[FIO_VERROR_SIZE];
@@ -174,11 +206,11 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-
 static int delete_file(struct thread_data *td, struct fio_file *f)
 {
 	struct timespec start;
 	int do_lat = !td->o.disable_lat;
+	struct fc_data *fcd = td->io_ops_data;
 	int ret;
 
 	dprint(FD_FILE, "fd delete %s\n", f->file_name);
@@ -195,7 +227,14 @@ static int delete_file(struct thread_data *td, struct fio_file *f)
 	if (do_lat)
 		fio_gettime(&start, NULL);
 
-	ret = unlink(f->file_name);
+	if (fcd->op_engine == FILE_OP_ENGINE)
+		ret = unlink(f->file_name);
+	else if (fcd->op_engine == DIR_OP_ENGINE)
+		ret = rmdir(f->file_name);
+	else {
+		log_err("fio: unknown file/directory operation engine\n");
+		return 1;
+	}
 
 	if (ret == -1) {
 		char buf[FIO_VERROR_SIZE];
@@ -250,6 +289,17 @@ static int init(struct thread_data *td)
 	else if (td_write(td))
 		data->stat_ddir = DDIR_WRITE;
 
+	data->op_engine = UNKNOWN_OP_ENGINE;
+
+	if (!strncmp(td->o.ioengine, "file", 4)) {
+		data->op_engine = FILE_OP_ENGINE;
+		dprint(FD_FILE, "Operate engine type: file\n");
+	}
+	if (!strncmp(td->o.ioengine, "dir", 3)) {
+		data->op_engine = DIR_OP_ENGINE;
+		dprint(FD_FILE, "Operate engine type: directory\n");
+	}
+
 	td->io_ops_data = data;
 	return 0;
 }
@@ -261,6 +311,12 @@ static void cleanup(struct thread_data *td)
 	free(data);
 }
 
+static int remove_dir(struct thread_data *td, struct fio_file *f)
+{
+	dprint(FD_FILE, "remove directory %s\n", f->file_name);
+	return rmdir(f->file_name);
+}
+
 static struct ioengine_ops ioengine_filecreate = {
 	.name		= "filecreate",
 	.version	= FIO_IOOPS_VERSION,
@@ -302,12 +358,60 @@ static struct ioengine_ops ioengine_filedelete = {
 				FIO_NOSTATS | FIO_NOFILEHASH,
 };
 
+static struct ioengine_ops ioengine_dircreate = {
+	.name		= "dircreate",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.get_file_size	= get_file_size,
+	.open_file	= open_file,
+	.close_file	= generic_close_file,
+	.unlink_file    = remove_dir,
+	.flags		= FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+static struct ioengine_ops ioengine_dirstat = {
+	.name		= "dirstat",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= setup_dirs,
+	.init		= init,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.invalidate	= invalidate_do_nothing,
+	.get_file_size	= generic_get_file_size,
+	.open_file	= stat_file,
+	.unlink_file	= remove_dir,
+	.flags		=  FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+	.options	= options,
+	.option_struct_size = sizeof(struct filestat_options),
+};
+
+static struct ioengine_ops ioengine_dirdelete = {
+	.name		= "dirdelete",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= setup_dirs,
+	.init		= init,
+	.invalidate	= invalidate_do_nothing,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.get_file_size	= get_file_size,
+	.open_file	= delete_file,
+	.unlink_file	= remove_dir,
+	.flags		= FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
 
 static void fio_init fio_fileoperations_register(void)
 {
 	register_ioengine(&ioengine_filecreate);
 	register_ioengine(&ioengine_filestat);
 	register_ioengine(&ioengine_filedelete);
+	register_ioengine(&ioengine_dircreate);
+	register_ioengine(&ioengine_dirstat);
+	register_ioengine(&ioengine_dirdelete);
 }
 
 static void fio_exit fio_fileoperations_unregister(void)
@@ -315,4 +419,7 @@ static void fio_exit fio_fileoperations_unregister(void)
 	unregister_ioengine(&ioengine_filecreate);
 	unregister_ioengine(&ioengine_filestat);
 	unregister_ioengine(&ioengine_filedelete);
+	unregister_ioengine(&ioengine_dircreate);
+	unregister_ioengine(&ioengine_dirstat);
+	unregister_ioengine(&ioengine_dirdelete);
 }
diff --git a/examples/dircreate-ioengine.fio b/examples/dircreate-ioengine.fio
new file mode 100644
index 00000000..c89d9e4d
--- /dev/null
+++ b/examples/dircreate-ioengine.fio
@@ -0,0 +1,25 @@
+# Example dircreate job
+#
+# create_on_open is needed so that the open happens during the run and not the
+# setup.
+#
+# openfiles needs to be set so that you do not exceed the maximum allowed open
+# files.
+#
+# filesize needs to be set to a non-zero value so fio will actually run, but the
+# IO will not really be done and the write latency numbers will only reflect the
+# open times.
+[global]
+create_on_open=1
+nrfiles=30
+ioengine=dircreate
+fallocate=none
+filesize=4k
+openfiles=1
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
diff --git a/examples/dircreate-ioengine.png b/examples/dircreate-ioengine.png
new file mode 100644
index 00000000..da1a8c40
Binary files /dev/null and b/examples/dircreate-ioengine.png differ
diff --git a/examples/dirdelete-ioengine.fio b/examples/dirdelete-ioengine.fio
new file mode 100644
index 00000000..4e5b1e2c
--- /dev/null
+++ b/examples/dirdelete-ioengine.fio
@@ -0,0 +1,18 @@
+# Example dirdelete job
+
+# The 'dirdelete' engine only does 'rmdir(dirname)'.
+# 'filesize' must be set so that the directories are created at the setup stage.
+# 'unlink' is best set to 0, since the directories are deleted during measurement.
+# Options that disable completion latency output, such as 'disable_clat' and 'gtod_reduce', must not be set.
+[global]
+ioengine=dirdelete
+filesize=4k
+nrfiles=200
+unlink=0
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
diff --git a/examples/dirdelete-ioengine.png b/examples/dirdelete-ioengine.png
new file mode 100644
index 00000000..af246195
Binary files /dev/null and b/examples/dirdelete-ioengine.png differ
diff --git a/examples/dirstat-ioengine.fio b/examples/dirstat-ioengine.fio
new file mode 100644
index 00000000..1322dd28
--- /dev/null
+++ b/examples/dirstat-ioengine.fio
@@ -0,0 +1,18 @@
+# Example dirstat job
+
+# The 'dirstat' engine only does 'stat(dirname)'; directories are not open()ed.
+# 'filesize' must be set so that the directories are created at the setup stage.
+
+[global]
+ioengine=dirstat
+numjobs=10
+filesize=4k
+nrfiles=5
+thread
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
diff --git a/examples/dirstat-ioengine.png b/examples/dirstat-ioengine.png
new file mode 100644
index 00000000..14b948ba
Binary files /dev/null and b/examples/dirstat-ioengine.png differ
diff --git a/fio.1 b/fio.1
index 09c6b621..63375c62 100644
--- a/fio.1
+++ b/fio.1
@@ -2004,6 +2004,21 @@ Simply delete files by unlink() and do no I/O to the file. You need to set 'file
 and 'nrfiles', so that files will be created.
 This engine is to measure file delete.
 .TP
+.B dircreate
+Simply create the directories and do no I/O to them.  You still need to set
+\fBfilesize\fR so that all the accounting still occurs, but no actual I/O will be
+done other than creating the directories.
+.TP
+.B dirstat
+Simply do stat() and do no I/O to the directory. You need to set 'filesize'
+and 'nrfiles', so that directories will be created.
+This engine is to measure directory lookup and meta data access.
+.TP
+.B dirdelete
+Simply delete directories by rmdir() and do no I/O to the directory. You need to set 'filesize'
+and 'nrfiles', so that directories will be created.
+This engine is to measure directory delete.
+.TP
 .B libpmem
 Read and write using mmap I/O to a file on a filesystem
 mounted with DAX on a persistent memory device through the PMDK
diff --git a/iolog.c b/iolog.c
index 251e9d7f..96af4f33 100644
--- a/iolog.c
+++ b/iolog.c
@@ -814,6 +814,8 @@ bool init_iolog(struct thread_data *td)
 	if (!ret)
 		td_verror(td, EINVAL, "failed initializing iolog");
 
+	init_disk_util(td);
+
 	return ret;
 }
 

* Recent changes (master)
@ 2024-03-22 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-22 12:00 UTC
  To: fio

The following changes since commit 140c58beeee44a10358a817c7699b66c5c7290f9:

  test: add the test for regrow logs with asynchronous I/O replay (2024-03-21 05:57:54 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 20f42c101f7876648705a4fb8a9e2a647dc936ce:

  t/run-fio-tests: restrict t0031 to Linux only (2024-03-21 08:36:14 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      t/run-fio-tests: restrict t0031 to Linux only

 t/run-fio-tests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 1b884d87..22580613 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -876,7 +876,7 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'pre_job':          't0031-pre.fio',
         'pre_success':      SUCCESS_DEFAULT,
-        'requirements':     [],
+        'requirements':     [Requirements.linux, Requirements.libaio],
     },
     {
         'test_id':          1000,

* Recent changes (master)
@ 2024-03-21 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-21 12:00 UTC
  To: fio

The following changes since commit 7d6c99e917f7d68ffebbd1750802f7aed9c3d461:

  docs: fix documentation for rate_cycle (2024-03-18 14:51:10 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 140c58beeee44a10358a817c7699b66c5c7290f9:

  test: add the test for regrow logs with asynchronous I/O replay (2024-03-21 05:57:54 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      iolog: regrow logs in iolog_delay()
      test: add the test for regrow logs with asynchronous I/O replay

Vincent Fu (2):
      t/fiotestlib: pass command-line options to FioJobFileTest
      t/jobs/t0030: add test for --bandwidth-log option

 iolog.c              |  2 ++
 t/fiotestlib.py      |  8 +++++---
 t/jobs/t0030.fio     | 10 ++++++++++
 t/jobs/t0031-pre.fio |  8 ++++++++
 t/jobs/t0031.fio     |  7 +++++++
 t/run-fio-tests.py   | 19 +++++++++++++++++++
 6 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 t/jobs/t0030.fio
 create mode 100644 t/jobs/t0031-pre.fio
 create mode 100644 t/jobs/t0031.fio

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index f52a9a80..251e9d7f 100644
--- a/iolog.c
+++ b/iolog.c
@@ -102,6 +102,8 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 		ret = io_u_queued_complete(td, 0);
 		if (ret < 0)
 			td_verror(td, -ret, "io_u_queued_complete");
+		if (td->flags & TD_F_REGROW_LOGS)
+			regrow_logs(td);
 		if (utime_since_now(&ts) > delay)
 			break;
 	}
diff --git a/t/fiotestlib.py b/t/fiotestlib.py
index a96338a3..466e482d 100755
--- a/t/fiotestlib.py
+++ b/t/fiotestlib.py
@@ -175,7 +175,7 @@ class FioJobFileTest(FioExeTest):
 
         super().__init__(fio_path, success, testnum, artifact_root)
 
-    def setup(self, parameters=None):
+    def setup(self, parameters):
         """Setup instance variables for fio job test."""
 
         self.filenames['fio_output'] = f"{os.path.basename(self.fio_job)}.output"
@@ -185,6 +185,8 @@ class FioJobFileTest(FioExeTest):
             f"--output={self.filenames['fio_output']}",
             self.fio_job,
             ]
+        if parameters:
+            fio_args += parameters
 
         super().setup(fio_args)
 
@@ -206,7 +208,7 @@ class FioJobFileTest(FioExeTest):
                             self.testnum,
                             self.paths['artifacts'],
                             output_format=self.output_format)
-        precon.setup()
+        precon.setup(None)
         precon.run()
         precon.check_result()
         self.precon_failed = not precon.passed
@@ -412,7 +414,7 @@ def run_fio_tests(test_list, test_env, args):
                 fio_pre_success=fio_pre_success,
                 output_format=output_format)
             desc = config['job']
-            parameters = []
+            parameters = config['parameters'] if 'parameters' in config else None
         elif issubclass(config['test_class'], FioJobCmdTest):
             if not 'success' in config:
                 config['success'] = SUCCESS_DEFAULT
diff --git a/t/jobs/t0030.fio b/t/jobs/t0030.fio
new file mode 100644
index 00000000..8bbc810e
--- /dev/null
+++ b/t/jobs/t0030.fio
@@ -0,0 +1,10 @@
+# run with --bandwidth-log
+# broken behavior: seg fault
+# successful behavior: test runs to completion with 0 as the exit code
+
+[test]
+ioengine=null
+filesize=1T
+rw=read
+time_based
+runtime=2s
diff --git a/t/jobs/t0031-pre.fio b/t/jobs/t0031-pre.fio
new file mode 100644
index 00000000..ce4ee3b6
--- /dev/null
+++ b/t/jobs/t0031-pre.fio
@@ -0,0 +1,8 @@
+[job]
+rw=write
+ioengine=libaio
+size=1mb
+time_based=1
+runtime=1
+filename=t0030file
+write_iolog=iolog
diff --git a/t/jobs/t0031.fio b/t/jobs/t0031.fio
new file mode 100644
index 00000000..ae8f7442
--- /dev/null
+++ b/t/jobs/t0031.fio
@@ -0,0 +1,7 @@
+[job]
+rw=read
+ioengine=libaio
+iodepth=128
+filename=t0030file
+read_iolog=iolog
+write_lat_log=lat_log
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 08134e50..1b884d87 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -859,6 +859,25 @@ TEST_LIST = [
         'output_format':    'json',
         'requirements':     [],
     },
+    {
+        'test_id':          30,
+        'test_class':       FioJobFileTest,
+        'job':              't0030.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'parameters':       ['--bandwidth-log'],
+        'requirements':     [],
+    },
+    {
+        'test_id':          31,
+        'test_class':       FioJobFileTest,
+        'job':              't0031.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          't0031-pre.fio',
+        'pre_success':      SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
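
The harness change above lets a `TEST_LIST` entry carry an optional `parameters` list, which is appended to the fio command line. Test 30 uses it for `--bandwidth-log`. The Python sketch below shows that plumbing on its own. `build_fio_args` is a hypothetical helper, not a function in `t/fiotestlib.py`, and it reproduces only the arguments visible in the diff. `config.get('parameters')` is equivalent to the conditional expression in the patch.

```python
import os


def build_fio_args(fio_job, config):
    """Hypothetical helper mirroring the patched FioJobFileTest.setup() logic."""
    # Equivalent to: config['parameters'] if 'parameters' in config else None
    parameters = config.get('parameters')

    # Only the arguments visible in the diff are reproduced; the real
    # setup() may add others before --output.
    fio_args = [
        f"--output={os.path.basename(fio_job)}.output",
        fio_job,
    ]
    # Extra command-line parameters are appended after the job file.
    if parameters:
        fio_args += parameters
    return fio_args


print(build_fio_args('t0030.fio', {'parameters': ['--bandwidth-log']}))
print(build_fio_args('t0031.fio', {}))
```

A pre-job is now set up with `precon.setup(None)`. In this model that corresponds to a config with no `parameters` key, so no extra arguments are added.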

* Recent changes (master)
@ 2024-03-19 12:00 Jens Axboe
From: Jens Axboe @ 2024-03-19 12:00 UTC
  To: fio

The following changes since commit b140fc5e484638a467480e369485b91290288d58:

  t/nvmept_pi: add support for xNVMe ioengine (2024-03-07 19:36:30 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7d6c99e917f7d68ffebbd1750802f7aed9c3d461:

  docs: fix documentation for rate_cycle (2024-03-18 14:51:10 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      docs: fix documentation for rate_cycle

 HOWTO.rst | 4 ++--
 fio.1     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 2386d806..4c8ac331 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3323,8 +3323,8 @@ I/O rate
 
 .. option:: rate_cycle=int
 
-	Average bandwidth for :option:`rate` and :option:`rate_min` over this number
-	of milliseconds. Defaults to 1000.
+        Average bandwidth for :option:`rate_min` and :option:`rate_iops_min`
+        over this number of milliseconds. Defaults to 1000.
 
 
 I/O latency
diff --git a/fio.1 b/fio.1
index d955385d..09c6b621 100644
--- a/fio.1
+++ b/fio.1
@@ -3064,7 +3064,7 @@ ignore the thinktime and continue doing IO at the specified rate, instead of
 entering a catch-up mode after thinktime is done.
 .TP
 .BI rate_cycle \fR=\fPint
-Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number
+Average bandwidth for \fBrate_min\fR and \fBrate_iops_min\fR over this number
 of milliseconds. Defaults to 1000.
 .SS "I/O latency"
 .TP
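
After this fix, `rate_cycle` is documented as the averaging window for the minimum-rate checks `rate_min` and `rate_iops_min`. It no longer applies to `rate`. The Python sketch below is a hypothetical model of averaging over such a window. It is not fio's rate-checking code. `below_min_rate` and its sample format are invented for illustration, and an IOPS variant would count completions instead of bytes.

```python
def below_min_rate(samples, rate_cycle_ms, rate_min_bps):
    """Hypothetical check: is average bandwidth over the last rate_cycle_ms
    milliseconds below rate_min_bps (bytes per second)?

    samples is a list of (timestamp_ms, bytes_completed_since_last_sample).
    """
    if not samples:
        return False
    window_start = samples[-1][0] - rate_cycle_ms
    # Only completions that fall inside the averaging window count.
    window_bytes = sum(nbytes for ts, nbytes in samples if ts > window_start)
    avg_bps = window_bytes * 1000 / rate_cycle_ms
    return avg_bps < rate_min_bps


# 256 KiB completed every 250 ms is 1 MiB/s averaged over a 1000 ms cycle.
samples = [(ts, 256 * 1024) for ts in range(250, 1001, 250)]
print(below_min_rate(samples, 1000, 1024 * 1024))      # False: exactly at the minimum
print(below_min_rate(samples, 1000, 2 * 1024 * 1024))  # True: below 2 MiB/s
```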

* Recent changes (master)
@ 2024-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2024-03-08 13:00 UTC
  To: fio

The following changes since commit 9b699fb150bbed56939d317ffc004b3bf19f098f:

  Doc: Make note of using bsrange with ':' (2024-03-05 10:54:36 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b140fc5e484638a467480e369485b91290288d58:

  t/nvmept_pi: add support for xNVMe ioengine (2024-03-07 19:36:30 +0000)

----------------------------------------------------------------
Vincent Fu (6):
      examples: update plot for cmdprio-bssplit
      examples: add plots for xNVMe examples
      examples: add fiograph plot for uring-cmd-trim-multi-range
      examples: add fiograph plots for netio_vsock examples
      t/nvmept_pi: drop JSON output for error cases
      t/nvmept_pi: add support for xNVMe ioengine

 examples/cmdprio-bssplit.png            | Bin 45606 -> 134359 bytes
 examples/netio_vsock.png                | Bin 0 -> 46920 bytes
 examples/netio_vsock_receiver.png       | Bin 0 -> 34879 bytes
 examples/netio_vsock_sender.png         | Bin 0 -> 35539 bytes
 examples/uring-cmd-trim-multi-range.png | Bin 0 -> 67853 bytes
 examples/xnvme-fdp.png                  | Bin 0 -> 53284 bytes
 examples/xnvme-pi.png                   | Bin 0 -> 76492 bytes
 t/nvmept_pi.py                          |  18 +++++++++++-------
 8 files changed, 11 insertions(+), 7 deletions(-)
 create mode 100644 examples/netio_vsock.png
 create mode 100644 examples/netio_vsock_receiver.png
 create mode 100644 examples/netio_vsock_sender.png
 create mode 100644 examples/uring-cmd-trim-multi-range.png
 create mode 100644 examples/xnvme-fdp.png
 create mode 100644 examples/xnvme-pi.png

---

Diff of recent changes:

diff --git a/examples/cmdprio-bssplit.png b/examples/cmdprio-bssplit.png
index a0bb3ff4..83a5570b 100644
Binary files a/examples/cmdprio-bssplit.png and b/examples/cmdprio-bssplit.png differ
diff --git a/examples/netio_vsock.png b/examples/netio_vsock.png
new file mode 100644
index 00000000..01aadde5
Binary files /dev/null and b/examples/netio_vsock.png differ
diff --git a/examples/netio_vsock_receiver.png b/examples/netio_vsock_receiver.png
new file mode 100644
index 00000000..524a7a1c
Binary files /dev/null and b/examples/netio_vsock_receiver.png differ
diff --git a/examples/netio_vsock_sender.png b/examples/netio_vsock_sender.png
new file mode 100644
index 00000000..75802aaf
Binary files /dev/null and b/examples/netio_vsock_sender.png differ
diff --git a/examples/uring-cmd-trim-multi-range.png b/examples/uring-cmd-trim-multi-range.png
new file mode 100644
index 00000000..c3ffd546
Binary files /dev/null and b/examples/uring-cmd-trim-multi-range.png differ
diff --git a/examples/xnvme-fdp.png b/examples/xnvme-fdp.png
new file mode 100644
index 00000000..7f802741
Binary files /dev/null and b/examples/xnvme-fdp.png differ
diff --git a/examples/xnvme-pi.png b/examples/xnvme-pi.png
new file mode 100644
index 00000000..def7e680
Binary files /dev/null and b/examples/xnvme-pi.png differ
diff --git a/t/nvmept_pi.py b/t/nvmept_pi.py
index 5de77c9d..df7c0b9f 100755
--- a/t/nvmept_pi.py
+++ b/t/nvmept_pi.py
@@ -43,13 +43,11 @@ class DifDixTest(FioJobCmdTest):
 
         fio_args = [
             "--name=nvmept_pi",
-            "--ioengine=io_uring_cmd",
-            "--cmd_type=nvme",
+            f"--ioengine={self.fio_opts['ioengine']}",
             f"--filename={self.fio_opts['filename']}",
             f"--rw={self.fio_opts['rw']}",
             f"--bsrange={self.fio_opts['bsrange']}",
             f"--output={self.filenames['output']}",
-            f"--output-format={self.fio_opts['output-format']}",
             f"--md_per_io_size={self.fio_opts['md_per_io_size']}",
             f"--pi_act={self.fio_opts['pi_act']}",
             f"--pi_chk={self.fio_opts['pi_chk']}",
@@ -58,11 +56,18 @@ class DifDixTest(FioJobCmdTest):
         ]
         for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
                     'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
-                    'time_based', 'runtime', 'verify', 'io_size', 'offset', 'number_ios']:
+                    'time_based', 'runtime', 'verify', 'io_size', 'offset', 'number_ios',
+                    'output-format']:
             if opt in self.fio_opts:
                 option = f"--{opt}={self.fio_opts[opt]}"
                 fio_args.append(option)
 
+        if self.fio_opts['ioengine'] == 'io_uring_cmd':
+            fio_args.append('--cmd_type=nvme')
+        elif self.fio_opts['ioengine'] == 'xnvme':
+            fio_args.append('--thread=1')
+            fio_args.append('--xnvme_async=io_uring_cmd')
+
         super().setup(fio_args)
 
 
@@ -622,7 +627,6 @@ TEST_LIST = [
         "fio_opts": {
             "rw": 'read',
             "number_ios": NUMBER_IOS,
-            "output-format": "json",
             "pi_act": 0,
             "apptag": "0x8888",
             "apptag_mask": "0x0FFF",
@@ -639,7 +643,6 @@ TEST_LIST = [
         "fio_opts": {
             "rw": 'read',
             "number_ios": NUMBER_IOS,
-            "output-format": "json",
             "pi_act": 0,
             "apptag": "0x8888",
             "apptag_mask": "0x0FFF",
@@ -660,7 +663,6 @@ TEST_LIST = [
         "fio_opts": {
             "rw": 'read',
             "number_ios": NUMBER_IOS,
-            "output-format": "json",
             "pi_act": 0,
             "apptag": "0x8888",
             "apptag_mask": "0x0FFF",
@@ -689,6 +691,7 @@ def parse_args():
                         '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
     parser.add_argument('-l', '--lbaf', nargs='+', type=int,
                         help='list of lba formats to test')
+    parser.add_argument('-i', '--ioengine', default='io_uring_cmd')
     args = parser.parse_args()
 
     return args
@@ -909,6 +912,7 @@ def main():
 
     for test in TEST_LIST:
         test['fio_opts']['filename'] = args.dut
+        test['fio_opts']['ioengine'] = args.ioengine
 
     test_env = {
               'fio_path': fio_path,
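
With this change, `t/nvmept_pi.py` takes the ioengine from a new `-i/--ioengine` option, which defaults to `io_uring_cmd`. It passes `output-format` only when a test sets it, and it appends engine-specific arguments. The Python sketch below restates that argument assembly. `build_args` is a hypothetical stand-in for `DifDixTest.setup()` and covers only a subset of its options.

```python
def build_args(fio_opts):
    """Hypothetical, reduced version of DifDixTest.setup() argument building."""
    args = [
        "--name=nvmept_pi",
        f"--ioengine={fio_opts['ioengine']}",
        f"--filename={fio_opts['filename']}",
    ]
    # Optional options (now including output-format) are passed only if set.
    for opt in ['number_ios', 'output-format']:
        if opt in fio_opts:
            args.append(f"--{opt}={fio_opts[opt]}")
    # Engine-specific arguments, as added by the patch.
    if fio_opts['ioengine'] == 'io_uring_cmd':
        args.append('--cmd_type=nvme')
    elif fio_opts['ioengine'] == 'xnvme':
        args.append('--thread=1')
        args.append('--xnvme_async=io_uring_cmd')
    return args


print(build_args({'ioengine': 'xnvme', 'filename': '/dev/ng0n1', 'number_ios': 8}))
```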

* Recent changes (master)
@ 2024-03-06 13:00 Jens Axboe
From: Jens Axboe @ 2024-03-06 13:00 UTC
  To: fio

The following changes since commit 2dddfd35396e2f7a1bb06cc7c92aa1e283be084e:

  Merge branch 'patch-ioengines' of https://github.com/kcoms555/fio (2024-03-04 07:31:47 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9b699fb150bbed56939d317ffc004b3bf19f098f:

  Doc: Make note of using bsrange with ':' (2024-03-05 10:54:36 -0500)

----------------------------------------------------------------
Avri Altman (5):
      fio: Some minor code cleanups
      t/jobs: Further clarify regression test 7
      t/jobs: Rename test job 15
      t/jobs: Fix a typo in jobs 23 & 24
      Doc: Make note of using bsrange with ':'

 HOWTO.rst                                         | 4 ++--
 backend.c                                         | 4 ++--
 eta.c                                             | 8 +++++---
 fio.1                                             | 2 +-
 io_u.c                                            | 3 ++-
 parse.h                                           | 2 +-
 stat.h                                            | 1 -
 t/jobs/t0007-37cf9e3c.fio                         | 5 ++++-
 t/jobs/{t0015-e78980ff.fio => t0015-4e7e7898.fio} | 0
 t/jobs/t0023.fio                                  | 4 ++--
 t/jobs/t0024.fio                                  | 2 +-
 t/run-fio-tests.py                                | 2 +-
 12 files changed, 21 insertions(+), 16 deletions(-)
 rename t/jobs/{t0015-e78980ff.fio => t0015-4e7e7898.fio} (100%)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 169cdc2a..2386d806 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1631,7 +1631,7 @@ Block size
 	Comma-separated ranges may be specified for reads, writes, and trims as
 	described in :option:`blocksize`.
 
-	Example: ``bsrange=1k-4k,2k-8k``.
+	Example: ``bsrange=1k-4k,2k-8k`` also the ':' delimiter ``bsrange=1k:4k,2k:8k``.
 
 .. option:: bssplit=str[,str][,str]
 
@@ -2786,7 +2786,7 @@ with the caveat that when used on the command line, they must come after the
 
 .. option:: sg_write_mode=str : [sg]
 
-	Specify the type of write commands to issue. This option can take three values:
+	Specify the type of write commands to issue. This option can take ten values:
 
 	**write**
 		This is the default where write opcodes are issued as usual.
diff --git a/backend.c b/backend.c
index 2f2221bf..fb7dc68a 100644
--- a/backend.c
+++ b/backend.c
@@ -2094,14 +2094,14 @@ static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
 			 uint64_t *m_rate)
 {
 	unsigned int cputhreads, realthreads, pending;
-	int status, ret;
+	int ret;
 
 	/*
 	 * reap exited threads (TD_EXITED -> TD_REAPED)
 	 */
 	realthreads = pending = cputhreads = 0;
 	for_each_td(td) {
-		int flags = 0;
+		int flags = 0, status;
 
 		if (!strcmp(td->o.ioengine, "cpuio"))
 			cputhreads++;
diff --git a/eta.c b/eta.c
index cc342461..7d07708f 100644
--- a/eta.c
+++ b/eta.c
@@ -215,8 +215,9 @@ static unsigned long thread_eta(struct thread_data *td)
 				perc = td->o.rwmix[DDIR_WRITE];
 
 			bytes_total += (bytes_total * perc) / 100;
-		} else
+		} else {
 			bytes_total <<= 1;
+		}
 	}
 
 	if (td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING) {
@@ -228,8 +229,9 @@ static unsigned long thread_eta(struct thread_data *td)
 			perc = (double) bytes_done / (double) bytes_total;
 			if (perc > 1.0)
 				perc = 1.0;
-		} else
+		} else {
 			perc = 0.0;
+		}
 
 		if (td->o.time_based) {
 			if (timeout) {
@@ -395,7 +397,7 @@ static bool skip_eta()
  * Print status of the jobs we know about. This includes rate estimates,
  * ETA, thread state, etc.
  */
-bool calc_thread_status(struct jobs_eta *je, int force)
+static bool calc_thread_status(struct jobs_eta *je, int force)
 {
 	int unified_rw_rep;
 	bool any_td_in_ramp;
diff --git a/fio.1 b/fio.1
index e6b291a7..d955385d 100644
--- a/fio.1
+++ b/fio.1
@@ -1434,7 +1434,7 @@ described in \fBblocksize\fR. Example:
 .RS
 .RS
 .P
-bsrange=1k\-4k,2k\-8k
+bsrange=1k\-4k,2k\-8k or bsrange=1k:4k,2k:8k
 .RE
 .RE
 .TP
diff --git a/io_u.c b/io_u.c
index 2b8e17f8..09e5f15a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1895,8 +1895,9 @@ struct io_u *get_io_u(struct thread_data *td)
 					io_u->buflen);
 			} else if ((td->flags & TD_F_SCRAMBLE_BUFFERS) &&
 				   !(td->flags & TD_F_COMPRESS) &&
-				   !(td->flags & TD_F_DO_VERIFY))
+				   !(td->flags & TD_F_DO_VERIFY)) {
 				do_scramble = 1;
+			}
 		} else if (io_u->ddir == DDIR_READ) {
 			/*
 			 * Reset the buf_filled parameters so next time if the
diff --git a/parse.h b/parse.h
index d68484ea..806a76ee 100644
--- a/parse.h
+++ b/parse.h
@@ -32,7 +32,7 @@ enum fio_opt_type {
  */
 struct value_pair {
 	const char *ival;		/* string option */
-	unsigned long long oval;/* output value */
+	unsigned long long oval;	/* output value */
 	const char *help;		/* help text for sub option */
 	int orval;			/* OR value */
 	void *cb;			/* sub-option callback */
diff --git a/stat.h b/stat.h
index bd986d4e..0d57cceb 100644
--- a/stat.h
+++ b/stat.h
@@ -345,7 +345,6 @@ extern void stat_exit(void);
 
 extern struct json_object * show_thread_status(struct thread_stat *ts, struct group_run_stats *rs, struct flist_head *, struct buf_output *);
 extern void show_group_stats(struct group_run_stats *rs, struct buf_output *);
-extern bool calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void __show_run_stats(void);
 extern int __show_running_run_stats(void);
diff --git a/t/jobs/t0007-37cf9e3c.fio b/t/jobs/t0007-37cf9e3c.fio
index d3c98751..b2592694 100644
--- a/t/jobs/t0007-37cf9e3c.fio
+++ b/t/jobs/t0007-37cf9e3c.fio
@@ -1,4 +1,7 @@
-# Expected result: fio reads 87040KB of data
+# Expected result: fio reads 87040KB of data:
+# first read is at offset 0, then 2nd read is at offset 1.5m, then the 3rd
+# read is at offset 3m, and after the last read at offset 127m - we have only
+# read 87,040K data.
 # Buggy result: fio reads the full 128MB of data
 [foo]
 size=128mb
diff --git a/t/jobs/t0015-e78980ff.fio b/t/jobs/t0015-4e7e7898.fio
similarity index 100%
rename from t/jobs/t0015-e78980ff.fio
rename to t/jobs/t0015-4e7e7898.fio
diff --git a/t/jobs/t0023.fio b/t/jobs/t0023.fio
index 4f0bef89..8e14a110 100644
--- a/t/jobs/t0023.fio
+++ b/t/jobs/t0023.fio
@@ -33,7 +33,7 @@ bsrange=512-4k
 # 			block sizes match
 # Buggy result: 	something else
 [bssplit]
-bsrange=512/25:1k:25:2k:25:4k/25
+bssplit=512/25:1k/:2k/:4k/
 
 # Expected result: 	trim issued to random offset followed by write to same offset
 # 			block sizes match
@@ -59,5 +59,5 @@ norandommap=1
 # 			block sizes match
 # Buggy result: 	something else
 [bssplit_no_rm]
-bsrange=512/25:1k:25:2k:25:4k/25
+bssplit=512/25:1k/:2k/:4k/
 norandommap=1
diff --git a/t/jobs/t0024.fio b/t/jobs/t0024.fio
index 393a2b70..2b3dc94c 100644
--- a/t/jobs/t0024.fio
+++ b/t/jobs/t0024.fio
@@ -33,4 +33,4 @@ bsrange=512-4k
 # 			block sizes match
 # Buggy result: 	something else
 [bssplit]
-bsrange=512/25:1k:25:2k:25:4k/25
+bssplit=512/25:1k/:2k/:4k/
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index d4742e96..08134e50 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -722,7 +722,7 @@ TEST_LIST = [
     {
         'test_id':          15,
         'test_class':       FioJobFileTest_t0015,
-        'job':              't0015-e78980ff.fio',
+        'job':              't0015-4e7e7898.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
         'pre_success':      None,
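
The documentation change notes that `bsrange` accepts ':' as well as '-' between the minimum and maximum sizes. The t0023/t0024 fixes replace a malformed `bsrange` with a `bssplit` whose blank percentages are filled in evenly, as the fio HOWTO describes for `bssplit`. The simplified Python sketch below handles both formats for a single data direction. It is not fio's C parser, which accepts many more suffixes. It assumes `kb_base=1024`, and every `bssplit` entry must contain a '/'.

```python
import re

UNITS = {'': 1, 'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}


def to_bytes(size):
    """Convert '512', '1k', '4k', ... to bytes (assumes kb_base=1024)."""
    m = re.fullmatch(r'(\d+)([kmg]?)', size.strip().lower())
    if m is None:
        raise ValueError(f"bad size: {size!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def parse_bsrange(value):
    """'1k-4k,2k-8k' or '1k:4k,2k:8k' -> [(min, max), ...] per data direction."""
    ranges = []
    for part in value.split(','):
        lo, hi = re.split(r'[-:]', part, maxsplit=1)
        ranges.append((to_bytes(lo), to_bytes(hi)))
    return ranges


def parse_bssplit(value):
    """'512/25:1k/:2k/:4k/' -> {bytes: percent}; blanks share the remainder."""
    entries = [e.split('/') for e in value.split(':')]
    given = sum(int(p) for _, p in entries if p)
    blanks = sum(1 for _, p in entries if not p)
    share = (100 - given) / blanks if blanks else 0
    return {to_bytes(bs): (int(p) if p else share) for bs, p in entries}


print(parse_bsrange('1k:4k,2k:8k'))         # [(1024, 4096), (2048, 8192)]
print(parse_bssplit('512/25:1k/:2k/:4k/'))  # each size gets 25 percent
```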

* Recent changes (master)
@ 2024-03-05 13:00 Jens Axboe
From: Jens Axboe @ 2024-03-05 13:00 UTC
  To: fio

The following changes since commit 5ae4f4220a48dddddc84c8b839ef9d8a1ed4edb1:

  gettime: fix cpuclock-test on AMD platforms (2024-02-27 12:36:45 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2dddfd35396e2f7a1bb06cc7c92aa1e283be084e:

  Merge branch 'patch-ioengines' of https://github.com/kcoms555/fio (2024-03-04 07:31:47 -0700)

----------------------------------------------------------------
Jaeho (1):
      ioengines: Make td_io_queue print log_err when got error

Jens Axboe (1):
      Merge branch 'patch-ioengines' of https://github.com/kcoms555/fio

 ioengines.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/ioengines.c b/ioengines.c
index 87cc2286..5dd4355d 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -413,7 +413,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (io_u->error == EINVAL && td->io_issues[io_u->ddir & 1] == 1 &&
 	    td->o.odirect) {
 
-		log_info("fio: first direct IO errored. File system may not "
+		log_err("fio: first direct IO errored. File system may not "
 			 "support direct IO, or iomem_align= is bad, or "
 			 "invalid block size. Try setting direct=0.\n");
 	}
@@ -421,7 +421,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (zbd_unaligned_write(io_u->error) &&
 	    td->io_issues[io_u->ddir & 1] == 1 &&
 	    td->o.zone_mode != ZONE_MODE_ZBD) {
-		log_info("fio: first I/O failed. If %s is a zoned block device, consider --zonemode=zbd\n",
+		log_err("fio: first I/O failed. If %s is a zoned block device, consider --zonemode=zbd\n",
 			 io_u->file->file_name);
 	}
 

* Recent changes (master)
@ 2024-02-28 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-28 13:00 UTC
  To: fio

The following changes since commit 9af4af7ab40c4e505033d0e077cc42ac84996b09:

  ci: fix macOS sphinx install issues (2024-02-22 20:01:27 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5ae4f4220a48dddddc84c8b839ef9d8a1ed4edb1:

  gettime: fix cpuclock-test on AMD platforms (2024-02-27 12:36:45 -0500)

----------------------------------------------------------------
Vincent Fu (2):
      howto: fix job_start_clock_id formatting
      gettime: fix cpuclock-test on AMD platforms

 HOWTO.rst          | 5 +++--
 arch/arch-x86_64.h | 5 +++++
 arch/arch.h        | 7 +++++++
 gettime.c          | 2 +-
 4 files changed, 16 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 4b02100c..169cdc2a 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -756,8 +756,9 @@ Time related parameters
 	CPU mask of other jobs.
 
 .. option:: job_start_clock_id=int
-   The clock_id passed to the call to `clock_gettime` used to record job_start
-   in the `json` output format. Default is 0, or CLOCK_REALTIME.
+
+        The clock_id passed to the call to `clock_gettime` used to record
+        job_start in the `json` output format. Default is 0, or CLOCK_REALTIME.
 
 
 Target file/device
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 86ce1b7e..b402dc6d 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -26,6 +26,11 @@ static inline unsigned long arch_ffz(unsigned long bitmask)
 	return bitmask;
 }
 
+static inline void tsc_barrier(void)
+{
+	__asm__ __volatile__("mfence":::"memory");
+}
+
 static inline unsigned long long get_cpu_clock(void)
 {
 	unsigned int lo, hi;
diff --git a/arch/arch.h b/arch/arch.h
index 3ee9b053..7e294ddf 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -108,6 +108,13 @@ extern unsigned long arch_flags;
 #include "arch-generic.h"
 #endif
 
+#if !defined(__x86_64__) && defined(CONFIG_SYNC_SYNC)
+static inline void tsc_barrier(void)
+{
+	__sync_synchronize();
+}
+#endif
+
 #include "../lib/ffz.h"
 /* IWYU pragma: end_exports */
 
diff --git a/gettime.c b/gettime.c
index bc66a3ac..5ca31206 100644
--- a/gettime.c
+++ b/gettime.c
@@ -623,7 +623,7 @@ static void *clock_thread_fn(void *data)
 			seq = *t->seq;
 			if (seq == UINT_MAX)
 				break;
-			__sync_synchronize();
+			tsc_barrier();
 			tsc = get_cpu_clock();
 		} while (seq != atomic32_compare_and_swap(t->seq, seq, seq + 1));
 

* Recent changes (master)
@ 2024-02-23 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-23 13:00 UTC
  To: fio

The following changes since commit 38d01cc56366aa2fd3af42dbab522888b6359dec:

  verify: fix integer sizes in verify state file (2024-02-16 09:25:35 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9af4af7ab40c4e505033d0e077cc42ac84996b09:

  ci: fix macOS sphinx install issues (2024-02-22 20:01:27 -0500)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-tests-cfi' of https://github.com/mikoxyz/fio

Miko Larsson (3):
      t/io_uring: include libgen.h
      t/io_uring: use char * for name arg in detect_node
      options: declare *__val as long long

Vincent Fu (1):
      ci: fix macOS sphinx install issues

 ci/actions-install.sh |  5 ++---
 options.c             | 12 ++++++------
 t/io_uring.c          |  3 ++-
 3 files changed, 10 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 76335fbc..6eb2d795 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -86,9 +86,8 @@ install_macos() {
     #echo "Updating homebrew..."
     #brew update >/dev/null 2>&1
     echo "Installing packages..."
-    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs sphinx-doc pygments python-certifi
-    brew link sphinx-doc --force
-    pip3 install scipy six statsmodels
+    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs
+    pip3 install scipy six statsmodels sphinx
 }
 
 install_windows() {
diff --git a/options.c b/options.c
index 25e042d0..de935efc 100644
--- a/options.c
+++ b/options.c
@@ -647,7 +647,7 @@ static int fio_clock_source_cb(void *data, const char *str)
 	return 0;
 }
 
-static int str_rwmix_read_cb(void *data, unsigned long long *val)
+static int str_rwmix_read_cb(void *data, long long *val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 
@@ -656,7 +656,7 @@ static int str_rwmix_read_cb(void *data, unsigned long long *val)
 	return 0;
 }
 
-static int str_rwmix_write_cb(void *data, unsigned long long *val)
+static int str_rwmix_write_cb(void *data, long long *val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 
@@ -1625,7 +1625,7 @@ static int str_gtod_reduce_cb(void *data, int *il)
 	return 0;
 }
 
-static int str_offset_cb(void *data, unsigned long long *__val)
+static int str_offset_cb(void *data, long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
@@ -1646,7 +1646,7 @@ static int str_offset_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
-static int str_offset_increment_cb(void *data, unsigned long long *__val)
+static int str_offset_increment_cb(void *data, long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
@@ -1667,7 +1667,7 @@ static int str_offset_increment_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
-static int str_size_cb(void *data, unsigned long long *__val)
+static int str_size_cb(void *data, long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
@@ -1711,7 +1711,7 @@ static int str_io_size_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
-static int str_zoneskip_cb(void *data, unsigned long long *__val)
+static int str_zoneskip_cb(void *data, long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
diff --git a/t/io_uring.c b/t/io_uring.c
index 46b153dc..18e8b38e 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -28,6 +28,7 @@
 #include <string.h>
 #include <pthread.h>
 #include <sched.h>
+#include <libgen.h>
 
 #include "../arch/arch.h"
 #include "../os/os.h"
@@ -819,7 +820,7 @@ static void set_affinity(struct submitter *s)
 #endif
 }
 
-static int detect_node(struct submitter *s, const char *name)
+static int detect_node(struct submitter *s, char *name)
 {
 #ifdef CONFIG_LIBNUMA
 	const char *base = basename(name);

* Recent changes (master)
@ 2024-02-17 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-17 13:00 UTC
  To: fio

The following changes since commit 2d0debb3fca7ddc4374624acb8c70fc8292d860d:

  t/run-fio-tests: add t/nvmept_trim.py (2024-02-15 14:05:12 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 38d01cc56366aa2fd3af42dbab522888b6359dec:

  verify: fix integer sizes in verify state file (2024-02-16 09:25:35 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      verify: fix integer sizes in verify state file

 verify.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/verify.c b/verify.c
index b438eed6..b2fede24 100644
--- a/verify.c
+++ b/verify.c
@@ -1619,8 +1619,8 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 		comps = fill_file_completions(td, s, &index);
 
 		s->no_comps = cpu_to_le64((uint64_t) comps);
-		s->depth = cpu_to_le64((uint64_t) td->o.iodepth);
-		s->nofiles = cpu_to_le64((uint64_t) td->o.nr_files);
+		s->depth = cpu_to_le32((uint32_t) td->o.iodepth);
+		s->nofiles = cpu_to_le32((uint32_t) td->o.nr_files);
 		s->numberio = cpu_to_le64((uint64_t) td->io_issues[DDIR_WRITE]);
 		s->index = cpu_to_le64((uint64_t) __td_index);
 		if (td->random_state.use64) {
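
The fix converts `depth` and `nofiles` with 32-bit helpers, which implies that these fields of the verify state file are 32 bits wide. The Python model below is an illustration, not fio code, and it shows why the width matters. On a little-endian host, the old 64-bit conversion happens to truncate to the right value. On a big-endian host, `cpu_to_le64` swaps a small value into the upper half, so assigning the result to a 32-bit field keeps only zeros.

```python
MASK32 = 0xFFFFFFFF


def bswap(value, nbytes):
    """Reverse the byte order of an unsigned integer of nbytes bytes."""
    return int.from_bytes(value.to_bytes(nbytes, 'big'), 'little')


def cpu_to_le(value, nbytes, host_big_endian):
    """Model of cpu_to_le32/cpu_to_le64: a byte swap on big-endian hosts only."""
    return bswap(value, nbytes) if host_big_endian else value


def store_u32_and_read_back(value, width, host_big_endian):
    """Convert with cpu_to_le<width>, assign to a 32-bit field, then decode
    the field's bytes as little-endian (what le32_to_cpu yields on any host)."""
    converted = cpu_to_le(value, width // 8, host_big_endian)
    field = converted & MASK32              # implicit truncation to uint32_t
    order = 'big' if host_big_endian else 'little'
    raw = field.to_bytes(4, order)          # bytes as they sit in memory / on disk
    return int.from_bytes(raw, 'little')


depth = 16
for big_endian in (False, True):
    print(big_endian,
          store_u32_and_read_back(depth, 64, big_endian),   # old code
          store_u32_and_read_back(depth, 32, big_endian))   # fixed code
# False 16 16
# True 0 16
```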

* Recent changes (master)
@ 2024-02-16 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-16 13:00 UTC
  To: fio

The following changes since commit 7aec5ac0bdd1adcaeba707f26d5bc583de6ab6c9:

  test: add the test for loops option and read-verify workloads (2024-02-14 07:39:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2d0debb3fca7ddc4374624acb8c70fc8292d860d:

  t/run-fio-tests: add t/nvmept_trim.py (2024-02-15 14:05:12 -0500)

----------------------------------------------------------------
Ankit Kumar (3):
      trim: add support for multiple ranges
      engines/nvme: pass offset and len instead of io_u
      engines/io_uring: add multi range dsm support

Jens Axboe (3):
      t/io_uring: account and ignore IO errors
      t/io_uring: pre-calculate per-file depth
      io_u: move number_trim to reclaim 8 bytes in struct io_u

Vincent Fu (2):
      t/nvmept_trim.py: test multi-range trim
      t/run-fio-tests: add t/nvmept_trim.py

 HOWTO.rst                               |   9 +
 backend.c                               |  20 +-
 cconv.c                                 |   2 +
 engines/io_uring.c                      |  34 +-
 engines/nvme.c                          |  72 ++--
 engines/nvme.h                          |   7 +-
 examples/uring-cmd-trim-multi-range.fio |  21 ++
 fio.1                                   |   7 +
 fio.h                                   |  18 +
 init.c                                  |  13 +
 io_u.c                                  |  97 +++++-
 io_u.h                                  |   5 +
 ioengines.h                             |   2 +
 options.c                               |  11 +
 server.h                                |   2 +-
 t/io_uring.c                            |  62 ++--
 t/nvmept_trim.py                        | 586 ++++++++++++++++++++++++++++++++
 t/run-fio-tests.py                      |   8 +
 thread_options.h                        |   3 +
 19 files changed, 899 insertions(+), 80 deletions(-)
 create mode 100644 examples/uring-cmd-trim-multi-range.fio
 create mode 100755 t/nvmept_trim.py

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 5bc1713c..4b02100c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2534,6 +2534,15 @@ with the caveat that when used on the command line, they must come after the
 	Specifies logical block application tag mask value, if namespace is
 	formatted to use end to end protection information. Default: 0xffff.
 
+.. option:: num_range=int : [io_uring_cmd]
+
+	For trim command this will be the number of ranges to trim per I/O
+	request. The number of logical blocks per range is determined by the
+	:option:`bs` option which should be a multiple of logical block size.
+	This cannot be used with read or write. Note that setting this
+	option > 1, :option:`log_offset` will not be able to log all the
+	offsets. Default: 1.
+
 .. option:: cpuload=int : [cpuio]
 
 	Attempt to use the specified percentage of CPU cycles. This is a mandatory
diff --git a/backend.c b/backend.c
index 1fab467a..2f2221bf 100644
--- a/backend.c
+++ b/backend.c
@@ -1333,7 +1333,7 @@ static int init_io_u(struct thread_data *td)
 int init_io_u_buffers(struct thread_data *td)
 {
 	struct io_u *io_u;
-	unsigned long long max_bs, min_write;
+	unsigned long long max_bs, min_write, trim_bs = 0;
 	int i, max_units;
 	int data_xfer = 1;
 	char *p;
@@ -1344,7 +1344,18 @@ int init_io_u_buffers(struct thread_data *td)
 	td->orig_buffer_size = (unsigned long long) max_bs
 					* (unsigned long long) max_units;
 
-	if (td_ioengine_flagged(td, FIO_NOIO) || !(td_read(td) || td_write(td)))
+	if (td_trim(td) && td->o.num_range > 1) {
+		trim_bs = td->o.num_range * sizeof(struct trim_range);
+		td->orig_buffer_size = trim_bs
+					* (unsigned long long) max_units;
+	}
+
+	/*
+	 * For reads, writes, and multi-range trim operations we need a
+	 * data buffer
+	 */
+	if (td_ioengine_flagged(td, FIO_NOIO) ||
+	    !(td_read(td) || td_write(td) || (td_trim(td) && td->o.num_range > 1)))
 		data_xfer = 0;
 
 	/*
@@ -1396,7 +1407,10 @@ int init_io_u_buffers(struct thread_data *td)
 				fill_verify_pattern(td, io_u->buf, max_bs, io_u, 0, 0);
 			}
 		}
-		p += max_bs;
+		if (td_trim(td) && td->o.num_range > 1)
+			p += trim_bs;
+		else
+			p += max_bs;
 	}
 
 	return 0;
diff --git a/cconv.c b/cconv.c
index c9298408..ead47248 100644
--- a/cconv.c
+++ b/cconv.c
@@ -111,6 +111,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->serialize_overlap = le32_to_cpu(top->serialize_overlap);
 	o->size = le64_to_cpu(top->size);
 	o->io_size = le64_to_cpu(top->io_size);
+	o->num_range = le32_to_cpu(top->num_range);
 	o->size_percent = le32_to_cpu(top->size_percent);
 	o->io_size_percent = le32_to_cpu(top->io_size_percent);
 	o->fill_device = le32_to_cpu(top->fill_device);
@@ -609,6 +610,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 
 	top->size = __cpu_to_le64(o->size);
 	top->io_size = __cpu_to_le64(o->io_size);
+	top->num_range = __cpu_to_le32(o->num_range);
 	top->verify_backlog = __cpu_to_le64(o->verify_backlog);
 	top->start_delay = __cpu_to_le64(o->start_delay);
 	top->start_delay_high = __cpu_to_le64(o->start_delay_high);
diff --git a/engines/io_uring.c b/engines/io_uring.c
index c0cb5a78..9069fa3e 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -81,7 +81,7 @@ struct ioring_data {
 
 	struct cmdprio cmdprio;
 
-	struct nvme_dsm_range *dsm;
+	struct nvme_dsm *dsm;
 };
 
 struct ioring_options {
@@ -385,6 +385,9 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	struct nvme_uring_cmd *cmd;
 	struct io_uring_sqe *sqe;
+	struct nvme_dsm *dsm;
+	void *ptr = ld->dsm;
+	unsigned int dsm_size;
 
 	/* only supports nvme_uring_cmd */
 	if (o->cmd_type != FIO_URING_CMD_NVME)
@@ -423,9 +426,13 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 	}
 
 	cmd = (struct nvme_uring_cmd *)sqe->cmd;
+	dsm_size = sizeof(*ld->dsm) + td->o.num_range * sizeof(struct nvme_dsm_range);
+	ptr += io_u->index * dsm_size;
+	dsm = (struct nvme_dsm *)ptr;
+
 	return fio_nvme_uring_cmd_prep(cmd, io_u,
 			o->nonvectored ? NULL : &ld->iovecs[io_u->index],
-			&ld->dsm[io_u->index]);
+			dsm);
 }
 
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
@@ -1133,8 +1140,11 @@ static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
+	struct nvme_dsm *dsm;
+	void *ptr;
+	unsigned int dsm_size;
 	unsigned long long md_size;
-	int ret;
+	int ret, i;
 
 	/* sqthread submission requires registered files */
 	if (o->sqpoll_thread)
@@ -1195,10 +1205,19 @@ static int fio_ioring_init(struct thread_data *td)
 	 * in zbd mode where trim means zone reset.
 	 */
 	if (!strcmp(td->io_ops->name, "io_uring_cmd") && td_trim(td) &&
-	    td->o.zone_mode == ZONE_MODE_ZBD)
+	    td->o.zone_mode == ZONE_MODE_ZBD) {
 		td->io_ops->flags |= FIO_ASYNCIO_SYNC_TRIM;
-	else
-		ld->dsm = calloc(td->o.iodepth, sizeof(*ld->dsm));
+	} else {
+		dsm_size = sizeof(*ld->dsm) +
+			td->o.num_range * sizeof(struct nvme_dsm_range);
+		ld->dsm = calloc(td->o.iodepth, dsm_size);
+		ptr = ld->dsm;
+		for (i = 0; i < td->o.iodepth; i++) {
+			dsm = (struct nvme_dsm *)ptr;
+			dsm->nr_ranges = td->o.num_range;
+			ptr += dsm_size;
+		}
+	}
 
 	return 0;
 }
@@ -1466,7 +1485,8 @@ static struct ioengine_ops ioengine_uring_cmd = {
 	.name			= "io_uring_cmd",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_NO_OFFLOAD | FIO_MEMALIGN | FIO_RAWIO |
-					FIO_ASYNCIO_SETS_ISSUE_TIME,
+					FIO_ASYNCIO_SETS_ISSUE_TIME |
+					FIO_MULTI_RANGE_TRIM,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_cmd_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
diff --git a/engines/nvme.c b/engines/nvme.c
index 75a5e0c1..c6629e86 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -8,20 +8,20 @@
 #include "../crc/crc-t10dif.h"
 #include "../crc/crc64.h"
 
-static inline __u64 get_slba(struct nvme_data *data, struct io_u *io_u)
+static inline __u64 get_slba(struct nvme_data *data, __u64 offset)
 {
 	if (data->lba_ext)
-		return io_u->offset / data->lba_ext;
-	else
-		return io_u->offset >> data->lba_shift;
+		return offset / data->lba_ext;
+
+	return offset >> data->lba_shift;
 }
 
-static inline __u32 get_nlb(struct nvme_data *data, struct io_u *io_u)
+static inline __u32 get_nlb(struct nvme_data *data, __u64 len)
 {
 	if (data->lba_ext)
-		return io_u->xfer_buflen / data->lba_ext - 1;
-	else
-		return (io_u->xfer_buflen >> data->lba_shift) - 1;
+		return len / data->lba_ext - 1;
+
+	return (len >> data->lba_shift) - 1;
 }
 
 static void fio_nvme_generate_pi_16b_guard(struct nvme_data *data,
@@ -32,8 +32,8 @@ static void fio_nvme_generate_pi_16b_guard(struct nvme_data *data,
 	struct nvme_16b_guard_pif *pi;
 	unsigned char *buf = io_u->xfer_buf;
 	unsigned char *md_buf = io_u->mmap_data;
-	__u64 slba = get_slba(data, io_u);
-	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u64 slba = get_slba(data, io_u->offset);
+	__u32 nlb = get_nlb(data, io_u->xfer_buflen) + 1;
 	__u32 lba_num = 0;
 	__u16 guard = 0;
 
@@ -99,8 +99,8 @@ static int fio_nvme_verify_pi_16b_guard(struct nvme_data *data,
 	struct fio_file *f = io_u->file;
 	unsigned char *buf = io_u->xfer_buf;
 	unsigned char *md_buf = io_u->mmap_data;
-	__u64 slba = get_slba(data, io_u);
-	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u64 slba = get_slba(data, io_u->offset);
+	__u32 nlb = get_nlb(data, io_u->xfer_buflen) + 1;
 	__u32 lba_num = 0;
 	__u16 unmask_app, unmask_app_exp, guard = 0;
 
@@ -185,8 +185,8 @@ static void fio_nvme_generate_pi_64b_guard(struct nvme_data *data,
 	unsigned char *buf = io_u->xfer_buf;
 	unsigned char *md_buf = io_u->mmap_data;
 	uint64_t guard = 0;
-	__u64 slba = get_slba(data, io_u);
-	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u64 slba = get_slba(data, io_u->offset);
+	__u32 nlb = get_nlb(data, io_u->xfer_buflen) + 1;
 	__u32 lba_num = 0;
 
 	if (data->pi_loc) {
@@ -251,9 +251,9 @@ static int fio_nvme_verify_pi_64b_guard(struct nvme_data *data,
 	struct fio_file *f = io_u->file;
 	unsigned char *buf = io_u->xfer_buf;
 	unsigned char *md_buf = io_u->mmap_data;
-	__u64 slba = get_slba(data, io_u);
+	__u64 slba = get_slba(data, io_u->offset);
 	__u64 ref, ref_exp, guard = 0;
-	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u32 nlb = get_nlb(data, io_u->xfer_buflen) + 1;
 	__u32 lba_num = 0;
 	__u16 unmask_app, unmask_app_exp;
 
@@ -329,24 +329,40 @@ next:
 	return 0;
 }
 void fio_nvme_uring_cmd_trim_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-				  struct nvme_dsm_range *dsm)
+				  struct nvme_dsm *dsm)
 {
 	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+	struct trim_range *range;
+	uint8_t *buf_point;
+	int i;
 
 	cmd->opcode = nvme_cmd_dsm;
 	cmd->nsid = data->nsid;
-	cmd->cdw10 = 0;
 	cmd->cdw11 = NVME_ATTRIBUTE_DEALLOCATE;
-	cmd->addr = (__u64) (uintptr_t) dsm;
-	cmd->data_len = sizeof(*dsm);
-
-	dsm->slba = get_slba(data, io_u);
-	/* nlb is a 1-based value for deallocate */
-	dsm->nlb = get_nlb(data, io_u) + 1;
+	cmd->addr = (__u64) (uintptr_t) (&dsm->range[0]);
+
+	if (dsm->nr_ranges == 1) {
+		dsm->range[0].slba = get_slba(data, io_u->offset);
+		/* nlb is a 1-based value for deallocate */
+		dsm->range[0].nlb = get_nlb(data, io_u->xfer_buflen) + 1;
+		cmd->cdw10 = 0;
+		cmd->data_len = sizeof(struct nvme_dsm_range);
+	} else {
+		buf_point = io_u->xfer_buf;
+		for (i = 0; i < io_u->number_trim; i++) {
+			range = (struct trim_range *)buf_point;
+			dsm->range[i].slba = get_slba(data, range->start);
+			/* nlb is a 1-based value for deallocate */
+			dsm->range[i].nlb = get_nlb(data, range->len) + 1;
+			buf_point += sizeof(struct trim_range);
+		}
+		cmd->cdw10 = io_u->number_trim - 1;
+		cmd->data_len = io_u->number_trim * sizeof(struct nvme_dsm_range);
+	}
 }
 
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-			    struct iovec *iov, struct nvme_dsm_range *dsm)
+			    struct iovec *iov, struct nvme_dsm *dsm)
 {
 	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
 	__u64 slba;
@@ -368,8 +384,8 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 		return -ENOTSUP;
 	}
 
-	slba = get_slba(data, io_u);
-	nlb = get_nlb(data, io_u);
+	slba = get_slba(data, io_u->offset);
+	nlb = get_nlb(data, io_u->xfer_buflen);
 
 	/* cdw10 and cdw11 represent starting lba */
 	cmd->cdw10 = slba & 0xffffffff;
@@ -400,7 +416,7 @@ void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
 	__u64 slba;
 
-	slba = get_slba(data, io_u);
+	slba = get_slba(data, io_u->offset);
 	cmd->cdw12 |= opts->io_flags;
 
 	if (data->pi_type && !(opts->io_flags & NVME_IO_PRINFO_PRACT)) {
diff --git a/engines/nvme.h b/engines/nvme.h
index 792b35d8..2d5204fc 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -408,6 +408,11 @@ struct nvme_dsm_range {
 	__le64	slba;
 };
 
+struct nvme_dsm {
+	__u32 nr_ranges;
+	struct nvme_dsm_range range[];
+};
+
 struct nvme_cmd_ext_io_opts {
 	__u32 io_flags;
 	__u16 apptag;
@@ -421,7 +426,7 @@ int fio_nvme_get_info(struct fio_file *f, __u64 *nlba, __u32 pi_act,
 		      struct nvme_data *data);
 
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-			    struct iovec *iov, struct nvme_dsm_range *dsm);
+			    struct iovec *iov, struct nvme_dsm *dsm);
 
 void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 		      struct nvme_cmd_ext_io_opts *opts);
diff --git a/examples/uring-cmd-trim-multi-range.fio b/examples/uring-cmd-trim-multi-range.fio
new file mode 100644
index 00000000..b376481b
--- /dev/null
+++ b/examples/uring-cmd-trim-multi-range.fio
@@ -0,0 +1,21 @@
+# Multi-range trim command test with io_uring_cmd I/O engine for nvme-ns
+# generic character device.
+#
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+size=10M
+iodepth=32
+thread=1
+stonewall=1
+
+[write_bs]
+bs=4096
+rw=randtrim
+num_range=8
+
+[write_bssplit]
+bssplit=4k/10:64k/50:32k/40
+rw=trim
+num_range=8
diff --git a/fio.1 b/fio.1
index 7ec5c745..e6b291a7 100644
--- a/fio.1
+++ b/fio.1
@@ -2293,6 +2293,13 @@ end to end protection information. Default: 0x1234.
 Specifies logical block application tag mask value, if namespace is formatted
 to use end to end protection information. Default: 0xffff.
 .TP
+.BI (io_uring_cmd)num_range \fR=\fPint
+For trim commands, this is the number of ranges to trim per I/O request.
+The number of logical blocks per range is determined by the \fBbs\fR option,
+which should be a multiple of the logical block size. This cannot be used with
+read or write. Note that when this option is set greater than 1, \fBlog_offset\fR
+will not be able to log all the offsets. Default: 1.
+.TP
 .BI (cpuio)cpuload \fR=\fPint
 Attempt to use the specified percentage of CPU cycles. This is a mandatory
 option when using cpuio I/O engine.
diff --git a/fio.h b/fio.h
index 1322656f..fc3e3ece 100644
--- a/fio.h
+++ b/fio.h
@@ -71,6 +71,16 @@
 
 struct fio_sem;
 
+#define MAX_TRIM_RANGE	256
+
+/*
+ * Range for trim command
+ */
+struct trim_range {
+	unsigned long long start;
+	unsigned long long len;
+};
+
 /*
  * offset generator types
  */
@@ -609,6 +619,14 @@ static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 	       !(io_u->ddir == DDIR_TRIM && !td_trim(td)));
 }
 
+static inline bool multi_range_trim(struct thread_data *td, struct io_u *io_u)
+{
+	if (io_u->ddir == DDIR_TRIM && td->o.num_range > 1)
+		return true;
+
+	return false;
+}
+
 static inline bool should_fsync(struct thread_data *td)
 {
 	if (td->last_was_sync)
diff --git a/init.c b/init.c
index 105339fa..7a0b14a3 100644
--- a/init.c
+++ b/init.c
@@ -618,6 +618,19 @@ static int fixup_options(struct thread_data *td)
 		ret |= 1;
 	}
 
+	if (td_trimwrite(td) && o->num_range > 1) {
+		log_err("fio: trimwrite cannot be used with multiple"
+			" ranges.\n");
+		ret |= 1;
+	}
+
+	if (td_trim(td) && o->num_range > 1 &&
+	    !td_ioengine_flagged(td, FIO_MULTI_RANGE_TRIM)) {
+		log_err("fio: can't use multiple ranges with IO engine %s\n",
+			td->io_ops->name);
+		ret |= 1;
+	}
+
 #ifndef CONFIG_PSHARED
 	if (!o->use_thread) {
 		log_info("fio: this platform does not support process shared"
diff --git a/io_u.c b/io_u.c
index 4254675a..2b8e17f8 100644
--- a/io_u.c
+++ b/io_u.c
@@ -940,6 +940,65 @@ static void setup_strided_zone_mode(struct thread_data *td, struct io_u *io_u)
 		fio_file_reset(td, f);
 }
 
+static int fill_multi_range_io_u(struct thread_data *td, struct io_u *io_u)
+{
+	bool is_random;
+	uint64_t buflen, i = 0;
+	struct trim_range *range;
+	struct fio_file *f = io_u->file;
+	uint8_t *buf;
+
+	buf = io_u->buf;
+	buflen = 0;
+
+	while (i < td->o.num_range) {
+		range = (struct trim_range *)buf;
+		if (get_next_offset(td, io_u, &is_random)) {
+			dprint(FD_IO, "io_u %p, failed getting offset\n",
+			       io_u);
+			break;
+		}
+
+		io_u->buflen = get_next_buflen(td, io_u, is_random);
+		if (!io_u->buflen) {
+			dprint(FD_IO, "io_u %p, failed getting buflen\n", io_u);
+			break;
+		}
+
+		if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
+			dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%llx exceeds file size=0x%llx\n",
+			       io_u,
+			       (unsigned long long) io_u->offset, io_u->buflen,
+			       (unsigned long long) io_u->file->real_file_size);
+			break;
+		}
+
+		range->start = io_u->offset;
+		range->len = io_u->buflen;
+		buflen += io_u->buflen;
+		f->last_start[io_u->ddir] = io_u->offset;
+		f->last_pos[io_u->ddir] = io_u->offset + range->len;
+
+		buf += sizeof(struct trim_range);
+		i++;
+
+		if (td_random(td) && file_randommap(td, io_u->file))
+			mark_random_map(td, io_u, io_u->offset, io_u->buflen);
+		dprint_io_u(io_u, "fill");
+	}
+	if (buflen) {
+		/*
+		 * Set buffer length as overall trim length for this IO, and
+		 * tell the ioengine about the number of ranges to be trimmed.
+		 */
+		io_u->buflen = buflen;
+		io_u->number_trim = i;
+		return 0;
+	}
+
+	return 1;
+}
+
 static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	bool is_random;
@@ -966,22 +1025,27 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	else if (td->o.zone_mode == ZONE_MODE_ZBD)
 		setup_zbd_zone_mode(td, io_u);
 
-	/*
-	 * No log, let the seq/rand engine retrieve the next buflen and
-	 * position.
-	 */
-	if (get_next_offset(td, io_u, &is_random)) {
-		dprint(FD_IO, "io_u %p, failed getting offset\n", io_u);
-		return 1;
-	}
+	if (multi_range_trim(td, io_u)) {
+		if (fill_multi_range_io_u(td, io_u))
+			return 1;
+	} else {
+		/*
+		 * No log, let the seq/rand engine retrieve the next buflen and
+		 * position.
+		 */
+		if (get_next_offset(td, io_u, &is_random)) {
+			dprint(FD_IO, "io_u %p, failed getting offset\n", io_u);
+			return 1;
+		}
 
-	io_u->buflen = get_next_buflen(td, io_u, is_random);
-	if (!io_u->buflen) {
-		dprint(FD_IO, "io_u %p, failed getting buflen\n", io_u);
-		return 1;
+		io_u->buflen = get_next_buflen(td, io_u, is_random);
+		if (!io_u->buflen) {
+			dprint(FD_IO, "io_u %p, failed getting buflen\n", io_u);
+			return 1;
+		}
 	}
-
 	offset = io_u->offset;
+
 	if (td->o.zone_mode == ZONE_MODE_ZBD) {
 		ret = zbd_adjust_block(td, io_u);
 		if (ret == io_u_eof) {
@@ -1004,11 +1068,12 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	/*
 	 * mark entry before potentially trimming io_u
 	 */
-	if (td_random(td) && file_randommap(td, io_u->file))
+	if (!multi_range_trim(td, io_u) && td_random(td) && file_randommap(td, io_u->file))
 		io_u->buflen = mark_random_map(td, io_u, offset, io_u->buflen);
 
 out:
-	dprint_io_u(io_u, "fill");
+	if (!multi_range_trim(td, io_u))
+		dprint_io_u(io_u, "fill");
 	io_u->verify_offset = io_u->offset;
 	td->zone_bytes += io_u->buflen;
 	return 0;
@@ -1814,7 +1879,7 @@ struct io_u *get_io_u(struct thread_data *td)
 
 	assert(fio_file_open(f));
 
-	if (ddir_rw(io_u->ddir)) {
+	if (ddir_rw(io_u->ddir) && !multi_range_trim(td, io_u)) {
 		if (!io_u->buflen && !td_ioengine_flagged(td, FIO_NOIO)) {
 			dprint(FD_IO, "get_io_u: zero buflen on %p\n", io_u);
 			goto err_put;
diff --git a/io_u.h b/io_u.h
index 786251d5..ab93d50f 100644
--- a/io_u.h
+++ b/io_u.h
@@ -52,6 +52,11 @@ struct io_u {
 	unsigned short ioprio;
 	unsigned short clat_prio_index;
 
+	/*
+	 * number of trim ranges for this IO.
+	 */
+	unsigned int number_trim;
+
 	/*
 	 * Allocated/set buffer and length
 	 */
diff --git a/ioengines.h b/ioengines.h
index 4391b31e..2fd7f52c 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -97,6 +97,8 @@ enum fio_ioengine_flags {
 	FIO_RO_NEEDS_RW_OPEN
 			= 1 << 18,	/* open files in rw mode even if we have a read job; only
 					   affects ioengines using generic_open_file */
+	FIO_MULTI_RANGE_TRIM
+			= 1 << 19,	/* ioengine supports trim with more than one range */
 };
 
 /*
diff --git a/options.c b/options.c
index 1da4de78..25e042d0 100644
--- a/options.c
+++ b/options.c
@@ -2395,6 +2395,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "num_range",
+		.lname	= "Number of ranges",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, num_range),
+		.maxval	= MAX_TRIM_RANGE,
+		.help	= "Number of ranges for trim command",
+		.def	= "1",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "bs",
 		.lname	= "Block size",
diff --git a/server.h b/server.h
index 0eb594ce..6d2659b0 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 102,
+	FIO_SERVER_VER			= 103,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/io_uring.c b/t/io_uring.c
index efc50caa..46b153dc 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -94,6 +94,7 @@ struct submitter {
 	unsigned long reaps;
 	unsigned long done;
 	unsigned long calls;
+	unsigned long io_errors;
 	volatile int finish;
 
 	__s32 *fds;
@@ -109,6 +110,7 @@ struct submitter {
 #endif
 
 	int numa_node;
+	int per_file_depth;
 	const char *filename;
 
 	struct file files[MAX_FDS];
@@ -490,11 +492,6 @@ static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 #endif
 }
 
-static unsigned file_depth(struct submitter *s)
-{
-	return (depth + s->nr_files - 1) / s->nr_files;
-}
-
 static unsigned long long get_offset(struct submitter *s, struct file *f)
 {
 	unsigned long long offset;
@@ -516,7 +513,7 @@ static unsigned long long get_offset(struct submitter *s, struct file *f)
 	return offset;
 }
 
-static struct file *init_new_io(struct submitter *s)
+static struct file *get_next_file(struct submitter *s)
 {
 	struct file *f;
 
@@ -524,7 +521,7 @@ static struct file *init_new_io(struct submitter *s)
 		f = &s->files[0];
 	} else {
 		f = &s->files[s->cur_file];
-		if (f->pending_ios >= file_depth(s)) {
+		if (f->pending_ios >= s->per_file_depth) {
 			s->cur_file++;
 			if (s->cur_file == s->nr_files)
 				s->cur_file = 0;
@@ -546,7 +543,7 @@ static void init_io(struct submitter *s, unsigned index)
 		return;
 	}
 
-	f = init_new_io(s);
+	f = get_next_file(s);
 
 	if (register_files) {
 		sqe->flags = IOSQE_FIXED_FILE;
@@ -587,7 +584,7 @@ static void init_io_pt(struct submitter *s, unsigned index)
 	unsigned long long slba;
 	unsigned long long nlb;
 
-	f = init_new_io(s);
+	f = get_next_file(s);
 
 	offset = get_offset(s, f);
 
@@ -717,10 +714,14 @@ static int reap_events_uring(struct submitter *s)
 			f = &s->files[fileno];
 			f->pending_ios--;
 			if (cqe->res != bs) {
-				printf("io: unexpected ret=%d\n", cqe->res);
-				if (polled && cqe->res == -EOPNOTSUPP)
-					printf("Your filesystem/driver/kernel doesn't support polled IO\n");
-				return -1;
+				if (cqe->res == -ENODATA || cqe->res == -EIO) {
+					s->io_errors++;
+				} else {
+					printf("io: unexpected ret=%d\n", cqe->res);
+					if (polled && cqe->res == -EOPNOTSUPP)
+						printf("Your filesystem/driver/kernel doesn't support polled IO\n");
+					return -1;
+				}
 			}
 		}
 		if (stats) {
@@ -867,6 +868,7 @@ static int setup_aio(struct submitter *s)
 		fixedbufs = register_files = 0;
 	}
 
+	s->per_file_depth = (depth + s->nr_files - 1) / s->nr_files;
 	return io_queue_init(roundup_pow2(depth), &s->aio_ctx);
 #else
 	fprintf(stderr, "Legacy AIO not available on this system/build\n");
@@ -971,6 +973,7 @@ static int setup_ring(struct submitter *s)
 	for (i = 0; i < p.sq_entries; i++)
 		sring->array[i] = i;
 
+	s->per_file_depth = (depth + s->nr_files - 1) / s->nr_files;
 	return 0;
 }
 
@@ -997,8 +1000,8 @@ static int submitter_init(struct submitter *s)
 	static int init_printed;
 	char buf[80];
 	s->tid = gettid();
-	printf("submitter=%d, tid=%d, file=%s, node=%d\n", s->index, s->tid,
-							s->filename, s->numa_node);
+	printf("submitter=%d, tid=%d, file=%s, nfiles=%d, node=%d\n", s->index, s->tid,
+							s->filename, s->nr_files, s->numa_node);
 
 	set_affinity(s);
 
@@ -1077,7 +1080,7 @@ static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocb
 	while (index < max_ios) {
 		struct iocb *iocb = &iocbs[index];
 
-		f = init_new_io(s);
+		f = get_next_file(s);
 
 		io_prep_pread(iocb, f->real_fd, s->iovecs[index].iov_base,
 				s->iovecs[index].iov_len, get_offset(s, f));
@@ -1102,10 +1105,14 @@ static int reap_events_aio(struct submitter *s, struct io_event *events, int evs
 
 		f->pending_ios--;
 		if (events[reaped].res != bs) {
-			printf("io: unexpected ret=%ld\n", events[reaped].res);
-			return -1;
-		}
-		if (stats) {
+			if (events[reaped].res == -ENODATA ||
+			    events[reaped].res == -EIO) {
+				s->io_errors++;
+			} else {
+				printf("io: unexpected ret=%ld\n", events[reaped].res);
+				return -1;
+			}
+		} else if (stats) {
 			int clock_index = data >> 32;
 
 			if (last_idx != clock_index) {
@@ -1379,7 +1386,7 @@ static void *submitter_sync_fn(void *data)
 		uint64_t offset;
 		struct file *f;
 
-		f = init_new_io(s);
+		f = get_next_file(s);
 
 #ifdef ARCH_HAVE_CPU_CLOCK
 		if (stats)
@@ -1550,7 +1557,7 @@ static void write_tsc_rate(void)
 int main(int argc, char *argv[])
 {
 	struct submitter *s;
-	unsigned long done, calls, reap;
+	unsigned long done, calls, reap, io_errors;
 	int i, j, flags, fd, opt, threads_per_f, threads_rem = 0, nfiles;
 	struct file f;
 	void *ret;
@@ -1661,7 +1668,7 @@ int main(int argc, char *argv[])
 		s = get_submitter(j);
 		s->numa_node = -1;
 		s->index = j;
-		s->done = s->calls = s->reaps = 0;
+		s->done = s->calls = s->reaps = s->io_errors = 0;
 	}
 
 	flags = O_RDONLY | O_NOATIME;
@@ -1746,11 +1753,12 @@ int main(int argc, char *argv[])
 #endif
 	}
 
-	reap = calls = done = 0;
+	reap = calls = done = io_errors = 0;
 	do {
 		unsigned long this_done = 0;
 		unsigned long this_reap = 0;
 		unsigned long this_call = 0;
+		unsigned long this_io_errors = 0;
 		unsigned long rpc = 0, ipc = 0;
 		unsigned long iops, bw;
 
@@ -1771,6 +1779,7 @@ int main(int argc, char *argv[])
 			this_done += s->done;
 			this_call += s->calls;
 			this_reap += s->reaps;
+			this_io_errors += s->io_errors;
 		}
 		if (this_call - calls) {
 			rpc = (this_done - done) / (this_call - calls);
@@ -1778,6 +1787,7 @@ int main(int argc, char *argv[])
 		} else
 			rpc = ipc = -1;
 		iops = this_done - done;
+		iops -= this_io_errors - io_errors;
 		if (bs > 1048576)
 			bw = iops * (bs / 1048576);
 		else
@@ -1805,6 +1815,7 @@ int main(int argc, char *argv[])
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;
+		io_errors = this_io_errors;
 	} while (!finish);
 
 	for (j = 0; j < nthreads; j++) {
@@ -1812,6 +1823,9 @@ int main(int argc, char *argv[])
 		pthread_join(s->thread, &ret);
 		close(s->ring_fd);
 
+		if (s->io_errors)
+			printf("%d: %lu IO errors\n", s->tid, s->io_errors);
+
 		if (stats) {
 			unsigned long nr;
 
diff --git a/t/nvmept_trim.py b/t/nvmept_trim.py
new file mode 100755
index 00000000..57568384
--- /dev/null
+++ b/t/nvmept_trim.py
@@ -0,0 +1,586 @@
+#!/usr/bin/env python3
+#
+# Copyright 2024 Samsung Electronics Co., Ltd All Rights Reserved
+#
+# For conditions of distribution and use, see the accompanying COPYING file.
+#
+"""
+# nvmept_trim.py
+#
+# Test fio's io_uring_cmd ioengine with NVMe pass-through dataset management
+# commands that trim multiple ranges.
+#
+# USAGE
+# see python3 nvmept_trim.py --help
+#
+# EXAMPLES
+# python3 t/nvmept_trim.py --dut /dev/ng0n1
+# python3 t/nvmept_trim.py --dut /dev/ng1n1 -f ./fio
+#
+# REQUIREMENTS
+# Python 3.6
+#
+"""
+import os
+import sys
+import time
+import logging
+import argparse
+from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
+from fiotestcommon import SUCCESS_NONZERO
+
+
+class TrimTest(FioJobCmdTest):
+    """
+    NVMe pass-through test class. Check to make sure output for selected data
+    direction(s) is non-zero and that zero data appears for other directions.
+    """
+
+    def setup(self, parameters):
+        """Setup a test."""
+
+        fio_args = [
+            "--name=nvmept-trim",
+            "--ioengine=io_uring_cmd",
+            "--cmd_type=nvme",
+            f"--filename={self.fio_opts['filename']}",
+            f"--rw={self.fio_opts['rw']}",
+            f"--output={self.filenames['output']}",
+            f"--output-format={self.fio_opts['output-format']}",
+        ]
+        for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
+                    'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
+                    'time_based', 'runtime', 'verify', 'io_size', 'num_range',
+                    'iodepth', 'iodepth_batch', 'iodepth_batch_complete',
+                    'size', 'rate', 'bs', 'bssplit', 'bsrange', 'randrepeat',
+                    'buffer_pattern', 'verify_pattern', 'verify', 'offset']:
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
+
+        super().setup(fio_args)
+
+
+    def check_result(self):
+
+        super().check_result()
+
+        if 'rw' not in self.fio_opts or \
+                not self.passed or \
+                'json' not in self.fio_opts['output-format']:
+            return
+
+        job = self.json_data['jobs'][0]
+
+        if self.fio_opts['rw'] in ['read', 'randread']:
+            self.passed = self.check_all_ddirs(['read'], job)
+        elif self.fio_opts['rw'] in ['write', 'randwrite']:
+            if 'verify' not in self.fio_opts:
+                self.passed = self.check_all_ddirs(['write'], job)
+            else:
+                self.passed = self.check_all_ddirs(['read', 'write'], job)
+        elif self.fio_opts['rw'] in ['trim', 'randtrim']:
+            self.passed = self.check_all_ddirs(['trim'], job)
+        elif self.fio_opts['rw'] in ['readwrite', 'randrw']:
+            self.passed = self.check_all_ddirs(['read', 'write'], job)
+        elif self.fio_opts['rw'] in ['trimwrite', 'randtrimwrite']:
+            self.passed = self.check_all_ddirs(['trim', 'write'], job)
+        else:
+            logging.error("Unhandled rw value %s", self.fio_opts['rw'])
+            self.passed = False
+
+        if 'iodepth' in self.fio_opts:
+            # We will need to figure something out if any test uses an iodepth
+            # different from 8
+            if job['iodepth_level']['8'] < 95:
+                logging.error("Did not achieve requested iodepth")
+                self.passed = False
+            else:
+                logging.debug("iodepth 8 target met %s", job['iodepth_level']['8'])
+
+
+class RangeTrimTest(TrimTest):
+    """
+    Multi-range trim test class.
+    """
+
+    def get_bs(self):
+        """Calculate block size and determine whether bs will be an average or exact."""
+
+        if 'bs' in self.fio_opts:
+            exact_size = True
+            bs = self.fio_opts['bs']
+        elif 'bssplit' in self.fio_opts:
+            exact_size = False
+            bs = 0
+            total = 0
+            for split in self.fio_opts['bssplit'].split(':'):
+                [blocksize, share] = split.split('/')
+                total += int(share)
+                bs += int(blocksize) * int(share) / 100
+            if total != 100:
+                logging.error("bssplit '%s' total percentage is not 100", self.fio_opts['bssplit'])
+                self.passed = False
+            else:
+                logging.debug("bssplit: average block size is %d", int(bs))
+            # The only check we do here for bssplit is to calculate an average
+            # blocksize and see if the IOPS and bw are consistent
+        elif 'bsrange' in self.fio_opts:
+            exact_size = False
+            [minbs, maxbs] = self.fio_opts['bsrange'].split('-')
+            minbs = int(minbs)
+            maxbs = int(maxbs)
+            bs = int((minbs + maxbs) / 2)
+            logging.debug("bsrange: average block size is %d", int(bs))
+            # The only check we do here for bsrange is to calculate an average
+            # blocksize and see if the IOPS and bw are consistent
+        else:
+            exact_size = True
+            bs = 4096
+
+        return bs, exact_size
+
+
+    def check_result(self):
+        """
+        Make sure that the number of IO requests is consistent with the
+        blocksize and num_range values. In other words, if the blocksize is
+        4KiB and num_range is 2, we should have 128 IO requests to trim 1MiB.
+        """
+        # TODO Enable debug output to check the actual offsets
+
+        super().check_result()
+
+        if not self.passed or 'json' not in self.fio_opts['output-format']:
+            return
+
+        job = self.json_data['jobs'][0]['trim']
+        bs, exact_size = self.get_bs()
+
+        # make sure bw and IOPS are consistent
+        bw = job['bw_bytes']
+        iops = job['iops']
+        runtime = job['runtime']
+
+        calculated = int(bw*runtime/1000)
+        expected = job['io_bytes']
+        if abs(calculated - expected) / expected > 0.05:
+            logging.error("Total bytes %d from bw does not match reported total bytes %d",
+                          calculated, expected)
+            self.passed = False
+        else:
+            logging.debug("Total bytes %d from bw matches reported total bytes %d", calculated,
+                          expected)
+
+        calculated = int(iops*runtime/1000*bs*self.fio_opts['num_range'])
+        if abs(calculated - expected) / expected > 0.05:
+            logging.error("Total bytes %d from IOPS does not match reported total bytes %d",
+                          calculated, expected)
+            self.passed = False
+        else:
+            logging.debug("Total bytes %d from IOPS matches reported total bytes %d", calculated,
+                          expected)
+
+        if 'size' in self.fio_opts:
+            io_count = self.fio_opts['size'] / self.fio_opts['num_range'] / bs
+            if exact_size:
+                delta = 0.1
+            else:
+                delta = 0.05*job['total_ios']
+
+            if abs(job['total_ios'] - io_count) > delta:
                logging.error("Expected number of IOs %d does not match actual value %d",
+                              io_count, job['total_ios'])
+                self.passed = False
+            else:
                logging.debug("Expected number of IOs %d matches actual value %d", io_count,
+                              job['total_ios'])
+
+        if 'rate' in self.fio_opts:
+            if abs(bw - self.fio_opts['rate']) / self.fio_opts['rate'] > 0.05:
+                logging.error("Actual rate %f does not match expected rate %f", bw,
+                              self.fio_opts['rate'])
+                self.passed = False
+            else:
+                logging.debug("Actual rate %f matches expected rate %f", bw, self.fio_opts['rate'])
+
+
+
+TEST_LIST = [
+    # The group of tests below checks existing use cases to make sure there are
+    # no regressions.
+    {
+        "test_id": 1,
+        "fio_opts": {
+            "rw": 'trim',
+            "time_based": 1,
+            "runtime": 3,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 2,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "time_based": 1,
+            "runtime": 3,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 3,
+        "fio_opts": {
+            "rw": 'trim',
+            "time_based": 1,
+            "runtime": 3,
+            "iodepth": 8,
+            "iodepth_batch": 4,
+            "iodepth_batch_complete": 4,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 4,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "time_based": 1,
+            "runtime": 3,
+            "iodepth": 8,
+            "iodepth_batch": 4,
+            "iodepth_batch_complete": 4,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 5,
+        "fio_opts": {
+            "rw": 'trimwrite',
+            "time_based": 1,
+            "runtime": 3,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 6,
+        "fio_opts": {
+            "rw": 'randtrimwrite',
+            "time_based": 1,
+            "runtime": 3,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 7,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "time_based": 1,
+            "runtime": 3,
+            "fixedbufs": 0,
+            "nonvectored": 1,
+            "force_async": 1,
+            "registerfiles": 1,
+            "sqthread_poll": 1,
+            "fixedbuffs": 1,
+            "output-format": "json",
+            },
+        "test_class": TrimTest,
+    },
+    # The group of tests below tries out the new functionality
+    {
+        "test_id": 100,
+        "fio_opts": {
+            "rw": 'trim',
+            "num_range": 2,
+            "size": 16*1024*1024,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 101,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 2,
+            "size": 16*1024*1024,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 102,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 256,
+            "size": 64*1024*1024,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 103,
+        "fio_opts": {
+            "rw": 'trim',
+            "num_range": 2,
+            "bs": 16*1024,
+            "size": 32*1024*1024,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 104,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 2,
+            "bs": 16*1024,
+            "size": 32*1024*1024,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 105,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 2,
+            "bssplit": "4096/50:16384/50",
+            "size": 80*1024*1024,
+            "output-format": "json",
+            "randrepeat": 0,
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 106,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 4,
+            "bssplit": "4096/25:8192/25:12288/25:16384/25",
+            "size": 80*1024*1024,
+            "output-format": "json",
+            "randrepeat": 0,
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 107,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 4,
+            "bssplit": "4096/20:8192/20:12288/20:16384/20:20480/20",
+            "size": 72*1024*1024,
+            "output-format": "json",
+            "randrepeat": 0,
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 108,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 2,
+            "bsrange": "4096-16384",
+            "size": 80*1024*1024,
+            "output-format": "json",
+            "randrepeat": 0,
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 109,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 4,
+            "bsrange": "4096-20480",
+            "size": 72*1024*1024,
+            "output-format": "json",
+            "randrepeat": 0,
+            },
+        "test_class": RangeTrimTest,
+    },
+    {
+        "test_id": 110,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "time_based": 1,
+            "runtime": 10,
+            "rate": 1024*1024,
+            "num_range": 2,
+            "output-format": "json",
+            },
+        "test_class": RangeTrimTest,
+    },
+    # All of the tests below should fail
+    # TODO check the error messages resulting from the jobs below
+    {
+        "test_id": 200,
+        "fio_opts": {
+            "rw": 'randtrimwrite',
+            "time_based": 1,
+            "runtime": 10,
+            "rate": 1024*1024,
+            "num_range": 2,
+            "output-format": "normal",
+            },
+        "test_class": RangeTrimTest,
+        "success": SUCCESS_NONZERO,
+    },
+    {
+        "test_id": 201,
+        "fio_opts": {
+            "rw": 'trimwrite',
+            "time_based": 1,
+            "runtime": 10,
+            "rate": 1024*1024,
+            "num_range": 2,
+            "output-format": "normal",
+            },
+        "test_class": RangeTrimTest,
+        "success": SUCCESS_NONZERO,
+    },
+    {
+        "test_id": 202,
+        "fio_opts": {
+            "rw": 'trim',
+            "time_based": 1,
+            "runtime": 10,
+            "num_range": 257,
+            "output-format": "normal",
+            },
+        "test_class": RangeTrimTest,
+        "success": SUCCESS_NONZERO,
+    },
+    # The sequence of jobs below constitutes a single test with multiple steps
+    # - write a data pattern
+    # - verify the data pattern
+    # - trim the first half of the LBA space
+    # - verify that the trimmed LBA space no longer returns the original data pattern
+    # - verify that the remaining LBA space has the expected pattern
+    {
+        "test_id": 300,
+        "fio_opts": {
+            "rw": 'write',
+            "output-format": 'json',
+            "buffer_pattern": 0x0f,
+            "size": 256*1024*1024,
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 301,
+        "fio_opts": {
+            "rw": 'read',
+            "output-format": 'json',
+            "verify_pattern": 0x0f,
+            "verify": "pattern",
+            "size": 256*1024*1024,
+            },
+        "test_class": TrimTest,
+    },
+    {
+        "test_id": 302,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "num_range": 8,
+            "output-format": 'json',
+            "size": 128*1024*1024,
+            },
+        "test_class": TrimTest,
+    },
+    # The identify namespace data structure has a DLFEAT field which specifies
+    # what happens when reading data from deallocated blocks. There are three
+    # options:
+    # - read behavior not reported
+    # - deallocated logical block returns all bytes 0x0
+    # - deallocated logical block returns all bytes 0xff
+    # The test below merely checks that the original data pattern is not returned.
+    # Source: Figure 97 from
+    # https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-1.0c-2022.10.03-Ratified.pdf
+    {
+        "test_id": 303,
+        "fio_opts": {
+            "rw": 'read',
+            "output-format": 'json',
+            "verify_pattern": 0x0f,
+            "verify": "pattern",
+            "size": 128*1024*1024,
+            },
+        "test_class": TrimTest,
+        "success": SUCCESS_NONZERO,
+    },
+    {
+        "test_id": 304,
+        "fio_opts": {
+            "rw": 'read',
+            "output-format": 'json',
+            "verify_pattern": 0x0f,
+            "verify": "pattern",
+            "offset": 128*1024*1024,
+            "size": 128*1024*1024,
+            },
+        "test_class": TrimTest,
+    },
+]
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-d', '--debug', help='Enable debug messages', action='store_true')
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut', help='target NVMe character device to test '
+                        '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    """Run tests using fio's io_uring_cmd ioengine to send NVMe pass through commands."""
+
+    args = parse_args()
+
+    if args.debug:
+        logging.basicConfig(level=logging.DEBUG)
+    else:
+        logging.basicConfig(level=logging.INFO)
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"nvmept-trim-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio_path = str(Path(args.fio).absolute())
+    else:
+        fio_path = 'fio'
+    print(f"fio path is {fio_path}")
+
+    for test in TEST_LIST:
+        test['fio_opts']['filename'] = args.dut
+
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'nvmept-trim',
+              }
+
+    _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 2f76d3fc..d4742e96 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -981,6 +981,14 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'requirements':     [Requirements.linux, Requirements.nvmecdev],
     },
+    {
+        'test_id':          1015,
+        'test_class':       FioExeTest,
+        'exe':              't/nvmept_trim.py',
+        'parameters':       ['-f', '{fio_path}', '--dut', '{nvmecdev}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [Requirements.linux, Requirements.nvmecdev],
+    },
 ]
 
 
diff --git a/thread_options.h b/thread_options.h
index 24f695fe..c2e71518 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -353,6 +353,8 @@ struct thread_options {
 	unsigned long long offset_increment;
 	unsigned long long number_ios;
 
+	unsigned int num_range;
+
 	unsigned int sync_file_range;
 
 	unsigned long long latency_target;
@@ -711,6 +713,7 @@ struct thread_options_pack {
 	uint32_t fdp_plis[FIO_MAX_PLIS];
 	uint32_t fdp_nrpli;
 
+	uint32_t num_range;
 	/*
 	 * verify_pattern followed by buffer_pattern from the unpacked struct
 	 */

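The cross-checks in RangeTrimTest.check_result() above come down to simple arithmetic: bandwidth multiplied by runtime should reproduce io_bytes, and since each trim request carries num_range ranges of bs bytes, IOPS * runtime * bs * num_range should reproduce the same total. A standalone Python sketch of just that arithmetic follows, assuming the units the test relies on (bw_bytes in bytes per second, runtime in milliseconds) and the same 5% tolerance; the helper names and sample numbers are hypothetical, not taken from a real run.

```python
def within_tolerance(calculated: float, expected: float, tol: float = 0.05) -> bool:
    """Return True if calculated is within a relative tolerance of expected."""
    return abs(calculated - expected) / expected <= tol


def check_range_trim(bw_bytes: float, iops: float, runtime_ms: int,
                     io_bytes: int, bs: int, num_range: int) -> bool:
    """Mirror the two consistency checks made by RangeTrimTest.check_result()."""
    # Bytes implied by bandwidth: bytes/s multiplied by seconds of runtime.
    from_bw = int(bw_bytes * runtime_ms / 1000)
    # Bytes implied by IOPS: each trim I/O covers num_range ranges of bs bytes.
    from_iops = int(iops * runtime_ms / 1000 * bs * num_range)
    return within_tolerance(from_bw, io_bytes) and within_tolerance(from_iops, io_bytes)


# Docstring example: bs=4KiB and num_range=2 means 128 trim I/Os cover 1 MiB.
bs, num_range = 4096, 2
io_count = (1024 * 1024) // (bs * num_range)
print(io_count)  # 128

# Hypothetical run: 1 MiB trimmed in 1 s by 128 two-range trim I/Os.
print(check_range_trim(bw_bytes=1048576, iops=128, runtime_ms=1000,
                       io_bytes=1048576, bs=bs, num_range=num_range))  # True
```

Because IOPS counts trim requests rather than ranges, omitting the num_range factor would make every multi-range run look like it moved only 1/num_range of its bytes.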
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2024-02-15 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-15 13:00 UTC
  To: fio

The following changes since commit a99bd37f690ab246caa9b9d65adfa65a25967190:

  examples: add PI example with xnvme ioengine (2024-02-13 14:24:59 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7aec5ac0bdd1adcaeba707f26d5bc583de6ab6c9:

  test: add the test for loops option and read-verify workloads (2024-02-14 07:39:48 -0700)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      verify: fix loops option behavior of read-verify workloads
      test: add the test for loops option and read-verify workloads

 io_u.c             |  3 ++-
 t/jobs/t0029.fio   | 14 ++++++++++++++
 t/run-fio-tests.py | 21 +++++++++++++++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 t/jobs/t0029.fio

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 13187882..4254675a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2151,7 +2151,8 @@ static void io_u_update_bytes_done(struct thread_data *td,
 
 	if (td->runstate == TD_VERIFYING) {
 		td->bytes_verified += icd->bytes_done[DDIR_READ];
-		return;
+		if (td_write(td))
+			return;
 	}
 
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
diff --git a/t/jobs/t0029.fio b/t/jobs/t0029.fio
new file mode 100644
index 00000000..481de6f3
--- /dev/null
+++ b/t/jobs/t0029.fio
@@ -0,0 +1,14 @@
+[global]
+filename=t0029file
+size=4k
+verify=md5
+
+[write]
+rw=write
+do_verify=0
+
+[read]
+stonewall=1
+rw=read
+loops=2
+do_verify=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 1448f7cb..2f76d3fc 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -542,6 +542,17 @@ class FioJobFileTest_t0027(FioJobFileTest):
         if data != self.pattern:
             self.passed = False
 
+class FioJobFileTest_t0029(FioJobFileTest):
+    """Test loops option works with read-verify workload."""
+    def check_result(self):
+        super().check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][1]['read']['io_kbytes'] != 8:
+            self.passed = False
+
 class FioJobFileTest_iops_rate(FioJobFileTest):
     """Test consists of fio test job t0011
     Confirm that job0 iops == 1000
@@ -838,6 +849,16 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          29,
+        'test_class':       FioJobFileTest_t0029,
+        'job':              't0029.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

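The io_u.c change keeps the early return for jobs that write, but a read-only verify job now falls through, so its completed reads are also added to the per-direction bytes_done counters. The new t0029 job checks the visible effect: with size=4k and loops=2, the read-verify job should report io_kbytes of 8. Below is a simplified Python model of the patched control flow, not fio code; ThreadModel and its fields are hypothetical stand-ins for the relevant parts of struct thread_data, and the model assumes the read-verify job runs in the TD_VERIFYING state for its whole read phase.

```python
from dataclasses import dataclass, field

READ, WRITE, TRIM = 0, 1, 2  # stand-ins for DDIR_READ, DDIR_WRITE, DDIR_TRIM


@dataclass
class ThreadModel:
    """Hypothetical stand-in for the fields of struct thread_data used here."""
    verifying: bool  # td->runstate == TD_VERIFYING
    writes: bool     # td_write(td)
    bytes_verified: int = 0
    bytes_done: list = field(default_factory=lambda: [0, 0, 0])


def update_bytes_done(td: ThreadModel, done: list) -> None:
    """Model of io_u_update_bytes_done() after the patch."""
    if td.verifying:
        td.bytes_verified += done[READ]
        # Only jobs that also write still skip the regular accounting.
        if td.writes:
            return
    for ddir in range(len(done)):
        td.bytes_done[ddir] += done[ddir]


# Read-verify job from t0029: rw=read, do_verify=1, size=4k, loops=2.
td = ThreadModel(verifying=True, writes=False)
for _ in range(2):  # one 4 KiB completion per loop
    update_bytes_done(td, [4096, 0, 0])
print(td.bytes_done[READ] // 1024)  # 8
```

With the old unconditional return, the same model leaves bytes_done at zero while bytes_verified still reaches 8192.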

* Recent changes (master)
@ 2024-02-14 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-14 13:00 UTC
  To: fio

The following changes since commit 097af663e93f86106f32580bfa59e68fae007035:

  Merge branch 'vsock' of https://github.com/MPinna/fio (2024-02-12 11:56:33 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a99bd37f690ab246caa9b9d65adfa65a25967190:

  examples: add PI example with xnvme ioengine (2024-02-13 14:24:59 -0500)

----------------------------------------------------------------
Ankit Kumar (5):
      engines/xnvme: allocate iovecs only if vectored I/O is enabled
      engines/xnvme: add support for metadata
      engines:xnvme: add support for end to end data protection
      engines/xnvme: add checks for verify, block size and metadata size
      examples: add PI example with xnvme ioengine

Vincent Fu (4):
      logging: record timestamp for each thread
      helper_thread: do not send A_EXIT message when exit is called
      logging: expand runstates eligible for logging
      docs: explain duplicate logging timestamps

 HOWTO.rst             |  25 ++--
 configure             |   2 +-
 engines/xnvme.c       | 331 +++++++++++++++++++++++++++++++++++++++++++++++---
 examples/xnvme-pi.fio |  53 ++++++++
 fio.1                 |  24 ++--
 helper_thread.c       |   1 -
 stat.c                |  24 +++-
 7 files changed, 414 insertions(+), 46 deletions(-)
 create mode 100644 examples/xnvme-pi.fio
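The xnvme metadata and PI patches change how the engine maps an io_u to NVMe addresses. For an extended LBA format the metadata is interleaved with the data, so the per-block size (lba_nbytes) is generally not a power of two and xnvme_fioe_queue() divides instead of shifting. _verify_options() also requires block sizes to be multiples of lba_nbytes and, for a separate metadata buffer, an md_per_io_size of at least (max_bs / lba_nbytes) * md_nbytes. A minimal Python sketch of that arithmetic follows, assuming the shift width equals log2(lba_nbytes) for power-of-two formats (the engine actually takes it from xnvme_dev_get_ssw()); the function names and example geometries (512 + 8 bytes extended, 4096 bytes of data with 64 bytes of separate metadata) are hypothetical.

```python
def lba_range(offset: int, xfer_len: int, lba_nbytes: int, lba_pow2: bool):
    """Return (slba, nlb) as the xnvme engine derives them; nlb is zero-based."""
    if lba_pow2:
        # Assumption: shift width is log2 of the power-of-two block size.
        ssw = lba_nbytes.bit_length() - 1
        return offset >> ssw, (xfer_len >> ssw) - 1
    # Extended LBA (data plus interleaved metadata): divide instead of shifting.
    return offset // lba_nbytes, (xfer_len // lba_nbytes) - 1


def bs_ok(bs: int, lba_nbytes: int) -> bool:
    """Block sizes must be a multiple of the (possibly extended) LBA size."""
    return bs % lba_nbytes == 0


def min_md_per_io_size(max_bs: int, lba_nbytes: int, md_nbytes: int) -> int:
    """Smallest md_per_io_size accepted for a separate metadata buffer."""
    return (max_bs // lba_nbytes) * md_nbytes


print(lba_range(5200, 1040, 520, False))      # (10, 1)
print(lba_range(8192, 16384, 4096, True))     # (2, 3)
print(min_md_per_io_size(16384, 4096, 64))    # 256
```

When pi_act is set and the metadata consists only of PI, _dev_open() subtracts the metadata from lba_nbytes and treats the format as power-of-two, because the controller then inserts and strips the PI itself.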

---

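In xnvme_fioe_queue() the PI options are packed into one value, (pi_act << 3 | prchk), which is stored in the command's prinfo field and also handed to xnvme_pi_ctx_init(); cb_pool() later recovers pi_act by shifting the stored flags right by 3. A small Python sketch of that encoding follows, assuming the check flags follow the NVMe PRINFO bit layout (bit 3 PRACT, bit 2 guard, bit 1 application tag, bit 0 reference tag); the constants are local stand-ins, not values read from libxnvme headers.

```python
# Stand-ins assuming the NVMe PRINFO bit layout; not read from libxnvme.
REFTAG_CHECK = 1 << 0
APPTAG_CHECK = 1 << 1
GUARD_CHECK = 1 << 2
PRACT_SHIFT = 3


def parse_pi_chk(value: str) -> int:
    """Build the prchk mask from a pi_chk string such as "GUARD,REFTAG"."""
    prchk = 0
    # Substring matching, as str_pi_chk_cb() does with strstr().
    if "GUARD" in value:
        prchk |= GUARD_CHECK
    if "REFTAG" in value:
        prchk |= REFTAG_CHECK
    if "APPTAG" in value:
        prchk |= APPTAG_CHECK
    return prchk


def prinfo(pi_act: int, prchk: int) -> int:
    """Combine PRACT and the check bits the way the queue path does."""
    return (pi_act << PRACT_SHIFT) | prchk


def pi_act_from_flags(flags: int) -> bool:
    """Recover PRACT from the packed flags, as cb_pool() does with >> 3."""
    return bool(flags >> PRACT_SHIFT)


flags = prinfo(0, parse_pi_chk("GUARD,REFTAG,APPTAG"))
print(bin(flags), pi_act_from_flags(flags))  # 0b111 False
print(bin(prinfo(1, 0)))                     # 0b1000
```

Host-side PI work only happens when pi_act is 0: writes get PI from xnvme_pi_generate() in the queue path, and reads are checked with xnvme_pi_verify() in cb_pool().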
Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 53b03021..5bc1713c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2491,11 +2491,11 @@ with the caveat that when used on the command line, they must come after the
         want fio to use placement identifier only at indices 0, 2 and 5 specify
         ``fdp_pli=0,2,5``.
 
-.. option:: md_per_io_size=int : [io_uring_cmd]
+.. option:: md_per_io_size=int : [io_uring_cmd] [xnvme]
 
 	Size in bytes for separate metadata buffer per IO. Default: 0.
 
-.. option:: pi_act=int : [io_uring_cmd]
+.. option:: pi_act=int : [io_uring_cmd] [xnvme]
 
 	Action to take when nvme namespace is formatted with protection
 	information. If this is set to 1 and namespace is formatted with
@@ -2511,7 +2511,7 @@ with the caveat that when used on the command line, they must come after the
 	it will use the default slower generator.
 	(see: https://github.com/intel/isa-l)
 
-.. option:: pi_chk=str[,str][,str] : [io_uring_cmd]
+.. option:: pi_chk=str[,str][,str] : [io_uring_cmd] [xnvme]
 
 	Controls the protection information check. This can take one or more
 	of these values. Default: none.
@@ -2524,12 +2524,12 @@ with the caveat that when used on the command line, they must come after the
 	**APPTAG**
 		Enables protection information checking of application tag field.
 
-.. option:: apptag=int : [io_uring_cmd]
+.. option:: apptag=int : [io_uring_cmd] [xnvme]
 
 	Specifies logical block application tag value, if namespace is
 	formatted to use end to end protection information. Default: 0x1234.
 
-.. option:: apptag_mask=int : [io_uring_cmd]
+.. option:: apptag_mask=int : [io_uring_cmd] [xnvme]
 
 	Specifies logical block application tag mask value, if namespace is
 	formatted to use end to end protection information. Default: 0xffff.
@@ -4066,12 +4066,15 @@ Measurements and reporting
 
 .. option:: log_avg_msec=int
 
-	By default, fio will log an entry in the iops, latency, or bw log for every
-	I/O that completes. When writing to the disk log, that can quickly grow to a
-	very large size. Setting this option makes fio average the each log entry
-	over the specified period of time, reducing the resolution of the log.  See
-	:option:`log_window_value` as well. Defaults to 0, logging all entries.
-	Also see `Log File Formats`_.
+        By default, fio will log an entry in the iops, latency, or bw log for
+        every I/O that completes. When writing to the disk log, that can
+        quickly grow to a very large size. Setting this option directs fio to
+        instead record an average over the specified duration for each log
+        entry, reducing the resolution of the log. When the job completes, fio
+        will flush any accumulated latency log data, so the final log interval
+        may not match the value specified by this option and there may even be
+        duplicate timestamps. See :option:`log_window_value` as well. Defaults
+        to 0, logging entries for each I/O. Also see `Log File Formats`_.
 
 .. option:: log_hist_msec=int
 
diff --git a/configure b/configure
index becb193e..3eef022b 100755
--- a/configure
+++ b/configure
@@ -2697,7 +2697,7 @@ fi
 ##########################################
 # Check if we have xnvme
 if test "$xnvme" != "no" ; then
-  if check_min_lib_version xnvme 0.7.0; then
+  if check_min_lib_version xnvme 0.7.4; then
     xnvme="yes"
     xnvme_cflags=$(pkg-config --cflags xnvme)
     xnvme_libs=$(pkg-config --libs xnvme)
diff --git a/engines/xnvme.c b/engines/xnvme.c
index 2a0b3520..a8137286 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -11,6 +11,7 @@
 #include <assert.h>
 #include <libxnvme.h>
 #include "fio.h"
+#include "verify.h"
 #include "zbd_types.h"
 #include "fdp.h"
 #include "optgroup.h"
@@ -30,8 +31,10 @@ struct xnvme_fioe_fwrap {
 
 	uint32_t ssw;
 	uint32_t lba_nbytes;
+	uint32_t md_nbytes;
+	uint32_t lba_pow2;
 
-	uint8_t _pad[24];
+	uint8_t _pad[16];
 };
 XNVME_STATIC_ASSERT(sizeof(struct xnvme_fioe_fwrap) == 64, "Incorrect size")
 
@@ -58,19 +61,31 @@ struct xnvme_fioe_data {
 	uint64_t nallocated;
 
 	struct iovec *iovec;
-
-	uint8_t _pad[8];
+	struct iovec *md_iovec;
 
 	struct xnvme_fioe_fwrap files[];
 };
 XNVME_STATIC_ASSERT(sizeof(struct xnvme_fioe_data) == 64, "Incorrect size")
 
+struct xnvme_fioe_request {
+	/* Context for NVMe PI */
+	struct xnvme_pi_ctx pi_ctx;
+
+	/* Separate metadata buffer pointer */
+	void *md_buf;
+};
+
 struct xnvme_fioe_options {
 	void *padding;
 	unsigned int hipri;
 	unsigned int sqpoll_thread;
 	unsigned int xnvme_dev_nsid;
 	unsigned int xnvme_iovec;
+	unsigned int md_per_io_size;
+	unsigned int pi_act;
+	unsigned int apptag;
+	unsigned int apptag_mask;
+	unsigned int prchk;
 	char *xnvme_be;
 	char *xnvme_mem;
 	char *xnvme_async;
@@ -79,6 +94,20 @@ struct xnvme_fioe_options {
 	char *xnvme_dev_subnqn;
 };
 
+static int str_pi_chk_cb(void *data, const char *str)
+{
+	struct xnvme_fioe_options *o = data;
+
+	if (strstr(str, "GUARD") != NULL)
+		o->prchk = XNVME_PI_FLAGS_GUARD_CHECK;
+	if (strstr(str, "REFTAG") != NULL)
+		o->prchk |= XNVME_PI_FLAGS_REFTAG_CHECK;
+	if (strstr(str, "APPTAG") != NULL)
+		o->prchk |= XNVME_PI_FLAGS_APPTAG_CHECK;
+
+	return 0;
+}
+
 static struct fio_option options[] = {
 	{
 		.name = "hipri",
@@ -171,6 +200,56 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
+	{
+		.name	= "md_per_io_size",
+		.lname	= "Separate Metadata Buffer Size per I/O",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct xnvme_fioe_options, md_per_io_size),
+		.def	= "0",
+		.help	= "Size of separate metadata buffer per I/O (Default: 0)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_XNVME,
+	},
+	{
+		.name	= "pi_act",
+		.lname	= "Protection Information Action",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct xnvme_fioe_options, pi_act),
+		.def	= "1",
+		.help	= "Protection Information Action bit (pi_act=1 or pi_act=0)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_XNVME,
+	},
+	{
+		.name	= "pi_chk",
+		.lname	= "Protection Information Check",
+		.type	= FIO_OPT_STR_STORE,
+		.def	= NULL,
+		.help	= "Control of Protection Information Checking (pi_chk=GUARD,REFTAG,APPTAG)",
+		.cb	= str_pi_chk_cb,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_XNVME,
+	},
+	{
+		.name	= "apptag",
+		.lname	= "Application Tag used in Protection Information",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct xnvme_fioe_options, apptag),
+		.def	= "0x1234",
+		.help	= "Application Tag used in Protection Information field (Default: 0x1234)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_XNVME,
+	},
+	{
+		.name	= "apptag_mask",
+		.lname	= "Application Tag Mask",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct xnvme_fioe_options, apptag_mask),
+		.def	= "0xffff",
+		.help	= "Application Tag Mask used with Application Tag (Default: 0xffff)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_XNVME,
+	},
 
 	{
 		.name = NULL,
@@ -181,6 +260,10 @@ static void cb_pool(struct xnvme_cmd_ctx *ctx, void *cb_arg)
 {
 	struct io_u *io_u = cb_arg;
 	struct xnvme_fioe_data *xd = io_u->mmap_data;
+	struct xnvme_fioe_request *fio_req = io_u->engine_data;
+	struct xnvme_fioe_fwrap *fwrap = &xd->files[io_u->file->fileno];
+	bool pi_act = (fio_req->pi_ctx.pi_flags >> 3);
+	int err;
 
 	if (xnvme_cmd_ctx_cpl_status(ctx)) {
 		xnvme_cmd_ctx_pr(ctx, XNVME_PR_DEF);
@@ -188,6 +271,15 @@ static void cb_pool(struct xnvme_cmd_ctx *ctx, void *cb_arg)
 		io_u->error = EIO;
 	}
 
+	if (!io_u->error && fwrap->geo->pi_type && (io_u->ddir == DDIR_READ) && !pi_act) {
+		err = xnvme_pi_verify(&fio_req->pi_ctx, io_u->xfer_buf,
+				      fio_req->md_buf, io_u->xfer_buflen / fwrap->lba_nbytes);
+		if (err) {
+			xd->ecount += 1;
+			io_u->error = EIO;
+		}
+	}
+
 	xd->iocq[xd->completed++] = io_u;
 	xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
 }
@@ -249,10 +341,54 @@ static void xnvme_fioe_cleanup(struct thread_data *td)
 
 	free(xd->iocq);
 	free(xd->iovec);
+	free(xd->md_iovec);
 	free(xd);
 	td->io_ops_data = NULL;
 }
 
+static int _verify_options(struct thread_data *td, struct fio_file *f,
+			   struct xnvme_fioe_fwrap *fwrap)
+{
+	struct xnvme_fioe_options *o = td->eo;
+	unsigned int correct_md_size;
+
+	for_each_rw_ddir(ddir) {
+		if (td->o.min_bs[ddir] % fwrap->lba_nbytes || td->o.max_bs[ddir] % fwrap->lba_nbytes) {
+			if (!fwrap->lba_pow2) {
+				log_err("ioeng->_verify_options(%s): block size must be a multiple of %u "
+					"(LBA data size + Metadata size)\n", f->file_name, fwrap->lba_nbytes);
+			} else {
+				log_err("ioeng->_verify_options(%s): block size must be a multiple of LBA data size\n",
+					f->file_name);
+			}
+			return 1;
+		}
+		if (ddir == DDIR_TRIM)
+			continue;
+
+		correct_md_size = (td->o.max_bs[ddir] / fwrap->lba_nbytes) * fwrap->md_nbytes;
+		if (fwrap->md_nbytes && fwrap->lba_pow2 && (o->md_per_io_size < correct_md_size)) {
+			log_err("ioeng->_verify_options(%s): md_per_io_size should be at least %u bytes\n",
+				f->file_name, correct_md_size);
+			return 1;
+		}
+	}
+
+	/*
+	 * For extended logical block sizes we cannot use verify when
+	 * end to end data protection checks are enabled, as the PI
+	 * section of data buffer conflicts with verify.
+	 */
+	if (fwrap->md_nbytes && fwrap->geo->pi_type && !fwrap->lba_pow2 &&
+	    td->o.verify != VERIFY_NONE) {
+		log_err("ioeng->_verify_options(%s): for extended LBA, verify cannot be used when E2E data protection is enabled\n",
+			f->file_name);
+		return 1;
+	}
+
+	return 0;
+}
+
 /**
  * Helper function setting up device handles as addressed by the naming
  * convention of the given `fio_file` filename.
@@ -263,6 +399,7 @@ static void xnvme_fioe_cleanup(struct thread_data *td)
 static int _dev_open(struct thread_data *td, struct fio_file *f)
 {
 	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_fioe_options *o = td->eo;
 	struct xnvme_fioe_data *xd = td->io_ops_data;
 	struct xnvme_fioe_fwrap *fwrap;
 	int flags = 0;
@@ -297,6 +434,31 @@ static int _dev_open(struct thread_data *td, struct fio_file *f)
 
 	fwrap->ssw = xnvme_dev_get_ssw(fwrap->dev);
 	fwrap->lba_nbytes = fwrap->geo->lba_nbytes;
+	fwrap->md_nbytes = fwrap->geo->nbytes_oob;
+
+	if (fwrap->geo->lba_extended)
+		fwrap->lba_pow2 = 0;
+	else
+		fwrap->lba_pow2 = 1;
+
+	/*
+	 * When PI action is set and PI size is equal to metadata size, the
+	 * controller inserts/removes PI. So update the LBA data and metadata
+	 * sizes accordingly.
+	 */
+	if (o->pi_act && fwrap->geo->pi_type &&
+	    fwrap->geo->nbytes_oob == xnvme_pi_size(fwrap->geo->pi_format)) {
+		if (fwrap->geo->lba_extended) {
+			fwrap->lba_nbytes -= fwrap->geo->nbytes_oob;
+			fwrap->lba_pow2 = 1;
+		}
+		fwrap->md_nbytes = 0;
+	}
+
+	if (_verify_options(td, f, fwrap)) {
+		td_verror(td, EINVAL, "_dev_open");
+		goto failure;
+	}
 
 	fwrap->fio_file = f;
 	fwrap->fio_file->filetype = FIO_TYPE_BLOCK;
@@ -325,6 +487,7 @@ failure:
 static int xnvme_fioe_init(struct thread_data *td)
 {
 	struct xnvme_fioe_data *xd = NULL;
+	struct xnvme_fioe_options *o = td->eo;
 	struct fio_file *f;
 	unsigned int i;
 
@@ -347,12 +510,25 @@ static int xnvme_fioe_init(struct thread_data *td)
 		return 1;
 	}
 
-	xd->iovec = calloc(td->o.iodepth, sizeof(*xd->iovec));
-	if (!xd->iovec) {
-		free(xd->iocq);
-		free(xd);
-		log_err("ioeng->init(): !calloc(xd->iovec), err(%d)\n", errno);
-		return 1;
+	if (o->xnvme_iovec) {
+		xd->iovec = calloc(td->o.iodepth, sizeof(*xd->iovec));
+		if (!xd->iovec) {
+			free(xd->iocq);
+			free(xd);
+			log_err("ioeng->init(): !calloc(xd->iovec), err(%d)\n", errno);
+			return 1;
+		}
+	}
+
+	if (o->xnvme_iovec && o->md_per_io_size) {
+		xd->md_iovec = calloc(td->o.iodepth, sizeof(*xd->md_iovec));
+		if (!xd->md_iovec) {
+			free(xd->iocq);
+			free(xd->iovec);
+			free(xd);
+			log_err("ioeng->init(): !calloc(xd->md_iovec), err(%d)\n", errno);
+			return 1;
+		}
 	}
 
 	xd->prev = -1;
@@ -362,8 +538,8 @@ static int xnvme_fioe_init(struct thread_data *td)
 	{
 		if (_dev_open(td, f)) {
 			/*
-			 * Note: We are not freeing xd, iocq and iovec. This
-			 * will be done as part of cleanup routine.
+			 * Note: We are not freeing xd, iocq, iovec and md_iovec.
+			 * This will be done as part of cleanup routine.
 			 */
 			log_err("ioeng->init(): failed; _dev_open(%s)\n", f->file_name);
 			return 1;
@@ -418,13 +594,61 @@ static void xnvme_fioe_iomem_free(struct thread_data *td)
 
 static int xnvme_fioe_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
+	struct xnvme_fioe_request *fio_req;
+	struct xnvme_fioe_options *o = td->eo;
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap = &xd->files[0];
+
+	if (!fwrap->dev) {
+		log_err("ioeng->io_u_init(): failed; no dev-handle\n");
+		return 1;
+	}
+
 	io_u->mmap_data = td->io_ops_data;
+	io_u->engine_data = NULL;
+
+	fio_req = calloc(1, sizeof(*fio_req));
+	if (!fio_req) {
+		log_err("ioeng->io_u_init(): !calloc(fio_req), err(%d)\n", errno);
+		return 1;
+	}
+
+	if (o->md_per_io_size) {
+		fio_req->md_buf = xnvme_buf_alloc(fwrap->dev, o->md_per_io_size);
+		if (!fio_req->md_buf) {
+			free(fio_req);
+			return 1;
+		}
+	}
+
+	io_u->engine_data = fio_req;
 
 	return 0;
 }
 
 static void xnvme_fioe_io_u_free(struct thread_data *td, struct io_u *io_u)
 {
+	struct xnvme_fioe_data *xd = NULL;
+	struct xnvme_fioe_fwrap *fwrap = NULL;
+	struct xnvme_fioe_request *fio_req = NULL;
+
+	if (!td->io_ops_data)
+		return;
+
+	xd = td->io_ops_data;
+	fwrap = &xd->files[0];
+
+	if (!fwrap->dev) {
+		log_err("ioeng->io_u_free(): failed; no dev-handle\n");
+		return;
+	}
+
+	fio_req = io_u->engine_data;
+	if (fio_req->md_buf)
+		xnvme_buf_free(fwrap->dev, fio_req->md_buf);
+
+	free(fio_req);
+
 	io_u->mmap_data = NULL;
 }
 
@@ -499,8 +723,10 @@ static int xnvme_fioe_getevents(struct thread_data *td, unsigned int min, unsign
 static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_options *o = td->eo;
 	struct xnvme_fioe_fwrap *fwrap;
 	struct xnvme_cmd_ctx *ctx;
+	struct xnvme_fioe_request *fio_req = io_u->engine_data;
 	uint32_t nsid;
 	uint64_t slba;
 	uint16_t nlb;
@@ -513,8 +739,13 @@ static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *i
 	fwrap = &xd->files[io_u->file->fileno];
 	nsid = xnvme_dev_get_nsid(fwrap->dev);
 
-	slba = io_u->offset >> fwrap->ssw;
-	nlb = (io_u->xfer_buflen >> fwrap->ssw) - 1;
+	if (fwrap->lba_pow2) {
+		slba = io_u->offset >> fwrap->ssw;
+		nlb = (io_u->xfer_buflen >> fwrap->ssw) - 1;
+	} else {
+		slba = io_u->offset / fwrap->lba_nbytes;
+		nlb = (io_u->xfer_buflen / fwrap->lba_nbytes) - 1;
+	}
 
 	ctx = xnvme_queue_get_cmd_ctx(fwrap->queue);
 	ctx->async.cb_arg = io_u;
@@ -545,14 +776,80 @@ static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *i
 		return FIO_Q_COMPLETED;
 	}
 
+	if (fwrap->geo->pi_type && !o->pi_act) {
+		err = xnvme_pi_ctx_init(&fio_req->pi_ctx, fwrap->lba_nbytes,
+					fwrap->geo->nbytes_oob, fwrap->geo->lba_extended,
+					fwrap->geo->pi_loc, fwrap->geo->pi_type,
+					(o->pi_act << 3 | o->prchk), slba, o->apptag_mask,
+					o->apptag, fwrap->geo->pi_format);
+		if (err) {
+			log_err("ioeng->queue(): err: '%d'\n", err);
+
+			xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
+
+			io_u->error = abs(err);
+			return FIO_Q_COMPLETED;
+		}
+
+		if (io_u->ddir == DDIR_WRITE)
+			xnvme_pi_generate(&fio_req->pi_ctx, io_u->xfer_buf, fio_req->md_buf,
+					  nlb + 1);
+	}
+
+	if (fwrap->geo->pi_type)
+		ctx->cmd.nvm.prinfo = (o->pi_act << 3 | o->prchk);
+
+	switch (fwrap->geo->pi_type) {
+	case XNVME_PI_TYPE1:
+	case XNVME_PI_TYPE2:
+		switch (fwrap->geo->pi_format) {
+		case XNVME_SPEC_NVM_NS_16B_GUARD:
+			if (o->prchk & XNVME_PI_FLAGS_REFTAG_CHECK)
+				ctx->cmd.nvm.ilbrt = (uint32_t)slba;
+			break;
+		case XNVME_SPEC_NVM_NS_64B_GUARD:
+			if (o->prchk & XNVME_PI_FLAGS_REFTAG_CHECK) {
+				ctx->cmd.nvm.ilbrt = (uint32_t)slba;
+				ctx->cmd.common.cdw03 = ((slba >> 32) & 0xffff);
+			}
+			break;
+		default:
+			break;
+		}
+		if (o->prchk & XNVME_PI_FLAGS_APPTAG_CHECK) {
+			ctx->cmd.nvm.lbat = o->apptag;
+			ctx->cmd.nvm.lbatm = o->apptag_mask;
+		}
+		break;
+	case XNVME_PI_TYPE3:
+		if (o->prchk & XNVME_PI_FLAGS_APPTAG_CHECK) {
+			ctx->cmd.nvm.lbat = o->apptag;
+			ctx->cmd.nvm.lbatm = o->apptag_mask;
+		}
+		break;
+	case XNVME_PI_DISABLE:
+		break;
+	}
+
 	if (vectored_io) {
 		xd->iovec[io_u->index].iov_base = io_u->xfer_buf;
 		xd->iovec[io_u->index].iov_len = io_u->xfer_buflen;
-
-		err = xnvme_cmd_passv(ctx, &xd->iovec[io_u->index], 1, io_u->xfer_buflen, NULL, 0,
-				      0);
+		if (fwrap->md_nbytes && fwrap->lba_pow2) {
+			xd->md_iovec[io_u->index].iov_base = fio_req->md_buf;
+			xd->md_iovec[io_u->index].iov_len = fwrap->md_nbytes * (nlb + 1);
+			err = xnvme_cmd_passv(ctx, &xd->iovec[io_u->index], 1, io_u->xfer_buflen,
+					      &xd->md_iovec[io_u->index], 1,
+					      fwrap->md_nbytes * (nlb + 1));
+		} else {
+			err = xnvme_cmd_passv(ctx, &xd->iovec[io_u->index], 1, io_u->xfer_buflen,
+					      NULL, 0, 0);
+		}
 	} else {
-		err = xnvme_cmd_pass(ctx, io_u->xfer_buf, io_u->xfer_buflen, NULL, 0);
+		if (fwrap->md_nbytes && fwrap->lba_pow2)
+			err = xnvme_cmd_pass(ctx, io_u->xfer_buf, io_u->xfer_buflen,
+					     fio_req->md_buf, fwrap->md_nbytes * (nlb + 1));
+		else
+			err = xnvme_cmd_pass(ctx, io_u->xfer_buf, io_u->xfer_buflen, NULL, 0);
 	}
 	switch (err) {
 	case 0:
diff --git a/examples/xnvme-pi.fio b/examples/xnvme-pi.fio
new file mode 100644
index 00000000..ca8c0101
--- /dev/null
+++ b/examples/xnvme-pi.fio
@@ -0,0 +1,53 @@
+; README
+;
+; This job-file is intended to be used either as:
+;
+; # Use the xNVMe io-engine with the io_uring_cmd async. impl.
+; fio examples/xnvme-pi.fio \
+;   --ioengine=xnvme \
+;   --xnvme_async=io_uring_cmd \
+;   --filename=/dev/ng0n1
+;
+; # Use the xNVMe io-engine with the nvme sync. impl.
+; fio examples/xnvme-pi.fio \
+;   --ioengine=xnvme \
+;   --xnvme_sync=nvme \
+;   --filename=/dev/ng0n1
+;
+; # Use the xNVMe io-engine with the SPDK backend; note that you have to set the namespace ID
+; fio examples/xnvme-pi.fio \
+;   --ioengine=xnvme \
+;   --xnvme_dev_nsid=1 \
+;   --filename=0000\\:01\\:00.0
+;
+; NOTE: In the URI encoded in the filename above, the ":" must be escaped.
+;
+; On the command-line using two "\\":
+;
+; --filename=0000\\:01\\:00.0
+;
+; Within a fio-script using a single "\":
+;
+; filename=0000\:01\:00.0
+;
+; NOTE: This example configuration assumes that the NVMe device is formatted
+; with a separate metadata buffer. If you want to run on an extended LBA format,
+; update the "bs" accordingly.
+;
+[global]
+size=100M
+iodepth=16
+bs=4K
+md_per_io_size=64
+pi_act=0
+pi_chk=GUARD,APPTAG,REFTAG
+apptag=0x0234
+apptag_mask=0xFFFF
+thread=1
+stonewall=1
+
+[write]
+rw=write
+
+[read]
+rw=read
diff --git a/fio.1 b/fio.1
index 227fcb47..7ec5c745 100644
--- a/fio.1
+++ b/fio.1
@@ -2251,10 +2251,10 @@ By default, the job will cycle through all available Placement IDs, so use this
 to isolate these identifiers to specific jobs. If you want fio to use placement
 identifier only at indices 0, 2 and 5 specify, you would set `fdp_pli=0,2,5`.
 .TP
-.BI (io_uring_cmd)md_per_io_size \fR=\fPint
+.BI (io_uring_cmd,xnvme)md_per_io_size \fR=\fPint
 Size in bytes for separate metadata buffer per IO. Default: 0.
 .TP
-.BI (io_uring_cmd)pi_act \fR=\fPint
+.BI (io_uring_cmd,xnvme)pi_act \fR=\fPint
 Action to take when nvme namespace is formatted with protection information.
 If this is set to 1 and namespace is formatted with metadata size equal to
 protection information size, fio won't use separate metadata buffer or extended
@@ -2268,7 +2268,7 @@ For 16 bit CRC generation fio will use isa-l if available otherwise it will
 use the default slower generator.
 (see: https://github.com/intel/isa-l)
 .TP
-.BI (io_uring_cmd)pi_chk \fR=\fPstr[,str][,str]
+.BI (io_uring_cmd,xnvme)pi_chk \fR=\fPstr[,str][,str]
 Controls the protection information check. This can take one or more of these
 values. Default: none.
 .RS
@@ -2285,11 +2285,11 @@ Enables protection information checking of application tag field.
 .RE
 .RE
 .TP
-.BI (io_uring_cmd)apptag \fR=\fPint
+.BI (io_uring_cmd,xnvme)apptag \fR=\fPint
 Specifies logical block application tag value, if namespace is formatted to use
 end to end protection information. Default: 0x1234.
 .TP
-.BI (io_uring_cmd)apptag_mask \fR=\fPint
+.BI (io_uring_cmd,xnvme)apptag_mask \fR=\fPint
 Specifies logical block application tag mask value, if namespace is formatted
 to use end to end protection information. Default: 0xffff.
 .TP
@@ -3765,12 +3765,14 @@ resulting in more precise time-related I/O statistics.
 Also see \fBlog_avg_msec\fR as well. Defaults to 1024.
 .TP
 .BI log_avg_msec \fR=\fPint
-By default, fio will log an entry in the iops, latency, or bw log for every
-I/O that completes. When writing to the disk log, that can quickly grow to a
-very large size. Setting this option makes fio average the each log entry
-over the specified period of time, reducing the resolution of the log. See
-\fBlog_window_value\fR as well. Defaults to 0, logging all entries.
-Also see \fBLOG FILE FORMATS\fR section.
+By default, fio will log an entry in the iops, latency, or bw log for every I/O
+that completes. When writing to the disk log, that can quickly grow to a very
+large size. Setting this option directs fio to instead record an average over
+the specified duration for each log entry, reducing the resolution of the log.
+When the job completes, fio will flush any accumulated latency log data, so the
+final log interval may not match the value specified by this option and there
+may even be duplicate timestamps. See \fBlog_window_value\fR as well. Defaults
+to 0, logging entries for each I/O. Also see \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_hist_msec \fR=\fPint
 Same as \fBlog_avg_msec\fR, but logs entries for completion latency
diff --git a/helper_thread.c b/helper_thread.c
index 2a9dabf5..332ccb53 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -161,7 +161,6 @@ void helper_thread_exit(void)
 		return;
 
 	helper_data->exit = 1;
-	submit_action(A_EXIT);
 	pthread_join(helper_data->thread, NULL);
 }
 
diff --git a/stat.c b/stat.c
index 11b58626..b98e8b27 100644
--- a/stat.c
+++ b/stat.c
@@ -3576,6 +3576,22 @@ static int add_iops_samples(struct thread_data *td, struct timespec *t)
 				td->ts.iops_stat, td->iops_log, false);
 }
 
+static bool td_in_logging_state(struct thread_data *td)
+{
+	if (in_ramp_time(td))
+		return false;
+
+	switch(td->runstate) {
+	case TD_RUNNING:
+	case TD_VERIFYING:
+	case TD_FINISHING:
+	case TD_EXITED:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * Returns msecs to next event
  */
@@ -3585,15 +3601,13 @@ int calc_log_samples(void)
 	struct timespec now;
 	long elapsed_time = 0;
 
-	fio_gettime(&now, NULL);
-
 	for_each_td(td) {
-		elapsed_time = mtime_since_now(&td->epoch);
+		fio_gettime(&now, NULL);
+		elapsed_time = mtime_since(&td->epoch, &now);
 
 		if (!td->o.stats)
 			continue;
-		if (in_ramp_time(td) ||
-		    !(td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING)) {
+		if (!td_in_logging_state(td)) {
 			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
 			continue;
 		}

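The address and protection-information arithmetic in the xnvme_fioe_queue() hunk above can be shown in isolation. The following is a minimal C sketch, not code from fio or xNVMe. The helpers bytes_to_lba(), make_prinfo() and split_reftag() are hypothetical, and the names ilbrt and cdw03 appear only in comments. As in the hunk, it assumes that nlb is zero-based and that PRINFO carries PRACT in bit 3 above a 3-bit PRCHK field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the two branches added to xnvme_fioe_queue():
 * convert a byte offset and length into a starting LBA (slba) and a
 * zero-based block count (nlb), as NVMe read/write commands expect. */
static void bytes_to_lba(uint64_t offset, uint64_t buflen, uint32_t lba_nbytes,
			 uint64_t *slba, uint16_t *nlb)
{
	assert(lba_nbytes != 0);

	if ((lba_nbytes & (lba_nbytes - 1)) == 0) {
		/* Power-of-two LBA size: ssw = log2(lba_nbytes), so a right
		 * shift is equivalent to a division. */
		unsigned int ssw = 0;

		while ((1u << ssw) < lba_nbytes)
			ssw++;
		*slba = offset >> ssw;
		*nlb = (uint16_t)((buflen >> ssw) - 1);
	} else {
		/* Non-power-of-two size (e.g. a 512 + 8 byte extended LBA):
		 * a shift would give the wrong answer, so divide instead. */
		*slba = offset / lba_nbytes;
		*nlb = (uint16_t)((buflen / lba_nbytes) - 1);
	}
}

/* PRINFO layout as composed in the hunk: PRACT in bit 3, PRCHK in bits 2:0. */
static uint8_t make_prinfo(unsigned int pi_act, unsigned int prchk)
{
	return (uint8_t)(((pi_act & 0x1) << 3) | (prchk & 0x7));
}

/* 64-bit guard PI formats: the low 32 bits of the reference tag (here the
 * starting LBA) go to ilbrt and the next 16 bits to cdw03, as in the hunk. */
static void split_reftag(uint64_t slba, uint32_t *ilbrt, uint16_t *cdw03)
{
	*ilbrt = (uint32_t)slba;
	*cdw03 = (uint16_t)((slba >> 32) & 0xffff);
}
```

With 4096-byte LBAs, offset 8192 and length 16384 give slba 2 and nlb 3 via the shift. With a 520-byte extended LBA, the same request falls back to division. Shifting by ssw only equals division when the LBA size is a power of two, which is presumably why the hunk adds the lba_pow2 branch.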
* Recent changes (master)
@ 2024-02-13 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-13 13:00 UTC
  To: fio

The following changes since commit 9cfa60d874e9a1da057677619a370409428ea3cf:

  verify: fix potential overflow before widen (2024-02-08 17:45:41 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 097af663e93f86106f32580bfa59e68fae007035:

  Merge branch 'vsock' of https://github.com/MPinna/fio (2024-02-12 11:56:33 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'vsock' of https://github.com/MPinna/fio

Marco Pinna (1):
      Add support for VSOCK to engine/net.c

 HOWTO.rst                         |   7 +-
 configure                         |  22 +++++++
 engines/net.c                     | 132 +++++++++++++++++++++++++++++++++++++-
 examples/netio_vsock.fio          |  22 +++++++
 examples/netio_vsock_receiver.fio |  14 ++++
 examples/netio_vsock_sender.fio   |  17 +++++
 fio.1                             |   9 ++-
 7 files changed, 217 insertions(+), 6 deletions(-)
 create mode 100644 examples/netio_vsock.fio
 create mode 100644 examples/netio_vsock_receiver.fio
 create mode 100644 examples/netio_vsock_sender.fio

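In the VSOCK support added here, the hostname option carries a numeric context ID (CID) instead of a host name. fio_netio_setup_connect_vsock() parses it with atoi() into an int. That call returns 0 for non-numeric input and does not reliably handle values above INT_MAX. The sketch below is a hypothetical alternative that validates the CID with strtoull() before it fills a struct sockaddr_vm from <linux/vm_sockets.h>; fill_vsock_addr() is not a fio function. The sketch assumes a Linux build where that header is available, which is the same condition the new configure check tests for.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill a sockaddr_vm from a textual CID (the value of
 * fio's hostname option when protocol=vsock) and a port.
 * Returns 0 on success or -EINVAL if host is not a valid 32-bit CID. */
static int fill_vsock_addr(struct sockaddr_vm *addr, const char *host,
			   unsigned int port)
{
	unsigned long long cid;
	char *end;

	assert(addr != NULL);

	/* Reject missing/empty input and explicit negatives, which strtoull()
	 * would otherwise accept and wrap around. */
	if (!host || *host == '\0' || *host == '-')
		return -EINVAL;

	errno = 0;
	cid = strtoull(host, &end, 10);
	if (errno != 0 || *end != '\0' || cid > UINT32_MAX)
		return -EINVAL;

	memset(addr, 0, sizeof(*addr));	/* also clears the reserved fields */
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = (unsigned int)cid;
	addr->svm_port = port;		/* stored as-is, no htons(), as in the hunk */
	return 0;
}
```

A sender job with hostname=3 and port=8888, as in examples/netio_vsock_sender.fio, would correspond to fill_vsock_addr(&addr, "3", 8888) followed by connect() on an AF_VSOCK, SOCK_STREAM socket.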
---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index ba160551..53b03021 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2626,10 +2626,13 @@ with the caveat that when used on the command line, they must come after the
 		User datagram protocol V6.
 	**unix**
 		UNIX domain socket.
+	**vsock**
+		VSOCK protocol.
 
-	When the protocol is TCP or UDP, the port must also be given, as well as the
-	hostname if the job is a TCP listener or UDP reader. For unix sockets, the
+	When the protocol is TCP, UDP or VSOCK, the port must also be given, as well as the
+	hostname if the job is a TCP or VSOCK listener or UDP reader. For unix sockets, the
 	normal :option:`filename` option should be used and the port is invalid.
+	When the protocol is VSOCK, the :option:`hostname` is the CID of the remote VM.
 
 .. option:: listen : [netsplice] [net]
 
diff --git a/configure b/configure
index dea8d07d..becb193e 100755
--- a/configure
+++ b/configure
@@ -1728,6 +1728,25 @@ elif compile_prog "" "-lws2_32" "TCP_NODELAY"; then
 fi
 print_config "TCP_NODELAY" "$tcp_nodelay"
 
+##########################################
+# Check whether we have vsock
+if test "$vsock" != "yes" ; then
+  vsock="no"
+fi
+cat > $TMPC << EOF
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/vm_sockets.h>
+int main(int argc, char **argv)
+{
+  return socket(AF_VSOCK, SOCK_STREAM, 0);
+}
+EOF
+if compile_prog "" "" "vsock"; then
+  vsock="yes"
+fi
+print_config "vsock" "$vsock"
+
 ##########################################
 # Check whether we have SO_SNDBUF
 if test "$window_size" != "yes" ; then
@@ -3192,6 +3211,9 @@ fi
 if test "$ipv6" = "yes" ; then
   output_sym "CONFIG_IPV6"
 fi
+if test "$vsock" = "yes"; then
+  output_sym "CONFIG_VSOCK"
+fi
 if test "$http" = "yes" ; then
   output_sym "CONFIG_HTTP"
 fi
diff --git a/engines/net.c b/engines/net.c
index fec53d74..29150bb3 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -18,6 +18,16 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 
+#ifdef CONFIG_VSOCK
+#include <linux/vm_sockets.h>
+#else
+struct sockaddr_vm {
+};
+#ifndef AF_VSOCK
+#define AF_VSOCK	-1
+#endif
+#endif
+
 #include "../fio.h"
 #include "../verify.h"
 #include "../optgroup.h"
@@ -30,6 +40,7 @@ struct netio_data {
 	struct sockaddr_in addr;
 	struct sockaddr_in6 addr6;
 	struct sockaddr_un addr_un;
+	struct sockaddr_vm addr_vm;
 	uint64_t udp_send_seq;
 	uint64_t udp_recv_seq;
 };
@@ -69,6 +80,7 @@ enum {
 	FIO_TYPE_UNIX	= 3,
 	FIO_TYPE_TCP_V6	= 4,
 	FIO_TYPE_UDP_V6	= 5,
+	FIO_TYPE_VSOCK_STREAM   = 6,
 };
 
 static int str_hostname_cb(void *data, const char *input);
@@ -126,6 +138,10 @@ static struct fio_option options[] = {
 			    .oval = FIO_TYPE_UNIX,
 			    .help = "UNIX domain socket",
 			  },
+			  { .ival = "vsock",
+			    .oval = FIO_TYPE_VSOCK_STREAM,
+			    .help = "Virtual socket",
+			  },
 		},
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_NETIO,
@@ -223,6 +239,11 @@ static inline int is_ipv6(struct netio_options *o)
 	return o->proto == FIO_TYPE_UDP_V6 || o->proto == FIO_TYPE_TCP_V6;
 }
 
+static inline int is_vsock(struct netio_options *o)
+{
+	return o->proto == FIO_TYPE_VSOCK_STREAM;
+}
+
 static int set_window_size(struct thread_data *td, int fd)
 {
 #ifdef CONFIG_NET_WINDOWSIZE
@@ -732,6 +753,9 @@ static int fio_netio_connect(struct thread_data *td, struct fio_file *f)
 	} else if (o->proto == FIO_TYPE_UNIX) {
 		domain = AF_UNIX;
 		type = SOCK_STREAM;
+	} else if (is_vsock(o)) {
+		domain = AF_VSOCK;
+		type = SOCK_STREAM;
 	} else {
 		log_err("fio: bad network type %d\n", o->proto);
 		f->fd = -1;
@@ -809,7 +833,14 @@ static int fio_netio_connect(struct thread_data *td, struct fio_file *f)
 			close(f->fd);
 			return 1;
 		}
+	} else if (is_vsock(o)) {
+		socklen_t len = sizeof(nd->addr_vm);
 
+		if (connect(f->fd, (struct sockaddr *) &nd->addr_vm, len) < 0) {
+			td_verror(td, errno, "connect");
+			close(f->fd);
+			return 1;
+		}
 	} else {
 		struct sockaddr_un *addr = &nd->addr_un;
 		socklen_t len;
@@ -849,6 +880,9 @@ static int fio_netio_accept(struct thread_data *td, struct fio_file *f)
 	if (o->proto == FIO_TYPE_TCP) {
 		socklen = sizeof(nd->addr);
 		f->fd = accept(nd->listenfd, (struct sockaddr *) &nd->addr, &socklen);
+	} else if (is_vsock(o)) {
+		socklen = sizeof(nd->addr_vm);
+		f->fd = accept(nd->listenfd, (struct sockaddr *) &nd->addr_vm, &socklen);
 	} else {
 		socklen = sizeof(nd->addr6);
 		f->fd = accept(nd->listenfd, (struct sockaddr *) &nd->addr6, &socklen);
@@ -890,6 +924,9 @@ static void fio_netio_send_close(struct thread_data *td, struct fio_file *f)
 	if (is_ipv6(o)) {
 		to = (struct sockaddr *) &nd->addr6;
 		len = sizeof(nd->addr6);
+	} else if (is_vsock(o)) {
+		to = NULL;
+		len = 0;
 	} else {
 		to = (struct sockaddr *) &nd->addr;
 		len = sizeof(nd->addr);
@@ -960,6 +997,9 @@ static int fio_netio_send_open(struct thread_data *td, struct fio_file *f)
 	if (is_ipv6(o)) {
 		len = sizeof(nd->addr6);
 		to = (struct sockaddr *) &nd->addr6;
+	} else if (is_vsock(o)) {
+		len = sizeof(nd->addr_vm);
+		to = (struct sockaddr *) &nd->addr_vm;
 	} else {
 		len = sizeof(nd->addr);
 		to = (struct sockaddr *) &nd->addr;
@@ -1023,13 +1063,17 @@ static int fio_fill_addr(struct thread_data *td, const char *host, int af,
 
 	memset(&hints, 0, sizeof(hints));
 
-	if (is_tcp(o))
+	if (is_tcp(o) || is_vsock(o))
 		hints.ai_socktype = SOCK_STREAM;
 	else
 		hints.ai_socktype = SOCK_DGRAM;
 
 	if (is_ipv6(o))
 		hints.ai_family = AF_INET6;
+#ifdef CONFIG_VSOCK
+	else if (is_vsock(o))
+		hints.ai_family = AF_VSOCK;
+#endif
 	else
 		hints.ai_family = AF_INET;
 
@@ -1110,12 +1154,50 @@ static int fio_netio_setup_connect_unix(struct thread_data *td,
 	return 0;
 }
 
+static int fio_netio_setup_connect_vsock(struct thread_data *td,
+					const char *host, unsigned short port)
+{
+#ifdef CONFIG_VSOCK
+	struct netio_data *nd = td->io_ops_data;
+	struct sockaddr_vm *addr = &nd->addr_vm;
+	int cid;
+
+	if (!host) {
+		log_err("fio: connect with no host to connect to.\n");
+		if (td_read(td))
+			log_err("fio: did you forget to set 'listen'?\n");
+
+		td_verror(td, EINVAL, "no hostname= set");
+		return 1;
+	}
+
+	addr->svm_family = AF_VSOCK;
+	addr->svm_port = port;
+
+	if (host) {
+		cid = atoi(host);
+		if (cid < 0 || cid > UINT32_MAX) {
+			log_err("fio: invalid CID %d\n", cid);
+			return 1;
+		}
+		addr->svm_cid = cid;
+	}
+
+	return 0;
+#else
+	td_verror(td, -EINVAL, "vsock not supported");
+	return 1;
+#endif
+}
+
 static int fio_netio_setup_connect(struct thread_data *td)
 {
 	struct netio_options *o = td->eo;
 
 	if (is_udp(o) || is_tcp(o))
 		return fio_netio_setup_connect_inet(td, td->o.filename,o->port);
+	else if (is_vsock(o))
+		return fio_netio_setup_connect_vsock(td, td->o.filename, o->port);
 	else
 		return fio_netio_setup_connect_unix(td, td->o.filename);
 }
@@ -1268,6 +1350,47 @@ static int fio_netio_setup_listen_inet(struct thread_data *td, short port)
 	return 0;
 }
 
+static int fio_netio_setup_listen_vsock(struct thread_data *td, short port, int type)
+{
+#ifdef CONFIG_VSOCK
+	struct netio_data *nd = td->io_ops_data;
+	struct sockaddr_vm *addr = &nd->addr_vm;
+	int fd, opt;
+	socklen_t len;
+
+	fd = socket(AF_VSOCK, type, 0);
+	if (fd < 0) {
+		td_verror(td, errno, "socket");
+		return 1;
+	}
+
+	opt = 1;
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (void *) &opt, sizeof(opt)) < 0) {
+		td_verror(td, errno, "setsockopt");
+		close(fd);
+		return 1;
+	}
+
+	len = sizeof(*addr);
+
+	nd->addr_vm.svm_family = AF_VSOCK;
+	nd->addr_vm.svm_cid = VMADDR_CID_ANY;
+	nd->addr_vm.svm_port = port;
+
+	if (bind(fd, (struct sockaddr *) addr, len) < 0) {
+		td_verror(td, errno, "bind");
+		close(fd);
+		return 1;
+	}
+
+	nd->listenfd = fd;
+	return 0;
+#else
+	td_verror(td, -EINVAL, "vsock not supported");
+	return -1;
+#endif
+}
+
 static int fio_netio_setup_listen(struct thread_data *td)
 {
 	struct netio_data *nd = td->io_ops_data;
@@ -1276,6 +1399,8 @@ static int fio_netio_setup_listen(struct thread_data *td)
 
 	if (is_udp(o) || is_tcp(o))
 		ret = fio_netio_setup_listen_inet(td, o->port);
+	else if (is_vsock(o))
+		ret = fio_netio_setup_listen_vsock(td, o->port, SOCK_STREAM);
 	else
 		ret = fio_netio_setup_listen_unix(td, td->o.filename);
 
@@ -1311,6 +1436,9 @@ static int fio_netio_init(struct thread_data *td)
 	if (o->proto == FIO_TYPE_UNIX && o->port) {
 		log_err("fio: network IO port not valid with unix socket\n");
 		return 1;
+	} else if (is_vsock(o) && !o->port) {
+		log_err("fio: network IO requires port for vsock\n");
+		return 1;
 	} else if (o->proto != FIO_TYPE_UNIX && !o->port) {
 		log_err("fio: network IO requires port for tcp or udp\n");
 		return 1;
@@ -1318,7 +1446,7 @@ static int fio_netio_init(struct thread_data *td)
 
 	o->port += td->subjob_number;
 
-	if (!is_tcp(o)) {
+	if (!is_tcp(o) && !is_vsock(o)) {
 		if (o->listen) {
 			log_err("fio: listen only valid for TCP proto IO\n");
 			return 1;
diff --git a/examples/netio_vsock.fio b/examples/netio_vsock.fio
new file mode 100644
index 00000000..8c328f7d
--- /dev/null
+++ b/examples/netio_vsock.fio
@@ -0,0 +1,22 @@
+# Example network vsock job, just defines two clients that send/recv data
+[global]
+ioengine=net
+
+port=8888
+protocol=vsock
+bs=4k
+size=100g
+
+#set the below option to enable end-to-end data integrity tests
+#verify=md5
+
+[receiver]
+listen
+rw=read
+
+[sender]
+# 1 (VMADDR_CID_LOCAL) is the well-known address
+# for local communication (loopback)
+hostname=1
+startdelay=1
+rw=write
diff --git a/examples/netio_vsock_receiver.fio b/examples/netio_vsock_receiver.fio
new file mode 100644
index 00000000..e2a00c4d
--- /dev/null
+++ b/examples/netio_vsock_receiver.fio
@@ -0,0 +1,14 @@
+# Example network vsock job, just defines a receiver
+[global]
+ioengine=net
+port=8888
+protocol=vsock
+bs=4k
+size=100g
+
+#set the below option to enable end-to-end data integrity tests
+#verify=md5
+
+[receiver]
+listen
+rw=read
diff --git a/examples/netio_vsock_sender.fio b/examples/netio_vsock_sender.fio
new file mode 100644
index 00000000..2451d990
--- /dev/null
+++ b/examples/netio_vsock_sender.fio
@@ -0,0 +1,17 @@
+# Example network vsock job, just defines a sender
+[global]
+ioengine=net
+port=8888
+protocol=vsock
+bs=4k
+size=100g
+
+#set the below option to enable end-to-end data integrity tests
+#verify=md5
+
+[sender]
+# set the 'hostname' option to the CID of the listening domain
+hostname=3
+startdelay=1
+rw=write
+
diff --git a/fio.1 b/fio.1
index aef1dc85..227fcb47 100644
--- a/fio.1
+++ b/fio.1
@@ -2376,11 +2376,16 @@ User datagram protocol V6.
 .TP
 .B unix
 UNIX domain socket.
+.TP
+.B vsock
+VSOCK protocol.
 .RE
 .P
-When the protocol is TCP or UDP, the port must also be given, as well as the
-hostname if the job is a TCP listener or UDP reader. For unix sockets, the
+When the protocol is TCP, UDP or VSOCK, the port must also be given, as well as the
+hostname if the job is a TCP or VSOCK listener or UDP reader. For unix sockets, the
 normal \fBfilename\fR option should be used and the port is invalid.
+When the protocol is VSOCK, the \fBhostname\fR is the CID of the remote VM.
+
 .RE
 .TP
 .BI (netsplice,net)listen

* Recent changes (master)
@ 2024-02-09 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-09 13:00 UTC
  To: fio

The following changes since commit 12067650d11d4777dee0cd64a136923c2fd2d073:

  t/zbd: add -s option to test-zbd-support script (2024-02-07 08:43:13 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9cfa60d874e9a1da057677619a370409428ea3cf:

  verify: fix potential overflow before widen (2024-02-08 17:45:41 -0500)

----------------------------------------------------------------
Oleg Krasnov (1):
      fix wrong offset for VERIFY_PATTERN_NO_HDR

Vincent Fu (2):
      Merge branch 'fix-offset' of https://github.com/onkrasnov/fio
      verify: fix potential overflow before widen

 verify.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

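The "overflow before widen" fix in log_verify_failure() addresses a general C pitfall. When both operands of a multiplication are 32-bit unsigned, the product is computed modulo 2^32 and only then converted to the 64-bit type of offset. The following is a minimal sketch that assumes a 32-bit unsigned int, and 32-bit hdr_num and len as the types in the hunk suggest. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Before the fix: hdr_num * len is evaluated in 32-bit unsigned arithmetic,
 * so it wraps modulo 2^32 before being widened and added to the offset. */
static unsigned long long offset_narrow(unsigned long long base,
					unsigned int hdr_num, uint32_t len)
{
	return base + hdr_num * len;
}

/* After the fix: widening one operand first makes the whole multiplication
 * happen in unsigned long long. */
static unsigned long long offset_wide(unsigned long long base,
				      unsigned int hdr_num, uint32_t len)
{
	return base + (unsigned long long) hdr_num * len;
}
```

With hdr_num = 16384 and len = 256KiB, the true product is 4GiB, which the narrow version wraps to 0. Casting one operand to unsigned long long, as the patch does, moves the entire multiplication into 64 bits.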
---

Diff of recent changes:

diff --git a/verify.c b/verify.c
index 78f333e6..b438eed6 100644
--- a/verify.c
+++ b/verify.c
@@ -338,12 +338,20 @@ static void dump_verify_buffers(struct verify_header *hdr, struct vcont *vc)
 static void log_verify_failure(struct verify_header *hdr, struct vcont *vc)
 {
 	unsigned long long offset;
+	uint32_t len;
+	struct thread_data *td = vc->td;
 
 	offset = vc->io_u->verify_offset;
-	offset += vc->hdr_num * hdr->len;
+	if (td->o.verify != VERIFY_PATTERN_NO_HDR) {
+		len = hdr->len;
+		offset += (unsigned long long) vc->hdr_num * len;
+	} else {
+		len = vc->io_u->buflen;
+	}
+
 	log_err("%.8s: verify failed at file %s offset %llu, length %u"
 			" (requested block: offset=%llu, length=%llu, flags=%x)\n",
-			vc->name, vc->io_u->file->file_name, offset, hdr->len,
+			vc->name, vc->io_u->file->file_name, offset, len,
 			vc->io_u->verify_offset, vc->io_u->buflen, vc->io_u->flags);
 
 	if (vc->good_crc && vc->bad_crc) {

* Recent changes (master)
@ 2024-02-08 13:00 Jens Axboe
From: Jens Axboe @ 2024-02-08 13:00 UTC
  To: fio

The following changes since commit 625b155dcd3d56595ced60806e091126446c1e08:

  examples: cmdprio_bssplit: add CDL example (2024-01-27 11:18:57 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 12067650d11d4777dee0cd64a136923c2fd2d073:

  t/zbd: add -s option to test-zbd-support script (2024-02-07 08:43:13 -0500)

----------------------------------------------------------------
Dmitry Fomichev (5):
      zbd: avoid assertions during sequential read I/O
      oslib: log BLKREPORTZONE error code
      zbd: use a helper to calculate zone index
      t/zbd: check device for unrestricted read support
      t/zbd: add -s option to test-zbd-support script

 oslib/linux-blkzoned.c |  2 ++
 t/zbd/functions        | 22 ++++++++++++++++++++++
 t/zbd/test-zbd-support | 20 ++++++++++++++++++--
 zbd.c                  | 21 +++++++++++++++------
 4 files changed, 57 insertions(+), 8 deletions(-)

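With the zbd.c change, read-only jobs still skip start-offset alignment, but io_size is now trimmed so that the range ends at the start of a zone. The following is a simplified model that assumes a uniform zone size; real devices may differ, and fio looks zones up with zbd_offset_to_zone() instead. round_read_io_size() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the new read-only branch of
 * zbd_zone_align_file_sizes(): with uniform zones of zone_size bytes, trim
 * io_size so the I/O range ends at the start of the zone that contains
 * file_offset + io_size. The start offset is left unaligned. */
static uint64_t round_read_io_size(uint64_t file_offset, uint64_t io_size,
				   uint64_t zone_size)
{
	uint64_t end = file_offset + io_size;
	uint64_t new_end;

	assert(zone_size != 0);
	new_end = end / zone_size * zone_size;	/* start of the zone holding end */

	/* Guard against the end falling in the same zone as the start, so the
	 * subtraction below can never underflow. */
	if (new_end <= file_offset)
		return 0;
	return end > new_end ? new_end - file_offset : io_size;
}
```

For example, reading 10MiB from offset 0 with 4MiB zones is trimmed to 8MiB. Unlike the hunk, the sketch returns 0 when the trimmed end would not lie past the start offset, a case that the new read branch does not check explicitly.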
---

Diff of recent changes:

diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 2c3ecf33..1cc8d288 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -242,6 +242,8 @@ int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
 	hdr->sector = offset >> 9;
 	ret = ioctl(fd, BLKREPORTZONE, hdr);
 	if (ret) {
+		log_err("%s: BLKREPORTZONE ioctl failed, ret=%d, err=%d.\n",
+			f->file_name, ret, -errno);
 		ret = -errno;
 		goto out;
 	}
diff --git a/t/zbd/functions b/t/zbd/functions
index 028df404..7734371e 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -290,6 +290,28 @@ min_seq_write_size() {
 	fi
 }
 
+urswrz() {
+    local dev=$1
+
+    if [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
+	if ! ${sg_inq} -e --page=0xB6 --len=10 --hex "$dev" \
+		 > /dev/null 2>&1; then
+	    # Couldn't get URSWRZ bit. Assume the reads are unrestricted
+	    # because this configuration is more common.
+	    echo 1
+	else
+	    ${sg_inq} -e --page=0xB6 --len=10 --hex "$dev" | tail -1 |
+		{
+		    read -r offset b0 b1 b2 b3 b4 trailer && \
+			echo $(( $b4 & 0x01 )) || echo 0
+		}
+	fi
+    else
+	${zbc_info} "$dev" |
+	    sed -n 's/^[[:blank:]].*Read commands are \(un\)restricted*/\1/p' | grep -q ^ && echo 1 || echo 0
+    fi
+}
+
 is_zbc() {
 	local dev=$1
 
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 532860eb..c27d2ad6 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -15,6 +15,7 @@ usage() {
 	echo -e "\t-w Reset all zones before executing each write test case"
 	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
 	echo -e "\t-t <test #> Run only a single test case with specified number"
+	echo -e "\t-s <test #> Start testing from the case with the specified number"
 	echo -e "\t-q Quit the test run after any failed test"
 	echo -e "\t-z Run fio with debug=zbd option"
 	echo -e "\t-u Use io_uring ioengine in place of libaio"
@@ -412,8 +413,16 @@ test4() {
     opts+=("--size=$size" "--thread=1" "--read_beyond_wp=1")
     opts+=("$(ioengine "psync")" "--rw=read" "--direct=1" "--disable_lat=1")
     opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_read $size || return $?
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
+    fio_rc=$?
+    if [[ $unrestricted_reads != 0 ]]; then
+	if [[ $fio_rc != 0 ]]; then
+		return "$fio_rc"
+	fi
+	check_read $size || return $?
+    else
+        [ $fio_rc == 0 ] && return 1 || return 0
+    fi
 }
 
 # Sequential write to sequential zones.
@@ -1594,6 +1603,7 @@ zbd_debug=
 max_open_zones_opt=
 quit_on_err=
 force_io_uring=
+start_test=1
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -1607,6 +1617,7 @@ while [ "${1#-}" != "$1" ]; do
     -w) reset_before_write=1; shift;;
     -t) tests+=("$2"); shift; shift;;
     -o) max_open_zones_opt="${2}"; shift; shift;;
+    -s) start_test=$2; shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
 	shift;;
     -q) quit_on_err=1; shift;;
@@ -1664,6 +1675,7 @@ if [[ -b "$realdev" ]]; then
 		first_sequential_zone_sector=${result[0]}
 		sectors_per_zone=${result[1]}
 		zone_size=$((sectors_per_zone * 512))
+		unrestricted_reads=$(urswrz "$dev")
 		if ! max_open_zones=$(max_open_zones "$dev"); then
 			echo "Failed to determine maximum number of open zones"
 			exit 1
@@ -1681,9 +1693,11 @@ if [[ -b "$realdev" ]]; then
 		sectors_per_zone=$((zone_size / 512))
 		max_open_zones=128
 		max_active_zones=0
+		unrestricted_reads=1
 		set_io_scheduler "$basename" none || exit $?
 		;;
 	esac
+
 elif [[ -c "$realdev" ]]; then
 	# For an SG node, we must have libzbc option specified
 	if [[ ! -n "$use_libzbc" ]]; then
@@ -1712,6 +1726,7 @@ elif [[ -c "$realdev" ]]; then
 	first_sequential_zone_sector=${result[0]}
 	sectors_per_zone=${result[1]}
 	zone_size=$((sectors_per_zone * 512))
+	unrestricted_reads=$(urswrz "$dev")
 	if ! max_open_zones=$(max_open_zones "$dev"); then
 		echo "Failed to determine maximum number of open zones"
 		exit 1
@@ -1761,6 +1776,7 @@ trap 'intr=1' SIGINT
 ret=0
 
 for test_number in "${tests[@]}"; do
+    [ "${test_number}" -lt "${start_test}" ] && continue
     rm -f "${logfile}.${test_number}"
     unset SKIP_REASON
     echo -n "Running test $(printf "%02d" $test_number) ... "
diff --git a/zbd.c b/zbd.c
index 61b5b688..37417660 100644
--- a/zbd.c
+++ b/zbd.c
@@ -104,8 +104,7 @@ static void zone_lock(struct thread_data *td, const struct fio_file *f,
 		      struct fio_zone_info *z)
 {
 #ifndef NDEBUG
-	struct zoned_block_device_info *zbd = f->zbd_info;
-	uint32_t const nz = z - zbd->zone_info;
+	unsigned int const nz = zbd_zone_idx(f, z);
 	/* A thread should never lock zones outside its working area. */
 	assert(f->min_zone <= nz && nz < f->max_zone);
 	assert(z->has_wp);
@@ -674,9 +673,20 @@ static bool zbd_zone_align_file_sizes(struct thread_data *td,
 		return false;
 	}
 
+	if (td->o.td_ddir == TD_DDIR_READ) {
+		z = zbd_offset_to_zone(f, f->file_offset + f->io_size);
+		new_end = z->start;
+		if (f->file_offset + f->io_size > new_end) {
+			log_info("%s: rounded io_size from %"PRIu64" to %"PRIu64"\n",
+				 f->file_name, f->io_size,
+				 new_end - f->file_offset);
+			f->io_size = new_end - f->file_offset;
+		}
+		return true;
+	}
+
 	z = zbd_offset_to_zone(f, f->file_offset);
-	if ((f->file_offset != z->start) &&
-	    (td->o.td_ddir != TD_DDIR_READ)) {
+	if (f->file_offset != z->start) {
 		new_offset = zbd_zone_end(z);
 		if (new_offset >= f->file_offset + f->io_size) {
 			log_info("%s: io_size must be at least one zone\n",
@@ -692,8 +702,7 @@ static bool zbd_zone_align_file_sizes(struct thread_data *td,
 
 	z = zbd_offset_to_zone(f, f->file_offset + f->io_size);
 	new_end = z->start;
-	if ((td->o.td_ddir != TD_DDIR_READ) &&
-	    (f->file_offset + f->io_size != new_end)) {
+	if (f->file_offset + f->io_size != new_end) {
 		if (new_end <= f->file_offset) {
 			log_info("%s: io_size must be at least one zone\n",
 				 f->file_name);

* Recent changes (master)
@ 2024-01-28 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-28 13:00 UTC
  To: fio

The following changes since commit 90a0fbc83c59ddfacc3e90dcb7721ff322bfe26f:

  Merge branch 'coverity-fix' of https://github.com/ankit-sam/fio (2024-01-25 10:20:10 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 625b155dcd3d56595ced60806e091126446c1e08:

  examples: cmdprio_bssplit: add CDL example (2024-01-27 11:18:57 -0500)

----------------------------------------------------------------
Niklas Cassel (2):
      examples: cmdprio_bssplit: s,IO,I/O,
      examples: cmdprio_bssplit: add CDL example

 examples/cmdprio-bssplit.fio | 47 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

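The percentages in the cmdprio_bssplit comments compose multiplicatively. A cmdprio_bssplit entry's percentage applies only to I/Os of its block size, and that block size's share of all I/Os comes from bssplit. The following small C sketch shows that arithmetic. struct prio_entry and share_of_all() are hypothetical and do not reflect how fio parses the option internally.

```c
#include <assert.h>

/* One cmdprio_bssplit entry, modelled on the bs/pct/class/level[/hint]
 * entries used in the example job file. */
struct prio_entry {
	unsigned int bs_kib;	/* block size the entry applies to */
	double pct;		/* percentage of I/Os of that block size */
	int prio_class;		/* I/O priority class */
	int prio_level;		/* priority level within the class */
	int prio_hint;		/* priority hint (CDL descriptor), 0 if none */
};

/* Percentage of all I/Os in the data direction covered by entry e, given the
 * percentage that bssplit assigns to e's block size. */
static double share_of_all(double bs_pct, const struct prio_entry *e)
{
	assert(bs_pct >= 0.0 && bs_pct <= 100.0);
	assert(e->pct >= 0.0 && e->pct <= 100.0);
	return bs_pct * e->pct / 100.0;
}
```

For [cmdprio-adv], 25% and 75% of the 40% of reads that are 64kB give 10% of reads at class 1, level 1 and 30% at class 3, level 2. That leaves the 60% of 1MB reads without a priority. The same calculation yields the 4% figure quoted for CDL descriptor 1 in [cmdprio-hints].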
---

Diff of recent changes:

diff --git a/examples/cmdprio-bssplit.fio b/examples/cmdprio-bssplit.fio
index f3b2fac0..ee202d74 100644
--- a/examples/cmdprio-bssplit.fio
+++ b/examples/cmdprio-bssplit.fio
@@ -12,9 +12,9 @@ iodepth=16
 ; use the same prio class and prio level defined by the cmdprio_class
 ; and cmdprio options.
 [cmdprio]
-; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 40% of read I/Os are 64kB and 60% are 1MB. 100% of writes are 1MB.
 ; 100% of the 64kB reads are executed with prio class 1 and prio level 0.
-; All other IOs are executed without a priority set.
+; All other I/Os are executed without a priority set.
 bssplit=64k/40:1024k/60,1024k/100
 cmdprio_bssplit=64k/100:1024k/0,1024k/0
 cmdprio_class=1
@@ -23,22 +23,57 @@ cmdprio=0
 ; Advanced cmdprio_bssplit format. Each non-zero percentage entry can
 ; use a different prio class and prio level (appended to each entry).
 [cmdprio-adv]
-; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 40% of read I/Os are 64kB and 60% are 1MB. 100% of writes are 1MB.
 ; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
 ; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
-; All other IOs are executed without a priority set.
+; All other I/Os are executed without a priority set.
 stonewall
 bssplit=64k/40:1024k/60,1024k/100
 cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
 
 ; Identical to the previous example, but with a default priority defined.
 [cmdprio-adv-def]
-; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 40% of read I/Os are 64kB and 60% are 1MB. 100% of writes are 1MB.
 ; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
 ; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
-; All other IOs are executed with prio class 2 and prio level 7.
+; All other I/Os are executed with prio class 2 and prio level 7.
 stonewall
 prioclass=2
 prio=7
 bssplit=64k/40:1024k/60,1024k/100
 cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
+
+; Example of how to use cmdprio_bssplit with Command Duration Limits (CDL)
+; using I/O priority hints. The drive has to support CDL, and CDL has to be
+; enabled in sysfs, otherwise the hints will not be sent down to the drive.
+[cmdprio-hints]
+; 40% of the I/Os are 1MB reads and 60% of the I/Os are 2MB reads.
+;
+; 10% of the 1MB reads are executed with prio class 2 (Best Effort),
+; prio level 0, and prio hint 1. Prio hint 1 means CDL descriptor 1.
+; Since 40% of read I/Os are 1MB, and 10% of the 1MB I/Os use CDL desc 1,
+; this means that 4% of all the issued I/O will use this configuration.
+;
+; 30% of the 1MB reads are executed with prio class 2 (Best Effort),
+; prio level 0, and prio hint 2. Prio hint 2 means CDL descriptor 2.
+; Since 40% of read I/Os are 1MB, and 30% of the 1MB I/Os use CDL desc 2,
+; this means that 12% of all the issued I/O will use this configuration.
+;
+; 60% of the 1MB reads are executed with prio class 2 (Best Effort),
+; prio level 0, and prio hint 0. Prio hint 0 means no hint.
+; Since 40% of read I/Os are 1MB, and 60% of the 1MB I/Os use no hint,
+; this means that 24% of all the issued I/O will use this configuration.
+;
+; 10% of the 2MB reads are executed with prio class 2 (Best Effort),
+; prio level 0, and prio hint 3. Prio hint 3 means CDL descriptor 3.
+; Since 60% of read I/Os are 2MB, and 10% of the 2MB I/Os use CDL desc 3,
+; this means that 6% of all the issued I/O will use this configuration.
+;
+; 90% of the 2MB reads are executed with prio class 2 (Best Effort),
+; prio level 0, and prio hint 0. Prio hint 0 means no hint.
+; Since 60% of read I/Os are 2MB, and 90% of the 2MB I/Os use no hint,
+; this means that 54% of all the issued I/O will use this configuration.
+stonewall
+rw=randread
+bssplit=1M/40:2M/60
+cmdprio_bssplit=1M/10/2/0/1:1M/30/2/0/2:1M/60/2/0/0:2M/10/2/0/3:2M/90/2/0/0

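The percentages in the `[cmdprio-hints]` comments compose multiplicatively: the share of all issued I/Os that uses a given entry is the bssplit share of that block size times the entry's cmdprio_bssplit percentage, divided by 100. The C sketch below shows that arithmetic, plus a parser for one advanced entry of the form bs/percentage/class/level/hint as used in the job above. The struct and function names are hypothetical; this is an illustration, not fio's own option parser.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of one advanced cmdprio_bssplit entry,
 * written as bs/percentage/class/level/hint, e.g. "1M/10/2/0/1". */
struct cmdprio_entry {
	char bs[16];             /* block size string, e.g. "1M" */
	unsigned int pct;        /* share of the I/Os of this block size */
	unsigned int prio_class; /* I/O priority class, 2 = best effort */
	unsigned int level;      /* priority level within the class */
	unsigned int hint;       /* priority hint; with CDL, the descriptor */
};

/* Parse a single entry (no ':' separators). Returns 1 on success,
 * 0 if any of the five fields is missing. */
static int parse_cmdprio_entry(const char *s, struct cmdprio_entry *e)
{
	return sscanf(s, "%15[^/]/%u/%u/%u/%u", e->bs, &e->pct,
		      &e->prio_class, &e->level, &e->hint) == 5;
}

/* Percentage of all issued I/Os that use an entry, given the bssplit
 * percentage of its block size and the entry's own percentage. */
static double overall_pct(unsigned int bssplit_pct, unsigned int cmdprio_pct)
{
	assert(bssplit_pct <= 100 && cmdprio_pct <= 100);
	return bssplit_pct * cmdprio_pct / 100.0;
}
```

For example, `1M/30/2/0/2` parses to a 30% share with class 2, level 0 and hint 2, and `overall_pct(40, 30)` gives 12.0, matching the comment in the job file. The five entries together cover 4 + 12 + 24 + 6 + 54 = 100 percent of the issued I/Os.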
* Recent changes (master)
@ 2024-01-26 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-26 13:00 UTC
  To: fio

The following changes since commit 7f250f7514bacef1a3cea24a22ecce8bd30378bd:

  ci: resolve GitHub Actions Node.js warnings (2024-01-24 19:45:33 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 90a0fbc83c59ddfacc3e90dcb7721ff322bfe26f:

  Merge branch 'coverity-fix' of https://github.com/ankit-sam/fio (2024-01-25 10:20:10 -0700)

----------------------------------------------------------------
Ankit Kumar (3):
      stat: log out both average and max over the window
      docs: update fio man page for log_window_value
      iolog: fix reported defect from coverity scan

Jens Axboe (2):
      t/io_uring: remove dma map option
      Merge branch 'coverity-fix' of https://github.com/ankit-sam/fio

Vincent Fu (1):
      docs: change listed type for log_window_value to str

 HOWTO.rst    | 45 ++++++++++++++++++++++++++--------
 client.c     |  6 +++--
 fio.1        | 47 ++++++++++++++++++++++++++++--------
 iolog.c      | 79 +++++++++++++++++++++++++++++++++++++++++++++---------------
 iolog.h      | 21 +++++++++++++---
 options.c    | 34 ++++++++++++++++++++++----
 server.c     |  6 +++--
 stat.c       | 32 ++++++++++++++----------
 t/io_uring.c | 46 ++---------------------------------
 9 files changed, 207 insertions(+), 109 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index d0ba8021..ba160551 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -4067,7 +4067,7 @@ Measurements and reporting
 	I/O that completes. When writing to the disk log, that can quickly grow to a
 	very large size. Setting this option makes fio average the each log entry
 	over the specified period of time, reducing the resolution of the log.  See
-	:option:`log_max_value` as well. Defaults to 0, logging all entries.
+	:option:`log_window_value` as well. Defaults to 0, logging all entries.
 	Also see `Log File Formats`_.
 
 .. option:: log_hist_msec=int
@@ -4088,11 +4088,28 @@ Measurements and reporting
 	histogram logs contain 1216 latency bins. See :option:`write_hist_log`
 	and `Log File Formats`_.
 
-.. option:: log_max_value=bool
+.. option:: log_window_value=str, log_max_value=str
 
-	If :option:`log_avg_msec` is set, fio logs the average over that window. If
-	you instead want to log the maximum value, set this option to 1. Defaults to
-	0, meaning that averaged values are logged.
+	If :option:`log_avg_msec` is set, fio by default logs the average over that
+	window. This option determines whether fio logs the average, maximum or
+	both the values over the window. This only affects the latency logging,
+	as both average and maximum values for iops or bw log will be same.
+	Accepted values are:
+
+		**avg**
+			Log average value over the window. The default.
+
+		**max**
+			Log maximum value in the window.
+
+		**both**
+			Log both average and maximum value over the window.
+
+		**0**
+			Backward-compatible alias for **avg**.
+
+		**1**
+			Backward-compatible alias for **max**.
 
 .. option:: log_offset=bool
 
@@ -5061,11 +5078,19 @@ toggled with :option:`log_offset`.
 by the ioengine specific :option:`cmdprio_percentage`.
 
 Fio defaults to logging every individual I/O but when windowed logging is set
-through :option:`log_avg_msec`, either the average (by default) or the maximum
-(:option:`log_max_value` is set) *value* seen over the specified period of time
-is recorded. Each *data direction* seen within the window period will aggregate
-its values in a separate row. Further, when using windowed logging the *block
-size* and *offset* entries will always contain 0.
+through :option:`log_avg_msec`, either the average (by default), the maximum
+(:option:`log_window_value` is set to max) *value* seen over the specified period
+of time, or both the average *value* and maximum *value1* (:option:`log_window_value`
+is set to both) is recorded. The log file format when both the values are reported
+takes this form:
+
+    *time* (`msec`), *value*, *value1*, *data direction*, *block size* (`bytes`),
+    *offset* (`bytes`), *command priority*
+
+
+Each *data direction* seen within the window period will aggregate its values in a
+separate row. Further, when using windowed logging the *block size* and *offset*
+entries will always contain 0.
 
 
 Client/Server
diff --git a/client.c b/client.c
index 699a2e5b..4cb7dffe 100644
--- a/client.c
+++ b/client.c
@@ -1718,8 +1718,10 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 			s = (struct io_sample *)((char *)s + sizeof(struct io_u_plat_entry) * i);
 
 		s->time		= le64_to_cpu(s->time);
-		if (ret->log_type != IO_LOG_TYPE_HIST)
-			s->data.val	= le64_to_cpu(s->data.val);
+		if (ret->log_type != IO_LOG_TYPE_HIST) {
+			s->data.val.val0	= le64_to_cpu(s->data.val.val0);
+			s->data.val.val1	= le64_to_cpu(s->data.val.val1);
+		}
 		s->__ddir	= __le32_to_cpu(s->__ddir);
 		s->bs		= le64_to_cpu(s->bs);
 		s->priority	= le16_to_cpu(s->priority);
diff --git a/fio.1 b/fio.1
index 8f659f1d..aef1dc85 100644
--- a/fio.1
+++ b/fio.1
@@ -3764,7 +3764,7 @@ By default, fio will log an entry in the iops, latency, or bw log for every
 I/O that completes. When writing to the disk log, that can quickly grow to a
 very large size. Setting this option makes fio average the each log entry
 over the specified period of time, reducing the resolution of the log. See
-\fBlog_max_value\fR as well. Defaults to 0, logging all entries.
+\fBlog_window_value\fR as well. Defaults to 0, logging all entries.
 Also see \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_hist_msec \fR=\fPint
@@ -3782,10 +3782,28 @@ the histogram logs enabled with \fBlog_hist_msec\fR. For each increment
 in coarseness, fio outputs half as many bins. Defaults to 0, for which
 histogram logs contain 1216 latency bins. See \fBLOG FILE FORMATS\fR section.
 .TP
-.BI log_max_value \fR=\fPbool
-If \fBlog_avg_msec\fR is set, fio logs the average over that window. If
-you instead want to log the maximum value, set this option to 1. Defaults to
-0, meaning that averaged values are logged.
+.BI log_window_value \fR=\fPstr "\fR,\fP log_max_value" \fR=\fPstr
+If \fBlog_avg_msec\fR is set, fio by default logs the average over that window.
+This option determines whether fio logs the average, maximum or both the
+values over the window. This only affects the latency logging, as both average
+and maximum values for iops or bw log will be same. Accepted values are:
+.RS
+.TP
+.B avg
+Log average value over the window. The default.
+.TP
+.B max
+Log maximum value in the window.
+.TP
+.B both
+Log both average and maximum value over the window.
+.TP
+.B 0
+Backward-compatible alias for \fBavg\fR.
+.TP
+.B 1
+Backward-compatible alias for \fBmax\fR.
+.RE
 .TP
 .BI log_offset \fR=\fPbool
 If this is set, the iolog options will include the byte offset for the I/O
@@ -4797,11 +4815,20 @@ number with the lowest 13 bits indicating the priority value (\fBprio\fR and
 (\fBprioclass\fR and \fBcmdprio_class\fR options).
 .P
 Fio defaults to logging every individual I/O but when windowed logging is set
-through \fBlog_avg_msec\fR, either the average (by default) or the maximum
-(\fBlog_max_value\fR is set) `value' seen over the specified period of time
-is recorded. Each `data direction' seen within the window period will aggregate
-its values in a separate row. Further, when using windowed logging the `block
-size' and `offset' entries will always contain 0.
+through \fBlog_avg_msec\fR, either the average (by default), the maximum
+(\fBlog_window_value\fR is set to max) `value' seen over the specified period of
+time, or both the average `value' and maximum `value1' (\fBlog_window_value\fR is
+set to both) is recorded. The log file format when both the values are reported
+takes this form:
+.RS
+.P
+time (msec), value, value1, data direction, block size (bytes), offset (bytes),
+command priority
+.RE
+.P
+Each `data direction' seen within the window period will aggregate its values
+in a separate row. Further, when using windowed logging the `block size' and
+`offset' entries will always contain 0.
 .SH CLIENT / SERVER
 Normally fio is invoked as a stand-alone application on the machine where the
 I/O workload should be generated. However, the backend and frontend of fio can
diff --git a/iolog.c b/iolog.c
index 5213c60f..f52a9a80 100644
--- a/iolog.c
+++ b/iolog.c
@@ -862,6 +862,13 @@ void setup_log(struct io_log **log, struct log_params *p,
 		l->log_ddir_mask = LOG_OFFSET_SAMPLE_BIT;
 	if (l->log_prio)
 		l->log_ddir_mask |= LOG_PRIO_SAMPLE_BIT;
+	/*
+	 * The bandwidth-log option generates agg-read_bw.log,
+	 * agg-write_bw.log and agg-trim_bw.log for which l->td is NULL.
+	 * Check if l->td is valid before dereferencing it.
+	 */
+	if (l->td && l->td->o.log_max == IO_LOG_SAMPLE_BOTH)
+		l->log_ddir_mask |= LOG_AVG_MAX_SAMPLE_BIT;
 
 	INIT_FLIST_HEAD(&l->chunk_list);
 
@@ -988,7 +995,7 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 {
 	struct io_sample *s;
-	int log_offset, log_prio;
+	int log_offset, log_prio, log_avg_max;
 	uint64_t i, nr_samples;
 	unsigned int prio_val;
 	const char *fmt;
@@ -999,17 +1006,32 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 	s = __get_sample(samples, 0, 0);
 	log_offset = (s->__ddir & LOG_OFFSET_SAMPLE_BIT) != 0;
 	log_prio = (s->__ddir & LOG_PRIO_SAMPLE_BIT) != 0;
+	log_avg_max = (s->__ddir & LOG_AVG_MAX_SAMPLE_BIT) != 0;
 
 	if (log_offset) {
-		if (log_prio)
-			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
-		else
-			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, %u\n";
+		if (log_prio) {
+			if (log_avg_max)
+				fmt = "%" PRIu64 ", %" PRId64 ", %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
+			else
+				fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
+		} else {
+			if (log_avg_max)
+				fmt = "%" PRIu64 ", %" PRId64 ", %" PRId64 ", %u, %llu, %llu, %u\n";
+			else
+				fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, %u\n";
+		}
 	} else {
-		if (log_prio)
-			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, 0x%04x\n";
-		else
-			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %u\n";
+		if (log_prio) {
+			if (log_avg_max)
+				fmt = "%" PRIu64 ", %" PRId64 ", %" PRId64 ", %u, %llu, 0x%04x\n";
+			else
+				fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, 0x%04x\n";
+		} else {
+			if (log_avg_max)
+				fmt = "%" PRIu64 ", %" PRId64 ", %" PRId64 ", %u, %llu, %u\n";
+			else
+				fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %u\n";
+		}
 	}
 
 	nr_samples = sample_size / __log_entry_sz(log_offset);
@@ -1023,20 +1045,37 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 			prio_val = ioprio_value_is_class_rt(s->priority);
 
 		if (!log_offset) {
-			fprintf(f, fmt,
-				s->time,
-				s->data.val,
-				io_sample_ddir(s), (unsigned long long) s->bs,
-				prio_val);
+			if (log_avg_max)
+				fprintf(f, fmt,
+					s->time,
+					s->data.val.val0,
+					s->data.val.val1,
+					io_sample_ddir(s), (unsigned long long) s->bs,
+					prio_val);
+			else
+				fprintf(f, fmt,
+					s->time,
+					s->data.val.val0,
+					io_sample_ddir(s), (unsigned long long) s->bs,
+					prio_val);
 		} else {
 			struct io_sample_offset *so = (void *) s;
 
-			fprintf(f, fmt,
-				s->time,
-				s->data.val,
-				io_sample_ddir(s), (unsigned long long) s->bs,
-				(unsigned long long) so->offset,
-				prio_val);
+			if (log_avg_max)
+				fprintf(f, fmt,
+					s->time,
+					s->data.val.val0,
+					s->data.val.val1,
+					io_sample_ddir(s), (unsigned long long) s->bs,
+					(unsigned long long) so->offset,
+					prio_val);
+			else
+				fprintf(f, fmt,
+					s->time,
+					s->data.val.val0,
+					io_sample_ddir(s), (unsigned long long) s->bs,
+					(unsigned long long) so->offset,
+					prio_val);
 		}
 	}
 }
diff --git a/iolog.h b/iolog.h
index 62cbd1b0..26dd5cca 100644
--- a/iolog.h
+++ b/iolog.h
@@ -26,13 +26,23 @@ struct io_hist {
 	struct flist_head list;
 };
 
+enum {
+	IO_LOG_SAMPLE_AVG = 0,
+	IO_LOG_SAMPLE_MAX,
+	IO_LOG_SAMPLE_BOTH,
+};
+
+struct io_sample_value {
+	uint64_t val0;
+	uint64_t val1;
+};
 
 union io_sample_data {
-	uint64_t val;
+	struct io_sample_value val;
 	struct io_u_plat_entry *plat_entry;
 };
 
-#define sample_val(value) ((union io_sample_data) { .val = value })
+#define sample_val(value) ((union io_sample_data) { .val.val0 = value })
 #define sample_plat(plat) ((union io_sample_data) { .plat_entry = plat })
 
 /*
@@ -154,8 +164,13 @@ struct io_log {
  * If the bit following the upper bit is set, then we have the priority
  */
 #define LOG_PRIO_SAMPLE_BIT	0x40000000U
+/*
+ * If the bit following prioity sample vit is set, we report both avg and max
+ */
+#define LOG_AVG_MAX_SAMPLE_BIT	0x20000000U
 
-#define LOG_SAMPLE_BITS		(LOG_OFFSET_SAMPLE_BIT | LOG_PRIO_SAMPLE_BIT)
+#define LOG_SAMPLE_BITS		(LOG_OFFSET_SAMPLE_BIT | LOG_PRIO_SAMPLE_BIT |\
+					LOG_AVG_MAX_SAMPLE_BIT)
 #define io_sample_ddir(io)	((io)->__ddir & ~LOG_SAMPLE_BITS)
 
 static inline void io_sample_set_ddir(struct io_log *log,
diff --git a/options.c b/options.c
index 53df03de..1da4de78 100644
--- a/options.c
+++ b/options.c
@@ -4540,14 +4540,38 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
-		.name	= "log_max_value",
-		.lname	= "Log maximum instead of average",
-		.type	= FIO_OPT_BOOL,
+		.name	= "log_window_value",
+		.alias  = "log_max_value",
+		.lname	= "Log maximum, average or both values",
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, log_max),
-		.help	= "Log max sample in a window instead of average",
-		.def	= "0",
+		.help	= "Log max, average or both sample in a window",
+		.def	= "avg",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
+		.posval	= {
+			  { .ival = "avg",
+			    .oval = IO_LOG_SAMPLE_AVG,
+			    .help = "Log average value over the window",
+			  },
+			  { .ival = "max",
+			    .oval = IO_LOG_SAMPLE_MAX,
+			    .help = "Log maximum value in the window",
+			  },
+			  { .ival = "both",
+			    .oval = IO_LOG_SAMPLE_BOTH,
+			    .help = "Log both average and maximum values over the window"
+			  },
+			  /* Compatibility with former boolean values */
+			  { .ival = "0",
+			    .oval = IO_LOG_SAMPLE_AVG,
+			    .help = "Alias for 'avg'",
+			  },
+			  { .ival = "1",
+			    .oval = IO_LOG_SAMPLE_MAX,
+			    .help = "Alias for 'max'",
+			  },
+		},
 	},
 	{
 		.name	= "log_offset",
diff --git a/server.c b/server.c
index b9f0e2ac..afaeb348 100644
--- a/server.c
+++ b/server.c
@@ -2288,8 +2288,10 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 			struct io_sample *s = get_sample(log, cur_log, i);
 
 			s->time		= cpu_to_le64(s->time);
-			if (log->log_type != IO_LOG_TYPE_HIST)
-				s->data.val	= cpu_to_le64(s->data.val);
+			if (log->log_type != IO_LOG_TYPE_HIST) {
+				s->data.val.val0	= cpu_to_le64(s->data.val.val0);
+				s->data.val.val1	= cpu_to_le64(s->data.val.val1);
+			}
 			s->__ddir	= __cpu_to_le32(s->__ddir);
 			s->bs		= cpu_to_le64(s->bs);
 
diff --git a/stat.c b/stat.c
index 7cf6bee1..11b58626 100644
--- a/stat.c
+++ b/stat.c
@@ -3149,7 +3149,7 @@ void reset_io_stats(struct thread_data *td)
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
-			      unsigned long elapsed, bool log_max)
+			      unsigned long elapsed, int log_max)
 {
 	/*
 	 * Note an entry in the log. Use the mean from the logged samples,
@@ -3159,10 +3159,16 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 	if (iolog->avg_window[ddir].samples) {
 		union io_sample_data data;
 
-		if (log_max)
-			data.val = iolog->avg_window[ddir].max_val;
-		else
-			data.val = iolog->avg_window[ddir].mean.u.f + 0.50;
+		if (log_max == IO_LOG_SAMPLE_AVG) {
+			data.val.val0 = iolog->avg_window[ddir].mean.u.f + 0.50;
+			data.val.val1 = 0;
+		} else if (log_max == IO_LOG_SAMPLE_MAX) {
+			data.val.val0 = iolog->avg_window[ddir].max_val;
+			data.val.val1 = 0;
+		} else {
+			data.val.val0 = iolog->avg_window[ddir].mean.u.f + 0.50;
+			data.val.val1 = iolog->avg_window[ddir].max_val;
+		}
 
 		__add_log_sample(iolog, data, ddir, 0, elapsed, 0, 0);
 	}
@@ -3171,7 +3177,7 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 }
 
 static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
-			     bool log_max)
+			     int log_max)
 {
 	enum fio_ddir ddir;
 
@@ -3205,7 +3211,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 	 * Add the sample. If the time period has passed, then
 	 * add that entry to the log and clear.
 	 */
-	add_stat_sample(&iolog->avg_window[ddir], data.val);
+	add_stat_sample(&iolog->avg_window[ddir], data.val.val0);
 
 	/*
 	 * If period hasn't passed, adding the above sample is all we
@@ -3221,7 +3227,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 			return diff;
 	}
 
-	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0);
+	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max);
 
 	iolog->avg_last[ddir] = elapsed - (elapsed % iolog->avg_msec);
 
@@ -3235,15 +3241,15 @@ void finalize_logs(struct thread_data *td, bool unit_logs)
 	elapsed = mtime_since_now(&td->epoch);
 
 	if (td->clat_log && unit_logs)
-		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max);
 	if (td->slat_log && unit_logs)
-		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max);
 	if (td->lat_log && unit_logs)
-		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max);
 	if (td->bw_log && (unit_logs == per_unit_log(td->bw_log)))
-		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max);
 	if (td->iops_log && (unit_logs == per_unit_log(td->iops_log)))
-		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max);
 }
 
 void add_agg_sample(union io_sample_data data, enum fio_ddir ddir,
diff --git a/t/io_uring.c b/t/io_uring.c
index bf0aa26e..efc50caa 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -129,7 +129,6 @@ static int batch_complete = BATCH_COMPLETE;
 static int bs = BS;
 static int polled = 1;		/* use IO polling */
 static int fixedbufs = 1;	/* use fixed user buffers */
-static int dma_map;		/* pre-map DMA buffers */
 static int register_files = 1;	/* use fixed files */
 static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
@@ -155,17 +154,6 @@ static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
 			80.0, 90.0, 95.0, 99.0, 99.5, 99.9, 99.95, 99.99 };
 static int plist_len = 17;
 
-#ifndef IORING_REGISTER_MAP_BUFFERS
-#define IORING_REGISTER_MAP_BUFFERS	26
-struct io_uring_map_buffers {
-	__s32	fd;
-	__u32	buf_start;
-	__u32	buf_end;
-	__u32	flags;
-	__u64	rsvd[2];
-};
-#endif
-
 static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 			 enum nvme_csi csi, void *data)
 {
@@ -405,22 +393,6 @@ static void add_stat(struct submitter *s, int clock_index, int nr)
 #endif
 }
 
-static int io_uring_map_buffers(struct submitter *s)
-{
-	struct io_uring_map_buffers map = {
-		.fd		= s->files[0].real_fd,
-		.buf_end	= depth,
-	};
-
-	if (do_nop)
-		return 0;
-	if (s->nr_files > 1)
-		fprintf(stdout, "Mapping buffers may not work with multiple files\n");
-
-	return syscall(__NR_io_uring_register, s->ring_fd,
-			IORING_REGISTER_MAP_BUFFERS, &map, 1);
-}
-
 static int io_uring_register_buffers(struct submitter *s)
 {
 	if (do_nop)
@@ -950,14 +922,6 @@ static int setup_ring(struct submitter *s)
 			perror("io_uring_register_buffers");
 			return 1;
 		}
-
-		if (dma_map) {
-			ret = io_uring_map_buffers(s);
-			if (ret < 0) {
-				perror("io_uring_map_buffers");
-				return 1;
-			}
-		}
 	}
 
 	if (register_files) {
@@ -1071,7 +1035,7 @@ static int submitter_init(struct submitter *s)
 	}
 
 	if (!init_printed) {
-		printf("polled=%d, fixedbufs=%d/%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, dma_map, register_files, buffered, depth);
+		printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, register_files, buffered, depth);
 		printf("%s", buf);
 		init_printed = 1;
 	}
@@ -1519,7 +1483,6 @@ static void usage(char *argv, int status)
 		" -b <int>  : Block size, default %d\n"
 		" -p <bool> : Polled IO, default %d\n"
 		" -B <bool> : Fixed buffers, default %d\n"
-		" -D <bool> : DMA map fixed buffers, default %d\n"
 		" -F <bool> : Register files, default %d\n"
 		" -n <int>  : Number of threads, default %d\n"
 		" -O <bool> : Use O_DIRECT, default %d\n"
@@ -1534,7 +1497,7 @@ static void usage(char *argv, int status)
 		" -P <bool> : Automatically place on device home node %d\n"
 		" -u <bool> : Use nvme-passthrough I/O, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
+		fixedbufs, register_files, nthreads, !buffered, do_nop,
 		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio,
 		use_sync, register_ring, numa_placement, pt);
 	exit(status);
@@ -1656,9 +1619,6 @@ int main(int argc, char *argv[])
 		case 'r':
 			runtime = atoi(optarg);
 			break;
-		case 'D':
-			dma_map = !!atoi(optarg);
-			break;
 		case 'R':
 			random_io = !!atoi(optarg);
 			break;
@@ -1694,8 +1654,6 @@ int main(int argc, char *argv[])
 		batch_complete = depth;
 	if (batch_submit > depth)
 		batch_submit = depth;
-	if (!fixedbufs && dma_map)
-		dma_map = 0;
 
 	submitter = calloc(nthreads, sizeof(*submitter) +
 				roundup_pow2(depth) * sizeof(struct iovec));

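The windowed logging change can be modelled in a few lines of C. This is an illustration under stated assumptions, not fio code: the enum mirrors the IO_LOG_SAMPLE_* values added to iolog.h, parse_window_value() mirrors the posval table in options.c (including the legacy 0/1 aliases), the mean is computed here as a plain arithmetic mean and rounded by adding 0.5 as __add_stat_to_log() does, and the line format only covers the case without offset or priority logging, where block size and priority are written as 0.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors IO_LOG_SAMPLE_AVG/MAX/BOTH from iolog.h. */
enum { SAMPLE_AVG = 0, SAMPLE_MAX, SAMPLE_BOTH };

/* Map a log_window_value string to a mode, accepting the former boolean
 * values as aliases. Returns -1 for unknown values. */
static int parse_window_value(const char *s)
{
	if (!strcmp(s, "avg") || !strcmp(s, "0"))
		return SAMPLE_AVG;
	if (!strcmp(s, "max") || !strcmp(s, "1"))
		return SAMPLE_MAX;
	if (!strcmp(s, "both"))
		return SAMPLE_BOTH;
	return -1;
}

/* Reduce one window of samples to val0/val1: val0 is the rounded mean or
 * the max, and val1 carries the max only in "both" mode. */
static void window_values(const uint64_t *samples, size_t n, int mode,
			  uint64_t *val0, uint64_t *val1)
{
	double sum = 0;
	uint64_t max = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		sum += samples[i];
		if (samples[i] > max)
			max = samples[i];
	}
	uint64_t mean = (uint64_t)(sum / n + 0.50);

	if (mode == SAMPLE_MAX) {
		*val0 = max;
		*val1 = 0;
	} else if (mode == SAMPLE_BOTH) {
		*val0 = mean;
		*val1 = max;
	} else {
		*val0 = mean;
		*val1 = 0;
	}
}

/* Format one windowed log line (no offset, no priority logging):
 * "time, value[, value1], ddir, bs, prio" with bs and prio fixed at 0. */
static int format_line(char *buf, size_t len, uint64_t time, int mode,
		       uint64_t val0, uint64_t val1, unsigned int ddir)
{
	if (mode == SAMPLE_BOTH)
		return snprintf(buf, len, "%" PRIu64 ", %" PRIu64 ", %" PRIu64
				", %u, 0, 0\n", time, val0, val1, ddir);
	return snprintf(buf, len, "%" PRIu64 ", %" PRIu64 ", %u, 0, 0\n",
			time, val0, ddir);
}
```

With samples 100, 200 and 301 in one window and mode "both", the mean rounds to 200 and the max is 301, so a read (data direction 0) logged at 1000 ms produces the line `1000, 200, 301, 0, 0, 0`. With mode "max" the same window produces `1000, 301, 0, 0, 0`.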
* Recent changes (master)
@ 2024-01-25 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-25 13:00 UTC
  To: fio

The following changes since commit 1ee0469f6d180a98d31196bea787f37269ff9cdd:

  configure: Don't use cross_prefix when invoking pkg-config (2024-01-23 16:28:31 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7f250f7514bacef1a3cea24a22ecce8bd30378bd:

  ci: resolve GitHub Actions Node.js warnings (2024-01-24 19:45:33 +0000)

----------------------------------------------------------------
Vincent Fu (1):
      ci: resolve GitHub Actions Node.js warnings

 .github/workflows/ci.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index b8000024..e53082c3 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -64,7 +64,7 @@ jobs:
       if: ${{ contains( matrix.build, 'windows' ) }}
       run: git config --global core.autocrlf input
     - name: Checkout repo
-      uses: actions/checkout@v3
+      uses: actions/checkout@v4
     - name: Install Cygwin toolchain (Windows)
       if: ${{ startsWith(matrix.build, 'windows-cygwin') }}
       uses: cygwin/cygwin-install-action@master
@@ -110,7 +110,7 @@ jobs:
 
     - name: Upload installer as artifact (Windows)
       if: ${{ contains( matrix.build, 'windows' ) }}
-      uses: actions/upload-artifact@v3
+      uses: actions/upload-artifact@v4
       with:
         name: ${{ matrix.build }}-installer
         path: os\windows\*.msi

* Recent changes (master)
@ 2024-01-24 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-24 13:00 UTC
  To: fio

The following changes since commit 8b3190c3ea38af87778a68c576947f8797215d33:

  filesetup: clear O_RDWR flag for verify_only write workloads (2024-01-22 11:51:05 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1ee0469f6d180a98d31196bea787f37269ff9cdd:

  configure: Don't use cross_prefix when invoking pkg-config (2024-01-23 16:28:31 -0700)

----------------------------------------------------------------
Chris Packham (1):
      configure: Don't use cross_prefix when invoking pkg-config

 configure | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index f86fcf77..dea8d07d 100755
--- a/configure
+++ b/configure
@@ -155,11 +155,11 @@ output_sym() {
 check_min_lib_version() {
   _feature=$3
 
-  if "${cross_prefix}"pkg-config --atleast-version="$2" "$1" > /dev/null 2>&1; then
+  if pkg-config --atleast-version="$2" "$1" > /dev/null 2>&1; then
     return 0
   fi
   : "${_feature:=${1}}"
-  if "${cross_prefix}"pkg-config --version > /dev/null 2>&1; then
+  if pkg-config --version > /dev/null 2>&1; then
     if test "$(eval echo \"\$$_feature\")" = "yes" ; then
       feature_not_found "$_feature" "$1 >= $2"
     fi
@@ -1631,14 +1631,14 @@ int main(void)
   return GTK_CHECK_VERSION(2, 18, 0) ? 0 : 1; /* 0 on success */
 }
 EOF
-GTK_CFLAGS=$(${cross_prefix}pkg-config --cflags gtk+-2.0 gthread-2.0)
+GTK_CFLAGS=$(pkg-config --cflags gtk+-2.0 gthread-2.0)
 ORG_LDFLAGS=$LDFLAGS
 LDFLAGS=$(echo $LDFLAGS | sed s/"-static"//g)
 if test "$?" != "0" ; then
   echo "configure: gtk and gthread not found"
   exit 1
 fi
-GTK_LIBS=$(${cross_prefix}pkg-config --libs gtk+-2.0 gthread-2.0)
+GTK_LIBS=$(pkg-config --libs gtk+-2.0 gthread-2.0)
 if test "$?" != "0" ; then
   echo "configure: gtk and gthread not found"
   exit 1

* Recent changes (master)
@ 2024-01-23 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-23 13:00 UTC
  To: fio

The following changes since commit aa84b5ba581add84ce6e73b20ca0fbd04f6058c8:

  ci: stop hard coding number of jobs for make (2024-01-18 13:00:31 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8b3190c3ea38af87778a68c576947f8797215d33:

  filesetup: clear O_RDWR flag for verify_only write workloads (2024-01-22 11:51:05 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      filesetup: clear O_RDWR flag for verify_only write workloads

 filesetup.c | 5 +++++
 1 file changed, 5 insertions(+)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 816d1081..2d277a64 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -749,6 +749,11 @@ open_again:
 		if (!read_only)
 			flags |= O_RDWR;
 
+		if (td->o.verify_only) {
+			flags &= ~O_RDWR;
+			flags |= O_RDONLY;
+		}
+
 		if (f->filetype == FIO_TYPE_FILE && td->o.allow_create)
 			flags |= O_CREAT;
 

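The effect of the filesetup.c change on the open(2) flags can be checked in isolation. The helper below is hypothetical and simplified: it ignores the file-type check and the other flags fio sets, and only shows that clearing O_RDWR leaves a read-only access mode. On Linux O_RDONLY is 0, so the `|= O_RDONLY` line in the patch mainly documents intent.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical, simplified version of the flag logic in the hunk above
 * for a write workload: open read-write unless read_only is set, but
 * drop to a read-only access mode when verify_only is set. */
static int write_open_flags(bool read_only, bool verify_only, bool allow_create)
{
	int flags = 0;

	if (!read_only)
		flags |= O_RDWR;

	if (verify_only) {
		flags &= ~O_RDWR;
		flags |= O_RDONLY;	/* 0 on Linux; kept to mirror the patch */
	}

	if (allow_create)
		flags |= O_CREAT;

	/* The access mode must be exactly one of O_RDONLY/O_WRONLY/O_RDWR. */
	assert((flags & O_ACCMODE) != O_ACCMODE);
	return flags;
}
```

Comparing `flags & O_ACCMODE` against O_RDONLY or O_RDWR, rather than testing individual bits, is the portable way to inspect the access mode, because O_RDONLY has no bit of its own on Linux.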
* Recent changes (master)
@ 2024-01-19 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-19 13:00 UTC
  To: fio

The following changes since commit 9eefdcc1dd820a936684168468fa9c81960ea461:

  configure: enable NVME_URING_CMD checking for Android (2024-01-17 09:11:15 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to aa84b5ba581add84ce6e73b20ca0fbd04f6058c8:

  ci: stop hard coding number of jobs for make (2024-01-18 13:00:31 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      ci: stop hard coding number of jobs for make

 ci/actions-build.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/ci/actions-build.sh b/ci/actions-build.sh
index 31d3446c..47d4f044 100755
--- a/ci/actions-build.sh
+++ b/ci/actions-build.sh
@@ -61,7 +61,9 @@ main() {
     configure_flags+=(--extra-cflags="${extra_cflags}")
 
     ./configure "${configure_flags[@]}"
-    make -j 2
+    make -j "$(nproc 2>/dev/null || sysctl -n hw.logicalcpu)"
+# macOS does not have nproc, so we have to use sysctl to obtain the number of
+# logical CPUs.
 }
 
 main

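The same "how many CPUs" question can be answered from C without branching on the platform, since sysconf(_SC_NPROCESSORS_ONLN) is available on both Linux and macOS. This is a sketch of an alternative, not what the CI script does, and the function name is hypothetical. Note that it counts online CPUs, whereas nproc reports the processing units available to the calling process, which can be fewer when CPU affinity is restricted.

```c
#include <unistd.h>

/* Hypothetical helper: number of parallel make jobs, based on the online
 * CPU count, falling back to 1 if sysconf() cannot report it. */
static long make_jobs(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}
```

The fallback to 1 plays the same role as a fixed `-j` value: the build still runs, just serially, if the CPU count is unavailable.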
* Recent changes (master)
@ 2024-01-18 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-18 13:00 UTC
  To: fio

The following changes since commit 9f9340cc3a15bca2aa6e883bd5be3d0c9471f573:

  Merge branch 'group_reporting_indentation' of https://github.com/0mp/fio (2024-01-16 09:03:35 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9eefdcc1dd820a936684168468fa9c81960ea461:

  configure: enable NVME_URING_CMD checking for Android (2024-01-17 09:11:15 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      configure: enable NVME_URING_CMD checking for Android

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 420d97db..f86fcf77 100755
--- a/configure
+++ b/configure
@@ -2656,7 +2656,7 @@ if test "$libzbc" != "no" ; then
 fi
 print_config "libzbc engine" "$libzbc"
 
-if test "$targetos" = "Linux" ; then
+if test "$targetos" = "Linux" || test "$targetos" = "Android"; then
 ##########################################
 # Check NVME_URING_CMD support
 cat > $TMPC << EOF

* Recent changes (master)
@ 2024-01-17 13:00 Jens Axboe
From: Jens Axboe @ 2024-01-17 13:00 UTC
  To: fio

The following changes since commit 06c40418f97811092c0aece1760487400bcdd506:

  t/strided: check_result() has no return value (2023-12-28 22:19:44 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9f9340cc3a15bca2aa6e883bd5be3d0c9471f573:

  Merge branch 'group_reporting_indentation' of https://github.com/0mp/fio (2024-01-16 09:03:35 -0500)

----------------------------------------------------------------
Mateusz Piotrowski (1):
      doc: group_reporting: Fix indentation and syntax

Vincent Fu (1):
      Merge branch 'group_reporting_indentation' of https://github.com/0mp/fio

 HOWTO.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 847c0356..d0ba8021 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3984,12 +3984,12 @@ Measurements and reporting
 	same reporting group, unless if separated by a :option:`stonewall`, or by
 	using :option:`new_group`.
 
-    NOTE: When :option: `group_reporting` is used along with `json` output,
-    there are certain per-job properties which can be different between jobs
-    but do not have a natural group-level equivalent. Examples include
-    `kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
-    `job_start`. For these properties, the values for the first job are
-    recorded for the group.
+	NOTE: When :option:`group_reporting` is used along with `json` output,
+	there are certain per-job properties which can be different between jobs
+	but do not have a natural group-level equivalent. Examples include
+	`kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
+	`job_start`. For these properties, the values for the first job are
+	recorded for the group.
 
 .. option:: new_group
 

* Recent changes (master)
@ 2023-12-30 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-30 13:00 UTC
  To: fio

The following changes since commit be943a3ef5d94d8a9fefa11dc004789f66beb8e6:

  t/zbd: add test case to confirm no write with rwmixwrite=0 option (2023-12-19 19:52:35 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 06c40418f97811092c0aece1760487400bcdd506:

  t/strided: check_result() has no return value (2023-12-28 22:19:44 -0500)

----------------------------------------------------------------
Vincent Fu (4):
      t/nvmept: call parent class check_result()
      t/random_seed: call parent class check_result()
      t/strided: call parent class check_result()
      t/strided: check_result() has no return value

 t/nvmept.py      |  2 ++
 t/random_seed.py |  8 ++++++++
 t/strided.py     | 12 +++++++-----
 3 files changed, 17 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/t/nvmept.py b/t/nvmept.py
index c08fb350..1ade64dc 100755
--- a/t/nvmept.py
+++ b/t/nvmept.py
@@ -55,6 +55,8 @@ class PassThruTest(FioJobCmdTest):
 
 
     def check_result(self):
+        super().check_result()
+
         if 'rw' not in self.fio_opts:
             return
 
diff --git a/t/random_seed.py b/t/random_seed.py
index 02187046..82beca65 100755
--- a/t/random_seed.py
+++ b/t/random_seed.py
@@ -91,6 +91,10 @@ class TestRR(FioRandTest):
     def check_result(self):
         """Check output for allrandrepeat=1."""
 
+        super().check_result()
+        if not self.passed:
+            return
+
         opt = 'randrepeat' if 'randrepeat' in self.fio_opts else 'allrandrepeat'
         rr = self.fio_opts[opt]
         rand_seeds = self.get_rand_seeds()
@@ -131,6 +135,10 @@ class TestRS(FioRandTest):
     def check_result(self):
         """Check output for randseed=something."""
 
+        super().check_result()
+        if not self.passed:
+            return
+
         rand_seeds = self.get_rand_seeds()
         randseed = self.fio_opts['randseed']
 
diff --git a/t/strided.py b/t/strided.py
index b7655e1e..75c429e4 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -71,6 +71,10 @@ class StridedTest(FioJobCmdTest):
         super().setup(fio_args)
 
     def check_result(self):
+        super().check_result()
+        if not self.passed:
+            return
+
         zonestart = 0 if 'offset' not in self.fio_opts else self.fio_opts['offset']
         iospersize = self.fio_opts['zonesize'] / self.fio_opts['bs']
         iosperrange = self.fio_opts['zonerange'] / self.fio_opts['bs']
@@ -95,7 +99,7 @@ class StridedTest(FioJobCmdTest):
             offset = int(tokens[4])
             if offset < zonestart or offset >= zonestart + self.fio_opts['zonerange']:
                 print(f"Offset {offset} outside of zone starting at {zonestart}")
-                return False
+                return
 
             # skip next section if norandommap is enabled with no
             # random_generator or with a random_generator != lfsr
@@ -113,17 +117,15 @@ class StridedTest(FioJobCmdTest):
             block = (offset - zonestart) / self.fio_opts['bs']
             if block in zoneset:
                 print(f"Offset {offset} in zone already touched")
-                return False
+                return
 
             zoneset.add(block)
             if iosperzone % iosperrange == 0:
                 if len(zoneset) != iosperrange:
                     print(f"Expected {iosperrange} blocks in zone but only saw {len(zoneset)}")
-                    return False
+                    return
                 zoneset = set()
 
-        return True
-
 
 TEST_LIST = [   # randommap enabled
     {

* Recent changes (master)
@ 2023-12-20 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-20 13:00 UTC
  To: fio

The following changes since commit c77fe6859b6ef937a6ca900c1fab009175d721f8:

  engines/http: use proper error value (2023-12-15 13:17:13 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to be943a3ef5d94d8a9fefa11dc004789f66beb8e6:

  t/zbd: add test case to confirm no write with rwmixwrite=0 option (2023-12-19 19:52:35 -0700)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      zbd: avoid write with rwmixwrite=0 option
      t/zbd: add test case to confirm no write with rwmixwrite=0 option

 t/zbd/test-zbd-support | 23 +++++++++++++++++++++++
 zbd.c                  |  3 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 2f15a191..532860eb 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1561,6 +1561,29 @@ test67() {
 	grep -q 'Exceeded max_active_zones limit' "${logfile}.${test_number}"
 }
 
+# Test rw=randrw and rwmixwrite=0 options do not issue write I/O unit
+test68() {
+	local off size
+
+	require_zbd || return "$SKIP_TESTCASE"
+
+	reset_zone "${dev}" -1
+
+	# Write some data as preparation
+	off=$((first_sequential_zone_sector * 512))
+	size=$min_seq_write_size
+	run_one_fio_job "$(ioengine "psync")" --rw=write --offset="$off" \
+			--io_size="$size" --zonemode=strided \
+			--zonesize="$zone_size" --zonerange="$zone_size" \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+	# Run random mixed read and write specifying zero write ratio
+	run_fio_on_seq "$(ioengine "psync")" --rw=randrw --rwmixwrite=0 \
+		       --time_based --runtime=1s \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+	# "WRITE:" shall be recoreded only once for the preparation
+	[[ $(grep -c "WRITE:" "${logfile}.${test_number}") == 1 ]]
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
diff --git a/zbd.c b/zbd.c
index c4f7b12f..61b5b688 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1876,7 +1876,8 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 	if (ddir != DDIR_READ || !td_rw(td))
 		return ddir;
 
-	if (io_u->file->last_start[DDIR_WRITE] != -1ULL || td->o.read_beyond_wp)
+	if (io_u->file->last_start[DDIR_WRITE] != -1ULL ||
+	    td->o.read_beyond_wp || td->o.rwmix[DDIR_WRITE] == 0)
 		return DDIR_READ;
 
 	return DDIR_WRITE;

* Recent changes (master)
@ 2023-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-16 13:00 UTC
  To: fio

The following changes since commit 4e472f8806571ea5799bc898e44609697ba0e140:

  Merge branch 'master' of https://github.com/preichl/fio (2023-12-14 14:15:34 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c77fe6859b6ef937a6ca900c1fab009175d721f8:

  engines/http: use proper error value (2023-12-15 13:17:13 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'master' of https://github.com/preichl/fio
      engines/http: use proper error value

Pavel Reichl (3):
      engines/rdma: remove dead code
      client/server: remove dead code
      engines/http: Drop unused varible

 engines/http.c | 3 +--
 engines/rdma.c | 1 -
 server.c       | 1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/engines/http.c b/engines/http.c
index 83cfe8bb..99f4e119 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -640,7 +640,6 @@ static enum fio_q_status fio_http_queue(struct thread_data *td,
 	char url[1024];
 	long status;
 	CURLcode res;
-	int r = -1;
 
 	fio_ro_check(td, io_u);
 	memset(&_curl_stream, 0, sizeof(_curl_stream));
@@ -712,7 +711,7 @@ static enum fio_q_status fio_http_queue(struct thread_data *td,
 	log_err("WARNING: Only DDIR_READ/DDIR_WRITE/DDIR_TRIM are supported!\n");
 
 err:
-	io_u->error = r;
+	io_u->error = EIO;
 	td_verror(td, io_u->error, "transfer");
 out:
 	curl_slist_free_all(slist);
diff --git a/engines/rdma.c b/engines/rdma.c
index ebdbcb1c..07336f3b 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -276,7 +276,6 @@ static int cq_event_handler(struct thread_data *td, enum ibv_wc_opcode opcode)
 	int i;
 
 	while ((ret = ibv_poll_cq(rd->cq, 1, &wc)) == 1) {
-		ret = 0;
 		compevnum++;
 
 		if (wc.status) {
diff --git a/server.c b/server.c
index 06eac584..b9f0e2ac 100644
--- a/server.c
+++ b/server.c
@@ -1883,7 +1883,6 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 
 		offset = (char *)extended_buf_wp - (char *)extended_buf;
 		ptr->ts.ss_bw_data_offset = cpu_to_le64(offset);
-		extended_buf_wp = ss_bw + (int) ts->ss_dur;
 	}
 
 	fio_net_queue_cmd(FIO_NET_CMD_TS, extended_buf, extended_buf_size, NULL, SK_F_COPY);

* Recent changes (master)
@ 2023-12-15 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-15 13:00 UTC
  To: fio

The following changes since commit f53eaac02ec46bdf7f87058f30667be80975caf6:

  engines/io_uring_cmd: skip pi verify checks for error cases (2023-12-12 09:39:06 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4e472f8806571ea5799bc898e44609697ba0e140:

  Merge branch 'master' of https://github.com/preichl/fio (2023-12-14 14:15:34 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'patch-3' of https://github.com/0mp/fio
      Merge branch 'master' of https://github.com/preichl/fio

Mateusz Piotrowski (1):
      doc: Reference geom(4) for FreeBSD users

Pavel Reichl (1):
      engines/http: Fix memory leak

 HOWTO.rst      | 2 +-
 engines/http.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index d173702b..847c0356 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -801,7 +801,7 @@ Target file/device
 
 	On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
 	the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
-	Note: Windows and FreeBSD prevent write access to areas
+	Note: Windows and FreeBSD (refer to geom(4)) prevent write access to areas
 	of the disk containing in-use data (e.g. filesystems).
 
 	The filename "`-`" is a reserved name, meaning *stdin* or *stdout*.  Which
diff --git a/engines/http.c b/engines/http.c
index 56dc7d1b..83cfe8bb 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -250,6 +250,7 @@ static char *_aws_uriencode(const char *uri)
 	for (i = 0; (c = uri[i]); i++) {
 		if (n > bufsize-5) {
 			log_err("encoding the URL failed\n");
+			free(r);
 			return NULL;
 		}
 

* Recent changes (master)
@ 2023-12-13 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-13 13:00 UTC
  To: fio

The following changes since commit 63e6f55a9147cf9f76376c2e7e38a623c8832f23:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-12-11 16:27:21 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f53eaac02ec46bdf7f87058f30667be80975caf6:

  engines/io_uring_cmd: skip pi verify checks for error cases (2023-12-12 09:39:06 -0500)

----------------------------------------------------------------
Ankit Kumar (1):
      engines/io_uring_cmd: skip pi verify checks for error cases

 engines/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5ae3135b..c0cb5a78 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -468,10 +468,12 @@ static struct io_u *fio_ioring_cmd_event(struct thread_data *td, int event)
 	cqe = &ld->cq_ring.cqes[index];
 	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
 
-	if (cqe->res != 0)
+	if (cqe->res != 0) {
 		io_u->error = -cqe->res;
-	else
+		return io_u;
+	} else {
 		io_u->error = 0;
+	}
 
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		data = FILE_ENG_DATA(io_u->file);

* Recent changes (master)
@ 2023-12-12 13:00 Jens Axboe
From: Jens Axboe @ 2023-12-12 13:00 UTC
  To: fio

The following changes since commit bdf99b6836d75683cba5968c40f321748482ae86:

  Merge branch 'xnvme_includes' of https://github.com/safl/fio (2023-11-20 07:43:16 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63e6f55a9147cf9f76376c2e7e38a623c8832f23:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-12-11 16:27:21 -0500)

----------------------------------------------------------------
Bart Van Assche (1):
      Fall back to F_SET_RW_HINT if F_SET_FILE_RW_HINT is not supported

Vincent Fu (2):
      engines/io_uring_cmd: friendlier bad bs error msg
      Merge branch 'master' of https://github.com/bvanassche/fio

 engines/io_uring.c | 19 +++++++++++++------
 ioengines.c        | 22 ++++++++++++----------
 2 files changed, 25 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 38c36fdc..5ae3135b 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1279,14 +1279,21 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 		lba_size = data->lba_ext ? data->lba_ext : data->lba_size;
 
 		for_each_rw_ddir(ddir) {
-			if (td->o.min_bs[ddir] % lba_size ||
-				td->o.max_bs[ddir] % lba_size) {
-				if (data->lba_ext)
-					log_err("%s: block size must be a multiple of (LBA data size + Metadata size)\n",
-						f->file_name);
-				else
+			if (td->o.min_bs[ddir] % lba_size || td->o.max_bs[ddir] % lba_size) {
+				if (data->lba_ext) {
+					log_err("%s: block size must be a multiple of %u "
+						"(LBA data size + Metadata size)\n", f->file_name, lba_size);
+					if (td->o.min_bs[ddir] == td->o.max_bs[ddir] &&
+					    !(td->o.min_bs[ddir] % data->lba_size)) {
+						/* fixed block size is actually a multiple of LBA data size */
+						unsigned long long suggestion = lba_size *
+							(td->o.min_bs[ddir] / data->lba_size);
+						log_err("Did you mean to use a block size of %llu?\n", suggestion);
+					}
+				} else {
 					log_err("%s: block size must be a multiple of LBA data size\n",
 						f->file_name);
+				}
 				td_verror(td, EINVAL, "fio_ioring_cmd_open_file");
 				return 1;
 			}
diff --git a/ioengines.c b/ioengines.c
index 36172725..87cc2286 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -590,19 +590,21 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 	if (fio_option_is_set(&td->o, write_hint) &&
 	    (f->filetype == FIO_TYPE_BLOCK || f->filetype == FIO_TYPE_FILE)) {
 		uint64_t hint = td->o.write_hint;
-		int cmd;
+		int res;
 
 		/*
-		 * For direct IO, we just need/want to set the hint on
-		 * the file descriptor. For buffered IO, we need to set
-		 * it on the inode.
+		 * For direct IO, set the hint on the file descriptor if that is
+		 * supported. Otherwise set it on the inode. For buffered IO, we
+		 * need to set it on the inode.
 		 */
-		if (td->o.odirect)
-			cmd = F_SET_FILE_RW_HINT;
-		else
-			cmd = F_SET_RW_HINT;
-
-		if (fcntl(f->fd, cmd, &hint) < 0) {
+		if (td->o.odirect) {
+			res = fcntl(f->fd, F_SET_FILE_RW_HINT, &hint);
+			if (res < 0)
+				res = fcntl(f->fd, F_SET_RW_HINT, &hint);
+		} else {
+			res = fcntl(f->fd, F_SET_RW_HINT, &hint);
+		}
+		if (res < 0) {
 			td_verror(td, errno, "fcntl write hint");
 			goto err;
 		}

* Recent changes (master)
@ 2023-11-20 13:00 Jens Axboe
From: Jens Axboe @ 2023-11-20 13:00 UTC
  To: fio

The following changes since commit afdde534004397ba1fb00ccc6f5906fa50dd667f:

  t/jobs/t0012.fio: make this job time_based (2023-11-07 12:22:40 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bdf99b6836d75683cba5968c40f321748482ae86:

  Merge branch 'xnvme_includes' of https://github.com/safl/fio (2023-11-20 07:43:16 -0500)

----------------------------------------------------------------
Simon A. F. Lund (1):
      engines/xnvme: only include entry-header ('libxnvme.h')

Vincent Fu (1):
      Merge branch 'xnvme_includes' of https://github.com/safl/fio

 engines/xnvme.c | 4 ----
 1 file changed, 4 deletions(-)

---

Diff of recent changes:

diff --git a/engines/xnvme.c b/engines/xnvme.c
index b7824013..2a0b3520 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -10,10 +10,6 @@
 #include <stdlib.h>
 #include <assert.h>
 #include <libxnvme.h>
-#include <libxnvme_libconf.h>
-#include <libxnvme_nvm.h>
-#include <libxnvme_znd.h>
-#include <libxnvme_spec_fs.h>
 #include "fio.h"
 #include "zbd_types.h"
 #include "fdp.h"

* Recent changes (master)
@ 2023-11-08 13:00 Jens Axboe
From: Jens Axboe @ 2023-11-08 13:00 UTC
  To: fio

The following changes since commit 05fce19c7d2668adb38243636a1781c0f8fae523:

  docs: add warning to per_job_logs option (2023-11-06 13:46:30 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to afdde534004397ba1fb00ccc6f5906fa50dd667f:

  t/jobs/t0012.fio: make this job time_based (2023-11-07 12:22:40 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      t/jobs/t0012.fio: make this job time_based

 t/jobs/t0012.fio | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/t/jobs/t0012.fio b/t/jobs/t0012.fio
index d7123966..e01d2b01 100644
--- a/t/jobs/t0012.fio
+++ b/t/jobs/t0012.fio
@@ -14,6 +14,7 @@ flow_sleep=100
 thread
 log_avg_msec=1000
 write_iops_log=t0012.fio
+time_based
 
 [flow1]
 flow=1

* Recent changes (master)
@ 2023-11-07 13:00 Jens Axboe
From: Jens Axboe @ 2023-11-07 13:00 UTC
  To: fio

The following changes since commit 2c0b784a12172da1533dfd40b66a0e4e5609065f:

  Merge branch 'thinkcycles-parameter' of https://github.com/cloehle/fio (2023-11-03 11:21:22 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 05fce19c7d2668adb38243636a1781c0f8fae523:

  docs: add warning to per_job_logs option (2023-11-06 13:46:30 -0500)

----------------------------------------------------------------
Vincent Fu (2):
      client/server: enable per_job_logs option
      docs: add warning to per_job_logs option

 HOWTO.rst |  8 +++++---
 client.c  | 18 ++++++++++++++----
 fio.1     |  6 ++++--
 server.c  |  1 +
 server.h  |  3 ++-
 5 files changed, 26 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 42b2b119..d173702b 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3968,9 +3968,11 @@ Measurements and reporting
 
 .. option:: per_job_logs=bool
 
-	If set, this generates bw/clat/iops log with per file private filenames. If
-	not set, jobs with identical names will share the log filename. Default:
-	true.
+        If set to true, fio generates bw/clat/iops logs with per job unique
+        filenames. If set to false, jobs with identical names will share a log
+        filename. Note that when this option is set to false log files will be
+        opened in append mode and if log files already exist the previous
+        contents will not be overwritten. Default: true.
 
 .. option:: group_reporting
 
diff --git a/client.c b/client.c
index 345fa910..699a2e5b 100644
--- a/client.c
+++ b/client.c
@@ -1452,10 +1452,13 @@ static int fio_client_handle_iolog(struct fio_client *client,
 	if (store_direct) {
 		ssize_t wrote;
 		size_t sz;
-		int fd;
+		int fd, flags;
 
-		fd = open((const char *) log_pathname,
-				O_WRONLY | O_CREAT | O_TRUNC, 0644);
+		if (pdu->per_job_logs)
+			flags = O_WRONLY | O_CREAT | O_TRUNC;
+		else
+			flags = O_WRONLY | O_CREAT | O_APPEND;
+		fd = open((const char *) log_pathname, flags, 0644);
 		if (fd < 0) {
 			log_err("fio: open log %s: %s\n",
 				log_pathname, strerror(errno));
@@ -1476,7 +1479,13 @@ static int fio_client_handle_iolog(struct fio_client *client,
 		ret = 0;
 	} else {
 		FILE *f;
-		f = fopen((const char *) log_pathname, "w");
+		const char *mode;
+
+		if (pdu->per_job_logs)
+			mode = "w";
+		else
+			mode = "a";
+		f = fopen((const char *) log_pathname, mode);
 		if (!f) {
 			log_err("fio: fopen log %s : %s\n",
 				log_pathname, strerror(errno));
@@ -1695,6 +1704,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 	ret->log_offset		= le32_to_cpu(ret->log_offset);
 	ret->log_prio		= le32_to_cpu(ret->log_prio);
 	ret->log_hist_coarseness = le32_to_cpu(ret->log_hist_coarseness);
+	ret->per_job_logs	= le32_to_cpu(ret->per_job_logs);
 
 	if (*store_direct)
 		return ret;
diff --git a/fio.1 b/fio.1
index d62da688..8f659f1d 100644
--- a/fio.1
+++ b/fio.1
@@ -3667,8 +3667,10 @@ interpreted in seconds.
 .SS "Measurements and reporting"
 .TP
 .BI per_job_logs \fR=\fPbool
-If set, this generates bw/clat/iops log with per file private filenames. If
-not set, jobs with identical names will share the log filename. Default:
+If set to true, fio generates bw/clat/iops logs with per job unique filenames.
+If set to false, jobs with identical names will share a log filename. Note that
+when this option is set to false log files will be opened in append mode and if
+log files already exist the previous contents will not be overwritten. Default:
 true.
 .TP
 .BI group_reporting
diff --git a/server.c b/server.c
index 27332e32..06eac584 100644
--- a/server.c
+++ b/server.c
@@ -2260,6 +2260,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 		.thread_number		= cpu_to_le32(td->thread_number),
 		.log_type		= cpu_to_le32(log->log_type),
 		.log_hist_coarseness	= cpu_to_le32(log->hist_coarseness),
+		.per_job_logs		= cpu_to_le32(td->o.per_job_logs),
 	};
 	struct sk_entry *first;
 	struct flist_head *entry;
diff --git a/server.h b/server.h
index ad706118..0eb594ce 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 101,
+	FIO_SERVER_VER			= 102,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
@@ -198,6 +198,7 @@ struct cmd_iolog_pdu {
 	uint32_t log_offset;
 	uint32_t log_prio;
 	uint32_t log_hist_coarseness;
+	uint32_t per_job_logs;
 	uint8_t name[FIO_NET_NAME_MAX];
 	struct io_sample samples[0];
 };

* Recent changes (master)
@ 2023-11-04 12:00 Jens Axboe
From: Jens Axboe @ 2023-11-04 12:00 UTC
  To: fio

The following changes since commit 48cf0c63e5b867c8953f25deaa02466bf94a2eed:

  engines/xnvme: fix fdp support for userspace drivers (2023-11-02 06:08:13 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2c0b784a12172da1533dfd40b66a0e4e5609065f:

  Merge branch 'thinkcycles-parameter' of https://github.com/cloehle/fio (2023-11-03 11:21:22 -0400)

----------------------------------------------------------------
Christian Loehle (1):
      fio: Introduce new constant thinkcycles option

Vincent Fu (1):
      Merge branch 'thinkcycles-parameter' of https://github.com/cloehle/fio

 HOWTO.rst        |  8 ++++++++
 backend.c        |  4 ++++
 cconv.c          |  2 ++
 fio.1            |  7 +++++++
 fio_time.h       |  1 +
 options.c        | 12 ++++++++++++
 thread_options.h |  8 ++++++--
 time.c           | 11 +++++++++++
 8 files changed, 51 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 34d6afdf..42b2b119 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3209,6 +3209,14 @@ I/O depth
 I/O rate
 ~~~~~~~~
 
+.. option:: thinkcycles=int
+
+	Stall the job for the specified number of cycles after an I/O has completed before
+	issuing the next. May be used to simulate processing being done by an application.
+	This is not taken into account for the time to be waited on for  :option:`thinktime`.
+	Might not have any effect on some platforms, this can be checked by trying a setting
+	a high enough amount of thinkcycles.
+
 .. option:: thinktime=time
 
 	Stall the job for the specified period of time after an I/O has completed before issuing the
diff --git a/backend.c b/backend.c
index a5895fec..1fab467a 100644
--- a/backend.c
+++ b/backend.c
@@ -49,6 +49,7 @@
 #include "helper_thread.h"
 #include "pshared.h"
 #include "zone-dist.h"
+#include "fio_time.h"
 
 static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
@@ -1133,6 +1134,9 @@ reap:
 		if (ret < 0)
 			break;
 
+		if (ddir_rw(ddir) && td->o.thinkcycles)
+			cycles_spin(td->o.thinkcycles);
+
 		if (ddir_rw(ddir) && td->o.thinktime)
 			handle_thinktime(td, ddir, &comp_time);
 
diff --git a/cconv.c b/cconv.c
index 341388d4..c9298408 100644
--- a/cconv.c
+++ b/cconv.c
@@ -233,6 +233,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->random_generator = le32_to_cpu(top->random_generator);
 	o->hugepage_size = le32_to_cpu(top->hugepage_size);
 	o->rw_min_bs = le64_to_cpu(top->rw_min_bs);
+	o->thinkcycles = le32_to_cpu(top->thinkcycles);
 	o->thinktime = le32_to_cpu(top->thinktime);
 	o->thinktime_spin = le32_to_cpu(top->thinktime_spin);
 	o->thinktime_blocks = le32_to_cpu(top->thinktime_blocks);
@@ -472,6 +473,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->random_generator = cpu_to_le32(o->random_generator);
 	top->hugepage_size = cpu_to_le32(o->hugepage_size);
 	top->rw_min_bs = __cpu_to_le64(o->rw_min_bs);
+	top->thinkcycles = cpu_to_le32(o->thinkcycles);
 	top->thinktime = cpu_to_le32(o->thinktime);
 	top->thinktime_spin = cpu_to_le32(o->thinktime_spin);
 	top->thinktime_blocks = cpu_to_le32(o->thinktime_blocks);
diff --git a/fio.1 b/fio.1
index c4742aa9..d62da688 100644
--- a/fio.1
+++ b/fio.1
@@ -2962,6 +2962,13 @@ reporting if I/O gets backed up on the device side (the coordinated omission
 problem). Note that this option cannot reliably be used with async IO engines.
 .SS "I/O rate"
 .TP
+.BI thinkcycles \fR=\fPint
+Stall the job for the specified number of cycles after an I/O has completed before
+issuing the next. May be used to simulate processing being done by an application.
+This is not taken into account for the time to be waited on for \fBthinktime\fR.
+Might not have any effect on some platforms, this can be checked by trying a setting
+a high enough amount of thinkcycles.
+.TP
 .BI thinktime \fR=\fPtime
 Stall the job for the specified period of time after an I/O has completed before issuing the
 next. May be used to simulate processing being done by an application.
diff --git a/fio_time.h b/fio_time.h
index b20e734c..969ad68d 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -22,6 +22,7 @@ extern uint64_t time_since_now(const struct timespec *);
 extern uint64_t time_since_genesis(void);
 extern uint64_t mtime_since_genesis(void);
 extern uint64_t utime_since_genesis(void);
+extern void cycles_spin(unsigned int);
 extern uint64_t usec_spin(unsigned int);
 extern uint64_t usec_sleep(struct thread_data *, unsigned long);
 extern void fill_start_time(struct timespec *);
diff --git a/options.c b/options.c
index 6b2cb53f..53df03de 100644
--- a/options.c
+++ b/options.c
@@ -3875,6 +3875,18 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_THINKTIME,
 	},
+	{
+		.name	= "thinkcycles",
+		.lname	= "Think cycles",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, thinkcycles),
+		.help	= "Spin for a constant amount of cycles between requests",
+		.def	= "0",
+		.parent	= "thinktime",
+		.hide	= 1,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_THINKTIME,
+	},
 	{
 		.name	= "thinktime_blocks",
 		.lname	= "Thinktime blocks",
diff --git a/thread_options.h b/thread_options.h
index fdde055e..24f695fe 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -309,6 +309,8 @@ struct thread_options {
 	char *exec_prerun;
 	char *exec_postrun;
 
+	unsigned int thinkcycles;
+
 	unsigned int thinktime;
 	unsigned int thinktime_spin;
 	unsigned int thinktime_blocks;
@@ -355,8 +357,8 @@ struct thread_options {
 
 	unsigned long long latency_target;
 	unsigned long long latency_window;
-	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
+	fio_fp64_t latency_percentile;
 
 	/*
 	 * flow support
@@ -626,6 +628,8 @@ struct thread_options_pack {
 	uint8_t exec_prerun[FIO_TOP_STR_MAX];
 	uint8_t exec_postrun[FIO_TOP_STR_MAX];
 
+	uint32_t thinkcycles;
+
 	uint32_t thinktime;
 	uint32_t thinktime_spin;
 	uint32_t thinktime_blocks;
@@ -671,8 +675,8 @@ struct thread_options_pack {
 	uint64_t latency_target;
 	uint64_t latency_window;
 	uint64_t max_latency[DDIR_RWDIR_CNT];
-	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
+	fio_fp64_t latency_percentile;
 
 	/*
 	 * flow support
diff --git a/time.c b/time.c
index 7cbab6ff..7f85c8de 100644
--- a/time.c
+++ b/time.c
@@ -38,6 +38,17 @@ uint64_t usec_spin(unsigned int usec)
 	return t;
 }
 
+/*
+ * busy loop for a fixed amount of cycles
+ */
+void cycles_spin(unsigned int n)
+{
+	unsigned long i;
+
+	for (i=0; i < n; i++)
+		nop;
+}
+
 uint64_t usec_sleep(struct thread_data *td, unsigned long usec)
 {
 	struct timespec req;

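The patch implements cycles_spin() as a plain loop over the architecture's nop macro, so one "cycle" is one loop iteration rather than a measured CPU clock cycle, and the wall-clock cost of a given thinkcycles value depends on the CPU. Below is a minimal portable sketch of the same idea. It assumes no nop macro is available and uses a volatile loop counter so the compiler cannot delete the otherwise empty loop. spin_iterations() is a hypothetical name, not a fio function.

```c
#include <assert.h>

/*
 * Sketch of a cycles_spin()-style busy loop (hypothetical helper).
 * Assumption: no architecture-specific nop macro is available, so the
 * counter is volatile; every iteration must load and store it, which
 * keeps the compiler from optimizing the empty loop away.
 * Returns the number of iterations executed (always n).
 */
static unsigned long spin_iterations(unsigned int n)
{
	volatile unsigned long i;

	for (i = 0; i < n; i++)
		;	/* intentionally empty: the loop itself is the delay */
	return i;
}
```

In a job file the option sits next to thinktime, for example thinkcycles=1000000. As the documentation hunk suggests, trying a large value is a quick way to check whether it has any measurable effect on a given platform.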

* Recent changes (master)
@ 2023-11-03 12:00 Jens Axboe
From: Jens Axboe @ 2023-11-03 12:00 UTC
  To: fio

The following changes since commit 95f4d3f054464e997ae1067dc7f4f8ec3f896ccc:

  Merge branch 'pi-perf' of https://github.com/ankit-sam/fio (2023-10-31 09:27:15 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 48cf0c63e5b867c8953f25deaa02466bf94a2eed:

  engines/xnvme: fix fdp support for userspace drivers (2023-11-02 06:08:13 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      engines/xnvme: fix fdp support for userspace drivers

 engines/xnvme.c        |  2 +-
 examples/xnvme-fdp.fio | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/xnvme.c b/engines/xnvme.c
index ce7b2bdd..b7824013 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -964,7 +964,7 @@ static int xnvme_fioe_fetch_ruhs(struct thread_data *td, struct fio_file *f,
 	uint32_t nsid;
 	int err = 0, err_lock;
 
-	if (f->filetype != FIO_TYPE_CHAR) {
+	if (f->filetype != FIO_TYPE_CHAR && f->filetype != FIO_TYPE_FILE) {
 		log_err("ioeng->fdp_ruhs(): ignoring filetype: %d\n", f->filetype);
 		return -EINVAL;
 	}
diff --git a/examples/xnvme-fdp.fio b/examples/xnvme-fdp.fio
index 86fbe0d3..c50959f1 100644
--- a/examples/xnvme-fdp.fio
+++ b/examples/xnvme-fdp.fio
@@ -16,6 +16,26 @@
 ;   --xnvme_sync=nvme \
 ;   --filename=/dev/ng0n1
 ;
+; # Use the xNVMe io-engine engine with SPDK backend, note that you have to set the Namespace-id
+; fio examples/xnvme-fdp.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_dev_nsid=1 \
+;   --filename=0000\\:01\\:00.0
+;
+; NOTE: The URI encoded in the filename above, the ":" must be escaped.
+;
+; On the command-line using two "\\":
+;
+; --filename=0000\\:01\\:00.0
+;
+; Within a fio-script using a single "\":
+;
+; filename=0000\:01\:00.0
+;
+; NOTE: If you want to override the default bs, iodepth, and workload, then
+; invoke it as:
+;
 ; FIO_BS="512" FIO_RW="read" FIO_IODEPTH=16 fio examples/xnvme-fdp.fio \
 ;   --section=override --ioengine=xnvme --xnvme_sync=nvme --filename=/dev/ng0n1
 ;

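The NOTE in the example explains why the two spellings differ. fio treats ':' in filename= as a separator between multiple files, so the colons of a PCIe address have to reach fio as '\:'. A POSIX shell turns an unquoted '\\' into a single '\', which is why the command line needs two backslashes and the job file needs one. fio sees the same string either way. The sketch below produces the job-file form from a raw address. escape_colons() is a hypothetical helper, not part of fio or xNVMe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'src' into 'dst', prefixing every ':' with a
 * backslash so fio does not split the filename at the colon.
 * Returns 0 on success, -1 if 'dst' is too small (including the NUL).
 */
static int escape_colons(const char *src, char *dst, size_t dst_len)
{
	size_t o = 0;

	if (dst_len == 0)
		return -1;

	for (; *src != '\0'; src++) {
		size_t need = (*src == ':') ? 2 : 1;

		/* keep one byte free for the terminating NUL */
		if (o + need + 1 > dst_len)
			return -1;
		if (*src == ':')
			dst[o++] = '\\';
		dst[o++] = *src;
	}
	dst[o] = '\0';
	return 0;
}
```

For "0000:01:00.0" the result is the string 0000\:01\:00.0, which is what goes after filename= in a fio job file.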

* Recent changes (master)
@ 2023-11-01 12:00 Jens Axboe
From: Jens Axboe @ 2023-11-01 12:00 UTC
  To: fio

The following changes since commit 7a725c78547f7337dddb6fd391f80914f671e583:

  Merge branch 'englist' of https://github.com/vt-alt/fio (2023-10-25 17:53:40 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 95f4d3f054464e997ae1067dc7f4f8ec3f896ccc:

  Merge branch 'pi-perf' of https://github.com/ankit-sam/fio (2023-10-31 09:27:15 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      crct10: use isa-l for crc if available

Jens Axboe (1):
      Merge branch 'pi-perf' of https://github.com/ankit-sam/fio

 HOWTO.rst              |  4 ++++
 configure              | 29 +++++++++++++++++++++++++++++
 crc/crct10dif_common.c | 13 +++++++++++++
 fio.1                  |  4 ++++
 4 files changed, 50 insertions(+)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 6a8fb3e3..34d6afdf 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2507,6 +2507,10 @@ with the caveat that when used on the command line, they must come after the
 	If this is set to 0, fio generates protection information for
 	write case and verifies for read case. Default: 1.
 
+	For 16 bit CRC generation fio will use isa-l if available otherwise
+	it will use the default slower generator.
+	(see: https://github.com/intel/isa-l)
+
 .. option:: pi_chk=str[,str][,str] : [io_uring_cmd]
 
 	Controls the protection information check. This can take one or more
diff --git a/configure b/configure
index 3e3f8132..420d97db 100755
--- a/configure
+++ b/configure
@@ -189,6 +189,7 @@ libiscsi="no"
 libnbd="no"
 libnfs=""
 xnvme=""
+isal=""
 libblkio=""
 libzbc=""
 dfs=""
@@ -262,6 +263,8 @@ for opt do
   ;;
   --disable-xnvme) xnvme="no"
   ;;
+  --disable-isal) isal="no"
+  ;;
   --disable-libblkio) libblkio="no"
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
@@ -322,6 +325,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-xnvme         Disable xnvme support even if found"
+  echo "--disable-isal          Disable isal support even if found"
   echo "--disable-libblkio      Disable libblkio support even if found"
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc      Disable tcmalloc support"
@@ -2684,6 +2688,28 @@ if test "$xnvme" != "no" ; then
 fi
 print_config "xnvme engine" "$xnvme"
 
+if test "$targetos" = "Linux" ; then
+##########################################
+# Check ISA-L support
+cat > $TMPC << EOF
+#include <isa-l/crc.h>
+#include <stddef.h>
+int main(void)
+{
+  return crc16_t10dif(0, NULL, 4096);
+}
+EOF
+if test "$isal" != "no" ; then
+  if compile_prog "" "-lisal" "ISAL"; then
+    isal="yes"
+    LIBS="-lisal $LIBS"
+  else
+    isal="no"
+  fi
+fi
+print_config "isal" "$isal"
+fi
+
 ##########################################
 # Check if we have libblkio
 if test "$libblkio" != "no" ; then
@@ -3334,6 +3360,9 @@ if test "$xnvme" = "yes" ; then
   echo "LIBXNVME_CFLAGS=$xnvme_cflags" >> $config_host_mak
   echo "LIBXNVME_LIBS=$xnvme_libs" >> $config_host_mak
 fi
+if test "$isal" = "yes" ; then
+  output_sym "CONFIG_LIBISAL"
+fi
 if test "$libblkio" = "yes" ; then
   output_sym "CONFIG_LIBBLKIO"
   echo "LIBBLKIO_CFLAGS=$libblkio_cflags" >> $config_host_mak
diff --git a/crc/crct10dif_common.c b/crc/crct10dif_common.c
index cfb2a1b1..1763b1c6 100644
--- a/crc/crct10dif_common.c
+++ b/crc/crct10dif_common.c
@@ -24,6 +24,17 @@
  *
  */
 
+#ifdef CONFIG_LIBISAL
+#include <isa-l/crc.h>
+
+extern unsigned short fio_crc_t10dif(unsigned short crc,
+				     const unsigned char *buffer,
+				     unsigned int len)
+{
+	return crc16_t10dif(crc, buffer, len);
+}
+
+#else
 #include "crc-t10dif.h"
 
 /* Table generated using the following polynomium:
@@ -76,3 +87,5 @@ extern unsigned short fio_crc_t10dif(unsigned short crc,
 
 	return crc;
 }
+
+#endif
diff --git a/fio.1 b/fio.1
index a8dc8f6c..c4742aa9 100644
--- a/fio.1
+++ b/fio.1
@@ -2263,6 +2263,10 @@ size greater than protection information size, fio will not generate or verify
 the protection information portion of metadata for write or read case
 respectively. If this is set to 0, fio generates protection information for
 write case and verifies for read case. Default: 1.
+
+For 16 bit CRC generation fio will use isa-l if available otherwise it will
+use the default slower generator.
+(see: https://github.com/intel/isa-l)
 .TP
 .BI (io_uring_cmd)pi_chk \fR=\fPstr[,str][,str]
 Controls the protection information check. This can take one or more of these

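The fallback in crc/crct10dif_common.c is table-driven, and ISA-L's crc16_t10dif() is an optimized implementation with the same (seed, buffer, length) shape, as the wrapper in the patch shows. For cross-checking either one, here is a bit-at-a-time sketch of the same CRC: polynomial 0x8BB7, no bit reflection, no final XOR (the parameters usually catalogued as CRC-16/T10-DIF). It is far slower than both and is only meant as a reference. crc_t10dif_bitwise() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bit-at-a-time CRC16 T10-DIF (hypothetical reference helper).
 * Polynomial 0x8BB7, MSB first, no reflection, no final XOR.
 * Like fio_crc_t10dif(), it takes the previous CRC as a seed
 * (0 to start), so data can be processed in chunks.
 */
static unsigned short crc_t10dif_bitwise(unsigned short crc,
					 const unsigned char *buf, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		/* feed the next byte into the top 8 bits of the register */
		crc ^= (unsigned short)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (unsigned short)((crc << 1) ^ 0x8BB7);
			else
				crc = (unsigned short)(crc << 1);
		}
	}
	return crc;
}
```

Because there is no final XOR, the result of one call can be passed as the seed of the next. For the ASCII string "123456789" the result is 0xD0DB, the published check value for this CRC.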

* Recent changes (master)
@ 2023-10-26 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-26 12:00 UTC
  To: fio

The following changes since commit c11e22e92f3796f21eb15eb6ddc1614d9fa4f99d:

  Merge branch 'spellingfixes-2023-10-23' of https://github.com/proact-de/fio (2023-10-23 08:32:46 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7a725c78547f7337dddb6fd391f80914f671e583:

  Merge branch 'englist' of https://github.com/vt-alt/fio (2023-10-25 17:53:40 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      engines/io_uring_cmd: allocate enough ranges for async trims
      Merge branch 'englist' of https://github.com/vt-alt/fio

Vitaly Chikunov (1):
      nfs: Fix incorrect engine registering for '--enghelp' list

 engines/io_uring.c | 2 +-
 engines/nfs.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 05703df8..38c36fdc 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1196,7 +1196,7 @@ static int fio_ioring_init(struct thread_data *td)
 	    td->o.zone_mode == ZONE_MODE_ZBD)
 		td->io_ops->flags |= FIO_ASYNCIO_SYNC_TRIM;
 	else
-		ld->dsm = calloc(ld->iodepth, sizeof(*ld->dsm));
+		ld->dsm = calloc(td->o.iodepth, sizeof(*ld->dsm));
 
 	return 0;
 }
diff --git a/engines/nfs.c b/engines/nfs.c
index 970962a3..ce748d14 100644
--- a/engines/nfs.c
+++ b/engines/nfs.c
@@ -308,7 +308,7 @@ static int fio_libnfs_close(struct thread_data *td, struct fio_file *f)
 	return ret;
 }
 
-struct ioengine_ops ioengine = {
+static struct ioengine_ops ioengine = {
 	.name		= "nfs",
 	.version	= FIO_IOOPS_VERSION,
 	.setup		= fio_libnfs_setup,

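The io_uring_cmd fix sizes the DSM (Dataset Management) range array from the job's iodepth option, td->o.iodepth, so that enough ranges exist for asynchronous trims. The sketch below shows the pattern. It assumes each in-flight trim owns one range entry selected by its queue slot. alloc_ranges() and prep_trim() are hypothetical helpers; the 16-byte range layout (context attributes, length in logical blocks, starting LBA) follows the NVMe specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* NVMe Dataset Management range entry: 16 bytes. */
struct dsm_range {
	uint32_t cattr;	/* context attributes */
	uint32_t nlb;	/* length in logical blocks */
	uint64_t slba;	/* starting LBA */
};

/* Allocate one zeroed range per queue slot, sized from the configured depth. */
static struct dsm_range *alloc_ranges(unsigned int iodepth)
{
	return calloc(iodepth, sizeof(struct dsm_range));
}

/* Fill the range owned by queue slot 'idx' (assumption: idx < iodepth). */
static void prep_trim(struct dsm_range *r, unsigned int idx,
		      uint64_t slba, uint32_t nlb)
{
	r[idx].cattr = 0;
	r[idx].nlb = nlb;
	r[idx].slba = slba;
}
```

With iodepth=4, slots 0 to 3 each get their own entry, so concurrent trims never share a range buffer.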

* Recent changes (master)
@ 2023-10-24 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-24 12:00 UTC
  To: fio

The following changes since commit d4fdbe3fa91bbcc9583886af35b56cc7b691f8fa:

  Merge branch 'fix-riscv64-cpu-clock' of https://github.com/gilbsgilbs/fio (2023-10-22 18:52:51 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c11e22e92f3796f21eb15eb6ddc1614d9fa4f99d:

  Merge branch 'spellingfixes-2023-10-23' of https://github.com/proact-de/fio (2023-10-23 08:32:46 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'spellingfixes-2023-10-23' of https://github.com/proact-de/fio

Martin Steigerwald (1):
      Various spelling fixes.

 HOWTO.rst | 14 +++++++-------
 configure |  2 +-
 fio.1     |  8 ++++----
 3 files changed, 12 insertions(+), 12 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index cc7124b1..6a8fb3e3 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2344,8 +2344,8 @@ with the caveat that when used on the command line, they must come after the
 
 		cmdprio_bssplit=blocksize/percentage/class/level/hint:...
 
-	This is an extension of the second accepted format that allows to also
-	specify a priority hint.
+	This is an extension of the second accepted format that allows one to
+	also specify a priority hint.
 
 	For all formats, only the read and write data directions are supported,
 	values for trim IOs are ignored. This option is mutually exclusive with
@@ -3014,7 +3014,7 @@ with the caveat that when used on the command line, they must come after the
 	**hugepage**
 		Use hugepages, instead of existing posix memory backend. The
 		memory backend uses hugetlbfs. This require users to allocate
-		hugepages, mount hugetlbfs and set an enviornment variable for
+		hugepages, mount hugetlbfs and set an environment variable for
 		XNVME_HUGETLB_PATH.
 	**spdk**
 		Uses SPDK's memory allocator.
@@ -3047,7 +3047,7 @@ with the caveat that when used on the command line, they must come after the
 	creating but before connecting the libblkio instance. Each property must
 	have the format ``<name>=<value>``. Colons can be escaped as ``\:``.
 	These are set after the engine sets any other properties, so those can
-	be overriden. Available properties depend on the libblkio version in use
+	be overridden. Available properties depend on the libblkio version in use
 	and are listed at
 	https://libblkio.gitlab.io/libblkio/blkio.html#properties
 
@@ -3071,7 +3071,7 @@ with the caveat that when used on the command line, they must come after the
 	connecting but before starting the libblkio instance. Each property must
 	have the format ``<name>=<value>``. Colons can be escaped as ``\:``.
 	These are set after the engine sets any other properties, so those can
-	be overriden. Available properties depend on the libblkio version in use
+	be overridden. Available properties depend on the libblkio version in use
 	and are listed at
 	https://libblkio.gitlab.io/libblkio/blkio.html#properties
 
@@ -3635,8 +3635,8 @@ Threads, processes and job synchronization
 	By default, fio will continue running all other jobs when one job finishes.
 	Sometimes this is not the desired action. Setting ``exitall`` will
 	instead make fio terminate all jobs in the same group. The option
-        ``exit_what`` allows to control which jobs get terminated when ``exitall`` is
-        enabled. The default is ``group`` and does not change the behaviour of
+        ``exit_what`` allows one to control which jobs get terminated when ``exitall``
+        is enabled. The default is ``group`` and does not change the behaviour of
         ``exitall``. The setting ``all`` terminates all jobs. The setting ``stonewall``
         terminates all currently running jobs across all groups and continues execution
         with the next stonewalled group.
diff --git a/configure b/configure
index 742cb7c5..3e3f8132 100755
--- a/configure
+++ b/configure
@@ -334,7 +334,7 @@ if test "$show_help" = "yes" ; then
 fi
 
 cross_prefix=${cross_prefix-${CROSS_COMPILE}}
-# Preferred compiler (can be overriden later after we know the platform):
+# Preferred compiler (can be overridden later after we know the platform):
 #  ${CC} (if set)
 #  ${cross_prefix}gcc (if cross-prefix specified)
 #  gcc if available
diff --git a/fio.1 b/fio.1
index 628e278d..a8dc8f6c 100644
--- a/fio.1
+++ b/fio.1
@@ -2142,7 +2142,7 @@ The third accepted format for this option is:
 cmdprio_bssplit=blocksize/percentage/class/level/hint:...
 .RE
 .P
-This is an extension of the second accepted format that allows to also
+This is an extension of the second accepted format that allows one to also
 specify a priority hint.
 .P
 For all formats, only the read and write data directions are supported, values
@@ -2774,7 +2774,7 @@ This is the default posix memory backend for linux NVMe driver.
 .BI hugepage
 Use hugepages, instead of existing posix memory backend. The memory backend
 uses hugetlbfs. This require users to allocate hugepages, mount hugetlbfs and
-set an enviornment variable for XNVME_HUGETLB_PATH.
+set an environment variable for XNVME_HUGETLB_PATH.
 .TP
 .BI spdk
 Uses SPDK's memory allocator.
@@ -2803,7 +2803,7 @@ support it; see \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
 A colon-separated list of additional libblkio properties to be set after
 creating but before connecting the libblkio instance. Each property must have
 the format \fB<name>=<value>\fR. Colons can be escaped as \fB\\:\fR. These are
-set after the engine sets any other properties, so those can be overriden.
+set after the engine sets any other properties, so those can be overridden.
 Available properties depend on the libblkio version in use and are listed at
 \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#properties\fR
 .TP
@@ -2821,7 +2821,7 @@ may support it; see \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
 A colon-separated list of additional libblkio properties to be set after
 connecting but before starting the libblkio instance. Each property must have
 the format \fB<name>=<value>\fR. Colons can be escaped as \fB\\:\fR. These are
-set after the engine sets any other properties, so those can be overriden.
+set after the engine sets any other properties, so those can be overridden.
 Available properties depend on the libblkio version in use and are listed at
 \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#properties\fR
 .TP


* Recent changes (master)
@ 2023-10-23 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-23 12:00 UTC
  To: fio

The following changes since commit f8735bf1fb208bc1b6b1ca818413c9e41944e813:

  Merge branch 'master' of https://github.com/michalbiesek/fio (2023-10-20 04:32:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d4fdbe3fa91bbcc9583886af35b56cc7b691f8fa:

  Merge branch 'fix-riscv64-cpu-clock' of https://github.com/gilbsgilbs/fio (2023-10-22 18:52:51 -0600)

----------------------------------------------------------------
Gilbert Gilb's (1):
      riscv64: get clock from `rdtime` instead of `rdcycle`

Jens Axboe (1):
      Merge branch 'fix-riscv64-cpu-clock' of https://github.com/gilbsgilbs/fio

 arch/arch-riscv64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/arch/arch-riscv64.h b/arch/arch-riscv64.h
index 9b8fd001..8ac33fa3 100644
--- a/arch/arch-riscv64.h
+++ b/arch/arch-riscv64.h
@@ -16,7 +16,7 @@ static inline unsigned long long get_cpu_clock(void)
 {
 	unsigned long val;
 
-	asm volatile("rdcycle %0" : "=r"(val));
+	asm volatile("rdtime %0" : "=r"(val));
 	return val;
 }
 #define ARCH_HAVE_CPU_CLOCK

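rdcycle counts core clock cycles. rdtime instead reads the RISC-V 'time' CSR, which the specification defines as a real-time counter advancing at a fixed rate. Below is a minimal sketch of such a counter read with a portable fallback. read_clock() is a hypothetical name, and the fallback returns nanoseconds from CLOCK_MONOTONIC rather than anything fio itself uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Read a free-running counter (hypothetical helper).
 * On 64-bit RISC-V, use rdtime, as the patch does; elsewhere fall back
 * to CLOCK_MONOTONIC, expressed in nanoseconds.
 */
static uint64_t read_clock(void)
{
#if defined(__riscv) && __riscv_xlen == 64
	unsigned long val;

	__asm__ volatile("rdtime %0" : "=r"(val));
	return val;
#else
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Only differences between two reads are meaningful. Converting rdtime ticks to seconds needs the platform's timebase frequency, which this sketch does not look up.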

* Recent changes (master)
@ 2023-10-20 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-20 12:00 UTC
  To: fio

The following changes since commit c5d8ce3fc736210ded83b126c71e3225c7ffd7c9:

  ci: explicitly install pygments and certifi on macos (2023-10-16 10:54:21 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f8735bf1fb208bc1b6b1ca818413c9e41944e813:

  Merge branch 'master' of https://github.com/michalbiesek/fio (2023-10-20 04:32:39 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'fix_issue_1642' of https://github.com/zqs-Oppenauer/fio
      Fio 3.36
      Merge branch 'master' of https://github.com/michalbiesek/fio

Michal Biesek (1):
      riscv64: add syscall helpers

Shai Levy (2):
      configure: improve pthread_sigmask detection.
      helper_thread: fix pthread_sigmask typo.

Vincent Fu (1):
      Merge branch 'master' of https://github.com/shailevi23/fio

zhuqingsong.0909 (1):
      fix assert failed when timeout during call rate_ddir.

 FIO-VERSION-GEN     |  2 +-
 arch/arch-riscv64.h | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 configure           |  3 +-
 helper_thread.c     |  5 ++--
 io_ddir.h           |  1 +
 io_u.c              | 10 +++++--
 zbd.c               |  1 +
 7 files changed, 101 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 4b0d56d0..cf8dbb0e 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.35
+DEF_VER=fio-3.36
 
 LF='
 '
diff --git a/arch/arch-riscv64.h b/arch/arch-riscv64.h
index a74b7d47..9b8fd001 100644
--- a/arch/arch-riscv64.h
+++ b/arch/arch-riscv64.h
@@ -29,4 +29,90 @@ static inline int arch_init(char *envp[])
 	return 0;
 }
 
+#define __do_syscallM(...) ({						\
+	__asm__ volatile (						\
+		"ecall"							\
+		: "=r"(a0)						\
+		: __VA_ARGS__						\
+		: "memory", "a1");					\
+	(long) a0;							\
+})
+
+#define __do_syscallN(...) ({						\
+	__asm__ volatile (						\
+		"ecall"							\
+		: "=r"(a0)						\
+		: __VA_ARGS__						\
+		: "memory");					\
+	(long) a0;							\
+})
+
+#define __do_syscall0(__n) ({						\
+	register long a7 __asm__("a7") = __n;				\
+	register long a0 __asm__("a0");					\
+									\
+	__do_syscallM("r" (a7));					\
+})
+
+#define __do_syscall1(__n, __a) ({					\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+									\
+	__do_syscallM("r" (a7), "0" (a0));				\
+})
+
+#define __do_syscall2(__n, __a, __b) ({					\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+	register __typeof__(__b) a1 __asm__("a1") = __b;		\
+									\
+	__do_syscallN("r" (a7), "0" (a0), "r" (a1));			\
+})
+
+#define __do_syscall3(__n, __a, __b, __c) ({				\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+	register __typeof__(__b) a1 __asm__("a1") = __b;		\
+	register __typeof__(__c) a2 __asm__("a2") = __c;		\
+									\
+	__do_syscallN("r" (a7), "0" (a0), "r" (a1), "r" (a2));		\
+})
+
+#define __do_syscall4(__n, __a, __b, __c, __d) ({			\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+	register __typeof__(__b) a1 __asm__("a1") = __b;		\
+	register __typeof__(__c) a2 __asm__("a2") = __c;		\
+	register __typeof__(__d) a3 __asm__("a3") = __d;		\
+									\
+	__do_syscallN("r" (a7), "0" (a0), "r" (a1), "r" (a2), "r" (a3));\
+})
+
+#define __do_syscall5(__n, __a, __b, __c, __d, __e) ({			\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+	register __typeof__(__b) a1 __asm__("a1") = __b;		\
+	register __typeof__(__c) a2 __asm__("a2") = __c;		\
+	register __typeof__(__d) a3 __asm__("a3") = __d;		\
+	register __typeof__(__e) a4 __asm__("a4") = __e;		\
+									\
+	__do_syscallN("r" (a7), "0" (a0), "r" (a1), "r" (a2), "r" (a3),	\
+			"r"(a4));					\
+})
+
+#define __do_syscall6(__n, __a, __b, __c, __d, __e, __f) ({		\
+	register long a7 __asm__("a7") = __n;				\
+	register __typeof__(__a) a0 __asm__("a0") = __a;		\
+	register __typeof__(__b) a1 __asm__("a1") = __b;		\
+	register __typeof__(__c) a2 __asm__("a2") = __c;		\
+	register __typeof__(__d) a3 __asm__("a3") = __d;		\
+	register __typeof__(__e) a4 __asm__("a4") = __e;		\
+	register __typeof__(__f) a5 __asm__("a5") = __f;		\
+									\
+	__do_syscallN("r" (a7), "0" (a0), "r" (a1), "r" (a2), "r" (a3),	\
+			"r" (a4), "r"(a5));				\
+})
+
+#define FIO_ARCH_HAS_SYSCALL
+
 #endif
diff --git a/configure b/configure
index 36184a58..742cb7c5 100755
--- a/configure
+++ b/configure
@@ -864,7 +864,8 @@ cat > $TMPC <<EOF
 #include <signal.h> /* pthread_sigmask() */
 int main(void)
 {
-  return pthread_sigmask(0, NULL, NULL);
+  sigset_t sigmask;
+  return pthread_sigmask(0, NULL, &sigmask);
 }
 EOF
 if compile_prog "" "$LIBS" "pthread_sigmask" ; then
diff --git a/helper_thread.c b/helper_thread.c
index 53dea44b..2a9dabf5 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -106,13 +106,14 @@ static int read_from_pipe(int fd, void *buf, size_t len)
 
 static void block_signals(void)
 {
-#ifdef HAVE_PTHREAD_SIGMASK
+#ifdef CONFIG_PTHREAD_SIGMASK
 	sigset_t sigmask;
 
+	int ret;
+
 	ret = pthread_sigmask(SIG_UNBLOCK, NULL, &sigmask);
 	assert(ret == 0);
 	ret = pthread_sigmask(SIG_BLOCK, &sigmask, NULL);
-	assert(ret == 0);
 #endif
 }
 
diff --git a/io_ddir.h b/io_ddir.h
index 217eb628..280c1e79 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -11,6 +11,7 @@ enum fio_ddir {
 	DDIR_WAIT,
 	DDIR_LAST,
 	DDIR_INVAL = -1,
+	DDIR_TIMEOUT = -2,
 
 	DDIR_RWDIR_CNT = 3,
 	DDIR_RWDIR_SYNC_CNT = 4,
diff --git a/io_u.c b/io_u.c
index 07e5bac5..13187882 100644
--- a/io_u.c
+++ b/io_u.c
@@ -717,7 +717,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 		 * check if the usec is capable of taking negative values
 		 */
 		if (now > td->o.timeout) {
-			ddir = DDIR_INVAL;
+			ddir = DDIR_TIMEOUT;
 			return ddir;
 		}
 		usec = td->o.timeout - now;
@@ -726,7 +726,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 
 	now = utime_since_now(&td->epoch);
 	if ((td->o.timeout && (now > td->o.timeout)) || td->terminate)
-		ddir = DDIR_INVAL;
+		ddir = DDIR_TIMEOUT;
 
 	return ddir;
 }
@@ -951,7 +951,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 
 	set_rw_ddir(td, io_u);
 
-	if (io_u->ddir == DDIR_INVAL) {
+	if (io_u->ddir == DDIR_INVAL || io_u->ddir == DDIR_TIMEOUT) {
 		dprint(FD_IO, "invalid direction received ddir = %d", io_u->ddir);
 		return 1;
 	}
@@ -1419,6 +1419,10 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		put_file_log(td, f);
 		td_io_close_file(td, f);
 		io_u->file = NULL;
+
+		if (io_u->ddir == DDIR_TIMEOUT)
+			return 1;
+
 		if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)
 			fio_file_reset(td, f);
 		else {
diff --git a/zbd.c b/zbd.c
index caac68bb..c4f7b12f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -2171,6 +2171,7 @@ retry:
 	case DDIR_WAIT:
 	case DDIR_LAST:
 	case DDIR_INVAL:
+	case DDIR_TIMEOUT:
 		goto accept;
 	}
 

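The io_u.c changes give the timeout case its own sentinel. Once the job's timeout has passed, rate_ddir() now returns DDIR_TIMEOUT instead of DDIR_INVAL. fill_io_u() treats both values as "nothing to issue", and set_io_u_file() returns early on DDIR_TIMEOUT before its file-reset logic runs. The sketch below reduces that decision logic. The enum values mirror the patch, but pick_ddir() and should_skip() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced mirror of the relevant fio_ddir values (hypothetical names). */
enum ddir {
	DIR_READ = 0,
	DIR_WRITE = 1,
	DIR_INVAL = -1,
	DIR_TIMEOUT = -2,	/* runtime expired, not a bad direction */
};

/*
 * Sketch of the rate_ddir() change: past the timeout, report a distinct
 * DIR_TIMEOUT so callers can wind down cleanly.
 * timeout_us == 0 means no timeout is set.
 */
static enum ddir pick_ddir(enum ddir want, uint64_t now_us, uint64_t timeout_us)
{
	if (timeout_us && now_us > timeout_us)
		return DIR_TIMEOUT;
	return want;
}

/* Mirrors the fill_io_u() check: both sentinels mean "no I/O to issue". */
static int should_skip(enum ddir d)
{
	return d == DIR_INVAL || d == DIR_TIMEOUT;
}
```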

* Recent changes (master)
@ 2023-10-17 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-17 12:00 UTC
  To: fio

The following changes since commit 50b94305b08a746c21a2c644ffb3cb56915d86ee:

  t/zbd: avoid test case 45 failure (2023-10-13 17:31:47 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c5d8ce3fc736210ded83b126c71e3225c7ffd7c9:

  ci: explicitly install pygments and certifi on macos (2023-10-16 10:54:21 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      ci: explicitly install pygments and certifi on macos

 ci/actions-install.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 95241e78..76335fbc 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -86,7 +86,7 @@ install_macos() {
     #echo "Updating homebrew..."
     #brew update >/dev/null 2>&1
     echo "Installing packages..."
-    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs sphinx-doc
+    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs sphinx-doc pygments python-certifi
     brew link sphinx-doc --force
     pip3 install scipy six statsmodels
 }


* Recent changes (master)
@ 2023-10-14 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-14 12:00 UTC
  To: fio

The following changes since commit 16b9f29dab1d105951da663474ec243942fda400:

  Merge branch 'fix-stat-overflow' of https://github.com/stilor/fio (2023-10-06 15:27:23 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 50b94305b08a746c21a2c644ffb3cb56915d86ee:

  t/zbd: avoid test case 45 failure (2023-10-13 17:31:47 -0400)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      t/zbd: avoid test case 45 failure

 t/zbd/test-zbd-support | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 0436d319..2f15a191 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1058,15 +1058,20 @@ test44() {
 
 test45() {
     local bs i
+    local grep_str="fio: first I/O failed. If .* is a zoned block device, consider --zonemode=zbd"
 
     require_zbd || return $SKIP_TESTCASE
     prep_write
     bs=$((min_seq_write_size))
-    run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
-		    --offset=$((first_sequential_zone_sector * 512)) \
-		    --size="$zone_size" --do_verify=1 --verify=md5 2>&1 |
-	tee -a "${logfile}.${test_number}" |
-	grep -q "fio: first I/O failed. If .* is a zoned block device, consider --zonemode=zbd"
+    for ((i = 0; i < 10; i++)); do
+	    run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite \
+			    --offset=$((first_sequential_zone_sector * 512)) \
+			    --bs="$bs" --time_based --runtime=1s \
+			    --do_verify=1 --verify=md5 \
+		    >> "${logfile}.${test_number}" 2>&1
+	    grep -qe "$grep_str" "${logfile}.${test_number}" && return 0
+    done
+    return 1
 }
 
 # Random write to sequential zones, libaio, 8 jobs, queue depth 64 per job


* Recent changes (master)
@ 2023-10-07 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-07 12:00 UTC
  To: fio

The following changes since commit 6f9cdcfcc7598c7d7b19c4a5120a251a80dab183:

  iolog: don't truncate time values (2023-10-02 14:29:29 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 16b9f29dab1d105951da663474ec243942fda400:

  Merge branch 'fix-stat-overflow' of https://github.com/stilor/fio (2023-10-06 15:27:23 -0400)

----------------------------------------------------------------
Alexey Neyman (2):
      Change memcpy() calls to assignments
      Handle 32-bit overflows in disk utilization stats

Vincent Fu (1):
      Merge branch 'fix-stat-overflow' of https://github.com/stilor/fio

 diskutil.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/diskutil.c b/diskutil.c
index cf4ede85..69b3dd26 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -77,6 +77,23 @@ static int get_io_ticks(struct disk_util *du, struct disk_util_stat *dus)
 	return ret != 10;
 }
 
+static uint64_t safe_32bit_diff(uint64_t nval, uint64_t oval)
+{
+	/* Linux kernel prints some of the stat fields as 32-bit integers. It is
+	 * possible that the value overflows, but since fio uses unsigned 64-bit
+	 * arithmetic in update_io_tick_disk(), it instead results in a huge
+	 * bogus value being added to the respective accumulating field. Just
+	 * in case Linux starts reporting these metrics as 64-bit values in the
+	 * future, check that overflow actually happens around the 32-bit
+	 * unsigned boundary; assume overflow only happens once between
+	 * successive polls.
+	 */
+	if (oval <= nval || oval >= (1ull << 32))
+		return nval - oval;
+	else
+		return (1ull << 32) + nval - oval;
+}
+
 static void update_io_tick_disk(struct disk_util *du)
 {
 	struct disk_util_stat __dus, *dus, *ldus;
@@ -96,15 +113,16 @@ static void update_io_tick_disk(struct disk_util *du)
 	dus->s.ios[1] += (__dus.s.ios[1] - ldus->s.ios[1]);
 	dus->s.merges[0] += (__dus.s.merges[0] - ldus->s.merges[0]);
 	dus->s.merges[1] += (__dus.s.merges[1] - ldus->s.merges[1]);
-	dus->s.ticks[0] += (__dus.s.ticks[0] - ldus->s.ticks[0]);
-	dus->s.ticks[1] += (__dus.s.ticks[1] - ldus->s.ticks[1]);
-	dus->s.io_ticks += (__dus.s.io_ticks - ldus->s.io_ticks);
-	dus->s.time_in_queue += (__dus.s.time_in_queue - ldus->s.time_in_queue);
+	dus->s.ticks[0] += safe_32bit_diff(__dus.s.ticks[0], ldus->s.ticks[0]);
+	dus->s.ticks[1] += safe_32bit_diff(__dus.s.ticks[1], ldus->s.ticks[1]);
+	dus->s.io_ticks += safe_32bit_diff(__dus.s.io_ticks, ldus->s.io_ticks);
+	dus->s.time_in_queue +=
+			safe_32bit_diff(__dus.s.time_in_queue, ldus->s.time_in_queue);
 
 	fio_gettime(&t, NULL);
 	dus->s.msec += mtime_since(&du->time, &t);
-	memcpy(&du->time, &t, sizeof(t));
-	memcpy(&ldus->s, &__dus.s, sizeof(__dus.s));
+	du->time = t;
+	ldus->s = __dus.s;
 }
 
 int update_io_ticks(void)
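The wrap-around arithmetic in safe_32bit_diff() can be checked in isolation. The sketch below copies the function from the patch unchanged and adds a hypothetical naive_diff() helper that mimics the old plain subtraction, so the two can be compared on a counter that wrapped once between polls.

```c
#include <stdint.h>

/* Copied from the patch: wrap-aware delta for counters the kernel prints as 32-bit. */
static uint64_t safe_32bit_diff(uint64_t nval, uint64_t oval)
{
	/* No wrap, or the old value is already beyond 32 bits (a real 64-bit counter). */
	if (oval <= nval || oval >= (1ull << 32))
		return nval - oval;
	/* Assume exactly one wrap at the 2^32 boundary since the last poll. */
	else
		return (1ull << 32) + nval - oval;
}

/* Hypothetical helper: the pre-patch behaviour, plain unsigned 64-bit subtraction. */
static uint64_t naive_diff(uint64_t nval, uint64_t oval)
{
	return nval - oval;
}
```

With oval = 0xfffffffe and nval = 5, safe_32bit_diff() returns 7 (two ticks up to the wrap plus five after it), while naive_diff() returns a value close to 2^64, which is the huge bogus increment the commit describes.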

* Recent changes (master)
@ 2023-10-03 12:00 Jens Axboe
From: Jens Axboe @ 2023-10-03 12:00 UTC
  To: fio

The following changes since commit c95b52caacc8ef5c1235fb3754186e981b109bdb:

  ci: switch macos runs from macos-12 to macos-13 (2023-09-29 11:51:10 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6f9cdcfcc7598c7d7b19c4a5120a251a80dab183:

  iolog: don't truncate time values (2023-10-02 14:29:29 +0000)

----------------------------------------------------------------
Vincent Fu (1):
      iolog: don't truncate time values

 iolog.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index 97ba4396..5213c60f 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1002,14 +1002,14 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 
 	if (log_offset) {
 		if (log_prio)
-			fmt = "%lu, %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
+			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
 		else
-			fmt = "%lu, %" PRId64 ", %u, %llu, %llu, %u\n";
+			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %llu, %u\n";
 	} else {
 		if (log_prio)
-			fmt = "%lu, %" PRId64 ", %u, %llu, 0x%04x\n";
+			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, 0x%04x\n";
 		else
-			fmt = "%lu, %" PRId64 ", %u, %llu, %u\n";
+			fmt = "%" PRIu64 ", %" PRId64 ", %u, %llu, %u\n";
 	}
 
 	nr_samples = sample_size / __log_entry_sz(log_offset);
@@ -1024,7 +1024,7 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 
 		if (!log_offset) {
 			fprintf(f, fmt,
-				(unsigned long) s->time,
+				s->time,
 				s->data.val,
 				io_sample_ddir(s), (unsigned long long) s->bs,
 				prio_val);
@@ -1032,7 +1032,7 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 			struct io_sample_offset *so = (void *) s;
 
 			fprintf(f, fmt,
-				(unsigned long) s->time,
+				s->time,
 				s->data.val,
 				io_sample_ddir(s), (unsigned long long) s->bs,
 				(unsigned long long) so->offset,
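A minimal sketch of why the cast mattered: on platforms where `long` is 32 bits (32-bit Linux, 64-bit Windows), `(unsigned long) s->time` drops the high bits, and millisecond timestamps based on an alternate epoch such as the Unix epoch are far larger than 2^32 (2^32 ms is only about 49.7 days). The helper names below are hypothetical; the truncating variant models the old behaviour with an explicit `uint32_t` cast so it behaves the same on every platform.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Patched behaviour: pass the uint64_t unchanged and let PRIu64 pick the modifier. */
static int format_time(char *buf, size_t len, uint64_t t)
{
	return snprintf(buf, len, "%" PRIu64, t);
}

/* Old behaviour where long is 32 bits: high bits of the timestamp are lost. */
static int format_time_truncated(char *buf, size_t len, uint64_t t)
{
	return snprintf(buf, len, "%" PRIu32, (uint32_t)t);
}
```

For t = 2^32 + 7, format_time() writes "4294967303" while format_time_truncated() writes "7".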

* Recent changes (master)
@ 2023-09-30 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-30 12:00 UTC
  To: fio

The following changes since commit 06812a4f0e4ff4847076e742557ab406a0e96848:

  Merge branch 'fix_verify_block_offset' of https://github.com/ipylypiv/fio (2023-09-29 00:05:10 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c95b52caacc8ef5c1235fb3754186e981b109bdb:

  ci: switch macos runs from macos-12 to macos-13 (2023-09-29 11:51:10 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      workqueue: handle nice better
      ci: switch macos runs from macos-12 to macos-13

 .github/workflows/ci.yml | 2 +-
 workqueue.c              | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 69fedf77..b8000024 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -27,7 +27,7 @@ jobs:
           os: ubuntu-22.04
           cc: clang
         - build: macos
-          os: macos-12
+          os: macos-13
         - build: linux-i686-gcc
           os: ubuntu-22.04
           arch: i686
diff --git a/workqueue.c b/workqueue.c
index 9e6c41ff..3636bc3a 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -136,7 +136,8 @@ static void *worker_thread(void *data)
 	sk_out_assign(sw->sk_out);
 
 	if (wq->ops.nice) {
-		if (nice(wq->ops.nice) < 0) {
+		errno = 0;
+		if (nice(wq->ops.nice) == -1 && errno != 0) {
 			log_err("workqueue: nice %s\n", strerror(errno));
 			ret = 1;
 		}
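The workqueue fix follows the usual idiom for nice(): the call returns the new nice value, which can legitimately be -1 or any other negative number, so a negative return alone is not an error. POSIX only guarantees that a failure returns -1 and sets errno, hence clearing errno first. A standalone sketch of the same check, using a hypothetical apply_nice() helper:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success, 1 on a real nice() failure. */
static int apply_nice(int inc)
{
	errno = 0;                           /* nice() does not clear errno on success */
	if (nice(inc) == -1 && errno != 0) { /* -1 alone may be a valid nice value */
		fprintf(stderr, "nice(%d): %s\n", inc, strerror(errno));
		return 1;
	}
	return 0;
}
```

Under the old `< 0` test, a privileged job that lowered its niceness below zero would have been reported as a failure even though the call succeeded.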

* Recent changes (master)
@ 2023-09-29 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-29 12:00 UTC
  To: fio

The following changes since commit 996ac91f54844e63ef43092472fc1f7610567b67:

  t/zbd: set mq-deadline scheduler to device-mapper destination devices (2023-09-26 09:00:13 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 06812a4f0e4ff4847076e742557ab406a0e96848:

  Merge branch 'fix_verify_block_offset' of https://github.com/ipylypiv/fio (2023-09-29 00:05:10 -0600)

----------------------------------------------------------------
Igor Pylypiv (1):
      verify: Fix the bad pattern block offset value

Jens Axboe (1):
      Merge branch 'fix_verify_block_offset' of https://github.com/ipylypiv/fio

 verify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/verify.c b/verify.c
index f7355f30..78f333e6 100644
--- a/verify.c
+++ b/verify.c
@@ -398,7 +398,8 @@ static int verify_io_u_pattern(struct verify_header *hdr, struct vcont *vc)
 				(unsigned char)buf[i],
 				(unsigned char)pattern[mod],
 				bits);
-			log_err("fio: bad pattern block offset %u\n", i);
+			log_err("fio: bad pattern block offset %u\n",
+				i + header_size);
 			vc->name = "pattern";
 			log_verify_failure(hdr, vc);
 			return EILSEQ;
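The offset fix matters because the comparison loop walks a buffer that starts after the verify header, so `i` is relative to the payload rather than to the block. The hypothetical sketch below shows the same bookkeeping. For simplicity it assumes the pattern restarts at the first payload byte; fio's own code tracks the pattern phase separately.

```c
#include <stddef.h>

/*
 * Compare the payload that follows a header of header_size bytes against a
 * repeating pattern. Returns the first mismatch as an offset from the start
 * of the whole block (i + header_size), or -1 if everything matches.
 * Assumes block_len >= header_size and pattern_len > 0.
 */
static long first_bad_block_offset(const unsigned char *block, size_t block_len,
				   size_t header_size,
				   const unsigned char *pattern, size_t pattern_len)
{
	const unsigned char *buf = block + header_size; /* payload after header */
	size_t len = block_len - header_size;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] != pattern[i % pattern_len])
			return (long)(i + header_size); /* block-relative offset */
	}
	return -1;
}
```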

* Recent changes (master)
@ 2023-09-27 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-27 12:00 UTC
  To: fio

The following changes since commit a142e0df6c1483a76d92ff7f9d8c07242af9910e:

  Merge branch 'fio_client_server_doc_fix' of https://github.com/pcpartpicker/fio (2023-09-20 07:41:17 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 996ac91f54844e63ef43092472fc1f7610567b67:

  t/zbd: set mq-deadline scheduler to device-mapper destination devices (2023-09-26 09:00:13 -0400)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      t/zbd: set mq-deadline scheduler to device-mapper destination devices

 t/zbd/functions        | 11 +++++++++
 t/zbd/test-zbd-support | 61 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 71 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/zbd/functions b/t/zbd/functions
index 4faa45a9..028df404 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -27,6 +27,17 @@ blkzone_reports_capacity() {
 		"${blkzone}" report -c 1 -o 0 "${dev}" | grep -q 'cap '
 }
 
+has_command() {
+	local cmd="${1}"
+
+	cmd_path=$(type -p "${cmd}" 2>/dev/null)
+	if [ -z "${cmd_path}" ]; then
+		echo "${cmd} is not available"
+		return 1
+	fi
+	return 0
+}
+
 # Whether or not $1 (/dev/...) is a NVME ZNS device.
 is_nvme_zns() {
 	local s
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index c8f3eb61..0436d319 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -46,6 +46,55 @@ ioengine() {
 	fi
 }
 
+get_dev_path_by_id() {
+	for d in /sys/block/* /sys/block/*/*; do
+		if [[ ! -r "${d}/dev" ]]; then
+			continue
+		fi
+		if [[ "${1}" == "$(<"${d}/dev")" ]]; then
+			echo "/dev/${d##*/}"
+			return 0
+		fi
+	done
+	return 1
+}
+
+dm_destination_dev_set_io_scheduler() {
+	local dev=$1 sched=$2
+	local dest_dev_id dest_dev path
+
+	has_command dmsetup || return 1
+
+	while read -r dest_dev_id; do
+		if ! dest_dev=$(get_dev_path_by_id "${dest_dev_id}"); then
+			continue
+		fi
+		path=${dest_dev/dev/sys\/block}/queue/scheduler
+		if [[ ! -w ${path} ]]; then
+			echo "Can not set scheduler of device mapper destination: ${dest_dev}"
+			continue
+		fi
+		echo "${2}" > "${path}"
+	done < <(dmsetup table "$(<"/sys/block/$dev/dm/name")" |
+			 sed -n  's/.* \([0-9]*:[0-9]*\).*/\1/p')
+}
+
+dev_has_dm_map() {
+	local dev=${1} target_type=${2}
+	local dm_name
+
+	has_command dmsetup || return 1
+
+	dm_name=$(<"/sys/block/$dev/dm/name")
+	if ! dmsetup status "${dm_name}" | grep -qe "${target_type}"; then
+		return 1
+	fi
+	if dmsetup status "${dm_name}" | grep -v "${target_type}"; then
+		return 1
+	fi
+	return 0
+}
+
 set_io_scheduler() {
     local dev=$1 sched=$2
 
@@ -62,7 +111,17 @@ set_io_scheduler() {
 	esac
     fi
 
-    echo "$sched" >"/sys/block/$dev/queue/scheduler"
+    if [ -w "/sys/block/$dev/queue/scheduler" ]; then
+	echo "$sched" >"/sys/block/$dev/queue/scheduler"
+    elif [ -r  "/sys/block/$dev/dm/name" ] &&
+		 ( dev_has_dm_map "$dev" linear ||
+		   dev_has_dm_map "$dev" flakey ||
+		   dev_has_dm_map "$dev" crypt ); then
+	dm_destination_dev_set_io_scheduler "$dev" "$sched"
+    else
+	echo "can not set io scheduler"
+	exit 1
+    fi
 }
 
 check_read() {
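In dm_destination_dev_set_io_scheduler(), `dmsetup table` output is piped through `sed -n 's/.* \([0-9]*:[0-9]*\).*/\1/p'`, which keeps, for each line, the last space-preceded MAJOR:MINOR string, i.e. the destination device of a target line such as `0 2097152 linear 8:16 0`. For readers less fluent in sed, here is a slightly stricter C sketch that works on whole whitespace-separated tokens; last_dev_id() and is_maj_min() are hypothetical helpers, not part of fio.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if s[0..len) is "MAJOR:MINOR" with at least one digit on each side. */
static bool is_maj_min(const char *s, size_t len)
{
	size_t i = 0, digits;

	for (digits = 0; i < len && isdigit((unsigned char)s[i]); i++)
		digits++;
	if (digits == 0 || i >= len || s[i] != ':')
		return false;
	i++;
	for (digits = 0; i < len && isdigit((unsigned char)s[i]); i++)
		digits++;
	return digits > 0 && i == len;
}

/* Copy the last MAJOR:MINOR token of a table line into out; -1 if none fits. */
static int last_dev_id(const char *line, char *out, size_t outlen)
{
	const char *p = line, *best = NULL;
	size_t best_len = 0;

	while (*p) {
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;                         /* skip separators */
		const char *start = p;
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;                         /* find end of token */
		size_t len = (size_t)(p - start);
		if (len && is_maj_min(start, len)) {
			best = start;                /* remember the latest match */
			best_len = len;
		}
	}
	if (!best || best_len + 1 > outlen)
		return -1;
	memcpy(out, best, best_len);
	out[best_len] = '\0';
	return 0;
}
```

The script then maps each ID back to a /dev node by matching it against the contents of /sys/block/*/dev in get_dev_path_by_id().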

* Recent changes (master)
@ 2023-09-20 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-20 12:00 UTC
  To: fio

The following changes since commit e2c5f17e3559cc7c96706cd75c2609f12675c60b:

  verify: open state file in binary mode on Windows (2023-09-14 18:54:25 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a142e0df6c1483a76d92ff7f9d8c07242af9910e:

  Merge branch 'fio_client_server_doc_fix' of https://github.com/pcpartpicker/fio (2023-09-20 07:41:17 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      Merge branch 'fio_client_server_doc_fix' of https://github.com/pcpartpicker/fio

aggieNick02 (1):
      Update docs to clarify how to pass job options in client mode

 HOWTO.rst | 3 +++
 fio.1     | 3 +++
 2 files changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 7f26978a..cc7124b1 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -5105,6 +5105,9 @@ is the connect string, and `remote-args` and `job file(s)` are sent to the
 server. The `server` string follows the same format as it does on the server
 side, to allow IP/hostname/socket and port strings.
 
+Note that all job options must be defined in job files when running fio as a
+client. Any job options specified in `remote-args` will be ignored.
+
 Fio can connect to multiple servers this way::
 
     fio --client=<server1> <job file(s)> --client=<server2> <job file(s)>
diff --git a/fio.1 b/fio.1
index 8159caa4..628e278d 100644
--- a/fio.1
+++ b/fio.1
@@ -4838,6 +4838,9 @@ is the connect string, and `remote\-args' and `job file(s)' are sent to the
 server. The `server' string follows the same format as it does on the server
 side, to allow IP/hostname/socket and port strings.
 .P
+Note that all job options must be defined in job files when running fio as a
+client. Any job options specified in `remote\-args' will be ignored.
+.P
 Fio can connect to multiple servers this way:
 .RS
 .P

* Recent changes (master)
@ 2023-09-16 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-16 12:00 UTC
  To: fio

The following changes since commit e4a9812dee084b058eca6ebde9634a3d573a0079:

  engines:nvme: fill command fields as per pi check bits (2023-09-11 10:55:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e2c5f17e3559cc7c96706cd75c2609f12675c60b:

  verify: open state file in binary mode on Windows (2023-09-14 18:54:25 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      verify: open state file in binary mode on Windows

 verify.c | 4 ++++
 1 file changed, 4 insertions(+)

---

Diff of recent changes:

diff --git a/verify.c b/verify.c
index 2848b686..f7355f30 100644
--- a/verify.c
+++ b/verify.c
@@ -1648,6 +1648,10 @@ static int open_state_file(const char *name, const char *prefix, int num,
 	else
 		flags = O_RDONLY;
 
+#ifdef _WIN32
+	flags |= O_BINARY;
+#endif
+
 	verify_state_gen_name(out, sizeof(out), name, prefix, num);
 
 	fd = open(out, flags, 0644);
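A portable sketch of the same pattern: on Windows, a descriptor opened without O_BINARY is in text mode, where the C runtime expands "\n" to "\r\n" on write and treats Ctrl-Z (0x1a) as end-of-file on read, either of which can corrupt a binary state file. POSIX systems have no such distinction, so a common approach (assumed here, not taken from fio) is to define O_BINARY as 0 when the headers lack it. roundtrip() is a hypothetical helper.

```c
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0   /* no text/binary distinction outside Windows */
#endif

/* Write len raw bytes to path, read them back into out; returns bytes read or -1. */
static ssize_t roundtrip(const char *path, const unsigned char *data, size_t len,
			 unsigned char *out)
{
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY | O_BINARY, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	close(fd);

	fd = open(path, O_RDONLY | O_BINARY);
	if (fd < 0)
		return -1;
	ssize_t n = read(fd, out, len);
	close(fd);
	return n;
}
```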

* Recent changes (master)
@ 2023-09-12 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-12 12:00 UTC
  To: fio

The following changes since commit 904ee91c2831615a054a8dea9b164e96ae00abb3:

  Merge branch 'pcpp_parse_nr_fix' of https://github.com/PCPartPicker/fio (2023-09-02 07:35:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e4a9812dee084b058eca6ebde9634a3d573a0079:

  engines:nvme: fill command fields as per pi check bits (2023-09-11 10:55:56 -0600)

----------------------------------------------------------------
Ankit Kumar (2):
      engines:io_uring_cmd: disallow verify for e2e pi with extended blocks
      engines:nvme: fill command fields as per pi check bits

Vincent Fu (1):
      Merge branch 'pcpp_epoch_fixing_2' of https://github.com/PCPartPicker/fio

aggieNick02 (2):
      Record job start time to fix time pain points
      Make log_unix_epoch an official alias of log_alternate_epoch

 HOWTO.rst          | 21 +++++++++++++++------
 backend.c          |  2 +-
 cconv.c            |  4 ++--
 client.c           |  1 +
 engines/io_uring.c | 14 ++++++++++++++
 engines/nvme.c     | 15 ++++++++++-----
 fio.1              | 23 +++++++++++++++++------
 fio.h              |  3 ++-
 fio_time.h         |  2 +-
 libfio.c           |  2 +-
 options.c          | 22 ++++++++++++----------
 rate-submit.c      |  2 +-
 server.c           |  1 +
 stat.c             |  6 +++++-
 stat.h             |  1 +
 thread_options.h   |  8 +++-----
 time.c             | 20 ++++++++++++++------
 17 files changed, 101 insertions(+), 46 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 89032941..7f26978a 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -755,6 +755,10 @@ Time related parameters
 	calls will be excluded from other uses. Fio will manually clear it from the
 	CPU mask of other jobs.
 
+.. option:: job_start_clock_id=int
+   The clock_id passed to the call to `clock_gettime` used to record job_start
+   in the `json` output format. Default is 0, or CLOCK_REALTIME.
+
 
 Target file/device
 ~~~~~~~~~~~~~~~~~~
@@ -3966,6 +3970,13 @@ Measurements and reporting
 	same reporting group, unless if separated by a :option:`stonewall`, or by
 	using :option:`new_group`.
 
+    NOTE: When :option: `group_reporting` is used along with `json` output,
+    there are certain per-job properties which can be different between jobs
+    but do not have a natural group-level equivalent. Examples include
+    `kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
+    `job_start`. For these properties, the values for the first job are
+    recorded for the group.
+
 .. option:: new_group
 
 	Start a new reporting group. See: :option:`group_reporting`.  If not given,
@@ -4103,9 +4114,7 @@ Measurements and reporting
 
 .. option:: log_unix_epoch=bool
 
-	If set, fio will log Unix timestamps to the log files produced by enabling
-	write_type_log for each log type, instead of the default zero-based
-	timestamps.
+	Backwards compatible alias for log_alternate_epoch.
 
 .. option:: log_alternate_epoch=bool
 
@@ -4116,9 +4125,9 @@ Measurements and reporting
 
 .. option:: log_alternate_epoch_clock_id=int
 
-	Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
-	if either log_unix_epoch or log_alternate_epoch are true. Otherwise has no
-	effect. Default value is 0, or CLOCK_REALTIME.
+    Specifies the clock_id to be used by clock_gettime to obtain the alternate
+    epoch if log_alternate_epoch is true. Otherwise has no effect. Default
+    value is 0, or CLOCK_REALTIME.
 
 .. option:: block_error_percentiles=bool
 
diff --git a/backend.c b/backend.c
index 5f074039..a5895fec 100644
--- a/backend.c
+++ b/backend.c
@@ -1858,7 +1858,7 @@ static void *thread_main(void *data)
 	if (rate_submit_init(td, sk_out))
 		goto err;
 
-	set_epoch_time(td, o->log_unix_epoch | o->log_alternate_epoch, o->log_alternate_epoch_clock_id);
+	set_epoch_time(td, o->log_alternate_epoch_clock_id, o->job_start_clock_id);
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
diff --git a/cconv.c b/cconv.c
index ce6acbe6..341388d4 100644
--- a/cconv.c
+++ b/cconv.c
@@ -216,9 +216,9 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->log_prio = le32_to_cpu(top->log_prio);
 	o->log_gz = le32_to_cpu(top->log_gz);
 	o->log_gz_store = le32_to_cpu(top->log_gz_store);
-	o->log_unix_epoch = le32_to_cpu(top->log_unix_epoch);
 	o->log_alternate_epoch = le32_to_cpu(top->log_alternate_epoch);
 	o->log_alternate_epoch_clock_id = le32_to_cpu(top->log_alternate_epoch_clock_id);
+	o->job_start_clock_id = le32_to_cpu(top->job_start_clock_id);
 	o->norandommap = le32_to_cpu(top->norandommap);
 	o->softrandommap = le32_to_cpu(top->softrandommap);
 	o->bs_unaligned = le32_to_cpu(top->bs_unaligned);
@@ -455,9 +455,9 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->log_prio = cpu_to_le32(o->log_prio);
 	top->log_gz = cpu_to_le32(o->log_gz);
 	top->log_gz_store = cpu_to_le32(o->log_gz_store);
-	top->log_unix_epoch = cpu_to_le32(o->log_unix_epoch);
 	top->log_alternate_epoch = cpu_to_le32(o->log_alternate_epoch);
 	top->log_alternate_epoch_clock_id = cpu_to_le32(o->log_alternate_epoch_clock_id);
+	top->job_start_clock_id = cpu_to_le32(o->job_start_clock_id);
 	top->norandommap = cpu_to_le32(o->norandommap);
 	top->softrandommap = cpu_to_le32(o->softrandommap);
 	top->bs_unaligned = cpu_to_le32(o->bs_unaligned);
diff --git a/client.c b/client.c
index c257036b..345fa910 100644
--- a/client.c
+++ b/client.c
@@ -956,6 +956,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->error		= le32_to_cpu(src->error);
 	dst->thread_number	= le32_to_cpu(src->thread_number);
 	dst->groupid		= le32_to_cpu(src->groupid);
+	dst->job_start		= le64_to_cpu(src->job_start);
 	dst->pid		= le32_to_cpu(src->pid);
 	dst->members		= le32_to_cpu(src->members);
 	dst->unified_rw_rep	= le32_to_cpu(src->unified_rw_rep);
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 6cdf1b4f..05703df8 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -18,6 +18,7 @@
 #include "../lib/memalign.h"
 #include "../lib/fls.h"
 #include "../lib/roundup.h"
+#include "../verify.h"
 
 #ifdef ARCH_HAVE_IOURING
 
@@ -1299,6 +1300,19 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 				return 1;
 			}
                 }
+
+		/*
+		 * For extended logical block sizes we cannot use verify when
+		 * end to end data protection checks are enabled, as the PI
+		 * section of data buffer conflicts with verify.
+		 */
+		if (data->ms && data->pi_type && data->lba_ext &&
+		    td->o.verify != VERIFY_NONE) {
+			log_err("%s: for extended LBA, verify cannot be used when E2E data protection is enabled\n",
+				f->file_name);
+			td_verror(td, EINVAL, "fio_ioring_cmd_open_file");
+			return 1;
+		}
 	}
 	if (!ld || !o->registerfiles)
 		return generic_open_file(td, f);
diff --git a/engines/nvme.c b/engines/nvme.c
index 08503b33..75a5e0c1 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -415,19 +415,24 @@ void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	case NVME_NS_DPS_PI_TYPE2:
 		switch (data->guard_type) {
 		case NVME_NVM_NS_16B_GUARD:
-			cmd->cdw14 = (__u32)slba;
+			if (opts->io_flags & NVME_IO_PRINFO_PRCHK_REF)
+				cmd->cdw14 = (__u32)slba;
 			break;
 		case NVME_NVM_NS_64B_GUARD:
-			cmd->cdw14 = (__u32)slba;
-			cmd->cdw3 = ((slba >> 32) & 0xffff);
+			if (opts->io_flags & NVME_IO_PRINFO_PRCHK_REF) {
+				cmd->cdw14 = (__u32)slba;
+				cmd->cdw3 = ((slba >> 32) & 0xffff);
+			}
 			break;
 		default:
 			break;
 		}
-		cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_APP)
+			cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
 		break;
 	case NVME_NS_DPS_PI_TYPE3:
-		cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_APP)
+			cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
 		break;
 	case NVME_NS_DPS_PI_NONE:
 		break;
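The fio_nvme_pi_fill() change only fills the reference-tag and application-tag fields when the matching PRCHK bit is set in the I/O flags. A reduced sketch of the Type 1/2 logic follows. The struct and flag names are hypothetical; the bit positions are assumed to follow the NVMe PRINFO field (PRCHK in bits 28:26 of CDW12) and are for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRCHK_REF (1u << 26)   /* assumed: reference tag check */
#define PRCHK_APP (1u << 27)   /* assumed: application tag check */

struct pi_cmd {                 /* hypothetical subset of an NVMe command */
	uint32_t cdw3, cdw14, cdw15;
};

/* Fill PI fields for a Type 1/2 namespace, honouring the PRCHK bits. */
static void pi_fill(struct pi_cmd *cmd, uint64_t slba, uint32_t io_flags,
		    bool guard64, uint16_t apptag, uint16_t apptag_mask)
{
	if (io_flags & PRCHK_REF) {
		cmd->cdw14 = (uint32_t)slba;                 /* low 32 bits of ref tag */
		if (guard64)                                 /* 64-bit guard: upper bits */
			cmd->cdw3 = (uint32_t)((slba >> 32) & 0xffff);
	}
	if (io_flags & PRCHK_APP)
		cmd->cdw15 = (uint32_t)apptag_mask << 16 | apptag;
}
```

With only PRCHK_APP set, cdw14 and cdw3 stay untouched; with only PRCHK_REF set and a 64-bit guard, an slba of 0x123456789 yields cdw14 = 0x23456789 and cdw3 = 0x1.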
diff --git a/fio.1 b/fio.1
index f0dc49ab..8159caa4 100644
--- a/fio.1
+++ b/fio.1
@@ -537,6 +537,10 @@ copy that segment, instead of entering the kernel with a
 \fBgettimeofday\fR\|(2) call. The CPU set aside for doing these time
 calls will be excluded from other uses. Fio will manually clear it from the
 CPU mask of other jobs.
+.TP
+.BI job_start_clock_id \fR=\fPint
+The clock_id passed to the call to \fBclock_gettime\fR used to record job_start
+in the \fBjson\fR output format. Default is 0, or CLOCK_REALTIME.
 .SS "Target file/device"
 .TP
 .BI directory \fR=\fPstr
@@ -3664,6 +3668,15 @@ quickly becomes unwieldy. To see the final report per-group instead of
 per-job, use \fBgroup_reporting\fR. Jobs in a file will be part of the
 same reporting group, unless if separated by a \fBstonewall\fR, or by
 using \fBnew_group\fR.
+.RS
+.P
+NOTE: When \fBgroup_reporting\fR is used along with \fBjson\fR output, there
+are certain per-job properties which can be different between jobs but do not
+have a natural group-level equivalent. Examples include \fBkb_base\fR,
+\fBunit_base\fR, \fBsig_figs\fR, \fBthread_number\fR, \fBpid\fR, and
+\fBjob_start\fR. For these properties, the values for the first job are
+recorded for the group.
+.RE
 .TP
 .BI new_group
 Start a new reporting group. See: \fBgroup_reporting\fR. If not given,
@@ -3795,9 +3808,7 @@ decompressed with fio, using the \fB\-\-inflate\-log\fR command line
 parameter. The files will be stored with a `.fz' suffix.
 .TP
 .BI log_unix_epoch \fR=\fPbool
-If set, fio will log Unix timestamps to the log files produced by enabling
-write_type_log for each log type, instead of the default zero-based
-timestamps.
+Backward-compatible alias for \fBlog_alternate_epoch\fR.
 .TP
 .BI log_alternate_epoch \fR=\fPbool
 If set, fio will log timestamps based on the epoch used by the clock specified
@@ -3806,9 +3817,9 @@ enabling write_type_log for each log type, instead of the default zero-based
 timestamps.
 .TP
 .BI log_alternate_epoch_clock_id \fR=\fPint
-Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
-if either \fBBlog_unix_epoch\fR or \fBlog_alternate_epoch\fR are true. Otherwise has no
-effect. Default value is 0, or CLOCK_REALTIME.
+Specifies the clock_id to be used by clock_gettime to obtain the alternate
+epoch if \fBlog_alternate_epoch\fR is true. Otherwise has no effect. Default
+value is 0, or CLOCK_REALTIME.
 .TP
 .BI block_error_percentiles \fR=\fPbool
 If set, record errors in trim block-sized units from writes and trims and
diff --git a/fio.h b/fio.h
index a54f57c9..1322656f 100644
--- a/fio.h
+++ b/fio.h
@@ -388,7 +388,8 @@ struct thread_data {
 
 	struct timespec start;	/* start of this loop */
 	struct timespec epoch;	/* time job was started */
-	unsigned long long alternate_epoch; /* Time job was started, clock_gettime's clock_id epoch based. */
+	unsigned long long alternate_epoch; /* Time job was started, as clock_gettime(log_alternate_epoch_clock_id) */
+	unsigned long long job_start; /* Time job was started, as clock_gettime(job_start_clock_id) */
 	struct timespec last_issue;
 	long time_offset;
 	struct timespec ts_cache;
diff --git a/fio_time.h b/fio_time.h
index 62d92120..b20e734c 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -30,6 +30,6 @@ extern bool ramp_time_over(struct thread_data *);
 extern bool in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
 extern void timespec_add_msec(struct timespec *, unsigned int);
-extern void set_epoch_time(struct thread_data *, int, clockid_t);
+extern void set_epoch_time(struct thread_data *, clockid_t, clockid_t);
 
 #endif
diff --git a/libfio.c b/libfio.c
index 237ce34c..5c433277 100644
--- a/libfio.c
+++ b/libfio.c
@@ -149,7 +149,7 @@ void reset_all_stats(struct thread_data *td)
 		td->ts.runtime[i] = 0;
 	}
 
-	set_epoch_time(td, td->o.log_unix_epoch | td->o.log_alternate_epoch, td->o.log_alternate_epoch_clock_id);
+	set_epoch_time(td, td->o.log_alternate_epoch_clock_id, td->o.job_start_clock_id);
 	memcpy(&td->start, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
diff --git a/options.c b/options.c
index 65b2813c..6b2cb53f 100644
--- a/options.c
+++ b/options.c
@@ -4612,17 +4612,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.help	= "Install libz-dev(el) to get compression support",
 	},
 #endif
-	{
-		.name = "log_unix_epoch",
-		.lname = "Log epoch unix",
-		.type = FIO_OPT_BOOL,
-		.off1 = offsetof(struct thread_options, log_unix_epoch),
-		.help = "Use Unix time in log files",
-		.category = FIO_OPT_C_LOG,
-		.group = FIO_OPT_G_INVALID,
-	},
 	{
 		.name = "log_alternate_epoch",
+		.alias = "log_unix_epoch",
 		.lname = "Log epoch alternate",
 		.type = FIO_OPT_BOOL,
 		.off1 = offsetof(struct thread_options, log_alternate_epoch),
@@ -4635,7 +4627,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname = "Log alternate epoch clock_id",
 		.type = FIO_OPT_INT,
 		.off1 = offsetof(struct thread_options, log_alternate_epoch_clock_id),
-		.help = "If log_alternate_epoch or log_unix_epoch is true, this option specifies the clock_id from clock_gettime whose epoch should be used. If neither of those is true, this option has no effect. Default value is 0, or CLOCK_REALTIME",
+		.help = "If log_alternate_epoch is true, this option specifies the clock_id from clock_gettime whose epoch should be used. If log_alternate_epoch is false, this option has no effect. Default value is 0, or CLOCK_REALTIME",
 		.category = FIO_OPT_C_LOG,
 		.group = FIO_OPT_G_INVALID,
 	},
@@ -4964,6 +4956,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CLOCK,
 	},
+	{
+		.name	= "job_start_clock_id",
+		.lname	= "Job start clock_id",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, job_start_clock_id),
+		.help	= "The clock_id passed to the call to clock_gettime used to record job_start in the json output format. Default is 0, or CLOCK_REALTIME",
+		.verify	= gtod_cpu_verify,
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_CLOCK,
+	},
 	{
 		.name	= "unified_rw_reporting",
 		.lname	= "Unified RW Reporting",
diff --git a/rate-submit.c b/rate-submit.c
index 6f6d15bd..92be3df7 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -185,7 +185,7 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	if (td->io_ops->post_init && td->io_ops->post_init(td))
 		goto err_io_init;
 
-	set_epoch_time(td, td->o.log_unix_epoch | td->o.log_alternate_epoch, td->o.log_alternate_epoch_clock_id);
+	set_epoch_time(td, td->o.log_alternate_epoch_clock_id, td->o.job_start_clock_id);
 	fio_getrusage(&td->ru_start);
 	clear_io_state(td, 1);
 
diff --git a/server.c b/server.c
index bb423702..27332e32 100644
--- a/server.c
+++ b/server.c
@@ -1706,6 +1706,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.error		= cpu_to_le32(ts->error);
 	p.ts.thread_number	= cpu_to_le32(ts->thread_number);
 	p.ts.groupid		= cpu_to_le32(ts->groupid);
+	p.ts.job_start		= cpu_to_le64(ts->job_start);
 	p.ts.pid		= cpu_to_le32(ts->pid);
 	p.ts.members		= cpu_to_le32(ts->members);
 	p.ts.unified_rw_rep	= cpu_to_le32(ts->unified_rw_rep);
diff --git a/stat.c b/stat.c
index 7b791628..7cf6bee1 100644
--- a/stat.c
+++ b/stat.c
@@ -1712,6 +1712,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	root = json_create_object();
 	json_object_add_value_string(root, "jobname", ts->name);
 	json_object_add_value_int(root, "groupid", ts->groupid);
+	json_object_add_value_int(root, "job_start", ts->job_start);
 	json_object_add_value_int(root, "error", ts->error);
 
 	/* ETA Info */
@@ -2526,6 +2527,7 @@ void __show_run_stats(void)
 			 */
 			ts->thread_number = td->thread_number;
 			ts->groupid = td->groupid;
+			ts->job_start = td->job_start;
 
 			/*
 			 * first pid in group, not very useful...
@@ -3048,7 +3050,9 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
 		s->data = data;
-		s->time = t + (iolog->td ? iolog->td->alternate_epoch : 0);
+		s->time = t;
+		if (iolog->td && iolog->td->o.log_alternate_epoch)
+			s->time += iolog->td->alternate_epoch;
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
 		s->priority = priority;
diff --git a/stat.h b/stat.h
index 8ceabc48..bd986d4e 100644
--- a/stat.h
+++ b/stat.h
@@ -169,6 +169,7 @@ struct thread_stat {
 	uint32_t error;
 	uint32_t thread_number;
 	uint32_t groupid;
+	uint64_t job_start; /* Time job was started, as clock_gettime(job_start_clock_id) */
 	uint32_t pid;
 	char description[FIO_JOBDESC_SIZE];
 	uint32_t members;
diff --git a/thread_options.h b/thread_options.h
index 38a9993d..fdde055e 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -170,7 +170,6 @@ struct thread_options {
 	unsigned int log_offset;
 	unsigned int log_gz;
 	unsigned int log_gz_store;
-	unsigned int log_unix_epoch;
 	unsigned int log_alternate_epoch;
 	unsigned int log_alternate_epoch_clock_id;
 	unsigned int norandommap;
@@ -273,6 +272,7 @@ struct thread_options {
 	unsigned int unified_rw_rep;
 	unsigned int gtod_reduce;
 	unsigned int gtod_cpu;
+	unsigned int job_start_clock_id;
 	enum fio_cs clocksource;
 	unsigned int no_stall;
 	unsigned int trim_percentage;
@@ -422,7 +422,6 @@ struct thread_options_pack {
 	uint32_t iodepth_batch_complete_min;
 	uint32_t iodepth_batch_complete_max;
 	uint32_t serialize_overlap;
-	uint32_t pad;
 
 	uint64_t size;
 	uint64_t io_size;
@@ -433,13 +432,11 @@ struct thread_options_pack {
 	uint32_t fill_device;
 	uint32_t file_append;
 	uint32_t unique_filename;
-	uint32_t pad3;
 	uint64_t file_size_low;
 	uint64_t file_size_high;
 	uint64_t start_offset;
 	uint64_t start_offset_align;
 	uint32_t start_offset_nz;
-	uint32_t pad4;
 
 	uint64_t bs[DDIR_RWDIR_CNT];
 	uint64_t ba[DDIR_RWDIR_CNT];
@@ -494,7 +491,6 @@ struct thread_options_pack {
 	uint32_t log_offset;
 	uint32_t log_gz;
 	uint32_t log_gz_store;
-	uint32_t log_unix_epoch;
 	uint32_t log_alternate_epoch;
 	uint32_t log_alternate_epoch_clock_id;
 	uint32_t norandommap;
@@ -593,6 +589,7 @@ struct thread_options_pack {
 	uint32_t unified_rw_rep;
 	uint32_t gtod_reduce;
 	uint32_t gtod_cpu;
+	uint32_t job_start_clock_id;
 	uint32_t clocksource;
 	uint32_t no_stall;
 	uint32_t trim_percentage;
@@ -603,6 +600,7 @@ struct thread_options_pack {
 	uint32_t lat_percentiles;
 	uint32_t slat_percentiles;
 	uint32_t percentile_precision;
+	uint32_t pad;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
diff --git a/time.c b/time.c
index 5c4d6de0..7cbab6ff 100644
--- a/time.c
+++ b/time.c
@@ -172,14 +172,22 @@ void set_genesis_time(void)
 	fio_gettime(&genesis, NULL);
 }
 
-void set_epoch_time(struct thread_data *td, int log_alternate_epoch, clockid_t clock_id)
+void set_epoch_time(struct thread_data *td, clockid_t log_alternate_epoch_clock_id, clockid_t job_start_clock_id)
 {
+	struct timespec ts;
 	fio_gettime(&td->epoch, NULL);
-	if (log_alternate_epoch) {
-		struct timespec ts;
-		clock_gettime(clock_id, &ts);
-		td->alternate_epoch = (unsigned long long)(ts.tv_sec) * 1000 +
-		                 (unsigned long long)(ts.tv_nsec) / 1000000;
+	clock_gettime(log_alternate_epoch_clock_id, &ts);
+	td->alternate_epoch = (unsigned long long)(ts.tv_sec) * 1000 +
+						  (unsigned long long)(ts.tv_nsec) / 1000000;
+	if (job_start_clock_id == log_alternate_epoch_clock_id)
+	{
+		td->job_start = td->alternate_epoch;
+	}
+	else
+	{
+		clock_gettime(job_start_clock_id, &ts);
+		td->job_start = (unsigned long long)(ts.tv_sec) * 1000 +
+						(unsigned long long)(ts.tv_nsec) / 1000000;
 	}
 }
 

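The patched set_epoch_time() samples the alternate-epoch clock and converts it to milliseconds. When both options name the same clock, it reuses that sample for job_start. Below is a standalone sketch of that logic. The helper names timespec_to_ms(), clock_ms() and sample_epochs() are hypothetical and not part of fio, and the sketch assumes a POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: whole milliseconds in a timespec, using the same
 * arithmetic as the patched set_epoch_time(). */
static unsigned long long timespec_to_ms(const struct timespec *ts)
{
	return (unsigned long long)ts->tv_sec * 1000 +
	       (unsigned long long)ts->tv_nsec / 1000000;
}

/* Hypothetical helper: sample clock_id in milliseconds; 0 on failure. */
static unsigned long long clock_ms(clockid_t clock_id)
{
	struct timespec ts;

	if (clock_gettime(clock_id, &ts) != 0)
		return 0;
	return timespec_to_ms(&ts);
}

/* Always sample the alternate-epoch clock; reuse that sample for
 * job_start when both options name the same clock, else read again. */
static void sample_epochs(clockid_t log_clock, clockid_t start_clock,
			  unsigned long long *alternate_epoch,
			  unsigned long long *job_start)
{
	*alternate_epoch = clock_ms(log_clock);
	*job_start = (start_clock == log_clock) ? *alternate_epoch
						: clock_ms(start_clock);
}
```

With both clocks set to CLOCK_REALTIME, job_start is an exact copy of alternate_epoch. This saves one clock_gettime() call and guarantees that the two timestamps agree.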
* Recent changes (master)
@ 2023-09-03 12:00 Jens Axboe
From: Jens Axboe @ 2023-09-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4a0c766c69ddfe5231d65f2676e97333ba89ab2b:

  Merge branch 'master' of https://github.com/michalbiesek/fio (2023-08-23 08:21:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 904ee91c2831615a054a8dea9b164e96ae00abb3:

  Merge branch 'pcpp_parse_nr_fix' of https://github.com/PCPartPicker/fio (2023-09-02 07:35:49 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'pcpp_parse_nr_fix' of https://github.com/PCPartPicker/fio

aggieNick02 (1):
      Add basic error checking to parsing nr from rw=randrw:<nr>, etc

 options.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 48aa0d7b..65b2813c 100644
--- a/options.c
+++ b/options.c
@@ -596,9 +596,21 @@ static int str_rw_cb(void *data, const char *str)
 	if (!nr)
 		return 0;
 
-	if (td_random(td))
-		o->ddir_seq_nr = atoi(nr);
-	else {
+	if (td_random(td)) {
+		long long val;
+
+		if (str_to_decimal(nr, &val, 1, o, 0, 0)) {
+			log_err("fio: randrw postfix parsing failed\n");
+			free(nr);
+			return 1;
+		}
+		if ((val <= 0) || (val > UINT_MAX)) {
+			log_err("fio: randrw postfix parsing out of range\n");
+			free(nr);
+			return 1;
+		}
+		o->ddir_seq_nr = (unsigned int) val;
+	} else {
 		long long val;
 
 		if (str_to_decimal(nr, &val, 1, o, 0, 0)) {

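The new check replaces atoi(), which silently returns 0 for input such as "abc". Below is a minimal standalone sketch of the same range validation, assuming plain decimal input. parse_seq_nr() is a hypothetical name. strtoll() stands in for fio's str_to_decimal(), so unit suffixes are not handled here.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse the <nr> postfix of rw=randrw:<nr>, accepting only 1..UINT_MAX.
 * Returns 0 on success and 1 on error, the convention str_rw_cb() uses. */
static int parse_seq_nr(const char *nr, unsigned int *out)
{
	char *end;
	long long val;

	errno = 0;
	val = strtoll(nr, &end, 10);
	if (errno || end == nr || *end != '\0')
		return 1;	/* not a number, overflow, or trailing junk */
	if (val <= 0 || (unsigned long long)val > UINT_MAX)
		return 1;	/* same bounds as the patch's range check */
	*out = (unsigned int)val;
	return 0;
}
```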
* Recent changes (master)
@ 2023-08-24 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-24 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b311162c37a2867873e1222ce6b5f38c88be4d80:

  examples: add example and fiograph for protection information options (2023-08-16 09:34:46 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4a0c766c69ddfe5231d65f2676e97333ba89ab2b:

  Merge branch 'master' of https://github.com/michalbiesek/fio (2023-08-23 08:21:39 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/michalbiesek/fio

Michal Biesek (1):
      Add RISC-V 64 support

 arch/arch-riscv64.h   | 32 ++++++++++++++++++++++++++++++++
 arch/arch.h           |  3 +++
 configure             | 24 +++++++++++++++++++++++-
 libfio.c              |  1 +
 os/os-linux-syscall.h |  7 +++++++
 5 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 arch/arch-riscv64.h

---

Diff of recent changes:

diff --git a/arch/arch-riscv64.h b/arch/arch-riscv64.h
new file mode 100644
index 00000000..a74b7d47
--- /dev/null
+++ b/arch/arch-riscv64.h
@@ -0,0 +1,32 @@
+#ifndef ARCH_RISCV64_H
+#define ARCH_RISCV64_H
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#define FIO_ARCH	(arch_riscv64)
+
+#define nop		__asm__ __volatile__ ("nop")
+#define read_barrier()		__asm__ __volatile__("fence r, r": : :"memory")
+#define write_barrier()		__asm__ __volatile__("fence w, w": : :"memory")
+
+static inline unsigned long long get_cpu_clock(void)
+{
+	unsigned long val;
+
+	asm volatile("rdcycle %0" : "=r"(val));
+	return val;
+}
+#define ARCH_HAVE_CPU_CLOCK
+
+#define ARCH_HAVE_INIT
+extern bool tsc_reliable;
+static inline int arch_init(char *envp[])
+{
+	tsc_reliable = true;
+	return 0;
+}
+
+#endif
diff --git a/arch/arch.h b/arch/arch.h
index 6e476701..3ee9b053 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -24,6 +24,7 @@ enum {
 	arch_mips,
 	arch_aarch64,
 	arch_loongarch64,
+	arch_riscv64,
 
 	arch_generic,
 
@@ -100,6 +101,8 @@ extern unsigned long arch_flags;
 #include "arch-aarch64.h"
 #elif defined(__loongarch64)
 #include "arch-loongarch64.h"
+#elif defined(__riscv) && __riscv_xlen == 64
+#include "arch-riscv64.h"
 #else
 #warning "Unknown architecture, attempting to use generic model."
 #include "arch-generic.h"
diff --git a/configure b/configure
index 6c938251..36184a58 100755
--- a/configure
+++ b/configure
@@ -133,6 +133,20 @@ EOF
   compile_object
 }
 
+check_val() {
+    cat > $TMPC <<EOF
+#if $1 == $2
+int main(void)
+{
+  return 0;
+}
+#else
+#error $1 is not equal $2
+#endif
+EOF
+  compile_object
+}
+
 output_sym() {
   echo "$1=y" >> $config_host_mak
   echo "#define $1" >> $config_host_h
@@ -501,13 +515,21 @@ elif check_define __hppa__ ; then
   cpu="hppa"
 elif check_define __loongarch64 ; then
   cpu="loongarch64"
+elif check_define __riscv ; then
+  if check_val __riscv_xlen 32 ; then
+    cpu="riscv32"
+  elif check_val __riscv_xlen 64 ; then
+    cpu="riscv64"
+  elif check_val __riscv_xlen 128 ; then
+    cpu="riscv128"
+  fi
 else
   cpu=`uname -m`
 fi
 
 # Normalise host CPU name and set ARCH.
 case "$cpu" in
-  ia64|ppc|ppc64|s390|s390x|sparc64|loongarch64)
+  ia64|ppc|ppc64|s390|s390x|sparc64|loongarch64|riscv64)
     cpu="$cpu"
   ;;
   i386|i486|i586|i686|i86pc|BePC)
diff --git a/libfio.c b/libfio.c
index 5e3fd30b..237ce34c 100644
--- a/libfio.c
+++ b/libfio.c
@@ -75,6 +75,7 @@ static const char *fio_arch_strings[arch_nr] = {
 	"mips",
 	"aarch64",
 	"loongarch64",
+	"riscv64",
 	"generic"
 };
 
diff --git a/os/os-linux-syscall.h b/os/os-linux-syscall.h
index 67ee4d91..626330ad 100644
--- a/os/os-linux-syscall.h
+++ b/os/os-linux-syscall.h
@@ -286,6 +286,13 @@
 #define __NR_sys_tee          	77
 #define __NR_sys_vmsplice       75
 #endif
+
+/* Linux syscalls for riscv64 */
+#elif defined(ARCH_RISCV64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		30
+#define __NR_ioprio_get		31
+#endif
 #else
 #warning "Unknown architecture"
 #endif

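Both the configure check_val() helper and the arch.h selection rely on the compiler's predefined __riscv and __riscv_xlen macros. Below is a minimal sketch of the same compile-time dispatch in C. The function detected_cpu() is hypothetical, and it covers only a few of the architectures fio knows about.

```c
/* Map predefined compiler macros to a CPU name at compile time,
 * along the lines of the configure checks in this patch. */
static const char *detected_cpu(void)
{
#if defined(__riscv) && defined(__riscv_xlen)
#  if __riscv_xlen == 64
	return "riscv64";
#  elif __riscv_xlen == 32
	return "riscv32";
#  else
	return "riscv-other";
#  endif
#elif defined(__x86_64__)
	return "x86_64";
#elif defined(__aarch64__)
	return "aarch64";
#elif defined(__loongarch64)
	return "loongarch64";
#else
	return "generic";
#endif
}
```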
* Recent changes (master)
@ 2023-08-17 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 6795954bde09c8697e0accb865b4f438d62c601f:

  engines/io_uring: fix leak of 'ld' in error path (2023-08-14 19:59:20 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b311162c37a2867873e1222ce6b5f38c88be4d80:

  examples: add example and fiograph for protection information options (2023-08-16 09:34:46 -0400)

----------------------------------------------------------------
Ankit Kumar (1):
      examples: add example and fiograph for protection information options

 examples/uring-cmd-pi-ext.fio |  31 +++++++++++++++++++++++++++++++
 examples/uring-cmd-pi-ext.png | Bin 0 -> 81014 bytes
 examples/uring-cmd-pi-sb.fio  |  32 ++++++++++++++++++++++++++++++++
 examples/uring-cmd-pi-sb.png  | Bin 0 -> 87357 bytes
 tools/fiograph/fiograph.conf  |   2 +-
 5 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 examples/uring-cmd-pi-ext.fio
 create mode 100644 examples/uring-cmd-pi-ext.png
 create mode 100644 examples/uring-cmd-pi-sb.fio
 create mode 100644 examples/uring-cmd-pi-sb.png

---

Diff of recent changes:

diff --git a/examples/uring-cmd-pi-ext.fio b/examples/uring-cmd-pi-ext.fio
new file mode 100644
index 00000000..e22ec062
--- /dev/null
+++ b/examples/uring-cmd-pi-ext.fio
@@ -0,0 +1,31 @@
+# Protection information test with io_uring_cmd I/O engine for nvme-ns generic
+# character device.
+#
+# This requires nvme device to be formatted with extended LBA data size and
+# protection information enabled. This can be done with nvme-cli utility.
+# Replace bs below with the correct extended LBA size.
+#
+# First we sequentially write to the device, without protection information
+# action being set. FIO will generate and send necessary protection
+# information data as per the protection information check option. Later on we
+# sequentially read and verify the device returned protection information data.
+#
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+size=1G
+iodepth=32
+bs=4160
+pi_act=0
+pi_chk=GUARD,APPTAG,REFTAG
+apptag=0x0888
+apptag_mask=0xFFFF
+thread=1
+stonewall=1
+
+[write]
+rw=write
+
+[read]
+rw=read
diff --git a/examples/uring-cmd-pi-ext.png b/examples/uring-cmd-pi-ext.png
new file mode 100644
index 00000000..a102fc1a
Binary files /dev/null and b/examples/uring-cmd-pi-ext.png differ
diff --git a/examples/uring-cmd-pi-sb.fio b/examples/uring-cmd-pi-sb.fio
new file mode 100644
index 00000000..b201a7ce
--- /dev/null
+++ b/examples/uring-cmd-pi-sb.fio
@@ -0,0 +1,32 @@
+# Protection information test with io_uring_cmd I/O engine for nvme-ns generic
+# character device.
+#
+# This requires nvme device to be formatted with separate metadata buffer and
+# protection information enabled. This can be done with nvme-cli utility.
+# Replace md_per_io_size as per the required metadata buffer size for each IO.
+#
+# First we sequentially write to the device, without protection information
+# action being set. FIO will generate and send necessary protection
+# information data as per the protection information check option. Later on we
+# sequentially read and verify the device returned protection information data.
+#
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+size=1G
+iodepth=32
+bs=4096
+md_per_io_size=64
+pi_act=0
+pi_chk=GUARD,APPTAG,REFTAG
+apptag=0x0888
+apptag_mask=0xFFFF
+thread=1
+stonewall=1
+
+[write]
+rw=write
+
+[read]
+rw=read
diff --git a/examples/uring-cmd-pi-sb.png b/examples/uring-cmd-pi-sb.png
new file mode 100644
index 00000000..dcdda8cd
Binary files /dev/null and b/examples/uring-cmd-pi-sb.png differ
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
index 91c5fcfe..123c39ae 100644
--- a/tools/fiograph/fiograph.conf
+++ b/tools/fiograph/fiograph.conf
@@ -54,7 +54,7 @@ specific_options=ime_psync  ime_psyncv
 specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async
 
 [ioengine_io_uring_cmd]
-specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async  cmd_type
+specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async  cmd_type  md_per_io_size  pi_act  pi_chk  apptag  apptag_mask
 
 [ioengine_libaio]
 specific_options=userspace_reap  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  nowait

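Both example jobs hard-code sizes that depend on how the namespace was formatted. Below is a sketch of the arithmetic, assuming the I/O size is a whole number of logical blocks. The helper names are hypothetical. For example, bs=4160 in uring-cmd-pi-ext.fio is consistent with a 4096-byte LBA carrying 64 bytes of metadata. md_per_io_size=64 with bs=4096 in uring-cmd-pi-sb.fio is consistent with either a 4096+64 or a 512+8 format.

```c
/* lba_data: logical block data size in bytes (e.g. 512 or 4096).
 * ms: metadata bytes per logical block in the namespace format. */

/* Extended LBA: metadata is interleaved with data, so bs must cover
 * data plus metadata for every block in the I/O. */
static unsigned int ext_lba_bs(unsigned int data_bytes,
			       unsigned int lba_data, unsigned int ms)
{
	return (data_bytes / lba_data) * (lba_data + ms);
}

/* Separate metadata buffer: bs stays a multiple of lba_data, and
 * md_per_io_size needs ms bytes for every block in the I/O. */
static unsigned int sb_md_per_io(unsigned int data_bytes,
				 unsigned int lba_data, unsigned int ms)
{
	return (data_bytes / lba_data) * ms;
}
```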
* Recent changes (master)
@ 2023-08-15 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 62f35562722f0c903567096d0f10a836d1ae2f60:

  eta: calculate aggregate bw statistics even when eta is disabled (2023-08-03 11:49:08 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6795954bde09c8697e0accb865b4f438d62c601f:

  engines/io_uring: fix leak of 'ld' in error path (2023-08-14 19:59:20 -0600)

----------------------------------------------------------------
Ankit Kumar (10):
      engines:io_uring: add missing error during open file
      engines:io_uring: update arguments to fetch nvme data
      engines:io_uring: enable support for separate metadata buffer
      engines:io_uring: uring_cmd add support for protection info
      io_u: move engine data out of union
      crc: pull required crc16-t10 files from linux kernel
      engines:io_uring: generate and verify pi for 16b guard
      crc: pull required crc64 nvme apis from linux kernel
      engines:nvme: pull required 48 bit accessors from linux kernel
      engines:io_uring: generate and verify pi for 64b guard

Jens Axboe (1):
      engines/io_uring: fix leak of 'ld' in error path

Vincent Fu (2):
      t/fiotestlib: use config variable to skip test at runtime
      t/nvmept_pi: test script for protection information

 HOWTO.rst              |  39 ++
 crc/crc-t10dif.h       |   9 +
 crc/crc64.c            |  32 ++
 crc/crc64.h            |   3 +
 crc/crc64table.h       | 130 +++++++
 crc/crct10dif_common.c |  78 ++++
 engines/io_uring.c     | 228 ++++++++++--
 engines/nvme.c         | 466 ++++++++++++++++++++++--
 engines/nvme.h         | 230 +++++++++++-
 fio.1                  |  38 ++
 io_u.h                 |   2 +-
 t/fiotestlib.py        |   5 +-
 t/nvmept_pi.py         | 949 +++++++++++++++++++++++++++++++++++++++++++++++++
 13 files changed, 2154 insertions(+), 55 deletions(-)
 create mode 100644 crc/crc-t10dif.h
 create mode 100644 crc/crc64table.h
 create mode 100644 crc/crct10dif_common.c
 create mode 100755 t/nvmept_pi.py

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index ac8314f3..89032941 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2487,6 +2487,45 @@ with the caveat that when used on the command line, they must come after the
         want fio to use placement identifier only at indices 0, 2 and 5 specify
         ``fdp_pli=0,2,5``.
 
+.. option:: md_per_io_size=int : [io_uring_cmd]
+
+	Size in bytes for separate metadata buffer per IO. Default: 0.
+
+.. option:: pi_act=int : [io_uring_cmd]
+
+	Action to take when nvme namespace is formatted with protection
+	information. If this is set to 1 and namespace is formatted with
+	metadata size equal to protection information size, fio won't use
+	separate metadata buffer or extended logical block. If this is set to
+	1 and namespace is formatted with metadata size greater than protection
+	information size, fio will not generate or verify the protection
+	information portion of metadata for write or read case respectively.
+	If this is set to 0, fio generates protection information for
+	write case and verifies for read case. Default: 1.
+
+.. option:: pi_chk=str[,str][,str] : [io_uring_cmd]
+
+	Controls the protection information check. This can take one or more
+	of these values. Default: none.
+
+	**GUARD**
+		Enables protection information checking of guard field.
+	**REFTAG**
+		Enables protection information checking of logical block
+		reference tag field.
+	**APPTAG**
+		Enables protection information checking of application tag field.
+
+.. option:: apptag=int : [io_uring_cmd]
+
+	Specifies logical block application tag value, if namespace is
+	formatted to use end to end protection information. Default: 0x1234.
+
+.. option:: apptag_mask=int : [io_uring_cmd]
+
+	Specifies logical block application tag mask value, if namespace is
+	formatted to use end to end protection information. Default: 0xffff.
+
 .. option:: cpuload=int : [cpuio]
 
 	Attempt to use the specified percentage of CPU cycles. This is a mandatory
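The pi_chk option takes a comma-separated list of GUARD, REFTAG and APPTAG. Below is a sketch of turning that string into a bitmask with exact token matching, whereas parse_prchk_flags() in this patch uses strstr(). The PI_CHK_* bit values and parse_pi_chk() are hypothetical and for illustration only. fio itself uses the NVME_IO_PRINFO_PRCHK_* constants from engines/nvme.h.

```c
#include <string.h>

/* Hypothetical bit values, not the NVMe PRINFO encoding. */
enum {
	PI_CHK_GUARD  = 1 << 0,
	PI_CHK_REFTAG = 1 << 1,
	PI_CHK_APPTAG = 1 << 2,
};

/* Returns the mask, 0 for NULL ("none"), or -1 on an unknown token. */
static int parse_pi_chk(const char *str)
{
	int mask = 0;

	if (!str)
		return 0;
	while (*str) {
		size_t len = strcspn(str, ",");

		if (len == 5 && !strncmp(str, "GUARD", 5))
			mask |= PI_CHK_GUARD;
		else if (len == 6 && !strncmp(str, "REFTAG", 6))
			mask |= PI_CHK_REFTAG;
		else if (len == 6 && !strncmp(str, "APPTAG", 6))
			mask |= PI_CHK_APPTAG;
		else
			return -1;
		str += len;
		if (*str == ',')
			str++;	/* step past the separator */
	}
	return mask;
}
```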
diff --git a/crc/crc-t10dif.h b/crc/crc-t10dif.h
new file mode 100644
index 00000000..fde4ccd7
--- /dev/null
+++ b/crc/crc-t10dif.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __CRC_T10DIF_H
+#define __CRC_T10DIF_H
+
+extern unsigned short fio_crc_t10dif(unsigned short crc,
+				     const unsigned char *buffer,
+				     unsigned int len);
+
+#endif
diff --git a/crc/crc64.c b/crc/crc64.c
index bf24a97b..c910e5b8 100644
--- a/crc/crc64.c
+++ b/crc/crc64.c
@@ -1,4 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * crc64nvme[256] table is from the generator polynomial specified by NVMe
+ * 64b CRC and is defined as,
+ *
+ * x^64 + x^63 + x^61 + x^59 + x^58 + x^56 + x^55 + x^52 + x^49 + x^48 + x^47 +
+ * x^46 + x^44 + x^41 + x^37 + x^36 + x^34 + x^32 + x^31 + x^28 + x^26 + x^23 +
+ * x^22 + x^19 + x^16 + x^13 + x^12 + x^10 + x^9 + x^6 + x^4 + x^3 + 1
+ *
+ */
+
 #include "crc64.h"
+#include "crc64table.h"
 
 /*
  * poly 0x95AC9329AC4BC9B5ULL and init 0xFFFFFFFFFFFFFFFFULL
@@ -102,3 +114,23 @@ unsigned long long fio_crc64(const unsigned char *buffer, unsigned long length)
 	return crc;
 }
 
+/**
+ * fio_crc64_nvme - Calculate bitwise NVMe CRC64
+ * @crc: seed value for computation. 0 for a new CRC calculation, or the
+ * 	 previous crc64 value if computing incrementally.
+ * @p: pointer to buffer over which CRC64 is run
+ * @len: length of buffer @p
+ */
+unsigned long long fio_crc64_nvme(unsigned long long crc, const void *p,
+				  unsigned int len)
+{
+	const unsigned char *_p = p;
+	unsigned int i;
+
+	crc = ~crc;
+
+	for (i = 0; i < len; i++)
+		crc = (crc >> 8) ^ crc64nvmetable[(crc & 0xff) ^ *_p++];
+
+	return ~crc;
+}
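fio_crc64_nvme() processes one byte per step through a 256-entry table. Below is a bit-at-a-time sketch of the same reflected CRC. It is not the code fio ships, but it uses the same seed convention: 0 for a new CRC, or the previous result to continue. The reflected polynomial 0x9A6C9329AC4BC9B5 is the bit-reversal of the polynomial spelled out in the comment above, and it equals crc64nvmetable[128].

```c
#include <stddef.h>
#include <stdint.h>

#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Bit-at-a-time reflected CRC64 with initial and final inversion. */
static uint64_t crc64_nvme_bitwise(uint64_t crc, const void *p, size_t len)
{
	const unsigned char *buf = p;

	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
	}
	return ~crc;
}
```

For the ASCII string "123456789", this gives 0xAE8B14860A799888, the check value catalogued for CRC-64/NVME. Feeding the buffer in pieces produces the same result, because each call undoes the previous final inversion.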
diff --git a/crc/crc64.h b/crc/crc64.h
index fe9cad3e..e586edee 100644
--- a/crc/crc64.h
+++ b/crc/crc64.h
@@ -3,4 +3,7 @@
 
 unsigned long long fio_crc64(const unsigned char *, unsigned long);
 
+unsigned long long fio_crc64_nvme(unsigned long long crc, const void *p,
+				  unsigned int len);
+
 #endif
diff --git a/crc/crc64table.h b/crc/crc64table.h
new file mode 100644
index 00000000..04224d4f
--- /dev/null
+++ b/crc/crc64table.h
@@ -0,0 +1,130 @@
+static const unsigned long long crc64nvmetable[256] = {
+	0x0000000000000000ULL, 	0x7f6ef0c830358979ULL,
+	0xfedde190606b12f2ULL, 	0x81b31158505e9b8bULL,
+	0xc962e5739841b68fULL, 	0xb60c15bba8743ff6ULL,
+	0x37bf04e3f82aa47dULL, 	0x48d1f42bc81f2d04ULL,
+	0xa61cecb46814fe75ULL, 	0xd9721c7c5821770cULL,
+	0x58c10d24087fec87ULL, 	0x27affdec384a65feULL,
+	0x6f7e09c7f05548faULL, 	0x1010f90fc060c183ULL,
+	0x91a3e857903e5a08ULL, 	0xeecd189fa00bd371ULL,
+	0x78e0ff3b88be6f81ULL, 	0x078e0ff3b88be6f8ULL,
+	0x863d1eabe8d57d73ULL, 	0xf953ee63d8e0f40aULL,
+	0xb1821a4810ffd90eULL, 	0xceecea8020ca5077ULL,
+	0x4f5ffbd87094cbfcULL, 	0x30310b1040a14285ULL,
+	0xdefc138fe0aa91f4ULL, 	0xa192e347d09f188dULL,
+	0x2021f21f80c18306ULL, 	0x5f4f02d7b0f40a7fULL,
+	0x179ef6fc78eb277bULL, 	0x68f0063448deae02ULL,
+	0xe943176c18803589ULL, 	0x962de7a428b5bcf0ULL,
+	0xf1c1fe77117cdf02ULL, 	0x8eaf0ebf2149567bULL,
+	0x0f1c1fe77117cdf0ULL, 	0x7072ef2f41224489ULL,
+	0x38a31b04893d698dULL, 	0x47cdebccb908e0f4ULL,
+	0xc67efa94e9567b7fULL, 	0xb9100a5cd963f206ULL,
+	0x57dd12c379682177ULL, 	0x28b3e20b495da80eULL,
+	0xa900f35319033385ULL, 	0xd66e039b2936bafcULL,
+	0x9ebff7b0e12997f8ULL, 	0xe1d10778d11c1e81ULL,
+	0x606216208142850aULL, 	0x1f0ce6e8b1770c73ULL,
+	0x8921014c99c2b083ULL, 	0xf64ff184a9f739faULL,
+	0x77fce0dcf9a9a271ULL, 	0x08921014c99c2b08ULL,
+	0x4043e43f0183060cULL, 	0x3f2d14f731b68f75ULL,
+	0xbe9e05af61e814feULL, 	0xc1f0f56751dd9d87ULL,
+	0x2f3dedf8f1d64ef6ULL, 	0x50531d30c1e3c78fULL,
+	0xd1e00c6891bd5c04ULL, 	0xae8efca0a188d57dULL,
+	0xe65f088b6997f879ULL, 	0x9931f84359a27100ULL,
+	0x1882e91b09fcea8bULL, 	0x67ec19d339c963f2ULL,
+	0xd75adabd7a6e2d6fULL, 	0xa8342a754a5ba416ULL,
+	0x29873b2d1a053f9dULL, 	0x56e9cbe52a30b6e4ULL,
+	0x1e383fcee22f9be0ULL, 	0x6156cf06d21a1299ULL,
+	0xe0e5de5e82448912ULL, 	0x9f8b2e96b271006bULL,
+	0x71463609127ad31aULL, 	0x0e28c6c1224f5a63ULL,
+	0x8f9bd7997211c1e8ULL, 	0xf0f5275142244891ULL,
+	0xb824d37a8a3b6595ULL, 	0xc74a23b2ba0eececULL,
+	0x46f932eaea507767ULL, 	0x3997c222da65fe1eULL,
+	0xafba2586f2d042eeULL, 	0xd0d4d54ec2e5cb97ULL,
+	0x5167c41692bb501cULL, 	0x2e0934dea28ed965ULL,
+	0x66d8c0f56a91f461ULL, 	0x19b6303d5aa47d18ULL,
+	0x980521650afae693ULL, 	0xe76bd1ad3acf6feaULL,
+	0x09a6c9329ac4bc9bULL, 	0x76c839faaaf135e2ULL,
+	0xf77b28a2faafae69ULL, 	0x8815d86aca9a2710ULL,
+	0xc0c42c4102850a14ULL, 	0xbfaadc8932b0836dULL,
+	0x3e19cdd162ee18e6ULL, 	0x41773d1952db919fULL,
+	0x269b24ca6b12f26dULL, 	0x59f5d4025b277b14ULL,
+	0xd846c55a0b79e09fULL, 	0xa72835923b4c69e6ULL,
+	0xeff9c1b9f35344e2ULL, 	0x90973171c366cd9bULL,
+	0x1124202993385610ULL, 	0x6e4ad0e1a30ddf69ULL,
+	0x8087c87e03060c18ULL, 	0xffe938b633338561ULL,
+	0x7e5a29ee636d1eeaULL, 	0x0134d92653589793ULL,
+	0x49e52d0d9b47ba97ULL, 	0x368bddc5ab7233eeULL,
+	0xb738cc9dfb2ca865ULL, 	0xc8563c55cb19211cULL,
+	0x5e7bdbf1e3ac9decULL, 	0x21152b39d3991495ULL,
+	0xa0a63a6183c78f1eULL, 	0xdfc8caa9b3f20667ULL,
+	0x97193e827bed2b63ULL, 	0xe877ce4a4bd8a21aULL,
+	0x69c4df121b863991ULL, 	0x16aa2fda2bb3b0e8ULL,
+	0xf86737458bb86399ULL, 	0x8709c78dbb8deae0ULL,
+	0x06bad6d5ebd3716bULL, 	0x79d4261ddbe6f812ULL,
+	0x3105d23613f9d516ULL, 	0x4e6b22fe23cc5c6fULL,
+	0xcfd833a67392c7e4ULL, 	0xb0b6c36e43a74e9dULL,
+	0x9a6c9329ac4bc9b5ULL, 	0xe50263e19c7e40ccULL,
+	0x64b172b9cc20db47ULL, 	0x1bdf8271fc15523eULL,
+	0x530e765a340a7f3aULL, 	0x2c608692043ff643ULL,
+	0xadd397ca54616dc8ULL, 	0xd2bd67026454e4b1ULL,
+	0x3c707f9dc45f37c0ULL, 	0x431e8f55f46abeb9ULL,
+	0xc2ad9e0da4342532ULL, 	0xbdc36ec59401ac4bULL,
+	0xf5129aee5c1e814fULL, 	0x8a7c6a266c2b0836ULL,
+	0x0bcf7b7e3c7593bdULL, 	0x74a18bb60c401ac4ULL,
+	0xe28c6c1224f5a634ULL, 	0x9de29cda14c02f4dULL,
+	0x1c518d82449eb4c6ULL, 	0x633f7d4a74ab3dbfULL,
+	0x2bee8961bcb410bbULL, 	0x548079a98c8199c2ULL,
+	0xd53368f1dcdf0249ULL, 	0xaa5d9839ecea8b30ULL,
+	0x449080a64ce15841ULL, 	0x3bfe706e7cd4d138ULL,
+	0xba4d61362c8a4ab3ULL, 	0xc52391fe1cbfc3caULL,
+	0x8df265d5d4a0eeceULL, 	0xf29c951de49567b7ULL,
+	0x732f8445b4cbfc3cULL, 	0x0c41748d84fe7545ULL,
+	0x6bad6d5ebd3716b7ULL, 	0x14c39d968d029fceULL,
+	0x95708ccedd5c0445ULL, 	0xea1e7c06ed698d3cULL,
+	0xa2cf882d2576a038ULL, 	0xdda178e515432941ULL,
+	0x5c1269bd451db2caULL, 	0x237c997575283bb3ULL,
+	0xcdb181ead523e8c2ULL, 	0xb2df7122e51661bbULL,
+	0x336c607ab548fa30ULL, 	0x4c0290b2857d7349ULL,
+	0x04d364994d625e4dULL, 	0x7bbd94517d57d734ULL,
+	0xfa0e85092d094cbfULL, 	0x856075c11d3cc5c6ULL,
+	0x134d926535897936ULL, 	0x6c2362ad05bcf04fULL,
+	0xed9073f555e26bc4ULL, 	0x92fe833d65d7e2bdULL,
+	0xda2f7716adc8cfb9ULL, 	0xa54187de9dfd46c0ULL,
+	0x24f29686cda3dd4bULL, 	0x5b9c664efd965432ULL,
+	0xb5517ed15d9d8743ULL, 	0xca3f8e196da80e3aULL,
+	0x4b8c9f413df695b1ULL, 	0x34e26f890dc31cc8ULL,
+	0x7c339ba2c5dc31ccULL, 	0x035d6b6af5e9b8b5ULL,
+	0x82ee7a32a5b7233eULL, 	0xfd808afa9582aa47ULL,
+	0x4d364994d625e4daULL, 	0x3258b95ce6106da3ULL,
+	0xb3eba804b64ef628ULL, 	0xcc8558cc867b7f51ULL,
+	0x8454ace74e645255ULL, 	0xfb3a5c2f7e51db2cULL,
+	0x7a894d772e0f40a7ULL, 	0x05e7bdbf1e3ac9deULL,
+	0xeb2aa520be311aafULL, 	0x944455e88e0493d6ULL,
+	0x15f744b0de5a085dULL, 	0x6a99b478ee6f8124ULL,
+	0x224840532670ac20ULL, 	0x5d26b09b16452559ULL,
+	0xdc95a1c3461bbed2ULL, 	0xa3fb510b762e37abULL,
+	0x35d6b6af5e9b8b5bULL, 	0x4ab846676eae0222ULL,
+	0xcb0b573f3ef099a9ULL, 	0xb465a7f70ec510d0ULL,
+	0xfcb453dcc6da3dd4ULL, 	0x83daa314f6efb4adULL,
+	0x0269b24ca6b12f26ULL, 	0x7d0742849684a65fULL,
+	0x93ca5a1b368f752eULL, 	0xeca4aad306bafc57ULL,
+	0x6d17bb8b56e467dcULL, 	0x12794b4366d1eea5ULL,
+	0x5aa8bf68aecec3a1ULL, 	0x25c64fa09efb4ad8ULL,
+	0xa4755ef8cea5d153ULL, 	0xdb1bae30fe90582aULL,
+	0xbcf7b7e3c7593bd8ULL, 	0xc399472bf76cb2a1ULL,
+	0x422a5673a732292aULL, 	0x3d44a6bb9707a053ULL,
+	0x759552905f188d57ULL, 	0x0afba2586f2d042eULL,
+	0x8b48b3003f739fa5ULL, 	0xf42643c80f4616dcULL,
+	0x1aeb5b57af4dc5adULL, 	0x6585ab9f9f784cd4ULL,
+	0xe436bac7cf26d75fULL, 	0x9b584a0fff135e26ULL,
+	0xd389be24370c7322ULL, 	0xace74eec0739fa5bULL,
+	0x2d545fb4576761d0ULL, 	0x523aaf7c6752e8a9ULL,
+	0xc41748d84fe75459ULL, 	0xbb79b8107fd2dd20ULL,
+	0x3acaa9482f8c46abULL, 	0x45a459801fb9cfd2ULL,
+	0x0d75adabd7a6e2d6ULL, 	0x721b5d63e7936bafULL,
+	0xf3a84c3bb7cdf024ULL, 	0x8cc6bcf387f8795dULL,
+	0x620ba46c27f3aa2cULL, 	0x1d6554a417c62355ULL,
+	0x9cd645fc4798b8deULL, 	0xe3b8b53477ad31a7ULL,
+	0xab69411fbfb21ca3ULL, 	0xd407b1d78f8795daULL,
+	0x55b4a08fdfd90e51ULL, 	0x2ada5047efec8728ULL,
+};
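The table above need not be taken on trust. Below is a sketch of regenerating it: entry i is byte value i shifted through eight reflected CRC steps with the polynomial 0x9A6C9329AC4BC9B5. crc64_nvme_gen_table() is a hypothetical name. Spot-checking a few generated entries against the listed ones confirms the layout that fio_crc64_nvme() expects.

```c
#include <stdint.h>

/* Build the 256-entry lookup table for a reflected, byte-at-a-time CRC64. */
static void crc64_nvme_gen_table(uint64_t table[256])
{
	const uint64_t poly = 0x9A6C9329AC4BC9B5ULL;

	for (unsigned int i = 0; i < 256; i++) {
		uint64_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
		table[i] = crc;
	}
}
```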
diff --git a/crc/crct10dif_common.c b/crc/crct10dif_common.c
new file mode 100644
index 00000000..cfb2a1b1
--- /dev/null
+++ b/crc/crct10dif_common.c
@@ -0,0 +1,78 @@
+/*
+ * Cryptographic API.
+ *
+ * T10 Data Integrity Field CRC16 Crypto Transform
+ *
+ * Copyright (c) 2007 Oracle Corporation.  All rights reserved.
+ * Written by Martin K. Petersen <martin.petersen@oracle.com>
+ * Copyright (C) 2013 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include "crc-t10dif.h"
+
+/* Table generated using the following polynomium:
+ * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
+ * gt: 0x8bb7
+ */
+static const unsigned short t10_dif_crc_table[256] = {
+	0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+	0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+	0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+	0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+	0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+	0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+	0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+	0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+	0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+	0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+	0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+	0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+	0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+	0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+	0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+	0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+	0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+	0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+	0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+	0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+	0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+	0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+	0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+	0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+	0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+	0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+	0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+	0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+	0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+	0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+	0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+	0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
+extern unsigned short fio_crc_t10dif(unsigned short crc,
+				     const unsigned char *buffer,
+				     unsigned int len)
+{
+	unsigned int i;
+
+	for (i = 0 ; i < len ; i++)
+		crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+
+	return crc;
+}
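The 16-bit guard uses the non-reflected CRC16 T10-DIF with polynomial 0x8BB7, zero initial value and no final XOR. Below is a sketch of a bit-at-a-time equivalent of fio_crc_t10dif(), plus a generator for its lookup table. Both function names are hypothetical and not part of fio.

```c
#include <stddef.h>
#include <stdint.h>

/* One MSB-first CRC step with polynomial 0x8BB7. */
static uint16_t t10dif_step(uint16_t crc)
{
	return (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
			      : (uint16_t)(crc << 1);
}

/* Bit-at-a-time CRC16 T10-DIF; pass 0 as crc for a new calculation. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf,
				   size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = t10dif_step(crc);
	}
	return crc;
}

/* Rebuild t10_dif_crc_table[]: entry i is byte i shifted through 8 steps. */
static void crc_t10dif_gen_table(uint16_t table[256])
{
	for (unsigned int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = t10dif_step(crc);
		table[i] = crc;
	}
}
```

For "123456789", the result is 0xD0DB, the catalogued CRC-16/T10-DIF check value.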
diff --git a/engines/io_uring.c b/engines/io_uring.c
index b361e6a5..6cdf1b4f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -59,6 +59,7 @@ struct ioring_data {
 	int ring_fd;
 
 	struct io_u **io_u_index;
+	char *md_buf;
 
 	int *fds;
 
@@ -95,6 +96,12 @@ struct ioring_options {
 	unsigned int uncached;
 	unsigned int nowait;
 	unsigned int force_async;
+	unsigned int md_per_io_size;
+	unsigned int pi_act;
+	unsigned int apptag;
+	unsigned int apptag_mask;
+	unsigned int prchk;
+	char *pi_chk;
 	enum uring_cmd_type cmd_type;
 };
 
@@ -217,6 +224,56 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_IOURING,
 	},
 	CMDPRIO_OPTIONS(struct ioring_options, FIO_OPT_G_IOURING),
+	{
+		.name	= "md_per_io_size",
+		.lname	= "Separate Metadata Buffer Size per I/O",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, md_per_io_size),
+		.def	= "0",
+		.help	= "Size of separate metadata buffer per I/O (Default: 0)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "pi_act",
+		.lname	= "Protection Information Action",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct ioring_options, pi_act),
+		.def	= "1",
+		.help	= "Protection Information Action bit (pi_act=1 or pi_act=0)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "pi_chk",
+		.lname	= "Protection Information Check",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct ioring_options, pi_chk),
+		.def	= NULL,
+		.help	= "Control of Protection Information Checking (pi_chk=GUARD,REFTAG,APPTAG)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "apptag",
+		.lname	= "Application Tag used in Protection Information",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, apptag),
+		.def	= "0x1234",
+		.help	= "Application Tag used in Protection Information field (Default: 0x1234)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "apptag_mask",
+		.lname	= "Application Tag Mask",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, apptag_mask),
+		.def	= "0xffff",
+		.help	= "Application Tag Mask used with Application Tag (Default: 0xffff)",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= NULL,
 	},
@@ -399,7 +456,9 @@ static struct io_u *fio_ioring_cmd_event(struct thread_data *td, int event)
 	struct ioring_options *o = td->eo;
 	struct io_uring_cqe *cqe;
 	struct io_u *io_u;
+	struct nvme_data *data;
 	unsigned index;
+	int ret;
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 	if (o->cmd_type == FIO_URING_CMD_NVME)
@@ -413,6 +472,15 @@ static struct io_u *fio_ioring_cmd_event(struct thread_data *td, int event)
 	else
 		io_u->error = 0;
 
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		data = FILE_ENG_DATA(io_u->file);
+		if (data->pi_type && (io_u->ddir == DDIR_READ) && !o->pi_act) {
+			ret = fio_nvme_pi_verify(data, io_u);
+			if (ret)
+				io_u->error = ret;
+		}
+	}
+
 	return io_u;
 }
 
@@ -474,6 +542,33 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
+static inline void fio_ioring_cmd_nvme_pi(struct thread_data *td,
+					  struct io_u *io_u)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct nvme_uring_cmd *cmd;
+	struct io_uring_sqe *sqe;
+	struct nvme_cmd_ext_io_opts ext_opts = {0};
+	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+
+	if (io_u->ddir == DDIR_TRIM)
+		return;
+
+	sqe = &ld->sqes[(io_u->index) << 1];
+	cmd = (struct nvme_uring_cmd *)sqe->cmd;
+
+	if (data->pi_type) {
+		if (o->pi_act)
+			ext_opts.io_flags |= NVME_IO_PRINFO_PRACT;
+		ext_opts.io_flags |= o->prchk;
+		ext_opts.apptag = o->apptag;
+		ext_opts.apptag_mask = o->apptag_mask;
+	}
+
+	fio_nvme_pi_fill(cmd, io_u, &ext_opts);
+}
+
 static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
 					   struct io_u *io_u)
 {
@@ -488,6 +583,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	struct io_sq_ring *ring = &ld->sq_ring;
 	unsigned tail, next_tail;
 
@@ -515,6 +611,10 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
 		fio_ioring_cmdprio_prep(td, io_u);
 
+	if (!strcmp(td->io_ops->name, "io_uring_cmd") &&
+		o->cmd_type == FIO_URING_CMD_NVME)
+		fio_ioring_cmd_nvme_pi(td, io_u);
+
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	atomic_store_release(ring->tail, next_tail);
 
@@ -631,6 +731,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 
 		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->io_u_index);
+		free(ld->md_buf);
 		free(ld->iovecs);
 		free(ld->fds);
 		free(ld->dsm);
@@ -1012,10 +1113,24 @@ static int fio_ioring_cmd_post_init(struct thread_data *td)
 	return 0;
 }
 
+static void parse_prchk_flags(struct ioring_options *o)
+{
+	if (!o->pi_chk)
+		return;
+
+	if (strstr(o->pi_chk, "GUARD") != NULL)
+		o->prchk = NVME_IO_PRINFO_PRCHK_GUARD;
+	if (strstr(o->pi_chk, "REFTAG") != NULL)
+		o->prchk |= NVME_IO_PRINFO_PRCHK_REF;
+	if (strstr(o->pi_chk, "APPTAG") != NULL)
+		o->prchk |= NVME_IO_PRINFO_PRCHK_APP;
+}
+
 static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
+	unsigned long long md_size;
 	int ret;
 
 	/* sqthread submission requires registered files */
@@ -1036,6 +1151,32 @@ static int fio_ioring_init(struct thread_data *td)
 
 	/* io_u index */
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
+
+	/*
+	 * Metadata buffer for NVMe commands.
+	 * Only iomem=malloc / mem=malloc is supported for now.
+	 */
+	if (!strcmp(td->io_ops->name, "io_uring_cmd") &&
+	    (o->cmd_type == FIO_URING_CMD_NVME) && o->md_per_io_size) {
+		md_size = (unsigned long long) o->md_per_io_size
+				* (unsigned long long) td->o.iodepth;
+		md_size += page_mask + td->o.mem_align;
+		if (td->o.mem_align && td->o.mem_align > page_size)
+			md_size += td->o.mem_align - page_size;
+		if (td->o.mem_type == MEM_MALLOC) {
+			ld->md_buf = malloc(md_size);
+			if (!ld->md_buf) {
+				free(ld);
+				return 1;
+			}
+		} else {
+			log_err("fio: Only iomem=malloc or mem=malloc is supported\n");
+			free(ld);
+			return 1;
+		}
+	}
+	parse_prchk_flags(o);
+
 	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
 
 	td->io_ops_data = ld;
@@ -1062,11 +1203,42 @@ static int fio_ioring_init(struct thread_data *td)
 static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct nvme_pi_data *pi_data;
+	char *p;
 
 	ld->io_u_index[io_u->index] = io_u;
+
+	if (!strcmp(td->io_ops->name, "io_uring_cmd")) {
+		p = PTR_ALIGN(ld->md_buf, page_mask) + td->o.mem_align;
+		p += o->md_per_io_size * io_u->index;
+		io_u->mmap_data = p;
+
+		if (!o->pi_act) {
+			pi_data = calloc(1, sizeof(*pi_data));
+			pi_data->io_flags |= o->prchk;
+			pi_data->apptag_mask = o->apptag_mask;
+			pi_data->apptag = o->apptag;
+			io_u->engine_data = pi_data;
+		}
+	}
+
 	return 0;
 }
 
+static void fio_ioring_io_u_free(struct thread_data *td, struct io_u *io_u)
+{
+	struct ioring_options *o = td->eo;
+	struct nvme_pi_data *pi;
+
+	if (!strcmp(td->io_ops->name, "io_uring_cmd") &&
+	    (o->cmd_type == FIO_URING_CMD_NVME)) {
+		pi = io_u->engine_data;
+		free(pi);
+		io_u->engine_data = NULL;
+	}
+}
+
 static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -1086,39 +1258,44 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
-		unsigned int nsid, lba_size = 0;
-		__u32 ms = 0;
+		unsigned int lba_size = 0;
 		__u64 nlba = 0;
 		int ret;
 
 		/* Store the namespace-id and lba size. */
 		data = FILE_ENG_DATA(f);
 		if (data == NULL) {
-			ret = fio_nvme_get_info(f, &nsid, &lba_size, &ms, &nlba);
-			if (ret)
-				return ret;
-
 			data = calloc(1, sizeof(struct nvme_data));
-			data->nsid = nsid;
-			if (ms)
-				data->lba_ext = lba_size + ms;
-			else
-				data->lba_shift = ilog2(lba_size);
+			ret = fio_nvme_get_info(f, &nlba, o->pi_act, data);
+			if (ret) {
+				free(data);
+				return ret;
+			}
 
 			FILE_SET_ENG_DATA(f, data);
 		}
 
-		assert(data->lba_shift < 32);
-		lba_size = data->lba_ext ? data->lba_ext : (1U << data->lba_shift);
+		lba_size = data->lba_ext ? data->lba_ext : data->lba_size;
 
 		for_each_rw_ddir(ddir) {
 			if (td->o.min_bs[ddir] % lba_size ||
 				td->o.max_bs[ddir] % lba_size) {
 				if (data->lba_ext)
-					log_err("block size must be a multiple of "
-						"(LBA data size + Metadata size)\n");
+					log_err("%s: block size must be a multiple of (LBA data size + Metadata size)\n",
+						f->file_name);
 				else
-					log_err("block size must be a multiple of LBA data size\n");
+					log_err("%s: block size must be a multiple of LBA data size\n",
+						f->file_name);
+				td_verror(td, EINVAL, "fio_ioring_cmd_open_file");
+				return 1;
+			}
+			if (data->ms && !data->lba_ext && ddir != DDIR_TRIM &&
+			    (o->md_per_io_size < ((td->o.max_bs[ddir] / data->lba_size) *
+						  data->ms))) {
+				log_err("%s: md_per_io_size should be at least %llu bytes\n",
+					f->file_name,
+					((td->o.max_bs[ddir] / data->lba_size) * data->ms));
+				td_verror(td, EINVAL, "fio_ioring_cmd_open_file");
 				return 1;
 			}
                 }
@@ -1171,23 +1348,17 @@ static int fio_ioring_cmd_get_file_size(struct thread_data *td,
 
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
-		unsigned int nsid, lba_size = 0;
-		__u32 ms = 0;
 		__u64 nlba = 0;
 		int ret;
 
-		ret = fio_nvme_get_info(f, &nsid, &lba_size, &ms, &nlba);
-		if (ret)
-			return ret;
-
 		data = calloc(1, sizeof(struct nvme_data));
-		data->nsid = nsid;
-		if (ms)
-			data->lba_ext = lba_size + ms;
-		else
-			data->lba_shift = ilog2(lba_size);
+		ret = fio_nvme_get_info(f, &nlba, o->pi_act, data);
+		if (ret) {
+			free(data);
+			return ret;
+		}
 
-		f->real_file_size = lba_size * nlba;
+		f->real_file_size = data->lba_size * nlba;
 		fio_file_set_size_known(f);
 
 		FILE_SET_ENG_DATA(f, data);
@@ -1276,6 +1447,7 @@ static struct ioengine_ops ioengine_uring_cmd = {
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_cmd_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
+	.io_u_free		= fio_ioring_io_u_free,
 	.prep			= fio_ioring_cmd_prep,
 	.queue			= fio_ioring_queue,
 	.commit			= fio_ioring_commit,
diff --git a/engines/nvme.c b/engines/nvme.c
index b18ad4c2..08503b33 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -1,9 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * nvme structure declarations and helper functions for the
  * io_uring_cmd engine.
  */
 
 #include "nvme.h"
+#include "../crc/crc-t10dif.h"
+#include "../crc/crc64.h"
 
 static inline __u64 get_slba(struct nvme_data *data, struct io_u *io_u)
 {
@@ -21,6 +24,310 @@ static inline __u32 get_nlb(struct nvme_data *data, struct io_u *io_u)
 		return (io_u->xfer_buflen >> data->lba_shift) - 1;
 }
 
+static void fio_nvme_generate_pi_16b_guard(struct nvme_data *data,
+					   struct io_u *io_u,
+					   struct nvme_cmd_ext_io_opts *opts)
+{
+	struct nvme_pi_data *pi_data = io_u->engine_data;
+	struct nvme_16b_guard_pif *pi;
+	unsigned char *buf = io_u->xfer_buf;
+	unsigned char *md_buf = io_u->mmap_data;
+	__u64 slba = get_slba(data, io_u);
+	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u32 lba_num = 0;
+	__u16 guard = 0;
+
+	if (data->pi_loc) {
+		if (data->lba_ext)
+			pi_data->interval = data->lba_ext - data->ms;
+		else
+			pi_data->interval = 0;
+	} else {
+		if (data->lba_ext)
+			pi_data->interval = data->lba_ext - sizeof(struct nvme_16b_guard_pif);
+		else
+			pi_data->interval = data->ms - sizeof(struct nvme_16b_guard_pif);
+	}
+
+	if (io_u->ddir != DDIR_WRITE)
+		return;
+
+	while (lba_num < nlb) {
+		if (data->lba_ext)
+			pi = (struct nvme_16b_guard_pif *)(buf + pi_data->interval);
+		else
+			pi = (struct nvme_16b_guard_pif *)(md_buf + pi_data->interval);
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_GUARD) {
+			if (data->lba_ext) {
+				guard = fio_crc_t10dif(0, buf, pi_data->interval);
+			} else {
+				guard = fio_crc_t10dif(0, buf, data->lba_size);
+				guard = fio_crc_t10dif(guard, md_buf, pi_data->interval);
+			}
+			pi->guard = cpu_to_be16(guard);
+		}
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_APP)
+			pi->apptag = cpu_to_be16(pi_data->apptag);
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_REF) {
+			switch (data->pi_type) {
+			case NVME_NS_DPS_PI_TYPE1:
+			case NVME_NS_DPS_PI_TYPE2:
+				pi->srtag = cpu_to_be32((__u32)slba + lba_num);
+				break;
+			case NVME_NS_DPS_PI_TYPE3:
+				break;
+			}
+		}
+		if (data->lba_ext) {
+			buf += data->lba_ext;
+		} else {
+			buf += data->lba_size;
+			md_buf += data->ms;
+		}
+		lba_num++;
+	}
+}
+
+static int fio_nvme_verify_pi_16b_guard(struct nvme_data *data,
+					struct io_u *io_u)
+{
+	struct nvme_pi_data *pi_data = io_u->engine_data;
+	struct nvme_16b_guard_pif *pi;
+	struct fio_file *f = io_u->file;
+	unsigned char *buf = io_u->xfer_buf;
+	unsigned char *md_buf = io_u->mmap_data;
+	__u64 slba = get_slba(data, io_u);
+	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u32 lba_num = 0;
+	__u16 unmask_app, unmask_app_exp, guard = 0;
+
+	while (lba_num < nlb) {
+		if (data->lba_ext)
+			pi = (struct nvme_16b_guard_pif *)(buf + pi_data->interval);
+		else
+			pi = (struct nvme_16b_guard_pif *)(md_buf + pi_data->interval);
+
+		if (data->pi_type == NVME_NS_DPS_PI_TYPE3) {
+			if (pi->apptag == NVME_PI_APP_DISABLE &&
+			    pi->srtag == NVME_PI_REF_DISABLE)
+				goto next;
+		} else if (data->pi_type == NVME_NS_DPS_PI_TYPE1 ||
+			   data->pi_type == NVME_NS_DPS_PI_TYPE2) {
+			if (pi->apptag == NVME_PI_APP_DISABLE)
+				goto next;
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_GUARD) {
+			if (data->lba_ext) {
+				guard = fio_crc_t10dif(0, buf, pi_data->interval);
+			} else {
+				guard = fio_crc_t10dif(0, buf, data->lba_size);
+				guard = fio_crc_t10dif(guard, md_buf, pi_data->interval);
+			}
+			if (be16_to_cpu(pi->guard) != guard) {
+				log_err("%s: Guard compare error: LBA: %llu Expected=%x, Actual=%x\n",
+					f->file_name, (unsigned long long)slba,
+					guard, be16_to_cpu(pi->guard));
+				return -EIO;
+			}
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_APP) {
+			unmask_app = be16_to_cpu(pi->apptag) & pi_data->apptag_mask;
+			unmask_app_exp = pi_data->apptag & pi_data->apptag_mask;
+			if (unmask_app != unmask_app_exp) {
+				log_err("%s: APPTAG compare error: LBA: %llu Expected=%x, Actual=%x\n",
+					f->file_name, (unsigned long long)slba,
+					unmask_app_exp, unmask_app);
+				return -EIO;
+			}
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_REF) {
+			switch (data->pi_type) {
+			case NVME_NS_DPS_PI_TYPE1:
+			case NVME_NS_DPS_PI_TYPE2:
+				if (be32_to_cpu(pi->srtag) !=
+				    ((__u32)slba + lba_num)) {
+					log_err("%s: REFTAG compare error: LBA: %llu Expected=%x, Actual=%x\n",
+						f->file_name, (unsigned long long)slba,
+						(__u32)slba + lba_num,
+						be32_to_cpu(pi->srtag));
+					return -EIO;
+				}
+				break;
+			case NVME_NS_DPS_PI_TYPE3:
+				break;
+			}
+		}
+next:
+		if (data->lba_ext) {
+			buf += data->lba_ext;
+		} else {
+			buf += data->lba_size;
+			md_buf += data->ms;
+		}
+		lba_num++;
+	}
+
+	return 0;
+}
+
+static void fio_nvme_generate_pi_64b_guard(struct nvme_data *data,
+					   struct io_u *io_u,
+					   struct nvme_cmd_ext_io_opts *opts)
+{
+	struct nvme_pi_data *pi_data = io_u->engine_data;
+	struct nvme_64b_guard_pif *pi;
+	unsigned char *buf = io_u->xfer_buf;
+	unsigned char *md_buf = io_u->mmap_data;
+	uint64_t guard = 0;
+	__u64 slba = get_slba(data, io_u);
+	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u32 lba_num = 0;
+
+	if (data->pi_loc) {
+		if (data->lba_ext)
+			pi_data->interval = data->lba_ext - data->ms;
+		else
+			pi_data->interval = 0;
+	} else {
+		if (data->lba_ext)
+			pi_data->interval = data->lba_ext - sizeof(struct nvme_64b_guard_pif);
+		else
+			pi_data->interval = data->ms - sizeof(struct nvme_64b_guard_pif);
+	}
+
+	if (io_u->ddir != DDIR_WRITE)
+		return;
+
+	while (lba_num < nlb) {
+		if (data->lba_ext)
+			pi = (struct nvme_64b_guard_pif *)(buf + pi_data->interval);
+		else
+			pi = (struct nvme_64b_guard_pif *)(md_buf + pi_data->interval);
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_GUARD) {
+			if (data->lba_ext) {
+				guard = fio_crc64_nvme(0, buf, pi_data->interval);
+			} else {
+				guard = fio_crc64_nvme(0, buf, data->lba_size);
+				guard = fio_crc64_nvme(guard, md_buf, pi_data->interval);
+			}
+			pi->guard = cpu_to_be64(guard);
+		}
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_APP)
+			pi->apptag = cpu_to_be16(pi_data->apptag);
+
+		if (opts->io_flags & NVME_IO_PRINFO_PRCHK_REF) {
+			switch (data->pi_type) {
+			case NVME_NS_DPS_PI_TYPE1:
+			case NVME_NS_DPS_PI_TYPE2:
+				put_unaligned_be48(slba + lba_num, pi->srtag);
+				break;
+			case NVME_NS_DPS_PI_TYPE3:
+				break;
+			}
+		}
+		if (data->lba_ext) {
+			buf += data->lba_ext;
+		} else {
+			buf += data->lba_size;
+			md_buf += data->ms;
+		}
+		lba_num++;
+	}
+}
+
+static int fio_nvme_verify_pi_64b_guard(struct nvme_data *data,
+					struct io_u *io_u)
+{
+	struct nvme_pi_data *pi_data = io_u->engine_data;
+	struct nvme_64b_guard_pif *pi;
+	struct fio_file *f = io_u->file;
+	unsigned char *buf = io_u->xfer_buf;
+	unsigned char *md_buf = io_u->mmap_data;
+	__u64 slba = get_slba(data, io_u);
+	__u64 ref, ref_exp, guard = 0;
+	__u32 nlb = get_nlb(data, io_u) + 1;
+	__u32 lba_num = 0;
+	__u16 unmask_app, unmask_app_exp;
+
+	while (lba_num < nlb) {
+		if (data->lba_ext)
+			pi = (struct nvme_64b_guard_pif *)(buf + pi_data->interval);
+		else
+			pi = (struct nvme_64b_guard_pif *)(md_buf + pi_data->interval);
+
+		if (data->pi_type == NVME_NS_DPS_PI_TYPE3) {
+			if (pi->apptag == NVME_PI_APP_DISABLE &&
+			    fio_nvme_pi_ref_escape(pi->srtag))
+				goto next;
+		} else if (data->pi_type == NVME_NS_DPS_PI_TYPE1 ||
+			   data->pi_type == NVME_NS_DPS_PI_TYPE2) {
+			if (pi->apptag == NVME_PI_APP_DISABLE)
+				goto next;
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_GUARD) {
+			if (data->lba_ext) {
+				guard = fio_crc64_nvme(0, buf, pi_data->interval);
+			} else {
+				guard = fio_crc64_nvme(0, buf, data->lba_size);
+				guard = fio_crc64_nvme(guard, md_buf, pi_data->interval);
+			}
+			if (be64_to_cpu((uint64_t)pi->guard) != guard) {
+				log_err("%s: Guard compare error: LBA: %llu Expected=%llx, Actual=%llx\n",
+					f->file_name, (unsigned long long)slba,
+					guard, be64_to_cpu((uint64_t)pi->guard));
+				return -EIO;
+			}
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_APP) {
+			unmask_app = be16_to_cpu(pi->apptag) & pi_data->apptag_mask;
+			unmask_app_exp = pi_data->apptag & pi_data->apptag_mask;
+			if (unmask_app != unmask_app_exp) {
+				log_err("%s: APPTAG compare error: LBA: %llu Expected=%x, Actual=%x\n",
+					f->file_name, (unsigned long long)slba,
+					unmask_app_exp, unmask_app);
+				return -EIO;
+			}
+		}
+
+		if (pi_data->io_flags & NVME_IO_PRINFO_PRCHK_REF) {
+			switch (data->pi_type) {
+			case NVME_NS_DPS_PI_TYPE1:
+			case NVME_NS_DPS_PI_TYPE2:
+				ref = get_unaligned_be48(pi->srtag);
+				ref_exp = (slba + lba_num) & ((1ULL << 48) - 1);
+				if (ref != ref_exp) {
+					log_err("%s: REFTAG compare error: LBA: %llu Expected=%llx, Actual=%llx\n",
+						f->file_name, (unsigned long long)slba,
+						ref_exp, ref);
+					return -EIO;
+				}
+				break;
+			case NVME_NS_DPS_PI_TYPE3:
+				break;
+			}
+		}
+next:
+		if (data->lba_ext) {
+			buf += data->lba_ext;
+		} else {
+			buf += data->lba_size;
+			md_buf += data->ms;
+		}
+		lba_num++;
+	}
+
+	return 0;
+}
 void fio_nvme_uring_cmd_trim_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 				  struct nvme_dsm_range *dsm)
 {
@@ -79,10 +386,72 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 		cmd->addr = (__u64)(uintptr_t)io_u->xfer_buf;
 		cmd->data_len = io_u->xfer_buflen;
 	}
+	if (data->lba_shift && data->ms) {
+		cmd->metadata = (__u64)(uintptr_t)io_u->mmap_data;
+		cmd->metadata_len = (nlb + 1) * data->ms;
+	}
 	cmd->nsid = data->nsid;
 	return 0;
 }
 
+void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+		      struct nvme_cmd_ext_io_opts *opts)
+{
+	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+	__u64 slba;
+
+	slba = get_slba(data, io_u);
+	cmd->cdw12 |= opts->io_flags;
+
+	if (data->pi_type && !(opts->io_flags & NVME_IO_PRINFO_PRACT)) {
+		if (data->guard_type == NVME_NVM_NS_16B_GUARD)
+			fio_nvme_generate_pi_16b_guard(data, io_u, opts);
+		else if (data->guard_type == NVME_NVM_NS_64B_GUARD)
+			fio_nvme_generate_pi_64b_guard(data, io_u, opts);
+	}
+
+	switch (data->pi_type) {
+	case NVME_NS_DPS_PI_TYPE1:
+	case NVME_NS_DPS_PI_TYPE2:
+		switch (data->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			cmd->cdw14 = (__u32)slba;
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			cmd->cdw14 = (__u32)slba;
+			cmd->cdw3 = ((slba >> 32) & 0xffff);
+			break;
+		default:
+			break;
+		}
+		cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
+		break;
+	case NVME_NS_DPS_PI_TYPE3:
+		cmd->cdw15 = (opts->apptag_mask << 16 | opts->apptag);
+		break;
+	case NVME_NS_DPS_PI_NONE:
+		break;
+	}
+}
+
+int fio_nvme_pi_verify(struct nvme_data *data, struct io_u *io_u)
+{
+	int ret = 0;
+
+	switch (data->guard_type) {
+	case NVME_NVM_NS_16B_GUARD:
+		ret = fio_nvme_verify_pi_16b_guard(data, io_u);
+		break;
+	case NVME_NVM_NS_64B_GUARD:
+		ret = fio_nvme_verify_pi_64b_guard(data, io_u);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
 static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 			 enum nvme_csi csi, void *data)
 {
@@ -99,13 +468,15 @@ static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 	return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
 }
 
-int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
-		      __u32 *ms, __u64 *nlba)
+int fio_nvme_get_info(struct fio_file *f, __u64 *nlba, __u32 pi_act,
+		      struct nvme_data *data)
 {
 	struct nvme_id_ns ns;
+	struct nvme_id_ctrl ctrl;
+	struct nvme_nvm_id_ns nvm_ns;
 	int namespace_id;
 	int fd, err;
-	__u32 format_idx;
+	__u32 format_idx, elbaf;
 
 	if (f->filetype != FIO_TYPE_CHAR) {
 		log_err("ioengine io_uring_cmd only works with nvme ns "
@@ -124,6 +495,12 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 		goto out;
 	}
 
+	err = nvme_identify(fd, 0, NVME_IDENTIFY_CNS_CTRL, NVME_CSI_NVM, &ctrl);
+	if (err) {
+		log_err("%s: failed to fetch identify ctrl\n", f->file_name);
+		goto out;
+	}
+
 	/*
 	 * Identify namespace to get namespace-id, namespace size in LBA's
 	 * and LBA data size.
@@ -133,11 +510,10 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 	if (err) {
 		log_err("%s: failed to fetch identify namespace\n",
 			f->file_name);
-		close(fd);
-		return err;
+		goto out;
 	}
 
-	*nsid = namespace_id;
+	data->nsid = namespace_id;
 
 	/*
 	 * 16 or 64 as maximum number of supported LBA formats.
@@ -149,28 +525,74 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 	else
 		format_idx = (ns.flbas & 0xf) + (((ns.flbas >> 5) & 0x3) << 4);
 
-	*lba_sz = 1 << ns.lbaf[format_idx].ds;
+	data->lba_size = 1 << ns.lbaf[format_idx].ds;
+	data->ms = le16_to_cpu(ns.lbaf[format_idx].ms);
+
+	/* Check for end to end data protection support */
+	if (data->ms && (ns.dps & NVME_NS_DPS_PI_MASK))
+		data->pi_type = (ns.dps & NVME_NS_DPS_PI_MASK);
+
+	if (!data->pi_type)
+		goto check_elba;
+
+	if (ctrl.ctratt & NVME_CTRL_CTRATT_ELBAS) {
+		err = nvme_identify(fd, namespace_id, NVME_IDENTIFY_CNS_CSI_NS,
+					NVME_CSI_NVM, &nvm_ns);
+		if (err) {
+			log_err("%s: failed to fetch identify nvm namespace\n",
+				f->file_name);
+			goto out;
+		}
+
+		elbaf = le32_to_cpu(nvm_ns.elbaf[format_idx]);
+
+		/* Currently we don't support storage tags */
+		if (elbaf & NVME_ID_NS_NVM_STS_MASK) {
+			log_err("%s: Storage tag not supported\n",
+				f->file_name);
+			err = -ENOTSUP;
+			goto out;
+		}
+
+		data->guard_type = (elbaf >> NVME_ID_NS_NVM_GUARD_SHIFT) &
+				NVME_ID_NS_NVM_GUARD_MASK;
+
+		/* No 32 bit guard, as storage tag is mandatory for it */
+		switch (data->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			data->pi_size = sizeof(struct nvme_16b_guard_pif);
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			data->pi_size = sizeof(struct nvme_64b_guard_pif);
+			break;
+		default:
+			break;
+		}
+	} else {
+		data->guard_type = NVME_NVM_NS_16B_GUARD;
+		data->pi_size = sizeof(struct nvme_16b_guard_pif);
+	}
+
+	/*
+	 * When the PRACT bit is set to 1 and the metadata size equals the
+	 * protection information size, the controller inserts PI on writes and
+	 * removes it on reads, so no metadata needs to be transferred.
+	 */
+	if (pi_act && data->ms == data->pi_size)
+		data->ms = 0;
+
+	data->pi_loc = (ns.dps & NVME_NS_DPS_PI_FIRST);
 
+check_elba:
 	/*
-	 * Only extended LBA can be supported.
 	 * Bit 4 for flbas indicates if metadata is transferred at the end of
 	 * logical block creating an extended LBA.
 	 */
-	*ms = le16_to_cpu(ns.lbaf[format_idx].ms);
-	if (*ms && !((ns.flbas >> 4) & 0x1)) {
-		log_err("%s: only extended logical block can be supported\n",
-			f->file_name);
-		err = -ENOTSUP;
-		goto out;
-	}
+	if (data->ms && ((ns.flbas >> 4) & 0x1))
+		data->lba_ext = data->lba_size + data->ms;
+	else
+		data->lba_shift = ilog2(data->lba_size);
 
-	/* Check for end to end data protection support */
-	if (ns.dps & 0x3) {
-		log_err("%s: end to end data protection not supported\n",
-			f->file_name);
-		err = -ENOTSUP;
-		goto out;
-	}
 	*nlba = ns.nsze;
 
 out:
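The 16-bit guard path above computes a CRC-T10DIF over each logical block, writes the guard, application tag and reference tag big-endian into the 8-byte PI tuple on writes, and re-checks them on reads. The standalone sketch below illustrates that idea for a single block only. It assumes PI sits in the first bytes of the metadata (so the guard covers only the block data) and Type 1 reference tags; crc_t10dif(), fill_pi_16b() and verify_pi_16b() are hypothetical names, and a plain bitwise CRC (polynomial 0x8BB7, initial value 0) stands in for fio's fio_crc_t10dif().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, init 0, no reflection, no final XOR. */
static uint16_t crc_t10dif(uint16_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Fill an 8-byte 16b-guard PI tuple: guard | apptag | reftag, all big-endian. */
static void fill_pi_16b(unsigned char pi[8], const unsigned char *data,
			size_t lba_size, uint16_t apptag, uint64_t lba)
{
	uint16_t guard = crc_t10dif(0, data, lba_size);
	uint32_t reftag = (uint32_t)lba;	/* Type 1: low 32 bits of the LBA */

	pi[0] = guard >> 8;
	pi[1] = guard & 0xff;
	pi[2] = apptag >> 8;
	pi[3] = apptag & 0xff;
	pi[4] = reftag >> 24;
	pi[5] = (reftag >> 16) & 0xff;
	pi[6] = (reftag >> 8) & 0xff;
	pi[7] = reftag & 0xff;
}

/* Returns 0 on success, or 1/2/3 on a guard/apptag/reftag mismatch. */
static int verify_pi_16b(const unsigned char pi[8], const unsigned char *data,
			 size_t lba_size, uint16_t apptag, uint16_t apptag_mask,
			 uint64_t lba)
{
	uint16_t guard = (uint16_t)(pi[0] << 8 | pi[1]);
	uint16_t app = (uint16_t)(pi[2] << 8 | pi[3]);
	uint32_t ref = (uint32_t)pi[4] << 24 | (uint32_t)pi[5] << 16 |
		       (uint32_t)pi[6] << 8 | pi[7];

	/* Type 1: an all-ones application tag disables checking for this block */
	if (app == 0xFFFF)
		return 0;
	if (guard != crc_t10dif(0, data, lba_size))
		return 1;
	if ((app & apptag_mask) != (apptag & apptag_mask))
		return 2;
	if (ref != (uint32_t)lba)
		return 3;
	return 0;
}
```

Because this CRC has no initial value or final XOR, a zero-filled block yields a guard of 0; changing any data byte afterwards makes verify_pi_16b() report a guard mismatch.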
diff --git a/engines/nvme.h b/engines/nvme.h
index 238471dd..792b35d8 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * nvme structure declarations and helper functions for the
  * io_uring_cmd engine.
@@ -42,6 +43,10 @@ struct nvme_uring_cmd {
 #define NVME_DEFAULT_IOCTL_TIMEOUT 0
 #define NVME_IDENTIFY_DATA_SIZE 4096
 #define NVME_IDENTIFY_CSI_SHIFT 24
+#define NVME_NQN_LENGTH	256
+
+#define NVME_PI_APP_DISABLE 0xFFFF
+#define NVME_PI_REF_DISABLE 0xFFFFFFFF
 
 #define NVME_ZNS_ZRA_REPORT_ZONES 0
 #define NVME_ZNS_ZRAS_FEAT_ERZ (1 << 16)
@@ -52,6 +57,7 @@ struct nvme_uring_cmd {
 
 enum nvme_identify_cns {
 	NVME_IDENTIFY_CNS_NS		= 0x00,
+	NVME_IDENTIFY_CNS_CTRL		= 0x01,
 	NVME_IDENTIFY_CNS_CSI_NS	= 0x05,
 	NVME_IDENTIFY_CNS_CSI_CTRL	= 0x06,
 };
@@ -85,10 +91,55 @@ enum nvme_zns_zs {
 	NVME_ZNS_ZS_OFFLINE		= 0xf,
 };
 
+enum nvme_id_ctrl_ctratt {
+	NVME_CTRL_CTRATT_ELBAS		= 1 << 15,
+};
+
+enum {
+	NVME_ID_NS_NVM_STS_MASK		= 0x7f,
+	NVME_ID_NS_NVM_GUARD_SHIFT	= 7,
+	NVME_ID_NS_NVM_GUARD_MASK	= 0x3,
+};
+
+enum {
+	NVME_NVM_NS_16B_GUARD		= 0,
+	NVME_NVM_NS_32B_GUARD		= 1,
+	NVME_NVM_NS_64B_GUARD		= 2,
+};
+
 struct nvme_data {
 	__u32 nsid;
 	__u32 lba_shift;
+	__u32 lba_size;
 	__u32 lba_ext;
+	__u16 ms;
+	__u16 pi_size;
+	__u8 pi_type;
+	__u8 guard_type;
+	__u8 pi_loc;
+};
+
+enum nvme_id_ns_dps {
+	NVME_NS_DPS_PI_NONE		= 0,
+	NVME_NS_DPS_PI_TYPE1		= 1,
+	NVME_NS_DPS_PI_TYPE2		= 2,
+	NVME_NS_DPS_PI_TYPE3		= 3,
+	NVME_NS_DPS_PI_MASK		= 7 << 0,
+	NVME_NS_DPS_PI_FIRST		= 1 << 3,
+};
+
+enum nvme_io_control_flags {
+	NVME_IO_PRINFO_PRCHK_REF	= 1U << 26,
+	NVME_IO_PRINFO_PRCHK_APP	= 1U << 27,
+	NVME_IO_PRINFO_PRCHK_GUARD	= 1U << 28,
+	NVME_IO_PRINFO_PRACT		= 1U << 29,
+};
+
+struct nvme_pi_data {
+	__u32 interval;
+	__u32 io_flags;
+	__u16 apptag;
+	__u16 apptag_mask;
 };
 
 struct nvme_lbaf {
@@ -97,6 +148,20 @@ struct nvme_lbaf {
 	__u8			rp;
 };
 
+/* 16 bit guard protection Information format */
+struct nvme_16b_guard_pif {
+	__be16 guard;
+	__be16 apptag;
+	__be32 srtag;
+};
+
+/* 64 bit guard protection Information format */
+struct nvme_64b_guard_pif {
+	__be64 guard;
+	__be16 apptag;
+	__u8 srtag[6];
+};
+
 struct nvme_id_ns {
 	__le64			nsze;
 	__le64			ncap;
@@ -139,6 +204,133 @@ struct nvme_id_ns {
 	__u8			vs[3712];
 };
 
+struct nvme_id_psd {
+	__le16			mp;
+	__u8			rsvd2;
+	__u8			flags;
+	__le32			enlat;
+	__le32			exlat;
+	__u8			rrt;
+	__u8			rrl;
+	__u8			rwt;
+	__u8			rwl;
+	__le16			idlp;
+	__u8			ips;
+	__u8			rsvd19;
+	__le16			actp;
+	__u8			apws;
+	__u8			rsvd23[9];
+};
+
+struct nvme_id_ctrl {
+	__le16			vid;
+	__le16			ssvid;
+	char			sn[20];
+	char			mn[40];
+	char			fr[8];
+	__u8			rab;
+	__u8			ieee[3];
+	__u8			cmic;
+	__u8			mdts;
+	__le16			cntlid;
+	__le32			ver;
+	__le32			rtd3r;
+	__le32			rtd3e;
+	__le32			oaes;
+	__le32			ctratt;
+	__le16			rrls;
+	__u8			rsvd102[9];
+	__u8			cntrltype;
+	__u8			fguid[16];
+	__le16			crdt1;
+	__le16			crdt2;
+	__le16			crdt3;
+	__u8			rsvd134[119];
+	__u8			nvmsr;
+	__u8			vwci;
+	__u8			mec;
+	__le16			oacs;
+	__u8			acl;
+	__u8			aerl;
+	__u8			frmw;
+	__u8			lpa;
+	__u8			elpe;
+	__u8			npss;
+	__u8			avscc;
+	__u8			apsta;
+	__le16			wctemp;
+	__le16			cctemp;
+	__le16			mtfa;
+	__le32			hmpre;
+	__le32			hmmin;
+	__u8			tnvmcap[16];
+	__u8			unvmcap[16];
+	__le32			rpmbs;
+	__le16			edstt;
+	__u8			dsto;
+	__u8			fwug;
+	__le16			kas;
+	__le16			hctma;
+	__le16			mntmt;
+	__le16			mxtmt;
+	__le32			sanicap;
+	__le32			hmminds;
+	__le16			hmmaxd;
+	__le16			nsetidmax;
+	__le16			endgidmax;
+	__u8			anatt;
+	__u8			anacap;
+	__le32			anagrpmax;
+	__le32			nanagrpid;
+	__le32			pels;
+	__le16			domainid;
+	__u8			rsvd358[10];
+	__u8			megcap[16];
+	__u8			rsvd384[128];
+	__u8			sqes;
+	__u8			cqes;
+	__le16			maxcmd;
+	__le32			nn;
+	__le16			oncs;
+	__le16			fuses;
+	__u8			fna;
+	__u8			vwc;
+	__le16			awun;
+	__le16			awupf;
+	__u8			icsvscc;
+	__u8			nwpc;
+	__le16			acwu;
+	__le16			ocfs;
+	__le32			sgls;
+	__le32			mnan;
+	__u8			maxdna[16];
+	__le32			maxcna;
+	__u8			rsvd564[204];
+	char			subnqn[NVME_NQN_LENGTH];
+	__u8			rsvd1024[768];
+
+	/* Fabrics Only */
+	__le32			ioccsz;
+	__le32			iorcsz;
+	__le16			icdoff;
+	__u8			fcatt;
+	__u8			msdbd;
+	__le16			ofcs;
+	__u8			dctype;
+	__u8			rsvd1807[241];
+
+	struct nvme_id_psd	psd[32];
+	__u8			vs[1024];
+};
+
+struct nvme_nvm_id_ns {
+	__le64			lbstm;
+	__u8			pic;
+	__u8			rsvd9[3];
+	__le32			elbaf[64];
+	__u8			rsvd268[3828];
+};
+
 static inline int ilog2(uint32_t i)
 {
 	int log = -1;
@@ -216,15 +408,26 @@ struct nvme_dsm_range {
 	__le64	slba;
 };
 
+struct nvme_cmd_ext_io_opts {
+	__u32 io_flags;
+	__u16 apptag;
+	__u16 apptag_mask;
+};
+
 int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
 			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes);
 
-int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
-		      __u32 *ms, __u64 *nlba);
+int fio_nvme_get_info(struct fio_file *f, __u64 *nlba, __u32 pi_act,
+		      struct nvme_data *data);
 
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 			    struct iovec *iov, struct nvme_dsm_range *dsm);
 
+void fio_nvme_pi_fill(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+		      struct nvme_cmd_ext_io_opts *opts);
+
+int fio_nvme_pi_verify(struct nvme_data *data, struct io_u *io_u);
+
 int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
 			     enum zbd_zoned_model *model);
 
@@ -238,4 +441,27 @@ int fio_nvme_reset_wp(struct thread_data *td, struct fio_file *f,
 int fio_nvme_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 				unsigned int *max_open_zones);
 
+static inline void put_unaligned_be48(__u64 val, __u8 *p)
+{
+	*p++ = val >> 40;
+	*p++ = val >> 32;
+	*p++ = val >> 24;
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline __u64 get_unaligned_be48(__u8 *p)
+{
+	return (__u64)p[0] << 40 | (__u64)p[1] << 32 | (__u64)p[2] << 24 |
+		p[3] << 16 | p[4] << 8 | p[5];
+}
+
+static inline bool fio_nvme_pi_ref_escape(__u8 *reftag)
+{
+	__u8 ref_esc[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+
+	return memcmp(reftag, ref_esc, sizeof(ref_esc)) == 0;
+}
+
 #endif
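In the 64-bit guard format the reference tag field (srtag) is a 6-byte array. That is why the patch adds put_unaligned_be48()/get_unaligned_be48() and compares reference tags modulo 2^48. The following is a minimal standalone sketch of the same byte handling. struct pi64_tuple and the helper names are hypothetical, and the layout simply mirrors struct nvme_64b_guard_pif (8-byte guard, 2-byte application tag, 6-byte reference tag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte-for-byte mirror of the 16-byte 64b-guard PI tuple (all big-endian). */
struct pi64_tuple {
	uint8_t guard[8];
	uint8_t apptag[2];
	uint8_t srtag[6];	/* 48-bit reference tag */
};

/* Store the low 48 bits of val, most significant byte first. */
static void put_be48(uint64_t val, uint8_t *p)
{
	for (int i = 5; i >= 0; i--) {
		p[i] = (uint8_t)(val & 0xff);
		val >>= 8;
	}
}

/* Read a 48-bit big-endian value. */
static uint64_t get_be48(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | p[i];
	return v;
}

/* All-ones reference tag: with an all-ones apptag, Type 3 checks are skipped. */
static bool ref48_is_escape(const uint8_t *p)
{
	static const uint8_t esc[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	return memcmp(p, esc, sizeof(esc)) == 0;
}

/* Expected Type 1/2 reference tag for block n of an I/O starting at slba. */
static uint64_t expected_ref48(uint64_t slba, uint32_t n)
{
	return (slba + n) & ((1ULL << 48) - 1);
}
```

The mask in expected_ref48() matches the patch's `(slba + lba_num) & ((1ULL << 48) - 1)`, so the expected tag wraps to 0 past the last 48-bit LBA instead of overflowing the field.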
diff --git a/fio.1 b/fio.1
index f62617e7..f0dc49ab 100644
--- a/fio.1
+++ b/fio.1
@@ -2247,6 +2247,44 @@ By default, the job will cycle through all available Placement IDs, so use this
 to isolate these identifiers to specific jobs. If you want fio to use placement
 identifier only at indices 0, 2 and 5 specify, you would set `fdp_pli=0,2,5`.
 .TP
+.BI (io_uring_cmd)md_per_io_size \fR=\fPint
+Size in bytes of the separate metadata buffer for each I/O. Default: 0.
+.TP
+.BI (io_uring_cmd)pi_act \fR=\fPint
+Action to take when the NVMe namespace is formatted with protection information.
+If this is set to 1 and the namespace is formatted with a metadata size equal to
+the protection information size, fio won't use a separate metadata buffer or an
+extended logical block. If this is set to 1 and the namespace is formatted with a
+metadata size greater than the protection information size, fio will not generate
+or verify the protection information portion of the metadata for writes or reads,
+respectively. If this is set to 0, fio generates protection information for
+writes and verifies it for reads. Default: 1.
+.TP
+.BI (io_uring_cmd)pi_chk \fR=\fPstr[,str][,str]
+Controls which protection information fields are checked. This can take one
+or more of the following values, separated by commas. Default: none.
+.RS
+.RS
+.TP
+.B GUARD
+Enables protection information checking of guard field.
+.TP
+.B REFTAG
+Enables protection information checking of logical block reference tag field.
+.TP
+.B APPTAG
+Enables protection information checking of application tag field.
+.RE
+.RE
+.TP
+.BI (io_uring_cmd)apptag \fR=\fPint
+Specifies the logical block application tag value, if the namespace is formatted
+to use end-to-end protection information. Default: 0x1234.
+.TP
+.BI (io_uring_cmd)apptag_mask \fR=\fPint
+Specifies the logical block application tag mask value, if the namespace is
+formatted to use end-to-end protection information. Default: 0xffff.
+.TP
 .BI (cpuio)cpuload \fR=\fPint
 Attempt to use the specified percentage of CPU cycles. This is a mandatory
 option when using cpuio I/O engine.
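These options end up in the NVMe command. parse_prchk_flags() turns pi_chk into PRCHK bits, and fio_ioring_cmd_nvme_pi() adds PRACT when pi_act is set. fio_nvme_pi_fill() then ORs the flags into CDW12 and packs the tag and mask into CDW15. The sketch below reproduces that mapping using the bit values from the patch's nvme_io_control_flags enum. pi_cdw12_flags() and pi_cdw15() are hypothetical helpers, and they assume the namespace is formatted with protection information; the patch only sets these bits when pi_type is nonzero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the nvme_io_control_flags enum in this patch. */
#define PRCHK_REF   (1U << 26)
#define PRCHK_APP   (1U << 27)
#define PRCHK_GUARD (1U << 28)
#define PRACT       (1U << 29)

/* Translate pi_chk / pi_act into CDW12 protection-information bits. */
static uint32_t pi_cdw12_flags(const char *pi_chk, int pi_act)
{
	uint32_t flags = 0;

	if (pi_chk) {
		/* Substring matching, like parse_prchk_flags() */
		if (strstr(pi_chk, "GUARD"))
			flags |= PRCHK_GUARD;
		if (strstr(pi_chk, "REFTAG"))
			flags |= PRCHK_REF;
		if (strstr(pi_chk, "APPTAG"))
			flags |= PRCHK_APP;
	}
	if (pi_act)
		flags |= PRACT;
	return flags;
}

/* CDW15 carries the application tag mask (upper 16 bits) and tag (lower 16). */
static uint32_t pi_cdw15(uint16_t apptag, uint16_t apptag_mask)
{
	return (uint32_t)apptag_mask << 16 | apptag;
}
```

For example, pi_chk=GUARD,REFTAG with pi_act=0 gives 0x14000000 in CDW12, and the defaults apptag=0x1234 and apptag_mask=0xffff give 0xffff1234 in CDW15.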
diff --git a/io_u.h b/io_u.h
index b432a540..786251d5 100644
--- a/io_u.h
+++ b/io_u.h
@@ -89,8 +89,8 @@ struct io_u {
 	union {
 		unsigned int index;
 		unsigned int seen;
-		void *engine_data;
 	};
+	void *engine_data;
 
 	union {
 		struct flist_head verify_list;
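A plausible reason for moving engine_data out of the union is that two values can no longer share storage. The io_uring_cmd engine now keeps a per-io_u struct nvme_pi_data in engine_data, while it still uses io_u->index to select the SQE and the metadata slot. The simplified stand-in structs below are hypothetical and contain only these members. They show the layout difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: engine_data aliases index (simplified stand-in, not the full io_u). */
struct io_u_before {
	union {
		unsigned int index;
		unsigned int seen;
		void *engine_data;
	};
};

/* After: engine_data has its own storage. */
struct io_u_after {
	union {
		unsigned int index;
		unsigned int seen;
	};
	void *engine_data;
};

/* Set index, then engine_data, and report whether index was preserved. */
static int index_survives(unsigned int idx, void *data)
{
	struct io_u_after u;

	memset(&u, 0, sizeof(u));
	u.index = idx;
	u.engine_data = data;	/* distinct member: cannot overwrite index */
	return u.index == idx;
}
```

With the old layout, storing a pointer in engine_data overwrites the bytes that index occupies, so a later lookup by io_u->index would read whatever the pointer left there.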
diff --git a/t/fiotestlib.py b/t/fiotestlib.py
index 1f35de0a..a96338a3 100755
--- a/t/fiotestlib.py
+++ b/t/fiotestlib.py
@@ -382,9 +382,10 @@ def run_fio_tests(test_list, test_env, args):
 
     for config in test_list:
         if (args.skip and config['test_id'] in args.skip) or \
-           (args.run_only and config['test_id'] not in args.run_only):
+           (args.run_only and config['test_id'] not in args.run_only) or \
+           ('force_skip' in config and config['force_skip']):
             skipped = skipped + 1
-            print(f"Test {config['test_id']} SKIPPED (User request)")
+            print(f"Test {config['test_id']} SKIPPED (User request or override)")
             continue
 
         if issubclass(config['test_class'], FioJobFileTest):
diff --git a/t/nvmept_pi.py b/t/nvmept_pi.py
new file mode 100755
index 00000000..5de77c9d
--- /dev/null
+++ b/t/nvmept_pi.py
@@ -0,0 +1,949 @@
+#!/usr/bin/env python3
+"""
+# nvmept_pi.py
+#
+# Test fio's io_uring_cmd ioengine support for DIF/DIX end-to-end data
+# protection.
+#
+# USAGE
+# see python3 nvmept_pi.py --help
+#
+# EXAMPLES (THIS IS A DESTRUCTIVE TEST!!)
+# python3 t/nvmept_pi.py --dut /dev/ng0n1 -f ./fio
+# python3 t/nvmept_pi.py --dut /dev/ng0n1 -f ./fio --lbaf 1
+#
+# REQUIREMENTS
+# Python 3.7
+#
+"""
+import os
+import sys
+import json
+import time
+import locale
+import logging
+import argparse
+import itertools
+import subprocess
+from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
+from fiotestcommon import SUCCESS_NONZERO
+
+NUMBER_IOS = 8192
+BS_LOW = 1
+BS_HIGH = 16
+
+class DifDixTest(FioJobCmdTest):
+    """
+    NVMe DIF/DIX test class.
+    """
+
+    def setup(self, parameters):
+        """Setup a test."""
+
+        fio_args = [
+            "--name=nvmept_pi",
+            "--ioengine=io_uring_cmd",
+            "--cmd_type=nvme",
+            f"--filename={self.fio_opts['filename']}",
+            f"--rw={self.fio_opts['rw']}",
+            f"--bsrange={self.fio_opts['bsrange']}",
+            f"--output={self.filenames['output']}",
+            f"--output-format={self.fio_opts['output-format']}",
+            f"--md_per_io_size={self.fio_opts['md_per_io_size']}",
+            f"--pi_act={self.fio_opts['pi_act']}",
+            f"--pi_chk={self.fio_opts['pi_chk']}",
+            f"--apptag={self.fio_opts['apptag']}",
+            f"--apptag_mask={self.fio_opts['apptag_mask']}",
+        ]
+        for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
+                    'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
+                    'time_based', 'runtime', 'verify', 'io_size', 'offset', 'number_ios']:
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
+
+        super().setup(fio_args)
+
+
+TEST_LIST = [
+#
+# Write data with pi_act=1 and then read the data back (with both
+# pi_act=[0,1]).
+#
+    {
+        # Write workload with variable IO sizes
+        # pi_act=1
+        "test_id": 101,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 1,
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with fixed small IO size
+        # pi_act=0
+        "test_id": 102,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_LOW,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with fixed small IO size
+        # pi_act=1
+        "test_id": 103,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_LOW,
+        "test_class": DifDixTest,
+    },
+    {
+        # Write workload with fixed large IO size
+        # Precondition for read workloads to follow
+        # pi_act=1
+        "test_id": 104,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 1,
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_HIGH,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        "test_id": 105,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        "test_id": 106,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+#
+# Write data with pi_act=0 and then read the data back (with both
+# pi_act=[0,1]).
+#
+    {
+        # Write workload with variable IO sizes
+        # pi_act=0
+        "test_id": 201,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with fixed small IO size
+        # pi_act=0
+        "test_id": 202,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_LOW,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with fixed small IO size
+        # pi_act=1
+        "test_id": 203,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_LOW,
+        "test_class": DifDixTest,
+    },
+    {
+        # Write workload with fixed large IO sizes
+        # pi_act=0
+        "test_id": 204,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 0,
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_HIGH,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        "test_id": 205,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        "test_id": 206,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x8888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+#
+# Test apptag errors.
+#
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # trigger an apptag error
+        "test_id": 301,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # trigger an apptag error
+        "test_id": 302,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # trigger an apptag error
+        # same as above but with pi_chk=APPTAG only
+        "test_id": 303,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # trigger an apptag error
+        # same as above but with pi_chk=APPTAG only
+        "test_id": 304,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the Guard PI and reftag, so there should be no error
+        "test_id": 305,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the Guard PI and reftag, so there should be no error
+        "test_id": 306,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the Guard PI, so there should be no error
+        "test_id": 307,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "GUARD",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the Guard PI, so there should be no error
+        "test_id": 308,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "GUARD",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the reftag, so there should be no error
+        # This case will be skipped when the device is formatted with Type 3 PI
+        # since Type 3 PI ignores the reftag
+        "test_id": 309,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "skip": "type3",
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # this case would trigger an apptag error, but pi_chk says to check
+        # only the reftag, so there should be no error
+        # This case will be skipped when the device is formatted with Type 3 PI
+        # since Type 3 PI ignores the reftag
+        "test_id": 310,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "skip": "type3",
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # use apptag mask to ignore apptag mismatch
+        "test_id": 311,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # use apptag mask to ignore apptag mismatch
+        "test_id": 312,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # use apptag mask to ignore apptag mismatch
+        "test_id": 313,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0xF888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # use apptag mask to ignore apptag mismatch
+        "test_id": 314,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0xF888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "test_class": DifDixTest,
+    },
+    {
+        # Write workload with fixed large IO sizes
+        # Set apptag=0xFFFF to disable all checking for Type 1 and 2
+        # pi_act=1
+        "test_id": 315,
+        "fio_opts": {
+            "rw": 'write',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "apptag": "0xFFFF",
+            "apptag_mask": "0xFFFF",
+            "pi_act": 1,
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_HIGH,
+        "bs_high": BS_HIGH,
+        "skip": "type3",
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # Data was written with apptag=0xFFFF
+        # Reading the data back should disable all checking for Type 1 and 2
+        "test_id": 316,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x0101",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "skip": "type3",
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=1
+        # Data was written with apptag=0xFFFF
+        # Reading the data back should disable all checking for Type 1 and 2
+        "test_id": 317,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 1,
+            "apptag": "0x0000",
+            "apptag_mask": "0xFFFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "skip": "type3",
+        "test_class": DifDixTest,
+    },
+#
+# Error cases related to block size and metadata size
+#
+    {
+        # Use a min block size that is not a multiple of lba/elba size to
+        # trigger an error.
+        "test_id": 401,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW+0.5,
+        "bs_high": BS_HIGH,
+        "success": SUCCESS_NONZERO,
+        "test_class": DifDixTest,
+    },
+    {
+        # Use metadata size that is too small
+        "test_id": 402,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "mdsize_adjustment": -1,
+        "success": SUCCESS_NONZERO,
+        "skip": "elba",
+        "test_class": DifDixTest,
+    },
+    {
+        # Read workload with variable IO sizes
+        # pi_act=0
+        # Should still work even if metadata size is too large
+        "test_id": 403,
+        "fio_opts": {
+            "rw": 'read',
+            "number_ios": NUMBER_IOS,
+            "output-format": "json",
+            "pi_act": 0,
+            "apptag": "0x8888",
+            "apptag_mask": "0x0FFF",
+            },
+        "pi_chk": "APPTAG,GUARD,REFTAG",
+        "bs_low": BS_LOW,
+        "bs_high": BS_HIGH,
+        "mdsize_adjustment": 1,
+        "test_class": DifDixTest,
+    },
+]
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-d', '--debug', help='Enable debug messages', action='store_true')
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut', help='target NVMe character device to test '
+                        '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
+    parser.add_argument('-l', '--lbaf', nargs='+', type=int,
+                        help='list of lba formats to test')
+    args = parser.parse_args()
+
+    return args
+
+
+def get_lbafs(args):
+    """
+    Determine which LBA formats to use. Use either the ones specified on the
+    command line or, if none are specified, query the device and use all LBA
+    formats with metadata.
+    """
+    lbaf_list = []
+    id_ns_cmd = f"sudo nvme id-ns --output-format=json {args.dut}".split(' ')
+    id_ns_output = subprocess.check_output(id_ns_cmd)
+    lbafs = json.loads(id_ns_output)['lbafs']
+    if args.lbaf:
+        for lbaf in args.lbaf:
+            lbaf_list.append({'lbaf': lbaf, 'ds': 2 ** lbafs[lbaf]['ds'],
+                              'ms': lbafs[lbaf]['ms'], })
+            if lbafs[lbaf]['ms'] == 0:
+                print(f'Error: lbaf {lbaf} has metadata size zero')
+                sys.exit(1)
+    else:
+        for lbaf_num, lbaf in enumerate(lbafs):
+            if lbaf['ms'] != 0:
+                lbaf_list.append({'lbaf': lbaf_num, 'ds': 2 ** lbaf['ds'],
+                                  'ms': lbaf['ms'], })
+
+    return lbaf_list
+
+
+def get_guard_pi(lbaf_list, args):
+    """
+    Find out how many bits of guard protection information are associated with
+    each lbaf to be used. If this is not available assume 16-bit guard pi.
+    Also record the bytes of protection information associated with the number
+    of guard PI bits.
+    """
+    nvm_id_ns_cmd = f"sudo nvme nvm-id-ns --output-format=json {args.dut}".split(' ')
+    try:
+        nvm_id_ns_output = subprocess.check_output(nvm_id_ns_cmd)
+    except subprocess.CalledProcessError:
+        print(f"Non-zero return code from {' '.join(nvm_id_ns_cmd)}; " \
+                "assuming all lbafs use 16b Guard Protection Information")
+        for lbaf in lbaf_list:
+            lbaf['guard_pi_bits'] = 16
+    else:
+        elbafs = json.loads(nvm_id_ns_output)['elbafs']
+        for elbaf_num, elbaf in enumerate(elbafs):
+            for lbaf in lbaf_list:
+                if lbaf['lbaf'] == elbaf_num:
+                    lbaf['guard_pi_bits'] = 16 << elbaf['pif']
+
+    # For 16b Guard Protection Information, the PI requires 8 bytes
+    # For 32b and 64b Guard PI, the PI requires 16 bytes
+    for lbaf in lbaf_list:
+        if lbaf['guard_pi_bits'] == 16:
+            lbaf['pi_bytes'] = 8
+        else:
+            lbaf['pi_bytes'] = 16
+
+
+def get_capabilities(args):
+    """
+    Determine what end-to-end data protection features the device supports.
+    """
+    caps = { 'pil': [], 'pitype': [], 'elba': [] }
+    id_ns_cmd = f"sudo nvme id-ns --output-format=json {args.dut}".split(' ')
+    id_ns_output = subprocess.check_output(id_ns_cmd)
+    id_ns_json = json.loads(id_ns_output)
+
+    mc = id_ns_json['mc']
+    if mc & 1:
+        caps['elba'].append(1)
+    if mc & 2:
+        caps['elba'].append(0)
+
+    dpc = id_ns_json['dpc']
+    if dpc & 1:
+        caps['pitype'].append(1)
+    if dpc & 2:
+        caps['pitype'].append(2)
+    if dpc & 4:
+        caps['pitype'].append(3)
+    if dpc & 8:
+        caps['pil'].append(1)
+    if dpc & 16:
+        caps['pil'].append(0)
+
+    for _, value in caps.items():
+        if len(value) == 0:
+            logging.error("One or more end-to-end data protection features unsupported: %s", caps)
+            sys.exit(-1)
+
+    return caps
+
+
+def format_device(args, lbaf, pitype, pil, elba):
+    """
+    Format device using specified lba format with specified pitype, pil, and
+    elba values.
+    """
+
+    format_cmd = f"sudo nvme format {args.dut} --lbaf={lbaf['lbaf']} " \
+                 f"--pi={pitype} --pil={pil} --ms={elba} --force"
+    logging.debug("Format command: %s", format_cmd)
+    format_cmd = format_cmd.split(' ')
+    format_cmd_result = subprocess.run(format_cmd, capture_output=True, check=False,
+                                       encoding=locale.getpreferredencoding())
+
+    # Sometimes nvme-cli may format the device successfully but fail to
+    # rescan the namespaces after the format. Continue if this happens but
+    # abort if some other error occurs.
+    if format_cmd_result.returncode != 0:
+        if 'failed to rescan namespaces' not in format_cmd_result.stderr \
+                or 'Success formatting namespace' not in format_cmd_result.stdout:
+            logging.error(format_cmd_result.stdout)
+            logging.error(format_cmd_result.stderr)
+            print("Unable to format device; skipping this configuration")
+            return False
+
+    logging.debug(format_cmd_result.stdout)
+    return True
+
+
+def difdix_test(test_env, args, lbaf, pitype, elba):
+    """
+    Adjust test arguments based on values of lbaf, pitype, and elba.  Then run
+    the tests.
+    """
+    for test in TEST_LIST:
+        test['force_skip'] = False
+
+        blocksize = lbaf['ds']
+        # Set fio blocksize parameter at runtime
+        # If we formatted the device in extended LBA mode (e.g., 520-byte
+        # sectors), we usually need to add the lba data size and metadata size
+        # together for fio's bs parameter. However, if pi_act == 1 and the
+        # device is formatted so that the metadata is the same size as the PI,
+        # then the device will take care of everything and the application
+        # should just use regular power of 2 lba data size even when the device
+        # is in extended lba mode.
+        if elba:
+            if not test['fio_opts']['pi_act'] or lbaf['ms'] != lbaf['pi_bytes']:
+                blocksize += lbaf['ms']
+            test['fio_opts']['md_per_io_size'] = 0
+        else:
+        # If we are using a separate buffer for metadata, fio doesn't need to
+        # do anything when pi_act==1 and protection information size is equal to
+        # metadata size since the device is taking care of it all. If either of
+        # the two conditions does not hold, then we do need to allocate a
+        # separate metadata buffer.
+            if test['fio_opts']['pi_act'] and lbaf['ms'] == lbaf['pi_bytes']:
+                test['fio_opts']['md_per_io_size'] = 0
+            else:
+                test['fio_opts']['md_per_io_size'] = lbaf['ms'] * test['bs_high']
+
+        test['fio_opts']['bsrange'] = f"{blocksize * test['bs_low']}-{blocksize * test['bs_high']}"
+        if 'mdsize_adjustment' in test:
+            test['fio_opts']['md_per_io_size'] += test['mdsize_adjustment']
+
+        # Set fio pi_chk parameter at runtime. If the device is formatted
+        # with Type 3 protection information, this means that the reference
+        # tag is not checked and I/O commands may throw an error if they
+        # are submitted with the REFTAG bit set in pi_chk. Make sure fio
+        # does not set pi_chk's REFTAG bit if the device is formatted with
+        # Type 3 PI.
+        if 'pi_chk' in test:
+            if pitype == 3 and 'REFTAG' in test['pi_chk']:
+                test['fio_opts']['pi_chk'] = test['pi_chk'].replace('REFTAG','')
+                logging.debug("Type 3 PI: dropping REFTAG bit")
+            else:
+                test['fio_opts']['pi_chk'] = test['pi_chk']
+
+        if 'skip' in test:
+            if pitype == 3 and 'type3' in test['skip']:
+                test['force_skip'] = True
+                logging.debug("Type 3 PI: skipping test case")
+            if elba and 'elba' in test['skip']:
+                test['force_skip'] = True
+                logging.debug("extended lba format: skipping test case")
+
+        logging.debug("Test %d: pi_act=%d, bsrange=%s, md_per_io_size=%d", test['test_id'],
+                      test['fio_opts']['pi_act'], test['fio_opts']['bsrange'],
+                      test['fio_opts']['md_per_io_size'])
+
+    return run_fio_tests(TEST_LIST, test_env, args)
+
+
+def main():
+    """
+    Run tests using fio's io_uring_cmd ioengine to exercise end-to-end data
+    protection capabilities.
+    """
+
+    args = parse_args()
+
+    if args.debug:
+        logging.basicConfig(level=logging.DEBUG)
+    else:
+        logging.basicConfig(level=logging.INFO)
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"nvmept_pi-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio_path = str(Path(args.fio).absolute())
+    else:
+        fio_path = 'fio'
+    print(f"fio path is {fio_path}")
+
+    lbaf_list = get_lbafs(args)
+    get_guard_pi(lbaf_list, args)
+    caps = get_capabilities(args)
+    print("Device capabilities:", caps)
+
+    for test in TEST_LIST:
+        test['fio_opts']['filename'] = args.dut
+
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'nvmept_pi',
+              }
+
+    total = { 'passed':  0, 'failed': 0, 'skipped': 0 }
+
+    try:
+        for lbaf, pil, pitype, elba in itertools.product(lbaf_list, caps['pil'], caps['pitype'],
+                                                         caps['elba']):
+            print(f"\nlbaf: {lbaf}, pil: {pil}, pitype: {pitype}, elba: {elba}")
+
+            if not format_device(args, lbaf, pitype, pil, elba):
+                continue
+
+            test_env['artifact_root'] = \
+                os.path.join(artifact_root, f"lbaf{lbaf['lbaf']}pil{pil}pitype{pitype}" \
+                    f"elba{elba}")
+            os.mkdir(test_env['artifact_root'])
+
+            passed, failed, skipped = difdix_test(test_env, args, lbaf, pitype, elba)
+
+            total['passed'] += passed
+            total['failed'] += failed
+            total['skipped'] += skipped
+    except KeyboardInterrupt:
+        pass
+
+    print(f"\n\n{total['passed']} test(s) passed, {total['failed']} failed, " \
+            f"{total['skipped']} skipped")
+    sys.exit(total['failed'])
+
+
+if __name__ == '__main__':
+    main()
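The block size and metadata buffer rules in difdix_test() can be restated as a pure function, which makes the four elba/pi_act combinations easy to compare. This is a sketch: io_geometry is a hypothetical name. The usage lines assume a format with 512-byte data, 8 bytes of metadata and 16-bit Guard PI (8 PI bytes); that format is only an example.

```python
def io_geometry(ds, ms, pi_bytes, pi_act, elba, bs_low, bs_high):
    """Return (bsrange, md_per_io_size) following difdix_test()'s rules.

    ds: LBA data size in bytes, ms: metadata bytes per LBA,
    pi_bytes: bytes of protection information within that metadata.
    """
    # The device inserts/strips the PI by itself only when pi_act is set
    # and the metadata consists of nothing but the PI.
    device_handles_pi = bool(pi_act) and ms == pi_bytes
    blocksize = ds
    if elba:
        # Extended LBA: metadata is interleaved with the data, so fio's
        # block size includes it unless the device handles the PI itself.
        if not device_handles_pi:
            blocksize += ms
        md_per_io_size = 0
    else:
        # Separate metadata buffer, sized for the largest I/O in bsrange.
        md_per_io_size = 0 if device_handles_pi else ms * bs_high
    return f"{blocksize * bs_low}-{blocksize * bs_high}", md_per_io_size


print(io_geometry(512, 8, 8, pi_act=1, elba=True, bs_low=1, bs_high=16))   # ('512-8192', 0)
print(io_geometry(512, 8, 8, pi_act=0, elba=True, bs_low=1, bs_high=16))   # ('520-8320', 0)
print(io_geometry(512, 8, 8, pi_act=0, elba=False, bs_low=1, bs_high=16))  # ('512-8192', 128)
```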

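get_capabilities() and get_guard_pi() in t/nvmept_pi.py decode a few Identify Namespace fields. The sketch below repeats that decoding on plain integers so it can be tried without a device. It assumes the NVMe bit assignments the script uses: MC bit 0 for extended LBA and bit 1 for a separate metadata buffer; DPC bits 0 to 2 for PI Types 1 to 3, and bits 3 and 4 for PI in the first or last bytes of metadata; and an ELBAF pif value of 0, 1 or 2 for 16-, 32- or 64-bit Guard PI. decode_caps and guard_pi are hypothetical names.

```python
def decode_caps(mc, dpc):
    """Decode Identify Namespace MC/DPC bits the way get_capabilities() does."""
    caps = {'pil': [], 'pitype': [], 'elba': []}
    if mc & 0x1:                 # metadata transferred in an extended LBA
        caps['elba'].append(1)
    if mc & 0x2:                 # metadata transferred in a separate buffer
        caps['elba'].append(0)
    for bit, pitype in ((0x1, 1), (0x2, 2), (0x4, 3)):
        if dpc & bit:            # PI Type 1, 2 or 3 supported
            caps['pitype'].append(pitype)
    if dpc & 0x8:                # PI may sit in the first bytes of metadata
        caps['pil'].append(1)
    if dpc & 0x10:               # PI may sit in the last bytes of metadata
        caps['pil'].append(0)
    return caps


def guard_pi(pif):
    """Map an ELBAF pif value to (guard bits, PI bytes), as get_guard_pi() does."""
    bits = 16 << pif             # 0 -> 16-bit, 1 -> 32-bit, 2 -> 64-bit Guard
    return bits, 8 if bits == 16 else 16


print(decode_caps(mc=0x3, dpc=0x1F))
# {'pil': [1, 0], 'pitype': [1, 2, 3], 'elba': [1, 0]}
print(guard_pi(0), guard_pi(2))  # (16, 8) (64, 16)
```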

* Recent changes (master)
@ 2023-08-04 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-04 12:00 UTC
  To: fio

The following changes since commit 7b57011427a8204bd63671b08dde56cd9e879d68:

  t/fiotestlib: make recorded command prettier (2023-08-02 12:58:16 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 62f35562722f0c903567096d0f10a836d1ae2f60:

  eta: calculate aggregate bw statistics even when eta is disabled (2023-08-03 11:49:08 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      eta: calculate aggregate bw statistics even when eta is disabled

 eta.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/eta.c b/eta.c
index af4027e0..cc342461 100644
--- a/eta.c
+++ b/eta.c
@@ -375,6 +375,22 @@ bool eta_time_within_slack(unsigned int time)
 	return time > ((eta_interval_msec * 95) / 100);
 }
 
+/*
+ * These are the conditions under which we might be able to skip the eta
+ * calculation.
+ */
+static bool skip_eta(void)
+{
+	if (!(output_format & FIO_OUTPUT_NORMAL) && f_out == stdout)
+		return true;
+	if (temp_stall_ts || eta_print == FIO_ETA_NEVER)
+		return true;
+	if (!isatty(STDOUT_FILENO) && eta_print != FIO_ETA_ALWAYS)
+		return true;
+
+	return false;
+}
+
 /*
  * Print status of the jobs we know about. This includes rate estimates,
  * ETA, thread state, etc.
@@ -393,14 +409,12 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	static unsigned long long disp_io_iops[DDIR_RWDIR_CNT];
 	static struct timespec rate_prev_time, disp_prev_time;
 
-	if (!force) {
-		if (!(output_format & FIO_OUTPUT_NORMAL) &&
-		    f_out == stdout)
-			return false;
-		if (temp_stall_ts || eta_print == FIO_ETA_NEVER)
-			return false;
+	bool ret = true;
 
-		if (!isatty(STDOUT_FILENO) && (eta_print != FIO_ETA_ALWAYS))
+	if (!force && skip_eta()) {
+		if (write_bw_log)
+			ret = false;
+		else
 			return false;
 	}
 
@@ -534,7 +548,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	je->nr_threads = thread_number;
 	update_condensed_str(__run_str, run_str);
 	memcpy(je->run_str, run_str, strlen(run_str));
-	return true;
+	return ret;
 }
 
 static int gen_eta_str(struct jobs_eta *je, char *p, size_t left,

* Recent changes (master)
@ 2023-08-03 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-03 12:00 UTC
  To: fio

The following changes since commit 1660df6601e24a17dda9e12cbc901337fd5fd925:

  Merge branch 'master' of https://github.com/min22/fio (2023-07-31 15:03:37 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7b57011427a8204bd63671b08dde56cd9e879d68:

  t/fiotestlib: make recorded command prettier (2023-08-02 12:58:16 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      t/nvmept: fix typo
      t/fiotestlib: make recorded command prettier

 t/fiotestlib.py | 2 +-
 t/nvmept.py     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/fiotestlib.py b/t/fiotestlib.py
index 0fe17b74..1f35de0a 100755
--- a/t/fiotestlib.py
+++ b/t/fiotestlib.py
@@ -75,7 +75,7 @@ class FioExeTest(FioTest):
         command = [self.paths['exe']] + self.parameters
         with open(self.filenames['cmd'], "w+",
                   encoding=locale.getpreferredencoding()) as command_file:
-            command_file.write(" ".join(command))
+            command_file.write(" \\\n ".join(command))
 
         try:
             with open(self.filenames['stdout'], "w+",
diff --git a/t/nvmept.py b/t/nvmept.py
index cc26d152..c08fb350 100755
--- a/t/nvmept.py
+++ b/t/nvmept.py
@@ -295,7 +295,7 @@ def main():
               'fio_path': fio_path,
               'fio_root': str(Path(__file__).absolute().parent.parent),
               'artifact_root': artifact_root,
-              'basename': 'readonly',
+              'basename': 'nvmept',
               }
 
     _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)

* Recent changes (master)
@ 2023-08-01 12:00 Jens Axboe
From: Jens Axboe @ 2023-08-01 12:00 UTC
  To: fio

The following changes since commit 824912be19542f94264e485a25d37b55a9f68f0e:

  Revert "correctly free thread_data options at the topmost parent process" (2023-07-28 11:32:22 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1660df6601e24a17dda9e12cbc901337fd5fd925:

  Merge branch 'master' of https://github.com/min22/fio (2023-07-31 15:03:37 -0600)

----------------------------------------------------------------
Denis Pronin (1):
      use 'const' where it is required

Jens Axboe (2):
      Merge branch 'improment/constness' of https://github.com/dpronin/fio
      Merge branch 'master' of https://github.com/min22/fio

Kookoo Gu (1):
      iolog.c: fix inaccurate clat when replay trace

 client.c | 10 +++++-----
 client.h |  8 ++++----
 iolog.c  | 14 +++++++-------
 3 files changed, 16 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 7cd2ba66..c257036b 100644
--- a/client.c
+++ b/client.c
@@ -34,7 +34,7 @@ static void handle_start(struct fio_client *client, struct fio_net_cmd *cmd);
 static void convert_text(struct fio_net_cmd *cmd);
 static void client_display_thread_status(struct jobs_eta *je);
 
-struct client_ops fio_client_ops = {
+struct client_ops const fio_client_ops = {
 	.text		= handle_text,
 	.disk_util	= handle_du,
 	.thread_status	= handle_ts,
@@ -446,7 +446,7 @@ int fio_client_add_ini_file(void *cookie, const char *ini_file, bool remote)
 	return 0;
 }
 
-int fio_client_add(struct client_ops *ops, const char *hostname, void **cookie)
+int fio_client_add(struct client_ops const *ops, const char *hostname, void **cookie)
 {
 	struct fio_client *existing = *cookie;
 	struct fio_client *client;
@@ -1772,7 +1772,7 @@ fail:
 
 int fio_handle_client(struct fio_client *client)
 {
-	struct client_ops *ops = client->ops;
+	struct client_ops const *ops = client->ops;
 	struct fio_net_cmd *cmd;
 
 	dprint(FD_NET, "client: handle %s\n", client->hostname);
@@ -1957,7 +1957,7 @@ int fio_clients_send_trigger(const char *cmd)
 	return 0;
 }
 
-static void request_client_etas(struct client_ops *ops)
+static void request_client_etas(struct client_ops const *ops)
 {
 	struct fio_client *client;
 	struct flist_head *entry;
@@ -2089,7 +2089,7 @@ static int fio_check_clients_timed_out(void)
 	return ret;
 }
 
-int fio_handle_clients(struct client_ops *ops)
+int fio_handle_clients(struct client_ops const *ops)
 {
 	struct pollfd *pfds;
 	int i, ret = 0, retval = 0;
diff --git a/client.h b/client.h
index 8033325e..d77b6076 100644
--- a/client.h
+++ b/client.h
@@ -69,7 +69,7 @@ struct fio_client {
 	uint16_t argc;
 	char **argv;
 
-	struct client_ops *ops;
+	struct client_ops const *ops;
 	void *client_data;
 
 	struct client_file *files;
@@ -84,7 +84,7 @@ typedef void (client_eta_op)(struct jobs_eta *je);
 typedef void (client_timed_out_op)(struct fio_client *);
 typedef void (client_jobs_eta_op)(struct fio_client *client, struct jobs_eta *je);
 
-extern struct client_ops fio_client_ops;
+extern struct client_ops const fio_client_ops;
 
 struct client_ops {
 	client_cmd_op		*text;
@@ -128,8 +128,8 @@ extern int fio_start_client(struct fio_client *);
 extern int fio_start_all_clients(void);
 extern int fio_clients_send_ini(const char *);
 extern int fio_client_send_ini(struct fio_client *, const char *, bool);
-extern int fio_handle_clients(struct client_ops *);
-extern int fio_client_add(struct client_ops *, const char *, void **);
+extern int fio_handle_clients(struct client_ops const*);
+extern int fio_client_add(struct client_ops const*, const char *, void **);
 extern struct fio_client *fio_client_add_explicit(struct client_ops *, const char *, int, int);
 extern void fio_client_add_cmd_option(void *, const char *);
 extern int fio_client_add_ini_file(void *, const char *, bool);
diff --git a/iolog.c b/iolog.c
index cc2cbc65..97ba4396 100644
--- a/iolog.c
+++ b/iolog.c
@@ -82,8 +82,8 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 {
 	uint64_t usec = utime_since_now(&td->last_issue);
 	unsigned long orig_delay = delay;
-	uint64_t this_delay;
 	struct timespec ts;
+	int ret = 0;
 
 	if (delay < td->time_offset) {
 		td->time_offset = 0;
@@ -97,13 +97,13 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 	delay -= usec;
 
 	fio_gettime(&ts, NULL);
-	while (delay && !td->terminate) {
-		this_delay = delay;
-		if (this_delay > 500000)
-			this_delay = 500000;
 
-		usec_sleep(td, this_delay);
-		delay -= this_delay;
+	while (delay && !td->terminate) {
+		ret = io_u_queued_complete(td, 0);
+		if (ret < 0)
+			td_verror(td, -ret, "io_u_queued_complete");
+		if (utime_since_now(&ts) > delay)
+			break;
 	}
 
 	usec = utime_since_now(&ts);

* Recent changes (master)
@ 2023-07-29 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-29 12:00 UTC
  To: fio

The following changes since commit 45eb1cf5ce883ae3b170f102db38204616c8e4b1:

  Merge branch 'helper_thread-fix-missing-stdbool-header' of https://github.com/dpronin/fio (2023-07-27 13:48:26 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 824912be19542f94264e485a25d37b55a9f68f0e:

  Revert "correctly free thread_data options at the topmost parent process" (2023-07-28 11:32:22 -0600)

----------------------------------------------------------------
Denis Pronin (3):
      fix missing headers in multiple files
      correctly free thread_data options at the topmost parent process
      io_uring engine: 'atomic_load_relaxed' instead of 'atomic_load_acquire'

Jens Axboe (4):
      Merge branch 'io_uring' of https://github.com/dpronin/fio
      Merge branch 'master' of https://github.com/dpronin/fio
      Merge branch 'td-eo-double-free-fix' of https://github.com/dpronin/fio
      Revert "correctly free thread_data options at the topmost parent process"

 cairo_text_helpers.c | 2 ++
 cairo_text_helpers.h | 2 ++
 engines/io_uring.c   | 4 ++--
 goptions.h           | 2 ++
 log.c                | 2 ++
 5 files changed, 10 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/cairo_text_helpers.c b/cairo_text_helpers.c
index 19fb8e03..5bdd6021 100644
--- a/cairo_text_helpers.c
+++ b/cairo_text_helpers.c
@@ -1,3 +1,5 @@
+#include "cairo_text_helpers.h"
+
 #include <cairo.h>
 #include <gtk/gtk.h>
 #include <math.h>
diff --git a/cairo_text_helpers.h b/cairo_text_helpers.h
index 014001ad..d0f52d51 100644
--- a/cairo_text_helpers.h
+++ b/cairo_text_helpers.h
@@ -1,6 +1,8 @@
 #ifndef CAIRO_TEXT_HELPERS_H
 #define CAIRO_TEXT_HELPERS_H
 
+#include <cairo.h>
+
 void draw_centered_text(cairo_t *cr, const char *font, double x, double y,
 			       double fontsize, const char *text);
 
diff --git a/engines/io_uring.c b/engines/io_uring.c
index e1abf688..b361e6a5 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -509,7 +509,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 
 	tail = *ring->tail;
 	next_tail = tail + 1;
-	if (next_tail == atomic_load_acquire(ring->head))
+	if (next_tail == atomic_load_relaxed(ring->head))
 		return FIO_Q_BUSY;
 
 	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
@@ -569,7 +569,7 @@ static int fio_ioring_commit(struct thread_data *td)
 		unsigned start = *ld->sq_ring.tail - ld->queued;
 		unsigned flags;
 
-		flags = atomic_load_acquire(ring->flags);
+		flags = atomic_load_relaxed(ring->flags);
 		if (flags & IORING_SQ_NEED_WAKEUP)
 			io_uring_enter(ld, ld->queued, 0,
 					IORING_ENTER_SQ_WAKEUP);
diff --git a/goptions.h b/goptions.h
index a225a8d1..03617509 100644
--- a/goptions.h
+++ b/goptions.h
@@ -1,6 +1,8 @@
 #ifndef GFIO_OPTIONS_H
 #define GFIO_OPTIONS_H
 
+#include <gtk/gtk.h>
+
 void gopt_get_options_window(GtkWidget *window, struct gfio_client *gc);
 void gopt_init(void);
 void gopt_exit(void);
diff --git a/log.c b/log.c
index 237bac28..df58ea07 100644
--- a/log.c
+++ b/log.c
@@ -1,3 +1,5 @@
+#include "log.h"
+
 #include <unistd.h>
 #include <string.h>
 #include <stdarg.h>

* Recent changes (master)
@ 2023-07-28 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-28 12:00 UTC
  To: fio

The following changes since commit 0b47b2cf3dab1d26d72f52ed8c19f782a8277d3a:

  Merge branch 'prio-hints' (2023-07-21 15:23:40 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 45eb1cf5ce883ae3b170f102db38204616c8e4b1:

  Merge branch 'helper_thread-fix-missing-stdbool-header' of https://github.com/dpronin/fio (2023-07-27 13:48:26 -0600)

----------------------------------------------------------------
Denis Pronin (3):
      diskutil.h: fix missing headers wanted by the header
      helper_thread.h: include missing stdbool.h because 'bool' type is used
      helper_thread.h: forwardly declare structures fio_sem and sk_out

Jens Axboe (2):
      Merge branch 'diskutil-fix-missing-headers' of https://github.com/dpronin/fio
      Merge branch 'helper_thread-fix-missing-stdbool-header' of https://github.com/dpronin/fio

 diskutil.h      | 3 +++
 helper_thread.h | 5 +++++
 2 files changed, 8 insertions(+)

---

Diff of recent changes:

diff --git a/diskutil.h b/diskutil.h
index 9dca42c4..9b283799 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -2,10 +2,13 @@
 #define FIO_DISKUTIL_H
 #define FIO_DU_NAME_SZ		64
 
+#include <stdint.h>
 #include <limits.h>
 
 #include "helper_thread.h"
 #include "fio_sem.h"
+#include "flist.h"
+#include "lib/ieee754.h"
 
 /**
  * @ios: Number of I/O operations that have been completed successfully.
diff --git a/helper_thread.h b/helper_thread.h
index d7df6c4d..1c8167e8 100644
--- a/helper_thread.h
+++ b/helper_thread.h
@@ -1,6 +1,11 @@
 #ifndef FIO_HELPER_THREAD_H
 #define FIO_HELPER_THREAD_H
 
+#include <stdbool.h>
+
+struct fio_sem;
+struct sk_out;
+
 extern void helper_reset(void);
 extern void helper_do_stat(void);
 extern bool helper_should_exit(void);

* Recent changes (master)
@ 2023-07-22 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-22 12:00 UTC
  To: fio

The following changes since commit caf7ac7ef000097765b1c56404adb5e68b227977:

  t/zbd: add max_active configs to run-tests-against-nullb (2023-07-20 09:52:37 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0b47b2cf3dab1d26d72f52ed8c19f782a8277d3a:

  Merge branch 'prio-hints' (2023-07-21 15:23:40 -0600)

----------------------------------------------------------------
Damien Le Moal (6):
      os-linux: Cleanup IO priority class and value macros
      cmdprio: Introduce generic option definitions
      os-linux: add initial support for IO priority hints
      options: add priohint option
      cmdprio: Add support for per I/O priority hint
      stats: Add hint information to per priority level stats

Jens Axboe (1):
      Merge branch 'prio-hints'

Shin'ichiro Kawasaki (1):
      backend: clear IO_U_F_FLIGHT flag in zero byte read path

 HOWTO.rst          |  37 +++++++++++++++++--
 backend.c          |  11 ++++--
 cconv.c            |   2 +
 engines/cmdprio.c  |   9 +++--
 engines/cmdprio.h  | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 engines/io_uring.c |  86 ++-----------------------------------------
 engines/libaio.c   |  82 +----------------------------------------
 fio.1              |  33 +++++++++++++++--
 options.c          |  31 ++++++++++++++--
 os/os-dragonfly.h  |   4 +-
 os/os-linux.h      |  27 ++++++++++----
 os/os.h            |   7 +++-
 server.h           |   2 +-
 stat.c             |  10 +++--
 thread_options.h   |   3 +-
 15 files changed, 252 insertions(+), 198 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 7fe70fbd..ac8314f3 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2287,6 +2287,16 @@ with the caveat that when used on the command line, they must come after the
 	reads and writes. See :manpage:`ionice(1)`. See also the
 	:option:`prioclass` option.
 
+.. option:: cmdprio_hint=int[,int] : [io_uring] [libaio]
+
+	Set the I/O priority hint to use for I/Os that must be issued with
+	a priority when :option:`cmdprio_percentage` or
+	:option:`cmdprio_bssplit` is set. If not specified when
+	:option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+	this defaults to 0 (no hint). A single value applies to reads and
+	writes. Comma-separated values may be specified for reads and writes.
+	See also the :option:`priohint` option.
+
 .. option:: cmdprio=int[,int] : [io_uring] [libaio]
 
 	Set the I/O priority value to use for I/Os that must be issued with
@@ -2313,9 +2323,9 @@ with the caveat that when used on the command line, they must come after the
 
 		cmdprio_bssplit=blocksize/percentage:blocksize/percentage
 
-	In this case, each entry will use the priority class and priority
-	level defined by the options :option:`cmdprio_class` and
-	:option:`cmdprio` respectively.
+	In this case, each entry will use the priority class, priority hint
+	and priority level defined by the options :option:`cmdprio_class`,
+	:option:`cmdprio_hint` and :option:`cmdprio` respectively.
 
 	The second accepted format for this option is:
 
@@ -2326,7 +2336,14 @@ with the caveat that when used on the command line, they must come after the
 	accepted format does not restrict all entries to have the same priority
 	class and priority level.
 
-	For both formats, only the read and write data directions are supported,
+	The third accepted format for this option is:
+
+		cmdprio_bssplit=blocksize/percentage/class/level/hint:...
+
+	This is an extension of the second accepted format that also allows
+	specifying a priority hint.
+
+	For all formats, only the read and write data directions are supported,
 	values for trim IOs are ignored. This option is mutually exclusive with
 	the :option:`cmdprio_percentage` option.
 
@@ -3436,6 +3453,18 @@ Threads, processes and job synchronization
 	priority setting, see I/O engine specific :option:`cmdprio_percentage`
 	and :option:`cmdprio_class` options.
 
+.. option:: priohint=int
+
+	Set the I/O priority hint. This is only applicable to platforms that
+	support I/O priority classes and to devices with features controlled
+	through priority hints, e.g. block devices supporting command duration
+	limits, or CDL. CDL is a way to indicate the desired maximum latency
+	of I/Os so that the device can optimize its internal command scheduling
+	according to the latency limits indicated by the user.
+
+	For per-I/O priority hint setting, see the I/O engine specific
+	:option:`cmdprio_hint` option.
+
 .. option:: cpus_allowed=str
 
 	Controls the same options as :option:`cpumask`, but accepts a textual
diff --git a/backend.c b/backend.c
index b06a11a5..5f074039 100644
--- a/backend.c
+++ b/backend.c
@@ -466,7 +466,7 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 				if (!from_verify)
 					unlog_io_piece(td, io_u);
 				td_verror(td, EIO, "full resid");
-				put_io_u(td, io_u);
+				clear_io_u(td, io_u);
 				break;
 			}
 
@@ -1799,13 +1799,16 @@ static void *thread_main(void *data)
 
 	/* ioprio_set() has to be done before td_io_init() */
 	if (fio_option_is_set(o, ioprio) ||
-	    fio_option_is_set(o, ioprio_class)) {
-		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class, o->ioprio);
+	    fio_option_is_set(o, ioprio_class) ||
+	    fio_option_is_set(o, ioprio_hint)) {
+		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class,
+				 o->ioprio, o->ioprio_hint);
 		if (ret == -1) {
 			td_verror(td, errno, "ioprio_set");
 			goto err;
 		}
-		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
+		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio,
+					  o->ioprio_hint);
 		td->ts.ioprio = td->ioprio;
 	}
 
diff --git a/cconv.c b/cconv.c
index 1bfa770f..ce6acbe6 100644
--- a/cconv.c
+++ b/cconv.c
@@ -281,6 +281,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->nice = le32_to_cpu(top->nice);
 	o->ioprio = le32_to_cpu(top->ioprio);
 	o->ioprio_class = le32_to_cpu(top->ioprio_class);
+	o->ioprio_hint = le32_to_cpu(top->ioprio_hint);
 	o->file_service_type = le32_to_cpu(top->file_service_type);
 	o->group_reporting = le32_to_cpu(top->group_reporting);
 	o->stats = le32_to_cpu(top->stats);
@@ -496,6 +497,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->nice = cpu_to_le32(o->nice);
 	top->ioprio = cpu_to_le32(o->ioprio);
 	top->ioprio_class = cpu_to_le32(o->ioprio_class);
+	top->ioprio_hint = cpu_to_le32(o->ioprio_hint);
 	top->file_service_type = cpu_to_le32(o->file_service_type);
 	top->group_reporting = cpu_to_le32(o->group_reporting);
 	top->stats = cpu_to_le32(o->stats);
diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index 979a81b6..153e3691 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -267,7 +267,8 @@ static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u,
  * to be set. If the random percentage value is within the user specified
  * percentage of I/Os that should use a cmdprio priority value (rather than
  * the default priority), then this function updates the io_u with an ioprio
- * value as defined by the cmdprio/cmdprio_class or cmdprio_bssplit options.
+ * value as defined by the cmdprio/cmdprio_hint/cmdprio_class or
+ * cmdprio_bssplit options.
  *
  * Return true if the io_u ioprio was changed and false otherwise.
  */
@@ -342,7 +343,8 @@ static int fio_cmdprio_gen_perc(struct thread_data *td, struct cmdprio *cmdprio)
 		prio = &cmdprio->perc_entry[ddir];
 		prio->perc = options->percentage[ddir];
 		prio->prio = ioprio_value(options->class[ddir],
-					  options->level[ddir]);
+					  options->level[ddir],
+					  options->hint[ddir]);
 		assign_clat_prio_index(prio, &values[ddir]);
 
 		ret = init_ts_clat_prio(ts, ddir, &values[ddir]);
@@ -400,7 +402,8 @@ static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
 			goto err;
 
 		implicit_cmdprio = ioprio_value(options->class[ddir],
-						options->level[ddir]);
+						options->level[ddir],
+						options->hint[ddir]);
 
 		ret = fio_cmdprio_generate_bsprio_desc(&cmdprio->bsprio_desc[ddir],
 						       &parse_res[ddir],
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 755da8d0..81e6c390 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -7,6 +7,7 @@
 #define FIO_CMDPRIO_H
 
 #include "../fio.h"
+#include "../optgroup.h"
 
 /* read and writes only, no trim */
 #define CMDPRIO_RWDIR_CNT 2
@@ -39,9 +40,114 @@ struct cmdprio_options {
 	unsigned int percentage[CMDPRIO_RWDIR_CNT];
 	unsigned int class[CMDPRIO_RWDIR_CNT];
 	unsigned int level[CMDPRIO_RWDIR_CNT];
+	unsigned int hint[CMDPRIO_RWDIR_CNT];
 	char *bssplit_str;
 };
 
+#ifdef FIO_HAVE_IOPRIO_CLASS
+#define CMDPRIO_OPTIONS(opt_struct, opt_group)					\
+	{									\
+		.name	= "cmdprio_percentage",					\
+		.lname	= "high priority percentage",				\
+		.type	= FIO_OPT_INT,						\
+		.off1	= offsetof(opt_struct,					\
+				   cmdprio_options.percentage[DDIR_READ]),	\
+		.off2	= offsetof(opt_struct,					\
+				   cmdprio_options.percentage[DDIR_WRITE]),	\
+		.minval	= 0,							\
+		.maxval	= 100,							\
+		.help	= "Send high priority I/O this percentage of the time",	\
+		.category = FIO_OPT_C_ENGINE,					\
+		.group	= opt_group,						\
+	},									\
+	{									\
+		.name	= "cmdprio_class",					\
+		.lname	= "Asynchronous I/O priority class",			\
+		.type	= FIO_OPT_INT,						\
+		.off1	= offsetof(opt_struct,					\
+				   cmdprio_options.class[DDIR_READ]),		\
+		.off2	= offsetof(opt_struct,					\
+				   cmdprio_options.class[DDIR_WRITE]),		\
+		.help	= "Set asynchronous IO priority class",			\
+		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,				\
+		.maxval	= IOPRIO_MAX_PRIO_CLASS,				\
+		.interval = 1,							\
+		.category = FIO_OPT_C_ENGINE,					\
+		.group	= opt_group,						\
+	},									\
+	{									\
+		.name	= "cmdprio_hint",					\
+		.lname	= "Asynchronous I/O priority hint",			\
+		.type	= FIO_OPT_INT,						\
+		.off1	= offsetof(opt_struct,					\
+				   cmdprio_options.hint[DDIR_READ]),		\
+		.off2	= offsetof(opt_struct,					\
+				   cmdprio_options.hint[DDIR_WRITE]),		\
+		.help	= "Set asynchronous IO priority hint",			\
+		.minval	= IOPRIO_MIN_PRIO_HINT,					\
+		.maxval	= IOPRIO_MAX_PRIO_HINT,					\
+		.interval = 1,							\
+		.category = FIO_OPT_C_ENGINE,					\
+		.group	= opt_group,						\
+	},									\
+	{									\
+		.name	= "cmdprio",						\
+		.lname	= "Asynchronous I/O priority level",			\
+		.type	= FIO_OPT_INT,						\
+		.off1	= offsetof(opt_struct,					\
+				   cmdprio_options.level[DDIR_READ]),		\
+		.off2	= offsetof(opt_struct,					\
+				   cmdprio_options.level[DDIR_WRITE]),		\
+		.help	= "Set asynchronous IO priority level",			\
+		.minval	= IOPRIO_MIN_PRIO,					\
+		.maxval	= IOPRIO_MAX_PRIO,					\
+		.interval = 1,							\
+		.category = FIO_OPT_C_ENGINE,					\
+		.group	= opt_group,						\
+	},									\
+	{									\
+		.name   = "cmdprio_bssplit",					\
+		.lname  = "Priority percentage block size split",		\
+		.type   = FIO_OPT_STR_STORE,					\
+		.off1   = offsetof(opt_struct, cmdprio_options.bssplit_str),	\
+		.help   = "Set priority percentages for different block sizes",	\
+		.category = FIO_OPT_C_ENGINE,					\
+		.group	= opt_group,						\
+	}
+#else
+#define CMDPRIO_OPTIONS(opt_struct, opt_group)					\
+	{									\
+		.name	= "cmdprio_percentage",					\
+		.lname	= "high priority percentage",				\
+		.type	= FIO_OPT_UNSUPPORTED,					\
+		.help	= "Platform does not support I/O priority classes",	\
+	},									\
+	{									\
+		.name	= "cmdprio_class",					\
+		.lname	= "Asynchronous I/O priority class",			\
+		.type	= FIO_OPT_UNSUPPORTED,					\
+		.help	= "Platform does not support I/O priority classes",	\
+	},									\
+	{									\
+		.name	= "cmdprio_hint",					\
+		.lname	= "Asynchronous I/O priority hint",			\
+		.type	= FIO_OPT_UNSUPPORTED,					\
+		.help	= "Platform does not support I/O priority classes",	\
+	},									\
+	{									\
+		.name	= "cmdprio",						\
+		.lname	= "Asynchronous I/O priority level",			\
+		.type	= FIO_OPT_UNSUPPORTED,					\
+		.help	= "Platform does not support I/O priority classes",	\
+	},									\
+	{									\
+		.name   = "cmdprio_bssplit",					\
+		.lname  = "Priority percentage block size split",		\
+		.type	= FIO_OPT_UNSUPPORTED,					\
+		.help	= "Platform does not support I/O priority classes",	\
+	}
+#endif
+
 struct cmdprio {
 	struct cmdprio_options *options;
 	struct cmdprio_prio perc_entry[CMDPRIO_RWDIR_CNT];
diff --git a/engines/io_uring.c b/engines/io_uring.c
index f30a3c00..e1abf688 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -127,87 +127,6 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
-#ifdef FIO_HAVE_IOPRIO_CLASS
-	{
-		.name	= "cmdprio_percentage",
-		.lname	= "high priority percentage",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct ioring_options,
-				   cmdprio_options.percentage[DDIR_READ]),
-		.off2	= offsetof(struct ioring_options,
-				   cmdprio_options.percentage[DDIR_WRITE]),
-		.minval	= 0,
-		.maxval	= 100,
-		.help	= "Send high priority I/O this percentage of the time",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_IOURING,
-	},
-	{
-		.name	= "cmdprio_class",
-		.lname	= "Asynchronous I/O priority class",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct ioring_options,
-				   cmdprio_options.class[DDIR_READ]),
-		.off2	= offsetof(struct ioring_options,
-				   cmdprio_options.class[DDIR_WRITE]),
-		.help	= "Set asynchronous IO priority class",
-		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
-		.maxval	= IOPRIO_MAX_PRIO_CLASS,
-		.interval = 1,
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_IOURING,
-	},
-	{
-		.name	= "cmdprio",
-		.lname	= "Asynchronous I/O priority level",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct ioring_options,
-				   cmdprio_options.level[DDIR_READ]),
-		.off2	= offsetof(struct ioring_options,
-				   cmdprio_options.level[DDIR_WRITE]),
-		.help	= "Set asynchronous IO priority level",
-		.minval	= IOPRIO_MIN_PRIO,
-		.maxval	= IOPRIO_MAX_PRIO,
-		.interval = 1,
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_IOURING,
-	},
-	{
-		.name   = "cmdprio_bssplit",
-		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_STORE,
-		.off1   = offsetof(struct ioring_options,
-				   cmdprio_options.bssplit_str),
-		.help   = "Set priority percentages for different block sizes",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_IOURING,
-	},
-#else
-	{
-		.name	= "cmdprio_percentage",
-		.lname	= "high priority percentage",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name	= "cmdprio_class",
-		.lname	= "Asynchronous I/O priority class",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name	= "cmdprio",
-		.lname	= "Asynchronous I/O priority level",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name   = "cmdprio_bssplit",
-		.lname  = "Priority percentage block size split",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-#endif
 	{
 		.name	= "fixedbufs",
 		.lname	= "Fixed (pre-mapped) IO buffers",
@@ -297,6 +216,7 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	CMDPRIO_OPTIONS(struct ioring_options, FIO_OPT_G_IOURING),
 	{
 		.name	= NULL,
 	},
@@ -365,8 +285,8 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		/*
 		 * Since io_uring can have a submission context (sqthread_poll)
 		 * that is different from the process context, we cannot rely on
-		 * the IO priority set by ioprio_set() (option prio/prioclass)
-		 * to be inherited.
+		 * the IO priority set by ioprio_set() (options prio, prioclass,
+		 * and priohint) to be inherited.
 		 * td->ioprio will have the value of the "default prio", so set
 		 * this unconditionally. This value might get overridden by
 		 * fio_ioring_cmdprio_prep() if the option cmdprio_percentage or
diff --git a/engines/libaio.c b/engines/libaio.c
index 6a0745aa..aaccc7ce 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -72,87 +72,6 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
-#ifdef FIO_HAVE_IOPRIO_CLASS
-	{
-		.name	= "cmdprio_percentage",
-		.lname	= "high priority percentage",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct libaio_options,
-				   cmdprio_options.percentage[DDIR_READ]),
-		.off2	= offsetof(struct libaio_options,
-				   cmdprio_options.percentage[DDIR_WRITE]),
-		.minval	= 0,
-		.maxval	= 100,
-		.help	= "Send high priority I/O this percentage of the time",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
-	{
-		.name	= "cmdprio_class",
-		.lname	= "Asynchronous I/O priority class",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct libaio_options,
-				   cmdprio_options.class[DDIR_READ]),
-		.off2	= offsetof(struct libaio_options,
-				   cmdprio_options.class[DDIR_WRITE]),
-		.help	= "Set asynchronous IO priority class",
-		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
-		.maxval	= IOPRIO_MAX_PRIO_CLASS,
-		.interval = 1,
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
-	{
-		.name	= "cmdprio",
-		.lname	= "Asynchronous I/O priority level",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct libaio_options,
-				   cmdprio_options.level[DDIR_READ]),
-		.off2	= offsetof(struct libaio_options,
-				   cmdprio_options.level[DDIR_WRITE]),
-		.help	= "Set asynchronous IO priority level",
-		.minval	= IOPRIO_MIN_PRIO,
-		.maxval	= IOPRIO_MAX_PRIO,
-		.interval = 1,
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
-	{
-		.name   = "cmdprio_bssplit",
-		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_STORE,
-		.off1   = offsetof(struct libaio_options,
-				   cmdprio_options.bssplit_str),
-		.help   = "Set priority percentages for different block sizes",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
-#else
-	{
-		.name	= "cmdprio_percentage",
-		.lname	= "high priority percentage",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name	= "cmdprio_class",
-		.lname	= "Asynchronous I/O priority class",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name	= "cmdprio",
-		.lname	= "Asynchronous I/O priority level",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-	{
-		.name   = "cmdprio_bssplit",
-		.lname  = "Priority percentage block size split",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support I/O priority classes",
-	},
-#endif
 	{
 		.name	= "nowait",
 		.lname	= "RWF_NOWAIT",
@@ -162,6 +81,7 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
+	CMDPRIO_OPTIONS(struct libaio_options, FIO_OPT_G_LIBAIO),
 	{
 		.name	= NULL,
 	},
diff --git a/fio.1 b/fio.1
index 20acd081..f62617e7 100644
--- a/fio.1
+++ b/fio.1
@@ -2084,6 +2084,14 @@ is set, this defaults to the highest priority class. A single value applies
 to reads and writes. Comma-separated values may be specified for reads and
 writes. See man \fBionice\fR\|(1). See also the \fBprioclass\fR option.
 .TP
+.BI (io_uring,libaio)cmdprio_hint \fR=\fPint[,int]
+Set the I/O priority hint to use for I/Os that must be issued with a
+priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
+If not specified when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR
+is set, this defaults to 0 (no hint). A single value applies to reads and
+writes. Comma-separated values may be specified for reads and writes.
+See also the \fBpriohint\fR option.
+.TP
 .BI (io_uring,libaio)cmdprio \fR=\fPint[,int]
 Set the I/O priority value to use for I/Os that must be issued with a
 priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
@@ -2109,8 +2117,9 @@ The first accepted format for this option is the same as the format of the
 cmdprio_bssplit=blocksize/percentage:blocksize/percentage
 .RE
 .P
-In this case, each entry will use the priority class and priority level defined
-by the options \fBcmdprio_class\fR and \fBcmdprio\fR respectively.
+In this case, each entry will use the priority class, priority level and
+priority hint defined by the options \fBcmdprio_class\fR, \fBcmdprio\fR
+and \fBcmdprio_hint\fR respectively.
 .P
 The second accepted format for this option is:
 .RS
@@ -2123,7 +2132,16 @@ entry. In comparison with the first accepted format, the second accepted format
 does not restrict all entries to have the same priority class and priority
 level.
 .P
-For both formats, only the read and write data directions are supported, values
+The third accepted format for this option is:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage/class/level/hint:...
+.RE
+.P
+This is an extension of the second accepted format that also allows a
+priority hint to be specified.
+.P
+For all formats, only the read and write data directions are supported, values
 for trim IOs are ignored. This option is mutually exclusive with the
 \fBcmdprio_percentage\fR option.
 .RE
@@ -3144,6 +3162,15 @@ Set the I/O priority class. See man \fBionice\fR\|(1). For per-command
 priority setting, see the I/O engine specific `cmdprio_percentage` and
 `cmdprio_class` options.
 .TP
+.BI priohint \fR=\fPint
+Set the I/O priority hint. This is only applicable to platforms that support
+I/O priority classes and to devices with features controlled through priority
+hints, e.g. block devices supporting command duration limits, or CDL. CDL is a
+way to indicate the desired maximum latency of I/Os so that the device can
+optimize its internal command scheduling according to the latency limits
+indicated by the user. For per-I/O priority hint setting, see the I/O engine
+specific \fBcmdprio_hint\fR option.
+.TP
 .BI cpus_allowed \fR=\fPstr
 Controls the same options as \fBcpumask\fR, but accepts a textual
 specification of the permitted CPUs instead and CPUs are indexed from 0. So
diff --git a/options.c b/options.c
index 0f739317..48aa0d7b 100644
--- a/options.c
+++ b/options.c
@@ -313,15 +313,17 @@ static int parse_cmdprio_bssplit_entry(struct thread_options *o,
 	int matches = 0;
 	char *bs_str = NULL;
 	long long bs_val;
-	unsigned int perc = 0, class, level;
+	unsigned int perc = 0, class, level, hint;
 
 	/*
 	 * valid entry formats:
 	 * bs/ - %s/ - set perc to 0, prio to -1.
 	 * bs/perc - %s/%u - set prio to -1.
 	 * bs/perc/class/level - %s/%u/%u/%u
+	 * bs/perc/class/level/hint - %s/%u/%u/%u/%u
 	 */
-	matches = sscanf(str, "%m[^/]/%u/%u/%u", &bs_str, &perc, &class, &level);
+	matches = sscanf(str, "%m[^/]/%u/%u/%u/%u",
+			 &bs_str, &perc, &class, &level, &hint);
 	if (matches < 1) {
 		log_err("fio: invalid cmdprio_bssplit format\n");
 		return 1;
@@ -342,9 +344,14 @@ static int parse_cmdprio_bssplit_entry(struct thread_options *o,
 	case 2: /* bs/perc case */
 		break;
 	case 4: /* bs/perc/class/level case */
+	case 5: /* bs/perc/class/level/hint case */
 		class = min(class, (unsigned int) IOPRIO_MAX_PRIO_CLASS);
 		level = min(level, (unsigned int) IOPRIO_MAX_PRIO);
-		entry->prio = ioprio_value(class, level);
+		if (matches == 5)
+			hint = min(hint, (unsigned int) IOPRIO_MAX_PRIO_HINT);
+		else
+			hint = 0;
+		entry->prio = ioprio_value(class, level, hint);
 		break;
 	default:
 		log_err("fio: invalid cmdprio_bssplit format\n");
@@ -3806,6 +3813,18 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
 	},
+	{
+		.name	= "priohint",
+		.lname	= "I/O nice priority hint",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, ioprio_hint),
+		.help	= "Set job IO priority hint",
+		.minval	= IOPRIO_MIN_PRIO_HINT,
+		.maxval	= IOPRIO_MAX_PRIO_HINT,
+		.interval = 1,
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_CRED,
+	},
 #else
 	{
 		.name	= "prioclass",
@@ -3813,6 +3832,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support IO priority classes",
 	},
+	{
+		.name	= "priohint",
+		.lname	= "I/O nice priority hint",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support IO priority hints",
+	},
 #endif
 	{
 		.name	= "thinktime",
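The 5-field `cmdprio_bssplit` entry handling added to parse_cmdprio_bssplit_entry() above can be exercised on its own. The following is a minimal standalone sketch, not fio's code: the struct and function names are hypothetical, the block size is kept as a string instead of being converted, the fixed-width `%31[^/]` conversion stands in for fio's `%m` allocation, and the clamp limits are copied from the os/os-linux.h values in this patch.

```c
#include <stdio.h>

/* Limits copied from the Linux values in this patch (hypothetical names). */
#define SKETCH_MAX_CLASS 3u
#define SKETCH_MAX_LEVEL 7u
#define SKETCH_MAX_HINT  1023u	/* (1 << IOPRIO_HINT_BITS) - 1 */

struct bssplit_entry_sketch {
	char bs[32];		/* block size string, e.g. "4k" (not converted) */
	unsigned int perc;
	int has_prio;		/* 0 for the "bs/" and "bs/perc" forms */
	unsigned int class, level, hint;
};

/* Returns 0 on success, 1 on a malformed entry. */
static int parse_entry_sketch(const char *str, struct bssplit_entry_sketch *e)
{
	unsigned int perc = 0, class = 0, level = 0, hint = 0;
	int matches = sscanf(str, "%31[^/]/%u/%u/%u/%u",
			     e->bs, &perc, &class, &level, &hint);

	if (matches < 1)
		return 1;

	e->perc = perc;
	e->has_prio = 0;
	e->class = e->level = e->hint = 0;

	switch (matches) {
	case 1:	/* bs/ */
	case 2:	/* bs/perc */
		break;
	case 4:	/* bs/perc/class/level: no hint given, use 0 */
		hint = 0;
		/* fall through */
	case 5:	/* bs/perc/class/level/hint */
		e->has_prio = 1;
		e->class = class < SKETCH_MAX_CLASS ? class : SKETCH_MAX_CLASS;
		e->level = level < SKETCH_MAX_LEVEL ? level : SKETCH_MAX_LEVEL;
		e->hint = hint < SKETCH_MAX_HINT ? hint : SKETCH_MAX_HINT;
		break;
	default:	/* 3 fields: class without a level */
		return 1;
	}
	return 0;
}
```

As in the patch, out-of-range class, level and hint values are clamped rather than rejected, and a four-field entry keeps the hint at 0.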
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index bde39101..4ce72539 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -171,8 +171,8 @@ static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
  * ioprio_set() with 4 arguments, so define fio's ioprio_set() as a macro.
  * Note that there is no idea of class within ioprio_set(2) unlike Linux.
  */
-#define ioprio_value(ioprio_class, ioprio)	(ioprio)
-#define ioprio_set(which, who, ioprio_class, ioprio)	\
+#define ioprio_value(ioprio_class, ioprio, ioprio_hint)	(ioprio)
+#define ioprio_set(which, who, ioprio_class, ioprio, ioprio_hint)	\
 	ioprio_set(which, who, ioprio)
 
 #define ioprio(ioprio)		(ioprio)
diff --git a/os/os-linux.h b/os/os-linux.h
index 2f9f7e79..c5cd6515 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -125,13 +125,24 @@ enum {
 #define IOPRIO_BITS		16
 #define IOPRIO_CLASS_SHIFT	13
 
+#define IOPRIO_HINT_BITS	10
+#define IOPRIO_HINT_SHIFT	3
+
 #define IOPRIO_MIN_PRIO		0	/* highest priority */
 #define IOPRIO_MAX_PRIO		7	/* lowest priority */
 
 #define IOPRIO_MIN_PRIO_CLASS	0
 #define IOPRIO_MAX_PRIO_CLASS	3
 
-static inline int ioprio_value(int ioprio_class, int ioprio)
+#define IOPRIO_MIN_PRIO_HINT	0
+#define IOPRIO_MAX_PRIO_HINT	((1 << IOPRIO_HINT_BITS) - 1)
+
+#define ioprio_class(ioprio)	((ioprio) >> IOPRIO_CLASS_SHIFT)
+#define ioprio(ioprio)		((ioprio) & IOPRIO_MAX_PRIO)
+#define ioprio_hint(ioprio)	\
+	(((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_MAX_PRIO_HINT)
+
+static inline int ioprio_value(int ioprio_class, int ioprio, int ioprio_hint)
 {
 	/*
 	 * If no class is set, assume BE
@@ -139,23 +150,23 @@ static inline int ioprio_value(int ioprio_class, int ioprio)
         if (!ioprio_class)
                 ioprio_class = IOPRIO_CLASS_BE;
 
-	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
+	return (ioprio_class << IOPRIO_CLASS_SHIFT) |
+		(ioprio_hint << IOPRIO_HINT_SHIFT) |
+		ioprio;
 }
 
 static inline bool ioprio_value_is_class_rt(unsigned int priority)
 {
-	return (priority >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT;
+	return ioprio_class(priority) == IOPRIO_CLASS_RT;
 }
 
-static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
+static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio,
+			     int ioprio_hint)
 {
 	return syscall(__NR_ioprio_set, which, who,
-		       ioprio_value(ioprio_class, ioprio));
+		       ioprio_value(ioprio_class, ioprio, ioprio_hint));
 }
 
-#define ioprio_class(ioprio)	((ioprio) >> IOPRIO_CLASS_SHIFT)
-#define ioprio(ioprio)		((ioprio) & 7)
-
 #ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
diff --git a/os/os.h b/os/os.h
index 036fc233..0f182324 100644
--- a/os/os.h
+++ b/os/os.h
@@ -120,11 +120,14 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #define ioprio_value_is_class_rt(prio)	(false)
 #define IOPRIO_MIN_PRIO_CLASS		0
 #define IOPRIO_MAX_PRIO_CLASS		0
+#define ioprio_hint(prio)		0
+#define IOPRIO_MIN_PRIO_HINT		0
+#define IOPRIO_MAX_PRIO_HINT		0
 #endif
 #ifndef FIO_HAVE_IOPRIO
-#define ioprio_value(prioclass, prio)	(0)
+#define ioprio_value(prioclass, prio, priohint)	(0)
 #define ioprio(ioprio)			0
-#define ioprio_set(which, who, prioclass, prio)	(0)
+#define ioprio_set(which, who, prioclass, prio, priohint) (0)
 #define IOPRIO_MIN_PRIO			0
 #define IOPRIO_MAX_PRIO			0
 #endif
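The new Linux encoding packs the hint between the level and class fields: with IOPRIO_CLASS_SHIFT = 13, IOPRIO_HINT_SHIFT = 3 and IOPRIO_HINT_BITS = 10, bits 0-2 hold the level, bits 3-12 the hint and bits 13-15 the class. The standalone sketch below redefines the macros from the os/os-linux.h hunk locally (IOPRIO_CLASS_BE = 2 is the Linux best-effort class value) to show an encode/decode round trip and the "prio class/level/hint" string that stat.c now prints. format_prio() is a hypothetical helper, not part of fio.

```c
#include <stdio.h>

/* Bit layout from the os/os-linux.h hunk, redefined for this sketch. */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_HINT_SHIFT	3
#define IOPRIO_HINT_BITS	10
#define IOPRIO_MAX_PRIO		7
#define IOPRIO_MAX_PRIO_HINT	((1 << IOPRIO_HINT_BITS) - 1)
#define IOPRIO_CLASS_BE		2	/* Linux best-effort class */

#define ioprio_class(p)	((p) >> IOPRIO_CLASS_SHIFT)
#define ioprio(p)	((p) & IOPRIO_MAX_PRIO)
#define ioprio_hint(p)	(((p) >> IOPRIO_HINT_SHIFT) & IOPRIO_MAX_PRIO_HINT)

/* Encode class, level and hint into one value (class 0 means BE). */
static int ioprio_value(int ioprio_class, int ioprio, int ioprio_hint)
{
	if (!ioprio_class)
		ioprio_class = IOPRIO_CLASS_BE;
	return (ioprio_class << IOPRIO_CLASS_SHIFT) |
	       (ioprio_hint << IOPRIO_HINT_SHIFT) |
	       ioprio;
}

/* Hypothetical helper: format like stat.c, returning the snprintf length. */
static int format_prio(char *buf, size_t len, unsigned int p)
{
	return snprintf(buf, len, "prio %u/%u/%u",
			ioprio_class(p), ioprio(p), ioprio_hint(p));
}
```

For example, ioprio_value(1, 0, 3) yields (1 << 13) | (3 << 3) = 8216, which format_prio() renders as "prio 1/0/3".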
diff --git a/server.h b/server.h
index 601d3340..ad706118 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 100,
+	FIO_SERVER_VER			= 101,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 7fad73d1..7b791628 100644
--- a/stat.c
+++ b/stat.c
@@ -597,10 +597,11 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 				continue;
 
 			snprintf(buf, sizeof(buf),
-				 "%s prio %u/%u",
+				 "%s prio %u/%u/%u",
 				 clat_type,
 				 ioprio_class(ts->clat_prio[ddir][i].ioprio),
-				 ioprio(ts->clat_prio[ddir][i].ioprio));
+				 ioprio(ts->clat_prio[ddir][i].ioprio),
+				 ioprio_hint(ts->clat_prio[ddir][i].ioprio));
 			display_lat(buf, min, max, mean, dev, out);
 		}
 	}
@@ -640,10 +641,11 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 					continue;
 
 				snprintf(prio_name, sizeof(prio_name),
-					 "%s prio %u/%u (%.2f%% of IOs)",
+					 "%s prio %u/%u/%u (%.2f%% of IOs)",
 					 clat_type,
 					 ioprio_class(ts->clat_prio[ddir][i].ioprio),
 					 ioprio(ts->clat_prio[ddir][i].ioprio),
+					 ioprio_hint(ts->clat_prio[ddir][i].ioprio),
 					 100. * (double) prio_samples / (double) samples);
 				show_clat_percentiles(ts->clat_prio[ddir][i].io_u_plat,
 						prio_samples, ts->percentile_list,
@@ -1533,6 +1535,8 @@ static void add_ddir_status_json(struct thread_stat *ts,
 				ioprio_class(ts->clat_prio[ddir][i].ioprio));
 			json_object_add_value_int(obj, "prio",
 				ioprio(ts->clat_prio[ddir][i].ioprio));
+			json_object_add_value_int(obj, "priohint",
+				ioprio_hint(ts->clat_prio[ddir][i].ioprio));
 
 			tmp_object = add_ddir_lat_json(ts,
 					ts->clat_percentiles | ts->lat_percentiles,
diff --git a/thread_options.h b/thread_options.h
index 1715b36c..38a9993d 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -248,6 +248,7 @@ struct thread_options {
 	unsigned int nice;
 	unsigned int ioprio;
 	unsigned int ioprio_class;
+	unsigned int ioprio_hint;
 	unsigned int file_service_type;
 	unsigned int group_reporting;
 	unsigned int stats;
@@ -568,6 +569,7 @@ struct thread_options_pack {
 	uint32_t nice;
 	uint32_t ioprio;
 	uint32_t ioprio_class;
+	uint32_t ioprio_hint;
 	uint32_t file_service_type;
 	uint32_t group_reporting;
 	uint32_t stats;
@@ -601,7 +603,6 @@ struct thread_options_pack {
 	uint32_t lat_percentiles;
 	uint32_t slat_percentiles;
 	uint32_t percentile_precision;
-	uint32_t pad5;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];


* Recent changes (master)
@ 2023-07-21 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-21 12:00 UTC
  To: fio

The following changes since commit 04361e9a23d6e0448fd6fbbd4e14ecdfff60e314:

  Merge branch 'patch-3' of https://github.com/yangjueji/fio (2023-07-15 09:57:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to caf7ac7ef000097765b1c56404adb5e68b227977:

  t/zbd: add max_active configs to run-tests-against-nullb (2023-07-20 09:52:37 -0400)

----------------------------------------------------------------
Dmitry Fomichev (2):
      t/zbd: fix null_blk configuration in run-tests-against-nullb
      t/zbd: add max_active configs to run-tests-against-nullb

Shin'ichiro Kawasaki (11):
      zbd: get max_active_zones limit value from zoned devices
      zbd: write to closed zones on the devices with max_active_zones limit
      zbd: print max_active_zones limit error message
      docs: modify max_open_zones option description
      t/zbd: add close_zone helper function
      t/zbd: add max_active_zone variable
      t/zbd: add test case to check zones in closed condition
      t/zbd: add test case to check max_active_zones limit error message
      t/zbd: get max_open_zones from sysfs
      t/zbd: fix fio failure check and SG node failure in test case 31
      t/zbd: add missing prep_write for test cases with write workloads

 HOWTO.rst                     |  44 +++++----
 fio.1                         |  36 +++++---
 io_u.c                        |   2 +
 ioengines.h                   |   4 +-
 oslib/blkzoned.h              |   9 ++
 oslib/linux-blkzoned.c        |  23 +++++
 t/zbd/functions               |  33 ++++++-
 t/zbd/run-tests-against-nullb | 203 ++++++++++++++++++++++++++++++++++++++++--
 t/zbd/test-zbd-support        |  91 ++++++++++++++++++-
 zbd.c                         |  47 +++++++++-
 zbd.h                         |   5 ++
 11 files changed, 457 insertions(+), 40 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 7ae8ea7b..7fe70fbd 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1056,22 +1056,34 @@ Target file/device
 
 .. option:: max_open_zones=int
 
-	A zone of a zoned block device is in the open state when it is partially
-	written (i.e. not all sectors of the zone have been written). Zoned
-	block devices may have a limit on the total number of zones that can
-	be simultaneously in the open state, that is, the number of zones that
-	can be written to simultaneously. The :option:`max_open_zones` parameter
-	limits the number of zones to which write commands are issued by all fio
-	jobs, that is, limits the number of zones that will be in the open
-	state. This parameter is relevant only if the :option:`zonemode` =zbd is
-	used. The default value is always equal to maximum number of open zones
-	of the target zoned block device and a value higher than this limit
-	cannot be specified by users unless the option
-	:option:`ignore_zone_limits` is specified. When
-	:option:`ignore_zone_limits` is specified or the target device has no
-	limit on the number of zones that can be in an open state,
-	:option:`max_open_zones` can specify 0 to disable any limit on the
-	number of zones that can be simultaneously written to by all jobs.
+	When a zone of a zoned block device is partially written (i.e. not all
+	sectors of the zone have been written), the zone is in one of three
+	conditions: 'implicit open', 'explicit open' or 'closed'. Zoned block
+	devices may have a limit called 'max_open_zones' (same name as the
+	parameter) on the total number of zones that can simultaneously be in
+	the 'implicit open' or 'explicit open' conditions. Zoned block devices
+	may also have a limit called 'max_active_zones' on the total number of
+	zones that can simultaneously be in any of these three conditions. The
+	:option:`max_open_zones` parameter limits the number of zones to which
+	write commands are issued by all fio jobs, that is, it limits the number
+	of zones that will be in these conditions. When the device has the
+	max_open_zones limit but not the max_active_zones limit, the
+	:option:`max_open_zones` parameter caps the number of zones in the two
+	open conditions. In this case, at start fio adds the zones already in
+	either open condition to its set of write target zones. When the device
+	has both the max_open_zones and the max_active_zones limits, the
+	:option:`max_open_zones` parameter caps the number of zones in all three
+	conditions. In this case, at start fio adds the zones already in any of
+	the three conditions to its set of write target zones.
+
+	This parameter is relevant only if the :option:`zonemode` =zbd is used.
+	The default value is always equal to the max_open_zones limit of the
+	target zoned block device and a value higher than this limit cannot be
+	specified by users unless the option :option:`ignore_zone_limits` is
+	specified. When :option:`ignore_zone_limits` is specified or the target
+	device does not have the max_open_zones limit, :option:`max_open_zones`
+	can specify 0 to disable any limit on the number of zones that can be
+	simultaneously written to by all jobs.
 
 .. option:: job_max_open_zones=int
 
diff --git a/fio.1 b/fio.1
index da875276..20acd081 100644
--- a/fio.1
+++ b/fio.1
@@ -832,18 +832,30 @@ numbers fio only reads beyond the write pointer if explicitly told to do
 so. Default: false.
 .TP
 .BI max_open_zones \fR=\fPint
-A zone of a zoned block device is in the open state when it is partially written
-(i.e. not all sectors of the zone have been written). Zoned block devices may
-have limit a on the total number of zones that can be simultaneously in the
-open state, that is, the number of zones that can be written to simultaneously.
-The \fBmax_open_zones\fR parameter limits the number of zones to which write
-commands are issued by all fio jobs, that is, limits the number of zones that
-will be in the open state. This parameter is relevant only if the
-\fBzonemode=zbd\fR is used. The default value is always equal to maximum number
-of open zones of the target zoned block device and a value higher than this
-limit cannot be specified by users unless the option \fBignore_zone_limits\fR is
-specified. When \fBignore_zone_limits\fR is specified or the target device has
-no limit on the number of zones that can be in an open state,
+When a zone of a zoned block device is partially written (i.e. not all sectors
+of the zone have been written), the zone is in one of three
+conditions: 'implicit open', 'explicit open' or 'closed'. Zoned block devices
+may have a limit called 'max_open_zones' (same name as the parameter) on the
+total number of zones that can simultaneously be in the 'implicit open'
+or 'explicit open' conditions. Zoned block devices may also have a limit
+called 'max_active_zones' on the total number of zones that can simultaneously
+be in any of these three conditions. The \fBmax_open_zones\fR parameter limits
+the number of zones to which write commands are issued by all fio jobs, that is,
+it limits the number of zones that will be in these conditions. When the device
+has the max_open_zones limit but not the max_active_zones limit, the
+\fBmax_open_zones\fR parameter caps the number of zones in the two open
+conditions. In this case, at start fio adds the zones already in either open
+condition to its set of write target zones. When the device has both the
+max_open_zones and the max_active_zones limits, the \fBmax_open_zones\fR
+parameter caps the number of zones in all three conditions. In this case, at
+start fio adds the zones already in any of the three conditions to its set of
+write target zones.
+
+This parameter is relevant only if the \fBzonemode=zbd\fR is used. The default
+value is always equal to the max_open_zones limit of the target zoned block
+device and a value higher than this limit cannot be specified by users unless
+the option \fBignore_zone_limits\fR is specified. When \fBignore_zone_limits\fR
+is specified or the target device does not have the max_open_zones limit,
 \fBmax_open_zones\fR can specify 0 to disable any limit on the number of zones
 that can be simultaneously written to by all jobs.
 .TP
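The start-up behaviour described for max_open_zones can be summarised as a predicate over zone conditions. This is an illustrative sketch only: the enum and functions are hypothetical and do not reproduce zbd.c. They encode the documented rule that zones in either open condition always count, while closed zones count only when the device also reports a max_active_zones limit.

```c
#include <stdbool.h>

/* Hypothetical zone condition enum; not fio's or the kernel's type. */
enum zone_cond_sketch {
	ZC_EMPTY,
	ZC_IMPLICIT_OPEN,
	ZC_EXPLICIT_OPEN,
	ZC_CLOSED,
	ZC_FULL,
};

/* Does a zone in condition c count against max_open_zones at start? */
static bool counts_toward_max_open(enum zone_cond_sketch c,
				   bool dev_has_max_active)
{
	switch (c) {
	case ZC_IMPLICIT_OPEN:
	case ZC_EXPLICIT_OPEN:
		return true;
	case ZC_CLOSED:
		/* closed zones matter only under a max_active_zones limit */
		return dev_has_max_active;
	default:
		return false;
	}
}

/* Count the zones that would become write targets at start. */
static unsigned int initial_write_targets(const enum zone_cond_sketch *zones,
					  unsigned int nr_zones,
					  bool dev_has_max_active)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < nr_zones; i++)
		if (counts_toward_max_open(zones[i], dev_has_max_active))
			n++;
	return n;
}
```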
diff --git a/io_u.c b/io_u.c
index 27b6c92a..07e5bac5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1879,6 +1879,8 @@ static void __io_u_log_error(struct thread_data *td, struct io_u *io_u)
 		io_ddir_name(io_u->ddir),
 		io_u->offset, io_u->xfer_buflen);
 
+	zbd_log_err(td, io_u);
+
 	if (td->io_ops->errdetails) {
 		char *err = td->io_ops->errdetails(io_u);
 
diff --git a/ioengines.h b/ioengines.h
index 9484265e..4391b31e 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -9,7 +9,7 @@
 #include "zbd_types.h"
 #include "fdp.h"
 
-#define FIO_IOOPS_VERSION	32
+#define FIO_IOOPS_VERSION	33
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
@@ -62,6 +62,8 @@ struct ioengine_ops {
 			uint64_t, uint64_t);
 	int (*get_max_open_zones)(struct thread_data *, struct fio_file *,
 				  unsigned int *);
+	int (*get_max_active_zones)(struct thread_data *, struct fio_file *,
+				    unsigned int *);
 	int (*finish_zone)(struct thread_data *, struct fio_file *,
 			   uint64_t, uint64_t);
 	int (*fdp_fetch_ruhs)(struct thread_data *, struct fio_file *,
diff --git a/oslib/blkzoned.h b/oslib/blkzoned.h
index 29fb034f..e598bd4f 100644
--- a/oslib/blkzoned.h
+++ b/oslib/blkzoned.h
@@ -18,6 +18,9 @@ extern int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 				uint64_t offset, uint64_t length);
 extern int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 				       unsigned int *max_open_zones);
+extern int blkzoned_get_max_active_zones(struct thread_data *td,
+					 struct fio_file *f,
+					 unsigned int *max_active_zones);
 extern int blkzoned_finish_zone(struct thread_data *td, struct fio_file *f,
 				uint64_t offset, uint64_t length);
 #else
@@ -53,6 +56,12 @@ static inline int blkzoned_get_max_open_zones(struct thread_data *td, struct fio
 {
 	return -EIO;
 }
+static inline int blkzoned_get_max_active_zones(struct thread_data *td,
+						struct fio_file *f,
+						unsigned int *max_active_zones)
+{
+	return -EIO;
+}
 static inline int blkzoned_finish_zone(struct thread_data *td,
 				       struct fio_file *f,
 				       uint64_t offset, uint64_t length)
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 722e0992..2c3ecf33 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -186,6 +186,29 @@ int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 	return 0;
 }
 
+int blkzoned_get_max_active_zones(struct thread_data *td, struct fio_file *f,
+				  unsigned int *max_active_zones)
+{
+	char *max_active_str;
+
+	if (f->filetype != FIO_TYPE_BLOCK)
+		return -EIO;
+
+	max_active_str = blkzoned_get_sysfs_attr(f->file_name, "queue/max_active_zones");
+	if (!max_active_str) {
+		*max_active_zones = 0;
+		return 0;
+	}
+
+	dprint(FD_ZBD, "%s: max active zones supported by device: %s\n",
+	       f->file_name, max_active_str);
+	*max_active_zones = atoll(max_active_str);
+
+	free(max_active_str);
+
+	return 0;
+}
+
 static uint64_t zone_capacity(struct blk_zone_report *hdr,
 			      struct blk_zone *blkz)
 {
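blkzoned_get_max_active_zones() treats a missing sysfs attribute as "no limit" (0). A standalone sketch of that fallback is below. It opens the file directly rather than going through fio's blkzoned_get_sysfs_attr() helper, and, unlike the atoll() call in the patch, it reports an unparsable value as -EIO. The function name is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read an unsigned limit from a sysfs-style attribute file. If the file
 * cannot be opened (e.g. a kernel without queue/max_active_zones), report
 * 0, meaning "no limit". Returns 0 on success or -EIO on a parse error.
 */
static int read_zone_limit(const char *path, unsigned int *limit)
{
	FILE *f = fopen(path, "r");
	unsigned long val;

	if (!f) {
		*limit = 0;
		return 0;
	}
	if (fscanf(f, "%lu", &val) != 1) {
		fclose(f);
		return -EIO;
	}
	fclose(f);
	*limit = (unsigned int) val;
	return 0;
}
```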
diff --git a/t/zbd/functions b/t/zbd/functions
index 9a6d6999..4faa45a9 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -4,6 +4,7 @@ blkzone=$(type -p blkzone 2>/dev/null)
 sg_inq=$(type -p sg_inq 2>/dev/null)
 zbc_report_zones=$(type -p zbc_report_zones 2>/dev/null)
 zbc_reset_zone=$(type -p zbc_reset_zone 2>/dev/null)
+zbc_close_zone=$(type -p zbc_close_zone 2>/dev/null)
 zbc_info=$(type -p zbc_info 2>/dev/null)
 if [ -z "${blkzone}" ] &&
        { [ -z "${zbc_report_zones}" ] || [ -z "${zbc_reset_zone}" ]; }; then
@@ -211,8 +212,14 @@ last_online_zone() {
 # max_open_zones in sysfs, or which lacks zoned block device support completely.
 max_open_zones() {
     local dev=$1
+    local realdev syspath
 
-    if [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
+    realdev=$(readlink -f "$dev")
+    syspath=/sys/block/${realdev##*/}/queue/max_open_zones
+
+    if [ -b "${realdev}" ] && [ -r "${syspath}" ]; then
+	cat "${syspath}"
+    elif [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
 	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" \
 		 > /dev/null 2>&1; then
 	    # When sg_inq can not get max open zones, specify 0 which indicates
@@ -238,6 +245,18 @@ max_open_zones() {
     fi
 }
 
+# Get the max_active_zones limit from sysfs if available, otherwise print 0.
+max_active_zones() {
+	local dev=$1
+	local sys_queue="/sys/block/${dev##*/}/queue/"
+
+	if [[ -e "$sys_queue/max_active_zones" ]]; then
+		cat "$sys_queue/max_active_zones"
+		return
+	fi
+	echo 0
+}
+
 # Get minimum block size to write to seq zones. Refer the sysfs attribute
 # zone_write_granularity which shows the valid minimum size regardless of zoned
 # block device type. If the sysfs attribute is not available, refer physical
@@ -304,6 +323,18 @@ reset_zone() {
     fi
 }
 
+# Close the zone on device $1 at offset $2. The offset must be specified in
+# units of 512 byte sectors.
+close_zone() {
+	local dev=$1 offset=$2
+
+	if [ -n "${blkzone}" ] && [ -z "${use_libzbc}" ]; then
+		${blkzone} close -o "${offset}" -c 1 "$dev"
+	else
+		${zbc_close_zone} -sector "$dev" "${offset}" >/dev/null
+	fi
+}
+
 # Extract the number of bytes that have been transferred from a line like
 # READ: bw=6847KiB/s (7011kB/s), 6847KiB/s-6847KiB/s (7011kB/s-7011kB/s), io=257MiB (269MB), run=38406-38406msec
 fio_io() {
diff --git a/t/zbd/run-tests-against-nullb b/t/zbd/run-tests-against-nullb
index 7d2c7fa8..97d29966 100755
--- a/t/zbd/run-tests-against-nullb
+++ b/t/zbd/run-tests-against-nullb
@@ -67,13 +67,27 @@ configure_nullb()
 			fi
 			echo "${zone_capacity}" > zone_capacity
 		fi
+
 		if ((conv_pcnt)); then
 			if ((!conv_supported)); then
 				echo "null_blk does not support conventional zones"
 				return 2
 			fi
 			nr_conv=$((dev_size/zone_size*conv_pcnt/100))
-			echo "${nr_conv}" > zone_nr_conv
+		else
+			nr_conv=0
+		fi
+		echo "${nr_conv}" > zone_nr_conv
+
+		if ((max_open)); then
+			echo "${max_open}" > zone_max_open
+			if ((max_active)); then
+				if ((!max_act_supported)); then
+					echo "null_blk does not support active zone counts"
+					return 2
+				fi
+				echo "${max_active}" > zone_max_active
+			fi
 		fi
 	fi
 
@@ -90,6 +104,11 @@ show_nullb_config()
 		echo "    $(printf "Zone Capacity: %d MB" ${zone_capacity})"
 		if ((max_open)); then
 			echo "    $(printf "Max Open: %d Zones" ${max_open})"
+			if ((max_active)); then
+				echo "    $(printf "Max Active: %d Zones" ${max_active})"
+			else
+				echo "    Max Active: Unlimited Zones"
+			fi
 		else
 			echo "    Max Open: Unlimited Zones"
 		fi
@@ -124,6 +143,7 @@ section3()
 	zone_size=4
 	zone_capacity=3
 	max_open=0
+	max_active=0
 }
 
 # Zoned device with mostly sequential zones, ZCAP == ZSIZE, unlimited MaxOpen.
@@ -133,6 +153,7 @@ section4()
 	zone_size=1
 	zone_capacity=1
 	max_open=0
+	max_active=0
 }
 
 # Zoned device with mostly sequential zones, ZCAP < ZSIZE, unlimited MaxOpen.
@@ -142,6 +163,7 @@ section5()
 	zone_size=4
 	zone_capacity=3
 	max_open=0
+	max_active=0
 }
 
 # Zoned device with mostly conventional zones, ZCAP == ZSIZE, unlimited MaxOpen.
@@ -151,6 +173,7 @@ section6()
 	zone_size=1
 	zone_capacity=1
 	max_open=0
+	max_active=0
 }
 
 # Zoned device with mostly conventional zones, ZCAP < ZSIZE, unlimited MaxOpen.
@@ -161,9 +184,11 @@ section7()
 	zone_size=4
 	zone_capacity=3
 	max_open=0
+	max_active=0
 }
 
-# Zoned device with no conventional zones, ZCAP == ZSIZE, limited MaxOpen.
+# Zoned device with no conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section8()
 {
 	dev_size=1024
@@ -172,9 +197,11 @@ section8()
 	zone_capacity=1
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
 }
 
-# Zoned device with no conventional zones, ZCAP < ZSIZE, limited MaxOpen.
+# Zoned device with no conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section9()
 {
 	conv_pcnt=0
@@ -182,9 +209,11 @@ section9()
 	zone_capacity=3
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
 }
 
-# Zoned device with mostly sequential zones, ZCAP == ZSIZE, limited MaxOpen.
+# Zoned device with mostly sequential zones, ZCAP == ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section10()
 {
 	conv_pcnt=10
@@ -192,9 +221,11 @@ section10()
 	zone_capacity=1
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
 }
 
-# Zoned device with mostly sequential zones, ZCAP < ZSIZE, limited MaxOpen.
+# Zoned device with mostly sequential zones, ZCAP < ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section11()
 {
 	conv_pcnt=10
@@ -202,9 +233,11 @@ section11()
 	zone_capacity=3
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
 }
 
-# Zoned device with mostly conventional zones, ZCAP == ZSIZE, limited MaxOpen.
+# Zoned device with mostly conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section12()
 {
 	conv_pcnt=66
@@ -212,9 +245,11 @@ section12()
 	zone_capacity=1
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
 }
 
-# Zoned device with mostly conventional zones, ZCAP < ZSIZE, limited MaxOpen.
+# Zoned device with mostly conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# unlimited MaxActive.
 section13()
 {
 	dev_size=2048
@@ -223,6 +258,155 @@ section13()
 	zone_capacity=3
 	max_open=${set_max_open}
 	zbd_test_opts+=("-o ${max_open}")
+	max_active=0
+}
+
+# Zoned device with no conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section14()
+{
+	dev_size=1024
+	conv_pcnt=0
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with no conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section15()
+{
+	conv_pcnt=0
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with mostly sequential zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section16()
+{
+	conv_pcnt=10
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with mostly sequential zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section17()
+{
+	conv_pcnt=10
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with mostly conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section18()
+{
+	conv_pcnt=66
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with mostly conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive == MaxOpen.
+section19()
+{
+	dev_size=2048
+	conv_pcnt=66
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=${set_max_open}
+}
+
+# Zoned device with no conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section20()
+{
+	dev_size=1024
+	conv_pcnt=0
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
+}
+
+# Zoned device with no conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section21()
+{
+	conv_pcnt=0
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
+}
+
+# Zoned device with mostly sequential zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section22()
+{
+	conv_pcnt=10
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
+}
+
+# Zoned device with mostly sequential zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section23()
+{
+	conv_pcnt=10
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
+}
+
+# Zoned device with mostly conventional zones, ZCAP == ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section24()
+{
+	conv_pcnt=66
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
+}
+
+# Zoned device with mostly conventional zones, ZCAP < ZSIZE, limited MaxOpen,
+# MaxActive > MaxOpen.
+section25()
+{
+	dev_size=2048
+	conv_pcnt=66
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+	max_active=$((set_max_open+set_extra_max_active))
 }
 
 #
@@ -233,10 +417,12 @@ scriptdir="$(cd "$(dirname "$0")" && pwd)"
 sections=()
 zcap_supported=1
 conv_supported=1
+max_act_supported=1
 list_only=0
 dev_size=1024
 dev_blocksize=4096
 set_max_open=8
+set_extra_max_active=2
 zbd_test_opts=()
 num_of_runs=1
 test_case=0
@@ -276,6 +462,9 @@ fi
 if ! cat /sys/kernel/config/nullb/features | grep -q zone_nr_conv; then
 	conv_supported=0
 fi
+if ! cat /sys/kernel/config/nullb/features | grep -q zone_max_active; then
+	max_act_supported=0
+fi
 
 rc=0
 test_rc=0
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index a3d37a7d..c8f3eb61 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -272,6 +272,20 @@ require_max_open_zones() {
 	return 0
 }
 
+require_max_active_zones() {
+	local min=${1}
+
+	if ((max_active_zones == 0)); then
+		SKIP_REASON="$dev does not have max_active_zones limit"
+		return 1
+	fi
+	if ((max_active_zones < min)); then
+		SKIP_REASON="max_active_zones of $dev is smaller than $min"
+		return 1
+	fi
+	return 0
+}
+
 # Check whether buffered writes are refused for block devices.
 test1() {
     require_block_dev || return $SKIP_TESTCASE
@@ -780,9 +794,10 @@ test31() {
     opts=("--name=$dev" "--filename=$dev" "--rw=write" "--bs=${bs}")
     opts+=("--offset=$off" "--size=$((inc * nz))" "--io_size=$((bs * nz))")
     opts+=("--zonemode=strided" "--zonesize=${bs}" "--zonerange=${inc}")
-    opts+=("--direct=1")
+    opts+=("--direct=1" "$(ioengine "psync")")
     echo "fio ${opts[@]}" >> "${logfile}.${test_number}"
-    "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
+    "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" \
+				2>&1 || return $?
 
     # Next, run the test.
     opts=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
@@ -1182,6 +1197,7 @@ test54() {
 	require_zbd || return $SKIP_TESTCASE
 	require_seq_zones 8 || return $SKIP_TESTCASE
 
+	prep_write
 	run_fio --name=job --filename=${dev} "$(ioengine "libaio")" \
 		--time_based=1 --runtime=30s --continue_on_error=0 \
 		--offset=$((first_sequential_zone_sector * 512)) \
@@ -1203,6 +1219,7 @@ test55() {
 	# offset=1z + offset_increment=10z + size=2z
 	require_seq_zones 13 || return $SKIP_TESTCASE
 
+	prep_write
 	run_fio	--name=j		\
 		--filename=${dev}	\
 		--direct=1		\
@@ -1228,6 +1245,7 @@ test56() {
 	require_regular_block_dev || return $SKIP_TESTCASE
 	require_seq_zones 10 || return $SKIP_TESTCASE
 
+	prep_write
 	run_fio	--name=j		\
 		--filename=${dev}	\
 		--direct=1		\
@@ -1249,6 +1267,7 @@ test57() {
 
 	require_zbd || return $SKIP_TESTCASE
 
+	prep_write
 	bs=$((4096 * 7))
 	off=$((first_sequential_zone_sector * 512))
 
@@ -1413,6 +1432,71 @@ test65() {
 	check_written $((zone_size + capacity))
 }
 
+# Test that closed zones are handled as open zones. This test case requires
+# zoned block devices which have the same max_open_zones and max_active_zones.
+test66() {
+	local i off
+
+	require_zbd || return $SKIP_TESTCASE
+	require_max_active_zones 2 || return $SKIP_TESTCASE
+	require_max_open_zones "${max_active_zones}" || return $SKIP_TESTCASE
+	require_seq_zones $((max_active_zones * 16)) || return $SKIP_TESTCASE
+
+	reset_zone "$dev" -1
+
+	# Prepare max_active_zones zones in closed condition.
+	off=$((first_sequential_zone_sector * 512))
+	run_fio --name=w --filename="$dev" --zonemod=zbd --direct=1 \
+		--offset=$((off)) --zonesize="${zone_size}" --rw=randwrite \
+		--bs=4096 --size="$((zone_size * max_active_zones))" \
+		--io_size="${zone_size}" "$(ioengine "psync")" \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+	for ((i = 0; i < max_active_zones; i++)); do
+		close_zone "$dev" $((off / 512)) || return $?
+		off=$((off + zone_size))
+	done
+
+	# Run random writes to the closed zones and empty zones. This confirms
+	# that fio handles closed zones as open write target zones. Otherwise,
+	# fio writes to the empty zones and hits the max_active_zones limit.
+	off=$((first_sequential_zone_sector * 512))
+	run_one_fio_job --zonemod=zbd --direct=1 \
+		       "$(ioengine "psync")" --rw=randwrite --bs=4096 \
+		       --max_open_zones="$max_active_zones" --offset=$((off)) \
+		       --size=$((max_active_zones * 16 * zone_size)) \
+		       --io_size=$((zone_size)) --zonesize="${zone_size}" \
+		       --time_based --runtime=5s \
+		       >> "${logfile}.${test_number}" 2>&1
+}
+
+# Test that a max_active_zones limit failure is reported with a clear error message.
+test67() {
+	local i off
+
+	require_zbd || return $SKIP_TESTCASE
+	require_max_active_zones 2 || return $SKIP_TESTCASE
+	require_max_open_zones "${max_active_zones}" || return $SKIP_TESTCASE
+	require_seq_zones $((max_active_zones + 1)) || return $SKIP_TESTCASE
+
+	reset_zone "$dev" -1
+
+	# Prepare max_active_zones zones in open condition.
+	off=$((first_sequential_zone_sector * 512))
+	run_fio --name=w --filename="$dev" --zonemod=zbd --direct=1 \
+		--offset=$((off)) --zonesize="${zone_size}" --rw=randwrite \
+		--bs=4096 --size="$((zone_size * max_active_zones))" \
+		--io_size="${zone_size}" "$(ioengine "psync")" \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+
+	# Write to another zone and trigger the max_active_zones limit error.
+	off=$((off + zone_size * max_active_zones))
+	run_one_fio_job --zonemod=zbd --direct=1 "$(ioengine "psync")" \
+			--rw=write --bs=$min_seq_write_size --offset=$((off)) \
+			--size=$((zone_size)) --zonesize="${zone_size}" \
+			>> "${logfile}.${test_number}" 2>&1 && return $?
+	grep -q 'Exceeded max_active_zones limit' "${logfile}.${test_number}"
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
@@ -1497,6 +1581,7 @@ if [[ -b "$realdev" ]]; then
 			echo "Failed to determine maximum number of open zones"
 			exit 1
 		fi
+		max_active_zones=$(max_active_zones "$dev")
 		set_io_scheduler "$basename" deadline || exit $?
 		if [ -n "$reset_all_zones" ]; then
 			reset_zone "$dev" -1
@@ -1508,6 +1593,7 @@ if [[ -b "$realdev" ]]; then
 		zone_size=$(max 65536 "$min_seq_write_size")
 		sectors_per_zone=$((zone_size / 512))
 		max_open_zones=128
+		max_active_zones=0
 		set_io_scheduler "$basename" none || exit $?
 		;;
 	esac
@@ -1543,6 +1629,7 @@ elif [[ -c "$realdev" ]]; then
 		echo "Failed to determine maximum number of open zones"
 		exit 1
 	fi
+	max_active_zones=0
 	if [ -n "$reset_all_zones" ]; then
 		reset_zone "$dev" -1
 	fi
diff --git a/zbd.c b/zbd.c
index d4565215..caac68bb 100644
--- a/zbd.c
+++ b/zbd.c
@@ -471,6 +471,34 @@ static int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 	return ret;
 }
 
+/**
+ * zbd_get_max_active_zones - Get the maximum number of active zones
+ * @td: FIO thread data
+ * @f: FIO file for which to get max active zones
+ *
+ * Returns the max_active_zones limit of the target file if it is available.
+ * Otherwise returns zero, which means no limit.
+ */
+static unsigned int zbd_get_max_active_zones(struct thread_data *td,
+					     struct fio_file *f)
+{
+	unsigned int max_active_zones;
+	int ret;
+
+	if (td->io_ops && td->io_ops->get_max_active_zones)
+		ret = td->io_ops->get_max_active_zones(td, f,
+						       &max_active_zones);
+	else
+		ret = blkzoned_get_max_active_zones(td, f, &max_active_zones);
+	if (ret < 0) {
+		dprint(FD_ZBD, "%s: max_active_zones is not available\n",
+		       f->file_name);
+		return 0;
+	}
+
+	return max_active_zones;
+}
+
 /**
  * __zbd_write_zone_get - Add a zone to the array of write zones.
  * @td: fio thread data.
@@ -927,6 +955,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
 		ilog2(zone_size) : 0;
 	f->zbd_info->nr_zones = nr_zones;
+	f->zbd_info->max_active_zones = zbd_get_max_active_zones(td, f);
 
 	if (same_zone_cap)
 		dprint(FD_ZBD, "Zone capacity = %"PRIu64" KB\n",
@@ -1247,7 +1276,11 @@ int zbd_setup_files(struct thread_data *td)
 		for (zi = f->min_zone; zi < f->max_zone; zi++) {
 			z = &zbd->zone_info[zi];
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
-			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
+			    z->cond != ZBD_ZONE_COND_EXP_OPEN &&
+			    z->cond != ZBD_ZONE_COND_CLOSED)
+				continue;
+			if (!zbd->max_active_zones &&
+			    z->cond == ZBD_ZONE_COND_CLOSED)
 				continue;
 			if (__zbd_write_zone_get(td, f, z))
 				continue;
@@ -2210,3 +2243,15 @@ int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u)
 
 	return io_u_completed;
 }
+
+void zbd_log_err(const struct thread_data *td, const struct io_u *io_u)
+{
+	const struct fio_file *f = io_u->file;
+
+	if (td->o.zone_mode != ZONE_MODE_ZBD)
+		return;
+
+	if (io_u->error == EOVERFLOW)
+		log_err("%s: Exceeded max_active_zones limit. Check conditions of zones out of I/O ranges.\n",
+			f->file_name);
+}
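The zbd_setup_files() change above makes zones that are already closed count as write target zones, but only when the device reports a nonzero max_active_zones. Closed zones consume the active-zone budget just like open ones, so ignoring them is what made test66 hit the limit. The following is a minimal sketch of that selection rule, assuming a simplified, hypothetical zone_cond enum that stands in for fio's ZBD_ZONE_COND_* values; it does not model __zbd_write_zone_get() refusing a zone once the write-zone limit is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of zone conditions, modeled on ZBD_ZONE_COND_*. */
enum zone_cond {
	COND_EMPTY,
	COND_IMP_OPEN,
	COND_EXP_OPEN,
	COND_CLOSED,
	COND_FULL,
};

/*
 * Should a zone found at job start be tracked as a write target zone?
 * Open zones always are. Closed zones are only when the device enforces
 * a max_active_zones limit (nonzero), since they count against it.
 */
static bool is_initial_write_zone(enum zone_cond cond, uint32_t max_active_zones)
{
	switch (cond) {
	case COND_IMP_OPEN:
	case COND_EXP_OPEN:
		return true;
	case COND_CLOSED:
		return max_active_zones != 0;
	default:
		return false;
	}
}

/* Count the zones in conds[0..nr) that would be picked up at setup. */
static unsigned int count_initial_write_zones(const enum zone_cond *conds,
					      unsigned int nr,
					      uint32_t max_active_zones)
{
	unsigned int i, n = 0;

	for (i = 0; i < nr; i++)
		if (is_initial_write_zone(conds[i], max_active_zones))
			n++;
	return n;
}
```

With one implicitly open, one explicitly open and two closed zones, the sketch tracks two zones when max_active_zones is 0 and four when it is nonzero.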
diff --git a/zbd.h b/zbd.h
index f0ac9876..5750a0b8 100644
--- a/zbd.h
+++ b/zbd.h
@@ -52,6 +52,9 @@ struct fio_zone_info {
  *      are simultaneously written. A zero value means unlimited zones of
  *      simultaneous writes and that write target zones will not be tracked in
  *      the write_zones array.
+ * @max_active_zones: device-side limit on the number of sequential write zones
+ *	in the open or closed condition. A zero value means no limit on the
+ *	number of zones in these conditions.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.
@@ -75,6 +78,7 @@ struct fio_zone_info {
 struct zoned_block_device_info {
 	enum zbd_zoned_model	model;
 	uint32_t		max_write_zones;
+	uint32_t		max_active_zones;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		wp_valid_data_bytes;
@@ -101,6 +105,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u);
+void zbd_log_err(const struct thread_data *td, const struct io_u *io_u);
 
 static inline void zbd_close_file(struct fio_file *f)
 {


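zbd_get_max_active_zones() treats any failure to obtain the limit as "no limit" and returns 0. On Linux, zoned block devices expose the limit in sysfs as /sys/block/<dev>/queue/max_active_zones, where 0 also means unlimited. The sketch below shows that fail-safe read, assuming direct access to a sysfs-style file; read_max_active_zones() is a hypothetical helper and not fio's blkzoned_get_max_active_zones().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a max_active_zones value from a sysfs-style file such as
 * /sys/block/<dev>/queue/max_active_zones. A missing file or an
 * unparsable value is reported as 0 ("no limit"), mirroring the
 * fallback in zbd_get_max_active_zones().
 */
static unsigned int read_max_active_zones(const char *path)
{
	unsigned int val;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%u", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}
```

For a null_blk device configured with zone_max_active, read_max_active_zones("/sys/block/nullb0/queue/max_active_zones") would return the configured value; for a device without the attribute it returns 0.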
* Recent changes (master)
@ 2023-07-16 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-16 12:00 UTC
  To: fio

The following changes since commit 14adf6e31487aa2bc8e47cd037428036089a3834:

  thinktime: Avoid calculating a negative time left to wait (2023-07-14 14:03:34 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 04361e9a23d6e0448fd6fbbd4e14ecdfff60e314:

  Merge branch 'patch-3' of https://github.com/yangjueji/fio (2023-07-15 09:57:43 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'patch-3' of https://github.com/yangjueji/fio

Jueji Yang (1):
      fix: io_uring sqpoll issue_time empty when kernel not yet read sq

 engines/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 407d65ce..f30a3c00 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -646,7 +646,7 @@ static int fio_ioring_commit(struct thread_data *td)
 	 */
 	if (o->sqpoll_thread) {
 		struct io_sq_ring *ring = &ld->sq_ring;
-		unsigned start = *ld->sq_ring.head;
+		unsigned start = *ld->sq_ring.tail - ld->queued;
 		unsigned flags;
 
 		flags = atomic_load_acquire(ring->flags);


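io_uring ring indices are free-running 32-bit counters that are reduced to a slot with the ring mask. The entries fio_ioring_commit() has just published therefore sit at tail - queued through tail - 1. The head index belongs to the consumer (the SQPOLL kernel thread): if that thread has not read an earlier batch yet, head still points at old entries, which is how issue_time ended up empty for the new ones. This is a minimal model of the index arithmetic, assuming a hypothetical sq_ring_model struct in place of fio's struct io_sq_ring.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free-running ring indices: head and tail only increase (wrapping at
 * 2^32) and map to a slot with "& ring_mask"; the ring size is a power
 * of two.
 */
struct sq_ring_model {
	uint32_t head;      /* advanced by the consumer (SQPOLL thread) */
	uint32_t tail;      /* advanced by the producer (fio) */
	uint32_t ring_mask; /* number of ring entries - 1 */
};

/* Slot of the first of the 'queued' entries most recently published. */
static uint32_t first_new_slot(const struct sq_ring_model *r, uint32_t queued)
{
	/* Unsigned subtraction is defined modulo 2^32, so this stays
	 * correct after tail wraps around. */
	return (r->tail - queued) & r->ring_mask;
}
```

With an 8-entry ring, head = 0 and tail = 7 after queueing 4 new entries on top of 3 unread ones, the new batch starts at slot 3, while head still points at slot 0.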
* Recent changes (master)
@ 2023-07-15 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-15 12:00 UTC
  To: fio

The following changes since commit 270316dd2566346a12cfdf3cbe9996a88307f87d:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-07-13 15:28:20 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 14adf6e31487aa2bc8e47cd037428036089a3834:

  thinktime: Avoid calculating a negative time left to wait (2023-07-14 14:03:34 -0400)

----------------------------------------------------------------
Michael Kelley (1):
      thinktime: Avoid calculating a negative time left to wait

Vincent Fu (2):
      stat: add new diskutil sectors to json output
      stat: add diskutil aggregated sectors to normal output

 backend.c | 11 ++++++++++-
 stat.c    | 14 +++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d67a4a07..b06a11a5 100644
--- a/backend.c
+++ b/backend.c
@@ -897,7 +897,16 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 	if (left)
 		total = usec_spin(left);
 
-	left = td->o.thinktime - total;
+	/*
+	 * usec_spin() might run for slightly longer than intended in a VM
+	 * where the vCPU could get descheduled or the hypervisor could steal
+	 * CPU time. Ensure "left" doesn't become negative.
+	 */
+	if (total < td->o.thinktime)
+		left = td->o.thinktime - total;
+	else
+		left = 0;
+
 	if (td->o.timeout) {
 		runtime_left = td->o.timeout - utime_since_now(&td->epoch);
 		if (runtime_left < (unsigned long long)left)
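The thinktime fix guards a subtraction that can go wrong when usec_spin() overshoots. With unsigned operands, thinktime - total does not go slightly negative; it wraps to a huge value. The sketch below isolates the clamped computation. It uses uint64_t for every quantity, which is a simplification: the variables in handle_thinktime() have their own types. thinktime_left() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Remaining think time in usec after spinning for 'spun' usec out of
 * 'thinktime'. If the spin overran (for example, a descheduled vCPU),
 * return 0 instead of letting thinktime - spun wrap around.
 */
static uint64_t thinktime_left(uint64_t thinktime, uint64_t spun)
{
	return spun < thinktime ? thinktime - spun : 0;
}
```

A 100 usec think time with a 137 usec spin yields 0 here, whereas the unguarded unsigned subtraction would yield 2^64 - 37.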
diff --git a/stat.c b/stat.c
index ced73645..7fad73d1 100644
--- a/stat.c
+++ b/stat.c
@@ -957,11 +957,13 @@ static void show_agg_stats(struct disk_util_agg *agg, int terse,
 		return;
 
 	if (!terse) {
-		log_buf(out, ", aggrios=%llu/%llu, aggrmerge=%llu/%llu, "
-			 "aggrticks=%llu/%llu, aggrin_queue=%llu, "
-			 "aggrutil=%3.2f%%",
+		log_buf(out, ", aggrios=%llu/%llu, aggsectors=%llu/%llu, "
+			 "aggrmerge=%llu/%llu, aggrticks=%llu/%llu, "
+			 "aggrin_queue=%llu, aggrutil=%3.2f%%",
 			(unsigned long long) agg->ios[0] / agg->slavecount,
 			(unsigned long long) agg->ios[1] / agg->slavecount,
+			(unsigned long long) agg->sectors[0] / agg->slavecount,
+			(unsigned long long) agg->sectors[1] / agg->slavecount,
 			(unsigned long long) agg->merges[0] / agg->slavecount,
 			(unsigned long long) agg->merges[1] / agg->slavecount,
 			(unsigned long long) agg->ticks[0] / agg->slavecount,
@@ -1084,6 +1086,8 @@ void json_array_add_disk_util(struct disk_util_stat *dus,
 	json_object_add_value_string(obj, "name", (const char *)dus->name);
 	json_object_add_value_int(obj, "read_ios", dus->s.ios[0]);
 	json_object_add_value_int(obj, "write_ios", dus->s.ios[1]);
+	json_object_add_value_int(obj, "read_sectors", dus->s.sectors[0]);
+	json_object_add_value_int(obj, "write_sectors", dus->s.sectors[1]);
 	json_object_add_value_int(obj, "read_merges", dus->s.merges[0]);
 	json_object_add_value_int(obj, "write_merges", dus->s.merges[1]);
 	json_object_add_value_int(obj, "read_ticks", dus->s.ticks[0]);
@@ -1101,6 +1105,10 @@ void json_array_add_disk_util(struct disk_util_stat *dus,
 				agg->ios[0] / agg->slavecount);
 	json_object_add_value_int(obj, "aggr_write_ios",
 				agg->ios[1] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_read_sectors",
+				agg->sectors[0] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_write_sectors",
+				agg->sectors[1] / agg->slavecount);
 	json_object_add_value_int(obj, "aggr_read_merges",
 				agg->merges[0] / agg->slavecount);
 	json_object_add_value_int(obj, "aggr_write_merge",


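The new sectors and aggr_*_sectors fields come from the block layer's stat counters, which count in fixed 512-byte units regardless of the device's logical block size. Like the other aggregated counters, the aggregated values are divided by agg->slavecount. The sketch below covers both conversions. The helper names are hypothetical, and the zero-slavecount guard is an addition for safety in this standalone form.

```c
#include <assert.h>
#include <stdint.h>

/* Block layer stat counters use fixed 512-byte sectors. */
#define STAT_SECTOR_SIZE 512ULL

static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * STAT_SECTOR_SIZE;
}

/*
 * Per-slave average, as show_agg_stats() and json_array_add_disk_util()
 * compute it: integer division by the number of member devices.
 */
static uint64_t agg_average(uint64_t total, uint32_t slavecount)
{
	return slavecount ? total / slavecount : 0;
}
```

For the sample line in the HOWTO, sectors=32321/65472 corresponds to 16548352 bytes read and 33521664 bytes written.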
* Recent changes (master)
@ 2023-07-14 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-14 12:00 UTC
  To: fio

The following changes since commit 8e2b81b854286f32eae7951a434dddebd968f9d5:

  zbd: Support finishing zones on Android (2023-07-05 15:48:11 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 270316dd2566346a12cfdf3cbe9996a88307f87d:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-07-13 15:28:20 -0600)

----------------------------------------------------------------
Ankit Kumar (4):
      fdp: use macros
      fdp: fix placement id check
      fdp: support random placement id selection
      engines/xnvme: add support for fdp

Bart Van Assche (5):
      diskutil: Improve disk utilization data structure documentation
      diskutil: Remove casts from get_io_ticks()
      diskutil: Simplify get_io_ticks()
      diskutil: Fix a debug statement in get_io_ticks()
      diskutil: Report how many sectors have been read and written

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

Vincent Fu (1):
      options: add code for FDP pli selection use in client/server mode

 HOWTO.rst              | 23 +++++++++++++--
 cconv.c                |  2 ++
 configure              |  2 +-
 diskutil.c             | 29 +++++++------------
 diskutil.h             | 12 +++++++-
 engines/io_uring.c     |  2 +-
 engines/xnvme.c        | 78 +++++++++++++++++++++++++++++++++++++++++++++++++-
 examples/xnvme-fdp.fio | 36 +++++++++++++++++++++++
 fdp.c                  | 22 ++++++++------
 fdp.h                  | 13 +++++++++
 fio.1                  | 22 ++++++++++++--
 fio.h                  |  2 ++
 init.c                 |  2 ++
 options.c              | 20 +++++++++++++
 stat.c                 |  7 +++--
 thread_options.h       |  2 ++
 16 files changed, 236 insertions(+), 38 deletions(-)
 create mode 100644 examples/xnvme-fdp.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 2e1e55c2..7ae8ea7b 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2431,11 +2431,26 @@ with the caveat that when used on the command line, they must come after the
 	For direct I/O, requests will only succeed if cache invalidation isn't required,
 	file blocks are fully allocated and the disk request could be issued immediately.
 
-.. option:: fdp=bool : [io_uring_cmd]
+.. option:: fdp=bool : [io_uring_cmd] [xnvme]
 
 	Enable Flexible Data Placement mode for write commands.
 
-.. option:: fdp_pli=str : [io_uring_cmd]
+.. option:: fdp_pli_select=str : [io_uring_cmd] [xnvme]
+
+	Defines how fio decides which placement ID to use next. The following
+	types are defined:
+
+		**random**
+			Choose a placement ID at random (uniform).
+
+		**roundrobin**
+			Round robin over available placement IDs. This is the
+			default.
+
+	The available placement ID index/indices are defined by the option
+	:option:`fdp_pli`.
+
+.. option:: fdp_pli=str : [io_uring_cmd] [xnvme]
 
 	Select which Placement ID Index/Indicies this job is allowed to use for
 	writes. By default, the job will cycle through all available Placement
@@ -4513,13 +4528,15 @@ For each data direction it prints:
 And finally, the disk statistics are printed. This is Linux specific. They will look like this::
 
   Disk stats (read/write):
-    sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
+    sda: ios=16398/16511, sectors=32321/65472, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
 
 Each value is printed for both reads and writes, with reads first. The
 numbers denote:
 
 **ios**
 		Number of I/Os performed by all groups.
+**sectors**
+		Amount of data transferred in units of 512 bytes for all groups.
 **merge**
 		Number of merges performed by the I/O scheduler.
 **ticks**
diff --git a/cconv.c b/cconv.c
index 9095d519..1bfa770f 100644
--- a/cconv.c
+++ b/cconv.c
@@ -351,6 +351,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 		o->merge_blktrace_iters[i].u.f = fio_uint64_to_double(le64_to_cpu(top->merge_blktrace_iters[i].u.i));
 
 	o->fdp = le32_to_cpu(top->fdp);
+	o->fdp_pli_select = le32_to_cpu(top->fdp_pli_select);
 	o->fdp_nrpli = le32_to_cpu(top->fdp_nrpli);
 	for (i = 0; i < o->fdp_nrpli; i++)
 		o->fdp_plis[i] = le32_to_cpu(top->fdp_plis[i]);
@@ -645,6 +646,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 		top->merge_blktrace_iters[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->merge_blktrace_iters[i].u.f));
 
 	top->fdp = cpu_to_le32(o->fdp);
+	top->fdp_pli_select = cpu_to_le32(o->fdp_pli_select);
 	top->fdp_nrpli = cpu_to_le32(o->fdp_nrpli);
 	for (i = 0; i < o->fdp_nrpli; i++)
 		top->fdp_plis[i] = cpu_to_le32(o->fdp_plis[i]);
diff --git a/configure b/configure
index 74416fd4..6c938251 100755
--- a/configure
+++ b/configure
@@ -2651,7 +2651,7 @@ fi
 ##########################################
 # Check if we have xnvme
 if test "$xnvme" != "no" ; then
-  if check_min_lib_version xnvme 0.2.0; then
+  if check_min_lib_version xnvme 0.7.0; then
     xnvme="yes"
     xnvme_cflags=$(pkg-config --cflags xnvme)
     xnvme_libs=$(pkg-config --libs xnvme)
diff --git a/diskutil.c b/diskutil.c
index ace7af3d..cf4ede85 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -1,3 +1,4 @@
+#include <inttypes.h>
 #include <stdio.h>
 #include <string.h>
 #include <sys/types.h>
@@ -44,8 +45,6 @@ static void disk_util_free(struct disk_util *du)
 
 static int get_io_ticks(struct disk_util *du, struct disk_util_stat *dus)
 {
-	unsigned in_flight;
-	unsigned long long sectors[2];
 	char line[256];
 	FILE *f;
 	char *p;
@@ -65,23 +64,17 @@ static int get_io_ticks(struct disk_util *du, struct disk_util_stat *dus)
 
 	dprint(FD_DISKUTIL, "%s: %s", du->path, p);
 
-	ret = sscanf(p, "%llu %llu %llu %llu %llu %llu %llu %llu %u %llu %llu\n",
-				(unsigned long long *) &dus->s.ios[0],
-				(unsigned long long *) &dus->s.merges[0],
-				&sectors[0],
-				(unsigned long long *) &dus->s.ticks[0],
-				(unsigned long long *) &dus->s.ios[1],
-				(unsigned long long *) &dus->s.merges[1],
-				&sectors[1],
-				(unsigned long long *) &dus->s.ticks[1],
-				&in_flight,
-				(unsigned long long *) &dus->s.io_ticks,
-				(unsigned long long *) &dus->s.time_in_queue);
+	ret = sscanf(p, "%"SCNu64" %"SCNu64" %"SCNu64" %"SCNu64" "
+		     "%"SCNu64" %"SCNu64" %"SCNu64" %"SCNu64" "
+		     "%*u %"SCNu64" %"SCNu64"\n",
+		     &dus->s.ios[0], &dus->s.merges[0], &dus->s.sectors[0],
+		     &dus->s.ticks[0],
+		     &dus->s.ios[1], &dus->s.merges[1], &dus->s.sectors[1],
+		     &dus->s.ticks[1],
+		     &dus->s.io_ticks, &dus->s.time_in_queue);
 	fclose(f);
-	dprint(FD_DISKUTIL, "%s: stat read ok? %d\n", du->path, ret == 1);
-	dus->s.sectors[0] = sectors[0];
-	dus->s.sectors[1] = sectors[1];
-	return ret != 11;
+	dprint(FD_DISKUTIL, "%s: stat read ok? %d\n", du->path, ret == 10);
+	return ret != 10;
 }
 
 static void update_io_tick_disk(struct disk_util *du)
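The rewritten get_io_ticks() reads the sector counts directly into dus->s and uses %*u to skip the in_flight field. Because suppressed conversions are not counted in sscanf()'s return value, a complete parse now yields 10 rather than 11. The sketch below reproduces that parse on a plain string, assuming a hypothetical du_stats struct in place of struct disk_util_stats. Any trailing fields that newer kernels append (discard and flush counters) are simply ignored by sscanf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the fields kept from /sys/block/<dev>/stat. */
struct du_stats {
	uint64_t ios[2], merges[2], sectors[2], ticks[2];
	uint64_t io_ticks, time_in_queue;
};

/*
 * Parse the first 11 stat fields; field 9 (in_flight) is skipped with
 * "%*u", so a complete parse performs 10 assignments.
 * Returns 0 on success and 1 on a short parse, like get_io_ticks().
 */
static int parse_disk_stat(const char *line, struct du_stats *s)
{
	int ret = sscanf(line,
			 "%" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64
			 " %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64
			 " %*u %" SCNu64 " %" SCNu64,
			 &s->ios[0], &s->merges[0], &s->sectors[0], &s->ticks[0],
			 &s->ios[1], &s->merges[1], &s->sectors[1], &s->ticks[1],
			 &s->io_ticks, &s->time_in_queue);

	return ret != 10;
}
```

Using SCNu64 also removes the need for the (unsigned long long *) casts, since the conversion specifier now matches the uint64_t members exactly.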
diff --git a/diskutil.h b/diskutil.h
index 7d7ef802..9dca42c4 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -7,6 +7,16 @@
 #include "helper_thread.h"
 #include "fio_sem.h"
 
+/**
+ * @ios: Number of I/O operations that have been completed successfully.
+ * @merges: Number of I/O operations that have been merged.
+ * @sectors: I/O size in 512-byte units.
+ * @ticks: Time spent on I/O in milliseconds.
+ * @io_ticks: Time during which the device had I/O in flight, in milliseconds.
+ * @time_in_queue: Weighted time spent doing I/O in milliseconds.
+ *
+ * For the array members, index 0 refers to reads and index 1 refers to writes.
+ */
 struct disk_util_stats {
 	uint64_t ios[2];
 	uint64_t merges[2];
@@ -18,7 +28,7 @@ struct disk_util_stats {
 };
 
 /*
- * Disk utils as read in /sys/block/<dev>/stat
+ * Disk utilization as read from /sys/block/<dev>/stat
  */
 struct disk_util_stat {
 	uint8_t name[FIO_DU_NAME_SZ];
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5021239e..407d65ce 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1310,7 +1310,7 @@ static int fio_ioring_cmd_fetch_ruhs(struct thread_data *td, struct fio_file *f,
 	struct nvme_fdp_ruh_status *ruhs;
 	int bytes, ret, i;
 
-	bytes = sizeof(*ruhs) + 128 * sizeof(struct nvme_fdp_ruh_status_desc);
+	bytes = sizeof(*ruhs) + FDP_MAX_RUHS * sizeof(struct nvme_fdp_ruh_status_desc);
 	ruhs = scalloc(1, bytes);
 	if (!ruhs)
 		return -ENOMEM;
diff --git a/engines/xnvme.c b/engines/xnvme.c
index bb92a121..ce7b2bdd 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -16,6 +16,7 @@
 #include <libxnvme_spec_fs.h>
 #include "fio.h"
 #include "zbd_types.h"
+#include "fdp.h"
 #include "optgroup.h"
 
 static pthread_mutex_t g_serialize = PTHREAD_MUTEX_INITIALIZER;
@@ -509,6 +510,7 @@ static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *i
 	uint16_t nlb;
 	int err;
 	bool vectored_io = ((struct xnvme_fioe_options *)td->eo)->xnvme_iovec;
+	uint32_t dir = io_u->dtype;
 
 	fio_ro_check(td, io_u);
 
@@ -524,6 +526,10 @@ static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *i
 	ctx->cmd.common.nsid = nsid;
 	ctx->cmd.nvm.slba = slba;
 	ctx->cmd.nvm.nlb = nlb;
+	if (dir) {
+		ctx->cmd.nvm.dtype = io_u->dtype;
+		ctx->cmd.nvm.cdw13.dspec = io_u->dspec;
+	}
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
@@ -947,6 +953,72 @@ exit:
 	return err;
 }
 
+static int xnvme_fioe_fetch_ruhs(struct thread_data *td, struct fio_file *f,
+				 struct fio_ruhs_info *fruhs_info)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_dev *dev;
+	struct xnvme_spec_ruhs *ruhs;
+	struct xnvme_cmd_ctx ctx;
+	uint32_t ruhs_nbytes;
+	uint32_t nsid;
+	int err = 0, err_lock;
+
+	if (f->filetype != FIO_TYPE_CHAR) {
+		log_err("ioeng->fdp_ruhs(): ignoring filetype: %d\n", f->filetype);
+		return -EINVAL;
+	}
+
+	err = pthread_mutex_lock(&g_serialize);
+	if (err) {
+		log_err("ioeng->fdp_ruhs(): pthread_mutex_lock(), err(%d)\n", err);
+		return -err;
+	}
+
+	dev = xnvme_dev_open(f->file_name, &opts);
+	if (!dev) {
+		log_err("ioeng->fdp_ruhs(): xnvme_dev_open(%s) failed, errno: %d\n",
+			f->file_name, errno);
+		err = -errno;
+		goto exit;
+	}
+
+	ruhs_nbytes = sizeof(*ruhs) + (FDP_MAX_RUHS * sizeof(struct xnvme_spec_ruhs_desc));
+	ruhs = xnvme_buf_alloc(dev, ruhs_nbytes);
+	if (!ruhs) {
+		err = -errno;
+		goto exit;
+	}
+	memset(ruhs, 0, ruhs_nbytes);
+
+	ctx = xnvme_cmd_ctx_from_dev(dev);
+	nsid = xnvme_dev_get_nsid(dev);
+
+	err = xnvme_nvm_mgmt_recv(&ctx, nsid, XNVME_SPEC_IO_MGMT_RECV_RUHS, 0, ruhs, ruhs_nbytes);
+
+	if (err || xnvme_cmd_ctx_cpl_status(&ctx)) {
+		err = err ? err : -EIO;
+		log_err("ioeng->fdp_ruhs(): err(%d), sc(%d)", err, ctx.cpl.status.sc);
+		goto free_buffer;
+	}
+
+	fruhs_info->nr_ruhs = ruhs->nruhsd;
+	for (uint32_t idx = 0; idx < fruhs_info->nr_ruhs; ++idx) {
+		fruhs_info->plis[idx] = le16_to_cpu(ruhs->desc[idx].pi);
+	}
+
+free_buffer:
+	xnvme_buf_free(dev, ruhs);
+exit:
+	xnvme_dev_close(dev);
+
+	err_lock = pthread_mutex_unlock(&g_serialize);
+	if (err_lock)
+		log_err("ioeng->fdp_ruhs(): pthread_mutex_unlock(), err(%d)\n", err_lock);
+
+	return err;
+}
+
 static int xnvme_fioe_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
@@ -971,7 +1043,9 @@ static int xnvme_fioe_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	f->real_file_size = xnvme_dev_get_geo(dev)->tbytes;
 	fio_file_set_size_known(f);
-	f->filetype = FIO_TYPE_BLOCK;
+
+	if (td->o.zone_mode == ZONE_MODE_ZBD)
+		f->filetype = FIO_TYPE_BLOCK;
 
 exit:
 	xnvme_dev_close(dev);
@@ -1011,6 +1085,8 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.get_zoned_model = xnvme_fioe_get_zoned_model,
 	.report_zones = xnvme_fioe_report_zones,
 	.reset_wp = xnvme_fioe_reset_wp,
+
+	.fdp_fetch_ruhs = xnvme_fioe_fetch_ruhs,
 };
 
 static void fio_init fio_xnvme_register(void)
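Both fetch_ruhs implementations size their buffer as a fixed header plus FDP_MAX_RUHS descriptors, and fdp.c caps nr_ruhs at the same constant. The sketch below shows that flexible-array-member allocation and a clamp of the device-reported count before it is used as a loop bound. The struct layout is simplified and hypothetical (it is not the NVMe on-wire format), and MAX_RUHS stands in for FDP_MAX_RUHS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RUHS 128 /* stands in for FDP_MAX_RUHS */

/* Simplified, hypothetical reclaim unit handle status layout. */
struct ruh_desc {
	uint16_t pi;      /* placement identifier */
	uint16_t rsvd[3];
};

struct ruh_status {
	uint16_t nruhsd;        /* descriptor count reported by the device */
	uint16_t rsvd[7];
	struct ruh_desc desc[]; /* flexible array member */
};

/* Header plus room for MAX_RUHS descriptors. */
static size_t ruh_status_bytes(void)
{
	return sizeof(struct ruh_status) + MAX_RUHS * sizeof(struct ruh_desc);
}

/* Never index past the buffer, whatever count the device reports. */
static uint32_t clamp_nr_ruhs(uint32_t reported)
{
	return reported > MAX_RUHS ? MAX_RUHS : reported;
}
```

Allocating with calloc(1, ruh_status_bytes()) gives a zeroed buffer in which desc[0] through desc[MAX_RUHS - 1] are valid.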
diff --git a/examples/xnvme-fdp.fio b/examples/xnvme-fdp.fio
new file mode 100644
index 00000000..86fbe0d3
--- /dev/null
+++ b/examples/xnvme-fdp.fio
@@ -0,0 +1,36 @@
+; README
+;
+; This job-file is intended to be used in one of the following ways:
+;
+; # Use the xNVMe io-engine with the io_uring_cmd async. impl.
+; fio examples/xnvme-fdp.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_async=io_uring_cmd \
+;   --filename=/dev/ng0n1
+;
+; # Use the xNVMe io-engine with the nvme sync. impl.
+; fio examples/xnvme-fdp.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_sync=nvme \
+;   --filename=/dev/ng0n1
+;
+; FIO_BS="512" FIO_RW="read" FIO_IODEPTH=16 fio examples/xnvme-fdp.fio \
+;   --section=override --ioengine=xnvme --xnvme_sync=nvme --filename=/dev/ng0n1
+;
+[global]
+rw=randwrite
+size=2M
+iodepth=1
+bs=4K
+thread=1
+fdp=1
+fdp_pli=4,5
+
+[default]
+
+[override]
+rw=${FIO_RW}
+iodepth=${FIO_IODEPTH}
+bs=${FIO_BS}
diff --git a/fdp.c b/fdp.c
index d92dbc67..49c80d2c 100644
--- a/fdp.c
+++ b/fdp.c
@@ -45,7 +45,7 @@ static int init_ruh_info(struct thread_data *td, struct fio_file *f)
 	struct fio_ruhs_info *ruhs, *tmp;
 	int i, ret;
 
-	ruhs = scalloc(1, sizeof(*ruhs) + 128 * sizeof(*ruhs->plis));
+	ruhs = scalloc(1, sizeof(*ruhs) + FDP_MAX_RUHS * sizeof(*ruhs->plis));
 	if (!ruhs)
 		return -ENOMEM;
 
@@ -56,8 +56,8 @@ static int init_ruh_info(struct thread_data *td, struct fio_file *f)
 		goto out;
 	}
 
-	if (ruhs->nr_ruhs > 128)
-		ruhs->nr_ruhs = 128;
+	if (ruhs->nr_ruhs > FDP_MAX_RUHS)
+		ruhs->nr_ruhs = FDP_MAX_RUHS;
 
 	if (td->o.fdp_nrpli == 0) {
 		f->ruhs_info = ruhs;
@@ -65,7 +65,7 @@ static int init_ruh_info(struct thread_data *td, struct fio_file *f)
 	}
 
 	for (i = 0; i < td->o.fdp_nrpli; i++) {
-		if (td->o.fdp_plis[i] > ruhs->nr_ruhs) {
+		if (td->o.fdp_plis[i] >= ruhs->nr_ruhs) {
 			ret = -EINVAL;
 			goto out;
 		}
@@ -119,10 +119,16 @@ void fdp_fill_dspec_data(struct thread_data *td, struct io_u *io_u)
 		return;
 	}
 
-	if (ruhs->pli_loc >= ruhs->nr_ruhs)
-		ruhs->pli_loc = 0;
+	if (td->o.fdp_pli_select == FIO_FDP_RR) {
+		if (ruhs->pli_loc >= ruhs->nr_ruhs)
+			ruhs->pli_loc = 0;
 
-	dspec = ruhs->plis[ruhs->pli_loc++];
-	io_u->dtype = 2;
+		dspec = ruhs->plis[ruhs->pli_loc++];
+	} else {
+		ruhs->pli_loc = rand_between(&td->fdp_state, 0, ruhs->nr_ruhs - 1);
+		dspec = ruhs->plis[ruhs->pli_loc];
+	}
+
+	io_u->dtype = FDP_DIR_DTYPE;
 	io_u->dspec = dspec;
 }
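fdp_fill_dspec_data() now picks the next placement ID either round-robin (the default) or uniformly at random. The same series tightens the fdp_pli bounds check from > to >=, because valid indices run from 0 to nr_ruhs - 1. This is a minimal sketch of both policies and the check. It assumes a xorshift32 generator as a stand-in for fio's frand_state and rand_between(); pli_state, next_pli() and pli_index_valid() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the intent of FIO_FDP_RANDOM / FIO_FDP_RR. */
enum pli_select { PLI_RANDOM = 0x1, PLI_RR = 0x2 };

struct pli_state {
	uint32_t nr;          /* number of usable placement IDs */
	uint32_t loc;         /* next round-robin position */
	uint32_t rng;         /* xorshift32 state, must be nonzero */
	const uint16_t *plis; /* placement IDs to choose from */
};

/* xorshift32: a stand-in generator, not fio's implementation. */
static uint32_t xorshift32(uint32_t *s)
{
	uint32_t x = *s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *s = x;
}

/* Valid user-supplied indices are 0..nr-1, hence ">=" in fdp.c. */
static int pli_index_valid(uint32_t idx, uint32_t nr)
{
	return idx < nr;
}

static uint16_t next_pli(struct pli_state *st, enum pli_select mode)
{
	assert(st->nr > 0);
	if (mode == PLI_RR) {
		if (st->loc >= st->nr)
			st->loc = 0;
		return st->plis[st->loc++];
	}
	/* Pick in [0, nr - 1]; modulo bias is negligible for nr <= 128. */
	st->loc = xorshift32(&st->rng) % st->nr;
	return st->plis[st->loc];
}
```

With plis {4, 5, 9}, round-robin returns 4, 5, 9, 4, ... while random mode returns one of those three on every call.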
diff --git a/fdp.h b/fdp.h
index 81691f62..accbac38 100644
--- a/fdp.h
+++ b/fdp.h
@@ -3,6 +3,19 @@
 
 #include "io_u.h"
 
+#define FDP_DIR_DTYPE	2
+#define FDP_MAX_RUHS	128
+
+/*
+ * How fio chooses which placement identifier to use next: uniformly at
+ * random, or round-robin.
+ */
+
+enum {
+	FIO_FDP_RANDOM	= 0x1,
+	FIO_FDP_RR	= 0x2,
+};
+
 struct fio_ruhs_info {
 	uint32_t nr_ruhs;
 	uint32_t pli_loc;
diff --git a/fio.1 b/fio.1
index 73b7e8c9..da875276 100644
--- a/fio.1
+++ b/fio.1
@@ -2192,10 +2192,26 @@ cached data. Currently the RWF_NOWAIT flag does not supported for cached write.
 For direct I/O, requests will only succeed if cache invalidation isn't required,
 file blocks are fully allocated and the disk request could be issued immediately.
 .TP
-.BI (io_uring_cmd)fdp \fR=\fPbool
+.BI (io_uring_cmd,xnvme)fdp \fR=\fPbool
 Enable Flexible Data Placement mode for write commands.
 .TP
-.BI (io_uring_cmd)fdp_pli \fR=\fPstr
+.BI (io_uring_cmd,xnvme)fdp_pli_select \fR=\fPstr
+Defines how fio decides which placement ID to use next. The following types
+are defined:
+.RS
+.RS
+.TP
+.B random
+Choose a placement ID at random (uniform).
+.TP
+.B roundrobin
+Round robin over available placement IDs. This is the default.
+.RE
+.P
+The available placement ID index/indices is defined by \fBfdp_pli\fR option.
+.RE
+.TP
+.BI (io_uring_cmd,xnvme)fdp_pli \fR=\fPstr
 Select which Placement ID Index/Indicies this job is allowed to use for writes.
 By default, the job will cycle through all available Placement IDs, so use this
 to isolate these identifiers to specific jobs. If you want fio to use placement
@@ -4168,7 +4184,7 @@ They will look like this:
 .P
 .nf
 		  Disk stats (read/write):
-		    sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
+		    sda: ios=16398/16511, sectors=32321/65472, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
 .fi
 .P
 Each value is printed for both reads and writes, with reads first. The
diff --git a/fio.h b/fio.h
index c5453d13..a54f57c9 100644
--- a/fio.h
+++ b/fio.h
@@ -144,6 +144,7 @@ enum {
 	FIO_RAND_POISSON3_OFF,
 	FIO_RAND_PRIO_CMDS,
 	FIO_RAND_DEDUPE_WORKING_SET_IX,
+	FIO_RAND_FDP_OFF,
 	FIO_RAND_NR_OFFS,
 };
 
@@ -262,6 +263,7 @@ struct thread_data {
 	struct frand_state verify_state_last_do_io;
 	struct frand_state trim_state;
 	struct frand_state delay_state;
+	struct frand_state fdp_state;
 
 	struct frand_state buf_state;
 	struct frand_state buf_state_prev;
diff --git a/init.c b/init.c
index 10e63cca..105339fa 100644
--- a/init.c
+++ b/init.c
@@ -1082,6 +1082,8 @@ void td_fill_rand_seeds(struct thread_data *td)
 
 	init_rand_seed(&td->buf_state, td->rand_seeds[FIO_RAND_BUF_OFF], use64);
 	frand_copy(&td->buf_state_prev, &td->buf_state);
+
+	init_rand_seed(&td->fdp_state, td->rand_seeds[FIO_RAND_FDP_OFF], use64);
 }
 
 static int setup_random_seeds(struct thread_data *td)
diff --git a/options.c b/options.c
index a7c4ef6e..0f739317 100644
--- a/options.c
+++ b/options.c
@@ -3679,6 +3679,26 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group  = FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "fdp_pli_select",
+		.lname	= "FDP Placement ID select",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, fdp_pli_select),
+		.help	= "Select which FDP placement ID to use next",
+		.def	= "roundrobin",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+		.posval	= {
+			  { .ival = "random",
+			    .oval = FIO_FDP_RANDOM,
+			    .help = "Choose a Placement ID at random (uniform)",
+			  },
+			  { .ival = "roundrobin",
+			    .oval = FIO_FDP_RR,
+			    .help = "Round robin select Placement IDs",
+			  },
+		},
+	},
 	{
 		.name	= "fdp_pli",
 		.lname	= "FDP Placement ID indicies",
diff --git a/stat.c b/stat.c
index 015b8e28..ced73645 100644
--- a/stat.c
+++ b/stat.c
@@ -1030,11 +1030,14 @@ void print_disk_util(struct disk_util_stat *dus, struct disk_util_agg *agg,
 		if (agg->slavecount)
 			log_buf(out, "  ");
 
-		log_buf(out, "  %s: ios=%llu/%llu, merge=%llu/%llu, "
-			 "ticks=%llu/%llu, in_queue=%llu, util=%3.2f%%",
+		log_buf(out, "  %s: ios=%llu/%llu, sectors=%llu/%llu, "
+			"merge=%llu/%llu, ticks=%llu/%llu, in_queue=%llu, "
+			"util=%3.2f%%",
 				dus->name,
 				(unsigned long long) dus->s.ios[0],
 				(unsigned long long) dus->s.ios[1],
+				(unsigned long long) dus->s.sectors[0],
+				(unsigned long long) dus->s.sectors[1],
 				(unsigned long long) dus->s.merges[0],
 				(unsigned long long) dus->s.merges[1],
 				(unsigned long long) dus->s.ticks[0],
diff --git a/thread_options.h b/thread_options.h
index a24ebee6..1715b36c 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -388,6 +388,7 @@ struct thread_options {
 
 #define FIO_MAX_PLIS 16
 	unsigned int fdp;
+	unsigned int fdp_pli_select;
 	unsigned int fdp_plis[FIO_MAX_PLIS];
 	unsigned int fdp_nrpli;
 
@@ -703,6 +704,7 @@ struct thread_options_pack {
 	uint32_t log_prio;
 
 	uint32_t fdp;
+	uint32_t fdp_pli_select;
 	uint32_t fdp_plis[FIO_MAX_PLIS];
 	uint32_t fdp_nrpli;
 

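The two fdp_pli_select policies added above can be sketched in isolation. This is a minimal sketch, not fio's code. The names struct ruhs_sketch, next_pli(), pli_index_valid() and the xorshift32() generator are hypothetical; xorshift32() stands in for fio's rand_between() on td->fdp_state. The sketch also shows why the bounds check on fdp_plis[i] became ">=": with nr_ruhs entries, the valid indices are 0 .. nr_ruhs - 1.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_RUHS 128 /* mirrors FDP_MAX_RUHS in the patch */

/* Hypothetical stand-in for the fio_ruhs_info fields used here. */
struct ruhs_sketch {
	uint32_t nr_ruhs;               /* number of usable placement IDs */
	uint32_t pli_loc;               /* cursor into plis[] */
	uint16_t plis[SKETCH_MAX_RUHS]; /* placement IDs, filled by the caller */
};

enum pli_select { PLI_RANDOM = 1, PLI_RR = 2 };

/* A user-supplied index is valid only if it is strictly below nr_ruhs. */
static int pli_index_valid(const struct ruhs_sketch *r, unsigned int idx)
{
	return idx < r->nr_ruhs;
}

/* Small PRNG used only for illustration; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

static uint16_t next_pli(struct ruhs_sketch *r, enum pli_select mode,
			 uint32_t *rng)
{
	assert(r->nr_ruhs > 0 && r->nr_ruhs <= SKETCH_MAX_RUHS);

	if (mode == PLI_RR) {
		/* Wrap the cursor, then hand out IDs in order. */
		if (r->pli_loc >= r->nr_ruhs)
			r->pli_loc = 0;
		return r->plis[r->pli_loc++];
	}

	/* Approximately uniform pick in [0, nr_ruhs - 1] (modulo bias ignored). */
	r->pli_loc = xorshift32(rng) % r->nr_ruhs;
	return r->plis[r->pli_loc];
}
```

With plis = {10, 20, 30}, round robin yields 10, 20, 30, 10, and so on. The random mode can return the same ID on consecutive writes.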

* Recent changes (master)
@ 2023-07-06 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-06 12:00 UTC
  To: fio

The following changes since commit 41508de67c06661ff1d473d108a8a01912ade114:

  fio/server: fix confusing sk_out check (2023-07-03 09:16:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8e2b81b854286f32eae7951a434dddebd968f9d5:

  zbd: Support finishing zones on Android (2023-07-05 15:48:11 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      zbd: Support finishing zones on Android

Jens Axboe (1):
      Merge branch 'makefile-hardening-cpp-flags' of https://github.com/proact-de/fio

Martin Steigerwald (1):
      Keep C pre processor hardening build flags.

Vincent Fu (4):
      engines/io_uring_cmd: make trims async
      engines/io_uring: remove dead code related to trim
      t/nvmept: add check for iodepth
      t/nvmept: add trim test with ioengine options enabled

 Makefile               |  2 +-
 engines/io_uring.c     | 49 ++++++++++----------------
 engines/nvme.c         | 96 ++++++++++++++++++++++++--------------------------
 engines/nvme.h         |  5 +--
 oslib/linux-blkzoned.c | 24 ++++++-------
 t/nvmept.py            | 21 +++++++++++
 6 files changed, 100 insertions(+), 97 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 6d7fd4e2..cc8164b2 100644
--- a/Makefile
+++ b/Makefile
@@ -20,7 +20,7 @@ include config-host.mak
 endif
 
 DEBUGFLAGS = -DFIO_INC_DEBUG
-CPPFLAGS= -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL $(DEBUGFLAGS)
+CPPFLAGS+= -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL $(DEBUGFLAGS)
 OPTFLAGS= -g -ffast-math
 FIO_CFLAGS= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 73e4a27a..5021239e 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -78,6 +78,8 @@ struct ioring_data {
 	struct ioring_mmap mmap[3];
 
 	struct cmdprio cmdprio;
+
+	struct nvme_dsm_range *dsm;
 };
 
 struct ioring_options {
@@ -410,7 +412,7 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 	if (o->cmd_type != FIO_URING_CMD_NVME)
 		return -EINVAL;
 
-	if (io_u->ddir == DDIR_TRIM)
+	if (io_u->ddir == DDIR_TRIM && td->io_ops->flags & FIO_ASYNCIO_SYNC_TRIM)
 		return 0;
 
 	sqe = &ld->sqes[(io_u->index) << 1];
@@ -444,7 +446,8 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 
 	cmd = (struct nvme_uring_cmd *)sqe->cmd;
 	return fio_nvme_uring_cmd_prep(cmd, io_u,
-			o->nonvectored ? NULL : &ld->iovecs[io_u->index]);
+			o->nonvectored ? NULL : &ld->iovecs[io_u->index],
+			&ld->dsm[io_u->index]);
 }
 
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
@@ -561,27 +564,6 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
 		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
-static int fio_ioring_cmd_io_u_trim(struct thread_data *td,
-				    struct io_u *io_u)
-{
-	struct fio_file *f = io_u->file;
-	int ret;
-
-	if (td->o.zone_mode == ZONE_MODE_ZBD) {
-		ret = zbd_do_io_u_trim(td, io_u);
-		if (ret == io_u_completed)
-			return io_u->xfer_buflen;
-		if (ret)
-			goto err;
-	}
-
-	return fio_nvme_trim(td, f, io_u->offset, io_u->xfer_buflen);
-
-err:
-	io_u->error = ret;
-	return 0;
-}
-
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
@@ -594,14 +576,11 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (ld->queued == ld->iodepth)
 		return FIO_Q_BUSY;
 
-	if (io_u->ddir == DDIR_TRIM) {
+	if (io_u->ddir == DDIR_TRIM && td->io_ops->flags & FIO_ASYNCIO_SYNC_TRIM) {
 		if (ld->queued)
 			return FIO_Q_BUSY;
 
-		if (!strcmp(td->io_ops->name, "io_uring_cmd"))
-			fio_ioring_cmd_io_u_trim(td, io_u);
-		else
-			do_io_u_trim(td, io_u);
+		do_io_u_trim(td, io_u);
 
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
@@ -734,6 +713,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 		free(ld->io_u_index);
 		free(ld->iovecs);
 		free(ld->fds);
+		free(ld->dsm);
 		free(ld);
 	}
 }
@@ -1146,6 +1126,16 @@ static int fio_ioring_init(struct thread_data *td)
 		return 1;
 	}
 
+	/*
+	 * For io_uring_cmd, trims are async operations unless we are operating
+	 * in zbd mode where trim means zone reset.
+	 */
+	if (!strcmp(td->io_ops->name, "io_uring_cmd") && td_trim(td) &&
+	    td->o.zone_mode == ZONE_MODE_ZBD)
+		td->io_ops->flags |= FIO_ASYNCIO_SYNC_TRIM;
+	else
+		ld->dsm = calloc(ld->iodepth, sizeof(*ld->dsm));
+
 	return 0;
 }
 
@@ -1361,8 +1351,7 @@ static struct ioengine_ops ioengine_uring = {
 static struct ioengine_ops ioengine_uring_cmd = {
 	.name			= "io_uring_cmd",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD |
-					FIO_MEMALIGN | FIO_RAWIO |
+	.flags			= FIO_NO_OFFLOAD | FIO_MEMALIGN | FIO_RAWIO |
 					FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_cmd_post_init,
diff --git a/engines/nvme.c b/engines/nvme.c
index 1047ade2..b18ad4c2 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -5,8 +5,41 @@
 
 #include "nvme.h"
 
+static inline __u64 get_slba(struct nvme_data *data, struct io_u *io_u)
+{
+	if (data->lba_ext)
+		return io_u->offset / data->lba_ext;
+	else
+		return io_u->offset >> data->lba_shift;
+}
+
+static inline __u32 get_nlb(struct nvme_data *data, struct io_u *io_u)
+{
+	if (data->lba_ext)
+		return io_u->xfer_buflen / data->lba_ext - 1;
+	else
+		return (io_u->xfer_buflen >> data->lba_shift) - 1;
+}
+
+void fio_nvme_uring_cmd_trim_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
+				  struct nvme_dsm_range *dsm)
+{
+	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+
+	cmd->opcode = nvme_cmd_dsm;
+	cmd->nsid = data->nsid;
+	cmd->cdw10 = 0;
+	cmd->cdw11 = NVME_ATTRIBUTE_DEALLOCATE;
+	cmd->addr = (__u64) (uintptr_t) dsm;
+	cmd->data_len = sizeof(*dsm);
+
+	dsm->slba = get_slba(data, io_u);
+	/* nlb is a 1-based value for deallocate */
+	dsm->nlb = get_nlb(data, io_u) + 1;
+}
+
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-			    struct iovec *iov)
+			    struct iovec *iov, struct nvme_dsm_range *dsm)
 {
 	struct nvme_data *data = FILE_ENG_DATA(io_u->file);
 	__u64 slba;
@@ -14,21 +47,23 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 
 	memset(cmd, 0, sizeof(struct nvme_uring_cmd));
 
-	if (io_u->ddir == DDIR_READ)
+	switch (io_u->ddir) {
+	case DDIR_READ:
 		cmd->opcode = nvme_cmd_read;
-	else if (io_u->ddir == DDIR_WRITE)
+		break;
+	case DDIR_WRITE:
 		cmd->opcode = nvme_cmd_write;
-	else
+		break;
+	case DDIR_TRIM:
+		fio_nvme_uring_cmd_trim_prep(cmd, io_u, dsm);
+		return 0;
+	default:
 		return -ENOTSUP;
-
-	if (data->lba_ext) {
-		slba = io_u->offset / data->lba_ext;
-		nlb = (io_u->xfer_buflen / data->lba_ext) - 1;
-	} else {
-		slba = io_u->offset >> data->lba_shift;
-		nlb = (io_u->xfer_buflen >> data->lba_shift) - 1;
 	}
 
+	slba = get_slba(data, io_u);
+	nlb = get_nlb(data, io_u);
+
 	/* cdw10 and cdw11 represent starting lba */
 	cmd->cdw10 = slba & 0xffffffff;
 	cmd->cdw11 = slba >> 32;
@@ -48,45 +83,6 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	return 0;
 }
 
-static int nvme_trim(int fd, __u32 nsid, __u32 nr_range, __u32 data_len,
-		     void *data)
-{
-	struct nvme_passthru_cmd cmd = {
-		.opcode		= nvme_cmd_dsm,
-		.nsid		= nsid,
-		.addr		= (__u64)(uintptr_t)data,
-		.data_len 	= data_len,
-		.cdw10		= nr_range - 1,
-		.cdw11		= NVME_ATTRIBUTE_DEALLOCATE,
-	};
-
-	return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
-}
-
-int fio_nvme_trim(const struct thread_data *td, struct fio_file *f,
-		  unsigned long long offset, unsigned long long len)
-{
-	struct nvme_data *data = FILE_ENG_DATA(f);
-	struct nvme_dsm_range dsm;
-	int ret;
-
-	if (data->lba_ext) {
-		dsm.nlb = len / data->lba_ext;
-		dsm.slba = offset / data->lba_ext;
-	} else {
-		dsm.nlb = len >> data->lba_shift;
-		dsm.slba = offset >> data->lba_shift;
-	}
-
-	ret = nvme_trim(f->fd, data->nsid, 1, sizeof(struct nvme_dsm_range),
-			&dsm);
-	if (ret)
-		log_err("%s: nvme_trim failed for offset %llu and len %llu, err=%d\n",
-			f->file_name, offset, len, ret);
-
-	return ret;
-}
-
 static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 			 enum nvme_csi csi, void *data)
 {
diff --git a/engines/nvme.h b/engines/nvme.h
index f7cb820d..238471dd 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -216,9 +216,6 @@ struct nvme_dsm_range {
 	__le64	slba;
 };
 
-int fio_nvme_trim(const struct thread_data *td, struct fio_file *f,
-		  unsigned long long offset, unsigned long long len);
-
 int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
 			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes);
 
@@ -226,7 +223,7 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 		      __u32 *ms, __u64 *nlba);
 
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
-			    struct iovec *iov);
+			    struct iovec *iov, struct nvme_dsm_range *dsm);
 
 int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
 			     enum zbd_zoned_model *model);
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index c3130d0e..722e0992 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -22,6 +22,9 @@
 #include "zbd_types.h"
 
 #include <linux/blkzoned.h>
+#ifndef BLKFINISHZONE
+#define BLKFINISHZONE _IOW(0x12, 136, struct blk_zone_range)
+#endif
 
 /*
  * If the uapi headers installed on the system lacks zone capacity support,
@@ -312,7 +315,6 @@ int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 int blkzoned_finish_zone(struct thread_data *td, struct fio_file *f,
 			 uint64_t offset, uint64_t length)
 {
-#ifdef BLKFINISHZONE
 	struct blk_zone_range zr = {
 		.sector         = offset >> 9,
 		.nr_sectors     = length >> 9,
@@ -327,21 +329,19 @@ int blkzoned_finish_zone(struct thread_data *td, struct fio_file *f,
 			return -errno;
 	}
 
-	if (ioctl(fd, BLKFINISHZONE, &zr) < 0)
+	if (ioctl(fd, BLKFINISHZONE, &zr) < 0) {
 		ret = -errno;
+		/*
+		 * Kernel versions older than 5.5 do not support BLKFINISHZONE
+		 * and return the ENOTTY error code. These old kernels only
+		 * support block devices that close zones automatically.
+		 */
+		if (ret == ENOTTY)
+			ret = 0;
+	}
 
 	if (f->fd < 0)
 		close(fd);
 
 	return ret;
-#else
-	/*
-	 * Kernel versions older than 5.5 does not support BLKFINISHZONE. These
-	 * old kernels assumed zones are closed automatically at max_open_zones
-	 * limit. Also they did not support max_active_zones limit. Then there
-	 * was no need to finish zones to avoid errors caused by max_open_zones
-	 * or max_active_zones. For those old versions, just do nothing.
-	 */
-	return 0;
-#endif
 }
diff --git a/t/nvmept.py b/t/nvmept.py
index e235d160..cc26d152 100755
--- a/t/nvmept.py
+++ b/t/nvmept.py
@@ -80,6 +80,10 @@ class PassThruTest(FioJobCmdTest):
             print(f"Unhandled rw value {self.fio_opts['rw']}")
             self.passed = False
 
+        if job['iodepth_level']['8'] < 95:
+            print("Did not achieve requested iodepth")
+            self.passed = False
+
 
 TEST_LIST = [
     {
@@ -232,6 +236,23 @@ TEST_LIST = [
             },
         "test_class": PassThruTest,
     },
+    {
+        # We can't enable fixedbufs because for trim-only
+        # workloads fio actually does not allocate any buffers
+        "test_id": 15,
+        "fio_opts": {
+            "rw": 'randtrim',
+            "timebased": 1,
+            "runtime": 3,
+            "fixedbufs": 0,
+            "nonvectored": 1,
+            "force_async": 1,
+            "registerfiles": 1,
+            "sqthread_poll": 1,
+            "output-format": "json",
+            },
+        "test_class": PassThruTest,
+    },
 ]
 
 def parse_args():

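Trims are now prepared as NVMe Dataset Management commands, so the LBA arithmetic shared by get_slba() and get_nlb() is easy to get wrong. Read and write commands encode the block count 0-based, while a DSM range carries it 1-based. Below is a minimal sketch of that arithmetic. It assumes either a power-of-two LBA size (lba_shift) or an extended LBA size in bytes (lba_ext). struct ns_geom, struct dsm_range_sketch and the helper names are hypothetical.

The command only carries a pointer to the range. Each in-flight trim therefore needs its own range buffer that stays valid until completion. This appears to be why the patch allocates one nvme_dsm_range per queue slot with calloc(ld->iodepth, ...).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of nvme_data: lba_ext != 0 means extended LBAs. */
struct ns_geom {
	uint32_t lba_shift; /* log2 of the LBA size when lba_ext == 0 */
	uint32_t lba_ext;   /* LBA size in bytes including metadata, or 0 */
};

/* Field order follows the NVMe DSM range; endian conversion is omitted. */
struct dsm_range_sketch {
	uint32_t cattr; /* context attributes */
	uint32_t nlb;   /* length in logical blocks, 1-based */
	uint64_t slba;  /* starting LBA */
};

static uint64_t slba_of(const struct ns_geom *g, uint64_t offset)
{
	return g->lba_ext ? offset / g->lba_ext : offset >> g->lba_shift;
}

/* 0-based block count, as encoded in read/write commands. */
static uint32_t nlb0_of(const struct ns_geom *g, uint64_t len)
{
	uint64_t blocks = g->lba_ext ? len / g->lba_ext : len >> g->lba_shift;

	assert(blocks > 0);
	return (uint32_t)(blocks - 1);
}

static void trim_range_prep(struct dsm_range_sketch *r,
			    const struct ns_geom *g,
			    uint64_t offset, uint64_t len)
{
	r->cattr = 0;
	r->slba = slba_of(g, offset);
	/* A DSM range length is 1-based, so add back the 1 nlb0_of() removed. */
	r->nlb = nlb0_of(g, len) + 1;
}
```

For 4 KiB LBAs (lba_shift = 12), trimming 16384 bytes at offset 8192 gives slba = 2 and nlb = 4. A read of the same span would encode nlb = 3.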

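The BLKFINISHZONE fallback can be sketched as a pure function, so it can be checked without a zoned device. finish_zone_result() is a hypothetical helper, not part of fio. At the point of the check, ret already holds -errno. For that reason, this sketch compares against -ENOTTY; a comparison against the positive ENOTTY would never match a negated errno.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: map a zone-finish ioctl() result to the value the
 * caller returns. saved_errno is errno captured right after the ioctl().
 */
static int finish_zone_result(int ioctl_ret, int saved_errno)
{
	int ret = 0;

	if (ioctl_ret < 0) {
		assert(saved_errno > 0);
		ret = -saved_errno;
		/* Pre-5.5 kernels lack BLKFINISHZONE and report ENOTTY. */
		if (ret == -ENOTTY)
			ret = 0;
	}
	return ret;
}
```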
* Recent changes (master)
@ 2023-07-04 12:00 Jens Axboe
From: Jens Axboe @ 2023-07-04 12:00 UTC
  To: fio

The following changes since commit 5087502fb05b2b4d756045c594a2e09c2ffc97dc:

  init: don't adjust time units again for subjobs (2023-06-20 14:11:36 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 41508de67c06661ff1d473d108a8a01912ade114:

  fio/server: fix confusing sk_out check (2023-07-03 09:16:45 -0600)

----------------------------------------------------------------
Denis Pronin (3):
      fixed compiler warnings if NDEBUG enabled in core code
      fixed compiler warnings if NDEBUG enabled in test code
      use 'min' macro to find out next value of actual_min in libaio

Jens Axboe (3):
      Merge branch 'libaio/actual_min_algo_update' of https://github.com/dpronin/fio
      Merge branch 'improvement/fix-warnings-if-NDEBUG-enabled' of https://github.com/dpronin/fio
      fio/server: fix confusing sk_out check

 backend.c              | 18 ++++++++++++++----
 engines/libaio.c       |  2 +-
 helper_thread.c        |  8 +++++++-
 io_u.c                 |  7 ++++---
 ioengines.c            | 10 ++++++++--
 rate-submit.c          | 18 +++++++++++++++---
 server.c               |  7 ++++++-
 t/read-to-pipe-async.c | 30 +++++++++++++++++++++++-------
 zbd.c                  | 18 ++++++++----------
 9 files changed, 86 insertions(+), 32 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index f541676c..d67a4a07 100644
--- a/backend.c
+++ b/backend.c
@@ -1633,7 +1633,7 @@ static void *thread_main(void *data)
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 	int deadlock_loop_cnt;
 	bool clear_state;
-	int res, ret;
+	int ret;
 
 	sk_out_assign(sk_out);
 	free(fd);
@@ -1974,13 +1974,23 @@ static void *thread_main(void *data)
 	 * another thread is checking its io_u's for overlap
 	 */
 	if (td_offload_overlap(td)) {
-		int res = pthread_mutex_lock(&overlap_check);
-		assert(res == 0);
+		int res;
+
+		res = pthread_mutex_lock(&overlap_check);
+		if (res) {
+			td->error = errno;
+			goto err;
+		}
 	}
 	td_set_runstate(td, TD_FINISHING);
 	if (td_offload_overlap(td)) {
+		int res;
+
 		res = pthread_mutex_unlock(&overlap_check);
-		assert(res == 0);
+		if (res) {
+			td->error = errno;
+			goto err;
+		}
 	}
 
 	update_rusage_stat(td);
diff --git a/engines/libaio.c b/engines/libaio.c
index 1b82c90b..6a0745aa 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -296,7 +296,7 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 		}
 		if (r > 0) {
 			events += r;
-			actual_min = actual_min > events ? actual_min - events : 0;
+			actual_min -= min((unsigned int)events, actual_min);
 		}
 		else if ((min && r == 0) || r == -EAGAIN) {
 			fio_libaio_commit(td);
diff --git a/helper_thread.c b/helper_thread.c
index 77016638..53dea44b 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -1,4 +1,7 @@
+#include <errno.h>
 #include <signal.h>
+#include <stdio.h>
+#include <string.h>
 #include <unistd.h>
 #ifdef CONFIG_HAVE_TIMERFD_CREATE
 #include <sys/timerfd.h>
@@ -122,7 +125,10 @@ static void submit_action(enum action a)
 		return;
 
 	ret = write_to_pipe(helper_data->pipe[1], &data, sizeof(data));
-	assert(ret == 1);
+	if (ret != 1) {
+		log_err("failed to write action into pipe, err %i:%s", errno, strerror(errno));
+		assert(0);
+	}
 }
 
 void helper_reset(void)
diff --git a/io_u.c b/io_u.c
index faf512e5..27b6c92a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1613,7 +1613,6 @@ struct io_u *__get_io_u(struct thread_data *td)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct io_u *io_u = NULL;
-	int ret;
 
 	if (td->stop_io)
 		return NULL;
@@ -1647,14 +1646,16 @@ again:
 		io_u_set(td, io_u, IO_U_F_IN_CUR_DEPTH);
 		io_u->ipo = NULL;
 	} else if (td_async_processing(td)) {
+		int ret;
 		/*
 		 * We ran out, wait for async verify threads to finish and
 		 * return one
 		 */
 		assert(!(td->flags & TD_F_CHILD));
 		ret = pthread_cond_wait(&td->free_cond, &td->io_u_lock);
-		assert(ret == 0);
-		if (!td->error)
+		if (fio_unlikely(ret != 0)) {
+			td->error = errno;
+		} else if (!td->error)
 			goto again;
 	}
 
diff --git a/ioengines.c b/ioengines.c
index 742f97dd..36172725 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -17,6 +17,7 @@
 #include <assert.h>
 #include <sys/types.h>
 #include <dirent.h>
+#include <errno.h>
 
 #include "fio.h"
 #include "diskutil.h"
@@ -342,8 +343,13 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	 * flag is now set
 	 */
 	if (td_offload_overlap(td)) {
-		int res = pthread_mutex_unlock(&overlap_check);
-		assert(res == 0);
+		int res;
+
+		res = pthread_mutex_unlock(&overlap_check);
+		if (fio_unlikely(res != 0)) {
+			log_err("failed to unlock overlap check mutex, err: %i:%s", errno, strerror(errno));
+			abort();
+		}
 	}
 
 	assert(fio_file_open(io_u->file));
diff --git a/rate-submit.c b/rate-submit.c
index 103a80aa..6f6d15bd 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -5,6 +5,9 @@
  *
  */
 #include <assert.h>
+#include <errno.h>
+#include <pthread.h>
+
 #include "fio.h"
 #include "ioengines.h"
 #include "lib/getrusage.h"
@@ -27,7 +30,10 @@ static void check_overlap(struct io_u *io_u)
 	 * threads as they assess overlap.
 	 */
 	res = pthread_mutex_lock(&overlap_check);
-	assert(res == 0);
+	if (fio_unlikely(res != 0)) {
+		log_err("failed to lock overlap check mutex, err: %i:%s", errno, strerror(errno));
+		abort();
+	}
 
 retry:
 	for_each_td(td) {
@@ -41,9 +47,15 @@ retry:
 			continue;
 
 		res = pthread_mutex_unlock(&overlap_check);
-		assert(res == 0);
+		if (fio_unlikely(res != 0)) {
+			log_err("failed to unlock overlap check mutex, err: %i:%s", errno, strerror(errno));
+			abort();
+		}
 		res = pthread_mutex_lock(&overlap_check);
-		assert(res == 0);
+		if (fio_unlikely(res != 0)) {
+			log_err("failed to lock overlap check mutex, err: %i:%s", errno, strerror(errno));
+			abort();
+		}
 		goto retry;
 	} end_for_each();
 }
diff --git a/server.c b/server.c
index a6347efd..bb423702 100644
--- a/server.c
+++ b/server.c
@@ -1,5 +1,6 @@
 #include <stdio.h>
 #include <stdlib.h>
+#include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <poll.h>
@@ -2343,7 +2344,11 @@ void fio_server_send_start(struct thread_data *td)
 {
 	struct sk_out *sk_out = pthread_getspecific(sk_out_key);
 
-	assert(sk_out->sk != -1);
+	if (sk_out->sk == -1) {
+		log_err("pthread getting specific for key failed, sk_out %p, sk %i, err: %i:%s",
+			sk_out, sk_out->sk, errno, strerror(errno));
+		abort();
+	}
 
 	fio_net_queue_cmd(FIO_NET_CMD_SERVER_START, NULL, 0, NULL, SK_F_SIMPLE);
 }
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index 569fc62a..de98d032 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -36,6 +36,8 @@
 
 #include "../flist.h"
 
+#include "compiler/compiler.h"
+
 static int bs = 4096;
 static int max_us = 10000;
 static char *file;
@@ -47,6 +49,18 @@ static int separate_writer = 1;
 #define PLAT_NR		(PLAT_GROUP_NR * PLAT_VAL)
 #define PLAT_LIST_MAX	20
 
+#ifndef NDEBUG
+#define CHECK_ZERO_OR_ABORT(code) assert(code)
+#else
+#define CHECK_ZERO_OR_ABORT(code) 										\
+	do { 																\
+		if (fio_unlikely((code) != 0)) { 								\
+			log_err("failed checking code %i != 0", (code)); 	\
+			abort();													\
+		} 																\
+	} while (0)
+#endif
+
 struct stats {
 	unsigned int plat[PLAT_NR];
 	unsigned int nr_samples;
@@ -121,7 +135,7 @@ uint64_t utime_since(const struct timespec *s, const struct timespec *e)
 	return ret;
 }
 
-static struct work_item *find_seq(struct writer_thread *w, unsigned int seq)
+static struct work_item *find_seq(struct writer_thread *w, int seq)
 {
 	struct work_item *work;
 	struct flist_head *entry;
@@ -224,6 +238,8 @@ static int write_work(struct work_item *work)
 
 	clock_gettime(CLOCK_MONOTONIC, &s);
 	ret = write(STDOUT_FILENO, work->buf, work->buf_size);
+	if (ret < 0)
+		return (int)ret;
 	clock_gettime(CLOCK_MONOTONIC, &e);
 	assert(ret == work->buf_size);
 
@@ -241,10 +257,10 @@ static void *writer_fn(void *data)
 {
 	struct writer_thread *wt = data;
 	struct work_item *work;
-	unsigned int seq = 1;
+	int seq = 1;
 
 	work = NULL;
-	while (!wt->thread.exit || !flist_empty(&wt->list)) {
+	while (!(seq < 0) && (!wt->thread.exit || !flist_empty(&wt->list))) {
 		pthread_mutex_lock(&wt->thread.lock);
 
 		if (work)
@@ -467,10 +483,10 @@ static void init_thread(struct thread_data *thread)
 	int ret;
 
 	ret = pthread_condattr_init(&cattr);
-	assert(ret == 0);
+	CHECK_ZERO_OR_ABORT(ret);
 #ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
 	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
-	assert(ret == 0);
+	CHECK_ZERO_OR_ABORT(ret);
 #endif
 	pthread_cond_init(&thread->cond, &cattr);
 	pthread_cond_init(&thread->done_cond, &cattr);
@@ -624,10 +640,10 @@ int main(int argc, char *argv[])
 	bytes = 0;
 
 	ret = pthread_condattr_init(&cattr);
-	assert(ret == 0);
+	CHECK_ZERO_OR_ABORT(ret);
 #ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
 	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
-	assert(ret == 0);
+	CHECK_ZERO_OR_ABORT(ret);
 #endif
 
 	clock_gettime(CLOCK_MONOTONIC, &s);
diff --git a/zbd.c b/zbd.c
index 7fcf1ec4..d4565215 100644
--- a/zbd.c
+++ b/zbd.c
@@ -11,6 +11,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "compiler/compiler.h"
 #include "os/os.h"
 #include "file.h"
 #include "fio.h"
@@ -102,13 +103,13 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 static void zone_lock(struct thread_data *td, const struct fio_file *f,
 		      struct fio_zone_info *z)
 {
+#ifndef NDEBUG
 	struct zoned_block_device_info *zbd = f->zbd_info;
-	uint32_t nz = z - zbd->zone_info;
-
+	uint32_t const nz = z - zbd->zone_info;
 	/* A thread should never lock zones outside its working area. */
 	assert(f->min_zone <= nz && nz < f->max_zone);
-
 	assert(z->has_wp);
+#endif
 
 	/*
 	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
@@ -128,11 +129,8 @@ static void zone_lock(struct thread_data *td, const struct fio_file *f,
 
 static inline void zone_unlock(struct fio_zone_info *z)
 {
-	int ret;
-
 	assert(z->has_wp);
-	ret = pthread_mutex_unlock(&z->mutex);
-	assert(!ret);
+	pthread_mutex_unlock(&z->mutex);
 }
 
 static inline struct fio_zone_info *zbd_get_zone(const struct fio_file *f,
@@ -420,7 +418,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	int res = 0;
 
-	assert(min_bs);
+	if (fio_unlikely(0 == min_bs))
+		return 1;
 
 	dprint(FD_ZBD, "%s: examining zones %u .. %u\n",
 	       f->file_name, zbd_zone_idx(f, zb), zbd_zone_idx(f, ze));
@@ -1714,10 +1713,9 @@ unlock:
 static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 {
 	const struct fio_file *f = io_u->file;
-	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
 
-	assert(zbd_info);
+	assert(f->zbd_info);
 
 	z = zbd_offset_to_zone(f, io_u->offset);
 	assert(z->has_wp);

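The libaio change replaces a ternary with a saturating subtraction. The sketch below shows that the two forms agree. A local umin() stands in for fio's min() macro, and remaining_min() is a hypothetical wrapper.

```c
#include <assert.h>

/* Local stand-in for fio's min() macro. */
static unsigned int umin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Subtract completed events from the target, clamping at zero. */
static unsigned int remaining_min(unsigned int actual_min, int events)
{
	assert(events >= 0);
	return actual_min - umin((unsigned int)events, actual_min);
}
```

Subtracting min(events, actual_min) can never exceed actual_min, so the unsigned result cannot wrap around.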

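In the CHECK_ZERO_OR_ABORT macro added to t/read-to-pipe-async.c, the debug branch expands to assert(code). That assertion fires when code is zero. The assert(ret == 0) lines it replaces fire when the code is non-zero. The sketch below uses the same condition in both branches. It uses fprintf(stderr, ...) in place of fio's log_err(), and the NDEBUG branch evaluates code only once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Both branches enforce "code == 0"; only the failure reporting differs. */
#ifndef NDEBUG
#define CHECK_ZERO_OR_ABORT(code) assert((code) == 0)
#else
#define CHECK_ZERO_OR_ABORT(code)					\
	do {								\
		int check_rc_ = (code);	/* evaluate once */		\
		if (check_rc_ != 0) {					\
			fprintf(stderr, "failed checking code %d != 0\n", \
				check_rc_);				\
			abort();					\
		}							\
	} while (0)
#endif
```

Typical use mirrors the patch: ret = pthread_condattr_init(&cattr); CHECK_ZERO_OR_ABORT(ret);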
* Recent changes (master)
@ 2023-06-22 12:00 Jens Axboe
From: Jens Axboe @ 2023-06-22 12:00 UTC
  To: fio

The following changes since commit 8ce9c4003aeaafa91c3278c1c7de4a32fadc5ea0:

  docs: clarify opendir description (2023-06-16 10:41:25 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5087502fb05b2b4d756045c594a2e09c2ffc97dc:

  init: don't adjust time units again for subjobs (2023-06-20 14:11:36 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      init: don't adjust time units again for subjobs

 init.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 437406ec..10e63cca 100644
--- a/init.c
+++ b/init.c
@@ -951,13 +951,16 @@ static int fixup_options(struct thread_data *td)
 	if (o->disable_slat)
 		o->slat_percentiles = 0;
 
-	/*
-	 * Fix these up to be nsec internally
-	 */
-	for_each_rw_ddir(ddir)
-		o->max_latency[ddir] *= 1000ULL;
+	/* Do this only for the parent job */
+	if (!td->subjob_number) {
+		/*
+		 * Fix these up to be nsec internally
+		 */
+		for_each_rw_ddir(ddir)
+			o->max_latency[ddir] *= 1000ULL;
 
-	o->latency_target *= 1000ULL;
+		o->latency_target *= 1000ULL;
+	}
 
 	/*
 	 * Dedupe working set verifications

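The init.c fix guards a unit conversion that must run exactly once. Below is a minimal sketch of that idea. As the commit title suggests, it assumes that a subjob starts from options its parent has already converted to nanoseconds. struct lat_opts and fixup_latency_units() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of thread_options: usec on input, nsec after fixup. */
struct lat_opts {
	uint64_t max_latency;
	uint64_t latency_target;
};

/* Convert usec to nsec only for the parent job (subjob_number == 0). */
static void fixup_latency_units(struct lat_opts *o, unsigned int subjob_number)
{
	if (subjob_number == 0) {
		o->max_latency *= 1000ULL;
		o->latency_target *= 1000ULL;
	}
}
```

Without the guard, a subjob copying the parent's 250000 ns limit would scale it again to 250000000 ns. That silently loosens the latency limit by a factor of 1000.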

* Recent changes (master)
@ 2023-06-17 12:00 Jens Axboe
From: Jens Axboe @ 2023-06-17 12:00 UTC
  To: fio

The following changes since commit 62ac66490f5077e5fca1bd5b49165147cafc5a0d:

  zbd: avoid Coverity defect report (2023-06-09 18:04:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8ce9c4003aeaafa91c3278c1c7de4a32fadc5ea0:

  docs: clarify opendir description (2023-06-16 10:41:25 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      docs: clarify opendir description

 HOWTO.rst | 4 +++-
 fio.1     | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 32fff5ec..2e1e55c2 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -843,7 +843,9 @@ Target file/device
 
 .. option:: opendir=str
 
-	Recursively open any files below directory `str`.
+        Recursively open any files below directory `str`. This accepts only a
+        single directory and unlike related options, colons appearing in the
+        path must not be escaped.
 
 .. option:: lockfile=str
 
diff --git a/fio.1 b/fio.1
index 80bf3371..73b7e8c9 100644
--- a/fio.1
+++ b/fio.1
@@ -627,7 +627,9 @@ generated filenames (with a directory specified) with the source of the
 client connecting. To disable this behavior, set this option to 0.
 .TP
 .BI opendir \fR=\fPstr
-Recursively open any files below directory \fIstr\fR.
+Recursively open any files below directory \fIstr\fR. This accepts only a
+single directory and unlike related options, colons appearing in the path must
+not be escaped.
 .TP
 .BI lockfile \fR=\fPstr
 Fio defaults to not locking any files before it does I/O to them. If a file


* Recent changes (master)
@ 2023-06-10 12:00 Jens Axboe
From: Jens Axboe @ 2023-06-10 12:00 UTC
  To: fio

The following changes since commit edaee5b96fd87c3c5fe7f64ec917a175cd9237fc:

  t/zbd: test write zone accounting of trim workload (2023-06-08 14:39:07 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 62ac66490f5077e5fca1bd5b49165147cafc5a0d:

  zbd: avoid Coverity defect report (2023-06-09 18:04:45 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      zbd: avoid Coverity defect report

 zbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index 9455140a..7fcf1ec4 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1547,11 +1547,11 @@ retry:
 		dprint(FD_ZBD,
 		       "%s(%s): wait zone write and retry write target zone selection\n",
 		       __func__, f->file_name);
+		should_retry = in_flight;
 		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
 		io_u_quiesce(td);
 		zone_lock(td, f, z);
-		should_retry = in_flight;
 		goto retry;
 	}
 

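The hunk moves `should_retry = in_flight;` so that it runs while `zbdi->mutex` and the zone lock are still held, rather than after the locks have been dropped and reacquired. The Python sketch below is a simplified model of this lock/quiesce/retry loop, built on several assumptions. Fio's two locks are collapsed into one `threading.Lock`. `try_select`, `io_in_flight` and `quiesce` are hypothetical callables that stand in for zone selection, the in-flight I/O check and `io_u_quiesce()`. The entry condition (retry if I/O was in flight or a retry is still pending, with `should_retry` starting out true) is an assumption, because this hunk does not show it.

```python
import threading


def select_write_zone(lock, try_select, io_in_flight, quiesce):
    """Simplified model of a lock/quiesce/retry zone selection loop.

    Assumption: should_retry starts out True, so the loop makes at least
    one retry and then keeps retrying only while I/O was in flight.
    """
    should_retry = True
    while True:
        with lock:
            zone = try_select()
            if zone is not None:
                return zone
            in_flight = io_in_flight()
            if not (in_flight or should_retry):
                return None  # nothing in flight to wait for: give up
            # Decide about the next retry while the lock is still held,
            # which is where the patch now places the assignment.
            should_retry = in_flight
        # Lock released: wait for outstanding I/O, then retry selection.
        quiesce()
```

If nothing is in flight, the model quiesces once and then gives up. While I/O stays in flight, it keeps retrying until a zone becomes available.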

* Recent changes (master)
@ 2023-06-09 12:00 Jens Axboe
From: Jens Axboe @ 2023-06-09 12:00 UTC
  To: fio

The following changes since commit 1b4ba547cf45377fffc7a1e60728369997cc7a9b:

  t/run-fio-tests: address issues identified by pylint (2023-06-01 14:12:41 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to edaee5b96fd87c3c5fe7f64ec917a175cd9237fc:

  t/zbd: test write zone accounting of trim workload (2023-06-08 14:39:07 -0400)

----------------------------------------------------------------
Shin'ichiro Kawasaki (7):
      zbd: rename 'open zones' to 'write zones'
      zbd: do not reset extra zones in open conditions
      zbd: fix write zone accounting of almost full zones
      zbd: fix write zone accounting of trim workload
      t/zbd: reset zones before tests with max_open_zones option
      t/zbd: test write zone accounting of almost full zones
      t/zbd: test write zone accounting of trim workload

Vincent Fu (17):
      t/run-fio-tests: split source file
      t/run-fio-tests: rename FioJobTest to FioJobFileTest
      t/run-fio-tests: move get_file outside of FioJobFileTest
      t/fiotestlib: use dictionaries for filenames and paths
      t/fiotestlib: use 'with' for opening files
      t/fiotestlib: use f-string for formatting
      t/fiotestlib: rearrange constructor and setup steps
      t/fiotestlib: record test command in more useful format
      t/fiotestlib: add class for command-line fio job
      t/random_seed: use logging module for debug prints
      t/random_seed: use methods provided in fiotestlib to run tests
      t/random_seed: fixes from pylint
      t/readonly: adapt to use fiotestlib
      t/nvmept: adapt to use fiotestlib
      t/fiotestlib: add ability to ingest iops logs
      t/strided: adapt to use fiotestlib
      t/strided: increase minumum recommended size to 64MiB

 engines/io_uring.c     |   2 +-
 fio.h                  |   2 +-
 io_u.c                 |   2 +-
 io_u.h                 |   2 +-
 options.c              |   4 +-
 t/fiotestcommon.py     | 176 +++++++++++++
 t/fiotestlib.py        | 485 ++++++++++++++++++++++++++++++++++
 t/nvmept.py            | 447 ++++++++++++--------------------
 t/random_seed.py       | 300 +++++++++------------
 t/readonly.py          | 220 +++++++++-------
 t/run-fio-tests.py     | 644 +++++----------------------------------------
 t/strided.py           | 691 ++++++++++++++++++++++++++++---------------------
 t/zbd/test-zbd-support |  64 ++++-
 zbd.c                  | 292 ++++++++++++---------
 zbd.h                  |  25 +-
 zbd_types.h            |   2 +-
 16 files changed, 1771 insertions(+), 1587 deletions(-)
 create mode 100644 t/fiotestcommon.py
 create mode 100755 t/fiotestlib.py

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ff64fc9f..73e4a27a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -561,7 +561,7 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
 		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
-static int fio_ioring_cmd_io_u_trim(const struct thread_data *td,
+static int fio_ioring_cmd_io_u_trim(struct thread_data *td,
 				    struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
diff --git a/fio.h b/fio.h
index 6fc7fb9c..c5453d13 100644
--- a/fio.h
+++ b/fio.h
@@ -275,7 +275,7 @@ struct thread_data {
 	unsigned long long num_unique_pages;
 
 	struct zone_split_index **zone_state_index;
-	unsigned int num_open_zones;
+	unsigned int num_write_zones;
 
 	unsigned int verify_batch;
 	unsigned int trim_batch;
diff --git a/io_u.c b/io_u.c
index 6f5fc94d..faf512e5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2379,7 +2379,7 @@ int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
 	return ret;
 }
 
-int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+int do_io_u_trim(struct thread_data *td, struct io_u *io_u)
 {
 #ifndef FIO_HAVE_TRIM
 	io_u->error = EINVAL;
diff --git a/io_u.h b/io_u.h
index 55b4d083..b432a540 100644
--- a/io_u.h
+++ b/io_u.h
@@ -162,7 +162,7 @@ void io_u_mark_submit(struct thread_data *, unsigned int);
 bool queue_full(const struct thread_data *);
 
 int do_io_u_sync(const struct thread_data *, struct io_u *);
-int do_io_u_trim(const struct thread_data *, struct io_u *);
+int do_io_u_trim(struct thread_data *, struct io_u *);
 
 #ifdef FIO_INC_DEBUG
 static inline void dprint_io_u(struct io_u *io_u, const char *p)
diff --git a/options.c b/options.c
index 8193fb29..a7c4ef6e 100644
--- a/options.c
+++ b/options.c
@@ -3618,7 +3618,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Per device/file maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, max_open_zones),
-		.maxval	= ZBD_MAX_OPEN_ZONES,
+		.maxval	= ZBD_MAX_WRITE_ZONES,
 		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -3629,7 +3629,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Job maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, job_max_open_zones),
-		.maxval	= ZBD_MAX_OPEN_ZONES,
+		.maxval	= ZBD_MAX_WRITE_ZONES,
 		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd by one thread/process",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
diff --git a/t/fiotestcommon.py b/t/fiotestcommon.py
new file mode 100644
index 00000000..f5012c82
--- /dev/null
+++ b/t/fiotestcommon.py
@@ -0,0 +1,176 @@
+#!/usr/bin/env python3
+"""
+fiotestcommon.py
+
+This contains constant definitions, helpers, and a Requirements class that can
+be used to help with running fio tests.
+"""
+
+import os
+import locale
+import logging
+import platform
+import subprocess
+import multiprocessing
+
+
+SUCCESS_DEFAULT = {
+    'zero_return': True,
+    'stderr_empty': True,
+    'timeout': 600,
+    }
+SUCCESS_NONZERO = {
+    'zero_return': False,
+    'stderr_empty': False,
+    'timeout': 600,
+    }
+SUCCESS_STDERR = {
+    'zero_return': True,
+    'stderr_empty': False,
+    'timeout': 600,
+    }
+
+
+def get_file(filename):
+    """Safely read a file."""
+    file_data = ''
+    success = True
+
+    try:
+        with open(filename, "r", encoding=locale.getpreferredencoding()) as output_file:
+            file_data = output_file.read()
+    except OSError:
+        success = False
+
+    return file_data, success
+
+
+class Requirements():
+    """Requirements consists of multiple run environment characteristics.
+    These are to determine if a particular test can be run"""
+
+    _linux = False
+    _libaio = False
+    _io_uring = False
+    _zbd = False
+    _root = False
+    _zoned_nullb = False
+    _not_macos = False
+    _not_windows = False
+    _unittests = False
+    _cpucount4 = False
+    _nvmecdev = False
+
+    def __init__(self, fio_root, args):
+        Requirements._not_macos = platform.system() != "Darwin"
+        Requirements._not_windows = platform.system() != "Windows"
+        Requirements._linux = platform.system() == "Linux"
+
+        if Requirements._linux:
+            config_file = os.path.join(fio_root, "config-host.h")
+            contents, success = get_file(config_file)
+            if not success:
+                print(f"Unable to open {config_file} to check requirements")
+                Requirements._zbd = True
+            else:
+                Requirements._zbd = "CONFIG_HAS_BLKZONED" in contents
+                Requirements._libaio = "CONFIG_LIBAIO" in contents
+
+            contents, success = get_file("/proc/kallsyms")
+            if not success:
+                print("Unable to open '/proc/kallsyms' to probe for io_uring support")
+            else:
+                Requirements._io_uring = "io_uring_setup" in contents
+
+            Requirements._root = os.geteuid() == 0
+            if Requirements._zbd and Requirements._root:
+                try:
+                    subprocess.run(["modprobe", "null_blk"],
+                                   stdout=subprocess.PIPE,
+                                   stderr=subprocess.PIPE)
+                    if os.path.exists("/sys/module/null_blk/parameters/zoned"):
+                        Requirements._zoned_nullb = True
+                except Exception:
+                    pass
+
+        if platform.system() == "Windows":
+            utest_exe = "unittest.exe"
+        else:
+            utest_exe = "unittest"
+        unittest_path = os.path.join(fio_root, "unittests", utest_exe)
+        Requirements._unittests = os.path.exists(unittest_path)
+
+        Requirements._cpucount4 = multiprocessing.cpu_count() >= 4
+        Requirements._nvmecdev = args.nvmecdev
+
+        req_list = [
+                Requirements.linux,
+                Requirements.libaio,
+                Requirements.io_uring,
+                Requirements.zbd,
+                Requirements.root,
+                Requirements.zoned_nullb,
+                Requirements.not_macos,
+                Requirements.not_windows,
+                Requirements.unittests,
+                Requirements.cpucount4,
+                Requirements.nvmecdev,
+                    ]
+        for req in req_list:
+            value, desc = req()
+            logging.debug("Requirements: Requirement '%s' met? %s", desc, value)
+
+    @classmethod
+    def linux(cls):
+        """Are we running on Linux?"""
+        return Requirements._linux, "Linux required"
+
+    @classmethod
+    def libaio(cls):
+        """Is libaio available?"""
+        return Requirements._libaio, "libaio required"
+
+    @classmethod
+    def io_uring(cls):
+        """Is io_uring available?"""
+        return Requirements._io_uring, "io_uring required"
+
+    @classmethod
+    def zbd(cls):
+        """Is ZBD support available?"""
+        return Requirements._zbd, "Zoned block device support required"
+
+    @classmethod
+    def root(cls):
+        """Are we running as root?"""
+        return Requirements._root, "root required"
+
+    @classmethod
+    def zoned_nullb(cls):
+        """Are zoned null block devices available?"""
+        return Requirements._zoned_nullb, "Zoned null block device support required"
+
+    @classmethod
+    def not_macos(cls):
+        """Are we running on a platform other than macOS?"""
+        return Requirements._not_macos, "platform other than macOS required"
+
+    @classmethod
+    def not_windows(cls):
+        """Are we running on a platform other than Windows?"""
+        return Requirements._not_windows, "platform other than Windows required"
+
+    @classmethod
+    def unittests(cls):
+        """Were unittests built?"""
+        return Requirements._unittests, "Unittests support required"
+
+    @classmethod
+    def cpucount4(cls):
+        """Do we have at least 4 CPUs?"""
+        return Requirements._cpucount4, "4+ CPUs required"
+
+    @classmethod
+    def nvmecdev(cls):
+        """Do we have an NVMe character device to test?"""
+        return Requirements._nvmecdev, "NVMe character device test target required"
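Each `Requirements` classmethod returns a `(met, description)` tuple. `run_fio_tests()` in `t/fiotestlib.py` stops at the first unmet requirement and uses its description as the skip reason. The self-contained sketch below reproduces that pattern with a cut-down class. `MiniRequirements` and `check_requirements` are hypothetical names, and only two of the probes are kept.

```python
import os
import platform


class MiniRequirements:
    """Cut-down version of the Requirements pattern (illustrative only)."""

    _linux = False
    _root = False

    def __init__(self):
        # Probe the environment once and cache the results on the class.
        MiniRequirements._linux = platform.system() == "Linux"
        # os.geteuid() does not exist on Windows, so guard the lookup.
        MiniRequirements._root = hasattr(os, "geteuid") and os.geteuid() == 0

    @classmethod
    def linux(cls):
        return cls._linux, "Linux required"

    @classmethod
    def root(cls):
        return cls._root, "root required"


def check_requirements(requirements):
    """Return (True, None), or (False, reason) for the first unmet requirement."""
    for req in requirements:
        met, reason = req()
        if not met:
            return False, reason
    return True, None
```

Storing the probe results as class attributes lets a test list refer to the unbound classmethods (for example `MiniRequirements.linux`) before any probing has happened. The results are evaluated only when the runner calls them.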
diff --git a/t/fiotestlib.py b/t/fiotestlib.py
new file mode 100755
index 00000000..0fe17b74
--- /dev/null
+++ b/t/fiotestlib.py
@@ -0,0 +1,485 @@
+#!/usr/bin/env python3
+"""
+fiotestlib.py
+
+This library contains FioTest objects that provide convenient means to run
+different sorts of fio tests.
+
+It also contains a test runner that runs an array of dictionary objects
+describing fio tests.
+"""
+
+import os
+import sys
+import json
+import locale
+import logging
+import platform
+import traceback
+import subprocess
+from pathlib import Path
+from fiotestcommon import get_file, SUCCESS_DEFAULT
+
+
+class FioTest():
+    """Base for all fio tests."""
+
+    def __init__(self, exe_path, success, testnum, artifact_root):
+        self.success = success
+        self.testnum = testnum
+        self.output = {}
+        self.passed = True
+        self.failure_reason = ''
+        self.parameters = None
+        self.paths = {
+                        'exe': exe_path,
+                        'artifacts': artifact_root,
+                        'test_dir': os.path.join(artifact_root, \
+                                f"{testnum:04d}"),
+                        }
+        self.filenames = {
+                            'cmd': os.path.join(self.paths['test_dir'], \
+                                    f"{os.path.basename(self.paths['exe'])}.command"),
+                            'stdout': os.path.join(self.paths['test_dir'], \
+                                    f"{os.path.basename(self.paths['exe'])}.stdout"),
+                            'stderr': os.path.join(self.paths['test_dir'], \
+                                    f"{os.path.basename(self.paths['exe'])}.stderr"),
+                            'exitcode': os.path.join(self.paths['test_dir'], \
+                                    f"{os.path.basename(self.paths['exe'])}.exitcode"),
+                            }
+
+    def setup(self, parameters):
+        """Setup instance variables for test."""
+
+        self.parameters = parameters
+        if not os.path.exists(self.paths['test_dir']):
+            os.mkdir(self.paths['test_dir'])
+
+    def run(self):
+        """Run the test."""
+
+        raise NotImplementedError()
+
+    def check_result(self):
+        """Check test results."""
+
+        raise NotImplementedError()
+
+
+class FioExeTest(FioTest):
+    """Test consists of an executable binary or script"""
+
+    def run(self):
+        """Execute the binary or script described by this instance."""
+
+        command = [self.paths['exe']] + self.parameters
+        with open(self.filenames['cmd'], "w+",
+                  encoding=locale.getpreferredencoding()) as command_file:
+            command_file.write(" ".join(command))
+
+        try:
+            with open(self.filenames['stdout'], "w+",
+                      encoding=locale.getpreferredencoding()) as stdout_file, \
+                open(self.filenames['stderr'], "w+",
+                     encoding=locale.getpreferredencoding()) as stderr_file, \
+                open(self.filenames['exitcode'], "w+",
+                     encoding=locale.getpreferredencoding()) as exitcode_file:
+                proc = None
+                # Avoid using subprocess.run() here because when a timeout occurs,
+                # fio will be stopped with SIGKILL. This does not give fio a
+                # chance to clean up and means that child processes may continue
+                # running and submitting IO.
+                proc = subprocess.Popen(command,
+                                        stdout=stdout_file,
+                                        stderr=stderr_file,
+                                        cwd=self.paths['test_dir'],
+                                        universal_newlines=True)
+                proc.communicate(timeout=self.success['timeout'])
+                exitcode_file.write(f'{proc.returncode}\n')
+                logging.debug("Test %d: return code: %d", self.testnum, proc.returncode)
+                self.output['proc'] = proc
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            self.output['failure'] = 'timeout'
+        except Exception:
+            if proc:
+                if not proc.poll():
+                    proc.terminate()
+                    proc.communicate()
+            self.output['failure'] = 'exception'
+            self.output['exc_info'] = sys.exc_info()
+
+    def check_result(self):
+        """Check results of test run."""
+
+        if 'proc' not in self.output:
+            if self.output['failure'] == 'timeout':
+                self.failure_reason = f"{self.failure_reason} timeout,"
+            else:
+                assert self.output['failure'] == 'exception'
+                self.failure_reason = f'{self.failure_reason} exception: ' + \
+                f'{self.output["exc_info"][0]}, {self.output["exc_info"][1]}'
+
+            self.passed = False
+            return
+
+        if 'zero_return' in self.success:
+            if self.success['zero_return']:
+                if self.output['proc'].returncode != 0:
+                    self.passed = False
+                    self.failure_reason = f"{self.failure_reason} non-zero return code,"
+            else:
+                if self.output['proc'].returncode == 0:
+                    self.failure_reason = f"{self.failure_reason} zero return code,"
+                    self.passed = False
+
+        stderr_size = os.path.getsize(self.filenames['stderr'])
+        if 'stderr_empty' in self.success:
+            if self.success['stderr_empty']:
+                if stderr_size != 0:
+                    self.failure_reason = f"{self.failure_reason} stderr not empty,"
+                    self.passed = False
+            else:
+                if stderr_size == 0:
+                    self.failure_reason = f"{self.failure_reason} stderr empty,"
+                    self.passed = False
+
+
+class FioJobFileTest(FioExeTest):
+    """Test consists of a fio job with options in a job file."""
+
+    def __init__(self, fio_path, fio_job, success, testnum, artifact_root,
+                 fio_pre_job=None, fio_pre_success=None,
+                 output_format="normal"):
+        """Construct a FioJobFileTest which is a FioExeTest consisting of a
+        single fio job file with an optional setup step.
+
+        fio_path:           location of fio executable
+        fio_job:            location of fio job file
+        success:            Definition of test success
+        testnum:            test ID
+        artifact_root:      root directory for artifacts
+        fio_pre_job:        fio job for preconditioning
+        fio_pre_success:    Definition of test success for fio precon job
+        output_format:      normal (default), json, jsonplus, or terse
+        """
+
+        self.fio_job = fio_job
+        self.fio_pre_job = fio_pre_job
+        self.fio_pre_success = fio_pre_success if fio_pre_success else success
+        self.output_format = output_format
+        self.precon_failed = False
+        self.json_data = None
+
+        super().__init__(fio_path, success, testnum, artifact_root)
+
+    def setup(self, parameters=None):
+        """Setup instance variables for fio job test."""
+
+        self.filenames['fio_output'] = f"{os.path.basename(self.fio_job)}.output"
+        fio_args = [
+            "--max-jobs=16",
+            f"--output-format={self.output_format}",
+            f"--output={self.filenames['fio_output']}",
+            self.fio_job,
+            ]
+
+        super().setup(fio_args)
+
+        # Update the filenames from the default
+        self.filenames['cmd'] = os.path.join(self.paths['test_dir'],
+                                             f"{os.path.basename(self.fio_job)}.command")
+        self.filenames['stdout'] = os.path.join(self.paths['test_dir'],
+                                                f"{os.path.basename(self.fio_job)}.stdout")
+        self.filenames['stderr'] = os.path.join(self.paths['test_dir'],
+                                                f"{os.path.basename(self.fio_job)}.stderr")
+        self.filenames['exitcode'] = os.path.join(self.paths['test_dir'],
+                                                  f"{os.path.basename(self.fio_job)}.exitcode")
+
+    def run_pre_job(self):
+        """Run fio job precondition step."""
+
+        precon = FioJobFileTest(self.paths['exe'], self.fio_pre_job,
+                            self.fio_pre_success,
+                            self.testnum,
+                            self.paths['artifacts'],
+                            output_format=self.output_format)
+        precon.setup()
+        precon.run()
+        precon.check_result()
+        self.precon_failed = not precon.passed
+        self.failure_reason = precon.failure_reason
+
+    def run(self):
+        """Run fio job test."""
+
+        if self.fio_pre_job:
+            self.run_pre_job()
+
+        if not self.precon_failed:
+            super().run()
+        else:
+            logging.debug("Test %d: precondition step failed", self.testnum)
+
+    def get_file_fail(self, filename):
+        """Safely read a file and fail the test upon error."""
+        file_data = None
+
+        try:
+            with open(filename, "r", encoding=locale.getpreferredencoding()) as output_file:
+                file_data = output_file.read()
+        except OSError:
+            self.failure_reason += f" unable to read file {filename}"
+            self.passed = False
+
+        return file_data
+
+    def check_result(self):
+        """Check fio job results."""
+
+        if self.precon_failed:
+            self.passed = False
+            self.failure_reason = f"{self.failure_reason} precondition step failed,"
+            return
+
+        super().check_result()
+
+        if not self.passed:
+            return
+
+        if 'json' not in self.output_format:
+            return
+
+        file_data = self.get_file_fail(os.path.join(self.paths['test_dir'],
+                                                    self.filenames['fio_output']))
+        if not file_data:
+            return
+
+        #
+        # Sometimes fio informational messages are included at the top of the
+        # JSON output, especially under Windows. Try to decode output as JSON
+        # data, skipping everything until the first {
+        #
+        lines = file_data.splitlines()
+        file_data = '\n'.join(lines[lines.index("{"):])
+        try:
+            self.json_data = json.loads(file_data)
+        except json.JSONDecodeError:
+            self.failure_reason = f"{self.failure_reason} unable to decode JSON data,"
+            self.passed = False
+
+
+class FioJobCmdTest(FioExeTest):
+    """This runs a fio job with options specified on the command line."""
+
+    def __init__(self, fio_path, success, testnum, artifact_root, fio_opts, basename=None):
+
+        self.basename = basename if basename else os.path.basename(fio_path)
+        self.fio_opts = fio_opts
+        self.json_data = None
+        self.iops_log_lines = None
+
+        super().__init__(fio_path, success, testnum, artifact_root)
+
+        filename_stub = os.path.join(self.paths['test_dir'], f"{self.basename}{self.testnum:03d}")
+        self.filenames['cmd'] = f"{filename_stub}.command"
+        self.filenames['stdout'] = f"{filename_stub}.stdout"
+        self.filenames['stderr'] = f"{filename_stub}.stderr"
+        self.filenames['output'] = os.path.abspath(f"{filename_stub}.output")
+        self.filenames['exitcode'] = f"{filename_stub}.exitcode"
+        self.filenames['iopslog'] = os.path.abspath(f"{filename_stub}")
+
+    def run(self):
+        super().run()
+
+        if 'output-format' in self.fio_opts and 'json' in \
+                self.fio_opts['output-format']:
+            if not self.get_json():
+                print('Unable to decode JSON data')
+                self.passed = False
+
+        if any('--write_iops_log=' in param for param in self.parameters):
+            self.get_iops_log()
+
+    def get_iops_log(self):
+        """Read IOPS log from the first job."""
+
+        log_filename = self.filenames['iopslog'] + "_iops.1.log"
+        with open(log_filename, 'r', encoding=locale.getpreferredencoding()) as iops_file:
+            self.iops_log_lines = iops_file.read()
+
+    def get_json(self):
+        """Convert fio JSON output into a python JSON object"""
+
+        filename = self.filenames['output']
+        with open(filename, 'r', encoding=locale.getpreferredencoding()) as file:
+            file_data = file.read()
+
+        #
+        # Sometimes fio informational messages are included at the top of the
+        # JSON output, especially under Windows. Try to decode output as JSON
+        # data, lopping off up to the first four lines
+        #
+        lines = file_data.splitlines()
+        for i in range(5):
+            file_data = '\n'.join(lines[i:])
+            try:
+                self.json_data = json.loads(file_data)
+            except json.JSONDecodeError:
+                continue
+            else:
+                return True
+
+        return False
+
+    @staticmethod
+    def check_empty(job):
+        """
+        Make sure JSON data is empty.
+
+        Some data structures should be empty. This function makes sure that they are.
+
+        job         JSON object that we need to check for emptiness
+        """
+
+        return job['total_ios'] == 0 and \
+                job['slat_ns']['N'] == 0 and \
+                job['clat_ns']['N'] == 0 and \
+                job['lat_ns']['N'] == 0
+
+    def check_all_ddirs(self, ddir_nonzero, job):
+        """
+        Iterate over the data directions and check whether each is
+        appropriately empty or not.
+        """
+
+        retval = True
+        ddirlist = ['read', 'write', 'trim']
+
+        for ddir in ddirlist:
+            if ddir in ddir_nonzero:
+                if self.check_empty(job[ddir]):
+                    print(f"Unexpected zero {ddir} data found in output")
+                    retval = False
+            else:
+                if not self.check_empty(job[ddir]):
+                    print(f"Unexpected {ddir} data found in output")
+                    retval = False
+
+        return retval
+
+
+def run_fio_tests(test_list, test_env, args):
+    """
+    Run tests as specified in test_list.
+    """
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for config in test_list:
+        if (args.skip and config['test_id'] in args.skip) or \
+           (args.run_only and config['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            print(f"Test {config['test_id']} SKIPPED (User request)")
+            continue
+
+        if issubclass(config['test_class'], FioJobFileTest):
+            if config['pre_job']:
+                fio_pre_job = os.path.join(test_env['fio_root'], 't', 'jobs',
+                                           config['pre_job'])
+            else:
+                fio_pre_job = None
+            if config['pre_success']:
+                fio_pre_success = config['pre_success']
+            else:
+                fio_pre_success = None
+            if 'output_format' in config:
+                output_format = config['output_format']
+            else:
+                output_format = 'normal'
+            test = config['test_class'](
+                test_env['fio_path'],
+                os.path.join(test_env['fio_root'], 't', 'jobs', config['job']),
+                config['success'],
+                config['test_id'],
+                test_env['artifact_root'],
+                fio_pre_job=fio_pre_job,
+                fio_pre_success=fio_pre_success,
+                output_format=output_format)
+            desc = config['job']
+            parameters = []
+        elif issubclass(config['test_class'], FioJobCmdTest):
+            if not 'success' in config:
+                config['success'] = SUCCESS_DEFAULT
+            test = config['test_class'](test_env['fio_path'],
+                                        config['success'],
+                                        config['test_id'],
+                                        test_env['artifact_root'],
+                                        config['fio_opts'],
+                                        test_env['basename'])
+            desc = config['test_id']
+            parameters = config
+        elif issubclass(config['test_class'], FioExeTest):
+            exe_path = os.path.join(test_env['fio_root'], config['exe'])
+            parameters = []
+            if config['parameters']:
+                parameters = [p.format(fio_path=test_env['fio_path'], nvmecdev=args.nvmecdev)
+                              for p in config['parameters']]
+            if Path(exe_path).suffix == '.py' and platform.system() == "Windows":
+                parameters.insert(0, exe_path)
+                exe_path = "python.exe"
+            if config['test_id'] in test_env['pass_through']:
+                parameters += test_env['pass_through'][config['test_id']].split()
+            test = config['test_class'](
+                    exe_path,
+                    config['success'],
+                    config['test_id'],
+                    test_env['artifact_root'])
+            desc = config['exe']
+        else:
+            print(f"Test {config['test_id']} FAILED: unable to process test config")
+            failed = failed + 1
+            continue
+
+        if 'requirements' in config and not args.skip_req:
+            reqs_met = True
+            for req in config['requirements']:
+                reqs_met, reason = req()
+                logging.debug("Test %d: Requirement '%s' met? %s", config['test_id'], reason,
+                              reqs_met)
+                if not reqs_met:
+                    break
+            if not reqs_met:
+                print(f"Test {config['test_id']} SKIPPED ({reason}) {desc}")
+                skipped = skipped + 1
+                continue
+
+        try:
+            test.setup(parameters)
+            test.run()
+            test.check_result()
+        except KeyboardInterrupt:
+            break
+        except Exception as e:
+            test.passed = False
+            test.failure_reason += str(e)
+            logging.debug("Test %d exception:\n%s\n", config['test_id'], traceback.format_exc())
+        if test.passed:
+            result = "PASSED"
+            passed = passed + 1
+        else:
+            result = f"FAILED: {test.failure_reason}"
+            failed = failed + 1
+            contents, _ = get_file(test.filenames['stderr'])
+            logging.debug("Test %d: stderr:\n%s", config['test_id'], contents)
+            contents, _ = get_file(test.filenames['stdout'])
+            logging.debug("Test %d: stdout:\n%s", config['test_id'], contents)
+        print(f"Test {config['test_id']} {result} {desc}")
+
+    print(f"{passed} test(s) passed, {failed} failed, {skipped} skipped")
+
+    return passed, failed, skipped
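The comment in `FioExeTest.run()` explains why it uses `subprocess.Popen()` directly. On a timeout, `subprocess.run()` kills the child with SIGKILL. Calling `terminate()` instead sends SIGTERM on POSIX systems, which gives fio a chance to clean up. Below is a minimal standalone sketch of that pattern. `run_with_timeout` is a hypothetical helper. Unlike the test class, it captures output in memory rather than writing the `.stdout`, `.stderr` and `.exitcode` files. It also creates the process before entering `try`, so the timeout handler always has a valid `proc`.

```python
import subprocess
import sys


def run_with_timeout(command, timeout):
    """Run command; on timeout, terminate() it instead of killing it.

    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
        return proc.returncode, stdout, stderr, False
    except subprocess.TimeoutExpired:
        # SIGTERM on POSIX lets the child clean up before exiting;
        # subprocess.run() would call kill() (SIGKILL) at this point.
        proc.terminate()
        stdout, stderr = proc.communicate()
        return proc.returncode, stdout, stderr, True


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('hello')"], timeout=30))
```

As in the original, the second `communicate()` call waits indefinitely if the child ignores SIGTERM. A caller that needs a hard upper bound could pass a second timeout there and fall back to `kill()`.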
diff --git a/t/nvmept.py b/t/nvmept.py
index a25192f2..e235d160 100755
--- a/t/nvmept.py
+++ b/t/nvmept.py
@@ -17,42 +17,20 @@
 """
 import os
 import sys
-import json
 import time
-import locale
 import argparse
-import subprocess
 from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
 
-class FioTest():
-    """fio test."""
 
-    def __init__(self, artifact_root, test_opts, debug):
-        """
-        artifact_root   root directory for artifacts (subdirectory will be created under here)
-        test            test specification
-        """
-        self.artifact_root = artifact_root
-        self.test_opts = test_opts
-        self.debug = debug
-        self.filename_stub = None
-        self.filenames = {}
-        self.json_data = None
-
-        self.test_dir = os.path.abspath(os.path.join(self.artifact_root,
-                                     f"{self.test_opts['test_id']:03d}"))
-        if not os.path.exists(self.test_dir):
-            os.mkdir(self.test_dir)
-
-        self.filename_stub = f"pt{self.test_opts['test_id']:03d}"
-        self.filenames['command'] = os.path.join(self.test_dir, f"{self.filename_stub}.command")
-        self.filenames['stdout'] = os.path.join(self.test_dir, f"{self.filename_stub}.stdout")
-        self.filenames['stderr'] = os.path.join(self.test_dir, f"{self.filename_stub}.stderr")
-        self.filenames['exitcode'] = os.path.join(self.test_dir, f"{self.filename_stub}.exitcode")
-        self.filenames['output'] = os.path.join(self.test_dir, f"{self.filename_stub}.output")
+class PassThruTest(FioJobCmdTest):
+    """
+    NVMe pass-through test class. Check to make sure output for selected data
+    direction(s) is non-zero and that zero data appears for other directions.
+    """
 
-    def run_fio(self, fio_path):
-        """Run a test."""
+    def setup(self, parameters):
+        """Setup a test."""
 
         fio_args = [
             "--name=nvmept",
@@ -61,300 +39,172 @@ class FioTest():
             "--iodepth=8",
             "--iodepth_batch=4",
             "--iodepth_batch_complete=4",
-            f"--filename={self.test_opts['filename']}",
-            f"--rw={self.test_opts['rw']}",
+            f"--filename={self.fio_opts['filename']}",
+            f"--rw={self.fio_opts['rw']}",
             f"--output={self.filenames['output']}",
-            f"--output-format={self.test_opts['output-format']}",
+            f"--output-format={self.fio_opts['output-format']}",
         ]
         for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
                     'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
                     'time_based', 'runtime', 'verify', 'io_size']:
-            if opt in self.test_opts:
-                option = f"--{opt}={self.test_opts[opt]}"
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
                 fio_args.append(option)
 
-        command = [fio_path] + fio_args
-        with open(self.filenames['command'], "w+",
-                  encoding=locale.getpreferredencoding()) as command_file:
-            command_file.write(" ".join(command))
-
-        passed = True
-
-        try:
-            with open(self.filenames['stdout'], "w+",
-                      encoding=locale.getpreferredencoding()) as stdout_file, \
-                open(self.filenames['stderr'], "w+",
-                     encoding=locale.getpreferredencoding()) as stderr_file, \
-                open(self.filenames['exitcode'], "w+",
-                     encoding=locale.getpreferredencoding()) as exitcode_file:
-                proc = None
-                # Avoid using subprocess.run() here because when a timeout occurs,
-                # fio will be stopped with SIGKILL. This does not give fio a
-                # chance to clean up and means that child processes may continue
-                # running and submitting IO.
-                proc = subprocess.Popen(command,
-                                        stdout=stdout_file,
-                                        stderr=stderr_file,
-                                        cwd=self.test_dir,
-                                        universal_newlines=True)
-                proc.communicate(timeout=300)
-                exitcode_file.write(f'{proc.returncode}\n')
-                passed &= (proc.returncode == 0)
-        except subprocess.TimeoutExpired:
-            proc.terminate()
-            proc.communicate()
-            assert proc.poll()
-            print("Timeout expired")
-            passed = False
-        except Exception:
-            if proc:
-                if not proc.poll():
-                    proc.terminate()
-                    proc.communicate()
-            print(f"Exception: {sys.exc_info()}")
-            passed = False
-
-        if passed:
-            if 'output-format' in self.test_opts and 'json' in \
-                    self.test_opts['output-format']:
-                if not self.get_json():
-                    print('Unable to decode JSON data')
-                    passed = False
-
-        return passed
-
-    def get_json(self):
-        """Convert fio JSON output into a python JSON object"""
-
-        filename = self.filenames['output']
-        with open(filename, 'r', encoding=locale.getpreferredencoding()) as file:
-            file_data = file.read()
-
-        #
-        # Sometimes fio informational messages are included at the top of the
-        # JSON output, especially under Windows. Try to decode output as JSON
-        # data, lopping off up to the first four lines
-        #
-        lines = file_data.splitlines()
-        for i in range(5):
-            file_data = '\n'.join(lines[i:])
-            try:
-                self.json_data = json.loads(file_data)
-            except json.JSONDecodeError:
-                continue
-            else:
-                return True
-
-        return False
-
-    @staticmethod
-    def check_empty(job):
-        """
-        Make sure JSON data is empty.
-
-        Some data structures should be empty. This function makes sure that they are.
-
-        job         JSON object that we need to check for emptiness
-        """
-
-        return job['total_ios'] == 0 and \
-                job['slat_ns']['N'] == 0 and \
-                job['clat_ns']['N'] == 0 and \
-                job['lat_ns']['N'] == 0
-
-    def check_all_ddirs(self, ddir_nonzero, job):
-        """
-        Iterate over the data directions and check whether each is
-        appropriately empty or not.
-        """
-
-        retval = True
-        ddirlist = ['read', 'write', 'trim']
-
-        for ddir in ddirlist:
-            if ddir in ddir_nonzero:
-                if self.check_empty(job[ddir]):
-                    print(f"Unexpected zero {ddir} data found in output")
-                    retval = False
-            else:
-                if not self.check_empty(job[ddir]):
-                    print(f"Unexpected {ddir} data found in output")
-                    retval = False
-
-        return retval
-
-    def check(self):
-        """Check test output."""
-
-        raise NotImplementedError()
+        super().setup(fio_args)
 
 
-class PTTest(FioTest):
-    """
-    NVMe pass-through test class. Check to make sure output for selected data
-    direction(s) is non-zero and that zero data appears for other directions.
-    """
+    def check_result(self):
+        if 'rw' not in self.fio_opts:
+            return
 
-    def check(self):
-        if 'rw' not in self.test_opts:
-            return True
+        if not self.passed:
+            return
 
         job = self.json_data['jobs'][0]
-        retval = True
 
-        if self.test_opts['rw'] in ['read', 'randread']:
-            retval = self.check_all_ddirs(['read'], job)
-        elif self.test_opts['rw'] in ['write', 'randwrite']:
-            if 'verify' not in self.test_opts:
-                retval = self.check_all_ddirs(['write'], job)
+        if self.fio_opts['rw'] in ['read', 'randread']:
+            self.passed = self.check_all_ddirs(['read'], job)
+        elif self.fio_opts['rw'] in ['write', 'randwrite']:
+            if 'verify' not in self.fio_opts:
+                self.passed = self.check_all_ddirs(['write'], job)
             else:
-                retval = self.check_all_ddirs(['read', 'write'], job)
-        elif self.test_opts['rw'] in ['trim', 'randtrim']:
-            retval = self.check_all_ddirs(['trim'], job)
-        elif self.test_opts['rw'] in ['readwrite', 'randrw']:
-            retval = self.check_all_ddirs(['read', 'write'], job)
-        elif self.test_opts['rw'] in ['trimwrite', 'randtrimwrite']:
-            retval = self.check_all_ddirs(['trim', 'write'], job)
+                self.passed = self.check_all_ddirs(['read', 'write'], job)
+        elif self.fio_opts['rw'] in ['trim', 'randtrim']:
+            self.passed = self.check_all_ddirs(['trim'], job)
+        elif self.fio_opts['rw'] in ['readwrite', 'randrw']:
+            self.passed = self.check_all_ddirs(['read', 'write'], job)
+        elif self.fio_opts['rw'] in ['trimwrite', 'randtrimwrite']:
+            self.passed = self.check_all_ddirs(['trim', 'write'], job)
         else:
-            print(f"Unhandled rw value {self.test_opts['rw']}")
-            retval = False
-
-        return retval
-
+            print(f"Unhandled rw value {self.fio_opts['rw']}")
+            self.passed = False
 
-def parse_args():
-    """Parse command-line arguments."""
 
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-f', '--fio', help='path to file executable (e.g., ./fio)')
-    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
-    parser.add_argument('-d', '--debug', help='enable debug output', action='store_true')
-    parser.add_argument('-s', '--skip', nargs='+', type=int,
-                        help='list of test(s) to skip')
-    parser.add_argument('-o', '--run-only', nargs='+', type=int,
-                        help='list of test(s) to run, skipping all others')
-    parser.add_argument('--dut', help='target NVMe character device to test '
-                        '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
-    args = parser.parse_args()
-
-    return args
-
-
-def main():
-    """Run tests using fio's io_uring_cmd ioengine to send NVMe pass through commands."""
-
-    args = parse_args()
-
-    artifact_root = args.artifact_root if args.artifact_root else \
-        f"nvmept-test-{time.strftime('%Y%m%d-%H%M%S')}"
-    os.mkdir(artifact_root)
-    print(f"Artifact directory is {artifact_root}")
-
-    if args.fio:
-        fio = str(Path(args.fio).absolute())
-    else:
-        fio = 'fio'
-    print(f"fio path is {fio}")
-
-    test_list = [
-        {
-            "test_id": 1,
+TEST_LIST = [
+    {
+        "test_id": 1,
+        "fio_opts": {
             "rw": 'read',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 2,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 2,
+        "fio_opts": {
             "rw": 'randread',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 3,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 3,
+        "fio_opts": {
             "rw": 'write',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 4,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 4,
+        "fio_opts": {
             "rw": 'randwrite',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 5,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 5,
+        "fio_opts": {
             "rw": 'trim',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 6,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 6,
+        "fio_opts": {
             "rw": 'randtrim',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 7,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 7,
+        "fio_opts": {
             "rw": 'write',
             "io_size": 1024*1024,
             "verify": "crc32c",
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 8,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 8,
+        "fio_opts": {
             "rw": 'randwrite',
             "io_size": 1024*1024,
             "verify": "crc32c",
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 9,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 9,
+        "fio_opts": {
             "rw": 'readwrite',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 10,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 10,
+        "fio_opts": {
             "rw": 'randrw',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 11,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 11,
+        "fio_opts": {
             "rw": 'trimwrite',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 12,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 12,
+        "fio_opts": {
             "rw": 'randtrimwrite',
             "timebased": 1,
             "runtime": 3,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 13,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 13,
+        "fio_opts": {
             "rw": 'randread',
             "timebased": 1,
             "runtime": 3,
@@ -364,10 +214,12 @@ def main():
             "registerfiles": 1,
             "sqthread_poll": 1,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-        {
-            "test_id": 14,
+            },
+        "test_class": PassThruTest,
+    },
+    {
+        "test_id": 14,
+        "fio_opts": {
             "rw": 'randwrite',
             "timebased": 1,
             "runtime": 3,
@@ -377,36 +229,55 @@ def main():
             "registerfiles": 1,
             "sqthread_poll": 1,
             "output-format": "json",
-            "test_obj": PTTest,
-        },
-    ]
+            },
+        "test_class": PassThruTest,
+    },
+]
 
-    passed = 0
-    failed = 0
-    skipped = 0
+def parse_args():
+    """Parse command-line arguments."""
 
-    for test in test_list:
-        if (args.skip and test['test_id'] in args.skip) or \
-           (args.run_only and test['test_id'] not in args.run_only):
-            skipped = skipped + 1
-            outcome = 'SKIPPED (User request)'
-        else:
-            test['filename'] = args.dut
-            test_obj = test['test_obj'](artifact_root, test, args.debug)
-            status = test_obj.run_fio(fio)
-            if status:
-                status = test_obj.check()
-            if status:
-                passed = passed + 1
-                outcome = 'PASSED'
-            else:
-                failed = failed + 1
-                outcome = 'FAILED'
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut', help='target NVMe character device to test '
+                        '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    """Run tests using fio's io_uring_cmd ioengine to send NVMe pass through commands."""
+
+    args = parse_args()
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"nvmept-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio_path = str(Path(args.fio).absolute())
+    else:
+        fio_path = 'fio'
+    print(f"fio path is {fio_path}")
 
-        print(f"**********Test {test['test_id']} {outcome}**********")
+    for test in TEST_LIST:
+        test['fio_opts']['filename'] = args.dut
 
-    print(f"{passed} tests passed, {failed} failed, {skipped} skipped")
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'nvmept',
+              }
 
+    _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)
     sys.exit(failed)
 
 
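PassThruTest.check_result() maps each rw value to the data directions that must show I/O and requires every other direction to be empty. The table-driven function below restates that mapping over a fio-style JSON job dict. `EXPECTED_DDIRS`, `is_empty` and `check_job` are hypothetical names. The emptiness test mirrors the removed check_empty() helper, which checks total_ios and the slat/clat/lat sample counts.

```python
# Data directions expected to carry I/O for each rw mode.
EXPECTED_DDIRS = {
    "read": ["read"], "randread": ["read"],
    "write": ["write"], "randwrite": ["write"],
    "trim": ["trim"], "randtrim": ["trim"],
    "readwrite": ["read", "write"], "randrw": ["read", "write"],
    "trimwrite": ["trim", "write"], "randtrimwrite": ["trim", "write"],
}


def is_empty(ddir_stats):
    """True if a data direction recorded no I/O and no latency samples."""
    return (ddir_stats["total_ios"] == 0
            and ddir_stats["slat_ns"]["N"] == 0
            and ddir_stats["clat_ns"]["N"] == 0
            and ddir_stats["lat_ns"]["N"] == 0)


def check_job(rw, job, verify=False):
    """Return True if exactly the expected data directions in job saw I/O."""
    if rw not in EXPECTED_DDIRS:
        print(f"Unhandled rw value {rw}")
        return False
    expected = EXPECTED_DDIRS[rw]
    # A verify pass reads the written data back, so reads appear as well.
    if verify and rw in ("write", "randwrite"):
        expected = ["read", "write"]
    ok = True
    for ddir in ("read", "write", "trim"):
        if ddir in expected and is_empty(job[ddir]):
            print(f"Unexpected zero {ddir} data found in output")
            ok = False
        elif ddir not in expected and not is_empty(job[ddir]):
            print(f"Unexpected {ddir} data found in output")
            ok = False
    return ok
```

Keeping the mapping in a dict makes it easy to see which rw modes are covered. An unknown mode fails loudly rather than passing silently, matching the "Unhandled rw value" branch in the patch.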
diff --git a/t/random_seed.py b/t/random_seed.py
index 86f2eb21..02187046 100755
--- a/t/random_seed.py
+++ b/t/random_seed.py
@@ -23,38 +23,16 @@ import os
 import sys
 import time
 import locale
+import logging
 import argparse
-import subprocess
 from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
 
-class FioRandTest():
+class FioRandTest(FioJobCmdTest):
     """fio random seed test."""
 
-    def __init__(self, artifact_root, test_options, debug):
-        """
-        artifact_root   root directory for artifacts (subdirectory will be created under here)
-        test            test specification
-        """
-        self.artifact_root = artifact_root
-        self.test_options = test_options
-        self.debug = debug
-        self.filename_stub = None
-        self.filenames = {}
-
-        self.test_dir = os.path.abspath(os.path.join(self.artifact_root,
-                                     f"{self.test_options['test_id']:03d}"))
-        if not os.path.exists(self.test_dir):
-            os.mkdir(self.test_dir)
-
-        self.filename_stub = f"random{self.test_options['test_id']:03d}"
-        self.filenames['command'] = os.path.join(self.test_dir, f"{self.filename_stub}.command")
-        self.filenames['stdout'] = os.path.join(self.test_dir, f"{self.filename_stub}.stdout")
-        self.filenames['stderr'] = os.path.join(self.test_dir, f"{self.filename_stub}.stderr")
-        self.filenames['exitcode'] = os.path.join(self.test_dir, f"{self.filename_stub}.exitcode")
-        self.filenames['output'] = os.path.join(self.test_dir, f"{self.filename_stub}.output")
-
-    def run_fio(self, fio_path):
-        """Run a test."""
+    def setup(self, parameters):
+        """Setup the test."""
 
         fio_args = [
             "--debug=random",
@@ -65,52 +43,16 @@ class FioRandTest():
             f"--output={self.filenames['output']}",
         ]
         for opt in ['randseed', 'randrepeat', 'allrandrepeat']:
-            if opt in self.test_options:
-                option = f"--{opt}={self.test_options[opt]}"
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
                 fio_args.append(option)
 
-        command = [fio_path] + fio_args
-        with open(self.filenames['command'], "w+", encoding=locale.getpreferredencoding()) as command_file:
-            command_file.write(" ".join(command))
-
-        passed = True
-
-        try:
-            with open(self.filenames['stdout'], "w+", encoding=locale.getpreferredencoding()) as stdout_file, \
-                open(self.filenames['stderr'], "w+", encoding=locale.getpreferredencoding()) as stderr_file, \
-                open(self.filenames['exitcode'], "w+", encoding=locale.getpreferredencoding()) as exitcode_file:
-                proc = None
-                # Avoid using subprocess.run() here because when a timeout occurs,
-                # fio will be stopped with SIGKILL. This does not give fio a
-                # chance to clean up and means that child processes may continue
-                # running and submitting IO.
-                proc = subprocess.Popen(command,
-                                        stdout=stdout_file,
-                                        stderr=stderr_file,
-                                        cwd=self.test_dir,
-                                        universal_newlines=True)
-                proc.communicate(timeout=300)
-                exitcode_file.write(f'{proc.returncode}\n')
-                passed &= (proc.returncode == 0)
-        except subprocess.TimeoutExpired:
-            proc.terminate()
-            proc.communicate()
-            assert proc.poll()
-            print("Timeout expired")
-            passed = False
-        except Exception:
-            if proc:
-                if not proc.poll():
-                    proc.terminate()
-                    proc.communicate()
-            print(f"Exception: {sys.exc_info()}")
-            passed = False
-
-        return passed
+        super().setup(fio_args)
 
     def get_rand_seeds(self):
         """Collect random seeds from --debug=random output."""
-        with open(self.filenames['output'], "r", encoding=locale.getpreferredencoding()) as out_file:
+        with open(self.filenames['output'], "r",
+                  encoding=locale.getpreferredencoding()) as out_file:
             file_data = out_file.read()
 
             offsets = 0
@@ -136,11 +78,6 @@ class FioRandTest():
 
             return seed_list
 
-    def check(self):
-        """Check test output."""
-
-        raise NotImplementedError()
-
 
 class TestRR(FioRandTest):
     """
@@ -151,41 +88,35 @@ class TestRR(FioRandTest):
     # one set of seeds is for randrepeat=0 and the other is for randrepeat=1
     seeds = { 0: None, 1: None }
 
-    def check(self):
+    def check_result(self):
         """Check output for allrandrepeat=1."""
 
-        retval = True
-        opt = 'randrepeat' if 'randrepeat' in self.test_options else 'allrandrepeat'
-        rr = self.test_options[opt]
+        opt = 'randrepeat' if 'randrepeat' in self.fio_opts else 'allrandrepeat'
+        rr = self.fio_opts[opt]
         rand_seeds = self.get_rand_seeds()
 
         if not TestRR.seeds[rr]:
             TestRR.seeds[rr] = rand_seeds
-            if self.debug:
-                print(f"TestRR: saving rand_seeds for [a]rr={rr}")
+            logging.debug("TestRR: saving rand_seeds for [a]rr=%d", rr)
         else:
             if rr:
                 if TestRR.seeds[1] != rand_seeds:
-                    retval = False
+                    self.passed = False
                     print(f"TestRR: unexpected seed mismatch for [a]rr={rr}")
                 else:
-                    if self.debug:
-                        print(f"TestRR: seeds correctly match for [a]rr={rr}")
+                    logging.debug("TestRR: seeds correctly match for [a]rr=%d", rr)
                 if TestRR.seeds[0] == rand_seeds:
-                    retval = False
+                    self.passed = False
                     print("TestRR: seeds unexpectedly match those from system RNG")
             else:
                 if TestRR.seeds[0] == rand_seeds:
-                    retval = False
+                    self.passed = False
                     print(f"TestRR: unexpected seed match for [a]rr={rr}")
                 else:
-                    if self.debug:
-                        print(f"TestRR: seeds correctly don't match for [a]rr={rr}")
+                    logging.debug("TestRR: seeds correctly don't match for [a]rr=%d", rr)
                 if TestRR.seeds[1] == rand_seeds:
-                    retval = False
-                    print(f"TestRR: random seeds unexpectedly match those from [a]rr=1")
-
-        return retval
+                    self.passed = False
+                    print("TestRR: random seeds unexpectedly match those from [a]rr=1")
 
 
 class TestRS(FioRandTest):
@@ -197,40 +128,33 @@ class TestRS(FioRandTest):
     """
     seeds = {}
 
-    def check(self):
+    def check_result(self):
         """Check output for randseed=something."""
 
-        retval = True
         rand_seeds = self.get_rand_seeds()
-        randseed = self.test_options['randseed']
+        randseed = self.fio_opts['randseed']
 
-        if self.debug:
-            print("randseed = ", randseed)
+        logging.debug("randseed = %s", randseed)
 
         if randseed not in TestRS.seeds:
             TestRS.seeds[randseed] = rand_seeds
-            if self.debug:
-                print("TestRS: saving rand_seeds")
+            logging.debug("TestRS: saving rand_seeds")
         else:
             if TestRS.seeds[randseed] != rand_seeds:
-                retval = False
+                self.passed = False
                 print("TestRS: seeds don't match when they should")
             else:
-                if self.debug:
-                    print("TestRS: seeds correctly match")
+                logging.debug("TestRS: seeds correctly match")
 
         # Now try to find seeds generated using a different randseed and make
         # sure they *don't* match
-        for key in TestRS.seeds:
+        for key, value in TestRS.seeds.items():
             if key != randseed:
-                if TestRS.seeds[key] == rand_seeds:
-                    retval = False
+                if value == rand_seeds:
+                    self.passed = False
                     print("TestRS: randseeds differ but generated seeds match.")
                 else:
-                    if self.debug:
-                        print("TestRS: randseeds differ and generated seeds also differ.")
-
-        return retval
+                    logging.debug("TestRS: randseeds differ and generated seeds also differ.")
 
 
 def parse_args():
@@ -254,139 +178,161 @@ def main():
 
     args = parse_args()
 
+    if args.debug:
+        logging.basicConfig(level=logging.DEBUG)
+    else:
+        logging.basicConfig(level=logging.INFO)
+
     artifact_root = args.artifact_root if args.artifact_root else \
         f"random-seed-test-{time.strftime('%Y%m%d-%H%M%S')}"
     os.mkdir(artifact_root)
     print(f"Artifact directory is {artifact_root}")
 
     if args.fio:
-        fio = str(Path(args.fio).absolute())
+        fio_path = str(Path(args.fio).absolute())
     else:
-        fio = 'fio'
-    print(f"fio path is {fio}")
+        fio_path = 'fio'
+    print(f"fio path is {fio_path}")
 
     test_list = [
         {
             "test_id": 1,
-            "randrepeat": 0,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "randrepeat": 0,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 2,
-            "randrepeat": 0,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "randrepeat": 0,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 3,
-            "randrepeat": 1,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "randrepeat": 1,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 4,
-            "randrepeat": 1,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "randrepeat": 1,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 5,
-            "allrandrepeat": 0,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "allrandrepeat": 0,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 6,
-            "allrandrepeat": 0,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "allrandrepeat": 0,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 7,
-            "allrandrepeat": 1,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "allrandrepeat": 1,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 8,
-            "allrandrepeat": 1,
-            "test_obj": TestRR,
+            "fio_opts": {
+                "allrandrepeat": 1,
+                },
+            "test_class": TestRR,
         },
         {
             "test_id": 9,
-            "randrepeat": 0,
-            "randseed": "12345",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "randrepeat": 0,
+                "randseed": "12345",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 10,
-            "randrepeat": 0,
-            "randseed": "12345",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "randrepeat": 0,
+                "randseed": "12345",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 11,
-            "randrepeat": 1,
-            "randseed": "12345",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "randrepeat": 1,
+                "randseed": "12345",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 12,
-            "allrandrepeat": 0,
-            "randseed": "12345",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "allrandrepeat": 0,
+                "randseed": "12345",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 13,
-            "allrandrepeat": 1,
-            "randseed": "12345",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "allrandrepeat": 1,
+                "randseed": "12345",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 14,
-            "randrepeat": 0,
-            "randseed": "67890",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "randrepeat": 0,
+                "randseed": "67890",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 15,
-            "randrepeat": 1,
-            "randseed": "67890",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "randrepeat": 1,
+                "randseed": "67890",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 16,
-            "allrandrepeat": 0,
-            "randseed": "67890",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "allrandrepeat": 0,
+                "randseed": "67890",
+                },
+            "test_class": TestRS,
         },
         {
             "test_id": 17,
-            "allrandrepeat": 1,
-            "randseed": "67890",
-            "test_obj": TestRS,
+            "fio_opts": {
+                "allrandrepeat": 1,
+                "randseed": "67890",
+                },
+            "test_class": TestRS,
         },
     ]
 
-    passed = 0
-    failed = 0
-    skipped = 0
-
-    for test in test_list:
-        if (args.skip and test['test_id'] in args.skip) or \
-           (args.run_only and test['test_id'] not in args.run_only):
-            skipped = skipped + 1
-            outcome = 'SKIPPED (User request)'
-        else:
-            test_obj = test['test_obj'](artifact_root, test, args.debug)
-            status = test_obj.run_fio(fio)
-            if status:
-                status = test_obj.check()
-            if status:
-                passed = passed + 1
-                outcome = 'PASSED'
-            else:
-                failed = failed + 1
-                outcome = 'FAILED'
-
-        print(f"**********Test {test['test_id']} {outcome}**********")
-
-    print(f"{passed} tests passed, {failed} failed, {skipped} skipped")
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'random',
+              }
 
+    _, failed, _ = run_fio_tests(test_list, test_env, args)
     sys.exit(failed)
 
 
diff --git a/t/readonly.py b/t/readonly.py
index 80fac639..d36faafa 100755
--- a/t/readonly.py
+++ b/t/readonly.py
@@ -2,8 +2,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
 # Copyright (c) 2019 Western Digital Corporation or its affiliates.
-#
-#
+
+"""
 # readonly.py
 #
 # Do some basic tests of the --readonly parameter
@@ -18,122 +18,144 @@
 # REQUIREMENTS
 # Python 3.5+
 #
-#
+"""
 
+import os
 import sys
+import time
 import argparse
-import subprocess
+from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
+from fiotestcommon import SUCCESS_DEFAULT, SUCCESS_NONZERO
+
+
+class FioReadOnlyTest(FioJobCmdTest):
+    """fio read only test."""
+
+    def setup(self, parameters):
+        """Setup the test."""
+
+        fio_args = [
+                    "--name=readonly",
+                    "--ioengine=null",
+                    "--time_based",
+                    "--runtime=1s",
+                    "--size=1M",
+                    f"--rw={self.fio_opts['rw']}",
+                   ]
+        if 'readonly-pre' in parameters:
+            fio_args.insert(0, "--readonly")
+        if 'readonly-post' in parameters:
+            fio_args.append("--readonly")
+
+        super().setup(fio_args)
+
+
+TEST_LIST = [
+            {
+                "test_id": 1,
+                "fio_opts": { "rw": "randread", },
+                "readonly-pre": 1,
+                "success": SUCCESS_DEFAULT,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 2,
+                "fio_opts": { "rw": "randwrite", },
+                "readonly-pre": 1,
+                "success": SUCCESS_NONZERO,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 3,
+                "fio_opts": { "rw": "randtrim", },
+                "readonly-pre": 1,
+                "success": SUCCESS_NONZERO,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 4,
+                "fio_opts": { "rw": "randread", },
+                "readonly-post": 1,
+                "success": SUCCESS_DEFAULT,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 5,
+                "fio_opts": { "rw": "randwrite", },
+                "readonly-post": 1,
+                "success": SUCCESS_NONZERO,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 6,
+                "fio_opts": { "rw": "randtrim", },
+                "readonly-post": 1,
+                "success": SUCCESS_NONZERO,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 7,
+                "fio_opts": { "rw": "randread", },
+                "success": SUCCESS_DEFAULT,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 8,
+                "fio_opts": { "rw": "randwrite", },
+                "success": SUCCESS_DEFAULT,
+                "test_class": FioReadOnlyTest,
+            },
+            {
+                "test_id": 9,
+                "fio_opts": { "rw": "randtrim", },
+                "success": SUCCESS_DEFAULT,
+                "test_class": FioReadOnlyTest,
+            },
+        ]
 
 
 def parse_args():
+    """Parse command-line arguments."""
+
     parser = argparse.ArgumentParser()
-    parser.add_argument('-f', '--fio',
-                        help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
     args = parser.parse_args()
 
     return args
 
 
-def run_fio(fio, test, index):
-    fio_args = [
-                "--max-jobs=16",
-                "--name=readonly",
-                "--ioengine=null",
-                "--time_based",
-                "--runtime=1s",
-                "--size=1M",
-                "--rw={rw}".format(**test),
-               ]
-    if 'readonly-pre' in test:
-        fio_args.insert(0, "--readonly")
-    if 'readonly-post' in test:
-        fio_args.append("--readonly")
-
-    output = subprocess.run([fio] + fio_args, stdout=subprocess.PIPE,
-                            stderr=subprocess.PIPE)
-
-    return output
-
-
-def check_output(output, test):
-    expect_error = False
-    if 'readonly-pre' in test or 'readonly-post' in test:
-        if 'write' in test['rw'] or 'trim' in test['rw']:
-            expect_error = True
-
-#    print(output.stdout)
-#    print(output.stderr)
-
-    if output.returncode == 0:
-        if expect_error:
-            return False
-        else:
-            return True
-    else:
-        if expect_error:
-            return True
-        else:
-            return False
-
+def main():
+    """Run readonly tests."""
 
-if __name__ == '__main__':
     args = parse_args()
 
-    tests = [
-                {
-                    "rw": "randread",
-                    "readonly-pre": 1,
-                },
-                {
-                    "rw": "randwrite",
-                    "readonly-pre": 1,
-                },
-                {
-                    "rw": "randtrim",
-                    "readonly-pre": 1,
-                },
-                {
-                    "rw": "randread",
-                    "readonly-post": 1,
-                },
-                {
-                    "rw": "randwrite",
-                    "readonly-post": 1,
-                },
-                {
-                    "rw": "randtrim",
-                    "readonly-post": 1,
-                },
-                {
-                    "rw": "randread",
-                },
-                {
-                    "rw": "randwrite",
-                },
-                {
-                    "rw": "randtrim",
-                },
-            ]
-
-    index = 1
-    passed = 0
-    failed = 0
-
     if args.fio:
-        fio_path = args.fio
+        fio_path = str(Path(args.fio).absolute())
     else:
         fio_path = 'fio'
+    print(f"fio path is {fio_path}")
 
-    for test in tests:
-        output = run_fio(fio_path, test, index)
-        status = check_output(output, test)
-        print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
-        if status:
-            passed = passed + 1
-        else:
-            failed = failed + 1
-        index = index + 1
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"readonly-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
 
-    print("{0} tests passed, {1} failed".format(passed, failed))
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'readonly',
+              }
 
+    _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)
     sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index c91deed4..1448f7cb 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -43,298 +43,17 @@
 
 import os
 import sys
-import json
 import time
 import shutil
 import logging
 import argparse
-import platform
-import traceback
-import subprocess
-import multiprocessing
 from pathlib import Path
 from statsmodels.sandbox.stats.runs import runstest_1samp
+from fiotestlib import FioExeTest, FioJobFileTest, run_fio_tests
+from fiotestcommon import *
 
 
-class FioTest():
-    """Base for all fio tests."""
-
-    def __init__(self, exe_path, parameters, success):
-        self.exe_path = exe_path
-        self.parameters = parameters
-        self.success = success
-        self.output = {}
-        self.artifact_root = None
-        self.testnum = None
-        self.test_dir = None
-        self.passed = True
-        self.failure_reason = ''
-        self.command_file = None
-        self.stdout_file = None
-        self.stderr_file = None
-        self.exitcode_file = None
-
-    def setup(self, artifact_root, testnum):
-        """Setup instance variables for test."""
-
-        self.artifact_root = artifact_root
-        self.testnum = testnum
-        self.test_dir = os.path.join(artifact_root, f"{testnum:04d}")
-        if not os.path.exists(self.test_dir):
-            os.mkdir(self.test_dir)
-
-        self.command_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.exe_path)}.command")
-        self.stdout_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.exe_path)}.stdout")
-        self.stderr_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.exe_path)}.stderr")
-        self.exitcode_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.exe_path)}.exitcode")
-
-    def run(self):
-        """Run the test."""
-
-        raise NotImplementedError()
-
-    def check_result(self):
-        """Check test results."""
-
-        raise NotImplementedError()
-
-
-class FioExeTest(FioTest):
-    """Test consists of an executable binary or script"""
-
-    def __init__(self, exe_path, parameters, success):
-        """Construct a FioExeTest which is a FioTest consisting of an
-        executable binary or script.
-
-        exe_path:       location of executable binary or script
-        parameters:     list of parameters for executable
-        success:        Definition of test success
-        """
-
-        FioTest.__init__(self, exe_path, parameters, success)
-
-    def run(self):
-        """Execute the binary or script described by this instance."""
-
-        command = [self.exe_path] + self.parameters
-        command_file = open(self.command_file, "w+")
-        command_file.write(f"{command}\n")
-        command_file.close()
-
-        stdout_file = open(self.stdout_file, "w+")
-        stderr_file = open(self.stderr_file, "w+")
-        exitcode_file = open(self.exitcode_file, "w+")
-        try:
-            proc = None
-            # Avoid using subprocess.run() here because when a timeout occurs,
-            # fio will be stopped with SIGKILL. This does not give fio a
-            # chance to clean up and means that child processes may continue
-            # running and submitting IO.
-            proc = subprocess.Popen(command,
-                                    stdout=stdout_file,
-                                    stderr=stderr_file,
-                                    cwd=self.test_dir,
-                                    universal_newlines=True)
-            proc.communicate(timeout=self.success['timeout'])
-            exitcode_file.write(f'{proc.returncode}\n')
-            logging.debug("Test %d: return code: %d", self.testnum, proc.returncode)
-            self.output['proc'] = proc
-        except subprocess.TimeoutExpired:
-            proc.terminate()
-            proc.communicate()
-            assert proc.poll()
-            self.output['failure'] = 'timeout'
-        except Exception:
-            if proc:
-                if not proc.poll():
-                    proc.terminate()
-                    proc.communicate()
-            self.output['failure'] = 'exception'
-            self.output['exc_info'] = sys.exc_info()
-        finally:
-            stdout_file.close()
-            stderr_file.close()
-            exitcode_file.close()
-
-    def check_result(self):
-        """Check results of test run."""
-
-        if 'proc' not in self.output:
-            if self.output['failure'] == 'timeout':
-                self.failure_reason = f"{self.failure_reason} timeout,"
-            else:
-                assert self.output['failure'] == 'exception'
-                self.failure_reason = '{0} exception: {1}, {2}'.format(
-                    self.failure_reason, self.output['exc_info'][0],
-                    self.output['exc_info'][1])
-
-            self.passed = False
-            return
-
-        if 'zero_return' in self.success:
-            if self.success['zero_return']:
-                if self.output['proc'].returncode != 0:
-                    self.passed = False
-                    self.failure_reason = f"{self.failure_reason} non-zero return code,"
-            else:
-                if self.output['proc'].returncode == 0:
-                    self.failure_reason = f"{self.failure_reason} zero return code,"
-                    self.passed = False
-
-        stderr_size = os.path.getsize(self.stderr_file)
-        if 'stderr_empty' in self.success:
-            if self.success['stderr_empty']:
-                if stderr_size != 0:
-                    self.failure_reason = f"{self.failure_reason} stderr not empty,"
-                    self.passed = False
-            else:
-                if stderr_size == 0:
-                    self.failure_reason = f"{self.failure_reason} stderr empty,"
-                    self.passed = False
-
-
-class FioJobTest(FioExeTest):
-    """Test consists of a fio job"""
-
-    def __init__(self, fio_path, fio_job, success, fio_pre_job=None,
-                 fio_pre_success=None, output_format="normal"):
-        """Construct a FioJobTest which is a FioExeTest consisting of a
-        single fio job file with an optional setup step.
-
-        fio_path:           location of fio executable
-        fio_job:            location of fio job file
-        success:            Definition of test success
-        fio_pre_job:        fio job for preconditioning
-        fio_pre_success:    Definition of test success for fio precon job
-        output_format:      normal (default), json, jsonplus, or terse
-        """
-
-        self.fio_job = fio_job
-        self.fio_pre_job = fio_pre_job
-        self.fio_pre_success = fio_pre_success if fio_pre_success else success
-        self.output_format = output_format
-        self.precon_failed = False
-        self.json_data = None
-        self.fio_output = f"{os.path.basename(self.fio_job)}.output"
-        self.fio_args = [
-            "--max-jobs=16",
-            f"--output-format={self.output_format}",
-            f"--output={self.fio_output}",
-            self.fio_job,
-            ]
-        FioExeTest.__init__(self, fio_path, self.fio_args, success)
-
-    def setup(self, artifact_root, testnum):
-        """Setup instance variables for fio job test."""
-
-        super().setup(artifact_root, testnum)
-
-        self.command_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.fio_job)}.command")
-        self.stdout_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.fio_job)}.stdout")
-        self.stderr_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.fio_job)}.stderr")
-        self.exitcode_file = os.path.join(
-            self.test_dir,
-            f"{os.path.basename(self.fio_job)}.exitcode")
-
-    def run_pre_job(self):
-        """Run fio job precondition step."""
-
-        precon = FioJobTest(self.exe_path, self.fio_pre_job,
-                            self.fio_pre_success,
-                            output_format=self.output_format)
-        precon.setup(self.artifact_root, self.testnum)
-        precon.run()
-        precon.check_result()
-        self.precon_failed = not precon.passed
-        self.failure_reason = precon.failure_reason
-
-    def run(self):
-        """Run fio job test."""
-
-        if self.fio_pre_job:
-            self.run_pre_job()
-
-        if not self.precon_failed:
-            super().run()
-        else:
-            logging.debug("Test %d: precondition step failed", self.testnum)
-
-    @classmethod
-    def get_file(cls, filename):
-        """Safely read a file."""
-        file_data = ''
-        success = True
-
-        try:
-            with open(filename, "r") as output_file:
-                file_data = output_file.read()
-        except OSError:
-            success = False
-
-        return file_data, success
-
-    def get_file_fail(self, filename):
-        """Safely read a file and fail the test upon error."""
-        file_data = None
-
-        try:
-            with open(filename, "r") as output_file:
-                file_data = output_file.read()
-        except OSError:
-            self.failure_reason += f" unable to read file {filename}"
-            self.passed = False
-
-        return file_data
-
-    def check_result(self):
-        """Check fio job results."""
-
-        if self.precon_failed:
-            self.passed = False
-            self.failure_reason = f"{self.failure_reason} precondition step failed,"
-            return
-
-        super().check_result()
-
-        if not self.passed:
-            return
-
-        if 'json' not in self.output_format:
-            return
-
-        file_data = self.get_file_fail(os.path.join(self.test_dir, self.fio_output))
-        if not file_data:
-            return
-
-        #
-        # Sometimes fio informational messages are included at the top of the
-        # JSON output, especially under Windows. Try to decode output as JSON
-        # data, skipping everything until the first {
-        #
-        lines = file_data.splitlines()
-        file_data = '\n'.join(lines[lines.index("{"):])
-        try:
-            self.json_data = json.loads(file_data)
-        except json.JSONDecodeError:
-            self.failure_reason = f"{self.failure_reason} unable to decode JSON data,"
-            self.passed = False
-
-
-class FioJobTest_t0005(FioJobTest):
+class FioJobFileTest_t0005(FioJobFileTest):
     """Test consists of fio test job t0005
     Confirm that read['io_kbytes'] == write['io_kbytes'] == 102400"""
 
@@ -352,7 +71,7 @@ class FioJobTest_t0005(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0006(FioJobTest):
+class FioJobFileTest_t0006(FioJobFileTest):
     """Test consists of fio test job t0006
     Confirm that read['io_kbytes'] ~ 2*write['io_kbytes']"""
 
@@ -370,7 +89,7 @@ class FioJobTest_t0006(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0007(FioJobTest):
+class FioJobFileTest_t0007(FioJobFileTest):
     """Test consists of fio test job t0007
     Confirm that read['io_kbytes'] = 87040"""
 
@@ -385,7 +104,7 @@ class FioJobTest_t0007(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0008(FioJobTest):
+class FioJobFileTest_t0008(FioJobFileTest):
     """Test consists of fio test job t0008
     Confirm that read['io_kbytes'] = 32768 and that
                 write['io_kbytes'] ~ 16384
@@ -413,7 +132,7 @@ class FioJobTest_t0008(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0009(FioJobTest):
+class FioJobFileTest_t0009(FioJobFileTest):
     """Test consists of fio test job t0009
     Confirm that runtime >= 60s"""
 
@@ -430,7 +149,7 @@ class FioJobTest_t0009(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0012(FioJobTest):
+class FioJobFileTest_t0012(FioJobFileTest):
     """Test consists of fio test job t0012
     Confirm ratios of job iops are 1:5:10
     job1,job2,job3 respectively"""
@@ -443,7 +162,7 @@ class FioJobTest_t0012(FioJobTest):
 
         iops_files = []
         for i in range(1, 4):
-            filename = os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(
+            filename = os.path.join(self.paths['test_dir'], "{0}_iops.{1}.log".format(os.path.basename(
                 self.fio_job), i))
             file_data = self.get_file_fail(filename)
             if not file_data:
@@ -475,7 +194,7 @@ class FioJobTest_t0012(FioJobTest):
             return
 
 
-class FioJobTest_t0014(FioJobTest):
+class FioJobFileTest_t0014(FioJobFileTest):
     """Test consists of fio test job t0014
 	Confirm that job1_iops / job2_iops ~ 1:2 for entire duration
 	and that job1_iops / job3_iops ~ 1:3 for first half of duration.
@@ -491,7 +210,7 @@ class FioJobTest_t0014(FioJobTest):
 
         iops_files = []
         for i in range(1, 4):
-            filename = os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(
+            filename = os.path.join(self.paths['test_dir'], "{0}_iops.{1}.log".format(os.path.basename(
                 self.fio_job), i))
             file_data = self.get_file_fail(filename)
             if not file_data:
@@ -534,7 +253,7 @@ class FioJobTest_t0014(FioJobTest):
             return
 
 
-class FioJobTest_t0015(FioJobTest):
+class FioJobFileTest_t0015(FioJobFileTest):
     """Test consists of fio test jobs t0015 and t0016
     Confirm that mean(slat) + mean(clat) = mean(tlat)"""
 
@@ -555,14 +274,14 @@ class FioJobTest_t0015(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0019(FioJobTest):
+class FioJobFileTest_t0019(FioJobFileTest):
     """Test consists of fio test job t0019
     Confirm that all offsets were touched sequentially"""
 
     def check_result(self):
         super().check_result()
 
-        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        bw_log_filename = os.path.join(self.paths['test_dir'], "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
         if not file_data:
             return
@@ -585,14 +304,14 @@ class FioJobTest_t0019(FioJobTest):
             self.failure_reason = f"unexpected last offset {cur}"
 
 
-class FioJobTest_t0020(FioJobTest):
+class FioJobFileTest_t0020(FioJobFileTest):
     """Test consists of fio test jobs t0020 and t0021
     Confirm that almost all offsets were touched non-sequentially"""
 
     def check_result(self):
         super().check_result()
 
-        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        bw_log_filename = os.path.join(self.paths['test_dir'], "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
         if not file_data:
             return
@@ -624,13 +343,13 @@ class FioJobTest_t0020(FioJobTest):
             self.failure_reason += f" runs test failed with p = {p}"
 
 
-class FioJobTest_t0022(FioJobTest):
+class FioJobFileTest_t0022(FioJobFileTest):
     """Test consists of fio test job t0022"""
 
     def check_result(self):
         super().check_result()
 
-        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        bw_log_filename = os.path.join(self.paths['test_dir'], "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
         if not file_data:
             return
@@ -662,13 +381,13 @@ class FioJobTest_t0022(FioJobTest):
             self.failure_reason += " no duplicate offsets found with norandommap=1"
 
 
-class FioJobTest_t0023(FioJobTest):
+class FioJobFileTest_t0023(FioJobFileTest):
     """Test consists of fio test job t0023 randtrimwrite test."""
 
     def check_trimwrite(self, filename):
         """Make sure that trims are followed by writes of the same size at the same offset."""
 
-        bw_log_filename = os.path.join(self.test_dir, filename)
+        bw_log_filename = os.path.join(self.paths['test_dir'], filename)
         file_data = self.get_file_fail(bw_log_filename)
         if not file_data:
             return
@@ -716,7 +435,7 @@ class FioJobTest_t0023(FioJobTest):
     def check_all_offsets(self, filename, sectorsize, filesize):
         """Make sure all offsets were touched."""
 
-        file_data = self.get_file_fail(os.path.join(self.test_dir, filename))
+        file_data = self.get_file_fail(os.path.join(self.paths['test_dir'], filename))
         if not file_data:
             return
 
@@ -771,12 +490,12 @@ class FioJobTest_t0023(FioJobTest):
         self.check_all_offsets("bssplit_bw.log", 512, filesize)
 
 
-class FioJobTest_t0024(FioJobTest_t0023):
+class FioJobFileTest_t0024(FioJobFileTest_t0023):
     """Test consists of fio test job t0024 trimwrite test."""
 
     def check_result(self):
-        # call FioJobTest_t0023's parent to skip checks done by t0023
-        super(FioJobTest_t0023, self).check_result()
+        # call FioJobFileTest_t0023's parent to skip checks done by t0023
+        super(FioJobFileTest_t0023, self).check_result()
 
         filesize = 1024*1024
 
@@ -791,7 +510,7 @@ class FioJobTest_t0024(FioJobTest_t0023):
         self.check_all_offsets("bssplit_bw.log", 512, filesize)
 
 
-class FioJobTest_t0025(FioJobTest):
+class FioJobFileTest_t0025(FioJobFileTest):
     """Test experimental verify read backs written data pattern."""
     def check_result(self):
         super().check_result()
@@ -802,11 +521,11 @@ class FioJobTest_t0025(FioJobTest):
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 128:
             self.passed = False
 
-class FioJobTest_t0027(FioJobTest):
+class FioJobFileTest_t0027(FioJobFileTest):
     def setup(self, *args, **kws):
         super().setup(*args, **kws)
-        self.pattern_file = os.path.join(self.test_dir, "t0027.pattern")
-        self.output_file = os.path.join(self.test_dir, "t0027file")
+        self.pattern_file = os.path.join(self.paths['test_dir'], "t0027.pattern")
+        self.output_file = os.path.join(self.paths['test_dir'], "t0027file")
         self.pattern = os.urandom(16 << 10)
         with open(self.pattern_file, "wb") as f:
             f.write(self.pattern)
@@ -823,7 +542,7 @@ class FioJobTest_t0027(FioJobTest):
         if data != self.pattern:
             self.passed = False
 
-class FioJobTest_iops_rate(FioJobTest):
+class FioJobFileTest_iops_rate(FioJobFileTest):
     """Test consists of fio test job t0011
     Confirm that job0 iops == 1000
     and that job1_iops / job0_iops ~ 8
@@ -851,156 +570,10 @@ class FioJobTest_iops_rate(FioJobTest):
             self.passed = False
 
 
-class Requirements():
-    """Requirements consists of multiple run environment characteristics.
-    These are to determine if a particular test can be run"""
-
-    _linux = False
-    _libaio = False
-    _io_uring = False
-    _zbd = False
-    _root = False
-    _zoned_nullb = False
-    _not_macos = False
-    _not_windows = False
-    _unittests = False
-    _cpucount4 = False
-    _nvmecdev = False
-
-    def __init__(self, fio_root, args):
-        Requirements._not_macos = platform.system() != "Darwin"
-        Requirements._not_windows = platform.system() != "Windows"
-        Requirements._linux = platform.system() == "Linux"
-
-        if Requirements._linux:
-            config_file = os.path.join(fio_root, "config-host.h")
-            contents, success = FioJobTest.get_file(config_file)
-            if not success:
-                print(f"Unable to open {config_file} to check requirements")
-                Requirements._zbd = True
-            else:
-                Requirements._zbd = "CONFIG_HAS_BLKZONED" in contents
-                Requirements._libaio = "CONFIG_LIBAIO" in contents
-
-            contents, success = FioJobTest.get_file("/proc/kallsyms")
-            if not success:
-                print("Unable to open '/proc/kallsyms' to probe for io_uring support")
-            else:
-                Requirements._io_uring = "io_uring_setup" in contents
-
-            Requirements._root = os.geteuid() == 0
-            if Requirements._zbd and Requirements._root:
-                try:
-                    subprocess.run(["modprobe", "null_blk"],
-                                   stdout=subprocess.PIPE,
-                                   stderr=subprocess.PIPE)
-                    if os.path.exists("/sys/module/null_blk/parameters/zoned"):
-                        Requirements._zoned_nullb = True
-                except Exception:
-                    pass
-
-        if platform.system() == "Windows":
-            utest_exe = "unittest.exe"
-        else:
-            utest_exe = "unittest"
-        unittest_path = os.path.join(fio_root, "unittests", utest_exe)
-        Requirements._unittests = os.path.exists(unittest_path)
-
-        Requirements._cpucount4 = multiprocessing.cpu_count() >= 4
-        Requirements._nvmecdev = args.nvmecdev
-
-        req_list = [
-                Requirements.linux,
-                Requirements.libaio,
-                Requirements.io_uring,
-                Requirements.zbd,
-                Requirements.root,
-                Requirements.zoned_nullb,
-                Requirements.not_macos,
-                Requirements.not_windows,
-                Requirements.unittests,
-                Requirements.cpucount4,
-                Requirements.nvmecdev,
-                    ]
-        for req in req_list:
-            value, desc = req()
-            logging.debug("Requirements: Requirement '%s' met? %s", desc, value)
-
-    @classmethod
-    def linux(cls):
-        """Are we running on Linux?"""
-        return Requirements._linux, "Linux required"
-
-    @classmethod
-    def libaio(cls):
-        """Is libaio available?"""
-        return Requirements._libaio, "libaio required"
-
-    @classmethod
-    def io_uring(cls):
-        """Is io_uring available?"""
-        return Requirements._io_uring, "io_uring required"
-
-    @classmethod
-    def zbd(cls):
-        """Is ZBD support available?"""
-        return Requirements._zbd, "Zoned block device support required"
-
-    @classmethod
-    def root(cls):
-        """Are we running as root?"""
-        return Requirements._root, "root required"
-
-    @classmethod
-    def zoned_nullb(cls):
-        """Are zoned null block devices available?"""
-        return Requirements._zoned_nullb, "Zoned null block device support required"
-
-    @classmethod
-    def not_macos(cls):
-        """Are we running on a platform other than macOS?"""
-        return Requirements._not_macos, "platform other than macOS required"
-
-    @classmethod
-    def not_windows(cls):
-        """Are we running on a platform other than Windws?"""
-        return Requirements._not_windows, "platform other than Windows required"
-
-    @classmethod
-    def unittests(cls):
-        """Were unittests built?"""
-        return Requirements._unittests, "Unittests support required"
-
-    @classmethod
-    def cpucount4(cls):
-        """Do we have at least 4 CPUs?"""
-        return Requirements._cpucount4, "4+ CPUs required"
-
-    @classmethod
-    def nvmecdev(cls):
-        """Do we have an NVMe character device to test?"""
-        return Requirements._nvmecdev, "NVMe character device test target required"
-
-
-SUCCESS_DEFAULT = {
-    'zero_return': True,
-    'stderr_empty': True,
-    'timeout': 600,
-    }
-SUCCESS_NONZERO = {
-    'zero_return': False,
-    'stderr_empty': False,
-    'timeout': 600,
-    }
-SUCCESS_STDERR = {
-    'zero_return': True,
-    'stderr_empty': False,
-    'timeout': 600,
-    }
 TEST_LIST = [
     {
         'test_id':          1,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0001-52c58027.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1009,7 +582,7 @@ TEST_LIST = [
     },
     {
         'test_id':          2,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0002-13af05ae-post.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          't0002-13af05ae-pre.fio',
@@ -1018,7 +591,7 @@ TEST_LIST = [
     },
     {
         'test_id':          3,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0003-0ae2c6e1-post.fio',
         'success':          SUCCESS_NONZERO,
         'pre_job':          't0003-0ae2c6e1-pre.fio',
@@ -1027,7 +600,7 @@ TEST_LIST = [
     },
     {
         'test_id':          4,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0004-8a99fdf6.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1036,7 +609,7 @@ TEST_LIST = [
     },
     {
         'test_id':          5,
-        'test_class':       FioJobTest_t0005,
+        'test_class':       FioJobFileTest_t0005,
         'job':              't0005-f7078f7b.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1046,7 +619,7 @@ TEST_LIST = [
     },
     {
         'test_id':          6,
-        'test_class':       FioJobTest_t0006,
+        'test_class':       FioJobFileTest_t0006,
         'job':              't0006-82af2a7c.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1056,7 +629,7 @@ TEST_LIST = [
     },
     {
         'test_id':          7,
-        'test_class':       FioJobTest_t0007,
+        'test_class':       FioJobFileTest_t0007,
         'job':              't0007-37cf9e3c.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1066,7 +639,7 @@ TEST_LIST = [
     },
     {
         'test_id':          8,
-        'test_class':       FioJobTest_t0008,
+        'test_class':       FioJobFileTest_t0008,
         'job':              't0008-ae2fafc8.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1076,7 +649,7 @@ TEST_LIST = [
     },
     {
         'test_id':          9,
-        'test_class':       FioJobTest_t0009,
+        'test_class':       FioJobFileTest_t0009,
         'job':              't0009-f8b0bd10.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1088,7 +661,7 @@ TEST_LIST = [
     },
     {
         'test_id':          10,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0010-b7aae4ba.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1097,7 +670,7 @@ TEST_LIST = [
     },
     {
         'test_id':          11,
-        'test_class':       FioJobTest_iops_rate,
+        'test_class':       FioJobFileTest_iops_rate,
         'job':              't0011-5d2788d5.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1107,7 +680,7 @@ TEST_LIST = [
     },
     {
         'test_id':          12,
-        'test_class':       FioJobTest_t0012,
+        'test_class':       FioJobFileTest_t0012,
         'job':              't0012.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1117,7 +690,7 @@ TEST_LIST = [
     },
     {
         'test_id':          13,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0013.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1127,7 +700,7 @@ TEST_LIST = [
     },
     {
         'test_id':          14,
-        'test_class':       FioJobTest_t0014,
+        'test_class':       FioJobFileTest_t0014,
         'job':              't0014.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1137,7 +710,7 @@ TEST_LIST = [
     },
     {
         'test_id':          15,
-        'test_class':       FioJobTest_t0015,
+        'test_class':       FioJobFileTest_t0015,
         'job':              't0015-e78980ff.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1147,7 +720,7 @@ TEST_LIST = [
     },
     {
         'test_id':          16,
-        'test_class':       FioJobTest_t0015,
+        'test_class':       FioJobFileTest_t0015,
         'job':              't0016-d54ae22.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1157,7 +730,7 @@ TEST_LIST = [
     },
     {
         'test_id':          17,
-        'test_class':       FioJobTest_t0015,
+        'test_class':       FioJobFileTest_t0015,
         'job':              't0017.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1167,7 +740,7 @@ TEST_LIST = [
     },
     {
         'test_id':          18,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0018.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1176,7 +749,7 @@ TEST_LIST = [
     },
     {
         'test_id':          19,
-        'test_class':       FioJobTest_t0019,
+        'test_class':       FioJobFileTest_t0019,
         'job':              't0019.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1185,7 +758,7 @@ TEST_LIST = [
     },
     {
         'test_id':          20,
-        'test_class':       FioJobTest_t0020,
+        'test_class':       FioJobFileTest_t0020,
         'job':              't0020.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1194,7 +767,7 @@ TEST_LIST = [
     },
     {
         'test_id':          21,
-        'test_class':       FioJobTest_t0020,
+        'test_class':       FioJobFileTest_t0020,
         'job':              't0021.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1203,7 +776,7 @@ TEST_LIST = [
     },
     {
         'test_id':          22,
-        'test_class':       FioJobTest_t0022,
+        'test_class':       FioJobFileTest_t0022,
         'job':              't0022.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1212,7 +785,7 @@ TEST_LIST = [
     },
     {
         'test_id':          23,
-        'test_class':       FioJobTest_t0023,
+        'test_class':       FioJobFileTest_t0023,
         'job':              't0023.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1221,7 +794,7 @@ TEST_LIST = [
     },
     {
         'test_id':          24,
-        'test_class':       FioJobTest_t0024,
+        'test_class':       FioJobFileTest_t0024,
         'job':              't0024.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1230,7 +803,7 @@ TEST_LIST = [
     },
     {
         'test_id':          25,
-        'test_class':       FioJobTest_t0025,
+        'test_class':       FioJobFileTest_t0025,
         'job':              't0025.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1240,7 +813,7 @@ TEST_LIST = [
     },
     {
         'test_id':          26,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0026.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1249,7 +822,7 @@ TEST_LIST = [
     },
     {
         'test_id':          27,
-        'test_class':       FioJobTest_t0027,
+        'test_class':       FioJobFileTest_t0027,
         'job':              't0027.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1258,7 +831,7 @@ TEST_LIST = [
     },
     {
         'test_id':          28,
-        'test_class':       FioJobTest,
+        'test_class':       FioJobFileTest,
         'job':              't0028-c6cade16.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -1317,7 +890,7 @@ TEST_LIST = [
         'test_id':          1006,
         'test_class':       FioExeTest,
         'exe':              't/strided.py',
-        'parameters':       ['{fio_path}'],
+        'parameters':       ['--fio', '{fio_path}'],
         'success':          SUCCESS_DEFAULT,
         'requirements':     [],
     },
@@ -1461,98 +1034,15 @@ def main():
     print(f"Artifact directory is {artifact_root}")
 
     if not args.skip_req:
-        req = Requirements(fio_root, args)
-
-    passed = 0
-    failed = 0
-    skipped = 0
-
-    for config in TEST_LIST:
-        if (args.skip and config['test_id'] in args.skip) or \
-           (args.run_only and config['test_id'] not in args.run_only):
-            skipped = skipped + 1
-            print(f"Test {config['test_id']} SKIPPED (User request)")
-            continue
-
-        if issubclass(config['test_class'], FioJobTest):
-            if config['pre_job']:
-                fio_pre_job = os.path.join(fio_root, 't', 'jobs',
-                                           config['pre_job'])
-            else:
-                fio_pre_job = None
-            if config['pre_success']:
-                fio_pre_success = config['pre_success']
-            else:
-                fio_pre_success = None
-            if 'output_format' in config:
-                output_format = config['output_format']
-            else:
-                output_format = 'normal'
-            test = config['test_class'](
-                fio_path,
-                os.path.join(fio_root, 't', 'jobs', config['job']),
-                config['success'],
-                fio_pre_job=fio_pre_job,
-                fio_pre_success=fio_pre_success,
-                output_format=output_format)
-            desc = config['job']
-        elif issubclass(config['test_class'], FioExeTest):
-            exe_path = os.path.join(fio_root, config['exe'])
-            if config['parameters']:
-                parameters = [p.format(fio_path=fio_path, nvmecdev=args.nvmecdev)
-                              for p in config['parameters']]
-            else:
-                parameters = []
-            if Path(exe_path).suffix == '.py' and platform.system() == "Windows":
-                parameters.insert(0, exe_path)
-                exe_path = "python.exe"
-            if config['test_id'] in pass_through:
-                parameters += pass_through[config['test_id']].split()
-            test = config['test_class'](exe_path, parameters,
-                                        config['success'])
-            desc = config['exe']
-        else:
-            print(f"Test {config['test_id']} FAILED: unable to process test config")
-            failed = failed + 1
-            continue
-
-        if not args.skip_req:
-            reqs_met = True
-            for req in config['requirements']:
-                reqs_met, reason = req()
-                logging.debug("Test %d: Requirement '%s' met? %s", config['test_id'], reason,
-                              reqs_met)
-                if not reqs_met:
-                    break
-            if not reqs_met:
-                print(f"Test {config['test_id']} SKIPPED ({reason}) {desc}")
-                skipped = skipped + 1
-                continue
-
-        try:
-            test.setup(artifact_root, config['test_id'])
-            test.run()
-            test.check_result()
-        except KeyboardInterrupt:
-            break
-        except Exception as e:
-            test.passed = False
-            test.failure_reason += str(e)
-            logging.debug("Test %d exception:\n%s\n", config['test_id'], traceback.format_exc())
-        if test.passed:
-            result = "PASSED"
-            passed = passed + 1
-        else:
-            result = f"FAILED: {test.failure_reason}"
-            failed = failed + 1
-            contents, _ = FioJobTest.get_file(test.stderr_file)
-            logging.debug("Test %d: stderr:\n%s", config['test_id'], contents)
-            contents, _ = FioJobTest.get_file(test.stdout_file)
-            logging.debug("Test %d: stdout:\n%s", config['test_id'], contents)
-        print(f"Test {config['test_id']} {result} {desc}")
-
-    print(f"{passed} test(s) passed, {failed} failed, {skipped} skipped")
-
+        Requirements(fio_root, args)
+
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': fio_root,
+              'artifact_root': artifact_root,
+              'pass_through': pass_through,
+              }
+    _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)
     sys.exit(failed)
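After this patch, `main()` delegates the per-test loop to `run_fio_tests()`. The selection rule from the removed loop applies `--skip` first, then `--run-only`, and it can be sketched on its own. `select_tests` is a hypothetical helper for illustration and is not part of `fiotestlib`.

```python
def select_tests(test_list, skip=None, run_only=None):
    """Split test configs into (selected, skipped_ids).

    Uses the same skip/run-only rule as the loop removed from main():
    a test is skipped if it is listed in skip, or if run_only is given
    and the test is not listed in it.
    """
    selected = []
    skipped_ids = []
    for config in test_list:
        test_id = config['test_id']
        if (skip and test_id in skip) or (run_only and test_id not in run_only):
            skipped_ids.append(test_id)
            continue
        selected.append(config)
    return selected, skipped_ids


tests = [{'test_id': i} for i in range(1, 6)]
chosen, skipped = select_tests(tests, skip=[2], run_only=[1, 2, 3])
print([c['test_id'] for c in chosen], skipped)  # [1, 3] [2, 4, 5]
```

With both options given, a test must survive the skip list and also appear in the run-only list.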
 
 
diff --git a/t/strided.py b/t/strided.py
index 45e6f148..b7655e1e 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -1,11 +1,12 @@
 #!/usr/bin/env python3
-#
+
+"""
 # strided.py
 #
 # Test zonemode=strided. This uses the null ioengine when no file is
 # specified. If a file is specified, use it for randdom read testing.
 # Some of the zoneranges in the tests are 16MiB. So when using a file
-# a minimum size of 32MiB is recommended.
+# a minimum size of 64MiB is recommended.
 #
 # USAGE
 # python strided.py fio-executable [-f file/device]
@@ -13,12 +14,9 @@
 # EXAMPLES
 # python t/strided.py ./fio
 # python t/strided.py ./fio -f /dev/sda
-# dd if=/dev/zero of=temp bs=1M count=32
+# dd if=/dev/zero of=temp bs=1M count=64
 # python t/strided.py ./fio -f temp
 #
-# REQUIREMENTS
-# Python 2.6+
-#
 # ===TEST MATRIX===
 #
 # --zonemode=strided, zoneskip unset
@@ -28,322 +26,417 @@
 #       zonesize<zonerange  all blocks inside zone
 #
 #   w/o randommap       all blocks inside zone
-#
+"""
 
-from __future__ import absolute_import
-from __future__ import print_function
 import os
 import sys
+import time
 import argparse
-import subprocess
+from pathlib import Path
+from fiotestlib import FioJobCmdTest, run_fio_tests
 
 
-def parse_args():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('fio',
-                        help='path to fio executable (e.g., ./fio)')
-    parser.add_argument('-f', '--filename', help="file/device to test")
-    args = parser.parse_args()
+class StridedTest(FioJobCmdTest):
+    """Test zonemode=strided."""
 
-    return args
+    def setup(self, parameters):
+        fio_args = [
+                    "--name=strided",
+                    "--zonemode=strided",
+                    "--log_offset=1",
+                    "--randrepeat=0",
+                    "--rw=randread",
+                    f"--write_iops_log={self.filenames['iopslog']}",
+                    f"--output={self.filenames['output']}",
+                    f"--zonerange={self.fio_opts['zonerange']}",
+                    f"--zonesize={self.fio_opts['zonesize']}",
+                    f"--bs={self.fio_opts['bs']}",
+                   ]
 
+        for opt in ['norandommap', 'random_generator', 'offset']:
+            if opt in self.fio_opts:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
 
-def run_fio(fio, test, index):
-    filename = "strided"
-    fio_args = [
-                "--max-jobs=16",
-                "--name=strided",
-                "--zonemode=strided",
-                "--log_offset=1",
-                "--randrepeat=0",
-                "--rw=randread",
-                "--write_iops_log={0}{1:03d}".format(filename, index),
-                "--output={0}{1:03d}.out".format(filename, index),
-                "--zonerange={zonerange}".format(**test),
-                "--zonesize={zonesize}".format(**test),
-                "--bs={bs}".format(**test),
-               ]
-    if 'norandommap' in test:
-        fio_args.append('--norandommap')
-    if 'random_generator' in test:
-        fio_args.append('--random_generator={random_generator}'.format(**test))
-    if 'offset' in test:
-        fio_args.append('--offset={offset}'.format(**test))
-    if 'filename' in test:
-        fio_args.append('--filename={filename}'.format(**test))
-        fio_args.append('--filesize={filesize})'.format(**test))
-    else:
-        fio_args.append('--ioengine=null')
-        fio_args.append('--size={size}'.format(**test))
-        fio_args.append('--io_size={io_size}'.format(**test))
-        fio_args.append('--filesize={size})'.format(**test))
-
-    output = subprocess.check_output([fio] + fio_args, universal_newlines=True)
-
-    f = open("{0}{1:03d}_iops.1.log".format(filename, index), "r")
-    log = f.read()
-    f.close()
-
-    return log
-
-
-def check_output(iops_log, test):
-    zonestart = 0 if 'offset' not in test else test['offset']
-    iospersize = test['zonesize'] / test['bs']
-    iosperrange = test['zonerange'] / test['bs']
-    iosperzone = 0
-    lines = iops_log.split('\n')
-    zoneset = set()
-
-    for line in lines:
-        if len(line) == 0:
-            continue
-
-        if iosperzone == iospersize:
-            # time to move to a new zone
-            iosperzone = 0
-            zoneset = set()
-            zonestart += test['zonerange']
-            if zonestart >= test['filesize']:
-                zonestart = 0 if 'offset' not in test else test['offset']
-
-        iosperzone = iosperzone + 1
-        tokens = line.split(',')
-        offset = int(tokens[4])
-        if offset < zonestart or offset >= zonestart + test['zonerange']:
-            print("Offset {0} outside of zone starting at {1}".format(
-                    offset, zonestart))
-            return False
-
-        # skip next section if norandommap is enabled with no
-        # random_generator or with a random_generator != lfsr
-        if 'norandommap' in test:
-            if 'random_generator' in test:
-                if test['random_generator'] != 'lfsr':
-                    continue
-            else:
+        if 'filename' in self.fio_opts:
+            for opt in ['filename', 'filesize']:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
+        else:
+            fio_args.append('--ioengine=null')
+            for opt in ['size', 'io_size', 'filesize']:
+                option = f"--{opt}={self.fio_opts[opt]}"
+                fio_args.append(option)
+
+        super().setup(fio_args)
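`StridedTest.setup()` turns each entry's `fio_opts` into fio command-line options. The sketch below reproduces that mapping as a pure function, so the resulting argument list can be inspected without running fio. `build_strided_args` and its `iopslog`/`output` parameters are hypothetical stand-ins for the `self.filenames` entries used in the patch. Both branches read `filesize`, which none of the `TEST_LIST` entries shown here define, so the caller is assumed to supply it.

```python
def build_strided_args(fio_opts, iopslog, output):
    """Assemble zonemode=strided fio options from one test's fio_opts (hypothetical helper)."""
    args = [
        "--name=strided",
        "--zonemode=strided",
        "--log_offset=1",
        "--randrepeat=0",
        "--rw=randread",
        f"--write_iops_log={iopslog}",
        f"--output={output}",
        f"--zonerange={fio_opts['zonerange']}",
        f"--zonesize={fio_opts['zonesize']}",
        f"--bs={fio_opts['bs']}",
    ]
    # Optional knobs are passed through only when the test defines them.
    for opt in ['norandommap', 'random_generator', 'offset']:
        if opt in fio_opts:
            args.append(f"--{opt}={fio_opts[opt]}")
    if 'filename' in fio_opts:
        # A real target file/device: read from it directly.
        for opt in ['filename', 'filesize']:
            args.append(f"--{opt}={fio_opts[opt]}")
    else:
        # No target: the null ioengine completes I/O without touching storage.
        args.append('--ioengine=null')
        for opt in ['size', 'io_size', 'filesize']:
            args.append(f"--{opt}={fio_opts[opt]}")
    return args


opts = {"zonerange": 4096, "zonesize": 4096, "bs": 4096, "offset": 8*4096,
        "size": 16*4096, "io_size": 16*4096, "filesize": 16*4096}
print(build_strided_args(opts, "strided_iops", "strided.out")[-4:])
# ['--ioengine=null', '--size=65536', '--io_size=65536', '--filesize=65536']
```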
+
+    def check_result(self):
+        zonestart = 0 if 'offset' not in self.fio_opts else self.fio_opts['offset']
+        iospersize = self.fio_opts['zonesize'] / self.fio_opts['bs']
+        iosperrange = self.fio_opts['zonerange'] / self.fio_opts['bs']
+        iosperzone = 0
+        lines = self.iops_log_lines.split('\n')
+        zoneset = set()
+
+        for line in lines:
+            if len(line) == 0:
                 continue
 
-        # we either have a random map enabled or we
-        # are using an LFSR
-        # so all blocks should be unique and we should have
-        # covered the entire zone when iosperzone % iosperrange == 0
-        block = (offset - zonestart) / test['bs']
-        if block in zoneset:
-            print("Offset {0} in zone already touched".format(offset))
-            return False
-
-        zoneset.add(block)
-        if iosperzone % iosperrange == 0:
-            if len(zoneset) != iosperrange:
-                print("Expected {0} blocks in zone but only saw {1}".format(
-                        iosperrange, len(zoneset)))
+            if iosperzone == iospersize:
+                # time to move to a new zone
+                iosperzone = 0
+                zoneset = set()
+                zonestart += self.fio_opts['zonerange']
+                if zonestart >= self.fio_opts['filesize']:
+                    zonestart = 0 if 'offset' not in self.fio_opts else self.fio_opts['offset']
+
+            iosperzone = iosperzone + 1
+            tokens = line.split(',')
+            offset = int(tokens[4])
+            if offset < zonestart or offset >= zonestart + self.fio_opts['zonerange']:
+                print(f"Offset {offset} outside of zone starting at {zonestart}")
                 return False
-            zoneset = set()
 
-    return True
+            # skip next section if norandommap is enabled with no
+            # random_generator or with a random_generator != lfsr
+            if 'norandommap' in self.fio_opts:
+                if 'random_generator' in self.fio_opts:
+                    if self.fio_opts['random_generator'] != 'lfsr':
+                        continue
+                else:
+                    continue
 
+            # we either have a random map enabled or we
+            # are using an LFSR
+            # so all blocks should be unique and we should have
+            # covered the entire zone when iosperzone % iosperrange == 0
+            block = (offset - zonestart) / self.fio_opts['bs']
+            if block in zoneset:
+                print(f"Offset {offset} in zone already touched")
+                return False
+
+            zoneset.add(block)
+            if iosperzone % iosperrange == 0:
+                if len(zoneset) != iosperrange:
+                    print(f"Expected {iosperrange} blocks in zone but only saw {len(zoneset)}")
+                    return False
+                zoneset = set()
+
+        return True
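`check_result()` walks the IOPS log. After every `zonesize / bs` I/Os it advances `zonestart` by `zonerange`, wrapping back to `offset` once it reaches `filesize`. The sketch applies the same walk to a plain list of offsets, which makes the zone bookkeeping easy to exercise. `check_strided_offsets` is hypothetical. It uses integer division where the patch uses `/`; the two agree because every size here is a multiple of `bs`. Passing `unique=False` mimics the `norandommap` case without LFSR, where only the zone bounds are checked.

```python
def check_strided_offsets(offsets, zonerange, zonesize, bs, filesize,
                          offset=0, unique=True):
    """Check that offsets follow a zonemode=strided walk (hypothetical helper).

    Returns (True, "ok") or (False, reason), mirroring StridedTest.check_result().
    """
    zonestart = offset
    ios_per_size = zonesize // bs      # I/Os issued before moving to the next zone
    ios_per_range = zonerange // bs    # distinct blocks in one zone range
    ios_in_zone = 0
    seen = set()

    for off in offsets:
        if ios_in_zone == ios_per_size:
            # Move to the next zone, wrapping to the start offset at filesize.
            ios_in_zone = 0
            seen = set()
            zonestart += zonerange
            if zonestart >= filesize:
                zonestart = offset
        ios_in_zone += 1
        if off < zonestart or off >= zonestart + zonerange:
            return False, f"Offset {off} outside of zone starting at {zonestart}"
        if not unique:
            # norandommap without LFSR: repeats are allowed, only bounds matter.
            continue
        block = (off - zonestart) // bs
        if block in seen:
            return False, f"Offset {off} in zone already touched"
        seen.add(block)
        if ios_in_zone % ios_per_range == 0:
            # A full pass over the range must have touched every block once.
            if len(seen) != ios_per_range:
                return False, f"Expected {ios_per_range} blocks in zone but only saw {len(seen)}"
            seen = set()
    return True, "ok"


print(check_strided_offsets([0, 4096, 8192, 12288, 0], 4096, 4096, 4096, 4 * 4096))
# (True, 'ok')
print(check_strided_offsets([0, 0], 4096, 4096, 4096, 4 * 4096))
# (False, 'Offset 0 outside of zone starting at 4096')
```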
+
+
+TEST_LIST = [   # randommap enabled
+    {
+        "test_id": 1,
+        "fio_opts": {
+            "zonerange": 4096,
+            "zonesize": 4096,
+            "bs": 4096,
+            "offset": 8*4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 2,
+        "fio_opts": {
+            "zonerange": 4096,
+            "zonesize": 4096,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 3,
+        "fio_opts": {
+            "zonerange": 16*1024*1024,
+            "zonesize": 16*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 4,
+        "fio_opts": {
+            "zonerange": 4096,
+            "zonesize": 4*4096,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 5,
+        "fio_opts": {
+            "zonerange": 16*1024*1024,
+            "zonesize": 32*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 6,
+        "fio_opts": {
+            "zonerange": 8192,
+            "zonesize": 4096,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 7,
+        "fio_opts": {
+            "zonerange": 16*1024*1024,
+            "zonesize": 8*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    # lfsr
+    {
+        "test_id": 8,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 4096*1024,
+            "zonesize": 4096*1024,
+            "bs": 4096,
+            "offset": 8*4096*1024,
+            "size": 16*4096*1024,
+            "io_size": 16*4096*1024,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 9,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 4096*1024,
+            "zonesize": 4096*1024,
+            "bs": 4096,
+            "size": 16*4096*1024,
+            "io_size": 16*4096*1024,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 10,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 16*1024*1024,
+            "zonesize": 16*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 11,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 4096*1024,
+            "zonesize": 4*4096*1024,
+            "bs": 4096,
+            "size": 16*4096*1024,
+            "io_size": 16*4096*1024,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 12,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 16*1024*1024,
+            "zonesize": 32*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 13,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 8192*1024,
+            "zonesize": 4096*1024,
+            "bs": 4096,
+            "size": 16*4096*1024,
+            "io_size": 16*4096*1024,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 14,
+        "fio_opts": {
+            "random_generator": "lfsr",
+            "zonerange": 16*1024*1024,
+            "zonesize": 8*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    # norandommap
+    {
+        "test_id": 15,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 4096,
+            "zonesize": 4096,
+            "bs": 4096,
+            "offset": 8*4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 16,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 4096,
+            "zonesize": 4096,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 17,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 16*1024*1024,
+            "zonesize": 16*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 18,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 4096,
+            "zonesize": 8192,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 19,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 16*1024*1024,
+            "zonesize": 32*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*204,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 20,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 8192,
+            "zonesize": 4096,
+            "bs": 4096,
+            "size": 16*4096,
+            "io_size": 16*4096,
+            },
+        "test_class": StridedTest,
+    },
+    {
+        "test_id": 21,
+        "fio_opts": {
+            "norandommap": 1,
+            "zonerange": 16*1024*1024,
+            "zonesize": 8*1024*1024,
+            "bs": 4096,
+            "size": 256*1024*1024,
+            "io_size": 256*1024*1024,
+            },
+        "test_class": StridedTest,
+    },
+]
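The `TEST_LIST` entries cover the TEST MATRIX from the docstring. Each block-selection mode (random map, LFSR, `norandommap`) is paired with zone sizes equal to, larger than, and smaller than the zone range. A small hypothetical helper can label an entry by the cell it exercises, which is handy when checking coverage after editing the list.

```python
def matrix_cell(fio_opts):
    """Return (selection mode, zonesize vs. zonerange) for one test entry (hypothetical helper)."""
    if 'norandommap' in fio_opts:
        mode = 'norandommap'
    else:
        # Entries with neither key form the "randommap enabled" group.
        mode = fio_opts.get('random_generator', 'randommap')
    size, rng = fio_opts['zonesize'], fio_opts['zonerange']
    if size == rng:
        relation = 'zonesize=zonerange'
    elif size > rng:
        relation = 'zonesize>zonerange'
    else:
        relation = 'zonesize<zonerange'
    return mode, relation


# Entries from TEST_LIST (tests 1, 11 and 20), trimmed to the relevant keys.
sample = [
    {"zonerange": 4096, "zonesize": 4096, "bs": 4096},
    {"random_generator": "lfsr", "zonerange": 4096*1024, "zonesize": 4*4096*1024, "bs": 4096},
    {"norandommap": 1, "zonerange": 8192, "zonesize": 4096, "bs": 4096},
]
print([matrix_cell(opts) for opts in sample])
# [('randommap', 'zonesize=zonerange'), ('lfsr', 'zonesize>zonerange'), ('norandommap', 'zonesize<zonerange')]
```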
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut',
+                        help='target file/device to test.')
+    args = parser.parse_args()
+
+    return args
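The new `parse_args()` makes the fio path an option rather than a positional argument, which is why `run-fio-tests.py` now passes `--fio {fio_path}`. The copy below adds an optional `argv` parameter, which is not in the patch, so the parsing can be tried without a real command line. Note that argparse stores `--run-only` and `--artifact-root` as `run_only` and `artifact_root`.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv is an addition for testing (hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
    parser.add_argument('-s', '--skip', nargs='+', type=int,
                        help='list of test(s) to skip')
    parser.add_argument('-o', '--run-only', nargs='+', type=int,
                        help='list of test(s) to run, skipping all others')
    parser.add_argument('--dut', help='target file/device to test.')
    # argv=None falls back to sys.argv[1:], matching the original behavior.
    return parser.parse_args(argv)


args = parse_args(['--fio', './fio', '--skip', '3', '17', '--dut', '/tmp/temp'])
print(args.fio, args.skip, args.run_only, args.dut)  # ./fio [3, 17] None /tmp/temp
```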
+
+
+def main():
+    """Run zonemode=strided tests."""
 
-if __name__ == '__main__':
     args = parse_args()
 
-    tests = [   # randommap enabled
-                {
-                    "zonerange": 4096,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "offset": 8*4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "zonerange": 4096,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 16*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "zonerange": 4096,
-                    "zonesize": 4*4096,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 32*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "zonerange": 8192,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 8*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                # lfsr
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 4096*1024,
-                    "zonesize": 4096*1024,
-                    "bs": 4096,
-                    "offset": 8*4096*1024,
-                    "size": 16*4096*1024,
-                    "io_size": 16*4096*1024,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 4096*1024,
-                    "zonesize": 4096*1024,
-                    "bs": 4096,
-                    "size": 16*4096*1024,
-                    "io_size": 16*4096*1024,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 16*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 4096*1024,
-                    "zonesize": 4*4096*1024,
-                    "bs": 4096,
-                    "size": 16*4096*1024,
-                    "io_size": 16*4096*1024,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 32*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 8192*1024,
-                    "zonesize": 4096*1024,
-                    "bs": 4096,
-                    "size": 16*4096*1024,
-                    "io_size": 16*4096*1024,
-                },
-                {
-                    "random_generator": "lfsr",
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 8*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                # norandommap
-                {
-                    "norandommap": 1,
-                    "zonerange": 4096,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "offset": 8*4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 4096,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 16*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 4096,
-                    "zonesize": 8192,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 32*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 8192,
-                    "zonesize": 4096,
-                    "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
-                },
-                {
-                    "norandommap": 1,
-                    "zonerange": 16*1024*1024,
-                    "zonesize": 8*1024*1024,
-                    "bs": 4096,
-                    "size": 256*1024*1024,
-                    "io_size": 256*1024*1024,
-                },
-
-            ]
-
-    index = 1
-    passed = 0
-    failed = 0
-
-    if args.filename:
-        statinfo = os.stat(args.filename)
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"strided-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio_path = str(Path(args.fio).absolute())
+    else:
+        fio_path = 'fio'
+    print(f"fio path is {fio_path}")
+
+    if args.dut:
+        statinfo = os.stat(args.dut)
         filesize = statinfo.st_size
         if filesize == 0:
-            f = os.open(args.filename, os.O_RDONLY)
+            f = os.open(args.dut, os.O_RDONLY)
             filesize = os.lseek(f, 0, os.SEEK_END)
             os.close(f)
 
-    for test in tests:
-        if args.filename:
-            test['filename'] = args.filename
-            test['filesize'] = filesize
+    for test in TEST_LIST:
+        if args.dut:
+            test['fio_opts']['filename'] = os.path.abspath(args.dut)
+            test['fio_opts']['filesize'] = filesize
         else:
-            test['filesize'] = test['size']
-        iops_log = run_fio(args.fio, test, index)
-        status = check_output(iops_log, test)
-        print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
-        if status:
-            passed = passed + 1
-        else:
-            failed = failed + 1
-        index = index + 1
+            test['fio_opts']['filesize'] = test['fio_opts']['size']
 
-    print("{0} tests passed, {1} failed".format(passed, failed))
+    test_env = {
+              'fio_path': fio_path,
+              'fio_root': str(Path(__file__).absolute().parent.parent),
+              'artifact_root': artifact_root,
+              'basename': 'strided',
+              }
 
+    _, failed, _ = run_fio_tests(TEST_LIST, test_env, args)
     sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
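The rewritten script sizes the device under test with os.stat(). When st_size is 0, as it is for block devices on Linux, the script falls back to opening the path and seeking to its end. The same fallback can be sketched in C with POSIX stat(), open() and lseek(). The helper name dut_size() is hypothetical and is not part of fio or the script.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the script's size lookup: trust st_size when
 * it is non-zero (regular files); otherwise open the path read-only and seek
 * to the end, which yields the capacity of a block device.
 * Returns the size in bytes, or -1 on error.
 */
static off_t dut_size(const char *path)
{
	struct stat st;
	off_t size;
	int fd;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -1;
	if (st.st_size != 0)
		return st.st_size;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	size = lseek(fd, 0, SEEK_END);	/* -1 if the file is not seekable */
	close(fd);
	return size;
}
```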
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 996160e7..a3d37a7d 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -460,7 +460,8 @@ test11() {
 test12() {
     local size off capacity
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -477,7 +478,8 @@ test13() {
 
     require_max_open_zones 4 || return $SKIP_TESTCASE
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -726,7 +728,9 @@ test29() {
     require_seq_zones 80 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    prep_write
+
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     opts=("--debug=zbd")
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
@@ -796,7 +800,8 @@ test32() {
 
     require_zbd || return $SKIP_TESTCASE
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
@@ -1024,7 +1029,9 @@ test48() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    prep_write
+
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
     opts+=("--debug=zbd")
     opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")
@@ -1094,7 +1101,7 @@ test51() {
 	require_conv_zones 8 || return $SKIP_TESTCASE
 	require_seq_zones 8 || return $SKIP_TESTCASE
 
-	prep_write
+	reset_zone "$dev" -1
 
 	off=$((first_sequential_zone_sector * 512 - 8 * zone_size))
 	opts+=("--size=$((16 * zone_size))" "$(ioengine "libaio")")
@@ -1361,6 +1368,51 @@ test63() {
 	check_reset_count -eq 3 || return $?
 }
 
+# Test that write zone accounting handles almost full zones correctly. Prepare
+# an almost full, but not full, zone. Write to the zone with verify using a
+# larger block size, then confirm fio reports no write zone accounting failure.
+test64() {
+	local bs cap
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	bs=$((zone_size / 8))
+	cap=$(total_zone_capacity 1 $((first_sequential_zone_sector*512)) $dev)
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs" \
+		       --size=$((zone_size)) \
+		       --io_size=$((cap - bs)) \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+
+	bs=$((zone_size / 2))
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs" \
+		       --size=$((zone_size)) --do_verify=1 --verify=md5 \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# Test that open zone accounting handles trim workloads correctly. Prepare as
+# many open zones as max_open_zones=4. Trim one of the 4 zones. Then write to
+# another zone and check that the amount written is the expected size.
+test65() {
+	local off capacity
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	off=$((first_sequential_zone_sector * 512))
+	capacity=$(total_zone_capacity 1 $off "$dev")
+	run_fio --zonemode=zbd --direct=1 --zonesize="$zone_size" --thread=1 \
+		--filename="$dev" --group_reporting=1 --max_open_zones=4 \
+		"$(ioengine "psync")" \
+		--name="prep_open_zones" --rw=randwrite --offset="$off" \
+		--size="$((zone_size * 4))" --bs=4096 --io_size="$zone_size" \
+		--name=trimjob --wait_for="prep_open_zones" --rw=trim \
+		--bs="$zone_size" --offset="$off" --size="$zone_size" \
+		--name=write --wait_for="trimjob" --rw=write --bs=4096 \
+		--offset="$((off + zone_size * 4))" --size="$zone_size" \
+		>> "${logfile}.${test_number}" 2>&1
+
+	check_written $((zone_size + capacity))
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
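test64 targets a zone whose remaining space is non-zero but smaller than the next job's block size. After this patch, zbd_write_zone_get() rejects such a zone under verify through zbd_zone_full(f, z, min_bs). __zbd_write_zone_get() rejects it only when zbd_zone_remainder(z) == 0. The sketch below models the two predicates on a simplified zone struct. The struct and helper names are hypothetical. The predicate bodies are an assumption about how fio defines them, not code taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a zone descriptor (hypothetical, not fio's type). */
struct zone {
	uint64_t start;		/* first byte of the zone */
	uint64_t capacity;	/* writable bytes in the zone */
	uint64_t wp;		/* current write pointer */
};

static uint64_t capacity_end(const struct zone *z)
{
	return z->start + z->capacity;
}

/* Bytes still writable in the zone; 0 once the write pointer hits the end. */
static uint64_t zone_remainder(const struct zone *z)
{
	if (z->wp >= capacity_end(z))
		return 0;
	return capacity_end(z) - z->wp;
}

/* "Full" for a given block size: a block of min_bs bytes no longer fits. */
static bool zone_full(const struct zone *z, uint64_t min_bs)
{
	return z->wp + min_bs > capacity_end(z);
}

/*
 * Reproduce test64's first run: write (cap - zone_size / 8) bytes so that
 * exactly one zone_size / 8 block of space is left in the zone.
 */
static struct zone almost_full_zone(uint64_t zone_size, uint64_t cap)
{
	uint64_t bs = zone_size / 8;
	struct zone z = { .start = 0, .capacity = cap, .wp = 0 };

	assert(cap >= bs);
	z.wp += cap - bs;
	return z;
}
```

With the second run's block size of zone_size / 2, zone_full() reports the zone as full even though zone_remainder() is still non-zero. That gap is the case the patch splits between the two getters.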
diff --git a/zbd.c b/zbd.c
index 5f1a7d7f..9455140a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -254,7 +254,7 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_reset_zone - reset the write pointer of a single zone
+ * __zbd_reset_zone - reset the write pointer of a single zone
  * @td: FIO thread data.
  * @f: FIO file associated with the disk for which to reset a write pointer.
  * @z: Zone to reset.
@@ -263,8 +263,8 @@ static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
  *
  * The caller must hold z->mutex.
  */
-static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
-			  struct fio_zone_info *z)
+static int __zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			    struct fio_zone_info *z)
 {
 	uint64_t offset = z->start;
 	uint64_t length = (z+1)->start - offset;
@@ -304,39 +304,65 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_close_zone - Remove a zone from the open zones array.
+ * zbd_write_zone_put - Remove a zone from the write target zones array.
  * @td: FIO thread data.
- * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @f: FIO file that has the write zones array to remove.
  * @zone_idx: Index of the zone to remove.
  *
  * The caller must hold f->zbd_info->mutex.
  */
-static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   struct fio_zone_info *z)
+static void zbd_write_zone_put(struct thread_data *td, const struct fio_file *f,
+			       struct fio_zone_info *z)
 {
-	uint32_t ozi;
+	uint32_t zi;
 
-	if (!z->open)
+	if (!z->write)
 		return;
 
-	for (ozi = 0; ozi < f->zbd_info->num_open_zones; ozi++) {
-		if (zbd_get_zone(f, f->zbd_info->open_zones[ozi]) == z)
+	for (zi = 0; zi < f->zbd_info->num_write_zones; zi++) {
+		if (zbd_get_zone(f, f->zbd_info->write_zones[zi]) == z)
 			break;
 	}
-	if (ozi == f->zbd_info->num_open_zones)
+	if (zi == f->zbd_info->num_write_zones)
 		return;
 
-	dprint(FD_ZBD, "%s: closing zone %u\n",
+	dprint(FD_ZBD, "%s: removing zone %u from write zone array\n",
 	       f->file_name, zbd_zone_idx(f, z));
 
-	memmove(f->zbd_info->open_zones + ozi,
-		f->zbd_info->open_zones + ozi + 1,
-		(ZBD_MAX_OPEN_ZONES - (ozi + 1)) *
-		sizeof(f->zbd_info->open_zones[0]));
+	memmove(f->zbd_info->write_zones + zi,
+		f->zbd_info->write_zones + zi + 1,
+		(ZBD_MAX_WRITE_ZONES - (zi + 1)) *
+		sizeof(f->zbd_info->write_zones[0]));
+
+	f->zbd_info->num_write_zones--;
+	td->num_write_zones--;
+	z->write = 0;
+}
 
-	f->zbd_info->num_open_zones--;
-	td->num_open_zones--;
-	z->open = 0;
+/**
+ * zbd_reset_zone - reset the write pointer of a single zone and remove the zone
+ *                  from the array of write zones.
+ * @td: FIO thread data.
+ * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @z: Zone to reset.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ *
+ * The caller must hold z->mutex.
+ */
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			  struct fio_zone_info *z)
+{
+	int ret;
+
+	ret = __zbd_reset_zone(td, f, z);
+	if (ret)
+		return ret;
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	zbd_write_zone_put(td, f, z);
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	return 0;
 }
 
 /**
@@ -404,9 +430,6 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			continue;
 
 		zone_lock(td, f, z);
-		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, z);
-		pthread_mutex_unlock(&f->zbd_info->mutex);
 
 		if (z->wp != z->start) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
@@ -450,21 +473,19 @@ static int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_open_zone - Add a zone to the array of open zones.
+ * __zbd_write_zone_get - Add a zone to the array of write zones.
  * @td: fio thread data.
- * @f: fio file that has the open zones to add.
+ * @f: fio file that has the write zones array to add.
  * @zone_idx: Index of the zone to add.
  *
- * Open a ZBD zone if it is not already open. Returns true if either the zone
- * was already open or if the zone was successfully added to the array of open
- * zones without exceeding the maximum number of open zones. Returns false if
- * the zone was not already open and opening the zone would cause the zone limit
- * to be exceeded.
+ * Do the same operation as zbd_write_zone_get(), except that it adds the zone
+ * at @zone_idx to the write target zones array even when the zone does not
+ * have enough remaining space to write one block.
  */
-static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
-			  struct fio_zone_info *z)
+static bool __zbd_write_zone_get(struct thread_data *td,
+				 const struct fio_file *f,
+				 struct fio_zone_info *z)
 {
-	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	uint32_t zone_idx = zbd_zone_idx(f, z);
 	bool res = true;
@@ -476,24 +497,24 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 	 * Skip full zones with data verification enabled because resetting a
 	 * zone causes data loss and hence causes verification to fail.
 	 */
-	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+	if (td->o.verify != VERIFY_NONE && zbd_zone_remainder(z) == 0)
 		return false;
 
 	/*
-	 * zbdi->max_open_zones == 0 means that there is no limit on the maximum
-	 * number of open zones. In this case, do no track open zones in
-	 * zbdi->open_zones array.
+	 * zbdi->max_write_zones == 0 means that there is no limit on the
+	 * maximum number of write target zones. In this case, do not track write
+	 * target zones in zbdi->write_zones array.
 	 */
-	if (!zbdi->max_open_zones)
+	if (!zbdi->max_write_zones)
 		return true;
 
 	pthread_mutex_lock(&zbdi->mutex);
 
-	if (z->open) {
+	if (z->write) {
 		/*
 		 * If the zone is going to be completely filled by writes
-		 * already in-flight, handle it as a full zone instead of an
-		 * open zone.
+		 * already in-flight, handle it as a full zone instead of a
+		 * write target zone.
 		 */
 		if (!zbd_zone_remainder(z))
 			res = false;
@@ -503,17 +524,17 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 	res = false;
 	/* Zero means no limit */
 	if (td->o.job_max_open_zones > 0 &&
-	    td->num_open_zones >= td->o.job_max_open_zones)
+	    td->num_write_zones >= td->o.job_max_open_zones)
 		goto out;
-	if (zbdi->num_open_zones >= zbdi->max_open_zones)
+	if (zbdi->num_write_zones >= zbdi->max_write_zones)
 		goto out;
 
-	dprint(FD_ZBD, "%s: opening zone %u\n",
+	dprint(FD_ZBD, "%s: adding zone %u to write zone array\n",
 	       f->file_name, zone_idx);
 
-	zbdi->open_zones[zbdi->num_open_zones++] = zone_idx;
-	td->num_open_zones++;
-	z->open = 1;
+	zbdi->write_zones[zbdi->num_write_zones++] = zone_idx;
+	td->num_write_zones++;
+	z->write = 1;
 	res = true;
 
 out:
@@ -521,6 +542,33 @@ out:
 	return res;
 }
 
+/**
+ * zbd_write_zone_get - Add a zone to the array of write zones.
+ * @td: fio thread data.
+ * @f: fio file that has the write zones array to add.
+ * @zone_idx: Index of the zone to add.
+ *
+ * Add a ZBD zone to the write target zones array if it is not yet added.
+ * Returns true if either the zone was already added or if the zone was
+ * successfully added to the array without exceeding the maximum number of
+ * write zones. Returns false if the zone was not already added and adding the
+ * zone would cause the zone limit to be exceeded.
+ */
+static bool zbd_write_zone_get(struct thread_data *td, const struct fio_file *f,
+			       struct fio_zone_info *z)
+{
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
+
+	/*
+	 * Skip full zones with data verification enabled because resetting a
+	 * zone causes data loss and hence causes verification to fail.
+	 */
+	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+		return false;
+
+	return __zbd_write_zone_get(td, f, z);
+}
+
 /* Verify whether direct I/O is used for all host-managed zoned block drives. */
 static bool zbd_using_direct_io(void)
 {
@@ -894,7 +942,7 @@ out:
 	return ret;
 }
 
-static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
+static int zbd_set_max_write_zones(struct thread_data *td, struct fio_file *f)
 {
 	struct zoned_block_device_info *zbd = f->zbd_info;
 	unsigned int max_open_zones;
@@ -902,7 +950,7 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 	if (zbd->model != ZBD_HOST_MANAGED || td->o.ignore_zone_limits) {
 		/* Only host-managed devices have a max open limit */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 		goto out;
 	}
 
@@ -913,13 +961,13 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 	if (!max_open_zones) {
 		/* No device limit */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 	} else if (!td->o.max_open_zones) {
 		/* No user limit. Set limit to device limit */
-		zbd->max_open_zones = max_open_zones;
+		zbd->max_write_zones = max_open_zones;
 	} else if (td->o.max_open_zones <= max_open_zones) {
 		/* Both user limit and dev limit. User limit not too large */
-		zbd->max_open_zones = td->o.max_open_zones;
+		zbd->max_write_zones = td->o.max_open_zones;
 	} else {
 		/* Both user limit and dev limit. User limit too large */
 		td_verror(td, EINVAL,
@@ -931,15 +979,15 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 out:
 	/* Ensure that the limit is not larger than FIO's internal limit */
-	if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
+	if (zbd->max_write_zones > ZBD_MAX_WRITE_ZONES) {
 		td_verror(td, EINVAL, "'max_open_zones' value is too large");
 		log_err("'max_open_zones' value is larger than %u\n",
-			ZBD_MAX_OPEN_ZONES);
+			ZBD_MAX_WRITE_ZONES);
 		return -EINVAL;
 	}
 
-	dprint(FD_ZBD, "%s: using max open zones limit: %"PRIu32"\n",
-	       f->file_name, zbd->max_open_zones);
+	dprint(FD_ZBD, "%s: using max write zones limit: %"PRIu32"\n",
+	       f->file_name, zbd->max_write_zones);
 
 	return 0;
 }
@@ -981,7 +1029,7 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 	assert(f->zbd_info);
 	f->zbd_info->model = zbd_model;
 
-	ret = zbd_set_max_open_zones(td, f);
+	ret = zbd_set_max_write_zones(td, f);
 	if (ret) {
 		zbd_free_zone_info(f);
 		return ret;
@@ -1174,7 +1222,7 @@ int zbd_setup_files(struct thread_data *td)
 			assert(f->min_zone < f->max_zone);
 
 		if (td->o.max_open_zones > 0 &&
-		    zbd->max_open_zones != td->o.max_open_zones) {
+		    zbd->max_write_zones != td->o.max_open_zones) {
 			log_err("Different 'max_open_zones' values\n");
 			return 1;
 		}
@@ -1184,34 +1232,32 @@ int zbd_setup_files(struct thread_data *td)
 		 * global max open zones limit. (As the tracking of open zones
 		 * is disabled when there is no global max open zones limit.)
 		 */
-		if (td->o.job_max_open_zones && !zbd->max_open_zones) {
+		if (td->o.job_max_open_zones && !zbd->max_write_zones) {
 			log_err("'job_max_open_zones' cannot be used without a global open zones limit\n");
 			return 1;
 		}
 
 		/*
-		 * zbd->max_open_zones is the global limit shared for all jobs
+		 * zbd->max_write_zones is the global limit shared for all jobs
 		 * that target the same zoned block device. Force sync the per
 		 * thread global limit with the actual global limit. (The real
 		 * per thread/job limit is stored in td->o.job_max_open_zones).
 		 */
-		td->o.max_open_zones = zbd->max_open_zones;
+		td->o.max_open_zones = zbd->max_write_zones;
 
 		for (zi = f->min_zone; zi < f->max_zone; zi++) {
 			z = &zbd->zone_info[zi];
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
 			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
 				continue;
-			if (zbd_open_zone(td, f, z))
+			if (__zbd_write_zone_get(td, f, z))
 				continue;
 			/*
 			 * If the number of open zones exceeds specified limits,
-			 * reset all extra open zones.
+			 * error out.
 			 */
-			if (zbd_reset_zone(td, f, z) < 0) {
-				log_err("Failed to reest zone %d\n", zi);
-				return 1;
-			}
+			log_err("Number of open zones exceeds max_open_zones limit\n");
+			return 1;
 		}
 	}
 
@@ -1284,12 +1330,12 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	zbd_reset_write_cnt(td, f);
 }
 
-/* Return random zone index for one of the open zones. */
+/* Return random zone index for one of the write target zones. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
 {
 	return (io_u->offset - f->file_offset) *
-		f->zbd_info->num_open_zones / f->io_size;
+		f->zbd_info->num_write_zones / f->io_size;
 }
 
 static bool any_io_in_flight(void)
@@ -1303,35 +1349,35 @@ static bool any_io_in_flight(void)
 }
 
 /*
- * Modify the offset of an I/O unit that does not refer to an open zone such
- * that it refers to an open zone. Close an open zone and open a new zone if
- * necessary. The open zone is searched across sequential zones.
+ * Modify the offset of an I/O unit that does not refer to a zone in the write
+ * target zones array so that it does. Add a zone to or remove a zone from the
+ * array if necessary. The write target zone is searched across sequential zones.
  * This algorithm can only work correctly if all write pointers are
  * a multiple of the fio block size. The caller must neither hold z->mutex
  * nor f->zbd_info->mutex. Returns with z->mutex held upon success.
  */
-static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
-						      struct io_u *io_u)
+static struct fio_zone_info *zbd_convert_to_write_zone(struct thread_data *td,
+						       struct io_u *io_u)
 {
 	const uint64_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z;
-	unsigned int open_zone_idx = -1;
+	unsigned int write_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
 	int i;
-	bool wait_zone_close;
+	bool wait_zone_write;
 	bool in_flight;
 	bool should_retry = true;
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (zbdi->max_open_zones || td->o.job_max_open_zones) {
+	if (zbdi->max_write_zones || td->o.job_max_open_zones) {
 		/*
-		 * This statement accesses zbdi->open_zones[] on purpose
+		 * This statement accesses zbdi->write_zones[] on purpose
 		 * without locking.
 		 */
-		zone_idx = zbdi->open_zones[pick_random_zone_idx(f, io_u)];
+		zone_idx = zbdi->write_zones[pick_random_zone_idx(f, io_u)];
 	} else {
 		zone_idx = zbd_offset_to_zone_idx(f, io_u->offset);
 	}
@@ -1361,34 +1407,34 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		if (z->has_wp) {
 			if (z->cond != ZBD_ZONE_COND_OFFLINE &&
-			    zbdi->max_open_zones == 0 &&
+			    zbdi->max_write_zones == 0 &&
 			    td->o.job_max_open_zones == 0)
 				goto examine_zone;
-			if (zbdi->num_open_zones == 0) {
-				dprint(FD_ZBD, "%s(%s): no zones are open\n",
+			if (zbdi->num_write_zones == 0) {
+				dprint(FD_ZBD, "%s(%s): no zone is write target\n",
 				       __func__, f->file_name);
-				goto open_other_zone;
+				goto choose_other_zone;
 			}
 		}
 
 		/*
-		 * List of opened zones is per-device, shared across all
+		 * Array of write target zones is per-device, shared across all
 		 * threads. Start with quasi-random candidate zone. Ignore
 		 * zones which don't belong to thread's offset/size area.
 		 */
-		open_zone_idx = pick_random_zone_idx(f, io_u);
-		assert(!open_zone_idx ||
-		       open_zone_idx < zbdi->num_open_zones);
-		tmp_idx = open_zone_idx;
+		write_zone_idx = pick_random_zone_idx(f, io_u);
+		assert(!write_zone_idx ||
+		       write_zone_idx < zbdi->num_write_zones);
+		tmp_idx = write_zone_idx;
 
-		for (i = 0; i < zbdi->num_open_zones; i++) {
+		for (i = 0; i < zbdi->num_write_zones; i++) {
 			uint32_t tmpz;
 
-			if (tmp_idx >= zbdi->num_open_zones)
+			if (tmp_idx >= zbdi->num_write_zones)
 				tmp_idx = 0;
-			tmpz = zbdi->open_zones[tmp_idx];
+			tmpz = zbdi->write_zones[tmp_idx];
 			if (f->min_zone <= tmpz && tmpz < f->max_zone) {
-				open_zone_idx = tmp_idx;
+				write_zone_idx = tmp_idx;
 				goto found_candidate_zone;
 			}
 
@@ -1406,7 +1452,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		return NULL;
 
 found_candidate_zone:
-		new_zone_idx = zbdi->open_zones[open_zone_idx];
+		new_zone_idx = zbdi->write_zones[write_zone_idx];
 		if (new_zone_idx == zone_idx)
 			break;
 		zone_idx = new_zone_idx;
@@ -1425,32 +1471,32 @@ examine_zone:
 		goto out;
 	}
 
-open_other_zone:
-	/* Check if number of open zones reaches one of limits. */
-	wait_zone_close =
-		zbdi->num_open_zones == f->max_zone - f->min_zone ||
-		(zbdi->max_open_zones &&
-		 zbdi->num_open_zones == zbdi->max_open_zones) ||
+choose_other_zone:
+	/* Check whether the number of write target zones reached a limit. */
+	wait_zone_write =
+		zbdi->num_write_zones == f->max_zone - f->min_zone ||
+		(zbdi->max_write_zones &&
+		 zbdi->num_write_zones == zbdi->max_write_zones) ||
 		(td->o.job_max_open_zones &&
-		 td->num_open_zones == td->o.job_max_open_zones);
+		 td->num_write_zones == td->o.job_max_open_zones);
 
 	pthread_mutex_unlock(&zbdi->mutex);
 
 	/* Only z->mutex is held. */
 
 	/*
-	 * When number of open zones reaches to one of limits, wait for
-	 * zone close before opening a new zone.
+	 * When the number of write target zones reaches one of the limits, wait
+	 * for writes to one of them to complete before trying a new zone.
 	 */
-	if (wait_zone_close) {
+	if (wait_zone_write) {
 		dprint(FD_ZBD,
-		       "%s(%s): quiesce to allow open zones to close\n",
+		       "%s(%s): quiesce to remove a zone from write target zones array\n",
 		       __func__, f->file_name);
 		io_u_quiesce(td);
 	}
 
 retry:
-	/* Zone 'z' is full, so try to open a new zone. */
+	/* Zone 'z' is full, so try to choose a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
 		if (z->has_wp)
@@ -1465,18 +1511,18 @@ retry:
 		if (!z->has_wp)
 			continue;
 		zone_lock(td, f, z);
-		if (z->open)
+		if (z->write)
 			continue;
-		if (zbd_open_zone(td, f, z))
+		if (zbd_write_zone_get(td, f, z))
 			goto out;
 	}
 
 	/* Only z->mutex is held. */
 
-	/* Check whether the write fits in any of the already opened zones. */
+	/* Check whether the write fits in any of the write target zones. */
 	pthread_mutex_lock(&zbdi->mutex);
-	for (i = 0; i < zbdi->num_open_zones; i++) {
-		zone_idx = zbdi->open_zones[i];
+	for (i = 0; i < zbdi->num_write_zones; i++) {
+		zone_idx = zbdi->write_zones[i];
 		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
 			continue;
 		pthread_mutex_unlock(&zbdi->mutex);
@@ -1492,13 +1538,14 @@ retry:
 
 	/*
 	 * When any I/O is in-flight or when all I/Os in-flight get completed,
-	 * the I/Os might have closed zones then retry the steps to open a zone.
-	 * Before retry, call io_u_quiesce() to complete in-flight writes.
+	 * the I/Os might have removed zones from the write target array, so
+	 * retry the steps to choose a zone. Before retrying, call io_u_quiesce()
+	 * to complete in-flight writes.
 	 */
 	in_flight = any_io_in_flight();
 	if (in_flight || should_retry) {
 		dprint(FD_ZBD,
-		       "%s(%s): wait zone close and retry open zones\n",
+		       "%s(%s): wait zone write and retry write target zone selection\n",
 		       __func__, f->file_name);
 		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
@@ -1512,7 +1559,7 @@ retry:
 
 	zone_unlock(z);
 
-	dprint(FD_ZBD, "%s(%s): did not open another zone\n",
+	dprint(FD_ZBD, "%s(%s): did not choose another write zone\n",
 	       __func__, f->file_name);
 
 	return NULL;
@@ -1582,7 +1629,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
  * @io_u: I/O unit
  * @z: zone info pointer
  *
- * If the write command made the zone full, close it.
+ * If the write command made the zone full, remove it from the write target
+ * zones array.
  *
  * The caller must hold z->mutex.
  */
@@ -1594,7 +1642,7 @@ static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
 	if (io_u->ddir == DDIR_WRITE &&
 	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, z);
+		zbd_write_zone_put(td, f, z);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 	}
 }
@@ -1954,7 +2002,7 @@ retry:
 		if (zbd_zone_remainder(zb) > 0 &&
 		    zbd_zone_remainder(zb) < min_bs) {
 			pthread_mutex_lock(&f->zbd_info->mutex);
-			zbd_close_zone(td, f, zb);
+			zbd_write_zone_put(td, f, zb);
 			pthread_mutex_unlock(&f->zbd_info->mutex);
 			dprint(FD_ZBD,
 			       "%s: finish zone %d\n",
@@ -1977,11 +2025,11 @@ retry:
 			zone_lock(td, f, zb);
 		}
 
-		if (!zbd_open_zone(td, f, zb)) {
+		if (!zbd_write_zone_get(td, f, zb)) {
 			zone_unlock(zb);
-			zb = zbd_convert_to_open_zone(td, io_u);
+			zb = zbd_convert_to_write_zone(td, io_u);
 			if (!zb) {
-				dprint(FD_IO, "%s: can't convert to open zone",
+				dprint(FD_IO, "%s: can't convert to write target zone",
 				       f->file_name);
 				goto eof;
 			}
@@ -2023,7 +2071,7 @@ retry:
 			 */
 			io_u_quiesce(td);
 			zb->reset_zone = 0;
-			if (zbd_reset_zone(td, f, zb) < 0)
+			if (__zbd_reset_zone(td, f, zb) < 0)
 				goto eof;
 
 			if (zb->capacity < min_bs) {
@@ -2142,7 +2190,7 @@ char *zbd_write_status(const struct thread_stat *ts)
  * Return io_u_completed when reset zone succeeds. Return 0 when the target zone
  * does not have write pointer. On error, return negative errno.
  */
-int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z;
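zbd_set_max_write_zones() picks the global write zone limit from two inputs. One is the max_open_zones job option. The other, for host-managed devices only, is the limit the device reports. The decision can be written as a pure function. resolve_max_write_zones() and its parameters are hypothetical. The sketch folds the "not host-managed or ignore_zone_limits" shortcut and the "no device limit" case into one branch because both fall back to the user value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WRITE_ZONES 4096	/* mirrors ZBD_MAX_WRITE_ZONES in zbd_types.h */

/*
 * host_managed:  device model is host-managed
 * ignore_limits: the ignore_zone_limits option is set
 * dev_limit:     device-reported max open zones (0 = no limit)
 * user_limit:    max_open_zones job option (0 = no limit)
 * Returns the resulting limit (0 = unlimited) or -EINVAL.
 */
static int64_t resolve_max_write_zones(bool host_managed, bool ignore_limits,
				       uint32_t dev_limit, uint32_t user_limit)
{
	uint32_t limit;

	if (!host_managed || ignore_limits || !dev_limit)
		limit = user_limit;	/* no device limit applies */
	else if (!user_limit)
		limit = dev_limit;	/* no user limit: use the device limit */
	else if (user_limit <= dev_limit)
		limit = user_limit;	/* user limit fits within the device limit */
	else
		return -EINVAL;		/* user limit larger than the device allows */

	/* fio tracks write zones in a fixed-size array. */
	if (limit > MAX_WRITE_ZONES)
		return -EINVAL;
	return limit;
}
```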
diff --git a/zbd.h b/zbd.h
index 05189555..f0ac9876 100644
--- a/zbd.h
+++ b/zbd.h
@@ -29,8 +29,8 @@ enum io_u_action {
  * @type: zone type (BLK_ZONE_TYPE_*)
  * @cond: zone state (BLK_ZONE_COND_*)
  * @has_wp: whether or not this zone can have a valid write pointer
- * @open: whether or not this zone is currently open. Only relevant if
- *		max_open_zones > 0.
+ * @write: whether or not this zone is the write target at this moment. Only
+ *              relevant if zbd->max_write_zones > 0.
  * @reset_zone: whether or not this zone should be reset before writing to it
  */
 struct fio_zone_info {
@@ -41,16 +41,17 @@ struct fio_zone_info {
 	enum zbd_zone_type	type:2;
 	enum zbd_zone_cond	cond:4;
 	unsigned int		has_wp:1;
-	unsigned int		open:1;
+	unsigned int		write:1;
 	unsigned int		reset_zone:1;
 };
 
 /**
  * zoned_block_device_info - zoned block device characteristics
  * @model: Device model.
- * @max_open_zones: global limit on the number of simultaneously opened
- *	sequential write zones. A zero value means unlimited open zones,
- *	and that open zones will not be tracked in the open_zones array.
+ * @max_write_zones: global limit on the number of sequential write zones that
+ *      are written simultaneously. A zero value means that the number of
+ *      simultaneously written zones is unlimited and that write target zones
+ *      are not tracked in the write_zones array.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.
@@ -61,10 +62,10 @@ struct fio_zone_info {
  *		if the zone size is not a power of 2.
  * @nr_zones: number of zones
  * @refcount: number of fio files that share this structure
- * @num_open_zones: number of open zones
+ * @num_write_zones: number of write target zones
  * @write_cnt: Number of writes since the latest zone reset triggered by
  *	       the zone_reset_frequency fio job parameter.
- * @open_zones: zone numbers of open zones
+ * @write_zones: zone numbers of write target zones
  * @zone_info: description of the individual zones
  *
  * Only devices for which all zones have the same size are supported.
@@ -73,7 +74,7 @@ struct fio_zone_info {
  */
 struct zoned_block_device_info {
 	enum zbd_zoned_model	model;
-	uint32_t		max_open_zones;
+	uint32_t		max_write_zones;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		wp_valid_data_bytes;
@@ -82,9 +83,9 @@ struct zoned_block_device_info {
 	uint32_t		zone_size_log2;
 	uint32_t		nr_zones;
 	uint32_t		refcount;
-	uint32_t		num_open_zones;
+	uint32_t		num_write_zones;
 	uint32_t		write_cnt;
-	uint32_t		open_zones[ZBD_MAX_OPEN_ZONES];
+	uint32_t		write_zones[ZBD_MAX_WRITE_ZONES];
 	struct fio_zone_info	zone_info[0];
 };
 
@@ -99,7 +100,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 			      enum fio_ddir ddir);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
-int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u);
+int zbd_do_io_u_trim(struct thread_data *td, struct io_u *io_u);
 
 static inline void zbd_close_file(struct fio_file *f)
 {
diff --git a/zbd_types.h b/zbd_types.h
index 0a8630cb..5f44f308 100644
--- a/zbd_types.h
+++ b/zbd_types.h
@@ -8,7 +8,7 @@
 
 #include <inttypes.h>
 
-#define ZBD_MAX_OPEN_ZONES	4096
+#define ZBD_MAX_WRITE_ZONES	4096
 
 /*
  * Zoned block device models.
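The renamed `zoned_block_device_info` fields record which zones are currently write targets. The comments give the rules: a zero `max_write_zones` means no limit and no tracking, and otherwise zone numbers are kept in `write_zones`, an array with room for `ZBD_MAX_WRITE_ZONES` entries. The hypothetical Python model below restates that bookkeeping. The class and method names are invented for this sketch. It is not fio's C implementation, and it leaves out locking, zone resets and the policy for choosing which zone to retire.

```python
ZBD_MAX_WRITE_ZONES = 4096  # mirrors the #define in zbd_types.h


class WriteZoneTracker:
    """Hypothetical model of the max_write_zones / write_zones bookkeeping."""

    def __init__(self, max_write_zones):
        # The C array has room for ZBD_MAX_WRITE_ZONES entries, so this
        # sketch rejects larger limits outright.
        if max_write_zones > ZBD_MAX_WRITE_ZONES:
            raise ValueError("max_write_zones exceeds ZBD_MAX_WRITE_ZONES")
        self.max_write_zones = max_write_zones
        self.write_zones = []  # zone numbers of write target zones

    @property
    def num_write_zones(self):
        return len(self.write_zones)

    def try_add(self, zone_nr):
        """Make zone_nr a write target; return False if the limit is reached."""
        if self.max_write_zones == 0:
            return True  # unlimited: write target zones are not tracked
        if zone_nr in self.write_zones:
            return True  # already a write target
        if self.num_write_zones >= self.max_write_zones:
            return False
        self.write_zones.append(zone_nr)
        return True

    def remove(self, zone_nr):
        """Stop treating zone_nr as a write target (e.g. once it is full)."""
        if zone_nr in self.write_zones:
            self.write_zones.remove(zone_nr)
```

With a limit of 2, a third distinct zone is refused until one of the first two is removed. With a limit of 0, every zone is accepted and `write_zones` stays empty.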


* Recent changes (master)
@ 2023-06-02 12:00 Jens Axboe
From: Jens Axboe @ 2023-06-02 12:00 UTC
  To: fio

The following changes since commit 4820d46cef75f806d8c95afaa77f86ded4e3603e:

  ci: disable tls for msys2 builds (2023-05-26 20:09:53 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1b4ba547cf45377fffc7a1e60728369997cc7a9b:

  t/run-fio-tests: address issues identified by pylint (2023-06-01 14:12:41 -0400)

----------------------------------------------------------------
Vincent Fu (3):
      t/nvmept.py: test script for io_uring_cmd NVMe pass through
      t/run-fio-tests: integrate t/nvmept.py
      t/run-fio-tests: address issues identified by pylint
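
The new t/nvmept.py script turns each `test_list` entry into one fio command line. It starts from a fixed set of io_uring_cmd arguments and adds `--key=value` for each key in a fixed allow-list that the entry defines. The sketch below restates that logic as a standalone function; the name `build_fio_args` is made up here. It also exposes a detail worth checking against the diff: the entries spell the key `timebased`, but the allow-list contains `time_based`. As written in this version, `--time_based` is never passed, so each timed job stops at `runtime` or after one pass over the target, whichever comes first.

```python
# Option names copied from run_fio() in t/nvmept.py.
FORWARDED_OPTS = ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
                  'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
                  'time_based', 'runtime', 'verify', 'io_size']


def build_fio_args(fio_path, test_opts, output_path):
    """Build a fio command line the way run_fio() does (sketch)."""
    args = [
        "--name=nvmept",
        "--ioengine=io_uring_cmd",
        "--cmd_type=nvme",
        "--iodepth=8",
        "--iodepth_batch=4",
        "--iodepth_batch_complete=4",
        f"--filename={test_opts['filename']}",
        f"--rw={test_opts['rw']}",
        f"--output={output_path}",
        f"--output-format={test_opts['output-format']}",
    ]
    # Only allow-listed keys are forwarded; any other key is silently ignored.
    for opt in FORWARDED_OPTS:
        if opt in test_opts:
            args.append(f"--{opt}={test_opts[opt]}")
    return [fio_path] + args
```

Passing an entry shaped like test 1 (`"timebased": 1, "runtime": 3`) produces `--runtime=3` but no `--time_based` argument.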

 t/nvmept.py        | 414 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py | 191 +++++++++++++-----------
 2 files changed, 521 insertions(+), 84 deletions(-)
 create mode 100755 t/nvmept.py
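
Most of the third commit is mechanical. `str.format()` calls become f-strings, and `super(Class, self)` becomes the zero-argument `super()`. These correspond to pylint's `consider-using-f-string` and `super-with-arguments` messages. The small self-contained check below shows that both rewrites preserve behavior. The class names are invented for the illustration.

```python
class Base:
    def check_result(self):
        return ["base"]


class OldStyle(Base):
    def check_result(self):
        # Spelling before the cleanup: explicit class and instance.
        return super(OldStyle, self).check_result() + ["subclass"]


class NewStyle(Base):
    def check_result(self):
        # Zero-argument form adopted by the cleanup (Python 3 only).
        return super().check_result() + ["subclass"]


failure_reason = " earlier reason,"
testnum = 7

# Both super() spellings resolve to the same parent method.
assert OldStyle().check_result() == NewStyle().check_result()

# str.format() and the equivalent f-string build identical strings.
assert "{0} bytes read mismatch,".format(failure_reason) == \
    f"{failure_reason} bytes read mismatch,"
assert "{:04d}".format(testnum) == f"{testnum:04d}" == "0007"
```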

---
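
A comment in `run_fio()` explains why the script avoids `subprocess.run()`. On a timeout, `run()` kills the child with SIGKILL. Calling `Popen.terminate()` instead sends SIGTERM on POSIX, which lets fio stop its own child processes before exiting. The sketch below shows that pattern. The helper name `run_with_timeout` is hypothetical, and a Python one-liner can stand in for fio.

```python
import subprocess
import sys


def run_with_timeout(command, timeout):
    """Run command; on timeout, send SIGTERM (terminate) instead of SIGKILL.

    Returns the exit code, or None if the timeout expired.
    """
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    try:
        proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout; do it ourselves.
        proc.terminate()      # SIGTERM on POSIX: the child may clean up
        proc.communicate()    # reap the process and drain its pipes
        return None
    return proc.returncode


# Example: a stand-in child process that exits with status 3.
status = run_with_timeout([sys.executable, "-c", "import sys; sys.exit(3)"], timeout=30)
```

If the child ignores SIGTERM, the second `communicate()` blocks. A more defensive version could wait with a grace period and then fall back to `kill()`.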

Diff of recent changes:

diff --git a/t/nvmept.py b/t/nvmept.py
new file mode 100755
index 00000000..a25192f2
--- /dev/null
+++ b/t/nvmept.py
@@ -0,0 +1,414 @@
+#!/usr/bin/env python3
+"""
+# nvmept.py
+#
+# Test fio's io_uring_cmd ioengine with NVMe pass-through commands.
+#
+# USAGE
+# see python3 nvmept.py --help
+#
+# EXAMPLES
+# python3 t/nvmept.py --dut /dev/ng0n1
+# python3 t/nvmept.py --dut /dev/ng1n1 -f ./fio
+#
+# REQUIREMENTS
+# Python 3.6
+#
+"""
+import os
+import sys
+import json
+import time
+import locale
+import argparse
+import subprocess
+from pathlib import Path
+
+class FioTest():
+    """fio test."""
+
+    def __init__(self, artifact_root, test_opts, debug):
+        """
+        artifact_root   root directory for artifacts (subdirectory will be created under here)
+        test_opts       test specification
+        """
+        self.artifact_root = artifact_root
+        self.test_opts = test_opts
+        self.debug = debug
+        self.filename_stub = None
+        self.filenames = {}
+        self.json_data = None
+
+        self.test_dir = os.path.abspath(os.path.join(self.artifact_root,
+                                     f"{self.test_opts['test_id']:03d}"))
+        if not os.path.exists(self.test_dir):
+            os.mkdir(self.test_dir)
+
+        self.filename_stub = f"pt{self.test_opts['test_id']:03d}"
+        self.filenames['command'] = os.path.join(self.test_dir, f"{self.filename_stub}.command")
+        self.filenames['stdout'] = os.path.join(self.test_dir, f"{self.filename_stub}.stdout")
+        self.filenames['stderr'] = os.path.join(self.test_dir, f"{self.filename_stub}.stderr")
+        self.filenames['exitcode'] = os.path.join(self.test_dir, f"{self.filename_stub}.exitcode")
+        self.filenames['output'] = os.path.join(self.test_dir, f"{self.filename_stub}.output")
+
+    def run_fio(self, fio_path):
+        """Run a test."""
+
+        fio_args = [
+            "--name=nvmept",
+            "--ioengine=io_uring_cmd",
+            "--cmd_type=nvme",
+            "--iodepth=8",
+            "--iodepth_batch=4",
+            "--iodepth_batch_complete=4",
+            f"--filename={self.test_opts['filename']}",
+            f"--rw={self.test_opts['rw']}",
+            f"--output={self.filenames['output']}",
+            f"--output-format={self.test_opts['output-format']}",
+        ]
+        for opt in ['fixedbufs', 'nonvectored', 'force_async', 'registerfiles',
+                    'sqthread_poll', 'sqthread_poll_cpu', 'hipri', 'nowait',
+                    'time_based', 'runtime', 'verify', 'io_size']:
+            if opt in self.test_opts:
+                option = f"--{opt}={self.test_opts[opt]}"
+                fio_args.append(option)
+
+        command = [fio_path] + fio_args
+        with open(self.filenames['command'], "w+",
+                  encoding=locale.getpreferredencoding()) as command_file:
+            command_file.write(" ".join(command))
+
+        passed = True
+
+        try:
+            with open(self.filenames['stdout'], "w+",
+                      encoding=locale.getpreferredencoding()) as stdout_file, \
+                open(self.filenames['stderr'], "w+",
+                     encoding=locale.getpreferredencoding()) as stderr_file, \
+                open(self.filenames['exitcode'], "w+",
+                     encoding=locale.getpreferredencoding()) as exitcode_file:
+                proc = None
+                # Avoid using subprocess.run() here because when a timeout occurs,
+                # fio will be stopped with SIGKILL. This does not give fio a
+                # chance to clean up and means that child processes may continue
+                # running and submitting IO.
+                proc = subprocess.Popen(command,
+                                        stdout=stdout_file,
+                                        stderr=stderr_file,
+                                        cwd=self.test_dir,
+                                        universal_newlines=True)
+                proc.communicate(timeout=300)
+                exitcode_file.write(f'{proc.returncode}\n')
+                passed &= (proc.returncode == 0)
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            print("Timeout expired")
+            passed = False
+        except Exception:
+            if proc:
+                if not proc.poll():
+                    proc.terminate()
+                    proc.communicate()
+            print(f"Exception: {sys.exc_info()}")
+            passed = False
+
+        if passed:
+            if 'output-format' in self.test_opts and 'json' in \
+                    self.test_opts['output-format']:
+                if not self.get_json():
+                    print('Unable to decode JSON data')
+                    passed = False
+
+        return passed
+
+    def get_json(self):
+        """Convert fio JSON output into a python JSON object"""
+
+        filename = self.filenames['output']
+        with open(filename, 'r', encoding=locale.getpreferredencoding()) as file:
+            file_data = file.read()
+
+        #
+        # Sometimes fio informational messages are included at the top of the
+        # JSON output, especially under Windows. Try to decode output as JSON
+        # data, lopping off up to the first four lines
+        #
+        lines = file_data.splitlines()
+        for i in range(5):
+            file_data = '\n'.join(lines[i:])
+            try:
+                self.json_data = json.loads(file_data)
+            except json.JSONDecodeError:
+                continue
+            else:
+                return True
+
+        return False
+
+    @staticmethod
+    def check_empty(job):
+        """
+        Make sure JSON data is empty.
+
+        Some data structures should be empty. This function makes sure that they are.
+
+        job         JSON object that we need to check for emptiness
+        """
+
+        return job['total_ios'] == 0 and \
+                job['slat_ns']['N'] == 0 and \
+                job['clat_ns']['N'] == 0 and \
+                job['lat_ns']['N'] == 0
+
+    def check_all_ddirs(self, ddir_nonzero, job):
+        """
+        Iterate over the data directions and check whether each is
+        appropriately empty or not.
+        """
+
+        retval = True
+        ddirlist = ['read', 'write', 'trim']
+
+        for ddir in ddirlist:
+            if ddir in ddir_nonzero:
+                if self.check_empty(job[ddir]):
+                    print(f"Unexpected zero {ddir} data found in output")
+                    retval = False
+            else:
+                if not self.check_empty(job[ddir]):
+                    print(f"Unexpected {ddir} data found in output")
+                    retval = False
+
+        return retval
+
+    def check(self):
+        """Check test output."""
+
+        raise NotImplementedError()
+
+
+class PTTest(FioTest):
+    """
+    NVMe pass-through test class. Check to make sure output for selected data
+    direction(s) is non-zero and that zero data appears for other directions.
+    """
+
+    def check(self):
+        if 'rw' not in self.test_opts:
+            return True
+
+        job = self.json_data['jobs'][0]
+        retval = True
+
+        if self.test_opts['rw'] in ['read', 'randread']:
+            retval = self.check_all_ddirs(['read'], job)
+        elif self.test_opts['rw'] in ['write', 'randwrite']:
+            if 'verify' not in self.test_opts:
+                retval = self.check_all_ddirs(['write'], job)
+            else:
+                retval = self.check_all_ddirs(['read', 'write'], job)
+        elif self.test_opts['rw'] in ['trim', 'randtrim']:
+            retval = self.check_all_ddirs(['trim'], job)
+        elif self.test_opts['rw'] in ['readwrite', 'randrw']:
+            retval = self.check_all_ddirs(['read', 'write'], job)
+        elif self.test_opts['rw'] in ['trimwrite', 'randtrimwrite']:
+            retval = self.check_all_ddirs(['trim', 'write'], job)
+        else:
+            print(f"Unhandled rw value {self.test_opts['rw']}")
+            retval = False
+
+        return retval
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-d', '--debug', help='enable debug output', action='store_true')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    parser.add_argument('--dut', help='target NVMe character device to test '
+                        '(e.g., /dev/ng0n1). WARNING: THIS IS A DESTRUCTIVE TEST', required=True)
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    """Run tests using fio's io_uring_cmd ioengine to send NVMe pass through commands."""
+
+    args = parse_args()
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"nvmept-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio = str(Path(args.fio).absolute())
+    else:
+        fio = 'fio'
+    print(f"fio path is {fio}")
+
+    test_list = [
+        {
+            "test_id": 1,
+            "rw": 'read',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 2,
+            "rw": 'randread',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 3,
+            "rw": 'write',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 4,
+            "rw": 'randwrite',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 5,
+            "rw": 'trim',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 6,
+            "rw": 'randtrim',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 7,
+            "rw": 'write',
+            "io_size": 1024*1024,
+            "verify": "crc32c",
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 8,
+            "rw": 'randwrite',
+            "io_size": 1024*1024,
+            "verify": "crc32c",
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 9,
+            "rw": 'readwrite',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 10,
+            "rw": 'randrw',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 11,
+            "rw": 'trimwrite',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 12,
+            "rw": 'randtrimwrite',
+            "timebased": 1,
+            "runtime": 3,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 13,
+            "rw": 'randread',
+            "timebased": 1,
+            "runtime": 3,
+            "fixedbufs": 1,
+            "nonvectored": 1,
+            "force_async": 1,
+            "registerfiles": 1,
+            "sqthread_poll": 1,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+        {
+            "test_id": 14,
+            "rw": 'randwrite',
+            "timebased": 1,
+            "runtime": 3,
+            "fixedbufs": 1,
+            "nonvectored": 1,
+            "force_async": 1,
+            "registerfiles": 1,
+            "sqthread_poll": 1,
+            "output-format": "json",
+            "test_obj": PTTest,
+        },
+    ]
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for test in test_list:
+        if (args.skip and test['test_id'] in args.skip) or \
+           (args.run_only and test['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            outcome = 'SKIPPED (User request)'
+        else:
+            test['filename'] = args.dut
+            test_obj = test['test_obj'](artifact_root, test, args.debug)
+            status = test_obj.run_fio(fio)
+            if status:
+                status = test_obj.check()
+            if status:
+                passed = passed + 1
+                outcome = 'PASSED'
+            else:
+                failed = failed + 1
+                outcome = 'FAILED'
+
+        print(f"**********Test {test['test_id']} {outcome}**********")
+
+    print(f"{passed} tests passed, {failed} failed, {skipped} skipped")
+
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
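
`get_json()` handles fio sometimes printing informational messages before its JSON output. It retries the decode after dropping zero to four leading lines. The standalone function below applies the same idea. The name `decode_fio_json` and the `max_skip` parameter are invented for this sketch.

```python
import json


def decode_fio_json(text, max_skip=4):
    """Decode fio JSON output, tolerating up to max_skip leading non-JSON lines.

    Returns the decoded object, or None if no candidate suffix parses.
    """
    lines = text.splitlines()
    for skip in range(max_skip + 1):
        try:
            return json.loads("\n".join(lines[skip:]))
        except json.JSONDecodeError:
            continue  # drop one more leading line and retry
    return None
```

Input such as `'informational message\n{"jobs": []}'` decodes to `{"jobs": []}`. Five or more leading non-JSON lines make the function give up and return `None`, which matches the original's limit of four dropped lines.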
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 71e3e5a6..c91deed4 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -79,22 +79,22 @@ class FioTest():
 
         self.artifact_root = artifact_root
         self.testnum = testnum
-        self.test_dir = os.path.join(artifact_root, "{:04d}".format(testnum))
+        self.test_dir = os.path.join(artifact_root, f"{testnum:04d}")
         if not os.path.exists(self.test_dir):
             os.mkdir(self.test_dir)
 
         self.command_file = os.path.join(
             self.test_dir,
-            "{0}.command".format(os.path.basename(self.exe_path)))
+            f"{os.path.basename(self.exe_path)}.command")
         self.stdout_file = os.path.join(
             self.test_dir,
-            "{0}.stdout".format(os.path.basename(self.exe_path)))
+            f"{os.path.basename(self.exe_path)}.stdout")
         self.stderr_file = os.path.join(
             self.test_dir,
-            "{0}.stderr".format(os.path.basename(self.exe_path)))
+            f"{os.path.basename(self.exe_path)}.stderr")
         self.exitcode_file = os.path.join(
             self.test_dir,
-            "{0}.exitcode".format(os.path.basename(self.exe_path)))
+            f"{os.path.basename(self.exe_path)}.exitcode")
 
     def run(self):
         """Run the test."""
@@ -126,7 +126,7 @@ class FioExeTest(FioTest):
 
         command = [self.exe_path] + self.parameters
         command_file = open(self.command_file, "w+")
-        command_file.write("%s\n" % command)
+        command_file.write(f"{command}\n")
         command_file.close()
 
         stdout_file = open(self.stdout_file, "w+")
@@ -144,7 +144,7 @@ class FioExeTest(FioTest):
                                     cwd=self.test_dir,
                                     universal_newlines=True)
             proc.communicate(timeout=self.success['timeout'])
-            exitcode_file.write('{0}\n'.format(proc.returncode))
+            exitcode_file.write(f'{proc.returncode}\n')
             logging.debug("Test %d: return code: %d", self.testnum, proc.returncode)
             self.output['proc'] = proc
         except subprocess.TimeoutExpired:
@@ -169,7 +169,7 @@ class FioExeTest(FioTest):
 
         if 'proc' not in self.output:
             if self.output['failure'] == 'timeout':
-                self.failure_reason = "{0} timeout,".format(self.failure_reason)
+                self.failure_reason = f"{self.failure_reason} timeout,"
             else:
                 assert self.output['failure'] == 'exception'
                 self.failure_reason = '{0} exception: {1}, {2}'.format(
@@ -183,21 +183,21 @@ class FioExeTest(FioTest):
             if self.success['zero_return']:
                 if self.output['proc'].returncode != 0:
                     self.passed = False
-                    self.failure_reason = "{0} non-zero return code,".format(self.failure_reason)
+                    self.failure_reason = f"{self.failure_reason} non-zero return code,"
             else:
                 if self.output['proc'].returncode == 0:
-                    self.failure_reason = "{0} zero return code,".format(self.failure_reason)
+                    self.failure_reason = f"{self.failure_reason} zero return code,"
                     self.passed = False
 
         stderr_size = os.path.getsize(self.stderr_file)
         if 'stderr_empty' in self.success:
             if self.success['stderr_empty']:
                 if stderr_size != 0:
-                    self.failure_reason = "{0} stderr not empty,".format(self.failure_reason)
+                    self.failure_reason = f"{self.failure_reason} stderr not empty,"
                     self.passed = False
             else:
                 if stderr_size == 0:
-                    self.failure_reason = "{0} stderr empty,".format(self.failure_reason)
+                    self.failure_reason = f"{self.failure_reason} stderr empty,"
                     self.passed = False
 
 
@@ -223,11 +223,11 @@ class FioJobTest(FioExeTest):
         self.output_format = output_format
         self.precon_failed = False
         self.json_data = None
-        self.fio_output = "{0}.output".format(os.path.basename(self.fio_job))
+        self.fio_output = f"{os.path.basename(self.fio_job)}.output"
         self.fio_args = [
             "--max-jobs=16",
-            "--output-format={0}".format(self.output_format),
-            "--output={0}".format(self.fio_output),
+            f"--output-format={self.output_format}",
+            f"--output={self.fio_output}",
             self.fio_job,
             ]
         FioExeTest.__init__(self, fio_path, self.fio_args, success)
@@ -235,20 +235,20 @@ class FioJobTest(FioExeTest):
     def setup(self, artifact_root, testnum):
         """Setup instance variables for fio job test."""
 
-        super(FioJobTest, self).setup(artifact_root, testnum)
+        super().setup(artifact_root, testnum)
 
         self.command_file = os.path.join(
             self.test_dir,
-            "{0}.command".format(os.path.basename(self.fio_job)))
+            f"{os.path.basename(self.fio_job)}.command")
         self.stdout_file = os.path.join(
             self.test_dir,
-            "{0}.stdout".format(os.path.basename(self.fio_job)))
+            f"{os.path.basename(self.fio_job)}.stdout")
         self.stderr_file = os.path.join(
             self.test_dir,
-            "{0}.stderr".format(os.path.basename(self.fio_job)))
+            f"{os.path.basename(self.fio_job)}.stderr")
         self.exitcode_file = os.path.join(
             self.test_dir,
-            "{0}.exitcode".format(os.path.basename(self.fio_job)))
+            f"{os.path.basename(self.fio_job)}.exitcode")
 
     def run_pre_job(self):
         """Run fio job precondition step."""
@@ -269,7 +269,7 @@ class FioJobTest(FioExeTest):
             self.run_pre_job()
 
         if not self.precon_failed:
-            super(FioJobTest, self).run()
+            super().run()
         else:
             logging.debug("Test %d: precondition step failed", self.testnum)
 
@@ -295,7 +295,7 @@ class FioJobTest(FioExeTest):
             with open(filename, "r") as output_file:
                 file_data = output_file.read()
         except OSError:
-            self.failure_reason += " unable to read file {0}".format(filename)
+            self.failure_reason += f" unable to read file {filename}"
             self.passed = False
 
         return file_data
@@ -305,10 +305,10 @@ class FioJobTest(FioExeTest):
 
         if self.precon_failed:
             self.passed = False
-            self.failure_reason = "{0} precondition step failed,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} precondition step failed,"
             return
 
-        super(FioJobTest, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -330,7 +330,7 @@ class FioJobTest(FioExeTest):
         try:
             self.json_data = json.loads(file_data)
         except json.JSONDecodeError:
-            self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} unable to decode JSON data,"
             self.passed = False
 
 
@@ -339,16 +339,16 @@ class FioJobTest_t0005(FioJobTest):
     Confirm that read['io_kbytes'] == write['io_kbytes'] == 102400"""
 
     def check_result(self):
-        super(FioJobTest_t0005, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
 
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 102400:
-            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} bytes read mismatch,"
             self.passed = False
         if self.json_data['jobs'][0]['write']['io_kbytes'] != 102400:
-            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} bytes written mismatch,"
             self.passed = False
 
 
@@ -357,7 +357,7 @@ class FioJobTest_t0006(FioJobTest):
     Confirm that read['io_kbytes'] ~ 2*write['io_kbytes']"""
 
     def check_result(self):
-        super(FioJobTest_t0006, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -366,7 +366,7 @@ class FioJobTest_t0006(FioJobTest):
             / self.json_data['jobs'][0]['write']['io_kbytes']
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
         if ratio < 1.99 or ratio > 2.01:
-            self.failure_reason = "{0} read/write ratio mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} read/write ratio mismatch,"
             self.passed = False
 
 
@@ -375,13 +375,13 @@ class FioJobTest_t0007(FioJobTest):
     Confirm that read['io_kbytes'] = 87040"""
 
     def check_result(self):
-        super(FioJobTest_t0007, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
 
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 87040:
-            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} bytes read mismatch,"
             self.passed = False
 
 
@@ -397,7 +397,7 @@ class FioJobTest_t0008(FioJobTest):
     the blocks originally written will be read."""
 
     def check_result(self):
-        super(FioJobTest_t0008, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -406,10 +406,10 @@ class FioJobTest_t0008(FioJobTest):
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
         if ratio < 0.97 or ratio > 1.03:
-            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} bytes written mismatch,"
             self.passed = False
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 32768:
-            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} bytes read mismatch,"
             self.passed = False
 
 
@@ -418,7 +418,7 @@ class FioJobTest_t0009(FioJobTest):
     Confirm that runtime >= 60s"""
 
     def check_result(self):
-        super(FioJobTest_t0009, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -426,7 +426,7 @@ class FioJobTest_t0009(FioJobTest):
         logging.debug('Test %d: elapsed: %d', self.testnum, self.json_data['jobs'][0]['elapsed'])
 
         if self.json_data['jobs'][0]['elapsed'] < 60:
-            self.failure_reason = "{0} elapsed time mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} elapsed time mismatch,"
             self.passed = False
 
 
@@ -436,7 +436,7 @@ class FioJobTest_t0012(FioJobTest):
     job1,job2,job3 respectively"""
 
     def check_result(self):
-        super(FioJobTest_t0012, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -484,7 +484,7 @@ class FioJobTest_t0014(FioJobTest):
     re-calibrate the activity dynamically"""
 
     def check_result(self):
-        super(FioJobTest_t0014, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -539,7 +539,7 @@ class FioJobTest_t0015(FioJobTest):
     Confirm that mean(slat) + mean(clat) = mean(tlat)"""
 
     def check_result(self):
-        super(FioJobTest_t0015, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -560,7 +560,7 @@ class FioJobTest_t0019(FioJobTest):
     Confirm that all offsets were touched sequentially"""
 
     def check_result(self):
-        super(FioJobTest_t0019, self).check_result()
+        super().check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
@@ -576,13 +576,13 @@ class FioJobTest_t0019(FioJobTest):
             cur = int(line.split(',')[4])
             if cur - prev != 4096:
                 self.passed = False
-                self.failure_reason = "offsets {0}, {1} not sequential".format(prev, cur)
+                self.failure_reason = f"offsets {prev}, {cur} not sequential"
                 return
             prev = cur
 
         if cur/4096 != 255:
             self.passed = False
-            self.failure_reason = "unexpected last offset {0}".format(cur)
+            self.failure_reason = f"unexpected last offset {cur}"
 
 
 class FioJobTest_t0020(FioJobTest):
@@ -590,7 +590,7 @@ class FioJobTest_t0020(FioJobTest):
     Confirm that almost all offsets were touched non-sequentially"""
 
     def check_result(self):
-        super(FioJobTest_t0020, self).check_result()
+        super().check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
@@ -611,14 +611,14 @@ class FioJobTest_t0020(FioJobTest):
 
         if len(offsets) != 256:
             self.passed = False
-            self.failure_reason += " number of offsets is {0} instead of 256".format(len(offsets))
+            self.failure_reason += f" number of offsets is {len(offsets)} instead of 256"
 
         for i in range(256):
             if not i in offsets:
                 self.passed = False
-                self.failure_reason += " missing offset {0}".format(i*4096)
+                self.failure_reason += f" missing offset {i * 4096}"
 
-        (z, p) = runstest_1samp(list(offsets))
+        (_, p) = runstest_1samp(list(offsets))
         if p < 0.05:
             self.passed = False
             self.failure_reason += f" runs test failed with p = {p}"
@@ -628,7 +628,7 @@ class FioJobTest_t0022(FioJobTest):
     """Test consists of fio test job t0022"""
 
     def check_result(self):
-        super(FioJobTest_t0022, self).check_result()
+        super().check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
         file_data = self.get_file_fail(bw_log_filename)
@@ -655,7 +655,7 @@ class FioJobTest_t0022(FioJobTest):
         # 10 is an arbitrary threshold
         if seq_count > 10:
             self.passed = False
-            self.failure_reason = "too many ({0}) consecutive offsets".format(seq_count)
+            self.failure_reason = f"too many ({seq_count}) consecutive offsets"
 
         if len(offsets) == filesize/bs:
             self.passed = False
@@ -690,7 +690,7 @@ class FioJobTest_t0023(FioJobTest):
                         bw_log_filename, line)
                     break
             else:
-                if ddir != 1:
+                if ddir != 1:   # pylint: disable=no-else-break
                     self.passed = False
                     self.failure_reason += " {0}: trim not preceeded by write: {1}".format(
                         bw_log_filename, line)
@@ -701,11 +701,13 @@ class FioJobTest_t0023(FioJobTest):
                         self.failure_reason += " {0}: block size does not match: {1}".format(
                             bw_log_filename, line)
                         break
+
                     if prev_offset != offset:
                         self.passed = False
                         self.failure_reason += " {0}: offset does not match: {1}".format(
                             bw_log_filename, line)
                         break
+
             prev_ddir = ddir
             prev_bs = bs
             prev_offset = offset
@@ -750,7 +752,7 @@ class FioJobTest_t0023(FioJobTest):
 
 
     def check_result(self):
-        super(FioJobTest_t0023, self).check_result()
+        super().check_result()
 
         filesize = 1024*1024
 
@@ -792,7 +794,7 @@ class FioJobTest_t0024(FioJobTest_t0023):
 class FioJobTest_t0025(FioJobTest):
     """Test experimental verify read backs written data pattern."""
     def check_result(self):
-        super(FioJobTest_t0025, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -802,7 +804,7 @@ class FioJobTest_t0025(FioJobTest):
 
 class FioJobTest_t0027(FioJobTest):
     def setup(self, *args, **kws):
-        super(FioJobTest_t0027, self).setup(*args, **kws)
+        super().setup(*args, **kws)
         self.pattern_file = os.path.join(self.test_dir, "t0027.pattern")
         self.output_file = os.path.join(self.test_dir, "t0027file")
         self.pattern = os.urandom(16 << 10)
@@ -810,7 +812,7 @@ class FioJobTest_t0027(FioJobTest):
             f.write(self.pattern)
 
     def check_result(self):
-        super(FioJobTest_t0027, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -828,7 +830,7 @@ class FioJobTest_iops_rate(FioJobTest):
     With two runs of fio-3.16 I observed a ratio of 8.3"""
 
     def check_result(self):
-        super(FioJobTest_iops_rate, self).check_result()
+        super().check_result()
 
         if not self.passed:
             return
@@ -841,11 +843,11 @@ class FioJobTest_iops_rate(FioJobTest):
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
         if iops1 < 950 or iops1 > 1050:
-            self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} iops value mismatch,"
             self.passed = False
 
         if ratio < 6 or ratio > 10:
-            self.failure_reason = "{0} iops ratio mismatch,".format(self.failure_reason)
+            self.failure_reason = f"{self.failure_reason} iops ratio mismatch,"
             self.passed = False
 
 
@@ -863,8 +865,9 @@ class Requirements():
     _not_windows = False
     _unittests = False
     _cpucount4 = False
+    _nvmecdev = False
 
-    def __init__(self, fio_root):
+    def __init__(self, fio_root, args):
         Requirements._not_macos = platform.system() != "Darwin"
         Requirements._not_windows = platform.system() != "Windows"
         Requirements._linux = platform.system() == "Linux"
@@ -873,7 +876,7 @@ class Requirements():
             config_file = os.path.join(fio_root, "config-host.h")
             contents, success = FioJobTest.get_file(config_file)
             if not success:
-                print("Unable to open {0} to check requirements".format(config_file))
+                print(f"Unable to open {config_file} to check requirements")
                 Requirements._zbd = True
             else:
                 Requirements._zbd = "CONFIG_HAS_BLKZONED" in contents
@@ -885,7 +888,7 @@ class Requirements():
             else:
                 Requirements._io_uring = "io_uring_setup" in contents
 
-            Requirements._root = (os.geteuid() == 0)
+            Requirements._root = os.geteuid() == 0
             if Requirements._zbd and Requirements._root:
                 try:
                     subprocess.run(["modprobe", "null_blk"],
@@ -904,17 +907,21 @@ class Requirements():
         Requirements._unittests = os.path.exists(unittest_path)
 
         Requirements._cpucount4 = multiprocessing.cpu_count() >= 4
-
-        req_list = [Requirements.linux,
-                    Requirements.libaio,
-                    Requirements.io_uring,
-                    Requirements.zbd,
-                    Requirements.root,
-                    Requirements.zoned_nullb,
-                    Requirements.not_macos,
-                    Requirements.not_windows,
-                    Requirements.unittests,
-                    Requirements.cpucount4]
+        Requirements._nvmecdev = args.nvmecdev
+
+        req_list = [
+                Requirements.linux,
+                Requirements.libaio,
+                Requirements.io_uring,
+                Requirements.zbd,
+                Requirements.root,
+                Requirements.zoned_nullb,
+                Requirements.not_macos,
+                Requirements.not_windows,
+                Requirements.unittests,
+                Requirements.cpucount4,
+                Requirements.nvmecdev,
+                    ]
         for req in req_list:
             value, desc = req()
             logging.debug("Requirements: Requirement '%s' met? %s", desc, value)
@@ -969,6 +976,11 @@ class Requirements():
         """Do we have at least 4 CPUs?"""
         return Requirements._cpucount4, "4+ CPUs required"
 
+    @classmethod
+    def nvmecdev(cls):
+        """Do we have an NVMe character device to test?"""
+        return Requirements._nvmecdev, "NVMe character device test target required"
+
 
 SUCCESS_DEFAULT = {
     'zero_return': True,
@@ -1367,6 +1379,14 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'requirements':     [],
     },
+    {
+        'test_id':          1014,
+        'test_class':       FioExeTest,
+        'exe':              't/nvmept.py',
+        'parameters':       ['-f', '{fio_path}', '--dut', '{nvmecdev}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [Requirements.linux, Requirements.nvmecdev],
+    },
 ]
 
 
@@ -1390,6 +1410,8 @@ def parse_args():
                         help='skip requirements checking')
     parser.add_argument('-p', '--pass-through', action='append',
                         help='pass-through an argument to an executable test')
+    parser.add_argument('--nvmecdev', action='store', default=None,
+                        help='NVMe character device for **DESTRUCTIVE** testing (e.g., /dev/ng0n1)')
     args = parser.parse_args()
 
     return args
@@ -1408,7 +1430,7 @@ def main():
     if args.pass_through:
         for arg in args.pass_through:
             if not ':' in arg:
-                print("Invalid --pass-through argument '%s'" % arg)
+                print(f"Invalid --pass-through argument '{arg}'")
                 print("Syntax for --pass-through is TESTNUMBER:ARGUMENT")
                 return
             split = arg.split(":", 1)
@@ -1419,7 +1441,7 @@ def main():
         fio_root = args.fio_root
     else:
         fio_root = str(Path(__file__).absolute().parent.parent)
-    print("fio root is %s" % fio_root)
+    print(f"fio root is {fio_root}")
 
     if args.fio:
         fio_path = args.fio
@@ -1429,17 +1451,17 @@ def main():
         else:
             fio_exe = "fio"
         fio_path = os.path.join(fio_root, fio_exe)
-    print("fio path is %s" % fio_path)
+    print(f"fio path is {fio_path}")
     if not shutil.which(fio_path):
         print("Warning: fio executable not found")
 
     artifact_root = args.artifact_root if args.artifact_root else \
-        "fio-test-{0}".format(time.strftime("%Y%m%d-%H%M%S"))
+        f"fio-test-{time.strftime('%Y%m%d-%H%M%S')}"
     os.mkdir(artifact_root)
-    print("Artifact directory is %s" % artifact_root)
+    print(f"Artifact directory is {artifact_root}")
 
     if not args.skip_req:
-        req = Requirements(fio_root)
+        req = Requirements(fio_root, args)
 
     passed = 0
     failed = 0
@@ -1449,7 +1471,7 @@ def main():
         if (args.skip and config['test_id'] in args.skip) or \
            (args.run_only and config['test_id'] not in args.run_only):
             skipped = skipped + 1
-            print("Test {0} SKIPPED (User request)".format(config['test_id']))
+            print(f"Test {config['test_id']} SKIPPED (User request)")
             continue
 
         if issubclass(config['test_class'], FioJobTest):
@@ -1477,7 +1499,8 @@ def main():
         elif issubclass(config['test_class'], FioExeTest):
             exe_path = os.path.join(fio_root, config['exe'])
             if config['parameters']:
-                parameters = [p.format(fio_path=fio_path) for p in config['parameters']]
+                parameters = [p.format(fio_path=fio_path, nvmecdev=args.nvmecdev)
+                              for p in config['parameters']]
             else:
                 parameters = []
             if Path(exe_path).suffix == '.py' and platform.system() == "Windows":
@@ -1489,7 +1512,7 @@ def main():
                                         config['success'])
             desc = config['exe']
         else:
-            print("Test {0} FAILED: unable to process test config".format(config['test_id']))
+            print(f"Test {config['test_id']} FAILED: unable to process test config")
             failed = failed + 1
             continue
 
@@ -1502,7 +1525,7 @@ def main():
                 if not reqs_met:
                     break
             if not reqs_met:
-                print("Test {0} SKIPPED ({1}) {2}".format(config['test_id'], reason, desc))
+                print(f"Test {config['test_id']} SKIPPED ({reason}) {desc}")
                 skipped = skipped + 1
                 continue
 
@@ -1520,15 +1543,15 @@ def main():
             result = "PASSED"
             passed = passed + 1
         else:
-            result = "FAILED: {0}".format(test.failure_reason)
+            result = f"FAILED: {test.failure_reason}"
             failed = failed + 1
             contents, _ = FioJobTest.get_file(test.stderr_file)
             logging.debug("Test %d: stderr:\n%s", config['test_id'], contents)
             contents, _ = FioJobTest.get_file(test.stdout_file)
             logging.debug("Test %d: stdout:\n%s", config['test_id'], contents)
-        print("Test {0} {1} {2}".format(config['test_id'], result, desc))
+        print(f"Test {config['test_id']} {result} {desc}")
 
-    print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
+    print(f"{passed} test(s) passed, {failed} failed, {skipped} skipped")
 
     sys.exit(failed)
 

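The `Requirements` and `TEST_LIST` changes above gate test 1014 on `--nvmecdev` and fill its `{nvmecdev}` placeholder with `str.format()`. The sketch below re-creates only that substitution and the `(value, description)` requirement pair outside the harness. `expand_parameters` and `nvmecdev_requirement` are hypothetical helper names, and `/dev/ng0n1` is just the example path from the `--nvmecdev` help text.

```python
from types import SimpleNamespace

# Parameter template copied from the test 1014 entry in TEST_LIST.
TEST_1014_PARAMETERS = ['-f', '{fio_path}', '--dut', '{nvmecdev}']


def expand_parameters(templates, fio_path, nvmecdev):
    """Fill {fio_path} and {nvmecdev} in each template string.

    str.format() ignores keyword arguments a template does not use, so the
    same call also works for templates that only mention {fio_path}.
    """
    return [p.format(fio_path=fio_path, nvmecdev=nvmecdev) for p in templates]


def nvmecdev_requirement(args):
    """Mimic the (value, description) pair returned by Requirements.nvmecdev()."""
    return args.nvmecdev, "NVMe character device test target required"


if __name__ == "__main__":
    args = SimpleNamespace(nvmecdev="/dev/ng0n1")  # hypothetical device path
    print(expand_parameters(TEST_1014_PARAMETERS, "./fio", args.nvmecdev))
    value, desc = nvmecdev_requirement(SimpleNamespace(nvmecdev=None))
    print(bool(value), desc)
```

Passing `nvmecdev` to every template is harmless for tests that only reference `{fio_path}`. When `--nvmecdev` is omitted the value is `None`, which is falsy, so the harness appears to rely on the requirement check to skip test 1014 rather than running it with a literal `None` argument.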

* Recent changes (master)
@ 2023-05-31 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-31 12:00 UTC
  To: fio

The following changes since commit 954b86f71b0718943796192be1a89ffb0da5a97c:

  ci: upload tagged GitHub Actions Windows installers as releases (2023-05-24 09:58:11 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4820d46cef75f806d8c95afaa77f86ded4e3603e:

  ci: disable tls for msys2 builds (2023-05-26 20:09:53 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      ci: disable tls for msys2 builds

 ci/actions-build.sh | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/ci/actions-build.sh b/ci/actions-build.sh
index 351b8d18..31d3446c 100755
--- a/ci/actions-build.sh
+++ b/ci/actions-build.sh
@@ -53,6 +53,9 @@ main() {
                 "x86_64")
                     ;;
             esac
+            if [ "${CI_TARGET_BUILD}" = "windows-msys2-64" ]; then
+                configure_flags+=("--disable-tls")
+            fi
 	    ;;
     esac
     configure_flags+=(--extra-cflags="${extra_cflags}")


* Recent changes (master)
@ 2023-05-25 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-25 12:00 UTC
  To: fio

The following changes since commit 5a649e2dddc4d8ad163b0cf57f7cea00a2e94a33:

  Fio 3.35 (2023-05-23 12:33:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 954b86f71b0718943796192be1a89ffb0da5a97c:

  ci: upload tagged GitHub Actions Windows installers as releases (2023-05-24 09:58:11 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      ci: stop using AppVeyor for Windows builds
      ci: upload tagged GitHub Actions Windows installers as releases

 .appveyor.yml            | 68 ------------------------------------------------
 .github/workflows/ci.yml |  7 ++++-
 README.rst               | 11 ++++----
 ci/appveyor-install.sh   | 43 ------------------------------
 4 files changed, 12 insertions(+), 117 deletions(-)
 delete mode 100644 .appveyor.yml
 delete mode 100755 ci/appveyor-install.sh

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
deleted file mode 100644
index a63cf24f..00000000
--- a/.appveyor.yml
+++ /dev/null
@@ -1,68 +0,0 @@
-clone_depth: 1 # NB: this stops FIO-VERSION-GEN making tag based versions
-
-image:
-  - Visual Studio 2019
-
-environment:
-  CYG_MIRROR: http://cygwin.mirror.constant.com
-  matrix:
-# --disable-tls for the msys2 build to work around
-# breakage with clang/lld 16.0.0-1
-    - ARCHITECTURE: x64
-      CC: clang
-      CONFIGURE_OPTIONS: --enable-pdb --disable-tls
-      DISTRO: msys2
-# Skip 32 bit clang build
-#    - ARCHITECTURE: x86
-#      CC: clang
-#      CONFIGURE_OPTIONS: --enable-pdb
-#      DISTRO: msys2
-    - ARCHITECTURE: x64
-      CONFIGURE_OPTIONS:
-      DISTRO: cygwin
-    - ARCHITECTURE: x86
-      CONFIGURE_OPTIONS: --build-32bit-win
-      DISTRO: cygwin
-
-install:
-  - if %DISTRO%==cygwin (
-      SET "PATH=C:\cygwin64\bin;C:\cygwin64;%PATH%"
-    )
-  - if %DISTRO%==msys2 if %ARCHITECTURE%==x86 (
-      SET "PATH=C:\msys64\mingw32\bin;C:\msys64\usr\bin;%PATH%"
-    )
-  - if %DISTRO%==msys2 if %ARCHITECTURE%==x64 (
-      SET "PATH=C:\msys64\mingw64\bin;C:\msys64\usr\bin;%PATH%"
-    )
-  - SET PATH=C:\Python38-x64;%PATH% # NB: Changed env variables persist to later sections
-  - SET PYTHONUNBUFFERED=TRUE
-  - bash.exe ci\appveyor-install.sh
-
-build_script:
-  - bash.exe configure --extra-cflags=-Werror --disable-native %CONFIGURE_OPTIONS%
-  - make.exe -j2
-
-after_build:
-  - file.exe fio.exe
-  - make.exe test
-  - 'cd os\windows && dobuild.cmd %ARCHITECTURE% && cd ..'
-  - ls.exe ./os/windows/*.msi
-  - ps: Get-ChildItem .\os\windows\*.msi | % { Push-AppveyorArtifact $_.FullName -FileName $_.Name -DeploymentName fio.msi }
-
-test_script:
-  - python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug
-
-deploy:
-  - provider: GitHub
-    description: fio Windows installer
-    auth_token:                      # encrypted token from GitHub
-      secure: Tjj+xRQEV25P6dQgboUblTCKx/LtUOUav2bvzSCtwMhHMAxrrn2adod6nlTf0ItV
-    artifact: fio.msi                # upload installer to release assets
-    draft: false
-    prerelease: false
-    on:
-      APPVEYOR_REPO_TAG: true        # deploy on tag push only
-      DISTRO: cygwin
-
-on_finish:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test && appveyor PushArtifact test-artifacts.7z'
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index dd2997f0..69fedf77 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -108,12 +108,17 @@ jobs:
         dobuild.cmd ${{ matrix.installer_arch }}
         cd ..\..
 
-    - name: Upload installer (Windows)
+    - name: Upload installer as artifact (Windows)
       if: ${{ contains( matrix.build, 'windows' ) }}
       uses: actions/upload-artifact@v3
       with:
         name: ${{ matrix.build }}-installer
         path: os\windows\*.msi
+    - name: Upload installer as release for tagged builds (Windows)
+      uses: softprops/action-gh-release@v1
+      if: ${{ startsWith(github.ref, 'refs/tags/') && startsWith(matrix.build, 'windows-cygwin') }}
+      with:
+        files: os/windows/*.msi
     - name: Remove dependency files to resolve Makefile Cygwin sed issue (Windows)
       if: ${{ startsWith(matrix.build, 'windows-cygwin') }}
       run: rm *.d */*.d */*/*.d
diff --git a/README.rst b/README.rst
index 8f6208e3..dd521daf 100644
--- a/README.rst
+++ b/README.rst
@@ -123,11 +123,12 @@ Solaris:
 	``pkgutil -i fio``.
 
 Windows:
-	Beginning with fio 3.31 Windows installers are available on GitHub at
-        https://github.com/axboe/fio/releases. The latest builds for Windows
-	can also be grabbed from https://ci.appveyor.com/project/axboe/fio by
-	clicking the latest x86 or x64 build and then selecting the Artifacts
-	tab.
+        Beginning with fio 3.31 Windows installers for tagged releases are
+        available on GitHub at https://github.com/axboe/fio/releases. The
+        latest installers for Windows can also be obtained as GitHub Actions
+        artifacts by selecting a build from
+        https://github.com/axboe/fio/actions. These require logging in to a
+        GitHub account.
 
 BSDs:
 	Packages for BSDs may be available from their binary package repositories.
diff --git a/ci/appveyor-install.sh b/ci/appveyor-install.sh
deleted file mode 100755
index 1e28c454..00000000
--- a/ci/appveyor-install.sh
+++ /dev/null
@@ -1,43 +0,0 @@
-#!/bin/bash
-# The PATH to appropriate distro commands must already be set before invoking
-# this script
-# The following environment variables must be set:
-# PLATFORM={i686,x64}
-# DISTRO={cygwin,msys2}
-# The following environment can optionally be set:
-# CYG_MIRROR=<URL>
-set -eu
-
-case "${ARCHITECTURE}" in
-    "x64")
-        PACKAGE_ARCH="x86_64"
-        ;;
-    "x86")
-        PACKAGE_ARCH="i686"
-        ;;
-esac
-
-echo "Installing packages..."
-case "${DISTRO}" in
-    "cygwin")
-        CYG_MIRROR=${CYG_MIRROR:-"http://cygwin.mirror.constant.com"}
-        setup-x86_64.exe --quiet-mode --no-shortcuts --only-site \
-            --site "${CYG_MIRROR}" --packages \
-            "mingw64-${PACKAGE_ARCH}-CUnit,mingw64-${PACKAGE_ARCH}-zlib"
-        ;;
-    "msys2")
-        #pacman --noconfirm -Syuu # MSYS2 core update
-        #pacman --noconfirm -Syuu # MSYS2 normal update
-        pacman.exe --noconfirm -S \
-            mingw-w64-${PACKAGE_ARCH}-clang \
-            mingw-w64-${PACKAGE_ARCH}-cunit \
-            mingw-w64-${PACKAGE_ARCH}-toolchain \
-            mingw-w64-${PACKAGE_ARCH}-lld
-        pacman.exe -Q # List installed packages
-        ;;
-esac
-
-python.exe -m pip install scipy six statsmodels
-
-echo "Python3 path: $(type -p python3 2>&1)"
-echo "Python3 version: $(python3 -V 2>&1)"

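With AppVeyor gone, the renamed artifact step still runs for every Windows build, while the new `softprops/action-gh-release` step publishes installers only for tag pushes from the Cygwin builds. A small Python model of the two `if:` expressions makes the difference concrete. It assumes GitHub's documented behaviour that `contains()` and `startsWith()` compare case-insensitively; the ref and build names are illustrative (`windows-msys2-64` is taken from `ci/actions-build.sh`).

```python
def uploads_artifact(build: str) -> bool:
    """Model of `contains( matrix.build, 'windows' )` (case-insensitive)."""
    return "windows" in build.lower()


def uploads_release(ref: str, build: str) -> bool:
    """Model of `startsWith(github.ref, 'refs/tags/') &&
    startsWith(matrix.build, 'windows-cygwin')` (case-insensitive)."""
    return (ref.lower().startswith("refs/tags/")
            and build.lower().startswith("windows-cygwin"))


if __name__ == "__main__":
    builds = ("windows-cygwin-64", "windows-cygwin-32", "windows-msys2-64", "linux-gcc")
    for ref in ("refs/heads/master", "refs/tags/fio-3.35"):
        for build in builds:
            print(ref, build, uploads_artifact(build), uploads_release(ref, build))
```

Only the two Cygwin builds on a tag ref satisfy both halves of the release condition; the MSYS2 build still uploads an artifact but never a release asset.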

* Recent changes (master)
@ 2023-05-24 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-24 12:00 UTC
  To: fio

The following changes since commit 870ea00243b1290541334bec2a56428c9f68dba6:

  io_ur: make sure that sync errors are noticed upfront (2023-05-19 19:30:38 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5a649e2dddc4d8ad163b0cf57f7cea00a2e94a33:

  Fio 3.35 (2023-05-23 12:33:03 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      zbd: Make an error message more detailed
      zbd: Report the zone capacity

Jens Axboe (1):
      Fio 3.35

Vincent Fu (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 FIO-VERSION-GEN |  2 +-
 zbd.c           | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index f1585d34..4b0d56d0 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.34
+DEF_VER=fio-3.35
 
 LF='
 '
diff --git a/zbd.c b/zbd.c
index 351b3971..5f1a7d7f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -213,8 +213,8 @@ static int zbd_report_zones(struct thread_data *td, struct fio_file *f,
 		ret = blkzoned_report_zones(td, f, offset, zones, nr_zones);
 	if (ret < 0) {
 		td_verror(td, errno, "report zones failed");
-		log_err("%s: report zones from sector %"PRIu64" failed (%d).\n",
-			f->file_name, offset >> 9, errno);
+		log_err("%s: report zones from sector %"PRIu64" failed (nr_zones=%d; errno=%d).\n",
+			f->file_name, offset >> 9, nr_zones, errno);
 	} else if (ret == 0) {
 		td_verror(td, errno, "Empty zone report");
 		log_err("%s: report zones from sector %"PRIu64" is empty.\n",
@@ -776,7 +776,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	int nr_zones, nrz;
 	struct zbd_zone *zones, *z;
 	struct fio_zone_info *p;
-	uint64_t zone_size, offset;
+	uint64_t zone_size, offset, capacity;
+	bool same_zone_cap = true;
 	struct zoned_block_device_info *zbd_info = NULL;
 	int i, j, ret = -ENOMEM;
 
@@ -793,6 +794,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	}
 
 	zone_size = zones[0].len;
+	capacity = zones[0].capacity;
 	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 
 	if (td->o.zone_size == 0) {
@@ -821,6 +823,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 						     PTHREAD_MUTEX_RECURSIVE);
 			p->start = z->start;
 			p->capacity = z->capacity;
+			if (capacity != z->capacity)
+				same_zone_cap = false;
 
 			switch (z->cond) {
 			case ZBD_ZONE_COND_NOT_WP:
@@ -876,6 +880,11 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
 		ilog2(zone_size) : 0;
 	f->zbd_info->nr_zones = nr_zones;
+
+	if (same_zone_cap)
+		dprint(FD_ZBD, "Zone capacity = %"PRIu64" KB\n",
+		       capacity / 1024);
+
 	zbd_info = NULL;
 	ret = 0;
 

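`parse_zone_info()` now remembers the capacity of the first reported zone, clears `same_zone_cap` as soon as any zone differs, and only then prints a single capacity through `dprint(FD_ZBD, ...)`. The Python sketch below restates that check. The `Zone` dataclass and the sizes used in the example are hypothetical stand-ins for `struct zbd_zone` and a real device report.

```python
from dataclasses import dataclass
from typing import List, Optional

MiB = 1 << 20


@dataclass
class Zone:
    start: int     # zone start, in bytes
    length: int    # zone size, in bytes
    capacity: int  # writable bytes in the zone


def uniform_capacity_kb(zones: List[Zone]) -> Optional[int]:
    """Return the shared zone capacity divided by 1024, or None if any zone differs."""
    capacity = zones[0].capacity
    same_zone_cap = all(z.capacity == capacity for z in zones)
    # Integer division, like `capacity / 1024` on a uint64_t in zbd.c.
    return capacity // 1024 if same_zone_cap else None


if __name__ == "__main__":
    zones = [Zone(i * 256 * MiB, 256 * MiB, 192 * MiB) for i in range(4)]
    print(uniform_capacity_kb(zones))
```

Like the C code, this divides by 1024, so the "KB" in the message is really KiB. Because it goes through `dprint()`, the line should only appear when zbd debug output is enabled (for example `--debug=zbd`).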

* Recent changes (master)
@ 2023-05-20 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-20 12:00 UTC
  To: fio

The following changes since commit 5cedafafeeb9dde862455342cd24c860d84f4f07:

  ci: fix ups for 32-bit GitHub Actions Linux builds (2023-05-18 14:29:10 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 870ea00243b1290541334bec2a56428c9f68dba6:

  io_ur: make sure that sync errors are noticed upfront (2023-05-19 19:30:38 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'master' of https://github.com/huajingyun01/fio
      io_ur: make sure that sync errors are noticed upfront

Jingyun Hua (1):
      Add LoongArch64 support

 arch/arch-loongarch64.h | 10 ++++++++++
 arch/arch.h             |  3 +++
 configure               |  4 +++-
 io_u.c                  |  3 +++
 libfio.c                |  1 +
 os/os-linux-syscall.h   | 16 ++++++++++++++++
 6 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 arch/arch-loongarch64.h

---

Diff of recent changes:

diff --git a/arch/arch-loongarch64.h b/arch/arch-loongarch64.h
new file mode 100644
index 00000000..43ea83b4
--- /dev/null
+++ b/arch/arch-loongarch64.h
@@ -0,0 +1,10 @@
+#ifndef ARCH_LOONGARCH64_H
+#define ARCH_LOONGARCH64_H
+
+#define FIO_ARCH	(arch_loongarch64)
+
+#define read_barrier()		__asm__ __volatile__("dbar 0": : :"memory")
+#define write_barrier()		__asm__ __volatile__("dbar 0": : :"memory")
+#define nop			__asm__ __volatile__("dbar 0": : :"memory")
+
+#endif
diff --git a/arch/arch.h b/arch/arch.h
index fca003be..6e476701 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -23,6 +23,7 @@ enum {
 	arch_hppa,
 	arch_mips,
 	arch_aarch64,
+	arch_loongarch64,
 
 	arch_generic,
 
@@ -97,6 +98,8 @@ extern unsigned long arch_flags;
 #include "arch-hppa.h"
 #elif defined(__aarch64__)
 #include "arch-aarch64.h"
+#elif defined(__loongarch64)
+#include "arch-loongarch64.h"
 #else
 #warning "Unknown architecture, attempting to use generic model."
 #include "arch-generic.h"
diff --git a/configure b/configure
index ca03350b..74416fd4 100755
--- a/configure
+++ b/configure
@@ -499,13 +499,15 @@ elif check_define __aarch64__ ; then
   cpu="aarch64"
 elif check_define __hppa__ ; then
   cpu="hppa"
+elif check_define __loongarch64 ; then
+  cpu="loongarch64"
 else
   cpu=`uname -m`
 fi
 
 # Normalise host CPU name and set ARCH.
 case "$cpu" in
-  ia64|ppc|ppc64|s390|s390x|sparc64)
+  ia64|ppc|ppc64|s390|s390x|sparc64|loongarch64)
     cpu="$cpu"
   ;;
   i386|i486|i586|i686|i86pc|BePC)
diff --git a/io_u.c b/io_u.c
index 30265cfb..6f5fc94d 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2027,6 +2027,8 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	}
 
 	if (ddir_sync(ddir)) {
+		if (io_u->error)
+			goto error;
 		td->last_was_sync = true;
 		if (f) {
 			f->first_write = -1ULL;
@@ -2082,6 +2084,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 				icd->error = ret;
 		}
 	} else if (io_u->error) {
+error:
 		icd->error = io_u->error;
 		io_u_log_error(td, io_u);
 	}
diff --git a/libfio.c b/libfio.c
index ddd49cd7..5e3fd30b 100644
--- a/libfio.c
+++ b/libfio.c
@@ -74,6 +74,7 @@ static const char *fio_arch_strings[arch_nr] = {
 	"hppa",
 	"mips",
 	"aarch64",
+	"loongarch64",
 	"generic"
 };
 
diff --git a/os/os-linux-syscall.h b/os/os-linux-syscall.h
index c399b2fa..67ee4d91 100644
--- a/os/os-linux-syscall.h
+++ b/os/os-linux-syscall.h
@@ -270,6 +270,22 @@
 #define __NR_ioprio_get		31
 #endif
 
+/* Linux syscalls for loongarch64 */
+#elif defined(ARCH_LOONGARCH64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set         30
+#define __NR_ioprio_get         31
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64          223
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice         76
+#define __NR_sys_tee          	77
+#define __NR_sys_vmsplice       75
+#endif
 #else
 #warning "Unknown architecture"
 #endif

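As the `io_u.c` hunks suggest, a failed sync-type I/O (such as an fsync) used to take the `ddir_sync()` branch of `io_completed()`, so the `else if (io_u->error)` branch that sets `icd->error` and calls `io_u_log_error()` was never reached for it. The toy Python model below contrasts the two orderings. It is not fio code: `io_completed_model` and its return tuple are hypothetical, and everything else the real function does is omitted.

```python
import errno


def io_completed_model(is_sync: bool, error: int, with_fix: bool = True):
    """Return (recorded_error, set_last_was_sync) for one completed I/O.

    Models only the branch order in io_completed(); byte accounting,
    verification and the rest of the real function are left out.
    """
    if is_sync:
        if with_fix and error:
            # New 'goto error': record the failure before touching sync state.
            return error, False
        # Sync bookkeeping: td->last_was_sync = true, f->first_write = -1ULL, ...
        return 0, True
    if error:
        # Existing 'else if (io_u->error)' path: icd->error = io_u->error.
        return error, False
    return 0, False


if __name__ == "__main__":
    print("before:", io_completed_model(True, errno.EIO, with_fix=False))
    print("after: ", io_completed_model(True, errno.EIO))
```

Without the fix the failed sync is silently treated as a successful one; with it, the error is recorded the same way as for a failed read or write.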

* Recent changes (master)
@ 2023-05-19 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-19 12:00 UTC
  To: fio

The following changes since commit a64fd9c7994b51039b2fde851579c2453ddb35c0:

  docs: document no_completion_thread (2023-05-17 14:44:49 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5cedafafeeb9dde862455342cd24c860d84f4f07:

  ci: fix ups for 32-bit GitHub Actions Linux builds (2023-05-18 14:29:10 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      Revert "ci: stop testing Linux 32-bit builds"
      ci: fix ups for 32-bit GitHub Actions Linux builds

 .github/workflows/ci.yml | 4 ++++
 ci/actions-install.sh    | 2 ++
 2 files changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 8325a3d9..dd2997f0 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -14,6 +14,7 @@ jobs:
         - linux-gcc
         - linux-clang
         - macos
+        - linux-i686-gcc
         - android
         - windows-cygwin-64
         - windows-cygwin-32
@@ -27,6 +28,9 @@ jobs:
           cc: clang
         - build: macos
           os: macos-12
+        - build: linux-i686-gcc
+          os: ubuntu-22.04
+          arch: i686
         - build: android
           os: ubuntu-22.04
           arch: aarch64-linux-android32
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 0d73ac97..95241e78 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -35,6 +35,8 @@ DPKGCFG
                 gcc-multilib
                 pkg-config:i386
                 zlib1g-dev:i386
+                libc6:i386
+                libgcc-s1:i386
             )
             ;;
         "x86_64")


* Recent changes (master)
@ 2023-05-18 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-18 12:00 UTC
  To: fio

The following changes since commit be42eadd18fad2569dfc6517940db8bbe2469f6d:

  engines/io_uring: fix coverity issue (2023-05-16 09:01:57 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a64fd9c7994b51039b2fde851579c2453ddb35c0:

  docs: document no_completion_thread (2023-05-17 14:44:49 +0000)

----------------------------------------------------------------
Vincent Fu (3):
      docs: move rate_cycle description
      docs: move experimental_verify description
      docs: document no_completion_thread

 HOWTO.rst | 28 ++++++++++++++++------------
 fio.1     | 21 ++++++++++++---------
 2 files changed, 28 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 80c08f7e..32fff5ec 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3011,6 +3011,10 @@ with the caveat that when used on the command line, they must come after the
 	performance. The default is to enable it only if
 	:option:`libblkio_wait_mode=eventfd <libblkio_wait_mode>`.
 
+.. option:: no_completion_thread : [windowsaio]
+
+	Avoid using a separate thread for completion polling.
+
 I/O depth
 ~~~~~~~~~
 
@@ -3203,6 +3207,11 @@ I/O rate
 	fio will ignore the thinktime and continue doing IO at the specified
 	rate, instead of entering a catch-up mode after thinktime is done.
 
+.. option:: rate_cycle=int
+
+	Average bandwidth for :option:`rate` and :option:`rate_min` over this number
+	of milliseconds. Defaults to 1000.
+
 
 I/O latency
 ~~~~~~~~~~~
@@ -3241,11 +3250,6 @@ I/O latency
 	microseconds. Comma-separated values may be specified for reads, writes,
 	and trims as described in :option:`blocksize`.
 
-.. option:: rate_cycle=int
-
-	Average bandwidth for :option:`rate` and :option:`rate_min` over this number
-	of milliseconds. Defaults to 1000.
-
 
 I/O replay
 ~~~~~~~~~~
@@ -3761,6 +3765,13 @@ Verification
 	verification pass, according to the settings in the job file used.  Default
 	false.
 
+.. option:: experimental_verify=bool
+
+        Enable experimental verification. Standard verify records I/O metadata
+        for later use during the verification phase. Experimental verify
+        instead resets the file after the write phase and then replays I/Os for
+        the verification phase.
+
 .. option:: trim_percentage=int
 
 	Number of verify blocks to discard/trim.
@@ -3777,13 +3788,6 @@ Verification
 
 	Trim this number of I/O blocks.
 
-.. option:: experimental_verify=bool
-
-        Enable experimental verification. Standard verify records I/O metadata
-        for later use during the verification phase. Experimental verify
-        instead resets the file after the write phase and then replays I/Os for
-        the verification phase.
-
 Steady state
 ~~~~~~~~~~~~
 
diff --git a/fio.1 b/fio.1
index e577e2e0..80bf3371 100644
--- a/fio.1
+++ b/fio.1
@@ -2765,6 +2765,9 @@ Use a busy loop with a non-blocking call to \fBblkioq_do_io()\fR.
 Enable the queue's completion eventfd even when unused. This may impact
 performance. The default is to enable it only if
 \fBlibblkio_wait_mode=eventfd\fR.
+.TP
+.BI (windowsaio)no_completion_thread
+Avoid using a separate thread for completion polling.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
@@ -2946,6 +2949,10 @@ By default, fio will attempt to catch up to the specified rate setting, if any
 kind of thinktime setting was used. If this option is set, then fio will
 ignore the thinktime and continue doing IO at the specified rate, instead of
 entering a catch-up mode after thinktime is done.
+.TP
+.BI rate_cycle \fR=\fPint
+Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number
+of milliseconds. Defaults to 1000.
 .SS "I/O latency"
 .TP
 .BI latency_target \fR=\fPtime
@@ -2975,10 +2982,6 @@ If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
 maximum latency. When the unit is omitted, the value is interpreted in
 microseconds. Comma-separated values may be specified for reads, writes,
 and trims as described in \fBblocksize\fR.
-.TP
-.BI rate_cycle \fR=\fPint
-Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number
-of milliseconds. Defaults to 1000.
 .SS "I/O replay"
 .TP
 .BI write_iolog \fR=\fPstr
@@ -3475,6 +3478,11 @@ far it should verify. Without this information, fio will run a full
 verification pass, according to the settings in the job file used. Default
 false.
 .TP
+.BI experimental_verify \fR=\fPbool
+Enable experimental verification. Standard verify records I/O metadata for
+later use during the verification phase. Experimental verify instead resets the
+file after the write phase and then replays I/Os for the verification phase.
+.TP
 .BI trim_percentage \fR=\fPint
 Number of verify blocks to discard/trim.
 .TP
@@ -3486,11 +3494,6 @@ Verify that trim/discarded blocks are returned as zeros.
 .TP
 .BI trim_backlog_batch \fR=\fPint
 Trim this number of I/O blocks.
-.TP
-.BI experimental_verify \fR=\fPbool
-Enable experimental verification. Standard verify records I/O metadata for
-later use during the verification phase. Experimental verify instead resets the
-file after the write phase and then replays I/Os for the verification phase.
 .SS "Steady state"
 .TP
 .BI steadystate \fR=\fPstr:float "\fR,\fP ss" \fR=\fPstr:float
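
As a rough illustration of what averaging over `rate_cycle` milliseconds means, the sketch below computes the mean bandwidth in bytes/sec over one window and compares it with a `rate_min` threshold. The helper names and the sampling model (a running byte counter read at the start and end of each window) are assumptions for illustration, not fio's rate-control code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: average bandwidth in bytes/sec over one rate_cycle
 * window, given the running byte counter at the start and end of it. */
static uint64_t cycle_bw(uint64_t bytes_start, uint64_t bytes_end,
                         unsigned int rate_cycle_ms)
{
    assert(rate_cycle_ms > 0 && bytes_end >= bytes_start);
    /* bytes per millisecond scaled up to bytes per second */
    return (bytes_end - bytes_start) * 1000 / rate_cycle_ms;
}

/* Hypothetical check: does the averaged bandwidth meet rate_min? */
static bool cycle_meets_rate_min(uint64_t bytes_start, uint64_t bytes_end,
                                 unsigned int rate_cycle_ms,
                                 uint64_t rate_min)
{
    return cycle_bw(bytes_start, bytes_end, rate_cycle_ms) >= rate_min;
}
```

With the default window of 1000 ms, 10 MiB completed in one window averages to 10485760 bytes/sec; a shorter window reacts faster to bursts but is noisier.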

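One way to picture the difference described for `experimental_verify`: standard verify keeps a log of what was written, while replay-style verify resets the offset generator and regenerates the same sequence. The sketch assumes offsets come from a seeded pseudo-random generator; `xorshift32` and the helper names are hypothetical stand-ins, not fio's generator or verify code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny deterministic PRNG standing in for the job's offset generator.
 * The state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Block index of the next random I/O in a file of nr_blocks blocks. */
static uint64_t next_block(uint32_t *state, uint64_t nr_blocks)
{
    return xorshift32(state) % nr_blocks;
}

/* Standard style: record every offset during the write phase. */
static void record_offsets(uint32_t seed, uint64_t nr_blocks,
                           uint64_t *offs, size_t n)
{
    uint32_t state = seed;

    assert(seed != 0);
    for (size_t i = 0; i < n; i++)
        offs[i] = next_block(&state, nr_blocks);
}

/* Replay style: reset the generator to the same seed and regenerate the
 * offsets for the verify phase instead of storing them. Returns how many
 * regenerated offsets match the recorded ones. */
static size_t replay_matches(uint32_t seed, uint64_t nr_blocks,
                             const uint64_t *offs, size_t n)
{
    uint32_t state = seed;  /* the "reset" step */
    size_t matches = 0;

    for (size_t i = 0; i < n; i++)
        if (next_block(&state, nr_blocks) == offs[i])
            matches++;
    return matches;
}
```

The replay approach trades the memory of the metadata log for the requirement that the offset sequence be exactly reproducible.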

* Recent changes (master)
@ 2023-05-17 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 83b2d4b78374055c3a2261136eedf03b5fbfc335:

  ci: stop testing Linux 32-bit builds (2023-05-15 08:51:27 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to be42eadd18fad2569dfc6517940db8bbe2469f6d:

  engines/io_uring: fix coverity issue (2023-05-16 09:01:57 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      engines/io_uring: fix coverity issue

 engines/io_uring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 90e5a856..ff64fc9f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1198,7 +1198,8 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 			FILE_SET_ENG_DATA(f, data);
 		}
 
-		lba_size = data->lba_ext ? data->lba_ext : (1 << data->lba_shift);
+		assert(data->lba_shift < 32);
+		lba_size = data->lba_ext ? data->lba_ext : (1U << data->lba_shift);
 
 		for_each_rw_ddir(ddir) {
 			if (td->o.min_bs[ddir] % lba_size ||

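The coverity fix above addresses two C pitfalls: `1 << n` has type `int`, so shifting a 1 into the sign bit (n == 31 with a 32-bit `int`) is undefined, and a shift count at or beyond the width of the type is undefined for any type. A small variant of the patched expression, assuming a 32-bit `unsigned int` (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Use the extended LBA size when metadata is interleaved, otherwise derive
 * the LBA size from the shift, as in the patched expression. */
static unsigned int lba_size_from(uint32_t lba_ext, uint32_t lba_shift)
{
    if (lba_ext)
        return lba_ext;
    /* A shift count >= 32 would be undefined, so guard it; 1U keeps the
     * shift unsigned, so a count of 31 is well defined. */
    assert(lba_shift < 32);
    return 1U << lba_shift;
}
```

For example, a shift of 9 gives 512-byte blocks and a shift of 12 gives 4096-byte blocks.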

* Recent changes (master)
@ 2023-05-16 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f6f80750f75810bdaf56dd9362982055de1d7232:

  docs: expand description for interval-based bw and iops statistics (2023-05-10 20:28:49 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 83b2d4b78374055c3a2261136eedf03b5fbfc335:

  ci: stop testing Linux 32-bit builds (2023-05-15 08:51:27 -0400)

----------------------------------------------------------------
Ankit Kumar (2):
      engines/nvme: support for 64 LBA formats
      engines/io_uring_cmd: add extended LBA support

Vincent Fu (1):
      ci: stop testing Linux 32-bit builds

 .github/workflows/ci.yml |  4 ---
 engines/io_uring.c       | 30 +++++++++++++++++++---
 engines/nvme.c           | 66 ++++++++++++++++++++++++++++++++++++++++--------
 engines/nvme.h           |  6 ++---
 4 files changed, 84 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index dd2997f0..8325a3d9 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -14,7 +14,6 @@ jobs:
         - linux-gcc
         - linux-clang
         - macos
-        - linux-i686-gcc
         - android
         - windows-cygwin-64
         - windows-cygwin-32
@@ -28,9 +27,6 @@ jobs:
           cc: clang
         - build: macos
           os: macos-12
-        - build: linux-i686-gcc
-          os: ubuntu-22.04
-          arch: i686
         - build: android
           os: ubuntu-22.04
           arch: aarch64-linux-android32
diff --git a/engines/io_uring.c b/engines/io_uring.c
index f5ffe9f4..90e5a856 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1177,22 +1177,40 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
 		unsigned int nsid, lba_size = 0;
+		__u32 ms = 0;
 		__u64 nlba = 0;
 		int ret;
 
 		/* Store the namespace-id and lba size. */
 		data = FILE_ENG_DATA(f);
 		if (data == NULL) {
-			ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+			ret = fio_nvme_get_info(f, &nsid, &lba_size, &ms, &nlba);
 			if (ret)
 				return ret;
 
 			data = calloc(1, sizeof(struct nvme_data));
 			data->nsid = nsid;
-			data->lba_shift = ilog2(lba_size);
+			if (ms)
+				data->lba_ext = lba_size + ms;
+			else
+				data->lba_shift = ilog2(lba_size);
 
 			FILE_SET_ENG_DATA(f, data);
 		}
+
+		lba_size = data->lba_ext ? data->lba_ext : (1 << data->lba_shift);
+
+		for_each_rw_ddir(ddir) {
+			if (td->o.min_bs[ddir] % lba_size ||
+				td->o.max_bs[ddir] % lba_size) {
+				if (data->lba_ext)
+					log_err("block size must be a multiple of "
+						"(LBA data size + Metadata size)\n");
+				else
+					log_err("block size must be a multiple of LBA data size\n");
+				return 1;
+			}
+                }
 	}
 	if (!ld || !o->registerfiles)
 		return generic_open_file(td, f);
@@ -1243,16 +1261,20 @@ static int fio_ioring_cmd_get_file_size(struct thread_data *td,
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
 		unsigned int nsid, lba_size = 0;
+		__u32 ms = 0;
 		__u64 nlba = 0;
 		int ret;
 
-		ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+		ret = fio_nvme_get_info(f, &nsid, &lba_size, &ms, &nlba);
 		if (ret)
 			return ret;
 
 		data = calloc(1, sizeof(struct nvme_data));
 		data->nsid = nsid;
-		data->lba_shift = ilog2(lba_size);
+		if (ms)
+			data->lba_ext = lba_size + ms;
+		else
+			data->lba_shift = ilog2(lba_size);
 
 		f->real_file_size = lba_size * nlba;
 		fio_file_set_size_known(f);
diff --git a/engines/nvme.c b/engines/nvme.c
index fd2161f3..1047ade2 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -21,8 +21,13 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	else
 		return -ENOTSUP;
 
-	slba = io_u->offset >> data->lba_shift;
-	nlb = (io_u->xfer_buflen >> data->lba_shift) - 1;
+	if (data->lba_ext) {
+		slba = io_u->offset / data->lba_ext;
+		nlb = (io_u->xfer_buflen / data->lba_ext) - 1;
+	} else {
+		slba = io_u->offset >> data->lba_shift;
+		nlb = (io_u->xfer_buflen >> data->lba_shift) - 1;
+	}
 
 	/* cdw10 and cdw11 represent starting lba */
 	cmd->cdw10 = slba & 0xffffffff;
@@ -65,8 +70,13 @@ int fio_nvme_trim(const struct thread_data *td, struct fio_file *f,
 	struct nvme_dsm_range dsm;
 	int ret;
 
-	dsm.nlb = (len >> data->lba_shift);
-	dsm.slba = (offset >> data->lba_shift);
+	if (data->lba_ext) {
+		dsm.nlb = len / data->lba_ext;
+		dsm.slba = offset / data->lba_ext;
+	} else {
+		dsm.nlb = len >> data->lba_shift;
+		dsm.slba = offset >> data->lba_shift;
+	}
 
 	ret = nvme_trim(f->fd, data->nsid, 1, sizeof(struct nvme_dsm_range),
 			&dsm);
@@ -94,11 +104,12 @@ static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 }
 
 int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
-		      __u64 *nlba)
+		      __u32 *ms, __u64 *nlba)
 {
 	struct nvme_id_ns ns;
 	int namespace_id;
 	int fd, err;
+	__u32 format_idx;
 
 	if (f->filetype != FIO_TYPE_CHAR) {
 		log_err("ioengine io_uring_cmd only works with nvme ns "
@@ -113,9 +124,8 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 	namespace_id = ioctl(fd, NVME_IOCTL_ID);
 	if (namespace_id < 0) {
 		err = -errno;
-		log_err("failed to fetch namespace-id");
-		close(fd);
-		return err;
+		log_err("%s: failed to fetch namespace-id\n", f->file_name);
+		goto out;
 	}
 
 	/*
@@ -125,17 +135,51 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 	err = nvme_identify(fd, namespace_id, NVME_IDENTIFY_CNS_NS,
 				NVME_CSI_NVM, &ns);
 	if (err) {
-		log_err("failed to fetch identify namespace\n");
+		log_err("%s: failed to fetch identify namespace\n",
+			f->file_name);
 		close(fd);
 		return err;
 	}
 
 	*nsid = namespace_id;
-	*lba_sz = 1 << ns.lbaf[(ns.flbas & 0x0f)].ds;
+
+	/*
+	 * 16 or 64 as maximum number of supported LBA formats.
+	 * From flbas bit 0-3 indicates lsb and bit 5-6 indicates msb
+	 * of the format index used to format the namespace.
+	 */
+	if (ns.nlbaf < 16)
+		format_idx = ns.flbas & 0xf;
+	else
+		format_idx = (ns.flbas & 0xf) + (((ns.flbas >> 5) & 0x3) << 4);
+
+	*lba_sz = 1 << ns.lbaf[format_idx].ds;
+
+	/*
+	 * Only extended LBA can be supported.
+	 * Bit 4 for flbas indicates if metadata is transferred at the end of
+	 * logical block creating an extended LBA.
+	 */
+	*ms = le16_to_cpu(ns.lbaf[format_idx].ms);
+	if (*ms && !((ns.flbas >> 4) & 0x1)) {
+		log_err("%s: only extended logical block can be supported\n",
+			f->file_name);
+		err = -ENOTSUP;
+		goto out;
+	}
+
+	/* Check for end to end data protection support */
+	if (ns.dps & 0x3) {
+		log_err("%s: end to end data protection not supported\n",
+			f->file_name);
+		err = -ENOTSUP;
+		goto out;
+	}
 	*nlba = ns.nsze;
 
+out:
 	close(fd);
-	return 0;
+	return err;
 }
 
 int fio_nvme_get_zoned_model(struct thread_data *td, struct fio_file *f,
diff --git a/engines/nvme.h b/engines/nvme.h
index 408594d5..f7cb820d 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -88,6 +88,7 @@ enum nvme_zns_zs {
 struct nvme_data {
 	__u32 nsid;
 	__u32 lba_shift;
+	__u32 lba_ext;
 };
 
 struct nvme_lbaf {
@@ -134,8 +135,7 @@ struct nvme_id_ns {
 	__le16			endgid;
 	__u8			nguid[16];
 	__u8			eui64[8];
-	struct nvme_lbaf	lbaf[16];
-	__u8			rsvd192[192];
+	struct nvme_lbaf	lbaf[64];
 	__u8			vs[3712];
 };
 
@@ -223,7 +223,7 @@ int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
 			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes);
 
 int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
-		      __u64 *nlba);
+		      __u32 *ms, __u64 *nlba);
 
 int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 			    struct iovec *iov);
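
The FLBAS decoding in `fio_nvme_get_info()` can be isolated as below. It mirrors the patch's bit handling: bits 0-3 form the low nibble of the format index, bits 5-6 supply the two high bits when 16 or more formats are reported, and bit 4 is the extended-LBA flag. The standalone helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LBA format index from FLBAS, following the nlbaf < 16 test in the patch. */
static unsigned int flbas_format_index(uint8_t flbas, uint8_t nlbaf)
{
    unsigned int idx = flbas & 0xf;         /* bits 0-3: low nibble */

    if (nlbaf >= 16)
        idx += ((flbas >> 5) & 0x3) << 4;   /* bits 5-6: high bits */
    return idx;
}

/* Bit 4: metadata is transferred at the end of each logical block
 * (extended LBA), the only metadata layout the engine accepts. */
static bool flbas_extended_lba(uint8_t flbas)
{
    return (flbas >> 4) & 0x1;
}
```

For instance, an FLBAS of 0x63 with an NLBAF of 63 selects index 51 (low nibble 3, high bits 3), while the same byte with an NLBAF of 15 selects index 3.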

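With metadata interleaved, an extended LBA such as 512 data bytes plus 8 metadata bytes is 520 bytes, which is not a power of two, so the patch replaces the shifts with division. The hypothetical helper below mirrors the slba/nlb computation in `fio_nvme_uring_cmd_prep()`; it assumes offsets and lengths are multiples of the LBA size, in line with the block-size check added to `fio_ioring_cmd_open_file()`.

```c
#include <assert.h>
#include <stdint.h>

struct lba_range {
    uint64_t slba;  /* starting LBA */
    uint32_t nlb;   /* number of logical blocks, zero-based as in NVMe */
};

static struct lba_range to_lba_range(uint64_t offset, uint64_t buflen,
                                     uint32_t lba_ext, uint32_t lba_shift)
{
    struct lba_range r;

    if (lba_ext) {
        /* Extended LBA: data + metadata size, so divide. */
        assert(offset % lba_ext == 0 && buflen % lba_ext == 0);
        r.slba = offset / lba_ext;
        r.nlb = (uint32_t)(buflen / lba_ext) - 1;
    } else {
        /* Power-of-two LBA size: shifting is enough. */
        r.slba = offset >> lba_shift;
        r.nlb = (uint32_t)(buflen >> lba_shift) - 1;
    }
    return r;
}
```

For example, offset 5200 and length 4160 with a 520-byte extended LBA give slba 10 and nlb 7 (eight blocks).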

* Recent changes (master)
@ 2023-05-12 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-12 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 37946bed31b688fe55e2003b6d59ff0c964165bb:

  engines/rdma: remove dead code (2023-05-10 09:16:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f6f80750f75810bdaf56dd9362982055de1d7232:

  docs: expand description for interval-based bw and iops statistics (2023-05-10 20:28:49 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      t/run-fio-test: fix comment
      docs: expand description for interval-based bw and iops statistics

 HOWTO.rst          | 22 +++++++++++++++-------
 fio.1              | 19 ++++++++++++-------
 t/run-fio-tests.py |  2 +-
 3 files changed, 28 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 0a6e60c7..80c08f7e 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -4417,15 +4417,23 @@ writes in the example above).  In the order listed, they denote:
                 It is the sum of submission and completion latency.
 
 **bw**
-		Bandwidth statistics based on samples. Same names as the xlat stats,
-		but also includes the number of samples taken (**samples**) and an
-		approximate percentage of total aggregate bandwidth this thread
-		received in its group (**per**). This last value is only really
-		useful if the threads in this group are on the same disk, since they
-		are then competing for disk access.
+		Bandwidth statistics based on measurements from discrete
+		intervals. Fio continuously monitors bytes transferred and I/O
+		operations completed. By default fio calculates bandwidth in
+		each half-second interval (see :option:`bwavgtime`) and reports
+		descriptive statistics for the measurements here. Same names as
+		the xlat stats, but also includes the number of samples taken
+		(**samples**) and an approximate percentage of total aggregate
+		bandwidth this thread received in its group (**per**). This
+		last value is only really useful if the threads in this group
+		are on the same disk, since they are then competing for disk
+		access.
 
 **iops**
-		IOPS statistics based on samples. Same names as bw.
+		IOPS statistics based on measurements from discrete intervals.
+		For details see the description for bw above. See
+		:option:`iopsavgtime` to control the duration of the intervals.
+		Same values reported here as for bw except for percentage.
 
 **lat (nsec/usec/msec)**
 		The distribution of I/O completion latencies. This is the time from when
diff --git a/fio.1 b/fio.1
index 4207814b..e577e2e0 100644
--- a/fio.1
+++ b/fio.1
@@ -4073,15 +4073,20 @@ Total latency. Same names as slat and clat, this denotes the time from
 when fio created the I/O unit to completion of the I/O operation.
 .TP
 .B bw
-Bandwidth statistics based on samples. Same names as the xlat stats,
-but also includes the number of samples taken (\fIsamples\fR) and an
-approximate percentage of total aggregate bandwidth this thread
-received in its group (\fIper\fR). This last value is only really
-useful if the threads in this group are on the same disk, since they
-are then competing for disk access.
+Bandwidth statistics based on measurements from discrete intervals. Fio
+continuosly monitors bytes transferred and I/O operations completed. By default
+fio calculates bandwidth in each half-second interval (see \fBbwavgtime\fR)
+and reports descriptive statistics for the measurements here. Same names as the
+xlat stats, but also includes the number of samples taken (\fIsamples\fR) and an
+approximate percentage of total aggregate bandwidth this thread received in its
+group (\fIper\fR). This last value is only really useful if the threads in this
+group are on the same disk, since they are then competing for disk access.
 .TP
 .B iops
-IOPS statistics based on samples. Same names as \fBbw\fR.
+IOPS statistics based on measurements from discrete intervals.
+For details see the description for \fBbw\fR above. See
+\fBiopsavgtime\fR to control the duration of the intervals.
+Same values reported here as for \fBbw\fR except for percentage.
 .TP
 .B lat (nsec/usec/msec)
 The distribution of I/O completion latencies. This is the time from when
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 4fe6fe46..71e3e5a6 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -822,7 +822,7 @@ class FioJobTest_t0027(FioJobTest):
             self.passed = False
 
 class FioJobTest_iops_rate(FioJobTest):
-    """Test consists of fio test job t0009
+    """Test consists of fio test job t0011
     Confirm that job0 iops == 1000
     and that job1_iops / job0_iops ~ 8
     With two runs of fio-3.16 I observed a ratio of 8.3"""

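The interval-based statistics described in the documentation change can be sketched as follows: a running byte counter is read at the end of every interval (the role `bwavgtime` plays, 500 ms by default), each interval yields one bandwidth sample, and descriptive statistics are taken over the samples. This is a simplified model under those assumptions, not fio's stat code; the same arithmetic with an I/O counter in place of bytes would give iops samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptive statistics over per-interval bandwidth samples (bytes/sec). */
struct bw_stats {
    uint64_t min, max;
    double mean;
    size_t samples;
};

/* cum_bytes[0] is the counter at the start; cum_bytes[i] is the counter at
 * the end of interval i. The counter is assumed to be monotonic. */
static struct bw_stats interval_bw_stats(const uint64_t *cum_bytes,
                                         size_t n_points,
                                         unsigned int interval_ms)
{
    struct bw_stats st = { UINT64_MAX, 0, 0.0, 0 };
    double sum = 0.0;

    assert(n_points >= 2 && interval_ms > 0);
    for (size_t i = 1; i < n_points; i++) {
        uint64_t bw = (cum_bytes[i] - cum_bytes[i - 1]) * 1000 / interval_ms;

        if (bw < st.min)
            st.min = bw;
        if (bw > st.max)
            st.max = bw;
        sum += (double)bw;
        st.samples++;
    }
    st.mean = sum / (double)st.samples;
    return st;
}
```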

* Recent changes (master)
@ 2023-05-11 12:00 Jens Axboe
From: Jens Axboe @ 2023-05-11 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 0771592f81fcb032e261b18212477ceffc6cdac5:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-04-27 17:08:41 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 37946bed31b688fe55e2003b6d59ff0c964165bb:

  engines/rdma: remove dead code (2023-05-10 09:16:55 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      README: remove reference to the bsdio installer
      t/read-to-pipe-async: remove dead code
      engines/rdma: remove dead code

 README.rst             | 11 +++++------
 engines/rdma.c         |  2 --
 t/read-to-pipe-async.c |  4 +---
 3 files changed, 6 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index bcd08ec9..8f6208e3 100644
--- a/README.rst
+++ b/README.rst
@@ -123,12 +123,11 @@ Solaris:
 	``pkgutil -i fio``.
 
 Windows:
-        Beginning with fio 3.31 Windows installers are available on GitHub at
-        https://github.com/axboe/fio/releases.  Rebecca Cran
-        <rebecca@bsdio.com> has fio packages for Windows at
-        https://bsdio.com/fio/ . The latest builds for Windows can also be
-        grabbed from https://ci.appveyor.com/project/axboe/fio by clicking the
-        latest x86 or x64 build and then selecting the Artifacts tab.
+	Beginning with fio 3.31 Windows installers are available on GitHub at
+        https://github.com/axboe/fio/releases. The latest builds for Windows
+	can also be grabbed from https://ci.appveyor.com/project/axboe/fio by
+	clicking the latest x86 or x64 build and then selecting the Artifacts
+	tab.
 
 BSDs:
 	Packages for BSDs may be available from their binary package repositories.
diff --git a/engines/rdma.c b/engines/rdma.c
index ee2844d3..ebdbcb1c 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -856,8 +856,6 @@ static int fio_rdmaio_commit(struct thread_data *td)
 			ret = fio_rdmaio_send(td, io_us, rd->io_u_queued_nr);
 		else if (!rd->is_client)
 			ret = fio_rdmaio_recv(td, io_us, rd->io_u_queued_nr);
-		else
-			ret = 0;	/* must be a SYNC */
 
 		if (ret > 0) {
 			fio_rdmaio_queued(td, io_us, ret);
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index 586e3c95..569fc62a 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -247,10 +247,8 @@ static void *writer_fn(void *data)
 	while (!wt->thread.exit || !flist_empty(&wt->list)) {
 		pthread_mutex_lock(&wt->thread.lock);
 
-		if (work) {
+		if (work)
 			flist_add_tail(&work->list, &wt->done_list);
-			work = NULL;
-		}
 	
 		work = find_seq(wt, seq);
 		if (work)


* Recent changes (master)
@ 2023-04-28 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9724b4f5ebf0841087c5a56c1d83efe0f4aeb6d7:

  Revert "zbd: Report the zone capacity" (2023-04-27 05:08:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0771592f81fcb032e261b18212477ceffc6cdac5:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-04-27 17:08:41 -0600)

----------------------------------------------------------------
Anuj Gupta (1):
      t/io_uring: avoid null-ptr dereference in case setup_ring fails

Bart Van Assche (2):
      Detect ASharedMemory_create() support
      ci: Also test the Android recovery environment

Jens Axboe (2):
      t/io_uring: make submitter_init() return < 0 on error
      Merge branch 'master' of https://github.com/bvanassche/fio

Vincent Fu (2):
      ci: add Windows Cygwin and msys2 builds to GitHub Actions
      ci: work around for GitHub Actions Cygwin sed issue

 .github/workflows/ci.yml | 82 +++++++++++++++++++++++++++++++++++++++++++++---
 ci/actions-build.sh      | 18 +++++++++--
 ci/actions-full-test.sh  |  5 ++-
 ci/actions-install.sh    | 18 +++++++----
 ci/actions-smoke-test.sh |  5 ++-
 ci/common.sh             |  2 +-
 configure                | 20 ++++++++++++
 os/os-ashmem.h           |  4 +--
 t/io_uring.c             | 37 ++++++++++++++++------
 9 files changed, 163 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 4bc91d3e..dd2997f0 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -16,6 +16,9 @@ jobs:
         - macos
         - linux-i686-gcc
         - android
+        - windows-cygwin-64
+        - windows-cygwin-32
+        - windows-msys2-64
         include:
         - build: linux-gcc
           os: ubuntu-22.04
@@ -31,6 +34,25 @@ jobs:
         - build: android
           os: ubuntu-22.04
           arch: aarch64-linux-android32
+        - build: android-recovery
+          os: ubuntu-22.04
+          arch: aarch64-linux-android32
+        - build: windows-cygwin-64
+          os: windows-latest
+          arch: x86_64
+          installer_arch: x64
+          shell: bash
+        - build: windows-cygwin-32
+          os: windows-latest
+          arch: i686
+          installer_arch: x86
+          shell: bash
+        - build: windows-msys2-64
+          os: windows-latest
+          cc: clang
+          arch: x86_64
+          installer_arch: x64
+          shell: msys2
 
     env:
       CI_TARGET_BUILD: ${{ matrix.build }}
@@ -38,13 +60,65 @@ jobs:
       CC: ${{ matrix.cc }}
 
     steps:
+    - name: git config line endings (Windows)
+      if: ${{ contains( matrix.build, 'windows' ) }}
+      run: git config --global core.autocrlf input
     - name: Checkout repo
       uses: actions/checkout@v3
+    - name: Install Cygwin toolchain (Windows)
+      if: ${{ startsWith(matrix.build, 'windows-cygwin') }}
+      uses: cygwin/cygwin-install-action@master
+      with:
+        packages: >
+          mingw64-${{matrix.arch}}-binutils
+          mingw64-${{matrix.arch}}-CUnit
+          mingw64-${{matrix.arch}}-curl
+          mingw64-${{matrix.arch}}-dlfcn
+          mingw64-${{matrix.arch}}-gcc-core
+          mingw64-${{matrix.arch}}-headers
+          mingw64-${{matrix.arch}}-runtime
+          mingw64-${{matrix.arch}}-zlib
+
+    - name: Install msys2 toolchain (Windows)
+      if: ${{ startsWith(matrix.build, 'windows-msys2') }}
+      uses: msys2/setup-msys2@v2
+      with:
+        install: >
+          git
+          base-devel
+          mingw-w64-${{matrix.arch}}-clang
+          mingw-w64-${{matrix.arch}}-cunit
+          mingw-w64-${{matrix.arch}}-toolchain
+          mingw-w64-${{matrix.arch}}-lld
+          mingw-w64-${{matrix.arch}}-python-scipy
+          mingw-w64-${{matrix.arch}}-python-six
+          mingw-w64-${{matrix.arch}}-python-statsmodels
+          mingw-w64-${{matrix.arch}}-python-sphinx
+
     - name: Install dependencies
-      run: ./ci/actions-install.sh
+      run: ${{matrix.shell}} ./ci/actions-install.sh
+      if: ${{ !contains( matrix.build, 'msys2' ) }}
     - name: Build
-      run: ./ci/actions-build.sh
+      run:  ${{matrix.shell}} ./ci/actions-build.sh
+    - name: Build installer (Windows)
+      if: ${{ contains( matrix.build, 'windows' ) }}
+      shell: cmd
+      run: |
+        cd os\windows
+        dobuild.cmd ${{ matrix.installer_arch }}
+        cd ..\..
+
+    - name: Upload installer (Windows)
+      if: ${{ contains( matrix.build, 'windows' ) }}
+      uses: actions/upload-artifact@v3
+      with:
+        name: ${{ matrix.build }}-installer
+        path: os\windows\*.msi
+    - name: Remove dependency files to resolve Makefile Cygwin sed issue (Windows)
+      if: ${{ startsWith(matrix.build, 'windows-cygwin') }}
+      run: rm *.d */*.d */*/*.d
+      shell: bash
     - name: Smoke test
-      run: ./ci/actions-smoke-test.sh
+      run:  ${{matrix.shell}} ./ci/actions-smoke-test.sh
     - name: Full test
-      run: ./ci/actions-full-test.sh
+      run:  ${{matrix.shell}} ./ci/actions-full-test.sh
diff --git a/ci/actions-build.sh b/ci/actions-build.sh
index 2b3de8e3..351b8d18 100755
--- a/ci/actions-build.sh
+++ b/ci/actions-build.sh
@@ -12,7 +12,7 @@ main() {
 
     set_ci_target_os
     case "${CI_TARGET_BUILD}/${CI_TARGET_OS}" in
-        android/*)
+        android*/*)
             export UNAME=Android
             if [ -z "${CI_TARGET_ARCH}" ]; then
                 echo "Error: CI_TARGET_ARCH has not been set"
@@ -20,7 +20,9 @@ main() {
             fi
             NDK=$PWD/android-ndk-r24/toolchains/llvm/prebuilt/linux-x86_64/bin
             export PATH="${NDK}:${PATH}"
-            export LIBS="-landroid"
+            if [ "${CI_TARGET_BUILD}" = "android" ]; then
+                export LIBS="-landroid"
+            fi
             CC=${NDK}/${CI_TARGET_ARCH}-clang
             if [ ! -e "${CC}" ]; then
                 echo "Error: could not find ${CC}"
@@ -41,7 +43,17 @@ main() {
                     )
                     ;;
             esac
-        ;;
+	    ;;
+        */windows)
+	    configure_flags+=("--disable-native")
+            case "${CI_TARGET_ARCH}" in
+                "i686")
+		    configure_flags+=("--build-32bit-win")
+                    ;;
+                "x86_64")
+                    ;;
+            esac
+	    ;;
     esac
     configure_flags+=(--extra-cflags="${extra_cflags}")
 
diff --git a/ci/actions-full-test.sh b/ci/actions-full-test.sh
index d1675f6e..d2fb4201 100755
--- a/ci/actions-full-test.sh
+++ b/ci/actions-full-test.sh
@@ -3,7 +3,10 @@
 set -eu
 
 main() {
-    [ "${CI_TARGET_BUILD}" = android ] && return 0
+    case "${CI_TARGET_BUILD}" in
+	android*)
+	    return 0;;
+    esac
 
     echo "Running long running tests..."
     export PYTHONUNBUFFERED="TRUE"
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index fb3bd141..0d73ac97 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -89,13 +89,19 @@ install_macos() {
     pip3 install scipy six statsmodels
 }
 
+install_windows() {
+	pip3 install scipy six statsmodels sphinx
+}
+
 main() {
-    if [ "${CI_TARGET_BUILD}" = "android" ]; then
-	echo "Installing Android NDK..."
-	wget --quiet https://dl.google.com/android/repository/android-ndk-r24-linux.zip
-	unzip -q android-ndk-r24-linux.zip
-	return 0
-    fi
+    case "${CI_TARGET_BUILD}" in
+	android*)
+	    echo "Installing Android NDK..."
+	    wget --quiet https://dl.google.com/android/repository/android-ndk-r24-linux.zip
+	    unzip -q android-ndk-r24-linux.zip
+	    return 0
+	    ;;
+    esac
 
     set_ci_target_os
 
diff --git a/ci/actions-smoke-test.sh b/ci/actions-smoke-test.sh
index 3196f6a1..494462ac 100755
--- a/ci/actions-smoke-test.sh
+++ b/ci/actions-smoke-test.sh
@@ -3,7 +3,10 @@
 set -eu
 
 main() {
-    [ "${CI_TARGET_BUILD}" = "android" ] && return 0
+    case "${CI_TARGET_BUILD}" in
+	android*)
+	    return 0;;
+    esac
 
     echo "Running smoke tests..."
     make test
diff --git a/ci/common.sh b/ci/common.sh
index 8861f843..3cf6a416 100644
--- a/ci/common.sh
+++ b/ci/common.sh
@@ -15,7 +15,7 @@ function set_ci_target_os {
             darwin*)
                 CI_TARGET_OS="macos"
                 ;;
-            msys*)
+            cygwin|msys*)
                 CI_TARGET_OS="windows"
                 ;;
             bsd*)
diff --git a/configure b/configure
index abb6d016..ca03350b 100755
--- a/configure
+++ b/configure
@@ -1345,6 +1345,23 @@ if compile_prog "" "" "sync_file_range"; then
 fi
 print_config "sync_file_range" "$sync_file_range"
 
+##########################################
+# ASharedMemory_create() probe
+if test "$ASharedMemory_create" != "yes" ; then
+  ASharedMemory_create="no"
+fi
+cat > $TMPC << EOF
+#include <android/sharedmem.h>
+int main(int argc, char **argv)
+{
+  return ASharedMemory_create("", 0);
+}
+EOF
+if compile_prog "" "" "ASharedMemory_create"; then
+  ASharedMemory_create="yes"
+fi
+print_config "ASharedMemory_create" "$ASharedMemory_create"
+
 ##########################################
 # ext4 move extent probe
 if test "$ext4_me" != "yes" ; then
@@ -3011,6 +3028,9 @@ fi
 if test "$sync_file_range" = "yes" ; then
   output_sym "CONFIG_SYNC_FILE_RANGE"
 fi
+if test "$ASharedMemory_create" = "yes" ; then
+  output_sym "CONFIG_ASHAREDMEMORY_CREATE"
+fi
 if test "$sfaa" = "yes" ; then
   output_sym "CONFIG_SFAA"
 fi
diff --git a/os/os-ashmem.h b/os/os-ashmem.h
index c34ff656..80eab7c4 100644
--- a/os/os-ashmem.h
+++ b/os/os-ashmem.h
@@ -6,7 +6,7 @@
 #include <linux/ashmem.h>
 #include <linux/shm.h>
 #include <android/api-level.h>
-#if __ANDROID_API__ >= __ANDROID_API_O__
+#ifdef CONFIG_ASHAREDMEMORY_CREATE
 #include <android/sharedmem.h>
 #else
 #define ASHMEM_DEVICE	"/dev/ashmem"
@@ -27,7 +27,7 @@ static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
 	return ret;
 }
 
-#if __ANDROID_API__ >= __ANDROID_API_O__
+#ifdef CONFIG_ASHAREDMEMORY_CREATE
 static inline int shmget(key_t __key, size_t __size, int __shmflg)
 {
 	char keybuf[11];
diff --git a/t/io_uring.c b/t/io_uring.c
index 6b0efef8..bf0aa26e 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -1049,7 +1049,7 @@ static int submitter_init(struct submitter *s)
 
 		buf = allocate_mem(s, bs);
 		if (!buf)
-			return 1;
+			return -1;
 		s->iovecs[i].iov_base = buf;
 		s->iovecs[i].iov_len = bs;
 	}
@@ -1059,14 +1059,15 @@ static int submitter_init(struct submitter *s)
 		err = 0;
 	} else if (!aio) {
 		err = setup_ring(s);
-		sprintf(buf, "Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
+		if (!err)
+			sprintf(buf, "Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 	} else {
 		sprintf(buf, "Engine=aio\n");
 		err = setup_aio(s);
 	}
 	if (err) {
 		printf("queue setup failed: %s, %d\n", strerror(errno), err);
-		return 1;
+		return -1;
 	}
 
 	if (!init_printed) {
@@ -1172,9 +1173,15 @@ static void *submitter_aio_fn(void *data)
 	struct iocb *iocbs;
 	struct io_event *events;
 #ifdef ARCH_HAVE_CPU_CLOCK
-	int nr_batch = submitter_init(s);
-#else
-	submitter_init(s);
+	int nr_batch;
+#endif
+
+	ret = submitter_init(s);
+	if (ret < 0)
+		goto done;
+
+#ifdef ARCH_HAVE_CPU_CLOCK
+	nr_batch = ret;
 #endif
 
 	iocbsptr = calloc(depth, sizeof(struct iocb *));
@@ -1238,6 +1245,7 @@ static void *submitter_aio_fn(void *data)
 	free(iocbsptr);
 	free(iocbs);
 	free(events);
+done:
 	finish = 1;
 	return NULL;
 }
@@ -1277,9 +1285,15 @@ static void *submitter_uring_fn(void *data)
 	struct io_sq_ring *ring = &s->sq_ring;
 	int ret, prepped;
 #ifdef ARCH_HAVE_CPU_CLOCK
-	int nr_batch = submitter_init(s);
-#else
-	submitter_init(s);
+	int nr_batch;
+#endif
+
+	ret = submitter_init(s);
+	if (ret < 0)
+		goto done;
+
+#ifdef ARCH_HAVE_CPU_CLOCK
+	nr_batch = ret;
 #endif
 
 	if (register_ring)
@@ -1383,6 +1397,7 @@ submit:
 	if (register_ring)
 		io_uring_unregister_ring(s);
 
+done:
 	finish = 1;
 	return NULL;
 }
@@ -1393,7 +1408,8 @@ static void *submitter_sync_fn(void *data)
 	struct submitter *s = data;
 	int ret;
 
-	submitter_init(s);
+	if (submitter_init(s) < 0)
+		goto done;
 
 	do {
 		uint64_t offset;
@@ -1429,6 +1445,7 @@ static void *submitter_sync_fn(void *data)
 			add_stat(s, s->clock_index, 1);
 	} while (!s->finish);
 
+done:
 	finish = 1;
 	return NULL;
 }

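Before this change, the thread functions in `t/io_uring.c` carried on after a failed setup. The convention the patch adopts can be sketched in isolation: the init function returns a negative value on failure or a non-negative batch count on success, and the thread function jumps to a `done:` label that still sets `finish`. The struct and function names below are hypothetical stand-ins, not the actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread submitter state. */
struct submitter_sketch {
    void *buf;
    int fail_setup;     /* simulate the ring/aio setup failing */
};

static volatile int finish;

/* < 0 on error, otherwise a non-negative value the caller may use. */
static int init_sketch(struct submitter_sketch *s, size_t bs)
{
    s->buf = malloc(bs);
    if (!s->buf)
        return -1;
    if (s->fail_setup) {
        free(s->buf);
        s->buf = NULL;
        return -1;
    }
    return 4;           /* illustrative batch count */
}

static void *thread_fn_sketch(void *data)
{
    struct submitter_sketch *s = data;
    int ret = init_sketch(s, 4096);

    if (ret < 0)
        goto done;      /* skip the I/O loop entirely */

    /* ... the I/O loop would run here, using ret as the batch count ... */
    free(s->buf);
    s->buf = NULL;
done:
    finish = 1;         /* always signal completion, even on failure */
    return NULL;
}
```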

* Recent changes (master)
@ 2023-04-27 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 073974b24aac23610e9e13e3eb56438ad108ab31:

  filesetup: better handle non-uniform distributions (2023-04-20 15:24:39 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9724b4f5ebf0841087c5a56c1d83efe0f4aeb6d7:

  Revert "zbd: Report the zone capacity" (2023-04-27 05:08:29 -0600)

----------------------------------------------------------------
Niklas Cassel (1):
      Revert "zbd: Report the zone capacity"

 zbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index f5fb923a..351b3971 100644
--- a/zbd.c
+++ b/zbd.c
@@ -804,8 +804,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		goto out;
 	}
 
-	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB and capacity %"PRIu64" KB\n",
-	       f->file_name, nr_zones, zone_size / 1024, zones[0].capacity / 1024);
+	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB\n",
+	       f->file_name, nr_zones, zone_size / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
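
After the revert, the debug message prints only the zone size. It is a 64-bit value, reported in KB and formatted portably with the `PRIu64` macro from `<inttypes.h>`. The small sketch below uses the same format string. `format_zone_info()` is a hypothetical helper for illustration, not a fio function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a zone summary like the dprint() call kept
 * by the revert. PRIu64 expands to the right printf conversion for
 * uint64_t regardless of whether it is 'long' or 'long long' on the host.
 */
static int format_zone_info(char *out, size_t len, const char *name,
			    int nr_zones, uint64_t zone_size)
{
	return snprintf(out, len,
			"Device %s has %d zones of size %" PRIu64 " KB\n",
			name, nr_zones, zone_size / 1024);
}
```

For example, a 256 MiB zone (268435456 bytes) is reported as 262144 KB.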

* Recent changes (master)
@ 2023-04-21 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-21 12:00 UTC
  To: fio

The following changes since commit 7624d58953d38612c11496551a855a1aeee7ad24:

  docs: update documentation for randrepeat and allrandrepeat (2023-04-13 13:38:52 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 073974b24aac23610e9e13e3eb56438ad108ab31:

  filesetup: better handle non-uniform distributions (2023-04-20 15:24:39 +0000)

----------------------------------------------------------------
Vincent Fu (5):
      ci: disable __thread support for Windows msys2 build
      engines: cleanup casts and move memset
      engines: separate declaration and assignment
      fio: replace malloc+memset with calloc
      filesetup: better handle non-uniform distributions

 .appveyor.yml        |  4 +++-
 client.c             |  6 ++----
 configure            |  7 ++++++-
 engines/e4defrag.c   |  3 +--
 engines/io_uring.c   |  3 +--
 engines/libhdfs.c    |  3 +--
 engines/libiscsi.c   |  3 +--
 engines/net.c        |  4 +---
 engines/nfs.c        |  6 ++----
 engines/null.c       |  6 +++---
 engines/posixaio.c   |  8 +++-----
 engines/rdma.c       | 22 +++++++---------------
 engines/solarisaio.c |  7 +++----
 engines/sync.c       |  3 +--
 eta.c                |  6 ++----
 filesetup.c          |  8 +++-----
 gfio.c               |  3 +--
 graph.c              |  3 +--
 init.c               |  3 +--
 t/io_uring.c         |  3 +--
 t/lfsr-test.c        |  3 +--
 verify.c             |  3 +--
 22 files changed, 46 insertions(+), 71 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 92301ca9..a63cf24f 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -6,9 +6,11 @@ image:
 environment:
   CYG_MIRROR: http://cygwin.mirror.constant.com
   matrix:
+# --disable-tls for the msys2 build to work around
+# breakage with clang/lld 16.0.0-1
     - ARCHITECTURE: x64
       CC: clang
-      CONFIGURE_OPTIONS: --enable-pdb
+      CONFIGURE_OPTIONS: --enable-pdb --disable-tls
       DISTRO: msys2
 # Skip 32 bit clang build
 #    - ARCHITECTURE: x86
diff --git a/client.c b/client.c
index 51496c77..7cd2ba66 100644
--- a/client.c
+++ b/client.c
@@ -369,8 +369,7 @@ static struct fio_client *get_new_client(void)
 {
 	struct fio_client *client;
 
-	client = malloc(sizeof(*client));
-	memset(client, 0, sizeof(*client));
+	client = calloc(1, sizeof(*client));
 
 	INIT_FLIST_HEAD(&client->list);
 	INIT_FLIST_HEAD(&client->hash_list);
@@ -793,8 +792,7 @@ static int __fio_client_send_remote_ini(struct fio_client *client,
 	dprint(FD_NET, "send remote ini %s to %s\n", filename, client->hostname);
 
 	p_size = sizeof(*pdu) + strlen(filename) + 1;
-	pdu = malloc(p_size);
-	memset(pdu, 0, p_size);
+	pdu = calloc(1, p_size);
 	pdu->name_len = strlen(filename);
 	strcpy((char *) pdu->file, filename);
 	pdu->client_type = cpu_to_le16((uint16_t) client->type);
diff --git a/configure b/configure
index 45d10a31..abb6d016 100755
--- a/configure
+++ b/configure
@@ -264,6 +264,8 @@ for opt do
   ;;
   --seed-buckets=*) seed_buckets="$optarg"
   ;;
+  --disable-tls) tls_check="no"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -313,6 +315,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-dfs           Disable DAOS File System support even if found"
   echo "--enable-asan           Enable address sanitizer"
   echo "--seed-buckets=         Number of seed buckets for the refill-buffer"
+  echo "--disable-tls		Disable __thread local storage"
   exit $exit_val
 fi
 
@@ -1549,7 +1552,8 @@ print_config "socklen_t" "$socklen_t"
 if test "$tls_thread" != "yes" ; then
   tls_thread="no"
 fi
-cat > $TMPC << EOF
+if test "$tls_check" != "no"; then
+  cat > $TMPC << EOF
 #include <stdio.h>
 static __thread int ret;
 int main(int argc, char **argv)
@@ -1560,6 +1564,7 @@ EOF
 if compile_prog "" "" "__thread"; then
   tls_thread="yes"
 fi
+fi
 print_config "__thread" "$tls_thread"
 
 ##########################################
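
The new `--disable-tls` switch skips the configure probe that compiles a program declaring a `static __thread` variable, and `tls_thread` then stays "no". The sketch below shows what that probe checks for: every thread gets its own copy of a `__thread` variable. It assumes a GCC/Clang toolchain with POSIX threads. `counter`, `bump()` and `thread_fn()` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread has its own, zero-initialized instance (GCC/Clang extension). */
static __thread int counter;

/* Increment the calling thread's copy n times and return its value. */
static int bump(int n)
{
	for (int i = 0; i < n; i++)
		counter++;
	return counter;
}

static void *thread_fn(void *arg)
{
	int *result = arg;

	/* Starts from this thread's own copy, unaffected by other threads. */
	*result = bump(5);
	return NULL;
}
```

If the main thread calls `bump(3)` and a second thread then calls `bump(5)`, the values are 3 and 5. The two copies never interfere.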
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index 0a0004d0..37cc2ada 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -77,12 +77,11 @@ static int fio_e4defrag_init(struct thread_data *td)
 		return 1;
 	}
 
-	ed = malloc(sizeof(*ed));
+	ed = calloc(1, sizeof(*ed));
 	if (!ed) {
 		td_verror(td, ENOMEM, "io_queue_init");
 		return 1;
 	}
-	memset(ed, 0 ,sizeof(*ed));
 
 	if (td->o.directory)
 		len = sprintf(donor_name, "%s/", td->o.directory);
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 7f743c2a..f5ffe9f4 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -800,11 +800,10 @@ static void fio_ioring_probe(struct thread_data *td)
 	/* default to off, as that's always safe */
 	o->nonvectored = 0;
 
-	p = malloc(sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	p = calloc(1, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
 	if (!p)
 		return;
 
-	memset(p, 0, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
 	ret = syscall(__NR_io_uring_register, ld->ring_fd,
 			IORING_REGISTER_PROBE, p, 256);
 	if (ret < 0)
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index f20e45ca..d0a26840 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -315,8 +315,7 @@ static int fio_hdfsio_setup(struct thread_data *td)
 	uint64_t file_size, total_file_size;
 
 	if (!td->io_ops_data) {
-		hd = malloc(sizeof(*hd));
-		memset(hd, 0, sizeof(*hd));
+		hd = calloc(1, sizeof(*hd));
 		
 		hd->curr_file_id = -1;
 
diff --git a/engines/libiscsi.c b/engines/libiscsi.c
index c97b5709..37c9b55a 100644
--- a/engines/libiscsi.c
+++ b/engines/libiscsi.c
@@ -68,8 +68,7 @@ static int fio_iscsi_setup_lun(struct iscsi_info *iscsi_info,
 	struct scsi_readcapacity16	*rc16	    = NULL;
 	int				 ret	    = 0;
 
-	iscsi_lun = malloc(sizeof(struct iscsi_lun));
-	memset(iscsi_lun, 0, sizeof(struct iscsi_lun));
+	iscsi_lun = calloc(1, sizeof(struct iscsi_lun));
 
 	iscsi_lun->iscsi_info = iscsi_info;
 
diff --git a/engines/net.c b/engines/net.c
index c6cec584..fec53d74 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -1370,9 +1370,7 @@ static int fio_netio_setup(struct thread_data *td)
 	}
 
 	if (!td->io_ops_data) {
-		nd = malloc(sizeof(*nd));
-
-		memset(nd, 0, sizeof(*nd));
+		nd = calloc(1, sizeof(*nd));
 		nd->listenfd = -1;
 		nd->pipes[0] = nd->pipes[1] = -1;
 		td->io_ops_data = nd;
diff --git a/engines/nfs.c b/engines/nfs.c
index 336e670b..970962a3 100644
--- a/engines/nfs.c
+++ b/engines/nfs.c
@@ -224,8 +224,7 @@ static int do_mount(struct thread_data *td, const char *url)
 		return -1;
 	}
 
-	options->events = malloc(event_size);
-	memset(options->events, 0, event_size);
+	options->events = calloc(1, event_size);
 
 	options->prev_requested_event_index = -1;
 	options->queue_depth = td->o.iodepth;
@@ -278,8 +277,7 @@ static int fio_libnfs_open(struct thread_data *td, struct fio_file *f)
 			options->nfs_url, ret, nfs_get_error(options->context));
 		return ret;
 	}
-	nfs_data = malloc(sizeof(struct nfs_data));
-	memset(nfs_data, 0, sizeof(struct nfs_data));
+	nfs_data = calloc(1, sizeof(struct nfs_data));
 	nfs_data->options = options;
 
 	if (td->o.td_ddir == TD_DDIR_WRITE)
diff --git a/engines/null.c b/engines/null.c
index 68759c26..7236ec94 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -106,13 +106,13 @@ static void null_cleanup(struct null_data *nd)
 
 static struct null_data *null_init(struct thread_data *td)
 {
-	struct null_data *nd = (struct null_data *) malloc(sizeof(*nd));
+	struct null_data *nd;
+	nd = malloc(sizeof(*nd));
 
 	memset(nd, 0, sizeof(*nd));
 
 	if (td->o.iodepth != 1) {
-		nd->io_us = (struct io_u **) malloc(td->o.iodepth * sizeof(struct io_u *));
-		memset(nd->io_us, 0, td->o.iodepth * sizeof(struct io_u *));
+		nd->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
 		td->io_ops->flags |= FIO_ASYNCIO_SETS_ISSUE_TIME;
 	} else
 		td->io_ops->flags |= FIO_SYNCIO;
diff --git a/engines/posixaio.c b/engines/posixaio.c
index 135d088c..0f4eea68 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -197,11 +197,9 @@ static void fio_posixaio_cleanup(struct thread_data *td)
 
 static int fio_posixaio_init(struct thread_data *td)
 {
-	struct posixaio_data *pd = malloc(sizeof(*pd));
-
-	memset(pd, 0, sizeof(*pd));
-	pd->aio_events = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(pd->aio_events, 0, td->o.iodepth * sizeof(struct io_u *));
+	struct posixaio_data *pd;
+	pd = calloc(1, sizeof(*pd));
+	pd->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
 
 	td->io_ops_data = pd;
 	return 0;
diff --git a/engines/rdma.c b/engines/rdma.c
index fcb41068..ee2844d3 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -1296,23 +1296,18 @@ static int fio_rdmaio_init(struct thread_data *td)
 
 	if ((rd->rdma_protocol == FIO_RDMA_MEM_WRITE) ||
 	    (rd->rdma_protocol == FIO_RDMA_MEM_READ)) {
-		rd->rmt_us =
-			malloc(FIO_RDMA_MAX_IO_DEPTH * sizeof(struct remote_u));
-		memset(rd->rmt_us, 0,
-			FIO_RDMA_MAX_IO_DEPTH * sizeof(struct remote_u));
+		rd->rmt_us = calloc(FIO_RDMA_MAX_IO_DEPTH,
+				    sizeof(struct remote_u));
 		rd->rmt_nr = 0;
 	}
 
-	rd->io_us_queued = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(rd->io_us_queued, 0, td->o.iodepth * sizeof(struct io_u *));
+	rd->io_us_queued = calloc(td->o.iodepth, sizeof(struct io_u *));
 	rd->io_u_queued_nr = 0;
 
-	rd->io_us_flight = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(rd->io_us_flight, 0, td->o.iodepth * sizeof(struct io_u *));
+	rd->io_us_flight = calloc(td->o.iodepth, sizeof(struct io_u *));
 	rd->io_u_flight_nr = 0;
 
-	rd->io_us_completed = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(rd->io_us_completed, 0, td->o.iodepth * sizeof(struct io_u *));
+	rd->io_us_completed = calloc(td->o.iodepth, sizeof(struct io_u *));
 	rd->io_u_completed_nr = 0;
 
 	if (td_read(td)) {	/* READ as the server */
@@ -1339,8 +1334,7 @@ static int fio_rdmaio_post_init(struct thread_data *td)
 	for (i = 0; i < td->io_u_freelist.nr; i++) {
 		struct io_u *io_u = td->io_u_freelist.io_us[i];
 
-		io_u->engine_data = malloc(sizeof(struct rdma_io_u_data));
-		memset(io_u->engine_data, 0, sizeof(struct rdma_io_u_data));
+		io_u->engine_data = calloc(1, sizeof(struct rdma_io_u_data));
 		((struct rdma_io_u_data *)io_u->engine_data)->wr_id = i;
 
 		io_u->mr = ibv_reg_mr(rd->pd, io_u->buf, max_bs,
@@ -1386,9 +1380,7 @@ static int fio_rdmaio_setup(struct thread_data *td)
 	}
 
 	if (!td->io_ops_data) {
-		rd = malloc(sizeof(*rd));
-
-		memset(rd, 0, sizeof(*rd));
+		rd = calloc(1, sizeof(*rd));
 		init_rand_seed(&rd->rand_state, (unsigned int) GOLDEN_RATIO_64, 0);
 		td->io_ops_data = rd;
 	}
diff --git a/engines/solarisaio.c b/engines/solarisaio.c
index 21e95935..b2b47fed 100644
--- a/engines/solarisaio.c
+++ b/engines/solarisaio.c
@@ -185,8 +185,9 @@ static void fio_solarisaio_init_sigio(void)
 
 static int fio_solarisaio_init(struct thread_data *td)
 {
-	struct solarisaio_data *sd = malloc(sizeof(*sd));
 	unsigned int max_depth;
+	struct solarisaio_data *sd;
+	sd = calloc(1, sizeof(*sd));
 
 	max_depth = td->o.iodepth;
 	if (max_depth > MAXASYNCHIO) {
@@ -195,9 +196,7 @@ static int fio_solarisaio_init(struct thread_data *td)
 							max_depth);
 	}
 
-	memset(sd, 0, sizeof(*sd));
-	sd->aio_events = malloc(max_depth * sizeof(struct io_u *));
-	memset(sd->aio_events, 0, max_depth * sizeof(struct io_u *));
+	sd->aio_events = calloc(max_depth, sizeof(struct io_u *));
 	sd->max_depth = max_depth;
 
 #ifdef USE_SIGNAL_COMPLETIONS
diff --git a/engines/sync.c b/engines/sync.c
index 339ba999..d1999122 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -402,8 +402,7 @@ static int fio_vsyncio_init(struct thread_data *td)
 {
 	struct syncio_data *sd;
 
-	sd = malloc(sizeof(*sd));
-	memset(sd, 0, sizeof(*sd));
+	sd = calloc(1, sizeof(*sd));
 	sd->last_offset = -1ULL;
 	sd->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
 	sd->io_us = malloc(td->o.iodepth * sizeof(struct io_u *));
diff --git a/eta.c b/eta.c
index ce1c6f2d..af4027e0 100644
--- a/eta.c
+++ b/eta.c
@@ -409,8 +409,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	if (!ddir_rw_sum(disp_io_bytes))
 		fill_start_time(&disp_prev_time);
 
-	eta_secs = malloc(thread_number * sizeof(uint64_t));
-	memset(eta_secs, 0, thread_number * sizeof(uint64_t));
+	eta_secs = calloc(thread_number, sizeof(uint64_t));
 
 	je->elapsed_sec = (mtime_since_genesis() + 999) / 1000;
 
@@ -692,10 +691,9 @@ struct jobs_eta *get_jobs_eta(bool force, size_t *size)
 		return NULL;
 
 	*size = sizeof(*je) + THREAD_RUNSTR_SZ + 8;
-	je = malloc(*size);
+	je = calloc(1, *size);
 	if (!je)
 		return NULL;
-	memset(je, 0, *size);
 
 	if (!calc_thread_status(je, force)) {
 		free(je);
diff --git a/filesetup.c b/filesetup.c
index 8e505941..816d1081 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -303,13 +303,12 @@ static bool pre_read_file(struct thread_data *td, struct fio_file *f)
 	if (bs > left)
 		bs = left;
 
-	b = malloc(bs);
+	b = calloc(1, bs);
 	if (!b) {
 		td_verror(td, errno, "malloc");
 		ret = false;
 		goto error;
 	}
-	memset(b, 0, bs);
 
 	if (lseek(f->fd, f->file_offset, SEEK_SET) < 0) {
 		td_verror(td, errno, "lseek");
@@ -1448,9 +1447,8 @@ static void __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 
 	nranges = (fsize + range_size - 1ULL) / range_size;
 
-	seed = jhash(f->file_name, strlen(f->file_name), 0) * td->thread_number;
-	if (!td->o.rand_repeatable)
-		seed = td->rand_seeds[FIO_RAND_BLOCK_OFF];
+	seed = jhash(f->file_name, strlen(f->file_name), 0) * td->thread_number *
+		td->rand_seeds[FIO_RAND_BLOCK_OFF];
 
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
 		zipf_init(&f->zipf, nranges, td->o.zipf_theta.u.f, td->o.random_center.u.f, seed);
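
With this change, the seed for the non-uniform distribution state always includes the job's `FIO_RAND_BLOCK_OFF` seed. Previously that seed was used only when `rand_repeatable` was off. Runs whose seeds come from the system RNG therefore get different distributions, while repeatable runs stay deterministic. The sketch below follows the shape of the new expression. FNV-1a stands in for fio's `jhash()`, an assumption made only to keep the example self-contained, and the sketch widens to 64 bits before multiplying.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a 32-bit hash: a stand-in for fio's jhash(), not the same function. */
static uint32_t fnv1a(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * hash(file name) * thread number * per-job block-offset seed.
 * Unsigned 64-bit arithmetic wraps modulo 2^64.
 */
static uint64_t dist_seed(const char *file_name, unsigned int thread_number,
			  uint64_t block_off_seed)
{
	return (uint64_t) fnv1a(file_name, strlen(file_name)) * thread_number *
		block_off_seed;
}
```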
diff --git a/gfio.c b/gfio.c
index 22c5314d..10c9b094 100644
--- a/gfio.c
+++ b/gfio.c
@@ -730,8 +730,7 @@ static struct gui_entry *alloc_new_gui_entry(struct gui *ui)
 {
 	struct gui_entry *ge;
 
-	ge = malloc(sizeof(*ge));
-	memset(ge, 0, sizeof(*ge));
+	ge = calloc(1, sizeof(*ge));
 	ge->state = GE_STATE_NEW;
 	ge->ui = ui;
 	return ge;
diff --git a/graph.c b/graph.c
index c49cdae1..3d2b6c96 100644
--- a/graph.c
+++ b/graph.c
@@ -713,8 +713,7 @@ static void graph_label_add_value(struct graph_label *i, void *value,
 	struct graph *g = i->parent;
 	struct graph_value *x;
 
-	x = malloc(sizeof(*x));
-	memset(x, 0, sizeof(*x));
+	x = calloc(1, sizeof(*x));
 	INIT_FLIST_HEAD(&x->alias);
 	INIT_FLIST_HEAD(&x->list);
 	flist_add_tail(&x->list, &i->value_list);
diff --git a/init.c b/init.c
index 48121f14..437406ec 100644
--- a/init.c
+++ b/init.c
@@ -1946,8 +1946,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 	 * it's really 256 + small bit, 280 should suffice
 	 */
 	if (!nested) {
-		name = malloc(280);
-		memset(name, 0, 280);
+		name = calloc(1, 280);
 	}
 
 	opts = NULL;
diff --git a/t/io_uring.c b/t/io_uring.c
index f9f4b840..6b0efef8 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -487,11 +487,10 @@ static void io_uring_probe(int fd)
 	struct io_uring_probe *p;
 	int ret;
 
-	p = malloc(sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	p = calloc(1, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
 	if (!p)
 		return;
 
-	memset(p, 0, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
 	ret = syscall(__NR_io_uring_register, fd, IORING_REGISTER_PROBE, p, 256);
 	if (ret < 0)
 		goto out;
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index 4b255e19..632de383 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -78,8 +78,7 @@ int main(int argc, char *argv[])
 	/* Create verification table */
 	if (verify) {
 		v_size = numbers * sizeof(uint8_t);
-		v = malloc(v_size);
-		memset(v, 0, v_size);
+		v = calloc(1, v_size);
 		printf("\nVerification table is %lf KiB\n", (double)(v_size) / 1024);
 	}
 	v_start = v;
diff --git a/verify.c b/verify.c
index e7e4c69c..2848b686 100644
--- a/verify.c
+++ b/verify.c
@@ -1595,8 +1595,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 	*sz = sizeof(*rep);
 	*sz += nr * sizeof(struct thread_io_list);
 	*sz += depth * sizeof(struct file_comp);
-	rep = malloc(*sz);
-	memset(rep, 0, *sz);
+	rep = calloc(1, *sz);
 
 	rep->threads = cpu_to_le64((uint64_t) nr);
 

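The series above replaces each `malloc()` + `memset(..., 0, ...)` pair with a single `calloc()` call, which returns zero-filled memory. Common implementations such as glibc also return NULL for the `calloc(nmemb, size)` form when `nmemb * size` would overflow, which the hand-written multiplication does not check. The sketch below contrasts the two styles. `zeroed_array_old()` and `zeroed_array_new()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old style: nmemb * size may silently wrap before malloc() sees it. */
static void *zeroed_array_old(size_t nmemb, size_t size)
{
	void *p = malloc(nmemb * size);

	if (p)
		memset(p, 0, nmemb * size);
	return p;
}

/* New style: one call that zero-fills and reports overflow as NULL. */
static void *zeroed_array_new(size_t nmemb, size_t size)
{
	return calloc(nmemb, size);
}
```

Both versions still need a NULL check at the call site before the memory is used.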
* Recent changes (master)
@ 2023-04-14 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-14 12:00 UTC
  To: fio

The following changes since commit 07ed2b57741afa53afa7b2b9fa742c652f1ed8c1:

  Merge branch 'libaio-hang' of https://github.com/lrumancik/fio (2023-04-10 15:40:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7624d58953d38612c11496551a855a1aeee7ad24:

  docs: update documentation for randrepeat and allrandrepeat (2023-04-13 13:38:52 -0400)

----------------------------------------------------------------
Vincent Fu (7):
      rand: print out random seeds for debugging
      init: refactor random seed setting
      init: get rid of td_fill_rand_seeds_internal
      init: clean up random seed options
      t/random_seed: python script to test random seed options
      test: improve evaluation of t0020.fio and t0021.fio
      docs: update documentation for randrepeat and allrandrepeat

Xiaoguang Wang (1):
      t/io_uring: fix max_blocks calculation in nvme passthrough mode

 HOWTO.rst              |   7 +-
 cconv.c                |   2 -
 ci/actions-install.sh  |   3 +-
 ci/appveyor-install.sh |   2 +-
 fio.1                  |   7 +-
 fio.h                  |   1 -
 init.c                 |  91 ++++--------
 options.c              |  11 +-
 server.h               |   2 +-
 t/io_uring.c           |   2 +-
 t/random_seed.py       | 394 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py     |  26 ++--
 thread_options.h       |   3 -
 13 files changed, 454 insertions(+), 97 deletions(-)
 create mode 100755 t/random_seed.py

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index cb0f9834..0a6e60c7 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1232,13 +1232,12 @@ I/O type
 
 .. option:: randrepeat=bool
 
-	Seed the random number generator used for random I/O patterns in a
-	predictable way so the pattern is repeatable across runs. Default: true.
+        Seed all random number generators in a predictable way so the pattern
+        is repeatable across runs. Default: true.
 
 .. option:: allrandrepeat=bool
 
-	Seed all random number generators in a predictable way so results are
-	repeatable across runs.  Default: false.
+	Alias for :option:`randrepeat`. Default: true.
 
 .. option:: randseed=int
 
diff --git a/cconv.c b/cconv.c
index 1ae38b1b..9095d519 100644
--- a/cconv.c
+++ b/cconv.c
@@ -206,7 +206,6 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->do_disk_util = le32_to_cpu(top->do_disk_util);
 	o->override_sync = le32_to_cpu(top->override_sync);
 	o->rand_repeatable = le32_to_cpu(top->rand_repeatable);
-	o->allrand_repeatable = le32_to_cpu(top->allrand_repeatable);
 	o->rand_seed = le64_to_cpu(top->rand_seed);
 	o->log_entries = le32_to_cpu(top->log_entries);
 	o->log_avg_msec = le32_to_cpu(top->log_avg_msec);
@@ -446,7 +445,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->do_disk_util = cpu_to_le32(o->do_disk_util);
 	top->override_sync = cpu_to_le32(o->override_sync);
 	top->rand_repeatable = cpu_to_le32(o->rand_repeatable);
-	top->allrand_repeatable = cpu_to_le32(o->allrand_repeatable);
 	top->rand_seed = __cpu_to_le64(o->rand_seed);
 	top->log_entries = cpu_to_le32(o->log_entries);
 	top->log_avg_msec = cpu_to_le32(o->log_avg_msec);
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 5057fca3..fb3bd141 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -62,6 +62,7 @@ DPKGCFG
     pkgs+=(
         python3-scipy
 	python3-sphinx
+	python3-statsmodels
     )
 
     echo "Updating APT..."
@@ -85,7 +86,7 @@ install_macos() {
     echo "Installing packages..."
     HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs sphinx-doc
     brew link sphinx-doc --force
-    pip3 install scipy six 
+    pip3 install scipy six statsmodels
 }
 
 main() {
diff --git a/ci/appveyor-install.sh b/ci/appveyor-install.sh
index 3137f39e..1e28c454 100755
--- a/ci/appveyor-install.sh
+++ b/ci/appveyor-install.sh
@@ -37,7 +37,7 @@ case "${DISTRO}" in
         ;;
 esac
 
-python.exe -m pip install scipy six
+python.exe -m pip install scipy six statsmodels
 
 echo "Python3 path: $(type -p python3 2>&1)"
 echo "Python3 version: $(python3 -V 2>&1)"
diff --git a/fio.1 b/fio.1
index 311b16d8..4207814b 100644
--- a/fio.1
+++ b/fio.1
@@ -1022,12 +1022,11 @@ Alias for \fBboth\fR.
 .RE
 .TP
 .BI randrepeat \fR=\fPbool
-Seed the random number generator used for random I/O patterns in a
-predictable way so the pattern is repeatable across runs. Default: true.
+Seed all random number generators in a predictable way so the pattern is
+repeatable across runs. Default: true.
 .TP
 .BI allrandrepeat \fR=\fPbool
-Seed all random number generators in a predictable way so results are
-repeatable across runs. Default: false.
+Alias for \fBrandrepeat\fR. Default: true.
 .TP
 .BI randseed \fR=\fPint
 Seed the random number generators based on this seed value, to be able to
diff --git a/fio.h b/fio.h
index 6b841e9c..6fc7fb9c 100644
--- a/fio.h
+++ b/fio.h
@@ -638,7 +638,6 @@ extern void fio_options_dup_and_init(struct option *);
 extern char *fio_option_dup_subs(const char *);
 extern void fio_options_mem_dupe(struct thread_data *);
 extern void td_fill_rand_seeds(struct thread_data *);
-extern void td_fill_verify_state_seed(struct thread_data *);
 extern void add_job_opts(const char **, int);
 extern int ioengine_load(struct thread_data *);
 extern bool parse_dryrun(void);
diff --git a/init.c b/init.c
index a70f749a..48121f14 100644
--- a/init.c
+++ b/init.c
@@ -1020,8 +1020,12 @@ static void init_rand_file_service(struct thread_data *td)
 	}
 }
 
-void td_fill_verify_state_seed(struct thread_data *td)
+void td_fill_rand_seeds(struct thread_data *td)
 {
+	uint64_t read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
+	uint64_t write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
+	uint64_t trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
+	int i;
 	bool use64;
 
 	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
@@ -1029,17 +1033,6 @@ void td_fill_verify_state_seed(struct thread_data *td)
 	else
 		use64 = false;
 
-	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF],
-		use64);
-}
-
-static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
-{
-	uint64_t read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
-	uint64_t write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
-	uint64_t trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
-	int i;
-
 	/*
 	 * trimwrite is special in that we need to generate the same
 	 * offsets to get the "write after trim" effect. If we are
@@ -1056,7 +1049,8 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 	init_rand_seed(&td->bsrange_state[DDIR_WRITE], write_seed, use64);
 	init_rand_seed(&td->bsrange_state[DDIR_TRIM], trim_seed, use64);
 
-	td_fill_verify_state_seed(td);
+	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF],
+		use64);
 	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);
 
 	if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
@@ -1075,12 +1069,6 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 	init_rand_seed(&td->prio_state, td->rand_seeds[FIO_RAND_PRIO_CMDS], false);
 	init_rand_seed(&td->dedupe_working_set_index_state, td->rand_seeds[FIO_RAND_DEDUPE_WORKING_SET_IX], use64);
 
-	if (!td_random(td))
-		return;
-
-	if (td->o.rand_repeatable)
-		td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;
-
 	init_rand_seed(&td->random_state, td->rand_seeds[FIO_RAND_BLOCK_OFF], use64);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
@@ -1088,29 +1076,39 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 
 		init_rand_seed(s, td->rand_seeds[FIO_RAND_SEQ_RAND_READ_OFF], false);
 	}
+
+	init_rand_seed(&td->buf_state, td->rand_seeds[FIO_RAND_BUF_OFF], use64);
+	frand_copy(&td->buf_state_prev, &td->buf_state);
 }
 
-void td_fill_rand_seeds(struct thread_data *td)
+static int setup_random_seeds(struct thread_data *td)
 {
-	bool use64;
-
-	if (td->o.allrand_repeatable) {
-		unsigned int i;
+	uint64_t seed;
+	unsigned int i;
 
-		for (i = 0; i < FIO_RAND_NR_OFFS; i++)
-			td->rand_seeds[i] = FIO_RANDSEED * td->thread_number
-			       	+ i;
+	if (!td->o.rand_repeatable && !fio_option_is_set(&td->o, rand_seed)) {
+		int ret = init_random_seeds(td->rand_seeds, sizeof(td->rand_seeds));
+		dprint(FD_RANDOM, "using system RNG for random seeds\n");
+		if (ret)
+			return ret;
+	} else {
+		seed = td->o.rand_seed;
+		for (i = 0; i < 4; i++)
+			seed *= 0x9e370001UL;
+
+		for (i = 0; i < FIO_RAND_NR_OFFS; i++) {
+			td->rand_seeds[i] = seed * td->thread_number + i;
+			seed *= 0x9e370001UL;
+		}
 	}
 
-	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
-		use64 = true;
-	else
-		use64 = false;
+	td_fill_rand_seeds(td);
 
-	td_fill_rand_seeds_internal(td, use64);
+	dprint(FD_RANDOM, "FIO_RAND_NR_OFFS=%d\n", FIO_RAND_NR_OFFS);
+	for (int i = 0; i < FIO_RAND_NR_OFFS; i++)
+		dprint(FD_RANDOM, "rand_seeds[%d]=%" PRIu64 "\n", i, td->rand_seeds[i]);
 
-	init_rand_seed(&td->buf_state, td->rand_seeds[FIO_RAND_BUF_OFF], use64);
-	frand_copy(&td->buf_state_prev, &td->buf_state);
+	return 0;
 }
 
 /*
@@ -1246,31 +1244,6 @@ static void init_flags(struct thread_data *td)
 	}
 }
 
-static int setup_random_seeds(struct thread_data *td)
-{
-	uint64_t seed;
-	unsigned int i;
-
-	if (!td->o.rand_repeatable && !fio_option_is_set(&td->o, rand_seed)) {
-		int ret = init_random_seeds(td->rand_seeds, sizeof(td->rand_seeds));
-		if (!ret)
-			td_fill_rand_seeds(td);
-		return ret;
-	}
-
-	seed = td->o.rand_seed;
-	for (i = 0; i < 4; i++)
-		seed *= 0x9e370001UL;
-
-	for (i = 0; i < FIO_RAND_NR_OFFS; i++) {
-		td->rand_seeds[i] = seed * td->thread_number + i;
-		seed *= 0x9e370001UL;
-	}
-
-	td_fill_rand_seeds(td);
-	return 0;
-}
-
 enum {
 	FPRE_NONE = 0,
 	FPRE_JOBNAME,
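
The moved `setup_random_seeds()` covers two cases. When `randrepeat` is enabled or `randseed` is given, it derives every per-generator seed from `td->o.rand_seed` using the multiplicative constant `0x9e370001`. Otherwise it fills the seeds from the system RNG. The standalone sketch below reproduces the deterministic branch. `derive_seeds()` and `NR_OFFS` are hypothetical (fio uses `FIO_RAND_NR_OFFS`), and the seed is a `uint64_t`, so multiplication wraps modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

#define NR_OFFS 8	/* hypothetical; fio uses FIO_RAND_NR_OFFS */

/* Fill seeds[] like the repeatable branch of setup_random_seeds(). */
static void derive_seeds(uint64_t rand_seed, unsigned int thread_number,
			 uint64_t seeds[NR_OFFS])
{
	uint64_t seed = rand_seed;
	unsigned int i;

	/* Scramble the user-supplied seed before use. */
	for (i = 0; i < 4; i++)
		seed *= 0x9e370001UL;

	for (i = 0; i < NR_OFFS; i++) {
		/* Thread number and slot index keep each generator distinct. */
		seeds[i] = seed * thread_number + i;
		seed *= 0x9e370001UL;
	}
}
```

The same `randseed` and job number always produce the same seed table. A different job number changes it. A seed of 0 degenerates to `seeds[i] == i`, because every product is zero.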
diff --git a/options.c b/options.c
index 440bff37..8193fb29 100644
--- a/options.c
+++ b/options.c
@@ -2465,6 +2465,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "randrepeat",
+		.alias	= "allrandrepeat",
 		.lname	= "Random repeatable",
 		.type	= FIO_OPT_BOOL,
 		.off1	= offsetof(struct thread_options, rand_repeatable),
@@ -2594,16 +2595,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_RANDOM,
 	},
-	{
-		.name	= "allrandrepeat",
-		.lname	= "All Random Repeat",
-		.type	= FIO_OPT_BOOL,
-		.off1	= offsetof(struct thread_options, allrand_repeatable),
-		.help	= "Use repeatable random numbers for everything",
-		.def	= "0",
-		.category = FIO_OPT_C_IO,
-		.group	= FIO_OPT_G_RANDOM,
-	},
 	{
 		.name	= "nrfiles",
 		.lname	= "Number of files",
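
`allrandrepeat` is now registered as an `.alias` of `randrepeat`, so both spellings set the same `rand_repeatable` field. The sketch below shows name-or-alias lookup over an option table. `struct opt`, `struct opts` and `find_opt()` are hypothetical and far simpler than fio's real `struct fio_option` parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down option descriptor. */
struct opt {
	const char *name;
	const char *alias;	/* may be NULL */
	size_t off;		/* offset of the field this option sets */
};

/* Hypothetical options structure. */
struct opts {
	int rand_repeatable;
	unsigned int nrfiles;
};

static const struct opt opt_table[] = {
	{ "randrepeat", "allrandrepeat", offsetof(struct opts, rand_repeatable) },
	{ "nrfiles", NULL, offsetof(struct opts, nrfiles) },
};

/* Return the descriptor whose name or alias matches key, or NULL. */
static const struct opt *find_opt(const char *key)
{
	for (size_t i = 0; i < sizeof(opt_table) / sizeof(opt_table[0]); i++) {
		const struct opt *o = &opt_table[i];

		if (!strcmp(o->name, key) || (o->alias && !strcmp(o->alias, key)))
			return o;
	}
	return NULL;
}
```

Because both names resolve to one descriptor, the separate `allrand_repeatable` field is no longer needed.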
diff --git a/server.h b/server.h
index 898a893d..601d3340 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 99,
+	FIO_SERVER_VER			= 100,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/io_uring.c b/t/io_uring.c
index 504f8ce9..f9f4b840 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -704,7 +704,7 @@ static int get_file_size(struct file *f)
 					bs, lbs);
 			return -1;
 		}
-		f->max_blocks = nlba / bs;
+		f->max_blocks = nlba;
 		f->max_size = nlba;
 		f->lba_shift = ilog2(lbs);
 		return 0;
diff --git a/t/random_seed.py b/t/random_seed.py
new file mode 100755
index 00000000..86f2eb21
--- /dev/null
+++ b/t/random_seed.py
@@ -0,0 +1,394 @@
+#!/usr/bin/env python3
+"""
+# random_seed.py
+#
+# Test fio's random seed options.
+#
+# - make sure that randseed overrides randrepeat and allrandrepeat
+# - make sure that seeds differ across invocations when [all]randrepeat=0 and randseed is not set
+# - make sure that seeds are always the same when [all]randrepeat=1 and randseed is not set
+#
+# USAGE
+# see python3 random_seed.py --help
+#
+# EXAMPLES
+# python3 t/random_seed.py
+# python3 t/random_seed.py -f ./fio
+#
+# REQUIREMENTS
+# Python 3.6
+#
+"""
+import os
+import sys
+import time
+import locale
+import argparse
+import subprocess
+from pathlib import Path
+
+class FioRandTest():
+    """fio random seed test."""
+
+    def __init__(self, artifact_root, test_options, debug):
+        """
+        artifact_root   root directory for artifacts (subdirectory will be created under here)
+        test            test specification
+        """
+        self.artifact_root = artifact_root
+        self.test_options = test_options
+        self.debug = debug
+        self.filename_stub = None
+        self.filenames = {}
+
+        self.test_dir = os.path.abspath(os.path.join(self.artifact_root,
+                                     f"{self.test_options['test_id']:03d}"))
+        if not os.path.exists(self.test_dir):
+            os.mkdir(self.test_dir)
+
+        self.filename_stub = f"random{self.test_options['test_id']:03d}"
+        self.filenames['command'] = os.path.join(self.test_dir, f"{self.filename_stub}.command")
+        self.filenames['stdout'] = os.path.join(self.test_dir, f"{self.filename_stub}.stdout")
+        self.filenames['stderr'] = os.path.join(self.test_dir, f"{self.filename_stub}.stderr")
+        self.filenames['exitcode'] = os.path.join(self.test_dir, f"{self.filename_stub}.exitcode")
+        self.filenames['output'] = os.path.join(self.test_dir, f"{self.filename_stub}.output")
+
+    def run_fio(self, fio_path):
+        """Run a test."""
+
+        fio_args = [
+            "--debug=random",
+            "--name=random_seed",
+            "--ioengine=null",
+            "--filesize=32k",
+            "--rw=randread",
+            f"--output={self.filenames['output']}",
+        ]
+        for opt in ['randseed', 'randrepeat', 'allrandrepeat']:
+            if opt in self.test_options:
+                option = f"--{opt}={self.test_options[opt]}"
+                fio_args.append(option)
+
+        command = [fio_path] + fio_args
+        with open(self.filenames['command'], "w+", encoding=locale.getpreferredencoding()) as command_file:
+            command_file.write(" ".join(command))
+
+        passed = True
+
+        try:
+            with open(self.filenames['stdout'], "w+", encoding=locale.getpreferredencoding()) as stdout_file, \
+                open(self.filenames['stderr'], "w+", encoding=locale.getpreferredencoding()) as stderr_file, \
+                open(self.filenames['exitcode'], "w+", encoding=locale.getpreferredencoding()) as exitcode_file:
+                proc = None
+                # Avoid using subprocess.run() here because when a timeout occurs,
+                # fio will be stopped with SIGKILL. This does not give fio a
+                # chance to clean up and means that child processes may continue
+                # running and submitting IO.
+                proc = subprocess.Popen(command,
+                                        stdout=stdout_file,
+                                        stderr=stderr_file,
+                                        cwd=self.test_dir,
+                                        universal_newlines=True)
+                proc.communicate(timeout=300)
+                exitcode_file.write(f'{proc.returncode}\n')
+                passed &= (proc.returncode == 0)
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            print("Timeout expired")
+            passed = False
+        except Exception:
+            if proc:
+                if not proc.poll():
+                    proc.terminate()
+                    proc.communicate()
+            print(f"Exception: {sys.exc_info()}")
+            passed = False
+
+        return passed
+
+    def get_rand_seeds(self):
+        """Collect random seeds from --debug=random output."""
+        with open(self.filenames['output'], "r", encoding=locale.getpreferredencoding()) as out_file:
+            file_data = out_file.read()
+
+            offsets = 0
+            for line in file_data.split('\n'):
+                if 'random' in line and 'FIO_RAND_NR_OFFS=' in line:
+                    tokens = line.split('=')
+                    offsets = int(tokens[len(tokens)-1])
+                    break
+
+            if offsets == 0:
+                pass
+                # find an exception to throw
+
+            seed_list = []
+            for line in file_data.split('\n'):
+                if 'random' not in line:
+                    continue
+                if 'rand_seeds[' in line:
+                    tokens = line.split('=')
+                    seed = int(tokens[-1])
+                    seed_list.append(seed)
+                    # assume that seeds are in order
+
+            return seed_list
+
+    def check(self):
+        """Check test output."""
+
+        raise NotImplementedError()
+
+
+class TestRR(FioRandTest):
+    """
+    Test object for [all]randrepeat. If run for the first time just collect the
+    seeds. For later runs make sure the seeds match or do not match those
+    previously collected.
+    """
+    # one set of seeds is for randrepeat=0 and the other is for randrepeat=1
+    seeds = { 0: None, 1: None }
+
+    def check(self):
+        """Check output for allrandrepeat=1."""
+
+        retval = True
+        opt = 'randrepeat' if 'randrepeat' in self.test_options else 'allrandrepeat'
+        rr = self.test_options[opt]
+        rand_seeds = self.get_rand_seeds()
+
+        if not TestRR.seeds[rr]:
+            TestRR.seeds[rr] = rand_seeds
+            if self.debug:
+                print(f"TestRR: saving rand_seeds for [a]rr={rr}")
+        else:
+            if rr:
+                if TestRR.seeds[1] != rand_seeds:
+                    retval = False
+                    print(f"TestRR: unexpected seed mismatch for [a]rr={rr}")
+                else:
+                    if self.debug:
+                        print(f"TestRR: seeds correctly match for [a]rr={rr}")
+                if TestRR.seeds[0] == rand_seeds:
+                    retval = False
+                    print("TestRR: seeds unexpectedly match those from system RNG")
+            else:
+                if TestRR.seeds[0] == rand_seeds:
+                    retval = False
+                    print(f"TestRR: unexpected seed match for [a]rr={rr}")
+                else:
+                    if self.debug:
+                        print(f"TestRR: seeds correctly don't match for [a]rr={rr}")
+                if TestRR.seeds[1] == rand_seeds:
+                    retval = False
+                    print(f"TestRR: random seeds unexpectedly match those from [a]rr=1")
+
+        return retval
+
+
+class TestRS(FioRandTest):
+    """
+    Test object when randseed=something controls the generated seeds. If run
+    for the first time for a given randseed just collect the seeds. For later
+    runs with the same seed make sure the seeds are the same as those
+    previously collected.
+    """
+    seeds = {}
+
+    def check(self):
+        """Check output for randseed=something."""
+
+        retval = True
+        rand_seeds = self.get_rand_seeds()
+        randseed = self.test_options['randseed']
+
+        if self.debug:
+            print("randseed = ", randseed)
+
+        if randseed not in TestRS.seeds:
+            TestRS.seeds[randseed] = rand_seeds
+            if self.debug:
+                print("TestRS: saving rand_seeds")
+        else:
+            if TestRS.seeds[randseed] != rand_seeds:
+                retval = False
+                print("TestRS: seeds don't match when they should")
+            else:
+                if self.debug:
+                    print("TestRS: seeds correctly match")
+
+        # Now try to find seeds generated using a different randseed and make
+        # sure they *don't* match
+        for key in TestRS.seeds:
+            if key != randseed:
+                if TestRS.seeds[key] == rand_seeds:
+                    retval = False
+                    print("TestRS: randseeds differ but generated seeds match.")
+                else:
+                    if self.debug:
+                        print("TestRS: randseeds differ and generated seeds also differ.")
+
+        return retval
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-d', '--debug', help='enable debug output', action='store_true')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    """Run tests of fio random seed options"""
+
+    args = parse_args()
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        f"random-seed-test-{time.strftime('%Y%m%d-%H%M%S')}"
+    os.mkdir(artifact_root)
+    print(f"Artifact directory is {artifact_root}")
+
+    if args.fio:
+        fio = str(Path(args.fio).absolute())
+    else:
+        fio = 'fio'
+    print(f"fio path is {fio}")
+
+    test_list = [
+        {
+            "test_id": 1,
+            "randrepeat": 0,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 2,
+            "randrepeat": 0,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 3,
+            "randrepeat": 1,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 4,
+            "randrepeat": 1,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 5,
+            "allrandrepeat": 0,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 6,
+            "allrandrepeat": 0,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 7,
+            "allrandrepeat": 1,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 8,
+            "allrandrepeat": 1,
+            "test_obj": TestRR,
+        },
+        {
+            "test_id": 9,
+            "randrepeat": 0,
+            "randseed": "12345",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 10,
+            "randrepeat": 0,
+            "randseed": "12345",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 11,
+            "randrepeat": 1,
+            "randseed": "12345",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 12,
+            "allrandrepeat": 0,
+            "randseed": "12345",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 13,
+            "allrandrepeat": 1,
+            "randseed": "12345",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 14,
+            "randrepeat": 0,
+            "randseed": "67890",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 15,
+            "randrepeat": 1,
+            "randseed": "67890",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 16,
+            "allrandrepeat": 0,
+            "randseed": "67890",
+            "test_obj": TestRS,
+        },
+        {
+            "test_id": 17,
+            "allrandrepeat": 1,
+            "randseed": "67890",
+            "test_obj": TestRS,
+        },
+    ]
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for test in test_list:
+        if (args.skip and test['test_id'] in args.skip) or \
+           (args.run_only and test['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            outcome = 'SKIPPED (User request)'
+        else:
+            test_obj = test['test_obj'](artifact_root, test, args.debug)
+            status = test_obj.run_fio(fio)
+            if status:
+                status = test_obj.check()
+            if status:
+                passed = passed + 1
+                outcome = 'PASSED'
+            else:
+                failed = failed + 1
+                outcome = 'FAILED'
+
+        print(f"**********Test {test['test_id']} {outcome}**********")
+
+    print(f"{passed} tests passed, {failed} failed, {skipped} skipped")
+
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index c3091b68..4fe6fe46 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -53,6 +53,7 @@ import traceback
 import subprocess
 import multiprocessing
 from pathlib import Path
+from statsmodels.sandbox.stats.runs import runstest_1samp
 
 
 class FioTest():
@@ -598,24 +599,16 @@ class FioJobTest_t0020(FioJobTest):
 
         log_lines = file_data.split('\n')
 
-        seq_count = 0
-        offsets = set()
+        offsets = []
 
         prev = int(log_lines[0].split(',')[4])
         for line in log_lines[1:]:
-            offsets.add(prev/4096)
+            offsets.append(prev/4096)
             if len(line.strip()) == 0:
                 continue
             cur = int(line.split(',')[4])
-            if cur - prev == 4096:
-                seq_count += 1
             prev = cur
 
-        # 10 is an arbitrary threshold
-        if seq_count > 10:
-            self.passed = False
-            self.failure_reason = "too many ({0}) consecutive offsets".format(seq_count)
-
         if len(offsets) != 256:
             self.passed = False
             self.failure_reason += " number of offsets is {0} instead of 256".format(len(offsets))
@@ -625,6 +618,11 @@ class FioJobTest_t0020(FioJobTest):
                 self.passed = False
                 self.failure_reason += " missing offset {0}".format(i*4096)
 
+        (z, p) = runstest_1samp(list(offsets))
+        if p < 0.05:
+            self.passed = False
+            self.failure_reason += f" runs test failed with p = {p}"
+
 
 class FioJobTest_t0022(FioJobTest):
     """Test consists of fio test job t0022"""
@@ -1361,6 +1359,14 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'requirements':     [],
     },
+    {
+        'test_id':          1013,
+        'test_class':       FioExeTest,
+        'exe':              't/random_seed.py',
+        'parameters':       ['-f', '{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
 ]
 
 
diff --git a/thread_options.h b/thread_options.h
index 6670cbbf..a24ebee6 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -162,7 +162,6 @@ struct thread_options {
 	unsigned int do_disk_util;
 	unsigned int override_sync;
 	unsigned int rand_repeatable;
-	unsigned int allrand_repeatable;
 	unsigned long long rand_seed;
 	unsigned int log_avg_msec;
 	unsigned int log_hist_msec;
@@ -485,8 +484,6 @@ struct thread_options_pack {
 	uint32_t do_disk_util;
 	uint32_t override_sync;
 	uint32_t rand_repeatable;
-	uint32_t allrand_repeatable;
-	uint32_t pad2;
 	uint64_t rand_seed;
 	uint32_t log_avg_msec;
 	uint32_t log_hist_msec;

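The t0020 check in `t/run-fio-tests.py` drops the "more than 10 consecutive offsets" heuristic in favor of statsmodels' `runstest_1samp`, a Wald-Wolfowitz runs test. The test labels each value as high or low relative to a cutoff, counts the runs of equal labels, and asks whether that count is plausible for a random ordering. Below is a minimal standard-library sketch of the same idea. It assumes the sample mean as the cutoff (statsmodels' default) with values at or above it counted as "high", and uses the plain normal approximation. It omits the continuity correction that statsmodels documents for samples below 50, which should not matter for t0020's 256 offsets. Treat its p-values as illustrative, not as a drop-in replacement for the real function.

```python
import math
from statistics import NormalDist, fmean


def runs_test(values, cutoff=None):
    """Wald-Wolfowitz runs test using the normal approximation.

    Values at or above the cutoff (default: the sample mean) count as "high",
    the rest as "low". A run is a maximal stretch of equal labels. Too few
    runs suggests clustering (e.g. long sequential stretches of offsets), too
    many suggests strict alternation. Returns (z, two-sided p-value).
    No continuity correction is applied.
    """
    if cutoff is None:
        cutoff = fmean(values)
    labels = [v >= cutoff for v in values]
    n1 = sum(labels)              # "high" observations
    n2 = len(labels) - n1         # "low" observations
    if n1 == 0 or n2 == 0:
        raise ValueError("all values fall on one side of the cutoff")

    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if variance <= 0:
        raise ValueError("too few observations for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

For 256 offsets in strictly ascending order there are only two runs, so `z` is strongly negative and `p` falls far below t0020's 0.05 threshold. A perfectly alternating sequence fails the other way, with too many runs and a large positive `z`.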

* Recent changes (master)
@ 2023-04-11 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-11 12:00 UTC
  To: fio

The following changes since commit 2bb86015831c3ef3f8a077f417ad79ed6998ed48:

  Merge branch 'libaio-hang' of https://github.com/lrumancik/fio (2023-04-07 16:42:07 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 07ed2b57741afa53afa7b2b9fa742c652f1ed8c1:

  Merge branch 'libaio-hang' of https://github.com/lrumancik/fio (2023-04-10 15:40:45 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'libaio-hang' of https://github.com/lrumancik/fio

Leah Rumancik (1):
      engines/io_uring: update getevents max to reflect previously seen events

 engines/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index f10a4593..7f743c2a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -529,6 +529,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 		r = fio_ioring_cqring_reap(td, events, max);
 		if (r) {
 			events += r;
+			max -= r;
 			if (actual_min != 0)
 				actual_min -= r;
 			continue;


* Recent changes (master)
@ 2023-04-08 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-08 12:00 UTC
  To: fio

The following changes since commit 2d8c2709dc067aabdeab8bc1eea1992d9d802375:

  io_u: fix bad style (2023-04-04 09:49:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2bb86015831c3ef3f8a077f417ad79ed6998ed48:

  Merge branch 'libaio-hang' of https://github.com/lrumancik/fio (2023-04-07 16:42:07 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'libaio-hang' of https://github.com/lrumancik/fio

Leah Rumancik (1):
      engines/libaio: fix io_getevents min/max events arguments

 engines/libaio.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index 33b8c12f..1b82c90b 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -288,14 +288,16 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 		    && actual_min == 0
 		    && ((struct aio_ring *)(ld->aio_ctx))->magic
 				== AIO_RING_MAGIC) {
-			r = user_io_getevents(ld->aio_ctx, max,
+			r = user_io_getevents(ld->aio_ctx, max - events,
 				ld->aio_events + events);
 		} else {
 			r = io_getevents(ld->aio_ctx, actual_min,
-				max, ld->aio_events + events, lt);
+				max - events, ld->aio_events + events, lt);
 		}
-		if (r > 0)
+		if (r > 0) {
 			events += r;
+			actual_min = actual_min > events ? actual_min - events : 0;
+		}
 		else if ((min && r == 0) || r == -EAGAIN) {
 			fio_libaio_commit(td);
 			if (actual_min)

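Each pass of the libaio loop writes its completions at `ld->aio_events + events`, so the count handed to `io_getevents()` or `user_io_getevents()` must be the space still left, `max - events`, rather than the original `max`. The toy model below shows that bookkeeping in Python. `fake_getevents()` is a hypothetical stand-in for the kernel call (not a libaio API) that returns at most `per_call` completions per invocation, so the loop needs several passes. The event array is assumed to hold exactly `max_events` entries. The real fio loop also handles `min`, the timeout and `-EAGAIN`, which are left out here.

```python
from collections import deque


def fake_getevents(pending, nr, per_call):
    """Hypothetical stand-in for io_getevents(): return up to nr completions,
    but never more than per_call in a single invocation."""
    count = min(nr, per_call, len(pending))
    return [pending.popleft() for _ in range(count)]


def reap_into_array(pending, max_events, per_call):
    """Fill a fixed-size event array the way the fixed libaio loop does."""
    aio_events = [None] * max_events   # plays the role of ld->aio_events
    events = 0
    while events < max_events and pending:
        # Ask only for the space that is left: max - events, not max.
        batch = fake_getevents(pending, max_events - events, per_call)
        if not batch:
            break
        # Writing at offset 'events' must stay inside the array.
        assert events + len(batch) <= max_events
        aio_events[events:events + len(batch)] = batch
        events += len(batch)
    return aio_events[:events]
```

With 10 pending completions, `max_events=8` and `per_call=3`, the passes request 8, 5 and 2 slots and collect completions 0 through 7, leaving 8 and 9 queued. Had the third pass asked for the full 8, it could have returned three completions and written one slot past the end of the array.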

* Recent changes (master)
@ 2023-04-05 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-05 12:00 UTC
  To: fio

The following changes since commit 638689b15af35bd746f9114a3e8895e7a983ed83:

  Only expose fadvise_hint=noreuse if supported (2023-03-31 12:52:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2d8c2709dc067aabdeab8bc1eea1992d9d802375:

  io_u: fix bad style (2023-04-04 09:49:19 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'master' of https://github.com/SuhoSon/fio
      engines/nvme: cache errno value
      engines/nfs: fix the most egregious style violations
      io_u: fix bad style

suho.son (1):
      thinktime: Fix missing re-init thinktime when using ramptime

 engines/nfs.c  | 147 ++++++++++++++++++++++++++++++++++-----------------------
 engines/nvme.c |   6 ++-
 fio.h          |   2 +-
 io_u.c         |   4 +-
 libfio.c       |   6 +++
 5 files changed, 102 insertions(+), 63 deletions(-)

---

Diff of recent changes:

diff --git a/engines/nfs.c b/engines/nfs.c
index 7031769d..336e670b 100644
--- a/engines/nfs.c
+++ b/engines/nfs.c
@@ -16,10 +16,17 @@ enum nfs_op_type {
 struct fio_libnfs_options {
 	struct nfs_context *context;
 	char *nfs_url;
-	unsigned int queue_depth; /* nfs_callback needs this info, but doesn't have fio td structure to pull it from */
+	/* nfs_callback needs this info, but doesn't have fio td structure to
+	 * pull it from
+	 */
+	unsigned int queue_depth;
+
 	/* the following implement a circular queue of outstanding IOs */
-	int outstanding_events; /* IOs issued to libnfs, that have not returned yet */
-	int prev_requested_event_index; /* event last returned via fio_libnfs_event */
+
+	/* IOs issued to libnfs, that have not returned yet */
+	int outstanding_events;
+	/* event last returned via fio_libnfs_event */
+	int prev_requested_event_index;
 	int next_buffered_event; /* round robin-pointer within events[] */
 	int buffered_event_count; /* IOs completed by libnfs, waiting for FIO */
 	int free_event_buffer_index; /* next free buffer */
@@ -33,11 +40,12 @@ struct nfs_data {
 
 static struct fio_option options[] = {
 	{
-		.name     = "nfs_url",
-		.lname    = "nfs_url",
-		.type     = FIO_OPT_STR_STORE,
-		.help	= "URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]",
-		.off1     = offsetof(struct fio_libnfs_options, nfs_url),
+		.name	= "nfs_url",
+		.lname	= "nfs_url",
+		.type	= FIO_OPT_STR_STORE,
+		.help	= "URL in libnfs format, eg nfs://<server|ipv4|"
+			  "ipv6>/path[?arg=val[&arg=val]*]",
+		.off1	= offsetof(struct fio_libnfs_options, nfs_url),
 		.category = FIO_OPT_C_ENGINE,
 		.group	= __FIO_OPT_G_NFS,
 	},
@@ -50,44 +58,53 @@ static struct io_u *fio_libnfs_event(struct thread_data *td, int event)
 {
 	struct fio_libnfs_options *o = td->eo;
 	struct io_u *io_u = o->events[o->next_buffered_event];
+
 	assert(o->events[o->next_buffered_event]);
 	o->events[o->next_buffered_event] = NULL;
 	o->next_buffered_event = (o->next_buffered_event + 1) % td->o.iodepth;
+
 	/* validate our state machine */
 	assert(o->buffered_event_count);
 	o->buffered_event_count--;
 	assert(io_u);
+
 	/* assert that fio_libnfs_event is being called in sequential fashion */
 	assert(event == 0 || o->prev_requested_event_index + 1 == event);
-	if (o->buffered_event_count == 0) {
+	if (o->buffered_event_count == 0)
 		o->prev_requested_event_index = -1;
-	} else {
+	else
 		o->prev_requested_event_index = event;
-	}
 	return io_u;
 }
 
-static int nfs_event_loop(struct thread_data *td, bool flush) {
+/*
+ * fio core logic seems to stop calling this event-loop if we ever return with
+ * 0 events
+ */
+#define SHOULD_WAIT(td, o, flush)			\
+ 	((o)->outstanding_events == (td)->o.iodepth ||	\
+		(flush && (o)->outstanding_events))
+
+static int nfs_event_loop(struct thread_data *td, bool flush)
+{
 	struct fio_libnfs_options *o = td->eo;
 	struct pollfd pfds[1]; /* nfs:0 */
+
 	/* we already have stuff queued for fio, no need to waste cpu on poll() */
 	if (o->buffered_event_count)
 		return o->buffered_event_count;
-	/* fio core logic seems to stop calling this event-loop if we ever return with 0 events */
-	#define SHOULD_WAIT() (o->outstanding_events == td->o.iodepth || (flush && o->outstanding_events))
 
 	do {
-		int timeout = SHOULD_WAIT() ? -1 : 0;
+		int timeout = SHOULD_WAIT(td, o, flush) ? -1 : 0;
 		int ret = 0;
+
 		pfds[0].fd = nfs_get_fd(o->context);
 		pfds[0].events = nfs_which_events(o->context);
 		ret = poll(&pfds[0], 1, timeout);
 		if (ret < 0) {
-			if (errno == EINTR || errno == EAGAIN) {
+			if (errno == EINTR || errno == EAGAIN)
 				continue;
-			}
-			log_err("nfs: failed to poll events: %s.\n",
-				strerror(errno));
+			log_err("nfs: failed to poll events: %s\n", strerror(errno));
 			break;
 		}
 
@@ -96,27 +113,30 @@ static int nfs_event_loop(struct thread_data *td, bool flush) {
 			log_err("nfs: socket is in an unrecoverable error state.\n");
 			break;
 		}
-	} while (SHOULD_WAIT());
+	} while (SHOULD_WAIT(td, o, flush));
+
 	return o->buffered_event_count;
-#undef SHOULD_WAIT
 }
 
 static int fio_libnfs_getevents(struct thread_data *td, unsigned int min,
-				  unsigned int max, const struct timespec *t)
+				unsigned int max, const struct timespec *t)
 {
 	return nfs_event_loop(td, false);
 }
 
 static void nfs_callback(int res, struct nfs_context *nfs, void *data,
-                       void *private_data)
+			 void *private_data)
 {
 	struct io_u *io_u = private_data;
 	struct nfs_data *nfs_data = io_u->file->engine_data;
 	struct fio_libnfs_options *o = nfs_data->options;
 	if (res < 0) {
-		log_err("Failed NFS operation(code:%d): %s\n", res, nfs_get_error(o->context));
+		log_err("Failed NFS operation(code:%d): %s\n", res,
+						nfs_get_error(o->context));
 		io_u->error = -res;
-		/* res is used for read math below, don't wanna pass negative there */
+		/* res is used for read math below, don't want to pass negative
+		 * there
+		 */
 		res = 0;
 	} else if (io_u->ddir == DDIR_READ) {
 		memcpy(io_u->buf, data, res);
@@ -133,42 +153,46 @@ static void nfs_callback(int res, struct nfs_context *nfs, void *data,
 	o->buffered_event_count++;
 }
 
-static int queue_write(struct fio_libnfs_options *o, struct io_u *io_u) {
+static int queue_write(struct fio_libnfs_options *o, struct io_u *io_u)
+{
 	struct nfs_data *nfs_data = io_u->engine_data;
-	return nfs_pwrite_async(o->context, nfs_data->nfsfh,
-                           io_u->offset, io_u->buflen, io_u->buf, nfs_callback,
-                           io_u);
+
+	return nfs_pwrite_async(o->context, nfs_data->nfsfh, io_u->offset,
+				io_u->buflen, io_u->buf, nfs_callback, io_u);
 }
 
-static int queue_read(struct fio_libnfs_options *o, struct io_u *io_u) {
+static int queue_read(struct fio_libnfs_options *o, struct io_u *io_u)
+{
 	struct nfs_data *nfs_data = io_u->engine_data;
-	return nfs_pread_async(o->context,  nfs_data->nfsfh, io_u->offset, io_u->buflen, nfs_callback,  io_u);
+
+	return nfs_pread_async(o->context, nfs_data->nfsfh, io_u->offset,
+				io_u->buflen, nfs_callback, io_u);
 }
 
 static enum fio_q_status fio_libnfs_queue(struct thread_data *td,
-					    struct io_u *io_u)
+					  struct io_u *io_u)
 {
 	struct nfs_data *nfs_data = io_u->file->engine_data;
 	struct fio_libnfs_options *o = nfs_data->options;
 	struct nfs_context *nfs = o->context;
-	int err;
 	enum fio_q_status ret = FIO_Q_QUEUED;
+	int err;
 
 	io_u->engine_data = nfs_data;
-	switch(io_u->ddir) {
-		case DDIR_WRITE:
-			err = queue_write(o, io_u);
-			break;
-		case DDIR_READ:
-			err = queue_read(o, io_u);
-			break;
-		case DDIR_TRIM:
-			log_err("nfs: trim is not supported");
-			err = -1;
-			break;
-		default:
-			log_err("nfs: unhandled io %d\n", io_u->ddir);
-			err = -1;
+	switch (io_u->ddir) {
+	case DDIR_WRITE:
+		err = queue_write(o, io_u);
+		break;
+	case DDIR_READ:
+		err = queue_read(o, io_u);
+		break;
+	case DDIR_TRIM:
+		log_err("nfs: trim is not supported");
+		err = -1;
+		break;
+	default:
+		log_err("nfs: unhandled io %d\n", io_u->ddir);
+		err = -1;
 	}
 	if (err) {
 		log_err("nfs: Failed to queue nfs op: %s\n", nfs_get_error(nfs));
@@ -195,7 +219,7 @@ static int do_mount(struct thread_data *td, const char *url)
 		return 0;
 
 	options->context = nfs_init_context();
-	if (options->context == NULL) {
+	if (!options->context) {
 		log_err("nfs: failed to init nfs context\n");
 		return -1;
 	}
@@ -219,7 +243,9 @@ static int do_mount(struct thread_data *td, const char *url)
 
 static int fio_libnfs_setup(struct thread_data *td)
 {
-	/* Using threads with libnfs causes fio to hang on exit, lower performance */
+	/* Using threads with libnfs causes fio to hang on exit, lower
+	 * performance
+	 */
 	td->o.use_thread = 0;
 	return 0;
 }
@@ -227,6 +253,7 @@ static int fio_libnfs_setup(struct thread_data *td)
 static void fio_libnfs_cleanup(struct thread_data *td)
 {
 	struct fio_libnfs_options *o = td->eo;
+
 	nfs_umount(o->context);
 	nfs_destroy_context(o->context);
 	free(o->events);
@@ -234,10 +261,10 @@ static void fio_libnfs_cleanup(struct thread_data *td)
 
 static int fio_libnfs_open(struct thread_data *td, struct fio_file *f)
 {
-	int ret;
 	struct fio_libnfs_options *options = td->eo;
 	struct nfs_data *nfs_data = NULL;
 	int flags = 0;
+	int ret;
 
 	if (!options->nfs_url) {
 		log_err("nfs: nfs_url is a required parameter\n");
@@ -246,23 +273,25 @@ static int fio_libnfs_open(struct thread_data *td, struct fio_file *f)
 
 	ret = do_mount(td, options->nfs_url);
 
-	if (ret != 0) {
-		log_err("nfs: Failed to mount %s with code %d: %s\n", options->nfs_url, ret, nfs_get_error(options->context));
+	if (ret) {
+		log_err("nfs: Failed to mount %s with code %d: %s\n",
+			options->nfs_url, ret, nfs_get_error(options->context));
 		return ret;
 	}
 	nfs_data = malloc(sizeof(struct nfs_data));
 	memset(nfs_data, 0, sizeof(struct nfs_data));
 	nfs_data->options = options;
 
-	if (td->o.td_ddir == TD_DDIR_WRITE) {
+	if (td->o.td_ddir == TD_DDIR_WRITE)
 		flags |= O_CREAT | O_RDWR;
-	} else {
+	else
 		flags |= O_RDWR;
-	}
+
 	ret = nfs_open(options->context, f->file_name, flags, &nfs_data->nfsfh);
 
-	if (ret != 0)
-		log_err("Failed to open %s: %s\n", f->file_name, nfs_get_error(options->context));
+	if (ret)
+		log_err("Failed to open %s: %s\n", f->file_name,
+					nfs_get_error(options->context));
 	f->engine_data = nfs_data;
 	return ret;
 }
@@ -272,8 +301,10 @@ static int fio_libnfs_close(struct thread_data *td, struct fio_file *f)
 	struct nfs_data *nfs_data = f->engine_data;
 	struct fio_libnfs_options *o = nfs_data->options;
 	int ret = 0;
+
 	if (nfs_data->nfsfh)
 		ret = nfs_close(o->context, nfs_data->nfsfh);
+
 	free(nfs_data);
 	f->engine_data = NULL;
 	return ret;
@@ -289,7 +320,7 @@ struct ioengine_ops ioengine = {
 	.cleanup	= fio_libnfs_cleanup,
 	.open_file	= fio_libnfs_open,
 	.close_file	= fio_libnfs_close,
-	.flags      = FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
+	.flags		= FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
 	.options	= options,
 	.option_struct_size	= sizeof(struct fio_libnfs_options),
 };
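The nfs cleanup replaces the function-local `SHOULD_WAIT()` macro with `SHOULD_WAIT(td, o, flush)` at file scope. The old macro implicitly used the locals `o`, `td` and `flush` and had to be `#undef`'d afterwards. The condition itself is unchanged. The Python function below only restates it to make the resulting `poll()` timeout explicit: block indefinitely (`-1`) while every queue slot is in flight, or while flushing with I/O still outstanding, and otherwise poll without waiting (`0`). The function name is illustrative.

```python
def nfs_poll_timeout(outstanding_events, iodepth, flush):
    """Restatement of SHOULD_WAIT(td, o, flush) from engines/nfs.c.

    Returns the poll() timeout nfs_event_loop() would use: -1 (wait) when the
    queue is full or a flush still has I/O in flight, 0 (don't wait) otherwise.
    """
    should_wait = (outstanding_events == iodepth
                   or (flush and outstanding_events != 0))
    return -1 if should_wait else 0
```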
diff --git a/engines/nvme.c b/engines/nvme.c
index ac908687..fd2161f3 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -112,9 +112,10 @@ int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 
 	namespace_id = ioctl(fd, NVME_IOCTL_ID);
 	if (namespace_id < 0) {
+		err = -errno;
 		log_err("failed to fetch namespace-id");
 		close(fd);
-		return -errno;
+		return err;
 	}
 
 	/*
@@ -414,6 +415,7 @@ int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
 	} else
 		errno = 0;
 
+	ret = -errno;
 	close(fd);
-	return -errno;
+	return ret;
 }
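Both nvme.c hunks copy `errno` into a local before calling anything else, because `log_err()` and `close()` are themselves library calls that may overwrite `errno` before the function returns. POSIX leaves `errno` unspecified even after a successful call unless that function documents otherwise. The ctypes snippet below demonstrates the hazard. It assumes a Unix-like C library reachable through `CDLL(None)` (e.g. glibc on Linux), and the path is deliberately nonexistent. The `ENOENT` from a failed `open()` is replaced by the `EBADF` from a later failing `close()` unless it was saved first.

```python
import ctypes
import errno
import os

# Assumes a Unix-like libc reachable through CDLL(None), e.g. glibc on Linux.
libc = ctypes.CDLL(None, use_errno=True)


def errno_before_and_after_cleanup():
    """Return errno names saved right after a failed open() and after a later close()."""
    fd = libc.open(b"/nonexistent-dir/fio-errno-demo", os.O_RDONLY)
    if fd != -1:
        libc.close(fd)
        raise RuntimeError("demo path unexpectedly exists")
    saved = ctypes.get_errno()      # like the fix: err = -errno; taken first
    libc.close(-1)                  # a cleanup call that itself fails (EBADF)
    after = ctypes.get_errno()      # what a late "return -errno;" would report
    return errno.errorcode[saved], errno.errorcode[after]
```

Calling `errno_before_and_after_cleanup()` returns `('ENOENT', 'EBADF')`. Only the saved value still describes the original failure.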
diff --git a/fio.h b/fio.h
index f2acd430..6b841e9c 100644
--- a/fio.h
+++ b/fio.h
@@ -377,7 +377,7 @@ struct thread_data {
 
 	uint64_t *thinktime_blocks_counter;
 	struct timespec last_thinktime;
-	uint64_t last_thinktime_blocks;
+	int64_t last_thinktime_blocks;
 
 	/*
 	 * State for random io, a bitmap of blocks done vs not done
diff --git a/io_u.c b/io_u.c
index ca7ee68f..30265cfb 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1370,8 +1370,8 @@ static struct fio_file *__get_next_file(struct thread_data *td)
 		if (td->o.file_service_type == FIO_FSERVICE_SEQ)
 			goto out;
 		if (td->file_service_left) {
-		  td->file_service_left--;
-		  goto out;
+			td->file_service_left--;
+			goto out;
 		}
 	}
 
diff --git a/libfio.c b/libfio.c
index a52014ce..ddd49cd7 100644
--- a/libfio.c
+++ b/libfio.c
@@ -131,10 +131,14 @@ void clear_io_state(struct thread_data *td, int all)
 
 void reset_all_stats(struct thread_data *td)
 {
+	unsigned long long b;
 	int i;
 
 	reset_io_counters(td, 1);
 
+	b = ddir_rw_sum(td->thinktime_blocks_counter);
+	td->last_thinktime_blocks -= b;
+
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		td->io_bytes[i] = 0;
 		td->io_blocks[i] = 0;
@@ -149,6 +153,8 @@ void reset_all_stats(struct thread_data *td)
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->ss.prev_time, &td->epoch, sizeof(td->epoch));
 
+	td->last_thinktime = td->epoch;
+
 	lat_target_reset(td);
 	clear_rusage_stat(td);
 	helper_reset();

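The thinktime fix re-initializes think-time state in `reset_all_stats()`, which fio runs when `ramp_time` ends. `td->last_thinktime` is reset to the new epoch. The block count reached so far, `b`, is subtracted from `last_thinktime_blocks` before the block counters are cleared, so "blocks issued since the last think time" is the same before and after the reset. Because `last_thinktime_blocks - b` can drop below zero, the field changes from `uint64_t` to `int64_t`. The toy model below sketches that arithmetic only. `counter` stands for `ddir_rw_sum(td->thinktime_blocks_counter)`. The `think()` step, which records where the last think time happened, is an assumption about how fio updates the field, not code from the patch.

```python
from dataclasses import dataclass


@dataclass
class ThinktimeModel:
    """Toy model of the block-based thinktime bookkeeping in struct thread_data.

    counter: stands in for ddir_rw_sum(td->thinktime_blocks_counter)
    last:    stands in for td->last_thinktime_blocks (int64_t after the fix;
             Python ints are signed and unbounded, so negatives are fine here)
    """
    counter: int = 0
    last: int = 0

    def issue(self, nblocks):
        self.counter += nblocks

    def think(self):
        # Assumed behavior: record where the most recent think time happened.
        self.last = self.counter

    def blocks_since_think(self):
        return self.counter - self.last

    def reset_all_stats(self):
        # Mirrors the patch: b = ddir_rw_sum(...); td->last_thinktime_blocks -= b;
        # after which the per-direction block counters are zeroed.
        b = self.counter
        self.last -= b
        self.counter = 0
```

Suppose a job issues 96 blocks, thinks, then issues 4 more blocks during ramp. `blocks_since_think()` is 4 both before and after `reset_all_stats()`, while `last` becomes -4. That is a value only a signed field can hold.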

* Recent changes (master)
@ 2023-04-01 12:00 Jens Axboe
From: Jens Axboe @ 2023-04-01 12:00 UTC
  To: fio

The following changes since commit d86ac3e9f4c703b7d7c9add96e69f2d02affdc65:

  Merge branch 'trim-support' of https://github.com/ankit-sam/fio (2023-03-27 13:21:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 638689b15af35bd746f9114a3e8895e7a983ed83:

  Only expose fadvise_hint=noreuse if supported (2023-03-31 12:52:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Only expose fadvise_hint=noreuse if supported

Yuanchu Xie (2):
      fio: add support for POSIX_FADV_NOREUSE
      docs: add noreuse fadvise_hint option

 HOWTO.rst   | 5 +++++
 fio.1       | 5 +++++
 fio.h       | 1 +
 ioengines.c | 4 ++++
 options.c   | 7 +++++++
 5 files changed, 22 insertions(+)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 5240f9da..cb0f9834 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1308,6 +1308,11 @@ I/O type
 		**random**
 			Advise using **FADV_RANDOM**.
 
+		**noreuse**
+			Advise using **FADV_NOREUSE**. This may be a no-op on older Linux
+			kernels. Since Linux 6.3, it provides a hint to the LRU algorithm.
+			See the :manpage:`posix_fadvise(2)` man page.
+
 .. option:: write_hint=str
 
 	Use :manpage:`fcntl(2)` to advise the kernel what life time to expect
diff --git a/fio.1 b/fio.1
index e2db3a3f..311b16d8 100644
--- a/fio.1
+++ b/fio.1
@@ -1098,6 +1098,11 @@ Advise using FADV_SEQUENTIAL.
 .TP
 .B random
 Advise using FADV_RANDOM.
+.TP
+.B noreuse
+Advise using FADV_NOREUSE. This may be a no-op on older Linux
+kernels. Since Linux 6.3, it provides a hint to the LRU algorithm.
+See the \fBposix_fadvise\fR\|(2) man page.
 .RE
 .RE
 .TP
diff --git a/fio.h b/fio.h
index 32535517..f2acd430 100644
--- a/fio.h
+++ b/fio.h
@@ -163,6 +163,7 @@ enum {
 	F_ADV_TYPE,
 	F_ADV_RANDOM,
 	F_ADV_SEQUENTIAL,
+	F_ADV_NOREUSE,
 };
 
 /*
diff --git a/ioengines.c b/ioengines.c
index e2316ee4..742f97dd 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -565,6 +565,10 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 			flags = POSIX_FADV_RANDOM;
 		else if (td->o.fadvise_hint == F_ADV_SEQUENTIAL)
 			flags = POSIX_FADV_SEQUENTIAL;
+#ifdef POSIX_FADV_NOREUSE
+		else if (td->o.fadvise_hint == F_ADV_NOREUSE)
+			flags = POSIX_FADV_NOREUSE;
+#endif
 		else {
 			log_err("fio: unknown fadvise type %d\n",
 							td->o.fadvise_hint);
diff --git a/options.c b/options.c
index 18857795..440bff37 100644
--- a/options.c
+++ b/options.c
@@ -4,6 +4,7 @@
 #include <ctype.h>
 #include <string.h>
 #include <assert.h>
+#include <fcntl.h>
 #include <sys/stat.h>
 #include <netinet/in.h>
 
@@ -2740,6 +2741,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = F_ADV_SEQUENTIAL,
 			    .help = "Advise using FADV_SEQUENTIAL",
 			  },
+#ifdef POSIX_FADV_NOREUSE
+			  { .ival = "noreuse",
+			    .oval = F_ADV_NOREUSE,
+			    .help = "Advise using FADV_NOREUSE",
+			  },
+#endif
 		},
 		.help	= "Use fadvise() to advise the kernel on IO pattern",
 		.def	= "1",

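The ioengines.c and options.c hunks only map `noreuse` to `POSIX_FADV_NOREUSE` when the C library defines that constant. Builds against older headers therefore simply do not offer the option. The sketch below shows the same guard pattern in standalone form. The `adv_hint` enum and both helper names are hypothetical stand-ins for fio's `F_ADV_*` values and `td_io_open_file()`; they are not fio code. Note that `posix_fadvise()` returns an error number instead of setting `errno`.

```c
#define _POSIX_C_SOURCE 200112L	/* expose posix_fadvise() and POSIX_FADV_* */
#include <errno.h>
#include <fcntl.h>

/* Hypothetical hint values modelled on fio's F_ADV_* enum (not fio's own). */
enum adv_hint {
	ADV_NONE,
	ADV_RANDOM,
	ADV_SEQUENTIAL,
	ADV_NOREUSE,
};

/*
 * Map a hint to a posix_fadvise() advice value. Returns -1 when there is no
 * mapping. This includes ADV_NOREUSE on C libraries that lack
 * POSIX_FADV_NOREUSE (the same #ifdef guard the patch uses).
 */
static int adv_hint_to_posix(enum adv_hint hint)
{
	switch (hint) {
	case ADV_RANDOM:
		return POSIX_FADV_RANDOM;
	case ADV_SEQUENTIAL:
		return POSIX_FADV_SEQUENTIAL;
#ifdef POSIX_FADV_NOREUSE
	case ADV_NOREUSE:
		return POSIX_FADV_NOREUSE;
#endif
	default:
		return -1;
	}
}

/*
 * Apply the hint to the whole file (offset 0 and len 0 cover everything up
 * to EOF). posix_fadvise() returns an error number rather than setting errno.
 */
static int apply_adv_hint(int fd, enum adv_hint hint)
{
	int advice = adv_hint_to_posix(hint);

	if (advice < 0)
		return EINVAL;
	return posix_fadvise(fd, 0, 0, advice);
}
```

With `offset = 0` and `len = 0`, the advice applies from the start of the file to its end.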

* Recent changes (master)
@ 2023-03-28 12:00 Jens Axboe
From: Jens Axboe @ 2023-03-28 12:00 UTC
  To: fio

The following changes since commit 2fa0ab21c5726d8242a820ff688de019cc4d2fe2:

  engines/nvme: cast __u64 to unsigned long long for printing (2023-03-21 08:40:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d86ac3e9f4c703b7d7c9add96e69f2d02affdc65:

  Merge branch 'trim-support' of https://github.com/ankit-sam/fio (2023-03-27 13:21:25 -0600)

----------------------------------------------------------------
Ankit Kumar (2):
      fdp: drop expensive modulo operation
      io_uring_cmd: suppport for trim operation

Jens Axboe (1):
      Merge branch 'trim-support' of https://github.com/ankit-sam/fio

 engines/io_uring.c | 31 ++++++++++++++++++++++++++++++-
 engines/nvme.c     | 34 ++++++++++++++++++++++++++++++++++
 engines/nvme.h     | 12 ++++++++++++
 fdp.c              |  5 ++++-
 stat.c             |  2 +-
 5 files changed, 81 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 54fdf7f3..f10a4593 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -24,6 +24,7 @@
 #include "../lib/types.h"
 #include "../os/linux/io_uring.h"
 #include "cmdprio.h"
+#include "zbd.h"
 #include "nvme.h"
 
 #include <sys/stat.h>
@@ -409,6 +410,9 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 	if (o->cmd_type != FIO_URING_CMD_NVME)
 		return -EINVAL;
 
+	if (io_u->ddir == DDIR_TRIM)
+		return 0;
+
 	sqe = &ld->sqes[(io_u->index) << 1];
 
 	if (o->registerfiles) {
@@ -556,6 +560,27 @@ static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
 		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
+static int fio_ioring_cmd_io_u_trim(const struct thread_data *td,
+				    struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	int ret;
+
+	if (td->o.zone_mode == ZONE_MODE_ZBD) {
+		ret = zbd_do_io_u_trim(td, io_u);
+		if (ret == io_u_completed)
+			return io_u->xfer_buflen;
+		if (ret)
+			goto err;
+	}
+
+	return fio_nvme_trim(td, f, io_u->offset, io_u->xfer_buflen);
+
+err:
+	io_u->error = ret;
+	return 0;
+}
+
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
@@ -572,7 +597,11 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 		if (ld->queued)
 			return FIO_Q_BUSY;
 
-		do_io_u_trim(td, io_u);
+		if (!strcmp(td->io_ops->name, "io_uring_cmd"))
+			fio_ioring_cmd_io_u_trim(td, io_u);
+		else
+			do_io_u_trim(td, io_u);
+
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
 		return FIO_Q_COMPLETED;
diff --git a/engines/nvme.c b/engines/nvme.c
index 3f6b64a8..ac908687 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -43,6 +43,40 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	return 0;
 }
 
+static int nvme_trim(int fd, __u32 nsid, __u32 nr_range, __u32 data_len,
+		     void *data)
+{
+	struct nvme_passthru_cmd cmd = {
+		.opcode		= nvme_cmd_dsm,
+		.nsid		= nsid,
+		.addr		= (__u64)(uintptr_t)data,
+		.data_len 	= data_len,
+		.cdw10		= nr_range - 1,
+		.cdw11		= NVME_ATTRIBUTE_DEALLOCATE,
+	};
+
+	return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
+}
+
+int fio_nvme_trim(const struct thread_data *td, struct fio_file *f,
+		  unsigned long long offset, unsigned long long len)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_dsm_range dsm;
+	int ret;
+
+	dsm.nlb = (len >> data->lba_shift);
+	dsm.slba = (offset >> data->lba_shift);
+
+	ret = nvme_trim(f->fd, data->nsid, 1, sizeof(struct nvme_dsm_range),
+			&dsm);
+	if (ret)
+		log_err("%s: nvme_trim failed for offset %llu and len %llu, err=%d\n",
+			f->file_name, offset, len, ret);
+
+	return ret;
+}
+
 static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
 			 enum nvme_csi csi, void *data)
 {
diff --git a/engines/nvme.h b/engines/nvme.h
index 1c0e526b..408594d5 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -48,6 +48,8 @@ struct nvme_uring_cmd {
 #define NVME_ZNS_ZSA_RESET 0x4
 #define NVME_ZONE_TYPE_SEQWRITE_REQ 0x2
 
+#define NVME_ATTRIBUTE_DEALLOCATE (1 << 2)
+
 enum nvme_identify_cns {
 	NVME_IDENTIFY_CNS_NS		= 0x00,
 	NVME_IDENTIFY_CNS_CSI_NS	= 0x05,
@@ -67,6 +69,7 @@ enum nvme_admin_opcode {
 enum nvme_io_opcode {
 	nvme_cmd_write			= 0x01,
 	nvme_cmd_read			= 0x02,
+	nvme_cmd_dsm			= 0x09,
 	nvme_cmd_io_mgmt_recv		= 0x12,
 	nvme_zns_cmd_mgmt_send		= 0x79,
 	nvme_zns_cmd_mgmt_recv		= 0x7a,
@@ -207,6 +210,15 @@ struct nvme_fdp_ruh_status {
 	struct nvme_fdp_ruh_status_desc ruhss[];
 };
 
+struct nvme_dsm_range {
+	__le32	cattr;
+	__le32	nlb;
+	__le64	slba;
+};
+
+int fio_nvme_trim(const struct thread_data *td, struct fio_file *f,
+		  unsigned long long offset, unsigned long long len);
+
 int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
 			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes);
 
diff --git a/fdp.c b/fdp.c
index 84e04fce..d92dbc67 100644
--- a/fdp.c
+++ b/fdp.c
@@ -119,7 +119,10 @@ void fdp_fill_dspec_data(struct thread_data *td, struct io_u *io_u)
 		return;
 	}
 
-	dspec = ruhs->plis[ruhs->pli_loc++ % ruhs->nr_ruhs];
+	if (ruhs->pli_loc >= ruhs->nr_ruhs)
+		ruhs->pli_loc = 0;
+
+	dspec = ruhs->plis[ruhs->pli_loc++];
 	io_u->dtype = 2;
 	io_u->dspec = dspec;
 }
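
The fdp.c hunk replaces `pli_loc++ % nr_ruhs` with an explicit wrap-around check, which avoids a division on every I/O. Below is a minimal standalone sketch of that round-robin selection. `struct pli_ring` and `pli_next()` are hypothetical names, and the sketch assumes `nr` is at least 1.

```c
/* Hypothetical stand-in for the reclaim-unit-handle state used by fdp.c. */
struct pli_ring {
	unsigned short plis[8];	/* placement identifiers to hand out */
	unsigned int nr;	/* number of valid entries, must be >= 1 */
	unsigned int loc;	/* index of the next entry */
};

/* Round-robin pick that wraps with a compare instead of '%'. */
static unsigned short pli_next(struct pli_ring *r)
{
	if (r->loc >= r->nr)
		r->loc = 0;
	return r->plis[r->loc++];
}
```

As a side effect, the stored index stays bounded by `nr` instead of growing with every call.
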
diff --git a/stat.c b/stat.c
index d779a90f..015b8e28 100644
--- a/stat.c
+++ b/stat.c
@@ -555,7 +555,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
 	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
-	if (ddir == DDIR_WRITE)
+	if (ddir == DDIR_WRITE || ddir == DDIR_TRIM)
 		post_st = zbd_write_status(ts);
 	else if (ddir == DDIR_READ && ts->cachehit && ts->cachemiss) {
 		uint64_t total;

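`fio_nvme_trim()` issues a single Dataset Management command through `NVME_IOCTL_IO_CMD`. That command carries one range descriptor, a zero-based range count in CDW10 (`nr_range - 1`), and the Deallocate attribute (bit 2) in CDW11. The sketch below isolates the byte-to-LBA conversion. It is not fio code, and the struct and function names are hypothetical. It differs from the patch in three ways:

- It converts fields to little-endian explicitly with `htole32()`/`htole64()`. The patch stores host-order values, which only match on little-endian hosts.
- It zeroes `cattr`, which the patch leaves unset.
- It rejects empty, unaligned, or oversized requests instead of truncating them.

```c
#define _DEFAULT_SOURCE		/* htole32()/htole64() in <endian.h> */
#include <endian.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* One 16-byte Dataset Management range, laid out like nvme_dsm_range. */
struct dsm_range {
	uint32_t cattr;	/* context attributes (little-endian) */
	uint32_t nlb;	/* length in logical blocks, not zero-based (LE) */
	uint64_t slba;	/* starting LBA (little-endian) */
};

/* Deallocate (AD) attribute bit for CDW11, matching NVME_ATTRIBUTE_DEALLOCATE. */
#define DSM_ATTR_DEALLOCATE	(1u << 2)

/*
 * Fill one range for the byte span [offset, offset + len). Returns -EINVAL if
 * the span is empty, not LBA-aligned, or too long for the 32-bit nlb field.
 */
static int fill_dsm_range(struct dsm_range *r, unsigned long long offset,
			  unsigned long long len, unsigned int lba_shift)
{
	unsigned long long mask = (1ULL << lba_shift) - 1;

	if (!len || (offset & mask) || (len & mask) ||
	    (len >> lba_shift) > UINT32_MAX)
		return -EINVAL;

	memset(r, 0, sizeof(*r));		/* cattr = 0 */
	r->nlb = htole32((uint32_t) (len >> lba_shift));
	r->slba = htole64((uint64_t) (offset >> lba_shift));
	return 0;
}

/* CDW10 carries the number of ranges as a zero-based value. */
static uint32_t dsm_cdw10(uint32_t nr_ranges)
{
	return nr_ranges - 1;
}
```

For a 4 KiB LBA format (`lba_shift = 12`), trimming 8192 bytes at offset 4096 yields `slba = 1` and `nlb = 2`.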

* Recent changes (master)
@ 2023-03-22 12:00 Jens Axboe
From: Jens Axboe @ 2023-03-22 12:00 UTC
  To: fio

The following changes since commit 51bbb1a120c96ae7b93d058c7ce418962b202515:

  docs: clean up steadystate options (2023-03-20 13:57:47 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2fa0ab21c5726d8242a820ff688de019cc4d2fe2:

  engines/nvme: cast __u64 to unsigned long long for printing (2023-03-21 08:40:14 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      engines/io_uring: use correct type for fio_nvme_get_info()
      engines/nvme: cast __u64 to unsigned long long for printing

 engines/io_uring.c | 4 ++--
 engines/nvme.c     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5393758a..54fdf7f3 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1148,7 +1148,7 @@ static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
 		unsigned int nsid, lba_size = 0;
-		unsigned long long nlba = 0;
+		__u64 nlba = 0;
 		int ret;
 
 		/* Store the namespace-id and lba size. */
@@ -1214,7 +1214,7 @@ static int fio_ioring_cmd_get_file_size(struct thread_data *td,
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		struct nvme_data *data = NULL;
 		unsigned int nsid, lba_size = 0;
-		unsigned long long nlba = 0;
+		__u64 nlba = 0;
 		int ret;
 
 		ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
diff --git a/engines/nvme.c b/engines/nvme.c
index da18eba9..3f6b64a8 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -241,7 +241,7 @@ int fio_nvme_report_zones(struct thread_data *td, struct fio_file *f,
 				break;
 			default:
 				log_err("%s: invalid type for zone at offset %llu.\n",
-					f->file_name, desc->zslba);
+					f->file_name, (unsigned long long) desc->zslba);
 				ret = -EIO;
 				goto out;
 			}

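The nvme.c fix casts `desc->zslba` to `unsigned long long` before printing it with `%llu`. The io_uring.c change declares `nlba` as `__u64` so that its address matches the pointer type `fio_nvme_get_info()` expects. Both changes matter because `__u64` is not `unsigned long long` on every architecture: some 64-bit targets define it as `unsigned long` for user space.

The sketch below shows the cast alongside the `<inttypes.h>` `PRIu64` alternative. It uses `uint64_t` in place of `__u64`, and the helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Cast form, as in the nvme.c fix: %llu always matches unsigned long long. */
static int fmt_u64_cast(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llu", (unsigned long long) v);
}

/* <inttypes.h> form: PRIu64 expands to the right conversion for uint64_t. */
static int fmt_u64_pri(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%" PRIu64, v);
}
```

Both produce the same decimal text. The cast keeps the familiar `%llu` spelling used elsewhere in fio.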

* Recent changes (master)
@ 2023-03-21 12:00 Jens Axboe
From: Jens Axboe @ 2023-03-21 12:00 UTC
  To: fio

The following changes since commit a967e54d34afe3bb10cd521d78bcaea2dd8c7cdc:

  stat: Fix ioprio print (2023-03-15 19:18:47 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 51bbb1a120c96ae7b93d058c7ce418962b202515:

  docs: clean up steadystate options (2023-03-20 13:57:47 -0400)

----------------------------------------------------------------
Christian Loehle (1):
      fio: steadystate: allow for custom check interval

Vincent Fu (3):
      steadystate: fix slope calculation for variable check intervals
      steadystate: add some TODO items
      docs: clean up steadystate options

 HOWTO.rst              | 17 +++++++++---
 STEADYSTATE-TODO       | 10 ++++++-
 cconv.c                |  2 ++
 fio.1                  | 13 +++++++--
 helper_thread.c        |  2 +-
 init.c                 | 19 +++++++++++++
 options.c              | 14 ++++++++++
 stat.c                 |  7 +++--
 steadystate.c          | 74 ++++++++++++++++++++++++++++++--------------------
 steadystate.h          |  3 +-
 t/steadystate_tests.py |  1 +
 thread_options.h       |  2 ++
 12 files changed, 120 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index bbd9496e..5240f9da 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3821,10 +3821,11 @@ Steady state
 
 .. option:: steadystate_duration=time, ss_dur=time
 
-	A rolling window of this duration will be used to judge whether steady state
-	has been reached. Data will be collected once per second. The default is 0
-	which disables steady state detection.  When the unit is omitted, the
-	value is interpreted in seconds.
+        A rolling window of this duration will be used to judge whether steady
+        state has been reached. Data will be collected every
+        :option:`ss_interval`.  The default is 0 which disables steady state
+        detection.  When the unit is omitted, the value is interpreted in
+        seconds.
 
 .. option:: steadystate_ramp_time=time, ss_ramp=time
 
@@ -3832,6 +3833,14 @@ Steady state
 	collection for checking the steady state job termination criterion. The
 	default is 0.  When the unit is omitted, the value is interpreted in seconds.
 
+.. option:: steadystate_check_interval=time, ss_interval=time
+
+        The values during the rolling window will be collected with a period of
+        this value. If :option:`ss_interval` is 30s and :option:`ss_dur` is
+        300s, 10 measurements will be taken. Default is 1s but that might not
+        converge, especially for slower devices, so set this accordingly. When
+        the unit is omitted, the value is interpreted in seconds.
+
 
 Measurements and reporting
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
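
With this change, the steady-state window holds `ss_dur / ss_interval` samples rather than one per second. `fixup_options()` in init.c rejects three configurations:

- an interval below 1s;
- a duration that is not a multiple of the interval, or is not larger than it;
- intervals that differ between jobs.

The sketch below reproduces the first two checks and the sample count for the HOWTO's 300s/30s example. The function name is hypothetical. Taking both values in microseconds is an assumption of the sketch.

```c
/*
 * Number of samples in the steady-state window, or 0 if the combination
 * would be rejected. Both arguments are in microseconds (an assumption).
 */
static unsigned long long ss_window_samples(unsigned long long dur_us,
					    unsigned long long interval_us)
{
	if (interval_us < 1000000ULL)		/* interval must be >= 1s */
		return 0;
	if (dur_us <= interval_us || dur_us % interval_us)
		return 0;			/* must be a larger multiple */
	return dur_us / interval_us;
}
```

With the default 1s interval, a 60s window keeps 60 samples, matching the previous once-per-second behaviour.
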
diff --git a/STEADYSTATE-TODO b/STEADYSTATE-TODO
index e4b146e9..2848eb54 100644
--- a/STEADYSTATE-TODO
+++ b/STEADYSTATE-TODO
@@ -1,6 +1,14 @@
 Known issues/TODO (for steady-state)
 
-- Allow user to specify the frequency of measurements
+- Replace the test script with a better one
+  - Add test cases for the new check_interval option
+  - Parse debug=steadystate output to check calculations
+
+- Instead of calculating `intervals` every time, calculate it once and stash it
+  somewhere
+
+- Add the time unit to the ss_dur and check_interval variable names to reduce
+  possible confusion
 
 - Better documentation for output
 
diff --git a/cconv.c b/cconv.c
index 05ac75e3..1ae38b1b 100644
--- a/cconv.c
+++ b/cconv.c
@@ -252,6 +252,7 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 	o->ss_ramp_time = le64_to_cpu(top->ss_ramp_time);
 	o->ss_state = le32_to_cpu(top->ss_state);
 	o->ss_limit.u.f = fio_uint64_to_double(le64_to_cpu(top->ss_limit.u.i));
+	o->ss_check_interval = le64_to_cpu(top->ss_check_interval);
 	o->zone_range = le64_to_cpu(top->zone_range);
 	o->zone_size = le64_to_cpu(top->zone_size);
 	o->zone_capacity = le64_to_cpu(top->zone_capacity);
@@ -614,6 +615,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->ss_ramp_time = __cpu_to_le64(top->ss_ramp_time);
 	top->ss_state = cpu_to_le32(top->ss_state);
 	top->ss_limit.u.i = __cpu_to_le64(fio_double_to_uint64(o->ss_limit.u.f));
+	top->ss_check_interval = __cpu_to_le64(top->ss_check_interval);
 	top->zone_range = __cpu_to_le64(o->zone_range);
 	top->zone_size = __cpu_to_le64(o->zone_size);
 	top->zone_capacity = __cpu_to_le64(o->zone_capacity);
diff --git a/fio.1 b/fio.1
index a238331c..e2db3a3f 100644
--- a/fio.1
+++ b/fio.1
@@ -3532,14 +3532,21 @@ slope. Stop the job if the slope falls below the specified limit.
 .TP
 .BI steadystate_duration \fR=\fPtime "\fR,\fP ss_dur" \fR=\fPtime
 A rolling window of this duration will be used to judge whether steady state
-has been reached. Data will be collected once per second. The default is 0
-which disables steady state detection. When the unit is omitted, the
-value is interpreted in seconds.
+has been reached. Data will be collected every \fBss_interval\fR. The default
+is 0 which disables steady state detection. When the unit is omitted, the value
+is interpreted in seconds.
 .TP
 .BI steadystate_ramp_time \fR=\fPtime "\fR,\fP ss_ramp" \fR=\fPtime
 Allow the job to run for the specified duration before beginning data
 collection for checking the steady state job termination criterion. The
 default is 0. When the unit is omitted, the value is interpreted in seconds.
+.TP
+.BI steadystate_check_interval \fR=\fPtime "\fR,\fP ss_interval" \fR=\fPtime
+The values during the rolling window will be collected with a period of this
+value. If \fBss_interval\fR is 30s and \fBss_dur\fR is 300s, 10 measurements
+will be taken. Default is 1s but that might not converge, especially for slower
+devices, so set this accordingly. When the unit is omitted, the value is
+interpreted in seconds.
 .SS "Measurements and reporting"
 .TP
 .BI per_job_logs \fR=\fPbool
diff --git a/helper_thread.c b/helper_thread.c
index b9b83db3..77016638 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -281,7 +281,7 @@ static void *helper_thread_main(void *data)
 		},
 		{
 			.name = "steadystate",
-			.interval_ms = steadystate_enabled ? STEADYSTATE_MSEC :
+			.interval_ms = steadystate_enabled ? ss_check_interval :
 				0,
 			.func = steadystate_check,
 		}
diff --git a/init.c b/init.c
index 442dab42..a70f749a 100644
--- a/init.c
+++ b/init.c
@@ -981,6 +981,25 @@ static int fixup_options(struct thread_data *td)
 		}
 	}
 
+	for_each_td(td2) {
+		if (td->o.ss_check_interval != td2->o.ss_check_interval) {
+			log_err("fio: conflicting ss_check_interval: %llu and %llu, must be globally equal\n",
+					td->o.ss_check_interval, td2->o.ss_check_interval);
+			ret |= 1;
+		}
+	} end_for_each();
+	if (td->o.ss_dur && td->o.ss_check_interval / 1000L < 1000) {
+		log_err("fio: ss_check_interval must be at least 1s\n");
+		ret |= 1;
+
+	}
+	if (td->o.ss_dur && (td->o.ss_dur % td->o.ss_check_interval != 0 || td->o.ss_dur <= td->o.ss_check_interval)) {
+		log_err("fio: ss_duration %lluus must be multiple of ss_check_interval %lluus\n",
+				td->o.ss_dur, td->o.ss_check_interval);
+		ret |= 1;
+	}
+
+
 	return ret;
 }
 
diff --git a/options.c b/options.c
index 91049af5..18857795 100644
--- a/options.c
+++ b/options.c
@@ -5228,6 +5228,20 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group  = FIO_OPT_G_RUNTIME,
 	},
+        {
+		.name   = "steadystate_check_interval",
+		.lname  = "Steady state check interval",
+		.alias  = "ss_interval",
+		.parent	= "steadystate",
+		.type   = FIO_OPT_STR_VAL_TIME,
+		.off1   = offsetof(struct thread_options, ss_check_interval),
+		.help   = "Polling interval for the steady state check (too low means steadystate will not converge)",
+		.def    = "1",
+		.is_seconds = 1,
+		.is_time = 1,
+		.category = FIO_OPT_C_GENERAL,
+		.group  = FIO_OPT_G_RUNTIME,
+	},
 	{
 		.name = NULL,
 	},
diff --git a/stat.c b/stat.c
index 56be330b..d779a90f 100644
--- a/stat.c
+++ b/stat.c
@@ -1874,6 +1874,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		struct json_array *iops, *bw;
 		int j, k, l;
 		char ss_buf[64];
+		int intervals = ts->ss_dur / (ss_check_interval / 1000L);
 
 		snprintf(ss_buf, sizeof(ss_buf), "%s%s:%f%s",
 			ts->ss_state & FIO_SS_IOPS ? "iops" : "bw",
@@ -1907,9 +1908,9 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		if ((ts->ss_state & FIO_SS_ATTAINED) || !(ts->ss_state & FIO_SS_BUFFER_FULL))
 			j = ts->ss_head;
 		else
-			j = ts->ss_head == 0 ? ts->ss_dur - 1 : ts->ss_head - 1;
-		for (l = 0; l < ts->ss_dur; l++) {
-			k = (j + l) % ts->ss_dur;
+			j = ts->ss_head == 0 ? intervals - 1 : ts->ss_head - 1;
+		for (l = 0; l < intervals; l++) {
+			k = (j + l) % intervals;
 			json_array_add_value_int(bw, ts->ss_bw_data[k]);
 			json_array_add_value_int(iops, ts->ss_iops_data[k]);
 		}
diff --git a/steadystate.c b/steadystate.c
index 14cdf0ed..3e3683f3 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -4,6 +4,7 @@
 #include "steadystate.h"
 
 bool steadystate_enabled = false;
+unsigned int ss_check_interval = 1000;
 
 void steadystate_free(struct thread_data *td)
 {
@@ -15,8 +16,10 @@ void steadystate_free(struct thread_data *td)
 
 static void steadystate_alloc(struct thread_data *td)
 {
-	td->ss.bw_data = calloc(td->ss.dur, sizeof(uint64_t));
-	td->ss.iops_data = calloc(td->ss.dur, sizeof(uint64_t));
+	int intervals = td->ss.dur / (ss_check_interval / 1000L);
+
+	td->ss.bw_data = calloc(intervals, sizeof(uint64_t));
+	td->ss.iops_data = calloc(intervals, sizeof(uint64_t));
 
 	td->ss.state |= FIO_SS_DATA;
 }
@@ -64,6 +67,7 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 	double result;
 	struct steadystate_data *ss = &td->ss;
 	uint64_t new_val;
+	int intervals = ss->dur / (ss_check_interval / 1000L);
 
 	ss->bw_data[ss->tail] = bw;
 	ss->iops_data[ss->tail] = iops;
@@ -73,15 +77,15 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 	else
 		new_val = bw;
 
-	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == intervals - 1) {
 		if (!(ss->state & FIO_SS_BUFFER_FULL)) {
 			/* first time through */
-			for(i = 0, ss->sum_y = 0; i < ss->dur; i++) {
+			for (i = 0, ss->sum_y = 0; i < intervals; i++) {
 				if (ss->state & FIO_SS_IOPS)
 					ss->sum_y += ss->iops_data[i];
 				else
 					ss->sum_y += ss->bw_data[i];
-				j = (ss->head + i) % ss->dur;
+				j = (ss->head + i) % intervals;
 				if (ss->state & FIO_SS_IOPS)
 					ss->sum_xy += i * ss->iops_data[j];
 				else
@@ -91,7 +95,7 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 		} else {		/* easy to update the sums */
 			ss->sum_y -= ss->oldest_y;
 			ss->sum_y += new_val;
-			ss->sum_xy = ss->sum_xy - ss->sum_y + ss->dur * new_val;
+			ss->sum_xy = ss->sum_xy - ss->sum_y + intervals * new_val;
 		}
 
 		if (ss->state & FIO_SS_IOPS)
@@ -105,10 +109,10 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 		 * equally spaced when they are often off by a few milliseconds.
 		 * This assumption greatly simplifies the calculations.
 		 */
-		ss->slope = (ss->sum_xy - (double) ss->sum_x * ss->sum_y / ss->dur) /
-				(ss->sum_x_sq - (double) ss->sum_x * ss->sum_x / ss->dur);
+		ss->slope = (ss->sum_xy - (double) ss->sum_x * ss->sum_y / intervals) /
+				(ss->sum_x_sq - (double) ss->sum_x * ss->sum_x / intervals);
 		if (ss->state & FIO_SS_PCT)
-			ss->criterion = 100.0 * ss->slope / (ss->sum_y / ss->dur);
+			ss->criterion = 100.0 * ss->slope / (ss->sum_y / intervals);
 		else
 			ss->criterion = ss->slope;
 
@@ -123,9 +127,9 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 			return true;
 	}
 
-	ss->tail = (ss->tail + 1) % ss->dur;
+	ss->tail = (ss->tail + 1) % intervals;
 	if (ss->tail <= ss->head)
-		ss->head = (ss->head + 1) % ss->dur;
+		ss->head = (ss->head + 1) % intervals;
 
 	return false;
 }
@@ -138,18 +142,20 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 	double mean;
 
 	struct steadystate_data *ss = &td->ss;
+	int intervals = ss->dur / (ss_check_interval / 1000L);
 
 	ss->bw_data[ss->tail] = bw;
 	ss->iops_data[ss->tail] = iops;
 
-	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == intervals  - 1) {
 		if (!(ss->state & FIO_SS_BUFFER_FULL)) {
 			/* first time through */
-			for(i = 0, ss->sum_y = 0; i < ss->dur; i++)
+			for (i = 0, ss->sum_y = 0; i < intervals; i++) {
 				if (ss->state & FIO_SS_IOPS)
 					ss->sum_y += ss->iops_data[i];
 				else
 					ss->sum_y += ss->bw_data[i];
+			}
 			ss->state |= FIO_SS_BUFFER_FULL;
 		} else {		/* easy to update the sum */
 			ss->sum_y -= ss->oldest_y;
@@ -164,10 +170,10 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 		else
 			ss->oldest_y = ss->bw_data[ss->head];
 
-		mean = (double) ss->sum_y / ss->dur;
+		mean = (double) ss->sum_y / intervals;
 		ss->deviation = 0.0;
 
-		for (i = 0; i < ss->dur; i++) {
+		for (i = 0; i < intervals; i++) {
 			if (ss->state & FIO_SS_IOPS)
 				diff = ss->iops_data[i] - mean;
 			else
@@ -180,8 +186,9 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 		else
 			ss->criterion = ss->deviation;
 
-		dprint(FD_STEADYSTATE, "sum_y: %llu, mean: %f, max diff: %f, "
+		dprint(FD_STEADYSTATE, "intervals: %d, sum_y: %llu, mean: %f, max diff: %f, "
 					"objective: %f, limit: %f\n",
+					intervals,
 					(unsigned long long) ss->sum_y, mean,
 					ss->deviation, ss->criterion, ss->limit);
 
@@ -189,9 +196,9 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 			return true;
 	}
 
-	ss->tail = (ss->tail + 1) % ss->dur;
-	if (ss->tail <= ss->head)
-		ss->head = (ss->head + 1) % ss->dur;
+	ss->tail = (ss->tail + 1) % intervals;
+	if (ss->tail == ss->head)
+		ss->head = (ss->head + 1) % intervals;
 
 	return false;
 }
@@ -228,10 +235,10 @@ int steadystate_check(void)
 		fio_gettime(&now, NULL);
 		if (ss->ramp_time && !(ss->state & FIO_SS_RAMP_OVER)) {
 			/*
-			 * Begin recording data one second after ss->ramp_time
+			 * Begin recording data one check interval after ss->ramp_time
 			 * has elapsed
 			 */
-			if (utime_since(&td->epoch, &now) >= (ss->ramp_time + 1000000L))
+			if (utime_since(&td->epoch, &now) >= (ss->ramp_time + ss_check_interval * 1000L))
 				ss->state |= FIO_SS_RAMP_OVER;
 		}
 
@@ -250,8 +257,10 @@ int steadystate_check(void)
 		memcpy(&ss->prev_time, &now, sizeof(now));
 
 		if (ss->state & FIO_SS_RAMP_OVER) {
-			group_bw += 1000 * (td_bytes - ss->prev_bytes) / rate_time;
-			group_iops += 1000 * (td_iops - ss->prev_iops) / rate_time;
+			group_bw += rate_time * (td_bytes - ss->prev_bytes) /
+				(ss_check_interval * ss_check_interval / 1000L);
+			group_iops += rate_time * (td_iops - ss->prev_iops) /
+				(ss_check_interval * ss_check_interval / 1000L);
 			++group_ramp_time_over;
 		}
 		ss->prev_iops = td_iops;
@@ -301,6 +310,7 @@ int td_steadystate_init(struct thread_data *td)
 {
 	struct steadystate_data *ss = &td->ss;
 	struct thread_options *o = &td->o;
+	int intervals;
 
 	memset(ss, 0, sizeof(*ss));
 
@@ -312,13 +322,15 @@ int td_steadystate_init(struct thread_data *td)
 		ss->dur = o->ss_dur;
 		ss->limit = o->ss_limit.u.f;
 		ss->ramp_time = o->ss_ramp_time;
+		ss_check_interval = o->ss_check_interval / 1000L;
 
 		ss->state = o->ss_state;
 		if (!td->ss.ramp_time)
 			ss->state |= FIO_SS_RAMP_OVER;
 
-		ss->sum_x = o->ss_dur * (o->ss_dur - 1) / 2;
-		ss->sum_x_sq = (o->ss_dur - 1) * (o->ss_dur) * (2*o->ss_dur - 1) / 6;
+		intervals = ss->dur / (ss_check_interval / 1000L);
+		ss->sum_x = intervals * (intervals - 1) / 2;
+		ss->sum_x_sq = (intervals - 1) * (intervals) * (2*intervals - 1) / 6;
 	}
 
 	/* make sure that ss options are consistent within reporting group */
@@ -345,26 +357,28 @@ uint64_t steadystate_bw_mean(struct thread_stat *ts)
 {
 	int i;
 	uint64_t sum;
-
+	int intervals = ts->ss_dur / (ss_check_interval / 1000L);
+	
 	if (!ts->ss_dur)
 		return 0;
 
-	for (i = 0, sum = 0; i < ts->ss_dur; i++)
+	for (i = 0, sum = 0; i < intervals; i++)
 		sum += ts->ss_bw_data[i];
 
-	return sum / ts->ss_dur;
+	return sum / intervals;
 }
 
 uint64_t steadystate_iops_mean(struct thread_stat *ts)
 {
 	int i;
 	uint64_t sum;
+	int intervals = ts->ss_dur / (ss_check_interval / 1000L);
 
 	if (!ts->ss_dur)
 		return 0;
 
-	for (i = 0, sum = 0; i < ts->ss_dur; i++)
+	for (i = 0, sum = 0; i < intervals; i++)
 		sum += ts->ss_iops_data[i];
 
-	return sum / ts->ss_dur;
+	return sum / intervals;
 }
diff --git a/steadystate.h b/steadystate.h
index bbb86fbb..f1ef2b20 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -11,6 +11,7 @@ extern uint64_t steadystate_bw_mean(struct thread_stat *);
 extern uint64_t steadystate_iops_mean(struct thread_stat *);
 
 extern bool steadystate_enabled;
+extern unsigned int ss_check_interval;
 
 struct steadystate_data {
 	double limit;
@@ -64,6 +65,4 @@ enum {
 	FIO_SS_BW_SLOPE		= FIO_SS_BW | FIO_SS_SLOPE,
 };
 
-#define STEADYSTATE_MSEC	1000
-
 #endif
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index d6ffd177..d0fa73b2 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -115,6 +115,7 @@ if __name__ == '__main__':
               {'s': False, 'timeout': 20, 'numjobs': 2},
               {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
               {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True, 'ss_interval': 5},
             ]
 
     jobnum = 0
diff --git a/thread_options.h b/thread_options.h
index 2520357c..6670cbbf 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -211,6 +211,7 @@ struct thread_options {
 	fio_fp64_t ss_limit;
 	unsigned long long ss_dur;
 	unsigned long long ss_ramp_time;
+	unsigned long long ss_check_interval;
 	unsigned int overwrite;
 	unsigned int bw_avg_time;
 	unsigned int iops_avg_time;
@@ -533,6 +534,7 @@ struct thread_options_pack {
 	uint64_t ss_ramp_time;
 	uint32_t ss_state;
 	fio_fp64_t ss_limit;
+	uint64_t ss_check_interval;
 	uint32_t overwrite;
 	uint32_t bw_avg_time;
 	uint32_t iops_avg_time;

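The slope criterion in steadystate.c fits a least-squares line through the window's samples. It treats them as equally spaced at x = 0, 1, ..., n-1, where n is now `intervals` instead of `ss->dur`. Because x is fixed, the sums of x and x squared have closed forms, which is why `td_steadystate_init()` precomputes `sum_x` and `sum_x_sq`.

Below is a standalone sketch of the same formula. It recomputes the y sums on each call, whereas steadystate.c updates them incrementally as the window slides. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Least-squares slope of samples y[0..n-1] taken at x = 0, 1, ..., n-1,
 * using the same closed-form sums as steadystate.c:
 *   sum_x    = n(n-1)/2
 *   sum_x_sq = (n-1)n(2n-1)/6
 * Requires n >= 2 so the denominator is non-zero.
 */
static double ss_slope(const uint64_t *y, size_t n)
{
	double sum_x = (double) n * (n - 1) / 2;
	double sum_x_sq = (double) (n - 1) * n * (2 * n - 1) / 6;
	double sum_y = 0, sum_xy = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum_y += (double) y[i];
		sum_xy += (double) i * y[i];
	}

	return (sum_xy - sum_x * sum_y / n) /
	       (sum_x_sq - sum_x * sum_x / n);
}
```

When `ss_limit` is given as a percentage, steadystate.c divides this slope by the window mean (`sum_y / intervals`) and multiplies by 100.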

* Recent changes (master)
@ 2023-03-16 12:00 Jens Axboe
From: Jens Axboe @ 2023-03-16 12:00 UTC
  To: fio

The following changes since commit 4ad09b569a2689b3b67744eaccd378d013eb82a7:

  t/io_uring: abstract out init_new_io() helper (2023-03-14 14:03:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a967e54d34afe3bb10cd521d78bcaea2dd8c7cdc:

  stat: Fix ioprio print (2023-03-15 19:18:47 -0400)

----------------------------------------------------------------
Damien Le Moal (1):
      stat: Fix ioprio print

 os/os-dragonfly.h |  2 ++
 os/os-linux.h     |  3 ++
 os/os.h           |  2 ++
 stat.c            | 85 +++++++++++++++++++++++++++++--------------------------
 4 files changed, 52 insertions(+), 40 deletions(-)

---

Diff of recent changes:

diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 5b37a37e..bde39101 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -175,6 +175,8 @@ static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 #define ioprio_set(which, who, ioprio_class, ioprio)	\
 	ioprio_set(which, who, ioprio)
 
+#define ioprio(ioprio)		(ioprio)
+
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
 	struct partinfo pi;
diff --git a/os/os-linux.h b/os/os-linux.h
index 7a78b42d..2f9f7e79 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -153,6 +153,9 @@ static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
 		       ioprio_value(ioprio_class, ioprio));
 }
 
+#define ioprio_class(ioprio)	((ioprio) >> IOPRIO_CLASS_SHIFT)
+#define ioprio(ioprio)		((ioprio) & 7)
+
 #ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
diff --git a/os/os.h b/os/os.h
index ebaf8af5..036fc233 100644
--- a/os/os.h
+++ b/os/os.h
@@ -116,12 +116,14 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 
 #ifndef FIO_HAVE_IOPRIO_CLASS
+#define ioprio_class(prio)		0
 #define ioprio_value_is_class_rt(prio)	(false)
 #define IOPRIO_MIN_PRIO_CLASS		0
 #define IOPRIO_MAX_PRIO_CLASS		0
 #endif
 #ifndef FIO_HAVE_IOPRIO
 #define ioprio_value(prioclass, prio)	(0)
+#define ioprio(ioprio)			0
 #define ioprio_set(which, who, prioclass, prio)	(0)
 #define IOPRIO_MIN_PRIO			0
 #define IOPRIO_MAX_PRIO			0
diff --git a/stat.c b/stat.c
index e0a2dcc6..56be330b 100644
--- a/stat.c
+++ b/stat.c
@@ -590,17 +590,18 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	/* Only print per prio stats if there are >= 2 prios with samples */
 	if (get_nr_prios_with_samples(ts, ddir) >= 2) {
 		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
-			if (calc_lat(&ts->clat_prio[ddir][i].clat_stat, &min,
-				     &max, &mean, &dev)) {
-				char buf[64];
+			char buf[64];
 
-				snprintf(buf, sizeof(buf),
-					 "%s prio %u/%u",
-					 clat_type,
-					 ts->clat_prio[ddir][i].ioprio >> 13,
-					 ts->clat_prio[ddir][i].ioprio & 7);
-				display_lat(buf, min, max, mean, dev, out);
-			}
+			if (!calc_lat(&ts->clat_prio[ddir][i].clat_stat, &min,
+				      &max, &mean, &dev))
+				continue;
+
+			snprintf(buf, sizeof(buf),
+				 "%s prio %u/%u",
+				 clat_type,
+				 ioprio_class(ts->clat_prio[ddir][i].ioprio),
+				 ioprio(ts->clat_prio[ddir][i].ioprio));
+			display_lat(buf, min, max, mean, dev, out);
 		}
 	}
 
@@ -632,20 +633,22 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		/* Only print per prio stats if there are >= 2 prios with samples */
 		if (get_nr_prios_with_samples(ts, ddir) >= 2) {
 			for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
-				uint64_t prio_samples = ts->clat_prio[ddir][i].clat_stat.samples;
-
-				if (prio_samples > 0) {
-					snprintf(prio_name, sizeof(prio_name),
-						 "%s prio %u/%u (%.2f%% of IOs)",
-						 clat_type,
-						 ts->clat_prio[ddir][i].ioprio >> 13,
-						 ts->clat_prio[ddir][i].ioprio & 7,
-						 100. * (double) prio_samples / (double) samples);
-					show_clat_percentiles(ts->clat_prio[ddir][i].io_u_plat,
-							      prio_samples, ts->percentile_list,
-							      ts->percentile_precision,
-							      prio_name, out);
-				}
+				uint64_t prio_samples =
+					ts->clat_prio[ddir][i].clat_stat.samples;
+
+				if (!prio_samples)
+					continue;
+
+				snprintf(prio_name, sizeof(prio_name),
+					 "%s prio %u/%u (%.2f%% of IOs)",
+					 clat_type,
+					 ioprio_class(ts->clat_prio[ddir][i].ioprio),
+					 ioprio(ts->clat_prio[ddir][i].ioprio),
+					 100. * (double) prio_samples / (double) samples);
+				show_clat_percentiles(ts->clat_prio[ddir][i].io_u_plat,
+						prio_samples, ts->percentile_list,
+						ts->percentile_precision,
+						prio_name, out);
 			}
 		}
 	}
@@ -1508,22 +1511,24 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		json_object_add_value_array(dir_object, "prios", array);
 
 		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
-			if (ts->clat_prio[ddir][i].clat_stat.samples > 0) {
-				struct json_object *obj = json_create_object();
-				unsigned long long class, level;
-
-				class = ts->clat_prio[ddir][i].ioprio >> 13;
-				json_object_add_value_int(obj, "prioclass", class);
-				level = ts->clat_prio[ddir][i].ioprio & 7;
-				json_object_add_value_int(obj, "prio", level);
-
-				tmp_object = add_ddir_lat_json(ts,
-							       ts->clat_percentiles | ts->lat_percentiles,
-							       &ts->clat_prio[ddir][i].clat_stat,
-							       ts->clat_prio[ddir][i].io_u_plat);
-				json_object_add_value_object(obj, obj_name, tmp_object);
-				json_array_add_value_object(array, obj);
-			}
+			struct json_object *obj;
+
+			if (!ts->clat_prio[ddir][i].clat_stat.samples)
+				continue;
+
+			obj = json_create_object();
+
+			json_object_add_value_int(obj, "prioclass",
+				ioprio_class(ts->clat_prio[ddir][i].ioprio));
+			json_object_add_value_int(obj, "prio",
+				ioprio(ts->clat_prio[ddir][i].ioprio));
+
+			tmp_object = add_ddir_lat_json(ts,
+					ts->clat_percentiles | ts->lat_percentiles,
+					&ts->clat_prio[ddir][i].clat_stat,
+					ts->clat_prio[ddir][i].io_u_plat);
+			json_object_add_value_object(obj, obj_name, tmp_object);
+			json_array_add_value_object(array, obj);
 		}
 	}
 

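The stat.c hunks above replace the open-coded `ioprio >> 13` and `ioprio & 7` with the `ioprio_class()` and `ioprio()` helpers. They also flatten each loop body with an early `continue`. The sketch below shows only the old open-coded arithmetic and the resulting "class/level" label. The `sketch_*` names and `SKETCH_IOPRIO_CLASS_SHIFT` are hypothetical stand-ins, not fio's own definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: matches the shift the old stat.c code open-coded (>> 13). */
#define SKETCH_IOPRIO_CLASS_SHIFT	13

/* Priority class lives above the shift (old code: ioprio >> 13). */
static unsigned int sketch_ioprio_class(unsigned int ioprio)
{
	return ioprio >> SKETCH_IOPRIO_CLASS_SHIFT;
}

/* Priority level lives in the low bits (old code: ioprio & 7). */
static unsigned int sketch_ioprio_level(unsigned int ioprio)
{
	return ioprio & 7;
}

/* Inverse of the two accessors above, used here only to build test values. */
static unsigned int sketch_ioprio_value(unsigned int class, unsigned int level)
{
	assert(level <= 7);
	return (class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/* Builds the same "<type> prio <class>/<level>" label as show_ddir_status(). */
static int sketch_format_prio(char *buf, size_t len, const char *clat_type,
			      unsigned int ioprio)
{
	return snprintf(buf, len, "%s prio %u/%u", clat_type,
			sketch_ioprio_class(ioprio),
			sketch_ioprio_level(ioprio));
}
```

For example, class 2 with level 4 encodes to 16388 and is printed as `clat prio 2/4`.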

* Recent changes (master)
@ 2023-03-15 12:00 Jens Axboe
From: Jens Axboe @ 2023-03-15 12:00 UTC
  To: fio

The following changes since commit 557cfc51068921766e8cd6b242feb4c929cb45ea:

  t/zbd: fix minimum write size to sequential write required zones (2023-03-07 12:45:41 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4ad09b569a2689b3b67744eaccd378d013eb82a7:

  t/io_uring: abstract out init_new_io() helper (2023-03-14 14:03:32 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Fio 3.34
      t/io_uring: avoid truncation of offset on 32-bit builds
      t/io_uring: use the get_offset() code to retrieve pass-through offset
      t/io_uring: abstract out init_new_io() helper

 FIO-VERSION-GEN |  2 +-
 t/io_uring.c    | 78 ++++++++++++++++++---------------------------------------
 2 files changed, 25 insertions(+), 55 deletions(-)
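The `get_offset()` hunk in t/io_uring.c changes two things. The random block is now taken modulo `f->max_blocks` instead of `f->max_blocks - 1`, and `bs` is cast to `unsigned long long` before the multiply. The sketch below emulates 32-bit arithmetic with `uint32_t` to show why the cast matters. The function names are hypothetical, and the claim that the truncation came from a 32-bit multiply is inferred from the commit title, not spelled out in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Old pattern on a 32-bit build (emulated with uint32_t): the product is
 * computed in 32 bits and wraps modulo 2^32 before it becomes an offset. */
static uint64_t sketch_offset_32bit(uint32_t block, uint32_t bs)
{
	return (uint64_t)(uint32_t)(block * bs);
}

/* Patched pattern: choose a block in [0, max_blocks) and widen before the
 * multiply, so the byte offset is computed in 64 bits. */
static uint64_t sketch_random_offset(uint64_t r, uint32_t max_blocks,
				     uint32_t bs)
{
	uint64_t block;

	assert(max_blocks > 0);
	block = r % max_blocks;
	return block * (uint64_t)bs;
}
```

With a 4 KiB block size, block 2000000 lies at byte 8192000000, but the 32-bit product wraps to 3897032704. With the old `max_blocks - 1` modulus, 10 blocks and `r = 9` would have yielded block 0 (9 % 9), so the last block could never be selected. The new modulus makes block 9 reachable.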

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 5a0822c9..f1585d34 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.33
+DEF_VER=fio-3.34
 
 LF='
 '
diff --git a/t/io_uring.c b/t/io_uring.c
index 1ea0a9da..504f8ce9 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -530,8 +530,11 @@ static unsigned long long get_offset(struct submitter *s, struct file *f)
 	long r;
 
 	if (random_io) {
+		unsigned long long block;
+
 		r = __rand64(&s->rand_state);
-		offset = (r % (f->max_blocks - 1)) * bs;
+		block = r % f->max_blocks;
+		offset = block * (unsigned long long) bs;
 	} else {
 		offset = f->cur_off;
 		f->cur_off += bs;
@@ -542,16 +545,10 @@ static unsigned long long get_offset(struct submitter *s, struct file *f)
 	return offset;
 }
 
-static void init_io(struct submitter *s, unsigned index)
+static struct file *init_new_io(struct submitter *s)
 {
-	struct io_uring_sqe *sqe = &s->sqes[index];
 	struct file *f;
 
-	if (do_nop) {
-		sqe->opcode = IORING_OP_NOP;
-		return;
-	}
-
 	if (s->nr_files == 1) {
 		f = &s->files[0];
 	} else {
@@ -563,7 +560,22 @@ static void init_io(struct submitter *s, unsigned index)
 			f = &s->files[s->cur_file];
 		}
 	}
+
 	f->pending_ios++;
+	return f;
+}
+
+static void init_io(struct submitter *s, unsigned index)
+{
+	struct io_uring_sqe *sqe = &s->sqes[index];
+	struct file *f;
+
+	if (do_nop) {
+		sqe->opcode = IORING_OP_NOP;
+		return;
+	}
+
+	f = init_new_io(s);
 
 	if (register_files) {
 		sqe->flags = IOSQE_FIXED_FILE;
@@ -603,30 +615,10 @@ static void init_io_pt(struct submitter *s, unsigned index)
 	struct nvme_uring_cmd *cmd;
 	unsigned long long slba;
 	unsigned long long nlb;
-	long r;
 
-	if (s->nr_files == 1) {
-		f = &s->files[0];
-	} else {
-		f = &s->files[s->cur_file];
-		if (f->pending_ios >= file_depth(s)) {
-			s->cur_file++;
-			if (s->cur_file == s->nr_files)
-				s->cur_file = 0;
-			f = &s->files[s->cur_file];
-		}
-	}
-	f->pending_ios++;
+	f = init_new_io(s);
 
-	if (random_io) {
-		r = __rand64(&s->rand_state);
-		offset = (r % (f->max_blocks - 1)) * bs;
-	} else {
-		offset = f->cur_off;
-		f->cur_off += bs;
-		if (f->cur_off + bs > f->max_size)
-			f->cur_off = 0;
-	}
+	offset = get_offset(s, f);
 
 	if (register_files) {
 		sqe->fd = f->fixed_fd;
@@ -1121,18 +1113,7 @@ static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocb
 	while (index < max_ios) {
 		struct iocb *iocb = &iocbs[index];
 
-		if (s->nr_files == 1) {
-			f = &s->files[0];
-		} else {
-			f = &s->files[s->cur_file];
-			if (f->pending_ios >= file_depth(s)) {
-				s->cur_file++;
-				if (s->cur_file == s->nr_files)
-					s->cur_file = 0;
-				f = &s->files[s->cur_file];
-			}
-		}
-		f->pending_ios++;
+		f = init_new_io(s);
 
 		io_prep_pread(iocb, f->real_fd, s->iovecs[index].iov_base,
 				s->iovecs[index].iov_len, get_offset(s, f));
@@ -1419,18 +1400,7 @@ static void *submitter_sync_fn(void *data)
 		uint64_t offset;
 		struct file *f;
 
-		if (s->nr_files == 1) {
-			f = &s->files[0];
-		} else {
-			f = &s->files[s->cur_file];
-			if (f->pending_ios >= file_depth(s)) {
-				s->cur_file++;
-				if (s->cur_file == s->nr_files)
-					s->cur_file = 0;
-				f = &s->files[s->cur_file];
-			}
-		}
-		f->pending_ios++;
+		f = init_new_io(s);
 
 #ifdef ARCH_HAVE_CPU_CLOCK
 		if (stats)

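`init_new_io()` centralizes the file-selection logic that was previously duplicated in `init_io()`, `init_io_pt()`, `prep_more_ios_aio()` and `submitter_sync_fn()`. The submitter stays on its current file until that file reaches `file_depth(s)` pending I/Os, then moves round-robin to the next file. The sketch below reproduces that selection on trimmed-down, hypothetical structures; `file_depth` is modeled as a plain field rather than fio's `file_depth()` function.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins for t/io_uring.c's structures. */
struct sketch_file {
	unsigned int pending_ios;
};

struct sketch_submitter {
	struct sketch_file *files;
	unsigned int nr_files;
	unsigned int cur_file;
	unsigned int file_depth;	/* assumed per-file cap, like file_depth(s) */
};

/* Select the file for the next I/O and account one more pending I/O on it. */
static struct sketch_file *sketch_init_new_io(struct sketch_submitter *s)
{
	struct sketch_file *f;

	assert(s->nr_files > 0);
	if (s->nr_files == 1) {
		f = &s->files[0];
	} else {
		f = &s->files[s->cur_file];
		/* Current file is full: advance, wrapping to the first file. */
		if (f->pending_ios >= s->file_depth) {
			s->cur_file++;
			if (s->cur_file == s->nr_files)
				s->cur_file = 0;
			f = &s->files[s->cur_file];
		}
	}

	f->pending_ios++;
	return f;
}
```

With two files and a depth of 2, the first two I/Os go to file 0 and the third moves on to file 1.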

* Recent changes (master)
@ 2023-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2023-03-08 13:00 UTC
  To: fio

The following changes since commit 051b5785bc47ab216fa3db9dceb6184073dcc88a:

  Merge branch 'For_Each_Td_Private_Scope' of https://github.com/horshack-dpreview/fio (2023-03-03 10:46:26 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 557cfc51068921766e8cd6b242feb4c929cb45ea:

  t/zbd: fix minimum write size to sequential write required zones (2023-03-07 12:45:41 -0500)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      t/zbd: rename logical_block_size to min_seq_write_size
      t/zbd: fix minimum write size to sequential write required zones

 t/zbd/functions        | 28 +++++++++++++++++++++++++---
 t/zbd/test-zbd-support | 42 +++++++++++++++++++++---------------------
 2 files changed, 46 insertions(+), 24 deletions(-)
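The new `min_seq_write_size()` shell helper in t/zbd/functions picks the smallest write size for sequential zones in a fixed order. It uses `zone_write_granularity` when that attribute exists and is non-zero. Otherwise it falls back to `physical_block_size` on rotational (SMR) drives and to `logical_block_size` on non-rotational devices such as ZNS SSDs. The C sketch below restates only that decision. It assumes the caller has already read the sysfs values and passes 0 for a missing `zone_write_granularity`. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C rendering of min_seq_write_size()'s decision logic. */
static unsigned int sketch_min_seq_write_size(unsigned int zone_write_granularity,
					      bool rotational,
					      unsigned int physical_block_size,
					      unsigned int logical_block_size)
{
	assert(logical_block_size > 0);
	if (zone_write_granularity)
		return zone_write_granularity;	/* valid for any zoned device type */
	if (rotational)
		return physical_block_size;	/* SMR HDDs */
	return logical_block_size;		/* e.g. ZNS SSDs */
}
```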

---

Diff of recent changes:

diff --git a/t/zbd/functions b/t/zbd/functions
index 812320f5..9a6d6999 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -238,18 +238,40 @@ max_open_zones() {
     fi
 }
 
+# Get minimum block size to write to seq zones. Refer to the sysfs attribute
+# zone_write_granularity, which shows the valid minimum size regardless of
+# zoned block device type. If the sysfs attribute is not available, refer to
+# the physical block size for rotational SMR drives. For non-rotational
+# devices such as ZNS devices, refer to the logical block size.
+min_seq_write_size() {
+	local sys_path="/sys/block/$1/queue"
+	local -i size=0
+
+	if [[ -r "$sys_path/zone_write_granularity" ]]; then
+		size=$(<"$sys_path/zone_write_granularity")
+	fi
+
+	if ((size)); then
+		echo "$size"
+	elif (($(<"$sys_path/rotational"))); then
+		cat "$sys_path/physical_block_size"
+	else
+		cat "$sys_path/logical_block_size"
+	fi
+}
+
 is_zbc() {
 	local dev=$1
 
 	[[ -z "$(${zbc_info} "$dev" | grep "is not a zoned block device")" ]]
 }
 
-zbc_logical_block_size() {
+zbc_physical_block_size() {
 	local dev=$1
 
 	${zbc_info} "$dev" |
-		grep "logical blocks" |
-		sed -n 's/^[[:blank:]]*[0-9]* logical blocks of[[:blank:]]*//p' |
+		grep "physical blocks" |
+		sed -n 's/^[[:blank:]]*[0-9]* physical blocks of[[:blank:]]*//p' |
 		sed 's/ B//'
 }
 
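`zbc_physical_block_size()` extracts the block size from the `zbc_info` line that reports physical blocks. Judging from the `grep` and `sed` expressions, that line has the form `<count> physical blocks of <size> B`. The sketch below does the same extraction in C with `sscanf`. The line format is inferred from the sed patterns, and both the function name and the sample lines in the usage are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for "<count> physical blocks of <size> B".
 * Returns the block size in bytes, or 0 if the line does not match. */
static unsigned int sketch_parse_physical_block_size(const char *line)
{
	unsigned long long count;
	unsigned int size;

	/* Leading blanks are skipped by the initial space directive. */
	if (sscanf(line, " %llu physical blocks of %u", &count, &size) != 2)
		return 0;
	return size;
}
```

A line that reports logical rather than physical blocks fails the literal `physical` match, so it yields 0. This mirrors the switch from the `logical blocks` pattern to the `physical blocks` pattern.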
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 893aff3c..996160e7 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -166,7 +166,7 @@ write_and_run_one_fio_job() {
     shift 2
     r=$(((RANDOM << 16) | RANDOM))
     write_opts=(--name="write_job" --rw=write "$(ioengine "psync")" \
-		      --bs="${logical_block_size}" --zonemode=zbd \
+		      --bs="${min_seq_write_size}" --zonemode=zbd \
 		      --zonesize="${zone_size}" --thread=1 --direct=1 \
 		      --offset="${write_offset}" --size="${write_size}")
     write_opts+=("${job_var_opts[@]}")
@@ -335,7 +335,7 @@ test4() {
     size=$((zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off")
-    opts+=(--bs="$(min $((logical_block_size * 256)) $size)")
+    opts+=(--bs="$(min $((min_seq_write_size * 256)) $size)")
     opts+=("--size=$size" "--thread=1" "--read_beyond_wp=1")
     opts+=("$(ioengine "psync")" "--rw=read" "--direct=1" "--disable_lat=1")
     opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
@@ -351,7 +351,7 @@ test5() {
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
-    bs=$(min "$(max $((zone_size / 64)) "$logical_block_size")" "$zone_cap_bs")
+    bs=$(min "$(max $((zone_size / 64)) "$min_seq_write_size")" "$zone_cap_bs")
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --bs="$bs" --do_verify=1 --verify=md5 \
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -367,7 +367,7 @@ test6() {
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
-    bs=$(min "$(max $((zone_size / 64)) "$logical_block_size")" "$zone_cap_bs")
+    bs=$(min "$(max $((zone_size / 64)) "$min_seq_write_size")" "$zone_cap_bs")
     write_and_run_one_fio_job \
 	    $((first_sequential_zone_sector * 512)) "${size}" \
 	    --offset="${off}" \
@@ -748,7 +748,7 @@ test30() {
     prep_write
     off=$((first_sequential_zone_sector * 512))
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
-		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
+		    --bs="$(max $((zone_size / 128)) "$min_seq_write_size")"\
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off\
 		    --loops=2 --time_based --runtime=30s --norandommap=1\
 		    >>"${logfile}.${test_number}" 2>&1
@@ -904,9 +904,9 @@ test38() {
     local bs off size
 
     prep_write
-    size=$((logical_block_size))
-    off=$((disk_size - logical_block_size))
-    bs=$((logical_block_size))
+    size=$((min_seq_write_size))
+    off=$((disk_size - min_seq_write_size))
+    bs=$((min_seq_write_size))
     run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
 		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
 		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
@@ -924,7 +924,7 @@ read_one_block() {
 	exit 1
     fi
     off=${result[0]}
-    bs=$((logical_block_size))
+    bs=$((min_seq_write_size))
     run_one_fio_job --rw=read "$(ioengine "psync")" --offset=$off --bs=$bs \
 		    --size=$bs "$@" 2>&1 |
 	tee -a "${logfile}.${test_number}"
@@ -934,14 +934,14 @@ read_one_block() {
 test39() {
     require_zbd || return $SKIP_TESTCASE
     read_one_block --zonemode=none >/dev/null || return $?
-    check_read $((logical_block_size)) || return $?
+    check_read $((min_seq_write_size)) || return $?
 }
 
 # Check whether fio accepts --zonemode=strided for zoned block devices.
 test40() {
     local bs
 
-    bs=$((logical_block_size))
+    bs=$((min_seq_write_size))
     require_zbd || return $SKIP_TESTCASE
     read_one_block --zonemode=strided |
 	grep -q 'fio: --zonesize must be specified when using --zonemode=strided' ||
@@ -982,7 +982,7 @@ test45() {
 
     require_zbd || return $SKIP_TESTCASE
     prep_write
-    bs=$((logical_block_size))
+    bs=$((min_seq_write_size))
     run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
 		    --offset=$((first_sequential_zone_sector * 512)) \
 		    --size="$zone_size" --do_verify=1 --verify=md5 2>&1 |
@@ -1007,7 +1007,7 @@ test47() {
     local bs
 
     prep_write
-    bs=$((logical_block_size))
+    bs=$((min_seq_write_size))
     run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
@@ -1190,7 +1190,7 @@ test54() {
 # test 'z' suffix parsing only
 test55() {
 	local bs
-	bs=$((logical_block_size))
+	bs=$((min_seq_write_size))
 
 	require_zbd || return $SKIP_TESTCASE
 	# offset=1z + offset_increment=10z + size=2z
@@ -1216,7 +1216,7 @@ test55() {
 # test 'z' suffix parsing only
 test56() {
 	local bs
-	bs=$((logical_block_size))
+	bs=$((min_seq_write_size))
 
 	require_regular_block_dev || return $SKIP_TESTCASE
 	require_seq_zones 10 || return $SKIP_TESTCASE
@@ -1260,7 +1260,7 @@ test58() {
     require_seq_zones 128 || return $SKIP_TESTCASE
 
     size=$((zone_size * 128))
-    bs="$(max $((zone_size / 128)) "$logical_block_size")"
+    bs="$(max $((zone_size / 128)) "$min_seq_write_size")"
     prep_write
     off=$((first_sequential_zone_sector * 512))
     run_fio --zonemode=zbd --direct=1 --zonesize="${zone_size}" --thread=1 \
@@ -1427,7 +1427,7 @@ if [[ -b "$realdev" ]]; then
 		realsysfs=$(readlink "/sys/dev/block/$major:$minor")
 		basename=$(basename "${realsysfs%/*}")
 	fi
-	logical_block_size=$(<"/sys/block/$basename/queue/logical_block_size")
+	min_seq_write_size=$(min_seq_write_size "$basename")
 	case "$(<"/sys/class/block/$basename/queue/zoned")" in
 	host-managed|host-aware)
 		is_zbd=true
@@ -1452,8 +1452,8 @@ if [[ -b "$realdev" ]]; then
 		;;
 	*)
 		first_sequential_zone_sector=$(((disk_size / 2) &
-						(logical_block_size - 1)))
-		zone_size=$(max 65536 "$logical_block_size")
+						(min_seq_write_size - 1)))
+		zone_size=$(max 65536 "$min_seq_write_size")
 		sectors_per_zone=$((zone_size / 512))
 		max_open_zones=128
 		set_io_scheduler "$basename" none || exit $?
@@ -1476,8 +1476,8 @@ elif [[ -c "$realdev" ]]; then
 		echo "Failed to determine disk size"
 		exit 1
 	fi
-	if ! logical_block_size=($(zbc_logical_block_size "$dev")); then
-		echo "Failed to determine logical block size"
+	if ! min_seq_write_size=($(zbc_physical_block_size "$dev")); then
+		echo "Failed to determine physical block size"
 		exit 1
 	fi
 	if ! result=($(first_sequential_zone "$dev")); then


* Recent changes (master)
@ 2023-03-04 13:00 Jens Axboe
From: Jens Axboe @ 2023-03-04 13:00 UTC
  To: fio

The following changes since commit 5f81856714671287e93e087af8943d3d1779dd5f:

  Merge branch 'fiologparser-fix' of https://github.com/patrakov/fio (2023-03-02 19:57:17 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 051b5785bc47ab216fa3db9dceb6184073dcc88a:

  Merge branch 'For_Each_Td_Private_Scope' of https://github.com/horshack-dpreview/fio (2023-03-03 10:46:26 -0700)

----------------------------------------------------------------
Horshack (2):
      Fix --bandwidth-log segmentation fault when numjobs even multiple of 8
      Refactor for_each_td() to catch inappropriate td ptr reuse

Jens Axboe (2):
      Merge branch 'Fix_calc_thread_status_ramp_time_check' of https://github.com/horshack-dpreview/fio
      Merge branch 'For_Each_Td_Private_Scope' of https://github.com/horshack-dpreview/fio

 backend.c          | 44 ++++++++++++++++++--------------------------
 dedupe.c           |  7 ++-----
 engines/libblkio.c |  6 ++----
 eta.c              | 36 ++++++++++++++++++++----------------
 fio.h              | 19 +++++++++++++++++--
 init.c             | 14 +++++---------
 iolog.c            |  6 ++----
 libfio.c           | 12 ++++--------
 rate-submit.c      |  7 +++----
 stat.c             | 48 +++++++++++++++++++++---------------------------
 steadystate.c      | 27 ++++++++++++---------------
 verify.c           | 17 ++++++++---------
 zbd.c              | 35 ++++++++++++++---------------------
 13 files changed, 128 insertions(+), 150 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 975ef489..f541676c 100644
--- a/backend.c
+++ b/backend.c
@@ -93,19 +93,16 @@ static void sig_int(int sig)
 #ifdef WIN32
 static void sig_break(int sig)
 {
-	struct thread_data *td;
-	int i;
-
 	sig_int(sig);
 
 	/**
 	 * Windows terminates all job processes on SIGBREAK after the handler
 	 * returns, so give them time to wrap-up and give stats
 	 */
-	for_each_td(td, i) {
+	for_each_td(td) {
 		while (td->runstate < TD_EXITED)
 			sleep(1);
-	}
+	} end_for_each();
 }
 #endif
 
@@ -2056,15 +2053,14 @@ err:
 static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
 			 uint64_t *m_rate)
 {
-	struct thread_data *td;
 	unsigned int cputhreads, realthreads, pending;
-	int i, status, ret;
+	int status, ret;
 
 	/*
 	 * reap exited threads (TD_EXITED -> TD_REAPED)
 	 */
 	realthreads = pending = cputhreads = 0;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		int flags = 0;
 
 		if (!strcmp(td->o.ioengine, "cpuio"))
@@ -2157,7 +2153,7 @@ reaped:
 		done_secs += mtime_since_now(&td->epoch) / 1000;
 		profile_td_exit(td);
 		flow_exit_job(td);
-	}
+	} end_for_each();
 
 	if (*nr_running == cputhreads && !pending && realthreads)
 		fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
@@ -2284,13 +2280,11 @@ static bool waitee_running(struct thread_data *me)
 {
 	const char *waitee = me->o.wait_for;
 	const char *self = me->o.name;
-	struct thread_data *td;
-	int i;
 
 	if (!waitee)
 		return false;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!strcmp(td->o.name, self) || strcmp(td->o.name, waitee))
 			continue;
 
@@ -2300,7 +2294,7 @@ static bool waitee_running(struct thread_data *me)
 					runstate_to_name(td->runstate));
 			return true;
 		}
-	}
+	} end_for_each();
 
 	dprint(FD_PROCESS, "%s: %s completed, can run\n", self, waitee);
 	return false;
@@ -2324,14 +2318,14 @@ static void run_threads(struct sk_out *sk_out)
 	set_sig_handlers();
 
 	nr_thread = nr_process = 0;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (check_mount_writes(td))
 			return;
 		if (td->o.use_thread)
 			nr_thread++;
 		else
 			nr_process++;
-	}
+	} end_for_each();
 
 	if (output_format & FIO_OUTPUT_NORMAL) {
 		struct buf_output out;
@@ -2357,7 +2351,7 @@ static void run_threads(struct sk_out *sk_out)
 	nr_started = 0;
 	m_rate = t_rate = 0;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		print_status_init(td->thread_number - 1);
 
 		if (!td->o.create_serialize)
@@ -2393,7 +2387,7 @@ reap:
 					td_io_close_file(td, f);
 			}
 		}
-	}
+	} end_for_each();
 
 	/* start idle threads before io threads start to run */
 	fio_idle_prof_start();
@@ -2409,7 +2403,7 @@ reap:
 		/*
 		 * create threads (TD_NOT_CREATED -> TD_CREATED)
 		 */
-		for_each_td(td, i) {
+		for_each_td(td) {
 			if (td->runstate != TD_NOT_CREATED)
 				continue;
 
@@ -2488,7 +2482,7 @@ reap:
 
 					ret = (int)(uintptr_t)thread_main(fd);
 					_exit(ret);
-				} else if (i == fio_debug_jobno)
+				} else if (__td_index == fio_debug_jobno)
 					*fio_debug_jobp = pid;
 				free(eo);
 				free(fd);
@@ -2504,7 +2498,7 @@ reap:
 				break;
 			}
 			dprint(FD_MUTEX, "done waiting on startup_sem\n");
-		}
+		} end_for_each();
 
 		/*
 		 * Wait for the started threads to transition to
@@ -2549,7 +2543,7 @@ reap:
 		/*
 		 * start created threads (TD_INITIALIZED -> TD_RUNNING).
 		 */
-		for_each_td(td, i) {
+		for_each_td(td) {
 			if (td->runstate != TD_INITIALIZED)
 				continue;
 
@@ -2563,7 +2557,7 @@ reap:
 			t_rate += ddir_rw_sum(td->o.rate);
 			todo--;
 			fio_sem_up(td->sem);
-		}
+		} end_for_each();
 
 		reap_threads(&nr_running, &t_rate, &m_rate);
 
@@ -2589,9 +2583,7 @@ static void free_disk_util(void)
 
 int fio_backend(struct sk_out *sk_out)
 {
-	struct thread_data *td;
 	int i;
-
 	if (exec_profile) {
 		if (load_profile(exec_profile))
 			return 1;
@@ -2647,7 +2639,7 @@ int fio_backend(struct sk_out *sk_out)
 		}
 	}
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		struct thread_stat *ts = &td->ts;
 
 		free_clat_prio_stats(ts);
@@ -2660,7 +2652,7 @@ int fio_backend(struct sk_out *sk_out)
 		}
 		fio_sem_remove(td->sem);
 		td->sem = NULL;
-	}
+	} end_for_each();
 
 	free_disk_util();
 	if (cgroup_list) {
diff --git a/dedupe.c b/dedupe.c
index 8214a786..61705689 100644
--- a/dedupe.c
+++ b/dedupe.c
@@ -7,16 +7,13 @@
  */
 int init_global_dedupe_working_set_seeds(void)
 {
-	int i;
-	struct thread_data *td;
-
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!td->o.dedupe_global)
 			continue;
 
 		if (init_dedupe_working_set_seeds(td, 1))
 			return 1;
-	}
+	} end_for_each();
 
 	return 0;
 }
diff --git a/engines/libblkio.c b/engines/libblkio.c
index 054aa800..ee42d11c 100644
--- a/engines/libblkio.c
+++ b/engines/libblkio.c
@@ -283,16 +283,14 @@ static bool possibly_null_strs_equal(const char *a, const char *b)
  */
 static int total_threaded_subjobs(bool hipri)
 {
-	struct thread_data *td;
-	unsigned int i;
 	int count = 0;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		const struct fio_blkio_options *options = td->eo;
 		if (strcmp(td->o.ioengine, "libblkio") == 0 &&
 		    td->o.use_thread && (bool)options->hipri == hipri)
 			++count;
-	}
+	} end_for_each();
 
 	return count;
 }
diff --git a/eta.c b/eta.c
index 6017ca31..ce1c6f2d 100644
--- a/eta.c
+++ b/eta.c
@@ -381,8 +381,8 @@ bool eta_time_within_slack(unsigned int time)
  */
 bool calc_thread_status(struct jobs_eta *je, int force)
 {
-	struct thread_data *td;
-	int i, unified_rw_rep;
+	int unified_rw_rep;
+	bool any_td_in_ramp;
 	uint64_t rate_time, disp_time, bw_avg_time, *eta_secs;
 	unsigned long long io_bytes[DDIR_RWDIR_CNT] = {};
 	unsigned long long io_iops[DDIR_RWDIR_CNT] = {};
@@ -416,7 +416,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 	bw_avg_time = ULONG_MAX;
 	unified_rw_rep = 0;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		unified_rw_rep += td->o.unified_rw_rep;
 		if (is_power_of_2(td->o.kb_base))
 			je->is_pow2 = 1;
@@ -458,9 +458,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 			je->nr_pending++;
 
 		if (je->elapsed_sec >= 3)
-			eta_secs[i] = thread_eta(td);
+			eta_secs[__td_index] = thread_eta(td);
 		else
-			eta_secs[i] = INT_MAX;
+			eta_secs[__td_index] = INT_MAX;
 
 		check_str_update(td);
 
@@ -477,26 +477,26 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 				}
 			}
 		}
-	}
+	} end_for_each();
 
 	if (exitall_on_terminate) {
 		je->eta_sec = INT_MAX;
-		for_each_td(td, i) {
-			if (eta_secs[i] < je->eta_sec)
-				je->eta_sec = eta_secs[i];
-		}
+		for_each_td_index() {
+			if (eta_secs[__td_index] < je->eta_sec)
+				je->eta_sec = eta_secs[__td_index];
+		} end_for_each();
 	} else {
 		unsigned long eta_stone = 0;
 
 		je->eta_sec = 0;
-		for_each_td(td, i) {
+		for_each_td(td) {
 			if ((td->runstate == TD_NOT_CREATED) && td->o.stonewall)
-				eta_stone += eta_secs[i];
+				eta_stone += eta_secs[__td_index];
 			else {
-				if (eta_secs[i] > je->eta_sec)
-					je->eta_sec = eta_secs[i];
+				if (eta_secs[__td_index] > je->eta_sec)
+					je->eta_sec = eta_secs[__td_index];
 			}
-		}
+		} end_for_each();
 		je->eta_sec += eta_stone;
 	}
 
@@ -505,7 +505,11 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	fio_gettime(&now, NULL);
 	rate_time = mtime_since(&rate_prev_time, &now);
 
-	if (write_bw_log && rate_time > bw_avg_time && !in_ramp_time(td)) {
+	any_td_in_ramp = false;
+	for_each_td(td) {
+		any_td_in_ramp |= in_ramp_time(td);
+	} end_for_each();
+	if (write_bw_log && rate_time > bw_avg_time && !any_td_in_ramp) {
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
 				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
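Before this hunk, `calc_thread_status()` called `in_ramp_time(td)` with whatever `td` was left over after the preceding `for_each_td()` loops, which was no longer a valid job. The patched code instead ORs `in_ramp_time()` across every job. Judging by the shortlog, this appears to be the `--bandwidth-log` segfault fix. The sketch below shows the "any job still ramping" reduction on a hypothetical job array, with a plain `bool` standing in for `in_ramp_time()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: in_ramp models in_ramp_time(td). */
struct sketch_job {
	bool in_ramp;
};

/* True if at least one job is still within its ramp period. */
static bool sketch_any_in_ramp(const struct sketch_job *jobs, int nr_jobs)
{
	bool any = false;
	int i;

	assert(nr_jobs >= 0);
	for (i = 0; i < nr_jobs; i++)
		any |= jobs[i].in_ramp;
	return any;
}
```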
diff --git a/fio.h b/fio.h
index 09c44149..32535517 100644
--- a/fio.h
+++ b/fio.h
@@ -753,9 +753,24 @@ extern void lat_target_reset(struct thread_data *);
 
 /*
  * Iterates all threads/processes within all the defined jobs
+ * Usage:
+ *		for_each_td(var_name_for_td) {
+ *			<< body of your loop >>
+ *			Note: internally-scoped loop index available as __td_index
+ *		} end_for_each();
  */
-#define for_each_td(td, i)	\
-	for ((i) = 0, (td) = &segments[0].threads[0]; (i) < (int) thread_number; (i)++, (td) = tnumber_to_td((i)))
+#define for_each_td(td)			\
+{								\
+	int __td_index;				\
+	struct thread_data *(td);	\
+	for (__td_index = 0, (td) = &segments[0].threads[0];\
+		__td_index < (int) thread_number; __td_index++, (td) = tnumber_to_td(__td_index))
+#define for_each_td_index()	    \
+{								\
+	int __td_index;				\
+	for (__td_index = 0; __td_index < (int) thread_number; __td_index++)
+#define	end_for_each()	}
+
 #define for_each_file(td, f, i)	\
 	if ((td)->files_index)						\
 		for ((i) = 0, (f) = (td)->files[0];			\
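The new `for_each_td()` and `for_each_td_index()` macros open a brace and declare the loop variables inside it, and `end_for_each()` closes that brace. As a result, `td` and the index are out of scope once the loop ends, and any later use of a stale `td` fails to compile. The sketch below copies that shape onto a hypothetical fixed-size array. It uses `td_index_` instead of `__td_index` to avoid a reserved identifier, and it indexes the array directly rather than calling fio's `tnumber_to_td()`.

```c
#include <assert.h>

/* Hypothetical job table standing in for fio's segments/threads. */
struct sketch_td {
	int runstate;
};

#define SKETCH_MAX_TD	4

static struct sketch_td sketch_threads[SKETCH_MAX_TD];
static int sketch_thread_number = SKETCH_MAX_TD;

/* Opens a scope that owns both the index and the td pointer. */
#define sketch_for_each_td(td)					\
{								\
	int td_index_;						\
	struct sketch_td *(td);					\
	for (td_index_ = 0, (td) = &sketch_threads[0];		\
	     td_index_ < sketch_thread_number;			\
	     td_index_++, (td) = &sketch_threads[td_index_])

/* Index-only variant, like for_each_td_index(). */
#define sketch_for_each_td_index()				\
{								\
	int td_index_;						\
	for (td_index_ = 0; td_index_ < sketch_thread_number; td_index_++)

/* Closes the scope opened by either macro above. */
#define sketch_end_for_each()	}

static int sketch_count_at_least(int min_state)
{
	int n = 0;

	sketch_for_each_td(td) {
		if (td->runstate >= min_state)
			n++;
	} sketch_end_for_each();

	/* 'td' is out of scope here; using it would not compile. */
	return n;
}

static int sketch_sum_indices(void)
{
	int sum = 0;

	sketch_for_each_td_index() {
		sum += td_index_;
	} sketch_end_for_each();

	return sum;
}
```

The trailing `;` after `end_for_each()` expands to an empty statement, which is why call sites in the diff can write `} end_for_each();`.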
diff --git a/init.c b/init.c
index 78c6c803..442dab42 100644
--- a/init.c
+++ b/init.c
@@ -1405,15 +1405,14 @@ static void gen_log_name(char *name, size_t size, const char *logtype,
 
 static int check_waitees(char *waitee)
 {
-	struct thread_data *td;
-	int i, ret = 0;
+	int ret = 0;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->subjob_number)
 			continue;
 
 		ret += !strcmp(td->o.name, waitee);
-	}
+	} end_for_each();
 
 	return ret;
 }
@@ -1448,10 +1447,7 @@ static bool wait_for_ok(const char *jobname, struct thread_options *o)
 
 static int verify_per_group_options(struct thread_data *td, const char *jobname)
 {
-	struct thread_data *td2;
-	int i;
-
-	for_each_td(td2, i) {
+	for_each_td(td2) {
 		if (td->groupid != td2->groupid)
 			continue;
 
@@ -1461,7 +1457,7 @@ static int verify_per_group_options(struct thread_data *td, const char *jobname)
 				jobname);
 			return 1;
 		}
-	}
+	} end_for_each();
 
 	return 0;
 }
diff --git a/iolog.c b/iolog.c
index ea779632..cc2cbc65 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1875,9 +1875,7 @@ void td_writeout_logs(struct thread_data *td, bool unit_logs)
 
 void fio_writeout_logs(bool unit_logs)
 {
-	struct thread_data *td;
-	int i;
-
-	for_each_td(td, i)
+	for_each_td(td) {
 		td_writeout_logs(td, unit_logs);
+	} end_for_each();
 }
diff --git a/libfio.c b/libfio.c
index ac521974..a52014ce 100644
--- a/libfio.c
+++ b/libfio.c
@@ -240,13 +240,11 @@ void fio_mark_td_terminate(struct thread_data *td)
 
 void fio_terminate_threads(unsigned int group_id, unsigned int terminate)
 {
-	struct thread_data *td;
 	pid_t pid = getpid();
-	int i;
 
 	dprint(FD_PROCESS, "terminate group_id=%d\n", group_id);
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if ((terminate == TERMINATE_GROUP && group_id == TERMINATE_ALL) ||
 		    (terminate == TERMINATE_GROUP && group_id == td->groupid) ||
 		    (terminate == TERMINATE_STONEWALL && td->runstate >= TD_RUNNING) ||
@@ -274,22 +272,20 @@ void fio_terminate_threads(unsigned int group_id, unsigned int terminate)
 					ops->terminate(td);
 			}
 		}
-	}
+	} end_for_each();
 }
 
 int fio_running_or_pending_io_threads(void)
 {
-	struct thread_data *td;
-	int i;
 	int nr_io_threads = 0;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->io_ops_init && td_ioengine_flagged(td, FIO_NOIO))
 			continue;
 		nr_io_threads++;
 		if (td->runstate < TD_EXITED)
 			return 1;
-	}
+	} end_for_each();
 
 	if (!nr_io_threads)
 		return -1; /* we only had cpuio threads to begin with */
diff --git a/rate-submit.c b/rate-submit.c
index 3cc17eaa..103a80aa 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -12,8 +12,7 @@
 
 static void check_overlap(struct io_u *io_u)
 {
-	int i, res;
-	struct thread_data *td;
+	int res;
 
 	/*
 	 * Allow only one thread to check for overlap at a time to prevent two
@@ -31,7 +30,7 @@ static void check_overlap(struct io_u *io_u)
 	assert(res == 0);
 
 retry:
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->runstate <= TD_SETTING_UP ||
 		    td->runstate >= TD_FINISHING ||
 		    !td->o.serialize_overlap ||
@@ -46,7 +45,7 @@ retry:
 		res = pthread_mutex_lock(&overlap_check);
 		assert(res == 0);
 		goto retry;
-	}
+	} end_for_each();
 }
 
 static int io_workqueue_fn(struct submit_worker *sw,
diff --git a/stat.c b/stat.c
index b963973a..e0a2dcc6 100644
--- a/stat.c
+++ b/stat.c
@@ -2366,7 +2366,6 @@ void init_thread_stat(struct thread_stat *ts)
 
 static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
 {
-	struct thread_data *td;
 	struct thread_stat *ts;
 	int i, j, last_ts, idx;
 	enum fio_ddir ddir;
@@ -2380,7 +2379,7 @@ static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
 	 * store a 1 in ts->disable_prio_stat, and then do an additional
 	 * loop at the end where we invert the ts->disable_prio_stat values.
 	 */
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!td->o.stats)
 			continue;
 		if (idx &&
@@ -2407,7 +2406,7 @@ static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
 		}
 
 		idx++;
-	}
+	} end_for_each();
 
 	/* Loop through all dst threadstats and fixup the values. */
 	for (i = 0; i < nr_ts; i++) {
@@ -2419,7 +2418,6 @@ static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
 void __show_run_stats(void)
 {
 	struct group_run_stats *runstats, *rs;
-	struct thread_data *td;
 	struct thread_stat *threadstats, *ts;
 	int i, j, k, nr_ts, last_ts, idx;
 	bool kb_base_warned = false;
@@ -2440,7 +2438,7 @@ void __show_run_stats(void)
 	 */
 	nr_ts = 0;
 	last_ts = -1;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!td->o.group_reporting) {
 			nr_ts++;
 			continue;
@@ -2452,7 +2450,7 @@ void __show_run_stats(void)
 
 		last_ts = td->groupid;
 		nr_ts++;
-	}
+	} end_for_each();
 
 	threadstats = malloc(nr_ts * sizeof(struct thread_stat));
 	opt_lists = malloc(nr_ts * sizeof(struct flist_head *));
@@ -2467,7 +2465,7 @@ void __show_run_stats(void)
 	j = 0;
 	last_ts = -1;
 	idx = 0;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!td->o.stats)
 			continue;
 		if (idx && (!td->o.group_reporting ||
@@ -2569,7 +2567,7 @@ void __show_run_stats(void)
 		}
 		else
 			ts->ss_dur = ts->ss_state = 0;
-	}
+	} end_for_each();
 
 	for (i = 0; i < nr_ts; i++) {
 		unsigned long long bw;
@@ -2722,17 +2720,15 @@ void __show_run_stats(void)
 
 int __show_running_run_stats(void)
 {
-	struct thread_data *td;
 	unsigned long long *rt;
 	struct timespec ts;
-	int i;
 
 	fio_sem_down(stat_sem);
 
 	rt = malloc(thread_number * sizeof(unsigned long long));
 	fio_gettime(&ts, NULL);
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->runstate >= TD_EXITED)
 			continue;
 
@@ -2742,16 +2738,16 @@ int __show_running_run_stats(void)
 		}
 		td->ts.total_run_time = mtime_since(&td->epoch, &ts);
 
-		rt[i] = mtime_since(&td->start, &ts);
+		rt[__td_index] = mtime_since(&td->start, &ts);
 		if (td_read(td) && td->ts.io_bytes[DDIR_READ])
-			td->ts.runtime[DDIR_READ] += rt[i];
+			td->ts.runtime[DDIR_READ] += rt[__td_index];
 		if (td_write(td) && td->ts.io_bytes[DDIR_WRITE])
-			td->ts.runtime[DDIR_WRITE] += rt[i];
+			td->ts.runtime[DDIR_WRITE] += rt[__td_index];
 		if (td_trim(td) && td->ts.io_bytes[DDIR_TRIM])
-			td->ts.runtime[DDIR_TRIM] += rt[i];
-	}
+			td->ts.runtime[DDIR_TRIM] += rt[__td_index];
+	} end_for_each();
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->runstate >= TD_EXITED)
 			continue;
 		if (td->rusage_sem) {
@@ -2759,21 +2755,21 @@ int __show_running_run_stats(void)
 			fio_sem_down(td->rusage_sem);
 		}
 		td->update_rusage = 0;
-	}
+	} end_for_each();
 
 	__show_run_stats();
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->runstate >= TD_EXITED)
 			continue;
 
 		if (td_read(td) && td->ts.io_bytes[DDIR_READ])
-			td->ts.runtime[DDIR_READ] -= rt[i];
+			td->ts.runtime[DDIR_READ] -= rt[__td_index];
 		if (td_write(td) && td->ts.io_bytes[DDIR_WRITE])
-			td->ts.runtime[DDIR_WRITE] -= rt[i];
+			td->ts.runtime[DDIR_WRITE] -= rt[__td_index];
 		if (td_trim(td) && td->ts.io_bytes[DDIR_TRIM])
-			td->ts.runtime[DDIR_TRIM] -= rt[i];
-	}
+			td->ts.runtime[DDIR_TRIM] -= rt[__td_index];
+	} end_for_each();
 
 	free(rt);
 	fio_sem_up(stat_sem);
@@ -3554,15 +3550,13 @@ static int add_iops_samples(struct thread_data *td, struct timespec *t)
  */
 int calc_log_samples(void)
 {
-	struct thread_data *td;
 	unsigned int next = ~0U, tmp = 0, next_mod = 0, log_avg_msec_min = -1U;
 	struct timespec now;
-	int i;
 	long elapsed_time = 0;
 
 	fio_gettime(&now, NULL);
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		elapsed_time = mtime_since_now(&td->epoch);
 
 		if (!td->o.stats)
@@ -3589,7 +3583,7 @@ int calc_log_samples(void)
 
 		if (tmp < next)
 			next = tmp;
-	}
+	} end_for_each();
 
 	/* if log_avg_msec_min has not been changed, set it to 0 */
 	if (log_avg_msec_min == -1U)
diff --git a/steadystate.c b/steadystate.c
index ad19318c..14cdf0ed 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -23,8 +23,8 @@ static void steadystate_alloc(struct thread_data *td)
 
 void steadystate_setup(void)
 {
-	struct thread_data *td, *prev_td;
-	int i, prev_groupid;
+	struct thread_data *prev_td;
+	int prev_groupid;
 
 	if (!steadystate_enabled)
 		return;
@@ -36,7 +36,7 @@ void steadystate_setup(void)
 	 */
 	prev_groupid = -1;
 	prev_td = NULL;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (!td->ss.dur)
 			continue;
 
@@ -51,7 +51,7 @@ void steadystate_setup(void)
 			prev_groupid = td->groupid;
 		}
 		prev_td = td;
-	}
+	} end_for_each();
 
 	if (prev_td && prev_td->o.group_reporting)
 		steadystate_alloc(prev_td);
@@ -198,16 +198,15 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 
 int steadystate_check(void)
 {
-	int i, j, ddir, prev_groupid, group_ramp_time_over = 0;
+	int ddir, prev_groupid, group_ramp_time_over = 0;
 	unsigned long rate_time;
-	struct thread_data *td, *td2;
 	struct timespec now;
 	uint64_t group_bw = 0, group_iops = 0;
 	uint64_t td_iops, td_bytes;
 	bool ret;
 
 	prev_groupid = -1;
-	for_each_td(td, i) {
+	for_each_td(td) {
 		const bool needs_lock = td_async_processing(td);
 		struct steadystate_data *ss = &td->ss;
 
@@ -271,7 +270,7 @@ int steadystate_check(void)
 		dprint(FD_STEADYSTATE, "steadystate_check() thread: %d, "
 					"groupid: %u, rate_msec: %ld, "
 					"iops: %llu, bw: %llu, head: %d, tail: %d\n",
-					i, td->groupid, rate_time,
+					__td_index, td->groupid, rate_time,
 					(unsigned long long) group_iops,
 					(unsigned long long) group_bw,
 					ss->head, ss->tail);
@@ -283,18 +282,18 @@ int steadystate_check(void)
 
 		if (ret) {
 			if (td->o.group_reporting) {
-				for_each_td(td2, j) {
+				for_each_td(td2) {
 					if (td2->groupid == td->groupid) {
 						td2->ss.state |= FIO_SS_ATTAINED;
 						fio_mark_td_terminate(td2);
 					}
-				}
+				} end_for_each();
 			} else {
 				ss->state |= FIO_SS_ATTAINED;
 				fio_mark_td_terminate(td);
 			}
 		}
-	}
+	} end_for_each();
 	return 0;
 }
 
@@ -302,8 +301,6 @@ int td_steadystate_init(struct thread_data *td)
 {
 	struct steadystate_data *ss = &td->ss;
 	struct thread_options *o = &td->o;
-	struct thread_data *td2;
-	int j;
 
 	memset(ss, 0, sizeof(*ss));
 
@@ -325,7 +322,7 @@ int td_steadystate_init(struct thread_data *td)
 	}
 
 	/* make sure that ss options are consistent within reporting group */
-	for_each_td(td2, j) {
+	for_each_td(td2) {
 		if (td2->groupid == td->groupid) {
 			struct steadystate_data *ss2 = &td2->ss;
 
@@ -339,7 +336,7 @@ int td_steadystate_init(struct thread_data *td)
 				return 1;
 			}
 		}
-	}
+	} end_for_each();
 
 	return 0;
 }
diff --git a/verify.c b/verify.c
index ddfadcc8..e7e4c69c 100644
--- a/verify.c
+++ b/verify.c
@@ -1568,10 +1568,9 @@ static int fill_file_completions(struct thread_data *td,
 struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 {
 	struct all_io_list *rep;
-	struct thread_data *td;
 	size_t depth;
 	void *next;
-	int i, nr;
+	int nr;
 
 	compiletime_assert(sizeof(struct all_io_list) == 8, "all_io_list");
 
@@ -1581,14 +1580,14 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 	 */
 	depth = 0;
 	nr = 0;
-	for_each_td(td, i) {
-		if (save_mask != IO_LIST_ALL && (i + 1) != save_mask)
+	for_each_td(td) {
+		if (save_mask != IO_LIST_ALL && (__td_index + 1) != save_mask)
 			continue;
 		td->stop_io = 1;
 		td->flags |= TD_F_VSTATE_SAVED;
 		depth += (td->o.iodepth * td->o.nr_files);
 		nr++;
-	}
+	} end_for_each();
 
 	if (!nr)
 		return NULL;
@@ -1602,11 +1601,11 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 	rep->threads = cpu_to_le64((uint64_t) nr);
 
 	next = &rep->state[0];
-	for_each_td(td, i) {
+	for_each_td(td) {
 		struct thread_io_list *s = next;
 		unsigned int comps, index = 0;
 
-		if (save_mask != IO_LIST_ALL && (i + 1) != save_mask)
+		if (save_mask != IO_LIST_ALL && (__td_index + 1) != save_mask)
 			continue;
 
 		comps = fill_file_completions(td, s, &index);
@@ -1615,7 +1614,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 		s->depth = cpu_to_le64((uint64_t) td->o.iodepth);
 		s->nofiles = cpu_to_le64((uint64_t) td->o.nr_files);
 		s->numberio = cpu_to_le64((uint64_t) td->io_issues[DDIR_WRITE]);
-		s->index = cpu_to_le64((uint64_t) i);
+		s->index = cpu_to_le64((uint64_t) __td_index);
 		if (td->random_state.use64) {
 			s->rand.state64.s[0] = cpu_to_le64(td->random_state.state64.s1);
 			s->rand.state64.s[1] = cpu_to_le64(td->random_state.state64.s2);
@@ -1633,7 +1632,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 		}
 		snprintf((char *) s->name, sizeof(s->name), "%s", td->o.name);
 		next = io_list_next(s);
-	}
+	} end_for_each();
 
 	return rep;
 }
diff --git a/zbd.c b/zbd.c
index d6f8f800..f5fb923a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -524,11 +524,10 @@ out:
 /* Verify whether direct I/O is used for all host-managed zoned block drives. */
 static bool zbd_using_direct_io(void)
 {
-	struct thread_data *td;
 	struct fio_file *f;
-	int i, j;
+	int j;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->o.odirect || !(td->o.td_ddir & TD_DDIR_WRITE))
 			continue;
 		for_each_file(td, f, j) {
@@ -536,7 +535,7 @@ static bool zbd_using_direct_io(void)
 			    f->zbd_info->model == ZBD_HOST_MANAGED)
 				return false;
 		}
-	}
+	} end_for_each();
 
 	return true;
 }
@@ -639,27 +638,25 @@ static bool zbd_zone_align_file_sizes(struct thread_data *td,
  */
 static bool zbd_verify_sizes(void)
 {
-	struct thread_data *td;
 	struct fio_file *f;
-	int i, j;
+	int j;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		for_each_file(td, f, j) {
 			if (!zbd_zone_align_file_sizes(td, f))
 				return false;
 		}
-	}
+	} end_for_each();
 
 	return true;
 }
 
 static bool zbd_verify_bs(void)
 {
-	struct thread_data *td;
 	struct fio_file *f;
-	int i, j;
+	int j;
 
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td_trim(td) &&
 		    (td->o.min_bs[DDIR_TRIM] != td->o.max_bs[DDIR_TRIM] ||
 		     td->o.bssplit_nr[DDIR_TRIM])) {
@@ -680,7 +677,7 @@ static bool zbd_verify_bs(void)
 				return false;
 			}
 		}
-	}
+	} end_for_each();
 	return true;
 }
 
@@ -1010,11 +1007,10 @@ void zbd_free_zone_info(struct fio_file *f)
  */
 static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 {
-	struct thread_data *td2;
 	struct fio_file *f2;
-	int i, j, ret;
+	int j, ret;
 
-	for_each_td(td2, i) {
+	for_each_td(td2) {
 		for_each_file(td2, f2, j) {
 			if (td2 == td && f2 == file)
 				continue;
@@ -1025,7 +1021,7 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 			file->zbd_info->refcount++;
 			return 0;
 		}
-	}
+	} end_for_each();
 
 	ret = zbd_create_zone_info(td, file);
 	if (ret < 0)
@@ -1289,13 +1285,10 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
 
 static bool any_io_in_flight(void)
 {
-	struct thread_data *td;
-	int i;
-
-	for_each_td(td, i) {
+	for_each_td(td) {
 		if (td->io_u_in_flight)
 			return true;
-	}
+	} end_for_each();
 
 	return false;
 }

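The hunks above replace `for_each_td(td, i)` with a brace-balanced `for_each_td(td) { ... } end_for_each();` pair that declares the cursor itself and exposes the loop index as `__td_index`. The macro definitions are not part of this excerpt. Below is a minimal sketch of how such a pair can work, assuming a hypothetical fixed-size `threads[]` table in place of fio's real job storage; `MAX_JOBS`, `count_same_group_pairs()` and the trimmed `struct thread_data` are illustrative only.

```c
#include <assert.h>

/* Hypothetical, trimmed-down job descriptor: only the fields used below. */
struct thread_data {
	int groupid;
	int io_u_in_flight;
};

#define MAX_JOBS 4
static struct thread_data threads[MAX_JOBS];	/* hypothetical job table */
static int thread_number;			/* number of jobs in use */

/*
 * Sketch of a scoped iterator: for_each_td() opens a block that owns the
 * cursor and __td_index (name mirrors the diff), and end_for_each() supplies
 * the matching closing brace.
 */
#define for_each_td(td)						\
	{							\
		int __td_index;					\
		struct thread_data *td;				\
		for (__td_index = 0, td = &threads[0];		\
		     __td_index < thread_number;		\
		     __td_index++, td = &threads[__td_index])

#define end_for_each()	}

/* Like any_io_in_flight(): returning from inside the loop is fine. */
static int any_io_in_flight(void)
{
	for_each_td(td) {
		if (td->io_u_in_flight)
			return 1;
	} end_for_each();

	return 0;
}

/* Nested loops each get their own __td_index, which shadows the outer one. */
static int count_same_group_pairs(void)
{
	int pairs = 0;

	assert(thread_number <= MAX_JOBS);
	for_each_td(td) {
		int outer = __td_index;	/* copy before the inner loop shadows it */

		for_each_td(td2) {
			if (__td_index > outer && td2->groupid == td->groupid)
				pairs++;
		} end_for_each();
	} end_for_each();

	return pairs;
}
```

Because every expansion opens its own block, a nested loop such as the one in `steadystate_check()` works without extra declarations, but code that needs the outer index inside the inner loop has to copy it first, as `outer` does here.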

* Recent changes (master)
@ 2023-03-03 13:00 Jens Axboe
From: Jens Axboe @ 2023-03-03 13:00 UTC
  To: fio

The following changes since commit 5a37211238f995657c50e5d0ea6e5e22ff3ca69e:

  examples: add fiograph diagram for uring-cmd-fdp.fio (2023-02-28 13:58:58 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5f81856714671287e93e087af8943d3d1779dd5f:

  Merge branch 'fiologparser-fix' of https://github.com/patrakov/fio (2023-03-02 19:57:17 -0500)

----------------------------------------------------------------
Alexander Patrakov (1):
      fix fiologparser.py to work with new logging format

Vincent Fu (1):
      Merge branch 'fiologparser-fix' of https://github.com/patrakov/fio

 tools/fiologparser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 054f1f60..708c5d49 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -166,7 +166,7 @@ class TimeSeries(object):
         f = open(fn, 'r')
         p_time = 0
         for line in f:
-            (time, value, foo, bar) = line.rstrip('\r\n').rsplit(', ')
+            (time, value) = line.rstrip('\r\n').rsplit(', ')[:2]
             self.add_sample(p_time, int(time), int(value))
             p_time = int(time)
  

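The old line unpacked exactly four fields, so any log line whose column count was not four raised an error; the fix keeps only the first two fields and ignores the rest. The same tolerant parsing is sketched below in C, the language of the rest of fio, using a hypothetical `parse_time_value()` helper. The sample lines in the checks are illustrative, not taken from a real log.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the leading "time, value" pair of one fio log line and ignore any
 * further columns or the trailing "\r\n".
 * Returns 0 on success, -1 if the first two fields are not integers.
 */
static int parse_time_value(const char *line, unsigned long long *time_ms,
			    unsigned long long *value)
{
	char *end;

	assert(line && time_ms && value);

	errno = 0;
	*time_ms = strtoull(line, &end, 10);
	if (end == line || errno || *end != ',')
		return -1;

	/* Skip the comma; strtoull() skips the blank that follows it. */
	line = end + 1;
	*value = strtoull(line, &end, 10);
	if (end == line || errno)
		return -1;

	/* Whatever follows the value is deliberately ignored. */
	return 0;
}
```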

* Recent changes (master)
@ 2023-03-01 13:00 Jens Axboe
From: Jens Axboe @ 2023-03-01 13:00 UTC
  To: fio

The following changes since commit 8d94106730d11047f313caadda87e450f242f53c:

  Merge branch 'master' of https://github.com/Cuelive/fio (2023-02-28 05:55:55 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5a37211238f995657c50e5d0ea6e5e22ff3ca69e:

  examples: add fiograph diagram for uring-cmd-fdp.fio (2023-02-28 13:58:58 -0500)

----------------------------------------------------------------
Horshack (3):
      ioengines.c:346: td_io_queue: Assertion `res == 0' failed
      Fix "verify bad_hdr rand_seed" for requeued I/Os
      Clarify documentation for runtime parameter

Jens Axboe (5):
      Merge branch 'Fix_Bad_Hdr_Rand_Seed_For_Requeued_IO' of https://github.com/horshack-dpreview/fio
      Merge branch 'Fix_Assert_TdIoQueue_Serialize_Overlap_Offload' of https://github.com/horshack-dpreview/fio
      fdp: cleanup init
      Revert "ioengines.c:346: td_io_queue: Assertion `res == 0' failed"
      Merge branch 'doc-Clarify_Runtime_Param' of https://github.com/horshack-dpreview/fio

Keith Busch (1):
      fio: add fdp support for io_uring_cmd nvme engine

Vincent Fu (2):
      fdp: change the order of includes to fix Windows build error
      examples: add fiograph diagram for uring-cmd-fdp.fio

 HOWTO.rst                  |  22 ++++++--
 Makefile                   |   2 +-
 backend.c                  |   7 ++-
 cconv.c                    |  10 ++++
 engines/io_uring.c         |  24 +++++++++
 engines/nvme.c             |  40 ++++++++++++++-
 engines/nvme.h             |  18 +++++++
 examples/uring-cmd-fdp.fio |  37 ++++++++++++++
 examples/uring-cmd-fdp.png | Bin 0 -> 50265 bytes
 fdp.c                      | 125 +++++++++++++++++++++++++++++++++++++++++++++
 fdp.h                      |  16 ++++++
 file.h                     |   3 ++
 filesetup.c                |   9 ++++
 fio.1                      |  19 +++++--
 io_u.c                     |   5 +-
 io_u.h                     |   4 ++
 ioengines.h                |   5 +-
 options.c                  |  49 ++++++++++++++++++
 server.h                   |   2 +-
 thread_options.h           |   9 ++++
 20 files changed, 391 insertions(+), 15 deletions(-)
 create mode 100644 examples/uring-cmd-fdp.fio
 create mode 100644 examples/uring-cmd-fdp.png
 create mode 100644 fdp.c
 create mode 100644 fdp.h

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 7a0535af..bbd9496e 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -686,10 +686,12 @@ Time related parameters
 
 .. option:: runtime=time
 
-	Tell fio to terminate processing after the specified period of time.  It
-	can be quite hard to determine for how long a specified job will run, so
-	this parameter is handy to cap the total runtime to a given time.  When
-	the unit is omitted, the value is interpreted in seconds.
+	Limit runtime. The test will run until it completes the configured I/O
+	workload or until it has run for this specified amount of time, whichever
+	occurs first. It can be quite hard to determine for how long a specified
+	job will run, so this parameter is handy to cap the total runtime to a
+	given time.  When the unit is omitted, the value is interpreted in
+	seconds.
 
 .. option:: time_based
 
@@ -2423,6 +2425,18 @@ with the caveat that when used on the command line, they must come after the
 	For direct I/O, requests will only succeed if cache invalidation isn't required,
 	file blocks are fully allocated and the disk request could be issued immediately.
 
+.. option:: fdp=bool : [io_uring_cmd]
+
+	Enable Flexible Data Placement mode for write commands.
+
+.. option:: fdp_pli=str : [io_uring_cmd]
+
+	Select which Placement ID Index/Indices this job is allowed to use for
+	writes. By default, the job will cycle through all available Placement
+	IDs, so use this to isolate these identifiers to specific jobs. If you
+	want fio to use only the placement identifiers at indices 0, 2 and 5,
+	specify ``fdp_pli=0,2,5``.
+
 .. option:: cpuload=int : [cpuio]
 
 	Attempt to use the specified percentage of CPU cycles. This is a mandatory
diff --git a/Makefile b/Makefile
index e4cde4ba..6d7fd4e2 100644
--- a/Makefile
+++ b/Makefile
@@ -62,7 +62,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
 		workqueue.c rate-submit.c optgroup.c helper_thread.c \
-		steadystate.c zone-dist.c zbd.c dedupe.c
+		steadystate.c zone-dist.c zbd.c dedupe.c fdp.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
diff --git a/backend.c b/backend.c
index f494c831..975ef489 100644
--- a/backend.c
+++ b/backend.c
@@ -1040,8 +1040,11 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		}
 
 		if (io_u->ddir == DDIR_WRITE && td->flags & TD_F_DO_VERIFY) {
-			io_u->numberio = td->io_issues[io_u->ddir];
-			populate_verify_io_u(td, io_u);
+			if (!(io_u->flags & IO_U_F_PATTERN_DONE)) {
+				io_u_set(td, io_u, IO_U_F_PATTERN_DONE);
+				io_u->numberio = td->io_issues[io_u->ddir];
+				populate_verify_io_u(td, io_u);
+			}
 		}
 
 		ddir = io_u->ddir;
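
The backend.c change guards pattern generation with the new `IO_U_F_PATTERN_DONE` flag, so a write that is requeued and resubmitted keeps the verify header it was first given instead of consuming fresh random state; `io_completed()` clears the flag again (see the io_u.c hunk below). A minimal sketch of that lifecycle, assuming a hypothetical `seed` counter as a stand-in for `populate_verify_io_u()`; only the `IO_U_F_PATTERN_DONE` value mirrors the patch, the other flag value and all helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; only IO_U_F_PATTERN_DONE mirrors io_u.h. */
enum {
	IO_U_F_FLIGHT		= 1 << 1,
	IO_U_F_PATTERN_DONE	= 1 << 8,
};

struct io_u {
	unsigned int flags;
	uint64_t seed;	/* stand-in for the data stamped into the verify header */
};

/* Simplified submit step, called again if the I/O is requeued. */
static void submit_write(struct io_u *io_u, uint64_t *seed_state)
{
	if (!(io_u->flags & IO_U_F_PATTERN_DONE)) {
		io_u->flags |= IO_U_F_PATTERN_DONE;
		io_u->seed = (*seed_state)++;	/* consumes state once per write */
	}
	io_u->flags |= IO_U_F_FLIGHT;
}

/* Completion clears the flag so the next write gets a fresh pattern. */
static void complete_io(struct io_u *io_u)
{
	io_u->flags &= ~(unsigned int)(IO_U_F_FLIGHT | IO_U_F_PATTERN_DONE);
}
```

Without the guard, the second `submit_write()` of a requeued I/O would advance `seed_state` again, which is the kind of mismatch the "verify bad_hdr rand_seed" fix addresses.
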
diff --git a/cconv.c b/cconv.c
index d755844f..05ac75e3 100644
--- a/cconv.c
+++ b/cconv.c
@@ -349,6 +349,11 @@ int convert_thread_options_to_cpu(struct thread_options *o,
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		o->merge_blktrace_iters[i].u.f = fio_uint64_to_double(le64_to_cpu(top->merge_blktrace_iters[i].u.i));
+
+	o->fdp = le32_to_cpu(top->fdp);
+	o->fdp_nrpli = le32_to_cpu(top->fdp_nrpli);
+	for (i = 0; i < o->fdp_nrpli; i++)
+		o->fdp_plis[i] = le32_to_cpu(top->fdp_plis[i]);
 #if 0
 	uint8_t cpumask[FIO_TOP_STR_MAX];
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
@@ -638,6 +643,11 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		top->merge_blktrace_iters[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->merge_blktrace_iters[i].u.f));
+
+	top->fdp = cpu_to_le32(o->fdp);
+	top->fdp_nrpli = cpu_to_le32(o->fdp_nrpli);
+	for (i = 0; i < o->fdp_nrpli; i++)
+		top->fdp_plis[i] = cpu_to_le32(o->fdp_plis[i]);
 #if 0
 	uint8_t cpumask[FIO_TOP_STR_MAX];
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
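
The cconv.c hunks serialise the new `fdp`, `fdp_nrpli` and `fdp_plis[]` options as little-endian 32-bit values for client/server mode. In `convert_thread_options_to_cpu()` the loop bound `fdp_nrpli` comes straight from the received packet and is not checked against `FIO_MAX_PLIS`. The sketch below shows the same round trip with portable byte-order helpers and adds that bound check; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLIS 16	/* mirrors FIO_MAX_PLIS from thread_options.h */

/* Hypothetical wire layout: every field stored little-endian. */
struct fdp_opts_pack {
	uint8_t fdp[4];
	uint8_t nrpli[4];
	uint8_t plis[MAX_PLIS][4];
};

/* Hypothetical host-side options. */
struct fdp_opts {
	unsigned int fdp;
	unsigned int nrpli;
	unsigned int plis[MAX_PLIS];
};

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void fdp_opts_to_net(struct fdp_opts_pack *top, const struct fdp_opts *o)
{
	assert(o->nrpli <= MAX_PLIS);
	put_le32(top->fdp, o->fdp);
	put_le32(top->nrpli, o->nrpli);
	for (unsigned int i = 0; i < o->nrpli; i++)
		put_le32(top->plis[i], o->plis[i]);
}

/* Returns -1 if the received count would overrun plis[]. */
static int fdp_opts_to_cpu(struct fdp_opts *o, const struct fdp_opts_pack *top)
{
	o->fdp = get_le32(top->fdp);
	o->nrpli = get_le32(top->nrpli);
	if (o->nrpli > MAX_PLIS)
		return -1;
	for (unsigned int i = 0; i < o->nrpli; i++)
		o->plis[i] = get_le32(top->plis[i]);
	return 0;
}
```
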
diff --git a/engines/io_uring.c b/engines/io_uring.c
index a9abd11d..5393758a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1262,6 +1262,29 @@ static int fio_ioring_cmd_get_max_open_zones(struct thread_data *td,
 	return fio_nvme_get_max_open_zones(td, f, max_open_zones);
 }
 
+static int fio_ioring_cmd_fetch_ruhs(struct thread_data *td, struct fio_file *f,
+				     struct fio_ruhs_info *fruhs_info)
+{
+	struct nvme_fdp_ruh_status *ruhs;
+	int bytes, ret, i;
+
+	bytes = sizeof(*ruhs) + 128 * sizeof(struct nvme_fdp_ruh_status_desc);
+	ruhs = scalloc(1, bytes);
+	if (!ruhs)
+		return -ENOMEM;
+
+	ret = fio_nvme_iomgmt_ruhs(td, f, ruhs, bytes);
+	if (ret)
+		goto free;
+
+	fruhs_info->nr_ruhs = le16_to_cpu(ruhs->nruhsd);
+	for (i = 0; i < fruhs_info->nr_ruhs; i++)
+		fruhs_info->plis[i] = le16_to_cpu(ruhs->ruhss[i].pid);
+free:
+	sfree(ruhs);
+	return ret;
+}
+
 static struct ioengine_ops ioengine_uring = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
@@ -1307,6 +1330,7 @@ static struct ioengine_ops ioengine_uring_cmd = {
 	.get_max_open_zones	= fio_ioring_cmd_get_max_open_zones,
 	.options		= options,
 	.option_struct_size	= sizeof(struct ioring_options),
+	.fdp_fetch_ruhs		= fio_ioring_cmd_fetch_ruhs,
 };
 
 static void fio_init fio_ioring_register(void)
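
`fio_ioring_cmd_fetch_ruhs()` sizes its reply buffer for 128 descriptors, then copies `nruhsd` placement IDs into `fruhs_info->plis`, which `init_ruh_info()` also allocates for 128 entries. The clamp to 128 in `init_ruh_info()` only runs after that copy, so a device reporting more than 128 handles would appear to be read and written past both buffers. A minimal sketch that clamps before copying, assuming host-endian copies of the structures (the patch converts with `le16_to_cpu()`); `MAX_RUHS`, `ruh_status_bytes()` and `copy_pids()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host-endian mirrors of the nvme.h structures. */
struct ruh_status_desc {
	uint16_t pid;
	uint16_t ruhid;
	uint32_t earutr;
	uint64_t ruamw;
	uint8_t  rsvd16[16];
};

struct ruh_status {
	uint8_t  rsvd0[14];
	uint16_t nruhsd;
	struct ruh_status_desc ruhss[];
};

#define MAX_RUHS 128	/* capacity the patch uses for both buffers */

/* Bytes needed for a reply buffer that can hold n descriptors. */
static size_t ruh_status_bytes(unsigned int n)
{
	return sizeof(struct ruh_status) + n * sizeof(struct ruh_status_desc);
}

/* Clamp the device-reported count before copying placement IDs. */
static unsigned int copy_pids(const struct ruh_status *st, uint16_t *plis)
{
	unsigned int n = st->nruhsd;

	if (n > MAX_RUHS)
		n = MAX_RUHS;
	for (unsigned int i = 0; i < n; i++)
		plis[i] = st->ruhss[i].pid;
	return n;
}
```
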
diff --git a/engines/nvme.c b/engines/nvme.c
index 9ffc5303..da18eba9 100644
--- a/engines/nvme.c
+++ b/engines/nvme.c
@@ -28,7 +28,8 @@ int fio_nvme_uring_cmd_prep(struct nvme_uring_cmd *cmd, struct io_u *io_u,
 	cmd->cdw10 = slba & 0xffffffff;
 	cmd->cdw11 = slba >> 32;
 	/* cdw12 represent number of lba's for read/write */
-	cmd->cdw12 = nlb;
+	cmd->cdw12 = nlb | (io_u->dtype << 20);
+	cmd->cdw13 = io_u->dspec << 16;
 	if (iov) {
 		iov->iov_base = io_u->xfer_buf;
 		iov->iov_len = io_u->xfer_buflen;
@@ -345,3 +346,40 @@ out:
 	close(fd);
 	return ret;
 }
+
+static inline int nvme_fdp_reclaim_unit_handle_status(int fd, __u32 nsid,
+						      __u32 data_len, void *data)
+{
+	struct nvme_passthru_cmd cmd = {
+		.opcode		= nvme_cmd_io_mgmt_recv,
+		.nsid		= nsid,
+		.addr		= (__u64)(uintptr_t)data,
+		.data_len 	= data_len,
+		.cdw10		= 1,
+		.cdw11		= (data_len >> 2) - 1,
+	};
+
+	return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
+}
+
+int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
+			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	int fd, ret;
+
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	ret = nvme_fdp_reclaim_unit_handle_status(fd, data->nsid, bytes, ruhs);
+	if (ret) {
+		log_err("%s: nvme_fdp_reclaim_unit_handle_status failed, err=%d\n",
+			f->file_name, ret);
+		errno = ENOTSUP;
+	} else
+		errno = 0;
+
+	close(fd);
+	return -errno;
+}
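
The nvme.c changes put `io_u->dtype` in bits 23:20 of CDW12 and `io_u->dspec` in bits 31:16 of CDW13; per the NVMe specification these should be the Directive Type and Directive Specific fields of a Write, and `fdp_fill_dspec_data()` uses directive type 2 (Data Placement). The I/O Management Receive helper sets CDW10 to 1 (Reclaim Unit Handle Status) and CDW11 to the transfer length in dwords, zero-based. A small sketch of that arithmetic with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* CDW12 of a Write: NLB (zero-based) in bits 15:0, DTYPE in bits 23:20. */
static uint32_t rw_cdw12(uint16_t nlb_zero_based, uint8_t dtype)
{
	return (uint32_t)nlb_zero_based | ((uint32_t)(dtype & 0xf) << 20);
}

/* CDW13 of a Write: DSPEC (here the placement identifier) in bits 31:16. */
static uint32_t rw_cdw13(uint16_t dspec)
{
	return (uint32_t)dspec << 16;
}

/* I/O Management Receive CDW11: number of dwords to transfer, zero-based. */
static uint32_t iomgmt_numd(uint32_t data_len)
{
	assert(data_len >= 4 && data_len % 4 == 0);
	return (data_len >> 2) - 1;
}
```
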
diff --git a/engines/nvme.h b/engines/nvme.h
index 70a89b74..1c0e526b 100644
--- a/engines/nvme.h
+++ b/engines/nvme.h
@@ -67,6 +67,7 @@ enum nvme_admin_opcode {
 enum nvme_io_opcode {
 	nvme_cmd_write			= 0x01,
 	nvme_cmd_read			= 0x02,
+	nvme_cmd_io_mgmt_recv		= 0x12,
 	nvme_zns_cmd_mgmt_send		= 0x79,
 	nvme_zns_cmd_mgmt_recv		= 0x7a,
 };
@@ -192,6 +193,23 @@ struct nvme_zone_report {
 	struct nvme_zns_desc	entries[];
 };
 
+struct nvme_fdp_ruh_status_desc {
+	__u16 pid;
+	__u16 ruhid;
+	__u32 earutr;
+	__u64 ruamw;
+	__u8  rsvd16[16];
+};
+
+struct nvme_fdp_ruh_status {
+	__u8  rsvd0[14];
+	__le16 nruhsd;
+	struct nvme_fdp_ruh_status_desc ruhss[];
+};
+
+int fio_nvme_iomgmt_ruhs(struct thread_data *td, struct fio_file *f,
+			 struct nvme_fdp_ruh_status *ruhs, __u32 bytes);
+
 int fio_nvme_get_info(struct fio_file *f, __u32 *nsid, __u32 *lba_sz,
 		      __u64 *nlba);
 
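The declarations above imply a 16-byte reply header, with `nruhsd` at byte 14, followed by 32-byte descriptors. A sketch that pins that layout down at compile time, using hypothetical `<stdint.h>` copies of the two structures (C11 `_Static_assert`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the nvme.h structures with <stdint.h> types. */
struct fdp_ruh_status_desc {
	uint16_t pid;
	uint16_t ruhid;
	uint32_t earutr;
	uint64_t ruamw;
	uint8_t  rsvd16[16];
};

struct fdp_ruh_status {
	uint8_t  rsvd0[14];
	uint16_t nruhsd;
	struct fdp_ruh_status_desc ruhss[];
};

/* Fail the build if the compiler lays the structures out differently. */
_Static_assert(sizeof(struct fdp_ruh_status_desc) == 32, "descriptor is 32 bytes");
_Static_assert(offsetof(struct fdp_ruh_status_desc, ruamw) == 8, "ruamw at byte 8");
_Static_assert(offsetof(struct fdp_ruh_status, nruhsd) == 14, "nruhsd at byte 14");
_Static_assert(offsetof(struct fdp_ruh_status, ruhss) == 16, "descriptors at byte 16");
```
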
diff --git a/examples/uring-cmd-fdp.fio b/examples/uring-cmd-fdp.fio
new file mode 100644
index 00000000..55d741d3
--- /dev/null
+++ b/examples/uring-cmd-fdp.fio
@@ -0,0 +1,37 @@
+# io_uring_cmd I/O engine for nvme-ns generic character device with FDP enabled
+# This assumes the namespace is already configured with FDP support and has at
+# least 8 available reclaim units.
+#
+# Each job targets different ranges of LBAs with different placement
+# identifiers, and has different write intensity.
+
+[global]
+filename=/dev/ng0n1
+ioengine=io_uring_cmd
+cmd_type=nvme
+iodepth=32
+bs=4K
+fdp=1
+time_based=1
+runtime=1000
+
+[write-heavy]
+rw=randrw
+rwmixwrite=90
+fdp_pli=0,1,2,3
+offset=0%
+size=30%
+
+[write-mid]
+rw=randrw
+rwmixwrite=30
+fdp_pli=4,5
+offset=30%
+size=30%
+
+[write-light]
+rw=randrw
+rwmixwrite=10
+fdp_pli=6
+offset=60%
+size=30%
diff --git a/examples/uring-cmd-fdp.png b/examples/uring-cmd-fdp.png
new file mode 100644
index 00000000..251f4fe3
Binary files /dev/null and b/examples/uring-cmd-fdp.png differ
diff --git a/fdp.c b/fdp.c
new file mode 100644
index 00000000..84e04fce
--- /dev/null
+++ b/fdp.c
@@ -0,0 +1,125 @@
+/*
+ * Note: This is similar to a very basic setup
+ * of ZBD devices
+ *
+ * Specify fdp=1 (With char devices /dev/ng0n1)
+ */
+
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include "fio.h"
+#include "file.h"
+
+#include "pshared.h"
+#include "fdp.h"
+
+static int fdp_ruh_info(struct thread_data *td, struct fio_file *f,
+			struct fio_ruhs_info *ruhs)
+{
+	int ret = -EINVAL;
+
+	if (!td->io_ops) {
+		log_err("fio: no ops set in fdp init?!\n");
+		return ret;
+	}
+
+	if (td->io_ops->fdp_fetch_ruhs) {
+		ret = td->io_ops->fdp_fetch_ruhs(td, f, ruhs);
+		if (ret < 0) {
+			td_verror(td, errno, "fdp fetch ruhs failed");
+			log_err("%s: fdp fetch ruhs failed (%d)\n",
+				f->file_name, errno);
+		}
+	} else {
+		log_err("%s: engine (%s) lacks fetch ruhs\n",
+			f->file_name, td->io_ops->name);
+	}
+
+	return ret;
+}
+
+static int init_ruh_info(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_ruhs_info *ruhs, *tmp;
+	int i, ret;
+
+	ruhs = scalloc(1, sizeof(*ruhs) + 128 * sizeof(*ruhs->plis));
+	if (!ruhs)
+		return -ENOMEM;
+
+	ret = fdp_ruh_info(td, f, ruhs);
+	if (ret) {
+		log_info("fio: ruh info failed for %s (%d)\n",
+			 f->file_name, -ret);
+		goto out;
+	}
+
+	if (ruhs->nr_ruhs > 128)
+		ruhs->nr_ruhs = 128;
+
+	if (td->o.fdp_nrpli == 0) {
+		f->ruhs_info = ruhs;
+		return 0;
+	}
+
+	for (i = 0; i < td->o.fdp_nrpli; i++) {
+		if (td->o.fdp_plis[i] > ruhs->nr_ruhs) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	tmp = scalloc(1, sizeof(*tmp) + ruhs->nr_ruhs * sizeof(*tmp->plis));
+	if (!tmp) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	tmp->nr_ruhs = td->o.fdp_nrpli;
+	for (i = 0; i < td->o.fdp_nrpli; i++)
+		tmp->plis[i] = ruhs->plis[td->o.fdp_plis[i]];
+	f->ruhs_info = tmp;
+out:
+	sfree(ruhs);
+	return ret;
+}
+
+int fdp_init(struct thread_data *td)
+{
+	struct fio_file *f;
+	int i, ret = 0;
+
+	for_each_file(td, f, i) {
+		ret = init_ruh_info(td, f);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+void fdp_free_ruhs_info(struct fio_file *f)
+{
+	if (!f->ruhs_info)
+		return;
+	sfree(f->ruhs_info);
+	f->ruhs_info = NULL;
+}
+
+void fdp_fill_dspec_data(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_ruhs_info *ruhs = f->ruhs_info;
+	int dspec;
+
+	if (!ruhs || io_u->ddir != DDIR_WRITE) {
+		io_u->dtype = 0;
+		io_u->dspec = 0;
+		return;
+	}
+
+	dspec = ruhs->plis[ruhs->pli_loc++ % ruhs->nr_ruhs];
+	io_u->dtype = 2;
+	io_u->dspec = dspec;
+}
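
`init_ruh_info()` maps the user's `fdp_pli` indices onto the device's placement IDs, and `fdp_fill_dspec_data()` then cycles through them round-robin. Two details in the hunk appear fragile. First, the range check uses `>`, which accepts an index equal to `nr_ruhs`, one past the last valid entry. Second, the new table is sized by `nr_ruhs` but filled with `fdp_nrpli` entries, and duplicate indices are not rejected. A minimal sketch that uses `>=` and sizes the table by the number of selected indices, assuming a hypothetical `struct ruhs_info` mirror of `struct fio_ruhs_info`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct fio_ruhs_info from fdp.h. */
struct ruhs_info {
	uint32_t nr_ruhs;
	uint32_t pli_loc;
	uint16_t plis[];
};

/*
 * Build a per-job table from indices into the device table. Valid indices
 * are 0..nr_ruhs-1. An empty selection means "use every ID" in the patch,
 * so it is not handled here.
 */
static struct ruhs_info *select_plis(const struct ruhs_info *dev,
				     const unsigned int *idx, unsigned int nidx)
{
	struct ruhs_info *job;

	assert(nidx > 0);
	for (unsigned int i = 0; i < nidx; i++)
		if (idx[i] >= dev->nr_ruhs)
			return NULL;

	job = calloc(1, sizeof(*job) + nidx * sizeof(job->plis[0]));
	if (!job)
		return NULL;

	job->nr_ruhs = nidx;
	for (unsigned int i = 0; i < nidx; i++)
		job->plis[i] = dev->plis[idx[i]];
	return job;
}

/* Round-robin selection, as in fdp_fill_dspec_data(). */
static uint16_t next_dspec(struct ruhs_info *job)
{
	assert(job->nr_ruhs > 0);
	return job->plis[job->pli_loc++ % job->nr_ruhs];
}
```
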
diff --git a/fdp.h b/fdp.h
new file mode 100644
index 00000000..81691f62
--- /dev/null
+++ b/fdp.h
@@ -0,0 +1,16 @@
+#ifndef FIO_FDP_H
+#define FIO_FDP_H
+
+#include "io_u.h"
+
+struct fio_ruhs_info {
+	uint32_t nr_ruhs;
+	uint32_t pli_loc;
+	uint16_t plis[];
+};
+
+int fdp_init(struct thread_data *td);
+void fdp_free_ruhs_info(struct fio_file *f);
+void fdp_fill_dspec_data(struct thread_data *td, struct io_u *io_u);
+
+#endif /* FIO_FDP_H */
diff --git a/file.h b/file.h
index da1b8947..deb36e02 100644
--- a/file.h
+++ b/file.h
@@ -12,6 +12,7 @@
 
 /* Forward declarations */
 struct zoned_block_device_info;
+struct fdp_ruh_info;
 
 /*
  * The type of object we are working on
@@ -101,6 +102,8 @@ struct fio_file {
 	uint64_t file_offset;
 	uint64_t io_size;
 
+	struct fio_ruhs_info *ruhs_info;
+
 	/*
 	 * Zoned block device information. See also zonemode=zbd.
 	 */
diff --git a/filesetup.c b/filesetup.c
index 648f48c6..8e505941 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1407,6 +1407,12 @@ done:
 
 	td_restore_runstate(td, old_state);
 
+	if (td->o.fdp) {
+		err = fdp_init(td);
+		if (err)
+			goto err_out;
+	}
+
 	return 0;
 
 err_offset:
@@ -1584,6 +1590,8 @@ void fio_file_free(struct fio_file *f)
 {
 	if (fio_file_axmap(f))
 		axmap_free(f->io_axmap);
+	if (f->ruhs_info)
+		sfree(f->ruhs_info);
 	if (!fio_file_smalloc(f)) {
 		free(f->file_name);
 		free(f);
@@ -1617,6 +1625,7 @@ void close_and_free_files(struct thread_data *td)
 		}
 
 		zbd_close_file(f);
+		fdp_free_ruhs_info(f);
 		fio_file_free(f);
 	}
 
diff --git a/fio.1 b/fio.1
index e94fad0a..a238331c 100644
--- a/fio.1
+++ b/fio.1
@@ -471,10 +471,12 @@ See \fB\-\-max\-jobs\fR. Default: 1.
 .SS "Time related parameters"
 .TP
 .BI runtime \fR=\fPtime
-Tell fio to terminate processing after the specified period of time. It
-can be quite hard to determine for how long a specified job will run, so
-this parameter is handy to cap the total runtime to a given time. When
-the unit is omitted, the value is interpreted in seconds.
+Limit runtime. The test will run until it completes the configured I/O
+workload or until it has run for this specified amount of time, whichever
+occurs first. It can be quite hard to determine for how long a specified
+job will run, so this parameter is handy to cap the total runtime to a
+given time.  When the unit is omitted, the value is interpreted in
+seconds.
 .TP
 .BI time_based
 If set, fio will run for the duration of the \fBruntime\fR specified
@@ -2184,6 +2186,15 @@ cached data. Currently the RWF_NOWAIT flag does not supported for cached write.
 For direct I/O, requests will only succeed if cache invalidation isn't required,
 file blocks are fully allocated and the disk request could be issued immediately.
 .TP
+.BI (io_uring_cmd)fdp \fR=\fPbool
+Enable Flexible Data Placement mode for write commands.
+.TP
+.BI (io_uring_cmd)fdp_pli \fR=\fPstr
+Select which Placement ID Index/Indices this job is allowed to use for writes.
+By default, the job will cycle through all available Placement IDs, so use this
+to isolate these identifiers to specific jobs. If you want fio to use only the
+placement identifiers at indices 0, 2 and 5, set \fBfdp_pli\fR=0,2,5.
+.TP
 .BI (cpuio)cpuload \fR=\fPint
 Attempt to use the specified percentage of CPU cycles. This is a mandatory
 option when using cpuio I/O engine.
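
The reworded `runtime` text says a job stops when it finishes its configured I/O or when `runtime` expires, whichever comes first, while `time_based` (context above) keeps the job running for the whole `runtime`. A hypothetical helper expressing that rule; treating `runtime_ms == 0` as "no limit" is an assumption of this sketch, not something the documentation states here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stop rule: workload done or runtime elapsed, whichever is first. */
static bool job_should_stop(uint64_t bytes_done, uint64_t bytes_total,
			    uint64_t elapsed_ms, uint64_t runtime_ms,
			    bool time_based)
{
	/* Assumption: a runtime of 0 means no time limit. */
	bool time_up = runtime_ms && elapsed_ms >= runtime_ms;

	/* time_based: keep repeating the workload until runtime expires. */
	if (time_based)
		return time_up;

	return bytes_done >= bytes_total || time_up;
}
```
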
diff --git a/io_u.c b/io_u.c
index d50d8465..ca7ee68f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -990,6 +990,9 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 		}
 	}
 
+	if (td->o.fdp)
+		fdp_fill_dspec_data(td, io_u);
+
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
 		dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%llx exceeds file size=0x%llx\n",
 			io_u,
@@ -2006,7 +2009,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	dprint_io_u(io_u, "complete");
 
 	assert(io_u->flags & IO_U_F_FLIGHT);
-	io_u_clear(td, io_u, IO_U_F_FLIGHT | IO_U_F_BUSY_OK);
+	io_u_clear(td, io_u, IO_U_F_FLIGHT | IO_U_F_BUSY_OK | IO_U_F_PATTERN_DONE);
 
 	/*
 	 * Mark IO ok to verify
diff --git a/io_u.h b/io_u.h
index 206e24fe..55b4d083 100644
--- a/io_u.h
+++ b/io_u.h
@@ -21,6 +21,7 @@ enum {
 	IO_U_F_TRIMMED		= 1 << 5,
 	IO_U_F_BARRIER		= 1 << 6,
 	IO_U_F_VER_LIST		= 1 << 7,
+	IO_U_F_PATTERN_DONE	= 1 << 8,
 };
 
 /*
@@ -117,6 +118,9 @@ struct io_u {
 	 */
 	int (*end_io)(struct thread_data *, struct io_u **);
 
+	uint32_t dtype;
+	uint32_t dspec;
+
 	union {
 #ifdef CONFIG_LIBAIO
 		struct iocb iocb;
diff --git a/ioengines.h b/ioengines.h
index ea799180..9484265e 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -7,8 +7,9 @@
 #include "flist.h"
 #include "io_u.h"
 #include "zbd_types.h"
+#include "fdp.h"
 
-#define FIO_IOOPS_VERSION	31
+#define FIO_IOOPS_VERSION	32
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
@@ -63,6 +64,8 @@ struct ioengine_ops {
 				  unsigned int *);
 	int (*finish_zone)(struct thread_data *, struct fio_file *,
 			   uint64_t, uint64_t);
+	int (*fdp_fetch_ruhs)(struct thread_data *, struct fio_file *,
+			      struct fio_ruhs_info *);
 	int option_struct_size;
 	struct fio_option *options;
 };
diff --git a/options.c b/options.c
index 536ba91c..91049af5 100644
--- a/options.c
+++ b/options.c
@@ -251,6 +251,34 @@ int str_split_parse(struct thread_data *td, char *str,
 	return ret;
 }
 
+static int fio_fdp_cmp(const void *p1, const void *p2)
+{
+	const uint16_t *t1 = p1;
+	const uint16_t *t2 = p2;
+
+	return *t1 - *t2;
+}
+
+static int str_fdp_pli_cb(void *data, const char *input)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	char *str, *p, *v;
+	int i = 0;
+
+	p = str = strdup(input);
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	while ((v = strsep(&str, ",")) != NULL && i < FIO_MAX_PLIS)
+		td->o.fdp_plis[i++] = strtoll(v, NULL, 0);
+	free(p);
+
+	qsort(td->o.fdp_plis, i, sizeof(*td->o.fdp_plis), fio_fdp_cmp);
+	td->o.fdp_nrpli = i;
+
+	return 0;
+}
+
 static int str_bssplit_cb(void *data, const char *input)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -3643,6 +3671,27 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_ZONE,
 	},
+	{
+		.name   = "fdp",
+		.lname  = "Flexible data placement",
+		.type   = FIO_OPT_BOOL,
+		.off1   = offsetof(struct thread_options, fdp),
+		.help   = "Use Data placement directive (FDP)",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group  = FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "fdp_pli",
+		.lname	= "FDP Placement ID indices",
+		.type	= FIO_OPT_STR,
+		.cb	= str_fdp_pli_cb,
+		.off1	= offsetof(struct thread_options, fdp_plis),
+		.help	= "Sets which placement ids to use (defaults to all)",
+		.hide	= 1,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "lockmem",
 		.lname	= "Lock memory",
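
`str_fdp_pli_cb()` splits the comma-separated `fdp_pli` list, keeps at most `FIO_MAX_PLIS` entries and sorts them. Its comparator reads elements as `uint16_t` although `fdp_plis` is an `unsigned int` array. That appears to work on little-endian hosts for small values but would compare the wrong halves on big-endian ones, and `*t1 - *t2` is not a safe pattern for wider types. A minimal sketch of the same parsing with a comparator that matches the element type, using only standard C; `parse_pli_list()` and `cmp_uint()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PLIS 16	/* mirrors FIO_MAX_PLIS */

/* Three-way compare for unsigned int without subtraction overflow. */
static int cmp_uint(const void *p1, const void *p2)
{
	unsigned int a = *(const unsigned int *)p1;
	unsigned int b = *(const unsigned int *)p2;

	return (a > b) - (a < b);
}

/*
 * Parse e.g. "5, 0,2" into a sorted array, keeping at most MAX_PLIS
 * entries. Returns the count, or -1 on a malformed entry.
 */
static int parse_pli_list(const char *s, unsigned int *plis)
{
	int n = 0;

	assert(s && plis);
	while (*s && n < MAX_PLIS) {
		char *end;
		unsigned long v = strtoul(s, &end, 0);	/* skips leading blanks */

		if (end == s)
			return -1;
		plis[n++] = (unsigned int)v;

		while (*end == ' ' || *end == '\t')
			end++;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		s = end;
	}

	qsort(plis, (size_t)n, sizeof(*plis), cmp_uint);
	return n;
}
```
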
diff --git a/server.h b/server.h
index 28133020..898a893d 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 98,
+	FIO_SERVER_VER			= 99,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 74e7ea45..2520357c 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -386,6 +386,11 @@ struct thread_options {
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
 
+#define FIO_MAX_PLIS 16
+	unsigned int fdp;
+	unsigned int fdp_plis[FIO_MAX_PLIS];
+	unsigned int fdp_nrpli;
+
 	unsigned int log_entries;
 	unsigned int log_prio;
 };
@@ -698,6 +703,10 @@ struct thread_options_pack {
 	uint32_t log_entries;
 	uint32_t log_prio;
 
+	uint32_t fdp;
+	uint32_t fdp_plis[FIO_MAX_PLIS];
+	uint32_t fdp_nrpli;
+
 	/*
 	 * verify_pattern followed by buffer_pattern from the unpacked struct
 	 */


* Recent changes (master)
@ 2023-02-28 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-28 13:00 UTC
  To: fio

The following changes since commit b5904c0d7434a49770cdb90eada1c724f0f7fe4e:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-02-23 20:17:31 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8d94106730d11047f313caadda87e450f242f53c:

  Merge branch 'master' of https://github.com/Cuelive/fio (2023-02-28 05:55:55 -0700)

----------------------------------------------------------------
Cuelive (1):
      blktrace: fix compilation error on the uos system

Jens Axboe (1):
      Merge branch 'master' of https://github.com/Cuelive/fio

 blktrace.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index d5c8aee7..ef9ce6bf 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -5,6 +5,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
+#include <sys/sysmacros.h>
 
 #include "flist.h"
 #include "fio.h"

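The blktrace fix only adds `<sys/sysmacros.h>`. On glibc, the device-number macros `major()`, `minor()` and `makedev()` are declared in that header. Newer glibc releases (2.28 onward) no longer pull it in through `<sys/types.h>`, which is a likely cause of the build failure reported on UOS.

A minimal Linux/glibc sketch of the kind of device comparison blktrace replay performs follows. `dev_matches()` is a hypothetical helper, not the function in blktrace.c.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() on glibc */

/* Hypothetical helper: does dev carry the given major/minor pair? */
static int dev_matches(dev_t dev, unsigned int maj, unsigned int min)
{
	return major(dev) == maj && minor(dev) == min;
}
```

Without the explicit include, newer glibc leaves `major()` undeclared, and the build fails or warns about an implicit declaration.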
* Recent changes (master)
@ 2023-02-24 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-24 13:00 UTC
  To: fio

The following changes since commit 6946ad5940565d573d85e210b8ea4da5884f0323:

  Merge branch 'Verify_Bad_Hdr_Rand_Seed_Mult_Workload_Iterations_Non_Repeating_Seed' of https://github.com/horshack-dpreview/fio (2023-02-21 09:37:09 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b5904c0d7434a49770cdb90eada1c724f0f7fe4e:

  Merge branch 'master' of https://github.com/bvanassche/fio (2023-02-23 20:17:31 -0500)

----------------------------------------------------------------
Bart Van Assche (3):
      io_u: Add a debug message in fill_io_u()
      zbd: Report the zone capacity
      zbd: Make an error message more detailed

Vincent Fu (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 io_u.c | 4 +++-
 zbd.c  | 9 +++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index eb617e64..d50d8465 100644
--- a/io_u.c
+++ b/io_u.c
@@ -984,8 +984,10 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	offset = io_u->offset;
 	if (td->o.zone_mode == ZONE_MODE_ZBD) {
 		ret = zbd_adjust_block(td, io_u);
-		if (ret == io_u_eof)
+		if (ret == io_u_eof) {
+			dprint(FD_IO, "zbd_adjust_block() returned io_u_eof\n");
 			return 1;
+		}
 	}
 
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
diff --git a/zbd.c b/zbd.c
index ba2c0401..d6f8f800 100644
--- a/zbd.c
+++ b/zbd.c
@@ -807,8 +807,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		goto out;
 	}
 
-	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB\n",
-	       f->file_name, nr_zones, zone_size / 1024);
+	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB and capacity %"PRIu64" KB\n",
+	       f->file_name, nr_zones, zone_size / 1024, zones[0].capacity / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -848,8 +848,9 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			p->cond = z->cond;
 
 			if (j > 0 && p->start != p[-1].start + zone_size) {
-				log_info("%s: invalid zone data\n",
-					 f->file_name);
+				log_info("%s: invalid zone data [%d:%d]: %"PRIu64" + %"PRIu64" != %"PRIu64"\n",
+					 f->file_name, j, i,
+					 p[-1].start, zone_size, p->start);
 				ret = -EINVAL;
 				goto out;
 			}

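The zbd check in `parse_zone_info()` requires each zone to start exactly `zone_size` bytes after the previous one. The patch makes the failure message print the operands of that comparison.

A minimal sketch of the same contiguity test follows. It assumes a simplified zone record with only a start offset. `struct sketch_zone` and `first_bad_zone()` are hypothetical, not fio's `zbd_zone` handling.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a zone descriptor: only the start matters here. */
struct sketch_zone {
	uint64_t start;
};

/*
 * Return -1 if every zone begins zone_size bytes after the previous one,
 * otherwise the index of the first offending zone, after printing the
 * same "prev + size != start" operands the new log_info() reports.
 */
static int first_bad_zone(const struct sketch_zone *z, int nr,
			  uint64_t zone_size)
{
	for (int j = 1; j < nr; j++) {
		if (z[j].start != z[j - 1].start + zone_size) {
			fprintf(stderr, "invalid zone data [%d]: %" PRIu64
				" + %" PRIu64 " != %" PRIu64 "\n",
				j, z[j - 1].start, zone_size, z[j].start);
			return j;
		}
	}
	return -1;
}
```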
* Recent changes (master)
@ 2023-02-22 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-22 13:00 UTC
  To: fio

The following changes since commit 7a3a166c6c43e45de1c8085254fbdd011c572f05:

  configure: restore dev-dax and libpmem (2023-02-20 08:53:23 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6946ad5940565d573d85e210b8ea4da5884f0323:

  Merge branch 'Verify_Bad_Hdr_Rand_Seed_Mult_Workload_Iterations_Non_Repeating_Seed' of https://github.com/horshack-dpreview/fio (2023-02-21 09:37:09 -0700)

----------------------------------------------------------------
Horshack (1):
      Bad header rand_seed with time_based or loops with randrepeat=0 verify

Jens Axboe (1):
      Merge branch 'Verify_Bad_Hdr_Rand_Seed_Mult_Workload_Iterations_Non_Repeating_Seed' of https://github.com/horshack-dpreview/fio

 backend.c | 15 +++++----------
 fio.h     |  1 +
 2 files changed, 6 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index cb1fbf42..f494c831 100644
--- a/backend.c
+++ b/backend.c
@@ -637,15 +637,6 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 	if (td->error)
 		return;
 
-	/*
-	 * verify_state needs to be reset before verification
-	 * proceeds so that expected random seeds match actual
-	 * random seeds in headers. The main loop will reset
-	 * all random number generators if randrepeat is set.
-	 */
-	if (!td->o.rand_repeatable)
-		td_fill_verify_state_seed(td);
-
 	td_set_runstate(td, TD_VERIFYING);
 
 	io_u = NULL;
@@ -1894,8 +1885,12 @@ static void *thread_main(void *data)
 		if (td->o.verify_only && td_write(td))
 			verify_bytes = do_dry_run(td);
 		else {
+			if (!td->o.rand_repeatable)
+				/* save verify rand state to replay hdr seeds later at verify */
+				frand_copy(&td->verify_state_last_do_io, &td->verify_state);
 			do_io(td, bytes_done);
-
+			if (!td->o.rand_repeatable)
+				frand_copy(&td->verify_state, &td->verify_state_last_do_io);
 			if (!ddir_rw_sum(bytes_done)) {
 				fio_mark_td_terminate(td);
 				verify_bytes = 0;
diff --git a/fio.h b/fio.h
index 8da77640..09c44149 100644
--- a/fio.h
+++ b/fio.h
@@ -258,6 +258,7 @@ struct thread_data {
 
 	struct frand_state bsrange_state[DDIR_RWDIR_CNT];
 	struct frand_state verify_state;
+	struct frand_state verify_state_last_do_io;
 	struct frand_state trim_state;
 	struct frand_state delay_state;
 

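With `randrepeat=0`, the backend change no longer reseeds `verify_state` in `do_verify()`. Instead it snapshots the state with `frand_copy()` before `do_io()` and copies it back afterwards. The verify pass then replays the header seeds that this particular write pass generated, including on later `loops`/`time_based` iterations.

Below is a minimal sketch of that save/replay idea. It uses a toy xorshift64 generator in place of fio's `frand_state`, and plain struct assignment in place of `frand_copy()`. All names are hypothetical.

```c
#include <stdint.h>

/* Toy generator state standing in for struct frand_state (assumption). */
struct toy_rand {
	uint64_t s;	/* must be non-zero for xorshift */
};

/* xorshift64: advance the state and return the next value. */
static uint64_t toy_next(struct toy_rand *r)
{
	uint64_t x = r->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	r->s = x;
	return x;
}

/*
 * Snapshot the state, run a "write pass" that stores n header seeds,
 * restore the snapshot, then check that the "verify pass" regenerates
 * exactly those seeds. Returns 1 when all of them match.
 */
static int replay_matches(struct toy_rand *verify_state, int n)
{
	uint64_t hdr[64];
	struct toy_rand saved = *verify_state;	/* like frand_copy(&last, &state) */

	if (n > 64)
		return 0;
	for (int i = 0; i < n; i++)
		hdr[i] = toy_next(verify_state);	/* seeds written to headers */
	*verify_state = saved;			/* like frand_copy(&state, &last) */

	for (int i = 0; i < n; i++)
		if (toy_next(verify_state) != hdr[i])
			return 0;
	return 1;
}
```

Calling `replay_matches()` twice on the same state models two loop iterations. Each check restarts from that iteration's snapshot rather than from the original seed.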
* Recent changes (master)
@ 2023-02-21 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-21 13:00 UTC
  To: fio

The following changes since commit f9fc7a27cae5ea2dbb310c05f7b693c68ba15537:

  backend: fix runtime when used with thinktime (2023-02-17 19:52:50 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7a3a166c6c43e45de1c8085254fbdd011c572f05:

  configure: restore dev-dax and libpmem (2023-02-20 08:53:23 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      configure: restore dev-dax and libpmem

 configure | 8 ++++++++
 1 file changed, 8 insertions(+)

---

Diff of recent changes:

diff --git a/configure b/configure
index 0d02bce8..45d10a31 100755
--- a/configure
+++ b/configure
@@ -2228,6 +2228,14 @@ if compile_prog "" "-lpmem2" "libpmem2"; then
 fi
 print_config "libpmem2" "$libpmem2"
 
+# Choose libpmem-based ioengines
+if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
+  devdax="yes"
+  if test "$libpmem1_5" = "yes"; then
+    pmem="yes"
+  fi
+fi
+
 ##########################################
 # Report whether dev-dax engine is enabled
 print_config "PMDK dev-dax engine" "$devdax"

* Recent changes (master)
@ 2023-02-18 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-18 13:00 UTC
  To: fio

The following changes since commit ded6cce8274ccf6f3820fb19ab46fd6d2aed0311:

  Merge branch 'Read_Stats_Not_Reported_For_Timed_Backlog_Verifies' of github.com:horshack-dpreview/fio (2023-02-15 12:49:31 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f9fc7a27cae5ea2dbb310c05f7b693c68ba15537:

  backend: fix runtime when used with thinktime (2023-02-17 19:52:50 -0700)

----------------------------------------------------------------
Ankit Kumar (1):
      backend: fix runtime when used with thinktime

Jens Axboe (1):
      Get rid of O_ATOMIC

Vincent Fu (3):
      iolog: handle trim commands when reading iologs
      filesetup: don't skip flags for trim workloads
      Merge branch 'remove_pmemblk_engine' of github.com:osalyk/fio

osalyk (1):
      pmemblk: remove pmemblk engine

 HOWTO.rst               |  11 --
 Makefile                |   5 -
 backend.c               |  28 ++-
 ci/actions-install.sh   |   1 -
 configure               |  41 -----
 engines/ime.c           |   4 -
 engines/libzbc.c        |   6 -
 engines/pmemblk.c       | 449 ------------------------------------------------
 examples/pmemblk.fio    |  71 --------
 examples/pmemblk.png    | Bin 107529 -> 0 bytes
 filesetup.c             |  10 --
 fio.1                   |  10 --
 init.c                  |   6 -
 iolog.c                 |  27 ++-
 memory.c                |   4 +-
 options.c               |   6 -
 os/os-linux.h           |   6 -
 os/os.h                 |   6 -
 os/windows/examples.wxs |   4 -
 19 files changed, 42 insertions(+), 653 deletions(-)
 delete mode 100644 engines/pmemblk.c
 delete mode 100644 examples/pmemblk.fio
 delete mode 100644 examples/pmemblk.png

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 158c5d89..7a0535af 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1110,12 +1110,6 @@ I/O type
 	OpenBSD and ZFS on Solaris don't support direct I/O.  On Windows the synchronous
 	ioengines don't support direct I/O.  Default: false.
 
-.. option:: atomic=bool
-
-	If value is true, attempt to use atomic direct I/O. Atomic writes are
-	guaranteed to be stable once acknowledged by the operating system. Only
-	Linux supports O_ATOMIC right now.
-
 .. option:: buffered=bool
 
 	If value is true, use buffered I/O. This is the opposite of the
@@ -2147,11 +2141,6 @@ I/O engine
 			before overwriting. The `trimwrite` mode works well for this
 			constraint.
 
-		**pmemblk**
-			Read and write using filesystem DAX to a file on a filesystem
-			mounted with DAX on a persistent memory device through the PMDK
-			libpmemblk library.
-
 		**dev-dax**
 			Read and write using device DAX to a persistent memory device (e.g.,
 			/dev/dax0.0) through the PMDK libpmem library.
diff --git a/Makefile b/Makefile
index 5f4e6562..e4cde4ba 100644
--- a/Makefile
+++ b/Makefile
@@ -208,11 +208,6 @@ ifdef CONFIG_MTD
   SOURCE += oslib/libmtd.c
   SOURCE += oslib/libmtd_legacy.c
 endif
-ifdef CONFIG_PMEMBLK
-  pmemblk_SRCS = engines/pmemblk.c
-  pmemblk_LIBS = -lpmemblk
-  ENGINES += pmemblk
-endif
 ifdef CONFIG_LINUX_DEVDAX
   dev-dax_SRCS = engines/dev-dax.c
   dev-dax_LIBS = -lpmem
diff --git a/backend.c b/backend.c
index 0ccc7c2b..cb1fbf42 100644
--- a/backend.c
+++ b/backend.c
@@ -866,6 +866,7 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 			     struct timespec *time)
 {
 	unsigned long long b;
+	unsigned long long runtime_left;
 	uint64_t total;
 	int left;
 	struct timespec now;
@@ -874,7 +875,7 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 	if (td->o.thinktime_iotime) {
 		fio_gettime(&now, NULL);
 		if (utime_since(&td->last_thinktime, &now)
-		    >= td->o.thinktime_iotime + td->o.thinktime) {
+		    >= td->o.thinktime_iotime) {
 			stall = true;
 		} else if (!fio_option_is_set(&td->o, thinktime_blocks)) {
 			/*
@@ -897,11 +898,24 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 
 	io_u_quiesce(td);
 
+	left = td->o.thinktime_spin;
+	if (td->o.timeout) {
+		runtime_left = td->o.timeout - utime_since_now(&td->epoch);
+		if (runtime_left < (unsigned long long)left)
+			left = runtime_left;
+	}
+
 	total = 0;
-	if (td->o.thinktime_spin)
-		total = usec_spin(td->o.thinktime_spin);
+	if (left)
+		total = usec_spin(left);
 
 	left = td->o.thinktime - total;
+	if (td->o.timeout) {
+		runtime_left = td->o.timeout - utime_since_now(&td->epoch);
+		if (runtime_left < (unsigned long long)left)
+			left = runtime_left;
+	}
+
 	if (left)
 		total += usec_sleep(td, left);
 
@@ -930,8 +944,10 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 		fio_gettime(time, NULL);
 
 	td->last_thinktime_blocks = b;
-	if (td->o.thinktime_iotime)
+	if (td->o.thinktime_iotime) {
+		fio_gettime(&now, NULL);
 		td->last_thinktime = now;
+	}
 }
 
 /*
@@ -1333,7 +1349,7 @@ int init_io_u_buffers(struct thread_data *td)
 	 * overflow later. this adjustment may be too much if we get
 	 * lucky and the allocator gives us an aligned address.
 	 */
-	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
+	if (td->o.odirect || td->o.mem_align ||
 	    td_ioengine_flagged(td, FIO_RAWIO))
 		td->orig_buffer_size += page_mask + td->o.mem_align;
 
@@ -1352,7 +1368,7 @@ int init_io_u_buffers(struct thread_data *td)
 	if (data_xfer && allocate_io_mem(td))
 		return 1;
 
-	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
+	if (td->o.odirect || td->o.mem_align ||
 	    td_ioengine_flagged(td, FIO_RAWIO))
 		p = PTR_ALIGN(td->orig_buffer, page_mask) + td->o.mem_align;
 	else
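
The `handle_thinktime()` change in backend.c clamps both the `thinktime_spin` portion and the remaining sleep to the runtime left (`td->o.timeout` minus time since `td->epoch`, both in microseconds). This keeps a think-time wait from overrunning `runtime`.

Below is a minimal sketch of that clamp. `clamp_thinktime()` is a hypothetical helper, not fio code. As a design choice, it returns 0 once the budget is already used up, because subtracting unsigned values past zero would otherwise wrap to a huge number.

```c
#include <stdint.h>

/*
 * Clamp a requested think time (usec) to the job's remaining runtime.
 * timeout_us == 0 means "no runtime limit", mirroring the patch's
 * `if (td->o.timeout)` test.
 */
static uint64_t clamp_thinktime(uint64_t requested_us, uint64_t timeout_us,
				uint64_t elapsed_us)
{
	uint64_t runtime_left;

	if (!timeout_us)
		return requested_us;
	if (elapsed_us >= timeout_us)
		return 0;	/* budget already spent; avoid unsigned wrap */
	runtime_left = timeout_us - elapsed_us;
	return requested_us < runtime_left ? requested_us : runtime_left;
}
```

As in the patch, such a helper would be applied once to the spin time and again to `thinktime - total` before sleeping.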
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index c16dff16..5057fca3 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -45,7 +45,6 @@ DPKGCFG
                 libnbd-dev
                 libpmem-dev
                 libpmem2-dev
-                libpmemblk-dev
                 libprotobuf-c-dev
                 librbd-dev
                 libtcmalloc-minimal4
diff --git a/configure b/configure
index 182cd3c3..0d02bce8 100755
--- a/configure
+++ b/configure
@@ -163,7 +163,6 @@ show_help="no"
 exit_val=0
 gfio_check="no"
 libhdfs="no"
-pmemblk="no"
 devdax="no"
 pmem="no"
 cuda="no"
@@ -2229,43 +2228,6 @@ if compile_prog "" "-lpmem2" "libpmem2"; then
 fi
 print_config "libpmem2" "$libpmem2"
 
-##########################################
-# Check whether we have libpmemblk
-# libpmem is a prerequisite
-if test "$libpmemblk" != "yes" ; then
-  libpmemblk="no"
-fi
-if test "$libpmem" = "yes"; then
-  cat > $TMPC << EOF
-#include <libpmemblk.h>
-int main(int argc, char **argv)
-{
-  PMEMblkpool *pbp;
-  pbp = pmemblk_open("", 0);
-  return 0;
-}
-EOF
-  if compile_prog "" "-lpmemblk" "libpmemblk"; then
-    libpmemblk="yes"
-  fi
-fi
-print_config "libpmemblk" "$libpmemblk"
-
-# Choose libpmem-based ioengines
-if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
-  devdax="yes"
-  if test "$libpmem1_5" = "yes"; then
-    pmem="yes"
-  fi
-  if test "$libpmemblk" = "yes"; then
-    pmemblk="yes"
-  fi
-fi
-
-##########################################
-# Report whether pmemblk engine is enabled
-print_config "PMDK pmemblk engine" "$pmemblk"
-
 ##########################################
 # Report whether dev-dax engine is enabled
 print_config "PMDK dev-dax engine" "$devdax"
@@ -3191,9 +3153,6 @@ fi
 if test "$mtd" = "yes" ; then
   output_sym "CONFIG_MTD"
 fi
-if test "$pmemblk" = "yes" ; then
-  output_sym "CONFIG_PMEMBLK"
-fi
 if test "$devdax" = "yes" ; then
   output_sym "CONFIG_LINUX_DEVDAX"
 fi
diff --git a/engines/ime.c b/engines/ime.c
index f6690cc1..037b8419 100644
--- a/engines/ime.c
+++ b/engines/ime.c
@@ -188,10 +188,6 @@ static int fio_ime_open_file(struct thread_data *td, struct fio_file *f)
 		return 1;
 	}
 
-	if (td->o.oatomic) {
-		td_verror(td, EINVAL, "IME does not support atomic IO");
-		return 1;
-	}
 	if (td->o.odirect)
 		flags |= O_DIRECT;
 	flags |= td->o.sync_io;
diff --git a/engines/libzbc.c b/engines/libzbc.c
index cb3e9ca5..1bf1e8c8 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -71,12 +71,6 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 			flags |= O_RDONLY;
 	}
 
-	if (td->o.oatomic) {
-		td_verror(td, EINVAL, "libzbc does not support O_ATOMIC");
-		log_err("%s: libzbc does not support O_ATOMIC\n", f->file_name);
-		return -EINVAL;
-	}
-
 	ld = calloc(1, sizeof(*ld));
 	if (!ld)
 		return -ENOMEM;
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
deleted file mode 100644
index 849d8a15..00000000
--- a/engines/pmemblk.c
+++ /dev/null
@@ -1,449 +0,0 @@
-/*
- * pmemblk: IO engine that uses PMDK libpmemblk to read and write data
- *
- * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License,
- * version 2 as published by the Free Software Foundation..
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the Free
- * Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
- * Boston, MA 02110-1301, USA.
- */
-
-/*
- * pmemblk engine
- *
- * IO engine that uses libpmemblk to read and write data
- *
- * To use:
- *   ioengine=pmemblk
- *
- * Other relevant settings:
- *   thread=1   REQUIRED
- *   iodepth=1
- *   direct=1
- *   unlink=1
- *   filename=/mnt/pmem0/fiotestfile,BSIZE,FSIZEMiB
- *
- *   thread must be set to 1 for pmemblk as multiple processes cannot
- *     open the same block pool file.
- *
- *   iodepth should be set to 1 as pmemblk is always synchronous.
- *   Use numjobs to scale up.
- *
- *   direct=1 is implied as pmemblk is always direct. A warning message
- *   is printed if this is not specified.
- *
- *   unlink=1 removes the block pool file after testing, and is optional.
- *
- *   The pmem device must have a DAX-capable filesystem and be mounted
- *   with DAX enabled.  filename must point to a file on that filesystem.
- *
- *   Example:
- *     mkfs.xfs /dev/pmem0
- *     mkdir /mnt/pmem0
- *     mount -o dax /dev/pmem0 /mnt/pmem0
- *
- *   When specifying the filename, if the block pool file does not already
- *   exist, then the pmemblk engine creates the pool file if you specify
- *   the block and file sizes.  BSIZE is the block size in bytes.
- *   FSIZEMB is the pool file size in MiB.
- *
- *   See examples/pmemblk.fio for more.
- *
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <sys/uio.h>
-#include <errno.h>
-#include <assert.h>
-#include <string.h>
-#include <libpmem.h>
-#include <libpmemblk.h>
-
-#include "../fio.h"
-
-/*
- * libpmemblk
- */
-typedef struct fio_pmemblk_file *fio_pmemblk_file_t;
-
-struct fio_pmemblk_file {
-	fio_pmemblk_file_t pmb_next;
-	char *pmb_filename;
-	uint64_t pmb_refcnt;
-	PMEMblkpool *pmb_pool;
-	size_t pmb_bsize;
-	size_t pmb_nblocks;
-};
-
-static fio_pmemblk_file_t Cache;
-
-static pthread_mutex_t CacheLock = PTHREAD_MUTEX_INITIALIZER;
-
-#define PMB_CREATE   (0x0001)	/* should create file */
-
-fio_pmemblk_file_t fio_pmemblk_cache_lookup(const char *filename)
-{
-	fio_pmemblk_file_t i;
-
-	for (i = Cache; i != NULL; i = i->pmb_next)
-		if (!strcmp(filename, i->pmb_filename))
-			return i;
-
-	return NULL;
-}
-
-static void fio_pmemblk_cache_insert(fio_pmemblk_file_t pmb)
-{
-	pmb->pmb_next = Cache;
-	Cache = pmb;
-}
-
-static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
-{
-	fio_pmemblk_file_t i;
-
-	if (pmb == Cache) {
-		Cache = Cache->pmb_next;
-		pmb->pmb_next = NULL;
-		return;
-	}
-
-	for (i = Cache; i != NULL; i = i->pmb_next)
-		if (pmb == i->pmb_next) {
-			i->pmb_next = i->pmb_next->pmb_next;
-			pmb->pmb_next = NULL;
-			return;
-		}
-}
-
-/*
- * to control block size and gross file size at the libpmemblk
- * level, we allow the block size and file size to be appended
- * to the file name:
- *
- *   path[,bsize,fsizemib]
- *
- * note that we do not use the fio option "filesize" to dictate
- * the file size because we can only give libpmemblk the gross
- * file size, which is different from the net or usable file
- * size (which is probably what fio wants).
- *
- * the final path without the parameters is returned in ppath.
- * the block size and file size are returned in pbsize and fsize.
- *
- * note that the user specifies the file size in MiB, but
- * we return bytes from here.
- */
-static void pmb_parse_path(const char *pathspec, char **ppath, uint64_t *pbsize,
-			   uint64_t *pfsize)
-{
-	char *path;
-	char *s;
-	uint64_t bsize;
-	uint64_t fsizemib;
-
-	path = strdup(pathspec);
-	if (!path) {
-		*ppath = NULL;
-		return;
-	}
-
-	/* extract sizes, if given */
-	s = strrchr(path, ',');
-	if (s && (fsizemib = strtoull(s + 1, NULL, 10))) {
-		*s = 0;
-		s = strrchr(path, ',');
-		if (s && (bsize = strtoull(s + 1, NULL, 10))) {
-			*s = 0;
-			*ppath = path;
-			*pbsize = bsize;
-			*pfsize = fsizemib << 20;
-			return;
-		}
-	}
-
-	/* size specs not found */
-	strcpy(path, pathspec);
-	*ppath = path;
-	*pbsize = 0;
-	*pfsize = 0;
-}
-
-static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
-{
-	fio_pmemblk_file_t pmb;
-	char *path = NULL;
-	uint64_t bsize = 0;
-	uint64_t fsize = 0;
-
-	pmb_parse_path(pathspec, &path, &bsize, &fsize);
-	if (!path)
-		return NULL;
-
-	pthread_mutex_lock(&CacheLock);
-
-	pmb = fio_pmemblk_cache_lookup(path);
-	if (!pmb) {
-		pmb = malloc(sizeof(*pmb));
-		if (!pmb)
-			goto error;
-
-		/* try opening existing first, create it if needed */
-		pmb->pmb_pool = pmemblk_open(path, bsize);
-		if (!pmb->pmb_pool && (errno == ENOENT) &&
-		    (flags & PMB_CREATE) && (0 < fsize) && (0 < bsize)) {
-			pmb->pmb_pool =
-			    pmemblk_create(path, bsize, fsize, 0644);
-		}
-		if (!pmb->pmb_pool) {
-			log_err("pmemblk: unable to open pmemblk pool file %s (%s)\n",
-			     path, strerror(errno));
-			goto error;
-		}
-
-		pmb->pmb_filename = path;
-		pmb->pmb_next = NULL;
-		pmb->pmb_refcnt = 0;
-		pmb->pmb_bsize = pmemblk_bsize(pmb->pmb_pool);
-		pmb->pmb_nblocks = pmemblk_nblock(pmb->pmb_pool);
-
-		fio_pmemblk_cache_insert(pmb);
-	} else {
-		free(path);
-	}
-
-	pmb->pmb_refcnt += 1;
-
-	pthread_mutex_unlock(&CacheLock);
-
-	return pmb;
-
-error:
-	if (pmb) {
-		if (pmb->pmb_pool)
-			pmemblk_close(pmb->pmb_pool);
-		pmb->pmb_pool = NULL;
-		pmb->pmb_filename = NULL;
-		free(pmb);
-	}
-	if (path)
-		free(path);
-
-	pthread_mutex_unlock(&CacheLock);
-	return NULL;
-}
-
-static void pmb_close(fio_pmemblk_file_t pmb, const bool keep)
-{
-	pthread_mutex_lock(&CacheLock);
-
-	pmb->pmb_refcnt--;
-
-	if (!keep && !pmb->pmb_refcnt) {
-		pmemblk_close(pmb->pmb_pool);
-		pmb->pmb_pool = NULL;
-		free(pmb->pmb_filename);
-		pmb->pmb_filename = NULL;
-		fio_pmemblk_cache_remove(pmb);
-		free(pmb);
-	}
-
-	pthread_mutex_unlock(&CacheLock);
-}
-
-static int pmb_get_flags(struct thread_data *td, uint64_t *pflags)
-{
-	static int thread_warned = 0;
-	static int odirect_warned = 0;
-
-	uint64_t flags = 0;
-
-	if (!td->o.use_thread) {
-		if (!thread_warned) {
-			thread_warned = 1;
-			log_err("pmemblk: must set thread=1 for pmemblk engine\n");
-		}
-		return 1;
-	}
-
-	if (!td->o.odirect && !odirect_warned) {
-		odirect_warned = 1;
-		log_info("pmemblk: direct == 0, but pmemblk is always direct\n");
-	}
-
-	if (td->o.allow_create)
-		flags |= PMB_CREATE;
-
-	(*pflags) = flags;
-	return 0;
-}
-
-static int fio_pmemblk_open_file(struct thread_data *td, struct fio_file *f)
-{
-	uint64_t flags = 0;
-	fio_pmemblk_file_t pmb;
-
-	if (pmb_get_flags(td, &flags))
-		return 1;
-
-	pmb = pmb_open(f->file_name, flags);
-	if (!pmb)
-		return 1;
-
-	FILE_SET_ENG_DATA(f, pmb);
-	return 0;
-}
-
-static int fio_pmemblk_close_file(struct thread_data fio_unused *td,
-				  struct fio_file *f)
-{
-	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
-
-	if (pmb)
-		pmb_close(pmb, false);
-
-	FILE_SET_ENG_DATA(f, NULL);
-	return 0;
-}
-
-static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
-{
-	uint64_t flags = 0;
-	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
-
-	if (fio_file_size_known(f))
-		return 0;
-
-	if (!pmb) {
-		if (pmb_get_flags(td, &flags))
-			return 1;
-		pmb = pmb_open(f->file_name, flags);
-		if (!pmb)
-			return 1;
-	}
-
-	f->real_file_size = pmb->pmb_bsize * pmb->pmb_nblocks;
-
-	fio_file_set_size_known(f);
-
-	if (!FILE_ENG_DATA(f))
-		pmb_close(pmb, true);
-
-	return 0;
-}
-
-static enum fio_q_status fio_pmemblk_queue(struct thread_data *td,
-					   struct io_u *io_u)
-{
-	struct fio_file *f = io_u->file;
-	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
-
-	unsigned long long off;
-	unsigned long len;
-	void *buf;
-
-	fio_ro_check(td, io_u);
-
-	switch (io_u->ddir) {
-	case DDIR_READ:
-	case DDIR_WRITE:
-		off = io_u->offset;
-		len = io_u->xfer_buflen;
-
-		io_u->error = EINVAL;
-		if (off % pmb->pmb_bsize)
-			break;
-		if (len % pmb->pmb_bsize)
-			break;
-		if ((off + len) / pmb->pmb_bsize > pmb->pmb_nblocks)
-			break;
-
-		io_u->error = 0;
-		buf = io_u->xfer_buf;
-		off /= pmb->pmb_bsize;
-		len /= pmb->pmb_bsize;
-		while (0 < len) {
-			if (io_u->ddir == DDIR_READ) {
-				if (0 != pmemblk_read(pmb->pmb_pool, buf, off)) {
-					io_u->error = errno;
-					break;
-				}
-			} else if (0 != pmemblk_write(pmb->pmb_pool, buf, off)) {
-				io_u->error = errno;
-				break;
-			}
-			buf += pmb->pmb_bsize;
-			off++;
-			len--;
-		}
-		off *= pmb->pmb_bsize;
-		len *= pmb->pmb_bsize;
-		io_u->resid = io_u->xfer_buflen - (off - io_u->offset);
-		break;
-	case DDIR_SYNC:
-	case DDIR_DATASYNC:
-	case DDIR_SYNC_FILE_RANGE:
-		/* we're always sync'd */
-		io_u->error = 0;
-		break;
-	default:
-		io_u->error = EINVAL;
-		break;
-	}
-
-	return FIO_Q_COMPLETED;
-}
-
-static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
-{
-	char *path = NULL;
-	uint64_t bsize = 0;
-	uint64_t fsize = 0;
-
-	/*
-	 * we need our own unlink in case the user has specified
-	 * the block and file sizes in the path name.  we parse
-	 * the file_name to determine the file name we actually used.
-	 */
-
-	pmb_parse_path(f->file_name, &path, &bsize, &fsize);
-	if (!path)
-		return ENOENT;
-
-	unlink(path);
-	free(path);
-	return 0;
-}
-
-FIO_STATIC struct ioengine_ops ioengine = {
-	.name = "pmemblk",
-	.version = FIO_IOOPS_VERSION,
-	.queue = fio_pmemblk_queue,
-	.open_file = fio_pmemblk_open_file,
-	.close_file = fio_pmemblk_close_file,
-	.get_file_size = fio_pmemblk_get_file_size,
-	.unlink_file = fio_pmemblk_unlink_file,
-	.flags = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
-};
-
-static void fio_init fio_pmemblk_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_pmemblk_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
diff --git a/examples/pmemblk.fio b/examples/pmemblk.fio
deleted file mode 100644
index 59bb2a8a..00000000
--- a/examples/pmemblk.fio
+++ /dev/null
@@ -1,71 +0,0 @@
-[global]
-bs=1m
-ioengine=pmemblk
-norandommap
-time_based
-runtime=30
-group_reporting
-disable_lat=1
-disable_slat=1
-disable_clat=1
-clat_percentiles=0
-cpus_allowed_policy=split
-
-# For the pmemblk engine:
-#
-#   IOs always complete immediately
-#   IOs are always direct
-#   Must use threads
-#
-iodepth=1
-direct=1
-thread
-numjobs=16
-#
-# Unlink can be used to remove the files when done, but if you are
-# using serial runs with stonewall, and you want the files to be created
-# only once and unlinked only at the very end, then put the unlink=1
-# in the last group.  This is the method demonstrated here.
-#
-# Note that if you have a read-only group and if the files will be
-# newly created, then all of the data will read back as zero and the
-# read will be optimized, yielding performance that is different from
-# that of reading non-zero blocks (or unoptimized zero blocks).
-#
-unlink=0
-#
-# The pmemblk engine does IO to files in a DAX-mounted filesystem.
-# The filesystem should be created on an NVDIMM (e.g /dev/pmem0)
-# and then mounted with the '-o dax' option.  Note that the engine
-# accesses the underlying NVDIMM directly, bypassing the kernel block
-# layer, so the usual filesystem/disk performance monitoring tools such
-# as iostat will not provide useful data.
-#
-# Here we specify a test file on each of two NVDIMMs.  The first
-# number after the file name is the block size in bytes (4096 bytes
-# in this example).  The second number is the size of the file to
-# create in MiB (1 GiB in this example); note that the actual usable
-# space available to fio will be less than this as libpmemblk requires
-# some space for metadata.
-#
-# Currently, the minimum block size is 512 bytes and the minimum file
-# size is about 17 MiB (these are libpmemblk requirements).
-#
-# While both files in this example have the same block size and file
-# size, this is not required.
-#
-filename=/pmem0/fio-test,4096,1024
-#filename=/pmem1/fio-test,4096,1024
-
-[pmemblk-write]
-rw=randwrite
-stonewall
-
-[pmemblk-read]
-rw=randread
-stonewall
-#
-# We're done, so unlink the file:
-#
-unlink=1
-
diff --git a/examples/pmemblk.png b/examples/pmemblk.png
deleted file mode 100644
index 250e254b..00000000
Binary files a/examples/pmemblk.png and /dev/null differ
diff --git a/filesetup.c b/filesetup.c
index cb7047c5..648f48c6 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -737,21 +737,11 @@ int generic_open_file(struct thread_data *td, struct fio_file *f)
 			f_out = stderr;
 	}
 
-	if (td_trim(td))
-		goto skip_flags;
 	if (td->o.odirect)
 		flags |= OS_O_DIRECT;
-	if (td->o.oatomic) {
-		if (!FIO_O_ATOMIC) {
-			td_verror(td, EINVAL, "OS does not support atomic IO");
-			return 1;
-		}
-		flags |= OS_O_DIRECT | FIO_O_ATOMIC;
-	}
 	flags |= td->o.sync_io;
 	if (td->o.create_on_open && td->o.allow_create)
 		flags |= O_CREAT;
-skip_flags:
 	if (f->filetype != FIO_TYPE_FILE)
 		flags |= FIO_O_NOATIME;
 
diff --git a/fio.1 b/fio.1
index 00a09353..e94fad0a 100644
--- a/fio.1
+++ b/fio.1
@@ -873,11 +873,6 @@ If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that
 OpenBSD and ZFS on Solaris don't support direct I/O. On Windows the synchronous
 ioengines don't support direct I/O. Default: false.
 .TP
-.BI atomic \fR=\fPbool
-If value is true, attempt to use atomic direct I/O. Atomic writes are
-guaranteed to be stable once acknowledged by the operating system. Only
-Linux supports O_ATOMIC right now.
-.TP
 .BI buffered \fR=\fPbool
 If value is true, use buffered I/O. This is the opposite of the
 \fBdirect\fR option. Defaults to true.
@@ -1959,11 +1954,6 @@ e.g., on NAND, writing sequentially to erase blocks and discarding
 before overwriting. The \fBtrimwrite\fR mode works well for this
 constraint.
 .TP
-.B pmemblk
-Read and write using filesystem DAX to a file on a filesystem
-mounted with DAX on a persistent memory device through the PMDK
-libpmemblk library.
-.TP
 .B dev\-dax
 Read and write using device DAX to a persistent memory device (e.g.,
 /dev/dax0.0) through the PMDK libpmem library.
diff --git a/init.c b/init.c
index f6a8056a..78c6c803 100644
--- a/init.c
+++ b/init.c
@@ -916,12 +916,6 @@ static int fixup_options(struct thread_data *td)
 		ret |= 1;
 	}
 
-	/*
-	 * O_ATOMIC implies O_DIRECT
-	 */
-	if (o->oatomic)
-		o->odirect = 1;
-
 	/*
 	 * If randseed is set, that overrides randrepeat
 	 */
diff --git a/iolog.c b/iolog.c
index 3b296cd7..ea779632 100644
--- a/iolog.c
+++ b/iolog.c
@@ -439,7 +439,7 @@ static bool read_iolog(struct thread_data *td)
 	unsigned long long offset;
 	unsigned int bytes;
 	unsigned long long delay = 0;
-	int reads, writes, waits, fileno = 0, file_action = 0; /* stupid gcc */
+	int reads, writes, trims, waits, fileno = 0, file_action = 0; /* stupid gcc */
 	char *rfname, *fname, *act;
 	char *str, *p;
 	enum fio_ddir rw;
@@ -461,7 +461,7 @@ static bool read_iolog(struct thread_data *td)
 	rfname = fname = malloc(256+16);
 	act = malloc(256+16);
 
-	syncs = reads = writes = waits = 0;
+	syncs = reads = writes = trims = waits = 0;
 	while ((p = fgets(str, 4096, td->io_log_rfile)) != NULL) {
 		struct io_piece *ipo;
 		int r;
@@ -552,6 +552,13 @@ static bool read_iolog(struct thread_data *td)
 			if (read_only)
 				continue;
 			writes++;
+		} else if (rw == DDIR_TRIM) {
+			/*
+			 * Don't add a trim for ro mode
+			 */
+			if (read_only)
+				continue;
+			trims++;
 		} else if (rw == DDIR_WAIT) {
 			if (td->o.no_stall)
 				continue;
@@ -634,14 +641,16 @@ static bool read_iolog(struct thread_data *td)
 		return true;
 	}
 
-	if (!reads && !writes && !waits)
+	if (!reads && !writes && !waits && !trims)
 		return false;
-	else if (reads && !writes)
-		td->o.td_ddir = TD_DDIR_READ;
-	else if (!reads && writes)
-		td->o.td_ddir = TD_DDIR_WRITE;
-	else
-		td->o.td_ddir = TD_DDIR_RW;
+
+	td->o.td_ddir = 0;
+	if (reads)
+		td->o.td_ddir |= TD_DDIR_READ;
+	if (writes)
+		td->o.td_ddir |= TD_DDIR_WRITE;
+	if (trims)
+		td->o.td_ddir |= TD_DDIR_TRIM;
 
 	return true;
 }
diff --git a/memory.c b/memory.c
index 577d3dd5..2fdca657 100644
--- a/memory.c
+++ b/memory.c
@@ -295,7 +295,7 @@ int allocate_io_mem(struct thread_data *td)
 
 	total_mem = td->orig_buffer_size;
 
-	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
+	if (td->o.odirect || td->o.mem_align ||
 	    td_ioengine_flagged(td, FIO_MEMALIGN)) {
 		total_mem += page_mask;
 		if (td->o.mem_align && td->o.mem_align > page_size)
@@ -341,7 +341,7 @@ void free_io_mem(struct thread_data *td)
 	unsigned int total_mem;
 
 	total_mem = td->orig_buffer_size;
-	if (td->o.odirect || td->o.oatomic)
+	if (td->o.odirect)
 		total_mem += page_mask;
 
 	if (td->io_ops->iomem_alloc && !fio_option_is_set(&td->o, mem_type)) {
diff --git a/options.c b/options.c
index 49612345..536ba91c 100644
--- a/options.c
+++ b/options.c
@@ -2096,12 +2096,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Hadoop Distributed Filesystem (HDFS) engine"
 			  },
 #endif
-#ifdef CONFIG_PMEMBLK
-			  { .ival = "pmemblk",
-			    .help = "PMDK libpmemblk based IO engine",
-			  },
-
-#endif
 #ifdef CONFIG_IME
 			  { .ival = "ime_psync",
 			    .help = "DDN's IME synchronous IO engine",
diff --git a/os/os-linux.h b/os/os-linux.h
index bbb1f27c..7a78b42d 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -205,12 +205,6 @@ static inline unsigned long long os_phys_mem(void)
 #define FIO_O_NOATIME	0
 #endif
 
-#ifdef O_ATOMIC
-#define OS_O_ATOMIC	O_ATOMIC
-#else
-#define OS_O_ATOMIC	040000000
-#endif
-
 #ifdef MADV_REMOVE
 #define FIO_MADV_FREE	MADV_REMOVE
 #endif
diff --git a/os/os.h b/os/os.h
index c428260c..ebaf8af5 100644
--- a/os/os.h
+++ b/os/os.h
@@ -133,12 +133,6 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #define OS_O_DIRECT			O_DIRECT
 #endif
 
-#ifdef OS_O_ATOMIC
-#define FIO_O_ATOMIC			OS_O_ATOMIC
-#else
-#define FIO_O_ATOMIC			0
-#endif
-
 #ifndef FIO_HAVE_HUGETLB
 #define SHM_HUGETLB			0
 #define MAP_HUGETLB			0
diff --git a/os/windows/examples.wxs b/os/windows/examples.wxs
index 9308ba8b..d70c7713 100755
--- a/os/windows/examples.wxs
+++ b/os/windows/examples.wxs
@@ -125,9 +125,6 @@
                 <Component>
                   <File Source="..\..\examples\numa.fio" />
                 </Component>
-                <Component>
-                  <File Source="..\..\examples\pmemblk.fio" />
-                </Component>
                 <Component>
                   <File Source="..\..\examples\poisson-rate-submission.fio" />
                 </Component>
@@ -212,7 +209,6 @@
             <ComponentRef Id="netio_multicast.fio" />
             <ComponentRef Id="null.fio" />
             <ComponentRef Id="numa.fio" />
-            <ComponentRef Id="pmemblk.fio" />
             <ComponentRef Id="poisson_rate_submission.fio" />
             <ComponentRef Id="rados.fio"/>
             <ComponentRef Id="rand_zones.fio" />

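The read_iolog() hunk above no longer picks between READ, WRITE and RW. It ORs in one bit for each direction that appears in the replayed log, so a log that contains trims now produces a job that issues trims. Below is a minimal sketch of that composition. The SKETCH_DDIR_* values are local stand-ins assumed for illustration; they are not fio's own TD_DDIR_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for fio's TD_DDIR_* bits (assumed values). */
enum sketch_td_ddir {
	SKETCH_DDIR_READ  = 1 << 0,
	SKETCH_DDIR_WRITE = 1 << 1,
	SKETCH_DDIR_TRIM  = 1 << 3,
};

/*
 * Mirrors the new tail of read_iolog(): returns false when the log holds
 * nothing replayable, otherwise sets one bit per direction that was seen.
 */
static bool sketch_log_ddir(int reads, int writes, int trims, int waits,
			    unsigned int *ddir)
{
	if (!reads && !writes && !waits && !trims)
		return false;

	*ddir = 0;
	if (reads)
		*ddir |= SKETCH_DDIR_READ;
	if (writes)
		*ddir |= SKETCH_DDIR_WRITE;
	if (trims)
		*ddir |= SKETCH_DDIR_TRIM;
	return true;
}
```

The old ladder never looked at a trim count, so a log with only reads and trims was treated as a pure read job. The bitmask keeps both bits. A log that contains only waits still returns true with an empty mask, just as in the patch.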

* Recent changes (master)
@ 2023-02-16 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-16 13:00 UTC
  To: fio

The following changes since commit 1bd16cf9c113fcf9d49cae07da50e8a5c7a784ee:

  examples: update nbd.fio fiograph diagram (2023-02-14 10:47:50 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ded6cce8274ccf6f3820fb19ab46fd6d2aed0311:

  Merge branch 'Read_Stats_Not_Reported_For_Timed_Backlog_Verifies' of github.com:horshack-dpreview/fio (2023-02-15 12:49:31 -0500)

----------------------------------------------------------------
Horshack (1):
      Read stats for backlog verifies not reported for time-expired workloads

Vincent Fu (1):
      Merge branch 'Read_Stats_Not_Reported_For_Timed_Backlog_Verifies' of github.com:horshack-dpreview/fio

 backend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index ffd34b36..0ccc7c2b 100644
--- a/backend.c
+++ b/backend.c
@@ -1919,7 +1919,8 @@ static void *thread_main(void *data)
 			}
 		} while (1);
 
-		if (td_read(td) && td->io_bytes[DDIR_READ])
+		if (td->io_bytes[DDIR_READ] && (td_read(td) ||
+			((td->flags & TD_F_VER_BACKLOG) && td_write(td))))
 			update_runtime(td, elapsed_us, DDIR_READ);
 		if (td_write(td) && td->io_bytes[DDIR_WRITE])
 			update_runtime(td, elapsed_us, DDIR_WRITE);

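The backend.c hunk also records read runtime for a write job that ran backlog verifies (TD_F_VER_BACKLOG). Those verifies issue reads, and before this change their bytes had no read runtime recorded, which is presumably why their stats went unreported once a time-based job expired. Below is a minimal sketch of the old and new predicates. struct sketch_job and its fields are hypothetical stand-ins for td_read(), td_write(), the flag test and td->io_bytes[DDIR_READ].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the job state consulted in thread_main(). */
struct sketch_job {
	bool reads;          /* stands in for td_read(td) */
	bool writes;         /* stands in for td_write(td) */
	bool ver_backlog;    /* stands in for td->flags & TD_F_VER_BACKLOG */
	uint64_t read_bytes; /* stands in for td->io_bytes[DDIR_READ] */
};

/* Old rule: only read jobs with read bytes got a read runtime. */
static bool sketch_old_rule(const struct sketch_job *j)
{
	return j->reads && j->read_bytes;
}

/* New rule: write jobs doing backlog verifies qualify as well. */
static bool sketch_new_rule(const struct sketch_job *j)
{
	return j->read_bytes &&
	       (j->reads || (j->ver_backlog && j->writes));
}
```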

* Recent changes (master)
@ 2023-02-15 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-15 13:00 UTC
  To: fio

The following changes since commit b65023f3c8849e122b2a223838ae9fdaed994e84:

  Merge branch 'msg-Modify_QD_Sync_Warning_For_offload' of https://github.com/horshack-dpreview/fio (2023-02-10 11:49:46 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1bd16cf9c113fcf9d49cae07da50e8a5c7a784ee:

  examples: update nbd.fio fiograph diagram (2023-02-14 10:47:50 -0500)

----------------------------------------------------------------
Richard W.M. Jones (1):
      examples: Small updates to nbd.fio

Shin'ichiro Kawasaki (8):
      zbd: refer file->last_start[] instead of sectors with data accounting
      zbd: remove CHECK_SWD feature
      zbd: rename the accounting 'sectors with data' to 'valid data bytes'
      doc: fix unit of zone_reset_threshold and relation to other option
      zbd: account valid data bytes only for zone_reset_threshold option
      zbd: check write ranges for zone_reset_threshold option
      zbd: initialize valid data bytes accounting at file setup
      t/zbd: add test cases for zone_reset_threshold option

Vincent Fu (1):
      examples: update nbd.fio fiograph diagram

 HOWTO.rst              |   9 ++-
 examples/nbd.fio       |  28 ++++++----
 examples/nbd.png       | Bin 88667 -> 43251 bytes
 fio.1                  |   8 ++-
 t/zbd/test-zbd-support |  60 +++++++++++++++++++-
 zbd.c                  | 149 +++++++++++++++++++++++--------------------------
 zbd.h                  |  11 ++--
 7 files changed, 161 insertions(+), 104 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 17caaf5d..158c5d89 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1085,9 +1085,12 @@ Target file/device
 
 .. option:: zone_reset_threshold=float
 
-	A number between zero and one that indicates the ratio of logical
-	blocks with data to the total number of logical blocks in the test
-	above which zones should be reset periodically.
+	A number between zero and one that indicates the ratio of written bytes
+	in the zones with write pointers in the IO range to the size of the IO
+	range. When current ratio is above this ratio, zones are reset
+	periodically as :option:`zone_reset_frequency` specifies. If there are
+	multiple jobs when using this option, the IO range for all write jobs
+	has to be the same.
 
 .. option:: zone_reset_frequency=float
 
diff --git a/examples/nbd.fio b/examples/nbd.fio
index 6900ebe7..31629fad 100644
--- a/examples/nbd.fio
+++ b/examples/nbd.fio
@@ -1,21 +1,25 @@
-# To use fio to test nbdkit:
+# To use fio to test nbdkit + RAM disk:
 #
-# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
+#   nbdkit -U - memory size=256M --run 'export uri; fio examples/nbd.fio'
 #
-# To use fio to test qemu-nbd:
+# To use fio to test nbdkit + local file:
 #
-# rm -f /tmp/disk.img /tmp/socket
-# truncate -s 256M /tmp/disk.img
-# export unixsocket=/tmp/socket
-# qemu-nbd -t -k $unixsocket -f raw /tmp/disk.img &
-# fio examples/nbd.fio
-# killall qemu-nbd
+#   rm -f /var/tmp/disk.img
+#   truncate -s 256M /var/tmp/disk.img
+#   nbdkit -U - file /var/tmp/disk.img --run 'export uri; fio examples/nbd.fio'
+#
+# To use fio to test qemu-nbd + local file:
+#
+#   rm -f /var/tmp/disk.img /var/tmp/socket
+#   truncate -s 256M /var/tmp/disk.img
+#   export uri='nbd+unix:///?socket=/var/tmp/socket'
+#   qemu-nbd -t -k /var/tmp/socket -f raw /var/tmp/disk.img &
+#   fio examples/nbd.fio
+#   killall qemu-nbd
 
 [global]
 ioengine=nbd
-uri=nbd+unix:///?socket=${unixsocket}
-# Starting from nbdkit 1.14 the following will work:
-#uri=${uri}
+uri=${uri}
 rw=randrw
 time_based
 runtime=60
diff --git a/examples/nbd.png b/examples/nbd.png
index e3bcf610..3a933c9b 100644
Binary files a/examples/nbd.png and b/examples/nbd.png differ
diff --git a/fio.1 b/fio.1
index 527b3d46..00a09353 100644
--- a/fio.1
+++ b/fio.1
@@ -854,9 +854,11 @@ of the zoned block device in use, thus allowing the option \fBmax_open_zones\fR
 value to be larger than the device reported limit. Default: false.
 .TP
 .BI zone_reset_threshold \fR=\fPfloat
-A number between zero and one that indicates the ratio of logical blocks with
-data to the total number of logical blocks in the test above which zones
-should be reset periodically.
+A number between zero and one that indicates the ratio of written bytes in the
+zones with write pointers in the IO range to the size of the IO range. When
+current ratio is above this ratio, zones are reset periodically as
+\fBzone_reset_frequency\fR specifies. If there are multiple jobs when using this
+option, the IO range for all write jobs has to be the same.
 .TP
 .BI zone_reset_frequency \fR=\fPfloat
 A number between zero and one that indicates how often a zone reset should be
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 4091d9ac..893aff3c 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1110,8 +1110,8 @@ test51() {
 	run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
-# Verify that zone_reset_threshold only takes logical blocks from seq
-# zones into account, and logical blocks of conv zones are not counted.
+# Verify that zone_reset_threshold only accounts written bytes in seq
+# zones, and written data bytes of conv zones are not counted.
 test52() {
 	local off io_size
 
@@ -1305,6 +1305,62 @@ test60() {
 	grep -q 'not support experimental verify' "${logfile}.${test_number}"
 }
 
+# Test fio errors out zone_reset_threshold option for multiple jobs with
+# different write ranges.
+test61() {
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --size="$zone_size" \
+		       --numjobs=2 --offset_increment="$zone_size" \
+		       --zone_reset_threshold=0.1 --zone_reset_frequency=1 \
+		       --exitall_on_error=1 \
+		       >> "${logfile}.${test_number}" 2>&1 && return 1
+	grep -q 'different write ranges' "${logfile}.${test_number}"
+}
+
+# Test zone_reset_threshold option works for multiple jobs with same write
+# range.
+test62() {
+	local bs loops=2 size=$((zone_size))
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	# Two jobs write to single zone twice. Reset zone happens at next write
+	# after half of the zone gets filled. So 2 * 2 * 2 - 1 = 7 times zone
+	# resets are expected.
+	bs=$(min $((256*1024)) $((zone_size / 4)))
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs" \
+		       --size=$size --loops=$loops --numjobs=2 \
+		       --zone_reset_frequency=1 --zone_reset_threshold=.5 \
+		       --group_reporting=1 \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+	check_written $((size * loops * 2)) || return $?
+	check_reset_count -eq 7 || return $?
+}
+
+# Test zone_reset_threshold option works for a read job and a write job with
+# different IO range.
+test63() {
+	local bs loops=2 size=$((zone_size)) off1 off2
+
+	[ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+	off1=$((first_sequential_zone_sector * 512))
+	off2=$((off1 + zone_size))
+	bs=$(min $((256*1024)) $((zone_size / 4)))
+
+	# One job writes to single zone twice. Reset zone happens at next write
+	# after half of the zone gets filled. So 2 * 2 - 1 = 3 times zone resets
+	# are expected.
+	run_fio "$(ioengine "psync")" --bs="$bs" --size=$size --loops=$loops \
+		--filename="$dev" --group_reporting=1 \
+		--zonemode=zbd --zonesize="$zone_size" --direct=1 \
+		--zone_reset_frequency=1 --zone_reset_threshold=.5 \
+		--name=r --rw=read --offset=$off1 "${job_var_opts[@]}" \
+		--name=w --rw=write --offset=$off2 "${job_var_opts[@]}" \
+		       >> "${logfile}.${test_number}" 2>&1 || return $?
+	check_written $((size * loops)) || return $?
+	check_reset_count -eq 3 || return $?
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
diff --git a/zbd.c b/zbd.c
index d1e469f6..ba2c0401 100644
--- a/zbd.c
+++ b/zbd.c
@@ -147,6 +147,11 @@ zbd_offset_to_zone(const struct fio_file *f,  uint64_t offset)
 	return zbd_get_zone(f, zbd_offset_to_zone_idx(f, offset));
 }
 
+static bool accounting_vdb(struct thread_data *td, const struct fio_file *f)
+{
+	return td->o.zrt.u.f && td_write(td);
+}
+
 /**
  * zbd_get_zoned_model - Get a device zoned model
  * @td: FIO thread data
@@ -285,10 +290,11 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 		break;
 	}
 
-	pthread_mutex_lock(&f->zbd_info->mutex);
-	f->zbd_info->sectors_with_data -= data_in_zone;
-	f->zbd_info->wp_sectors_with_data -= data_in_zone;
-	pthread_mutex_unlock(&f->zbd_info->mutex);
+	if (accounting_vdb(td, f)) {
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		f->zbd_info->wp_valid_data_bytes -= data_in_zone;
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+	}
 
 	z->wp = z->start;
 
@@ -536,7 +542,7 @@ static bool zbd_using_direct_io(void)
 }
 
 /* Whether or not the I/O range for f includes one or more sequential zones */
-static bool zbd_is_seq_job(struct fio_file *f)
+static bool zbd_is_seq_job(const struct fio_file *f)
 {
 	uint32_t zone_idx, zone_idx_b, zone_idx_e;
 
@@ -1068,6 +1074,52 @@ void zbd_recalc_options_with_zone_granularity(struct thread_data *td)
 	}
 }
 
+static uint64_t zbd_verify_and_set_vdb(struct thread_data *td,
+				       const struct fio_file *f)
+{
+	struct fio_zone_info *zb, *ze, *z;
+	uint64_t wp_vdb = 0;
+	struct zoned_block_device_info *zbdi = f->zbd_info;
+
+	assert(td->runstate < TD_RUNNING);
+	assert(zbdi);
+
+	if (!accounting_vdb(td, f))
+		return 0;
+
+	/*
+	 * Ensure that the I/O range includes one or more sequential zones so
+	 * that f->min_zone and f->max_zone have different values.
+	 */
+	if (!zbd_is_seq_job(f))
+		return 0;
+
+	if (zbdi->write_min_zone != zbdi->write_max_zone) {
+		if (zbdi->write_min_zone != f->min_zone ||
+		    zbdi->write_max_zone != f->max_zone) {
+			td_verror(td, EINVAL,
+				  "multi-jobs with different write ranges are "
+				  "not supported with zone_reset_threshold");
+			log_err("multi-jobs with different write ranges are "
+				"not supported with zone_reset_threshold\n");
+		}
+		return 0;
+	}
+
+	zbdi->write_min_zone = f->min_zone;
+	zbdi->write_max_zone = f->max_zone;
+
+	zb = zbd_get_zone(f, f->min_zone);
+	ze = zbd_get_zone(f, f->max_zone);
+	for (z = zb; z < ze; z++)
+		if (z->has_wp)
+			wp_vdb += z->wp - z->start;
+
+	zbdi->wp_valid_data_bytes = wp_vdb;
+
+	return wp_vdb;
+}
+
 int zbd_setup_files(struct thread_data *td)
 {
 	struct fio_file *f;
@@ -1093,6 +1145,7 @@ int zbd_setup_files(struct thread_data *td)
 		struct zoned_block_device_info *zbd = f->zbd_info;
 		struct fio_zone_info *z;
 		int zi;
+		uint64_t vdb;
 
 		assert(zbd);
 
@@ -1100,6 +1153,11 @@ int zbd_setup_files(struct thread_data *td)
 		f->max_zone =
 			zbd_offset_to_zone_idx(f, f->file_offset + f->io_size);
 
+		vdb = zbd_verify_and_set_vdb(td, f);
+
+		dprint(FD_ZBD, "%s(%s): valid data bytes = %" PRIu64 "\n",
+		       __func__, f->file_name, vdb);
+
 		/*
 		 * When all zones in the I/O range are conventional, io_size
 		 * can be smaller than zone size, making min_zone the same
@@ -1191,68 +1249,9 @@ static bool zbd_dec_and_reset_write_cnt(const struct thread_data *td,
 	return write_cnt == 0;
 }
 
-enum swd_action {
-	CHECK_SWD,
-	SET_SWD,
-};
-
-/* Calculate the number of sectors with data (swd) and perform action 'a' */
-static uint64_t zbd_process_swd(struct thread_data *td,
-				const struct fio_file *f, enum swd_action a)
-{
-	struct fio_zone_info *zb, *ze, *z;
-	uint64_t swd = 0;
-	uint64_t wp_swd = 0;
-
-	zb = zbd_get_zone(f, f->min_zone);
-	ze = zbd_get_zone(f, f->max_zone);
-	for (z = zb; z < ze; z++) {
-		if (z->has_wp) {
-			zone_lock(td, f, z);
-			wp_swd += z->wp - z->start;
-		}
-		swd += z->wp - z->start;
-	}
-
-	pthread_mutex_lock(&f->zbd_info->mutex);
-	switch (a) {
-	case CHECK_SWD:
-		assert(f->zbd_info->sectors_with_data == swd);
-		assert(f->zbd_info->wp_sectors_with_data == wp_swd);
-		break;
-	case SET_SWD:
-		f->zbd_info->sectors_with_data = swd;
-		f->zbd_info->wp_sectors_with_data = wp_swd;
-		break;
-	}
-	pthread_mutex_unlock(&f->zbd_info->mutex);
-
-	for (z = zb; z < ze; z++)
-		if (z->has_wp)
-			zone_unlock(z);
-
-	return swd;
-}
-
-/*
- * The swd check is useful for debugging but takes too much time to leave
- * it enabled all the time. Hence it is disabled by default.
- */
-static const bool enable_check_swd = false;
-
-/* Check whether the values of zbd_info.*sectors_with_data are correct. */
-static void zbd_check_swd(struct thread_data *td, const struct fio_file *f)
-{
-	if (!enable_check_swd)
-		return;
-
-	zbd_process_swd(td, f, CHECK_SWD);
-}
-
 void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_zone_info *zb, *ze;
-	uint64_t swd;
 	bool verify_data_left = false;
 
 	if (!f->zbd_info || !td_write(td))
@@ -1260,10 +1259,6 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 
 	zb = zbd_get_zone(f, f->min_zone);
 	ze = zbd_get_zone(f, f->max_zone);
-	swd = zbd_process_swd(td, f, SET_SWD);
-
-	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n",
-	       __func__, f->file_name, swd);
 
 	/*
 	 * If data verification is enabled reset the affected zones before
@@ -1639,12 +1634,11 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 		 * z->wp > zone_end means that one or more I/O errors
 		 * have occurred.
 		 */
-		pthread_mutex_lock(&zbd_info->mutex);
-		if (z->wp <= zone_end) {
-			zbd_info->sectors_with_data += zone_end - z->wp;
-			zbd_info->wp_sectors_with_data += zone_end - z->wp;
+		if (accounting_vdb(td, f) && z->wp <= zone_end) {
+			pthread_mutex_lock(&zbd_info->mutex);
+			zbd_info->wp_valid_data_bytes += zone_end - z->wp;
+			pthread_mutex_unlock(&zbd_info->mutex);
 		}
-		pthread_mutex_unlock(&zbd_info->mutex);
 		z->wp = zone_end;
 		break;
 	default:
@@ -1684,7 +1678,6 @@ static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 	zbd_end_zone_io(td, io_u, z);
 
 	zone_unlock(z);
-	zbd_check_swd(td, f);
 }
 
 /*
@@ -1801,8 +1794,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 	if (ddir != DDIR_READ || !td_rw(td))
 		return ddir;
 
-	if (io_u->file->zbd_info->sectors_with_data ||
-	    td->o.read_beyond_wp)
+	if (io_u->file->last_start[DDIR_WRITE] != -1ULL || td->o.read_beyond_wp)
 		return DDIR_READ;
 
 	return DDIR_WRITE;
@@ -1874,8 +1866,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	    io_u->ddir == DDIR_READ && td->o.read_beyond_wp)
 		return io_u_accept;
 
-	zbd_check_swd(td, f);
-
 	zone_lock(td, f, zb);
 
 	switch (io_u->ddir) {
@@ -2000,7 +1990,8 @@ retry:
 
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
-			if (zbdi->wp_sectors_with_data >= f->io_size * td->o.zrt.u.f &&
+			if (zbdi->wp_valid_data_bytes >=
+			    f->io_size * td->o.zrt.u.f &&
 			    zbd_dec_and_reset_write_cnt(td, f))
 				zb->reset_zone = 1;
 		}
diff --git a/zbd.h b/zbd.h
index d425707e..05189555 100644
--- a/zbd.h
+++ b/zbd.h
@@ -54,9 +54,9 @@ struct fio_zone_info {
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.
- * @sectors_with_data: total size of data in all zones in units of 512 bytes
- * @wp_sectors_with_data: total size of data in zones with write pointers in
- *                        units of 512 bytes
+ * @wp_valid_data_bytes: total size of data in zones with write pointers
+ * @write_min_zone: Minimum zone index of all job's write ranges. Inclusive.
+ * @write_max_zone: Maximum zone index of all job's write ranges. Exclusive.
  * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0
  *		if the zone size is not a power of 2.
  * @nr_zones: number of zones
@@ -76,8 +76,9 @@ struct zoned_block_device_info {
 	uint32_t		max_open_zones;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
-	uint64_t		sectors_with_data;
-	uint64_t		wp_sectors_with_data;
+	uint64_t		wp_valid_data_bytes;
+	uint32_t		write_min_zone;
+	uint32_t		write_max_zone;
 	uint32_t		zone_size_log2;
 	uint32_t		nr_zones;
 	uint32_t		refcount;

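After this series, zone_reset_threshold is compared against wp_valid_data_bytes, which counts the bytes written in write-pointer zones inside the I/O range. Every write job must also share a single write range. Below is a minimal sketch of those two checks. It uses a hypothetical struct in place of zoned_block_device_info and plain integers for zone indexes. The real code also takes zbd_info->mutex and calls zbd_dec_and_reset_write_cnt(); both are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of zoned_block_device_info. */
struct sketch_zbdi {
	uint64_t wp_valid_data_bytes;
	uint32_t write_min_zone;	/* inclusive */
	uint32_t write_max_zone;	/* exclusive; equal to min means "unset" */
};

/*
 * Record a write job's zone range. The first job sets the shared range and
 * later jobs must match it, as zbd_verify_and_set_vdb() requires.
 * Returns false for a mismatching range.
 */
static bool sketch_register_write_range(struct sketch_zbdi *z,
					uint32_t min_zone, uint32_t max_zone)
{
	if (z->write_min_zone != z->write_max_zone)
		return z->write_min_zone == min_zone &&
		       z->write_max_zone == max_zone;

	z->write_min_zone = min_zone;
	z->write_max_zone = max_zone;
	return true;
}

/* Threshold test in the style of zbd_adjust_block(): bytes vs io_size * ratio. */
static bool sketch_threshold_exceeded(const struct sketch_zbdi *z,
				      uint64_t io_size, double zrt)
{
	return z->wp_valid_data_bytes >= io_size * zrt;
}
```

A second job that registers a different range is rejected. This corresponds to the "different write ranges" error that test61 greps for.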

* Recent changes (master)
@ 2023-02-11 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-11 13:00 UTC
  To: fio

The following changes since commit 0f85dc841d9220f3a0b45d0b104d5277ea962199:

  Merge branch 'Offload_Segfault_Write_Log' of https://github.com/horshack-dpreview/fio (2023-02-09 09:34:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b65023f3c8849e122b2a223838ae9fdaed994e84:

  Merge branch 'msg-Modify_QD_Sync_Warning_For_offload' of https://github.com/horshack-dpreview/fio (2023-02-10 11:49:46 -0500)

----------------------------------------------------------------
Horshack (1):
      Suppress sync engine QD > 1 warning if io_submit_mode is offload

Vincent Fu (1):
      Merge branch 'msg-Modify_QD_Sync_Warning_For_offload' of https://github.com/horshack-dpreview/fio

 backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 928e524a..ffd34b36 100644
--- a/backend.c
+++ b/backend.c
@@ -1796,7 +1796,7 @@ static void *thread_main(void *data)
 	if (td_io_init(td))
 		goto err;
 
-	if (td_ioengine_flagged(td, FIO_SYNCIO) && td->o.iodepth > 1) {
+	if (td_ioengine_flagged(td, FIO_SYNCIO) && td->o.iodepth > 1 && td->o.io_submit_mode != IO_MODE_OFFLOAD) {
 		log_info("note: both iodepth >= 1 and synchronous I/O engine "
 			 "are selected, queue depth will be capped at 1\n");
 	}

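This change adds a third condition to the note printed for synchronous engines with iodepth > 1. The note is now skipped when io_submit_mode=offload, presumably because offload mode hands submission to separate worker threads. Below is a minimal sketch of the resulting predicate. A hypothetical enum stands in for fio's submit-mode values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for fio's io_submit_mode values. */
enum sketch_submit_mode { SKETCH_MODE_INLINE, SKETCH_MODE_OFFLOAD };

/* Whether thread_main() would print the "queue depth will be capped" note. */
static bool sketch_warn_qd_cap(bool sync_engine, unsigned int iodepth,
			       enum sketch_submit_mode mode)
{
	return sync_engine && iodepth > 1 && mode != SKETCH_MODE_OFFLOAD;
}
```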

* Recent changes (master)
@ 2023-02-10 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-10 13:00 UTC
  To: fio

The following changes since commit f0c8ab1c36369d8d6aa214fba572dacefa3a8677:

  ioengines: clarify FIO_RO_NEEDS_RW_OPEN flag (2023-02-07 10:44:00 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0f85dc841d9220f3a0b45d0b104d5277ea962199:

  Merge branch 'Offload_Segfault_Write_Log' of https://github.com/horshack-dpreview/fio (2023-02-09 09:34:32 -0700)

----------------------------------------------------------------
Horshack (1):
      SIGSEGV / Exit 139 when write_iolog used with io_submit_mode=offload

Jens Axboe (1):
      Merge branch 'Offload_Segfault_Write_Log' of https://github.com/horshack-dpreview/fio

 rate-submit.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/rate-submit.c b/rate-submit.c
index 2fe768c0..3cc17eaa 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -154,6 +154,7 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	dup_files(td, parent);
 	td->eo = parent->eo;
 	fio_options_mem_dupe(td);
+	td->iolog_f = parent->iolog_f;
 
 	if (ioengine_load(td))
 		goto err;

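The rate-submit.c fix copies the parent job's iolog_f into each offload worker's thread_data. The commit title reports a SIGSEGV with write_iolog. Presumably the worker's own iolog_f was not set up when the worker went to log its I/O. Below is a minimal sketch of the pattern, using hypothetical types: the worker borrows the parent's stream instead of opening its own, so every entry lands in one log. The log line layout is illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily reduced thread_data: just the I/O log stream. */
struct sketch_td {
	FILE *iolog_f;
};

/* Worker setup: share the parent's log stream (the one-line fix). */
static void sketch_worker_init(struct sketch_td *worker,
			       const struct sketch_td *parent)
{
	worker->iolog_f = parent->iolog_f;
}

/* Log one entry; a NULL stream is what a missing copy would run into. */
static int sketch_log_io(struct sketch_td *td, const char *fname,
			 const char *act, unsigned long long off,
			 unsigned int len)
{
	if (!td->iolog_f)
		return -1;
	return fprintf(td->iolog_f, "%s %s %llu %u\n", fname, act, off, len);
}
```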

* Recent changes (master)
@ 2023-02-08 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-08 13:00 UTC
  To: fio

The following changes since commit 6d7f8d9a31f9ecdeab0eed8f23c63b9a94ec61f6:

  engines/libzbc: set FIO_RO_NEEDS_RW_OPEN engine flag (2023-02-06 12:36:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f0c8ab1c36369d8d6aa214fba572dacefa3a8677:

  ioengines: clarify FIO_RO_NEEDS_RW_OPEN flag (2023-02-07 10:44:00 -0500)

----------------------------------------------------------------
Vincent Fu (3):
      Revert "engines/libzbc: set FIO_RO_NEEDS_RW_OPEN engine flag"
      engines/libzbc: for read workloads always open devices with O_RDONLY flag
      ioengines: clarify FIO_RO_NEEDS_RW_OPEN flag

 engines/libzbc.c | 6 +-----
 ioengines.h      | 3 ++-
 2 files changed, 3 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index dae4fe16..cb3e9ca5 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -68,9 +68,6 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 		if (!read_only)
 			flags |= O_RDWR;
 	} else if (td_read(td)) {
-		if (f->filetype == FIO_TYPE_CHAR && !read_only)
-			flags |= O_RDWR;
-		else
 			flags |= O_RDONLY;
 	}
 
@@ -469,8 +466,7 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.get_max_open_zones	= libzbc_get_max_open_zones,
 	.finish_zone		= libzbc_finish_zone,
 	.queue			= libzbc_queue,
-	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO |
-				  FIO_RO_NEEDS_RW_OPEN,
+	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO,
 };
 
 static void fio_init fio_libzbc_register(void)
diff --git a/ioengines.h b/ioengines.h
index 2cb9743e..ea799180 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -90,7 +90,8 @@ enum fio_ioengine_flags {
 	FIO_SKIPPABLE_IOMEM_ALLOC
 			= 1 << 17,	/* skip iomem_alloc & iomem_free if job sets mem/iomem */
 	FIO_RO_NEEDS_RW_OPEN
-			= 1 << 18,	/* open files in rw mode even if we have a read job */
+			= 1 << 18,	/* open files in rw mode even if we have a read job; only
+					   affects ioengines using generic_open_file */
 };
 
 /*

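After this series, FIO_RO_NEEDS_RW_OPEN is consulted only by generic_open_file() in filesetup.c (see the 2023-02-04 mail below), and the sg engine sets it. libzbc opens devices itself in libzbc_open_dev(), so it drops the flag and always uses O_RDONLY for read jobs. Below is a minimal sketch of how generic_open_file() chooses the access mode for a read job. SKETCH_RO_NEEDS_RW_OPEN is a hypothetical local constant that only mirrors the flag's bit position, and read_only stands in for fio's read-only mode.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FIO_RO_NEEDS_RW_OPEN engine flag bit. */
#define SKETCH_RO_NEEDS_RW_OPEN (1u << 18)

/*
 * Access mode chosen for a read job, following the filesetup.c hunk:
 * O_RDWR only if the engine asks for it and fio is not in read-only mode.
 */
static int sketch_read_job_access(unsigned int engine_flags, bool read_only)
{
	if ((engine_flags & SKETCH_RO_NEEDS_RW_OPEN) && !read_only)
		return O_RDWR;
	return O_RDONLY;
}
```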

* Recent changes (master)
@ 2023-02-07 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-07 13:00 UTC
  To: fio

The following changes since commit d72b10e3ca2f7e3b7ef4e54ea98e4e964a67192d:

  fio: add FIO_RO_NEEDS_RW_OPEN ioengine flag (2023-02-03 13:26:32 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6d7f8d9a31f9ecdeab0eed8f23c63b9a94ec61f6:

  engines/libzbc: set FIO_RO_NEEDS_RW_OPEN engine flag (2023-02-06 12:36:37 -0700)

----------------------------------------------------------------
Horshack (1):
      Improve IOPs 50% by avoiding clock sampling when rate options not used

Jens Axboe (2):
      Merge branch 'perf-Avoid_Clock_Check_For_No_Rate_Check' of https://github.com/horshack-dpreview/fio
      engines/libzbc: set FIO_RO_NEEDS_RW_OPEN engine flag

 engines/libzbc.c |  3 ++-
 io_u.c           | 10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index 2b63ef1a..dae4fe16 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -469,7 +469,8 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.get_max_open_zones	= libzbc_get_max_open_zones,
 	.finish_zone		= libzbc_finish_zone,
 	.queue			= libzbc_queue,
-	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO,
+	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO |
+				  FIO_RO_NEEDS_RW_OPEN,
 };
 
 static void fio_init fio_libzbc_register(void)
diff --git a/io_u.c b/io_u.c
index 8035f4b7..eb617e64 100644
--- a/io_u.c
+++ b/io_u.c
@@ -785,7 +785,15 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 	else
 		ddir = DDIR_INVAL;
 
-	td->rwmix_ddir = rate_ddir(td, ddir);
+	if (!should_check_rate(td)) {
+		/*
+		 * avoid time-consuming call to utime_since_now() if rate checking
+		 * isn't being used. this imrpoves IOPs 50%. See:
+		 * https://github.com/axboe/fio/issues/1501#issuecomment-1418327049
+		 */
+		td->rwmix_ddir = ddir;
+	} else
+		td->rwmix_ddir = rate_ddir(td, ddir);
 	return td->rwmix_ddir;
 }
 

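The io_u.c hunk bypasses rate_ddir() when should_check_rate() is false. According to the patch comment, the expensive part of that path is the utime_since_now() call. Below is a minimal sketch of the control flow. sketch_rate_ddir() is a hypothetical stub that only counts how often it is called; it does not read the clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical direction values. */
enum sketch_ddir { SKETCH_READ, SKETCH_WRITE };

static int sketch_rate_calls;	/* counts how often the slow path runs */

/* Stub for rate_ddir(); in fio this path samples the clock. */
static enum sketch_ddir sketch_rate_ddir(enum sketch_ddir d)
{
	sketch_rate_calls++;
	return d;
}

/* Shape of the patched tail of get_rw_ddir(). */
static enum sketch_ddir sketch_pick_ddir(bool check_rate, enum sketch_ddir d)
{
	if (!check_rate)
		return d;	/* no rate options: skip the slow path entirely */
	return sketch_rate_ddir(d);
}
```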

* Recent changes (master)
@ 2023-02-04 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-04 13:00 UTC
  To: fio

The following changes since commit 7d7a704638a1e957c845c04eeac82bdeda0c674c:

  lib/pattern: fix formatting (2023-01-31 10:44:54 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d72b10e3ca2f7e3b7ef4e54ea98e4e964a67192d:

  fio: add FIO_RO_NEEDS_RW_OPEN ioengine flag (2023-02-03 13:26:32 -0500)

----------------------------------------------------------------
Horshack (1):
      Add -replay_skip support for fio-generated I/O logs

Vincent Fu (2):
      Merge branch 'master' of https://github.com/horshack-dpreview/fio
      fio: add FIO_RO_NEEDS_RW_OPEN ioengine flag

 engines/sg.c |  2 +-
 filesetup.c  |  2 +-
 ioengines.h  |  2 ++
 iolog.c      | 20 ++++++++++++++------
 4 files changed, 18 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sg.c b/engines/sg.c
index 24783374..0bb5be4a 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -1428,7 +1428,7 @@ static struct ioengine_ops ioengine = {
 	.open_file	= fio_sgio_open,
 	.close_file	= fio_sgio_close,
 	.get_file_size	= fio_sgio_get_file_size,
-	.flags		= FIO_SYNCIO | FIO_RAWIO,
+	.flags		= FIO_SYNCIO | FIO_RAWIO | FIO_RO_NEEDS_RW_OPEN,
 	.options	= options,
 	.option_struct_size	= sizeof(struct sg_options)
 };
diff --git a/filesetup.c b/filesetup.c
index 1d3cc5ad..cb7047c5 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -768,7 +768,7 @@ open_again:
 		else
 			from_hash = file_lookup_open(f, flags);
 	} else if (td_read(td)) {
-		if (f->filetype == FIO_TYPE_CHAR && !read_only)
+		if (td_ioengine_flagged(td, FIO_RO_NEEDS_RW_OPEN) && !read_only)
 			flags |= O_RDWR;
 		else
 			flags |= O_RDONLY;
diff --git a/ioengines.h b/ioengines.h
index d43540d0..2cb9743e 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -89,6 +89,8 @@ enum fio_ioengine_flags {
 			= 1 << 16,	/* async ioengine with commit function that sets issue_time */
 	FIO_SKIPPABLE_IOMEM_ALLOC
 			= 1 << 17,	/* skip iomem_alloc & iomem_free if job sets mem/iomem */
+	FIO_RO_NEEDS_RW_OPEN
+			= 1 << 18,	/* open files in rw mode even if we have a read job */
 };
 
 /*
diff --git a/iolog.c b/iolog.c
index 62f2f524..3b296cd7 100644
--- a/iolog.c
+++ b/iolog.c
@@ -492,17 +492,25 @@ static bool read_iolog(struct thread_data *td)
 			 */
 			if (!strcmp(act, "wait"))
 				rw = DDIR_WAIT;
-			else if (!strcmp(act, "read"))
+			else if (!strcmp(act, "read")) {
+				if (td->o.replay_skip & (1u << DDIR_READ))
+					continue;
 				rw = DDIR_READ;
-			else if (!strcmp(act, "write"))
+			} else if (!strcmp(act, "write")) {
+				if (td->o.replay_skip & (1u << DDIR_WRITE))
+					continue;
 				rw = DDIR_WRITE;
-			else if (!strcmp(act, "sync"))
+			} else if (!strcmp(act, "sync")) {
+				if (td->o.replay_skip & (1u << DDIR_SYNC))
+					continue;
 				rw = DDIR_SYNC;
-			else if (!strcmp(act, "datasync"))
+			} else if (!strcmp(act, "datasync"))
 				rw = DDIR_DATASYNC;
-			else if (!strcmp(act, "trim"))
+			else if (!strcmp(act, "trim")) {
+				if (td->o.replay_skip & (1u << DDIR_TRIM))
+					continue;
 				rw = DDIR_TRIM;
-			else {
+			} else {
 				log_err("fio: bad iolog file action: %s\n",
 									act);
 				continue;

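The open-mode change above drops the FIO_TYPE_CHAR check in favour of an engine capability bit. A read job now opens O_RDWR only when the engine sets FIO_RO_NEEDS_RW_OPEN (in this diff, only the sg engine does) and the job is not read_only. The iolog.c hunk also lets read_iolog() drop replayed entries whose direction bit is set in replay_skip. Datasync entries are not checked there.

Below is a minimal sketch of both decisions. The `sketch_*` names and the direction numbering are hypothetical and are not fio's enum fio_ddir.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative value mirroring the diff: bit 18 of the engine flags. */
#define SKETCH_RO_NEEDS_RW_OPEN (1u << 18)

/* Hypothetical direction indices; fio's real enum may differ. */
enum sketch_ddir { SK_READ = 0, SK_WRITE, SK_TRIM, SK_SYNC };

/* Access mode for a read job: RW only if the engine asks for it. */
static int sketch_read_job_open_flags(unsigned int engine_flags, bool read_only)
{
	if ((engine_flags & SKETCH_RO_NEEDS_RW_OPEN) && !read_only)
		return O_RDWR;
	return O_RDONLY;
}

/*
 * Map an iolog action to a direction. Returns -1 when the action is
 * unknown or its bit is set in replay_skip (the caller then skips the
 * entry, like the `continue` in read_iolog()).
 */
static int sketch_parse_action(const char *act, unsigned int replay_skip)
{
	int ddir;

	if (!strcmp(act, "read"))
		ddir = SK_READ;
	else if (!strcmp(act, "write"))
		ddir = SK_WRITE;
	else if (!strcmp(act, "trim"))
		ddir = SK_TRIM;
	else if (!strcmp(act, "sync"))
		ddir = SK_SYNC;
	else
		return -1;

	if (replay_skip & (1u << ddir))
		return -1;
	return ddir;
}
```
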
* Recent changes (master)
@ 2023-02-01 13:00 Jens Axboe
From: Jens Axboe @ 2023-02-01 13:00 UTC
  To: fio

The following changes since commit c6cade164bc7e35e95ba88f816be4f44475e4e23:

  lib/pattern: Fix seg fault when calculating pattern length (2023-01-30 10:46:22 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7d7a704638a1e957c845c04eeac82bdeda0c674c:

  lib/pattern: fix formatting (2023-01-31 10:44:54 -0500)

----------------------------------------------------------------
Vincent Fu (2):
      test: add test for lib/pattern segfault issue
      lib/pattern: fix formatting

 lib/pattern.c             | 4 ++--
 t/jobs/t0028-c6cade16.fio | 5 +++++
 t/run-fio-tests.py        | 9 +++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 t/jobs/t0028-c6cade16.fio

---

Diff of recent changes:

diff --git a/lib/pattern.c b/lib/pattern.c
index e31d4734..9fca643e 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -386,9 +386,9 @@ static int parse_and_fill_pattern(const char *in, unsigned int in_len,
 		assert(filled);
 		assert(filled <= out_len);
 		out_len -= filled;
-		if (out)
-			out     += filled;
 		total   += filled;
+		if (out)
+			out += filled;
 
 	} while (in_len);
 
diff --git a/t/jobs/t0028-c6cade16.fio b/t/jobs/t0028-c6cade16.fio
new file mode 100644
index 00000000..a0096d80
--- /dev/null
+++ b/t/jobs/t0028-c6cade16.fio
@@ -0,0 +1,5 @@
+[test]
+size=16k
+readwrite=write
+buffer_pattern="abcd"-120xdeadface
+ioengine=null
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 70ff4371..c3091b68 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -1246,6 +1246,15 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          28,
+        'test_class':       FioJobTest,
+        'job':              't0028-c6cade16.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

* Recent changes (master)
@ 2023-01-31 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-31 13:00 UTC
  To: fio

The following changes since commit 1e8ec88fd5f3ab4b7bbd0119708d94fd64a4e7ad:

  Enable crc32c accelleration for arm64 on OSX (2023-01-25 08:01:30 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c6cade164bc7e35e95ba88f816be4f44475e4e23:

  lib/pattern: Fix seg fault when calculating pattern length (2023-01-30 10:46:22 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      lib/pattern: Fix seg fault when calculating pattern length

 lib/pattern.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/lib/pattern.c b/lib/pattern.c
index 9be29af6..e31d4734 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -386,7 +386,8 @@ static int parse_and_fill_pattern(const char *in, unsigned int in_len,
 		assert(filled);
 		assert(filled <= out_len);
 		out_len -= filled;
-		out     += filled;
+		if (out)
+			out     += filled;
 		total   += filled;
 
 	} while (in_len);

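parse_and_fill_pattern() appears to be callable with out == NULL in order to compute a pattern's length without writing it. Advancing a NULL pointer presumably turned it into a small non-NULL address. Later `if (out)` guards would then pass, and the code would write through that address.

Below is a minimal sketch of the measure-or-fill idiom. It uses a hypothetical fill_repeat() helper and is not fio's pattern parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append `count` copies of `chunk` to `out`.
 * With out == NULL nothing is written and only the length is returned,
 * so a caller can size a buffer first and fill it on a second pass.
 */
static size_t fill_repeat(char *out, const char *chunk, size_t count)
{
	size_t len = strlen(chunk);
	size_t total = 0;

	for (size_t i = 0; i < count; i++) {
		if (out) {
			memcpy(out, chunk, len);
			/* Only advance a real buffer; advancing NULL would make
			 * the guard above pass on the next iteration. */
			out += len;
		}
		total += len;
	}
	return total;
}
```
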
* Recent changes (master)
@ 2023-01-26 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-26 13:00 UTC
  To: fio

The following changes since commit 5ca54c1ba2db849dfaef5fe3aec60329b3df0bd1:

  Makefile: add -Wno-stringop-truncation for y.tab.o (2023-01-24 21:07:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1e8ec88fd5f3ab4b7bbd0119708d94fd64a4e7ad:

  Enable crc32c accelleration for arm64 on OSX (2023-01-25 08:01:30 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Enable crc32c accelleration for arm64 on OSX

 configure   |  8 +++++---
 os/os-mac.h | 10 ++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index a17d1cda..182cd3c3 100755
--- a/configure
+++ b/configure
@@ -2685,20 +2685,22 @@ print_config "libblkio engine" "$libblkio"
 
 ##########################################
 # check march=armv8-a+crc+crypto
-if test "$march_armv8_a_crc_crypto" != "yes" ; then
-  march_armv8_a_crc_crypto="no"
-fi
+march_armv8_a_crc_crypto="no"
 if test "$cpu" = "arm64" ; then
   cat > $TMPC <<EOF
+#if __linux__
 #include <arm_acle.h>
 #include <arm_neon.h>
 #include <sys/auxv.h>
+#endif
 
 int main(void)
 {
   /* Can we also do a runtime probe? */
 #if __linux__
   return getauxval(AT_HWCAP);
+#elif defined(__APPLE__)
+  return 0;
 #else
 # error "Don't know how to do runtime probe for ARM CRC32c"
 #endif
diff --git a/os/os-mac.h b/os/os-mac.h
index ec2cc1e5..c9103c45 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -14,12 +14,14 @@
 #include <machine/endian.h>
 #include <libkern/OSByteOrder.h>
 
+#include "../arch/arch.h"
 #include "../file.h"
 
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_NATIVE_FALLOCATE
+#define FIO_HAVE_CPU_HAS
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -106,4 +108,12 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t l
 	return false;
 }
 
+static inline bool os_cpu_has(cpu_features feature)
+{
+	/* just check for arm on OSX for now, we know that has it */
+	if (feature != CPU_ARM64_CRC32C)
+		return false;
+	return FIO_ARCH == arch_aarch64;
+}
+
 #endif

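On macOS, os_cpu_has() now reports CPU_ARM64_CRC32C whenever FIO_ARCH is aarch64. The configure probe also stops requiring the Linux-only headers there. Such a probe exists to choose, at runtime, between a hardware CRC32C routine and a portable fallback.

The sketch below covers only the fallback and a dispatch hook. `crc32c_sw()`, `crc32c_select()` and its parameters are hypothetical. A real arm64 path would use the `__crc32cb`/`__crc32cd` intrinsics from `<arm_acle.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sw(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc ^ 0xFFFFFFFFu;
}

typedef uint32_t (*crc32c_fn)(const void *, size_t);

/* Choose once; have_hw would come from a probe such as os_cpu_has(). */
static crc32c_fn crc32c_select(bool have_hw, crc32c_fn hw)
{
	return (have_hw && hw) ? hw : crc32c_sw;
}
```

The standard CRC-32C check value for "123456789" is 0xE3069283, which makes a quick self-test for any replacement routine.
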
* Recent changes (master)
@ 2023-01-25 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-25 13:00 UTC
  To: fio

The following changes since commit 13bc47b65d32df6fd212c4687c7fd29b4ab7c09d:

  tools/fiograph: accommodate job files not ending in .fio (2023-01-23 13:51:16 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5ca54c1ba2db849dfaef5fe3aec60329b3df0bd1:

  Makefile: add -Wno-stringop-truncation for y.tab.o (2023-01-24 21:07:37 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Makefile: add -Wno-stringop-truncation for y.tab.o

 Makefile  |  6 +++++-
 configure | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 9fd8f59b..5f4e6562 100644
--- a/Makefile
+++ b/Makefile
@@ -554,11 +554,15 @@ ifneq (,$(findstring -Wimplicit-fallthrough,$(CFLAGS)))
 LEX_YY_CFLAGS := -Wno-implicit-fallthrough
 endif
 
+ifdef CONFIG_HAVE_NO_STRINGOP
+YTAB_YY_CFLAGS := -Wno-stringop-truncation
+endif
+
 lex.yy.o: lex.yy.c y.tab.h
 	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) $(LEX_YY_CFLAGS) -c $<
 
 y.tab.o: y.tab.c y.tab.h
-	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
+	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) $(YTAB_YY_CFLAGS) -c $<
 
 y.tab.c: exp/expression-parser.y
 	$(QUIET_YACC)$(YACC) -o $@ -l -d -b y $<
diff --git a/configure b/configure
index 6d8e3a87..a17d1cda 100755
--- a/configure
+++ b/configure
@@ -2826,6 +2826,22 @@ if compile_prog "-Wimplicit-fallthrough=2" "" "-Wimplicit-fallthrough=2"; then
 fi
 print_config "-Wimplicit-fallthrough=2" "$fallthrough"
 
+##########################################
+# check if the compiler has -Wno-stringop-concatenation
+no_stringop="no"
+cat > $TMPC << EOF
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+	return printf("%s\n", argv[0]);
+}
+EOF
+if compile_prog "-Wno-stringop-truncation -Werror" "" "no_stringop"; then
+  no_stringop="yes"
+fi
+print_config "-Wno-stringop-truncation" "$no_stringop"
+
 ##########################################
 # check for MADV_HUGEPAGE support
 if test "$thp" != "yes" ; then
@@ -3271,6 +3287,9 @@ fi
 if test "$fallthrough" = "yes"; then
   CFLAGS="$CFLAGS -Wimplicit-fallthrough"
 fi
+if test "$no_stringop" = "yes"; then
+  output_sym "CONFIG_HAVE_NO_STRINGOP"
+fi
 if test "$thp" = "yes" ; then
   output_sym "CONFIG_HAVE_THP"
 fi

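The comment in the configure hunk says `-Wno-stringop-concatenation`, but the probe and the Makefile both use `-Wno-stringop-truncation`. That GCC warning fires on bounded copies such as `strncpy()` that may leave the destination without a terminating NUL. The diff does not say which construct in the generated y.tab.c triggers it, so the patch silences the warning for that one object only.

In hand-written code, the pattern can be avoided entirely by bounding the copy and terminating explicitly. The helper below, copy_bounded(), is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most dstsz-1 bytes and always NUL-terminate; returns bytes copied. */
static size_t copy_bounded(char *dst, size_t dstsz, const char *src)
{
	size_t n;

	if (dstsz == 0)
		return 0;
	n = strlen(src);
	if (n >= dstsz)
		n = dstsz - 1;	/* truncate, keeping room for the terminator */
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```
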
* Recent changes (master)
@ 2023-01-24 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-24 13:00 UTC
  To: fio

The following changes since commit cab9bedf70f34142b70cf97bf4d8c8df57a6f82f:

  examples: remove test.png (2023-01-19 13:10:22 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 13bc47b65d32df6fd212c4687c7fd29b4ab7c09d:

  tools/fiograph: accommodate job files not ending in .fio (2023-01-23 13:51:16 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      tools/fiograph: accommodate job files not ending in .fio

 tools/fiograph/fiograph.py | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/tools/fiograph/fiograph.py b/tools/fiograph/fiograph.py
index 86ed40a8..cfb9b041 100755
--- a/tools/fiograph/fiograph.py
+++ b/tools/fiograph/fiograph.py
@@ -1,4 +1,6 @@
 #!/usr/bin/env python3
+import uuid
+import time
 import errno
 from graphviz import Digraph
 import argparse
@@ -293,13 +295,6 @@ def main():
     global config_file
     args = setup_commandline()
 
-    if args.output is None:
-        output_file = args.file
-        if output_file.endswith('.fio'):
-            output_file = output_file[:-4]
-    else:
-        output_file = args.output
-
     if args.config is None:
         if os.path.exists('fiograph.conf'):
             config_filename = 'fiograph.conf'
@@ -312,9 +307,25 @@ def main():
     config_file = configparser.RawConfigParser(allow_no_value=True)
     config_file.read(config_filename)
 
-    fio_to_graphviz(args.file, args.format).render(output_file, view=args.view)
+    temp_filename = uuid.uuid4().hex
+    image_filename = fio_to_graphviz(args.file, args.format).render(temp_filename, view=args.view)
+
+    output_filename_stub = args.file
+    if args.output:
+        output_filename = args.output
+    else:
+        if output_filename_stub.endswith('.fio'):
+            output_filename_stub = output_filename_stub[:-4]
+        output_filename = image_filename.replace(temp_filename, output_filename_stub)
+    if args.view:
+        time.sleep(1)
+        # allow time for the file to be opened before renaming it
+    os.rename(image_filename, output_filename)
+
     if not args.keep:
-        os.remove(output_file)
+        os.remove(temp_filename)
+    else:
+        os.rename(temp_filename, output_filename_stub + '.gv')
 
 
 main()

* Recent changes (master)
@ 2023-01-21 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-21 13:00 UTC
  To: fio

The following changes since commit 69bb4b00b20047e7b5af9a6e35cc872cae605071:

  tools/fiograph: improve default config file search (2023-01-18 19:52:15 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cab9bedf70f34142b70cf97bf4d8c8df57a6f82f:

  examples: remove test.png (2023-01-19 13:10:22 -0500)

----------------------------------------------------------------
Vincent Fu (1):
      examples: remove test.png

 examples/test.png | Bin 30141 -> 0 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 examples/test.png

---

Diff of recent changes:

diff --git a/examples/test.png b/examples/test.png
deleted file mode 100644
index 6be50029..00000000
Binary files a/examples/test.png and /dev/null differ

* Recent changes (master)
@ 2023-01-19 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-19 13:00 UTC
  To: fio

The following changes since commit 44834c57f944074684c1b58604cc44199cf5e633:

  examples: add missing fiograph diagram for sg_write_same_ndob.fio (2023-01-11 15:22:40 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 69bb4b00b20047e7b5af9a6e35cc872cae605071:

  tools/fiograph: improve default config file search (2023-01-18 19:52:15 -0500)

----------------------------------------------------------------
Vincent Fu (3):
      tools/fiograph: add link to file formats
      tools/fiograph: improve default output file name
      tools/fiograph: improve default config file search

 tools/fiograph/fiograph.py | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/tools/fiograph/fiograph.py b/tools/fiograph/fiograph.py
index 384decda..86ed40a8 100755
--- a/tools/fiograph/fiograph.py
+++ b/tools/fiograph/fiograph.py
@@ -1,4 +1,5 @@
 #!/usr/bin/env python3
+import errno
 from graphviz import Digraph
 import argparse
 import configparser
@@ -274,7 +275,7 @@ def setup_commandline():
     parser.add_argument('--format', action='store',
                         type=str,
                         default='png',
-                        help='the output format')
+                        help='the output format (see https://graphviz.org/docs/outputs/)')
     parser.add_argument('--view', action='store_true',
                         default=False,
                         help='view the graph')
@@ -283,7 +284,6 @@ def setup_commandline():
                         help='keep the graphviz script file')
     parser.add_argument('--config', action='store',
                         type=str,
-                        default='fiograph.conf',
                         help='the configuration filename')
     args = parser.parse_args()
     return args
@@ -292,13 +292,26 @@ def setup_commandline():
 def main():
     global config_file
     args = setup_commandline()
+
     if args.output is None:
         output_file = args.file
-        output_file = output_file.replace('.fio', '')
+        if output_file.endswith('.fio'):
+            output_file = output_file[:-4]
     else:
         output_file = args.output
+
+    if args.config is None:
+        if os.path.exists('fiograph.conf'):
+            config_filename = 'fiograph.conf'
+        else:
+            config_filename = os.path.join(os.path.dirname(__file__), 'fiograph.conf')
+            if not os.path.exists(config_filename):
+                raise FileNotFoundError("Cannot locate configuration file")
+    else:
+        config_filename = args.config
     config_file = configparser.RawConfigParser(allow_no_value=True)
-    config_file.read(args.config)
+    config_file.read(config_filename)
+
     fio_to_graphviz(args.file, args.format).render(output_file, view=args.view)
     if not args.keep:
         os.remove(output_file)

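The output-name fix matters because `str.replace('.fio', '')` removes every occurrence of the substring. Under that rule, `a.fio.b.fio` became `a.b` and `x.fiox` became `xx`. The new code strips the suffix only when the name actually ends in `.fio`.

The same rule is shown here in C for consistency with the rest of fio, as a hypothetical helper:

```c
#include <string.h>

/* Remove a trailing ".fio" in place; other occurrences are left alone. */
static void strip_fio_suffix(char *name)
{
	const char *suffix = ".fio";
	size_t n = strlen(name), s = strlen(suffix);

	if (n >= s && strcmp(name + n - s, suffix) == 0)
		name[n - s] = '\0';
}
```
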
* Recent changes (master)
@ 2023-01-12 13:00 Jens Axboe
From: Jens Axboe @ 2023-01-12 13:00 UTC
  To: fio

The following changes since commit c945074c0336fb1720acead38e578d4dd7f29921:

  engines/xnvme: add support for picking mem backend (2022-12-22 08:50:03 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 44834c57f944074684c1b58604cc44199cf5e633:

  examples: add missing fiograph diagram for sg_write_same_ndob.fio (2023-01-11 15:22:40 -0500)

----------------------------------------------------------------
Ankit Kumar (1):
      doc: clarify the usage of rw_sequencer

Vincent Fu (1):
      examples: add missing fiograph diagram for sg_write_same_ndob.fio

 HOWTO.rst                       |  35 ++++++++++++++++++++++++------
 examples/sg_write_same_ndob.png | Bin 0 -> 97793 bytes
 fio.1                           |  47 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 69 insertions(+), 13 deletions(-)
 create mode 100644 examples/sg_write_same_ndob.png

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 0a48a453..17caaf5d 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1176,13 +1176,34 @@ I/O type
 			Generate the same offset.
 
 	``sequential`` is only useful for random I/O, where fio would normally
-	generate a new random offset for every I/O. If you append e.g. 8 to randread,
-	you would get a new random offset for every 8 I/Os. The result would be a
-	seek for only every 8 I/Os, instead of for every I/O. Use ``rw=randread:8``
-	to specify that. As sequential I/O is already sequential, setting
-	``sequential`` for that would not result in any differences.  ``identical``
-	behaves in a similar fashion, except it sends the same offset 8 number of
-	times before generating a new offset.
+	generate a new random offset for every I/O. If you append e.g. 8 to
+	randread, i.e. ``rw=randread:8`` you would get a new random offset for
+	every 8 I/Os. The result would be a sequence of 8 sequential offsets
+	with a random starting point. However this behavior may change if a
+	sequential I/O reaches end of the file. As sequential I/O is already
+	sequential, setting ``sequential`` for that would not result in any
+	difference. ``identical`` behaves in a similar fashion, except it sends
+	the same offset 8 number of times before generating a new offset.
+
+	Example #1::
+
+		rw=randread:8
+		rw_sequencer=sequential
+		bs=4k
+
+	The generated sequence of offsets will look like this:
+	4k, 8k, 12k, 16k, 20k, 24k, 28k, 32k, 92k, 96k, 100k, 104k, 108k,
+	112k, 116k, 120k, 48k, 52k ...
+
+	Example #2::
+
+		rw=randread:8
+		rw_sequencer=identical
+		bs=4k
+
+	The generated sequence of offsets will look like this:
+	4k, 4k, 4k, 4k, 4k, 4k, 4k, 4k, 92k, 92k, 92k, 92k, 92k, 92k, 92k, 92k,
+	48k, 48k, 48k ...
 
 .. option:: unified_rw_reporting=str
 
diff --git a/examples/sg_write_same_ndob.png b/examples/sg_write_same_ndob.png
new file mode 100644
index 00000000..8b76fc6c
Binary files /dev/null and b/examples/sg_write_same_ndob.png differ
diff --git a/fio.1 b/fio.1
index eb87533f..527b3d46 100644
--- a/fio.1
+++ b/fio.1
@@ -952,12 +952,47 @@ Generate the same offset.
 .P
 \fBsequential\fR is only useful for random I/O, where fio would normally
 generate a new random offset for every I/O. If you append e.g. 8 to randread,
-you would get a new random offset for every 8 I/Os. The result would be a
-seek for only every 8 I/Os, instead of for every I/O. Use `rw=randread:8'
-to specify that. As sequential I/O is already sequential, setting
-\fBsequential\fR for that would not result in any differences. \fBidentical\fR
-behaves in a similar fashion, except it sends the same offset 8 number of
-times before generating a new offset.
+i.e. `rw=randread:8' you would get a new random offset for every 8 I/Os. The
+result would be a sequence of 8 sequential offsets with a random starting
+point.  However this behavior may change if a sequential I/O reaches end of the
+file. As sequential I/O is already sequential, setting \fBsequential\fR for
+that would not result in any difference. \fBidentical\fR behaves in a similar
+fashion, except it sends the same offset 8 number of times before generating a
+new offset.
+.P
+.P
+Example #1:
+.RS
+.P
+.PD 0
+rw=randread:8
+.P
+rw_sequencer=sequential
+.P
+bs=4k
+.PD
+.RE
+.P
+The generated sequence of offsets will look like this:
+4k, 8k, 12k, 16k, 20k, 24k, 28k, 32k, 92k, 96k, 100k, 104k, 108k, 112k, 116k,
+120k, 48k, 52k ...
+.P
+.P
+Example #2:
+.RS
+.P
+.PD 0
+rw=randread:8
+.P
+rw_sequencer=identical
+.P
+bs=4k
+.PD
+.RE
+.P
+The generated sequence of offsets will look like this:
+4k, 4k, 4k, 4k, 4k, 4k, 4k, 4k, 92k, 92k, 92k, 92k, 92k, 92k, 92k, 92k, 48k,
+48k, 48k ...
 .RE
 .TP
 .BI unified_rw_reporting \fR=\fPstr

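Per the updated documentation, `rw=randread:8` draws a new random offset only once every 8 I/Os. `rw_sequencer=sequential` then steps forward by `bs` from that offset, while `identical` repeats it.

Below is a sketch of that sequencing. It assumes the caller supplies the random starting offsets (here 4k, 92k and 48k, taken from the examples) and ignores the end-of-file case the text warns about. `gen_offsets()` and `enum seq_mode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum seq_mode { SEQ_SEQUENTIAL, SEQ_IDENTICAL };

/*
 * Fill out[0..n) with offsets. Every `nr` I/Os the next start is taken
 * from starts[] (which must hold at least ceil(n / nr) entries); in
 * between, either advance by bs (sequential) or repeat (identical).
 */
static void gen_offsets(uint64_t *out, size_t n, const uint64_t *starts,
			unsigned int nr, uint64_t bs, enum seq_mode mode)
{
	uint64_t cur = 0;

	for (size_t i = 0; i < n; i++) {
		if (i % nr == 0)
			cur = starts[i / nr];
		else if (mode == SEQ_SEQUENTIAL)
			cur += bs;
		out[i] = cur;
	}
}
```

With starts {4k, 92k, 48k}, nr = 8 and bs = 4k, the sequential mode yields 4k..32k, 92k..120k, 48k, 52k, which matches Example #1.
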
* Recent changes (master)
@ 2022-12-23 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-23 13:00 UTC
  To: fio

The following changes since commit 77c758db876d93022e8f2bb4fd4c1acbbf7e76ac:

  t/run-fio-tests: relax acceptance criteria for t0008 (2022-12-16 19:35:07 +0000)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c945074c0336fb1720acead38e578d4dd7f29921:

  engines/xnvme: add support for picking mem backend (2022-12-22 08:50:03 -0500)

----------------------------------------------------------------
Ankit Kumar (4):
      engines/xnvme: fixes for xnvme ioengine
      engines/xnvme: user space vfio based backend
      engines/xnvme: add subnqn to fio-options
      engines/xnvme: add support for picking mem backend

 HOWTO.rst       | 27 ++++++++++++++++++++++++++-
 engines/xnvme.c | 44 ++++++++++++++++++++++++++++++++++++++------
 fio.1           | 32 +++++++++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 97fe5350..0a48a453 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2845,6 +2845,9 @@ with the caveat that when used on the command line, they must come after the
 	**posix**
 		Use the posix asynchronous I/O interface to perform one or
 		more I/O operations asynchronously.
+	**vfio**
+		Use the user-space VFIO-based backend, implemented using
+		libvfn instead of SPDK.
 	**nil**
 		Do not transfer any data; just pretend to. This is mainly used
 		for introspective performance evaluation.
@@ -2875,7 +2878,29 @@ with the caveat that when used on the command line, they must come after the
 
 .. option:: xnvme_dev_nsid=int : [xnvme]
 
-	xnvme namespace identifier for userspace NVMe driver, such as SPDK.
+	xnvme namespace identifier for userspace NVMe driver, SPDK or vfio.
+
+.. option:: xnvme_dev_subnqn=str : [xnvme]
+
+	Sets the subsystem NQN for fabrics. This is for xNVMe to utilize a
+	fabrics target with multiple systems.
+
+.. option:: xnvme_mem=str : [xnvme]
+
+	Select the xnvme memory backend. This can take these values.
+
+	**posix**
+		This is the default posix memory backend for linux NVMe driver.
+	**hugepage**
+		Use hugepages, instead of existing posix memory backend. The
+		memory backend uses hugetlbfs. This require users to allocate
+		hugepages, mount hugetlbfs and set an enviornment variable for
+		XNVME_HUGETLB_PATH.
+	**spdk**
+		Uses SPDK's memory allocator.
+	**vfio**
+		Uses libvfn's memory allocator. This also specifies the use
+		of libvfn backend instead of SPDK.
 
 .. option:: xnvme_iovec=int : [xnvme]
 
diff --git a/engines/xnvme.c b/engines/xnvme.c
index d8647481..bb92a121 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -75,9 +75,11 @@ struct xnvme_fioe_options {
 	unsigned int xnvme_dev_nsid;
 	unsigned int xnvme_iovec;
 	char *xnvme_be;
+	char *xnvme_mem;
 	char *xnvme_async;
 	char *xnvme_sync;
 	char *xnvme_admin;
+	char *xnvme_dev_subnqn;
 };
 
 static struct fio_option options[] = {
@@ -108,12 +110,22 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
+	{
+		.name = "xnvme_mem",
+		.lname = "xNVMe Memory Backend",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_mem),
+		.help = "Select xNVMe memory backend",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
 	{
 		.name = "xnvme_async",
 		.lname = "xNVMe Asynchronous command-interface",
 		.type = FIO_OPT_STR_STORE,
 		.off1 = offsetof(struct xnvme_fioe_options, xnvme_async),
-		.help = "Select xNVMe async. interface: [emu,thrpool,io_uring,libaio,posix,nil]",
+		.help = "Select xNVMe async. interface: "
+			"[emu,thrpool,io_uring,io_uring_cmd,libaio,posix,vfio,nil]",
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
@@ -122,7 +134,7 @@ static struct fio_option options[] = {
 		.lname = "xNVMe Synchronous. command-interface",
 		.type = FIO_OPT_STR_STORE,
 		.off1 = offsetof(struct xnvme_fioe_options, xnvme_sync),
-		.help = "Select xNVMe sync. interface: [nvme,psync]",
+		.help = "Select xNVMe sync. interface: [nvme,psync,block]",
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
@@ -131,7 +143,7 @@ static struct fio_option options[] = {
 		.lname = "xNVMe Admin command-interface",
 		.type = FIO_OPT_STR_STORE,
 		.off1 = offsetof(struct xnvme_fioe_options, xnvme_admin),
-		.help = "Select xNVMe admin. cmd-interface: [nvme,block,file_as_ns]",
+		.help = "Select xNVMe admin. cmd-interface: [nvme,block]",
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
@@ -144,6 +156,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group = FIO_OPT_G_XNVME,
 	},
+	{
+		.name = "xnvme_dev_subnqn",
+		.lname = "Subsystem nqn for Fabrics",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_dev_subnqn),
+		.help = "Subsystem NQN for Fabrics",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
 	{
 		.name = "xnvme_iovec",
 		.lname = "Vectored IOs",
@@ -180,7 +201,9 @@ static struct xnvme_opts xnvme_opts_from_fioe(struct thread_data *td)
 	struct xnvme_opts opts = xnvme_opts_default();
 
 	opts.nsid = o->xnvme_dev_nsid;
+	opts.subnqn = o->xnvme_dev_subnqn;
 	opts.be = o->xnvme_be;
+	opts.mem = o->xnvme_mem;
 	opts.async = o->xnvme_async;
 	opts.sync = o->xnvme_sync;
 	opts.admin = o->xnvme_admin;
@@ -322,12 +345,15 @@ static int xnvme_fioe_init(struct thread_data *td)
 
 	xd->iocq = calloc(td->o.iodepth, sizeof(struct io_u *));
 	if (!xd->iocq) {
-		log_err("ioeng->init(): !calloc(), err(%d)\n", errno);
+		free(xd);
+		log_err("ioeng->init(): !calloc(xd->iocq), err(%d)\n", errno);
 		return 1;
 	}
 
 	xd->iovec = calloc(td->o.iodepth, sizeof(*xd->iovec));
 	if (!xd->iovec) {
+		free(xd->iocq);
+		free(xd);
 		log_err("ioeng->init(): !calloc(xd->iovec), err(%d)\n", errno);
 		return 1;
 	}
@@ -338,6 +364,10 @@ static int xnvme_fioe_init(struct thread_data *td)
 	for_each_file(td, f, i)
 	{
 		if (_dev_open(td, f)) {
+			/*
+			 * Note: We are not freeing xd, iocq and iovec. This
+			 * will be done as part of cleanup routine.
+			 */
 			log_err("ioeng->init(): failed; _dev_open(%s)\n", f->file_name);
 			return 1;
 		}
@@ -506,9 +536,11 @@ static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *i
 
 	default:
 		log_err("ioeng->queue(): ENOSYS: %u\n", io_u->ddir);
-		err = -1;
+		xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
+
+		io_u->error = ENOSYS;
 		assert(false);
-		break;
+		return FIO_Q_COMPLETED;
 	}
 
 	if (vectored_io) {
diff --git a/fio.1 b/fio.1
index 1074b52a..eb87533f 100644
--- a/fio.1
+++ b/fio.1
@@ -2584,6 +2584,10 @@ Use Linux aio for Asynchronous I/O
 Use the posix asynchronous I/O interface to perform one or more I/O operations
 asynchronously.
 .TP
+.BI vfio
+Use the user-space VFIO-based backend, implemented using libvfn instead of
+SPDK.
+.TP
 .BI nil
 Do not transfer any data; just pretend to. This is mainly used for
 introspective performance evaluation.
@@ -2621,7 +2625,33 @@ Use Linux Block Layer ioctl() and sysfs for admin commands.
 .RE
 .TP
 .BI (xnvme)xnvme_dev_nsid\fR=\fPint
-xnvme namespace identifier for userspace NVMe driver such as SPDK.
+xnvme namespace identifier for userspace NVMe driver SPDK or vfio.
+.TP
+.BI (xnvme)xnvme_dev_subnqn\fR=\fPstr
+Sets the subsystem NQN for fabrics. This is for xNVMe to utilize a fabrics
+target with multiple systems.
+.TP
+.BI (xnvme)xnvme_mem\fR=\fPstr
+Select the xnvme memory backend. This can take these values.
+.RS
+.RS
+.TP
+.B posix
+This is the default posix memory backend for linux NVMe driver.
+.TP
+.BI hugepage
+Use hugepages, instead of existing posix memory backend. The memory backend
+uses hugetlbfs. This require users to allocate hugepages, mount hugetlbfs and
+set an enviornment variable for XNVME_HUGETLB_PATH.
+.TP
+.BI spdk
+Uses SPDK's memory allocator.
+.TP
+.BI vfio
+Uses libvfn's memory allocator. This also specifies the use of libvfn backend
+instead of SPDK.
+.RE
+.RE
 .TP
 .BI (xnvme)xnvme_iovec
 If this option is set, xnvme will use vectored read/write commands.

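The xnvme_fioe_init() fix frees `xd` (and `xd->iocq`) by hand on each allocation failure. Later _dev_open() failures are left to the cleanup routine. A common C idiom for the same unwinding is a single exit ladder built with `goto`.

The sketch below uses hypothetical `struct engine_data`, `engine_init()` and `engine_cleanup()` names. It is not the xnvme engine's code.

```c
#include <stdlib.h>

struct engine_data {
	void **iocq;
	void *iovec;
};

/* Allocate per-engine state; on failure, free whatever was already allocated. */
static struct engine_data *engine_init(size_t iodepth, size_t iovec_elem_size)
{
	struct engine_data *xd = calloc(1, sizeof(*xd));

	if (!xd)
		return NULL;

	xd->iocq = calloc(iodepth, sizeof(*xd->iocq));
	if (!xd->iocq)
		goto err_xd;

	xd->iovec = calloc(iodepth, iovec_elem_size);
	if (!xd->iovec)
		goto err_iocq;

	return xd;

err_iocq:
	free(xd->iocq);
err_xd:
	free(xd);
	return NULL;
}

/* Counterpart used once init has fully succeeded. */
static void engine_cleanup(struct engine_data *xd)
{
	if (!xd)
		return;
	free(xd->iovec);
	free(xd->iocq);
	free(xd);
}
```
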
* Recent changes (master)
@ 2022-12-17 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-17 13:00 UTC
  To: fio

The following changes since commit a5000c864aa8e7b5525f51fa6fecec75c518b013:

  example: add a zoned block device write example with GC by trim workload (2022-12-15 12:42:12 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 77c758db876d93022e8f2bb4fd4c1acbbf7e76ac:

  t/run-fio-tests: relax acceptance criteria for t0008 (2022-12-16 19:35:07 +0000)

----------------------------------------------------------------
Vincent Fu (3):
      tools/fiograph: update config file
      examples: add missing fiograph diagrams
      t/run-fio-tests: relax acceptance criteria for t0008

 examples/dedupe-global.png                | Bin 0 -> 55479 bytes
 examples/http-s3-crypto.png               | Bin 0 -> 131254 bytes
 examples/http-s3-storage-class.png        | Bin 0 -> 123771 bytes
 examples/libblkio-io_uring.png            | Bin 0 -> 57227 bytes
 examples/libblkio-virtio-blk-vfio-pci.png | Bin 0 -> 62398 bytes
 examples/sg_verify-fail.png               | Bin 0 -> 107581 bytes
 examples/sg_verify.png                    | Bin 0 -> 161969 bytes
 examples/uring-cmd-ng.png                 | Bin 0 -> 83761 bytes
 examples/uring-cmd-zoned.png              | Bin 0 -> 98231 bytes
 examples/xnvme-compare.png                | Bin 0 -> 44742 bytes
 examples/xnvme-zoned.png                  | Bin 0 -> 48340 bytes
 t/run-fio-tests.py                        |  13 ++++++++-----
 tools/fiograph/fiograph.conf              |  13 +++++++++++--
 13 files changed, 19 insertions(+), 7 deletions(-)
 create mode 100644 examples/dedupe-global.png
 create mode 100644 examples/http-s3-crypto.png
 create mode 100644 examples/http-s3-storage-class.png
 create mode 100644 examples/libblkio-io_uring.png
 create mode 100644 examples/libblkio-virtio-blk-vfio-pci.png
 create mode 100644 examples/sg_verify-fail.png
 create mode 100644 examples/sg_verify.png
 create mode 100644 examples/uring-cmd-ng.png
 create mode 100644 examples/uring-cmd-zoned.png
 create mode 100644 examples/xnvme-compare.png
 create mode 100644 examples/xnvme-zoned.png

---

Diff of recent changes:

diff --git a/examples/dedupe-global.png b/examples/dedupe-global.png
new file mode 100644
index 00000000..fd4602e3
Binary files /dev/null and b/examples/dedupe-global.png differ
diff --git a/examples/http-s3-crypto.png b/examples/http-s3-crypto.png
new file mode 100644
index 00000000..b452cf45
Binary files /dev/null and b/examples/http-s3-crypto.png differ
diff --git a/examples/http-s3-storage-class.png b/examples/http-s3-storage-class.png
new file mode 100644
index 00000000..b893a4eb
Binary files /dev/null and b/examples/http-s3-storage-class.png differ
diff --git a/examples/libblkio-io_uring.png b/examples/libblkio-io_uring.png
new file mode 100644
index 00000000..1bc6cc98
Binary files /dev/null and b/examples/libblkio-io_uring.png differ
diff --git a/examples/libblkio-virtio-blk-vfio-pci.png b/examples/libblkio-virtio-blk-vfio-pci.png
new file mode 100644
index 00000000..8a670cc2
Binary files /dev/null and b/examples/libblkio-virtio-blk-vfio-pci.png differ
diff --git a/examples/sg_verify-fail.png b/examples/sg_verify-fail.png
new file mode 100644
index 00000000..516e2d40
Binary files /dev/null and b/examples/sg_verify-fail.png differ
diff --git a/examples/sg_verify.png b/examples/sg_verify.png
new file mode 100644
index 00000000..f244a748
Binary files /dev/null and b/examples/sg_verify.png differ
diff --git a/examples/uring-cmd-ng.png b/examples/uring-cmd-ng.png
new file mode 100644
index 00000000..cd2ff162
Binary files /dev/null and b/examples/uring-cmd-ng.png differ
diff --git a/examples/uring-cmd-zoned.png b/examples/uring-cmd-zoned.png
new file mode 100644
index 00000000..a3dd199d
Binary files /dev/null and b/examples/uring-cmd-zoned.png differ
diff --git a/examples/xnvme-compare.png b/examples/xnvme-compare.png
new file mode 100644
index 00000000..2af92f62
Binary files /dev/null and b/examples/xnvme-compare.png differ
diff --git a/examples/xnvme-zoned.png b/examples/xnvme-zoned.png
new file mode 100644
index 00000000..2f850740
Binary files /dev/null and b/examples/xnvme-zoned.png differ
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index a06f8126..70ff4371 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -387,10 +387,13 @@ class FioJobTest_t0007(FioJobTest):
 class FioJobTest_t0008(FioJobTest):
     """Test consists of fio test job t0008
     Confirm that read['io_kbytes'] = 32768 and that
-                write['io_kbytes'] ~ 16568
+                write['io_kbytes'] ~ 16384
 
-    I did runs with fio-ae2fafc8 and saw write['io_kbytes'] values of
-    16585, 16588. With two runs of fio-3.16 I obtained 16568"""
+    This is a 50/50 seq read/write workload. Since fio flips a coin to
+    determine whether to issue a read or a write, total bytes written will not
+    be exactly 16384K. But total bytes read will be exactly 32768K because
+    reads will include the initial phase as well as the verify phase where all
+    the blocks originally written will be read."""
 
     def check_result(self):
         super(FioJobTest_t0008, self).check_result()
@@ -398,10 +401,10 @@ class FioJobTest_t0008(FioJobTest):
         if not self.passed:
             return
 
-        ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16568
+        ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16384
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
-        if ratio < 0.99 or ratio > 1.01:
+        if ratio < 0.97 or ratio > 1.03:
             self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
             self.passed = False
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 32768:
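The wider window in the t0008 change can be reasoned about with a simple model. The C sketch below is an illustration only. It mirrors the new `check_result()` bounds. It also assumes that each of the n I/Os in the initial phase is independently a write with probability 0.5. That assumption is an idealization, not a description of fio's random generator. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Expected KiB written by t0008 under an exact 50/50 split (from the patch). */
#define T0008_EXPECTED_WRITE_KB 16384.0

/* Mirrors the updated check_result(): pass if the ratio lies in [0.97, 1.03]. */
static bool t0008_write_within_tolerance(double write_io_kbytes)
{
	double ratio = write_io_kbytes / T0008_EXPECTED_WRITE_KB;

	return ratio >= 0.97 && ratio <= 1.03;
}

/*
 * Idealized model (an assumption, not fio's RNG): n independent I/Os, each a
 * write with probability 0.5. The write count is Binomial(n, 0.5) with mean
 * n/2 and variance n/4, so its relative variance is 1/n. A relative tolerance
 * `tol` therefore spans tol * sqrt(n) standard deviations; this returns the
 * square of that figure (tol^2 * n) so no libm call is needed.
 */
static double tolerance_in_sigmas_squared(unsigned long n, double tol)
{
	assert(n > 0 && tol > 0.0);
	return tol * tol * (double)n;
}
```

As an example, take a hypothetical n = 8192, which corresponds to 4 KiB blocks over 32 MiB. The source does not show t0008's actual block size. With that n, the old 1% window is narrower than one standard deviation (0.01^2 * 8192 ≈ 0.82). The new 3% window spans about 2.7 standard deviations (0.03^2 * 8192 ≈ 7.37).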
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
index cfd2fd8e..91c5fcfe 100644
--- a/tools/fiograph/fiograph.conf
+++ b/tools/fiograph/fiograph.conf
@@ -45,7 +45,7 @@ specific_options=stat_type
 specific_options=volume  brick
 
 [ioengine_http]
-specific_options=https  http_host  http_user  http_pass  http_s3_key  http_s3_keyid  http_swift_auth_token  http_s3_region  http_mode  http_verbose
+specific_options=https  http_host  http_user  http_pass  http_s3_key  http_s3_keyid  http_swift_auth_token  http_s3_region  http_mode  http_verbose  http_s3_storage_class  http_s3_sse_customer_key  http_s3_sse_customer_algorithm
 
 [ioengine_ime_aio]
 specific_options=ime_psync  ime_psyncv
@@ -53,9 +53,15 @@ specific_options=ime_psync  ime_psyncv
 [ioengine_io_uring]
 specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async
 
+[ioengine_io_uring_cmd]
+specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async  cmd_type
+
 [ioengine_libaio]
 specific_options=userspace_reap  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  nowait
 
+[ioengine_libblkio]
+specific_options=libblkio_driver  libblkio_path  libblkio_pre_connect_props  libblkio_num_entries  libblkio_queue_size  libblkio_pre_start_props  hipri  libblkio_vectored  libblkio_write_zeroes_on_trim  libblkio_wait_mode  libblkio_force_enable_completion_eventfd
+
 [ioengine_libcufile]
 specific_options=gpu_dev_ids  cuda_io
 
@@ -99,7 +105,10 @@ specific_options=clustername  rbdname  pool  clientname  busy_poll
 specific_options=hostname  bindname  port  verb
 
 [ioengine_sg]
-specific_options=hipri  readfua  writefua  sg_write_mode  sg
+specific_options=hipri  readfua  writefua  sg_write_mode  stream_id
 
 [ioengine_pvsync2]
 specific_options=hipri  hipri_percentage  uncached  nowait  sync  psync  vsync  pvsync
+
+[ioengine_xnvme]
+specific_options=hipri  sqthread_poll  xnvme_be  xnvme_async  xnvme_sync  xnvme_admin  xnvme_dev_nsid  xnvme_iovec


* Recent changes (master)
@ 2022-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-16 13:00 UTC
  To: fio

The following changes since commit 70eb71e682b90078db6f361936933b88f71ad5fd:

  t/io_uring: adjust IORING_REGISTER_MAP_BUFFERS value (2022-12-12 16:58:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a5000c864aa8e7b5525f51fa6fecec75c518b013:

  example: add a zoned block device write example with GC by trim workload (2022-12-15 12:42:12 -0700)

----------------------------------------------------------------
Shin'ichiro Kawasaki (4):
      man: fix troff warning
      HOWTO/man: improve descriptions of max open zones options
      example: add a zoned block device write example with GC by zone resets
      example: add a zoned block device write example with GC by trim workload

 HOWTO.rst                                 |  27 ++++++++++++++-----
 examples/zbd-rand-write-trim-gc.fio       |  43 ++++++++++++++++++++++++++++++
 examples/zbd-rand-write-trim-gc.png       | Bin 0 -> 104661 bytes
 examples/zbd-rand-write-zone-reset-gc.fio |  27 +++++++++++++++++++
 examples/zbd-rand-write-zone-reset-gc.png | Bin 0 -> 59186 bytes
 fio.1                                     |  25 ++++++++++++-----
 6 files changed, 108 insertions(+), 14 deletions(-)
 create mode 100644 examples/zbd-rand-write-trim-gc.fio
 create mode 100644 examples/zbd-rand-write-trim-gc.png
 create mode 100644 examples/zbd-rand-write-zone-reset-gc.fio
 create mode 100644 examples/zbd-rand-write-zone-reset-gc.png

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 5a5263c3..97fe5350 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1052,16 +1052,29 @@ Target file/device
 
 .. option:: max_open_zones=int
 
-	When running a random write test across an entire drive many more
-	zones will be open than in a typical application workload. Hence this
-	command line option that allows one to limit the number of open zones. The
-	number of open zones is defined as the number of zones to which write
-	commands are issued.
+	A zone of a zoned block device is in the open state when it is partially
+	written (i.e. not all sectors of the zone have been written). Zoned
+	block devices may have a limit on the total number of zones that can
+	be simultaneously in the open state, that is, the number of zones that
+	can be written to simultaneously. The :option:`max_open_zones` parameter
+	limits the number of zones to which write commands are issued by all fio
+	jobs, that is, limits the number of zones that will be in the open
+	state. This parameter is relevant only if the :option:`zonemode` =zbd is
+	used. The default value is always equal to the maximum number of open zones
+	of the target zoned block device and a value higher than this limit
+	cannot be specified by users unless the option
+	:option:`ignore_zone_limits` is specified. When
+	:option:`ignore_zone_limits` is specified or the target device has no
+	limit on the number of zones that can be in an open state,
+	:option:`max_open_zones` can specify 0 to disable any limit on the
+	number of zones that can be simultaneously written to by all jobs.
 
 .. option:: job_max_open_zones=int
 
-	Limit on the number of simultaneously opened zones per single
-	thread/process.
+	In the same manner as :option:`max_open_zones`, limit the number of open
+	zones per fio job, that is, the number of zones that a single job can
+	simultaneously write to. A value of zero indicates no limit.
+	Default: zero.
 
 .. option:: ignore_zone_limits=bool
 
diff --git a/examples/zbd-rand-write-trim-gc.fio b/examples/zbd-rand-write-trim-gc.fio
new file mode 100644
index 00000000..139d2c43
--- /dev/null
+++ b/examples/zbd-rand-write-trim-gc.fio
@@ -0,0 +1,43 @@
+; Using the libaio ioengine, random write to a (zoned) block device. Write
+; target zones are chosen randomly among the first 128 zones starting from
+; device offset corresponding to the 524th zone of the device (524 x 256 MB).
+; For the first 3 seconds, run only random writes. After that, run the random
+; write job and a garbage collection simulation job in parallel. The garbage
+; collection simulation job runs a trim workload to reset the 128 zones randomly.
+; Use the flow option to make a zone reset happen every 128 block writes by the
+; other job. This example does not specify max_open_zones. The limit of maximum
+; open zones is obtained from the target block device.
+
+[global]
+group_reporting
+zonemode=zbd
+zonesize=256M
+direct=1
+time_based
+runtime=30
+
+filename=/dev/sdb
+offset=524z
+
+[warmup]
+rw=randwrite
+bs=2M
+size=128z
+ioengine=libaio
+runtime=3
+
+[wjob]
+wait_for=warmup
+rw=randwrite
+bs=2M
+size=128z
+ioengine=libaio
+flow=128
+
+[trimjob]
+wait_for=warmup
+rw=randtrim
+bs=256M
+size=128z
+ioengine=psync
+flow=1
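The numbers in this job file fit together as follows. The C sketch below covers only the arithmetic. It makes two assumptions. First, as the file's comment describes, flow=128 versus flow=1 yields roughly 128 write I/Os per trim I/O. Second, each bs=256M trim covers exactly one zone. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Byte offset of zone number `zone`, assuming uniform zones (offset=524z). */
static uint64_t zone_offset_bytes(uint64_t zone, uint64_t zonesize)
{
	return zone * zonesize;
}

/*
 * Bytes written by the write job per trim I/O, assuming the two jobs issue
 * I/Os in the ratio of their flow weights (write_flow : trim_flow).
 */
static uint64_t bytes_written_per_trim(uint64_t write_bs, unsigned int write_flow,
				       unsigned int trim_flow)
{
	assert(trim_flow > 0);
	return write_bs * write_flow / trim_flow;
}
```

With bs=2M, 128 writes add up to 256 MiB, which is one zone's worth of data. The simulated garbage collection therefore reclaims zones at about the rate the writer fills them. The setting offset=524z places the start of the I/O range at 524 * 256 MiB = 140,660,178,944 bytes.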
diff --git a/examples/zbd-rand-write-trim-gc.png b/examples/zbd-rand-write-trim-gc.png
new file mode 100644
index 00000000..f58dd412
Binary files /dev/null and b/examples/zbd-rand-write-trim-gc.png differ
diff --git a/examples/zbd-rand-write-zone-reset-gc.fio b/examples/zbd-rand-write-zone-reset-gc.fio
new file mode 100644
index 00000000..8f77baf3
--- /dev/null
+++ b/examples/zbd-rand-write-zone-reset-gc.fio
@@ -0,0 +1,27 @@
+; Using the psync ioengine, random write to a (zoned) block device. Write
+; target zones are chosen randomly among the first 8 zones starting from device
+; offset corresponding to the 524th zone of the device (524 x 256 MB). Simulate
+; garbage collection operation using zone_reset_threshold and
+; zone_reset_frequency options. Once the total written data exceeds 70% of the
+; 8 zones, a zone reset happens every 8 (= 1 / 0.125) block writes. This
+; example does not specify max_open_zones. The limit of maximum open zones is
+; obtained from the target block device.
+
+[global]
+name=zbd-rand-write-gc
+group_reporting
+rw=randwrite
+zonemode=zbd
+zonesize=256M
+bs=32M
+direct=1
+time_based
+runtime=40
+
+[dev1]
+filename=/dev/sdb
+size=8z
+offset=524z
+ioengine=psync
+zone_reset_threshold=0.7
+zone_reset_frequency=0.125
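The reset schedule described in this job file's comment works out as shown below. This C sketch covers the arithmetic only. It reads zone_reset_threshold as a fraction of the I/O range (size=8z). It reads zone_reset_frequency as one reset per 1/frequency write requests. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Written bytes in the I/O range beyond which periodic zone resets start. */
static double reset_threshold_bytes(unsigned int nr_zones, uint64_t zonesize,
				    double threshold)
{
	assert(threshold > 0.0 && threshold <= 1.0);
	return threshold * (double)(nr_zones * zonesize);
}

/* Write requests between two zone resets once the threshold is exceeded. */
static unsigned int writes_per_reset(double frequency)
{
	assert(frequency > 0.0 && frequency <= 1.0);
	return (unsigned int)(1.0 / frequency + 0.5); /* round to nearest */
}
```

Here the threshold is 0.7 * 2 GiB, or about 1.4 GiB. Once that is exceeded, a reset follows every 8 writes of 32M. That amounts to one zone (256 MiB) of writes per reset.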
diff --git a/examples/zbd-rand-write-zone-reset-gc.png b/examples/zbd-rand-write-zone-reset-gc.png
new file mode 100644
index 00000000..b10acc80
Binary files /dev/null and b/examples/zbd-rand-write-zone-reset-gc.png differ
diff --git a/fio.1 b/fio.1
index 7a153731..1074b52a 100644
--- a/fio.1
+++ b/fio.1
@@ -828,14 +828,25 @@ numbers fio only reads beyond the write pointer if explicitly told to do
 so. Default: false.
 .TP
 .BI max_open_zones \fR=\fPint
-When running a random write test across an entire drive many more zones will be
-open than in a typical application workload. Hence this command line option
-that allows one to limit the number of open zones. The number of open zones is
-defined as the number of zones to which write commands are issued by all
-threads/processes.
+A zone of a zoned block device is in the open state when it is partially written
+(i.e. not all sectors of the zone have been written). Zoned block devices may
+have a limit on the total number of zones that can be simultaneously in the
+open state, that is, the number of zones that can be written to simultaneously.
+The \fBmax_open_zones\fR parameter limits the number of zones to which write
+commands are issued by all fio jobs, that is, limits the number of zones that
+will be in the open state. This parameter is relevant only if the
+\fBzonemode=zbd\fR is used. The default value is always equal to the maximum number
+of open zones of the target zoned block device and a value higher than this
+limit cannot be specified by users unless the option \fBignore_zone_limits\fR is
+specified. When \fBignore_zone_limits\fR is specified or the target device has
+no limit on the number of zones that can be in an open state,
+\fBmax_open_zones\fR can specify 0 to disable any limit on the number of zones
+that can be simultaneously written to by all jobs.
 .TP
 .BI job_max_open_zones \fR=\fPint
-Limit on the number of simultaneously opened zones per single thread/process.
+In the same manner as \fBmax_open_zones\fR, limit the number of open zones per
+fio job, that is, the number of zones that a single job can simultaneously write
+to. A value of zero indicates no limit. Default: zero.
 .TP
 .BI ignore_zone_limits \fR=\fPbool
 If this option is used, fio will ignore the maximum number of open zones limit
@@ -2544,7 +2555,7 @@ replaced by the name of the job
 .BI (exec)grace_time\fR=\fPint
 Defines the time between the SIGTERM and SIGKILL signals. Default is 1 second.
 .TP
-.BI (exec)std_redirect\fR=\fbool
+.BI (exec)std_redirect\fR=\fPbool
 If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
 .TP
 .BI (xnvme)xnvme_async\fR=\fPstr


* Recent changes (master)
@ 2022-12-13 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-13 13:00 UTC
  To: fio

The following changes since commit 3afc2d8ac30c58372a1b7ccabaea0f3eae4ddaba:

  engines/libblkio: Share a single blkio instance among threads in same process (2022-12-02 16:24:03 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 70eb71e682b90078db6f361936933b88f71ad5fd:

  t/io_uring: adjust IORING_REGISTER_MAP_BUFFERS value (2022-12-12 16:58:32 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: adjust IORING_REGISTER_MAP_BUFFERS value

 t/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index edbacee3..1ea0a9da 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -156,7 +156,7 @@ static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
 static int plist_len = 17;
 
 #ifndef IORING_REGISTER_MAP_BUFFERS
-#define IORING_REGISTER_MAP_BUFFERS	22
+#define IORING_REGISTER_MAP_BUFFERS	26
 struct io_uring_map_buffers {
 	__s32	fd;
 	__u32	buf_start;


* Recent changes (master)
@ 2022-12-03 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-03 13:00 UTC
  To: fio

The following changes since commit 942d66c85ee8f007ea5f1097d097cf9a44b662a0:

  doc: update about size (2022-12-01 11:12:35 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3afc2d8ac30c58372a1b7ccabaea0f3eae4ddaba:

  engines/libblkio: Share a single blkio instance among threads in same process (2022-12-02 16:24:03 -0500)

----------------------------------------------------------------
Alberto Faria (10):
      Add a libblkio engine
      Add engine flag FIO_SKIPPABLE_IOMEM_ALLOC
      engines/libblkio: Allow setting option mem/iomem
      engines/libblkio: Add support for poll queues
      engines/libblkio: Add option libblkio_vectored
      engines/libblkio: Add option libblkio_write_zeroes_on_trim
      engines/libblkio: Add option libblkio_wait_mode
      engines/libblkio: Add option libblkio_force_enable_completion_eventfd
      engines/libblkio: Add options for some driver-specific properties
      engines/libblkio: Share a single blkio instance among threads in same process

 HOWTO.rst                                 |  95 ++++
 Makefile                                  |   6 +
 configure                                 |  25 +
 engines/libblkio.c                        | 914 ++++++++++++++++++++++++++++++
 examples/libblkio-io_uring.fio            |  29 +
 examples/libblkio-virtio-blk-vfio-pci.fio |  29 +
 fio.1                                     |  78 +++
 ioengines.h                               |   2 +
 memory.c                                  |  22 +-
 optgroup.h                                |   2 +
 10 files changed, 1192 insertions(+), 10 deletions(-)
 create mode 100644 engines/libblkio.c
 create mode 100644 examples/libblkio-io_uring.fio
 create mode 100644 examples/libblkio-virtio-blk-vfio-pci.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 0aaf033a..5a5263c3 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2195,6 +2195,21 @@ I/O engine
 			the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine includes
 			engine specific options. (See https://xnvme.io).
 
+		**libblkio**
+			Use the libblkio library
+			(https://gitlab.com/libblkio/libblkio). The specific
+			*driver* to use must be set using
+			:option:`libblkio_driver`. If
+			:option:`mem`/:option:`iomem` is not specified, memory
+			allocation is delegated to libblkio (and so is
+			guaranteed to work with the selected *driver*). One
+			libblkio instance is used per process, so all jobs
+			setting option :option:`thread` will share a single
+			instance (with one queue per thread) and must specify
+			compatible options. Note that some drivers don't allow
+			several instances to access the same device or file
+			simultaneously, but allow it for threads.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2326,6 +2341,12 @@ with the caveat that when used on the command line, they must come after the
         by the application. The benefits are more efficient IO for high IOPS
         scenarios, and lower latencies for low queue depth IO.
 
+   [libblkio]
+
+	Use poll queues. This is incompatible with
+	:option:`libblkio_wait_mode=eventfd <libblkio_wait_mode>` and
+	:option:`libblkio_force_enable_completion_eventfd`.
+
    [pvsync2]
 
 	Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
@@ -2847,6 +2868,80 @@ with the caveat that when used on the command line, they must come after the
 
 	If this option is set. xnvme will use vectored read/write commands.
 
+.. option:: libblkio_driver=str : [libblkio]
+
+	The libblkio *driver* to use. Different drivers access devices through
+	different underlying interfaces. Available drivers depend on the
+	libblkio version in use and are listed at
+	https://libblkio.gitlab.io/libblkio/blkio.html#drivers
+
+.. option:: libblkio_path=str : [libblkio]
+
+	Sets the value of the driver-specific "path" property before connecting
+	the libblkio instance, which identifies the target device or file on
+	which to perform I/O. Its exact semantics are driver-dependent and not
+	all drivers may support it; see
+	https://libblkio.gitlab.io/libblkio/blkio.html#drivers
+
+.. option:: libblkio_pre_connect_props=str : [libblkio]
+
+	A colon-separated list of additional libblkio properties to be set after
+	creating but before connecting the libblkio instance. Each property must
+	have the format ``<name>=<value>``. Colons can be escaped as ``\:``.
+	These are set after the engine sets any other properties, so those can
+	be overridden. Available properties depend on the libblkio version in use
+	and are listed at
+	https://libblkio.gitlab.io/libblkio/blkio.html#properties
+
+.. option:: libblkio_num_entries=int : [libblkio]
+
+	Sets the value of the driver-specific "num-entries" property before
+	starting the libblkio instance. Its exact semantics are driver-dependent
+	and not all drivers may support it; see
+	https://libblkio.gitlab.io/libblkio/blkio.html#drivers
+
+.. option:: libblkio_queue_size=int : [libblkio]
+
+	Sets the value of the driver-specific "queue-size" property before
+	starting the libblkio instance. Its exact semantics are driver-dependent
+	and not all drivers may support it; see
+	https://libblkio.gitlab.io/libblkio/blkio.html#drivers
+
+.. option:: libblkio_pre_start_props=str : [libblkio]
+
+	A colon-separated list of additional libblkio properties to be set after
+	connecting but before starting the libblkio instance. Each property must
+	have the format ``<name>=<value>``. Colons can be escaped as ``\:``.
+	These are set after the engine sets any other properties, so those can
+	be overridden. Available properties depend on the libblkio version in use
+	and are listed at
+	https://libblkio.gitlab.io/libblkio/blkio.html#properties
+
+.. option:: libblkio_vectored : [libblkio]
+
+	Submit vectored read and write requests.
+
+.. option:: libblkio_write_zeroes_on_trim : [libblkio]
+
+	Submit trims as "write zeroes" requests instead of discard requests.
+
+.. option:: libblkio_wait_mode=str : [libblkio]
+
+	How to wait for completions:
+
+	**block** (default)
+		Use a blocking call to ``blkioq_do_io()``.
+	**eventfd**
+		Use a blocking call to ``read()`` on the completion eventfd.
+	**loop**
+		Use a busy loop with a non-blocking call to ``blkioq_do_io()``.
+
+.. option:: libblkio_force_enable_completion_eventfd : [libblkio]
+
+	Enable the queue's completion eventfd even when unused. This may impact
+	performance. The default is to enable it only if
+	:option:`libblkio_wait_mode=eventfd <libblkio_wait_mode>`.
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index 7bd572d7..9fd8f59b 100644
--- a/Makefile
+++ b/Makefile
@@ -237,6 +237,12 @@ ifdef CONFIG_LIBXNVME
   xnvme_CFLAGS = $(LIBXNVME_CFLAGS)
   ENGINES += xnvme
 endif
+ifdef CONFIG_LIBBLKIO
+  libblkio_SRCS = engines/libblkio.c
+  libblkio_LIBS = $(LIBBLKIO_LIBS)
+  libblkio_CFLAGS = $(LIBBLKIO_CFLAGS)
+  ENGINES += libblkio
+endif
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c engines/nvme.c
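The HOWTO hunk above documents libblkio_pre_connect_props and libblkio_pre_start_props. Both take a colon-separated list of <name>=<value> pairs, in which "\:" stands for a literal colon. The C sketch below is a standalone tokenizer that illustrates that format only. The engine's own fio_blkio_set_props_from_str() splits items with fio's get_next_str() helper instead. It also strips surrounding whitespace before calling blkio_set_str(), which this sketch does not do. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the next colon-separated item from *cursor (modified in place),
 * turning "\:" into ":" and advancing *cursor past the separator.
 * Returns NULL once the string is exhausted.
 */
static char *next_prop_item(char **cursor)
{
	char *start, *src, *dst;

	assert(cursor != NULL);
	start = *cursor;
	if (!start)
		return NULL;

	for (src = dst = start; *src; src++) {
		if (src[0] == '\\' && src[1] == ':') {
			*dst++ = ':';	/* escaped colon: keep it in the item */
			src++;
		} else if (*src == ':') {
			break;		/* unescaped colon: end of this item */
		} else {
			*dst++ = *src;
		}
	}

	*cursor = *src ? src + 1 : NULL;	/* decide before terminating */
	*dst = '\0';
	return start;
}

/* Split "name=value" in place; returns 0 on success, -1 if '=' is missing. */
static int split_prop(char *item, char **name, char **value)
{
	char *eq = strchr(item, '=');

	if (!eq)
		return -1;
	*eq = '\0';
	*name = item;
	*value = eq + 1;
	return 0;
}
```

Each resulting name/value pair would then be passed to blkio_set_str(), as the engine code in this patch does.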
diff --git a/configure b/configure
index 1b12d268..6d8e3a87 100755
--- a/configure
+++ b/configure
@@ -176,6 +176,7 @@ libiscsi="no"
 libnbd="no"
 libnfs=""
 xnvme=""
+libblkio=""
 libzbc=""
 dfs=""
 seed_buckets=""
@@ -248,6 +249,8 @@ for opt do
   ;;
   --disable-xnvme) xnvme="no"
   ;;
+  --disable-libblkio) libblkio="no"
+  ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
   --disable-libnfs) libnfs="no"
@@ -304,6 +307,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-xnvme         Disable xnvme support even if found"
+  echo "--disable-libblkio      Disable libblkio support even if found"
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc      Disable tcmalloc support"
   echo "--dynamic-libengines    Lib-based ioengines as dynamic libraries"
@@ -2663,6 +2667,22 @@ if test "$xnvme" != "no" ; then
 fi
 print_config "xnvme engine" "$xnvme"
 
+##########################################
+# Check if we have libblkio
+if test "$libblkio" != "no" ; then
+  if check_min_lib_version blkio 1.0.0; then
+    libblkio="yes"
+    libblkio_cflags=$(pkg-config --cflags blkio)
+    libblkio_libs=$(pkg-config --libs blkio)
+  else
+    if test "$libblkio" = "yes" ; then
+      feature_not_found "libblkio" "libblkio-dev or libblkio-devel"
+    fi
+    libblkio="no"
+  fi
+fi
+print_config "libblkio engine" "$libblkio"
+
 ##########################################
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
@@ -3276,6 +3296,11 @@ if test "$xnvme" = "yes" ; then
   echo "LIBXNVME_CFLAGS=$xnvme_cflags" >> $config_host_mak
   echo "LIBXNVME_LIBS=$xnvme_libs" >> $config_host_mak
 fi
+if test "$libblkio" = "yes" ; then
+  output_sym "CONFIG_LIBBLKIO"
+  echo "LIBBLKIO_CFLAGS=$libblkio_cflags" >> $config_host_mak
+  echo "LIBBLKIO_LIBS=$libblkio_libs" >> $config_host_mak
+fi
 if test "$dynamic_engines" = "yes" ; then
   output_sym "CONFIG_DYNAMIC_ENGINES"
 fi
diff --git a/engines/libblkio.c b/engines/libblkio.c
new file mode 100644
index 00000000..054aa800
--- /dev/null
+++ b/engines/libblkio.c
@@ -0,0 +1,914 @@
+/*
+ * libblkio engine
+ *
+ * IO engine using libblkio to access various block I/O interfaces:
+ * https://gitlab.com/libblkio/libblkio
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <blkio.h>
+
+#include "../fio.h"
+#include "../optgroup.h"
+#include "../options.h"
+#include "../parse.h"
+
+/* per-process state */
+static struct {
+	pthread_mutex_t mutex;
+	int initted_threads;
+	int initted_hipri_threads;
+	struct blkio *b;
+} proc_state = { PTHREAD_MUTEX_INITIALIZER, 0, 0, NULL };
+
+static void fio_blkio_proc_lock(void) {
+	int ret;
+	ret = pthread_mutex_lock(&proc_state.mutex);
+	assert(ret == 0);
+}
+
+static void fio_blkio_proc_unlock(void) {
+	int ret;
+	ret = pthread_mutex_unlock(&proc_state.mutex);
+	assert(ret == 0);
+}
+
+/* per-thread state */
+struct fio_blkio_data {
+	struct blkioq *q;
+	int completion_fd; /* may be -1 if not FIO_BLKIO_WAIT_MODE_EVENTFD */
+
+	bool has_mem_region; /* whether mem_region is valid */
+	struct blkio_mem_region mem_region; /* only if allocated by libblkio */
+
+	struct iovec *iovecs; /* for vectored requests */
+	struct blkio_completion *completions;
+};
+
+enum fio_blkio_wait_mode {
+	FIO_BLKIO_WAIT_MODE_BLOCK,
+	FIO_BLKIO_WAIT_MODE_EVENTFD,
+	FIO_BLKIO_WAIT_MODE_LOOP,
+};
+
+struct fio_blkio_options {
+	void *pad; /* option fields must not have offset 0 */
+
+	char *driver;
+
+	char *path;
+	char *pre_connect_props;
+
+	int num_entries;
+	int queue_size;
+	char *pre_start_props;
+
+	unsigned int hipri;
+	unsigned int vectored;
+	unsigned int write_zeroes_on_trim;
+	enum fio_blkio_wait_mode wait_mode;
+	unsigned int force_enable_completion_eventfd;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "libblkio_driver",
+		.lname	= "libblkio driver name",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct fio_blkio_options, driver),
+		.help	= "Name of the driver to be used by libblkio",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_path",
+		.lname	= "libblkio \"path\" property",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct fio_blkio_options, path),
+		.help	= "Value to set the \"path\" property to",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_pre_connect_props",
+		.lname	= "Additional properties to be set before blkio_connect()",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct fio_blkio_options, pre_connect_props),
+		.help	= "",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_num_entries",
+		.lname	= "libblkio \"num-entries\" property",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct fio_blkio_options, num_entries),
+		.help	= "Value to set the \"num-entries\" property to",
+		.minval	= 1,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_queue_size",
+		.lname	= "libblkio \"queue-size\" property",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct fio_blkio_options, queue_size),
+		.help	= "Value to set the \"queue-size\" property to",
+		.minval	= 1,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_pre_start_props",
+		.lname	= "Additional properties to be set before blkio_start()",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct fio_blkio_options, pre_start_props),
+		.help	= "",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "hipri",
+		.lname	= "Use poll queues",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct fio_blkio_options, hipri),
+		.help	= "Use poll queues",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_vectored",
+		.lname	= "Use blkioq_{readv,writev}()",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct fio_blkio_options, vectored),
+		.help	= "Use blkioq_{readv,writev}() instead of blkioq_{read,write}()",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_write_zeroes_on_trim",
+		.lname	= "Use blkioq_write_zeroes() for TRIM",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct fio_blkio_options,
+				   write_zeroes_on_trim),
+		.help	= "Use blkioq_write_zeroes() for TRIM instead of blkioq_discard()",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_wait_mode",
+		.lname	= "How to wait for completions",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct fio_blkio_options, wait_mode),
+		.help	= "How to wait for completions",
+		.def	= "block",
+		.posval = {
+			  { .ival = "block",
+			    .oval = FIO_BLKIO_WAIT_MODE_BLOCK,
+			    .help = "Blocking blkioq_do_io()",
+			  },
+			  { .ival = "eventfd",
+			    .oval = FIO_BLKIO_WAIT_MODE_EVENTFD,
+			    .help = "Blocking read() on the completion eventfd",
+			  },
+			  { .ival = "loop",
+			    .oval = FIO_BLKIO_WAIT_MODE_LOOP,
+			    .help = "Busy loop with non-blocking blkioq_do_io()",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name	= "libblkio_force_enable_completion_eventfd",
+		.lname	= "Force enable the completion eventfd, even if unused",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct fio_blkio_options,
+				   force_enable_completion_eventfd),
+		.help	= "This can impact performance",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBBLKIO,
+	},
+	{
+		.name = NULL,
+	},
+};
+
+static int fio_blkio_set_props_from_str(struct blkio *b, const char *opt_name,
+					const char *str) {
+	int ret = 0;
+	char *new_str, *name, *value;
+
+	if (!str)
+		return 0;
+
+	/* iteration can mutate string, so copy it */
+	new_str = strdup(str);
+	if (!new_str) {
+		log_err("fio: strdup() failed\n");
+		return 1;
+	}
+
+	/* iterate over property name-value pairs */
+	while ((name = get_next_str(&new_str))) {
+		/* split into property name and value */
+		value = strchr(name, '=');
+		if (!value) {
+			log_err("fio: missing '=' in option %s\n", opt_name);
+			ret = 1;
+			break;
+		}
+
+		*value = '\0';
+		++value;
+
+		/* strip whitespace from property name */
+		strip_blank_front(&name);
+		strip_blank_end(name);
+
+		if (name[0] == '\0') {
+			log_err("fio: empty property name in option %s\n",
+				opt_name);
+			ret = 1;
+			break;
+		}
+
+		/* strip whitespace from property value */
+		strip_blank_front(&value);
+		strip_blank_end(value);
+
+		/* set property */
+		if (blkio_set_str(b, name, value) != 0) {
+			log_err("fio: error setting property '%s' to '%s': %s\n",
+				name, value, blkio_get_error_msg());
+			ret = 1;
+			break;
+		}
+	}
+
+	free(new_str);
+	return ret;
+}
+
+/*
+ * Log the failure of a libblkio function.
+ *
+ * `(void)func` ensures that `func` exists, catching typos at compile time.
+ */
+#define fio_blkio_log_err(func) \
+	({ \
+		(void)func; \
+		log_err("fio: %s() failed: %s\n", #func, \
+			blkio_get_error_msg()); \
+	})
+
+static bool possibly_null_strs_equal(const char *a, const char *b)
+{
+	return (!a && !b) || (a && b && strcmp(a, b) == 0);
+}
+
+/*
+ * Returns the number of subjobs in the entire workload that use the 'libblkio'
+ * ioengine, set the 'thread' option, and have the given value for the 'hipri'
+ * option.
+ */
+static int total_threaded_subjobs(bool hipri)
+{
+	struct thread_data *td;
+	unsigned int i;
+	int count = 0;
+
+	for_each_td(td, i) {
+		const struct fio_blkio_options *options = td->eo;
+		if (strcmp(td->o.ioengine, "libblkio") == 0 &&
+		    td->o.use_thread && (bool)options->hipri == hipri)
+			++count;
+	}
+
+	return count;
+}
+
+static struct {
+	bool set_up;
+	bool direct;
+	struct fio_blkio_options opts;
+} first_threaded_subjob = { 0 };
+
+static void fio_blkio_log_opt_compat_err(const char *option_name)
+{
+	log_err("fio: jobs using engine libblkio and sharing a process must agree on the %s option\n",
+		option_name);
+}
+
+/*
+ * If td represents a subjob with option 'thread', check if its options are
+ * compatible with those of other threaded subjobs that were already set up.
+ */
+static int fio_blkio_check_opt_compat(struct thread_data *td)
+{
+	const struct fio_blkio_options *options = td->eo, *prev_options;
+
+	if (!td->o.use_thread)
+		return 0; /* subjob doesn't use 'thread' */
+
+	if (!first_threaded_subjob.set_up) {
+		/* first subjob using 'thread', store options for later */
+		first_threaded_subjob.set_up	= true;
+		first_threaded_subjob.direct	= td->o.odirect;
+		first_threaded_subjob.opts	= *options;
+		return 0;
+	}
+
+	/* not first subjob using 'thread', check option compatibility */
+	prev_options = &first_threaded_subjob.opts;
+
+	if (td->o.odirect != first_threaded_subjob.direct) {
+		fio_blkio_log_opt_compat_err("direct/buffered");
+		return 1;
+	}
+
+	if (!possibly_null_strs_equal(options->driver, prev_options->driver)) {
+		fio_blkio_log_opt_compat_err("libblkio_driver");
+		return 1;
+	}
+
+	if (!possibly_null_strs_equal(options->path, prev_options->path)) {
+		fio_blkio_log_opt_compat_err("libblkio_path");
+		return 1;
+	}
+
+	if (!possibly_null_strs_equal(options->pre_connect_props,
+				      prev_options->pre_connect_props)) {
+		fio_blkio_log_opt_compat_err("libblkio_pre_connect_props");
+		return 1;
+	}
+
+	if (options->num_entries != prev_options->num_entries) {
+		fio_blkio_log_opt_compat_err("libblkio_num_entries");
+		return 1;
+	}
+
+	if (options->queue_size != prev_options->queue_size) {
+		fio_blkio_log_opt_compat_err("libblkio_queue_size");
+		return 1;
+	}
+
+	if (!possibly_null_strs_equal(options->pre_start_props,
+				      prev_options->pre_start_props)) {
+		fio_blkio_log_opt_compat_err("libblkio_pre_start_props");
+		return 1;
+	}
+
+	return 0;
+}
+
+static int fio_blkio_create_and_connect(struct thread_data *td,
+					struct blkio **out_blkio)
+{
+	const struct fio_blkio_options *options = td->eo;
+	struct blkio *b;
+	int ret;
+
+	if (!options->driver) {
+		log_err("fio: engine libblkio requires option libblkio_driver to be set\n");
+		return 1;
+	}
+
+	if (blkio_create(options->driver, &b) != 0) {
+		fio_blkio_log_err(blkio_create);
+		return 1;
+	}
+
+	/* don't fail if driver doesn't have a "direct" property */
+	ret = blkio_set_bool(b, "direct", td->o.odirect);
+	if (ret != 0 && ret != -ENOENT) {
+		fio_blkio_log_err(blkio_set_bool);
+		goto err_blkio_destroy;
+	}
+
+	if (blkio_set_bool(b, "read-only", read_only) != 0) {
+		fio_blkio_log_err(blkio_set_bool);
+		goto err_blkio_destroy;
+	}
+
+	if (options->path) {
+		if (blkio_set_str(b, "path", options->path) != 0) {
+			fio_blkio_log_err(blkio_set_str);
+			goto err_blkio_destroy;
+		}
+	}
+
+	if (fio_blkio_set_props_from_str(b, "libblkio_pre_connect_props",
+					 options->pre_connect_props) != 0)
+		goto err_blkio_destroy;
+
+	if (blkio_connect(b) != 0) {
+		fio_blkio_log_err(blkio_connect);
+		goto err_blkio_destroy;
+	}
+
+	if (options->num_entries != 0) {
+		if (blkio_set_int(b, "num-entries",
+				  options->num_entries) != 0) {
+			fio_blkio_log_err(blkio_set_int);
+			goto err_blkio_destroy;
+		}
+	}
+
+	if (options->queue_size != 0) {
+		if (blkio_set_int(b, "queue-size", options->queue_size) != 0) {
+			fio_blkio_log_err(blkio_set_int);
+			goto err_blkio_destroy;
+		}
+	}
+
+	if (fio_blkio_set_props_from_str(b, "libblkio_pre_start_props",
+					 options->pre_start_props) != 0)
+		goto err_blkio_destroy;
+
+	*out_blkio = b;
+	return 0;
+
+err_blkio_destroy:
+	blkio_destroy(&b);
+	return 1;
+}
+
+static bool incompatible_threaded_subjob_options = false;
+
+/*
+ * This callback determines the device/file size, so it creates and connects a
+ * blkio instance. But it is invoked from the main thread in the original fio
+ * process, not from the processes in which jobs will actually run. It thus
+ * subsequently destroys the blkio, which is recreated in the init() callback.
+ */
+static int fio_blkio_setup(struct thread_data *td)
+{
+	const struct fio_blkio_options *options = td->eo;
+	struct blkio *b;
+	int ret = 0;
+	uint64_t capacity;
+
+	assert(td->files_index == 1);
+
+	if (fio_blkio_check_opt_compat(td) != 0) {
+		incompatible_threaded_subjob_options = true;
+		return 1;
+	}
+
+	if (options->hipri &&
+		options->wait_mode == FIO_BLKIO_WAIT_MODE_EVENTFD) {
+		log_err("fio: option hipri is incompatible with option libblkio_wait_mode=eventfd\n");
+		return 1;
+	}
+
+	if (options->hipri && options->force_enable_completion_eventfd) {
+		log_err("fio: option hipri is incompatible with option libblkio_force_enable_completion_eventfd\n");
+		return 1;
+	}
+
+	if (fio_blkio_create_and_connect(td, &b) != 0)
+		return 1;
+
+	if (blkio_get_uint64(b, "capacity", &capacity) != 0) {
+		fio_blkio_log_err(blkio_get_uint64);
+		ret = 1;
+		goto out_blkio_destroy;
+	}
+
+	td->files[0]->real_file_size = capacity;
+	fio_file_set_size_known(td->files[0]);
+
+out_blkio_destroy:
+	blkio_destroy(&b);
+	return ret;
+}
+
+static int fio_blkio_init(struct thread_data *td)
+{
+	const struct fio_blkio_options *options = td->eo;
+	struct fio_blkio_data *data;
+	int flags;
+
+	if (td->o.use_thread && incompatible_threaded_subjob_options) {
+		/*
+		 * Different subjobs using option 'thread' specified
+		 * incompatible options. We don't know which configuration
+		 * should win, so we just fail all such subjobs.
+		 */
+		return 1;
+	}
+
+	/*
+	 * Request enqueueing is fast, and it's not possible to know exactly
+	 * when a request is submitted, so never report submission latencies.
+	 */
+	td->o.disable_slat = 1;
+
+	data = calloc(1, sizeof(*data));
+	if (!data) {
+		log_err("fio: calloc() failed\n");
+		return 1;
+	}
+
+	data->iovecs = calloc(td->o.iodepth, sizeof(data->iovecs[0]));
+	data->completions = calloc(td->o.iodepth, sizeof(data->completions[0]));
+	if (!data->iovecs || !data->completions) {
+		log_err("fio: calloc() failed\n");
+		goto err_free;
+	}
+
+	fio_blkio_proc_lock();
+
+	if (proc_state.initted_threads == 0) {
+		/* initialize per-process blkio */
+		int num_queues, num_poll_queues;
+
+		if (td->o.use_thread) {
+			num_queues 	= total_threaded_subjobs(false);
+			num_poll_queues = total_threaded_subjobs(true);
+		} else {
+			num_queues 	= options->hipri ? 0 : 1;
+			num_poll_queues = options->hipri ? 1 : 0;
+		}
+
+		if (fio_blkio_create_and_connect(td, &proc_state.b) != 0)
+			goto err_unlock;
+
+		if (blkio_set_int(proc_state.b, "num-queues",
+				  num_queues) != 0) {
+			fio_blkio_log_err(blkio_set_int);
+			goto err_blkio_destroy;
+		}
+
+		if (blkio_set_int(proc_state.b, "num-poll-queues",
+				  num_poll_queues) != 0) {
+			fio_blkio_log_err(blkio_set_int);
+			goto err_blkio_destroy;
+		}
+
+		if (blkio_start(proc_state.b) != 0) {
+			fio_blkio_log_err(blkio_start);
+			goto err_blkio_destroy;
+		}
+	}
+
+	if (options->hipri) {
+		int i = proc_state.initted_hipri_threads;
+		data->q = blkio_get_poll_queue(proc_state.b, i);
+	} else {
+		int i = proc_state.initted_threads -
+				proc_state.initted_hipri_threads;
+		data->q = blkio_get_queue(proc_state.b, i);
+	}
+
+	if (options->wait_mode == FIO_BLKIO_WAIT_MODE_EVENTFD ||
+		options->force_enable_completion_eventfd) {
+		/* enable completion fd and make it blocking */
+		blkioq_set_completion_fd_enabled(data->q, true);
+		data->completion_fd = blkioq_get_completion_fd(data->q);
+
+		flags = fcntl(data->completion_fd, F_GETFL);
+		if (flags < 0) {
+			log_err("fio: fcntl(F_GETFL) failed: %s\n",
+				strerror(errno));
+			goto err_blkio_destroy;
+		}
+
+		if (fcntl(data->completion_fd, F_SETFL,
+			  flags & ~O_NONBLOCK) != 0) {
+			log_err("fio: fcntl(F_SETFL) failed: %s\n",
+				strerror(errno));
+			goto err_blkio_destroy;
+		}
+	} else {
+		data->completion_fd = -1;
+	}
+
+	++proc_state.initted_threads;
+	if (options->hipri)
+		++proc_state.initted_hipri_threads;
+
+	/* Set data last so cleanup() does nothing if init() fails. */
+	td->io_ops_data = data;
+
+	fio_blkio_proc_unlock();
+
+	return 0;
+
+err_blkio_destroy:
+	if (proc_state.initted_threads == 0)
+		blkio_destroy(&proc_state.b);
+err_unlock:
+	if (proc_state.initted_threads == 0)
+		proc_state.b = NULL;
+	fio_blkio_proc_unlock();
+err_free:
+	free(data->completions);
+	free(data->iovecs);
+	free(data);
+	return 1;
+}
+
+static int fio_blkio_post_init(struct thread_data *td)
+{
+	struct fio_blkio_data *data = td->io_ops_data;
+
+	if (!data->has_mem_region) {
+		/*
+		 * Memory was allocated by the fio core and not iomem_alloc(),
+		 * so we need to register it as a memory region here.
+		 *
+		 * `td->orig_buffer_size` is computed like `len` below, but then
+		 * fio can add some padding to it to make sure it is
+		 * sufficiently aligned to the page size and the mem_align
+		 * option. However, this can make it become unaligned to the
+		 * "mem-region-alignment" property in ways that the user can't
+		 * control, so we essentially recompute `td->orig_buffer_size`
+		 * here but without adding that padding.
+		 */
+
+		unsigned long long max_block_size;
+		struct blkio_mem_region region;
+
+		max_block_size = max(td->o.max_bs[DDIR_READ],
+				     max(td->o.max_bs[DDIR_WRITE],
+					 td->o.max_bs[DDIR_TRIM]));
+
+		region = (struct blkio_mem_region) {
+			.addr	= td->orig_buffer,
+			.len	= (size_t)max_block_size *
+					(size_t)td->o.iodepth,
+			.fd	= -1,
+		};
+
+		if (blkio_map_mem_region(proc_state.b, &region) != 0) {
+			fio_blkio_log_err(blkio_map_mem_region);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+static void fio_blkio_cleanup(struct thread_data *td)
+{
+	struct fio_blkio_data *data = td->io_ops_data;
+
+	/*
+	 * Subjobs from different jobs can be terminated at different times, so
+	 * this callback may be invoked for one subjob while another is still
+	 * doing I/O. Those subjobs may share the process, so we must wait until
+	 * the last subjob in the process wants to clean up to actually destroy
+	 * the blkio.
+	 */
+
+	if (data) {
+		free(data->completions);
+		free(data->iovecs);
+		free(data);
+
+		fio_blkio_proc_lock();
+		if (--proc_state.initted_threads == 0) {
+			blkio_destroy(&proc_state.b);
+			proc_state.b = NULL;
+		}
+		fio_blkio_proc_unlock();
+	}
+}
+
+#define align_up(x, y) ((((x) + (y) - 1) / (y)) * (y))
+
+static int fio_blkio_iomem_alloc(struct thread_data *td, size_t size)
+{
+	struct fio_blkio_data *data = td->io_ops_data;
+	int ret;
+	uint64_t mem_region_alignment;
+
+	if (blkio_get_uint64(proc_state.b, "mem-region-alignment",
+			     &mem_region_alignment) != 0) {
+		fio_blkio_log_err(blkio_get_uint64);
+		return 1;
+	}
+
+	/* round up size to satisfy mem-region-alignment */
+	size = align_up(size, (size_t)mem_region_alignment);
+
+	fio_blkio_proc_lock();
+
+	if (blkio_alloc_mem_region(proc_state.b, &data->mem_region,
+				   size) != 0) {
+		fio_blkio_log_err(blkio_alloc_mem_region);
+		ret = 1;
+		goto out;
+	}
+
+	if (blkio_map_mem_region(proc_state.b, &data->mem_region) != 0) {
+		fio_blkio_log_err(blkio_map_mem_region);
+		ret = 1;
+		goto out_free;
+	}
+
+	td->orig_buffer = data->mem_region.addr;
+	data->has_mem_region = true;
+
+	ret = 0;
+	goto out;
+
+out_free:
+	blkio_free_mem_region(proc_state.b, &data->mem_region);
+out:
+	fio_blkio_proc_unlock();
+	return ret;
+}
+
+static void fio_blkio_iomem_free(struct thread_data *td)
+{
+	struct fio_blkio_data *data = td->io_ops_data;
+
+	if (data && data->has_mem_region) {
+		fio_blkio_proc_lock();
+		blkio_unmap_mem_region(proc_state.b, &data->mem_region);
+		blkio_free_mem_region(proc_state.b, &data->mem_region);
+		fio_blkio_proc_unlock();
+
+		data->has_mem_region = false;
+	}
+}
+
+static int fio_blkio_open_file(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static enum fio_q_status fio_blkio_queue(struct thread_data *td,
+					 struct io_u *io_u)
+{
+	const struct fio_blkio_options *options = td->eo;
+	struct fio_blkio_data *data = td->io_ops_data;
+
+	fio_ro_check(td, io_u);
+
+	switch (io_u->ddir) {
+		case DDIR_READ:
+			if (options->vectored) {
+				struct iovec *iov = &data->iovecs[io_u->index];
+				iov->iov_base = io_u->xfer_buf;
+				iov->iov_len = (size_t)io_u->xfer_buflen;
+
+				blkioq_readv(data->q, io_u->offset, iov, 1,
+					     io_u, 0);
+			} else {
+				blkioq_read(data->q, io_u->offset,
+					    io_u->xfer_buf,
+					    (size_t)io_u->xfer_buflen, io_u, 0);
+			}
+			break;
+		case DDIR_WRITE:
+			if (options->vectored) {
+				struct iovec *iov = &data->iovecs[io_u->index];
+				iov->iov_base = io_u->xfer_buf;
+				iov->iov_len = (size_t)io_u->xfer_buflen;
+
+				blkioq_writev(data->q, io_u->offset, iov, 1,
+					      io_u, 0);
+			} else {
+				blkioq_write(data->q, io_u->offset,
+					     io_u->xfer_buf,
+					     (size_t)io_u->xfer_buflen, io_u,
+					     0);
+			}
+			break;
+		case DDIR_TRIM:
+			if (options->write_zeroes_on_trim) {
+				blkioq_write_zeroes(data->q, io_u->offset,
+						    io_u->xfer_buflen, io_u, 0);
+			} else {
+				blkioq_discard(data->q, io_u->offset,
+					       io_u->xfer_buflen, io_u, 0);
+			}
+			break;
+		case DDIR_SYNC:
+		case DDIR_DATASYNC:
+			blkioq_flush(data->q, io_u, 0);
+			break;
+		default:
+			io_u->error = ENOTSUP;
+			io_u_log_error(td, io_u);
+			return FIO_Q_COMPLETED;
+	}
+
+	return FIO_Q_QUEUED;
+}
+
+static int fio_blkio_getevents(struct thread_data *td, unsigned int min,
+			       unsigned int max, const struct timespec *t)
+{
+	const struct fio_blkio_options *options = td->eo;
+	struct fio_blkio_data *data = td->io_ops_data;
+	int ret, n;
+	uint64_t event;
+
+	switch (options->wait_mode) {
+	case FIO_BLKIO_WAIT_MODE_BLOCK:
+		n = blkioq_do_io(data->q, data->completions, (int)min, (int)max,
+				 NULL);
+		if (n < 0) {
+			fio_blkio_log_err(blkioq_do_io);
+			return -1;
+		}
+		return n;
+	case FIO_BLKIO_WAIT_MODE_EVENTFD:
+		n = blkioq_do_io(data->q, data->completions, 0, (int)max, NULL);
+		if (n < 0) {
+			fio_blkio_log_err(blkioq_do_io);
+			return -1;
+		}
+		while (n < (int)min) {
+			ret = read(data->completion_fd, &event, sizeof(event));
+			if (ret != sizeof(event)) {
+				log_err("fio: read() on the completion fd returned %d\n",
+					ret);
+				return -1;
+			}
+
+			ret = blkioq_do_io(data->q, data->completions + n, 0,
+					   (int)max - n, NULL);
+			if (ret < 0) {
+				fio_blkio_log_err(blkioq_do_io);
+				return -1;
+			}
+
+			n += ret;
+		}
+		return n;
+	case FIO_BLKIO_WAIT_MODE_LOOP:
+		for (n = 0; n < (int)min; ) {
+			ret = blkioq_do_io(data->q, data->completions + n, 0,
+					   (int)max - n, NULL);
+			if (ret < 0) {
+				fio_blkio_log_err(blkioq_do_io);
+				return -1;
+			}
+
+			n += ret;
+		}
+		return n;
+	default:
+		return -1;
+	}
+}
+
+static struct io_u *fio_blkio_event(struct thread_data *td, int event)
+{
+	struct fio_blkio_data *data = td->io_ops_data;
+	struct blkio_completion *completion = &data->completions[event];
+	struct io_u *io_u = completion->user_data;
+
+	io_u->error = -completion->ret;
+
+	return io_u;
+}
+
+FIO_STATIC struct ioengine_ops ioengine = {
+	.name			= "libblkio",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_DISKLESSIO | FIO_NOEXTEND |
+				  FIO_NO_OFFLOAD | FIO_SKIPPABLE_IOMEM_ALLOC,
+
+	.setup			= fio_blkio_setup,
+	.init			= fio_blkio_init,
+	.post_init		= fio_blkio_post_init,
+	.cleanup		= fio_blkio_cleanup,
+
+	.iomem_alloc		= fio_blkio_iomem_alloc,
+	.iomem_free		= fio_blkio_iomem_free,
+
+	.open_file		= fio_blkio_open_file,
+
+	.queue			= fio_blkio_queue,
+	.getevents		= fio_blkio_getevents,
+	.event			= fio_blkio_event,
+
+	.options		= options,
+	.option_struct_size	= sizeof(struct fio_blkio_options),
+};
+
+static void fio_init fio_blkio_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_blkio_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/libblkio-io_uring.fio b/examples/libblkio-io_uring.fio
new file mode 100644
index 00000000..40f625cf
--- /dev/null
+++ b/examples/libblkio-io_uring.fio
@@ -0,0 +1,29 @@
+; Benchmark accessing a regular file or block device using libblkio.
+;
+; Replace "/dev/nvme0n1" below with the path to your file or device, or override
+; it by passing the '--libblkio_path=...' flag to fio.
+;
+; In the example below, the two subjobs of "job-B" *and* the single subjob of
+; "job-C" will share a single libblkio instance, and "job-A" will use a separate
+; libblkio instance.
+;
+; For information on libblkio, see: https://gitlab.com/libblkio/libblkio
+
+[global]
+ioengine=libblkio
+libblkio_driver=io_uring
+libblkio_path=/dev/nvme0n1  ; REPLACE THIS WITH THE RIGHT PATH
+rw=randread
+blocksize=4k
+direct=1
+time_based=1
+runtime=10s
+
+[job-A]
+
+[job-B]
+numjobs=2  ; run two copies of this job simultaneously
+thread=1   ; have each copy run as a separate thread in the *same* process
+
+[job-C]
+thread=1  ; have the job run as a thread in the *same* process as "job-B"
diff --git a/examples/libblkio-virtio-blk-vfio-pci.fio b/examples/libblkio-virtio-blk-vfio-pci.fio
new file mode 100644
index 00000000..024224a6
--- /dev/null
+++ b/examples/libblkio-virtio-blk-vfio-pci.fio
@@ -0,0 +1,29 @@
+; Benchmark accessing a PCI virtio-blk device using libblkio.
+;
+; Replace "/sys/bus/pci/devices/0000:00:01.0" below with the path to your
+; device's sysfs directory, or override it by passing the '--libblkio_path=...'
+; flag to fio.
+;
+; In the example below, the two subjobs of "job-B" *and* the single subjob of
+; "job-C" will share a single libblkio instance, and "job-A" will use a separate
+; libblkio instance.
+;
+; For information on libblkio, see: https://gitlab.com/libblkio/libblkio
+
+[global]
+ioengine=libblkio
+libblkio_driver=virtio-blk-vfio-pci
+libblkio_path=/sys/bus/pci/devices/0000:00:01.0  ; REPLACE THIS WITH THE RIGHT PATH
+rw=randread
+blocksize=4k
+time_based=1
+runtime=10s
+
+[job-A]
+
+[job-B]
+numjobs=2  ; run two copies of this job simultaneously
+thread=1   ; have each copy run as a separate thread in the *same* process
+
+[job-C]
+thread=1  ; have the job run as a thread in the *same* process as "job-B"
diff --git a/fio.1 b/fio.1
index 62af0bd2..7a153731 100644
--- a/fio.1
+++ b/fio.1
@@ -1992,6 +1992,16 @@ I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine provides
 flexibility to access GNU/Linux Kernel NVMe driver via libaio, IOCTLs, io_uring,
 the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine includes
 engine specific options. (See \fIhttps://xnvme.io/\fR).
+.TP
+.B libblkio
+Use the libblkio library (\fIhttps://gitlab.com/libblkio/libblkio\fR). The
+specific driver to use must be set using \fBlibblkio_driver\fR. If
+\fBmem\fR/\fBiomem\fR is not specified, memory allocation is delegated to
+libblkio (and so is guaranteed to work with the selected driver). One libblkio
+instance is used per process, so all jobs setting option \fBthread\fR will share
+a single instance (with one queue per thread) and must specify compatible
+options. Note that some drivers don't allow several instances to access the same
+device or file simultaneously, but allow it for threads.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2604,6 +2614,74 @@ xnvme namespace identifier for userspace NVMe driver such as SPDK.
 .TP
 .BI (xnvme)xnvme_iovec
 If this option is set, xnvme will use vectored read/write commands.
+.TP
+.BI (libblkio)libblkio_driver \fR=\fPstr
+The libblkio driver to use. Different drivers access devices through different
+underlying interfaces. Available drivers depend on the libblkio version in use
+and are listed at \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
+.TP
+.BI (libblkio)libblkio_path \fR=\fPstr
+Sets the value of the driver-specific "path" property before connecting the
+libblkio instance, which identifies the target device or file on which to
+perform I/O. Its exact semantics are driver-dependent and not all drivers may
+support it; see \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
+.TP
+.BI (libblkio)libblkio_pre_connect_props \fR=\fPstr
+A colon-separated list of additional libblkio properties to be set after
+creating but before connecting the libblkio instance. Each property must have
+the format \fB<name>=<value>\fR. Colons can be escaped as \fB\\:\fR. These are
+set after the engine sets any other properties, so those can be overridden.
+Available properties depend on the libblkio version in use and are listed at
+\fIhttps://libblkio.gitlab.io/libblkio/blkio.html#properties\fR
+.TP
+.BI (libblkio)libblkio_num_entries \fR=\fPint
+Sets the value of the driver-specific "num-entries" property before starting the
+libblkio instance. Its exact semantics are driver-dependent and not all drivers
+may support it; see \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
+.TP
+.BI (libblkio)libblkio_queue_size \fR=\fPint
+Sets the value of the driver-specific "queue-size" property before starting the
+libblkio instance. Its exact semantics are driver-dependent and not all drivers
+may support it; see \fIhttps://libblkio.gitlab.io/libblkio/blkio.html#drivers\fR
+.TP
+.BI (libblkio)libblkio_pre_start_props \fR=\fPstr
+A colon-separated list of additional libblkio properties to be set after
+connecting but before starting the libblkio instance. Each property must have
+the format \fB<name>=<value>\fR. Colons can be escaped as \fB\\:\fR. These are
+set after the engine sets any other properties, so those can be overridden.
+Available properties depend on the libblkio version in use and are listed at
+\fIhttps://libblkio.gitlab.io/libblkio/blkio.html#properties\fR
+.TP
+.BI (libblkio)hipri
+Use poll queues. This is incompatible with \fBlibblkio_wait_mode=eventfd\fR and
+\fBlibblkio_force_enable_completion_eventfd\fR.
+.TP
+.BI (libblkio)libblkio_vectored
+Submit vectored read and write requests.
+.TP
+.BI (libblkio)libblkio_write_zeroes_on_trim
+Submit trims as "write zeroes" requests instead of discard requests.
+.TP
+.BI (libblkio)libblkio_wait_mode \fR=\fPstr
+How to wait for completions:
+.RS
+.RS
+.TP
+.B block \fR(default)
+Use a blocking call to \fBblkioq_do_io()\fR.
+.TP
+.B eventfd
+Use a blocking call to \fBread()\fR on the completion eventfd.
+.TP
+.B loop
+Use a busy loop with a non-blocking call to \fBblkioq_do_io()\fR.
+.RE
+.RE
+.TP
+.BI (libblkio)libblkio_force_enable_completion_eventfd
+Enable the queue's completion eventfd even when unused. This may impact
+performance. The default is to enable it only if
+\fBlibblkio_wait_mode=eventfd\fR.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/ioengines.h b/ioengines.h
index 11d2115c..d43540d0 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -87,6 +87,8 @@ enum fio_ioengine_flags {
 	FIO_NO_OFFLOAD	= 1 << 15,	/* no async offload */
 	FIO_ASYNCIO_SETS_ISSUE_TIME
 			= 1 << 16,	/* async ioengine with commit function that sets issue_time */
+	FIO_SKIPPABLE_IOMEM_ALLOC
+			= 1 << 17,	/* skip iomem_alloc & iomem_free if job sets mem/iomem */
 };
 
 /*
diff --git a/memory.c b/memory.c
index 6cf73333..577d3dd5 100644
--- a/memory.c
+++ b/memory.c
@@ -305,16 +305,18 @@ int allocate_io_mem(struct thread_data *td)
 	dprint(FD_MEM, "Alloc %llu for buffers\n", (unsigned long long) total_mem);
 
 	/*
-	 * If the IO engine has hooks to allocate/free memory, use those. But
-	 * error out if the user explicitly asked for something else.
+	 * If the IO engine has hooks to allocate/free memory and the user
+	 * doesn't explicitly ask for something else, use those. But fail if the
+	 * user asks for something else with an engine that doesn't allow that.
 	 */
-	if (td->io_ops->iomem_alloc) {
-		if (fio_option_is_set(&td->o, mem_type)) {
-			log_err("fio: option 'mem/iomem' conflicts with specified IO engine\n");
-			ret = 1;
-		} else
-			ret = td->io_ops->iomem_alloc(td, total_mem);
-	} else if (td->o.mem_type == MEM_MALLOC)
+	if (td->io_ops->iomem_alloc && fio_option_is_set(&td->o, mem_type) &&
+	    !td_ioengine_flagged(td, FIO_SKIPPABLE_IOMEM_ALLOC)) {
+		log_err("fio: option 'mem/iomem' conflicts with specified IO engine\n");
+		ret = 1;
+	} else if (td->io_ops->iomem_alloc &&
+		   !fio_option_is_set(&td->o, mem_type))
+		ret = td->io_ops->iomem_alloc(td, total_mem);
+	else if (td->o.mem_type == MEM_MALLOC)
 		ret = alloc_mem_malloc(td, total_mem);
 	else if (td->o.mem_type == MEM_SHM || td->o.mem_type == MEM_SHMHUGE)
 		ret = alloc_mem_shm(td, total_mem);
@@ -342,7 +344,7 @@ void free_io_mem(struct thread_data *td)
 	if (td->o.odirect || td->o.oatomic)
 		total_mem += page_mask;
 
-	if (td->io_ops->iomem_alloc) {
+	if (td->io_ops->iomem_alloc && !fio_option_is_set(&td->o, mem_type)) {
 		if (td->io_ops->iomem_free)
 			td->io_ops->iomem_free(td);
 	} else if (td->o.mem_type == MEM_MALLOC)
diff --git a/optgroup.h b/optgroup.h
index dc73c8f3..024b902f 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -73,6 +73,7 @@ enum opt_category_group {
 	__FIO_OPT_G_NFS,
 	__FIO_OPT_G_WINDOWSAIO,
 	__FIO_OPT_G_XNVME,
+	__FIO_OPT_G_LIBBLKIO,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -120,6 +121,7 @@ enum opt_category_group {
 	FIO_OPT_G_DFS		= (1ULL << __FIO_OPT_G_DFS),
 	FIO_OPT_G_WINDOWSAIO	= (1ULL << __FIO_OPT_G_WINDOWSAIO),
 	FIO_OPT_G_XNVME         = (1ULL << __FIO_OPT_G_XNVME),
+	FIO_OPT_G_LIBBLKIO	= (1ULL << __FIO_OPT_G_LIBBLKIO),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);


* Recent changes (master)
@ 2022-12-02 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-02 13:00 UTC
  To: fio

The following changes since commit 6d8fe6e847bb43cf7db5eee4cf58fd490f12be47:

  backend: respect return value of init_io_u_buffers (2022-11-30 19:58:34 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 942d66c85ee8f007ea5f1097d097cf9a44b662a0:

  doc: update about size (2022-12-01 11:12:35 -0500)

----------------------------------------------------------------
Ankit Kumar (1):
      doc: update about size

 HOWTO.rst | 7 +++++--
 fio.1     | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 2ea84558..0aaf033a 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1875,8 +1875,11 @@ I/O size
 .. option:: size=int
 
 	The total size of file I/O for each thread of this job. Fio will run until
-	this many bytes has been transferred, unless runtime is limited by other options
-	(such as :option:`runtime`, for instance, or increased/decreased by :option:`io_size`).
+	this many bytes have been transferred, unless runtime is altered by other means
+	such as (1) :option:`runtime`, (2) :option:`io_size`, (3) :option:`number_ios`,
+	(4) gaps/holes while doing I/Os such as ``rw=read:16K``, or (5) sequential
+	I/O reaching the end of the file, which is possible when :option:`percentage_random`
+	is less than 100.
 	Fio will divide this size between the available files determined by options
 	such as :option:`nrfiles`, :option:`filename`, unless :option:`filesize` is
 	specified by the job. If the result of division happens to be 0, the size is
diff --git a/fio.1 b/fio.1
index 746c4472..62af0bd2 100644
--- a/fio.1
+++ b/fio.1
@@ -1676,8 +1676,11 @@ simulate a smaller amount of memory. The amount specified is per worker.
 .TP
 .BI size \fR=\fPint[%|z]
 The total size of file I/O for each thread of this job. Fio will run until
-this many bytes has been transferred, unless runtime is limited by other options
-(such as \fBruntime\fR, for instance, or increased/decreased by \fBio_size\fR).
+this many bytes have been transferred, unless runtime is altered by other means
+such as (1) \fBruntime\fR, (2) \fBio_size\fR, (3) \fBnumber_ios\fR, (4)
+gaps/holes while doing I/Os such as `rw=read:16K', or (5) sequential I/O
+reaching the end of the file, which is possible when \fBpercentage_random\fR is
+less than 100.
 Fio will divide this size between the available files determined by options
 such as \fBnrfiles\fR, \fBfilename\fR, unless \fBfilesize\fR is
 specified by the job. If the result of division happens to be 0, the size is


* Recent changes (master)
@ 2022-12-01 13:00 Jens Axboe
From: Jens Axboe @ 2022-12-01 13:00 UTC
  To: fio

The following changes since commit 967c5441fa3d3932ec50ea5623411cc6e8589463:

  docs: description for experimental_verify (2022-11-29 17:09:41 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6d8fe6e847bb43cf7db5eee4cf58fd490f12be47:

  backend: respect return value of init_io_u_buffers (2022-11-30 19:58:34 -0700)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      backend: respect return value of init_io_u_buffers

 backend.c  | 3 ++-
 blktrace.c | 3 ++-
 iolog.c    | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index ba954a6b..928e524a 100644
--- a/backend.c
+++ b/backend.c
@@ -1301,7 +1301,8 @@ static int init_io_u(struct thread_data *td)
 		}
 	}
 
-	init_io_u_buffers(td);
+	if (init_io_u_buffers(td))
+		return 1;
 
 	if (init_file_completion_logging(td, max_units))
 		return 1;
diff --git a/blktrace.c b/blktrace.c
index 00e5f9a9..d5c8aee7 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -545,7 +545,8 @@ bool read_blktrace(struct thread_data* td)
 			td->o.max_bs[DDIR_TRIM] = max(td->o.max_bs[DDIR_TRIM], rw_bs[DDIR_TRIM]);
 			io_u_quiesce(td);
 			free_io_mem(td);
-			init_io_u_buffers(td);
+			if (init_io_u_buffers(td))
+				return false;
 		}
 		return true;
 	}
diff --git a/iolog.c b/iolog.c
index aa9c3bb1..62f2f524 100644
--- a/iolog.c
+++ b/iolog.c
@@ -620,7 +620,8 @@ static bool read_iolog(struct thread_data *td)
 		{
 			io_u_quiesce(td);
 			free_io_mem(td);
-			init_io_u_buffers(td);
+			if (init_io_u_buffers(td))
+				return false;
 		}
 		return true;
 	}


* Recent changes (master)
@ 2022-11-30 13:00 Jens Axboe
From: Jens Axboe @ 2022-11-30 13:00 UTC
  To: fio

The following changes since commit 01fd7497328f55622ec989a8edb015f2cccb94eb:

  Merge branch 'lintian-manpage-fixes' of https://github.com/hoexter/fio (2022-11-28 12:54:53 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 967c5441fa3d3932ec50ea5623411cc6e8589463:

  docs: description for experimental_verify (2022-11-29 17:09:41 -0500)

----------------------------------------------------------------
Vincent Fu (2):
      docs: synchronize fio.1 and HOWTO changes
      docs: description for experimental_verify

 HOWTO.rst | 11 +++++++----
 fio.1     |  6 ++++--
 2 files changed, 11 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 4419ee1b..2ea84558 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1054,7 +1054,7 @@ Target file/device
 
 	When running a random write test across an entire drive many more
 	zones will be open than in a typical application workload. Hence this
-	command line option that allows to limit the number of open zones. The
+	command line option that allows one to limit the number of open zones. The
 	number of open zones is defined as the number of zones to which write
 	commands are issued.
 
@@ -1446,7 +1446,7 @@ I/O type
 	supplied as a value between 0 and 100.
 
 	The second, optional float is allowed for **pareto**, **zipf** and **normal** distributions.
-	It allows to set base of distribution in non-default place, giving more control
+	It allows one to set base of distribution in non-default place, giving more control
 	over most probable outcome. This value is in range [0-1] which maps linearly to
 	range of possible random values.
 	Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
@@ -3612,7 +3612,10 @@ Verification
 
 .. option:: experimental_verify=bool
 
-	Enable experimental verification.
+        Enable experimental verification. Standard verify records I/O metadata
+        for later use during the verification phase. Experimental verify
+        instead resets the file after the write phase and then replays I/Os for
+        the verification phase.
 
 Steady state
 ~~~~~~~~~~~~
@@ -4503,7 +4506,7 @@ Trace file format v2
 ~~~~~~~~~~~~~~~~~~~~
 
 The second version of the trace file format was added in fio version 1.17.  It
-allows to access more than one file per trace and has a bigger set of possible
+allows one to access more than one file per trace and has a bigger set of possible
 file actions.
 
 The first line of the trace file has to be::
diff --git a/fio.1 b/fio.1
index a28ec032..746c4472 100644
--- a/fio.1
+++ b/fio.1
@@ -3324,7 +3324,9 @@ Verify that trim/discarded blocks are returned as zeros.
 Trim this number of I/O blocks.
 .TP
 .BI experimental_verify \fR=\fPbool
-Enable experimental verification.
+Enable experimental verification. Standard verify records I/O metadata for
+later use during the verification phase. Experimental verify instead resets the
+file after the write phase and then replays I/Os for the verification phase.
 .SS "Steady state"
 .TP
 .BI steadystate \fR=\fPstr:float "\fR,\fP ss" \fR=\fPstr:float
@@ -4213,7 +4215,7 @@ This format is not supported in fio versions >= 1.20\-rc3.
 .TP
 .B Trace file format v2
 The second version of the trace file format was added in fio version 1.17. It
-allows one to access more then one file per trace and has a bigger set of possible
+allows one to access more than one file per trace and has a bigger set of possible
 file actions.
 .RS
 .P
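
The new experimental_verify text contrasts two strategies. The sketch below is a rough illustration only and is not fio's verify code. It assumes offsets come from a seeded generator, here a hypothetical xorshift32. The "standard" path stores each offset it writes. The "experimental" path keeps no record and instead resets the generator to its starting seed, then regenerates the same sequence for the verify pass. Replay works only because the sequence is deterministic for a given seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset generator (xorshift32); the state must be nonzero. */
static uint32_t next_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Block-aligned random offset inside a file of nr_blocks blocks of bs bytes. */
static uint64_t next_offset(uint32_t *state, uint64_t nr_blocks, uint64_t bs)
{
	return (next_rand(state) % nr_blocks) * bs;
}

/* "Standard" style: the write phase records every offset it issues so
 * the verify phase can walk the record later. */
static void write_phase_record(uint32_t seed, uint64_t *rec, size_t n,
			       uint64_t nr_blocks, uint64_t bs)
{
	uint32_t st = seed;

	for (size_t i = 0; i < n; i++)
		rec[i] = next_offset(&st, nr_blocks, bs);
}

/* "Experimental" style: nothing is recorded; the generator is reset to
 * the job's seed and the same offsets are regenerated for verification.
 * Returns how many replayed offsets match the record (demo only). */
static size_t replay_matches(uint32_t seed, const uint64_t *rec, size_t n,
			     uint64_t nr_blocks, uint64_t bs)
{
	uint32_t st = seed;	/* reset to the starting state */
	size_t ok = 0;

	for (size_t i = 0; i < n; i++)
		if (next_offset(&st, nr_blocks, bs) == rec[i])
			ok++;
	return ok;
}
```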

* Recent changes (master)
@ 2022-11-29 13:00 Jens Axboe
From: Jens Axboe @ 2022-11-29 13:00 UTC
  To: fio

The following changes since commit 72044c66ac7055a98c9b3021c298c81849e3c990:

  doc: update about sqthread_poll (2022-11-23 14:06:03 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 01fd7497328f55622ec989a8edb015f2cccb94eb:

  Merge branch 'lintian-manpage-fixes' of https://github.com/hoexter/fio (2022-11-28 12:54:53 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'lintian-manpage-fixes' of https://github.com/hoexter/fio

Sven Hoexter (2):
      Spelling: Fix allows to -> allows one to in man 1 fio
      Use correct backslash escape in man 1 fio

 fio.1 | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index a156bf5d..a28ec032 100644
--- a/fio.1
+++ b/fio.1
@@ -569,7 +569,7 @@ by this option will be \fBsize\fR divided by number of files unless an
 explicit size is specified by \fBfilesize\fR.
 .RS
 .P
-Each colon in the wanted path must be escaped with a '\\'
+Each colon in the wanted path must be escaped with a '\e'
 character. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
 would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
 `F:\\filename' then you would use `filename=F\\:\\filename'.
@@ -830,7 +830,7 @@ so. Default: false.
 .BI max_open_zones \fR=\fPint
 When running a random write test across an entire drive many more zones will be
 open than in a typical application workload. Hence this command line option
-that allows to limit the number of open zones. The number of open zones is
+that allows one to limit the number of open zones. The number of open zones is
 defined as the number of zones to which write commands are issued by all
 threads/processes.
 .TP
@@ -1224,7 +1224,7 @@ map. For the \fBnormal\fR distribution, a normal (Gaussian) deviation is
 supplied as a value between 0 and 100.
 .P
 The second, optional float is allowed for \fBpareto\fR, \fBzipf\fR and \fBnormal\fR
-distributions. It allows to set base of distribution in non-default place, giving
+distributions. It allows one to set base of distribution in non-default place, giving
 more control over most probable outcome. This value is in range [0-1] which maps linearly to
 range of possible random values.
 Defaults are: random for \fBpareto\fR and \fBzipf\fR, and 0.5 for \fBnormal\fR.
@@ -4213,7 +4213,7 @@ This format is not supported in fio versions >= 1.20\-rc3.
 .TP
 .B Trace file format v2
 The second version of the trace file format was added in fio version 1.17. It
-allows to access more then one file per trace and has a bigger set of possible
+allows one to access more then one file per trace and has a bigger set of possible
 file actions.
 .RS
 .P
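
The first hunk only changes how the backslash is spelled in the roff source. In roff, \e prints the escape character, which is a backslash by default. The rule being documented is unchanged: every colon in a path given to filename= must be preceded by a backslash. The sketch below applies that rule using a hypothetical escape_colons() helper, which is not part of fio.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of fio): copy path into out, putting a
 * backslash before every ':' so the result can be used with filename=.
 * Returns the escaped length, or -1 if out (including the terminating
 * NUL) is too small. */
static long escape_colons(const char *path, char *out, size_t out_len)
{
	size_t j = 0;

	for (const char *p = path; *p; p++) {
		size_t need = (*p == ':') ? 2 : 1;

		if (j + need >= out_len)
			return -1;
		if (*p == ':')
			out[j++] = '\\';
		out[j++] = *p;
	}
	if (j >= out_len)
		return -1;
	out[j] = '\0';
	return (long)j;
}
```

Applied to the man page examples, /dev/dsk/foo@3,0:c becomes /dev/dsk/foo@3,0\:c and F:\filename becomes F\:\filename. Existing backslashes are left alone.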

* Recent changes (master)
@ 2022-11-24 13:00 Jens Axboe
From: Jens Axboe @ 2022-11-24 13:00 UTC
  To: fio

The following changes since commit ede04c27b618842e32b2a3349672f6b59a1697e1:

  test: add large pattern test (2022-11-18 19:36:10 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 72044c66ac7055a98c9b3021c298c81849e3c990:

  doc: update about sqthread_poll (2022-11-23 14:06:03 -0500)

----------------------------------------------------------------
Ankit Kumar (2):
      engines:io_uring: fix clat calculation for sqthread poll
      doc: update about sqthread_poll

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/chienfuchen32/fio

chienfuchen32 (1):
      update documentation typo

 HOWTO.rst          |  6 ++++--
 engines/io_uring.c | 20 ++++++++++++++++++++
 fio.1              |  4 +++-
 3 files changed, 27 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index e796f961..4419ee1b 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2299,7 +2299,9 @@ with the caveat that when used on the command line, they must come after the
 	kernel of available items in the SQ ring. If this option is set, the
 	act of submitting IO will be done by a polling thread in the kernel.
 	This frees up cycles for fio, at the cost of using more CPU in the
-	system.
+	system. As submission is just the time it takes to fill in the sqe
+	entries and any syscall required to wake up the idle kernel thread,
+	fio will not report submission latencies.
 
 .. option:: sqthread_poll_cpu=int : [io_uring] [io_uring_cmd]
 
@@ -4501,7 +4503,7 @@ Trace file format v2
 ~~~~~~~~~~~~~~~~~~~~
 
 The second version of the trace file format was added in fio version 1.17.  It
-allows to access more then one file per trace and has a bigger set of possible
+allows to access more than one file per trace and has a bigger set of possible
 file actions.
 
 The first line of the trace file has to be::
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 3c656b77..a9abd11d 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -637,12 +637,16 @@ static int fio_ioring_commit(struct thread_data *td)
 	 */
 	if (o->sqpoll_thread) {
 		struct io_sq_ring *ring = &ld->sq_ring;
+		unsigned start = *ld->sq_ring.head;
 		unsigned flags;
 
 		flags = atomic_load_acquire(ring->flags);
 		if (flags & IORING_SQ_NEED_WAKEUP)
 			io_uring_enter(ld, ld->queued, 0,
 					IORING_ENTER_SQ_WAKEUP);
+		fio_ioring_queued(td, start, ld->queued);
+		io_u_mark_submit(td, ld->queued);
+
 		ld->queued = 0;
 		return 0;
 	}
@@ -804,6 +808,14 @@ static int fio_ioring_queue_init(struct thread_data *td)
 			p.flags |= IORING_SETUP_SQ_AFF;
 			p.sq_thread_cpu = o->sqpoll_cpu;
 		}
+
+		/*
+		 * Submission latency for sqpoll_thread is just the time it
+		 * takes to fill in the SQ ring entries, and any syscall if
+		 * IORING_SQ_NEED_WAKEUP is set, we don't need to log that time
+		 * separately.
+		 */
+		td->o.disable_slat = 1;
 	}
 
 	/*
@@ -876,6 +888,14 @@ static int fio_ioring_cmd_queue_init(struct thread_data *td)
 			p.flags |= IORING_SETUP_SQ_AFF;
 			p.sq_thread_cpu = o->sqpoll_cpu;
 		}
+
+		/*
+		 * Submission latency for sqpoll_thread is just the time it
+		 * takes to fill in the SQ ring entries, and any syscall if
+		 * IORING_SQ_NEED_WAKEUP is set, we don't need to log that time
+		 * separately.
+		 */
+		td->o.disable_slat = 1;
 	}
 	if (o->cmd_type == FIO_URING_CMD_NVME) {
 		p.flags |= IORING_SETUP_SQE128;
diff --git a/fio.1 b/fio.1
index 9e33c9e1..a156bf5d 100644
--- a/fio.1
+++ b/fio.1
@@ -2090,7 +2090,9 @@ sqthread_poll option.
 Normally fio will submit IO by issuing a system call to notify the kernel of
 available items in the SQ ring. If this option is set, the act of submitting IO
 will be done by a polling thread in the kernel. This frees up cycles for fio, at
-the cost of using more CPU in the system.
+the cost of using more CPU in the system. As submission is just the time it
+takes to fill in the sqe entries and any syscall required to wake up the idle
+kernel thread, fio will not report submission latencies.
 .TP
 .BI (io_uring,io_uring_cmd)sqthread_poll_cpu \fR=\fPint
 When `sqthread_poll` is set, this option provides a way to define which CPU
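
The commit stops recording submission latency under SQPOLL because the common path never enters the kernel. fio fills in the SQEs and calls io_uring_enter() with IORING_ENTER_SQ_WAKEUP only when the kernel thread has set IORING_SQ_NEED_WAKEUP. The sketch below is a simplified model of that check. The ring struct and the counter standing in for the syscall are hypothetical. SQ_NEED_WAKEUP is assumed to mirror IORING_SQ_NEED_WAKEUP, which is (1U << 0) in <linux/io_uring.h>. Writing the SQEs and publishing the tail, both of which the real code also does, are omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed to mirror IORING_SQ_NEED_WAKEUP from <linux/io_uring.h>. */
#define SQ_NEED_WAKEUP (1U << 0)

/* Hypothetical model of the SQ ring flags word shared with the kernel. */
struct sq_ring_model {
	_Atomic unsigned flags;
};

/* Counts simulated io_uring_enter(..., IORING_ENTER_SQ_WAKEUP) calls. */
static int wakeups;

static void enter_sq_wakeup(void)
{
	wakeups++;
}

/* With an SQPOLL thread, "submitting" is just publishing SQEs; a syscall
 * is needed only when the idle kernel thread has asked to be woken. */
static void commit_sqpoll(struct sq_ring_model *ring)
{
	unsigned flags = atomic_load_explicit(&ring->flags,
					      memory_order_acquire);

	if (flags & SQ_NEED_WAKEUP)
		enter_sq_wakeup();
}
```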

* Recent changes (master)
@ 2022-11-19 13:00 Jens Axboe
From: Jens Axboe @ 2022-11-19 13:00 UTC
  To: fio

The following changes since commit 07c8fe21021681f86fbfd3c3d63b88a5ebd4e557:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-11-14 08:47:00 -0500)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ede04c27b618842e32b2a3349672f6b59a1697e1:

  test: add large pattern test (2022-11-18 19:36:10 -0500)

----------------------------------------------------------------
Logan Gunthorpe (6):
      cconv: Support pattern buffers of arbitrary size
      lib/pattern: Support NULL output buffer in parse_and_fill_pattern()
      lib/pattern: Support short repeated read calls when loading from file
      options: Support arbitrarily long pattern buffers
      lib/pattern: Support binary pattern buffers on windows
      test: add large pattern test

Shin'ichiro Kawasaki (13):
      oslib: blkzoned: add blkzoned_finish_zone() helper function
      engines/libzbc: add libzbc_finish_zone() helper function
      zbd: add zbd_zone_remainder() helper function
      zbd: finish zones with remainder smaller than minimum write block size
      zbd: allow block size not divisor of zone size
      zbd, verify: verify before zone reset for zone_reset_threshold/frequency
      zbd: fix zone reset condition for verify
      zbd: prevent experimental verify with zonemode=zbd
      t/zbd: fix test case #33 for block size unaligned to zone size
      t/zbd: modify test case #34 for block size unaligned to zone size
      t/zbd: add test case to check zone_reset_threshold/frequency with verify
      t/zbd: remove experimental_verify option from test case #54
      t/zbd: add test case to check experimental_verify option

 cconv.c                |  86 +++++++++++++++++-------
 client.c               |  17 +++--
 engines/libzbc.c       |  34 ++++++++++
 gclient.c              |  12 +++-
 ioengines.h            |   2 +
 lib/pattern.c          | 100 +++++++++++++++++++++++-----
 lib/pattern.h          |  21 ++++--
 options.c              |  10 +--
 oslib/blkzoned.h       |   8 +++
 oslib/linux-blkzoned.c |  37 +++++++++++
 server.c               |  23 ++++---
 server.h               |   2 +-
 stat.h                 |   1 -
 t/jobs/t0027.fio       |  14 ++++
 t/run-fio-tests.py     |  29 ++++++++
 t/zbd/test-zbd-support |  60 +++++++++++++----
 thread_options.h       |  15 +++--
 verify.c               |   6 +-
 zbd.c                  | 175 +++++++++++++++++++++++++++++++++----------------
 zbd.h                  |   2 -
 20 files changed, 507 insertions(+), 147 deletions(-)
 create mode 100644 t/jobs/t0027.fio

---

Diff of recent changes:

diff --git a/cconv.c b/cconv.c
index 6c36afb7..d755844f 100644
--- a/cconv.c
+++ b/cconv.c
@@ -48,14 +48,24 @@ static void free_thread_options_to_cpu(struct thread_options *o)
 	free(o->profile);
 	free(o->cgroup);
 
+	free(o->verify_pattern);
+	free(o->buffer_pattern);
+
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		free(o->bssplit[i]);
 		free(o->zone_split[i]);
 	}
 }
 
-void convert_thread_options_to_cpu(struct thread_options *o,
-				   struct thread_options_pack *top)
+size_t thread_options_pack_size(struct thread_options *o)
+{
+	return sizeof(struct thread_options_pack) + o->verify_pattern_bytes +
+		o->buffer_pattern_bytes;
+}
+
+int convert_thread_options_to_cpu(struct thread_options *o,
+				  struct thread_options_pack *top,
+				  size_t top_sz)
 {
 	int i, j;
 
@@ -171,10 +181,21 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->verify_interval = le32_to_cpu(top->verify_interval);
 	o->verify_offset = le32_to_cpu(top->verify_offset);
 
-	memcpy(o->verify_pattern, top->verify_pattern, MAX_PATTERN_SIZE);
-	memcpy(o->buffer_pattern, top->buffer_pattern, MAX_PATTERN_SIZE);
-
 	o->verify_pattern_bytes = le32_to_cpu(top->verify_pattern_bytes);
+	o->buffer_pattern_bytes = le32_to_cpu(top->buffer_pattern_bytes);
+	if (o->verify_pattern_bytes >= MAX_PATTERN_SIZE ||
+	    o->buffer_pattern_bytes >= MAX_PATTERN_SIZE ||
+	    thread_options_pack_size(o) > top_sz)
+		return -EINVAL;
+
+	o->verify_pattern = realloc(o->verify_pattern,
+				    o->verify_pattern_bytes);
+	o->buffer_pattern = realloc(o->buffer_pattern,
+				    o->buffer_pattern_bytes);
+	memcpy(o->verify_pattern, top->patterns, o->verify_pattern_bytes);
+	memcpy(o->buffer_pattern, &top->patterns[o->verify_pattern_bytes],
+	       o->buffer_pattern_bytes);
+
 	o->verify_fatal = le32_to_cpu(top->verify_fatal);
 	o->verify_dump = le32_to_cpu(top->verify_dump);
 	o->verify_async = le32_to_cpu(top->verify_async);
@@ -268,7 +289,6 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->zero_buffers = le32_to_cpu(top->zero_buffers);
 	o->refill_buffers = le32_to_cpu(top->refill_buffers);
 	o->scramble_buffers = le32_to_cpu(top->scramble_buffers);
-	o->buffer_pattern_bytes = le32_to_cpu(top->buffer_pattern_bytes);
 	o->time_based = le32_to_cpu(top->time_based);
 	o->disable_lat = le32_to_cpu(top->disable_lat);
 	o->disable_clat = le32_to_cpu(top->disable_clat);
@@ -334,6 +354,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
 	uint8_t log_gz_cpumask[FIO_TOP_STR_MAX];
 #endif
+
+	return 0;
 }
 
 void convert_thread_options_to_net(struct thread_options_pack *top,
@@ -572,8 +594,9 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 		top->max_latency[i] = __cpu_to_le64(o->max_latency[i]);
 	}
 
-	memcpy(top->verify_pattern, o->verify_pattern, MAX_PATTERN_SIZE);
-	memcpy(top->buffer_pattern, o->buffer_pattern, MAX_PATTERN_SIZE);
+	memcpy(top->patterns, o->verify_pattern, o->verify_pattern_bytes);
+	memcpy(&top->patterns[o->verify_pattern_bytes], o->buffer_pattern,
+	       o->buffer_pattern_bytes);
 
 	top->size = __cpu_to_le64(o->size);
 	top->io_size = __cpu_to_le64(o->io_size);
@@ -620,7 +643,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
 	uint8_t log_gz_cpumask[FIO_TOP_STR_MAX];
 #endif
-
 }
 
 /*
@@ -630,18 +652,36 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
  */
 int fio_test_cconv(struct thread_options *__o)
 {
-	struct thread_options o;
-	struct thread_options_pack top1, top2;
-
-	memset(&top1, 0, sizeof(top1));
-	memset(&top2, 0, sizeof(top2));
-
-	convert_thread_options_to_net(&top1, __o);
-	memset(&o, 0, sizeof(o));
-	convert_thread_options_to_cpu(&o, &top1);
-	convert_thread_options_to_net(&top2, &o);
-
-	free_thread_options_to_cpu(&o);
-
-	return memcmp(&top1, &top2, sizeof(top1));
+	struct thread_options o1 = *__o, o2;
+	struct thread_options_pack *top1, *top2;
+	size_t top_sz;
+	int ret;
+
+	o1.verify_pattern_bytes = 61;
+	o1.verify_pattern = malloc(o1.verify_pattern_bytes);
+	memset(o1.verify_pattern, 'V', o1.verify_pattern_bytes);
+	o1.buffer_pattern_bytes = 15;
+	o1.buffer_pattern = malloc(o1.buffer_pattern_bytes);
+	memset(o1.buffer_pattern, 'B', o1.buffer_pattern_bytes);
+
+	top_sz = thread_options_pack_size(&o1);
+	top1 = calloc(1, top_sz);
+	top2 = calloc(1, top_sz);
+
+	convert_thread_options_to_net(top1, &o1);
+	memset(&o2, 0, sizeof(o2));
+	ret = convert_thread_options_to_cpu(&o2, top1, top_sz);
+	if (ret)
+		goto out;
+
+	convert_thread_options_to_net(top2, &o2);
+	ret = memcmp(top1, top2, top_sz);
+
+out:
+	free_thread_options_to_cpu(&o2);
+	free(top2);
+	free(top1);
+	free(o1.buffer_pattern);
+	free(o1.verify_pattern);
+	return ret;
 }
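
After this change, the packed options carry both patterns back to back in a trailing patterns[] array. The message size therefore depends on the pattern lengths, and the receiver must check those lengths against the number of bytes it actually received. The sketch below shows a reduced version of that layout and check. struct pack, pack_size() and unpack() are hypothetical, and little-endian conversion is omitted. The 61- and 15-byte patterns in the usage simply echo the values used in fio_test_cconv().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same value as MAX_PATTERN_SIZE in lib/pattern.h. */
#define MAX_PAT (128u << 20)

/* Hypothetical, much-reduced wire struct: fixed header + trailing bytes. */
struct pack {
	uint32_t verify_bytes;
	uint32_t buffer_bytes;
	uint8_t patterns[];	/* verify pattern, then buffer pattern */
};

static size_t pack_size(uint32_t vb, uint32_t bb)
{
	return sizeof(struct pack) + (size_t)vb + bb;
}

/* Receiver side: validate both lengths against the hard limit and the
 * number of bytes that arrived before copying the trailing data. */
static int unpack(const struct pack *p, size_t sz, uint8_t **v, uint8_t **b)
{
	if (sz < sizeof(*p))
		return -EINVAL;
	if (p->verify_bytes >= MAX_PAT || p->buffer_bytes >= MAX_PAT ||
	    pack_size(p->verify_bytes, p->buffer_bytes) > sz)
		return -EINVAL;

	*v = malloc(p->verify_bytes ? p->verify_bytes : 1);
	*b = malloc(p->buffer_bytes ? p->buffer_bytes : 1);
	if (!*v || !*b) {
		free(*v);
		free(*b);
		*v = *b = NULL;
		return -ENOMEM;
	}
	memcpy(*v, p->patterns, p->verify_bytes);
	memcpy(*b, p->patterns + p->verify_bytes, p->buffer_bytes);
	return 0;
}
```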
diff --git a/client.c b/client.c
index 37da74bc..51496c77 100644
--- a/client.c
+++ b/client.c
@@ -922,13 +922,20 @@ int fio_clients_send_ini(const char *filename)
 int fio_client_update_options(struct fio_client *client,
 			      struct thread_options *o, uint64_t *tag)
 {
-	struct cmd_add_job_pdu pdu;
+	size_t cmd_sz = offsetof(struct cmd_add_job_pdu, top) +
+		thread_options_pack_size(o);
+	struct cmd_add_job_pdu *pdu;
+	int ret;
 
-	pdu.thread_number = cpu_to_le32(client->thread_number);
-	pdu.groupid = cpu_to_le32(client->groupid);
-	convert_thread_options_to_net(&pdu.top, o);
+	pdu = malloc(cmd_sz);
+	pdu->thread_number = cpu_to_le32(client->thread_number);
+	pdu->groupid = cpu_to_le32(client->groupid);
+	convert_thread_options_to_net(&pdu->top, o);
 
-	return fio_net_send_cmd(client->fd, FIO_NET_CMD_UPDATE_JOB, &pdu, sizeof(pdu), tag, &client->cmd_list);
+	ret = fio_net_send_cmd(client->fd, FIO_NET_CMD_UPDATE_JOB, pdu,
+			       cmd_sz, tag, &client->cmd_list);
+	free(pdu);
+	return ret;
 }
 
 static void convert_io_stat(struct io_stat *dst, struct io_stat *src)
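
client.c, and server.c further down, now size the PDU as the header up to the top member plus thread_options_pack_size(). The receiving side recovers the options length as cmd->pdu_len - offsetof(struct cmd_add_job_pdu, top). The sketch below shows both halves using a hypothetical struct job_pdu whose last member is a flexible array. In fio, top is a struct thread_options_pack whose pattern bytes follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PDU: fixed fields followed by a variable-size blob,
 * analogous to struct cmd_add_job_pdu whose last member is 'top'. */
struct job_pdu {
	uint32_t thread_number;
	uint32_t groupid;
	uint8_t top[];		/* variable-length packed options */
};

/* Sender: header bytes up to 'top' plus however large the blob is. */
static struct job_pdu *build_pdu(uint32_t tn, uint32_t gid,
				 const void *blob, size_t blob_sz,
				 size_t *cmd_sz)
{
	struct job_pdu *pdu;

	*cmd_sz = offsetof(struct job_pdu, top) + blob_sz;
	pdu = malloc(*cmd_sz);
	if (!pdu)
		return NULL;
	pdu->thread_number = tn;
	pdu->groupid = gid;
	memcpy(pdu->top, blob, blob_sz);
	return pdu;
}

/* Receiver: recover the blob length from the total payload length. */
static size_t blob_len(size_t pdu_len)
{
	size_t hdr = offsetof(struct job_pdu, top);

	return pdu_len >= hdr ? pdu_len - hdr : 0;
}
```

The sketch uses offsetof() rather than sizeof(struct job_pdu) because a struct with a flexible array member may carry trailing padding. In that case, sizeof can exceed the offset at which the variable part starts.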
diff --git a/engines/libzbc.c b/engines/libzbc.c
index 2bc2c7e0..2b63ef1a 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -332,6 +332,39 @@ err:
 	return -ret;
 }
 
+static int libzbc_finish_zone(struct thread_data *td, struct fio_file *f,
+			      uint64_t offset, uint64_t length)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+	uint64_t sector = offset >> 9;
+	unsigned int nr_zones;
+	struct zbc_errno err;
+	int i, ret;
+
+	assert(ld);
+	assert(ld->zdev);
+
+	nr_zones = (length + td->o.zone_size - 1) / td->o.zone_size;
+	assert(nr_zones > 0);
+
+	for (i = 0; i < nr_zones; i++, sector += td->o.zone_size >> 9) {
+		ret = zbc_finish_zone(ld->zdev, sector, 0);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	zbc_errno(ld->zdev, &err);
+	td_verror(td, errno, "zbc_finish_zone failed");
+	if (err.sk)
+		log_err("%s: finish zone failed %s:%s\n",
+			f->file_name,
+			zbc_sk_str(err.sk), zbc_asc_ascq_str(err.asc_ascq));
+	return -ret;
+}
+
 static int libzbc_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 				     unsigned int *max_open_zones)
 {
@@ -434,6 +467,7 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.report_zones		= libzbc_report_zones,
 	.reset_wp		= libzbc_reset_wp,
 	.get_max_open_zones	= libzbc_get_max_open_zones,
+	.finish_zone		= libzbc_finish_zone,
 	.queue			= libzbc_queue,
 	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO,
 };
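
libzbc_finish_zone() rounds the byte length up to whole zones and then issues one finish per zone, advancing the start sector by zone_size >> 9 (512-byte sectors) on each iteration. The sketch below reproduces that arithmetic, assuming offset is the start of a zone. finish_one() is a hypothetical stand-in for zbc_finish_zone() that only records the sector it was given.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for zbc_finish_zone(): record each target sector. */
static uint64_t finished[16];
static unsigned nr_finished;

static int finish_one(uint64_t sector)
{
	if (nr_finished >= 16)
		return -1;
	finished[nr_finished++] = sector;
	return 0;
}

/* Finish every zone covering [offset, offset + length), assuming offset
 * is zone-aligned: round the zone count up, step one zone at a time. */
static int finish_range(uint64_t offset, uint64_t length, uint64_t zone_size)
{
	uint64_t sector = offset >> 9;
	uint64_t nr_zones = (length + zone_size - 1) / zone_size;

	for (uint64_t i = 0; i < nr_zones; i++, sector += zone_size >> 9) {
		int ret = finish_one(sector);

		if (ret)
			return ret;
	}
	return 0;
}
```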
diff --git a/gclient.c b/gclient.c
index c59bcfe2..73f64b3b 100644
--- a/gclient.c
+++ b/gclient.c
@@ -553,12 +553,15 @@ static void gfio_quit_op(struct fio_client *client, struct fio_net_cmd *cmd)
 }
 
 static struct thread_options *gfio_client_add_job(struct gfio_client *gc,
-			struct thread_options_pack *top)
+			struct thread_options_pack *top, size_t top_sz)
 {
 	struct gfio_client_options *gco;
 
 	gco = calloc(1, sizeof(*gco));
-	convert_thread_options_to_cpu(&gco->o, top);
+	if (convert_thread_options_to_cpu(&gco->o, top, top_sz)) {
+		dprint(FD_NET, "client: failed parsing add_job command\n");
+		return NULL;
+	}
 	INIT_FLIST_HEAD(&gco->list);
 	flist_add_tail(&gco->list, &gc->o_list);
 	gc->o_list_nr = 1;
@@ -577,7 +580,10 @@ static void gfio_add_job_op(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	p->thread_number = le32_to_cpu(p->thread_number);
 	p->groupid = le32_to_cpu(p->groupid);
-	o = gfio_client_add_job(gc, &p->top);
+	o = gfio_client_add_job(gc, &p->top,
+			cmd->pdu_len - offsetof(struct cmd_add_job_pdu, top));
+	if (o == NULL)
+		return;
 
 	gdk_threads_enter();
 
diff --git a/ioengines.h b/ioengines.h
index fafa1e48..11d2115c 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -61,6 +61,8 @@ struct ioengine_ops {
 			uint64_t, uint64_t);
 	int (*get_max_open_zones)(struct thread_data *, struct fio_file *,
 				  unsigned int *);
+	int (*finish_zone)(struct thread_data *, struct fio_file *,
+			   uint64_t, uint64_t);
 	int option_struct_size;
 	struct fio_option *options;
 };
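
The new finish_zone member is optional. Engines fill in their ops table with designated initializers, so an engine that does not set .finish_zone gets a NULL pointer. The caller-side dispatch is not part of this excerpt. The sketch below assumes the usual check-then-fall-back pattern. It uses a hypothetical trimmed ops struct, and a generic fallback stands in for something like blkzoned_finish_zone().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down ops table with an optional hook. */
struct engine_ops {
	const char *name;
	int (*finish_zone)(uint64_t offset, uint64_t length);	/* may be NULL */
};

static int generic_calls;
static int engine_calls;

/* Stand-in for a generic, engine-independent implementation. */
static int generic_finish_zone(uint64_t offset, uint64_t length)
{
	(void)offset;
	(void)length;
	generic_calls++;
	return 0;
}

/* Stand-in for an engine-specific hook such as libzbc_finish_zone(). */
static int engine_finish_zone(uint64_t offset, uint64_t length)
{
	(void)offset;
	(void)length;
	engine_calls++;
	return 0;
}

/* Prefer the engine's own hook; fall back when the engine leaves it unset. */
static int do_finish_zone(const struct engine_ops *ops,
			  uint64_t off, uint64_t len)
{
	if (ops->finish_zone)
		return ops->finish_zone(off, len);
	return generic_finish_zone(off, len);
}
```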
diff --git a/lib/pattern.c b/lib/pattern.c
index d8203630..9be29af6 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -32,7 +32,7 @@ static const char *parse_file(const char *beg, char *out,
 	const char *end;
 	char *file;
 	int fd;
-	ssize_t count;
+	ssize_t rc, count = 0;
 
 	if (!out_len)
 		goto err_out;
@@ -47,13 +47,32 @@ static const char *parse_file(const char *beg, char *out,
 	if (file == NULL)
 		goto err_out;
 
+#ifdef _WIN32
+	fd = open(file, O_RDONLY | O_BINARY);
+#else
 	fd = open(file, O_RDONLY);
+#endif
 	if (fd < 0)
 		goto err_free_out;
 
-	count = read(fd, out, out_len);
-	if (count == -1)
-		goto err_free_close_out;
+	if (out) {
+		while (1) {
+			rc = read(fd, out, out_len - count);
+			if (rc == 0)
+				break;
+			if (rc == -1)
+				goto err_free_close_out;
+
+			count += rc;
+			out += rc;
+		}
+	} else {
+		count = lseek(fd, 0, SEEK_END);
+		if (count == -1)
+			goto err_free_close_out;
+		if (count >= out_len)
+			count = out_len;
+	}
 
 	*filled = count;
 	close(fd);
@@ -100,7 +119,8 @@ static const char *parse_string(const char *beg, char *out,
 	if (end - beg > out_len)
 		return NULL;
 
-	memcpy(out, beg, end - beg);
+	if (out)
+		memcpy(out, beg, end - beg);
 	*filled = end - beg;
 
 	/* Catch up quote */
@@ -156,12 +176,14 @@ static const char *parse_number(const char *beg, char *out,
 		i = 0;
 		if (!lval) {
 			num    = 0;
-			out[i] = 0x00;
+			if (out)
+				out[i] = 0x00;
 			i      = 1;
 		} else {
 			val = (unsigned int)lval;
 			for (; val && out_len; out_len--, i++, val >>= 8)
-				out[i] = val & 0xff;
+				if (out)
+					out[i] = val & 0xff;
 			if (val)
 				return NULL;
 		}
@@ -183,7 +205,8 @@ static const char *parse_number(const char *beg, char *out,
 			const char *fmt;
 
 			fmt = (num & 1 ? "%1hhx" : "%2hhx");
-			sscanf(beg, fmt, &out[i]);
+			if (out)
+				sscanf(beg, fmt, &out[i]);
 			if (num & 1) {
 				num++;
 				beg--;
@@ -251,7 +274,8 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
 	if (f->desc->len > out_len)
 		return NULL;
 
-	memset(out, '\0', f->desc->len);
+	if (out)
+		memset(out, '\0', f->desc->len);
 	*filled = f->desc->len;
 
 	return in + len;
@@ -262,7 +286,9 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
  *                            numbers and pattern formats.
  * @in - string input
  * @in_len - size of the input string
- * @out - output buffer where parsed result will be put
+ * @out - output buffer where parsed result will be put, may be NULL
+ *	  in which case this function just calculates the required
+ *	  length of the buffer
  * @out_len - lengths of the output buffer
  * @fmt_desc - array of pattern format descriptors [input]
  * @fmt - array of pattern formats [output]
@@ -305,16 +331,16 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
  *
  * Returns number of bytes filled or err < 0 in case of failure.
  */
-int parse_and_fill_pattern(const char *in, unsigned int in_len,
-			   char *out, unsigned int out_len,
-			   const struct pattern_fmt_desc *fmt_desc,
-			   struct pattern_fmt *fmt,
-			   unsigned int *fmt_sz_out)
+static int parse_and_fill_pattern(const char *in, unsigned int in_len,
+				  char *out, unsigned int out_len,
+				  const struct pattern_fmt_desc *fmt_desc,
+				  struct pattern_fmt *fmt,
+				  unsigned int *fmt_sz_out)
 {
 	const char *beg, *end, *out_beg = out;
 	unsigned int total = 0, fmt_rem = 0;
 
-	if (!in || !in_len || !out || !out_len)
+	if (!in || !in_len || !out_len)
 		return -EINVAL;
 	if (fmt_sz_out)
 		fmt_rem = *fmt_sz_out;
@@ -370,6 +396,48 @@ int parse_and_fill_pattern(const char *in, unsigned int in_len,
 	return total;
 }
 
+/**
+ * parse_and_fill_pattern_alloc() - Parses combined input, which consists of
+ *				    strings, numbers and pattern formats and
+ *				    allocates a buffer for the result.
+ *
+ * @in - string input
+ * @in_len - size of the input string
+ * @out - pointer to the output buffer pointer, this will be set to the newly
+ *        allocated pattern buffer which must be freed by the caller
+ * @fmt_desc - array of pattern format descriptors [input]
+ * @fmt - array of pattern formats [output]
+ * @fmt_sz - pointer where the size of pattern formats array stored [input],
+ *           after successful parsing this pointer will contain the number
+ *           of parsed formats if any [output].
+ *
+ * See documentation on parse_and_fill_pattern() above for a description
+ * of the functionality.
+ *
+ * Returns number of bytes filled or err < 0 in case of failure.
+ */
+int parse_and_fill_pattern_alloc(const char *in, unsigned int in_len,
+		char **out, const struct pattern_fmt_desc *fmt_desc,
+		struct pattern_fmt *fmt, unsigned int *fmt_sz_out)
+{
+	int count;
+
+	count = parse_and_fill_pattern(in, in_len, NULL, MAX_PATTERN_SIZE,
+				       fmt_desc, fmt, fmt_sz_out);
+	if (count < 0)
+		return count;
+
+	*out = malloc(count);
+	count = parse_and_fill_pattern(in, in_len, *out, count, fmt_desc,
+				       fmt, fmt_sz_out);
+	if (count < 0) {
+		free(*out);
+		*out = NULL;
+	}
+
+	return count;
+}
+
 /**
  * dup_pattern() - Duplicates part of the pattern all over the buffer.
  *
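
parse_and_fill_pattern_alloc() relies on the new NULL-output mode. The first call only measures the result, bounded by MAX_PATTERN_SIZE. The buffer is then allocated at exactly that size, and the second call fills it. The sketch below shows the same two-pass shape using a deliberately trivial, hypothetical expand() "parser" that doubles every character. Its limit argument plays the role that MAX_PATTERN_SIZE plays in the sizing pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "parser": emits every input character twice. With
 * out == NULL it only computes the size. Returns bytes produced, or -1
 * on empty input or if the result would exceed out_len. */
static int expand(const char *in, char *out, size_t out_len)
{
	size_t n = 0;

	if (!*in)
		return -1;
	for (const char *p = in; *p; p++) {
		if (n + 2 > out_len)
			return -1;
		if (out) {
			out[n] = *p;
			out[n + 1] = *p;
		}
		n += 2;
	}
	return (int)n;
}

/* Two passes: measure with out == NULL, allocate exactly, then fill. */
static int expand_alloc(const char *in, char **out, size_t limit)
{
	int count = expand(in, NULL, limit);

	if (count < 0)
		return count;
	*out = malloc((size_t)count);
	if (!*out)
		return -1;
	count = expand(in, *out, (size_t)count);
	if (count < 0) {
		free(*out);
		*out = NULL;
	}
	return count;
}
```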
diff --git a/lib/pattern.h b/lib/pattern.h
index a6d9d6b4..7123b42d 100644
--- a/lib/pattern.h
+++ b/lib/pattern.h
@@ -1,6 +1,19 @@
 #ifndef FIO_PARSE_PATTERN_H
 #define FIO_PARSE_PATTERN_H
 
+/*
+ * The pattern is dynamically allocated, but that doesn't mean there
+ * are not limits. The network protocol has a limit of
+ * FIO_SERVER_MAX_CMD_MB and potentially two patterns must fit in there.
+ * There's also a need to verify the incoming data from the network and
+ * this provides a sensible check.
+ *
+ * 128MiB is an arbitrary limit that meets these criteria. The patterns
+ * tend to be truncated at the IO size anyway and IO sizes that large
+ * aren't terribly practical.
+ */
+#define MAX_PATTERN_SIZE	(128 << 20)
+
 /**
  * Pattern format description. The input for 'parse_pattern'.
  * Describes format with its name and callback, which should
@@ -21,11 +34,9 @@ struct pattern_fmt {
 	const struct pattern_fmt_desc *desc;
 };
 
-int parse_and_fill_pattern(const char *in, unsigned int in_len,
-			   char *out, unsigned int out_len,
-			   const struct pattern_fmt_desc *fmt_desc,
-			   struct pattern_fmt *fmt,
-			   unsigned int *fmt_sz_out);
+int parse_and_fill_pattern_alloc(const char *in, unsigned int in_len,
+		char **out, const struct pattern_fmt_desc *fmt_desc,
+		struct pattern_fmt *fmt, unsigned int *fmt_sz_out);
 
 int paste_format_inplace(char *pattern, unsigned int pattern_len,
 			 struct pattern_fmt *fmt, unsigned int fmt_sz,
diff --git a/options.c b/options.c
index 9e4d8cd1..49612345 100644
--- a/options.c
+++ b/options.c
@@ -1488,8 +1488,8 @@ static int str_buffer_pattern_cb(void *data, const char *input)
 	int ret;
 
 	/* FIXME: for now buffer pattern does not support formats */
-	ret = parse_and_fill_pattern(input, strlen(input), td->o.buffer_pattern,
-				     MAX_PATTERN_SIZE, NULL, NULL, NULL);
+	ret = parse_and_fill_pattern_alloc(input, strlen(input),
+				&td->o.buffer_pattern, NULL, NULL, NULL);
 	if (ret < 0)
 		return 1;
 
@@ -1537,9 +1537,9 @@ static int str_verify_pattern_cb(void *data, const char *input)
 	int ret;
 
 	td->o.verify_fmt_sz = FIO_ARRAY_SIZE(td->o.verify_fmt);
-	ret = parse_and_fill_pattern(input, strlen(input), td->o.verify_pattern,
-				     MAX_PATTERN_SIZE, fmt_desc,
-				     td->o.verify_fmt, &td->o.verify_fmt_sz);
+	ret = parse_and_fill_pattern_alloc(input, strlen(input),
+			&td->o.verify_pattern, fmt_desc, td->o.verify_fmt,
+			&td->o.verify_fmt_sz);
 	if (ret < 0)
 		return 1;
 
diff --git a/oslib/blkzoned.h b/oslib/blkzoned.h
index 719b041d..29fb034f 100644
--- a/oslib/blkzoned.h
+++ b/oslib/blkzoned.h
@@ -18,6 +18,8 @@ extern int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 				uint64_t offset, uint64_t length);
 extern int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 				       unsigned int *max_open_zones);
+extern int blkzoned_finish_zone(struct thread_data *td, struct fio_file *f,
+				uint64_t offset, uint64_t length);
 #else
 /*
  * Define stubs for systems that do not have zoned block device support.
@@ -51,6 +53,12 @@ static inline int blkzoned_get_max_open_zones(struct thread_data *td, struct fio
 {
 	return -EIO;
 }
+static inline int blkzoned_finish_zone(struct thread_data *td,
+				       struct fio_file *f,
+				       uint64_t offset, uint64_t length)
+{
+	return -EIO;
+}
 #endif
 
 #endif /* FIO_BLKZONED_H */
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 185bd501..c3130d0e 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -308,3 +308,40 @@ int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 
 	return ret;
 }
+
+int blkzoned_finish_zone(struct thread_data *td, struct fio_file *f,
+			 uint64_t offset, uint64_t length)
+{
+#ifdef BLKFINISHZONE
+	struct blk_zone_range zr = {
+		.sector         = offset >> 9,
+		.nr_sectors     = length >> 9,
+	};
+	int fd, ret = 0;
+
+	/* If the file is not yet opened, open it for this function. */
+	fd = f->fd;
+	if (fd < 0) {
+		fd = open(f->file_name, O_RDWR | O_LARGEFILE);
+		if (fd < 0)
+			return -errno;
+	}
+
+	if (ioctl(fd, BLKFINISHZONE, &zr) < 0)
+		ret = -errno;
+
+	if (f->fd < 0)
+		close(fd);
+
+	return ret;
+#else
+	/*
+	 * Kernel versions older than 5.5 do not support BLKFINISHZONE. These
+	 * old kernels assumed zones are closed automatically at the
+	 * max_open_zones limit, and they did not support the max_active_zones
+	 * limit, so there was no need to finish zones to avoid errors caused by
+	 * max_open_zones or max_active_zones. For those old versions, do nothing.
+	 */
+	return 0;
+#endif
+}
diff --git a/server.c b/server.c
index b869d387..a6347efd 100644
--- a/server.c
+++ b/server.c
@@ -1082,6 +1082,7 @@ static int handle_update_job_cmd(struct fio_net_cmd *cmd)
 	struct cmd_add_job_pdu *pdu = (struct cmd_add_job_pdu *) cmd->payload;
 	struct thread_data *td;
 	uint32_t tnumber;
+	int ret;
 
 	tnumber = le32_to_cpu(pdu->thread_number);
 
@@ -1093,8 +1094,9 @@ static int handle_update_job_cmd(struct fio_net_cmd *cmd)
 	}
 
 	td = tnumber_to_td(tnumber);
-	convert_thread_options_to_cpu(&td->o, &pdu->top);
-	send_update_job_reply(cmd->tag, 0);
+	ret = convert_thread_options_to_cpu(&td->o, &pdu->top,
+			cmd->pdu_len - offsetof(struct cmd_add_job_pdu, top));
+	send_update_job_reply(cmd->tag, ret);
 	return 0;
 }
 
@@ -2323,15 +2325,18 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 
 void fio_server_send_add_job(struct thread_data *td)
 {
-	struct cmd_add_job_pdu pdu = {
-		.thread_number = cpu_to_le32(td->thread_number),
-		.groupid = cpu_to_le32(td->groupid),
-	};
+	struct cmd_add_job_pdu *pdu;
+	size_t cmd_sz = offsetof(struct cmd_add_job_pdu, top) +
+		thread_options_pack_size(&td->o);
 
-	convert_thread_options_to_net(&pdu.top, &td->o);
+	pdu = malloc(cmd_sz);
+	pdu->thread_number = cpu_to_le32(td->thread_number);
+	pdu->groupid = cpu_to_le32(td->groupid);
 
-	fio_net_queue_cmd(FIO_NET_CMD_ADD_JOB, &pdu, sizeof(pdu), NULL,
-				SK_F_COPY);
+	convert_thread_options_to_net(&pdu->top, &td->o);
+
+	fio_net_queue_cmd(FIO_NET_CMD_ADD_JOB, pdu, cmd_sz, NULL, SK_F_COPY);
+	free(pdu);
 }
 
 void fio_server_send_start(struct thread_data *td)
diff --git a/server.h b/server.h
index b0c5e2df..28133020 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 97,
+	FIO_SERVER_VER			= 98,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.h b/stat.h
index 4c3bf71f..8ceabc48 100644
--- a/stat.h
+++ b/stat.h
@@ -142,7 +142,6 @@ enum block_info_state {
 	BLOCK_STATE_COUNT,
 };
 
-#define MAX_PATTERN_SIZE	512
 #define FIO_JOBNAME_SIZE	128
 #define FIO_JOBDESC_SIZE	256
 #define FIO_VERROR_SIZE		128
diff --git a/t/jobs/t0027.fio b/t/jobs/t0027.fio
new file mode 100644
index 00000000..b5b97a30
--- /dev/null
+++ b/t/jobs/t0027.fio
@@ -0,0 +1,14 @@
+[global]
+filename=t0027file
+size=16k
+bs=16k
+
+[write_job]
+readwrite=write
+buffer_pattern='t0027.pattern'
+
+[read_job]
+stonewall=1
+readwrite=read
+verify=pattern
+verify_pattern='t0027.pattern'
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index e5b307ac..a06f8126 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -799,6 +799,26 @@ class FioJobTest_t0025(FioJobTest):
         if self.json_data['jobs'][0]['read']['io_kbytes'] != 128:
             self.passed = False
 
+class FioJobTest_t0027(FioJobTest):
+    def setup(self, *args, **kws):
+        super(FioJobTest_t0027, self).setup(*args, **kws)
+        self.pattern_file = os.path.join(self.test_dir, "t0027.pattern")
+        self.output_file = os.path.join(self.test_dir, "t0027file")
+        self.pattern = os.urandom(16 << 10)
+        with open(self.pattern_file, "wb") as f:
+            f.write(self.pattern)
+
+    def check_result(self):
+        super(FioJobTest_t0027, self).check_result()
+
+        if not self.passed:
+            return
+
+        with open(self.output_file, "rb") as f:
+            data = f.read()
+
+        if data != self.pattern:
+            self.passed = False
 
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
@@ -1214,6 +1234,15 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [Requirements.not_windows],
     },
+    {
+        'test_id':          27,
+        'test_class':       FioJobTest_t0027,
+        'job':              't0027.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index cdc03f28..4091d9ac 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -813,7 +813,8 @@ test33() {
     local bs io_size size
     local off capacity=0;
 
-    prep_write
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
     size=$((2 * zone_size))
@@ -822,20 +823,30 @@ test33() {
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --size=$size --io_size=$io_size --bs=$bs	\
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_written $(((io_size + bs - 1) / bs * bs)) || return $?
+    check_written $((io_size / bs * bs)) || return $?
 }
 
-# Write to sequential zones with a block size that is not a divisor of the
-# zone size and with data verification enabled.
+# Test repeated async write job with verify using two unaligned block sizes.
 test34() {
-    local size
+	local bs off zone_capacity
+	local -a block_sizes
 
-    prep_write
-    size=$((2 * zone_size))
-    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
-		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
-		   >> "${logfile}.${test_number}" 2>&1 && return 1
-    grep -q 'not a divisor of' "${logfile}.${test_number}"
+	require_zbd || return $SKIP_TESTCASE
+	prep_write
+
+	off=$((first_sequential_zone_sector * 512))
+	zone_capacity=$(total_zone_capacity 1 $off $dev)
+	block_sizes=($((4096 * 7)) $(($(min ${zone_capacity} 4194304) - 4096)))
+
+	for bs in ${block_sizes[@]}; do
+		run_fio --name=job --filename="${dev}" --rw=randwrite \
+			--bs="${bs}" --offset="${off}" \
+			--size=$((4 * zone_size)) --iodepth=256 \
+			"$(ioengine "libaio")" --time_based=1 --runtime=15s \
+			--zonemode=zbd --direct=1 --zonesize="${zone_size}" \
+			--verify=crc32c --do_verify=1 ${job_var_opts[@]} \
+			>> "${logfile}.${test_number}" 2>&1 || return $?
+	done
 }
 
 # Test 1/4 for the I/O boundary rounding code: $size < $zone_size.
@@ -1171,7 +1182,6 @@ test54() {
 		--rw=randrw:2 --rwmixwrite=25 --bsrange=4k-${zone_size} \
 		--zonemode=zbd --zonesize=${zone_size} \
 		--verify=crc32c --do_verify=1 --verify_backlog=2 \
-		--experimental_verify=1 \
 		--alloc-size=65536 --random_generator=tausworthe64 \
 		${job_var_opts[@]} --debug=zbd \
 		>> "${logfile}.${test_number}" 2>&1 || return $?
@@ -1269,6 +1279,32 @@ test58() {
 	    >>"${logfile}.${test_number}" 2>&1
 }
 
+# Test zone_reset_threshold with verify.
+test59() {
+	local off bs loops=2 size=$((zone_size)) w
+	local -a workloads=(write randwrite rw randrw)
+
+	prep_write
+	off=$((first_sequential_zone_sector * 512))
+
+	bs=$(min $((256*1024)) "$zone_size")
+	for w in "${workloads[@]}"; do
+		run_fio_on_seq "$(ioengine "psync")" --rw=${w} --bs="$bs" \
+			       --size=$size --loops=$loops --do_verify=1 \
+			       --verify=md5 --zone_reset_frequency=.9 \
+			       --zone_reset_threshold=.1 \
+			       >> "${logfile}.${test_number}" 2>&1 || return $?
+	done
+}
+
+# Test that fio rejects the experimental_verify option with zonemode=zbd.
+test60() {
+	run_fio_on_seq "$(ioengine "psync")" --rw=write --size=$zone_size \
+		       --do_verify=1 --verify=md5 --experimental_verify=1 \
+		       >> "${logfile}.${test_number}" 2>&1 && return 1
+	grep -q 'not support experimental verify' "${logfile}.${test_number}"
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
diff --git a/thread_options.h b/thread_options.h
index 634070af..74e7ea45 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -144,7 +144,7 @@ struct thread_options {
 	unsigned int do_verify;
 	unsigned int verify_interval;
 	unsigned int verify_offset;
-	char verify_pattern[MAX_PATTERN_SIZE];
+	char *verify_pattern;
 	unsigned int verify_pattern_bytes;
 	struct pattern_fmt verify_fmt[8];
 	unsigned int verify_fmt_sz;
@@ -256,7 +256,7 @@ struct thread_options {
 	unsigned int zero_buffers;
 	unsigned int refill_buffers;
 	unsigned int scramble_buffers;
-	char buffer_pattern[MAX_PATTERN_SIZE];
+	char *buffer_pattern;
 	unsigned int buffer_pattern_bytes;
 	unsigned int compress_percentage;
 	unsigned int compress_chunk;
@@ -464,7 +464,6 @@ struct thread_options_pack {
 	uint32_t do_verify;
 	uint32_t verify_interval;
 	uint32_t verify_offset;
-	uint8_t verify_pattern[MAX_PATTERN_SIZE];
 	uint32_t verify_pattern_bytes;
 	uint32_t verify_fatal;
 	uint32_t verify_dump;
@@ -572,7 +571,6 @@ struct thread_options_pack {
 	uint32_t zero_buffers;
 	uint32_t refill_buffers;
 	uint32_t scramble_buffers;
-	uint8_t buffer_pattern[MAX_PATTERN_SIZE];
 	uint32_t buffer_pattern_bytes;
 	uint32_t compress_percentage;
 	uint32_t compress_chunk;
@@ -699,9 +697,16 @@ struct thread_options_pack {
 
 	uint32_t log_entries;
 	uint32_t log_prio;
+
+	/*
+	 * verify_pattern followed by buffer_pattern from the unpacked struct
+	 */
+	uint8_t patterns[];
 } __attribute__((packed));
 
-extern void convert_thread_options_to_cpu(struct thread_options *o, struct thread_options_pack *top);
+extern int convert_thread_options_to_cpu(struct thread_options *o,
+		struct thread_options_pack *top, size_t top_sz);
+extern size_t thread_options_pack_size(struct thread_options *o);
 extern void convert_thread_options_to_net(struct thread_options_pack *top, struct thread_options *);
 extern int fio_test_cconv(struct thread_options *);
 extern void options_default_fill(struct thread_options *o);
diff --git a/verify.c b/verify.c
index d6a229ca..ddfadcc8 100644
--- a/verify.c
+++ b/verify.c
@@ -917,9 +917,11 @@ int verify_io_u(struct thread_data *td, struct io_u **io_u_ptr)
 		hdr = p;
 
 		/*
-		 * Make rand_seed check pass when have verify_backlog.
+		 * Make the rand_seed check pass when verify_backlog is set
+		 * or, with zonemode=zbd, a zone reset frequency is set.
 		 */
-		if (!td_rw(td) || (td->flags & TD_F_VER_BACKLOG))
+		if (!td_rw(td) || (td->flags & TD_F_VER_BACKLOG) ||
+		    td->o.zrf.u.f)
 			io_u->rand_seed = hdr->rand_seed;
 
 		if (td->o.verify != VERIFY_PATTERN_NO_HDR) {
diff --git a/zbd.c b/zbd.c
index 627fb968..d1e469f6 100644
--- a/zbd.c
+++ b/zbd.c
@@ -70,6 +70,19 @@ static inline uint64_t zbd_zone_capacity_end(const struct fio_zone_info *z)
 	return z->start + z->capacity;
 }
 
+/**
+ * zbd_zone_remainder - Return the number of bytes that are still available for
+ *                      writing before the zone gets full
+ * @z: zone info pointer.
+ */
+static inline uint64_t zbd_zone_remainder(struct fio_zone_info *z)
+{
+	if (z->wp >= zbd_zone_capacity_end(z))
+		return 0;
+
+	return zbd_zone_capacity_end(z) - z->wp;
+}
+
 /**
  * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
  * @f: file pointer.
@@ -83,8 +96,7 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 {
 	assert((required & 511) == 0);
 
-	return z->has_wp &&
-		z->wp + required > zbd_zone_capacity_end(z);
+	return z->has_wp && required > zbd_zone_remainder(z);
 }
 
 static void zone_lock(struct thread_data *td, const struct fio_file *f,
@@ -279,7 +291,6 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 
 	z->wp = z->start;
-	z->verify_block = 0;
 
 	td->ts.nr_zone_resets++;
 
@@ -322,6 +333,44 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 	z->open = 0;
 }
 
+/**
+ * zbd_finish_zone - finish the specified zone
+ * @td: FIO thread data.
+ * @f: FIO file for which to finish a zone
+ * @z: Zone to finish.
+ *
+ * Finish zone @z, which is in the open or closed state.
+ */
+static int zbd_finish_zone(struct thread_data *td, struct fio_file *f,
+			   struct fio_zone_info *z)
+{
+	uint64_t offset = z->start;
+	uint64_t length = f->zbd_info->zone_size;
+	int ret = 0;
+
+	switch (f->zbd_info->model) {
+	case ZBD_HOST_AWARE:
+	case ZBD_HOST_MANAGED:
+		if (td->io_ops && td->io_ops->finish_zone)
+			ret = td->io_ops->finish_zone(td, f, offset, length);
+		else
+			ret = blkzoned_finish_zone(td, f, offset, length);
+		break;
+	default:
+		break;
+	}
+
+	if (ret < 0) {
+		td_verror(td, errno, "finish zone failed");
+		log_err("%s: finish zone at sector %"PRIu64" failed (%d).\n",
+			f->file_name, offset >> 9, errno);
+	} else {
+		z->wp = (z+1)->start;
+	}
+
+	return ret;
+}
+
 /**
  * zbd_reset_zones - Reset a range of zones.
  * @td: fio thread data.
@@ -440,7 +489,7 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 		 * already in-flight, handle it as a full zone instead of an
 		 * open zone.
 		 */
-		if (z->wp >= zbd_zone_capacity_end(z))
+		if (!zbd_zone_remainder(z))
 			res = false;
 		goto out;
 	}
@@ -602,7 +651,7 @@ static bool zbd_verify_bs(void)
 {
 	struct thread_data *td;
 	struct fio_file *f;
-	int i, j, k;
+	int i, j;
 
 	for_each_td(td, i) {
 		if (td_trim(td) &&
@@ -624,15 +673,6 @@ static bool zbd_verify_bs(void)
 					 zone_size);
 				return false;
 			}
-			for (k = 0; k < FIO_ARRAY_SIZE(td->o.bs); k++) {
-				if (td->o.verify != VERIFY_NONE &&
-				    zone_size % td->o.bs[k] != 0) {
-					log_info("%s: block size %llu is not a divisor of the zone size %"PRIu64"\n",
-						 f->file_name, td->o.bs[k],
-						 zone_size);
-					return false;
-				}
-			}
 		}
 	}
 	return true;
@@ -1044,6 +1084,11 @@ int zbd_setup_files(struct thread_data *td)
 	if (!zbd_verify_bs())
 		return 1;
 
+	if (td->o.experimental_verify) {
+		log_err("zonemode=zbd does not support experimental verify\n");
+		return 1;
+	}
+
 	for_each_file(td, f, i) {
 		struct zoned_block_device_info *zbd = f->zbd_info;
 		struct fio_zone_info *z;
@@ -1208,6 +1253,7 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_zone_info *zb, *ze;
 	uint64_t swd;
+	bool verify_data_left = false;
 
 	if (!f->zbd_info || !td_write(td))
 		return;
@@ -1224,8 +1270,16 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	 * writing any data to avoid that a zone reset has to be issued while
 	 * writing data, which causes data loss.
 	 */
-	if (td->o.verify != VERIFY_NONE && td->runstate != TD_VERIFYING)
-		zbd_reset_zones(td, f, zb, ze);
+	if (td->o.verify != VERIFY_NONE) {
+		verify_data_left = td->runstate == TD_VERIFYING ||
+			td->io_hist_len || td->verify_batch;
+		if (td->io_hist_len && td->o.verify_backlog)
+			verify_data_left =
+				td->io_hist_len % td->o.verify_backlog;
+		if (!verify_data_left)
+			zbd_reset_zones(td, f, zb, ze);
+	}
+
 	zbd_reset_write_cnt(td, f);
 }
 
@@ -1368,7 +1422,7 @@ found_candidate_zone:
 	/* Both z->mutex and zbdi->mutex are held. */
 
 examine_zone:
-	if (z->wp + min_bs <= zbd_zone_capacity_end(z)) {
+	if (zbd_zone_remainder(z) >= min_bs) {
 		pthread_mutex_unlock(&zbdi->mutex);
 		goto out;
 	}
@@ -1433,7 +1487,7 @@ retry:
 		z = zbd_get_zone(f, zone_idx);
 
 		zone_lock(td, f, z);
-		if (z->wp + min_bs <= zbd_zone_capacity_end(z))
+		if (zbd_zone_remainder(z) >= min_bs)
 			goto out;
 		pthread_mutex_lock(&zbdi->mutex);
 	}
@@ -1476,42 +1530,6 @@ out:
 	return z;
 }
 
-/* The caller must hold z->mutex. */
-static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
-						    struct io_u *io_u,
-						    struct fio_zone_info *z)
-{
-	const struct fio_file *f = io_u->file;
-	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
-
-	if (!zbd_open_zone(td, f, z)) {
-		zone_unlock(z);
-		z = zbd_convert_to_open_zone(td, io_u);
-		assert(z);
-	}
-
-	if (z->verify_block * min_bs >= z->capacity) {
-		log_err("%s: %d * %"PRIu64" >= %"PRIu64"\n",
-			f->file_name, z->verify_block, min_bs, z->capacity);
-		/*
-		 * If the assertion below fails during a test run, adding
-		 * "--experimental_verify=1" to the command line may help.
-		 */
-		assert(false);
-	}
-
-	io_u->offset = z->start + z->verify_block * min_bs;
-	if (io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
-		log_err("%s: %llu + %llu >= %"PRIu64"\n",
-			f->file_name, io_u->offset, io_u->buflen,
-			zbd_zone_capacity_end(z));
-		assert(false);
-	}
-	z->verify_block += io_u->buflen / min_bs;
-
-	return z;
-}
-
 /*
  * Find another zone which has @min_bytes of readable data. Search in zones
  * @zb + 1 .. @zl. For random workload, also search in zones @zb - 1 .. @zf.
@@ -1862,10 +1880,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
-		if (td->runstate == TD_VERIFYING && td_write(td)) {
-			zb = zbd_replay_write_order(td, io_u, zb);
+		if (td->runstate == TD_VERIFYING && td_write(td))
 			goto accept;
-		}
 
 		/*
 		 * Check that there is enough written data in the zone to do an
@@ -1941,6 +1957,33 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			goto eof;
 		}
 
+retry:
+		if (zbd_zone_remainder(zb) > 0 &&
+		    zbd_zone_remainder(zb) < min_bs) {
+			pthread_mutex_lock(&f->zbd_info->mutex);
+			zbd_close_zone(td, f, zb);
+			pthread_mutex_unlock(&f->zbd_info->mutex);
+			dprint(FD_ZBD,
+			       "%s: finish zone %d\n",
+			       f->file_name, zbd_zone_idx(f, zb));
+			io_u_quiesce(td);
+			zbd_finish_zone(td, f, zb);
+			if (zbd_zone_idx(f, zb) + 1 >= f->max_zone) {
+				if (!td_random(td))
+					goto eof;
+			}
+			zone_unlock(zb);
+
+			/* Find the next write pointer zone */
+			do {
+				zb++;
+				if (zbd_zone_idx(f, zb) >= f->max_zone)
+					zb = zbd_get_zone(f, f->min_zone);
+			} while (!zb->has_wp);
+
+			zone_lock(td, f, zb);
+		}
+
 		if (!zbd_open_zone(td, f, zb)) {
 			zone_unlock(zb);
 			zb = zbd_convert_to_open_zone(td, io_u);
@@ -1951,6 +1994,10 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			}
 		}
 
+		if (zbd_zone_remainder(zb) > 0 &&
+		    zbd_zone_remainder(zb) < min_bs)
+			goto retry;
+
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
 			if (zbdi->wp_sectors_with_data >= f->io_size * td->o.zrt.u.f &&
@@ -1960,7 +2007,19 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 		/* Reset the zone pointer if necessary */
 		if (zb->reset_zone || zbd_zone_full(f, zb, min_bs)) {
-			assert(td->o.verify == VERIFY_NONE);
+			if (td->o.verify != VERIFY_NONE) {
+				/*
+				 * Unset io_u->file to tell get_next_verify()
+				 * that this I/O is not a requeue.
+				 */
+				io_u->file = NULL;
+				if (!get_next_verify(td, io_u)) {
+					zone_unlock(zb);
+					return io_u_accept;
+				}
+				io_u->file = f;
+			}
+
 			/*
 			 * Since previous write requests may have been submitted
 			 * asynchronously and since we will submit the zone
diff --git a/zbd.h b/zbd.h
index 0a73b41d..d425707e 100644
--- a/zbd.h
+++ b/zbd.h
@@ -25,7 +25,6 @@ enum io_u_action {
  * @start: zone start location (bytes)
  * @wp: zone write pointer location (bytes)
  * @capacity: maximum size usable from the start of a zone (bytes)
- * @verify_block: number of blocks that have been verified for this zone
  * @mutex: protects the modifiable members in this structure
  * @type: zone type (BLK_ZONE_TYPE_*)
  * @cond: zone state (BLK_ZONE_COND_*)
@@ -39,7 +38,6 @@ struct fio_zone_info {
 	uint64_t		start;
 	uint64_t		wp;
 	uint64_t		capacity;
-	uint32_t		verify_block;
 	enum zbd_zone_type	type:2;
 	enum zbd_zone_cond	cond:4;
 	unsigned int		has_wp:1;

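The zbd.c changes above route every "does this zone still have room" check through `zbd_zone_remainder()`. The following is a minimal sketch of that logic. It assumes a simplified zone struct that holds only `start`, `wp` and `capacity` in bytes. The names `struct zone`, `zone_remainder()` and `zone_needs_finish()` are hypothetical and do not come from fio. The remainder saturates at zero once the write pointer reaches the capacity end. A nonzero remainder that is smaller than the minimum block size is the case in which `zbd_adjust_block()` now finishes the zone, instead of attempting a write that cannot fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fio_zone_info; all values in bytes. */
struct zone {
	uint64_t start;
	uint64_t wp;
	uint64_t capacity;
};

static uint64_t zone_capacity_end(const struct zone *z)
{
	return z->start + z->capacity;
}

/* Bytes still writable before the zone is full; saturates at zero. */
static uint64_t zone_remainder(const struct zone *z)
{
	if (z->wp >= zone_capacity_end(z))
		return 0;
	return zone_capacity_end(z) - z->wp;
}

/*
 * True when some space is left but not enough for one minimum-size block:
 * the condition under which the diff finishes the zone and moves on.
 */
static bool zone_needs_finish(const struct zone *z, uint64_t min_bs)
{
	uint64_t rem = zone_remainder(z);

	assert(min_bs > 0);
	return rem > 0 && rem < min_bs;
}
```

Take a 1 MiB zone and a 28 KiB (7 x 4096) block size. A write pointer 4 KiB short of the capacity end leaves a remainder of 4096, so the zone would be finished. Block sizes that do not divide the zone capacity, like those exercised by test34, therefore no longer strand a partial tail.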
^ permalink raw reply related	[flat|nested] 1305+ messages in thread
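The server.c and thread_options.h changes in the message above drop the fixed `MAX_PATTERN_SIZE` arrays in favor of a trailing flexible array member (`uint8_t patterns[]`). They size the ADD_JOB PDU as `offsetof(struct cmd_add_job_pdu, top) + thread_options_pack_size(&td->o)`. The sketch below illustrates that layout under simplifying assumptions:

- `struct pattern_pack` and its helpers are hypothetical.
- Byte-order conversion (`cpu_to_le32()` in fio) is omitted.
- The receiver checks the embedded lengths against the number of bytes actually received. Bounds checking of this kind is presumably what the new `top_sz` argument to `convert_thread_options_to_cpu()` is for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pack: fixed fields first, then both patterns back to back. */
struct pattern_pack {
	uint32_t verify_pattern_bytes;
	uint32_t buffer_pattern_bytes;
	uint8_t patterns[];	/* verify pattern, then buffer pattern */
};

/* Total bytes needed to carry both patterns. */
static size_t pattern_pack_size(size_t vlen, size_t blen)
{
	return offsetof(struct pattern_pack, patterns) + vlen + blen;
}

/* Allocate and fill a pack; stores its size in *sz. Caller frees. */
static struct pattern_pack *pattern_pack_alloc(const void *vp, size_t vlen,
					       const void *bp, size_t blen,
					       size_t *sz)
{
	struct pattern_pack *p;

	*sz = pattern_pack_size(vlen, blen);
	p = malloc(*sz);
	if (!p)
		return NULL;
	p->verify_pattern_bytes = (uint32_t)vlen;
	p->buffer_pattern_bytes = (uint32_t)blen;
	memcpy(p->patterns, vp, vlen);
	memcpy(p->patterns + vlen, bp, blen);
	return p;
}

/* Reject a received pack whose length fields exceed the received size. */
static int pattern_pack_check(const struct pattern_pack *p, size_t sz)
{
	size_t hdr = offsetof(struct pattern_pack, patterns);

	if (sz < hdr)
		return -1;
	if ((size_t)p->verify_pattern_bytes + p->buffer_pattern_bytes > sz - hdr)
		return -1;
	return 0;
}
```

Using `offsetof()` rather than `sizeof()` for the header keeps the size calculation exact even if the compiler adds trailing padding before the flexible array member.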

* Recent changes (master)
@ 2022-11-15 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-11-15 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2d92b09513b3c11a04541298aece35eae3dbc963:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-11-07 16:20:04 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 07c8fe21021681f86fbfd3c3d63b88a5ebd4e557:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-11-14 08:47:00 -0500)

----------------------------------------------------------------
Bart Van Assche (3):
      configure: Fix clock_gettime() detection
      configure: Fix the struct nvme_uring_cmd detection
      os/os.h: Improve cpus_configured()

Vincent Fu (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 configure | 10 ++++++----
 os/os.h   |  4 +++-
 2 files changed, 9 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 30bf5acb..1b12d268 100755
--- a/configure
+++ b/configure
@@ -1172,7 +1172,9 @@ cat > $TMPC << EOF
 #include <time.h>
 int main(int argc, char **argv)
 {
-  return clock_gettime(0, NULL);
+  struct timespec ts;
+
+  return clock_gettime(0, &ts);
 }
 EOF
 if compile_prog "" "" "clock_gettime"; then
@@ -1194,7 +1196,9 @@ if test "$clock_gettime" = "yes" ; then
 #include <time.h>
 int main(int argc, char **argv)
 {
-  return clock_gettime(CLOCK_MONOTONIC, NULL);
+  struct timespec ts;
+
+  return clock_gettime(CLOCK_MONOTONIC, &ts);
 }
 EOF
   if compile_prog "" "$LIBS" "clock monotonic"; then
@@ -2634,8 +2638,6 @@ cat > $TMPC << EOF
 #include <linux/nvme_ioctl.h>
 int main(void)
 {
-  struct nvme_uring_cmd *cmd;
-
   return sizeof(struct nvme_uring_cmd);
 }
 EOF
diff --git a/os/os.h b/os/os.h
index a6fde1fd..c428260c 100644
--- a/os/os.h
+++ b/os/os.h
@@ -355,7 +355,9 @@ static inline unsigned long long get_fs_free_size(const char *path)
 #ifndef FIO_HAVE_CPU_CONF_SYSCONF
 static inline unsigned int cpus_configured(void)
 {
-	return sysconf(_SC_NPROCESSORS_CONF);
+	int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+
+	return nr_cpus >= 1 ? nr_cpus : 1;
 }
 #endif
 

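`sysconf()` returns a `long` and reports -1 on failure. The old `cpus_configured()` returned that value unchanged as an `unsigned int`, so a -1 would have become `UINT_MAX`. The change clamps the result to at least 1. The sketch below shows the same idea. The hypothetical helper `clamp_cpu_count()` is split out so that the clamping can be checked without depending on the host.

```c
#include <assert.h>
#include <unistd.h>

/* Map an error (-1) or a nonsensical 0 to a single CPU. */
static unsigned int clamp_cpu_count(long raw)
{
	return raw >= 1 ? (unsigned int)raw : 1;
}

static unsigned int cpus_configured_sketch(void)
{
	return clamp_cpu_count(sysconf(_SC_NPROCESSORS_CONF));
}
```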

* Recent changes (master)
@ 2022-11-08 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-11-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 72bcaffd7d56d4c2ebad6d0a1e465e0e9db8be40:

  Fio 3.33 (2022-11-06 13:55:41 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2d92b09513b3c11a04541298aece35eae3dbc963:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-11-07 16:20:04 -0700)

----------------------------------------------------------------
Bart Van Assche (2):
      Windows: Fix the build
      Android: Enable zoned block device support

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 configure                         |  2 +-
 os/windows/dlls.c                 | 16 +++++++++++-----
 os/windows/posix/include/syslog.h |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 24c599a8..30bf5acb 100755
--- a/configure
+++ b/configure
@@ -2561,7 +2561,7 @@ if compile_prog "" "" "valgrind_dev"; then
 fi
 print_config "Valgrind headers" "$valgrind_dev"
 
-if test "$targetos" = "Linux" ; then
+if test "$targetos" = "Linux" || test "$targetos" = "Android"; then
 ##########################################
 # <linux/blkzoned.h> probe
 if test "$linux_blkzoned" != "yes" ; then
diff --git a/os/windows/dlls.c b/os/windows/dlls.c
index 774b1c61..ffedfa1e 100644
--- a/os/windows/dlls.c
+++ b/os/windows/dlls.c
@@ -11,12 +11,18 @@ void os_clk_tck(long *clk_tck)
 	 */
 	unsigned long minRes, maxRes, curRes;
 	HMODULE lib;
-	FARPROC queryTimer;
-	FARPROC setTimer;
+	NTSTATUS NTAPI (*queryTimer)
+		(OUT PULONG              MinimumResolution,
+		 OUT PULONG              MaximumResolution,
+		 OUT PULONG              CurrentResolution);
+	NTSTATUS NTAPI (*setTimer)
+		(IN ULONG                DesiredResolution,
+		 IN BOOLEAN              SetResolution,
+		 OUT PULONG              CurrentResolution);
 
 	if (!(lib = LoadLibrary(TEXT("ntdll.dll"))) ||
-		!(queryTimer = GetProcAddress(lib, "NtQueryTimerResolution")) ||
-		!(setTimer = GetProcAddress(lib, "NtSetTimerResolution"))) {
+		!(queryTimer = (void *)GetProcAddress(lib, "NtQueryTimerResolution")) ||
+		!(setTimer = (void *)GetProcAddress(lib, "NtSetTimerResolution"))) {
 		dprint(FD_HELPERTHREAD, 
 			"Failed to load ntdll library, set to lower bound 64 Hz\n");
 		*clk_tck = 64;
@@ -30,4 +36,4 @@ void os_clk_tck(long *clk_tck)
 		setTimer(maxRes, 1, &curRes);
 		*clk_tck = (long) (10000000L / maxRes);
 	}
-}
\ No newline at end of file
+}
diff --git a/os/windows/posix/include/syslog.h b/os/windows/posix/include/syslog.h
index b8582e95..03a04f69 100644
--- a/os/windows/posix/include/syslog.h
+++ b/os/windows/posix/include/syslog.h
@@ -1,7 +1,7 @@
 #ifndef SYSLOG_H
 #define SYSLOG_H
 
-int syslog();
+int syslog(int priority, const char *format, ...);
 
 #define LOG_INFO	0x1
 #define LOG_ERROR	0x2

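The dlls.c change declares `queryTimer` and `setTimer` with their full prototypes instead of the generic `FARPROC`, so calls made through them are type-checked. The `syslog()` declaration gains a prototype for the same reason. The Windows calls cannot run portably, so this sketch makes the following substitutions:

- `resolve_symbol()` is a hypothetical stand-in for `GetProcAddress()`.
- `fake_query_resolution()` is a hypothetical stand-in for `NtQueryTimerResolution()`, and its values are illustrative.

Only the 64 Hz fallback and the `10000000L / max_res` conversion follow the diff.

```c
#include <assert.h>
#include <string.h>

/* Generic function-pointer type, playing the role of FARPROC. */
typedef void (*generic_fn)(void);

/* Exact prototype of the function we expect to resolve. */
typedef long (*query_res_fn)(unsigned long *min_res, unsigned long *max_res,
			     unsigned long *cur_res);

/* Fake resolution query with illustrative values. */
static long fake_query_resolution(unsigned long *min_res,
				  unsigned long *max_res,
				  unsigned long *cur_res)
{
	*min_res = 156250;
	*max_res = 5000;
	*cur_res = 156250;
	return 0;
}

/* Hypothetical lookup standing in for GetProcAddress(). */
static generic_fn resolve_symbol(const char *name)
{
	if (strcmp(name, "QueryResolution") == 0)
		return (generic_fn)fake_query_resolution;
	return NULL;
}

/* Resolve, convert to the exact type, then call with checked arguments. */
static long clk_tck_from_symbol(void)
{
	query_res_fn query = (query_res_fn)resolve_symbol("QueryResolution");
	unsigned long min_res, max_res, cur_res;

	if (!query || query(&min_res, &max_res, &cur_res) != 0)
		return 64;	/* lower-bound fallback, as in os_clk_tck() */
	return (long)(10000000L / max_res);
}
```

Converting to the exact function type before the call is the point of the change. A mismatched argument list then becomes a compile-time diagnostic instead of undefined behavior at run time.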

* Recent changes (master)
@ 2022-11-07 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-11-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 02ee8a1ba7ea798f03fb029f589382b6f799be24:

  test: use homebrew to install sphinx instead of pip on macOS (2022-11-04 13:50:31 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 72bcaffd7d56d4c2ebad6d0a1e465e0e9db8be40:

  Fio 3.33 (2022-11-06 13:55:41 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.33

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index db073818..5a0822c9 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.32
+DEF_VER=fio-3.33
 
 LF='
 '


* Recent changes (master)
@ 2022-11-05 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-11-05 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7fc3a553beadd15cac09b1514547c4d382d292d9:

  HOWTO: clean up exit_what description (2022-11-02 10:26:36 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 02ee8a1ba7ea798f03fb029f589382b6f799be24:

  test: use homebrew to install sphinx instead of pip on macOS (2022-11-04 13:50:31 -0400)

----------------------------------------------------------------
Ankit Kumar (1):
      io_uring: update documentation and small fix for sqthread_poll

Vincent Fu (2):
      test: change GitHub Actions macOS platform to macOS 12
      test: use homebrew to install sphinx instead of pip on macOS

 .github/workflows/ci.yml | 2 +-
 HOWTO.rst                | 6 +++---
 ci/actions-install.sh    | 5 +++--
 engines/io_uring.c       | 2 +-
 fio.1                    | 6 +++---
 5 files changed, 11 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 1b8c0701..4bc91d3e 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -24,7 +24,7 @@ jobs:
           os: ubuntu-22.04
           cc: clang
         - build: macos
-          os: macos-11
+          os: macos-12
         - build: linux-i686-gcc
           os: ubuntu-22.04
           arch: i686
diff --git a/HOWTO.rst b/HOWTO.rst
index 0fb5593e..e796f961 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2274,7 +2274,7 @@ with the caveat that when used on the command line, they must come after the
 	map and release for each IO. This is more efficient, and reduces the
 	IO latency as well.
 
-.. option:: nonvectored : [io_uring] [io_uring_cmd]
+.. option:: nonvectored=int : [io_uring] [io_uring_cmd]
 
 	With this option, fio will use non-vectored read/write commands, where
 	address must contain the address directly. Default is -1.
@@ -2301,7 +2301,7 @@ with the caveat that when used on the command line, they must come after the
 	This frees up cycles for fio, at the cost of using more CPU in the
 	system.
 
-.. option:: sqthread_poll_cpu : [io_uring] [io_uring_cmd]
+.. option:: sqthread_poll_cpu=int : [io_uring] [io_uring_cmd]
 
 	When :option:`sqthread_poll` is set, this option provides a way to
 	define which CPU should be used for the polling thread.
@@ -2351,7 +2351,7 @@ with the caveat that when used on the command line, they must come after the
 	When hipri is set this determines the probability of a pvsync2 I/O being high
 	priority. The default is 100%.
 
-.. option:: nowait : [pvsync2] [libaio] [io_uring]
+.. option:: nowait=bool : [pvsync2] [libaio] [io_uring] [io_uring_cmd]
 
 	By default if a request cannot be executed immediately (e.g. resource starvation,
 	waiting on locks) it is queued and the initiating process will be blocked until
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 82e14d2a..c16dff16 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -84,8 +84,9 @@ install_macos() {
     #echo "Updating homebrew..."
     #brew update >/dev/null 2>&1
     echo "Installing packages..."
-    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs
-    pip3 install scipy six sphinx
+    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs sphinx-doc
+    brew link sphinx-doc --force
+    pip3 install scipy six 
 }
 
 main() {
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 6906e0a4..3c656b77 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -226,7 +226,7 @@ static struct fio_option options[] = {
 	{
 		.name	= "sqthread_poll",
 		.lname	= "Kernel SQ thread polling",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_STR_SET,
 		.off1	= offsetof(struct ioring_options, sqpoll_thread),
 		.help	= "Offload submission/completion to kernel thread",
 		.category = FIO_OPT_C_ENGINE,
diff --git a/fio.1 b/fio.1
index 4324a975..9e33c9e1 100644
--- a/fio.1
+++ b/fio.1
@@ -2063,7 +2063,7 @@ release them when IO is done. If this option is set, the pages are pre-mapped
 before IO is started. This eliminates the need to map and release for each IO.
 This is more efficient, and reduces the IO latency as well.
 .TP
-.BI (io_uring,io_uring_cmd)nonvectored
+.BI (io_uring,io_uring_cmd)nonvectored \fR=\fPint
 With this option, fio will use non-vectored read/write commands, where address
 must contain the address directly. Default is -1.
 .TP
@@ -2092,7 +2092,7 @@ available items in the SQ ring. If this option is set, the act of submitting IO
 will be done by a polling thread in the kernel. This frees up cycles for fio, at
 the cost of using more CPU in the system.
 .TP
-.BI (io_uring,io_uring_cmd)sqthread_poll_cpu
+.BI (io_uring,io_uring_cmd)sqthread_poll_cpu \fR=\fPint
 When `sqthread_poll` is set, this option provides a way to define which CPU
 should be used for the polling thread.
 .TP
@@ -2115,7 +2115,7 @@ than normal.
 When hipri is set this determines the probability of a pvsync2 I/O being high
 priority. The default is 100%.
 .TP
-.BI (pvsync2,libaio,io_uring)nowait
+.BI (pvsync2,libaio,io_uring,io_uring_cmd)nowait \fR=\fPbool
 By default if a request cannot be executed immediately (e.g. resource starvation,
 waiting on locks) it is queued and the initiating process will be blocked until
 the required resource becomes free.


* Recent changes (master)
@ 2022-11-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-11-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 73f168ea2c9a66145559c2217fc5a70c992cb80e:

  HOWTO: update description for flow option (2022-11-01 17:24:34 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7fc3a553beadd15cac09b1514547c4d382d292d9:

  HOWTO: clean up exit_what description (2022-11-02 10:26:36 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      HOWTO: clean up exit_what description

 HOWTO.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 53ae8c17..0fb5593e 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3356,10 +3356,10 @@ Threads, processes and job synchronization
 	make fio terminate all jobs in the same group, as soon as one job of that
 	group finishes.
 
-.. option:: exit_what
+.. option:: exit_what=str
 
 	By default, fio will continue running all other jobs when one job finishes.
-	Sometimes this is not the desired action. Setting ``exit_all`` will
+	Sometimes this is not the desired action. Setting ``exitall`` will
 	instead make fio terminate all jobs in the same group. The option
         ``exit_what`` allows to control which jobs get terminated when ``exitall`` is
         enabled. The default is ``group`` and does not change the behaviour of


* Recent changes (master)
@ 2022-11-02 12:00 Jens Axboe
From: Jens Axboe @ 2022-11-02 12:00 UTC
  To: fio

The following changes since commit c4704c081a54160621227b42238f6e439c28fba3:

  test: add test for experimental verify with loops and time_based options (2022-10-24 10:34:57 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 73f168ea2c9a66145559c2217fc5a70c992cb80e:

  HOWTO: update description for flow option (2022-11-01 17:24:34 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      HOWTO: update description for flow option

 HOWTO.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index e89d05f0..53ae8c17 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3329,13 +3329,13 @@ Threads, processes and job synchronization
 
 .. option:: flow=int
 
-	Weight in token-based flow control. If this value is used, then there is a
-	'flow counter' which is used to regulate the proportion of activity between
-	two or more jobs. Fio attempts to keep this flow counter near zero. The
-	``flow`` parameter stands for how much should be added or subtracted to the
-	flow counter on each iteration of the main I/O loop. That is, if one job has
-	``flow=8`` and another job has ``flow=-1``, then there will be a roughly 1:8
-	ratio in how much one runs vs the other.
+        Weight in token-based flow control. If this value is used, then fio
+        regulates the activity between two or more jobs sharing the same
+        flow_id. Fio attempts to keep each job activity proportional to other
+        jobs' activities in the same flow_id group, with respect to requested
+        weight per job. That is, if one job has `flow=3', another job has
+        `flow=2' and another with `flow=1`, then there will be a roughly 3:2:1
+        ratio in how much one runs vs the others.
 
 .. option:: flow_sleep=int
 

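The reworded ``flow`` description promises a roughly 3:2:1 split for weights 3, 2 and 1. The arithmetic behind that ratio can be checked with a small simulation. This is a hypothetical weighted-share scheduler written for illustration, not fio's token-based flow-control code: at each step it runs the job with the smallest issued/weight ratio. The function names are made up, and only positive weights are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only, NOT fio's flow implementation: pick the job that is
 * furthest behind its share, i.e. has the smallest issued/weight ratio.
 * Ties go to the lower job index.
 */
static size_t pick_next_job(const uint64_t *issued, const uint64_t *weight,
			    size_t njobs)
{
	size_t best = 0;

	for (size_t i = 1; i < njobs; i++) {
		/* issued[i]/weight[i] < issued[best]/weight[best], without division */
		if (issued[i] * weight[best] < issued[best] * weight[i])
			best = i;
	}
	return best;
}

/* Run 'steps' scheduling rounds and record how often each job ran. */
static void simulate_flow(uint64_t *issued, const uint64_t *weight,
			  size_t njobs, unsigned int steps)
{
	for (size_t i = 0; i < njobs; i++) {
		assert(weight[i] > 0);	/* sketch supports positive weights only */
		issued[i] = 0;
	}

	for (unsigned int s = 0; s < steps; s++)
		issued[pick_next_job(issued, weight, njobs)]++;
}
```

Calling ``simulate_flow(n, (uint64_t[]){3, 2, 1}, 3, 600)`` leaves 300, 200 and 100 in ``n``: every six steps each job advances by exactly its weight, which is the 3:2:1 ratio the HOWTO text describes.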

* Recent changes (master)
@ 2022-10-25 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-25 12:00 UTC
  To: fio

The following changes since commit d72244761b2230fbb2d6eaec59cdedd3ea651d4f:

  stat: fix segfault with fio option --bandwidth-log (2022-10-21 13:23:41 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c4704c081a54160621227b42238f6e439c28fba3:

  test: add test for experimental verify with loops and time_based options (2022-10-24 10:34:57 -0400)

----------------------------------------------------------------
Shin'ichiro Kawasaki (4):
      verify: fix bytes_done accounting of experimental verify
      verify: fix numberio accounting of experimental verify
      test: add test for verify read back of experimental verify
      test: add test for experimental verify with loops and time_based options

Vincent Fu (1):
      Merge branch 'fix-cpus_allowed' of https://github.com/roxma/fio

mayuanpeng (1):
      cpus_allowed: use __NRPROCESSORS_CONF instead of __SC_NPROCESSORS_ONLN for non-sequential CPU ids

 backend.c                 |  8 ++++++--
 fio.h                     |  2 ++
 gettime.c                 |  2 +-
 idletime.c                |  2 +-
 io_u.c                    | 23 +++++++++++++++++------
 libfio.c                  |  1 +
 options.c                 |  8 ++++----
 os/os-hpux.h              |  4 ++--
 os/os-linux.h             |  8 --------
 os/os-solaris.h           |  2 +-
 os/os-windows.h           |  5 +----
 os/os.h                   |  8 ++++----
 os/windows/cpu-affinity.c |  6 ------
 os/windows/posix.c        | 16 ++++++++++++----
 rate-submit.c             |  2 ++
 server.c                  |  2 +-
 t/dedupe.c                |  2 +-
 t/jobs/t0025.fio          |  7 +++++++
 t/jobs/t0026.fio          | 19 +++++++++++++++++++
 t/run-fio-tests.py        | 31 +++++++++++++++++++++++++++++++
 verify.c                  |  2 --
 21 files changed, 113 insertions(+), 47 deletions(-)
 create mode 100644 t/jobs/t0025.fio
 create mode 100644 t/jobs/t0026.fio

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d8f4f2a5..ba954a6b 100644
--- a/backend.c
+++ b/backend.c
@@ -682,7 +682,7 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 				break;
 			}
 		} else {
-			if (ddir_rw_sum(td->bytes_done) + td->o.rw_min_bs > verify_bytes)
+			if (td->bytes_verified + td->o.rw_min_bs > verify_bytes)
 				break;
 
 			while ((io_u = get_io_u(td)) != NULL) {
@@ -711,6 +711,8 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 					break;
 				} else if (io_u->ddir == DDIR_WRITE) {
 					io_u->ddir = DDIR_READ;
+					io_u->numberio = td->verify_read_issues;
+					td->verify_read_issues++;
 					populate_verify_io_u(td, io_u);
 					break;
 				} else {
@@ -1030,8 +1032,10 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 			break;
 		}
 
-		if (io_u->ddir == DDIR_WRITE && td->flags & TD_F_DO_VERIFY)
+		if (io_u->ddir == DDIR_WRITE && td->flags & TD_F_DO_VERIFY) {
+			io_u->numberio = td->io_issues[io_u->ddir];
 			populate_verify_io_u(td, io_u);
+		}
 
 		ddir = io_u->ddir;
 
diff --git a/fio.h b/fio.h
index de7eca79..8da77640 100644
--- a/fio.h
+++ b/fio.h
@@ -356,6 +356,7 @@ struct thread_data {
 	 * Issue side
 	 */
 	uint64_t io_issues[DDIR_RWDIR_CNT];
+	uint64_t verify_read_issues;
 	uint64_t io_issue_bytes[DDIR_RWDIR_CNT];
 	uint64_t loops;
 
@@ -370,6 +371,7 @@ struct thread_data {
 	uint64_t zone_bytes;
 	struct fio_sem *sem;
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
+	uint64_t bytes_verified;
 
 	uint64_t *thinktime_blocks_counter;
 	struct timespec last_thinktime;
diff --git a/gettime.c b/gettime.c
index 8993be16..bc66a3ac 100644
--- a/gettime.c
+++ b/gettime.c
@@ -671,7 +671,7 @@ static int clock_cmp(const void *p1, const void *p2)
 int fio_monotonic_clocktest(int debug)
 {
 	struct clock_thread *cthreads;
-	unsigned int seen_cpus, nr_cpus = cpus_online();
+	unsigned int seen_cpus, nr_cpus = cpus_configured();
 	struct clock_entry *entries;
 	unsigned long nr_entries, tentries, failed = 0;
 	struct clock_entry *prev, *this;
diff --git a/idletime.c b/idletime.c
index fc1df8e9..90ed77ea 100644
--- a/idletime.c
+++ b/idletime.c
@@ -189,7 +189,7 @@ void fio_idle_prof_init(void)
 	pthread_condattr_t cattr;
 	struct idle_prof_thread *ipt;
 
-	ipc.nr_cpus = cpus_online();
+	ipc.nr_cpus = cpus_configured();
 	ipc.status = IDLE_PROF_STATUS_OK;
 
 	if (ipc.opt == IDLE_PROF_OPT_NONE)
diff --git a/io_u.c b/io_u.c
index 91f1a358..8035f4b7 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2121,13 +2121,26 @@ static void ios_completed(struct thread_data *td,
 	}
 }
 
+static void io_u_update_bytes_done(struct thread_data *td,
+				   struct io_completion_data *icd)
+{
+	int ddir;
+
+	if (td->runstate == TD_VERIFYING) {
+		td->bytes_verified += icd->bytes_done[DDIR_READ];
+		return;
+	}
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
+		td->bytes_done[ddir] += icd->bytes_done[ddir];
+}
+
 /*
  * Complete a single io_u for the sync engines.
  */
 int io_u_sync_complete(struct thread_data *td, struct io_u *io_u)
 {
 	struct io_completion_data icd;
-	int ddir;
 
 	init_icd(td, &icd, 1);
 	io_completed(td, &io_u, &icd);
@@ -2140,8 +2153,7 @@ int io_u_sync_complete(struct thread_data *td, struct io_u *io_u)
 		return -1;
 	}
 
-	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
-		td->bytes_done[ddir] += icd.bytes_done[ddir];
+	io_u_update_bytes_done(td, &icd);
 
 	return 0;
 }
@@ -2153,7 +2165,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
 {
 	struct io_completion_data icd;
 	struct timespec *tvp = NULL;
-	int ret, ddir;
+	int ret;
 	struct timespec ts = { .tv_sec = 0, .tv_nsec = 0, };
 
 	dprint(FD_IO, "io_u_queued_complete: min=%d\n", min_evts);
@@ -2179,8 +2191,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
 		return -1;
 	}
 
-	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
-		td->bytes_done[ddir] += icd.bytes_done[ddir];
+	io_u_update_bytes_done(td, &icd);
 
 	return ret;
 }
diff --git a/libfio.c b/libfio.c
index 1a891776..ac521974 100644
--- a/libfio.c
+++ b/libfio.c
@@ -94,6 +94,7 @@ static void reset_io_counters(struct thread_data *td, int all)
 			td->rate_next_io_time[ddir] = 0;
 			td->last_usec[ddir] = 0;
 		}
+		td->bytes_verified = 0;
 	}
 
 	td->zone_bytes = 0;
diff --git a/options.c b/options.c
index a668b0e4..9e4d8cd1 100644
--- a/options.c
+++ b/options.c
@@ -627,7 +627,7 @@ static int str_exitall_cb(void)
 int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu_index)
 {
 	unsigned int i, index, cpus_in_mask;
-	const long max_cpu = cpus_online();
+	const long max_cpu = cpus_configured();
 
 	cpus_in_mask = fio_cpu_count(mask);
 	if (!cpus_in_mask)
@@ -666,7 +666,7 @@ static int str_cpumask_cb(void *data, unsigned long long *val)
 		return 1;
 	}
 
-	max_cpu = cpus_online();
+	max_cpu = cpus_configured();
 
 	for (i = 0; i < sizeof(int) * 8; i++) {
 		if ((1 << i) & *val) {
@@ -702,7 +702,7 @@ static int set_cpus_allowed(struct thread_data *td, os_cpu_mask_t *mask,
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	max_cpu = cpus_online();
+	max_cpu = cpus_configured();
 
 	while ((cpu = strsep(&str, ",")) != NULL) {
 		char *str2, *cpu2;
@@ -5305,7 +5305,7 @@ void fio_keywords_init(void)
 	sprintf(buf, "%llu", mb_memory);
 	fio_keywords[1].replace = strdup(buf);
 
-	l = cpus_online();
+	l = cpus_configured();
 	sprintf(buf, "%lu", l);
 	fio_keywords[2].replace = strdup(buf);
 }
diff --git a/os/os-hpux.h b/os/os-hpux.h
index a80cb2bc..9f3d76f5 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -88,9 +88,9 @@ static inline unsigned long long os_phys_mem(void)
 	return ret;
 }
 
-#define FIO_HAVE_CPU_ONLINE_SYSCONF
+#define FIO_HAVE_CPU_CONF_SYSCONF
 
-static inline unsigned int cpus_online(void)
+static inline unsigned int cpus_configured(void)
 {
 	return mpctl(MPC_GETNUMSPUS, 0, NULL);
 }
diff --git a/os/os-linux.h b/os/os-linux.h
index 831f0ad0..bbb1f27c 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -251,14 +251,6 @@ static inline int arch_cache_line_size(void)
 		return atoi(size);
 }
 
-#ifdef __powerpc64__
-#define FIO_HAVE_CPU_ONLINE_SYSCONF
-static inline unsigned int cpus_online(void)
-{
-        return sysconf(_SC_NPROCESSORS_CONF);
-}
-#endif
-
 static inline unsigned long long get_fs_free_size(const char *path)
 {
 	unsigned long long ret;
diff --git a/os/os-solaris.h b/os/os-solaris.h
index ea1f081c..60d4c1ec 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -119,7 +119,7 @@ static inline int fio_set_odirect(struct fio_file *f)
 
 static inline bool fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
 {
-	const unsigned int max_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+	const unsigned int max_cpus = sysconf(_SC_NPROCESSORS_CONF);
 	unsigned int num_cpus;
 	processorid_t *cpus;
 	bool ret;
diff --git a/os/os-windows.h b/os/os-windows.h
index 510b8143..12f33486 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -44,7 +44,7 @@
 #define fio_swap64(x)	_byteswap_uint64(x)
 
 #define _SC_PAGESIZE			0x1
-#define _SC_NPROCESSORS_ONLN	0x2
+#define _SC_NPROCESSORS_CONF	0x2
 #define _SC_PHYS_PAGES			0x4
 
 #define SA_RESTART	0
@@ -219,9 +219,6 @@ static inline int fio_mkdir(const char *path, mode_t mode) {
 	return 0;
 }
 
-#define FIO_HAVE_CPU_ONLINE_SYSCONF
-unsigned int cpus_online(void);
-
 int first_set_cpu(os_cpu_mask_t *cpumask);
 int fio_setaffinity(int pid, os_cpu_mask_t cpumask);
 int fio_cpuset_init(os_cpu_mask_t *mask);
diff --git a/os/os.h b/os/os.h
index aba6813f..a6fde1fd 100644
--- a/os/os.h
+++ b/os/os.h
@@ -352,10 +352,10 @@ static inline unsigned long long get_fs_free_size(const char *path)
 }
 #endif
 
-#ifndef FIO_HAVE_CPU_ONLINE_SYSCONF
-static inline unsigned int cpus_online(void)
+#ifndef FIO_HAVE_CPU_CONF_SYSCONF
+static inline unsigned int cpus_configured(void)
 {
-	return sysconf(_SC_NPROCESSORS_ONLN);
+	return sysconf(_SC_NPROCESSORS_CONF);
 }
 #endif
 
@@ -363,7 +363,7 @@ static inline unsigned int cpus_online(void)
 #ifdef FIO_HAVE_CPU_AFFINITY
 static inline int CPU_COUNT(os_cpu_mask_t *mask)
 {
-	int max_cpus = cpus_online();
+	int max_cpus = cpus_configured();
 	int nr_cpus, i;
 
 	for (i = 0, nr_cpus = 0; i < max_cpus; i++)
diff --git a/os/windows/cpu-affinity.c b/os/windows/cpu-affinity.c
index 7601970f..8f3d6a76 100644
--- a/os/windows/cpu-affinity.c
+++ b/os/windows/cpu-affinity.c
@@ -2,12 +2,6 @@
 
 #include <windows.h>
 
-/* Return all processors regardless of processor group */
-unsigned int cpus_online(void)
-{
-	return GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
-}
-
 static void print_mask(os_cpu_mask_t *cpumask)
 {
 	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
diff --git a/os/windows/posix.c b/os/windows/posix.c
index a3a6c89f..a47223da 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -216,10 +216,18 @@ long sysconf(int name)
 	MEMORYSTATUSEX status;
 
 	switch (name) {
-	case _SC_NPROCESSORS_ONLN:
-		val = GetNumLogicalProcessors();
+	case _SC_NPROCESSORS_CONF:
+		/*
+		 * Using GetMaximumProcessorCount introduces a problem in
+		 * gettime.c because Windows does not have
+		 * fio_get_thread_affinity. Log sample (see #1479):
+		 *
+		 *   CPU mask contains processor beyond last active processor index (2)
+		 *   clock setaffinity failed: No error
+		 */
+		val = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
 		if (val == -1)
-			log_err("sysconf(_SC_NPROCESSORS_ONLN) failed\n");
+			log_err("sysconf(_SC_NPROCESSORS_CONF) failed\n");
 
 		break;
 
@@ -1201,4 +1209,4 @@ cleanup:
 	DisconnectNamedPipe(hpipe);
 	CloseHandle(hpipe);
 	return ret;
-}
\ No newline at end of file
+}
diff --git a/rate-submit.c b/rate-submit.c
index 268356d1..2fe768c0 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -263,6 +263,8 @@ static void sum_ddir(struct thread_data *dst, struct thread_data *src,
 	sum_val(&dst->this_io_blocks[ddir], &src->this_io_blocks[ddir]);
 	sum_val(&dst->this_io_bytes[ddir], &src->this_io_bytes[ddir]);
 	sum_val(&dst->bytes_done[ddir], &src->bytes_done[ddir]);
+	if (ddir == DDIR_READ)
+		sum_val(&dst->bytes_verified, &src->bytes_verified);
 
 	pthread_double_unlock(&dst->io_wq.stat_lock, &src->io_wq.stat_lock);
 }
diff --git a/server.c b/server.c
index b453be5f..b869d387 100644
--- a/server.c
+++ b/server.c
@@ -999,7 +999,7 @@ static int handle_probe_cmd(struct fio_net_cmd *cmd)
 		.os		= FIO_OS,
 		.arch		= FIO_ARCH,
 		.bpp		= sizeof(void *),
-		.cpus		= __cpu_to_le32(cpus_online()),
+		.cpus		= __cpu_to_le32(cpus_configured()),
 	};
 
 	dprint(FD_NET, "server: sending probe reply\n");
diff --git a/t/dedupe.c b/t/dedupe.c
index d21e96f4..02e52b74 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -688,7 +688,7 @@ int main(int argc, char *argv[])
 		use_bloom = 0;
 
 	if (!num_threads)
-		num_threads = cpus_online();
+		num_threads = cpus_configured();
 
 	if (argc == optind)
 		return usage(argv);
diff --git a/t/jobs/t0025.fio b/t/jobs/t0025.fio
new file mode 100644
index 00000000..29b5fe80
--- /dev/null
+++ b/t/jobs/t0025.fio
@@ -0,0 +1,7 @@
+[job]
+filename=t0025file
+size=128k
+readwrite=write
+do_verify=1
+verify=md5
+experimental_verify=1
diff --git a/t/jobs/t0026.fio b/t/jobs/t0026.fio
new file mode 100644
index 00000000..ee89b140
--- /dev/null
+++ b/t/jobs/t0026.fio
@@ -0,0 +1,19 @@
+[job1]
+filename=t0026file
+size=1M
+readwrite=randwrite
+loops=8
+do_verify=1
+verify=md5
+experimental_verify=1
+
+[job2]
+stonewall=1
+filename=t0026file
+size=1M
+readwrite=randrw
+time_based
+runtime=5
+do_verify=1
+verify=md5
+experimental_verify=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index df87ae72..e5b307ac 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -788,6 +788,18 @@ class FioJobTest_t0024(FioJobTest_t0023):
         self.check_all_offsets("bssplit_bw.log", 512, filesize)
 
 
+class FioJobTest_t0025(FioJobTest):
+    """Test experimental verify read backs written data pattern."""
+    def check_result(self):
+        super(FioJobTest_t0025, self).check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 128:
+            self.passed = False
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -1183,6 +1195,25 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          25,
+        'test_class':       FioJobTest_t0025,
+        'job':              't0025.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
+    {
+        'test_id':          26,
+        'test_class':       FioJobTest,
+        'job':              't0026.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [Requirements.not_windows],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
diff --git a/verify.c b/verify.c
index 0e1e4639..d6a229ca 100644
--- a/verify.c
+++ b/verify.c
@@ -1287,8 +1287,6 @@ void populate_verify_io_u(struct thread_data *td, struct io_u *io_u)
 	if (td->o.verify == VERIFY_NULL)
 		return;
 
-	io_u->numberio = td->io_issues[io_u->ddir];
-
 	fill_pattern_headers(td, io_u, 0, 0);
 }
 

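The experimental-verify fix keeps verify reads out of ``bytes_done`` and counts them in a new ``bytes_verified`` field, which ``do_verify()`` then uses as its stop condition. A simplified model of that bookkeeping is sketched here; the ``sketch_*`` struct and functions are hypothetical stand-ins for the ``thread_data`` fields, ``io_u_update_bytes_done()`` and the ``do_verify()`` check, and only the read and write directions are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SKETCH_DDIR_READ = 0, SKETCH_DDIR_WRITE = 1, SKETCH_DDIR_CNT = 2 };

/* Simplified stand-in for the relevant thread_data fields (hypothetical). */
struct sketch_td {
	bool verifying;			/* models td->runstate == TD_VERIFYING */
	uint64_t bytes_done[SKETCH_DDIR_CNT];
	uint64_t bytes_verified;
};

/* Same branch structure as io_u_update_bytes_done() in the patch. */
static void sketch_update_bytes_done(struct sketch_td *td,
				     const uint64_t icd_bytes[SKETCH_DDIR_CNT])
{
	if (td->verifying) {
		/* verify reads are tracked separately from normal I/O */
		td->bytes_verified += icd_bytes[SKETCH_DDIR_READ];
		return;
	}

	for (int ddir = 0; ddir < SKETCH_DDIR_CNT; ddir++)
		td->bytes_done[ddir] += icd_bytes[ddir];
}

/* Same comparison as the new stop condition in do_verify(). */
static bool sketch_verify_done(const struct sketch_td *td, uint64_t rw_min_bs,
			       uint64_t verify_bytes)
{
	assert(rw_min_bs > 0);
	return td->bytes_verified + rw_min_bs > verify_bytes;
}
```

For a 128 KiB write job such as t0025.fio, assuming 4 KiB blocks, the write phase ends with 131072 bytes in ``bytes_done[WRITE]``; the verify loop then stops after exactly 32 verify reads and leaves ``bytes_done[READ]`` at zero.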
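Switching from the online count to the configured count matters when CPU ids are non-sequential. A loop shaped like ``CPU_COUNT()`` in os/os.h only examines ids below its bound, so an online CPU whose id is at or above the number of online CPUs is silently skipped. The sketch uses a plain ``bool`` array in place of ``os_cpu_mask_t`` and a made-up topology: six configured CPUs with ids 3 and 4 offline, so four are online.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count CPUs set in a mask by scanning ids 0..max_cpus-1, the same loop
 * shape as CPU_COUNT() in os/os.h. The bool array is a stand-in for
 * os_cpu_mask_t (assumption for illustration).
 */
static int count_cpus_in_mask(const bool *mask, int max_cpus)
{
	int nr_cpus = 0;

	assert(max_cpus >= 0);
	for (int i = 0; i < max_cpus; i++)
		if (mask[i])
			nr_cpus++;
	return nr_cpus;
}
```

With CPUs 0 and 5 set in the mask, a bound of 4 (the online count) finds only CPU 0, while a bound of 6 (the configured count) finds both.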

* Recent changes (master)
@ 2022-10-22 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-22 12:00 UTC
  To: fio

The following changes since commit 548f1269e3772c666cac4148453d9c63bdfa65c4:

  Merge branch 'issue-1213' of https://github.com/SystemFabricWorks/fio (2022-10-19 12:04:50 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d72244761b2230fbb2d6eaec59cdedd3ea651d4f:

  stat: fix segfault with fio option --bandwidth-log (2022-10-21 13:23:41 -0400)

----------------------------------------------------------------
Ankit Kumar (1):
      stat: fix segfault with fio option --bandwidth-log

 stat.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 949af5ed..b963973a 100644
--- a/stat.c
+++ b/stat.c
@@ -2870,7 +2870,10 @@ static struct io_logs *get_new_log(struct io_log *iolog)
 	 * forever
 	 */
 	if (!iolog->cur_log_max) {
-		new_samples = iolog->td->o.log_entries;
+		if (iolog->td)
+			new_samples = iolog->td->o.log_entries;
+		else
+			new_samples = DEF_LOG_ENTRIES;
 	} else {
 		new_samples = iolog->cur_log_max * 2;
 		if (new_samples > MAX_LOG_ENTRIES)

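The ``--bandwidth-log`` segfault came from dereferencing ``iolog->td`` for a log that has no owning job. The sizing policy after the fix can be sketched as a pure function. The option struct and both constants are placeholders, so the numbers are not fio's real ``DEF_LOG_ENTRIES`` or ``MAX_LOG_ENTRIES``, and clamping to the maximum is an assumption for the branch the hunk cuts off.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; fio's real limits may differ. */
#define SKETCH_DEF_LOG_ENTRIES	1024UL
#define SKETCH_MAX_LOG_ENTRIES	(1024UL * 1024UL)

struct sketch_opts {
	unsigned long log_entries;
};

/*
 * Decide how many samples the next log chunk holds. 'opts' is NULL when
 * the log has no owning job, as with --bandwidth-log.
 */
static unsigned long sketch_new_samples(const struct sketch_opts *opts,
					unsigned long cur_log_max)
{
	unsigned long new_samples;

	assert(cur_log_max <= SKETCH_MAX_LOG_ENTRIES);
	if (!cur_log_max) {
		/* first chunk: use the job's setting if there is a job */
		new_samples = opts ? opts->log_entries : SKETCH_DEF_LOG_ENTRIES;
	} else {
		/* later chunks: double, clamped (assumed) at the maximum */
		new_samples = cur_log_max * 2;
		if (new_samples > SKETCH_MAX_LOG_ENTRIES)
			new_samples = SKETCH_MAX_LOG_ENTRIES;
	}
	return new_samples;
}
```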

* Recent changes (master)
@ 2022-10-20 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-20 12:00 UTC
  To: fio

The following changes since commit 2f160e0c8848bab566427a11eee116d8e834bcf0:

  test: change GitHub actions checkout from v2 to v3 (2022-10-18 11:13:03 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 548f1269e3772c666cac4148453d9c63bdfa65c4:

  Merge branch 'issue-1213' of https://github.com/SystemFabricWorks/fio (2022-10-19 12:04:50 -0400)

----------------------------------------------------------------
Brian T. Smith (2):
      fix configure probe for libcufile
      libcufile: use generic_get_file_size

Vincent Fu (1):
      Merge branch 'issue-1213' of https://github.com/SystemFabricWorks/fio

 configure           | 4 ++--
 engines/libcufile.c | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 546541a2..24c599a8 100755
--- a/configure
+++ b/configure
@@ -2723,9 +2723,9 @@ int main(int argc, char* argv[]) {
    return 0;
 }
 EOF
-  if compile_prog "" "-lcuda -lcudart -lcufile" "libcufile"; then
+  if compile_prog "" "-lcuda -lcudart -lcufile -ldl" "libcufile"; then
     libcufile="yes"
-    LIBS="-lcuda -lcudart -lcufile $LIBS"
+    LIBS="-lcuda -lcudart -lcufile -ldl $LIBS"
   else
     if test "$libcufile" = "yes" ; then
       feature_not_found "libcufile" ""
diff --git a/engines/libcufile.c b/engines/libcufile.c
index e575b786..2bedf261 100644
--- a/engines/libcufile.c
+++ b/engines/libcufile.c
@@ -606,6 +606,7 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.version             = FIO_IOOPS_VERSION,
 	.init                = fio_libcufile_init,
 	.queue               = fio_libcufile_queue,
+	.get_file_size       = generic_get_file_size,
 	.open_file           = fio_libcufile_open_file,
 	.close_file          = fio_libcufile_close_file,
 	.iomem_alloc         = fio_libcufile_iomem_alloc,


* Recent changes (master)
@ 2022-10-19 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-19 12:00 UTC
  To: fio

The following changes since commit 0360d61fbfcc1f07bcdc16672f5040f8cf49681f:

  t/zbd: add a CLI option to force io_uring (2022-10-16 17:05:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2f160e0c8848bab566427a11eee116d8e834bcf0:

  test: change GitHub actions checkout from v2 to v3 (2022-10-18 11:13:03 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      test: change GitHub actions checkout from v2 to v3

 .github/workflows/ci.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index bdc4db85..1b8c0701 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -39,7 +39,7 @@ jobs:
 
     steps:
     - name: Checkout repo
-      uses: actions/checkout@v2
+      uses: actions/checkout@v3
     - name: Install dependencies
       run: ./ci/actions-install.sh
     - name: Build


* Recent changes (master)
@ 2022-10-17 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-17 12:00 UTC
  To: fio

The following changes since commit 8a63e7a32fcb6b7b131c4678ba95b81a9f2f8bca:

  Merge branch 'readme-update' of https://github.com/nikoandpiko/fio (2022-10-15 09:05:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0360d61fbfcc1f07bcdc16672f5040f8cf49681f:

  t/zbd: add a CLI option to force io_uring (2022-10-16 17:05:03 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      fio: warn about "ioengine=psync" and "iodepth >= 1"

Dmitry Fomichev (2):
      t/zbd: fix max_open_zones determination in tests
      t/zbd: add a CLI option to force io_uring

 backend.c              |  5 +++++
 t/zbd/functions        |  4 +++-
 t/zbd/test-zbd-support | 10 ++++++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index ec535bcc..d8f4f2a5 100644
--- a/backend.c
+++ b/backend.c
@@ -1791,6 +1791,11 @@ static void *thread_main(void *data)
 	if (td_io_init(td))
 		goto err;
 
+	if (td_ioengine_flagged(td, FIO_SYNCIO) && td->o.iodepth > 1) {
+		log_info("note: both iodepth >= 1 and synchronous I/O engine "
+			 "are selected, queue depth will be capped at 1\n");
+	}
+
 	if (init_io_u(td))
 		goto err;
 
diff --git a/t/zbd/functions b/t/zbd/functions
index 7cff18fd..812320f5 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -230,9 +230,11 @@ max_open_zones() {
 		    echo ${max_nr_open_zones}
 		}
 	fi
-    else
+    elif [ -n "${use_libzbc}" ]; then
 	${zbc_report_zones} "$dev" |
 	    sed -n 's/^[[:blank:]]*Maximum number of open sequential write required zones:[[:blank:]]*//p'
+    else
+	echo 0
     fi
 }
 
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index d4aaa813..cdc03f28 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -17,6 +17,7 @@ usage() {
 	echo -e "\t-t <test #> Run only a single test case with specified number"
 	echo -e "\t-q Quit the test run after any failed test"
 	echo -e "\t-z Run fio with debug=zbd option"
+	echo -e "\t-u Use io_uring ioengine in place of libaio"
 }
 
 max() {
@@ -38,6 +39,8 @@ min() {
 ioengine() {
 	if [ -n "$use_libzbc" ]; then
 		echo -n "--ioengine=libzbc"
+	elif [ "$1" = "libaio" -a -n "$force_io_uring" ]; then
+		echo -n "--ioengine=io_uring"
 	else
 		echo -n "--ioengine=$1"
 	fi
@@ -1275,6 +1278,7 @@ use_libzbc=
 zbd_debug=
 max_open_zones_opt=
 quit_on_err=
+force_io_uring=
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -1292,6 +1296,7 @@ while [ "${1#-}" != "$1" ]; do
 	shift;;
     -q) quit_on_err=1; shift;;
     -z) zbd_debug=1; shift;;
+    -u) force_io_uring=1; shift;;
     --) shift; break;;
      *) usage; exit 1;;
   esac
@@ -1302,6 +1307,11 @@ if [ $# != 1 ]; then
     exit 1
 fi
 
+if [ -n "$use_libzbc" -a -n "$force_io_uring" ]; then
+    echo "Please specify only one of -l and -u options"
+    exit 1
+fi
+
 # shellcheck source=functions
 source "$(dirname "$0")/functions" || exit $?
 


* Recent changes (master)
@ 2022-10-16 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-16 12:00 UTC
  To: fio

The following changes since commit 07f78c37833730594778fb5684ac6ec40d0289f8:

  engines/io_uring: set coop taskrun, single issuer and defer taskrun (2022-10-12 07:19:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8a63e7a32fcb6b7b131c4678ba95b81a9f2f8bca:

  Merge branch 'readme-update' of https://github.com/nikoandpiko/fio (2022-10-15 09:05:32 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'readme-update' of https://github.com/nikoandpiko/fio

Nicholas Roma (1):
      Update to README

 README.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index 79582dea..bcd08ec9 100644
--- a/README.rst
+++ b/README.rst
@@ -150,7 +150,7 @@ GNU make isn't the default, type ``gmake`` instead of ``make``.
 
 Configure will print the enabled options. Note that on Linux based platforms,
 the libaio development packages must be installed to use the libaio
-engine. Depending on distro, it is usually called libaio-devel or libaio-dev.
+engine. Depending on the distro, it is usually called libaio-devel or libaio-dev.
 
 For gfio, gtk 2.18 (or newer), associated glib threads, and cairo are required
 to be installed.  gfio isn't built automatically and can be enabled with a
@@ -170,7 +170,7 @@ configure.
 Windows
 ~~~~~~~
 
-The minimum versions of Windows for building/runing fio are Windows 7/Windows
+The minimum versions of Windows for building/running fio are Windows 7/Windows
 Server 2008 R2. On Windows, Cygwin (https://www.cygwin.com/) is required in
 order to build fio. To create an MSI installer package install WiX from
 https://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
@@ -224,7 +224,7 @@ implemented, I'd be happy to take patches for that. An example of that is disk
 utility statistics and (I think) huge page support, support for that does exist
 in FreeBSD/Solaris.
 
-Fio uses pthread mutexes for signalling and locking and some platforms do not
+Fio uses pthread mutexes for signaling and locking and some platforms do not
 support process shared pthread mutexes. As a result, on such platforms only
 threads are supported. This could be fixed with sysv ipc locking or other
 locking alternatives.


* Recent changes (master)
@ 2022-10-15 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-15 12:00 UTC
  To: fio

The following changes since commit b19c5ee1357ffb74f4de57b1617364bbbaacf1a0:

  examples: uring-cmd-zoned: expand the reasoning behind QD1 (2022-10-07 09:50:37 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 07f78c37833730594778fb5684ac6ec40d0289f8:

  engines/io_uring: set coop taskrun, single issuer and defer taskrun (2022-10-12 07:19:35 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      engines/io_uring: set coop taskrun, single issuer and defer taskrun

 engines/io_uring.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index c679177f..6906e0a4 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -889,9 +889,30 @@ static int fio_ioring_cmd_queue_init(struct thread_data *td)
 	p.flags |= IORING_SETUP_CQSIZE;
 	p.cq_entries = depth;
 
+	/*
+	 * Setup COOP_TASKRUN as we don't need to get IPI interrupted for
+	 * completing IO operations.
+	 */
+	p.flags |= IORING_SETUP_COOP_TASKRUN;
+
+	/*
+	 * io_uring is always a single issuer, and we can defer task_work
+	 * runs until we reap events.
+	 */
+	p.flags |= IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
+
 retry:
 	ret = syscall(__NR_io_uring_setup, depth, &p);
 	if (ret < 0) {
+		if (errno == EINVAL && p.flags & IORING_SETUP_DEFER_TASKRUN) {
+			p.flags &= ~IORING_SETUP_DEFER_TASKRUN;
+			p.flags &= ~IORING_SETUP_SINGLE_ISSUER;
+			goto retry;
+		}
+		if (errno == EINVAL && p.flags & IORING_SETUP_COOP_TASKRUN) {
+			p.flags &= ~IORING_SETUP_COOP_TASKRUN;
+			goto retry;
+		}
 		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
 			p.flags &= ~IORING_SETUP_CQSIZE;
 			goto retry;

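The io_uring change requests ``IORING_SETUP_COOP_TASKRUN``, ``IORING_SETUP_SINGLE_ISSUER`` and ``IORING_SETUP_DEFER_TASKRUN``, then strips them again when an older kernel answers ``EINVAL``. The fallback order can be exercised without a kernel. The flag values and ``fake_ring_setup()`` are stand-ins, not the real ``io_uring_setup(2)`` interface.

```c
#include <errno.h>

/* Stand-in flag bits; NOT the real IORING_SETUP_* values. */
#define SK_CQSIZE		(1U << 0)
#define SK_COOP_TASKRUN		(1U << 1)
#define SK_SINGLE_ISSUER	(1U << 2)
#define SK_DEFER_TASKRUN	(1U << 3)

/* Fake setup call: fails with EINVAL if any requested flag is unsupported. */
static int fake_ring_setup(unsigned int flags, unsigned int supported)
{
	if (flags & ~supported) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/*
 * Request every optional flag, then drop them in the same order as the
 * patch each time setup fails with EINVAL. Returns the accepted flags,
 * or -1 on any other failure.
 */
static long setup_with_fallback(unsigned int supported)
{
	unsigned int flags = SK_CQSIZE | SK_COOP_TASKRUN |
			     SK_SINGLE_ISSUER | SK_DEFER_TASKRUN;

	for (;;) {
		if (fake_ring_setup(flags, supported) == 0)
			return (long)flags;
		if (errno != EINVAL)
			return -1;
		if (flags & SK_DEFER_TASKRUN)	/* dropped together, as in the patch */
			flags &= ~(SK_DEFER_TASKRUN | SK_SINGLE_ISSUER);
		else if (flags & SK_COOP_TASKRUN)
			flags &= ~SK_COOP_TASKRUN;
		else if (flags & SK_CQSIZE)
			flags &= ~SK_CQSIZE;
		else
			return -1;
	}
}
```

A kernel that accepts only CQSIZE and COOP_TASKRUN ends up with exactly those two flags after one retry; one that accepts none of them ends with no optional flags after three retries.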

* Recent changes (master)
@ 2022-10-08 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-08 12:00 UTC
  To: fio

The following changes since commit 7aeb498947d6f2d6c96b571520f12b80365fa8a1:

  test: make t0014.fio time_based (2022-10-05 18:34:41 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b19c5ee1357ffb74f4de57b1617364bbbaacf1a0:

  examples: uring-cmd-zoned: expand the reasoning behind QD1 (2022-10-07 09:50:37 -0400)

----------------------------------------------------------------
Pankaj Raghav (1):
      examples: uring-cmd-zoned: expand the reasoning behind QD1

 examples/uring-cmd-zoned.fio | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/examples/uring-cmd-zoned.fio b/examples/uring-cmd-zoned.fio
index 58e8f79e..89be61be 100644
--- a/examples/uring-cmd-zoned.fio
+++ b/examples/uring-cmd-zoned.fio
@@ -1,7 +1,11 @@
 # io_uring_cmd I/O engine for nvme-ns generic zoned character device
 #
-# NOTE: with write workload iodepth must be set to 1 as there is no IO
-# scheduler.
+# NOTE:
+# Regular writes against a zone should be limited to QD1, as the device can
+# reorder the requests.
+#
+# As the passthrough path do not use an IO scheduler (such as mq-deadline),
+# the queue depth should be limited to 1 to avoid zone invalid writes.
 
 [global]
 filename=/dev/ng0n1

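The NOTE in the job file above says regular writes to a zone must stay at QD1 because the device may reorder them. A toy model makes the failure concrete. It assumes a single sequential-write-required zone whose write pointer advances only when a write lands exactly on it; the `Zone` class and `run()` helper are hypothetical and are not fio or kernel code:

```python
class Zone:
    """Hypothetical model of a sequential-write-required zone."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start  # current write pointer

    def write(self, offset, length):
        # A regular write is valid only at the write pointer and inside the zone.
        if offset != self.wp or offset + length > self.start + self.size:
            return False  # the device would reject this as an invalid write
        self.wp += length
        return True


def run(order, bs=4096):
    """Submit blocks 0, 1, 2, ... but let the 'device' process them in `order`."""
    zone = Zone(0, 1 << 20)
    return [zone.write(i * bs, bs) for i in order]
```

With `run([0, 1, 2])` every write succeeds, while the reordered `run([1, 0, 2])` fails the first and last writes even though the same three blocks were submitted. At QD1 only one write is in flight, so no such reordering can occur.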

* Recent changes (master)
@ 2022-10-06 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-06 12:00 UTC
  To: fio

The following changes since commit 0474b83f022f1f1cc14208c05b7ccda682e01263:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-10-04 14:25:09 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7aeb498947d6f2d6c96b571520f12b80365fa8a1:

  test: make t0014.fio time_based (2022-10-05 18:34:41 -0400)

----------------------------------------------------------------
Vincent Fu (7):
      test: clean up randtrimwrite test
      test: check all offsets touched for randtrimwrite
      test: fix style issues in run-fio-tests.py
      test: add basic tests for trimwrite workloads
      test: fix t/run-fio-tests.py style issues identified by pylint
      test: improve run-fio-tests.py file open method
      test: make t0014.fio time_based

 t/jobs/t0014.fio   |   1 +
 t/jobs/t0023.fio   |  16 +---
 t/jobs/t0024.fio   |  36 +++++++++
 t/run-fio-tests.py | 209 ++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 190 insertions(+), 72 deletions(-)
 create mode 100644 t/jobs/t0024.fio

---

Diff of recent changes:

diff --git a/t/jobs/t0014.fio b/t/jobs/t0014.fio
index d9b45651..eb13478b 100644
--- a/t/jobs/t0014.fio
+++ b/t/jobs/t0014.fio
@@ -17,6 +17,7 @@ flow_id=1
 thread
 log_avg_msec=1000
 write_iops_log=t0014.fio
+time_based
 
 [flow1]
 flow=1
diff --git a/t/jobs/t0023.fio b/t/jobs/t0023.fio
index 0250ee1a..4f0bef89 100644
--- a/t/jobs/t0023.fio
+++ b/t/jobs/t0023.fio
@@ -6,29 +6,26 @@ rw=randtrimwrite
 log_offset=1
 per_job_logs=0
 randrepeat=0
-stonewall
+write_bw_log
 
 # Expected result: 	trim issued to random offset followed by write to same offset
 # 			all offsets touched
 # 			block sizes match
 # Buggy result: 	something else
 [basic]
-write_bw_log
 
 # Expected result: 	trim issued to random offset followed by write to same offset
 # 			all offsets trimmed
 # 			block sizes 8k for both write and trim
 # Buggy result: 	something else
 [bs]
-write_bw_log
-bs=4k,4k,8k
+bs=8k,8k,8k
 
 # Expected result: 	trim issued to random offset followed by write to same offset
 # 			all offsets trimmed
 # 			block sizes match
 # Buggy result: 	something else
 [bsrange]
-write_bw_log
 bsrange=512-4k
 
 # Expected result: 	trim issued to random offset followed by write to same offset
@@ -36,40 +33,31 @@ bsrange=512-4k
 # 			block sizes match
 # Buggy result: 	something else
 [bssplit]
-write_bw_log
 bsrange=512/25:1k:25:2k:25:4k/25
 
 # Expected result: 	trim issued to random offset followed by write to same offset
-# 			all offsets touched
 # 			block sizes match
 # Buggy result: 	something else
 [basic_no_rm]
-write_bw_log
 norandommap=1
 
 # Expected result: 	trim issued to random offset followed by write to same offset
-# 			all offsets trimmed
 # 			block sizes 8k for both write and trim
 # Buggy result: 	something else
 [bs_no_rm]
-write_bw_log
 bs=4k,4k,8k
 norandommap=1
 
 # Expected result: 	trim issued to random offset followed by write to same offset
-# 			all offsets trimmed
 # 			block sizes match
 # Buggy result: 	something else
 [bsrange_no_rm]
-write_bw_log
 bsrange=512-4k
 norandommap=1
 
 # Expected result: 	trim issued to random offset followed by write to same offset
-# 			all offsets trimmed
 # 			block sizes match
 # Buggy result: 	something else
 [bssplit_no_rm]
-write_bw_log
 bsrange=512/25:1k:25:2k:25:4k/25
 norandommap=1
diff --git a/t/jobs/t0024.fio b/t/jobs/t0024.fio
new file mode 100644
index 00000000..393a2b70
--- /dev/null
+++ b/t/jobs/t0024.fio
@@ -0,0 +1,36 @@
+# trimwrite data direction tests
+[global]
+filesize=1M
+ioengine=null
+rw=trimwrite
+log_offset=1
+per_job_logs=0
+randrepeat=0
+write_bw_log
+
+# Expected result: 	trim issued to sequential offsets followed by write to same offset
+# 			all offsets touched
+# 			block sizes match
+# Buggy result: 	something else
+[basic]
+
+# Expected result: 	trim issued to sequential offsets followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes 8k for both write and trim
+# Buggy result: 	something else
+[bs]
+bs=8k,8k,8k
+
+# Expected result: 	trim issued to sequential offsets followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bsrange]
+bsrange=512-4k
+
+# Expected result: 	trim issued to sequential offsets followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bssplit]
+bsrange=512/25:1k:25:2k:25:4k/25
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index a2b036d9..df87ae72 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -55,7 +55,7 @@ import multiprocessing
 from pathlib import Path
 
 
-class FioTest(object):
+class FioTest():
     """Base for all fio tests."""
 
     def __init__(self, exe_path, parameters, success):
@@ -286,6 +286,19 @@ class FioJobTest(FioExeTest):
 
         return file_data, success
 
+    def get_file_fail(self, filename):
+        """Safely read a file and fail the test upon error."""
+        file_data = None
+
+        try:
+            with open(filename, "r") as output_file:
+                file_data = output_file.read()
+        except OSError:
+            self.failure_reason += " unable to read file {0}".format(filename)
+            self.passed = False
+
+        return file_data
+
     def check_result(self):
         """Check fio job results."""
 
@@ -302,10 +315,8 @@ class FioJobTest(FioExeTest):
         if 'json' not in self.output_format:
             return
 
-        file_data, success = self.get_file(os.path.join(self.test_dir, self.fio_output))
-        if not success:
-            self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
-            self.passed = False
+        file_data = self.get_file_fail(os.path.join(self.test_dir, self.fio_output))
+        if not file_data:
             return
 
         #
@@ -427,12 +438,11 @@ class FioJobTest_t0012(FioJobTest):
             return
 
         iops_files = []
-        for i in range(1,4):
-            file_data, success = self.get_file(os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(self.fio_job), i)))
-
-            if not success:
-                self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
-                self.passed = False
+        for i in range(1, 4):
+            filename = os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(
+                self.fio_job), i))
+            file_data = self.get_file_fail(filename)
+            if not file_data:
                 return
 
             iops_files.append(file_data.splitlines())
@@ -448,17 +458,15 @@ class FioJobTest_t0012(FioJobTest):
 
             ratio1 = iops3/iops2
             ratio2 = iops3/iops1
-            logging.debug(
-                "sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} job3/job2={4:.3f} job3/job1={5:.3f}".format(
-                    i, iops1, iops2, iops3, ratio1, ratio2
-                )
-            )
+            logging.debug("sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} " \
+                "job3/job2={4:.3f} job3/job1={5:.3f}".format(i, iops1, iops2, iops3, ratio1,
+                                                             ratio2))
 
         # test job1 and job2 succeeded to recalibrate
         if ratio1 < 1 or ratio1 > 3 or ratio2 < 7 or ratio2 > 13:
-            self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} iops3={3} expected r1~2 r2~10 got r1={4:.3f} r2={5:.3f},".format(
-                self.failure_reason, iops1, iops2, iops3, ratio1, ratio2
-            )
+            self.failure_reason += " iops ratio mismatch iops1={0} iops2={1} iops3={2} " \
+                "expected r1~2 r2~10 got r1={3:.3f} r2={4:.3f},".format(iops1, iops2, iops3,
+                                                                        ratio1, ratio2)
             self.passed = False
             return
 
@@ -478,12 +486,11 @@ class FioJobTest_t0014(FioJobTest):
             return
 
         iops_files = []
-        for i in range(1,4):
-            file_data, success = self.get_file(os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(self.fio_job), i)))
-
-            if not success:
-                self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
-                self.passed = False
+        for i in range(1, 4):
+            filename = os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(
+                self.fio_job), i))
+            file_data = self.get_file_fail(filename)
+            if not file_data:
                 return
 
             iops_files.append(file_data.splitlines())
@@ -501,10 +508,9 @@ class FioJobTest_t0014(FioJobTest):
 
 
                 if ratio1 < 0.43 or ratio1 > 0.57 or ratio2 < 0.21 or ratio2 > 0.45:
-                    self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} iops3={3}\
-                                                expected r1~0.5 r2~0.33 got r1={4:.3f} r2={5:.3f},".format(
-                        self.failure_reason, iops1, iops2, iops3, ratio1, ratio2
-                    )
+                    self.failure_reason += " iops ratio mismatch iops1={0} iops2={1} iops3={2} " \
+                                           "expected r1~0.5 r2~0.33 got r1={3:.3f} r2={4:.3f},".format(
+                                               iops1, iops2, iops3, ratio1, ratio2)
                     self.passed = False
 
             iops1 = iops1 + float(iops_files[0][i].split(',')[1])
@@ -512,17 +518,14 @@ class FioJobTest_t0014(FioJobTest):
 
             ratio1 = iops1/iops2
             ratio2 = iops1/iops3
-            logging.debug(
-                "sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} job1/job2={4:.3f} job1/job3={5:.3f}".format(
-                    i, iops1, iops2, iops3, ratio1, ratio2
-                )
-            )
+            logging.debug("sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} " \
+                          "job1/job2={4:.3f} job1/job3={5:.3f}".format(i, iops1, iops2, iops3,
+                                                                       ratio1, ratio2))
 
         # test job1 and job2 succeeded to recalibrate
         if ratio1 < 0.43 or ratio1 > 0.57:
-            self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} expected ratio~0.5 got ratio={3:.3f},".format(
-                self.failure_reason, iops1, iops2, ratio1
-            )
+            self.failure_reason += " iops ratio mismatch iops1={0} iops2={1} expected ratio~0.5 " \
+                                   "got ratio={2:.3f},".format(iops1, iops2, ratio1)
             self.passed = False
             return
 
@@ -556,7 +559,10 @@ class FioJobTest_t0019(FioJobTest):
         super(FioJobTest_t0019, self).check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
-        file_data, success = self.get_file(bw_log_filename)
+        file_data = self.get_file_fail(bw_log_filename)
+        if not file_data:
+            return
+
         log_lines = file_data.split('\n')
 
         prev = -4096
@@ -583,7 +589,10 @@ class FioJobTest_t0020(FioJobTest):
         super(FioJobTest_t0020, self).check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
-        file_data, success = self.get_file(bw_log_filename)
+        file_data = self.get_file_fail(bw_log_filename)
+        if not file_data:
+            return
+
         log_lines = file_data.split('\n')
 
         seq_count = 0
@@ -621,7 +630,10 @@ class FioJobTest_t0022(FioJobTest):
         super(FioJobTest_t0022, self).check_result()
 
         bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
-        file_data, success = self.get_file(bw_log_filename)
+        file_data = self.get_file_fail(bw_log_filename)
+        if not file_data:
+            return
+
         log_lines = file_data.split('\n')
 
         filesize = 1024*1024
@@ -646,15 +658,20 @@ class FioJobTest_t0022(FioJobTest):
 
         if len(offsets) == filesize/bs:
             self.passed = False
-            self.failure_reason += " no duplicate offsets found with norandommap=1".format(len(offsets))
+            self.failure_reason += " no duplicate offsets found with norandommap=1"
 
 
 class FioJobTest_t0023(FioJobTest):
-    """Test consists of fio test job t0023"""
+    """Test consists of fio test job t0023 randtrimwrite test."""
+
+    def check_trimwrite(self, filename):
+        """Make sure that trims are followed by writes of the same size at the same offset."""
 
-    def check_seq(self, filename):
         bw_log_filename = os.path.join(self.test_dir, filename)
-        file_data, success = self.get_file(bw_log_filename)
+        file_data = self.get_file_fail(bw_log_filename)
+        if not file_data:
+            return
+
         log_lines = file_data.split('\n')
 
         prev_ddir = 1
@@ -668,40 +685,107 @@ class FioJobTest_t0023(FioJobTest):
             if prev_ddir == 1:
                 if ddir != 2:
                     self.passed = False
-                    self.failure_reason += " {0}: write not preceeded by trim: {1}".format(bw_log_filename, line)
+                    self.failure_reason += " {0}: write not preceeded by trim: {1}".format(
+                        bw_log_filename, line)
                     break
             else:
                 if ddir != 1:
                     self.passed = False
-                    self.failure_reason += " {0}: trim not preceeded by write: {1}".format(bw_log_filename, line)
+                    self.failure_reason += " {0}: trim not preceeded by write: {1}".format(
+                        bw_log_filename, line)
                     break
                 else:
                     if prev_bs != bs:
                         self.passed = False
-                        self.failure_reason += " {0}: block size does not match: {1}".format(bw_log_filename, line)
+                        self.failure_reason += " {0}: block size does not match: {1}".format(
+                            bw_log_filename, line)
                         break
                     if prev_offset != offset:
                         self.passed = False
-                        self.failure_reason += " {0}: offset does not match: {1}".format(bw_log_filename, line)
+                        self.failure_reason += " {0}: offset does not match: {1}".format(
+                            bw_log_filename, line)
                         break
             prev_ddir = ddir
             prev_bs = bs
             prev_offset = offset
 
 
+    def check_all_offsets(self, filename, sectorsize, filesize):
+        """Make sure all offsets were touched."""
+
+        file_data = self.get_file_fail(os.path.join(self.test_dir, filename))
+        if not file_data:
+            return
+
+        log_lines = file_data.split('\n')
+
+        offsets = set()
+
+        for line in log_lines:
+            if len(line.strip()) == 0:
+                continue
+            vals = line.split(',')
+            bs = int(vals[3])
+            offset = int(vals[4])
+            if offset % sectorsize != 0:
+                self.passed = False
+                self.failure_reason += " {0}: offset {1} not a multiple of sector size {2}".format(
+                    filename, offset, sectorsize)
+                break
+            if bs % sectorsize != 0:
+                self.passed = False
+                self.failure_reason += " {0}: block size {1} not a multiple of sector size " \
+                    "{2}".format(filename, bs, sectorsize)
+                break
+            for i in range(int(bs/sectorsize)):
+                offsets.add(offset/sectorsize + i)
+
+        if len(offsets) != filesize/sectorsize:
+            self.passed = False
+            self.failure_reason += " {0}: only {1} offsets touched; expected {2}".format(
+                filename, len(offsets), filesize/sectorsize)
+        else:
+            logging.debug("%s: %d sectors touched", filename, len(offsets))
+
+
+    def check_result(self):
+        super(FioJobTest_t0023, self).check_result()
+
+        filesize = 1024*1024
+
+        self.check_trimwrite("basic_bw.log")
+        self.check_trimwrite("bs_bw.log")
+        self.check_trimwrite("bsrange_bw.log")
+        self.check_trimwrite("bssplit_bw.log")
+        self.check_trimwrite("basic_no_rm_bw.log")
+        self.check_trimwrite("bs_no_rm_bw.log")
+        self.check_trimwrite("bsrange_no_rm_bw.log")
+        self.check_trimwrite("bssplit_no_rm_bw.log")
+
+        self.check_all_offsets("basic_bw.log", 4096, filesize)
+        self.check_all_offsets("bs_bw.log", 8192, filesize)
+        self.check_all_offsets("bsrange_bw.log", 512, filesize)
+        self.check_all_offsets("bssplit_bw.log", 512, filesize)
+
+
+class FioJobTest_t0024(FioJobTest_t0023):
+    """Test consists of fio test job t0024 trimwrite test."""
+
     def check_result(self):
+        # call FioJobTest_t0023's parent to skip checks done by t0023
         super(FioJobTest_t0023, self).check_result()
 
-        self.check_seq("basic_bw.log")
-        self.check_seq("bs_bw.log")
-        self.check_seq("bsrange_bw.log")
-        self.check_seq("bssplit_bw.log")
-        self.check_seq("basic_no_rm_bw.log")
-        self.check_seq("bs_no_rm_bw.log")
-        self.check_seq("bsrange_no_rm_bw.log")
-        self.check_seq("bssplit_no_rm_bw.log")
+        filesize = 1024*1024
 
-        # TODO make sure all offsets were touched
+        self.check_trimwrite("basic_bw.log")
+        self.check_trimwrite("bs_bw.log")
+        self.check_trimwrite("bsrange_bw.log")
+        self.check_trimwrite("bssplit_bw.log")
+
+        self.check_all_offsets("basic_bw.log", 4096, filesize)
+        self.check_all_offsets("bs_bw.log", 8192, filesize)
+        self.check_all_offsets("bsrange_bw.log", 512, filesize)
+        self.check_all_offsets("bssplit_bw.log", 512, filesize)
 
 
 class FioJobTest_iops_rate(FioJobTest):
@@ -732,7 +816,7 @@ class FioJobTest_iops_rate(FioJobTest):
             self.passed = False
 
 
-class Requirements(object):
+class Requirements():
     """Requirements consists of multiple run environment characteristics.
     These are to determine if a particular test can be run"""
 
@@ -1090,6 +1174,15 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          24,
+        'test_class':       FioJobTest_t0024,
+        'job':              't0024.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

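The two checks added to `run-fio-tests.py` above, `check_trimwrite()` and `check_all_offsets()`, can be combined into one standalone function. This sketch assumes each log line has the form `time, value, ddir, bs, offset` (the same columns the test script parses), with ddir 1 meaning write and 2 meaning trim. `check_log()` is a hypothetical helper, not part of fio. It uses integer division (`//`) for sector indexes where the patch uses `/`:

```python
def check_log(lines, sectorsize, filesize):
    """Validate a fio bw log written with log_offset=1.

    Returns a list of problems; an empty list means the log passed.
    """
    problems = []
    sectors = set()
    prev = None  # (ddir, bs, offset) of the previous entry
    for line in lines:
        if not line.strip():
            continue
        vals = [v.strip() for v in line.split(',')]
        ddir, bs, offset = int(vals[2]), int(vals[3]), int(vals[4])
        # The first entry must be a trim; after that trims and writes alternate.
        expect = 2 if prev is None or prev[0] == 1 else 1
        if ddir != expect:
            problems.append("unexpected ddir: " + line)
            break
        # A write must reuse the size and offset of the preceding trim.
        if ddir == 1 and (bs, offset) != prev[1:]:
            problems.append("write does not match trim: " + line)
            break
        if offset % sectorsize or bs % sectorsize:
            problems.append("unaligned I/O: " + line)
            break
        first = offset // sectorsize
        sectors.update(range(first, first + bs // sectorsize))
        prev = (ddir, bs, offset)
    expected = filesize // sectorsize
    if not problems and len(sectors) != expected:
        problems.append("only %d of %d sectors touched" % (len(sectors), expected))
    return problems
```

Counting sectors in a set (rather than summing block sizes) means an offset that is trimmed and written more than once, as can happen with `norandommap=1`, is counted only once.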

* Recent changes (master)
@ 2022-10-05 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-05 12:00 UTC
  To: fio

The following changes since commit 793b868671d14f9a3e4fa76ac129545987084a8d:

  randtrimwrite: fix corner case with variable block sizes (2022-10-03 17:36:57 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0474b83f022f1f1cc14208c05b7ccda682e01263:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-10-04 14:25:09 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      Android: Fix the build of the 'sg' engine
      Android: Enable the 'sg' engine

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 Makefile     | 3 ++-
 engines/sg.c | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index f947f11c..7bd572d7 100644
--- a/Makefile
+++ b/Makefile
@@ -249,7 +249,8 @@ endif
 endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
-		oslib/linux-dev-lookup.c engines/io_uring.c engines/nvme.c
+		oslib/linux-dev-lookup.c engines/io_uring.c engines/nvme.c \
+		engines/sg.c
   cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
diff --git a/engines/sg.c b/engines/sg.c
index 72ee07ba..24783374 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -1331,10 +1331,12 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 			strlcat(msg, ". ", MAXERRDETAIL);
 		}
 		if (hdr->sb_len_wr) {
+			const uint8_t *const sbp = hdr->sbp;
+
 			snprintf(msgchunk, MAXMSGCHUNK, "Sense Data (%d bytes):", hdr->sb_len_wr);
 			strlcat(msg, msgchunk, MAXERRDETAIL);
 			for (i = 0; i < hdr->sb_len_wr; i++) {
-				snprintf(msgchunk, MAXMSGCHUNK, " %02x", hdr->sbp[i]);
+				snprintf(msgchunk, MAXMSGCHUNK, " %02x", sbp[i]);
 				strlcat(msg, msgchunk, MAXERRDETAIL);
 			}
 			strlcat(msg, ". ", MAXERRDETAIL);


* Recent changes (master)
@ 2022-10-04 12:00 Jens Axboe
From: Jens Axboe @ 2022-10-04 12:00 UTC
  To: fio

The following changes since commit c16dc793a3c45780f67ce65244b6e91323dee014:

  Add randtrimwrite data direction (2022-09-28 10:06:40 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 793b868671d14f9a3e4fa76ac129545987084a8d:

  randtrimwrite: fix corner case with variable block sizes (2022-10-03 17:36:57 -0400)

----------------------------------------------------------------
Anuj Gupta (1):
      engines/io_uring: add fixedbufs support for io_uring_cmd

Vincent Fu (4):
      randtrimwrite: write at same offset as trim
      test: test job for randtrimwrite
      randtrimwrite: fix offsets for corner case
      randtrimwrite: fix corner case with variable block sizes

 engines/io_uring.c |  4 +++
 io_ddir.h          |  2 ++
 io_u.c             | 34 +++++++++++++++++++++++--
 t/jobs/t0023.fio   | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 t/jobs/t0023.fio

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index d0fc61dc..c679177f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -433,6 +433,10 @@ static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
 		ld->prepped = 0;
 		sqe->flags |= IOSQE_ASYNC;
 	}
+	if (o->fixedbufs) {
+		sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;
+		sqe->buf_index = io_u->index;
+	}
 
 	cmd = (struct nvme_uring_cmd *)sqe->cmd;
 	return fio_nvme_uring_cmd_prep(cmd, io_u,
diff --git a/io_ddir.h b/io_ddir.h
index 7227e9ee..217eb628 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -52,6 +52,8 @@ enum td_ddir {
 #define file_randommap(td, f)	(!(td)->o.norandommap && fio_file_axmap((f)))
 #define td_trimwrite(td)	(((td)->o.td_ddir & TD_DDIR_TRIMWRITE) \
 					== TD_DDIR_TRIMWRITE)
+#define td_randtrimwrite(td)	(((td)->o.td_ddir & TD_DDIR_RANDTRIMWRITE) \
+					== TD_DDIR_RANDTRIMWRITE)
 
 static inline int ddir_sync(enum fio_ddir ddir)
 {
diff --git a/io_u.c b/io_u.c
index eec378dd..91f1a358 100644
--- a/io_u.c
+++ b/io_u.c
@@ -417,7 +417,13 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 
 	b = offset = -1ULL;
 
-	if (rw_seq) {
+	if (td_randtrimwrite(td) && ddir == DDIR_WRITE) {
+		/* don't mark randommap for these writes */
+		io_u_set(td, io_u, IO_U_F_BUSY_OK);
+		offset = f->last_start[DDIR_TRIM];
+		*is_random = true;
+		ret = 0;
+	} else if (rw_seq) {
 		if (td_random(td)) {
 			if (should_do_random(td, ddir)) {
 				ret = get_next_rand_block(td, f, ddir, &b);
@@ -507,6 +513,24 @@ static int get_next_offset(struct thread_data *td, struct io_u *io_u,
 		return 1;
 	}
 
+	/*
+	 * For randtrimwrite, we decide whether to issue a trim or a write
+	 * based on whether the offsets for the most recent trim and write
+	 * operations match. If they don't match that means we just issued a
+	 * new trim and the next operation should be a write. If they *do*
+	 * match that means we just completed a trim+write pair and the next
+	 * command should be a trim.
+	 *
+	 * This works fine for sequential workloads but for random workloads
+	 * it's possible to complete a trim+write pair and then have the next
+	 * randomly generated offset match the previous offset. If that happens
+	 * we need to alter the offset for the last write operation in order
+	 * to ensure that we issue a write operation the next time through.
+	 */
+	if (td_randtrimwrite(td) && ddir == DDIR_TRIM &&
+	    f->last_start[DDIR_TRIM] == io_u->offset)
+		f->last_start[DDIR_WRITE]--;
+
 	io_u->verify_offset = io_u->offset;
 	return 0;
 }
@@ -530,6 +554,12 @@ static unsigned long long get_next_buflen(struct thread_data *td, struct io_u *i
 
 	assert(ddir_rw(ddir));
 
+	if (td_randtrimwrite(td) && ddir == DDIR_WRITE) {
+		struct fio_file *f = io_u->file;
+
+		return f->last_pos[DDIR_TRIM] - f->last_start[DDIR_TRIM];
+	}
+
 	if (td->o.bs_is_seq_rand)
 		ddir = is_random ? DDIR_WRITE : DDIR_READ;
 
@@ -768,7 +798,7 @@ static void set_rw_ddir(struct thread_data *td, struct io_u *io_u)
 
 	if (td_trimwrite(td)) {
 		struct fio_file *f = io_u->file;
-		if (f->last_pos[DDIR_WRITE] == f->last_pos[DDIR_TRIM])
+		if (f->last_start[DDIR_WRITE] == f->last_start[DDIR_TRIM])
 			ddir = DDIR_TRIM;
 		else
 			ddir = DDIR_WRITE;
diff --git a/t/jobs/t0023.fio b/t/jobs/t0023.fio
new file mode 100644
index 00000000..0250ee1a
--- /dev/null
+++ b/t/jobs/t0023.fio
@@ -0,0 +1,75 @@
+# randtrimwrite data direction tests
+[global]
+filesize=1M
+ioengine=null
+rw=randtrimwrite
+log_offset=1
+per_job_logs=0
+randrepeat=0
+stonewall
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets touched
+# 			block sizes match
+# Buggy result: 	something else
+[basic]
+write_bw_log
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes 8k for both write and trim
+# Buggy result: 	something else
+[bs]
+write_bw_log
+bs=4k,4k,8k
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bsrange]
+write_bw_log
+bsrange=512-4k
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bssplit]
+write_bw_log
+bsrange=512/25:1k:25:2k:25:4k/25
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets touched
+# 			block sizes match
+# Buggy result: 	something else
+[basic_no_rm]
+write_bw_log
+norandommap=1
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes 8k for both write and trim
+# Buggy result: 	something else
+[bs_no_rm]
+write_bw_log
+bs=4k,4k,8k
+norandommap=1
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bsrange_no_rm]
+write_bw_log
+bsrange=512-4k
+norandommap=1
+
+# Expected result: 	trim issued to random offset followed by write to same offset
+# 			all offsets trimmed
+# 			block sizes match
+# Buggy result: 	something else
+[bssplit_no_rm]
+write_bw_log
+bsrange=512/25:1k:25:2k:25:4k/25
+norandommap=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index e72fa2a0..a2b036d9 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -649,6 +649,61 @@ class FioJobTest_t0022(FioJobTest):
             self.failure_reason += " no duplicate offsets found with norandommap=1".format(len(offsets))
 
 
+class FioJobTest_t0023(FioJobTest):
+    """Test consists of fio test job t0023"""
+
+    def check_seq(self, filename):
+        bw_log_filename = os.path.join(self.test_dir, filename)
+        file_data, success = self.get_file(bw_log_filename)
+        log_lines = file_data.split('\n')
+
+        prev_ddir = 1
+        for line in log_lines:
+            if len(line.strip()) == 0:
+                continue
+            vals = line.split(',')
+            ddir = int(vals[2])
+            bs = int(vals[3])
+            offset = int(vals[4])
+            if prev_ddir == 1:
+                if ddir != 2:
+                    self.passed = False
+                    self.failure_reason += " {0}: write not preceeded by trim: {1}".format(bw_log_filename, line)
+                    break
+            else:
+                if ddir != 1:
+                    self.passed = False
+                    self.failure_reason += " {0}: trim not preceeded by write: {1}".format(bw_log_filename, line)
+                    break
+                else:
+                    if prev_bs != bs:
+                        self.passed = False
+                        self.failure_reason += " {0}: block size does not match: {1}".format(bw_log_filename, line)
+                        break
+                    if prev_offset != offset:
+                        self.passed = False
+                        self.failure_reason += " {0}: offset does not match: {1}".format(bw_log_filename, line)
+                        break
+            prev_ddir = ddir
+            prev_bs = bs
+            prev_offset = offset
+
+
+    def check_result(self):
+        super(FioJobTest_t0023, self).check_result()
+
+        self.check_seq("basic_bw.log")
+        self.check_seq("bs_bw.log")
+        self.check_seq("bsrange_bw.log")
+        self.check_seq("bssplit_bw.log")
+        self.check_seq("basic_no_rm_bw.log")
+        self.check_seq("bs_no_rm_bw.log")
+        self.check_seq("bsrange_no_rm_bw.log")
+        self.check_seq("bssplit_no_rm_bw.log")
+
+        # TODO make sure all offsets were touched
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -1026,6 +1081,15 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          23,
+        'test_class':       FioJobTest_t0023,
+        'job':              't0023.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
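
The `check_seq()` hunk walks a bandwidth log and requires every trim to be followed immediately by a write with the same block size and offset. Below is a minimal standalone sketch of that pairing rule. It assumes the log has already been parsed into `(ddir, bs, offset)` tuples, with 1 meaning write and 2 meaning trim, as in the checker. `check_trimwrite_pairs` is a hypothetical name and is not part of run-fio-tests.py.

```python
# Hypothetical helper sketching the trim->write pairing rule of check_seq().
WRITE, TRIM = 1, 2  # ddir values as compared by the t0023 checker


def check_trimwrite_pairs(entries):
    """Return None if every trim is immediately followed by a write with the
    same block size and offset, else a description of the first violation.
    `entries` is a list of (ddir, bs, offset) tuples."""
    prev = None  # previous (ddir, bs, offset), None before the first entry
    for i, (ddir, bs, offset) in enumerate(entries):
        if prev is None or prev[0] == WRITE:
            # A new pair must start with a trim.
            if ddir != TRIM:
                return f"entry {i}: expected trim, got ddir {ddir}"
        else:
            # Previous entry was a trim, so this must be its matching write.
            if ddir != WRITE:
                return f"entry {i}: trim not followed by write"
            if bs != prev[1]:
                return f"entry {i}: block size does not match"
            if offset != prev[2]:
                return f"entry {i}: offset does not match"
        prev = (ddir, bs, offset)
    return None
```

Like the checker, this sketch does not flag a trailing trim that has no write after it.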


* Recent changes (master)
@ 2022-09-29 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-29 12:00 UTC
  To: fio

The following changes since commit 6112c0f5a86c6b437e7158ab40a6e9384ce95e85:

  doc: build manpage from fio_doc.rst instead of fio_man.rst (2022-09-27 11:58:25 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c16dc793a3c45780f67ce65244b6e91323dee014:

  Add randtrimwrite data direction (2022-09-28 10:06:40 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      t/io_uring: get rid of useless read barriers
      Add randtrimwrite data direction

 HOWTO.rst    |  3 +++
 fio.1        |  5 +++++
 io_ddir.h    |  4 +++-
 options.c    |  4 ++++
 t/io_uring.c | 10 ++++++----
 5 files changed, 21 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 924f5ed9..e89d05f0 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1134,6 +1134,9 @@ I/O type
 				write 64K bytes on the same trimmed blocks. This behaviour
 				will be consistent with ``number_ios`` or other Fio options
 				limiting the total bytes or number of I/O's.
+		**randtrimwrite**
+				Like trimwrite, but uses random offsets rather
+				than sequential writes.
 
 	Fio defaults to read if the option is not specified.  For the mixed I/O
 	types, the default is to split them 50/50.  For certain types of I/O the
diff --git a/fio.1 b/fio.1
index 39d6b4f4..4324a975 100644
--- a/fio.1
+++ b/fio.1
@@ -904,6 +904,11 @@ then the same blocks will be written to. So if `io_size=64K' is specified,
 Fio will trim a total of 64K bytes and also write 64K bytes on the same
 trimmed blocks. This behaviour will be consistent with `number_ios' or
 other Fio options limiting the total bytes or number of I/O's.
+.TP
+.B randtrimwrite
+Like
+.B trimwrite ,
+but uses random offsets rather than sequential writes.
 .RE
 .P
 Fio defaults to read if the option is not specified. For the mixed I/O
diff --git a/io_ddir.h b/io_ddir.h
index 296a9d04..7227e9ee 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -41,6 +41,7 @@ enum td_ddir {
 	TD_DDIR_RANDRW		= TD_DDIR_RW | TD_DDIR_RAND,
 	TD_DDIR_RANDTRIM	= TD_DDIR_TRIM | TD_DDIR_RAND,
 	TD_DDIR_TRIMWRITE	= TD_DDIR_TRIM | TD_DDIR_WRITE,
+	TD_DDIR_RANDTRIMWRITE	= TD_DDIR_RANDTRIM | TD_DDIR_WRITE,
 };
 
 #define td_read(td)		((td)->o.td_ddir & TD_DDIR_READ)
@@ -67,7 +68,8 @@ static inline const char *ddir_str(enum td_ddir ddir)
 {
 	static const char *__str[] = { NULL, "read", "write", "rw", "rand",
 				"randread", "randwrite", "randrw",
-				"trim", NULL, "trimwrite", NULL, "randtrim" };
+				"trim", NULL, "trimwrite", NULL, "randtrim",
+				NULL, "randtrimwrite" };
 
 	return __str[ddir];
 }
diff --git a/options.c b/options.c
index 5d3daedf..a668b0e4 100644
--- a/options.c
+++ b/options.c
@@ -1947,6 +1947,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = TD_DDIR_TRIMWRITE,
 			    .help = "Trim and write mix, trims preceding writes"
 			  },
+			  { .ival = "randtrimwrite",
+			    .oval = TD_DDIR_RANDTRIMWRITE,
+			    .help = "Randomly trim and write mix, trims preceding writes"
+			  },
 		},
 	},
 	{
diff --git a/t/io_uring.c b/t/io_uring.c
index b9353ac8..edbacee3 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -661,8 +661,12 @@ static void init_io_pt(struct submitter *s, unsigned index)
 static int prep_more_ios_uring(struct submitter *s, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
-	unsigned index, tail, next_tail, prepped = 0;
-	unsigned int head = atomic_load_acquire(ring->head);
+	unsigned head, index, tail, next_tail, prepped = 0;
+
+	if (sq_thread_poll)
+		head = atomic_load_acquire(ring->head);
+	else
+		head = *ring->head;
 
 	next_tail = tail = *ring->tail;
 	do {
@@ -741,7 +745,6 @@ static int reap_events_uring(struct submitter *s)
 	do {
 		struct file *f;
 
-		read_barrier();
 		if (head == atomic_load_acquire(ring->tail))
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
@@ -796,7 +799,6 @@ static int reap_events_uring_pt(struct submitter *s)
 	do {
 		struct file *f;
 
-		read_barrier();
 		if (head == atomic_load_acquire(ring->tail))
 			break;
 		index = head & cq_ring_mask;
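
The new `TD_DDIR_RANDTRIMWRITE` value is the randtrim bits plus the write bit. Because `ddir_str()` indexes its string table by that numeric value, two more slots (one of them `NULL`) had to be added. Below is a small Python model of the same encoding. The bit values (read=1, write=2, rand=4, trim=8) are inferred from the positions in the `ddir_str()` table rather than copied from io_ddir.h.

```python
from enum import IntFlag


class TdDdir(IntFlag):
    # Bit values inferred from the ddir_str() table positions (assumption).
    READ = 1
    WRITE = 2
    RAND = 4
    TRIM = 8


TRIMWRITE = TdDdir.TRIM | TdDdir.WRITE
RANDTRIM = TdDdir.TRIM | TdDdir.RAND
RANDTRIMWRITE = RANDTRIM | TdDdir.WRITE

# Python mirror of the __str[] table: the index is the numeric ddir value.
DDIR_STR = [None, "read", "write", "rw", "rand",
            "randread", "randwrite", "randrw",
            "trim", None, "trimwrite", None, "randtrim",
            None, "randtrimwrite"]


def ddir_str(ddir):
    """Return the option name for a data direction, or None if invalid."""
    return DDIR_STR[int(ddir)]
```

Slot 13 (trim | rand | read) stays `NULL` because that combination is not a valid data direction.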


* Recent changes (master)
@ 2022-09-23 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-23 12:00 UTC
  To: fio

The following changes since commit d14687025c0c61d047e4252036d1b024d62cb0a6:

  configure: change grep -P to grep -E (2022-09-19 09:42:14 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0574e8c3b2b47e1e2564c2f50ea0b6f2629f2e48:

  arm64: ensure CPU clock retrieval issues isb() (2022-09-22 10:03:51 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      arm64: ensure CPU clock retrieval issues isb()

 arch/arch-aarch64.h | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 951d1718..919e5796 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -27,10 +27,13 @@ static inline int arch_ffz(unsigned long bitmask)
 
 #define ARCH_HAVE_FFZ
 
+#define isb()	asm volatile("isb" : : : "memory")
+
 static inline unsigned long long get_cpu_clock(void)
 {
 	unsigned long val;
 
+	isb();
 	asm volatile("mrs %0, cntvct_el0" : "=r" (val));
 	return val;
 }


* Recent changes (master)
@ 2022-09-20 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-20 12:00 UTC
  To: fio

The following changes since commit 5932cf0f2a03396b5f3f0b4667f5e66f7d8477e5:

  Merge branch 'fix-example-disk-zone-profile' of github.com:cvubrugier/fio (2022-09-15 11:02:49 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d14687025c0c61d047e4252036d1b024d62cb0a6:

  configure: change grep -P to grep -E (2022-09-19 09:42:14 -0400)

----------------------------------------------------------------
Vincent Fu (2):
      gettime: cleanups
      configure: change grep -P to grep -E

 configure | 2 +-
 gettime.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 7741ef4f..546541a2 100755
--- a/configure
+++ b/configure
@@ -117,7 +117,7 @@ has() {
 }
 
 num() {
-  echo "$1" | grep -P -q "^[0-9]+$"
+  echo "$1" | grep -E -q "^[0-9]+$"
 }
 
 check_define() {
diff --git a/gettime.c b/gettime.c
index 14462420..8993be16 100644
--- a/gettime.c
+++ b/gettime.c
@@ -313,7 +313,7 @@ static int calibrate_cpu_clock(void)
 
 	max_ticks = MAX_CLOCK_SEC * cycles_per_msec * 1000ULL;
 	max_mult = ULLONG_MAX / max_ticks;
-	dprint(FD_TIME, "\n\nmax_ticks=%llu, __builtin_clzll=%d, "
+	dprint(FD_TIME, "max_ticks=%llu, __builtin_clzll=%d, "
 			"max_mult=%llu\n", max_ticks,
 			__builtin_clzll(max_ticks), max_mult);
 
@@ -335,7 +335,7 @@ static int calibrate_cpu_clock(void)
 
 	/*
 	 * Find the greatest power of 2 clock ticks that is less than the
-	 * ticks in MAX_CLOCK_SEC_2STAGE
+	 * ticks in MAX_CLOCK_SEC
 	 */
 	max_cycles_shift = max_cycles_mask = 0;
 	tmp = MAX_CLOCK_SEC * 1000ULL * cycles_per_msec;
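
The gettime.c comment fix makes the text match the code: the power-of-two search runs over the ticks in `MAX_CLOCK_SEC`. Below is a sketch of that arithmetic in Python. It assumes the loop after `tmp = ...` shifts `tmp` right until one bit remains, because the loop body is not part of this diff. The numbers in the test are hypothetical (a 3 GHz clock and a one-hour window), not fio's defaults.

```python
ULLONG_MAX = 2**64 - 1


def calibration_limits(max_clock_sec, cycles_per_msec):
    """Model of the limits computed in calibrate_cpu_clock().
    Returns (max_ticks, max_mult, max_cycles_shift, max_cycles_mask)."""
    max_ticks = max_clock_sec * cycles_per_msec * 1000
    # Largest multiplier that cannot overflow 64 bits over max_ticks.
    max_mult = ULLONG_MAX // max_ticks

    # Assumed loop: find the greatest power of two not exceeding the ticks
    # in max_clock_sec by shifting right until a single bit is left.
    shift = mask = 0
    tmp = max_clock_sec * 1000 * cycles_per_msec
    while tmp > 1:
        tmp >>= 1
        shift += 1
        mask = (mask << 1) | 1
    return max_ticks, max_mult, shift, mask
```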


* Recent changes (master)
@ 2022-09-16 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-16 12:00 UTC
  To: fio

The following changes since commit 08996af41b2566565cbcdee71766030a2c8ba377:

  backend: number of ios not as expected for trimwrite (2022-09-13 15:03:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5932cf0f2a03396b5f3f0b4667f5e66f7d8477e5:

  Merge branch 'fix-example-disk-zone-profile' of github.com:cvubrugier/fio (2022-09-15 11:02:49 -0400)

----------------------------------------------------------------
Christophe Vu-Brugier (2):
      examples: set zonemode to strided in disk-zone-profile.fio
      examples: fix bandwidth logs generation in disk-zone-profile.fio

Vincent Fu (2):
      Merge branch 'master' of github.com:uniontech-lilinjie/fio
      Merge branch 'fix-example-disk-zone-profile' of github.com:cvubrugier/fio

lilinjie (1):
      fix spelling error

 examples/disk-zone-profile.fio | 9 ++++++---
 fio.1                          | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/examples/disk-zone-profile.fio b/examples/disk-zone-profile.fio
index 96e56695..577820eb 100644
--- a/examples/disk-zone-profile.fio
+++ b/examples/disk-zone-profile.fio
@@ -1,4 +1,4 @@
-; Read disk in zones of 128m/2g, generating a plot of that afterwards
+; Read disk in zones of 256m/2g. Generating a plot of that afterwards
 ; should give a nice picture of the zoning of this drive
 
 [global]
@@ -7,8 +7,11 @@ direct=1
 rw=read
 ioengine=libaio
 iodepth=2
+zonemode=strided
 zonesize=256m
 zoneskip=2g
-write_bw_log
 
-[/dev/sdb]
+[disk-zone-profile]
+filename=/dev/sdb
+write_bw_log
+log_offset=1
diff --git a/fio.1 b/fio.1
index c67bd464..39d6b4f4 100644
--- a/fio.1
+++ b/fio.1
@@ -2491,11 +2491,11 @@ Specify the label or UUID of the DAOS pool to connect to.
 Specify the label or UUID of the DAOS container to open.
 .TP
 .BI (dfs)chunk_size
-Specificy a different chunk size (in bytes) for the dfs file.
+Specify a different chunk size (in bytes) for the dfs file.
 Use DAOS container's chunk size by default.
 .TP
 .BI (dfs)object_class
-Specificy a different object class for the dfs file.
+Specify a different object class for the dfs file.
 Use DAOS container's object class by default.
 .TP
 .BI (nfs)nfs_url
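
With `zonemode=strided`, the example job now reads 256m, skips 2g, and repeats across the device, which is what the corrected comment describes. Below is a rough model of which byte ranges get read. It assumes `zonerange` defaults to `zonesize` when unset and that each new zone starts `zonerange + zoneskip` bytes after the previous one. `strided_zones` is a hypothetical helper, not fio code.

```python
def strided_zones(dev_size, zonesize, zoneskip, zonerange=None):
    """Hypothetical model of the (start, end) byte ranges read by a
    zonemode=strided job that makes a single pass over the device."""
    if zonerange is None:
        zonerange = zonesize  # assumed default when zonerange is unset
    zones = []
    start = 0
    while start < dev_size:
        # Transfer zonesize bytes (clipped to the device end) ...
        zones.append((start, min(start + zonesize, dev_size)))
        # ... then move past the rest of the zone plus the skip.
        start += zonerange + zoneskip
    return zones
```

For a 10 GiB device this samples one 256 MiB stripe every 2.25 GiB, which gives five points on the resulting bandwidth plot.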


* Recent changes (master)
@ 2022-09-14 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-14 12:00 UTC
  To: fio

The following changes since commit 53c82bb879532b994451c6abc7be80c94241d03b:

  stat: fix comment about memory consumption (2022-09-12 10:45:56 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 08996af41b2566565cbcdee71766030a2c8ba377:

  backend: number of ios not as expected for trimwrite (2022-09-13 15:03:21 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      backend: number of ios not as expected for trimwrite

 HOWTO.rst | 6 +++++-
 backend.c | 6 ++++--
 fio.1     | 5 ++++-
 3 files changed, 13 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 2c6c6dbe..924f5ed9 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1129,7 +1129,11 @@ I/O type
 				Random mixed reads and writes.
 		**trimwrite**
 				Sequential trim+write sequences. Blocks will be trimmed first,
-				then the same blocks will be written to.
+				then the same blocks will be written to. So if ``io_size=64K``
+				is specified, Fio will trim a total of 64K bytes and also
+				write 64K bytes on the same trimmed blocks. This behaviour
+				will be consistent with ``number_ios`` or other Fio options
+				limiting the total bytes or number of I/O's.
 
 	Fio defaults to read if the option is not specified.  For the mixed I/O
 	types, the default is to split them 50/50.  For certain types of I/O the
diff --git a/backend.c b/backend.c
index fe614f6e..ec535bcc 100644
--- a/backend.c
+++ b/backend.c
@@ -971,9 +971,11 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		total_bytes += td->o.size;
 
 	/* In trimwrite mode, each byte is trimmed and then written, so
-	 * allow total_bytes to be twice as big */
-	if (td_trimwrite(td))
+	 * allow total_bytes or number of ios to be twice as big */
+	if (td_trimwrite(td)) {
 		total_bytes += td->total_io_size;
+		td->o.number_ios *= 2;
+	}
 
 	while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
 		(!flist_empty(&td->trim_list)) || !io_issue_bytes_exceeded(td) ||
diff --git a/fio.1 b/fio.1
index 67d7c710..c67bd464 100644
--- a/fio.1
+++ b/fio.1
@@ -900,7 +900,10 @@ Random mixed reads and writes.
 .TP
 .B trimwrite
 Sequential trim+write sequences. Blocks will be trimmed first,
-then the same blocks will be written to.
+then the same blocks will be written to. So if `io_size=64K' is specified,
+Fio will trim a total of 64K bytes and also write 64K bytes on the same
+trimmed blocks. This behaviour will be consistent with `number_ios' or
+other Fio options limiting the total bytes or number of I/O's.
 .RE
 .P
 Fio defaults to read if the option is not specified. For the mixed I/O
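
The backend.c change doubles `number_ios` for trimwrite so that, like the byte budget, it counts both the trim and the write of each block. Below is a simplified model of the resulting I/O sequence for a sequential trimwrite job. It ignores queue depth, verification, and fio's real offset logic, and `trimwrite_sequence` is a hypothetical name.

```python
def trimwrite_sequence(io_size, bs, number_ios=0):
    """Hypothetical model: the (op, offset) list a sequential trimwrite job
    would issue if each block is trimmed and then written.
    number_ios=0 is treated as 'no I/O count limit'."""
    limit_bytes = 2 * io_size  # trims and writes both count toward bytes
    limit_ios = 2 * number_ios if number_ios else None  # doubled by the fix
    ops, done_bytes, offset = [], 0, 0
    while done_bytes < limit_bytes and (limit_ios is None or len(ops) < limit_ios):
        ops.append(("trim", offset))
        ops.append(("write", offset))
        done_bytes += 2 * bs
        offset += bs
    return ops
```

With `io_size=64K` and `bs=4K` this yields 16 trims and 16 writes on the same blocks, matching the HOWTO text, and `number_ios=4` yields four of each.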


* Recent changes (master)
@ 2022-09-13 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-13 12:00 UTC
  To: fio

The following changes since commit 10fc06dc4166ef7c69a6c06cb3a318878048f6be:

  Merge branch 'rpma-add-support-for-libpmem2-to-the-librpma-engine' of https://github.com/ldorau/fio (2022-09-06 06:58:48 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 53c82bb879532b994451c6abc7be80c94241d03b:

  stat: fix comment about memory consumption (2022-09-12 10:45:56 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      stat: fix comment about memory consumption

 stat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.h b/stat.h
index eb7845af..4c3bf71f 100644
--- a/stat.h
+++ b/stat.h
@@ -51,7 +51,7 @@ struct group_run_stats {
  *
  * FIO_IO_U_PLAT_GROUP_NR and FIO_IO_U_PLAT_BITS determine the memory
  * requirement of storing those aggregate counts. The memory used will
- * be (FIO_IO_U_PLAT_GROUP_NR * 2^FIO_IO_U_PLAT_BITS) * sizeof(int)
+ * be (FIO_IO_U_PLAT_GROUP_NR * 2^FIO_IO_U_PLAT_BITS) * sizeof(uint64_t)
  * bytes.
  *
  * FIO_IO_U_PLAT_NR is the total number of buckets.
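
The corrected stat.h comment gives the memory for one array of bucket counters as `(FIO_IO_U_PLAT_GROUP_NR * 2^FIO_IO_U_PLAT_BITS) * sizeof(uint64_t)`. Below is a quick calculator for that formula. The example values 29 and 6 are illustrative, so check stat.h in your tree for the actual constants.

```python
import ctypes


def plat_memory_bytes(group_nr, plat_bits,
                      elem_size=ctypes.sizeof(ctypes.c_uint64)):
    """Bytes needed for group_nr * 2^plat_bits counters of elem_size bytes,
    per the formula in the stat.h comment."""
    return group_nr * (1 << plat_bits) * elem_size
```

With those illustrative values the old `sizeof(int)` wording would have understated the footprint by half (7424 vs 14848 bytes) on platforms with a 4-byte `int`.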


* Recent changes (master)
@ 2022-09-07 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-07 12:00 UTC
  To: fio

The following changes since commit 021ce718f5ae4bfd5f4e42290993578adb7c7bd5:

  t/io_uring: enable support for registered buffers for passthrough (2022-09-03 11:04:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 10fc06dc4166ef7c69a6c06cb3a318878048f6be:

  Merge branch 'rpma-add-support-for-libpmem2-to-the-librpma-engine' of https://github.com/ldorau/fio (2022-09-06 06:58:48 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'ci-build-the-librpma-fio-engine' of https://github.com/ldorau/fio
      Merge branch 'rpma-add-support-for-libpmem2-to-the-librpma-engine' of https://github.com/ldorau/fio

Kacper Stefanski (2):
      rpma: add support for libpmem2 to librpma engine in APM mode
      rpma: add support for libpmem2 to librpma engine in GPSPM mode

Lukasz Dorau (3):
      rpma: simplify server_cmpl_process()
      ci: build the librpma fio engine
      ci: remove the unused travis-install-pmdk.sh file

 Makefile                                           | 12 ++-
 ...stall-librpma.sh => actions-install-librpma.sh} |  3 +-
 ci/actions-install.sh                              |  6 ++
 ci/travis-install-pmdk.sh                          | 29 -------
 configure                                          | 29 ++++++-
 engines/librpma_fio.c                              | 52 ++++---------
 engines/librpma_fio.h                              |  7 +-
 engines/librpma_fio_pmem.h                         | 67 ++++++++++++++++
 engines/librpma_fio_pmem2.h                        | 91 ++++++++++++++++++++++
 engines/librpma_gpspm.c                            | 59 ++++++++------
 10 files changed, 257 insertions(+), 98 deletions(-)
 rename ci/{travis-install-librpma.sh => actions-install-librpma.sh} (74%)
 delete mode 100755 ci/travis-install-pmdk.sh
 create mode 100644 engines/librpma_fio_pmem.h
 create mode 100644 engines/librpma_fio_pmem2.h

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 634d2c93..f947f11c 100644
--- a/Makefile
+++ b/Makefile
@@ -111,13 +111,21 @@ endif
 ifdef CONFIG_LIBRPMA_APM
   librpma_apm_SRCS = engines/librpma_apm.c
   librpma_fio_SRCS = engines/librpma_fio.c
-  librpma_apm_LIBS = -lrpma -lpmem
+  ifdef CONFIG_LIBPMEM2_INSTALLED
+    librpma_apm_LIBS = -lrpma -lpmem2
+  else
+    librpma_apm_LIBS = -lrpma -lpmem
+  endif
   ENGINES += librpma_apm
 endif
 ifdef CONFIG_LIBRPMA_GPSPM
   librpma_gpspm_SRCS = engines/librpma_gpspm.c engines/librpma_gpspm_flush.pb-c.c
   librpma_fio_SRCS = engines/librpma_fio.c
-  librpma_gpspm_LIBS = -lrpma -lpmem -lprotobuf-c
+  ifdef CONFIG_LIBPMEM2_INSTALLED
+    librpma_gpspm_LIBS = -lrpma -lpmem2 -lprotobuf-c
+  else
+    librpma_gpspm_LIBS = -lrpma -lpmem -lprotobuf-c
+  endif
   ENGINES += librpma_gpspm
 endif
 ifdef librpma_fio_SRCS
diff --git a/ci/travis-install-librpma.sh b/ci/actions-install-librpma.sh
similarity index 74%
rename from ci/travis-install-librpma.sh
rename to ci/actions-install-librpma.sh
index 4e5ed21d..31f9f712 100755
--- a/ci/travis-install-librpma.sh
+++ b/ci/actions-install-librpma.sh
@@ -1,7 +1,6 @@
 #!/bin/bash -e
 
-# 11.02.2021 Merge pull request #866 from ldorau/rpma-mmap-memory-for-rpma_mr_reg-in-rpma_flush_apm_new
-LIBRPMA_VERSION=fbac593917e98f3f26abf14f4fad5a832b330f5c
+LIBRPMA_VERSION="1.0.0"
 ZIP_FILE=rpma.zip
 
 WORKDIR=$(pwd)
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index c209a089..82e14d2a 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -44,7 +44,9 @@ DPKGCFG
                 libiscsi-dev
                 libnbd-dev
                 libpmem-dev
+                libpmem2-dev
                 libpmemblk-dev
+                libprotobuf-c-dev
                 librbd-dev
                 libtcmalloc-minimal4
                 nvidia-cuda-dev
@@ -67,6 +69,10 @@ DPKGCFG
     sudo apt-get -qq update
     echo "Installing packages... ${pkgs[@]}"
     sudo apt-get install -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
+    if [ "${CI_TARGET_ARCH}" == "x86_64" ]; then
+        # install librpma from sources
+        ci/actions-install-librpma.sh
+    fi
 }
 
 install_linux() {
diff --git a/ci/travis-install-pmdk.sh b/ci/travis-install-pmdk.sh
deleted file mode 100755
index 7bde9fd0..00000000
--- a/ci/travis-install-pmdk.sh
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/bin/bash -e
-
-# pmdk v1.9.1 release
-PMDK_VERSION=1.9.1
-
-WORKDIR=$(pwd)
-
-#
-# The '/bin/sh' shell used by PMDK's 'make install'
-# does not know the exact localization of clang
-# and fails with:
-#    /bin/sh: 1: clang: not found
-# if CC is not set to the full path of clang.
-#
-CC=$(type -P "$CC")
-export CC
-
-# Install PMDK libraries, because PMDK's libpmem
-# is a dependency of the librpma fio engine.
-# Install it from a release package
-# with already generated documentation,
-# in order to not install 'pandoc'.
-wget https://github.com/pmem/pmdk/releases/download/${PMDK_VERSION}/pmdk-${PMDK_VERSION}.tar.gz
-tar -xzf pmdk-${PMDK_VERSION}.tar.gz
-cd pmdk-${PMDK_VERSION}
-make -j"$(nproc)" NDCTL_ENABLE=n
-sudo make -j"$(nproc)" install prefix=/usr NDCTL_ENABLE=n
-cd "$WORKDIR"
-rm -rf pmdk-${PMDK_VERSION}
diff --git a/configure b/configure
index a2b9bd4c..7741ef4f 100755
--- a/configure
+++ b/configure
@@ -2201,6 +2201,26 @@ EOF
 fi
 print_config "libpmem1_5" "$libpmem1_5"
 
+##########################################
+# Check whether we have libpmem2
+if test "$libpmem2" != "yes" ; then
+  libpmem2="no"
+fi
+cat > $TMPC << EOF
+#include <libpmem2.h>
+int main(int argc, char **argv)
+{
+  struct pmem2_config *cfg;
+  pmem2_config_new(&cfg);
+  pmem2_config_delete(&cfg);
+  return 0;
+}
+EOF
+if compile_prog "" "-lpmem2" "libpmem2"; then
+  libpmem2="yes"
+fi
+print_config "libpmem2" "$libpmem2"
+
 ##########################################
 # Check whether we have libpmemblk
 # libpmem is a prerequisite
@@ -2990,11 +3010,13 @@ if test "$libverbs" = "yes" -a "$rdmacm" = "yes" ; then
 fi
 # librpma is supported on the 'x86_64' architecture for now
 if test "$cpu" = "x86_64" -a "$libverbs" = "yes" -a "$rdmacm" = "yes" \
-    -a "$librpma" = "yes" -a "$libpmem" = "yes" ; then
+    -a "$librpma" = "yes" \
+    && test "$libpmem" = "yes" -o "$libpmem2" = "yes" ; then
   output_sym "CONFIG_LIBRPMA_APM"
 fi
 if test "$cpu" = "x86_64" -a "$libverbs" = "yes" -a "$rdmacm" = "yes" \
-    -a "$librpma" = "yes" -a "$libpmem" = "yes" -a "$libprotobuf_c" = "yes" ; then
+    -a "$librpma" = "yes" -a "$libprotobuf_c" = "yes" \
+    && test "$libpmem" = "yes" -o "$libpmem2" = "yes" ; then
   output_sym "CONFIG_LIBRPMA_GPSPM"
 fi
 if test "$clock_gettime" = "yes" ; then
@@ -3138,6 +3160,9 @@ fi
 if test "$pmem" = "yes" ; then
   output_sym "CONFIG_LIBPMEM"
 fi
+if test "$libpmem2" = "yes" ; then
+  output_sym "CONFIG_LIBPMEM2_INSTALLED"
+fi
 if test "$libime" = "yes" ; then
   output_sym "CONFIG_IME"
 fi
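
After this change, configure enables the librpma engines when either libpmem or libpmem2 is found. The Makefile then links `-lpmem2` whenever `CONFIG_LIBPMEM2_INSTALLED` is set. Below is a Python model of that decision, useful for reasoning about which libraries a given build would link. `librpma_link_libs` is a hypothetical helper, and the names in `have` stand for the configure variables of the same names.

```python
def librpma_link_libs(cpu, have):
    """Hypothetical model of configure + Makefile logic for the librpma
    engines. `have` is the set of libraries configure detected, e.g.
    {"libverbs", "rdmacm", "librpma", "libpmem2", "libprotobuf_c"}.
    Returns {engine_name: link_flags} for each engine that would be built."""
    pmem_ok = "libpmem" in have or "libpmem2" in have
    if not (cpu == "x86_64"
            and {"libverbs", "rdmacm", "librpma"} <= have
            and pmem_ok):
        return {}
    # CONFIG_LIBPMEM2_INSTALLED takes precedence over plain libpmem.
    pmem_lib = "-lpmem2" if "libpmem2" in have else "-lpmem"
    engines = {"librpma_apm": ["-lrpma", pmem_lib]}
    if "libprotobuf_c" in have:
        engines["librpma_gpspm"] = ["-lrpma", pmem_lib, "-lprotobuf-c"]
    return engines
```

Note that configure splits the condition into two `test` invocations joined by `&&`. Inside a single `test`, `-a` binds tighter than `-o`, so appending `-o "$libpmem2" = "yes"` to the existing `-a` chain would have enabled the engines whenever libpmem2 alone was present.
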
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index a78a1e57..42d6163e 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -1,7 +1,7 @@
 /*
  * librpma_fio: librpma_apm and librpma_gpspm engines' common part.
  *
- * Copyright 2021, Intel Corporation
+ * Copyright 2021-2022, Intel Corporation
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License,
@@ -13,9 +13,11 @@
  * GNU General Public License for more details.
  */
 
-#include "librpma_fio.h"
-
-#include <libpmem.h>
+#ifdef CONFIG_LIBPMEM2_INSTALLED
+#include "librpma_fio_pmem2.h"
+#else
+#include "librpma_fio_pmem.h"
+#endif /* CONFIG_LIBPMEM2_INSTALLED */
 
 struct fio_option librpma_fio_options[] = {
 	{
@@ -111,10 +113,8 @@ char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
 char *librpma_fio_allocate_pmem(struct thread_data *td, struct fio_file *f,
 		size_t size, struct librpma_fio_mem *mem)
 {
-	size_t size_mmap = 0;
-	char *mem_ptr = NULL;
-	int is_pmem = 0;
 	size_t ws_offset;
+	mem->mem_ptr = NULL;
 
 	if (size % page_size) {
 		log_err("fio: size (%zu) is not aligned to page size (%zu)\n",
@@ -135,48 +135,24 @@ char *librpma_fio_allocate_pmem(struct thread_data *td, struct fio_file *f,
 		return NULL;
 	}
 
-	/* map the file */
-	mem_ptr = pmem_map_file(f->file_name, 0 /* len */, 0 /* flags */,
-			0 /* mode */, &size_mmap, &is_pmem);
-	if (mem_ptr == NULL) {
-		log_err("fio: pmem_map_file(%s) failed\n", f->file_name);
-		/* pmem_map_file() sets errno on failure */
-		td_verror(td, errno, "pmem_map_file");
-		return NULL;
-	}
-
-	/* pmem is expected */
-	if (!is_pmem) {
-		log_err("fio: %s is not located in persistent memory\n",
+	if (librpma_fio_pmem_map_file(f, size, mem, ws_offset)) {
+		log_err("fio: librpma_fio_pmem_map_file(%s) failed\n",
 			f->file_name);
-		goto err_unmap;
-	}
-
-	/* check size of allocated persistent memory */
-	if (size_mmap < ws_offset + size) {
-		log_err(
-			"fio: %s is too small to handle so many threads (%zu < %zu)\n",
-			f->file_name, size_mmap, ws_offset + size);
-		goto err_unmap;
+		return NULL;
 	}
 
 	log_info("fio: size of memory mapped from the file %s: %zu\n",
-		f->file_name, size_mmap);
-
-	mem->mem_ptr = mem_ptr;
-	mem->size_mmap = size_mmap;
+		f->file_name, mem->size_mmap);
 
-	return mem_ptr + ws_offset;
+	log_info("fio: library used to map PMem from file: %s\n", RPMA_PMEM_USED);
 
-err_unmap:
-	(void) pmem_unmap(mem_ptr, size_mmap);
-	return NULL;
+	return mem->mem_ptr ? mem->mem_ptr + ws_offset : NULL;
 }
 
 void librpma_fio_free(struct librpma_fio_mem *mem)
 {
 	if (mem->size_mmap)
-		(void) pmem_unmap(mem->mem_ptr, mem->size_mmap);
+		librpma_fio_unmap(mem);
 	else
 		free(mem->mem_ptr);
 }
diff --git a/engines/librpma_fio.h b/engines/librpma_fio.h
index 91290235..480ded1b 100644
--- a/engines/librpma_fio.h
+++ b/engines/librpma_fio.h
@@ -1,7 +1,7 @@
 /*
  * librpma_fio: librpma_apm and librpma_gpspm engines' common header.
  *
- * Copyright 2021, Intel Corporation
+ * Copyright 2021-2022, Intel Corporation
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License,
@@ -72,6 +72,11 @@ struct librpma_fio_mem {
 
 	/* size of the mapped persistent memory */
 	size_t size_mmap;
+
+#ifdef CONFIG_LIBPMEM2_INSTALLED
+	/* libpmem2 structure used for mapping PMem */
+	struct pmem2_map *map;
+#endif
 };
 
 char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
diff --git a/engines/librpma_fio_pmem.h b/engines/librpma_fio_pmem.h
new file mode 100644
index 00000000..4854292c
--- /dev/null
+++ b/engines/librpma_fio_pmem.h
@@ -0,0 +1,67 @@
+/*
+ * librpma_fio_pmem: allocates pmem using libpmem.
+ *
+ * Copyright 2022, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation..
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <libpmem.h>
+#include "librpma_fio.h"
+
+#define RPMA_PMEM_USED "libpmem"
+
+static int librpma_fio_pmem_map_file(struct fio_file *f, size_t size,
+		struct librpma_fio_mem *mem, size_t ws_offset)
+{
+	int is_pmem = 0;
+	size_t size_mmap = 0;
+
+	/* map the file */
+	mem->mem_ptr = pmem_map_file(f->file_name, 0 /* len */, 0 /* flags */,
+			0 /* mode */, &size_mmap, &is_pmem);
+	if (mem->mem_ptr == NULL) {
+		/* pmem_map_file() sets errno on failure */
+		log_err("fio: pmem_map_file(%s) failed: %s (errno %i)\n",
+			f->file_name, strerror(errno), errno);
+		return -1;
+	}
+
+	/* pmem is expected */
+	if (!is_pmem) {
+		log_err("fio: %s is not located in persistent memory\n",
+			f->file_name);
+		goto err_unmap;
+	}
+
+	/* check size of allocated persistent memory */
+	if (size_mmap < ws_offset + size) {
+		log_err(
+			"fio: %s is too small to handle so many threads (%zu < %zu)\n",
+			f->file_name, size_mmap, ws_offset + size);
+		goto err_unmap;
+	}
+
+	log_info("fio: size of memory mapped from the file %s: %zu\n",
+		f->file_name, size_mmap);
+
+	mem->size_mmap = size_mmap;
+
+	return 0;
+
+err_unmap:
+	(void) pmem_unmap(mem->mem_ptr, size_mmap);
+	return -1;
+}
+
+static inline void librpma_fio_unmap(struct librpma_fio_mem *mem)
+{
+	(void) pmem_unmap(mem->mem_ptr, mem->size_mmap);
+}
diff --git a/engines/librpma_fio_pmem2.h b/engines/librpma_fio_pmem2.h
new file mode 100644
index 00000000..09a51f5f
--- /dev/null
+++ b/engines/librpma_fio_pmem2.h
@@ -0,0 +1,91 @@
+/*
+ * librpma_fio_pmem2: allocates pmem using libpmem2.
+ *
+ * Copyright 2022, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation..
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <libpmem2.h>
+#include "librpma_fio.h"
+
+#define RPMA_PMEM_USED "libpmem2"
+
+static int librpma_fio_pmem_map_file(struct fio_file *f, size_t size,
+		struct librpma_fio_mem *mem, size_t ws_offset)
+{
+	int fd;
+	struct pmem2_config *cfg = NULL;
+	struct pmem2_map *map = NULL;
+	struct pmem2_source *src = NULL;
+
+	size_t size_mmap;
+
+	if((fd = open(f->file_name, O_RDWR)) < 0) {
+		log_err("fio: cannot open fio file\n");
+		return -1;
+	}
+
+	if (pmem2_source_from_fd(&src, fd) != 0) {
+		log_err("fio: pmem2_source_from_fd() failed\n");
+		goto err_close;
+	}
+
+	if (pmem2_config_new(&cfg) != 0) {
+		log_err("fio: pmem2_config_new() failed\n");
+		goto err_source_delete;
+	}
+
+	if (pmem2_config_set_required_store_granularity(cfg,
+					PMEM2_GRANULARITY_CACHE_LINE) != 0) {
+		log_err("fio: pmem2_config_set_required_store_granularity() failed: %s\n", pmem2_errormsg());
+		goto err_config_delete;
+	}
+
+	if (pmem2_map_new(&map, cfg, src) != 0) {
+		log_err("fio: pmem2_map_new(%s) failed: %s\n", f->file_name, pmem2_errormsg());
+		goto err_config_delete;
+	}
+
+	size_mmap = pmem2_map_get_size(map);
+
+	/* check size of allocated persistent memory */
+	if (size_mmap < ws_offset + size) {
+		log_err(
+			"fio: %s is too small to handle so many threads (%zu < %zu)\n",
+			f->file_name, size_mmap, ws_offset + size);
+		goto err_map_delete;
+	}
+
+	mem->mem_ptr = pmem2_map_get_address(map);
+	mem->size_mmap = size_mmap;
+	mem->map = map;
+	pmem2_config_delete(&cfg);
+	pmem2_source_delete(&src);
+	close(fd);
+
+	return 0;
+
+err_map_delete:
+	pmem2_map_delete(&map);
+err_config_delete:
+	pmem2_config_delete(&cfg);
+err_source_delete:
+	pmem2_source_delete(&src);
+err_close:
+	close(fd);
+
+	return -1;
+}
+
+static inline void librpma_fio_unmap(struct librpma_fio_mem *mem)
+{
+	(void) pmem2_map_delete(&mem->map);
+}
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
index f00717a7..70116d0d 100644
--- a/engines/librpma_gpspm.c
+++ b/engines/librpma_gpspm.c
@@ -2,7 +2,7 @@
  * librpma_gpspm: IO engine that uses PMDK librpma to write data,
  *		based on General Purpose Server Persistency Method
  *
- * Copyright 2020-2021, Intel Corporation
+ * Copyright 2020-2022, Intel Corporation
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License,
@@ -16,7 +16,11 @@
 
 #include "librpma_fio.h"
 
+#ifdef CONFIG_LIBPMEM2_INSTALLED
+#include <libpmem2.h>
+#else
 #include <libpmem.h>
+#endif
 
 /* Generated by the protocol buffer compiler from: librpma_gpspm_flush.proto */
 #include "librpma_gpspm_flush.pb-c.h"
@@ -361,6 +365,8 @@ FIO_STATIC struct ioengine_ops ioengine_client = {
 
 #define IO_U_BUFF_OFF_SERVER(i) (i * IO_U_BUF_LEN)
 
+typedef void (*librpma_fio_persist_fn)(const void *ptr, size_t size);
+
 struct server_data {
 	/* aligned td->orig_buffer */
 	char *orig_buffer_aligned;
@@ -373,6 +379,8 @@ struct server_data {
 	/* in-memory queues */
 	struct ibv_wc *msgs_queued;
 	uint32_t msg_queued_nr;
+
+	librpma_fio_persist_fn persist;
 };
 
 static int server_init(struct thread_data *td)
@@ -400,6 +408,13 @@ static int server_init(struct thread_data *td)
 		goto err_free_sd;
 	}
 
+#ifdef CONFIG_LIBPMEM2_INSTALLED
+	/* get libpmem2 persist function from pmem2_map */
+	sd->persist = pmem2_get_persist_fn(csd->mem.map);
+#else
+	sd->persist = pmem_persist;
+#endif
+
 	/*
 	 * Assure a single io_u buffer can store both SEND and RECV messages and
 	 * an io_us buffer allocation is page-size-aligned which is required
@@ -594,7 +609,7 @@ static int server_qe_process(struct thread_data *td, struct ibv_wc *wc)
 
 	if (IS_NOT_THE_LAST_MESSAGE(flush_req)) {
 		op_ptr = csd->ws_ptr + flush_req->offset;
-		pmem_persist(op_ptr, flush_req->length);
+		sd->persist(op_ptr, flush_req->length);
 	} else {
 		/*
 		 * This is the last message - the client is done.
@@ -685,29 +700,25 @@ static int server_cmpl_process(struct thread_data *td)
 
 	ret = rpma_cq_get_wc(csd->cq, 1, wc, NULL);
 	if (ret == RPMA_E_NO_COMPLETION) {
-		if (o->busy_wait_polling == 0) {
-			ret = rpma_cq_wait(csd->cq);
-			if (ret == RPMA_E_NO_COMPLETION) {
-				/* lack of completion is not an error */
-				return 0;
-			} else if (ret != 0) {
-				librpma_td_verror(td, ret, "rpma_cq_wait");
-				goto err_terminate;
-			}
-
-			ret = rpma_cq_get_wc(csd->cq, 1, wc, NULL);
-			if (ret == RPMA_E_NO_COMPLETION) {
-				/* lack of completion is not an error */
-				return 0;
-			} else if (ret != 0) {
-				librpma_td_verror(td, ret, "rpma_cq_get_wc");
-				goto err_terminate;
-			}
-		} else {
-			/* lack of completion is not an error */
-			return 0;
+		if (o->busy_wait_polling)
+			return 0; /* lack of completion is not an error */
+
+		ret = rpma_cq_wait(csd->cq);
+		if (ret == RPMA_E_NO_COMPLETION)
+			return 0; /* lack of completion is not an error */
+		if (ret) {
+			librpma_td_verror(td, ret, "rpma_cq_wait");
+			goto err_terminate;
+		}
+
+		ret = rpma_cq_get_wc(csd->cq, 1, wc, NULL);
+		if (ret == RPMA_E_NO_COMPLETION)
+			return 0; /* lack of completion is not an error */
+		if (ret) {
+			librpma_td_verror(td, ret, "rpma_cq_get_wc");
+			goto err_terminate;
 		}
-	} else if (ret != 0) {
+	} else if (ret) {
 		librpma_td_verror(td, ret, "rpma_cq_get_wc");
 		goto err_terminate;
 	}

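The librpma_gpspm change above rewrites the completion handling in server_cmpl_process() as a series of early returns instead of a nested if/else. The Python sketch below models that control flow only; it is not fio code. `get_wc` and `wait` are hypothetical callables standing in for rpma_cq_get_wc() and rpma_cq_wait(), `NO_COMPLETION` is a placeholder for RPMA_E_NO_COMPLETION, and errors are raised instead of jumping to err_terminate.

```python
NO_COMPLETION = object()  # hypothetical stand-in for RPMA_E_NO_COMPLETION


class CompletionError(RuntimeError):
    """Models the 'goto err_terminate' path in the C code."""


def get_completion(busy_wait_polling, get_wc, wait):
    """Return True if a work completion is ready to process, False if none.

    Mirrors the early-return structure of the refactored server_cmpl_process().
    """
    ret = get_wc()
    if ret is NO_COMPLETION:
        if busy_wait_polling:
            return False  # lack of completion is not an error
        ret = wait()
        if ret is NO_COMPLETION:
            return False  # lack of completion is not an error
        if ret:
            raise CompletionError("rpma_cq_wait failed: %r" % (ret,))
        ret = get_wc()
        if ret is NO_COMPLETION:
            return False  # lack of completion is not an error
        if ret:
            raise CompletionError("rpma_cq_get_wc failed: %r" % (ret,))
    elif ret:
        raise CompletionError("rpma_cq_get_wc failed: %r" % (ret,))
    return True
```

With busy_wait_polling set, an empty completion queue returns immediately; otherwise the model waits once and polls again, which is the behavior both the old and the new C code implement.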
* Recent changes (master)
@ 2022-09-04 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-04 12:00 UTC
  To: fio

The following changes since commit 0b2c736174402afc742a7ed97c37f872fa93ee25:

  Merge branch 'fiopr_windows_log_compression_storage_fixes' of https://github.com/PCPartPicker/fio (2022-09-02 17:29:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 021ce718f5ae4bfd5f4e42290993578adb7c7bd5:

  t/io_uring: enable support for registered buffers for passthrough (2022-09-03 11:04:06 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      Merge branch 'fix/help-terse-version-5' of https://github.com/scop/fio
      Merge branch 'doc/showcmd-usage' of https://github.com/scop/fio
      Merge branch 'fix/howto-spelling' of https://github.com/scop/fio
      t/io_uring: properly detect numa nodes for passthrough mode
      t/io_uring: enable support for registered buffers for passthrough

Ville Skyttä (3):
      init: include 5 in --terse-version help
      HOWTO: spelling fixes
      doc: fix --showcmd usage

 HOWTO.rst           | 8 ++++----
 fio.1               | 4 ++--
 init.c              | 2 +-
 os/linux/io_uring.h | 8 ++++++++
 t/io_uring.c        | 9 ++++++++-
 5 files changed, 23 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 08be687c..2c6c6dbe 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -167,9 +167,9 @@ Command line options
 	defined by `ioengine`.  If no `ioengine` is given, list all
 	available ioengines.
 
-.. option:: --showcmd=jobfile
+.. option:: --showcmd
 
-	Convert `jobfile` to a set of command-line options.
+	Convert given job files to a set of command-line options.
 
 .. option:: --readonly
 
@@ -2550,7 +2550,7 @@ with the caveat that when used on the command line, they must come after the
 
    [dfs]
 
-	Specificy a different chunk size (in bytes) for the dfs file.
+	Specify a different chunk size (in bytes) for the dfs file.
 	Use DAOS container's chunk size by default.
 
    [libhdfs]
@@ -2559,7 +2559,7 @@ with the caveat that when used on the command line, they must come after the
 
 .. option:: object_class=str : [dfs]
 
-	Specificy a different object class for the dfs file.
+	Specify a different object class for the dfs file.
 	Use DAOS container's object class by default.
 
 .. option:: skip_bad=bool : [mtd]
diff --git a/fio.1 b/fio.1
index 27454b0b..67d7c710 100644
--- a/fio.1
+++ b/fio.1
@@ -67,8 +67,8 @@ List all commands defined by \fIioengine\fR, or print help for \fIcommand\fR
 defined by \fIioengine\fR. If no \fIioengine\fR is given, list all
 available ioengines.
 .TP
-.BI \-\-showcmd \fR=\fPjobfile
-Convert \fIjobfile\fR to a set of command\-line options.
+.BI \-\-showcmd
+Convert given \fIjobfile\fRs to a set of command\-line options.
 .TP
 .BI \-\-readonly
 Turn on safety read\-only checks, preventing writes and trims. The \fB\-\-readonly\fR
diff --git a/init.c b/init.c
index da800776..f6a8056a 100644
--- a/init.c
+++ b/init.c
@@ -2269,7 +2269,7 @@ static void usage(const char *name)
 	printf("  --minimal\t\tMinimal (terse) output\n");
 	printf("  --output-format=type\tOutput format (terse,json,json+,normal)\n");
 	printf("  --terse-version=type\tSet terse version output format"
-		" (default 3, or 2 or 4)\n");
+		" (default 3, or 2 or 4 or 5)\n");
 	printf("  --version\t\tPrint version info and exit\n");
 	printf("  --help\t\tPrint this page\n");
 	printf("  --cpuclock-test\tPerform test/validation of CPU clock\n");
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 6604e736..c7a24ad8 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -46,6 +46,7 @@ struct io_uring_sqe {
 		__u32		rename_flags;
 		__u32		unlink_flags;
 		__u32		hardlink_flags;
+		__u32		uring_cmd_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -197,6 +198,13 @@ enum {
 	IORING_OP_LAST,
 };
 
+/*
+ * sqe->uring_cmd_flags
+ * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
+ *				along with setting sqe->buf_index.
+ */
+#define IORING_URING_CMD_FIXED	(1U << 0)
+
 /*
  * sqe->fsync_flags
  */
diff --git a/t/io_uring.c b/t/io_uring.c
index 9d580b5a..b9353ac8 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -650,6 +650,10 @@ static void init_io_pt(struct submitter *s, unsigned index)
 	cmd->cdw12 = nlb;
 	cmd->addr = (unsigned long) s->iovecs[index].iov_base;
 	cmd->data_len = bs;
+	if (fixedbufs) {
+		sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;
+		sqe->buf_index = index;
+	}
 	cmd->nsid = f->nsid;
 	cmd->opcode = 2;
 }
@@ -856,7 +860,10 @@ static int detect_node(struct submitter *s, const char *name)
 	char str[128];
 	int ret, fd, node;
 
-	sprintf(str, "/sys/block/%s/device/numa_node", base);
+	if (pt)
+		sprintf(str, "/sys/class/nvme-generic/%s/device/numa_node", base);
+	else
+		sprintf(str, "/sys/block/%s/device/numa_node", base);
 	fd = open(str, O_RDONLY);
 	if (fd < 0)
 		return -1;

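With this change, t/io_uring looks up the NUMA node of an NVMe generic (passthrough) character device under /sys/class/nvme-generic rather than /sys/block. A minimal Python sketch of the same path selection follows. numa_node_path() and read_numa_node() are hypothetical helpers, and parsing the file with int() is an assumption, since the part of detect_node() that reads the value is not shown in this diff; only the "return -1 if the file cannot be opened" behavior is taken from it.

```python
def numa_node_path(base, passthrough):
    """Build the sysfs path detect_node() opens for device name `base`."""
    if passthrough:
        return "/sys/class/nvme-generic/{}/device/numa_node".format(base)
    return "/sys/block/{}/device/numa_node".format(base)


def read_numa_node(base, passthrough):
    """Return the NUMA node number, or -1 if the sysfs file is unusable."""
    path = numa_node_path(base, passthrough)
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return -1  # mirrors "if (fd < 0) return -1;" in detect_node()
```

For a passthrough run against a device named ng0n1, the sketch reads /sys/class/nvme-generic/ng0n1/device/numa_node.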
* Recent changes (master)
@ 2022-09-03 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-03 12:00 UTC
  To: fio

The following changes since commit e57758c12bdb24885e32ba143a04fcc8f98565ca:

  Merge branch 'fiopr_compressfixes' of https://github.com/PCPartPicker/fio (2022-09-01 12:03:23 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0b2c736174402afc742a7ed97c37f872fa93ee25:

  Merge branch 'fiopr_windows_log_compression_storage_fixes' of https://github.com/PCPartPicker/fio (2022-09-02 17:29:45 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fiopr_windows_log_compression_storage_fixes' of https://github.com/PCPartPicker/fio

aggieNick02 (1):
      Fix log compression storage on windows

 iolog.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index 41d3e473..aa9c3bb1 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1218,7 +1218,7 @@ int iolog_file_inflate(const char *file)
 	void *buf;
 	FILE *f;
 
-	f = fopen(file, "r");
+	f = fopen(file, "rb");
 	if (!f) {
 		perror("fopen");
 		return 1;
@@ -1300,10 +1300,21 @@ void flush_log(struct io_log *log, bool do_append)
 	void *buf;
 	FILE *f;
 
+	/*
+	 * If log_gz_store is true, we are writing a binary file.
+	 * Set the mode appropriately (on all platforms) to avoid issues
+	 * on windows (line-ending conversions, etc.)
+	 */
 	if (!do_append)
-		f = fopen(log->filename, "w");
+		if (log->log_gz_store)
+			f = fopen(log->filename, "wb");
+		else
+			f = fopen(log->filename, "w");
 	else
-		f = fopen(log->filename, "a");
+		if (log->log_gz_store)
+			f = fopen(log->filename, "ab");
+		else
+			f = fopen(log->filename, "a");
 	if (!f) {
 		perror("fopen log");
 		return;

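The rule that the flush_log() change applies reduces to: append or truncate depending on do_append, and add "b" whenever log_gz_store is set, because a compressed log is binary data that must not go through newline translation on Windows. The sketch below expresses that rule as a hypothetical Python helper, not part of fio, and round-trips some zlib-compressed bytes as a stand-in payload; Python's open() accepts the same four mode strings as C fopen().

```python
import os
import tempfile
import zlib


def log_open_mode(do_append, log_gz_store):
    """Return the fopen()-style mode string flush_log() selects."""
    mode = "a" if do_append else "w"
    if log_gz_store:
        mode += "b"  # binary: no line-ending conversion of compressed data
    return mode


# Round-trip compressed bytes through a file opened with the chosen mode.
payload = zlib.compress(b"0, 1, 1, 4096, 0\n" * 100)
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, log_open_mode(False, True)) as f:
    f.write(payload)
with open(path, "rb") as f:
    round_trip_ok = f.read() == payload
os.remove(path)
```

Folding the choice into one helper would also avoid the nested unbraced if/else in the C version, though the patch as merged keeps the explicit branches.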
* Recent changes (master)
@ 2022-09-02 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-02 12:00 UTC
  To: fio

The following changes since commit 2be18f6b266f3fcba89719b354672090f49d53d9:

  t/io_uring: take advantage of new io_uring setup flags (2022-08-31 18:44:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e57758c12bdb24885e32ba143a04fcc8f98565ca:

  Merge branch 'fiopr_compressfixes' of https://github.com/PCPartPicker/fio (2022-09-01 12:03:23 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      t/io_uring: minor optimizations to IO init fast path
      Merge branch 'fiopr_compressfixes' of https://github.com/PCPartPicker/fio

aggieNick02 (1):
      Fix fio silently dropping log entries when using log_compression

 iolog.c              |   6 +--
 t/io_uring.c         |  10 +++--
 t/log_compression.py | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py   |   8 ++++
 4 files changed, 139 insertions(+), 6 deletions(-)
 create mode 100755 t/log_compression.py

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index 37e799a1..41d3e473 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1574,14 +1574,14 @@ void iolog_compress_exit(struct thread_data *td)
  * Queue work item to compress the existing log entries. We reset the
  * current log to a small size, and reference the existing log in the
  * data that we queue for compression. Once compression has been done,
- * this old log is freed. If called with finish == true, will not return
- * until the log compression has completed, and will flush all previous
- * logs too
+ * this old log is freed. Will not return until the log compression
+ * has completed, and will flush all previous logs too
  */
 static int iolog_flush(struct io_log *log)
 {
 	struct iolog_flush_data *data;
 
+	workqueue_flush(&log->td->log_compress_wq);
 	data = malloc(sizeof(*data));
 	if (!data)
 		return 1;
diff --git a/t/io_uring.c b/t/io_uring.c
index 5b46015a..9d580b5a 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -658,11 +658,12 @@ static int prep_more_ios_uring(struct submitter *s, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
 	unsigned index, tail, next_tail, prepped = 0;
+	unsigned int head = atomic_load_acquire(ring->head);
 
 	next_tail = tail = *ring->tail;
 	do {
 		next_tail++;
-		if (next_tail == atomic_load_acquire(ring->head))
+		if (next_tail == head)
 			break;
 
 		index = tail & sq_ring_mask;
@@ -670,7 +671,6 @@ static int prep_more_ios_uring(struct submitter *s, int max_ios)
 			init_io_pt(s, index);
 		else
 			init_io(s, index);
-		ring->array[index] = index;
 		prepped++;
 		tail = next_tail;
 	} while (prepped < max_ios);
@@ -908,7 +908,7 @@ static int setup_ring(struct submitter *s)
 	struct io_sq_ring *sring = &s->sq_ring;
 	struct io_cq_ring *cring = &s->cq_ring;
 	struct io_uring_params p;
-	int ret, fd;
+	int ret, fd, i;
 	void *ptr;
 	size_t len;
 
@@ -1003,6 +1003,10 @@ static int setup_ring(struct submitter *s)
 	cring->ring_entries = ptr + p.cq_off.ring_entries;
 	cring->cqes = ptr + p.cq_off.cqes;
 	cq_ring_mask = *cring->ring_mask;
+
+	for (i = 0; i < p.sq_entries; i++)
+		sring->array[i] = i;
+
 	return 0;
 }
 
diff --git a/t/log_compression.py b/t/log_compression.py
new file mode 100755
index 00000000..94c92db7
--- /dev/null
+++ b/t/log_compression.py
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+#
+# log_compression.py
+#
+# Test log_compression and log_store_compressed. Uses null ioengine.
+# Previous bugs have caused output in per I/O log files to be missing
+# and/or out of order
+#
+# Expected result: 8000 log entries, offset starting at 0 and increasing by bs
+# Buggy result: Log entries out of order (usually without log_store_compressed)
+# and/or missing log entries (usually with log_store_compressed)
+#
+# USAGE
+# python log_compression.py [-f fio-executable]
+#
+# EXAMPLES
+# python t/log_compression.py
+# python t/log_compression.py -f ./fio
+#
+# REQUIREMENTS
+# Python 3.5+
+#
+# ===TEST MATRIX===
+#
+# With log_compression=10K
+# With log_store_compressed=1 and log_compression=10K
+
+import os
+import sys
+import platform
+import argparse
+import subprocess
+
+
+def parse_args():
+    """Parse command-line arguments."""
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    return parser.parse_args()
+
+
+def run_fio(fio,log_store_compressed):
+    fio_args = [
+        '--name=job',
+        '--ioengine=null',
+        '--filesize=1000M',
+        '--bs=128K',
+        '--rw=write',
+        '--iodepth=1',
+        '--write_bw_log=test',
+        '--per_job_logs=0',
+        '--log_offset=1',
+        '--log_compression=10K',
+        ]
+    if log_store_compressed:
+        fio_args.append('--log_store_compressed=1')
+
+    subprocess.check_output([fio] + fio_args)
+
+    if log_store_compressed:
+        fio_inflate_args = [
+            '--inflate-log=test_bw.log.fz'
+            ]
+        with open('test_bw.from_fz.log','wt') as f:
+            subprocess.check_call([fio]+fio_inflate_args,stdout=f)
+
+def check_log_file(log_store_compressed):
+    filename = 'test_bw.from_fz.log' if log_store_compressed else 'test_bw.log'
+    with open(filename,'rt') as f:
+        file_data = f.read()
+    log_lines = [x for x in file_data.split('\n') if len(x.strip())!=0]
+    log_ios = len(log_lines)
+
+    filesize = 1000*1024*1024
+    bs = 128*1024
+    ios = filesize//bs
+    if log_ios!=ios:
+        print('wrong number of ios ({}) in log; should be {}'.format(log_ios,ios))
+        return False
+
+    expected_offset = 0
+    for line_number,line in enumerate(log_lines):
+        log_offset = int(line.split(',')[4])
+        if log_offset != expected_offset:
+            print('wrong offset ({}) for io number {} in log; should be {}'.format(
+                log_offset, line_number, expected_offset))
+            return False
+        expected_offset += bs
+    return True
+
+def main():
+    """Entry point for this script."""
+    args = parse_args()
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = os.path.join(os.path.dirname(__file__), '../fio')
+        if not os.path.exists(fio_path):
+            fio_path = 'fio'
+    print("fio path is", fio_path)
+
+    passed_count = 0
+    failed_count = 0
+    for log_store_compressed in [False, True]:
+        run_fio(fio_path, log_store_compressed)
+        passed = check_log_file(log_store_compressed)
+        print('Test with log_store_compressed={} {}'.format(log_store_compressed,
+            'PASSED' if passed else 'FAILED'))
+        if passed:
+            passed_count+=1
+        else:
+            failed_count+=1
+
+    print('{} tests passed, {} failed'.format(passed_count, failed_count))
+
+    sys.exit(failed_count)
+
+if __name__ == '__main__':
+    main()
+
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 47823761..e72fa2a0 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -1124,6 +1124,14 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'requirements':     [],
     },
+    {
+        'test_id':          1012,
+        'test_class':       FioExeTest,
+        'exe':              't/log_compression.py',
+        'parameters':       ['-f', '{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
 ]
 
 

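The "Expected result" line in t/log_compression.py follows from the job parameters: a 1000 MiB file written sequentially in 128 KiB blocks gives 1000 * 1024 / 128 = 8000 I/Os, so the bandwidth log should hold 8000 entries with offsets 0, bs, 2*bs, and so on. The Python sketch below restates the script's check as a standalone function over a list of log lines. first_bad_entry() is a hypothetical helper, and the synthetic lines used with it assume only what the script assumes: the offset is the fifth comma-separated field.

```python
def first_bad_entry(log_lines, filesize, bs):
    """Return None if the log matches a sequential write of `filesize` bytes
    in `bs`-sized blocks, else a description of the first problem found."""
    lines = [x for x in log_lines if x.strip()]
    expected_ios = filesize // bs
    if len(lines) != expected_ios:
        return "wrong number of ios ({}), expected {}".format(
            len(lines), expected_ios)
    for i, line in enumerate(lines):
        offset = int(line.split(",")[4])  # int() tolerates surrounding spaces
        if offset != i * bs:
            return "io {} has offset {}, expected {}".format(i, offset, i * bs)
    return None


BS = 128 * 1024
FILESIZE = 1000 * 1024 * 1024
EXPECTED_IOS = FILESIZE // BS  # 8000
```

A missing entry shows up as a count mismatch, and out-of-order entries show up as the first offset that does not equal its index times bs, matching the two failure modes the script's header describes.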
* Recent changes (master)
@ 2022-09-01 12:00 Jens Axboe
From: Jens Axboe @ 2022-09-01 12:00 UTC
  To: fio

The following changes since commit c9be6f0007ab79e3f83952c650af8e7a0c324953:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-08-30 18:19:30 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2be18f6b266f3fcba89719b354672090f49d53d9:

  t/io_uring: take advantage of new io_uring setup flags (2022-08-31 18:44:52 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      engines/io_uring: set COOP_TASKRUN for ring setup
      engines/io_uring: set single issuer and defer taskrun
      t/io_uring: unify getting of the offset
      t/io_uring: take advantage of new io_uring setup flags

 engines/io_uring.c  | 21 +++++++++++++++
 os/linux/io_uring.h | 12 +++++++++
 t/io_uring.c        | 75 ++++++++++++++++++++++++++++++++---------------------
 3 files changed, 78 insertions(+), 30 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 94376efa..d0fc61dc 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -809,9 +809,30 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	p.flags |= IORING_SETUP_CQSIZE;
 	p.cq_entries = depth;
 
+	/*
+	 * Setup COOP_TASKRUN as we don't need to get IPI interrupted for
+	 * completing IO operations.
+	 */
+	p.flags |= IORING_SETUP_COOP_TASKRUN;
+
+	/*
+	 * io_uring is always a single issuer, and we can defer task_work
+	 * runs until we reap events.
+	 */
+	p.flags |= IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
+
 retry:
 	ret = syscall(__NR_io_uring_setup, depth, &p);
 	if (ret < 0) {
+		if (errno == EINVAL && p.flags & IORING_SETUP_DEFER_TASKRUN) {
+			p.flags &= ~IORING_SETUP_DEFER_TASKRUN;
+			p.flags &= ~IORING_SETUP_SINGLE_ISSUER;
+			goto retry;
+		}
+		if (errno == EINVAL && p.flags & IORING_SETUP_COOP_TASKRUN) {
+			p.flags &= ~IORING_SETUP_COOP_TASKRUN;
+			goto retry;
+		}
 		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
 			p.flags &= ~IORING_SETUP_CQSIZE;
 			goto retry;
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 929997f8..6604e736 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -131,6 +131,18 @@ enum {
 #define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
 #define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
 
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
+
+/*
+ * Defer running task work to get events.
+ * Rather than running bits of task work whenever the task transitions
+ * try to do it just before it is needed.
+ */
+#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
+
 enum {
 	IORING_OP_NOP,
 	IORING_OP_READV,
diff --git a/t/io_uring.c b/t/io_uring.c
index e8e41796..5b46015a 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -449,6 +449,8 @@ static int io_uring_register_files(struct submitter *s)
 
 static int io_uring_setup(unsigned entries, struct io_uring_params *p)
 {
+	int ret;
+
 	/*
 	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
 	 * than that.
@@ -456,7 +458,28 @@ static int io_uring_setup(unsigned entries, struct io_uring_params *p)
 	p->flags |= IORING_SETUP_CQSIZE;
 	p->cq_entries = entries;
 
-	return syscall(__NR_io_uring_setup, entries, p);
+	p->flags |= IORING_SETUP_COOP_TASKRUN;
+	p->flags |= IORING_SETUP_SINGLE_ISSUER;
+	p->flags |= IORING_SETUP_DEFER_TASKRUN;
+retry:
+	ret = syscall(__NR_io_uring_setup, entries, p);
+	if (!ret)
+		return 0;
+
+	if (errno == EINVAL && p->flags & IORING_SETUP_COOP_TASKRUN) {
+		p->flags &= ~IORING_SETUP_COOP_TASKRUN;
+		goto retry;
+	}
+	if (errno == EINVAL && p->flags & IORING_SETUP_SINGLE_ISSUER) {
+		p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
+		goto retry;
+	}
+	if (errno == EINVAL && p->flags & IORING_SETUP_DEFER_TASKRUN) {
+		p->flags &= ~IORING_SETUP_DEFER_TASKRUN;
+		goto retry;
+	}
+
+	return ret;
 }
 
 static void io_uring_probe(int fd)
@@ -501,12 +524,28 @@ static unsigned file_depth(struct submitter *s)
 	return (depth + s->nr_files - 1) / s->nr_files;
 }
 
+static unsigned long long get_offset(struct submitter *s, struct file *f)
+{
+	unsigned long long offset;
+	long r;
+
+	if (random_io) {
+		r = __rand64(&s->rand_state);
+		offset = (r % (f->max_blocks - 1)) * bs;
+	} else {
+		offset = f->cur_off;
+		f->cur_off += bs;
+		if (f->cur_off + bs > f->max_size)
+			f->cur_off = 0;
+	}
+
+	return offset;
+}
+
 static void init_io(struct submitter *s, unsigned index)
 {
 	struct io_uring_sqe *sqe = &s->sqes[index];
-	unsigned long offset;
 	struct file *f;
-	long r;
 
 	if (do_nop) {
 		sqe->opcode = IORING_OP_NOP;
@@ -526,16 +565,6 @@ static void init_io(struct submitter *s, unsigned index)
 	}
 	f->pending_ios++;
 
-	if (random_io) {
-		r = __rand64(&s->rand_state);
-		offset = (r % (f->max_blocks - 1)) * bs;
-	} else {
-		offset = f->cur_off;
-		f->cur_off += bs;
-		if (f->cur_off + bs > f->max_size)
-			f->cur_off = 0;
-	}
-
 	if (register_files) {
 		sqe->flags = IOSQE_FIXED_FILE;
 		sqe->fd = f->fixed_fd;
@@ -560,7 +589,7 @@ static void init_io(struct submitter *s, unsigned index)
 		sqe->buf_index = 0;
 	}
 	sqe->ioprio = 0;
-	sqe->off = offset;
+	sqe->off = get_offset(s, f);
 	sqe->user_data = (unsigned long) f->fileno;
 	if (stats && stats_running)
 		sqe->user_data |= ((uint64_t)s->clock_index << 32);
@@ -1072,10 +1101,8 @@ static int submitter_init(struct submitter *s)
 static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocbs)
 {
 	uint64_t data;
-	long long offset;
 	struct file *f;
 	unsigned index;
-	long r;
 
 	index = 0;
 	while (index < max_ios) {
@@ -1094,10 +1121,8 @@ static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocb
 		}
 		f->pending_ios++;
 
-		r = lrand48();
-		offset = (r % (f->max_blocks - 1)) * bs;
 		io_prep_pread(iocb, f->real_fd, s->iovecs[index].iov_base,
-				s->iovecs[index].iov_len, offset);
+				s->iovecs[index].iov_len, get_offset(s, f));
 
 		data = f->fileno;
 		if (stats && stats_running)
@@ -1380,7 +1405,6 @@ static void *submitter_sync_fn(void *data)
 	do {
 		uint64_t offset;
 		struct file *f;
-		long r;
 
 		if (s->nr_files == 1) {
 			f = &s->files[0];
@@ -1395,16 +1419,6 @@ static void *submitter_sync_fn(void *data)
 		}
 		f->pending_ios++;
 
-		if (random_io) {
-			r = __rand64(&s->rand_state);
-			offset = (r % (f->max_blocks - 1)) * bs;
-		} else {
-			offset = f->cur_off;
-			f->cur_off += bs;
-			if (f->cur_off + bs > f->max_size)
-				f->cur_off = 0;
-		}
-
 #ifdef ARCH_HAVE_CPU_CLOCK
 		if (stats)
 			s->clock_batch[s->clock_index] = get_cpu_clock();
@@ -1413,6 +1427,7 @@ static void *submitter_sync_fn(void *data)
 		s->inflight++;
 		s->calls++;
 
+		offset = get_offset(s, f);
 		if (polled)
 			ret = preadv2(f->real_fd, &s->iovecs[0], 1, offset, RWF_HIPRI);
 		else

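Both setup paths in this update request newer io_uring setup flags and clear them one group at a time when the kernel rejects the request with EINVAL. The Python sketch below models the order used in fio_ioring_queue_init() in engines/io_uring.c: DEFER_TASKRUN together with SINGLE_ISSUER first, then COOP_TASKRUN, then CQSIZE. make_fake_setup() is a hypothetical stand-in for the io_uring_setup(2) system call that rejects any flag outside a given mask; the real call returns a ring file descriptor on success and -1 with errno set on failure, which the sketch models as a return value and an OSError. The CQSIZE and COOP_TASKRUN bit values are taken from the Linux io_uring uapi header and are not part of this diff.

```python
import errno

# Setup flag bits; SINGLE_ISSUER and DEFER_TASKRUN match the definitions
# added to os/linux/io_uring.h in this update.
IORING_SETUP_CQSIZE = 1 << 3
IORING_SETUP_COOP_TASKRUN = 1 << 8
IORING_SETUP_SINGLE_ISSUER = 1 << 12
IORING_SETUP_DEFER_TASKRUN = 1 << 13

# (flag whose presence triggers the fallback, flags to clear), in retry order.
FALLBACKS = [
    (IORING_SETUP_DEFER_TASKRUN,
     IORING_SETUP_DEFER_TASKRUN | IORING_SETUP_SINGLE_ISSUER),
    (IORING_SETUP_COOP_TASKRUN, IORING_SETUP_COOP_TASKRUN),
    (IORING_SETUP_CQSIZE, IORING_SETUP_CQSIZE),
]

ALL = (IORING_SETUP_CQSIZE | IORING_SETUP_COOP_TASKRUN |
       IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN)


def make_fake_setup(supported):
    """Hypothetical io_uring_setup(2) stand-in: EINVAL on unsupported flags."""
    def fake_setup(flags):
        if flags & ~supported:
            raise OSError(errno.EINVAL, "unsupported setup flags")
        return 3  # pretend ring file descriptor
    return fake_setup


def setup_with_fallback(setup, flags):
    """Call setup(flags), dropping optional flags on EINVAL until it succeeds.

    Returns (fd, flags actually used). Non-EINVAL errors are re-raised.
    """
    while True:
        try:
            return setup(flags), flags
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
            for trigger, clear in FALLBACKS:
                if flags & trigger:
                    flags &= ~clear
                    break
            else:
                raise  # nothing left to drop
```

For example, on a kernel that knows CQSIZE and COOP_TASKRUN but not the two newer flags, `setup_with_fallback(make_fake_setup(IORING_SETUP_CQSIZE | IORING_SETUP_COOP_TASKRUN), ALL)` fails once, drops DEFER_TASKRUN and SINGLE_ISSUER, and succeeds with the remaining two flags.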
* Recent changes (master)
@ 2022-08-31 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-31 12:00 UTC
  To: fio

The following changes since commit b68ba328173f5a4714d888f6ce80fd24a4e4c504:

  test: get 32-bit Ubuntu 22.04 build working (2022-08-29 16:42:18 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c9be6f0007ab79e3f83952c650af8e7a0c324953:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-08-30 18:19:30 -0600)

----------------------------------------------------------------
Bart Van Assche (4):
      Remove two casts from os-linux.h
      Linux: Use the byte order functions from <asm/byteorder.h>
      Split os-android.h
      Merge os-android.h into os-linux.h

Jens Axboe (3):
      backend: revert bad memory leak fix
      Fio 3.32
      Merge branch 'master' of https://github.com/bvanassche/fio

Vincent Fu (1):
      test: add tests for lfsr and norandommap

 FIO-VERSION-GEN    |   2 +-
 backend.c          |   5 -
 os/os-android.h    | 342 -----------------------------------------------------
 os/os-ashmem.h     |  84 +++++++++++++
 os/os-linux.h      |  14 ++-
 os/os.h            |   4 +-
 t/jobs/t0021.fio   |  15 +++
 t/jobs/t0022.fio   |  13 ++
 t/run-fio-tests.py |  55 ++++++++-
 9 files changed, 180 insertions(+), 354 deletions(-)
 delete mode 100644 os/os-android.h
 create mode 100644 os/os-ashmem.h
 create mode 100644 t/jobs/t0021.fio
 create mode 100644 t/jobs/t0022.fio

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 72630dd0..db073818 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.31
+DEF_VER=fio-3.32
 
 LF='
 '
diff --git a/backend.c b/backend.c
index 375a23e4..fe614f6e 100644
--- a/backend.c
+++ b/backend.c
@@ -2451,10 +2451,8 @@ reap:
 							strerror(ret));
 			} else {
 				pid_t pid;
-				struct fio_file **files;
 				void *eo;
 				dprint(FD_PROCESS, "will fork\n");
-				files = td->files;
 				eo = td->eo;
 				read_barrier();
 				pid = fork();
@@ -2465,9 +2463,6 @@ reap:
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
-				// freeing previously allocated memory for files
-				// this memory freed MUST NOT be shared between processes, only the pointer itself may be shared within TD
-				free(files);
 				free(eo);
 				free(fd);
 				fd = NULL;
diff --git a/os/os-android.h b/os/os-android.h
deleted file mode 100644
index 34534239..00000000
--- a/os/os-android.h
+++ /dev/null
@@ -1,342 +0,0 @@
-#ifndef FIO_OS_ANDROID_H
-#define FIO_OS_ANDROID_H
-
-#define	FIO_OS	os_android
-
-#include <sys/ioctl.h>
-#include <sys/mman.h>
-#include <sys/uio.h>
-#include <sys/syscall.h>
-#include <sys/sysmacros.h>
-#include <sys/vfs.h>
-#include <unistd.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <sched.h>
-#include <linux/unistd.h>
-#include <linux/major.h>
-#include <asm/byteorder.h>
-
-#include "./os-linux-syscall.h"
-#include "../file.h"
-
-#ifndef __has_builtin         // Optional of course.
-  #define __has_builtin(x) 0  // Compatibility with non-clang compilers.
-#endif
-
-#define FIO_HAVE_CPU_AFFINITY
-#define FIO_HAVE_DISK_UTIL
-#define FIO_HAVE_IOSCHED_SWITCH
-#define FIO_HAVE_IOPRIO
-#define FIO_HAVE_IOPRIO_CLASS
-#define FIO_HAVE_ODIRECT
-#define FIO_HAVE_HUGETLB
-#define FIO_HAVE_BLKTRACE
-#define FIO_HAVE_CL_SIZE
-#define FIO_HAVE_CGROUPS
-#define FIO_HAVE_FS_STAT
-#define FIO_HAVE_TRIM
-#define FIO_HAVE_GETTID
-#define FIO_USE_GENERIC_INIT_RANDOM_STATE
-#define FIO_HAVE_E4_ENG
-#define FIO_HAVE_BYTEORDER_FUNCS
-#define FIO_HAVE_MMAP_HUGE
-#define FIO_NO_HAVE_SHM_H
-
-#define OS_MAP_ANON		MAP_ANONYMOUS
-
-typedef cpu_set_t os_cpu_mask_t;
-
-#define fio_setaffinity(pid, cpumask)		\
-	sched_setaffinity((pid), sizeof(cpumask), &(cpumask))
-#define fio_getaffinity(pid, ptr)	\
-	sched_getaffinity((pid), sizeof(cpu_set_t), (ptr))
-
-#ifndef POSIX_MADV_DONTNEED
-#define posix_madvise   madvise
-#define POSIX_MADV_DONTNEED MADV_DONTNEED
-#define POSIX_MADV_SEQUENTIAL	MADV_SEQUENTIAL
-#define POSIX_MADV_RANDOM	MADV_RANDOM
-#endif
-
-#ifdef MADV_REMOVE
-#define FIO_MADV_FREE	MADV_REMOVE
-#endif
-#ifndef MAP_HUGETLB
-#define MAP_HUGETLB 0x40000 /* arch specific */
-#endif
-
-#ifdef CONFIG_PTHREAD_GETAFFINITY
-#define FIO_HAVE_GET_THREAD_AFFINITY
-#define fio_get_thread_affinity(mask)	\
-	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
-#endif
-
-#define fio_cpu_clear(mask, cpu)	CPU_CLR((cpu), (mask))
-#define fio_cpu_set(mask, cpu)		CPU_SET((cpu), (mask))
-#define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
-#define fio_cpu_count(mask)		CPU_COUNT((mask))
-
-static inline int fio_cpuset_init(os_cpu_mask_t *mask)
-{
-	CPU_ZERO(mask);
-	return 0;
-}
-
-static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
-{
-	return 0;
-}
-
-#define FIO_MAX_CPUS			CPU_SETSIZE
-
-#ifndef CONFIG_NO_SHM
-/*
- * Bionic doesn't support SysV shared memory, so implement it using ashmem
- */
-#include <stdio.h>
-#include <linux/ashmem.h>
-#include <linux/shm.h>
-#include <android/api-level.h>
-#if __ANDROID_API__ >= __ANDROID_API_O__
-#include <android/sharedmem.h>
-#else
-#define ASHMEM_DEVICE	"/dev/ashmem"
-#endif
-#define shmid_ds shmid64_ds
-#define SHM_HUGETLB    04000
-
-static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
-{
-	int ret=0;
-	if (__cmd == IPC_RMID)
-	{
-		int length = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
-		struct ashmem_pin pin = {0 , length};
-		ret = ioctl(__shmid, ASHMEM_UNPIN, &pin);
-		close(__shmid);
-	}
-	return ret;
-}
-
-#if __ANDROID_API__ >= __ANDROID_API_O__
-static inline int shmget(key_t __key, size_t __size, int __shmflg)
-{
-	char keybuf[11];
-
-	sprintf(keybuf, "%d", __key);
-
-	return ASharedMemory_create(keybuf, __size + sizeof(uint64_t));
-}
-#else
-static inline int shmget(key_t __key, size_t __size, int __shmflg)
-{
-	int fd,ret;
-	char keybuf[11];
-
-	fd = open(ASHMEM_DEVICE, O_RDWR);
-	if (fd < 0)
-		return fd;
-
-	sprintf(keybuf,"%d",__key);
-	ret = ioctl(fd, ASHMEM_SET_NAME, keybuf);
-	if (ret < 0)
-		goto error;
-
-	/* Stores size in first 8 bytes, allocate extra space */
-	ret = ioctl(fd, ASHMEM_SET_SIZE, __size + sizeof(uint64_t));
-	if (ret < 0)
-		goto error;
-
-	return fd;
-
-error:
-	close(fd);
-	return ret;
-}
-#endif
-
-static inline void *shmat(int __shmid, const void *__shmaddr, int __shmflg)
-{
-	size_t size = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
-	/* Needs to be 8-byte aligned to prevent SIGBUS on 32-bit ARM */
-	uint64_t *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, __shmid, 0);
-	/* Save size at beginning of buffer, for use with munmap */
-	*ptr = size;
-	return ptr + 1;
-}
-
-static inline int shmdt (const void *__shmaddr)
-{
-	/* Find mmap size which we stored at the beginning of the buffer */
-	uint64_t *ptr = (uint64_t *)__shmaddr - 1;
-	size_t size = *ptr;
-	return munmap(ptr, size);
-}
-#endif
-
-#define SPLICE_DEF_SIZE	(64*1024)
-
-enum {
-	IOPRIO_CLASS_NONE,
-	IOPRIO_CLASS_RT,
-	IOPRIO_CLASS_BE,
-	IOPRIO_CLASS_IDLE,
-};
-
-enum {
-	IOPRIO_WHO_PROCESS = 1,
-	IOPRIO_WHO_PGRP,
-	IOPRIO_WHO_USER,
-};
-
-#define IOPRIO_BITS		16
-#define IOPRIO_CLASS_SHIFT	13
-
-#define IOPRIO_MIN_PRIO		0	/* highest priority */
-#define IOPRIO_MAX_PRIO		7	/* lowest priority */
-
-#define IOPRIO_MIN_PRIO_CLASS	0
-#define IOPRIO_MAX_PRIO_CLASS	3
-
-static inline int ioprio_value(int ioprio_class, int ioprio)
-{
-	/*
-	 * If no class is set, assume BE
-	 */
-        if (!ioprio_class)
-                ioprio_class = IOPRIO_CLASS_BE;
-
-	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
-}
-
-static inline bool ioprio_value_is_class_rt(unsigned int priority)
-{
-	return (priority >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT;
-}
-
-static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
-{
-	return syscall(__NR_ioprio_set, which, who,
-		       ioprio_value(ioprio_class, ioprio));
-}
-
-#ifndef BLKGETSIZE64
-#define BLKGETSIZE64	_IOR(0x12,114,size_t)
-#endif
-
-#ifndef BLKFLSBUF
-#define BLKFLSBUF	_IO(0x12,97)
-#endif
-
-#ifndef BLKDISCARD
-#define BLKDISCARD	_IO(0x12,119)
-#endif
-
-static inline int blockdev_invalidate_cache(struct fio_file *f)
-{
-	return ioctl(f->fd, BLKFLSBUF);
-}
-
-static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
-{
-	if (!ioctl(f->fd, BLKGETSIZE64, bytes))
-		return 0;
-
-	return errno;
-}
-
-static inline unsigned long long os_phys_mem(void)
-{
-	long pagesize, pages;
-
-	pagesize = sysconf(_SC_PAGESIZE);
-	pages = sysconf(_SC_PHYS_PAGES);
-	if (pages == -1 || pagesize == -1)
-		return 0;
-
-	return (unsigned long long) pages * (unsigned long long) pagesize;
-}
-
-#ifdef O_NOATIME
-#define FIO_O_NOATIME	O_NOATIME
-#else
-#define FIO_O_NOATIME	0
-#endif
-
-/* Check for GCC or Clang byte swap intrinsics */
-#if (__has_builtin(__builtin_bswap16) && __has_builtin(__builtin_bswap32) \
-     && __has_builtin(__builtin_bswap64)) || (__GNUC__ > 4 \
-     || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)) /* fio_swapN */
-#define fio_swap16(x)	__builtin_bswap16(x)
-#define fio_swap32(x)	__builtin_bswap32(x)
-#define fio_swap64(x)	__builtin_bswap64(x)
-#else
-#include <byteswap.h>
-#define fio_swap16(x)	bswap_16(x)
-#define fio_swap32(x)	bswap_32(x)
-#define fio_swap64(x)	bswap_64(x)
-#endif /* fio_swapN */
-
-#define CACHE_LINE_FILE	\
-	"/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
-
-static inline int arch_cache_line_size(void)
-{
-	char size[32];
-	int fd, ret;
-
-	fd = open(CACHE_LINE_FILE, O_RDONLY);
-	if (fd < 0)
-		return -1;
-
-	ret = read(fd, size, sizeof(size));
-
-	close(fd);
-
-	if (ret <= 0)
-		return -1;
-	else
-		return atoi(size);
-}
-
-static inline unsigned long long get_fs_free_size(const char *path)
-{
-	unsigned long long ret;
-	struct statfs s;
-
-	if (statfs(path, &s) < 0)
-		return -1ULL;
-
-	ret = s.f_bsize;
-	ret *= (unsigned long long) s.f_bfree;
-	return ret;
-}
-
-static inline int os_trim(struct fio_file *f, unsigned long long start,
-			  unsigned long long len)
-{
-	uint64_t range[2];
-
-	range[0] = start;
-	range[1] = len;
-
-	if (!ioctl(f->fd, BLKDISCARD, range))
-		return 0;
-
-	return errno;
-}
-
-#ifdef CONFIG_SCHED_IDLE
-static inline int fio_set_sched_idle(void)
-{
-        struct sched_param p = { .sched_priority = 0, };
-        return sched_setscheduler(gettid(), SCHED_IDLE, &p);
-}
-#endif
-
-#ifndef RWF_UNCACHED
-#define RWF_UNCACHED	0x00000040
-#endif
-
-#endif
diff --git a/os/os-ashmem.h b/os/os-ashmem.h
new file mode 100644
index 00000000..c34ff656
--- /dev/null
+++ b/os/os-ashmem.h
@@ -0,0 +1,84 @@
+#ifndef CONFIG_NO_SHM
+/*
+ * Bionic doesn't support SysV shared memory, so implement it using ashmem
+ */
+#include <stdio.h>
+#include <linux/ashmem.h>
+#include <linux/shm.h>
+#include <android/api-level.h>
+#if __ANDROID_API__ >= __ANDROID_API_O__
+#include <android/sharedmem.h>
+#else
+#define ASHMEM_DEVICE	"/dev/ashmem"
+#endif
+#define shmid_ds shmid64_ds
+#define SHM_HUGETLB    04000
+
+static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
+{
+	int ret=0;
+	if (__cmd == IPC_RMID)
+	{
+		int length = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
+		struct ashmem_pin pin = {0 , length};
+		ret = ioctl(__shmid, ASHMEM_UNPIN, &pin);
+		close(__shmid);
+	}
+	return ret;
+}
+
+#if __ANDROID_API__ >= __ANDROID_API_O__
+static inline int shmget(key_t __key, size_t __size, int __shmflg)
+{
+	char keybuf[11];
+
+	sprintf(keybuf, "%d", __key);
+
+	return ASharedMemory_create(keybuf, __size + sizeof(uint64_t));
+}
+#else
+static inline int shmget(key_t __key, size_t __size, int __shmflg)
+{
+	int fd,ret;
+	char keybuf[11];
+
+	fd = open(ASHMEM_DEVICE, O_RDWR);
+	if (fd < 0)
+		return fd;
+
+	sprintf(keybuf,"%d",__key);
+	ret = ioctl(fd, ASHMEM_SET_NAME, keybuf);
+	if (ret < 0)
+		goto error;
+
+	/* Stores size in first 8 bytes, allocate extra space */
+	ret = ioctl(fd, ASHMEM_SET_SIZE, __size + sizeof(uint64_t));
+	if (ret < 0)
+		goto error;
+
+	return fd;
+
+error:
+	close(fd);
+	return ret;
+}
+#endif
+
+static inline void *shmat(int __shmid, const void *__shmaddr, int __shmflg)
+{
+	size_t size = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
+	/* Needs to be 8-byte aligned to prevent SIGBUS on 32-bit ARM */
+	uint64_t *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, __shmid, 0);
+	/* Save size at beginning of buffer, for use with munmap */
+	*ptr = size;
+	return ptr + 1;
+}
+
+static inline int shmdt (const void *__shmaddr)
+{
+	/* Find mmap size which we stored at the beginning of the buffer */
+	uint64_t *ptr = (uint64_t *)__shmaddr - 1;
+	size_t size = *ptr;
+	return munmap(ptr, size);
+}
+#endif
diff --git a/os/os-linux.h b/os/os-linux.h
index 3001140c..831f0ad0 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -1,7 +1,11 @@
 #ifndef FIO_OS_LINUX_H
 #define FIO_OS_LINUX_H
 
+#ifdef __ANDROID__
+#define FIO_OS  os_android
+#else
 #define	FIO_OS	os_linux
+#endif
 
 #include <sys/ioctl.h>
 #include <sys/uio.h>
@@ -17,6 +21,11 @@
 #include <linux/major.h>
 #include <linux/fs.h>
 #include <scsi/sg.h>
+#include <asm/byteorder.h>
+#ifdef __ANDROID__
+#include "os-ashmem.h"
+#define FIO_NO_HAVE_SHM_H
+#endif
 
 #ifdef ARCH_HAVE_CRC_CRYPTO
 #include <sys/auxv.h>
@@ -50,6 +59,7 @@
 #define FIO_HAVE_TRIM
 #define FIO_HAVE_GETTID
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
+#define FIO_HAVE_BYTEORDER_FUNCS
 #define FIO_HAVE_PWRITEV2
 #define FIO_HAVE_SHM_ATTACH_REMOVED
 
@@ -81,8 +91,8 @@ typedef cpu_set_t os_cpu_mask_t;
 	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
 #endif
 
-#define fio_cpu_clear(mask, cpu)	(void) CPU_CLR((cpu), (mask))
-#define fio_cpu_set(mask, cpu)		(void) CPU_SET((cpu), (mask))
+#define fio_cpu_clear(mask, cpu)	CPU_CLR((cpu), (mask))
+#define fio_cpu_set(mask, cpu)		CPU_SET((cpu), (mask))
 #define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
 #define fio_cpu_count(mask)		CPU_COUNT((mask))
 
diff --git a/os/os.h b/os/os.h
index 810e6166..aba6813f 100644
--- a/os/os.h
+++ b/os/os.h
@@ -33,9 +33,7 @@ typedef enum {
 } cpu_features;
 
 /* IWYU pragma: begin_exports */
-#if defined(__ANDROID__)
-#include "os-android.h"
-#elif defined(__linux__)
+#if defined(__linux__)
 #include "os-linux.h"
 #elif defined(__FreeBSD__)
 #include "os-freebsd.h"
diff --git a/t/jobs/t0021.fio b/t/jobs/t0021.fio
new file mode 100644
index 00000000..47fbae71
--- /dev/null
+++ b/t/jobs/t0021.fio
@@ -0,0 +1,15 @@
+# make sure the lfsr random generator actually does touch all the offsets
+#
+# Expected result: offsets are not accessed sequentially and all offsets are touched
+# Buggy result: offsets are accessed sequentially and one or more offsets are missed
+# run with --debug=io or logging to see which offsets are read
+
+[test]
+ioengine=null
+filesize=1M
+rw=randread
+write_bw_log=test
+per_job_logs=0
+log_offset=1
+norandommap=1
+random_generator=lfsr
diff --git a/t/jobs/t0022.fio b/t/jobs/t0022.fio
new file mode 100644
index 00000000..2324571e
--- /dev/null
+++ b/t/jobs/t0022.fio
@@ -0,0 +1,13 @@
+# make sure that when we enable norandommap we touch some offsets more than once
+#
+# Expected result: at least one offset is touched more than once
+# Buggy result: each offset is touched only once
+
+[test]
+ioengine=null
+filesize=1M
+rw=randread
+write_bw_log=test
+per_job_logs=0
+log_offset=1
+norandommap=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 78f43521..47823761 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -576,7 +576,7 @@ class FioJobTest_t0019(FioJobTest):
 
 
 class FioJobTest_t0020(FioJobTest):
-    """Test consists of fio test job t0020
+    """Test consists of fio test jobs t0020 and t0021
     Confirm that almost all offsets were touched non-sequentially"""
 
     def check_result(self):
@@ -614,6 +614,41 @@ class FioJobTest_t0020(FioJobTest):
                 self.failure_reason += " missing offset {0}".format(i*4096)
 
 
+class FioJobTest_t0022(FioJobTest):
+    """Test consists of fio test job t0022"""
+
+    def check_result(self):
+        super(FioJobTest_t0022, self).check_result()
+
+        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        file_data, success = self.get_file(bw_log_filename)
+        log_lines = file_data.split('\n')
+
+        filesize = 1024*1024
+        bs = 4096
+        seq_count = 0
+        offsets = set()
+
+        prev = int(log_lines[0].split(',')[4])
+        for line in log_lines[1:]:
+            offsets.add(prev/bs)
+            if len(line.strip()) == 0:
+                continue
+            cur = int(line.split(',')[4])
+            if cur - prev == bs:
+                seq_count += 1
+            prev = cur
+
+        # 10 is an arbitrary threshold
+        if seq_count > 10:
+            self.passed = False
+            self.failure_reason = "too many ({0}) consecutive offsets".format(seq_count)
+
+        if len(offsets) == filesize/bs:
+            self.passed = False
+            self.failure_reason += " no duplicate offsets found with norandommap=1".format(len(offsets))
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -973,6 +1008,24 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [],
     },
+    {
+        'test_id':          21,
+        'test_class':       FioJobTest_t0020,
+        'job':              't0021.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
+    {
+        'test_id':          22,
+        'test_class':       FioJobTest_t0022,
+        'job':              't0022.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

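The ioprio_value() helper in the hunk above works in two parts. It packs an I/O scheduling class into the bits above IOPRIO_CLASS_SHIFT and the priority level into the low bits, and it substitutes the best-effort class when no class is given.

Below is a minimal Python sketch of the same arithmetic. It assumes the constant values shown in that hunk (classes 0 to 3, shift 13). ioprio_split() is a hypothetical inverse added only for illustration and has no counterpart in fio.

```python
# Constants mirrored from the hunk above (assumed unchanged elsewhere in fio).
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = range(4)
IOPRIO_CLASS_SHIFT = 13
IOPRIO_MIN_PRIO, IOPRIO_MAX_PRIO = 0, 7


def ioprio_value(ioprio_class: int, ioprio: int) -> int:
    """Pack a class and a level the way the C ioprio_value() helper does."""
    # As in the C code, class 0 (NONE) is treated as best-effort.
    if not ioprio_class:
        ioprio_class = IOPRIO_CLASS_BE
    return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio


def ioprio_value_is_class_rt(priority: int) -> bool:
    """Mirror of ioprio_value_is_class_rt(): compare only the class bits."""
    return (priority >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT


def ioprio_split(priority: int) -> tuple:
    """Hypothetical inverse (not in fio): recover (class, level)."""
    level_mask = (1 << IOPRIO_CLASS_SHIFT) - 1
    return priority >> IOPRIO_CLASS_SHIFT, priority & level_mask
```

For example, ioprio_value(0, 4) yields (2 << 13) | 4 = 16388, which is best-effort level 4.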

* Recent changes (master)
@ 2022-08-30 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-30 12:00 UTC
  To: fio

The following changes since commit a5a2429ece9b2a7e35e2b8a0248e7b1de6d075c3:

  t/io_uring: remove duplicate definition of gettid() (2022-08-26 14:17:40 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b68ba328173f5a4714d888f6ce80fd24a4e4c504:

  test: get 32-bit Ubuntu 22.04 build working (2022-08-29 16:42:18 -0400)

----------------------------------------------------------------
Vincent Fu (3):
      test: add some tests for seq and rand offsets
      test: use Ubuntu 22.04 for 64-bit tests
      test: get 32-bit Ubuntu 22.04 build working

 .github/workflows/ci.yml |  8 ++---
 ci/actions-install.sh    | 13 ++++----
 t/jobs/t0019.fio         | 10 ++++++
 t/jobs/t0020.fio         | 11 +++++++
 t/run-fio-tests.py       | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 115 insertions(+), 11 deletions(-)
 create mode 100644 t/jobs/t0019.fio
 create mode 100644 t/jobs/t0020.fio

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 650366b2..bdc4db85 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -18,18 +18,18 @@ jobs:
         - android
         include:
         - build: linux-gcc
-          os: ubuntu-20.04
+          os: ubuntu-22.04
           cc: gcc
         - build: linux-clang
-          os: ubuntu-20.04
+          os: ubuntu-22.04
           cc: clang
         - build: macos
           os: macos-11
         - build: linux-i686-gcc
-          os: ubuntu-20.04
+          os: ubuntu-22.04
           arch: i686
         - build: android
-          os: ubuntu-20.04
+          os: ubuntu-22.04
           arch: aarch64-linux-android32
 
     env:
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index b5c4198f..c209a089 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -23,26 +23,21 @@ DPKGCFG
         libcunit1-dev
         libcurl4-openssl-dev
         libfl-dev
-        libibverbs-dev
         libnuma-dev
-        librdmacm-dev
 	libnfs-dev
         valgrind
     )
     case "${CI_TARGET_ARCH}" in
         "i686")
             sudo dpkg --add-architecture i386
-            opts="--allow-downgrades"
             pkgs=("${pkgs[@]/%/:i386}")
             pkgs+=(
                 gcc-multilib
                 pkg-config:i386
                 zlib1g-dev:i386
-		libpcre2-8-0=10.34-7
             )
             ;;
         "x86_64")
-            opts=""
             pkgs+=(
                 libglusterfs-dev
                 libgoogle-perftools-dev
@@ -53,7 +48,11 @@ DPKGCFG
                 librbd-dev
                 libtcmalloc-minimal4
                 nvidia-cuda-dev
+                libibverbs-dev
+                librdmacm-dev
             )
+	    echo "Removing libunwind-14-dev because of conflicts with libunwind-dev"
+	    sudo apt remove -y libunwind-14-dev
             ;;
     esac
 
@@ -66,8 +65,8 @@ DPKGCFG
 
     echo "Updating APT..."
     sudo apt-get -qq update
-    echo "Installing packages..."
-    sudo apt-get install "$opts" -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
+    echo "Installing packages... ${pkgs[@]}"
+    sudo apt-get install -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
 }
 
 install_linux() {
diff --git a/t/jobs/t0019.fio b/t/jobs/t0019.fio
new file mode 100644
index 00000000..b60d27d2
--- /dev/null
+++ b/t/jobs/t0019.fio
@@ -0,0 +1,10 @@
+# Expected result: offsets are accessed sequentially and all offsets are read
+# Buggy result: offsets are not accessed sequentially and one or more offsets are missed
+# run with --debug=io or logging to see which offsets are accessed
+
+[test]
+ioengine=null
+filesize=1M
+write_bw_log=test
+per_job_logs=0
+log_offset=1
diff --git a/t/jobs/t0020.fio b/t/jobs/t0020.fio
new file mode 100644
index 00000000..1c1c5166
--- /dev/null
+++ b/t/jobs/t0020.fio
@@ -0,0 +1,11 @@
+# Expected result: offsets are not accessed sequentially and all offsets are touched
+# Buggy result: offsets are accessed sequentially and one or more offsets are missed
+# run with --debug=io or logging to see which offsets are read
+
+[test]
+ioengine=null
+filesize=1M
+rw=randread
+write_bw_log=test
+per_job_logs=0
+log_offset=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 1e5e9f24..78f43521 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -548,6 +548,72 @@ class FioJobTest_t0015(FioJobTest):
             self.passed = False
 
 
+class FioJobTest_t0019(FioJobTest):
+    """Test consists of fio test job t0019
+    Confirm that all offsets were touched sequentially"""
+
+    def check_result(self):
+        super(FioJobTest_t0019, self).check_result()
+
+        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        file_data, success = self.get_file(bw_log_filename)
+        log_lines = file_data.split('\n')
+
+        prev = -4096
+        for line in log_lines:
+            if len(line.strip()) == 0:
+                continue
+            cur = int(line.split(',')[4])
+            if cur - prev != 4096:
+                self.passed = False
+                self.failure_reason = "offsets {0}, {1} not sequential".format(prev, cur)
+                return
+            prev = cur
+
+        if cur/4096 != 255:
+            self.passed = False
+            self.failure_reason = "unexpected last offset {0}".format(cur)
+
+
+class FioJobTest_t0020(FioJobTest):
+    """Test consists of fio test job t0020
+    Confirm that almost all offsets were touched non-sequentially"""
+
+    def check_result(self):
+        super(FioJobTest_t0020, self).check_result()
+
+        bw_log_filename = os.path.join(self.test_dir, "test_bw.log")
+        file_data, success = self.get_file(bw_log_filename)
+        log_lines = file_data.split('\n')
+
+        seq_count = 0
+        offsets = set()
+
+        prev = int(log_lines[0].split(',')[4])
+        for line in log_lines[1:]:
+            offsets.add(prev/4096)
+            if len(line.strip()) == 0:
+                continue
+            cur = int(line.split(',')[4])
+            if cur - prev == 4096:
+                seq_count += 1
+            prev = cur
+
+        # 10 is an arbitrary threshold
+        if seq_count > 10:
+            self.passed = False
+            self.failure_reason = "too many ({0}) consecutive offsets".format(seq_count)
+
+        if len(offsets) != 256:
+            self.passed = False
+            self.failure_reason += " number of offsets is {0} instead of 256".format(len(offsets))
+
+        for i in range(256):
+            if not i in offsets:
+                self.passed = False
+                self.failure_reason += " missing offset {0}".format(i*4096)
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -889,6 +955,24 @@ TEST_LIST = [
         'pre_success':      None,
         'requirements':     [Requirements.linux, Requirements.io_uring],
     },
+    {
+        'test_id':          19,
+        'test_class':       FioJobTest_t0019,
+        'job':              't0019.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
+    {
+        'test_id':          20,
+        'test_class':       FioJobTest_t0020,
+        'job':              't0020.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

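The t0019/t0020 checks above do three things:

- read the offset from the fifth comma-separated field of test_bw.log,
- count neighbouring entries that are exactly one 4 KiB block apart,
- confirm that all 256 blocks of the 1 MiB file were touched.

Below is a restructured Python sketch of that logic, assuming the same log layout. parse_offsets() and check_random_coverage() are hypothetical names, not part of run-fio-tests.py. Unlike the loop above, it computes block indices with integer division. It also records the last offset without relying on a trailing empty line.

```python
def parse_offsets(log_text: str) -> list:
    """Return the offset column (fifth field) of every non-empty log line."""
    offsets = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        offsets.append(int(line.split(",")[4]))
    return offsets


def check_random_coverage(offsets, filesize=1024 * 1024, bs=4096, max_seq=10):
    """Return a list of failure reasons; an empty list means the check passed."""
    reasons = []
    # Count neighbouring entries that are exactly one block apart.
    seq_count = sum(1 for prev, cur in zip(offsets, offsets[1:])
                    if cur - prev == bs)
    if seq_count > max_seq:  # same arbitrary threshold as the test above
        reasons.append("too many ({0}) consecutive offsets".format(seq_count))
    # Integer division keeps block indices as ints under Python 3.
    touched = {off // bs for off in offsets}
    missing = [i * bs for i in range(filesize // bs) if i not in touched]
    if missing:
        reasons.append("missing offsets {0}".format(missing))
    return reasons
```

A purely sequential run (the t0019 pattern) is flagged for too many consecutive offsets. A permutation of all 256 blocks passes.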

* Recent changes (master)
@ 2022-08-27 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-27 12:00 UTC
  To: fio

The following changes since commit c27ae7ae6c3d9108bba80ff71cf36bf7fc8b34c9:

  engines/io_uring: delete debug code (2022-08-25 11:19:34 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a5a2429ece9b2a7e35e2b8a0248e7b1de6d075c3:

  t/io_uring: remove duplicate definition of gettid() (2022-08-26 14:17:40 -0600)

----------------------------------------------------------------
Anuj Gupta (2):
      t/io_uring: prep for including engines/nvme.h in t/io_uring
      t/io_uring: add support for async-passthru

Jens Axboe (2):
      t/io_uring: fix 64-bit cast on 32-bit archs
      t/io_uring: remove duplicate definition of gettid()

Vincent Fu (1):
      test: add basic test for io_uring ioengine

 t/io_uring.c       | 264 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 t/jobs/t0018.fio   |   9 ++
 t/run-fio-tests.py |  22 +++++
 3 files changed, 271 insertions(+), 24 deletions(-)
 create mode 100644 t/jobs/t0018.fio

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index f34a3554..e8e41796 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -30,11 +30,13 @@
 #include <sched.h>
 
 #include "../arch/arch.h"
+#include "../os/os.h"
 #include "../lib/types.h"
 #include "../lib/roundup.h"
 #include "../lib/rand.h"
 #include "../minmax.h"
 #include "../os/linux/io_uring.h"
+#include "../engines/nvme.h"
 
 struct io_sq_ring {
 	unsigned *head;
@@ -67,6 +69,8 @@ struct file {
 	unsigned long max_size;
 	unsigned long cur_off;
 	unsigned pending_ios;
+	unsigned int nsid;	/* nsid field required for nvme-passthrough */
+	unsigned int lba_shift;	/* lba_shift field required for nvme-passthrough */
 	int real_fd;
 	int fixed_fd;
 	int fileno;
@@ -117,7 +121,7 @@ static struct submitter *submitter;
 static volatile int finish;
 static int stats_running;
 static unsigned long max_iops;
-static long page_size;
+static long t_io_uring_page_size;
 
 static int depth = DEPTH;
 static int batch_submit = BATCH_SUBMIT;
@@ -139,6 +143,7 @@ static int random_io = 1;	/* random or sequential IO */
 static int register_ring = 1;	/* register ring */
 static int use_sync = 0;	/* use preadv2 */
 static int numa_placement = 0;	/* set to node of device */
+static int pt = 0;		/* passthrough I/O or not */
 
 static unsigned long tsc_rate;
 
@@ -161,6 +166,54 @@ struct io_uring_map_buffers {
 };
 #endif
 
+static int nvme_identify(int fd, __u32 nsid, enum nvme_identify_cns cns,
+			 enum nvme_csi csi, void *data)
+{
+	struct nvme_passthru_cmd cmd = {
+		.opcode         = nvme_admin_identify,
+		.nsid           = nsid,
+		.addr           = (__u64)(uintptr_t)data,
+		.data_len       = NVME_IDENTIFY_DATA_SIZE,
+		.cdw10          = cns,
+		.cdw11          = csi << NVME_IDENTIFY_CSI_SHIFT,
+		.timeout_ms     = NVME_DEFAULT_IOCTL_TIMEOUT,
+	};
+
+	return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
+}
+
+static int nvme_get_info(int fd, __u32 *nsid, __u32 *lba_sz, __u64 *nlba)
+{
+	struct nvme_id_ns ns;
+	int namespace_id;
+	int err;
+
+	namespace_id = ioctl(fd, NVME_IOCTL_ID);
+	if (namespace_id < 0) {
+		fprintf(stderr, "error failed to fetch namespace-id\n");
+		close(fd);
+		return -errno;
+	}
+
+	/*
+	 * Identify namespace to get namespace-id, namespace size in LBA's
+	 * and LBA data size.
+	 */
+	err = nvme_identify(fd, namespace_id, NVME_IDENTIFY_CNS_NS,
+				NVME_CSI_NVM, &ns);
+	if (err) {
+		fprintf(stderr, "error failed to fetch identify namespace\n");
+		close(fd);
+		return err;
+	}
+
+	*nsid = namespace_id;
+	*lba_sz = 1 << ns.lbaf[(ns.flbas & 0x0f)].ds;
+	*nlba = ns.nsze;
+
+	return 0;
+}
+
 static unsigned long cycles_to_nsec(unsigned long cycles)
 {
 	uint64_t val;
@@ -195,9 +248,9 @@ static unsigned long plat_idx_to_val(unsigned int idx)
 	return cycles_to_nsec(base + ((k + 0.5) * (1 << error_bits)));
 }
 
-unsigned int calc_clat_percentiles(unsigned long *io_u_plat, unsigned long nr,
-				   unsigned long **output,
-				   unsigned long *maxv, unsigned long *minv)
+unsigned int calculate_clat_percentiles(unsigned long *io_u_plat,
+		unsigned long nr, unsigned long **output,
+		unsigned long *maxv, unsigned long *minv)
 {
 	unsigned long sum = 0;
 	unsigned int len = plist_len, i, j = 0;
@@ -251,7 +304,7 @@ static void show_clat_percentiles(unsigned long *io_u_plat, unsigned long nr,
 	bool is_last;
 	char fmt[32];
 
-	len = calc_clat_percentiles(io_u_plat, nr, &ovals, &maxv, &minv);
+	len = calculate_clat_percentiles(io_u_plat, nr, &ovals, &maxv, &minv);
 	if (!len || !ovals)
 		goto out;
 
@@ -443,13 +496,6 @@ static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 #endif
 }
 
-#ifndef CONFIG_HAVE_GETTID
-static int gettid(void)
-{
-	return syscall(__NR_gettid);
-}
-#endif
-
 static unsigned file_depth(struct submitter *s)
 {
 	return (depth + s->nr_files - 1) / s->nr_files;
@@ -520,6 +566,65 @@ static void init_io(struct submitter *s, unsigned index)
 		sqe->user_data |= ((uint64_t)s->clock_index << 32);
 }
 
+static void init_io_pt(struct submitter *s, unsigned index)
+{
+	struct io_uring_sqe *sqe = &s->sqes[index << 1];
+	unsigned long offset;
+	struct file *f;
+	struct nvme_uring_cmd *cmd;
+	unsigned long long slba;
+	unsigned long long nlb;
+	long r;
+
+	if (s->nr_files == 1) {
+		f = &s->files[0];
+	} else {
+		f = &s->files[s->cur_file];
+		if (f->pending_ios >= file_depth(s)) {
+			s->cur_file++;
+			if (s->cur_file == s->nr_files)
+				s->cur_file = 0;
+			f = &s->files[s->cur_file];
+		}
+	}
+	f->pending_ios++;
+
+	if (random_io) {
+		r = __rand64(&s->rand_state);
+		offset = (r % (f->max_blocks - 1)) * bs;
+	} else {
+		offset = f->cur_off;
+		f->cur_off += bs;
+		if (f->cur_off + bs > f->max_size)
+			f->cur_off = 0;
+	}
+
+	if (register_files) {
+		sqe->fd = f->fixed_fd;
+		sqe->flags = IOSQE_FIXED_FILE;
+	} else {
+		sqe->fd = f->real_fd;
+		sqe->flags = 0;
+	}
+	sqe->opcode = IORING_OP_URING_CMD;
+	sqe->user_data = (unsigned long) f->fileno;
+	if (stats)
+		sqe->user_data |= ((__u64) s->clock_index << 32ULL);
+	sqe->cmd_op = NVME_URING_CMD_IO;
+	slba = offset >> f->lba_shift;
+	nlb = (bs >> f->lba_shift) - 1;
+	cmd = (struct nvme_uring_cmd *)&sqe->cmd;
+	/* cdw10 and cdw11 represent starting slba*/
+	cmd->cdw10 = slba & 0xffffffff;
+	cmd->cdw11 = slba >> 32;
+	/* cdw12 represent number of lba to be read*/
+	cmd->cdw12 = nlb;
+	cmd->addr = (unsigned long) s->iovecs[index].iov_base;
+	cmd->data_len = bs;
+	cmd->nsid = f->nsid;
+	cmd->opcode = 2;
+}
+
 static int prep_more_ios_uring(struct submitter *s, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
@@ -532,7 +637,10 @@ static int prep_more_ios_uring(struct submitter *s, int max_ios)
 			break;
 
 		index = tail & sq_ring_mask;
-		init_io(s, index);
+		if (pt)
+			init_io_pt(s, index);
+		else
+			init_io(s, index);
 		ring->array[index] = index;
 		prepped++;
 		tail = next_tail;
@@ -549,7 +657,29 @@ static int get_file_size(struct file *f)
 
 	if (fstat(f->real_fd, &st) < 0)
 		return -1;
-	if (S_ISBLK(st.st_mode)) {
+	if (pt) {
+		__u64 nlba;
+		__u32 lbs;
+		int ret;
+
+		if (!S_ISCHR(st.st_mode)) {
+			fprintf(stderr, "passthrough works with only nvme-ns "
+					"generic devices (/dev/ngXnY)\n");
+			return -1;
+		}
+		ret = nvme_get_info(f->real_fd, &f->nsid, &lbs, &nlba);
+		if (ret)
+			return -1;
+		if ((bs % lbs) != 0) {
+			printf("error: bs:%d should be a multiple logical_block_size:%d\n",
+					bs, lbs);
+			return -1;
+		}
+		f->max_blocks = nlba / bs;
+		f->max_size = nlba;
+		f->lba_shift = ilog2(lbs);
+		return 0;
+	} else if (S_ISBLK(st.st_mode)) {
 		unsigned long long bytes;
 
 		if (ioctl(f->real_fd, BLKGETSIZE64, &bytes) != 0)
@@ -620,6 +750,60 @@ static int reap_events_uring(struct submitter *s)
 	return reaped;
 }
 
+static int reap_events_uring_pt(struct submitter *s)
+{
+	struct io_cq_ring *ring = &s->cq_ring;
+	struct io_uring_cqe *cqe;
+	unsigned head, reaped = 0;
+	int last_idx = -1, stat_nr = 0;
+	unsigned index;
+	int fileno;
+
+	head = *ring->head;
+	do {
+		struct file *f;
+
+		read_barrier();
+		if (head == atomic_load_acquire(ring->tail))
+			break;
+		index = head & cq_ring_mask;
+		cqe = &ring->cqes[index << 1];
+		fileno = cqe->user_data & 0xffffffff;
+		f = &s->files[fileno];
+		f->pending_ios--;
+
+		if (cqe->res != 0) {
+			printf("io: unexpected ret=%d\n", cqe->res);
+			if (polled && cqe->res == -EINVAL)
+				printf("passthrough doesn't support polled IO\n");
+			return -1;
+		}
+		if (stats) {
+			int clock_index = cqe->user_data >> 32;
+
+			if (last_idx != clock_index) {
+				if (last_idx != -1) {
+					add_stat(s, last_idx, stat_nr);
+					stat_nr = 0;
+				}
+				last_idx = clock_index;
+			}
+			stat_nr++;
+		}
+		reaped++;
+		head++;
+	} while (1);
+
+	if (stat_nr)
+		add_stat(s, last_idx, stat_nr);
+
+	if (reaped) {
+		s->inflight -= reaped;
+		atomic_store_release(ring->head, head);
+	}
+	return reaped;
+}
+
 static void set_affinity(struct submitter *s)
 {
 #ifdef CONFIG_LIBNUMA
@@ -697,6 +881,7 @@ static int setup_ring(struct submitter *s)
 	struct io_uring_params p;
 	int ret, fd;
 	void *ptr;
+	size_t len;
 
 	memset(&p, 0, sizeof(p));
 
@@ -709,6 +894,10 @@ static int setup_ring(struct submitter *s)
 			p.sq_thread_cpu = sq_thread_cpu;
 		}
 	}
+	if (pt) {
+		p.flags |= IORING_SETUP_SQE128;
+		p.flags |= IORING_SETUP_CQE32;
+	}
 
 	fd = io_uring_setup(depth, &p);
 	if (fd < 0) {
@@ -761,11 +950,22 @@ static int setup_ring(struct submitter *s)
 	sring->array = ptr + p.sq_off.array;
 	sq_ring_mask = *sring->ring_mask;
 
-	s->sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
+	if (p.flags & IORING_SETUP_SQE128)
+		len = 2 * p.sq_entries * sizeof(struct io_uring_sqe);
+	else
+		len = p.sq_entries * sizeof(struct io_uring_sqe);
+	s->sqes = mmap(0, len,
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_SQES);
 
-	ptr = mmap(0, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
+	if (p.flags & IORING_SETUP_CQE32) {
+		len = p.cq_off.cqes +
+			2 * p.cq_entries * sizeof(struct io_uring_cqe);
+	} else {
+		len = p.cq_off.cqes +
+			p.cq_entries * sizeof(struct io_uring_cqe);
+	}
+	ptr = mmap(0, len,
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_CQ_RING);
 	cring->head = ptr + p.cq_off.head;
@@ -786,7 +986,7 @@ static void *allocate_mem(struct submitter *s, int size)
 		return numa_alloc_onnode(size, s->numa_node);
 #endif
 
-	if (posix_memalign(&buf, page_size, bs)) {
+	if (posix_memalign(&buf, t_io_uring_page_size, bs)) {
 		printf("failed alloc\n");
 		return NULL;
 	}
@@ -855,7 +1055,16 @@ static int submitter_init(struct submitter *s)
 		s->plat = NULL;
 		nr_batch = 0;
 	}
+	/* perform the expensive command initialization part for passthrough here
+	 * rather than in the fast path
+	 */
+	if (pt) {
+		for (i = 0; i < roundup_pow2(depth); i++) {
+			struct io_uring_sqe *sqe = &s->sqes[i << 1];
 
+			memset(&sqe->cmd, 0, sizeof(struct nvme_uring_cmd));
+		}
+	}
 	return nr_batch;
 }
 
@@ -1111,7 +1320,10 @@ submit:
 		do {
 			int r;
 
-			r = reap_events_uring(s);
+			if (pt)
+				r = reap_events_uring_pt(s);
+			else
+				r = reap_events_uring(s);
 			if (r == -1) {
 				s->finish = 1;
 				break;
@@ -1305,11 +1517,12 @@ static void usage(char *argv, int status)
 		" -a <bool> : Use legacy aio, default %d\n"
 		" -S <bool> : Use sync IO (preadv2), default %d\n"
 		" -X <bool> : Use registered ring %d\n"
-		" -P <bool> : Automatically place on device home node %d\n",
+		" -P <bool> : Automatically place on device home node %d\n"
+		" -u <bool> : Use nvme-passthrough I/O, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
 		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
 		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio,
-		use_sync, register_ring, numa_placement);
+		use_sync, register_ring, numa_placement, pt);
 	exit(status);
 }
 
@@ -1368,7 +1581,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:X:S:P:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:X:S:P:u:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1449,6 +1662,9 @@ int main(int argc, char *argv[])
 		case 'P':
 			numa_placement = !!atoi(optarg);
 			break;
+		case 'u':
+			pt = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -1542,9 +1758,9 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	page_size = sysconf(_SC_PAGESIZE);
-	if (page_size < 0)
-		page_size = 4096;
+	t_io_uring_page_size = sysconf(_SC_PAGESIZE);
+	if (t_io_uring_page_size < 0)
+		t_io_uring_page_size = 4096;
 
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
diff --git a/t/jobs/t0018.fio b/t/jobs/t0018.fio
new file mode 100644
index 00000000..e2298b1f
--- /dev/null
+++ b/t/jobs/t0018.fio
@@ -0,0 +1,9 @@
+# Expected result: job completes without error
+# Buggy result: job fails
+
+[test]
+ioengine=io_uring
+filesize=256K
+time_based
+runtime=3s
+rw=randrw
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 504b7cdb..1e5e9f24 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -582,6 +582,7 @@ class Requirements(object):
 
     _linux = False
     _libaio = False
+    _io_uring = False
     _zbd = False
     _root = False
     _zoned_nullb = False
@@ -605,6 +606,12 @@ class Requirements(object):
                 Requirements._zbd = "CONFIG_HAS_BLKZONED" in contents
                 Requirements._libaio = "CONFIG_LIBAIO" in contents
 
+            contents, success = FioJobTest.get_file("/proc/kallsyms")
+            if not success:
+                print("Unable to open '/proc/kallsyms' to probe for io_uring support")
+            else:
+                Requirements._io_uring = "io_uring_setup" in contents
+
             Requirements._root = (os.geteuid() == 0)
             if Requirements._zbd and Requirements._root:
                 try:
@@ -627,6 +634,7 @@ class Requirements(object):
 
         req_list = [Requirements.linux,
                     Requirements.libaio,
+                    Requirements.io_uring,
                     Requirements.zbd,
                     Requirements.root,
                     Requirements.zoned_nullb,
@@ -648,6 +656,11 @@ class Requirements(object):
         """Is libaio available?"""
         return Requirements._libaio, "libaio required"
 
+    @classmethod
+    def io_uring(cls):
+        """Is io_uring available?"""
+        return Requirements._io_uring, "io_uring required"
+
     @classmethod
     def zbd(cls):
         """Is ZBD support available?"""
@@ -867,6 +880,15 @@ TEST_LIST = [
         'output_format':    'json',
         'requirements':     [Requirements.not_windows],
     },
+    {
+        'test_id':          18,
+        'test_class':       FioJobTest,
+        'job':              't0018.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [Requirements.linux, Requirements.io_uring],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

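The requirement check added above marks io_uring as available when the string `io_uring_setup` appears anywhere in /proc/kallsyms. Below is a minimal C sketch of the same probe. The helper name `file_contains` is hypothetical. Like the Python code, it is a plain substring test rather than an exact symbol lookup, so it also matches prefixed symbol names. The file may be unreadable in restricted environments, so the sketch distinguishes "not found" from "could not open".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if any line of 'path' contains 'needle',
 * 0 if none does, or -1 if the file cannot be opened. This mirrors the
 * substring test in run-fio-tests.py; it does not parse symbol names.
 * Lines longer than the buffer are read in pieces, which is fine for
 * /proc/kallsyms, whose lines are short.
 */
static int file_contains(const char *path, const char *needle)
{
	char line[512];
	FILE *fp = fopen(path, "r");
	int found = 0;

	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		if (strstr(line, needle)) {
			found = 1;
			break;
		}
	}
	fclose(fp);
	return found;
}
```

For example, `file_contains("/proc/kallsyms", "io_uring_setup") == 1` corresponds to `Requirements._io_uring` being true.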

* Recent changes (master)
@ 2022-08-26 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-26 12:00 UTC
  To: fio

The following changes since commit 05ef0e4e822ffa81d6e92ed538d32cc37a907279:

  Merge branch 'master' of https://github.com/kraj/fio (2022-08-24 20:09:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c27ae7ae6c3d9108bba80ff71cf36bf7fc8b34c9:

  engines/io_uring: delete debug code (2022-08-25 11:19:34 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      engines/io_uring: delete debug code

 engines/io_uring.c | 6 ------
 1 file changed, 6 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 89d64b06..94376efa 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -445,18 +445,12 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	struct io_uring_cqe *cqe;
 	struct io_u *io_u;
 	unsigned index;
-	static int eio;
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	cqe = &ld->cq_ring.cqes[index];
 	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
 
-	if (eio++ == 5) {
-		printf("mark EIO\n");
-		cqe->res = -EIO;
-	}
-
 	if (cqe->res != io_u->xfer_buflen) {
 		if (cqe->res > io_u->xfer_buflen)
 			io_u->error = -cqe->res;

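In the hunk above, fio maps the n-th reaped event to a completion-queue slot with `(event + ld->cq_ring_off) & ld->cq_ring_mask`. It then treats any `cqe->res` other than the requested length as an error or a short transfer. The sketch below restates those two steps with hypothetical names. It assumes the ring size is a power of two, which is what makes masking equivalent to taking the index modulo the ring size. It also relies on the io_uring convention that a read or write completion carries either the number of bytes transferred or a negated errno value.

```c
#include <stdint.h>

/* Hypothetical outcome codes for one completion. */
enum cqe_outcome { CQE_OK, CQE_SHORT, CQE_ERROR };

/*
 * Map the n-th reaped event to a slot in a ring of 'entries' slots.
 * 'entries' must be a power of two so that (entries - 1) works as a mask.
 * The addition is unsigned, so a head counter that has wrapped past
 * UINT_MAX still lands on the right slot.
 */
static unsigned ring_slot(unsigned event, unsigned head, unsigned entries)
{
	unsigned mask = entries - 1;

	return (event + head) & mask;
}

/*
 * Classify a read/write completion: res holds the number of bytes
 * transferred, or a negated errno value on failure.
 */
static enum cqe_outcome classify_cqe(int32_t res, uint64_t expected_len)
{
	if (res < 0)
		return CQE_ERROR;
	/* Byte count differs from the request (normally a short transfer). */
	if ((uint64_t) res != expected_len)
		return CQE_SHORT;
	return CQE_OK;
}
```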

* Recent changes (master)
@ 2022-08-25 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-25 12:00 UTC
  To: fio

The following changes since commit 995c45c08c7a362ae0fb2e54e2de27b555a757ab:

  Merge branch 'sigbreak-wait' of github.com:bjpaupor/fio (2022-08-23 17:09:25 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 05ef0e4e822ffa81d6e92ed538d32cc37a907279:

  Merge branch 'master' of https://github.com/kraj/fio (2022-08-24 20:09:29 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Enable CPU affinity support on Android

Jens Axboe (3):
      engines/io_uring: pass back correct error value when interrupted
      Merge branch 'master' of https://github.com/bvanassche/fio
      Merge branch 'master' of https://github.com/kraj/fio

Khem Raj (1):
      io_uring: Replace pthread_self with s->tid

 engines/io_uring.c |  8 ++++++++
 os/os-android.h    | 26 ++++++++++++++++++++++++++
 t/io_uring.c       |  5 ++---
 3 files changed, 36 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index cffc7371..89d64b06 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -445,12 +445,18 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	struct io_uring_cqe *cqe;
 	struct io_u *io_u;
 	unsigned index;
+	static int eio;
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	cqe = &ld->cq_ring.cqes[index];
 	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
 
+	if (eio++ == 5) {
+		printf("mark EIO\n");
+		cqe->res = -EIO;
+	}
+
 	if (cqe->res != io_u->xfer_buflen) {
 		if (cqe->res > io_u->xfer_buflen)
 			io_u->error = -cqe->res;
@@ -532,6 +538,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 			if (r < 0) {
 				if (errno == EAGAIN || errno == EINTR)
 					continue;
+				r = -errno;
 				td_verror(td, errno, "io_uring_enter");
 				break;
 			}
@@ -665,6 +672,7 @@ static int fio_ioring_commit(struct thread_data *td)
 				usleep(1);
 				continue;
 			}
+			ret = -errno;
 			td_verror(td, errno, "io_uring_enter submit");
 			break;
 		}
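The two one-line fixes above make `fio_ioring_getevents()` and `fio_ioring_commit()` return `-errno` when `io_uring_enter` fails for a reason other than a transient `EAGAIN`/`EINTR`. Before the change, that path presumably returned the system call's raw -1. A caller that expects negative errno values cannot tell that apart from `-EPERM`. The sketch below applies the same retry-then-negate convention to a plain `read(2)`, chosen only because it is easy to exercise. `read_retry` is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Retry transient failures; report any other failure as a negative errno
 * value instead of the bare -1 the system call returned. Like the fio
 * loop, this busy-retries EAGAIN, which only makes sense where the
 * condition is expected to clear quickly.
 */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
	for (;;) {
		ssize_t r = read(fd, buf, len);

		if (r >= 0)
			return r;
		if (errno == EAGAIN || errno == EINTR)
			continue;
		return -errno;
	}
}
```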
diff --git a/os/os-android.h b/os/os-android.h
index 2f73d249..34534239 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -24,6 +24,7 @@
   #define __has_builtin(x) 0  // Compatibility with non-clang compilers.
 #endif
 
+#define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_DISK_UTIL
 #define FIO_HAVE_IOSCHED_SWITCH
 #define FIO_HAVE_IOPRIO
@@ -44,6 +45,13 @@
 
 #define OS_MAP_ANON		MAP_ANONYMOUS
 
+typedef cpu_set_t os_cpu_mask_t;
+
+#define fio_setaffinity(pid, cpumask)		\
+	sched_setaffinity((pid), sizeof(cpumask), &(cpumask))
+#define fio_getaffinity(pid, ptr)	\
+	sched_getaffinity((pid), sizeof(cpu_set_t), (ptr))
+
 #ifndef POSIX_MADV_DONTNEED
 #define posix_madvise   madvise
 #define POSIX_MADV_DONTNEED MADV_DONTNEED
@@ -64,6 +72,24 @@
 	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
 #endif
 
+#define fio_cpu_clear(mask, cpu)	CPU_CLR((cpu), (mask))
+#define fio_cpu_set(mask, cpu)		CPU_SET((cpu), (mask))
+#define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
+#define fio_cpu_count(mask)		CPU_COUNT((mask))
+
+static inline int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	CPU_ZERO(mask);
+	return 0;
+}
+
+static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
+
+#define FIO_MAX_CPUS			CPU_SETSIZE
+
 #ifndef CONFIG_NO_SHM
 /*
  * Bionic doesn't support SysV shared memory, so implement it using ashmem
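The Android change above maps fio's affinity hooks onto the `cpu_set_t` API. That API consists of `sched_setaffinity()`/`sched_getaffinity()` plus the `CPU_ZERO`, `CPU_SET`, `CPU_CLR`, `CPU_ISSET` and `CPU_COUNT` macros. Below is a minimal Linux sketch of those calls. `pin_to_first_allowed_cpu` is a hypothetical helper. It starts from the current mask rather than assuming CPU 0 is usable, because containers and cpusets can restrict the allowed set.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread (pid 0) to the lowest-numbered CPU it is
 * currently allowed to run on. Returns that CPU number, or -1 on failure.
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t mask;
	int cpu;

	CPU_ZERO(&mask);
	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, &mask))
			break;
	}
	if (cpu == CPU_SETSIZE)
		return -1;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	/* Read the mask back: exactly one CPU should now be allowed. */
	if (sched_getaffinity(0, sizeof(mask), &mask) != 0 ||
	    CPU_COUNT(&mask) != 1)
		return -1;
	return cpu;
}
```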
diff --git a/t/io_uring.c b/t/io_uring.c
index 35bf1956..f34a3554 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -799,15 +799,14 @@ static int submitter_init(struct submitter *s)
 	int i, nr_batch, err;
 	static int init_printed;
 	char buf[80];
-
 	s->tid = gettid();
 	printf("submitter=%d, tid=%d, file=%s, node=%d\n", s->index, s->tid,
 							s->filename, s->numa_node);
 
 	set_affinity(s);
 
-	__init_rand64(&s->rand_state, pthread_self());
-	srand48(pthread_self());
+	__init_rand64(&s->rand_state, s->tid);
+	srand48(s->tid);
 
 	for (i = 0; i < MAX_FDS; i++)
 		s->files[i].fileno = i;

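The t/io_uring.c change seeds each submitter's random state from its kernel thread id instead of `pthread_self()`. POSIX leaves `pthread_t` opaque, so it is not guaranteed to convert cleanly to an integer seed. The tid, by contrast, is a plain integer that also differs per thread. The sketch below keeps a per-thread `erand48()` state seeded from the tid. The struct and function names are hypothetical. It calls `syscall(SYS_gettid)` instead of `gettid()` because glibc versions before 2.30 do not provide the wrapper. It is Linux only.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-thread random state seeded from the kernel thread id. */
struct seeded_rand {
	pid_t tid;
	unsigned short xsubi[3];	/* 48-bit state for erand48() */
};

static void seeded_rand_init(struct seeded_rand *r)
{
	/* The kernel thread id is a plain integer, unlike pthread_t. */
	r->tid = (pid_t) syscall(SYS_gettid);
	/* Spread the tid over the state; the low word is an arbitrary constant. */
	r->xsubi[0] = 0x330e;
	r->xsubi[1] = (unsigned short) (r->tid & 0xffff);
	r->xsubi[2] = (unsigned short) ((r->tid >> 16) & 0xffff);
}

static double seeded_rand_next(struct seeded_rand *r)
{
	/* erand48() keeps its state in the caller's array, not in a global. */
	return erand48(r->xsubi);
}
```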

* Recent changes (master)
@ 2022-08-24 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-24 12:00 UTC
  To: fio

The following changes since commit d33c7846cc5f175177e194a5489282780e2a04c4:

  Merge branch 'clarify-io-errors' of https://github.com/Hi-Angel/fio (2022-08-16 19:54:17 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 995c45c08c7a362ae0fb2e54e2de27b555a757ab:

  Merge branch 'sigbreak-wait' of github.com:bjpaupor/fio (2022-08-23 17:09:25 -0400)

----------------------------------------------------------------
Brandon Paupore (1):
      Add wait for handling SIGBREAK

Vincent Fu (3):
      Revert "Minor style fixups"
      Revert "Fix multithread issues when operating on a single shared file"
      Merge branch 'sigbreak-wait' of github.com:bjpaupor/fio

 backend.c   | 40 +++++++++++++++++++++-------------------
 file.h      |  1 -
 filesetup.c | 45 ++-------------------------------------------
 3 files changed, 23 insertions(+), 63 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 5159b60d..375a23e4 100644
--- a/backend.c
+++ b/backend.c
@@ -90,6 +90,25 @@ static void sig_int(int sig)
 	}
 }
 
+#ifdef WIN32
+static void sig_break(int sig)
+{
+	struct thread_data *td;
+	int i;
+
+	sig_int(sig);
+
+	/**
+	 * Windows terminates all job processes on SIGBREAK after the handler
+	 * returns, so give them time to wrap-up and give stats
+	 */
+	for_each_td(td, i) {
+		while (td->runstate < TD_EXITED)
+			sleep(1);
+	}
+}
+#endif
+
 void sig_show_status(int sig)
 {
 	show_running_run_stats();
@@ -112,7 +131,7 @@ static void set_sig_handlers(void)
 /* Windows uses SIGBREAK as a quit signal from other applications */
 #ifdef WIN32
 	memset(&act, 0, sizeof(act));
-	act.sa_handler = sig_int;
+	act.sa_handler = sig_break;
 	act.sa_flags = SA_RESTART;
 	sigaction(SIGBREAK, &act, NULL);
 #endif
@@ -2314,25 +2333,8 @@ static void run_threads(struct sk_out *sk_out)
 	for_each_td(td, i) {
 		print_status_init(td->thread_number - 1);
 
-		if (!td->o.create_serialize) {
-			/*
-			 *  When operating on a single rile in parallel,
-			 *  perform single-threaded early setup so that
-			 *  when setup_files() does not run into issues
-			 *  later.
-			*/
-			if (!i && td->o.nr_files == 1) {
-				if (setup_shared_file(td)) {
-					exit_value++;
-					if (td->error)
-						log_err("fio: pid=%d, err=%d/%s\n",
-							(int) td->pid, td->error, td->verror);
-					td_set_runstate(td, TD_REAPED);
-					todo--;
-				}
-			}
+		if (!td->o.create_serialize)
 			continue;
-		}
 
 		if (fio_verify_load_state(td))
 			goto reap;
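On Windows, the new `sig_break()` handler forwards to `sig_int()` and then polls every job until its runstate reaches `TD_EXITED`. It waits because Windows terminates the job processes as soon as the handler returns. The sketch below shows the same request-stop-then-wait pattern with POSIX threads and C11 atomics, outside any signal handler. The job structure, state values and function names are hypothetical and much simpler than fio's `thread_data`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical job states, loosely modelled on a runstate progression. */
enum { JOB_RUNNING = 1, JOB_EXITED = 2 };

struct job {
	pthread_t thread;
	atomic_int runstate;
};

static atomic_int stop_requested;

static void *job_main(void *arg)
{
	struct job *j = arg;

	while (!atomic_load(&stop_requested))
		usleep(1000);	/* stand-in for doing I/O */
	/* A real job would flush its stats here before reporting exit. */
	atomic_store(&j->runstate, JOB_EXITED);
	return NULL;
}

/* Ask every job to stop, then poll until each one reports it has exited. */
static void stop_and_wait(struct job *jobs, int n)
{
	atomic_store(&stop_requested, 1);
	for (int i = 0; i < n; i++) {
		while (atomic_load(&jobs[i].runstate) < JOB_EXITED)
			usleep(1000);
	}
}
```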
diff --git a/file.h b/file.h
index e646cf22..da1b8947 100644
--- a/file.h
+++ b/file.h
@@ -201,7 +201,6 @@ struct thread_data;
 extern void close_files(struct thread_data *);
 extern void close_and_free_files(struct thread_data *);
 extern uint64_t get_start_offset(struct thread_data *, struct fio_file *);
-extern int __must_check setup_shared_file(struct thread_data *);
 extern int __must_check setup_files(struct thread_data *);
 extern int __must_check file_invalidate_cache(struct thread_data *, struct fio_file *);
 #ifdef __cplusplus
diff --git a/filesetup.c b/filesetup.c
index 3e2ccf9b..1d3cc5ad 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -143,7 +143,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	if (unlink_file || new_layout) {
 		int ret;
 
-		dprint(FD_FILE, "layout %d unlink %d %s\n", new_layout, unlink_file, f->file_name);
+		dprint(FD_FILE, "layout unlink %s\n", f->file_name);
 
 		ret = td_io_unlink_file(td, f);
 		if (ret != 0 && ret != ENOENT) {
@@ -198,9 +198,6 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
-
-	dprint(FD_FILE, "fill file %s, size %llu\n", f->file_name, (unsigned long long) f->real_file_size);
-
 	left = f->real_file_size;
 	bs = td->o.max_bs[DDIR_WRITE];
 	if (bs > left)
@@ -1081,44 +1078,6 @@ static bool create_work_dirs(struct thread_data *td, const char *fname)
 	return true;
 }
 
-int setup_shared_file(struct thread_data *td)
-{
-	struct fio_file *f;
-	uint64_t file_size;
-	int err = 0;
-
-	if (td->o.nr_files > 1) {
-		log_err("fio: shared file setup called for multiple files\n");
-		return -1;
-	}
-
-	get_file_sizes(td);
-
-	f = td->files[0];
-
-	if (f == NULL) {
-		log_err("fio: NULL shared file\n");
-		return -1;
-	}
-
-	file_size = thread_number * td->o.size;
-	dprint(FD_FILE, "shared setup %s real_file_size=%llu, desired=%llu\n", 
-			f->file_name, (unsigned long long)f->real_file_size, (unsigned long long)file_size);
-
-	if (f->real_file_size < file_size) {
-		dprint(FD_FILE, "fio: extending shared file\n");
-		f->real_file_size = file_size;
-		err = extend_file(td, f);
-		if (!err)
-			err = __file_invalidate_cache(td, f, 0, f->real_file_size);
-		get_file_sizes(td);
-		dprint(FD_FILE, "shared setup new real_file_size=%llu\n", 
-				(unsigned long long)f->real_file_size);
-	}
-
-	return err;
-}
-
 /*
  * Open the files and setup files sizes, creating files if necessary.
  */
@@ -1133,7 +1092,7 @@ int setup_files(struct thread_data *td)
 	const unsigned long long bs = td_min_bs(td);
 	uint64_t fs = 0;
 
-	dprint(FD_FILE, "setup files (thread_number=%d, subjob_number=%d)\n", td->thread_number, td->subjob_number);
+	dprint(FD_FILE, "setup files\n");
 
 	old_state = td_bump_runstate(td, TD_SETTING_UP);
 


* Recent changes (master)
@ 2022-08-17 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-17 12:00 UTC
  To: fio

The following changes since commit eeb302f9bfa4bbe121cae2a12a679c888164fc93:

  README: link to GitHub releases for Windows (2022-08-15 10:37:57 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d33c7846cc5f175177e194a5489282780e2a04c4:

  Merge branch 'clarify-io-errors' of https://github.com/Hi-Angel/fio (2022-08-16 19:54:17 -0600)

----------------------------------------------------------------
Ankit Kumar (2):
      engines/xnvme: fix segfault issue with xnvme ioengine
      doc: update fio doc for xnvme engine

Jens Axboe (1):
      Merge branch 'clarify-io-errors' of https://github.com/Hi-Angel/fio

Konstantin Kharlamov (2):
      doc: get rid of trailing whitespace
      doc: clarify that I/O errors may go unnoticed without direct=1

Vincent Fu (2):
      test: add latency test using posixaio ioengine
      test: fix hash for t0016

 HOWTO.rst                                        | 48 +++++++++++++++------
 engines/xnvme.c                                  | 17 ++++++--
 fio.1                                            | 54 ++++++++++++++++--------
 t/jobs/{t0016-259ebc00.fio => t0016-d54ae22.fio} |  0
 t/jobs/t0017.fio                                 |  9 ++++
 t/run-fio-tests.py                               | 12 +++++-
 6 files changed, 105 insertions(+), 35 deletions(-)
 rename t/jobs/{t0016-259ebc00.fio => t0016-d54ae22.fio} (100%)
 create mode 100644 t/jobs/t0017.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 05fc117f..08be687c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1301,7 +1301,7 @@ I/O type
 	effectively caps the file size at `real_size - offset`. Can be combined with
 	:option:`size` to constrain the start and end range of the I/O workload.
 	A percentage can be specified by a number between 1 and 100 followed by '%',
-	for example, ``offset=20%`` to specify 20%. In ZBD mode, value can be set as 
+	for example, ``offset=20%`` to specify 20%. In ZBD mode, value can be set as
         number of zones using 'z'.
 
 .. option:: offset_align=int
@@ -1877,7 +1877,7 @@ I/O size
 	If this option is not specified, fio will use the full size of the given
 	files or devices.  If the files do not exist, size must be given. It is also
 	possible to give size as a percentage between 1 and 100. If ``size=20%`` is
-	given, fio will use 20% of the full size of the given files or devices. 
+	given, fio will use 20% of the full size of the given files or devices.
 	In ZBD mode, value can also be set as number of zones using 'z'.
 	Can be combined with :option:`offset` to constrain the start and end range
 	that I/O will be done within.
@@ -2780,41 +2780,56 @@ with the caveat that when used on the command line, they must come after the
 	Select the xnvme async command interface. This can take these values.
 
 	**emu**
-		This is default and used to emulate asynchronous I/O.
+		This is default and use to emulate asynchronous I/O by using a
+		single thread to create a queue pair on top of a synchronous
+		I/O interface using the NVMe driver IOCTL.
 	**thrpool**
-		Use thread pool for Asynchronous I/O.
+		Emulate an asynchronous I/O interface with a pool of userspace
+		threads on top of a synchronous I/O interface using the NVMe
+		driver IOCTL. By default four threads are used.
 	**io_uring**
-		Use Linux io_uring/liburing for Asynchronous I/O.
+		Linux native asynchronous I/O interface which supports both
+		direct and buffered I/O.
+	**io_uring_cmd**
+		Fast Linux native asynchronous I/O interface for NVMe pass
+		through commands. This only works with NVMe character device
+		(/dev/ngXnY).
 	**libaio**
 		Use Linux aio for Asynchronous I/O.
 	**posix**
-		Use POSIX aio for Asynchronous I/O.
+		Use the posix asynchronous I/O interface to perform one or
+		more I/O operations asynchronously.
 	**nil**
-		Use nil-io; For introspective perf. evaluation
+		Do not transfer any data; just pretend to. This is mainly used
+		for introspective performance evaluation.
 
 .. option:: xnvme_sync=str : [xnvme]
 
 	Select the xnvme synchronous command interface. This can take these values.
 
 	**nvme**
-		This is default and uses Linux NVMe Driver ioctl() for synchronous I/O.
+		This is default and uses Linux NVMe Driver ioctl() for
+		synchronous I/O.
 	**psync**
-		Use pread()/write() for synchronous I/O.
+		This supports regular as well as vectored pread() and pwrite()
+		commands.
+	**block**
+		This is the same as psync except that it also supports zone
+		management commands using Linux block layer IOCTLs.
 
 .. option:: xnvme_admin=str : [xnvme]
 
 	Select the xnvme admin command interface. This can take these values.
 
 	**nvme**
-		This is default and uses linux NVMe Driver ioctl() for admin commands.
+		This is default and uses linux NVMe Driver ioctl() for admin
+		commands.
 	**block**
 		Use Linux Block Layer ioctl() and sysfs for admin commands.
-	**file_as_ns**
-		Use file-stat to construct NVMe idfy responses.
 
 .. option:: xnvme_dev_nsid=int : [xnvme]
 
-	xnvme namespace identifier, for userspace NVMe driver.
+	xnvme namespace identifier for userspace NVMe driver, such as SPDK.
 
 .. option:: xnvme_iovec=int : [xnvme]
 
@@ -3912,6 +3927,13 @@ Error handling
 	appended, the total error count and the first error. The error field given
 	in the stats is the first error that was hit during the run.
 
+	Note: a write error from the device may go unnoticed by fio when using
+	buffered IO, as the write() (or similar) system call merely dirties the
+	kernel pages, unless :option:`sync` or :option:`direct` is used. Device IO
+	errors occur when the dirty data is actually written out to disk. If fully
+	sync writes aren't desirable, :option:`fsync` or :option:`fdatasync` can be
+	used as well. This is specific to writes, as reads are always synchronous.
+
 	The allowed values are:
 
 		**none**
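The note added to the error-handling section says a device write error can go unnoticed with buffered I/O. This happens because `write()` usually just dirties page-cache pages, and the error is raised only when they are written back. Below is a minimal C sketch of checking for such errors with `fsync()`. `write_and_sync` is a hypothetical helper. It only gives the kernel a chance to report a write-back error. It does not guarantee that every such error is detected on every file system.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write 'len' bytes to 'path' through the page cache, then force them to
 * the device so a write-back error has a place to surface.
 * Returns 0 on success, -1 on any failure.
 */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	ssize_t n;

	if (fd < 0)
		return -1;

	n = write(fd, buf, len);
	if (n < 0 || (size_t) n != len) {
		close(fd);
		return -1;
	}
	/* fdatasync() would also do if file metadata need not be flushed. */
	if (fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```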
diff --git a/engines/xnvme.c b/engines/xnvme.c
index c11b33a8..d8647481 100644
--- a/engines/xnvme.c
+++ b/engines/xnvme.c
@@ -205,9 +205,14 @@ static void _dev_close(struct thread_data *td, struct xnvme_fioe_fwrap *fwrap)
 
 static void xnvme_fioe_cleanup(struct thread_data *td)
 {
-	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_data *xd = NULL;
 	int err;
 
+	if (!td->io_ops_data)
+		return;
+
+	xd = td->io_ops_data;
+
 	err = pthread_mutex_lock(&g_serialize);
 	if (err)
 		log_err("ioeng->cleanup(): pthread_mutex_lock(), err(%d)\n", err);
@@ -367,8 +372,14 @@ static int xnvme_fioe_iomem_alloc(struct thread_data *td, size_t total_mem)
 /* NOTE: using the first device for buffer-allocators) */
 static void xnvme_fioe_iomem_free(struct thread_data *td)
 {
-	struct xnvme_fioe_data *xd = td->io_ops_data;
-	struct xnvme_fioe_fwrap *fwrap = &xd->files[0];
+	struct xnvme_fioe_data *xd = NULL;
+	struct xnvme_fioe_fwrap *fwrap = NULL;
+
+	if (!td->io_ops_data)
+		return;
+
+	xd = td->io_ops_data;
+	fwrap = &xd->files[0];
 
 	if (!fwrap->dev) {
 		log_err("ioeng->iomem_free(): failed no dev-handle\n");
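Both xnvme hooks above now return early when `td->io_ops_data` is NULL. This avoids dereferencing engine data that was never set up, presumably when the engine fails before its init path allocates it. Below is a generic sketch of a cleanup hook that tolerates that case and is also safe to call twice. All types and names here are hypothetical stand-ins, not fio's.

```c
#include <stdlib.h>

/* Hypothetical engine-private data and job handle. */
struct engine_data {
	int *files;
};

struct job_handle {
	void *io_ops_data;	/* NULL until init succeeds */
};

/* Cleanup hook that is a no-op when init never allocated private data. */
static void engine_cleanup(struct job_handle *td)
{
	struct engine_data *xd = td->io_ops_data;

	if (!xd)
		return;

	free(xd->files);
	free(xd);
	td->io_ops_data = NULL;	/* a second call is now a no-op too */
}
```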
diff --git a/fio.1 b/fio.1
index 6630525f..27454b0b 100644
--- a/fio.1
+++ b/fio.1
@@ -292,7 +292,7 @@ For Zone Block Device Mode:
 .RS
 .P
 .PD 0
-z means Zone 
+z means Zone
 .P
 .PD
 .RE
@@ -1083,7 +1083,7 @@ provided. Data before the given offset will not be touched. This
 effectively caps the file size at `real_size \- offset'. Can be combined with
 \fBsize\fR to constrain the start and end range of the I/O workload.
 A percentage can be specified by a number between 1 and 100 followed by '%',
-for example, `offset=20%' to specify 20%. In ZBD mode, value can be set as 
+for example, `offset=20%' to specify 20%. In ZBD mode, value can be set as
 number of zones using 'z'.
 .TP
 .BI offset_align \fR=\fPint
@@ -1099,7 +1099,7 @@ specified). This option is useful if there are several jobs which are
 intended to operate on a file in parallel disjoint segments, with even
 spacing between the starting points. Percentages can be used for this option.
 If a percentage is given, the generated offset will be aligned to the minimum
-\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.In ZBD mode, value 
+\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.In ZBD mode, value
 can be set as number of zones using 'z'.
 .TP
 .BI number_ios \fR=\fPint
@@ -1678,7 +1678,7 @@ If this option is not specified, fio will use the full size of the given
 files or devices. If the files do not exist, size must be given. It is also
 possible to give size as a percentage between 1 and 100. If `size=20%' is
 given, fio will use 20% of the full size of the given files or devices. In ZBD mode,
-size can be given in units of number of zones using 'z'. Can be combined with \fBoffset\fR to 
+size can be given in units of number of zones using 'z'. Can be combined with \fBoffset\fR to
 constrain the start and end range that I/O will be done within.
 .TP
 .BI io_size \fR=\fPint[%|z] "\fR,\fB io_limit" \fR=\fPint[%|z]
@@ -1697,7 +1697,7 @@ also be set as number of zones using 'z'.
 .BI filesize \fR=\fPirange(int)
 Individual file sizes. May be a range, in which case fio will select sizes
 for files at random within the given range. If not given, each created file
-is the same size. This option overrides \fBsize\fR in terms of file size, 
+is the same size. This option overrides \fBsize\fR in terms of file size,
 i.e. \fBsize\fR becomes merely the default for \fBio_size\fR (and
 has no effect it all if \fBio_size\fR is set explicitly).
 .TP
@@ -2530,22 +2530,29 @@ Select the xnvme async command interface. This can take these values.
 .RS
 .TP
 .B emu
-This is default and used to emulate asynchronous I/O
+This is default and use to emulate asynchronous I/O by using a single thread to
+create a queue pair on top of a synchronous I/O interface using the NVMe driver
+IOCTL.
 .TP
 .BI thrpool
-Use thread pool for Asynchronous I/O
+Emulate an asynchronous I/O interface with a pool of userspace threads on top
+of a synchronous I/O interface using the NVMe driver IOCTL. By default four
+threads are used.
 .TP
 .BI io_uring
-Use Linux io_uring/liburing for Asynchronous I/O
+Linux native asynchronous I/O interface which supports both direct and buffered
+I/O.
 .TP
 .BI libaio
 Use Linux aio for Asynchronous I/O
 .TP
 .BI posix
-Use POSIX aio for Asynchronous I/O
+Use the posix asynchronous I/O interface to perform one or more I/O operations
+asynchronously.
 .TP
 .BI nil
-Use nil-io; For introspective perf. evaluation
+Do not transfer any data; just pretend to. This is mainly used for
+introspective performance evaluation.
 .RE
 .RE
 .TP
@@ -2555,10 +2562,14 @@ Select the xnvme synchronous command interface. This can take these values.
 .RS
 .TP
 .B nvme
-This is default and uses Linux NVMe Driver ioctl() for synchronous I/O
+This is default and uses Linux NVMe Driver ioctl() for synchronous I/O.
 .TP
 .BI psync
-Use pread()/write() for synchronous I/O
+This supports regular as well as vectored pread() and pwrite() commands.
+.TP
+.BI block
+This is the same as psync except that it also supports zone management
+commands using Linux block layer IOCTLs.
 .RE
 .RE
 .TP
@@ -2568,18 +2579,15 @@ Select the xnvme admin command interface. This can take these values.
 .RS
 .TP
 .B nvme
-This is default and uses Linux NVMe Driver ioctl() for admin commands
+This is default and uses Linux NVMe Driver ioctl() for admin commands.
 .TP
 .BI block
-Use Linux Block Layer ioctl() and sysfs for admin commands
-.TP
-.BI file_as_ns
-Use file-stat as to construct NVMe idfy responses
+Use Linux Block Layer ioctl() and sysfs for admin commands.
 .RE
 .RE
 .TP
 .BI (xnvme)xnvme_dev_nsid\fR=\fPint
-xnvme namespace identifier, for userspace NVMe driver.
+xnvme namespace identifier for userspace NVMe driver such as SPDK.
 .TP
 .BI (xnvme)xnvme_iovec
 If this option is set, xnvme will use vectored read/write commands.
@@ -3598,6 +3606,16 @@ EILSEQ) until the runtime is exceeded or the I/O size specified is
 completed. If this option is used, there are two more stats that are
 appended, the total error count and the first error. The error field given
 in the stats is the first error that was hit during the run.
+.RS
+.P
+Note: a write error from the device may go unnoticed by fio when using buffered
+IO, as the write() (or similar) system call merely dirties the kernel pages,
+unless `sync' or `direct' is used. Device IO errors occur when the dirty data is
+actually written out to disk. If fully sync writes aren't desirable, `fsync' or
+`fdatasync' can be used as well. This is specific to writes, as reads are always
+synchronous.
+.RS
+.P
 The allowed values are:
 .RS
 .RS
diff --git a/t/jobs/t0016-259ebc00.fio b/t/jobs/t0016-d54ae22.fio
similarity index 100%
rename from t/jobs/t0016-259ebc00.fio
rename to t/jobs/t0016-d54ae22.fio
diff --git a/t/jobs/t0017.fio b/t/jobs/t0017.fio
new file mode 100644
index 00000000..14486d98
--- /dev/null
+++ b/t/jobs/t0017.fio
@@ -0,0 +1,9 @@
+# Expected result: mean(slat) + mean(clat) = mean(lat)
+# Buggy result: equality does not hold
+# This is similar to t0015 and t0016 except that is uses posixaio which is
+# available on more platforms and does not have a commit hook
+
+[test]
+ioengine=posixaio
+size=1M
+iodepth=16
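The expected result for t0017 relies on simple arithmetic. Suppose each I/O's total latency is its submission latency plus its completion latency. Then, by linearity of the mean, mean(slat) + mean(clat) = mean(lat) exactly, and only floating-point rounding can make the two sides differ. The sketch below checks that identity on raw samples with a relative tolerance. The function names and the tolerance parameter are hypothetical and are not the checks `FioJobTest_t0015` performs.

```c
#include <stddef.h>

static double sample_mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return n ? sum / (double) n : 0.0;
}

/*
 * Return 1 if mean(slat) + mean(clat) matches mean(lat) within 'rel_tol'
 * (relative to the larger side), else 0. Latencies are non-negative.
 */
static int latency_means_consistent(const double *slat, const double *clat,
				    const double *lat, size_t n,
				    double rel_tol)
{
	double lhs = sample_mean(slat, n) + sample_mean(clat, n);
	double rhs = sample_mean(lat, n);
	double diff = lhs > rhs ? lhs - rhs : rhs - lhs;
	double scale = lhs > rhs ? lhs : rhs;

	return diff <= rel_tol * scale;
}
```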
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index d77f20e0..504b7cdb 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -850,13 +850,23 @@ TEST_LIST = [
     {
         'test_id':          16,
         'test_class':       FioJobTest_t0015,
-        'job':              't0016-259ebc00.fio',
+        'job':              't0016-d54ae22.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
         'pre_success':      None,
         'output_format':    'json',
         'requirements':     [],
     },
+    {
+        'test_id':          17,
+        'test_class':       FioJobTest_t0015,
+        'job':              't0017.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [Requirements.not_windows],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,


* Recent changes (master)
@ 2022-08-16 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-16 12:00 UTC
  To: fio

The following changes since commit 7a7bcae0610d872951bc22dc310105c7ec1157af:

  Merge branch 's3_crypto' of github.com:hualongfeng/fio (2022-08-11 15:39:02 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to eeb302f9bfa4bbe121cae2a12a679c888164fc93:

  README: link to GitHub releases for Windows (2022-08-15 10:37:57 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      README: link to GitHub releases for Windows

 README.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index 67420903..79582dea 100644
--- a/README.rst
+++ b/README.rst
@@ -123,10 +123,12 @@ Solaris:
 	``pkgutil -i fio``.
 
 Windows:
-	Rebecca Cran <rebecca@bsdio.com> has fio packages for Windows at
-	https://bsdio.com/fio/ . The latest builds for Windows can also
-	be grabbed from https://ci.appveyor.com/project/axboe/fio by clicking
-	the latest x86 or x64 build, then selecting the ARTIFACTS tab.
+        Beginning with fio 3.31 Windows installers are available on GitHub at
+        https://github.com/axboe/fio/releases.  Rebecca Cran
+        <rebecca@bsdio.com> has fio packages for Windows at
+        https://bsdio.com/fio/ . The latest builds for Windows can also be
+        grabbed from https://ci.appveyor.com/project/axboe/fio by clicking the
+        latest x86 or x64 build and then selecting the Artifacts tab.
 
 BSDs:
 	Packages for BSDs may be available from their binary package repositories.


* Recent changes (master)
@ 2022-08-12 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-12 12:00 UTC
  To: fio

The following changes since commit 9dc528b1638b625b5e167983a74de4e85c5859ea:

  lib/rand: get rid of unused MAX_SEED_BUCKETS (2022-08-10 09:51:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7a7bcae0610d872951bc22dc310105c7ec1157af:

  Merge branch 's3_crypto' of github.com:hualongfeng/fio (2022-08-11 15:39:02 -0400)

----------------------------------------------------------------
Feng, Hualong (3):
      engines/http: Add storage class option for s3
      engines/http: Add s3 crypto options for s3
      doc: Add usage and example about s3 storage class and crypto

Friendy.Su@sony.com (1):
      ioengines: merge filecreate, filestat, filedelete engines to fileoperations.c

Vincent Fu (1):
      Merge branch 's3_crypto' of github.com:hualongfeng/fio

 HOWTO.rst                          |  14 ++
 Makefile                           |   2 +-
 engines/filecreate.c               | 118 --------------
 engines/filedelete.c               | 115 --------------
 engines/fileoperations.c           | 318 +++++++++++++++++++++++++++++++++++++
 engines/filestat.c                 | 190 ----------------------
 engines/http.c                     | 178 ++++++++++++++++++---
 examples/http-s3-crypto.fio        |  38 +++++
 examples/http-s3-storage-class.fio |  37 +++++
 fio.1                              |   9 ++
 10 files changed, 577 insertions(+), 442 deletions(-)
 delete mode 100644 engines/filecreate.c
 delete mode 100644 engines/filedelete.c
 create mode 100644 engines/fileoperations.c
 delete mode 100644 engines/filestat.c
 create mode 100644 examples/http-s3-crypto.fio
 create mode 100644 examples/http-s3-storage-class.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 104cce2d..05fc117f 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2692,6 +2692,20 @@ with the caveat that when used on the command line, they must come after the
 
 	The S3 key/access id.
 
+.. option:: http_s3_sse_customer_key=str : [http]
+
+        The encryption customer key in SSE server side.
+
+.. option:: http_s3_sse_customer_algorithm=str : [http]
+
+        The encryption customer algorithm in SSE server side.
+        Default is **AES256**
+
+.. option:: http_s3_storage_class=str : [http]
+
+        Which storage class to access. User-customizable settings.
+        Default is **STANDARD**
+
 .. option:: http_swift_auth_token=str : [http]
 
 	The Swift auth token. See the example configuration file on how
diff --git a/Makefile b/Makefile
index 188a74d7..634d2c93 100644
--- a/Makefile
+++ b/Makefile
@@ -56,7 +56,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		pshared.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
-		engines/ftruncate.c engines/filecreate.c engines/filestat.c engines/filedelete.c \
+		engines/ftruncate.c engines/fileoperations.c \
 		engines/exec.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
diff --git a/engines/filecreate.c b/engines/filecreate.c
deleted file mode 100644
index 7884752d..00000000
--- a/engines/filecreate.c
+++ /dev/null
@@ -1,118 +0,0 @@
-/*
- * filecreate engine
- *
- * IO engine that doesn't do any IO, just creates files and tracks the latency
- * of the file creation.
- */
-#include <stdio.h>
-#include <fcntl.h>
-#include <errno.h>
-
-#include "../fio.h"
-
-struct fc_data {
-	enum fio_ddir stat_ddir;
-};
-
-static int open_file(struct thread_data *td, struct fio_file *f)
-{
-	struct timespec start;
-	int do_lat = !td->o.disable_lat;
-
-	dprint(FD_FILE, "fd open %s\n", f->file_name);
-
-	if (f->filetype != FIO_TYPE_FILE) {
-		log_err("fio: only files are supported\n");
-		return 1;
-	}
-	if (!strcmp(f->file_name, "-")) {
-		log_err("fio: can't read/write to stdin/out\n");
-		return 1;
-	}
-
-	if (do_lat)
-		fio_gettime(&start, NULL);
-
-	f->fd = open(f->file_name, O_CREAT|O_RDWR, 0600);
-
-	if (f->fd == -1) {
-		char buf[FIO_VERROR_SIZE];
-		int e = errno;
-
-		snprintf(buf, sizeof(buf), "open(%s)", f->file_name);
-		td_verror(td, e, buf);
-		return 1;
-	}
-
-	if (do_lat) {
-		struct fc_data *data = td->io_ops_data;
-		uint64_t nsec;
-
-		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
-	}
-
-	return 0;
-}
-
-static enum fio_q_status queue_io(struct thread_data *td,
-				  struct io_u fio_unused *io_u)
-{
-	return FIO_Q_COMPLETED;
-}
-
-/*
- * Ensure that we at least have a block size worth of IO to do for each
- * file. If the job file has td->o.size < nr_files * block_size, then
- * fio won't do anything.
- */
-static int get_file_size(struct thread_data *td, struct fio_file *f)
-{
-	f->real_file_size = td_min_bs(td);
-	return 0;
-}
-
-static int init(struct thread_data *td)
-{
-	struct fc_data *data;
-
-	data = calloc(1, sizeof(*data));
-
-	if (td_read(td))
-		data->stat_ddir = DDIR_READ;
-	else if (td_write(td))
-		data->stat_ddir = DDIR_WRITE;
-
-	td->io_ops_data = data;
-	return 0;
-}
-
-static void cleanup(struct thread_data *td)
-{
-	struct fc_data *data = td->io_ops_data;
-
-	free(data);
-}
-
-static struct ioengine_ops ioengine = {
-	.name		= "filecreate",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= init,
-	.cleanup	= cleanup,
-	.queue		= queue_io,
-	.get_file_size	= get_file_size,
-	.open_file	= open_file,
-	.close_file	= generic_close_file,
-	.flags		= FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
-				FIO_NOSTATS | FIO_NOFILEHASH,
-};
-
-static void fio_init fio_filecreate_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_filecreate_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
diff --git a/engines/filedelete.c b/engines/filedelete.c
deleted file mode 100644
index df388ac9..00000000
--- a/engines/filedelete.c
+++ /dev/null
@@ -1,115 +0,0 @@
-/*
- * file delete engine
- *
- * IO engine that doesn't do any IO, just delete files and track the latency
- * of the file deletion.
- */
-#include <stdio.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <sys/types.h>
-#include <unistd.h>
-#include "../fio.h"
-
-struct fc_data {
-	enum fio_ddir stat_ddir;
-};
-
-static int delete_file(struct thread_data *td, struct fio_file *f)
-{
-	struct timespec start;
-	int do_lat = !td->o.disable_lat;
-	int ret;
-
-	dprint(FD_FILE, "fd delete %s\n", f->file_name);
-
-	if (f->filetype != FIO_TYPE_FILE) {
-		log_err("fio: only files are supported\n");
-		return 1;
-	}
-	if (!strcmp(f->file_name, "-")) {
-		log_err("fio: can't read/write to stdin/out\n");
-		return 1;
-	}
-
-	if (do_lat)
-		fio_gettime(&start, NULL);
-
-	ret = unlink(f->file_name);
-
-	if (ret == -1) {
-		char buf[FIO_VERROR_SIZE];
-		int e = errno;
-
-		snprintf(buf, sizeof(buf), "delete(%s)", f->file_name);
-		td_verror(td, e, buf);
-		return 1;
-	}
-
-	if (do_lat) {
-		struct fc_data *data = td->io_ops_data;
-		uint64_t nsec;
-
-		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
-	}
-
-	return 0;
-}
-
-
-static enum fio_q_status queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
-{
-	return FIO_Q_COMPLETED;
-}
-
-static int init(struct thread_data *td)
-{
-	struct fc_data *data;
-
-	data = calloc(1, sizeof(*data));
-
-	if (td_read(td))
-		data->stat_ddir = DDIR_READ;
-	else if (td_write(td))
-		data->stat_ddir = DDIR_WRITE;
-
-	td->io_ops_data = data;
-	return 0;
-}
-
-static int delete_invalidate(struct thread_data *td, struct fio_file *f)
-{
-    /* do nothing because file not opened */
-    return 0;
-}
-
-static void cleanup(struct thread_data *td)
-{
-	struct fc_data *data = td->io_ops_data;
-
-	free(data);
-}
-
-static struct ioengine_ops ioengine = {
-	.name		= "filedelete",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= init,
-	.invalidate	= delete_invalidate,
-	.cleanup	= cleanup,
-	.queue		= queue_io,
-	.get_file_size	= generic_get_file_size,
-	.open_file	= delete_file,
-	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
-				FIO_NOSTATS | FIO_NOFILEHASH,
-};
-
-static void fio_init fio_filedelete_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_filedelete_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
diff --git a/engines/fileoperations.c b/engines/fileoperations.c
new file mode 100644
index 00000000..1db60da1
--- /dev/null
+++ b/engines/fileoperations.c
@@ -0,0 +1,318 @@
+/*
+ * fileoperations engine
+ *
+ * IO engine that doesn't do any IO, just operates on files and tracks the latency
+ * of the file operation.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include "../fio.h"
+#include "../optgroup.h"
+#include "../oslib/statx.h"
+
+
+struct fc_data {
+	enum fio_ddir stat_ddir;
+};
+
+struct filestat_options {
+	void *pad;
+	unsigned int stat_type;
+};
+
+enum {
+	FIO_FILESTAT_STAT	= 1,
+	FIO_FILESTAT_LSTAT	= 2,
+	FIO_FILESTAT_STATX	= 3,
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "stat_type",
+		.lname	= "stat_type",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct filestat_options, stat_type),
+		.help	= "Specify stat system call type to measure lookup/getattr performance",
+		.def	= "stat",
+		.posval = {
+			  { .ival = "stat",
+			    .oval = FIO_FILESTAT_STAT,
+			    .help = "Use stat(2)",
+			  },
+			  { .ival = "lstat",
+			    .oval = FIO_FILESTAT_LSTAT,
+			    .help = "Use lstat(2)",
+			  },
+			  { .ival = "statx",
+			    .oval = FIO_FILESTAT_STATX,
+			    .help = "Use statx(2) if exists",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_FILESTAT,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+
+static int open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+
+	dprint(FD_FILE, "fd open %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	f->fd = open(f->file_name, O_CREAT|O_RDWR, 0600);
+
+	if (f->fd == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "open(%s)", f->file_name);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
+	}
+
+	return 0;
+}
+
+static int stat_file(struct thread_data *td, struct fio_file *f)
+{
+	struct filestat_options *o = td->eo;
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+	struct stat statbuf;
+#ifndef WIN32
+	struct statx statxbuf;
+	char *abspath;
+#endif
+	int ret;
+
+	dprint(FD_FILE, "fd stat %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	switch (o->stat_type) {
+	case FIO_FILESTAT_STAT:
+		ret = stat(f->file_name, &statbuf);
+		break;
+	case FIO_FILESTAT_LSTAT:
+		ret = lstat(f->file_name, &statbuf);
+		break;
+	case FIO_FILESTAT_STATX:
+#ifndef WIN32
+		abspath = realpath(f->file_name, NULL);
+		if (abspath) {
+			ret = statx(-1, abspath, 0, STATX_ALL, &statxbuf);
+			free(abspath);
+		} else
+			ret = -1;
+#else
+		ret = -1;
+#endif
+		break;
+	default:
+		ret = -1;
+		break;
+	}
+
+	if (ret == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "stat(%s) type=%u", f->file_name,
+			o->stat_type);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
+	}
+
+	return 0;
+}
+
+
+static int delete_file(struct thread_data *td, struct fio_file *f)
+{
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+	int ret;
+
+	dprint(FD_FILE, "fd delete %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	ret = unlink(f->file_name);
+
+	if (ret == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "delete(%s)", f->file_name);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
+	}
+
+	return 0;
+}
+
+static int invalidate_do_nothing(struct thread_data *td, struct fio_file *f)
+{
+	/* do nothing because file not opened */
+	return 0;
+}
+
+static enum fio_q_status queue_io(struct thread_data *td, struct io_u *io_u)
+{
+	return FIO_Q_COMPLETED;
+}
+
+/*
+ * Ensure that we at least have a block size worth of IO to do for each
+ * file. If the job file has td->o.size < nr_files * block_size, then
+ * fio won't do anything.
+ */
+static int get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	f->real_file_size = td_min_bs(td);
+	return 0;
+}
+
+static int init(struct thread_data *td)
+{
+	struct fc_data *data;
+
+	data = calloc(1, sizeof(*data));
+
+	if (td_read(td))
+		data->stat_ddir = DDIR_READ;
+	else if (td_write(td))
+		data->stat_ddir = DDIR_WRITE;
+
+	td->io_ops_data = data;
+	return 0;
+}
+
+static void cleanup(struct thread_data *td)
+{
+	struct fc_data *data = td->io_ops_data;
+
+	free(data);
+}
+
+static struct ioengine_ops ioengine_filecreate = {
+	.name		= "filecreate",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.get_file_size	= get_file_size,
+	.open_file	= open_file,
+	.close_file	= generic_close_file,
+	.flags		= FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+static struct ioengine_ops ioengine_filestat = {
+	.name		= "filestat",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.invalidate	= invalidate_do_nothing,
+	.get_file_size	= generic_get_file_size,
+	.open_file	= stat_file,
+	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+	.options	= options,
+	.option_struct_size = sizeof(struct filestat_options),
+};
+
+static struct ioengine_ops ioengine_filedelete = {
+	.name		= "filedelete",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.invalidate	= invalidate_do_nothing,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.get_file_size	= generic_get_file_size,
+	.open_file	= delete_file,
+	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+
+static void fio_init fio_fileoperations_register(void)
+{
+	register_ioengine(&ioengine_filecreate);
+	register_ioengine(&ioengine_filestat);
+	register_ioengine(&ioengine_filedelete);
+}
+
+static void fio_exit fio_fileoperations_unregister(void)
+{
+	unregister_ioengine(&ioengine_filecreate);
+	unregister_ioengine(&ioengine_filestat);
+	unregister_ioengine(&ioengine_filedelete);
+}
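The three engines now live in one file but keep the same measurement pattern: take a timestamp, issue a single metadata system call, and record the elapsed time as a completion-latency sample. The standalone sketch below reproduces that pattern outside fio, assuming a POSIX system. `timed_file_op()` and `enum file_op` are hypothetical names, `clock_gettime(CLOCK_MONOTONIC)` stands in for fio's internal `fio_gettime()`/`ntime_since_now()`, and only plain `stat(2)` is shown for the filestat case (not the `lstat`/`statx` variants selected by `stat_type`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names: one value per fileoperations engine. */
enum file_op { OP_CREATE, OP_STAT, OP_DELETE };

/*
 * Time a single metadata system call on 'path', in the style of the
 * filecreate, filestat and filedelete engines. Returns the latency in
 * nanoseconds, or -1 if the call failed.
 */
static int64_t timed_file_op(enum file_op op, const char *path)
{
	struct timespec start, end;
	struct stat sb;
	int fd = -1;
	int ret;

	clock_gettime(CLOCK_MONOTONIC, &start);
	switch (op) {
	case OP_CREATE:		/* filecreate: open(O_CREAT|O_RDWR, 0600) */
		fd = open(path, O_CREAT | O_RDWR, 0600);
		ret = fd;
		break;
	case OP_STAT:		/* filestat with stat_type=stat */
		ret = stat(path, &sb);
		break;
	case OP_DELETE:		/* filedelete: unlink() */
		ret = unlink(path);
		break;
	default:
		ret = -1;
		break;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* close() is deliberately outside the timed interval */
	if (fd >= 0)
		close(fd);
	if (ret == -1)
		return -1;

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

As in `filecreate`, closing a newly created file is not part of the measured interval; in fio that close happens later through `generic_close_file`.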
diff --git a/engines/filestat.c b/engines/filestat.c
deleted file mode 100644
index e587eb54..00000000
--- a/engines/filestat.c
+++ /dev/null
@@ -1,190 +0,0 @@
-/*
- * filestat engine
- *
- * IO engine that doesn't do any IO, just stat files and tracks the latency
- * of the file stat.
- */
-#include <stdio.h>
-#include <stdlib.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <unistd.h>
-#include "../fio.h"
-#include "../optgroup.h"
-#include "../oslib/statx.h"
-
-struct fc_data {
-	enum fio_ddir stat_ddir;
-};
-
-struct filestat_options {
-	void *pad;
-	unsigned int stat_type;
-};
-
-enum {
-	FIO_FILESTAT_STAT	= 1,
-	FIO_FILESTAT_LSTAT	= 2,
-	FIO_FILESTAT_STATX	= 3,
-};
-
-static struct fio_option options[] = {
-	{
-		.name	= "stat_type",
-		.lname	= "stat_type",
-		.type	= FIO_OPT_STR,
-		.off1	= offsetof(struct filestat_options, stat_type),
-		.help	= "Specify stat system call type to measure lookup/getattr performance",
-		.def	= "stat",
-		.posval = {
-			  { .ival = "stat",
-			    .oval = FIO_FILESTAT_STAT,
-			    .help = "Use stat(2)",
-			  },
-			  { .ival = "lstat",
-			    .oval = FIO_FILESTAT_LSTAT,
-			    .help = "Use lstat(2)",
-			  },
-			  { .ival = "statx",
-			    .oval = FIO_FILESTAT_STATX,
-			    .help = "Use statx(2) if exists",
-			  },
-		},
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_FILESTAT,
-	},
-	{
-		.name	= NULL,
-	},
-};
-
-static int stat_file(struct thread_data *td, struct fio_file *f)
-{
-	struct filestat_options *o = td->eo;
-	struct timespec start;
-	int do_lat = !td->o.disable_lat;
-	struct stat statbuf;
-#ifndef WIN32
-	struct statx statxbuf;
-	char *abspath;
-#endif
-	int ret;
-
-	dprint(FD_FILE, "fd stat %s\n", f->file_name);
-
-	if (f->filetype != FIO_TYPE_FILE) {
-		log_err("fio: only files are supported\n");
-		return 1;
-	}
-	if (!strcmp(f->file_name, "-")) {
-		log_err("fio: can't read/write to stdin/out\n");
-		return 1;
-	}
-
-	if (do_lat)
-		fio_gettime(&start, NULL);
-
-	switch (o->stat_type){
-	case FIO_FILESTAT_STAT:
-		ret = stat(f->file_name, &statbuf);
-		break;
-	case FIO_FILESTAT_LSTAT:
-		ret = lstat(f->file_name, &statbuf);
-		break;
-	case FIO_FILESTAT_STATX:
-#ifndef WIN32
-		abspath = realpath(f->file_name, NULL);
-		if (abspath) {
-			ret = statx(-1, abspath, 0, STATX_ALL, &statxbuf);
-			free(abspath);
-		} else
-			ret = -1;
-#else
-		ret = -1;
-#endif
-		break;
-	default:
-		ret = -1;
-		break;
-	}
-
-	if (ret == -1) {
-		char buf[FIO_VERROR_SIZE];
-		int e = errno;
-
-		snprintf(buf, sizeof(buf), "stat(%s) type=%u", f->file_name,
-			o->stat_type);
-		td_verror(td, e, buf);
-		return 1;
-	}
-
-	if (do_lat) {
-		struct fc_data *data = td->io_ops_data;
-		uint64_t nsec;
-
-		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
-	}
-
-	return 0;
-}
-
-static enum fio_q_status queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
-{
-	return FIO_Q_COMPLETED;
-}
-
-static int init(struct thread_data *td)
-{
-	struct fc_data *data;
-
-	data = calloc(1, sizeof(*data));
-
-	if (td_read(td))
-		data->stat_ddir = DDIR_READ;
-	else if (td_write(td))
-		data->stat_ddir = DDIR_WRITE;
-
-	td->io_ops_data = data;
-	return 0;
-}
-
-static void cleanup(struct thread_data *td)
-{
-	struct fc_data *data = td->io_ops_data;
-
-	free(data);
-}
-
-static int stat_invalidate(struct thread_data *td, struct fio_file *f)
-{
-	/* do nothing because file not opened */
-	return 0;
-}
-
-static struct ioengine_ops ioengine = {
-	.name		= "filestat",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= init,
-	.cleanup	= cleanup,
-	.queue		= queue_io,
-	.invalidate	= stat_invalidate,
-	.get_file_size	= generic_get_file_size,
-	.open_file	= stat_file,
-	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
-				FIO_NOSTATS | FIO_NOFILEHASH,
-	.options	= options,
-	.option_struct_size = sizeof(struct filestat_options),
-};
-
-static void fio_init fio_filestat_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_filestat_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
diff --git a/engines/http.c b/engines/http.c
index 1de9e66c..56dc7d1b 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -57,6 +57,9 @@ struct http_options {
 	char *s3_key;
 	char *s3_keyid;
 	char *s3_region;
+	char *s3_sse_customer_key;
+	char *s3_sse_customer_algorithm;
+	char *s3_storage_class;
 	char *swift_auth_token;
 	int verbose;
 	unsigned int mode;
@@ -161,6 +164,36 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group    = FIO_OPT_G_HTTP,
 	},
+	{
+		.name     = "http_s3_sse_customer_key",
+		.lname    = "SSE Customer Key",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 SSE Customer Key",
+		.off1     = offsetof(struct http_options, s3_sse_customer_key),
+		.def	  = "",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3_sse_customer_algorithm",
+		.lname    = "SSE Customer Algorithm",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 SSE Customer Algorithm",
+		.off1     = offsetof(struct http_options, s3_sse_customer_algorithm),
+		.def	  = "AES256",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3_storage_class",
+		.lname    = "S3 Storage class",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 Storage Class",
+		.off1     = offsetof(struct http_options, s3_storage_class),
+		.def	  = "STANDARD",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
 	{
 		.name     = "http_mode",
 		.lname    = "Request mode to use",
@@ -266,6 +299,54 @@ static char *_gen_hex_md5(const char *p, size_t len)
 	return _conv_hex(hash, MD5_DIGEST_LENGTH);
 }
 
+static char *_conv_base64_encode(const unsigned char *p, size_t len)
+{
+	char *r, *ret;
+	int i;
+	static const char sEncodingTable[] = {
+		'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
+		'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
+		'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
+		'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
+		'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+		'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
+		'w', 'x', 'y', 'z', '0', '1', '2', '3',
+		'4', '5', '6', '7', '8', '9', '+', '/'
+	};
+
+	size_t out_len = 4 * ((len + 2) / 3);
+	ret = r = malloc(out_len + 1);
+
+	for (i = 0; i < len - 2; i += 3) {
+		*r++ = sEncodingTable[(p[i] >> 2) & 0x3F];
+		*r++ = sEncodingTable[((p[i] & 0x3) << 4) | ((int) (p[i + 1] & 0xF0) >> 4)];
+		*r++ = sEncodingTable[((p[i + 1] & 0xF) << 2) | ((int) (p[i + 2] & 0xC0) >> 6)];
+		*r++ = sEncodingTable[p[i + 2] & 0x3F];
+	}
+
+	if (i < len) {
+		*r++ = sEncodingTable[(p[i] >> 2) & 0x3F];
+		if (i == (len - 1)) {
+			*r++ = sEncodingTable[((p[i] & 0x3) << 4)];
+			*r++ = '=';
+		} else {
+			*r++ = sEncodingTable[((p[i] & 0x3) << 4) | ((int) (p[i + 1] & 0xF0) >> 4)];
+			*r++ = sEncodingTable[((p[i + 1] & 0xF) << 2)];
+		}
+		*r++ = '=';
+	}
+
+	ret[out_len]=0;
+	return ret;
+}
+
+static char *_gen_base64_md5(const unsigned char *p, size_t len)
+{
+	unsigned char hash[MD5_DIGEST_LENGTH];
+	MD5((unsigned char*)p, len, hash);
+	return _conv_base64_encode(hash, MD5_DIGEST_LENGTH);
+}
+
 static void _hmac(unsigned char *md, void *key, int key_len, char *data) {
 #ifndef CONFIG_HAVE_OPAQUE_HMAC_CTX
 	HMAC_CTX _ctx;
@@ -335,8 +416,8 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	char date_iso[32];
 	char method[8];
 	char dkey[128];
-	char creq[512];
-	char sts[256];
+	char creq[4096];
+	char sts[512];
 	char s[512];
 	char *uri_encoded = NULL;
 	char *dsha = NULL;
@@ -345,6 +426,9 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	const char *service = "s3";
 	const char *aws = "aws4_request";
 	unsigned char md[SHA256_DIGEST_LENGTH];
+	unsigned char sse_key[33] = {0};
+	char *sse_key_base64 = NULL;
+	char *sse_key_md5_base64 = NULL;
 
 	time_t t = time(NULL);
 	struct tm *gtm = gmtime(&t);
@@ -353,6 +437,9 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	strftime (date_iso, sizeof(date_iso), "%Y%m%dT%H%M%SZ", gtm);
 	uri_encoded = _aws_uriencode(uri);
 
+	if (o->s3_sse_customer_key != NULL)
+		strncpy((char*)sse_key, o->s3_sse_customer_key, sizeof(sse_key) - 1);
+
 	if (op == DDIR_WRITE) {
 		dsha = _gen_hex_sha256(buf, len);
 		sprintf(method, "PUT");
@@ -366,22 +453,50 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	}
 
 	/* Create the canonical request first */
-	snprintf(creq, sizeof(creq),
-	"%s\n"
-	"%s\n"
-	"\n"
-	"host:%s\n"
-	"x-amz-content-sha256:%s\n"
-	"x-amz-date:%s\n"
-	"\n"
-	"host;x-amz-content-sha256;x-amz-date\n"
-	"%s"
-	, method
-	, uri_encoded, o->host, dsha, date_iso, dsha);
+	if (sse_key[0] != '\0') {
+		sse_key_base64 = _conv_base64_encode(sse_key, sizeof(sse_key) - 1);
+		sse_key_md5_base64 = _gen_base64_md5(sse_key, sizeof(sse_key) - 1);
+		snprintf(creq, sizeof(creq),
+			"%s\n"
+			"%s\n"
+			"\n"
+			"host:%s\n"
+			"x-amz-content-sha256:%s\n"
+			"x-amz-date:%s\n"
+			"x-amz-server-side-encryption-customer-algorithm:%s\n"
+			"x-amz-server-side-encryption-customer-key:%s\n"
+			"x-amz-server-side-encryption-customer-key-md5:%s\n"
+			"x-amz-storage-class:%s\n"
+			"\n"
+			"host;x-amz-content-sha256;x-amz-date;"
+			"x-amz-server-side-encryption-customer-algorithm;"
+			"x-amz-server-side-encryption-customer-key;"
+			"x-amz-server-side-encryption-customer-key-md5;"
+			"x-amz-storage-class\n"
+			"%s"
+			, method
+			, uri_encoded, o->host, dsha, date_iso
+			, o->s3_sse_customer_algorithm, sse_key_base64
+			, sse_key_md5_base64, o->s3_storage_class, dsha);
+	} else {
+		snprintf(creq, sizeof(creq),
+			"%s\n"
+			"%s\n"
+			"\n"
+			"host:%s\n"
+			"x-amz-content-sha256:%s\n"
+			"x-amz-date:%s\n"
+			"x-amz-storage-class:%s\n"
+			"\n"
+			"host;x-amz-content-sha256;x-amz-date;x-amz-storage-class\n"
+			"%s"
+			, method
+			, uri_encoded, o->host, dsha, date_iso, o->s3_storage_class, dsha);
+	}
 
 	csha = _gen_hex_sha256(creq, strlen(creq));
 	snprintf(sts, sizeof(sts), "AWS4-HMAC-SHA256\n%s\n%s/%s/%s/%s\n%s",
-		date_iso, date_short, o->s3_region, service, aws, csha);
+			date_iso, date_short, o->s3_region, service, aws, csha);
 
 	snprintf((char *)dkey, sizeof(dkey), "AWS4%s", o->s3_key);
 	_hmac(md, dkey, strlen(dkey), date_short);
@@ -401,9 +516,32 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	snprintf(s, sizeof(s), "x-amz-date: %s", date_iso);
 	slist = curl_slist_append(slist, s);
 
-	snprintf(s, sizeof(s), "Authorization: AWS4-HMAC-SHA256 Credential=%s/%s/%s/s3/aws4_request,"
-	"SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=%s",
-	o->s3_keyid, date_short, o->s3_region, signature);
+	if (sse_key[0] != '\0') {
+		snprintf(s, sizeof(s), "x-amz-server-side-encryption-customer-algorithm: %s", o->s3_sse_customer_algorithm);
+		slist = curl_slist_append(slist, s);
+		snprintf(s, sizeof(s), "x-amz-server-side-encryption-customer-key: %s", sse_key_base64);
+		slist = curl_slist_append(slist, s);
+		snprintf(s, sizeof(s), "x-amz-server-side-encryption-customer-key-md5: %s", sse_key_md5_base64);
+		slist = curl_slist_append(slist, s);
+	}
+
+	snprintf(s, sizeof(s), "x-amz-storage-class: %s", o->s3_storage_class);
+	slist = curl_slist_append(slist, s);
+
+	if (sse_key[0] != '\0') {
+		snprintf(s, sizeof(s), "Authorization: AWS4-HMAC-SHA256 Credential=%s/%s/%s/s3/aws4_request,"
+			"SignedHeaders=host;x-amz-content-sha256;"
+			"x-amz-date;x-amz-server-side-encryption-customer-algorithm;"
+			"x-amz-server-side-encryption-customer-key;"
+			"x-amz-server-side-encryption-customer-key-md5;"
+			"x-amz-storage-class,"
+			"Signature=%s",
+		o->s3_keyid, date_short, o->s3_region, signature);
+	} else {
+		snprintf(s, sizeof(s), "Authorization: AWS4-HMAC-SHA256 Credential=%s/%s/%s/s3/aws4_request,"
+			"SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-storage-class,Signature=%s",
+			o->s3_keyid, date_short, o->s3_region, signature);
+	}
 	slist = curl_slist_append(slist, s);
 
 	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
@@ -412,6 +550,10 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	free(csha);
 	free(dsha);
 	free(signature);
+	if (sse_key_base64 != NULL) {
+		free(sse_key_base64);
+		free(sse_key_md5_base64);
+	}
 }
 
 static void _add_swift_header(CURL *curl, struct curl_slist *slist, struct http_options *o,
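For SSE-C, S3 expects the raw 256-bit key base64-encoded in `x-amz-server-side-encryption-customer-key` and the base64-encoded MD5 digest of that raw key in `x-amz-server-side-encryption-customer-key-md5`; the patch builds both values with `_conv_base64_encode()`, always encoding the full 32-byte `sse_key` buffer (shorter keys are zero-padded). The sketch below is a standalone encoder for the same RFC 4648 alphabet, under the hypothetical name `base64_encode()`. It uses the loop bound `i + 2 < len` instead of `i < len - 2`, so inputs shorter than two bytes cannot underflow the `size_t` subtraction. The callers in this patch only pass 16- and 32-byte inputs, so this is a robustness difference rather than a live bug. The MD5 step is left out to avoid an OpenSSL dependency.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Standalone base64 encoder (RFC 4648 alphabet, '=' padding), modeled on
 * _conv_base64_encode() in engines/http.c. Returns a malloc'd,
 * NUL-terminated string that the caller must free(), or NULL.
 */
static char *base64_encode(const unsigned char *p, size_t len)
{
	static const char tbl[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	size_t out_len = 4 * ((len + 2) / 3);
	char *ret = malloc(out_len + 1);
	char *r = ret;
	size_t i;

	if (!ret)
		return NULL;

	/* Whole 3-byte groups -> 4 symbols; no underflow when len < 2 */
	for (i = 0; i + 2 < len; i += 3) {
		*r++ = tbl[p[i] >> 2];
		*r++ = tbl[((p[i] & 0x03) << 4) | (p[i + 1] >> 4)];
		*r++ = tbl[((p[i + 1] & 0x0f) << 2) | (p[i + 2] >> 6)];
		*r++ = tbl[p[i + 2] & 0x3f];
	}

	/* 1 or 2 leftover bytes -> 2 or 3 symbols, padded to 4 with '=' */
	if (i < len) {
		*r++ = tbl[p[i] >> 2];
		if (i + 1 == len) {
			*r++ = tbl[(p[i] & 0x03) << 4];
			*r++ = '=';
		} else {
			*r++ = tbl[((p[i] & 0x03) << 4) | (p[i + 1] >> 4)];
			*r++ = tbl[(p[i + 1] & 0x0f) << 2];
		}
		*r++ = '=';
	}

	*r = '\0';
	return ret;
}
```

A 32-byte SSE-C key always encodes to 44 characters, and the 16-byte MD5 digest to 24.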
diff --git a/examples/http-s3-crypto.fio b/examples/http-s3-crypto.fio
new file mode 100644
index 00000000..2403746e
--- /dev/null
+++ b/examples/http-s3-crypto.fio
@@ -0,0 +1,38 @@
+# Example test for the HTTP engine's S3 support against Amazon AWS.
+# Obviously, you have to adjust the S3 credentials; for this example,
+# they're passed in via the environment.
+# Set the SSE customer key and algorithm to test server-side
+# encryption with a customer-provided key (SSE-C).
+#
+
+[global]
+ioengine=http
+name=test
+direct=1
+filename=/larsmb-fio-test/object
+http_verbose=0
+https=on
+http_mode=s3
+http_s3_key=${S3_KEY}
+http_s3_keyid=${S3_ID}
+http_host=s3.eu-central-1.amazonaws.com
+http_s3_region=eu-central-1
+http_s3_sse_customer_key=${SSE_KEY}
+http_s3_sse_customer_algorithm=AES256
+group_reporting
+
+# With verify, this both writes and reads the object
+[create]
+rw=write
+bs=4k
+size=64k
+io_size=4k
+verify=sha256
+
+[trim]
+stonewall
+rw=trim
+bs=4k
+size=64k
+io_size=4k
+
diff --git a/examples/http-s3-storage-class.fio b/examples/http-s3-storage-class.fio
new file mode 100644
index 00000000..9ee23837
--- /dev/null
+++ b/examples/http-s3-storage-class.fio
@@ -0,0 +1,37 @@
+# Example test for the HTTP engine's S3 support against Amazon AWS.
+# Obviously, you have to adjust the S3 credentials; for this example,
+# they're passed in via the environment.
+# The storage class is also configurable: use STANDARD for a normal
+# test, or another storage class for a compression test.
+#
+
+[global]
+ioengine=http
+name=test
+direct=1
+filename=/larsmb-fio-test/object
+http_verbose=0
+https=on
+http_mode=s3
+http_s3_key=${S3_KEY}
+http_s3_keyid=${S3_ID}
+http_host=s3.eu-central-1.amazonaws.com
+http_s3_region=eu-central-1
+http_s3_storage_class=${STORAGE_CLASS}
+group_reporting
+
+# With verify, this both writes and reads the object
+[create]
+rw=write
+bs=4k
+size=64k
+io_size=4k
+verify=sha256
+
+[trim]
+stonewall
+rw=trim
+bs=4k
+size=64k
+io_size=4k
+
diff --git a/fio.1 b/fio.1
index ce9bf3ef..6630525f 100644
--- a/fio.1
+++ b/fio.1
@@ -2308,6 +2308,15 @@ The S3 secret key.
 .BI (http)http_s3_keyid \fR=\fPstr
 The S3 key/access id.
 .TP
+.BI (http)http_s3_sse_customer_key \fR=\fPstr
+The customer-provided key for S3 server-side encryption (SSE-C).
+.TP
+.BI (http)http_s3_sse_customer_algorithm \fR=\fPstr
+The encryption algorithm to use with the SSE-C customer key. Default is \fBAES256\fR.
+.TP
+.BI (http)http_s3_storage_class \fR=\fPstr
+The S3 storage class to access; any storage class name may be given. Default is \fBSTANDARD\fR.
+.TP
 .BI (http)http_swift_auth_token \fR=\fPstr
 The Swift auth token. See the example configuration file on how to
 retrieve this.

* Recent changes (master)
@ 2022-08-11 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-11 12:00 UTC
  To: fio

The following changes since commit 6cafe8445fd1e04e5f7d67bbc73029a538d1b253:

  Fio 3.31 (2022-08-09 14:41:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9dc528b1638b625b5e167983a74de4e85c5859ea:

  lib/rand: get rid of unused MAX_SEED_BUCKETS (2022-08-10 09:51:49 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'multi_seed_refill' of https://github.com/sungup/fio
      lib/rand: get rid of unused MAX_SEED_BUCKETS

Sungup Moon (1):
      lib/rand: Enhance __fill_random_buf using the multi random seed

 configure  | 17 +++++++++++++++++
 lib/rand.c | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 36450df8..a2b9bd4c 100755
--- a/configure
+++ b/configure
@@ -116,6 +116,10 @@ has() {
   type "$1" >/dev/null 2>&1
 }
 
+num() {
+  echo "$1" | grep -P -q "^[0-9]+$"
+}
+
 check_define() {
   cat > $TMPC <<EOF
 #if !defined($1)
@@ -174,6 +178,7 @@ libnfs=""
 xnvme=""
 libzbc=""
 dfs=""
+seed_buckets=""
 dynamic_engines="no"
 prefix=/usr/local
 
@@ -255,6 +260,8 @@ for opt do
   ;;
   --enable-asan) asan="yes"
   ;;
+  --seed-buckets=*) seed_buckets="$optarg"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -302,6 +309,7 @@ if test "$show_help" = "yes" ; then
   echo "--dynamic-libengines    Lib-based ioengines as dynamic libraries"
   echo "--disable-dfs           Disable DAOS File System support even if found"
   echo "--enable-asan           Enable address sanitizer"
+  echo "--seed-buckets=         Number of seed buckets for the refill-buffer"
   exit $exit_val
 fi
 
@@ -3273,6 +3281,15 @@ if test "$disable_tcmalloc" != "yes"; then
   fi
 fi
 print_config "TCMalloc support" "$tcmalloc"
+if ! num "$seed_buckets"; then
+  seed_buckets=4
+elif test "$seed_buckets" -lt 2; then
+  seed_buckets=2
+elif test "$seed_buckets" -gt 16; then
+  seed_buckets=16
+fi
+echo "#define CONFIG_SEED_BUCKETS $seed_buckets" >> $config_host_h
+print_config "seed_buckets" "$seed_buckets"
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/lib/rand.c b/lib/rand.c
index 1e669116..0e787a62 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -95,7 +95,7 @@ void init_rand_seed(struct frand_state *state, uint64_t seed, bool use64)
 		__init_rand64(&state->state64, seed);
 }
 
-void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
+void __fill_random_buf_small(void *buf, unsigned int len, uint64_t seed)
 {
 	uint64_t *b = buf;
 	uint64_t *e = b  + len / sizeof(*b);
@@ -110,6 +110,37 @@ void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
 		__builtin_memcpy(e, &seed, rest);
 }
 
+void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
+{
+	static uint64_t prime[] = {1, 2, 3, 5, 7, 11, 13, 17,
+				   19, 23, 29, 31, 37, 41, 43, 47};
+	uint64_t *b, *e, s[CONFIG_SEED_BUCKETS];
+	unsigned int rest;
+	int p;
+
+	/*
+	 * Calculate the max index that is a multiple of the seed buckets.
+	 */
+	rest = (len / sizeof(*b) / CONFIG_SEED_BUCKETS) * CONFIG_SEED_BUCKETS;
+
+	b = buf;
+	e = b + rest;
+
+	rest = len - (rest * sizeof(*b));
+
+	for (p = 0; p < CONFIG_SEED_BUCKETS; p++)
+		s[p] = seed * prime[p];
+
+	for (; b != e; b += CONFIG_SEED_BUCKETS) {
+		for (p = 0; p < CONFIG_SEED_BUCKETS; ++p) {
+			b[p] = s[p];
+			s[p] = __hash_u64(s[p]);
+		}
+	}
+
+	__fill_random_buf_small(b, rest, s[0]);
+}
+
 uint64_t fill_random_buf(struct frand_state *fs, void *buf,
 			 unsigned int len)
 {

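The new `__fill_random_buf()` splits the buffer into strides of `CONFIG_SEED_BUCKETS` 64-bit words and advances that many independent hash chains, each seeded with `seed * prime[p]`, presumably so the hash computations no longer form a single serial dependency chain. A tail shorter than one stride is handed to `__fill_random_buf_small()`, continuing from the first chain's state, and `configure` clamps the bucket count to 2..16 with a default of 4. The sketch below reproduces that layout outside fio under two assumptions: `SEED_BUCKETS` is fixed at the default of 4, and `mix64()` (the splitmix64 finalizer) stands in for fio's `__hash_u64()`, whose definition is not part of this diff, so the bytes produced differ from fio's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEED_BUCKETS 4	/* assumption: configure's default bucket count */

/* Stand-in for fio's __hash_u64(): the splitmix64 finalizer. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Single hash chain, shaped like __fill_random_buf_small(). */
static void fill_small(void *buf, unsigned int len, uint64_t seed)
{
	uint64_t *b = buf;
	uint64_t *e = b + len / sizeof(*b);
	unsigned int rest = len % sizeof(*b);

	for (; b != e; b++) {
		*b = seed;
		seed = mix64(seed);
	}
	if (rest)
		memcpy(e, &seed, rest);	/* partial trailing word */
}

/* SEED_BUCKETS independent chains, shaped like the new __fill_random_buf(). */
static void fill_bucketed(void *buf, unsigned int len, uint64_t seed)
{
	static const uint64_t prime[] = {1, 2, 3, 5, 7, 11, 13, 17,
					 19, 23, 29, 31, 37, 41, 43, 47};
	uint64_t s[SEED_BUCKETS];
	uint64_t *b = buf;
	/* largest word count that is a multiple of SEED_BUCKETS */
	unsigned int words = (len / sizeof(*b) / SEED_BUCKETS) * SEED_BUCKETS;
	uint64_t *e = b + words;
	unsigned int rest = len - words * sizeof(*b);
	int p;

	for (p = 0; p < SEED_BUCKETS; p++)
		s[p] = seed * prime[p];

	for (; b != e; b += SEED_BUCKETS) {
		for (p = 0; p < SEED_BUCKETS; p++) {
			b[p] = s[p];		/* write current state... */
			s[p] = mix64(s[p]);	/* ...then advance this chain */
		}
	}

	/* tail shorter than one stride continues from chain 0 */
	fill_small(b, rest, s[0]);
}
```

Because each chain writes its state before hashing it, a seed of 42 yields a first stride of 42, 84, 126 and 210 (the seed times 1, 2, 3 and 5).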
* Recent changes (master)
@ 2022-08-10 12:00 Jens Axboe
From: Jens Axboe @ 2022-08-10 12:00 UTC
  To: fio

The following changes since commit de31fe9ab3dd6115cd0d5c77354f67f06595570d:

  testing: add test for slat + clat = tlat (2022-08-07 12:27:55 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6cafe8445fd1e04e5f7d67bbc73029a538d1b253:

  Fio 3.31 (2022-08-09 14:41:25 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'master' of ssh://git.kernel.dk/data/git/fio
      Fio 3.31

Vincent Fu (2):
      ci: upload tagged AppVeyor installers as GitHub releases
      ci: drop master branch requirement for AppVeyor releases

 .appveyor.yml   | 12 ++++++++++++
 FIO-VERSION-GEN |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index b94eefe3..92301ca9 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -50,5 +50,17 @@ after_build:
 test_script:
   - python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug
 
+deploy:
+  - provider: GitHub
+    description: fio Windows installer
+    auth_token:                      # encrypted token from GitHub
+      secure: Tjj+xRQEV25P6dQgboUblTCKx/LtUOUav2bvzSCtwMhHMAxrrn2adod6nlTf0ItV
+    artifact: fio.msi                # upload installer to release assets
+    draft: false
+    prerelease: false
+    on:
+      APPVEYOR_REPO_TAG: true        # deploy on tag push only
+      DISTRO: cygwin
+
 on_finish:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test && appveyor PushArtifact test-artifacts.7z'
diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index fa64f50f..72630dd0 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.30
+DEF_VER=fio-3.31
 
 LF='
 '

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2022-08-08 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-08-08 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c08f9533042e909d4b4b12fdb8d14f1bc8e23dff:

  filesetup: use correct random seed for non-uniform distributions (2022-08-03 16:18:53 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to de31fe9ab3dd6115cd0d5c77354f67f06595570d:

  testing: add test for slat + clat = tlat (2022-08-07 12:27:55 -0400)

----------------------------------------------------------------
Vincent Fu (3):
      testing: add test for slat + clat = tlat
      engines/null: add FIO_ASYNCIO_SETS_ISSUE_TIME flag
      testing: add test for slat + clat = tlat

 engines/null.c            |  2 ++
 t/jobs/t0015-e78980ff.fio |  7 +++++++
 t/jobs/t0016-259ebc00.fio |  7 +++++++
 t/run-fio-tests.py        | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+)
 create mode 100644 t/jobs/t0015-e78980ff.fio
 create mode 100644 t/jobs/t0016-259ebc00.fio

---

Diff of recent changes:

diff --git a/engines/null.c b/engines/null.c
index 2df56718..68759c26 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -113,9 +113,11 @@ static struct null_data *null_init(struct thread_data *td)
 	if (td->o.iodepth != 1) {
 		nd->io_us = (struct io_u **) malloc(td->o.iodepth * sizeof(struct io_u *));
 		memset(nd->io_us, 0, td->o.iodepth * sizeof(struct io_u *));
+		td->io_ops->flags |= FIO_ASYNCIO_SETS_ISSUE_TIME;
 	} else
 		td->io_ops->flags |= FIO_SYNCIO;
 
+	td_set_ioengine_flags(td);
 	return nd;
 }
 
diff --git a/t/jobs/t0015-e78980ff.fio b/t/jobs/t0015-e78980ff.fio
new file mode 100644
index 00000000..c650c0b2
--- /dev/null
+++ b/t/jobs/t0015-e78980ff.fio
@@ -0,0 +1,7 @@
+# Expected result: mean(slat) + mean(clat) = mean(lat)
+# Buggy result: equality does not hold
+
+[test]
+ioengine=libaio
+size=1M
+iodepth=16
diff --git a/t/jobs/t0016-259ebc00.fio b/t/jobs/t0016-259ebc00.fio
new file mode 100644
index 00000000..1b418e7c
--- /dev/null
+++ b/t/jobs/t0016-259ebc00.fio
@@ -0,0 +1,7 @@
+# Expected result: mean(slat) + mean(clat) = mean(lat)
+# Buggy result: equality does not hold
+
+[test]
+ioengine=null
+size=1M
+iodepth=16
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 32cdbc19..d77f20e0 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -527,6 +527,27 @@ class FioJobTest_t0014(FioJobTest):
             return
 
 
+class FioJobTest_t0015(FioJobTest):
+    """Test consists of fio test jobs t0015 and t0016
+    Confirm that mean(slat) + mean(clat) = mean(tlat)"""
+
+    def check_result(self):
+        super(FioJobTest_t0015, self).check_result()
+
+        if not self.passed:
+            return
+
+        slat = self.json_data['jobs'][0]['read']['slat_ns']['mean']
+        clat = self.json_data['jobs'][0]['read']['clat_ns']['mean']
+        tlat = self.json_data['jobs'][0]['read']['lat_ns']['mean']
+        logging.debug('Test %d: slat %f, clat %f, tlat %f', self.testnum, slat, clat, tlat)
+
+        if abs(slat + clat - tlat) > 1:
+            self.failure_reason = "{0} slat {1} + clat {2} = {3} != tlat {4},".format(
+                self.failure_reason, slat, clat, slat+clat, tlat)
+            self.passed = False
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -816,6 +837,26 @@ TEST_LIST = [
         'output_format':    'json',
         'requirements':     [],
     },
+    {
+        'test_id':          15,
+        'test_class':       FioJobTest_t0015,
+        'job':              't0015-e78980ff.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [Requirements.linux, Requirements.libaio],
+    },
+    {
+        'test_id':          16,
+        'test_class':       FioJobTest_t0015,
+        'job':              't0016-259ebc00.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
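The new t0015/t0016 checks depend on a simple identity. For each I/O, total latency is submission latency plus completion latency (lat = slat + clat). By linearity, the means must add up the same way. The equality can only fail if slat and clat are not measured against the same issue timestamp that lat implicitly assumes, and the null engine changes in this pull target exactly that timestamp. The following is a minimal sketch of the arithmetic, using a hypothetical struct io_times with nanosecond timestamps. It applies the same 1 ns tolerance as check_result().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-I/O timestamps, in nanoseconds. */
struct io_times {
	uint64_t start;    /* I/O prepared */
	uint64_t issue;    /* handed to the engine */
	uint64_t complete; /* completion reaped */
};

struct lat_means {
	double slat, clat, lat;
};

/* Mean submission, completion and total latency over n I/Os. */
struct lat_means compute_means(const struct io_times *t, size_t n)
{
	struct lat_means m = {0.0, 0.0, 0.0};
	size_t i;

	if (n == 0)
		return m;

	for (i = 0; i < n; i++) {
		assert(t[i].start <= t[i].issue && t[i].issue <= t[i].complete);
		m.slat += (double)(t[i].issue - t[i].start);
		m.clat += (double)(t[i].complete - t[i].issue);
		m.lat  += (double)(t[i].complete - t[i].start);
	}
	m.slat /= (double)n;
	m.clat /= (double)n;
	m.lat  /= (double)n;
	return m;
}

/* Same test as the Python check: |slat + clat - lat| must not exceed tol. */
bool means_consistent(struct lat_means m, double tol_ns)
{
	double diff = m.slat + m.clat - m.lat;

	if (diff < 0)
		diff = -diff;
	return diff <= tol_ns;
}
```

With consistent timestamps the check passes by construction. A lat mean that drifts from slat + clat, as it would if a different issue time fed one of the statistics, fails it.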

* Recent changes (master)
@ 2022-08-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-08-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7006d70c7c8b9a39cf3dfdd839d1975295c10527:

  Merge branch 'io_uring-numa' (2022-08-02 10:20:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c08f9533042e909d4b4b12fdb8d14f1bc8e23dff:

  filesetup: use correct random seed for non-uniform distributions (2022-08-03 16:18:53 -0400)

----------------------------------------------------------------
Vincent Fu (3):
      examples: fix ioengine in zbd-rand-write.fio
      engines/null: fill issue_time during commit
      filesetup: use correct random seed for non-uniform distributions

 engines/null.c              | 19 +++++++++++++++++++
 examples/zbd-rand-write.fio |  2 +-
 filesetup.c                 |  2 +-
 3 files changed, 21 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/null.c b/engines/null.c
index 8dcd1b21..2df56718 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -44,9 +44,28 @@ static int null_getevents(struct null_data *nd, unsigned int min_events,
 	return ret;
 }
 
+static void null_queued(struct thread_data *td, struct null_data *nd)
+{
+	struct timespec now;
+
+	if (!fio_fill_issue_time(td))
+		return;
+
+	fio_gettime(&now, NULL);
+
+	for (int i = 0; i < nd->queued; i++) {
+		struct io_u *io_u = nd->io_us[i];
+
+		memcpy(&io_u->issue_time, &now, sizeof(now));
+		io_u_queued(td, io_u);
+	}
+}
+
 static int null_commit(struct thread_data *td, struct null_data *nd)
 {
 	if (!nd->events) {
+		null_queued(td, nd);
+
 #ifndef FIO_EXTERNAL_ENGINE
 		io_u_mark_submit(td, nd->queued);
 #endif
diff --git a/examples/zbd-rand-write.fio b/examples/zbd-rand-write.fio
index 46cddd06..9494a583 100644
--- a/examples/zbd-rand-write.fio
+++ b/examples/zbd-rand-write.fio
@@ -1,4 +1,4 @@
-; Using the libaio ioengine, random write to a (zoned) block device,
+; Using the psync ioengine, random write to a (zoned) block device,
 ; writing at most 32 zones at a time. Target zones are chosen randomly
 ; and writes directed at the write pointer of the chosen zones
 
diff --git a/filesetup.c b/filesetup.c
index e0592209..3e2ccf9b 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1495,7 +1495,7 @@ static void __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 
 	seed = jhash(f->file_name, strlen(f->file_name), 0) * td->thread_number;
 	if (!td->o.rand_repeatable)
-		seed = td->rand_seeds[4];
+		seed = td->rand_seeds[FIO_RAND_BLOCK_OFF];
 
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
 		zipf_init(&f->zipf, nranges, td->o.zipf_theta.u.f, td->o.random_center.u.f, seed);

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
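null_queued() reads the clock once per commit and gives every queued io_u the same issue_time. It then calls io_u_queued(), which roughly records submission latency as issue_time minus start_time. The following is a minimal sketch of that batch-stamping pattern. It assumes a hypothetical struct demo_io with nanosecond fields in place of fio's struct io_u and struct timespec. The fill_issue_time argument plays the role of fio_fill_issue_time(td).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down request carrying only what this sketch uses. */
struct demo_io {
	uint64_t start_ns; /* when the request was prepared */
	uint64_t issue_ns; /* filled in at commit time */
	uint64_t slat_ns;  /* submission latency derived from the two */
};

/*
 * Stamp every request in a committed batch with one shared timestamp.
 * 'now_ns' is read once by the caller (for example via clock_gettime()).
 * Returns the number of requests stamped.
 */
size_t stamp_batch(struct demo_io **ios, size_t nr, uint64_t now_ns,
		   int fill_issue_time)
{
	size_t i;

	/* Mirrors the early return when issue times are not wanted. */
	if (!fill_issue_time)
		return 0;

	for (i = 0; i < nr; i++) {
		assert(ios[i] != NULL && ios[i]->start_ns <= now_ns);
		ios[i]->issue_ns = now_ns;
		/* Stand-in for the slat bookkeeping done by io_u_queued(). */
		ios[i]->slat_ns = now_ns - ios[i]->start_ns;
	}
	return nr;
}
```

Taking one timestamp per batch keeps the cost to a single clock read. The trade-off is that every request in the batch reports the same issue time.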

* Recent changes (master)
@ 2022-08-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-08-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 55037c4839c65612fa388ae937e63661d8192ed9:

  t/io_uring: switch to GiB/sec if numbers get large (2022-07-31 12:06:12 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7006d70c7c8b9a39cf3dfdd839d1975295c10527:

  Merge branch 'io_uring-numa' (2022-08-02 10:20:31 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      t/io_uring: support NUMA placement
      Merge branch 'io_uring-numa'

 t/io_uring.c | 446 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 252 insertions(+), 194 deletions(-)
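With -P 1, the NUMA placement change makes detect_node() look up each device's home node in /sys/block/<name>/device/numa_node. The submitter then binds itself to that node in set_affinity(). The sketch below covers only the lookup, and the function names are hypothetical. It differs from the patch in two deliberate ways. First, it uses strrchr() instead of basename(), because the GNU and POSIX versions of basename() behave differently. Second, it NUL-terminates the bytes returned by read() before parsing, since read() does not terminate them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build "/sys/block/<dev>/device/numa_node" from e.g. "/dev/nvme0n1".
 * Returns 0 on success, -1 if the path does not fit in 'out'. */
int numa_sysfs_path(const char *dev, char *out, size_t outlen)
{
	const char *base = strrchr(dev, '/');
	int n;

	base = base ? base + 1 : dev;
	n = snprintf(out, outlen, "/sys/block/%s/device/numa_node", base);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Parse "<node>\n" from a buffer that is not NUL-terminated.
 * Accepts -1 ("no node") and non-negative values. */
int parse_numa_node(const char *buf, size_t len, int *node)
{
	char tmp[32];
	char *end;
	long v;

	assert(node != NULL);
	if (len == 0 || len >= sizeof(tmp))
		return -1;
	memcpy(tmp, buf, len);
	tmp[len] = '\0';

	errno = 0;
	v = strtol(tmp, &end, 10);
	if (errno || end == tmp || (*end != '\0' && *end != '\n'))
		return -1;
	if (v < -1 || v > INT_MAX)
		return -1;
	*node = (int)v;
	return 0;
}

/* Read and parse a numa_node file. Returns 0 on success, -1 on error. */
int read_numa_node(const char *path, int *node)
{
	char buf[32];
	ssize_t ret;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	ret = read(fd, buf, sizeof(buf));
	close(fd);
	if (ret <= 0)
		return -1;
	return parse_numa_node(buf, (size_t)ret, node);
}
```

The kernel typically reports -1 for a device with no NUMA affinity. That matches the -1 "no node" convention the patch uses for s->numa_node.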

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 335a06ed..35bf1956 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -11,6 +11,10 @@
 #include <libaio.h>
 #endif
 
+#ifdef CONFIG_LIBNUMA
+#include <numa.h>
+#endif
+
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
@@ -100,6 +104,9 @@ struct submitter {
 	io_context_t aio_ctx;
 #endif
 
+	int numa_node;
+	const char *filename;
+
 	struct file files[MAX_FDS];
 	unsigned nr_files;
 	unsigned cur_file;
@@ -110,6 +117,7 @@ static struct submitter *submitter;
 static volatile int finish;
 static int stats_running;
 static unsigned long max_iops;
+static long page_size;
 
 static int depth = DEPTH;
 static int batch_submit = BATCH_SUBMIT;
@@ -130,6 +138,7 @@ static int runtime = 0;		/* runtime */
 static int random_io = 1;	/* random or sequential IO */
 static int register_ring = 1;	/* register ring */
 static int use_sync = 0;	/* use preadv2 */
+static int numa_placement = 0;	/* set to node of device */
 
 static unsigned long tsc_rate;
 
@@ -611,12 +620,191 @@ static int reap_events_uring(struct submitter *s)
 	return reaped;
 }
 
+static void set_affinity(struct submitter *s)
+{
+#ifdef CONFIG_LIBNUMA
+	struct bitmask *mask;
+
+	if (s->numa_node == -1)
+		return;
+
+	numa_set_preferred(s->numa_node);
+
+	mask = numa_allocate_cpumask();
+	numa_node_to_cpus(s->numa_node, mask);
+	numa_sched_setaffinity(s->tid, mask);
+#endif
+}
+
+static int detect_node(struct submitter *s, const char *name)
+{
+#ifdef CONFIG_LIBNUMA
+	const char *base = basename(name);
+	char str[128];
+	int ret, fd, node;
+
+	sprintf(str, "/sys/block/%s/device/numa_node", base);
+	fd = open(str, O_RDONLY);
+	if (fd < 0)
+		return -1;
+
+	ret = read(fd, str, sizeof(str));
+	if (ret < 0) {
+		close(fd);
+		return -1;
+	}
+	node = atoi(str);
+	s->numa_node = node;
+	close(fd);
+#else
+	s->numa_node = -1;
+#endif
+	return 0;
+}
+
+static int setup_aio(struct submitter *s)
+{
+#ifdef CONFIG_LIBAIO
+	if (polled) {
+		fprintf(stderr, "aio does not support polled IO\n");
+		polled = 0;
+	}
+	if (sq_thread_poll) {
+		fprintf(stderr, "aio does not support SQPOLL IO\n");
+		sq_thread_poll = 0;
+	}
+	if (do_nop) {
+		fprintf(stderr, "aio does not support polled IO\n");
+		do_nop = 0;
+	}
+	if (fixedbufs || register_files) {
+		fprintf(stderr, "aio does not support registered files or buffers\n");
+		fixedbufs = register_files = 0;
+	}
+
+	return io_queue_init(roundup_pow2(depth), &s->aio_ctx);
+#else
+	fprintf(stderr, "Legacy AIO not available on this system/build\n");
+	errno = EINVAL;
+	return -1;
+#endif
+}
+
+static int setup_ring(struct submitter *s)
+{
+	struct io_sq_ring *sring = &s->sq_ring;
+	struct io_cq_ring *cring = &s->cq_ring;
+	struct io_uring_params p;
+	int ret, fd;
+	void *ptr;
+
+	memset(&p, 0, sizeof(p));
+
+	if (polled && !do_nop)
+		p.flags |= IORING_SETUP_IOPOLL;
+	if (sq_thread_poll) {
+		p.flags |= IORING_SETUP_SQPOLL;
+		if (sq_thread_cpu != -1) {
+			p.flags |= IORING_SETUP_SQ_AFF;
+			p.sq_thread_cpu = sq_thread_cpu;
+		}
+	}
+
+	fd = io_uring_setup(depth, &p);
+	if (fd < 0) {
+		perror("io_uring_setup");
+		return 1;
+	}
+	s->ring_fd = s->enter_ring_fd = fd;
+
+	io_uring_probe(fd);
+
+	if (fixedbufs) {
+		struct rlimit rlim;
+
+		rlim.rlim_cur = RLIM_INFINITY;
+		rlim.rlim_max = RLIM_INFINITY;
+		/* ignore potential error, not needed on newer kernels */
+		setrlimit(RLIMIT_MEMLOCK, &rlim);
+
+		ret = io_uring_register_buffers(s);
+		if (ret < 0) {
+			perror("io_uring_register_buffers");
+			return 1;
+		}
+
+		if (dma_map) {
+			ret = io_uring_map_buffers(s);
+			if (ret < 0) {
+				perror("io_uring_map_buffers");
+				return 1;
+			}
+		}
+	}
+
+	if (register_files) {
+		ret = io_uring_register_files(s);
+		if (ret < 0) {
+			perror("io_uring_register_files");
+			return 1;
+		}
+	}
+
+	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_SQ_RING);
+	sring->head = ptr + p.sq_off.head;
+	sring->tail = ptr + p.sq_off.tail;
+	sring->ring_mask = ptr + p.sq_off.ring_mask;
+	sring->ring_entries = ptr + p.sq_off.ring_entries;
+	sring->flags = ptr + p.sq_off.flags;
+	sring->array = ptr + p.sq_off.array;
+	sq_ring_mask = *sring->ring_mask;
+
+	s->sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_SQES);
+
+	ptr = mmap(0, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_CQ_RING);
+	cring->head = ptr + p.cq_off.head;
+	cring->tail = ptr + p.cq_off.tail;
+	cring->ring_mask = ptr + p.cq_off.ring_mask;
+	cring->ring_entries = ptr + p.cq_off.ring_entries;
+	cring->cqes = ptr + p.cq_off.cqes;
+	cq_ring_mask = *cring->ring_mask;
+	return 0;
+}
+
+static void *allocate_mem(struct submitter *s, int size)
+{
+	void *buf;
+
+#ifdef CONFIG_LIBNUMA
+	if (s->numa_node != -1)
+		return numa_alloc_onnode(size, s->numa_node);
+#endif
+
+	if (posix_memalign(&buf, page_size, size)) {
+		printf("failed alloc\n");
+		return NULL;
+	}
+
+	return buf;
+}
+
 static int submitter_init(struct submitter *s)
 {
-	int i, nr_batch;
+	int i, nr_batch, err;
+	static int init_printed;
+	char buf[80];
 
 	s->tid = gettid();
-	printf("submitter=%d, tid=%d\n", s->index, s->tid);
+	printf("submitter=%d, tid=%d, file=%s, node=%d\n", s->index, s->tid,
+							s->filename, s->numa_node);
+
+	set_affinity(s);
 
 	__init_rand64(&s->rand_state, pthread_self());
 	srand48(pthread_self());
@@ -624,6 +812,37 @@ static int submitter_init(struct submitter *s)
 	for (i = 0; i < MAX_FDS; i++)
 		s->files[i].fileno = i;
 
+	for (i = 0; i < roundup_pow2(depth); i++) {
+		void *buf;
+
+		buf = allocate_mem(s, bs);
+		if (!buf)
+			return 1;
+		s->iovecs[i].iov_base = buf;
+		s->iovecs[i].iov_len = bs;
+	}
+
+	if (use_sync) {
+		sprintf(buf, "Engine=preadv2\n");
+		err = 0;
+	} else if (!aio) {
+		err = setup_ring(s);
+		sprintf(buf, "Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
+	} else {
+		sprintf(buf, "Engine=aio\n");
+		err = setup_aio(s);
+	}
+	if (err) {
+		printf("queue setup failed: %s, %d\n", strerror(errno), err);
+		return 1;
+	}
+
+	if (!init_printed) {
+		printf("polled=%d, fixedbufs=%d/%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, dma_map, register_files, buffered, depth);
+		printf("%s", buf);
+		init_printed = 1;
+	}
+
 	if (stats) {
 		nr_batch = roundup_pow2(depth / batch_submit);
 		if (nr_batch < 2)
@@ -1026,15 +1245,21 @@ static struct submitter *get_submitter(int offset)
 static void do_finish(const char *reason)
 {
 	int j;
+
 	printf("Exiting on %s\n", reason);
 	for (j = 0; j < nthreads; j++) {
 		struct submitter *s = get_submitter(j);
 		s->finish = 1;
 	}
-	if (max_iops > 100000)
-		printf("Maximum IOPS=%luK\n", max_iops / 1000);
-	else if (max_iops)
+	if (max_iops > 1000000) {
+		double miops = (double) max_iops / 1000000.0;
+		printf("Maximum IOPS=%.2fM\n", miops);
+	} else if (max_iops > 100000) {
+		double kiops = (double) max_iops / 1000.0;
+		printf("Maximum IOPS=%.2fK\n", kiops);
+	} else {
 		printf("Maximum IOPS=%lu\n", max_iops);
+	}
 	finish = 1;
 }
 
@@ -1058,144 +1283,6 @@ static void arm_sig_int(void)
 #endif
 }
 
-static int setup_aio(struct submitter *s)
-{
-#ifdef CONFIG_LIBAIO
-	if (polled) {
-		fprintf(stderr, "aio does not support polled IO\n");
-		polled = 0;
-	}
-	if (sq_thread_poll) {
-		fprintf(stderr, "aio does not support SQPOLL IO\n");
-		sq_thread_poll = 0;
-	}
-	if (do_nop) {
-		fprintf(stderr, "aio does not support polled IO\n");
-		do_nop = 0;
-	}
-	if (fixedbufs || register_files) {
-		fprintf(stderr, "aio does not support registered files or buffers\n");
-		fixedbufs = register_files = 0;
-	}
-
-	return io_queue_init(roundup_pow2(depth), &s->aio_ctx);
-#else
-	fprintf(stderr, "Legacy AIO not available on this system/build\n");
-	errno = EINVAL;
-	return -1;
-#endif
-}
-
-static int setup_ring(struct submitter *s)
-{
-	struct io_sq_ring *sring = &s->sq_ring;
-	struct io_cq_ring *cring = &s->cq_ring;
-	struct io_uring_params p;
-	int ret, fd;
-	void *ptr;
-
-	memset(&p, 0, sizeof(p));
-
-	if (polled && !do_nop)
-		p.flags |= IORING_SETUP_IOPOLL;
-	if (sq_thread_poll) {
-		p.flags |= IORING_SETUP_SQPOLL;
-		if (sq_thread_cpu != -1) {
-			p.flags |= IORING_SETUP_SQ_AFF;
-			p.sq_thread_cpu = sq_thread_cpu;
-		}
-	}
-
-	fd = io_uring_setup(depth, &p);
-	if (fd < 0) {
-		perror("io_uring_setup");
-		return 1;
-	}
-	s->ring_fd = s->enter_ring_fd = fd;
-
-	io_uring_probe(fd);
-
-	if (fixedbufs) {
-		struct rlimit rlim;
-
-		rlim.rlim_cur = RLIM_INFINITY;
-		rlim.rlim_max = RLIM_INFINITY;
-		/* ignore potential error, not needed on newer kernels */
-		setrlimit(RLIMIT_MEMLOCK, &rlim);
-
-		ret = io_uring_register_buffers(s);
-		if (ret < 0) {
-			perror("io_uring_register_buffers");
-			return 1;
-		}
-
-		if (dma_map) {
-			ret = io_uring_map_buffers(s);
-			if (ret < 0) {
-				perror("io_uring_map_buffers");
-				return 1;
-			}
-		}
-	}
-
-	if (register_files) {
-		ret = io_uring_register_files(s);
-		if (ret < 0) {
-			perror("io_uring_register_files");
-			return 1;
-		}
-	}
-
-	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
-			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
-			IORING_OFF_SQ_RING);
-	sring->head = ptr + p.sq_off.head;
-	sring->tail = ptr + p.sq_off.tail;
-	sring->ring_mask = ptr + p.sq_off.ring_mask;
-	sring->ring_entries = ptr + p.sq_off.ring_entries;
-	sring->flags = ptr + p.sq_off.flags;
-	sring->array = ptr + p.sq_off.array;
-	sq_ring_mask = *sring->ring_mask;
-
-	s->sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
-			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
-			IORING_OFF_SQES);
-
-	ptr = mmap(0, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
-			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
-			IORING_OFF_CQ_RING);
-	cring->head = ptr + p.cq_off.head;
-	cring->tail = ptr + p.cq_off.tail;
-	cring->ring_mask = ptr + p.cq_off.ring_mask;
-	cring->ring_entries = ptr + p.cq_off.ring_entries;
-	cring->cqes = ptr + p.cq_off.cqes;
-	cq_ring_mask = *cring->ring_mask;
-	return 0;
-}
-
-static void file_depths(char *buf)
-{
-	bool prev = false;
-	char *p;
-	int i, j;
-
-	buf[0] = '\0';
-	p = buf;
-	for (j = 0; j < nthreads; j++) {
-		struct submitter *s = get_submitter(j);
-
-		for (i = 0; i < s->nr_files; i++) {
-			struct file *f = &s->files[i];
-
-			if (prev)
-				p += sprintf(p, " %d", f->pending_ios);
-			else
-				p += sprintf(p, "%d", f->pending_ios);
-			prev = true;
-		}
-	}
-}
-
 static void usage(char *argv, int status)
 {
 	char runtime_str[16];
@@ -1218,11 +1305,12 @@ static void usage(char *argv, int status)
 		" -R <bool> : Use random IO, default %d\n"
 		" -a <bool> : Use legacy aio, default %d\n"
 		" -S <bool> : Use sync IO (preadv2), default %d\n"
-		" -X <bool> : Use registered ring %d\n",
+		" -X <bool> : Use registered ring %d\n"
+		" -P <bool> : Automatically place on device home node %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
 		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
 		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio,
-		use_sync, register_ring);
+		use_sync, register_ring, numa_placement);
 	exit(status);
 }
 
@@ -1274,16 +1362,14 @@ int main(int argc, char *argv[])
 {
 	struct submitter *s;
 	unsigned long done, calls, reap;
-	int err, i, j, flags, fd, opt, threads_per_f, threads_rem = 0, nfiles;
-	long page_size;
+	int i, j, flags, fd, opt, threads_per_f, threads_rem = 0, nfiles;
 	struct file f;
-	char *fdepths;
 	void *ret;
 
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:X:S:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:X:S:P:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1361,6 +1447,9 @@ int main(int argc, char *argv[])
 			exit(1);
 #endif
 			break;
+		case 'P':
+			numa_placement = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -1383,6 +1472,7 @@ int main(int argc, char *argv[])
 				roundup_pow2(depth) * sizeof(struct iovec));
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
+		s->numa_node = -1;
 		s->index = j;
 		s->done = s->calls = s->reaps = 0;
 	}
@@ -1440,7 +1530,10 @@ int main(int argc, char *argv[])
 
 			memcpy(&s->files[s->nr_files], &f, sizeof(f));
 
-			printf("Added file %s (submitter %d)\n", argv[i], s->index);
+			if (numa_placement)
+				detect_node(s, argv[i]);
+
+			s->filename = argv[i];
 			s->nr_files++;
 		}
 		threads_rem--;
@@ -1454,43 +1547,6 @@ int main(int argc, char *argv[])
 	if (page_size < 0)
 		page_size = 4096;
 
-	for (j = 0; j < nthreads; j++) {
-		s = get_submitter(j);
-		for (i = 0; i < roundup_pow2(depth); i++) {
-			void *buf;
-
-			if (posix_memalign(&buf, page_size, bs)) {
-				printf("failed alloc\n");
-				return 1;
-			}
-			s->iovecs[i].iov_base = buf;
-			s->iovecs[i].iov_len = bs;
-		}
-	}
-
-	for (j = 0; j < nthreads; j++) {
-		s = get_submitter(j);
-
-		if (use_sync)
-			continue;
-		else if (!aio)
-			err = setup_ring(s);
-		else
-			err = setup_aio(s);
-		if (err) {
-			printf("ring setup failed: %s, %d\n", strerror(errno), err);
-			return 1;
-		}
-	}
-	s = get_submitter(0);
-	printf("polled=%d, fixedbufs=%d/%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, dma_map, register_files, buffered, depth);
-	if (use_sync)
-		printf("Engine=preadv2\n");
-	else if (!aio)
-		printf("Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
-	else
-		printf("Engine=aio\n");
-
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
 		if (use_sync)
@@ -1503,7 +1559,6 @@ int main(int argc, char *argv[])
 #endif
 	}
 
-	fdepths = malloc(8 * s->nr_files * nthreads);
 	reap = calls = done = 0;
 	do {
 		unsigned long this_done = 0;
@@ -1535,16 +1590,20 @@ int main(int argc, char *argv[])
 			ipc = (this_reap - reap) / (this_call - calls);
 		} else
 			rpc = ipc = -1;
-		file_depths(fdepths);
 		iops = this_done - done;
 		if (bs > 1048576)
 			bw = iops * (bs / 1048576);
 		else
 			bw = iops / (1048576 / bs);
-		if (iops > 100000)
-			printf("IOPS=%luK, ", iops / 1000);
-		else
+		if (iops > 1000000) {
+			double miops = (double) iops / 1000000.0;
+			printf("IOPS=%.2fM, ", miops);
+		} else if (iops > 100000) {
+			double kiops = (double) iops / 1000.0;
+			printf("IOPS=%.2fK, ", kiops);
+		} else {
 			printf("IOPS=%lu, ", iops);
+		}
 		max_iops = max(max_iops, iops);
 		if (!do_nop) {
 			if (bw > 2000) {
@@ -1555,7 +1614,7 @@ int main(int argc, char *argv[])
 				printf("BW=%luMiB/s, ", bw);
 			}
 		}
-		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);
+		printf("IOS/call=%ld/%ld\n", rpc, ipc);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;
@@ -1578,7 +1637,6 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	free(fdepths);
 	free(submitter);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
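The periodic and final IOPS lines in t/io_uring now use an M suffix above one million and a K suffix above 100,000, with two decimals instead of a truncated integer. The same thresholds are shown below, pulled out into a hypothetical helper, format_iops(), for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an IOPS count with the same cut-offs as the t/io_uring output.
 * Returns what snprintf() returns. */
int format_iops(unsigned long iops, char *out, size_t len)
{
	assert(out != NULL && len > 0);

	if (iops > 1000000)
		return snprintf(out, len, "%.2fM", (double)iops / 1000000.0);
	if (iops > 100000)
		return snprintf(out, len, "%.2fK", (double)iops / 1000.0);
	return snprintf(out, len, "%lu", iops);
}
```

Because the comparisons are strict, exactly 1,000,000 IOPS still prints as 1000.00K, and exactly 100,000 prints as a plain integer.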

* Recent changes (master)
@ 2022-08-01 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-08-01 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3e1d3f2fc4a5f09174f0d6d70d036285d69f17c2:

  .github: add pull request template (2022-07-28 11:00:04 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 55037c4839c65612fa388ae937e63661d8192ed9:

  t/io_uring: switch to GiB/sec if numbers get large (2022-07-31 12:06:12 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: switch to GiB/sec if numbers get large

 t/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 10035912..335a06ed 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -1546,8 +1546,15 @@ int main(int argc, char *argv[])
 		else
 			printf("IOPS=%lu, ", iops);
 		max_iops = max(max_iops, iops);
-		if (!do_nop)
-			printf("BW=%luMiB/s, ", bw);
+		if (!do_nop) {
+			if (bw > 2000) {
+				double bw_g = (double) bw / 1000.0;
+
+				printf("BW=%.2fGiB/s, ", bw_g);
+			} else {
+				printf("BW=%luMiB/s, ", bw);
+			}
+		}
 		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);
 		done = this_done;
 		calls = this_call;

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
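This change prints GiB/s once the MiB/s figure exceeds 2000, but it converts by dividing by 1000.0. The printed number is therefore thousands of MiB rather than GiB (1 GiB = 1024 MiB). The sketch below is a hypothetical variant, using the illustrative names bw_mib() and format_bw(). It keeps t/io_uring's integer MiB/s arithmetic and divides by 1024 so the value matches the GiB label.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Integer MiB/s from IOPS and block size, using the same arithmetic as
 * the t/io_uring main loop. When bs does not divide 1 MiB evenly,
 * 1048576 / bs truncates, so the result is approximate. */
unsigned long bw_mib(unsigned long iops, unsigned long bs)
{
	assert(bs > 0);
	if (bs > 1048576)
		return iops * (bs / 1048576);
	return iops / (1048576 / bs);
}

/* Switch to GiB/s above 2000 MiB/s, converting with binary units. */
int format_bw(unsigned long mib_per_sec, char *out, size_t len)
{
	assert(out != NULL && len > 0);

	if (mib_per_sec > 2000)
		return snprintf(out, len, "BW=%.2fGiB/s",
				(double)mib_per_sec / 1024.0);
	return snprintf(out, len, "BW=%luMiB/s", mib_per_sec);
}
```

For example, 1,000,000 IOPS at a 4 KiB block size gives 3906 MiB/s, which this variant prints as 3.81GiB/s.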

* Recent changes (master)
@ 2022-07-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5b99196735a245224ec9321f796a9da30654ae6c:

  README: add maintainer section (2022-07-27 21:04:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3e1d3f2fc4a5f09174f0d6d70d036285d69f17c2:

  .github: add pull request template (2022-07-28 11:00:04 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      .github: add pull request template

 .github/PULL_REQUEST_TEMPLATE.md | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 .github/PULL_REQUEST_TEMPLATE.md

---

Diff of recent changes:

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 00000000..4d98a694
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,8 @@
+Please confirm that your commit message(s) follow these guidelines:
+
+1. First line is a commit title, a descriptive one-liner for the change
+2. Empty second line
+3. Commit message body that explains why the change is useful. Break lines that
+   aren't something like a URL at 72-74 chars.
+4. Empty line
+5. Signed-off-by: Real Name <real@email.com>

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2022-07-28 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dff32ddb97f2257975b6047474d665a5de7f7bbc:

  ci: install libnfs for linux and macos builds (2022-07-22 15:57:27 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5b99196735a245224ec9321f796a9da30654ae6c:

  README: add maintainer section (2022-07-27 21:04:31 -0600)

----------------------------------------------------------------
Chris Weber (1):
      Fix multithread issues when operating on a single shared file

Jens Axboe (3):
      Merge branch 'proposed_fix' of https://github.com/weberc-ntap/fio
      Minor style fixups
      README: add maintainer section

 README.rst  | 11 +++++++++++
 backend.c   | 19 ++++++++++++++++++-
 file.h      |  1 +
 filesetup.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 73 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index 4d736eaf..67420903 100644
--- a/README.rst
+++ b/README.rst
@@ -81,6 +81,17 @@ benchmark/test tools out there weren't flexible enough to do what he wanted.
 Jens Axboe <axboe@kernel.dk> 20060905
 
 
+Maintainers
+-----------
+
+Fio is maintained by Jens Axboe <axboe@kernel.dk> and
+Vincent Fu <vincentfu@gmail.com> - however, for reporting bugs please use
+the fio reflector or the GitHub page rather than email either of them
+directly. By using the public resources, others will be able to learn from
+the responses too. Chances are also good that other members will be able to
+help with your inquiry as well.
+
+
 Binary packages
 ---------------
 
diff --git a/backend.c b/backend.c
index e5bb4e25..5159b60d 100644
--- a/backend.c
+++ b/backend.c
@@ -2314,8 +2314,25 @@ static void run_threads(struct sk_out *sk_out)
 	for_each_td(td, i) {
 		print_status_init(td->thread_number - 1);
 
-		if (!td->o.create_serialize)
+		if (!td->o.create_serialize) {
+			/*
+			 *  When operating on a single file in parallel,
+			 *  perform single-threaded early setup so that
+			 *  setup_files() does not run into issues
+			 *  later.
+			*/
+			if (!i && td->o.nr_files == 1) {
+				if (setup_shared_file(td)) {
+					exit_value++;
+					if (td->error)
+						log_err("fio: pid=%d, err=%d/%s\n",
+							(int) td->pid, td->error, td->verror);
+					td_set_runstate(td, TD_REAPED);
+					todo--;
+				}
+			}
 			continue;
+		}
 
 		if (fio_verify_load_state(td))
 			goto reap;
diff --git a/file.h b/file.h
index da1b8947..e646cf22 100644
--- a/file.h
+++ b/file.h
@@ -201,6 +201,7 @@ struct thread_data;
 extern void close_files(struct thread_data *);
 extern void close_and_free_files(struct thread_data *);
 extern uint64_t get_start_offset(struct thread_data *, struct fio_file *);
+extern int __must_check setup_shared_file(struct thread_data *);
 extern int __must_check setup_files(struct thread_data *);
 extern int __must_check file_invalidate_cache(struct thread_data *, struct fio_file *);
 #ifdef __cplusplus
diff --git a/filesetup.c b/filesetup.c
index ab6c488b..e0592209 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -143,7 +143,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	if (unlink_file || new_layout) {
 		int ret;
 
-		dprint(FD_FILE, "layout unlink %s\n", f->file_name);
+		dprint(FD_FILE, "layout %d unlink %d %s\n", new_layout, unlink_file, f->file_name);
 
 		ret = td_io_unlink_file(td, f);
 		if (ret != 0 && ret != ENOENT) {
@@ -198,6 +198,9 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
+
+	dprint(FD_FILE, "fill file %s, size %llu\n", f->file_name, (unsigned long long) f->real_file_size);
+
 	left = f->real_file_size;
 	bs = td->o.max_bs[DDIR_WRITE];
 	if (bs > left)
@@ -1078,6 +1081,44 @@ static bool create_work_dirs(struct thread_data *td, const char *fname)
 	return true;
 }
 
+int setup_shared_file(struct thread_data *td)
+{
+	struct fio_file *f;
+	uint64_t file_size;
+	int err = 0;
+
+	if (td->o.nr_files > 1) {
+		log_err("fio: shared file setup called for multiple files\n");
+		return -1;
+	}
+
+	get_file_sizes(td);
+
+	f = td->files[0];
+
+	if (f == NULL) {
+		log_err("fio: NULL shared file\n");
+		return -1;
+	}
+
+	file_size = thread_number * td->o.size;
+	dprint(FD_FILE, "shared setup %s real_file_size=%llu, desired=%llu\n", 
+			f->file_name, (unsigned long long)f->real_file_size, (unsigned long long)file_size);
+
+	if (f->real_file_size < file_size) {
+		dprint(FD_FILE, "fio: extending shared file\n");
+		f->real_file_size = file_size;
+		err = extend_file(td, f);
+		if (!err)
+			err = __file_invalidate_cache(td, f, 0, f->real_file_size);
+		get_file_sizes(td);
+		dprint(FD_FILE, "shared setup new real_file_size=%llu\n", 
+				(unsigned long long)f->real_file_size);
+	}
+
+	return err;
+}
+
 /*
  * Open the files and setup files sizes, creating files if necessary.
  */
@@ -1092,7 +1133,7 @@ int setup_files(struct thread_data *td)
 	const unsigned long long bs = td_min_bs(td);
 	uint64_t fs = 0;
 
-	dprint(FD_FILE, "setup files\n");
+	dprint(FD_FILE, "setup files (thread_number=%d, subjob_number=%d)\n", td->thread_number, td->subjob_number);
 
 	old_state = td_bump_runstate(td, TD_SETTING_UP);
 

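The sizing rule added in setup_shared_file() can be shown outside fio. The helper below, `ensure_shared_size()`, is hypothetical and is not fio code. It assumes a plain POSIX file and sets the target size to `nr_jobs * per_job_size`, mirroring `thread_number * td->o.size`. It only grows the file when it is smaller than that target. fio's extend_file() lays the file out by writing data. This sketch calls ftruncate() instead, so the extension it creates is sparse.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ensure the file behind fd is at least
 * nr_jobs * per_job_size bytes. Returns 0 on success, -1 on error.
 * Unlike fio's extend_file(), which writes the data out, ftruncate()
 * leaves the new region sparse.
 */
static int ensure_shared_size(int fd, unsigned int nr_jobs, uint64_t per_job_size)
{
	struct stat st;
	uint64_t desired = (uint64_t) nr_jobs * per_job_size;

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return -1;
	}

	/* Never shrink: only extend when the file is too small. */
	if ((uint64_t) st.st_size < desired) {
		if (ftruncate(fd, (off_t) desired) < 0) {
			perror("ftruncate");
			return -1;
		}
	}
	return 0;
}
```

Four jobs of 4096 bytes each require a 16384-byte file. A later call that asks for less leaves the file unchanged, which matches the `real_file_size < file_size` guard in the patch.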
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2022-07-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 02a36caa69f5675f7144fbeddb7a32e1d35ce0c7:

  docs: clarify write_iolog description (2022-07-21 15:18:18 -0400)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dff32ddb97f2257975b6047474d665a5de7f7bbc:

  ci: install libnfs for linux and macos builds (2022-07-22 15:57:27 -0400)

----------------------------------------------------------------
Vincent Fu (3):
      configure: cleanups for nfs ioengine
      engines/nfs: remove commit hook
      ci: install libnfs for linux and macos builds

 ci/actions-install.sh |  3 ++-
 configure             | 16 +++++++---------
 engines/nfs.c         |  9 ---------
 options.c             |  2 +-
 4 files changed, 10 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index ff514926..b5c4198f 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -26,6 +26,7 @@ DPKGCFG
         libibverbs-dev
         libnuma-dev
         librdmacm-dev
+	libnfs-dev
         valgrind
     )
     case "${CI_TARGET_ARCH}" in
@@ -78,7 +79,7 @@ install_macos() {
     #echo "Updating homebrew..."
     #brew update >/dev/null 2>&1
     echo "Installing packages..."
-    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit
+    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit libnfs
     pip3 install scipy six sphinx
 }
 
diff --git a/configure b/configure
index 7965f0b0..36450df8 100755
--- a/configure
+++ b/configure
@@ -170,7 +170,7 @@ disable_native="no"
 march_set="no"
 libiscsi="no"
 libnbd="no"
-libnfs="no"
+libnfs=""
 xnvme=""
 libzbc=""
 dfs=""
@@ -245,6 +245,8 @@ for opt do
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
+  --disable-libnfs) libnfs="no"
+  ;;
   --enable-libnfs) libnfs="yes"
   ;;
   --dynamic-libengines) dynamic_engines="yes"
@@ -282,6 +284,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
   echo "--enable-libnfs         Enable nfs support"
+  echo "--disable-libnfs        Disable nfs support"
   echo "--disable-lex           Disable use of lex/yacc for math"
   echo "--disable-pmem          Disable pmem based engines even if found"
   echo "--enable-lex            Enable use of lex/yacc for math"
@@ -2313,15 +2316,14 @@ print_config "DAOS File System (dfs) Engine" "$dfs"
 
 ##########################################
 # Check if we have libnfs (for userspace nfs support).
-if test "$libnfs" = "yes" ; then
+if test "$libnfs" != "no" ; then
   if $(pkg-config libnfs > /dev/null 2>&1); then
     libnfs="yes"
     libnfs_cflags=$(pkg-config --cflags libnfs)
-    # libnfs_libs=$(pkg-config --libs libnfs)
-    libnfs_libs=/usr/local/lib/libnfs.a
+    libnfs_libs=$(pkg-config --libs libnfs)
   else
     if test "$libnfs" = "yes" ; then
-      echo "libnfs" "Install libnfs"
+      feature_not_found "libnfs" "libnfs"
     fi
     libnfs="no"
   fi
@@ -3190,9 +3192,6 @@ fi
 if test "$dfs" = "yes" ; then
   output_sym "CONFIG_DFS"
 fi
-if test "$libnfs" = "yes" ; then
-  output_sym "CONFIG_NFS"
-fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
@@ -3234,7 +3233,6 @@ if test "$libnbd" = "yes" ; then
 fi
 if test "$libnfs" = "yes" ; then
   output_sym "CONFIG_LIBNFS"
-  echo "CONFIG_LIBNFS=m" >> $config_host_mak
   echo "LIBNFS_CFLAGS=$libnfs_cflags" >> $config_host_mak
   echo "LIBNFS_LIBS=$libnfs_libs" >> $config_host_mak
 fi
diff --git a/engines/nfs.c b/engines/nfs.c
index 21be8833..7031769d 100644
--- a/engines/nfs.c
+++ b/engines/nfs.c
@@ -279,14 +279,6 @@ static int fio_libnfs_close(struct thread_data *td, struct fio_file *f)
 	return ret;
 }
 
-/*
- * Hook for writing out outstanding data.
- */
-static int fio_libnfs_commit(struct thread_data *td) {
-	nfs_event_loop(td, true);
-	return 0;
-}
-
 struct ioengine_ops ioengine = {
 	.name		= "nfs",
 	.version	= FIO_IOOPS_VERSION,
@@ -297,7 +289,6 @@ struct ioengine_ops ioengine = {
 	.cleanup	= fio_libnfs_cleanup,
 	.open_file	= fio_libnfs_open,
 	.close_file	= fio_libnfs_close,
-	.commit     = fio_libnfs_commit,
 	.flags      = FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
 	.options	= options,
 	.option_struct_size	= sizeof(struct fio_libnfs_options),
diff --git a/options.c b/options.c
index 2b183c60..5d3daedf 100644
--- a/options.c
+++ b/options.c
@@ -2140,7 +2140,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "DAOS File System (dfs) IO engine",
 			  },
 #endif
-#ifdef CONFIG_NFS
+#ifdef CONFIG_LIBNFS
 			  { .ival = "nfs",
 			    .help = "NFS IO engine",
 			  },


* Recent changes (master)
@ 2022-07-22 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9c1c1a8d6a4f30eba9595da951d18db1685c03d8:

  engines/http: silence openssl 3.0 deprecation warnings (2022-07-19 13:21:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 02a36caa69f5675f7144fbeddb7a32e1d35ce0c7:

  docs: clarify write_iolog description (2022-07-21 15:18:18 -0400)

----------------------------------------------------------------
Vincent Fu (1):
      docs: clarify write_iolog description

 HOWTO.rst | 3 ++-
 fio.1     | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 470777e2..104cce2d 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -3049,7 +3049,8 @@ I/O replay
 
 	Write the issued I/O patterns to the specified file. See
 	:option:`read_iolog`.  Specify a separate file for each job, otherwise the
-	iologs will be interspersed and the file may be corrupt.
+        iologs will be interspersed and the file may be corrupt. This file will
+        be opened in append mode.
 
 .. option:: read_iolog=str
 
diff --git a/fio.1 b/fio.1
index 948c01f9..ce9bf3ef 100644
--- a/fio.1
+++ b/fio.1
@@ -2793,7 +2793,8 @@ of milliseconds. Defaults to 1000.
 .BI write_iolog \fR=\fPstr
 Write the issued I/O patterns to the specified file. See
 \fBread_iolog\fR. Specify a separate file for each job, otherwise the
-iologs will be interspersed and the file may be corrupt.
+iologs will be interspersed and the file may be corrupt. This file will be
+opened in append mode.
 .TP
 .BI read_iolog \fR=\fPstr
 Open an iolog with the specified filename and replay the I/O patterns it

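In C stdio terms, "append mode" matches fopen() with mode `"a"`. Existing contents are kept, and every write goes to the end of the file. The sketch below is illustrative only. The helper names and the plain-text lines are hypothetical and do not reflect the iolog format. It shows that two runs writing to the same path leave both sets of lines in the file.

```c
#include <stdio.h>

/* Append one line to path, creating the file if it does not exist. */
static int append_line(const char *path, const char *line)
{
	FILE *f = fopen(path, "a");	/* "a": keep old contents, write at EOF */

	if (!f)
		return -1;
	fprintf(f, "%s\n", line);
	return fclose(f) == 0 ? 0 : -1;
}

/* Count newline-terminated lines in path; -1 if it cannot be opened. */
static long count_lines(const char *path)
{
	FILE *f = fopen(path, "r");
	long n = 0;
	int c;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			n++;
	fclose(f);
	return n;
}
```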

* Recent changes (master)
@ 2022-07-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d6225c1550827077c0c0f9e1b8816b4f35cd5304:

  Update README.rst to specify secure protocols where possible (2022-07-11 07:53:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9c1c1a8d6a4f30eba9595da951d18db1685c03d8:

  engines/http: silence openssl 3.0 deprecation warnings (2022-07-19 13:21:19 -0600)

----------------------------------------------------------------
Giuseppe Baccini (1):
      Fixed misplaced goto in http.c

Jens Axboe (1):
      engines/http: silence openssl 3.0 deprecation warnings

Vincent Fu (1):
      Merge branch 'giubacc-misplaced-goto'

 engines/http.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/http.c b/engines/http.c
index 696febe1..1de9e66c 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -29,6 +29,10 @@
 #include "fio.h"
 #include "../optgroup.h"
 
+/*
+ * Silence OpenSSL 3.0 deprecated function warnings
+ */
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
 
 enum {
 	FIO_HTTP_WEBDAV	    = 0,
@@ -526,8 +530,8 @@ static enum fio_q_status fio_http_queue(struct thread_data *td,
 			if (status == 100 || (status >= 200 && status <= 204))
 				goto out;
 			log_err("DDIR_WRITE failed with HTTP status code %ld\n", status);
-			goto err;
 		}
+		goto err;
 	} else if (io_u->ddir == DDIR_READ) {
 		curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
 		curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, &_curl_stream);


* Recent changes (master)
@ 2022-07-12 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-12 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 30568e0ed9366a810dfcf90a903ecfbff1a6196c:

  Merge branch 'client-hist-le64' of https://github.com/tuan-hoang1/fio (2022-07-07 06:33:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d6225c1550827077c0c0f9e1b8816b4f35cd5304:

  Update README.rst to specify secure protocols where possible (2022-07-11 07:53:29 -0600)

----------------------------------------------------------------
Rebecca Cran (1):
      Update README.rst to specify secure protocols where possible

 README.rst | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index 527f33ab..4d736eaf 100644
--- a/README.rst
+++ b/README.rst
@@ -27,31 +27,20 @@ Source
 
 Fio resides in a git repo, the canonical place is:
 
-	git://git.kernel.dk/fio.git
-
-When inside a corporate firewall, git:// URL sometimes does not work.
-If git:// does not work, use the http protocol instead:
-
-	http://git.kernel.dk/fio.git
+	https://git.kernel.dk/cgit/fio/
 
 Snapshots are frequently generated and :file:`fio-git-*.tar.gz` include the git
 meta data as well. Other tarballs are archives of official fio releases.
 Snapshots can download from:
 
-	http://brick.kernel.dk/snaps/
+	https://brick.kernel.dk/snaps/
 
 There are also two official mirrors. Both of these are automatically synced with
 the main repository, when changes are pushed. If the main repo is down for some
 reason, either one of these is safe to use as a backup:
 
-	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/fio.git
-
 	https://git.kernel.org/pub/scm/linux/kernel/git/axboe/fio.git
 
-or
-
-	git://github.com/axboe/fio.git
-
 	https://github.com/axboe/fio.git
 
 
@@ -70,7 +59,7 @@ email to majordomo@vger.kernel.org with
 
 in the body of the email. Archives can be found here:
 
-	http://www.spinics.net/lists/fio/
+	https://www.spinics.net/lists/fio/
 
 or here:
 
@@ -97,12 +86,12 @@ Binary packages
 
 Debian:
 	Starting with Debian "Squeeze", fio packages are part of the official
-	Debian repository. http://packages.debian.org/search?keywords=fio .
+	Debian repository. https://packages.debian.org/search?keywords=fio .
 
 Ubuntu:
 	Starting with Ubuntu 10.04 LTS (aka "Lucid Lynx"), fio packages are part
 	of the Ubuntu "universe" repository.
-	http://packages.ubuntu.com/search?keywords=fio .
+	https://packages.ubuntu.com/search?keywords=fio .
 
 Red Hat, Fedora, CentOS & Co:
 	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
@@ -176,7 +165,7 @@ directory.
 
 How to compile fio on 64-bit Windows:
 
- 1. Install Cygwin (http://www.cygwin.com/). Install **make** and all
+ 1. Install Cygwin (https://www.cygwin.com/). Install **make** and all
     packages starting with **mingw64-x86_64**. Ensure
     **mingw64-x86_64-zlib** are installed if you wish
     to enable fio's log compression functionality.
@@ -205,8 +194,8 @@ browser to :file:`./doc/output/html/index.html`.  To build manual page run
 ``make -C doc man`` and then ``man doc/output/man/fio.1``.  To see what other
 output formats are supported run ``make -C doc help``.
 
-.. _reStructuredText: http://www.sphinx-doc.org/rest.html
-.. _Sphinx: http://www.sphinx-doc.org
+.. _reStructuredText: https://www.sphinx-doc.org/rest.html
+.. _Sphinx: https://www.sphinx-doc.org
 
 
 Platforms


* Recent changes (master)
@ 2022-07-08 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-08 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 1f43cc2e7b2f3ac7461f8ea66bb9b32cb03075c3:

  Merge branch 'server-hist-le64' of https://github.com/tuan-hoang1/fio (2022-07-06 16:38:07 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 30568e0ed9366a810dfcf90a903ecfbff1a6196c:

  Merge branch 'client-hist-le64' of https://github.com/tuan-hoang1/fio (2022-07-07 06:33:25 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'client-hist-le64' of https://github.com/tuan-hoang1/fio

Tuan Hoang (1):
      client: only do le64_to_cpu() on io_sample_data member if iolog is histogram

 client.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 605a3ce5..37da74bc 100644
--- a/client.c
+++ b/client.c
@@ -1702,7 +1702,8 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 			s = (struct io_sample *)((char *)s + sizeof(struct io_u_plat_entry) * i);
 
 		s->time		= le64_to_cpu(s->time);
-		s->data.val	= le64_to_cpu(s->data.val);
+		if (ret->log_type != IO_LOG_TYPE_HIST)
+			s->data.val	= le64_to_cpu(s->data.val);
 		s->__ddir	= __le32_to_cpu(s->__ddir);
 		s->bs		= le64_to_cpu(s->bs);
 		s->priority	= le16_to_cpu(s->priority);

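The server and client patches follow one rule. Convert `data.val` to or from little-endian only when the log is not a histogram log. The sketch below shows that rule using simplified stand-ins. `enum log_type`, `struct sample`, `to_le64()` and `sample_to_wire()` are hypothetical and are not fio's definitions. `to_le64()` is a portable host-to-little-endian conversion. It is the identity on little-endian hosts and a byte swap on big-endian ones.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for fio's log types; names are hypothetical. */
enum log_type { LOG_TYPE_LAT, LOG_TYPE_HIST };

struct sample {
	uint64_t time;
	uint64_t val;	/* plain value, except in histogram logs */
};

/* Lay out v as little-endian bytes, independent of host byte order. */
static uint64_t to_le64(uint64_t v)
{
	unsigned char b[8];
	uint64_t out;

	for (int i = 0; i < 8; i++)
		b[i] = (unsigned char) (v >> (8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Convert a sample for the wire, skipping val for histogram logs. */
static void sample_to_wire(struct sample *s, enum log_type type)
{
	s->time = to_le64(s->time);
	if (type != LOG_TYPE_HIST)
		s->val = to_le64(s->val);
}
```

After conversion, the first byte of a converted field in memory is always its least significant byte, whatever the host order. Histogram samples keep their `val` field untouched.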

* Recent changes (master)
@ 2022-07-07 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 1eb5ca76ee17ff80dd06a0c2d22498ab720ec76f:

  configure: revert NFS configure change (2022-07-05 07:19:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1f43cc2e7b2f3ac7461f8ea66bb9b32cb03075c3:

  Merge branch 'server-hist-le64' of https://github.com/tuan-hoang1/fio (2022-07-06 16:38:07 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'server-hist-le64' of https://github.com/tuan-hoang1/fio

Tuan Hoang (1):
      server: only do cpu_to_le64() on io_sample_data member if iolog is histogram

 server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index 4c71bd44..b453be5f 100644
--- a/server.c
+++ b/server.c
@@ -2284,7 +2284,8 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 			struct io_sample *s = get_sample(log, cur_log, i);
 
 			s->time		= cpu_to_le64(s->time);
-			s->data.val	= cpu_to_le64(s->data.val);
+			if (log->log_type != IO_LOG_TYPE_HIST)
+				s->data.val	= cpu_to_le64(s->data.val);
 			s->__ddir	= __cpu_to_le32(s->__ddir);
 			s->bs		= cpu_to_le64(s->bs);
 


* Recent changes (master)
@ 2022-07-06 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-06 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dc4729e3ef6a9116d7cd30e96e4f5863883e5bd7:

  hash: cleanups (2022-07-01 15:03:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1eb5ca76ee17ff80dd06a0c2d22498ab720ec76f:

  configure: revert NFS configure change (2022-07-05 07:19:39 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      configure: revert NFS configure change

 configure | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 04a1d0e2..7965f0b0 100755
--- a/configure
+++ b/configure
@@ -245,7 +245,7 @@ for opt do
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
-  --disable-nfs) disable_nfs="yes"
+  --enable-libnfs) libnfs="yes"
   ;;
   --dynamic-libengines) dynamic_engines="yes"
   ;;
@@ -279,7 +279,6 @@ if test "$show_help" = "yes" ; then
   echo "--disable-rados         Disable Rados support even if found"
   echo "--disable-rbd           Disable Rados Block Device even if found"
   echo "--disable-http          Disable HTTP support even if found"
-  echo "--disable-nfs           Disable userspace NFS support even if found"
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
   echo "--enable-libnfs         Enable nfs support"
@@ -2314,15 +2313,17 @@ print_config "DAOS File System (dfs) Engine" "$dfs"
 
 ##########################################
 # Check if we have libnfs (for userspace nfs support).
-if test "$disable_nfs" != "yes"; then
+if test "$libnfs" = "yes" ; then
   if $(pkg-config libnfs > /dev/null 2>&1); then
     libnfs="yes"
     libnfs_cflags=$(pkg-config --cflags libnfs)
-    libnfs_libs=$(pkg-config --libs libnfs)
+    # libnfs_libs=$(pkg-config --libs libnfs)
+    libnfs_libs=/usr/local/lib/libnfs.a
   else
     if test "$libnfs" = "yes" ; then
       echo "libnfs" "Install libnfs"
     fi
+    libnfs="no"
   fi
 fi
 print_config "NFS engine" "$libnfs"


* Recent changes (master)
@ 2022-07-02 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-07-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 660879102e32a0ed3d3225afaebcc0d46625a4a6:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-06-23 08:20:22 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dc4729e3ef6a9116d7cd30e96e4f5863883e5bd7:

  hash: cleanups (2022-07-01 15:03:39 -0600)

----------------------------------------------------------------
Georg Sauthoff (1):
      Simplify and optimize __fill_random_buf

Jens Axboe (3):
      Merge branch 'fill-random-smaller' of https://github.com/gsauthof/fio
      lib/rand: improve __fill_random_buf()
      hash: cleanups

 engines/rdma.c |  2 +-
 hash.h         | 26 --------------------------
 lib/rand.c     | 30 +++++++++---------------------
 3 files changed, 10 insertions(+), 48 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rdma.c b/engines/rdma.c
index e3bb2567..fcb41068 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -1389,7 +1389,7 @@ static int fio_rdmaio_setup(struct thread_data *td)
 		rd = malloc(sizeof(*rd));
 
 		memset(rd, 0, sizeof(*rd));
-		init_rand_seed(&rd->rand_state, (unsigned int) GOLDEN_RATIO_PRIME, 0);
+		init_rand_seed(&rd->rand_state, (unsigned int) GOLDEN_RATIO_64, 0);
 		td->io_ops_data = rd;
 	}
 
diff --git a/hash.h b/hash.h
index f7596a56..51f0706e 100644
--- a/hash.h
+++ b/hash.h
@@ -9,32 +9,6 @@
    (C) 2002 William Lee Irwin III, IBM */
 
 /*
- * Knuth recommends primes in approximately golden ratio to the maximum
- * integer representable by a machine word for multiplicative hashing.
- * Chuck Lever verified the effectiveness of this technique:
- * http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
- *
- * These primes are chosen to be bit-sparse, that is operations on
- * them can use shifts and additions instead of multiplications for
- * machines where multiplications are slow.
- */
-
-#if BITS_PER_LONG == 32
-/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
-#define GOLDEN_RATIO_PRIME 0x9e370001UL
-#elif BITS_PER_LONG == 64
-/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
-#define GOLDEN_RATIO_PRIME 0x9e37fffffffc0001UL
-#else
-#error Define GOLDEN_RATIO_PRIME for your wordsize.
-#endif
-
-/*
- * The above primes are actively bad for hashing, since they are
- * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
- * real problems. Besides, the "prime" part is pointless for the
- * multiplicative hash.
- *
  * Although a random odd number will do, it turns out that the golden
  * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
  * properties.
diff --git a/lib/rand.c b/lib/rand.c
index 6e893e80..1e669116 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -97,29 +97,17 @@ void init_rand_seed(struct frand_state *state, uint64_t seed, bool use64)
 
 void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
 {
-	void *ptr = buf;
+	uint64_t *b = buf;
+	uint64_t *e = b  + len / sizeof(*b);
+	unsigned int rest = len % sizeof(*b);
 
-	while (len) {
-		int this_len;
-
-		if (len >= sizeof(int64_t)) {
-			*((int64_t *) ptr) = seed;
-			this_len = sizeof(int64_t);
-		} else if (len >= sizeof(int32_t)) {
-			*((int32_t *) ptr) = seed;
-			this_len = sizeof(int32_t);
-		} else if (len >= sizeof(int16_t)) {
-			*((int16_t *) ptr) = seed;
-			this_len = sizeof(int16_t);
-		} else {
-			*((int8_t *) ptr) = seed;
-			this_len = sizeof(int8_t);
-		}
-		ptr += this_len;
-		len -= this_len;
-		seed *= GOLDEN_RATIO_PRIME;
-		seed >>= 3;
+	for (; b != e; ++b) {
+		*b = seed;
+		seed = __hash_u64(seed);
 	}
+
+	if (fio_unlikely(rest))
+		__builtin_memcpy(e, &seed, rest);
 }
 
 uint64_t fill_random_buf(struct frand_state *fs, void *buf,

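The new __fill_random_buf() writes whole 64-bit words and remixes the seed after each one. It then copies the leading bytes of the next seed into any 1 to 7 byte tail. The standalone sketch below follows the same layout, with two assumptions. First, `mix64()` is a hypothetical stand-in for fio's `__hash_u64()`: it multiplies by the 64-bit golden-ratio constant 0x61C8864680B583EB, and fio's actual helper may differ. Second, each word is stored with memcpy(), so the sketch makes no alignment or aliasing assumptions about `buf`.

```c
#include <stdint.h>
#include <string.h>

/*
 * Stand-in mixer (assumption): multiplicative hash with the 64-bit
 * golden-ratio constant. fio's __hash_u64() may differ.
 */
static uint64_t mix64(uint64_t x)
{
	return x * 0x61C8864680B583EBULL;
}

/* Fill len bytes: whole 64-bit words first, then a 0..7 byte tail. */
static void fill_pattern(void *buf, unsigned int len, uint64_t seed)
{
	unsigned char *p = buf;
	unsigned int words = len / sizeof(uint64_t);
	unsigned int rest = len % sizeof(uint64_t);

	for (unsigned int i = 0; i < words; i++) {
		/* memcpy avoids alignment/aliasing assumptions about buf */
		memcpy(p, &seed, sizeof(seed));
		p += sizeof(seed);
		seed = mix64(seed);
	}

	/* Tail: copy the leading bytes of the next seed value. */
	if (rest)
		memcpy(p, &seed, rest);
}
```

With a purely multiplicative mixer, a zero seed stays zero, so this sketch assumes a nonzero seed. A 20-byte fill writes two full words and then the first 4 bytes of the third seed value. It never touches bytes past `len`.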

* Recent changes (master)
@ 2022-06-24 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-06-24 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 6aaebfbe7269f95164ac83a04505869f96f5f83a:

  configure: add option to disable xnvme build (2022-06-22 11:45:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 660879102e32a0ed3d3225afaebcc0d46625a4a6:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-06-23 08:20:22 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      ci/travis-*: Fix shellcheck warnings
      ci: Verify the Android build

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 .github/workflows/ci.yml     |  5 +++++
 ci/actions-build.sh          | 19 +++++++++++++++++--
 ci/actions-full-test.sh      |  2 ++
 ci/actions-install.sh        |  7 +++++++
 ci/actions-smoke-test.sh     |  2 ++
 ci/travis-install-librpma.sh |  6 +++---
 ci/travis-install-pmdk.sh    |  9 +++++----
 7 files changed, 41 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index cd8ce142..650366b2 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -15,6 +15,7 @@ jobs:
         - linux-clang
         - macos
         - linux-i686-gcc
+        - android
         include:
         - build: linux-gcc
           os: ubuntu-20.04
@@ -27,8 +28,12 @@ jobs:
         - build: linux-i686-gcc
           os: ubuntu-20.04
           arch: i686
+        - build: android
+          os: ubuntu-20.04
+          arch: aarch64-linux-android32
 
     env:
+      CI_TARGET_BUILD: ${{ matrix.build }}
       CI_TARGET_ARCH: ${{ matrix.arch }}
       CC: ${{ matrix.cc }}
 
diff --git a/ci/actions-build.sh b/ci/actions-build.sh
index 74a6fdcb..2b3de8e3 100755
--- a/ci/actions-build.sh
+++ b/ci/actions-build.sh
@@ -11,8 +11,23 @@ main() {
     local configure_flags=()
 
     set_ci_target_os
-    case "${CI_TARGET_OS}" in
-        "linux")
+    case "${CI_TARGET_BUILD}/${CI_TARGET_OS}" in
+        android/*)
+            export UNAME=Android
+            if [ -z "${CI_TARGET_ARCH}" ]; then
+                echo "Error: CI_TARGET_ARCH has not been set"
+                return 1
+            fi
+            NDK=$PWD/android-ndk-r24/toolchains/llvm/prebuilt/linux-x86_64/bin
+            export PATH="${NDK}:${PATH}"
+            export LIBS="-landroid"
+            CC=${NDK}/${CI_TARGET_ARCH}-clang
+            if [ ! -e "${CC}" ]; then
+                echo "Error: could not find ${CC}"
+                return 1
+            fi
+            ;;
+        */linux)
             case "${CI_TARGET_ARCH}" in
                 "i686")
                     extra_cflags="${extra_cflags} -m32"
diff --git a/ci/actions-full-test.sh b/ci/actions-full-test.sh
index 8282002f..d1675f6e 100755
--- a/ci/actions-full-test.sh
+++ b/ci/actions-full-test.sh
@@ -3,6 +3,8 @@
 set -eu
 
 main() {
+    [ "${CI_TARGET_BUILD}" = android ] && return 0
+
     echo "Running long running tests..."
     export PYTHONUNBUFFERED="TRUE"
     if [[ "${CI_TARGET_ARCH}" == "arm64" ]]; then
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 0e472717..ff514926 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -83,6 +83,13 @@ install_macos() {
 }
 
 main() {
+    if [ "${CI_TARGET_BUILD}" = "android" ]; then
+	echo "Installing Android NDK..."
+	wget --quiet https://dl.google.com/android/repository/android-ndk-r24-linux.zip
+	unzip -q android-ndk-r24-linux.zip
+	return 0
+    fi
+
     set_ci_target_os
 
     install_function="install_${CI_TARGET_OS}"
diff --git a/ci/actions-smoke-test.sh b/ci/actions-smoke-test.sh
index c129c89f..3196f6a1 100755
--- a/ci/actions-smoke-test.sh
+++ b/ci/actions-smoke-test.sh
@@ -3,6 +3,8 @@
 set -eu
 
 main() {
+    [ "${CI_TARGET_BUILD}" = "android" ] && return 0
+
     echo "Running smoke tests..."
     make test
 }
diff --git a/ci/travis-install-librpma.sh b/ci/travis-install-librpma.sh
index b127f3f5..4e5ed21d 100755
--- a/ci/travis-install-librpma.sh
+++ b/ci/travis-install-librpma.sh
@@ -16,7 +16,7 @@ cmake .. -DCMAKE_BUILD_TYPE=Release \
 	-DBUILD_DOC=OFF \
 	-DBUILD_EXAMPLES=OFF \
 	-DBUILD_TESTS=OFF
-make -j$(nproc)
-sudo make -j$(nproc) install
-cd $WORKDIR
+make -j"$(nproc)"
+sudo make -j"$(nproc)" install
+cd "$WORKDIR"
 rm -rf $ZIP_FILE rpma-${LIBRPMA_VERSION}
diff --git a/ci/travis-install-pmdk.sh b/ci/travis-install-pmdk.sh
index 3b0b5bbc..7bde9fd0 100755
--- a/ci/travis-install-pmdk.sh
+++ b/ci/travis-install-pmdk.sh
@@ -12,7 +12,8 @@ WORKDIR=$(pwd)
 #    /bin/sh: 1: clang: not found
 # if CC is not set to the full path of clang.
 #
-export CC=$(type -P $CC)
+CC=$(type -P "$CC")
+export CC
 
 # Install PMDK libraries, because PMDK's libpmem
 # is a dependency of the librpma fio engine.
@@ -22,7 +23,7 @@ export CC=$(type -P $CC)
 wget https://github.com/pmem/pmdk/releases/download/${PMDK_VERSION}/pmdk-${PMDK_VERSION}.tar.gz
 tar -xzf pmdk-${PMDK_VERSION}.tar.gz
 cd pmdk-${PMDK_VERSION}
-make -j$(nproc) NDCTL_ENABLE=n
-sudo make -j$(nproc) install prefix=/usr NDCTL_ENABLE=n
-cd $WORKDIR
+make -j"$(nproc)" NDCTL_ENABLE=n
+sudo make -j"$(nproc)" install prefix=/usr NDCTL_ENABLE=n
+cd "$WORKDIR"
 rm -rf pmdk-${PMDK_VERSION}


* Recent changes (master)
@ 2022-06-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-06-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d4bf5e6193b97c5e5490fdb93b069d149a38777c:

  gettime: fix whitespace damage (2022-06-19 12:04:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6aaebfbe7269f95164ac83a04505869f96f5f83a:

  configure: add option to disable xnvme build (2022-06-22 11:45:32 -0600)

----------------------------------------------------------------
Ankit Kumar (1):
      configure: add option to disable xnvme build

 configure | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 510af424..04a1d0e2 100755
--- a/configure
+++ b/configure
@@ -171,7 +171,7 @@ march_set="no"
 libiscsi="no"
 libnbd="no"
 libnfs="no"
-xnvme="no"
+xnvme=""
 libzbc=""
 dfs=""
 dynamic_engines="no"
@@ -241,7 +241,7 @@ for opt do
   ;;
   --disable-libzbc) libzbc="no"
   ;;
-  --enable-xnvme) xnvme="yes"
+  --disable-xnvme) xnvme="no"
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
@@ -294,7 +294,7 @@ if test "$show_help" = "yes" ; then
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
-  echo "--enable-xnvme          Enable xnvme support"
+  echo "--disable-xnvme         Disable xnvme support even if found"
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc      Disable tcmalloc support"
   echo "--dynamic-libengines    Lib-based ioengines as dynamic libraries"
@@ -2619,7 +2619,7 @@ fi
 
 ##########################################
 # Check if we have xnvme
-if test "$xnvme" != "yes" ; then
+if test "$xnvme" != "no" ; then
   if check_min_lib_version xnvme 0.2.0; then
     xnvme="yes"
     xnvme_cflags=$(pkg-config --cflags xnvme)


* Recent changes (master)
@ 2022-06-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2022-06-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e4d384755e4831cf5bbaa97e0c5b79a3598efbc4:

  Merge branch 'master' of https://github.com/useche/fio (2022-06-15 18:38:41 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d4bf5e6193b97c5e5490fdb93b069d149a38777c:

  gettime: fix whitespace damage (2022-06-19 12:04:19 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      gettime: fix whitespace damage

 gettime.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/gettime.c b/gettime.c
index 099e9d9f..14462420 100644
--- a/gettime.c
+++ b/gettime.c
@@ -431,22 +431,22 @@ void fio_clock_init(void)
 
 uint64_t ntime_since(const struct timespec *s, const struct timespec *e)
 {
-       int64_t sec, nsec;
+	int64_t sec, nsec;
 
-       sec = e->tv_sec - s->tv_sec;
-       nsec = e->tv_nsec - s->tv_nsec;
-       if (sec > 0 && nsec < 0) {
-	       sec--;
-	       nsec += 1000000000LL;
-       }
+	sec = e->tv_sec - s->tv_sec;
+	nsec = e->tv_nsec - s->tv_nsec;
+	if (sec > 0 && nsec < 0) {
+		sec--;
+		nsec += 1000000000LL;
+	}
 
        /*
 	* time warp bug on some kernels?
 	*/
-       if (sec < 0 || (sec == 0 && nsec < 0))
-	       return 0;
+	if (sec < 0 || (sec == 0 && nsec < 0))
+		return 0;
 
-       return nsec + (sec * 1000000000LL);
+	return nsec + (sec * 1000000000LL);
 }
 
 uint64_t ntime_since_now(const struct timespec *s)

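The whitespace fix leaves ntime_since()'s logic unchanged. It borrows a second when the nanosecond difference is negative and returns 0 if the end time precedes the start. The standalone copy below, `ns_between()`, is a hypothetical name used only for illustration. It makes that logic easy to test on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds from s to e, clamped to 0 if e is earlier than s. */
static uint64_t ns_between(const struct timespec *s, const struct timespec *e)
{
	int64_t sec = (int64_t) e->tv_sec - s->tv_sec;
	int64_t nsec = (int64_t) e->tv_nsec - s->tv_nsec;

	/* Borrow one second when the nanosecond field went negative. */
	if (sec > 0 && nsec < 0) {
		sec--;
		nsec += 1000000000LL;
	}

	/* End before start (clock went backwards): report zero. */
	if (sec < 0 || (sec == 0 && nsec < 0))
		return 0;

	return (uint64_t) nsec + (uint64_t) sec * 1000000000ULL;
}
```

Consider going from 1.9 s to 3.1 s. The raw difference is 2 s and -800000000 ns. After the borrow it becomes 1 s and 200000000 ns, which is 1200000000 ns. Swapping the arguments gives a negative second count, so the function returns 0.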

* Recent changes (master)
@ 2022-06-16 12:00 Jens Axboe
From: Jens Axboe @ 2022-06-16 12:00 UTC
  To: fio

The following changes since commit b5f3adf9e1e40c7bdb76a9e433aa580f7eead740:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-06-13 18:14:26 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e4d384755e4831cf5bbaa97e0c5b79a3598efbc4:

  Merge branch 'master' of https://github.com/useche/fio (2022-06-15 18:38:41 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/useche/fio

Luis Useche (1):
      Init file_cache to invalid (maj, min)

Vincent Fu (5):
      ioengines: add helper for trims with async ioengines
      ioengines: don't record issue_time if ioengines already do it
      HOWTO: improve description of latency measures
      ioengines: update last_issue if we set issue_time
      ioengines: clean up latency accounting for 3 ioengines

 HOWTO.rst               | 29 ++++++++++++++++++-----------
 blktrace.c              |  5 ++++-
 engines/io_uring.c      | 13 +++++++++++--
 engines/libaio.c        |  9 ++++++++-
 engines/librpma_apm.c   |  2 +-
 engines/librpma_fio.c   |  9 ++++++++-
 engines/librpma_gpspm.c |  2 +-
 engines/rdma.c          |  9 ++++++++-
 ioengines.c             | 44 ++++++++++++++++++++++++++------------------
 ioengines.h             |  2 ++
 10 files changed, 87 insertions(+), 37 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 28ac2b7c..470777e2 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -4165,24 +4165,31 @@ writes in the example above).  In the order listed, they denote:
 **slat**
 		Submission latency (**min** being the minimum, **max** being the
 		maximum, **avg** being the average, **stdev** being the standard
-		deviation).  This is the time it took to submit the I/O.  For
-		sync I/O this row is not displayed as the slat is really the
-		completion latency (since queue/complete is one operation there).
-		This value can be in nanoseconds, microseconds or milliseconds ---
-		fio will choose the most appropriate base and print that (in the
-		example above nanoseconds was the best scale).  Note: in :option:`--minimal` mode
-		latencies are always expressed in microseconds.
+                deviation).  This is the time from when fio initialized the I/O
+                to submission.  For synchronous ioengines this includes the time
+                up until just before the ioengine's queue function is called.
+                For asynchronous ioengines this includes the time up through the
+                completion of the ioengine's queue function (and commit function
+                if it is defined). For sync I/O this row is not displayed as the
+                slat is negligible.  This value can be in nanoseconds,
+                microseconds or milliseconds --- fio will choose the most
+                appropriate base and print that (in the example above
+                nanoseconds was the best scale).  Note: in :option:`--minimal`
+                mode latencies are always expressed in microseconds.
 
 **clat**
 		Completion latency. Same names as slat, this denotes the time from
-		submission to completion of the I/O pieces. For sync I/O, clat will
-		usually be equal (or very close) to 0, as the time from submit to
-		complete is basically just CPU time (I/O has already been done, see slat
-		explanation).
+                submission to completion of the I/O pieces. For sync I/O, this
+                represents the time from when the I/O was submitted to the
+                operating system to when it was completed. For asynchronous
+                ioengines this is the time from when the ioengine's queue (and
+                commit if available) functions were completed to when the I/O's
+                completion was reaped by fio.
 
 **lat**
 		Total latency. Same names as slat and clat, this denotes the time from
 		when fio created the I/O unit to completion of the I/O operation.
+                It is the sum of submission and completion latency.
 
 **bw**
 		Bandwidth statistics based on samples. Same names as the xlat stats,
diff --git a/blktrace.c b/blktrace.c
index 619121c7..00e5f9a9 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -442,7 +442,10 @@ err:
 bool read_blktrace(struct thread_data* td)
 {
 	struct blk_io_trace t;
-	struct file_cache cache = { };
+	struct file_cache cache = {
+		.maj = ~0U,
+		.min = ~0U,
+	};
 	unsigned long ios[DDIR_RWDIR_SYNC_CNT] = { };
 	unsigned long long rw_bs[DDIR_RWDIR_CNT] = { };
 	unsigned long skipped_writes;
diff --git a/engines/io_uring.c b/engines/io_uring.c
index cceafe69..cffc7371 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -608,6 +608,12 @@ static void fio_ioring_queued(struct thread_data *td, int start, int nr)
 
 		start++;
 	}
+
+	/*
+	 * only used for iolog
+	 */
+	if (td->o.read_iolog_file)
+		memcpy(&td->last_issue, &now, sizeof(now));
 }
 
 static int fio_ioring_commit(struct thread_data *td)
@@ -1191,7 +1197,8 @@ static int fio_ioring_cmd_get_max_open_zones(struct thread_data *td,
 static struct ioengine_ops ioengine_uring = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD |
+					FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
@@ -1211,7 +1218,9 @@ static struct ioengine_ops ioengine_uring = {
 static struct ioengine_ops ioengine_uring_cmd = {
 	.name			= "io_uring_cmd",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD | FIO_MEMALIGN | FIO_RAWIO,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD |
+					FIO_MEMALIGN | FIO_RAWIO |
+					FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_cmd_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
diff --git a/engines/libaio.c b/engines/libaio.c
index 9c278d06..33b8c12f 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -368,6 +368,12 @@ static void fio_libaio_queued(struct thread_data *td, struct io_u **io_us,
 		memcpy(&io_u->issue_time, &now, sizeof(now));
 		io_u_queued(td, io_u);
 	}
+
+	/*
+	 * only used for iolog
+	 */
+	if (td->o.read_iolog_file)
+		memcpy(&td->last_issue, &now, sizeof(now));
 }
 
 static int fio_libaio_commit(struct thread_data *td)
@@ -511,7 +517,8 @@ static int fio_libaio_init(struct thread_data *td)
 FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_ASYNCIO_SYNC_TRIM,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM |
+					FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.init			= fio_libaio_init,
 	.post_init		= fio_libaio_post_init,
 	.prep			= fio_libaio_prep,
diff --git a/engines/librpma_apm.c b/engines/librpma_apm.c
index d1166ad8..896240dd 100644
--- a/engines/librpma_apm.c
+++ b/engines/librpma_apm.c
@@ -208,7 +208,7 @@ FIO_STATIC struct ioengine_ops ioengine_client = {
 	.errdetails		= librpma_fio_client_errdetails,
 	.close_file		= librpma_fio_file_nop,
 	.cleanup		= client_cleanup,
-	.flags			= FIO_DISKLESSIO,
+	.flags			= FIO_DISKLESSIO | FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.options		= librpma_fio_options,
 	.option_struct_size	= sizeof(struct librpma_fio_options_values),
 };
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index 34818904..a78a1e57 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -621,9 +621,16 @@ int librpma_fio_client_commit(struct thread_data *td)
 		}
 	}
 
-	if ((fill_time = fio_fill_issue_time(td)))
+	if ((fill_time = fio_fill_issue_time(td))) {
 		fio_gettime(&now, NULL);
 
+		/*
+		 * only used for iolog
+		 */
+		if (td->o.read_iolog_file)
+			memcpy(&td->last_issue, &now, sizeof(now));
+
+	}
 	/* move executed io_us from queued[] to flight[] */
 	for (i = 0; i < ccd->io_u_queued_nr; i++) {
 		struct io_u *io_u = ccd->io_us_queued[i];
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
index 5cf97472..f00717a7 100644
--- a/engines/librpma_gpspm.c
+++ b/engines/librpma_gpspm.c
@@ -352,7 +352,7 @@ FIO_STATIC struct ioengine_ops ioengine_client = {
 	.errdetails		= librpma_fio_client_errdetails,
 	.close_file		= librpma_fio_file_nop,
 	.cleanup		= client_cleanup,
-	.flags			= FIO_DISKLESSIO,
+	.flags			= FIO_DISKLESSIO | FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.options		= librpma_fio_options,
 	.option_struct_size	= sizeof(struct librpma_fio_options_values),
 };
diff --git a/engines/rdma.c b/engines/rdma.c
index 4eb86652..e3bb2567 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -832,6 +832,12 @@ static void fio_rdmaio_queued(struct thread_data *td, struct io_u **io_us,
 		memcpy(&io_u->issue_time, &now, sizeof(now));
 		io_u_queued(td, io_u);
 	}
+
+	/*
+	 * only used for iolog
+	 */
+	if (td->o.read_iolog_file)
+		memcpy(&td->last_issue, &now, sizeof(now));
 }
 
 static int fio_rdmaio_commit(struct thread_data *td)
@@ -1404,7 +1410,8 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.cleanup		= fio_rdmaio_cleanup,
 	.open_file		= fio_rdmaio_open_file,
 	.close_file		= fio_rdmaio_close_file,
-	.flags			= FIO_DISKLESSIO | FIO_UNIDIR | FIO_PIPEIO,
+	.flags			= FIO_DISKLESSIO | FIO_UNIDIR | FIO_PIPEIO |
+					FIO_ASYNCIO_SETS_ISSUE_TIME,
 	.options		= options,
 	.option_struct_size	= sizeof(struct rdmaio_options),
 };
diff --git a/ioengines.c b/ioengines.c
index 68f307e5..e2316ee4 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -24,6 +24,13 @@
 
 static FLIST_HEAD(engine_list);
 
+static inline bool async_ioengine_sync_trim(struct thread_data *td,
+					    struct io_u	*io_u)
+{
+	return td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) &&
+		io_u->ddir == DDIR_TRIM;
+}
+
 static bool check_engine_ops(struct thread_data *td, struct ioengine_ops *ops)
 {
 	if (ops->version != FIO_IOOPS_VERSION) {
@@ -350,17 +357,17 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	io_u->resid = 0;
 
 	if (td_ioengine_flagged(td, FIO_SYNCIO) ||
-		(td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) && 
-		io_u->ddir == DDIR_TRIM)) {
-		if (fio_fill_issue_time(td))
+		async_ioengine_sync_trim(td, io_u)) {
+		if (fio_fill_issue_time(td)) {
 			fio_gettime(&io_u->issue_time, NULL);
 
-		/*
-		 * only used for iolog
-		 */
-		if (td->o.read_iolog_file)
-			memcpy(&td->last_issue, &io_u->issue_time,
-					sizeof(io_u->issue_time));
+			/*
+			 * only used for iolog
+			 */
+			if (td->o.read_iolog_file)
+				memcpy(&td->last_issue, &io_u->issue_time,
+						sizeof(io_u->issue_time));
+		}
 	}
 
 
@@ -435,17 +442,18 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (!td_ioengine_flagged(td, FIO_SYNCIO) &&
-		(!td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) ||
-		 io_u->ddir != DDIR_TRIM)) {
-		if (fio_fill_issue_time(td))
+		!async_ioengine_sync_trim(td, io_u)) {
+		if (fio_fill_issue_time(td) &&
+			!td_ioengine_flagged(td, FIO_ASYNCIO_SETS_ISSUE_TIME)) {
 			fio_gettime(&io_u->issue_time, NULL);
 
-		/*
-		 * only used for iolog
-		 */
-		if (td->o.read_iolog_file)
-			memcpy(&td->last_issue, &io_u->issue_time,
-					sizeof(io_u->issue_time));
+			/*
+			 * only used for iolog
+			 */
+			if (td->o.read_iolog_file)
+				memcpy(&td->last_issue, &io_u->issue_time,
+						sizeof(io_u->issue_time));
+		}
 	}
 
 	return ret;
diff --git a/ioengines.h b/ioengines.h
index acdb0071..fafa1e48 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -83,6 +83,8 @@ enum fio_ioengine_flags {
 	FIO_ASYNCIO_SYNC_TRIM
 			= 1 << 14,	/* io engine has async ->queue except for trim */
 	FIO_NO_OFFLOAD	= 1 << 15,	/* no async offload */
+	FIO_ASYNCIO_SETS_ISSUE_TIME
+			= 1 << 16,	/* async ioengine with commit function that sets issue_time */
 };
 
 /*

* Recent changes (master)
@ 2022-06-14 12:00 Jens Axboe
From: Jens Axboe @ 2022-06-14 12:00 UTC
  To: fio

The following changes since commit 26faead0f3c6e7608b89a51373f1455b91377fcb:

  t/zbd: skip test case #13 when max_open_zones is too small (2022-06-02 03:58:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b5f3adf9e1e40c7bdb76a9e433aa580f7eead740:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-06-13 18:14:26 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      configure: Support gcc 12
      configure: Fix libzbc detection on SUSE Linux

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 configure | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 8182322b..510af424 100755
--- a/configure
+++ b/configure
@@ -1128,7 +1128,8 @@ cat > $TMPC << EOF
 #include <sched.h>
 int main(int argc, char **argv)
 {
-  cpu_set_t mask;
+  cpu_set_t mask = { };
+
   return sched_setaffinity(0, sizeof(mask), &mask);
 }
 EOF
@@ -1139,7 +1140,8 @@ else
 #include <sched.h>
 int main(int argc, char **argv)
 {
-  cpu_set_t mask;
+  cpu_set_t mask = { };
+
   return sched_setaffinity(0, &mask);
 }
 EOF
@@ -1621,7 +1623,8 @@ cat > $TMPC << EOF
 #include <sched.h>
 int main(int argc, char **argv)
 {
-  struct sched_param p;
+  struct sched_param p = { };
+
   return sched_setscheduler(0, SCHED_IDLE, &p);
 }
 EOF
@@ -1743,7 +1746,9 @@ cat > $TMPC << EOF
 #include <sys/uio.h>
 int main(int argc, char **argv)
 {
-  return pwritev(0, NULL, 1, 0) + preadv(0, NULL, 1, 0);
+  struct iovec iov[1] = { };
+
+  return pwritev(0, iov, 1, 0) + preadv(0, iov, 1, 0);
 }
 EOF
 if compile_prog "" "" "pwritev"; then
@@ -1761,7 +1766,9 @@ cat > $TMPC << EOF
 #include <sys/uio.h>
 int main(int argc, char **argv)
 {
-  return pwritev2(0, NULL, 1, 0, 0) + preadv2(0, NULL, 1, 0, 0);
+  struct iovec iov[1] = { };
+
+  return pwritev2(0, iov, 1, 0, 0) + preadv2(0, iov, 1, 0, 0);
 }
 EOF
 if compile_prog "" "" "pwritev2"; then
@@ -1787,14 +1794,14 @@ cat > $TMPC << EOF
 #include <stdio.h>
 int main(int argc, char **argv)
 {
-  struct addrinfo hints;
-  struct in6_addr addr;
+  struct addrinfo hints = { };
+  struct in6_addr addr = in6addr_any;
   int ret;
 
   ret = getaddrinfo(NULL, NULL, &hints, NULL);
   freeaddrinfo(NULL);
-  printf("%s\n", gai_strerror(ret));
-  addr = in6addr_any;
+  printf("%s %d\n", gai_strerror(ret), addr.s6_addr[0]);
+
   return 0;
 }
 EOF
@@ -2155,9 +2162,7 @@ cat > $TMPC << EOF
 #include <stdlib.h>
 int main(int argc, char **argv)
 {
-  int rc;
-  rc = pmem_is_pmem(NULL, 0);
-  return 0;
+  return pmem_is_pmem(NULL, 0);
 }
 EOF
 if compile_prog "" "-lpmem" "libpmem"; then
@@ -2176,7 +2181,7 @@ if test "$libpmem" = "yes"; then
 #include <stdlib.h>
 int main(int argc, char **argv)
 {
-  pmem_memcpy(NULL, NULL, NULL, NULL);
+  pmem_memcpy(NULL, NULL, 0, 0);
   return 0;
 }
 EOF
@@ -2392,7 +2397,7 @@ int main(int argc, char **argv)
   FILE *mtab = setmntent(NULL, "r");
   struct mntent *mnt = getmntent(mtab);
   endmntent(mtab);
-  return 0;
+  return mnt != NULL;
 }
 EOF
 if compile_prog "" "" "getmntent"; then
@@ -2573,6 +2578,10 @@ int main(int argc, char **argv)
 }
 EOF
 if test "$libzbc" != "no" ; then
+  if [ -e /usr/include/libzbc/libzbc ]; then
+    # SUSE Linux.
+    CFLAGS="$CFLAGS -I/usr/include/libzbc"
+  fi
   if compile_prog "" "-lzbc" "libzbc"; then
     libzbc="yes"
     if ! check_min_lib_version libzbc 5; then

* Recent changes (master)
@ 2022-06-01 12:00 Jens Axboe
From: Jens Axboe @ 2022-06-01 12:00 UTC
  To: fio

The following changes since commit e1aeff3ac96a51128b0493377f405e38bdc83500:

  Merge branch 'wip-lmy-rados' of https://github.com/liangmingyuanneo/fio (2022-05-29 09:32:18 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5ceed0be62f3ce8903d5747674f9f70f44e736d6:

  docs: update language setting for Sphinx build (2022-05-31 20:58:00 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      docs: update language setting for Sphinx build

 doc/conf.py | 7 -------
 1 file changed, 7 deletions(-)

---

Diff of recent changes:

diff --git a/doc/conf.py b/doc/conf.py
index 10b72ecb..844f951a 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -85,13 +85,6 @@ def fio_version():
 
 version, release = fio_version()
 
-# The language for content autogenerated by Sphinx. Refer to documentation
-# for a list of supported languages.
-#
-# This is also used if you do content translation via gettext catalogs.
-# Usually you set "language" from the command line for these cases.
-language = None
-
 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
 #

* Recent changes (master)
@ 2022-05-30 12:00 Jens Axboe
From: Jens Axboe @ 2022-05-30 12:00 UTC
  To: fio

The following changes since commit a2840331c3cae5b2b0a13f99e58ae18375e2e40d:

  Merge branch 'master' of https://github.com/guoanwu/fio (2022-05-25 06:30:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e1aeff3ac96a51128b0493377f405e38bdc83500:

  Merge branch 'wip-lmy-rados' of https://github.com/liangmingyuanneo/fio (2022-05-29 09:32:18 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'wip-lmy-rados' of https://github.com/liangmingyuanneo/fio

Vincent Fu (5):
      steadystate: delete incorrect comment
      configure: refer to zlib1g-dev package for zlib support
      HOWTO: add blank line for prettier formatting
      t/run-fio-tests: improve json data decoding
      docs: update discussion of huge page sizes

liangmingyuan (1):
      engines/ceph: add option for setting config file path

 HOWTO.rst          | 31 ++++++++++++++++++++-----------
 configure          |  2 +-
 engines/rados.c    | 13 ++++++++++++-
 examples/rados.fio |  1 +
 fio.1              | 23 ++++++++++++++---------
 steadystate.c      |  7 -------
 t/run-fio-tests.py | 20 +++++++-------------
 7 files changed, 55 insertions(+), 42 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 84bea5c5..8ab3ac4b 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1064,6 +1064,7 @@ Target file/device
 	thread/process.
 
 .. option:: ignore_zone_limits=bool
+
 	If this option is used, fio will ignore the maximum number of open
 	zones limit of the zoned block device in use, thus allowing the
 	option :option:`max_open_zones` value to be larger than the device
@@ -1822,13 +1823,14 @@ Buffers and memory
 	**mmaphuge** to work, the system must have free huge pages allocated. This
 	can normally be checked and set by reading/writing
 	:file:`/proc/sys/vm/nr_hugepages` on a Linux system. Fio assumes a huge page
-	is 4MiB in size. So to calculate the number of huge pages you need for a
-	given job file, add up the I/O depth of all jobs (normally one unless
-	:option:`iodepth` is used) and multiply by the maximum bs set. Then divide
-	that number by the huge page size. You can see the size of the huge pages in
-	:file:`/proc/meminfo`. If no huge pages are allocated by having a non-zero
-	number in `nr_hugepages`, using **mmaphuge** or **shmhuge** will fail. Also
-	see :option:`hugepage-size`.
+        is 2 or 4MiB in size depending on the platform. So to calculate the
+        number of huge pages you need for a given job file, add up the I/O
+        depth of all jobs (normally one unless :option:`iodepth` is used) and
+        multiply by the maximum bs set. Then divide that number by the huge
+        page size. You can see the size of the huge pages in
+        :file:`/proc/meminfo`. If no huge pages are allocated by having a
+        non-zero number in `nr_hugepages`, using **mmaphuge** or **shmhuge**
+        will fail. Also see :option:`hugepage-size`.
 
 	**mmaphuge** also needs to have hugetlbfs mounted and the file location
 	should point there. So if it's mounted in :file:`/huge`, you would use
@@ -1847,10 +1849,12 @@ Buffers and memory
 
 .. option:: hugepage-size=int
 
-	Defines the size of a huge page. Must at least be equal to the system
-	setting, see :file:`/proc/meminfo`. Defaults to 4MiB.  Should probably
-	always be a multiple of megabytes, so using ``hugepage-size=Xm`` is the
-	preferred way to set this to avoid setting a non-pow-2 bad value.
+        Defines the size of a huge page. Must at least be equal to the system
+        setting, see :file:`/proc/meminfo` and
+        :file:`/sys/kernel/mm/hugepages/`. Defaults to 2 or 4MiB depending on
+        the platform.  Should probably always be a multiple of megabytes, so
+        using ``hugepage-size=Xm`` is the preferred way to set this to avoid
+        setting a non-pow-2 bad value.
 
 .. option:: lockmem=int
 
@@ -2491,6 +2495,11 @@ with the caveat that when used on the command line, they must come after the
 	the full *type.id* string. If no type. prefix is given, fio will add
 	'client.' by default.
 
+.. option:: conf=str : [rados]
+
+    Specifies the configuration path of ceph cluster, so conf file does not
+    have to be /etc/ceph/ceph.conf.
+
 .. option:: busy_poll=bool : [rbd,rados]
 
         Poll store instead of waiting for completion. Usually this provides better
diff --git a/configure b/configure
index 95b60bb7..4ee536a0 100755
--- a/configure
+++ b/configure
@@ -3142,7 +3142,7 @@ if test "$libzbc" = "yes" ; then
   output_sym "CONFIG_LIBZBC"
 fi
 if test "$zlib" = "no" ; then
-  echo "Consider installing zlib-dev (zlib-devel, some fio features depend on it."
+  echo "Consider installing zlib1g-dev (zlib-devel) as some fio features depend on it."
   if test "$build_static" = "yes"; then
     echo "Note that some distros have separate packages for static libraries."
   fi
diff --git a/engines/rados.c b/engines/rados.c
index 976f9229..d0d15c5b 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -37,6 +37,7 @@ struct rados_options {
 	char *cluster_name;
 	char *pool_name;
 	char *client_name;
+	char *conf;
 	int busy_poll;
 	int touch_objects;
 };
@@ -69,6 +70,16 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group    = FIO_OPT_G_RBD,
 	},
+	{
+		.name     = "conf",
+		.lname    = "ceph configuration file path",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "Path of the ceph configuration file",
+		.off1     = offsetof(struct rados_options, conf),
+		.def      = "/etc/ceph/ceph.conf",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
 	{
 		.name     = "busy_poll",
 		.lname    = "busy poll mode",
@@ -184,7 +195,7 @@ static int _fio_rados_connect(struct thread_data *td)
 		goto failed_early;
 	}
 
-	r = rados_conf_read_file(rados->cluster, NULL);
+	r = rados_conf_read_file(rados->cluster, o->conf);
 	if (r < 0) {
 		log_err("rados_conf_read_file failed.\n");
 		goto failed_early;
diff --git a/examples/rados.fio b/examples/rados.fio
index 035cbff4..dd86f354 100644
--- a/examples/rados.fio
+++ b/examples/rados.fio
@@ -14,6 +14,7 @@
 ioengine=rados
 clientname=admin
 pool=rados
+conf=/etc/ceph/ceph.conf
 busy_poll=0
 rw=randwrite
 bs=4k
diff --git a/fio.1 b/fio.1
index ded7bbfc..bdba3142 100644
--- a/fio.1
+++ b/fio.1
@@ -1631,11 +1631,11 @@ multiplied by the I/O depth given. Note that for \fBshmhuge\fR and
 \fBmmaphuge\fR to work, the system must have free huge pages allocated. This
 can normally be checked and set by reading/writing
 `/proc/sys/vm/nr_hugepages' on a Linux system. Fio assumes a huge page
-is 4MiB in size. So to calculate the number of huge pages you need for a
-given job file, add up the I/O depth of all jobs (normally one unless
-\fBiodepth\fR is used) and multiply by the maximum bs set. Then divide
-that number by the huge page size. You can see the size of the huge pages in
-`/proc/meminfo'. If no huge pages are allocated by having a non-zero
+is 2 or 4MiB in size depending on the platform. So to calculate the number of
+huge pages you need for a given job file, add up the I/O depth of all jobs
+(normally one unless \fBiodepth\fR is used) and multiply by the maximum bs set.
+Then divide that number by the huge page size. You can see the size of the huge
+pages in `/proc/meminfo'. If no huge pages are allocated by having a non-zero
 number in `nr_hugepages', using \fBmmaphuge\fR or \fBshmhuge\fR will fail. Also
 see \fBhugepage\-size\fR.
 .P
@@ -1655,10 +1655,11 @@ of subsequent I/O memory buffers is the sum of the \fBiomem_align\fR and
 \fBbs\fR used.
 .TP
 .BI hugepage\-size \fR=\fPint
-Defines the size of a huge page. Must at least be equal to the system
-setting, see `/proc/meminfo'. Defaults to 4MiB. Should probably
-always be a multiple of megabytes, so using `hugepage\-size=Xm' is the
-preferred way to set this to avoid setting a non-pow-2 bad value.
+Defines the size of a huge page. Must at least be equal to the system setting,
+see `/proc/meminfo' and `/sys/kernel/mm/hugepages/'. Defaults to 2 or 4MiB
+depending on the platform. Should probably always be a multiple of megabytes,
+so using `hugepage\-size=Xm' is the preferred way to set this to avoid setting
+a non-pow-2 bad value.
 .TP
 .BI lockmem \fR=\fPint
 Pin the specified amount of memory with \fBmlock\fR\|(2). Can be used to
@@ -2243,6 +2244,10 @@ Ceph cluster. If the \fBclustername\fR is specified, the \fBclientname\fR shall
 the full *type.id* string. If no type. prefix is given, fio will add 'client.'
 by default.
 .TP
+.BI (rados)conf \fR=\fPstr
+Specifies the configuration path of ceph cluster, so conf file does not
+have to be /etc/ceph/ceph.conf.
+.TP
 .BI (rbd,rados)busy_poll \fR=\fPbool
 Poll store instead of waiting for completion. Usually this provides better
 throughput at cost of higher(up to 100%) CPU utilization.
diff --git a/steadystate.c b/steadystate.c
index 2e3da1db..ad19318c 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -250,13 +250,6 @@ int steadystate_check(void)
 		rate_time = mtime_since(&ss->prev_time, &now);
 		memcpy(&ss->prev_time, &now, sizeof(now));
 
-		/*
-		 * Begin monitoring when job starts but don't actually use
-		 * data in checking stopping criterion until ss->ramp_time is
-		 * over. This ensures that we will have a sane value in
-		 * prev_iops/bw the first time through after ss->ramp_time
-		 * is done.
-		 */
 		if (ss->state & FIO_SS_RAMP_OVER) {
 			group_bw += 1000 * (td_bytes - ss->prev_bytes) / rate_time;
 			group_iops += 1000 * (td_iops - ss->prev_iops) / rate_time;
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index ecceb67e..32cdbc19 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -311,21 +311,15 @@ class FioJobTest(FioExeTest):
         #
         # Sometimes fio informational messages are included at the top of the
         # JSON output, especially under Windows. Try to decode output as JSON
-        # data, lopping off up to the first four lines
+        # data, skipping everything until the first {
         #
         lines = file_data.splitlines()
-        for i in range(5):
-            file_data = '\n'.join(lines[i:])
-            try:
-                self.json_data = json.loads(file_data)
-            except json.JSONDecodeError:
-                continue
-            else:
-                logging.debug("Test %d: skipped %d lines decoding JSON data", self.testnum, i)
-                return
-
-        self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
-        self.passed = False
+        file_data = '\n'.join(lines[lines.index("{"):])
+        try:
+            self.json_data = json.loads(file_data)
+        except json.JSONDecodeError:
+            self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
+            self.passed = False
 
 
 class FioJobTest_t0005(FioJobTest):

* Recent changes (master)
@ 2022-05-26 12:00 Jens Axboe
From: Jens Axboe @ 2022-05-26 12:00 UTC
  To: fio

The following changes since commit 6f1a24593c227a4f392f454698aca20e95f0006c:

  Makefile: Suppress `-Wimplicit-fallthrough` when compiling `lex.yy` (2022-05-12 11:02:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a2840331c3cae5b2b0a13f99e58ae18375e2e40d:

  Merge branch 'master' of https://github.com/guoanwu/fio (2022-05-25 06:30:06 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/guoanwu/fio

dennis.wu (1):
      pmemblk.c: fix one logic bug - read always with write

 engines/pmemblk.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index fc6358e8..849d8a15 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -375,10 +375,11 @@ static enum fio_q_status fio_pmemblk_queue(struct thread_data *td,
 		off /= pmb->pmb_bsize;
 		len /= pmb->pmb_bsize;
 		while (0 < len) {
-			if (io_u->ddir == DDIR_READ &&
-			   0 != pmemblk_read(pmb->pmb_pool, buf, off)) {
-				io_u->error = errno;
-				break;
+			if (io_u->ddir == DDIR_READ) {
+				if (0 != pmemblk_read(pmb->pmb_pool, buf, off)) {
+					io_u->error = errno;
+					break;
+				}
 			} else if (0 != pmemblk_write(pmb->pmb_pool, buf, off)) {
 				io_u->error = errno;
 				break;

* Recent changes (master)
@ 2022-05-13 12:00 Jens Axboe
From: Jens Axboe @ 2022-05-13 12:00 UTC
  To: fio

The following changes since commit 12db6deb8b767ac89dd73e34dbc6f06905441e07:

  Merge branch 'patch-1' of https://github.com/ferdnyc/fio (2022-05-01 07:29:05 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6f1a24593c227a4f392f454698aca20e95f0006c:

  Makefile: Suppress `-Wimplicit-fallthrough` when compiling `lex.yy` (2022-05-12 11:02:55 -0600)

----------------------------------------------------------------
Ammar Faizi (2):
      backend: Fix indentation
      Makefile: Suppress `-Wimplicit-fallthrough` when compiling `lex.yy`

Ankit Kumar (3):
      engines/xnvme: add xnvme engine
      docs: documentation for xnvme ioengine
      examples: add example job file for xnvme engine usage

 HOWTO.rst                  |  55 ++-
 Makefile                   |  13 +-
 backend.c                  |   2 +-
 configure                  |  22 +
 engines/xnvme.c            | 981 +++++++++++++++++++++++++++++++++++++++++++++
 examples/xnvme-compare.fio |  72 ++++
 examples/xnvme-zoned.fio   |  87 ++++
 fio.1                      |  70 +++-
 optgroup.h                 |   2 +
 options.c                  |   5 +
 10 files changed, 1302 insertions(+), 7 deletions(-)
 create mode 100644 engines/xnvme.c
 create mode 100644 examples/xnvme-compare.fio
 create mode 100644 examples/xnvme-zoned.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 6a3e09f5..84bea5c5 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -2171,6 +2171,12 @@ I/O engine
 		**exec**
 			Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
 
+		**xnvme**
+			I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine can
+			access the GNU/Linux kernel NVMe driver via libaio, IOCTLs or io_uring,
+			the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine has
+			engine-specific options (see https://xnvme.io).
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2260,7 +2266,7 @@ with the caveat that when used on the command line, they must come after the
 	making the submission and completion part more lightweight. Required
 	for the below :option:`sqthread_poll` option.
 
-.. option:: sqthread_poll : [io_uring]
+.. option:: sqthread_poll : [io_uring] [xnvme]
 
 	Normally fio will submit IO by issuing a system call to notify the
 	kernel of available items in the SQ ring. If this option is set, the
@@ -2275,7 +2281,7 @@ with the caveat that when used on the command line, they must come after the
 
 .. option:: hipri
 
-   [io_uring]
+   [io_uring], [xnvme]
 
         If this option is set, fio will attempt to use polled IO completions.
         Normal IO completions generate interrupts to signal the completion of
@@ -2725,6 +2731,51 @@ with the caveat that when used on the command line, they must come after the
 
 	If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
 
+.. option:: xnvme_async=str : [xnvme]
+
+	Select the xnvme asynchronous command interface. This can take these values:
+
+	**emu**
+		This is the default and is used to emulate asynchronous I/O.
+	**thrpool**
+		Use a thread pool for asynchronous I/O.
+	**io_uring**
+		Use Linux io_uring/liburing for asynchronous I/O.
+	**libaio**
+		Use Linux aio for asynchronous I/O.
+	**posix**
+		Use POSIX aio for asynchronous I/O.
+	**nil**
+		Use nil-io; for introspective performance evaluation.
+
+.. option:: xnvme_sync=str : [xnvme]
+
+	Select the xnvme synchronous command interface. This can take these values:
+
+	**nvme**
+		This is the default and uses the Linux NVMe driver ioctl() for synchronous I/O.
+	**psync**
+		Use pread()/pwrite() for synchronous I/O.
+
+.. option:: xnvme_admin=str : [xnvme]
+
+	Select the xnvme admin command interface. This can take these values:
+
+	**nvme**
+		This is the default and uses the Linux NVMe driver ioctl() for admin commands.
+	**block**
+		Use the Linux block layer ioctl() and sysfs for admin commands.
+	**file_as_ns**
+		Use file-stat to construct NVMe identify responses.
+
+.. option:: xnvme_dev_nsid=int : [xnvme]
+
+	xnvme namespace identifier for the userspace NVMe driver, such as SPDK.
+
+.. option:: xnvme_iovec=int : [xnvme]
+
+	If this option is set, xnvme will use vectored read/write commands.
+
 I/O depth
 ~~~~~~~~~
 
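The engine forwards the xnvme_async, xnvme_sync and xnvme_admin strings to xNVMe unchanged (xnvme_opts_from_fioe() copies them into struct xnvme_opts). Any validation is therefore left to xNVMe when the device is opened. The C below is a minimal sketch of a pre-flight check against the value sets documented above. The array and function names are hypothetical and are not part of fio or xNVMe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values listed in HOWTO.rst for each xNVMe interface option (NULL-terminated). */
static const char *const xnvme_async_values[] = {
	"emu", "thrpool", "io_uring", "libaio", "posix", "nil", NULL
};
static const char *const xnvme_sync_values[] = { "nvme", "psync", NULL };
static const char *const xnvme_admin_values[] = { "nvme", "block", "file_as_ns", NULL };

/* Return true if value matches an entry of the NULL-terminated list. */
static bool value_in_list(const char *value, const char *const *list)
{
	assert(value != NULL && list != NULL);
	for (; *list != NULL; list++) {
		if (strcmp(value, *list) == 0)
			return true;
	}
	return false;
}

/*
 * Hypothetical pre-flight check (not part of fio or xNVMe). A NULL argument
 * means the option was not set, in which case xNVMe picks its own default.
 */
static bool xnvme_iface_opts_documented(const char *async, const char *sync,
					const char *admin)
{
	return (async == NULL || value_in_list(async, xnvme_async_values)) &&
	       (sync == NULL || value_in_list(sync, xnvme_sync_values)) &&
	       (admin == NULL || value_in_list(admin, xnvme_admin_values));
}
```

A caller could run this on the parsed option fields before xnvme_dev_open(). The set of values xNVMe actually accepts may still depend on its version and backend.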
diff --git a/Makefile b/Makefile
index e670c1f2..ed66305a 100644
--- a/Makefile
+++ b/Makefile
@@ -223,7 +223,12 @@ ifdef CONFIG_LIBZBC
   libzbc_LIBS = -lzbc
   ENGINES += libzbc
 endif
-
+ifdef CONFIG_LIBXNVME
+  xnvme_SRCS = engines/xnvme.c
+  xnvme_LIBS = $(LIBXNVME_LIBS)
+  xnvme_CFLAGS = $(LIBXNVME_CFLAGS)
+  ENGINES += xnvme
+endif
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c
@@ -530,8 +535,12 @@ else
 	$(QUIET_LEX)$(LEX) $<
 endif
 
+ifneq (,$(findstring -Wimplicit-fallthrough,$(CFLAGS)))
+LEX_YY_CFLAGS := -Wno-implicit-fallthrough
+endif
+
 lex.yy.o: lex.yy.c y.tab.h
-	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
+	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) $(LEX_YY_CFLAGS) -c $<
 
 y.tab.o: y.tab.c y.tab.h
 	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
diff --git a/backend.c b/backend.c
index ffbb7e2a..e5bb4e25 100644
--- a/backend.c
+++ b/backend.c
@@ -2021,7 +2021,7 @@ static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
 	for_each_td(td, i) {
 		int flags = 0;
 
-		 if (!strcmp(td->o.ioengine, "cpuio"))
+		if (!strcmp(td->o.ioengine, "cpuio"))
 			cputhreads++;
 		else
 			realthreads++;
diff --git a/configure b/configure
index d327d2ca..95b60bb7 100755
--- a/configure
+++ b/configure
@@ -171,6 +171,7 @@ march_set="no"
 libiscsi="no"
 libnbd="no"
 libnfs="no"
+xnvme="no"
 libzbc=""
 dfs=""
 dynamic_engines="no"
@@ -240,6 +241,8 @@ for opt do
   ;;
   --disable-libzbc) libzbc="no"
   ;;
+  --enable-xnvme) xnvme="yes"
+  ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
   --disable-nfs) disable_nfs="yes"
@@ -291,6 +294,7 @@ if test "$show_help" = "yes" ; then
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
+  echo "--enable-xnvme          Enable xnvme support"
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc      Disable tcmalloc support"
   echo "--dynamic-libengines    Lib-based ioengines as dynamic libraries"
@@ -2583,6 +2587,19 @@ if test "$libzbc" != "no" ; then
 fi
 print_config "libzbc engine" "$libzbc"
 
+##########################################
+# Check if we have xnvme
+if test "$xnvme" != "yes" ; then
+  if check_min_lib_version xnvme 0.2.0; then
+    xnvme="yes"
+    xnvme_cflags=$(pkg-config --cflags xnvme)
+    xnvme_libs=$(pkg-config --libs xnvme)
+  else
+    xnvme="no"
+  fi
+fi
+print_config "xnvme engine" "$xnvme"
+
 ##########################################
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
@@ -3190,6 +3207,11 @@ if test "$libnfs" = "yes" ; then
   echo "LIBNFS_CFLAGS=$libnfs_cflags" >> $config_host_mak
   echo "LIBNFS_LIBS=$libnfs_libs" >> $config_host_mak
 fi
+if test "$xnvme" = "yes" ; then
+  output_sym "CONFIG_LIBXNVME"
+  echo "LIBXNVME_CFLAGS=$xnvme_cflags" >> $config_host_mak
+  echo "LIBXNVME_LIBS=$xnvme_libs" >> $config_host_mak
+fi
 if test "$dynamic_engines" = "yes" ; then
   output_sym "CONFIG_DYNAMIC_ENGINES"
 fi
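The engines/xnvme.c diff that follows uses XNVME_STATIC_ASSERT and explicit _pad members to pin two structs to exactly 64 bytes: struct xnvme_fioe_fwrap, and the fixed header of struct xnvme_fioe_data. The patch does not say why 64 bytes was chosen. The sketch below mirrors both field lists with plain C11 static_assert to show where the padding lands. It assumes an LP64 target (8-byte pointers), and the *_mirror type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins; the real structs only store pointers to these types. */
struct fio_file;
struct xnvme_dev;
struct xnvme_geo;
struct xnvme_queue;
struct io_u;
struct iovec;

/* Mirror of struct xnvme_fioe_fwrap from engines/xnvme.c. */
struct fwrap_mirror {
	struct fio_file *fio_file;   /* offset  0 */
	struct xnvme_dev *dev;       /* offset  8 */
	const struct xnvme_geo *geo; /* offset 16 */
	struct xnvme_queue *queue;   /* offset 24 */
	uint32_t ssw;                /* offset 32 */
	uint32_t lba_nbytes;         /* offset 36 */
	uint8_t _pad[24];            /* offset 40, pads the struct to 64 */
};

/* Mirror of struct xnvme_fioe_data; the flexible array adds nothing to sizeof. */
struct data_mirror {
	struct io_u **iocq;          /* offset  0 */
	uint64_t completed;          /* offset  8 */
	uint64_t ecount;             /* offset 16 */
	int32_t prev;                /* offset 24 */
	int32_t cur;                 /* offset 28 */
	int64_t nopen;               /* offset 32 */
	uint64_t nallocated;         /* offset 40 */
	struct iovec *iovec;         /* offset 48 */
	uint8_t _pad[8];             /* offset 56 */
	struct fwrap_mirror files[]; /* offset 64 */
};

/* Hold on LP64 (8-byte pointers), which is the assumption of this sketch. */
static_assert(sizeof(struct fwrap_mirror) == 64, "fwrap must be 64 bytes");
static_assert(offsetof(struct fwrap_mirror, _pad) == 40, "24 bytes of padding");
static_assert(sizeof(struct data_mirror) == 64, "data header must be 64 bytes");
static_assert(offsetof(struct data_mirror, files) == 64, "files[] follows the header");
```

Because the header is exactly 64 bytes and each files[] entry is also 64 bytes, the engine's calloc(1, sizeof(*xd) + sizeof(*xd->files) * nr_files) allocates a whole number of 64-byte units.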
diff --git a/engines/xnvme.c b/engines/xnvme.c
new file mode 100644
index 00000000..c11b33a8
--- /dev/null
+++ b/engines/xnvme.c
@@ -0,0 +1,981 @@
+/*
+ * fio xNVMe IO Engine
+ *
+ * IO engine using the xNVMe C API.
+ *
+ * See: http://xnvme.io/
+ *
+ * SPDX-License-Identifier: Apache-2.0
+ */
+#include <stdlib.h>
+#include <assert.h>
+#include <libxnvme.h>
+#include <libxnvme_libconf.h>
+#include <libxnvme_nvm.h>
+#include <libxnvme_znd.h>
+#include <libxnvme_spec_fs.h>
+#include "fio.h"
+#include "zbd_types.h"
+#include "optgroup.h"
+
+static pthread_mutex_t g_serialize = PTHREAD_MUTEX_INITIALIZER;
+
+struct xnvme_fioe_fwrap {
+	/* fio file representation */
+	struct fio_file *fio_file;
+
+	/* xNVMe device handle */
+	struct xnvme_dev *dev;
+	/* xNVMe device geometry */
+	const struct xnvme_geo *geo;
+
+	struct xnvme_queue *queue;
+
+	uint32_t ssw;
+	uint32_t lba_nbytes;
+
+	uint8_t _pad[24];
+};
+XNVME_STATIC_ASSERT(sizeof(struct xnvme_fioe_fwrap) == 64, "Incorrect size")
+
+struct xnvme_fioe_data {
+	/* I/O completion queue */
+	struct io_u **iocq;
+
+	/* # of iocq entries; incremented via getevents()/cb_pool() */
+	uint64_t completed;
+
+	/*
+	 *  # of errors; incremented when observed on completion via
+	 *  getevents()/cb_pool()
+	 */
+	uint64_t ecount;
+
+	/* Controls which device/file to select */
+	int32_t prev;
+	int32_t cur;
+
+	/* Number of devices/files for which open() has been called */
+	int64_t nopen;
+	/* Number of devices/files allocated in files[] */
+	uint64_t nallocated;
+
+	struct iovec *iovec;
+
+	uint8_t _pad[8];
+
+	struct xnvme_fioe_fwrap files[];
+};
+XNVME_STATIC_ASSERT(sizeof(struct xnvme_fioe_data) == 64, "Incorrect size")
+
+struct xnvme_fioe_options {
+	void *padding;
+	unsigned int hipri;
+	unsigned int sqpoll_thread;
+	unsigned int xnvme_dev_nsid;
+	unsigned int xnvme_iovec;
+	char *xnvme_be;
+	char *xnvme_async;
+	char *xnvme_sync;
+	char *xnvme_admin;
+};
+
+static struct fio_option options[] = {
+	{
+		.name = "hipri",
+		.lname = "High Priority",
+		.type = FIO_OPT_STR_SET,
+		.off1 = offsetof(struct xnvme_fioe_options, hipri),
+		.help = "Use polled IO completions",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "sqthread_poll",
+		.lname = "Kernel SQ thread polling",
+		.type = FIO_OPT_STR_SET,
+		.off1 = offsetof(struct xnvme_fioe_options, sqpoll_thread),
+		.help = "Offload submission/completion to kernel thread",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_be",
+		.lname = "xNVMe Backend",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_be),
+		.help = "Select xNVMe backend [spdk,linux,fbsd]",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_async",
+		.lname = "xNVMe Asynchronous command-interface",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_async),
+		.help = "Select xNVMe async. interface: [emu,thrpool,io_uring,libaio,posix,nil]",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_sync",
+		.lname = "xNVMe Synchronous. command-interface",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_sync),
+		.help = "Select xNVMe sync. interface: [nvme,psync]",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_admin",
+		.lname = "xNVMe Admin command-interface",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_admin),
+		.help = "Select xNVMe admin. cmd-interface: [nvme,block,file_as_ns]",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_dev_nsid",
+		.lname = "xNVMe Namespace-Identifier, for user-space NVMe driver",
+		.type = FIO_OPT_INT,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_dev_nsid),
+		.help = "xNVMe Namespace-Identifier, for user-space NVMe driver",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+	{
+		.name = "xnvme_iovec",
+		.lname = "Vectored IOs",
+		.type = FIO_OPT_STR_SET,
+		.off1 = offsetof(struct xnvme_fioe_options, xnvme_iovec),
+		.help = "Send vectored IOs",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_XNVME,
+	},
+
+	{
+		.name = NULL,
+	},
+};
+
+static void cb_pool(struct xnvme_cmd_ctx *ctx, void *cb_arg)
+{
+	struct io_u *io_u = cb_arg;
+	struct xnvme_fioe_data *xd = io_u->mmap_data;
+
+	if (xnvme_cmd_ctx_cpl_status(ctx)) {
+		xnvme_cmd_ctx_pr(ctx, XNVME_PR_DEF);
+		xd->ecount += 1;
+		io_u->error = EIO;
+	}
+
+	xd->iocq[xd->completed++] = io_u;
+	xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
+}
+
+static struct xnvme_opts xnvme_opts_from_fioe(struct thread_data *td)
+{
+	struct xnvme_fioe_options *o = td->eo;
+	struct xnvme_opts opts = xnvme_opts_default();
+
+	opts.nsid = o->xnvme_dev_nsid;
+	opts.be = o->xnvme_be;
+	opts.async = o->xnvme_async;
+	opts.sync = o->xnvme_sync;
+	opts.admin = o->xnvme_admin;
+
+	opts.poll_io = o->hipri;
+	opts.poll_sq = o->sqpoll_thread;
+
+	opts.direct = td->o.odirect;
+
+	return opts;
+}
+
+static void _dev_close(struct thread_data *td, struct xnvme_fioe_fwrap *fwrap)
+{
+	if (fwrap->dev)
+		xnvme_queue_term(fwrap->queue);
+
+	xnvme_dev_close(fwrap->dev);
+
+	memset(fwrap, 0, sizeof(*fwrap));
+}
+
+static void xnvme_fioe_cleanup(struct thread_data *td)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	int err;
+
+	err = pthread_mutex_lock(&g_serialize);
+	if (err)
+		log_err("ioeng->cleanup(): pthread_mutex_lock(), err(%d)\n", err);
+		/* NOTE: not returning here */
+
+	for (uint64_t i = 0; i < xd->nallocated; ++i)
+		_dev_close(td, &xd->files[i]);
+
+	if (!err) {
+		err = pthread_mutex_unlock(&g_serialize);
+		if (err)
+			log_err("ioeng->cleanup(): pthread_mutex_unlock(), err(%d)\n", err);
+	}
+
+	free(xd->iocq);
+	free(xd->iovec);
+	free(xd);
+	td->io_ops_data = NULL;
+}
+
+/**
+ * Helper function setting up device handles as addressed by the naming
+ * convention of the given `fio_file` filename.
+ *
+ * Checks thread-options for explicit control of asynchronous implementation via
+ * the ``--xnvme_async={thrpool,emu,posix,io_uring,libaio,nil}``.
+ */
+static int _dev_open(struct thread_data *td, struct fio_file *f)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap;
+	int flags = 0;
+	int err;
+
+	if (f->fileno > (int)xd->nallocated) {
+		log_err("ioeng->_dev_open(%s): invalid assumption\n", f->file_name);
+		return 1;
+	}
+
+	fwrap = &xd->files[f->fileno];
+
+	err = pthread_mutex_lock(&g_serialize);
+	if (err) {
+		log_err("ioeng->_dev_open(%s): pthread_mutex_lock(), err(%d)\n", f->file_name,
+			err);
+		return -err;
+	}
+
+	fwrap->dev = xnvme_dev_open(f->file_name, &opts);
+	if (!fwrap->dev) {
+		log_err("ioeng->_dev_open(%s): xnvme_dev_open(), err(%d)\n", f->file_name, errno);
+		goto failure;
+	}
+	fwrap->geo = xnvme_dev_get_geo(fwrap->dev);
+
+	if (xnvme_queue_init(fwrap->dev, td->o.iodepth, flags, &(fwrap->queue))) {
+		log_err("ioeng->_dev_open(%s): xnvme_queue_init(), err(?)\n", f->file_name);
+		goto failure;
+	}
+	xnvme_queue_set_cb(fwrap->queue, cb_pool, NULL);
+
+	fwrap->ssw = xnvme_dev_get_ssw(fwrap->dev);
+	fwrap->lba_nbytes = fwrap->geo->lba_nbytes;
+
+	fwrap->fio_file = f;
+	fwrap->fio_file->filetype = FIO_TYPE_BLOCK;
+	fwrap->fio_file->real_file_size = fwrap->geo->tbytes;
+	fio_file_set_size_known(fwrap->fio_file);
+
+	err = pthread_mutex_unlock(&g_serialize);
+	if (err)
+		log_err("ioeng->_dev_open(%s): pthread_mutex_unlock(), err(%d)\n", f->file_name,
+			err);
+
+	return 0;
+
+failure:
+	xnvme_queue_term(fwrap->queue);
+	xnvme_dev_close(fwrap->dev);
+
+	err = pthread_mutex_unlock(&g_serialize);
+	if (err)
+		log_err("ioeng->_dev_open(%s): pthread_mutex_unlock(), err(%d)\n", f->file_name,
+			err);
+
+	return 1;
+}
+
+static int xnvme_fioe_init(struct thread_data *td)
+{
+	struct xnvme_fioe_data *xd = NULL;
+	struct fio_file *f;
+	unsigned int i;
+
+	if (!td->o.use_thread) {
+		log_err("ioeng->init(): --thread=1 is required\n");
+		return 1;
+	}
+
+	/* Allocate xd and iocq */
+	xd = calloc(1, sizeof(*xd) + sizeof(*xd->files) * td->o.nr_files);
+	if (!xd) {
+		log_err("ioeng->init(): !calloc(), err(%d)\n", errno);
+		return 1;
+	}
+
+	xd->iocq = calloc(td->o.iodepth, sizeof(struct io_u *));
+	if (!xd->iocq) {
+		log_err("ioeng->init(): !calloc(), err(%d)\n", errno);
+		return 1;
+	}
+
+	xd->iovec = calloc(td->o.iodepth, sizeof(*xd->iovec));
+	if (!xd->iovec) {
+		log_err("ioeng->init(): !calloc(xd->iovec), err(%d)\n", errno);
+		return 1;
+	}
+
+	xd->prev = -1;
+	td->io_ops_data = xd;
+
+	for_each_file(td, f, i)
+	{
+		if (_dev_open(td, f)) {
+			log_err("ioeng->init(): failed; _dev_open(%s)\n", f->file_name);
+			return 1;
+		}
+
+		++(xd->nallocated);
+	}
+
+	if (xd->nallocated != td->o.nr_files) {
+		log_err("ioeng->init(): failed; nallocated != td->o.nr_files\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/* NOTE: using the first device for buffer-allocators */
+static int xnvme_fioe_iomem_alloc(struct thread_data *td, size_t total_mem)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap = &xd->files[0];
+
+	if (!fwrap->dev) {
+		log_err("ioeng->iomem_alloc(): failed; no dev-handle\n");
+		return 1;
+	}
+
+	td->orig_buffer = xnvme_buf_alloc(fwrap->dev, total_mem);
+
+	return td->orig_buffer == NULL;
+}
+
+/* NOTE: using the first device for buffer-allocators */
+static void xnvme_fioe_iomem_free(struct thread_data *td)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap = &xd->files[0];
+
+	if (!fwrap->dev) {
+		log_err("ioeng->iomem_free(): failed no dev-handle\n");
+		return;
+	}
+
+	xnvme_buf_free(fwrap->dev, td->orig_buffer);
+}
+
+static int xnvme_fioe_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	io_u->mmap_data = td->io_ops_data;
+
+	return 0;
+}
+
+static void xnvme_fioe_io_u_free(struct thread_data *td, struct io_u *io_u)
+{
+	io_u->mmap_data = NULL;
+}
+
+static struct io_u *xnvme_fioe_event(struct thread_data *td, int event)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+
+	assert(event >= 0);
+	assert((unsigned)event < xd->completed);
+
+	return xd->iocq[event];
+}
+
+static int xnvme_fioe_getevents(struct thread_data *td, unsigned int min, unsigned int max,
+				const struct timespec *t)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap = NULL;
+	int nfiles = xd->nallocated;
+	int err = 0;
+
+	if (xd->prev != -1 && ++xd->prev < nfiles) {
+		fwrap = &xd->files[xd->prev];
+		xd->cur = xd->prev;
+	}
+
+	xd->completed = 0;
+	for (;;) {
+		if (fwrap == NULL || xd->cur == nfiles) {
+			fwrap = &xd->files[0];
+			xd->cur = 0;
+		}
+
+		while (fwrap != NULL && xd->cur < nfiles && err >= 0) {
+			err = xnvme_queue_poke(fwrap->queue, max - xd->completed);
+			if (err < 0) {
+				switch (err) {
+				case -EBUSY:
+				case -EAGAIN:
+					usleep(1);
+					break;
+
+				default:
+					log_err("ioeng->getevents(): unhandled IO error\n");
+					assert(false);
+					return 0;
+				}
+			}
+			if (xd->completed >= min) {
+				xd->prev = xd->cur;
+				return xd->completed;
+			}
+			xd->cur++;
+			fwrap = &xd->files[xd->cur];
+
+			if (err < 0) {
+				switch (err) {
+				case -EBUSY:
+				case -EAGAIN:
+					usleep(1);
+					break;
+				}
+			}
+		}
+	}
+
+	xd->cur = 0;
+
+	return xd->completed;
+}
+
+static enum fio_q_status xnvme_fioe_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+	struct xnvme_fioe_fwrap *fwrap;
+	struct xnvme_cmd_ctx *ctx;
+	uint32_t nsid;
+	uint64_t slba;
+	uint16_t nlb;
+	int err;
+	bool vectored_io = ((struct xnvme_fioe_options *)td->eo)->xnvme_iovec;
+
+	fio_ro_check(td, io_u);
+
+	fwrap = &xd->files[io_u->file->fileno];
+	nsid = xnvme_dev_get_nsid(fwrap->dev);
+
+	slba = io_u->offset >> fwrap->ssw;
+	nlb = (io_u->xfer_buflen >> fwrap->ssw) - 1;
+
+	ctx = xnvme_queue_get_cmd_ctx(fwrap->queue);
+	ctx->async.cb_arg = io_u;
+
+	ctx->cmd.common.nsid = nsid;
+	ctx->cmd.nvm.slba = slba;
+	ctx->cmd.nvm.nlb = nlb;
+
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		ctx->cmd.common.opcode = XNVME_SPEC_NVM_OPC_READ;
+		break;
+
+	case DDIR_WRITE:
+		ctx->cmd.common.opcode = XNVME_SPEC_NVM_OPC_WRITE;
+		break;
+
+	default:
+		log_err("ioeng->queue(): ENOSYS: %u\n", io_u->ddir);
+		err = -1;
+		assert(false);
+		break;
+	}
+
+	if (vectored_io) {
+		xd->iovec[io_u->index].iov_base = io_u->xfer_buf;
+		xd->iovec[io_u->index].iov_len = io_u->xfer_buflen;
+
+		err = xnvme_cmd_passv(ctx, &xd->iovec[io_u->index], 1, io_u->xfer_buflen, NULL, 0,
+				      0);
+	} else {
+		err = xnvme_cmd_pass(ctx, io_u->xfer_buf, io_u->xfer_buflen, NULL, 0);
+	}
+	switch (err) {
+	case 0:
+		return FIO_Q_QUEUED;
+
+	case -EBUSY:
+	case -EAGAIN:
+		xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
+		return FIO_Q_BUSY;
+
+	default:
+		log_err("ioeng->queue(): err: '%d'\n", err);
+
+		xnvme_queue_put_cmd_ctx(ctx->async.queue, ctx);
+
+		io_u->error = abs(err);
+		assert(false);
+		return FIO_Q_COMPLETED;
+	}
+}
+
+static int xnvme_fioe_close(struct thread_data *td, struct fio_file *f)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+
+	dprint(FD_FILE, "xnvme close %s -- nopen: %ld\n", f->file_name, xd->nopen);
+
+	--(xd->nopen);
+
+	return 0;
+}
+
+static int xnvme_fioe_open(struct thread_data *td, struct fio_file *f)
+{
+	struct xnvme_fioe_data *xd = td->io_ops_data;
+
+	dprint(FD_FILE, "xnvme open %s -- nopen: %ld\n", f->file_name, xd->nopen);
+
+	if (f->fileno > (int)xd->nallocated) {
+		log_err("ioeng->open(): f->fileno > xd->nallocated; invalid assumption\n");
+		return 1;
+	}
+	if (xd->files[f->fileno].fio_file != f) {
+		log_err("ioeng->open(): fio_file != f; invalid assumption\n");
+		return 1;
+	}
+
+	++(xd->nopen);
+
+	return 0;
+}
+
+static int xnvme_fioe_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	/* Consider only doing this with be:spdk */
+	return 0;
+}
+
+static int xnvme_fioe_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+					 unsigned int *max_open_zones)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_dev *dev;
+	const struct xnvme_spec_znd_idfy_ns *zns;
+	int err = 0, err_lock;
+
+	if (f->filetype != FIO_TYPE_FILE && f->filetype != FIO_TYPE_BLOCK &&
+	    f->filetype != FIO_TYPE_CHAR) {
+		log_info("ioeng->get_max_open_zones(): ignoring filetype: %d\n", f->filetype);
+		return 0;
+	}
+	err_lock = pthread_mutex_lock(&g_serialize);
+	if (err_lock) {
+		log_err("ioeng->get_max_open_zones(): pthread_mutex_lock(), err(%d)\n", err_lock);
+		return -err_lock;
+	}
+
+	dev = xnvme_dev_open(f->file_name, &opts);
+	if (!dev) {
+		log_err("ioeng->get_max_open_zones(): xnvme_dev_open(), err(%d)\n", errno);
+		err = -errno;
+		goto exit;
+	}
+	if (xnvme_dev_get_geo(dev)->type != XNVME_GEO_ZONED) {
+		errno = EINVAL;
+		err = -errno;
+		goto exit;
+	}
+
+	zns = (void *)xnvme_dev_get_ns_css(dev);
+	if (!zns) {
+		log_err("ioeng->get_max_open_zones(): xnvme_dev_get_ns_css(), err(%d)\n", errno);
+		err = -errno;
+		goto exit;
+	}
+
+	/*
+	 * Intentional overflow: the value is zero-based, and NVMe defines
+	 * 0xFFFFFFFF as unlimited, so adding one wraps to 0, which is how fio
+	 * indicates unlimited; any other value is simply converted to
+	 * one-based.
+	 */
+	*max_open_zones = zns->mor + 1;
+
+exit:
+	xnvme_dev_close(dev);
+	err_lock = pthread_mutex_unlock(&g_serialize);
+	if (err_lock)
+		log_err("ioeng->get_max_open_zones(): pthread_mutex_unlock(), err(%d)\n",
+			err_lock);
+
+	return err;
+}
+
+/**
+ * Currently, this function is called before I/O engine initialization, so
+ * we cannot consult the file-wrapping done when 'fioe' initializes.
+ * Instead we just open based on the given filename.
+ *
+ * TODO: unify the different setup methods, consider keeping the handle around,
+ * and consider how to support the --be option in this use case
+ */
+static int xnvme_fioe_get_zoned_model(struct thread_data *td, struct fio_file *f,
+				      enum zbd_zoned_model *model)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_dev *dev;
+	int err = 0, err_lock;
+
+	if (f->filetype != FIO_TYPE_FILE && f->filetype != FIO_TYPE_BLOCK &&
+	    f->filetype != FIO_TYPE_CHAR) {
+		log_info("ioeng->get_zoned_model(): ignoring filetype: %d\n", f->filetype);
+		return -EINVAL;
+	}
+
+	err = pthread_mutex_lock(&g_serialize);
+	if (err) {
+		log_err("ioeng->get_zoned_model(): pthread_mutex_lock(), err(%d)\n", err);
+		return -err;
+	}
+
+	dev = xnvme_dev_open(f->file_name, &opts);
+	if (!dev) {
+		log_err("ioeng->get_zoned_model(): xnvme_dev_open(%s) failed, errno: %d\n",
+			f->file_name, errno);
+		err = -errno;
+		goto exit;
+	}
+
+	switch (xnvme_dev_get_geo(dev)->type) {
+	case XNVME_GEO_UNKNOWN:
+		dprint(FD_ZBD, "%s: got 'unknown', assigning ZBD_NONE\n", f->file_name);
+		*model = ZBD_NONE;
+		break;
+
+	case XNVME_GEO_CONVENTIONAL:
+		dprint(FD_ZBD, "%s: got 'conventional', assigning ZBD_NONE\n", f->file_name);
+		*model = ZBD_NONE;
+		break;
+
+	case XNVME_GEO_ZONED:
+		dprint(FD_ZBD, "%s: got 'zoned', assigning ZBD_HOST_MANAGED\n", f->file_name);
+		*model = ZBD_HOST_MANAGED;
+		break;
+
+	default:
+		dprint(FD_ZBD, "%s: hit-default, assigning ZBD_NONE\n", f->file_name);
+		*model = ZBD_NONE;
+		errno = EINVAL;
+		err = -errno;
+		break;
+	}
+
+exit:
+	xnvme_dev_close(dev);
+
+	err_lock = pthread_mutex_unlock(&g_serialize);
+	if (err_lock)
+		log_err("ioeng->get_zoned_model(): pthread_mutex_unlock(), err(%d)\n", err_lock);
+
+	return err;
+}
+
+/**
+ * Fills the given ``zbdz`` with at most ``nr_zones`` zone-descriptors.
+ *
+ * The implementation converts the NVMe Zoned Command Set log-pages for Zone
+ * descriptors into the Linux Kernel Zoned Block Report format.
+ *
+ * NOTE: This function is called before I/O engine initialization, that is,
+ * before ``_dev_open`` has been called and file-wrapping is set up. Thus it has
+ * to do the ``_dev_open`` itself, and shut it down again once it is done
+ * retrieving the log-pages and converting them to the report format.
+ *
+ * TODO: unify the different setup methods, consider keeping the handle around,
+ * and consider how to support the --async option in this use case
+ */
+static int xnvme_fioe_report_zones(struct thread_data *td, struct fio_file *f, uint64_t offset,
+				   struct zbd_zone *zbdz, unsigned int nr_zones)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	const struct xnvme_spec_znd_idfy_lbafe *lbafe = NULL;
+	struct xnvme_dev *dev = NULL;
+	const struct xnvme_geo *geo = NULL;
+	struct xnvme_znd_report *rprt = NULL;
+	uint32_t ssw;
+	uint64_t slba;
+	unsigned int limit = 0;
+	int err = 0, err_lock;
+
+	dprint(FD_ZBD, "%s: report_zones() offset: %zu, nr_zones: %u\n", f->file_name, offset,
+	       nr_zones);
+
+	err = pthread_mutex_lock(&g_serialize);
+	if (err) {
+		log_err("ioeng->report_zones(%s): pthread_mutex_lock(), err(%d)\n", f->file_name,
+			err);
+		return -err;
+	}
+
+	dev = xnvme_dev_open(f->file_name, &opts);
+	if (!dev) {
+		log_err("ioeng->report_zones(%s): xnvme_dev_open(), err(%d)\n", f->file_name,
+			errno);
+		goto exit;
+	}
+
+	geo = xnvme_dev_get_geo(dev);
+	ssw = xnvme_dev_get_ssw(dev);
+	lbafe = xnvme_znd_dev_get_lbafe(dev);
+
+	limit = nr_zones > geo->nzone ? geo->nzone : nr_zones;
+
+	dprint(FD_ZBD, "%s: limit: %u\n", f->file_name, limit);
+
+	slba = ((offset >> ssw) / geo->nsect) * geo->nsect;
+
+	rprt = xnvme_znd_report_from_dev(dev, slba, limit, 0);
+	if (!rprt) {
+		log_err("ioeng->report_zones(%s): xnvme_znd_report_from_dev(), err(%d)\n",
+			f->file_name, errno);
+		err = -errno;
+		goto exit;
+	}
+	if (rprt->nentries != limit) {
+		log_err("ioeng->report_zones(%s): nentries != nr_zones\n", f->file_name);
+		err = 1;
+		goto exit;
+	}
+	if (offset > geo->tbytes) {
+		log_err("ioeng->report_zones(%s): out-of-bounds\n", f->file_name);
+		goto exit;
+	}
+
+	/* Transform the zone-report */
+	for (uint32_t idx = 0; idx < rprt->nentries; ++idx) {
+		struct xnvme_spec_znd_descr *descr = XNVME_ZND_REPORT_DESCR(rprt, idx);
+
+		zbdz[idx].start = descr->zslba << ssw;
+		zbdz[idx].len = lbafe->zsze << ssw;
+		zbdz[idx].capacity = descr->zcap << ssw;
+		zbdz[idx].wp = descr->wp << ssw;
+
+		switch (descr->zt) {
+		case XNVME_SPEC_ZND_TYPE_SEQWR:
+			zbdz[idx].type = ZBD_ZONE_TYPE_SWR;
+			break;
+
+		default:
+			log_err("ioeng->report_zones(%s): invalid type for zone at offset(%zu)\n",
+				f->file_name, zbdz[idx].start);
+			err = -EIO;
+			goto exit;
+		}
+
+		switch (descr->zs) {
+		case XNVME_SPEC_ZND_STATE_EMPTY:
+			zbdz[idx].cond = ZBD_ZONE_COND_EMPTY;
+			break;
+		case XNVME_SPEC_ZND_STATE_IOPEN:
+			zbdz[idx].cond = ZBD_ZONE_COND_IMP_OPEN;
+			break;
+		case XNVME_SPEC_ZND_STATE_EOPEN:
+			zbdz[idx].cond = ZBD_ZONE_COND_EXP_OPEN;
+			break;
+		case XNVME_SPEC_ZND_STATE_CLOSED:
+			zbdz[idx].cond = ZBD_ZONE_COND_CLOSED;
+			break;
+		case XNVME_SPEC_ZND_STATE_FULL:
+			zbdz[idx].cond = ZBD_ZONE_COND_FULL;
+			break;
+
+		case XNVME_SPEC_ZND_STATE_RONLY:
+		case XNVME_SPEC_ZND_STATE_OFFLINE:
+		default:
+			zbdz[idx].cond = ZBD_ZONE_COND_OFFLINE;
+			break;
+		}
+	}
+
+exit:
+	xnvme_buf_virt_free(rprt);
+
+	xnvme_dev_close(dev);
+
+	err_lock = pthread_mutex_unlock(&g_serialize);
+	if (err_lock)
+		log_err("ioeng->report_zones(): pthread_mutex_unlock(), err: %d\n", err_lock);
+
+	dprint(FD_ZBD, "err: %d, nr_zones: %d\n", err, (int)nr_zones);
+
+	return err ? err : (int)limit;
+}
+
+/**
+ * NOTE: This function may get called before I/O engine initialization, that is,
+ * before ``_dev_open`` has been called and file-wrapping is set up. In such
+ * case it has to do ``_dev_open`` itself, and shut it down again once it is
+ * done resetting write pointer of zones.
+ */
+static int xnvme_fioe_reset_wp(struct thread_data *td, struct fio_file *f, uint64_t offset,
+			       uint64_t length)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_fioe_data *xd = NULL;
+	struct xnvme_fioe_fwrap *fwrap = NULL;
+	struct xnvme_dev *dev = NULL;
+	const struct xnvme_geo *geo = NULL;
+	uint64_t first, last;
+	uint32_t ssw;
+	uint32_t nsid;
+	int err = 0, err_lock;
+
+	if (td->io_ops_data) {
+		xd = td->io_ops_data;
+		fwrap = &xd->files[f->fileno];
+
+		assert(fwrap->dev);
+		assert(fwrap->geo);
+
+		dev = fwrap->dev;
+		geo = fwrap->geo;
+		ssw = fwrap->ssw;
+	} else {
+		err = pthread_mutex_lock(&g_serialize);
+		if (err) {
+			log_err("ioeng->reset_wp(): pthread_mutex_lock(), err(%d)\n", err);
+			return -err;
+		}
+
+		dev = xnvme_dev_open(f->file_name, &opts);
+		if (!dev) {
+			log_err("ioeng->reset_wp(): xnvme_dev_open(%s) failed, errno(%d)\n",
+				f->file_name, errno);
+			goto exit;
+		}
+		geo = xnvme_dev_get_geo(dev);
+		ssw = xnvme_dev_get_ssw(dev);
+	}
+
+	nsid = xnvme_dev_get_nsid(dev);
+
+	first = ((offset >> ssw) / geo->nsect) * geo->nsect;
+	last = (((offset + length) >> ssw) / geo->nsect) * geo->nsect;
+	dprint(FD_ZBD, "first: 0x%lx, last: 0x%lx\n", first, last);
+
+	for (uint64_t zslba = first; zslba < last; zslba += geo->nsect) {
+		struct xnvme_cmd_ctx ctx = xnvme_cmd_ctx_from_dev(dev);
+
+		if (zslba >= (geo->nsect * geo->nzone)) {
+			log_err("ioeng->reset_wp(): out-of-bounds\n");
+			err = 0;
+			break;
+		}
+
+		err = xnvme_znd_mgmt_send(&ctx, nsid, zslba, false,
+					  XNVME_SPEC_ZND_CMD_MGMT_SEND_RESET, 0x0, NULL);
+		if (err || xnvme_cmd_ctx_cpl_status(&ctx)) {
+			err = err ? err : -EIO;
+			log_err("ioeng->reset_wp(): err(%d), sc(%d)", err, ctx.cpl.status.sc);
+			goto exit;
+		}
+	}
+
+exit:
+	if (!td->io_ops_data) {
+		xnvme_dev_close(dev);
+
+		err_lock = pthread_mutex_unlock(&g_serialize);
+		if (err_lock)
+			log_err("ioeng->reset_wp(): pthread_mutex_unlock(), err(%d)\n", err_lock);
+	}
+
+	return err;
+}
+
+static int xnvme_fioe_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	struct xnvme_opts opts = xnvme_opts_from_fioe(td);
+	struct xnvme_dev *dev;
+	int ret = 0, err;
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	ret = pthread_mutex_lock(&g_serialize);
+	if (ret) {
+		log_err("ioeng->get_file_size(): pthread_mutex_lock(), err(%d)\n", ret);
+		return -ret;
+	}
+
+	dev = xnvme_dev_open(f->file_name, &opts);
+	if (!dev) {
+		log_err("%s: failed retrieving device handle, errno: %d\n", f->file_name, errno);
+		ret = -errno;
+		goto exit;
+	}
+
+	f->real_file_size = xnvme_dev_get_geo(dev)->tbytes;
+	fio_file_set_size_known(f);
+	f->filetype = FIO_TYPE_BLOCK;
+
+exit:
+	xnvme_dev_close(dev);
+	err = pthread_mutex_unlock(&g_serialize);
+	if (err)
+		log_err("ioeng->get_file_size(): pthread_mutex_unlock(), err(%d)\n", err);
+
+	return ret;
+}
+
+FIO_STATIC struct ioengine_ops ioengine = {
+	.name = "xnvme",
+	.version = FIO_IOOPS_VERSION,
+	.options = options,
+	.option_struct_size = sizeof(struct xnvme_fioe_options),
+	.flags = FIO_DISKLESSIO | FIO_NODISKUTIL | FIO_NOEXTEND | FIO_MEMALIGN | FIO_RAWIO,
+
+	.cleanup = xnvme_fioe_cleanup,
+	.init = xnvme_fioe_init,
+
+	.iomem_free = xnvme_fioe_iomem_free,
+	.iomem_alloc = xnvme_fioe_iomem_alloc,
+
+	.io_u_free = xnvme_fioe_io_u_free,
+	.io_u_init = xnvme_fioe_io_u_init,
+
+	.event = xnvme_fioe_event,
+	.getevents = xnvme_fioe_getevents,
+	.queue = xnvme_fioe_queue,
+
+	.close_file = xnvme_fioe_close,
+	.open_file = xnvme_fioe_open,
+	.get_file_size = xnvme_fioe_get_file_size,
+
+	.invalidate = xnvme_fioe_invalidate,
+	.get_max_open_zones = xnvme_fioe_get_max_open_zones,
+	.get_zoned_model = xnvme_fioe_get_zoned_model,
+	.report_zones = xnvme_fioe_report_zones,
+	.reset_wp = xnvme_fioe_reset_wp,
+};
+
+static void fio_init fio_xnvme_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_xnvme_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
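Several functions in the engine above share small pieces of address arithmetic: xnvme_fioe_queue(), xnvme_fioe_report_zones(), xnvme_fioe_reset_wp() and xnvme_fioe_get_max_open_zones(). Byte offsets are shifted right by ssw to get LBAs, and NLB is encoded as a zero-based block count. Offsets are also rounded down to the first LBA of a zone of geo->nsect blocks. Finally, the zero-based MOR field becomes fio's one-based count. The helpers below restate that arithmetic in isolation as a sketch. The function names are hypothetical, and ssw is assumed to be log2 of the logical block size, as its use with >> in the engine suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset -> starting LBA, as computed in xnvme_fioe_queue(). */
static uint64_t slba_from_offset(uint64_t offset, uint32_t ssw)
{
	return offset >> ssw;
}

/* Transfer length in bytes -> NVMe NLB field, which is zero-based. */
static uint16_t nlb_from_len(uint64_t len, uint32_t ssw)
{
	assert((len >> ssw) >= 1); /* at least one logical block */
	return (uint16_t)((len >> ssw) - 1);
}

/*
 * Round a byte offset down to the first LBA of its zone, where each zone
 * holds nsect LBAs; report_zones() and reset_wp() use the same expression.
 */
static uint64_t zone_slba(uint64_t offset, uint32_t ssw, uint64_t nsect)
{
	assert(nsect != 0);
	return ((offset >> ssw) / nsect) * nsect;
}

/*
 * Zero-based MOR -> fio max_open_zones. Unsigned arithmetic wraps
 * 0xFFFFFFFF (NVMe "unlimited") to 0 (fio "unlimited").
 */
static unsigned int max_open_zones_from_mor(uint32_t mor)
{
	return mor + 1;
}
```

For example, with 4 KiB blocks (ssw = 12) a 4 KiB I/O gives NLB 0, and a MOR of 0xFFFFFFFF yields 0, which fio treats as unlimited.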
diff --git a/examples/xnvme-compare.fio b/examples/xnvme-compare.fio
new file mode 100644
index 00000000..b89dfdf4
--- /dev/null
+++ b/examples/xnvme-compare.fio
@@ -0,0 +1,72 @@
+; Compare fio IO engines with a random-read workload using BS=4k at QD=1
+;
+; README
+;
+; This job-file is intended to be used as:
+;
+; # Use the built-in io_uring engine to get baseline numbers
+; fio examples/xnvme-compare.fio \
+;   --section=default \
+;   --ioengine=io_uring \
+;   --sqthread_poll=1 \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the Linux backend and the io_uring async implementation
+; fio examples/xnvme-compare.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --sqthread_poll=1 \
+;   --xnvme_async=io_uring \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the Linux backend and the libaio async implementation
+; fio examples/xnvme-compare.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_async=libaio \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the SPDK backend; note that the namespace ID must be set
+; fio examples/xnvme-compare.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_dev_nsid=1 \
+;   --filename=0000\\:01\\:00.0
+;
+; NOTE: In the URI encoded in the filename above, each ":" must be escaped.
+;
+; On the command-line using two "\\":
+;
+; --filename=0000\\:01\\:00.0
+;
+; Within a fio-script using a single "\":
+;
+; filename=0000\:01\:00.0
+;
+; NOTE: If you want to override the default bs, iodepth, and workload, then
+; invoke it as:
+;
+; FIO_BS="512" FIO_RW="verify" FIO_IODEPTH=16 fio examples/xnvme-compare.fio \
+;   --section=override
+;
+[global]
+rw=randread
+size=12G
+iodepth=1
+bs=4K
+direct=1
+thread=1
+time_based=1
+runtime=7
+ramp_time=3
+norandommap=1
+
+; Avoid accidentally creating device files; e.g. "/dev/nvme0n1", "/dev/nullb0"
+allow_file_create=0
+
+[default]
+
+[override]
+rw=${FIO_RW}
+iodepth=${FIO_IODEPTH}
+bs=${FIO_BS}
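fio treats `:` in `filename` as a separator between multiple files, which is why the PCIe address has to be escaped. The shell strips one level of backslashes, so the command line needs `\\` where the job file needs `\`. A hypothetical helper (not part of fio or xNVMe) that produces the job-file form:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'in' to 'out', prefixing every ':' with '\',
 * the escaping fio expects inside a job file.
 * Returns 0 on success, -1 if 'out' (of size 'outlen') is too small.
 */
static int escape_fio_filename(const char *in, char *out, size_t outlen)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);

	for (const char *p = in; *p; p++) {
		size_t need = (*p == ':') ? 2 : 1;

		/* Always keep room for the terminating NUL. */
		if (o + need >= outlen)
			return -1;
		if (*p == ':')
			out[o++] = '\\';
		out[o++] = *p;
	}
	if (o >= outlen)
		return -1;
	out[o] = '\0';
	return 0;
}
```

Passing the result through a shell unquoted would again require doubling each backslash, as in the `--filename=` examples above.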
diff --git a/examples/xnvme-zoned.fio b/examples/xnvme-zoned.fio
new file mode 100644
index 00000000..1344f9a1
--- /dev/null
+++ b/examples/xnvme-zoned.fio
@@ -0,0 +1,87 @@
+; Running xNVMe/fio on a Zoned Device
+;
+; Writes 1GB at QD1 using 4K BS and verifies it.
+;
+; README
+;
+; This job-file is intended to be used as:
+;
+; # Use the built-in io_uring engine to get baseline numbers
+; fio examples/xnvme-zoned.fio \
+;   --section=default \
+;   --ioengine=io_uring \
+;   --sqthread_poll=1 \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the Linux backend and the io_uring async implementation
+; fio examples/xnvme-zoned.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --sqthread_poll=1 \
+;   --xnvme_async=io_uring \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the Linux backend and the libaio async implementation
+; fio examples/xnvme-zoned.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_async=libaio \
+;   --filename=/dev/nvme0n1
+;
+; # Use the xNVMe I/O engine with the SPDK backend; note that the namespace ID must be set
+; fio examples/xnvme-zoned.fio \
+;   --section=default \
+;   --ioengine=xnvme \
+;   --xnvme_dev_nsid=1 \
+;   --filename=0000\\:01\\:00.0
+;
+; NOTE: In the URI encoded in the filename above, each ":" must be escaped.
+;
+; On the command-line using two "\\":
+;
+; --filename=0000\\:01\\:00.0
+;
+; Within a fio-script using a single "\":
+;
+; filename=0000\:01\:00.0
+;
+; NOTE: If you want to override the default bs, iodepth, and workload, then
+; invoke it as:
+;
+; FIO_BS="512" FIO_RW="verify" FIO_IODEPTH=16 fio examples/xnvme-zoned.fio \
+;   --section=override
+;
+; To reset all zones on the device to the EMPTY state, i.e. wipe the entire device, run:
+;
+; # zoned mgmt-reset /dev/nvme0n2 --slba 0x0 --all
+;
+[global]
+zonemode=zbd
+rw=write
+size=1G
+iodepth=1
+bs=4K
+direct=1
+thread=1
+ramp_time=1
+norandommap=1
+verify=crc32c
+; Avoid accidentally creating device files; e.g. "/dev/nvme0n1", "/dev/nullb0"
+allow_file_create=0
+;
+; NOTE: If fio complains about zone-size, then run:
+;
+; # zoned info /dev/nvme0n1
+;
+; The command prints the values you need; then define the following in the fio script:
+;
+; zonesize=nsect * nbytes
+;
+;zonesize=
+
+[default]
+
+[override]
+rw=${FIO_RW}
+iodepth=${FIO_IODEPTH}
+bs=${FIO_BS}
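The `zonesize` value is the number of LBAs per zone multiplied by the LBA size. A small sketch, assuming `nsect` and `nbytes` are taken from the `zoned info` output as described above; the helper name is hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: zone size in bytes = LBAs per zone (nsect)
 * multiplied by bytes per LBA (nbytes). Returns 0 if either input is 0
 * or if the product would not fit in 64 bits.
 */
static uint64_t zone_size_bytes(uint64_t nsect, uint64_t nbytes)
{
	if (nsect == 0 || nbytes == 0)
		return 0;
	if (nsect > UINT64_MAX / nbytes)
		return 0;
	return nsect * nbytes;
}
```

For example, 524288 LBAs of 4096 bytes give 2147483648 bytes (2 GiB), i.e. `zonesize=2G`; these numbers are illustrative, not taken from a real device.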
diff --git a/fio.1 b/fio.1
index 609947dc..ded7bbfc 100644
--- a/fio.1
+++ b/fio.1
@@ -1965,6 +1965,12 @@ via kernel NFS.
 .TP
 .B exec
 Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
+.TP
+.B xnvme
+I/O engine using the xNVMe C API, for NVMe devices. The xnvme engine provides
+flexibility to access the GNU/Linux kernel NVMe driver via libaio, IOCTLs or
+io_uring, the SPDK NVMe driver, or your own custom NVMe driver. The xnvme engine
+includes engine-specific options. (See \fIhttps://xnvme.io/\fR).
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2039,7 +2045,7 @@ release them when IO is done. If this option is set, the pages are pre-mapped
 before IO is started. This eliminates the need to map and release for each IO.
 This is more efficient, and reduces the IO latency as well.
 .TP
-.BI (io_uring)hipri
+.BI (io_uring,xnvme)hipri
 If this option is set, fio will attempt to use polled IO completions. Normal IO
 completions generate interrupts to signal the completion of IO, polled
 completions do not. Hence they require active reaping by the application.
@@ -2052,7 +2058,7 @@ This avoids the overhead of managing file counts in the kernel, making the
 submission and completion part more lightweight. Required for the below
 sqthread_poll option.
 .TP
-.BI (io_uring)sqthread_poll
+.BI (io_uring,xnvme)sqthread_poll
 Normally fio will submit IO by issuing a system call to notify the kernel of
 available items in the SQ ring. If this option is set, the act of submitting IO
 will be done by a polling thread in the kernel. This frees up cycles for fio, at
@@ -2480,6 +2486,66 @@ Defines the time between the SIGTERM and SIGKILL signals. Default is 1 second.
 .TP
 .BI (exec)std_redirect\fR=\fbool
 If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+.TP
+.BI (xnvme)xnvme_async\fR=\fPstr
+Select the xnvme async command interface. This can take the following values:
+.RS
+.RS
+.TP
+.B emu
+This is the default and is used to emulate asynchronous I/O
+.TP
+.BI thrpool
+Use a thread pool for asynchronous I/O
+.TP
+.BI io_uring
+Use Linux io_uring/liburing for asynchronous I/O
+.TP
+.BI libaio
+Use Linux aio for asynchronous I/O
+.TP
+.BI posix
+Use POSIX aio for asynchronous I/O
+.TP
+.BI nil
+Use nil-io; for introspective performance evaluation
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_sync\fR=\fPstr
+Select the xnvme synchronous command interface. This can take the following values:
+.RS
+.RS
+.TP
+.B nvme
+This is the default and uses the Linux NVMe driver ioctl() for synchronous I/O
+.TP
+.BI psync
+Use pread()/pwrite() for synchronous I/O
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_admin\fR=\fPstr
+Select the xnvme admin command interface. This can take the following values:
+.RS
+.RS
+.TP
+.B nvme
+This is the default and uses the Linux NVMe driver ioctl() for admin commands
+.TP
+.BI block
+Use the Linux block layer ioctl() and sysfs for admin commands
+.TP
+.BI file_as_ns
+Use file-stat to construct NVMe identify responses
+.RE
+.RE
+.TP
+.BI (xnvme)xnvme_dev_nsid\fR=\fPint
+xnvme namespace identifier for the userspace NVMe driver.
+.TP
+.BI (xnvme)xnvme_iovec
+If this option is set, xnvme will use vectored read/write commands.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/optgroup.h b/optgroup.h
index 3ac8f62a..dc73c8f3 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -72,6 +72,7 @@ enum opt_category_group {
 	__FIO_OPT_G_DFS,
 	__FIO_OPT_G_NFS,
 	__FIO_OPT_G_WINDOWSAIO,
+	__FIO_OPT_G_XNVME,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -118,6 +119,7 @@ enum opt_category_group {
 	FIO_OPT_G_LIBCUFILE	= (1ULL << __FIO_OPT_G_LIBCUFILE),
 	FIO_OPT_G_DFS		= (1ULL << __FIO_OPT_G_DFS),
 	FIO_OPT_G_WINDOWSAIO	= (1ULL << __FIO_OPT_G_WINDOWSAIO),
+	FIO_OPT_G_XNVME         = (1ULL << __FIO_OPT_G_XNVME),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);
diff --git a/options.c b/options.c
index 3b83573b..2b183c60 100644
--- a/options.c
+++ b/options.c
@@ -2144,6 +2144,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  { .ival = "nfs",
 			    .help = "NFS IO engine",
 			  },
+#endif
+#ifdef CONFIG_LIBXNVME
+			  { .ival = "xnvme",
+			    .help = "XNVME IO engine",
+			  },
 #endif
 		},
 	},

* Recent changes (master)
@ 2022-05-02 12:00 Jens Axboe
From: Jens Axboe @ 2022-05-02 12:00 UTC
  To: fio

The following changes since commit 6e594a2fa8388892dffb2ffc9b865689e2d67833:

  Merge branch 'global_dedup' of https://github.com/bardavid/fio (2022-04-29 16:30:50 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 12db6deb8b767ac89dd73e34dbc6f06905441e07:

  Merge branch 'patch-1' of https://github.com/ferdnyc/fio (2022-05-01 07:29:05 -0600)

----------------------------------------------------------------
Frank Dana (1):
      README: Update Fedora pkg URL

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/ferdnyc/fio

 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/README.rst b/README.rst
index d566fae3..527f33ab 100644
--- a/README.rst
+++ b/README.rst
@@ -107,7 +107,7 @@ Ubuntu:
 Red Hat, Fedora, CentOS & Co:
 	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
 	packages are part of the Fedora/EPEL repositories.
-	https://apps.fedoraproject.org/packages/fio .
+	https://packages.fedoraproject.org/pkgs/fio/ .
 
 Mandriva:
 	Mandriva has integrated fio into their package repository, so installing

* Recent changes (master)
@ 2022-04-30 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-30 12:00 UTC
  To: fio

The following changes since commit 5f2d43188c2d65674aaba6280e2a87107e5d7099:

  Merge branch 'fix/json/strdup_memory_leak' of https://github.com/dpronin/fio (2022-04-17 16:47:22 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6e594a2fa8388892dffb2ffc9b865689e2d67833:

  Merge branch 'global_dedup' of https://github.com/bardavid/fio (2022-04-29 16:30:50 -0600)

----------------------------------------------------------------
Bar David (2):
      Introducing support for generation of dedup buffers across jobs. The dedup buffers are spread evenly between the jobs that enabled the dedupe_global option
      adding an example for dedupe_global usage and DRR testing

Jens Axboe (1):
      Merge branch 'global_dedup' of https://github.com/bardavid/fio

 HOWTO.rst                  |  6 +++++
 backend.c                  |  5 ++++
 cconv.c                    |  2 ++
 dedupe.c                   | 46 +++++++++++++++++++++++++++++++++----
 dedupe.h                   |  3 ++-
 examples/dedupe-global.fio | 57 ++++++++++++++++++++++++++++++++++++++++++++++
 fio.1                      |  9 ++++++++
 init.c                     |  2 +-
 options.c                  | 10 ++++++++
 server.h                   |  2 +-
 thread_options.h           |  3 +++
 11 files changed, 138 insertions(+), 7 deletions(-)
 create mode 100644 examples/dedupe-global.fio

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index a5fa432e..6a3e09f5 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1749,6 +1749,12 @@ Buffers and memory
 	Note that size needs to be explicitly provided and only 1 file per
 	job is supported
 
+.. option:: dedupe_global=bool
+
+	This controls whether the deduplication buffers will be shared amongst
+	all jobs that have this option set. The buffers are spread evenly between
+	participating jobs.
+
 .. option:: invalidate=bool
 
 	Invalidate the buffer/page cache parts of the files to be used prior to
diff --git a/backend.c b/backend.c
index 317e4f6c..ffbb7e2a 100644
--- a/backend.c
+++ b/backend.c
@@ -2570,6 +2570,11 @@ int fio_backend(struct sk_out *sk_out)
 		setup_log(&agg_io_log[DDIR_TRIM], &p, "agg-trim_bw.log");
 	}
 
+	if (init_global_dedupe_working_set_seeds()) {
+		log_err("fio: failed to initialize global dedupe working set\n");
+		return 1;
+	}
+
 	startup_sem = fio_sem_init(FIO_SEM_LOCKED);
 	if (!sk_out)
 		is_local_backend = true;
diff --git a/cconv.c b/cconv.c
index 62d02e36..6c36afb7 100644
--- a/cconv.c
+++ b/cconv.c
@@ -305,6 +305,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->dedupe_percentage = le32_to_cpu(top->dedupe_percentage);
 	o->dedupe_mode = le32_to_cpu(top->dedupe_mode);
 	o->dedupe_working_set_percentage = le32_to_cpu(top->dedupe_working_set_percentage);
+	o->dedupe_global = le32_to_cpu(top->dedupe_global);
 	o->block_error_hist = le32_to_cpu(top->block_error_hist);
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
@@ -513,6 +514,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->dedupe_percentage = cpu_to_le32(o->dedupe_percentage);
 	top->dedupe_mode = cpu_to_le32(o->dedupe_mode);
 	top->dedupe_working_set_percentage = cpu_to_le32(o->dedupe_working_set_percentage);
+	top->dedupe_global = cpu_to_le32(o->dedupe_global);
 	top->block_error_hist = cpu_to_le32(o->block_error_hist);
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
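Each new option has to be added to both `thread_options` and the wire-format `thread_options_pack`, and converted in both directions, because client and server exchange options as little-endian 32-bit fields; the `FIO_SERVER_VER` bump in the same commit presumably reflects that changed wire layout. The byte-level idea behind `cpu_to_le32()`/`le32_to_cpu()` can be sketched portably as below; these `put_le32`/`get_le32` helpers are hypothetical and work on byte buffers rather than on fio's integer fields:

```c
#include <stdint.h>

/* Hypothetical: store v as 4 little-endian bytes, independent of host order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical: read 4 little-endian bytes back into a host integer. */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] |
	       ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) |
	       ((uint32_t)in[3] << 24);
}
```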
diff --git a/dedupe.c b/dedupe.c
index fd116dfb..8214a786 100644
--- a/dedupe.c
+++ b/dedupe.c
@@ -1,13 +1,37 @@
 #include "fio.h"
 
-int init_dedupe_working_set_seeds(struct thread_data *td)
+/**
+ * initializes the global dedup workset.
+ * this needs to be called after all jobs' seeds
+ * have been initialized
+ */
+int init_global_dedupe_working_set_seeds(void)
 {
-	unsigned long long i, j, num_seed_advancements;
+	int i;
+	struct thread_data *td;
+
+	for_each_td(td, i) {
+		if (!td->o.dedupe_global)
+			continue;
+
+		if (init_dedupe_working_set_seeds(td, 1))
+			return 1;
+	}
+
+	return 0;
+}
+
+int init_dedupe_working_set_seeds(struct thread_data *td, bool global_dedup)
+{
+	int tindex;
+	struct thread_data *td_seed;
+	unsigned long long i, j, num_seed_advancements, pages_per_seed;
 	struct frand_state dedupe_working_set_state = {0};
 
 	if (!td->o.dedupe_percentage || !(td->o.dedupe_mode == DEDUPE_MODE_WORKING_SET))
 		return 0;
 
+	tindex = td->thread_number - 1;
 	num_seed_advancements = td->o.min_bs[DDIR_WRITE] /
 		min_not_zero(td->o.min_bs[DDIR_WRITE], (unsigned long long) td->o.compress_chunk);
 	/*
@@ -20,9 +44,11 @@ int init_dedupe_working_set_seeds(struct thread_data *td)
 		log_err("fio: could not allocate dedupe working set\n");
 		return 1;
 	}
+
 	frand_copy(&dedupe_working_set_state, &td->buf_state);
-	for (i = 0; i < td->num_unique_pages; i++) {
-		frand_copy(&td->dedupe_working_set_states[i], &dedupe_working_set_state);
+	frand_copy(&td->dedupe_working_set_states[0], &dedupe_working_set_state);
+	pages_per_seed = max(td->num_unique_pages / thread_number, 1ull);
+	for (i = 1; i < td->num_unique_pages; i++) {
 		/*
 		 * When compression is used the seed is advanced multiple times to
 		 * generate the buffer. We want to regenerate the same buffer when
@@ -30,6 +56,18 @@ int init_dedupe_working_set_seeds(struct thread_data *td)
 		 */
 		for (j = 0; j < num_seed_advancements; j++)
 			__get_next_seed(&dedupe_working_set_state);
+
+		/*
+		 * When global dedup is used, we rotate the seeds to allow
+		 * generating the same buffers across different jobs. Deduplication
+		 * buffers are spread evenly across jobs participating in global dedupe.
+		 */
+		if (global_dedup && i % pages_per_seed == 0) {
+			td_seed = tnumber_to_td(++tindex % thread_number);
+			frand_copy(&dedupe_working_set_state, &td_seed->buf_state);
+		}
+
+		frand_copy(&td->dedupe_working_set_states[i], &dedupe_working_set_state);
 	}
 
 	return 0;
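The seed rotation above can be modelled in isolation. The sketch below covers only the global-dedupe case and only tracks which job's `buf_state` each working-set page starts from; it ignores the per-page seed advancement, assumes 0-based job indices, and uses hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the rotation in init_dedupe_working_set_seeds(): for job 'self'
 * (0-based) out of 'nr_jobs', record in owner[i] which job's buf_state
 * seeded working-set page i. Per-page seed advancement is not modelled.
 */
static void dedupe_seed_owners(size_t self, size_t nr_jobs,
			       size_t num_unique_pages, size_t *owner)
{
	size_t pages_per_seed, tindex = self;

	assert(nr_jobs > 0 && self < nr_jobs);

	if (num_unique_pages == 0)
		return;

	/* Same as max(num_unique_pages / thread_number, 1ull). */
	pages_per_seed = num_unique_pages / nr_jobs;
	if (pages_per_seed == 0)
		pages_per_seed = 1;

	/* Page 0 always comes from the job's own seed. */
	owner[0] = self;
	for (size_t i = 1; i < num_unique_pages; i++) {
		/* Every pages_per_seed pages, switch to the next job's seed. */
		if (i % pages_per_seed == 0)
			tindex = (tindex + 1) % nr_jobs;
		owner[i] = tindex;
	}
}
```

With two jobs and four unique pages, job 0 gets owners {0, 0, 1, 1} and job 1 gets {1, 1, 0, 0}. Assuming both jobs use the same block size and compression settings, job 0's pages 2 and 3 should then be seeded like job 1's pages 0 and 1, which is what lets the two jobs produce shared duplicate buffers.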
diff --git a/dedupe.h b/dedupe.h
index d4c4dc37..bd1f9c0c 100644
--- a/dedupe.h
+++ b/dedupe.h
@@ -1,6 +1,7 @@
 #ifndef DEDUPE_H
 #define DEDUPE_H
 
-int init_dedupe_working_set_seeds(struct thread_data *td);
+int init_dedupe_working_set_seeds(struct thread_data *td, bool global_dedupe);
+int init_global_dedupe_working_set_seeds(void);
 
 #endif
diff --git a/examples/dedupe-global.fio b/examples/dedupe-global.fio
new file mode 100644
index 00000000..edaaad55
--- /dev/null
+++ b/examples/dedupe-global.fio
@@ -0,0 +1,57 @@
+# Writes to 2 files that share duplicate blocks.
+# The dedupe working set is spread uniformly such that when
+# either job chooses to perform a dedup operation, it will
+# regenerate a buffer from the global space.
+# If you test the dedup ratio on either file by itself the result
+# is likely lower than if you test the ratio of the two files combined.
+#
+# Use `./t/fio-dedupe <file> -C 1 -c 1 -b 4096` to test the total
+# data reduction ratio.
+#
+#
+# Full example of test:
+# $ ./fio ./examples/dedupe-global.fio
+#
+# Checking ratio on a and b individually:
+# $ ./t/fio-dedupe a.0.0 -C 1 -c 1 -b 4096
+#
+# Extents=25600, Unique extents=16817 Duplicated extents=5735
+# De-dupe ratio: 1:0.52
+# De-dupe working set at least: 22.40%
+# Fio setting: dedupe_percentage=34
+# Unique capacity 33MB
+#
+# $ ./t/fio-dedupe b.0.0 -C 1 -c 1 -b 4096
+# Extents=25600, Unique extents=17009 Duplicated extents=5636
+# De-dupe ratio: 1:0.51
+# De-dupe working set at least: 22.02%
+# Fio setting: dedupe_percentage=34
+# Unique capacity 34MB
+#
+# Combining files:
+# $ cat a.0.0 > c.0.0
+# $ cat b.0.0 >> c.0.0
+#
+# Checking data reduction ratio on combined file:
+# $ ./t/fio-dedupe c.0.0 -C 1 -c 1 -b 4096
+# Extents=51200, Unique extents=25747 Duplicated extents=11028
+# De-dupe ratio: 1:0.99
+# De-dupe working set at least: 21.54%
+# Fio setting: dedupe_percentage=50
+# Unique capacity 51MB
+#
+[global]
+ioengine=libaio
+iodepth=256
+size=100m
+dedupe_mode=working_set
+dedupe_global=1
+dedupe_percentage=50
+blocksize=4k
+rw=write
+buffer_compress_percentage=50
+dedupe_working_set_percentage=50
+
+[a]
+
+[b]
diff --git a/fio.1 b/fio.1
index a2ec836f..609947dc 100644
--- a/fio.1
+++ b/fio.1
@@ -1553,6 +1553,15 @@ Note that \fBsize\fR needs to be explicitly provided and only 1 file
 per job is supported
 .RE
 .TP
+.BI dedupe_global \fR=\fPbool
+This controls whether the deduplication buffers will be shared amongst
+all jobs that have this option set. The buffers are spread evenly between
+participating jobs.
+.P
+.RS
+Note that \fBdedupe_mode\fR must be set to \fBworking_set\fR for this to work.
+It can be used in combination with compression.
+.RE
+.TP
 .BI invalidate \fR=\fPbool
 Invalidate the buffer/page cache parts of the files to be used prior to
 starting I/O if the platform and file type support it. Defaults to true.
diff --git a/init.c b/init.c
index 6f186051..f7d702f8 100644
--- a/init.c
+++ b/init.c
@@ -1541,7 +1541,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (fixup_options(td))
 		goto err;
 
-	if (init_dedupe_working_set_seeds(td))
+	if (!td->o.dedupe_global && init_dedupe_working_set_seeds(td, 0))
 		goto err;
 
 	/*
diff --git a/options.c b/options.c
index e06d9b66..3b83573b 100644
--- a/options.c
+++ b/options.c
@@ -4665,6 +4665,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,
 	},
+	{
+		.name	= "dedupe_global",
+		.lname	= "Global deduplication",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, dedupe_global),
+		.help	= "Share deduplication buffers across jobs",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IO_BUF,
+	},
 	{
 		.name	= "dedupe_mode",
 		.lname	= "Dedupe mode",
diff --git a/server.h b/server.h
index 0e62b6df..b0c5e2df 100644
--- a/server.h
+++ b/server.h
@@ -51,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 96,
+	FIO_SERVER_VER			= 97,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 4162c42f..634070af 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -263,6 +263,7 @@ struct thread_options {
 	unsigned int dedupe_percentage;
 	unsigned int dedupe_mode;
 	unsigned int dedupe_working_set_percentage;
+	unsigned int dedupe_global;
 	unsigned int time_based;
 	unsigned int disable_lat;
 	unsigned int disable_clat;
@@ -578,6 +579,7 @@ struct thread_options_pack {
 	uint32_t dedupe_percentage;
 	uint32_t dedupe_mode;
 	uint32_t dedupe_working_set_percentage;
+	uint32_t dedupe_global;
 	uint32_t time_based;
 	uint32_t disable_lat;
 	uint32_t disable_clat;
@@ -596,6 +598,7 @@ struct thread_options_pack {
 	uint32_t lat_percentiles;
 	uint32_t slat_percentiles;
 	uint32_t percentile_precision;
+	uint32_t pad5;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];

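The new `pad5` member presumably keeps the 64-bit `fio_fp64_t` array at an 8-byte-aligned offset now that one more `uint32_t` precedes it. Explicit padding matters in a wire struct because ABIs disagree on implicit padding: x86-64 aligns `uint64_t` to 8 bytes inside structs, while 32-bit x86 (i386 System V) aligns it to 4. A hypothetical illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Odd number of 32-bit fields before a 64-bit one: the offset of 'fp'
 * depends on the ABI (16 on x86-64, 12 on i386 System V). */
struct demo_unpadded {
	uint32_t a, b, c;
	uint64_t fp;
};

/* Explicit padding pins 'fp' at offset 16 on common ABIs. */
struct demo_padded {
	uint32_t a, b, c;
	uint32_t pad;
	uint64_t fp;
};

_Static_assert(offsetof(struct demo_padded, fp) == 16,
	       "explicit padding keeps the 64-bit field at offset 16");
```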
* Recent changes (master)
@ 2022-04-18 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-18 12:00 UTC
  To: fio

The following changes since commit d684bb2839d1fa010fba1e64f9b0c16240d8bdae:

  Merge branch 'fix/remove-sudo-in-test-script' of https://github.com/dpronin/fio (2022-04-10 15:18:42 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5f2d43188c2d65674aaba6280e2a87107e5d7099:

  Merge branch 'fix/json/strdup_memory_leak' of https://github.com/dpronin/fio (2022-04-17 16:47:22 -0600)

----------------------------------------------------------------
Denis Pronin (5):
      fixed possible and actual memory leaks
      fixed memory leak of not freed jobs_eta in several cases
      use flist_first_entry instead of flist_entry applied to 'next' list item
      fixed bunch of memory leaks in json constructor
      updated logging of iops1, iops2, ratio in FioJobTest_iops_rate

Jens Axboe (3):
      Merge branch 'fix/memory-leak' of https://github.com/dpronin/fio
      Merge branch 'fix/jobs_eta_memory_leak' of https://github.com/dpronin/fio
      Merge branch 'fix/json/strdup_memory_leak' of https://github.com/dpronin/fio

 backend.c          | 3 +++
 eta.c              | 7 ++++---
 ioengines.c        | 2 ++
 json.h             | 7 ++++++-
 server.c           | 2 +-
 stat.c             | 2 ++
 t/run-fio-tests.py | 3 ++-
 7 files changed, 20 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 001b2b96..317e4f6c 100644
--- a/backend.c
+++ b/backend.c
@@ -2433,8 +2433,10 @@ reap:
 			} else {
 				pid_t pid;
 				struct fio_file **files;
+				void *eo;
 				dprint(FD_PROCESS, "will fork\n");
 				files = td->files;
+				eo = td->eo;
 				read_barrier();
 				pid = fork();
 				if (!pid) {
@@ -2447,6 +2449,7 @@ reap:
 				// freeing previously allocated memory for files
 				// this memory freed MUST NOT be shared between processes, only the pointer itself may be shared within TD
 				free(files);
+				free(eo);
 				free(fd);
 				fd = NULL;
 			}
diff --git a/eta.c b/eta.c
index 17970c78..6017ca31 100644
--- a/eta.c
+++ b/eta.c
@@ -3,6 +3,7 @@
  */
 #include <unistd.h>
 #include <string.h>
+#include <stdlib.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
 #else
@@ -707,10 +708,10 @@ void print_thread_status(void)
 	size_t size;
 
 	je = get_jobs_eta(false, &size);
-	if (je)
+	if (je) {
 		display_thread_status(je);
-
-	free(je);
+		free(je);
+	}
 }
 
 void print_status_init(int thr_number)
diff --git a/ioengines.c b/ioengines.c
index d08a511a..68f307e5 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -223,6 +223,8 @@ struct ioengine_ops *load_ioengine(struct thread_data *td)
  */
 void free_ioengine(struct thread_data *td)
 {
+	assert(td != NULL && td->io_ops != NULL);
+
 	dprint(FD_IO, "free ioengine %s\n", td->io_ops->name);
 
 	if (td->eo && td->io_ops->options) {
diff --git a/json.h b/json.h
index d9824263..66bb06b1 100644
--- a/json.h
+++ b/json.h
@@ -81,8 +81,13 @@ static inline int json_object_add_value_string(struct json_object *obj,
 	struct json_value arg = {
 		.type = JSON_TYPE_STRING,
 	};
+	union {
+		const char *a;
+		char *b;
+	} string;
 
-	arg.string = strdup(val ? : "");
+	string.a = val ? val : "";
+	arg.string = string.b;
 	return json_object_add_value_type(obj, name, &arg);
 }
 
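The `json.h` change drops the `strdup()` and uses a union to store a `const char *` in the non-const `arg.string` member without an explicit `(char *)` cast, presumably to avoid cast-qualifier warnings. A stand-alone sketch of the same idiom, with a hypothetical struct in place of `struct json_value`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct json_value's string member. */
struct demo_value {
	char *string;
};

/*
 * Store 'val' (or "" when NULL) without copying it and without an
 * explicit (char *) cast: write the const member, read the non-const one.
 */
static void demo_set_string(struct demo_value *v, const char *val)
{
	union {
		const char *a;
		char *b;
	} s;

	assert(v != NULL);
	s.a = val ? val : "";
	v->string = s.b;
}
```

The stored pointer must still be treated as read-only; writing through it would be undefined behaviour when it points at a string literal.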
diff --git a/server.c b/server.c
index 914a8c74..4c71bd44 100644
--- a/server.c
+++ b/server.c
@@ -1323,7 +1323,7 @@ static int handle_xmits(struct sk_out *sk_out)
 	sk_unlock(sk_out);
 
 	while (!flist_empty(&list)) {
-		entry = flist_entry(list.next, struct sk_entry, list);
+		entry = flist_first_entry(&list, struct sk_entry, list);
 		flist_del(&entry->list);
 		ret += handle_sk_entry(sk_out, entry);
 	}
diff --git a/stat.c b/stat.c
index 356083e2..949af5ed 100644
--- a/stat.c
+++ b/stat.c
@@ -1,5 +1,6 @@
 #include <stdio.h>
 #include <string.h>
+#include <stdlib.h>
 #include <sys/time.h>
 #include <sys/stat.h>
 #include <math.h>
@@ -1698,6 +1699,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	if (je) {
 		json_object_add_value_int(root, "eta", je->eta_sec);
 		json_object_add_value_int(root, "elapsed", je->elapsed_sec);
+		free(je);
 	}
 
 	if (opt_list)
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 612e50ca..ecceb67e 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -546,9 +546,10 @@ class FioJobTest_iops_rate(FioJobTest):
             return
 
         iops1 = self.json_data['jobs'][0]['read']['iops']
+        logging.debug("Test %d: iops1: %f", self.testnum, iops1)
         iops2 = self.json_data['jobs'][1]['read']['iops']
+        logging.debug("Test %d: iops2: %f", self.testnum, iops2)
         ratio = iops2 / iops1
-        logging.debug("Test %d: iops1: %f", self.testnum, iops1)
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
         if iops1 < 950 or iops1 > 1050:

* Recent changes (master)
@ 2022-04-11 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-11 12:00 UTC
  To: fio

The following changes since commit 6d01ac19170fadaf46a6db6b4cc347f1b389f422:

  iolog: Use %llu for 64-bit (2022-04-08 12:46:44 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d684bb2839d1fa010fba1e64f9b0c16240d8bdae:

  Merge branch 'fix/remove-sudo-in-test-script' of https://github.com/dpronin/fio (2022-04-10 15:18:42 -0600)

----------------------------------------------------------------
Denis Pronin (1):
      actions-full-test.sh, removed sudo from the script

Jens Axboe (1):
      Merge branch 'fix/remove-sudo-in-test-script' of https://github.com/dpronin/fio

 ci/actions-full-test.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/ci/actions-full-test.sh b/ci/actions-full-test.sh
index 91790664..8282002f 100755
--- a/ci/actions-full-test.sh
+++ b/ci/actions-full-test.sh
@@ -6,9 +6,9 @@ main() {
     echo "Running long running tests..."
     export PYTHONUNBUFFERED="TRUE"
     if [[ "${CI_TARGET_ARCH}" == "arm64" ]]; then
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
+        python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
     else
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
+        python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
     fi
     make -C doc html
 }

* Recent changes (master)
@ 2022-04-09 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-09 12:00 UTC
  To: fio

The following changes since commit a3e48f483db27d20e02cbd81e3a8f18c6c5c50f5:

  Fio 3.30 (2022-04-06 17:10:00 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6d01ac19170fadaf46a6db6b4cc347f1b389f422:

  iolog: Use %llu for 64-bit (2022-04-08 12:46:44 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      iolog: fix warning for 32-bit compilation
      iolog: Use %llu for 64-bit

Mohamad Gebai (3):
      iolog: add version 3 to support timestamp-based replay
      iolog: add iolog_write for version 3
      iolog: update man page for version 3

 HOWTO.rst  |  29 +++++++++++++++-
 blktrace.c |  17 ++--------
 fio.1      |  35 +++++++++++++++++++-
 fio.h      |   4 ++-
 iolog.c    | 109 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
 iolog.h    |   8 ++---
 6 files changed, 158 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index 0978879c..a5fa432e 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -4398,7 +4398,9 @@ given in bytes. The `action` can be one of these:
 
 **wait**
 	   Wait for `offset` microseconds. Everything below 100 is discarded.
-	   The time is relative to the previous `wait` statement.
+	   The time is relative to the previous `wait` statement. Note that
+	   action `wait` is not allowed as of version 3, as the same behavior
+	   can be achieved using timestamps.
 **read**
 	   Read `length` bytes beginning from `offset`.
 **write**
@@ -4411,6 +4413,31 @@ given in bytes. The `action` can be one of these:
 	   Trim the given file from the given `offset` for `length` bytes.
 
 
+Trace file format v3
+~~~~~~~~~~~~~~~~~~~~
+
+The third version of the trace file format was added in fio version 3.31. It
+forces each action to have a timestamp associated with it.
+
+The first line of the trace file has to be::
+
+    fio version 3 iolog
+
+The header line can be followed by lines in two different formats, described below.
+
+The file management format::
+
+    timestamp filename action
+
+The file I/O action format::
+
+    timestamp filename action offset length
+
+The `timestamp` is relative to the beginning of the run (i.e. it starts at 0). The
+`filename`, `action`, `offset` and `length` are identical to version 2, except
+that version 3 does not allow the `wait` action.
+
+
 I/O Replay - Merging Traces
 ---------------------------
 
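A minimal parser for the two version-3 line formats described above might look like this. It is a sketch, not fio's parser: the struct and function names are hypothetical, file names containing whitespace are not handled, and because the text does not state the unit of `timestamp` it is kept as a raw integer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one v3 iolog line. */
struct iolog3_entry {
	unsigned long long timestamp;
	char filename[256];
	char action[16];
	unsigned long long offset;
	unsigned int length;
	int has_io;	/* 1 for I/O actions, 0 for file management actions */
};

/*
 * Parse one line that follows the "fio version 3 iolog" header.
 * Returns 0 on success, -1 on a malformed line or a 'wait' action,
 * which version 3 no longer allows.
 */
static int parse_iolog3_line(const char *line, struct iolog3_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));

	n = sscanf(line, "%llu %255s %15s %llu %u", &e->timestamp,
		   e->filename, e->action, &e->offset, &e->length);
	if (n == 5)
		e->has_io = 1;		/* timestamp filename action offset length */
	else if (n != 3)
		return -1;		/* neither of the two v3 formats */

	if (!strcmp(e->action, "wait"))
		return -1;
	return 0;
}
```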
diff --git a/blktrace.c b/blktrace.c
index ead60130..619121c7 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -313,25 +313,14 @@ static bool queue_trace(struct thread_data *td, struct blk_io_trace *t,
 			 unsigned long *ios, unsigned long long *bs,
 			 struct file_cache *cache)
 {
-	unsigned long long *last_ttime = &td->io_log_blktrace_last_ttime;
+	unsigned long long *last_ttime = &td->io_log_last_ttime;
 	unsigned long long delay = 0;
 
 	if ((t->action & 0xffff) != __BLK_TA_QUEUE)
 		return false;
 
 	if (!(t->action & BLK_TC_ACT(BLK_TC_NOTIFY))) {
-		if (!*last_ttime || td->o.no_stall || t->time < *last_ttime)
-			delay = 0;
-		else if (td->o.replay_time_scale == 100)
-			delay = t->time - *last_ttime;
-		else {
-			double tmp = t->time - *last_ttime;
-			double scale;
-
-			scale = (double) 100.0 / (double) td->o.replay_time_scale;
-			tmp *= scale;
-			delay = tmp;
-		}
+		delay = delay_since_ttime(td, t->time);
 		*last_ttime = t->time;
 	}
 
@@ -422,7 +411,7 @@ bool init_blktrace_read(struct thread_data *td, const char *filename, int need_s
 		goto err;
 	}
 	td->io_log_blktrace_swap = need_swap;
-	td->io_log_blktrace_last_ttime = 0;
+	td->io_log_last_ttime = 0;
 	td->o.size = 0;
 
 	free_release_files(td);
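The arithmetic removed from `queue_trace()` is now reached through `delay_since_ttime()`, whose body is not part of this excerpt. A sketch of the removed rule as a stand-alone function (hypothetical name), where `replay_time_scale` is a percentage and 100 means the original timing:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the replay-delay rule removed from queue_trace():
 * no delay for the first entry, when stalls are disabled, or when time
 * went backwards; otherwise the gap scaled by 100 / replay_time_scale.
 */
static unsigned long long replay_delay(unsigned long long now,
				       unsigned long long last,
				       bool no_stall,
				       unsigned int replay_time_scale)
{
	double scaled;

	assert(replay_time_scale > 0);

	if (!last || no_stall || now < last)
		return 0;
	if (replay_time_scale == 100)
		return now - last;

	scaled = (double)(now - last) * (100.0 / (double)replay_time_scale);
	return (unsigned long long)scaled;
}
```

For example, a 500-unit gap between two trace events becomes 250 with `replay_time_scale=200` and 1000 with `replay_time_scale=50`.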
diff --git a/fio.1 b/fio.1
index 98410655..a2ec836f 100644
--- a/fio.1
+++ b/fio.1
@@ -4117,7 +4117,9 @@ given in bytes. The `action' can be one of these:
 .TP
 .B wait
 Wait for `offset' microseconds. Everything below 100 is discarded.
-The time is relative to the previous `wait' statement.
+The time is relative to the previous `wait' statement. Note that action `wait'
+is not allowed as of version 3, as the same behavior can be achieved using
+timestamps.
 .TP
 .B read
 Read `length' bytes beginning from `offset'.
@@ -4135,6 +4137,37 @@ Write `length' bytes beginning from `offset'.
 Trim the given file from the given `offset' for `length' bytes.
 .RE
 .RE
+.RE
+.TP
+.B Trace file format v3
+The third version of the trace file format was added in fio version 3.31. It
+forces each action to have a timestamp associated with it.
+.RS
+.P
+The first line of the trace file has to be:
+.RS
+.P
+"fio version 3 iolog"
+.RE
+.P
+Following this can be lines in two different formats, which are described below.
+.P
+.B
+The file management format:
+.RS
+timestamp filename action
+.P
+.RE
+.B
+The file I/O action format:
+.RS
+timestamp filename action offset length
+.P
+The `timestamp` is relative to the beginning of the run (ie starts at 0). The
+`filename`, `action`, `offset` and `length`  are identical to version 2, except
+that version 3 does not allow the `wait` action.
+.RE
+.RE
 .SH I/O REPLAY \- MERGING TRACES
 Colocation is a common practice used to get the most out of a machine.
 Knowing which workloads play nicely with each other and which ones don't is
diff --git a/fio.h b/fio.h
index 776fb51f..de7eca79 100644
--- a/fio.h
+++ b/fio.h
@@ -431,10 +431,12 @@ struct thread_data {
 	FILE *io_log_rfile;
 	unsigned int io_log_blktrace;
 	unsigned int io_log_blktrace_swap;
-	unsigned long long io_log_blktrace_last_ttime;
+	unsigned long long io_log_last_ttime;
+	struct timespec io_log_start_time;
 	unsigned int io_log_current;
 	unsigned int io_log_checkmark;
 	unsigned int io_log_highmark;
+	unsigned int io_log_version;
 	struct timespec io_log_highmark_time;
 
 	/*
diff --git a/iolog.c b/iolog.c
index 724ec1fe..37e799a1 100644
--- a/iolog.c
+++ b/iolog.c
@@ -31,6 +31,7 @@
 static int iolog_flush(struct io_log *log);
 
 static const char iolog_ver2[] = "fio version 2 iolog";
+static const char iolog_ver3[] = "fio version 3 iolog";
 
 void queue_io_piece(struct thread_data *td, struct io_piece *ipo)
 {
@@ -40,18 +41,24 @@ void queue_io_piece(struct thread_data *td, struct io_piece *ipo)
 
 void log_io_u(const struct thread_data *td, const struct io_u *io_u)
 {
+	struct timespec now;
+
 	if (!td->o.write_iolog_file)
 		return;
 
-	fprintf(td->iolog_f, "%s %s %llu %llu\n", io_u->file->file_name,
-						io_ddir_name(io_u->ddir),
-						io_u->offset, io_u->buflen);
+	fio_gettime(&now, NULL);
+	fprintf(td->iolog_f, "%llu %s %s %llu %llu\n",
+		(unsigned long long) utime_since_now(&td->io_log_start_time),
+		io_u->file->file_name, io_ddir_name(io_u->ddir), io_u->offset,
+		io_u->buflen);
+
 }
 
 void log_file(struct thread_data *td, struct fio_file *f,
 	      enum file_log_act what)
 {
 	const char *act[] = { "add", "open", "close" };
+	struct timespec now;
 
 	assert(what < 3);
 
@@ -65,7 +72,10 @@ void log_file(struct thread_data *td, struct fio_file *f,
 	if (!td->iolog_f)
 		return;
 
-	fprintf(td->iolog_f, "%s %s\n", f->file_name, act[what]);
+	fio_gettime(&now, NULL);
+	fprintf(td->iolog_f, "%llu %s %s\n",
+		(unsigned long long) utime_since_now(&td->io_log_start_time),
+		f->file_name, act[what]);
 }
 
 static void iolog_delay(struct thread_data *td, unsigned long delay)
@@ -116,6 +126,10 @@ static int ipo_special(struct thread_data *td, struct io_piece *ipo)
 
 	f = td->files[ipo->fileno];
 
+	if (ipo->delay)
+		iolog_delay(td, ipo->delay);
+	if (fio_fill_issue_time(td))
+		fio_gettime(&td->last_issue, NULL);
 	switch (ipo->file_action) {
 	case FIO_LOG_OPEN_FILE:
 		if (td->o.replay_redirect && fio_file_open(f)) {
@@ -134,6 +148,11 @@ static int ipo_special(struct thread_data *td, struct io_piece *ipo)
 	case FIO_LOG_UNLINK_FILE:
 		td_io_unlink_file(td, f);
 		break;
+	case FIO_LOG_ADD_FILE:
+		/*
+		 * Nothing to do
+		 */
+		break;
 	default:
 		log_err("fio: bad file action %d\n", ipo->file_action);
 		break;
@@ -142,7 +161,25 @@ static int ipo_special(struct thread_data *td, struct io_piece *ipo)
 	return 1;
 }
 
-static bool read_iolog2(struct thread_data *td);
+static bool read_iolog(struct thread_data *td);
+
+unsigned long long delay_since_ttime(const struct thread_data *td,
+	       unsigned long long time)
+{
+	double tmp;
+	double scale;
+	const unsigned long long *last_ttime = &td->io_log_last_ttime;
+
+	if (!*last_ttime || td->o.no_stall || time < *last_ttime)
+		return 0;
+	else if (td->o.replay_time_scale == 100)
+		return time - *last_ttime;
+
+
+	scale = (double) 100.0 / (double) td->o.replay_time_scale;
+	tmp = time - *last_ttime;
+	return tmp * scale;
+}
 
 int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 {
@@ -158,7 +195,7 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 					if (!read_blktrace(td))
 						return 1;
 				} else {
-					if (!read_iolog2(td))
+					if (!read_iolog(td))
 						return 1;
 				}
 			}
@@ -388,14 +425,20 @@ int64_t iolog_items_to_fetch(struct thread_data *td)
 	return items_to_fetch;
 }
 
+#define io_act(_td, _r) (((_td)->io_log_version == 3 && (r) == 5) || \
+					((_td)->io_log_version == 2 && (r) == 4))
+#define file_act(_td, _r) (((_td)->io_log_version == 3 && (r) == 3) || \
+					((_td)->io_log_version == 2 && (r) == 2))
+
 /*
- * Read version 2 iolog data. It is enhanced to include per-file logging,
+ * Read version 2 and 3 iolog data. It is enhanced to include per-file logging,
  * syncs, etc.
  */
-static bool read_iolog2(struct thread_data *td)
+static bool read_iolog(struct thread_data *td)
 {
 	unsigned long long offset;
 	unsigned int bytes;
+	unsigned long long delay = 0;
 	int reads, writes, waits, fileno = 0, file_action = 0; /* stupid gcc */
 	char *rfname, *fname, *act;
 	char *str, *p;
@@ -422,14 +465,28 @@ static bool read_iolog2(struct thread_data *td)
 	while ((p = fgets(str, 4096, td->io_log_rfile)) != NULL) {
 		struct io_piece *ipo;
 		int r;
+		unsigned long long ttime;
 
-		r = sscanf(p, "%256s %256s %llu %u", rfname, act, &offset,
-									&bytes);
+		if (td->io_log_version == 3) {
+			r = sscanf(p, "%llu %256s %256s %llu %u", &ttime, rfname, act,
+							&offset, &bytes);
+			delay = delay_since_ttime(td, ttime);
+			td->io_log_last_ttime = ttime;
+			/*
+			 * "wait" is not allowed with version 3
+			 */
+			if (!strcmp(act, "wait")) {
+				log_err("iolog: ignoring wait command with"
+					" version 3 for file %s\n", fname);
+				continue;
+			}
+		} else /* version 2 */
+			r = sscanf(p, "%256s %256s %llu %u", rfname, act, &offset, &bytes);
 
 		if (td->o.replay_redirect)
 			fname = td->o.replay_redirect;
 
-		if (r == 4) {
+		if (io_act(td, r)) {
 			/*
 			 * Check action first
 			 */
@@ -451,7 +508,7 @@ static bool read_iolog2(struct thread_data *td)
 				continue;
 			}
 			fileno = get_fileno(td, fname);
-		} else if (r == 2) {
+		} else if (file_act(td, r)) {
 			rw = DDIR_INVAL;
 			if (!strcmp(act, "add")) {
 				if (td->o.replay_redirect &&
@@ -462,7 +519,6 @@ static bool read_iolog2(struct thread_data *td)
 					fileno = add_file(td, fname, td->subjob_number, 1);
 					file_action = FIO_LOG_ADD_FILE;
 				}
-				continue;
 			} else if (!strcmp(act, "open")) {
 				fileno = get_fileno(td, fname);
 				file_action = FIO_LOG_OPEN_FILE;
@@ -475,7 +531,7 @@ static bool read_iolog2(struct thread_data *td)
 				continue;
 			}
 		} else {
-			log_err("bad iolog2: %s\n", p);
+			log_err("bad iolog%d: %s\n", td->io_log_version, p);
 			continue;
 		}
 
@@ -506,6 +562,8 @@ static bool read_iolog2(struct thread_data *td)
 		ipo = calloc(1, sizeof(*ipo));
 		init_ipo(ipo);
 		ipo->ddir = rw;
+		if (td->io_log_version == 3)
+			ipo->delay = delay;
 		if (rw == DDIR_WAIT) {
 			ipo->delay = offset;
 		} else {
@@ -650,18 +708,22 @@ static bool init_iolog_read(struct thread_data *td, char *fname)
 	}
 
 	/*
-	 * version 2 of the iolog stores a specific string as the
+	 * versions 2 and 3 of the iolog store a specific string as the
 	 * first line, check for that
 	 */
-	if (!strncmp(iolog_ver2, buffer, strlen(iolog_ver2))) {
-		free_release_files(td);
-		td->io_log_rfile = f;
-		return read_iolog2(td);
+	if (!strncmp(iolog_ver2, buffer, strlen(iolog_ver2)))
+		td->io_log_version = 2;
+	else if (!strncmp(iolog_ver3, buffer, strlen(iolog_ver3)))
+		td->io_log_version = 3;
+	else {
+		log_err("fio: iolog version 1 is no longer supported\n");
+		fclose(f);
+		return false;
 	}
 
-	log_err("fio: iolog version 1 is no longer supported\n");
-	fclose(f);
-	return false;
+	free_release_files(td);
+	td->io_log_rfile = f;
+	return read_iolog(td);
 }
 
 /*
@@ -685,11 +747,12 @@ static bool init_iolog_write(struct thread_data *td)
 	td->iolog_f = f;
 	td->iolog_buf = malloc(8192);
 	setvbuf(f, td->iolog_buf, _IOFBF, 8192);
+	fio_gettime(&td->io_log_start_time, NULL);
 
 	/*
 	 * write our version line
 	 */
-	if (fprintf(f, "%s\n", iolog_ver2) < 0) {
+	if (fprintf(f, "%s\n", iolog_ver3) < 0) {
 		perror("iolog init\n");
 		return false;
 	}
diff --git a/iolog.h b/iolog.h
index a3986309..62cbd1b0 100644
--- a/iolog.h
+++ b/iolog.h
@@ -227,10 +227,8 @@ struct io_piece {
 	unsigned long len;
 	unsigned int flags;
 	enum fio_ddir ddir;
-	union {
-		unsigned long delay;
-		unsigned int file_action;
-	};
+	unsigned long delay;
+	unsigned int file_action;
 };
 
 /*
@@ -259,6 +257,8 @@ extern int iolog_compress_init(struct thread_data *, struct sk_out *);
 extern void iolog_compress_exit(struct thread_data *);
 extern size_t log_chunk_sizes(struct io_log *);
 extern int init_io_u_buffers(struct thread_data *);
+extern unsigned long long delay_since_ttime(const struct thread_data *,
+					     unsigned long long);
 
 #ifdef CONFIG_ZLIB
 extern int iolog_file_inflate(const char *);

* Recent changes (master)
@ 2022-04-07 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-07 12:00 UTC
  To: fio

The following changes since commit 06bbdc1cb857a11e6d1b7c089126397daca904fe:

  smalloc: fix ptr address in redzone error message (2022-04-05 11:47:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a3e48f483db27d20e02cbd81e3a8f18c6c5c50f5:

  Fio 3.30 (2022-04-06 17:10:00 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.30

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 60f7bb21..fa64f50f 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.29
+DEF_VER=fio-3.30
 
 LF='
 '

* Recent changes (master)
@ 2022-04-06 12:00 Jens Axboe
From: Jens Axboe @ 2022-04-06 12:00 UTC
  To: fio

The following changes since commit 87933e32e356b15b85c6d9775d5e840994080a4f:

  Rename 'fallthrough' attribute to 'fio_fallthrough' (2022-03-30 17:31:36 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 06bbdc1cb857a11e6d1b7c089126397daca904fe:

  smalloc: fix ptr address in redzone error message (2022-04-05 11:47:35 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      smalloc: fix ptr address in redzone error message

 smalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/smalloc.c b/smalloc.c
index fa00f0ee..23243054 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -283,13 +283,13 @@ static void sfree_check_redzone(struct block_hdr *hdr)
 	if (hdr->prered != SMALLOC_PRE_RED) {
 		log_err("smalloc pre redzone destroyed!\n"
 			" ptr=%p, prered=%x, expected %x\n",
-				hdr, hdr->prered, SMALLOC_PRE_RED);
+				hdr+1, hdr->prered, SMALLOC_PRE_RED);
 		assert(0);
 	}
 	if (*postred != SMALLOC_POST_RED) {
 		log_err("smalloc post redzone destroyed!\n"
 			"  ptr=%p, postred=%x, expected %x\n",
-				hdr, *postred, SMALLOC_POST_RED);
+				hdr+1, *postred, SMALLOC_POST_RED);
 		assert(0);
 	}
 }

* Recent changes (master)
@ 2022-03-31 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-31 12:00 UTC
  To: fio

The following changes since commit 5e644771eb91e91dd0fa32f4b51f90c44853a2b1:

  Merge branch 'status-interval-finished-jobs' of https://github.com/mmkayPL/fio (2022-03-29 06:30:44 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 87933e32e356b15b85c6d9775d5e840994080a4f:

  Rename 'fallthrough' attribute to 'fio_fallthrough' (2022-03-30 17:31:36 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Rename 'fallthrough' attribute to 'fio_fallthrough'

 compiler/compiler.h |  4 ++--
 crc/murmur3.c       |  4 ++--
 engines/http.c      |  2 +-
 hash.h              | 24 ++++++++++++------------
 init.c              |  2 +-
 io_u.c              | 10 +++++-----
 lib/lfsr.c          | 32 ++++++++++++++++----------------
 parse.c             |  4 ++--
 t/lfsr-test.c       |  6 +++---
 9 files changed, 44 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index 3fd0822f..fefadeaa 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -72,9 +72,9 @@
 #endif
 
 #if __has_attribute(__fallthrough__)
-#define fallthrough	 __attribute__((__fallthrough__))
+#define fio_fallthrough	 __attribute__((__fallthrough__))
 #else
-#define fallthrough	do {} while (0)  /* fallthrough */
+#define fio_fallthrough	do {} while (0)  /* fallthrough */
 #endif
 
 #endif
diff --git a/crc/murmur3.c b/crc/murmur3.c
index ba408a9e..08660bc8 100644
--- a/crc/murmur3.c
+++ b/crc/murmur3.c
@@ -30,10 +30,10 @@ static uint32_t murmur3_tail(const uint8_t *data, const int nblocks,
 	switch (len & 3) {
 	case 3:
 		k1 ^= tail[2] << 16;
-		fallthrough;
+		fio_fallthrough;
 	case 2:
 		k1 ^= tail[1] << 8;
-		fallthrough;
+		fio_fallthrough;
 	case 1:
 		k1 ^= tail[0];
 		k1 *= c1;
diff --git a/engines/http.c b/engines/http.c
index 57d4967d..696febe1 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -297,7 +297,7 @@ static int _curl_trace(CURL *handle, curl_infotype type,
 	switch (type) {
 	case CURLINFO_TEXT:
 		fprintf(stderr, "== Info: %s", data);
-		fallthrough;
+		fio_fallthrough;
 	default:
 	case CURLINFO_SSL_DATA_OUT:
 	case CURLINFO_SSL_DATA_IN:
diff --git a/hash.h b/hash.h
index 2c04bc29..f7596a56 100644
--- a/hash.h
+++ b/hash.h
@@ -142,20 +142,20 @@ static inline uint32_t jhash(const void *key, uint32_t length, uint32_t initval)
 	/* Last block: affect all 32 bits of (c) */
 	/* All the case statements fall through */
 	switch (length) {
-	case 12: c += (uint32_t) k[11] << 24;	fallthrough;
-	case 11: c += (uint32_t) k[10] << 16;	fallthrough;
-	case 10: c += (uint32_t) k[9] << 8;	fallthrough;
-	case 9:  c += k[8];			fallthrough;
-	case 8:  b += (uint32_t) k[7] << 24;	fallthrough;
-	case 7:  b += (uint32_t) k[6] << 16;	fallthrough;
-	case 6:  b += (uint32_t) k[5] << 8;	fallthrough;
-	case 5:  b += k[4];			fallthrough;
-	case 4:  a += (uint32_t) k[3] << 24;	fallthrough;
-	case 3:  a += (uint32_t) k[2] << 16;	fallthrough;
-	case 2:  a += (uint32_t) k[1] << 8;	fallthrough;
+	case 12: c += (uint32_t) k[11] << 24;	fio_fallthrough;
+	case 11: c += (uint32_t) k[10] << 16;	fio_fallthrough;
+	case 10: c += (uint32_t) k[9] << 8;	fio_fallthrough;
+	case 9:  c += k[8];			fio_fallthrough;
+	case 8:  b += (uint32_t) k[7] << 24;	fio_fallthrough;
+	case 7:  b += (uint32_t) k[6] << 16;	fio_fallthrough;
+	case 6:  b += (uint32_t) k[5] << 8;	fio_fallthrough;
+	case 5:  b += k[4];			fio_fallthrough;
+	case 4:  a += (uint32_t) k[3] << 24;	fio_fallthrough;
+	case 3:  a += (uint32_t) k[2] << 16;	fio_fallthrough;
+	case 2:  a += (uint32_t) k[1] << 8;	fio_fallthrough;
 	case 1:  a += k[0];
 		 __jhash_final(a, b, c);
-		 fallthrough;
+		 fio_fallthrough;
 	case 0: /* Nothing left to add */
 		break;
 	}
diff --git a/init.c b/init.c
index b7f866e6..6f186051 100644
--- a/init.c
+++ b/init.c
@@ -2990,7 +2990,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			log_err("%s: unrecognized option '%s'\n", argv[0],
 							argv[optind - 1]);
 			show_closest_option(argv[optind - 1]);
-			fallthrough;
+			fio_fallthrough;
 		default:
 			do_exit++;
 			exit_val = 1;
diff --git a/io_u.c b/io_u.c
index 50197a4b..eec378dd 100644
--- a/io_u.c
+++ b/io_u.c
@@ -993,7 +993,7 @@ static void __io_u_mark_map(uint64_t *map, unsigned int nr)
 		break;
 	case 1 ... 4:
 		idx = 1;
-		fallthrough;
+		fio_fallthrough;
 	case 0:
 		break;
 	}
@@ -1035,7 +1035,7 @@ void io_u_mark_depth(struct thread_data *td, unsigned int nr)
 		break;
 	case 2 ... 3:
 		idx = 1;
-		fallthrough;
+		fio_fallthrough;
 	case 1:
 		break;
 	}
@@ -1076,7 +1076,7 @@ static void io_u_mark_lat_nsec(struct thread_data *td, unsigned long long nsec)
 		break;
 	case 2 ... 3:
 		idx = 1;
-		fallthrough;
+		fio_fallthrough;
 	case 0 ... 1:
 		break;
 	}
@@ -1118,7 +1118,7 @@ static void io_u_mark_lat_usec(struct thread_data *td, unsigned long long usec)
 		break;
 	case 2 ... 3:
 		idx = 1;
-		fallthrough;
+		fio_fallthrough;
 	case 0 ... 1:
 		break;
 	}
@@ -1166,7 +1166,7 @@ static void io_u_mark_lat_msec(struct thread_data *td, unsigned long long msec)
 		break;
 	case 2 ... 3:
 		idx = 1;
-		fallthrough;
+		fio_fallthrough;
 	case 0 ... 1:
 		break;
 	}
diff --git a/lib/lfsr.c b/lib/lfsr.c
index a32e850a..e86086c4 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -88,37 +88,37 @@ static inline void __lfsr_next(struct fio_lfsr *fl, unsigned int spin)
 	 */
 	switch (spin) {
 		case 15: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case 14: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case 13: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case 12: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case 11: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case 10: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  9: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  8: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  7: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  6: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  5: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  4: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  3: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  2: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  1: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		case  0: __LFSR_NEXT(fl, fl->last_val);
-		fallthrough;
+		fio_fallthrough;
 		default: break;
 	}
 }
diff --git a/parse.c b/parse.c
index e0bee004..656a5025 100644
--- a/parse.c
+++ b/parse.c
@@ -601,7 +601,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	}
 	case FIO_OPT_STR_VAL_TIME:
 		is_time = 1;
-		fallthrough;
+		fio_fallthrough;
 	case FIO_OPT_ULL:
 	case FIO_OPT_INT:
 	case FIO_OPT_STR_VAL:
@@ -980,7 +980,7 @@ store_option_value:
 	}
 	case FIO_OPT_DEPRECATED:
 		ret = 1;
-		fallthrough;
+		fio_fallthrough;
 	case FIO_OPT_SOFT_DEPRECATED:
 		log_info("Option %s is deprecated\n", o->name);
 		break;
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index 279e07f0..4b255e19 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -41,11 +41,11 @@ int main(int argc, char *argv[])
 	switch (argc) {
 		case 5: if (strncmp(argv[4], "verify", 7) == 0)
 				verify = 1;
-			fallthrough;
+			fio_fallthrough;
 		case 4: spin = atoi(argv[3]);
-			fallthrough;
+			fio_fallthrough;
 		case 3: seed = atol(argv[2]);
-			fallthrough;
+			fio_fallthrough;
 		case 2: numbers = strtol(argv[1], NULL, 16);
 				break;
 		default: usage();

* Recent changes (master)
@ 2022-03-30 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-30 12:00 UTC
  To: fio

The following changes since commit a57d3fdce796f1bb516c74db95d016bb6db170c1:

  Merge branch 'master' of https://github.com/cccheng/fio (2022-03-28 06:43:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5e644771eb91e91dd0fa32f4b51f90c44853a2b1:

  Merge branch 'status-interval-finished-jobs' of https://github.com/mmkayPL/fio (2022-03-29 06:30:44 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'status-interval-finished-jobs' of https://github.com/mmkayPL/fio

Kozlowski Mateusz (1):
      Handle finished jobs when using status-interval

 stat.c | 6 ++++++
 1 file changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 7947edb4..356083e2 100644
--- a/stat.c
+++ b/stat.c
@@ -2731,6 +2731,9 @@ int __show_running_run_stats(void)
 	fio_gettime(&ts, NULL);
 
 	for_each_td(td, i) {
+		if (td->runstate >= TD_EXITED)
+			continue;
+
 		td->update_rusage = 1;
 		for_each_rw_ddir(ddir) {
 			td->ts.io_bytes[ddir] = td->io_bytes[ddir];
@@ -2759,6 +2762,9 @@ int __show_running_run_stats(void)
 	__show_run_stats();
 
 	for_each_td(td, i) {
+		if (td->runstate >= TD_EXITED)
+			continue;
+
 		if (td_read(td) && td->ts.io_bytes[DDIR_READ])
 			td->ts.runtime[DDIR_READ] -= rt[i];
 		if (td_write(td) && td->ts.io_bytes[DDIR_WRITE])

* Recent changes (master)
@ 2022-03-29 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-29 12:00 UTC
  To: fio

The following changes since commit e3de2e7fe2889942d46699e72ac06b96eab09e27:

  Merge branch 'github-1372' of https://github.com/vincentkfu/fio (2022-03-24 10:11:34 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a57d3fdce796f1bb516c74db95d016bb6db170c1:

  Merge branch 'master' of https://github.com/cccheng/fio (2022-03-28 06:43:56 -0600)

----------------------------------------------------------------
Chung-Chiang Cheng (1):
      Fix compile error of GCC 4

Jens Axboe (1):
      Merge branch 'master' of https://github.com/cccheng/fio

 compiler/compiler.h | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index 44fa87b9..3fd0822f 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -67,6 +67,7 @@
 #endif
 
 #ifndef __has_attribute
+#define __has_attribute(x) __GCC4_has_attribute_##x
 #define __GCC4_has_attribute___fallthrough__	0
 #endif
 

* Recent changes (master)
@ 2022-03-25 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-25 12:00 UTC
  To: fio

The following changes since commit c822572d68e326384ce179b9484de0e4abf3d514:

  engines/null: use correct -include (2022-03-20 09:31:20 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e3de2e7fe2889942d46699e72ac06b96eab09e27:

  Merge branch 'github-1372' of https://github.com/vincentkfu/fio (2022-03-24 10:11:34 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'github-1372' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      io_u: produce bad offsets for some time_based jobs

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 806ceb77..50197a4b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -355,7 +355,7 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 	 * and invalidate the cache, if we need to.
 	 */
 	if (f->last_pos[ddir] >= f->io_size + get_start_offset(td, f) &&
-	    o->time_based) {
+	    o->time_based && o->nr_files == 1) {
 		f->last_pos[ddir] = f->file_offset;
 		loop_cache_invalidate(td, f);
 	}

* Recent changes (master)
@ 2022-03-21 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-21 12:00 UTC
  To: fio

The following changes since commit 1953e1adb5a28ed21370e85991d7f5c3cdc699f3:

  Merge branch 'flags-fix' of https://github.com/albertofaria/fio (2022-03-15 17:21:41 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c822572d68e326384ce179b9484de0e4abf3d514:

  engines/null: use correct -include (2022-03-20 09:31:20 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      engines/null: update external engine compilation
      Merge branch 'master' of https://github.com/jnoc/fio
      engines/null: use correct -include

Jonathon Carter (1):
      Added citation.cff for easy APA/BibTeX citation directly from the Github repository

 CITATION.cff   | 11 +++++++++++
 engines/null.c |  7 ++++---
 2 files changed, 15 insertions(+), 3 deletions(-)
 create mode 100644 CITATION.cff

---

Diff of recent changes:

diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 00000000..3df315e5
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,11 @@
+cff-version: 1.2.0
+preferred-citation:
+  type: software
+  authors:
+  - family-names: "Axboe"
+    given-names: "Jens"
+    email: axboe@kernel.dk
+  title: "Flexible I/O Tester"
+  year: 2022
+  url: "https://github.com/axboe/fio"
+licence: GNU GPL v2.0
diff --git a/engines/null.c b/engines/null.c
index 4cc0102b..8dcd1b21 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -6,7 +6,8 @@
  *
  * It also can act as external C++ engine - compiled with:
  *
- * g++ -O2 -g -shared -rdynamic -fPIC -o cpp_null null.c -DFIO_EXTERNAL_ENGINE
+ * g++ -O2 -g -shared -rdynamic -fPIC -o cpp_null null.c \
+ *	-include ../config-host.h -DFIO_EXTERNAL_ENGINE
  *
  * to test it execute:
  *
@@ -201,7 +202,7 @@ struct NullData {
 		return null_commit(td, impl_);
 	}
 
-	int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+	fio_q_status fio_null_queue(struct thread_data *td, struct io_u *io_u)
 	{
 		return null_queue(td, impl_, io_u);
 	}
@@ -233,7 +234,7 @@ static int fio_null_commit(struct thread_data *td)
 	return NullData::get(td)->fio_null_commit(td);
 }
 
-static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+static fio_q_status fio_null_queue(struct thread_data *td, struct io_u *io_u)
 {
 	return NullData::get(td)->fio_null_queue(td, io_u);
 }

* Recent changes (master)
@ 2022-03-16 12:00 Jens Axboe
From: Jens Axboe @ 2022-03-16 12:00 UTC
  To: fio

The following changes since commit 1fe261a24794f60bf374cd1852e09ec56997a20a:

  t/dedupe: ensure that 'ret' is initialized (2022-03-11 06:15:53 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1953e1adb5a28ed21370e85991d7f5c3cdc699f3:

  Merge branch 'flags-fix' of https://github.com/albertofaria/fio (2022-03-15 17:21:41 -0600)

----------------------------------------------------------------
Alberto Faria (1):
      Properly encode engine flags in thread_data::flags

Jens Axboe (1):
      Merge branch 'flags-fix' of https://github.com/albertofaria/fio

 fio.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index c314f0a8..776fb51f 100644
--- a/fio.h
+++ b/fio.h
@@ -184,7 +184,7 @@ struct zone_split_index {
  */
 struct thread_data {
 	struct flist_head opt_list;
-	unsigned long flags;
+	unsigned long long flags;
 	struct thread_options o;
 	void *eo;
 	pthread_t thread;
@@ -681,12 +681,12 @@ enum {
 };
 
 #define TD_ENG_FLAG_SHIFT	18
-#define TD_ENG_FLAG_MASK	((1U << 18) - 1)
+#define TD_ENG_FLAG_MASK	((1ULL << 18) - 1)
 
 static inline void td_set_ioengine_flags(struct thread_data *td)
 {
 	td->flags = (~(TD_ENG_FLAG_MASK << TD_ENG_FLAG_SHIFT) & td->flags) |
-		    (td->io_ops->flags << TD_ENG_FLAG_SHIFT);
+		    ((unsigned long long)td->io_ops->flags << TD_ENG_FLAG_SHIFT);
 }
 
 static inline bool td_ioengine_flagged(struct thread_data *td,

* Recent changes (master)
@ 2022-03-12 13:00 Jens Axboe
From: Jens Axboe @ 2022-03-12 13:00 UTC
  To: fio

The following changes since commit 16b1e24562347d371d6d62e0bb9a03ad4e2a8a96:

  t/dedupe: handle errors more gracefully (2022-03-11 05:09:20 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1fe261a24794f60bf374cd1852e09ec56997a20a:

  t/dedupe: ensure that 'ret' is initialized (2022-03-11 06:15:53 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/dedupe: ensure that 'ret' is initialized

 t/dedupe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/dedupe.c b/t/dedupe.c
index 561aa08d..d21e96f4 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -280,7 +280,7 @@ static int insert_chunks(struct item *items, unsigned int nitems,
 			 uint64_t *ndupes, uint64_t *unique_capacity,
 			 struct zlib_ctrl *zc)
 {
-	int i, ret;
+	int i, ret = 0;
 
 	fio_sem_down(rb_lock);
 

* Recent changes (master)
@ 2022-03-11 13:00 Jens Axboe
From: Jens Axboe @ 2022-03-11 13:00 UTC
  To: fio

The following changes since commit df0ab55ff9e28f4b85c199e207aec904f8a76440:

  Merge branch 'master' of https://github.com/dpronin/fio (2022-03-09 06:20:31 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 16b1e24562347d371d6d62e0bb9a03ad4e2a8a96:

  t/dedupe: handle errors more gracefully (2022-03-11 05:09:20 -0700)

----------------------------------------------------------------
Denis Pronin (4):
      configure script refactoring
      improvements in dup_files function
      fixed memory leak detected by ASAN
      ASAN enabling when configuring

Jens Axboe (7):
      Merge branch 'master' of https://github.com/dpronin/fio
      Merge branch 'refactoring/configure' of https://github.com/dpronin/fio
      Merge branch 'improvement/prevent-sigsegv-when-dup-files' of https://github.com/dpronin/fio
      Merge branch 'improvement/enable-asan' of https://github.com/dpronin/fio
      t/io_uring: only enable sync if we have preadv2
      Merge branch 'fuzz-cleanup' of https://github.com/vincentkfu/fio
      t/dedupe: handle errors more gracefully

Vincent Fu (1):
      fuzz: avoid building t/fuzz/parse_ini by default

 Makefile     |  8 +++++++-
 backend.c    |  6 ++++++
 configure    | 14 ++++++++++----
 filesetup.c  |  3 ++-
 t/dedupe.c   | 57 +++++++++++++++++++++++++++++++++++----------------------
 t/io_uring.c | 13 +++++++++++++
 6 files changed, 73 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 6ffd3d13..e670c1f2 100644
--- a/Makefile
+++ b/Makefile
@@ -385,14 +385,16 @@ T_MEMLOCK_PROGS = t/memlock
 T_TT_OBJS = t/time-test.o
 T_TT_PROGS = t/time-test
 
+ifneq (,$(findstring -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION,$(CFLAGS)))
 T_FUZZ_OBJS = t/fuzz/fuzz_parseini.o
 T_FUZZ_OBJS += $(OBJS)
 ifdef CONFIG_ARITHMETIC
 T_FUZZ_OBJS += lex.yy.o y.tab.o
 endif
+# For proper fio code teardown CFLAGS needs to include -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 # in case there is no fuzz driver defined by environment variable LIB_FUZZING_ENGINE, use a simple one
 # For instance, with compiler clang, address sanitizer and libFuzzer as a fuzzing engine, you should define
-# export CFLAGS="-fsanitize=address,fuzzer-no-link"
+# export CFLAGS="-fsanitize=address,fuzzer-no-link -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION"
 # export LIB_FUZZING_ENGINE="-fsanitize=address"
 # export CC=clang
 # before running configure && make
@@ -401,6 +403,10 @@ ifndef LIB_FUZZING_ENGINE
 T_FUZZ_OBJS += t/fuzz/onefile.o
 endif
 T_FUZZ_PROGS = t/fuzz/fuzz_parseini
+else	# CFLAGS includes -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+T_FUZZ_OBJS =
+T_FUZZ_PROGS =
+endif
 
 T_OBJS = $(T_SMALLOC_OBJS)
 T_OBJS += $(T_IEEE_OBJS)
diff --git a/backend.c b/backend.c
index cd7f4e5f..001b2b96 100644
--- a/backend.c
+++ b/backend.c
@@ -2432,7 +2432,10 @@ reap:
 							strerror(ret));
 			} else {
 				pid_t pid;
+				struct fio_file **files;
 				dprint(FD_PROCESS, "will fork\n");
+				files = td->files;
+				read_barrier();
 				pid = fork();
 				if (!pid) {
 					int ret;
@@ -2441,6 +2444,9 @@ reap:
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
+				// freeing previously allocated memory for files
+				// this memory freed MUST NOT be shared between processes, only the pointer itself may be shared within TD
+				free(files);
 				free(fd);
 				fd = NULL;
 			}
diff --git a/configure b/configure
index 67e5d535..d327d2ca 100755
--- a/configure
+++ b/configure
@@ -248,6 +248,8 @@ for opt do
   ;;
   --disable-dfs) dfs="no"
   ;;
+  --enable-asan) asan="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -290,9 +292,10 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-libzbc        Disable libzbc even if found"
-  echo "--disable-tcmalloc	Disable tcmalloc support"
-  echo "--dynamic-libengines	Lib-based ioengines as dynamic libraries"
-  echo "--disable-dfs		Disable DAOS File System support even if found"
+  echo "--disable-tcmalloc      Disable tcmalloc support"
+  echo "--dynamic-libengines    Lib-based ioengines as dynamic libraries"
+  echo "--disable-dfs           Disable DAOS File System support even if found"
+  echo "--enable-asan           Enable address sanitizer"
   exit $exit_val
 fi
 
@@ -3196,7 +3199,10 @@ fi
 if test "$fcntl_sync" = "yes" ; then
   output_sym "CONFIG_FCNTL_SYNC"
 fi
-
+if test "$asan" = "yes"; then
+  CFLAGS="$CFLAGS -fsanitize=address"
+  LDFLAGS="$LDFLAGS -fsanitize=address"
+fi
 print_config "Lib-based ioengines dynamic" "$dynamic_engines"
 cat > $TMPC << EOF
 int main(int argc, char **argv)
diff --git a/filesetup.c b/filesetup.c
index 7c32d0af..ab6c488b 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -2031,11 +2031,12 @@ void dup_files(struct thread_data *td, struct thread_data *org)
 	if (!org->files)
 		return;
 
-	td->files = malloc(org->files_index * sizeof(f));
+	td->files = calloc(org->files_index, sizeof(f));
 
 	if (td->o.file_lock_mode != FILE_LOCK_NONE)
 		td->file_locks = malloc(org->files_index);
 
+	assert(org->files_index >= org->o.nr_files);
 	for_each_file(org, f, i) {
 		struct fio_file *__f;
 
diff --git a/t/dedupe.c b/t/dedupe.c
index 109ea1af..561aa08d 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -143,15 +143,15 @@ static int read_block(int fd, void *buf, off_t offset)
 	return __read_block(fd, buf, offset, blocksize);
 }
 
-static void account_unique_capacity(uint64_t offset, uint64_t *unique_capacity,
-				    struct zlib_ctrl *zc)
+static int account_unique_capacity(uint64_t offset, uint64_t *unique_capacity,
+				   struct zlib_ctrl *zc)
 {
 	z_stream *stream = &zc->stream;
 	unsigned int compressed_len;
 	int ret;
 
 	if (read_block(file.fd, zc->buf_in, offset))
-		return;
+		return 1;
 
 	stream->next_in = zc->buf_in;
 	stream->avail_in = blocksize;
@@ -159,7 +159,8 @@ static void account_unique_capacity(uint64_t offset, uint64_t *unique_capacity,
 	stream->next_out = zc->buf_out;
 
 	ret = deflate(stream, Z_FINISH);
-	assert(ret != Z_STREAM_ERROR);
+	if (ret == Z_STREAM_ERROR)
+		return 1;
 	compressed_len = blocksize - stream->avail_out;
 
 	if (dump_output)
@@ -169,6 +170,7 @@ static void account_unique_capacity(uint64_t offset, uint64_t *unique_capacity,
 
 	*unique_capacity += compressed_len;
 	deflateReset(stream);
+	return 0;
 }
 
 static void add_item(struct chunk *c, struct item *i)
@@ -225,12 +227,12 @@ static struct chunk *alloc_chunk(void)
 	return c;
 }
 
-static void insert_chunk(struct item *i, uint64_t *unique_capacity,
-			 struct zlib_ctrl *zc)
+static int insert_chunk(struct item *i, uint64_t *unique_capacity,
+			struct zlib_ctrl *zc)
 {
 	struct fio_rb_node **p, *parent;
 	struct chunk *c;
-	int diff;
+	int ret, diff;
 
 	p = &rb_root.rb_node;
 	parent = NULL;
@@ -244,8 +246,6 @@ static void insert_chunk(struct item *i, uint64_t *unique_capacity,
 		} else if (diff > 0) {
 			p = &(*p)->rb_right;
 		} else {
-			int ret;
-
 			if (!collision_check)
 				goto add;
 
@@ -266,17 +266,21 @@ static void insert_chunk(struct item *i, uint64_t *unique_capacity,
 	memcpy(c->hash, i->hash, sizeof(i->hash));
 	rb_link_node(&c->rb_node, parent, p);
 	rb_insert_color(&c->rb_node, &rb_root);
-	if (compression)
-		account_unique_capacity(i->offset, unique_capacity, zc);
+	if (compression) {
+		ret = account_unique_capacity(i->offset, unique_capacity, zc);
+		if (ret)
+			return ret;
+	}
 add:
 	add_item(c, i);
+	return 0;
 }
 
-static void insert_chunks(struct item *items, unsigned int nitems,
-			  uint64_t *ndupes, uint64_t *unique_capacity,
-			  struct zlib_ctrl *zc)
+static int insert_chunks(struct item *items, unsigned int nitems,
+			 uint64_t *ndupes, uint64_t *unique_capacity,
+			 struct zlib_ctrl *zc)
 {
-	int i;
+	int i, ret;
 
 	fio_sem_down(rb_lock);
 
@@ -288,11 +292,15 @@ static void insert_chunks(struct item *items, unsigned int nitems,
 			s = sizeof(items[i].hash) / sizeof(uint32_t);
 			r = bloom_set(bloom, items[i].hash, s);
 			*ndupes += r;
-		} else
-			insert_chunk(&items[i], unique_capacity, zc);
+		} else {
+			ret = insert_chunk(&items[i], unique_capacity, zc);
+			if (ret)
+				break;
+		}
 	}
 
 	fio_sem_up(rb_lock);
+	return ret;
 }
 
 static void crc_buf(void *buf, uint32_t *hash)
@@ -320,6 +328,7 @@ static int do_work(struct worker_thread *thread, void *buf)
 	uint64_t ndupes = 0;
 	uint64_t unique_capacity = 0;
 	struct item *items;
+	int ret;
 
 	offset = thread->cur_offset;
 
@@ -339,13 +348,17 @@ static int do_work(struct worker_thread *thread, void *buf)
 		nitems++;
 	}
 
-	insert_chunks(items, nitems, &ndupes, &unique_capacity, &thread->zc);
+	ret = insert_chunks(items, nitems, &ndupes, &unique_capacity, &thread->zc);
 
 	free(items);
-	thread->items += nitems;
-	thread->dupes += ndupes;
-	thread->unique_capacity += unique_capacity;
-	return 0;
+	if (!ret) {
+		thread->items += nitems;
+		thread->dupes += ndupes;
+		thread->unique_capacity += unique_capacity;
+		return 0;
+	}
+
+	return ret;
 }
 
 static void thread_init_zlib_control(struct worker_thread *thread)
diff --git a/t/io_uring.c b/t/io_uring.c
index 157eea9e..10035912 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -939,6 +939,7 @@ submit:
 	return NULL;
 }
 
+#ifdef CONFIG_PWRITEV2
 static void *submitter_sync_fn(void *data)
 {
 	struct submitter *s = data;
@@ -1004,6 +1005,13 @@ static void *submitter_sync_fn(void *data)
 	finish = 1;
 	return NULL;
 }
+#else
+static void *submitter_sync_fn(void *data)
+{
+	finish = 1;
+	return NULL;
+}
+#endif
 
 static struct submitter *get_submitter(int offset)
 {
@@ -1346,7 +1354,12 @@ int main(int argc, char *argv[])
 			register_ring = !!atoi(optarg);
 			break;
 		case 'S':
+#ifdef CONFIG_PWRITEV2
 			use_sync = !!atoi(optarg);
+#else
+			fprintf(stderr, "preadv2 not supported\n");
+			exit(1);
+#endif
 			break;
 		case 'h':
 		case '?':

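In `dup_files()`, this update switches `td->files` from `malloc()` to `calloc()` as part of the `improvement/prevent-sigsegv-when-dup-files` branch. It also asserts that `files_index` covers `o.nr_files`. The commit does not spell out the failure it prevented. One plausible reason is that a zero-filled pointer array lets any code that walks all slots see NULL, not garbage, in entries that were never populated.

A minimal sketch of that property follows. The hypothetical `struct file_entry` stands in for `struct fio_file`, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fio_file. */
struct file_entry {
	int id;
};

/*
 * Allocate nr pointer slots but populate only the first 'filled' ones, as
 * can happen if duplication stops early. calloc() leaves the remaining
 * slots NULL (all-bits-zero is a null pointer on mainstream ABIs).
 */
static struct file_entry **dup_partial(size_t nr, size_t filled)
{
	struct file_entry **files = calloc(nr, sizeof(*files));
	size_t i;

	if (!files)
		return NULL;
	for (i = 0; i < filled && i < nr; i++) {
		files[i] = malloc(sizeof(**files));
		if (!files[i])
			break;	/* this and later slots stay NULL */
		files[i]->id = (int)i;
	}
	return files;
}

/* Cleanup walks every slot and reports how many entries were populated. */
static size_t free_all(struct file_entry **files, size_t nr)
{
	size_t i, freed = 0;

	for (i = 0; i < nr; i++) {
		if (files[i])
			freed++;
		free(files[i]);	/* free(NULL) is a no-op */
	}
	free(files);
	return freed;
}
```

With `malloc()`, the unfilled slots would hold indeterminate pointers, and `free_all()` would have undefined behavior.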

* Recent changes (master)
@ 2022-03-10 13:00 Jens Axboe
From: Jens Axboe @ 2022-03-10 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a24ef2702e2c1b948df37080eb3f18cca60d414b:

  Merge branch 'master' of https://github.com/dpronin/fio (2022-03-08 16:42:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to df0ab55ff9e28f4b85c199e207aec904f8a76440:

  Merge branch 'master' of https://github.com/dpronin/fio (2022-03-09 06:20:31 -0700)

----------------------------------------------------------------
Denis Pronin (3):
      - freeing job_sections array of strings upon freeing each its item in init.c
      - fixed memory leak, which is happening when parsing options, claimed by ASAN
      - fixed memory leak in parent process detected by ASAN when forking and not freeing memory in the parent process allocated for fork_data

Jens Axboe (3):
      Merge branch 'fix/asan-memleak' of https://github.com/dpronin/fio
      Merge branch 'fix/asan-memleak-forkdata' of https://github.com/dpronin/fio
      Merge branch 'master' of https://github.com/dpronin/fio

 backend.c | 2 ++
 init.c    | 4 ++++
 parse.c   | 2 ++
 3 files changed, 8 insertions(+)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index a21dfef6..cd7f4e5f 100644
--- a/backend.c
+++ b/backend.c
@@ -2441,6 +2441,8 @@ reap:
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
+				free(fd);
+				fd = NULL;
 			}
 			dprint(FD_MUTEX, "wait on startup_sem\n");
 			if (fio_sem_down_timeout(startup_sem, 10000)) {
diff --git a/init.c b/init.c
index 81c30f8c..b7f866e6 100644
--- a/init.c
+++ b/init.c
@@ -2185,6 +2185,10 @@ static int __parse_jobs_ini(struct thread_data *td,
 		i++;
 	}
 
+	free(job_sections);
+	job_sections = NULL;
+	nr_job_sections = 0;
+
 	free(opts);
 out:
 	free(string);
diff --git a/parse.c b/parse.c
index d086ee48..e0bee004 100644
--- a/parse.c
+++ b/parse.c
@@ -817,6 +817,8 @@ store_option_value:
 
 		if (o->off1) {
 			cp = td_var(data, o, o->off1);
+			if (*cp)
+				free(*cp);
 			*cp = strdup(ptr);
 			if (strlen(ptr) > o->maxlen - 1) {
 				log_err("value exceeds max length of %d\n",

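Both leak fixes in this update follow the same rule: release the old allocation before the owning pointer is overwritten or forgotten.

- `parse.c` frees the previous string before `strdup()`ing a new value, since a string option can be set more than once.
- `init.c` frees the `job_sections` array and resets the pointer and count, so a later parse starts clean.

The helpers below are hypothetical illustrations of both patterns, not fio functions.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store a private copy of val in *slot, releasing any previous copy.
 * Duplicating first keeps the old value if strdup() fails.
 */
static int set_string_option(char **slot, const char *val)
{
	char *copy = strdup(val);

	if (!copy)
		return -1;
	free(*slot);	/* free(NULL) is a no-op, so no guard is needed */
	*slot = copy;
	return 0;
}

/* Free a list of strings and reset its bookkeeping, like job_sections/nr_job_sections. */
static void reset_list(char ***list, int *nr)
{
	int i;

	for (i = 0; i < *nr; i++)
		free((*list)[i]);
	free(*list);
	*list = NULL;
	*nr = 0;
}
```

Because `free(NULL)` does nothing, the `if (*cp)` guard in the parse.c hunk is harmless but not required.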

* Recent changes (master)
@ 2022-03-09 13:00 Jens Axboe
From: Jens Axboe @ 2022-03-09 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit dc44588f2e445edd7a4ca7dc9bf05bb3b4b2789e:

  Makefile: get rid of fortify source (2022-03-07 09:16:39 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a24ef2702e2c1b948df37080eb3f18cca60d414b:

  Merge branch 'master' of https://github.com/dpronin/fio (2022-03-08 16:42:37 -0700)

----------------------------------------------------------------
Denis Pronin (1):
      - fixed typo in configure script

Jens Axboe (1):
      Merge branch 'master' of https://github.com/dpronin/fio

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index be4605f9..67e5d535 100755
--- a/configure
+++ b/configure
@@ -2098,7 +2098,7 @@ if test "$libhdfs" = "yes" ; then
     hdfs_conf_error=1
   fi
   if test "$FIO_LIBHDFS_INCLUDE" = "" ; then
-    echo "configure: FIO_LIBHDFS_INCLUDE should be defined to libhdfs inlude path"
+    echo "configure: FIO_LIBHDFS_INCLUDE should be defined to libhdfs include path"
     hdfs_conf_error=1
   fi
   if test "$FIO_LIBHDFS_LIB" = "" ; then


* Recent changes (master)
@ 2022-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2022-03-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c3773c171dffb79f771d213d94249cefc4b9b6de:

  windowsaio: open file for write if we have syncs (2022-02-26 10:43:20 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dc44588f2e445edd7a4ca7dc9bf05bb3b4b2789e:

  Makefile: get rid of fortify source (2022-03-07 09:16:39 -0700)

----------------------------------------------------------------
Jens Axboe (7):
      t/io_uring: change map buffers registration opcode
      t/io_uring: change fatal map buffers condition with multiple files
      io_uring.h: sync with 5.18 kernel bits
      t/io_uring: add support for registering the ring fd
      t/io_uring: support using preadv2
      t/io_uring: add missing CR
      Makefile: get rid of fortify source

 Makefile            |   2 +-
 os/linux/io_uring.h |  17 ++++--
 t/io_uring.c        | 148 ++++++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 147 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 0ab4f82c..6ffd3d13 100644
--- a/Makefile
+++ b/Makefile
@@ -28,7 +28,7 @@ PROGS	= fio
 SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/hist/fio-histo-log-pctiles.py tools/fio_jsonplus_clat2csv)
 
 ifndef CONFIG_FIO_NO_OPT
-  FIO_CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
+  FIO_CFLAGS += -O3
 endif
 ifdef CONFIG_BUILD_NATIVE
   FIO_CFLAGS += -march=native
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index c45b5e9a..42b2fe84 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -70,6 +70,7 @@ enum {
 	IOSQE_IO_HARDLINK_BIT,
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
+	IOSQE_CQE_SKIP_SUCCESS_BIT,
 };
 
 /*
@@ -87,6 +88,8 @@ enum {
 #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
 /* select buffer from sqe->buf_group */
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
+/* don't post CQE if request succeeded */
+#define IOSQE_CQE_SKIP_SUCCESS	(1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
 
 /*
  * io_uring_setup() flags
@@ -254,10 +257,11 @@ struct io_cqring_offsets {
 /*
  * io_uring_enter(2) flags
  */
-#define IORING_ENTER_GETEVENTS	(1U << 0)
-#define IORING_ENTER_SQ_WAKEUP	(1U << 1)
-#define IORING_ENTER_SQ_WAIT	(1U << 2)
-#define IORING_ENTER_EXT_ARG	(1U << 3)
+#define IORING_ENTER_GETEVENTS		(1U << 0)
+#define IORING_ENTER_SQ_WAKEUP		(1U << 1)
+#define IORING_ENTER_SQ_WAIT		(1U << 2)
+#define IORING_ENTER_EXT_ARG		(1U << 3)
+#define IORING_ENTER_REGISTERED_RING	(1U << 4)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -289,6 +293,7 @@ struct io_uring_params {
 #define IORING_FEAT_EXT_ARG		(1U << 8)
 #define IORING_FEAT_NATIVE_WORKERS	(1U << 9)
 #define IORING_FEAT_RSRC_TAGS		(1U << 10)
+#define IORING_FEAT_CQE_SKIP		(1U << 11)
 
 /*
  * io_uring_register(2) opcodes and arguments
@@ -321,6 +326,10 @@ enum {
 	/* set/get max number of io-wq workers */
 	IORING_REGISTER_IOWQ_MAX_WORKERS	= 19,
 
+	/* register/unregister io_uring fd with the ring */
+	IORING_REGISTER_RING_FDS		= 20,
+	IORING_UNREGISTER_RING_FDS		= 21,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
diff --git a/t/io_uring.c b/t/io_uring.c
index b8fcffe8..157eea9e 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -76,6 +76,7 @@ struct file {
 struct submitter {
 	pthread_t thread;
 	int ring_fd;
+	int enter_ring_fd;
 	int index;
 	struct io_sq_ring sq_ring;
 	struct io_uring_sqe *sqes;
@@ -127,6 +128,8 @@ static int stats = 0;		/* generate IO stats */
 static int aio = 0;		/* use libaio */
 static int runtime = 0;		/* runtime */
 static int random_io = 1;	/* random or sequential IO */
+static int register_ring = 1;	/* register ring */
+static int use_sync = 0;	/* use preadv2 */
 
 static unsigned long tsc_rate;
 
@@ -139,7 +142,7 @@ static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
 static int plist_len = 17;
 
 #ifndef IORING_REGISTER_MAP_BUFFERS
-#define IORING_REGISTER_MAP_BUFFERS	20
+#define IORING_REGISTER_MAP_BUFFERS	22
 struct io_uring_map_buffers {
 	__s32	fd;
 	__u32	buf_start;
@@ -349,10 +352,8 @@ static int io_uring_map_buffers(struct submitter *s)
 
 	if (do_nop)
 		return 0;
-	if (s->nr_files > 1) {
-		fprintf(stderr, "Can't map buffers with multiple files\n");
-		return -1;
-	}
+	if (s->nr_files > 1)
+		fprintf(stdout, "Mapping buffers may not work with multiple files\n");
 
 	return syscall(__NR_io_uring_register, s->ring_fd,
 			IORING_REGISTER_MAP_BUFFERS, &map, 1);
@@ -422,12 +423,14 @@ out:
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
+	if (register_ring)
+		flags |= IORING_ENTER_REGISTERED_RING;
 #ifdef FIO_ARCH_HAS_SYSCALL
-	return __do_syscall6(__NR_io_uring_enter, s->ring_fd, to_submit,
+	return __do_syscall6(__NR_io_uring_enter, s->enter_ring_fd, to_submit,
 				min_complete, flags, NULL, 0);
 #else
-	return syscall(__NR_io_uring_enter, s->ring_fd, to_submit, min_complete,
-			flags, NULL, 0);
+	return syscall(__NR_io_uring_enter, s->enter_ring_fd, to_submit,
+			min_complete, flags, NULL, 0);
 #endif
 }
 
@@ -795,6 +798,34 @@ static void *submitter_aio_fn(void *data)
 }
 #endif
 
+static void io_uring_unregister_ring(struct submitter *s)
+{
+	struct io_uring_rsrc_update up = {
+		.offset	= s->enter_ring_fd,
+	};
+
+	syscall(__NR_io_uring_register, s->ring_fd, IORING_UNREGISTER_RING_FDS,
+		&up, 1);
+}
+
+static int io_uring_register_ring(struct submitter *s)
+{
+	struct io_uring_rsrc_update up = {
+		.data	= s->ring_fd,
+		.offset	= -1U,
+	};
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, s->ring_fd,
+			IORING_REGISTER_RING_FDS, &up, 1);
+	if (ret == 1) {
+		s->enter_ring_fd = up.offset;
+		return 0;
+	}
+	register_ring = 0;
+	return -1;
+}
+
 static void *submitter_uring_fn(void *data)
 {
 	struct submitter *s = data;
@@ -806,6 +837,9 @@ static void *submitter_uring_fn(void *data)
 	submitter_init(s);
 #endif
 
+	if (register_ring)
+		io_uring_register_ring(s);
+
 	prepped = 0;
 	do {
 		int to_wait, to_submit, this_reap, to_prep;
@@ -898,6 +932,75 @@ submit:
 		}
 	} while (!s->finish);
 
+	if (register_ring)
+		io_uring_unregister_ring(s);
+
+	finish = 1;
+	return NULL;
+}
+
+static void *submitter_sync_fn(void *data)
+{
+	struct submitter *s = data;
+	int ret;
+
+	submitter_init(s);
+
+	do {
+		uint64_t offset;
+		struct file *f;
+		long r;
+
+		if (s->nr_files == 1) {
+			f = &s->files[0];
+		} else {
+			f = &s->files[s->cur_file];
+			if (f->pending_ios >= file_depth(s)) {
+				s->cur_file++;
+				if (s->cur_file == s->nr_files)
+					s->cur_file = 0;
+				f = &s->files[s->cur_file];
+			}
+		}
+		f->pending_ios++;
+
+		if (random_io) {
+			r = __rand64(&s->rand_state);
+			offset = (r % (f->max_blocks - 1)) * bs;
+		} else {
+			offset = f->cur_off;
+			f->cur_off += bs;
+			if (f->cur_off + bs > f->max_size)
+				f->cur_off = 0;
+		}
+
+#ifdef ARCH_HAVE_CPU_CLOCK
+		if (stats)
+			s->clock_batch[s->clock_index] = get_cpu_clock();
+#endif
+
+		s->inflight++;
+		s->calls++;
+
+		if (polled)
+			ret = preadv2(f->real_fd, &s->iovecs[0], 1, offset, RWF_HIPRI);
+		else
+			ret = preadv2(f->real_fd, &s->iovecs[0], 1, offset, 0);
+
+		if (ret < 0) {
+			perror("preadv2");
+			break;
+		} else if (ret != bs) {
+			break;
+		}
+
+		s->done++;
+		s->inflight--;
+		f->pending_ios--;
+		if (stats)
+			add_stat(s, s->clock_index, 1);
+	} while (!s->finish);
+
 	finish = 1;
 	return NULL;
 }
@@ -1000,7 +1103,7 @@ static int setup_ring(struct submitter *s)
 		perror("io_uring_setup");
 		return 1;
 	}
-	s->ring_fd = fd;
+	s->ring_fd = s->enter_ring_fd = fd;
 
 	io_uring_probe(fd);
 
@@ -1105,10 +1208,13 @@ static void usage(char *argv, int status)
 		" -T <int>  : TSC rate in HZ\n"
 		" -r <int>  : Runtime in seconds, default %s\n"
 		" -R <bool> : Use random IO, default %d\n"
-		" -a <bool> : Use legacy aio, default %d\n",
+		" -a <bool> : Use legacy aio, default %d\n"
+		" -S <bool> : Use sync IO (preadv2), default %d\n"
+		" -X <bool> : Use registered ring %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
 		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
-		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio);
+		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio,
+		use_sync, register_ring);
 	exit(status);
 }
 
@@ -1169,7 +1275,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:X:S:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1236,6 +1342,12 @@ int main(int argc, char *argv[])
 		case 'R':
 			random_io = !!atoi(optarg);
 			break;
+		case 'X':
+			register_ring = !!atoi(optarg);
+			break;
+		case 'S':
+			use_sync = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -1346,7 +1458,9 @@ int main(int argc, char *argv[])
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
 
-		if (!aio)
+		if (use_sync)
+			continue;
+		else if (!aio)
 			err = setup_ring(s);
 		else
 			err = setup_aio(s);
@@ -1357,14 +1471,18 @@ int main(int argc, char *argv[])
 	}
 	s = get_submitter(0);
 	printf("polled=%d, fixedbufs=%d/%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, dma_map, register_files, buffered, depth);
-	if (!aio)
+	if (use_sync)
+		printf("Engine=preadv2\n");
+	else if (!aio)
 		printf("Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 	else
 		printf("Engine=aio\n");
 
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
-		if (!aio)
+		if (use_sync)
+			pthread_create(&s->thread, NULL, submitter_sync_fn, s);
+		else if (!aio)
 			pthread_create(&s->thread, NULL, submitter_uring_fn, s);
 #ifdef CONFIG_LIBAIO
 		else

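The new `-S` mode in t/io_uring issues one synchronous `preadv2()` per block instead of using a ring. It picks offsets either randomly or sequentially with wraparound, and a short read or error ends the submit loop.

Below is a sketch of the sequential offset update and a single read. It assumes Linux with glibc 2.26 or later, where `preadv2()` is available. The helper names are hypothetical, and the random-offset path is not reproduced.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Sequential offset as in submitter_sync_fn(): hand out *cur, advance by bs,
 * and wrap to 0 once the next block would run past max_size.
 */
static uint64_t next_seq_offset(uint64_t *cur, uint64_t bs, uint64_t max_size)
{
	uint64_t offset = *cur;

	*cur += bs;
	if (*cur + bs > max_size)
		*cur = 0;
	return offset;
}

/*
 * One synchronous block read. t/io_uring passes RWF_HIPRI as flags when
 * polling is enabled, 0 otherwise. A short read counts as failure, matching
 * the "ret != bs" check there. Returns 0 on success, -1 otherwise.
 */
static int read_block_sync(int fd, void *buf, size_t bs, off_t offset, int flags)
{
	struct iovec iov = { .iov_base = buf, .iov_len = bs };
	ssize_t ret = preadv2(fd, &iov, 1, offset, flags);

	return ret == (ssize_t)bs ? 0 : -1;
}
```

With `bs = 4096` and `max_size = 12288`, the offsets handed out are 0, 4096, 8192, 0, and so on. The wrap happens only after the last whole block has been returned.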

* Recent changes (master)
@ 2022-02-27 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-27 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit cf2511565f40be1b78b3fc1194e823baf305f0a0:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-02-24 12:40:19 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c3773c171dffb79f771d213d94249cefc4b9b6de:

  windowsaio: open file for write if we have syncs (2022-02-26 10:43:20 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Add TD_F_SYNCS thread flag
      windowsaio: open file for write if we have syncs

 blktrace.c           | 4 ++++
 engines/windowsaio.c | 2 +-
 fio.h                | 6 ++++--
 ioengines.h          | 2 +-
 iolog.c              | 9 +++++++--
 5 files changed, 17 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index e1804765..ead60130 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -297,6 +297,10 @@ static bool handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
 
 	ios[DDIR_SYNC]++;
 	dprint(FD_BLKTRACE, "store flush delay=%lu\n", ipo->delay);
+
+	if (!(td->flags & TD_F_SYNCS))
+		td->flags |= TD_F_SYNCS;
+
 	queue_io_piece(td, ipo);
 	return true;
 }
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index d82c8053..6681f8bb 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -248,7 +248,7 @@ static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 		log_err("fio: unknown fadvise type %d\n", td->o.fadvise_hint);
 	}
 
-	if (!td_write(td) || read_only)
+	if ((!td_write(td) && !(td->flags & TD_F_SYNCS)) || read_only)
 		access = GENERIC_READ;
 	else
 		access = (GENERIC_READ | GENERIC_WRITE);
diff --git a/fio.h b/fio.h
index 88df117d..c314f0a8 100644
--- a/fio.h
+++ b/fio.h
@@ -97,6 +97,7 @@ enum {
 	__TD_F_MMAP_KEEP,
 	__TD_F_DIRS_CREATED,
 	__TD_F_CHECK_RATE,
+	__TD_F_SYNCS,
 	__TD_F_LAST,		/* not a real bit, keep last */
 };
 
@@ -118,6 +119,7 @@ enum {
 	TD_F_MMAP_KEEP		= 1U << __TD_F_MMAP_KEEP,
 	TD_F_DIRS_CREATED	= 1U << __TD_F_DIRS_CREATED,
 	TD_F_CHECK_RATE		= 1U << __TD_F_CHECK_RATE,
+	TD_F_SYNCS		= 1U << __TD_F_SYNCS,
 };
 
 enum {
@@ -678,8 +680,8 @@ enum {
 	TD_NR,
 };
 
-#define TD_ENG_FLAG_SHIFT	17
-#define TD_ENG_FLAG_MASK	((1U << 17) - 1)
+#define TD_ENG_FLAG_SHIFT	18
+#define TD_ENG_FLAG_MASK	((1U << 18) - 1)
 
 static inline void td_set_ioengine_flags(struct thread_data *td)
 {
diff --git a/ioengines.h b/ioengines.h
index b3f755b4..acdb0071 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -8,7 +8,7 @@
 #include "io_u.h"
 #include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	30
+#define FIO_IOOPS_VERSION	31
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
diff --git a/iolog.c b/iolog.c
index a2cf0c1c..724ec1fe 100644
--- a/iolog.c
+++ b/iolog.c
@@ -402,6 +402,7 @@ static bool read_iolog2(struct thread_data *td)
 	enum fio_ddir rw;
 	bool realloc = false;
 	int64_t items_to_fetch = 0;
+	int syncs;
 
 	if (td->o.read_iolog_chunked) {
 		items_to_fetch = iolog_items_to_fetch(td);
@@ -417,7 +418,7 @@ static bool read_iolog2(struct thread_data *td)
 	rfname = fname = malloc(256+16);
 	act = malloc(256+16);
 
-	reads = writes = waits = 0;
+	syncs = reads = writes = waits = 0;
 	while ((p = fgets(str, 4096, td->io_log_rfile)) != NULL) {
 		struct io_piece *ipo;
 		int r;
@@ -492,7 +493,9 @@ static bool read_iolog2(struct thread_data *td)
 				continue;
 			waits++;
 		} else if (rw == DDIR_INVAL) {
-		} else if (!ddir_sync(rw)) {
+		} else if (ddir_sync(rw)) {
+			syncs++;
+		} else {
 			log_err("bad ddir: %d\n", rw);
 			continue;
 		}
@@ -547,6 +550,8 @@ static bool read_iolog2(struct thread_data *td)
 			" read-only\n", td->o.name, writes);
 		writes = 0;
 	}
+	if (syncs)
+		td->flags |= TD_F_SYNCS;
 
 	if (td->o.read_iolog_chunked) {
 		if (td->io_log_current == 0) {

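Judging by fio.h, `td->flags` carries fio's own `TD_F_*` bits in its low bits and the I/O engine's flags shifted up by `TD_ENG_FLAG_SHIFT`. Adding `__TD_F_SYNCS` therefore comes with moving the shift from 17 to 18, presumably to keep the two ranges from overlapping. `FIO_IOOPS_VERSION` is bumped in the same update.

The windowsaio hunk then uses the new flag to open files with `GENERIC_WRITE` even for jobs that only issue syncs. On Windows, `FlushFileBuffers()` requires a handle with `GENERIC_WRITE` access.

The sketch below uses hypothetical names and a three-bit layout. It adds a C11 `_Static_assert` so that adding a thread flag without moving the shift fails to compile; that check is an addition of this sketch, not something taken from fio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thread-flag bit numbers, modeled on the __TD_F_* enum. */
enum {
	XF_BIT_STARTED,
	XF_BIT_RUNNING,
	XF_BIT_SYNCS,
	XF_BIT_LAST,		/* not a real bit, keep last */
};

enum {
	XF_STARTED	= 1U << XF_BIT_STARTED,
	XF_RUNNING	= 1U << XF_BIT_RUNNING,
	XF_SYNCS	= 1U << XF_BIT_SYNCS,
};

/* Engine flags start at this bit; it must be >= the number of thread flags. */
#define XF_ENG_SHIFT	3
#define XF_THREAD_MASK	((1U << XF_ENG_SHIFT) - 1)

/* Adding a fourth thread flag without bumping XF_ENG_SHIFT breaks the build. */
_Static_assert(XF_BIT_LAST <= XF_ENG_SHIFT, "thread flags overlap engine flags");

/* Replace the engine bits while preserving the thread bits. */
static uint32_t set_engine_flags(uint32_t flags, uint32_t eng)
{
	return (flags & XF_THREAD_MASK) | (eng << XF_ENG_SHIFT);
}

/* Test one engine flag stored above the shift. */
static int engine_flagged(uint32_t flags, uint32_t eng_flag)
{
	return ((flags >> XF_ENG_SHIFT) & eng_flag) != 0;
}
```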

* Recent changes (master)
@ 2022-02-25 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-25 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c377f4f85943e5b155b3daaab1ce5213077531d8:

  io_uring: use syscall helpers for the hot path (2022-02-21 09:43:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cf2511565f40be1b78b3fc1194e823baf305f0a0:

  Merge branch 'master' of https://github.com/bvanassche/fio (2022-02-24 12:40:19 -0700)

----------------------------------------------------------------
Bart Van Assche (1):
      Fix three compiler warnings

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 engines/cmdprio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index dd358754..979a81b6 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -319,7 +319,7 @@ static int fio_cmdprio_gen_perc(struct thread_data *td, struct cmdprio *cmdprio)
 {
 	struct cmdprio_options *options = cmdprio->options;
 	struct cmdprio_prio *prio;
-	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {};
 	struct thread_stat *ts = &td->ts;
 	enum fio_ddir ddir;
 	int ret;
@@ -368,8 +368,8 @@ static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
 					     struct cmdprio *cmdprio)
 {
 	struct cmdprio_options *options = cmdprio->options;
-	struct cmdprio_parse_result parse_res[CMDPRIO_RWDIR_CNT] = {0};
-	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct cmdprio_parse_result parse_res[CMDPRIO_RWDIR_CNT] = {};
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {};
 	struct thread_stat *ts = &td->ts;
 	int ret, implicit_cmdprio;
 	enum fio_ddir ddir;

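This change replaces `= {0}` with `= {}` for three local arrays of structs in engines/cmdprio.c. The commit title says only that it fixes compiler warnings and does not name the diagnostic. Both forms zero every member:

- `{0}` sets the first scalar explicitly through brace elision, and the rest is zero-initialized.
- `{}` is an empty initializer. It is a GNU C extension before C23 and is standard from C23 on.

Below is a small check with a hypothetical struct whose first member is itself an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aggregate; the first member is an array, so {0} relies on brace elision. */
struct prio_values {
	unsigned int prio[2];
	int count;
};

/* Return 1 if every member of every element is zero. */
static int all_zero(const struct prio_values *v, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < 2; j++)
			if (v[i].prio[j])
				return 0;
		if (v[i].count)
			return 0;
	}
	return 1;
}

static int check_initializers(void)
{
	/* Sets a[0].prio[0] to 0 via brace elision; everything else is zeroed. */
	struct prio_values a[2] = {0};
	/* Empty initializer as in the cmdprio.c hunk (GNU C, standard in C23). */
	struct prio_values b[2] = {};

	return all_zero(a, 2) && all_zero(b, 2);
}
```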

* Recent changes (master)
@ 2022-02-22 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-22 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3f43022d4021850905886e391ec68c02c99aec5a:

  Merge branch 'genfio-tempfile' of https://github.com/scop/fio (2022-02-20 12:39:11 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c377f4f85943e5b155b3daaab1ce5213077531d8:

  io_uring: use syscall helpers for the hot path (2022-02-21 09:43:48 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      aarch64: add system call definitions
      x86-64: add system call definitions
      io_uring: use syscall helpers for the hot path

 arch/arch-aarch64.h |  77 +++++++++++++++++++++++++++++++++++
 arch/arch-x86_64.h  | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 engines/io_uring.c  |   5 +++
 t/io_uring.c        |   5 +++
 4 files changed, 200 insertions(+)

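The three commits listed above do the following:

- add inline-assembly system call helpers, `__do_syscall0()` through `__do_syscall6()`, for aarch64 and x86-64;
- define `FIO_ARCH_HAS_SYSCALL`;
- use the helpers in the io_uring hot path, so that `io_uring_enter` skips the libc `syscall()` wrapper when the architecture provides them.

Below is a minimal zero-argument version of the same idea. It assumes Linux and uses `getpid` only because that call takes no arguments; `raw_syscall0()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Raw zero-argument system call. On x86-64 this uses the same constraints as
 * __do_syscall0() in arch-x86_64.h: number in %rax, result in %rax, with %rcx
 * and %r11 clobbered by the syscall instruction. Other architectures fall
 * back to libc's syscall(2).
 */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__)
	intptr_t rax;

	__asm__ volatile("syscall"
			 : "=a"(rax)
			 : "a"(nr)
			 : "rcx", "r11", "memory");
	return (long)rax;	/* -errno on failure, no errno update */
#else
	return syscall(nr);	/* -1 and errno on failure */
#endif
}
```

The raw x86-64 path reports failure as a negative errno value. `syscall()` instead returns -1 and sets `errno`, so code switching between the two has to handle both conventions.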
---

Diff of recent changes:

diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 94571709..951d1718 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -44,4 +44,81 @@ static inline int arch_init(char *envp[])
 	return 0;
 }
 
+#define __do_syscallN(...) ({						\
+	__asm__ volatile (						\
+		"svc 0"							\
+		: "=r"(x0)						\
+		: __VA_ARGS__						\
+		: "memory", "cc");					\
+	(long) x0;							\
+})
+
+#define __do_syscall0(__n) ({						\
+	register long x8 __asm__("x8") = __n;				\
+	register long x0 __asm__("x0");					\
+									\
+	__do_syscallN("r" (x8));					\
+})
+
+#define __do_syscall1(__n, __a) ({					\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0));				\
+})
+
+#define __do_syscall2(__n, __a, __b) ({					\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+	register __typeof__(__b) x1 __asm__("x1") = __b;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0), "r" (x1));			\
+})
+
+#define __do_syscall3(__n, __a, __b, __c) ({				\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+	register __typeof__(__b) x1 __asm__("x1") = __b;		\
+	register __typeof__(__c) x2 __asm__("x2") = __c;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0), "r" (x1), "r" (x2));		\
+})
+
+#define __do_syscall4(__n, __a, __b, __c, __d) ({			\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+	register __typeof__(__b) x1 __asm__("x1") = __b;		\
+	register __typeof__(__c) x2 __asm__("x2") = __c;		\
+	register __typeof__(__d) x3 __asm__("x3") = __d;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0), "r" (x1), "r" (x2), "r" (x3));\
+})
+
+#define __do_syscall5(__n, __a, __b, __c, __d, __e) ({			\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+	register __typeof__(__b) x1 __asm__("x1") = __b;		\
+	register __typeof__(__c) x2 __asm__("x2") = __c;		\
+	register __typeof__(__d) x3 __asm__("x3") = __d;		\
+	register __typeof__(__e) x4 __asm__("x4") = __e;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0), "r" (x1), "r" (x2), "r" (x3),	\
+			"r"(x4));					\
+})
+
+#define __do_syscall6(__n, __a, __b, __c, __d, __e, __f) ({		\
+	register long x8 __asm__("x8") = __n;				\
+	register __typeof__(__a) x0 __asm__("x0") = __a;		\
+	register __typeof__(__b) x1 __asm__("x1") = __b;		\
+	register __typeof__(__c) x2 __asm__("x2") = __c;		\
+	register __typeof__(__d) x3 __asm__("x3") = __d;		\
+	register __typeof__(__e) x4 __asm__("x4") = __e;		\
+	register __typeof__(__f) x5 __asm__("x5") = __f;		\
+									\
+	__do_syscallN("r" (x8), "0" (x0), "r" (x1), "r" (x2), "r" (x3),	\
+			"r" (x4), "r"(x5));				\
+})
+
+#define FIO_ARCH_HAS_SYSCALL
+
 #endif
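The arm64 macros above put the syscall number in x8 and the arguments in x0..x5. The kernel returns the result in x0, which is why x0 is the output operand and, through the "0" matching constraint, also the first input. Below is a minimal sketch of the zero-argument case written as a plain function. `raw_getpid()` is a hypothetical name. The `svc #0` body is an assumption based on the AArch64 Linux syscall convention, because the shared `__do_syscallN()` body is outside this hunk. Builds for other architectures fall back to syscall(2).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Zero-argument raw system call following the arm64 register convention:
 * syscall number in x8, result returned in x0, "svc #0" enters the kernel.
 * Other architectures fall back to syscall(2).
 */
static long raw_getpid(void)
{
#if defined(__aarch64__)
	register long x8 __asm__("x8") = __NR_getpid;
	register long x0 __asm__("x0");

	__asm__ volatile("svc #0"
			 : "=r"(x0)
			 : "r"(x8)
			 : "memory", "cc");
	return x0;
#else
	return syscall(__NR_getpid);
#endif
}
```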
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 25850f90..86ce1b7e 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -68,4 +68,117 @@ static inline int arch_rand_seed(unsigned long *seed)
 	return 0;
 }
 
+#define __do_syscall0(NUM) ({			\
+	intptr_t rax;				\
+						\
+	__asm__ volatile(			\
+		"syscall"			\
+		: "=a"(rax)	/* %rax */	\
+		: "a"(NUM)	/* %rax */	\
+		: "rcx", "r11", "memory"	\
+	);					\
+	rax;					\
+})
+
+#define __do_syscall1(NUM, ARG1) ({		\
+	intptr_t rax;				\
+						\
+	__asm__ volatile(			\
+		"syscall"			\
+		: "=a"(rax)	/* %rax */	\
+		: "a"((NUM)),	/* %rax */	\
+		  "D"((ARG1))	/* %rdi */	\
+		: "rcx", "r11", "memory"	\
+	);					\
+	rax;					\
+})
+
+#define __do_syscall2(NUM, ARG1, ARG2) ({	\
+	intptr_t rax;				\
+						\
+	__asm__ volatile(			\
+		"syscall"			\
+		: "=a"(rax)	/* %rax */	\
+		: "a"((NUM)),	/* %rax */	\
+		  "D"((ARG1)),	/* %rdi */	\
+		  "S"((ARG2))	/* %rsi */	\
+		: "rcx", "r11", "memory"	\
+	);					\
+	rax;					\
+})
+
+#define __do_syscall3(NUM, ARG1, ARG2, ARG3) ({	\
+	intptr_t rax;				\
+						\
+	__asm__ volatile(			\
+		"syscall"			\
+		: "=a"(rax)	/* %rax */	\
+		: "a"((NUM)),	/* %rax */	\
+		  "D"((ARG1)),	/* %rdi */	\
+		  "S"((ARG2)),	/* %rsi */	\
+		  "d"((ARG3))	/* %rdx */	\
+		: "rcx", "r11", "memory"	\
+	);					\
+	rax;					\
+})
+
+#define __do_syscall4(NUM, ARG1, ARG2, ARG3, ARG4) ({			\
+	intptr_t rax;							\
+	register __typeof__(ARG4) __r10 __asm__("r10") = (ARG4);	\
+									\
+	__asm__ volatile(						\
+		"syscall"						\
+		: "=a"(rax)	/* %rax */				\
+		: "a"((NUM)),	/* %rax */				\
+		  "D"((ARG1)),	/* %rdi */				\
+		  "S"((ARG2)),	/* %rsi */				\
+		  "d"((ARG3)),	/* %rdx */				\
+		  "r"(__r10)	/* %r10 */				\
+		: "rcx", "r11", "memory"				\
+	);								\
+	rax;								\
+})
+
+#define __do_syscall5(NUM, ARG1, ARG2, ARG3, ARG4, ARG5) ({		\
+	intptr_t rax;							\
+	register __typeof__(ARG4) __r10 __asm__("r10") = (ARG4);	\
+	register __typeof__(ARG5) __r8 __asm__("r8") = (ARG5);		\
+									\
+	__asm__ volatile(						\
+		"syscall"						\
+		: "=a"(rax)	/* %rax */				\
+		: "a"((NUM)),	/* %rax */				\
+		  "D"((ARG1)),	/* %rdi */				\
+		  "S"((ARG2)),	/* %rsi */				\
+		  "d"((ARG3)),	/* %rdx */				\
+		  "r"(__r10),	/* %r10 */				\
+		  "r"(__r8)	/* %r8 */				\
+		: "rcx", "r11", "memory"				\
+	);								\
+	rax;								\
+})
+
+#define __do_syscall6(NUM, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6) ({	\
+	intptr_t rax;							\
+	register __typeof__(ARG4) __r10 __asm__("r10") = (ARG4);	\
+	register __typeof__(ARG5) __r8 __asm__("r8") = (ARG5);		\
+	register __typeof__(ARG6) __r9 __asm__("r9") = (ARG6);		\
+									\
+	__asm__ volatile(						\
+		"syscall"						\
+		: "=a"(rax)	/* %rax */				\
+		: "a"((NUM)),	/* %rax */				\
+		  "D"((ARG1)),	/* %rdi */				\
+		  "S"((ARG2)),	/* %rsi */				\
+		  "d"((ARG3)),	/* %rdx */				\
+		  "r"(__r10),	/* %r10 */				\
+		  "r"(__r8),	/* %r8 */				\
+		  "r"(__r9)	/* %r9 */				\
+		: "rcx", "r11", "memory"				\
+	);								\
+	rax;								\
+})
+
+#define FIO_ARCH_HAS_SYSCALL
+
 #endif
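On x86_64 the `syscall` instruction overwrites %rcx with the return address and %r11 with RFLAGS, which explains the clobber list. It is also the reason the fourth argument travels in %r10 instead of %rcx, as it would in the normal function-call ABI. No constraint letter names %r10, %r8 or %r9, so the macros bind explicit register variables for them. The sketch below mirrors `__do_syscall4()` as a function; `raw_syscall4()` and `query_sigmask()` are hypothetical names. It exercises the fourth argument through rt_sigprocmask, whose last parameter is the kernel sigset size. This assumes an 8-byte kernel sigset, as on x86_64 and arm64. If %r10 were not loaded correctly, the kernel would reject the call with -EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Four-argument raw system call modelled on __do_syscall4() above.
 * Returns the kernel's result directly: >= 0 on success, -errno on error.
 */
static long raw_syscall4(long nr, long a1, long a2, long a3, long a4)
{
#if defined(__x86_64__)
	long rax;
	/* No constraint letter names %r10, so bind a register variable. */
	register long r10 __asm__("r10") = a4;

	__asm__ volatile("syscall"
			 : "=a"(rax)
			 : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
			 /* syscall saves RIP in %rcx and RFLAGS in %r11 */
			 : "rcx", "r11", "memory");
	return rax;
#else
	/* Fallback: libc wrapper, converted to the same -errno convention. */
	long ret = syscall(nr, a1, a2, a3, a4);

	return ret == -1 ? -errno : ret;
#endif
}

/*
 * Read the calling thread's signal mask. A NULL "set" makes "how" irrelevant;
 * the 4th argument is the kernel sigset size, assumed here to be 8 bytes.
 */
static long query_sigmask(uint64_t *mask)
{
	return raw_syscall4(__NR_rt_sigprocmask, SIG_BLOCK, 0,
			    (long)(uintptr_t)mask, (long)sizeof(*mask));
}
```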
diff --git a/engines/io_uring.c b/engines/io_uring.c
index a2533c88..1e15647e 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -278,8 +278,13 @@ static struct fio_option options[] = {
 static int io_uring_enter(struct ioring_data *ld, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
+#ifdef FIO_ARCH_HAS_SYSCALL
+	return __do_syscall6(__NR_io_uring_enter, ld->ring_fd, to_submit,
+				min_complete, flags, NULL, 0);
+#else
 	return syscall(__NR_io_uring_enter, ld->ring_fd, to_submit,
 			min_complete, flags, NULL, 0);
+#endif
 }
 
 static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
diff --git a/t/io_uring.c b/t/io_uring.c
index f513d7dc..b8fcffe8 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -422,8 +422,13 @@ out:
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
+#ifdef FIO_ARCH_HAS_SYSCALL
+	return __do_syscall6(__NR_io_uring_enter, s->ring_fd, to_submit,
+				min_complete, flags, NULL, 0);
+#else
 	return syscall(__NR_io_uring_enter, s->ring_fd, to_submit, min_complete,
 			flags, NULL, 0);
+#endif
 }
 
 #ifndef CONFIG_HAVE_GETTID
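The two branches of `io_uring_enter()` report errors differently. syscall(2) returns -1 and leaves the error code in `errno`. The raw `__do_syscall6()` path returns the kernel's value directly, a negative errno, and does not touch `errno`. The sketch below shows the difference with close(2) on an invalid descriptor. The function names are hypothetical, and only the x86_64 path issues the instruction directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* libc path: -1 on failure, error code left in errno */
static long close_libc(int fd)
{
	return syscall(__NR_close, fd);
}

/* raw path: negative errno returned directly, errno not touched */
static long close_raw(int fd)
{
#if defined(__x86_64__)
	long rax;

	__asm__ volatile("syscall"
			 : "=a"(rax)
			 : "a"((long)__NR_close), "D"((long)fd)
			 : "rcx", "r11", "memory");
	return rax;
#else
	/* Emulate the raw convention elsewhere so the example behaves alike. */
	long ret = syscall(__NR_close, fd);

	return ret == -1 ? -errno : ret;
#endif
}
```

A caller that wants one convention for both branches could normalize the libc result, as the fallback path does here, by mapping a -1 return to -errno.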


* Recent changes (master)
@ 2022-02-21 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-21 13:00 UTC
  To: fio


The following changes since commit 933651ec130ce4d27a5c249d649d20afeb2bdf38:

  Merge branch 'rpma-update-RPMA-engines-with-new-librpma-completions-API' of https://github.com/ldorau/fio (2022-02-18 09:02:03 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3f43022d4021850905886e391ec68c02c99aec5a:

  Merge branch 'genfio-tempfile' of https://github.com/scop/fio (2022-02-20 12:39:11 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'which-command-v-type-P' of https://github.com/scop/fio
      Merge branch 'spelling' of https://github.com/scop/fio
      Merge branch 'genfio-tempfile' of https://github.com/scop/fio

Ville Skyttä (3):
      genfio: fix temporary file handling
      ci, t, tools: use `command` and `type` instead of `which`
      Spelling and grammar fixes

 HOWTO.rst                           | 4 ++--
 ci/travis-install-pmdk.sh           | 2 +-
 crc/xxhash.c                        | 4 ++--
 engines/exec.c                      | 4 ++--
 engines/http.c                      | 4 ++--
 engines/ime.c                       | 2 +-
 engines/libhdfs.c                   | 2 +-
 engines/librpma_fio.c               | 2 +-
 engines/librpma_gpspm.c             | 2 +-
 engines/nbd.c                       | 2 +-
 engines/rados.c                     | 2 +-
 engines/rbd.c                       | 4 ++--
 engines/rdma.c                      | 2 +-
 examples/enospc-pressure.fio        | 4 ++--
 examples/falloc.fio                 | 2 +-
 examples/librpma_apm-server.fio     | 2 +-
 examples/librpma_gpspm-server.fio   | 2 +-
 examples/rand-zones.fio             | 2 +-
 filesetup.c                         | 2 +-
 fio.1                               | 4 ++--
 graph.c                             | 2 +-
 lib/pattern.c                       | 6 +++---
 options.c                           | 4 ++--
 os/os-android.h                     | 2 +-
 os/os-netbsd.h                      | 2 +-
 os/windows/posix.c                  | 2 +-
 oslib/libmtd.h                      | 6 +++---
 stat.c                              | 2 +-
 stat.h                              | 2 +-
 t/latency_percentiles.py            | 2 +-
 t/one-core-peak.sh                  | 6 +++---
 t/readonly.py                       | 2 +-
 t/sgunmap-test.py                   | 2 +-
 t/steadystate_tests.py              | 2 +-
 t/time-test.c                       | 2 +-
 tools/fio_generate_plots            | 2 +-
 tools/fio_jsonplus_clat2csv         | 4 ++--
 tools/fiograph/fiograph.py          | 2 +-
 tools/genfio                        | 5 +++--
 tools/hist/fio-histo-log-pctiles.py | 2 +-
 tools/plot/fio2gnuplot              | 4 ++--
 tools/plot/fio2gnuplot.1            | 2 +-
 tools/plot/fio2gnuplot.manpage      | 2 +-
 43 files changed, 61 insertions(+), 60 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO.rst b/HOWTO.rst
index ac1f3478..0978879c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1443,7 +1443,7 @@ I/O type
 	range of possible random values.
 	Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
 	If you wanted to use **zipf** with a `theta` of 1.2 centered on 1/4 of allowed value range,
-	you would use ``random_distibution=zipf:1.2:0.25``.
+	you would use ``random_distribution=zipf:1.2:0.25``.
 
 	For a **zoned** distribution, fio supports specifying percentages of I/O
 	access that should fall within what range of the file or device. For
@@ -3370,7 +3370,7 @@ Verification
 	To avoid false verification errors, do not use the norandommap option when
 	verifying data with async I/O engines and I/O depths > 1.  Or use the
 	norandommap and the lfsr random generator together to avoid writing to the
-	same offset with muliple outstanding I/Os.
+	same offset with multiple outstanding I/Os.
 
 .. option:: verify_offset=int
 
diff --git a/ci/travis-install-pmdk.sh b/ci/travis-install-pmdk.sh
index 803438f8..3b0b5bbc 100755
--- a/ci/travis-install-pmdk.sh
+++ b/ci/travis-install-pmdk.sh
@@ -12,7 +12,7 @@ WORKDIR=$(pwd)
 #    /bin/sh: 1: clang: not found
 # if CC is not set to the full path of clang.
 #
-export CC=$(which $CC)
+export CC=$(type -P $CC)
 
 # Install PMDK libraries, because PMDK's libpmem
 # is a dependency of the librpma fio engine.
diff --git a/crc/xxhash.c b/crc/xxhash.c
index 4736c528..0119564b 100644
--- a/crc/xxhash.c
+++ b/crc/xxhash.c
@@ -50,10 +50,10 @@ You can contact the author at :
 //#define XXH_ACCEPT_NULL_INPUT_POINTER 1
 
 // XXH_FORCE_NATIVE_FORMAT :
-// By default, xxHash library provides endian-independant Hash values, based on little-endian convention.
+// By default, xxHash library provides endian-independent Hash values, based on little-endian convention.
 // Results are therefore identical for little-endian and big-endian CPU.
 // This comes at a performance cost for big-endian CPU, since some swapping is required to emulate little-endian format.
-// Should endian-independance be of no importance for your application, you may set the #define below to 1.
+// Should endian-independence be of no importance for your application, you may set the #define below to 1.
 // It will improve speed for Big-endian CPU.
 // This option has no impact on Little_Endian CPU.
 #define XXH_FORCE_NATIVE_FORMAT 0
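"Endian-independent" here means the input bytes are always read in little-endian order, whatever the CPU's native byte order. Below is a minimal sketch of such a read; `read_le32()` is a hypothetical name and not xxHash's own helper. Assembling the word byte by byte is the extra work that a big-endian CPU pays, and XXH_FORCE_NATIVE_FORMAT lets it skip that work.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes in little-endian order,
 * giving the same result on little- and big-endian hosts. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```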
diff --git a/engines/exec.c b/engines/exec.c
index ab3639c5..20e50e00 100644
--- a/engines/exec.c
+++ b/engines/exec.c
@@ -67,8 +67,8 @@ char *str_replace(char *orig, const char *rep, const char *with)
 	/*
 	 * Replace a substring by another.
 	 *
-	 * Returns the new string if occurences were found
-	 * Returns orig if no occurence is found
+	 * Returns the new string if occurrences were found
+	 * Returns orig if no occurrence is found
 	 */
 	char *result, *insert, *tmp;
 	int len_rep, len_with, len_front, count;
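The comment describes a simple contract: return a fresh string when at least one occurrence is replaced, and return the original pointer otherwise. A sketch of that contract is shown below. `replace_all()` is hypothetical and is not the engine's actual code. It returns NULL if allocation fails, and the caller should free the result only when it differs from `orig`.

```c
#include <stdlib.h>
#include <string.h>

/* Replace every occurrence of rep in orig with "with".
 * Returns a malloc'd string if anything was replaced, orig otherwise. */
static char *replace_all(char *orig, const char *rep, const char *with)
{
	size_t len_rep = strlen(rep), len_with = strlen(with);
	size_t count = 0;
	const char *p, *hit;
	char *result, *out;

	if (len_rep == 0)
		return orig;
	for (p = orig; (p = strstr(p, rep)) != NULL; p += len_rep)
		count++;
	if (count == 0)
		return orig;

	/* count * len_rep <= strlen(orig), so this cannot underflow */
	result = malloc(strlen(orig) - count * len_rep + count * len_with + 1);
	if (!result)
		return NULL;

	out = result;
	for (p = orig; (hit = strstr(p, rep)) != NULL; p = hit + len_rep) {
		memcpy(out, p, (size_t)(hit - p));	/* text before the match */
		out += hit - p;
		memcpy(out, with, len_with);		/* the replacement */
		out += len_with;
	}
	strcpy(out, p);					/* tail after last match */
	return result;
}
```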
diff --git a/engines/http.c b/engines/http.c
index 35c44871..57d4967d 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -388,7 +388,7 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 
 	signature = _conv_hex(md, SHA256_DIGEST_LENGTH);
 
-	/* Surpress automatic Accept: header */
+	/* Suppress automatic Accept: header */
 	slist = curl_slist_append(slist, "Accept:");
 
 	snprintf(s, sizeof(s), "x-amz-content-sha256: %s", dsha);
@@ -419,7 +419,7 @@ static void _add_swift_header(CURL *curl, struct curl_slist *slist, struct http_
 	if (op == DDIR_WRITE) {
 		dsha = _gen_hex_md5(buf, len);
 	}
-	/* Surpress automatic Accept: header */
+	/* Suppress automatic Accept: header */
 	slist = curl_slist_append(slist, "Accept:");
 
 	snprintf(s, sizeof(s), "etag: %s", dsha);
diff --git a/engines/ime.c b/engines/ime.c
index 440cc29e..f6690cc1 100644
--- a/engines/ime.c
+++ b/engines/ime.c
@@ -83,7 +83,7 @@ struct ime_data {
 	};
 	struct iovec 	*iovecs;		/* array of queued iovecs */
 	struct io_u 	**io_us;		/* array of queued io_u pointers */
-	struct io_u 	**event_io_us;	/* array of the events retieved afer get_events*/
+	struct io_u 	**event_io_us;	/* array of the events retrieved after get_events*/
 	unsigned int 	queued;			/* iovecs/io_us in the queue */
 	unsigned int 	events;			/* number of committed iovecs/io_us */
 
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index eb55c3c5..f20e45ca 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -27,7 +27,7 @@ struct hdfsio_data {
 };
 
 struct hdfsio_options {
-	void *pad;			/* needed because offset can't be 0 for a option defined used offsetof */
+	void *pad;			/* needed because offset can't be 0 for an option defined used offsetof */
 	char *host;
 	char *directory;
 	unsigned int port;
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index dfd82180..34818904 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -426,7 +426,7 @@ int librpma_fio_client_post_init(struct thread_data *td)
 
 	/*
 	 * td->orig_buffer is not aligned. The engine requires aligned io_us
-	 * so FIO alignes up the address using the formula below.
+	 * so FIO aligns up the address using the formula below.
 	 */
 	ccd->orig_buffer_aligned = PTR_ALIGN(td->orig_buffer, page_mask) +
 			td->o.mem_align;
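The alignment formula rounds the buffer address up to the next page boundary and then adds the configured `mem_align` offset. If `page_mask` is the page size minus one, the rounding step can be sketched as follows. `ptr_align_up()` is a hypothetical stand-in for fio's `PTR_ALIGN()`, whose definition is not part of this diff.

```c
#include <stdint.h>

/* Round ptr up to the next multiple of (mask + 1); mask + 1 must be a
 * power of two, e.g. mask = page size - 1. */
static char *ptr_align_up(const void *ptr, uintptr_t mask)
{
	return (char *)(((uintptr_t)ptr + mask) & ~mask);
}
```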
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
index 14626e7f..5cf97472 100644
--- a/engines/librpma_gpspm.c
+++ b/engines/librpma_gpspm.c
@@ -431,7 +431,7 @@ static int server_post_init(struct thread_data *td)
 
 	/*
 	 * td->orig_buffer is not aligned. The engine requires aligned io_us
-	 * so FIO alignes up the address using the formula below.
+	 * so FIO aligns up the address using the formula below.
 	 */
 	sd->orig_buffer_aligned = PTR_ALIGN(td->orig_buffer, page_mask) +
 			td->o.mem_align;
diff --git a/engines/nbd.c b/engines/nbd.c
index b0ba75e6..7c2d5f4b 100644
--- a/engines/nbd.c
+++ b/engines/nbd.c
@@ -52,7 +52,7 @@ static struct fio_option options[] = {
 	},
 };
 
-/* Alocates nbd_data. */
+/* Allocates nbd_data. */
 static int nbd_setup(struct thread_data *td)
 {
 	struct nbd_data *nbd_data;
diff --git a/engines/rados.c b/engines/rados.c
index 23e62c4c..976f9229 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -151,7 +151,7 @@ static int _fio_rados_connect(struct thread_data *td)
 		char *client_name = NULL;
 
 		/*
-		* If we specify cluser name, the rados_create2
+		* If we specify cluster name, the rados_create2
 		* will not assume 'client.'. name is considered
 		* as a full type.id namestr
 		*/
diff --git a/engines/rbd.c b/engines/rbd.c
index c6203d4c..2f25889a 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -173,7 +173,7 @@ static int _fio_rbd_connect(struct thread_data *td)
 		char *client_name = NULL; 
 
 		/*
-		 * If we specify cluser name, the rados_create2
+		 * If we specify cluster name, the rados_create2
 		 * will not assume 'client.'. name is considered
 		 * as a full type.id namestr
 		 */
@@ -633,7 +633,7 @@ static int fio_rbd_setup(struct thread_data *td)
 
 	/* taken from "net" engine. Pretend we deal with files,
 	 * even if we do not have any ideas about files.
-	 * The size of the RBD is set instead of a artificial file.
+	 * The size of the RBD is set instead of an artificial file.
 	 */
 	if (!td->files_index) {
 		add_file(td, td->o.filename ? : "rbd", 0, 0);
diff --git a/engines/rdma.c b/engines/rdma.c
index f4471869..4eb86652 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -1194,7 +1194,7 @@ static int check_set_rlimits(struct thread_data *td)
 
 static int compat_options(struct thread_data *td)
 {
-	// The original RDMA engine had an ugly / seperator
+	// The original RDMA engine had an ugly / separator
 	// on the filename for it's options. This function
 	// retains backwards compatibility with it. Note we do not
 	// support setting the bindname option is this legacy mode.
diff --git a/examples/enospc-pressure.fio b/examples/enospc-pressure.fio
index ca9d8f7a..fa404fd5 100644
--- a/examples/enospc-pressure.fio
+++ b/examples/enospc-pressure.fio
@@ -35,8 +35,8 @@ bs=4k
 rw=randtrim
 filename=raicer
 
-# Verifier thread continiously write to newly allcated blocks
-# and veryfy written content
+# Verifier thread continuously writes to newly allcated blocks
+# and verifies written content
 [aio-dio-verifier]
 create_on_open=1
 verify=crc32c-intel
diff --git a/examples/falloc.fio b/examples/falloc.fio
index fadf1321..5a3e88b8 100644
--- a/examples/falloc.fio
+++ b/examples/falloc.fio
@@ -29,7 +29,7 @@ rw=randtrim
 numjobs=2
 filename=fragmented_file
 
-## Mesure IO performance on fragmented file
+## Measure IO performance on fragmented file
 [sequential aio-dio write]
 stonewall
 ioengine=libaio
diff --git a/examples/librpma_apm-server.fio b/examples/librpma_apm-server.fio
index 062b5215..dc1ddba2 100644
--- a/examples/librpma_apm-server.fio
+++ b/examples/librpma_apm-server.fio
@@ -20,7 +20,7 @@ thread
 # (https://pmem.io/rpma/documentation/basic-direct-write-to-pmem.html)
 direct_write_to_pmem=0
 
-numjobs=1 # number of expected incomming connections
+numjobs=1 # number of expected incoming connections
 size=100MiB # size of workspace for a single connection
 filename=malloc # device dax or an existing fsdax file or "malloc" for allocation from DRAM
 # filename=/dev/dax1.0
diff --git a/examples/librpma_gpspm-server.fio b/examples/librpma_gpspm-server.fio
index 67e92a28..4555314f 100644
--- a/examples/librpma_gpspm-server.fio
+++ b/examples/librpma_gpspm-server.fio
@@ -22,7 +22,7 @@ thread
 direct_write_to_pmem=0
 # set to 0 (false) to wait for completion instead of busy-wait polling completion.
 busy_wait_polling=1
-numjobs=1 # number of expected incomming connections
+numjobs=1 # number of expected incoming connections
 iodepth=2 # number of parallel GPSPM requests
 size=100MiB # size of workspace for a single connection
 filename=malloc # device dax or an existing fsdax file or "malloc" for allocation from DRAM
diff --git a/examples/rand-zones.fio b/examples/rand-zones.fio
index 169137d4..10e71727 100644
--- a/examples/rand-zones.fio
+++ b/examples/rand-zones.fio
@@ -21,6 +21,6 @@ random_distribution=zoned:50/5:30/15:20/
 # The above applies to all of reads/writes/trims. If we wanted to do
 # something differently for writes, let's say 50% for the first 10%
 # and 50% for the remaining 90%, we could do it by adding a new section
-# after a a comma.
+# after a comma.
 
 # random_distribution=zoned:50/5:30/15:20/,50/10:50/90
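Each `access/size` pair in a zoned distribution sends the given percentage of I/O to the given percentage of the file, and the zones are laid out back to back. The hypothetical sketch below shows how two uniform draws could be mapped onto a block for the `50/10:50/90` example above. It is not fio's implementation; `struct zone_spec` and `zoned_block()` are illustrative names. It assumes at least one zone, with access and size percentages that each sum to 100.

```c
#include <stddef.h>
#include <stdint.h>

/* One zone: access_pct% of I/O lands in size_pct% of the file. */
struct zone_spec {
	unsigned int access_pct;
	unsigned int size_pct;
};

/*
 * Map two uniform draws onto a block number in [0, nblocks).
 *   pick: uniform in [0, 100), chooses the zone by cumulative access share
 *   frac: uniform in [0, 1), chooses a block inside that zone
 */
static uint64_t zoned_block(const struct zone_spec *z, size_t nz,
			    uint64_t nblocks, unsigned int pick, double frac)
{
	unsigned int acc = 0, start_pct = 0;
	size_t i;

	/* Walk all but the last zone; if none matches, the last one wins. */
	for (i = 0; i < nz - 1; i++) {
		acc += z[i].access_pct;
		if (pick < acc)
			break;
		start_pct += z[i].size_pct;
	}

	uint64_t first = nblocks * start_pct / 100;
	uint64_t len = nblocks * z[i].size_pct / 100;
	uint64_t off;

	if (len == 0)
		len = 1;
	off = (uint64_t)(frac * (double)len);
	if (off >= len)		/* guard against rounding up when frac ~ 1 */
		off = len - 1;
	return first + off;
}
```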
diff --git a/filesetup.c b/filesetup.c
index fb556d84..7c32d0af 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1486,7 +1486,7 @@ static bool init_rand_distribution(struct thread_data *td)
 
 /*
  * Check if the number of blocks exceeds the randomness capability of
- * the selected generator. Tausworthe is 32-bit, the others are fullly
+ * the selected generator. Tausworthe is 32-bit, the others are fully
  * 64-bit capable.
  */
 static int check_rand_gen_limits(struct thread_data *td, struct fio_file *f,
diff --git a/fio.1 b/fio.1
index e23d4092..98410655 100644
--- a/fio.1
+++ b/fio.1
@@ -1221,7 +1221,7 @@ more control over most probable outcome. This value is in range [0-1] which maps
 range of possible random values.
 Defaults are: random for \fBpareto\fR and \fBzipf\fR, and 0.5 for \fBnormal\fR.
 If you wanted to use \fBzipf\fR with a `theta` of 1.2 centered on 1/4 of allowed value range,
-you would use `random_distibution=zipf:1.2:0.25`.
+you would use `random_distribution=zipf:1.2:0.25`.
 .P
 For a \fBzoned\fR distribution, fio supports specifying percentages of I/O
 access that should fall within what range of the file or device. For
@@ -3082,7 +3082,7 @@ the verify will be of the newly written data.
 To avoid false verification errors, do not use the norandommap option when
 verifying data with async I/O engines and I/O depths > 1.  Or use the
 norandommap and the lfsr random generator together to avoid writing to the
-same offset with muliple outstanding I/Os.
+same offset with multiple outstanding I/Os.
 .RE
 .TP
 .BI verify_offset \fR=\fPint
diff --git a/graph.c b/graph.c
index 7a174170..c49cdae1 100644
--- a/graph.c
+++ b/graph.c
@@ -999,7 +999,7 @@ const char *graph_find_tooltip(struct graph *g, int ix, int iy)
 				ydiff = fabs(yval - y);
 
 				/*
-				 * zero delta, or within or match critera, break
+				 * zero delta, or within or match criteria, break
 				 */
 				if (ydiff < best_delta) {
 					best_delta = ydiff;
diff --git a/lib/pattern.c b/lib/pattern.c
index 680a12be..d8203630 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -211,7 +211,7 @@ static const char *parse_number(const char *beg, char *out,
  * This function tries to find formats, e.g.:
  *   %o - offset of the block
  *
- * In case of successfull parsing it fills the format param
+ * In case of successful parsing it fills the format param
  * with proper offset and the size of the expected value, which
  * should be pasted into buffer using the format 'func' callback.
  *
@@ -267,7 +267,7 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
  * @fmt_desc - array of pattern format descriptors [input]
  * @fmt - array of pattern formats [output]
  * @fmt_sz - pointer where the size of pattern formats array stored [input],
- *           after successfull parsing this pointer will contain the number
+ *           after successful parsing this pointer will contain the number
  *           of parsed formats if any [output].
  *
  * strings:
@@ -275,7 +275,7 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
  *   NOTE: there is no way to escape quote, so "123\"abc" does not work.
  *
  * numbers:
- *   hexidecimal - sequence of hex bytes starting from 0x or 0X prefix,
+ *   hexadecimal - sequence of hex bytes starting from 0x or 0X prefix,
  *                 e.g. 0xff12ceff1100ff
  *   decimal     - decimal number in range [INT_MIN, INT_MAX]
  *
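A minimal sketch of the hexadecimal rule, which turns `0xff12ce` into the bytes ff 12 ce, is shown below. `parse_hex_bytes()` is a hypothetical helper and not fio's `parse_number()`. For simplicity it assumes the whole string is a single hex literal with an even number of digits, and it rejects anything else.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse "0x..."/"0X..." into bytes, two hex digits per byte.
 * Returns the number of bytes stored in out, or -1 on malformed input
 * or if out is too small.
 */
static int parse_hex_bytes(const char *s, unsigned char *out, size_t outlen)
{
	size_t n = 0, ndigits;

	if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
		return -1;
	s += 2;
	ndigits = strlen(s);
	if (ndigits == 0 || ndigits % 2 != 0)
		return -1;

	for (; *s; s += 2) {
		int hi = hex_digit(s[0]);
		int lo = hex_digit(s[1]);

		if (hi < 0 || lo < 0 || n == outlen)
			return -1;
		out[n++] = (unsigned char)((hi << 4) | lo);
	}
	return (int)n;
}
```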
diff --git a/options.c b/options.c
index 6cdbd268..e06d9b66 100644
--- a/options.c
+++ b/options.c
@@ -1366,7 +1366,7 @@ int get_max_str_idx(char *input)
 }
 
 /*
- * Returns the directory at the index, indexes > entires will be
+ * Returns the directory at the index, indexes > entries will be
  * assigned via modulo division of the index
  */
 int set_name_idx(char *target, size_t tlen, char *input, int index,
@@ -1560,7 +1560,7 @@ static int str_gtod_reduce_cb(void *data, int *il)
 	int val = *il;
 
 	/*
-	 * Only modfiy options if gtod_reduce==1
+	 * Only modify options if gtod_reduce==1
 	 * Otherwise leave settings alone.
 	 */
 	if (val) {
diff --git a/os/os-android.h b/os/os-android.h
index 10c51b83..2f73d249 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -66,7 +66,7 @@
 
 #ifndef CONFIG_NO_SHM
 /*
- * Bionic doesn't support SysV shared memeory, so implement it using ashmem
+ * Bionic doesn't support SysV shared memory, so implement it using ashmem
  */
 #include <stdio.h>
 #include <linux/ashmem.h>
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 624c7fa5..b553a430 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -13,7 +13,7 @@
 #include <sys/endian.h>
 #include <sys/sysctl.h>
 
-/* XXX hack to avoid confilcts between rbtree.h and <sys/rbtree.h> */
+/* XXX hack to avoid conflicts between rbtree.h and <sys/rbtree.h> */
 #undef rb_node
 #undef rb_left
 #undef rb_right
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 0d415e1e..a3a6c89f 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -1165,7 +1165,7 @@ HANDLE windows_handle_connection(HANDLE hjob, int sk)
 		ret = pi.hProcess;
 
 	/* duplicate socket and write the protocol_info to pipe so child can
-	 * duplicate the communciation socket */
+	 * duplicate the communication socket */
 	if (WSADuplicateSocket(sk, GetProcessId(pi.hProcess), &protocol_info)) {
 		log_err("WSADuplicateSocket failed (%lu).\n", GetLastError());
 		ret = INVALID_HANDLE_VALUE;
diff --git a/oslib/libmtd.h b/oslib/libmtd.h
index a0c90dcb..668e7798 100644
--- a/oslib/libmtd.h
+++ b/oslib/libmtd.h
@@ -256,7 +256,7 @@ int mtd_mark_bad(const struct mtd_dev_info *mtd, int fd, int eb);
  * @mtd: MTD device description object
  * @fd: MTD device node file descriptor
  * @eb: eraseblock to read from
- * @offs: offset withing the eraseblock to read from
+ * @offs: offset within the eraseblock to read from
  * @buf: buffer to read data to
  * @len: how many bytes to read
  *
@@ -273,7 +273,7 @@ int mtd_read(const struct mtd_dev_info *mtd, int fd, int eb, int offs,
  * @mtd: MTD device description object
  * @fd: MTD device node file descriptor
  * @eb: eraseblock to write to
- * @offs: offset withing the eraseblock to write to
+ * @offs: offset within the eraseblock to write to
  * @data: data buffer to write
  * @len: how many data bytes to write
  * @oob: OOB buffer to write
@@ -329,7 +329,7 @@ int mtd_write_oob(libmtd_t desc, const struct mtd_dev_info *mtd, int fd,
  * @mtd: MTD device description object
  * @fd: MTD device node file descriptor
  * @eb: eraseblock to write to
- * @offs: offset withing the eraseblock to write to
+ * @offs: offset within the eraseblock to write to
  * @img_name: the file to write
  *
  * This function writes an image @img_name the MTD device defined by @mtd. @eb
diff --git a/stat.c b/stat.c
index 1764eebc..7947edb4 100644
--- a/stat.c
+++ b/stat.c
@@ -377,7 +377,7 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		free(maxalt);
 	}
 
-	/* Need to aggregate statisitics to show mixed values */
+	/* Need to aggregate statistics to show mixed values */
 	if (rs->unified_rw_rep == UNIFIED_BOTH)
 		show_mixed_group_stats(rs, out);
 }
diff --git a/stat.h b/stat.h
index dce0bb0d..eb7845af 100644
--- a/stat.h
+++ b/stat.h
@@ -68,7 +68,7 @@ struct group_run_stats {
  * than one. This method has low accuracy when the value is small. For
  * example, let the buckets be {[0,99],[100,199],...,[900,999]}, and
  * the represented value of each bucket be the mean of the range. Then
- * a value 0 has an round-off error of 49.5. To improve on this, we
+ * a value 0 has a round-off error of 49.5. To improve on this, we
  * use buckets with non-uniform ranges, while bounding the error of
  * each bucket within a ratio of the sample value. A simple example
  * would be when error_bound = 0.005, buckets are {
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index 9e37d9fe..81704700 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -270,7 +270,7 @@ class FioLatTest():
             #
             # Check only for the presence/absence of json+
             # latency bins. Future work can check the
-            # accurracy of the bin values and counts.
+            # accuracy of the bin values and counts.
             #
             # Because the latency percentiles are based on
             # the bins, we can be confident that the bin
diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index 9da8304e..3ac119f6 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -33,8 +33,8 @@ check_binary() {
   # Ensure the binaries are present and executable
   for bin in "$@"; do
     if [ ! -x ${bin} ]; then
-      which ${bin} >/dev/null
-      [ $? -eq 0 ] || fatal "${bin} doesn't exists or is not executable"
+      command -v ${bin} >/dev/null
+      [ $? -eq 0 ] || fatal "${bin} doesn't exist or is not executable"
     fi
   done
 }
@@ -197,7 +197,7 @@ show_nvme() {
   fw=$(cat ${device_dir}/firmware_rev | xargs) #xargs for trimming spaces
   serial=$(cat ${device_dir}/serial | xargs) #xargs for trimming spaces
   info ${device_name} "MODEL=${model} FW=${fw} serial=${serial} PCI=${pci_addr}@${link_speed} IRQ=${irq} NUMA=${numa} CPUS=${cpus} "
-  which nvme &> /dev/null
+  command -v nvme > /dev/null
   if [ $? -eq 0 ]; then
     status=""
     NCQA=$(nvme get-feature -H -f 0x7 ${device} 2>&1 |grep NCQA |cut -d ':' -f 2 | xargs)
diff --git a/t/readonly.py b/t/readonly.py
index 464847c6..80fac639 100755
--- a/t/readonly.py
+++ b/t/readonly.py
@@ -6,7 +6,7 @@
 #
 # readonly.py
 #
-# Do some basic tests of the --readonly paramter
+# Do some basic tests of the --readonly parameter
 #
 # USAGE
 # python readonly.py [-f fio-executable]
diff --git a/t/sgunmap-test.py b/t/sgunmap-test.py
index 4960a040..6687494f 100755
--- a/t/sgunmap-test.py
+++ b/t/sgunmap-test.py
@@ -3,7 +3,7 @@
 #
 # sgunmap-test.py
 #
-# Limited functonality test for trim workloads using fio's sg ioengine
+# Limited functionality test for trim workloads using fio's sg ioengine
 # This checks only the three sets of reported iodepths
 #
 # !!!WARNING!!!
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index e8bd768c..d6ffd177 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -2,7 +2,7 @@
 #
 # steadystate_tests.py
 #
-# Test option parsing and functonality for fio's steady state detection feature.
+# Test option parsing and functionality for fio's steady state detection feature.
 #
 # steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
 #
diff --git a/t/time-test.c b/t/time-test.c
index a74d9206..3c87d4d4 100644
--- a/t/time-test.c
+++ b/t/time-test.c
@@ -67,7 +67,7 @@
  *	accuracy because the (ticks * clock_mult) product used for final
  *	fractional chunk
  *
- *  iv) 64-bit arithmetic with the clock ticks to nsec conversion occuring in
+ *  iv) 64-bit arithmetic with the clock ticks to nsec conversion occurring in
  *	two stages. This is carried out using locks to update the number of
  *	large time chunks (MAX_CLOCK_SEC_2STAGE) that have elapsed.
  *
diff --git a/tools/fio_generate_plots b/tools/fio_generate_plots
index e4558788..468cf27a 100755
--- a/tools/fio_generate_plots
+++ b/tools/fio_generate_plots
@@ -21,7 +21,7 @@ if [ -z "$1" ]; then
 	exit 1
 fi
 
-GNUPLOT=$(which gnuplot)
+GNUPLOT=$(command -v gnuplot)
 if [ ! -x "$GNUPLOT" ]
 then
 	echo You need gnuplot installed to generate graphs
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index 7f310fcc..8fdd014d 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -135,7 +135,7 @@ def more_bins(indices, bins):
 
     Returns:
         True if the indices do not yet point to the end of each bin in bins.
-        False if the indices point beyond their repsective bins.
+        False if the indices point beyond their respective bins.
     """
 
     for key, value in six.iteritems(indices):
@@ -160,7 +160,7 @@ def debug_print(debug, *args):
 def get_csvfile(dest, jobnum):
     """Generate CSV filename from command-line arguments and job numbers.
 
-    Paramaters:
+    Parameters:
         dest        file specification for CSV filename.
         jobnum      job number.
 
diff --git a/tools/fiograph/fiograph.py b/tools/fiograph/fiograph.py
index b5669a2d..384decda 100755
--- a/tools/fiograph/fiograph.py
+++ b/tools/fiograph/fiograph.py
@@ -218,7 +218,7 @@ def fio_to_graphviz(filename, format):
     # The first job will be a new execution group
     new_execution_group = True
 
-    # Let's interate on all sections to create links between them
+    # Let's iterate on all sections to create links between them
     for section_name in fio_file.sections():
         # The current section
         section = fio_file[section_name]
diff --git a/tools/genfio b/tools/genfio
index 8518bbcc..c9bc2f76 100755
--- a/tools/genfio
+++ b/tools/genfio
@@ -22,7 +22,8 @@
 BLK_SIZE=
 BLOCK_SIZE=4k
 SEQ=-1
-TEMPLATE=/tmp/template.fio
+TEMPLATE=$(mktemp "${TMPDIR:-${TEMP:-/tmp}}/template.fio.XXXXXX") || exit $?
+trap 'rm -f "$TEMPLATE"' EXIT
 OUTFILE=
 DISKS=
 PRINTABLE_DISKS=
@@ -48,7 +49,7 @@ show_help() {
 					one test after another then one disk after another
 					Disabled by default
 -p				: Run parallel test
-					one test after anoter but all disks at the same time
+					one test after another but all disks at the same time
 					Enabled by default
 -D iodepth			: Run with the specified iodepth
 					Default is $IODEPTH
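The genfio change replaces the fixed, predictable /tmp/template.fio with a unique file under `$TMPDIR` (then `$TEMP`, then /tmp) and deletes it on exit. A rough C counterpart could use mkstemp(3). `make_template()` is a hypothetical name, and the sketch assumes the caller cleans up afterwards.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>	/* close(2) and unlink(2) for cleanup by the caller */

/*
 * Create a unique template file, choosing the directory like the shell
 * expansion ${TMPDIR:-${TEMP:-/tmp}} (unset or empty falls through).
 * On success returns an open descriptor and leaves the name in path;
 * returns -1 on failure.
 */
static int make_template(char *path, size_t len)
{
	const char *dir = getenv("TMPDIR");
	int n;

	if (!dir || !*dir)
		dir = getenv("TEMP");
	if (!dir || !*dir)
		dir = "/tmp";

	n = snprintf(path, len, "%s/template.fio.XXXXXX", dir);
	if (n < 0 || (size_t)n >= len)
		return -1;
	/* mkstemp() fills in XXXXXX and creates the file exclusively. */
	return mkstemp(path);
}
```

The shell version deletes the file automatically through `trap 'rm -f "$TEMPLATE"' EXIT`, but this sketch does not. The caller should close the descriptor and unlink(2) the path when finished.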
diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index 08e7722d..b5d167de 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -748,7 +748,7 @@ if unittest2_imported:
     def test_e2_get_pctiles_highest_pct(self):
         fio_v3_bucket_count = 29 * 64
         with open(self.fn, 'w') as f:
-            # make a empty fio v3 histogram
+            # make an empty fio v3 histogram
             buckets = [ 0 for j in range(0, fio_v3_bucket_count) ]
             # add one I/O request to last bucket
             buckets[-1] = 1
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index d2dc81df..ce3ca2cc 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -492,8 +492,8 @@ def main(argv):
     #We need to adjust the output filename regarding the pattern required by the user
     if (pattern_set_by_user == True):
         gnuplot_output_filename=pattern
-        # As we do have some glob in the pattern, let's make this simpliest
-        # We do remove the simpliest parts of the expression to get a clear file name
+        # As we do have some glob in the pattern, let's make this simplest
+        # We do remove the simplest parts of the expression to get a clear file name
         gnuplot_output_filename=gnuplot_output_filename.replace('-*-','-')
         gnuplot_output_filename=gnuplot_output_filename.replace('*','-')
         gnuplot_output_filename=gnuplot_output_filename.replace('--','-')
diff --git a/tools/plot/fio2gnuplot.1 b/tools/plot/fio2gnuplot.1
index 6fb1283f..bfa10d26 100644
--- a/tools/plot/fio2gnuplot.1
+++ b/tools/plot/fio2gnuplot.1
@@ -35,7 +35,7 @@ The resulting graph helps at understanding trends.
 .TP
 .B
 Grouped 2D graph
-All files are plotted in a single image to ease the comparaison. The same rendering options as per the individual 2D graph are used :
+All files are plotted in a single image to ease the comparison. The same rendering options as per the individual 2D graph are used :
 .RS
 .IP \(bu 3
 raw
diff --git a/tools/plot/fio2gnuplot.manpage b/tools/plot/fio2gnuplot.manpage
index 6a12cf81..be3f13c2 100644
--- a/tools/plot/fio2gnuplot.manpage
+++ b/tools/plot/fio2gnuplot.manpage
@@ -20,7 +20,7 @@ DESCRIPTION
                     	The resulting graph helps at understanding trends.
 
  Grouped 2D graph   
-	All files are plotted in a single image to ease the comparaison. The same rendering options as per the individual 2D graph are used :
+	All files are plotted in a single image to ease the comparison. The same rendering options as per the individual 2D graph are used :
          - raw
          - smooth
          - trend


* Recent changes (master)
@ 2022-02-19 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-19 13:00 UTC
  To: fio

The following changes since commit c99c81adb3510a8dc34d47fd40b19ef657e32192:

  Correct F_FULLSYNC -> F_FULLFSYNC (2022-02-17 12:53:59 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 933651ec130ce4d27a5c249d649d20afeb2bdf38:

  Merge branch 'rpma-update-RPMA-engines-with-new-librpma-completions-API' of https://github.com/ldorau/fio (2022-02-18 09:02:03 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'rpma-update-RPMA-engines-with-new-librpma-completions-API' of https://github.com/ldorau/fio

Lukasz Dorau (1):
      rpma: RPMA engines require librpma>=v0.11.0 with rpma_cq_get_wc()

Oksana Salyk (1):
      rpma: update RPMA engines with new librpma completions API

 configure               |  4 ++--
 engines/librpma_apm.c   |  8 +++-----
 engines/librpma_fio.c   | 46 +++++++++++++++++++++++++++++-----------------
 engines/librpma_fio.h   | 16 +++++++++-------
 engines/librpma_gpspm.c | 39 ++++++++++++++++++---------------------
 5 files changed, 61 insertions(+), 52 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 6160d84d..be4605f9 100755
--- a/configure
+++ b/configure
@@ -974,7 +974,7 @@ print_config "rdmacm" "$rdmacm"
 
 ##########################################
 # librpma probe
-# The librpma engine requires librpma>=v0.10.0 with rpma_mr_advise().
+# The librpma engines require librpma>=v0.11.0 with rpma_cq_get_wc().
 if test "$librpma" != "yes" ; then
   librpma="no"
 fi
@@ -982,7 +982,7 @@ cat > $TMPC << EOF
 #include <librpma.h>
 int main(void)
 {
-  void *ptr = rpma_mr_advise;
+  void *ptr = rpma_cq_get_wc;
   (void) ptr; /* unused */
   return 0;
 }
diff --git a/engines/librpma_apm.c b/engines/librpma_apm.c
index ffa3769d..d1166ad8 100644
--- a/engines/librpma_apm.c
+++ b/engines/librpma_apm.c
@@ -22,8 +22,7 @@ static inline int client_io_flush(struct thread_data *td,
 		struct io_u *first_io_u, struct io_u *last_io_u,
 		unsigned long long int len);
 
-static int client_get_io_u_index(struct rpma_completion *cmpl,
-		unsigned int *io_u_index);
+static int client_get_io_u_index(struct ibv_wc *wc, unsigned int *io_u_index);
 
 static int client_init(struct thread_data *td)
 {
@@ -188,10 +187,9 @@ static inline int client_io_flush(struct thread_data *td,
 	return 0;
 }
 
-static int client_get_io_u_index(struct rpma_completion *cmpl,
-		unsigned int *io_u_index)
+static int client_get_io_u_index(struct ibv_wc *wc, unsigned int *io_u_index)
 {
-	memcpy(io_u_index, &cmpl->op_context, sizeof(*io_u_index));
+	memcpy(io_u_index, &wc->wr_id, sizeof(*io_u_index));
 
 	return 1;
 }
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index 9d6ebf38..dfd82180 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -302,6 +302,12 @@ int librpma_fio_client_init(struct thread_data *td,
 	if (ccd->conn == NULL)
 		goto err_peer_delete;
 
+	/* get the connection's main CQ */
+	if ((ret = rpma_conn_get_cq(ccd->conn, &ccd->cq))) {
+		librpma_td_verror(td, ret, "rpma_conn_get_cq");
+		goto err_conn_delete;
+	}
+
 	/* get the connection's private data sent from the server */
 	if ((ret = rpma_conn_get_private_data(ccd->conn, &pdata))) {
 		librpma_td_verror(td, ret, "rpma_conn_get_private_data");
@@ -455,7 +461,7 @@ static enum fio_q_status client_queue_sync(struct thread_data *td,
 		struct io_u *io_u)
 {
 	struct librpma_fio_client_data *ccd = td->io_ops_data;
-	struct rpma_completion cmpl;
+	struct ibv_wc wc;
 	unsigned io_u_index;
 	int ret;
 
@@ -478,31 +484,31 @@ static enum fio_q_status client_queue_sync(struct thread_data *td,
 
 	do {
 		/* get a completion */
-		ret = rpma_conn_completion_get(ccd->conn, &cmpl);
+		ret = rpma_cq_get_wc(ccd->cq, 1, &wc, NULL);
 		if (ret == RPMA_E_NO_COMPLETION) {
 			/* lack of completion is not an error */
 			continue;
 		} else if (ret != 0) {
 			/* an error occurred */
-			librpma_td_verror(td, ret, "rpma_conn_completion_get");
+			librpma_td_verror(td, ret, "rpma_cq_get_wc");
 			goto err;
 		}
 
 		/* if io_us has completed with an error */
-		if (cmpl.op_status != IBV_WC_SUCCESS)
+		if (wc.status != IBV_WC_SUCCESS)
 			goto err;
 
-		if (cmpl.op == RPMA_OP_SEND)
+		if (wc.opcode == IBV_WC_SEND)
 			++ccd->op_send_completed;
 		else {
-			if (cmpl.op == RPMA_OP_RECV)
+			if (wc.opcode == IBV_WC_RECV)
 				++ccd->op_recv_completed;
 
 			break;
 		}
 	} while (1);
 
-	if (ccd->get_io_u_index(&cmpl, &io_u_index) != 1)
+	if (ccd->get_io_u_index(&wc, &io_u_index) != 1)
 		goto err;
 
 	if (io_u->index != io_u_index) {
@@ -654,8 +660,8 @@ int librpma_fio_client_commit(struct thread_data *td)
 static int client_getevent_process(struct thread_data *td)
 {
 	struct librpma_fio_client_data *ccd = td->io_ops_data;
-	struct rpma_completion cmpl;
-	/* io_u->index of completed io_u (cmpl.op_context) */
+	struct ibv_wc wc;
+	/* io_u->index of completed io_u (wc.wr_id) */
 	unsigned int io_u_index;
 	/* # of completed io_us */
 	int cmpl_num = 0;
@@ -665,7 +671,7 @@ static int client_getevent_process(struct thread_data *td)
 	int ret;
 
 	/* get a completion */
-	if ((ret = rpma_conn_completion_get(ccd->conn, &cmpl))) {
+	if ((ret = rpma_cq_get_wc(ccd->cq, 1, &wc, NULL))) {
 		/* lack of completion is not an error */
 		if (ret == RPMA_E_NO_COMPLETION) {
 			/* lack of completion is not an error */
@@ -673,22 +679,22 @@ static int client_getevent_process(struct thread_data *td)
 		}
 
 		/* an error occurred */
-		librpma_td_verror(td, ret, "rpma_conn_completion_get");
+		librpma_td_verror(td, ret, "rpma_cq_get_wc");
 		return -1;
 	}
 
 	/* if io_us has completed with an error */
-	if (cmpl.op_status != IBV_WC_SUCCESS) {
-		td->error = cmpl.op_status;
+	if (wc.status != IBV_WC_SUCCESS) {
+		td->error = wc.status;
 		return -1;
 	}
 
-	if (cmpl.op == RPMA_OP_SEND)
+	if (wc.opcode == IBV_WC_SEND)
 		++ccd->op_send_completed;
-	else if (cmpl.op == RPMA_OP_RECV)
+	else if (wc.opcode == IBV_WC_RECV)
 		++ccd->op_recv_completed;
 
-	if ((ret = ccd->get_io_u_index(&cmpl, &io_u_index)) != 1)
+	if ((ret = ccd->get_io_u_index(&wc, &io_u_index)) != 1)
 		return ret;
 
 	/* look for an io_u being completed */
@@ -750,7 +756,7 @@ int librpma_fio_client_getevents(struct thread_data *td, unsigned int min,
 
 			/*
 			 * To reduce CPU consumption one can use
-			 * the rpma_conn_completion_wait() function.
+			 * the rpma_cq_wait() function.
 			 * Note this greatly increase the latency
 			 * and make the results less stable.
 			 * The bandwidth stays more or less the same.
@@ -1029,6 +1035,12 @@ int librpma_fio_server_open_file(struct thread_data *td, struct fio_file *f,
 	csd->ws_ptr = ws_ptr;
 	csd->conn = conn;
 
+	/* get the connection's main CQ */
+	if ((ret = rpma_conn_get_cq(csd->conn, &csd->cq))) {
+		librpma_td_verror(td, ret, "rpma_conn_get_cq");
+		goto err_conn_delete;
+	}
+
 	return 0;
 
 err_conn_delete:
diff --git a/engines/librpma_fio.h b/engines/librpma_fio.h
index 2c507e9c..91290235 100644
--- a/engines/librpma_fio.h
+++ b/engines/librpma_fio.h
@@ -94,12 +94,13 @@ typedef int (*librpma_fio_flush_t)(struct thread_data *td,
  * - ( 0) - skip
  * - (-1) - on error
  */
-typedef int (*librpma_fio_get_io_u_index_t)(struct rpma_completion *cmpl,
+typedef int (*librpma_fio_get_io_u_index_t)(struct ibv_wc *wc,
 		unsigned int *io_u_index);
 
 struct librpma_fio_client_data {
 	struct rpma_peer *peer;
 	struct rpma_conn *conn;
+	struct rpma_cq *cq;
 
 	/* aligned td->orig_buffer */
 	char *orig_buffer_aligned;
@@ -199,29 +200,29 @@ static inline int librpma_fio_client_io_complete_all_sends(
 		struct thread_data *td)
 {
 	struct librpma_fio_client_data *ccd = td->io_ops_data;
-	struct rpma_completion cmpl;
+	struct ibv_wc wc;
 	int ret;
 
 	while (ccd->op_send_posted != ccd->op_send_completed) {
 		/* get a completion */
-		ret = rpma_conn_completion_get(ccd->conn, &cmpl);
+		ret = rpma_cq_get_wc(ccd->cq, 1, &wc, NULL);
 		if (ret == RPMA_E_NO_COMPLETION) {
 			/* lack of completion is not an error */
 			continue;
 		} else if (ret != 0) {
 			/* an error occurred */
-			librpma_td_verror(td, ret, "rpma_conn_completion_get");
+			librpma_td_verror(td, ret, "rpma_cq_get_wc");
 			break;
 		}
 
-		if (cmpl.op_status != IBV_WC_SUCCESS)
+		if (wc.status != IBV_WC_SUCCESS)
 			return -1;
 
-		if (cmpl.op == RPMA_OP_SEND)
+		if (wc.opcode == IBV_WC_SEND)
 			++ccd->op_send_completed;
 		else {
 			log_err(
-				"A completion other than RPMA_OP_SEND got during cleaning up the CQ from SENDs\n");
+				"A completion other than IBV_WC_SEND got during cleaning up the CQ from SENDs\n");
 			return -1;
 		}
 	}
@@ -251,6 +252,7 @@ struct librpma_fio_server_data {
 
 	/* resources of an incoming connection */
 	struct rpma_conn *conn;
+	struct rpma_cq *cq;
 
 	char *ws_ptr;
 	struct rpma_mr_local *ws_mr;
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
index 74147709..14626e7f 100644
--- a/engines/librpma_gpspm.c
+++ b/engines/librpma_gpspm.c
@@ -60,8 +60,7 @@ static inline int client_io_flush(struct thread_data *td,
 		struct io_u *first_io_u, struct io_u *last_io_u,
 		unsigned long long int len);
 
-static int client_get_io_u_index(struct rpma_completion *cmpl,
-		unsigned int *io_u_index);
+static int client_get_io_u_index(struct ibv_wc *wc, unsigned int *io_u_index);
 
 static int client_init(struct thread_data *td)
 {
@@ -317,17 +316,16 @@ static inline int client_io_flush(struct thread_data *td,
 	return 0;
 }
 
-static int client_get_io_u_index(struct rpma_completion *cmpl,
-		unsigned int *io_u_index)
+static int client_get_io_u_index(struct ibv_wc *wc, unsigned int *io_u_index)
 {
 	GPSPMFlushResponse *flush_resp;
 
-	if (cmpl->op != RPMA_OP_RECV)
+	if (wc->opcode != IBV_WC_RECV)
 		return 0;
 
 	/* unpack a response from the received buffer */
 	flush_resp = gpspm_flush_response__unpack(NULL,
-			cmpl->byte_len, cmpl->op_context);
+			wc->byte_len, (void *)wc->wr_id);
 	if (flush_resp == NULL) {
 		log_err("Cannot unpack the flush response buffer\n");
 		return -1;
@@ -373,7 +371,7 @@ struct server_data {
 	uint32_t msg_sqe_available; /* # of free SQ slots */
 
 	/* in-memory queues */
-	struct rpma_completion *msgs_queued;
+	struct ibv_wc *msgs_queued;
 	uint32_t msg_queued_nr;
 };
 
@@ -562,8 +560,7 @@ err_cfg_delete:
 	return ret;
 }
 
-static int server_qe_process(struct thread_data *td,
-		struct rpma_completion *cmpl)
+static int server_qe_process(struct thread_data *td, struct ibv_wc *wc)
 {
 	struct librpma_fio_server_data *csd = td->io_ops_data;
 	struct server_data *sd = csd->server_data;
@@ -580,7 +577,7 @@ static int server_qe_process(struct thread_data *td,
 	int ret;
 
 	/* calculate SEND/RECV pair parameters */
-	msg_index = (int)(uintptr_t)cmpl->op_context;
+	msg_index = (int)(uintptr_t)wc->wr_id;
 	io_u_buff_offset = IO_U_BUFF_OFF_SERVER(msg_index);
 	send_buff_offset = io_u_buff_offset + SEND_OFFSET;
 	recv_buff_offset = io_u_buff_offset + RECV_OFFSET;
@@ -588,7 +585,7 @@ static int server_qe_process(struct thread_data *td,
 	recv_buff_ptr = sd->orig_buffer_aligned + recv_buff_offset;
 
 	/* unpack a flush request from the received buffer */
-	flush_req = gpspm_flush_request__unpack(NULL, cmpl->byte_len,
+	flush_req = gpspm_flush_request__unpack(NULL, wc->byte_len,
 			recv_buff_ptr);
 	if (flush_req == NULL) {
 		log_err("cannot unpack the flush request buffer\n");
@@ -682,28 +679,28 @@ static int server_cmpl_process(struct thread_data *td)
 {
 	struct librpma_fio_server_data *csd = td->io_ops_data;
 	struct server_data *sd = csd->server_data;
-	struct rpma_completion *cmpl = &sd->msgs_queued[sd->msg_queued_nr];
+	struct ibv_wc *wc = &sd->msgs_queued[sd->msg_queued_nr];
 	struct librpma_fio_options_values *o = td->eo;
 	int ret;
 
-	ret = rpma_conn_completion_get(csd->conn, cmpl);
+	ret = rpma_cq_get_wc(csd->cq, 1, wc, NULL);
 	if (ret == RPMA_E_NO_COMPLETION) {
 		if (o->busy_wait_polling == 0) {
-			ret = rpma_conn_completion_wait(csd->conn);
+			ret = rpma_cq_wait(csd->cq);
 			if (ret == RPMA_E_NO_COMPLETION) {
 				/* lack of completion is not an error */
 				return 0;
 			} else if (ret != 0) {
-				librpma_td_verror(td, ret, "rpma_conn_completion_wait");
+				librpma_td_verror(td, ret, "rpma_cq_wait");
 				goto err_terminate;
 			}
 
-			ret = rpma_conn_completion_get(csd->conn, cmpl);
+			ret = rpma_cq_get_wc(csd->cq, 1, wc, NULL);
 			if (ret == RPMA_E_NO_COMPLETION) {
 				/* lack of completion is not an error */
 				return 0;
 			} else if (ret != 0) {
-				librpma_td_verror(td, ret, "rpma_conn_completion_get");
+				librpma_td_verror(td, ret, "rpma_cq_get_wc");
 				goto err_terminate;
 			}
 		} else {
@@ -711,17 +708,17 @@ static int server_cmpl_process(struct thread_data *td)
 			return 0;
 		}
 	} else if (ret != 0) {
-		librpma_td_verror(td, ret, "rpma_conn_completion_get");
+		librpma_td_verror(td, ret, "rpma_cq_get_wc");
 		goto err_terminate;
 	}
 
 	/* validate the completion */
-	if (cmpl->op_status != IBV_WC_SUCCESS)
+	if (wc->status != IBV_WC_SUCCESS)
 		goto err_terminate;
 
-	if (cmpl->op == RPMA_OP_RECV)
+	if (wc->opcode == IBV_WC_RECV)
 		++sd->msg_queued_nr;
-	else if (cmpl->op == RPMA_OP_SEND)
+	else if (wc->opcode == IBV_WC_SEND)
 		++sd->msg_sqe_available;
 
 	return 0;


* Recent changes (master)
@ 2022-02-18 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-18 13:00 UTC
  To: fio

The following changes since commit 6a16e9e9531a5f746c4e2fe43873de1db434b4fc:

  diskutil: include limits.h for PATH_MAX (2022-02-15 17:17:30 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c99c81adb3510a8dc34d47fd40b19ef657e32192:

  Correct F_FULLSYNC -> F_FULLFSYNC (2022-02-17 12:53:59 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      t/io_uring: allow non-power-of-2 queue depths
      t/io_uring: align buffers correctly on non-4k page sizes
      Use fcntl(..., F_FULLSYNC) if available
      Correct F_FULLSYNC -> F_FULLFSYNC

 configure    | 22 ++++++++++++++++++++++
 io_u.c       |  4 ++++
 t/io_uring.c | 15 ++++++++++-----
 3 files changed, 36 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 0efde7d6..6160d84d 100755
--- a/configure
+++ b/configure
@@ -645,6 +645,25 @@ if compile_prog "" "-lz" "zlib" ; then
 fi
 print_config "zlib" "$zlib"
 
+##########################################
+# fcntl(F_FULLFSYNC) support
+if test "$fcntl_sync" != "yes" ; then
+  fcntl_sync="no"
+fi
+cat > $TMPC << EOF
+#include <unistd.h>
+#include <fcntl.h>
+
+int main(int argc, char **argv)
+{
+  return fcntl(0, F_FULLFSYNC);
+}
+EOF
+if compile_prog "" "" "fcntl(F_FULLFSYNC)" ; then
+    fcntl_sync="yes"
+fi
+print_config "fcntl(F_FULLFSYNC)" "$fcntl_sync"
+
 ##########################################
 # linux-aio probe
 if test "$libaio" != "yes" ; then
@@ -3174,6 +3193,9 @@ fi
 if test "$pdb" = yes; then
   output_sym "CONFIG_PDB"
 fi
+if test "$fcntl_sync" = "yes" ; then
+  output_sym "CONFIG_FCNTL_SYNC"
+fi
 
 print_config "Lib-based ioengines dynamic" "$dynamic_engines"
 cat > $TMPC << EOF
diff --git a/io_u.c b/io_u.c
index 059637e5..806ceb77 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2297,7 +2297,11 @@ int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
 	int ret;
 
 	if (io_u->ddir == DDIR_SYNC) {
+#ifdef CONFIG_FCNTL_SYNC
+		ret = fcntl(io_u->file->fd, F_FULLFSYNC);
+#else
 		ret = fsync(io_u->file->fd);
+#endif
 	} else if (io_u->ddir == DDIR_DATASYNC) {
 #ifdef CONFIG_FDATASYNC
 		ret = fdatasync(io_u->file->fd);
diff --git a/t/io_uring.c b/t/io_uring.c
index 4520de43..f513d7dc 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -364,7 +364,7 @@ static int io_uring_register_buffers(struct submitter *s)
 		return 0;
 
 	return syscall(__NR_io_uring_register, s->ring_fd,
-			IORING_REGISTER_BUFFERS, s->iovecs, depth);
+			IORING_REGISTER_BUFFERS, s->iovecs, roundup_pow2(depth));
 }
 
 static int io_uring_register_files(struct submitter *s)
@@ -962,7 +962,7 @@ static int setup_aio(struct submitter *s)
 		fixedbufs = register_files = 0;
 	}
 
-	return io_queue_init(depth, &s->aio_ctx);
+	return io_queue_init(roundup_pow2(depth), &s->aio_ctx);
 #else
 	fprintf(stderr, "Legacy AIO not available on this system/build\n");
 	errno = EINVAL;
@@ -1156,6 +1156,7 @@ int main(int argc, char *argv[])
 	struct submitter *s;
 	unsigned long done, calls, reap;
 	int err, i, j, flags, fd, opt, threads_per_f, threads_rem = 0, nfiles;
+	long page_size;
 	struct file f;
 	char *fdepths;
 	void *ret;
@@ -1249,7 +1250,7 @@ int main(int argc, char *argv[])
 		dma_map = 0;
 
 	submitter = calloc(nthreads, sizeof(*submitter) +
-				depth * sizeof(struct iovec));
+				roundup_pow2(depth) * sizeof(struct iovec));
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
 		s->index = j;
@@ -1319,12 +1320,16 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 0)
+		page_size = 4096;
+
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
-		for (i = 0; i < depth; i++) {
+		for (i = 0; i < roundup_pow2(depth); i++) {
 			void *buf;
 
-			if (posix_memalign(&buf, bs, bs)) {
+			if (posix_memalign(&buf, page_size, bs)) {
 				printf("failed alloc\n");
 				return 1;
 			}


* Recent changes (master)
@ 2022-02-16 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-16 13:00 UTC
  To: fio

The following changes since commit a1db4528a59a99c5e2aa66091c505fb60e3a70ca:

  Merge branch 'fio-docs-ci' of https://github.com/vincentkfu/fio (2022-02-11 16:29:44 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6a16e9e9531a5f746c4e2fe43873de1db434b4fc:

  diskutil: include limits.h for PATH_MAX (2022-02-15 17:17:30 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'fix_bytesrate_eta' of https://github.com/PCPartPicker/fio
      Merge branch 'rand_nr_bugfix' of https://github.com/PCPartPicker/fio
      Merge branch 'check_min_rate_cleanup' of https://github.com/PCPartPicker/fio
      diskutil: include limits.h for PATH_MAX

Vincent Fu (1):
      ci: detect Windows installer build failures

aggieNick02 (3):
      Cleanup __check_min_rate
      Fix ETA display when rate and/or rate_min are specified
      Fix :<nr> suffix with random read/write causing 0 initial offset

 .appveyor.yml |  1 +
 backend.c     | 81 ++++++++++++++++++++---------------------------------------
 diskutil.h    |  2 ++
 eta.c         |  5 ++--
 fio.h         |  6 ++---
 init.c        |  9 ++++++-
 libfio.c      |  4 +--
 7 files changed, 46 insertions(+), 62 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 42b79958..b94eefe3 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -44,6 +44,7 @@ after_build:
   - file.exe fio.exe
   - make.exe test
   - 'cd os\windows && dobuild.cmd %ARCHITECTURE% && cd ..'
+  - ls.exe ./os/windows/*.msi
   - ps: Get-ChildItem .\os\windows\*.msi | % { Push-AppveyorArtifact $_.FullName -FileName $_.Name -DeploymentName fio.msi }
 
 test_script:
diff --git a/backend.c b/backend.c
index c035baed..a21dfef6 100644
--- a/backend.c
+++ b/backend.c
@@ -136,13 +136,10 @@ static void set_sig_handlers(void)
 static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 			     enum fio_ddir ddir)
 {
-	unsigned long long bytes = 0;
-	unsigned long iops = 0;
-	unsigned long spent;
-	unsigned long long rate;
-	unsigned long long ratemin = 0;
-	unsigned int rate_iops = 0;
-	unsigned int rate_iops_min = 0;
+	unsigned long long current_rate_check_bytes = td->this_io_bytes[ddir];
+	unsigned long current_rate_check_blocks = td->this_io_blocks[ddir];
+	unsigned long long option_rate_bytes_min = td->o.ratemin[ddir];
+	unsigned int option_rate_iops_min = td->o.rate_iops_min[ddir];
 
 	assert(ddir_rw(ddir));
 
@@ -155,68 +152,44 @@ static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 	if (mtime_since(&td->start, now) < 2000)
 		return false;
 
-	iops += td->this_io_blocks[ddir];
-	bytes += td->this_io_bytes[ddir];
-	ratemin += td->o.ratemin[ddir];
-	rate_iops += td->o.rate_iops[ddir];
-	rate_iops_min += td->o.rate_iops_min[ddir];
-
 	/*
-	 * if rate blocks is set, sample is running
+	 * if last_rate_check_blocks or last_rate_check_bytes is set,
+	 * we can compute a rate per ratecycle
 	 */
-	if (td->rate_bytes[ddir] || td->rate_blocks[ddir]) {
-		spent = mtime_since(&td->lastrate[ddir], now);
-		if (spent < td->o.ratecycle)
+	if (td->last_rate_check_bytes[ddir] || td->last_rate_check_blocks[ddir]) {
+		unsigned long spent = mtime_since(&td->last_rate_check_time[ddir], now);
+		if (spent < td->o.ratecycle || spent==0)
 			return false;
 
-		if (td->o.rate[ddir] || td->o.ratemin[ddir]) {
+		if (td->o.ratemin[ddir]) {
 			/*
 			 * check bandwidth specified rate
 			 */
-			if (bytes < td->rate_bytes[ddir]) {
-				log_err("%s: rate_min=%lluB/s not met, only transferred %lluB\n",
-					td->o.name, ratemin, bytes);
+			unsigned long long current_rate_bytes =
+				((current_rate_check_bytes - td->last_rate_check_bytes[ddir]) * 1000) / spent;
+			if (current_rate_bytes < option_rate_bytes_min) {
+				log_err("%s: rate_min=%lluB/s not met, got %lluB/s\n",
+					td->o.name, option_rate_bytes_min, current_rate_bytes);
 				return true;
-			} else {
-				if (spent)
-					rate = ((bytes - td->rate_bytes[ddir]) * 1000) / spent;
-				else
-					rate = 0;
-
-				if (rate < ratemin ||
-				    bytes < td->rate_bytes[ddir]) {
-					log_err("%s: rate_min=%lluB/s not met, got %lluB/s\n",
-						td->o.name, ratemin, rate);
-					return true;
-				}
 			}
 		} else {
 			/*
 			 * checks iops specified rate
 			 */
-			if (iops < rate_iops) {
-				log_err("%s: rate_iops_min=%u not met, only performed %lu IOs\n",
-						td->o.name, rate_iops, iops);
+			unsigned long long current_rate_iops =
+				((current_rate_check_blocks - td->last_rate_check_blocks[ddir]) * 1000) / spent;
+
+			if (current_rate_iops < option_rate_iops_min) {
+				log_err("%s: rate_iops_min=%u not met, got %llu IOPS\n",
+					td->o.name, option_rate_iops_min, current_rate_iops);
 				return true;
-			} else {
-				if (spent)
-					rate = ((iops - td->rate_blocks[ddir]) * 1000) / spent;
-				else
-					rate = 0;
-
-				if (rate < rate_iops_min ||
-				    iops < td->rate_blocks[ddir]) {
-					log_err("%s: rate_iops_min=%u not met, got %llu IOPS\n",
-						td->o.name, rate_iops_min, rate);
-					return true;
-				}
 			}
 		}
 	}
 
-	td->rate_bytes[ddir] = bytes;
-	td->rate_blocks[ddir] = iops;
-	memcpy(&td->lastrate[ddir], now, sizeof(*now));
+	td->last_rate_check_bytes[ddir] = current_rate_check_bytes;
+	td->last_rate_check_blocks[ddir] = current_rate_check_blocks;
+	memcpy(&td->last_rate_check_time[ddir], now, sizeof(*now));
 	return false;
 }
 
@@ -1845,11 +1818,11 @@ static void *thread_main(void *data)
 
 	if (o->ratemin[DDIR_READ] || o->ratemin[DDIR_WRITE] ||
 			o->ratemin[DDIR_TRIM]) {
-	        memcpy(&td->lastrate[DDIR_READ], &td->bw_sample_time,
+	        memcpy(&td->last_rate_check_time[DDIR_READ], &td->bw_sample_time,
 					sizeof(td->bw_sample_time));
-	        memcpy(&td->lastrate[DDIR_WRITE], &td->bw_sample_time,
+	        memcpy(&td->last_rate_check_time[DDIR_WRITE], &td->bw_sample_time,
 					sizeof(td->bw_sample_time));
-	        memcpy(&td->lastrate[DDIR_TRIM], &td->bw_sample_time,
+	        memcpy(&td->last_rate_check_time[DDIR_TRIM], &td->bw_sample_time,
 					sizeof(td->bw_sample_time));
 	}
 
diff --git a/diskutil.h b/diskutil.h
index 83bcbf89..7d7ef802 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -2,6 +2,8 @@
 #define FIO_DISKUTIL_H
 #define FIO_DU_NAME_SZ		64
 
+#include <limits.h>
+
 #include "helper_thread.h"
 #include "fio_sem.h"
 
diff --git a/eta.c b/eta.c
index ea1781f3..17970c78 100644
--- a/eta.c
+++ b/eta.c
@@ -420,6 +420,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		if (is_power_of_2(td->o.kb_base))
 			je->is_pow2 = 1;
 		je->unit_base = td->o.unit_base;
+		je->sig_figs = td->o.sig_figs;
 		if (td->o.bw_avg_time < bw_avg_time)
 			bw_avg_time = td->o.bw_avg_time;
 		if (td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING
@@ -600,9 +601,9 @@ void display_thread_status(struct jobs_eta *je)
 		char *tr, *mr;
 
 		mr = num2str(je->m_rate[0] + je->m_rate[1] + je->m_rate[2],
-				je->sig_figs, 0, je->is_pow2, N2S_BYTEPERSEC);
+				je->sig_figs, 1, je->is_pow2, N2S_BYTEPERSEC);
 		tr = num2str(je->t_rate[0] + je->t_rate[1] + je->t_rate[2],
-				je->sig_figs, 0, je->is_pow2, N2S_BYTEPERSEC);
+				je->sig_figs, 1, je->is_pow2, N2S_BYTEPERSEC);
 
 		p += sprintf(p, ", %s-%s", mr, tr);
 		free(tr);
diff --git a/fio.h b/fio.h
index 7b0ca843..88df117d 100644
--- a/fio.h
+++ b/fio.h
@@ -335,10 +335,10 @@ struct thread_data {
 	 */
 	uint64_t rate_bps[DDIR_RWDIR_CNT];
 	uint64_t rate_next_io_time[DDIR_RWDIR_CNT];
-	unsigned long long rate_bytes[DDIR_RWDIR_CNT];
-	unsigned long rate_blocks[DDIR_RWDIR_CNT];
+	unsigned long long last_rate_check_bytes[DDIR_RWDIR_CNT];
+	unsigned long last_rate_check_blocks[DDIR_RWDIR_CNT];
 	unsigned long long rate_io_issue_bytes[DDIR_RWDIR_CNT];
-	struct timespec lastrate[DDIR_RWDIR_CNT];
+	struct timespec last_rate_check_time[DDIR_RWDIR_CNT];
 	int64_t last_usec[DDIR_RWDIR_CNT];
 	struct frand_state poisson_state[DDIR_RWDIR_CNT];
 
diff --git a/init.c b/init.c
index 13935152..81c30f8c 100644
--- a/init.c
+++ b/init.c
@@ -1576,7 +1576,14 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->ts.sig_figs = o->sig_figs;
 
 	init_thread_stat_min_vals(&td->ts);
-	td->ddir_seq_nr = o->ddir_seq_nr;
+
+	/*
+	 * td->>ddir_seq_nr needs to be initialized to 1, NOT o->ddir_seq_nr,
+	 * so that get_next_offset gets a new random offset the first time it
+	 * is called, instead of keeping an initial offset of 0 for the first
+	 * nr-1 calls
+	 */
+	td->ddir_seq_nr = 1;
 
 	if ((o->stonewall || o->new_group) && prev_group_jobs) {
 		prev_group_jobs = 0;
diff --git a/libfio.c b/libfio.c
index 01fa7452..1a891776 100644
--- a/libfio.c
+++ b/libfio.c
@@ -87,8 +87,8 @@ static void reset_io_counters(struct thread_data *td, int all)
 			td->this_io_bytes[ddir] = 0;
 			td->stat_io_blocks[ddir] = 0;
 			td->this_io_blocks[ddir] = 0;
-			td->rate_bytes[ddir] = 0;
-			td->rate_blocks[ddir] = 0;
+			td->last_rate_check_bytes[ddir] = 0;
+			td->last_rate_check_blocks[ddir] = 0;
 			td->bytes_done[ddir] = 0;
 			td->rate_io_issue_bytes[ddir] = 0;
 			td->rate_next_io_time[ddir] = 0;


* Recent changes (master)
@ 2022-02-12 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-12 13:00 UTC
  To: fio

The following changes since commit df597be63e26ef59c1538b3ce2026c83684ff7fb:

  fio: really use LDFLAGS when linking dynamic engines (2022-02-08 09:28:30 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a1db4528a59a99c5e2aa66091c505fb60e3a70ca:

  Merge branch 'fio-docs-ci' of https://github.com/vincentkfu/fio (2022-02-11 16:29:44 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      t/io_uring: avoid unused `nr_batch` warning
      Add aarch64 cpu clock support
      Merge branch 'fio_offload_fixes' of https://github.com/PCPartPicker/fio
      Merge branch 'fio-docs-ci' of https://github.com/vincentkfu/fio

Vincent Fu (8):
      docs: document cpumode option for the cpuio ioengine
      docs: update Makefile in order to detect build failures
      docs: rename HOWTO to HOWTO.rst
      HOWTO: combine multiple pool option listings
      HOWTO: combine separate hipri listings into a single one
      HOWTO: combine two chunk_size listings into a single one
      ci: install sphinx packages and add doc building to GitHub Actions
      windows: update the installer build for renamed files

aggieNick02 (1):
      Fix issues (assert or uninit var, hang) with check_min_rate and offloading

 HOWTO => HOWTO.rst      | 126 ++++++++++++++++++++++++++++--------------------
 arch/arch-aarch64.h     |  17 +++++++
 backend.c               |   9 +++-
 ci/actions-full-test.sh |   1 +
 ci/actions-install.sh   |   3 +-
 doc/Makefile            |   2 +-
 doc/fio_doc.rst         |   2 +-
 doc/fio_man.rst         |   2 +-
 fio.1                   |  13 +++++
 os/windows/install.wxs  |   4 +-
 t/io_uring.c            |   9 ++--
 11 files changed, 124 insertions(+), 64 deletions(-)
 rename HOWTO => HOWTO.rst (99%)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO.rst
similarity index 99%
rename from HOWTO
rename to HOWTO.rst
index 74ba7216..ac1f3478 100644
--- a/HOWTO
+++ b/HOWTO.rst
@@ -2137,8 +2137,10 @@ I/O engine
 			Asynchronous read and write using DDN's Infinite Memory Engine (IME).
 			This engine will try to stack as much IOs as possible by creating
 			requests for IME. FIO will then decide when to commit these requests.
+
 		**libiscsi**
 			Read and write iscsi lun with libiscsi.
+
 		**nbd**
 			Read and write a Network Block Device (NBD).
 
@@ -2149,6 +2151,7 @@ I/O engine
 			unless :option:`verify` is set or :option:`cuda_io` is `posix`.
 			:option:`iomem` must not be `cudamalloc`. This ioengine defines
 			engine specific options.
+
 		**dfs**
 			I/O engine supporting asynchronous read and write operations to the
 			DAOS File System (DFS) via libdfs.
@@ -2175,8 +2178,8 @@ with the caveat that when used on the command line, they must come after the
     Set the percentage of I/O that will be issued with the highest priority.
     Default: 0. A single value applies to reads and writes. Comma-separated
     values may be specified for reads and writes. For this option to be
-    effective, NCQ priority must be supported and enabled, and `direct=1'
-    option must be used. fio must also be run as the root user. Unlike
+    effective, NCQ priority must be supported and enabled, and the :option:`direct`
+    option must be set. fio must also be run as the root user. Unlike
     slat/clat/lat stats, which can be tracked and reported independently, per
     priority stats only track and report a single type of latency. By default,
     completion latency (clat) will be reported, if :option:`lat_percentiles` is
@@ -2207,6 +2210,7 @@ with the caveat that when used on the command line, they must come after the
 	meaning of priority may differ. See also the :option:`prio` option.
 
 .. option:: cmdprio_bssplit=str[,str] : [io_uring] [libaio]
+
 	To get a finer control over I/O priority, this option allows
 	specifying the percentage of IOs that must have a priority set
 	depending on the block size of the IO. This option is useful only
@@ -2243,14 +2247,6 @@ with the caveat that when used on the command line, they must come after the
     map and release for each IO. This is more efficient, and reduces the
     IO latency as well.
 
-.. option:: hipri : [io_uring]
-
-    If this option is set, fio will attempt to use polled IO completions.
-    Normal IO completions generate interrupts to signal the completion of
-    IO, polled completions do not. Hence they are require active reaping
-    by the application. The benefits are more efficient IO for high IOPS
-    scenarios, and lower latencies for low queue depth IO.
-
 .. option:: registerfiles : [io_uring]
 
 	With this option, fio registers the set of files being used with the
@@ -2271,6 +2267,33 @@ with the caveat that when used on the command line, they must come after the
 	When :option:`sqthread_poll` is set, this option provides a way to
 	define which CPU should be used for the polling thread.
 
+.. option:: hipri
+
+   [io_uring]
+
+        If this option is set, fio will attempt to use polled IO completions.
+        Normal IO completions generate interrupts to signal the completion of
+        IO, polled completions do not. Hence they are require active reaping
+        by the application. The benefits are more efficient IO for high IOPS
+        scenarios, and lower latencies for low queue depth IO.
+
+   [pvsync2]
+
+	Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
+	than normal.
+
+   [sg]
+
+	If this option is set, fio will attempt to use polled IO completions.
+	This will have a similar effect as (io_uring)hipri. Only SCSI READ and
+	WRITE commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor
+	VERIFY). Older versions of the Linux sg driver that do not support
+	hipri will simply ignore this flag and do normal IO. The Linux SCSI
+	Low Level Driver (LLD) that "owns" the device also needs to support
+	hipri (also known as iopoll and mq_poll). The MegaRAID driver is an
+	example of a SCSI LLD. Default: clear (0) which does normal
+	(interrupted based) IO.
+
 .. option:: userspace_reap : [libaio]
 
 	Normally, with the libaio engine in use, fio will use the
@@ -2279,11 +2302,6 @@ with the caveat that when used on the command line, they must come after the
 	reap events. The reaping mode is only enabled when polling for a minimum of
 	0 events (e.g. when :option:`iodepth_batch_complete` `=0`).
 
-.. option:: hipri : [pvsync2]
-
-	Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
-	than normal.
-
 .. option:: hipri_percentage : [pvsync2]
 
 	When hipri is set this determines the probability of a pvsync2 I/O being high
@@ -2318,6 +2336,16 @@ with the caveat that when used on the command line, they must come after the
 
 	Split the load into cycles of the given time. In microseconds.
 
+.. option:: cpumode=str : [cpuio]
+
+	Specify how to stress the CPU. It can take these two values:
+
+	**noop**
+		This is the default where the CPU executes noop instructions.
+	**qsort**
+		Replace the default noop instructions loop with a qsort algorithm to
+		consume more energy.
+
 .. option:: exit_on_io_done=bool : [cpuio]
 
 	Detect when I/O threads are done, then exit.
@@ -2444,10 +2472,6 @@ with the caveat that when used on the command line, they must come after the
 
 	Specifies the name of the RBD.
 
-.. option:: pool=str : [rbd,rados]
-
-	Specifies the name of the Ceph pool containing RBD or RADOS data.
-
 .. option:: clientname=str : [rbd,rados]
 
 	Specifies the username (without the 'client.' prefix) used to access the
@@ -2466,6 +2490,36 @@ with the caveat that when used on the command line, they must come after the
         Touching all objects affects ceph caches and likely impacts test results.
         Enabled by default.
 
+.. option:: pool=str :
+
+   [rbd,rados]
+
+	Specifies the name of the Ceph pool containing RBD or RADOS data.
+
+   [dfs]
+
+	Specify the label or UUID of the DAOS pool to connect to.
+
+.. option:: cont=str : [dfs]
+
+	Specify the label or UUID of the DAOS container to open.
+
+.. option:: chunk_size=int
+
+   [dfs]
+
+	Specificy a different chunk size (in bytes) for the dfs file.
+	Use DAOS container's chunk size by default.
+
+   [libhdfs]
+
+	The size of the chunk to use for each file.
+
+.. option:: object_class=str : [dfs]
+
+	Specificy a different object class for the dfs file.
+	Use DAOS container's object class by default.
+
 .. option:: skip_bad=bool : [mtd]
 
 	Skip operations against known bad blocks.
@@ -2474,10 +2528,6 @@ with the caveat that when used on the command line, they must come after the
 
 	libhdfs will create chunk in this HDFS directory.
 
-.. option:: chunk_size : [libhdfs]
-
-	The size of the chunk to use for each file.
-
 .. option:: verb=str : [rdma]
 
 	The RDMA verb to use on this side of the RDMA ioengine connection. Valid
@@ -2563,18 +2613,6 @@ with the caveat that when used on the command line, they must come after the
 	a valid stream identifier) fio will open a stream and then close it when done. Default
 	is 0.
 
-.. option:: hipri : [sg]
-
-	If this option is set, fio will attempt to use polled IO completions.
-	This will have a similar effect as (io_uring)hipri. Only SCSI READ and
-	WRITE commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor
-	VERIFY). Older versions of the Linux sg driver that do not support
-	hipri will simply ignore this flag and do normal IO. The Linux SCSI
-	Low Level Driver (LLD) that "owns" the device also needs to support
-	hipri (also known as iopoll and mq_poll). The MegaRAID driver is an
-	example of a SCSI LLD. Default: clear (0) which does normal
-	(interrupted based) IO.
-
 .. option:: http_host=str : [http]
 
 	Hostname to connect to. For S3, this could be the bucket hostname.
@@ -2654,24 +2692,6 @@ with the caveat that when used on the command line, they must come after the
 		GPU to RAM before a write and copied from RAM to GPU after a
 		read. :option:`verify` does not affect use of cudaMemcpy.
 
-.. option:: pool=str : [dfs]
-
-	Specify the label or UUID of the DAOS pool to connect to.
-
-.. option:: cont=str : [dfs]
-
-	Specify the label or UUID of the DAOS container to open.
-
-.. option:: chunk_size=int : [dfs]
-
-	Specificy a different chunk size (in bytes) for the dfs file.
-	Use DAOS container's chunk size by default.
-
-.. option:: object_class=str : [dfs]
-
-	Specificy a different object class for the dfs file.
-	Use DAOS container's object class by default.
-
 .. option:: nfs_url=str : [nfs]
 
 	URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 2a86cc5a..94571709 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -27,4 +27,21 @@ static inline int arch_ffz(unsigned long bitmask)
 
 #define ARCH_HAVE_FFZ
 
+static inline unsigned long long get_cpu_clock(void)
+{
+	unsigned long val;
+
+	asm volatile("mrs %0, cntvct_el0" : "=r" (val));
+	return val;
+}
+#define ARCH_HAVE_CPU_CLOCK
+
+#define ARCH_HAVE_INIT
+extern bool tsc_reliable;
+static inline int arch_init(char *envp[])
+{
+	tsc_reliable = true;
+	return 0;
+}
+
 #endif
diff --git a/backend.c b/backend.c
index 061e3b32..c035baed 100644
--- a/backend.c
+++ b/backend.c
@@ -1091,8 +1091,10 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 				td->rate_io_issue_bytes[__ddir] += blen;
 			}
 
-			if (should_check_rate(td))
+			if (should_check_rate(td)) {
 				td->rate_next_io_time[__ddir] = usec_for_io(td, __ddir);
+				fio_gettime(&comp_time, NULL);
+			}
 
 		} else {
 			ret = io_u_submit(td, io_u);
@@ -1172,8 +1174,11 @@ reap:
 								f->file_name);
 			}
 		}
-	} else
+	} else {
+		if (td->o.io_submit_mode == IO_MODE_OFFLOAD)
+			workqueue_flush(&td->io_wq);
 		cleanup_pending_aio(td);
+	}
 
 	/*
 	 * stop job if we failed doing any IO
diff --git a/ci/actions-full-test.sh b/ci/actions-full-test.sh
index 4ae1dba1..91790664 100755
--- a/ci/actions-full-test.sh
+++ b/ci/actions-full-test.sh
@@ -10,6 +10,7 @@ main() {
     else
         sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
     fi
+    make -C doc html
 }
 
 main
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index b3486a47..0e472717 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -60,6 +60,7 @@ DPKGCFG
     # care about the architecture.
     pkgs+=(
         python3-scipy
+	python3-sphinx
     )
 
     echo "Updating APT..."
@@ -78,7 +79,7 @@ install_macos() {
     #brew update >/dev/null 2>&1
     echo "Installing packages..."
     HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit
-    pip3 install scipy six
+    pip3 install scipy six sphinx
 }
 
 main() {
diff --git a/doc/Makefile b/doc/Makefile
index 3b979f9a..a444d83a 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -2,7 +2,7 @@
 #
 
 # You can set these variables from the command line.
-SPHINXOPTS    =
+SPHINXOPTS    = -W --keep-going
 SPHINXBUILD   = sphinx-build
 PAPER         =
 BUILDDIR      = output
diff --git a/doc/fio_doc.rst b/doc/fio_doc.rst
index 8e1216f0..34e7fde9 100644
--- a/doc/fio_doc.rst
+++ b/doc/fio_doc.rst
@@ -5,7 +5,7 @@ fio - Flexible I/O tester rev. |version|
 .. include:: ../README.rst
 
 
-.. include:: ../HOWTO
+.. include:: ../HOWTO.rst
 
 
 
diff --git a/doc/fio_man.rst b/doc/fio_man.rst
index 44312f16..dc1d1c0d 100644
--- a/doc/fio_man.rst
+++ b/doc/fio_man.rst
@@ -9,4 +9,4 @@ Fio Manpage
 .. include:: ../README.rst
 
 
-.. include:: ../HOWTO
+.. include:: ../HOWTO.rst
diff --git a/fio.1 b/fio.1
index f32d7915..e23d4092 100644
--- a/fio.1
+++ b/fio.1
@@ -2091,6 +2091,19 @@ option when using cpuio I/O engine.
 .BI (cpuio)cpuchunks \fR=\fPint
 Split the load into cycles of the given time. In microseconds.
 .TP
+.BI (cpuio)cpumode \fR=\fPstr
+Specify how to stress the CPU. It can take these two values:
+.RS
+.RS
+.TP
+.B noop
+This is the default and directs the CPU to execute noop instructions.
+.TP
+.B qsort
+Replace the default noop instructions with a qsort algorithm to consume more energy.
+.RE
+.RE
+.TP
 .BI (cpuio)exit_on_io_done \fR=\fPbool
 Detect when I/O threads are done, then exit.
 .TP
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 7773bb3b..f2753289 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -33,13 +33,13 @@
 						</Component>
 						<?endif?>
 						<Component>
-							<File Id="README" Name="README.txt" Source="..\..\README"/>
+							<File Id="README" Name="README.txt" Source="..\..\README.rst"/>
 						</Component>
 						<Component>
 							<File Id="REPORTING_BUGS" Name="REPORTING-BUGS.txt" Source="..\..\REPORTING-BUGS"/>
 						</Component>
 						<Component>
-							<File Id="HOWTO" Name="HOWTO.txt" Source="..\..\HOWTO"/>
+							<File Id="HOWTO" Name="HOWTO.txt" Source="..\..\HOWTO.rst"/>
 						</Component>
 						<Component>
 							<File Id="COPYING" Name="COPYING.txt" Source="..\..\COPYING"/>
diff --git a/t/io_uring.c b/t/io_uring.c
index faf5978c..4520de43 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -714,12 +714,15 @@ static int reap_events_aio(struct submitter *s, struct io_event *events, int evs
 static void *submitter_aio_fn(void *data)
 {
 	struct submitter *s = data;
-	int i, ret, prepped, nr_batch;
+	int i, ret, prepped;
 	struct iocb **iocbsptr;
 	struct iocb *iocbs;
 	struct io_event *events;
-
-	nr_batch = submitter_init(s);
+#ifdef ARCH_HAVE_CPU_CLOCK
+	int nr_batch = submitter_init(s);
+#else
+	submitter_init(s);
+#endif
 
 	iocbsptr = calloc(depth, sizeof(struct iocb *));
 	iocbs = calloc(depth, sizeof(struct iocb));


* Recent changes (master)
@ 2022-02-09 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-09 13:00 UTC
  To: fio

The following changes since commit b65c1fc07d4794920224312c56c785de2f3f1692:

  t/io_uring: fix warnings for !ARCH_HAVE_CPU_CLOCK (2022-02-04 09:02:49 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to df597be63e26ef59c1538b3ce2026c83684ff7fb:

  fio: really use LDFLAGS when linking dynamic engines (2022-02-08 09:28:30 -0700)

----------------------------------------------------------------
Eric Sandeen (1):
      fio: really use LDFLAGS when linking dynamic engines

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 2432f519..0ab4f82c 100644
--- a/Makefile
+++ b/Makefile
@@ -295,7 +295,7 @@ define engine_template =
 $(1)_OBJS := $$($(1)_SRCS:.c=.o)
 $$($(1)_OBJS): CFLAGS := -fPIC $$($(1)_CFLAGS) $(CFLAGS)
 engines/fio-$(1).so: $$($(1)_OBJS)
-	$$(QUIET_LINK)$(CC) $(DYNAMIC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 -o $$@ $$< $$($(1)_LIBS)
+	$$(QUIET_LINK)$(CC) $(LDFLAGS) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 -o $$@ $$< $$($(1)_LIBS)
 ENGS_OBJS += engines/fio-$(1).so
 endef
 else # !CONFIG_DYNAMIC_ENGINES


* Recent changes (master)
@ 2022-02-05 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-05 13:00 UTC
  To: fio

The following changes since commit 62e9ece4d540ff2af865e4b43811f3150b8b846b:

  fio: use correct function declaration for set_epoch_time() (2022-02-03 16:06:59 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b65c1fc07d4794920224312c56c785de2f3f1692:

  t/io_uring: fix warnings for !ARCH_HAVE_CPU_CLOCK (2022-02-04 09:02:49 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: fix warnings for !ARCH_HAVE_CPU_CLOCK

Niklas Cassel (1):
      stat: make free_clat_prio_stats() safe against NULL

 stat.c       |  3 +++
 t/io_uring.c | 11 ++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 0876222a..1764eebc 100644
--- a/stat.c
+++ b/stat.c
@@ -2041,6 +2041,9 @@ void free_clat_prio_stats(struct thread_stat *ts)
 {
 	enum fio_ddir ddir;
 
+	if (!ts)
+		return;
+
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 		sfree(ts->clat_prio[ddir]);
 		ts->clat_prio[ddir] = NULL;
diff --git a/t/io_uring.c b/t/io_uring.c
index e8365a79..faf5978c 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -287,6 +287,7 @@ out:
 	free(ovals);
 }
 
+#ifdef ARCH_HAVE_CPU_CLOCK
 static unsigned int plat_val_to_idx(unsigned long val)
 {
 	unsigned int msb, error_bits, base, offset, idx;
@@ -322,6 +323,7 @@ static unsigned int plat_val_to_idx(unsigned long val)
 
 	return idx;
 }
+#endif
 
 static void add_stat(struct submitter *s, int clock_index, int nr)
 {
@@ -789,9 +791,12 @@ static void *submitter_uring_fn(void *data)
 {
 	struct submitter *s = data;
 	struct io_sq_ring *ring = &s->sq_ring;
-	int ret, prepped, nr_batch;
-
-	nr_batch = submitter_init(s);
+	int ret, prepped;
+#ifdef ARCH_HAVE_CPU_CLOCK
+	int nr_batch = submitter_init(s);
+#else
+	submitter_init(s);
+#endif
 
 	prepped = 0;
 	do {


* Recent changes (master)
@ 2022-02-04 13:00 Jens Axboe
From: Jens Axboe @ 2022-02-04 13:00 UTC
  To: fio

The following changes since commit 52a0b9ed71c3e929461e64b39059281948107071:

  Merge branch 'patch-1' of https://github.com/Nikratio/fio (2022-01-28 14:50:51 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 62e9ece4d540ff2af865e4b43811f3150b8b846b:

  fio: use correct function declaration for set_epoch_time() (2022-02-03 16:06:59 -0700)

----------------------------------------------------------------
David Korczynski (1):
      ci/Github actions: add CIFuzz integration

Jens Axboe (6):
      Merge branch 'master' of https://github.com/blah325/fio
      server: fix formatting issue
      Merge branch 'freebsd-comment-update' of https://github.com/macdice/fio
      Merge branch 'cifuzz-integration' of https://github.com/DavidKorczynski/fio
      Merge branch 'fio_pr_alternate_epoch' of https://github.com/PCPartPicker/fio
      fio: use correct function declaration for set_epoch_time()

Niklas Cassel (18):
      init: verify option lat_percentiles consistency for all jobs in group
      backend: do ioprio_set() before calling the ioengine init callback
      stat: save the default ioprio in struct thread_stat
      client/server: convert ss_data to use an offset instead of fixed position
      stat: add a new function to allocate a clat_prio_stat array
      os: define min/max prio class and level for systems without ioprio
      options: add a parsing function for an additional cmdprio_bssplit format
      cmdprio: add support for a new cmdprio_bssplit entry format
      examples: add new cmdprio_bssplit format examples
      stat: use enum fio_ddir consistently
      stat: increment members counter after call to sum_thread_stats()
      stat: add helper for resetting the latency buckets
      stat: disable per prio stats where not needed
      stat: report clat stats on a per priority granularity
      stat: convert json output to a new per priority granularity format
      gfio: drop support for high/low priority latency results
      stat: remove unused high/low prio struct members
      t/latency_percentiles.py: add tests for the new cmdprio_bssplit format

Thomas Munro (1):
      Update comments about availability of fdatasync().

aggieNick02 (1):
      Support for alternate epochs in fio log files

james rizzo (3):
      Avoid client calls to recv() without prior poll()
      Add Windows support for --server.
      Added a new windows only IO engine option “no_completion_thread”.

 .github/workflows/cifuzz.yml |  24 ++
 HOWTO                        |  41 +++-
 backend.c                    |  27 ++-
 cconv.c                      |   4 +
 client.c                     |  48 ++--
 engines/cmdprio.c            | 440 +++++++++++++++++++++++++++++------
 engines/cmdprio.h            |  22 +-
 engines/filecreate.c         |   2 +-
 engines/filedelete.c         |   2 +-
 engines/filestat.c           |   2 +-
 engines/windowsaio.c         | 134 +++++++++--
 examples/cmdprio-bssplit.fio |  39 +++-
 fio.1                        |  45 +++-
 fio.h                        |   2 +-
 fio_time.h                   |   2 +-
 gclient.c                    |  55 +----
 init.c                       |  37 +++
 io_u.c                       |   7 +-
 io_u.h                       |   3 +-
 libfio.c                     |   2 +-
 optgroup.h                   |   2 +
 options.c                    | 140 ++++++++++++
 os/os-windows.h              |   2 +
 os/os.h                      |   4 +
 os/windows/posix.c           | 182 ++++++++++++++-
 rate-submit.c                |  11 +-
 server.c                     | 369 +++++++++++++++++++++++++++---
 server.h                     |   7 +-
 stat.c                       | 531 ++++++++++++++++++++++++++++++++++---------
 stat.h                       |  40 +++-
 t/latency_percentiles.py     | 211 ++++++++++-------
 thread_options.h             |  14 ++
 time.c                       |  12 +-
 33 files changed, 2019 insertions(+), 444 deletions(-)
 create mode 100644 .github/workflows/cifuzz.yml

---

Diff of recent changes:

diff --git a/.github/workflows/cifuzz.yml b/.github/workflows/cifuzz.yml
new file mode 100644
index 00000000..acc8d482
--- /dev/null
+++ b/.github/workflows/cifuzz.yml
@@ -0,0 +1,24 @@
+name: CIFuzz
+on: [pull_request]
+jobs:
+  Fuzzing:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Build Fuzzers
+      id: build
+      uses: google/oss-fuzz/infra/cifuzz/actions/build_fuzzers@master
+      with:
+        oss-fuzz-project-name: 'fio'
+        dry-run: false
+    - name: Run Fuzzers
+      uses: google/oss-fuzz/infra/cifuzz/actions/run_fuzzers@master
+      with:
+        oss-fuzz-project-name: 'fio'
+        fuzz-seconds: 600
+        dry-run: false
+    - name: Upload Crash
+      uses: actions/upload-artifact@v1
+      if: failure() && steps.build.outcome == 'success'
+      with:
+        name: artifacts
+        path: ./out/artifacts
diff --git a/HOWTO b/HOWTO
index c72ec8cd..74ba7216 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1344,7 +1344,7 @@ I/O type
 .. option:: fdatasync=int
 
 	Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
-	not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
+	not metadata blocks. In Windows, DragonFlyBSD or OSX there is no
 	:manpage:`fdatasync(2)` so this falls back to using :manpage:`fsync(2)`.
 	Defaults to 0, which means fio does not periodically issue and wait for a
 	data-only sync to complete.
@@ -2212,10 +2212,28 @@ with the caveat that when used on the command line, they must come after the
 	depending on the block size of the IO. This option is useful only
 	when used together with the :option:`bssplit` option, that is,
 	multiple different block sizes are used for reads and writes.
-	The format for this option is the same as the format of the
-	:option:`bssplit` option, with the exception that values for
-	trim IOs are ignored. This option is mutually exclusive with the
-	:option:`cmdprio_percentage` option.
+
+	The first accepted format for this option is the same as the format of
+	the :option:`bssplit` option:
+
+		cmdprio_bssplit=blocksize/percentage:blocksize/percentage
+
+	In this case, each entry will use the priority class and priority
+	level defined by the options :option:`cmdprio_class` and
+	:option:`cmdprio` respectively.
+
+	The second accepted format for this option is:
+
+		cmdprio_bssplit=blocksize/percentage/class/level:blocksize/percentage/class/level
+
+	In this case, the priority class and priority level is defined inside
+	each entry. In comparison with the first accepted format, the second
+	accepted format does not restrict all entries to have the same priority
+	class and priority level.
+
+	For both formats, only the read and write data directions are supported,
+	values for trim IOs are ignored. This option is mutually exclusive with
+	the :option:`cmdprio_percentage` option.
 
 .. option:: fixedbufs : [io_uring]
 
@@ -3663,6 +3681,19 @@ Measurements and reporting
 	write_type_log for each log type, instead of the default zero-based
 	timestamps.
 
+.. option:: log_alternate_epoch=bool
+
+	If set, fio will log timestamps based on the epoch used by the clock specified
+	in the log_alternate_epoch_clock_id option, to the log files produced by
+	enabling write_type_log for each log type, instead of the default zero-based
+	timestamps.
+
+.. option:: log_alternate_epoch_clock_id=int
+
+	Specifies the clock_id to be used by clock_gettime to obtain the alternate epoch
+	if either log_unix_epoch or log_alternate_epoch are true. Otherwise has no
+	effect. Default value is 0, or CLOCK_REALTIME.
+
 .. option:: block_error_percentiles=bool
 
 	If set, record errors in trim block-sized units from writes and trims and
diff --git a/backend.c b/backend.c
index c167f908..061e3b32 100644
--- a/backend.c
+++ b/backend.c
@@ -1777,6 +1777,18 @@ static void *thread_main(void *data)
 	if (!init_iolog(td))
 		goto err;
 
+	/* ioprio_set() has to be done before td_io_init() */
+	if (fio_option_is_set(o, ioprio) ||
+	    fio_option_is_set(o, ioprio_class)) {
+		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class, o->ioprio);
+		if (ret == -1) {
+			td_verror(td, errno, "ioprio_set");
+			goto err;
+		}
+		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
+		td->ts.ioprio = td->ioprio;
+	}
+
 	if (td_io_init(td))
 		goto err;
 
@@ -1789,16 +1801,6 @@ static void *thread_main(void *data)
 	if (o->verify_async && verify_async_init(td))
 		goto err;
 
-	if (fio_option_is_set(o, ioprio) ||
-	    fio_option_is_set(o, ioprio_class)) {
-		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class, o->ioprio);
-		if (ret == -1) {
-			td_verror(td, errno, "ioprio_set");
-			goto err;
-		}
-		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
-	}
-
 	if (o->cgroup && cgroup_setup(td, cgroup_list, &cgroup_mnt))
 		goto err;
 
@@ -1828,7 +1830,7 @@ static void *thread_main(void *data)
 	if (rate_submit_init(td, sk_out))
 		goto err;
 
-	set_epoch_time(td, o->log_unix_epoch);
+	set_epoch_time(td, o->log_unix_epoch | o->log_alternate_epoch, o->log_alternate_epoch_clock_id);
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
@@ -2611,6 +2613,9 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	for_each_td(td, i) {
+		struct thread_stat *ts = &td->ts;
+
+		free_clat_prio_stats(ts);
 		steadystate_free(td);
 		fio_options_free(td);
 		fio_dump_options_free(td);
diff --git a/cconv.c b/cconv.c
index 4f8d27eb..62d02e36 100644
--- a/cconv.c
+++ b/cconv.c
@@ -197,6 +197,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->log_gz = le32_to_cpu(top->log_gz);
 	o->log_gz_store = le32_to_cpu(top->log_gz_store);
 	o->log_unix_epoch = le32_to_cpu(top->log_unix_epoch);
+	o->log_alternate_epoch = le32_to_cpu(top->log_alternate_epoch);
+	o->log_alternate_epoch_clock_id = le32_to_cpu(top->log_alternate_epoch_clock_id);
 	o->norandommap = le32_to_cpu(top->norandommap);
 	o->softrandommap = le32_to_cpu(top->softrandommap);
 	o->bs_unaligned = le32_to_cpu(top->bs_unaligned);
@@ -425,6 +427,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->log_gz = cpu_to_le32(o->log_gz);
 	top->log_gz_store = cpu_to_le32(o->log_gz_store);
 	top->log_unix_epoch = cpu_to_le32(o->log_unix_epoch);
+	top->log_alternate_epoch = cpu_to_le32(o->log_alternate_epoch);
+	top->log_alternate_epoch_clock_id = cpu_to_le32(o->log_alternate_epoch_clock_id);
 	top->norandommap = cpu_to_le32(o->norandommap);
 	top->softrandommap = cpu_to_le32(o->softrandommap);
 	top->bs_unaligned = cpu_to_le32(o->bs_unaligned);
diff --git a/client.c b/client.c
index be8411d8..605a3ce5 100644
--- a/client.c
+++ b/client.c
@@ -284,9 +284,10 @@ static int fio_client_dec_jobs_eta(struct client_eta *eta, client_eta_op eta_fn)
 static void fio_drain_client_text(struct fio_client *client)
 {
 	do {
-		struct fio_net_cmd *cmd;
+		struct fio_net_cmd *cmd = NULL;
 
-		cmd = fio_net_recv_cmd(client->fd, false);
+		if (fio_server_poll_fd(client->fd, POLLIN, 0))
+			cmd = fio_net_recv_cmd(client->fd, false);
 		if (!cmd)
 			break;
 
@@ -953,6 +954,8 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->pid		= le32_to_cpu(src->pid);
 	dst->members		= le32_to_cpu(src->members);
 	dst->unified_rw_rep	= le32_to_cpu(src->unified_rw_rep);
+	dst->ioprio		= le32_to_cpu(src->ioprio);
+	dst->disable_prio_stat	= le32_to_cpu(src->disable_prio_stat);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&dst->clat_stat[i], &src->clat_stat[i]);
@@ -1035,14 +1038,6 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->nr_block_infos	= le64_to_cpu(src->nr_block_infos);
 	for (i = 0; i < dst->nr_block_infos; i++)
 		dst->block_infos[i] = le32_to_cpu(src->block_infos[i]);
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			dst->io_u_plat_high_prio[i][j] = le64_to_cpu(src->io_u_plat_high_prio[i][j]);
-			dst->io_u_plat_low_prio[i][j] = le64_to_cpu(src->io_u_plat_low_prio[i][j]);
-		}
-		convert_io_stat(&dst->clat_high_prio_stat[i], &src->clat_high_prio_stat[i]);
-		convert_io_stat(&dst->clat_low_prio_stat[i], &src->clat_low_prio_stat[i]);
-	}
 
 	dst->ss_dur		= le64_to_cpu(src->ss_dur);
 	dst->ss_state		= le32_to_cpu(src->ss_state);
@@ -1052,6 +1047,19 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->ss_deviation.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_deviation.u.i));
 	dst->ss_criterion.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_criterion.u.i));
 
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		dst->nr_clat_prio[i] = le32_to_cpu(src->nr_clat_prio[i]);
+		for (j = 0; j < dst->nr_clat_prio[i]; j++) {
+			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+				dst->clat_prio[i][j].io_u_plat[k] =
+					le64_to_cpu(src->clat_prio[i][j].io_u_plat[k]);
+			convert_io_stat(&dst->clat_prio[i][j].clat_stat,
+					&src->clat_prio[i][j].clat_stat);
+			dst->clat_prio[i][j].ioprio =
+				le32_to_cpu(dst->clat_prio[i][j].ioprio);
+		}
+	}
+
 	if (dst->ss_state & FIO_SS_DATA) {
 		for (i = 0; i < dst->ss_dur; i++ ) {
 			dst->ss_iops_data[i] = le64_to_cpu(src->ss_iops_data[i]);
@@ -1760,7 +1768,6 @@ int fio_handle_client(struct fio_client *client)
 {
 	struct client_ops *ops = client->ops;
 	struct fio_net_cmd *cmd;
-	int size;
 
 	dprint(FD_NET, "client: handle %s\n", client->hostname);
 
@@ -1794,14 +1801,26 @@ int fio_handle_client(struct fio_client *client)
 		}
 	case FIO_NET_CMD_TS: {
 		struct cmd_ts_pdu *p = (struct cmd_ts_pdu *) cmd->payload;
+		uint64_t offset;
+		int i;
+
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			if (le32_to_cpu(p->ts.nr_clat_prio[i])) {
+				offset = le64_to_cpu(p->ts.clat_prio_offset[i]);
+				p->ts.clat_prio[i] =
+					(struct clat_prio_stat *)((char *)p + offset);
+			}
+		}
 
 		dprint(FD_NET, "client: ts->ss_state = %u\n", (unsigned int) le32_to_cpu(p->ts.ss_state));
 		if (le32_to_cpu(p->ts.ss_state) & FIO_SS_DATA) {
 			dprint(FD_NET, "client: received steadystate ring buffers\n");
 
-			size = le64_to_cpu(p->ts.ss_dur);
-			p->ts.ss_iops_data = (uint64_t *) ((struct cmd_ts_pdu *)cmd->payload + 1);
-			p->ts.ss_bw_data = p->ts.ss_iops_data + size;
+			offset = le64_to_cpu(p->ts.ss_iops_data_offset);
+			p->ts.ss_iops_data = (uint64_t *)((char *)p + offset);
+
+			offset = le64_to_cpu(p->ts.ss_bw_data_offset);
+			p->ts.ss_bw_data = (uint64_t *)((char *)p + offset);
 		}
 
 		convert_ts(&p->ts, &p->ts);
@@ -2152,6 +2171,7 @@ int fio_handle_clients(struct client_ops *ops)
 
 	fio_client_json_fini();
 
+	free_clat_prio_stats(&client_ts);
 	free(pfds);
 	return retval || error_clients;
 }
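With this change, the FIO_NET_CMD_TS handler no longer assumes that the steady-state arrays sit immediately after struct cmd_ts_pdu; it reads a byte offset for each variable-length array and adds it to the PDU base. The following is a minimal sketch of that pack-then-rebase pattern, not fio's code: struct demo_pdu, demo_pdu_pack() and demo_pdu_array() are hypothetical, and the sketch assumes sender and receiver share byte order (fio additionally converts each offset with le64_to_cpu()).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical PDU header (not fio's struct cmd_ts_pdu). Two variable-length
 * arrays follow the header; their positions travel as byte offsets from the
 * start of the PDU, never as raw pointers.
 */
struct demo_pdu {
	uint64_t nr;		/* number of entries in each array */
	uint64_t iops_offset;	/* byte offset of the iops array */
	uint64_t bw_offset;	/* byte offset of the bw array */
};

/* Sender: lay out header + iops[] + bw[] in one contiguous buffer. */
static struct demo_pdu *demo_pdu_pack(const uint64_t *iops,
				      const uint64_t *bw, uint64_t nr)
{
	size_t arr = nr * sizeof(uint64_t);
	struct demo_pdu *p = malloc(sizeof(*p) + 2 * arr);

	if (!p)
		return NULL;
	p->nr = nr;
	p->iops_offset = sizeof(*p);
	p->bw_offset = sizeof(*p) + arr;
	memcpy((char *)p + p->iops_offset, iops, arr);
	memcpy((char *)p + p->bw_offset, bw, arr);
	return p;
}

/* Receiver: rebase a stored offset onto the local address of the PDU. */
static const uint64_t *demo_pdu_array(const struct demo_pdu *p,
				      uint64_t offset)
{
	assert(offset >= sizeof(*p));
	return (const uint64_t *)((const char *)p + offset);
}
```

Because only offsets cross the wire, the sender can append further arrays (such as the per-priority clat_prio entries located via clat_prio_offset) without the receiver hard-coding their layout.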
diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index 92b752ae..dd358754 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -5,45 +5,201 @@
 
 #include "cmdprio.h"
 
-static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
-				    enum fio_ddir ddir, char *str, bool data)
+/*
+ * Temporary array used during parsing. Will be freed after the corresponding
+ * struct cmdprio_bsprio_desc has been generated and saved in cmdprio->bsprio_desc.
+ */
+struct cmdprio_parse_result {
+	struct split_prio *entries;
+	int nr_entries;
+};
+
+/*
+ * Temporary array used during init. Will be freed after the corresponding
+ * struct clat_prio_stat array has been saved in td->ts.clat_prio and the
+ * matching clat_prio_indexes have been saved in each struct cmdprio_prio.
+ */
+struct cmdprio_values {
+	unsigned int *prios;
+	int nr_prios;
+};
+
+static int find_clat_prio_index(unsigned int *all_prios, int nr_prios,
+				int32_t prio)
 {
-	struct cmdprio *cmdprio = cb_arg;
-	struct split split;
-	unsigned int i;
+	int i;
 
-	if (ddir == DDIR_TRIM)
-		return 0;
+	for (i = 0; i < nr_prios; i++) {
+		if (all_prios[i] == prio)
+			return i;
+	}
 
-	memset(&split, 0, sizeof(split));
+	return -1;
+}
 
-	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+/**
+ * assign_clat_prio_index - To spare stat.c from looping through all
+ * possible priorities each time add_clat_sample() / add_lat_sample() is
+ * called, save the index to use in each cmdprio_prio. This index is later
+ * propagated to the io_u, if that io_u was determined to use a cmdprio
+ * priority value.
+ */
+static void assign_clat_prio_index(struct cmdprio_prio *prio,
+				   struct cmdprio_values *values)
+{
+	int clat_prio_index = find_clat_prio_index(values->prios,
+						   values->nr_prios,
+						   prio->prio);
+	if (clat_prio_index == -1) {
+		clat_prio_index = values->nr_prios;
+		values->prios[clat_prio_index] = prio->prio;
+		values->nr_prios++;
+	}
+	prio->clat_prio_index = clat_prio_index;
+}
+
+/**
+ * init_cmdprio_values - Allocate a temporary array that can hold all unique
+ * priorities (per ddir), so that assign_clat_prio_index() can be called for
+ * each cmdprio_prio during setup. This temporary array is freed after setup.
+ */
+static int init_cmdprio_values(struct cmdprio_values *values,
+			       int max_unique_prios, struct thread_stat *ts)
+{
+	values->prios = calloc(max_unique_prios + 1,
+			       sizeof(*values->prios));
+	if (!values->prios)
 		return 1;
-	if (!split.nr)
-		return 0;
 
-	cmdprio->bssplit_nr[ddir] = split.nr;
-	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
-	if (!cmdprio->bssplit[ddir])
+	/* td->ioprio/ts->ioprio is always stored at index 0. */
+	values->prios[0] = ts->ioprio;
+	values->nr_prios++;
+
+	return 0;
+}
+
+/**
+ * init_ts_clat_prio - Allocates and fills a clat_prio_stat array which holds
+ * all unique priorities (per ddir).
+ */
+static int init_ts_clat_prio(struct thread_stat *ts, enum fio_ddir ddir,
+			     struct cmdprio_values *values)
+{
+	int i;
+
+	if (alloc_clat_prio_stat_ddir(ts, ddir, values->nr_prios))
 		return 1;
 
-	for (i = 0; i < split.nr; i++) {
-		cmdprio->bssplit[ddir][i].bs = split.val1[i];
-		if (split.val2[i] == -1U) {
-			cmdprio->bssplit[ddir][i].perc = 0;
-		} else {
-			if (split.val2[i] > 100)
-				cmdprio->bssplit[ddir][i].perc = 100;
-			else
-				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+	for (i = 0; i < values->nr_prios; i++)
+		ts->clat_prio[ddir][i].ioprio = values->prios[i];
+
+	return 0;
+}
+
+static int fio_cmdprio_fill_bsprio(struct cmdprio_bsprio *bsprio,
+				   struct split_prio *entries,
+				   struct cmdprio_values *values,
+				   int implicit_cmdprio, int start, int end)
+{
+	struct cmdprio_prio *prio;
+	int i = end - start + 1;
+
+	bsprio->prios = calloc(i, sizeof(*bsprio->prios));
+	if (!bsprio->prios)
+		return 1;
+
+	bsprio->bs = entries[start].bs;
+	bsprio->nr_prios = 0;
+	for (i = start; i <= end; i++) {
+		prio = &bsprio->prios[bsprio->nr_prios];
+		prio->perc = entries[i].perc;
+		if (entries[i].prio == -1)
+			prio->prio = implicit_cmdprio;
+		else
+			prio->prio = entries[i].prio;
+		assign_clat_prio_index(prio, values);
+		bsprio->tot_perc += entries[i].perc;
+		if (bsprio->tot_perc > 100) {
+			log_err("fio: cmdprio_bssplit total percentage "
+				"for bs: %"PRIu64" exceeds 100\n",
+				bsprio->bs);
+			free(bsprio->prios);
+			return 1;
 		}
+		bsprio->nr_prios++;
+	}
+
+	return 0;
+}
+
+static int
+fio_cmdprio_generate_bsprio_desc(struct cmdprio_bsprio_desc *bsprio_desc,
+				 struct cmdprio_parse_result *parse_res,
+				 struct cmdprio_values *values,
+				 int implicit_cmdprio)
+{
+	struct split_prio *entries = parse_res->entries;
+	int nr_entries = parse_res->nr_entries;
+	struct cmdprio_bsprio *bsprio;
+	int i, start, count = 0;
+
+	/*
+	 * The parsed result is sorted by blocksize, so count only the number
+	 * of different blocksizes, to know how many cmdprio_bsprio we need.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		while (i + 1 < nr_entries && entries[i].bs == entries[i + 1].bs)
+			i++;
+		count++;
+	}
+
+	/*
+	 * This allocation is not freed on error. Instead, the calling function
+	 * is responsible for calling fio_cmdprio_cleanup() on error.
+	 */
+	bsprio_desc->bsprios = calloc(count, sizeof(*bsprio_desc->bsprios));
+	if (!bsprio_desc->bsprios)
+		return 1;
+
+	start = 0;
+	bsprio_desc->nr_bsprios = 0;
+	for (i = 0; i < nr_entries; i++) {
+		while (i + 1 < nr_entries && entries[i].bs == entries[i + 1].bs)
+			i++;
+		bsprio = &bsprio_desc->bsprios[bsprio_desc->nr_bsprios];
+		/*
+		 * All parsed entries with the same blocksize get saved in the
+		 * same cmdprio_bsprio, to expedite the search in the hot path.
+		 */
+		if (fio_cmdprio_fill_bsprio(bsprio, entries, values,
+					    implicit_cmdprio, start, i))
+			return 1;
+
+		start = i + 1;
+		bsprio_desc->nr_bsprios++;
 	}
 
 	return 0;
 }
 
-int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-			      struct cmdprio *cmdprio)
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio_parse_result *parse_res_arr = cb_arg;
+	struct cmdprio_parse_result *parse_res = &parse_res_arr[ddir];
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	if (split_parse_prio_ddir(to, &parse_res->entries,
+				  &parse_res->nr_entries, str))
+		return 1;
+
+	return 0;
+}
+
+static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+				     struct cmdprio_parse_result *parse_res)
 {
 	char *str, *p;
 	int ret = 0;
@@ -53,26 +209,39 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio,
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, parse_res,
 			      false);
 
 	free(p);
 	return ret;
 }
 
-static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
+/**
+ * fio_cmdprio_percentage - Returns the percentage of I/Os that should
+ * use a cmdprio priority value (rather than the default context priority).
+ *
+ * For CMDPRIO_MODE_BSSPLIT, if the percentage is non-zero, we will also
+ * return the matching bsprio, to avoid the same linear search elsewhere.
+ * For CMDPRIO_MODE_PERC, we will never return a bsprio.
+ */
+static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u,
+				  struct cmdprio_bsprio **bsprio)
 {
+	struct cmdprio_bsprio *bsprio_entry;
 	enum fio_ddir ddir = io_u->ddir;
-	struct cmdprio_options *options = cmdprio->options;
 	int i;
 
 	switch (cmdprio->mode) {
 	case CMDPRIO_MODE_PERC:
-		return options->percentage[ddir];
+		*bsprio = NULL;
+		return cmdprio->perc_entry[ddir].perc;
 	case CMDPRIO_MODE_BSSPLIT:
-		for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
-			if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
-				return cmdprio->bssplit[ddir][i].perc;
+		for (i = 0; i < cmdprio->bsprio_desc[ddir].nr_bsprios; i++) {
+			bsprio_entry = &cmdprio->bsprio_desc[ddir].bsprios[i];
+			if (bsprio_entry->bs == io_u->buflen) {
+				*bsprio = bsprio_entry;
+				return bsprio_entry->tot_perc;
+			}
 		}
 		break;
 	default:
@@ -83,6 +252,11 @@ static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 		assert(0);
 	}
 
+	/*
+	 * This is fine: the given blocksize simply does not
+	 * have any (non-zero) cmdprio_bssplit entries defined.
+	 */
+	*bsprio = NULL;
 	return 0;
 }
 
@@ -100,52 +274,162 @@ static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 bool fio_cmdprio_set_ioprio(struct thread_data *td, struct cmdprio *cmdprio,
 			    struct io_u *io_u)
 {
-	enum fio_ddir ddir = io_u->ddir;
-	struct cmdprio_options *options = cmdprio->options;
-	unsigned int p;
-	unsigned int cmdprio_value =
-		ioprio_value(options->class[ddir], options->level[ddir]);
-
-	p = fio_cmdprio_percentage(cmdprio, io_u);
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		io_u->ioprio = cmdprio_value;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the default priority (which is either 0 or the
-			 * value set by "prio" and "prioclass" options).
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
+	struct cmdprio_bsprio *bsprio;
+	unsigned int p, rand;
+	uint32_t perc = 0;
+	int i;
+
+	p = fio_cmdprio_percentage(cmdprio, io_u, &bsprio);
+	if (!p)
+		return false;
+
+	rand = rand_between(&td->prio_state, 0, 99);
+	if (rand >= p)
+		return false;
+
+	switch (cmdprio->mode) {
+	case CMDPRIO_MODE_PERC:
+		io_u->ioprio = cmdprio->perc_entry[io_u->ddir].prio;
+		io_u->clat_prio_index =
+			cmdprio->perc_entry[io_u->ddir].clat_prio_index;
 		return true;
+	case CMDPRIO_MODE_BSSPLIT:
+		assert(bsprio);
+		for (i = 0; i < bsprio->nr_prios; i++) {
+			struct cmdprio_prio *prio = &bsprio->prios[i];
+
+			perc += prio->perc;
+			if (rand < perc) {
+				io_u->ioprio = prio->prio;
+				io_u->clat_prio_index = prio->clat_prio_index;
+				return true;
+			}
+		}
+		break;
+	default:
+		assert(0);
 	}
 
-	if (td->ioprio && td->ioprio < cmdprio_value) {
+	/* When rand < p (total perc), we should always find a cmdprio_prio. */
+	assert(0);
+	return false;
+}
+
+static int fio_cmdprio_gen_perc(struct thread_data *td, struct cmdprio *cmdprio)
+{
+	struct cmdprio_options *options = cmdprio->options;
+	struct cmdprio_prio *prio;
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct thread_stat *ts = &td->ts;
+	enum fio_ddir ddir;
+	int ret;
+
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
 		/*
-		 * The IO will be executed with the default priority (which is
-		 * either 0 or the value set by "prio" and "prioclass options),
-		 * and this priority is higher (has a lower value) than the
-		 * async IO priority.
+		 * Do not allocate a clat_prio array nor set the cmdprio struct
+		 * if zero percent of the I/Os (for the ddir) should use a
+		 * cmdprio priority value, or when the ddir is not enabled.
 		 */
-		io_u->flags |= IO_U_F_HIGH_PRIO;
+		if (!options->percentage[ddir] ||
+		    (ddir == DDIR_READ && !td_read(td)) ||
+		    (ddir == DDIR_WRITE && !td_write(td)))
+			continue;
+
+		ret = init_cmdprio_values(&values[ddir], 1, ts);
+		if (ret)
+			goto err;
+
+		prio = &cmdprio->perc_entry[ddir];
+		prio->perc = options->percentage[ddir];
+		prio->prio = ioprio_value(options->class[ddir],
+					  options->level[ddir]);
+		assign_clat_prio_index(prio, &values[ddir]);
+
+		ret = init_ts_clat_prio(ts, ddir, &values[ddir]);
+		if (ret)
+			goto err;
+
+		free(values[ddir].prios);
+		values[ddir].prios = NULL;
+		values[ddir].nr_prios = 0;
 	}
 
-	return false;
+	return 0;
+
+err:
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++)
+		free(values[ddir].prios);
+	free_clat_prio_stats(ts);
+
+	return ret;
 }
 
 static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
 					     struct cmdprio *cmdprio)
 {
 	struct cmdprio_options *options = cmdprio->options;
-	int ret;
-
-	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str, cmdprio);
+	struct cmdprio_parse_result parse_res[CMDPRIO_RWDIR_CNT] = {0};
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct thread_stat *ts = &td->ts;
+	int ret, implicit_cmdprio;
+	enum fio_ddir ddir;
+
+	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str,
+					&parse_res[0]);
 	if (ret)
 		goto err;
 
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		/*
+		 * Do not allocate a clat_prio array nor set the cmdprio structs
+		 * if there are no non-zero entries (for the ddir), or when the
+		 * ddir is not enabled.
+		 */
+		if (!parse_res[ddir].nr_entries ||
+		    (ddir == DDIR_READ && !td_read(td)) ||
+		    (ddir == DDIR_WRITE && !td_write(td))) {
+			free(parse_res[ddir].entries);
+			parse_res[ddir].entries = NULL;
+			parse_res[ddir].nr_entries = 0;
+			continue;
+		}
+
+		ret = init_cmdprio_values(&values[ddir],
+					  parse_res[ddir].nr_entries, ts);
+		if (ret)
+			goto err;
+
+		implicit_cmdprio = ioprio_value(options->class[ddir],
+						options->level[ddir]);
+
+		ret = fio_cmdprio_generate_bsprio_desc(&cmdprio->bsprio_desc[ddir],
+						       &parse_res[ddir],
+						       &values[ddir],
+						       implicit_cmdprio);
+		if (ret)
+			goto err;
+
+		free(parse_res[ddir].entries);
+		parse_res[ddir].entries = NULL;
+		parse_res[ddir].nr_entries = 0;
+
+		ret = init_ts_clat_prio(ts, ddir, &values[ddir]);
+		if (ret)
+			goto err;
+
+		free(values[ddir].prios);
+		values[ddir].prios = NULL;
+		values[ddir].nr_prios = 0;
+	}
+
 	return 0;
 
 err:
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		free(parse_res[ddir].entries);
+		free(values[ddir].prios);
+	}
+	free_clat_prio_stats(ts);
 	fio_cmdprio_cleanup(cmdprio);
 
 	return ret;
@@ -157,40 +441,46 @@ static int fio_cmdprio_parse_and_gen(struct thread_data *td,
 	struct cmdprio_options *options = cmdprio->options;
 	int i, ret;
 
+	/*
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
+	 */
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
+		/*
+		 * A cmdprio value is only used when fio_cmdprio_percentage()
+		 * returns non-zero, so it is safe to set a class even for a
+		 * DDIR that will never use it.
+		 */
+		if (!options->class[i])
+			options->class[i] = IOPRIO_CLASS_RT;
+	}
+
 	switch (cmdprio->mode) {
 	case CMDPRIO_MODE_BSSPLIT:
 		ret = fio_cmdprio_parse_and_gen_bssplit(td, cmdprio);
 		break;
 	case CMDPRIO_MODE_PERC:
-		ret = 0;
+		ret = fio_cmdprio_gen_perc(td, cmdprio);
 		break;
 	default:
 		assert(0);
 		return 1;
 	}
 
-	/*
-	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
-	 * is not set, default to RT priority class.
-	 */
-	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
-		if (options->percentage[i] || cmdprio->bssplit_nr[i]) {
-			if (!options->class[i])
-				options->class[i] = IOPRIO_CLASS_RT;
-		}
-	}
-
 	return ret;
 }
 
 void fio_cmdprio_cleanup(struct cmdprio *cmdprio)
 {
-	int ddir;
+	enum fio_ddir ddir;
+	int i;
 
 	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
-		free(cmdprio->bssplit[ddir]);
-		cmdprio->bssplit[ddir] = NULL;
-		cmdprio->bssplit_nr[ddir] = 0;
+		for (i = 0; i < cmdprio->bsprio_desc[ddir].nr_bsprios; i++)
+			free(cmdprio->bsprio_desc[ddir].bsprios[i].prios);
+		free(cmdprio->bsprio_desc[ddir].bsprios);
+		cmdprio->bsprio_desc[ddir].bsprios = NULL;
+		cmdprio->bsprio_desc[ddir].nr_bsprios = 0;
 	}
 
 	/*
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 0c7bd6cf..755da8d0 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -17,6 +17,24 @@ enum {
 	CMDPRIO_MODE_BSSPLIT,
 };
 
+struct cmdprio_prio {
+	int32_t prio;
+	uint32_t perc;
+	uint16_t clat_prio_index;
+};
+
+struct cmdprio_bsprio {
+	uint64_t bs;
+	uint32_t tot_perc;
+	unsigned int nr_prios;
+	struct cmdprio_prio *prios;
+};
+
+struct cmdprio_bsprio_desc {
+	struct cmdprio_bsprio *bsprios;
+	unsigned int nr_bsprios;
+};
+
 struct cmdprio_options {
 	unsigned int percentage[CMDPRIO_RWDIR_CNT];
 	unsigned int class[CMDPRIO_RWDIR_CNT];
@@ -26,8 +44,8 @@ struct cmdprio_options {
 
 struct cmdprio {
 	struct cmdprio_options *options;
-	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
-	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
+	struct cmdprio_prio perc_entry[CMDPRIO_RWDIR_CNT];
+	struct cmdprio_bsprio_desc bsprio_desc[CMDPRIO_RWDIR_CNT];
 	unsigned int mode;
 };
 
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 4bb13c34..7884752d 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -49,7 +49,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/filedelete.c b/engines/filedelete.c
index e882ccf0..df388ac9 100644
--- a/engines/filedelete.c
+++ b/engines/filedelete.c
@@ -51,7 +51,7 @@ static int delete_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/filestat.c b/engines/filestat.c
index 00311247..e587eb54 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -125,7 +125,7 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 9868e816..d82c8053 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -11,6 +11,7 @@
 #include <errno.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 
 typedef BOOL (WINAPI *CANCELIOEX)(HANDLE hFile, LPOVERLAPPED lpOverlapped);
 
@@ -35,6 +36,26 @@ struct thread_ctx {
 	struct windowsaio_data *wd;
 };
 
+struct windowsaio_options {
+	struct thread_data *td;
+	unsigned int no_completion_thread;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "no_completion_thread",
+		.lname	= "No completion polling thread",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct windowsaio_options, no_completion_thread),
+		.help	= "Avoid using a separate completion polling thread",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_WINDOWSAIO,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
 static DWORD WINAPI IoCompletionRoutine(LPVOID lpParameter);
 
 static int fio_windowsaio_init(struct thread_data *td)
@@ -80,6 +101,7 @@ static int fio_windowsaio_init(struct thread_data *td)
 		struct thread_ctx *ctx;
 		struct windowsaio_data *wd;
 		HANDLE hFile;
+		struct windowsaio_options *o = td->eo;
 
 		hFile = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
 		if (hFile == INVALID_HANDLE_VALUE) {
@@ -91,29 +113,30 @@ static int fio_windowsaio_init(struct thread_data *td)
 		wd->iothread_running = TRUE;
 		wd->iocp = hFile;
 
-		if (!rc)
-			ctx = malloc(sizeof(struct thread_ctx));
+		if (o->no_completion_thread == 0) {
+			if (!rc)
+				ctx = malloc(sizeof(struct thread_ctx));
 
-		if (!rc && ctx == NULL) {
-			log_err("windowsaio: failed to allocate memory for thread context structure\n");
-			CloseHandle(hFile);
-			rc = 1;
-		}
+			if (!rc && ctx == NULL) {
+				log_err("windowsaio: failed to allocate memory for thread context structure\n");
+				CloseHandle(hFile);
+				rc = 1;
+			}
 
-		if (!rc) {
-			DWORD threadid;
+			if (!rc) {
+				DWORD threadid;
 
-			ctx->iocp = hFile;
-			ctx->wd = wd;
-			wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, &threadid);
-			if (!wd->iothread)
-				log_err("windowsaio: failed to create io completion thread\n");
-			else if (fio_option_is_set(&td->o, cpumask))
-				fio_setaffinity(threadid, td->o.cpumask);
+				ctx->iocp = hFile;
+				ctx->wd = wd;
+				wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, &threadid);
+				if (!wd->iothread)
+					log_err("windowsaio: failed to create io completion thread\n");
+				else if (fio_option_is_set(&td->o, cpumask))
+					fio_setaffinity(threadid, td->o.cpumask);
+			}
+			if (rc || wd->iothread == NULL)
+				rc = 1;
 		}
-
-		if (rc || wd->iothread == NULL)
-			rc = 1;
 	}
 
 	return rc;
@@ -302,9 +325,63 @@ static struct io_u* fio_windowsaio_event(struct thread_data *td, int event)
 	return wd->aio_events[event];
 }
 
-static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
-				    unsigned int max,
-				    const struct timespec *t)
+/* dequeue completion entries directly (no separate completion thread) */
+static int fio_windowsaio_getevents_nothread(struct thread_data *td, unsigned int min,
+				    unsigned int max, const struct timespec *t)
+{
+	struct windowsaio_data *wd = td->io_ops_data;
+	unsigned int dequeued = 0;
+	struct io_u *io_u;
+	DWORD start_count = 0;
+	DWORD end_count = 0;
+	DWORD mswait = 250;
+	struct fio_overlapped *fov;
+
+	if (t != NULL) {
+		mswait = (t->tv_sec * 1000) + (t->tv_nsec / 1000000);
+		start_count = GetTickCount();
+		end_count = start_count + mswait;
+	}
+
+	do {
+		BOOL ret;
+		OVERLAPPED *ovl;
+
+		ULONG entries = 0, want = min(16, max - dequeued);
+		OVERLAPPED_ENTRY oe[16];
+		ret = GetQueuedCompletionStatusEx(wd->iocp, oe, want, &entries, mswait, 0);
+		if (ret && entries) {
+			int entry_num;
+
+			for (entry_num=0; entry_num<entries; entry_num++) {
+				ovl = oe[entry_num].lpOverlapped;
+				fov = CONTAINING_RECORD(ovl, struct fio_overlapped, o);
+				io_u = fov->io_u;
+
+				if (ovl->Internal == ERROR_SUCCESS) {
+					io_u->resid = io_u->xfer_buflen - ovl->InternalHigh;
+					io_u->error = 0;
+				} else {
+					io_u->resid = io_u->xfer_buflen;
+					io_u->error = win_to_posix_error(GetLastError());
+				}
+
+				fov->io_complete = FALSE;
+				wd->aio_events[dequeued] = io_u;
+				dequeued++;
+			}
+		}
+
+		if (dequeued >= min ||
+			(t != NULL && timeout_expired(start_count, end_count)))
+			break;
+	} while (1);
+	return dequeued;
+}
+
+/* dequeue completion entries created by the separate IoCompletionRoutine thread */
+static int fio_windowaio_getevents_thread(struct thread_data *td, unsigned int min,
+				    unsigned int max, const struct timespec *t)
 {
 	struct windowsaio_data *wd = td->io_ops_data;
 	unsigned int dequeued = 0;
@@ -334,7 +411,6 @@ static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
 				wd->aio_events[dequeued] = io_u;
 				dequeued++;
 			}
-
 		}
 		if (dequeued >= min)
 			break;
@@ -353,6 +429,16 @@ static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
 	return dequeued;
 }
 
+static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
+				    unsigned int max, const struct timespec *t)
+{
+	struct windowsaio_options *o = td->eo;
+
+	if (o->no_completion_thread)
+		return fio_windowsaio_getevents_nothread(td, min, max, t);
+	return fio_windowaio_getevents_thread(td, min, max, t);
+}
+
 static enum fio_q_status fio_windowsaio_queue(struct thread_data *td,
 					      struct io_u *io_u)
 {
@@ -484,6 +570,8 @@ static struct ioengine_ops ioengine = {
 	.get_file_size	= generic_get_file_size,
 	.io_u_init	= fio_windowsaio_io_u_init,
 	.io_u_free	= fio_windowsaio_io_u_free,
+	.options	= options,
+	.option_struct_size	= sizeof(struct windowsaio_options),
 };
 
 static void fio_init fio_windowsaio_register(void)
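fio_windowsaio_getevents_nothread() drains the completion port in batches of up to 16 entries until at least min completions have been collected. Since every dequeued io_u is stored in wd->aio_events[], each batch should be bounded by the space still available (max - dequeued) as well as by the size of the local OVERLAPPED_ENTRY array. The portable sketch below shows that bounding with a hypothetical in-memory queue (struct demo_port) standing in for GetQueuedCompletionStatusEx(); it omits the mswait and timeout handling.

```c
#include <assert.h>

#define DEMO_BATCH 16	/* mirrors the 16-entry OVERLAPPED_ENTRY array */

/* Hypothetical stand-in for an I/O completion port. */
struct demo_port {
	unsigned int pending;	/* completions waiting to be dequeued */
	unsigned int next_id;	/* id handed out for the next completion */
};

/* Remove at most 'want' completions into out[]; return how many. */
static unsigned int demo_port_dequeue(struct demo_port *port,
				      unsigned int *out, unsigned int want)
{
	unsigned int n = port->pending < want ? port->pending : want;
	unsigned int i;

	for (i = 0; i < n; i++)
		out[i] = port->next_id++;
	port->pending -= n;
	return n;
}

/*
 * Collect completions into events[] until at least 'min' have been
 * gathered, never storing more than 'max'. Each batch is capped by
 * DEMO_BATCH and by the room left in events[]. Where the engine would
 * wait up to mswait ms, this sketch simply returns early once the queue
 * is empty.
 */
static unsigned int demo_getevents(struct demo_port *port,
				   unsigned int *events,
				   unsigned int min, unsigned int max)
{
	unsigned int dequeued = 0;

	assert(min <= max);
	do {
		unsigned int room = max - dequeued;
		unsigned int want = room < DEMO_BATCH ? room : DEMO_BATCH;
		unsigned int got = demo_port_dequeue(port, events + dequeued,
						     want);

		if (!got)
			break;
		dequeued += got;
	} while (dequeued < min);

	return dequeued;
}
```

For example, with 40 pending completions and min = max = 20, the sketch takes a batch of 16 followed by a batch of 4, so events[] receives exactly 20 entries.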
diff --git a/examples/cmdprio-bssplit.fio b/examples/cmdprio-bssplit.fio
index 47e9a790..f3b2fac0 100644
--- a/examples/cmdprio-bssplit.fio
+++ b/examples/cmdprio-bssplit.fio
@@ -1,17 +1,44 @@
 ; Randomly read/write a block device file at queue depth 16.
-; 40 % of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
-; 100% of the 64kB reads are executed at the highest priority and
-; all other IOs executed without a priority set.
 [global]
 filename=/dev/sda
 direct=1
 write_lat_log=prio-run.log
 log_prio=1
-
-[randrw]
 rw=randrw
-bssplit=64k/40:1024k/60,1024k/100
 ioengine=libaio
 iodepth=16
+
+; Simple cmdprio_bssplit format. All non-zero percentage entries will
+; use the same prio class and prio level defined by the cmdprio_class
+; and cmdprio options.
+[cmdprio]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 100% of the 64kB reads are executed with prio class 1 and prio level 0.
+; All other IOs are executed without a priority set.
+bssplit=64k/40:1024k/60,1024k/100
 cmdprio_bssplit=64k/100:1024k/0,1024k/0
 cmdprio_class=1
+cmdprio=0
+
+; Advanced cmdprio_bssplit format. Each non-zero percentage entry can
+; use a different prio class and prio level (appended to each entry).
+[cmdprio-adv]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
+; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
+; All other IOs are executed without a priority set.
+stonewall
+bssplit=64k/40:1024k/60,1024k/100
+cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
+
+; Identical to the previous example, but with a default priority defined.
+[cmdprio-adv-def]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
+; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
+; All other IOs are executed with prio class 2 and prio level 7.
+stonewall
+prioclass=2
+prio=7
+bssplit=64k/40:1024k/60,1024k/100
+cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
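In the advanced cmdprio_bssplit format used by the job file above, each entry is blocksize/percentage/class/level, while a two-field blocksize/percentage entry falls back to cmdprio_class and cmdprio. A minimal parser sketch for a single entry follows. It is not fio's split_parse_prio_ddir(): demo_parse_entry() is hypothetical, only k/m (power of two) suffixes are handled, classes 0 to 3 and levels 0 to 7 are the only values it accepts, and prio -1 marks the fallback, mirroring the entries[i].prio == -1 check in fio_cmdprio_fill_bsprio(). Class and level are packed with the Linux ioprio encoding, (class << 13) | level.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_IOPRIO_CLASS_SHIFT 13	/* Linux ioprio class shift */

static int32_t demo_ioprio_value(unsigned int class, unsigned int level)
{
	return (int32_t)((class << DEMO_IOPRIO_CLASS_SHIFT) | level);
}

/* Hypothetical parsed entry; prio == -1 means "use cmdprio_class/cmdprio". */
struct demo_entry {
	uint64_t bs;
	unsigned int perc;
	int32_t prio;
};

/* Parse "bs/perc" or "bs/perc/class/level". Returns 0 on success, 1 on error. */
static int demo_parse_entry(const char *s, struct demo_entry *e)
{
	char *end, *start;
	unsigned long class, level;

	e->bs = strtoull(s, &end, 10);
	if (end == s)
		return 1;
	if (*end == 'k' || *end == 'K') {
		e->bs <<= 10;
		end++;
	} else if (*end == 'm' || *end == 'M') {
		e->bs <<= 20;
		end++;
	}
	if (*end++ != '/')
		return 1;

	start = end;
	e->perc = strtoul(end, &end, 10);
	if (end == start || e->perc > 100)
		return 1;
	e->prio = -1;
	if (*end == '\0')
		return 0;		/* simple two-field format */

	if (*end++ != '/')
		return 1;
	class = strtoul(end, &end, 10);
	if (*end++ != '/')
		return 1;
	level = strtoul(end, &end, 10);
	if (*end != '\0' || class > 3 || level > 7)
		return 1;
	e->prio = demo_ioprio_value(class, level);
	return 0;
}
```

Splitting a full option value on ',' (per data direction) and ':' (per entry) before calling such a parser is left out to keep the sketch short.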
diff --git a/fio.1 b/fio.1
index b87d2309..f32d7915 100644
--- a/fio.1
+++ b/fio.1
@@ -1122,7 +1122,7 @@ see \fBend_fsync\fR and \fBfsync_on_close\fR.
 .TP
 .BI fdatasync \fR=\fPint
 Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
-not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
+not metadata blocks. In Windows, DragonFlyBSD or OSX there is no
 \fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
 Defaults to 0, which means fio does not periodically issue and wait for a
 data-only sync to complete.
@@ -1995,10 +1995,34 @@ To get a finer control over I/O priority, this option allows specifying
 the percentage of IOs that must have a priority set depending on the block
 size of the IO. This option is useful only when used together with the option
 \fBbssplit\fR, that is, multiple different block sizes are used for reads and
-writes. The format for this option is the same as the format of the
-\fBbssplit\fR option, with the exception that values for trim IOs are
-ignored. This option is mutually exclusive with the \fBcmdprio_percentage\fR
-option.
+writes.
+.RS
+.P
+The first accepted format for this option is the same as the format of the
+\fBbssplit\fR option:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage:blocksize/percentage
+.RE
+.P
+In this case, each entry will use the priority class and priority level defined
+by the options \fBcmdprio_class\fR and \fBcmdprio\fR respectively.
+.P
+The second accepted format for this option is:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage/class/level:blocksize/percentage/class/level
+.RE
+.P
+In this case, the priority class and priority level are defined inside each
+entry. Unlike the first accepted format, the second accepted format does not
+require all entries to use the same priority class and
+priority level.
+.P
+For both formats, only the read and write data directions are supported; values
+for trim IOs are ignored. This option is mutually exclusive with the
+\fBcmdprio_percentage\fR option.
+.RE
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
@@ -3360,6 +3384,17 @@ If set, fio will log Unix timestamps to the log files produced by enabling
 write_type_log for each log type, instead of the default zero-based
 timestamps.
 .TP
+.BI log_alternate_epoch \fR=\fPbool
+If set, fio will log timestamps to the log files produced by enabling
+write_type_log for each log type based on the epoch of the clock specified in
+the \fBlog_alternate_epoch_clock_id\fR option, instead of the default
+zero-based timestamps.
+.TP
+.BI log_alternate_epoch_clock_id \fR=\fPint
+Specifies the clock_id to be used by \fBclock_gettime\fR\|(2) to obtain the
+alternate epoch if either \fBlog_unix_epoch\fR or \fBlog_alternate_epoch\fR is
+true. Otherwise, it has no effect. Default value is 0 (CLOCK_REALTIME).
+.TP
 .BI block_error_percentiles \fR=\fPbool
 If set, record errors in trim block-sized units from writes and trims and
 output a histogram of how many trims it took to get to errors, and what kind
diff --git a/fio.h b/fio.h
index 1ea3d064..7b0ca843 100644
--- a/fio.h
+++ b/fio.h
@@ -380,7 +380,7 @@ struct thread_data {
 
 	struct timespec start;	/* start of this loop */
 	struct timespec epoch;	/* time job was started */
-	unsigned long long unix_epoch; /* Time job was started, unix epoch based. */
+	unsigned long long alternate_epoch; /* Time job was started, clock_gettime's clock_id epoch based. */
 	struct timespec last_issue;
 	long time_offset;
 	struct timespec ts_cache;
diff --git a/fio_time.h b/fio_time.h
index b3bbd4c0..62d92120 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -30,6 +30,6 @@ extern bool ramp_time_over(struct thread_data *);
 extern bool in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
 extern void timespec_add_msec(struct timespec *, unsigned int);
-extern void set_epoch_time(struct thread_data *, int);
+extern void set_epoch_time(struct thread_data *, int, clockid_t);
 
 #endif
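set_epoch_time() now receives the clock_id selected by log_alternate_epoch_clock_id. A minimal sketch of reading such a clock's epoch-based time with clock_gettime(2) follows; demo_epoch_ms() is hypothetical, and the millisecond resolution is an assumption of this sketch rather than a statement about fio's internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Return the current time of clock_id in milliseconds since that clock's
 * epoch, or 0 if the clock cannot be read.
 */
static unsigned long long demo_epoch_ms(clockid_t clock_id)
{
	struct timespec ts;

	if (clock_gettime(clock_id, &ts))
		return 0;
	return (unsigned long long)ts.tv_sec * 1000ULL +
	       (unsigned long long)ts.tv_nsec / 1000000ULL;
}
```

With CLOCK_REALTIME (0 on Linux) the result is milliseconds since the Unix epoch; other clocks, such as CLOCK_MONOTONIC, count from an unspecified starting point, so their timestamps are only meaningful relative to each other.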
diff --git a/gclient.c b/gclient.c
index ac063536..c59bcfe2 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1155,21 +1155,18 @@ out:
 #define GFIO_CLAT	1
 #define GFIO_SLAT	2
 #define GFIO_LAT	4
-#define GFIO_HILAT	8
-#define GFIO_LOLAT	16
 
 static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				  struct group_run_stats *rs,
 				  struct thread_stat *ts, int ddir)
 {
 	const char *ddir_label[3] = { "Read", "Write", "Trim" };
-	const char *hilat, *lolat;
 	GtkWidget *frame, *label, *box, *vbox, *main_vbox;
-	unsigned long long min[5], max[5];
+	unsigned long long min[3], max[3];
 	unsigned long runt;
 	unsigned long long bw, iops;
 	unsigned int flags = 0;
-	double mean[5], dev[5];
+	double mean[3], dev[3];
 	char *io_p, *io_palt, *bw_p, *bw_palt, *iops_p;
 	char tmp[128];
 	int i2p;
@@ -1268,14 +1265,6 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		flags |= GFIO_CLAT;
 	if (calc_lat(&ts->lat_stat[ddir], &min[2], &max[2], &mean[2], &dev[2]))
 		flags |= GFIO_LAT;
-	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min[3], &max[3], &mean[3], &dev[3])) {
-		flags |= GFIO_HILAT;
-		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min[4], &max[4], &mean[4], &dev[4]))
-			flags |= GFIO_LOLAT;
-		/* we only want to print low priority statistics if other IOs were
-		 * submitted with the priority bit set
-		 */
-	}
 
 	if (flags) {
 		frame = gtk_frame_new("Latency");
@@ -1284,24 +1273,12 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		vbox = gtk_vbox_new(FALSE, 3);
 		gtk_container_add(GTK_CONTAINER(frame), vbox);
 
-		if (ts->lat_percentiles) {
-			hilat = "High priority total latency";
-			lolat = "Low priority total latency";
-		} else {
-			hilat = "High priority completion latency";
-			lolat = "Low priority completion latency";
-		}
-
 		if (flags & GFIO_SLAT)
 			gfio_show_lat(vbox, "Submission latency", min[0], max[0], mean[0], dev[0]);
 		if (flags & GFIO_CLAT)
 			gfio_show_lat(vbox, "Completion latency", min[1], max[1], mean[1], dev[1]);
 		if (flags & GFIO_LAT)
 			gfio_show_lat(vbox, "Total latency", min[2], max[2], mean[2], dev[2]);
-		if (flags & GFIO_HILAT)
-			gfio_show_lat(vbox, hilat, min[3], max[3], mean[3], dev[3]);
-		if (flags & GFIO_LOLAT)
-			gfio_show_lat(vbox, lolat, min[4], max[4], mean[4], dev[4]);
 	}
 
 	if (ts->slat_percentiles && flags & GFIO_SLAT)
@@ -1309,40 +1286,16 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				ts->io_u_plat[FIO_SLAT][ddir],
 				ts->slat_stat[ddir].samples,
 				"Submission");
-	if (ts->clat_percentiles && flags & GFIO_CLAT) {
+	if (ts->clat_percentiles && flags & GFIO_CLAT)
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_CLAT][ddir],
 				ts->clat_stat[ddir].samples,
 				"Completion");
-		if (!ts->lat_percentiles) {
-			if (flags & GFIO_HILAT)
-				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-						ts->io_u_plat_high_prio[ddir],
-						ts->clat_high_prio_stat[ddir].samples,
-						"High priority completion");
-			if (flags & GFIO_LOLAT)
-				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-						ts->io_u_plat_low_prio[ddir],
-						ts->clat_low_prio_stat[ddir].samples,
-						"Low priority completion");
-		}
-	}
-	if (ts->lat_percentiles && flags & GFIO_LAT) {
+	if (ts->lat_percentiles && flags & GFIO_LAT)
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_LAT][ddir],
 				ts->lat_stat[ddir].samples,
 				"Total");
-		if (flags & GFIO_HILAT)
-			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-					ts->io_u_plat_high_prio[ddir],
-					ts->clat_high_prio_stat[ddir].samples,
-					"High priority total");
-		if (flags & GFIO_LOLAT)
-			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-					ts->io_u_plat_low_prio[ddir],
-					ts->clat_low_prio_stat[ddir].samples,
-					"Low priority total");
-	}
 
 	free(io_p);
 	free(bw_p);
diff --git a/init.c b/init.c
index 07daaa84..13935152 100644
--- a/init.c
+++ b/init.c
@@ -224,6 +224,13 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.has_arg	= optional_argument,
 		.val		= 'S',
 	},
+#ifdef WIN32
+	{
+		.name		= (char *) "server-internal",
+		.has_arg	= required_argument,
+		.val		= 'N',
+	},
+#endif
 	{	.name		= (char *) "daemonize",
 		.has_arg	= required_argument,
 		.val		= 'D',
@@ -1445,6 +1452,26 @@ static bool wait_for_ok(const char *jobname, struct thread_options *o)
 	return true;
 }
 
+static int verify_per_group_options(struct thread_data *td, const char *jobname)
+{
+	struct thread_data *td2;
+	int i;
+
+	for_each_td(td2, i) {
+		if (td->groupid != td2->groupid)
+			continue;
+
+		if (td->o.stats &&
+		    td->o.lat_percentiles != td2->o.lat_percentiles) {
+			log_err("fio: lat_percentiles in job: %s differs from group\n",
+				jobname);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Treat an empty log file name the same as a one not given
  */
@@ -1563,6 +1590,10 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->groupid = groupid;
 	prev_group_jobs++;
 
+	if (td->o.group_reporting && prev_group_jobs > 1 &&
+	    verify_per_group_options(td, jobname))
+		goto err;
+
 	if (setup_rate(td))
 		goto err;
 
@@ -2795,6 +2826,12 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			exit_val = 1;
 #endif
 			break;
+#ifdef WIN32
+		case 'N':
+			did_arg = true;
+			fio_server_internal_set(optarg);
+			break;
+#endif
 		case 'D':
 			if (pid_file)
 				free(pid_file);
diff --git a/io_u.c b/io_u.c
index 3c72d63d..059637e5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1595,7 +1595,7 @@ again:
 		assert(io_u->flags & IO_U_F_FREE);
 		io_u_clear(td, io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
 				 IO_U_F_TRIMMED | IO_U_F_BARRIER |
-				 IO_U_F_VER_LIST | IO_U_F_HIGH_PRIO);
+				 IO_U_F_VER_LIST);
 
 		io_u->error = 0;
 		io_u->acct_ddir = -1;
@@ -1803,6 +1803,7 @@ struct io_u *get_io_u(struct thread_data *td)
 	 * Remember the issuing context priority. The IO engine may change this.
 	 */
 	io_u->ioprio = td->ioprio;
+	io_u->clat_prio_index = 0;
 out:
 	assert(io_u->file);
 	if (!td_io_prep(td, io_u)) {
@@ -1889,7 +1890,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 
 		tnsec = ntime_since(&io_u->start_time, &icd->time);
 		add_lat_sample(td, idx, tnsec, bytes, io_u->offset,
-			       io_u->ioprio, io_u_is_high_prio(io_u));
+			       io_u->ioprio, io_u->clat_prio_index);
 
 		if (td->flags & TD_F_PROFILE_OPS) {
 			struct prof_io_ops *ops = &td->prof_io_ops;
@@ -1911,7 +1912,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	if (ddir_rw(idx)) {
 		if (!td->o.disable_clat) {
 			add_clat_sample(td, idx, llnsec, bytes, io_u->offset,
-					io_u->ioprio, io_u_is_high_prio(io_u));
+					io_u->ioprio, io_u->clat_prio_index);
 			io_u_mark_latency(td, llnsec);
 		}
 
diff --git a/io_u.h b/io_u.h
index bdbac525..206e24fe 100644
--- a/io_u.h
+++ b/io_u.h
@@ -21,7 +21,6 @@ enum {
 	IO_U_F_TRIMMED		= 1 << 5,
 	IO_U_F_BARRIER		= 1 << 6,
 	IO_U_F_VER_LIST		= 1 << 7,
-	IO_U_F_HIGH_PRIO	= 1 << 8,
 };
 
 /*
@@ -50,6 +49,7 @@ struct io_u {
 	 * IO priority.
 	 */
 	unsigned short ioprio;
+	unsigned short clat_prio_index;
 
 	/*
 	 * Allocated/set buffer and length
@@ -193,6 +193,5 @@ static inline enum fio_ddir acct_ddir(struct io_u *io_u)
 	td_flags_clear((td), &(io_u->flags), (val))
 #define io_u_set(td, io_u, val)		\
 	td_flags_set((td), &(io_u)->flags, (val))
-#define io_u_is_high_prio(io_u)	(io_u->flags & IO_U_F_HIGH_PRIO)
 
 #endif
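The io_u changes above replace the single `IO_U_F_HIGH_PRIO` flag with a `clat_prio_index` carried in each `io_u`. `add_clat_sample()` and `add_lat_sample()` now receive that index instead of a high/low boolean. The per-priority stat arrays the index points into are maintained outside this excerpt. What follows is therefore only a minimal sketch, assuming a small fixed-size table keyed by the raw `ioprio` value. The struct and function names are hypothetical, not fio's.

```c
#include <stdint.h>

#define SKETCH_MAX_PRIOS 8	/* hypothetical table capacity */

/* Hypothetical per-priority latency bucket. */
struct prio_bucket {
	unsigned short ioprio;
	uint64_t samples;
	uint64_t min_nsec, max_nsec, sum_nsec;
};

struct prio_table {
	int nr;
	struct prio_bucket b[SKETCH_MAX_PRIOS];
};

/*
 * Return the bucket index for ioprio, creating the bucket on first use.
 * Returns -1 if the table is already full.
 */
static int prio_table_index(struct prio_table *t, unsigned short ioprio)
{
	for (int i = 0; i < t->nr; i++)
		if (t->b[i].ioprio == ioprio)
			return i;
	if (t->nr == SKETCH_MAX_PRIOS)
		return -1;
	t->b[t->nr] = (struct prio_bucket){ .ioprio = ioprio, .min_nsec = UINT64_MAX };
	return t->nr++;
}

/* Account one completion latency sample to the bucket at idx. */
static void prio_table_add(struct prio_table *t, int idx, uint64_t nsec)
{
	struct prio_bucket *b = &t->b[idx];

	b->samples++;
	b->sum_nsec += nsec;
	if (nsec < b->min_nsec)
		b->min_nsec = nsec;
	if (nsec > b->max_nsec)
		b->max_nsec = nsec;
}
```

In this sketch the lookup happens once per I/O before submission. The completion path then only does a direct array access, and any number of distinct priorities get their own statistics instead of being folded into two buckets.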
diff --git a/libfio.c b/libfio.c
index 198eaf2e..01fa7452 100644
--- a/libfio.c
+++ b/libfio.c
@@ -142,7 +142,7 @@ void reset_all_stats(struct thread_data *td)
 		td->ts.runtime[i] = 0;
 	}
 
-	set_epoch_time(td, td->o.log_unix_epoch);
+	set_epoch_time(td, td->o.log_unix_epoch | td->o.log_alternate_epoch, td->o.log_alternate_epoch_clock_id);
 	memcpy(&td->start, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
diff --git a/optgroup.h b/optgroup.h
index 1fb84a29..3ac8f62a 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -71,6 +71,7 @@ enum opt_category_group {
 	__FIO_OPT_G_LIBCUFILE,
 	__FIO_OPT_G_DFS,
 	__FIO_OPT_G_NFS,
+	__FIO_OPT_G_WINDOWSAIO,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -116,6 +117,7 @@ enum opt_category_group {
 	FIO_OPT_G_FILESTAT	= (1ULL << __FIO_OPT_G_FILESTAT),
 	FIO_OPT_G_LIBCUFILE	= (1ULL << __FIO_OPT_G_LIBCUFILE),
 	FIO_OPT_G_DFS		= (1ULL << __FIO_OPT_G_DFS),
+	FIO_OPT_G_WINDOWSAIO	= (1ULL << __FIO_OPT_G_WINDOWSAIO),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);
diff --git a/options.c b/options.c
index 102bcf56..6cdbd268 100644
--- a/options.c
+++ b/options.c
@@ -278,6 +278,128 @@ static int str_bssplit_cb(void *data, const char *input)
 	return ret;
 }
 
+static int parse_cmdprio_bssplit_entry(struct thread_options *o,
+				       struct split_prio *entry, char *str)
+{
+	int matches = 0;
+	char *bs_str = NULL;
+	long long bs_val;
+	unsigned int perc = 0, class, level;
+
+	/*
+	 * valid entry formats:
+	 * bs/ - %s/ - set perc to 0, prio to -1.
+	 * bs/perc - %s/%u - set prio to -1.
+	 * bs/perc/class/level - %s/%u/%u/%u
+	 */
+	matches = sscanf(str, "%m[^/]/%u/%u/%u", &bs_str, &perc, &class, &level);
+	if (matches < 1) {
+		log_err("fio: invalid cmdprio_bssplit format\n");
+		return 1;
+	}
+
+	if (str_to_decimal(bs_str, &bs_val, 1, o, 0, 0)) {
+		log_err("fio: split conversion failed\n");
+		free(bs_str);
+		return 1;
+	}
+	free(bs_str);
+
+	entry->bs = bs_val;
+	entry->perc = min(perc, 100u);
+	entry->prio = -1;
+	switch (matches) {
+	case 1: /* bs/ case */
+	case 2: /* bs/perc case */
+		break;
+	case 4: /* bs/perc/class/level case */
+		class = min(class, (unsigned int) IOPRIO_MAX_PRIO_CLASS);
+		level = min(level, (unsigned int) IOPRIO_MAX_PRIO);
+		entry->prio = ioprio_value(class, level);
+		break;
+	default:
+		log_err("fio: invalid cmdprio_bssplit format\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Returns a negative integer if the first argument should be before the second
+ * argument in the sorted list. A positive integer if the first argument should
+ * be after the second argument in the sorted list. A zero if they are equal.
+ */
+static int fio_split_prio_cmp(const void *p1, const void *p2)
+{
+	const struct split_prio *tmp1 = p1;
+	const struct split_prio *tmp2 = p2;
+
+	if (tmp1->bs > tmp2->bs)
+		return 1;
+	if (tmp1->bs < tmp2->bs)
+		return -1;
+	return 0;
+}
+
+int split_parse_prio_ddir(struct thread_options *o, struct split_prio **entries,
+			  int *nr_entries, char *str)
+{
+	struct split_prio *tmp_entries;
+	unsigned int nr_bssplits;
+	char *str_cpy, *p, *fname;
+
+	/* strsep modifies the string, dup it so that we can use strsep twice */
+	p = str_cpy = strdup(str);
+	if (!p)
+		return 1;
+
+	nr_bssplits = 0;
+	while ((fname = strsep(&str_cpy, ":")) != NULL) {
+		if (!strlen(fname))
+			break;
+		nr_bssplits++;
+	}
+	free(p);
+
+	if (nr_bssplits > BSSPLIT_MAX) {
+		log_err("fio: too many cmdprio_bssplit entries\n");
+		return 1;
+	}
+
+	tmp_entries = calloc(nr_bssplits, sizeof(*tmp_entries));
+	if (!tmp_entries)
+		return 1;
+
+	nr_bssplits = 0;
+	while ((fname = strsep(&str, ":")) != NULL) {
+		struct split_prio *entry;
+
+		if (!strlen(fname))
+			break;
+
+		entry = &tmp_entries[nr_bssplits];
+
+		if (parse_cmdprio_bssplit_entry(o, entry, fname)) {
+			log_err("fio: failed to parse cmdprio_bssplit entry\n");
+			free(tmp_entries);
+			return 1;
+		}
+
+		/* skip zero perc entries, they provide no useful information */
+		if (entry->perc)
+			nr_bssplits++;
+	}
+
+	qsort(tmp_entries, nr_bssplits, sizeof(*tmp_entries),
+	      fio_split_prio_cmp);
+
+	*entries = tmp_entries;
+	*nr_entries = nr_bssplits;
+
+	return 0;
+}
+
 static int str2error(char *str)
 {
 	const char *err[] = { "EPERM", "ENOENT", "ESRCH", "EINTR", "EIO",
@@ -4392,6 +4514,24 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_LOG,
 		.group = FIO_OPT_G_INVALID,
 	},
+	{
+		.name = "log_alternate_epoch",
+		.lname = "Log epoch alternate",
+		.type = FIO_OPT_BOOL,
+		.off1 = offsetof(struct thread_options, log_alternate_epoch),
+		.help = "Use alternate epoch time in log files. Uses the same epoch as clock_gettime() with the clock given by log_alternate_epoch_clock_id.",
+		.category = FIO_OPT_C_LOG,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
+		.name = "log_alternate_epoch_clock_id",
+		.lname = "Log alternate epoch clock_id",
+		.type = FIO_OPT_INT,
+		.off1 = offsetof(struct thread_options, log_alternate_epoch_clock_id),
+		.help = "If log_alternate_epoch or log_unix_epoch is true, this option specifies the clock_gettime() clock_id whose epoch should be used. If neither is true, this option has no effect. The default is 0 (CLOCK_REALTIME).",
+		.category = FIO_OPT_C_LOG,
+		.group = FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "block_error_percentiles",
 		.lname	= "Block error percentiles",
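`split_parse_prio_ddir()` works in four steps:

- It counts the `:`-separated entries.
- It parses each one with `parse_cmdprio_bssplit_entry()`.
- It drops zero-percentage entries.
- It sorts the remaining entries by block size.

A standalone sketch of the same flow follows. It makes a few assumptions:

- Block sizes are plain byte counts. fio's `str_to_decimal()` also accepts suffixes such as `4k`.
- A priority uses the Linux encoding `class << 13 | level`, with class capped at 3 and level at 7.

All names in the sketch are hypothetical.

```c
#define _DEFAULT_SOURCE		/* strsep() and strdup() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fio's struct split_prio. */
struct bssplit_prio {
	unsigned long long bs;
	unsigned int perc;
	int prio;		/* -1: keep the job's default priority */
};

#define SKETCH_MAX_SPLITS	64	/* hypothetical, like BSSPLIT_MAX */
#define SKETCH_CLASS_SHIFT	13	/* assumption: Linux IOPRIO_CLASS_SHIFT */
#define SKETCH_MAX_CLASS	3	/* assumption: IOPRIO_MAX_PRIO_CLASS */
#define SKETCH_MAX_LEVEL	7	/* assumption: IOPRIO_MAX_PRIO */

static int cmp_bs(const void *p1, const void *p2)
{
	const struct bssplit_prio *a = p1, *b = p2;

	return (a->bs > b->bs) - (a->bs < b->bs);
}

/*
 * Parse "bs/perc[/class/level]" entries separated by ':' into out[],
 * sorted by block size. Returns the number of entries kept, or -1 if an
 * entry is malformed or there are too many entries.
 */
static int parse_bssplit_prio(const char *input, struct bssplit_prio *out)
{
	char *copy = strdup(input), *cursor = copy, *tok;
	int n = 0;

	if (!copy)
		return -1;

	while ((tok = strsep(&cursor, ":")) != NULL && *tok) {
		unsigned long long bs;
		unsigned int perc = 0, class = 0, level = 0;
		int matches = sscanf(tok, "%llu/%u/%u/%u", &bs, &perc, &class, &level);

		/* 1 (bs/), 2 (bs/perc) and 4 (bs/perc/class/level) are valid */
		if (matches < 1 || matches == 3 || n == SKETCH_MAX_SPLITS) {
			free(copy);
			return -1;
		}
		if (perc > 100)
			perc = 100;
		if (!perc)
			continue;	/* zero-percent entries carry no information */

		out[n].bs = bs;
		out[n].perc = perc;
		out[n].prio = -1;
		if (matches == 4) {
			if (class > SKETCH_MAX_CLASS)
				class = SKETCH_MAX_CLASS;
			if (level > SKETCH_MAX_LEVEL)
				level = SKETCH_MAX_LEVEL;
			out[n].prio = (int)((class << SKETCH_CLASS_SHIFT) | level);
		}
		n++;
	}

	free(copy);
	qsort(out, n, sizeof(*out), cmp_bs);
	return n;
}
```

For example, `8192/30/1/4:4096/70` yields two entries sorted by size. Only the 8192-byte entry carries an explicit priority; the other keeps `-1`.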
diff --git a/os/os-windows.h b/os/os-windows.h
index 59da9dba..510b8143 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -110,6 +110,8 @@ int nanosleep(const struct timespec *rqtp, struct timespec *rmtp);
 ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);
 ssize_t pwrite(int fildes, const void *buf, size_t nbyte,
 		off_t offset);
+HANDLE windows_handle_connection(HANDLE hjob, int sk);
+HANDLE windows_create_job(void);
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
diff --git a/os/os.h b/os/os.h
index 5965d7b8..810e6166 100644
--- a/os/os.h
+++ b/os/os.h
@@ -119,10 +119,14 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 
 #ifndef FIO_HAVE_IOPRIO_CLASS
 #define ioprio_value_is_class_rt(prio)	(false)
+#define IOPRIO_MIN_PRIO_CLASS		0
+#define IOPRIO_MAX_PRIO_CLASS		0
 #endif
 #ifndef FIO_HAVE_IOPRIO
 #define ioprio_value(prioclass, prio)	(0)
 #define ioprio_set(which, who, prioclass, prio)	(0)
+#define IOPRIO_MIN_PRIO			0
+#define IOPRIO_MAX_PRIO			0
 #endif
 
 #ifndef FIO_HAVE_ODIRECT
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 09c2e4a7..0d415e1e 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -537,16 +537,21 @@ int fcntl(int fildes, int cmd, ...)
 return 0;
 }
 
+#ifndef CLOCK_MONOTONIC_RAW
+#define CLOCK_MONOTONIC_RAW 4
+#endif
+
 /*
  * Get the value of a local clock source.
- * This implementation supports 2 clocks: CLOCK_MONOTONIC provides high-accuracy
- * relative time, while CLOCK_REALTIME provides a low-accuracy wall time.
+ * This implementation supports 3 clocks: CLOCK_MONOTONIC/CLOCK_MONOTONIC_RAW
+ * provide high-accuracy relative time, while CLOCK_REALTIME provides a
+ * low-accuracy wall time.
  */
 int clock_gettime(clockid_t clock_id, struct timespec *tp)
 {
 	int rc = 0;
 
-	if (clock_id == CLOCK_MONOTONIC) {
+	if (clock_id == CLOCK_MONOTONIC || clock_id == CLOCK_MONOTONIC_RAW) {
 		static LARGE_INTEGER freq = {{0,0}};
 		LARGE_INTEGER counts;
 		uint64_t t;
@@ -1026,3 +1031,174 @@ in_addr_t inet_network(const char *cp)
 	hbo = ((nbo & 0xFF) << 24) + ((nbo & 0xFF00) << 8) + ((nbo & 0xFF0000) >> 8) + ((nbo & 0xFF000000) >> 24);
 	return hbo;
 }
+
+static HANDLE create_named_pipe(char *pipe_name, int wait_connect_time)
+{
+	HANDLE hpipe;
+
+	hpipe = CreateNamedPipe (
+			pipe_name,
+			PIPE_ACCESS_DUPLEX,
+			PIPE_WAIT | PIPE_TYPE_BYTE,
+			1, 0, 0, wait_connect_time, NULL);
+
+	if (hpipe == INVALID_HANDLE_VALUE) {
+		log_err("CreateNamedPipe failed (%lu).\n", GetLastError());
+		return INVALID_HANDLE_VALUE;
+	}
+
+	if (!ConnectNamedPipe(hpipe, NULL)) {
+		log_err("ConnectNamedPipe failed (%lu).\n", GetLastError());
+		CloseHandle(hpipe);
+		return INVALID_HANDLE_VALUE;
+	}
+
+	return hpipe;
+}
+
+static BOOL windows_create_process(PROCESS_INFORMATION *pi, const char *args, HANDLE *hjob)
+{
+	LPSTR this_cmd_line = GetCommandLine();
+	LPSTR new_process_cmd_line = malloc(strlen(this_cmd_line) + strlen(args) + 1);
+	STARTUPINFO si = {0};
+	DWORD flags = 0;
+
+	strcpy(new_process_cmd_line, this_cmd_line);
+	strcat(new_process_cmd_line, args);
+
+	si.cb = sizeof(si);
+	memset(pi, 0, sizeof(*pi));
+
+	if ((hjob != NULL) && (*hjob != INVALID_HANDLE_VALUE))
+		flags = CREATE_SUSPENDED | CREATE_BREAKAWAY_FROM_JOB;
+
+	flags |= CREATE_NEW_CONSOLE;
+
+	if( !CreateProcess( NULL,
+		new_process_cmd_line,
+		NULL,    /* Process handle not inherited */
+		NULL,    /* Thread handle not inherited */
+		TRUE,    /* inherit handles */
+		flags,
+		NULL,    /* Use parent's environment block */
+		NULL,    /* Use parent's starting directory */
+		&si,
+		pi )
+	)
+	{
+		log_err("CreateProcess failed (%lu).\n", GetLastError() );
+		free(new_process_cmd_line);
+		return 1;
+	}
+	if ((hjob != NULL) && (*hjob != INVALID_HANDLE_VALUE)) {
+		BOOL ret = AssignProcessToJobObject(*hjob, pi->hProcess);
+		if (!ret) {
+			log_err("AssignProcessToJobObject failed (%lu).\n", GetLastError());
+			free(new_process_cmd_line);
+			return 1;
+		}
+		ResumeThread(pi->hThread);
+	}
+
+	free(new_process_cmd_line);
+	return 0;
+}
+
+HANDLE windows_create_job(void)
+{
+	JOBOBJECT_EXTENDED_LIMIT_INFORMATION jeli = { 0 };
+	BOOL success;
+	HANDLE hjob = CreateJobObject(NULL, NULL);
+
+	jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
+	success = SetInformationJobObject(hjob, JobObjectExtendedLimitInformation, &jeli, sizeof(jeli));
+	if (!success) {
+		log_err("SetInformationJobObject failed: error %lu\n", GetLastError());
+		return INVALID_HANDLE_VALUE;
+	}
+	return hjob;
+}
+
+/* wait for a child process to either exit or connect to a client */
+static bool monitor_process_till_connect(PROCESS_INFORMATION *pi, HANDLE *hpipe)
+{
+	bool connected = FALSE;
+	bool process_alive = TRUE;
+	char buffer[32] = {0};
+	DWORD bytes_read;
+
+	do {
+		DWORD exit_code;
+		GetExitCodeProcess(pi->hProcess, &exit_code);
+		if (exit_code != STILL_ACTIVE) {
+			dprint(FD_PROCESS, "process %u exited %d\n", GetProcessId(pi->hProcess), exit_code);
+			break;
+		}
+
+		memset(buffer, 0, sizeof(buffer));
+		ReadFile(*hpipe, &buffer, sizeof(buffer) - 1, &bytes_read, NULL);
+		if (bytes_read && strstr(buffer, "connected")) {
+			dprint(FD_PROCESS, "process %u connected to client\n", GetProcessId(pi->hProcess));
+			connected = TRUE;
+		}
+		usleep(10*1000);
+	} while (process_alive && !connected);
+	return connected;
+}
+
+/* create a process with --server-internal to emulate fork() */
+HANDLE windows_handle_connection(HANDLE hjob, int sk)
+{
+	char pipe_name[64] =  "\\\\.\\pipe\\fiointernal-";
+	char args[128] = " --server-internal=";
+	PROCESS_INFORMATION pi;
+	HANDLE hpipe = INVALID_HANDLE_VALUE;
+	WSAPROTOCOL_INFO protocol_info;
+	HANDLE ret;
+
+	sprintf(pipe_name+strlen(pipe_name), "%d", GetCurrentProcessId());
+	sprintf(args+strlen(args), "%s", pipe_name);
+
+	if (windows_create_process(&pi, args, &hjob) != 0)
+		return INVALID_HANDLE_VALUE;
+	else
+		ret = pi.hProcess;
+
+	/* duplicate socket and write the protocol_info to pipe so child can
+	 * duplicate the communication socket */
+	if (WSADuplicateSocket(sk, GetProcessId(pi.hProcess), &protocol_info)) {
+		log_err("WSADuplicateSocket failed (%lu).\n", GetLastError());
+		ret = INVALID_HANDLE_VALUE;
+		goto cleanup;
+	}
+
+	/* make a pipe with a unique name based upon processid */
+	hpipe = create_named_pipe(pipe_name, 1000);
+	if (hpipe == INVALID_HANDLE_VALUE) {
+		ret = INVALID_HANDLE_VALUE;
+		goto cleanup;
+	}
+
+	if (!WriteFile(hpipe, &protocol_info, sizeof(protocol_info), NULL, NULL)) {
+		log_err("WriteFile failed (%lu).\n", GetLastError());
+		ret = INVALID_HANDLE_VALUE;
+		goto cleanup;
+	}
+
+	dprint(FD_PROCESS, "process %d created child process %u\n", GetCurrentProcessId(), GetProcessId(pi.hProcess));
+
+	/* monitor the process until it either exits or connects. This level
+	 * doesn't care which of those occurs because the result is that it
+	 * needs to loop around and create another child process to monitor */
+	if (!monitor_process_till_connect(&pi, &hpipe))
+		ret = INVALID_HANDLE_VALUE;
+
+cleanup:
+	/* close the handles and pipes because this thread is done monitoring them */
+	if (ret == INVALID_HANDLE_VALUE)
+		CloseHandle(pi.hProcess);
+	CloseHandle(pi.hThread);
+	DisconnectNamedPipe(hpipe);
+	CloseHandle(hpipe);
+	return ret;
+}
\ No newline at end of file
diff --git a/rate-submit.c b/rate-submit.c
index 752c30a5..268356d1 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -173,7 +173,7 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	if (td->io_ops->post_init && td->io_ops->post_init(td))
 		goto err_io_init;
 
-	set_epoch_time(td, td->o.log_unix_epoch);
+	set_epoch_time(td, td->o.log_unix_epoch | td->o.log_alternate_epoch, td->o.log_alternate_epoch_clock_id);
 	fio_getrusage(&td->ru_start);
 	clear_io_state(td, 1);
 
@@ -195,6 +195,15 @@ static void io_workqueue_exit_worker_fn(struct submit_worker *sw,
 	struct thread_data *td = sw->priv;
 
 	(*sum_cnt)++;
+
+	/*
+	 * io_workqueue_update_acct_fn() doesn't support per prio stats, and
+	 * even if it did, offload can't be used with all async IO engines.
+	 * If group reporting is set in the parent td, the group result
+	 * generated by __show_run_stats() can still contain multiple prios
+	 * from different offloaded jobs.
+	 */
+	sw->wq->td->ts.disable_prio_stat = 1;
 	sum_thread_stats(&sw->wq->td->ts, &td->ts);
 
 	fio_options_free(td);
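Both `reset_all_stats()` and the offload worker setup now pass two values to `set_epoch_time()`. The first is `log_unix_epoch | log_alternate_epoch`; the second is `log_alternate_epoch_clock_id`. The body of `set_epoch_time()` is not part of this excerpt. A minimal sketch of the idea follows, assuming the chosen clock is sampled once at job start and each log timestamp is that sample plus the monotonic time elapsed since, in milliseconds. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical per-job logging clock state. */
struct log_clock {
	struct timespec start;	/* CLOCK_MONOTONIC reading at job start */
	uint64_t base_ms;	/* chosen clock at job start; 0 = relative logs */
};

static uint64_t ts_to_ms(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000 + (uint64_t)ts->tv_nsec / 1000000;
}

/*
 * Record the job epoch. With use_epoch set, timestamps are anchored to
 * clock_id (0 is CLOCK_REALTIME on Linux) instead of starting at zero.
 */
static int log_clock_init(struct log_clock *c, int use_epoch, clockid_t clock_id)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &c->start))
		return -1;
	c->base_ms = 0;
	if (use_epoch) {
		if (clock_gettime(clock_id, &now))
			return -1;
		c->base_ms = ts_to_ms(&now);
	}
	return 0;
}

/* Timestamp, in milliseconds, for a log entry written now. */
static uint64_t log_clock_now(const struct log_clock *c)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return c->base_ms + (ts_to_ms(&now) - ts_to_ms(&c->start));
}
```

Measuring elapsed time on a monotonic clock keeps log timestamps non-decreasing even if the anchoring clock is stepped while the job runs.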
diff --git a/server.c b/server.c
index 90c52e01..914a8c74 100644
--- a/server.c
+++ b/server.c
@@ -63,12 +63,28 @@ static char me[128];
 
 static pthread_key_t sk_out_key;
 
+#ifdef WIN32
+static char *fio_server_pipe_name  = NULL;
+static HANDLE hjob = INVALID_HANDLE_VALUE;
+struct ffi_element {
+	union {
+		pthread_t thread;
+		HANDLE hProcess;
+	};
+	bool is_thread;
+};
+#endif
+
 struct fio_fork_item {
 	struct flist_head list;
 	int exitval;
 	int signal;
 	int exited;
+#ifdef WIN32
+	struct ffi_element element;
+#else
 	pid_t pid;
+#endif
 };
 
 struct cmd_reply {
@@ -250,6 +266,28 @@ static int fio_send_data(int sk, const void *p, unsigned int len)
 	return fio_sendv_data(sk, &iov, 1);
 }
 
+bool fio_server_poll_fd(int fd, short events, int timeout)
+{
+	struct pollfd pfd = {
+		.fd	= fd,
+		.events	= events,
+	};
+	int ret;
+
+	ret = poll(&pfd, 1, timeout);
+	if (ret < 0) {
+		if (errno == EINTR)
+			return false;
+		log_err("fio: poll: %s\n", strerror(errno));
+		return false;
+	} else if (!ret) {
+		return false;
+	}
+	if (pfd.revents & events)
+		return true;
+	return false;
+}
+
 static int fio_recv_data(int sk, void *buf, unsigned int len, bool wait)
 {
 	int flags;
@@ -651,6 +689,63 @@ static int fio_net_queue_stop(int error, int signal)
 	return fio_net_send_ack(NULL, error, signal);
 }
 
+#ifdef WIN32
+static void fio_server_add_fork_item(struct ffi_element *element, struct flist_head *list)
+{
+	struct fio_fork_item *ffi;
+
+	ffi = malloc(sizeof(*ffi));
+	ffi->exitval = 0;
+	ffi->signal = 0;
+	ffi->exited = 0;
+	ffi->element = *element;
+	flist_add_tail(&ffi->list, list);
+}
+
+static void fio_server_add_conn_pid(struct flist_head *conn_list, HANDLE hProcess)
+{
+	struct ffi_element element = {.hProcess = hProcess, .is_thread=FALSE};
+	dprint(FD_NET, "server: forked off connection job (pid=%u)\n", (int) GetProcessId(hProcess));
+
+	fio_server_add_fork_item(&element, conn_list);
+}
+
+static void fio_server_add_job_pid(struct flist_head *job_list, pthread_t thread)
+{
+	struct ffi_element element = {.thread = thread, .is_thread=TRUE};
+	dprint(FD_NET, "server: forked off job (tid=%u)\n", (int) element.thread);
+	fio_server_add_fork_item(&element, job_list);
+}
+
+static void fio_server_check_fork_item(struct fio_fork_item *ffi)
+{
+	int ret;
+
+	if (ffi->element.is_thread) {
+
+		ret = pthread_kill(ffi->element.thread, 0);
+		if (ret) {
+			void *rev_val;
+			pthread_join(ffi->element.thread, &rev_val); /* the thread is dead, so join it to get its exit status */
+
+			ffi->exitval = (int) (intptr_t) rev_val;
+			if (ffi->exitval)
+				log_err("thread (tid=%u) exited with %x\n", (int) ffi->element.thread, (int) ffi->exitval);
+			dprint(FD_PROCESS, "thread (tid=%u) exited with %x\n", (int) ffi->element.thread, (int) ffi->exitval);
+			ffi->exited = 1;
+		}
+	} else {
+		DWORD exit_val;
+		GetExitCodeProcess(ffi->element.hProcess, &exit_val);
+
+		if (exit_val != STILL_ACTIVE) {
+			dprint(FD_PROCESS, "process %u exited with %d\n", GetProcessId(ffi->element.hProcess), exit_val);
+			ffi->exited = 1;
+			ffi->exitval = exit_val;
+		}
+	}
+}
+#else
 static void fio_server_add_fork_item(pid_t pid, struct flist_head *list)
 {
 	struct fio_fork_item *ffi;
@@ -698,10 +793,21 @@ static void fio_server_check_fork_item(struct fio_fork_item *ffi)
 		}
 	}
 }
+#endif
 
 static void fio_server_fork_item_done(struct fio_fork_item *ffi, bool stop)
 {
+#ifdef WIN32
+	if (ffi->element.is_thread)
+		dprint(FD_NET, "tid %u exited, sig=%u, exitval=%d\n", (int) ffi->element.thread, ffi->signal, ffi->exitval);
+	else {
+		dprint(FD_NET, "pid %u exited, sig=%u, exitval=%d\n", (int)  GetProcessId(ffi->element.hProcess), ffi->signal, ffi->exitval);
+		CloseHandle(ffi->element.hProcess);
+		ffi->element.hProcess = INVALID_HANDLE_VALUE;
+	}
+#else
 	dprint(FD_NET, "pid %u exited, sig=%u, exitval=%d\n", (int) ffi->pid, ffi->signal, ffi->exitval);
+#endif
 
 	/*
 	 * Fold STOP and QUIT...
@@ -762,27 +868,62 @@ static int handle_load_file_cmd(struct fio_net_cmd *cmd)
 	return 0;
 }
 
-static int handle_run_cmd(struct sk_out *sk_out, struct flist_head *job_list,
-			  struct fio_net_cmd *cmd)
+#ifdef WIN32
+static void *fio_backend_thread(void *data)
 {
-	pid_t pid;
 	int ret;
+	struct sk_out *sk_out = (struct sk_out *) data;
 
 	sk_out_assign(sk_out);
 
+	ret = fio_backend(sk_out);
+	sk_out_drop();
+
+	pthread_exit((void*) (intptr_t) ret);
+	return NULL;
+}
+#endif
+
+static int handle_run_cmd(struct sk_out *sk_out, struct flist_head *job_list,
+			  struct fio_net_cmd *cmd)
+{
+	int ret;
+
 	fio_time_init();
 	set_genesis_time();
 
-	pid = fork();
-	if (pid) {
-		fio_server_add_job_pid(job_list, pid);
-		return 0;
+#ifdef WIN32
+	{
+		pthread_t thread;
+		/* both this thread and backend_thread call sk_out_assign() to double increment
+		 * the ref count.  This ensures struct is valid until both threads are done with it
+		 */
+		sk_out_assign(sk_out);
+		ret = pthread_create(&thread, NULL, fio_backend_thread, sk_out);
+		if (ret) {
+			log_err("pthread_create: %s\n", strerror(ret));
+			return ret;
+		}
+
+		fio_server_add_job_pid(job_list, thread);
+		return ret;
 	}
+#else
+	{
+		pid_t pid;
+		sk_out_assign(sk_out);
+		pid = fork();
+		if (pid) {
+			fio_server_add_job_pid(job_list, pid);
+			return 0;
+		}
 
-	ret = fio_backend(sk_out);
-	free_threads_shm();
-	sk_out_drop();
-	_exit(ret);
+		ret = fio_backend(sk_out);
+		free_threads_shm();
+		sk_out_drop();
+		_exit(ret);
+	}
+#endif
 }
 
 static int handle_job_cmd(struct fio_net_cmd *cmd)
@@ -1238,7 +1379,8 @@ static int handle_connection(struct sk_out *sk_out)
 		if (ret < 0)
 			break;
 
-		cmd = fio_net_recv_cmd(sk_out->sk, true);
+		if (pfd.revents & POLLIN)
+			cmd = fio_net_recv_cmd(sk_out->sk, true);
 		if (!cmd) {
 			ret = -1;
 			break;
@@ -1300,6 +1442,73 @@ static int get_my_addr_str(int sk)
 	return 0;
 }
 
+#ifdef WIN32
+static int handle_connection_process(void)
+{
+	WSAPROTOCOL_INFO protocol_info;
+	DWORD bytes_read;
+	HANDLE hpipe;
+	int sk;
+	struct sk_out *sk_out;
+	int ret;
+	char *msg = (char *) "connected";
+
+	log_info("server enter accept loop.  ProcessID %d\n", GetCurrentProcessId());
+
+	hpipe = CreateFile(
+					fio_server_pipe_name,
+					GENERIC_READ | GENERIC_WRITE,
+					0, NULL,
+					OPEN_EXISTING,
+					0, NULL);
+
+	if (hpipe == INVALID_HANDLE_VALUE) {
+		log_err("couldn't open pipe %s error %lu\n",
+				fio_server_pipe_name, GetLastError());
+		return -1;
+	}
+
+	if (!ReadFile(hpipe, &protocol_info, sizeof(protocol_info), &bytes_read, NULL)) {
+		log_err("couldn't read protocol info from pipe %s error %lu\n", fio_server_pipe_name, GetLastError());
+		CloseHandle(hpipe);
+		return -1;
+	}
+	if (use_ipv6) /* use protocol_info to create a duplicate of parents socket */
+		sk = WSASocket(AF_INET6, SOCK_STREAM, 0, &protocol_info, 0, 0);
+	else
+		sk = WSASocket(AF_INET,  SOCK_STREAM, 0, &protocol_info, 0, 0);
+
+	sk_out = scalloc(1, sizeof(*sk_out));
+	if (!sk_out) {
+		CloseHandle(hpipe);
+		close(sk);
+		return -1;
+	}
+
+	sk_out->sk = sk;
+	sk_out->hProcess = INVALID_HANDLE_VALUE;
+	INIT_FLIST_HEAD(&sk_out->list);
+	__fio_sem_init(&sk_out->lock, FIO_SEM_UNLOCKED);
+	__fio_sem_init(&sk_out->wait, FIO_SEM_LOCKED);
+	__fio_sem_init(&sk_out->xmit, FIO_SEM_UNLOCKED);
+
+	get_my_addr_str(sk);
+
+	if (!WriteFile(hpipe, msg, strlen(msg), NULL, NULL)) {
+		log_err("couldn't write to pipe\n");
+		CloseHandle(hpipe);
+		close(sk);
+		return -1;
+	}
+	CloseHandle(hpipe);
+	sk_out_assign(sk_out);
+
+	ret = handle_connection(sk_out);
+	__sk_out_drop(sk_out);
+	return ret;
+}
+#endif
+
 static int accept_loop(int listen_sk)
 {
 	struct sockaddr_in addr;
@@ -1317,8 +1526,11 @@ static int accept_loop(int listen_sk)
 		struct sk_out *sk_out;
 		const char *from;
 		char buf[64];
+#ifdef WIN32
+		HANDLE hProcess;
+#else
 		pid_t pid;
-
+#endif
 		pfd.fd = listen_sk;
 		pfd.events = POLLIN;
 		do {
@@ -1376,6 +1588,13 @@ static int accept_loop(int listen_sk)
 		__fio_sem_init(&sk_out->wait, FIO_SEM_LOCKED);
 		__fio_sem_init(&sk_out->xmit, FIO_SEM_UNLOCKED);
 
+#ifdef WIN32
+		hProcess = windows_handle_connection(hjob, sk);
+		if (hProcess == INVALID_HANDLE_VALUE)
+			return -1;
+		sk_out->hProcess = hProcess;
+		fio_server_add_conn_pid(&conn_list, hProcess);
+#else
 		pid = fork();
 		if (pid) {
 			close(sk);
@@ -1392,6 +1611,7 @@ static int accept_loop(int listen_sk)
 		 */
 		sk_out_assign(sk_out);
 		handle_connection(sk_out);
+#endif
 	}
 
 	return exitval;
@@ -1465,8 +1685,11 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 {
 	struct cmd_ts_pdu p;
 	int i, j, k;
-	void *ss_buf;
-	uint64_t *ss_iops, *ss_bw;
+	size_t clat_prio_stats_extra_size = 0;
+	size_t ss_extra_size = 0;
+	size_t extended_buf_size = 0;
+	void *extended_buf;
+	void *extended_buf_wp;
 
 	dprint(FD_NET, "server sending end stats\n");
 
@@ -1483,6 +1706,8 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.pid		= cpu_to_le32(ts->pid);
 	p.ts.members		= cpu_to_le32(ts->members);
 	p.ts.unified_rw_rep	= cpu_to_le32(ts->unified_rw_rep);
+	p.ts.ioprio		= cpu_to_le32(ts->ioprio);
+	p.ts.disable_prio_stat	= cpu_to_le32(ts->disable_prio_stat);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&p.ts.clat_stat[i], &ts->clat_stat[i]);
@@ -1577,38 +1802,88 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.cachehit		= cpu_to_le64(ts->cachehit);
 	p.ts.cachemiss		= cpu_to_le64(ts->cachemiss);
 
+	convert_gs(&p.rs, rs);
+
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			p.ts.io_u_plat_high_prio[i][j] = cpu_to_le64(ts->io_u_plat_high_prio[i][j]);
-			p.ts.io_u_plat_low_prio[i][j] = cpu_to_le64(ts->io_u_plat_low_prio[i][j]);
+		if (ts->nr_clat_prio[i])
+			clat_prio_stats_extra_size += ts->nr_clat_prio[i] * sizeof(*ts->clat_prio[i]);
+	}
+	extended_buf_size += clat_prio_stats_extra_size;
+
+	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
+	if (ts->ss_state & FIO_SS_DATA)
+		ss_extra_size = 2 * ts->ss_dur * sizeof(uint64_t);
+
+	extended_buf_size += ss_extra_size;
+	if (!extended_buf_size) {
+		fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
+		return;
+	}
+
+	extended_buf_size += sizeof(p);
+	extended_buf = calloc(1, extended_buf_size);
+	if (!extended_buf) {
+		log_err("fio: failed to allocate FIO_NET_CMD_TS buffer\n");
+		return;
+	}
+
+	memcpy(extended_buf, &p, sizeof(p));
+	extended_buf_wp = (struct cmd_ts_pdu *)extended_buf + 1;
+
+	if (clat_prio_stats_extra_size) {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			struct clat_prio_stat *prio = (struct clat_prio_stat *) extended_buf_wp;
+
+			for (j = 0; j < ts->nr_clat_prio[i]; j++) {
+				for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+					prio->io_u_plat[k] =
+						cpu_to_le64(ts->clat_prio[i][j].io_u_plat[k]);
+				convert_io_stat(&prio->clat_stat,
+						&ts->clat_prio[i][j].clat_stat);
+				prio->ioprio = cpu_to_le32(ts->clat_prio[i][j].ioprio);
+				prio++;
+			}
+
+			if (ts->nr_clat_prio[i]) {
+				uint64_t offset = (char *)extended_buf_wp - (char *)extended_buf;
+				struct cmd_ts_pdu *ptr = extended_buf;
+
+				ptr->ts.clat_prio_offset[i] = cpu_to_le64(offset);
+				ptr->ts.nr_clat_prio[i] = cpu_to_le32(ts->nr_clat_prio[i]);
+			}
+
+			extended_buf_wp = prio;
 		}
-		convert_io_stat(&p.ts.clat_high_prio_stat[i], &ts->clat_high_prio_stat[i]);
-		convert_io_stat(&p.ts.clat_low_prio_stat[i], &ts->clat_low_prio_stat[i]);
 	}
 
-	convert_gs(&p.rs, rs);
+	if (ss_extra_size) {
+		uint64_t *ss_iops, *ss_bw;
+		uint64_t offset;
+		struct cmd_ts_pdu *ptr = extended_buf;
 
-	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
-	if (ts->ss_state & FIO_SS_DATA) {
 		dprint(FD_NET, "server sending steadystate ring buffers\n");
 
-		ss_buf = malloc(sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t));
+		/* ss iops */
+		ss_iops = (uint64_t *) extended_buf_wp;
+		for (i = 0; i < ts->ss_dur; i++)
+			ss_iops[i] = cpu_to_le64(ts->ss_iops_data[i]);
 
-		memcpy(ss_buf, &p, sizeof(p));
+		offset = (char *)extended_buf_wp - (char *)extended_buf;
+		ptr->ts.ss_iops_data_offset = cpu_to_le64(offset);
+		extended_buf_wp = ss_iops + (int) ts->ss_dur;
 
-		ss_iops = (uint64_t *) ((struct cmd_ts_pdu *)ss_buf + 1);
-		ss_bw = ss_iops + (int) ts->ss_dur;
-		for (i = 0; i < ts->ss_dur; i++) {
-			ss_iops[i] = cpu_to_le64(ts->ss_iops_data[i]);
+		/* ss bw */
+		ss_bw = extended_buf_wp;
+		for (i = 0; i < ts->ss_dur; i++)
 			ss_bw[i] = cpu_to_le64(ts->ss_bw_data[i]);
-		}
-
-		fio_net_queue_cmd(FIO_NET_CMD_TS, ss_buf, sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t), NULL, SK_F_COPY);
 
-		free(ss_buf);
+		offset = (char *)extended_buf_wp - (char *)extended_buf;
+		ptr->ts.ss_bw_data_offset = cpu_to_le64(offset);
+		extended_buf_wp = ss_bw + (int) ts->ss_dur;
 	}
-	else
-		fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
+
+	fio_net_queue_cmd(FIO_NET_CMD_TS, extended_buf, extended_buf_size, NULL, SK_F_COPY);
+	free(extended_buf);
 }
 
 void fio_server_send_gs(struct group_run_stats *rs)
@@ -2489,12 +2764,25 @@ static int fio_server(void)
 	if (fio_handle_server_arg())
 		return -1;
 
+	set_sig_handlers();
+
+#ifdef WIN32
+	/* if this is a child process, go handle the connection */
+	if (fio_server_pipe_name != NULL) {
+		ret = handle_connection_process();
+		return ret;
+	}
+
+	/* job to link child processes so they terminate together */
+	hjob = windows_create_job();
+	if (hjob == INVALID_HANDLE_VALUE)
+		return -1;
+#endif
+
 	sk = fio_init_server_connection();
 	if (sk < 0)
 		return -1;
 
-	set_sig_handlers();
-
 	ret = accept_loop(sk);
 
 	close(sk);
@@ -2635,3 +2923,10 @@ void fio_server_set_arg(const char *arg)
 {
 	fio_server_arg = strdup(arg);
 }
+
+#ifdef WIN32
+void fio_server_internal_set(const char *arg)
+{
+	fio_server_pipe_name = strdup(arg);
+}
+#endif
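After this change, fio_server_send_ts() no longer queues a fixed-size cmd_ts_pdu. It builds one contiguous payload: the fixed part first, then the variable-length arrays (per-prio clat stats, steady-state iops/bw buffers). While the payload is in flight, each pointer field in the thread_stat temporarily holds that array's byte offset from the start of the payload (the `*_offset` union members in stat.h). The following is a minimal sketch of this pointer-to-offset pattern on the sending and receiving side. It makes a few assumptions. `struct hdr`, `pack()` and `unpack()` are hypothetical names, not fio functions. The little-endian conversion fio applies with cpu_to_le64() is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fixed-size header. In memory 'data' is a pointer; on the
 * wire the same union slot carries the array's byte offset from the start
 * of the payload, mirroring the *_offset members added to thread_stat.
 */
struct hdr {
	uint32_t nr;
	union {
		uint64_t *data;
		uint64_t data_offset;
	};
};

/* Sender: header followed by the array in one malloc'ed buffer. */
static void *pack(const uint64_t *vals, uint32_t nr, size_t *size)
{
	size_t sz = sizeof(struct hdr) + (size_t) nr * sizeof(uint64_t);
	char *buf;
	struct hdr *h;
	uint64_t *wp;

	assert(size != NULL && (vals != NULL || nr == 0));

	buf = calloc(1, sz);
	if (!buf)
		return NULL;

	h = (struct hdr *) buf;
	h->nr = nr;

	/* Write pointer: the array starts right after the fixed header. */
	wp = (uint64_t *) (buf + sizeof(struct hdr));
	if (nr)
		memcpy(wp, vals, (size_t) nr * sizeof(uint64_t));

	/* Store the position relative to the payload, not the address. */
	h->data_offset = (uint64_t) ((char *) wp - buf);

	*size = sz;
	return buf;
}

/* Receiver: validate the offset, then turn it back into a pointer. */
static struct hdr *unpack(void *buf, size_t size)
{
	struct hdr *h = buf;

	if (size < sizeof(*h) || h->data_offset > size ||
	    (size - h->data_offset) / sizeof(uint64_t) < h->nr)
		return NULL;

	h->data = (uint64_t *) ((char *) buf + h->data_offset);
	return h;
}
```

Offsets, unlike raw pointers, stay meaningful after the payload is copied into a socket buffer and received by another process. The bounds check in unpack() matters because the offset arrives from the peer.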
diff --git a/server.h b/server.h
index 25b6bbdc..0e62b6df 100644
--- a/server.h
+++ b/server.h
@@ -15,6 +15,9 @@ struct sk_out {
 	unsigned int refs;	/* frees sk_out when it drops to zero.
 				 * protected by below ->lock */
 
+#ifdef WIN32
+	HANDLE hProcess;		/* process handle of handle_connection_process */
+#endif
 	int sk;			/* socket fd to talk to client */
 	struct fio_sem lock;	/* protects ref and below list */
 	struct flist_head list;	/* list of pending transmit work */
@@ -48,7 +51,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 95,
+	FIO_SERVER_VER			= 96,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
@@ -212,6 +215,7 @@ extern int fio_server_text_output(int, const char *, size_t);
 extern int fio_net_send_cmd(int, uint16_t, const void *, off_t, uint64_t *, struct flist_head *);
 extern int fio_net_send_simple_cmd(int, uint16_t, uint64_t, struct flist_head *);
 extern void fio_server_set_arg(const char *);
+extern void fio_server_internal_set(const char *);
 extern int fio_server_parse_string(const char *, char **, bool *, int *, struct in_addr *, struct in6_addr *, int *);
 extern int fio_server_parse_host(const char *, int, struct in_addr *, struct in6_addr *);
 extern const char *fio_server_op(unsigned int);
@@ -222,6 +226,7 @@ extern void fio_server_send_gs(struct group_run_stats *);
 extern void fio_server_send_du(void);
 extern void fio_server_send_job_options(struct flist_head *, unsigned int);
 extern int fio_server_get_verify_state(const char *, int, void **);
+extern bool fio_server_poll_fd(int fd, short events, int timeout);
 
 extern struct fio_net_cmd *fio_net_recv_cmd(int sk, bool wait);
 
diff --git a/stat.c b/stat.c
index b08d2f25..0876222a 100644
--- a/stat.c
+++ b/stat.c
@@ -265,6 +265,18 @@ static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	free(ovals);
 }
 
+static int get_nr_prios_with_samples(struct thread_stat *ts, enum fio_ddir ddir)
+{
+	int i, nr_prios_with_samples = 0;
+
+	for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+		if (ts->clat_prio[ddir][i].clat_stat.samples)
+			nr_prios_with_samples++;
+	}
+
+	return nr_prios_with_samples;
+}
+
 bool calc_lat(struct io_stat *is, unsigned long long *min,
 	      unsigned long long *max, double *mean, double *dev)
 {
@@ -491,7 +503,8 @@ static struct thread_stat *gen_mixed_ddir_stats_from_ts(struct thread_stat *ts)
 	return ts_lcl;
 }
 
-static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, int mean)
+static double convert_agg_kbytes_percent(struct group_run_stats *rs,
+					 enum fio_ddir ddir, int mean)
 {
 	double p_of_agg = 100.0;
 	if (rs && rs->agg[ddir] > 1024) {
@@ -504,13 +517,14 @@ static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, i
 }
 
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
-			     int ddir, struct buf_output *out)
+			     enum fio_ddir ddir, struct buf_output *out)
 {
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
 	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
-	int i2p;
+	int i2p, i;
+	const char *clat_type = ts->lat_percentiles ? "lat" : "clat";
 
 	if (ddir_sync(ddir)) {
 		if (calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
@@ -571,12 +585,22 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat("clat", min, max, mean, dev, out);
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
-	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
-		display_lat(ts->lat_percentiles ? "high prio_lat" : "high prio_clat",
-				min, max, mean, dev, out);
-		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min, &max, &mean, &dev))
-			display_lat(ts->lat_percentiles ? "low prio_lat" : "low prio_clat",
-					min, max, mean, dev, out);
+
+	/* Only print per prio stats if there are >= 2 prios with samples */
+	if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			if (calc_lat(&ts->clat_prio[ddir][i].clat_stat, &min,
+				     &max, &mean, &dev)) {
+				char buf[64];
+
+				snprintf(buf, sizeof(buf),
+					 "%s prio %u/%u",
+					 clat_type,
+					 ts->clat_prio[ddir][i].ioprio >> 13,
+					 ts->clat_prio[ddir][i].ioprio & 7);
+				display_lat(buf, min, max, mean, dev, out);
+			}
+		}
 	}
 
 	if (ts->slat_percentiles && ts->slat_stat[ddir].samples > 0)
@@ -596,8 +620,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 					ts->percentile_precision, "lat", out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
-		const char *name = ts->lat_percentiles ? "lat" : "clat";
-		char prio_name[32];
+		char prio_name[64];
 		uint64_t samples;
 
 		if (ts->lat_percentiles)
@@ -605,25 +628,24 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		else
 			samples = ts->clat_stat[ddir].samples;
 
-		/* Only print this if some high and low priority stats were collected */
-		if (ts->clat_high_prio_stat[ddir].samples > 0 &&
-			ts->clat_low_prio_stat[ddir].samples > 0)
-		{
-			sprintf(prio_name, "high prio (%.2f%%) %s",
-					100. * (double) ts->clat_high_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts->io_u_plat_high_prio[ddir],
-						ts->clat_high_prio_stat[ddir].samples,
-						ts->percentile_list,
-						ts->percentile_precision, prio_name, out);
-
-			sprintf(prio_name, "low prio (%.2f%%) %s",
-					100. * (double) ts->clat_low_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts->io_u_plat_low_prio[ddir],
-						ts->clat_low_prio_stat[ddir].samples,
-						ts->percentile_list,
-						ts->percentile_precision, prio_name, out);
+		/* Only print per prio stats if there are >= 2 prios with samples */
+		if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+			for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+				uint64_t prio_samples = ts->clat_prio[ddir][i].clat_stat.samples;
+
+				if (prio_samples > 0) {
+					snprintf(prio_name, sizeof(prio_name),
+						 "%s prio %u/%u (%.2f%% of IOs)",
+						 clat_type,
+						 ts->clat_prio[ddir][i].ioprio >> 13,
+						 ts->clat_prio[ddir][i].ioprio & 7,
+						 100. * (double) prio_samples / (double) samples);
+					show_clat_percentiles(ts->clat_prio[ddir][i].io_u_plat,
+							      prio_samples, ts->percentile_list,
+							      ts->percentile_precision,
+							      prio_name, out);
+				}
+			}
 		}
 	}
 
@@ -678,6 +700,7 @@ static void show_mixed_ddir_status(struct group_run_stats *rs,
 	if (ts_lcl)
 		show_ddir_status(rs, ts_lcl, DDIR_READ, out);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -1251,8 +1274,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 }
 
 static void show_ddir_status_terse(struct thread_stat *ts,
-				   struct group_run_stats *rs, int ddir,
-				   int ver, struct buf_output *out)
+				   struct group_run_stats *rs,
+				   enum fio_ddir ddir, int ver,
+				   struct buf_output *out)
 {
 	unsigned long long min, max, minv, maxv, bw, iops;
 	unsigned long long *ovals = NULL;
@@ -1351,6 +1375,7 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 	if (ts_lcl)
 		show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -1407,7 +1432,8 @@ static struct json_object *add_ddir_lat_json(struct thread_stat *ts,
 }
 
 static void add_ddir_status_json(struct thread_stat *ts,
-		struct group_run_stats *rs, int ddir, struct json_object *parent)
+				 struct group_run_stats *rs, enum fio_ddir ddir,
+				 struct json_object *parent)
 {
 	unsigned long long min, max;
 	unsigned long long bw_bytes, bw;
@@ -1467,25 +1493,37 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	if (!ddir_rw(ddir))
 		return;
 
-	/* Only print PRIO latencies if some high priority samples were gathered */
-	if (ts->clat_high_prio_stat[ddir].samples > 0) {
-		const char *high, *low;
+	/* Only include per prio stats if there are >= 2 prios with samples */
+	if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+		struct json_array *array = json_create_array();
+		const char *obj_name;
+		int i;
 
-		if (ts->lat_percentiles) {
-			high = "lat_high_prio";
-			low = "lat_low_prio";
-		} else {
-			high = "clat_high_prio";
-			low = "clat_low_prio";
+		if (ts->lat_percentiles)
+			obj_name = "lat_ns";
+		else
+			obj_name = "clat_ns";
+
+		json_object_add_value_array(dir_object, "prios", array);
+
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			if (ts->clat_prio[ddir][i].clat_stat.samples > 0) {
+				struct json_object *obj = json_create_object();
+				unsigned long long class, level;
+
+				class = ts->clat_prio[ddir][i].ioprio >> 13;
+				json_object_add_value_int(obj, "prioclass", class);
+				level = ts->clat_prio[ddir][i].ioprio & 7;
+				json_object_add_value_int(obj, "prio", level);
+
+				tmp_object = add_ddir_lat_json(ts,
+							       ts->clat_percentiles | ts->lat_percentiles,
+							       &ts->clat_prio[ddir][i].clat_stat,
+							       ts->clat_prio[ddir][i].io_u_plat);
+				json_object_add_value_object(obj, obj_name, tmp_object);
+				json_array_add_value_object(array, obj);
+			}
 		}
-
-		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
-				&ts->clat_high_prio_stat[ddir], ts->io_u_plat_high_prio[ddir]);
-		json_object_add_value_object(dir_object, high, tmp_object);
-
-		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
-				&ts->clat_low_prio_stat[ddir], ts->io_u_plat_low_prio[ddir]);
-		json_object_add_value_object(dir_object, low, tmp_object);
 	}
 
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
@@ -1534,6 +1572,7 @@ static void add_mixed_ddir_status_json(struct thread_stat *ts,
 	if (ts_lcl)
 		add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -1995,6 +2034,215 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->sig_figs = src->sig_figs;
 }
 
+/*
+ * Free the clat_prio_stat arrays allocated by alloc_clat_prio_stat_ddir().
+ */
+void free_clat_prio_stats(struct thread_stat *ts)
+{
+	enum fio_ddir ddir;
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+		sfree(ts->clat_prio[ddir]);
+		ts->clat_prio[ddir] = NULL;
+		ts->nr_clat_prio[ddir] = 0;
+	}
+}
+
+/*
+ * Allocate a clat_prio_stat array. The array has to be allocated/freed using
+ * smalloc/sfree, so that it is accessible by the process/thread summing the
+ * thread_stats.
+ */
+int alloc_clat_prio_stat_ddir(struct thread_stat *ts, enum fio_ddir ddir,
+			      int nr_prios)
+{
+	struct clat_prio_stat *clat_prio;
+	int i;
+
+	clat_prio = scalloc(nr_prios, sizeof(*ts->clat_prio[ddir]));
+	if (!clat_prio) {
+		log_err("fio: failed to allocate ts clat data\n");
+		return 1;
+	}
+
+	for (i = 0; i < nr_prios; i++)
+		clat_prio[i].clat_stat.min_val = ULONG_MAX;
+
+	ts->clat_prio[ddir] = clat_prio;
+	ts->nr_clat_prio[ddir] = nr_prios;
+
+	return 0;
+}
+
+static int grow_clat_prio_stat(struct thread_stat *dst, enum fio_ddir ddir)
+{
+	int curr_len = dst->nr_clat_prio[ddir];
+	void *new_arr;
+
+	new_arr = scalloc(curr_len + 1, sizeof(*dst->clat_prio[ddir]));
+	if (!new_arr) {
+		log_err("fio: failed to grow clat prio array\n");
+		return 1;
+	}
+
+	memcpy(new_arr, dst->clat_prio[ddir],
+	       curr_len * sizeof(*dst->clat_prio[ddir]));
+	sfree(dst->clat_prio[ddir]);
+
+	dst->clat_prio[ddir] = new_arr;
+	dst->clat_prio[ddir][curr_len].clat_stat.min_val = ULONG_MAX;
+	dst->nr_clat_prio[ddir]++;
+
+	return 0;
+}
+
+static int find_clat_prio_index(struct thread_stat *dst, enum fio_ddir ddir,
+				uint32_t ioprio)
+{
+	int i, nr_prios = dst->nr_clat_prio[ddir];
+
+	for (i = 0; i < nr_prios; i++) {
+		if (dst->clat_prio[ddir][i].ioprio == ioprio)
+			return i;
+	}
+
+	return -1;
+}
+
+static int alloc_or_get_clat_prio_index(struct thread_stat *dst,
+					enum fio_ddir ddir, uint32_t ioprio,
+					int *idx)
+{
+	int index = find_clat_prio_index(dst, ddir, ioprio);
+
+	if (index == -1) {
+		index = dst->nr_clat_prio[ddir];
+
+		if (grow_clat_prio_stat(dst, ddir))
+			return 1;
+
+		dst->clat_prio[ddir][index].ioprio = ioprio;
+	}
+
+	*idx = index;
+
+	return 0;
+}
+
+static int clat_prio_stats_copy(struct thread_stat *dst, struct thread_stat *src,
+				enum fio_ddir dst_ddir, enum fio_ddir src_ddir)
+{
+	size_t sz = sizeof(*src->clat_prio[src_ddir]) *
+		src->nr_clat_prio[src_ddir];
+
+	dst->clat_prio[dst_ddir] = smalloc(sz);
+	if (!dst->clat_prio[dst_ddir]) {
+		log_err("fio: failed to alloc clat prio array\n");
+		return 1;
+	}
+
+	memcpy(dst->clat_prio[dst_ddir], src->clat_prio[src_ddir], sz);
+	dst->nr_clat_prio[dst_ddir] = src->nr_clat_prio[src_ddir];
+
+	return 0;
+}
+
+static int clat_prio_stat_add_samples(struct thread_stat *dst,
+				      enum fio_ddir dst_ddir, uint32_t ioprio,
+				      struct io_stat *io_stat,
+				      uint64_t *io_u_plat)
+{
+	int i, dst_index;
+
+	if (!io_stat->samples)
+		return 0;
+
+	if (alloc_or_get_clat_prio_index(dst, dst_ddir, ioprio, &dst_index))
+		return 1;
+
+	sum_stat(&dst->clat_prio[dst_ddir][dst_index].clat_stat, io_stat,
+		 false);
+
+	for (i = 0; i < FIO_IO_U_PLAT_NR; i++)
+		dst->clat_prio[dst_ddir][dst_index].io_u_plat[i] += io_u_plat[i];
+
+	return 0;
+}
+
+static int sum_clat_prio_stats_src_single_prio(struct thread_stat *dst,
+					       struct thread_stat *src,
+					       enum fio_ddir dst_ddir,
+					       enum fio_ddir src_ddir)
+{
+	struct io_stat *io_stat;
+	uint64_t *io_u_plat;
+
+	/*
+	 * If src ts has no clat_prio_stat array, then all I/Os were submitted
+	 * using src->ioprio. Thus, the global samples in src->clat_stat (or
+	 * src->lat_stat) can be used as the 'per prio' samples for src->ioprio.
+	 */
+	assert(!src->clat_prio[src_ddir]);
+	assert(src->nr_clat_prio[src_ddir] == 0);
+
+	if (src->lat_percentiles) {
+		io_u_plat = src->io_u_plat[FIO_LAT][src_ddir];
+		io_stat = &src->lat_stat[src_ddir];
+	} else {
+		io_u_plat = src->io_u_plat[FIO_CLAT][src_ddir];
+		io_stat = &src->clat_stat[src_ddir];
+	}
+
+	return clat_prio_stat_add_samples(dst, dst_ddir, src->ioprio, io_stat,
+					  io_u_plat);
+}
+
+static int sum_clat_prio_stats_src_multi_prio(struct thread_stat *dst,
+					      struct thread_stat *src,
+					      enum fio_ddir dst_ddir,
+					      enum fio_ddir src_ddir)
+{
+	int i;
+
+	/*
+	 * If src ts has a clat_prio_stat array, then there are multiple prios
+	 * in use (i.e. src ts had cmdprio_percentage or cmdprio_bssplit set).
+	 * The samples for the default prio will exist in the src->clat_prio
+	 * array, just like the samples for any other prio.
+	 */
+	assert(src->clat_prio[src_ddir]);
+	assert(src->nr_clat_prio[src_ddir]);
+
+	/* If the dst ts doesn't yet have a clat_prio array, simply memcpy. */
+	if (!dst->clat_prio[dst_ddir])
+		return clat_prio_stats_copy(dst, src, dst_ddir, src_ddir);
+
+	/* The dst ts already has a clat_prio array; add src stats into it. */
+	for (i = 0; i < src->nr_clat_prio[src_ddir]; i++) {
+		struct io_stat *io_stat = &src->clat_prio[src_ddir][i].clat_stat;
+		uint64_t *io_u_plat = src->clat_prio[src_ddir][i].io_u_plat;
+		uint32_t ioprio = src->clat_prio[src_ddir][i].ioprio;
+
+		if (clat_prio_stat_add_samples(dst, dst_ddir, ioprio, io_stat, io_u_plat))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int sum_clat_prio_stats(struct thread_stat *dst, struct thread_stat *src,
+			       enum fio_ddir dst_ddir, enum fio_ddir src_ddir)
+{
+	if (dst->disable_prio_stat)
+		return 0;
+
+	if (!src->clat_prio[src_ddir])
+		return sum_clat_prio_stats_src_single_prio(dst, src, dst_ddir,
+							   src_ddir);
+
+	return sum_clat_prio_stats_src_multi_prio(dst, src, dst_ddir, src_ddir);
+}
+
 void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 {
 	int k, l, m;
@@ -2002,12 +2250,11 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (dst->unified_rw_rep != UNIFIED_MIXED) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], false);
-			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], false);
-			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], false);
 			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], false);
 			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], false);
 			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], true);
 			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], true);
+			sum_clat_prio_stats(dst, src, l, l);
 
 			dst->io_bytes[l] += src->io_bytes[l];
 
@@ -2015,12 +2262,11 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 				dst->runtime[l] = src->runtime[l];
 		} else {
 			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], false);
-			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], false);
-			sum_stat(&dst->clat_low_prio_stat[0], &src->clat_low_prio_stat[l], false);
 			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], false);
 			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], false);
 			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], true);
 			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], true);
+			sum_clat_prio_stats(dst, src, 0, l);
 
 			dst->io_bytes[0] += src->io_bytes[l];
 
@@ -2074,19 +2320,6 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
 		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
 
-	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
-		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
-			if (dst->unified_rw_rep != UNIFIED_MIXED) {
-				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_low_prio[k][m] += src->io_u_plat_low_prio[k][m];
-			} else {
-				dst->io_u_plat_high_prio[0][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_low_prio[0][m] += src->io_u_plat_low_prio[k][m];
-			}
-
-		}
-	}
-
 	dst->total_run_time += src->total_run_time;
 	dst->total_submit += src->total_submit;
 	dst->total_complete += src->total_complete;
@@ -2114,8 +2347,6 @@ void init_thread_stat_min_vals(struct thread_stat *ts)
 		ts->lat_stat[i].min_val = ULONG_MAX;
 		ts->bw_stat[i].min_val = ULONG_MAX;
 		ts->iops_stat[i].min_val = ULONG_MAX;
-		ts->clat_high_prio_stat[i].min_val = ULONG_MAX;
-		ts->clat_low_prio_stat[i].min_val = ULONG_MAX;
 	}
 	ts->sync_stat.min_val = ULONG_MAX;
 }
@@ -2128,6 +2359,58 @@ void init_thread_stat(struct thread_stat *ts)
 	ts->groupid = -1;
 }
 
+static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
+{
+	struct thread_data *td;
+	struct thread_stat *ts;
+	int i, j, last_ts, idx;
+	enum fio_ddir ddir;
+
+	j = 0;
+	last_ts = -1;
+	idx = 0;
+
+	/*
+	 * Loop through all tds. If a td requires per prio stats, temporarily
+	 * store a 1 in ts->disable_prio_stat, and then do an additional
+	 * loop at the end where we invert the ts->disable_prio_stat values.
+	 */
+	for_each_td(td, i) {
+		if (!td->o.stats)
+			continue;
+		if (idx &&
+		    (!td->o.group_reporting ||
+		     (td->o.group_reporting && last_ts != td->groupid))) {
+			idx = 0;
+			j++;
+		}
+
+		last_ts = td->groupid;
+		ts = &threadstats[j];
+
+		/* idx == 0 means first td in group, or td is not in a group. */
+		if (idx == 0)
+			ts->ioprio = td->ioprio;
+		else if (td->ioprio != ts->ioprio)
+			ts->disable_prio_stat = 1;
+
+		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+			if (td->ts.clat_prio[ddir]) {
+				ts->disable_prio_stat = 1;
+				break;
+			}
+		}
+
+		idx++;
+	}
+
+	/* Loop through all dst threadstats and fixup the values. */
+	for (i = 0; i < nr_ts; i++) {
+		ts = &threadstats[i];
+		ts->disable_prio_stat = !ts->disable_prio_stat;
+	}
+}
+
 void __show_run_stats(void)
 {
 	struct group_run_stats *runstats, *rs;
@@ -2174,6 +2457,8 @@ void __show_run_stats(void)
 		opt_lists[i] = NULL;
 	}
 
+	init_per_prio_stats(threadstats, nr_ts);
+
 	j = 0;
 	last_ts = -1;
 	idx = 0;
@@ -2198,7 +2483,6 @@ void __show_run_stats(void)
 		opt_lists[j] = &td->opt_list;
 
 		idx++;
-		ts->members++;
 
 		if (ts->groupid == -1) {
 			/*
@@ -2265,6 +2549,8 @@ void __show_run_stats(void)
 
 		sum_thread_stats(ts, &td->ts);
 
+		ts->members++;
+
 		if (td->o.ss_dur) {
 			ts->ss_state = td->ss.state;
 			ts->ss_dur = td->ss.dur;
@@ -2313,7 +2599,7 @@ void __show_run_stats(void)
 	}
 
 	for (i = 0; i < groupid + 1; i++) {
-		int ddir;
+		enum fio_ddir ddir;
 
 		rs = &runstats[i];
 
@@ -2419,6 +2705,12 @@ void __show_run_stats(void)
 
 	log_info_flush();
 	free(runstats);
+
+	/* free arrays allocated by sum_thread_stats(), if any */
+	for (i = 0; i < nr_ts; i++) {
+		ts = &threadstats[i];
+		free_clat_prio_stats(ts);
+	}
 	free(threadstats);
 	free(opt_lists);
 }
@@ -2545,6 +2837,14 @@ static inline void add_stat_sample(struct io_stat *is, unsigned long long data)
 	is->samples++;
 }
 
+static inline void add_stat_prio_sample(struct clat_prio_stat *clat_prio,
+					unsigned short clat_prio_index,
+					unsigned long long nsec)
+{
+	if (clat_prio)
+		add_stat_sample(&clat_prio[clat_prio_index].clat_stat, nsec);
+}
+
 /*
  * Return a struct io_logs, which is added to the tail of the log
  * list for 'iolog'.
@@ -2717,7 +3017,7 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
 		s->data = data;
-		s->time = t + (iolog->td ? iolog->td->unix_epoch : 0);
+		s->time = t + (iolog->td ? iolog->td->alternate_epoch : 0);
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
 		s->priority = priority;
@@ -2742,14 +3042,36 @@ static inline void reset_io_stat(struct io_stat *ios)
 	ios->mean.u.f = ios->S.u.f = 0;
 }
 
+static inline void reset_io_u_plat(uint64_t *io_u_plat)
+{
+	int i;
+
+	for (i = 0; i < FIO_IO_U_PLAT_NR; i++)
+		io_u_plat[i] = 0;
+}
+
+static inline void reset_clat_prio_stats(struct thread_stat *ts)
+{
+	enum fio_ddir ddir;
+	int i;
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+		if (!ts->clat_prio[ddir])
+			continue;
+
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			reset_io_stat(&ts->clat_prio[ddir][i].clat_stat);
+			reset_io_u_plat(ts->clat_prio[ddir][i].io_u_plat);
+		}
+	}
+}
+
 void reset_io_stats(struct thread_data *td)
 {
 	struct thread_stat *ts = &td->ts;
-	int i, j, k;
+	int i, j;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		reset_io_stat(&ts->clat_high_prio_stat[i]);
-		reset_io_stat(&ts->clat_low_prio_stat[i]);
 		reset_io_stat(&ts->clat_stat[i]);
 		reset_io_stat(&ts->slat_stat[i]);
 		reset_io_stat(&ts->lat_stat[i]);
@@ -2761,21 +3083,16 @@ void reset_io_stats(struct thread_data *td)
 		ts->total_io_u[i] = 0;
 		ts->short_io_u[i] = 0;
 		ts->drop_io_u[i] = 0;
-
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			ts->io_u_plat_high_prio[i][j] = 0;
-			ts->io_u_plat_low_prio[i][j] = 0;
-			if (!i)
-				ts->io_u_sync_plat[j] = 0;
-		}
 	}
 
 	for (i = 0; i < FIO_LAT_CNT; i++)
 		for (j = 0; j < DDIR_RWDIR_CNT; j++)
-			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
-				ts->io_u_plat[i][j][k] = 0;
+			reset_io_u_plat(ts->io_u_plat[i][j]);
+
+	reset_clat_prio_stats(ts);
 
 	ts->total_io_u[DDIR_SYNC] = 0;
+	reset_io_u_plat(ts->io_u_sync_plat);
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
 		ts->io_u_map[i] = 0;
@@ -2821,7 +3138,7 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 			     bool log_max)
 {
-	int ddir;
+	enum fio_ddir ddir;
 
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
 		__add_stat_to_log(iolog, ddir, elapsed, log_max);
@@ -2926,22 +3243,21 @@ static inline void add_lat_percentile_sample(struct thread_stat *ts,
 	ts->io_u_plat[lat][ddir][idx]++;
 }
 
-static inline void add_lat_percentile_prio_sample(struct thread_stat *ts,
-						  unsigned long long nsec,
-						  enum fio_ddir ddir,
-						  bool high_prio)
+static inline void
+add_lat_percentile_prio_sample(struct thread_stat *ts, unsigned long long nsec,
+			       enum fio_ddir ddir,
+			       unsigned short clat_prio_index)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 
-	if (!high_prio)
-		ts->io_u_plat_low_prio[ddir][idx]++;
-	else
-		ts->io_u_plat_high_prio[ddir][idx]++;
+	if (ts->clat_prio[ddir])
+		ts->clat_prio[ddir][clat_prio_index].io_u_plat[idx]++;
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long long nsec, unsigned long long bs,
-		     uint64_t offset, unsigned int ioprio, bool high_prio)
+		     uint64_t offset, unsigned int ioprio,
+		     unsigned short clat_prio_index)
 {
 	const bool needs_lock = td_async_processing(td);
 	unsigned long elapsed, this_window;
@@ -2954,7 +3270,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
 	/*
-	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * When lat_percentiles=1 (default 0), the reported per priority
 	 * percentiles and stats are used for describing total latency values,
 	 * even though the variable names themselves start with clat_.
 	 *
@@ -2962,12 +3278,9 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	 * lat_percentiles=0. add_lat_sample() will add the prio stat sample
 	 * when lat_percentiles=1.
 	 */
-	if (!ts->lat_percentiles) {
-		if (high_prio)
-			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
-		else
-			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
-	}
+	if (!ts->lat_percentiles)
+		add_stat_prio_sample(ts->clat_prio[ddir], clat_prio_index,
+				     nsec);
 
 	if (td->clat_log)
 		add_log_sample(td, td->clat_log, sample_val(nsec), ddir, bs,
@@ -2982,7 +3295,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		add_lat_percentile_sample(ts, nsec, ddir, FIO_CLAT);
 		if (!ts->lat_percentiles)
 			add_lat_percentile_prio_sample(ts, nsec, ddir,
-						       high_prio);
+						       clat_prio_index);
 	}
 
 	if (iolog && iolog->hist_msec) {
@@ -3055,7 +3368,8 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		    unsigned long long nsec, unsigned long long bs,
-		    uint64_t offset, unsigned int ioprio, bool high_prio)
+		    uint64_t offset, unsigned int ioprio,
+		    unsigned short clat_prio_index)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -3073,7 +3387,7 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 			       offset, ioprio);
 
 	/*
-	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * When lat_percentiles=1 (default 0), the reported per priority
 	 * percentiles and stats are used for describing total latency values,
 	 * even though the variable names themselves start with clat_.
 	 *
@@ -3084,12 +3398,9 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 	 */
 	if (ts->lat_percentiles) {
 		add_lat_percentile_sample(ts, nsec, ddir, FIO_LAT);
-		add_lat_percentile_prio_sample(ts, nsec, ddir, high_prio);
-		if (high_prio)
-			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
-		else
-			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
-
+		add_lat_percentile_prio_sample(ts, nsec, ddir, clat_prio_index);
+		add_stat_prio_sample(ts->clat_prio[ddir], clat_prio_index,
+				     nsec);
 	}
 	if (needs_lock)
 		__td_io_u_unlock(td);
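sum_thread_stats() now merges per-priority stats through sum_clat_prio_stats(). For each source entry that has samples, it looks up the destination entry with the same ioprio (find_clat_prio_index()). If no such entry exists, it appends one (grow_clat_prio_stat()). It then adds the io_stat and the latency histogram bucket by bucket. The sketch below reduces that find-or-append merge to plain C. It is hypothetical and simplified:

- `struct prio_stat`, `struct prio_set`, `PLAT_NR` and the helper names are illustrative only.
- A single sample counter stands in for struct io_stat.
- realloc() replaces the smalloc/scalloc/sfree allocator that fio uses so the summing process can access the arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket count; fio uses FIO_IO_U_PLAT_NR. */
#define PLAT_NR 8

/* Simplified stand-in for struct clat_prio_stat. */
struct prio_stat {
	uint32_t ioprio;
	uint64_t samples;		/* stands in for struct io_stat */
	uint64_t plat[PLAT_NR];		/* latency histogram buckets */
};

/* Destination array plus its length, like clat_prio[]/nr_clat_prio[]. */
struct prio_set {
	struct prio_stat *arr;
	int nr;
};

/* Linear search by ioprio, as find_clat_prio_index() does. */
static int find_prio(const struct prio_set *s, uint32_t ioprio)
{
	for (int i = 0; i < s->nr; i++)
		if (s->arr[i].ioprio == ioprio)
			return i;
	return -1;
}

/* Append one zeroed entry for ioprio; returns its index or -1. */
static int grow_prio(struct prio_set *s, uint32_t ioprio)
{
	struct prio_stat *n = realloc(s->arr, (size_t) (s->nr + 1) * sizeof(*n));

	if (!n)
		return -1;
	memset(&n[s->nr], 0, sizeof(*n));
	n[s->nr].ioprio = ioprio;
	s->arr = n;
	return s->nr++;
}

/* Fold one source entry into dst, creating the dst entry on first use. */
static int add_prio_samples(struct prio_set *dst, const struct prio_stat *src)
{
	int idx;

	assert(dst != NULL && src != NULL);

	/* Priorities that saw no I/O are skipped, as in fio. */
	if (!src->samples)
		return 0;

	idx = find_prio(dst, src->ioprio);
	if (idx < 0)
		idx = grow_prio(dst, src->ioprio);
	if (idx < 0)
		return 1;

	dst->arr[idx].samples += src->samples;
	for (int i = 0; i < PLAT_NR; i++)
		dst->arr[idx].plat[i] += src->plat[i];
	return 0;
}
```

Growing one entry at a time with a linear lookup is quadratic in the worst case. The number of distinct priorities in a job group is expected to be small, so this keeps the code simple at little cost.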
diff --git a/stat.h b/stat.h
index 15ca4eff..dce0bb0d 100644
--- a/stat.h
+++ b/stat.h
@@ -158,6 +158,12 @@ enum fio_lat {
 	FIO_LAT_CNT = 3,
 };
 
+struct clat_prio_stat {
+	uint64_t io_u_plat[FIO_IO_U_PLAT_NR];
+	struct io_stat clat_stat;
+	uint32_t ioprio;
+};
+
 struct thread_stat {
 	char name[FIO_JOBNAME_SIZE];
 	char verror[FIO_VERROR_SIZE];
@@ -168,6 +174,7 @@ struct thread_stat {
 	char description[FIO_JOBDESC_SIZE];
 	uint32_t members;
 	uint32_t unified_rw_rep;
+	uint32_t disable_prio_stat;
 
 	/*
 	 * bandwidth and latency stats
@@ -252,21 +259,40 @@ struct thread_stat {
 	fio_fp64_t ss_deviation;
 	fio_fp64_t ss_criterion;
 
-	uint64_t io_u_plat_high_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] __attribute__((aligned(8)));;
-	uint64_t io_u_plat_low_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
-	struct io_stat clat_high_prio_stat[DDIR_RWDIR_CNT] __attribute__((aligned(8)));
-	struct io_stat clat_low_prio_stat[DDIR_RWDIR_CNT];
+	/* A mirror of td->ioprio. */
+	uint32_t ioprio;
 
 	union {
 		uint64_t *ss_iops_data;
+		/*
+		 * For FIO_NET_CMD_TS, the pointed-to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t ss_iops_data_offset;
 		uint64_t pad4;
 	};
 
 	union {
 		uint64_t *ss_bw_data;
+		/*
+		 * For FIO_NET_CMD_TS, the pointed-to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t ss_bw_data_offset;
 		uint64_t pad5;
 	};
 
+	union {
+		struct clat_prio_stat *clat_prio[DDIR_RWDIR_CNT];
+		/*
+		 * For FIO_NET_CMD_TS, the pointed-to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t clat_prio_offset[DDIR_RWDIR_CNT];
+		uint64_t pad6;
+	};
+	uint32_t nr_clat_prio[DDIR_RWDIR_CNT];
+
 	uint64_t cachehit;
 	uint64_t cachemiss;
 } __attribute__((packed));
@@ -342,9 +368,9 @@ extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
 extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-			   unsigned long long, uint64_t, unsigned int, bool);
+			   unsigned long long, uint64_t, unsigned int, unsigned short);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-			    unsigned long long, uint64_t, unsigned int, bool);
+			    unsigned long long, uint64_t, unsigned int, unsigned short);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned long long, uint64_t, unsigned int);
 extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long);
@@ -355,6 +381,8 @@ extern void add_bw_sample(struct thread_data *, struct io_u *,
 extern void add_sync_clat_sample(struct thread_stat *ts,
 				unsigned long long nsec);
 extern int calc_log_samples(void);
+extern void free_clat_prio_stats(struct thread_stat *);
+extern int alloc_clat_prio_stat_ddir(struct thread_stat *, enum fio_ddir, int);
 
 extern void print_disk_util(struct disk_util_stat *, struct disk_util_agg *, int terse, struct buf_output *);
 extern void json_array_add_disk_util(struct disk_util_stat *dus,
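struct clat_prio_stat stores the raw ioprio value. The text output in stat.c decodes it in two parts:

- `ioprio >> 13` gives the priority class.
- `ioprio & 7` gives the level within that class.

It then prints labels such as "clat prio 2/4". This follows the Linux encoding, where the class (1 = RT, 2 = BE, 3 = IDLE) is stored from bit 13 upward. The sketch below shows matching encode/decode helpers. `PRIO_CLASS_SHIFT`, `prio_value()`, `prio_class()`, `prio_level()` and `prio_label()` are hypothetical names chosen for illustration, not fio or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Linux places the I/O priority class at bit 13 and above. */
#define PRIO_CLASS_SHIFT 13

/* Pack a class (e.g. 2 = best-effort) and a level 0..7. */
static uint32_t prio_value(uint32_t class, uint32_t level)
{
	assert(level <= 7);
	return (class << PRIO_CLASS_SHIFT) | level;
}

/* Same decoding as stat.c: ioprio >> 13 */
static uint32_t prio_class(uint32_t ioprio)
{
	return ioprio >> PRIO_CLASS_SHIFT;
}

/* Same decoding as stat.c: ioprio & 7 */
static uint32_t prio_level(uint32_t ioprio)
{
	return ioprio & 7;
}

/* Build a label in the "%s prio %u/%u" form used by show_ddir_status(). */
static int prio_label(char *buf, size_t len, const char *type, uint32_t ioprio)
{
	return snprintf(buf, len, "%s prio %u/%u", type,
			(unsigned int) prio_class(ioprio),
			(unsigned int) prio_level(ioprio));
}
```

These labels only appear when get_nr_prios_with_samples() finds at least two priorities with samples. A job that issues all of its I/O at a single priority therefore prints no per-priority lines.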
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index cc437426..9e37d9fe 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -80,6 +80,7 @@ import time
 import argparse
 import platform
 import subprocess
+from collections import Counter
 from pathlib import Path
 
 
@@ -125,7 +126,8 @@ class FioLatTest():
             "--output-format={output-format}".format(**self.test_options),
         ]
         for opt in ['slat_percentiles', 'clat_percentiles', 'lat_percentiles',
-                    'unified_rw_reporting', 'fsync', 'fdatasync', 'numjobs', 'cmdprio_percentage']:
+                    'unified_rw_reporting', 'fsync', 'fdatasync', 'numjobs',
+                    'cmdprio_percentage', 'bssplit', 'cmdprio_bssplit']:
             if opt in self.test_options:
                 option = '--{0}={{{0}}}'.format(opt)
                 fio_args.append(option.format(**self.test_options))
@@ -363,20 +365,19 @@ class FioLatTest():
 
     def check_nocmdprio_lat(self, job):
         """
-        Make sure no high/low priority latencies appear.
+        Make sure no per priority latencies appear.
 
         job         JSON object to check
         """
 
         for ddir in ['read', 'write', 'trim']:
             if ddir in job:
-                if 'lat_high_prio' in job[ddir] or 'lat_low_prio' in job[ddir] or \
-                    'clat_high_prio' in job[ddir] or 'clat_low_prio' in job[ddir]:
-                    print("Unexpected high/low priority latencies found in %s output" % ddir)
+                if 'prios' in job[ddir]:
+                    print("Unexpected per priority latencies found in %s output" % ddir)
                     return False
 
         if self.debug:
-            print("No high/low priority latencies found")
+            print("No per priority latencies found")
 
         return True
 
@@ -497,7 +498,7 @@ class FioLatTest():
         return retval
 
     def check_prio_latencies(self, jsondata, clat=True, plus=False):
-        """Check consistency of high/low priority latencies.
+        """Check consistency of per priority latencies.
 
         clat                True if we should check clat data; other check lat data
         plus                True if we have json+ format data where additional checks can
@@ -506,78 +507,78 @@ class FioLatTest():
         """
 
         if clat:
-            high = 'clat_high_prio'
-            low = 'clat_low_prio'
-            combined = 'clat_ns'
+            obj = combined = 'clat_ns'
         else:
-            high = 'lat_high_prio'
-            low = 'lat_low_prio'
-            combined = 'lat_ns'
+            obj = combined = 'lat_ns'
 
-        if not high in jsondata or not low in jsondata or not combined in jsondata:
-            print("Error identifying high/low priority latencies")
+        if not 'prios' in jsondata or not combined in jsondata:
+            print("Error identifying per priority latencies")
             return False
 
-        if jsondata[high]['N'] + jsondata[low]['N'] != jsondata[combined]['N']:
-            print("High %d + low %d != combined sample size %d" % \
-                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+        sum_sample_size = sum([x[obj]['N'] for x in jsondata['prios']])
+        if sum_sample_size != jsondata[combined]['N']:
+            print("Per prio sample size sum %d != combined sample size %d" %
+                  (sum_sample_size, jsondata[combined]['N']))
             return False
         elif self.debug:
-            print("High %d + low %d == combined sample size %d" % \
-                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+            print("Per prio sample size sum %d == combined sample size %d" %
+                  (sum_sample_size, jsondata[combined]['N']))
 
-        if min(jsondata[high]['min'], jsondata[low]['min']) != jsondata[combined]['min']:
-            print("Min of high %d, low %d min latencies does not match min %d from combined data" % \
-                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+        min_val = min([x[obj]['min'] for x in jsondata['prios']])
+        if min_val != jsondata[combined]['min']:
+            print("Min per prio min latency %d does not match min %d from combined data" %
+                  (min_val, jsondata[combined]['min']))
             return False
         elif self.debug:
-            print("Min of high %d, low %d min latencies matches min %d from combined data" % \
-                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+            print("Min per prio min latency %d matches min %d from combined data" %
+                  (min_val, jsondata[combined]['min']))
 
-        if max(jsondata[high]['max'], jsondata[low]['max']) != jsondata[combined]['max']:
-            print("Max of high %d, low %d max latencies does not match max %d from combined data" % \
-                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+        max_val = max([x[obj]['max'] for x in jsondata['prios']])
+        if max_val != jsondata[combined]['max']:
+            print("Max per prio max latency %d does not match max %d from combined data" %
+                  (max_val, jsondata[combined]['max']))
             return False
         elif self.debug:
-            print("Max of high %d, low %d max latencies matches max %d from combined data" % \
-                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+            print("Max per prio max latency %d matches max %d from combined data" %
+                  (max_val, jsondata[combined]['max']))
 
-        weighted_avg = (jsondata[high]['mean'] * jsondata[high]['N'] + \
-                        jsondata[low]['mean'] * jsondata[low]['N']) / jsondata[combined]['N']
+        weighted_vals = [x[obj]['mean'] * x[obj]['N'] for x in jsondata['prios']]
+        weighted_avg = sum(weighted_vals) / jsondata[combined]['N']
         delta = abs(weighted_avg - jsondata[combined]['mean'])
         if (delta / jsondata[combined]['mean']) > 0.0001:
-            print("Difference between weighted average %f of high, low means "
+            print("Difference between merged per prio weighted average %f mean "
                   "and actual mean %f exceeds 0.01%%" % (weighted_avg, jsondata[combined]['mean']))
             return False
         elif self.debug:
-            print("Weighted average %f of high, low means matches actual mean %f" % \
-                    (weighted_avg, jsondata[combined]['mean']))
+            print("Merged per prio weighted average %f mean matches actual mean %f" %
+                  (weighted_avg, jsondata[combined]['mean']))
 
         if plus:
-            if not self.check_jsonplus(jsondata[high]):
-                return False
-            if not self.check_jsonplus(jsondata[low]):
-                return False
+            for prio in jsondata['prios']:
+                if not self.check_jsonplus(prio[obj]):
+                    return False
 
-            bins = {**jsondata[high]['bins'], **jsondata[low]['bins']}
-            for duration in bins.keys():
-                if duration in jsondata[high]['bins'] and duration in jsondata[low]['bins']:
-                    bins[duration] = jsondata[high]['bins'][duration] + \
-                            jsondata[low]['bins'][duration]
+            counter = Counter()
+            for prio in jsondata['prios']:
+                counter.update(prio[obj]['bins'])
+
+            bins = dict(counter)
 
             if len(bins) != len(jsondata[combined]['bins']):
-                print("Number of combined high/low bins does not match number of overall bins")
+                print("Number of merged bins %d does not match number of overall bins %d" %
+                      (len(bins), len(jsondata[combined]['bins'])))
                 return False
             elif self.debug:
-                print("Number of bins from merged high/low data matches number of overall bins")
+                print("Number of merged bins %d matches number of overall bins %d" %
+                      (len(bins), len(jsondata[combined]['bins'])))
 
             for duration in bins.keys():
                 if bins[duration] != jsondata[combined]['bins'][duration]:
-                    print("Merged high/low count does not match overall count for duration %d" \
-                            % duration)
+                    print("Merged per prio count does not match overall count for duration %d" %
+                          duration)
                     return False
 
-        print("Merged high/low priority latency data match combined latency data")
+        print("Merged per priority latency data match combined latency data")
         return True
 
     def check(self):
@@ -602,7 +603,7 @@ class Test001(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False)
@@ -626,7 +627,7 @@ class Test002(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, clat=False)
@@ -650,7 +651,7 @@ class Test003(FioLatTest):
             print("Unexpected write data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['trim'], 2, slat=False, tlat=False)
@@ -674,7 +675,7 @@ class Test004(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, plus=True)
@@ -698,7 +699,7 @@ class Test005(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
@@ -722,7 +723,7 @@ class Test006(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
@@ -743,7 +744,7 @@ class Test007(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, clat=False, tlat=False, plus=True)
@@ -761,11 +762,11 @@ class Test008(FioLatTest):
         job = self.json_data['jobs'][0]
 
         retval = True
-        if 'read' in job or 'write'in job or 'trim' in job:
+        if 'read' in job or 'write' in job or 'trim' in job:
             print("Unexpected data direction found in fio output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['mixed'], 0, plus=True, unified=True)
@@ -792,7 +793,7 @@ class Test009(FioLatTest):
             print("Error checking fsync latency data")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
@@ -813,7 +814,7 @@ class Test010(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, plus=True)
@@ -839,7 +840,7 @@ class Test011(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False, clat=False, plus=True)
@@ -953,7 +954,7 @@ class Test019(FioLatTest):
         job = self.json_data['jobs'][0]
 
         retval = True
-        if 'read' in job or 'write'in job or 'trim' in job:
+        if 'read' in job or 'write' in job or 'trim' in job:
             print("Unexpected data direction found in fio output")
             retval = False
 
@@ -963,6 +964,27 @@ class Test019(FioLatTest):
         return retval
 
 
+class Test021(FioLatTest):
+    """Test object for Test 21."""
+
+    def check(self):
+        """Check Test 21 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
+        retval &= self.check_latencies(job['write'], 1, slat=False, tlat=False, plus=True)
+        retval &= self.check_prio_latencies(job['read'], clat=True, plus=True)
+        retval &= self.check_prio_latencies(job['write'], clat=True, plus=True)
+
+        return retval
+
+
 def parse_args():
     """Parse command-line arguments."""
 
@@ -1007,7 +1029,7 @@ def main():
             # randread, null
             # enable slat, clat, lat
             # only clat and lat will appear because
-            # because the null ioengine is syncrhonous
+            # because the null ioengine is synchronous
             "test_id": 1,
             "runtime": 2,
             "output-format": "json",
@@ -1047,7 +1069,7 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
+            # all will appear because libaio is asynchronous
             "test_id": 4,
             "runtime": 5,
             "output-format": "json+",
@@ -1153,9 +1175,9 @@ def main():
             # randread, null
             # enable slat, clat, lat
             # only clat and lat will appear because
-            # because the null ioengine is syncrhonous
-            # same as Test 1 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # because the null ioengine is synchronous
+            # same as Test 1 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 12,
             "runtime": 2,
             "output-format": "json",
@@ -1170,9 +1192,9 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
-            # same as Test 4 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # all will appear because libaio is asynchronous
+            # same as Test 4 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 13,
             "runtime": 5,
             "output-format": "json+",
@@ -1187,8 +1209,8 @@ def main():
         {
             # 50/50 r/w, aio, unified_rw_reporting
             # enable slat, clat, lata
-            # same as Test 8 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # same as Test 8 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 14,
             "runtime": 5,
             "output-format": "json+",
@@ -1204,7 +1226,7 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
+            # all will appear because libaio is asynchronous
             # same as Test 4 except add cmdprio_percentage
             "test_id": 15,
             "runtime": 5,
@@ -1278,8 +1300,8 @@ def main():
         {
             # 50/50 r/w, aio, unified_rw_reporting
             # enable slat, clat, lat
-            # same as Test 19 except
-            # add numjobs = 4 to test sum_thread_stats() changes
+            # same as Test 19 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 20,
             "runtime": 5,
             "output-format": "json+",
@@ -1293,6 +1315,40 @@ def main():
             'numjobs': 4,
             "test_obj": Test019,
         },
+        {
+            # r/w, aio
+            # enable only clat
+            # test bssplit and cmdprio_bssplit
+            "test_id": 21,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'bssplit': '64k/40:1024k/60',
+            'cmdprio_bssplit': '64k/25/1/1:64k/75/3/2:1024k/0',
+            "test_obj": Test021,
+        },
+        {
+            # r/w, aio
+            # enable only clat
+            # same as Test 21 except add numjobs = 4 to test
+            # sum_thread_stats() changes
+            "test_id": 22,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'bssplit': '64k/40:1024k/60',
+            'cmdprio_bssplit': '64k/25/1/1:64k/75/3/2:1024k/0',
+            'numjobs': 4,
+            "test_obj": Test021,
+        },
     ]
 
     passed = 0
@@ -1304,9 +1360,10 @@ def main():
            (args.run_only and test['test_id'] not in args.run_only):
             skipped = skipped + 1
             outcome = 'SKIPPED (User request)'
-        elif (platform.system() != 'Linux' or os.geteuid() != 0) and 'cmdprio_percentage' in test:
+        elif (platform.system() != 'Linux' or os.geteuid() != 0) and \
+             ('cmdprio_percentage' in test or 'cmdprio_bssplit' in test):
             skipped = skipped + 1
-            outcome = 'SKIPPED (Linux root required for cmdprio_percentage tests)'
+            outcome = 'SKIPPED (Linux root required for cmdprio tests)'
         else:
             test_obj = test['test_obj'](artifact_root, test, args.debug)
             status = test_obj.run_fio(fio)
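The reworked check_prio_latencies() no longer assumes exactly two buckets (high/low). Instead it checks that every entry under 'prios' recombines into the overall clat_ns/lat_ns statistics: sample counts add up, the min and max are the extremes, the mean is the N-weighted average, and json+ bins sum per duration. The sketch below shows those invariants as standalone helpers. It assumes the JSON field names used by the test ('prios', 'N', 'min', 'max', 'mean', 'bins'). The helper names and sample numbers are hypothetical and are not part of fio.

```python
from collections import Counter

def merge_prio_stats(prios, obj="clat_ns"):
    """Recombine per-priority latency stats into one summary.

    Hypothetical helper mirroring the checks in check_prio_latencies():
    sample counts add, min/max take the extremes, the mean is the
    N-weighted average, and json+ histogram bins are summed per duration.
    """
    stats = [p[obj] for p in prios]
    total_n = sum(s["N"] for s in stats)
    merged = {
        "N": total_n,
        "min": min(s["min"] for s in stats),
        "max": max(s["max"] for s in stats),
        "mean": sum(s["mean"] * s["N"] for s in stats) / total_n,
    }
    if all("bins" in s for s in stats):
        bins = Counter()
        for s in stats:
            bins.update(s["bins"])  # Counter.update() adds counts for shared keys
        merged["bins"] = dict(bins)
    return merged

def matches_combined(merged, combined, rel_tol=0.0001):
    """Compare against combined stats, allowing the test's 0.01% slack on the mean."""
    if merged["N"] != combined["N"]:
        return False
    if merged["min"] != combined["min"] or merged["max"] != combined["max"]:
        return False
    if abs(merged["mean"] - combined["mean"]) / combined["mean"] > rel_tol:
        return False
    if "bins" in merged and merged["bins"] != combined.get("bins"):
        return False
    return True

# Hypothetical job with two priority levels
prios = [
    {"clat_ns": {"N": 3, "min": 100, "max": 300, "mean": 200.0,
                 "bins": {"100": 1, "200": 1, "300": 1}}},
    {"clat_ns": {"N": 1, "min": 50, "max": 50, "mean": 50.0,
                 "bins": {"50": 1}}},
]
combined = {"N": 4, "min": 50, "max": 300, "mean": 162.5,
            "bins": {"50": 1, "100": 1, "200": 1, "300": 1}}
print(matches_combined(merge_prio_stats(prios), combined))
```

Using collections.Counter for the bins means any number of priority levels can be merged the same way, which is the point of replacing the old two-way dictionary merge.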
diff --git a/thread_options.h b/thread_options.h
index 8f4c8a59..4162c42f 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -50,6 +50,12 @@ struct split {
 	unsigned long long val2[ZONESPLIT_MAX];
 };
 
+struct split_prio {
+	uint64_t bs;
+	int32_t prio;
+	uint32_t perc;
+};
+
 struct bssplit {
 	uint64_t bs;
 	uint32_t perc;
@@ -166,6 +172,8 @@ struct thread_options {
 	unsigned int log_gz;
 	unsigned int log_gz_store;
 	unsigned int log_unix_epoch;
+	unsigned int log_alternate_epoch;
+	unsigned int log_alternate_epoch_clock_id;
 	unsigned int norandommap;
 	unsigned int softrandommap;
 	unsigned int bs_unaligned;
@@ -482,6 +490,8 @@ struct thread_options_pack {
 	uint32_t log_gz;
 	uint32_t log_gz_store;
 	uint32_t log_unix_epoch;
+	uint32_t log_alternate_epoch;
+	uint32_t log_alternate_epoch_clock_id;
 	uint32_t norandommap;
 	uint32_t softrandommap;
 	uint32_t bs_unaligned;
@@ -702,4 +712,8 @@ extern int str_split_parse(struct thread_data *td, char *str,
 extern int split_parse_ddir(struct thread_options *o, struct split *split,
 			    char *str, bool absolute, unsigned int max_splits);
 
+extern int split_parse_prio_ddir(struct thread_options *o,
+				 struct split_prio **entries, int *nr_entries,
+				 char *str);
+
 #endif
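Test 21 passes cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0, which split_parse_prio_ddir() turns into an array of struct split_prio { bs, prio, perc }. The Python sketch below is a hypothetical re-implementation for illustration only, not fio's parser. It assumes that each ':'-separated entry is blocksize/percentage[/class/level], that size suffixes are 1024-based, and that a priority is packed the way Linux packs ioprio values, (class << 13) | level. fio falls back to other options for entries that omit class and level, so this sketch leaves prio as None in that case rather than guessing.

```python
IOPRIO_CLASS_SHIFT = 13  # Linux ioprio packing: (class << 13) | level

_SUFFIX = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_size(text):
    """Parse sizes such as '64k' or '1024k' (1024-based suffixes assumed)."""
    text = text.strip().lower()
    num = text.rstrip("kmg")
    return int(num) * _SUFFIX[text[len(num):]]

def parse_cmdprio_bssplit(spec):
    """Hypothetical parser: 'bs/perc[/class/level]:...' -> [(bs, prio, perc), ...]."""
    entries = []
    for item in spec.split(":"):
        fields = item.split("/")
        if len(fields) not in (2, 4):
            raise ValueError(f"bad entry: {item!r}")
        bs = parse_size(fields[0])
        perc = int(fields[1])
        if not 0 <= perc <= 100:
            raise ValueError(f"percentage out of range: {item!r}")
        prio = None  # no per-entry class/level given
        if len(fields) == 4:
            prio_class, level = int(fields[2]), int(fields[3])
            prio = (prio_class << IOPRIO_CLASS_SHIFT) | level
        entries.append((bs, prio, perc))
    return entries

for bs, prio, perc in parse_cmdprio_bssplit("64k/25/1/1:64k/75/3/2:1024k/0"):
    print(bs, prio, perc)
```

For the Test 21 string this yields two 64 KiB entries (25% at class 1/level 1 and 75% at class 3/level 2) and a 1 MiB entry at 0%. In other words, every 64k I/O carries some priority while the 1024k I/Os from bssplit=64k/40:1024k/60 carry none, so the per-priority output has more than two buckets to merge.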
diff --git a/time.c b/time.c
index cd0e2a89..5c4d6de0 100644
--- a/time.c
+++ b/time.c
@@ -172,14 +172,14 @@ void set_genesis_time(void)
 	fio_gettime(&genesis, NULL);
 }
 
-void set_epoch_time(struct thread_data *td, int log_unix_epoch)
+void set_epoch_time(struct thread_data *td, int log_alternate_epoch, clockid_t clock_id)
 {
 	fio_gettime(&td->epoch, NULL);
-	if (log_unix_epoch) {
-		struct timeval tv;
-		gettimeofday(&tv, NULL);
-		td->unix_epoch = (unsigned long long)(tv.tv_sec) * 1000 +
-		                 (unsigned long long)(tv.tv_usec) / 1000;
+	if (log_alternate_epoch) {
+		struct timespec ts;
+		clock_gettime(clock_id, &ts);
+		td->alternate_epoch = (unsigned long long)(ts.tv_sec) * 1000 +
+		                 (unsigned long long)(ts.tv_nsec) / 1000000;
 	}
 }
 

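set_epoch_time() now records td->alternate_epoch in milliseconds from an arbitrary clock_id via clock_gettime(), rather than from gettimeofday() alone. The conversion is tv_sec * 1000 + tv_nsec / 1000000. The sketch below performs the same integer arithmetic in Python using time.clock_gettime_ns(), which is Unix-only and requires Python 3.7+. The function names are hypothetical, and CLOCK_MONOTONIC is only an example of a non-wall-clock time base.

```python
import time

def timespec_to_ms(tv_sec, tv_nsec):
    """Same integer arithmetic as set_epoch_time(): whole ms from sec + nsec."""
    return tv_sec * 1000 + tv_nsec // 1_000_000

def alternate_epoch_ms(clock_id=time.CLOCK_REALTIME):
    """Read clock_id once and return its value in milliseconds (Unix only)."""
    ns = time.clock_gettime_ns(clock_id)
    sec, nsec = divmod(ns, 1_000_000_000)  # split into a timespec-like pair
    return timespec_to_ms(sec, nsec)

# A CLOCK_REALTIME reading gives the same wall-clock value the old
# gettimeofday() path computed; another clock gives a different time base.
print(alternate_epoch_ms(time.CLOCK_REALTIME))
print(alternate_epoch_ms(time.CLOCK_MONOTONIC))
```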

* Recent changes (master)
@ 2022-01-29 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-29 13:00 UTC
  To: fio

The following changes since commit 2b3d4a6a924e0aa82654d3b96fb134085af7a98a:

  fio: use LDFLAGS when linking dynamic engines (2022-01-26 13:12:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 52a0b9ed71c3e929461e64b39059281948107071:

  Merge branch 'patch-1' of https://github.com/Nikratio/fio (2022-01-28 14:50:51 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'docs' of https://github.com/vincentkfu/fio
      Merge branch 'patch-1' of https://github.com/Nikratio/fio

Nikolaus Rath (1):
      I/O size: fix description of filesize

Vincent Fu (4):
      Revert "Update README to markdown format"
      docs: rename README to README.rst
      docs: update fio docs to pull from README.rst
      Makefile: build t/fio-dedupe only if zlib support is found

 HOWTO                   | 11 +++----
 Makefile                |  4 +++
 README.md => README.rst | 78 +++++++++++++++++++++++++------------------------
 doc/fio_doc.rst         |  2 +-
 doc/fio_man.rst         |  2 +-
 fio.1                   |  8 ++---
 6 files changed, 56 insertions(+), 49 deletions(-)
 rename README.md => README.rst (94%)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f9e7c857..c72ec8cd 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1886,11 +1886,12 @@ I/O size
 
 .. option:: filesize=irange(int)
 
-	Individual file sizes. May be a range, in which case fio will select sizes
-	for files at random within the given range and limited to :option:`size` in
-	total (if that is given). If not given, each created file is the same size.
-	This option overrides :option:`size` in terms of file size, which means
-	this value is used as a fixed size or possible range of each file.
+	Individual file sizes. May be a range, in which case fio will select sizes for
+	files at random within the given range. If not given, each created file is the
+	same size. This option overrides :option:`size` in terms of file size, i.e. if
+	:option:`filesize` is specified then :option:`size` becomes merely the default
+	for :option:`io_size` and has no effect at all if :option:`io_size` is set
+	explicitly.
 
 .. option:: file_append=bool
 
diff --git a/Makefile b/Makefile
index 00e79539..2432f519 100644
--- a/Makefile
+++ b/Makefile
@@ -430,7 +430,9 @@ T_TEST_PROGS += $(T_AXMAP_PROGS)
 T_TEST_PROGS += $(T_LFSR_TEST_PROGS)
 T_TEST_PROGS += $(T_GEN_RAND_PROGS)
 T_PROGS += $(T_BTRACE_FIO_PROGS)
+ifdef CONFIG_ZLIB
 T_PROGS += $(T_DEDUPE_PROGS)
+endif
 T_PROGS += $(T_VS_PROGS)
 T_TEST_PROGS += $(T_MEMLOCK_PROGS)
 ifdef CONFIG_PREAD
@@ -618,8 +620,10 @@ t/fio-btrace2fio: $(T_BTRACE_FIO_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_BTRACE_FIO_OBJS) $(LIBS)
 endif
 
+ifdef CONFIG_ZLIB
 t/fio-dedupe: $(T_DEDUPE_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_DEDUPE_OBJS) $(LIBS)
+endif
 
 t/fio-verify-state: $(T_VS_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_VS_OBJS) $(LIBS)
diff --git a/README.md b/README.rst
similarity index 94%
rename from README.md
rename to README.rst
index b10b1688..d566fae3 100644
--- a/README.md
+++ b/README.rst
@@ -1,5 +1,5 @@
-# Fio README
-## Overview and history
+Overview and history
+--------------------
 
 Fio was originally written to save me the hassle of writing special test case
 programs when I wanted to test a specific workload, either for performance
@@ -22,13 +22,14 @@ that setting is given.  The typical use of fio is to write a job file matching
 the I/O load one wants to simulate.
 
 
-## Source
+Source
+------
 
 Fio resides in a git repo, the canonical place is:
 
 	git://git.kernel.dk/fio.git
 
-When inside a corporate firewall, `git://` URL sometimes does not work.
+When inside a corporate firewall, git:// URL sometimes does not work.
 If git:// does not work, use the http protocol instead:
 
 	http://git.kernel.dk/fio.git
@@ -54,8 +55,8 @@ or
 	https://github.com/axboe/fio.git
 
 
-## Mailing list
-
+Mailing list
+------------
 
 The fio project mailing list is meant for anything related to fio including
 general discussion, bug reporting, questions, and development. For bug reporting,
@@ -80,8 +81,8 @@ and archives for the old list can be found here:
 	http://maillist.kernel.dk/fio-devel/
 
 
-## Author
-
+Author
+------
 
 Fio was written by Jens Axboe <axboe@kernel.dk> to enable flexible testing of
 the Linux I/O subsystem and schedulers. He got tired of writing specific test
@@ -91,55 +92,56 @@ benchmark/test tools out there weren't flexible enough to do what he wanted.
 Jens Axboe <axboe@kernel.dk> 20060905
 
 
-## Binary packages
+Binary packages
+---------------
 
-**Debian:**
+Debian:
 	Starting with Debian "Squeeze", fio packages are part of the official
 	Debian repository. http://packages.debian.org/search?keywords=fio .
 
-**Ubuntu:**
+Ubuntu:
 	Starting with Ubuntu 10.04 LTS (aka "Lucid Lynx"), fio packages are part
 	of the Ubuntu "universe" repository.
 	http://packages.ubuntu.com/search?keywords=fio .
 
-**Red Hat, Fedora, CentOS & Co:**
+Red Hat, Fedora, CentOS & Co:
 	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
 	packages are part of the Fedora/EPEL repositories.
 	https://apps.fedoraproject.org/packages/fio .
 
-**Mandriva:**
+Mandriva:
 	Mandriva has integrated fio into their package repository, so installing
 	on that distro should be as easy as typing ``urpmi fio``.
 
-**Arch Linux:**
+Arch Linux:
         An Arch Linux package is provided under the Community sub-repository:
         https://www.archlinux.org/packages/?sort=&q=fio
 
-**Solaris:**
+Solaris:
 	Packages for Solaris are available from OpenCSW. Install their pkgutil
 	tool (http://www.opencsw.org/get-it/pkgutil/) and then install fio via
 	``pkgutil -i fio``.
 
-**Windows:**
+Windows:
 	Rebecca Cran <rebecca@bsdio.com> has fio packages for Windows at
 	https://bsdio.com/fio/ . The latest builds for Windows can also
 	be grabbed from https://ci.appveyor.com/project/axboe/fio by clicking
 	the latest x86 or x64 build, then selecting the ARTIFACTS tab.
 
-**BSDs:**
+BSDs:
 	Packages for BSDs may be available from their binary package repositories.
 	Look for a package "fio" using their binary package managers.
 
 
-## Building
-
+Building
+--------
 
 Just type::
-```
-./configure
-make
-make install
-```
+
+ $ ./configure
+ $ make
+ $ make install
+
 Note that GNU make is required. On BSDs it's available from devel/gmake within
 ports directory; on Solaris it's in the SUNWgmake package.  On platforms where
 GNU make isn't the default, type ``gmake`` instead of ``make``.
@@ -153,18 +155,18 @@ to be installed.  gfio isn't built automatically and can be enabled with a
 ``--enable-gfio`` option to configure.
 
 To build fio with a cross-compiler::
-```
-make clean
-make CROSS_COMPILE=/path/to/toolchain/prefix
-```
+
+ $ make clean
+ $ make CROSS_COMPILE=/path/to/toolchain/prefix
+
 Configure will attempt to determine the target platform automatically.
 
 It's possible to build fio for ESX as well, use the ``--esx`` switch to
 configure.
 
 
-## Windows
-
+Windows
+~~~~~~~
 
 The minimum versions of Windows for building/runing fio are Windows 7/Windows
 Server 2008 R2. On Windows, Cygwin (https://www.cygwin.com/) is required in
@@ -172,7 +174,7 @@ order to build fio. To create an MSI installer package install WiX from
 https://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
 directory.
 
-### How to compile fio on 64-bit Windows:
+How to compile fio on 64-bit Windows:
 
  1. Install Cygwin (http://www.cygwin.com/). Install **make** and all
     packages starting with **mingw64-x86_64**. Ensure
@@ -194,21 +196,21 @@ https://github.com/mintty/mintty/wiki/Tips#inputoutput-interaction-with-alien-pr
 for details).
 
 
-## Documentation
-
+Documentation
+~~~~~~~~~~~~~
 
 Fio uses Sphinx_ to generate documentation from the reStructuredText_ files.
 To build HTML formatted documentation run ``make -C doc html`` and direct your
 browser to :file:`./doc/output/html/index.html`.  To build manual page run
 ``make -C doc man`` and then ``man doc/output/man/fio.1``.  To see what other
 output formats are supported run ``make -C doc help``.
-```
+
 .. _reStructuredText: http://www.sphinx-doc.org/rest.html
 .. _Sphinx: http://www.sphinx-doc.org
-```
 
-## Platforms
 
+Platforms
+---------
 
 Fio works on (at least) Linux, Solaris, AIX, HP-UX, OSX, NetBSD, OpenBSD,
 Windows, FreeBSD, and DragonFly. Some features and/or options may only be
@@ -250,8 +252,8 @@ POSIX aio should work now. To make the change permanent::
         posix_aio0 changed
 
 
-## Running fio
-
+Running fio
+-----------
 
 Running fio is normally the easiest part - you just give it the job file
 (or job files) as parameters::
diff --git a/doc/fio_doc.rst b/doc/fio_doc.rst
index b5987b52..8e1216f0 100644
--- a/doc/fio_doc.rst
+++ b/doc/fio_doc.rst
@@ -2,7 +2,7 @@ fio - Flexible I/O tester rev. |version|
 ========================================
 
 
-.. include:: ../README
+.. include:: ../README.rst
 
 
 .. include:: ../HOWTO
diff --git a/doc/fio_man.rst b/doc/fio_man.rst
index c6a6438f..44312f16 100644
--- a/doc/fio_man.rst
+++ b/doc/fio_man.rst
@@ -6,7 +6,7 @@ Fio Manpage
 (rev. |release|)
 
 
-.. include:: ../README
+.. include:: ../README.rst
 
 
 .. include:: ../HOWTO
diff --git a/fio.1 b/fio.1
index 34aa874d..b87d2309 100644
--- a/fio.1
+++ b/fio.1
@@ -1686,10 +1686,10 @@ also be set as number of zones using 'z'.
 .TP
 .BI filesize \fR=\fPirange(int)
 Individual file sizes. May be a range, in which case fio will select sizes
-for files at random within the given range and limited to \fBsize\fR in
-total (if that is given). If not given, each created file is the same size.
-This option overrides \fBsize\fR in terms of file size, which means
-this value is used as a fixed size or possible range of each file.
+for files at random within the given range. If not given, each created file
+is the same size. This option overrides \fBsize\fR in terms of file size,
+i.e. \fBsize\fR becomes merely the default for \fBio_size\fR (and
+has no effect at all if \fBio_size\fR is set explicitly).
 .TP
 .BI file_append \fR=\fPbool
 Perform I/O after the end of the file. Normally fio will operate within the

* Recent changes (master)
@ 2022-01-27 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-27 13:00 UTC
  To: fio

The following changes since commit 2f0b54419a6ab039c677e41008391b8c53ae2e6b:

  Merge branch 'master' of https://github.com/ben-ihelputech/fio (2022-01-21 10:46:26 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b3d4a6a924e0aa82654d3b96fb134085af7a98a:

  fio: use LDFLAGS when linking dynamic engines (2022-01-26 13:12:14 -0700)

----------------------------------------------------------------
Eric Sandeen (2):
      t/io_uring: link with libaio when necessary
      fio: use LDFLAGS when linking dynamic engines

Jens Axboe (1):
      Merge branch 'rpma-add-support-for-File-System-DAX' of https://github.com/ldorau/fio

Lukasz Dorau (1):
      rpma: RPMA engine requires librpma>=v0.10.0 with rpma_mr_advise()

Wang, Long (1):
      rpma: add support for File System DAX

 Makefile              |  3 ++-
 configure             |  9 ++++-----
 engines/librpma_fio.c | 44 +++++++++++++++++++++++++++++++++-----------
 engines/librpma_fio.h |  2 +-
 4 files changed, 40 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5d17bcab..00e79539 100644
--- a/Makefile
+++ b/Makefile
@@ -99,6 +99,7 @@ endif
 ifdef CONFIG_LIBAIO
   libaio_SRCS = engines/libaio.c
   cmdprio_SRCS = engines/cmdprio.c
+  LIBS += -laio
   libaio_LIBS = -laio
   ENGINES += libaio
 endif
@@ -294,7 +295,7 @@ define engine_template =
 $(1)_OBJS := $$($(1)_SRCS:.c=.o)
 $$($(1)_OBJS): CFLAGS := -fPIC $$($(1)_CFLAGS) $(CFLAGS)
 engines/fio-$(1).so: $$($(1)_OBJS)
-	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 -o $$@ $$< $$($(1)_LIBS)
+	$$(QUIET_LINK)$(CC) $(DYNAMIC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 -o $$@ $$< $$($(1)_LIBS)
 ENGS_OBJS += engines/fio-$(1).so
 endef
 else # !CONFIG_DYNAMIC_ENGINES
diff --git a/configure b/configure
index 84ccce04..0efde7d6 100755
--- a/configure
+++ b/configure
@@ -955,17 +955,16 @@ print_config "rdmacm" "$rdmacm"
 
 ##########################################
 # librpma probe
+# The librpma engine requires librpma>=v0.10.0 with rpma_mr_advise().
 if test "$librpma" != "yes" ; then
   librpma="no"
 fi
 cat > $TMPC << EOF
-#include <stdio.h>
 #include <librpma.h>
-int main(int argc, char **argv)
+int main(void)
 {
-  enum rpma_conn_event event = RPMA_CONN_REJECTED;
-  (void) event; /* unused */
-  rpma_log_set_threshold(RPMA_LOG_THRESHOLD, RPMA_LOG_LEVEL_INFO);
+  void *ptr = rpma_mr_advise;
+  (void) ptr; /* unused */
   return 0;
 }
 EOF
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index 3d605ed6..9d6ebf38 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -108,7 +108,7 @@ char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
 	return mem_ptr;
 }
 
-char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
+char *librpma_fio_allocate_pmem(struct thread_data *td, struct fio_file *f,
 		size_t size, struct librpma_fio_mem *mem)
 {
 	size_t size_mmap = 0;
@@ -122,18 +122,24 @@ char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
 		return NULL;
 	}
 
-	ws_offset = (td->thread_number - 1) * size;
+	if (f->filetype == FIO_TYPE_CHAR) {
+		/* Each thread uses a separate offset within DeviceDAX. */
+		ws_offset = (td->thread_number - 1) * size;
+	} else {
+		/* Each thread uses a separate FileSystemDAX file. No offset is needed. */
+		ws_offset = 0;
+	}
 
-	if (!filename) {
+	if (!f->file_name) {
 		log_err("fio: filename is not set\n");
 		return NULL;
 	}
 
 	/* map the file */
-	mem_ptr = pmem_map_file(filename, 0 /* len */, 0 /* flags */,
+	mem_ptr = pmem_map_file(f->file_name, 0 /* len */, 0 /* flags */,
 			0 /* mode */, &size_mmap, &is_pmem);
 	if (mem_ptr == NULL) {
-		log_err("fio: pmem_map_file(%s) failed\n", filename);
+		log_err("fio: pmem_map_file(%s) failed\n", f->file_name);
 		/* pmem_map_file() sets errno on failure */
 		td_verror(td, errno, "pmem_map_file");
 		return NULL;
@@ -142,7 +148,7 @@ char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
 	/* pmem is expected */
 	if (!is_pmem) {
 		log_err("fio: %s is not located in persistent memory\n",
-			filename);
+			f->file_name);
 		goto err_unmap;
 	}
 
@@ -150,12 +156,12 @@ char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
 	if (size_mmap < ws_offset + size) {
 		log_err(
 			"fio: %s is too small to handle so many threads (%zu < %zu)\n",
-			filename, size_mmap, ws_offset + size);
+			f->file_name, size_mmap, ws_offset + size);
 		goto err_unmap;
 	}
 
 	log_info("fio: size of memory mapped from the file %s: %zu\n",
-		filename, size_mmap);
+		f->file_name, size_mmap);
 
 	mem->mem_ptr = mem_ptr;
 	mem->size_mmap = size_mmap;
@@ -893,6 +899,7 @@ int librpma_fio_server_open_file(struct thread_data *td, struct fio_file *f,
 	size_t mem_size = td->o.size;
 	size_t mr_desc_size;
 	void *ws_ptr;
+	bool is_dram;
 	int usage_mem_type;
 	int ret;
 
@@ -910,14 +917,14 @@ int librpma_fio_server_open_file(struct thread_data *td, struct fio_file *f,
 		return -1;
 	}
 
-	if (strcmp(f->file_name, "malloc") == 0) {
+	is_dram = !strcmp(f->file_name, "malloc");
+	if (is_dram) {
 		/* allocation from DRAM using posix_memalign() */
 		ws_ptr = librpma_fio_allocate_dram(td, mem_size, &csd->mem);
 		usage_mem_type = RPMA_MR_USAGE_FLUSH_TYPE_VISIBILITY;
 	} else {
 		/* allocation from PMEM using pmem_map_file() */
-		ws_ptr = librpma_fio_allocate_pmem(td, f->file_name,
-				mem_size, &csd->mem);
+		ws_ptr = librpma_fio_allocate_pmem(td, f, mem_size, &csd->mem);
 		usage_mem_type = RPMA_MR_USAGE_FLUSH_TYPE_PERSISTENT;
 	}
 
@@ -934,6 +941,21 @@ int librpma_fio_server_open_file(struct thread_data *td, struct fio_file *f,
 		goto err_free;
 	}
 
+	if (!is_dram && f->filetype == FIO_TYPE_FILE) {
+		ret = rpma_mr_advise(mr, 0, mem_size,
+				IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE,
+				IBV_ADVISE_MR_FLAG_FLUSH);
+		if (ret) {
+			librpma_td_verror(td, ret, "rpma_mr_advise");
+			/* an invalid argument is an error */
+			if (ret == RPMA_E_INVAL)
+				goto err_mr_dereg;
+
+			/* log_err used instead of log_info to avoid corruption of the JSON output */
+			log_err("Note: having rpma_mr_advise(3) failed because of RPMA_E_NOSUPP or RPMA_E_PROVIDER may come with a performance penalty, but it is not a blocker for running the benchmark.\n");
+		}
+	}
+
 	/* get size of the memory region's descriptor */
 	if ((ret = rpma_mr_get_descriptor_size(mr, &mr_desc_size))) {
 		librpma_td_verror(td, ret, "rpma_mr_get_descriptor_size");
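The error policy in the new rpma_mr_advise() block above works as follows:

* RPMA_E_INVAL takes the error path.
* Any other failure is reported and setup continues.

It can be sketched in isolation as below. The enum values and the helper name `advise_may_continue` are hypothetical stand-ins, not librpma's API. The note goes to stderr because, as the patch comment says, using the info log could corrupt the JSON output.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for librpma return codes; the real RPMA_E_* values differ. */
enum advise_status {
	ADVISE_OK = 0,
	ADVISE_E_INVAL,		/* invalid argument: treated as a real error */
	ADVISE_E_NOSUPP,	/* advice not supported */
	ADVISE_E_PROVIDER,	/* provider-level failure */
};

/*
 * Return true if setup may continue after the advise call.
 * Only an invalid argument is fatal; the other failures may cost
 * performance but do not block the benchmark.
 */
static bool advise_may_continue(enum advise_status ret)
{
	if (ret == ADVISE_OK)
		return true;
	if (ret == ADVISE_E_INVAL)
		return false;
	/* stderr, so a JSON report written to stdout stays parseable */
	fprintf(stderr, "Note: advise failed (%d); continuing, "
		"possibly with a performance penalty\n", (int)ret);
	return true;
}
```

With this split, `advise_may_continue(ADVISE_E_NOSUPP)` prints the note and returns true. `advise_may_continue(ADVISE_E_INVAL)` returns false, and the caller takes its error path (err_mr_dereg in the patch).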
diff --git a/engines/librpma_fio.h b/engines/librpma_fio.h
index fb89d99d..2c507e9c 100644
--- a/engines/librpma_fio.h
+++ b/engines/librpma_fio.h
@@ -77,7 +77,7 @@ struct librpma_fio_mem {
 char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
 		struct librpma_fio_mem *mem);
 
-char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
+char *librpma_fio_allocate_pmem(struct thread_data *td, struct fio_file *f,
 		size_t size, struct librpma_fio_mem *mem);
 
 void librpma_fio_free(struct librpma_fio_mem *mem);
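librpma_fio_allocate_pmem() now places each thread's workspace in one of two ways:

* DeviceDAX (FIO_TYPE_CHAR): one shared mapping, carved into per-thread slices.
* FileSystemDAX: a separate file per thread, with no offset.

Together with the unchanged "too small" check, this reduces to the arithmetic below. `enum dax_kind` and both helper names are hypothetical. The sketch assumes 1-based thread numbers, as `td->thread_number - 1` in the patch implies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for FIO_TYPE_CHAR (DeviceDAX) and FIO_TYPE_FILE (FS-DAX). */
enum dax_kind { DAX_DEVICE, DAX_FS_FILE };

/* Byte offset of a thread's workspace inside its mapping (thread_number is 1-based). */
static size_t workspace_offset(enum dax_kind kind, unsigned int thread_number,
			       size_t size)
{
	if (kind == DAX_DEVICE)
		/* all threads share one device: thread N uses slice N-1 */
		return (size_t)(thread_number - 1) * size;
	/* each thread maps its own file, so its workspace starts at 0 */
	return 0;
}

/* Same condition as the "too small to handle so many threads" check. */
static bool mapping_fits(size_t size_mmap, size_t ws_offset, size_t size)
{
	return size_mmap >= ws_offset + size;
}
```

For example, with size = 1 MiB:

* On DeviceDAX, thread 3 starts at 2 MiB and needs a mapping of at least 3 MiB.
* On FileSystemDAX, thread 3 starts at 0, and a 1 MiB file suffices.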

* Recent changes (master)
@ 2022-01-22 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-22 13:00 UTC
  To: fio

The following changes since commit 3a3e5c6e7606e727df1788a73d04db56d77ba00d:

  iolog.c: Fix memory leak for blkparse case (2022-01-20 11:40:42 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2f0b54419a6ab039c677e41008391b8c53ae2e6b:

  Merge branch 'master' of https://github.com/ben-ihelputech/fio (2022-01-21 10:46:26 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/ben-ihelputech/fio

ben-ihelputech (1):
      Update README to markdown format

 README => README.md | 78 ++++++++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 40 deletions(-)
 rename README => README.md (94%)

---

Diff of recent changes:

diff --git a/README b/README.md
similarity index 94%
rename from README
rename to README.md
index d566fae3..b10b1688 100644
--- a/README
+++ b/README.md
@@ -1,5 +1,5 @@
-Overview and history
---------------------
+# Fio README
+## Overview and history
 
 Fio was originally written to save me the hassle of writing special test case
 programs when I wanted to test a specific workload, either for performance
@@ -22,14 +22,13 @@ that setting is given.  The typical use of fio is to write a job file matching
 the I/O load one wants to simulate.
 
 
-Source
-------
+## Source
 
 Fio resides in a git repo, the canonical place is:
 
 	git://git.kernel.dk/fio.git
 
-When inside a corporate firewall, git:// URL sometimes does not work.
+When inside a corporate firewall, `git://` URL sometimes does not work.
 If git:// does not work, use the http protocol instead:
 
 	http://git.kernel.dk/fio.git
@@ -55,8 +54,8 @@ or
 	https://github.com/axboe/fio.git
 
 
-Mailing list
-------------
+## Mailing list
+
 
 The fio project mailing list is meant for anything related to fio including
 general discussion, bug reporting, questions, and development. For bug reporting,
@@ -81,8 +80,8 @@ and archives for the old list can be found here:
 	http://maillist.kernel.dk/fio-devel/
 
 
-Author
-------
+## Author
+
 
 Fio was written by Jens Axboe <axboe@kernel.dk> to enable flexible testing of
 the Linux I/O subsystem and schedulers. He got tired of writing specific test
@@ -92,56 +91,55 @@ benchmark/test tools out there weren't flexible enough to do what he wanted.
 Jens Axboe <axboe@kernel.dk> 20060905
 
 
-Binary packages
----------------
+## Binary packages
 
-Debian:
+**Debian:**
 	Starting with Debian "Squeeze", fio packages are part of the official
 	Debian repository. http://packages.debian.org/search?keywords=fio .
 
-Ubuntu:
+**Ubuntu:**
 	Starting with Ubuntu 10.04 LTS (aka "Lucid Lynx"), fio packages are part
 	of the Ubuntu "universe" repository.
 	http://packages.ubuntu.com/search?keywords=fio .
 
-Red Hat, Fedora, CentOS & Co:
+**Red Hat, Fedora, CentOS & Co:**
 	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
 	packages are part of the Fedora/EPEL repositories.
 	https://apps.fedoraproject.org/packages/fio .
 
-Mandriva:
+**Mandriva:**
 	Mandriva has integrated fio into their package repository, so installing
 	on that distro should be as easy as typing ``urpmi fio``.
 
-Arch Linux:
+**Arch Linux:**
         An Arch Linux package is provided under the Community sub-repository:
         https://www.archlinux.org/packages/?sort=&q=fio
 
-Solaris:
+**Solaris:**
 	Packages for Solaris are available from OpenCSW. Install their pkgutil
 	tool (http://www.opencsw.org/get-it/pkgutil/) and then install fio via
 	``pkgutil -i fio``.
 
-Windows:
+**Windows:**
 	Rebecca Cran <rebecca@bsdio.com> has fio packages for Windows at
 	https://bsdio.com/fio/ . The latest builds for Windows can also
 	be grabbed from https://ci.appveyor.com/project/axboe/fio by clicking
 	the latest x86 or x64 build, then selecting the ARTIFACTS tab.
 
-BSDs:
+**BSDs:**
 	Packages for BSDs may be available from their binary package repositories.
 	Look for a package "fio" using their binary package managers.
 
 
-Building
---------
-
-Just type::
+## Building
 
- $ ./configure
- $ make
- $ make install
 
+Just type::
+```
+./configure
+make
+make install
+```
 Note that GNU make is required. On BSDs it's available from devel/gmake within
 ports directory; on Solaris it's in the SUNWgmake package.  On platforms where
 GNU make isn't the default, type ``gmake`` instead of ``make``.
@@ -155,18 +153,18 @@ to be installed.  gfio isn't built automatically and can be enabled with a
 ``--enable-gfio`` option to configure.
 
 To build fio with a cross-compiler::
-
- $ make clean
- $ make CROSS_COMPILE=/path/to/toolchain/prefix
-
+```
+make clean
+make CROSS_COMPILE=/path/to/toolchain/prefix
+```
 Configure will attempt to determine the target platform automatically.
 
 It's possible to build fio for ESX as well, use the ``--esx`` switch to
 configure.
 
 
-Windows
-~~~~~~~
+## Windows
+
 
 The minimum versions of Windows for building/runing fio are Windows 7/Windows
 Server 2008 R2. On Windows, Cygwin (https://www.cygwin.com/) is required in
@@ -174,7 +172,7 @@ order to build fio. To create an MSI installer package install WiX from
 https://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
 directory.
 
-How to compile fio on 64-bit Windows:
+### How to compile fio on 64-bit Windows:
 
  1. Install Cygwin (http://www.cygwin.com/). Install **make** and all
     packages starting with **mingw64-x86_64**. Ensure
@@ -196,21 +194,21 @@ https://github.com/mintty/mintty/wiki/Tips#inputoutput-interaction-with-alien-pr
 for details).
 
 
-Documentation
-~~~~~~~~~~~~~
+## Documentation
+
 
 Fio uses Sphinx_ to generate documentation from the reStructuredText_ files.
 To build HTML formatted documentation run ``make -C doc html`` and direct your
 browser to :file:`./doc/output/html/index.html`.  To build manual page run
 ``make -C doc man`` and then ``man doc/output/man/fio.1``.  To see what other
 output formats are supported run ``make -C doc help``.
-
+```
 .. _reStructuredText: http://www.sphinx-doc.org/rest.html
 .. _Sphinx: http://www.sphinx-doc.org
+```
 
+## Platforms
 
-Platforms
----------
 
 Fio works on (at least) Linux, Solaris, AIX, HP-UX, OSX, NetBSD, OpenBSD,
 Windows, FreeBSD, and DragonFly. Some features and/or options may only be
@@ -252,8 +250,8 @@ POSIX aio should work now. To make the change permanent::
         posix_aio0 changed
 
 
-Running fio
------------
+## Running fio
+
 
 Running fio is normally the easiest part - you just give it the job file
 (or job files) as parameters::

* Recent changes (master)
@ 2022-01-21 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-21 13:00 UTC
  To: fio

The following changes since commit 71efbed61dfb157dfa7fe550f500b53f9731e1cb:

  docs: documentation for sg WRITE STREAM(16) (2022-01-18 06:37:39 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3a3e5c6e7606e727df1788a73d04db56d77ba00d:

  iolog.c: Fix memory leak for blkparse case (2022-01-20 11:40:42 -0700)

----------------------------------------------------------------
Lukas Straub (8):
      blktrace.c: Use file stream interface instead of fifo
      iolog.c: Make iolog_items_to_fetch public
      blktrace.c: Add support for read_iolog_chunked
      linux-dev-lookup.c: Put the check for replay_redirect in the beginning
      blktrace.c: Don't hardcode direct-io
      blktrace.c: Don't sleep indefinitely if there is a wrong timestamp
      blktrace.c: Make thread-safe by removing local static variables
      iolog.c: Fix memory leak for blkparse case

 blktrace.c               | 325 ++++++++++++++++++++++++-----------------------
 blktrace.h               |  14 +-
 fio.h                    |   2 +
 iolog.c                  |  18 ++-
 iolog.h                  |   1 +
 oslib/linux-dev-lookup.c |  21 ++-
 6 files changed, 203 insertions(+), 178 deletions(-)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index 64a610a9..e1804765 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -4,71 +4,34 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <errno.h>
 
 #include "flist.h"
 #include "fio.h"
+#include "iolog.h"
 #include "blktrace.h"
 #include "blktrace_api.h"
 #include "oslib/linux-dev-lookup.h"
 
-#define TRACE_FIFO_SIZE	8192
-
-/*
- * fifo refill frontend, to avoid reading data in trace sized bites
- */
-static int refill_fifo(struct thread_data *td, struct fifo *fifo, int fd)
-{
-	char buf[TRACE_FIFO_SIZE];
-	unsigned int total;
-	int ret;
-
-	total = sizeof(buf);
-	if (total > fifo_room(fifo))
-		total = fifo_room(fifo);
-
-	ret = read(fd, buf, total);
-	if (ret < 0) {
-		int read_err = errno;
-
-		assert(read_err > 0);
-		td_verror(td, read_err, "read blktrace file");
-		return -read_err;
-	}
-
-	if (ret > 0)
-		ret = fifo_put(fifo, buf, ret);
-
-	dprint(FD_BLKTRACE, "refill: filled %d bytes\n", ret);
-	return ret;
-}
-
-/*
- * Retrieve 'len' bytes from the fifo, refilling if necessary.
- */
-static int trace_fifo_get(struct thread_data *td, struct fifo *fifo, int fd,
-			  void *buf, unsigned int len)
-{
-	if (fifo_len(fifo) < len) {
-		int ret = refill_fifo(td, fifo, fd);
-
-		if (ret < 0)
-			return ret;
-	}
-
-	return fifo_get(fifo, buf, len);
-}
+struct file_cache {
+	unsigned int maj;
+	unsigned int min;
+	unsigned int fileno;
+};
 
 /*
  * Just discard the pdu by seeking past it.
  */
-static int discard_pdu(struct thread_data *td, struct fifo *fifo, int fd,
-		       struct blk_io_trace *t)
+static int discard_pdu(FILE* f, struct blk_io_trace *t)
 {
 	if (t->pdu_len == 0)
 		return 0;
 
 	dprint(FD_BLKTRACE, "discard pdu len %u\n", t->pdu_len);
-	return trace_fifo_get(td, fifo, fd, NULL, t->pdu_len);
+	if (fseek(f, t->pdu_len, SEEK_CUR) < 0)
+		return -errno;
+
+	return t->pdu_len;
 }
 
 /*
@@ -130,28 +93,28 @@ static void trace_add_open_close_event(struct thread_data *td, int fileno, enum
 	flist_add_tail(&ipo->list, &td->io_log_list);
 }
 
-static int trace_add_file(struct thread_data *td, __u32 device)
+static int trace_add_file(struct thread_data *td, __u32 device,
+			  struct file_cache *cache)
 {
-	static unsigned int last_maj, last_min, last_fileno;
 	unsigned int maj = FMAJOR(device);
 	unsigned int min = FMINOR(device);
 	struct fio_file *f;
 	char dev[256];
 	unsigned int i;
 
-	if (last_maj == maj && last_min == min)
-		return last_fileno;
+	if (cache->maj == maj && cache->min == min)
+		return cache->fileno;
 
-	last_maj = maj;
-	last_min = min;
+	cache->maj = maj;
+	cache->min = min;
 
 	/*
 	 * check for this file in our list
 	 */
 	for_each_file(td, f, i)
 		if (f->major == maj && f->minor == min) {
-			last_fileno = f->fileno;
-			return last_fileno;
+			cache->fileno = f->fileno;
+			return cache->fileno;
 		}
 
 	strcpy(dev, "/dev");
@@ -171,10 +134,10 @@ static int trace_add_file(struct thread_data *td, __u32 device)
 		td->files[fileno]->major = maj;
 		td->files[fileno]->minor = min;
 		trace_add_open_close_event(td, fileno, FIO_LOG_OPEN_FILE);
-		last_fileno = fileno;
+		cache->fileno = fileno;
 	}
 
-	return last_fileno;
+	return cache->fileno;
 }
 
 static void t_bytes_align(struct thread_options *o, struct blk_io_trace *t)
@@ -215,7 +178,7 @@ static void store_ipo(struct thread_data *td, unsigned long long offset,
 	queue_io_piece(td, ipo);
 }
 
-static void handle_trace_notify(struct blk_io_trace *t)
+static bool handle_trace_notify(struct blk_io_trace *t)
 {
 	switch (t->action) {
 	case BLK_TN_PROCESS:
@@ -232,22 +195,24 @@ static void handle_trace_notify(struct blk_io_trace *t)
 		dprint(FD_BLKTRACE, "unknown trace act %x\n", t->action);
 		break;
 	}
+	return false;
 }
 
-static void handle_trace_discard(struct thread_data *td,
+static bool handle_trace_discard(struct thread_data *td,
 				 struct blk_io_trace *t,
 				 unsigned long long ttime,
-				 unsigned long *ios, unsigned int *bs)
+				 unsigned long *ios, unsigned long long *bs,
+				 struct file_cache *cache)
 {
 	struct io_piece *ipo;
 	int fileno;
 
 	if (td->o.replay_skip & (1u << DDIR_TRIM))
-		return;
+		return false;
 
 	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
-	fileno = trace_add_file(td, t->device);
+	fileno = trace_add_file(td, t->device, cache);
 
 	ios[DDIR_TRIM]++;
 	if (t->bytes > bs[DDIR_TRIM])
@@ -270,6 +235,7 @@ static void handle_trace_discard(struct thread_data *td,
 							ipo->offset, ipo->len,
 							ipo->delay);
 	queue_io_piece(td, ipo);
+	return true;
 }
 
 static void dump_trace(struct blk_io_trace *t)
@@ -277,29 +243,29 @@ static void dump_trace(struct blk_io_trace *t)
 	log_err("blktrace: ignoring zero byte trace: action=%x\n", t->action);
 }
 
-static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
+static bool handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 			    unsigned long long ttime, unsigned long *ios,
-			    unsigned int *bs)
+			    unsigned long long *bs, struct file_cache *cache)
 {
 	int rw;
 	int fileno;
 
-	fileno = trace_add_file(td, t->device);
+	fileno = trace_add_file(td, t->device, cache);
 
 	rw = (t->action & BLK_TC_ACT(BLK_TC_WRITE)) != 0;
 
 	if (rw) {
 		if (td->o.replay_skip & (1u << DDIR_WRITE))
-			return;
+			return false;
 	} else {
 		if (td->o.replay_skip & (1u << DDIR_READ))
-			return;
+			return false;
 	}
 
 	if (!t->bytes) {
 		if (!fio_did_warn(FIO_WARN_BTRACE_ZERO))
 			dump_trace(t);
-		return;
+		return false;
 	}
 
 	if (t->bytes > bs[rw])
@@ -308,20 +274,22 @@ static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 	ios[rw]++;
 	td->o.size += t->bytes;
 	store_ipo(td, t->sector, t->bytes, rw, ttime, fileno);
+	return true;
 }
 
-static void handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
-			       unsigned long long ttime, unsigned long *ios)
+static bool handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
+			       unsigned long long ttime, unsigned long *ios,
+			       struct file_cache *cache)
 {
 	struct io_piece *ipo;
 	int fileno;
 
 	if (td->o.replay_skip & (1u << DDIR_SYNC))
-		return;
+		return false;
 
 	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
-	fileno = trace_add_file(td, t->device);
+	fileno = trace_add_file(td, t->device, cache);
 
 	ipo->delay = ttime / 1000;
 	ipo->ddir = DDIR_SYNC;
@@ -330,47 +298,49 @@ static void handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
 	ios[DDIR_SYNC]++;
 	dprint(FD_BLKTRACE, "store flush delay=%lu\n", ipo->delay);
 	queue_io_piece(td, ipo);
+	return true;
 }
 
 /*
  * We only care for queue traces, most of the others are side effects
  * due to internal workings of the block layer.
  */
-static void handle_trace(struct thread_data *td, struct blk_io_trace *t,
-			 unsigned long *ios, unsigned int *bs)
+static bool queue_trace(struct thread_data *td, struct blk_io_trace *t,
+			 unsigned long *ios, unsigned long long *bs,
+			 struct file_cache *cache)
 {
-	static unsigned long long last_ttime;
+	unsigned long long *last_ttime = &td->io_log_blktrace_last_ttime;
 	unsigned long long delay = 0;
 
 	if ((t->action & 0xffff) != __BLK_TA_QUEUE)
-		return;
+		return false;
 
 	if (!(t->action & BLK_TC_ACT(BLK_TC_NOTIFY))) {
-		if (!last_ttime || td->o.no_stall)
+		if (!*last_ttime || td->o.no_stall || t->time < *last_ttime)
 			delay = 0;
 		else if (td->o.replay_time_scale == 100)
-			delay = t->time - last_ttime;
+			delay = t->time - *last_ttime;
 		else {
-			double tmp = t->time - last_ttime;
+			double tmp = t->time - *last_ttime;
 			double scale;
 
 			scale = (double) 100.0 / (double) td->o.replay_time_scale;
 			tmp *= scale;
 			delay = tmp;
 		}
-		last_ttime = t->time;
+		*last_ttime = t->time;
 	}
 
 	t_bytes_align(&td->o, t);
 
 	if (t->action & BLK_TC_ACT(BLK_TC_NOTIFY))
-		handle_trace_notify(t);
+		return handle_trace_notify(t);
 	else if (t->action & BLK_TC_ACT(BLK_TC_DISCARD))
-		handle_trace_discard(td, t, delay, ios, bs);
+		return handle_trace_discard(td, t, delay, ios, bs, cache);
 	else if (t->action & BLK_TC_ACT(BLK_TC_FLUSH))
-		handle_trace_flush(td, t, delay, ios);
+		return handle_trace_flush(td, t, delay, ios, cache);
 	else
-		handle_trace_fs(td, t, delay, ios, bs);
+		return handle_trace_fs(td, t, delay, ios, bs, cache);
 }
 
 static void byteswap_trace(struct blk_io_trace *t)
@@ -438,43 +408,79 @@ static void depth_end(struct blk_io_trace *t, int *this_depth, int *depth)
  * Load a blktrace file by reading all the blk_io_trace entries, and storing
  * them as io_pieces like the fio text version would do.
  */
-bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
+bool init_blktrace_read(struct thread_data *td, const char *filename, int need_swap)
+{
+	int old_state;
+
+	td->io_log_rfile = fopen(filename, "rb");
+	if (!td->io_log_rfile) {
+		td_verror(td, errno, "open blktrace file");
+		goto err;
+	}
+	td->io_log_blktrace_swap = need_swap;
+	td->io_log_blktrace_last_ttime = 0;
+	td->o.size = 0;
+
+	free_release_files(td);
+
+	old_state = td_bump_runstate(td, TD_SETTING_UP);
+
+	if (!read_blktrace(td)) {
+		goto err;
+	}
+
+	td_restore_runstate(td, old_state);
+
+	if (!td->files_index) {
+		log_err("fio: did not find replay device(s)\n");
+		return false;
+	}
+
+	return true;
+
+err:
+	if (td->io_log_rfile) {
+		fclose(td->io_log_rfile);
+		td->io_log_rfile = NULL;
+	}
+	return false;
+}
+
+bool read_blktrace(struct thread_data* td)
 {
 	struct blk_io_trace t;
+	struct file_cache cache = { };
 	unsigned long ios[DDIR_RWDIR_SYNC_CNT] = { };
-	unsigned int rw_bs[DDIR_RWDIR_CNT] = { };
+	unsigned long long rw_bs[DDIR_RWDIR_CNT] = { };
 	unsigned long skipped_writes;
-	struct fifo *fifo;
-	int fd, i, old_state, max_depth;
-	struct fio_file *f;
+	FILE *f = td->io_log_rfile;
+	int i, max_depth;
+	struct fio_file *fiof;
 	int this_depth[DDIR_RWDIR_CNT] = { };
 	int depth[DDIR_RWDIR_CNT] = { };
+	int64_t items_to_fetch = 0;
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0) {
-		td_verror(td, errno, "open blktrace file");
-		return false;
+	if (td->o.read_iolog_chunked) {
+		items_to_fetch = iolog_items_to_fetch(td);
+		if (!items_to_fetch)
+			return true;
 	}
 
-	fifo = fifo_alloc(TRACE_FIFO_SIZE);
-
-	old_state = td_bump_runstate(td, TD_SETTING_UP);
-
-	td->o.size = 0;
 	skipped_writes = 0;
 	do {
-		int ret = trace_fifo_get(td, fifo, fd, &t, sizeof(t));
+		int ret = fread(&t, 1, sizeof(t), f);
 
-		if (ret < 0)
+		if (ferror(f)) {
+			td_verror(td, errno, "read blktrace file");
 			goto err;
-		else if (!ret)
+		} else if (feof(f)) {
 			break;
-		else if (ret < (int) sizeof(t)) {
-			log_err("fio: short fifo get\n");
+		} else if (ret < (int) sizeof(t)) {
+			log_err("fio: iolog short read\n");
 			break;
 		}
 
-		if (need_swap)
+		if (td->io_log_blktrace_swap)
 			byteswap_trace(&t);
 
 		if ((t.magic & 0xffffff00) != BLK_IO_TRACE_MAGIC) {
@@ -487,13 +493,10 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 								t.magic & 0xff);
 			goto err;
 		}
-		ret = discard_pdu(td, fifo, fd, &t);
+		ret = discard_pdu(f, &t);
 		if (ret < 0) {
 			td_verror(td, -ret, "blktrace lseek");
 			goto err;
-		} else if (t.pdu_len != ret) {
-			log_err("fio: discarded %d of %d\n", ret, t.pdu_len);
-			goto err;
 		}
 		if ((t.action & BLK_TC_ACT(BLK_TC_NOTIFY)) == 0) {
 			if ((t.action & 0xffff) == __BLK_TA_QUEUE)
@@ -510,22 +513,53 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 			}
 		}
 
-		handle_trace(td, &t, ios, rw_bs);
-	} while (1);
+		if (!queue_trace(td, &t, ios, rw_bs, &cache))
+			continue;
 
-	for_each_file(td, f, i)
-		trace_add_open_close_event(td, f->fileno, FIO_LOG_CLOSE_FILE);
+		if (td->o.read_iolog_chunked) {
+			td->io_log_current++;
+			items_to_fetch--;
+			if (items_to_fetch == 0)
+				break;
+		}
+	} while (1);
 
-	fifo_free(fifo);
-	close(fd);
+	if (td->o.read_iolog_chunked) {
+		td->io_log_highmark = td->io_log_current;
+		td->io_log_checkmark = (td->io_log_highmark + 1) / 2;
+		fio_gettime(&td->io_log_highmark_time, NULL);
+	}
 
-	td_restore_runstate(td, old_state);
+	if (skipped_writes)
+		log_err("fio: %s skips replay of %lu writes due to read-only\n",
+						td->o.name, skipped_writes);
 
-	if (!td->files_index) {
-		log_err("fio: did not find replay device(s)\n");
-		return false;
+	if (td->o.read_iolog_chunked) {
+		if (td->io_log_current == 0) {
+			return false;
+		}
+		td->o.td_ddir = TD_DDIR_RW;
+		if ((rw_bs[DDIR_READ] > td->o.max_bs[DDIR_READ] ||
+		     rw_bs[DDIR_WRITE] > td->o.max_bs[DDIR_WRITE] ||
+		     rw_bs[DDIR_TRIM] > td->o.max_bs[DDIR_TRIM]) &&
+		    td->orig_buffer)
+		{
+			td->o.max_bs[DDIR_READ] = max(td->o.max_bs[DDIR_READ], rw_bs[DDIR_READ]);
+			td->o.max_bs[DDIR_WRITE] = max(td->o.max_bs[DDIR_WRITE], rw_bs[DDIR_WRITE]);
+			td->o.max_bs[DDIR_TRIM] = max(td->o.max_bs[DDIR_TRIM], rw_bs[DDIR_TRIM]);
+			io_u_quiesce(td);
+			free_io_mem(td);
+			init_io_u_buffers(td);
+		}
+		return true;
 	}
 
+	for_each_file(td, fiof, i)
+		trace_add_open_close_event(td, fiof->fileno, FIO_LOG_CLOSE_FILE);
+
+	fclose(td->io_log_rfile);
+	td->io_log_rfile = NULL;
+
 	/*
 	 * For stacked devices, we don't always get a COMPLETE event so
 	 * the depth grows to insane values. Limit it to something sane(r).
@@ -539,10 +573,6 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 		max_depth = max(depth[i], max_depth);
 	}
 
-	if (skipped_writes)
-		log_err("fio: %s skips replay of %lu writes due to read-only\n",
-						td->o.name, skipped_writes);
-
 	if (!ios[DDIR_READ] && !ios[DDIR_WRITE] && !ios[DDIR_TRIM] &&
 	    !ios[DDIR_SYNC]) {
 		log_err("fio: found no ios in blktrace data\n");
@@ -563,14 +593,6 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM];
 	}
 
-	/*
-	 * We need to do direct/raw ios to the device, to avoid getting
-	 * read-ahead in our way. But only do so if the minimum block size
-	 * is a multiple of 4k, otherwise we don't know if it's safe to do so.
-	 */
-	if (!fio_option_is_set(&td->o, odirect) && !(td_min_bs(td) & 4095))
-		td->o.odirect = 1;
-
 	/*
 	 * If depth wasn't manually set, use probed depth
 	 */
@@ -579,8 +601,7 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 
 	return true;
 err:
-	close(fd);
-	fifo_free(fifo);
+	fclose(f);
 	return false;
 }
 
@@ -625,15 +646,14 @@ static void merge_finish_file(struct blktrace_cursor *bcs, int i, int *nr_logs)
 {
 	bcs[i].iter++;
 	if (bcs[i].iter < bcs[i].nr_iter) {
-		lseek(bcs[i].fd, 0, SEEK_SET);
+		fseek(bcs[i].f, 0, SEEK_SET);
 		return;
 	}
 
 	*nr_logs -= 1;
 
 	/* close file */
-	fifo_free(bcs[i].fifo);
-	close(bcs[i].fd);
+	fclose(bcs[i].f);
 
 	/* keep active files contiguous */
 	memmove(&bcs[i], &bcs[*nr_logs], sizeof(bcs[i]));
@@ -646,15 +666,16 @@ static int read_trace(struct thread_data *td, struct blktrace_cursor *bc)
 
 read_skip:
 	/* read an io trace */
-	ret = trace_fifo_get(td, bc->fifo, bc->fd, t, sizeof(*t));
-	if (ret < 0) {
+	ret = fread(&t, 1, sizeof(t), bc->f);
+	if (ferror(bc->f)) {
+		td_verror(td, errno, "read blktrace file");
 		return ret;
-	} else if (!ret) {
+	} else if (feof(bc->f)) {
 		if (!bc->length)
 			bc->length = bc->t.time;
 		return ret;
 	} else if (ret < (int) sizeof(*t)) {
-		log_err("fio: short fifo get\n");
+		log_err("fio: iolog short read\n");
 		return -1;
 	}
 
@@ -664,14 +685,10 @@ read_skip:
 	/* skip over actions that fio does not care about */
 	if ((t->action & 0xffff) != __BLK_TA_QUEUE ||
 	    t_get_ddir(t) == DDIR_INVAL) {
-		ret = discard_pdu(td, bc->fifo, bc->fd, t);
+		ret = discard_pdu(bc->f, t);
 		if (ret < 0) {
 			td_verror(td, -ret, "blktrace lseek");
 			return ret;
-		} else if (t->pdu_len != ret) {
-			log_err("fio: discarded %d of %d\n", ret,
-				t->pdu_len);
-			return -1;
 		}
 		goto read_skip;
 	}
@@ -729,14 +746,13 @@ int merge_blktrace_iologs(struct thread_data *td)
 	str = ptr = strdup(td->o.read_iolog_file);
 	nr_logs = 0;
 	for (i = 0; (name = get_next_str(&ptr)) != NULL; i++) {
-		bcs[i].fd = open(name, O_RDONLY);
-		if (bcs[i].fd < 0) {
+		bcs[i].f = fopen(name, "rb");
+		if (!bcs[i].f) {
 			log_err("fio: could not open file: %s\n", name);
-			ret = bcs[i].fd;
+			ret = -errno;
 			free(str);
 			goto err_file;
 		}
-		bcs[i].fifo = fifo_alloc(TRACE_FIFO_SIZE);
 		nr_logs++;
 
 		if (!is_blktrace(name, &bcs[i].swap)) {
@@ -761,14 +777,10 @@ int merge_blktrace_iologs(struct thread_data *td)
 		i = find_earliest_io(bcs, nr_logs);
 		bc = &bcs[i];
 		/* skip over the pdu */
-		ret = discard_pdu(td, bc->fifo, bc->fd, &bc->t);
+		ret = discard_pdu(bc->f, &bc->t);
 		if (ret < 0) {
 			td_verror(td, -ret, "blktrace lseek");
 			goto err_file;
-		} else if (bc->t.pdu_len != ret) {
-			log_err("fio: discarded %d of %d\n", ret,
-				bc->t.pdu_len);
-			goto err_file;
 		}
 
 		ret = write_trace(merge_fp, &bc->t);
@@ -786,8 +798,7 @@ int merge_blktrace_iologs(struct thread_data *td)
 err_file:
 	/* cleanup */
 	for (i = 0; i < nr_logs; i++) {
-		fifo_free(bcs[i].fifo);
-		close(bcs[i].fd);
+		fclose(bcs[i].f);
 	}
 err_merge_buf:
 	free(merge_buf);
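The reworked `read_trace()` above reads fixed-size records with `fread()` and then consults `ferror()` and `feof()`. Below is a minimal sketch of that classification. It uses a hypothetical `struct rec` in place of `struct blk_io_trace` and a hypothetical `read_record()` helper.

The sketch tests for a complete record before looking at the stream indicators. A truncated final record is therefore reported as an error instead of a clean end of file. In the hunk above, `feof()` is checked before the short-read branch, so a partial last record would take the EOF path. The destination here is the record itself (`r`, `sizeof(*r)`), which matches the `t, sizeof(*t)` arguments of the removed `trace_fifo_get()` call.

```c
#include <stdio.h>

/* Hypothetical fixed-size record standing in for struct blk_io_trace. */
struct rec {
	unsigned int magic;
	unsigned int pdu_len;
};

/*
 * Read one record from f.
 * Returns 1 for a complete record, 0 for a clean end of file, and -1 for
 * an I/O error or a truncated trailing record.
 */
static int read_record(FILE *f, struct rec *r)
{
	size_t n = fread(r, 1, sizeof(*r), f);

	if (n == sizeof(*r))
		return 1;		/* complete record */
	if (ferror(f))
		return -1;		/* stream error indicator is set */
	if (n == 0)
		return 0;		/* nothing left: clean end of file */

	/* Some bytes were read but not a whole record. */
	fprintf(stderr, "short read: %zu of %zu bytes\n", n, sizeof(*r));
	return -1;
}
```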
diff --git a/blktrace.h b/blktrace.h
index a0e82faa..c53b717b 100644
--- a/blktrace.h
+++ b/blktrace.h
@@ -10,7 +10,7 @@
 
 struct blktrace_cursor {
 	struct fifo		*fifo;	// fifo queue for reading
-	int			fd;	// blktrace file
+	FILE			*f;	// blktrace file
 	__u64			length; // length of trace
 	struct blk_io_trace	t;	// current io trace
 	int			swap;	// bitwise reverse required
@@ -20,7 +20,9 @@ struct blktrace_cursor {
 };
 
 bool is_blktrace(const char *, int *);
-bool load_blktrace(struct thread_data *, const char *, int);
+bool init_blktrace_read(struct thread_data *, const char *, int);
+bool read_blktrace(struct thread_data* td);
+
 int merge_blktrace_iologs(struct thread_data *td);
 
 #else
@@ -30,12 +32,18 @@ static inline bool is_blktrace(const char *fname, int *need_swap)
 	return false;
 }
 
-static inline bool load_blktrace(struct thread_data *td, const char *fname,
+static inline bool init_blktrace_read(struct thread_data *td, const char *fname,
 				 int need_swap)
 {
 	return false;
 }
 
+static inline bool read_blktrace(struct thread_data* td)
+{
+	return false;
+}
+
+
 static inline int merge_blktrace_iologs(struct thread_data *td)
 {
 	return false;
diff --git a/fio.h b/fio.h
index 6bb21ebb..1ea3d064 100644
--- a/fio.h
+++ b/fio.h
@@ -428,6 +428,8 @@ struct thread_data {
 	struct flist_head io_log_list;
 	FILE *io_log_rfile;
 	unsigned int io_log_blktrace;
+	unsigned int io_log_blktrace_swap;
+	unsigned long long io_log_blktrace_last_ttime;
 	unsigned int io_log_current;
 	unsigned int io_log_checkmark;
 	unsigned int io_log_highmark;
diff --git a/iolog.c b/iolog.c
index 1aeb7a76..a2cf0c1c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -152,10 +152,15 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 	while (!flist_empty(&td->io_log_list)) {
 		int ret;
 
-		if (!td->io_log_blktrace && td->o.read_iolog_chunked) {
+		if (td->o.read_iolog_chunked) {
 			if (td->io_log_checkmark == td->io_log_current) {
-				if (!read_iolog2(td))
-					return 1;
+				if (td->io_log_blktrace) {
+					if (!read_blktrace(td))
+						return 1;
+				} else {
+					if (!read_iolog2(td))
+						return 1;
+				}
 			}
 			td->io_log_current--;
 		}
@@ -355,7 +360,7 @@ void write_iolog_close(struct thread_data *td)
 	td->iolog_buf = NULL;
 }
 
-static int64_t iolog_items_to_fetch(struct thread_data *td)
+int64_t iolog_items_to_fetch(struct thread_data *td)
 {
 	struct timespec now;
 	uint64_t elapsed;
@@ -626,8 +631,6 @@ static bool init_iolog_read(struct thread_data *td, char *fname)
 	} else
 		f = fopen(fname, "r");
 
-	free(fname);
-
 	if (!f) {
 		perror("fopen read iolog");
 		return false;
@@ -709,11 +712,12 @@ bool init_iolog(struct thread_data *td)
 		 */
 		if (is_blktrace(fname, &need_swap)) {
 			td->io_log_blktrace = 1;
-			ret = load_blktrace(td, fname, need_swap);
+			ret = init_blktrace_read(td, fname, need_swap);
 		} else {
 			td->io_log_blktrace = 0;
 			ret = init_iolog_read(td, fname);
 		}
+		free(fname);
 	} else if (td->o.write_iolog_file)
 		ret = init_iolog_write(td);
 	else
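Before this change, chunked reading (`read_iolog_chunked`) was skipped for blktrace input because of the `!td->io_log_blktrace &&` guard. Now `read_iolog_get()` refills from either `read_blktrace()` or `read_iolog2()`, depending on the log type.

The sketch below models only that dispatch. The struct, the `load_chunk()` loader, its chunk size of 4 and the half-empty refill threshold are all hypothetical. fio sizes its chunks with `iolog_items_to_fetch()`, whose logic is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iolog fields of struct thread_data. */
struct log_state {
	bool blktrace;		/* plays the role of td->io_log_blktrace */
	unsigned int current;	/* entries queued, like td->io_log_current */
	unsigned int checkmark;	/* refill trigger, like td->io_log_checkmark */
	unsigned int left;	/* entries not yet read from the source log */
	unsigned int text_refills;
	unsigned int blktrace_refills;
};

/* Hypothetical loader: queue up to 'chunk' more entries and move the
 * refill trigger to half of what is now queued. */
static bool load_chunk(struct log_state *s, unsigned int chunk)
{
	unsigned int n = s->left < chunk ? s->left : chunk;

	s->left -= n;
	s->current += n;
	s->checkmark = s->current / 2;
	return n > 0 || s->current > 0;
}

static bool refill_text(struct log_state *s)
{
	s->text_refills++;
	return load_chunk(s, 4);
}

static bool refill_blktrace(struct log_state *s)
{
	s->blktrace_refills++;
	return load_chunk(s, 4);
}

/* Same shape as the patched read_iolog_get(): when the checkmark is
 * reached, refill from the parser that owns the log, then consume one
 * entry. Returns 0 when an entry was consumed, 1 when the log is done. */
static int consume_one(struct log_state *s)
{
	if (s->checkmark == s->current) {
		bool ok = s->blktrace ? refill_blktrace(s) : refill_text(s);

		if (!ok)
			return 1;
	}
	s->current--;
	return 0;
}
```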
diff --git a/iolog.h b/iolog.h
index 7d66b7c4..a3986309 100644
--- a/iolog.h
+++ b/iolog.h
@@ -254,6 +254,7 @@ extern void trim_io_piece(const struct io_u *);
 extern void queue_io_piece(struct thread_data *, struct io_piece *);
 extern void prune_io_piece_log(struct thread_data *);
 extern void write_iolog_close(struct thread_data *);
+int64_t iolog_items_to_fetch(struct thread_data *td);
 extern int iolog_compress_init(struct thread_data *, struct sk_out *);
 extern void iolog_compress_exit(struct thread_data *);
 extern size_t log_chunk_sizes(struct io_log *);
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 1dda93f2..4335faf9 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -16,6 +16,16 @@ int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 	int found = 0;
 	DIR *D;
 
+	/*
+	 * If replay_redirect is set then always return this device
+	 * upon lookup which overrides the device lookup based on
+	 * major minor in the actual blktrace
+	 */
+	if (redirect) {
+		strcpy(path, redirect);
+		return 1;
+	}
+
 	D = opendir(path);
 	if (!D)
 		return 0;
@@ -44,17 +54,6 @@ int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 		if (!S_ISBLK(st.st_mode))
 			continue;
 
-		/*
-		 * If replay_redirect is set then always return this device
-		 * upon lookup which overrides the device lookup based on
-		 * major minor in the actual blktrace
-		 */
-		if (redirect) {
-			strcpy(path, redirect);
-			found = 1;
-			break;
-		}
-
 		if (maj == major(st.st_rdev) && min == minor(st.st_rdev)) {
 			strcpy(path, full_path);
 			found = 1;

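With this change, `blktrace_lookup_device()` honors `replay_redirect` before it calls `opendir()`. Previously the redirect was applied only inside the directory loop, after an entry had passed the `S_ISBLK()` check. It therefore depended on the scan finding some block device first.

Below is a minimal sketch of the reordered control flow. The `lookup_device()` and `scan_for_dev()` helpers are hypothetical stand-ins for the real function and its major/minor directory walk.

```c
#include <stdio.h>

/* Hypothetical placeholder for the opendir()/readdir() walk that compares
 * each block device's major:minor with the traced device. */
static int scan_for_dev(char *path, size_t len, unsigned int maj, unsigned int min)
{
	(void) path;
	(void) len;
	(void) maj;
	(void) min;
	return 0;	/* this sketch never finds a match */
}

/* Returns 1 with path filled in on success, 0 if no device was found. */
static int lookup_device(const char *redirect, char *path, size_t len,
			 unsigned int maj, unsigned int min)
{
	/* A redirect overrides the major:minor lookup entirely, so it is
	 * applied before any directory is opened. */
	if (redirect) {
		snprintf(path, len, "%s", redirect);
		return 1;
	}

	return scan_for_dev(path, len, maj, min);
}
```

The sketch bounds the copy with `snprintf()`. The patch itself uses `strcpy()`, which relies on the caller's buffer being large enough.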

* Recent changes (master)
@ 2022-01-19 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-19 13:00 UTC
  To: fio

The following changes since commit ef37053efdfb8c3b8b6deef43c0969753e6adb44:

  init: do not create lat logs when not needed (2022-01-17 07:21:58 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 71efbed61dfb157dfa7fe550f500b53f9731e1cb:

  docs: documentation for sg WRITE STREAM(16) (2022-01-18 06:37:39 -0700)

----------------------------------------------------------------
Niklas Cassel (2):
      stat: remove duplicated code in show_mixed_ddir_status()
      stat: move unified=both mixed allocation and calculation to new helper

Vincent Fu (6):
      sg: add support for VERIFY command using write modes
      sg: add support for WRITE SAME(16) commands with NDOB flag set
      sg: improve sg_write_mode option names
      sg: add support for WRITE STREAM(16) commands
      sg: allow fio to open and close streams for WRITE STREAM(16) commands
      docs: documentation for sg WRITE STREAM(16)

 HOWTO                           |  36 +++++-
 engines/sg.c                    | 181 +++++++++++++++++++++++++++++--
 examples/sg_verify-fail.fio     |  48 ++++++++
 examples/sg_verify.fio          |  57 ++++++++++
 examples/sg_write_same_ndob.fio |  44 ++++++++
 fio.1                           |  47 +++++++-
 stat.c                          | 235 ++++++++--------------------------------
 7 files changed, 441 insertions(+), 207 deletions(-)
 create mode 100644 examples/sg_verify-fail.fio
 create mode 100644 examples/sg_verify.fio
 create mode 100644 examples/sg_write_same_ndob.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2956e50d..f9e7c857 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2496,11 +2496,13 @@ with the caveat that when used on the command line, they must come after the
 
 	**write**
 		This is the default where write opcodes are issued as usual.
-	**verify**
+	**write_and_verify**
 		Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
 		directs the device to carry out a medium verification with no data
 		comparison. The writefua option is ignored with this selection.
-	**same**
+	**verify**
+		This option is deprecated. Use write_and_verify instead.
+	**write_same**
 		Issue WRITE SAME commands. This transfers a single block to the device
 		and writes this same block of data to a contiguous sequence of LBAs
 		beginning at the specified offset. fio's block size parameter specifies
@@ -2511,6 +2513,36 @@ with the caveat that when used on the command line, they must come after the
 		for each command but only the first 512 bytes will be used and
 		transferred to the device. The writefua option is ignored with this
 		selection.
+	**same**
+		This option is deprecated. Use write_same instead.
+	**write_same_ndob**
+		Issue WRITE SAME(16) commands as above but with the No Data Output
+		Buffer (NDOB) bit set. No data will be transferred to the device with
+		this bit set. Data written will be a pre-determined pattern such as
+		all zeroes.
+	**write_stream**
+		Issue WRITE STREAM(16) commands. Use the **stream_id** option to specify
+		the stream identifier.
+	**verify_bytchk_00**
+		Issue VERIFY commands with BYTCHK set to 00. This directs the
+		device to carry out a medium verification with no data comparison.
+	**verify_bytchk_01**
+		Issue VERIFY commands with BYTCHK set to 01. This directs the device to
+		compare the data on the device with the data transferred to the device.
+	**verify_bytchk_11**
+		Issue VERIFY commands with BYTCHK set to 11. This transfers a
+		single block to the device and compares the contents of this block with the
+		data on the device beginning at the specified offset. fio's block size
+		parameter specifies the total amount of data compared with this command.
+		However, only one block (sector) worth of data is transferred to the device.
+		This is similar to the WRITE SAME command except that data is compared instead
+		of written.
+
+.. option:: stream_id=int : [sg]
+
+	Set the stream identifier for WRITE STREAM commands. If this is set to 0 (which is not
+	a valid stream identifier) fio will open a stream and then close it when done. Default
+	is 0.
 
 .. option:: hipri : [sg]
 
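The new `verify_bytchk_*` modes differ only in the BYTCHK field and in how much data accompanies the command. The sketch below builds a VERIFY CDB the same way the `fio_sgio_prep()` hunk further down does:

- Opcode 0x2f (VERIFY(10)) or 0x8f (VERIFY(16)), chosen by LBA.
- BYTCHK in bits 2:1 of byte 1, so 01b becomes 0x02 and 11b becomes 0x06.
- LBA and block count at the same offsets that `fio_sgio_rw_lba()` uses.

Here `MAX_10B_LBA` is assumed to be 0xFFFFFFFF, and `build_verify_cdb()` and `verify_dxfer_len()` are hypothetical names. As in the patch, the 10-byte path does not check that the block count fits in 16 bits.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the largest LBA fio addresses with a 10-byte CDB. */
#define MAX_10B_LBA 0xFFFFFFFFULL

enum bytchk { BYTCHK_00 = 0, BYTCHK_01 = 1, BYTCHK_11 = 3 };

/* Store the low 'nbytes' bytes of v big-endian at p. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (uint8_t) (v & 0xff);
		v >>= 8;
	}
}

/* Fill cdb with a VERIFY(10) or VERIFY(16) command; returns the CDB length. */
static int build_verify_cdb(uint8_t cdb[16], uint64_t lba, uint32_t nr_blocks,
			    enum bytchk b)
{
	memset(cdb, 0, 16);
	cdb[1] = (uint8_t) (b << 1);	/* BYTCHK lives in bits 2:1 of byte 1 */

	if (lba < MAX_10B_LBA) {
		cdb[0] = 0x2f;			/* VERIFY(10) */
		put_be(&cdb[2], lba, 4);	/* LBA: bytes 2-5 */
		put_be(&cdb[7], nr_blocks, 2);	/* length: bytes 7-8, not range checked */
		return 10;
	}

	cdb[0] = 0x8f;				/* VERIFY(16) */
	put_be(&cdb[2], lba, 8);		/* LBA: bytes 2-9 */
	put_be(&cdb[10], nr_blocks, 4);		/* length: bytes 10-13 */
	return 16;
}

/* Bytes sent with the command for each mode, per the HOWTO text above. */
static uint32_t verify_dxfer_len(enum bytchk b, uint32_t nr_blocks, uint32_t bs)
{
	switch (b) {
	case BYTCHK_00:
		return 0;		/* medium check only, no data */
	case BYTCHK_11:
		return bs;		/* one block compared against the whole range */
	default:
		return nr_blocks * bs;	/* BYTCHK_01: full range compared */
	}
}
```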
diff --git a/engines/sg.c b/engines/sg.c
index 1c019384..72ee07ba 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -66,8 +66,13 @@
 
 enum {
 	FIO_SG_WRITE		= 1,
-	FIO_SG_WRITE_VERIFY	= 2,
-	FIO_SG_WRITE_SAME	= 3
+	FIO_SG_WRITE_VERIFY,
+	FIO_SG_WRITE_SAME,
+	FIO_SG_WRITE_SAME_NDOB,
+	FIO_SG_WRITE_STREAM,
+	FIO_SG_VERIFY_BYTCHK_00,
+	FIO_SG_VERIFY_BYTCHK_01,
+	FIO_SG_VERIFY_BYTCHK_11,
 };
 
 struct sg_options {
@@ -76,6 +81,7 @@ struct sg_options {
 	unsigned int readfua;
 	unsigned int writefua;
 	unsigned int write_mode;
+	uint16_t stream_id;
 };
 
 static struct fio_option options[] = {
@@ -120,18 +126,58 @@ static struct fio_option options[] = {
 			    .oval = FIO_SG_WRITE,
 			    .help = "Issue standard SCSI WRITE commands",
 			  },
-			  { .ival = "verify",
+			  { .ival = "write_and_verify",
 			    .oval = FIO_SG_WRITE_VERIFY,
 			    .help = "Issue SCSI WRITE AND VERIFY commands",
 			  },
-			  { .ival = "same",
+			  { .ival = "verify",
+			    .oval = FIO_SG_WRITE_VERIFY,
+			    .help = "Issue SCSI WRITE AND VERIFY commands. This "
+				    "option is deprecated. Use write_and_verify instead.",
+			  },
+			  { .ival = "write_same",
 			    .oval = FIO_SG_WRITE_SAME,
 			    .help = "Issue SCSI WRITE SAME commands",
 			  },
+			  { .ival = "same",
+			    .oval = FIO_SG_WRITE_SAME,
+			    .help = "Issue SCSI WRITE SAME commands. This "
+				    "option is deprecated. Use write_same instead.",
+			  },
+			  { .ival = "write_same_ndob",
+			    .oval = FIO_SG_WRITE_SAME_NDOB,
+			    .help = "Issue SCSI WRITE SAME(16) commands with NDOB flag set",
+			  },
+			  { .ival = "verify_bytchk_00",
+			    .oval = FIO_SG_VERIFY_BYTCHK_00,
+			    .help = "Issue SCSI VERIFY commands with BYTCHK set to 00",
+			  },
+			  { .ival = "verify_bytchk_01",
+			    .oval = FIO_SG_VERIFY_BYTCHK_01,
+			    .help = "Issue SCSI VERIFY commands with BYTCHK set to 01",
+			  },
+			  { .ival = "verify_bytchk_11",
+			    .oval = FIO_SG_VERIFY_BYTCHK_11,
+			    .help = "Issue SCSI VERIFY commands with BYTCHK set to 11",
+			  },
+			  { .ival = "write_stream",
+			    .oval = FIO_SG_WRITE_STREAM,
+			    .help = "Issue SCSI WRITE STREAM(16) commands",
+			  },
 		},
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_SG,
 	},
+	{
+		.name	= "stream_id",
+		.lname	= "stream id for WRITE STREAM(16) commands",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct sg_options, stream_id),
+		.help	= "Stream ID for WRITE STREAM(16) commands",
+		.def	= "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_SG,
+	},
 	{
 		.name	= NULL,
 	},
@@ -171,6 +217,11 @@ struct sgio_data {
 #endif
 };
 
+static inline uint16_t sgio_get_be16(uint8_t *buf)
+{
+	return be16_to_cpu(*((uint16_t *) buf));
+}
+
 static inline uint32_t sgio_get_be32(uint8_t *buf)
 {
 	return be32_to_cpu(*((uint32_t *) buf));
@@ -502,9 +553,9 @@ static enum fio_q_status fio_sgio_doio(struct thread_data *td,
 }
 
 static void fio_sgio_rw_lba(struct sg_io_hdr *hdr, unsigned long long lba,
-			    unsigned long long nr_blocks)
+			    unsigned long long nr_blocks, bool override16)
 {
-	if (lba < MAX_10B_LBA) {
+	if (lba < MAX_10B_LBA && !override16) {
 		sgio_set_be32((uint32_t) lba, &hdr->cmdp[2]);
 		sgio_set_be16((uint16_t) nr_blocks, &hdr->cmdp[7]);
 	} else {
@@ -545,7 +596,7 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 		if (o->readfua)
 			hdr->cmdp[1] |= 0x08;
 
-		fio_sgio_rw_lba(hdr, lba, nr_blocks);
+		fio_sgio_rw_lba(hdr, lba, nr_blocks, false);
 
 	} else if (io_u->ddir == DDIR_WRITE) {
 		sgio_hdr_init(sd, hdr, io_u, 1);
@@ -576,9 +627,46 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 			else
 				hdr->cmdp[0] = 0x93; // write same(16)
 			break;
+		case FIO_SG_WRITE_SAME_NDOB:
+			hdr->cmdp[0] = 0x93; // write same(16)
+			hdr->cmdp[1] |= 0x1; // no data output buffer
+			hdr->dxfer_len = 0;
+			break;
+		case FIO_SG_WRITE_STREAM:
+			hdr->cmdp[0] = 0x9a; // write stream (16)
+			if (o->writefua)
+				hdr->cmdp[1] |= 0x08;
+			sgio_set_be64(lba, &hdr->cmdp[2]);
+			sgio_set_be16((uint16_t) io_u->file->engine_pos, &hdr->cmdp[10]);
+			sgio_set_be16((uint16_t) nr_blocks, &hdr->cmdp[12]);
+			break;
+		case FIO_SG_VERIFY_BYTCHK_00:
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x2f; // VERIFY(10)
+			else
+				hdr->cmdp[0] = 0x8f; // VERIFY(16)
+			hdr->dxfer_len = 0;
+			break;
+		case FIO_SG_VERIFY_BYTCHK_01:
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x2f; // VERIFY(10)
+			else
+				hdr->cmdp[0] = 0x8f; // VERIFY(16)
+			hdr->cmdp[1] |= 0x02;		// BYTCHK = 01b
+			break;
+		case FIO_SG_VERIFY_BYTCHK_11:
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x2f; // VERIFY(10)
+			else
+				hdr->cmdp[0] = 0x8f; // VERIFY(16)
+			hdr->cmdp[1] |= 0x06;		// BYTCHK = 11b
+			hdr->dxfer_len = sd->bs;
+			break;
 		};
 
-		fio_sgio_rw_lba(hdr, lba, nr_blocks);
+		if (o->write_mode != FIO_SG_WRITE_STREAM)
+			fio_sgio_rw_lba(hdr, lba, nr_blocks,
+				o->write_mode == FIO_SG_WRITE_SAME_NDOB);
 
 	} else if (io_u->ddir == DDIR_TRIM) {
 		struct sgio_trim *st;
@@ -970,9 +1058,60 @@ static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
+static int fio_sgio_stream_control(struct fio_file *f, bool open_stream, uint16_t *stream_id)
+{
+	struct sg_io_hdr hdr;
+	unsigned char cmd[16];
+	unsigned char sb[64];
+	unsigned char buf[8];
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+	memset(cmd, 0, sizeof(cmd));
+	memset(sb, 0, sizeof(sb));
+	memset(buf, 0, sizeof(buf));
+
+	hdr.interface_id = 'S';
+	hdr.cmdp = cmd;
+	hdr.cmd_len = 16;
+	hdr.sbp = sb;
+	hdr.mx_sb_len = sizeof(sb);
+	hdr.timeout = SCSI_TIMEOUT_MS;
+	hdr.cmdp[0] = 0x9e;
+	hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+	hdr.dxferp = buf;
+	hdr.dxfer_len = sizeof(buf);
+	sgio_set_be32(sizeof(buf), &hdr.cmdp[10]);
+
+	if (open_stream)
+		hdr.cmdp[1] = 0x34;
+	else {
+		hdr.cmdp[1] = 0x54;
+		sgio_set_be16(*stream_id, &hdr.cmdp[4]);
+	}
+
+	ret = ioctl(f->fd, SG_IO, &hdr);
+
+	if (ret < 0)
+		return ret;
+
+	if (hdr.info & SG_INFO_CHECK)
+		return 1;
+
+	if (open_stream) {
+		*stream_id = sgio_get_be16(&buf[4]);
+		dprint(FD_FILE, "sgio_stream_control: opened stream %u\n", (unsigned int) *stream_id);
+		assert(*stream_id != 0);
+	} else
+		dprint(FD_FILE, "sgio_stream_control: closed stream %u\n", (unsigned int) *stream_id);
+
+	return 0;
+}
+
 static int fio_sgio_open(struct thread_data *td, struct fio_file *f)
 {
 	struct sgio_data *sd = td->io_ops_data;
+	struct sg_options *o = td->eo;
 	int ret;
 
 	ret = generic_open_file(td, f);
@@ -984,9 +1123,33 @@ static int fio_sgio_open(struct thread_data *td, struct fio_file *f)
 		return ret;
 	}
 
+	if (o->write_mode == FIO_SG_WRITE_STREAM) {
+		if (o->stream_id)
+			f->engine_pos = o->stream_id;
+		else {
+			ret = fio_sgio_stream_control(f, true, (uint16_t *) &f->engine_pos);
+			if (ret)
+				return ret;
+		}
+	}
+
 	return 0;
 }
 
+int fio_sgio_close(struct thread_data *td, struct fio_file *f)
+{
+	struct sg_options *o = td->eo;
+	int ret;
+
+	if (!o->stream_id && o->write_mode == FIO_SG_WRITE_STREAM) {
+		ret = fio_sgio_stream_control(f, false, (uint16_t *) &f->engine_pos);
+		if (ret)
+			return ret;
+	}
+
+	return generic_close_file(td, f);
+}
+
 /*
  * Build an error string with details about the driver, host or scsi
  * error contained in the sg header Caller will use as necessary.
@@ -1261,7 +1424,7 @@ static struct ioengine_ops ioengine = {
 	.event		= fio_sgio_event,
 	.cleanup	= fio_sgio_cleanup,
 	.open_file	= fio_sgio_open,
-	.close_file	= generic_close_file,
+	.close_file	= fio_sgio_close,
 	.get_file_size	= fio_sgio_get_file_size,
 	.flags		= FIO_SYNCIO | FIO_RAWIO,
 	.options	= options,
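`fio_sgio_prep()` fills the WRITE STREAM(16) CDB by hand instead of going through `fio_sgio_rw_lba()`. `fio_sgio_stream_control()` issues STREAM CONTROL with byte 1 set to 0x34 (open) or 0x54 (close). A sketch of both encodings follows, and all function names in it are hypothetical.

The byte offsets come from the hunk above. Reading 0x34 and 0x54 as service action 0x14 plus a two-bit STR_CTL field in bits 6:5 (01b open, 10b close) is an interpretation of the SBC-4 STREAM CONTROL layout. Check it against the standard before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t) (v >> 8);
	p[1] = (uint8_t) (v & 0xff);
}

static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t) (v & 0xff);
		v >>= 8;
	}
}

/* WRITE STREAM(16): opcode 0x9a, FUA in byte 1 bit 3, LBA in bytes 2-9,
 * stream ID in bytes 10-11, transfer length in bytes 12-13. */
static void build_write_stream_cdb(uint8_t cdb[16], uint64_t lba,
				   uint16_t stream_id, uint16_t nr_blocks,
				   bool fua)
{
	memset(cdb, 0, 16);
	cdb[0] = 0x9a;
	if (fua)
		cdb[1] |= 0x08;
	put_be64(&cdb[2], lba);
	put_be16(&cdb[10], stream_id);
	put_be16(&cdb[12], nr_blocks);
}

/* STREAM CONTROL byte 1 under the interpretation above: service action
 * 0x14 in bits 4:0, STR_CTL in bits 6:5 (01b open, 10b close). */
static uint8_t stream_control_byte1(bool open_stream)
{
	return (uint8_t) (0x14 | ((open_stream ? 1u : 2u) << 5));
}
```

When `stream_id` is 0, the patch works as follows:

- `fio_sgio_open()` opens a stream and stores the returned ID in `f->engine_pos`.
- `fio_sgio_close()` closes that stream.

A non-zero `stream_id` is used as-is and is never closed by fio.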
diff --git a/examples/sg_verify-fail.fio b/examples/sg_verify-fail.fio
new file mode 100644
index 00000000..64feece3
--- /dev/null
+++ b/examples/sg_verify-fail.fio
@@ -0,0 +1,48 @@
+#
+# **********************************
+# * !!THIS IS A DESTRUCTIVE TEST!! *
+# * IF NOT CHANGED THIS TEST WILL  *
+# * DESTROY DATA ON /dev/sdb       *
+# **********************************
+#
+# Test SCSI VERIFY commands issued via the sg ioengine
+# The jobs with fail in the name should produce errors
+#
+# job			description
+# precon		precondition the device by writing with a known
+#			pattern
+# verify01		verify each block one at a time by comparing to known
+#			pattern
+# verify01-fail		verifying one too many blocks should produce a failure
+# verify11-one_ios	verify all 20 blocks by sending only 512 bytes
+# verify11-fail		verifying beyond the preconditioned region should
+#			produce a failure
+
+[global]
+filename=/dev/sdb
+buffer_pattern=0x01
+ioengine=sg
+rw=write
+bs=512
+number_ios=20
+stonewall
+
+[precon]
+
+[verify01]
+sg_write_mode=verify_bytchk_01
+number_ios=20
+
+[verify01-fail]
+sg_write_mode=verify_bytchk_01
+number_ios=21
+
+[verify11-one_ios]
+sg_write_mode=verify_bytchk_11
+number_ios=1
+bs=10240
+
+[verify11-fail]
+sg_write_mode=verify_bytchk_11
+number_ios=1
+bs=10752
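The expected outcomes of these jobs follow from simple sector arithmetic. Each job starts at offset 0 and covers `bs / 512 * number_ios` sectors, while `precon` wrote the 0x01 pattern to the first 20.

The sketch below makes two assumptions: logical blocks are 512 bytes, and the sectors beyond the preconditioned region do not happen to hold 0x01. `sectors_covered()` and `expect_pass()` are hypothetical helpers.

```c
#include <stdbool.h>

#define SECTOR_SIZE 512u	/* assumption: 512-byte logical blocks */
#define PRECON_SECTORS 20u	/* [precon]: bs=512, number_ios=20 */

/* LBAs touched by a job that starts at offset 0 and issues number_ios
 * sequential commands of bs bytes each. */
static unsigned int sectors_covered(unsigned int bs, unsigned int number_ios)
{
	return bs / SECTOR_SIZE * number_ios;
}

/* A VERIFY job comparing against the 0x01 pattern is expected to pass
 * only while it stays inside the preconditioned sectors. */
static bool expect_pass(unsigned int bs, unsigned int number_ios)
{
	return sectors_covered(bs, number_ios) <= PRECON_SECTORS;
}
```

For `verify_bytchk_11`, only one 512-byte block is transferred per command regardless of `bs`. So `verify11-fail` sends 512 bytes yet asks the device to compare 21 sectors.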
diff --git a/examples/sg_verify.fio b/examples/sg_verify.fio
new file mode 100644
index 00000000..6db0dd0a
--- /dev/null
+++ b/examples/sg_verify.fio
@@ -0,0 +1,57 @@
+#
+# **********************************
+# * !!THIS IS A DESTRUCTIVE TEST!! *
+# * IF NOT CHANGED THIS TEST WILL  *
+# * DESTROY DATA ON /dev/sdb       *
+# **********************************
+#
+# Test SCSI VERIFY commands issued via the sg ioengine
+# All of the jobs below should complete without error
+#
+# job			description
+# precon		precondition the device by writing with a known
+#			pattern
+# verify00		verify written data on medium only
+# verify01		verify each block one at a time by comparing to known
+#			pattern
+# verify01-two_ios	verify same data but with only two VERIFY operations
+# verify11		verify each block one at a time
+# verify11-five_ios	verify data with five IOs, four blocks at a time,
+#			sending 512 bytes for each IO
+# verify11-one_ios	verify all 20 blocks by sending only 512 bytes
+#
+
+[global]
+filename=/dev/sdb
+buffer_pattern=0x01
+ioengine=sg
+rw=write
+bs=512
+number_ios=20
+stonewall
+
+[precon]
+
+[verify00]
+sg_write_mode=verify_bytchk_00
+
+[verify01]
+sg_write_mode=verify_bytchk_01
+
+[verify01-two_ios]
+sg_write_mode=verify_bytchk_01
+bs=5120
+number_ios=2
+
+[verify11]
+sg_write_mode=verify_bytchk_11
+
+[verify11-five_ios]
+sg_write_mode=verify_bytchk_11
+bs=2048
+number_ios=5
+
+[verify11-one_ios]
+sg_write_mode=verify_bytchk_11
+bs=10240
+number_ios=1
diff --git a/examples/sg_write_same_ndob.fio b/examples/sg_write_same_ndob.fio
new file mode 100644
index 00000000..fb047319
--- /dev/null
+++ b/examples/sg_write_same_ndob.fio
@@ -0,0 +1,44 @@
+#
+# **********************************
+# * !!THIS IS A DESTRUCTIVE TEST!! *
+# * IF NOT CHANGED THIS TEST WILL  *
+# * DESTROY DATA ON /dev/sdb       *
+# **********************************
+#
+# Test WRITE SAME commands with the NDOB flag set
+# issued via the sg ioengine
+# All of the jobs below should complete without error
+# except the last one
+#
+# job			description
+# precon		Precondition the device by writing 20 blocks with a
+# 			known pattern
+# write_same_ndob	Write 19 sectors of all zeroes with the NDOB flag set
+# verify-pass		Verify 19 blocks of all zeroes
+# verify-fail		Verify 20 blocks of all zeroes. This should fail.
+#
+
+[global]
+filename=/dev/sdb
+buffer_pattern=0x01
+ioengine=sg
+rw=write
+bs=512
+stonewall
+
+[precon]
+number_ios=20
+
+[write_same_ndob]
+sg_write_mode=write_same_ndob
+number_ios=19
+
+[verify-pass]
+sg_write_mode=verify_bytchk_01
+buffer_pattern=0x00
+number_ios=19
+
+[verify-fail]
+sg_write_mode=verify_bytchk_01
+buffer_pattern=0x00
+number_ios=20
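A toy in-memory model of these four jobs shows why only `verify-fail` should fail. `write_same_ndob` zeroes sectors 0 to 18 without sending a buffer, which leaves sector 19 holding the 0x01 pattern from `precon`.

The model assumes the device's predetermined NDOB pattern is all zeroes, as the job file's comment says. The array and function names are hypothetical, and nothing here talks to a real SCSI device.

```c
#include <stdbool.h>
#include <string.h>

#define NSECT 20	/* sectors written by [precon] */
#define SECT 512	/* bytes per sector, matching bs=512 */

/* Toy model of the first 20 sectors of the device. */
static unsigned char disk[NSECT][SECT];

/* Plain writes with buffer_pattern: fill the first nsect sectors. */
static void write_pattern(unsigned int nsect, unsigned char pat)
{
	for (unsigned int i = 0; i < nsect; i++)
		memset(disk[i], pat, SECT);
}

/* WRITE SAME(16) with NDOB: no data buffer is sent; the model assumes
 * the device writes zeroes. */
static void write_same_ndob(unsigned int nsect)
{
	write_pattern(nsect, 0x00);
}

/* VERIFY with BYTCHK=01: compare each sector with the transferred pattern. */
static bool verify(unsigned int nsect, unsigned char pat)
{
	unsigned char expect[SECT];

	memset(expect, pat, SECT);
	for (unsigned int i = 0; i < nsect; i++)
		if (memcmp(disk[i], expect, SECT) != 0)
			return false;
	return true;
}
```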
diff --git a/fio.1 b/fio.1
index e0458c22..34aa874d 100644
--- a/fio.1
+++ b/fio.1
@@ -2284,7 +2284,7 @@ With writefua option set to 1, write operations include the force
 unit access (fua) flag. Default: 0.
 .TP
 .BI (sg)sg_write_mode \fR=\fPstr
-Specify the type of write commands to issue. This option can take three
+Specify the type of write commands to issue. This option can take multiple
 values:
 .RS
 .RS
@@ -2292,12 +2292,15 @@ values:
 .B write (default)
 Write opcodes are issued as usual
 .TP
+.B write_and_verify
+Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 00b. This directs the
+device to carry out a medium verification with no data comparison for the data
+that was written. The writefua option is ignored with this selection.
+.TP
 .B verify
-Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
-directs the device to carry out a medium verification with no data
-comparison. The writefua option is ignored with this selection.
+This option is deprecated. Use write_and_verify instead.
 .TP
-.B same
+.B write_same
 Issue WRITE SAME commands. This transfers a single block to the device
 and writes this same block of data to a contiguous sequence of LBAs
 beginning at the specified offset. fio's block size parameter
@@ -2308,9 +2311,43 @@ blocksize=8k will write 16 sectors with each command. fio will still
 generate 8k of data for each command butonly the first 512 bytes will
 be used and transferred to the device. The writefua option is ignored
 with this selection.
+.TP
+.B same
+This option is deprecated. Use write_same instead.
+.TP
+.B write_same_ndob
+Issue WRITE SAME(16) commands as above but with the No Data Output
+Buffer (NDOB) bit set. No data will be transferred to the device with
+this bit set. Data written will be a pre-determined pattern such as
+all zeroes.
+.TP
+.B write_stream
+Issue WRITE STREAM(16) commands. Use the stream_id option to specify
+the stream identifier.
+.TP
+.B verify_bytchk_00
+Issue VERIFY commands with BYTCHK set to 00. This directs the device to carry
+out a medium verification with no data comparison.
+.TP
+.B verify_bytchk_01
+Issue VERIFY commands with BYTCHK set to 01. This directs the device to
+compare the data on the device with the data transferred to the device.
+.TP
+.B verify_bytchk_11
+Issue VERIFY commands with BYTCHK set to 11. This transfers a single block to
+the device and compares the contents of this block with the data on the device
+beginning at the specified offset. fio's block size parameter specifies the
+total amount of data compared with this command. However, only one block
+(sector) worth of data is transferred to the device. This is similar to the
+WRITE SAME command except that data is compared instead of written.
 .RE
 .RE
 .TP
+.BI (sg)stream_id \fR=\fPint
+Set the stream identifier for WRITE STREAM commands. If this is set to 0 (which is not
+a valid stream identifier) fio will open a stream and then close it when done. Default
+is 0.
+.TP
 .BI (nbd)uri \fR=\fPstr
 Specify the NBD URI of the server to test.
 The string is a standard NBD URI (see
diff --git a/stat.c b/stat.c
index 36742a25..b08d2f25 100644
--- a/stat.c
+++ b/stat.c
@@ -462,173 +462,45 @@ static void display_lat(const char *name, unsigned long long min,
 	free(maxp);
 }
 
-static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, int mean)
+static struct thread_stat *gen_mixed_ddir_stats_from_ts(struct thread_stat *ts)
 {
-	double p_of_agg = 100.0;
-	if (rs && rs->agg[ddir] > 1024) {
-		p_of_agg = mean * 100.0 / (double) (rs->agg[ddir] / 1024.0);
-
-		if (p_of_agg > 100.0)
-			p_of_agg = 100.0;
-	}
-	return p_of_agg;
-}
-
-static void show_mixed_ddir_status(struct group_run_stats *rs,
-				   struct thread_stat *ts,
-				   struct buf_output *out)
-{
-	unsigned long runt;
-	unsigned long long min, max, bw, iops;
-	double mean, dev;
-	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
 	struct thread_stat *ts_lcl;
-	int i2p;
-	int ddir = 0;
 
 	/*
 	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
-	 * Trims (ddir = 2) */
+	 * Trims (ddir = 2)
+	 */
 	ts_lcl = malloc(sizeof(struct thread_stat));
-	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	/* calculate mixed stats  */
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
-	init_thread_stat_min_vals(ts_lcl);
-
-	sum_thread_stats(ts_lcl, ts);
-
-	assert(ddir_rw(ddir));
-
-	if (!ts_lcl->runtime[ddir]) {
-		free(ts_lcl);
-		return;
-	}
-
-	i2p = is_power_of_2(rs->kb_base);
-	runt = ts_lcl->runtime[ddir];
-
-	bw = (1000 * ts_lcl->io_bytes[ddir]) / runt;
-	io_p = num2str(ts_lcl->io_bytes[ddir], ts->sig_figs, 1, i2p, N2S_BYTE);
-	bw_p = num2str(bw, ts->sig_figs, 1, i2p, ts->unit_base);
-	bw_p_alt = num2str(bw, ts->sig_figs, 1, !i2p, ts->unit_base);
-
-	iops = (1000 * ts_lcl->total_io_u[ddir]) / runt;
-	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
-
-	log_buf(out, "  mixed: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
-			iops_p, bw_p, bw_p_alt, io_p,
-			(unsigned long long) ts_lcl->runtime[ddir],
-			post_st ? : "");
-
-	free(post_st);
-	free(io_p);
-	free(bw_p);
-	free(bw_p_alt);
-	free(iops_p);
-
-	if (calc_lat(&ts_lcl->slat_stat[ddir], &min, &max, &mean, &dev))
-		display_lat("slat", min, max, mean, dev, out);
-	if (calc_lat(&ts_lcl->clat_stat[ddir], &min, &max, &mean, &dev))
-		display_lat("clat", min, max, mean, dev, out);
-	if (calc_lat(&ts_lcl->lat_stat[ddir], &min, &max, &mean, &dev))
-		display_lat(" lat", min, max, mean, dev, out);
-	if (calc_lat(&ts_lcl->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
-		display_lat(ts_lcl->lat_percentiles ? "high prio_lat" : "high prio_clat",
-				min, max, mean, dev, out);
-		if (calc_lat(&ts_lcl->clat_low_prio_stat[ddir], &min, &max, &mean, &dev))
-			display_lat(ts_lcl->lat_percentiles ? "low prio_lat" : "low prio_clat",
-					min, max, mean, dev, out);
-	}
-
-	if (ts->slat_percentiles && ts_lcl->slat_stat[ddir].samples > 0)
-		show_clat_percentiles(ts_lcl->io_u_plat[FIO_SLAT][ddir],
-				ts_lcl->slat_stat[ddir].samples,
-				ts->percentile_list,
-				ts->percentile_precision, "slat", out);
-	if (ts->clat_percentiles && ts_lcl->clat_stat[ddir].samples > 0)
-		show_clat_percentiles(ts_lcl->io_u_plat[FIO_CLAT][ddir],
-				ts_lcl->clat_stat[ddir].samples,
-				ts->percentile_list,
-				ts->percentile_precision, "clat", out);
-	if (ts->lat_percentiles && ts_lcl->lat_stat[ddir].samples > 0)
-		show_clat_percentiles(ts_lcl->io_u_plat[FIO_LAT][ddir],
-				ts_lcl->lat_stat[ddir].samples,
-				ts->percentile_list,
-				ts->percentile_precision, "lat", out);
-
-	if (ts->clat_percentiles || ts->lat_percentiles) {
-		const char *name = ts->lat_percentiles ? "lat" : "clat";
-		char prio_name[32];
-		uint64_t samples;
-
-		if (ts->lat_percentiles)
-			samples = ts_lcl->lat_stat[ddir].samples;
-		else
-			samples = ts_lcl->clat_stat[ddir].samples;
-
-		/* Only print if high and low priority stats were collected */
-		if (ts_lcl->clat_high_prio_stat[ddir].samples > 0 &&
-				ts_lcl->clat_low_prio_stat[ddir].samples > 0) {
-			sprintf(prio_name, "high prio (%.2f%%) %s",
-					100. * (double) ts_lcl->clat_high_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts_lcl->io_u_plat_high_prio[ddir],
-					ts_lcl->clat_high_prio_stat[ddir].samples,
-					ts->percentile_list,
-					ts->percentile_precision, prio_name, out);
-
-			sprintf(prio_name, "low prio (%.2f%%) %s",
-					100. * (double) ts_lcl->clat_low_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts_lcl->io_u_plat_low_prio[ddir],
-					ts_lcl->clat_low_prio_stat[ddir].samples,
-					ts->percentile_list,
-					ts->percentile_precision, prio_name, out);
-		}
+	if (!ts_lcl) {
+		log_err("fio: failed to allocate local thread stat\n");
+		return NULL;
 	}
 
-	if (calc_lat(&ts_lcl->bw_stat[ddir], &min, &max, &mean, &dev)) {
-		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
-		const char *bw_str;
+	init_thread_stat(ts_lcl);
 
-		if ((rs->unit_base == 1) && i2p)
-			bw_str = "Kibit";
-		else if (rs->unit_base == 1)
-			bw_str = "kbit";
-		else if (i2p)
-			bw_str = "KiB";
-		else
-			bw_str = "kB";
+	/* calculate mixed stats  */
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
+	ts_lcl->lat_percentiles = ts->lat_percentiles;
+	ts_lcl->clat_percentiles = ts->clat_percentiles;
+	ts_lcl->slat_percentiles = ts->slat_percentiles;
+	ts_lcl->percentile_precision = ts->percentile_precision;
+	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
 
-		p_of_agg = convert_agg_kbytes_percent(rs, ddir, mean);
+	sum_thread_stats(ts_lcl, ts);
 
-		if (rs->unit_base == 1) {
-			min *= 8.0;
-			max *= 8.0;
-			mean *= 8.0;
-			dev *= 8.0;
-		}
+	return ts_lcl;
+}
 
-		if (mean > fkb_base * fkb_base) {
-			min /= fkb_base;
-			max /= fkb_base;
-			mean /= fkb_base;
-			dev /= fkb_base;
-			bw_str = (rs->unit_base == 1 ? "Mibit" : "MiB");
-		}
+static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, int mean)
+{
+	double p_of_agg = 100.0;
+	if (rs && rs->agg[ddir] > 1024) {
+		p_of_agg = mean * 100.0 / (double) (rs->agg[ddir] / 1024.0);
 
-		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, "
-			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
-			bw_str, min, max, p_of_agg, mean, dev,
-			(&ts_lcl->bw_stat[ddir])->samples);
-	}
-	if (calc_lat(&ts_lcl->iops_stat[ddir], &min, &max, &mean, &dev)) {
-		log_buf(out, "   iops        : min=%5llu, max=%5llu, "
-			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
-			min, max, mean, dev, (&ts_lcl->iops_stat[ddir])->samples);
+		if (p_of_agg > 100.0)
+			p_of_agg = 100.0;
 	}
-
-	free(ts_lcl);
+	return p_of_agg;
 }
 
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
@@ -797,6 +669,18 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	}
 }
 
+static void show_mixed_ddir_status(struct group_run_stats *rs,
+				   struct thread_stat *ts,
+				   struct buf_output *out)
+{
+	struct thread_stat *ts_lcl = gen_mixed_ddir_stats_from_ts(ts);
+
+	if (ts_lcl)
+		show_ddir_status(rs, ts_lcl, DDIR_READ, out);
+
+	free(ts_lcl);
+}
+
 static bool show_lat(double *io_u_lat, int nr, const char **ranges,
 		     const char *msg, struct buf_output *out)
 {
@@ -1462,27 +1346,11 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 				   struct group_run_stats *rs,
 				   int ver, struct buf_output *out)
 {
-	struct thread_stat *ts_lcl;
+	struct thread_stat *ts_lcl = gen_mixed_ddir_stats_from_ts(ts);
 
-	/*
-	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
-	 * Trims (ddir = 2)
-	 */
-	ts_lcl = malloc(sizeof(struct thread_stat));
-	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	/* calculate mixed stats  */
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
-	init_thread_stat_min_vals(ts_lcl);
-	ts_lcl->lat_percentiles = ts->lat_percentiles;
-	ts_lcl->clat_percentiles = ts->clat_percentiles;
-	ts_lcl->slat_percentiles = ts->slat_percentiles;
-	ts_lcl->percentile_precision = ts->percentile_precision;
-	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
-	
-	sum_thread_stats(ts_lcl, ts);
+	if (ts_lcl)
+		show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
 
-	/* add the aggregated stats to json parent */
-	show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
 	free(ts_lcl);
 }
 
@@ -1660,27 +1528,12 @@ static void add_ddir_status_json(struct thread_stat *ts,
 static void add_mixed_ddir_status_json(struct thread_stat *ts,
 		struct group_run_stats *rs, struct json_object *parent)
 {
-	struct thread_stat *ts_lcl;
-
-	/*
-	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
-	 * Trims (ddir = 2)
-	 */
-	ts_lcl = malloc(sizeof(struct thread_stat));
-	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	/* calculate mixed stats  */
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
-	init_thread_stat_min_vals(ts_lcl);
-	ts_lcl->lat_percentiles = ts->lat_percentiles;
-	ts_lcl->clat_percentiles = ts->clat_percentiles;
-	ts_lcl->slat_percentiles = ts->slat_percentiles;
-	ts_lcl->percentile_precision = ts->percentile_precision;
-	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
-
-	sum_thread_stats(ts_lcl, ts);
+	struct thread_stat *ts_lcl = gen_mixed_ddir_stats_from_ts(ts);
 
 	/* add the aggregated stats to json parent */
-	add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
+	if (ts_lcl)
+		add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
+
 	free(ts_lcl);
 }
 


* Recent changes (master)
@ 2022-01-18 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-18 13:00 UTC
  To: fio

The following changes since commit 016869bebe9bef7cae5a7f9dc0762162b0612226:

  stat: remove unnecessary bool parameter to sum_thread_stats() (2022-01-10 09:22:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ef37053efdfb8c3b8b6deef43c0969753e6adb44:

  init: do not create lat logs when not needed (2022-01-17 07:21:58 -0700)

----------------------------------------------------------------
Damien Le Moal (1):
      init: do not create lat logs when not needed

 init.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 5f069d9a..07daaa84 100644
--- a/init.c
+++ b/init.c
@@ -1586,17 +1586,23 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		else
 			suf = "log";
 
-		gen_log_name(logname, sizeof(logname), "lat", pre,
-				td->thread_number, suf, o->per_job_logs);
-		setup_log(&td->lat_log, &p, logname);
+		if (!o->disable_lat) {
+			gen_log_name(logname, sizeof(logname), "lat", pre,
+				     td->thread_number, suf, o->per_job_logs);
+			setup_log(&td->lat_log, &p, logname);
+		}
 
-		gen_log_name(logname, sizeof(logname), "slat", pre,
-				td->thread_number, suf, o->per_job_logs);
-		setup_log(&td->slat_log, &p, logname);
+		if (!o->disable_slat) {
+			gen_log_name(logname, sizeof(logname), "slat", pre,
+				     td->thread_number, suf, o->per_job_logs);
+			setup_log(&td->slat_log, &p, logname);
+		}
 
-		gen_log_name(logname, sizeof(logname), "clat", pre,
-				td->thread_number, suf, o->per_job_logs);
-		setup_log(&td->clat_log, &p, logname);
+		if (!o->disable_clat) {
+			gen_log_name(logname, sizeof(logname), "clat", pre,
+				     td->thread_number, suf, o->per_job_logs);
+			setup_log(&td->clat_log, &p, logname);
+		}
 
 	}
 


* Recent changes (master)
@ 2022-01-11 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-11 13:00 UTC
  To: fio

The following changes since commit b5e99df6ec605b4dc6a3488203f32d5c5bfce8df:

  engines/io_uring: don't set CQSIZE clamp unconditionally (2022-01-09 19:34:27 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 016869bebe9bef7cae5a7f9dc0762162b0612226:

  stat: remove unnecessary bool parameter to sum_thread_stats() (2022-01-10 09:22:14 -0700)

----------------------------------------------------------------
Niklas Cassel (1):
      stat: remove unnecessary bool parameter to sum_thread_stats()

 client.c      |  2 +-
 gclient.c     |  2 +-
 rate-submit.c |  2 +-
 stat.c        | 53 +++++++++++++++++++++++------------------------------
 stat.h        |  2 +-
 5 files changed, 27 insertions(+), 34 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 8b230617..be8411d8 100644
--- a/client.c
+++ b/client.c
@@ -1111,7 +1111,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 	if (sum_stat_clients <= 1)
 		return;
 
-	sum_thread_stats(&client_ts, &p->ts, sum_stat_nr == 1);
+	sum_thread_stats(&client_ts, &p->ts);
 	sum_group_stats(&client_gs, &p->rs);
 
 	client_ts.members++;
diff --git a/gclient.c b/gclient.c
index e0e0e7bf..ac063536 100644
--- a/gclient.c
+++ b/gclient.c
@@ -292,7 +292,7 @@ static void gfio_thread_status_op(struct fio_client *client,
 	if (sum_stat_clients == 1)
 		return;
 
-	sum_thread_stats(&client_ts, &p->ts, sum_stat_nr == 1);
+	sum_thread_stats(&client_ts, &p->ts);
 	sum_group_stats(&client_gs, &p->rs);
 
 	client_ts.members++;
diff --git a/rate-submit.c b/rate-submit.c
index 13dbe7a2..752c30a5 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -195,7 +195,7 @@ static void io_workqueue_exit_worker_fn(struct submit_worker *sw,
 	struct thread_data *td = sw->priv;
 
 	(*sum_cnt)++;
-	sum_thread_stats(&sw->wq->td->ts, &td->ts, *sum_cnt == 1);
+	sum_thread_stats(&sw->wq->td->ts, &td->ts);
 
 	fio_options_free(td);
 	close_and_free_files(td);
diff --git a/stat.c b/stat.c
index 99de1294..36742a25 100644
--- a/stat.c
+++ b/stat.c
@@ -495,7 +495,7 @@ static void show_mixed_ddir_status(struct group_run_stats *rs,
 	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
 	init_thread_stat_min_vals(ts_lcl);
 
-	sum_thread_stats(ts_lcl, ts, 1);
+	sum_thread_stats(ts_lcl, ts);
 
 	assert(ddir_rw(ddir));
 
@@ -1479,7 +1479,7 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 	ts_lcl->percentile_precision = ts->percentile_precision;
 	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
 	
-	sum_thread_stats(ts_lcl, ts, 1);
+	sum_thread_stats(ts_lcl, ts);
 
 	/* add the aggregated stats to json parent */
 	show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
@@ -1677,7 +1677,7 @@ static void add_mixed_ddir_status_json(struct thread_stat *ts,
 	ts_lcl->percentile_precision = ts->percentile_precision;
 	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
 
-	sum_thread_stats(ts_lcl, ts, 1);
+	sum_thread_stats(ts_lcl, ts);
 
 	/* add the aggregated stats to json parent */
 	add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
@@ -2089,9 +2089,10 @@ static void __sum_stat(struct io_stat *dst, struct io_stat *src, bool first)
  * numbers. For group_reporting, we should just add those up, not make
  * them the mean of everything.
  */
-static void sum_stat(struct io_stat *dst, struct io_stat *src, bool first,
-		     bool pure_sum)
+static void sum_stat(struct io_stat *dst, struct io_stat *src, bool pure_sum)
 {
+	bool first = dst->samples == 0;
+
 	if (src->samples == 0)
 		return;
 
@@ -2141,49 +2142,41 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->sig_figs = src->sig_figs;
 }
 
-void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
-		      bool first)
+void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 {
 	int k, l, m;
 
-	sum_stat(&dst->sync_stat, &src->sync_stat, first, false);
-
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (dst->unified_rw_rep != UNIFIED_MIXED) {
-			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
-			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
-			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], first, false);
-			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first, false);
-			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first, false);
-			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first, true);
-			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], first, true);
+			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], false);
+			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], false);
+			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], false);
+			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], false);
+			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], false);
+			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], true);
+			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], true);
 
 			dst->io_bytes[l] += src->io_bytes[l];
 
 			if (dst->runtime[l] < src->runtime[l])
 				dst->runtime[l] = src->runtime[l];
 		} else {
-			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], first, false);
-			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], first, false);
-			sum_stat(&dst->clat_low_prio_stat[0], &src->clat_low_prio_stat[l], first, false);
-			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first, false);
-			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first, false);
-			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first, true);
-			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], first, true);
+			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], false);
+			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], false);
+			sum_stat(&dst->clat_low_prio_stat[0], &src->clat_low_prio_stat[l], false);
+			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], false);
+			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], false);
+			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], true);
+			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], true);
 
 			dst->io_bytes[0] += src->io_bytes[l];
 
 			if (dst->runtime[0] < src->runtime[l])
 				dst->runtime[0] = src->runtime[l];
-
-			/*
-			 * We're summing to the same destination, so override
-			 * 'first' after the first iteration of the loop
-			 */
-			first = false;
 		}
 	}
 
+	sum_stat(&dst->sync_stat, &src->sync_stat, false);
 	dst->usr_time += src->usr_time;
 	dst->sys_time += src->sys_time;
 	dst->ctx += src->ctx;
@@ -2417,7 +2410,7 @@ void __show_run_stats(void)
 		for (k = 0; k < ts->nr_block_infos; k++)
 			ts->block_infos[k] = td->ts.block_infos[k];
 
-		sum_thread_stats(ts, &td->ts, idx == 1);
+		sum_thread_stats(ts, &td->ts);
 
 		if (td->o.ss_dur) {
 			ts->ss_state = td->ss.state;
diff --git a/stat.h b/stat.h
index 9ef8caa4..15ca4eff 100644
--- a/stat.h
+++ b/stat.h
@@ -325,7 +325,7 @@ extern void __show_run_stats(void);
 extern int __show_running_run_stats(void);
 extern void show_running_run_stats(void);
 extern void check_for_running_stats(void);
-extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, bool first);
+extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src);
 extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src);
 extern void init_thread_stat_min_vals(struct thread_stat *ts);
 extern void init_thread_stat(struct thread_stat *ts);


* Recent changes (master)
@ 2022-01-10 13:00 Jens Axboe
From: Jens Axboe @ 2022-01-10 13:00 UTC
  To: fio

The following changes since commit a3e33e2fc06582e4170f90ae6e62d6225d52dc7c:

  Merge branch 'github-actions-i686' of https://github.com/vincentkfu/fio (2021-12-23 16:27:33 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b5e99df6ec605b4dc6a3488203f32d5c5bfce8df:

  engines/io_uring: don't set CQSIZE clamp unconditionally (2022-01-09 19:34:27 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/io_uring: don't set CQSIZE clamp unconditionally

 engines/io_uring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 00ae3482..a2533c88 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -699,9 +699,15 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	p.flags |= IORING_SETUP_CQSIZE;
 	p.cq_entries = depth;
 
+retry:
 	ret = syscall(__NR_io_uring_setup, depth, &p);
-	if (ret < 0)
+	if (ret < 0) {
+		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
+			p.flags &= ~IORING_SETUP_CQSIZE;
+			goto retry;
+		}
 		return ret;
+	}
 
 	ld->ring_fd = ret;
 


* Recent changes (master)
@ 2021-12-24 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-24 13:00 UTC
  To: fio

The following changes since commit 9b46661c289d01dbfe5182189a7abea9ce2f9e04:

  Fio 3.29 (2021-12-18 07:09:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a3e33e2fc06582e4170f90ae6e62d6225d52dc7c:

  Merge branch 'github-actions-i686' of https://github.com/vincentkfu/fio (2021-12-23 16:27:33 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'github-actions-i686' of https://github.com/vincentkfu/fio

Vincent Fu (4):
      ci: workaround for problem with i686 builds
      Revert "ci: temporarily remove linux-i686-gcc build"
      t/io_uring: fix 32-bit build warnings
      t/io_uring: fix help defaults for aio and random_io

 .github/workflows/ci.yml | 4 ++++
 ci/actions-install.sh    | 5 ++++-
 t/io_uring.c             | 9 +++++----
 3 files changed, 13 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 8167e3d1..cd8ce142 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -14,6 +14,7 @@ jobs:
         - linux-gcc
         - linux-clang
         - macos
+        - linux-i686-gcc
         include:
         - build: linux-gcc
           os: ubuntu-20.04
@@ -23,6 +24,9 @@ jobs:
           cc: clang
         - build: macos
           os: macos-11
+        - build: linux-i686-gcc
+          os: ubuntu-20.04
+          arch: i686
 
     env:
       CI_TARGET_ARCH: ${{ matrix.arch }}
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
index 7408ccb4..b3486a47 100755
--- a/ci/actions-install.sh
+++ b/ci/actions-install.sh
@@ -31,14 +31,17 @@ DPKGCFG
     case "${CI_TARGET_ARCH}" in
         "i686")
             sudo dpkg --add-architecture i386
+            opts="--allow-downgrades"
             pkgs=("${pkgs[@]/%/:i386}")
             pkgs+=(
                 gcc-multilib
                 pkg-config:i386
                 zlib1g-dev:i386
+		libpcre2-8-0=10.34-7
             )
             ;;
         "x86_64")
+            opts=""
             pkgs+=(
                 libglusterfs-dev
                 libgoogle-perftools-dev
@@ -62,7 +65,7 @@ DPKGCFG
     echo "Updating APT..."
     sudo apt-get -qq update
     echo "Installing packages..."
-    sudo apt-get install -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
+    sudo apt-get install "$opts" -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
 }
 
 install_linux() {
diff --git a/t/io_uring.c b/t/io_uring.c
index a98f78fd..e8365a79 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -634,7 +634,8 @@ static int submitter_init(struct submitter *s)
 #ifdef CONFIG_LIBAIO
 static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocbs)
 {
-	unsigned long offset, data;
+	uint64_t data;
+	long long offset;
 	struct file *f;
 	unsigned index;
 	long r;
@@ -663,7 +664,7 @@ static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocb
 
 		data = f->fileno;
 		if (stats && stats_running)
-			data |= ((unsigned long) s->clock_index << 32);
+			data |= (((uint64_t) s->clock_index) << 32);
 		iocb->data = (void *) (uintptr_t) data;
 		index++;
 	}
@@ -676,7 +677,7 @@ static int reap_events_aio(struct submitter *s, struct io_event *events, int evs
 	int reaped = 0;
 
 	while (evs) {
-		unsigned long data = (uintptr_t) events[reaped].data;
+		uint64_t data = (uintptr_t) events[reaped].data;
 		struct file *f = &s->files[data & 0xffffffff];
 
 		f->pending_ios--;
@@ -1094,7 +1095,7 @@ static void usage(char *argv, int status)
 		" -a <bool> : Use legacy aio, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
 		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
-		stats, runtime == 0 ? "unlimited" : runtime_str, aio, random_io);
+		stats, runtime == 0 ? "unlimited" : runtime_str, random_io, aio);
 	exit(status);
 }
 


* Recent changes (master)
@ 2021-12-19 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-19 13:00 UTC
  To: fio

The following changes since commit e86afa536b175a90546e20d7d19f2418ee1bca78:

  stat: sum sync_stat before reassigning bool first (2021-12-15 08:45:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9b46661c289d01dbfe5182189a7abea9ce2f9e04:

  Fio 3.29 (2021-12-18 07:09:32 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      stat: code cleanup and leak free
      Fio 3.29

 FIO-VERSION-GEN |  2 +-
 stat.c          | 84 ++++++++++++++++++++++++++++++++++-----------------------
 2 files changed, 51 insertions(+), 35 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index e9d563c1..60f7bb21 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.28
+DEF_VER=fio-3.29
 
 LF='
 '
diff --git a/stat.c b/stat.c
index ec44c79e..99de1294 100644
--- a/stat.c
+++ b/stat.c
@@ -289,9 +289,10 @@ void show_mixed_group_stats(struct group_run_stats *rs, struct buf_output *out)
 {
 	char *io, *agg, *min, *max;
 	char *ioalt, *aggalt, *minalt, *maxalt;
-	uint64_t io_mix = 0, agg_mix = 0, min_mix = -1, max_mix = 0, min_run = -1, max_run = 0;
-	int i;
+	uint64_t io_mix = 0, agg_mix = 0, min_mix = -1, max_mix = 0;
+	uint64_t min_run = -1, max_run = 0;
 	const int i2p = is_power_of_2(rs->kb_base);
+	int i;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		if (!rs->max_run[i])
@@ -363,9 +364,9 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		free(minalt);
 		free(maxalt);
 	}
-	
+
 	/* Need to aggregate statisitics to show mixed values */
-	if (rs->unified_rw_rep == UNIFIED_BOTH) 
+	if (rs->unified_rw_rep == UNIFIED_BOTH)
 		show_mixed_group_stats(rs, out);
 }
 
@@ -473,30 +474,35 @@ static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, i
 	return p_of_agg;
 }
 
-static void show_mixed_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
-			     struct buf_output *out)
+static void show_mixed_ddir_status(struct group_run_stats *rs,
+				   struct thread_stat *ts,
+				   struct buf_output *out)
 {
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
 	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
 	struct thread_stat *ts_lcl;
-
 	int i2p;
 	int ddir = 0;
 
-	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	/*
+	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
+	 * Trims (ddir = 2) */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	/* calculate mixed stats  */
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
 	init_thread_stat_min_vals(ts_lcl);
 
 	sum_thread_stats(ts_lcl, ts, 1);
 
 	assert(ddir_rw(ddir));
 
-	if (!ts_lcl->runtime[ddir])
+	if (!ts_lcl->runtime[ddir]) {
+		free(ts_lcl);
 		return;
+	}
 
 	i2p = is_power_of_2(rs->kb_base);
 	runt = ts_lcl->runtime[ddir];
@@ -560,10 +566,9 @@ static void show_mixed_ddir_status(struct group_run_stats *rs, struct thread_sta
 		else
 			samples = ts_lcl->clat_stat[ddir].samples;
 
-		/* Only print this if some high and low priority stats were collected */
+		/* Only print if high and low priority stats were collected */
 		if (ts_lcl->clat_high_prio_stat[ddir].samples > 0 &&
-				ts_lcl->clat_low_prio_stat[ddir].samples > 0)
-		{
+				ts_lcl->clat_low_prio_stat[ddir].samples > 0) {
 			sprintf(prio_name, "high prio (%.2f%%) %s",
 					100. * (double) ts_lcl->clat_high_prio_stat[ddir].samples / (double) samples,
 					name);
@@ -1222,9 +1227,8 @@ void show_disk_util(int terse, struct json_object *parent,
 	if (!is_running_backend())
 		return;
 
-	if (flist_empty(&disk_list)) {
+	if (flist_empty(&disk_list))
 		return;
-	}
 
 	if ((output_format & FIO_OUTPUT_JSON) && parent)
 		do_json = true;
@@ -1234,9 +1238,9 @@ void show_disk_util(int terse, struct json_object *parent,
 	if (!terse && !do_json)
 		log_buf(out, "\nDisk stats (read/write):\n");
 
-	if (do_json)
+	if (do_json) {
 		json_object_add_disk_utils(parent, &disk_list);
-	else if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
+	} else if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
 		flist_for_each(entry, &disk_list) {
 			du = flist_entry(entry, struct disk_util, list);
 
@@ -1396,19 +1400,20 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	else
 		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
-	if (ts->lat_percentiles)
+	if (ts->lat_percentiles) {
 		len = calc_clat_percentiles(ts->io_u_plat[FIO_LAT][ddir],
 					ts->lat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
 					&minv);
-	else if (ts->clat_percentiles)
+	} else if (ts->clat_percentiles) {
 		len = calc_clat_percentiles(ts->io_u_plat[FIO_CLAT][ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
 					&minv);
-	else
+	} else {
 		len = 0;
-	
+	}
+
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
 		if (i >= len) {
 			log_buf(out, ";0%%=0");
@@ -1435,8 +1440,9 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 		}
 
 		log_buf(out, ";%llu;%llu;%f%%;%f;%f", min, max, p_of_agg, mean, dev);
-	} else
+	} else {
 		log_buf(out, ";%llu;%llu;%f%%;%f;%f", 0ULL, 0ULL, 0.0, 0.0, 0.0);
+	}
 
 	if (ver == 5) {
 		if (bw_stat)
@@ -1458,15 +1464,19 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 {
 	struct thread_stat *ts_lcl;
 
-	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	/*
+	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
+	 * Trims (ddir = 2)
+	 */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	/* calculate mixed stats  */
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
 	init_thread_stat_min_vals(ts_lcl);
 	ts_lcl->lat_percentiles = ts->lat_percentiles;
 	ts_lcl->clat_percentiles = ts->clat_percentiles;
 	ts_lcl->slat_percentiles = ts->slat_percentiles;
-	ts_lcl->percentile_precision = ts->percentile_precision;		
+	ts_lcl->percentile_precision = ts->percentile_precision;
 	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
 	
 	sum_thread_stats(ts_lcl, ts, 1);
@@ -1476,8 +1486,10 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 	free(ts_lcl);
 }
 
-static struct json_object *add_ddir_lat_json(struct thread_stat *ts, uint32_t percentiles,
-		struct io_stat *lat_stat, uint64_t *io_u_plat)
+static struct json_object *add_ddir_lat_json(struct thread_stat *ts,
+					     uint32_t percentiles,
+					     struct io_stat *lat_stat,
+					     uint64_t *io_u_plat)
 {
 	char buf[120];
 	double mean, dev;
@@ -1650,15 +1662,19 @@ static void add_mixed_ddir_status_json(struct thread_stat *ts,
 {
 	struct thread_stat *ts_lcl;
 
-	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	/*
+	 * Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and
+	 * Trims (ddir = 2)
+	 */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
-	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	/* calculate mixed stats  */
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;
 	init_thread_stat_min_vals(ts_lcl);
 	ts_lcl->lat_percentiles = ts->lat_percentiles;
 	ts_lcl->clat_percentiles = ts->clat_percentiles;
 	ts_lcl->slat_percentiles = ts->slat_percentiles;
-	ts_lcl->percentile_precision = ts->percentile_precision;		
+	ts_lcl->percentile_precision = ts->percentile_precision;
 	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
 
 	sum_thread_stats(ts_lcl, ts, 1);
@@ -2133,7 +2149,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	sum_stat(&dst->sync_stat, &src->sync_stat, first, false);
 
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
-		if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
+		if (dst->unified_rw_rep != UNIFIED_MIXED) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
 			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
 			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], first, false);
@@ -2188,7 +2204,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
 
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
-		if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
+		if (dst->unified_rw_rep != UNIFIED_MIXED) {
 			dst->total_io_u[k] += src->total_io_u[k];
 			dst->short_io_u[k] += src->short_io_u[k];
 			dst->drop_io_u[k] += src->drop_io_u[k];
@@ -2204,7 +2220,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	for (k = 0; k < FIO_LAT_CNT; k++)
 		for (l = 0; l < DDIR_RWDIR_CNT; l++)
 			for (m = 0; m < FIO_IO_U_PLAT_NR; m++)
-				if (!(dst->unified_rw_rep == UNIFIED_MIXED))
+				if (dst->unified_rw_rep != UNIFIED_MIXED)
 					dst->io_u_plat[k][l][m] += src->io_u_plat[k][l][m];
 				else
 					dst->io_u_plat[k][0][m] += src->io_u_plat[k][l][m];
@@ -2214,7 +2230,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
-			if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
+			if (dst->unified_rw_rep != UNIFIED_MIXED) {
 				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
 				dst->io_u_plat_low_prio[k][m] += src->io_u_plat_low_prio[k][m];
 			} else {


* Recent changes (master)
@ 2021-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-16 13:00 UTC
  To: fio

The following changes since commit 9ffe433d729101a34d9709030d7d4dd2444347ef:

  t/zbd: Avoid inappropriate blkzone command call in zone_cap_bs (2021-12-14 06:48:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e86afa536b175a90546e20d7d19f2418ee1bca78:

  stat: sum sync_stat before reassigning bool first (2021-12-15 08:45:32 -0700)

----------------------------------------------------------------
Niklas Cassel (1):
      stat: sum sync_stat before reassigning bool first

 stat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 7e84058d..ec44c79e 100644
--- a/stat.c
+++ b/stat.c
@@ -2130,6 +2130,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 {
 	int k, l, m;
 
+	sum_stat(&dst->sync_stat, &src->sync_stat, first, false);
+
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
@@ -2166,7 +2168,6 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		}
 	}
 
-	sum_stat(&dst->sync_stat, &src->sync_stat, first, false);
 	dst->usr_time += src->usr_time;
 	dst->sys_time += src->sys_time;
 	dst->ctx += src->ctx;


* Recent changes (master)
@ 2021-12-15 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-15 13:00 UTC
  To: fio

The following changes since commit 2ea393df3256e44398558c264f035f8db7656b08:

  Merge branch 'github-actions' of https://github.com/sitsofe/fio (2021-12-10 11:08:26 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9ffe433d729101a34d9709030d7d4dd2444347ef:

  t/zbd: Avoid inappropriate blkzone command call in zone_cap_bs (2021-12-14 06:48:14 -0700)

----------------------------------------------------------------
Damien Le Moal (11):
      fio: Improve documentation of ignore_zone_limits option
      zbd: define local functions as static
      zbd: move and cleanup code
      zbd: remove is_zone_open() helper
      zbd: introduce zbd_zone_align_file_sizes() helper
      zbd: fix code style issues
      zbd: simplify zbd_close_zone()
      zbd: simplify zbd_open_zone()
      zbd: rename zbd_zone_idx() and zbd_zone_nr()
      zbd: rename get_zone()
      zbd: introduce zbd_offset_to_zone() helper

Niklas Cassel (2):
      ci: temporarily remove linux-i686-gcc build
      ci: use macos 11 in virtual environment

Shin'ichiro Kawasaki (1):
      t/zbd: Avoid inappropriate blkzone command call in zone_cap_bs

 .github/workflows/ci.yml |   6 +-
 HOWTO                    |   6 +
 fio.1                    |   6 +-
 t/zbd/functions          |   6 +-
 zbd.c                    | 963 +++++++++++++++++++++++++----------------------
 5 files changed, 532 insertions(+), 455 deletions(-)

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index a766cfa8..8167e3d1 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -14,7 +14,6 @@ jobs:
         - linux-gcc
         - linux-clang
         - macos
-        - linux-i686-gcc
         include:
         - build: linux-gcc
           os: ubuntu-20.04
@@ -23,10 +22,7 @@ jobs:
           os: ubuntu-20.04
           cc: clang
         - build: macos
-          os: macos-10.15
-        - build: linux-i686-gcc
-          os: ubuntu-20.04
-          arch: i686
+          os: macos-11
 
     env:
       CI_TARGET_ARCH: ${{ matrix.arch }}
diff --git a/HOWTO b/HOWTO
index 8c9e4135..2956e50d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1063,6 +1063,12 @@ Target file/device
 	Limit on the number of simultaneously opened zones per single
 	thread/process.
 
+.. option:: ignore_zone_limits=bool
+	If this option is used, fio will ignore the maximum number of open
+	zones limit of the zoned block device in use, thus allowing the
+	option :option:`max_open_zones` value to be larger than the device
+	reported limit. Default: false.
+
 .. option:: zone_reset_threshold=float
 
 	A number between zero and one that indicates the ratio of logical
diff --git a/fio.1 b/fio.1
index a3ebb67d..e0458c22 100644
--- a/fio.1
+++ b/fio.1
@@ -838,9 +838,9 @@ threads/processes.
 Limit on the number of simultaneously opened zones per single thread/process.
 .TP
 .BI ignore_zone_limits \fR=\fPbool
-If this isn't set, fio will query the max open zones limit from the zoned block
-device, and exit if the specified \fBmax_open_zones\fR value is larger than the
-limit reported by the device. Default: false.
+If this option is used, fio will ignore the maximum number of open zones limit
+of the zoned block device in use, thus allowing the option \fBmax_open_zones\fR
+value to be larger than the device reported limit. Default: false.
 .TP
 .BI zone_reset_threshold \fR=\fPfloat
 A number between zero and one that indicates the ratio of logical blocks with
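
The old and new descriptions of ignore_zone_limits above can be summarised as a small decision function. The sketch below is hypothetical: effective_max_open_zones() and its signature are not part of fio. It assumes, as the zbd code does, that 0 means "no limit" for both the job's max_open_zones and the device-reported limit. In fio, the device limit is the value that zbd_get_max_open_zones() obtains from the I/O engine or the block layer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper modelled on the documented ignore_zone_limits
 * behaviour; not fio's implementation. 0 means "no limit" for both
 * arguments. Returns the limit to use, or -1 if the job must be rejected.
 */
long effective_max_open_zones(unsigned int requested, unsigned int device_limit,
			      bool ignore_zone_limits)
{
	/* Option set, or the device reports no limit: use the job value. */
	if (ignore_zone_limits || device_limit == 0)
		return requested;
	/* No job limit given: fall back to the device limit. */
	if (requested == 0)
		return device_limit;
	/* Job limit larger than the device limit: reject the job. */
	if (requested > device_limit)
		return -1;
	return requested;
}
```

With a device limit of 14, max_open_zones=32 is rejected unless ignore_zone_limits is set, while max_open_zones=8 is accepted. Whatever value results is still checked against fio's internal ZBD_MAX_OPEN_ZONES cap, as the zbd.c hunk further down shows.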
diff --git a/t/zbd/functions b/t/zbd/functions
index e4e248b9..7cff18fd 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -72,9 +72,11 @@ zone_cap_bs() {
 	local sed_str='s/.*len \([0-9A-Za-z]*\), cap \([0-9A-Za-z]*\).*/\1 \2/p'
 	local cap bs="$zone_size"
 
-	# When blkzone is not available or blkzone does not report capacity,
+	# When blkzone command is neither available nor relevant to the
+	# test device, or when blkzone command does not report capacity,
 	# assume that zone capacity is same as zone size for all zones.
-	if [ -z "${blkzone}" ] || ! blkzone_reports_capacity "${dev}"; then
+	if [ -z "${blkzone}" ] || [ -z "$is_zbd" ] || [ -c "$dev" ] ||
+		   ! blkzone_reports_capacity "${dev}"; then
 		echo "$zone_size"
 		return
 	fi
diff --git a/zbd.c b/zbd.c
index c18998c4..b1fd6b4b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -22,13 +22,126 @@
 #include "pshared.h"
 #include "zbd.h"
 
+static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
+{
+	return (uint64_t)(offset - f->file_offset) < f->io_size;
+}
+
+static inline unsigned int zbd_zone_idx(const struct fio_file *f,
+					struct fio_zone_info *zone)
+{
+	return zone - f->zbd_info->zone_info;
+}
+
+/**
+ * zbd_offset_to_zone_idx - convert an offset into a zone number
+ * @f: file pointer.
+ * @offset: offset in bytes. If this offset is in the first zone_size bytes
+ *	    past the disk size then the index of the sentinel is returned.
+ */
+static unsigned int zbd_offset_to_zone_idx(const struct fio_file *f,
+					   uint64_t offset)
+{
+	uint32_t zone_idx;
+
+	if (f->zbd_info->zone_size_log2 > 0)
+		zone_idx = offset >> f->zbd_info->zone_size_log2;
+	else
+		zone_idx = offset / f->zbd_info->zone_size;
+
+	return min(zone_idx, f->zbd_info->nr_zones);
+}
+
+/**
+ * zbd_zone_end - Return zone end location
+ * @z: zone info pointer.
+ */
+static inline uint64_t zbd_zone_end(const struct fio_zone_info *z)
+{
+	return (z+1)->start;
+}
+
+/**
+ * zbd_zone_capacity_end - Return zone capacity limit end location
+ * @z: zone info pointer.
+ */
+static inline uint64_t zbd_zone_capacity_end(const struct fio_zone_info *z)
+{
+	return z->start + z->capacity;
+}
+
+/**
+ * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
+ * @f: file pointer.
+ * @z: zone info pointer.
+ * @required: minimum number of bytes that must remain in a zone.
+ *
+ * The caller must hold z->mutex.
+ */
+static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
+			  uint64_t required)
+{
+	assert((required & 511) == 0);
+
+	return z->has_wp &&
+		z->wp + required > zbd_zone_capacity_end(z);
+}
+
+static void zone_lock(struct thread_data *td, const struct fio_file *f,
+		      struct fio_zone_info *z)
+{
+	struct zoned_block_device_info *zbd = f->zbd_info;
+	uint32_t nz = z - zbd->zone_info;
+
+	/* A thread should never lock zones outside its working area. */
+	assert(f->min_zone <= nz && nz < f->max_zone);
+
+	assert(z->has_wp);
+
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() has executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&z->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&z->mutex);
+	}
+}
+
+static inline void zone_unlock(struct fio_zone_info *z)
+{
+	int ret;
+
+	assert(z->has_wp);
+	ret = pthread_mutex_unlock(&z->mutex);
+	assert(!ret);
+}
+
+static inline struct fio_zone_info *zbd_get_zone(const struct fio_file *f,
+						 unsigned int zone_idx)
+{
+	return &f->zbd_info->zone_info[zone_idx];
+}
+
+static inline struct fio_zone_info *
+zbd_offset_to_zone(const struct fio_file *f,  uint64_t offset)
+{
+	return zbd_get_zone(f, zbd_offset_to_zone_idx(f, offset));
+}
+
 /**
  * zbd_get_zoned_model - Get a device zoned model
  * @td: FIO thread data
  * @f: FIO file for which to get model information
  */
-int zbd_get_zoned_model(struct thread_data *td, struct fio_file *f,
-			enum zbd_zoned_model *model)
+static int zbd_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			       enum zbd_zoned_model *model)
 {
 	int ret;
 
@@ -71,9 +184,9 @@ int zbd_get_zoned_model(struct thread_data *td, struct fio_file *f,
  * upon failure. If the zone report is empty, always assume an error (device
  * problem) and return -EIO.
  */
-int zbd_report_zones(struct thread_data *td, struct fio_file *f,
-		     uint64_t offset, struct zbd_zone *zones,
-		     unsigned int nr_zones)
+static int zbd_report_zones(struct thread_data *td, struct fio_file *f,
+			    uint64_t offset, struct zbd_zone *zones,
+			    unsigned int nr_zones)
 {
 	int ret;
 
@@ -105,8 +218,8 @@ int zbd_report_zones(struct thread_data *td, struct fio_file *f,
  * Reset the write pointer of all zones in the range @offset...@offset+@length.
  * Returns 0 upon success and a negative error code upon failure.
  */
-int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
-		 uint64_t offset, uint64_t length)
+static int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
+			uint64_t offset, uint64_t length)
 {
 	int ret;
 
@@ -124,131 +237,233 @@ int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
 }
 
 /**
- * zbd_get_max_open_zones - Get the maximum number of open zones
- * @td: FIO thread data
- * @f: FIO file for which to get max open zones
- * @max_open_zones: Upon success, result will be stored here.
- *
- * A @max_open_zones value set to zero means no limit.
+ * zbd_reset_zone - reset the write pointer of a single zone
+ * @td: FIO thread data.
+ * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @z: Zone to reset.
  *
  * Returns 0 upon success and a negative error code upon failure.
+ *
+ * The caller must hold z->mutex.
  */
-int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
-			   unsigned int *max_open_zones)
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			  struct fio_zone_info *z)
 {
-	int ret;
+	uint64_t offset = z->start;
+	uint64_t length = (z+1)->start - offset;
+	uint64_t data_in_zone = z->wp - z->start;
+	int ret = 0;
 
-	if (td->io_ops && td->io_ops->get_max_open_zones)
-		ret = td->io_ops->get_max_open_zones(td, f, max_open_zones);
-	else
-		ret = blkzoned_get_max_open_zones(td, f, max_open_zones);
-	if (ret < 0) {
-		td_verror(td, errno, "get max open zones failed");
-		log_err("%s: get max open zones failed (%d).\n",
-			f->file_name, errno);
+	if (!data_in_zone)
+		return 0;
+
+	assert(is_valid_offset(f, offset + length - 1));
+
+	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n",
+	       f->file_name, zbd_zone_idx(f, z));
+
+	switch (f->zbd_info->model) {
+	case ZBD_HOST_AWARE:
+	case ZBD_HOST_MANAGED:
+		ret = zbd_reset_wp(td, f, offset, length);
+		if (ret < 0)
+			return ret;
+		break;
+	default:
+		break;
 	}
 
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	f->zbd_info->sectors_with_data -= data_in_zone;
+	f->zbd_info->wp_sectors_with_data -= data_in_zone;
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+
+	z->wp = z->start;
+	z->verify_block = 0;
+
+	td->ts.nr_zone_resets++;
+
 	return ret;
 }
 
 /**
- * zbd_zone_idx - convert an offset into a zone number
- * @f: file pointer.
- * @offset: offset in bytes. If this offset is in the first zone_size bytes
- *	    past the disk size then the index of the sentinel is returned.
+ * zbd_close_zone - Remove a zone from the open zones array.
+ * @td: FIO thread data.
+ * @f: FIO file whose zbd_info->open_zones array is updated.
+ * @z: Zone to remove.
+ *
+ * The caller must hold f->zbd_info->mutex.
  */
-static uint32_t zbd_zone_idx(const struct fio_file *f, uint64_t offset)
+static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
+			   struct fio_zone_info *z)
 {
-	uint32_t zone_idx;
+	uint32_t ozi;
 
-	if (f->zbd_info->zone_size_log2 > 0)
-		zone_idx = offset >> f->zbd_info->zone_size_log2;
-	else
-		zone_idx = offset / f->zbd_info->zone_size;
+	if (!z->open)
+		return;
 
-	return min(zone_idx, f->zbd_info->nr_zones);
-}
+	for (ozi = 0; ozi < f->zbd_info->num_open_zones; ozi++) {
+		if (zbd_get_zone(f, f->zbd_info->open_zones[ozi]) == z)
+			break;
+	}
+	if (ozi == f->zbd_info->num_open_zones)
+		return;
 
-/**
- * zbd_zone_end - Return zone end location
- * @z: zone info pointer.
- */
-static inline uint64_t zbd_zone_end(const struct fio_zone_info *z)
-{
-	return (z+1)->start;
+	dprint(FD_ZBD, "%s: closing zone %u\n",
+	       f->file_name, zbd_zone_idx(f, z));
+
+	memmove(f->zbd_info->open_zones + ozi,
+		f->zbd_info->open_zones + ozi + 1,
+		(ZBD_MAX_OPEN_ZONES - (ozi + 1)) *
+		sizeof(f->zbd_info->open_zones[0]));
+
+	f->zbd_info->num_open_zones--;
+	td->num_open_zones--;
+	z->open = 0;
 }
 
 /**
- * zbd_zone_capacity_end - Return zone capacity limit end location
- * @z: zone info pointer.
+ * zbd_reset_zones - Reset a range of zones.
+ * @td: fio thread data.
+ * @f: fio file for which to reset zones
+ * @zb: first zone to reset.
+ * @ze: first zone not to reset.
+ *
+ * Returns 0 upon success and 1 upon failure.
  */
-static inline uint64_t zbd_zone_capacity_end(const struct fio_zone_info *z)
+static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
+			   struct fio_zone_info *const zb,
+			   struct fio_zone_info *const ze)
 {
-	return z->start + z->capacity;
+	struct fio_zone_info *z;
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
+	int res = 0;
+
+	assert(min_bs);
+
+	dprint(FD_ZBD, "%s: examining zones %u .. %u\n",
+	       f->file_name, zbd_zone_idx(f, zb), zbd_zone_idx(f, ze));
+
+	for (z = zb; z < ze; z++) {
+		if (!z->has_wp)
+			continue;
+
+		zone_lock(td, f, z);
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		zbd_close_zone(td, f, z);
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+
+		if (z->wp != z->start) {
+			dprint(FD_ZBD, "%s: resetting zone %u\n",
+			       f->file_name, zbd_zone_idx(f, z));
+			if (zbd_reset_zone(td, f, z) < 0)
+				res = 1;
+		}
+
+		zone_unlock(z);
+	}
+
+	return res;
 }
 
 /**
- * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
- * @f: file pointer.
- * @z: zone info pointer.
- * @required: minimum number of bytes that must remain in a zone.
+ * zbd_get_max_open_zones - Get the maximum number of open zones
+ * @td: FIO thread data
+ * @f: FIO file for which to get max open zones
+ * @max_open_zones: Upon success, result will be stored here.
  *
- * The caller must hold z->mutex.
+ * A @max_open_zones value set to zero means no limit.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
  */
-static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
-			  uint64_t required)
+static int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				  unsigned int *max_open_zones)
 {
-	assert((required & 511) == 0);
+	int ret;
 
-	return z->has_wp &&
-		z->wp + required > zbd_zone_capacity_end(z);
+	if (td->io_ops && td->io_ops->get_max_open_zones)
+		ret = td->io_ops->get_max_open_zones(td, f, max_open_zones);
+	else
+		ret = blkzoned_get_max_open_zones(td, f, max_open_zones);
+	if (ret < 0) {
+		td_verror(td, errno, "get max open zones failed");
+		log_err("%s: get max open zones failed (%d).\n",
+			f->file_name, errno);
+	}
+
+	return ret;
 }
 
-static void zone_lock(struct thread_data *td, const struct fio_file *f,
-		      struct fio_zone_info *z)
+/**
+ * zbd_open_zone - Add a zone to the array of open zones.
+ * @td: fio thread data.
+ * @f: fio file whose zbd_info->open_zones array is updated.
+ * @z: Zone to add.
+ *
+ * Open a ZBD zone if it is not already open. Returns true if either the zone
+ * was already open or if the zone was successfully added to the array of open
+ * zones without exceeding the maximum number of open zones. Returns false if
+ * the zone was not already open and opening the zone would cause the zone limit
+ * to be exceeded.
+ */
+static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
+			  struct fio_zone_info *z)
 {
-	struct zoned_block_device_info *zbd = f->zbd_info;
-	uint32_t nz = z - zbd->zone_info;
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
+	struct zoned_block_device_info *zbdi = f->zbd_info;
+	uint32_t zone_idx = zbd_zone_idx(f, z);
+	bool res = true;
 
-	/* A thread should never lock zones outside its working area. */
-	assert(f->min_zone <= nz && nz < f->max_zone);
+	if (z->cond == ZBD_ZONE_COND_OFFLINE)
+		return false;
 
-	assert(z->has_wp);
+	/*
+	 * Skip full zones with data verification enabled because resetting a
+	 * zone causes data loss and hence causes verification to fail.
+	 */
+	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+		return false;
 
 	/*
-	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
-	 * is changed or when io_u completes and zbd_put_io() executed.
-	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
-	 * other waiting for zone locks when building an io_u batch, first
-	 * only trylock the zone. If the zone is already locked by another job,
-	 * process the currently queued I/Os so that I/O progress is made and
-	 * zones unlocked.
+	 * zbdi->max_open_zones == 0 means that there is no limit on the maximum
+	 * number of open zones. In this case, do not track open zones in
+	 * zbdi->open_zones array.
 	 */
-	if (pthread_mutex_trylock(&z->mutex) != 0) {
-		if (!td_ioengine_flagged(td, FIO_SYNCIO))
-			io_u_quiesce(td);
-		pthread_mutex_lock(&z->mutex);
+	if (!zbdi->max_open_zones)
+		return true;
+
+	pthread_mutex_lock(&zbdi->mutex);
+
+	if (z->open) {
+		/*
+		 * If the zone is going to be completely filled by writes
+		 * already in-flight, handle it as a full zone instead of an
+		 * open zone.
+		 */
+		if (z->wp >= zbd_zone_capacity_end(z))
+			res = false;
+		goto out;
 	}
-}
 
-static inline void zone_unlock(struct fio_zone_info *z)
-{
-	int ret;
+	res = false;
+	/* Zero means no limit */
+	if (td->o.job_max_open_zones > 0 &&
+	    td->num_open_zones >= td->o.job_max_open_zones)
+		goto out;
+	if (zbdi->num_open_zones >= zbdi->max_open_zones)
+		goto out;
 
-	assert(z->has_wp);
-	ret = pthread_mutex_unlock(&z->mutex);
-	assert(!ret);
-}
+	dprint(FD_ZBD, "%s: opening zone %u\n",
+	       f->file_name, zone_idx);
 
-static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
-{
-	return (uint64_t)(offset - f->file_offset) < f->io_size;
-}
+	zbdi->open_zones[zbdi->num_open_zones++] = zone_idx;
+	td->num_open_zones++;
+	z->open = 1;
+	res = true;
 
-static inline struct fio_zone_info *get_zone(const struct fio_file *f,
-					     unsigned int zone_nr)
-{
-	return &f->zbd_info->zone_info[zone_nr];
+out:
+	pthread_mutex_unlock(&zbdi->mutex);
+	return res;
 }
 
 /* Verify whether direct I/O is used for all host-managed zoned drives. */
@@ -277,15 +492,91 @@ static bool zbd_is_seq_job(struct fio_file *f)
 	uint32_t zone_idx, zone_idx_b, zone_idx_e;
 
 	assert(f->zbd_info);
+
 	if (f->io_size == 0)
 		return false;
-	zone_idx_b = zbd_zone_idx(f, f->file_offset);
-	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size - 1);
+
+	zone_idx_b = zbd_offset_to_zone_idx(f, f->file_offset);
+	zone_idx_e =
+		zbd_offset_to_zone_idx(f, f->file_offset + f->io_size - 1);
 	for (zone_idx = zone_idx_b; zone_idx <= zone_idx_e; zone_idx++)
-		if (get_zone(f, zone_idx)->has_wp)
+		if (zbd_get_zone(f, zone_idx)->has_wp)
 			return true;
 
-	return false;
+	return false;
+}
+
+/*
+ * Verify whether the file offset and size parameters are aligned with zone
+ * boundaries. If the file offset is not aligned, align it down to the start of
+ * the zone containing the start offset and align up the file io_size parameter.
+ */
+static bool zbd_zone_align_file_sizes(struct thread_data *td,
+				      struct fio_file *f)
+{
+	const struct fio_zone_info *z;
+	uint64_t new_offset, new_end;
+
+	if (!f->zbd_info)
+		return true;
+	if (f->file_offset >= f->real_file_size)
+		return true;
+	if (!zbd_is_seq_job(f))
+		return true;
+
+	if (!td->o.zone_size) {
+		td->o.zone_size = f->zbd_info->zone_size;
+		if (!td->o.zone_size) {
+			log_err("%s: invalid 0 zone size\n",
+				f->file_name);
+			return false;
+		}
+	} else if (td->o.zone_size != f->zbd_info->zone_size) {
+		log_err("%s: zonesize %llu does not match the device zone size %"PRIu64".\n",
+			f->file_name, td->o.zone_size,
+			f->zbd_info->zone_size);
+		return false;
+	}
+
+	if (td->o.zone_skip % td->o.zone_size) {
+		log_err("%s: zoneskip %llu is not a multiple of the device zone size %llu.\n",
+			f->file_name, td->o.zone_skip,
+			td->o.zone_size);
+		return false;
+	}
+
+	z = zbd_offset_to_zone(f, f->file_offset);
+	if ((f->file_offset != z->start) &&
+	    (td->o.td_ddir != TD_DDIR_READ)) {
+		new_offset = zbd_zone_end(z);
+		if (new_offset >= f->file_offset + f->io_size) {
+			log_info("%s: io_size must be at least one zone\n",
+				 f->file_name);
+			return false;
+		}
+		log_info("%s: rounded up offset from %"PRIu64" to %"PRIu64"\n",
+			 f->file_name, f->file_offset,
+			 new_offset);
+		f->io_size -= (new_offset - f->file_offset);
+		f->file_offset = new_offset;
+	}
+
+	z = zbd_offset_to_zone(f, f->file_offset + f->io_size);
+	new_end = z->start;
+	if ((td->o.td_ddir != TD_DDIR_READ) &&
+	    (f->file_offset + f->io_size != new_end)) {
+		if (new_end <= f->file_offset) {
+			log_info("%s: io_size must be at least one zone\n",
+				 f->file_name);
+			return false;
+		}
+		log_info("%s: rounded down io_size from %"PRIu64" to %"PRIu64"\n",
+			 f->file_name, f->io_size,
+			 new_end - f->file_offset);
+		f->io_size = new_end - f->file_offset;
+	}
+
+	return true;
 }
 
 /*
@@ -293,74 +584,14 @@ static bool zbd_is_seq_job(struct fio_file *f)
  */
 static bool zbd_verify_sizes(void)
 {
-	const struct fio_zone_info *z;
 	struct thread_data *td;
 	struct fio_file *f;
-	uint64_t new_offset, new_end;
-	uint32_t zone_idx;
 	int i, j;
 
 	for_each_td(td, i) {
 		for_each_file(td, f, j) {
-			if (!f->zbd_info)
-				continue;
-			if (f->file_offset >= f->real_file_size)
-				continue;
-			if (!zbd_is_seq_job(f))
-				continue;
-
-			if (!td->o.zone_size) {
-				td->o.zone_size = f->zbd_info->zone_size;
-				if (!td->o.zone_size) {
-					log_err("%s: invalid 0 zone size\n",
-						f->file_name);
-					return false;
-				}
-			} else if (td->o.zone_size != f->zbd_info->zone_size) {
-				log_err("%s: job parameter zonesize %llu does not match disk zone size %"PRIu64".\n",
-					f->file_name, td->o.zone_size,
-					f->zbd_info->zone_size);
-				return false;
-			}
-
-			if (td->o.zone_skip % td->o.zone_size) {
-				log_err("%s: zoneskip %llu is not a multiple of the device zone size %llu.\n",
-					f->file_name, td->o.zone_skip,
-					td->o.zone_size);
+			if (!zbd_zone_align_file_sizes(td, f))
 				return false;
-			}
-
-			zone_idx = zbd_zone_idx(f, f->file_offset);
-			z = get_zone(f, zone_idx);
-			if ((f->file_offset != z->start) &&
-			    (td->o.td_ddir != TD_DDIR_READ)) {
-				new_offset = zbd_zone_end(z);
-				if (new_offset >= f->file_offset + f->io_size) {
-					log_info("%s: io_size must be at least one zone\n",
-						 f->file_name);
-					return false;
-				}
-				log_info("%s: rounded up offset from %"PRIu64" to %"PRIu64"\n",
-					 f->file_name, f->file_offset,
-					 new_offset);
-				f->io_size -= (new_offset - f->file_offset);
-				f->file_offset = new_offset;
-			}
-			zone_idx = zbd_zone_idx(f, f->file_offset + f->io_size);
-			z = get_zone(f, zone_idx);
-			new_end = z->start;
-			if ((td->o.td_ddir != TD_DDIR_READ) &&
-			    (f->file_offset + f->io_size != new_end)) {
-				if (new_end <= f->file_offset) {
-					log_info("%s: io_size must be at least one zone\n",
-						 f->file_name);
-					return false;
-				}
-				log_info("%s: rounded down io_size from %"PRIu64" to %"PRIu64"\n",
-					 f->file_name, f->io_size,
-					 new_end - f->file_offset);
-				f->io_size = new_end - f->file_offset;
-			}
 		}
 	}
 
@@ -385,6 +616,7 @@ static bool zbd_verify_bs(void)
 
 			if (!f->zbd_info)
 				continue;
+
 			zone_size = f->zbd_info->zone_size;
 			if (td_trim(td) && td->o.bs[DDIR_TRIM] != zone_size) {
 				log_info("%s: trim block size %llu is not the zone size %"PRIu64"\n",
@@ -529,8 +761,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		goto out;
 	}
 
-	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB\n", f->file_name,
-	       nr_zones, zone_size / 1024);
+	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB\n",
+	       f->file_name, nr_zones, zone_size / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -546,6 +778,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 						     PTHREAD_MUTEX_RECURSIVE);
 			p->start = z->start;
 			p->capacity = z->capacity;
+
 			switch (z->cond) {
 			case ZBD_ZONE_COND_NOT_WP:
 			case ZBD_ZONE_COND_FULL:
@@ -579,6 +812,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		offset = z->start + z->len;
 		if (j >= nr_zones)
 			break;
+
 		nrz = zbd_report_zones(td, f, offset, zones,
 				       min((uint32_t)(nr_zones - j),
 					   ZBD_REPORT_MAX_ZONES));
@@ -646,7 +880,8 @@ out:
 	/* Ensure that the limit is not larger than FIO's internal limit */
 	if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
 		td_verror(td, EINVAL, "'max_open_zones' value is too large");
-		log_err("'max_open_zones' value is larger than %u\n", ZBD_MAX_OPEN_ZONES);
+		log_err("'max_open_zones' value is larger than %u\n",
+			ZBD_MAX_OPEN_ZONES);
 		return -EINVAL;
 	}
 
@@ -748,14 +983,10 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 	ret = zbd_create_zone_info(td, file);
 	if (ret < 0)
 		td_verror(td, -ret, "zbd_create_zone_info() failed");
+
 	return ret;
 }
 
-static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
-			  uint32_t zone_idx);
-static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
-			  struct fio_zone_info *z);
-
 int zbd_init_files(struct thread_data *td)
 {
 	struct fio_file *f;
@@ -765,6 +996,7 @@ int zbd_init_files(struct thread_data *td)
 		if (zbd_init_zone_info(td, f))
 			return 1;
 	}
+
 	return 0;
 }
 
@@ -775,27 +1007,24 @@ void zbd_recalc_options_with_zone_granularity(struct thread_data *td)
 
 	for_each_file(td, f, i) {
 		struct zoned_block_device_info *zbd = f->zbd_info;
-		// zonemode=strided doesn't get per-file zone size.
-		uint64_t zone_size = zbd ? zbd->zone_size : td->o.zone_size;
+		uint64_t zone_size;
 
+		/* zonemode=strided doesn't get per-file zone size. */
+		zone_size = zbd ? zbd->zone_size : td->o.zone_size;
 		if (zone_size == 0)
 			continue;
 
-		if (td->o.size_nz > 0) {
+		if (td->o.size_nz > 0)
 			td->o.size = td->o.size_nz * zone_size;
-		}
-		if (td->o.io_size_nz > 0) {
+		if (td->o.io_size_nz > 0)
 			td->o.io_size = td->o.io_size_nz * zone_size;
-		}
-		if (td->o.start_offset_nz > 0) {
+		if (td->o.start_offset_nz > 0)
 			td->o.start_offset = td->o.start_offset_nz * zone_size;
-		}
-		if (td->o.offset_increment_nz > 0) {
-			td->o.offset_increment = td->o.offset_increment_nz * zone_size;
-		}
-		if (td->o.zone_skip_nz > 0) {
+		if (td->o.offset_increment_nz > 0)
+			td->o.offset_increment =
+				td->o.offset_increment_nz * zone_size;
+		if (td->o.zone_skip_nz > 0)
 			td->o.zone_skip = td->o.zone_skip_nz * zone_size;
-		}
 	}
 }
 
@@ -822,8 +1051,9 @@ int zbd_setup_files(struct thread_data *td)
 
 		assert(zbd);
 
-		f->min_zone = zbd_zone_idx(f, f->file_offset);
-		f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
+		f->min_zone = zbd_offset_to_zone_idx(f, f->file_offset);
+		f->max_zone =
+			zbd_offset_to_zone_idx(f, f->file_offset + f->io_size);
 
 		/*
 		 * When all zones in the I/O range are conventional, io_size
@@ -863,7 +1093,7 @@ int zbd_setup_files(struct thread_data *td)
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
 			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
 				continue;
-			if (zbd_open_zone(td, f, zi))
+			if (zbd_open_zone(td, f, z))
 				continue;
 			/*
 			 * If the number of open zones exceeds specified limits,
@@ -879,123 +1109,6 @@ int zbd_setup_files(struct thread_data *td)
 	return 0;
 }
 
-static inline unsigned int zbd_zone_nr(const struct fio_file *f,
-				       struct fio_zone_info *zone)
-{
-	return zone - f->zbd_info->zone_info;
-}
-
-/**
- * zbd_reset_zone - reset the write pointer of a single zone
- * @td: FIO thread data.
- * @f: FIO file associated with the disk for which to reset a write pointer.
- * @z: Zone to reset.
- *
- * Returns 0 upon success and a negative error code upon failure.
- *
- * The caller must hold z->mutex.
- */
-static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
-			  struct fio_zone_info *z)
-{
-	uint64_t offset = z->start;
-	uint64_t length = (z+1)->start - offset;
-	uint64_t data_in_zone = z->wp - z->start;
-	int ret = 0;
-
-	if (!data_in_zone)
-		return 0;
-
-	assert(is_valid_offset(f, offset + length - 1));
-
-	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
-		zbd_zone_nr(f, z));
-	switch (f->zbd_info->model) {
-	case ZBD_HOST_AWARE:
-	case ZBD_HOST_MANAGED:
-		ret = zbd_reset_wp(td, f, offset, length);
-		if (ret < 0)
-			return ret;
-		break;
-	default:
-		break;
-	}
-
-	pthread_mutex_lock(&f->zbd_info->mutex);
-	f->zbd_info->sectors_with_data -= data_in_zone;
-	f->zbd_info->wp_sectors_with_data -= data_in_zone;
-	pthread_mutex_unlock(&f->zbd_info->mutex);
-	z->wp = z->start;
-	z->verify_block = 0;
-
-	td->ts.nr_zone_resets++;
-
-	return ret;
-}
-
-/* The caller must hold f->zbd_info->mutex */
-static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   unsigned int zone_idx)
-{
-	uint32_t open_zone_idx = 0;
-
-	for (; open_zone_idx < f->zbd_info->num_open_zones; open_zone_idx++) {
-		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
-			break;
-	}
-	if (open_zone_idx == f->zbd_info->num_open_zones)
-		return;
-
-	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
-	memmove(f->zbd_info->open_zones + open_zone_idx,
-		f->zbd_info->open_zones + open_zone_idx + 1,
-		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
-		sizeof(f->zbd_info->open_zones[0]));
-	f->zbd_info->num_open_zones--;
-	td->num_open_zones--;
-	get_zone(f, zone_idx)->open = 0;
-}
-
-/*
- * Reset a range of zones. Returns 0 upon success and 1 upon failure.
- * @td: fio thread data.
- * @f: fio file for which to reset zones
- * @zb: first zone to reset.
- * @ze: first zone not to reset.
- */
-static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
-			   struct fio_zone_info *const zb,
-			   struct fio_zone_info *const ze)
-{
-	struct fio_zone_info *z;
-	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
-	int res = 0;
-
-	assert(min_bs);
-
-	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
-		zbd_zone_nr(f, zb), zbd_zone_nr(f, ze));
-	for (z = zb; z < ze; z++) {
-		uint32_t nz = zbd_zone_nr(f, z);
-
-		if (!z->has_wp)
-			continue;
-		zone_lock(td, f, z);
-		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, nz);
-		pthread_mutex_unlock(&f->zbd_info->mutex);
-		if (z->wp != z->start) {
-			dprint(FD_ZBD, "%s: resetting zone %u\n",
-			       f->file_name, zbd_zone_nr(f, z));
-			if (zbd_reset_zone(td, f, z) < 0)
-				res = 1;
-		}
-		zone_unlock(z);
-	}
-
-	return res;
-}
-
 /*
  * Reset zbd_info.write_cnt, the counter that counts down towards the next
  * zone reset.
@@ -1046,8 +1159,8 @@ static uint64_t zbd_process_swd(struct thread_data *td,
 	uint64_t swd = 0;
 	uint64_t wp_swd = 0;
 
-	zb = get_zone(f, f->min_zone);
-	ze = get_zone(f, f->max_zone);
+	zb = zbd_get_zone(f, f->min_zone);
+	ze = zbd_get_zone(f, f->max_zone);
 	for (z = zb; z < ze; z++) {
 		if (z->has_wp) {
 			zone_lock(td, f, z);
@@ -1055,6 +1168,7 @@ static uint64_t zbd_process_swd(struct thread_data *td,
 		}
 		swd += z->wp - z->start;
 	}
+
 	pthread_mutex_lock(&f->zbd_info->mutex);
 	switch (a) {
 	case CHECK_SWD:
@@ -1067,6 +1181,7 @@ static uint64_t zbd_process_swd(struct thread_data *td,
 		break;
 	}
 	pthread_mutex_unlock(&f->zbd_info->mutex);
+
 	for (z = zb; z < ze; z++)
 		if (z->has_wp)
 			zone_unlock(z);
@@ -1097,11 +1212,13 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	if (!f->zbd_info || !td_write(td))
 		return;
 
-	zb = get_zone(f, f->min_zone);
-	ze = get_zone(f, f->max_zone);
+	zb = zbd_get_zone(f, f->min_zone);
+	ze = zbd_get_zone(f, f->max_zone);
 	swd = zbd_process_swd(td, f, SET_SWD);
-	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
-	       swd);
+
+	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n",
+	       __func__, f->file_name, swd);
+
 	/*
 	 * If data verification is enabled reset the affected zones before
 	 * writing any data to avoid that a zone reset has to be issued while
@@ -1112,92 +1229,12 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	zbd_reset_write_cnt(td, f);
 }
 
-/* The caller must hold f->zbd_info->mutex. */
-static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
-			 unsigned int zone_idx)
-{
-	struct zoned_block_device_info *zbdi = f->zbd_info;
-	int i;
-
-	/* This function should never be called when zbdi->max_open_zones == 0 */
-	assert(zbdi->max_open_zones);
-	assert(td->o.job_max_open_zones == 0 || td->num_open_zones <= td->o.job_max_open_zones);
-	assert(td->o.job_max_open_zones <= zbdi->max_open_zones);
-	assert(zbdi->num_open_zones <= zbdi->max_open_zones);
-
-	for (i = 0; i < zbdi->num_open_zones; i++)
-		if (zbdi->open_zones[i] == zone_idx)
-			return true;
-
-	return false;
-}
-
-/*
- * Open a ZBD zone if it was not yet open. Returns true if either the zone was
- * already open or if opening a new zone is allowed. Returns false if the zone
- * was not yet open and opening a new zone would cause the zone limit to be
- * exceeded.
- */
-static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
-			  uint32_t zone_idx)
-{
-	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
-	struct zoned_block_device_info *zbdi = f->zbd_info;
-	struct fio_zone_info *z = get_zone(f, zone_idx);
-	bool res = true;
-
-	if (z->cond == ZBD_ZONE_COND_OFFLINE)
-		return false;
-
-	/*
-	 * Skip full zones with data verification enabled because resetting a
-	 * zone causes data loss and hence causes verification to fail.
-	 */
-	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
-		return false;
-
-	/*
-	 * zbdi->max_open_zones == 0 means that there is no limit on the maximum
-	 * number of open zones. In this case, do no track open zones in
-	 * zbdi->open_zones array.
-	 */
-	if (!zbdi->max_open_zones)
-		return true;
-
-	pthread_mutex_lock(&zbdi->mutex);
-	if (is_zone_open(td, f, zone_idx)) {
-		/*
-		 * If the zone is already open and going to be full by writes
-		 * in-flight, handle it as a full zone instead of an open zone.
-		 */
-		if (z->wp >= zbd_zone_capacity_end(z))
-			res = false;
-		goto out;
-	}
-	res = false;
-	/* Zero means no limit */
-	if (td->o.job_max_open_zones > 0 &&
-	    td->num_open_zones >= td->o.job_max_open_zones)
-		goto out;
-	if (zbdi->num_open_zones >= zbdi->max_open_zones)
-		goto out;
-	dprint(FD_ZBD, "%s: opening zone %d\n", f->file_name, zone_idx);
-	zbdi->open_zones[zbdi->num_open_zones++] = zone_idx;
-	td->num_open_zones++;
-	z->open = 1;
-	res = true;
-
-out:
-	pthread_mutex_unlock(&zbdi->mutex);
-	return res;
-}
-
 /* Return random zone index for one of the open zones. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
 {
-	return (io_u->offset - f->file_offset) * f->zbd_info->num_open_zones /
-		f->io_size;
+	return (io_u->offset - f->file_offset) *
+		f->zbd_info->num_open_zones / f->io_size;
 }
 
 static bool any_io_in_flight(void)
@@ -1244,13 +1281,15 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		 */
 		zone_idx = zbdi->open_zones[pick_random_zone_idx(f, io_u)];
 	} else {
-		zone_idx = zbd_zone_idx(f, io_u->offset);
+		zone_idx = zbd_offset_to_zone_idx(f, io_u->offset);
 	}
 	if (zone_idx < f->min_zone)
 		zone_idx = f->min_zone;
 	else if (zone_idx >= f->max_zone)
 		zone_idx = f->max_zone - 1;
-	dprint(FD_ZBD, "%s(%s): starting from zone %d (offset %lld, buflen %lld)\n",
+
+	dprint(FD_ZBD,
+	       "%s(%s): starting from zone %d (offset %lld, buflen %lld)\n",
 	       __func__, f->file_name, zone_idx, io_u->offset, io_u->buflen);
 
 	/*
@@ -1262,13 +1301,16 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	for (;;) {
 		uint32_t tmp_idx;
 
-		z = get_zone(f, zone_idx);
+		z = zbd_get_zone(f, zone_idx);
 		if (z->has_wp)
 			zone_lock(td, f, z);
+
 		pthread_mutex_lock(&zbdi->mutex);
+
 		if (z->has_wp) {
 			if (z->cond != ZBD_ZONE_COND_OFFLINE &&
-			    zbdi->max_open_zones == 0 && td->o.job_max_open_zones == 0)
+			    zbdi->max_open_zones == 0 &&
+			    td->o.job_max_open_zones == 0)
 				goto examine_zone;
 			if (zbdi->num_open_zones == 0) {
 				dprint(FD_ZBD, "%s(%s): no zones are open\n",
@@ -1278,14 +1320,15 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		}
 
 		/*
-		 * List of opened zones is per-device, shared across all threads.
-		 * Start with quasi-random candidate zone.
-		 * Ignore zones which don't belong to thread's offset/size area.
+		 * List of opened zones is per-device, shared across all
+		 * threads. Start with quasi-random candidate zone. Ignore
+		 * zones which don't belong to thread's offset/size area.
 		 */
 		open_zone_idx = pick_random_zone_idx(f, io_u);
 		assert(!open_zone_idx ||
 		       open_zone_idx < zbdi->num_open_zones);
 		tmp_idx = open_zone_idx;
+
 		for (i = 0; i < zbdi->num_open_zones; i++) {
 			uint32_t tmpz;
 
@@ -1302,9 +1345,12 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		dprint(FD_ZBD, "%s(%s): no candidate zone\n",
 			__func__, f->file_name);
+
 		pthread_mutex_unlock(&zbdi->mutex);
+
 		if (z->has_wp)
 			zone_unlock(z);
+
 		return NULL;
 
 found_candidate_zone:
@@ -1312,7 +1358,9 @@ found_candidate_zone:
 		if (new_zone_idx == zone_idx)
 			break;
 		zone_idx = new_zone_idx;
+
 		pthread_mutex_unlock(&zbdi->mutex);
+
 		if (z->has_wp)
 			zone_unlock(z);
 	}
@@ -1343,7 +1391,8 @@ open_other_zone:
 	 * zone close before opening a new zone.
 	 */
 	if (wait_zone_close) {
-		dprint(FD_ZBD, "%s(%s): quiesce to allow open zones to close\n",
+		dprint(FD_ZBD,
+		       "%s(%s): quiesce to allow open zones to close\n",
 		       __func__, f->file_name);
 		io_u_quiesce(td);
 	}
@@ -1358,7 +1407,7 @@ retry:
 		if (!is_valid_offset(f, z->start)) {
 			/* Wrap-around. */
 			zone_idx = f->min_zone;
-			z = get_zone(f, zone_idx);
+			z = zbd_get_zone(f, zone_idx);
 		}
 		assert(is_valid_offset(f, z->start));
 		if (!z->has_wp)
@@ -1366,7 +1415,7 @@ retry:
 		zone_lock(td, f, z);
 		if (z->open)
 			continue;
-		if (zbd_open_zone(td, f, zone_idx))
+		if (zbd_open_zone(td, f, z))
 			goto out;
 	}
 
@@ -1381,7 +1430,7 @@ retry:
 		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
 
-		z = get_zone(f, zone_idx);
+		z = zbd_get_zone(f, zone_idx);
 
 		zone_lock(td, f, z);
 		if (z->wp + min_bs <= zbd_zone_capacity_end(z))
@@ -1396,7 +1445,8 @@ retry:
 	 */
 	in_flight = any_io_in_flight();
 	if (in_flight || should_retry) {
-		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
+		dprint(FD_ZBD,
+		       "%s(%s): wait zone close and retry open zones\n",
 		       __func__, f->file_name);
 		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
@@ -1407,17 +1457,22 @@ retry:
 	}
 
 	pthread_mutex_unlock(&zbdi->mutex);
+
 	zone_unlock(z);
-	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
-	       f->file_name);
+
+	dprint(FD_ZBD, "%s(%s): did not open another zone\n",
+	       __func__, f->file_name);
+
 	return NULL;
 
 out:
-	dprint(FD_ZBD, "%s(%s): returning zone %d\n", __func__, f->file_name,
-	       zone_idx);
+	dprint(FD_ZBD, "%s(%s): returning zone %d\n",
+	       __func__, f->file_name, zone_idx);
+
 	io_u->offset = z->start;
 	assert(z->has_wp);
 	assert(z->cond != ZBD_ZONE_COND_OFFLINE);
+
 	return z;
 }
 
@@ -1429,25 +1484,27 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	const struct fio_file *f = io_u->file;
 	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 
-	if (!zbd_open_zone(td, f, zbd_zone_nr(f, z))) {
+	if (!zbd_open_zone(td, f, z)) {
 		zone_unlock(z);
 		z = zbd_convert_to_open_zone(td, io_u);
 		assert(z);
 	}
 
 	if (z->verify_block * min_bs >= z->capacity) {
-		log_err("%s: %d * %"PRIu64" >= %"PRIu64"\n", f->file_name, z->verify_block,
-			min_bs, z->capacity);
+		log_err("%s: %d * %"PRIu64" >= %"PRIu64"\n",
+			f->file_name, z->verify_block, min_bs, z->capacity);
 		/*
 		 * If the assertion below fails during a test run, adding
 		 * "--experimental_verify=1" to the command line may help.
 		 */
 		assert(false);
 	}
+
 	io_u->offset = z->start + z->verify_block * min_bs;
 	if (io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
-		log_err("%s: %llu + %llu >= %"PRIu64"\n", f->file_name, io_u->offset,
-			io_u->buflen, zbd_zone_capacity_end(z));
+		log_err("%s: %llu + %llu >= %"PRIu64"\n",
+			f->file_name, io_u->offset, io_u->buflen,
+			zbd_zone_capacity_end(z));
 		assert(false);
 	}
 	z->verify_block += io_u->buflen / min_bs;
@@ -1468,7 +1525,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
 {
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z1, *z2;
-	const struct fio_zone_info *const zf = get_zone(f, f->min_zone);
+	const struct fio_zone_info *const zf = zbd_get_zone(f, f->min_zone);
 
 	/*
 	 * Skip to the next non-empty zone in case of sequential I/O and to
@@ -1485,6 +1542,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
 		} else if (!td_random(td)) {
 			break;
 		}
+
 		if (td_random(td) && z2 >= zf &&
 		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
 			if (z2->has_wp)
@@ -1495,8 +1553,11 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
 				zone_unlock(z2);
 		}
 	}
-	dprint(FD_ZBD, "%s: no zone has %"PRIu64" bytes of readable data\n",
+
+	dprint(FD_ZBD,
+	       "%s: no zone has %"PRIu64" bytes of readable data\n",
 	       f->file_name, min_bytes);
+
 	return NULL;
 }
 
@@ -1517,7 +1578,7 @@ static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
 	if (io_u->ddir == DDIR_WRITE &&
 	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, zbd_zone_nr(f, z));
+		zbd_close_zone(td, f, z);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 	}
 }
@@ -1537,15 +1598,11 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
-	uint32_t zone_idx;
 	uint64_t zone_end;
 
 	assert(zbd_info);
 
-	zone_idx = zbd_zone_idx(f, io_u->offset);
-	assert(zone_idx < zbd_info->nr_zones);
-	z = get_zone(f, zone_idx);
-
+	z = zbd_offset_to_zone(f, io_u->offset);
 	assert(z->has_wp);
 
 	if (!success)
@@ -1553,17 +1610,18 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 
 	dprint(FD_ZBD,
 	       "%s: queued I/O (%lld, %llu) for zone %u\n",
-	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
+	       f->file_name, io_u->offset, io_u->buflen, zbd_zone_idx(f, z));
 
 	switch (io_u->ddir) {
 	case DDIR_WRITE:
 		zone_end = min((uint64_t)(io_u->offset + io_u->buflen),
 			       zbd_zone_capacity_end(z));
-		pthread_mutex_lock(&zbd_info->mutex);
+
 		/*
 		 * z->wp > zone_end means that one or more I/O errors
 		 * have occurred.
 		 */
+		pthread_mutex_lock(&zbd_info->mutex);
 		if (z->wp <= zone_end) {
 			zbd_info->sectors_with_data += zone_end - z->wp;
 			zbd_info->wp_sectors_with_data += zone_end - z->wp;
@@ -1595,19 +1653,15 @@ static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
-	uint32_t zone_idx;
 
 	assert(zbd_info);
 
-	zone_idx = zbd_zone_idx(f, io_u->offset);
-	assert(zone_idx < zbd_info->nr_zones);
-	z = get_zone(f, zone_idx);
-
+	z = zbd_offset_to_zone(f, io_u->offset);
 	assert(z->has_wp);
 
 	dprint(FD_ZBD,
 	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
-	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
+	       f->file_name, io_u->offset, io_u->buflen, zbd_zone_idx(f, z));
 
 	zbd_end_zone_io(td, io_u, z);
 
@@ -1649,28 +1703,26 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	enum fio_ddir ddir = io_u->ddir;
 	struct fio_zone_info *z;
-	uint32_t zone_idx;
 
 	assert(td->o.zone_mode == ZONE_MODE_ZBD);
 	assert(td->o.zone_size);
 	assert(f->zbd_info);
 
-	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
-	z = get_zone(f, zone_idx);
+	z = zbd_offset_to_zone(f, f->last_pos[ddir]);
 
 	/*
 	 * When the zone capacity is smaller than the zone size and the I/O is
 	 * sequential write, skip to zone end if the latest position is at the
 	 * zone capacity limit.
 	 */
-	if (z->capacity < f->zbd_info->zone_size && !td_random(td) &&
-	    ddir == DDIR_WRITE &&
+	if (z->capacity < f->zbd_info->zone_size &&
+	    !td_random(td) && ddir == DDIR_WRITE &&
 	    f->last_pos[ddir] >= zbd_zone_capacity_end(z)) {
 		dprint(FD_ZBD,
 		       "%s: Jump from zone capacity limit to zone end:"
 		       " (%"PRIu64" -> %"PRIu64") for zone %u (%"PRIu64")\n",
 		       f->file_name, f->last_pos[ddir],
-		       zbd_zone_end(z), zone_idx, z->capacity);
+		       zbd_zone_end(z), zbd_zone_idx(f, z), z->capacity);
 		td->io_skip_bytes += zbd_zone_end(z) - f->last_pos[ddir];
 		f->last_pos[ddir] = zbd_zone_end(z);
 	}
@@ -1751,7 +1803,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbdi = f->zbd_info;
-	uint32_t zone_idx_b;
 	struct fio_zone_info *zb, *zl, *orig_zb;
 	uint32_t orig_len = io_u->buflen;
 	uint64_t min_bs = td->o.min_bs[io_u->ddir];
@@ -1762,14 +1813,15 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	assert(min_bs);
 	assert(is_valid_offset(f, io_u->offset));
 	assert(io_u->buflen);
-	zone_idx_b = zbd_zone_idx(f, io_u->offset);
-	zb = get_zone(f, zone_idx_b);
+
+	zb = zbd_offset_to_zone(f, io_u->offset);
 	orig_zb = zb;
 
 	if (!zb->has_wp) {
 		/* Accept non-write I/Os for conventional zones. */
 		if (io_u->ddir != DDIR_WRITE)
 			return io_u_accept;
+
 		/*
 		 * Make sure that writes to conventional zones
 		 * don't cross over to any sequential zones.
@@ -1783,12 +1835,16 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       "%s: off=%llu + min_bs=%"PRIu64" > next zone %"PRIu64"\n",
 			       f->file_name, io_u->offset,
 			       min_bs, (zb + 1)->start);
-			io_u->offset = zb->start + (zb + 1)->start - io_u->offset;
-			new_len = min(io_u->buflen, (zb + 1)->start - io_u->offset);
+			io_u->offset =
+				zb->start + (zb + 1)->start - io_u->offset;
+			new_len = min(io_u->buflen,
+				      (zb + 1)->start - io_u->offset);
 		} else {
 			new_len = (zb + 1)->start - io_u->offset;
 		}
+
 		io_u->buflen = new_len / min_bs * min_bs;
+
 		return io_u_accept;
 	}
 
@@ -1810,6 +1866,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			zb = zbd_replay_write_order(td, io_u, zb);
 			goto accept;
 		}
+
 		/*
 		 * Check that there is enough written data in the zone to do an
 		 * I/O of at least min_bs B. If there isn't, find a new zone for
@@ -1820,7 +1877,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		if (range < min_bs ||
 		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
 			zone_unlock(zb);
-			zl = get_zone(f, f->max_zone);
+			zl = zbd_get_zone(f, f->max_zone);
 			zb = zbd_find_zone(td, io_u, min_bs, zb, zl);
 			if (!zb) {
 				dprint(FD_ZBD,
@@ -1839,6 +1896,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			if (!td_random(td))
 				io_u->offset = zb->start;
 		}
+
 		/*
 		 * Make sure the I/O is within the zone valid data range while
 		 * maximizing the I/O size and preserving randomness.
@@ -1849,12 +1907,14 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			io_u->offset = zb->start +
 				((io_u->offset - orig_zb->start) %
 				 (range - io_u->buflen)) / min_bs * min_bs;
+
 		/*
 		 * When zbd_find_zone() returns a conventional zone,
 		 * we can simply accept the new i/o offset here.
 		 */
 		if (!zb->has_wp)
 			return io_u_accept;
+
 		/*
 		 * Make sure the I/O does not cross over the zone wp position.
 		 */
@@ -1866,9 +1926,12 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			dprint(FD_IO, "Changed length from %u into %llu\n",
 			       orig_len, io_u->buflen);
 		}
+
 		assert(zb->start <= io_u->offset);
 		assert(io_u->offset + io_u->buflen <= zb->wp);
+
 		goto accept;
+
 	case DDIR_WRITE:
 		if (io_u->buflen > zbdi->zone_size) {
 			td_verror(td, EINVAL, "I/O buflen exceeds zone size");
@@ -1877,7 +1940,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       f->file_name, io_u->buflen, zbdi->zone_size);
 			goto eof;
 		}
-		if (!zbd_open_zone(td, f, zone_idx_b)) {
+
+		if (!zbd_open_zone(td, f, zb)) {
 			zone_unlock(zb);
 			zb = zbd_convert_to_open_zone(td, io_u);
 			if (!zb) {
@@ -1886,14 +1950,14 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				goto eof;
 			}
 		}
+
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
-			if (zbdi->wp_sectors_with_data >=
-			    f->io_size * td->o.zrt.u.f &&
-			    zbd_dec_and_reset_write_cnt(td, f)) {
+			if (zbdi->wp_sectors_with_data >= f->io_size * td->o.zrt.u.f &&
+			    zbd_dec_and_reset_write_cnt(td, f))
 				zb->reset_zone = 1;
-			}
 		}
+
 		/* Reset the zone pointer if necessary */
 		if (zb->reset_zone || zbd_zone_full(f, zb, min_bs)) {
 			assert(td->o.verify == VERIFY_NONE);
@@ -1916,6 +1980,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				goto eof;
 			}
 		}
+
 		/* Make writes occur at the write pointer */
 		assert(!zbd_zone_full(f, zb, min_bs));
 		io_u->offset = zb->wp;
@@ -1925,6 +1990,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       f->file_name, io_u->offset);
 			goto eof;
 		}
+
 		/*
 		 * Make sure that the buflen is a multiple of the minimal
 		 * block size. Give up if shrinking would make the request too
@@ -1941,10 +2007,13 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       orig_len, io_u->buflen);
 			goto accept;
 		}
+
 		td_verror(td, EIO, "zone remainder too small");
 		log_err("zone remainder %lld smaller than min block size %"PRIu64"\n",
 			(zbd_zone_capacity_end(zb) - io_u->offset), min_bs);
+
 		goto eof;
+
 	case DDIR_TRIM:
 		/* Check random trim targets a non-empty zone */
 		if (!td_random(td) || zb->wp > zb->start)
@@ -1952,7 +2021,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 		/* Find out a non-empty zone to trim */
 		zone_unlock(zb);
-		zl = get_zone(f, f->max_zone);
+		zl = zbd_get_zone(f, f->max_zone);
 		zb = zbd_find_zone(td, io_u, 1, zb, zl);
 		if (zb) {
 			io_u->offset = zb->start;
@@ -1960,7 +2029,9 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       f->file_name, io_u->offset);
 			goto accept;
 		}
+
 		goto eof;
+
 	case DDIR_SYNC:
 		/* fall-through */
 	case DDIR_DATASYNC:
@@ -1978,19 +2049,23 @@ accept:
 	assert(zb->cond != ZBD_ZONE_COND_OFFLINE);
 	assert(!io_u->zbd_queue_io);
 	assert(!io_u->zbd_put_io);
+
 	io_u->zbd_queue_io = zbd_queue_io;
 	io_u->zbd_put_io = zbd_put_io;
+
 	/*
 	 * Since we return with the zone lock still held,
 	 * add an annotation to let Coverity know that it
 	 * is intentional.
 	 */
 	/* coverity[missing_unlock] */
+
 	return io_u_accept;
 
 eof:
 	if (zb && zb->has_wp)
 		zone_unlock(zb);
+
 	return io_u_eof;
 }
 
@@ -2018,17 +2093,15 @@ int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z;
-	uint32_t zone_idx;
 	int ret;
 
-	zone_idx = zbd_zone_idx(f, io_u->offset);
-	z = get_zone(f, zone_idx);
-
+	z = zbd_offset_to_zone(f, io_u->offset);
 	if (!z->has_wp)
 		return 0;
 
 	if (io_u->offset != z->start) {
-		log_err("Trim offset not at zone start (%lld)\n", io_u->offset);
+		log_err("Trim offset not at zone start (%lld)\n",
+			io_u->offset);
 		return -EINVAL;
 	}
 


* Recent changes (master)
@ 2021-12-11 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-11 13:00 UTC
  To: fio

The following changes since commit 79eb6c9a17de959d72ee51c601b2764225101282:

  ioengines: libzbc: disable libzbc block backend driver (2021-12-09 21:34:21 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ea393df3256e44398558c264f035f8db7656b08:

  Merge branch 'github-actions' of https://github.com/sitsofe/fio (2021-12-10 11:08:26 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'github-actions' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      ci: add CI via GitHub Actions
      ci: retire travis configuration

 .github/workflows/ci.yml | 45 ++++++++++++++++++++++++
 .travis.yml              | 37 --------------------
 ci/actions-build.sh      | 37 ++++++++++++++++++++
 ci/actions-full-test.sh  | 15 ++++++++
 ci/actions-install.sh    | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
 ci/actions-smoke-test.sh | 10 ++++++
 ci/common.sh             | 34 ++++++++++++++++++
 ci/travis-build.sh       | 32 -----------------
 ci/travis-install.sh     | 65 ----------------------------------
 9 files changed, 232 insertions(+), 134 deletions(-)
 create mode 100644 .github/workflows/ci.yml
 delete mode 100644 .travis.yml
 create mode 100755 ci/actions-build.sh
 create mode 100755 ci/actions-full-test.sh
 create mode 100755 ci/actions-install.sh
 create mode 100755 ci/actions-smoke-test.sh
 create mode 100644 ci/common.sh
 delete mode 100755 ci/travis-build.sh
 delete mode 100755 ci/travis-install.sh

---

Diff of recent changes:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 00000000..a766cfa8
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,45 @@
+name: CI
+
+on:
+  push:
+  pull_request:
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        build:
+        - linux-gcc
+        - linux-clang
+        - macos
+        - linux-i686-gcc
+        include:
+        - build: linux-gcc
+          os: ubuntu-20.04
+          cc: gcc
+        - build: linux-clang
+          os: ubuntu-20.04
+          cc: clang
+        - build: macos
+          os: macos-10.15
+        - build: linux-i686-gcc
+          os: ubuntu-20.04
+          arch: i686
+
+    env:
+      CI_TARGET_ARCH: ${{ matrix.arch }}
+      CC: ${{ matrix.cc }}
+
+    steps:
+    - name: Checkout repo
+      uses: actions/checkout@v2
+    - name: Install dependencies
+      run: ./ci/actions-install.sh
+    - name: Build
+      run: ./ci/actions-build.sh
+    - name: Smoke test
+      run: ./ci/actions-smoke-test.sh
+    - name: Full test
+      run: ./ci/actions-full-test.sh
diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index e35aff39..00000000
--- a/.travis.yml
+++ /dev/null
@@ -1,37 +0,0 @@
-language: c
-dist: bionic
-os:
-  - linux
-compiler:
-  - clang
-  - gcc
-arch:
-  - amd64
-  - arm64
-env:
-  global:
-    - MAKEFLAGS="-j 2"
-matrix:
-  include:
-    - os: linux
-      compiler: gcc
-      arch: amd64
-      env: BUILD_ARCH="x86" # Only do the gcc x86 build to reduce clutter
-    # Default xcode image
-    - os: osx
-      compiler: clang # Workaround travis setting CC=["clang", "gcc"]
-      arch: amd64
-    # Latest xcode image (needs periodic updating)
-    - os: osx
-      compiler: clang
-      osx_image: xcode11.2
-      arch: amd64
-  exclude:
-    - os: osx
-      compiler: gcc
-
-install:
-  - ci/travis-install.sh
-
-script:
-  - ci/travis-build.sh
diff --git a/ci/actions-build.sh b/ci/actions-build.sh
new file mode 100755
index 00000000..74a6fdcb
--- /dev/null
+++ b/ci/actions-build.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+# This script expects to be invoked from the base fio directory.
+set -eu
+
+SCRIPT_DIR=$(dirname "$0")
+# shellcheck disable=SC1091
+. "${SCRIPT_DIR}/common.sh"
+
+main() {
+    local extra_cflags="-Werror"
+    local configure_flags=()
+
+    set_ci_target_os
+    case "${CI_TARGET_OS}" in
+        "linux")
+            case "${CI_TARGET_ARCH}" in
+                "i686")
+                    extra_cflags="${extra_cflags} -m32"
+                    export LDFLAGS="-m32"
+                    ;;
+                "x86_64")
+                    configure_flags+=(
+                        "--enable-cuda"
+                        "--enable-libiscsi"
+                        "--enable-libnbd"
+                    )
+                    ;;
+            esac
+        ;;
+    esac
+    configure_flags+=(--extra-cflags="${extra_cflags}")
+
+    ./configure "${configure_flags[@]}"
+    make -j 2
+}
+
+main
diff --git a/ci/actions-full-test.sh b/ci/actions-full-test.sh
new file mode 100755
index 00000000..4ae1dba1
--- /dev/null
+++ b/ci/actions-full-test.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# This script expects to be invoked from the base fio directory.
+set -eu
+
+main() {
+    echo "Running long running tests..."
+    export PYTHONUNBUFFERED="TRUE"
+    if [[ "${CI_TARGET_ARCH}" == "arm64" ]]; then
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
+    else
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
+    fi
+}
+
+main
diff --git a/ci/actions-install.sh b/ci/actions-install.sh
new file mode 100755
index 00000000..7408ccb4
--- /dev/null
+++ b/ci/actions-install.sh
@@ -0,0 +1,91 @@
+#!/bin/bash
+# This script expects to be invoked from the base fio directory.
+set -eu
+
+SCRIPT_DIR=$(dirname "$0")
+# shellcheck disable=SC1091
+. "${SCRIPT_DIR}/common.sh"
+
+install_ubuntu() {
+    local pkgs
+
+    cat <<DPKGCFG | sudo tee /etc/dpkg/dpkg.cfg.d/dpkg-speedup > /dev/null
+# Skip fsync
+force-unsafe-io
+# Don't install documentation
+path-exclude=/usr/share/man/*
+path-exclude=/usr/share/locale/*/LC_MESSAGES/*.mo
+path-exclude=/usr/share/doc/*
+DPKGCFG
+    # Packages available on i686 and x86_64
+    pkgs=(
+        libaio-dev
+        libcunit1-dev
+        libcurl4-openssl-dev
+        libfl-dev
+        libibverbs-dev
+        libnuma-dev
+        librdmacm-dev
+        valgrind
+    )
+    case "${CI_TARGET_ARCH}" in
+        "i686")
+            sudo dpkg --add-architecture i386
+            pkgs=("${pkgs[@]/%/:i386}")
+            pkgs+=(
+                gcc-multilib
+                pkg-config:i386
+                zlib1g-dev:i386
+            )
+            ;;
+        "x86_64")
+            pkgs+=(
+                libglusterfs-dev
+                libgoogle-perftools-dev
+                libiscsi-dev
+                libnbd-dev
+                libpmem-dev
+                libpmemblk-dev
+                librbd-dev
+                libtcmalloc-minimal4
+                nvidia-cuda-dev
+            )
+            ;;
+    esac
+
+    # Architecture-independent packages and packages for which we don't
+    # care about the architecture.
+    pkgs+=(
+        python3-scipy
+    )
+
+    echo "Updating APT..."
+    sudo apt-get -qq update
+    echo "Installing packages..."
+    sudo apt-get install -o APT::Immediate-Configure=false --no-install-recommends -qq -y "${pkgs[@]}"
+}
+
+install_linux() {
+    install_ubuntu
+}
+
+install_macos() {
+    # Assumes homebrew and python3 are already installed
+    #echo "Updating homebrew..."
+    #brew update >/dev/null 2>&1
+    echo "Installing packages..."
+    HOMEBREW_NO_AUTO_UPDATE=1 brew install cunit
+    pip3 install scipy six
+}
+
+main() {
+    set_ci_target_os
+
+    install_function="install_${CI_TARGET_OS}"
+    ${install_function}
+
+    echo "Python3 path: $(type -p python3 2>&1)"
+    echo "Python3 version: $(python3 -V 2>&1)"
+}
+
+main
diff --git a/ci/actions-smoke-test.sh b/ci/actions-smoke-test.sh
new file mode 100755
index 00000000..c129c89f
--- /dev/null
+++ b/ci/actions-smoke-test.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+# This script expects to be invoked from the base fio directory.
+set -eu
+
+main() {
+    echo "Running smoke tests..."
+    make test
+}
+
+main
diff --git a/ci/common.sh b/ci/common.sh
new file mode 100644
index 00000000..8861f843
--- /dev/null
+++ b/ci/common.sh
@@ -0,0 +1,34 @@
+# shellcheck shell=bash
+
+function set_ci_target_os {
+    # Function that exports CI_TARGET_OS to the current OS if it is not already
+    # set.
+
+    # Don't override CI_TARGET_OS if already set
+    CI_TARGET_OS=${CI_TARGET_OS:-}
+    if [[ -z ${CI_TARGET_OS} ]]; then
+        # Detect operating system
+        case "${OSTYPE}" in
+            linux*)
+                CI_TARGET_OS="linux"
+                ;;
+            darwin*)
+                CI_TARGET_OS="macos"
+                ;;
+            msys*)
+                CI_TARGET_OS="windows"
+                ;;
+            bsd*)
+                CI_TARGET_OS="bsd"
+                ;;
+            *)
+                CI_TARGET_OS=""
+        esac
+    fi
+
+    # Don't override CI_TARGET_ARCH if already set
+    CI_TARGET_ARCH=${CI_TARGET_ARCH:-}
+    if [[ -z ${CI_TARGET_ARCH} ]]; then
+        CI_TARGET_ARCH="$(uname -m)"
+    fi
+}
diff --git a/ci/travis-build.sh b/ci/travis-build.sh
deleted file mode 100755
index 923d882d..00000000
--- a/ci/travis-build.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-set -eu
-
-CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
-EXTRA_CFLAGS="-Werror"
-export PYTHONUNBUFFERED=TRUE
-CONFIGURE_FLAGS=()
-
-case "$TRAVIS_OS_NAME" in
-    "linux")
-        CONFIGURE_FLAGS+=(--enable-libiscsi)
-        case "$CI_TARGET_ARCH" in
-            "x86")
-                EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
-                export LDFLAGS="-m32"
-                ;;
-            "amd64")
-                CONFIGURE_FLAGS+=(--enable-cuda)
-                ;;
-        esac
-    ;;
-esac
-CONFIGURE_FLAGS+=(--extra-cflags="${EXTRA_CFLAGS}")
-
-./configure "${CONFIGURE_FLAGS[@]}" &&
-    make &&
-    make test &&
-    if [[ "$CI_TARGET_ARCH" == "arm64" ]]; then
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
-    else
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
-    fi
diff --git a/ci/travis-install.sh b/ci/travis-install.sh
deleted file mode 100755
index 4c4c04c5..00000000
--- a/ci/travis-install.sh
+++ /dev/null
@@ -1,65 +0,0 @@
-#!/bin/bash
-set -eu
-
-CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
-case "$TRAVIS_OS_NAME" in
-    "linux")
-	# Architecture-dependent packages.
-	pkgs=(
-	    libaio-dev
-	    libcunit1-dev
-	    libfl-dev
-	    libgoogle-perftools-dev
-	    libibverbs-dev
-	    libiscsi-dev
-	    libnuma-dev
-	    librbd-dev
-	    librdmacm-dev
-	    libz-dev
-	)
-	case "$CI_TARGET_ARCH" in
-	    "x86")
-		pkgs=("${pkgs[@]/%/:i386}")
-		pkgs+=(
-		    gcc-multilib
-		    pkg-config:i386
-	        )
-		;;
-	    "amd64")
-		pkgs+=(nvidia-cuda-dev)
-		;;
-	esac
-	if [[ $CI_TARGET_ARCH != "x86" ]]; then
-		pkgs+=(glusterfs-common)
-	fi
-	# Architecture-independent packages and packages for which we don't
-	# care about the architecture.
-	pkgs+=(
-	    bison
-	    flex
-	    python3
-	    python3-scipy
-	    python3-six
-	)
-	sudo apt-get -qq update
-	sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}"
-	# librpma is supported on the amd64 (x86_64) architecture for now
-	if [[ $CI_TARGET_ARCH == "amd64" ]]; then
-		# install libprotobuf-c-dev required by librpma_gpspm
-		sudo apt-get install --no-install-recommends -qq -y libprotobuf-c-dev
-		# PMDK libraries have to be installed, because
-		# libpmem is a dependency of the librpma fio engine
-		ci/travis-install-pmdk.sh
-		# install librpma from sources from GitHub
-		ci/travis-install-librpma.sh
-	fi
-	;;
-    "osx")
-	brew update >/dev/null 2>&1
-	brew install cunit
-	pip3 install scipy six
-	;;
-esac
-
-echo "Python3 path: $(type -p python3 2>&1)"
-echo "Python3 version: $(python3 -V 2>&1)"


* Recent changes (master)
@ 2021-12-10 13:00 Jens Axboe
From: Jens Axboe @ 2021-12-10 13:00 UTC
  To: fio

The following changes since commit fab60fa78a1832c17f8bb200292ded4a8b3eb2a5:

  Merge branch 'arm-detect-pmull' of https://github.com/sitsofe/fio (2021-12-06 13:26:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 79eb6c9a17de959d72ee51c601b2764225101282:

  ioengines: libzbc: disable libzbc block backend driver (2021-12-09 21:34:21 -0700)

----------------------------------------------------------------
Damien Le Moal (1):
      ioengines: libzbc: disable libzbc block backend driver

 engines/libzbc.c              |  2 +-
 t/zbd/run-tests-against-nullb | 14 --------------
 2 files changed, 1 insertion(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index abee2043..2bc2c7e0 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -85,7 +85,7 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 		return -ENOMEM;
 
 	ret = zbc_open(f->file_name,
-		       flags | ZBC_O_DRV_BLOCK | ZBC_O_DRV_SCSI | ZBC_O_DRV_ATA,
+		       flags | ZBC_O_DRV_SCSI | ZBC_O_DRV_ATA,
 		       &ld->zdev);
 	if (ret) {
 		log_err("%s: zbc_open() failed, err=%d\n",
diff --git a/t/zbd/run-tests-against-nullb b/t/zbd/run-tests-against-nullb
index db901179..7d2c7fa8 100755
--- a/t/zbd/run-tests-against-nullb
+++ b/t/zbd/run-tests-against-nullb
@@ -19,7 +19,6 @@ usage()
 	echo -e "\t-L List the device layouts for every section without running"
 	echo -e "\t   tests."
 	echo -e "\t-s <#section> Only run the section with the given number."
-	echo -e "\t-l Use libzbc ioengine to run the tests."
 	echo -e "\t-t <#test> Only run the test with the given number in every section."
 	echo -e "\t-o <max_open_zones> Specify MaxOpen value, (${set_max_open} by default)."
 	echo -e "\t-n <#number of runs> Set the number of times to run the entire suite "
@@ -239,7 +238,6 @@ dev_size=1024
 dev_blocksize=4096
 set_max_open=8
 zbd_test_opts=()
-libzbc=0
 num_of_runs=1
 test_case=0
 quit_on_err=0
@@ -250,7 +248,6 @@ while (($#)); do
 		-o) set_max_open="${2}"; shift; shift;;
 		-L) list_only=1; shift;;
 		-r) cleanup_nullb; exit 0;;
-		-l) libzbc=1; shift;;
 		-n) num_of_runs="${2}"; shift; shift;;
 		-t) test_case="${2}"; shift; shift;;
 		-q) quit_on_err=1; shift;;
@@ -311,17 +308,6 @@ while ((run_nr <= $num_of_runs)); do
 			exit 1
 		fi
 		show_nullb_config
-		if ((libzbc)); then
-			if ((zone_capacity < zone_size)); then
-				echo "libzbc doesn't support zone capacity, skipping section $(printf "%02d" $section_number)"
-				continue
-			fi
-			if ((conv_pcnt == 100)); then
-				echo "libzbc only supports zoned devices, skipping section $(printf "%02d" $section_number)"
-				continue
-			fi
-			zbd_test_opts+=("-l")
-		fi
 		cd "${scriptdir}"
 		((intr)) && exit 1
 		((list_only)) && continue


* Recent changes (master)
@ 2021-12-07 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-12-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit fd1d8e0ab3dc852193037a3acebcf8b8bdbcd9c5:

  filesetup: create zbd_info before jumping to done label (2021-12-02 17:54:15 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fab60fa78a1832c17f8bb200292ded4a8b3eb2a5:

  Merge branch 'arm-detect-pmull' of https://github.com/sitsofe/fio (2021-12-06 13:26:52 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'arm-detect-pmull' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      os: detect PMULL support before enabling accelerated crc32c on ARM

Vincent Fu (2):
      io_ddir: return appropriate string for DDIR_INVAL
      libfio: drop unneeded reset of rwmix_issues

 io_ddir.h     | 2 +-
 libfio.c      | 1 -
 os/os-linux.h | 6 +++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/io_ddir.h b/io_ddir.h
index a42da97a..296a9d04 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -24,7 +24,7 @@ static inline const char *io_ddir_name(enum fio_ddir ddir)
 					"datasync", "sync_file_range",
 					"wait", };
 
-	if (ddir < DDIR_LAST)
+	if (ddir >= 0 && ddir < DDIR_LAST)
 		return name[ddir];
 
 	return "invalid";
diff --git a/libfio.c b/libfio.c
index ed5906d4..198eaf2e 100644
--- a/libfio.c
+++ b/libfio.c
@@ -140,7 +140,6 @@ void reset_all_stats(struct thread_data *td)
 		td->io_issues[i] = 0;
 		td->ts.total_io_u[i] = 0;
 		td->ts.runtime[i] = 0;
-		td->rwmix_issues = 0;
 	}
 
 	set_epoch_time(td, td->o.log_unix_epoch);
diff --git a/os/os-linux.h b/os/os-linux.h
index 808f1d02..3001140c 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -20,6 +20,9 @@
 
 #ifdef ARCH_HAVE_CRC_CRYPTO
 #include <sys/auxv.h>
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL             (1 << 4)
+#endif /* HWCAP_PMULL */
 #ifndef HWCAP_CRC32
 #define HWCAP_CRC32             (1 << 7)
 #endif /* HWCAP_CRC32 */
@@ -405,7 +408,8 @@ static inline bool os_cpu_has(cpu_features feature)
 #ifdef ARCH_HAVE_CRC_CRYPTO
 	case CPU_ARM64_CRC32C:
 		hwcap = getauxval(AT_HWCAP);
-		have_feature = (hwcap & HWCAP_CRC32) != 0;
+		have_feature = (hwcap & (HWCAP_PMULL | HWCAP_CRC32)) ==
+			       (HWCAP_PMULL | HWCAP_CRC32);
 		break;
 #endif
 	default:


* Recent changes (master)
@ 2021-12-03 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-12-03 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit ed7f3a07363d62c6d6147b0c568f87f079d241a8:

  stat: make add lat percentile functions inline (2021-11-25 09:03:10 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fd1d8e0ab3dc852193037a3acebcf8b8bdbcd9c5:

  filesetup: create zbd_info before jumping to done label (2021-12-02 17:54:15 -0700)

----------------------------------------------------------------
Niklas Cassel (1):
      filesetup: create zbd_info before jumping to done label

 filesetup.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 228e4fff..fb556d84 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1119,9 +1119,6 @@ int setup_files(struct thread_data *td)
 	if (err)
 		goto err_out;
 
-	if (o->read_iolog_file)
-		goto done;
-
 	if (td->o.zone_mode == ZONE_MODE_ZBD) {
 		err = zbd_init_files(td);
 		if (err)
@@ -1129,6 +1126,9 @@ int setup_files(struct thread_data *td)
 	}
 	zbd_recalc_options_with_zone_granularity(td);
 
+	if (o->read_iolog_file)
+		goto done;
+
 	/*
 	 * check sizes. if the files/devices do not exist and the size
 	 * isn't passed to fio, abort.


* Recent changes (master)
@ 2021-11-26 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-11-26 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2b00ac1c82d54795911343c9b3b3f4ef64c92d92:

  Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio (2021-11-24 10:27:20 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ed7f3a07363d62c6d6147b0c568f87f079d241a8:

  stat: make add lat percentile functions inline (2021-11-25 09:03:10 -0700)

----------------------------------------------------------------
Niklas Cassel (6):
      docs: document quirky implementation of per priority stats reporting
      stat: add comments describing the quirky behavior of clat prio samples
      stat: rename add_lat_percentile_sample()
      stat: rename add_lat_percentile_sample_noprio()
      stat: simplify add_lat_percentile_prio_sample()
      stat: make add lat percentile functions inline

 HOWTO  |  6 +++++-
 fio.1  |  5 ++++-
 stat.c | 52 +++++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 48 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a3b3acfe..8c9e4135 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2169,7 +2169,11 @@ with the caveat that when used on the command line, they must come after the
     Default: 0. A single value applies to reads and writes. Comma-separated
     values may be specified for reads and writes. For this option to be
     effective, NCQ priority must be supported and enabled, and `direct=1'
-    option must be used. fio must also be run as the root user.
+    option must be used. fio must also be run as the root user. Unlike
+    slat/clat/lat stats, which can be tracked and reported independently, per
+    priority stats only track and report a single type of latency. By default,
+    completion latency (clat) will be reported, if :option:`lat_percentiles` is
+    set, total latency (lat) will be reported.
 
 .. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
 
diff --git a/fio.1 b/fio.1
index a6469541..a3ebb67d 100644
--- a/fio.1
+++ b/fio.1
@@ -1967,7 +1967,10 @@ Set the percentage of I/O that will be issued with the highest priority.
 Default: 0. A single value applies to reads and writes. Comma-separated
 values may be specified for reads and writes. For this option to be effective,
 NCQ priority must be supported and enabled, and `direct=1' option must be
-used. fio must also be run as the root user.
+used. fio must also be run as the root user. Unlike slat/clat/lat stats, which
+can be tracked and reported independently, per priority stats only track and
+report a single type of latency. By default, completion latency (clat) will be
+reported, if \fBlat_percentiles\fR is set, total latency (lat) will be reported.
 .TP
 .BI (io_uring,libaio)cmdprio_class \fR=\fPint[,int]
 Set the I/O priority class to use for I/Os that must be issued with a
diff --git a/stat.c b/stat.c
index e0dc99b6..7e84058d 100644
--- a/stat.c
+++ b/stat.c
@@ -3052,8 +3052,10 @@ void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
 	add_stat_sample(&ts->sync_stat, nsec);
 }
 
-static void add_lat_percentile_sample_noprio(struct thread_stat *ts,
-				unsigned long long nsec, enum fio_ddir ddir, enum fio_lat lat)
+static inline void add_lat_percentile_sample(struct thread_stat *ts,
+					     unsigned long long nsec,
+					     enum fio_ddir ddir,
+					     enum fio_lat lat)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 	assert(idx < FIO_IO_U_PLAT_NR);
@@ -3061,14 +3063,13 @@ static void add_lat_percentile_sample_noprio(struct thread_stat *ts,
 	ts->io_u_plat[lat][ddir][idx]++;
 }
 
-static void add_lat_percentile_sample(struct thread_stat *ts,
-				unsigned long long nsec, enum fio_ddir ddir,
-				bool high_prio, enum fio_lat lat)
+static inline void add_lat_percentile_prio_sample(struct thread_stat *ts,
+						  unsigned long long nsec,
+						  enum fio_ddir ddir,
+						  bool high_prio)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 
-	add_lat_percentile_sample_noprio(ts, nsec, ddir, lat);
-
 	if (!high_prio)
 		ts->io_u_plat_low_prio[ddir][idx]++;
 	else
@@ -3089,6 +3090,15 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
+	/*
+	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * percentiles and stats are used for describing total latency values,
+	 * even though the variable names themselves start with clat_.
+	 *
+	 * Because of the above definition, add a prio stat sample only when
+	 * lat_percentiles=0. add_lat_sample() will add the prio stat sample
+	 * when lat_percentiles=1.
+	 */
 	if (!ts->lat_percentiles) {
 		if (high_prio)
 			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
@@ -3101,10 +3111,15 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			       offset, ioprio);
 
 	if (ts->clat_percentiles) {
-		if (ts->lat_percentiles)
-			add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_CLAT);
-		else
-			add_lat_percentile_sample(ts, nsec, ddir, high_prio, FIO_CLAT);
+		/*
+		 * Because of the above definition, add a prio lat percentile
+		 * sample only when lat_percentiles=0. add_lat_sample() will add
+		 * the prio lat percentile sample when lat_percentiles=1.
+		 */
+		add_lat_percentile_sample(ts, nsec, ddir, FIO_CLAT);
+		if (!ts->lat_percentiles)
+			add_lat_percentile_prio_sample(ts, nsec, ddir,
+						       high_prio);
 	}
 
 	if (iolog && iolog->hist_msec) {
@@ -3169,7 +3184,7 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 			       offset, ioprio);
 
 	if (ts->slat_percentiles)
-		add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_SLAT);
+		add_lat_percentile_sample(ts, nsec, ddir, FIO_SLAT);
 
 	if (needs_lock)
 		__td_io_u_unlock(td);
@@ -3194,8 +3209,19 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
 			       offset, ioprio);
 
+	/*
+	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * percentiles and stats are used for describing total latency values,
+	 * even though the variable names themselves start with clat_.
+	 *
+	 * Because of the above definition, add a prio stat and prio lat
+	 * percentile sample only when lat_percentiles=1. add_clat_sample() will
+	 * add the prio stat and prio lat percentile sample when
+	 * lat_percentiles=0.
+	 */
 	if (ts->lat_percentiles) {
-		add_lat_percentile_sample(ts, nsec, ddir, high_prio, FIO_LAT);
+		add_lat_percentile_sample(ts, nsec, ddir, FIO_LAT);
+		add_lat_percentile_prio_sample(ts, nsec, ddir, high_prio);
 		if (high_prio)
 			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
 		else


* Recent changes (master)
@ 2021-11-25 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-11-25 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 1d08bfb018e600cc47f122fb78c02bf74b84dee8:

  t/dedupe: style fixups (2021-11-21 06:51:11 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b00ac1c82d54795911343c9b3b3f4ef64c92d92:

  Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio (2021-11-24 10:27:20 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio

Oleg Latin (1):
      parse: handle comma-separated options

 parse.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index 45f4f2d3..d086ee48 100644
--- a/parse.c
+++ b/parse.c
@@ -477,13 +477,17 @@ static int check_int(const char *p, int *val)
 
 static size_t opt_len(const char *str)
 {
+	char delimiter[] = {',', ':'};
 	char *postfix;
+	unsigned int i;
 
-	postfix = strchr(str, ':');
-	if (!postfix)
-		return strlen(str);
+	for (i = 0; i < FIO_ARRAY_SIZE(delimiter); i++) {
+		postfix = strchr(str, delimiter[i]);
+		if (postfix)
+			return (int)(postfix - str);
+	}
 
-	return (int)(postfix - str);
+	return strlen(str);
 }
 
 static int str_match_len(const struct value_pair *vp, const char *str)


* Recent changes (master)
@ 2021-11-22 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-11-22 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9f51d89c683d70cd8ab23ba09ec6e628a548af5a:

  Sync io_uring header with the kernel (2021-11-20 07:31:20 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1d08bfb018e600cc47f122fb78c02bf74b84dee8:

  t/dedupe: style fixups (2021-11-21 06:51:11 -0700)

----------------------------------------------------------------
Bar David (2):
      Mixed dedup and compression
      fio-dedup: adjusted the binary to support compression

Jens Axboe (3):
      Merge branch 'dedupe_and_compression' of https://github.com/bardavid/fio
      t/io_uring: fix 32-bit compile warnings
      t/dedupe: style fixups

 DEDUPE-TODO  |   3 --
 dedupe.c     |  12 ++++-
 io_u.c       |  29 ++++++-----
 t/dedupe.c   | 167 +++++++++++++++++++++++++++++++++++++++++++++++------------
 t/io_uring.c |   4 +-
 5 files changed, 161 insertions(+), 54 deletions(-)

---

Diff of recent changes:

diff --git a/DEDUPE-TODO b/DEDUPE-TODO
index 1f3ee9da..4b0bfd1d 100644
--- a/DEDUPE-TODO
+++ b/DEDUPE-TODO
@@ -1,6 +1,3 @@
-- Mixed buffers of dedupe-able and compressible data.
-  Major usecase in performance benchmarking of storage subsystems.
-
 - Shifted dedup-able data.
   Allow for dedup buffer generation to shift contents by random number
   of sectors (fill the gaps with uncompressible data). Some storage
diff --git a/dedupe.c b/dedupe.c
index 043a376c..fd116dfb 100644
--- a/dedupe.c
+++ b/dedupe.c
@@ -2,12 +2,14 @@
 
 int init_dedupe_working_set_seeds(struct thread_data *td)
 {
-	unsigned long long i;
+	unsigned long long i, j, num_seed_advancements;
 	struct frand_state dedupe_working_set_state = {0};
 
 	if (!td->o.dedupe_percentage || !(td->o.dedupe_mode == DEDUPE_MODE_WORKING_SET))
 		return 0;
 
+	num_seed_advancements = td->o.min_bs[DDIR_WRITE] /
+		min_not_zero(td->o.min_bs[DDIR_WRITE], (unsigned long long) td->o.compress_chunk);
 	/*
 	 * The dedupe working set keeps seeds of unique data (generated by buf_state).
 	 * Dedupe-ed pages will be generated using those seeds.
@@ -21,7 +23,13 @@ int init_dedupe_working_set_seeds(struct thread_data *td)
 	frand_copy(&dedupe_working_set_state, &td->buf_state);
 	for (i = 0; i < td->num_unique_pages; i++) {
 		frand_copy(&td->dedupe_working_set_states[i], &dedupe_working_set_state);
-		__get_next_seed(&dedupe_working_set_state);
+		/*
+		 * When compression is used the seed is advanced multiple times to
+		 * generate the buffer. We want to regenerate the same buffer when
+		 * deduping against this page
+		 */
+		for (j = 0; j < num_seed_advancements; j++)
+			__get_next_seed(&dedupe_working_set_state);
 	}
 
 	return 0;
diff --git a/io_u.c b/io_u.c
index 586a4bef..3c72d63d 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2230,27 +2230,30 @@ void fill_io_buffer(struct thread_data *td, void *buf, unsigned long long min_wr
 
 	if (o->compress_percentage || o->dedupe_percentage) {
 		unsigned int perc = td->o.compress_percentage;
-		struct frand_state *rs;
+		struct frand_state *rs = NULL;
 		unsigned long long left = max_bs;
 		unsigned long long this_write;
 
 		do {
-			rs = get_buf_state(td);
+			/*
+			 * Buffers are either entirely dedupe-able or not.
+			 * If we choose to dedup, the buffer should undergo
+			 * the same manipulation as the original write. Which
+			 * means we should retrack the steps we took for compression
+			 * as well.
+			 */
+			if (!rs)
+				rs = get_buf_state(td);
 
 			min_write = min(min_write, left);
 
-			if (perc) {
-				this_write = min_not_zero(min_write,
-							(unsigned long long) td->o.compress_chunk);
+			this_write = min_not_zero(min_write,
+						(unsigned long long) td->o.compress_chunk);
 
-				fill_random_buf_percentage(rs, buf, perc,
-					this_write, this_write,
-					o->buffer_pattern,
-					o->buffer_pattern_bytes);
-			} else {
-				fill_random_buf(rs, buf, min_write);
-				this_write = min_write;
-			}
+			fill_random_buf_percentage(rs, buf, perc,
+				this_write, this_write,
+				o->buffer_pattern,
+				o->buffer_pattern_bytes);
 
 			buf += this_write;
 			left -= this_write;
diff --git a/t/dedupe.c b/t/dedupe.c
index 8b659c76..109ea1af 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -24,19 +24,25 @@
 
 #include "../lib/bloom.h"
 #include "debug.h"
+#include "zlib.h"
+
+struct zlib_ctrl {
+	z_stream stream;
+	unsigned char *buf_in;
+	unsigned char *buf_out;
+};
 
 struct worker_thread {
+	struct zlib_ctrl zc;
 	pthread_t thread;
-
-	volatile int done;
-
-	int fd;
 	uint64_t cur_offset;
 	uint64_t size;
-
+	unsigned long long unique_capacity;
 	unsigned long items;
 	unsigned long dupes;
 	int err;
+	int fd;
+	volatile int done;
 };
 
 struct extent {
@@ -68,6 +74,7 @@ static unsigned int odirect;
 static unsigned int collision_check;
 static unsigned int print_progress = 1;
 static unsigned int use_bloom = 1;
+static unsigned int compression = 0;
 
 static uint64_t total_size;
 static uint64_t cur_offset;
@@ -87,8 +94,9 @@ static uint64_t get_size(struct fio_file *f, struct stat *sb)
 			return 0;
 		}
 		ret = bytes;
-	} else
+	} else {
 		ret = sb->st_size;
+	}
 
 	return (ret & ~((uint64_t)blocksize - 1));
 }
@@ -120,9 +128,9 @@ static int __read_block(int fd, void *buf, off_t offset, size_t count)
 	if (ret < 0) {
 		perror("pread");
 		return 1;
-	} else if (!ret)
+	} else if (!ret) {
 		return 1;
-	else if (ret != count) {
+	} else if (ret != count) {
 		log_err("dedupe: short read on block\n");
 		return 1;
 	}
@@ -135,6 +143,34 @@ static int read_block(int fd, void *buf, off_t offset)
 	return __read_block(fd, buf, offset, blocksize);
 }
 
+static void account_unique_capacity(uint64_t offset, uint64_t *unique_capacity,
+				    struct zlib_ctrl *zc)
+{
+	z_stream *stream = &zc->stream;
+	unsigned int compressed_len;
+	int ret;
+
+	if (read_block(file.fd, zc->buf_in, offset))
+		return;
+
+	stream->next_in = zc->buf_in;
+	stream->avail_in = blocksize;
+	stream->avail_out = deflateBound(stream, blocksize);
+	stream->next_out = zc->buf_out;
+
+	ret = deflate(stream, Z_FINISH);
+	assert(ret != Z_STREAM_ERROR);
+	compressed_len = blocksize - stream->avail_out;
+
+	if (dump_output)
+		printf("offset 0x%lx compressed to %d blocksize %d ratio %.2f \n",
+				(unsigned long) offset, compressed_len, blocksize,
+				(float)compressed_len / (float)blocksize);
+
+	*unique_capacity += compressed_len;
+	deflateReset(stream);
+}
+
 static void add_item(struct chunk *c, struct item *i)
 {
 	/*	
@@ -182,13 +218,15 @@ static struct chunk *alloc_chunk(void)
 	if (collision_check || dump_output) {
 		c = malloc(sizeof(struct chunk) + sizeof(struct flist_head));
 		INIT_FLIST_HEAD(&c->extent_list[0]);
-	} else
+	} else {
 		c = malloc(sizeof(struct chunk));
+	}
 
 	return c;
 }
 
-static void insert_chunk(struct item *i)
+static void insert_chunk(struct item *i, uint64_t *unique_capacity,
+			 struct zlib_ctrl *zc)
 {
 	struct fio_rb_node **p, *parent;
 	struct chunk *c;
@@ -201,11 +239,11 @@ static void insert_chunk(struct item *i)
 
 		c = rb_entry(parent, struct chunk, rb_node);
 		diff = memcmp(i->hash, c->hash, sizeof(i->hash));
-		if (diff < 0)
+		if (diff < 0) {
 			p = &(*p)->rb_left;
-		else if (diff > 0)
+		} else if (diff > 0) {
 			p = &(*p)->rb_right;
-		else {
+		} else {
 			int ret;
 
 			if (!collision_check)
@@ -228,12 +266,15 @@ static void insert_chunk(struct item *i)
 	memcpy(c->hash, i->hash, sizeof(i->hash));
 	rb_link_node(&c->rb_node, parent, p);
 	rb_insert_color(&c->rb_node, &rb_root);
+	if (compression)
+		account_unique_capacity(i->offset, unique_capacity, zc);
 add:
 	add_item(c, i);
 }
 
 static void insert_chunks(struct item *items, unsigned int nitems,
-			  uint64_t *ndupes)
+			  uint64_t *ndupes, uint64_t *unique_capacity,
+			  struct zlib_ctrl *zc)
 {
 	int i;
 
@@ -248,7 +289,7 @@ static void insert_chunks(struct item *items, unsigned int nitems,
 			r = bloom_set(bloom, items[i].hash, s);
 			*ndupes += r;
 		} else
-			insert_chunk(&items[i]);
+			insert_chunk(&items[i], unique_capacity, zc);
 	}
 
 	fio_sem_up(rb_lock);
@@ -277,11 +318,13 @@ static int do_work(struct worker_thread *thread, void *buf)
 	off_t offset;
 	int nitems = 0;
 	uint64_t ndupes = 0;
+	uint64_t unique_capacity = 0;
 	struct item *items;
 
 	offset = thread->cur_offset;
 
-	nblocks = read_blocks(thread->fd, buf, offset, min(thread->size, (uint64_t)chunk_size));
+	nblocks = read_blocks(thread->fd, buf, offset,
+				min(thread->size, (uint64_t) chunk_size));
 	if (!nblocks)
 		return 1;
 
@@ -296,20 +339,39 @@ static int do_work(struct worker_thread *thread, void *buf)
 		nitems++;
 	}
 
-	insert_chunks(items, nitems, &ndupes);
+	insert_chunks(items, nitems, &ndupes, &unique_capacity, &thread->zc);
 
 	free(items);
 	thread->items += nitems;
 	thread->dupes += ndupes;
+	thread->unique_capacity += unique_capacity;
 	return 0;
 }
 
+static void thread_init_zlib_control(struct worker_thread *thread)
+{
+	size_t sz;
+
+	z_stream *stream = &thread->zc.stream;
+	stream->zalloc = Z_NULL;
+	stream->zfree = Z_NULL;
+	stream->opaque = Z_NULL;
+
+	if (deflateInit(stream, Z_DEFAULT_COMPRESSION) != Z_OK)
+		return;
+
+	thread->zc.buf_in = fio_memalign(blocksize, blocksize, false);
+	sz = deflateBound(stream, blocksize);
+	thread->zc.buf_out = fio_memalign(blocksize, sz, false);
+}
+
 static void *thread_fn(void *data)
 {
 	struct worker_thread *thread = data;
 	void *buf;
 
 	buf = fio_memalign(blocksize, chunk_size, false);
+	thread_init_zlib_control(thread);
 
 	do {
 		if (get_work(&thread->cur_offset, &thread->size)) {
@@ -362,15 +424,17 @@ static void show_progress(struct worker_thread *threads, unsigned long total)
 			printf("%3.2f%% done (%luKiB/sec)\r", perc, this_items);
 			last_nitems = nitems;
 			fio_gettime(&last_tv, NULL);
-		} else
+		} else {
 			printf("%3.2f%% done\r", perc);
+		}
 		fflush(stdout);
 		usleep(250000);
 	};
 }
 
 static int run_dedupe_threads(struct fio_file *f, uint64_t dev_size,
-			      uint64_t *nextents, uint64_t *nchunks)
+			      uint64_t *nextents, uint64_t *nchunks,
+			      uint64_t *unique_capacity)
 {
 	struct worker_thread *threads;
 	unsigned long nitems, total_items;
@@ -398,11 +462,13 @@ static int run_dedupe_threads(struct fio_file *f, uint64_t dev_size,
 	nitems = 0;
 	*nextents = 0;
 	*nchunks = 1;
+	*unique_capacity = 0;
 	for (i = 0; i < num_threads; i++) {
 		void *ret;
 		pthread_join(threads[i].thread, &ret);
 		nitems += threads[i].items;
 		*nchunks += threads[i].dupes;
+		*unique_capacity += threads[i].unique_capacity;
 	}
 
 	printf("Threads(%u): %lu items processed\n", num_threads, nitems);
@@ -416,7 +482,7 @@ static int run_dedupe_threads(struct fio_file *f, uint64_t dev_size,
 }
 
 static int dedupe_check(const char *filename, uint64_t *nextents,
-			uint64_t *nchunks)
+			uint64_t *nchunks, uint64_t *unique_capacity)
 {
 	uint64_t dev_size;
 	struct stat sb;
@@ -451,9 +517,11 @@ static int dedupe_check(const char *filename, uint64_t *nextents,
 		bloom = bloom_new(bloom_entries);
 	}
 
-	printf("Will check <%s>, size <%llu>, using %u threads\n", filename, (unsigned long long) dev_size, num_threads);
+	printf("Will check <%s>, size <%llu>, using %u threads\n", filename,
+				(unsigned long long) dev_size, num_threads);
 
-	return run_dedupe_threads(&file, dev_size, nextents, nchunks);
+	return run_dedupe_threads(&file, dev_size, nextents, nchunks,
+					unique_capacity);
 err:
 	if (file.fd != -1)
 		close(file.fd);
@@ -466,18 +534,38 @@ static void show_chunk(struct chunk *c)
 	struct flist_head *n;
 	struct extent *e;
 
-	printf("c hash %8x %8x %8x %8x, count %lu\n", c->hash[0], c->hash[1], c->hash[2], c->hash[3], (unsigned long) c->count);
+	printf("c hash %8x %8x %8x %8x, count %lu\n", c->hash[0], c->hash[1],
+			c->hash[2], c->hash[3], (unsigned long) c->count);
 	flist_for_each(n, &c->extent_list[0]) {
 		e = flist_entry(n, struct extent, list);
 		printf("\toffset %llu\n", (unsigned long long) e->offset);
 	}
 }
 
-static void show_stat(uint64_t nextents, uint64_t nchunks, uint64_t ndupextents)
+static const char *capacity_unit[] = {"b","KB", "MB", "GB", "TB", "PB", "EB"};
+
+static uint64_t bytes_to_human_readable_unit(uint64_t n, const char **unit_out)
+{
+	uint8_t i = 0;
+
+	while (n >= 1024) {
+		i++;
+		n /= 1024;
+	}
+
+	*unit_out = capacity_unit[i];
+	return n;
+}
+
+static void show_stat(uint64_t nextents, uint64_t nchunks, uint64_t ndupextents,
+		      uint64_t unique_capacity)
 {
 	double perc, ratio;
+	const char *unit;
+	uint64_t uc_human;
 
-	printf("Extents=%lu, Unique extents=%lu", (unsigned long) nextents, (unsigned long) nchunks);
+	printf("Extents=%lu, Unique extents=%lu", (unsigned long) nextents,
+						(unsigned long) nchunks);
 	if (!bloom)
 		printf(" Duplicated extents=%lu", (unsigned long) ndupextents);
 	printf("\n");
@@ -485,22 +573,29 @@ static void show_stat(uint64_t nextents, uint64_t nchunks, uint64_t ndupextents)
 	if (nchunks) {
 		ratio = (double) nextents / (double) nchunks;
 		printf("De-dupe ratio: 1:%3.2f\n", ratio - 1.0);
-	} else
+	} else {
 		printf("De-dupe ratio: 1:infinite\n");
+	}
 
-	if (ndupextents)
-		printf("De-dupe working set at least: %3.2f%%\n", 100.0 * (double) ndupextents / (double) nextents);
+	if (ndupextents) {
+		printf("De-dupe working set at least: %3.2f%%\n",
+			100.0 * (double) ndupextents / (double) nextents);
+	}
 
 	perc = 1.00 - ((double) nchunks / (double) nextents);
 	perc *= 100.0;
 	printf("Fio setting: dedupe_percentage=%u\n", (int) (perc + 0.50));
 
+
+	if (compression) {
+		uc_human = bytes_to_human_readable_unit(unique_capacity, &unit);
+		printf("Unique capacity %lu%s\n", (unsigned long) uc_human, unit);
+	}
 }
 
 static void iter_rb_tree(uint64_t *nextents, uint64_t *nchunks, uint64_t *ndupextents)
 {
 	struct fio_rb_node *n;
-
 	*nchunks = *nextents = *ndupextents = 0;
 
 	n = rb_first(&rb_root);
@@ -532,18 +627,19 @@ static int usage(char *argv[])
 	log_err("\t-c\tFull collision check\n");
 	log_err("\t-B\tUse probabilistic bloom filter\n");
 	log_err("\t-p\tPrint progress indicator\n");
+	log_err("\t-C\tCalculate compressible size\n");
 	return 1;
 }
 
 int main(int argc, char *argv[])
 {
-	uint64_t nextents = 0, nchunks = 0, ndupextents = 0;
+	uint64_t nextents = 0, nchunks = 0, ndupextents = 0, unique_capacity;
 	int c, ret;
 
 	arch_init(argv);
 	debug_init();
 
-	while ((c = getopt(argc, argv, "b:t:d:o:c:p:B:")) != -1) {
+	while ((c = getopt(argc, argv, "b:t:d:o:c:p:B:C:")) != -1) {
 		switch (c) {
 		case 'b':
 			blocksize = atoi(optarg);
@@ -566,13 +662,16 @@ int main(int argc, char *argv[])
 		case 'B':
 			use_bloom = atoi(optarg);
 			break;
+		case 'C':
+			compression = atoi(optarg);
+			break;
 		case '?':
 		default:
 			return usage(argv);
 		}
 	}
 
-	if (collision_check || dump_output)
+	if (collision_check || dump_output || compression)
 		use_bloom = 0;
 
 	if (!num_threads)
@@ -586,13 +685,13 @@ int main(int argc, char *argv[])
 	rb_root = RB_ROOT;
 	rb_lock = fio_sem_init(FIO_SEM_UNLOCKED);
 
-	ret = dedupe_check(argv[optind], &nextents, &nchunks);
+	ret = dedupe_check(argv[optind], &nextents, &nchunks, &unique_capacity);
 
 	if (!ret) {
 		if (!bloom)
 			iter_rb_tree(&nextents, &nchunks, &ndupextents);
 
-		show_stat(nextents, nchunks, ndupextents);
+		show_stat(nextents, nchunks, ndupextents, unique_capacity);
 	}
 
 	fio_sem_remove(rb_lock);
diff --git a/t/io_uring.c b/t/io_uring.c
index 7bf215c7..a98f78fd 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -192,7 +192,7 @@ unsigned int calc_clat_percentiles(unsigned long *io_u_plat, unsigned long nr,
 	unsigned long *ovals = NULL;
 	bool is_last;
 
-	*minv = -1ULL;
+	*minv = -1UL;
 	*maxv = 0;
 
 	ovals = malloc(len * sizeof(*ovals));
@@ -498,7 +498,7 @@ static void init_io(struct submitter *s, unsigned index)
 	sqe->off = offset;
 	sqe->user_data = (unsigned long) f->fileno;
 	if (stats && stats_running)
-		sqe->user_data |= ((unsigned long)s->clock_index << 32);
+		sqe->user_data |= ((uint64_t)s->clock_index << 32);
 }
 
 static int prep_more_ios_uring(struct submitter *s, int max_ios)


* Recent changes (master)
@ 2021-11-21 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-11-21 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit beda9d8d9e9148ff34eaa0eeb0cde19a36f47494:

  t/io_uring: add -R option for random/sequential IO (2021-11-19 10:44:15 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9f51d89c683d70cd8ab23ba09ec6e628a548af5a:

  Sync io_uring header with the kernel (2021-11-20 07:31:20 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      io_uring: clamp CQ size to SQ size
      Sync io_uring header with the kernel

 engines/io_uring.c  |   7 ++
 os/linux/io_uring.h | 186 ++++++++++++++++++++++++++++++++++++++++++----------
 t/io_uring.c        |   7 ++
 3 files changed, 167 insertions(+), 33 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 8b8f35f1..00ae3482 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -692,6 +692,13 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		}
 	}
 
+	/*
+	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
+	 * than that.
+	 */
+	p.flags |= IORING_SETUP_CQSIZE;
+	p.cq_entries = depth;
+
 	ret = syscall(__NR_io_uring_setup, depth, &p);
 	if (ret < 0)
 		return ret;
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index d39b45fd..c45b5e9a 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -11,10 +11,6 @@
 #include <linux/fs.h>
 #include <linux/types.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /*
  * IO submission data structure (Submission Queue Entry)
  */
@@ -46,23 +42,25 @@ struct io_uring_sqe {
 		__u32		statx_flags;
 		__u32		fadvise_advice;
 		__u32		splice_flags;
+		__u32		rename_flags;
+		__u32		unlink_flags;
+		__u32		hardlink_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
+	/* pack this to avoid bogus arm OABI complaints */
 	union {
-		struct {
-			/* pack this to avoid bogus arm OABI complaints */
-			union {
-				/* index into fixed buffers, if used */
-				__u16	buf_index;
-				/* for grouped buffer selection */
-				__u16	buf_group;
-			} __attribute__((packed));
-			/* personality to use, if used */
-			__u16	personality;
-			__s32	splice_fd_in;
-		};
-		__u64	__pad2[3];
+		/* index into fixed buffers, if used */
+		__u16	buf_index;
+		/* for grouped buffer selection */
+		__u16	buf_group;
+	} __attribute__((packed));
+	/* personality to use, if used */
+	__u16	personality;
+	union {
+		__s32	splice_fd_in;
+		__u32	file_index;
 	};
+	__u64	__pad2[2];
 };
 
 enum {
@@ -99,6 +97,7 @@ enum {
 #define IORING_SETUP_CQSIZE	(1U << 3)	/* app defines CQ size */
 #define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
 #define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
+#define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
 
 enum {
 	IORING_OP_NOP,
@@ -135,6 +134,12 @@ enum {
 	IORING_OP_PROVIDE_BUFFERS,
 	IORING_OP_REMOVE_BUFFERS,
 	IORING_OP_TEE,
+	IORING_OP_SHUTDOWN,
+	IORING_OP_RENAMEAT,
+	IORING_OP_UNLINKAT,
+	IORING_OP_MKDIRAT,
+	IORING_OP_SYMLINKAT,
+	IORING_OP_LINKAT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
@@ -148,14 +153,35 @@ enum {
 /*
  * sqe->timeout_flags
  */
-#define IORING_TIMEOUT_ABS	(1U << 0)
-
+#define IORING_TIMEOUT_ABS		(1U << 0)
+#define IORING_TIMEOUT_UPDATE		(1U << 1)
+#define IORING_TIMEOUT_BOOTTIME		(1U << 2)
+#define IORING_TIMEOUT_REALTIME		(1U << 3)
+#define IORING_LINK_TIMEOUT_UPDATE	(1U << 4)
+#define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
+#define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
+#define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
 /*
  * sqe->splice_flags
  * extends splice(2) flags
  */
 #define SPLICE_F_FD_IN_FIXED	(1U << 31) /* the last bit of __u32 */
 
+/*
+ * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the
+ * command flags for POLL_ADD are stored in sqe->len.
+ *
+ * IORING_POLL_ADD_MULTI	Multishot poll. Sets IORING_CQE_F_MORE if
+ *				the poll handler will continue to report
+ *				CQEs on behalf of the same SQE.
+ *
+ * IORING_POLL_UPDATE		Update existing poll request, matching
+ *				sqe->addr as the old user_data field.
+ */
+#define IORING_POLL_ADD_MULTI	(1U << 0)
+#define IORING_POLL_UPDATE_EVENTS	(1U << 1)
+#define IORING_POLL_UPDATE_USER_DATA	(1U << 2)
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
@@ -169,8 +195,10 @@ struct io_uring_cqe {
  * cqe->flags
  *
  * IORING_CQE_F_BUFFER	If set, the upper 16 bits are the buffer ID
+ * IORING_CQE_F_MORE	If set, parent SQE will generate more CQE entries
  */
 #define IORING_CQE_F_BUFFER		(1U << 0)
+#define IORING_CQE_F_MORE		(1U << 1)
 
 enum {
 	IORING_CQE_BUFFER_SHIFT		= 16,
@@ -228,6 +256,8 @@ struct io_cqring_offsets {
  */
 #define IORING_ENTER_GETEVENTS	(1U << 0)
 #define IORING_ENTER_SQ_WAKEUP	(1U << 1)
+#define IORING_ENTER_SQ_WAIT	(1U << 2)
+#define IORING_ENTER_EXT_ARG	(1U << 3)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -255,28 +285,85 @@ struct io_uring_params {
 #define IORING_FEAT_CUR_PERSONALITY	(1U << 4)
 #define IORING_FEAT_FAST_POLL		(1U << 5)
 #define IORING_FEAT_POLL_32BITS 	(1U << 6)
+#define IORING_FEAT_SQPOLL_NONFIXED	(1U << 7)
+#define IORING_FEAT_EXT_ARG		(1U << 8)
+#define IORING_FEAT_NATIVE_WORKERS	(1U << 9)
+#define IORING_FEAT_RSRC_TAGS		(1U << 10)
 
 /*
  * io_uring_register(2) opcodes and arguments
  */
-#define IORING_REGISTER_BUFFERS		0
-#define IORING_UNREGISTER_BUFFERS	1
-#define IORING_REGISTER_FILES		2
-#define IORING_UNREGISTER_FILES		3
-#define IORING_REGISTER_EVENTFD		4
-#define IORING_UNREGISTER_EVENTFD	5
-#define IORING_REGISTER_FILES_UPDATE	6
-#define IORING_REGISTER_EVENTFD_ASYNC	7
-#define IORING_REGISTER_PROBE		8
-#define IORING_REGISTER_PERSONALITY	9
-#define IORING_UNREGISTER_PERSONALITY	10
+enum {
+	IORING_REGISTER_BUFFERS			= 0,
+	IORING_UNREGISTER_BUFFERS		= 1,
+	IORING_REGISTER_FILES			= 2,
+	IORING_UNREGISTER_FILES			= 3,
+	IORING_REGISTER_EVENTFD			= 4,
+	IORING_UNREGISTER_EVENTFD		= 5,
+	IORING_REGISTER_FILES_UPDATE		= 6,
+	IORING_REGISTER_EVENTFD_ASYNC		= 7,
+	IORING_REGISTER_PROBE			= 8,
+	IORING_REGISTER_PERSONALITY		= 9,
+	IORING_UNREGISTER_PERSONALITY		= 10,
+	IORING_REGISTER_RESTRICTIONS		= 11,
+	IORING_REGISTER_ENABLE_RINGS		= 12,
+
+	/* extended with tagging */
+	IORING_REGISTER_FILES2			= 13,
+	IORING_REGISTER_FILES_UPDATE2		= 14,
+	IORING_REGISTER_BUFFERS2		= 15,
+	IORING_REGISTER_BUFFERS_UPDATE		= 16,
+
+	/* set/clear io-wq thread affinities */
+	IORING_REGISTER_IOWQ_AFF		= 17,
+	IORING_UNREGISTER_IOWQ_AFF		= 18,
+
+	/* set/get max number of io-wq workers */
+	IORING_REGISTER_IOWQ_MAX_WORKERS	= 19,
 
+	/* this goes last */
+	IORING_REGISTER_LAST
+};
+
+/* io-wq worker categories */
+enum {
+	IO_WQ_BOUND,
+	IO_WQ_UNBOUND,
+};
+
+/* deprecated, see struct io_uring_rsrc_update */
 struct io_uring_files_update {
 	__u32 offset;
 	__u32 resv;
 	__aligned_u64 /* __s32 * */ fds;
 };
 
+struct io_uring_rsrc_register {
+	__u32 nr;
+	__u32 resv;
+	__u64 resv2;
+	__aligned_u64 data;
+	__aligned_u64 tags;
+};
+
+struct io_uring_rsrc_update {
+	__u32 offset;
+	__u32 resv;
+	__aligned_u64 data;
+};
+
+struct io_uring_rsrc_update2 {
+	__u32 offset;
+	__u32 resv;
+	__aligned_u64 data;
+	__aligned_u64 tags;
+	__u32 nr;
+	__u32 resv2;
+};
+
+/* Skip updating fd indexes set to this value in the fd table */
+#define IORING_REGISTER_FILES_SKIP	(-2)
+
 #define IO_URING_OP_SUPPORTED	(1U << 0)
 
 struct io_uring_probe_op {
@@ -294,8 +381,41 @@ struct io_uring_probe {
 	struct io_uring_probe_op ops[0];
 };
 
-#ifdef __cplusplus
-}
-#endif
+struct io_uring_restriction {
+	__u16 opcode;
+	union {
+		__u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
+		__u8 sqe_op;      /* IORING_RESTRICTION_SQE_OP */
+		__u8 sqe_flags;   /* IORING_RESTRICTION_SQE_FLAGS_* */
+	};
+	__u8 resv;
+	__u32 resv2[3];
+};
+
+/*
+ * io_uring_restriction->opcode values
+ */
+enum {
+	/* Allow an io_uring_register(2) opcode */
+	IORING_RESTRICTION_REGISTER_OP		= 0,
+
+	/* Allow an sqe opcode */
+	IORING_RESTRICTION_SQE_OP		= 1,
+
+	/* Allow sqe flags */
+	IORING_RESTRICTION_SQE_FLAGS_ALLOWED	= 2,
+
+	/* Require sqe flags (these flags must be set on each submission) */
+	IORING_RESTRICTION_SQE_FLAGS_REQUIRED	= 3,
+
+	IORING_RESTRICTION_LAST
+};
+
+struct io_uring_getevents_arg {
+	__u64	sigmask;
+	__u32	sigmask_sz;
+	__u32	pad;
+	__u64	ts;
+};
 
 #endif
diff --git a/t/io_uring.c b/t/io_uring.c
index b79822d7..7bf215c7 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -384,6 +384,13 @@ static int io_uring_register_files(struct submitter *s)
 
 static int io_uring_setup(unsigned entries, struct io_uring_params *p)
 {
+	/*
+	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
+	 * than that.
+	 */
+	p->flags |= IORING_SETUP_CQSIZE;
+	p->cq_entries = entries;
+
 	return syscall(__NR_io_uring_setup, entries, p);
 }
 


* Recent changes (master)
@ 2021-11-20 13:00 Jens Axboe
From: Jens Axboe @ 2021-11-20 13:00 UTC
  To: fio

The following changes since commit 5711325cbb37d10c21a6975d1f1ebea11799c05e:

  Makefile: Fix android compilation (2021-11-17 16:14:27 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to beda9d8d9e9148ff34eaa0eeb0cde19a36f47494:

  t/io_uring: add -R option for random/sequential IO (2021-11-19 10:44:15 -0700)

----------------------------------------------------------------
Damien Le Moal (1):
      fio: Introduce the log_entries option

Jens Axboe (2):
      t/io_uring: use internal random generator
      t/io_uring: add -R option for random/sequential IO

 HOWTO            | 12 ++++++++++++
 Makefile         |  3 +--
 cconv.c          |  2 ++
 fio.1            | 11 +++++++++++
 lib/rand.c       |  2 +-
 lib/rand.h       |  1 +
 options.c        | 12 ++++++++++++
 server.h         |  2 +-
 stat.c           | 12 +++++-------
 t/io_uring.c     | 34 +++++++++++++++++++++++++++-------
 thread_options.h |  2 ++
 11 files changed, 75 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 196bca6c..a3b3acfe 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3537,6 +3537,18 @@ Measurements and reporting
 	:option:`write_bw_log` for details about the filename format and `Log
 	File Formats`_ for how data is structured within the file.
 
+.. option:: log_entries=int
+
+	By default, fio will log an entry in the iops, latency, or bw log for
+	every I/O that completes. The initial number of I/O log entries is 1024.
+	When the log entries are all used, new log entries are dynamically
+	allocated.  This dynamic log entry allocation may negatively impact
+	time-related statistics such as I/O tail latencies (e.g. 99.9th percentile
+	completion latency). This option allows specifying a larger initial
+	number of log entries to avoid run-time allocations of new log entries,
+	resulting in more precise time-related I/O statistics.
+	Also see :option:`log_avg_msec`. Defaults to 1024.
+
 .. option:: log_avg_msec=int
 
 	By default, fio will log an entry in the iops, latency, or bw log for every
diff --git a/Makefile b/Makefile
index 04c1e0a7..5d17bcab 100644
--- a/Makefile
+++ b/Makefile
@@ -375,8 +375,7 @@ T_VS_PROGS = t/fio-verify-state
 T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
 T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
 
-T_IOU_RING_OBJS = t/io_uring.o
-T_IOU_RING_OBJS += t/arch.o
+T_IOU_RING_OBJS = t/io_uring.o lib/rand.o lib/pattern.o lib/strntol.o
 T_IOU_RING_PROGS = t/io_uring
 
 T_MEMLOCK_OBJS = t/memlock.o
diff --git a/cconv.c b/cconv.c
index 2104308c..4f8d27eb 100644
--- a/cconv.c
+++ b/cconv.c
@@ -187,6 +187,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->rand_repeatable = le32_to_cpu(top->rand_repeatable);
 	o->allrand_repeatable = le32_to_cpu(top->allrand_repeatable);
 	o->rand_seed = le64_to_cpu(top->rand_seed);
+	o->log_entries = le32_to_cpu(top->log_entries);
 	o->log_avg_msec = le32_to_cpu(top->log_avg_msec);
 	o->log_hist_msec = le32_to_cpu(top->log_hist_msec);
 	o->log_hist_coarseness = le32_to_cpu(top->log_hist_coarseness);
@@ -416,6 +417,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->rand_repeatable = cpu_to_le32(o->rand_repeatable);
 	top->allrand_repeatable = cpu_to_le32(o->allrand_repeatable);
 	top->rand_seed = __cpu_to_le64(o->rand_seed);
+	top->log_entries = cpu_to_le32(o->log_entries);
 	top->log_avg_msec = cpu_to_le32(o->log_avg_msec);
 	top->log_max = cpu_to_le32(o->log_max);
 	top->log_offset = cpu_to_le32(o->log_offset);
diff --git a/fio.1 b/fio.1
index e3c3feae..a6469541 100644
--- a/fio.1
+++ b/fio.1
@@ -3243,6 +3243,17 @@ logging (see \fBlog_avg_msec\fR) has been enabled. See
 \fBwrite_bw_log\fR for details about the filename format and \fBLOG
 FILE FORMATS\fR for how data is structured within the file.
 .TP
+.BI log_entries \fR=\fPint
+By default, fio will log an entry in the iops, latency, or bw log for
+every I/O that completes. The initial number of I/O log entries is 1024.
+When the log entries are all used, new log entries are dynamically
+allocated.  This dynamic log entry allocation may negatively impact
+time-related statistics such as I/O tail latencies (e.g. 99.9th percentile
+completion latency). This option allows specifying a larger initial
+number of log entries to avoid run-time allocation of new log entries,
+resulting in more precise time-related I/O statistics.
+Also see \fBlog_avg_msec\fR as well. Defaults to 1024.
+.TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
 I/O that completes. When writing to the disk log, that can quickly grow to a
diff --git a/lib/rand.c b/lib/rand.c
index e74da609..6e893e80 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -59,7 +59,7 @@ static void __init_rand32(struct taus88_state *state, unsigned int seed)
 		__rand32(state);
 }
 
-static void __init_rand64(struct taus258_state *state, uint64_t seed)
+void __init_rand64(struct taus258_state *state, uint64_t seed)
 {
 	int cranks = 6;
 
diff --git a/lib/rand.h b/lib/rand.h
index a8060045..2b4be788 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -162,6 +162,7 @@ static inline uint64_t __get_next_seed(struct frand_state *fs)
 
 extern void init_rand(struct frand_state *, bool);
 extern void init_rand_seed(struct frand_state *, uint64_t seed, bool);
+void __init_rand64(struct taus258_state *state, uint64_t seed);
 extern void __fill_random_buf(void *buf, unsigned int len, uint64_t seed);
 extern uint64_t fill_random_buf(struct frand_state *, void *buf, unsigned int len);
 extern void __fill_random_buf_percentage(uint64_t, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
diff --git a/options.c b/options.c
index 460cf4ff..102bcf56 100644
--- a/options.c
+++ b/options.c
@@ -4244,6 +4244,18 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "log_entries",
+		.lname	= "Log entries",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, log_entries),
+		.help	= "Initial number of entries in a job IO log",
+		.def	= __fio_stringify(DEF_LOG_ENTRIES),
+		.minval	= DEF_LOG_ENTRIES,
+		.maxval	= MAX_LOG_ENTRIES,
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "log_avg_msec",
 		.lname	= "Log averaging (msec)",
diff --git a/server.h b/server.h
index 44b8da12..25b6bbdc 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 94,
+	FIO_SERVER_VER			= 95,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index cd35b114..e0dc99b6 100644
--- a/stat.c
+++ b/stat.c
@@ -2688,27 +2688,25 @@ static inline void add_stat_sample(struct io_stat *is, unsigned long long data)
  */
 static struct io_logs *get_new_log(struct io_log *iolog)
 {
-	size_t new_size, new_samples;
+	size_t new_samples;
 	struct io_logs *cur_log;
 
 	/*
 	 * Cap the size at MAX_LOG_ENTRIES, so we don't keep doubling
 	 * forever
 	 */
-	if (!iolog->cur_log_max)
-		new_samples = DEF_LOG_ENTRIES;
-	else {
+	if (!iolog->cur_log_max) {
+		new_samples = iolog->td->o.log_entries;
+	} else {
 		new_samples = iolog->cur_log_max * 2;
 		if (new_samples > MAX_LOG_ENTRIES)
 			new_samples = MAX_LOG_ENTRIES;
 	}
 
-	new_size = new_samples * log_entry_sz(iolog);
-
 	cur_log = smalloc(sizeof(*cur_log));
 	if (cur_log) {
 		INIT_FLIST_HEAD(&cur_log->list);
-		cur_log->log = malloc(new_size);
+		cur_log->log = calloc(new_samples, log_entry_sz(iolog));
 		if (cur_log->log) {
 			cur_log->nr_samples = 0;
 			cur_log->max_samples = new_samples;
diff --git a/t/io_uring.c b/t/io_uring.c
index f758a6d9..b79822d7 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -28,6 +28,7 @@
 #include "../arch/arch.h"
 #include "../lib/types.h"
 #include "../lib/roundup.h"
+#include "../lib/rand.h"
 #include "../minmax.h"
 #include "../os/linux/io_uring.h"
 
@@ -59,6 +60,8 @@ static unsigned sq_ring_mask, cq_ring_mask;
 
 struct file {
 	unsigned long max_blocks;
+	unsigned long max_size;
+	unsigned long cur_off;
 	unsigned pending_ios;
 	int real_fd;
 	int fixed_fd;
@@ -86,6 +89,8 @@ struct submitter {
 
 	__s32 *fds;
 
+	struct taus258_state rand_state;
+
 	unsigned long *clock_batch;
 	int clock_index;
 	unsigned long *plat;
@@ -120,7 +125,8 @@ static int do_nop = 0;		/* no-op SQ ring commands */
 static int nthreads = 1;
 static int stats = 0;		/* generate IO stats */
 static int aio = 0;		/* use libaio */
-static int runtime = 0;	/* runtime */
+static int runtime = 0;		/* runtime */
+static int random_io = 1;	/* random or sequential IO */
 
 static unsigned long tsc_rate;
 
@@ -448,8 +454,15 @@ static void init_io(struct submitter *s, unsigned index)
 	}
 	f->pending_ios++;
 
-	r = lrand48();
-	offset = (r % (f->max_blocks - 1)) * bs;
+	if (random_io) {
+		r = __rand64(&s->rand_state);
+		offset = (r % (f->max_blocks - 1)) * bs;
+	} else {
+		offset = f->cur_off;
+		f->cur_off += bs;
+		if (f->cur_off + bs > f->max_size)
+			f->cur_off = 0;
+	}
 
 	if (register_files) {
 		sqe->flags = IOSQE_FIXED_FILE;
@@ -517,9 +530,11 @@ static int get_file_size(struct file *f)
 			return -1;
 
 		f->max_blocks = bytes / bs;
+		f->max_size = bytes;
 		return 0;
 	} else if (S_ISREG(st.st_mode)) {
 		f->max_blocks = st.st_size / bs;
+		f->max_size = st.st_size;
 		return 0;
 	}
 
@@ -586,6 +601,7 @@ static int submitter_init(struct submitter *s)
 	s->tid = gettid();
 	printf("submitter=%d, tid=%d\n", s->index, s->tid);
 
+	__init_rand64(&s->rand_state, pthread_self());
 	srand48(pthread_self());
 
 	for (i = 0; i < MAX_FDS; i++)
@@ -1066,11 +1082,12 @@ static void usage(char *argv, int status)
 		" -N <bool> : Perform just no-op requests, default %d\n"
 		" -t <bool> : Track IO latencies, default %d\n"
 		" -T <int>  : TSC rate in HZ\n"
-		" -a <bool> : Use legacy aio, default %d\n"
-		" -r <int>  : Runtime in seconds, default %s\n",
+		" -r <int>  : Runtime in seconds, default %s\n"
+		" -R <bool> : Use random IO, default %d\n"
+		" -a <bool> : Use legacy aio, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
 		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
-		stats, aio, runtime == 0 ? "unlimited" : runtime_str);
+		stats, runtime == 0 ? "unlimited" : runtime_str, aio, random_io);
 	exit(status);
 }
 
@@ -1130,7 +1147,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:R:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1194,6 +1211,9 @@ int main(int argc, char *argv[])
 		case 'D':
 			dma_map = !!atoi(optarg);
 			break;
+		case 'R':
+			random_io = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
diff --git a/thread_options.h b/thread_options.h
index 6e1a2cdd..8f4c8a59 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -377,6 +377,7 @@ struct thread_options {
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
 
+	unsigned int log_entries;
 	unsigned int log_prio;
 };
 
@@ -683,6 +684,7 @@ struct thread_options_pack {
 	int32_t max_open_zones;
 	uint32_t ignore_zone_limits;
 
+	uint32_t log_entries;
 	uint32_t log_prio;
 } __attribute__((packed));
 


* Recent changes (master)
@ 2021-11-18 13:00 Jens Axboe
From: Jens Axboe @ 2021-11-18 13:00 UTC
  To: fio

The following changes since commit f7c3f31db877d30056d19761e48499f5b0bfa0b6:

  Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio (2021-11-12 09:22:21 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5711325cbb37d10c21a6975d1f1ebea11799c05e:

  Makefile: Fix android compilation (2021-11-17 16:14:27 -0700)

----------------------------------------------------------------
Gwendal Grignou (1):
      Makefile: Fix android compilation

 Makefile | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index e9028dce..04c1e0a7 100644
--- a/Makefile
+++ b/Makefile
@@ -236,6 +236,7 @@ endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c
+  cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
 endif


* Recent changes (master)
@ 2021-11-13 13:00 Jens Axboe
From: Jens Axboe @ 2021-11-13 13:00 UTC
  To: fio

The following changes since commit 6619fc32c413c4ff3a24c819037fb9227af3f876:

  stat: create a init_thread_stat_min_vals() helper (2021-11-08 06:24:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f7c3f31db877d30056d19761e48499f5b0bfa0b6:

  Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio (2021-11-12 09:22:21 -0700)

----------------------------------------------------------------
Jean-Francois Panisset (1):
      Small typo fix

Jens Axboe (1):
      Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio

Niklas Cassel (8):
      docs: update cmdprio_percentage documentation
      cmdprio: move cmdprio function definitions to a new cmdprio.c file
      cmdprio: do not allocate memory for unused data direction
      io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
      libaio,io_uring: rename prio_prep() to include cmdprio in the name
      libaio,io_uring: move common cmdprio_prep() code to cmdprio
      cmdprio: add mode to make the logic easier to reason about
      libaio,io_uring: make it possible to cleanup cmdprio malloced data

 HOWTO              |   5 +-
 Makefile           |   6 ++
 README             |   2 +-
 engines/cmdprio.c  | 243 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 engines/cmdprio.h  | 150 ++++++---------------------------
 engines/io_uring.c | 100 ++++++++--------------
 engines/libaio.c   |  72 +++++-----------
 fio.1              |   3 +-
 8 files changed, 333 insertions(+), 248 deletions(-)
 create mode 100644 engines/cmdprio.c

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 297a0485..196bca6c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2167,9 +2167,8 @@ with the caveat that when used on the command line, they must come after the
 
     Set the percentage of I/O that will be issued with the highest priority.
     Default: 0. A single value applies to reads and writes. Comma-separated
-    values may be specified for reads and writes. This option cannot be used
-    with the :option:`prio` or :option:`prioclass` options. For this option
-    to be effective, NCQ priority must be supported and enabled, and `direct=1'
+    values may be specified for reads and writes. For this option to be
+    effective, NCQ priority must be supported and enabled, and `direct=1'
     option must be used. fio must also be run as the root user.
 
 .. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
diff --git a/Makefile b/Makefile
index 4ae5a371..e9028dce 100644
--- a/Makefile
+++ b/Makefile
@@ -98,6 +98,7 @@ else ifdef CONFIG_32BIT
 endif
 ifdef CONFIG_LIBAIO
   libaio_SRCS = engines/libaio.c
+  cmdprio_SRCS = engines/cmdprio.c
   libaio_LIBS = -laio
   ENGINES += libaio
 endif
@@ -225,6 +226,7 @@ endif
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c
+  cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
 endif
@@ -281,6 +283,10 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   FIO_CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
 
+ifdef cmdprio_SRCS
+  SOURCE += $(cmdprio_SRCS)
+endif
+
 ifdef CONFIG_DYNAMIC_ENGINES
  DYNAMIC_ENGS := $(ENGINES)
 define engine_template =
diff --git a/README b/README
index 52eca5c3..d566fae3 100644
--- a/README
+++ b/README
@@ -10,7 +10,7 @@ tailored test case again and again.
 
 A test work load is difficult to define, though. There can be any number of
 processes or threads involved, and they can each be using their own way of
-generating I/O. You could have someone dirtying large amounts of memory in an
+generating I/O. You could have someone dirtying large amounts of memory in a
 memory mapped file, or maybe several threads issuing reads using asynchronous
 I/O. fio needed to be flexible enough to simulate both of these cases, and many
 more.
diff --git a/engines/cmdprio.c b/engines/cmdprio.c
new file mode 100644
index 00000000..92b752ae
--- /dev/null
+++ b/engines/cmdprio.c
@@ -0,0 +1,243 @@
+/*
+ * IO priority handling helper functions common to the libaio and io_uring
+ * engines.
+ */
+
+#include "cmdprio.h"
+
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio *cmdprio = cb_arg;
+	struct split split;
+	unsigned int i;
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	memset(&split, 0, sizeof(split));
+
+	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+		return 1;
+	if (!split.nr)
+		return 0;
+
+	cmdprio->bssplit_nr[ddir] = split.nr;
+	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
+	if (!cmdprio->bssplit[ddir])
+		return 1;
+
+	for (i = 0; i < split.nr; i++) {
+		cmdprio->bssplit[ddir][i].bs = split.val1[i];
+		if (split.val2[i] == -1U) {
+			cmdprio->bssplit[ddir][i].perc = 0;
+		} else {
+			if (split.val2[i] > 100)
+				cmdprio->bssplit[ddir][i].perc = 100;
+			else
+				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+		}
+	}
+
+	return 0;
+}
+
+int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+			      struct cmdprio *cmdprio)
+{
+	char *str, *p;
+	int ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio,
+			      false);
+
+	free(p);
+	return ret;
+}
+
+static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	struct cmdprio_options *options = cmdprio->options;
+	int i;
+
+	switch (cmdprio->mode) {
+	case CMDPRIO_MODE_PERC:
+		return options->percentage[ddir];
+	case CMDPRIO_MODE_BSSPLIT:
+		for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
+			if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
+				return cmdprio->bssplit[ddir][i].perc;
+		}
+		break;
+	default:
+		/*
+		 * An I/O engine should never call this function if cmdprio
+		 * is not is use.
+		 */
+		assert(0);
+	}
+
+	return 0;
+}
+
+/**
+ * fio_cmdprio_set_ioprio - Set an io_u ioprio according to cmdprio options
+ *
+ * Generates a random percentage value to determine if an io_u ioprio needs
+ * to be set. If the random percentage value is within the user specified
+ * percentage of I/Os that should use a cmdprio priority value (rather than
+ * the default priority), then this function updates the io_u with an ioprio
+ * value as defined by the cmdprio/cmdprio_class or cmdprio_bssplit options.
+ *
+ * Return true if the io_u ioprio was changed and false otherwise.
+ */
+bool fio_cmdprio_set_ioprio(struct thread_data *td, struct cmdprio *cmdprio,
+			    struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	struct cmdprio_options *options = cmdprio->options;
+	unsigned int p;
+	unsigned int cmdprio_value =
+		ioprio_value(options->class[ddir], options->level[ddir]);
+
+	p = fio_cmdprio_percentage(cmdprio, io_u);
+	if (p && rand_between(&td->prio_state, 0, 99) < p) {
+		io_u->ioprio = cmdprio_value;
+		if (!td->ioprio || cmdprio_value < td->ioprio) {
+			/*
+			 * The async IO priority is higher (has a lower value)
+			 * than the default priority (which is either 0 or the
+			 * value set by "prio" and "prioclass" options).
+			 */
+			io_u->flags |= IO_U_F_HIGH_PRIO;
+		}
+		return true;
+	}
+
+	if (td->ioprio && td->ioprio < cmdprio_value) {
+		/*
+		 * The IO will be executed with the default priority (which is
+		 * either 0 or the value set by "prio" and "prioclass options),
+		 * and this priority is higher (has a lower value) than the
+		 * async IO priority.
+		 */
+		io_u->flags |= IO_U_F_HIGH_PRIO;
+	}
+
+	return false;
+}
+
+static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
+					     struct cmdprio *cmdprio)
+{
+	struct cmdprio_options *options = cmdprio->options;
+	int ret;
+
+	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str, cmdprio);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	fio_cmdprio_cleanup(cmdprio);
+
+	return ret;
+}
+
+static int fio_cmdprio_parse_and_gen(struct thread_data *td,
+				     struct cmdprio *cmdprio)
+{
+	struct cmdprio_options *options = cmdprio->options;
+	int i, ret;
+
+	switch (cmdprio->mode) {
+	case CMDPRIO_MODE_BSSPLIT:
+		ret = fio_cmdprio_parse_and_gen_bssplit(td, cmdprio);
+		break;
+	case CMDPRIO_MODE_PERC:
+		ret = 0;
+		break;
+	default:
+		assert(0);
+		return 1;
+	}
+
+	/*
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
+	 */
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
+		if (options->percentage[i] || cmdprio->bssplit_nr[i]) {
+			if (!options->class[i])
+				options->class[i] = IOPRIO_CLASS_RT;
+		}
+	}
+
+	return ret;
+}
+
+void fio_cmdprio_cleanup(struct cmdprio *cmdprio)
+{
+	int ddir;
+
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		free(cmdprio->bssplit[ddir]);
+		cmdprio->bssplit[ddir] = NULL;
+		cmdprio->bssplit_nr[ddir] = 0;
+	}
+
+	/*
+	 * options points to a cmdprio_options struct that is part of td->eo.
+	 * td->eo itself will be freed by free_ioengine().
+	 */
+	cmdprio->options = NULL;
+}
+
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     struct cmdprio_options *options)
+{
+	struct thread_options *to = &td->o;
+	bool has_cmdprio_percentage = false;
+	bool has_cmdprio_bssplit = false;
+	int i;
+
+	cmdprio->options = options;
+
+	if (options->bssplit_str && strlen(options->bssplit_str))
+		has_cmdprio_bssplit = true;
+
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
+		if (options->percentage[i])
+			has_cmdprio_percentage = true;
+	}
+
+	/*
+	 * Check for option conflicts
+	 */
+	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
+		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
+			"are mutually exclusive\n",
+			to->name);
+		return 1;
+	}
+
+	if (has_cmdprio_bssplit)
+		cmdprio->mode = CMDPRIO_MODE_BSSPLIT;
+	else if (has_cmdprio_percentage)
+		cmdprio->mode = CMDPRIO_MODE_PERC;
+	else
+		cmdprio->mode = CMDPRIO_MODE_NONE;
+
+	/* Nothing left to do if cmdprio is not used */
+	if (cmdprio->mode == CMDPRIO_MODE_NONE)
+		return 0;
+
+	return fio_cmdprio_parse_and_gen(td, cmdprio);
+}
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 0edc4365..0c7bd6cf 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -8,137 +8,35 @@
 
 #include "../fio.h"
 
-struct cmdprio {
-	unsigned int percentage[DDIR_RWDIR_CNT];
-	unsigned int class[DDIR_RWDIR_CNT];
-	unsigned int level[DDIR_RWDIR_CNT];
-	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
-	struct bssplit *bssplit[DDIR_RWDIR_CNT];
-};
-
-static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
-				    enum fio_ddir ddir, char *str, bool data)
-{
-	struct cmdprio *cmdprio = cb_arg;
-	struct split split;
-	unsigned int i;
-
-	if (ddir == DDIR_TRIM)
-		return 0;
-
-	memset(&split, 0, sizeof(split));
-
-	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
-		return 1;
-	if (!split.nr)
-		return 0;
-
-	cmdprio->bssplit_nr[ddir] = split.nr;
-	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
-	if (!cmdprio->bssplit[ddir])
-		return 1;
-
-	for (i = 0; i < split.nr; i++) {
-		cmdprio->bssplit[ddir][i].bs = split.val1[i];
-		if (split.val2[i] == -1U) {
-			cmdprio->bssplit[ddir][i].perc = 0;
-		} else {
-			if (split.val2[i] > 100)
-				cmdprio->bssplit[ddir][i].perc = 100;
-			else
-				cmdprio->bssplit[ddir][i].perc = split.val2[i];
-		}
-	}
-
-	return 0;
-}
-
-static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-				     struct cmdprio *cmdprio)
-{
-	char *str, *p;
-	int i, ret = 0;
-
-	p = str = strdup(input);
+/* read and writes only, no trim */
+#define CMDPRIO_RWDIR_CNT 2
 
-	strip_blank_front(&str);
-	strip_blank_end(str);
-
-	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
-
-	if (parse_dryrun()) {
-		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-			free(cmdprio->bssplit[i]);
-			cmdprio->bssplit[i] = NULL;
-			cmdprio->bssplit_nr[i] = 0;
-		}
-	}
-
-	free(p);
-	return ret;
-}
-
-static inline int fio_cmdprio_percentage(struct cmdprio *cmdprio,
-					 struct io_u *io_u)
-{
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = cmdprio->percentage[ddir];
-	int i;
-
-	/*
-	 * If cmdprio_percentage option was specified, then use that
-	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
-	 * on the IO size.
-	 */
-	if (p)
-		return p;
-
-	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
-		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
-			return cmdprio->bssplit[ddir][i].perc;
-	}
-
-	return 0;
-}
+enum {
+	CMDPRIO_MODE_NONE,
+	CMDPRIO_MODE_PERC,
+	CMDPRIO_MODE_BSSPLIT,
+};
 
-static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
-			    bool *has_cmdprio)
-{
-	struct thread_options *to = &td->o;
-	bool has_cmdprio_percentage = false;
-	bool has_cmdprio_bssplit = false;
-	int i;
+struct cmdprio_options {
+	unsigned int percentage[CMDPRIO_RWDIR_CNT];
+	unsigned int class[CMDPRIO_RWDIR_CNT];
+	unsigned int level[CMDPRIO_RWDIR_CNT];
+	char *bssplit_str;
+};
 
-	/*
-	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
-	 * is not set, default to RT priority class.
-	 */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		if (cmdprio->percentage[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
-			has_cmdprio_percentage = true;
-		}
-		if (cmdprio->bssplit_nr[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
-			has_cmdprio_bssplit = true;
-		}
-	}
+struct cmdprio {
+	struct cmdprio_options *options;
+	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
+	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
+	unsigned int mode;
+};
 
-	/*
-	 * Check for option conflicts
-	 */
-	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
-		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
-			"are mutually exclusive\n",
-			to->name);
-		return 1;
-	}
+bool fio_cmdprio_set_ioprio(struct thread_data *td, struct cmdprio *cmdprio,
+			    struct io_u *io_u);
 
-	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
+void fio_cmdprio_cleanup(struct cmdprio *cmdprio);
 
-	return 0;
-}
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     struct cmdprio_options *options);
 
 #endif
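
The removed fio_cmdprio_bssplit_ddir() and fio_cmdprio_percentage() helpers show how a cmdprio_bssplit entry is applied:
- Each parsed percentage is clamped: a missing value becomes 0, and anything above 100 becomes 100.
- An I/O only picks up a percentage when its buffer length exactly matches a listed block size.

The sketch below restates that logic standalone. The struct and function names are hypothetical, and the string parsing itself (split_parse_ddir()) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one parsed cmdprio_bssplit entry. */
struct sketch_bssplit {
	unsigned long long bs;	/* block size in bytes */
	unsigned int perc;	/* share of I/Os of this size sent with cmdprio */
};

/* Same clamping as the removed fio_cmdprio_bssplit_ddir(). */
static unsigned int sketch_clamp_perc(unsigned int val2)
{
	if (val2 == -1U)	/* no percentage given for this size */
		return 0;
	return val2 > 100 ? 100 : val2;
}

/* Same exact-size lookup as the removed fio_cmdprio_percentage(). */
static unsigned int sketch_bssplit_perc(const struct sketch_bssplit *split,
					size_t nr, unsigned long long buflen)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (split[i].bs == buflen)
			return split[i].perc;
	}
	return 0;	/* sizes not listed never get the command priority */
}
```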
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 27a4a678..8b8f35f1 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -69,13 +69,13 @@ struct ioring_data {
 
 	struct ioring_mmap mmap[3];
 
-	bool use_cmdprio;
+	struct cmdprio cmdprio;
 };
 
 struct ioring_options {
 	struct thread_data *td;
 	unsigned int hipri;
-	struct cmdprio cmdprio;
+	struct cmdprio_options cmdprio_options;
 	unsigned int fixedbufs;
 	unsigned int registerfiles;
 	unsigned int sqpoll_thread;
@@ -106,15 +106,6 @@ static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 	return 0;
 }
 
-static int str_cmdprio_bssplit_cb(void *data, const char *input)
-{
-	struct ioring_options *o = data;
-	struct thread_data *td = o->td;
-	struct cmdprio *cmdprio = &o->cmdprio;
-
-	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
-}
-
 static struct fio_option options[] = {
 	{
 		.name	= "hipri",
@@ -131,9 +122,9 @@ static struct fio_option options[] = {
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.percentage[DDIR_READ]),
+				   cmdprio_options.percentage[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.percentage[DDIR_WRITE]),
+				   cmdprio_options.percentage[DDIR_WRITE]),
 		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
@@ -145,9 +136,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority class",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.class[DDIR_READ]),
+				   cmdprio_options.class[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.class[DDIR_WRITE]),
+				   cmdprio_options.class[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority class",
 		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
 		.maxval	= IOPRIO_MAX_PRIO_CLASS,
@@ -160,9 +151,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority level",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.level[DDIR_READ]),
+				   cmdprio_options.level[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.level[DDIR_WRITE]),
+				   cmdprio_options.level[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority level",
 		.minval	= IOPRIO_MIN_PRIO,
 		.maxval	= IOPRIO_MAX_PRIO,
@@ -173,9 +164,9 @@ static struct fio_option options[] = {
 	{
 		.name   = "cmdprio_bssplit",
 		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_ULL,
-		.cb     = str_cmdprio_bssplit_cb,
-		.off1   = offsetof(struct ioring_options, cmdprio.bssplit),
+		.type   = FIO_OPT_STR_STORE,
+		.off1   = offsetof(struct ioring_options,
+				   cmdprio_options.bssplit_str),
 		.help   = "Set priority percentages for different block sizes",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
@@ -338,6 +329,18 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->rw_flags |= RWF_UNCACHED;
 		if (o->nowait)
 			sqe->rw_flags |= RWF_NOWAIT;
+
+		/*
+		 * Since io_uring can have a submission context (sqthread_poll)
+		 * that is different from the process context, we cannot rely on
+		 * the IO priority set by ioprio_set() (option prio/prioclass)
+		 * to be inherited.
+		 * td->ioprio will have the value of the "default prio", so set
+		 * this unconditionally. This value might get overridden by
+		 * fio_ioring_cmdprio_prep() if the option cmdprio_percentage or
+		 * cmdprio_bssplit is used.
+		 */
+		sqe->ioprio = td->ioprio;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
 		sqe->ioprio = 0;
@@ -444,41 +447,14 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
-static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
+static inline void fio_ioring_cmdprio_prep(struct thread_data *td,
+					   struct io_u *io_u)
 {
-	struct ioring_options *o = td->eo;
 	struct ioring_data *ld = td->io_ops_data;
-	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
-	struct cmdprio *cmdprio = &o->cmdprio;
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
-	unsigned int cmdprio_value =
-		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
-
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		sqe->ioprio = cmdprio_value;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the priority set by "prio" and "prioclass"
-			 * options.
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	} else {
-		sqe->ioprio = td->ioprio;
-		if (cmdprio_value && td->ioprio && td->ioprio < cmdprio_value) {
-			/*
-			 * The IO will be executed with the priority set by
-			 * "prio" and "prioclass" options, and this priority
-			 * is higher (has a lower value) than the async IO
-			 * priority.
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	}
+	struct cmdprio *cmdprio = &ld->cmdprio;
 
-	io_u->ioprio = sqe->ioprio;
+	if (fio_cmdprio_set_ioprio(td, cmdprio, io_u))
+		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
@@ -508,8 +484,9 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == atomic_load_acquire(ring->head))
 		return FIO_Q_BUSY;
 
-	if (ld->use_cmdprio)
-		fio_ioring_prio_prep(td, io_u);
+	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
+		fio_ioring_cmdprio_prep(td, io_u);
+
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	atomic_store_release(ring->tail, next_tail);
 
@@ -613,6 +590,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 		if (!(td->flags & TD_F_CHILD))
 			fio_ioring_unmap(ld);
 
+		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->io_u_index);
 		free(ld->iovecs);
 		free(ld->fds);
@@ -819,8 +797,6 @@ static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
-	struct cmdprio *cmdprio = &o->cmdprio;
-	bool has_cmdprio = false;
 	int ret;
 
 	/* sqthread submission requires registered files */
@@ -845,22 +821,12 @@ static int fio_ioring_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio, &has_cmdprio);
+	ret = fio_cmdprio_init(td, &ld->cmdprio, &o->cmdprio_options);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_ioring_init");
 		return 1;
 	}
 
-	/*
-	 * Since io_uring can have a submission context (sqthread_poll) that is
-	 * different from the process context, we cannot rely on the the IO
-	 * priority set by ioprio_set() (option prio/prioclass) to be inherited.
-	 * Therefore, we set the sqe->ioprio field when prio/prioclass is used.
-	 */
-	ld->use_cmdprio = has_cmdprio ||
-		fio_option_is_set(&td->o, ioprio_class) ||
-		fio_option_is_set(&td->o, ioprio);
-
 	return 0;
 }
 
diff --git a/engines/libaio.c b/engines/libaio.c
index dd655355..9c278d06 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -52,25 +52,16 @@ struct libaio_data {
 	unsigned int head;
 	unsigned int tail;
 
-	bool use_cmdprio;
+	struct cmdprio cmdprio;
 };
 
 struct libaio_options {
 	struct thread_data *td;
 	unsigned int userspace_reap;
-	struct cmdprio cmdprio;
+	struct cmdprio_options cmdprio_options;
 	unsigned int nowait;
 };
 
-static int str_cmdprio_bssplit_cb(void *data, const char *input)
-{
-	struct libaio_options *o = data;
-	struct thread_data *td = o->td;
-	struct cmdprio *cmdprio = &o->cmdprio;
-
-	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
-}
-
 static struct fio_option options[] = {
 	{
 		.name	= "userspace_reap",
@@ -87,9 +78,9 @@ static struct fio_option options[] = {
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.percentage[DDIR_READ]),
+				   cmdprio_options.percentage[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.percentage[DDIR_WRITE]),
+				   cmdprio_options.percentage[DDIR_WRITE]),
 		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
@@ -101,9 +92,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority class",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.class[DDIR_READ]),
+				   cmdprio_options.class[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.class[DDIR_WRITE]),
+				   cmdprio_options.class[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority class",
 		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
 		.maxval	= IOPRIO_MAX_PRIO_CLASS,
@@ -116,9 +107,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority level",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.level[DDIR_READ]),
+				   cmdprio_options.level[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.level[DDIR_WRITE]),
+				   cmdprio_options.level[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority level",
 		.minval	= IOPRIO_MIN_PRIO,
 		.maxval	= IOPRIO_MAX_PRIO,
@@ -129,9 +120,9 @@ static struct fio_option options[] = {
 	{
 		.name   = "cmdprio_bssplit",
 		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_ULL,
-		.cb     = str_cmdprio_bssplit_cb,
-		.off1   = offsetof(struct libaio_options, cmdprio.bssplit),
+		.type   = FIO_OPT_STR_STORE,
+		.off1   = offsetof(struct libaio_options,
+				   cmdprio_options.bssplit_str),
 		.help   = "Set priority percentages for different block sizes",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -205,33 +196,15 @@ static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
+static inline void fio_libaio_cmdprio_prep(struct thread_data *td,
+					   struct io_u *io_u)
 {
-	struct libaio_options *o = td->eo;
-	struct cmdprio *cmdprio = &o->cmdprio;
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
-	unsigned int cmdprio_value =
-		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
-
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		io_u->ioprio = cmdprio_value;
-		io_u->iocb.aio_reqprio = cmdprio_value;
+	struct libaio_data *ld = td->io_ops_data;
+	struct cmdprio *cmdprio = &ld->cmdprio;
+
+	if (fio_cmdprio_set_ioprio(td, cmdprio, io_u)) {
+		io_u->iocb.aio_reqprio = io_u->ioprio;
 		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the default context priority.
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	} else if (td->ioprio && td->ioprio < cmdprio_value) {
-		/*
-		 * The IO will be executed with the default context priority,
-		 * and this priority is higher (has a lower value) than the
-		 * async IO priority.
-		 */
-		io_u->flags |= IO_U_F_HIGH_PRIO;
 	}
 }
 
@@ -368,8 +341,8 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	if (ld->use_cmdprio)
-		fio_libaio_prio_prep(td, io_u);
+	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
+		fio_libaio_cmdprio_prep(td, io_u);
 
 	ld->iocbs[ld->head] = &io_u->iocb;
 	ld->io_us[ld->head] = io_u;
@@ -487,6 +460,8 @@ static void fio_libaio_cleanup(struct thread_data *td)
 		 */
 		if (!(td->flags & TD_F_CHILD))
 			io_destroy(ld->aio_ctx);
+
+		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->aio_events);
 		free(ld->iocbs);
 		free(ld->io_us);
@@ -512,7 +487,6 @@ static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_data *ld;
 	struct libaio_options *o = td->eo;
-	struct cmdprio *cmdprio = &o->cmdprio;
 	int ret;
 
 	ld = calloc(1, sizeof(*ld));
@@ -525,7 +499,7 @@ static int fio_libaio_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
+	ret = fio_cmdprio_init(td, &ld->cmdprio, &o->cmdprio_options);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_libaio_init");
 		return 1;
diff --git a/fio.1 b/fio.1
index 78988c9e..e3c3feae 100644
--- a/fio.1
+++ b/fio.1
@@ -1965,8 +1965,7 @@ with the caveat that when used on the command line, they must come after the
 .BI (io_uring,libaio)cmdprio_percentage \fR=\fPint[,int]
 Set the percentage of I/O that will be issued with the highest priority.
 Default: 0. A single value applies to reads and writes. Comma-separated
-values may be specified for reads and writes. This option cannot be used
-with the `prio` or `prioclass` options. For this option to be effective,
+values may be specified for reads and writes. For this option to be effective,
 NCQ priority must be supported and enabled, and `direct=1' option must be
 used. fio must also be run as the root user.
 .TP
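
The fio.1 change drops the rule that cmdprio_percentage cannot be combined with prio/prioclass. The removed fio_ioring_prio_prep() shows how the two interact for each I/O.

Priorities are compared numerically:
- A lower non-zero value means a higher priority, because Linux packs the class above the level (class << 13 | level).
- A value of 0 means no priority was set.

A minimal sketch of that per-I/O choice follows. It assumes the random draw is passed in so the result is deterministic. The sketch_ names are hypothetical. The refactored fio_cmdprio_set_ioprio() in cmdprio.c is not shown in this excerpt, so its exact behaviour may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux encodes an I/O priority as (class << 13) | level. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13

enum {
	SKETCH_IOPRIO_CLASS_NONE = 0,
	SKETCH_IOPRIO_CLASS_RT = 1,
	SKETCH_IOPRIO_CLASS_BE = 2,
	SKETCH_IOPRIO_CLASS_IDLE = 3,
};

static unsigned int sketch_ioprio_value(unsigned int prio_class,
					unsigned int level)
{
	return (prio_class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/*
 * Pick the priority for one I/O, following the removed
 * fio_ioring_prio_prep(). 'roll' stands in for
 * rand_between(&td->prio_state, 0, 99), 'default_prio' for td->ioprio.
 * Returns true where that code set IO_U_F_HIGH_PRIO.
 */
static bool sketch_pick_ioprio(unsigned int perc, unsigned int roll,
			       unsigned int cmdprio_value,
			       unsigned int default_prio, unsigned int *ioprio)
{
	if (perc && roll < perc) {
		*ioprio = cmdprio_value;
		/* cmdprio used; high if no default or numerically lower */
		return !default_prio || cmdprio_value < default_prio;
	}

	*ioprio = default_prio;
	/* default used; high if it beats the (unused) cmdprio value */
	return cmdprio_value && default_prio && default_prio < cmdprio_value;
}
```

For example, RT level 0 encodes to 8192 and BE level 4 to 16388. An I/O that keeps an RT default is therefore flagged high priority even when it skips a BE cmdprio.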

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2021-11-11 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-11-11 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a87ea1a869595ca57052e7645431a397d3c7d5ac:

  Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio (2021-10-25 12:38:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6619fc32c413c4ff3a24c819037fb9227af3f876:

  stat: create a init_thread_stat_min_vals() helper (2021-11-08 06:24:48 -0700)

----------------------------------------------------------------
Niklas Cassel (1):
      stat: create a init_thread_stat_min_vals() helper

 init.c | 11 +----------
 stat.c | 66 +++++++++++++++++++++---------------------------------------------
 stat.h |  1 +
 3 files changed, 23 insertions(+), 55 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index ec1a2cac..5f069d9a 100644
--- a/init.c
+++ b/init.c
@@ -1548,16 +1548,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	memcpy(td->ts.percentile_list, o->percentile_list, sizeof(o->percentile_list));
 	td->ts.sig_figs = o->sig_figs;
 
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		td->ts.clat_stat[i].min_val = ULONG_MAX;
-		td->ts.slat_stat[i].min_val = ULONG_MAX;
-		td->ts.lat_stat[i].min_val = ULONG_MAX;
-		td->ts.bw_stat[i].min_val = ULONG_MAX;
-		td->ts.iops_stat[i].min_val = ULONG_MAX;
-		td->ts.clat_high_prio_stat[i].min_val = ULONG_MAX;
-		td->ts.clat_low_prio_stat[i].min_val = ULONG_MAX;
-	}
-	td->ts.sync_stat.min_val = ULONG_MAX;
+	init_thread_stat_min_vals(&td->ts);
 	td->ddir_seq_nr = o->ddir_seq_nr;
 
 	if ((o->stonewall || o->new_group) && prev_group_jobs) {
diff --git a/stat.c b/stat.c
index 30f9b5c1..cd35b114 100644
--- a/stat.c
+++ b/stat.c
@@ -483,22 +483,13 @@ static void show_mixed_ddir_status(struct group_run_stats *rs, struct thread_sta
 	struct thread_stat *ts_lcl;
 
 	int i2p;
-	int ddir = 0, i;
+	int ddir = 0;
 
 	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
 	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
-		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
-	}
-	ts_lcl->sync_stat.min_val = ULONG_MAX;
+	init_thread_stat_min_vals(ts_lcl);
 
 	sum_thread_stats(ts_lcl, ts, 1);
 
@@ -1466,22 +1457,12 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 				   int ver, struct buf_output *out)
 {
 	struct thread_stat *ts_lcl;
-	int i;
 
 	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
 	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
-		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
-	}
-	ts_lcl->sync_stat.min_val = ULONG_MAX;
+	init_thread_stat_min_vals(ts_lcl);
 	ts_lcl->lat_percentiles = ts->lat_percentiles;
 	ts_lcl->clat_percentiles = ts->clat_percentiles;
 	ts_lcl->slat_percentiles = ts->slat_percentiles;
@@ -1668,22 +1649,12 @@ static void add_mixed_ddir_status_json(struct thread_stat *ts,
 		struct group_run_stats *rs, struct json_object *parent)
 {
 	struct thread_stat *ts_lcl;
-	int i;
 
 	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
 	ts_lcl = malloc(sizeof(struct thread_stat));
 	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
 	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
-		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
-		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
-		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
-	}
-	ts_lcl->sync_stat.min_val = ULONG_MAX;
+	init_thread_stat_min_vals(ts_lcl);
 	ts_lcl->lat_percentiles = ts->lat_percentiles;
 	ts_lcl->clat_percentiles = ts->clat_percentiles;
 	ts_lcl->slat_percentiles = ts->slat_percentiles;
@@ -2270,22 +2241,27 @@ void init_group_run_stat(struct group_run_stats *gs)
 		gs->min_bw[i] = gs->min_run[i] = ~0UL;
 }
 
-void init_thread_stat(struct thread_stat *ts)
+void init_thread_stat_min_vals(struct thread_stat *ts)
 {
-	int j;
+	int i;
 
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		ts->clat_stat[i].min_val = ULONG_MAX;
+		ts->slat_stat[i].min_val = ULONG_MAX;
+		ts->lat_stat[i].min_val = ULONG_MAX;
+		ts->bw_stat[i].min_val = ULONG_MAX;
+		ts->iops_stat[i].min_val = ULONG_MAX;
+		ts->clat_high_prio_stat[i].min_val = ULONG_MAX;
+		ts->clat_low_prio_stat[i].min_val = ULONG_MAX;
+	}
+	ts->sync_stat.min_val = ULONG_MAX;
+}
+
+void init_thread_stat(struct thread_stat *ts)
+{
 	memset(ts, 0, sizeof(*ts));
 
-	for (j = 0; j < DDIR_RWDIR_CNT; j++) {
-		ts->lat_stat[j].min_val = -1UL;
-		ts->clat_stat[j].min_val = -1UL;
-		ts->slat_stat[j].min_val = -1UL;
-		ts->bw_stat[j].min_val = -1UL;
-		ts->iops_stat[j].min_val = -1UL;
-		ts->clat_high_prio_stat[j].min_val = -1UL;
-		ts->clat_low_prio_stat[j].min_val = -1UL;
-	}
-	ts->sync_stat.min_val = -1UL;
+	init_thread_stat_min_vals(ts);
 	ts->groupid = -1;
 }
 
diff --git a/stat.h b/stat.h
index a06237e7..9ef8caa4 100644
--- a/stat.h
+++ b/stat.h
@@ -327,6 +327,7 @@ extern void show_running_run_stats(void);
 extern void check_for_running_stats(void);
 extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, bool first);
 extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src);
+extern void init_thread_stat_min_vals(struct thread_stat *ts);
 extern void init_thread_stat(struct thread_stat *ts);
 extern void init_group_run_stat(struct group_run_stats *gs);
 extern void eta_to_str(char *str, unsigned long eta_sec);
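
init_thread_stat_min_vals() seeds every min_val with ULONG_MAX because a minimum is only ever lowered as samples arrive. After the memset() in its callers, a zero minimum would simply stay at 0. The old init_thread_stat() used -1UL, which converts to the same value.

A minimal sketch of the pattern follows. It uses a hypothetical reduced stat struct, not fio's struct io_stat.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical reduced stat; field names mirror the diff. */
struct sketch_stat {
	unsigned long min_val;
	unsigned long max_val;
	unsigned long samples;
};

static void sketch_init_min(struct sketch_stat *s)
{
	/* Sentinel: any real sample compares lower, so the first one wins. */
	s->min_val = ULONG_MAX;
}

static void sketch_add_sample(struct sketch_stat *s, unsigned long val)
{
	if (val < s->min_val)
		s->min_val = val;
	if (val > s->max_val)
		s->max_val = val;
	s->samples++;
}
```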

* Recent changes (master)
@ 2021-10-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 515418094c61cf135513a34651af6134a8794b5d:

  Merge branch 'master' of https://github.com/bvanassche/fio (2021-10-22 10:19:04 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a87ea1a869595ca57052e7645431a397d3c7d5ac:

  Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio (2021-10-25 12:38:35 -0600)

----------------------------------------------------------------
Erwan Velu (3):
      t/one-core-peak: Reporting SElinux status
      t/io_uring: Fixing typo in help message
      t/one-core-peak: Don't report errors if missing NVME features

Jens Axboe (1):
      Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio

 t/io_uring.c       |  2 +-
 t/one-core-peak.sh | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index a87042f8..f758a6d9 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -1059,7 +1059,7 @@ static void usage(char *argv, int status)
 		" -b <int>  : Block size, default %d\n"
 		" -p <bool> : Polled IO, default %d\n"
 		" -B <bool> : Fixed buffers, default %d\n"
-		" -R <bool> : DMA map fixed buffers, default %d\n"
+		" -D <bool> : DMA map fixed buffers, default %d\n"
 		" -F <bool> : Register files, default %d\n"
 		" -n <int>  : Number of threads, default %d\n"
 		" -O <bool> : Use O_DIRECT, default %d\n"
diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index fba4ec95..9da8304e 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -199,12 +199,18 @@ show_nvme() {
   info ${device_name} "MODEL=${model} FW=${fw} serial=${serial} PCI=${pci_addr}@${link_speed} IRQ=${irq} NUMA=${numa} CPUS=${cpus} "
   which nvme &> /dev/null
   if [ $? -eq 0 ]; then
-    NCQA=$(nvme get-feature -H -f 0x7 ${device} |grep NCQA |cut -d ':' -f 2 | xargs)
-    NSQA=$(nvme get-feature -H -f 0x7 ${device} |grep NSQA |cut -d ':' -f 2 | xargs)
-    power_state=$(nvme get-feature -H -f 0x2 ${device} | grep PS |cut -d ":" -f 2 | xargs)
-    apste=$(nvme get-feature -H -f 0xc ${device} | grep APSTE |cut -d ":" -f 2 | xargs)
-    temp=$(nvme smart-log ${device} |grep 'temperature' |cut -d ':' -f 2 |xargs)
-    info ${device_name} "Temp:${temp}, Autonomous Power State Transition:${apste}, PowerState:${power_state}, Completion Queues:${NCQA}, Submission Queues:${NSQA}"
+    status=""
+    NCQA=$(nvme get-feature -H -f 0x7 ${device} 2>&1 |grep NCQA |cut -d ':' -f 2 | xargs)
+    [ -n "${NCQA}" ] && status="${status}Completion Queues:${NCQA}, "
+    NSQA=$(nvme get-feature -H -f 0x7 ${device} 2>&1 |grep NSQA |cut -d ':' -f 2 | xargs)
+    [ -n "${NSQA}" ] && status="${status}Submission Queues:${NSQA}, "
+    power_state=$(nvme get-feature -H -f 0x2 ${device} 2>&1 | grep PS |cut -d ":" -f 2 | xargs)
+    [ -n "${power_state}" ] && status="${status}PowerState:${power_state}, "
+    apste=$(nvme get-feature -H -f 0xc ${device} 2>&1 | grep APSTE |cut -d ":" -f 2 | xargs)
+    [ -n "${apste}" ] && status="${status} Autonomous Power State Transition:${apste}, "
+    temp=$(nvme smart-log ${device} 2>&1 |grep 'temperature' |cut -d ':' -f 2 |xargs)
+    [ -n "${temp}" ] && status="${status}Temp:${temp}"
+    info ${device_name} "${status}"
   fi
 }
 
@@ -241,6 +247,7 @@ show_system() {
     info "system" "KERNEL: $(show_kernel_config_item ${config_item})"
   done
   info "system" "KERNEL: $(cat /proc/cmdline)"
+  info "system" "SElinux: $(getenforce)"
   tsc=$(journalctl -k | grep 'tsc: Refined TSC clocksource calibration:' | awk '{print $11}')
   if [ -n "${tsc}" ]; then
     info "system" "TSC: ${tsc} Mhz"

* Re: Recent changes (master)
  2021-10-25 15:42     ` Rebecca Cran
@ 2021-10-25 15:43       ` Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-25 15:43 UTC (permalink / raw)
  To: Rebecca Cran, fio

On 10/25/21 9:42 AM, Rebecca Cran wrote:
> On 10/25/21 9:41 AM, Jens Axboe wrote:
> 
>> On 10/25/21 9:37 AM, Rebecca Cran wrote:
>>> On 10/23/21 6:00 AM, Jens Axboe wrote:
>>>> The following changes since commit 09d0a62931df0bb7ed4ae92b83a245e35d04100a:
>>>>
>>>>     Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>     git://git.kernel.dk/fio.git master
>>> I just noticed this. Is it possible to change this to specify the https
>>> URL instead, since the git protocol is insecure?
>> Both will work with git.kernel.dk
> 
> I understand, I was thinking we might want to default to a secure protocol.

If you visit the page, it lists both. This is just the script I use to send
out the changes; I don't think it's something most people would find and then
use to clone :)

-- 
Jens Axboe


* Re: Recent changes (master)
  2021-10-25 15:41   ` Jens Axboe
@ 2021-10-25 15:42     ` Rebecca Cran
  2021-10-25 15:43       ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Rebecca Cran @ 2021-10-25 15:42 UTC (permalink / raw)
  To: Jens Axboe, fio

On 10/25/21 9:41 AM, Jens Axboe wrote:

> On 10/25/21 9:37 AM, Rebecca Cran wrote:
>> On 10/23/21 6:00 AM, Jens Axboe wrote:
>>> The following changes since commit 09d0a62931df0bb7ed4ae92b83a245e35d04100a:
>>>
>>>     Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)
>>>
>>> are available in the Git repository at:
>>>
>>>     git://git.kernel.dk/fio.git master
>> I just noticed this. Is it possible to change this to specify the https
>> URL instead, since the git protocol is insecure?
> Both will work with git.kernel.dk

I understand, I was thinking we might want to default to a secure protocol.


-- 
Rebecca Cran



* Re: Recent changes (master)
  2021-10-25 15:37 ` Rebecca Cran
@ 2021-10-25 15:41   ` Jens Axboe
  2021-10-25 15:42     ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2021-10-25 15:41 UTC (permalink / raw)
  To: Rebecca Cran, fio

On 10/25/21 9:37 AM, Rebecca Cran wrote:
> On 10/23/21 6:00 AM, Jens Axboe wrote:
>> The following changes since commit 09d0a62931df0bb7ed4ae92b83a245e35d04100a:
>>
>>    Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)
>>
>> are available in the Git repository at:
>>
>>    git://git.kernel.dk/fio.git master
> 
> I just noticed this. Is it possible to change this to specify the https 
> URL instead, since the git protocol is insecure?

Both will work with git.kernel.dk

-- 
Jens Axboe


* Re: Recent changes (master)
  2021-10-23 12:00 Jens Axboe
@ 2021-10-25 15:37 ` Rebecca Cran
  2021-10-25 15:41   ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Rebecca Cran @ 2021-10-25 15:37 UTC (permalink / raw)
  To: Jens Axboe, fio

On 10/23/21 6:00 AM, Jens Axboe wrote:
> The following changes since commit 09d0a62931df0bb7ed4ae92b83a245e35d04100a:
>
>    Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)
>
> are available in the Git repository at:
>
>    git://git.kernel.dk/fio.git master

I just noticed this. Is it possible to change this to specify the https 
URL instead, since the git protocol is insecure?


-- 
Rebecca Cran



* Recent changes (master)
@ 2021-10-23 12:00 Jens Axboe
  2021-10-25 15:37 ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2021-10-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 09d0a62931df0bb7ed4ae92b83a245e35d04100a:

  Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 515418094c61cf135513a34651af6134a8794b5d:

  Merge branch 'master' of https://github.com/bvanassche/fio (2021-10-22 10:19:04 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Android: Add io_uring support

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 Makefile        | 2 +-
 os/os-android.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index f28c130a..4ae5a371 100644
--- a/Makefile
+++ b/Makefile
@@ -233,7 +233,7 @@ endif
 endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
-		oslib/linux-dev-lookup.c
+		oslib/linux-dev-lookup.c engines/io_uring.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
 endif
diff --git a/os/os-android.h b/os/os-android.h
index 18eb39ce..10c51b83 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -309,4 +309,8 @@ static inline int fio_set_sched_idle(void)
 }
 #endif
 
+#ifndef RWF_UNCACHED
+#define RWF_UNCACHED	0x00000040
+#endif
+
 #endif


* Recent changes (master)
@ 2021-10-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a7194b2d3d427e7e5678c55a128639df9caf4a48:

  Merge branch 'fixes_1290' of https://github.com/rthardin/fio (2021-10-18 19:29:46 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 09d0a62931df0bb7ed4ae92b83a245e35d04100a:

  Merge branch 'patch-1' of https://github.com/sweettea/fio (2021-10-19 16:09:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/sweettea/fio

Sweet Tea Dorminy (1):
      t/fuzz: Clean up generated dependency makefiles

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index c3feb53f..f28c130a 100644
--- a/Makefile
+++ b/Makefile
@@ -626,7 +626,7 @@ unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
 endif
 
 clean: FORCE
-	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] engines/*.so profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
+	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] engines/*.so profiles/*.[do] t/*.[do] t/*/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -f t/fio-btrace2fio t/io_uring t/read-to-pipe-async
 	@rm -rf  doc/output
 


* Recent changes (master)
@ 2021-10-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit aa9f26276e1961fab2d33e188f5a2432360c9c14:

  run-fio-tests: make test runs more resilient (2021-10-17 07:22:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a7194b2d3d427e7e5678c55a128639df9caf4a48:

  Merge branch 'fixes_1290' of https://github.com/rthardin/fio (2021-10-18 19:29:46 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fixes_1290' of https://github.com/rthardin/fio

Ryan Hardin (1):
      Use min_bs in rate_process=poisson

 backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 86fa6d41..c167f908 100644
--- a/backend.c
+++ b/backend.c
@@ -837,7 +837,7 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 	if (td->o.rate_process == RATE_PROCESS_POISSON) {
 		uint64_t val, iops;
 
-		iops = bps / td->o.bs[ddir];
+		iops = bps / td->o.min_bs[ddir];
 		val = (int64_t) (1000000 / iops) *
 				-logf(__rand_0_1(&td->poisson_state[ddir]));
 		if (val) {


* Recent changes (master)
@ 2021-10-18 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-18 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7d1ce4b752e67868b3c7eb9aa5972ceec51210aa:

  t/io_uring: Fix the parameters calculation for multiple threads scenario (2021-10-15 06:20:47 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to aa9f26276e1961fab2d33e188f5a2432360c9c14:

  run-fio-tests: make test runs more resilient (2021-10-17 07:22:55 -0600)

----------------------------------------------------------------
Rebecca Cran (1):
      engines/http.c: add fallthrough annotation to _curl_trace

Shin'ichiro Kawasaki (5):
      zbd: Remove cast to unsigned long long for printf
      zbd: Fix type of local variable min_bs
      t/zbd: Do not use too large block size in test case #4
      t/zbd: Align block size to zone capacity
      t/zbd: Add -w option to ensure no open zone before write tests

Vincent Fu (1):
      run-fio-tests: make test runs more resilient

 engines/http.c         |   3 +-
 t/run-fio-tests.py     |  14 +++++--
 t/zbd/functions        |  26 ++++++++++++
 t/zbd/test-zbd-support |  40 ++++++++++--------
 zbd.c                  | 107 +++++++++++++++++++++++--------------------------
 5 files changed, 112 insertions(+), 78 deletions(-)

---

Diff of recent changes:

diff --git a/engines/http.c b/engines/http.c
index 7a61b132..35c44871 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -297,10 +297,9 @@ static int _curl_trace(CURL *handle, curl_infotype type,
 	switch (type) {
 	case CURLINFO_TEXT:
 		fprintf(stderr, "== Info: %s", data);
-		/* fall through */
+		fallthrough;
 	default:
 	case CURLINFO_SSL_DATA_OUT:
-		/* fall through */
 	case CURLINFO_SSL_DATA_IN:
 		return 0;
 
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index a59cdfe0..612e50ca 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -49,6 +49,7 @@ import shutil
 import logging
 import argparse
 import platform
+import traceback
 import subprocess
 import multiprocessing
 from pathlib import Path
@@ -1057,9 +1058,16 @@ def main():
                 skipped = skipped + 1
                 continue
 
-        test.setup(artifact_root, config['test_id'])
-        test.run()
-        test.check_result()
+        try:
+            test.setup(artifact_root, config['test_id'])
+            test.run()
+            test.check_result()
+        except KeyboardInterrupt:
+            break
+        except Exception as e:
+            test.passed = False
+            test.failure_reason += str(e)
+            logging.debug("Test %d exception:\n%s\n", config['test_id'], traceback.format_exc())
         if test.passed:
             result = "PASSED"
             passed = passed + 1
diff --git a/t/zbd/functions b/t/zbd/functions
index 08a2c629..e4e248b9 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -64,6 +64,32 @@ check_blkzone() {
 	fi
 }
 
+# Check zone capacity of each zone and report block size aligned to the zone
+# capacities. If zone capacity is same as zone size for zones, report zone size.
+zone_cap_bs() {
+	local dev="${1}"
+	local zone_size="${2}"
+	local sed_str='s/.*len \([0-9A-Za-z]*\), cap \([0-9A-Za-z]*\).*/\1 \2/p'
+	local cap bs="$zone_size"
+
+	# When blkzone is not available or blkzone does not report capacity,
+	# assume that zone capacity is same as zone size for all zones.
+	if [ -z "${blkzone}" ] || ! blkzone_reports_capacity "${dev}"; then
+		echo "$zone_size"
+		return
+	fi
+
+	while read -r -a line; do
+		((line[0] == line[1])) && continue
+		cap=$((line[1] * 512))
+		while ((bs > 512 && cap % bs)); do
+			bs=$((bs / 2))
+		done
+	done < <(blkzone report "${dev}" | sed -n "${sed_str}")
+
+	echo "$bs"
+}
+
 # Reports the starting sector and length of the first sequential zone of device
 # $1.
 first_sequential_zone() {
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 5103c406..7e2fff00 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -12,6 +12,7 @@ usage() {
 	echo -e "\t-v Run fio with valgrind --read-var-info option"
 	echo -e "\t-l Test with libzbc ioengine"
 	echo -e "\t-r Reset all zones before test start"
+	echo -e "\t-w Reset all zones before executing each write test case"
 	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
 	echo -e "\t-t <test #> Run only a single test case with specified number"
 	echo -e "\t-q Quit the test run after any failed test"
@@ -182,13 +183,14 @@ run_fio_on_seq() {
     run_one_fio_job "${opts[@]}" "$@"
 }
 
-# Prepare for write test by resetting zones. When max_open_zones option is
-# specified, reset all zones of the test target to ensure that zones out of the
-# test target range do not have open zones. This allows the write test to the
-# target range to be able to open zones up to max_open_zones.
+# Prepare for write test by resetting zones. When reset_before_write or
+# max_open_zones option is specified, reset all zones of the test target to
+# ensure that zones out of the test target range do not have open zones. This
+# allows the write test to the target range to be able to open zones up to
+# max_open_zones limit specified as the option or obtained from sysfs.
 prep_write() {
-	[[ -n "${max_open_zones_opt}" && -n "${is_zbd}" ]] &&
-		reset_zone "${dev}" -1
+	[[ -n "${reset_before_write}" || -n "${max_open_zones_opt}" ]] &&
+		[[ -n "${is_zbd}" ]] && reset_zone "${dev}" -1
 }
 
 SKIP_TESTCASE=255
@@ -310,7 +312,8 @@ test4() {
     off=$((first_sequential_zone_sector * 512 + 129 * zone_size))
     size=$((zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
-    opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--bs=$size")
+    opts+=("--name=$dev" "--filename=$dev" "--offset=$off")
+    opts+=(--bs="$(min $((logical_block_size * 256)) $size)")
     opts+=("--size=$size" "--thread=1" "--read_beyond_wp=1")
     opts+=("$(ioengine "psync")" "--rw=read" "--direct=1" "--disable_lat=1")
     opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
@@ -320,15 +323,15 @@ test4() {
 
 # Sequential write to sequential zones.
 test5() {
-    local size off capacity
+    local size off capacity bs
 
     prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
+    bs=$(min "$(max $((zone_size / 64)) "$logical_block_size")" "$zone_cap_bs")
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
-		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
-		   --do_verify=1 --verify=md5				\
+		   --bs="$bs" --do_verify=1 --verify=md5 \
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $capacity || return $?
     check_read $capacity || return $?
@@ -336,18 +339,18 @@ test5() {
 
 # Sequential read from sequential zones.
 test6() {
-    local size off capacity
+    local size off capacity bs
 
     prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
+    bs=$(min "$(max $((zone_size / 64)) "$logical_block_size")" "$zone_cap_bs")
     write_and_run_one_fio_job \
 	    $((first_sequential_zone_sector * 512)) "${size}" \
 	    --offset="${off}" \
 	    --size="${size}" --zonemode=zbd --zonesize="${zone_size}" \
-	    "$(ioengine "psync")" --iodepth=1 --rw=read \
-	    --bs="$(max $((zone_size / 64)) "$logical_block_size")" \
+	    "$(ioengine "psync")" --iodepth=1 --rw=read --bs="$bs" \
 	    >>"${logfile}.${test_number}" 2>&1 || return $?
     check_read $capacity || return $?
 }
@@ -485,7 +488,7 @@ test14() {
 
 # Sequential read on a mix of empty and full zones.
 test15() {
-    local i off size
+    local i off size bs
     local w_off w_size w_capacity
 
     for ((i=0;i<4;i++)); do
@@ -499,8 +502,9 @@ test15() {
     w_capacity=$(total_zone_capacity 2 $w_off $dev)
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
+    bs=$(min $((zone_size / 16)) "$zone_cap_bs")
     write_and_run_one_fio_job "${w_off}" "${w_size}" \
-		    "$(ioengine "psync")" --rw=read --bs=$((zone_size / 16)) \
+		    "$(ioengine "psync")" --rw=read --bs="$bs" \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$((size)) >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
@@ -852,7 +856,7 @@ test37() {
 	off=$(((first_sequential_zone_sector - 1) * 512))
     fi
     size=$((zone_size + 2 * 512))
-    bs=$((zone_size / 4))
+    bs=$(min $((zone_size / 4)) "$zone_cap_bs")
     run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
 		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
 		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
@@ -1245,6 +1249,7 @@ SECONDS=0
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
+reset_before_write=
 use_libzbc=
 zbd_debug=
 max_open_zones_opt=
@@ -1259,6 +1264,7 @@ while [ "${1#-}" != "$1" ]; do
 	shift;;
     -l) use_libzbc=1; shift;;
     -r) reset_all_zones=1; shift;;
+    -w) reset_before_write=1; shift;;
     -t) tests+=("$2"); shift; shift;;
     -o) max_open_zones_opt="${2}"; shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
@@ -1377,6 +1383,8 @@ fi
 echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
 echo " zone size: $((zone_size >> 20)) MB"
 
+zone_cap_bs=$(zone_cap_bs "$dev" "$zone_size")
+
 if [ "${#tests[@]}" = 0 ]; then
     readarray -t tests < <(declare -F | grep "test[0-9]*" | \
 				   tr -c -d "[:digit:]\n" | sort -n)
diff --git a/zbd.c b/zbd.c
index c0b0b81c..c18998c4 100644
--- a/zbd.c
+++ b/zbd.c
@@ -83,12 +83,12 @@ int zbd_report_zones(struct thread_data *td, struct fio_file *f,
 		ret = blkzoned_report_zones(td, f, offset, zones, nr_zones);
 	if (ret < 0) {
 		td_verror(td, errno, "report zones failed");
-		log_err("%s: report zones from sector %llu failed (%d).\n",
-			f->file_name, (unsigned long long)offset >> 9, errno);
+		log_err("%s: report zones from sector %"PRIu64" failed (%d).\n",
+			f->file_name, offset >> 9, errno);
 	} else if (ret == 0) {
 		td_verror(td, errno, "Empty zone report");
-		log_err("%s: report zones from sector %llu is empty.\n",
-			f->file_name, (unsigned long long)offset >> 9);
+		log_err("%s: report zones from sector %"PRIu64" is empty.\n",
+			f->file_name, offset >> 9);
 		ret = -EIO;
 	}
 
@@ -116,9 +116,8 @@ int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
 		ret = blkzoned_reset_wp(td, f, offset, length);
 	if (ret < 0) {
 		td_verror(td, errno, "resetting wp failed");
-		log_err("%s: resetting wp for %llu sectors at sector %llu failed (%d).\n",
-			f->file_name, (unsigned long long)length >> 9,
-			(unsigned long long)offset >> 9, errno);
+		log_err("%s: resetting wp for %"PRIu64" sectors at sector %"PRIu64" failed (%d).\n",
+			f->file_name, length >> 9, offset >> 9, errno);
 	}
 
 	return ret;
@@ -318,16 +317,16 @@ static bool zbd_verify_sizes(void)
 					return false;
 				}
 			} else if (td->o.zone_size != f->zbd_info->zone_size) {
-				log_err("%s: job parameter zonesize %llu does not match disk zone size %llu.\n",
-					f->file_name, (unsigned long long) td->o.zone_size,
-					(unsigned long long) f->zbd_info->zone_size);
+				log_err("%s: job parameter zonesize %llu does not match disk zone size %"PRIu64".\n",
+					f->file_name, td->o.zone_size,
+					f->zbd_info->zone_size);
 				return false;
 			}
 
 			if (td->o.zone_skip % td->o.zone_size) {
 				log_err("%s: zoneskip %llu is not a multiple of the device zone size %llu.\n",
-					f->file_name, (unsigned long long) td->o.zone_skip,
-					(unsigned long long) td->o.zone_size);
+					f->file_name, td->o.zone_skip,
+					td->o.zone_size);
 				return false;
 			}
 
@@ -341,9 +340,9 @@ static bool zbd_verify_sizes(void)
 						 f->file_name);
 					return false;
 				}
-				log_info("%s: rounded up offset from %llu to %llu\n",
-					 f->file_name, (unsigned long long) f->file_offset,
-					 (unsigned long long) new_offset);
+				log_info("%s: rounded up offset from %"PRIu64" to %"PRIu64"\n",
+					 f->file_name, f->file_offset,
+					 new_offset);
 				f->io_size -= (new_offset - f->file_offset);
 				f->file_offset = new_offset;
 			}
@@ -357,9 +356,9 @@ static bool zbd_verify_sizes(void)
 						 f->file_name);
 					return false;
 				}
-				log_info("%s: rounded down io_size from %llu to %llu\n",
-					 f->file_name, (unsigned long long) f->io_size,
-					 (unsigned long long) new_end - f->file_offset);
+				log_info("%s: rounded down io_size from %"PRIu64" to %"PRIu64"\n",
+					 f->file_name, f->io_size,
+					 new_end - f->file_offset);
 				f->io_size = new_end - f->file_offset;
 			}
 		}
@@ -388,17 +387,17 @@ static bool zbd_verify_bs(void)
 				continue;
 			zone_size = f->zbd_info->zone_size;
 			if (td_trim(td) && td->o.bs[DDIR_TRIM] != zone_size) {
-				log_info("%s: trim block size %llu is not the zone size %llu\n",
+				log_info("%s: trim block size %llu is not the zone size %"PRIu64"\n",
 					 f->file_name, td->o.bs[DDIR_TRIM],
-					 (unsigned long long)zone_size);
+					 zone_size);
 				return false;
 			}
 			for (k = 0; k < FIO_ARRAY_SIZE(td->o.bs); k++) {
 				if (td->o.verify != VERIFY_NONE &&
 				    zone_size % td->o.bs[k] != 0) {
-					log_info("%s: block size %llu is not a divisor of the zone size %llu\n",
+					log_info("%s: block size %llu is not a divisor of the zone size %"PRIu64"\n",
 						 f->file_name, td->o.bs[k],
-						 (unsigned long long)zone_size);
+						 zone_size);
 					return false;
 				}
 			}
@@ -448,8 +447,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 
 	if (zone_capacity > zone_size) {
 		log_err("%s: job parameter zonecapacity %llu is larger than zone size %llu\n",
-			f->file_name, (unsigned long long) td->o.zone_capacity,
-			(unsigned long long) td->o.zone_size);
+			f->file_name, td->o.zone_capacity, td->o.zone_size);
 		return 1;
 	}
 
@@ -525,15 +523,14 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	if (td->o.zone_size == 0) {
 		td->o.zone_size = zone_size;
 	} else if (td->o.zone_size != zone_size) {
-		log_err("fio: %s job parameter zonesize %llu does not match disk zone size %llu.\n",
-			f->file_name, (unsigned long long) td->o.zone_size,
-			(unsigned long long) zone_size);
+		log_err("fio: %s job parameter zonesize %llu does not match disk zone size %"PRIu64".\n",
+			f->file_name, td->o.zone_size, zone_size);
 		ret = -EINVAL;
 		goto out;
 	}
 
-	dprint(FD_ZBD, "Device %s has %d zones of size %llu KB\n", f->file_name,
-	       nr_zones, (unsigned long long) zone_size / 1024);
+	dprint(FD_ZBD, "Device %s has %d zones of size %"PRIu64" KB\n", f->file_name,
+	       nr_zones, zone_size / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -587,9 +584,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 					   ZBD_REPORT_MAX_ZONES));
 		if (nrz < 0) {
 			ret = nrz;
-			log_info("fio: report zones (offset %llu) failed for %s (%d).\n",
-			 	 (unsigned long long)offset,
-				 f->file_name, -ret);
+			log_info("fio: report zones (offset %"PRIu64") failed for %s (%d).\n",
+				 offset, f->file_name, -ret);
 			goto out;
 		}
 	}
@@ -972,7 +968,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			   struct fio_zone_info *const ze)
 {
 	struct fio_zone_info *z;
-	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	int res = 0;
 
 	assert(min_bs);
@@ -1145,7 +1141,7 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
 static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 			  uint32_t zone_idx)
 {
-	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z = get_zone(f, zone_idx);
 	bool res = true;
@@ -1228,7 +1224,7 @@ static bool any_io_in_flight(void)
 static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 						      struct io_u *io_u)
 {
-	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
+	const uint64_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z;
@@ -1431,7 +1427,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 						    struct fio_zone_info *z)
 {
 	const struct fio_file *f = io_u->file;
-	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+	const uint64_t min_bs = td->o.min_bs[DDIR_WRITE];
 
 	if (!zbd_open_zone(td, f, zbd_zone_nr(f, z))) {
 		zone_unlock(z);
@@ -1440,8 +1436,8 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	}
 
 	if (z->verify_block * min_bs >= z->capacity) {
-		log_err("%s: %d * %d >= %llu\n", f->file_name, z->verify_block,
-			min_bs, (unsigned long long)z->capacity);
+		log_err("%s: %d * %"PRIu64" >= %"PRIu64"\n", f->file_name, z->verify_block,
+			min_bs, z->capacity);
 		/*
 		 * If the assertion below fails during a test run, adding
 		 * "--experimental_verify=1" to the command line may help.
@@ -1450,8 +1446,8 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	}
 	io_u->offset = z->start + z->verify_block * min_bs;
 	if (io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
-		log_err("%s: %llu + %llu >= %llu\n", f->file_name, io_u->offset,
-			io_u->buflen, (unsigned long long) zbd_zone_capacity_end(z));
+		log_err("%s: %llu + %llu >= %"PRIu64"\n", f->file_name, io_u->offset,
+			io_u->buflen, zbd_zone_capacity_end(z));
 		assert(false);
 	}
 	z->verify_block += io_u->buflen / min_bs;
@@ -1467,7 +1463,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
  * pointer, hold the mutex for the zone.
  */
 static struct fio_zone_info *
-zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint32_t min_bytes,
+zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint64_t min_bytes,
 	      struct fio_zone_info *zb, struct fio_zone_info *zl)
 {
 	struct fio_file *f = io_u->file;
@@ -1499,7 +1495,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint32_t min_bytes,
 				zone_unlock(z2);
 		}
 	}
-	dprint(FD_ZBD, "%s: no zone has %d bytes of readable data\n",
+	dprint(FD_ZBD, "%s: no zone has %"PRIu64" bytes of readable data\n",
 	       f->file_name, min_bytes);
 	return NULL;
 }
@@ -1672,10 +1668,9 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	    f->last_pos[ddir] >= zbd_zone_capacity_end(z)) {
 		dprint(FD_ZBD,
 		       "%s: Jump from zone capacity limit to zone end:"
-		       " (%llu -> %llu) for zone %u (%llu)\n",
-		       f->file_name, (unsigned long long) f->last_pos[ddir],
-		       (unsigned long long) zbd_zone_end(z), zone_idx,
-		       (unsigned long long) z->capacity);
+		       " (%"PRIu64" -> %"PRIu64") for zone %u (%"PRIu64")\n",
+		       f->file_name, f->last_pos[ddir],
+		       zbd_zone_end(z), zone_idx, z->capacity);
 		td->io_skip_bytes += zbd_zone_end(z) - f->last_pos[ddir];
 		f->last_pos[ddir] = zbd_zone_end(z);
 	}
@@ -1759,7 +1754,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	uint32_t zone_idx_b;
 	struct fio_zone_info *zb, *zl, *orig_zb;
 	uint32_t orig_len = io_u->buflen;
-	uint32_t min_bs = td->o.min_bs[io_u->ddir];
+	uint64_t min_bs = td->o.min_bs[io_u->ddir];
 	uint64_t new_len;
 	int64_t range;
 
@@ -1785,9 +1780,9 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 		if (io_u->offset + min_bs > (zb + 1)->start) {
 			dprint(FD_IO,
-			       "%s: off=%llu + min_bs=%u > next zone %llu\n",
+			       "%s: off=%llu + min_bs=%"PRIu64" > next zone %"PRIu64"\n",
 			       f->file_name, io_u->offset,
-			       min_bs, (unsigned long long) (zb + 1)->start);
+			       min_bs, (zb + 1)->start);
 			io_u->offset = zb->start + (zb + 1)->start - io_u->offset;
 			new_len = min(io_u->buflen, (zb + 1)->start - io_u->offset);
 		} else {
@@ -1878,9 +1873,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		if (io_u->buflen > zbdi->zone_size) {
 			td_verror(td, EINVAL, "I/O buflen exceeds zone size");
 			dprint(FD_IO,
-			       "%s: I/O buflen %llu exceeds zone size %llu\n",
-			       f->file_name, io_u->buflen,
-			       (unsigned long long) zbdi->zone_size);
+			       "%s: I/O buflen %llu exceeds zone size %"PRIu64"\n",
+			       f->file_name, io_u->buflen, zbdi->zone_size);
 			goto eof;
 		}
 		if (!zbd_open_zone(td, f, zone_idx_b)) {
@@ -1917,9 +1911,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 			if (zb->capacity < min_bs) {
 				td_verror(td, EINVAL, "ZCAP is less min_bs");
-				log_err("zone capacity %llu smaller than minimum block size %d\n",
-					(unsigned long long)zb->capacity,
-					min_bs);
+				log_err("zone capacity %"PRIu64" smaller than minimum block size %"PRIu64"\n",
+					zb->capacity, min_bs);
 				goto eof;
 			}
 		}
@@ -1949,7 +1942,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			goto accept;
 		}
 		td_verror(td, EIO, "zone remainder too small");
-		log_err("zone remainder %lld smaller than min block size %d\n",
+		log_err("zone remainder %lld smaller than min block size %"PRIu64"\n",
 			(zbd_zone_capacity_end(zb) - io_u->offset), min_bs);
 		goto eof;
 	case DDIR_TRIM:
@@ -2006,7 +1999,7 @@ char *zbd_write_status(const struct thread_stat *ts)
 {
 	char *res;
 
-	if (asprintf(&res, "; %llu zone resets", (unsigned long long) ts->nr_zone_resets) < 0)
+	if (asprintf(&res, "; %"PRIu64" zone resets", ts->nr_zone_resets) < 0)
 		return NULL;
 	return res;
 }


* Recent changes (master)
@ 2021-10-16 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 246054544cc74b56b063640ecb538893ea613936:

  Merge branch 'evelu-typo' of https://github.com/ErwanAliasr1/fio (2021-10-14 15:01:30 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7d1ce4b752e67868b3c7eb9aa5972ceec51210aa:

  t/io_uring: Fix the parameters calculation for multiple threads scenario (2021-10-15 06:20:47 -0600)

----------------------------------------------------------------
Pankaj Raghav (1):
      t/io_uring: Fix the parameters calculation for multiple threads scenario

 t/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 5a80e074..a87042f8 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -1348,6 +1348,7 @@ int main(int argc, char *argv[])
 			stats_running = 1;
 
 		for (j = 0; j < nthreads; j++) {
+			s = get_submitter(j);
 			this_done += s->done;
 			this_call += s->calls;
 			this_reap += s->reaps;


* Recent changes (master)
@ 2021-10-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 53b5fa1ea4f821899440637ab632ce0e1687c916:

  t/io_uring: don't append 'K' to IOPS if we don't divide by 1000 (2021-10-13 06:17:44 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 246054544cc74b56b063640ecb538893ea613936:

  Merge branch 'evelu-typo' of https://github.com/ErwanAliasr1/fio (2021-10-14 15:01:30 -0600)

----------------------------------------------------------------
Erwan Velu (1):
      t/io_uring: Fixing typo

Jens Axboe (2):
      t/io_uring: include a maximum IOPS seen when exiting
      Merge branch 'evelu-typo' of https://github.com/ErwanAliasr1/fio

 t/io_uring.c | 6 ++++++
 1 file changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 1b729ebf..5a80e074 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -103,6 +103,7 @@ struct submitter {
 static struct submitter *submitter;
 static volatile int finish;
 static int stats_running;
+static unsigned long max_iops;
 
 static int depth = DEPTH;
 static int batch_submit = BATCH_SUBMIT;
@@ -882,6 +883,10 @@ static void do_finish(const char *reason)
 		struct submitter *s = get_submitter(j);
 		s->finish = 1;
 	}
+	if (max_iops > 100000)
+		printf("Maximum IOPS=%luK\n", max_iops / 1000);
+	else if (max_iops)
+		printf("Maximum IOPS=%lu\n", max_iops);
 	finish = 1;
 }
 
@@ -1362,6 +1367,7 @@ int main(int argc, char *argv[])
 			printf("IOPS=%luK, ", iops / 1000);
 		else
 			printf("IOPS=%lu, ", iops);
+		max_iops = max(max_iops, iops);
 		if (!do_nop)
 			printf("BW=%luMiB/s, ", bw);
 		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);


* Recent changes (master)
@ 2021-10-14 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 702b0be2fe3b6cf5b3556920edaf0637a33b36e1:

  t/io_uring: update for new DMA map buffers API (2021-10-12 18:41:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 53b5fa1ea4f821899440637ab632ce0e1687c916:

  t/io_uring: don't append 'K' to IOPS if we don't divide by 1000 (2021-10-13 06:17:44 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: don't append 'K' to IOPS if we don't divide by 1000

 t/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 84960ba9..1b729ebf 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -1361,7 +1361,7 @@ int main(int argc, char *argv[])
 		if (iops > 100000)
 			printf("IOPS=%luK, ", iops / 1000);
 		else
-			printf("IOPS=%luK, ", iops);
+			printf("IOPS=%lu, ", iops);
 		if (!do_nop)
 			printf("BW=%luMiB/s, ", bw);
 		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);


* Recent changes (master)
@ 2021-10-13 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-10-13 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c4cb947e8f92c10835164b67deed06828cfc01be:

  io_u: don't attempt to requeue for full residual (2021-10-11 09:49:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 702b0be2fe3b6cf5b3556920edaf0637a33b36e1:

  t/io_uring: update for new DMA map buffers API (2021-10-12 18:41:14 -0600)

----------------------------------------------------------------
Brandon Paupore (1):
      Query Windows clock frequency and use reported max

Erwan Velu (1):
      t/one-core-peak: Improving check_sysblock_value error handling

Jens Axboe (6):
      Merge branch 'windows-res' of https://github.com/bjpaupor/fio
      t/io_uring: show IOPS in increments of 1000 IOPS if necessary
      Merge branch 'evelu-onecore' of https://github.com/ErwanAliasr1/fio
      t/io_uring: fix silly identical branch error
      t/io_uring: add test support for pre mapping DMA buffers
      t/io_uring: update for new DMA map buffers API

 Makefile           |  4 ++--
 helper_thread.c    | 21 +++++++++-----------
 os/os.h            |  9 +++++++++
 os/windows/dlls.c  | 33 +++++++++++++++++++++++++++++++
 stat.c             |  4 ++++
 t/io_uring.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 t/one-core-peak.sh |  5 ++---
 7 files changed, 111 insertions(+), 22 deletions(-)
 create mode 100644 os/windows/dlls.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5198f70e..c3feb53f 100644
--- a/Makefile
+++ b/Makefile
@@ -275,8 +275,8 @@ ifeq ($(CONFIG_TARGET_OS), Darwin)
   LIBS	 += -lpthread -ldl
 endif
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
-  SOURCE += os/windows/cpu-affinity.c os/windows/posix.c
-  WINDOWS_OBJS = os/windows/cpu-affinity.o os/windows/posix.o lib/hweight.o
+  SOURCE += os/windows/cpu-affinity.c os/windows/posix.c os/windows/dlls.c
+  WINDOWS_OBJS = os/windows/cpu-affinity.o os/windows/posix.o os/windows/dlls.o lib/hweight.o
   LIBS	 += -lpthread -lpsapi -lws2_32 -lssp
   FIO_CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
diff --git a/helper_thread.c b/helper_thread.c
index d8e7ebfe..b9b83db3 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -9,6 +9,10 @@
 #define DRD_IGNORE_VAR(x) do { } while (0)
 #endif
 
+#ifdef WIN32
+#include "os/os-windows.h"
+#endif
+
 #include "fio.h"
 #include "smalloc.h"
 #include "helper_thread.h"
@@ -283,19 +287,12 @@ static void *helper_thread_main(void *data)
 		}
 	};
 	struct timespec ts;
-	int clk_tck, ret = 0;
+	long clk_tck;
+	int ret = 0;
 
-#ifdef _SC_CLK_TCK
-	clk_tck = sysconf(_SC_CLK_TCK);
-#else
-	/*
-	 * The timer frequence is variable on Windows. Instead of trying to
-	 * query it, use 64 Hz, the clock frequency lower bound. See also
-	 * https://carpediemsystems.co.uk/2019/07/18/windows-system-timer-granularity/.
-	 */
-	clk_tck = 64;
-#endif
-	dprint(FD_HELPERTHREAD, "clk_tck = %d\n", clk_tck);
+	os_clk_tck(&clk_tck);
+
+	dprint(FD_HELPERTHREAD, "clk_tck = %ld\n", clk_tck);
 	assert(clk_tck > 0);
 	sleep_accuracy_ms = (1000 + clk_tck - 1) / clk_tck;
 
diff --git a/os/os.h b/os/os.h
index 827b61e9..5965d7b8 100644
--- a/os/os.h
+++ b/os/os.h
@@ -412,4 +412,13 @@ static inline bool os_cpu_has(cpu_features feature)
 # define fio_mkdir(path, mode)	mkdir(path, mode)
 #endif
 
+#ifdef _SC_CLK_TCK
+static inline void os_clk_tck(long *clk_tck)
+{
+	*clk_tck = sysconf(_SC_CLK_TCK);
+}
+#else
+extern void os_clk_tck(long *clk_tck);
+#endif
+
 #endif /* FIO_OS_H */
diff --git a/os/windows/dlls.c b/os/windows/dlls.c
new file mode 100644
index 00000000..774b1c61
--- /dev/null
+++ b/os/windows/dlls.c
@@ -0,0 +1,33 @@
+#include "os/os.h"
+
+#include <windows.h>
+
+void os_clk_tck(long *clk_tck)
+{
+	/*
+	 * The timer resolution is variable on Windows. Try to query it 
+	 * or use 64 Hz, the clock frequency lower bound. See also
+	 * https://carpediemsystems.co.uk/2019/07/18/windows-system-timer-granularity/.
+	 */
+	unsigned long minRes, maxRes, curRes;
+	HMODULE lib;
+	FARPROC queryTimer;
+	FARPROC setTimer;
+
+	if (!(lib = LoadLibrary(TEXT("ntdll.dll"))) ||
+		!(queryTimer = GetProcAddress(lib, "NtQueryTimerResolution")) ||
+		!(setTimer = GetProcAddress(lib, "NtSetTimerResolution"))) {
+		dprint(FD_HELPERTHREAD, 
+			"Failed to load ntdll library, set to lower bound 64 Hz\n");
+		*clk_tck = 64;
+	} else {
+		queryTimer(&minRes, &maxRes, &curRes);
+		dprint(FD_HELPERTHREAD, 
+			"minRes = %lu, maxRes = %lu, curRes = %lu\n",
+			minRes, maxRes, curRes);
+
+		/* Use maximum resolution for most accurate timestamps */
+		setTimer(maxRes, 1, &curRes);
+		*clk_tck = (long) (10000000L / maxRes);
+	}
+}
\ No newline at end of file
diff --git a/stat.c b/stat.c
index ac53463d..30f9b5c1 100644
--- a/stat.c
+++ b/stat.c
@@ -17,7 +17,11 @@
 #include "zbd.h"
 #include "oslib/asprintf.h"
 
+#ifdef WIN32
+#define LOG_MSEC_SLACK	2
+#else
 #define LOG_MSEC_SLACK	1
+#endif
 
 struct fio_sem *stat_sem;
 
diff --git a/t/io_uring.c b/t/io_uring.c
index cdd15986..84960ba9 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -110,6 +110,7 @@ static int batch_complete = BATCH_COMPLETE;
 static int bs = BS;
 static int polled = 1;		/* use IO polling */
 static int fixedbufs = 1;	/* use fixed user buffers */
+static int dma_map;		/* pre-map DMA buffers */
 static int register_files = 1;	/* use fixed files */
 static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
@@ -130,6 +131,17 @@ static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
 			80.0, 90.0, 95.0, 99.0, 99.5, 99.9, 99.95, 99.99 };
 static int plist_len = 17;
 
+#ifndef IORING_REGISTER_MAP_BUFFERS
+#define IORING_REGISTER_MAP_BUFFERS	20
+struct io_uring_map_buffers {
+	__s32	fd;
+	__u32	buf_start;
+	__u32	buf_end;
+	__u32	flags;
+	__u64	rsvd[2];
+};
+#endif
+
 static unsigned long cycles_to_nsec(unsigned long cycles)
 {
 	uint64_t val;
@@ -319,6 +331,24 @@ static void add_stat(struct submitter *s, int clock_index, int nr)
 #endif
 }
 
+static int io_uring_map_buffers(struct submitter *s)
+{
+	struct io_uring_map_buffers map = {
+		.fd		= s->files[0].real_fd,
+		.buf_end	= depth,
+	};
+
+	if (do_nop)
+		return 0;
+	if (s->nr_files > 1) {
+		fprintf(stderr, "Can't map buffers with multiple files\n");
+		return -1;
+	}
+
+	return syscall(__NR_io_uring_register, s->ring_fd,
+			IORING_REGISTER_MAP_BUFFERS, &map, 1);
+}
+
 static int io_uring_register_buffers(struct submitter *s)
 {
 	if (do_nop)
@@ -945,6 +975,14 @@ static int setup_ring(struct submitter *s)
 			perror("io_uring_register_buffers");
 			return 1;
 		}
+
+		if (dma_map) {
+			ret = io_uring_map_buffers(s);
+			if (ret < 0) {
+				perror("io_uring_map_buffers");
+				return 1;
+			}
+		}
 	}
 
 	if (register_files) {
@@ -1016,6 +1054,7 @@ static void usage(char *argv, int status)
 		" -b <int>  : Block size, default %d\n"
 		" -p <bool> : Polled IO, default %d\n"
 		" -B <bool> : Fixed buffers, default %d\n"
+		" -R <bool> : DMA map fixed buffers, default %d\n"
 		" -F <bool> : Register files, default %d\n"
 		" -n <int>  : Number of threads, default %d\n"
 		" -O <bool> : Use O_DIRECT, default %d\n"
@@ -1025,8 +1064,8 @@ static void usage(char *argv, int status)
 		" -a <bool> : Use legacy aio, default %d\n"
 		" -r <int>  : Runtime in seconds, default %s\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads, !buffered, do_nop, stats, aio,
-		runtime == 0 ? "unlimited" : runtime_str);
+		fixedbufs, dma_map, register_files, nthreads, !buffered, do_nop,
+		stats, aio, runtime == 0 ? "unlimited" : runtime_str);
 	exit(status);
 }
 
@@ -1086,7 +1125,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:D:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1147,6 +1186,9 @@ int main(int argc, char *argv[])
 		case 'r':
 			runtime = atoi(optarg);
 			break;
+		case 'D':
+			dma_map = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -1162,6 +1204,8 @@ int main(int argc, char *argv[])
 		batch_complete = depth;
 	if (batch_submit > depth)
 		batch_submit = depth;
+	if (!fixedbufs && dma_map)
+		dma_map = 0;
 
 	submitter = calloc(nthreads, sizeof(*submitter) +
 				depth * sizeof(struct iovec));
@@ -1261,7 +1305,7 @@ int main(int argc, char *argv[])
 		}
 	}
 	s = get_submitter(0);
-	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, register_files, buffered, depth);
+	printf("polled=%d, fixedbufs=%d/%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, dma_map, register_files, buffered, depth);
 	if (!aio)
 		printf("Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 	else
@@ -1314,7 +1358,10 @@ int main(int argc, char *argv[])
 			bw = iops * (bs / 1048576);
 		else
 			bw = iops / (1048576 / bs);
-		printf("IOPS=%lu, ", iops);
+		if (iops > 100000)
+			printf("IOPS=%luK, ", iops / 1000);
+		else
+			printf("IOPS=%luK, ", iops);
 		if (!do_nop)
 			printf("BW=%luMiB/s, ", bw);
 		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);
diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index 11b1d69a..fba4ec95 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -153,10 +153,9 @@ check_sysblock_value() {
   target_file="${sys_block_dir}/$2"
   value=$3
   [ -f "${target_file}" ] || return
-  content=$(cat ${target_file})
+  content=$(cat ${target_file} 2>/dev/null)
   if [ "${content}" != "${value}" ]; then
-    info "${device_name}" "${target_file} set to ${value}."
-    echo ${value} > ${target_file} 2>/dev/null || hint "${device_name}: Cannot set ${value} on ${target_file}"
+    echo ${value} > ${target_file} 2>/dev/null && info "${device_name}" "${target_file} set to ${value}." || hint "${device_name}: Cannot set ${value} on ${target_file}"
   fi
 }
 

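On Windows, os_clk_tck() turns the timer resolution reported by ntdll into a tick rate, and helper_thread.c then rounds that rate into a sleep accuracy in milliseconds. The sketch below reproduces both calculations without any Windows calls. It assumes that NtQueryTimerResolution reports resolutions in 100 ns units, which is what the 10000000L constant in dlls.c implies. Under that assumption, 156250 units (15.625 ms) corresponds to the 64 Hz fallback. The function names are hypothetical.

```c
/*
 * Assumes resolutions in 100 ns units (10,000,000 per second), as the
 * 10000000L constant in os/windows/dlls.c implies.
 */
static long res_to_clk_tck(unsigned long res_100ns)
{
	return (long) (10000000L / res_100ns);
}

/* Same rounding as helper_thread.c: ceil(1000 / clk_tck) milliseconds. */
static unsigned int sleep_accuracy_ms(long clk_tck)
{
	return (unsigned int) ((1000 + clk_tck - 1) / clk_tck);
}
```

As an example, a 0.5 ms resolution (5000 units) gives 2000 Hz and a 1 ms sleep accuracy. The 64 Hz fallback gives 16 ms.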
* Recent changes (master)
@ 2021-10-12 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-12 12:00 UTC
  To: fio

The following changes since commit d4af2ecea2930138bbaf58fe84debef8e84761c6:

  t/io_uring: fix latency stats for depth == 1 (2021-10-09 12:56:11 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c4cb947e8f92c10835164b67deed06828cfc01be:

  io_u: don't attempt to requeue for full residual (2021-10-11 09:49:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      io_u: don't attempt to requeue for full residual

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 5289b5d1..586a4bef 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2004,7 +2004,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		 * Make sure we notice short IO from here, and requeue them
 		 * appropriately!
 		 */
-		if (io_u->resid) {
+		if (bytes && io_u->resid) {
 			io_u->xfer_buflen = io_u->resid;
 			io_u->xfer_buf += bytes;
 			io_u->offset += bytes;

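Requeueing the residual of a short IO only makes sense if the device actually made progress. The sketch below isolates that decision. `struct short_io` and `requeue_short_io()` are hypothetical stand-ins for the relevant io_u fields and the logic in io_completed(). The sketch assumes that `bytes` is the transferred count, `xfer_buflen - resid`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the io_u fields touched by the fix. */
struct short_io {
	char *xfer_buf;
	size_t xfer_buflen;
	size_t resid;			/* bytes that were NOT transferred */
	unsigned long long offset;
};

/*
 * Requeue the untransferred tail of a short IO. When bytes == 0 the
 * residual is the whole request, so a requeue would simply resubmit the
 * same IO. The fixed check skips that case and returns false.
 */
static bool requeue_short_io(struct short_io *io)
{
	size_t bytes = io->xfer_buflen - io->resid;

	if (bytes && io->resid) {
		io->xfer_buflen = io->resid;
		io->xfer_buf += bytes;
		io->offset += bytes;
		return true;
	}
	return false;
}
```

The sketch only declines the requeue. What fio does next with a zero-byte completion is outside the hunk above.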
* Recent changes (master)
@ 2021-10-10 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-10 12:00 UTC
  To: fio

The following changes since commit b1ebcbce6499fed58f87d1bcbfd50899c508d3ab:

  Merge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio (2021-10-07 06:18:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d4af2ecea2930138bbaf58fe84debef8e84761c6:

  t/io_uring: fix latency stats for depth == 1 (2021-10-09 12:56:11 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: fix latency stats for depth == 1

 t/io_uring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 2c9fd08c..cdd15986 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -531,8 +531,8 @@ static int reap_events_uring(struct submitter *s)
 					stat_nr = 0;
 				}
 				last_idx = clock_index;
-			} else if (clock_index)
-				stat_nr++;
+			}
+			stat_nr++;
 		}
 		reaped++;
 		head++;
@@ -562,6 +562,8 @@ static int submitter_init(struct submitter *s)
 
 	if (stats) {
 		nr_batch = roundup_pow2(depth / batch_submit);
+		if (nr_batch < 2)
+			nr_batch = 2;
 		s->clock_batch = calloc(nr_batch, sizeof(unsigned long));
 		s->clock_index = 1;
 
@@ -637,8 +639,8 @@ static int reap_events_aio(struct submitter *s, struct io_event *events, int evs
 					stat_nr = 0;
 				}
 				last_idx = clock_index;
-			} else if (clock_index)
-				stat_nr++;
+			}
+			stat_nr++;
 		}
 		reaped++;
 		evs--;

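The depth == 1 fix changes how completions are grouped by clock index before add_stat() is called. Before the fix, the first completion of each new index was never counted. At depth 1, where every batch holds a single IO, every group therefore reported zero. The same commit clamps nr_batch to at least 2, assuming roundup_pow2(1) returns 1: otherwise the mask `nr_batch - 1` would be 0. The sketch below replays the fixed loop over an array of clock indices. `struct run` and `batch_runs()` are hypothetical, and where the real code calls add_stat(), this sketch records a run instead.

```c
#include <stddef.h>

struct run {
	int idx;	/* clock index shared by the run */
	int nr;		/* completions in the run */
};

/*
 * Mirrors the fixed reap loop: consecutive completions carrying the same
 * clock index are batched, and every completion (including the first of
 * a run) is counted. Returns the number of runs written to out.
 */
static size_t batch_runs(const int *clock_idx, size_t n, struct run *out)
{
	int last_idx = -1, stat_nr = 0;
	size_t nruns = 0;

	for (size_t i = 0; i < n; i++) {
		if (last_idx != clock_idx[i]) {
			if (last_idx != -1) {
				out[nruns].idx = last_idx;
				out[nruns].nr = stat_nr;
				nruns++;
				stat_nr = 0;
			}
			last_idx = clock_idx[i];
		}
		stat_nr++;
	}
	if (stat_nr) {
		out[nruns].idx = last_idx;
		out[nruns].nr = stat_nr;
		nruns++;
	}
	return nruns;
}
```

For the input 1, 1, 1, 2, 2, this produces runs of 3 and 2. The old logic would have produced 2 and 1, undercounting each group by one.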
* Recent changes (master)
@ 2021-10-08 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-08 12:00 UTC
  To: fio

The following changes since commit 09ee86fa431939cb8f634e9ee8e1fc8d9302ea59:

  t/io_uring: get rid of old debug printfs (2021-10-05 06:58:07 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b1ebcbce6499fed58f87d1bcbfd50899c508d3ab:

  Merge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio (2021-10-07 06:18:21 -0600)

----------------------------------------------------------------
Erwan Velu (5):
      t/one-core-peak: Reporting BLK_CGROUP
      t/one-core-peak: Reporting BLK_WBT_MQ
      t/one-core-peak: Reporting kernel cmdline
      t/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION
      t/io_uring: Add -r option to control the runtime

Jens Axboe (1):
      Merge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio

 t/io_uring.c       | 26 ++++++++++++++++++++------
 t/one-core-peak.sh |  6 ++++--
 2 files changed, 24 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 7ef2f6ce..2c9fd08c 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -118,6 +118,7 @@ static int do_nop = 0;		/* no-op SQ ring commands */
 static int nthreads = 1;
 static int stats = 0;		/* generate IO stats */
 static int aio = 0;		/* use libaio */
+static int runtime = 0;	/* runtime */
 
 static unsigned long tsc_rate;
 
@@ -841,11 +842,10 @@ static struct submitter *get_submitter(int offset)
 	return ret;
 }
 
-static void sig_int(int sig)
+static void do_finish(const char *reason)
 {
 	int j;
-
-	printf("Exiting on signal %d\n", sig);
+	printf("Exiting on %s\n", reason);
 	for (j = 0; j < nthreads; j++) {
 		struct submitter *s = get_submitter(j);
 		s->finish = 1;
@@ -853,6 +853,11 @@ static void sig_int(int sig)
 	finish = 1;
 }
 
+static void sig_int(int sig)
+{
+	do_finish("signal");
+}
+
 static void arm_sig_int(void)
 {
 	struct sigaction act;
@@ -1000,6 +1005,8 @@ static void file_depths(char *buf)
 
 static void usage(char *argv, int status)
 {
+	char runtime_str[16];
+	snprintf(runtime_str, sizeof(runtime_str), "%d", runtime);
 	printf("%s [options] -- [filenames]\n"
 		" -d <int>  : IO Depth, default %d\n"
 		" -s <int>  : Batch submit, default %d\n"
@@ -1013,9 +1020,11 @@ static void usage(char *argv, int status)
 		" -N <bool> : Perform just no-op requests, default %d\n"
 		" -t <bool> : Track IO latencies, default %d\n"
 		" -T <int>  : TSC rate in HZ\n"
-		" -a <bool> : Use legacy aio, default %d\n",
+		" -a <bool> : Use legacy aio, default %d\n"
+		" -r <int>  : Runtime in seconds, default %s\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads, !buffered, do_nop, stats, aio);
+		fixedbufs, register_files, nthreads, !buffered, do_nop, stats, aio,
+		runtime == 0 ? "unlimited" : runtime_str);
 	exit(status);
 }
 
@@ -1075,7 +1084,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:r:h?")) != -1) {
 		switch (opt) {
 		case 'a':
 			aio = !!atoi(optarg);
@@ -1133,6 +1142,9 @@ int main(int argc, char *argv[])
 			tsc_rate = strtoul(optarg, NULL, 10);
 			write_tsc_rate();
 			break;
+		case 'r':
+			runtime = atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -1273,6 +1285,8 @@ int main(int argc, char *argv[])
 		unsigned long iops, bw;
 
 		sleep(1);
+		if (runtime && !--runtime)
+			do_finish("timeout");
 
 		/* don't print partial run, if interrupted by signal */
 		if (finish)
diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index d0649d2e..11b1d69a 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -152,7 +152,7 @@ check_sysblock_value() {
   sys_block_dir=$(get_sys_block_dir ${device_name})
   target_file="${sys_block_dir}/$2"
   value=$3
-  [ -f "${target_file}" ] || fatal "Cannot find ${target_file} for ${device_name}"
+  [ -f "${target_file}" ] || return
   content=$(cat ${target_file})
   if [ "${content}" != "${value}" ]; then
     info "${device_name}" "${target_file} set to ${value}."
@@ -238,9 +238,10 @@ show_system() {
   info "system" "CPU: ${CPU_MODEL}"
   info "system" "MEMORY: ${MEMORY_SPEED}"
   info "system" "KERNEL: ${KERNEL}"
-  for config_item in BLK_CGROUP_IOCOST HZ; do
+  for config_item in BLK_CGROUP BLK_WBT_MQ HZ RETPOLINE PAGE_TABLE_ISOLATION; do
     info "system" "KERNEL: $(show_kernel_config_item ${config_item})"
   done
+  info "system" "KERNEL: $(cat /proc/cmdline)"
   tsc=$(journalctl -k | grep 'tsc: Refined TSC clocksource calibration:' | awk '{print $11}')
   if [ -n "${tsc}" ]; then
     info "system" "TSC: ${tsc} Mhz"
@@ -263,6 +264,7 @@ for drive in ${drives}; do
   check_sysblock_value ${drive} "queue/iostats" 0 # Ensure iostats are disabled
   check_sysblock_value ${drive} "queue/nomerges" 2 # Ensure merge are disabled
   check_sysblock_value ${drive} "queue/io_poll" 1 # Ensure io_poll is enabled
+  check_sysblock_value ${drive} "queue/wbt_lat_usec" 0 # Disabling wbt lat
   show_device ${drive}
 done
 

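The new -r option is enforced by the once-per-second reporting loop. The test `runtime && !--runtime` leaves a runtime of 0 (unlimited) untouched. Otherwise it calls do_finish("timeout") on the tick where the counter reaches zero. Below is a small simulation of that test. `ticks_until_timeout()` is hypothetical and stands in for the sleep(1) loop in main().

```c
/*
 * Simulates the main loop: each iteration is one sleep(1) tick. Returns
 * the tick on which the timeout fires, or -1 if it never does within
 * max_ticks (runtime == 0 means unlimited).
 */
static int ticks_until_timeout(int runtime, int max_ticks)
{
	for (int tick = 1; tick <= max_ticks; tick++) {
		/* Same short-circuit as t/io_uring.c: 0 is never decremented. */
		if (runtime && !--runtime)
			return tick;
	}
	return -1;
}
```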
* Recent changes (master)
@ 2021-10-06 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-06 12:00 UTC
  To: fio

The following changes since commit 24a24c12a04c45174c2d68ffb7fcb3f367e40dee:

  t/io_uring: clean up aio wait loop (2021-10-04 17:04:04 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 09ee86fa431939cb8f634e9ee8e1fc8d9302ea59:

  t/io_uring: get rid of old debug printfs (2021-10-05 06:58:07 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      t/io_uring: print submitter id with tid on startup
      t/io_uring: get rid of old debug printfs

 t/io_uring.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index f27a12c7..7ef2f6ce 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -552,7 +552,7 @@ static int submitter_init(struct submitter *s)
 	int i, nr_batch;
 
 	s->tid = gettid();
-	printf("submitter=%d\n", s->tid);
+	printf("submitter=%d, tid=%d\n", s->index, s->tid);
 
 	srand48(pthread_self());
 
@@ -951,7 +951,6 @@ static int setup_ring(struct submitter *s)
 	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_SQ_RING);
-	printf("sq_ring ptr = 0x%p\n", ptr);
 	sring->head = ptr + p.sq_off.head;
 	sring->tail = ptr + p.sq_off.tail;
 	sring->ring_mask = ptr + p.sq_off.ring_mask;
@@ -963,12 +962,10 @@ static int setup_ring(struct submitter *s)
 	s->sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_SQES);
-	printf("sqes ptr    = 0x%p\n", s->sqes);
 
 	ptr = mmap(0, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_CQ_RING);
-	printf("cq_ring ptr = 0x%p\n", ptr);
 	cring->head = ptr + p.cq_off.head;
 	cring->tail = ptr + p.cq_off.tail;
 	cring->ring_mask = ptr + p.cq_off.ring_mask;
@@ -1253,10 +1250,8 @@ int main(int argc, char *argv[])
 	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, register_files, buffered, depth);
 	if (!aio)
 		printf("Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
-#ifdef CONFIG_LIBAIO
 	else
-		printf("Engine=aio, ctx=%p\n", &s->aio_ctx);
-#endif
+		printf("Engine=aio\n");
 
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);

* Recent changes (master)
@ 2021-10-05 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-05 12:00 UTC
  To: fio

The following changes since commit ca4eefc1b55d7f9fc03bf113d63e3d0b2d7b38ae:

  Merge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio (2021-10-01 13:55:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 24a24c12a04c45174c2d68ffb7fcb3f367e40dee:

  t/io_uring: clean up aio wait loop (2021-10-04 17:04:04 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      t/io_uring: remove extra add_stat() call
      t/io_uring: add support for legacy AIO
      t/io_uring: don't print partial IOPS etc output if exit signal was received
      t/io_uring: don't track IO latencies the first second of runtime
      t/io_uring: check for valid clock_index and finish state for stats
      t/io_uring: clean up aio wait loop

 t/io_uring.c | 277 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 253 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 2ec4caeb..f27a12c7 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -7,6 +7,10 @@
 #include <inttypes.h>
 #include <math.h>
 
+#ifdef CONFIG_LIBAIO
+#include <libaio.h>
+#endif
+
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
@@ -86,6 +90,10 @@ struct submitter {
 	int clock_index;
 	unsigned long *plat;
 
+#ifdef CONFIG_LIBAIO
+	io_context_t aio_ctx;
+#endif
+
 	struct file files[MAX_FDS];
 	unsigned nr_files;
 	unsigned cur_file;
@@ -94,6 +102,7 @@ struct submitter {
 
 static struct submitter *submitter;
 static volatile int finish;
+static int stats_running;
 
 static int depth = DEPTH;
 static int batch_submit = BATCH_SUBMIT;
@@ -108,6 +117,8 @@ static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
 static int do_nop = 0;		/* no-op SQ ring commands */
 static int nthreads = 1;
 static int stats = 0;		/* generate IO stats */
+static int aio = 0;		/* use libaio */
+
 static unsigned long tsc_rate;
 
 #define TSC_RATE_FILE	"tsc-rate"
@@ -298,10 +309,12 @@ static void add_stat(struct submitter *s, int clock_index, int nr)
 	unsigned long cycles;
 	unsigned int pidx;
 
-	cycles = get_cpu_clock();
-	cycles -= s->clock_batch[clock_index];
-	pidx = plat_val_to_idx(cycles);
-	s->plat[pidx] += nr;
+	if (!s->finish && clock_index) {
+		cycles = get_cpu_clock();
+		cycles -= s->clock_batch[clock_index];
+		pidx = plat_val_to_idx(cycles);
+		s->plat[pidx] += nr;
+	}
 #endif
 }
 
@@ -432,11 +445,11 @@ static void init_io(struct submitter *s, unsigned index)
 	sqe->ioprio = 0;
 	sqe->off = offset;
 	sqe->user_data = (unsigned long) f->fileno;
-	if (stats)
+	if (stats && stats_running)
 		sqe->user_data |= ((unsigned long)s->clock_index << 32);
 }
 
-static int prep_more_ios(struct submitter *s, int max_ios)
+static int prep_more_ios_uring(struct submitter *s, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
 	unsigned index, tail, next_tail, prepped = 0;
@@ -481,7 +494,7 @@ static int get_file_size(struct file *f)
 	return -1;
 }
 
-static int reap_events(struct submitter *s)
+static int reap_events_uring(struct submitter *s)
 {
 	struct io_cq_ring *ring = &s->cq_ring;
 	struct io_uring_cqe *cqe;
@@ -517,9 +530,8 @@ static int reap_events(struct submitter *s)
 					stat_nr = 0;
 				}
 				last_idx = clock_index;
-			}
-			stat_nr++;
-			add_stat(s, clock_index, 1);
+			} else if (clock_index)
+				stat_nr++;
 		}
 		reaped++;
 		head++;
@@ -535,11 +547,9 @@ static int reap_events(struct submitter *s)
 	return reaped;
 }
 
-static void *submitter_fn(void *data)
+static int submitter_init(struct submitter *s)
 {
-	struct submitter *s = data;
-	struct io_sq_ring *ring = &s->sq_ring;
-	int i, ret, prepped, nr_batch;
+	int i, nr_batch;
 
 	s->tid = gettid();
 	printf("submitter=%d\n", s->tid);
@@ -552,7 +562,7 @@ static void *submitter_fn(void *data)
 	if (stats) {
 		nr_batch = roundup_pow2(depth / batch_submit);
 		s->clock_batch = calloc(nr_batch, sizeof(unsigned long));
-		s->clock_index = 0;
+		s->clock_index = 1;
 
 		s->plat = calloc(PLAT_NR, sizeof(unsigned long));
 	} else {
@@ -561,6 +571,170 @@ static void *submitter_fn(void *data)
 		nr_batch = 0;
 	}
 
+	return nr_batch;
+}
+
+#ifdef CONFIG_LIBAIO
+static int prep_more_ios_aio(struct submitter *s, int max_ios, struct iocb *iocbs)
+{
+	unsigned long offset, data;
+	struct file *f;
+	unsigned index;
+	long r;
+
+	index = 0;
+	while (index < max_ios) {
+		struct iocb *iocb = &iocbs[index];
+
+		if (s->nr_files == 1) {
+			f = &s->files[0];
+		} else {
+			f = &s->files[s->cur_file];
+			if (f->pending_ios >= file_depth(s)) {
+				s->cur_file++;
+				if (s->cur_file == s->nr_files)
+					s->cur_file = 0;
+				f = &s->files[s->cur_file];
+			}
+		}
+		f->pending_ios++;
+
+		r = lrand48();
+		offset = (r % (f->max_blocks - 1)) * bs;
+		io_prep_pread(iocb, f->real_fd, s->iovecs[index].iov_base,
+				s->iovecs[index].iov_len, offset);
+
+		data = f->fileno;
+		if (stats && stats_running)
+			data |= ((unsigned long) s->clock_index << 32);
+		iocb->data = (void *) (uintptr_t) data;
+		index++;
+	}
+	return index;
+}
+
+static int reap_events_aio(struct submitter *s, struct io_event *events, int evs)
+{
+	int last_idx = -1, stat_nr = 0;
+	int reaped = 0;
+
+	while (evs) {
+		unsigned long data = (uintptr_t) events[reaped].data;
+		struct file *f = &s->files[data & 0xffffffff];
+
+		f->pending_ios--;
+		if (events[reaped].res != bs) {
+			printf("io: unexpected ret=%ld\n", events[reaped].res);
+			return -1;
+		}
+		if (stats) {
+			int clock_index = data >> 32;
+
+			if (last_idx != clock_index) {
+				if (last_idx != -1) {
+					add_stat(s, last_idx, stat_nr);
+					stat_nr = 0;
+				}
+				last_idx = clock_index;
+			} else if (clock_index)
+				stat_nr++;
+		}
+		reaped++;
+		evs--;
+	}
+
+	if (stat_nr)
+		add_stat(s, last_idx, stat_nr);
+
+	s->inflight -= reaped;
+	s->done += reaped;
+	return reaped;
+}
+
+static void *submitter_aio_fn(void *data)
+{
+	struct submitter *s = data;
+	int i, ret, prepped, nr_batch;
+	struct iocb **iocbsptr;
+	struct iocb *iocbs;
+	struct io_event *events;
+
+	nr_batch = submitter_init(s);
+
+	iocbsptr = calloc(depth, sizeof(struct iocb *));
+	iocbs = calloc(depth, sizeof(struct iocb));
+	events = calloc(depth, sizeof(struct io_event));
+
+	for (i = 0; i < depth; i++)
+		iocbsptr[i] = &iocbs[i];
+
+	prepped = 0;
+	do {
+		int to_wait, to_submit, to_prep;
+
+		if (!prepped && s->inflight < depth) {
+			to_prep = min(depth - s->inflight, batch_submit);
+			prepped = prep_more_ios_aio(s, to_prep, iocbs);
+#ifdef ARCH_HAVE_CPU_CLOCK
+			if (prepped && stats) {
+				s->clock_batch[s->clock_index] = get_cpu_clock();
+				s->clock_index = (s->clock_index + 1) & (nr_batch - 1);
+			}
+#endif
+		}
+		s->inflight += prepped;
+		to_submit = prepped;
+
+		if (to_submit && (s->inflight + to_submit <= depth))
+			to_wait = 0;
+		else
+			to_wait = min(s->inflight + to_submit, batch_complete);
+
+		ret = io_submit(s->aio_ctx, to_submit, iocbsptr);
+		s->calls++;
+		if (ret < 0) {
+			perror("io_submit");
+			break;
+		} else if (ret != to_submit) {
+			printf("submitted %d, wanted %d\n", ret, to_submit);
+			break;
+		}
+		prepped = 0;
+
+		while (to_wait) {
+			int r;
+
+			s->calls++;
+			r = io_getevents(s->aio_ctx, to_wait, to_wait, events, NULL);
+			if (r < 0) {
+				perror("io_getevents");
+				break;
+			} else if (r != to_wait) {
+				printf("r=%d, wait=%d\n", r, to_wait);
+				break;
+			}
+			r = reap_events_aio(s, events, r);
+			s->reaps += r;
+			to_wait -= r;
+		}
+	} while (!s->finish);
+
+	free(iocbsptr);
+	free(iocbs);
+	free(events);
+	finish = 1;
+	return NULL;
+}
+#endif
+
+static void *submitter_uring_fn(void *data)
+{
+	struct submitter *s = data;
+	struct io_sq_ring *ring = &s->sq_ring;
+	int ret, prepped, nr_batch;
+
+	nr_batch = submitter_init(s);
+
 	prepped = 0;
 	do {
 		int to_wait, to_submit, this_reap, to_prep;
@@ -568,7 +742,7 @@ static void *submitter_fn(void *data)
 
 		if (!prepped && s->inflight < depth) {
 			to_prep = min(depth - s->inflight, batch_submit);
-			prepped = prep_more_ios(s, to_prep);
+			prepped = prep_more_ios_uring(s, to_prep);
 #ifdef ARCH_HAVE_CPU_CLOCK
 			if (prepped && stats) {
 				s->clock_batch[s->clock_index] = get_cpu_clock();
@@ -613,7 +787,8 @@ submit:
 		this_reap = 0;
 		do {
 			int r;
-			r = reap_events(s);
+
+			r = reap_events_uring(s);
 			if (r == -1) {
 				s->finish = 1;
 				break;
@@ -693,6 +868,34 @@ static void arm_sig_int(void)
 #endif
 }
 
+static int setup_aio(struct submitter *s)
+{
+#ifdef CONFIG_LIBAIO
+	if (polled) {
+		fprintf(stderr, "aio does not support polled IO\n");
+		polled = 0;
+	}
+	if (sq_thread_poll) {
+		fprintf(stderr, "aio does not support SQPOLL IO\n");
+		sq_thread_poll = 0;
+	}
+	if (do_nop) {
+		fprintf(stderr, "aio does not support polled IO\n");
+		do_nop = 0;
+	}
+	if (fixedbufs || register_files) {
+		fprintf(stderr, "aio does not support registered files or buffers\n");
+		fixedbufs = register_files = 0;
+	}
+
+	return io_queue_init(depth, &s->aio_ctx);
+#else
+	fprintf(stderr, "Legacy AIO not available on this system/build\n");
+	errno = EINVAL;
+	return -1;
+#endif
+}
+
 static int setup_ring(struct submitter *s)
 {
 	struct io_sq_ring *sring = &s->sq_ring;
@@ -812,9 +1015,10 @@ static void usage(char *argv, int status)
 		" -O <bool> : Use O_DIRECT, default %d\n"
 		" -N <bool> : Perform just no-op requests, default %d\n"
 		" -t <bool> : Track IO latencies, default %d\n"
-		" -T <int>  : TSC rate in HZ\n",
+		" -T <int>  : TSC rate in HZ\n"
+		" -a <bool> : Use legacy aio, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads, !buffered, do_nop, stats);
+		fixedbufs, register_files, nthreads, !buffered, do_nop, stats, aio);
 	exit(status);
 }
 
@@ -874,8 +1078,11 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:a:h?")) != -1) {
 		switch (opt) {
+		case 'a':
+			aio = !!atoi(optarg);
+			break;
 		case 'd':
 			depth = atoi(optarg);
 			break;
@@ -1033,19 +1240,32 @@ int main(int argc, char *argv[])
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
 
-		err = setup_ring(s);
+		if (!aio)
+			err = setup_ring(s);
+		else
+			err = setup_aio(s);
 		if (err) {
 			printf("ring setup failed: %s, %d\n", strerror(errno), err);
 			return 1;
 		}
 	}
 	s = get_submitter(0);
-	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d", polled, fixedbufs, register_files, buffered);
-	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", depth, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
+	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d, QD=%d\n", polled, fixedbufs, register_files, buffered, depth);
+	if (!aio)
+		printf("Engine=io_uring, sq_ring=%d, cq_ring=%d\n", *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
+#ifdef CONFIG_LIBAIO
+	else
+		printf("Engine=aio, ctx=%p\n", &s->aio_ctx);
+#endif
 
 	for (j = 0; j < nthreads; j++) {
 		s = get_submitter(j);
-		pthread_create(&s->thread, NULL, submitter_fn, s);
+		if (!aio)
+			pthread_create(&s->thread, NULL, submitter_uring_fn, s);
+#ifdef CONFIG_LIBAIO
+		else
+			pthread_create(&s->thread, NULL, submitter_aio_fn, s);
+#endif
 	}
 
 	fdepths = malloc(8 * s->nr_files * nthreads);
@@ -1058,6 +1278,15 @@ int main(int argc, char *argv[])
 		unsigned long iops, bw;
 
 		sleep(1);
+
+		/* don't print partial run, if interrupted by signal */
+		if (finish)
+			break;
+
+		/* one second in to the run, enable stats */
+		if (stats)
+			stats_running = 1;
+
 		for (j = 0; j < nthreads; j++) {
 			this_done += s->done;
 			this_call += s->calls;

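Both the io_uring and aio paths tag each request with a single 64-bit value:

- The low 32 bits hold the file number.
- The high 32 bits hold the latency batch index, once stats_running is set.

reap_events_aio() decodes the tag with `data & 0xffffffff` and `data >> 32`. An index of 0 doubles as "not timed". That is why submitter_init() now starts clock_index at 1 and add_stat() skips index 0.

The sketch below shows the encoding with hypothetical helper names. It uses uint64_t instead of unsigned long. This is a choice made here, not in fio, so that the 32-bit shift stays well-defined even where long is only 32 bits wide.

```c
#include <stdint.h>

/* Low 32 bits: file number. High 32 bits: clock batch index (0 = untimed). */
static uint64_t pack_user_data(uint32_t fileno, uint32_t clock_index)
{
	return (uint64_t) fileno | ((uint64_t) clock_index << 32);
}

static uint32_t user_data_fileno(uint64_t data)
{
	return (uint32_t) (data & 0xffffffffu);
}

static uint32_t user_data_clock_index(uint64_t data)
{
	return (uint32_t) (data >> 32);
}
```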
* Recent changes (master)
@ 2021-10-02 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-02 12:00 UTC
  To: fio

The following changes since commit 0f77c977ab44a10d69268546a849376efc327d47:

  zbd: Fix unexpected job termination by open zone search failure (2021-09-30 10:05:23 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca4eefc1b55d7f9fc03bf113d63e3d0b2d7b38ae:

  Merge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio (2021-10-01 13:55:52 -0600)

----------------------------------------------------------------
Erwan Velu (2):
      t/one-core-peak: Report numa as off if missing
      t/one-core-peak: nvme-cli as optional tooling

Jens Axboe (2):
      t/io_uring: correct percentile ranking
      Merge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio

Shin'ichiro Kawasaki (2):
      Revert "Fix for loop count issue when do_verify=0 (#1093)"
      Refer td->loops instead of td->o.loops to fix loop count issue

 backend.c          |  4 ++--
 libfio.c           |  2 +-
 t/io_uring.c       |  2 +-
 t/one-core-peak.sh | 19 +++++++++++--------
 4 files changed, 15 insertions(+), 12 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 4c260747..86fa6d41 100644
--- a/backend.c
+++ b/backend.c
@@ -1920,13 +1920,13 @@ static void *thread_main(void *data)
 		if (td->error || td->terminate)
 			break;
 
-		clear_io_state(td, 0);
-		
 		if (!o->do_verify ||
 		    o->verify == VERIFY_NONE ||
 		    td_ioengine_flagged(td, FIO_UNIDIR))
 			continue;
 
+		clear_io_state(td, 0);
+
 		fio_gettime(&td->start, NULL);
 
 		do_verify(td, verify_bytes);
diff --git a/libfio.c b/libfio.c
index 6144a474..ed5906d4 100644
--- a/libfio.c
+++ b/libfio.c
@@ -104,7 +104,7 @@ static void reset_io_counters(struct thread_data *td, int all)
 	/*
 	 * reset file done count if we are to start over
 	 */
-	if (td->o.time_based || td->o.loops || td->o.do_verify)
+	if (td->o.time_based || td->loops > 1 || td->o.do_verify)
 		td->nr_done_files = 0;
 }
 
diff --git a/t/io_uring.c b/t/io_uring.c
index d7ae18b0..2ec4caeb 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -115,7 +115,7 @@ static unsigned long tsc_rate;
 static int vectored = 1;
 
 static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
-			80.0, 90.0, 95.0, 99.9, 99.5, 99.9, 99.95, 99.99 };
+			80.0, 90.0, 95.0, 99.0, 99.5, 99.9, 99.95, 99.99 };
 static int plist_len = 17;
 
 static unsigned long cycles_to_nsec(unsigned long cycles)
diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index 57c45451..d0649d2e 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -192,18 +192,21 @@ show_nvme() {
   pci_dir="/sys/bus/pci/devices/${pci_addr}/"
   link_speed=$(cat ${pci_dir}/current_link_speed)
   irq=$(cat ${pci_dir}/irq)
-  numa=$(cat ${pci_dir}/numa_node)
+  numa=$([ -f ${pci_dir}/numa_node ] && cat ${pci_dir}/numa_node || echo "off")
   cpus=$(cat ${pci_dir}/local_cpulist)
   model=$(cat ${device_dir}/model | xargs) #xargs for trimming spaces
   fw=$(cat ${device_dir}/firmware_rev | xargs) #xargs for trimming spaces
   serial=$(cat ${device_dir}/serial | xargs) #xargs for trimming spaces
   info ${device_name} "MODEL=${model} FW=${fw} serial=${serial} PCI=${pci_addr}@${link_speed} IRQ=${irq} NUMA=${numa} CPUS=${cpus} "
-  NCQA=$(nvme get-feature -H -f 0x7 ${device} |grep NCQA |cut -d ':' -f 2 | xargs)
-  NSQA=$(nvme get-feature -H -f 0x7 ${device} |grep NSQA |cut -d ':' -f 2 | xargs)
-  power_state=$(nvme get-feature -H -f 0x2 ${device} | grep PS |cut -d ":" -f 2 | xargs)
-  apste=$(nvme get-feature -H -f 0xc ${device} | grep APSTE |cut -d ":" -f 2 | xargs)
-  temp=$(nvme smart-log ${device} |grep 'temperature' |cut -d ':' -f 2 |xargs)
-  info ${device_name} "Temp:${temp}, Autonomous Power State Transition:${apste}, PowerState:${power_state}, Completion Queues:${NCQA}, Submission Queues:${NSQA}"
+  which nvme &> /dev/null
+  if [ $? -eq 0 ]; then
+    NCQA=$(nvme get-feature -H -f 0x7 ${device} |grep NCQA |cut -d ':' -f 2 | xargs)
+    NSQA=$(nvme get-feature -H -f 0x7 ${device} |grep NSQA |cut -d ':' -f 2 | xargs)
+    power_state=$(nvme get-feature -H -f 0x2 ${device} | grep PS |cut -d ":" -f 2 | xargs)
+    apste=$(nvme get-feature -H -f 0xc ${device} | grep APSTE |cut -d ":" -f 2 | xargs)
+    temp=$(nvme smart-log ${device} |grep 'temperature' |cut -d ':' -f 2 |xargs)
+    info ${device_name} "Temp:${temp}, Autonomous Power State Transition:${apste}, PowerState:${power_state}, Completion Queues:${NCQA}, Submission Queues:${NSQA}"
+  fi
 }
 
 show_device() {
@@ -249,7 +252,7 @@ show_system() {
 ### MAIN
 check_args ${args}
 check_root
-check_binary t/io_uring lscpu grep taskset cpupower awk tr xargs dmidecode nvme
+check_binary t/io_uring lscpu grep taskset cpupower awk tr xargs dmidecode
 detect_first_core
 
 info "##################################################"


* Recent changes (master)
@ 2021-10-01 12:00 Jens Axboe
From: Jens Axboe @ 2021-10-01 12:00 UTC
  To: fio

The following changes since commit 203e4c2624493c0db8c69c9ad830090c5b79be67:

  t/io_uring: store TSC rate in local file (2021-09-29 20:16:54 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0f77c977ab44a10d69268546a849376efc327d47:

  zbd: Fix unexpected job termination by open zone search failure (2021-09-30 10:05:23 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      zbd: Fix unexpected job termination by open zone search failure

 zbd.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index 64415d2b..c0b0b81c 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1204,6 +1204,19 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
 		f->io_size;
 }
 
+static bool any_io_in_flight(void)
+{
+	struct thread_data *td;
+	int i;
+
+	for_each_td(td, i) {
+		if (td->io_u_in_flight)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -1223,6 +1236,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	uint32_t zone_idx, new_zone_idx;
 	int i;
 	bool wait_zone_close;
+	bool in_flight;
+	bool should_retry = true;
 
 	assert(is_valid_offset(f, io_u->offset));
 
@@ -1337,6 +1352,7 @@ open_other_zone:
 		io_u_quiesce(td);
 	}
 
+retry:
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
@@ -1376,6 +1392,24 @@ open_other_zone:
 			goto out;
 		pthread_mutex_lock(&zbdi->mutex);
 	}
+
+	/*
+	 * When any I/O is in-flight or when all I/Os in-flight get completed,
+	 * the I/Os might have closed zones then retry the steps to open a zone.
+	 * Before retry, call io_u_quiesce() to complete in-flight writes.
+	 */
+	in_flight = any_io_in_flight();
+	if (in_flight || should_retry) {
+		dprint(FD_ZBD, "%s(%s): wait zone close and retry open zones\n",
+		       __func__, f->file_name);
+		pthread_mutex_unlock(&zbdi->mutex);
+		zone_unlock(z);
+		io_u_quiesce(td);
+		zone_lock(td, f, z);
+		should_retry = in_flight;
+		goto retry;
+	}
+
 	pthread_mutex_unlock(&zbdi->mutex);
 	zone_unlock(z);
 	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,


* Recent changes (master)
@ 2021-09-30 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-30 12:00 UTC
  To: fio

The following changes since commit cd312799e6a82557abbd742797b59f51e8c2c2e4:

  Merge branch 'sigbreak' of https://github.com/bjpaupor/fio (2021-09-28 13:28:18 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 203e4c2624493c0db8c69c9ad830090c5b79be67:

  t/io_uring: store TSC rate in local file (2021-09-29 20:16:54 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'patch-1' of https://github.com/ravisowmya/fio
      t/io_uring: store TSC rate in local file

ravisowmya (1):
      Fix for loop count issue when do_verify=0 (#1093)

 .gitignore   |  1 +
 backend.c    |  4 ++--
 t/io_uring.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index 6651f96e..72494a1e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,3 +31,4 @@ doc/output
 /TAGS
 /t/zbd/test-zbd-support.log.*
 /t/fuzz/fuzz_parseini
+tsc-rate
diff --git a/backend.c b/backend.c
index 86fa6d41..4c260747 100644
--- a/backend.c
+++ b/backend.c
@@ -1920,13 +1920,13 @@ static void *thread_main(void *data)
 		if (td->error || td->terminate)
 			break;
 
+		clear_io_state(td, 0);
+		
 		if (!o->do_verify ||
 		    o->verify == VERIFY_NONE ||
 		    td_ioengine_flagged(td, FIO_UNIDIR))
 			continue;
 
-		clear_io_state(td, 0);
-
 		fio_gettime(&td->start, NULL);
 
 		do_verify(td, verify_bytes);
diff --git a/t/io_uring.c b/t/io_uring.c
index e5568aa2..d7ae18b0 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -110,6 +110,8 @@ static int nthreads = 1;
 static int stats = 0;		/* generate IO stats */
 static unsigned long tsc_rate;
 
+#define TSC_RATE_FILE	"tsc-rate"
+
 static int vectored = 1;
 
 static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
@@ -816,6 +818,50 @@ static void usage(char *argv, int status)
 	exit(status);
 }
 
+static void read_tsc_rate(void)
+{
+	char buffer[32];
+	int fd, ret;
+
+	if (tsc_rate)
+		return;
+
+	fd = open(TSC_RATE_FILE, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	ret = read(fd, buffer, sizeof(buffer));
+	if (ret < 0) {
+		close(fd);
+		return;
+	}
+
+	tsc_rate = strtoul(buffer, NULL, 10);
+	printf("Using TSC rate %luHz\n", tsc_rate);
+	close(fd);
+}
+
+static void write_tsc_rate(void)
+{
+	char buffer[32];
+	struct stat sb;
+	int fd, ret;
+
+	if (!stat(TSC_RATE_FILE, &sb))
+		return;
+
+	fd = open(TSC_RATE_FILE, O_WRONLY | O_CREAT, 0644);
+	if (fd < 0)
+		return;
+
+	memset(buffer, 0, sizeof(buffer));
+	sprintf(buffer, "%lu", tsc_rate);
+	ret = write(fd, buffer, strlen(buffer));
+	if (ret < 0)
+		perror("write");
+	close(fd);
+}
+
 int main(int argc, char *argv[])
 {
 	struct submitter *s;
@@ -881,6 +927,7 @@ int main(int argc, char *argv[])
 			return 1;
 #endif
 			tsc_rate = strtoul(optarg, NULL, 10);
+			write_tsc_rate();
 			break;
 		case 'h':
 		case '?':
@@ -890,6 +937,9 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (stats)
+		read_tsc_rate();
+
 	if (batch_complete > depth)
 		batch_complete = depth;
 	if (batch_submit > depth)


* Recent changes (master)
@ 2021-09-29 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-29 12:00 UTC
  To: fio

The following changes since commit 6e0ef20ffd975fc217aba4e7c125b420cd2fbd91:

  Merge branch 'onecore' of https://github.com/ByteHamster/fio (2021-09-26 16:32:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cd312799e6a82557abbd742797b59f51e8c2c2e4:

  Merge branch 'sigbreak' of https://github.com/bjpaupor/fio (2021-09-28 13:28:18 -0600)

----------------------------------------------------------------
Brandon Paupore (1):
      add signal handlers for Windows SIGBREAK

Jens Axboe (1):
      Merge branch 'sigbreak' of https://github.com/bjpaupor/fio

 server.c     | 5 +++++
 t/io_uring.c | 5 +++++
 2 files changed, 10 insertions(+)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index 859a401b..90c52e01 100644
--- a/server.c
+++ b/server.c
@@ -2457,6 +2457,11 @@ static void set_sig_handlers(void)
 	};
 
 	sigaction(SIGINT, &act, NULL);
+
+	/* Windows uses SIGBREAK as a quit signal from other applications */
+#ifdef WIN32
+	sigaction(SIGBREAK, &act, NULL);
+#endif
 }
 
 void fio_server_destroy_sk_key(void)
diff --git a/t/io_uring.c b/t/io_uring.c
index f22c504a..e5568aa2 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -684,6 +684,11 @@ static void arm_sig_int(void)
 	act.sa_handler = sig_int;
 	act.sa_flags = SA_RESTART;
 	sigaction(SIGINT, &act, NULL);
+
+	/* Windows uses SIGBREAK as a quit signal from other applications */
+#ifdef WIN32
+	sigaction(SIGBREAK, &act, NULL);
+#endif
 }
 
 static int setup_ring(struct submitter *s)


* Recent changes (master)
@ 2021-09-27 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-27 12:00 UTC
  To: fio

The following changes since commit 0b2114e7b46d047271d8d404beaae7006e89f8ef:

  Merge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio (2021-09-25 14:56:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6e0ef20ffd975fc217aba4e7c125b420cd2fbd91:

  Merge branch 'onecore' of https://github.com/ByteHamster/fio (2021-09-26 16:32:32 -0600)

----------------------------------------------------------------
ByteHamster (1):
      Pick core for running t/one-core-peak.sh

Erwan Velu (5):
      one-core-peak: Avoid reporting Unknown memory speed
      one-core-peak: Adding option to reporting latencies
      one-core-peak.sh: Fixing bash
      t/one-core-peak: Reporting kernel config
      one-core-peak: Reporting NVME features

Jens Axboe (3):
      Merge branch 'tsc' of https://github.com/ErwanAliasr1/fio
      Merge branch 'evelu-fio' of https://github.com/ErwanAliasr1/fio
      Merge branch 'onecore' of https://github.com/ByteHamster/fio

 t/one-core-peak.sh | 109 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 93 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
index 791deece..57c45451 100755
--- a/t/one-core-peak.sh
+++ b/t/one-core-peak.sh
@@ -4,7 +4,11 @@ args=$*
 first_cores=""
 taskset_cores=""
 first_cores_count=0
-nb_threads=4 #default from the benchmark
+nb_threads=1
+drives=""
+
+# Default options
+latency_cmdline=""
 
 fatal() {
   echo "$@"
@@ -35,11 +39,22 @@ check_binary() {
   done
 }
 
-
 detect_first_core() {
+  cpu_to_search="0"
+  if [ "${#drives[@]}" -eq 1 ]; then
+    device_name=$(block_dev_name ${drives[0]})
+    device_dir="/sys/block/${device_name}/device/"
+    pci_addr=$(cat ${device_dir}/address)
+    pci_dir="/sys/bus/pci/devices/${pci_addr}/"
+    cpu_to_search=$(cat ${pci_dir}/local_cpulist | cut -d"," -f 1 | cut -d"-" -f 1)
+  else
+    hint 'Passed multiple devices. Running on the first core.'
+  fi
+  core_to_run=$(lscpu  --all -pSOCKET,CORE,CPU | grep ",$cpu_to_search\$" | cut -d"," -f1-2)
+
   # Detect which logical cpus belongs to the first physical core
   # If Hyperthreading is enabled, two cores are returned
-  cpus=$(lscpu  --all -pSOCKET,CORE,CPU |grep "0,0")
+  cpus=$(lscpu  --all -pSOCKET,CORE,CPU | grep "$core_to_run")
   for cpu in ${cpus}; do
     IFS=','
     # shellcheck disable=SC2206
@@ -57,8 +72,37 @@ detect_first_core() {
   taskset_cores=$(echo "${first_cores}" | tr ' ' ',')
 }
 
+usage() {
+  echo "usage: [options] block_device [other_block_devices]
+
+   -h         : print help
+   -l         : enable latency reporting
+
+   example:
+      t/one-core-peak.sh /dev/nvme0n1
+      t/one-core-peak.sh -l /dev/nvme0n1 /dev/nvme1n1
+  "
+  exit 0
+}
+
 check_args() {
-  [ $1 -eq 0 ] && fatal "Missing drive(s) as argument"
+  local OPTIND option
+  while getopts "hl" option; do
+    case "${option}" in
+        h) # Show help
+            usage
+            ;;
+        l) # Report latency
+            latency_cmdline="1"
+            ;;
+        *)
+            fatal "Unsupported ${option} option"
+            ;;
+    esac
+  done
+  shift $((OPTIND-1))
+  [ $# -eq 0 ] && fatal "Missing drive(s) as argument"
+  drives="$*"
 }
 
 check_drive_exists() {
@@ -72,7 +116,7 @@ is_nvme() {
 
 check_poll_queue() {
   # Print a warning if the nvme poll queues aren't enabled
-  is_nvme ${args} || return
+  is_nvme ${drives} || return
   poll_queue=$(cat /sys/module/nvme/parameters/poll_queues)
   [ ${poll_queue} -eq 0 ] && hint "For better performance, you should enable nvme poll queues by setting nvme.poll_queues=32 on the kernel commande line"
 }
@@ -141,6 +185,7 @@ check_idle_governor() {
 }
 
 show_nvme() {
+  device="$1"
   device_name=$(block_dev_name $1)
   device_dir="/sys/block/${device_name}/device/"
   pci_addr=$(cat ${device_dir}/address)
@@ -153,6 +198,12 @@ show_nvme() {
   fw=$(cat ${device_dir}/firmware_rev | xargs) #xargs for trimming spaces
   serial=$(cat ${device_dir}/serial | xargs) #xargs for trimming spaces
   info ${device_name} "MODEL=${model} FW=${fw} serial=${serial} PCI=${pci_addr}@${link_speed} IRQ=${irq} NUMA=${numa} CPUS=${cpus} "
+  NCQA=$(nvme get-feature -H -f 0x7 ${device} |grep NCQA |cut -d ':' -f 2 | xargs)
+  NSQA=$(nvme get-feature -H -f 0x7 ${device} |grep NSQA |cut -d ':' -f 2 | xargs)
+  power_state=$(nvme get-feature -H -f 0x2 ${device} | grep PS |cut -d ":" -f 2 | xargs)
+  apste=$(nvme get-feature -H -f 0xc ${device} | grep APSTE |cut -d ":" -f 2 | xargs)
+  temp=$(nvme smart-log ${device} |grep 'temperature' |cut -d ':' -f 2 |xargs)
+  info ${device_name} "Temp:${temp}, Autonomous Power State Transition:${apste}, PowerState:${power_state}, Completion Queues:${NCQA}, Submission Queues:${NSQA}"
 }
 
 show_device() {
@@ -160,24 +211,50 @@ show_device() {
   is_nvme $1 && show_nvme $1
 }
 
+show_kernel_config_item() {
+  config_item="CONFIG_$1"
+  config_file="/boot/config-$(uname -r)"
+  if [ ! -f "${config_file}" ]; then
+    config_file='/proc/config.gz'
+    if [ ! -f "${config_file}" ]; then
+      return
+    fi
+  fi
+  status=$(zgrep ${config_item}= ${config_file})
+  if [ -z "${status}" ]; then
+    echo "${config_item}=N"
+  else
+    echo "${config_item}=$(echo ${status} | cut -d '=' -f 2)"
+  fi
+}
+
 show_system() {
-CPU_MODEL=$(grep -m1 "model name" /proc/cpuinfo | awk '{print substr($0, index($0,$4))}')
-MEMORY_SPEED=$(dmidecode -t 17 -q |grep -m 1 "Configured Memory Speed: " | awk '{print substr($0, index($0,$4))}')
-KERNEL=$(uname -r)
-info "system" "CPU: ${CPU_MODEL}"
-info "system" "MEMORY: ${MEMORY_SPEED}"
-info "system" "KERNEL: ${KERNEL}"
+  CPU_MODEL=$(grep -m1 "model name" /proc/cpuinfo | awk '{print substr($0, index($0,$4))}')
+  MEMORY_SPEED=$(dmidecode -t 17 -q | grep -m 1 "Configured Memory Speed: [0-9]" | awk '{print substr($0, index($0,$4))}')
+  KERNEL=$(uname -r)
+  info "system" "CPU: ${CPU_MODEL}"
+  info "system" "MEMORY: ${MEMORY_SPEED}"
+  info "system" "KERNEL: ${KERNEL}"
+  for config_item in BLK_CGROUP_IOCOST HZ; do
+    info "system" "KERNEL: $(show_kernel_config_item ${config_item})"
+  done
+  tsc=$(journalctl -k | grep 'tsc: Refined TSC clocksource calibration:' | awk '{print $11}')
+  if [ -n "${tsc}" ]; then
+    info "system" "TSC: ${tsc} Mhz"
+    tsc=$(echo ${tsc} | tr -d '.')
+    [ -n "${latency_cmdline}" ] && latency_cmdline="-t1 -T${tsc}000"
+  fi
 }
 
 ### MAIN
-check_args $#
+check_args ${args}
 check_root
-check_binary t/io_uring lscpu grep taskset cpupower awk tr xargs dmidecode
+check_binary t/io_uring lscpu grep taskset cpupower awk tr xargs dmidecode nvme
 detect_first_core
 
 info "##################################################"
 show_system
-for drive in ${args}; do
+for drive in ${drives}; do
   check_drive_exists ${drive}
   check_io_scheduler ${drive}
   check_sysblock_value ${drive} "queue/iostats" 0 # Ensure iostats are disabled
@@ -187,13 +264,13 @@ for drive in ${args}; do
 done
 
 check_poll_queue
-compute_nb_threads ${args}
+compute_nb_threads ${drives}
 check_scaling_governor
 check_idle_governor
 
 info "##################################################"
 echo
 
-cmdline="taskset -c ${taskset_cores} t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n${nb_threads} ${args}"
+cmdline="taskset -c ${taskset_cores} t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n${nb_threads} ${latency_cmdline} ${drives}"
 info "io_uring" "Running ${cmdline}"
 ${cmdline}


* Recent changes (master)
@ 2021-09-26 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-26 12:00 UTC
  To: fio

The following changes since commit 6c5d3a1c08bda1bbf22187c7b80573400e1c1053:

  t/io_uring: don't print BW numbers for do_nop (2021-09-24 15:17:44 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0b2114e7b46d047271d8d404beaae7006e89f8ef:

  Merge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio (2021-09-25 14:56:14 -0600)

----------------------------------------------------------------
Erwan Velu (1):
      t/io_uring.c: Adding \n on help

Jens Axboe (3):
      t/io_uring: add support for latency tracking
      t/io_uring: batch stat updates
      Merge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio

 t/io_uring.c | 298 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 289 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index d5636380..f22c504a 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -5,6 +5,7 @@
 #include <stddef.h>
 #include <signal.h>
 #include <inttypes.h>
+#include <math.h>
 
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -22,10 +23,10 @@
 
 #include "../arch/arch.h"
 #include "../lib/types.h"
+#include "../lib/roundup.h"
+#include "../minmax.h"
 #include "../os/linux/io_uring.h"
 
-#define min(a, b)		((a < b) ? (a) : (b))
-
 struct io_sq_ring {
 	unsigned *head;
 	unsigned *tail;
@@ -57,8 +58,14 @@ struct file {
 	unsigned pending_ios;
 	int real_fd;
 	int fixed_fd;
+	int fileno;
 };
 
+#define PLAT_BITS		6
+#define PLAT_VAL		(1 << PLAT_BITS)
+#define PLAT_GROUP_NR		29
+#define PLAT_NR			(PLAT_GROUP_NR * PLAT_VAL)
+
 struct submitter {
 	pthread_t thread;
 	int ring_fd;
@@ -67,6 +74,7 @@ struct submitter {
 	struct io_uring_sqe *sqes;
 	struct io_cq_ring cq_ring;
 	int inflight;
+	int tid;
 	unsigned long reaps;
 	unsigned long done;
 	unsigned long calls;
@@ -74,6 +82,10 @@ struct submitter {
 
 	__s32 *fds;
 
+	unsigned long *clock_batch;
+	int clock_index;
+	unsigned long *plat;
+
 	struct file files[MAX_FDS];
 	unsigned nr_files;
 	unsigned cur_file;
@@ -95,9 +107,202 @@ static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
 static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
 static int do_nop = 0;		/* no-op SQ ring commands */
 static int nthreads = 1;
+static int stats = 0;		/* generate IO stats */
+static unsigned long tsc_rate;
 
 static int vectored = 1;
 
+static float plist[] = { 1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
+			80.0, 90.0, 95.0, 99.9, 99.5, 99.9, 99.95, 99.99 };
+static int plist_len = 17;
+
+static unsigned long cycles_to_nsec(unsigned long cycles)
+{
+	uint64_t val;
+
+	if (!tsc_rate)
+		return cycles;
+
+	val = cycles * 1000000000ULL;
+	return val / tsc_rate;
+}
+
+static unsigned long plat_idx_to_val(unsigned int idx)
+{
+	unsigned int error_bits;
+	unsigned long k, base;
+
+	assert(idx < PLAT_NR);
+
+	/* MSB <= (PLAT_BITS-1), cannot be rounded off. Use
+	 * all bits of the sample as index */
+	if (idx < (PLAT_VAL << 1))
+		return cycles_to_nsec(idx);
+
+	/* Find the group and compute the minimum value of that group */
+	error_bits = (idx >> PLAT_BITS) - 1;
+	base = ((unsigned long) 1) << (error_bits + PLAT_BITS);
+
+	/* Find its bucket number of the group */
+	k = idx % PLAT_VAL;
+
+	/* Return the mean of the range of the bucket */
+	return cycles_to_nsec(base + ((k + 0.5) * (1 << error_bits)));
+}
+
+unsigned int calc_clat_percentiles(unsigned long *io_u_plat, unsigned long nr,
+				   unsigned long **output,
+				   unsigned long *maxv, unsigned long *minv)
+{
+	unsigned long sum = 0;
+	unsigned int len = plist_len, i, j = 0;
+	unsigned long *ovals = NULL;
+	bool is_last;
+
+	*minv = -1ULL;
+	*maxv = 0;
+
+	ovals = malloc(len * sizeof(*ovals));
+	if (!ovals)
+		return 0;
+
+	/*
+	 * Calculate bucket values, note down max and min values
+	 */
+	is_last = false;
+	for (i = 0; i < PLAT_NR && !is_last; i++) {
+		sum += io_u_plat[i];
+		while (sum >= ((long double) plist[j] / 100.0 * nr)) {
+			assert(plist[j] <= 100.0);
+
+			ovals[j] = plat_idx_to_val(i);
+			if (ovals[j] < *minv)
+				*minv = ovals[j];
+			if (ovals[j] > *maxv)
+				*maxv = ovals[j];
+
+			is_last = (j == len - 1) != 0;
+			if (is_last)
+				break;
+
+			j++;
+		}
+	}
+
+	if (!is_last)
+		fprintf(stderr, "error calculating latency percentiles\n");
+
+	*output = ovals;
+	return len;
+}
+
+static void show_clat_percentiles(unsigned long *io_u_plat, unsigned long nr,
+				  unsigned int precision)
+{
+	unsigned int divisor, len, i, j = 0;
+	unsigned long minv, maxv;
+	unsigned long *ovals;
+	int per_line, scale_down, time_width;
+	bool is_last;
+	char fmt[32];
+
+	len = calc_clat_percentiles(io_u_plat, nr, &ovals, &maxv, &minv);
+	if (!len || !ovals)
+		goto out;
+
+	if (!tsc_rate) {
+		scale_down = 0;
+		divisor = 1;
+		printf("    percentiles (tsc ticks):\n     |");
+	} else if (minv > 2000 && maxv > 99999) {
+		scale_down = 1;
+		divisor = 1000;
+		printf("    percentiles (usec):\n     |");
+	} else {
+		scale_down = 0;
+		divisor = 1;
+		printf("    percentiles (nsec):\n     |");
+	}
+
+	time_width = max(5, (int) (log10(maxv / divisor) + 1));
+	snprintf(fmt, sizeof(fmt), " %%%u.%ufth=[%%%dllu]%%c", precision + 3,
+			precision, time_width);
+	/* fmt will be something like " %5.2fth=[%4llu]%c" */
+	per_line = (80 - 7) / (precision + 10 + time_width);
+
+	for (j = 0; j < len; j++) {
+		/* for formatting */
+		if (j != 0 && (j % per_line) == 0)
+			printf("     |");
+
+		/* end of the list */
+		is_last = (j == len - 1) != 0;
+
+		for (i = 0; i < scale_down; i++)
+			ovals[j] = (ovals[j] + 999) / 1000;
+
+		printf(fmt, plist[j], ovals[j], is_last ? '\n' : ',');
+
+		if (is_last)
+			break;
+
+		if ((j % per_line) == per_line - 1)	/* for formatting */
+			printf("\n");
+	}
+
+out:
+	free(ovals);
+}
+
+static unsigned int plat_val_to_idx(unsigned long val)
+{
+	unsigned int msb, error_bits, base, offset, idx;
+
+	/* Find MSB starting from bit 0 */
+	if (val == 0)
+		msb = 0;
+	else
+		msb = (sizeof(val)*8) - __builtin_clzll(val) - 1;
+
+	/*
+	 * MSB <= (PLAT_BITS-1), cannot be rounded off. Use
+	 * all bits of the sample as index
+	 */
+	if (msb <= PLAT_BITS)
+		return val;
+
+	/* Compute the number of error bits to discard*/
+	error_bits = msb - PLAT_BITS;
+
+	/* Compute the number of buckets before the group */
+	base = (error_bits + 1) << PLAT_BITS;
+
+	/*
+	 * Discard the error bits and apply the mask to find the
+	 * index for the buckets in the group
+	 */
+	offset = (PLAT_VAL - 1) & (val >> error_bits);
+
+	/* Make sure the index does not exceed (array size - 1) */
+	idx = (base + offset) < (PLAT_NR - 1) ?
+		(base + offset) : (PLAT_NR - 1);
+
+	return idx;
+}
+
+static void add_stat(struct submitter *s, int clock_index, int nr)
+{
+#ifdef ARCH_HAVE_CPU_CLOCK
+	unsigned long cycles;
+	unsigned int pidx;
+
+	cycles = get_cpu_clock();
+	cycles -= s->clock_batch[clock_index];
+	pidx = plat_val_to_idx(cycles);
+	s->plat[pidx] += nr;
+#endif
+}
+
 static int io_uring_register_buffers(struct submitter *s)
 {
 	if (do_nop)
@@ -224,7 +429,9 @@ static void init_io(struct submitter *s, unsigned index)
 	}
 	sqe->ioprio = 0;
 	sqe->off = offset;
-	sqe->user_data = (unsigned long) f;
+	sqe->user_data = (unsigned long) f->fileno;
+	if (stats)
+		sqe->user_data |= ((unsigned long)s->clock_index << 32);
 }
 
 static int prep_more_ios(struct submitter *s, int max_ios)
@@ -277,6 +484,7 @@ static int reap_events(struct submitter *s)
 	struct io_cq_ring *ring = &s->cq_ring;
 	struct io_uring_cqe *cqe;
 	unsigned head, reaped = 0;
+	int last_idx = -1, stat_nr = 0;
 
 	head = *ring->head;
 	do {
@@ -287,7 +495,9 @@ static int reap_events(struct submitter *s)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
 		if (!do_nop) {
-			f = (struct file *) (uintptr_t) cqe->user_data;
+			int fileno = cqe->user_data & 0xffffffff;
+
+			f = &s->files[fileno];
 			f->pending_ios--;
 			if (cqe->res != bs) {
 				printf("io: unexpected ret=%d\n", cqe->res);
@@ -296,10 +506,26 @@ static int reap_events(struct submitter *s)
 				return -1;
 			}
 		}
+		if (stats) {
+			int clock_index = cqe->user_data >> 32;
+
+			if (last_idx != clock_index) {
+				if (last_idx != -1) {
+					add_stat(s, last_idx, stat_nr);
+					stat_nr = 0;
+				}
+				last_idx = clock_index;
+			}
+			stat_nr++;
+			add_stat(s, clock_index, 1);
+		}
 		reaped++;
 		head++;
 	} while (1);
 
+	if (stat_nr)
+		add_stat(s, last_idx, stat_nr);
+
 	if (reaped) {
 		s->inflight -= reaped;
 		atomic_store_release(ring->head, head);
@@ -311,12 +537,28 @@ static void *submitter_fn(void *data)
 {
 	struct submitter *s = data;
 	struct io_sq_ring *ring = &s->sq_ring;
-	int ret, prepped;
+	int i, ret, prepped, nr_batch;
 
-	printf("submitter=%d\n", gettid());
+	s->tid = gettid();
+	printf("submitter=%d\n", s->tid);
 
 	srand48(pthread_self());
 
+	for (i = 0; i < MAX_FDS; i++)
+		s->files[i].fileno = i;
+
+	if (stats) {
+		nr_batch = roundup_pow2(depth / batch_submit);
+		s->clock_batch = calloc(nr_batch, sizeof(unsigned long));
+		s->clock_index = 0;
+
+		s->plat = calloc(PLAT_NR, sizeof(unsigned long));
+	} else {
+		s->clock_batch = NULL;
+		s->plat = NULL;
+		nr_batch = 0;
+	}
+
 	prepped = 0;
 	do {
 		int to_wait, to_submit, this_reap, to_prep;
@@ -325,6 +567,12 @@ static void *submitter_fn(void *data)
 		if (!prepped && s->inflight < depth) {
 			to_prep = min(depth - s->inflight, batch_submit);
 			prepped = prep_more_ios(s, to_prep);
+#ifdef ARCH_HAVE_CPU_CLOCK
+			if (prepped && stats) {
+				s->clock_batch[s->clock_index] = get_cpu_clock();
+				s->clock_index = (s->clock_index + 1) & (nr_batch - 1);
+			}
+#endif
 		}
 		s->inflight += prepped;
 submit_more:
@@ -555,9 +803,11 @@ static void usage(char *argv, int status)
 		" -F <bool> : Register files, default %d\n"
 		" -n <int>  : Number of threads, default %d\n"
 		" -O <bool> : Use O_DIRECT, default %d\n"
-		" -N <bool> : Perform just no-op requests, default %d\n",
+		" -N <bool> : Perform just no-op requests, default %d\n"
+		" -t <bool> : Track IO latencies, default %d\n"
+		" -T <int>  : TSC rate in HZ\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads, !buffered, do_nop);
+		fixedbufs, register_files, nthreads, !buffered, do_nop, stats);
 	exit(status);
 }
 
@@ -573,16 +823,20 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:t:T:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
 			break;
 		case 's':
 			batch_submit = atoi(optarg);
+			if (!batch_submit)
+				batch_submit = 1;
 			break;
 		case 'c':
 			batch_complete = atoi(optarg);
+			if (!batch_complete)
+				batch_complete = 1;
 			break;
 		case 'b':
 			bs = atoi(optarg);
@@ -609,6 +863,20 @@ int main(int argc, char *argv[])
 		case 'O':
 			buffered = !atoi(optarg);
 			break;
+		case 't':
+#ifndef ARCH_HAVE_CPU_CLOCK
+			fprintf(stderr, "Stats not supported on this CPU\n");
+			return 1;
+#endif
+			stats = !!atoi(optarg);
+			break;
+		case 'T':
+#ifndef ARCH_HAVE_CPU_CLOCK
+			fprintf(stderr, "Stats not supported on this CPU\n");
+			return 1;
+#endif
+			tsc_rate = strtoul(optarg, NULL, 10);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -764,7 +1032,19 @@ int main(int argc, char *argv[])
 		s = get_submitter(j);
 		pthread_join(s->thread, &ret);
 		close(s->ring_fd);
+
+		if (stats) {
+			unsigned long nr;
+
+			printf("%d: Latency percentiles:\n", s->tid);
+			for (i = 0, nr = 0; i < PLAT_NR; i++)
+				nr += s->plat[i];
+			show_clat_percentiles(s->plat, nr, 4);
+			free(s->clock_batch);
+			free(s->plat);
+		}
 	}
+
 	free(fdepths);
 	free(submitter);
 	return 0;

* Recent changes (master)
@ 2021-09-25 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-25 12:00 UTC
  To: fio

The following changes since commit c53476111d5ede61d24b3fa181fa2d19d3a3e6bc:

  t/io_uring: ensure batch counts are smaller or equal to depth (2021-09-23 09:15:16 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6c5d3a1c08bda1bbf22187c7b80573400e1c1053:

  t/io_uring: don't print BW numbers for do_nop (2021-09-24 15:17:44 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: don't print BW numbers for do_nop

 t/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index af1b8fa8..d5636380 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -751,8 +751,10 @@ int main(int argc, char *argv[])
 			bw = iops * (bs / 1048576);
 		else
 			bw = iops / (1048576 / bs);
-		printf("IOPS=%lu, BW=%luMiB/s, IOS/call=%ld/%ld, inflight=(%s)\n",
-				iops, bw, rpc, ipc, fdepths);
+		printf("IOPS=%lu, ", iops);
+		if (!do_nop)
+			printf("BW=%luMiB/s, ", bw);
+		printf("IOS/call=%ld/%ld, inflight=(%s)\n", rpc, ipc, fdepths);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;

* Recent changes (master)
@ 2021-09-24 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-24 12:00 UTC
  To: fio

The following changes since commit d9137307bc621280dcb1738e5df5d5ee4269a665:

  Merge branch 'one-core' of https://github.com/ErwanAliasr1/fio (2021-09-20 18:29:40 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c53476111d5ede61d24b3fa181fa2d19d3a3e6bc:

  t/io_uring: ensure batch counts are smaller or equal to depth (2021-09-23 09:15:16 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: ensure batch counts are smaller or equal to depth

 t/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 1adb8789..af1b8fa8 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -617,6 +617,11 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (batch_complete > depth)
+		batch_complete = depth;
+	if (batch_submit > depth)
+		batch_submit = depth;
+
 	submitter = calloc(nthreads, sizeof(*submitter) +
 				depth * sizeof(struct iovec));
 	for (j = 0; j < nthreads; j++) {

* Recent changes (master)
@ 2021-09-21 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-21 12:00 UTC
  To: fio

The following changes since commit f3057d268d98aeb638b659a284d087fb0c2f654c:

  t/io_uring: fix bandwidth calculation (2021-09-16 11:41:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d9137307bc621280dcb1738e5df5d5ee4269a665:

  Merge branch 'one-core' of https://github.com/ErwanAliasr1/fio (2021-09-20 18:29:40 -0600)

----------------------------------------------------------------
Erwan Velu (1):
      t/one-core.sh: Adding script to run the one-core io benchmark

Jens Axboe (1):
      Merge branch 'one-core' of https://github.com/ErwanAliasr1/fio

 t/one-core-peak.sh | 199 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100755 t/one-core-peak.sh

---

Diff of recent changes:

diff --git a/t/one-core-peak.sh b/t/one-core-peak.sh
new file mode 100755
index 00000000..791deece
--- /dev/null
+++ b/t/one-core-peak.sh
@@ -0,0 +1,199 @@
+#!/bin/bash
+
+args=$*
+first_cores=""
+taskset_cores=""
+first_cores_count=0
+nb_threads=4 #default from the benchmark
+
+fatal() {
+  echo "$@"
+  exit 1
+}
+
+hint() {
+  echo "Warning: $*"
+}
+
+info() {
+  item=$1
+  shift
+  echo "${item}: $*"
+}
+
+check_root() {
+  [[ ${EUID} -eq 0 ]] || fatal "You should be root to run this tool"
+}
+
+check_binary() {
+  # Ensure the binaries are present and executable
+  for bin in "$@"; do
+    if [ ! -x ${bin} ]; then
+      which ${bin} >/dev/null
+      [ $? -eq 0 ] || fatal "${bin} doesn't exist or is not executable"
+    fi
+  done
+}
+
+
+detect_first_core() {
+  # Detect which logical CPUs belong to the first physical core
+  # If Hyper-Threading is enabled, two logical CPUs are returned
+  cpus=$(lscpu  --all -pSOCKET,CORE,CPU |grep "0,0")
+  for cpu in ${cpus}; do
+    IFS=','
+    # shellcheck disable=SC2206
+    array=(${cpu})
+    if [ ${first_cores_count} -eq 0 ]; then
+      first_cores="${array[2]}"
+    else
+      first_cores="${first_cores} ${array[2]}"
+    fi
+
+    first_cores_count=$((first_cores_count + 1))
+    unset IFS
+  done
+  [ ${first_cores_count} -eq 0 ] && fatal "Cannot detect first core"
+  taskset_cores=$(echo "${first_cores}" | tr ' ' ',')
+}
+
+check_args() {
+  [ $1 -eq 0 ] && fatal "Missing drive(s) as argument"
+}
+
+check_drive_exists() {
+  # Ensure the block device exists
+  [ -b $1 ] || fatal "$1 is not a valid block device"
+}
+
+is_nvme() {
+  [[ ${*} == *"nvme"* ]]
+}
+
+check_poll_queue() {
+  # Print a warning if the nvme poll queues aren't enabled
+  is_nvme ${args} || return
+  poll_queue=$(cat /sys/module/nvme/parameters/poll_queues)
+  [ ${poll_queue} -eq 0 ] && hint "For better performance, you should enable nvme poll queues by setting nvme.poll_queues=32 on the kernel command line"
+}
+
+block_dev_name() {
+  echo ${1#"/dev/"}
+}
+
+get_sys_block_dir() {
+  # Returns the /sys/block/ directory of a given block device
+  device_name=$1
+  sys_block_dir="/sys/block/${device_name}"
+  [ -d "${sys_block_dir}" ] || fatal "Cannot find ${sys_block_dir} directory"
+  echo ${sys_block_dir}
+}
+
+check_io_scheduler() {
+  # Ensure io_sched is set to none
+  device_name=$(block_dev_name $1)
+  sys_block_dir=$(get_sys_block_dir ${device_name})
+  sched_file="${sys_block_dir}/queue/scheduler"
+  [ -f "${sched_file}" ] || fatal "Cannot find IO scheduler for ${device_name}"
+  grep -q '\[none\]' ${sched_file}
+  if [ $? -ne 0 ]; then
+    info "${device_name}" "set none as io scheduler"
+    echo "none" > ${sched_file}
+  fi
+
+}
+
+check_sysblock_value() {
+  device_name=$(block_dev_name $1)
+  sys_block_dir=$(get_sys_block_dir ${device_name})
+  target_file="${sys_block_dir}/$2"
+  value=$3
+  [ -f "${target_file}" ] || fatal "Cannot find ${target_file} for ${device_name}"
+  content=$(cat ${target_file})
+  if [ "${content}" != "${value}" ]; then
+    info "${device_name}" "${target_file} set to ${value}."
+    echo ${value} > ${target_file} 2>/dev/null || hint "${device_name}: Cannot set ${value} on ${target_file}"
+  fi
+}
+
+compute_nb_threads() {
+  # Increase the number of threads if there are more devices or cores than the default value
+  [ $# -gt ${nb_threads} ] && nb_threads=$#
+  [ ${first_cores_count} -gt ${nb_threads} ] && nb_threads=${first_cores_count}
+}
+
+check_scaling_governor() {
+  driver=$(LC_ALL=C cpupower frequency-info |grep "driver:" |awk '{print $2}')
+  if [ -z "${driver}" ]; then
+    hint "Cannot detect processor scaling driver"
+    return
+  fi
+  cpupower frequency-set -g performance >/dev/null 2>&1 || fatal "Cannot set scaling processor governor"
+}
+
+check_idle_governor() {
+  filename="/sys/devices/system/cpu/cpuidle/current_governor"
+  if [ ! -f "${filename}" ]; then
+    hint "Cannot detect cpu idle governor"
+    return
+  fi
+  echo "menu" > ${filename} 2>/dev/null || fatal "Cannot set cpu idle governor to menu"
+}
+
+show_nvme() {
+  device_name=$(block_dev_name $1)
+  device_dir="/sys/block/${device_name}/device/"
+  pci_addr=$(cat ${device_dir}/address)
+  pci_dir="/sys/bus/pci/devices/${pci_addr}/"
+  link_speed=$(cat ${pci_dir}/current_link_speed)
+  irq=$(cat ${pci_dir}/irq)
+  numa=$(cat ${pci_dir}/numa_node)
+  cpus=$(cat ${pci_dir}/local_cpulist)
+  model=$(cat ${device_dir}/model | xargs) #xargs for trimming spaces
+  fw=$(cat ${device_dir}/firmware_rev | xargs) #xargs for trimming spaces
+  serial=$(cat ${device_dir}/serial | xargs) #xargs for trimming spaces
+  info ${device_name} "MODEL=${model} FW=${fw} serial=${serial} PCI=${pci_addr}@${link_speed} IRQ=${irq} NUMA=${numa} CPUS=${cpus} "
+}
+
+show_device() {
+  device_name=$(block_dev_name $1)
+  is_nvme $1 && show_nvme $1
+}
+
+show_system() {
+CPU_MODEL=$(grep -m1 "model name" /proc/cpuinfo | awk '{print substr($0, index($0,$4))}')
+MEMORY_SPEED=$(dmidecode -t 17 -q |grep -m 1 "Configured Memory Speed: " | awk '{print substr($0, index($0,$4))}')
+KERNEL=$(uname -r)
+info "system" "CPU: ${CPU_MODEL}"
+info "system" "MEMORY: ${MEMORY_SPEED}"
+info "system" "KERNEL: ${KERNEL}"
+}
+
+### MAIN
+check_args $#
+check_root
+check_binary t/io_uring lscpu grep taskset cpupower awk tr xargs dmidecode
+detect_first_core
+
+info "##################################################"
+show_system
+for drive in ${args}; do
+  check_drive_exists ${drive}
+  check_io_scheduler ${drive}
+  check_sysblock_value ${drive} "queue/iostats" 0 # Ensure iostats are disabled
+  check_sysblock_value ${drive} "queue/nomerges" 2 # Ensure merges are disabled
+  check_sysblock_value ${drive} "queue/io_poll" 1 # Ensure io_poll is enabled
+  show_device ${drive}
+done
+
+check_poll_queue
+compute_nb_threads ${args}
+check_scaling_governor
+check_idle_governor
+
+info "##################################################"
+echo
+
+cmdline="taskset -c ${taskset_cores} t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n${nb_threads} ${args}"
+info "io_uring" "Running ${cmdline}"
+${cmdline}

* Recent changes (master)
@ 2021-09-17 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-17 12:00 UTC
  To: fio

The following changes since commit 2686fc2279c0e1272a48657dc62c16059a672da9:

  t/io_uring: add switch -O for O_DIRECT vs buffered (2021-09-15 06:51:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f3057d268d98aeb638b659a284d087fb0c2f654c:

  t/io_uring: fix bandwidth calculation (2021-09-16 11:41:06 -0600)

----------------------------------------------------------------
Erwan Velu (1):
      t/io_uring: Reporting bandwidth

Jens Axboe (2):
      Merge branch 'bwps' of https://github.com/ErwanAliasr1/fio
      t/io_uring: fix bandwidth calculation

 t/io_uring.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 0acbf0b4..1adb8789 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -727,6 +727,7 @@ int main(int argc, char *argv[])
 		unsigned long this_reap = 0;
 		unsigned long this_call = 0;
 		unsigned long rpc = 0, ipc = 0;
+		unsigned long iops, bw;
 
 		sleep(1);
 		for (j = 0; j < nthreads; j++) {
@@ -740,8 +741,13 @@ int main(int argc, char *argv[])
 		} else
 			rpc = ipc = -1;
 		file_depths(fdepths);
-		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=(%s)\n",
-				this_done - done, rpc, ipc, fdepths);
+		iops = this_done - done;
+		if (bs > 1048576)
+			bw = iops * (bs / 1048576);
+		else
+			bw = iops / (1048576 / bs);
+		printf("IOPS=%lu, BW=%luMiB/s, IOS/call=%ld/%ld, inflight=(%s)\n",
+				iops, bw, rpc, ipc, fdepths);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;

* Recent changes (master)
@ 2021-09-16 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-16 12:00 UTC
  To: fio

The following changes since commit d5c3be105af97c71bc2095ffd19343e4217abcd7:

  zbd: remove dead zone retrieval call (2021-09-13 14:09:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2686fc2279c0e1272a48657dc62c16059a672da9:

  t/io_uring: add switch -O for O_DIRECT vs buffered (2021-09-15 06:51:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: add switch -O for O_DIRECT vs buffered

 t/io_uring.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index aed6fdbd..0acbf0b4 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -554,9 +554,10 @@ static void usage(char *argv, int status)
 		" -B <bool> : Fixed buffers, default %d\n"
 		" -F <bool> : Register files, default %d\n"
 		" -n <int>  : Number of threads, default %d\n"
+		" -O <bool> : Use O_DIRECT, default %d\n"
 		" -N <bool> : Perform just no-op requests, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads, do_nop);
+		fixedbufs, register_files, nthreads, !buffered, do_nop);
 	exit(status);
 }
 
@@ -572,7 +573,7 @@ int main(int argc, char *argv[])
 	if (!do_nop && argc < 2)
 		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:O:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
@@ -605,6 +606,9 @@ int main(int argc, char *argv[])
 		case 'N':
 			do_nop = !!atoi(optarg);
 			break;
+		case 'O':
+			buffered = !atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:

* Recent changes (master)
@ 2021-09-14 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-14 12:00 UTC
  To: fio

The following changes since commit 25425cb4a5531b1b3f26eba4e49866d944e0f1fb:

  Merge branch 'ft' of https://github.com/ErwanAliasr1/fio (2021-09-08 15:40:47 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d5c3be105af97c71bc2095ffd19343e4217abcd7:

  zbd: remove dead zone retrieval call (2021-09-13 14:09:01 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      t/io_uring: don't require a file for do_nop runs
      t/io_uring: add -N option for do_nop
      zbd: remove dead zone retrieval call

 t/io_uring.c | 38 +++++++++++++++++++++-----------------
 zbd.c        |  1 -
 2 files changed, 21 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index c9ca3e9d..aed6fdbd 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -553,9 +553,10 @@ static void usage(char *argv, int status)
 		" -p <bool> : Polled IO, default %d\n"
 		" -B <bool> : Fixed buffers, default %d\n"
 		" -F <bool> : Register files, default %d\n"
-		" -n <int>  : Number of threads, default %d\n",
+		" -n <int>  : Number of threads, default %d\n"
+		" -N <bool> : Perform just no-op requests, default %d\n",
 		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
-		fixedbufs, register_files, nthreads);
+		fixedbufs, register_files, nthreads, do_nop);
 	exit(status);
 }
 
@@ -568,12 +569,10 @@ int main(int argc, char *argv[])
 	char *fdepths;
 	void *ret;
 
-	if (!do_nop && argc < 2) {
-		printf("%s: filename [options]\n", argv[0]);
-		return 1;
-	}
+	if (!do_nop && argc < 2)
+		usage(argv[0], 1);
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:N:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
@@ -603,6 +602,9 @@ int main(int argc, char *argv[])
 				usage(argv[0], 1);
 			}
 			break;
+		case 'N':
+			do_nop = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -626,16 +628,18 @@ int main(int argc, char *argv[])
 	j = 0;
 	i = optind;
 	nfiles = argc - i;
-	if (!nfiles) {
-		printf("No files specified\n");
-		usage(argv[0], 1);
-	}
-	threads_per_f = nthreads / nfiles;
-	/* make sure each thread gets assigned files */
-	if (threads_per_f == 0) {
-		threads_per_f = 1;
-	} else {
-		threads_rem = nthreads - threads_per_f * nfiles;
+	if (!do_nop) {
+		if (!nfiles) {
+			printf("No files specified\n");
+			usage(argv[0], 1);
+		}
+		threads_per_f = nthreads / nfiles;
+		/* make sure each thread gets assigned files */
+		if (threads_per_f == 0) {
+			threads_per_f = 1;
+		} else {
+			threads_rem = nthreads - threads_per_f * nfiles;
+		}
 	}
 	while (!do_nop && i < argc) {
 		int k, limit;
diff --git a/zbd.c b/zbd.c
index dd1abc58..64415d2b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1857,7 +1857,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				       f->file_name);
 				goto eof;
 			}
-			zbd_zone_nr(f, zb);
 		}
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {

* Recent changes (master)
@ 2021-09-09 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-09 12:00 UTC
  To: fio

The following changes since commit f7942acdc23a4ee837ef30834e1d2cb1fc6d0afe:

  options: Add thinktime_iotime option (2021-09-05 21:11:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 25425cb4a5531b1b3f26eba4e49866d944e0f1fb:

  Merge branch 'ft' of https://github.com/ErwanAliasr1/fio (2021-09-08 15:40:47 -0600)

----------------------------------------------------------------
Andrzej Jakowski (2):
      t/io_uring: fixes in output
      t/io_uring: allow flexible IO threads assignment

Erwan Velu (7):
      engines/sg: Return error if generic_close_file fails
      filesetup: Removing unused variable usage
      stat: Avoid freeing null pointer
      engines/sg: Removing useless variable assignment
      lib/fls.h: Remove unused variable assignment
      zbd: Removing useless variable assignment
      log: Removing useless assignment

Jens Axboe (5):
      t/io_uring: don't make setrlimit() failing fatal
      Fio 3.28
      t/io_uring: ensure that nthreads is > 0
      README: add link to new lore archive
      Merge branch 'ft' of https://github.com/ErwanAliasr1/fio

 FIO-VERSION-GEN |  2 +-
 README          |  4 +++
 engines/sg.c    |  7 ++---
 filesetup.c     |  1 -
 lib/fls.h       |  1 -
 log.c           |  2 +-
 stat.c          |  3 +-
 t/io_uring.c    | 96 +++++++++++++++++++++++++++++++++++----------------------
 zbd.c           |  2 +-
 9 files changed, 71 insertions(+), 47 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 47af94e9..e9d563c1 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.27
+DEF_VER=fio-3.28
 
 LF='
 '
diff --git a/README b/README
index 2fecf0e0..52eca5c3 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ in the body of the email. Archives can be found here:
 
 	http://www.spinics.net/lists/fio/
 
+or here:
+
+	https://lore.kernel.org/fio/
+
 and archives for the old list can be found here:
 
 	http://maillist.kernel.dk/fio-devel/
diff --git a/engines/sg.c b/engines/sg.c
index 0c2d2c8b..1c019384 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -471,10 +471,9 @@ static enum fio_q_status fio_sgio_rw_doio(struct thread_data *td,
 			if (__io_u == io_u)
 				break;
 
-			if (io_u_sync_complete(td, __io_u)) {
-				ret = -1;
+			if (io_u_sync_complete(td, __io_u))
 				break;
-			}
+
 		} while (1);
 
 		return FIO_Q_COMPLETED;
@@ -982,7 +981,7 @@ static int fio_sgio_open(struct thread_data *td, struct fio_file *f)
 
 	if (sd && !sd->type_checked && fio_sgio_type_check(td, f)) {
 		ret = generic_close_file(td, f);
-		return 1;
+		return ret;
 	}
 
 	return 0;
diff --git a/filesetup.c b/filesetup.c
index 296de5a1..228e4fff 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1024,7 +1024,6 @@ int longest_existing_path(char *path) {
 	while (!done) {
 		buf_pos = strrchr(buf, FIO_OS_PATH_SEPARATOR);
 		if (!buf_pos) {
-			done = true;
 			offset = 0;
 			break;
 		}
diff --git a/lib/fls.h b/lib/fls.h
index dc7ecd0d..99e1862a 100644
--- a/lib/fls.h
+++ b/lib/fls.h
@@ -32,7 +32,6 @@ static inline int __fls(int x)
 		r -= 2;
 	}
 	if (!(x & 0x80000000u)) {
-		x <<= 1;
 		r -= 1;
 	}
 	return r;
diff --git a/log.c b/log.c
index 562a29aa..237bac28 100644
--- a/log.c
+++ b/log.c
@@ -62,7 +62,7 @@ void log_prevalist(int type, const char *fmt, va_list args)
 	free(buf1);
 	if (len < 0)
 		return;
-	len = log_info_buf(buf2, len);
+	log_info_buf(buf2, len);
 	free(buf2);
 }
 #endif
diff --git a/stat.c b/stat.c
index 99275620..ac53463d 100644
--- a/stat.c
+++ b/stat.c
@@ -211,7 +211,7 @@ static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 
 	len = calc_clat_percentiles(io_u_plat, nr, plist, &ovals, &maxv, &minv);
 	if (!len || !ovals)
-		goto out;
+		return;
 
 	/*
 	 * We default to nsecs, but if the value range is such that we
@@ -258,7 +258,6 @@ static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 			log_buf(out, "\n");
 	}
 
-out:
 	free(ovals);
 }
 
diff --git a/t/io_uring.c b/t/io_uring.c
index 3130e469..c9ca3e9d 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -468,6 +468,13 @@ static int setup_ring(struct submitter *s)
 	io_uring_probe(fd);
 
 	if (fixedbufs) {
+		struct rlimit rlim;
+
+		rlim.rlim_cur = RLIM_INFINITY;
+		rlim.rlim_max = RLIM_INFINITY;
+		/* ignore potential error, not needed on newer kernels */
+		setrlimit(RLIMIT_MEMLOCK, &rlim);
+
 		ret = io_uring_register_buffers(s);
 		if (ret < 0) {
 			perror("io_uring_register_buffers");
@@ -536,23 +543,28 @@ static void file_depths(char *buf)
 	}
 }
 
-static void usage(char *argv)
+static void usage(char *argv, int status)
 {
 	printf("%s [options] -- [filenames]\n"
-		" -d <int> : IO Depth, default %d\n"
-		" -s <int> : Batch submit, default %d\n"
-		" -c <int> : Batch complete, default %d\n"
-		" -b <int> : Block size, default %d\n"
-		" -p <bool> : Polled IO, default %d\n",
-		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled);
-	exit(0);
+		" -d <int>  : IO Depth, default %d\n"
+		" -s <int>  : Batch submit, default %d\n"
+		" -c <int>  : Batch complete, default %d\n"
+		" -b <int>  : Block size, default %d\n"
+		" -p <bool> : Polled IO, default %d\n"
+		" -B <bool> : Fixed buffers, default %d\n"
+		" -F <bool> : Register files, default %d\n"
+		" -n <int>  : Number of threads, default %d\n",
+		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled,
+		fixedbufs, register_files, nthreads);
+	exit(status);
 }
 
 int main(int argc, char *argv[])
 {
 	struct submitter *s;
 	unsigned long done, calls, reap;
-	int err, i, j, flags, fd, opt;
+	int err, i, j, flags, fd, opt, threads_per_f, threads_rem = 0, nfiles;
+	struct file f;
 	char *fdepths;
 	void *ret;
 
@@ -586,11 +598,15 @@ int main(int argc, char *argv[])
 			break;
 		case 'n':
 			nthreads = atoi(optarg);
+			if (!nthreads) {
+				printf("Threads must be non-zero\n");
+				usage(argv[0], 1);
+			}
 			break;
 		case 'h':
 		case '?':
 		default:
-			usage(argv[0]);
+			usage(argv[0], 0);
 			break;
 		}
 	}
@@ -609,49 +625,57 @@ int main(int argc, char *argv[])
 
 	j = 0;
 	i = optind;
-	printf("i %d, argc %d\n", i, argc);
+	nfiles = argc - i;
+	if (!nfiles) {
+		printf("No files specified\n");
+		usage(argv[0], 1);
+	}
+	threads_per_f = nthreads / nfiles;
+	/* make sure each thread gets assigned files */
+	if (threads_per_f == 0) {
+		threads_per_f = 1;
+	} else {
+		threads_rem = nthreads - threads_per_f * nfiles;
+	}
 	while (!do_nop && i < argc) {
-		struct file *f;
+		int k, limit;
+
+		memset(&f, 0, sizeof(f));
 
-		s = get_submitter(j);
-		if (s->nr_files == MAX_FDS) {
-			printf("Max number of files (%d) reached\n", MAX_FDS);
-			break;
-		}
 		fd = open(argv[i], flags);
 		if (fd < 0) {
 			perror("open");
 			return 1;
 		}
-
-		f = &s->files[s->nr_files];
-		f->real_fd = fd;
-		if (get_file_size(f)) {
+		f.real_fd = fd;
+		if (get_file_size(&f)) {
 			printf("failed getting size of device/file\n");
 			return 1;
 		}
-		if (f->max_blocks <= 1) {
+		if (f.max_blocks <= 1) {
 			printf("Zero file/device size?\n");
 			return 1;
 		}
-		f->max_blocks--;
+		f.max_blocks--;
 
-		printf("Added file %s (submitter %d)\n", argv[i], s->index);
-		s->nr_files++;
-		i++;
-		if (++j >= nthreads)
-			j = 0;
-	}
+		limit = threads_per_f;
+		limit += threads_rem > 0 ? 1 : 0;
+		for (k = 0; k < limit; k++) {
+			s = get_submitter((j + k) % nthreads);
 
-	if (fixedbufs) {
-		struct rlimit rlim;
+			if (s->nr_files == MAX_FDS) {
+				printf("Max number of files (%d) reached\n", MAX_FDS);
+				break;
+			}
 
-		rlim.rlim_cur = RLIM_INFINITY;
-		rlim.rlim_max = RLIM_INFINITY;
-		if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
-			perror("setrlimit");
-			return 1;
+			memcpy(&s->files[s->nr_files], &f, sizeof(f));
+
+			printf("Added file %s (submitter %d)\n", argv[i], s->index);
+			s->nr_files++;
 		}
+		threads_rem--;
+		i++;
+		j += limit;
 	}
 
 	arm_sig_int();
diff --git a/zbd.c b/zbd.c
index 1b933ce4..dd1abc58 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1857,7 +1857,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				       f->file_name);
 				goto eof;
 			}
-			zone_idx_b = zbd_zone_nr(f, zb);
+			zbd_zone_nr(f, zb);
 		}
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {

* Recent changes (master)
@ 2021-09-06 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-06 12:00 UTC
  To: fio

The following changes since commit 63176c21beb68ec54787eb2fd6be5b3c9132113b:

  examples: add examples for cmdprio_* IO priority options (2021-09-03 10:12:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f7942acdc23a4ee837ef30834e1d2cb1fc6d0afe:

  options: Add thinktime_iotime option (2021-09-05 21:11:25 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      options: Add thinktime_iotime option

 HOWTO            | 14 +++++++++++++-
 backend.c        | 45 +++++++++++++++++++++++++++++++++++++++------
 cconv.c          |  2 ++
 fio.1            | 13 ++++++++++++-
 fio.h            |  2 ++
 options.c        | 14 ++++++++++++++
 server.h         |  2 +-
 thread_options.h | 21 ++++++++++++---------
 8 files changed, 95 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 1853f56a..297a0485 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2745,7 +2745,7 @@ I/O rate
 	Stall the job for the specified period of time after an I/O has completed before issuing the
 	next. May be used to simulate processing being done by an application.
 	When the unit is omitted, the value is interpreted in microseconds.  See
-	:option:`thinktime_blocks` and :option:`thinktime_spin`.
+	:option:`thinktime_blocks`, :option:`thinktime_iotime` and :option:`thinktime_spin`.
 
 .. option:: thinktime_spin=time
 
@@ -2770,6 +2770,18 @@ I/O rate
 	:option:`thinktime_blocks` blocks. If this is set to `issue`, then the trigger happens
 	at the issue side.
 
+.. option:: thinktime_iotime=time
+
+	Only valid if :option:`thinktime` is set - control :option:`thinktime`
+	interval by time. The :option:`thinktime` stall is repeated after IOs
+	are executed for :option:`thinktime_iotime`. For example,
+	``--thinktime_iotime=9s --thinktime=1s`` repeats a 10-second cycle of
+	9 seconds of IO followed by a 1-second stall. When the unit is omitted,
+	:option:`thinktime_iotime` is interpreted as a number of seconds. If
+	this option is used together with :option:`thinktime_blocks`, the
+	:option:`thinktime` stall is repeated after :option:`thinktime_iotime`
+	or after :option:`thinktime_blocks` IOs, whichever happens first.
+
 .. option:: rate=int[,int][,int]
 
 	Cap the bandwidth used by this job. The number is in bytes/sec, the normal
diff --git a/backend.c b/backend.c
index 1bcb035a..86fa6d41 100644
--- a/backend.c
+++ b/backend.c
@@ -858,15 +858,47 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 	return 0;
 }
 
+static void init_thinktime(struct thread_data *td)
+{
+	if (td->o.thinktime_blocks_type == THINKTIME_BLOCKS_TYPE_COMPLETE)
+		td->thinktime_blocks_counter = td->io_blocks;
+	else
+		td->thinktime_blocks_counter = td->io_issues;
+	td->last_thinktime = td->epoch;
+	td->last_thinktime_blocks = 0;
+}
+
 static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 			     struct timespec *time)
 {
 	unsigned long long b;
 	uint64_t total;
 	int left;
+	struct timespec now;
+	bool stall = false;
+
+	if (td->o.thinktime_iotime) {
+		fio_gettime(&now, NULL);
+		if (utime_since(&td->last_thinktime, &now)
+		    >= td->o.thinktime_iotime + td->o.thinktime) {
+			stall = true;
+		} else if (!fio_option_is_set(&td->o, thinktime_blocks)) {
+			/*
+			 * When thinktime_iotime is set and thinktime_blocks is
+			 * not set, skip the thinktime_blocks check, since
+			 * thinktime_blocks default value 1 does not work
+			 * together with thinktime_iotime.
+			 */
+			return;
+		}
+
+	}
 
 	b = ddir_rw_sum(td->thinktime_blocks_counter);
-	if (b % td->o.thinktime_blocks || !b)
+	if (b >= td->last_thinktime_blocks + td->o.thinktime_blocks)
+		stall = true;
+
+	if (!stall)
 		return;
 
 	io_u_quiesce(td);
@@ -902,6 +934,10 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
 
 	if (time && should_check_rate(td))
 		fio_gettime(time, NULL);
+
+	td->last_thinktime_blocks = b;
+	if (td->o.thinktime_iotime)
+		td->last_thinktime = now;
 }
 
 /*
@@ -1792,17 +1828,14 @@ static void *thread_main(void *data)
 	if (rate_submit_init(td, sk_out))
 		goto err;
 
-	if (td->o.thinktime_blocks_type == THINKTIME_BLOCKS_TYPE_COMPLETE)
-		td->thinktime_blocks_counter = td->io_blocks;
-	else
-		td->thinktime_blocks_counter = td->io_issues;
-
 	set_epoch_time(td, o->log_unix_epoch);
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->ss.prev_time, &td->epoch, sizeof(td->epoch));
 
+	init_thinktime(td);
+
 	if (o->ratemin[DDIR_READ] || o->ratemin[DDIR_WRITE] ||
 			o->ratemin[DDIR_TRIM]) {
 	        memcpy(&td->lastrate[DDIR_READ], &td->bw_sample_time,
diff --git a/cconv.c b/cconv.c
index 2dc5274e..2104308c 100644
--- a/cconv.c
+++ b/cconv.c
@@ -214,6 +214,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->thinktime_spin = le32_to_cpu(top->thinktime_spin);
 	o->thinktime_blocks = le32_to_cpu(top->thinktime_blocks);
 	o->thinktime_blocks_type = le32_to_cpu(top->thinktime_blocks_type);
+	o->thinktime_iotime = le32_to_cpu(top->thinktime_iotime);
 	o->fsync_blocks = le32_to_cpu(top->fsync_blocks);
 	o->fdatasync_blocks = le32_to_cpu(top->fdatasync_blocks);
 	o->barrier_blocks = le32_to_cpu(top->barrier_blocks);
@@ -440,6 +441,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->thinktime_spin = cpu_to_le32(o->thinktime_spin);
 	top->thinktime_blocks = cpu_to_le32(o->thinktime_blocks);
 	top->thinktime_blocks_type = __cpu_to_le32(o->thinktime_blocks_type);
+	top->thinktime_iotime = __cpu_to_le32(o->thinktime_iotime);
 	top->fsync_blocks = cpu_to_le32(o->fsync_blocks);
 	top->fdatasync_blocks = cpu_to_le32(o->fdatasync_blocks);
 	top->barrier_blocks = cpu_to_le32(o->barrier_blocks);
diff --git a/fio.1 b/fio.1
index 03fddffb..78988c9e 100644
--- a/fio.1
+++ b/fio.1
@@ -2499,7 +2499,7 @@ problem). Note that this option cannot reliably be used with async IO engines.
 Stall the job for the specified period of time after an I/O has completed before issuing the
 next. May be used to simulate processing being done by an application.
 When the unit is omitted, the value is interpreted in microseconds. See
-\fBthinktime_blocks\fR and \fBthinktime_spin\fR.
+\fBthinktime_blocks\fR, \fBthinktime_iotime\fR and \fBthinktime_spin\fR.
 .TP
 .BI thinktime_spin \fR=\fPtime
 Only valid if \fBthinktime\fR is set - pretend to spend CPU time doing
@@ -2520,6 +2520,17 @@ Only valid if \fBthinktime\fR is set - control how \fBthinktime_blocks\fR trigge
 The default is `complete', which triggers \fBthinktime\fR when fio completes
 \fBthinktime_blocks\fR blocks. If this is set to `issue', then the trigger happens
 at the issue side.
+.TP
+.BI thinktime_iotime \fR=\fPtime
+Only valid if \fBthinktime\fR is set - control \fBthinktime\fR interval by time.
+The \fBthinktime\fR stall is repeated after IOs are executed for
+\fBthinktime_iotime\fR. For example, `\-\-thinktime_iotime=9s \-\-thinktime=1s'
+repeats a 10-second cycle with IOs for 9 seconds and a stall for 1 second. When the
+unit is omitted, \fBthinktime_iotime\fR is interpreted as a number of seconds.
+If this option is used together with \fBthinktime_blocks\fR, the \fBthinktime\fR
+stall is repeated after \fBthinktime_iotime\fR or after \fBthinktime_blocks\fR
+IOs, whichever happens first.
+
 .TP
 .BI rate \fR=\fPint[,int][,int]
 Cap the bandwidth used by this job. The number is in bytes/sec, the normal
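The trigger rule described above (stall after `thinktime_iotime` of I/O or after `thinktime_blocks` I/Os, whichever comes first) can be sketched as a small predicate. This is a simplified model, not fio's code: the helper name `should_stall` and its parameters are hypothetical, times are in microseconds, and a value of 0 is assumed to disable the corresponding trigger. The patch itself keeps this state in `td->last_thinktime` and `td->last_thinktime_blocks`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fio): decide whether the job should
 * enter its thinktime stall. A thinktime_blocks or thinktime_iotime_us
 * value of 0 is assumed to disable that trigger in this model.
 */
static bool should_stall(uint64_t blocks_done, uint64_t last_stall_blocks,
			 uint64_t thinktime_blocks,
			 uint64_t now_us, uint64_t last_stall_us,
			 uint64_t thinktime_iotime_us)
{
	/* Block-count trigger: thinktime_blocks I/Os since the last stall. */
	if (thinktime_blocks &&
	    blocks_done - last_stall_blocks >= thinktime_blocks)
		return true;

	/* Time trigger: thinktime_iotime worth of I/O since the last stall. */
	if (thinktime_iotime_us &&
	    now_us - last_stall_us >= thinktime_iotime_us)
		return true;

	/* Neither limit reached yet: keep issuing I/O. */
	return false;
}
```

With `--thinktime_iotime=9s --thinktime=1s` and no block trigger, `should_stall()` in this model returns true once 9000000 us have elapsed since the last stall, which gives the 10-second cycle described in the manpage text.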
diff --git a/fio.h b/fio.h
index da1fe085..6bb21ebb 100644
--- a/fio.h
+++ b/fio.h
@@ -370,6 +370,8 @@ struct thread_data {
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 
 	uint64_t *thinktime_blocks_counter;
+	struct timespec last_thinktime;
+	uint64_t last_thinktime_blocks;
 
 	/*
 	 * State for random io, a bitmap of blocks done vs not done
diff --git a/options.c b/options.c
index 74ac1f3f..460cf4ff 100644
--- a/options.c
+++ b/options.c
@@ -3680,6 +3680,20 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		},
 		.parent = "thinktime",
 	},
+	{
+		.name	= "thinktime_iotime",
+		.lname	= "Thinktime interval",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, thinktime_iotime),
+		.help	= "IO time interval between 'thinktime'",
+		.def	= "0",
+		.parent	= "thinktime",
+		.hide	= 1,
+		.is_seconds = 1,
+		.is_time = 1,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_THINKTIME,
+	},
 	{
 		.name	= "rate",
 		.lname	= "I/O rate",
diff --git a/server.h b/server.h
index 3ff32d9a..44b8da12 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 93,
+	FIO_SERVER_VER			= 94,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 9990ab9b..6e1a2cdd 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -191,10 +191,6 @@ struct thread_options {
 
 	unsigned int hugepage_size;
 	unsigned long long rw_min_bs;
-	unsigned int thinktime;
-	unsigned int thinktime_spin;
-	unsigned int thinktime_blocks;
-	unsigned int thinktime_blocks_type;
 	unsigned int fsync_blocks;
 	unsigned int fdatasync_blocks;
 	unsigned int barrier_blocks;
@@ -303,6 +299,12 @@ struct thread_options {
 	char *exec_prerun;
 	char *exec_postrun;
 
+	unsigned int thinktime;
+	unsigned int thinktime_spin;
+	unsigned int thinktime_blocks;
+	unsigned int thinktime_blocks_type;
+	unsigned int thinktime_iotime;
+
 	uint64_t rate[DDIR_RWDIR_CNT];
 	uint64_t ratemin[DDIR_RWDIR_CNT];
 	unsigned int ratecycle;
@@ -504,10 +506,6 @@ struct thread_options_pack {
 
 	uint32_t hugepage_size;
 	uint64_t rw_min_bs;
-	uint32_t thinktime;
-	uint32_t thinktime_spin;
-	uint32_t thinktime_blocks;
-	uint32_t thinktime_blocks_type;
 	uint32_t fsync_blocks;
 	uint32_t fdatasync_blocks;
 	uint32_t barrier_blocks;
@@ -612,6 +610,12 @@ struct thread_options_pack {
 	uint8_t exec_prerun[FIO_TOP_STR_MAX];
 	uint8_t exec_postrun[FIO_TOP_STR_MAX];
 
+	uint32_t thinktime;
+	uint32_t thinktime_spin;
+	uint32_t thinktime_blocks;
+	uint32_t thinktime_blocks_type;
+	uint32_t thinktime_iotime;
+
 	uint64_t rate[DDIR_RWDIR_CNT];
 	uint64_t ratemin[DDIR_RWDIR_CNT];
 	uint32_t ratecycle;
@@ -651,7 +655,6 @@ struct thread_options_pack {
 	uint64_t latency_target;
 	uint64_t latency_window;
 	uint64_t max_latency[DDIR_RWDIR_CNT];
-	uint32_t pad5;
 	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
 

* Recent changes (master)
@ 2021-09-04 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-04 12:00 UTC
  To: fio

The following changes since commit f3463241727215e228a60dc3b9a1ba2996f149a1:

  oslib: Fix blkzoned_get_max_open_zones() (2021-09-02 20:56:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63176c21beb68ec54787eb2fd6be5b3c9132113b:

  examples: add examples for cmdprio_* IO priority options (2021-09-03 10:12:25 -0600)

----------------------------------------------------------------
Damien Le Moal (11):
      manpage: fix formatting
      manpage: fix definition of prio and prioclass options
      tools: fiograph: do not overwrite input script file
      os: introduce ioprio_value() helper
      options: make parsing functions available to ioengines
      libaio,io_uring: improve cmdprio_percentage option
      libaio,io_uring: introduce cmdprio_class and cmdprio options
      libaio,io_uring: introduce cmdprio_bssplit
      libaio,io_uring: relax cmdprio_percentage constraints
      fio: Introduce the log_prio option
      examples: add examples for cmdprio_* IO priority options

 HOWTO                           |  59 ++++++++++++----
 backend.c                       |   1 +
 cconv.c                         |   2 +
 client.c                        |   2 +
 engines/cmdprio.h               | 144 +++++++++++++++++++++++++++++++++++++
 engines/filecreate.c            |   2 +-
 engines/filedelete.c            |   2 +-
 engines/filestat.c              |   2 +-
 engines/io_uring.c              | 152 ++++++++++++++++++++++++++++++++--------
 engines/libaio.c                | 125 ++++++++++++++++++++++++++++-----
 eta.c                           |   2 +-
 examples/cmdprio-bssplit.fio    |  17 +++++
 examples/cmdprio-bssplit.png    | Bin 0 -> 45606 bytes
 examples/cmdprio-percentage.fio |  17 +++++
 examples/cmdprio-percentage.png | Bin 0 -> 46271 bytes
 fio.1                           |  73 ++++++++++++++-----
 fio.h                           |   5 ++
 init.c                          |   4 ++
 io_u.c                          |  14 ++--
 io_u.h                          |  10 ++-
 iolog.c                         |  45 +++++++++---
 iolog.h                         |  16 ++++-
 options.c                       |  50 ++++++-------
 os/os-android.h                 |  20 ++++--
 os/os-dragonfly.h               |   1 +
 os/os-linux.h                   |  20 ++++--
 os/os.h                         |   4 ++
 server.h                        |   3 +-
 stat.c                          |  75 ++++++++++----------
 stat.h                          |   9 ++-
 thread_options.h                |  19 +++++
 tools/fiograph/fiograph.conf    |   4 +-
 tools/fiograph/fiograph.py      |   4 +-
 33 files changed, 724 insertions(+), 179 deletions(-)
 create mode 100644 engines/cmdprio.h
 create mode 100644 examples/cmdprio-bssplit.fio
 create mode 100644 examples/cmdprio-bssplit.png
 create mode 100644 examples/cmdprio-percentage.fio
 create mode 100644 examples/cmdprio-percentage.png

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a2cf20f6..1853f56a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2163,14 +2163,49 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 :option:`ioengine` that defines them is selected.
 
-.. option:: cmdprio_percentage=int : [io_uring] [libaio]
-
-    Set the percentage of I/O that will be issued with higher priority by setting
-    the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
-    This option cannot be used with the `prio` or `prioclass` options. For this
-    option to set the priority bit properly, NCQ priority must be supported and
-    enabled and :option:`direct`\=1 option must be used. fio must also be run as
-    the root user.
+.. option:: cmdprio_percentage=int[,int] : [io_uring] [libaio]
+
+    Set the percentage of I/O that will be issued with the highest priority.
+    Default: 0. A single value applies to reads and writes. Comma-separated
+    values may be specified for reads and writes. This option cannot be used
+    with the :option:`prio` or :option:`prioclass` options. For this option
+    to be effective, NCQ priority must be supported and enabled, and :option:`direct`\=1
+    option must be used. fio must also be run as the root user.
+
+.. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
+
+	Set the I/O priority class to use for I/Os that must be issued with
+	a priority when :option:`cmdprio_percentage` or
+	:option:`cmdprio_bssplit` is set. If not specified when
+	:option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+	this defaults to the highest priority class. A single value applies
+	to reads and writes. Comma-separated values may be specified for
+	reads and writes. See :manpage:`ionice(1)`. See also the
+	:option:`prioclass` option.
+
+.. option:: cmdprio=int[,int] : [io_uring] [libaio]
+
+	Set the I/O priority value to use for I/Os that must be issued with
+	a priority when :option:`cmdprio_percentage` or
+	:option:`cmdprio_bssplit` is set. If not specified when
+	:option:`cmdprio_percentage` or :option:`cmdprio_bssplit` is set,
+	this defaults to 0.
+	Linux limits us to a positive value between 0 and 7, with 0 being the
+	highest. A single value applies to reads and writes. Comma-separated
+	values may be specified for reads and writes. See :manpage:`ionice(1)`.
+	Refer to an appropriate manpage for other operating systems since the
+	meaning of priority may differ. See also the :option:`prio` option.
+
+.. option:: cmdprio_bssplit=str[,str] : [io_uring] [libaio]
+
+	To get finer control over I/O priority, this option allows
+	specifying the percentage of IOs that must have a priority set
+	depending on the block size of the IO. This option is useful only
+	when used together with the :option:`bssplit` option, that is,
+	multiple different block sizes are used for reads and writes.
+	The format for this option is the same as the format of the
+	:option:`bssplit` option, with the exception that values for
+	trim IOs are ignored. This option is mutually exclusive with the
+	:option:`cmdprio_percentage` option.
 
 .. option:: fixedbufs : [io_uring]
 
@@ -2974,14 +3009,14 @@ Threads, processes and job synchronization
 	between 0 and 7, with 0 being the highest.  See man
 	:manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
 	systems since meaning of priority may differ. For per-command priority
-	setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
-	options.
+	setting, see I/O engine specific :option:`cmdprio_percentage` and
+	:option:`cmdprio` options.
 
 .. option:: prioclass=int
 
 	Set the I/O priority class. See man :manpage:`ionice(1)`. For per-command
-	priority setting, see I/O engine specific `cmdprio_percentage` and
-	`hipri_percentage` options.
+	priority setting, see I/O engine specific :option:`cmdprio_percentage`
+	and :option:`cmdprio_class` options.
 
 .. option:: cpus_allowed=str
 
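The `cmdprio_class` and `cmdprio` values are combined into a single Linux ioprio value, in the same way as the `ioprio_value()` helper introduced by this series: the class is shifted left by `IOPRIO_CLASS_SHIFT` (13 on Linux, matching the `td->o.ioprio_class << 13` removed from io_uring.c) and ORed with the level. The sketch below uses hypothetical `SKETCH_` names so that it does not depend on fio headers; the class numbers follow ionice(1) (1 = realtime, 2 = best-effort, 3 = idle).

```c
/* Linux I/O priority classes as used by ionice(1); names are hypothetical. */
enum {
	SKETCH_IOPRIO_CLASS_NONE = 0,
	SKETCH_IOPRIO_CLASS_RT   = 1,
	SKETCH_IOPRIO_CLASS_BE   = 2,
	SKETCH_IOPRIO_CLASS_IDLE = 3,
};

/* The class occupies the bits above the shift; the level the bits below. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13

/* Encode a (class, level) pair the way an ioprio_value()-style helper would. */
static unsigned int sketch_ioprio_value(unsigned int ioprio_class,
					unsigned int level)
{
	return (ioprio_class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}
```

For example, `cmdprio_class=1` with the default `cmdprio=0` encodes to 8192, the same value the old code set via `IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT`. Because the class sits in the high bits, every realtime level encodes to a smaller number (higher priority) than any best-effort level, which is what the `cmdprio_value < td->ioprio` comparisons in the engines rely on.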
diff --git a/backend.c b/backend.c
index 808e4362..1bcb035a 100644
--- a/backend.c
+++ b/backend.c
@@ -1760,6 +1760,7 @@ static void *thread_main(void *data)
 			td_verror(td, errno, "ioprio_set");
 			goto err;
 		}
+		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
 	}
 
 	if (o->cgroup && cgroup_setup(td, cgroup_list, &cgroup_mnt))
diff --git a/cconv.c b/cconv.c
index e3a8c27c..2dc5274e 100644
--- a/cconv.c
+++ b/cconv.c
@@ -192,6 +192,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->log_hist_coarseness = le32_to_cpu(top->log_hist_coarseness);
 	o->log_max = le32_to_cpu(top->log_max);
 	o->log_offset = le32_to_cpu(top->log_offset);
+	o->log_prio = le32_to_cpu(top->log_prio);
 	o->log_gz = le32_to_cpu(top->log_gz);
 	o->log_gz_store = le32_to_cpu(top->log_gz_store);
 	o->log_unix_epoch = le32_to_cpu(top->log_unix_epoch);
@@ -417,6 +418,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->log_avg_msec = cpu_to_le32(o->log_avg_msec);
 	top->log_max = cpu_to_le32(o->log_max);
 	top->log_offset = cpu_to_le32(o->log_offset);
+	top->log_prio = cpu_to_le32(o->log_prio);
 	top->log_gz = cpu_to_le32(o->log_gz);
 	top->log_gz_store = cpu_to_le32(o->log_gz_store);
 	top->log_unix_epoch = cpu_to_le32(o->log_unix_epoch);
diff --git a/client.c b/client.c
index 29d8750a..8b230617 100644
--- a/client.c
+++ b/client.c
@@ -1679,6 +1679,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 	ret->log_type		= le32_to_cpu(ret->log_type);
 	ret->compressed		= le32_to_cpu(ret->compressed);
 	ret->log_offset		= le32_to_cpu(ret->log_offset);
+	ret->log_prio		= le32_to_cpu(ret->log_prio);
 	ret->log_hist_coarseness = le32_to_cpu(ret->log_hist_coarseness);
 
 	if (*store_direct)
@@ -1696,6 +1697,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 		s->data.val	= le64_to_cpu(s->data.val);
 		s->__ddir	= __le32_to_cpu(s->__ddir);
 		s->bs		= le64_to_cpu(s->bs);
+		s->priority	= le16_to_cpu(s->priority);
 
 		if (ret->log_offset) {
 			struct io_sample_offset *so = (void *) s;
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
new file mode 100644
index 00000000..0edc4365
--- /dev/null
+++ b/engines/cmdprio.h
@@ -0,0 +1,144 @@
+/*
+ * IO priority handling declarations and helper functions common to the
+ * libaio and io_uring engines.
+ */
+
+#ifndef FIO_CMDPRIO_H
+#define FIO_CMDPRIO_H
+
+#include "../fio.h"
+
+struct cmdprio {
+	unsigned int percentage[DDIR_RWDIR_CNT];
+	unsigned int class[DDIR_RWDIR_CNT];
+	unsigned int level[DDIR_RWDIR_CNT];
+	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
+	struct bssplit *bssplit[DDIR_RWDIR_CNT];
+};
+
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio *cmdprio = cb_arg;
+	struct split split;
+	unsigned int i;
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	memset(&split, 0, sizeof(split));
+
+	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+		return 1;
+	if (!split.nr)
+		return 0;
+
+	cmdprio->bssplit_nr[ddir] = split.nr;
+	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
+	if (!cmdprio->bssplit[ddir])
+		return 1;
+
+	for (i = 0; i < split.nr; i++) {
+		cmdprio->bssplit[ddir][i].bs = split.val1[i];
+		if (split.val2[i] == -1U) {
+			cmdprio->bssplit[ddir][i].perc = 0;
+		} else {
+			if (split.val2[i] > 100)
+				cmdprio->bssplit[ddir][i].perc = 100;
+			else
+				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+		}
+	}
+
+	return 0;
+}
+
+static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+				     struct cmdprio *cmdprio)
+{
+	char *str, *p;
+	int i, ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
+
+	if (parse_dryrun()) {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			free(cmdprio->bssplit[i]);
+			cmdprio->bssplit[i] = NULL;
+			cmdprio->bssplit_nr[i] = 0;
+		}
+	}
+
+	free(p);
+	return ret;
+}
+
+static inline int fio_cmdprio_percentage(struct cmdprio *cmdprio,
+					 struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = cmdprio->percentage[ddir];
+	int i;
+
+	/*
+	 * If cmdprio_percentage option was specified, then use that
+	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
+	 * on the IO size.
+	 */
+	if (p)
+		return p;
+
+	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
+		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
+			return cmdprio->bssplit[ddir][i].perc;
+	}
+
+	return 0;
+}
+
+static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+			    bool *has_cmdprio)
+{
+	struct thread_options *to = &td->o;
+	bool has_cmdprio_percentage = false;
+	bool has_cmdprio_bssplit = false;
+	int i;
+
+	/*
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
+	 */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		if (cmdprio->percentage[i]) {
+			if (!cmdprio->class[i])
+				cmdprio->class[i] = IOPRIO_CLASS_RT;
+			has_cmdprio_percentage = true;
+		}
+		if (cmdprio->bssplit_nr[i]) {
+			if (!cmdprio->class[i])
+				cmdprio->class[i] = IOPRIO_CLASS_RT;
+			has_cmdprio_bssplit = true;
+		}
+	}
+
+	/*
+	 * Check for option conflicts
+	 */
+	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
+		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
+			"are mutually exclusive\n",
+			to->name);
+		return 1;
+	}
+
+	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
+
+	return 0;
+}
+
+#endif
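`fio_cmdprio_percentage()` above resolves the priority percentage in a fixed order: a non-zero `cmdprio_percentage` wins, otherwise the `cmdprio_bssplit` entry whose block size equals the I/O length is used, and sizes with no entry get 0. A standalone sketch of that lookup, using a hypothetical `struct sketch_bssplit` in place of fio's `struct bssplit`:

```c
#include <stddef.h>

/* Hypothetical stand-in for one cmdprio_bssplit entry. */
struct sketch_bssplit {
	unsigned long long bs;	/* block size in bytes */
	unsigned int perc;	/* percent of IOs of this size given priority */
};

/*
 * Same lookup order as fio_cmdprio_percentage(): an explicit percentage
 * overrides the split; otherwise match the IO length exactly.
 */
static unsigned int sketch_cmdprio_percentage(unsigned int percentage,
					      const struct sketch_bssplit *split,
					      size_t nr,
					      unsigned long long buflen)
{
	size_t i;

	if (percentage)
		return percentage;

	for (i = 0; i < nr; i++) {
		if (split[i].bs == buflen)
			return split[i].perc;
	}

	/* No entry for this block size: no priority. */
	return 0;
}
```

Note that the match is exact, so a block size that appears in `bssplit` but not in `cmdprio_bssplit` is never issued with the cmdprio priority.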
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 16c64928..4bb13c34 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -49,7 +49,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
 	}
 
 	return 0;
diff --git a/engines/filedelete.c b/engines/filedelete.c
index 64c58639..e882ccf0 100644
--- a/engines/filedelete.c
+++ b/engines/filedelete.c
@@ -51,7 +51,7 @@ static int delete_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
 	}
 
 	return 0;
diff --git a/engines/filestat.c b/engines/filestat.c
index 405f028d..00311247 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -125,7 +125,7 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
 	}
 
 	return 0;
diff --git a/engines/io_uring.c b/engines/io_uring.c
index b8d4cf91..27a4a678 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -23,6 +23,7 @@
 
 #include "../lib/types.h"
 #include "../os/linux/io_uring.h"
+#include "cmdprio.h"
 
 struct io_sq_ring {
 	unsigned *head;
@@ -64,17 +65,17 @@ struct ioring_data {
 	int queued;
 	int cq_ring_off;
 	unsigned iodepth;
-	bool ioprio_class_set;
-	bool ioprio_set;
 	int prepped;
 
 	struct ioring_mmap mmap[3];
+
+	bool use_cmdprio;
 };
 
 struct ioring_options {
-	void *pad;
+	struct thread_data *td;
 	unsigned int hipri;
-	unsigned int cmdprio_percentage;
+	struct cmdprio cmdprio;
 	unsigned int fixedbufs;
 	unsigned int registerfiles;
 	unsigned int sqpoll_thread;
@@ -105,6 +106,15 @@ static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 	return 0;
 }
 
+static int str_cmdprio_bssplit_cb(void *data, const char *input)
+{
+	struct ioring_options *o = data;
+	struct thread_data *td = o->td;
+	struct cmdprio *cmdprio = &o->cmdprio;
+
+	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
+}
+
 static struct fio_option options[] = {
 	{
 		.name	= "hipri",
@@ -120,13 +130,56 @@ static struct fio_option options[] = {
 		.name	= "cmdprio_percentage",
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct ioring_options, cmdprio_percentage),
-		.minval	= 1,
+		.off1	= offsetof(struct ioring_options,
+				   cmdprio.percentage[DDIR_READ]),
+		.off2	= offsetof(struct ioring_options,
+				   cmdprio.percentage[DDIR_WRITE]),
+		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "cmdprio_class",
+		.lname	= "Asynchronous I/O priority class",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options,
+				   cmdprio.class[DDIR_READ]),
+		.off2	= offsetof(struct ioring_options,
+				   cmdprio.class[DDIR_WRITE]),
+		.help	= "Set asynchronous IO priority class",
+		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
+		.maxval	= IOPRIO_MAX_PRIO_CLASS,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "cmdprio",
+		.lname	= "Asynchronous I/O priority level",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options,
+				   cmdprio.level[DDIR_READ]),
+		.off2	= offsetof(struct ioring_options,
+				   cmdprio.level[DDIR_WRITE]),
+		.help	= "Set asynchronous IO priority level",
+		.minval	= IOPRIO_MIN_PRIO,
+		.maxval	= IOPRIO_MAX_PRIO,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name   = "cmdprio_bssplit",
+		.lname  = "Priority percentage block size split",
+		.type   = FIO_OPT_STR_ULL,
+		.cb     = str_cmdprio_bssplit_cb,
+		.off1   = offsetof(struct ioring_options, cmdprio.bssplit),
+		.help   = "Set priority percentages for different block sizes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 #else
 	{
 		.name	= "cmdprio_percentage",
@@ -134,6 +187,24 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support I/O priority classes",
 	},
+	{
+		.name	= "cmdprio_class",
+		.lname	= "Asynchronous I/O priority class",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+	{
+		.name	= "cmdprio",
+		.lname	= "Asynchronous I/O priority level",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+	{
+		.name   = "cmdprio_bssplit",
+		.lname  = "Priority percentage block size split",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
 #endif
 	{
 		.name	= "fixedbufs",
@@ -267,10 +338,6 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->rw_flags |= RWF_UNCACHED;
 		if (o->nowait)
 			sqe->rw_flags |= RWF_NOWAIT;
-		if (ld->ioprio_class_set)
-			sqe->ioprio = td->o.ioprio_class << 13;
-		if (ld->ioprio_set)
-			sqe->ioprio |= td->o.ioprio;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
 		sqe->ioprio = 0;
@@ -381,13 +448,37 @@ static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld = td->io_ops_data;
-	if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
-		ld->sqes[io_u->index].ioprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
-		io_u->flags |= IO_U_F_PRIORITY;
+	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
+	struct cmdprio *cmdprio = &o->cmdprio;
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
+	unsigned int cmdprio_value =
+		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
+
+	if (p && rand_between(&td->prio_state, 0, 99) < p) {
+		sqe->ioprio = cmdprio_value;
+		if (!td->ioprio || cmdprio_value < td->ioprio) {
+			/*
+			 * The async IO priority is higher (has a lower value)
+			 * than the priority set by "prio" and "prioclass"
+			 * options.
+			 */
+			io_u->flags |= IO_U_F_HIGH_PRIO;
+		}
 	} else {
-		ld->sqes[io_u->index].ioprio = 0;
+		sqe->ioprio = td->ioprio;
+		if (cmdprio_value && td->ioprio && td->ioprio < cmdprio_value) {
+			/*
+			 * The IO will be executed with the priority set by
+			 * "prio" and "prioclass" options, and this priority
+			 * is higher (has a lower value) than the async IO
+			 * priority.
+			 */
+			io_u->flags |= IO_U_F_HIGH_PRIO;
+		}
 	}
-	return;
+
+	io_u->ioprio = sqe->ioprio;
 }
 
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
@@ -395,7 +486,6 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct io_sq_ring *ring = &ld->sq_ring;
-	struct ioring_options *o = td->eo;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -418,7 +508,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == atomic_load_acquire(ring->head))
 		return FIO_Q_BUSY;
 
-	if (o->cmdprio_percentage)
+	if (ld->use_cmdprio)
 		fio_ioring_prio_prep(td, io_u);
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	atomic_store_release(ring->tail, next_tail);
@@ -729,7 +819,9 @@ static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
-	struct thread_options *to = &td->o;
+	struct cmdprio *cmdprio = &o->cmdprio;
+	bool has_cmdprio = false;
+	int ret;
 
 	/* sqthread submission requires registered files */
 	if (o->sqpoll_thread)
@@ -753,21 +845,21 @@ static int fio_ioring_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	/*
-	 * Check for option conflicts
-	 */
-	if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
-			o->cmdprio_percentage != 0) {
-		log_err("%s: cmdprio_percentage option and mutually exclusive "
-				"prio or prioclass option is set, exiting\n", to->name);
-		td_verror(td, EINVAL, "fio_io_uring_init");
+	ret = fio_cmdprio_init(td, cmdprio, &has_cmdprio);
+	if (ret) {
+		td_verror(td, EINVAL, "fio_ioring_init");
 		return 1;
 	}
 
-	if (fio_option_is_set(&td->o, ioprio_class))
-		ld->ioprio_class_set = true;
-	if (fio_option_is_set(&td->o, ioprio))
-		ld->ioprio_set = true;
+	/*
+	 * Since io_uring can have a submission context (sqthread_poll) that is
+	 * different from the process context, we cannot rely on the the IO
+	 * priority set by ioprio_set() (option prio/prioclass) to be inherited.
+	 * Therefore, we set the sqe->ioprio field when prio/prioclass is used.
+	 */
+	ld->use_cmdprio = has_cmdprio ||
+		fio_option_is_set(&td->o, ioprio_class) ||
+		fio_option_is_set(&td->o, ioprio);
 
 	return 0;
 }
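The two branches of `fio_ioring_prio_prep()` above decide when to set `IO_U_F_HIGH_PRIO`. Encoded ioprio values compare inversely (a smaller non-zero value is a higher priority, and 0 means no priority set), so the rule reduces to the predicate below. This is a hypothetical helper (`sketch_is_high_prio`) that models the io_uring version only; `selected` stands for the random draw against the cmdprio percentage having succeeded.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the IO_U_F_HIGH_PRIO decision.
 * cmdprio_value: encoded cmdprio_class/cmdprio value (0 = none).
 * ctx_ioprio:    encoded prio/prioclass value, like td->ioprio (0 = none).
 */
static bool sketch_is_high_prio(bool selected, unsigned int cmdprio_value,
				unsigned int ctx_ioprio)
{
	if (selected)
		/* IO carries cmdprio_value: high if it beats prio/prioclass. */
		return !ctx_ioprio || cmdprio_value < ctx_ioprio;

	/* IO carries ctx_ioprio: high only if it beats the cmdprio value. */
	return cmdprio_value && ctx_ioprio && ctx_ioprio < cmdprio_value;
}
```

In other words, the flag marks whichever I/Os ended up with the better of the two priorities, so the high/low latency split in the statistics stays meaningful even when `prio`/`prioclass` is combined with the cmdprio options.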
diff --git a/engines/libaio.c b/engines/libaio.c
index b909b79e..dd655355 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -15,6 +15,7 @@
 #include "../lib/pow2.h"
 #include "../optgroup.h"
 #include "../lib/memalign.h"
+#include "cmdprio.h"
 
 /* Should be defined in newest aio_abi.h */
 #ifndef IOCB_FLAG_IOPRIO
@@ -50,15 +51,26 @@ struct libaio_data {
 	unsigned int queued;
 	unsigned int head;
 	unsigned int tail;
+
+	bool use_cmdprio;
 };
 
 struct libaio_options {
-	void *pad;
+	struct thread_data *td;
 	unsigned int userspace_reap;
-	unsigned int cmdprio_percentage;
+	struct cmdprio cmdprio;
 	unsigned int nowait;
 };
 
+static int str_cmdprio_bssplit_cb(void *data, const char *input)
+{
+	struct libaio_options *o = data;
+	struct thread_data *td = o->td;
+	struct cmdprio *cmdprio = &o->cmdprio;
+
+	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
+}
+
 static struct fio_option options[] = {
 	{
 		.name	= "userspace_reap",
@@ -74,13 +86,56 @@ static struct fio_option options[] = {
 		.name	= "cmdprio_percentage",
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct libaio_options, cmdprio_percentage),
-		.minval	= 1,
+		.off1	= offsetof(struct libaio_options,
+				   cmdprio.percentage[DDIR_READ]),
+		.off2	= offsetof(struct libaio_options,
+				   cmdprio.percentage[DDIR_WRITE]),
+		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
+	{
+		.name	= "cmdprio_class",
+		.lname	= "Asynchronous I/O priority class",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct libaio_options,
+				   cmdprio.class[DDIR_READ]),
+		.off2	= offsetof(struct libaio_options,
+				   cmdprio.class[DDIR_WRITE]),
+		.help	= "Set asynchronous IO priority class",
+		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
+		.maxval	= IOPRIO_MAX_PRIO_CLASS,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= "cmdprio",
+		.lname	= "Asynchronous I/O priority level",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct libaio_options,
+				   cmdprio.level[DDIR_READ]),
+		.off2	= offsetof(struct libaio_options,
+				   cmdprio.level[DDIR_WRITE]),
+		.help	= "Set asynchronous IO priority level",
+		.minval	= IOPRIO_MIN_PRIO,
+		.maxval	= IOPRIO_MAX_PRIO,
+		.interval = 1,
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name   = "cmdprio_bssplit",
+		.lname  = "Priority percentage block size split",
+		.type   = FIO_OPT_STR_ULL,
+		.cb     = str_cmdprio_bssplit_cb,
+		.off1   = offsetof(struct libaio_options, cmdprio.bssplit),
+		.help   = "Set priority percentages for different block sizes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
 #else
 	{
 		.name	= "cmdprio_percentage",
@@ -88,6 +143,24 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support I/O priority classes",
 	},
+	{
+		.name	= "cmdprio_class",
+		.lname	= "Asynchronous I/O priority class",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+	{
+		.name	= "cmdprio",
+		.lname	= "Asynchronous I/O priority level",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+	{
+		.name   = "cmdprio_bssplit",
+		.lname  = "Priority percentage block size split",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
 #endif
 	{
 		.name	= "nowait",
@@ -135,12 +208,31 @@ static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
 static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct libaio_options *o = td->eo;
-	if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
-		io_u->iocb.aio_reqprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
+	struct cmdprio *cmdprio = &o->cmdprio;
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
+	unsigned int cmdprio_value =
+		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
+
+	if (p && rand_between(&td->prio_state, 0, 99) < p) {
+		io_u->ioprio = cmdprio_value;
+		io_u->iocb.aio_reqprio = cmdprio_value;
 		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
-		io_u->flags |= IO_U_F_PRIORITY;
+		if (!td->ioprio || cmdprio_value < td->ioprio) {
+			/*
+			 * The async IO priority is higher (has a lower value)
+			 * than the default context priority.
+			 */
+			io_u->flags |= IO_U_F_HIGH_PRIO;
+		}
+	} else if (td->ioprio && td->ioprio < cmdprio_value) {
+		/*
+		 * The IO will be executed with the default context priority,
+		 * and this priority is higher (has a lower value) than the
+		 * async IO priority.
+		 */
+		io_u->flags |= IO_U_F_HIGH_PRIO;
 	}
-	return;
 }
 
 static struct io_u *fio_libaio_event(struct thread_data *td, int event)
@@ -246,7 +338,6 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
-	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -277,7 +368,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	if (o->cmdprio_percentage)
+	if (ld->use_cmdprio)
 		fio_libaio_prio_prep(td, io_u);
 
 	ld->iocbs[ld->head] = &io_u->iocb;
@@ -420,8 +511,9 @@ static int fio_libaio_post_init(struct thread_data *td)
 static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_data *ld;
-	struct thread_options *to = &td->o;
 	struct libaio_options *o = td->eo;
+	struct cmdprio *cmdprio = &o->cmdprio;
+	int ret;
 
 	ld = calloc(1, sizeof(*ld));
 
@@ -432,16 +524,13 @@ static int fio_libaio_init(struct thread_data *td)
 	ld->io_us = calloc(ld->entries, sizeof(struct io_u *));
 
 	td->io_ops_data = ld;
-	/*
-	 * Check for option conflicts
-	 */
-	if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
-			o->cmdprio_percentage != 0) {
-		log_err("%s: cmdprio_percentage option and mutually exclusive "
-				"prio or prioclass option is set, exiting\n", to->name);
+
+	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
+	if (ret) {
 		td_verror(td, EINVAL, "fio_libaio_init");
 		return 1;
 	}
+
 	return 0;
 }
 
diff --git a/eta.c b/eta.c
index db13cb18..ea1781f3 100644
--- a/eta.c
+++ b/eta.c
@@ -509,7 +509,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		memcpy(&rate_prev_time, &now, sizeof(now));
 		regrow_agg_logs();
 		for_each_rw_ddir(ddir) {
-			add_agg_sample(sample_val(je->rate[ddir]), ddir, 0, 0);
+			add_agg_sample(sample_val(je->rate[ddir]), ddir, 0);
 		}
 	}
 
diff --git a/examples/cmdprio-bssplit.fio b/examples/cmdprio-bssplit.fio
new file mode 100644
index 00000000..47e9a790
--- /dev/null
+++ b/examples/cmdprio-bssplit.fio
@@ -0,0 +1,17 @@
+; Randomly read/write a block device file at queue depth 16.
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 100% of the 64kB reads are executed at the highest priority and
+; all other IOs are executed without a priority set.
+[global]
+filename=/dev/sda
+direct=1
+write_lat_log=prio-run.log
+log_prio=1
+
+[randrw]
+rw=randrw
+bssplit=64k/40:1024k/60,1024k/100
+ioengine=libaio
+iodepth=16
+cmdprio_bssplit=64k/100:1024k/0,1024k/0
+cmdprio_class=1
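A rough C sketch of how the cmdprio_bssplit table in this example might be consulted for each I/O. The table type and helper names below are hypothetical and are not fio's internal structures. Treating block sizes that are not listed as 0% is an assumption; this patch does not show it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-direction entry built from an option such as
 * cmdprio_bssplit=64k/100:1024k/0,1024k/0 (not fio's real structure).
 */
struct sketch_bsprio {
	unsigned long long bs;	/* block size in bytes */
	unsigned int perc;	/* % of I/Os of this size issued with cmdprio */
};

/*
 * Return the priority percentage for an I/O of size bs. Sizes missing
 * from the table are assumed to get 0%, i.e. no per-command priority.
 */
static unsigned int sketch_bssplit_perc(const struct sketch_bsprio *tbl,
					size_t n, unsigned long long bs)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].bs == bs)
			return tbl[i].perc;
	}
	return 0;
}
```

Consider the read table {64k, 100}, {1024k, 0}. Every 64k read would be eligible for cmdprio_class=1, and no 1MB read would be. The percentage returned would then go through the same `rand_between(&td->prio_state, 0, 99) < p` draw that fio_libaio_prio_prep() applies to cmdprio_percentage.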
diff --git a/examples/cmdprio-bssplit.png b/examples/cmdprio-bssplit.png
new file mode 100644
index 00000000..a0bb3ff4
Binary files /dev/null and b/examples/cmdprio-bssplit.png differ
diff --git a/examples/cmdprio-percentage.fio b/examples/cmdprio-percentage.fio
new file mode 100644
index 00000000..e4bc9db8
--- /dev/null
+++ b/examples/cmdprio-percentage.fio
@@ -0,0 +1,17 @@
+; Read a block device file at queue depth 8
+; with 20% of the IOs using the high priority RT class
+; and the remaining IOs using the idle priority class
+[global]
+filename=/dev/sda
+direct=1
+write_lat_log=prio-run.log
+log_prio=1
+
+[randread]
+rw=randread
+bs=128k
+ioengine=libaio
+iodepth=8
+prioclass=3
+cmdprio_percentage=20
+cmdprio_class=1
diff --git a/examples/cmdprio-percentage.png b/examples/cmdprio-percentage.png
new file mode 100644
index 00000000..e794de0c
Binary files /dev/null and b/examples/cmdprio-percentage.png differ
diff --git a/fio.1 b/fio.1
index 382cebfc..03fddffb 100644
--- a/fio.1
+++ b/fio.1
@@ -1962,13 +1962,41 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 \fBioengine\fR that defines them is selected.
 .TP
-.BI (io_uring, libaio)cmdprio_percentage \fR=\fPint
-Set the percentage of I/O that will be issued with higher priority by setting
-the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
-This option cannot be used with the `prio` or `prioclass` options. For this
-option to set the priority bit properly, NCQ priority must be supported and
-enabled and `direct=1' option must be used. fio must also be run as the root
-user.
+.BI (io_uring,libaio)cmdprio_percentage \fR=\fPint[,int]
+Set the percentage of I/O that will be issued with the highest priority.
+Default: 0. A single value applies to reads and writes. Comma-separated
+values may be specified for reads and writes. For this option to be
+effective, NCQ priority must be supported and enabled, and the
+`direct=1' option must be used. fio must also be run as the root
+user.
+.TP
+.BI (io_uring,libaio)cmdprio_class \fR=\fPint[,int]
+Set the I/O priority class to use for I/Os that must be issued with a
+priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
+If not specified when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR
+is set, this defaults to the highest priority class. A single value applies
+to reads and writes. Comma-separated values may be specified for reads and
+writes. See man \fBionice\fR\|(1). See also the \fBprioclass\fR option.
+.TP
+.BI (io_uring,libaio)cmdprio \fR=\fPint[,int]
+Set the I/O priority value to use for I/Os that must be issued with a
+priority when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR is set.
+If not specified when \fBcmdprio_percentage\fR or \fBcmdprio_bssplit\fR
+is set, this defaults to 0. Linux limits us to a positive value between
+0 and 7, with 0 being the highest. A single value applies to reads and writes.
+Comma-separated values may be specified for reads and writes. See man
+\fBionice\fR\|(1). Refer to an appropriate manpage for other operating systems
+since the meaning of priority may differ. See also the \fBprio\fR option.
+.TP
+.BI (io_uring,libaio)cmdprio_bssplit \fR=\fPstr[,str]
+For finer control over I/O priority, this option allows specifying
+the percentage of IOs that must have a priority set depending on the block
+size of the IO. This option is useful only when used together with the option
+\fBbssplit\fR, that is, multiple different block sizes are used for reads and
+writes. The format for this option is the same as the format of the
+\fBbssplit\fR option, with the exception that values for trim IOs are
+ignored. This option is mutually exclusive with the \fBcmdprio_percentage\fR
+option.
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
@@ -2043,20 +2071,20 @@ Detect when I/O threads are done, then exit.
 .BI (libhdfs)namenode \fR=\fPstr
 The hostname or IP address of a HDFS cluster namenode to contact.
 .TP
-.BI (libhdfs)port
+.BI (libhdfs)port \fR=\fPint
 The listening port of the HFDS cluster namenode.
 .TP
-.BI (netsplice,net)port
+.BI (netsplice,net)port \fR=\fPint
 The TCP or UDP port to bind to or connect to. If this is used with
 \fBnumjobs\fR to spawn multiple instances of the same job type, then
 this will be the starting port number since fio will use a range of
 ports.
 .TP
-.BI (rdma, librpma_*)port
+.BI (rdma,librpma_*)port \fR=\fPint
 The port to use for RDMA-CM communication. This should be the same
 value on the client and the server side.
 .TP
-.BI (netsplice,net, rdma)hostname \fR=\fPstr
+.BI (netsplice,net,rdma)hostname \fR=\fPstr
 The hostname or IP address to use for TCP, UDP or RDMA-CM based I/O.
 If the job is a TCP listener or UDP reader, the hostname is not used
 and must be omitted unless it is a valid UDP multicast address.
@@ -2693,13 +2721,13 @@ Set the I/O priority value of this job. Linux limits us to a positive value
 between 0 and 7, with 0 being the highest. See man
 \fBionice\fR\|(1). Refer to an appropriate manpage for other operating
 systems since meaning of priority may differ. For per-command priority
-setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
-options.
+setting, see the I/O engine specific `cmdprio_percentage` and
+`cmdprio` options.
 .TP
 .BI prioclass \fR=\fPint
 Set the I/O priority class. See man \fBionice\fR\|(1). For per-command
-priority setting, see I/O engine specific `cmdprio_percentage` and `hipri_percent`
-options.
+priority setting, see the I/O engine specific `cmdprio_percentage` and
+`cmdprio_class` options.
 .TP
 .BI cpus_allowed \fR=\fPstr
 Controls the same options as \fBcpumask\fR, but accepts a textual
@@ -3238,6 +3266,11 @@ If this is set, the iolog options will include the byte offset for the I/O
 entry as well as the other data values. Defaults to 0 meaning that
 offsets are not present in logs. Also see \fBLOG FILE FORMATS\fR section.
 .TP
+.BI log_prio \fR=\fPbool
+If this is set, the iolog options will include the I/O priority for the I/O
+entry as well as the other data values. Defaults to 0 meaning that
+I/O priorities are not present in logs. Also see \fBLOG FILE FORMATS\fR section.
+.TP
 .BI log_compression \fR=\fPint
 If this is set, fio will compress the I/O logs as it goes, to keep the
 memory footprint lower. When a log reaches the specified size, that chunk is
@@ -4171,8 +4204,14 @@ The entry's `block size' is always in bytes. The `offset' is the position in byt
 from the start of the file for that particular I/O. The logging of the offset can be
 toggled with \fBlog_offset\fR.
 .P
-`Command priority` is 0 for normal priority and 1 for high priority. This is controlled
-by the ioengine specific \fBcmdprio_percentage\fR.
+If \fBlog_prio\fR is not set, the entry's `Command priority` is 1 for an IO executed
+with the highest RT priority class (\fBprioclass\fR=1 or \fBcmdprio_class\fR=1) and 0
+otherwise. This is controlled by the \fBprioclass\fR option and the ioengine specific
+\fBcmdprio_percentage\fR and \fBcmdprio_class\fR options. If \fBlog_prio\fR is set, the
+entry's `Command priority` is the priority set for the IO, as a 16-bit hexadecimal
+number with the lowest 13 bits indicating the priority value (\fBprio\fR and
+\fBcmdprio\fR options) and the highest 3 bits indicating the IO priority class
+(\fBprioclass\fR and \fBcmdprio_class\fR options).
 .P
 Fio defaults to logging every individual I/O but when windowed logging is set
 through \fBlog_avg_msec\fR, either the average (by default) or the maximum
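The `Command priority` column written with log_prio=1 can be split back into its two fields by following the layout described above: the class sits in the highest 3 bits and the level in the lowest 13. Below is a minimal C sketch for post-processing such logs. The struct and function names are hypothetical, and each field is assumed to be a single hex token such as "0x2000".

```c
#include <assert.h>
#include <stdlib.h>

/* Decoded `Command priority` column of a log written with log_prio=1. */
struct sketch_log_prio {
	unsigned int prio_class;	/* highest 3 bits: prioclass/cmdprio_class */
	unsigned int prio_level;	/* lowest 13 bits: prio/cmdprio */
};

/* Parse a field such as "0x2000"; strtoul() base 16 accepts the 0x prefix. */
static struct sketch_log_prio sketch_decode_log_prio(const char *field)
{
	struct sketch_log_prio p;
	char *end;
	unsigned long v = strtoul(field, &end, 16);

	assert(end != field && v <= 0xffff);	/* must be a 16-bit hex value */
	p.prio_class = (unsigned int)(v >> 13) & 0x7;
	p.prio_level = (unsigned int)v & 0x1fff;
	return p;
}
```

For example, 0x2000 decodes to class 1 (RT) at level 0, and 0x6003 decodes to class 3 (idle) with level 3.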
diff --git a/fio.h b/fio.h
index 6f6b211b..da1fe085 100644
--- a/fio.h
+++ b/fio.h
@@ -280,6 +280,11 @@ struct thread_data {
 
 	int shm_id;
 
+	/*
+	 * Job default IO priority set with prioclass and prio options.
+	 */
+	unsigned int ioprio;
+
 	/*
 	 * IO engine hooks, contains everything needed to submit an io_u
 	 * to any of the available IO engines.
diff --git a/init.c b/init.c
index 871fb5ad..ec1a2cac 100644
--- a/init.c
+++ b/init.c
@@ -1583,6 +1583,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_LAT,
 			.log_offset = o->log_offset,
+			.log_prio = o->log_prio,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
@@ -1616,6 +1617,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_HIST,
 			.log_offset = o->log_offset,
+			.log_prio = o->log_prio,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
@@ -1647,6 +1649,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_BW,
 			.log_offset = o->log_offset,
+			.log_prio = o->log_prio,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
@@ -1678,6 +1681,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_IOPS,
 			.log_offset = o->log_offset,
+			.log_prio = o->log_prio,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
diff --git a/io_u.c b/io_u.c
index 696d25cd..5289b5d1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1595,7 +1595,7 @@ again:
 		assert(io_u->flags & IO_U_F_FREE);
 		io_u_clear(td, io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
 				 IO_U_F_TRIMMED | IO_U_F_BARRIER |
-				 IO_U_F_VER_LIST | IO_U_F_PRIORITY);
+				 IO_U_F_VER_LIST | IO_U_F_HIGH_PRIO);
 
 		io_u->error = 0;
 		io_u->acct_ddir = -1;
@@ -1799,6 +1799,10 @@ struct io_u *get_io_u(struct thread_data *td)
 	io_u->xfer_buf = io_u->buf;
 	io_u->xfer_buflen = io_u->buflen;
 
+	/*
+	 * Remember the issuing context priority. The IO engine may change this.
+	 */
+	io_u->ioprio = td->ioprio;
 out:
 	assert(io_u->file);
 	if (!td_io_prep(td, io_u)) {
@@ -1884,7 +1888,8 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 		unsigned long long tnsec;
 
 		tnsec = ntime_since(&io_u->start_time, &icd->time);
-		add_lat_sample(td, idx, tnsec, bytes, io_u->offset, io_u_is_prio(io_u));
+		add_lat_sample(td, idx, tnsec, bytes, io_u->offset,
+			       io_u->ioprio, io_u_is_high_prio(io_u));
 
 		if (td->flags & TD_F_PROFILE_OPS) {
 			struct prof_io_ops *ops = &td->prof_io_ops;
@@ -1905,7 +1910,8 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 
 	if (ddir_rw(idx)) {
 		if (!td->o.disable_clat) {
-			add_clat_sample(td, idx, llnsec, bytes, io_u->offset, io_u_is_prio(io_u));
+			add_clat_sample(td, idx, llnsec, bytes, io_u->offset,
+					io_u->ioprio, io_u_is_high_prio(io_u));
 			io_u_mark_latency(td, llnsec);
 		}
 
@@ -2162,7 +2168,7 @@ void io_u_queued(struct thread_data *td, struct io_u *io_u)
 			td = td->parent;
 
 		add_slat_sample(td, io_u->ddir, slat_time, io_u->xfer_buflen,
-				io_u->offset, io_u_is_prio(io_u));
+				io_u->offset, io_u->ioprio);
 	}
 }
 
diff --git a/io_u.h b/io_u.h
index d4c5be43..bdbac525 100644
--- a/io_u.h
+++ b/io_u.h
@@ -21,7 +21,7 @@ enum {
 	IO_U_F_TRIMMED		= 1 << 5,
 	IO_U_F_BARRIER		= 1 << 6,
 	IO_U_F_VER_LIST		= 1 << 7,
-	IO_U_F_PRIORITY		= 1 << 8,
+	IO_U_F_HIGH_PRIO	= 1 << 8,
 };
 
 /*
@@ -46,6 +46,11 @@ struct io_u {
 	 */
 	unsigned short numberio;
 
+	/*
+	 * IO priority.
+	 */
+	unsigned short ioprio;
+
 	/*
 	 * Allocated/set buffer and length
 	 */
@@ -188,7 +193,6 @@ static inline enum fio_ddir acct_ddir(struct io_u *io_u)
 	td_flags_clear((td), &(io_u->flags), (val))
 #define io_u_set(td, io_u, val)		\
 	td_flags_set((td), &(io_u)->flags, (val))
-#define io_u_is_prio(io_u)	\
-	(io_u->flags & (unsigned int) IO_U_F_PRIORITY) != 0
+#define io_u_is_high_prio(io_u)	((io_u)->flags & IO_U_F_HIGH_PRIO)
 
 #endif
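The renamed IO_U_F_HIGH_PRIO flag is set by fio_libaio_prio_prep() in the engines/libaio.c hunk. In that function a numerically lower ioprio value means a higher priority, and td->ioprio == 0 means no job default was set. Below is a minimal standalone C sketch of that decision. The helper names are hypothetical, and `cmdprio_applied` stands in for the outcome of the `rand_between(...) < p` draw.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IOPRIO_CLASS_SHIFT	13	/* Linux ioprio encoding */

/* Build a priority value; unlike fio's ioprio_value(), class 0 is kept. */
static unsigned int sketch_prio(unsigned int prio_class, unsigned int level)
{
	assert(prio_class <= 3 && level <= 7);
	return (prio_class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/*
 * Mirror of the branches in fio_libaio_prio_prep(): should this I/O be
 * accounted in the high priority latency statistics?
 */
static bool sketch_is_high_prio(unsigned int td_ioprio,
				unsigned int cmdprio_value,
				bool cmdprio_applied)
{
	if (cmdprio_applied)
		/* I/O carries cmdprio: high if it beats the job default */
		return !td_ioprio || cmdprio_value < td_ioprio;
	/* I/O keeps the job default: high only if that beats cmdprio */
	return td_ioprio && td_ioprio < cmdprio_value;
}
```

Apply this to examples/cmdprio-percentage.fio (prioclass=3, cmdprio_class=1). The job default would be 0x6000 and the command priority 0x2000, so only the roughly 20% of reads that receive cmdprio would count as high priority.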
diff --git a/iolog.c b/iolog.c
index 26501b4a..1aeb7a76 100644
--- a/iolog.c
+++ b/iolog.c
@@ -737,6 +737,7 @@ void setup_log(struct io_log **log, struct log_params *p,
 	INIT_FLIST_HEAD(&l->io_logs);
 	l->log_type = p->log_type;
 	l->log_offset = p->log_offset;
+	l->log_prio = p->log_prio;
 	l->log_gz = p->log_gz;
 	l->log_gz_store = p->log_gz_store;
 	l->avg_msec = p->avg_msec;
@@ -769,6 +770,8 @@ void setup_log(struct io_log **log, struct log_params *p,
 
 	if (l->log_offset)
 		l->log_ddir_mask = LOG_OFFSET_SAMPLE_BIT;
+	if (l->log_prio)
+		l->log_ddir_mask |= LOG_PRIO_SAMPLE_BIT;
 
 	INIT_FLIST_HEAD(&l->chunk_list);
 
@@ -895,33 +898,55 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 {
 	struct io_sample *s;
-	int log_offset;
+	int log_offset, log_prio;
 	uint64_t i, nr_samples;
+	unsigned int prio_val;
+	const char *fmt;
 
 	if (!sample_size)
 		return;
 
 	s = __get_sample(samples, 0, 0);
 	log_offset = (s->__ddir & LOG_OFFSET_SAMPLE_BIT) != 0;
+	log_prio = (s->__ddir & LOG_PRIO_SAMPLE_BIT) != 0;
+
+	if (log_offset) {
+		if (log_prio)
+			fmt = "%lu, %" PRId64 ", %u, %llu, %llu, 0x%04x\n";
+		else
+			fmt = "%lu, %" PRId64 ", %u, %llu, %llu, %u\n";
+	} else {
+		if (log_prio)
+			fmt = "%lu, %" PRId64 ", %u, %llu, 0x%04x\n";
+		else
+			fmt = "%lu, %" PRId64 ", %u, %llu, %u\n";
+	}
 
 	nr_samples = sample_size / __log_entry_sz(log_offset);
 
 	for (i = 0; i < nr_samples; i++) {
 		s = __get_sample(samples, log_offset, i);
 
+		if (log_prio)
+			prio_val = s->priority;
+		else
+			prio_val = ioprio_value_is_class_rt(s->priority);
+
 		if (!log_offset) {
-			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %u\n",
-					(unsigned long) s->time,
-					s->data.val,
-					io_sample_ddir(s), (unsigned long long) s->bs, s->priority_bit);
+			fprintf(f, fmt,
+				(unsigned long) s->time,
+				s->data.val,
+				io_sample_ddir(s), (unsigned long long) s->bs,
+				prio_val);
 		} else {
 			struct io_sample_offset *so = (void *) s;
 
-			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %llu, %u\n",
-					(unsigned long) s->time,
-					s->data.val,
-					io_sample_ddir(s), (unsigned long long) s->bs,
-					(unsigned long long) so->offset, s->priority_bit);
+			fprintf(f, fmt,
+				(unsigned long) s->time,
+				s->data.val,
+				io_sample_ddir(s), (unsigned long long) s->bs,
+				(unsigned long long) so->offset,
+				prio_val);
 		}
 	}
 }
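In the patched flush_samples(), the last column now depends on log_prio. With log_prio=1 it is the raw priority in hex. Otherwise it is still 0 or 1, derived from whether the stored priority is in the RT class, so logs written without log_prio keep their earlier shape. Below is a small C sketch of just that column choice. The helper and SKETCH_* names are hypothetical; the class shift of 13 and RT class value of 1 are the Linux values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_IOPRIO_CLASS_SHIFT	13
#define SKETCH_IOPRIO_CLASS_RT		1

/*
 * Format the priority column the way the patched flush_samples()
 * chooses it: "0x%04x" of the raw value with log_prio=1, otherwise
 * 1/0 for "RT class or not".
 */
static int sketch_format_prio(char *buf, size_t len, unsigned int priority,
			      bool log_prio)
{
	unsigned int is_rt;

	assert(buf != NULL && len > 0);
	if (log_prio)
		return snprintf(buf, len, "0x%04x", priority);

	is_rt = (priority >> SKETCH_IOPRIO_CLASS_SHIFT) == SKETCH_IOPRIO_CLASS_RT;
	return snprintf(buf, len, "%u", is_rt);
}
```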
diff --git a/iolog.h b/iolog.h
index 9e382cc0..7d66b7c4 100644
--- a/iolog.h
+++ b/iolog.h
@@ -42,7 +42,7 @@ struct io_sample {
 	uint64_t time;
 	union io_sample_data data;
 	uint32_t __ddir;
-	uint8_t priority_bit;
+	uint16_t priority;
 	uint64_t bs;
 };
 
@@ -104,6 +104,11 @@ struct io_log {
 	 */
 	unsigned int log_offset;
 
+	/*
+	 * Log I/O priorities
+	 */
+	unsigned int log_prio;
+
 	/*
 	 * Max size of log entries before a chunk is compressed
 	 */
@@ -145,7 +150,13 @@ struct io_log {
  * If the upper bit is set, then we have the offset as well
  */
 #define LOG_OFFSET_SAMPLE_BIT	0x80000000U
-#define io_sample_ddir(io)	((io)->__ddir & ~LOG_OFFSET_SAMPLE_BIT)
+/*
+ * If the bit following the upper bit is set, then we have the priority
+ */
+#define LOG_PRIO_SAMPLE_BIT	0x40000000U
+
+#define LOG_SAMPLE_BITS		(LOG_OFFSET_SAMPLE_BIT | LOG_PRIO_SAMPLE_BIT)
+#define io_sample_ddir(io)	((io)->__ddir & ~LOG_SAMPLE_BITS)
 
 static inline void io_sample_set_ddir(struct io_log *log,
 				      struct io_sample *io,
@@ -262,6 +273,7 @@ struct log_params {
 	int hist_coarseness;
 	int log_type;
 	int log_offset;
+	int log_prio;
 	int log_gz;
 	int log_gz_store;
 	int log_compress;
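io_sample.__ddir now carries two flag bits above the data direction, which is why io_sample_ddir() has to clear both. The old macro cleared only LOG_OFFSET_SAMPLE_BIT, so it would have returned the direction with the priority bit still set. The body of io_sample_set_ddir() is not shown in this diff. The sketch below assumes it simply ORs the direction with the log_ddir_mask computed in setup_log(). The flag values are copied from the patch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in the patched iolog.h. */
#define LOG_OFFSET_SAMPLE_BIT	0x80000000U
#define LOG_PRIO_SAMPLE_BIT	0x40000000U
#define LOG_SAMPLE_BITS		(LOG_OFFSET_SAMPLE_BIT | LOG_PRIO_SAMPLE_BIT)

/*
 * Build the value stored in io_sample.__ddir, assuming the direction is
 * OR-ed with the per-log mask that setup_log() computes.
 */
static uint32_t sketch_pack_ddir(uint32_t ddir, int log_offset, int log_prio)
{
	uint32_t mask = 0;

	assert((ddir & LOG_SAMPLE_BITS) == 0);	/* ddir must not overlap flags */
	if (log_offset)
		mask |= LOG_OFFSET_SAMPLE_BIT;
	if (log_prio)
		mask |= LOG_PRIO_SAMPLE_BIT;
	return ddir | mask;
}

/* Same masking as the new io_sample_ddir() macro. */
static uint32_t sketch_unpack_ddir(uint32_t packed)
{
	return packed & ~LOG_SAMPLE_BITS;
}
```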
diff --git a/options.c b/options.c
index 8c2ab7cc..74ac1f3f 100644
--- a/options.c
+++ b/options.c
@@ -73,13 +73,7 @@ static int bs_cmp(const void *p1, const void *p2)
 	return (int) bsp1->perc - (int) bsp2->perc;
 }
 
-struct split {
-	unsigned int nr;
-	unsigned long long val1[ZONESPLIT_MAX];
-	unsigned long long val2[ZONESPLIT_MAX];
-};
-
-static int split_parse_ddir(struct thread_options *o, struct split *split,
+int split_parse_ddir(struct thread_options *o, struct split *split,
 			    char *str, bool absolute, unsigned int max_splits)
 {
 	unsigned long long perc;
@@ -138,8 +132,8 @@ static int split_parse_ddir(struct thread_options *o, struct split *split,
 	return 0;
 }
 
-static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
-			bool data)
+static int bssplit_ddir(struct thread_options *o, void *eo,
+			enum fio_ddir ddir, char *str, bool data)
 {
 	unsigned int i, perc, perc_missing;
 	unsigned long long max_bs, min_bs;
@@ -211,10 +205,8 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
 	return 0;
 }
 
-typedef int (split_parse_fn)(struct thread_options *, enum fio_ddir, char *, bool);
-
-static int str_split_parse(struct thread_data *td, char *str,
-			   split_parse_fn *fn, bool data)
+int str_split_parse(struct thread_data *td, char *str,
+		    split_parse_fn *fn, void *eo, bool data)
 {
 	char *odir, *ddir;
 	int ret = 0;
@@ -223,37 +215,37 @@ static int str_split_parse(struct thread_data *td, char *str,
 	if (odir) {
 		ddir = strchr(odir + 1, ',');
 		if (ddir) {
-			ret = fn(&td->o, DDIR_TRIM, ddir + 1, data);
+			ret = fn(&td->o, eo, DDIR_TRIM, ddir + 1, data);
 			if (!ret)
 				*ddir = '\0';
 		} else {
 			char *op;
 
 			op = strdup(odir + 1);
-			ret = fn(&td->o, DDIR_TRIM, op, data);
+			ret = fn(&td->o, eo, DDIR_TRIM, op, data);
 
 			free(op);
 		}
 		if (!ret)
-			ret = fn(&td->o, DDIR_WRITE, odir + 1, data);
+			ret = fn(&td->o, eo, DDIR_WRITE, odir + 1, data);
 		if (!ret) {
 			*odir = '\0';
-			ret = fn(&td->o, DDIR_READ, str, data);
+			ret = fn(&td->o, eo, DDIR_READ, str, data);
 		}
 	} else {
 		char *op;
 
 		op = strdup(str);
-		ret = fn(&td->o, DDIR_WRITE, op, data);
+		ret = fn(&td->o, eo, DDIR_WRITE, op, data);
 		free(op);
 
 		if (!ret) {
 			op = strdup(str);
-			ret = fn(&td->o, DDIR_TRIM, op, data);
+			ret = fn(&td->o, eo, DDIR_TRIM, op, data);
 			free(op);
 		}
 		if (!ret)
-			ret = fn(&td->o, DDIR_READ, str, data);
+			ret = fn(&td->o, eo, DDIR_READ, str, data);
 	}
 
 	return ret;
@@ -270,7 +262,7 @@ static int str_bssplit_cb(void *data, const char *input)
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	ret = str_split_parse(td, str, bssplit_ddir, false);
+	ret = str_split_parse(td, str, bssplit_ddir, NULL, false);
 
 	if (parse_dryrun()) {
 		int i;
@@ -906,8 +898,8 @@ static int str_sfr_cb(void *data, const char *str)
 }
 #endif
 
-static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
-			   char *str, bool absolute)
+static int zone_split_ddir(struct thread_options *o, void *eo,
+			   enum fio_ddir ddir, char *str, bool absolute)
 {
 	unsigned int i, perc, perc_missing, sperc, sperc_missing;
 	struct split split;
@@ -1012,7 +1004,7 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input,
 	}
 	str += strlen(pre);
 
-	ret = str_split_parse(td, str, zone_split_ddir, absolute);
+	ret = str_split_parse(td, str, zone_split_ddir, NULL, absolute);
 
 	free(p);
 
@@ -4300,6 +4292,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "log_prio",
+		.lname	= "Log priority of IO",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, log_prio),
+		.help	= "Include priority value of IO for each log entry",
+		.def	= "0",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
 #ifdef CONFIG_ZLIB
 	{
 		.name	= "log_compression",
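split_parse_ddir() and str_split_parse() are no longer static and take an extra `eo` argument, presumably so that engine options such as cmdprio_bssplit can reuse the same parser. Below is a standalone C sketch of how str_split_parse() distributes a "read[,write[,trim]]" string across the three directions. The struct and function names are hypothetical, and the sketch edits the input in place rather than calling a per-direction callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type: one option substring per data direction. */
struct sketch_split {
	const char *read, *write, *trim;
};

/*
 * Mirror of the direction assignment in str_split_parse().
 * Commas in str are overwritten with '\0'.
 */
static struct sketch_split sketch_split_ddir(char *str)
{
	struct sketch_split s;
	char *odir, *tdir;

	assert(str != NULL);
	odir = strchr(str, ',');
	if (!odir) {
		/* one value: applies to reads, writes and trims */
		s.read = s.write = s.trim = str;
		return s;
	}

	*odir = '\0';
	s.read = str;
	s.write = odir + 1;
	tdir = strchr(odir + 1, ',');
	if (tdir) {
		*tdir = '\0';
		s.trim = tdir + 1;
	} else {
		/* two values: trims reuse the write value */
		s.trim = s.write;
	}
	return s;
}
```

With the bssplit string from examples/cmdprio-bssplit.fio, reads get "64k/40:1024k/60", and both writes and trims get "1024k/100".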
diff --git a/os/os-android.h b/os/os-android.h
index a81cd815..18eb39ce 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -173,16 +173,26 @@ enum {
 #define IOPRIO_MIN_PRIO_CLASS	0
 #define IOPRIO_MAX_PRIO_CLASS	3
 
-static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
+static inline int ioprio_value(int ioprio_class, int ioprio)
 {
 	/*
 	 * If no class is set, assume BE
 	 */
-	if (!ioprio_class)
-		ioprio_class = IOPRIO_CLASS_BE;
+	if (!ioprio_class)
+		ioprio_class = IOPRIO_CLASS_BE;
+
+	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
+}
+
+static inline bool ioprio_value_is_class_rt(unsigned int priority)
+{
+	return (priority >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT;
+}
 
-	ioprio |= ioprio_class << IOPRIO_CLASS_SHIFT;
-	return syscall(__NR_ioprio_set, which, who, ioprio);
+static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
+{
+	return syscall(__NR_ioprio_set, which, who,
+		       ioprio_value(ioprio_class, ioprio));
 }
 
 #ifndef BLKGETSIZE64
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 6e465894..5b37a37e 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -171,6 +171,7 @@ static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
  * ioprio_set() with 4 arguments, so define fio's ioprio_set() as a macro.
  * Note that there is no idea of class within ioprio_set(2) unlike Linux.
  */
+#define ioprio_value(ioprio_class, ioprio)	(ioprio)
 #define ioprio_set(which, who, ioprio_class, ioprio)	\
 	ioprio_set(which, who, ioprio)
 
diff --git a/os/os-linux.h b/os/os-linux.h
index 16ed5258..808f1d02 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -118,16 +118,26 @@ enum {
 #define IOPRIO_MIN_PRIO_CLASS	0
 #define IOPRIO_MAX_PRIO_CLASS	3
 
-static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
+static inline int ioprio_value(int ioprio_class, int ioprio)
 {
 	/*
 	 * If no class is set, assume BE
 	 */
-	if (!ioprio_class)
-		ioprio_class = IOPRIO_CLASS_BE;
+	if (!ioprio_class)
+		ioprio_class = IOPRIO_CLASS_BE;
+
+	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
+}
+
+static inline bool ioprio_value_is_class_rt(unsigned int priority)
+{
+	return (priority >> IOPRIO_CLASS_SHIFT) == IOPRIO_CLASS_RT;
+}
 
-	ioprio |= ioprio_class << IOPRIO_CLASS_SHIFT;
-	return syscall(__NR_ioprio_set, which, who, ioprio);
+static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
+{
+	return syscall(__NR_ioprio_set, which, who,
+		       ioprio_value(ioprio_class, ioprio));
 }
 
 #ifndef CONFIG_HAVE_GETTID
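Splitting ioprio_value() out of ioprio_set() lets an engine compute the same encoded value for a single command that ioprio_set() applies to a whole job. Below is a standalone C sketch of the two new helpers. The SKETCH_* names are hypothetical. The Linux class numbering (NONE=0, RT=1, BE=2, IDLE=3) and the class shift of 13 are assumed from the kernel ioprio ABI.

```c
#include <assert.h>
#include <stdbool.h>

/* Linux ioprio ABI: class in bits 15..13, level in bits 12..0. */
#define SKETCH_IOPRIO_CLASS_SHIFT	13
enum {
	SKETCH_CLASS_NONE,
	SKETCH_CLASS_RT,
	SKETCH_CLASS_BE,
	SKETCH_CLASS_IDLE,
};

/* Same computation as the new ioprio_value(): no class means BE. */
static int sketch_ioprio_value(int ioprio_class, int ioprio)
{
	assert(ioprio_class >= 0 && ioprio_class <= 3);
	if (!ioprio_class)
		ioprio_class = SKETCH_CLASS_BE;
	return (ioprio_class << SKETCH_IOPRIO_CLASS_SHIFT) | ioprio;
}

/* Same test as the new ioprio_value_is_class_rt(). */
static bool sketch_is_class_rt(unsigned int priority)
{
	return (priority >> SKETCH_IOPRIO_CLASS_SHIFT) == SKETCH_CLASS_RT;
}
```

The RT check looks only at the top three bits, so any RT value counts as RT regardless of its level. This is what the log's 0/1 `Command priority` column relies on when log_prio is not set.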
diff --git a/os/os.h b/os/os.h
index 17daf91d..827b61e9 100644
--- a/os/os.h
+++ b/os/os.h
@@ -117,7 +117,11 @@ static inline int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu_index)
 extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 
+#ifndef FIO_HAVE_IOPRIO_CLASS
+#define ioprio_value_is_class_rt(prio)	(false)
+#endif
 #ifndef FIO_HAVE_IOPRIO
+#define ioprio_value(prioclass, prio)	(0)
 #define ioprio_set(which, who, prioclass, prio)	(0)
 #endif
 
diff --git a/server.h b/server.h
index daed057a..3ff32d9a 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 92,
+	FIO_SERVER_VER			= 93,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
@@ -193,6 +193,7 @@ struct cmd_iolog_pdu {
 	uint32_t log_type;
 	uint32_t compressed;
 	uint32_t log_offset;
+	uint32_t log_prio;
 	uint32_t log_hist_coarseness;
 	uint8_t name[FIO_NET_NAME_MAX];
 	struct io_sample samples[0];
diff --git a/stat.c b/stat.c
index a8a96c85..99275620 100644
--- a/stat.c
+++ b/stat.c
@@ -2860,7 +2860,8 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 
 static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 			     enum fio_ddir ddir, unsigned long long bs,
-			     unsigned long t, uint64_t offset, uint8_t priority_bit)
+			     unsigned long t, uint64_t offset,
+			     unsigned int priority)
 {
 	struct io_logs *cur_log;
 
@@ -2879,7 +2880,7 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 		s->time = t + (iolog->td ? iolog->td->unix_epoch : 0);
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
-		s->priority_bit = priority_bit;
+		s->priority = priority;
 
 		if (iolog->log_offset) {
 			struct io_sample_offset *so = (void *) s;
@@ -2956,7 +2957,7 @@ void reset_io_stats(struct thread_data *td)
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
-			      unsigned long elapsed, bool log_max, uint8_t priority_bit)
+			      unsigned long elapsed, bool log_max)
 {
 	/*
 	 * Note an entry in the log. Use the mean from the logged samples,
@@ -2971,26 +2972,26 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 		else
 			data.val = iolog->avg_window[ddir].mean.u.f + 0.50;
 
-		__add_log_sample(iolog, data, ddir, 0, elapsed, 0, priority_bit);
+		__add_log_sample(iolog, data, ddir, 0, elapsed, 0, 0);
 	}
 
 	reset_io_stat(&iolog->avg_window[ddir]);
 }
 
 static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
-			     bool log_max, uint8_t priority_bit)
+			     bool log_max)
 {
 	int ddir;
 
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
-		__add_stat_to_log(iolog, ddir, elapsed, log_max, priority_bit);
+		__add_stat_to_log(iolog, ddir, elapsed, log_max);
 }
 
 static unsigned long add_log_sample(struct thread_data *td,
 				    struct io_log *iolog,
 				    union io_sample_data data,
 				    enum fio_ddir ddir, unsigned long long bs,
-				    uint64_t offset, uint8_t priority_bit)
+				    uint64_t offset, unsigned int ioprio)
 {
 	unsigned long elapsed, this_window;
 
@@ -3003,7 +3004,8 @@ static unsigned long add_log_sample(struct thread_data *td,
 	 * If no time averaging, just add the log sample.
 	 */
 	if (!iolog->avg_msec) {
-		__add_log_sample(iolog, data, ddir, bs, elapsed, offset, priority_bit);
+		__add_log_sample(iolog, data, ddir, bs, elapsed, offset,
+				 ioprio);
 		return 0;
 	}
 
@@ -3027,7 +3029,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 			return diff;
 	}
 
-	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0, priority_bit);
+	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0);
 
 	iolog->avg_last[ddir] = elapsed - (elapsed % iolog->avg_msec);
 
@@ -3041,19 +3043,19 @@ void finalize_logs(struct thread_data *td, bool unit_logs)
 	elapsed = mtime_since_now(&td->epoch);
 
 	if (td->clat_log && unit_logs)
-		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0, 0);
+		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0);
 	if (td->slat_log && unit_logs)
-		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0, 0);
+		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0);
 	if (td->lat_log && unit_logs)
-		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0, 0);
+		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0);
 	if (td->bw_log && (unit_logs == per_unit_log(td->bw_log)))
-		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0, 0);
+		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0);
 	if (td->iops_log && (unit_logs == per_unit_log(td->iops_log)))
-		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0, 0);
+		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
 }
 
-void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long long bs,
-					uint8_t priority_bit)
+void add_agg_sample(union io_sample_data data, enum fio_ddir ddir,
+		    unsigned long long bs)
 {
 	struct io_log *iolog;
 
@@ -3061,7 +3063,7 @@ void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long
 		return;
 
 	iolog = agg_io_log[ddir];
-	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0, priority_bit);
+	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0, 0);
 }
 
 void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
@@ -3083,14 +3085,14 @@ static void add_lat_percentile_sample_noprio(struct thread_stat *ts,
 }
 
 static void add_lat_percentile_sample(struct thread_stat *ts,
-				unsigned long long nsec, enum fio_ddir ddir, uint8_t priority_bit,
-				enum fio_lat lat)
+				unsigned long long nsec, enum fio_ddir ddir,
+				bool high_prio, enum fio_lat lat)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 
 	add_lat_percentile_sample_noprio(ts, nsec, ddir, lat);
 
-	if (!priority_bit)
+	if (!high_prio)
 		ts->io_u_plat_low_prio[ddir][idx]++;
 	else
 		ts->io_u_plat_high_prio[ddir][idx]++;
@@ -3098,7 +3100,7 @@ static void add_lat_percentile_sample(struct thread_stat *ts,
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long long nsec, unsigned long long bs,
-		     uint64_t offset, uint8_t priority_bit)
+		     uint64_t offset, unsigned int ioprio, bool high_prio)
 {
 	const bool needs_lock = td_async_processing(td);
 	unsigned long elapsed, this_window;
@@ -3111,7 +3113,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
 	if (!ts->lat_percentiles) {
-		if (priority_bit)
+		if (high_prio)
 			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
 		else
 			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
@@ -3119,13 +3121,13 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	if (td->clat_log)
 		add_log_sample(td, td->clat_log, sample_val(nsec), ddir, bs,
-			       offset, priority_bit);
+			       offset, ioprio);
 
 	if (ts->clat_percentiles) {
 		if (ts->lat_percentiles)
 			add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_CLAT);
 		else
-			add_lat_percentile_sample(ts, nsec, ddir, priority_bit, FIO_CLAT);
+			add_lat_percentile_sample(ts, nsec, ddir, high_prio, FIO_CLAT);
 	}
 
 	if (iolog && iolog->hist_msec) {
@@ -3154,7 +3156,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 				FIO_IO_U_PLAT_NR * sizeof(uint64_t));
 			flist_add(&dst->list, &hw->list);
 			__add_log_sample(iolog, sample_plat(dst), ddir, bs,
-						elapsed, offset, priority_bit);
+					 elapsed, offset, ioprio);
 
 			/*
 			 * Update the last time we recorded as being now, minus
@@ -3171,8 +3173,8 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
-			unsigned long long nsec, unsigned long long bs, uint64_t offset,
-			uint8_t priority_bit)
+		     unsigned long long nsec, unsigned long long bs,
+		     uint64_t offset, unsigned int ioprio)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -3186,8 +3188,8 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->slat_stat[ddir], nsec);
 
 	if (td->slat_log)
-		add_log_sample(td, td->slat_log, sample_val(nsec), ddir, bs, offset,
-			priority_bit);
+		add_log_sample(td, td->slat_log, sample_val(nsec), ddir, bs,
+			       offset, ioprio);
 
 	if (ts->slat_percentiles)
 		add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_SLAT);
@@ -3198,7 +3200,7 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		    unsigned long long nsec, unsigned long long bs,
-		    uint64_t offset, uint8_t priority_bit)
+		    uint64_t offset, unsigned int ioprio, bool high_prio)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -3213,11 +3215,11 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	if (td->lat_log)
 		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
-			       offset, priority_bit);
+			       offset, ioprio);
 
 	if (ts->lat_percentiles) {
-		add_lat_percentile_sample(ts, nsec, ddir, priority_bit, FIO_LAT);
-		if (priority_bit)
+		add_lat_percentile_sample(ts, nsec, ddir, high_prio, FIO_LAT);
+		if (high_prio)
 			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
 		else
 			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
@@ -3246,7 +3248,7 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 
 	if (td->bw_log)
 		add_log_sample(td, td->bw_log, sample_val(rate), io_u->ddir,
-			       bytes, io_u->offset, io_u_is_prio(io_u));
+			       bytes, io_u->offset, io_u->ioprio);
 
 	td->stat_io_bytes[io_u->ddir] = td->this_io_bytes[io_u->ddir];
 
@@ -3300,7 +3302,8 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			next = add_log_sample(td, log, sample_val(rate), ddir, bs, 0, 0);
+			next = add_log_sample(td, log, sample_val(rate), ddir,
+					      bs, 0, 0);
 			next_log = min(next_log, next);
 		}
 
@@ -3340,7 +3343,7 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 
 	if (td->iops_log)
 		add_log_sample(td, td->iops_log, sample_val(1), io_u->ddir,
-			       bytes, io_u->offset, io_u_is_prio(io_u));
+			       bytes, io_u->offset, io_u->ioprio);
 
 	td->stat_io_blocks[io_u->ddir] = td->this_io_blocks[io_u->ddir];
 
diff --git a/stat.h b/stat.h
index d08d4dc0..a06237e7 100644
--- a/stat.h
+++ b/stat.h
@@ -341,13 +341,12 @@ extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
 extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned long long, uint64_t, uint8_t);
+			   unsigned long long, uint64_t, unsigned int, bool);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned long long, uint64_t, uint8_t);
+			    unsigned long long, uint64_t, unsigned int, bool);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned long long, uint64_t, uint8_t);
-extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long bs,
-				uint8_t priority_bit);
+				unsigned long long, uint64_t, unsigned int);
+extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long);
 extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
diff --git a/thread_options.h b/thread_options.h
index 4b4ecfe1..9990ab9b 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -44,6 +44,12 @@ enum dedupe_mode {
 #define BSSPLIT_MAX	64
 #define ZONESPLIT_MAX	256
 
+struct split {
+	unsigned int nr;
+	unsigned long long val1[ZONESPLIT_MAX];
+	unsigned long long val2[ZONESPLIT_MAX];
+};
+
 struct bssplit {
 	uint64_t bs;
 	uint32_t perc;
@@ -368,6 +374,8 @@ struct thread_options {
 	unsigned int ignore_zone_limits;
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
+
+	unsigned int log_prio;
 };
 
 #define FIO_TOP_STR_MAX		256
@@ -671,6 +679,8 @@ struct thread_options_pack {
 	uint32_t zone_mode;
 	int32_t max_open_zones;
 	uint32_t ignore_zone_limits;
+
+	uint32_t log_prio;
 } __attribute__((packed));
 
 extern void convert_thread_options_to_cpu(struct thread_options *o, struct thread_options_pack *top);
@@ -678,4 +688,13 @@ extern void convert_thread_options_to_net(struct thread_options_pack *top, struc
 extern int fio_test_cconv(struct thread_options *);
 extern void options_default_fill(struct thread_options *o);
 
+typedef int (split_parse_fn)(struct thread_options *, void *,
+			     enum fio_ddir, char *, bool);
+
+extern int str_split_parse(struct thread_data *td, char *str,
+			   split_parse_fn *fn, void *eo, bool data);
+
+extern int split_parse_ddir(struct thread_options *o, struct split *split,
+			    char *str, bool absolute, unsigned int max_splits);
+
 #endif
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
index 5becc4d9..cfd2fd8e 100644
--- a/tools/fiograph/fiograph.conf
+++ b/tools/fiograph/fiograph.conf
@@ -51,10 +51,10 @@ specific_options=https  http_host  http_user  http_pass  http_s3_key  http_s3_ke
 specific_options=ime_psync  ime_psyncv
 
 [ioengine_io_uring]
-specific_options=hipri  cmdprio_percentage  cmdprio_percentage  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async
+specific_options=hipri  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async
 
 [ioengine_libaio]
-specific_options=userspace_reap  cmdprio_percentage  cmdprio_percentage  nowait
+specific_options=userspace_reap  cmdprio_percentage  cmdprio_class  cmdprio  cmdprio_bssplit  nowait
 
 [ioengine_libcufile]
 specific_options=gpu_dev_ids  cuda_io
diff --git a/tools/fiograph/fiograph.py b/tools/fiograph/fiograph.py
index 7695c964..b5669a2d 100755
--- a/tools/fiograph/fiograph.py
+++ b/tools/fiograph/fiograph.py
@@ -292,9 +292,11 @@ def setup_commandline():
 def main():
     global config_file
     args = setup_commandline()
-    output_file = args.file
     if args.output is None:
+        output_file = args.file
         output_file = output_file.replace('.fio', '')
+    else:
+        output_file = args.output
     config_file = configparser.RawConfigParser(allow_no_value=True)
     config_file.read(args.config)
     fio_to_graphviz(args.file, args.format).render(output_file, view=args.view)


* Recent changes (master)
@ 2021-09-03 12:00 Jens Axboe
From: Jens Axboe @ 2021-09-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4f2152278e0b3c35ded02fb3e6fb550eab7bedcd:

  t/io_uring: further simplify inflight tracking (2021-08-28 15:37:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f3463241727215e228a60dc3b9a1ba2996f149a1:

  oslib: Fix blkzoned_get_max_open_zones() (2021-09-02 20:56:19 -0600)

----------------------------------------------------------------
Damien Le Moal (1):
      oslib: Fix blkzoned_get_max_open_zones()

 oslib/linux-blkzoned.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 4e441d29..185bd501 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -169,8 +169,10 @@ int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
 		return -EIO;
 
 	max_open_str = blkzoned_get_sysfs_attr(f->file_name, "queue/max_open_zones");
-	if (!max_open_str)
+	if (!max_open_str) {
+		*max_open_zones = 0;
 		return 0;
+	}
 
 	dprint(FD_ZBD, "%s: max open zones supported by device: %s\n",
 	       f->file_name, max_open_str);



* Recent changes (master)
@ 2021-08-29 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 26976a134e44f583f91e05df95ef8ec5a9cc968d:

  t/io_uring: pretty up multi-file depths (2021-08-27 12:44:02 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4f2152278e0b3c35ded02fb3e6fb550eab7bedcd:

  t/io_uring: further simplify inflight tracking (2021-08-28 15:37:25 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: further simplify inflight tracking

 t/io_uring.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 607c7946..3130e469 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -530,7 +530,7 @@ static void file_depths(char *buf)
 			if (prev)
 				p += sprintf(p, " %d", f->pending_ios);
 			else
-				p += sprintf(p, "%d ", f->pending_ios);
+				p += sprintf(p, "%d", f->pending_ios);
 			prev = true;
 		}
 	}
@@ -708,9 +708,8 @@ int main(int argc, char *argv[])
 		} else
 			rpc = ipc = -1;
 		file_depths(fdepths);
-		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (%s)\n",
-				this_done - done, rpc, ipc, s->inflight,
-				fdepths);
+		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=(%s)\n",
+				this_done - done, rpc, ipc, fdepths);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;



* Recent changes (master)
@ 2021-08-28 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit fd70e3619c00bc9f7b2f80cadf3fdb348cbacf51:

  io_uring: don't clear recently set sqe->rw_flags (2021-08-26 10:50:05 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 26976a134e44f583f91e05df95ef8ec5a9cc968d:

  t/io_uring: pretty up multi-file depths (2021-08-27 12:44:02 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: pretty up multi-file depths

 t/io_uring.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 538cc7d4..607c7946 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -515,6 +515,7 @@ static int setup_ring(struct submitter *s)
 
 static void file_depths(char *buf)
 {
+	bool prev = false;
 	char *p;
 	int i, j;
 
@@ -526,10 +527,11 @@ static void file_depths(char *buf)
 		for (i = 0; i < s->nr_files; i++) {
 			struct file *f = &s->files[i];
 
-			if (i + 1 == s->nr_files)
-				p += sprintf(p, "%d", f->pending_ios);
+			if (prev)
+				p += sprintf(p, " %d", f->pending_ios);
 			else
-				p += sprintf(p, "%d, ", f->pending_ios);
+				p += sprintf(p, "%d ", f->pending_ios);
+			prev = true;
 		}
 	}
 }



* Recent changes (master)
@ 2021-08-27 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 15ce99bb71e7c289f62ddee94e0149f6c81549de:

  Merge branch 'master' of https://github.com/DamonPalovaara/fio (2021-08-20 20:58:42 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fd70e3619c00bc9f7b2f80cadf3fdb348cbacf51:

  io_uring: don't clear recently set sqe->rw_flags (2021-08-26 10:50:05 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'wip-cxx' of https://github.com/tchaikov/fio

Kefu Chai (1):
      arch,lib/seqlock: implement seqlock with C++ atomic if compiled with C++

Niklas Cassel (3):
      io_uring: always initialize sqe->flags
      io_uring: fix misbehaving cmdprio_percentage option
      io_uring: don't clear recently set sqe->rw_flags

 arch/arch.h        | 20 ++++++++++++++++++++
 engines/io_uring.c |  7 +++++--
 lib/seqlock.h      |  4 ++++
 3 files changed, 29 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/arch/arch.h b/arch/arch.h
index a25779d4..fca003be 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -1,7 +1,11 @@
 #ifndef ARCH_H
 #define ARCH_H
 
+#ifdef __cplusplus
+#include <atomic>
+#else
 #include <stdatomic.h>
+#endif
 
 #include "../lib/types.h"
 
@@ -36,6 +40,21 @@ extern unsigned long arch_flags;
 
 #define ARCH_CPU_CLOCK_WRAPS
 
+#ifdef __cplusplus
+#define atomic_add(p, v)						\
+	std::atomic_fetch_add(p, (v))
+#define atomic_sub(p, v)						\
+	std::atomic_fetch_sub(p, (v))
+#define atomic_load_relaxed(p)					\
+	std::atomic_load_explicit(p,				\
+			     std::memory_order_relaxed)
+#define atomic_load_acquire(p)					\
+	std::atomic_load_explicit(p,				\
+			     std::memory_order_acquire)
+#define atomic_store_release(p, v)				\
+	std::atomic_store_explicit(p, (v),			\
+			     std::memory_order_release)
+#else
 #define atomic_add(p, v)					\
 	atomic_fetch_add((_Atomic typeof(*(p)) *)(p), v)
 #define atomic_sub(p, v)					\
@@ -49,6 +68,7 @@ extern unsigned long arch_flags;
 #define atomic_store_release(p, v)				\
 	atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v),	\
 			      memory_order_release)
+#endif
 
 /* IWYU pragma: begin_exports */
 #if defined(__i386__)
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 9c091e37..b8d4cf91 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -234,6 +234,7 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		sqe->flags = IOSQE_FIXED_FILE;
 	} else {
 		sqe->fd = f->fd;
+		sqe->flags = 0;
 	}
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
@@ -261,8 +262,9 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 				sqe->len = 1;
 			}
 		}
+		sqe->rw_flags = 0;
 		if (!td->o.odirect && o->uncached)
-			sqe->rw_flags = RWF_UNCACHED;
+			sqe->rw_flags |= RWF_UNCACHED;
 		if (o->nowait)
 			sqe->rw_flags |= RWF_NOWAIT;
 		if (ld->ioprio_class_set)
@@ -270,7 +272,6 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		if (ld->ioprio_set)
 			sqe->ioprio |= td->o.ioprio;
 		sqe->off = io_u->offset;
-		sqe->rw_flags = 0;
 	} else if (ddir_sync(io_u->ddir)) {
 		sqe->ioprio = 0;
 		if (io_u->ddir == DDIR_SYNC_FILE_RANGE) {
@@ -383,6 +384,8 @@ static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
 	if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
 		ld->sqes[io_u->index].ioprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
 		io_u->flags |= IO_U_F_PRIORITY;
+	} else {
+		ld->sqes[io_u->index].ioprio = 0;
 	}
 	return;
 }
diff --git a/lib/seqlock.h b/lib/seqlock.h
index 56f3e37d..ef3aa091 100644
--- a/lib/seqlock.h
+++ b/lib/seqlock.h
@@ -5,7 +5,11 @@
 #include "../arch/arch.h"
 
 struct seqlock {
+#ifdef __cplusplus
+	std::atomic<unsigned int> sequence;
+#else
 	volatile unsigned int sequence;
+#endif
 };
 
 static inline void seqlock_init(struct seqlock *s)



* Recent changes (master)
@ 2021-08-21 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-21 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 864314464e2772a9885da34ea041f130073affe9:

  Merge branch 'patch-1' of https://github.com/antroseco/fio (2021-08-18 10:47:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 15ce99bb71e7c289f62ddee94e0149f6c81549de:

  Merge branch 'master' of https://github.com/DamonPalovaara/fio (2021-08-20 20:58:42 -0600)

----------------------------------------------------------------
Damon Palovaara (1):
      fixed type boot->bool

Jens Axboe (1):
      Merge branch 'master' of https://github.com/DamonPalovaara/fio

 HOWTO | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 9bfd38b4..a2cf20f6 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2600,7 +2600,7 @@ with the caveat that when used on the command line, they must come after the
 
 	Specify the time between the SIGTERM and SIGKILL signals. Default is 1 second.
 
-.. option:: std_redirect=boot : [exec]
+.. option:: std_redirect=bool : [exec]
 
 	If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
 



* Recent changes (master)
@ 2021-08-19 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit df9e8b65a52fdab5a1ac48847c44d7201faa3cf1:

  Merge branch 'dfs_update_13_api' of https://github.com/johannlombardi/fio (2021-08-13 10:01:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 864314464e2772a9885da34ea041f130073affe9:

  Merge branch 'patch-1' of https://github.com/antroseco/fio (2021-08-18 10:47:55 -0600)

----------------------------------------------------------------
Andreas Economides (1):
      server: reopen standard streams to /dev/null

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/antroseco/fio

 server.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index 42eaa4b1..859a401b 100644
--- a/server.c
+++ b/server.c
@@ -2565,6 +2565,7 @@ static int write_pid(pid_t pid, const char *pidfile)
  */
 int fio_start_server(char *pidfile)
 {
+	FILE *file;
 	pid_t pid;
 	int ret;
 
@@ -2597,14 +2598,28 @@ int fio_start_server(char *pidfile)
 	setsid();
 	openlog("fio", LOG_NDELAY|LOG_NOWAIT|LOG_PID, LOG_USER);
 	log_syslog = true;
-	close(STDIN_FILENO);
-	close(STDOUT_FILENO);
-	close(STDERR_FILENO);
+
+	file = freopen("/dev/null", "r", stdin);
+	if (!file)
+		perror("freopen");
+
+	file = freopen("/dev/null", "w", stdout);
+	if (!file)
+		perror("freopen");
+
+	file = freopen("/dev/null", "w", stderr);
+	if (!file)
+		perror("freopen");
+
 	f_out = NULL;
 	f_err = NULL;
 
 	ret = fio_server();
 
+	fclose(stdin);
+	fclose(stdout);
+	fclose(stderr);
+
 	closelog();
 	unlink(pidfile);
 	free(pidfile);



* Recent changes (master)
@ 2021-08-14 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 543196617ce45b6a04fb039a3b9c6d06c9b58309:

  t/io_uring: allow multiple IO threads (2021-08-11 16:57:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to df9e8b65a52fdab5a1ac48847c44d7201faa3cf1:

  Merge branch 'dfs_update_13_api' of https://github.com/johannlombardi/fio (2021-08-13 10:01:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'dfs_update_13_api' of https://github.com/johannlombardi/fio

Johann Lombardi (1):
      engines/dfs: add support for 1.3 DAOS API

 HOWTO         |  4 ++--
 engines/dfs.c | 24 ++++++++++++++++++------
 fio.1         |  4 ++--
 3 files changed, 22 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 04ea284b..9bfd38b4 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2561,11 +2561,11 @@ with the caveat that when used on the command line, they must come after the
 
 .. option:: pool=str : [dfs]
 
-	Specify the UUID of the DAOS pool to connect to.
+	Specify the label or UUID of the DAOS pool to connect to.
 
 .. option:: cont=str : [dfs]
 
-	Specify the UUID of the DAOS container to open.
+	Specify the label or UUID of the DAOS container to open.
 
 .. option:: chunk_size=int : [dfs]
 
diff --git a/engines/dfs.c b/engines/dfs.c
index 0343b101..664e8b13 100644
--- a/engines/dfs.c
+++ b/engines/dfs.c
@@ -49,19 +49,19 @@ struct daos_fio_options {
 static struct fio_option options[] = {
 	{
 		.name		= "pool",
-		.lname		= "pool uuid",
+		.lname		= "pool uuid or label",
 		.type		= FIO_OPT_STR_STORE,
 		.off1		= offsetof(struct daos_fio_options, pool),
-		.help		= "DAOS pool uuid",
+		.help		= "DAOS pool uuid or label",
 		.category	= FIO_OPT_C_ENGINE,
 		.group		= FIO_OPT_G_DFS,
 	},
 	{
 		.name           = "cont",
-		.lname          = "container uuid",
+		.lname          = "container uuid or label",
 		.type           = FIO_OPT_STR_STORE,
 		.off1           = offsetof(struct daos_fio_options, cont),
-		.help           = "DAOS container uuid",
+		.help           = "DAOS container uuid or label",
 		.category	= FIO_OPT_C_ENGINE,
 		.group		= FIO_OPT_G_DFS,
 	},
@@ -103,7 +103,6 @@ static struct fio_option options[] = {
 static int daos_fio_global_init(struct thread_data *td)
 {
 	struct daos_fio_options	*eo = td->eo;
-	uuid_t			pool_uuid, co_uuid;
 	daos_pool_info_t	pool_info;
 	daos_cont_info_t	co_info;
 	int			rc = 0;
@@ -124,6 +123,10 @@ static int daos_fio_global_init(struct thread_data *td)
 		return rc;
 	}
 
+#if !defined(DAOS_API_VERSION_MAJOR) || \
+    (DAOS_API_VERSION_MAJOR == 1 && DAOS_API_VERSION_MINOR < 3)
+	uuid_t pool_uuid, co_uuid;
+
 	rc = uuid_parse(eo->pool, pool_uuid);
 	if (rc) {
 		log_err("Failed to parse 'Pool uuid': %s\n", eo->pool);
@@ -137,6 +140,7 @@ static int daos_fio_global_init(struct thread_data *td)
 		td_verror(td, EINVAL, "uuid_parse(eo->cont)");
 		return EINVAL;
 	}
+#endif
 
 	/* Connect to the DAOS pool */
 #if !defined(DAOS_API_VERSION_MAJOR) || DAOS_API_VERSION_MAJOR < 1
@@ -152,9 +156,12 @@ static int daos_fio_global_init(struct thread_data *td)
 	rc = daos_pool_connect(pool_uuid, NULL, svcl, DAOS_PC_RW,
 			&poh, &pool_info, NULL);
 	d_rank_list_free(svcl);
-#else
+#elif (DAOS_API_VERSION_MAJOR == 1 && DAOS_API_VERSION_MINOR < 3)
 	rc = daos_pool_connect(pool_uuid, NULL, DAOS_PC_RW, &poh, &pool_info,
 			       NULL);
+#else
+	rc = daos_pool_connect(eo->pool, NULL, DAOS_PC_RW, &poh, &pool_info,
+			       NULL);
 #endif
 	if (rc) {
 		log_err("Failed to connect to pool %d\n", rc);
@@ -163,7 +170,12 @@ static int daos_fio_global_init(struct thread_data *td)
 	}
 
 	/* Open the DAOS container */
+#if !defined(DAOS_API_VERSION_MAJOR) || \
+    (DAOS_API_VERSION_MAJOR == 1 && DAOS_API_VERSION_MINOR < 3)
 	rc = daos_cont_open(poh, co_uuid, DAOS_COO_RW, &coh, &co_info, NULL);
+#else
+	rc = daos_cont_open(poh, eo->cont, DAOS_COO_RW, &coh, &co_info, NULL);
+#endif
 	if (rc) {
 		log_err("Failed to open container: %d\n", rc);
 		td_verror(td, rc, "daos_cont_open");
diff --git a/fio.1 b/fio.1
index ff100a1c..382cebfc 100644
--- a/fio.1
+++ b/fio.1
@@ -2326,10 +2326,10 @@ the use of cudaMemcpy.
 .RE
 .TP
 .BI (dfs)pool
-Specify the UUID of the DAOS pool to connect to.
+Specify the label or UUID of the DAOS pool to connect to.
 .TP
 .BI (dfs)cont
-Specify the UUID of the DAOS DAOS container to open.
+Specify the label or UUID of the DAOS container to open.
 .TP
 .BI (dfs)chunk_size
 Specificy a different chunk size (in bytes) for the dfs file.



* Recent changes (master)
@ 2021-08-12 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-12 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit faff87e6f0da68853908652a95f0ec40dd12869d:

  t/zbd: Add test #58 to test zone reset by trim workload (2021-08-06 16:39:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 543196617ce45b6a04fb039a3b9c6d06c9b58309:

  t/io_uring: allow multiple IO threads (2021-08-11 16:57:19 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: allow multiple IO threads

 t/io_uring.c | 115 ++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 82 insertions(+), 33 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index ff4c7a7c..538cc7d4 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -62,6 +62,7 @@ struct file {
 struct submitter {
 	pthread_t thread;
 	int ring_fd;
+	int index;
 	struct io_sq_ring sq_ring;
 	struct io_uring_sqe *sqes;
 	struct io_cq_ring cq_ring;
@@ -93,6 +94,7 @@ static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
 static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
 static int do_nop = 0;		/* no-op SQ ring commands */
+static int nthreads = 1;
 
 static int vectored = 1;
 
@@ -404,10 +406,25 @@ submit:
 	return NULL;
 }
 
+static struct submitter *get_submitter(int offset)
+{
+	void *ret;
+
+	ret = submitter;
+	if (offset)
+		ret += offset * (sizeof(*submitter) + depth * sizeof(struct iovec));
+	return ret;
+}
+
 static void sig_int(int sig)
 {
+	int j;
+
 	printf("Exiting on signal %d\n", sig);
-	submitter->finish = 1;
+	for (j = 0; j < nthreads; j++) {
+		struct submitter *s = get_submitter(j);
+		s->finish = 1;
+	}
 	finish = 1;
 }
 
@@ -498,19 +515,22 @@ static int setup_ring(struct submitter *s)
 
 static void file_depths(char *buf)
 {
-	struct submitter *s = submitter;
 	char *p;
-	int i;
+	int i, j;
 
 	buf[0] = '\0';
 	p = buf;
-	for (i = 0; i < s->nr_files; i++) {
-		struct file *f = &s->files[i];
+	for (j = 0; j < nthreads; j++) {
+		struct submitter *s = get_submitter(j);
 
-		if (i + 1 == s->nr_files)
-			p += sprintf(p, "%d", f->pending_ios);
-		else
-			p += sprintf(p, "%d, ", f->pending_ios);
+		for (i = 0; i < s->nr_files; i++) {
+			struct file *f = &s->files[i];
+
+			if (i + 1 == s->nr_files)
+				p += sprintf(p, "%d", f->pending_ios);
+			else
+				p += sprintf(p, "%d, ", f->pending_ios);
+		}
 	}
 }
 
@@ -530,7 +550,7 @@ int main(int argc, char *argv[])
 {
 	struct submitter *s;
 	unsigned long done, calls, reap;
-	int err, i, flags, fd, opt;
+	int err, i, j, flags, fd, opt;
 	char *fdepths;
 	void *ret;
 
@@ -539,7 +559,7 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:n:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
@@ -562,6 +582,9 @@ int main(int argc, char *argv[])
 		case 'F':
 			register_files = !!atoi(optarg);
 			break;
+		case 'n':
+			nthreads = atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -570,18 +593,25 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	submitter = malloc(sizeof(*submitter) + depth * sizeof(struct iovec));
-	memset(submitter, 0, sizeof(*submitter) + depth * sizeof(struct iovec));
-	s = submitter;
+	submitter = calloc(nthreads, sizeof(*submitter) +
+				depth * sizeof(struct iovec));
+	for (j = 0; j < nthreads; j++) {
+		s = get_submitter(j);
+		s->index = j;
+		s->done = s->calls = s->reaps = 0;
+	}
 
 	flags = O_RDONLY | O_NOATIME;
 	if (!buffered)
 		flags |= O_DIRECT;
 
+	j = 0;
 	i = optind;
+	printf("i %d, argc %d\n", i, argc);
 	while (!do_nop && i < argc) {
 		struct file *f;
 
+		s = get_submitter(j);
 		if (s->nr_files == MAX_FDS) {
 			printf("Max number of files (%d) reached\n", MAX_FDS);
 			break;
@@ -604,9 +634,11 @@ int main(int argc, char *argv[])
 		}
 		f->max_blocks--;
 
-		printf("Added file %s\n", argv[i]);
+		printf("Added file %s (submitter %d)\n", argv[i], s->index);
 		s->nr_files++;
 		i++;
+		if (++j >= nthreads)
+			j = 0;
 	}
 
 	if (fixedbufs) {
@@ -622,28 +654,39 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	for (i = 0; i < depth; i++) {
-		void *buf;
+	for (j = 0; j < nthreads; j++) {
+		s = get_submitter(j);
+		for (i = 0; i < depth; i++) {
+			void *buf;
 
-		if (posix_memalign(&buf, bs, bs)) {
-			printf("failed alloc\n");
-			return 1;
+			if (posix_memalign(&buf, bs, bs)) {
+				printf("failed alloc\n");
+				return 1;
+			}
+			s->iovecs[i].iov_base = buf;
+			s->iovecs[i].iov_len = bs;
 		}
-		s->iovecs[i].iov_base = buf;
-		s->iovecs[i].iov_len = bs;
 	}
 
-	err = setup_ring(s);
-	if (err) {
-		printf("ring setup failed: %s, %d\n", strerror(errno), err);
-		return 1;
+	for (j = 0; j < nthreads; j++) {
+		s = get_submitter(j);
+
+		err = setup_ring(s);
+		if (err) {
+			printf("ring setup failed: %s, %d\n", strerror(errno), err);
+			return 1;
+		}
 	}
+	s = get_submitter(0);
 	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d", polled, fixedbufs, register_files, buffered);
 	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", depth, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 
-	pthread_create(&s->thread, NULL, submitter_fn, s);
+	for (j = 0; j < nthreads; j++) {
+		s = get_submitter(j);
+		pthread_create(&s->thread, NULL, submitter_fn, s);
+	}
 
-	fdepths = malloc(8 * s->nr_files);
+	fdepths = malloc(8 * s->nr_files * nthreads);
 	reap = calls = done = 0;
 	do {
 		unsigned long this_done = 0;
@@ -652,9 +695,11 @@ int main(int argc, char *argv[])
 		unsigned long rpc = 0, ipc = 0;
 
 		sleep(1);
-		this_done += s->done;
-		this_call += s->calls;
-		this_reap += s->reaps;
+		for (j = 0; j < nthreads; j++) {
+			this_done += s->done;
+			this_call += s->calls;
+			this_reap += s->reaps;
+		}
 		if (this_call - calls) {
 			rpc = (this_done - done) / (this_call - calls);
 			ipc = (this_reap - reap) / (this_call - calls);
@@ -669,8 +714,12 @@ int main(int argc, char *argv[])
 		reap = this_reap;
 	} while (!finish);
 
-	pthread_join(s->thread, &ret);
-	close(s->ring_fd);
+	for (j = 0; j < nthreads; j++) {
+		s = get_submitter(j);
+		pthread_join(s->thread, &ret);
+		close(s->ring_fd);
+	}
 	free(fdepths);
+	free(submitter);
 	return 0;
 }



* Recent changes (master)
@ 2021-08-07 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2459bd33b3dbb7a34f28c612d595311a6bc7593d:

  ioengines: fix crash with --enghelp option (2021-08-04 12:49:57 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to faff87e6f0da68853908652a95f0ec40dd12869d:

  t/zbd: Add test #58 to test zone reset by trim workload (2021-08-06 16:39:31 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (5):
      zbd: Add min_bytes argument to zbd_find_zone()
      zbd: Support zone reset by trim
      engines/libzbc: Enable trim for libzbc I/O engine
      HOWTO/man: Describe trim support by zone reset for zoned devices
      t/zbd: Add test #58 to test zone reset by trim workload

 HOWTO                  |  8 +++++
 engines/libzbc.c       | 13 ++++----
 fio.1                  |  9 +++---
 io_u.c                 |  9 ++++++
 t/zbd/test-zbd-support | 26 +++++++++++++++
 zbd.c                  | 85 +++++++++++++++++++++++++++++++++++++++++---------
 zbd.h                  |  2 ++
 7 files changed, 128 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d4e620de..04ea284b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -992,6 +992,9 @@ Target file/device
 				single zone. The :option:`zoneskip` parameter
 				is ignored. :option:`zonerange` and
 				:option:`zonesize` must be identical.
+				Trim is handled using a zone reset operation.
+				Trim only considers non-empty sequential write
+				required and sequential write preferred zones.
 
 .. option:: zonerange=int
 
@@ -1965,6 +1968,11 @@ I/O engine
 			character devices. This engine supports trim operations.
 			The sg engine includes engine specific options.
 
+		**libzbc**
+			Read, write, trim and ZBC/ZAC operations to a zoned
+			block device using libzbc library. The target can be
+			either an SG character device or a block device file.
+
 		**null**
 			Doesn't transfer any data, just pretends to.  This is mainly used to
 			exercise fio itself and for debugging/testing purposes.
diff --git a/engines/libzbc.c b/engines/libzbc.c
index 7f2bc431..abee2043 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -14,6 +14,7 @@
 #include "fio.h"
 #include "err.h"
 #include "zbd_types.h"
+#include "zbd.h"
 
 struct libzbc_data {
 	struct zbc_device	*zdev;
@@ -63,7 +64,7 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 		return -EINVAL;
 	}
 
-	if (td_write(td)) {
+	if (td_write(td) || td_trim(td)) {
 		if (!read_only)
 			flags |= O_RDWR;
 	} else if (td_read(td)) {
@@ -71,10 +72,6 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 			flags |= O_RDWR;
 		else
 			flags |= O_RDONLY;
-	} else if (td_trim(td)) {
-		td_verror(td, EINVAL, "libzbc does not support trim");
-		log_err("%s: libzbc does not support trim\n", f->file_name);
-		return -EINVAL;
 	}
 
 	if (td->o.oatomic) {
@@ -411,7 +408,11 @@ static enum fio_q_status libzbc_queue(struct thread_data *td, struct io_u *io_u)
 		ret = zbc_flush(ld->zdev);
 		if (ret)
 			log_err("zbc_flush error %zd\n", ret);
-	} else if (io_u->ddir != DDIR_TRIM) {
+	} else if (io_u->ddir == DDIR_TRIM) {
+		ret = zbd_do_io_u_trim(td, io_u);
+		if (!ret)
+			ret = EINVAL;
+	} else {
 		log_err("Unsupported operation %u\n", io_u->ddir);
 		ret = -EINVAL;
 	}
diff --git a/fio.1 b/fio.1
index 9c12ad13..ff100a1c 100644
--- a/fio.1
+++ b/fio.1
@@ -766,6 +766,8 @@ starts. The \fBzonecapacity\fR parameter is ignored.
 Zoned block device mode. I/O happens sequentially in each zone, even if random
 I/O has been selected. Random I/O happens across all zones instead of being
 restricted to a single zone.
+Trim is handled using a zone reset operation. Trim only considers non-empty
+sequential write required and sequential write preferred zones.
 .RE
 .RE
 .TP
@@ -1761,10 +1763,9 @@ character devices. This engine supports trim operations. The
 sg engine includes engine specific options.
 .TP
 .B libzbc
-Synchronous I/O engine for SMR hard-disks using the \fBlibzbc\fR
-library. The target can be either an sg character device or
-a block device file. This engine supports the zonemode=zbd zone
-operations.
+Read, write, trim and ZBC/ZAC operations to a zoned block device using
+\fBlibzbc\fR library. The target can be either an SG character device or
+a block device file.
 .TP
 .B null
 Doesn't transfer any data, just pretends to. This is mainly used to
diff --git a/io_u.c b/io_u.c
index 9a1cd547..696d25cd 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2317,10 +2317,19 @@ int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	int ret;
 
+	if (td->o.zone_mode == ZONE_MODE_ZBD) {
+		ret = zbd_do_io_u_trim(td, io_u);
+		if (ret == io_u_completed)
+			return io_u->xfer_buflen;
+		if (ret)
+			goto err;
+	}
+
 	ret = os_trim(f, io_u->offset, io_u->xfer_buflen);
 	if (!ret)
 		return io_u->xfer_buflen;
 
+err:
 	io_u->error = ret;
 	return 0;
 #endif
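
With `zonemode=zbd`, the patched `do_io_u_trim()` first lets `zbd_do_io_u_trim()` try a zone reset. It falls back to `os_trim()` only when that helper returns 0, which means the zone has no write pointer. The standalone sketch below reproduces just that three-way dispatch. `sketch_zone_trim()` and `sketch_os_trim()` are hypothetical stubs for fio's helpers, and `IO_U_COMPLETED` mirrors the `io_u_completed = 2` value this series adds to `zbd.h`.

```c
#include <assert.h>
#include <errno.h>

#define IO_U_COMPLETED 2	/* mirrors io_u_completed added to zbd.h */

/* Hypothetical stand-in for zbd_do_io_u_trim(). */
static int sketch_zone_trim(int has_wp, int at_zone_start)
{
	if (!has_wp)
		return 0;		/* no write pointer: no reset possible */
	if (!at_zone_start)
		return -EINVAL;		/* trim must start at the zone start */
	return IO_U_COMPLETED;		/* zone reset done */
}

/* Hypothetical stand-in for os_trim(); always succeeds in this sketch. */
static int sketch_os_trim(void)
{
	return 0;
}

/* Same control flow as the patched do_io_u_trim(): returns the byte
 * count on success, or 0 with *error set on failure. */
static unsigned long long sketch_do_trim(int zbd_mode, int has_wp,
					 int at_zone_start,
					 unsigned long long buflen, int *error)
{
	int ret;

	*error = 0;
	if (zbd_mode) {
		ret = sketch_zone_trim(has_wp, at_zone_start);
		if (ret == IO_U_COMPLETED)
			return buflen;	/* the zone reset replaced the trim */
		if (ret)
			goto err;	/* negative errno from the zone path */
	}

	ret = sketch_os_trim();
	if (!ret)
		return buflen;
err:
	*error = ret;
	return 0;
}
```

For example, `sketch_do_trim(1, 1, 0, 4096, &error)` returns 0 and sets `error` to `-EINVAL`. This matches the `goto err` path for a trim that does not start at a zone boundary.
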
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 57e6d05e..5103c406 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1215,6 +1215,32 @@ test57() {
 		>> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
+# Random writes and random trims to sequential write required zones for 30s.
+test58() {
+    local off size bs
+
+    require_seq_zones 128 || return $SKIP_TESTCASE
+
+    size=$((zone_size * 128))
+    bs="$(max $((zone_size / 128)) "$logical_block_size")"
+    prep_write
+    off=$((first_sequential_zone_sector * 512))
+    run_fio --zonemode=zbd --direct=1 --zonesize="${zone_size}" --thread=1 \
+	    --filename="${dev}" --norandommap=1 \
+            --name="precondition"  --rw=write "$(ioengine "psync")" \
+            --offset="${off}" --size=$((zone_size * 16)) --bs="${bs}" \
+	    "${job_var_opts[@]}" \
+	    --name=wjob --wait_for="precondition" --rw=randwrite \
+	    "$(ioengine "libaio")" --iodepth=8 \
+	    --offset="${off}" --size="${size}" --bs="${bs}" \
+	    --time_based --runtime=30s --flow=128 "${job_var_opts[@]}" \
+	    --name=trimjob --wait_for="precondition" --rw=randtrim \
+	    "$(ioengine "psync")" \
+	    --offset="${off}" --size="${size}" --bs="${zone_size}" \
+	    --time_based --runtime=30s --flow=1 "${job_var_opts[@]}" \
+	    >>"${logfile}.${test_number}" 2>&1
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
diff --git a/zbd.c b/zbd.c
index 43f12b45..1b933ce4 100644
--- a/zbd.c
+++ b/zbd.c
@@ -375,12 +375,24 @@ static bool zbd_verify_bs(void)
 	int i, j, k;
 
 	for_each_td(td, i) {
+		if (td_trim(td) &&
+		    (td->o.min_bs[DDIR_TRIM] != td->o.max_bs[DDIR_TRIM] ||
+		     td->o.bssplit_nr[DDIR_TRIM])) {
+			log_info("bsrange and bssplit are not allowed for trim with zonemode=zbd\n");
+			return false;
+		}
 		for_each_file(td, f, j) {
 			uint64_t zone_size;
 
 			if (!f->zbd_info)
 				continue;
 			zone_size = f->zbd_info->zone_size;
+			if (td_trim(td) && td->o.bs[DDIR_TRIM] != zone_size) {
+				log_info("%s: trim block size %llu is not the zone size %llu\n",
+					 f->file_name, td->o.bs[DDIR_TRIM],
+					 (unsigned long long)zone_size);
+				return false;
+			}
 			for (k = 0; k < FIO_ARRAY_SIZE(td->o.bs); k++) {
 				if (td->o.verify != VERIFY_NONE &&
 				    zone_size % td->o.bs[k] != 0) {
@@ -1414,18 +1426,16 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 }
 
 /*
- * Find another zone for which @io_u fits in the readable data in the zone.
- * Search in zones @zb + 1 .. @zl. For random workload, also search in zones
- * @zb - 1 .. @zf.
+ * Find another zone which has @min_bytes of readable data. Search in zones
+ * @zb + 1 .. @zl. For random workload, also search in zones @zb - 1 .. @zf.
  *
  * Either returns NULL or returns a zone pointer. When the zone has write
  * pointer, hold the mutex for the zone.
  */
 static struct fio_zone_info *
-zbd_find_zone(struct thread_data *td, struct io_u *io_u,
+zbd_find_zone(struct thread_data *td, struct io_u *io_u, uint32_t min_bytes,
 	      struct fio_zone_info *zb, struct fio_zone_info *zl)
 {
-	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z1, *z2;
 	const struct fio_zone_info *const zf = get_zone(f, f->min_zone);
@@ -1438,7 +1448,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 		if (z1 < zl && z1->cond != ZBD_ZONE_COND_OFFLINE) {
 			if (z1->has_wp)
 				zone_lock(td, f, z1);
-			if (z1->start + min_bs <= z1->wp)
+			if (z1->start + min_bytes <= z1->wp)
 				return z1;
 			if (z1->has_wp)
 				zone_unlock(z1);
@@ -1449,14 +1459,14 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
 			if (z2->has_wp)
 				zone_lock(td, f, z2);
-			if (z2->start + min_bs <= z2->wp)
+			if (z2->start + min_bytes <= z2->wp)
 				return z2;
 			if (z2->has_wp)
 				zone_unlock(z2);
 		}
 	}
-	dprint(FD_ZBD, "%s: adjusting random read offset failed\n",
-	       f->file_name);
+	dprint(FD_ZBD, "%s: no zone has %d bytes of readable data\n",
+	       f->file_name, min_bytes);
 	return NULL;
 }
 
@@ -1531,9 +1541,6 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 		pthread_mutex_unlock(&zbd_info->mutex);
 		z->wp = zone_end;
 		break;
-	case DDIR_TRIM:
-		assert(z->wp == z->start);
-		break;
 	default:
 		break;
 	}
@@ -1785,7 +1792,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
 			zone_unlock(zb);
 			zl = get_zone(f, f->max_zone);
-			zb = zbd_find_zone(td, io_u, zb, zl);
+			zb = zbd_find_zone(td, io_u, min_bs, zb, zl);
 			if (!zb) {
 				dprint(FD_ZBD,
 				       "%s: zbd_find_zone(%lld, %llu) failed\n",
@@ -1913,8 +1920,23 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			(zbd_zone_capacity_end(zb) - io_u->offset), min_bs);
 		goto eof;
 	case DDIR_TRIM:
-		/* fall-through */
+		/* Check random trim targets a non-empty zone */
+		if (!td_random(td) || zb->wp > zb->start)
+			goto accept;
+
+		/* Find out a non-empty zone to trim */
+		zone_unlock(zb);
+		zl = get_zone(f, f->max_zone);
+		zb = zbd_find_zone(td, io_u, 1, zb, zl);
+		if (zb) {
+			io_u->offset = zb->start;
+			dprint(FD_ZBD, "%s: found new zone(%lld) for trim\n",
+			       f->file_name, io_u->offset);
+			goto accept;
+		}
+		goto eof;
 	case DDIR_SYNC:
+		/* fall-through */
 	case DDIR_DATASYNC:
 	case DDIR_SYNC_FILE_RANGE:
 	case DDIR_WAIT:
@@ -1955,3 +1977,38 @@ char *zbd_write_status(const struct thread_stat *ts)
 		return NULL;
 	return res;
 }
+
+/**
+ * zbd_do_io_u_trim - If reset zone is applicable, do reset zone instead of trim
+ *
+ * @td: FIO thread data.
+ * @io_u: FIO I/O unit.
+ *
+ * It is assumed that z->mutex is already locked.
+ * Return io_u_completed when reset zone succeeds. Return 0 when the target zone
+ * does not have write pointer. On error, return negative errno.
+ */
+int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_zone_info *z;
+	uint32_t zone_idx;
+	int ret;
+
+	zone_idx = zbd_zone_idx(f, io_u->offset);
+	z = get_zone(f, zone_idx);
+
+	if (!z->has_wp)
+		return 0;
+
+	if (io_u->offset != z->start) {
+		log_err("Trim offset not at zone start (%lld)\n", io_u->offset);
+		return -EINVAL;
+	}
+
+	ret = zbd_reset_zone((struct thread_data *)td, f, z);
+	if (ret < 0)
+		return ret;
+
+	return io_u_completed;
+}
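
The two rules this commit enforces for trims on zoned devices can be modelled in isolation. First, `zbd_verify_bs()` rejects `bsrange`/`bssplit` for trims and requires the trim block size to equal the zone size. Second, `zbd_do_io_u_trim()` turns an aligned trim of a zone with a write pointer into a reset. The sketch assumes equally sized zones starting at offset 0. It uses a hypothetical `struct sketch_zone` instead of fio's `struct fio_zone_info` and omits zone locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_U_COMPLETED 2	/* mirrors io_u_completed in zbd.h */

/* Hypothetical, simplified zone descriptor. */
struct sketch_zone {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* write pointer; == start when the zone is empty */
	bool has_wp;	/* false for conventional zones */
};

/* Mirrors the new zbd_verify_bs() checks: one fixed trim block size
 * (no bsrange, no bssplit) that equals the zone size. */
static bool trim_bs_ok(uint64_t min_bs, uint64_t max_bs,
		       unsigned int bssplit_nr, uint64_t bs,
		       uint64_t zone_size)
{
	if (min_bs != max_bs || bssplit_nr)
		return false;
	return bs == zone_size;
}

/* Sketch of zbd_do_io_u_trim(): with equal zones starting at 0, the
 * zone index is simply offset / zone_size. */
static int trim_as_zone_reset(struct sketch_zone *zones, uint32_t nr_zones,
			      uint64_t zone_size, uint64_t offset)
{
	uint64_t idx = offset / zone_size;
	struct sketch_zone *z;

	if (idx >= nr_zones)
		return -EINVAL;	/* bounds check added by this sketch */
	z = &zones[idx];

	if (!z->has_wp)
		return 0;	/* caller falls back to a regular trim */
	if (offset != z->start)
		return -EINVAL;	/* trims must start at a zone boundary */

	z->wp = z->start;	/* "reset": the zone becomes empty again */
	return IO_U_COMPLETED;
}
```

The bounds check on `idx` is an addition of this sketch. In fio, the zone index comes from `zbd_zone_idx()`.
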
diff --git a/zbd.h b/zbd.h
index 39dc45e3..0a73b41d 100644
--- a/zbd.h
+++ b/zbd.h
@@ -17,6 +17,7 @@ struct fio_file;
 enum io_u_action {
 	io_u_accept	= 0,
 	io_u_eof	= 1,
+	io_u_completed  = 2,
 };
 
 /**
@@ -99,6 +100,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 			      enum fio_ddir ddir);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
+int zbd_do_io_u_trim(const struct thread_data *td, struct io_u *io_u);
 
 static inline void zbd_close_file(struct fio_file *f)
 {
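
For random trims, `zbd_adjust_block()` now calls `zbd_find_zone(td, io_u, 1, zb, zl)`. That call asks for a zone with at least one byte of written data, so trims land only on non-empty zones. Below is a simplified sketch of that search. It alternates forward from `zb + 1` and, for random workloads, backward from `zb - 1`. Zones are plain array indices here, `zl` is exclusive, and the locking and offline-zone checks of the real function are left out. `struct zone_sketch` and `find_zone()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone model: only start and write pointer matter here. */
struct zone_sketch {
	uint64_t start;
	uint64_t wp;
};

/* Search zb+1 .. zl-1 and, for random workloads, zb-1 .. zf, alternating
 * forward and backward, for a zone with at least min_bytes of written
 * data. Returns the zone index, or -1 if no zone qualifies. */
static int find_zone(const struct zone_sketch *zones, int zf, int zl, int zb,
		     uint64_t min_bytes, bool is_random)
{
	int z1, z2;

	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
		if (z1 < zl && zones[z1].start + min_bytes <= zones[z1].wp)
			return z1;
		if (is_random && z2 >= zf &&
		    zones[z2].start + min_bytes <= zones[z2].wp)
			return z2;
	}
	return -1;
}
```

With `min_bytes = 1`, an empty zone (`wp == start`) never qualifies. This is the "non-empty" condition described in the HOWTO and man page text above.
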



* Recent changes (master)
@ 2021-08-05 12:00 Jens Axboe
From: Jens Axboe @ 2021-08-05 12:00 UTC
  To: fio

The following changes since commit 382975557e632efb506836bc1709789e615c9094:

  fio: remove raw device support (2021-08-03 12:20:22 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2459bd33b3dbb7a34f28c612d595311a6bc7593d:

  ioengines: fix crash with --enghelp option (2021-08-04 12:49:57 -0600)

----------------------------------------------------------------
Ankit Kumar (2):
      HOWTO: Add missing documentation for job_max_open_zones
      zbd: Improve random zone index generation logic

Vincent Fu (2):
      backend: clarify io scheduler setting error message
      ioengines: fix crash with --enghelp option

 HOWTO       |  5 +++++
 backend.c   |  2 +-
 ioengines.c | 10 +++++-----
 zbd.c       |  5 +++--
 4 files changed, 14 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 59c7f1ff..d4e620de 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1055,6 +1055,11 @@ Target file/device
 	number of open zones is defined as the number of zones to which write
 	commands are issued.
 
+.. option:: job_max_open_zones=int
+
+	Limit on the number of simultaneously opened zones per single
+	thread/process.
+
 .. option:: zone_reset_threshold=float
 
 	A number between zero and one that indicates the ratio of logical
diff --git a/backend.c b/backend.c
index 6290e0d6..808e4362 100644
--- a/backend.c
+++ b/backend.c
@@ -1407,7 +1407,7 @@ static int set_ioscheduler(struct thread_data *td, struct fio_file *file)
 
 	sprintf(tmp2, "[%s]", td->o.ioscheduler);
 	if (!strstr(tmp, tmp2)) {
-		log_err("fio: io scheduler %s not found\n", td->o.ioscheduler);
+		log_err("fio: unable to set io scheduler to %s\n", td->o.ioscheduler);
 		td_verror(td, EINVAL, "iosched_switch");
 		fclose(f);
 		return 1;
diff --git a/ioengines.c b/ioengines.c
index dd61af07..d08a511a 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -692,17 +692,17 @@ int fio_show_ioengine_help(const char *engine)
 	}
 
 	td.o.ioengine = (char *)engine;
-	io_ops = load_ioengine(&td);
+	td.io_ops = load_ioengine(&td);
 
-	if (!io_ops) {
+	if (!td.io_ops) {
 		log_info("IO engine %s not found\n", engine);
 		return 1;
 	}
 
-	if (io_ops->options)
-		ret = show_cmd_help(io_ops->options, sep);
+	if (td.io_ops->options)
+		ret = show_cmd_help(td.io_ops->options, sep);
 	else
-		log_info("IO engine %s has no options\n", io_ops->name);
+		log_info("IO engine %s has no options\n", td.io_ops->name);
 
 	free_ioengine(&td);
 	return ret;
diff --git a/zbd.c b/zbd.c
index 04c68dea..43f12b45 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1184,11 +1184,12 @@ out:
 	return res;
 }
 
-/* Anything goes as long as it is not a constant. */
+/* Return random zone index for one of the open zones. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
 {
-	return io_u->offset * f->zbd_info->num_open_zones / f->real_file_size;
+	return (io_u->offset - f->file_offset) * f->zbd_info->num_open_zones /
+		f->io_size;
 }
 
 /*
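
The `pick_random_zone_idx()` change matters when a job covers only part of the device. The worked example below uses made-up sizes: a 1000-byte file, a job region starting at byte 900 with `io_size` 100, and 4 open zones. The old formula maps every offset in that region to open-zone index 3. The new one spreads offsets over indices 0 to 3. `zone_idx_old()` and `zone_idx_new()` are hypothetical names for the two expressions in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Old formula: scales the absolute offset by the whole file size. */
static uint32_t zone_idx_old(uint64_t offset, uint32_t num_open_zones,
			     uint64_t real_file_size)
{
	return offset * num_open_zones / real_file_size;
}

/* New formula: scales the offset relative to the job's I/O region. */
static uint32_t zone_idx_new(uint64_t offset, uint64_t file_offset,
			     uint64_t io_size, uint32_t num_open_zones)
{
	return (offset - file_offset) * num_open_zones / io_size;
}
```

With these numbers, `zone_idx_old(900, 4, 1000)` and `zone_idx_old(999, 4, 1000)` are both 3. `zone_idx_new()` yields 0, 2 and 3 for offsets 900, 950 and 999.
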



* Recent changes (master)
@ 2021-08-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-08-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b61fdc8a555f844ba838c80781972df1239b5959:

  iolog: don't attempt read chunking with blktrace format (2021-08-02 08:23:24 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 382975557e632efb506836bc1709789e615c9094:

  fio: remove raw device support (2021-08-03 12:20:22 -0600)

----------------------------------------------------------------
Eric Sandeen (1):
      fio: remove raw device support

 diskutil.c    | 10 +++-------
 fio.1         |  4 +---
 os/os-linux.h | 32 --------------------------------
 os/os.h       |  4 ----
 4 files changed, 4 insertions(+), 46 deletions(-)

---

Diff of recent changes:

diff --git a/diskutil.c b/diskutil.c
index 0051a7a0..ace7af3d 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -166,14 +166,10 @@ static int get_device_numbers(char *file_name, int *maj, int *min)
 		if (S_ISBLK(st.st_mode)) {
 			majdev = major(st.st_rdev);
 			mindev = minor(st.st_rdev);
-		} else if (S_ISCHR(st.st_mode)) {
-			majdev = major(st.st_rdev);
-			mindev = minor(st.st_rdev);
-			if (fio_lookup_raw(st.st_rdev, &majdev, &mindev))
-				return -1;
-		} else if (S_ISFIFO(st.st_mode))
+		} else if (S_ISCHR(st.st_mode) ||
+			   S_ISFIFO(st.st_mode)) {
 			return -1;
-		else {
+		} else {
 			majdev = major(st.st_dev);
 			mindev = minor(st.st_dev);
 		}
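
With raw device support removed, `get_device_numbers()` treats every character device like a FIFO and returns -1 instead of looking up a raw binding. The branch structure can be sketched on a `struct stat` alone. `device_numbers_from_stat()` is a hypothetical name, and the real function also performs the `stat()` call itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major(), minor() on glibc */

/* Block devices report their own numbers; character devices and FIFOs
 * are rejected; anything else (regular files, directories) reports the
 * numbers of the device that holds it. */
static int device_numbers_from_stat(const struct stat *st, int *maj, int *min)
{
	if (S_ISBLK(st->st_mode)) {
		*maj = major(st->st_rdev);
		*min = minor(st->st_rdev);
	} else if (S_ISCHR(st->st_mode) || S_ISFIFO(st->st_mode)) {
		return -1;
	} else {
		*maj = major(st->st_dev);
		*min = minor(st->st_dev);
	}
	return 0;
}
```

On Linux, `/dev/null` is a character device and is therefore rejected, while a directory such as `.` reports the numbers of its backing filesystem's device.
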
diff --git a/fio.1 b/fio.1
index 6cc82542..9c12ad13 100644
--- a/fio.1
+++ b/fio.1
@@ -1700,9 +1700,7 @@ Sets size to something really large and waits for ENOSPC (no space left on
 device) or EDQUOT (disk quota exceeded)
 as the terminating condition. Only makes sense with sequential
 write. For a read workload, the mount point will be filled first then I/O
-started on the result. This option doesn't make sense if operating on a raw
-device node, since the size of that is already known by the file system.
-Additionally, writing beyond end-of-device will not return ENOSPC there.
+started on the result.
 .SS "I/O engine"
 .TP
 .BI ioengine \fR=\fPstr
diff --git a/os/os-linux.h b/os/os-linux.h
index f7137abe..16ed5258 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -14,7 +14,6 @@
 #include <errno.h>
 #include <sched.h>
 #include <linux/unistd.h>
-#include <linux/raw.h>
 #include <linux/major.h>
 #include <linux/fs.h>
 #include <scsi/sg.h>
@@ -41,7 +40,6 @@
 #define FIO_HAVE_IOSCHED_SWITCH
 #define FIO_HAVE_ODIRECT
 #define FIO_HAVE_HUGETLB
-#define FIO_HAVE_RAWBIND
 #define FIO_HAVE_BLKTRACE
 #define FIO_HAVE_CL_SIZE
 #define FIO_HAVE_CGROUPS
@@ -178,36 +176,6 @@ static inline unsigned long long os_phys_mem(void)
 	return (unsigned long long) pages * (unsigned long long) pagesize;
 }
 
-static inline int fio_lookup_raw(dev_t dev, int *majdev, int *mindev)
-{
-	struct raw_config_request rq;
-	int fd;
-
-	if (major(dev) != RAW_MAJOR)
-		return 1;
-
-	/*
-	 * we should be able to find /dev/rawctl or /dev/raw/rawctl
-	 */
-	fd = open("/dev/rawctl", O_RDONLY);
-	if (fd < 0) {
-		fd = open("/dev/raw/rawctl", O_RDONLY);
-		if (fd < 0)
-			return 1;
-	}
-
-	rq.raw_minor = minor(dev);
-	if (ioctl(fd, RAW_GETBIND, &rq) < 0) {
-		close(fd);
-		return 1;
-	}
-
-	close(fd);
-	*majdev = rq.block_major;
-	*mindev = rq.block_minor;
-	return 0;
-}
-
 #ifdef O_NOATIME
 #define FIO_O_NOATIME	O_NOATIME
 #else
diff --git a/os/os.h b/os/os.h
index e47d3d97..17daf91d 100644
--- a/os/os.h
+++ b/os/os.h
@@ -157,10 +157,6 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #define OS_RAND_MAX			RAND_MAX
 #endif
 
-#ifndef FIO_HAVE_RAWBIND
-#define fio_lookup_raw(dev, majdev, mindev)	1
-#endif
-
 #ifndef FIO_PREFERRED_ENGINE
 #define FIO_PREFERRED_ENGINE	"psync"
 #endif



* Recent changes (master)
@ 2021-08-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-08-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 572782274da6f7223627f734c6e1818a03c71a6d:

  Merge branch 'master' of https://github.com/anson-lo/fio (2021-08-01 08:36:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b61fdc8a555f844ba838c80781972df1239b5959:

  iolog: don't attempt read chunking with blktrace format (2021-08-02 08:23:24 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      iolog: don't attempt read chunking with blktrace format

 fio.h   |  1 +
 iolog.c | 10 +++++++---
 2 files changed, 8 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index 51686fd0..6f6b211b 100644
--- a/fio.h
+++ b/fio.h
@@ -420,6 +420,7 @@ struct thread_data {
 	 */
 	struct flist_head io_log_list;
 	FILE *io_log_rfile;
+	unsigned int io_log_blktrace;
 	unsigned int io_log_current;
 	unsigned int io_log_checkmark;
 	unsigned int io_log_highmark;
diff --git a/iolog.c b/iolog.c
index cf264916..26501b4a 100644
--- a/iolog.c
+++ b/iolog.c
@@ -151,7 +151,8 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 
 	while (!flist_empty(&td->io_log_list)) {
 		int ret;
-		if (td->o.read_iolog_chunked) {
+
+		if (!td->io_log_blktrace && td->o.read_iolog_chunked) {
 			if (td->io_log_checkmark == td->io_log_current) {
 				if (!read_iolog2(td))
 					return 1;
@@ -706,10 +707,13 @@ bool init_iolog(struct thread_data *td)
 		 * Check if it's a blktrace file and load that if possible.
 		 * Otherwise assume it's a normal log file and load that.
 		 */
-		if (is_blktrace(fname, &need_swap))
+		if (is_blktrace(fname, &need_swap)) {
+			td->io_log_blktrace = 1;
 			ret = load_blktrace(td, fname, need_swap);
-		else
+		} else {
+			td->io_log_blktrace = 0;
 			ret = init_iolog_read(td, fname);
+		}
 	} else if (td->o.write_iolog_file)
 		ret = init_iolog_write(td);
 	else



* Recent changes (master)
@ 2021-08-02 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-08-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7c8e6725155cae72a0a730d3c3a36776bc5621a3:

  Makefile: update libzbc git repository (2021-07-28 07:27:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 572782274da6f7223627f734c6e1818a03c71a6d:

  Merge branch 'master' of https://github.com/anson-lo/fio (2021-08-01 08:36:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/anson-lo/fio

anson-lo (1):
      Fix an error triggered by double releasing the lock

 verify.c | 1 -
 1 file changed, 1 deletion(-)

---

Diff of recent changes:

diff --git a/verify.c b/verify.c
index a418c054..0e1e4639 100644
--- a/verify.c
+++ b/verify.c
@@ -1411,7 +1411,6 @@ static void *verify_async_thread(void *data)
 			ret = pthread_cond_wait(&td->verify_cond,
 							&td->io_u_lock);
 			if (ret) {
-				pthread_mutex_unlock(&td->io_u_lock);
 				break;
 			}
 		}
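
The commit title says the lock was released twice: once in this error branch and once more by the code the loop breaks out to, which lies outside the hunk shown. Below is a minimal sketch of the intended pattern, using a hypothetical `struct work_queue`. Every path out of the wait loop still holds the mutex, and a single unlock after the loop releases it. POSIX guarantees that the mutex is re-acquired when `pthread_cond_wait()` returns successfully. Like the fix, the sketch assumes the mutex is also still held when the call reports an error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state guarded by one mutex/condvar pair. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool ready;
	bool done;
};

/* Producer side: publish work and wake one waiter. */
static void publish_work(struct work_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->ready = true;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side: wait for 'ready'. The mutex is released exactly once,
 * after the loop, whether the loop ends normally or on an error. */
static int wait_for_work(struct work_queue *q)
{
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	while (!q->ready) {
		ret = pthread_cond_wait(&q->cond, &q->lock);
		if (ret)
			break;	/* no unlock here; the one below covers it */
	}
	if (!ret)
		q->done = true;
	pthread_mutex_unlock(&q->lock);
	return ret;
}
```

Unlocking inside the `if (ret)` branch as well would release an already-unlocked mutex on the error path. That is undefined behaviour for default mutexes.
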



* Recent changes (master)
@ 2021-07-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-07-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9ce6f6f2bd636e9678982b86d6992ed419634c31:

  Merge branch 'evelu-fix-engines' of https://github.com/ErwanAliasr1/fio (2021-07-25 16:48:02 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7c8e6725155cae72a0a730d3c3a36776bc5621a3:

  Makefile: update libzbc git repository (2021-07-28 07:27:29 -0600)

----------------------------------------------------------------
Damien Le Moal (1):
      Makefile: update libzbc git repository

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 6b4b4122..5198f70e 100644
--- a/Makefile
+++ b/Makefile
@@ -651,7 +651,7 @@ test: fio
 fulltest:
 	sudo modprobe null_blk &&				 	\
 	if [ ! -e /usr/include/libzbc/zbc.h ]; then			\
-	  git clone https://github.com/hgst/libzbc &&		 	\
+	  git clone https://github.com/westerndigitalcorporation/libzbc && \
 	  (cd libzbc &&						 	\
 	   ./autogen.sh &&					 	\
 	   ./configure --prefix=/usr &&				 	\



* Recent changes (master)
@ 2021-07-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-07-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ae5c7cdd710dfa97705d965dcf001a96504e5f31:

  Merge branch 'dedupe_workset' of https://github.com/bardavid/fio (2021-07-15 09:54:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9ce6f6f2bd636e9678982b86d6992ed419634c31:

  Merge branch 'evelu-fix-engines' of https://github.com/ErwanAliasr1/fio (2021-07-25 16:48:02 -0600)

----------------------------------------------------------------
Erwan Velu (3):
      engines: Adding exec engine
      fiograph: Adding exec engine support
      engines/exec: Code cleanup to remove leaks

Jens Axboe (3):
      Merge branch 'evelu-exec' of https://github.com/ErwanAliasr1/fio
      engines/exec: style cleanups
      Merge branch 'evelu-fix-engines' of https://github.com/ErwanAliasr1/fio

 HOWTO                        |  25 +++
 Makefile                     |   1 +
 engines/exec.c               | 394 +++++++++++++++++++++++++++++++++++++++++++
 examples/exec.fio            |  36 ++++
 examples/exec.png            | Bin 0 -> 101933 bytes
 fio.1                        |  28 +++
 os/os-windows.h              |   1 +
 tools/fiograph/fiograph.conf |   3 +
 8 files changed, 488 insertions(+)
 create mode 100644 engines/exec.c
 create mode 100644 examples/exec.fio
 create mode 100644 examples/exec.png

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a12bccba..59c7f1ff 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2139,6 +2139,9 @@ I/O engine
 			achieving higher concurrency and thus throughput than is possible
 			via kernel NFS.
 
+		**exec**
+			Execute 3rd party tools. Could be used to perform monitoring during jobs runtime.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2566,6 +2569,28 @@ with the caveat that when used on the command line, they must come after the
 	URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
 	Refer to the libnfs README for more details.
 
+.. option:: program=str : [exec]
+
+	Specify the program to execute.
+
+.. option:: arguments=str : [exec]
+
+	Specify arguments to pass to program.
+	Some special variables can be expanded to pass fio's job details to the program.
+
+	**%r**
+		Replaced by the duration of the job in seconds.
+	**%n**
+		Replaced by the name of the job.
+
+.. option:: grace_time=int : [exec]
+
+	Specify the time between the SIGTERM and SIGKILL signals. Default is 1 second.
+
+.. option:: std_redirect=boot : [exec]
+
+	If set, stdout and stderr streams are redirected to files named from the job name. Default is true.
+
 I/O depth
 ~~~~~~~~~
 
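`engines/exec.c` implements the `%r` and `%n` expansion documented above with a replace-all helper (`str_replace()` and `expand_variables()`). Below is a compact standalone sketch of the same idea. `replace_all()` and `expand_job_args()` are hypothetical names. `runtime_usec` stands in for fio's `o->timeout`, which the engine divides by 1000000 to get seconds.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of 'orig' with every occurrence of 'rep'
 * replaced by 'with', or NULL if allocation fails. */
static char *replace_all(const char *orig, const char *rep, const char *with)
{
	size_t len_rep = strlen(rep), len_with = strlen(with), count = 0;
	const char *p;
	char *result, *out;

	/* Count non-overlapping occurrences first to size the buffer. */
	for (p = orig; len_rep && (p = strstr(p, rep)); p += len_rep)
		count++;

	result = malloc(strlen(orig) + count * len_with - count * len_rep + 1);
	if (!result)
		return NULL;

	out = result;
	while (count--) {
		const char *hit = strstr(orig, rep);
		size_t front = (size_t)(hit - orig);

		memcpy(out, orig, front);	/* text before the match */
		out += front;
		memcpy(out, with, len_with);	/* the replacement text */
		out += len_with;
		orig = hit + len_rep;
	}
	strcpy(out, orig);	/* remaining tail, including the NUL */
	return result;
}

/* Expand %r (runtime in seconds) and %n (job name). */
static char *expand_job_args(const char *args, unsigned long long runtime_usec,
			     const char *job_name)
{
	char secs[32];
	char *tmp, *res;

	snprintf(secs, sizeof(secs), "%llu", runtime_usec / 1000000);
	tmp = replace_all(args, "%r", secs);
	if (!tmp)
		return NULL;
	res = replace_all(tmp, "%n", job_name);
	free(tmp);
	return res;
}
```

The patch's `str_replace()` hands back `orig` itself when allocation fails. This version always returns a fresh allocation or NULL, so the caller can free the intermediate string unconditionally.
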
diff --git a/Makefile b/Makefile
index cc7dada7..6b4b4122 100644
--- a/Makefile
+++ b/Makefile
@@ -57,6 +57,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
 		engines/ftruncate.c engines/filecreate.c engines/filestat.c engines/filedelete.c \
+		engines/exec.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
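
Before calling `execvp()`, the exec engine joins `program` and the expanded `arguments` into one string. It then splits that string on spaces with `strspn()`/`strcspn()` into a NULL-terminated argv. The sketch below reproduces that splitting and adds allocation checks, which the patch does not perform on its `realloc()`/`malloc()` results. `split_args()` and `free_args()` are hypothetical names. Like the original loop, it has no quoting support, so an argument cannot itself contain a space.

```c
#define _POSIX_C_SOURCE 200809L	/* for strndup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split 'cmd' on runs of spaces into a NULL-terminated argv array.
 * Returns NULL on allocation failure; *argc receives the word count. */
static char **split_args(const char *cmd, size_t *argc)
{
	char **argv = NULL, **grown;
	size_t n = 0, len;
	const char *p = cmd;

	for (;;) {
		p += strspn(p, " ");		/* skip separating spaces */
		len = strcspn(p, " ");		/* length of the next word */
		if (!len)
			break;			/* end of string */

		/* Room for this word plus the terminating NULL. */
		grown = realloc(argv, (n + 2) * sizeof(*argv));
		if (!grown)
			goto fail;
		argv = grown;
		argv[n] = strndup(p, len);
		if (!argv[n])
			goto fail;
		n++;
		p += len;
	}

	if (!argv) {
		argv = malloc(sizeof(*argv));	/* empty command: just NULL */
		if (!argv)
			return NULL;
	}
	argv[n] = NULL;		/* execvp() expects a NULL terminator */
	*argc = n;
	return argv;
fail:
	while (n--)
		free(argv[n]);
	free(argv);
	return NULL;
}

static void free_args(char **argv)
{
	for (size_t i = 0; argv && argv[i]; i++)
		free(argv[i]);
	free(argv);
}
```

For instance, `split_args("  iostat  -x 1 ", &argc)` yields `{"iostat", "-x", "1", NULL}` with `argc == 3`. Here `iostat` is just an example program name.
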
diff --git a/engines/exec.c b/engines/exec.c
new file mode 100644
index 00000000..ab3639c5
--- /dev/null
+++ b/engines/exec.c
@@ -0,0 +1,394 @@
+/*
+ * Exec engine
+ *
+ * Doesn't transfer any data, merely run 3rd party tools
+ *
+ */
+#include "../fio.h"
+#include "../optgroup.h"
+#include <signal.h>
+
+struct exec_options {
+	void *pad;
+	char *program;
+	char *arguments;
+	int grace_time;
+	unsigned int std_redirect;
+	pid_t pid;
+};
+
+static struct fio_option options[] = {
+	{
+		.name = "program",
+		.lname = "Program",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct exec_options, program),
+		.help = "Program to execute",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
+		.name = "arguments",
+		.lname = "Arguments",
+		.type = FIO_OPT_STR_STORE,
+		.off1 = offsetof(struct exec_options, arguments),
+		.help = "Arguments to pass",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
+		.name = "grace_time",
+		.lname = "Grace time",
+		.type = FIO_OPT_INT,
+		.minval = 0,
+		.def = "1",
+		.off1 = offsetof(struct exec_options, grace_time),
+		.help = "Grace time before sending a SIGKILL",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
+		.name = "std_redirect",
+		.lname = "Std redirect",
+		.type = FIO_OPT_BOOL,
+		.def = "1",
+		.off1 = offsetof(struct exec_options, std_redirect),
+		.help = "Redirect stdout & stderr to files",
+		.category = FIO_OPT_C_ENGINE,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
+		.name = NULL,
+	},
+};
+
+char *str_replace(char *orig, const char *rep, const char *with)
+{
+	/*
+	 * Replace a substring by another.
+	 *
+	 * Returns the new string if occurences were found
+	 * Returns orig if no occurence is found
+	 */
+	char *result, *insert, *tmp;
+	int len_rep, len_with, len_front, count;
+
+	/* sanity checks and initialization */
+	if (!orig || !rep)
+		return orig;
+
+	len_rep = strlen(rep);
+	if (len_rep == 0)
+		return orig;
+
+	if (!with)
+		with = "";
+	len_with = strlen(with);
+
+	insert = orig;
+	for (count = 0; (tmp = strstr(insert, rep)); ++count) {
+		insert = tmp + len_rep;
+	}
+
+	tmp = result = malloc(strlen(orig) + (len_with - len_rep) * count + 1);
+
+	if (!result)
+		return orig;
+
+	while (count--) {
+		insert = strstr(orig, rep);
+		len_front = insert - orig;
+		tmp = strncpy(tmp, orig, len_front) + len_front;
+		tmp = strcpy(tmp, with) + len_with;
+		orig += len_front + len_rep;
+	}
+	strcpy(tmp, orig);
+	return result;
+}
+
+char *expand_variables(struct thread_options *o, char *arguments)
+{
+	char str[16];
+	char *expanded_runtime, *expanded_name;
+	snprintf(str, sizeof(str), "%lld", o->timeout / 1000000);
+
+	/* %r is replaced by the runtime in seconds */
+	expanded_runtime = str_replace(arguments, "%r", str);
+
+	/* %n is replaced by the name of the running job */
+	expanded_name = str_replace(expanded_runtime, "%n", o->name);
+
+	free(expanded_runtime);
+	return expanded_name;
+}
+
+static int exec_background(struct thread_options *o, struct exec_options *eo)
+{
+	char *outfilename = NULL, *errfilename = NULL;
+	int outfd = 0, errfd = 0;
+	pid_t pid;
+	char *expanded_arguments = NULL;
+	/* For the arguments splitting */
+	char **arguments_array = NULL;
+	char *p;
+	char *exec_cmd = NULL;
+	size_t arguments_nb_items = 0, q;
+
+	if (asprintf(&outfilename, "%s.stdout", o->name) < 0)
+		return -1;
+
+	if (asprintf(&errfilename, "%s.stderr", o->name) < 0) {
+		free(outfilename);
+		return -1;
+	}
+
+	/* If we have variables in the arguments, let's expand them */
+	expanded_arguments = expand_variables(o, eo->arguments);
+
+	if (eo->std_redirect) {
+		log_info("%s : Saving output of %s %s : stdout=%s stderr=%s\n",
+			 o->name, eo->program, expanded_arguments, outfilename,
+			 errfilename);
+
+		/* Creating the stderr & stdout output files */
+		outfd = open(outfilename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
+		if (outfd < 0) {
+			log_err("fio: cannot open output file %s : %s\n",
+				outfilename, strerror(errno));
+			free(outfilename);
+			free(errfilename);
+			free(expanded_arguments);
+			return -1;
+		}
+
+		errfd = open(errfilename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
+		if (errfd < 0) {
+			log_err("fio: cannot open output file %s : %s\n",
+				errfilename, strerror(errno));
+			free(outfilename);
+			free(errfilename);
+			free(expanded_arguments);
+			close(outfd);
+			return -1;
+		}
+	} else {
+		log_info("%s : Running %s %s\n",
+			 o->name, eo->program, expanded_arguments);
+	}
+
+	pid = fork();
+
+	/* We are on the control thread (parent side of the fork */
+	if (pid > 0) {
+		eo->pid = pid;
+		if (eo->std_redirect) {
+			/* The output file is for the client side of the fork */
+			close(outfd);
+			close(errfd);
+			free(outfilename);
+			free(errfilename);
+		}
+		free(expanded_arguments);
+		return 0;
+	}
+
+	/* If the fork failed */
+	if (pid < 0) {
+		log_err("fio: forking failed %s \n", strerror(errno));
+		if (eo->std_redirect) {
+			close(outfd);
+			close(errfd);
+			free(outfilename);
+			free(errfilename);
+		}
+		free(expanded_arguments);
+		return -1;
+	}
+
+	/* We are in the worker (child side of the fork) */
+	if (pid == 0) {
+		if (eo->std_redirect) {
+			/* replace stdout by the output file we create */
+			dup2(outfd, 1);
+			/* replace stderr by the output file we create */
+			dup2(errfd, 2);
+			close(outfd);
+			close(errfd);
+			free(outfilename);
+			free(errfilename);
+		}
+
+		/*
+		 * Let's split the command line into a null terminated array to
+		 * be passed to the exec'd program.
+		 * But don't asprintf expanded_arguments if NULL as it would be
+		 * converted to a '(null)' argument, while we want no arguments
+		 * at all.
+		 */
+		if (expanded_arguments != NULL) {
+			if (asprintf(&exec_cmd, "%s %s", eo->program, expanded_arguments) < 0) {
+				free(expanded_arguments);
+				return -1;
+			}
+		} else {
+			if (asprintf(&exec_cmd, "%s", eo->program) < 0)
+				return -1;
+		}
+
+		/*
+		 * Let's build an argv array to based on the program name and
+		 * arguments
+		 */
+		p = exec_cmd;
+		for (;;) {
+			p += strspn(p, " ");
+
+			if (!(q = strcspn(p, " ")))
+				break;
+
+			if (q) {
+				arguments_array =
+				    realloc(arguments_array,
+					    (arguments_nb_items +
+					     1) * sizeof(char *));
+				arguments_array[arguments_nb_items] =
+				    malloc(q + 1);
+				strncpy(arguments_array[arguments_nb_items], p,
+					q);
+				arguments_array[arguments_nb_items][q] = 0;
+				arguments_nb_items++;
+				p += q;
+			}
+		}
+
+		/* Adding a null-terminated item to close the list */
+		arguments_array =
+		    realloc(arguments_array,
+			    (arguments_nb_items + 1) * sizeof(char *));
+		arguments_array[arguments_nb_items] = NULL;
+
+		/*
+		 * Replace the fio program from the child fork by the target
+		 * program
+		 */
+		execvp(arguments_array[0], arguments_array);
+	}
+	/* Only reached in the child if execvp() failed */
+	/* Let's free the malloc'ed structures to make static checkers happy */
+	if (expanded_arguments)
+		free(expanded_arguments);
+	if (arguments_array)
+		free(arguments_array);
+	return 0;
+}
+
+static enum fio_q_status
+fio_exec_queue(struct thread_data *td, struct io_u fio_unused * io_u)
+{
+	struct thread_options *o = &td->o;
+	struct exec_options *eo = td->eo;
+
+	/* Let's execute the program the first time we get queued */
+	if (eo->pid == -1) {
+		exec_background(o, eo);
+	} else {
+		/*
+		 * The program is running in the background; check on a
+		 * regular basis whether the time is over and whether we
+		 * need to stop the tool.
+		 */
+		usleep(o->thinktime);
+		if (utime_since_now(&td->start) > o->timeout) {
+			/* Let's stop the child */
+			kill(eo->pid, SIGTERM);
+			/*
+			 * Let's give grace_time (1 sec by default) to the 3rd
+			 * party tool to stop
+			 */
+			sleep(eo->grace_time);
+		}
+	}
+
+	return FIO_Q_COMPLETED;
+}
+
+static int fio_exec_init(struct thread_data *td)
+{
+	struct thread_options *o = &td->o;
+	struct exec_options *eo = td->eo;
+	int td_previous_state;
+
+	eo->pid = -1;
+
+	if (!eo->program) {
+		td_vmsg(td, EINVAL,
+			"no program is defined, it is mandatory to define one",
+			"exec");
+		return 1;
+	}
+
+	log_info("%s : program=%s, arguments=%s\n",
+		 td->o.name, eo->program, eo->arguments);
+
+	/* Saving the current thread state */
+	td_previous_state = td->runstate;
+
+	/*
+	 * Reporting that we are preparing the engine
+	 * This is useful as the qsort() calibration takes time
+	 * This prevents the job from starting before init is completed
+	 */
+	td_set_runstate(td, TD_SETTING_UP);
+
+	/*
+	 * set thinktime_sleep and thinktime_spin appropriately
+	 */
+	o->thinktime_blocks = 1;
+	o->thinktime_blocks_type = THINKTIME_BLOCKS_TYPE_COMPLETE;
+	o->thinktime_spin = 0;
+	/* 50ms pause when waiting for the program to complete */
+	o->thinktime = 50000;
+
+	o->nr_files = o->open_files = 1;
+
+	/* Let's restore the previous state. */
+	td_set_runstate(td, td_previous_state);
+	return 0;
+}
+
+static void fio_exec_cleanup(struct thread_data *td)
+{
+	struct exec_options *eo = td->eo;
+	/* Send a sigkill to ensure the job is well terminated */
+	if (eo->pid > 0)
+		kill(eo->pid, SIGKILL);
+}
+
+static int
+fio_exec_open(struct thread_data fio_unused * td,
+	      struct fio_file fio_unused * f)
+{
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name = "exec",
+	.version = FIO_IOOPS_VERSION,
+	.queue = fio_exec_queue,
+	.init = fio_exec_init,
+	.cleanup = fio_exec_cleanup,
+	.open_file = fio_exec_open,
+	.flags = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOIO,
+	.options = options,
+	.option_struct_size = sizeof(struct exec_options),
+};
+
+static void fio_init fio_exec_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_exec_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
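The token loop in exec_background() splits the command line on spaces with strspn()/strcspn() and grows the argv array one slot at a time. A minimal standalone sketch of the same technique follows. split_args() and free_argv() are hypothetical helpers, not part of fio. Unlike the loop above, this version checks every allocation and always reserves room for the NULL terminator. Like the original, it does no shell-style quoting, so an argument that contains spaces cannot be passed through.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free an argv array built by split_args() */
static void free_argv(char **argv)
{
	if (!argv)
		return;
	for (char **a = argv; *a; a++)
		free(*a);
	free(argv);
}

/*
 * Hypothetical helper: split 'cmd' on spaces into a NULL-terminated argv
 * array, using the same strspn()/strcspn() scan as exec_background().
 * Returns NULL on allocation failure.
 */
static char **split_args(const char *cmd, size_t *argc_out)
{
	char **argv = NULL, **tmp;
	size_t argc = 0, len;
	const char *p = cmd;

	for (;;) {
		p += strspn(p, " ");		/* skip separating spaces */
		len = strcspn(p, " ");		/* length of the next token */
		if (!len)
			break;			/* reached the end of the string */

		/* one slot for the token, one kept for the NULL terminator */
		tmp = realloc(argv, (argc + 2) * sizeof(*argv));
		if (!tmp)
			goto fail;
		argv = tmp;

		argv[argc] = malloc(len + 1);
		if (!argv[argc])
			goto fail;
		memcpy(argv[argc], p, len);
		argv[argc][len] = '\0';
		argc++;
		p += len;
	}

	if (!argv) {
		/* empty command line: still return a valid, empty argv */
		argv = malloc(sizeof(*argv));
		if (!argv)
			return NULL;
	}
	argv[argc] = NULL;
	if (argc_out)
		*argc_out = argc;
	return argv;

fail:
	/* release the tokens copied so far, then the array itself */
	while (argc--)
		free(argv[argc]);
	free(argv);
	return NULL;
}
```

The result can be handed straight to execvp(argv[0], argv), which expects exactly this NULL-terminated layout.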
diff --git a/examples/exec.fio b/examples/exec.fio
new file mode 100644
index 00000000..ac1bedfb
--- /dev/null
+++ b/examples/exec.fio
@@ -0,0 +1,36 @@
+[global]
+time_based
+runtime=30
+
+[monitoring_noop]
+ioengine=exec
+program=/usr/sbin/turbostat
+arguments=-c package -qS --interval 5 -s Busy%,Bzy_MHz,Avg_MHz,CorWatt,PkgWatt,RAMWatt,PkgTmp
+
+[cpuload_noop]
+ioengine=cpuio
+cpuload=100
+numjobs=12
+cpumode=noop
+
+[sleep]
+# Let the processor cool down for a few seconds
+stonewall
+ioengine=exec
+runtime=10
+program=/bin/sleep
+arguments=%r
+grace_time=0
+std_redirect=0
+
+[monitoring_qsort]
+stonewall
+ioengine=exec
+program=/usr/sbin/turbostat
+arguments=-c package -qS --interval 5 -s Busy%,Bzy_MHz,Avg_MHz,CorWatt,PkgWatt,RAMWatt,PkgTmp
+
+[cpuload_qsort]
+ioengine=cpuio
+cpuload=100
+numjobs=12
+cpumode=qsort
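The [sleep] job above passes `arguments=%r`. The man page says this expands to the job duration in seconds, and that `%n` expands to the job name. The code that builds expanded_arguments lies outside this excerpt, so the following is only a sketch under assumptions. expand_exec_args() is a hypothetical helper, and this excerpt does not show how fio treats other `%` sequences. The sketch copies them verbatim.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand %r (job duration in seconds) and %n (job
 * name) in an exec "arguments" string. Anything else, including a '%'
 * not followed by 'r' or 'n', is copied verbatim. Returns a malloc'd
 * string, or NULL on allocation failure.
 */
static char *expand_exec_args(const char *args, const char *jobname,
			      unsigned long long runtime_sec)
{
	char rbuf[32];
	size_t len = 0, rlen, nlen = strlen(jobname);
	const char *s;
	char *out, *o;

	rlen = (size_t) snprintf(rbuf, sizeof(rbuf), "%llu", runtime_sec);

	/* First pass: compute the expanded length */
	for (s = args; *s; s++) {
		if (s[0] == '%' && s[1] == 'r') {
			len += rlen;
			s++;
		} else if (s[0] == '%' && s[1] == 'n') {
			len += nlen;
			s++;
		} else {
			len++;
		}
	}

	out = malloc(len + 1);
	if (!out)
		return NULL;

	/* Second pass: copy the text and substitute the variables */
	for (s = args, o = out; *s; s++) {
		if (s[0] == '%' && s[1] == 'r') {
			memcpy(o, rbuf, rlen);
			o += rlen;
			s++;
		} else if (s[0] == '%' && s[1] == 'n') {
			memcpy(o, jobname, nlen);
			o += nlen;
			s++;
		} else {
			*o++ = *s;
		}
	}
	*o = '\0';
	return out;
}
```

Under this sketch, the [sleep] job's runtime=10 would turn `%r` into `10`, so the engine would run `/bin/sleep 10`.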
diff --git a/examples/exec.png b/examples/exec.png
new file mode 100644
index 00000000..5f9f3b59
Binary files /dev/null and b/examples/exec.png differ
diff --git a/fio.1 b/fio.1
index bd315e11..6cc82542 100644
--- a/fio.1
+++ b/fio.1
@@ -1954,6 +1954,9 @@ I/O engine supporting asynchronous read and write operations to
 NFS filesystems from userspace via libnfs. This is useful for
 achieving higher concurrency and thus throughput than is possible
 via kernel NFS.
+.TP
+.B exec
+Execute 3rd party tools. Could be used to perform monitoring during job runtime.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2340,6 +2343,31 @@ Use DAOS container's object class by default.
 .BI (nfs)nfs_url
 URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
 Refer to the libnfs README for more details.
+.TP
+.BI (exec)program\fR=\fPstr
+Specify the program to execute.
+Note that the program will receive a SIGTERM when the job reaches its time limit.
+A SIGKILL is sent once the job is over. The delay between the two signals is defined by the \fBgrace_time\fR option.
+.TP
+.BI (exec)arguments\fR=\fPstr
+Specify arguments to pass to the program.
+Some special variables can be expanded to pass fio's job details to the program:
+.RS
+.RS
+.TP
+.B %r
+replaced by the duration of the job in seconds
+.TP
+.B %n
+replaced by the name of the job
+.RE
+.RE
+.TP
+.BI (exec)grace_time\fR=\fPint
+Defines the time between the SIGTERM and SIGKILL signals. Default is 1 second.
+.TP
+.BI (exec)std_redirect\fR=\fPbool
+If set, stdout and stderr streams are redirected to files named after the job. Default is true.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
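The man page above describes a two-step shutdown: SIGTERM at the time limit, then SIGKILL after `grace_time`. In the patch, fio_exec_queue() sends SIGTERM and sleep()s for the grace period, and fio_exec_cleanup() sends SIGKILL later. The sketch below combines both steps in one hypothetical POSIX helper, stop_child(). It polls waitpid(), so it returns as soon as the tool exits instead of always waiting out the full grace period. This is an illustration that assumes a POSIX system, not fio's code. It also takes the grace period in milliseconds rather than seconds, so short runs stay fast.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: send SIGTERM, give the child up to grace_ms
 * milliseconds to exit, then SIGKILL it. The child is always reaped.
 * Returns 0 if it exited during the grace period, 1 if SIGKILL was
 * needed, -1 on error.
 */
static int stop_child(pid_t pid, unsigned int grace_ms)
{
	const struct timespec tick = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	unsigned int waited = 0;
	int status;
	pid_t r;

	if (kill(pid, SIGTERM) < 0)
		return -1;

	/* Poll instead of sleeping blindly, so a fast exit returns early */
	while (waited < grace_ms) {
		r = waitpid(pid, &status, WNOHANG);
		if (r == pid)
			return 0;
		if (r < 0 && errno != EINTR)
			return -1;
		nanosleep(&tick, NULL);
		waited += 10;
	}

	/* Grace period over: force termination, then reap the child */
	if (kill(pid, SIGKILL) < 0 && errno != ESRCH)
		return -1;
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 1;
}
```

A tool that traps or ignores SIGTERM still terminates, because SIGKILL cannot be caught. This matches the intent of the fio_exec_cleanup() fallback.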
diff --git a/os/os-windows.h b/os/os-windows.h
index ddfae413..59da9dba 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -77,6 +77,7 @@
 #define SIGCONT	0
 #define SIGUSR1	1
 #define SIGUSR2 2
+#define SIGKILL 15 /* SIGKILL doesn't exist, let's use SIGTERM */
 
 typedef int sigset_t;
 typedef int siginfo_t;
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
index 7b851e19..5becc4d9 100644
--- a/tools/fiograph/fiograph.conf
+++ b/tools/fiograph/fiograph.conf
@@ -35,6 +35,9 @@ specific_options=pool  cont  chunk_size  object_class  svcl
 [ioengine_e4defrag]
 specific_options=donorname  inplace
 
+[ioengine_exec]
+specific_options=program arguments grace_time std_redirect
+
 [ioengine_filestat]
 specific_options=stat_type
 



* Recent changes (master)
@ 2021-07-16 12:00 Jens Axboe
From: Jens Axboe @ 2021-07-16 12:00 UTC
  To: fio

The following changes since commit 7a9cc9c93c1384f72ac16d1d7980e158ec5f9f0a:

  Makefile: use override directive on engine CFLAGS (2021-07-07 07:05:08 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ae5c7cdd710dfa97705d965dcf001a96504e5f31:

  Merge branch 'dedupe_workset' of https://github.com/bardavid/fio (2021-07-15 09:54:03 -0600)

----------------------------------------------------------------
Bar David (1):
      dedupe: allow to generate dedupe buffers from working set

Jens Axboe (2):
      Merge branch 'cmd-test-be' of https://github.com/tuan-hoang1/fio
      Merge branch 'dedupe_workset' of https://github.com/bardavid/fio

Tuan Hoang (1):
      server: fix missing le32_to_cpu conversion when opcode is FIO_NET_CMD_TEXT

 DEDUPE-TODO      | 19 +++++++++++++++++++
 HOWTO            | 30 ++++++++++++++++++++++++++++++
 Makefile         |  2 +-
 cconv.c          |  4 ++++
 dedupe.c         | 28 ++++++++++++++++++++++++++++
 dedupe.h         |  6 ++++++
 fio.1            | 42 ++++++++++++++++++++++++++++++++++++++++++
 fio.h            |  6 ++++++
 init.c           | 26 ++++++++++++++++++++++++++
 io_u.c           | 30 ++++++++++++++++++++----------
 lib/rand.c       | 10 ++--------
 lib/rand.h       | 10 ++++++++++
 options.c        | 34 ++++++++++++++++++++++++++++++++++
 server.c         |  3 ++-
 server.h         |  2 +-
 t/dedupe.c       | 21 ++++++++++++++-------
 thread_options.h | 12 ++++++++++++
 17 files changed, 257 insertions(+), 28 deletions(-)
 create mode 100644 DEDUPE-TODO
 create mode 100644 dedupe.c
 create mode 100644 dedupe.h

---

Diff of recent changes:

diff --git a/DEDUPE-TODO b/DEDUPE-TODO
new file mode 100644
index 00000000..1f3ee9da
--- /dev/null
+++ b/DEDUPE-TODO
@@ -0,0 +1,19 @@
+- Mixed buffers of dedupe-able and compressible data.
+  A major use case in performance benchmarking of storage subsystems.
+
+- Shifted dedup-able data.
+  Allow dedup buffer generation to shift contents by a random number
+  of sectors (filling the gaps with uncompressible data). Some storage
+  subsystems have modernized their deduplication detection to look
+  for shifted data as well. For example, some databases push a timestamp
+  onto the prefix of written blocks, which makes the underlying data
+  dedup-able at a different alignment. Fio should be able to simulate
+  such a workload.
+
+- Generation of similar (but not identical) data.
+  A rising trend in enterprise storage systems.
+  Generation of "similar" data means random uncompressible buffers
+  that differ from each other by a few (configurable number of) bits.
+  The storage subsystem usually identifies the similar buffers using
+  locality-sensitive hashing or other methods.
+
diff --git a/HOWTO b/HOWTO
index 86fb2964..a12bccba 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1705,6 +1705,36 @@ Buffers and memory
 	this option will also enable :option:`refill_buffers` to prevent every buffer
 	being identical.
 
+.. option:: dedupe_mode=str
+
+	If ``dedupe_percentage=<int>`` is given, then this option controls how fio
+	generates the dedupe buffers.
+
+		**repeat**
+			Generate dedupe buffers by repeating previous writes
+		**working_set**
+			Generate dedupe buffers from working set
+
+	``repeat`` is the default option for fio. Dedupe buffers are generated
+	by repeating the previous unique write.
+
+	``working_set`` is a more realistic workload.
+	With ``working_set``, ``dedupe_working_set_percentage=<int>`` should be provided.
+	Given that, fio will use the initial unique write buffers as its working set.
+	Upon deciding to dedupe, fio will randomly choose a buffer from the working set.
+	Note that by using ``working_set`` the dedupe percentage will converge
+	to the desired value over time, while ``repeat`` maintains the desired percentage
+	throughout the job.
+
+.. option:: dedupe_working_set_percentage=int
+
+	If ``dedupe_mode=<str>`` is set to ``working_set``, then this controls
+	the percentage of the file or device size used as the pool of buffers
+	from which fio generates the dedupe buffers.
+
+	Note that ``size`` needs to be explicitly provided and only 1 file per
+	job is supported.
+
 .. option:: invalidate=bool
 
 	Invalidate the buffer/page cache parts of the files to be used prior to
diff --git a/Makefile b/Makefile
index 510e07fc..cc7dada7 100644
--- a/Makefile
+++ b/Makefile
@@ -61,7 +61,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
 		workqueue.c rate-submit.c optgroup.c helper_thread.c \
-		steadystate.c zone-dist.c zbd.c
+		steadystate.c zone-dist.c zbd.c dedupe.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
diff --git a/cconv.c b/cconv.c
index 74c24106..e3a8c27c 100644
--- a/cconv.c
+++ b/cconv.c
@@ -298,6 +298,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->compress_percentage = le32_to_cpu(top->compress_percentage);
 	o->compress_chunk = le32_to_cpu(top->compress_chunk);
 	o->dedupe_percentage = le32_to_cpu(top->dedupe_percentage);
+	o->dedupe_mode = le32_to_cpu(top->dedupe_mode);
+	o->dedupe_working_set_percentage = le32_to_cpu(top->dedupe_working_set_percentage);
 	o->block_error_hist = le32_to_cpu(top->block_error_hist);
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
@@ -499,6 +501,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->compress_percentage = cpu_to_le32(o->compress_percentage);
 	top->compress_chunk = cpu_to_le32(o->compress_chunk);
 	top->dedupe_percentage = cpu_to_le32(o->dedupe_percentage);
+	top->dedupe_mode = cpu_to_le32(o->dedupe_mode);
+	top->dedupe_working_set_percentage = cpu_to_le32(o->dedupe_working_set_percentage);
 	top->block_error_hist = cpu_to_le32(o->block_error_hist);
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
diff --git a/dedupe.c b/dedupe.c
new file mode 100644
index 00000000..043a376c
--- /dev/null
+++ b/dedupe.c
@@ -0,0 +1,28 @@
+#include "fio.h"
+
+int init_dedupe_working_set_seeds(struct thread_data *td)
+{
+	unsigned long long i;
+	struct frand_state dedupe_working_set_state = {0};
+
+	if (!td->o.dedupe_percentage || td->o.dedupe_mode != DEDUPE_MODE_WORKING_SET)
+		return 0;
+
+	/*
+	 * The dedupe working set keeps seeds of unique data (generated by buf_state).
+	 * Dedupe-ed pages will be generated using those seeds.
+	 */
+	td->num_unique_pages = (td->o.size * (unsigned long long)td->o.dedupe_working_set_percentage / 100) / td->o.min_bs[DDIR_WRITE];
+	td->dedupe_working_set_states = malloc(sizeof(struct frand_state) * td->num_unique_pages);
+	if (!td->dedupe_working_set_states) {
+		log_err("fio: could not allocate dedupe working set\n");
+		return 1;
+	}
+	frand_copy(&dedupe_working_set_state, &td->buf_state);
+	for (i = 0; i < td->num_unique_pages; i++) {
+		frand_copy(&td->dedupe_working_set_states[i], &dedupe_working_set_state);
+		__get_next_seed(&dedupe_working_set_state);
+	}
+
+	return 0;
+}
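init_dedupe_working_set_seeds() sizes the working set as size × dedupe_working_set_percentage / 100 / minimum write block size. It then stores one generator snapshot per page, so a dedupe write can later replay any of the first unique buffers. With the default dedupe_working_set_percentage=5, a 1 GiB job with 4 KiB writes gets 1073741824 × 5 / 100 / 4096 = 13107 pages (integer division).

The sketch below models this with a toy xorshift64 generator in place of fio's struct frand_state. The names toy_rand, ws_pages() and build_ws() are hypothetical. The sketch also returns NULL for an empty working set. The option accepts a value of 0, and the code above does not appear to guard against num_unique_pages being 0 before it is used as the upper bound of rand_between() in get_buf_state().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct frand_state: xorshift64, seed must be non-zero */
struct toy_rand {
	uint64_t s;
};

static uint64_t toy_next(struct toy_rand *r)
{
	uint64_t x = r->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return r->s = x;
}

/* Same sizing rule as init_dedupe_working_set_seeds() */
static unsigned long long ws_pages(unsigned long long size,
				   unsigned int ws_pct, unsigned int min_bs)
{
	return (size * ws_pct / 100) / min_bs;
}

/*
 * Snapshot the generator once per unique page: entry i is the state that
 * produces the i-th unique buffer. Returns NULL for an empty working set
 * or on allocation failure.
 */
static struct toy_rand *build_ws(const struct toy_rand *buf_state,
				 unsigned long long npages)
{
	struct toy_rand cur = *buf_state, *ws;
	unsigned long long i;

	if (!npages)
		return NULL;
	ws = malloc(npages * sizeof(*ws));
	if (!ws)
		return NULL;
	for (i = 0; i < npages; i++) {
		ws[i] = cur;		/* plays the role of frand_copy() */
		toy_next(&cur);		/* plays the role of __get_next_seed() */
	}
	return ws;
}
```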
diff --git a/dedupe.h b/dedupe.h
new file mode 100644
index 00000000..d4c4dc37
--- /dev/null
+++ b/dedupe.h
@@ -0,0 +1,6 @@
+#ifndef DEDUPE_H
+#define DEDUPE_H
+
+int init_dedupe_working_set_seeds(struct thread_data *td);
+
+#endif
diff --git a/fio.1 b/fio.1
index 5aa54a4d..bd315e11 100644
--- a/fio.1
+++ b/fio.1
@@ -1509,6 +1509,48 @@ all \-\- this option only controls the distribution of unique buffers. Setting
 this option will also enable \fBrefill_buffers\fR to prevent every buffer
 being identical.
 .TP
+.BI dedupe_mode \fR=\fPstr
+If \fBdedupe_percentage\fR is given, then this option controls how fio
+generates the dedupe buffers.
+.RS
+.RS
+.TP
+.B repeat
+.P
+.RS
+Generate dedupe buffers by repeating previous writes
+.RE
+.TP
+.B working_set
+.P
+.RS
+Generate dedupe buffers from working set
+.RE
+.RE
+.P
+\fBrepeat\fR is the default option for fio. Dedupe buffers are generated
+by repeating the previous unique write.
+
+\fBworking_set\fR is a more realistic workload.
+With \fBworking_set\fR, \fBdedupe_working_set_percentage\fR should be provided.
+Given that, fio will use the initial unique write buffers as its working set.
+Upon deciding to dedupe, fio will randomly choose a buffer from the working set.
+Note that by using \fBworking_set\fR the dedupe percentage will converge
+to the desired value over time, while \fBrepeat\fR maintains the desired percentage
+throughout the job.
+.RE
+.RE
+.TP
+.BI dedupe_working_set_percentage \fR=\fPint
+If \fBdedupe_mode\fR is set to \fBworking_set\fR, then this controls
+the percentage of the file or device size used as the pool of buffers
+from which fio generates the dedupe buffers.
+.P
+.RS
+Note that \fBsize\fR needs to be explicitly provided and only 1 file
+per job is supported.
+.RE
+.TP
 .BI invalidate \fR=\fPbool
 Invalidate the buffer/page cache parts of the files to be used prior to
 starting I/O if the platform and file type support it. Defaults to true.
diff --git a/fio.h b/fio.h
index 83334652..51686fd0 100644
--- a/fio.h
+++ b/fio.h
@@ -47,6 +47,7 @@
 #include "workqueue.h"
 #include "steadystate.h"
 #include "lib/nowarn_snprintf.h"
+#include "dedupe.h"
 
 #ifdef CONFIG_SOLARISAIO
 #include <sys/asynch.h>
@@ -140,6 +141,7 @@ enum {
 	FIO_RAND_POISSON2_OFF,
 	FIO_RAND_POISSON3_OFF,
 	FIO_RAND_PRIO_CMDS,
+	FIO_RAND_DEDUPE_WORKING_SET_IX,
 	FIO_RAND_NR_OFFS,
 };
 
@@ -263,6 +265,10 @@ struct thread_data {
 	struct frand_state dedupe_state;
 	struct frand_state zone_state;
 	struct frand_state prio_state;
+	struct frand_state dedupe_working_set_index_state;
+	struct frand_state *dedupe_working_set_states;
+
+	unsigned long long num_unique_pages;
 
 	struct zone_split_index **zone_state_index;
 	unsigned int num_open_zones;
diff --git a/init.c b/init.c
index 60c7cff4..871fb5ad 100644
--- a/init.c
+++ b/init.c
@@ -958,6 +958,28 @@ static int fixup_options(struct thread_data *td)
 
 	o->latency_target *= 1000ULL;
 
+	/*
+	 * Dedupe working set verifications
+	 */
+	if (o->dedupe_percentage && o->dedupe_mode == DEDUPE_MODE_WORKING_SET) {
+		if (!fio_option_is_set(o, size)) {
+			log_err("fio: pregenerated dedupe working set "
+					"requires size to be set\n");
+			ret |= 1;
+		} else if (o->nr_files != 1) {
+			log_err("fio: dedupe working set mode is only supported "
+					"with a single file per job, but %u files "
+					"were provided\n", o->nr_files);
+			ret |= 1;
+		} else if (o->dedupe_working_set_percentage + o->dedupe_percentage > 100) {
+			log_err("fio: impossible to reach expected dedupe percentage %u "
+					"since %u percent of size is reserved for the dedupe working set "
+					"(those are unique pages)\n",
+					o->dedupe_percentage, o->dedupe_working_set_percentage);
+			ret |= 1;
+		}
+	}
+
 	return ret;
 }
 
@@ -1031,6 +1053,7 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], false);
 	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], false);
 	init_rand_seed(&td->prio_state, td->rand_seeds[FIO_RAND_PRIO_CMDS], false);
+	init_rand_seed(&td->dedupe_working_set_index_state, td->rand_seeds[FIO_RAND_DEDUPE_WORKING_SET_IX], use64);
 
 	if (!td_random(td))
 		return;
@@ -1491,6 +1514,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (fixup_options(td))
 		goto err;
 
+	if (init_dedupe_working_set_seeds(td))
+		goto err;
+
 	/*
 	 * Belongs to fixup_options, but o->name is not necessarily set as yet
 	 */
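The checks added to fixup_options() apply only when dedupe_percentage is set and dedupe_mode=working_set. The last check follows from the fact that the working set pages are themselves unique writes. If they take up dedupe_working_set_percentage of size, at most the remaining share can be dedupe. With the default of 5, the largest accepted dedupe_percentage is therefore 95.

Below is a hypothetical standalone version of the same checks. Like the else-if chain above, it reports only the first failure. check_dedupe_ws() is an illustrative name, not part of fio.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the working_set checks in fixup_options().
 * Returns NULL if the combination is usable, otherwise the first reason
 * it is rejected.
 */
static const char *check_dedupe_ws(bool size_set, unsigned int nr_files,
				   unsigned int dedupe_pct, unsigned int ws_pct)
{
	if (!size_set)
		return "size must be set explicitly";
	if (nr_files != 1)
		return "only one file per job is supported";
	/* working set pages are unique, so at most 100 - ws_pct can dedupe */
	if (ws_pct + dedupe_pct > 100)
		return "dedupe_percentage + dedupe_working_set_percentage > 100";
	return NULL;
}
```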
diff --git a/io_u.c b/io_u.c
index b60488a3..9a1cd547 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2172,6 +2172,7 @@ void io_u_queued(struct thread_data *td, struct io_u *io_u)
 static struct frand_state *get_buf_state(struct thread_data *td)
 {
 	unsigned int v;
+	unsigned long long i;
 
 	if (!td->o.dedupe_percentage)
 		return &td->buf_state;
@@ -2182,16 +2183,25 @@ static struct frand_state *get_buf_state(struct thread_data *td)
 
 	v = rand_between(&td->dedupe_state, 1, 100);
 
-	if (v <= td->o.dedupe_percentage) {
-		/*
-		 * The caller advances the returned frand_state.
-		 * A copy of prev should be returned instead since
-		 * a subsequent intention to generate a deduped buffer
-		 * might result in generating a unique one
-		 */
-		frand_copy(&td->buf_state_ret, &td->buf_state_prev);
-		return &td->buf_state_ret;
-	}
+	if (v <= td->o.dedupe_percentage)
+		switch (td->o.dedupe_mode) {
+		case DEDUPE_MODE_REPEAT:
+			/*
+			 * The caller advances the returned frand_state.
+			 * A copy of prev should be returned instead since
+			 * a subsequent intention to generate a deduped buffer
+			 * might result in generating a unique one
+			 */
+			frand_copy(&td->buf_state_ret, &td->buf_state_prev);
+			return &td->buf_state_ret;
+		case DEDUPE_MODE_WORKING_SET:
+			i = rand_between(&td->dedupe_working_set_index_state, 0, td->num_unique_pages - 1);
+			frand_copy(&td->buf_state_ret, &td->dedupe_working_set_states[i]);
+			return &td->buf_state_ret;
+		default:
+			log_err("unexpected dedupe mode %u\n", td->o.dedupe_mode);
+			assert(0);
+		}
 
 	return &td->buf_state;
 }
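With dedupe_mode=repeat, a dedupe write replays the previous unique buffer. With working_set, it replays a randomly chosen buffer from the pregenerated working set. The copy into buf_state_ret matters because the caller advances whatever frand_state it receives. Returning &td->buf_state_prev directly let one dedupe write advance that state, so a later dedupe write could produce a unique buffer instead. That is the bug addressed in the 2021-07-02 changes further down this archive.

The sketch below models the selection with plain seed values instead of pointers to shared state, which avoids that aliasing entirely. It uses a toy xorshift64 generator and a (slightly biased) modulo range reduction in place of fio's frand_state and rand_between(). Every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct frand_state: xorshift64, seed must be non-zero */
struct toy_rand {
	uint64_t s;
};

static uint64_t toy_next(struct toy_rand *r)
{
	uint64_t x = r->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return r->s = x;
}

/* Value in [lo, hi]; modulo reduction is slightly biased, fine for a sketch */
static uint64_t toy_between(struct toy_rand *r, uint64_t lo, uint64_t hi)
{
	return lo + toy_next(r) % (hi - lo + 1);
}

enum toy_mode { TOY_REPEAT, TOY_WORKING_SET };

struct toy_ctx {
	struct toy_rand buf;		/* source of unique seeds */
	struct toy_rand dedupe;		/* decides dedupe vs. unique */
	struct toy_rand ws_index;	/* picks a working set entry */
	const uint64_t *ws;		/* seeds of the first nws unique writes */
	size_t nws;
	uint64_t prev;			/* seed of the last unique write */
	int have_prev;
	unsigned int dedupe_pct;
	enum toy_mode mode;
};

/* Return the seed the next write buffer should be generated from */
static uint64_t next_write_seed(struct toy_ctx *c)
{
	if (c->dedupe_pct && toy_between(&c->dedupe, 1, 100) <= c->dedupe_pct) {
		if (c->mode == TOY_REPEAT && c->have_prev)
			return c->prev;		/* duplicate the last unique write */
		if (c->mode == TOY_WORKING_SET && c->nws)
			return c->ws[toy_between(&c->ws_index, 0, c->nws - 1)];
	}
	c->prev = toy_next(&c->buf);		/* fresh, unique seed */
	c->have_prev = 1;
	return c->prev;
}
```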
diff --git a/lib/rand.c b/lib/rand.c
index 5eb6e60a..e74da609 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -125,10 +125,7 @@ void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
 uint64_t fill_random_buf(struct frand_state *fs, void *buf,
 			 unsigned int len)
 {
-	uint64_t r = __rand(fs);
-
-	if (sizeof(int) != sizeof(long *))
-		r *= (unsigned long) __rand(fs);
+	uint64_t r = __get_next_seed(fs);
 
 	__fill_random_buf(buf, len, r);
 	return r;
@@ -188,10 +185,7 @@ uint64_t fill_random_buf_percentage(struct frand_state *fs, void *buf,
 				    unsigned int segment, unsigned int len,
 				    char *pattern, unsigned int pbytes)
 {
-	uint64_t r = __rand(fs);
-
-	if (sizeof(int) != sizeof(long *))
-		r *= (unsigned long) __rand(fs);
+	uint64_t r = __get_next_seed(fs);
 
 	__fill_random_buf_percentage(r, buf, percentage, segment, len,
 					pattern, pbytes);
diff --git a/lib/rand.h b/lib/rand.h
index 46c1c5e0..a8060045 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -150,6 +150,16 @@ static inline uint64_t rand_between(struct frand_state *state, uint64_t start,
 		return start + rand32_upto(state, end - start);
 }
 
+static inline uint64_t __get_next_seed(struct frand_state *fs)
+{
+	uint64_t r = __rand(fs);
+
+	if (sizeof(int) != sizeof(long *))
+		r *= (unsigned long) __rand(fs);
+
+	return r;
+}
+
 extern void init_rand(struct frand_state *, bool);
 extern void init_rand_seed(struct frand_state *, uint64_t seed, bool);
 extern void __fill_random_buf(void *buf, unsigned int len, uint64_t seed);
diff --git a/options.c b/options.c
index a8986d11..8c2ab7cc 100644
--- a/options.c
+++ b/options.c
@@ -4497,6 +4497,40 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,
 	},
+	{
+		.name	= "dedupe_mode",
+		.lname	= "Dedupe mode",
+		.help	= "Mode for the deduplication buffer generation",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, dedupe_mode),
+		.parent	= "dedupe_percentage",
+		.def	= "repeat",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IO_BUF,
+		.posval	= {
+			   { .ival = "repeat",
+			     .oval = DEDUPE_MODE_REPEAT,
+			     .help = "repeat previous page",
+			   },
+			   { .ival = "working_set",
+			     .oval = DEDUPE_MODE_WORKING_SET,
+			     .help = "choose a page randomly from limited working set defined in dedupe_working_set_percentage",
+			   },
+		},
+	},
+	{
+		.name	= "dedupe_working_set_percentage",
+		.lname	= "Dedupe working set percentage",
+		.help	= "Dedupe working set size, as a percentage of the file or device size, used to generate dedupe patterns",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, dedupe_working_set_percentage),
+		.parent	= "dedupe_percentage",
+		.def	= "5",
+		.maxval	= 100,
+		.minval	= 0,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IO_BUF,
+	},
 	{
 		.name	= "clat_percentiles",
 		.lname	= "Completion latency percentiles",
diff --git a/server.c b/server.c
index 8daefbab..42eaa4b1 100644
--- a/server.c
+++ b/server.c
@@ -409,8 +409,9 @@ struct fio_net_cmd *fio_net_recv_cmd(int sk, bool wait)
 			if (cmdret->opcode == FIO_NET_CMD_TEXT) {
 				struct cmd_text_pdu *__pdu = (struct cmd_text_pdu *) cmdret->payload;
 				char *buf = (char *) __pdu->buf;
+				int len = le32_to_cpu(__pdu->buf_len);
 
-				buf[__pdu->buf_len] = '\0';
+				buf[len] = '\0';
 			} else if (cmdret->opcode == FIO_NET_CMD_JOB) {
 				struct cmd_job_pdu *__pdu = (struct cmd_job_pdu *) cmdret->payload;
 				char *buf = (char *) __pdu->buf;
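The server.c fix matters on big-endian hosts. buf_len travels little-endian, so a length of 5 arrives as the bytes 05 00 00 00. Without le32_to_cpu(), a big-endian CPU reads those bytes as 0x05000000 (83886080), and buf[len] then lands far outside the payload.

A minimal sketch of the same idea follows. It decodes the bytes explicitly and, as an extra safeguard this excerpt does not show, refuses a length that does not fit the buffer. get_le32() and terminate_text() are hypothetical helpers, not fio's le32_to_cpu() macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value independent of host byte order */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
	       ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/*
 * NUL-terminate a received text payload. 'wire_len' points at the
 * little-endian length field; 'cap' is the size of 'buf' in bytes.
 * Returns 0 on success, -1 if the length would overrun the buffer.
 */
static int terminate_text(char *buf, size_t cap, const unsigned char *wire_len)
{
	uint32_t len = get_le32(wire_len);

	if (len >= cap)
		return -1;
	buf[len] = '\0';
	return 0;
}
```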
diff --git a/server.h b/server.h
index c128df28..daed057a 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 91,
+	FIO_SERVER_VER			= 92,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/dedupe.c b/t/dedupe.c
index 68d31f19..8b659c76 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -473,11 +473,14 @@ static void show_chunk(struct chunk *c)
 	}
 }
 
-static void show_stat(uint64_t nextents, uint64_t nchunks)
+static void show_stat(uint64_t nextents, uint64_t nchunks, uint64_t ndupextents)
 {
 	double perc, ratio;
 
-	printf("Extents=%lu, Unique extents=%lu\n", (unsigned long) nextents, (unsigned long) nchunks);
+	printf("Extents=%lu, Unique extents=%lu", (unsigned long) nextents, (unsigned long) nchunks);
+	if (!bloom)
+		printf(", Duplicated extents=%lu", (unsigned long) ndupextents);
+	printf("\n");
 
 	if (nchunks) {
 		ratio = (double) nextents / (double) nchunks;
@@ -485,17 +488,20 @@ static void show_stat(uint64_t nextents, uint64_t nchunks)
 	} else
 		printf("De-dupe ratio: 1:infinite\n");
 
+	if (ndupextents)
+		printf("De-dupe working set at least: %3.2f%%\n", 100.0 * (double) ndupextents / (double) nextents);
+
 	perc = 1.00 - ((double) nchunks / (double) nextents);
 	perc *= 100.0;
 	printf("Fio setting: dedupe_percentage=%u\n", (int) (perc + 0.50));
 
 }
 
-static void iter_rb_tree(uint64_t *nextents, uint64_t *nchunks)
+static void iter_rb_tree(uint64_t *nextents, uint64_t *nchunks, uint64_t *ndupextents)
 {
 	struct fio_rb_node *n;
 
-	*nchunks = *nextents = 0;
+	*nchunks = *nextents = *ndupextents = 0;
 
 	n = rb_first(&rb_root);
 	if (!n)
@@ -507,6 +513,7 @@ static void iter_rb_tree(uint64_t *nextents, uint64_t *nchunks)
 		c = rb_entry(n, struct chunk, rb_node);
 		(*nchunks)++;
 		*nextents += c->count;
+		*ndupextents += (c->count > 1);
 
 		if (dump_output)
 			show_chunk(c);
@@ -530,7 +537,7 @@ static int usage(char *argv[])
 
 int main(int argc, char *argv[])
 {
-	uint64_t nextents = 0, nchunks = 0;
+	uint64_t nextents = 0, nchunks = 0, ndupextents = 0;
 	int c, ret;
 
 	arch_init(argv);
@@ -583,9 +590,9 @@ int main(int argc, char *argv[])
 
 	if (!ret) {
 		if (!bloom)
-			iter_rb_tree(&nextents, &nchunks);
+			iter_rb_tree(&nextents, &nchunks, &ndupextents);
 
-		show_stat(nextents, nchunks);
+		show_stat(nextents, nchunks, ndupextents);
 	}
 
 	fio_sem_remove(rb_lock);
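The new counter in iter_rb_tree() counts unique chunks whose reference count is above 1. show_stat() reports it as a share of all extents, alongside the existing dedupe_percentage estimate. For chunk counts {1, 3, 2, 1} there are 7 extents and 4 unique chunks, so dedupe_percentage = round((1 - 4/7) × 100) = 43. Two of those chunks are duplicated, which gives a working set of at least 2/7 ≈ 28.57%.

A standalone sketch of that arithmetic follows. struct dedupe_stats and compute_stats() are hypothetical names, and the guard for an empty input is an addition.

```c
#include <stddef.h>
#include <stdint.h>

struct dedupe_stats {
	uint64_t nextents;		/* total extents scanned */
	uint64_t nchunks;		/* unique extents */
	uint64_t ndupextents;		/* unique extents seen more than once */
	unsigned int dedupe_percentage;	/* suggested fio setting */
	double ws_percent;		/* "De-dupe working set at least" */
};

/* counts[i] is how many extents matched the i-th unique chunk */
static struct dedupe_stats compute_stats(const uint64_t *counts, size_t n)
{
	struct dedupe_stats st = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		st.nchunks++;
		st.nextents += counts[i];
		st.ndupextents += (counts[i] > 1);
	}
	if (st.nextents) {
		double perc = (1.0 - (double) st.nchunks / (double) st.nextents) * 100.0;

		st.dedupe_percentage = (unsigned int) (perc + 0.5);
		st.ws_percent = 100.0 * (double) st.ndupextents / (double) st.nextents;
	}
	return st;
}
```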
diff --git a/thread_options.h b/thread_options.h
index 05c2d138..4b4ecfe1 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -31,6 +31,14 @@ enum fio_memtype {
 	MEM_CUDA_MALLOC,/* use GPU memory */
 };
 
+/*
+ * What mode to use for deduped data generation
+ */
+enum dedupe_mode {
+	DEDUPE_MODE_REPEAT = 0,
+	DEDUPE_MODE_WORKING_SET = 1,
+};
+
 #define ERROR_STR_MAX	128
 
 #define BSSPLIT_MAX	64
@@ -243,6 +251,8 @@ struct thread_options {
 	unsigned int compress_percentage;
 	unsigned int compress_chunk;
 	unsigned int dedupe_percentage;
+	unsigned int dedupe_mode;
+	unsigned int dedupe_working_set_percentage;
 	unsigned int time_based;
 	unsigned int disable_lat;
 	unsigned int disable_clat;
@@ -549,6 +559,8 @@ struct thread_options_pack {
 	uint32_t compress_percentage;
 	uint32_t compress_chunk;
 	uint32_t dedupe_percentage;
+	uint32_t dedupe_mode;
+	uint32_t dedupe_working_set_percentage;
 	uint32_t time_based;
 	uint32_t disable_lat;
 	uint32_t disable_clat;



* Recent changes (master)
@ 2021-07-08 12:00 Jens Axboe
From: Jens Axboe @ 2021-07-08 12:00 UTC
  To: fio

The following changes since commit 77c72e0f504364adf6a0e8f1155fdf3fd68ef248:

  Merge branch 'dedupe_bugfix' of https://github.com/bardavid/fio (2021-07-01 13:27:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7a9cc9c93c1384f72ac16d1d7980e158ec5f9f0a:

  Makefile: use override directive on engine CFLAGS (2021-07-07 07:05:08 -0600)

----------------------------------------------------------------
Stefan Hajnoczi (1):
      Makefile: use override directive on engine CFLAGS

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index f57569d5..510e07fc 100644
--- a/Makefile
+++ b/Makefile
@@ -293,7 +293,7 @@ else # !CONFIG_DYNAMIC_ENGINES
 define engine_template =
 SOURCE += $$($(1)_SRCS)
 LIBS += $$($(1)_LIBS)
-CFLAGS += $$($(1)_CFLAGS)
+override CFLAGS += $$($(1)_CFLAGS)
 endef
 endif
 



* Recent changes (master)
@ 2021-07-02 12:00 Jens Axboe
From: Jens Axboe @ 2021-07-02 12:00 UTC
  To: fio

The following changes since commit ea51055cbb2fcbca3935e25c78e8b6d358ca2b3f:

  zbd: ensure that global max open zones limit is respected (2021-06-29 07:43:30 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 77c72e0f504364adf6a0e8f1155fdf3fd68ef248:

  Merge branch 'dedupe_bugfix' of https://github.com/bardavid/fio (2021-07-01 13:27:39 -0600)

----------------------------------------------------------------
Bar David (1):
      dedupe: fixing bug with subsequent dedupe buffer generation

Jens Axboe (1):
      Merge branch 'dedupe_bugfix' of https://github.com/bardavid/fio

 fio.h  |  1 +
 io_u.c | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index b05cb3df..83334652 100644
--- a/fio.h
+++ b/fio.h
@@ -259,6 +259,7 @@ struct thread_data {
 
 	struct frand_state buf_state;
 	struct frand_state buf_state_prev;
+	struct frand_state buf_state_ret;
 	struct frand_state dedupe_state;
 	struct frand_state zone_state;
 	struct frand_state prio_state;
diff --git a/io_u.c b/io_u.c
index b421a579..b60488a3 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2182,8 +2182,16 @@ static struct frand_state *get_buf_state(struct thread_data *td)
 
 	v = rand_between(&td->dedupe_state, 1, 100);
 
-	if (v <= td->o.dedupe_percentage)
-		return &td->buf_state_prev;
+	if (v <= td->o.dedupe_percentage) {
+		/*
+		 * The caller advances the returned frand_state.
+		 * A copy of prev should be returned instead since
+		 * a subsequent intention to generate a deduped buffer
+		 * might result in generating a unique one
+		 */
+		frand_copy(&td->buf_state_ret, &td->buf_state_prev);
+		return &td->buf_state_ret;
+	}
 
 	return &td->buf_state;
 }



* Recent changes (master)
@ 2021-06-30 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-30 12:00 UTC
  To: fio

The following changes since commit d3dacdc61dfe878fda0c363084c4330492e38b2b:

  Merge branch 'pkg_config_1' of https://github.com/kusumi/fio (2021-06-20 10:44:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ea51055cbb2fcbca3935e25c78e8b6d358ca2b3f:

  zbd: ensure that global max open zones limit is respected (2021-06-29 07:43:30 -0600)

----------------------------------------------------------------
Erwan Velu (2):
      tools: Adding fiograph
      examples: Avoid duplicated items

Jens Axboe (1):
      Merge branch 'evelu-fiog' of https://github.com/ErwanAliasr1/fio

Niklas Cassel (3):
      zbd: create a local zbdi variable for f->zbd_info
      zbd: allow an unlimited global max open zones limit
      zbd: ensure that global max open zones limit is respected

 examples/1mbs_clients.png               | Bin 0 -> 121336 bytes
 examples/aio-read.png                   | Bin 0 -> 76819 bytes
 examples/backwards-read.png             | Bin 0 -> 31100 bytes
 examples/basic-verify.png               | Bin 0 -> 35952 bytes
 examples/butterfly.png                  | Bin 0 -> 35393 bytes
 examples/cpp_null.png                   | Bin 0 -> 34346 bytes
 examples/cpuio.png                      | Bin 0 -> 52593 bytes
 examples/cross-stripe-verify.png        | Bin 0 -> 54366 bytes
 examples/dev-dax.png                    | Bin 0 -> 93539 bytes
 examples/dfs.png                        | Bin 0 -> 187461 bytes
 examples/disk-zone-profile.png          | Bin 0 -> 37313 bytes
 examples/e4defrag.png                   | Bin 0 -> 97107 bytes
 examples/e4defrag2.fio                  |   2 +-
 examples/e4defrag2.png                  | Bin 0 -> 222226 bytes
 examples/enospc-pressure.png            | Bin 0 -> 150373 bytes
 examples/exitwhat.png                   | Bin 0 -> 111627 bytes
 examples/falloc.png                     | Bin 0 -> 129273 bytes
 examples/filecreate-ioengine.png        | Bin 0 -> 59636 bytes
 examples/filedelete-ioengine.png        | Bin 0 -> 33042 bytes
 examples/filestat-ioengine.png          | Bin 0 -> 52330 bytes
 examples/fio-rand-RW.png                | Bin 0 -> 47406 bytes
 examples/fio-rand-read.png              | Bin 0 -> 36614 bytes
 examples/fio-rand-write.png             | Bin 0 -> 38608 bytes
 examples/fio-seq-RW.png                 | Bin 0 -> 48279 bytes
 examples/fio-seq-read.png               | Bin 0 -> 37851 bytes
 examples/fio-seq-write.png              | Bin 0 -> 42756 bytes
 examples/fixed-rate-submission.png      | Bin 0 -> 41703 bytes
 examples/flow.png                       | Bin 0 -> 63860 bytes
 examples/fsx.png                        | Bin 0 -> 37310 bytes
 examples/ftruncate.png                  | Bin 0 -> 56594 bytes
 examples/gfapi.png                      | Bin 0 -> 46875 bytes
 examples/gpudirect-rdmaio-client.png    | Bin 0 -> 50659 bytes
 examples/gpudirect-rdmaio-server.png    | Bin 0 -> 37805 bytes
 examples/http-s3.png                    | Bin 0 -> 108929 bytes
 examples/http-swift.png                 | Bin 0 -> 113698 bytes
 examples/http-webdav.png                | Bin 0 -> 86857 bytes
 examples/ime.png                        | Bin 0 -> 193722 bytes
 examples/iometer-file-access-server.png | Bin 0 -> 44797 bytes
 examples/jesd219.png                    | Bin 0 -> 64846 bytes
 examples/latency-profile.png            | Bin 0 -> 44487 bytes
 examples/libcufile-cufile.png           | Bin 0 -> 160611 bytes
 examples/libcufile-posix.png            | Bin 0 -> 164649 bytes
 examples/libhdfs.png                    | Bin 0 -> 32812 bytes
 examples/libiscsi.png                   | Bin 0 -> 31649 bytes
 examples/libpmem.png                    | Bin 0 -> 119668 bytes
 examples/librpma_apm-client.png         | Bin 0 -> 53792 bytes
 examples/librpma_apm-server.png         | Bin 0 -> 42611 bytes
 examples/librpma_gpspm-client.png       | Bin 0 -> 56398 bytes
 examples/librpma_gpspm-server.png       | Bin 0 -> 53793 bytes
 examples/libzbc-rand-write.png          | Bin 0 -> 48503 bytes
 examples/libzbc-seq-read.png            | Bin 0 -> 47229 bytes
 examples/mtd.fio                        |   4 +-
 examples/mtd.png                        | Bin 0 -> 79866 bytes
 examples/nbd.png                        | Bin 0 -> 88667 bytes
 examples/netio.png                      | Bin 0 -> 50944 bytes
 examples/netio_multicast.png            | Bin 0 -> 74921 bytes
 examples/nfs.png                        | Bin 0 -> 84808 bytes
 examples/null.png                       | Bin 0 -> 30223 bytes
 examples/numa.png                       | Bin 0 -> 66068 bytes
 examples/pmemblk.fio                    |   2 +-
 examples/pmemblk.png                    | Bin 0 -> 107529 bytes
 examples/poisson-rate-submission.png    | Bin 0 -> 41057 bytes
 examples/rados.png                      | Bin 0 -> 39665 bytes
 examples/rand-zones.png                 | Bin 0 -> 38297 bytes
 examples/rbd.png                        | Bin 0 -> 37191 bytes
 examples/rdmaio-client.png              | Bin 0 -> 44671 bytes
 examples/rdmaio-server.png              | Bin 0 -> 31860 bytes
 examples/ssd-steadystate.png            | Bin 0 -> 71772 bytes
 examples/ssd-test.png                   | Bin 0 -> 99835 bytes
 examples/steadystate.png                | Bin 0 -> 64580 bytes
 examples/surface-scan.png               | Bin 0 -> 72042 bytes
 examples/test.png                       | Bin 0 -> 30141 bytes
 examples/tiobench-example.png           | Bin 0 -> 71939 bytes
 examples/waitfor.png                    | Bin 0 -> 94577 bytes
 examples/zbd-rand-write.png             | Bin 0 -> 53018 bytes
 examples/zbd-seq-read.png               | Bin 0 -> 50185 bytes
 examples/zipf.png                       | Bin 0 -> 33276 bytes
 tools/fiograph/fiograph.conf            | 102 +++++++++++
 tools/fiograph/fiograph.py              | 305 ++++++++++++++++++++++++++++++++
 zbd.c                                   | 111 +++++++-----
 zbd.h                                   |   3 +-
 81 files changed, 484 insertions(+), 45 deletions(-)
 create mode 100644 examples/1mbs_clients.png
 create mode 100644 examples/aio-read.png
 create mode 100644 examples/backwards-read.png
 create mode 100644 examples/basic-verify.png
 create mode 100644 examples/butterfly.png
 create mode 100644 examples/cpp_null.png
 create mode 100644 examples/cpuio.png
 create mode 100644 examples/cross-stripe-verify.png
 create mode 100644 examples/dev-dax.png
 create mode 100644 examples/dfs.png
 create mode 100644 examples/disk-zone-profile.png
 create mode 100644 examples/e4defrag.png
 create mode 100644 examples/e4defrag2.png
 create mode 100644 examples/enospc-pressure.png
 create mode 100644 examples/exitwhat.png
 create mode 100644 examples/falloc.png
 create mode 100644 examples/filecreate-ioengine.png
 create mode 100644 examples/filedelete-ioengine.png
 create mode 100644 examples/filestat-ioengine.png
 create mode 100644 examples/fio-rand-RW.png
 create mode 100644 examples/fio-rand-read.png
 create mode 100644 examples/fio-rand-write.png
 create mode 100644 examples/fio-seq-RW.png
 create mode 100644 examples/fio-seq-read.png
 create mode 100644 examples/fio-seq-write.png
 create mode 100644 examples/fixed-rate-submission.png
 create mode 100644 examples/flow.png
 create mode 100644 examples/fsx.png
 create mode 100644 examples/ftruncate.png
 create mode 100644 examples/gfapi.png
 create mode 100644 examples/gpudirect-rdmaio-client.png
 create mode 100644 examples/gpudirect-rdmaio-server.png
 create mode 100644 examples/http-s3.png
 create mode 100644 examples/http-swift.png
 create mode 100644 examples/http-webdav.png
 create mode 100644 examples/ime.png
 create mode 100644 examples/iometer-file-access-server.png
 create mode 100644 examples/jesd219.png
 create mode 100644 examples/latency-profile.png
 create mode 100644 examples/libcufile-cufile.png
 create mode 100644 examples/libcufile-posix.png
 create mode 100644 examples/libhdfs.png
 create mode 100644 examples/libiscsi.png
 create mode 100644 examples/libpmem.png
 create mode 100644 examples/librpma_apm-client.png
 create mode 100644 examples/librpma_apm-server.png
 create mode 100644 examples/librpma_gpspm-client.png
 create mode 100644 examples/librpma_gpspm-server.png
 create mode 100644 examples/libzbc-rand-write.png
 create mode 100644 examples/libzbc-seq-read.png
 create mode 100644 examples/mtd.png
 create mode 100644 examples/nbd.png
 create mode 100644 examples/netio.png
 create mode 100644 examples/netio_multicast.png
 create mode 100644 examples/nfs.png
 create mode 100644 examples/null.png
 create mode 100644 examples/numa.png
 create mode 100644 examples/pmemblk.png
 create mode 100644 examples/poisson-rate-submission.png
 create mode 100644 examples/rados.png
 create mode 100644 examples/rand-zones.png
 create mode 100644 examples/rbd.png
 create mode 100644 examples/rdmaio-client.png
 create mode 100644 examples/rdmaio-server.png
 create mode 100644 examples/ssd-steadystate.png
 create mode 100644 examples/ssd-test.png
 create mode 100644 examples/steadystate.png
 create mode 100644 examples/surface-scan.png
 create mode 100644 examples/test.png
 create mode 100644 examples/tiobench-example.png
 create mode 100644 examples/waitfor.png
 create mode 100644 examples/zbd-rand-write.png
 create mode 100644 examples/zbd-seq-read.png
 create mode 100644 examples/zipf.png
 create mode 100644 tools/fiograph/fiograph.conf
 create mode 100755 tools/fiograph/fiograph.py
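
The new tools/fiograph/fiograph.py walks job sections in file order and starts a new execution group (drawn as a graphviz cluster) when a job waits for the jobs before it. The sketch below is a simplified model of that grouping, not the script itself. It assumes that only `stonewall` and its alias `wait_for_previous` start a new group and ignores the `wait_for` dependencies the real script also draws. Each membership test is spelled out separately, because an expression such as `('stonewall' or 'wait_for_previous') in section` evaluates the `or` first and therefore only checks `stonewall`.

```python
import configparser

JOBS = """
[global]
ioengine=libaio

[seq-read]
rw=read

[rand-read]
rw=randread

[write]
stonewall
rw=write

[verify]
wait_for_previous
rw=read
"""


def execution_groups(job_text):
    """Group job names into execution groups separated by barriers."""
    parser = configparser.RawConfigParser(allow_no_value=True,
                                          default_section="global")
    parser.read_string(job_text)
    groups = []
    for name in parser.sections():
        section = parser[name]
        # Test each option on its own: ('stonewall' or 'wait_for_previous')
        # evaluates to 'stonewall', so the second name would never be checked.
        barrier = "stonewall" in section or "wait_for_previous" in section
        if barrier or not groups:
            groups.append([])
        groups[-1].append(name)
    return groups


print(execution_groups(JOBS))
# [['seq-read', 'rand-read'], ['write'], ['verify']]
```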

---

Diff of recent changes:

diff --git a/examples/1mbs_clients.png b/examples/1mbs_clients.png
new file mode 100644
index 00000000..3f972dc6
Binary files /dev/null and b/examples/1mbs_clients.png differ
diff --git a/examples/aio-read.png b/examples/aio-read.png
new file mode 100644
index 00000000..e0c020a5
Binary files /dev/null and b/examples/aio-read.png differ
diff --git a/examples/backwards-read.png b/examples/backwards-read.png
new file mode 100644
index 00000000..81dc9208
Binary files /dev/null and b/examples/backwards-read.png differ
diff --git a/examples/basic-verify.png b/examples/basic-verify.png
new file mode 100644
index 00000000..98f73020
Binary files /dev/null and b/examples/basic-verify.png differ
diff --git a/examples/butterfly.png b/examples/butterfly.png
new file mode 100644
index 00000000..2c566512
Binary files /dev/null and b/examples/butterfly.png differ
diff --git a/examples/cpp_null.png b/examples/cpp_null.png
new file mode 100644
index 00000000..5303ac2a
Binary files /dev/null and b/examples/cpp_null.png differ
diff --git a/examples/cpuio.png b/examples/cpuio.png
new file mode 100644
index 00000000..02938dbb
Binary files /dev/null and b/examples/cpuio.png differ
diff --git a/examples/cross-stripe-verify.png b/examples/cross-stripe-verify.png
new file mode 100644
index 00000000..90aa630f
Binary files /dev/null and b/examples/cross-stripe-verify.png differ
diff --git a/examples/dev-dax.png b/examples/dev-dax.png
new file mode 100644
index 00000000..2463bca3
Binary files /dev/null and b/examples/dev-dax.png differ
diff --git a/examples/dfs.png b/examples/dfs.png
new file mode 100644
index 00000000..049ccaec
Binary files /dev/null and b/examples/dfs.png differ
diff --git a/examples/disk-zone-profile.png b/examples/disk-zone-profile.png
new file mode 100644
index 00000000..5f7b24c9
Binary files /dev/null and b/examples/disk-zone-profile.png differ
diff --git a/examples/e4defrag.png b/examples/e4defrag.png
new file mode 100644
index 00000000..00a7fefd
Binary files /dev/null and b/examples/e4defrag.png differ
diff --git a/examples/e4defrag2.fio b/examples/e4defrag2.fio
index 2d4e1a87..86554ef7 100644
--- a/examples/e4defrag2.fio
+++ b/examples/e4defrag2.fio
@@ -48,7 +48,7 @@ donorname=file.def
 
 ########
 # Run random e4defrag and various aio workers in parallel
-[e4defrag-fuzzer-4k]
+[e4defrag-fuzzer-4k-bis]
 stonewall
 continue_on_error=all
 inplace=1
diff --git a/examples/e4defrag2.png b/examples/e4defrag2.png
new file mode 100644
index 00000000..8a128e95
Binary files /dev/null and b/examples/e4defrag2.png differ
diff --git a/examples/enospc-pressure.png b/examples/enospc-pressure.png
new file mode 100644
index 00000000..da28b7c0
Binary files /dev/null and b/examples/enospc-pressure.png differ
diff --git a/examples/exitwhat.png b/examples/exitwhat.png
new file mode 100644
index 00000000..9fc1883f
Binary files /dev/null and b/examples/exitwhat.png differ
diff --git a/examples/falloc.png b/examples/falloc.png
new file mode 100644
index 00000000..886be22e
Binary files /dev/null and b/examples/falloc.png differ
diff --git a/examples/filecreate-ioengine.png b/examples/filecreate-ioengine.png
new file mode 100644
index 00000000..45d11da3
Binary files /dev/null and b/examples/filecreate-ioengine.png differ
diff --git a/examples/filedelete-ioengine.png b/examples/filedelete-ioengine.png
new file mode 100644
index 00000000..3512ab71
Binary files /dev/null and b/examples/filedelete-ioengine.png differ
diff --git a/examples/filestat-ioengine.png b/examples/filestat-ioengine.png
new file mode 100644
index 00000000..bed59ab9
Binary files /dev/null and b/examples/filestat-ioengine.png differ
diff --git a/examples/fio-rand-RW.png b/examples/fio-rand-RW.png
new file mode 100644
index 00000000..aa4b0998
Binary files /dev/null and b/examples/fio-rand-RW.png differ
diff --git a/examples/fio-rand-read.png b/examples/fio-rand-read.png
new file mode 100644
index 00000000..d45664a4
Binary files /dev/null and b/examples/fio-rand-read.png differ
diff --git a/examples/fio-rand-write.png b/examples/fio-rand-write.png
new file mode 100644
index 00000000..10e068bc
Binary files /dev/null and b/examples/fio-rand-write.png differ
diff --git a/examples/fio-seq-RW.png b/examples/fio-seq-RW.png
new file mode 100644
index 00000000..a2be35ec
Binary files /dev/null and b/examples/fio-seq-RW.png differ
diff --git a/examples/fio-seq-read.png b/examples/fio-seq-read.png
new file mode 100644
index 00000000..cf8f2978
Binary files /dev/null and b/examples/fio-seq-read.png differ
diff --git a/examples/fio-seq-write.png b/examples/fio-seq-write.png
new file mode 100644
index 00000000..8db12092
Binary files /dev/null and b/examples/fio-seq-write.png differ
diff --git a/examples/fixed-rate-submission.png b/examples/fixed-rate-submission.png
new file mode 100644
index 00000000..86ca9b3e
Binary files /dev/null and b/examples/fixed-rate-submission.png differ
diff --git a/examples/flow.png b/examples/flow.png
new file mode 100644
index 00000000..26a3d34c
Binary files /dev/null and b/examples/flow.png differ
diff --git a/examples/fsx.png b/examples/fsx.png
new file mode 100644
index 00000000..b4e13c80
Binary files /dev/null and b/examples/fsx.png differ
diff --git a/examples/ftruncate.png b/examples/ftruncate.png
new file mode 100644
index 00000000..b98895f6
Binary files /dev/null and b/examples/ftruncate.png differ
diff --git a/examples/gfapi.png b/examples/gfapi.png
new file mode 100644
index 00000000..acc6a6ae
Binary files /dev/null and b/examples/gfapi.png differ
diff --git a/examples/gpudirect-rdmaio-client.png b/examples/gpudirect-rdmaio-client.png
new file mode 100644
index 00000000..eac79858
Binary files /dev/null and b/examples/gpudirect-rdmaio-client.png differ
diff --git a/examples/gpudirect-rdmaio-server.png b/examples/gpudirect-rdmaio-server.png
new file mode 100644
index 00000000..e043d7c0
Binary files /dev/null and b/examples/gpudirect-rdmaio-server.png differ
diff --git a/examples/http-s3.png b/examples/http-s3.png
new file mode 100644
index 00000000..2021e85e
Binary files /dev/null and b/examples/http-s3.png differ
diff --git a/examples/http-swift.png b/examples/http-swift.png
new file mode 100644
index 00000000..9928fb16
Binary files /dev/null and b/examples/http-swift.png differ
diff --git a/examples/http-webdav.png b/examples/http-webdav.png
new file mode 100644
index 00000000..c37c3de5
Binary files /dev/null and b/examples/http-webdav.png differ
diff --git a/examples/ime.png b/examples/ime.png
new file mode 100644
index 00000000..f636f5e7
Binary files /dev/null and b/examples/ime.png differ
diff --git a/examples/iometer-file-access-server.png b/examples/iometer-file-access-server.png
new file mode 100644
index 00000000..e3124554
Binary files /dev/null and b/examples/iometer-file-access-server.png differ
diff --git a/examples/jesd219.png b/examples/jesd219.png
new file mode 100644
index 00000000..73b5a124
Binary files /dev/null and b/examples/jesd219.png differ
diff --git a/examples/latency-profile.png b/examples/latency-profile.png
new file mode 100644
index 00000000..50650df8
Binary files /dev/null and b/examples/latency-profile.png differ
diff --git a/examples/libcufile-cufile.png b/examples/libcufile-cufile.png
new file mode 100644
index 00000000..f3758e5d
Binary files /dev/null and b/examples/libcufile-cufile.png differ
diff --git a/examples/libcufile-posix.png b/examples/libcufile-posix.png
new file mode 100644
index 00000000..7818feb4
Binary files /dev/null and b/examples/libcufile-posix.png differ
diff --git a/examples/libhdfs.png b/examples/libhdfs.png
new file mode 100644
index 00000000..e774c911
Binary files /dev/null and b/examples/libhdfs.png differ
diff --git a/examples/libiscsi.png b/examples/libiscsi.png
new file mode 100644
index 00000000..d0006cc0
Binary files /dev/null and b/examples/libiscsi.png differ
diff --git a/examples/libpmem.png b/examples/libpmem.png
new file mode 100644
index 00000000..8a9a1432
Binary files /dev/null and b/examples/libpmem.png differ
diff --git a/examples/librpma_apm-client.png b/examples/librpma_apm-client.png
new file mode 100644
index 00000000..2fe02cdf
Binary files /dev/null and b/examples/librpma_apm-client.png differ
diff --git a/examples/librpma_apm-server.png b/examples/librpma_apm-server.png
new file mode 100644
index 00000000..f78ae02e
Binary files /dev/null and b/examples/librpma_apm-server.png differ
diff --git a/examples/librpma_gpspm-client.png b/examples/librpma_gpspm-client.png
new file mode 100644
index 00000000..0c975a27
Binary files /dev/null and b/examples/librpma_gpspm-client.png differ
diff --git a/examples/librpma_gpspm-server.png b/examples/librpma_gpspm-server.png
new file mode 100644
index 00000000..56124533
Binary files /dev/null and b/examples/librpma_gpspm-server.png differ
diff --git a/examples/libzbc-rand-write.png b/examples/libzbc-rand-write.png
new file mode 100644
index 00000000..1d277412
Binary files /dev/null and b/examples/libzbc-rand-write.png differ
diff --git a/examples/libzbc-seq-read.png b/examples/libzbc-seq-read.png
new file mode 100644
index 00000000..5a532228
Binary files /dev/null and b/examples/libzbc-seq-read.png differ
diff --git a/examples/mtd.fio b/examples/mtd.fio
index e5dcea4c..0a7f2bae 100644
--- a/examples/mtd.fio
+++ b/examples/mtd.fio
@@ -6,7 +6,7 @@ ignore_error=,EIO
 blocksize=512,512,16384
 skip_bad=1
 
-[write]
+[trim]
 stonewall
 rw=trim
 
@@ -14,7 +14,7 @@ rw=trim
 stonewall
 rw=write
 
-[write]
+[trimwrite]
 stonewall
 block_error_percentiles=1
 rw=trimwrite
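
Both example fixes rename a job that reused an earlier job's name: `[write]` appears twice in mtd.fio, and e4defrag2.fio renames its second `[e4defrag-fuzzer-4k]` to `[e4defrag-fuzzer-4k-bis]`. One plausible motivation, which is an assumption rather than something the commits state, is that fiograph reads job files with Python's `configparser.RawConfigParser`, whose default strict mode rejects repeated section names. A minimal sketch with a shortened, hypothetical job file:

```python
import configparser


def duplicate_job(job_text):
    """Return the first repeated job name, or None if all names are unique."""
    # Same parser settings fiograph uses: bare options such as 'stonewall'
    # are allowed and [global] is the default section.
    parser = configparser.RawConfigParser(allow_no_value=True,
                                          default_section="global")
    try:
        parser.read_string(job_text)
    except configparser.DuplicateSectionError as err:
        return err.section
    return None


OLD = """
[write]
stonewall
rw=trim

[write]
stonewall
rw=trimwrite
"""

NEW = """
[trim]
stonewall
rw=trim

[trimwrite]
stonewall
rw=trimwrite
"""

print(duplicate_job(OLD))  # write
print(duplicate_job(NEW))  # None
```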
diff --git a/examples/mtd.png b/examples/mtd.png
new file mode 100644
index 00000000..8cb3692e
Binary files /dev/null and b/examples/mtd.png differ
diff --git a/examples/nbd.png b/examples/nbd.png
new file mode 100644
index 00000000..e3bcf610
Binary files /dev/null and b/examples/nbd.png differ
diff --git a/examples/netio.png b/examples/netio.png
new file mode 100644
index 00000000..81afd41d
Binary files /dev/null and b/examples/netio.png differ
diff --git a/examples/netio_multicast.png b/examples/netio_multicast.png
new file mode 100644
index 00000000..f07ab4b7
Binary files /dev/null and b/examples/netio_multicast.png differ
diff --git a/examples/nfs.png b/examples/nfs.png
new file mode 100644
index 00000000..29dbca0d
Binary files /dev/null and b/examples/nfs.png differ
diff --git a/examples/null.png b/examples/null.png
new file mode 100644
index 00000000..052671db
Binary files /dev/null and b/examples/null.png differ
diff --git a/examples/numa.png b/examples/numa.png
new file mode 100644
index 00000000..1ef45759
Binary files /dev/null and b/examples/numa.png differ
diff --git a/examples/pmemblk.fio b/examples/pmemblk.fio
index f8131741..59bb2a8a 100644
--- a/examples/pmemblk.fio
+++ b/examples/pmemblk.fio
@@ -55,7 +55,7 @@ unlink=0
 # size, this is not required.
 #
 filename=/pmem0/fio-test,4096,1024
-filename=/pmem1/fio-test,4096,1024
+#filename=/pmem1/fio-test,4096,1024
 
 [pmemblk-write]
 rw=randwrite
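
This hunk comments out the second `filename=` line. Under the same `configparser` assumption, repeating an option inside one section is also rejected in strict mode. With `strict=False` the later value silently replaces the earlier one, so a parser-based tool would see only one of the two files. A sketch using a hypothetical `[global]` excerpt:

```python
import configparser

JOB = """
[global]
filename=/pmem0/fio-test,4096,1024
filename=/pmem1/fio-test,4096,1024
"""


def read_filename(job_text, strict):
    """Return the parsed 'filename' value, or a description of the error."""
    parser = configparser.RawConfigParser(allow_no_value=True,
                                          default_section="global",
                                          strict=strict)
    try:
        parser.read_string(job_text)
    except configparser.DuplicateOptionError as err:
        return "duplicate option {!r} in [{}]".format(err.option, err.section)
    # With default_section="global", [global] options land in defaults().
    return parser.defaults()["filename"]


print(read_filename(JOB, strict=True))   # duplicate option 'filename' in [global]
print(read_filename(JOB, strict=False))  # /pmem1/fio-test,4096,1024
```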
diff --git a/examples/pmemblk.png b/examples/pmemblk.png
new file mode 100644
index 00000000..250e254b
Binary files /dev/null and b/examples/pmemblk.png differ
diff --git a/examples/poisson-rate-submission.png b/examples/poisson-rate-submission.png
new file mode 100644
index 00000000..739c2560
Binary files /dev/null and b/examples/poisson-rate-submission.png differ
diff --git a/examples/rados.png b/examples/rados.png
new file mode 100644
index 00000000..91bd61a0
Binary files /dev/null and b/examples/rados.png differ
diff --git a/examples/rand-zones.png b/examples/rand-zones.png
new file mode 100644
index 00000000..13cbfb47
Binary files /dev/null and b/examples/rand-zones.png differ
diff --git a/examples/rbd.png b/examples/rbd.png
new file mode 100644
index 00000000..f1186139
Binary files /dev/null and b/examples/rbd.png differ
diff --git a/examples/rdmaio-client.png b/examples/rdmaio-client.png
new file mode 100644
index 00000000..4e4bc289
Binary files /dev/null and b/examples/rdmaio-client.png differ
diff --git a/examples/rdmaio-server.png b/examples/rdmaio-server.png
new file mode 100644
index 00000000..fc344725
Binary files /dev/null and b/examples/rdmaio-server.png differ
diff --git a/examples/ssd-steadystate.png b/examples/ssd-steadystate.png
new file mode 100644
index 00000000..eb27f8a4
Binary files /dev/null and b/examples/ssd-steadystate.png differ
diff --git a/examples/ssd-test.png b/examples/ssd-test.png
new file mode 100644
index 00000000..a92ed153
Binary files /dev/null and b/examples/ssd-test.png differ
diff --git a/examples/steadystate.png b/examples/steadystate.png
new file mode 100644
index 00000000..4bb90484
Binary files /dev/null and b/examples/steadystate.png differ
diff --git a/examples/surface-scan.png b/examples/surface-scan.png
new file mode 100644
index 00000000..00573808
Binary files /dev/null and b/examples/surface-scan.png differ
diff --git a/examples/test.png b/examples/test.png
new file mode 100644
index 00000000..6be50029
Binary files /dev/null and b/examples/test.png differ
diff --git a/examples/tiobench-example.png b/examples/tiobench-example.png
new file mode 100644
index 00000000..14410326
Binary files /dev/null and b/examples/tiobench-example.png differ
diff --git a/examples/waitfor.png b/examples/waitfor.png
new file mode 100644
index 00000000..64e4bf94
Binary files /dev/null and b/examples/waitfor.png differ
diff --git a/examples/zbd-rand-write.png b/examples/zbd-rand-write.png
new file mode 100644
index 00000000..d58721be
Binary files /dev/null and b/examples/zbd-rand-write.png differ
diff --git a/examples/zbd-seq-read.png b/examples/zbd-seq-read.png
new file mode 100644
index 00000000..b81a08c4
Binary files /dev/null and b/examples/zbd-seq-read.png differ
diff --git a/examples/zipf.png b/examples/zipf.png
new file mode 100644
index 00000000..cb2a9816
Binary files /dev/null and b/examples/zipf.png differ
diff --git a/tools/fiograph/fiograph.conf b/tools/fiograph/fiograph.conf
new file mode 100644
index 00000000..7b851e19
--- /dev/null
+++ b/tools/fiograph/fiograph.conf
@@ -0,0 +1,102 @@
+[fio_jobs]
+header=<<B><font color="{}"> {} </font></B> >
+header_color=black
+text_color=darkgreen
+shape=box
+shape_color=blue
+style=rounded
+title_style=<<table border='0' cellborder='0' cellspacing='1'> <tr> <td align='center'> <b> {} </b> </td> </tr>
+item_style=<tr> <td align = "left"> <font color="{}" > {} </font> </td> </tr>
+cluster_style=filled
+cluster_color=gainsboro
+
+[exec_prerun]
+text_color=red
+
+[exec_postrun]
+text_color=red
+
+[numjobs]
+text_color=red
+style=<font color="{}" > x {} </font>
+
+[ioengine]
+text_color=darkblue
+specific_options_color=darkblue
+
+# definitions of engine's specific options
+
+[ioengine_cpuio]
+specific_options=cpuload cpumode cpuchunks exit_on_io_done
+
+[ioengine_dfs]
+specific_options=pool  cont  chunk_size  object_class  svcl
+
+[ioengine_e4defrag]
+specific_options=donorname  inplace
+
+[ioengine_filestat]
+specific_options=stat_type
+
+[ioengine_single-instance]
+specific_options=volume  brick
+
+[ioengine_http]
+specific_options=https  http_host  http_user  http_pass  http_s3_key  http_s3_keyid  http_swift_auth_token  http_s3_region  http_mode  http_verbose
+
+[ioengine_ime_aio]
+specific_options=ime_psync  ime_psyncv
+
+[ioengine_io_uring]
+specific_options=hipri  cmdprio_percentage  fixedbufs  registerfiles  sqthread_poll  sqthread_poll_cpu  nonvectored  uncached  nowait  force_async
+
+[ioengine_libaio]
+specific_options=userspace_reap  cmdprio_percentage  nowait
+
+[ioengine_libcufile]
+specific_options=gpu_dev_ids  cuda_io
+
+[ioengine_libhdfs]
+specific_options=namenode  hostname  port  hdfsdirectory  chunk_size  single_instance  hdfs_use_direct
+
+[ioengine_libiscsi]
+specific_options=initiator
+
+[ioengine_librpma_apm_server]
+specific_options=librpma_apm_client
+
+[ioengine_busy_wait_polling]
+specific_options=serverip  port  direct_write_to_pmem
+
+[ioengine_librpma_gpspm_server]
+specific_options=librpma_gpspm_client
+
+[ioengine_mmap]
+specific_options=thp
+
+[ioengine_mtd]
+specific_options=skip_bad
+
+[ioengine_nbd]
+specific_options=uri
+
+[ioengine_net]
+specific_options=hostname  port  protocol  nodelay  listen  pingpong  interface  ttl  window_size  mss  netsplice
+
+[ioengine_nfs]
+specific_options=nfs_url
+
+[ioengine_rados]
+specific_options=clustername  pool  clientname  busy_poll  touch_objects
+
+[ioengine_rbd]
+specific_options=clustername  rbdname  pool  clientname  busy_poll
+
+[ioengine_rdma]
+specific_options=hostname  bindname  port  verb
+
+[ioengine_sg]
+specific_options=hipri  readfua  writefua  sg_write_mode  sg
+
+[ioengine_pvsync2]
+specific_options=hipri  hipri_percentage  uncached  nowait  sync  psync  vsync  pvsync
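
fiograph.conf separates the entries of each `specific_options` list with two spaces. The sketch below uses a hypothetical helper, not fiograph's own `get_specific_options()`, and a hypothetical list that repeats an entry. It shows why `str.split()` with no argument is the safer way to tokenize such a value, and how `dict.fromkeys()` drops an accidental repeat while keeping the original order. Without that, a repeated name would be rendered twice in a job's box.

```python
import configparser

CONF = """
[ioengine_libaio]
specific_options=userspace_reap  cmdprio_percentage  cmdprio_percentage  nowait
"""


def specific_options(conf, engine):
    """Return the engine-specific option names listed in the config."""
    raw = conf.get("ioengine_{}".format(engine), "specific_options", fallback="")
    # split() collapses runs of whitespace, so no empty names appear;
    # dict.fromkeys() removes duplicates but preserves first-seen order.
    return list(dict.fromkeys(raw.split()))


conf = configparser.RawConfigParser()
conf.read_string(CONF)
raw = conf["ioengine_libaio"]["specific_options"]
print(raw.split(" "))
# ['userspace_reap', '', 'cmdprio_percentage', '', 'cmdprio_percentage', '', 'nowait']
print(specific_options(conf, "libaio"))
# ['userspace_reap', 'cmdprio_percentage', 'nowait']
print(specific_options(conf, "sync"))  # []
```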
diff --git a/tools/fiograph/fiograph.py b/tools/fiograph/fiograph.py
new file mode 100755
index 00000000..7695c964
--- /dev/null
+++ b/tools/fiograph/fiograph.py
@@ -0,0 +1,305 @@
+#!/usr/bin/env python3
+from graphviz import Digraph
+import argparse
+import configparser
+import os
+
+config_file = None
+fio_file = None
+
+
+def get_section_option(section_name, option_name, default=None):
+    global fio_file
+    if fio_file.has_option(section_name, option_name):
+        return fio_file[section_name][option_name]
+    return default
+
+
+def get_config_option(section_name, option_name, default=None):
+    global config_file
+    if config_file.has_option(section_name, option_name):
+        return config_file[section_name][option_name]
+    return default
+
+
+def get_header_color(keyword='fio_jobs', default_color='black'):
+    return get_config_option(keyword, 'header_color', default_color)
+
+
+def get_shape_color(keyword='fio_jobs', default_color='black'):
+    return get_config_option(keyword, 'shape_color', default_color)
+
+
+def get_text_color(keyword='fio_jobs', default_color='black'):
+    return get_config_option(keyword, 'text_color', default_color)
+
+
+def get_cluster_color(keyword='fio_jobs', default_color='gray92'):
+    return get_config_option(keyword, 'cluster_color', default_color)
+
+
+def get_header(keyword='fio_jobs'):
+    return get_config_option(keyword, 'header')
+
+
+def get_shape(keyword='fio_jobs'):
+    return get_config_option(keyword, 'shape', 'box')
+
+
+def get_style(keyword='fio_jobs'):
+    return get_config_option(keyword, 'style', 'rounded')
+
+
+def get_cluster_style(keyword='fio_jobs'):
+    return get_config_option(keyword, 'cluster_style', 'filled')
+
+
+def get_specific_options(engine):
+    if not engine:
+        return []
+    return get_config_option('ioengine_{}'.format(engine), 'specific_options', '').split()
+
+
+def render_option(section, label, display, option, color_override=None):
+    # These options are already shown with graphical helpers; no need to report them directly
+    skip_list = ['size', 'stonewall', 'runtime', 'time_based',
+                 'numjobs', 'wait_for', 'wait_for_previous']
+    # If the option doesn't exist or is already handled specially,
+    # don't render it; just return the current state
+    if option in skip_list or option not in section:
+        return label, display
+    display = option
+    if section[option]:
+        display = '{} = {}'.format(display, section[option])
+
+    # Add the job's options to the box; darkgreen is the default color
+    if color_override:
+        color = color_override
+    else:
+        color = get_text_color(option, get_text_color('fio_jobs', 'darkgreen'))
+    label += get_config_option('fio_jobs',
+                               'item_style').format(color, display)
+    return label, display
+
+
+def render_options(fio_file, section_name):
+    """Render all options of a section."""
+    display = section_name
+    section = fio_file[section_name]
+
+    # Add a multiplier to the section_name if numjobs is set
+    numjobs = int(get_section_option(section_name, 'numjobs', '1'))
+    if numjobs > 1:
+        display = display + \
+            get_style('numjobs').format(
+                get_text_color('numjobs'), numjobs)
+
+    # Header of the box
+    label = get_config_option('fio_jobs', 'title_style').format(display)
+
+    # Let's parse all the options of the current fio thread
+    # Some need to be printed at the top or bottom of the job for readability
+    to_early_print = ['exec_prerun', 'ioengine']
+    to_late_print = ['exec_postrun']
+
+    # Let's print the options on top of the box
+    for early_print in to_early_print:
+        label, display = render_option(
+            section, label, display, early_print)
+
+    current_io_engine = get_section_option(
+        section_name, 'ioengine', None)
+    if current_io_engine:
+        # Let's print all specific options for this engine
+        for specific_option in sorted(get_specific_options(current_io_engine)):
+            label, display = render_option(
+                section, label, display, specific_option, get_config_option('ioengine', 'specific_options_color'))
+
+    # Let's print generic options sorted by name
+    for option in sorted(section):
+        if option in to_early_print or option in to_late_print or option in get_specific_options(current_io_engine):
+            continue
+        label, display = render_option(section, label, display, option)
+
+    # Let's print options at the bottom of the box
+    for late_print in to_late_print:
+        label, display = render_option(
+            section, label, display, late_print)
+
+    # End of the box content
+    label += '</table>>'
+    return label
+
+
+def render_section(current_graph, fio_file, section_name, label):
+    """Render the section."""
+    attr = None
+    section = fio_file[section_name]
+
+    # Let's render the box associated with a job
+    current_graph.node(section_name, label,
+                       shape=get_shape(),
+                       color=get_shape_color(),
+                       style=get_style())
+
+    # Let's report the duration of the jobs with a self-loop arrow
+    if 'runtime' in section and 'time_based' in section:
+        attr = 'runtime={}'.format(section['runtime'])
+    elif 'size' in section:
+        attr = 'size={}'.format(section['size'])
+    if attr:
+        current_graph.edge(section_name, section_name, attr)
+
+
+def create_sub_graph(name):
+    """Return a new graph."""
+    # The name must start with 'cluster' so that graphviz treats the subgraph as a cluster
+    cluster_name = 'cluster_' + name
+    # Unset the main graph label so that it is not copied into each subgraph
+    attr = {}
+    attr['label'] = ''
+    new_graph = Digraph(name=cluster_name, graph_attr=attr)
+    new_graph.attr(style=get_cluster_style(),
+                   color=get_cluster_color())
+    return new_graph
+
+
+def create_legend():
+    """Return a legend."""
+    html_table = "<<table border='0' cellborder='1' cellspacing='0' cellpadding='4'>"
+    html_table += '<tr><td COLSPAN="2"><b>Legend</b></td></tr>'
+    legend_item = '<tr> <td>{}</td> <td><font color="{}">{}</font></td></tr>'
+    legend_bgcolor_item = '<tr><td>{}</td><td BGCOLOR="{}"></td></tr>'
+    html_table += legend_item.format('numjobs',
+                                     get_text_color('numjobs'), 'x numjobs')
+    html_table += legend_item.format('generic option',
+                                     get_text_color(), 'generic option')
+    html_table += legend_item.format('ioengine option',
+                                     get_text_color('ioengine'), 'ioengine option')
+    html_table += legend_bgcolor_item.format('job', get_shape_color())
+    html_table += legend_bgcolor_item.format(
+        'execution group', get_cluster_color())
+    html_table += '</table>>'
+    legend = Digraph('html_table')
+    legend.node('legend', shape='none', label=html_table)
+    return legend
+
+
+def fio_to_graphviz(filename, format):
+    """Compute the graphviz graph from the fio file."""
+
+    # Let's read the fio file
+    global fio_file
+    fio_file = configparser.RawConfigParser(
+        allow_no_value=True,
+        default_section="global",
+        inline_comment_prefixes=('#', ';'))
+    fio_file.read(filename)
+
+    # Prepare the main graph object
+    # Let's define the header of the document
+    attrs = {}
+    attrs['labelloc'] = 't'
+    attrs['label'] = get_header().format(
+        get_header_color(), os.path.basename(filename))
+    main_graph = Digraph(engine='dot', graph_attr=attrs, format=format)
+
+    # Let's add a legend
+    main_graph.subgraph(create_legend())
+
+    # By default all jobs run in parallel and depend on "global"
+    depends_on = fio_file.default_section
+
+    # The previous section is by default the global section
+    previous_section = fio_file.default_section
+
+    current_graph = main_graph
+
+    # The first job will be a new execution group
+    new_execution_group = True
+
+    # Let's iterate over all sections to create links between them
+    for section_name in fio_file.sections():
+        # The current section
+        section = fio_file[section_name]
+
+        # If the current section waits for the previous job
+        if 'stonewall' in section or 'wait_for_previous' in section:
+            # let's remember what was the previous job we depend on
+            depends_on = previous_section
+            new_execution_group = True
+        elif 'wait_for' in section:
+            # This section depends on the named section given by wait_for
+            depends_on = section['wait_for']
+            new_execution_group = True
+
+        if new_execution_group:
+            # Let's link the current graph with the main one
+            main_graph.subgraph(current_graph)
+            # Let's create a new graph to represent all the incoming jobs running at the same time
+            current_graph = create_sub_graph(section_name)
+
+        # Let's render the current section in its execution group
+        render_section(current_graph, fio_file, section_name,
+                       render_options(fio_file, section_name))
+
+        # Let's trace the link between this job and the one it depends on
+        # If we depend on 'global', skip the arrow, as we don't want to show 'global'
+        if depends_on != fio_file.default_section:
+            current_graph.edge(depends_on, section_name)
+
+        # The current section becomes the parent of the next one
+        previous_section = section_name
+
+        # We are by default in the same execution group
+        new_execution_group = False
+
+    # The last subgraph isn't rendered yet
+    main_graph.subgraph(current_graph)
+
+    # Let's return the main graphviz object
+    return main_graph
+
+
+def setup_commandline():
+    "Prepare the command line."
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--file', action='store',
+                        type=str,
+                        required=True,
+                        help='the fio file to graph')
+    parser.add_argument('--output', action='store',
+                        type=str,
+                        help='the output filename')
+    parser.add_argument('--format', action='store',
+                        type=str,
+                        default='png',
+                        help='the output format')
+    parser.add_argument('--view', action='store_true',
+                        default=False,
+                        help='view the graph')
+    parser.add_argument('--keep', action='store_true',
+                        default=False,
+                        help='keep the graphviz script file')
+    parser.add_argument('--config', action='store',
+                        type=str,
+                        default='fiograph.conf',
+                        help='the configuration filename')
+    args = parser.parse_args()
+    return args
+
+
+def main():
+    global config_file
+    args = setup_commandline()
+    output_file = args.output
+    if output_file is None:
+        output_file = args.file.replace('.fio', '')
+    config_file = configparser.RawConfigParser(allow_no_value=True)
+    config_file.read(args.config)
+    fio_to_graphviz(args.file, args.format).render(output_file, view=args.view)
+    if not args.keep:
+        os.remove(output_file)
+
+
+main()
diff --git a/zbd.c b/zbd.c
index 8e99eb95..04c68dea 100644
--- a/zbd.c
+++ b/zbd.c
@@ -636,8 +636,12 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 
 out:
 	/* Ensure that the limit is not larger than FIO's internal limit */
-	zbd->max_open_zones = min_not_zero(zbd->max_open_zones,
-					   (uint32_t) ZBD_MAX_OPEN_ZONES);
+	if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
+		td_verror(td, EINVAL, "'max_open_zones' value is too large");
+		log_err("'max_open_zones' value is larger than %u\n", ZBD_MAX_OPEN_ZONES);
+		return -EINVAL;
+	}
+
 	dprint(FD_ZBD, "%s: using max open zones limit: %"PRIu32"\n",
 	       f->file_name, zbd->max_open_zones);
 
@@ -827,11 +831,25 @@ int zbd_setup_files(struct thread_data *td)
 			log_err("Different 'max_open_zones' values\n");
 			return 1;
 		}
-		if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
-			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
+
+		/*
+		 * The per job max open zones limit cannot be used without a
+		 * global max open zones limit. (As the tracking of open zones
+		 * is disabled when there is no global max open zones limit.)
+		 */
+		if (td->o.job_max_open_zones && !zbd->max_open_zones) {
+			log_err("'job_max_open_zones' cannot be used without a global open zones limit\n");
 			return 1;
 		}
 
+		/*
+		 * zbd->max_open_zones is the global limit shared for all jobs
+		 * that target the same zoned block device. Force sync the per
+		 * thread global limit with the actual global limit. (The real
+		 * per thread/job limit is stored in td->o.job_max_open_zones).
+		 */
+		td->o.max_open_zones = zbd->max_open_zones;
+
 		for (zi = f->min_zone; zi < f->max_zone; zi++) {
 			z = &zbd->zone_info[zi];
 			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
@@ -1093,6 +1111,8 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	int i;
 
+	/* This function should never be called when zbdi->max_open_zones == 0 */
+	assert(zbdi->max_open_zones);
 	assert(td->o.job_max_open_zones == 0 || td->num_open_zones <= td->o.job_max_open_zones);
 	assert(td->o.job_max_open_zones <= zbdi->max_open_zones);
 	assert(zbdi->num_open_zones <= zbdi->max_open_zones);
@@ -1114,6 +1134,7 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 			  uint32_t zone_idx)
 {
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z = get_zone(f, zone_idx);
 	bool res = true;
 
@@ -1127,7 +1148,15 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
 		return false;
 
-	pthread_mutex_lock(&f->zbd_info->mutex);
+	/*
+	 * zbdi->max_open_zones == 0 means that there is no limit on the maximum
+	 * number of open zones. In this case, do not track open zones in
+	 * zbdi->open_zones array.
+	 */
+	if (!zbdi->max_open_zones)
+		return true;
+
+	pthread_mutex_lock(&zbdi->mutex);
 	if (is_zone_open(td, f, zone_idx)) {
 		/*
 		 * If the zone is already open and going to be full by writes
@@ -1142,16 +1171,16 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 	if (td->o.job_max_open_zones > 0 &&
 	    td->num_open_zones >= td->o.job_max_open_zones)
 		goto out;
-	if (f->zbd_info->num_open_zones >= f->zbd_info->max_open_zones)
+	if (zbdi->num_open_zones >= zbdi->max_open_zones)
 		goto out;
 	dprint(FD_ZBD, "%s: opening zone %d\n", f->file_name, zone_idx);
-	f->zbd_info->open_zones[f->zbd_info->num_open_zones++] = zone_idx;
+	zbdi->open_zones[zbdi->num_open_zones++] = zone_idx;
 	td->num_open_zones++;
 	z->open = 1;
 	res = true;
 
 out:
-	pthread_mutex_unlock(&f->zbd_info->mutex);
+	pthread_mutex_unlock(&zbdi->mutex);
 	return res;
 }
 
@@ -1175,6 +1204,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 {
 	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
+	struct zoned_block_device_info *zbdi = f->zbd_info;
 	struct fio_zone_info *z;
 	unsigned int open_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
@@ -1183,12 +1213,12 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (td->o.max_open_zones || td->o.job_max_open_zones) {
+	if (zbdi->max_open_zones || td->o.job_max_open_zones) {
 		/*
-		 * This statement accesses f->zbd_info->open_zones[] on purpose
+		 * This statement accesses zbdi->open_zones[] on purpose
 		 * without locking.
 		 */
-		zone_idx = f->zbd_info->open_zones[pick_random_zone_idx(f, io_u)];
+		zone_idx = zbdi->open_zones[pick_random_zone_idx(f, io_u)];
 	} else {
 		zone_idx = zbd_zone_idx(f, io_u->offset);
 	}
@@ -1200,9 +1230,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	       __func__, f->file_name, zone_idx, io_u->offset, io_u->buflen);
 
 	/*
-	 * Since z->mutex is the outer lock and f->zbd_info->mutex the inner
+	 * Since z->mutex is the outer lock and zbdi->mutex the inner
 	 * lock it can happen that the state of the zone with index zone_idx
-	 * has changed after 'z' has been assigned and before f->zbd_info->mutex
+	 * has changed after 'z' has been assigned and before zbdi->mutex
 	 * has been obtained. Hence the loop.
 	 */
 	for (;;) {
@@ -1211,12 +1241,12 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		z = get_zone(f, zone_idx);
 		if (z->has_wp)
 			zone_lock(td, f, z);
-		pthread_mutex_lock(&f->zbd_info->mutex);
+		pthread_mutex_lock(&zbdi->mutex);
 		if (z->has_wp) {
 			if (z->cond != ZBD_ZONE_COND_OFFLINE &&
-			    td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
+			    zbdi->max_open_zones == 0 && td->o.job_max_open_zones == 0)
 				goto examine_zone;
-			if (f->zbd_info->num_open_zones == 0) {
+			if (zbdi->num_open_zones == 0) {
 				dprint(FD_ZBD, "%s(%s): no zones are open\n",
 				       __func__, f->file_name);
 				goto open_other_zone;
@@ -1230,14 +1260,14 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		 */
 		open_zone_idx = pick_random_zone_idx(f, io_u);
 		assert(!open_zone_idx ||
-		       open_zone_idx < f->zbd_info->num_open_zones);
+		       open_zone_idx < zbdi->num_open_zones);
 		tmp_idx = open_zone_idx;
-		for (i = 0; i < f->zbd_info->num_open_zones; i++) {
+		for (i = 0; i < zbdi->num_open_zones; i++) {
 			uint32_t tmpz;
 
-			if (tmp_idx >= f->zbd_info->num_open_zones)
+			if (tmp_idx >= zbdi->num_open_zones)
 				tmp_idx = 0;
-			tmpz = f->zbd_info->open_zones[tmp_idx];
+			tmpz = zbdi->open_zones[tmp_idx];
 			if (f->min_zone <= tmpz && tmpz < f->max_zone) {
 				open_zone_idx = tmp_idx;
 				goto found_candidate_zone;
@@ -1248,39 +1278,39 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		dprint(FD_ZBD, "%s(%s): no candidate zone\n",
 			__func__, f->file_name);
-		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&zbdi->mutex);
 		if (z->has_wp)
 			zone_unlock(z);
 		return NULL;
 
 found_candidate_zone:
-		new_zone_idx = f->zbd_info->open_zones[open_zone_idx];
+		new_zone_idx = zbdi->open_zones[open_zone_idx];
 		if (new_zone_idx == zone_idx)
 			break;
 		zone_idx = new_zone_idx;
-		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&zbdi->mutex);
 		if (z->has_wp)
 			zone_unlock(z);
 	}
 
-	/* Both z->mutex and f->zbd_info->mutex are held. */
+	/* Both z->mutex and zbdi->mutex are held. */
 
 examine_zone:
 	if (z->wp + min_bs <= zbd_zone_capacity_end(z)) {
-		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&zbdi->mutex);
 		goto out;
 	}
 
 open_other_zone:
 	/* Check if number of open zones reaches one of limits. */
 	wait_zone_close =
-		f->zbd_info->num_open_zones == f->max_zone - f->min_zone ||
-		(td->o.max_open_zones &&
-		 f->zbd_info->num_open_zones == td->o.max_open_zones) ||
+		zbdi->num_open_zones == f->max_zone - f->min_zone ||
+		(zbdi->max_open_zones &&
+		 zbdi->num_open_zones == zbdi->max_open_zones) ||
 		(td->o.job_max_open_zones &&
 		 td->num_open_zones == td->o.job_max_open_zones);
 
-	pthread_mutex_unlock(&f->zbd_info->mutex);
+	pthread_mutex_unlock(&zbdi->mutex);
 
 	/* Only z->mutex is held. */
 
@@ -1295,7 +1325,7 @@ open_other_zone:
 	}
 
 	/* Zone 'z' is full, so try to open a new zone. */
-	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
+	for (i = f->io_size / zbdi->zone_size; i > 0; i--) {
 		zone_idx++;
 		if (z->has_wp)
 			zone_unlock(z);
@@ -1318,12 +1348,12 @@ open_other_zone:
 	/* Only z->mutex is held. */
 
 	/* Check whether the write fits in any of the already opened zones. */
-	pthread_mutex_lock(&f->zbd_info->mutex);
-	for (i = 0; i < f->zbd_info->num_open_zones; i++) {
-		zone_idx = f->zbd_info->open_zones[i];
+	pthread_mutex_lock(&zbdi->mutex);
+	for (i = 0; i < zbdi->num_open_zones; i++) {
+		zone_idx = zbdi->open_zones[i];
 		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
 			continue;
-		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&zbdi->mutex);
 		zone_unlock(z);
 
 		z = get_zone(f, zone_idx);
@@ -1331,9 +1361,9 @@ open_other_zone:
 		zone_lock(td, f, z);
 		if (z->wp + min_bs <= zbd_zone_capacity_end(z))
 			goto out;
-		pthread_mutex_lock(&f->zbd_info->mutex);
+		pthread_mutex_lock(&zbdi->mutex);
 	}
-	pthread_mutex_unlock(&f->zbd_info->mutex);
+	pthread_mutex_unlock(&zbdi->mutex);
 	zone_unlock(z);
 	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
 	       f->file_name);
@@ -1683,6 +1713,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
+	struct zoned_block_device_info *zbdi = f->zbd_info;
 	uint32_t zone_idx_b;
 	struct fio_zone_info *zb, *zl, *orig_zb;
 	uint32_t orig_len = io_u->buflen;
@@ -1690,7 +1721,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	uint64_t new_len;
 	int64_t range;
 
-	assert(f->zbd_info);
+	assert(zbdi);
 	assert(min_bs);
 	assert(is_valid_offset(f, io_u->offset));
 	assert(io_u->buflen);
@@ -1802,12 +1833,12 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		assert(io_u->offset + io_u->buflen <= zb->wp);
 		goto accept;
 	case DDIR_WRITE:
-		if (io_u->buflen > f->zbd_info->zone_size) {
+		if (io_u->buflen > zbdi->zone_size) {
 			td_verror(td, EINVAL, "I/O buflen exceeds zone size");
 			dprint(FD_IO,
 			       "%s: I/O buflen %llu exceeds zone size %llu\n",
 			       f->file_name, io_u->buflen,
-			       (unsigned long long) f->zbd_info->zone_size);
+			       (unsigned long long) zbdi->zone_size);
 			goto eof;
 		}
 		if (!zbd_open_zone(td, f, zone_idx_b)) {
@@ -1822,7 +1853,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		}
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
-			if (f->zbd_info->wp_sectors_with_data >=
+			if (zbdi->wp_sectors_with_data >=
 			    f->io_size * td->o.zrt.u.f &&
 			    zbd_dec_and_reset_write_cnt(td, f)) {
 				zb->reset_zone = 1;
diff --git a/zbd.h b/zbd.h
index 64534393..39dc45e3 100644
--- a/zbd.h
+++ b/zbd.h
@@ -50,7 +50,8 @@ struct fio_zone_info {
  * zoned_block_device_info - zoned block device characteristics
  * @model: Device model.
  * @max_open_zones: global limit on the number of simultaneously opened
- *	sequential write zones.
+ *	sequential write zones. A zero value means unlimited open zones,
+ *	and that open zones will not be tracked in the open_zones array.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.

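The loop in `fio_to_graphviz()` in the patch above is meant to split jobs into execution groups: a job carrying `stonewall` or `wait_for_previous` opens a new group that depends on the previous job, a job carrying `wait_for` opens a new group that depends on the named job, and any other job joins the current group. A minimal sketch of just that grouping, without graphviz, is below; `execution_groups` and its return shape are hypothetical, and the sketch ignores rendering and option display.

```python
import configparser


def execution_groups(job_text):
    """Simplified sketch of fiograph's grouping loop (hypothetical helper).

    Returns a list of (job_names, depends_on) tuples, one per execution group.
    """
    parser = configparser.RawConfigParser(
        allow_no_value=True,
        default_section="global",
        inline_comment_prefixes=("#", ";"))
    parser.read_string(job_text)

    groups = []
    previous = parser.default_section
    depends_on = parser.default_section
    for name in parser.sections():
        section = parser[name]
        # The first job always opens a new execution group.
        new_group = not groups
        # Check each barrier option separately; ('a' or 'b') in x only tests 'a'.
        if "stonewall" in section or "wait_for_previous" in section:
            depends_on = previous
            new_group = True
        elif "wait_for" in section:
            depends_on = section["wait_for"]
            new_group = True
        if new_group:
            groups.append(([name], depends_on))
        else:
            # Jobs without a barrier run alongside the current group.
            groups[-1][0].append(name)
        previous = name
    return groups


jobs = """
[global]
ioengine=psync

[a]
rw=read

[b]
rw=write

[c]
wait_for_previous

[d]
rw=read

[e]
wait_for=a
"""
print(execution_groups(jobs))
# prints [(['a', 'b'], 'global'), (['c', 'd'], 'b'), (['e'], 'a')]
```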

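After this change, a device-wide `max_open_zones` of 0 means "no limit": `zbd_open_zone()` returns early and records nothing in `open_zones[]`, and `zbd_setup_files()` rejects `job_max_open_zones` when there is no global limit. The toy model below restates those rules as a single-threaded Python sketch. The `ZoneTracker` class is hypothetical and is not fio code; it omits the mutexes, the full-zone and verify checks, and the `ZBD_MAX_OPEN_ZONES` cap (which this change turns from a silent clamp into an EINVAL error).

```python
class ZoneTracker:
    """Hypothetical single-job model of fio's open-zone accounting."""

    def __init__(self, max_open_zones, job_max_open_zones=0):
        # A per-job limit needs a global limit, because tracking is
        # disabled entirely when max_open_zones == 0.
        if job_max_open_zones and not max_open_zones:
            raise ValueError("'job_max_open_zones' cannot be used "
                             "without a global open zones limit")
        self.max_open_zones = max_open_zones
        self.job_max_open_zones = job_max_open_zones
        self.open_zones = []      # device-wide list of open zone indexes
        self.job_open_zones = 0   # zones opened by this job

    def open_zone(self, zone_idx):
        """Return True if a write may target zone_idx."""
        if not self.max_open_zones:
            return True           # unlimited: do not track open zones
        if zone_idx in self.open_zones:
            return True           # already open
        if (self.job_max_open_zones and
                self.job_open_zones >= self.job_max_open_zones):
            return False          # per-job limit reached
        if len(self.open_zones) >= self.max_open_zones:
            return False          # device-wide limit reached
        self.open_zones.append(zone_idx)
        self.job_open_zones += 1
        return True


unlimited = ZoneTracker(0)
print(unlimited.open_zone(7), unlimited.open_zones)   # True []

limited = ZoneTracker(2, job_max_open_zones=1)
print(limited.open_zone(1), limited.open_zone(2))     # True False
```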

* Recent changes (master)
@ 2021-06-21 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-21 12:00 UTC
  To: fio

The following changes since commit afa37b56c6637967180e4409adbb682b2ce16bcb:

  filehash: ignore hashed file with fd == -1 (2021-06-17 10:55:20 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d3dacdc61dfe878fda0c363084c4330492e38b2b:

  Merge branch 'pkg_config_1' of https://github.com/kusumi/fio (2021-06-20 10:44:49 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'pkg_config_1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      configure: silence "pkg-config: not found"

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 8b763700..84ccce04 100755
--- a/configure
+++ b/configure
@@ -2285,7 +2285,7 @@ print_config "DAOS File System (dfs) Engine" "$dfs"
 ##########################################
 # Check if we have libnfs (for userspace nfs support).
 if test "$disable_nfs" != "yes"; then
-  if $(pkg-config libnfs); then
+  if $(pkg-config libnfs > /dev/null 2>&1); then
     libnfs="yes"
     libnfs_cflags=$(pkg-config --cflags libnfs)
     libnfs_libs=$(pkg-config --libs libnfs)


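The fix redirects both stdout and stderr of the probe to /dev/null, so a system without pkg-config no longer prints "pkg-config: not found" while configure runs; the check simply takes its failure branch. The same quiet-probe pattern is sketched below in Python for illustration only. The helper name `pkg_config_has` is hypothetical, and it uses pkg-config's documented `--exists` flag rather than the bare package name that configure passes.

```python
import subprocess


def pkg_config_has(package, pkg_config="pkg-config"):
    """Quietly ask pkg-config whether `package` is installed (hypothetical helper).

    All output is discarded, and a missing pkg-config binary is treated as
    "package not found" instead of surfacing an error.
    """
    try:
        result = subprocess.run([pkg_config, "--exists", package],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                check=False)
    except FileNotFoundError:
        return False              # pkg-config itself is not installed
    return result.returncode == 0


print("libnfs found" if pkg_config_has("libnfs") else "no libnfs")
```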

* Recent changes (master)
@ 2021-06-18 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-18 12:00 UTC
  To: fio

The following changes since commit a59b12d2a5eb92c1128a5d8ebcd03b1831962ce5:

  t/zbd: update test case 42 (2021-06-14 08:54:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to afa37b56c6637967180e4409adbb682b2ce16bcb:

  filehash: ignore hashed file with fd == -1 (2021-06-17 10:55:20 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      filehash: ignore hashed file with fd == -1

 filehash.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/filehash.c b/filehash.c
index b55ab734..71ec7b18 100644
--- a/filehash.c
+++ b/filehash.c
@@ -60,10 +60,8 @@ static struct fio_file *__lookup_file_hash(const char *name)
 		if (!f->file_name)
 			continue;
 
-		if (!strcmp(f->file_name, name)) {
-			assert(f->fd != -1);
+		if (!strcmp(f->file_name, name))
 			return f;
-		}
 	}
 
 	return NULL;



* Recent changes (master)
@ 2021-06-15 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-15 12:00 UTC
  To: fio

The following changes since commit dd4620b7f9171edaa10955c4826454a05af27c85:

  io_uring: drop redundant IO_MODE_OFFLOAD check (2021-06-10 16:40:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a59b12d2a5eb92c1128a5d8ebcd03b1831962ce5:

  t/zbd: update test case 42 (2021-06-14 08:54:56 -0600)

----------------------------------------------------------------
Niklas Cassel (5):
      zbd: disallow pipes for zonemode=zbd
      zbd: allow zonemode=zbd with regular files by emulating zones
      zbd: remove zbd_zoned_model ZBD_IGNORE
      zbd: change some f->zbd_info conditionals to asserts
      t/zbd: update test case 42

 engines/libzbc.c            |  6 ++----
 engines/skeleton_external.c |  1 -
 oslib/linux-blkzoned.c      |  6 ++----
 t/zbd/test-zbd-support      |  2 +-
 zbd.c                       | 37 +++++++++++++++++++++++++------------
 zbd_types.h                 |  7 +++----
 6 files changed, 33 insertions(+), 26 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index 3dde93db..7f2bc431 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -180,10 +180,8 @@ static int libzbc_get_zoned_model(struct thread_data *td, struct fio_file *f,
 	struct libzbc_data *ld;
 	int ret;
 
-	if (f->filetype != FIO_TYPE_BLOCK && f->filetype != FIO_TYPE_CHAR) {
-		*model = ZBD_IGNORE;
-		return 0;
-	}
+	if (f->filetype != FIO_TYPE_BLOCK && f->filetype != FIO_TYPE_CHAR)
+		return -EINVAL;
 
 	ret = libzbc_open_dev(td, f, &ld);
 	if (ret)
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index c79b6f11..cff83a10 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -156,7 +156,6 @@ static int fio_skeleton_close(struct thread_data *td, struct fio_file *f)
 /*
  * Hook for getting the zoned model of a zoned block device for zonemode=zbd.
  * The zoned model can be one of (see zbd_types.h):
- * - ZBD_IGNORE: skip regular files
  * - ZBD_NONE: regular block device (zone emulation will be used)
  * - ZBD_HOST_AWARE: host aware zoned block device
  * - ZBD_HOST_MANAGED: host managed zoned block device
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 6f89ec6f..4e441d29 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -140,10 +140,8 @@ int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
 {
 	char *model_str = NULL;
 
-	if (f->filetype != FIO_TYPE_BLOCK) {
-		*model = ZBD_IGNORE;
-		return 0;
-	}
+	if (f->filetype != FIO_TYPE_BLOCK)
+		return -EINVAL;
 
 	*model = ZBD_NONE;
 
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index a684f988..57e6d05e 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -922,7 +922,7 @@ test41() {
 test42() {
     require_regular_block_dev || return $SKIP_TESTCASE
     read_one_block --zonemode=zbd --zonesize=0 |
-	grep -q 'Specifying the zone size is mandatory for regular block devices with --zonemode=zbd'
+	grep -q 'Specifying the zone size is mandatory for regular file/block device with --zonemode=zbd'
 }
 
 # Check whether fio handles --zonesize=1 correctly for regular block devices.
diff --git a/zbd.c b/zbd.c
index 5d9e331a..8e99eb95 100644
--- a/zbd.c
+++ b/zbd.c
@@ -32,6 +32,17 @@ int zbd_get_zoned_model(struct thread_data *td, struct fio_file *f,
 {
 	int ret;
 
+	if (f->filetype == FIO_TYPE_PIPE) {
+		log_err("zonemode=zbd does not support pipes\n");
+		return -EINVAL;
+	}
+
+	/* If regular file, always emulate zones inside the file. */
+	if (f->filetype == FIO_TYPE_FILE) {
+		*model = ZBD_NONE;
+		return 0;
+	}
+
 	if (td->io_ops && td->io_ops->get_zoned_model)
 		ret = td->io_ops->get_zoned_model(td, f, model);
 	else
@@ -409,7 +420,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	int i;
 
 	if (zone_size == 0) {
-		log_err("%s: Specifying the zone size is mandatory for regular block devices with --zonemode=zbd\n\n",
+		log_err("%s: Specifying the zone size is mandatory for regular file/block device with --zonemode=zbd\n\n",
 			f->file_name);
 		return 1;
 	}
@@ -430,6 +441,12 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		return 1;
 	}
 
+	if (f->real_file_size < zone_size) {
+		log_err("%s: file/device size %"PRIu64" is smaller than zone size %"PRIu64"\n",
+			f->file_name, f->real_file_size, zone_size);
+		return -EINVAL;
+	}
+
 	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -644,8 +661,6 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 		return ret;
 
 	switch (zbd_model) {
-	case ZBD_IGNORE:
-		return 0;
 	case ZBD_HOST_AWARE:
 	case ZBD_HOST_MANAGED:
 		ret = parse_zone_info(td, f);
@@ -663,6 +678,7 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 		return -EINVAL;
 	}
 
+	assert(f->zbd_info);
 	f->zbd_info->model = zbd_model;
 
 	ret = zbd_set_max_open_zones(td, f);
@@ -792,8 +808,7 @@ int zbd_setup_files(struct thread_data *td)
 		struct fio_zone_info *z;
 		int zi;
 
-		if (!zbd)
-			continue;
+		assert(zbd);
 
 		f->min_zone = zbd_zone_idx(f, f->file_offset);
 		f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
@@ -1454,8 +1469,7 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 	uint32_t zone_idx;
 	uint64_t zone_end;
 
-	if (!zbd_info)
-		return;
+	assert(zbd_info);
 
 	zone_idx = zbd_zone_idx(f, io_u->offset);
 	assert(zone_idx < zbd_info->nr_zones);
@@ -1515,8 +1529,7 @@ static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 	struct fio_zone_info *z;
 	uint32_t zone_idx;
 
-	if (!zbd_info)
-		return;
+	assert(zbd_info);
 
 	zone_idx = zbd_zone_idx(f, io_u->offset);
 	assert(zone_idx < zbd_info->nr_zones);
@@ -1572,6 +1585,7 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 
 	assert(td->o.zone_mode == ZONE_MODE_ZBD);
 	assert(td->o.zone_size);
+	assert(f->zbd_info);
 
 	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
 	z = get_zone(f, zone_idx);
@@ -1646,6 +1660,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 	 * devices with all empty zones. Overwrite the first I/O direction as
 	 * write to make sure data to read exists.
 	 */
+	assert(io_u->file->zbd_info);
 	if (ddir != DDIR_READ || !td_rw(td))
 		return ddir;
 
@@ -1675,9 +1690,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	uint64_t new_len;
 	int64_t range;
 
-	if (!f->zbd_info)
-		return io_u_accept;
-
+	assert(f->zbd_info);
 	assert(min_bs);
 	assert(is_valid_offset(f, io_u->offset));
 	assert(io_u->buflen);
diff --git a/zbd_types.h b/zbd_types.h
index 5ed41aa0..0a8630cb 100644
--- a/zbd_types.h
+++ b/zbd_types.h
@@ -14,10 +14,9 @@
  * Zoned block device models.
  */
 enum zbd_zoned_model {
-	ZBD_IGNORE,		/* Ignore file */
-	ZBD_NONE,		/* Regular block device */
-	ZBD_HOST_AWARE,		/* Host-aware zoned block device */
-	ZBD_HOST_MANAGED,	/* Host-managed zoned block device */
+	ZBD_NONE		= 0x1,	/* No zone support. Emulate zones. */
+	ZBD_HOST_AWARE		= 0x2,	/* Host-aware zoned block device */
+	ZBD_HOST_MANAGED	= 0x3,	/* Host-managed zoned block device */
 };
 
 /*


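With these changes, `zbd_get_zoned_model()` rejects pipes, always treats a regular file as `ZBD_NONE` (zones are emulated inside the file), and only asks the ioengine or the OS for other file types; `init_zone_info()` then requires a non-zero zone size no larger than the file or device, and rounds the zone count up. A rough Python restatement of those rules follows. The function names and the string file types are hypothetical stand-ins for the C code, and `probe` stands in for the ioengine or OS query.

```python
ZBD_NONE = 0x1            # values from the new enum zbd_zoned_model
ZBD_HOST_AWARE = 0x2
ZBD_HOST_MANAGED = 0x3


def zoned_model(filetype, probe):
    """Pick the zoned model the way zbd_get_zoned_model() now does (sketch).

    filetype is one of "pipe", "file", "block", "char" (hypothetical labels);
    probe() stands in for the ioengine or OS query.
    """
    if filetype == "pipe":
        raise ValueError("zonemode=zbd does not support pipes")
    if filetype == "file":
        return ZBD_NONE       # always emulate zones inside a regular file
    return probe()


def emulated_zone_count(real_file_size, zone_size):
    """Number of emulated zones, as computed by init_zone_info() (sketch)."""
    if zone_size == 0:
        raise ValueError("zone size is mandatory for regular file/block device")
    if real_file_size < zone_size:
        raise ValueError("file/device size is smaller than zone size")
    # Round up so that a trailing partial zone is still counted.
    return (real_file_size + zone_size - 1) // zone_size


MiB = 1024 * 1024
print(zoned_model("file", probe=lambda: ZBD_HOST_MANAGED))   # 1
print(emulated_zone_count(10 * MiB, 4 * MiB))                 # 3
```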

* Recent changes (master)
@ 2021-06-11 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-11 12:00 UTC
  To: fio

The following changes since commit 40d0b84220f7c0ff9c3874656db7f0f8cb6a85e6:

  t/zbd: Fix write target zones counting in test case #31 (2021-06-08 15:15:58 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dd4620b7f9171edaa10955c4826454a05af27c85:

  io_uring: drop redundant IO_MODE_OFFLOAD check (2021-06-10 16:40:49 -0600)

----------------------------------------------------------------
Stefan Hajnoczi (1):
      io_uring: drop redundant IO_MODE_OFFLOAD check

 engines/io_uring.c | 6 ------
 1 file changed, 6 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index b962e804..9c091e37 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -728,12 +728,6 @@ static int fio_ioring_init(struct thread_data *td)
 	struct ioring_data *ld;
 	struct thread_options *to = &td->o;
 
-	if (to->io_submit_mode == IO_MODE_OFFLOAD) {
-		log_err("fio: io_submit_mode=offload is not compatible (or "
-			"useful) with io_uring\n");
-		return 1;
-	}
-
 	/* sqthread submission requires registered files */
 	if (o->sqpoll_thread)
 		o->registerfiles = 1;



* Recent changes (master)
@ 2021-06-09 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-09 12:00 UTC
  To: fio

The following changes since commit d7a28031718d281f1b9ea593c8a3f395761510ca:

  Merge branch 'fix/928' of https://github.com/larsks/fio (2021-06-03 09:01:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 40d0b84220f7c0ff9c3874656db7f0f8cb6a85e6:

  t/zbd: Fix write target zones counting in test case #31 (2021-06-08 15:15:58 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (3):
      t/zbd: Use max_open_zones that fio fetched from device
      t/zbd: Add ignore_zone_limit option to test with special max_open_zones
      t/zbd: Fix write target zones counting in test case #31

 t/zbd/functions        | 14 +++++++++++---
 t/zbd/test-zbd-support | 37 +++++++++++++++++--------------------
 2 files changed, 28 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/functions b/t/zbd/functions
index 40ffe1de..08a2c629 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -173,15 +173,23 @@ last_online_zone() {
     fi
 }
 
+# Get max_open_zones of SMR drives using sg_inq or libzbc tools. Test cases 31
+# and 32 use this max_open_zones value. Test case 31 uses max_open_zones to
+# decide the number of write target zones. Test case 32 passes the
+# max_open_zones value to fio with the --max_open_zones option. Note that fio
+# itself can get max_open_zones from the device through sysfs or an
+# ioengine-specific implementation. Fetching max_open_zones in the test script
+# is still required when fio runs on an old Linux kernel that lacks
+# max_open_zones in sysfs, or lacks zoned block device support completely.
 max_open_zones() {
     local dev=$1
 
     if [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
 	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" \
 		 > /dev/null 2>&1; then
-	    # Non scsi device such as null_blk can not return max open zones.
-	    # Use default value.
-	    echo 128
+	    # When sg_inq cannot get max open zones, output 0, which tells fio
+	    # to get the max open zones limit from the device.
+	    echo 0
 	else
 	    ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" | tail -1 |
 		{
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 26aff373..a684f988 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -731,32 +731,28 @@ test30() {
 test31() {
     local bs inc nz off opts size
 
-    prep_write
-    # Start with writing 128 KB to max_open_zones sequential zones.
-    bs=128K
+    [ -n "$is_zbd" ] && reset_zone "$dev" -1
+
+    # As preparation, write 128 KB to sequential write required zones. Limit
+    # the write target zones to max_open_zones to keep test time reasonable.
+    # To distribute the write target zones evenly, skip a fixed number of
+    # zones between writes. Use zonemode=strided for this write pattern.
+    bs=$((128 * 1024))
     nz=$((max_open_zones))
     if [[ $nz -eq 0 ]]; then
 	nz=128
     fi
-    # shellcheck disable=SC2017
-    inc=$(((disk_size - (first_sequential_zone_sector * 512)) / (nz * zone_size)
-	   * zone_size))
-    if [ "$inc" -eq 0 ]; then
-	require_seq_zones $nz || return $SKIP_TESTCASE
-    fi
-    opts=()
-    for ((off = first_sequential_zone_sector * 512; off < disk_size;
-	  off += inc)); do
-	opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--io_size=$bs")
-	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
-	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
-	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-	opts+=(${job_var_opts[@]})
-    done
-    "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
-    # Next, run the test.
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
+    inc=$(((size / nz / zone_size) * zone_size))
+    opts=("--name=$dev" "--filename=$dev" "--rw=write" "--bs=${bs}")
+    opts+=("--offset=$off" "--size=$((inc * nz))" "--io_size=$((bs * nz))")
+    opts+=("--zonemode=strided" "--zonesize=${bs}" "--zonerange=${inc}")
+    opts+=("--direct=1")
+    echo "fio ${opts[@]}" >> "${logfile}.${test_number}"
+    "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
+
+    # Next, run the test.
     opts=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
     opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
     opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
@@ -1348,6 +1344,7 @@ fi
 if [[ -n ${max_open_zones_opt} ]]; then
 	# Override max_open_zones with the script option value
 	max_open_zones="${max_open_zones_opt}"
+	global_var_opts+=("--ignore_zone_limits=1")
 	job_var_opts+=("--max_open_zones=${max_open_zones_opt}")
 fi
 
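The reworked test31 preparation replaces the per-zone job loop with a single zonemode=strided job. With zonesize=${bs} and zonerange=${inc}, fio writes bs bytes at the start of each inc-sized range and then moves to the next one. As a result, io_size=$((bs * nz)) places one 128 KiB write in each of nz evenly spaced ranges.

The C sketch below reproduces the script's shell arithmetic to show where those writes land. It assumes byte units throughout. strided_offsets() and the example geometry are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the test31 arithmetic:
 *   size = disk_size - off
 *   inc  = ((size / nz / zone_size) * zone_size)
 * Fills out[] with the start offset of each strided write (one per
 * zonerange) and returns inc, the stride in bytes. The caller must
 * provide room for nz entries.
 */
static uint64_t strided_offsets(uint64_t disk_size, uint64_t off,
				uint64_t zone_size, unsigned int nz,
				uint64_t *out)
{
	uint64_t size, inc;
	unsigned int i;

	assert(out && nz > 0 && zone_size > 0 && off <= disk_size);

	size = disk_size - off;
	/* Round the per-write stride down to a whole number of zones. */
	inc = (size / nz / zone_size) * zone_size;

	for (i = 0; i < nz; i++)
		out[i] = off + (uint64_t)i * inc;
	return inc;
}
```

Take a 64 MiB device with 1 MiB zones, sequential zones starting at 4 MiB, and nz=4. That gives a stride of 15 MiB, with writes at 4, 19, 34 and 49 MiB. If the device has fewer sequential zones than nz, the integer division rounds inc down to 0 and every write starts at the same offset.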


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2021-06-04 12:00 Jens Axboe
From: Jens Axboe @ 2021-06-04 12:00 UTC
  To: fio

The following changes since commit 575686bb85fa36f326524c505e83c54abc0d2f2b:

  zbd: add a new --ignore_zone_limits option (2021-05-27 16:04:58 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d7a28031718d281f1b9ea593c8a3f395761510ca:

  Merge branch 'fix/928' of https://github.com/larsks/fio (2021-06-03 09:01:52 -0600)

----------------------------------------------------------------
Erwan Velu (3):
      ci: Installing missing toolchain
      ci: Reporting installed msys2 packages
      Makefile: Avoid using built-in stpcpy during clang build

Jens Axboe (2):
      Merge branch 'evelu-test' of https://github.com/ErwanAliasr1/fio
      Merge branch 'fix/928' of https://github.com/larsks/fio

Lars Kellogg-Stedman (1):
      fix fio2gnuplot to work with new logging format

 Makefile               | 5 +++++
 ci/appveyor-install.sh | 2 ++
 tools/plot/fio2gnuplot | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index ef317373..f57569d5 100644
--- a/Makefile
+++ b/Makefile
@@ -40,6 +40,11 @@ ifdef CONFIG_PDB
   LDFLAGS += -fuse-ld=lld $(LINK_PDBFILE)
 endif
 
+# If clang, do not use builtin stpcpy as it breaks the build
+ifeq ($(CC),clang)
+  FIO_CFLAGS += -fno-builtin-stpcpy
+endif
+
 ifdef CONFIG_GFIO
   PROGS += gfio
 endif
diff --git a/ci/appveyor-install.sh b/ci/appveyor-install.sh
index c73e4cb5..3137f39e 100755
--- a/ci/appveyor-install.sh
+++ b/ci/appveyor-install.sh
@@ -31,7 +31,9 @@ case "${DISTRO}" in
         pacman.exe --noconfirm -S \
             mingw-w64-${PACKAGE_ARCH}-clang \
             mingw-w64-${PACKAGE_ARCH}-cunit \
+            mingw-w64-${PACKAGE_ARCH}-toolchain \
             mingw-w64-${PACKAGE_ARCH}-lld
+        pacman.exe -Q # List installed packages
         ;;
 esac
 
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index 78ee82fb..d2dc81df 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -198,7 +198,7 @@ def compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir, min_time, max_
 			# Index will be used to remember what file was featuring what value
 			index=index+1
 
-			time, perf, x, block_size = line[1]
+			time, perf, x, block_size = line[1][:4]
 			if (blk_size == 0):
 				try:
 					blk_size=int(block_size)
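The fio2gnuplot fix unpacks only the first four fields of each parsed log entry (time, value, data direction, block size) and ignores anything after them. Entries that carry extra trailing columns therefore no longer break the four-way tuple assignment. One example of such a column is the offset written when log_offset=1 is set.

The C sketch below applies the same tolerant approach. struct log_entry and parse_log_prefix() are hypothetical names, and the comma-separated integer layout is assumed from fio's log format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record holding the four leading fields of a fio log entry. */
struct log_entry {
	unsigned long long time_ms;
	unsigned long long value;
	int ddir;
	unsigned long long bs;
};

/*
 * Parse only the first four comma-separated fields and ignore any
 * trailing columns. Returns 0 on success, -1 if fewer than four
 * fields could be read.
 */
static int parse_log_prefix(const char *line, struct log_entry *e)
{
	assert(line && e);
	if (sscanf(line, "%llu, %llu, %d, %llu",
		   &e->time_ms, &e->value, &e->ddir, &e->bs) != 4)
		return -1;
	return 0;
}
```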


* Recent changes (master)
@ 2021-05-28 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-28 12:00 UTC
  To: fio

The following changes since commit 0313e938c9c8bb37d71dade239f1f5326677b079:

  Fio 3.27 (2021-05-26 10:10:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 575686bb85fa36f326524c505e83c54abc0d2f2b:

  zbd: add a new --ignore_zone_limits option (2021-05-27 16:04:58 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-libpmem' of https://github.com/lukaszstolarczuk/fio

Niklas Cassel (2):
      zbd: add missing client/server support for option max_open_zones
      zbd: add a new --ignore_zone_limits option

Łukasz Stolarczuk (3):
      engines/libpmem: set file open/create mode always to RW
      engines/libpmem: cleanup a little code, comments and example
      engines/libpmem: do not call drain on close

 cconv.c              |  4 ++++
 engines/libpmem.c    | 64 +++++++++++++++++-----------------------------------
 examples/libpmem.fio | 35 ++++++++++++++--------------
 fio.1                |  5 ++++
 options.c            | 10 ++++++++
 server.h             |  2 +-
 thread_options.h     |  3 +++
 zbd.c                |  2 +-
 8 files changed, 63 insertions(+), 62 deletions(-)

---

Diff of recent changes:

diff --git a/cconv.c b/cconv.c
index aa06e3ea..74c24106 100644
--- a/cconv.c
+++ b/cconv.c
@@ -231,6 +231,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->zone_capacity = le64_to_cpu(top->zone_capacity);
 	o->zone_skip = le64_to_cpu(top->zone_skip);
 	o->zone_mode = le32_to_cpu(top->zone_mode);
+	o->max_open_zones = __le32_to_cpu(top->max_open_zones);
+	o->ignore_zone_limits = le32_to_cpu(top->ignore_zone_limits);
 	o->lockmem = le64_to_cpu(top->lockmem);
 	o->offset_increment_percent = le32_to_cpu(top->offset_increment_percent);
 	o->offset_increment = le64_to_cpu(top->offset_increment);
@@ -573,6 +575,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->zone_capacity = __cpu_to_le64(o->zone_capacity);
 	top->zone_skip = __cpu_to_le64(o->zone_skip);
 	top->zone_mode = __cpu_to_le32(o->zone_mode);
+	top->max_open_zones = __cpu_to_le32(o->max_open_zones);
+	top->ignore_zone_limits = cpu_to_le32(o->ignore_zone_limits);
 	top->lockmem = __cpu_to_le64(o->lockmem);
 	top->ddir_seq_add = __cpu_to_le64(o->ddir_seq_add);
 	top->file_size_low = __cpu_to_le64(o->file_size_low);
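The cconv.c hunks carry the two new zone options between client and server. convert_thread_options_to_net() stores each field little-endian in the packed struct thread_options_pack, and convert_thread_options_to_cpu() converts it back. The FIO_SERVER_VER bump in server.h accompanies this change to the packed layout.

The sketch below shows that round trip. It uses htole32()/le32toh() from <endian.h> (available on glibc and the BSDs) in place of fio's own conversion macros. The struct and function names are hypothetical.

```c
#define _DEFAULT_SOURCE	/* expose htole32()/le32toh() on glibc */
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Hypothetical host-side options (subset). */
struct zone_opts {
	int max_open_zones;
	unsigned int ignore_zone_limits;
};

/* Hypothetical wire format: fixed-width, little-endian, no padding. */
struct zone_opts_pack {
	int32_t max_open_zones;
	uint32_t ignore_zone_limits;
} __attribute__((packed));

/* Host order -> little-endian wire order. */
static void zone_opts_to_net(struct zone_opts_pack *top,
			     const struct zone_opts *o)
{
	assert(top && o);
	top->max_open_zones = (int32_t)htole32((uint32_t)o->max_open_zones);
	top->ignore_zone_limits = htole32(o->ignore_zone_limits);
}

/* Little-endian wire order -> host order. */
static void zone_opts_to_cpu(struct zone_opts *o,
			     const struct zone_opts_pack *top)
{
	assert(o && top);
	o->max_open_zones = (int32_t)le32toh((uint32_t)top->max_open_zones);
	o->ignore_zone_limits = le32toh(top->ignore_zone_limits);
}
```

The bytes on the wire are the same on little- and big-endian hosts. That is why both ends of the connection must agree on the packed layout, and why a layout change needs a new protocol version.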
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 2338f0fa..ab29a453 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -2,7 +2,7 @@
  * libpmem: IO engine that uses PMDK libpmem to read and write data
  *
  * Copyright (C) 2017 Nippon Telegraph and Telephone Corporation.
- * Copyright 2018-2020, Intel Corporation
+ * Copyright 2018-2021, Intel Corporation
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License,
@@ -18,7 +18,8 @@
 /*
  * libpmem engine
  *
- * IO engine that uses libpmem to write data (and memcpy to read)
+ * IO engine that uses libpmem (part of PMDK collection) to write data
+ *	and libc's memcpy to read. It requires PMDK >= 1.5.
  *
  * To use:
  *   ioengine=libpmem
@@ -43,25 +44,13 @@
  *     mkdir /mnt/pmem0
  *     mount -o dax /dev/pmem0 /mnt/pmem0
  *
- * See examples/libpmem.fio for more.
- *
- *
- * libpmem.so
- *   By default, the libpmem engine will let the system find the libpmem.so
- *   that it uses. You can use an alternative libpmem by setting the
- *   FIO_PMEM_LIB environment variable to the full path to the desired
- *   libpmem.so. This engine requires PMDK >= 1.5.
+ * See examples/libpmem.fio for complete usage example.
  */
 
 #include <stdio.h>
-#include <limits.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
-#include <sys/mman.h>
-#include <sys/stat.h>
-#include <sys/sysmacros.h>
-#include <libgen.h>
 #include <libpmem.h>
 
 #include "../fio.h"
@@ -77,8 +66,8 @@ static int fio_libpmem_init(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
 
-	dprint(FD_IO,"o->rw_min_bs %llu \n o->fsync_blocks %u \n o->fdatasync_blocks %u \n",
-			o->rw_min_bs,o->fsync_blocks,o->fdatasync_blocks);
+	dprint(FD_IO, "o->rw_min_bs %llu\n o->fsync_blocks %u\n o->fdatasync_blocks %u\n",
+			o->rw_min_bs, o->fsync_blocks, o->fdatasync_blocks);
 	dprint(FD_IO, "DEBUG fio_libpmem_init\n");
 
 	if ((o->rw_min_bs & page_mask) &&
@@ -91,23 +80,17 @@ static int fio_libpmem_init(struct thread_data *td)
 }
 
 /*
- * This is the pmem_map_file execution function
+ * This is the pmem_map_file execution function, a helper to
+ * fio_libpmem_open_file function.
  */
 static int fio_libpmem_file(struct thread_data *td, struct fio_file *f,
 			    size_t length, off_t off)
 {
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-	mode_t mode = 0;
+	mode_t mode = S_IWUSR | S_IRUSR;
 	size_t mapped_len;
 	int is_pmem;
 
-	if(td_rw(td))
-		mode = S_IWUSR | S_IRUSR;
-	else if (td_write(td))
-		mode = S_IWUSR;
-	else
-		mode = S_IRUSR;
-
 	dprint(FD_IO, "DEBUG fio_libpmem_file\n");
 	dprint(FD_IO, "f->file_name = %s td->o.verify = %d \n", f->file_name,
 			td->o.verify);
@@ -142,11 +125,11 @@ static int fio_libpmem_open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_libpmem_data *fdd;
 
-	dprint(FD_IO,"DEBUG fio_libpmem_open_file\n");
-	dprint(FD_IO,"f->io_size=%ld \n",f->io_size);
-	dprint(FD_IO,"td->o.size=%lld \n",td->o.size);
-	dprint(FD_IO,"td->o.iodepth=%d\n",td->o.iodepth);
-	dprint(FD_IO,"td->o.iodepth_batch=%d \n",td->o.iodepth_batch);
+	dprint(FD_IO, "DEBUG fio_libpmem_open_file\n");
+	dprint(FD_IO, "f->io_size=%ld\n", f->io_size);
+	dprint(FD_IO, "td->o.size=%lld\n", td->o.size);
+	dprint(FD_IO, "td->o.iodepth=%d\n", td->o.iodepth);
+	dprint(FD_IO, "td->o.iodepth_batch=%d\n", td->o.iodepth_batch);
 
 	if (fio_file_open(f))
 		td_io_close_file(td, f);
@@ -167,8 +150,8 @@ static int fio_libpmem_prep(struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
 
-	dprint(FD_IO, "DEBUG fio_libpmem_prep\n" );
-	dprint(FD_IO," io_u->offset %llu : fdd->libpmem_off %ld : "
+	dprint(FD_IO, "DEBUG fio_libpmem_prep\n");
+	dprint(FD_IO, "io_u->offset %llu : fdd->libpmem_off %ld : "
 			"io_u->buflen %llu : fdd->libpmem_sz %ld\n",
 			io_u->offset, fdd->libpmem_off,
 			io_u->buflen, fdd->libpmem_sz);
@@ -192,8 +175,9 @@ static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 	io_u->error = 0;
 
 	dprint(FD_IO, "DEBUG fio_libpmem_queue\n");
-	dprint(FD_IO,"td->o.odirect %d td->o.sync_io %d \n",td->o.odirect, td->o.sync_io);
-	/* map both O_SYNC / DSYNC to not using NODRAIN */
+	dprint(FD_IO, "td->o.odirect %d td->o.sync_io %d\n",
+			td->o.odirect, td->o.sync_io);
+	/* map both O_SYNC / DSYNC to not use NODRAIN */
 	flags = td->o.sync_io ? 0 : PMEM_F_MEM_NODRAIN;
 	flags |= td->o.odirect ? PMEM_F_MEM_NONTEMPORAL : PMEM_F_MEM_TEMPORAL;
 
@@ -203,7 +187,7 @@ static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 		break;
 	case DDIR_WRITE:
 		dprint(FD_IO, "DEBUG mmap_data=%p, xfer_buf=%p\n",
-				io_u->mmap_data, io_u->xfer_buf );
+				io_u->mmap_data, io_u->xfer_buf);
 		pmem_memcpy(io_u->mmap_data,
 					io_u->xfer_buf,
 					io_u->xfer_buflen,
@@ -227,13 +211,7 @@ static int fio_libpmem_close_file(struct thread_data *td, struct fio_file *f)
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
 	int ret = 0;
 
-	dprint(FD_IO,"DEBUG fio_libpmem_close_file\n");
-	dprint(FD_IO,"td->o.odirect %d \n",td->o.odirect);
-
-	if (!td->o.odirect) {
-		dprint(FD_IO,"pmem_drain\n");
-		pmem_drain();
-	}
+	dprint(FD_IO, "DEBUG fio_libpmem_close_file\n");
 
 	if (fdd->libpmem_ptr)
 		ret = pmem_unmap(fdd->libpmem_ptr, fdd->libpmem_sz);
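In fio_libpmem_queue(), the job's sync and direct options choose the flags passed to pmem_memcpy():

- sync=1 leaves out PMEM_F_MEM_NODRAIN, so the copy is drained before it returns.
- direct=1 selects PMEM_F_MEM_NONTEMPORAL instead of PMEM_F_MEM_TEMPORAL.

The sketch below reproduces only that mapping so it builds without PMDK. The MY_PMEM_F_* values are hypothetical placeholders, not libpmem's real constants.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder bit values standing in for libpmem's PMEM_F_MEM_* flags;
 * real code should include <libpmem.h> and use the library's constants.
 */
enum {
	MY_PMEM_F_MEM_NODRAIN     = 1u << 0,
	MY_PMEM_F_MEM_NONTEMPORAL = 1u << 1,
	MY_PMEM_F_MEM_TEMPORAL    = 1u << 2,
};

/* Same decision as fio_libpmem_queue(): sync_io / odirect -> memcpy flags. */
static unsigned int pmem_copy_flags(bool sync_io, bool odirect)
{
	/* O_SYNC / O_DSYNC jobs keep the final drain; others skip it. */
	unsigned int flags = sync_io ? 0 : MY_PMEM_F_MEM_NODRAIN;

	/* direct=1 requests non-temporal (cache-bypassing) stores. */
	flags |= odirect ? MY_PMEM_F_MEM_NONTEMPORAL : MY_PMEM_F_MEM_TEMPORAL;
	return flags;
}
```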
diff --git a/examples/libpmem.fio b/examples/libpmem.fio
index 0ff681f0..3b854a32 100644
--- a/examples/libpmem.fio
+++ b/examples/libpmem.fio
@@ -1,6 +1,6 @@
 [global]
 bs=4k
-size=8g
+size=10g
 ioengine=libpmem
 norandommap
 time_based
@@ -17,16 +17,6 @@ thread
 numjobs=1
 runtime=300
 
-#
-# In case of 'scramble_buffers=1', the source buffer
-# is rewritten with a random value every write operations.
-#
-# But when 'scramble_buffers=0' is set, the source buffer isn't
-# rewritten. So it will be likely that the source buffer is in CPU
-# cache and it seems to be high performance.
-#
-scramble_buffers=0
-
 #
 # depends on direct option, flags are set for pmem_memcpy() call:
 # direct=1 - PMEM_F_MEM_NONTEMPORAL,
@@ -39,9 +29,19 @@ direct=1
 #
 sync=1
 
+#
+# In case of 'scramble_buffers=1', the source buffer
+# is rewritten with a random value every write operation.
+#
+# But when 'scramble_buffers=0' is set, the source buffer isn't
+# rewritten. So it will be likely that the source buffer is in CPU
+# cache and it seems to be high write performance.
+#
+scramble_buffers=1
 
 #
-# Setting for fio process's CPU Node and Memory Node
+# Setting for fio process's CPU Node and Memory Node.
+# Set proper node below or use `numactl` command along with FIO.
 #
 numa_cpu_nodes=0
 numa_mem_policy=bind:0
@@ -53,21 +53,22 @@ cpus_allowed_policy=split
 
 #
 # The libpmem engine does IO to files in a DAX-mounted filesystem.
-# The filesystem should be created on an NVDIMM (e.g /dev/pmem0)
+# The filesystem should be created on a Non-Volatile DIMM (e.g /dev/pmem0)
 # and then mounted with the '-o dax' option.  Note that the engine
 # accesses the underlying NVDIMM directly, bypassing the kernel block
 # layer, so the usual filesystem/disk performance monitoring tools such
 # as iostat will not provide useful data.
 #
-directory=/mnt/pmem0
+#filename=/mnt/pmem/somefile
+directory=/mnt/pmem
 
 [libpmem-seqwrite]
 rw=write
 stonewall
 
-#[libpmem-seqread]
-#rw=read
-#stonewall
+[libpmem-seqread]
+rw=read
+stonewall
 
 #[libpmem-randwrite]
 #rw=randwrite
diff --git a/fio.1 b/fio.1
index ab08cb01..5aa54a4d 100644
--- a/fio.1
+++ b/fio.1
@@ -835,6 +835,11 @@ threads/processes.
 .BI job_max_open_zones \fR=\fPint
 Limit on the number of simultaneously opened zones per single thread/process.
 .TP
+.BI ignore_zone_limits \fR=\fPbool
+If this isn't set, fio will query the max open zones limit from the zoned block
+device, and exit if the specified \fBmax_open_zones\fR value is larger than the
+limit reported by the device. Default: false.
+.TP
 .BI zone_reset_threshold \fR=\fPfloat
 A number between zero and one that indicates the ratio of logical blocks with
 data to the total number of logical blocks in the test above which zones
diff --git a/options.c b/options.c
index b82a10aa..a8986d11 100644
--- a/options.c
+++ b/options.c
@@ -3492,6 +3492,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "ignore_zone_limits",
+		.lname	= "Ignore zone resource limits",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, ignore_zone_limits),
+		.def	= "0",
+		.help	= "Ignore the zone resource limits (max open/active zones) reported by the device",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "zone_reset_threshold",
 		.lname	= "Zone reset threshold",
diff --git a/server.h b/server.h
index b45b319b..c128df28 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 89,
+	FIO_SERVER_VER			= 91,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 5ecc72d7..05c2d138 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -355,6 +355,7 @@ struct thread_options {
 	unsigned int read_beyond_wp;
 	int max_open_zones;
 	unsigned int job_max_open_zones;
+	unsigned int ignore_zone_limits;
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
 };
@@ -656,6 +657,8 @@ struct thread_options_pack {
 	uint32_t allow_mounted_write;
 
 	uint32_t zone_mode;
+	int32_t max_open_zones;
+	uint32_t ignore_zone_limits;
 } __attribute__((packed));
 
 extern void convert_thread_options_to_cpu(struct thread_options *o, struct thread_options_pack *top);
diff --git a/zbd.c b/zbd.c
index 68cd58e1..5d9e331a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -588,7 +588,7 @@ static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
 	unsigned int max_open_zones;
 	int ret;
 
-	if (zbd->model != ZBD_HOST_MANAGED) {
+	if (zbd->model != ZBD_HOST_MANAGED || td->o.ignore_zone_limits) {
 		/* Only host-managed devices have a max open limit */
 		zbd->max_open_zones = td->o.max_open_zones;
 		goto out;
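After this change, zbd_set_max_open_zones() keeps the job's max_open_zones unchanged in two cases: when the device is not host-managed, or when ignore_zone_limits=1 is set. Otherwise, as the fio.1 text describes, fio compares the value against the limit reported by the device and exits if the job asks for more.

The rest of the function is outside this diff, so the sketch below models only that documented behaviour. All names are hypothetical. Two conventions are assumptions: a job value of 0 means "take the device limit", and a device value of 0 means "no limit".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the zone models fio distinguishes. */
enum zone_model { MODEL_NONE, MODEL_HOST_AWARE, MODEL_HOST_MANAGED };

/*
 * Sketch of the max_open_zones decision. job_limit is the job's
 * max_open_zones (assumed 0 = not set), dev_limit is what the device
 * reports (assumed 0 = no limit). Returns the limit to use, or -1 when
 * the job asks for more open zones than the device allows.
 */
static int pick_max_open_zones(enum zone_model model, bool ignore_zone_limits,
			       int job_limit, int dev_limit)
{
	assert(job_limit >= 0 && dev_limit >= 0);

	/* Only host-managed devices have a limit; the option skips the check. */
	if (model != MODEL_HOST_MANAGED || ignore_zone_limits)
		return job_limit;

	if (job_limit == 0)
		return dev_limit;	/* nothing requested: adopt device limit */
	if (dev_limit != 0 && job_limit > dev_limit)
		return -1;		/* fio reports an error and exits */
	return job_limit;
}
```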


* Recent changes (master)
@ 2021-05-27 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-27 12:00 UTC
  To: fio

The following changes since commit c91fd13d479dc38bbb7ef6995256ad098ebbceb2:

  Merge branch 'master' of https://github.com/DevriesL/fio (2021-05-25 16:54:13 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0313e938c9c8bb37d71dade239f1f5326677b079:

  Fio 3.27 (2021-05-26 10:10:32 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.27

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 29486071..47af94e9 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.26
+DEF_VER=fio-3.27
 
 LF='
 '


* Recent changes (master)
@ 2021-05-26 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-26 12:00 UTC
  To: fio

The following changes since commit b54e0d80c52e626021aacd0ae4d9875940cff9aa:

  Merge branch 'taras/nfs-upstream' of https://github.com/tarasglek/fio-1 (2021-05-18 17:34:38 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c91fd13d479dc38bbb7ef6995256ad098ebbceb2:

  Merge branch 'master' of https://github.com/DevriesL/fio (2021-05-25 16:54:13 -0600)

----------------------------------------------------------------
DevriesL (1):
      android: add support for NDK sharedmem

Jens Axboe (1):
      Merge branch 'master' of https://github.com/DevriesL/fio

 os/os-android.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/os/os-android.h b/os/os-android.h
index 3f1aa9d3..a81cd815 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -71,11 +71,15 @@
 #include <stdio.h>
 #include <linux/ashmem.h>
 #include <linux/shm.h>
+#include <android/api-level.h>
+#if __ANDROID_API__ >= __ANDROID_API_O__
+#include <android/sharedmem.h>
+#else
+#define ASHMEM_DEVICE	"/dev/ashmem"
+#endif
 #define shmid_ds shmid64_ds
 #define SHM_HUGETLB    04000
 
-#define ASHMEM_DEVICE	"/dev/ashmem"
-
 static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
 {
 	int ret=0;
@@ -89,6 +93,16 @@ static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
 	return ret;
 }
 
+#if __ANDROID_API__ >= __ANDROID_API_O__
+static inline int shmget(key_t __key, size_t __size, int __shmflg)
+{
+	char keybuf[11];
+
+	sprintf(keybuf, "%d", __key);
+
+	return ASharedMemory_create(keybuf, __size + sizeof(uint64_t));
+}
+#else
 static inline int shmget(key_t __key, size_t __size, int __shmflg)
 {
 	int fd,ret;
@@ -114,6 +128,7 @@ error:
 	close(fd);
 	return ret;
 }
+#endif
 
 static inline void *shmat(int __shmid, const void *__shmaddr, int __shmflg)
 {


* Recent changes (master)
@ 2021-05-19 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-19 12:00 UTC
  To: fio

The following changes since commit dfecde6a4b49bd299b2a7192c10533b9beb4820d:

  Merge branch '2021-05-13/stat-fix-integer-overflow' of https://github.com/flx42/fio (2021-05-14 09:36:59 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b54e0d80c52e626021aacd0ae4d9875940cff9aa:

  Merge branch 'taras/nfs-upstream' of https://github.com/tarasglek/fio-1 (2021-05-18 17:34:38 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'taras/nfs-upstream' of https://github.com/tarasglek/fio-1

Taras Glek (6):
      NFS engine
      NFS configure fixes
      C-style comments
      single line bodies
      skip skeleton comments
      clean up nfs example

 HOWTO            |  13 ++-
 Makefile         |   6 ++
 configure        |  29 +++++
 engines/nfs.c    | 314 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/nfs.fio |  22 ++++
 fio.1            |  10 ++
 optgroup.c       |   4 +
 optgroup.h       |   2 +
 options.c        |   5 +
 9 files changed, 404 insertions(+), 1 deletion(-)
 create mode 100644 engines/nfs.c
 create mode 100644 examples/nfs.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f5681c0d..86fb2964 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1171,7 +1171,7 @@ I/O type
 
 		**1**
 			Backward-compatible alias for **mixed**.
-		
+
 		**2**
 			Alias for **both**.
 
@@ -2103,6 +2103,12 @@ I/O engine
 			I/O engine supporting asynchronous read and write operations to the
 			DAOS File System (DFS) via libdfs.
 
+		**nfs**
+			I/O engine supporting asynchronous read and write operations to
+			NFS filesystems from userspace via libnfs. This is useful for
+			achieving higher concurrency and thus throughput than is possible
+			via kernel NFS.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2525,6 +2531,11 @@ with the caveat that when used on the command line, they must come after the
 	Specificy a different object class for the dfs file.
 	Use DAOS container's object class by default.
 
+.. option:: nfs_url=str : [nfs]
+
+	URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
+	Refer to the libnfs README for more details.
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index ba027b2e..ef317373 100644
--- a/Makefile
+++ b/Makefile
@@ -79,6 +79,12 @@ ifdef CONFIG_LIBNBD
   ENGINES += nbd
 endif
 
+ifdef CONFIG_LIBNFS
+  CFLAGS += $(LIBNFS_CFLAGS)
+  LIBS += $(LIBNFS_LIBS)
+  SOURCE += engines/nfs.c
+endif
+
 ifdef CONFIG_64BIT
   CPPFLAGS += -DBITS_PER_LONG=64
 else ifdef CONFIG_32BIT
diff --git a/configure b/configure
index e886bdc8..8b763700 100755
--- a/configure
+++ b/configure
@@ -170,6 +170,7 @@ disable_native="no"
 march_set="no"
 libiscsi="no"
 libnbd="no"
+libnfs="no"
 libzbc=""
 dfs=""
 dynamic_engines="no"
@@ -241,6 +242,8 @@ for opt do
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
+  --disable-nfs) disable_nfs="yes"
+  ;;
   --dynamic-libengines) dynamic_engines="yes"
   ;;
   --disable-dfs) dfs="no"
@@ -271,8 +274,10 @@ if test "$show_help" = "yes" ; then
   echo "--disable-rados         Disable Rados support even if found"
   echo "--disable-rbd           Disable Rados Block Device even if found"
   echo "--disable-http          Disable HTTP support even if found"
+  echo "--disable-nfs           Disable userspace NFS support even if found"
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
+  echo "--enable-libnfs         Enable nfs support"
   echo "--disable-lex           Disable use of lex/yacc for math"
   echo "--disable-pmem          Disable pmem based engines even if found"
   echo "--enable-lex            Enable use of lex/yacc for math"
@@ -2277,6 +2282,21 @@ EOF
 fi
 print_config "DAOS File System (dfs) Engine" "$dfs"
 
+##########################################
+# Check if we have libnfs (for userspace nfs support).
+if test "$disable_nfs" != "yes"; then
+  if $(pkg-config libnfs); then
+    libnfs="yes"
+    libnfs_cflags=$(pkg-config --cflags libnfs)
+    libnfs_libs=$(pkg-config --libs libnfs)
+  else
+    if test "$libnfs" = "yes" ; then
+      echo "libnfs" "Install libnfs"
+    fi
+  fi
+fi
+print_config "NFS engine" "$libnfs"
+
 ##########################################
 # Check if we have lex/yacc available
 yacc="no"
@@ -3101,6 +3121,9 @@ fi
 if test "$dfs" = "yes" ; then
   output_sym "CONFIG_DFS"
 fi
+if test "$libnfs" = "yes" ; then
+  output_sym "CONFIG_NFS"
+fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
@@ -3140,6 +3163,12 @@ if test "$libnbd" = "yes" ; then
   echo "LIBNBD_CFLAGS=$libnbd_cflags" >> $config_host_mak
   echo "LIBNBD_LIBS=$libnbd_libs" >> $config_host_mak
 fi
+if test "$libnfs" = "yes" ; then
+  output_sym "CONFIG_LIBNFS"
+  echo "CONFIG_LIBNFS=m" >> $config_host_mak
+  echo "LIBNFS_CFLAGS=$libnfs_cflags" >> $config_host_mak
+  echo "LIBNFS_LIBS=$libnfs_libs" >> $config_host_mak
+fi
 if test "$dynamic_engines" = "yes" ; then
   output_sym "CONFIG_DYNAMIC_ENGINES"
 fi
diff --git a/engines/nfs.c b/engines/nfs.c
new file mode 100644
index 00000000..21be8833
--- /dev/null
+++ b/engines/nfs.c
@@ -0,0 +1,314 @@
+#include <stdlib.h>
+#include <poll.h>
+#include <nfsc/libnfs.h>
+#include <nfsc/libnfs-raw.h>
+#include <nfsc/libnfs-raw-mount.h>
+
+#include "../fio.h"
+#include "../optgroup.h"
+
+enum nfs_op_type {
+	NFS_READ_WRITE = 0,
+	NFS_STAT_MKDIR_RMDIR,
+	NFS_STAT_TOUCH_RM,
+};
+
+struct fio_libnfs_options {
+	struct nfs_context *context;
+	char *nfs_url;
+	unsigned int queue_depth; /* nfs_callback needs this info, but doesn't have fio td structure to pull it from */
+	/* the following implement a circular queue of outstanding IOs */
+	int outstanding_events; /* IOs issued to libnfs, that have not returned yet */
+	int prev_requested_event_index; /* event last returned via fio_libnfs_event */
+	int next_buffered_event; /* round robin-pointer within events[] */
+	int buffered_event_count; /* IOs completed by libnfs, waiting for FIO */
+	int free_event_buffer_index; /* next free buffer */
+	struct io_u**events;
+};
+
+struct nfs_data {
+	struct nfsfh *nfsfh;
+	struct fio_libnfs_options *options;
+};
+
+static struct fio_option options[] = {
+	{
+		.name     = "nfs_url",
+		.lname    = "nfs_url",
+		.type     = FIO_OPT_STR_STORE,
+		.help	= "URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]",
+		.off1     = offsetof(struct fio_libnfs_options, nfs_url),
+		.category = FIO_OPT_C_ENGINE,
+		.group	= __FIO_OPT_G_NFS,
+	},
+	{
+		.name     = NULL,
+	},
+};
+
+static struct io_u *fio_libnfs_event(struct thread_data *td, int event)
+{
+	struct fio_libnfs_options *o = td->eo;
+	struct io_u *io_u = o->events[o->next_buffered_event];
+	assert(o->events[o->next_buffered_event]);
+	o->events[o->next_buffered_event] = NULL;
+	o->next_buffered_event = (o->next_buffered_event + 1) % td->o.iodepth;
+	/* validate our state machine */
+	assert(o->buffered_event_count);
+	o->buffered_event_count--;
+	assert(io_u);
+	/* assert that fio_libnfs_event is being called in sequential fashion */
+	assert(event == 0 || o->prev_requested_event_index + 1 == event);
+	if (o->buffered_event_count == 0) {
+		o->prev_requested_event_index = -1;
+	} else {
+		o->prev_requested_event_index = event;
+	}
+	return io_u;
+}
+
+static int nfs_event_loop(struct thread_data *td, bool flush) {
+	struct fio_libnfs_options *o = td->eo;
+	struct pollfd pfds[1]; /* nfs:0 */
+	/* we already have stuff queued for fio, no need to waste cpu on poll() */
+	if (o->buffered_event_count)
+		return o->buffered_event_count;
+	/* fio core logic seems to stop calling this event-loop if we ever return with 0 events */
+	#define SHOULD_WAIT() (o->outstanding_events == td->o.iodepth || (flush && o->outstanding_events))
+
+	do {
+		int timeout = SHOULD_WAIT() ? -1 : 0;
+		int ret = 0;
+		pfds[0].fd = nfs_get_fd(o->context);
+		pfds[0].events = nfs_which_events(o->context);
+		ret = poll(&pfds[0], 1, timeout);
+		if (ret < 0) {
+			if (errno == EINTR || errno == EAGAIN) {
+				continue;
+			}
+			log_err("nfs: failed to poll events: %s.\n",
+				strerror(errno));
+			break;
+		}
+
+		ret = nfs_service(o->context, pfds[0].revents);
+		if (ret < 0) {
+			log_err("nfs: socket is in an unrecoverable error state.\n");
+			break;
+		}
+	} while (SHOULD_WAIT());
+	return o->buffered_event_count;
+#undef SHOULD_WAIT
+}
+
+static int fio_libnfs_getevents(struct thread_data *td, unsigned int min,
+				  unsigned int max, const struct timespec *t)
+{
+	return nfs_event_loop(td, false);
+}
+
+static void nfs_callback(int res, struct nfs_context *nfs, void *data,
+                       void *private_data)
+{
+	struct io_u *io_u = private_data;
+	struct nfs_data *nfs_data = io_u->file->engine_data;
+	struct fio_libnfs_options *o = nfs_data->options;
+	if (res < 0) {
+		log_err("Failed NFS operation(code:%d): %s\n", res, nfs_get_error(o->context));
+		io_u->error = -res;
+		/* res is used for read math below, don't wanna pass negative there */
+		res = 0;
+	} else if (io_u->ddir == DDIR_READ) {
+		memcpy(io_u->buf, data, res);
+		if (res == 0)
+			log_err("Got NFS EOF, this is probably not expected\n");
+	}
+	/* fio uses resid to track remaining data */
+	io_u->resid = io_u->xfer_buflen - res;
+
+	assert(!o->events[o->free_event_buffer_index]);
+	o->events[o->free_event_buffer_index] = io_u;
+	o->free_event_buffer_index = (o->free_event_buffer_index + 1) % o->queue_depth;
+	o->outstanding_events--;
+	o->buffered_event_count++;
+}
+
+static int queue_write(struct fio_libnfs_options *o, struct io_u *io_u) {
+	struct nfs_data *nfs_data = io_u->engine_data;
+	return nfs_pwrite_async(o->context, nfs_data->nfsfh,
+                           io_u->offset, io_u->buflen, io_u->buf, nfs_callback,
+                           io_u);
+}
+
+static int queue_read(struct fio_libnfs_options *o, struct io_u *io_u) {
+	struct nfs_data *nfs_data = io_u->engine_data;
+	return nfs_pread_async(o->context,  nfs_data->nfsfh, io_u->offset, io_u->buflen, nfs_callback,  io_u);
+}
+
+static enum fio_q_status fio_libnfs_queue(struct thread_data *td,
+					    struct io_u *io_u)
+{
+	struct nfs_data *nfs_data = io_u->file->engine_data;
+	struct fio_libnfs_options *o = nfs_data->options;
+	struct nfs_context *nfs = o->context;
+	int err;
+	enum fio_q_status ret = FIO_Q_QUEUED;
+
+	io_u->engine_data = nfs_data;
+	switch(io_u->ddir) {
+		case DDIR_WRITE:
+			err = queue_write(o, io_u);
+			break;
+		case DDIR_READ:
+			err = queue_read(o, io_u);
+			break;
+		case DDIR_TRIM:
+			log_err("nfs: trim is not supported");
+			err = -1;
+			break;
+		default:
+			log_err("nfs: unhandled io %d\n", io_u->ddir);
+			err = -1;
+	}
+	if (err) {
+		log_err("nfs: Failed to queue nfs op: %s\n", nfs_get_error(nfs));
+		td->error = 1;
+		return FIO_Q_COMPLETED;
+	}
+	o->outstanding_events++;
+	return ret;
+}
+
+/*
+ * Do a mount if one has not been done before 
+ */
+static int do_mount(struct thread_data *td, const char *url)
+{
+	size_t event_size = sizeof(struct io_u **) * td->o.iodepth;
+	struct fio_libnfs_options *options = td->eo;
+	struct nfs_url *nfs_url = NULL;
+	int ret = 0;
+	int path_len = 0;
+	char *mnt_dir = NULL;
+
+	if (options->context)
+		return 0;
+
+	options->context = nfs_init_context();
+	if (options->context == NULL) {
+		log_err("nfs: failed to init nfs context\n");
+		return -1;
+	}
+
+	options->events = malloc(event_size);
+	memset(options->events, 0, event_size);
+
+	options->prev_requested_event_index = -1;
+	options->queue_depth = td->o.iodepth;
+
+	nfs_url = nfs_parse_url_full(options->context, url);
+	path_len = strlen(nfs_url->path);
+	mnt_dir = malloc(path_len + strlen(nfs_url->file) + 1);
+	strcpy(mnt_dir, nfs_url->path);
+	strcpy(mnt_dir + strlen(nfs_url->path), nfs_url->file);
+	ret = nfs_mount(options->context, nfs_url->server, mnt_dir);
+	free(mnt_dir);
+	nfs_destroy_url(nfs_url);
+	return ret;
+}
+
+static int fio_libnfs_setup(struct thread_data *td)
+{
+	/* Using threads with libnfs causes fio to hang on exit, lower performance */
+	td->o.use_thread = 0;
+	return 0;
+}
+
+static void fio_libnfs_cleanup(struct thread_data *td)
+{
+	struct fio_libnfs_options *o = td->eo;
+	nfs_umount(o->context);
+	nfs_destroy_context(o->context);
+	free(o->events);
+}
+
+static int fio_libnfs_open(struct thread_data *td, struct fio_file *f)
+{
+	int ret;
+	struct fio_libnfs_options *options = td->eo;
+	struct nfs_data *nfs_data = NULL;
+	int flags = 0;
+
+	if (!options->nfs_url) {
+		log_err("nfs: nfs_url is a required parameter\n");
+		return -1;
+	}
+
+	ret = do_mount(td, options->nfs_url);
+
+	if (ret != 0) {
+		log_err("nfs: Failed to mount %s with code %d: %s\n", options->nfs_url, ret, nfs_get_error(options->context));
+		return ret;
+	}
+	nfs_data = malloc(sizeof(struct nfs_data));
+	memset(nfs_data, 0, sizeof(struct nfs_data));
+	nfs_data->options = options;
+
+	if (td->o.td_ddir == TD_DDIR_WRITE) {
+		flags |= O_CREAT | O_RDWR;
+	} else {
+		flags |= O_RDWR;
+	}
+	ret = nfs_open(options->context, f->file_name, flags, &nfs_data->nfsfh);
+
+	if (ret != 0)
+		log_err("Failed to open %s: %s\n", f->file_name, nfs_get_error(options->context));
+	f->engine_data = nfs_data;
+	return ret;
+}
+
+static int fio_libnfs_close(struct thread_data *td, struct fio_file *f)
+{
+	struct nfs_data *nfs_data = f->engine_data;
+	struct fio_libnfs_options *o = nfs_data->options;
+	int ret = 0;
+	if (nfs_data->nfsfh)
+		ret = nfs_close(o->context, nfs_data->nfsfh);
+	free(nfs_data);
+	f->engine_data = NULL;
+	return ret;
+}
+
+/*
+ * Hook for writing out outstanding data.
+ */
+static int fio_libnfs_commit(struct thread_data *td) {
+	nfs_event_loop(td, true);
+	return 0;
+}
+
+struct ioengine_ops ioengine = {
+	.name		= "nfs",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= fio_libnfs_setup,
+	.queue		= fio_libnfs_queue,
+	.getevents	= fio_libnfs_getevents,
+	.event		= fio_libnfs_event,
+	.cleanup	= fio_libnfs_cleanup,
+	.open_file	= fio_libnfs_open,
+	.close_file	= fio_libnfs_close,
+	.commit     = fio_libnfs_commit,
+	.flags      = FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
+	.options	= options,
+	.option_struct_size	= sizeof(struct fio_libnfs_options),
+};
+
+static void fio_init fio_nfs_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_nfs_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/nfs.fio b/examples/nfs.fio
new file mode 100644
index 00000000..f856cebf
--- /dev/null
+++ b/examples/nfs.fio
@@ -0,0 +1,22 @@
+[global]
+nfs_url=nfs://127.0.0.1/nfs
+blocksize=524288
+iodepth=10
+ioengine=nfs
+size=104857600
+lat_percentiles=1
+group_reporting
+numjobs=10
+ramp_time=5s
+filename_format=myfiles.$clientuid.$jobnum.$filenum
+time_based=1
+
+[write]
+rw=write
+runtime=10s
+stonewall
+
+[read]
+wait_for=write
+rw=randread
+runtime=10s
diff --git a/fio.1 b/fio.1
index 533bcf6a..ab08cb01 100644
--- a/fio.1
+++ b/fio.1
@@ -1901,6 +1901,12 @@ not be \fBcudamalloc\fR. This ioengine defines engine specific options.
 .B dfs
 I/O engine supporting asynchronous read and write operations to the DAOS File
 System (DFS) via libdfs.
+.TP
+.B nfs
+I/O engine supporting asynchronous read and write operations to
+NFS filesystems from userspace via libnfs. This is useful for
+achieving higher concurrency and thus throughput than is possible
+via kernel NFS.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2283,6 +2289,10 @@ Use DAOS container's chunk size by default.
 .BI (dfs)object_class
 Specificy a different object class for the dfs file.
 Use DAOS container's object class by default.
+.TP
+.BI (nfs)nfs_url
+URL in libnfs format, eg nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*]
+Refer to the libnfs README for more details.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/optgroup.c b/optgroup.c
index 15a16229..bebb4a51 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -185,6 +185,10 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.name	= "DAOS File System (dfs) I/O engine", /* dfs */
 		.mask	= FIO_OPT_G_DFS,
 	},
+	{
+		.name	= "NFS I/O engine", /* nfs */
+		.mask	= FIO_OPT_G_NFS,
+	},
 	{
 		.name	= NULL,
 	},
diff --git a/optgroup.h b/optgroup.h
index ff748629..1fb84a29 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -70,6 +70,7 @@ enum opt_category_group {
 	__FIO_OPT_G_NR,
 	__FIO_OPT_G_LIBCUFILE,
 	__FIO_OPT_G_DFS,
+	__FIO_OPT_G_NFS,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -110,6 +111,7 @@ enum opt_category_group {
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
 	FIO_OPT_G_ISCSI         = (1ULL << __FIO_OPT_G_ISCSI),
 	FIO_OPT_G_NBD		= (1ULL << __FIO_OPT_G_NBD),
+	FIO_OPT_G_NFS		= (1ULL << __FIO_OPT_G_NFS),
 	FIO_OPT_G_IOURING	= (1ULL << __FIO_OPT_G_IOURING),
 	FIO_OPT_G_FILESTAT	= (1ULL << __FIO_OPT_G_FILESTAT),
 	FIO_OPT_G_LIBCUFILE	= (1ULL << __FIO_OPT_G_LIBCUFILE),
diff --git a/options.c b/options.c
index ddabaa82..b82a10aa 100644
--- a/options.c
+++ b/options.c
@@ -2025,6 +2025,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  { .ival = "dfs",
 			    .help = "DAOS File System (dfs) IO engine",
 			  },
+#endif
+#ifdef CONFIG_NFS
+			  { .ival = "nfs",
+			    .help = "NFS IO engine",
+			  },
 #endif
 		},
 	},

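For readers following the new nfs engine: fio_libnfs_queue() hands reads and writes to libnfs asynchronously, returns FIO_Q_QUEUED and counts the request in outstanding_events. A failed submission, including any trim request, is reported as FIO_Q_COMPLETED with td->error set. The sketch below models only that control flow. It uses simplified, hypothetical stand-in types (enum ddir, enum q_status, struct engine_state, submit_fn) rather than fio's real headers or libnfs.

```c
#include <assert.h>

/* Hypothetical stand-ins for fio's enum fio_ddir and enum fio_q_status. */
enum ddir { DIR_READ, DIR_WRITE, DIR_TRIM };
enum q_status { Q_COMPLETED, Q_QUEUED };

struct engine_state {
	int outstanding;	/* submitted but not yet reaped */
	int error;		/* plays the role of td->error */
};

/* Hypothetical async submit hook: returns 0 if the request was handed off. */
typedef int (*submit_fn)(enum ddir dir);

static enum q_status queue_one(struct engine_state *s, enum ddir dir,
			       submit_fn submit)
{
	int err;

	switch (dir) {
	case DIR_READ:
	case DIR_WRITE:
		err = submit(dir);
		break;
	default:
		/* Trim (and anything unexpected) is rejected without submitting. */
		err = -1;
		break;
	}

	if (err) {
		/* A failed submission completes immediately, flagged as an error. */
		s->error = 1;
		return Q_COMPLETED;
	}

	/* Success: the completion is reaped later by the getevents hook. */
	s->outstanding++;
	return Q_QUEUED;
}

/* Example submit hook that always succeeds. */
static int submit_ok(enum ddir dir)
{
	(void)dir;
	return 0;
}
```

Returning "completed" for a failed submit lets fio account for the I/O right away instead of waiting for an event that will never arrive.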

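do_mount() in the nfs engine joins the path and file components of the parsed nfs_url into the directory passed to nfs_mount(). A minimal sketch of that concatenation as a standalone helper follows. It takes plain strings instead of a libnfs struct nfs_url, adds the allocation-failure check that the patch omits, and build_mount_dir is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the concatenation in do_mount():
 * returns a newly allocated "<path><file>" string, or NULL if the
 * allocation fails. The caller must free() the result.
 */
static char *build_mount_dir(const char *path, const char *file)
{
	size_t len = strlen(path) + strlen(file) + 1;	/* +1 for the NUL */
	char *dir = malloc(len);

	if (!dir)
		return NULL;
	/* snprintf is bounded by the size we just computed */
	snprintf(dir, len, "%s%s", path, file);
	return dir;
}
```

Computing the length once and formatting with snprintf replaces the pair of strcpy() calls with a single bounded write.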

* Recent changes (master)
@ 2021-05-15 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-15 12:00 UTC
  To: fio

The following changes since commit 30bec59eab3908b681cbc2866179f7166a849c83:

  os: define EDQUOT to EIO if the OS doesn't provide it (2021-05-11 07:58:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dfecde6a4b49bd299b2a7192c10533b9beb4820d:

  Merge branch '2021-05-13/stat-fix-integer-overflow' of https://github.com/flx42/fio (2021-05-14 09:36:59 -0600)

----------------------------------------------------------------
Felix Abecassis (1):
      stat: fix integer overflow in convert_agg_kbytes_percent

Jens Axboe (1):
      Merge branch '2021-05-13/stat-fix-integer-overflow' of https://github.com/flx42/fio

Niklas Cassel (4):
      zbd: only put an upper limit on max open zones once
      oslib/linux-blkzoned: move sysfs reading into its own function
      ioengines: add get_max_open_zones zoned block device operation
      engines/libzbc: add support for the get_max_open_zones io op

 engines/libzbc.c            | 21 +++++++++++
 engines/skeleton_external.c | 13 +++++++
 ioengines.h                 |  4 +-
 oslib/blkzoned.h            |  7 ++++
 oslib/linux-blkzoned.c      | 83 ++++++++++++++++++++++++++++++-----------
 stat.c                      |  2 +-
 zbd.c                       | 91 ++++++++++++++++++++++++++++++++++++++++++---
 7 files changed, 191 insertions(+), 30 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index 2aacf7bb..3dde93db 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -19,6 +19,7 @@ struct libzbc_data {
 	struct zbc_device	*zdev;
 	enum zbc_dev_model	model;
 	uint64_t		nr_sectors;
+	uint32_t		max_open_seq_req;
 };
 
 static int libzbc_get_dev_info(struct libzbc_data *ld, struct fio_file *f)
@@ -32,6 +33,7 @@ static int libzbc_get_dev_info(struct libzbc_data *ld, struct fio_file *f)
 	zbc_get_device_info(ld->zdev, zinfo);
 	ld->model = zinfo->zbd_model;
 	ld->nr_sectors = zinfo->zbd_sectors;
+	ld->max_open_seq_req = zinfo->zbd_max_nr_open_seq_req;
 
 	dprint(FD_ZBD, "%s: vendor_id:%s, type: %s, model: %s\n",
 	       f->file_name, zinfo->zbd_vendor_id,
@@ -335,6 +337,24 @@ err:
 	return -ret;
 }
 
+static int libzbc_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				     unsigned int *max_open_zones)
+{
+	struct libzbc_data *ld;
+	int ret;
+
+	ret = libzbc_open_dev(td, f, &ld);
+	if (ret)
+		return ret;
+
+	if (ld->max_open_seq_req == ZBC_NO_LIMIT)
+		*max_open_zones = 0;
+	else
+		*max_open_zones = ld->max_open_seq_req;
+
+	return 0;
+}
+
 ssize_t libzbc_rw(struct thread_data *td, struct io_u *io_u)
 {
 	struct libzbc_data *ld = td->io_ops_data;
@@ -414,6 +434,7 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.get_zoned_model	= libzbc_get_zoned_model,
 	.report_zones		= libzbc_report_zones,
 	.reset_wp		= libzbc_reset_wp,
+	.get_max_open_zones	= libzbc_get_max_open_zones,
 	.queue			= libzbc_queue,
 	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO,
 };
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 7f3e4cb3..c79b6f11 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -193,6 +193,18 @@ static int fio_skeleton_reset_wp(struct thread_data *td, struct fio_file *f,
 	return 0;
 }
 
+/*
+ * Hook called for getting the maximum number of open zones for a
+ * ZBD_HOST_MANAGED zoned block device.
+ * A @max_open_zones value set to zero means no limit.
+ */
+static int fio_skeleton_get_max_open_zones(struct thread_data *td,
+					   struct fio_file *f,
+					   unsigned int *max_open_zones)
+{
+	return 0;
+}
+
 /*
  * Note that the structure is exported, so that fio can get it via
  * dlsym(..., "ioengine"); for (and only for) external engines.
@@ -212,6 +224,7 @@ struct ioengine_ops ioengine = {
 	.get_zoned_model = fio_skeleton_get_zoned_model,
 	.report_zones	= fio_skeleton_report_zones,
 	.reset_wp	= fio_skeleton_reset_wp,
+	.get_max_open_zones = fio_skeleton_get_max_open_zones,
 	.options	= options,
 	.option_struct_size	= sizeof(struct fio_skeleton_options),
 };
diff --git a/ioengines.h b/ioengines.h
index 1d01ab0a..b3f755b4 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -8,7 +8,7 @@
 #include "io_u.h"
 #include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	29
+#define FIO_IOOPS_VERSION	30
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
@@ -59,6 +59,8 @@ struct ioengine_ops {
 			    uint64_t, struct zbd_zone *, unsigned int);
 	int (*reset_wp)(struct thread_data *, struct fio_file *,
 			uint64_t, uint64_t);
+	int (*get_max_open_zones)(struct thread_data *, struct fio_file *,
+				  unsigned int *);
 	int option_struct_size;
 	struct fio_option *options;
 };
diff --git a/oslib/blkzoned.h b/oslib/blkzoned.h
index 4cc071dc..719b041d 100644
--- a/oslib/blkzoned.h
+++ b/oslib/blkzoned.h
@@ -16,6 +16,8 @@ extern int blkzoned_report_zones(struct thread_data *td,
 				struct zbd_zone *zones, unsigned int nr_zones);
 extern int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 				uint64_t offset, uint64_t length);
+extern int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				       unsigned int *max_open_zones);
 #else
 /*
  * Define stubs for systems that do not have zoned block device support.
@@ -44,6 +46,11 @@ static inline int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 {
 	return -EIO;
 }
+static inline int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+					      unsigned int *max_open_zones)
+{
+	return -EIO;
+}
 #endif
 
 #endif /* FIO_BLKZONED_H */
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 81e4e7f0..6f89ec6f 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -74,12 +74,16 @@ static char *read_file(const char *path)
 	return strdup(line);
 }
 
-int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
-			     enum zbd_zoned_model *model)
+/*
+ * Get the value of a sysfs attribute for a block device.
+ *
+ * Returns NULL on failure.
+ * Returns a pointer to a string on success.
+ * The caller is responsible for freeing the memory.
+ */
+static char *blkzoned_get_sysfs_attr(const char *file_name, const char *attr)
 {
-	const char *file_name = f->file_name;
-	char *zoned_attr_path = NULL;
-	char *model_str = NULL;
+	char *attr_path = NULL;
 	struct stat statbuf;
 	char *sys_devno_path = NULL;
 	char *part_attr_path = NULL;
@@ -87,13 +91,7 @@ int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
 	char sys_path[PATH_MAX];
 	ssize_t sz;
 	char *delim = NULL;
-
-	if (f->filetype != FIO_TYPE_BLOCK) {
-		*model = ZBD_IGNORE;
-		return 0;
-	}
-
-	*model = ZBD_NONE;
+	char *attr_str = NULL;
 
 	if (stat(file_name, &statbuf) < 0)
 		goto out;
@@ -123,24 +121,65 @@ int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
 		*delim = '\0';
 	}
 
-	if (asprintf(&zoned_attr_path,
-		     "/sys/dev/block/%s/queue/zoned", sys_path) < 0)
+	if (asprintf(&attr_path,
+		     "/sys/dev/block/%s/%s", sys_path, attr) < 0)
 		goto out;
 
-	model_str = read_file(zoned_attr_path);
+	attr_str = read_file(attr_path);
+out:
+	free(attr_path);
+	free(part_str);
+	free(part_attr_path);
+	free(sys_devno_path);
+
+	return attr_str;
+}
+
+int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			     enum zbd_zoned_model *model)
+{
+	char *model_str = NULL;
+
+	if (f->filetype != FIO_TYPE_BLOCK) {
+		*model = ZBD_IGNORE;
+		return 0;
+	}
+
+	*model = ZBD_NONE;
+
+	model_str = blkzoned_get_sysfs_attr(f->file_name, "queue/zoned");
 	if (!model_str)
-		goto out;
-	dprint(FD_ZBD, "%s: zbd model string: %s\n", file_name, model_str);
+		return 0;
+
+	dprint(FD_ZBD, "%s: zbd model string: %s\n", f->file_name, model_str);
 	if (strcmp(model_str, "host-aware") == 0)
 		*model = ZBD_HOST_AWARE;
 	else if (strcmp(model_str, "host-managed") == 0)
 		*model = ZBD_HOST_MANAGED;
-out:
+
 	free(model_str);
-	free(zoned_attr_path);
-	free(part_str);
-	free(part_attr_path);
-	free(sys_devno_path);
+
+	return 0;
+}
+
+int blkzoned_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+				unsigned int *max_open_zones)
+{
+	char *max_open_str;
+
+	if (f->filetype != FIO_TYPE_BLOCK)
+		return -EIO;
+
+	max_open_str = blkzoned_get_sysfs_attr(f->file_name, "queue/max_open_zones");
+	if (!max_open_str)
+		return 0;
+
+	dprint(FD_ZBD, "%s: max open zones supported by device: %s\n",
+	       f->file_name, max_open_str);
+	*max_open_zones = atoll(max_open_str);
+
+	free(max_open_str);
+
 	return 0;
 }
 
diff --git a/stat.c b/stat.c
index b7222f46..a8a96c85 100644
--- a/stat.c
+++ b/stat.c
@@ -462,7 +462,7 @@ static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, i
 {
 	double p_of_agg = 100.0;
 	if (rs && rs->agg[ddir] > 1024) {
-		p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024.0);
+		p_of_agg = mean * 100.0 / (double) (rs->agg[ddir] / 1024.0);
 
 		if (p_of_agg > 100.0)
 			p_of_agg = 100.0;
diff --git a/zbd.c b/zbd.c
index eed796b3..68cd58e1 100644
--- a/zbd.c
+++ b/zbd.c
@@ -113,6 +113,34 @@ int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
 	return ret;
 }
 
+/**
+ * zbd_get_max_open_zones - Get the maximum number of open zones
+ * @td: FIO thread data
+ * @f: FIO file for which to get max open zones
+ * @max_open_zones: Upon success, result will be stored here.
+ *
+ * A @max_open_zones value set to zero means no limit.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+int zbd_get_max_open_zones(struct thread_data *td, struct fio_file *f,
+			   unsigned int *max_open_zones)
+{
+	int ret;
+
+	if (td->io_ops && td->io_ops->get_max_open_zones)
+		ret = td->io_ops->get_max_open_zones(td, f, max_open_zones);
+	else
+		ret = blkzoned_get_max_open_zones(td, f, max_open_zones);
+	if (ret < 0) {
+		td_verror(td, errno, "get max open zones failed");
+		log_err("%s: get max open zones failed (%d).\n",
+			f->file_name, errno);
+	}
+
+	return ret;
+}
+
 /**
  * zbd_zone_idx - convert an offset into a zone number
  * @f: file pointer.
@@ -554,6 +582,51 @@ out:
 	return ret;
 }
 
+static int zbd_set_max_open_zones(struct thread_data *td, struct fio_file *f)
+{
+	struct zoned_block_device_info *zbd = f->zbd_info;
+	unsigned int max_open_zones;
+	int ret;
+
+	if (zbd->model != ZBD_HOST_MANAGED) {
+		/* Only host-managed devices have a max open limit */
+		zbd->max_open_zones = td->o.max_open_zones;
+		goto out;
+	}
+
+	/* If host-managed, get the max open limit */
+	ret = zbd_get_max_open_zones(td, f, &max_open_zones);
+	if (ret)
+		return ret;
+
+	if (!max_open_zones) {
+		/* No device limit */
+		zbd->max_open_zones = td->o.max_open_zones;
+	} else if (!td->o.max_open_zones) {
+		/* No user limit. Set limit to device limit */
+		zbd->max_open_zones = max_open_zones;
+	} else if (td->o.max_open_zones <= max_open_zones) {
+		/* Both user limit and dev limit. User limit not too large */
+		zbd->max_open_zones = td->o.max_open_zones;
+	} else {
+		/* Both user limit and dev limit. User limit too large */
+		td_verror(td, EINVAL,
+			  "Specified --max_open_zones is too large");
+		log_err("Specified --max_open_zones (%d) is larger than max (%u)\n",
+			td->o.max_open_zones, max_open_zones);
+		return -EINVAL;
+	}
+
+out:
+	/* Ensure that the limit is not larger than FIO's internal limit */
+	zbd->max_open_zones = min_not_zero(zbd->max_open_zones,
+					   (uint32_t) ZBD_MAX_OPEN_ZONES);
+	dprint(FD_ZBD, "%s: using max open zones limit: %"PRIu32"\n",
+	       f->file_name, zbd->max_open_zones);
+
+	return 0;
+}
+
 /*
  * Allocate zone information and store it into f->zbd_info if zonemode=zbd.
  *
@@ -576,9 +649,13 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 	case ZBD_HOST_AWARE:
 	case ZBD_HOST_MANAGED:
 		ret = parse_zone_info(td, f);
+		if (ret)
+			return ret;
 		break;
 	case ZBD_NONE:
 		ret = init_zone_info(td, f);
+		if (ret)
+			return ret;
 		break;
 	default:
 		td_verror(td, EINVAL, "Unsupported zoned model");
@@ -586,11 +663,15 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 		return -EINVAL;
 	}
 
-	if (ret == 0) {
-		f->zbd_info->model = zbd_model;
-		f->zbd_info->max_open_zones = td->o.max_open_zones;
+	f->zbd_info->model = zbd_model;
+
+	ret = zbd_set_max_open_zones(td, f);
+	if (ret) {
+		zbd_free_zone_info(f);
+		return ret;
 	}
-	return ret;
+
+	return 0;
 }
 
 void zbd_free_zone_info(struct fio_file *f)
@@ -726,8 +807,6 @@ int zbd_setup_files(struct thread_data *td)
 		if (zbd_is_seq_job(f))
 			assert(f->min_zone < f->max_zone);
 
-		zbd->max_open_zones = zbd->max_open_zones ?: ZBD_MAX_OPEN_ZONES;
-
 		if (td->o.max_open_zones > 0 &&
 		    zbd->max_open_zones != td->o.max_open_zones) {
 			log_err("Different 'max_open_zones' values\n");

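zbd_set_max_open_zones() combines three limits: the device limit (queried only for host-managed drives, 0 meaning unlimited), the user's max_open_zones option (0 meaning unset), and fio's internal ZBD_MAX_OPEN_ZONES cap, applied through min_not_zero(). A minimal sketch of that decision table as a pure function is shown below. It assumes a placeholder cap of 4096 (SKETCH_MAX_OPEN_ZONES; fio's actual constant may differ), and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder for fio's internal cap; the real ZBD_MAX_OPEN_ZONES may differ. */
#define SKETCH_MAX_OPEN_ZONES 4096u

/* Smaller of a and b, treating 0 as "no limit" (like min_not_zero()). */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/*
 * Mirror of the decision table in zbd_set_max_open_zones().
 * dev_limit and user_limit use 0 for "no limit".
 * Returns 0 and stores the effective limit in *out, or -EINVAL when
 * the user asks for more open zones than the device supports.
 */
static int resolve_max_open_zones(bool host_managed, unsigned int dev_limit,
				  unsigned int user_limit, unsigned int *out)
{
	unsigned int limit;

	if (!host_managed || !dev_limit)
		limit = user_limit;		/* no device limit to honour */
	else if (!user_limit)
		limit = dev_limit;		/* no user limit: use device's */
	else if (user_limit <= dev_limit)
		limit = user_limit;		/* user limit fits */
	else
		return -EINVAL;			/* user limit too large */

	/* Never exceed the internal cap; "unlimited" becomes the cap. */
	*out = min_not_zero_u(limit, SKETCH_MAX_OPEN_ZONES);
	return 0;
}
```

Keeping the table in one pure function makes each branch easy to check in isolation, which is also why the patch moves the old `?: ZBD_MAX_OPEN_ZONES` fallback out of zbd_setup_files().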

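The stat.c change fixes an integer overflow. Assuming mean is a plain int, as the truncated hunk header suggests, `mean * 100` is evaluated in int arithmetic before the double division and overflows once mean exceeds INT_MAX / 100 (about 21.47 million). Writing `100.0` converts mean to double first. The sketch below, using hypothetical helper names, performs the fixed calculation and separately detects when the old expression would have overflowed. The check is done in long long because signed int overflow is itself undefined behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed arithmetic: the 100.0 literal promotes mean to double first. */
static double percent_of_agg(int mean, uint64_t agg)
{
	double p = 100.0;

	if (agg > 1024) {
		p = mean * 100.0 / (agg / 1024.0);
		if (p > 100.0)
			p = 100.0;	/* clamp, as convert_agg_kbytes_percent() does */
	}
	return p;
}

/* True when the pre-fix `mean * 100` would not fit in an int. */
static bool int_product_overflows(int mean)
{
	long long wide = (long long)mean * 100;

	return wide > INT_MAX || wide < INT_MIN;
}
```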

* Recent changes (master)
@ 2021-05-12 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-12 12:00 UTC
  To: fio

The following changes since commit afa2cfb29b6c28b55d19f71f59287e43ecba80dd:

  Merge branch 'z_unit_docs' of https://github.com/ahribeng/fio (2021-05-10 21:16:58 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 30bec59eab3908b681cbc2866179f7166a849c83:

  os: define EDQUOT to EIO if the OS doesn't provide it (2021-05-11 07:58:03 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      os: define EDQUOT to EIO if the OS doesn't provide it

Martin Bukatovic (1):
      Make fill_device to stop writing on EDQUOT

 HOWTO       |  3 ++-
 backend.c   |  7 ++++---
 filesetup.c | 11 ++++++++---
 fio.1       |  3 ++-
 os/os.h     |  5 +++++
 5 files changed, 21 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 177310f6..f5681c0d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1858,7 +1858,8 @@ I/O size
 .. option:: fill_device=bool, fill_fs=bool
 
 	Sets size to something really large and waits for ENOSPC (no space left on
-	device) as the terminating condition. Only makes sense with sequential
+	device) or EDQUOT (disk quota exceeded)
+	as the terminating condition. Only makes sense with sequential
 	write. For a read workload, the mount point will be filled first then I/O
 	started on the result. This option doesn't make sense if operating on a raw
 	device node, since the size of that is already known by the file system.
diff --git a/backend.c b/backend.c
index 399c299e..6290e0d6 100644
--- a/backend.c
+++ b/backend.c
@@ -393,7 +393,7 @@ static bool break_on_this_error(struct thread_data *td, enum fio_ddir ddir,
 			td_clear_error(td);
 			*retptr = 0;
 			return false;
-		} else if (td->o.fill_device && err == ENOSPC) {
+		} else if (td->o.fill_device && (err == ENOSPC || err == EDQUOT)) {
 			/*
 			 * We expect to hit this error if
 			 * fill_device option is set.
@@ -1105,7 +1105,7 @@ reap:
 	if (td->trim_entries)
 		log_err("fio: %lu trim entries leaked?\n", td->trim_entries);
 
-	if (td->o.fill_device && td->error == ENOSPC) {
+	if (td->o.fill_device && (td->error == ENOSPC || td->error == EDQUOT)) {
 		td->error = 0;
 		fio_mark_td_terminate(td);
 	}
@@ -1120,7 +1120,8 @@ reap:
 
 		if (i) {
 			ret = io_u_queued_complete(td, i);
-			if (td->o.fill_device && td->error == ENOSPC)
+			if (td->o.fill_device &&
+			    (td->error == ENOSPC || td->error == EDQUOT))
 				td->error = 0;
 		}
 
diff --git a/filesetup.c b/filesetup.c
index e664f8b4..296de5a1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -226,11 +226,16 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 			if (r < 0) {
 				int __e = errno;
 
-				if (__e == ENOSPC) {
+				if (__e == ENOSPC || __e == EDQUOT) {
+					const char *__e_name;
 					if (td->o.fill_device)
 						break;
-					log_info("fio: ENOSPC on laying out "
-						 "file, stopping\n");
+					if (__e == ENOSPC)
+						__e_name = "ENOSPC";
+					else
+						__e_name = "EDQUOT";
+					log_info("fio: %s on laying out "
+						 "file, stopping\n", __e_name);
 				}
 				td_verror(td, errno, "write");
 			} else
diff --git a/fio.1 b/fio.1
index e7da5c68..533bcf6a 100644
--- a/fio.1
+++ b/fio.1
@@ -1650,7 +1650,8 @@ of a file. This option is ignored on non-regular files.
 .TP
 .BI fill_device \fR=\fPbool "\fR,\fB fill_fs" \fR=\fPbool
 Sets size to something really large and waits for ENOSPC (no space left on
-device) as the terminating condition. Only makes sense with sequential
+device) or EDQUOT (disk quota exceeded)
+as the terminating condition. Only makes sense with sequential
 write. For a read workload, the mount point will be filled first then I/O
 started on the result. This option doesn't make sense if operating on a raw
 device node, since the size of that is already known by the file system.
diff --git a/os/os.h b/os/os.h
index b46f4164..e47d3d97 100644
--- a/os/os.h
+++ b/os/os.h
@@ -7,6 +7,7 @@
 #include <pthread.h>
 #include <unistd.h>
 #include <stdlib.h>
+#include <errno.h>
 
 #include "../arch/arch.h" /* IWYU pragma: export */
 #include "../lib/types.h"
@@ -58,6 +59,10 @@ typedef enum {
 #error "unsupported os"
 #endif
 
+#ifndef EDQUOT
+#define EDQUOT	EIO
+#endif
+
 #ifdef CONFIG_POSIXAIO
 #include <aio.h>
 #ifndef FIO_OS_HAVE_AIOCB_TYPEDEF

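After this series, fill_device treats EDQUOT like ENOSPC as the terminating condition, and os/os.h maps EDQUOT to EIO on platforms that lack it. The sketch below shows the resulting check with hypothetical helper names. On a platform that uses the fallback, EDQUOT and EIO are the same value, so an EIO would also satisfy the same test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Same fallback the patch adds to os/os.h. */
#ifndef EDQUOT
#define EDQUOT	EIO
#endif

/* Hypothetical helper: does this errno end a fill_device run cleanly? */
static bool fill_device_done(bool fill_device, int err)
{
	return fill_device && (err == ENOSPC || err == EDQUOT);
}

/* Name printed when laying out a file stops, as in extend_file(). */
static const char *space_errno_name(int err)
{
	return err == ENOSPC ? "ENOSPC" : "EDQUOT";
}
```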


* Recent changes (master)
@ 2021-05-11 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-11 12:00 UTC
  To: fio

The following changes since commit 79f488cbd95ca6989031a7ace5ec382313d31b3c:

  don't access dlclose'd dynamic ioengine object after close (2021-05-08 22:13:16 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to afa2cfb29b6c28b55d19f71f59287e43ecba80dd:

  Merge branch 'z_unit_docs' of https://github.com/ahribeng/fio (2021-05-10 21:16:58 -0600)

----------------------------------------------------------------
Gonzalez (1):
      Add Documentation for z unit

Jens Axboe (1):
      Merge branch 'z_unit_docs' of https://github.com/ahribeng/fio

Niklas Cassel (1):
      oslib/linux-blkzoned: make sure that we always support zone capacity

 HOWTO                  | 14 ++++++++++----
 fio.1                  | 26 +++++++++++++++++++-------
 oslib/linux-blkzoned.c | 33 +++++++++++++++++++++++++++++++--
 3 files changed, 60 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 889526d9..177310f6 100644
--- a/HOWTO
+++ b/HOWTO
@@ -544,6 +544,9 @@ Parameter types
 		* *Ti* -- means tebi (Ti) or 1024**4
 		* *Pi* -- means pebi (Pi) or 1024**5
 
+	For Zone Block Device Mode:
+	        * *z*  -- means Zone
+
 	With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
 	from those specified in the SI and IEC 80000-13 standards to provide
 	compatibility with old scripts.  For example, 4k means 4096.
@@ -1277,13 +1280,14 @@ I/O type
 .. option:: offset=int
 
 	Start I/O at the provided offset in the file, given as either a fixed size in
-	bytes or a percentage. If a percentage is given, the generated offset will be
+	bytes, zones or a percentage. If a percentage is given, the generated offset will be
 	aligned to the minimum ``blocksize`` or to the value of ``offset_align`` if
 	provided. Data before the given offset will not be touched. This
 	effectively caps the file size at `real_size - offset`. Can be combined with
 	:option:`size` to constrain the start and end range of the I/O workload.
 	A percentage can be specified by a number between 1 and 100 followed by '%',
-	for example, ``offset=20%`` to specify 20%.
+	for example, ``offset=20%`` to specify 20%. In ZBD mode, value can be set as 
+        number of zones using 'z'.
 
 .. option:: offset_align=int
 
@@ -1300,7 +1304,8 @@ I/O type
 	intended to operate on a file in parallel disjoint segments, with even
 	spacing between the starting points. Percentages can be used for this option.
 	If a percentage is given, the generated offset will be aligned to the minimum
-	``blocksize`` or to the value of ``offset_align`` if provided.
+	``blocksize`` or to the value of ``offset_align`` if provided. In ZBD mode, value can
+        also be set as number of zones using 'z'.
 
 .. option:: number_ios=int
 
@@ -1818,7 +1823,8 @@ I/O size
 	If this option is not specified, fio will use the full size of the given
 	files or devices.  If the files do not exist, size must be given. It is also
 	possible to give size as a percentage between 1 and 100. If ``size=20%`` is
-	given, fio will use 20% of the full size of the given files or devices.
+	given, fio will use 20% of the full size of the given files or devices. 
+	In ZBD mode, value can also be set as number of zones using 'z'.
 	Can be combined with :option:`offset` to constrain the start and end range
 	that I/O will be done within.
 
diff --git a/fio.1 b/fio.1
index c3916168..e7da5c68 100644
--- a/fio.1
+++ b/fio.1
@@ -288,6 +288,15 @@ Pi means pebi (Pi) or 1024**5
 .PD
 .RE
 .P
+For Zone Block Device Mode:
+.RS
+.P
+.PD 0
+z means Zone 
+.P
+.PD
+.RE
+.P
 With `kb_base=1024' (the default), the unit prefixes are opposite
 from those specified in the SI and IEC 80000-13 standards to provide
 compatibility with old scripts. For example, 4k means 4096.
@@ -1061,13 +1070,14 @@ should be associated with them.
 .TP
 .BI offset \fR=\fPint[%|z]
 Start I/O at the provided offset in the file, given as either a fixed size in
-bytes or a percentage. If a percentage is given, the generated offset will be
+bytes, zones or a percentage. If a percentage is given, the generated offset will be
 aligned to the minimum \fBblocksize\fR or to the value of \fBoffset_align\fR if
 provided. Data before the given offset will not be touched. This
 effectively caps the file size at `real_size \- offset'. Can be combined with
 \fBsize\fR to constrain the start and end range of the I/O workload.
 A percentage can be specified by a number between 1 and 100 followed by '%',
-for example, `offset=20%' to specify 20%.
+for example, `offset=20%' to specify 20%. In ZBD mode, value can be set as 
+number of zones using 'z'.
 .TP
 .BI offset_align \fR=\fPint
 If set to non-zero value, the byte offset generated by a percentage \fBoffset\fR
@@ -1082,7 +1092,8 @@ specified). This option is useful if there are several jobs which are
 intended to operate on a file in parallel disjoint segments, with even
 spacing between the starting points. Percentages can be used for this option.
 If a percentage is given, the generated offset will be aligned to the minimum
-\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.
+\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.In ZBD mode, value 
+can be set as number of zones using 'z'.
 .TP
 .BI number_ios \fR=\fPint
 Fio will normally perform I/Os until it has exhausted the size of the region
@@ -1607,9 +1618,9 @@ set to the physical size of the given files or devices if they exist.
 If this option is not specified, fio will use the full size of the given
 files or devices. If the files do not exist, size must be given. It is also
 possible to give size as a percentage between 1 and 100. If `size=20%' is
-given, fio will use 20% of the full size of the given files or devices.
-Can be combined with \fBoffset\fR to constrain the start and end range
-that I/O will be done within.
+given, fio will use 20% of the full size of the given files or devices. In ZBD mode,
+size can be given in units of number of zones using 'z'. Can be combined with \fBoffset\fR to 
+constrain the start and end range that I/O will be done within.
 .TP
 .BI io_size \fR=\fPint[%|z] "\fR,\fB io_limit" \fR=\fPint[%|z]
 Normally fio operates within the region set by \fBsize\fR, which means
@@ -1621,7 +1632,8 @@ will perform I/O within the first 20GiB but exit when 5GiB have been
 done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
 and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
 the 0..20GiB region. Value can be set as percentage: \fBio_size\fR=N%.
-In this case \fBio_size\fR multiplies \fBsize\fR= value.
+In this case \fBio_size\fR multiplies \fBsize\fR= value. In ZBD mode, value can
+also be set as number of zones using 'z'.
 .TP
 .BI filesize \fR=\fPirange(int)
 Individual file sizes. May be a range, in which case fio will select sizes
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index f37c67fc..81e4e7f0 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -23,6 +23,37 @@
 
 #include <linux/blkzoned.h>
 
+/*
+ * If the uapi headers installed on the system lacks zone capacity support,
+ * use our local versions. If the installed headers are recent enough to
+ * support zone capacity, do not redefine any structs.
+ */
+#ifndef CONFIG_HAVE_REP_CAPACITY
+#define BLK_ZONE_REP_CAPACITY	(1 << 0)
+
+struct blk_zone_v2 {
+	__u64	start;          /* Zone start sector */
+	__u64	len;            /* Zone length in number of sectors */
+	__u64	wp;             /* Zone write pointer position */
+	__u8	type;           /* Zone type */
+	__u8	cond;           /* Zone condition */
+	__u8	non_seq;        /* Non-sequential write resources active */
+	__u8	reset;          /* Reset write pointer recommended */
+	__u8	resv[4];
+	__u64	capacity;       /* Zone capacity in number of sectors */
+	__u8	reserved[24];
+};
+#define blk_zone blk_zone_v2
+
+struct blk_zone_report_v2 {
+	__u64	sector;
+	__u32	nr_zones;
+	__u32	flags;
+struct blk_zone zones[0];
+};
+#define blk_zone_report blk_zone_report_v2
+#endif /* CONFIG_HAVE_REP_CAPACITY */
+
 /*
  * Read up to 255 characters from the first line of a file. Strip the trailing
  * newline.
@@ -116,10 +147,8 @@ out:
 static uint64_t zone_capacity(struct blk_zone_report *hdr,
 			      struct blk_zone *blkz)
 {
-#ifdef CONFIG_HAVE_REP_CAPACITY
 	if (hdr->flags & BLK_ZONE_REP_CAPACITY)
 		return blkz->capacity << 9;
-#endif
 	return blkz->len << 9;
 }
 

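The man-page hunk above lets io_size be given as a percentage of size= (io_size=N%) or, in ZBD mode, as a number of zones with a 'z' suffix. A minimal sketch of the arithmetic follows, assuming a single fixed zone size in bytes; the helper names are hypothetical and are not fio's parser functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: io_size=N% scales the size= value by N/100. */
static uint64_t io_size_from_percent(uint64_t size, unsigned int pct)
{
	return size * pct / 100;
}

/*
 * Hypothetical helper: io_size=Nz in ZBD mode, assuming every zone has
 * the same size (in bytes).
 */
static uint64_t io_size_from_zones(uint64_t nr_zones, uint64_t zone_size)
{
	assert(zone_size > 0);
	return nr_zones * zone_size;
}
```

With size=20GiB, io_size=200% gives 40GiB of I/O inside the 0..20GiB region, matching the man-page example; io_size=4z with 256MiB zones gives 1GiB.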

* Recent changes (master)
@ 2021-05-09 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-09 12:00 UTC
  To: fio

The following changes since commit cffe80a41cbf9b26446c803177a27f7695f94a31:

  configure: fix check_min_lib_version() eval (2021-05-06 10:24:58 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 79f488cbd95ca6989031a7ace5ec382313d31b3c:

  don't access dlclose'd dynamic ioengine object after close (2021-05-08 22:13:16 -0600)

----------------------------------------------------------------
Eric Sandeen (1):
      don't access dlclose'd dynamic ioengine object after close

 ioengines.c | 1 -
 1 file changed, 1 deletion(-)

---

Diff of recent changes:

diff --git a/ioengines.c b/ioengines.c
index 3561bb4e..dd61af07 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -234,7 +234,6 @@ void free_ioengine(struct thread_data *td)
 	if (td->io_ops->dlhandle) {
 		dprint(FD_IO, "dlclose ioengine %s\n", td->io_ops->name);
 		dlclose(td->io_ops->dlhandle);
-		td->io_ops->dlhandle = NULL;
 	}
 
 	td->io_ops = NULL;


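The dlclose fix removes a store into td->io_ops after the engine object has been unloaded: for an external engine the ops structure lives inside that object, so writing dlhandle = NULL touches memory that may already be gone. A minimal sketch of the safe ordering follows. It models the loaded object with malloc()/free() instead of dlopen()/dlclose(); all names here are hypothetical simplifications, not fio's real types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced versions of fio's structures. */
struct ioengine_ops {
	const char *name;
	void *dlhandle;	/* the object that also contains this struct */
};

struct thread_data {
	struct ioengine_ops *io_ops;
};

/* Stand-in for dlclose(): releases the memory holding the ops too. */
static void fake_dlclose(void *handle)
{
	free(handle);
}

static void free_ioengine(struct thread_data *td)
{
	assert(td != NULL && td->io_ops != NULL);
	if (td->io_ops->dlhandle) {
		/* After this, td->io_ops may point into released memory. */
		fake_dlclose(td->io_ops->dlhandle);
		/* No td->io_ops->dlhandle = NULL here: that would be a
		 * use-after-free. */
	}
	td->io_ops = NULL;	/* only clear the pointer we own */
}
```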
* Recent changes (master)
@ 2021-05-07 12:00 Jens Axboe
From: Jens Axboe @ 2021-05-07 12:00 UTC
  To: fio

The following changes since commit 6308ef297145e73add65ba86bfdbeaf967957d1f:

  ioengines: don't call zbd_put_io_u() for engines not implementing commit (2021-04-27 11:56:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cffe80a41cbf9b26446c803177a27f7695f94a31:

  configure: fix check_min_lib_version() eval (2021-05-06 10:24:58 -0600)

----------------------------------------------------------------
Stefan Hajnoczi (1):
      configure: fix check_min_lib_version() eval

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index a7d82be0..e886bdc8 100755
--- a/configure
+++ b/configure
@@ -142,7 +142,7 @@ check_min_lib_version() {
   fi
   : "${_feature:=${1}}"
   if "${cross_prefix}"pkg-config --version > /dev/null 2>&1; then
-    if eval "echo \$$_feature" = "yes" ; then
+    if test "$(eval echo \"\$$_feature\")" = "yes" ; then
       feature_not_found "$_feature" "$1 >= $2"
     fi
   else


* Recent changes (master)
@ 2021-04-28 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-28 12:00 UTC
  To: fio

The following changes since commit 13169d44d3725847858a7c817965d2cac5abd8f8:

  Merge branch 'pthread_getaffinity_1' of https://github.com/kusumi/fio (2021-04-25 10:23:34 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6308ef297145e73add65ba86bfdbeaf967957d1f:

  ioengines: don't call zbd_put_io_u() for engines not implementing commit (2021-04-27 11:56:55 -0600)

----------------------------------------------------------------
Niklas Cassel (1):
      ioengines: don't call zbd_put_io_u() for engines not implementing commit

Rebecca Cran (1):
      The GPL isn't a EULA: remove it and introduce WixUI_Minimal_NoEULA

 ioengines.c                         |   1 -
 os/windows/WixUI_Minimal_NoEULA.wxs |  96 ++++++++++++++++++++++++++++++++++++
 os/windows/WixUI_fio.wxl            |  12 +++++
 os/windows/dobuild.cmd              |   5 +-
 os/windows/eula.rtf                 | Bin 1075 -> 0 bytes
 os/windows/install.wxs              |   2 +-
 6 files changed, 113 insertions(+), 3 deletions(-)
 create mode 100755 os/windows/WixUI_Minimal_NoEULA.wxs
 create mode 100755 os/windows/WixUI_fio.wxl
 delete mode 100755 os/windows/eula.rtf

---

Diff of recent changes:

diff --git a/ioengines.c b/ioengines.c
index f88b0537..3561bb4e 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -414,7 +414,6 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
-		zbd_put_io_u(td, io_u);
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
diff --git a/os/windows/WixUI_Minimal_NoEULA.wxs b/os/windows/WixUI_Minimal_NoEULA.wxs
new file mode 100755
index 00000000..48391186
--- /dev/null
+++ b/os/windows/WixUI_Minimal_NoEULA.wxs
@@ -0,0 +1,96 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Copyright (c) .NET Foundation and contributors. All rights reserved. Licensed under the Microsoft Reciprocal License. See LICENSE.TXT file in the project root for full license information. -->
+
+
+
+<!--
+First-time install dialog sequence:
+ - WixUI_MyWelcomeDlg
+Maintenance dialog sequence:
+ WixUI_MaintenanceWelcomeDlg
+ - WixUI_MaintenanceTypeDlg
+ - WixUI_VerifyReadyDlg
+-->
+
+<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
+  <Fragment>
+    <UI Id="WixUI_Minimal_NoEULA">
+      <TextStyle Id="WixUI_Font_Normal" FaceName="Tahoma" Size="8" />
+      <TextStyle Id="WixUI_Font_Bigger" FaceName="Tahoma" Size="12" />
+      <TextStyle Id="WixUI_Font_Title" FaceName="Tahoma" Size="9" Bold="yes" />
+
+      <Property Id="DefaultUIFont" Value="WixUI_Font_Normal" />
+      <Property Id="WixUI_Mode" Value="Minimal" />
+
+      <DialogRef Id="ErrorDlg" />
+      <DialogRef Id="FatalError" />
+      <DialogRef Id="FilesInUse" />
+      <DialogRef Id="MsiRMFilesInUse" />
+      <DialogRef Id="PrepareDlg" />
+      <DialogRef Id="ProgressDlg" />
+      <DialogRef Id="ResumeDlg" />
+      <DialogRef Id="UserExit" />
+      <DialogRef Id="MyWelcomeDlg" />
+
+      <Dialog Id="MyWelcomeDlg" Width="370" Height="270" Title="!(loc.WelcomeDlg_Title)">
+          <Control Id="Install" Type="PushButton" ElevationShield="yes" X="236" Y="243" Width="56" Height="17" Default="yes" Hidden="yes" Text="!(loc.WelcomeEulaDlgInstall)" >
+            <Publish Property="WixUI_InstallMode" Value="Update">Installed AND PATCH</Publish>
+            <Publish Event="SpawnWaitDialog" Value="WaitForCostingDlg">!(wix.WixUICostingPopupOptOut) OR CostingComplete = 1</Publish>
+            <Publish Event="EndDialog" Value="Return"><![CDATA[OutOfDiskSpace <> 1]]></Publish>
+            <Publish Event="SpawnDialog" Value="OutOfRbDiskDlg">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND (PROMPTROLLBACKCOST="P" OR NOT PROMPTROLLBACKCOST)</Publish>
+            <Publish Event="EndDialog" Value="Return">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND PROMPTROLLBACKCOST="D"</Publish>
+            <Publish Event="EnableRollback" Value="False">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND PROMPTROLLBACKCOST="D"</Publish>
+            <Publish Event="SpawnDialog" Value="OutOfDiskDlg">(OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 1) OR (OutOfDiskSpace = 1 AND PROMPTROLLBACKCOST="F")</Publish>
+            <Condition Action="show">ALLUSERS</Condition>
+        </Control>
+        <Control Id="InstallNoShield" Type="PushButton" ElevationShield="no" X="212" Y="243" Width="80" Height="17" Default="yes" Text="!(loc.WelcomeEulaDlgInstall)" Hidden="yes">
+          <Publish Event="SpawnWaitDialog" Value="WaitForCostingDlg">!(wix.WixUICostingPopupOptOut) OR CostingComplete = 1</Publish>
+          <Publish Event="EndDialog" Value="Return"><![CDATA[OutOfDiskSpace <> 1]]></Publish>
+          <Publish Event="SpawnDialog" Value="OutOfRbDiskDlg">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND (PROMPTROLLBACKCOST="P" OR NOT PROMPTROLLBACKCOST)</Publish>
+          <Publish Event="EndDialog" Value="Return">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND PROMPTROLLBACKCOST="D"</Publish>
+          <Publish Event="EnableRollback" Value="False">OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 0 AND PROMPTROLLBACKCOST="D"</Publish>
+          <Publish Event="SpawnDialog" Value="OutOfDiskDlg">(OutOfDiskSpace = 1 AND OutOfNoRbDiskSpace = 1) OR (OutOfDiskSpace = 1 AND PROMPTROLLBACKCOST="F")</Publish>
+          <Condition Action="disable"><![CDATA[LicenseAccepted <> "1"]]></Condition>
+          <Condition Action="show">NOT ALLUSERS</Condition>
+        </Control>
+        <Control Id="Cancel" Type="PushButton" X="304" Y="243" Width="56" Height="17" Cancel="yes" Text="!(loc.WixUICancel)">
+          <Publish Event="SpawnDialog" Value="CancelDlg">1</Publish>
+        </Control>
+        <Control Id="Bitmap" Type="Bitmap" X="0" Y="0" Width="370" Height="234" TabSkip="no" Text="!(loc.WelcomeDlgBitmap)" />
+        <Control Id="Back" Type="PushButton" X="180" Y="243" Width="56" Height="17" Disabled="yes" Text="!(loc.WixUIBack)" />
+        <Control Id="BottomLine" Type="Line" X="0" Y="234" Width="370" Height="0" />
+        <Control Id="Description" Type="Text" X="135" Y="80" Width="220" Height="60" Transparent="yes" NoPrefix="yes" Text="!(loc.MyWelcomeDlgDescription)" >
+          <Condition Action="show">NOT Installed OR NOT PATCH</Condition>
+          <Condition Action="hide">Installed AND PATCH</Condition>
+        </Control>
+        <Control Id="PatchDescription" Type="Text" X="135" Y="80" Width="220" Height="60" Transparent="yes" NoPrefix="yes" Text="!(loc.WelcomeUpdateDlgDescriptionUpdate)" >
+          <Condition Action="show">Installed AND PATCH</Condition>
+          <Condition Action="hide">NOT Installed OR NOT PATCH</Condition>
+        </Control>
+        <Control Id="Title" Type="Text" X="135" Y="20" Width="220" Height="60" Transparent="yes" NoPrefix="yes" Text="!(loc.WelcomeDlgTitle)" />
+      </Dialog>
+
+      <Publish Dialog="ExitDialog" Control="Finish" Event="EndDialog" Value="Return" Order="999">1</Publish>
+
+      <Publish Dialog="VerifyReadyDlg" Control="Back" Event="NewDialog" Value="MaintenanceTypeDlg">1</Publish>
+
+      <Publish Dialog="MaintenanceWelcomeDlg" Control="Next" Event="NewDialog" Value="MaintenanceTypeDlg">1</Publish>
+
+      <Publish Dialog="MaintenanceTypeDlg" Control="RepairButton" Event="NewDialog" Value="VerifyReadyDlg">1</Publish>
+      <Publish Dialog="MaintenanceTypeDlg" Control="RemoveButton" Event="NewDialog" Value="VerifyReadyDlg">1</Publish>
+      <Publish Dialog="MaintenanceTypeDlg" Control="Back" Event="NewDialog" Value="MaintenanceWelcomeDlg">1</Publish>
+
+      <Publish Dialog="MyWelcomeDlg" Control="Install" Event="NewDialog" Value="PrepareDlg">1</Publish>
+      <Publish Dialog="VerifyReadyDlg" Control="Back" Event="NewDialog" Value="WelcomeDlg" Order="2">Installed AND PATCH</Publish>
+
+      <InstallUISequence>
+        <Show Dialog="WelcomeDlg" Before="ProgressDlg">0</Show>
+        <Show Dialog="MyWelcomeDlg" Before="ProgressDlg">NOT Installed</Show>
+      </InstallUISequence>
+
+      <Property Id="ARPNOMODIFY" Value="1" />
+    </UI>
+
+    <UIRef Id="WixUI_Common" />
+  </Fragment>
+</Wix>
\ No newline at end of file
diff --git a/os/windows/WixUI_fio.wxl b/os/windows/WixUI_fio.wxl
new file mode 100755
index 00000000..11ec736a
--- /dev/null
+++ b/os/windows/WixUI_fio.wxl
@@ -0,0 +1,12 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Copyright (c) .NET Foundation and contributors. All rights reserved. Licensed under the Microsoft Reciprocal License. See LICENSE.TXT file in the project root for full license information. -->
+
+
+<WixLocalization Culture="en-US" Codepage="1252" xmlns="http://schemas.microsoft.com/wix/2006/localization">
+  <!-- _locID@Culture="en-US" _locComment="American English" -->
+  <!-- _locID@Codepage="1252" _locComment="Windows-1252" -->
+
+<String Id="MyWelcomeDlgDescription" Overridable="yes">
+<!-- _locID_text="MyWelcomeDlgDescription" _locComment="MyWelcomeDlgDescription" -->The Setup Wizard will install [ProductName] on your computer. Click Install to continue or Cancel to exit the Setup Wizard.
+</String>
+</WixLocalization>
\ No newline at end of file
diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index 08df3e87..7b9cb1dd 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -44,7 +44,10 @@ if exist ..\..\fio.pdb (
 @if ERRORLEVEL 1 goto end
 "%WIX%bin\candle" -nologo -arch %FIO_ARCH% examples.wxs
 @if ERRORLEVEL 1 goto end
-"%WIX%bin\light" -nologo -sice:ICE61 install.wixobj examples.wixobj -ext WixUIExtension -out %FIO_VERSION%-%FIO_ARCH%.msi
+"%WIX%bin\candle" -nologo -arch %FIO_ARCH% WixUI_Minimal_NoEULA.wxs
+@if ERRORLEVEL 1 goto end
+
+"%WIX%bin\light" -nologo -sice:ICE61 install.wixobj examples.wixobj WixUI_Minimal_NoEULA.wixobj -loc WixUI_fio.wxl -ext WixUIExtension -out %FIO_VERSION%-%FIO_ARCH%.msi
 :end
 
 if defined SIGN_FIO (
diff --git a/os/windows/eula.rtf b/os/windows/eula.rtf
deleted file mode 100755
index a931017c..00000000
Binary files a/os/windows/eula.rtf and /dev/null differ
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index f73ec5e2..7773bb3b 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -107,7 +107,7 @@
 
 	<WixVariable Id="WixUILicenseRtf" Value="eula.rtf" />
 
-	<UIRef Id="WixUI_Minimal"/>
+	<UIRef Id="WixUI_Minimal_NoEULA"/>
 
 	<MajorUpgrade AllowDowngrades="no" DowngradeErrorMessage="A newer version of the application is already installed."
                   AllowSameVersionUpgrades="yes"/>


* Recent changes (master)
@ 2021-04-26 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-26 12:00 UTC
  To: fio

The following changes since commit 14691a4df98b85621b07dd2bdc0f0a960acbb8ba:

  Merge branch 'gpspm-add-optional-use-rpma_conn_completion_wait-function' of https://github.com/ldorau/fio (2021-04-23 08:39:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 13169d44d3725847858a7c817965d2cac5abd8f8:

  Merge branch 'pthread_getaffinity_1' of https://github.com/kusumi/fio (2021-04-25 10:23:34 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'pthread_getaffinity_1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      gettime: Fix compilation on non-Linux with pthread_getaffinity_np()

 gettime.c         | 2 +-
 os/os-aix.h       | 6 ++++++
 os/os-android.h   | 6 ++++++
 os/os-dragonfly.h | 6 ++++++
 os/os-freebsd.h   | 6 ++++++
 os/os-hpux.h      | 7 +++++++
 os/os-linux.h     | 3 +++
 os/os-mac.h       | 6 ++++++
 os/os-netbsd.h    | 6 ++++++
 os/os-openbsd.h   | 6 ++++++
 os/os-solaris.h   | 6 ++++++
 11 files changed, 59 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/gettime.c b/gettime.c
index e3f483a7..099e9d9f 100644
--- a/gettime.c
+++ b/gettime.c
@@ -679,7 +679,7 @@ int fio_monotonic_clocktest(int debug)
 	unsigned int i;
 	os_cpu_mask_t mask;
 
-#ifdef CONFIG_PTHREAD_GETAFFINITY
+#ifdef FIO_HAVE_GET_THREAD_AFFINITY
 	fio_get_thread_affinity(mask);
 #else
 	memset(&mask, 0, sizeof(mask));
diff --git a/os/os-aix.h b/os/os-aix.h
index 1aab96e0..db99eef4 100644
--- a/os/os-aix.h
+++ b/os/os-aix.h
@@ -18,6 +18,12 @@
 
 #define FIO_USE_GENERIC_SWAP
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
 	return ENOTSUP;
diff --git a/os/os-android.h b/os/os-android.h
index 3c050776..3f1aa9d3 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -58,6 +58,12 @@
 #define MAP_HUGETLB 0x40000 /* arch specific */
 #endif
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 #ifndef CONFIG_NO_SHM
 /*
  * Bionic doesn't support SysV shared memeory, so implement it using ashmem
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 44bfcd5d..6e465894 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -92,6 +92,12 @@ typedef cpumask_t os_cpu_mask_t;
 /* No CPU_COUNT(), but use the default function defined in os/os.h */
 #define fio_cpu_count(mask)             CPU_COUNT((mask))
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 static inline int fio_cpuset_init(os_cpu_mask_t *mask)
 {
 	CPUMASK_ASSZERO(*mask);
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index b3addf98..1b24fa02 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -37,6 +37,12 @@ typedef cpuset_t os_cpu_mask_t;
 #define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
 #define fio_cpu_count(mask)		CPU_COUNT((mask))
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 static inline int fio_cpuset_init(os_cpu_mask_t *mask)
 {
         CPU_ZERO(mask);
diff --git a/os/os-hpux.h b/os/os-hpux.h
index c1dafe42..a80cb2bc 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -38,6 +38,13 @@
 #define FIO_USE_GENERIC_SWAP
 
 #define FIO_OS_HAVE_AIOCB_TYPEDEF
+
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 typedef struct aiocb64 os_aiocb_t;
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
diff --git a/os/os-linux.h b/os/os-linux.h
index ea8d7922..f7137abe 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -74,8 +74,11 @@ typedef cpu_set_t os_cpu_mask_t;
 	sched_getaffinity((pid), (ptr))
 #endif
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
 #define fio_get_thread_affinity(mask)	\
 	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
 
 #define fio_cpu_clear(mask, cpu)	(void) CPU_CLR((cpu), (mask))
 #define fio_cpu_set(mask, cpu)		(void) CPU_SET((cpu), (mask))
diff --git a/os/os-mac.h b/os/os-mac.h
index 683aab32..ec2cc1e5 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -27,6 +27,12 @@
 #define fio_swap32(x)	OSSwapInt32(x)
 #define fio_swap64(x)	OSSwapInt64(x)
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 #ifndef CONFIG_CLOCKID_T
 typedef unsigned int clockid_t;
 #endif
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index abc1d3cb..624c7fa5 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -35,6 +35,12 @@
 #define fio_swap32(x)	bswap32(x)
 #define fio_swap64(x)	bswap64(x)
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
 	struct disklabel dl;
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 994bf078..f1bad671 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -35,6 +35,12 @@
 #define fio_swap32(x)	swap32(x)
 #define fio_swap64(x)	swap64(x)
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
 	struct disklabel dl;
diff --git a/os/os-solaris.h b/os/os-solaris.h
index f1966f44..ea1f081c 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -46,6 +46,12 @@ struct solaris_rand_seed {
 #define os_ctime_r(x, y, z)     ctime_r((x), (y), (z))
 #define FIO_OS_HAS_CTIME_R
 
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+#define FIO_HAVE_GET_THREAD_AFFINITY
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+#endif
+
 typedef psetid_t os_cpu_mask_t;
 
 static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)


* Recent changes (master)
@ 2021-04-24 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-24 12:00 UTC
  To: fio

The following changes since commit 1b65668b6a50392436947802a49896e891feb0f8:

  Merge branch 'zbd-no-parallel-init' of https://github.com/floatious/fio (2021-04-22 11:18:23 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 14691a4df98b85621b07dd2bdc0f0a960acbb8ba:

  Merge branch 'gpspm-add-optional-use-rpma_conn_completion_wait-function' of https://github.com/ldorau/fio (2021-04-23 08:39:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'gpspm-add-optional-use-rpma_conn_completion_wait-function' of https://github.com/ldorau/fio

Oksana Salyk (1):
      rpma: gpspm: introduce the busy_wait_polling toggle

 HOWTO                             |  5 +++++
 engines/librpma_fio.c             | 11 +++++++++++
 engines/librpma_fio.h             |  2 ++
 engines/librpma_gpspm.c           | 25 +++++++++++++++++++++++--
 examples/librpma_gpspm-server.fio |  2 ++
 fio.1                             |  4 ++++
 6 files changed, 47 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e6078c5f..889526d9 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2237,6 +2237,11 @@ with the caveat that when used on the command line, they must come after the
 	Set to 1 only when Direct Write to PMem from the remote host is possible.
 	Otherwise, set to 0.
 
+.. option:: busy_wait_polling=bool : [librpma_*_server]
+
+	Set to 0 to wait for completion instead of busy-wait polling completion.
+	Default: 1.
+
 .. option:: interface=str : [netsplice] [net]
 
 	The IP address of the network interface used to send or receive UDP
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
index 810b55e2..3d605ed6 100644
--- a/engines/librpma_fio.c
+++ b/engines/librpma_fio.c
@@ -49,6 +49,17 @@ struct fio_option librpma_fio_options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBRPMA,
 	},
+	{
+		.name	= "busy_wait_polling",
+		.lname	= "Set to 0 to wait for completion instead of busy-wait polling completion.",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct librpma_fio_options_values,
+					busy_wait_polling),
+		.help	= "Set to false if you want to reduce CPU usage",
+		.def	= "1",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBRPMA,
+	},
 	{
 		.name	= NULL,
 	},
diff --git a/engines/librpma_fio.h b/engines/librpma_fio.h
index 8cfb2e2d..fb89d99d 100644
--- a/engines/librpma_fio.h
+++ b/engines/librpma_fio.h
@@ -41,6 +41,8 @@ struct librpma_fio_options_values {
 	char *port;
 	/* Direct Write to PMem is possible */
 	unsigned int direct_write_to_pmem;
+	/* Set to 0 to wait for completion instead of busy-wait polling completion. */
+	unsigned int busy_wait_polling;
 };
 
 extern struct fio_option librpma_fio_options[];
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
index ac614f46..74147709 100644
--- a/engines/librpma_gpspm.c
+++ b/engines/librpma_gpspm.c
@@ -683,12 +683,33 @@ static int server_cmpl_process(struct thread_data *td)
 	struct librpma_fio_server_data *csd = td->io_ops_data;
 	struct server_data *sd = csd->server_data;
 	struct rpma_completion *cmpl = &sd->msgs_queued[sd->msg_queued_nr];
+	struct librpma_fio_options_values *o = td->eo;
 	int ret;
 
 	ret = rpma_conn_completion_get(csd->conn, cmpl);
 	if (ret == RPMA_E_NO_COMPLETION) {
-		/* lack of completion is not an error */
-		return 0;
+		if (o->busy_wait_polling == 0) {
+			ret = rpma_conn_completion_wait(csd->conn);
+			if (ret == RPMA_E_NO_COMPLETION) {
+				/* lack of completion is not an error */
+				return 0;
+			} else if (ret != 0) {
+				librpma_td_verror(td, ret, "rpma_conn_completion_wait");
+				goto err_terminate;
+			}
+
+			ret = rpma_conn_completion_get(csd->conn, cmpl);
+			if (ret == RPMA_E_NO_COMPLETION) {
+				/* lack of completion is not an error */
+				return 0;
+			} else if (ret != 0) {
+				librpma_td_verror(td, ret, "rpma_conn_completion_get");
+				goto err_terminate;
+			}
+		} else {
+			/* lack of completion is not an error */
+			return 0;
+		}
 	} else if (ret != 0) {
 		librpma_td_verror(td, ret, "rpma_conn_completion_get");
 		goto err_terminate;
diff --git a/examples/librpma_gpspm-server.fio b/examples/librpma_gpspm-server.fio
index d618f2db..67e92a28 100644
--- a/examples/librpma_gpspm-server.fio
+++ b/examples/librpma_gpspm-server.fio
@@ -20,6 +20,8 @@ thread
 # set to 1 (true) ONLY when Direct Write to PMem from the remote host is possible
 # (https://pmem.io/rpma/documentation/basic-direct-write-to-pmem.html)
 direct_write_to_pmem=0
+# set to 0 (false) to wait for completion instead of busy-wait polling completion.
+busy_wait_polling=1
 numjobs=1 # number of expected incomming connections
 iodepth=2 # number of parallel GPSPM requests
 size=100MiB # size of workspace for a single connection
diff --git a/fio.1 b/fio.1
index 18dc156a..c3916168 100644
--- a/fio.1
+++ b/fio.1
@@ -1999,6 +1999,10 @@ The IP address to be used for RDMA-CM based I/O.
 .BI (librpma_*_server)direct_write_to_pmem \fR=\fPbool
 Set to 1 only when Direct Write to PMem from the remote host is possible. Otherwise, set to 0.
 .TP
+.BI (librpma_*_server)busy_wait_polling \fR=\fPbool
+Set to 0 to wait for completion instead of busy-wait polling completion.
+Default: 1.
+.TP
 .BI (netsplice,net)interface \fR=\fPstr
 The IP address of the network interface used to send or receive UDP
 multicast.


* Recent changes (master)
@ 2021-04-23 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-23 12:00 UTC
  To: fio

The following changes since commit 5592e99219864e21b425cfc66fa05ece5b514259:

  backend: fix switch_ioscheduler() (2021-04-16 10:25:24 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1b65668b6a50392436947802a49896e891feb0f8:

  Merge branch 'zbd-no-parallel-init' of https://github.com/floatious/fio (2021-04-22 11:18:23 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'zbd-no-parallel-init' of https://github.com/floatious/fio

Niklas Cassel (1):
      init: zonemode=zbd does not work with create_serialize=0

 init.c | 5 +++++
 1 file changed, 5 insertions(+)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 37bff876..60c7cff4 100644
--- a/init.c
+++ b/init.c
@@ -633,6 +633,11 @@ static int fixup_options(struct thread_data *td)
 		ret |= 1;
 	}
 
+	if (o->zone_mode == ZONE_MODE_ZBD && !o->create_serialize) {
+		log_err("fio: --zonemode=zbd and --create_serialize=0 are not compatible.\n");
+		ret |= 1;
+	}
+
 	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_size) {
 		log_err("fio: --zonesize must be specified when using --zonemode=strided.\n");
 		ret |= 1;


* Recent changes (master)
@ 2021-04-17 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-17 12:00 UTC
  To: fio

The following changes since commit 5561e9dddca8479f182f0269a760dcabe7ff59ad:

  engines: add engine for file delete (2021-04-16 05:56:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5592e99219864e21b425cfc66fa05ece5b514259:

  backend: fix switch_ioscheduler() (2021-04-16 10:25:24 -0600)

----------------------------------------------------------------
Damien Le Moal (1):
      backend: fix switch_ioscheduler()

 backend.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 fio.1     |  3 ++-
 2 files changed, 52 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 52b4ca7e..399c299e 100644
--- a/backend.c
+++ b/backend.c
@@ -1341,22 +1341,19 @@ int init_io_u_buffers(struct thread_data *td)
 	return 0;
 }
 
+#ifdef FIO_HAVE_IOSCHED_SWITCH
 /*
- * This function is Linux specific.
+ * These functions are Linux specific.
  * FIO_HAVE_IOSCHED_SWITCH enabled currently means it's Linux.
  */
-static int switch_ioscheduler(struct thread_data *td)
+static int set_ioscheduler(struct thread_data *td, struct fio_file *file)
 {
-#ifdef FIO_HAVE_IOSCHED_SWITCH
 	char tmp[256], tmp2[128], *p;
 	FILE *f;
 	int ret;
 
-	if (td_ioengine_flagged(td, FIO_DISKLESSIO))
-		return 0;
-
-	assert(td->files && td->files[0]);
-	sprintf(tmp, "%s/queue/scheduler", td->files[0]->du->sysfs_root);
+	assert(file->du && file->du->sysfs_root);
+	sprintf(tmp, "%s/queue/scheduler", file->du->sysfs_root);
 
 	f = fopen(tmp, "r+");
 	if (!f) {
@@ -1417,11 +1414,55 @@ static int switch_ioscheduler(struct thread_data *td)
 
 	fclose(f);
 	return 0;
+}
+
+static int switch_ioscheduler(struct thread_data *td)
+{
+	struct fio_file *f;
+	unsigned int i;
+	int ret = 0;
+
+	if (td_ioengine_flagged(td, FIO_DISKLESSIO))
+		return 0;
+
+	assert(td->files && td->files[0]);
+
+	for_each_file(td, f, i) {
+
+		/* Only consider regular files and block device files */
+		switch (f->filetype) {
+		case FIO_TYPE_FILE:
+		case FIO_TYPE_BLOCK:
+			/*
+			 * Make sure that the device hosting the file could
+			 * be determined.
+			 */
+			if (!f->du)
+				continue;
+			break;
+		case FIO_TYPE_CHAR:
+		case FIO_TYPE_PIPE:
+		default:
+			continue;
+		}
+
+		ret = set_ioscheduler(td, f);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 #else
+
+static int switch_ioscheduler(struct thread_data *td)
+{
 	return 0;
-#endif
 }
 
+#endif /* FIO_HAVE_IOSCHED_SWITCH */
+
 static bool keep_running(struct thread_data *td)
 {
 	unsigned long long limit;
diff --git a/fio.1 b/fio.1
index c59a8002..18dc156a 100644
--- a/fio.1
+++ b/fio.1
@@ -690,7 +690,8 @@ of how that would work.
 .TP
 .BI ioscheduler \fR=\fPstr
 Attempt to switch the device hosting the file to the specified I/O scheduler
-before running.
+before running. If the file is a pipe, a character device file or if device
+hosting the file could not be determined, this option is ignored.
 .TP
 .BI create_serialize \fR=\fPbool
 If true, serialize the file creation for the jobs. This may be handy to


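After the backend.c fix, switch_ioscheduler() visits every file of the job instead of only files[0]. It applies the scheduler only to regular files and block devices whose hosting device could be determined, and it skips character devices and pipes, as the updated man page states. The sketch below reduces that filtering rule to counting how many files would be switched. The types and names are hypothetical simplifications of fio's fio_file and FIO_TYPE_* values, with a NULL sysfs_root standing in for a missing f->du.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of fio's file types. */
enum sk_filetype { SK_TYPE_FILE, SK_TYPE_BLOCK, SK_TYPE_CHAR, SK_TYPE_PIPE };

struct sk_file {
	enum sk_filetype filetype;
	const char *sysfs_root;	/* NULL when the hosting device is unknown */
};

/* Count the files the ioscheduler option would actually be applied to. */
static size_t files_to_switch(const struct sk_file *files, size_t n)
{
	size_t i, count = 0;

	assert(files != NULL || n == 0);
	for (i = 0; i < n; i++) {
		switch (files[i].filetype) {
		case SK_TYPE_FILE:
		case SK_TYPE_BLOCK:
			if (!files[i].sysfs_root)
				continue;	/* device could not be determined */
			break;
		default:
			continue;		/* char devices and pipes are ignored */
		}
		count++;
	}
	return count;
}
```

As in the patch, `continue` inside the switch skips to the next loop iteration, so only the `break` path reaches the per-file action.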
* Recent changes (master)
@ 2021-04-16 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-16 12:00 UTC
  To: fio

The following changes since commit a2aa490a0677771e070e1e2d9e6fd1ad19cfe1fd:

  Merge branch 'parse-signedness-warn' of https://github.com/floatious/fio (2021-04-13 07:51:17 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5561e9dddca8479f182f0269a760dcabe7ff59ad:

  engines: add engine for file delete (2021-04-16 05:56:19 -0600)

----------------------------------------------------------------
Friendy.Su@sony.com (1):
      engines: add engine for file delete

 HOWTO                            |   5 ++
 Makefile                         |   2 +-
 engines/filedelete.c             | 115 +++++++++++++++++++++++++++++++++++++++
 examples/filedelete-ioengine.fio |  18 ++++++
 fio.1                            |   5 ++
 5 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 engines/filedelete.c
 create mode 100644 examples/filedelete-ioengine.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2788670d..e6078c5f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2055,6 +2055,11 @@ I/O engine
 			and 'nrfiles', so that files will be created.
 			This engine is to measure file lookup and meta data access.
 
+		**filedelete**
+			Simply delete the files by unlink() and do no I/O to them. You need to set 'filesize'
+			and 'nrfiles', so that the files will be created.
+			This engine is for measuring file deletion.
+
 		**libpmem**
 			Read and write using mmap I/O to a file on a filesystem
 			mounted with DAX on a persistent memory device through the PMDK
diff --git a/Makefile b/Makefile
index fce3d0d1..ba027b2e 100644
--- a/Makefile
+++ b/Makefile
@@ -51,7 +51,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		pshared.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
-		engines/ftruncate.c engines/filecreate.c engines/filestat.c \
+		engines/ftruncate.c engines/filecreate.c engines/filestat.c engines/filedelete.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
diff --git a/engines/filedelete.c b/engines/filedelete.c
new file mode 100644
index 00000000..64c58639
--- /dev/null
+++ b/engines/filedelete.c
@@ -0,0 +1,115 @@
+/*
+ * file delete engine
+ *
+ * IO engine that doesn't do any IO, just deletes files and tracks the latency
+ * of the file deletion.
+ */
+#include <stdio.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "../fio.h"
+
+struct fc_data {
+	enum fio_ddir stat_ddir;
+};
+
+static int delete_file(struct thread_data *td, struct fio_file *f)
+{
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+	int ret;
+
+	dprint(FD_FILE, "fd delete %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	ret = unlink(f->file_name);
+
+	if (ret == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "delete(%s)", f->file_name);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
+	}
+
+	return 0;
+}
+
+
+static enum fio_q_status queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
+{
+	return FIO_Q_COMPLETED;
+}
+
+static int init(struct thread_data *td)
+{
+	struct fc_data *data;
+
+	data = calloc(1, sizeof(*data));
+
+	if (td_read(td))
+		data->stat_ddir = DDIR_READ;
+	else if (td_write(td))
+		data->stat_ddir = DDIR_WRITE;
+
+	td->io_ops_data = data;
+	return 0;
+}
+
+static int delete_invalidate(struct thread_data *td, struct fio_file *f)
+{
+    /* do nothing because file not opened */
+    return 0;
+}
+
+static void cleanup(struct thread_data *td)
+{
+	struct fc_data *data = td->io_ops_data;
+
+	free(data);
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "filedelete",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.invalidate	= delete_invalidate,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.get_file_size	= generic_get_file_size,
+	.open_file	= delete_file,
+	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+static void fio_init fio_filedelete_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_filedelete_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/filedelete-ioengine.fio b/examples/filedelete-ioengine.fio
new file mode 100644
index 00000000..3c0028f9
--- /dev/null
+++ b/examples/filedelete-ioengine.fio
@@ -0,0 +1,18 @@
+# Example filedelete job
+
+# The 'filedelete' engine only calls 'unlink(filename)'; files are never open()ed.
+# 'filesize' must be set so that the files are created during the setup stage.
+# 'unlink' should be set to 0, since the files are deleted during the measurement.
+# Options that disable completion latency output, such as 'disable_clat' and 'gtod_reduce', must not be set.
+[global]
+ioengine=filedelete
+filesize=4k
+nrfiles=200
+unlink=0
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
diff --git a/fio.1 b/fio.1
index f959e00d..c59a8002 100644
--- a/fio.1
+++ b/fio.1
@@ -1847,6 +1847,11 @@ Simply do stat() and do no I/O to the file. You need to set 'filesize'
 and 'nrfiles', so that files will be created.
 This engine is to measure file lookup and meta data access.
 .TP
+.B filedelete
+Simply delete files by unlink() and do no I/O to the file. You need to set 'filesize'
+and 'nrfiles', so that files will be created.
+This engine is for measuring file deletion.
+.TP
 .B libpmem
 Read and write using mmap I/O to a file on a filesystem
 mounted with DAX on a persistent memory device through the PMDK



* Recent changes (master)
@ 2021-04-14 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-14 12:00 UTC
  To: fio

The following changes since commit 9b6253bc6af3b38d4677f7470f42a1ff22492ef3:

  t/zbd: test repeated async write with block size unaligned to zone size (2021-04-12 06:56:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a2aa490a0677771e070e1e2d9e6fd1ad19cfe1fd:

  Merge branch 'parse-signedness-warn' of https://github.com/floatious/fio (2021-04-13 07:51:17 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'parse-signedness-warn' of https://github.com/floatious/fio

Niklas Cassel (1):
      parse: fix parse_is_percent() warning

 parse.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/parse.h b/parse.h
index 4cf08fd2..d68484ea 100644
--- a/parse.h
+++ b/parse.h
@@ -131,7 +131,7 @@ static inline void *td_var(void *to, const struct fio_option *o,
 
 static inline int parse_is_percent(unsigned long long val)
 {
-	return val >= -101;
+	return val >= -101ULL;
 }
 
 #define ZONE_BASE_VAL ((-1ULL >> 1) + 1)



* Recent changes (master)
@ 2021-04-13 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-13 12:00 UTC
  To: fio

The following changes since commit 1588c8f571f67a004571e51cdbb5de97c3e4f457:

  Merge branch 'wip-rados-dont-zerowrite' of https://github.com/aclamk/fio (2021-04-10 11:46:30 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9b6253bc6af3b38d4677f7470f42a1ff22492ef3:

  t/zbd: test repeated async write with block size unaligned to zone size (2021-04-12 06:56:29 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      zbd: avoid zone reset during asynchronous IOs in-flight
      t/zbd: test repeated async write with block size unaligned to zone size

 t/zbd/test-zbd-support | 18 ++++++++++++++++++
 zbd.c                  | 23 +++++++----------------
 2 files changed, 25 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index be129615..26aff373 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1201,6 +1201,24 @@ test56() {
 		>> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
+# Test that repeated async write job does not cause zone reset during writes
+# in-flight, when the block size is not a divisor of the zone size.
+test57() {
+	local bs off
+
+	require_zbd || return $SKIP_TESTCASE
+
+	bs=$((4096 * 7))
+	off=$((first_sequential_zone_sector * 512))
+
+	run_fio --name=job --filename="${dev}" --rw=randwrite --bs="${bs}" \
+		--offset="${off}" --size=$((4 * zone_size)) --iodepth=256 \
+		"$(ioengine "libaio")" --time_based=1 --runtime=30s \
+		--zonemode=zbd --direct=1 --zonesize="${zone_size}" \
+		${job_var_opts[@]} \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
diff --git a/zbd.c b/zbd.c
index d16b890f..eed796b3 100644
--- a/zbd.c
+++ b/zbd.c
@@ -842,16 +842,13 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
  * @f: fio file for which to reset zones
  * @zb: first zone to reset.
  * @ze: first zone not to reset.
- * @all_zones: whether to reset all zones or only those zones for which the
- *	write pointer is not a multiple of td->o.min_bs[DDIR_WRITE].
  */
 static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			   struct fio_zone_info *const zb,
-			   struct fio_zone_info *const ze, bool all_zones)
+			   struct fio_zone_info *const ze)
 {
 	struct fio_zone_info *z;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
-	bool reset_wp;
 	int res = 0;
 
 	assert(min_bs);
@@ -864,16 +861,10 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 		if (!z->has_wp)
 			continue;
 		zone_lock(td, f, z);
-		if (all_zones) {
-			pthread_mutex_lock(&f->zbd_info->mutex);
-			zbd_close_zone(td, f, nz);
-			pthread_mutex_unlock(&f->zbd_info->mutex);
-
-			reset_wp = z->wp != z->start;
-		} else {
-			reset_wp = z->wp % min_bs != 0;
-		}
-		if (reset_wp) {
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		zbd_close_zone(td, f, nz);
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		if (z->wp != z->start) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
 			       f->file_name, zbd_zone_nr(f, z));
 			if (zbd_reset_zone(td, f, z) < 0)
@@ -996,8 +987,8 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	 * writing any data to avoid that a zone reset has to be issued while
 	 * writing data, which causes data loss.
 	 */
-	zbd_reset_zones(td, f, zb, ze, td->o.verify != VERIFY_NONE &&
-			td->runstate != TD_VERIFYING);
+	if (td->o.verify != VERIFY_NONE && td->runstate != TD_VERIFYING)
+		zbd_reset_zones(td, f, zb, ze);
 	zbd_reset_write_cnt(td, f);
 }
 



* Recent changes (master)
@ 2021-04-11 12:00 Jens Axboe
From: Jens Axboe @ 2021-04-11 12:00 UTC
  To: fio

The following changes since commit 6202c70d8d5cbdd3fb4bc23b96f691cbd25a327e:

  gettime: cleanup ifdef mess (2021-03-30 20:13:16 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1588c8f571f67a004571e51cdbb5de97c3e4f457:

  Merge branch 'wip-rados-dont-zerowrite' of https://github.com/aclamk/fio (2021-04-10 11:46:30 -0600)

----------------------------------------------------------------
Adam Kupczyk (1):
      engine/rados: Add option to skip object creation

Jens Axboe (1):
      Merge branch 'wip-rados-dont-zerowrite' of https://github.com/aclamk/fio

 HOWTO           |  6 ++++++
 engines/rados.c | 19 ++++++++++++++++---
 fio.1           |  5 +++++
 3 files changed, 27 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c48f46d8..2788670d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2328,6 +2328,12 @@ with the caveat that when used on the command line, they must come after the
         Poll store instead of waiting for completion. Usually this provides better
         throughput at cost of higher(up to 100%) CPU utilization.
 
+.. option:: touch_objects=bool : [rados]
+
+        During initialization, touch (create if they do not exist) all objects (files).
+        Touching all objects affects ceph caches and likely impacts test results.
+        Enabled by default.
+
 .. option:: skip_bad=bool : [mtd]
 
 	Skip operations against known bad blocks.
diff --git a/engines/rados.c b/engines/rados.c
index 42ee48ff..23e62c4c 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -38,6 +38,7 @@ struct rados_options {
 	char *pool_name;
 	char *client_name;
 	int busy_poll;
+	int touch_objects;
 };
 
 static struct fio_option options[] = {
@@ -78,6 +79,16 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group    = FIO_OPT_G_RBD,
 	},
+	{
+		.name     = "touch_objects",
+		.lname    = "touch objects on start",
+		.type     = FIO_OPT_BOOL,
+		.help     = "Touch (create) objects on start",
+		.off1     = offsetof(struct rados_options, touch_objects),
+		.def	  = "1",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
 	{
 		.name     = NULL,
 	},
@@ -194,9 +205,11 @@ static int _fio_rados_connect(struct thread_data *td)
 	for (i = 0; i < td->o.nr_files; i++) {
 		f = td->files[i];
 		f->real_file_size = file_size;
-		r = rados_write(rados->io_ctx, f->file_name, "", 0, 0);
-		if (r < 0) {
-			goto failed_obj_create;
+		if (o->touch_objects) {
+			r = rados_write(rados->io_ctx, f->file_name, "", 0, 0);
+			if (r < 0) {
+				goto failed_obj_create;
+			}
 		}
 	}
 	return 0;
diff --git a/fio.1 b/fio.1
index ad4a662b..f959e00d 100644
--- a/fio.1
+++ b/fio.1
@@ -2087,6 +2087,11 @@ by default.
 Poll store instead of waiting for completion. Usually this provides better
 throughput at cost of higher(up to 100%) CPU utilization.
 .TP
+.BI (rados)touch_objects \fR=\fPbool
+During initialization, touch (create if they do not exist) all objects (files).
+Touching all objects affects ceph caches and likely impacts test results.
+Enabled by default.
+.TP
 .BI (http)http_host \fR=\fPstr
 Hostname to connect to. For S3, this could be the bucket name. Default
 is \fBlocalhost\fR



* Recent changes (master)
@ 2021-03-31 12:00 Jens Axboe
From: Jens Axboe @ 2021-03-31 12:00 UTC
  To: fio

The following changes since commit e7e536b665bd6a9d3e936e0847dbbb6957101da4:

  Merge branch 'unified-merge' of https://github.com/jeffreyalien/fio (2021-03-18 10:19:57 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6202c70d8d5cbdd3fb4bc23b96f691cbd25a327e:

  gettime: cleanup ifdef mess (2021-03-30 20:13:16 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      configure: add test case for pthread_getaffinity_np()
      os/os-linux: add pthread CPU affinity helper
      gettime: check affinity for thread, if we have it
      gettime: cleanup ifdef mess

 configure     | 27 +++++++++++++++++++++++++++
 gettime.c     | 22 ++++++++++++++++++++--
 os/os-linux.h |  3 +++
 3 files changed, 50 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 2f5ac91f..a7d82be0 100755
--- a/configure
+++ b/configure
@@ -418,6 +418,7 @@ CYGWIN*)
   clock_monotonic="yes"
   sched_idle="yes"
   pthread_condattr_setclock="no"
+  pthread_affinity="no"
   ;;
 esac
 
@@ -803,6 +804,29 @@ elif compile_prog "" "$LIBS -lpthread" "pthread_sigmask" ; then
 fi
 print_config "pthread_sigmask()" "$pthread_sigmask"
 
+##########################################
+# pthread_getaffinity_np() probe
+if test "$pthread_getaffinity" != "yes" ; then
+  pthread_getaffinity="no"
+fi
+cat > $TMPC <<EOF
+#include <stddef.h> /* NULL */
+#include <signal.h> /* pthread_sigmask() */
+#include <pthread.h>
+int main(void)
+{
+  cpu_set_t set;
+  return pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
+}
+EOF
+if compile_prog "" "$LIBS" "pthread_getaffinity" ; then
+  pthread_getaffinity="yes"
+elif compile_prog "" "$LIBS -lpthread" "pthread_getaffinity" ; then
+  pthread_getaffinity="yes"
+  LIBS="$LIBS -lpthread"
+fi
+print_config "pthread_getaffinity_np()" "$pthread_getaffinity"
+
 ##########################################
 # solaris aio probe
 if test "$solaris_aio" != "yes" ; then
@@ -2823,6 +2847,9 @@ fi
 if test "$pthread_sigmask" = "yes" ; then
   output_sym "CONFIG_PTHREAD_SIGMASK"
 fi
+if test "$pthread_getaffinity" = "yes" ; then
+  output_sym "CONFIG_PTHREAD_GETAFFINITY"
+fi
 if test "$have_asprintf" = "yes" ; then
     output_sym "CONFIG_HAVE_ASPRINTF"
 fi
diff --git a/gettime.c b/gettime.c
index f85da6e0..e3f483a7 100644
--- a/gettime.c
+++ b/gettime.c
@@ -671,12 +671,21 @@ static int clock_cmp(const void *p1, const void *p2)
 int fio_monotonic_clocktest(int debug)
 {
 	struct clock_thread *cthreads;
-	unsigned int nr_cpus = cpus_online();
+	unsigned int seen_cpus, nr_cpus = cpus_online();
 	struct clock_entry *entries;
 	unsigned long nr_entries, tentries, failed = 0;
 	struct clock_entry *prev, *this;
 	uint32_t seq = 0;
 	unsigned int i;
+	os_cpu_mask_t mask;
+
+#ifdef CONFIG_PTHREAD_GETAFFINITY
+	fio_get_thread_affinity(mask);
+#else
+	memset(&mask, 0, sizeof(mask));
+	for (i = 0; i < nr_cpus; i++)
+		fio_cpu_set(&mask, i);
+#endif
 
 	if (debug) {
 		log_info("cs: reliable_tsc: %s\n", tsc_reliable ? "yes" : "no");
@@ -703,25 +712,31 @@ int fio_monotonic_clocktest(int debug)
 	if (debug)
 		log_info("cs: Testing %u CPUs\n", nr_cpus);
 
+	seen_cpus = 0;
 	for (i = 0; i < nr_cpus; i++) {
 		struct clock_thread *t = &cthreads[i];
 
+		if (!fio_cpu_isset(&mask, i))
+			continue;
 		t->cpu = i;
 		t->debug = debug;
 		t->seq = &seq;
 		t->nr_entries = nr_entries;
-		t->entries = &entries[i * nr_entries];
+		t->entries = &entries[seen_cpus * nr_entries];
 		__fio_sem_init(&t->lock, FIO_SEM_LOCKED);
 		if (pthread_create(&t->thread, NULL, clock_thread_fn, t)) {
 			failed++;
 			nr_cpus = i;
 			break;
 		}
+		seen_cpus++;
 	}
 
 	for (i = 0; i < nr_cpus; i++) {
 		struct clock_thread *t = &cthreads[i];
 
+		if (!fio_cpu_isset(&mask, i))
+			continue;
 		fio_sem_up(&t->lock);
 	}
 
@@ -729,6 +744,8 @@ int fio_monotonic_clocktest(int debug)
 		struct clock_thread *t = &cthreads[i];
 		void *ret;
 
+		if (!fio_cpu_isset(&mask, i))
+			continue;
 		pthread_join(t->thread, &ret);
 		if (ret)
 			failed++;
@@ -742,6 +759,7 @@ int fio_monotonic_clocktest(int debug)
 		goto err;
 	}
 
+	tentries = nr_entries * seen_cpus;
 	qsort(entries, tentries, sizeof(struct clock_entry), clock_cmp);
 
 	/* silence silly gcc */
diff --git a/os/os-linux.h b/os/os-linux.h
index 5562b0da..ea8d7922 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -74,6 +74,9 @@ typedef cpu_set_t os_cpu_mask_t;
 	sched_getaffinity((pid), (ptr))
 #endif
 
+#define fio_get_thread_affinity(mask)	\
+	pthread_getaffinity_np(pthread_self(), sizeof(mask), &(mask))
+
 #define fio_cpu_clear(mask, cpu)	(void) CPU_CLR((cpu), (mask))
 #define fio_cpu_set(mask, cpu)		(void) CPU_SET((cpu), (mask))
 #define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)



* Recent changes (master)
@ 2021-03-19 12:00 Jens Axboe
From: Jens Axboe @ 2021-03-19 12:00 UTC
  To: fio

The following changes since commit dede9b9fae3ab670c1ca864ac66aea5e997e1f34:

  Merge branch 'free-dump-options' of https://github.com/floatious/fio (2021-03-17 09:25:46 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e7e536b665bd6a9d3e936e0847dbbb6957101da4:

  Merge branch 'unified-merge' of https://github.com/jeffreyalien/fio (2021-03-18 10:19:57 -0600)

----------------------------------------------------------------
Brandon Paupore (1):
      Add functionality to the unified_rw_reporting parameter to output separate and mixed stats when set to 'both' or 2.

Jan Michalski (1):
      rpma: add librpma_apm_* and librpma_gpspm_* engines

Jens Axboe (2):
      Merge branch 'add-librpma-engines' of https://github.com/janekmi/fio
      Merge branch 'unified-merge' of https://github.com/jeffreyalien/fio

 HOWTO                              |   37 +-
 Makefile                           |   15 +
 ci/travis-install-librpma.sh       |   22 +
 ci/travis-install-pmdk.sh          |   28 +
 ci/travis-install.sh               |   10 +
 configure                          |   52 ++
 engines/librpma_apm.c              |  256 +++++++++
 engines/librpma_fio.c              | 1051 ++++++++++++++++++++++++++++++++++++
 engines/librpma_fio.h              |  273 ++++++++++
 engines/librpma_gpspm.c            |  755 ++++++++++++++++++++++++++
 engines/librpma_gpspm_flush.pb-c.c |  214 ++++++++
 engines/librpma_gpspm_flush.pb-c.h |  120 ++++
 engines/librpma_gpspm_flush.proto  |   15 +
 eta.c                              |    4 +-
 examples/librpma_apm-client.fio    |   24 +
 examples/librpma_apm-server.fio    |   26 +
 examples/librpma_gpspm-client.fio  |   23 +
 examples/librpma_gpspm-server.fio  |   31 ++
 fio.1                              |   36 +-
 optgroup.c                         |    4 +
 optgroup.h                         |    2 +
 options.c                          |   41 +-
 stat.c                             |  316 ++++++++++-
 stat.h                             |    3 +
 24 files changed, 3329 insertions(+), 29 deletions(-)
 create mode 100755 ci/travis-install-librpma.sh
 create mode 100755 ci/travis-install-pmdk.sh
 create mode 100644 engines/librpma_apm.c
 create mode 100644 engines/librpma_fio.c
 create mode 100644 engines/librpma_fio.h
 create mode 100644 engines/librpma_gpspm.c
 create mode 100644 engines/librpma_gpspm_flush.pb-c.c
 create mode 100644 engines/librpma_gpspm_flush.pb-c.h
 create mode 100644 engines/librpma_gpspm_flush.proto
 create mode 100644 examples/librpma_apm-client.fio
 create mode 100644 examples/librpma_apm-server.fio
 create mode 100644 examples/librpma_gpspm-client.fio
 create mode 100644 examples/librpma_gpspm-server.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 041b91fa..c48f46d8 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1146,11 +1146,31 @@ I/O type
 	behaves in a similar fashion, except it sends the same offset 8 number of
 	times before generating a new offset.
 
-.. option:: unified_rw_reporting=bool
+.. option:: unified_rw_reporting=str
 
 	Fio normally reports statistics on a per data direction basis, meaning that
-	reads, writes, and trims are accounted and reported separately. If this
-	option is set fio sums the results and report them as "mixed" instead.
+	reads, writes, and trims are accounted and reported separately. This option
+	determines whether fio reports the results normally, summed together, or in
+	both forms.
+	Accepted values are:
+
+		**none**
+			Normal statistics reporting.
+
+		**mixed**
+			Statistics are summed per data direction and reported together.
+
+		**both**
+			Statistics are reported normally, followed by the mixed statistics.
+
+		**0**
+			Backward-compatible alias for **none**.
+
+		**1**
+			Backward-compatible alias for **mixed**.
+
+		**2**
+			Alias for **both**.
 
 .. option:: randrepeat=bool
 
@@ -2192,7 +2212,7 @@ with the caveat that when used on the command line, they must come after the
 		this will be the starting port number since fio will use a range of
 		ports.
 
-   [rdma]
+   [rdma], [librpma_*]
 
 		The port to use for RDMA-CM communication. This should be the same value
 		on the client and the server side.
@@ -2203,6 +2223,15 @@ with the caveat that when used on the command line, they must come after the
 	is a TCP listener or UDP reader, the hostname is not used and must be omitted
 	unless it is a valid UDP multicast address.
 
+.. option:: serverip=str : [librpma_*]
+
+	The IP address to be used for RDMA-CM based I/O.
+
+.. option:: direct_write_to_pmem=bool : [librpma_*]
+
+	Set to 1 only when Direct Write to PMem from the remote host is possible.
+	Otherwise, set to 0.
+
 .. option:: interface=str : [netsplice] [net]
 
 	The IP address of the network interface used to send or receive UDP
diff --git a/Makefile b/Makefile
index 87a47b66..fce3d0d1 100644
--- a/Makefile
+++ b/Makefile
@@ -94,6 +94,21 @@ ifdef CONFIG_RDMA
   rdma_LIBS = -libverbs -lrdmacm
   ENGINES += rdma
 endif
+ifdef CONFIG_LIBRPMA_APM
+  librpma_apm_SRCS = engines/librpma_apm.c
+  librpma_fio_SRCS = engines/librpma_fio.c
+  librpma_apm_LIBS = -lrpma -lpmem
+  ENGINES += librpma_apm
+endif
+ifdef CONFIG_LIBRPMA_GPSPM
+  librpma_gpspm_SRCS = engines/librpma_gpspm.c engines/librpma_gpspm_flush.pb-c.c
+  librpma_fio_SRCS = engines/librpma_fio.c
+  librpma_gpspm_LIBS = -lrpma -lpmem -lprotobuf-c
+  ENGINES += librpma_gpspm
+endif
+ifdef librpma_fio_SRCS
+  SOURCE += $(librpma_fio_SRCS)
+endif
 ifdef CONFIG_POSIXAIO
   SOURCE += engines/posixaio.c
 endif
diff --git a/ci/travis-install-librpma.sh b/ci/travis-install-librpma.sh
new file mode 100755
index 00000000..b127f3f5
--- /dev/null
+++ b/ci/travis-install-librpma.sh
@@ -0,0 +1,22 @@
+#!/bin/bash -e
+
+# 11.02.2021 Merge pull request #866 from ldorau/rpma-mmap-memory-for-rpma_mr_reg-in-rpma_flush_apm_new
+LIBRPMA_VERSION=fbac593917e98f3f26abf14f4fad5a832b330f5c
+ZIP_FILE=rpma.zip
+
+WORKDIR=$(pwd)
+
+# install librpma
+wget -O $ZIP_FILE https://github.com/pmem/rpma/archive/${LIBRPMA_VERSION}.zip
+unzip $ZIP_FILE
+mkdir -p rpma-${LIBRPMA_VERSION}/build
+cd rpma-${LIBRPMA_VERSION}/build
+cmake .. -DCMAKE_BUILD_TYPE=Release \
+	-DCMAKE_INSTALL_PREFIX=/usr \
+	-DBUILD_DOC=OFF \
+	-DBUILD_EXAMPLES=OFF \
+	-DBUILD_TESTS=OFF
+make -j$(nproc)
+sudo make -j$(nproc) install
+cd $WORKDIR
+rm -rf $ZIP_FILE rpma-${LIBRPMA_VERSION}
diff --git a/ci/travis-install-pmdk.sh b/ci/travis-install-pmdk.sh
new file mode 100755
index 00000000..803438f8
--- /dev/null
+++ b/ci/travis-install-pmdk.sh
@@ -0,0 +1,28 @@
+#!/bin/bash -e
+
+# pmdk v1.9.1 release
+PMDK_VERSION=1.9.1
+
+WORKDIR=$(pwd)
+
+#
+# The '/bin/sh' shell used by PMDK's 'make install'
+# does not know the exact location of clang
+# and fails with:
+#    /bin/sh: 1: clang: not found
+# if CC is not set to the full path of clang.
+#
+export CC=$(which $CC)
+
+# Install PMDK libraries, because PMDK's libpmem
+# is a dependency of the librpma fio engine.
+# Install it from a release package
+# with already generated documentation,
+# in order to not install 'pandoc'.
+wget https://github.com/pmem/pmdk/releases/download/${PMDK_VERSION}/pmdk-${PMDK_VERSION}.tar.gz
+tar -xzf pmdk-${PMDK_VERSION}.tar.gz
+cd pmdk-${PMDK_VERSION}
+make -j$(nproc) NDCTL_ENABLE=n
+sudo make -j$(nproc) install prefix=/usr NDCTL_ENABLE=n
+cd $WORKDIR
+rm -rf pmdk-${PMDK_VERSION}
diff --git a/ci/travis-install.sh b/ci/travis-install.sh
index 103695dc..4c4c04c5 100755
--- a/ci/travis-install.sh
+++ b/ci/travis-install.sh
@@ -43,6 +43,16 @@ case "$TRAVIS_OS_NAME" in
 	)
 	sudo apt-get -qq update
 	sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}"
+	# librpma is supported on the amd64 (x86_64) architecture for now
+	if [[ $CI_TARGET_ARCH == "amd64" ]]; then
+		# install libprotobuf-c-dev required by librpma_gpspm
+		sudo apt-get install --no-install-recommends -qq -y libprotobuf-c-dev
+		# PMDK libraries have to be installed, because
+		# libpmem is a dependency of the librpma fio engine
+		ci/travis-install-pmdk.sh
+		# install librpma from sources from GitHub
+		ci/travis-install-librpma.sh
+	fi
 	;;
     "osx")
 	brew update >/dev/null 2>&1
diff --git a/configure b/configure
index d79f6521..2f5ac91f 100755
--- a/configure
+++ b/configure
@@ -924,6 +924,49 @@ if test "$disable_rdma" != "yes" && compile_prog "" "-lrdmacm" "rdma"; then
 fi
 print_config "rdmacm" "$rdmacm"
 
+##########################################
+# librpma probe
+if test "$librpma" != "yes" ; then
+  librpma="no"
+fi
+cat > $TMPC << EOF
+#include <stdio.h>
+#include <librpma.h>
+int main(int argc, char **argv)
+{
+  enum rpma_conn_event event = RPMA_CONN_REJECTED;
+  (void) event; /* unused */
+  rpma_log_set_threshold(RPMA_LOG_THRESHOLD, RPMA_LOG_LEVEL_INFO);
+  return 0;
+}
+EOF
+if test "$disable_rdma" != "yes" && compile_prog "" "-lrpma" "rpma"; then
+    librpma="yes"
+fi
+print_config "librpma" "$librpma"
+
+##########################################
+# libprotobuf-c probe
+if test "$libprotobuf_c" != "yes" ; then
+  libprotobuf_c="no"
+fi
+cat > $TMPC << EOF
+#include <stdio.h>
+#include <protobuf-c/protobuf-c.h>
+#if !defined(PROTOBUF_C_VERSION_NUMBER)
+# error PROTOBUF_C_VERSION_NUMBER is not defined!
+#endif
+int main(int argc, char **argv)
+{
+  (void)protobuf_c_message_check(NULL);
+  return 0;
+}
+EOF
+if compile_prog "" "-lprotobuf-c" "protobuf_c"; then
+    libprotobuf_c="yes"
+fi
+print_config "libprotobuf_c" "$libprotobuf_c"
+
 ##########################################
 # asprintf() and vasprintf() probes
 if test "$have_asprintf" != "yes" ; then
@@ -2819,6 +2862,15 @@ fi
 if test "$libverbs" = "yes" -a "$rdmacm" = "yes" ; then
   output_sym "CONFIG_RDMA"
 fi
+# librpma is supported on the 'x86_64' architecture for now
+if test "$cpu" = "x86_64" -a "$libverbs" = "yes" -a "$rdmacm" = "yes" \
+    -a "$librpma" = "yes" -a "$libpmem" = "yes" ; then
+  output_sym "CONFIG_LIBRPMA_APM"
+fi
+if test "$cpu" = "x86_64" -a "$libverbs" = "yes" -a "$rdmacm" = "yes" \
+    -a "$librpma" = "yes" -a "$libpmem" = "yes" -a "$libprotobuf_c" = "yes" ; then
+  output_sym "CONFIG_LIBRPMA_GPSPM"
+fi
 if test "$clock_gettime" = "yes" ; then
   output_sym "CONFIG_CLOCK_GETTIME"
 fi
diff --git a/engines/librpma_apm.c b/engines/librpma_apm.c
new file mode 100644
index 00000000..ffa3769d
--- /dev/null
+++ b/engines/librpma_apm.c
@@ -0,0 +1,256 @@
+/*
+* librpma_apm: IO engine that uses PMDK librpma to read and write data,
+ *		based on Appliance Persistency Method
+ *
+ * Copyright 2020-2021, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "librpma_fio.h"
+
+/* client side implementation */
+
+static inline int client_io_flush(struct thread_data *td,
+		struct io_u *first_io_u, struct io_u *last_io_u,
+		unsigned long long int len);
+
+static int client_get_io_u_index(struct rpma_completion *cmpl,
+		unsigned int *io_u_index);
+
+static int client_init(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd;
+	unsigned int sq_size;
+	uint32_t cq_size;
+	struct rpma_conn_cfg *cfg = NULL;
+	struct rpma_peer_cfg *pcfg = NULL;
+	int ret;
+
+	/* not supported readwrite = trim / randtrim / trimwrite */
+	if (td_trim(td)) {
+		td_verror(td, EINVAL, "Not supported mode.");
+		return -1;
+	}
+
+	/*
+	 * Calculate the required queue sizes where:
+	 * - the send queue (SQ) has to be big enough to accommodate
+	 *   all io_us (WRITEs) and all flush requests (FLUSHes)
+	 * - the completion queue (CQ) has to be big enough to accommodate all
+	 *   success and error completions (cq_size = sq_size)
+	 */
+	if (td_random(td) || td_rw(td)) {
+		/*
+		 * sq_size = max(rand_read_sq_size, rand_write_sq_size)
+		 * where rand_read_sq_size < rand_write_sq_size because read
+		 * does not require flush afterwards
+		 * rand_write_sq_size = N * (WRITE + FLUSH)
+		 *
+		 * Note: rw is no different from random write since, in the extreme
+		 * case, interleaving reads with writes forces you to flush as often
+		 * as when the writes are random.
+		 */
+		sq_size = 2 * td->o.iodepth;
+	} else if (td_write(td)) {
+		/* sequential TD_DDIR_WRITE only */
+		if (td->o.sync_io) {
+			sq_size = 2; /* WRITE + FLUSH */
+		} else {
+			/*
+			 * N * WRITE + B * FLUSH where:
+			 * - B == ceil(iodepth / iodepth_batch)
+			 *   which is the number of batches for N writes
+			 */
+			sq_size = td->o.iodepth + LIBRPMA_FIO_CEIL(td->o.iodepth,
+					td->o.iodepth_batch);
+		}
+	} else {
+		/* TD_DDIR_READ only */
+		if (td->o.sync_io) {
+			sq_size = 1; /* READ */
+		} else {
+			sq_size = td->o.iodepth; /* N x READ */
+		}
+	}
+	cq_size = sq_size;
+
+	/* create a connection configuration object */
+	if ((ret = rpma_conn_cfg_new(&cfg))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_new");
+		return -1;
+	}
+
+	/* apply queue sizes */
+	if ((ret = rpma_conn_cfg_set_sq_size(cfg, sq_size))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_sq_size");
+		goto err_cfg_delete;
+	}
+	if ((ret = rpma_conn_cfg_set_cq_size(cfg, cq_size))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_cq_size");
+		goto err_cfg_delete;
+	}
+
+	if (librpma_fio_client_init(td, cfg))
+		goto err_cfg_delete;
+
+	ccd = td->io_ops_data;
+
+	if (ccd->server_mr_flush_type == RPMA_FLUSH_TYPE_PERSISTENT) {
+		if (!ccd->ws->direct_write_to_pmem) {
+			if (td->thread_number == 1)
+				log_err(
+					"Fio librpma engine will not work until the Direct Write to PMem on the server side is possible (direct_write_to_pmem)\n");
+			goto err_cleanup_common;
+		}
+
+		/* configure peer's direct write to pmem support */
+		if ((ret = rpma_peer_cfg_new(&pcfg))) {
+			librpma_td_verror(td, ret, "rpma_peer_cfg_new");
+			goto err_cleanup_common;
+		}
+
+		if ((ret = rpma_peer_cfg_set_direct_write_to_pmem(pcfg, true))) {
+			librpma_td_verror(td, ret,
+				"rpma_peer_cfg_set_direct_write_to_pmem");
+			(void) rpma_peer_cfg_delete(&pcfg);
+			goto err_cleanup_common;
+		}
+
+		if ((ret = rpma_conn_apply_remote_peer_cfg(ccd->conn, pcfg))) {
+			librpma_td_verror(td, ret,
+				"rpma_conn_apply_remote_peer_cfg");
+			(void) rpma_peer_cfg_delete(&pcfg);
+			goto err_cleanup_common;
+		}
+
+		(void) rpma_peer_cfg_delete(&pcfg);
+	} else if (td->thread_number == 1) {
+		/* XXX log_info mixes with the JSON output */
+		log_err(
+			"Note: Direct Write to PMem is not supported by default nor required if you use DRAM instead of PMem on the server side (direct_write_to_pmem).\n"
+			"Remember that flushing to DRAM does not make your data persistent and may be used only for experimental purposes.\n");
+	}
+
+	if ((ret = rpma_conn_cfg_delete(&cfg))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_delete");
+		/* non fatal error - continue */
+	}
+
+	ccd->flush = client_io_flush;
+	ccd->get_io_u_index = client_get_io_u_index;
+
+	return 0;
+
+err_cleanup_common:
+	librpma_fio_client_cleanup(td);
+
+err_cfg_delete:
+	(void) rpma_conn_cfg_delete(&cfg);
+
+	return -1;
+}
+
+static void client_cleanup(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+
+	if (ccd == NULL)
+		return;
+
+	free(ccd->client_data);
+
+	librpma_fio_client_cleanup(td);
+}
+
+static inline int client_io_flush(struct thread_data *td,
+		struct io_u *first_io_u, struct io_u *last_io_u,
+		unsigned long long int len)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	size_t dst_offset = first_io_u->offset;
+	int ret;
+
+	if ((ret = rpma_flush(ccd->conn, ccd->server_mr, dst_offset, len,
+			ccd->server_mr_flush_type, RPMA_F_COMPLETION_ALWAYS,
+			(void *)(uintptr_t)last_io_u->index))) {
+		librpma_td_verror(td, ret, "rpma_flush");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int client_get_io_u_index(struct rpma_completion *cmpl,
+		unsigned int *io_u_index)
+{
+	memcpy(io_u_index, &cmpl->op_context, sizeof(*io_u_index));
+
+	return 1;
+}
+
+FIO_STATIC struct ioengine_ops ioengine_client = {
+	.name			= "librpma_apm_client",
+	.version		= FIO_IOOPS_VERSION,
+	.init			= client_init,
+	.post_init		= librpma_fio_client_post_init,
+	.get_file_size		= librpma_fio_client_get_file_size,
+	.open_file		= librpma_fio_file_nop,
+	.queue			= librpma_fio_client_queue,
+	.commit			= librpma_fio_client_commit,
+	.getevents		= librpma_fio_client_getevents,
+	.event			= librpma_fio_client_event,
+	.errdetails		= librpma_fio_client_errdetails,
+	.close_file		= librpma_fio_file_nop,
+	.cleanup		= client_cleanup,
+	.flags			= FIO_DISKLESSIO,
+	.options		= librpma_fio_options,
+	.option_struct_size	= sizeof(struct librpma_fio_options_values),
+};
+
+/* server side implementation */
+
+static int server_open_file(struct thread_data *td, struct fio_file *f)
+{
+	return librpma_fio_server_open_file(td, f, NULL);
+}
+
+static enum fio_q_status server_queue(struct thread_data *td, struct io_u *io_u)
+{
+	return FIO_Q_COMPLETED;
+}
+
+FIO_STATIC struct ioengine_ops ioengine_server = {
+	.name			= "librpma_apm_server",
+	.version		= FIO_IOOPS_VERSION,
+	.init			= librpma_fio_server_init,
+	.open_file		= server_open_file,
+	.close_file		= librpma_fio_server_close_file,
+	.queue			= server_queue,
+	.invalidate		= librpma_fio_file_nop,
+	.cleanup		= librpma_fio_server_cleanup,
+	.flags			= FIO_SYNCIO,
+	.options		= librpma_fio_options,
+	.option_struct_size	= sizeof(struct librpma_fio_options_values),
+};
+
+/* register both engines */
+
+static void fio_init fio_librpma_apm_register(void)
+{
+	register_ioengine(&ioengine_client);
+	register_ioengine(&ioengine_server);
+}
+
+static void fio_exit fio_librpma_apm_unregister(void)
+{
+	unregister_ioengine(&ioengine_client);
+	unregister_ioengine(&ioengine_server);
+}
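The send-queue sizing rule in client_init() above can be checked in isolation. This is a minimal sketch and is not part of the patch. The enum and apm_sq_size() are hypothetical stand-ins for fio's td_random()/td_rw()/td_write() checks. ceil_div() assumes that LIBRPMA_FIO_CEIL performs an integer ceiling division.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the access pattern checked in client_init(). */
enum apm_pattern {
	APM_SEQ_READ,		/* TD_DDIR_READ only */
	APM_SEQ_WRITE,		/* sequential TD_DDIR_WRITE only */
	APM_RANDOM_OR_RW,	/* td_random(td) || td_rw(td) */
};

/* Assumed equivalent of LIBRPMA_FIO_CEIL(): integer ceiling division. */
static unsigned int ceil_div(unsigned int a, unsigned int b)
{
	return (a + b - 1) / b;
}

/* SQ size following the rules in client_init(); the CQ gets the same size. */
static unsigned int apm_sq_size(enum apm_pattern p, bool sync_io,
		unsigned int iodepth, unsigned int iodepth_batch)
{
	switch (p) {
	case APM_RANDOM_OR_RW:
		/* N * (WRITE + FLUSH), regardless of sync_io */
		return 2 * iodepth;
	case APM_SEQ_WRITE:
		if (sync_io)
			return 2; /* WRITE + FLUSH */
		assert(iodepth_batch > 0);
		/* N * WRITE + ceil(N / batch) * FLUSH */
		return iodepth + ceil_div(iodepth, iodepth_batch);
	case APM_SEQ_READ:
	default:
		return sync_io ? 1 : iodepth; /* READ or N * READ */
	}
}
```

For example, take iodepth=10 and iodepth_batch=4. Sequential writes then need 10 WRITEs plus ceil(10/4) = 3 FLUSHes, so apm_sq_size(APM_SEQ_WRITE, false, 10, 4) returns 13. A random or mixed workload at the same depth needs 20 entries.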
diff --git a/engines/librpma_fio.c b/engines/librpma_fio.c
new file mode 100644
index 00000000..810b55e2
--- /dev/null
+++ b/engines/librpma_fio.c
@@ -0,0 +1,1051 @@
+/*
+ * librpma_fio: librpma_apm and librpma_gpspm engines' common part.
+ *
+ * Copyright 2021, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "librpma_fio.h"
+
+#include <libpmem.h>
+
+struct fio_option librpma_fio_options[] = {
+	{
+		.name	= "serverip",
+		.lname	= "rpma_server_ip",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct librpma_fio_options_values, server_ip),
+		.help	= "IP address the server is listening on",
+		.def	= "",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBRPMA,
+	},
+	{
+		.name	= "port",
+		.lname	= "rpma_server_port",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct librpma_fio_options_values, port),
+		.help	= "port the server is listening on",
+		.def	= "7204",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBRPMA,
+	},
+	{
+		.name	= "direct_write_to_pmem",
+		.lname	= "Direct Write to PMem (via RDMA) from the remote host is possible",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct librpma_fio_options_values,
+					direct_write_to_pmem),
+		.help	= "Set to true ONLY when Direct Write to PMem from the remote host is possible (https://pmem.io/rpma/documentation/basic-direct-write-to-pmem.html)",
+		.def	= "",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBRPMA,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+int librpma_fio_td_port(const char *port_base_str, struct thread_data *td,
+		char *port_out)
+{
+	unsigned long int port_ul = strtoul(port_base_str, NULL, 10);
+	unsigned int port_new;
+
+	port_out[0] = '\0';
+
+	if (port_ul == ULONG_MAX) {
+		td_verror(td, errno, "strtoul");
+		return -1;
+	}
+	port_ul += td->thread_number - 1;
+	if (port_ul >= UINT_MAX) {
+		log_err("[%u] port number (%lu) bigger than UINT_MAX\n",
+			td->thread_number, port_ul);
+		return -1;
+	}
+
+	port_new = port_ul;
+	snprintf(port_out, LIBRPMA_FIO_PORT_STR_LEN_MAX - 1, "%u", port_new);
+
+	return 0;
+}
+
+char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
+	struct librpma_fio_mem *mem)
+{
+	char *mem_ptr = NULL;
+	int ret;
+
+	if ((ret = posix_memalign((void **)&mem_ptr, page_size, size))) {
+		log_err("fio: posix_memalign() failed\n");
+		td_verror(td, ret, "posix_memalign");
+		return NULL;
+	}
+
+	mem->mem_ptr = mem_ptr;
+	mem->size_mmap = 0;
+
+	return mem_ptr;
+}
+
+char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
+		size_t size, struct librpma_fio_mem *mem)
+{
+	size_t size_mmap = 0;
+	char *mem_ptr = NULL;
+	int is_pmem = 0;
+	size_t ws_offset;
+
+	if (size % page_size) {
+		log_err("fio: size (%zu) is not aligned to page size (%zu)\n",
+			size, page_size);
+		return NULL;
+	}
+
+	ws_offset = (td->thread_number - 1) * size;
+
+	if (!filename) {
+		log_err("fio: filename is not set\n");
+		return NULL;
+	}
+
+	/* map the file */
+	mem_ptr = pmem_map_file(filename, 0 /* len */, 0 /* flags */,
+			0 /* mode */, &size_mmap, &is_pmem);
+	if (mem_ptr == NULL) {
+		log_err("fio: pmem_map_file(%s) failed\n", filename);
+		/* pmem_map_file() sets errno on failure */
+		td_verror(td, errno, "pmem_map_file");
+		return NULL;
+	}
+
+	/* pmem is expected */
+	if (!is_pmem) {
+		log_err("fio: %s is not located in persistent memory\n",
+			filename);
+		goto err_unmap;
+	}
+
+	/* check size of allocated persistent memory */
+	if (size_mmap < ws_offset + size) {
+		log_err(
+			"fio: %s is too small to handle so many threads (%zu < %zu)\n",
+			filename, size_mmap, ws_offset + size);
+		goto err_unmap;
+	}
+
+	log_info("fio: size of memory mapped from the file %s: %zu\n",
+		filename, size_mmap);
+
+	mem->mem_ptr = mem_ptr;
+	mem->size_mmap = size_mmap;
+
+	return mem_ptr + ws_offset;
+
+err_unmap:
+	(void) pmem_unmap(mem_ptr, size_mmap);
+	return NULL;
+}
+
+void librpma_fio_free(struct librpma_fio_mem *mem)
+{
+	if (mem->size_mmap)
+		(void) pmem_unmap(mem->mem_ptr, mem->size_mmap);
+	else
+		free(mem->mem_ptr);
+}
+
+#define LIBRPMA_FIO_RETRY_MAX_NO	10
+#define LIBRPMA_FIO_RETRY_DELAY_S	5
+
+int librpma_fio_client_init(struct thread_data *td,
+		struct rpma_conn_cfg *cfg)
+{
+	struct librpma_fio_client_data *ccd;
+	struct librpma_fio_options_values *o = td->eo;
+	struct ibv_context *dev = NULL;
+	char port_td[LIBRPMA_FIO_PORT_STR_LEN_MAX];
+	struct rpma_conn_req *req = NULL;
+	enum rpma_conn_event event;
+	struct rpma_conn_private_data pdata;
+	enum rpma_log_level log_level_aux = RPMA_LOG_LEVEL_WARNING;
+	int remote_flush_type;
+	int retry;
+	int ret;
+
+	/* --debug=net sets RPMA_LOG_THRESHOLD_AUX to RPMA_LOG_LEVEL_INFO */
+#ifdef FIO_INC_DEBUG
+	if ((1UL << FD_NET) & fio_debug)
+		log_level_aux = RPMA_LOG_LEVEL_INFO;
+#endif
+
+	/* configure logging thresholds to see more details */
+	rpma_log_set_threshold(RPMA_LOG_THRESHOLD, RPMA_LOG_LEVEL_INFO);
+	rpma_log_set_threshold(RPMA_LOG_THRESHOLD_AUX, log_level_aux);
+
+	/* obtain an IBV context for a remote IP address */
+	if ((ret = rpma_utils_get_ibv_context(o->server_ip,
+			RPMA_UTIL_IBV_CONTEXT_REMOTE, &dev))) {
+		librpma_td_verror(td, ret, "rpma_utils_get_ibv_context");
+		return -1;
+	}
+
+	/* allocate client's data */
+	ccd = calloc(1, sizeof(*ccd));
+	if (ccd == NULL) {
+		td_verror(td, errno, "calloc");
+		return -1;
+	}
+
+	/* allocate all in-memory queues */
+	ccd->io_us_queued = calloc(td->o.iodepth, sizeof(*ccd->io_us_queued));
+	if (ccd->io_us_queued == NULL) {
+		td_verror(td, errno, "calloc");
+		goto err_free_ccd;
+	}
+
+	ccd->io_us_flight = calloc(td->o.iodepth, sizeof(*ccd->io_us_flight));
+	if (ccd->io_us_flight == NULL) {
+		td_verror(td, errno, "calloc");
+		goto err_free_io_u_queues;
+	}
+
+	ccd->io_us_completed = calloc(td->o.iodepth,
+			sizeof(*ccd->io_us_completed));
+	if (ccd->io_us_completed == NULL) {
+		td_verror(td, errno, "calloc");
+		goto err_free_io_u_queues;
+	}
+
+	/* create a new peer object */
+	if ((ret = rpma_peer_new(dev, &ccd->peer))) {
+		librpma_td_verror(td, ret, "rpma_peer_new");
+		goto err_free_io_u_queues;
+	}
+
+	/* create a connection request */
+	if (librpma_fio_td_port(o->port, td, port_td))
+		goto err_peer_delete;
+
+	for (retry = 0; retry < LIBRPMA_FIO_RETRY_MAX_NO; retry++) {
+		if ((ret = rpma_conn_req_new(ccd->peer, o->server_ip, port_td,
+				cfg, &req))) {
+			librpma_td_verror(td, ret, "rpma_conn_req_new");
+			goto err_peer_delete;
+		}
+
+		/*
+		 * Connect the connection request
+		 * and obtain the connection object.
+		 */
+		if ((ret = rpma_conn_req_connect(&req, NULL, &ccd->conn))) {
+			librpma_td_verror(td, ret, "rpma_conn_req_connect");
+			goto err_req_delete;
+		}
+
+		/* wait for the connection to be established */
+		if ((ret = rpma_conn_next_event(ccd->conn, &event))) {
+			librpma_td_verror(td, ret, "rpma_conn_next_event");
+			goto err_conn_delete;
+		} else if (event == RPMA_CONN_ESTABLISHED) {
+			break;
+		} else if (event == RPMA_CONN_REJECTED) {
+			(void) rpma_conn_disconnect(ccd->conn);
+			(void) rpma_conn_delete(&ccd->conn);
+			if (retry < LIBRPMA_FIO_RETRY_MAX_NO - 1) {
+				log_err("Thread [%d]: Retrying (#%i) ...\n",
+					td->thread_number, retry + 1);
+				sleep(LIBRPMA_FIO_RETRY_DELAY_S);
+			} else {
+				log_err(
+					"Thread [%d]: The maximum number of retries exceeded. Closing.\n",
+					td->thread_number);
+			}
+		} else {
+			log_err(
+				"rpma_conn_next_event returned an unexpected event: (%s != RPMA_CONN_ESTABLISHED)\n",
+				rpma_utils_conn_event_2str(event));
+			goto err_conn_delete;
+		}
+	}
+
+	if (retry > 0)
+		log_err("Thread [%d]: Connected after retry #%i\n",
+			td->thread_number, retry);
+
+	if (ccd->conn == NULL)
+		goto err_peer_delete;
+
+	/* get the connection's private data sent from the server */
+	if ((ret = rpma_conn_get_private_data(ccd->conn, &pdata))) {
+		librpma_td_verror(td, ret, "rpma_conn_get_private_data");
+		goto err_conn_delete;
+	}
+
+	/* get the server's workspace representation */
+	ccd->ws = pdata.ptr;
+
+	/* create the server's memory representation */
+	if ((ret = rpma_mr_remote_from_descriptor(&ccd->ws->descriptor[0],
+			ccd->ws->mr_desc_size, &ccd->server_mr))) {
+		librpma_td_verror(td, ret, "rpma_mr_remote_from_descriptor");
+		goto err_conn_delete;
+	}
+
+	/* get the total size of the shared server memory */
+	if ((ret = rpma_mr_remote_get_size(ccd->server_mr, &ccd->ws_size))) {
+		librpma_td_verror(td, ret, "rpma_mr_remote_get_size");
+		goto err_conn_delete;
+	}
+
+	/* get flush type of the remote node */
+	if ((ret = rpma_mr_remote_get_flush_type(ccd->server_mr,
+			&remote_flush_type))) {
+		librpma_td_verror(td, ret, "rpma_mr_remote_get_flush_type");
+		goto err_conn_delete;
+	}
+
+	ccd->server_mr_flush_type =
+		(remote_flush_type & RPMA_MR_USAGE_FLUSH_TYPE_PERSISTENT) ?
+		RPMA_FLUSH_TYPE_PERSISTENT : RPMA_FLUSH_TYPE_VISIBILITY;
+
+	/*
+	 * Ensure the io_us buffer allocation is page-size-aligned, which is required
+	 * to register for RDMA. User-provided value is intentionally ignored.
+	 */
+	td->o.mem_align = page_size;
+
+	td->io_ops_data = ccd;
+
+	return 0;
+
+err_conn_delete:
+	(void) rpma_conn_disconnect(ccd->conn);
+	(void) rpma_conn_delete(&ccd->conn);
+
+err_req_delete:
+	(void) rpma_conn_req_delete(&req);
+
+err_peer_delete:
+	(void) rpma_peer_delete(&ccd->peer);
+
+err_free_io_u_queues:
+	free(ccd->io_us_queued);
+	free(ccd->io_us_flight);
+	free(ccd->io_us_completed);
+
+err_free_ccd:
+	free(ccd);
+
+	return -1;
+}
+
+void librpma_fio_client_cleanup(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	enum rpma_conn_event ev;
+	int ret;
+
+	if (ccd == NULL)
+		return;
+
+	/* deregister the local memory region of the io_us */
+	if ((ret = rpma_mr_dereg(&ccd->orig_mr)))
+		librpma_td_verror(td, ret, "rpma_mr_dereg");
+	/* delete the remote memory region's representation */
+	if ((ret = rpma_mr_remote_delete(&ccd->server_mr)))
+		librpma_td_verror(td, ret, "rpma_mr_remote_delete");
+	/* initiate disconnection */
+	if ((ret = rpma_conn_disconnect(ccd->conn)))
+		librpma_td_verror(td, ret, "rpma_conn_disconnect");
+	/* wait for the disconnection to complete */
+	if ((ret = rpma_conn_next_event(ccd->conn, &ev))) {
+		librpma_td_verror(td, ret, "rpma_conn_next_event");
+	} else if (ev != RPMA_CONN_CLOSED) {
+		log_err(
+			"client_cleanup received an unexpected event (%s != RPMA_CONN_CLOSED)\n",
+			rpma_utils_conn_event_2str(ev));
+	}
+	/* delete the connection */
+	if ((ret = rpma_conn_delete(&ccd->conn)))
+		librpma_td_verror(td, ret, "rpma_conn_delete");
+	/* delete the peer */
+	if ((ret = rpma_peer_delete(&ccd->peer)))
+		librpma_td_verror(td, ret, "rpma_peer_delete");
+	/* free the software queues */
+	free(ccd->io_us_queued);
+	free(ccd->io_us_flight);
+	free(ccd->io_us_completed);
+	free(ccd);
+	td->io_ops_data = NULL; /* zero ccd */
+}
+
+int librpma_fio_file_nop(struct thread_data *td, struct fio_file *f)
+{
+	/* NOP */
+	return 0;
+}
+
+int librpma_fio_client_post_init(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd =  td->io_ops_data;
+	size_t io_us_size;
+	int ret;
+
+	/*
+	 * td->orig_buffer is not aligned. The engine requires aligned io_us
+	 * so FIO aligns the address up using the formula below.
+	 */
+	ccd->orig_buffer_aligned = PTR_ALIGN(td->orig_buffer, page_mask) +
+			td->o.mem_align;
+
+	/*
+	 * Besides the space actually consumed by io_us, td->orig_buffer_size
+	 * includes padding which can be omitted from the memory registration.
+	 */
+	io_us_size = (unsigned long long)td_max_bs(td) *
+			(unsigned long long)td->o.iodepth;
+
+	if ((ret = rpma_mr_reg(ccd->peer, ccd->orig_buffer_aligned, io_us_size,
+			RPMA_MR_USAGE_READ_DST | RPMA_MR_USAGE_READ_SRC |
+			RPMA_MR_USAGE_WRITE_DST | RPMA_MR_USAGE_WRITE_SRC |
+			RPMA_MR_USAGE_FLUSH_TYPE_PERSISTENT, &ccd->orig_mr)))
+		librpma_td_verror(td, ret, "rpma_mr_reg");
+	return ret;
+}
+
+int librpma_fio_client_get_file_size(struct thread_data *td,
+		struct fio_file *f)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+
+	f->real_file_size = ccd->ws_size;
+	fio_file_set_size_known(f);
+
+	return 0;
+}
+
+static enum fio_q_status client_queue_sync(struct thread_data *td,
+		struct io_u *io_u)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct rpma_completion cmpl;
+	unsigned io_u_index;
+	int ret;
+
+	/* execute io_u */
+	if (io_u->ddir == DDIR_READ) {
+		/* post an RDMA read operation */
+		if (librpma_fio_client_io_read(td, io_u,
+				RPMA_F_COMPLETION_ALWAYS))
+			goto err;
+	} else if (io_u->ddir == DDIR_WRITE) {
+		/* post an RDMA write operation */
+		if (librpma_fio_client_io_write(td, io_u))
+			goto err;
+		if (ccd->flush(td, io_u, io_u, io_u->xfer_buflen))
+			goto err;
+	} else {
+		log_err("unsupported IO mode: %s\n", io_ddir_name(io_u->ddir));
+		goto err;
+	}
+
+	do {
+		/* get a completion */
+		ret = rpma_conn_completion_get(ccd->conn, &cmpl);
+		if (ret == RPMA_E_NO_COMPLETION) {
+			/* lack of completion is not an error */
+			continue;
+		} else if (ret != 0) {
+			/* an error occurred */
+			librpma_td_verror(td, ret, "rpma_conn_completion_get");
+			goto err;
+		}
+
+		/* if io_us has completed with an error */
+		if (cmpl.op_status != IBV_WC_SUCCESS)
+			goto err;
+
+		if (cmpl.op == RPMA_OP_SEND)
+			++ccd->op_send_completed;
+		else {
+			if (cmpl.op == RPMA_OP_RECV)
+				++ccd->op_recv_completed;
+
+			break;
+		}
+	} while (1);
+
+	if (ccd->get_io_u_index(&cmpl, &io_u_index) != 1)
+		goto err;
+
+	if (io_u->index != io_u_index) {
+		log_err(
+			"no matching io_u for received completion found (io_u_index=%u)\n",
+			io_u_index);
+		goto err;
+	}
+
+	/* make sure all SENDs are completed before exit - clean up SQ */
+	if (librpma_fio_client_io_complete_all_sends(td))
+		goto err;
+
+	return FIO_Q_COMPLETED;
+
+err:
+	io_u->error = -1;
+	return FIO_Q_COMPLETED;
+}
+
+enum fio_q_status librpma_fio_client_queue(struct thread_data *td,
+		struct io_u *io_u)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+
+	if (ccd->io_u_queued_nr == (int)td->o.iodepth)
+		return FIO_Q_BUSY;
+
+	if (td->o.sync_io)
+		return client_queue_sync(td, io_u);
+
+	/* io_u -> queued[] */
+	ccd->io_us_queued[ccd->io_u_queued_nr] = io_u;
+	ccd->io_u_queued_nr++;
+
+	return FIO_Q_QUEUED;
+}
+
+int librpma_fio_client_commit(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	int flags = RPMA_F_COMPLETION_ON_ERROR;
+	struct timespec now;
+	bool fill_time;
+	int i;
+	struct io_u *flush_first_io_u = NULL;
+	unsigned long long int flush_len = 0;
+
+	if (!ccd->io_us_queued)
+		return -1;
+
+	/* execute all io_us from queued[] */
+	for (i = 0; i < ccd->io_u_queued_nr; i++) {
+		struct io_u *io_u = ccd->io_us_queued[i];
+
+		if (io_u->ddir == DDIR_READ) {
+			if (i + 1 == ccd->io_u_queued_nr ||
+			    ccd->io_us_queued[i + 1]->ddir == DDIR_WRITE)
+				flags = RPMA_F_COMPLETION_ALWAYS;
+			/* post an RDMA read operation */
+			if (librpma_fio_client_io_read(td, io_u, flags))
+				return -1;
+		} else if (io_u->ddir == DDIR_WRITE) {
+			/* post an RDMA write operation */
+			if (librpma_fio_client_io_write(td, io_u))
+				return -1;
+
+			/* cache the first io_u in the sequence */
+			if (flush_first_io_u == NULL)
+				flush_first_io_u = io_u;
+
+			/*
+			 * the flush length is the sum of all io_u's creating
+			 * the sequence
+			 */
+			flush_len += io_u->xfer_buflen;
+
+			/*
+			 * if io_u's are random the rpma_flush is required
+			 * after each one of them
+			 */
+			if (!td_random(td)) {
+				/*
+				 * When the io_u's are sequential and
+				 * the current io_u is not the last one and
+				 * the next one is also a write operation
+				 * the flush can be postponed by one io_u and
+				 * cover all of them which build a continuous
+				 * sequence.
+				 */
+				if ((i + 1 < ccd->io_u_queued_nr) &&
+				    (ccd->io_us_queued[i + 1]->ddir == DDIR_WRITE))
+					continue;
+			}
+
+			/* flush all writes which build a continuous sequence */
+			if (ccd->flush(td, flush_first_io_u, io_u, flush_len))
+				return -1;
+
+			/*
+			 * reset the flush parameters in preparation for
+			 * the next one
+			 */
+			flush_first_io_u = NULL;
+			flush_len = 0;
+		} else {
+			log_err("unsupported IO mode: %s\n",
+				io_ddir_name(io_u->ddir));
+			return -1;
+		}
+	}
+
+	if ((fill_time = fio_fill_issue_time(td)))
+		fio_gettime(&now, NULL);
+
+	/* move executed io_us from queued[] to flight[] */
+	for (i = 0; i < ccd->io_u_queued_nr; i++) {
+		struct io_u *io_u = ccd->io_us_queued[i];
+
+		/* FIO does not do this if the engine is asynchronous */
+		if (fill_time)
+			memcpy(&io_u->issue_time, &now, sizeof(now));
+
+		/* move executed io_us from queued[] to flight[] */
+		ccd->io_us_flight[ccd->io_u_flight_nr] = io_u;
+		ccd->io_u_flight_nr++;
+
+		/*
+		 * FIO says:
+		 * If an engine has the commit hook
+		 * it has to call io_u_queued() itself.
+		 */
+		io_u_queued(td, io_u);
+	}
+
+	/* FIO does not do this if an engine has the commit hook. */
+	io_u_mark_submit(td, ccd->io_u_queued_nr);
+	ccd->io_u_queued_nr = 0;
+
+	return 0;
+}
+
+/*
+ * RETURN VALUE
+ * - > 0  - a number of completed io_us
+ * -   0  - when no completions were received
+ * - (-1) - when an error occurred
+ */
+static int client_getevent_process(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct rpma_completion cmpl;
+	/* io_u->index of completed io_u (cmpl.op_context) */
+	unsigned int io_u_index;
+	/* # of completed io_us */
+	int cmpl_num = 0;
+	/* helpers */
+	struct io_u *io_u;
+	int i;
+	int ret;
+
+	/* get a completion */
+	if ((ret = rpma_conn_completion_get(ccd->conn, &cmpl))) {
+		/* distinguish an empty CQ from a real failure */
+		if (ret == RPMA_E_NO_COMPLETION) {
+			/* lack of completion is not an error */
+			return 0;
+		}
+
+		/* an error occurred */
+		librpma_td_verror(td, ret, "rpma_conn_completion_get");
+		return -1;
+	}
+
+	/* if io_us has completed with an error */
+	if (cmpl.op_status != IBV_WC_SUCCESS) {
+		td->error = cmpl.op_status;
+		return -1;
+	}
+
+	if (cmpl.op == RPMA_OP_SEND)
+		++ccd->op_send_completed;
+	else if (cmpl.op == RPMA_OP_RECV)
+		++ccd->op_recv_completed;
+
+	if ((ret = ccd->get_io_u_index(&cmpl, &io_u_index)) != 1)
+		return ret;
+
+	/* look for an io_u being completed */
+	for (i = 0; i < ccd->io_u_flight_nr; ++i) {
+		if (ccd->io_us_flight[i]->index == io_u_index) {
+			cmpl_num = i + 1;
+			break;
+		}
+	}
+
+	/* if no matching io_u has been found */
+	if (cmpl_num == 0) {
+		log_err(
+			"no matching io_u for received completion found (io_u_index=%u)\n",
+			io_u_index);
+		return -1;
+	}
+
+	/* move completed io_us to the completed in-memory queue */
+	for (i = 0; i < cmpl_num; ++i) {
+		/* get and prepare io_u */
+		io_u = ccd->io_us_flight[i];
+
+		/* append to the queue */
+		ccd->io_us_completed[ccd->io_u_completed_nr] = io_u;
+		ccd->io_u_completed_nr++;
+	}
+
+	/* remove completed io_us from the flight queue */
+	for (i = cmpl_num; i < ccd->io_u_flight_nr; ++i)
+		ccd->io_us_flight[i - cmpl_num] = ccd->io_us_flight[i];
+	ccd->io_u_flight_nr -= cmpl_num;
+
+	return cmpl_num;
+}
+
+int librpma_fio_client_getevents(struct thread_data *td, unsigned int min,
+		unsigned int max, const struct timespec *t)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	/* total # of completed io_us */
+	int cmpl_num_total = 0;
+	/* # of completed io_us from a single event */
+	int cmpl_num;
+
+	do {
+		cmpl_num = client_getevent_process(td);
+		if (cmpl_num > 0) {
+			/* new completions collected */
+			cmpl_num_total += cmpl_num;
+		} else if (cmpl_num == 0) {
+			/*
+			 * It is required to make sure that CQEs for SENDs
+			 * will flow at least at the same pace as CQEs for RECVs.
+			 */
+			if (cmpl_num_total >= min &&
+			    ccd->op_send_completed >= ccd->op_recv_completed)
+				break;
+
+			/*
+			 * To reduce CPU consumption one can use
+			 * the rpma_conn_completion_wait() function.
+			 * Note this greatly increases the latency
+			 * and makes the results less stable.
+			 * The bandwidth stays more or less the same.
+			 */
+		} else {
+			/* an error occurred */
+			return -1;
+		}
+
+		/*
+		 * The expected max can be exceeded if CQEs for RECVs come in
+		 * faster than CQEs for SENDs, but CQEs for SENDs must flow
+		 * at least at the same pace as CQEs for RECVs.
+		 */
+	} while (cmpl_num_total < max ||
+			ccd->op_send_completed < ccd->op_recv_completed);
+
+	/*
+	 * All posted SENDs are completed and RECVs for them (responses) are
+	 * completed. This is the initial situation so the counters are reset.
+	 */
+	if (ccd->op_send_posted == ccd->op_send_completed &&
+			ccd->op_send_completed == ccd->op_recv_completed) {
+		ccd->op_send_posted = 0;
+		ccd->op_send_completed = 0;
+		ccd->op_recv_completed = 0;
+	}
+
+	return cmpl_num_total;
+}
+
+struct io_u *librpma_fio_client_event(struct thread_data *td, int event)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct io_u *io_u;
+	int i;
+
+	/* get the first io_u from the queue */
+	io_u = ccd->io_us_completed[0];
+
+	/* remove the first io_u from the queue */
+	for (i = 1; i < ccd->io_u_completed_nr; ++i)
+		ccd->io_us_completed[i - 1] = ccd->io_us_completed[i];
+	ccd->io_u_completed_nr--;
+
+	dprint_io_u(io_u, "client_event");
+
+	return io_u;
+}
+
+char *librpma_fio_client_errdetails(struct io_u *io_u)
+{
+	/* get the string representation of an error */
+	enum ibv_wc_status status = io_u->error;
+	const char *status_str = ibv_wc_status_str(status);
+
+	char *details = strdup(status_str);
+	if (details == NULL) {
+		fprintf(stderr, "Error: %s\n", status_str);
+		fprintf(stderr, "Fatal error: out of memory. Aborting.\n");
+		abort();
+	}
+
+	/* FIO frees the returned string when it becomes obsolete */
+	return details;
+}
+
+int librpma_fio_server_init(struct thread_data *td)
+{
+	struct librpma_fio_options_values *o = td->eo;
+	struct librpma_fio_server_data *csd;
+	struct ibv_context *dev = NULL;
+	enum rpma_log_level log_level_aux = RPMA_LOG_LEVEL_WARNING;
+	int ret = -1;
+
+	/* --debug=net sets RPMA_LOG_THRESHOLD_AUX to RPMA_LOG_LEVEL_INFO */
+#ifdef FIO_INC_DEBUG
+	if ((1UL << FD_NET) & fio_debug)
+		log_level_aux = RPMA_LOG_LEVEL_INFO;
+#endif
+
+	/* configure logging thresholds to see more details */
+	rpma_log_set_threshold(RPMA_LOG_THRESHOLD, RPMA_LOG_LEVEL_INFO);
+	rpma_log_set_threshold(RPMA_LOG_THRESHOLD_AUX, log_level_aux);
+
+
+	/* obtain an IBV context for a local IP address */
+	if ((ret = rpma_utils_get_ibv_context(o->server_ip,
+			RPMA_UTIL_IBV_CONTEXT_LOCAL, &dev))) {
+		librpma_td_verror(td, ret, "rpma_utils_get_ibv_context");
+		return -1;
+	}
+
+	/* allocate server's data */
+	csd = calloc(1, sizeof(*csd));
+	if (csd == NULL) {
+		td_verror(td, errno, "calloc");
+		return -1;
+	}
+
+	/* create a new peer object */
+	if ((ret = rpma_peer_new(dev, &csd->peer))) {
+		librpma_td_verror(td, ret, "rpma_peer_new");
+		goto err_free_csd;
+	}
+
+	td->io_ops_data = csd;
+
+	return 0;
+
+err_free_csd:
+	free(csd);
+
+	return -1;
+}
+
+void librpma_fio_server_cleanup(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd =  td->io_ops_data;
+	int ret;
+
+	if (csd == NULL)
+		return;
+
+	/* free the peer */
+	if ((ret = rpma_peer_delete(&csd->peer)))
+		librpma_td_verror(td, ret, "rpma_peer_delete");
+
+	free(csd);
+}
+
+int librpma_fio_server_open_file(struct thread_data *td, struct fio_file *f,
+		struct rpma_conn_cfg *cfg)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct librpma_fio_options_values *o = td->eo;
+	enum rpma_conn_event conn_event = RPMA_CONN_UNDEFINED;
+	struct librpma_fio_workspace ws = {0};
+	struct rpma_conn_private_data pdata;
+	uint32_t max_msg_num;
+	struct rpma_conn_req *conn_req;
+	struct rpma_conn *conn;
+	struct rpma_mr_local *mr;
+	char port_td[LIBRPMA_FIO_PORT_STR_LEN_MAX];
+	struct rpma_ep *ep;
+	size_t mem_size = td->o.size;
+	size_t mr_desc_size;
+	void *ws_ptr;
+	int usage_mem_type;
+	int ret;
+
+	if (!f->file_name) {
+		log_err("fio: filename is not set\n");
+		return -1;
+	}
+
+	/* start a listening endpoint at addr:port */
+	if (librpma_fio_td_port(o->port, td, port_td))
+		return -1;
+
+	if ((ret = rpma_ep_listen(csd->peer, o->server_ip, port_td, &ep))) {
+		librpma_td_verror(td, ret, "rpma_ep_listen");
+		return -1;
+	}
+
+	if (strcmp(f->file_name, "malloc") == 0) {
+		/* allocation from DRAM using posix_memalign() */
+		ws_ptr = librpma_fio_allocate_dram(td, mem_size, &csd->mem);
+		usage_mem_type = RPMA_MR_USAGE_FLUSH_TYPE_VISIBILITY;
+	} else {
+		/* allocation from PMEM using pmem_map_file() */
+		ws_ptr = librpma_fio_allocate_pmem(td, f->file_name,
+				mem_size, &csd->mem);
+		usage_mem_type = RPMA_MR_USAGE_FLUSH_TYPE_PERSISTENT;
+	}
+
+	if (ws_ptr == NULL)
+		goto err_ep_shutdown;
+
+	f->real_file_size = mem_size;
+
+	if ((ret = rpma_mr_reg(csd->peer, ws_ptr, mem_size,
+			RPMA_MR_USAGE_READ_DST | RPMA_MR_USAGE_READ_SRC |
+			RPMA_MR_USAGE_WRITE_DST | RPMA_MR_USAGE_WRITE_SRC |
+			usage_mem_type, &mr))) {
+		librpma_td_verror(td, ret, "rpma_mr_reg");
+		goto err_free;
+	}
+
+	/* get size of the memory region's descriptor */
+	if ((ret = rpma_mr_get_descriptor_size(mr, &mr_desc_size))) {
+		librpma_td_verror(td, ret, "rpma_mr_get_descriptor_size");
+		goto err_mr_dereg;
+	}
+
+	/* verify size of the memory region's descriptor */
+	if (mr_desc_size > LIBRPMA_FIO_DESCRIPTOR_MAX_SIZE) {
+		log_err(
+			"size of the memory region's descriptor is too big (max=%i)\n",
+			LIBRPMA_FIO_DESCRIPTOR_MAX_SIZE);
+		goto err_mr_dereg;
+	}
+
+	/* get the memory region's descriptor */
+	if ((ret = rpma_mr_get_descriptor(mr, &ws.descriptor[0]))) {
+		librpma_td_verror(td, ret, "rpma_mr_get_descriptor");
+		goto err_mr_dereg;
+	}
+
+	if (cfg != NULL) {
+		if ((ret = rpma_conn_cfg_get_rq_size(cfg, &max_msg_num))) {
+			librpma_td_verror(td, ret, "rpma_conn_cfg_get_rq_size");
+			goto err_mr_dereg;
+		}
+
+		/* verify whether iodepth fits into uint16_t */
+		if (max_msg_num > UINT16_MAX) {
+			log_err("fio: iodepth too big (%u > %u)\n",
+				max_msg_num, UINT16_MAX);
+			goto err_mr_dereg;
+		}
+
+		ws.max_msg_num = max_msg_num;
+	}
+
+	/* prepare a workspace description */
+	ws.direct_write_to_pmem = o->direct_write_to_pmem;
+	ws.mr_desc_size = mr_desc_size;
+	pdata.ptr = &ws;
+	pdata.len = sizeof(ws);
+
+	/* receive an incoming connection request */
+	if ((ret = rpma_ep_next_conn_req(ep, cfg, &conn_req))) {
+		librpma_td_verror(td, ret, "rpma_ep_next_conn_req");
+		goto err_mr_dereg;
+	}
+
+	if (csd->prepare_connection && csd->prepare_connection(td, conn_req))
+		goto err_req_delete;
+
+	/* accept the connection request and obtain the connection object */
+	if ((ret = rpma_conn_req_connect(&conn_req, &pdata, &conn))) {
+		librpma_td_verror(td, ret, "rpma_conn_req_connect");
+		goto err_req_delete;
+	}
+
+	/* wait for the connection to be established */
+	if ((ret = rpma_conn_next_event(conn, &conn_event))) {
+		librpma_td_verror(td, ret, "rpma_conn_next_event");
+		goto err_conn_delete;
+	} else if (conn_event != RPMA_CONN_ESTABLISHED) {
+		log_err("rpma_conn_next_event returned an unexpected event\n");
+		goto err_conn_delete;
+	}
+
+	/* end-point is no longer needed */
+	(void) rpma_ep_shutdown(&ep);
+
+	csd->ws_mr = mr;
+	csd->ws_ptr = ws_ptr;
+	csd->conn = conn;
+
+	return 0;
+
+err_conn_delete:
+	(void) rpma_conn_delete(&conn);
+
+err_req_delete:
+	(void) rpma_conn_req_delete(&conn_req);
+
+err_mr_dereg:
+	(void) rpma_mr_dereg(&mr);
+
+err_free:
+	librpma_fio_free(&csd->mem);
+
+err_ep_shutdown:
+	(void) rpma_ep_shutdown(&ep);
+
+	return -1;
+}
+
+int librpma_fio_server_close_file(struct thread_data *td, struct fio_file *f)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	enum rpma_conn_event conn_event = RPMA_CONN_UNDEFINED;
+	int rv = 0;
+	int ret;
+
+	/* wait for the connection to be closed */
+	ret = rpma_conn_next_event(csd->conn, &conn_event);
+	if (!ret && conn_event != RPMA_CONN_CLOSED) {
+		log_err("rpma_conn_next_event returned an unexpected event\n");
+		rv = -1;
+	}
+
+	if ((ret = rpma_conn_disconnect(csd->conn))) {
+		librpma_td_verror(td, ret, "rpma_conn_disconnect");
+		rv = -1;
+	}
+
+	if ((ret = rpma_conn_delete(&csd->conn))) {
+		librpma_td_verror(td, ret, "rpma_conn_delete");
+		rv = -1;
+	}
+
+	if ((ret = rpma_mr_dereg(&csd->ws_mr))) {
+		librpma_td_verror(td, ret, "rpma_mr_dereg");
+		rv = -1;
+	}
+
+	librpma_fio_free(&csd->mem);
+
+	return rv;
+}
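The server-side open path above validates two limits before it accepts a connection. The memory region descriptor must fit into the 24-byte descriptor[] array, and the RQ size (iodepth) must fit into the uint16_t max_msg_num field, because the whole workspace travels as connection private data (28 bytes for RDMA_PS_TCP, per librpma_fio.h). The following is a minimal, self-contained sketch of those checks. struct workspace and workspace_fill() are hypothetical names that only mirror struct librpma_fio_workspace; they are not part of fio or librpma. The sketch assumes a C11 compiler for _Static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct librpma_fio_workspace (librpma_fio.h). */
#define DESCRIPTOR_MAX_SIZE 24
/* private data limit of rdma_connect() for RDMA_PS_TCP */
#define PRIVATE_DATA_MAX 28

struct workspace {
	uint16_t max_msg_num;		/* # of RQ slots */
	uint8_t direct_write_to_pmem;	/* Direct Write to PMem is possible */
	uint8_t mr_desc_size;		/* size of mr_desc in descriptor[] */
	char descriptor[DESCRIPTOR_MAX_SIZE];
};

/* The whole workspace has to fit into the connection's private data. */
_Static_assert(sizeof(struct workspace) <= PRIVATE_DATA_MAX,
	       "workspace does not fit into RDMA_PS_TCP private data");

/*
 * Hypothetical helper: fill a workspace the way the server's open path does.
 * Returns -1 if the descriptor is too big or max_msg_num overflows uint16_t.
 */
static int workspace_fill(struct workspace *ws, uint32_t max_msg_num,
			  int direct_write_to_pmem,
			  const void *mr_desc, size_t mr_desc_size)
{
	assert(ws != NULL);

	if (mr_desc_size > DESCRIPTOR_MAX_SIZE)
		return -1; /* descriptor too big */
	if (max_msg_num > UINT16_MAX)
		return -1; /* iodepth does not fit into uint16_t */

	memset(ws, 0, sizeof(*ws));
	ws->max_msg_num = (uint16_t)max_msg_num;
	ws->direct_write_to_pmem = direct_write_to_pmem ? 1 : 0;
	ws->mr_desc_size = (uint8_t)mr_desc_size;
	memcpy(ws->descriptor, mr_desc, mr_desc_size);

	return 0;
}
```

Unlike the engine, the sketch performs both checks before touching any resources, so it has nothing to unwind when a check fails.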
diff --git a/engines/librpma_fio.h b/engines/librpma_fio.h
new file mode 100644
index 00000000..8cfb2e2d
--- /dev/null
+++ b/engines/librpma_fio.h
@@ -0,0 +1,273 @@
+/*
+ * librpma_fio: librpma_apm and librpma_gpspm engines' common header.
+ *
+ * Copyright 2021, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIBRPMA_FIO_H
+#define LIBRPMA_FIO_H 1
+
+#include "../fio.h"
+#include "../optgroup.h"
+
+#include <librpma.h>
+
+/* servers' and clients' common */
+
+#define librpma_td_verror(td, err, func) \
+	td_vmsg((td), (err), rpma_err_2str(err), (func))
+
+/* ceil(a / b) = (a + b - 1) / b */
+#define LIBRPMA_FIO_CEIL(a, b) (((a) + (b) - 1) / (b))
+
+/* common option structure for server and client */
+struct librpma_fio_options_values {
+	/*
+	 * FIO considers .off1 == 0 absent so the first meaningful field has to
+	 * have padding ahead of it.
+	 */
+	void *pad;
+	char *server_ip;
+	/* base server listening port */
+	char *port;
+	/* Direct Write to PMem is possible */
+	unsigned int direct_write_to_pmem;
+};
+
+extern struct fio_option librpma_fio_options[];
+
+/*
+ * Limited by the maximum length of the private data
+ * for rdma_connect() in case of RDMA_PS_TCP (28 bytes).
+ */
+#define LIBRPMA_FIO_DESCRIPTOR_MAX_SIZE 24
+
+struct librpma_fio_workspace {
+	uint16_t max_msg_num;	/* # of RQ slots */
+	uint8_t direct_write_to_pmem; /* Direct Write to PMem is possible */
+	uint8_t mr_desc_size;	/* size of mr_desc in descriptor[] */
+	/* buffer containing mr_desc */
+	char descriptor[LIBRPMA_FIO_DESCRIPTOR_MAX_SIZE];
+};
+
+#define LIBRPMA_FIO_PORT_STR_LEN_MAX 12
+
+int librpma_fio_td_port(const char *port_base_str, struct thread_data *td,
+		char *port_out);
+
+struct librpma_fio_mem {
+	/* memory buffer */
+	char *mem_ptr;
+
+	/* size of the mapped persistent memory */
+	size_t size_mmap;
+};
+
+char *librpma_fio_allocate_dram(struct thread_data *td, size_t size,
+		struct librpma_fio_mem *mem);
+
+char *librpma_fio_allocate_pmem(struct thread_data *td, const char *filename,
+		size_t size, struct librpma_fio_mem *mem);
+
+void librpma_fio_free(struct librpma_fio_mem *mem);
+
+/* clients' common */
+
+typedef int (*librpma_fio_flush_t)(struct thread_data *td,
+		struct io_u *first_io_u, struct io_u *last_io_u,
+		unsigned long long int len);
+
+/*
+ * RETURN VALUE
+ * - ( 1) - on success
+ * - ( 0) - skip
+ * - (-1) - on error
+ */
+typedef int (*librpma_fio_get_io_u_index_t)(struct rpma_completion *cmpl,
+		unsigned int *io_u_index);
+
+struct librpma_fio_client_data {
+	struct rpma_peer *peer;
+	struct rpma_conn *conn;
+
+	/* aligned td->orig_buffer */
+	char *orig_buffer_aligned;
+
+	/* io_us' base address memory registration (ccd->orig_buffer_aligned) */
+	struct rpma_mr_local *orig_mr;
+
+	struct librpma_fio_workspace *ws;
+
+	/* a server's memory representation */
+	struct rpma_mr_remote *server_mr;
+	enum rpma_flush_type server_mr_flush_type;
+
+	/* remote workspace description */
+	size_t ws_size;
+
+	/* in-memory queues */
+	struct io_u **io_us_queued;
+	int io_u_queued_nr;
+	struct io_u **io_us_flight;
+	int io_u_flight_nr;
+	struct io_u **io_us_completed;
+	int io_u_completed_nr;
+
+	/* SQ control. Note: all of them have to be kept in sync. */
+	uint32_t op_send_posted;
+	uint32_t op_send_completed;
+	uint32_t op_recv_completed;
+
+	librpma_fio_flush_t flush;
+	librpma_fio_get_io_u_index_t get_io_u_index;
+
+	/* engine-specific client data */
+	void *client_data;
+};
+
+int librpma_fio_client_init(struct thread_data *td,
+		struct rpma_conn_cfg *cfg);
+void librpma_fio_client_cleanup(struct thread_data *td);
+
+int librpma_fio_file_nop(struct thread_data *td, struct fio_file *f);
+int librpma_fio_client_get_file_size(struct thread_data *td,
+		struct fio_file *f);
+
+int librpma_fio_client_post_init(struct thread_data *td);
+
+enum fio_q_status librpma_fio_client_queue(struct thread_data *td,
+		struct io_u *io_u);
+
+int librpma_fio_client_commit(struct thread_data *td);
+
+int librpma_fio_client_getevents(struct thread_data *td, unsigned int min,
+		unsigned int max, const struct timespec *t);
+
+struct io_u *librpma_fio_client_event(struct thread_data *td, int event);
+
+char *librpma_fio_client_errdetails(struct io_u *io_u);
+
+static inline int librpma_fio_client_io_read(struct thread_data *td,
+		struct io_u *io_u, int flags)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	size_t dst_offset = (char *)(io_u->xfer_buf) - ccd->orig_buffer_aligned;
+	size_t src_offset = io_u->offset;
+	int ret;
+
+	if ((ret = rpma_read(ccd->conn, ccd->orig_mr, dst_offset,
+			ccd->server_mr, src_offset, io_u->xfer_buflen,
+			flags, (void *)(uintptr_t)io_u->index))) {
+		librpma_td_verror(td, ret, "rpma_read");
+		return -1;
+	}
+
+	return 0;
+}
+
+static inline int librpma_fio_client_io_write(struct thread_data *td,
+		struct io_u *io_u)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	size_t src_offset = (char *)(io_u->xfer_buf) - ccd->orig_buffer_aligned;
+	size_t dst_offset = io_u->offset;
+	int ret;
+
+	if ((ret = rpma_write(ccd->conn, ccd->server_mr, dst_offset,
+			ccd->orig_mr, src_offset, io_u->xfer_buflen,
+			RPMA_F_COMPLETION_ON_ERROR,
+			(void *)(uintptr_t)io_u->index))) {
+		librpma_td_verror(td, ret, "rpma_write");
+		return -1;
+	}
+
+	return 0;
+}
+
+static inline int librpma_fio_client_io_complete_all_sends(
+		struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct rpma_completion cmpl;
+	int ret;
+
+	while (ccd->op_send_posted != ccd->op_send_completed) {
+		/* get a completion */
+		ret = rpma_conn_completion_get(ccd->conn, &cmpl);
+		if (ret == RPMA_E_NO_COMPLETION) {
+			/* lack of completion is not an error */
+			continue;
+		} else if (ret != 0) {
+			/* an error occurred */
+			librpma_td_verror(td, ret, "rpma_conn_completion_get");
+			return -1;
+		}
+
+		if (cmpl.op_status != IBV_WC_SUCCESS)
+			return -1;
+
+		if (cmpl.op == RPMA_OP_SEND)
+			++ccd->op_send_completed;
+		else {
+			log_err(
+				"A completion other than RPMA_OP_SEND was received while draining SEND completions from the CQ\n");
+			return -1;
+		}
+	}
+
+	/*
+	 * If all posted SENDs and the RECVs for them (responses) are completed,
+	 * this is the initial situation again, so the counters are reset.
+	 */
+	if (ccd->op_send_posted == ccd->op_send_completed &&
+			ccd->op_send_completed == ccd->op_recv_completed) {
+		ccd->op_send_posted = 0;
+		ccd->op_send_completed = 0;
+		ccd->op_recv_completed = 0;
+	}
+
+	return 0;
+}
+
+/* servers' common */
+
+typedef int (*librpma_fio_prepare_connection_t)(
+		struct thread_data *td,
+		struct rpma_conn_req *conn_req);
+
+struct librpma_fio_server_data {
+	struct rpma_peer *peer;
+
+	/* resources of an incoming connection */
+	struct rpma_conn *conn;
+
+	char *ws_ptr;
+	struct rpma_mr_local *ws_mr;
+	struct librpma_fio_mem mem;
+
+	/* engine-specific server data */
+	void *server_data;
+
+	librpma_fio_prepare_connection_t prepare_connection;
+};
+
+int librpma_fio_server_init(struct thread_data *td);
+
+void librpma_fio_server_cleanup(struct thread_data *td);
+
+int librpma_fio_server_open_file(struct thread_data *td,
+		struct fio_file *f, struct rpma_conn_cfg *cfg);
+
+int librpma_fio_server_close_file(struct thread_data *td,
+		struct fio_file *f);
+
+#endif /* LIBRPMA_FIO_H */
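librpma_fio_client_io_complete_all_sends() relies on three counters that must stay in sync. It polls the CQ until every posted SEND has completed, and only when the SENDs and their RECV responses balance does it reset the counters to zero. The sketch below replays that bookkeeping against a simulated CQ (a plain array) instead of rpma_conn_completion_get(). struct sq_counters, enum cmpl_op and drain_sends() are hypothetical names used only for illustration, and the early return on an exhausted array stands in for the engine's busy-polling on RPMA_E_NO_COMPLETION.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SQ-control counters of the client data. */
struct sq_counters {
	uint32_t op_send_posted;
	uint32_t op_send_completed;
	uint32_t op_recv_completed;
};

/* Simplified completion kinds; the engine inspects cmpl.op instead. */
enum cmpl_op { CMPL_SEND, CMPL_RECV };

/*
 * Drain SEND completions from a simulated CQ until every posted SEND is
 * completed, then reset the counters if SENDs and RECVs balance.
 * Returns 0 on success, -1 on an unexpected completion or an empty CQ.
 */
static int drain_sends(struct sq_counters *c, const enum cmpl_op *cq,
		       size_t cq_len)
{
	size_t next = 0;

	assert(c != NULL);

	while (c->op_send_posted != c->op_send_completed) {
		/* the engine keeps polling here; the simulation gives up */
		if (next == cq_len)
			return -1;
		if (cq[next++] != CMPL_SEND)
			return -1;
		++c->op_send_completed;
	}

	/* every SEND and its response are done: back to the initial state */
	if (c->op_send_posted == c->op_send_completed &&
	    c->op_send_completed == c->op_recv_completed) {
		c->op_send_posted = 0;
		c->op_send_completed = 0;
		c->op_recv_completed = 0;
	}

	return 0;
}
```

Note that the counters are reset only when all three agree. A SEND whose response has not arrived yet leaves them untouched, so the next call still sees the outstanding RECV.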
diff --git a/engines/librpma_gpspm.c b/engines/librpma_gpspm.c
new file mode 100644
index 00000000..ac614f46
--- /dev/null
+++ b/engines/librpma_gpspm.c
@@ -0,0 +1,755 @@
+/*
+ * librpma_gpspm: IO engine that uses PMDK librpma to write data,
+ *		based on General Purpose Server Persistency Method
+ *
+ * Copyright 2020-2021, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "librpma_fio.h"
+
+#include <libpmem.h>
+
+/* Generated by the protocol buffer compiler from: librpma_gpspm_flush.proto */
+#include "librpma_gpspm_flush.pb-c.h"
+
+#define MAX_MSG_SIZE (512)
+#define IO_U_BUF_LEN (2 * MAX_MSG_SIZE)
+#define SEND_OFFSET (0)
+#define RECV_OFFSET (SEND_OFFSET + MAX_MSG_SIZE)
+
+#define GPSPM_FLUSH_REQUEST__LAST \
+	{ PROTOBUF_C_MESSAGE_INIT(&gpspm_flush_request__descriptor), 0, 0, 0 }
+
+/*
+ * 'Flush_req_last' is the last flush request
+ * the client has to send to the server to indicate
+ * that it is done.
+ */
+static const GPSPMFlushRequest Flush_req_last = GPSPM_FLUSH_REQUEST__LAST;
+
+#define IS_NOT_THE_LAST_MESSAGE(flush_req) \
+	((flush_req)->length != Flush_req_last.length || \
+	(flush_req)->offset != Flush_req_last.offset)
+
+/* client side implementation */
+
+/* get next io_u message buffer in the round-robin fashion */
+#define IO_U_NEXT_BUF_OFF_CLIENT(cd) \
+	(IO_U_BUF_LEN * (((cd)->msg_curr++) % (cd)->msg_num))
+
+struct client_data {
+	/* memory for the send and receive buffers */
+	char *io_us_msgs;
+
+	/* resources for messaging buffer */
+	uint32_t msg_num;
+	uint32_t msg_curr;
+	struct rpma_mr_local *msg_mr;
+};
+
+static inline int client_io_flush(struct thread_data *td,
+		struct io_u *first_io_u, struct io_u *last_io_u,
+		unsigned long long int len);
+
+static int client_get_io_u_index(struct rpma_completion *cmpl,
+		unsigned int *io_u_index);
+
+static int client_init(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd;
+	struct client_data *cd;
+	uint32_t write_num;
+	struct rpma_conn_cfg *cfg = NULL;
+	int ret;
+
+	/*
+	 * not supported:
+	 * - readwrite = read / trim / randread / randtrim /
+	 *               rw / randrw / trimwrite
+	 */
+	if (td_read(td) || td_trim(td)) {
+		td_verror(td, EINVAL, "Not supported mode.");
+		return -1;
+	}
+
+	/* allocate client's data */
+	cd = calloc(1, sizeof(*cd));
+	if (cd == NULL) {
+		td_verror(td, errno, "calloc");
+		return -1;
+	}
+
+	/*
+	 * Calculate the required number of WRITEs and FLUSHes.
+	 *
+	 * Note: Each flush is a request (SEND) and response (RECV) pair.
+	 */
+	if (td_random(td)) {
+		write_num = td->o.iodepth; /* WRITE * N */
+		cd->msg_num = td->o.iodepth; /* FLUSH * N */
+	} else {
+		if (td->o.sync_io) {
+			write_num = 1; /* WRITE */
+			cd->msg_num = 1; /* FLUSH */
+		} else {
+			write_num = td->o.iodepth; /* WRITE * N */
+			/*
+			 * FLUSH * B where:
+			 * - B == ceil(iodepth / iodepth_batch)
+			 *   which is the number of batches for N writes
+			 */
+			cd->msg_num = LIBRPMA_FIO_CEIL(td->o.iodepth,
+					td->o.iodepth_batch);
+		}
+	}
+
+	/* create a connection configuration object */
+	if ((ret = rpma_conn_cfg_new(&cfg))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_new");
+		goto err_free_cd;
+	}
+
+	/*
+	 * Calculate the required queue sizes where:
+	 * - the send queue (SQ) has to be big enough to accommodate
+	 *   all io_us (WRITEs) and all flush requests (SENDs)
+	 * - the receive queue (RQ) has to be big enough to accommodate
+	 *   all flush responses (RECVs)
+	 * - the completion queue (CQ) has to be big enough to accommodate all
+	 *   success and error completions (sq_size + rq_size)
+	 */
+	if ((ret = rpma_conn_cfg_set_sq_size(cfg, write_num + cd->msg_num))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_sq_size");
+		goto err_cfg_delete;
+	}
+	if ((ret = rpma_conn_cfg_set_rq_size(cfg, cd->msg_num))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_rq_size");
+		goto err_cfg_delete;
+	}
+	if ((ret = rpma_conn_cfg_set_cq_size(cfg, write_num + cd->msg_num * 2))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_cq_size");
+		goto err_cfg_delete;
+	}
+
+	if (librpma_fio_client_init(td, cfg))
+		goto err_cfg_delete;
+
+	ccd = td->io_ops_data;
+
+	if (ccd->ws->direct_write_to_pmem &&
+	    ccd->server_mr_flush_type == RPMA_FLUSH_TYPE_PERSISTENT &&
+	    td->thread_number == 1) {
+		/* XXX log_info mixes with the JSON output */
+		log_err(
+			"Note: The server side supports Direct Write to PMem and it is equipped with PMem (direct_write_to_pmem).\n"
+			"You can use librpma_apm_client and librpma_apm_server engines for better performance instead of GPSPM.\n");
+	}
+
+	/* validate the server's RQ capacity */
+	if (cd->msg_num > ccd->ws->max_msg_num) {
+		log_err(
+			"server's RQ size (iodepth) too small to handle the client's workspace requirements (%u < %u)\n",
+			ccd->ws->max_msg_num, cd->msg_num);
+		goto err_cleanup_common;
+	}
+
+	if ((ret = rpma_conn_cfg_delete(&cfg))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_delete");
+		/* non-fatal error - continue */
+	}
+
+	ccd->flush = client_io_flush;
+	ccd->get_io_u_index = client_get_io_u_index;
+	ccd->client_data = cd;
+
+	return 0;
+
+err_cleanup_common:
+	librpma_fio_client_cleanup(td);
+
+err_cfg_delete:
+	(void) rpma_conn_cfg_delete(&cfg);
+
+err_free_cd:
+	free(cd);
+
+	return -1;
+}
+
+static int client_post_init(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct client_data *cd = ccd->client_data;
+	unsigned int io_us_msgs_size;
+	int ret;
+
+	/* message buffers initialization and registration */
+	io_us_msgs_size = cd->msg_num * IO_U_BUF_LEN;
+	if ((ret = posix_memalign((void **)&cd->io_us_msgs, page_size,
+			io_us_msgs_size))) {
+		td_verror(td, ret, "posix_memalign");
+		return ret;
+	}
+	if ((ret = rpma_mr_reg(ccd->peer, cd->io_us_msgs, io_us_msgs_size,
+			RPMA_MR_USAGE_SEND | RPMA_MR_USAGE_RECV,
+			&cd->msg_mr))) {
+		librpma_td_verror(td, ret, "rpma_mr_reg");
+		return ret;
+	}
+
+	return librpma_fio_client_post_init(td);
+}
+
+static void client_cleanup(struct thread_data *td)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct client_data *cd;
+	size_t flush_req_size;
+	size_t io_u_buf_off;
+	size_t send_offset;
+	void *send_ptr;
+	int ret;
+
+	if (ccd == NULL)
+		return;
+
+	cd = ccd->client_data;
+	if (cd == NULL) {
+		librpma_fio_client_cleanup(td);
+		return;
+	}
+
+	/*
+	 * Make sure all SEND completions are collected so there are free
+	 * slots in the SQ for the last SEND message.
+	 *
+	 * Note: even if any of these operations fails, the termination notice
+	 * can still be sent.
+	 */
+	(void) librpma_fio_client_io_complete_all_sends(td);
+
+	/* prepare the last flush message and pack it to the send buffer */
+	flush_req_size = gpspm_flush_request__get_packed_size(&Flush_req_last);
+	if (flush_req_size > MAX_MSG_SIZE) {
+		log_err(
+			"Packed flush request size is bigger than available send buffer space (%zu > %d)\n",
+			flush_req_size, MAX_MSG_SIZE);
+	} else {
+		io_u_buf_off = IO_U_NEXT_BUF_OFF_CLIENT(cd);
+		send_offset = io_u_buf_off + SEND_OFFSET;
+		send_ptr = cd->io_us_msgs + send_offset;
+		(void) gpspm_flush_request__pack(&Flush_req_last, send_ptr);
+
+		/* send the flush message */
+		if ((ret = rpma_send(ccd->conn, cd->msg_mr, send_offset,
+				flush_req_size, RPMA_F_COMPLETION_ALWAYS,
+				NULL))) {
+			librpma_td_verror(td, ret, "rpma_send");
+		} else {
+			++ccd->op_send_posted;
+			/* Wait for the SEND to complete */
+			(void) librpma_fio_client_io_complete_all_sends(td);
+		}
+	}
+
+	/* deregister the messaging buffer memory */
+	if ((ret = rpma_mr_dereg(&cd->msg_mr)))
+		librpma_td_verror(td, ret, "rpma_mr_dereg");
+	free(cd->io_us_msgs);
+	free(ccd->client_data);
+
+	librpma_fio_client_cleanup(td);
+}
+
+static inline int client_io_flush(struct thread_data *td,
+		struct io_u *first_io_u, struct io_u *last_io_u,
+		unsigned long long int len)
+{
+	struct librpma_fio_client_data *ccd = td->io_ops_data;
+	struct client_data *cd = ccd->client_data;
+	size_t io_u_buf_off = IO_U_NEXT_BUF_OFF_CLIENT(cd);
+	size_t send_offset = io_u_buf_off + SEND_OFFSET;
+	size_t recv_offset = io_u_buf_off + RECV_OFFSET;
+	void *send_ptr = cd->io_us_msgs + send_offset;
+	void *recv_ptr = cd->io_us_msgs + recv_offset;
+	GPSPMFlushRequest flush_req = GPSPM_FLUSH_REQUEST__INIT;
+	size_t flush_req_size = 0;
+	int ret;
+
+	/* prepare a response buffer */
+	if ((ret = rpma_recv(ccd->conn, cd->msg_mr, recv_offset, MAX_MSG_SIZE,
+			recv_ptr))) {
+		librpma_td_verror(td, ret, "rpma_recv");
+		return -1;
+	}
+
+	/* prepare a flush message and pack it to a send buffer */
+	flush_req.offset = first_io_u->offset;
+	flush_req.length = len;
+	flush_req.op_context = last_io_u->index;
+	flush_req_size = gpspm_flush_request__get_packed_size(&flush_req);
+	if (flush_req_size > MAX_MSG_SIZE) {
+		log_err(
+			"Packed flush request size is bigger than available send buffer space (%zu > %d)\n",
+			flush_req_size, MAX_MSG_SIZE);
+		return -1;
+	}
+	(void) gpspm_flush_request__pack(&flush_req, send_ptr);
+
+	/* send the flush message */
+	if ((ret = rpma_send(ccd->conn, cd->msg_mr, send_offset, flush_req_size,
+			RPMA_F_COMPLETION_ALWAYS, NULL))) {
+		librpma_td_verror(td, ret, "rpma_send");
+		return -1;
+	}
+
+	++ccd->op_send_posted;
+
+	return 0;
+}
+
+static int client_get_io_u_index(struct rpma_completion *cmpl,
+		unsigned int *io_u_index)
+{
+	GPSPMFlushResponse *flush_resp;
+
+	if (cmpl->op != RPMA_OP_RECV)
+		return 0;
+
+	/* unpack a response from the received buffer */
+	flush_resp = gpspm_flush_response__unpack(NULL,
+			cmpl->byte_len, cmpl->op_context);
+	if (flush_resp == NULL) {
+		log_err("Cannot unpack the flush response buffer\n");
+		return -1;
+	}
+
+	*io_u_index = (unsigned int)flush_resp->op_context;
+
+	gpspm_flush_response__free_unpacked(flush_resp, NULL);
+
+	return 1;
+}
+
+FIO_STATIC struct ioengine_ops ioengine_client = {
+	.name			= "librpma_gpspm_client",
+	.version		= FIO_IOOPS_VERSION,
+	.init			= client_init,
+	.post_init		= client_post_init,
+	.get_file_size		= librpma_fio_client_get_file_size,
+	.open_file		= librpma_fio_file_nop,
+	.queue			= librpma_fio_client_queue,
+	.commit			= librpma_fio_client_commit,
+	.getevents		= librpma_fio_client_getevents,
+	.event			= librpma_fio_client_event,
+	.errdetails		= librpma_fio_client_errdetails,
+	.close_file		= librpma_fio_file_nop,
+	.cleanup		= client_cleanup,
+	.flags			= FIO_DISKLESSIO,
+	.options		= librpma_fio_options,
+	.option_struct_size	= sizeof(struct librpma_fio_options_values),
+};
+
+/* server side implementation */
+
+#define IO_U_BUFF_OFF_SERVER(i) ((i) * IO_U_BUF_LEN)
+
+struct server_data {
+	/* aligned td->orig_buffer */
+	char *orig_buffer_aligned;
+
+	/* resources for messaging buffer from DRAM allocated by fio */
+	struct rpma_mr_local *msg_mr;
+
+	uint32_t msg_sqe_available; /* # of free SQ slots */
+
+	/* in-memory queues */
+	struct rpma_completion *msgs_queued;
+	uint32_t msg_queued_nr;
+};
+
+static int server_init(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd;
+	struct server_data *sd;
+	int ret = -1;
+
+	if ((ret = librpma_fio_server_init(td)))
+		return ret;
+
+	csd = td->io_ops_data;
+
+	/* allocate server's data */
+	sd = calloc(1, sizeof(*sd));
+	if (sd == NULL) {
+		td_verror(td, errno, "calloc");
+		goto err_server_cleanup;
+	}
+
+	/* allocate in-memory queue */
+	sd->msgs_queued = calloc(td->o.iodepth, sizeof(*sd->msgs_queued));
+	if (sd->msgs_queued == NULL) {
+		td_verror(td, errno, "calloc");
+		goto err_free_sd;
+	}
+
+	/*
+	 * Ensure that a single io_u buffer can store both the SEND and RECV
+	 * messages and that the io_us buffer allocation is page-size-aligned,
+	 * which is required for RDMA registration. User-provided values are intentionally ignored.
+	 */
+	td->o.max_bs[DDIR_READ] = IO_U_BUF_LEN;
+	td->o.mem_align = page_size;
+
+	csd->server_data = sd;
+
+	return 0;
+
+err_free_sd:
+	free(sd);
+
+err_server_cleanup:
+	librpma_fio_server_cleanup(td);
+
+	return -1;
+}
+
+static int server_post_init(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd = csd->server_data;
+	size_t io_us_size;
+	size_t io_u_buflen;
+	int ret;
+
+	/*
+	 * td->orig_buffer is not aligned. The engine requires aligned io_us
+	 * so fio aligns the address up using the formula below.
+	 */
+	sd->orig_buffer_aligned = PTR_ALIGN(td->orig_buffer, page_mask) +
+			td->o.mem_align;
+
+	/*
+	 * XXX
+	 * Each io_u message buffer contains recv and send messages.
+	 * Aligning each of those buffers may potentially give
+	 * some performance benefits.
+	 */
+	io_u_buflen = td_max_bs(td);
+
+	/* check whether io_u buffer is big enough */
+	if (io_u_buflen < IO_U_BUF_LEN) {
+		log_err(
+			"blocksize too small to accommodate assumed maximal request/response pair size (%zu < %d)\n",
+			io_u_buflen, IO_U_BUF_LEN);
+		return -1;
+	}
+
+	/*
+	 * Besides the space actually consumed by io_us, td->orig_buffer_size
+	 * includes padding which can be omitted from the memory registration.
+	 */
+	io_us_size = (unsigned long long)io_u_buflen *
+			(unsigned long long)td->o.iodepth;
+
+	if ((ret = rpma_mr_reg(csd->peer, sd->orig_buffer_aligned, io_us_size,
+			RPMA_MR_USAGE_SEND | RPMA_MR_USAGE_RECV,
+			&sd->msg_mr))) {
+		librpma_td_verror(td, ret, "rpma_mr_reg");
+		return -1;
+	}
+
+	return 0;
+}
+
+static void server_cleanup(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd;
+	int ret;
+
+	if (csd == NULL)
+		return;
+
+	sd = csd->server_data;
+
+	if (sd != NULL) {
+		/* rpma_mr_dereg(messaging buffer from DRAM) */
+		if ((ret = rpma_mr_dereg(&sd->msg_mr)))
+			librpma_td_verror(td, ret, "rpma_mr_dereg");
+
+		free(sd->msgs_queued);
+		free(sd);
+	}
+
+	librpma_fio_server_cleanup(td);
+}
+
+static int prepare_connection(struct thread_data *td,
+		struct rpma_conn_req *conn_req)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd = csd->server_data;
+	int ret;
+	int i;
+
+	/* prepare buffers for the flush requests */
+	sd->msg_sqe_available = td->o.iodepth;
+	for (i = 0; i < td->o.iodepth; i++) {
+		size_t offset_recv_msg = IO_U_BUFF_OFF_SERVER(i) + RECV_OFFSET;
+		if ((ret = rpma_conn_req_recv(conn_req, sd->msg_mr,
+				offset_recv_msg, MAX_MSG_SIZE,
+				(const void *)(uintptr_t)i))) {
+			librpma_td_verror(td, ret, "rpma_conn_req_recv");
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int server_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct rpma_conn_cfg *cfg = NULL;
+	uint32_t max_msg_num = td->o.iodepth;
+	int ret;
+
+	csd->prepare_connection = prepare_connection;
+
+	/* create a connection configuration object */
+	if ((ret = rpma_conn_cfg_new(&cfg))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_new");
+		return -1;
+	}
+
+	/*
+	 * Calculate the required queue sizes where:
+	 * - the send queue (SQ) has to be big enough to accommodate
+	 *   all possible flush responses (SENDs)
+	 * - the receive queue (RQ) has to be big enough to accommodate
+	 *   all flush requests (RECVs)
+	 * - the completion queue (CQ) has to be big enough to accommodate
+	 *   all success and error completions (sq_size + rq_size)
+	 */
+	if ((ret = rpma_conn_cfg_set_sq_size(cfg, max_msg_num))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_sq_size");
+		goto err_cfg_delete;
+	}
+	if ((ret = rpma_conn_cfg_set_rq_size(cfg, max_msg_num))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_rq_size");
+		goto err_cfg_delete;
+	}
+	if ((ret = rpma_conn_cfg_set_cq_size(cfg, max_msg_num * 2))) {
+		librpma_td_verror(td, ret, "rpma_conn_cfg_set_cq_size");
+		goto err_cfg_delete;
+	}
+
+	ret = librpma_fio_server_open_file(td, f, cfg);
+
+err_cfg_delete:
+	(void) rpma_conn_cfg_delete(&cfg);
+
+	return ret;
+}
+
+static int server_qe_process(struct thread_data *td,
+		struct rpma_completion *cmpl)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd = csd->server_data;
+	GPSPMFlushRequest *flush_req;
+	GPSPMFlushResponse flush_resp = GPSPM_FLUSH_RESPONSE__INIT;
+	size_t flush_resp_size = 0;
+	size_t send_buff_offset;
+	size_t recv_buff_offset;
+	size_t io_u_buff_offset;
+	void *send_buff_ptr;
+	void *recv_buff_ptr;
+	void *op_ptr;
+	int msg_index;
+	int ret;
+
+	/* calculate SEND/RECV pair parameters */
+	msg_index = (int)(uintptr_t)cmpl->op_context;
+	io_u_buff_offset = IO_U_BUFF_OFF_SERVER(msg_index);
+	send_buff_offset = io_u_buff_offset + SEND_OFFSET;
+	recv_buff_offset = io_u_buff_offset + RECV_OFFSET;
+	send_buff_ptr = sd->orig_buffer_aligned + send_buff_offset;
+	recv_buff_ptr = sd->orig_buffer_aligned + recv_buff_offset;
+
+	/* unpack a flush request from the received buffer */
+	flush_req = gpspm_flush_request__unpack(NULL, cmpl->byte_len,
+			recv_buff_ptr);
+	if (flush_req == NULL) {
+		log_err("cannot unpack the flush request buffer\n");
+		goto err_terminate;
+	}
+
+	if (IS_NOT_THE_LAST_MESSAGE(flush_req)) {
+		op_ptr = csd->ws_ptr + flush_req->offset;
+		pmem_persist(op_ptr, flush_req->length);
+	} else {
+		/*
+		 * This is the last message - the client is done.
+		 */
+		gpspm_flush_request__free_unpacked(flush_req, NULL);
+		td->done = true;
+		return 0;
+	}
+
+	/* initiate the next receive operation */
+	if ((ret = rpma_recv(csd->conn, sd->msg_mr, recv_buff_offset,
+			MAX_MSG_SIZE,
+			(const void *)(uintptr_t)msg_index))) {
+		librpma_td_verror(td, ret, "rpma_recv");
+		goto err_free_unpacked;
+	}
+
+	/* prepare a flush response and pack it to a send buffer */
+	flush_resp.op_context = flush_req->op_context;
+	flush_resp_size = gpspm_flush_response__get_packed_size(&flush_resp);
+	if (flush_resp_size > MAX_MSG_SIZE) {
+		log_err(
+			"Size of the packed flush response is bigger than the available space of the send buffer (%zu > %i)\n",
+			flush_resp_size, MAX_MSG_SIZE);
+		goto err_free_unpacked;
+	}
+
+	(void) gpspm_flush_response__pack(&flush_resp, send_buff_ptr);
+
+	/* send the flush response */
+	if ((ret = rpma_send(csd->conn, sd->msg_mr, send_buff_offset,
+			flush_resp_size, RPMA_F_COMPLETION_ALWAYS, NULL))) {
+		librpma_td_verror(td, ret, "rpma_send");
+		goto err_free_unpacked;
+	}
+	--sd->msg_sqe_available;
+
+	gpspm_flush_request__free_unpacked(flush_req, NULL);
+
+	return 0;
+
+err_free_unpacked:
+	gpspm_flush_request__free_unpacked(flush_req, NULL);
+
+err_terminate:
+	td->terminate = true;
+
+	return -1;
+}
+
+static inline int server_queue_process(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd = csd->server_data;
+	int ret;
+	int i;
+
+	/* min(# of queue entries, # of SQ entries available) */
+	uint32_t qes_to_process = min(sd->msg_queued_nr, sd->msg_sqe_available);
+	if (qes_to_process == 0)
+		return 0;
+
+	/* process queued completions */
+	for (i = 0; i < qes_to_process; ++i) {
+		if ((ret = server_qe_process(td, &sd->msgs_queued[i])))
+			return ret;
+	}
+
+	/* progress the queue */
+	for (i = 0; i < sd->msg_queued_nr - qes_to_process; ++i) {
+		memcpy(&sd->msgs_queued[i],
+			&sd->msgs_queued[qes_to_process + i],
+			sizeof(sd->msgs_queued[i]));
+	}
+
+	sd->msg_queued_nr -= qes_to_process;
+
+	return 0;
+}
+
+static int server_cmpl_process(struct thread_data *td)
+{
+	struct librpma_fio_server_data *csd = td->io_ops_data;
+	struct server_data *sd = csd->server_data;
+	struct rpma_completion *cmpl = &sd->msgs_queued[sd->msg_queued_nr];
+	int ret;
+
+	ret = rpma_conn_completion_get(csd->conn, cmpl);
+	if (ret == RPMA_E_NO_COMPLETION) {
+		/* lack of completion is not an error */
+		return 0;
+	} else if (ret != 0) {
+		librpma_td_verror(td, ret, "rpma_conn_completion_get");
+		goto err_terminate;
+	}
+
+	/* validate the completion */
+	if (cmpl->op_status != IBV_WC_SUCCESS)
+		goto err_terminate;
+
+	if (cmpl->op == RPMA_OP_RECV)
+		++sd->msg_queued_nr;
+	else if (cmpl->op == RPMA_OP_SEND)
+		++sd->msg_sqe_available;
+
+	return 0;
+
+err_terminate:
+	td->terminate = true;
+
+	return -1;
+}
+
+static enum fio_q_status server_queue(struct thread_data *td, struct io_u *io_u)
+{
+	do {
+		if (server_cmpl_process(td))
+			return FIO_Q_BUSY;
+
+		if (server_queue_process(td))
+			return FIO_Q_BUSY;
+
+	} while (!td->done);
+
+	return FIO_Q_COMPLETED;
+}
+
+FIO_STATIC struct ioengine_ops ioengine_server = {
+	.name			= "librpma_gpspm_server",
+	.version		= FIO_IOOPS_VERSION,
+	.init			= server_init,
+	.post_init		= server_post_init,
+	.open_file		= server_open_file,
+	.close_file		= librpma_fio_server_close_file,
+	.queue			= server_queue,
+	.invalidate		= librpma_fio_file_nop,
+	.cleanup		= server_cleanup,
+	.flags			= FIO_SYNCIO,
+	.options		= librpma_fio_options,
+	.option_struct_size	= sizeof(struct librpma_fio_options_values),
+};
+
+/* register both engines */
+
+static void fio_init fio_librpma_gpspm_register(void)
+{
+	register_ioengine(&ioengine_client);
+	register_ioengine(&ioengine_server);
+}
+
+static void fio_exit fio_librpma_gpspm_unregister(void)
+{
+	unregister_ioengine(&ioengine_client);
+	unregister_ioengine(&ioengine_server);
+}
diff --git a/engines/librpma_gpspm_flush.pb-c.c b/engines/librpma_gpspm_flush.pb-c.c
new file mode 100644
index 00000000..3ff24756
--- /dev/null
+++ b/engines/librpma_gpspm_flush.pb-c.c
@@ -0,0 +1,214 @@
+/*
+ * Copyright 2020, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Generated by the protocol buffer compiler. DO NOT EDIT! */
+/* Generated from: librpma_gpspm_flush.proto */
+
+/* Do not generate deprecated warnings for self */
+#ifndef PROTOBUF_C__NO_DEPRECATED
+#define PROTOBUF_C__NO_DEPRECATED
+#endif
+
+#include "librpma_gpspm_flush.pb-c.h"
+void   gpspm_flush_request__init
+                     (GPSPMFlushRequest         *message)
+{
+  static const GPSPMFlushRequest init_value = GPSPM_FLUSH_REQUEST__INIT;
+  *message = init_value;
+}
+size_t gpspm_flush_request__get_packed_size
+                     (const GPSPMFlushRequest *message)
+{
+  assert(message->base.descriptor == &gpspm_flush_request__descriptor);
+  return protobuf_c_message_get_packed_size ((const ProtobufCMessage*)(message));
+}
+size_t gpspm_flush_request__pack
+                     (const GPSPMFlushRequest *message,
+                      uint8_t       *out)
+{
+  assert(message->base.descriptor == &gpspm_flush_request__descriptor);
+  return protobuf_c_message_pack ((const ProtobufCMessage*)message, out);
+}
+size_t gpspm_flush_request__pack_to_buffer
+                     (const GPSPMFlushRequest *message,
+                      ProtobufCBuffer *buffer)
+{
+  assert(message->base.descriptor == &gpspm_flush_request__descriptor);
+  return protobuf_c_message_pack_to_buffer ((const ProtobufCMessage*)message, buffer);
+}
+GPSPMFlushRequest *
+       gpspm_flush_request__unpack
+                     (ProtobufCAllocator  *allocator,
+                      size_t               len,
+                      const uint8_t       *data)
+{
+  return (GPSPMFlushRequest *)
+     protobuf_c_message_unpack (&gpspm_flush_request__descriptor,
+                                allocator, len, data);
+}
+void   gpspm_flush_request__free_unpacked
+                     (GPSPMFlushRequest *message,
+                      ProtobufCAllocator *allocator)
+{
+  if(!message)
+    return;
+  assert(message->base.descriptor == &gpspm_flush_request__descriptor);
+  protobuf_c_message_free_unpacked ((ProtobufCMessage*)message, allocator);
+}
+void   gpspm_flush_response__init
+                     (GPSPMFlushResponse         *message)
+{
+  static const GPSPMFlushResponse init_value = GPSPM_FLUSH_RESPONSE__INIT;
+  *message = init_value;
+}
+size_t gpspm_flush_response__get_packed_size
+                     (const GPSPMFlushResponse *message)
+{
+  assert(message->base.descriptor == &gpspm_flush_response__descriptor);
+  return protobuf_c_message_get_packed_size ((const ProtobufCMessage*)(message));
+}
+size_t gpspm_flush_response__pack
+                     (const GPSPMFlushResponse *message,
+                      uint8_t       *out)
+{
+  assert(message->base.descriptor == &gpspm_flush_response__descriptor);
+  return protobuf_c_message_pack ((const ProtobufCMessage*)message, out);
+}
+size_t gpspm_flush_response__pack_to_buffer
+                     (const GPSPMFlushResponse *message,
+                      ProtobufCBuffer *buffer)
+{
+  assert(message->base.descriptor == &gpspm_flush_response__descriptor);
+  return protobuf_c_message_pack_to_buffer ((const ProtobufCMessage*)message, buffer);
+}
+GPSPMFlushResponse *
+       gpspm_flush_response__unpack
+                     (ProtobufCAllocator  *allocator,
+                      size_t               len,
+                      const uint8_t       *data)
+{
+  return (GPSPMFlushResponse *)
+     protobuf_c_message_unpack (&gpspm_flush_response__descriptor,
+                                allocator, len, data);
+}
+void   gpspm_flush_response__free_unpacked
+                     (GPSPMFlushResponse *message,
+                      ProtobufCAllocator *allocator)
+{
+  if(!message)
+    return;
+  assert(message->base.descriptor == &gpspm_flush_response__descriptor);
+  protobuf_c_message_free_unpacked ((ProtobufCMessage*)message, allocator);
+}
+static const ProtobufCFieldDescriptor gpspm_flush_request__field_descriptors[3] =
+{
+  {
+    "offset",
+    1,
+    PROTOBUF_C_LABEL_REQUIRED,
+    PROTOBUF_C_TYPE_FIXED64,
+    0,   /* quantifier_offset */
+    offsetof(GPSPMFlushRequest, offset),
+    NULL,
+    NULL,
+    0,             /* flags */
+    0,NULL,NULL    /* reserved1,reserved2, etc */
+  },
+  {
+    "length",
+    2,
+    PROTOBUF_C_LABEL_REQUIRED,
+    PROTOBUF_C_TYPE_FIXED64,
+    0,   /* quantifier_offset */
+    offsetof(GPSPMFlushRequest, length),
+    NULL,
+    NULL,
+    0,             /* flags */
+    0,NULL,NULL    /* reserved1,reserved2, etc */
+  },
+  {
+    "op_context",
+    3,
+    PROTOBUF_C_LABEL_REQUIRED,
+    PROTOBUF_C_TYPE_FIXED64,
+    0,   /* quantifier_offset */
+    offsetof(GPSPMFlushRequest, op_context),
+    NULL,
+    NULL,
+    0,             /* flags */
+    0,NULL,NULL    /* reserved1,reserved2, etc */
+  },
+};
+static const unsigned gpspm_flush_request__field_indices_by_name[] = {
+  1,   /* field[1] = length */
+  0,   /* field[0] = offset */
+  2,   /* field[2] = op_context */
+};
+static const ProtobufCIntRange gpspm_flush_request__number_ranges[1 + 1] =
+{
+  { 1, 0 },
+  { 0, 3 }
+};
+const ProtobufCMessageDescriptor gpspm_flush_request__descriptor =
+{
+  PROTOBUF_C__MESSAGE_DESCRIPTOR_MAGIC,
+  "GPSPM_flush_request",
+  "GPSPMFlushRequest",
+  "GPSPMFlushRequest",
+  "",
+  sizeof(GPSPMFlushRequest),
+  3,
+  gpspm_flush_request__field_descriptors,
+  gpspm_flush_request__field_indices_by_name,
+  1,  gpspm_flush_request__number_ranges,
+  (ProtobufCMessageInit) gpspm_flush_request__init,
+  NULL,NULL,NULL    /* reserved[123] */
+};
+static const ProtobufCFieldDescriptor gpspm_flush_response__field_descriptors[1] =
+{
+  {
+    "op_context",
+    1,
+    PROTOBUF_C_LABEL_REQUIRED,
+    PROTOBUF_C_TYPE_FIXED64,
+    0,   /* quantifier_offset */
+    offsetof(GPSPMFlushResponse, op_context),
+    NULL,
+    NULL,
+    0,             /* flags */
+    0,NULL,NULL    /* reserved1,reserved2, etc */
+  },
+};
+static const unsigned gpspm_flush_response__field_indices_by_name[] = {
+  0,   /* field[0] = op_context */
+};
+static const ProtobufCIntRange gpspm_flush_response__number_ranges[1 + 1] =
+{
+  { 1, 0 },
+  { 0, 1 }
+};
+const ProtobufCMessageDescriptor gpspm_flush_response__descriptor =
+{
+  PROTOBUF_C__MESSAGE_DESCRIPTOR_MAGIC,
+  "GPSPM_flush_response",
+  "GPSPMFlushResponse",
+  "GPSPMFlushResponse",
+  "",
+  sizeof(GPSPMFlushResponse),
+  1,
+  gpspm_flush_response__field_descriptors,
+  gpspm_flush_response__field_indices_by_name,
+  1,  gpspm_flush_response__number_ranges,
+  (ProtobufCMessageInit) gpspm_flush_response__init,
+  NULL,NULL,NULL    /* reserved[123] */
+};
diff --git a/engines/librpma_gpspm_flush.pb-c.h b/engines/librpma_gpspm_flush.pb-c.h
new file mode 100644
index 00000000..ad475a95
--- /dev/null
+++ b/engines/librpma_gpspm_flush.pb-c.h
@@ -0,0 +1,120 @@
+/*
+ * Copyright 2020, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Generated by the protocol buffer compiler. DO NOT EDIT! */
+/* Generated from: librpma_gpspm_flush.proto */
+
+#ifndef PROTOBUF_C_GPSPM_5fflush_2eproto__INCLUDED
+#define PROTOBUF_C_GPSPM_5fflush_2eproto__INCLUDED
+
+#include <protobuf-c/protobuf-c.h>
+
+PROTOBUF_C__BEGIN_DECLS
+
+#if PROTOBUF_C_VERSION_NUMBER < 1000000
+# error This file was generated by a newer version of protoc-c which is incompatible with your libprotobuf-c headers. Please update your headers.
+#elif 1003003 < PROTOBUF_C_MIN_COMPILER_VERSION
+# error This file was generated by an older version of protoc-c which is incompatible with your libprotobuf-c headers. Please regenerate this file with a newer version of protoc-c.
+#endif
+
+
+typedef struct _GPSPMFlushRequest GPSPMFlushRequest;
+typedef struct _GPSPMFlushResponse GPSPMFlushResponse;
+
+
+/* --- enums --- */
+
+
+/* --- messages --- */
+
+struct  _GPSPMFlushRequest
+{
+  ProtobufCMessage base;
+  uint64_t offset;
+  uint64_t length;
+  uint64_t op_context;
+};
+#define GPSPM_FLUSH_REQUEST__INIT \
+ { PROTOBUF_C_MESSAGE_INIT (&gpspm_flush_request__descriptor) \
+    , 0, 0, 0 }
+
+
+struct  _GPSPMFlushResponse
+{
+  ProtobufCMessage base;
+  uint64_t op_context;
+};
+#define GPSPM_FLUSH_RESPONSE__INIT \
+ { PROTOBUF_C_MESSAGE_INIT (&gpspm_flush_response__descriptor) \
+    , 0 }
+
+
+/* GPSPMFlushRequest methods */
+void   gpspm_flush_request__init
+                     (GPSPMFlushRequest         *message);
+size_t gpspm_flush_request__get_packed_size
+                     (const GPSPMFlushRequest   *message);
+size_t gpspm_flush_request__pack
+                     (const GPSPMFlushRequest   *message,
+                      uint8_t             *out);
+size_t gpspm_flush_request__pack_to_buffer
+                     (const GPSPMFlushRequest   *message,
+                      ProtobufCBuffer     *buffer);
+GPSPMFlushRequest *
+       gpspm_flush_request__unpack
+                     (ProtobufCAllocator  *allocator,
+                      size_t               len,
+                      const uint8_t       *data);
+void   gpspm_flush_request__free_unpacked
+                     (GPSPMFlushRequest *message,
+                      ProtobufCAllocator *allocator);
+/* GPSPMFlushResponse methods */
+void   gpspm_flush_response__init
+                     (GPSPMFlushResponse         *message);
+size_t gpspm_flush_response__get_packed_size
+                     (const GPSPMFlushResponse   *message);
+size_t gpspm_flush_response__pack
+                     (const GPSPMFlushResponse   *message,
+                      uint8_t             *out);
+size_t gpspm_flush_response__pack_to_buffer
+                     (const GPSPMFlushResponse   *message,
+                      ProtobufCBuffer     *buffer);
+GPSPMFlushResponse *
+       gpspm_flush_response__unpack
+                     (ProtobufCAllocator  *allocator,
+                      size_t               len,
+                      const uint8_t       *data);
+void   gpspm_flush_response__free_unpacked
+                     (GPSPMFlushResponse *message,
+                      ProtobufCAllocator *allocator);
+/* --- per-message closures --- */
+
+typedef void (*GPSPMFlushRequest_Closure)
+                 (const GPSPMFlushRequest *message,
+                  void *closure_data);
+typedef void (*GPSPMFlushResponse_Closure)
+                 (const GPSPMFlushResponse *message,
+                  void *closure_data);
+
+/* --- services --- */
+
+
+/* --- descriptors --- */
+
+extern const ProtobufCMessageDescriptor gpspm_flush_request__descriptor;
+extern const ProtobufCMessageDescriptor gpspm_flush_response__descriptor;
+
+PROTOBUF_C__END_DECLS
+
+
+#endif  /* PROTOBUF_C_GPSPM_5fflush_2eproto__INCLUDED */
diff --git a/engines/librpma_gpspm_flush.proto b/engines/librpma_gpspm_flush.proto
new file mode 100644
index 00000000..91765a7f
--- /dev/null
+++ b/engines/librpma_gpspm_flush.proto
@@ -0,0 +1,15 @@
+syntax = "proto2";
+
+message GPSPM_flush_request {
+    /* an offset of a region to be flushed within its memory registration */
+    required fixed64 offset = 1;
+    /* a length of a region to be flushed */
+    required fixed64 length = 2;
+    /* a user-defined operation context */
+    required fixed64 op_context = 3;
+}
+
+message GPSPM_flush_response {
+    /* the operation context of a completed request */
+    required fixed64 op_context = 1;
+}
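Every field of both messages is a `required fixed64`, so their encoded size is constant. Each field costs a one-byte key (the field number shifted left by 3, ORed with wire type 1 for 64-bit values) plus eight little-endian payload bytes. That gives 27 bytes for `GPSPM_flush_request` and 9 bytes for `GPSPM_flush_response`. The sketch below hand-encodes a request to illustrate this layout. It is not part of fio; in the engine, the generated `gpspm_flush_request__pack()` does this work. `encode_fixed64_field()` and `encode_flush_request()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PB_WIRETYPE_64BIT 1	/* protobuf wire type used by fixed64 */
#define FIXED64_FIELD_SIZE 9	/* 1-byte key + 8-byte payload (field numbers 1..15) */

/* Write one fixed64 field: the key byte, then the value in little-endian order. */
static size_t encode_fixed64_field(uint8_t *out, unsigned field_number,
				   uint64_t value)
{
	out[0] = (uint8_t)((field_number << 3) | PB_WIRETYPE_64BIT);
	for (int i = 0; i < 8; i++)
		out[1 + i] = (uint8_t)(value >> (8 * i));
	return FIXED64_FIELD_SIZE;
}

/* Encode GPSPM_flush_request { offset = 1; length = 2; op_context = 3; }. */
static size_t encode_flush_request(uint8_t out[3 * FIXED64_FIELD_SIZE],
				   uint64_t offset, uint64_t length,
				   uint64_t op_context)
{
	size_t n = 0;

	n += encode_fixed64_field(out + n, 1, offset);
	n += encode_fixed64_field(out + n, 2, length);
	n += encode_fixed64_field(out + n, 3, op_context);
	return n;
}
```

The fixed size is a consequence of choosing `fixed64` over varint types. Whether the engine relies on it for its message buffers is not shown in this excerpt.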
diff --git a/eta.c b/eta.c
index 97843012..db13cb18 100644
--- a/eta.c
+++ b/eta.c
@@ -331,7 +331,7 @@ static void calc_rate(int unified_rw_rep, unsigned long mtime,
 		else
 			this_rate = 0;
 
-		if (unified_rw_rep) {
+		if (unified_rw_rep == UNIFIED_MIXED) {
 			rate[i] = 0;
 			rate[0] += this_rate;
 		} else
@@ -356,7 +356,7 @@ static void calc_iops(int unified_rw_rep, unsigned long mtime,
 		else
 			this_iops = 0;
 
-		if (unified_rw_rep) {
+		if (unified_rw_rep == UNIFIED_MIXED) {
 			iops[i] = 0;
 			iops[0] += this_iops;
 		} else
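With `unified_rw_reporting` now a three-valued option, only `UNIFIED_MIXED` folds the per-direction rates into slot 0 of the ETA arrays. `UNIFIED_BOTH` is also non-zero, so the old `if (unified_rw_rep)` test would have treated it like `mixed`. The reduced sketch below assumes the enum values 0/1/2 implied by the option's numeric aliases. `fold_rates()` is a hypothetical helper that mirrors the loop body of `calc_rate()`; it is not fio's function.

```c
#include <stdint.h>

/* Assumed values, matching the "0"/"1"/"2" aliases of unified_rw_reporting. */
enum { UNIFIED_SPLIT = 0, UNIFIED_MIXED = 1, UNIFIED_BOTH = 2 };

#define NR_DIRS 3	/* read, write, trim */

/*
 * Store per-direction rates. With UNIFIED_MIXED they are summed into
 * rate[0] and the other slots are zeroed; otherwise they stay split.
 */
static void fold_rates(int unified_rw_rep, const uint64_t this_rate[NR_DIRS],
		       uint64_t rate[NR_DIRS])
{
	for (int i = 0; i < NR_DIRS; i++) {
		if (unified_rw_rep == UNIFIED_MIXED) {
			rate[i] = 0;	/* i == 0 also resets the running sum */
			rate[0] += this_rate[i];
		} else
			rate[i] = this_rate[i];
	}
}
```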
diff --git a/examples/librpma_apm-client.fio b/examples/librpma_apm-client.fio
new file mode 100644
index 00000000..82a5d20c
--- /dev/null
+++ b/examples/librpma_apm-client.fio
@@ -0,0 +1,24 @@
+# Example of the librpma_apm_client job
+
+[global]
+ioengine=librpma_apm_client
+create_serialize=0 # (required) forces specific initiation sequence
+serverip=[serverip] # IP address the server is listening on
+port=7204 # port(s) the server will listen on, <port; port + numjobs - 1> will be used
+thread
+
+# The client will get a remote memory region description after establishing
+# a connection.
+
+[client]
+numjobs=1 # number of parallel connections
+group_reporting=1
+sync=1 # 1 is the best for latency measurements, 0 for bandwidth
+iodepth=2 # total number of ious
+iodepth_batch_submit=1 # number of ious to be submitted at once
+rw=write # read/write/randread/randwrite/readwrite/rw
+rwmixread=70 # % of a mixed workload that should be reads
+blocksize=4KiB
+ramp_time=15s # gives some time to stabilize the workload
+time_based
+runtime=60s # run the workload for the specified period of time
diff --git a/examples/librpma_apm-server.fio b/examples/librpma_apm-server.fio
new file mode 100644
index 00000000..062b5215
--- /dev/null
+++ b/examples/librpma_apm-server.fio
@@ -0,0 +1,26 @@
+# Example of the librpma_apm_server job
+
+[global]
+ioengine=librpma_apm_server
+create_serialize=0 # (required) forces specific initiation sequence
+kb_base=1000 # turn on the straight units handling (non-compatibility mode)
+serverip=[serverip] # IP address to listen on
+port=7204 # port(s) the server jobs will listen on, ports <port; port + numjobs - 1> will be used
+thread
+
+# The server side spawns one thread for each expected connection from
+# the client side; each thread opens and registers its dedicated range
+# (a workspace) of the provided memory.
+# Each server thread accepts a connection on its dedicated port
+# (different for every working thread), waits for the connection to end,
+# and then closes itself.
+
+[server]
+# set to 1 (true) ONLY when Direct Write to PMem from the remote host is possible
+# (https://pmem.io/rpma/documentation/basic-direct-write-to-pmem.html)
+direct_write_to_pmem=0
+
+numjobs=1 # number of expected incoming connections
+size=100MiB # size of workspace for a single connection
+filename=malloc # device dax or an existing fsdax file or "malloc" for allocation from DRAM
+# filename=/dev/dax1.0
diff --git a/examples/librpma_gpspm-client.fio b/examples/librpma_gpspm-client.fio
new file mode 100644
index 00000000..843382df
--- /dev/null
+++ b/examples/librpma_gpspm-client.fio
@@ -0,0 +1,23 @@
+# Example of the librpma_gpspm_client job
+
+[global]
+ioengine=librpma_gpspm_client
+create_serialize=0 # (required) forces specific initiation sequence
+serverip=[serverip] # IP address the server is listening on
+port=7204 # port(s) the server will listen on, <port; port + numjobs - 1> will be used
+thread
+
+# The client will get a remote memory region description after establishing
+# a connection.
+
+[client]
+numjobs=1 # number of parallel connections
+group_reporting=1
+sync=1 # 1 is the best for latency measurements, 0 for bandwidth
+iodepth=2 # total number of ious
+iodepth_batch_submit=1 # number of ious to be submitted at once
+rw=write # write/randwrite
+blocksize=4KiB
+ramp_time=15s # gives some time to stabilize the workload
+time_based
+runtime=60s # run the workload for the specified period of time
diff --git a/examples/librpma_gpspm-server.fio b/examples/librpma_gpspm-server.fio
new file mode 100644
index 00000000..d618f2db
--- /dev/null
+++ b/examples/librpma_gpspm-server.fio
@@ -0,0 +1,31 @@
+# Example of the librpma_gpspm_server job
+
+[global]
+ioengine=librpma_gpspm_server
+create_serialize=0 # (required) forces specific initiation sequence
+kb_base=1000 # turn on the straight units handling (non-compatibility mode)
+serverip=[serverip] # IP address to listen on
+port=7204 # port(s) the server jobs will listen on, ports <port; port + numjobs - 1> will be used
+thread
+
+# The server side spawns one thread for each expected connection from
+# the client side; each thread opens and registers its dedicated range
+# (a workspace) of the provided memory.
+# Each server thread accepts a connection on its dedicated port
+# (different for every working thread), accepts and executes flush
+# requests, and sends back a flush response for each of them.
+# When the client is done, it sends a termination notice to the server thread.
+
+[server]
+# set to 1 (true) ONLY when Direct Write to PMem from the remote host is possible
+# (https://pmem.io/rpma/documentation/basic-direct-write-to-pmem.html)
+direct_write_to_pmem=0
+numjobs=1 # number of expected incoming connections
+iodepth=2 # number of parallel GPSPM requests
+size=100MiB # size of workspace for a single connection
+filename=malloc # device dax or an existing fsdax file or "malloc" for allocation from DRAM
+# filename=/dev/dax1.0
+
+# The client terminates the server when it finishes its job.
+time_based
+runtime=365d
diff --git a/fio.1 b/fio.1
index 27cf2f15..ad4a662b 100644
--- a/fio.1
+++ b/fio.1
@@ -924,10 +924,32 @@ behaves in a similar fashion, except it sends the same offset 8 number of
 times before generating a new offset.
 .RE
 .TP
-.BI unified_rw_reporting \fR=\fPbool
+.BI unified_rw_reporting \fR=\fPstr
 Fio normally reports statistics on a per data direction basis, meaning that
-reads, writes, and trims are accounted and reported separately. If this
-option is set fio sums the results and report them as "mixed" instead.
+reads, writes, and trims are accounted and reported separately. This option
+determines whether fio reports the results normally, summed together as
+"mixed", or both.
+Accepted values are:
+.RS
+.TP
+.B none
+Normal statistics reporting.
+.TP
+.B mixed
+Statistics are summed across data directions and reported together.
+.TP
+.B both
+Statistics are reported normally, followed by the mixed statistics.
+.TP
+.B 0
+Backward-compatible alias for \fBnone\fR.
+.TP
+.B 1
+Backward-compatible alias for \fBmixed\fR.
+.TP
+.B 2
+Alias for \fBboth\fR.
+.RE
 .TP
 .BI randrepeat \fR=\fPbool
 Seed the random number generator used for random I/O patterns in a
@@ -1956,7 +1978,7 @@ The TCP or UDP port to bind to or connect to. If this is used with
 this will be the starting port number since fio will use a range of
 ports.
 .TP
-.BI (rdma)port
+.BI (rdma,librpma_*)port
 The port to use for RDMA-CM communication. This should be the same
 value on the client and the server side.
 .TP
@@ -1965,6 +1987,12 @@ The hostname or IP address to use for TCP, UDP or RDMA-CM based I/O.
 If the job is a TCP listener or UDP reader, the hostname is not used
 and must be omitted unless it is a valid UDP multicast address.
 .TP
+.BI (librpma_*)serverip \fR=\fPstr
+The IP address to be used for RDMA-CM based I/O.
+.TP
+.BI (librpma_*_server)direct_write_to_pmem \fR=\fPbool
+Set to 1 only when Direct Write to PMem from the remote host is possible. Otherwise, set to 0.
+.TP
 .BI (netsplice,net)interface \fR=\fPstr
 The IP address of the network interface used to send or receive UDP
 multicast.
diff --git a/optgroup.c b/optgroup.c
index 4cdea71f..15a16229 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -141,6 +141,10 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.name	= "RDMA I/O engine", /* rdma */
 		.mask	= FIO_OPT_G_RDMA,
 	},
+	{
+		.name	= "librpma I/O engines", /* librpma_apm && librpma_gpspm */
+		.mask	= FIO_OPT_G_LIBRPMA,
+	},
 	{
 		.name	= "libaio I/O engine", /* libaio */
 		.mask	= FIO_OPT_G_LIBAIO,
diff --git a/optgroup.h b/optgroup.h
index 25b7fec1..ff748629 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -52,6 +52,7 @@ enum opt_category_group {
 	__FIO_OPT_G_E4DEFRAG,
 	__FIO_OPT_G_NETIO,
 	__FIO_OPT_G_RDMA,
+	__FIO_OPT_G_LIBRPMA,
 	__FIO_OPT_G_LIBAIO,
 	__FIO_OPT_G_ACT,
 	__FIO_OPT_G_LATPROF,
@@ -95,6 +96,7 @@ enum opt_category_group {
 	FIO_OPT_G_E4DEFRAG	= (1ULL << __FIO_OPT_G_E4DEFRAG),
 	FIO_OPT_G_NETIO		= (1ULL << __FIO_OPT_G_NETIO),
 	FIO_OPT_G_RDMA		= (1ULL << __FIO_OPT_G_RDMA),
+	FIO_OPT_G_LIBRPMA	= (1ULL << __FIO_OPT_G_LIBRPMA),
 	FIO_OPT_G_LIBAIO	= (1ULL << __FIO_OPT_G_LIBAIO),
 	FIO_OPT_G_ACT		= (1ULL << __FIO_OPT_G_ACT),
 	FIO_OPT_G_LATPROF	= (1ULL << __FIO_OPT_G_LATPROF),
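In this enum, each `__FIO_OPT_G_*` enumerator is a bit position, and the matching `FIO_OPT_G_*` value is the corresponding single-bit mask. This lets an option belong to several groups, and a group entry such as the new "librpma I/O engines" one in `optgroup.c` can be matched with a bitwise AND. The sketch below is a reduced, hypothetical version of that pattern with only three groups. The names are invented for illustration, and it avoids fio's leading double underscores, which are reserved identifiers in standard C.

```c
#include <stdint.h>

/* Bit positions (hypothetical subset of enum opt_category_group). */
enum {
	G_BIT_RDMA,	/* 0 */
	G_BIT_LIBRPMA,	/* 1 */
	G_BIT_LIBAIO,	/* 2 */
};

/* One single-bit mask per group, built like FIO_OPT_G_LIBRPMA. */
#define G_RDMA		(1ULL << G_BIT_RDMA)
#define G_LIBRPMA	(1ULL << G_BIT_LIBRPMA)
#define G_LIBAIO	(1ULL << G_BIT_LIBAIO)

/* Does an option tagged with 'option_groups' belong to 'group'? */
static int option_in_group(uint64_t option_groups, uint64_t group)
{
	return (option_groups & group) != 0;
}
```

Inserting `__FIO_OPT_G_LIBRPMA` in the middle of the enum renumbers the later bits. This is harmless as long as the masks are only compared at run time and never stored persistently.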
diff --git a/options.c b/options.c
index 151e7a7e..ddabaa82 100644
--- a/options.c
+++ b/options.c
@@ -1945,6 +1945,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "RDMA IO engine",
 			  },
 #endif
+#ifdef CONFIG_LIBRPMA_APM
+			  { .ival = "librpma_apm",
+			    .help = "librpma IO engine in APM mode",
+			  },
+#endif
+#ifdef CONFIG_LIBRPMA_GPSPM
+			  { .ival = "librpma_gpspm",
+			    .help = "librpma IO engine in GPSPM mode",
+			  },
+#endif
 #ifdef CONFIG_LINUX_EXT4_MOVE_EXTENT
 			  { .ival = "e4defrag",
 			    .help = "ext4 defrag engine",
@@ -4623,12 +4633,39 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "unified_rw_reporting",
 		.lname	= "Unified RW Reporting",
-		.type	= FIO_OPT_BOOL,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, unified_rw_rep),
 		.help	= "Unify reporting across data direction",
-		.def	= "0",
+		.def	= "none",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
+		.posval	= {
+			  { .ival = "none",
+			    .oval = UNIFIED_SPLIT,
+			    .help = "Normal statistics reporting",
+			  },
+			  { .ival = "mixed",
+			    .oval = UNIFIED_MIXED,
+			    .help = "Statistics are summed across data directions and reported together",
+			  },
+			  { .ival = "both",
+			    .oval = UNIFIED_BOTH,
+			    .help = "Statistics are reported normally, followed by the mixed statistics"
+			  },
+			  /* Compatibility with former boolean values */
+			  { .ival = "0",
+			    .oval = UNIFIED_SPLIT,
+			    .help = "Alias for 'none'",
+			  },
+			  { .ival = "1",
+			    .oval = UNIFIED_MIXED,
+			    .help = "Alias for 'mixed'",
+			  },
+			  { .ival = "2",
+			    .oval = UNIFIED_BOTH,
+			    .help = "Alias for 'both'",
+			  },
+		},
 	},
 	{
 		.name	= "continue_on_error",
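Switching `.type` from `FIO_OPT_BOOL` to `FIO_OPT_STR` with a `.posval` table means the value is looked up by name and the matching `.oval` is stored. The numeric entries keep existing job files that say `unified_rw_reporting=1` working. The sketch below imitates that lookup with a hypothetical `struct posval` and `parse_unified_rw()`, and assumes the enum values 0/1/2. fio's real option parser handles more cases than this and is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, matching the "0"/"1"/"2" aliases below. */
enum { UNIFIED_SPLIT = 0, UNIFIED_MIXED = 1, UNIFIED_BOTH = 2 };

/* Hypothetical reduction of a .posval entry: accepted string -> stored value. */
struct posval {
	const char *ival;
	int oval;
};

static const struct posval unified_rw_posval[] = {
	{ "none",  UNIFIED_SPLIT },
	{ "mixed", UNIFIED_MIXED },
	{ "both",  UNIFIED_BOTH },
	/* compatibility with the former boolean values */
	{ "0",     UNIFIED_SPLIT },
	{ "1",     UNIFIED_MIXED },
	{ "2",     UNIFIED_BOTH },
};

/* Returns 0 and stores the mapped value, or -1 for an unknown string. */
static int parse_unified_rw(const char *str, int *out)
{
	size_t n = sizeof(unified_rw_posval) / sizeof(unified_rw_posval[0]);

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(str, unified_rw_posval[i].ival)) {
			*out = unified_rw_posval[i].oval;
			return 0;
		}
	}
	return -1;
}
```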
diff --git a/stat.c b/stat.c
index b7237953..b7222f46 100644
--- a/stat.c
+++ b/stat.c
@@ -282,6 +282,46 @@ bool calc_lat(struct io_stat *is, unsigned long long *min,
 	return true;
 }
 
+void show_mixed_group_stats(struct group_run_stats *rs, struct buf_output *out)
+{
+	char *io, *agg, *min, *max;
+	char *ioalt, *aggalt, *minalt, *maxalt;
+	uint64_t io_mix = 0, agg_mix = 0, min_mix = -1, max_mix = 0, min_run = -1, max_run = 0;
+	int i;
+	const int i2p = is_power_of_2(rs->kb_base);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		if (!rs->max_run[i])
+			continue;
+		io_mix += rs->iobytes[i];
+		agg_mix += rs->agg[i];
+		min_mix = min_mix < rs->min_bw[i] ? min_mix : rs->min_bw[i];
+		max_mix = max_mix > rs->max_bw[i] ? max_mix : rs->max_bw[i];
+		min_run = min_run < rs->min_run[i] ? min_run : rs->min_run[i];
+		max_run = max_run > rs->max_run[i] ? max_run : rs->max_run[i];
+	}
+	io = num2str(io_mix, rs->sig_figs, 1, i2p, N2S_BYTE);
+	ioalt = num2str(io_mix, rs->sig_figs, 1, !i2p, N2S_BYTE);
+	agg = num2str(agg_mix, rs->sig_figs, 1, i2p, rs->unit_base);
+	aggalt = num2str(agg_mix, rs->sig_figs, 1, !i2p, rs->unit_base);
+	min = num2str(min_mix, rs->sig_figs, 1, i2p, rs->unit_base);
+	minalt = num2str(min_mix, rs->sig_figs, 1, !i2p, rs->unit_base);
+	max = num2str(max_mix, rs->sig_figs, 1, i2p, rs->unit_base);
+	maxalt = num2str(max_mix, rs->sig_figs, 1, !i2p, rs->unit_base);
+	log_buf(out, "  MIXED: bw=%s (%s), %s-%s (%s-%s), io=%s (%s), run=%llu-%llumsec\n",
+			agg, aggalt, min, max, minalt, maxalt, io, ioalt,
+			(unsigned long long) min_run,
+			(unsigned long long) max_run);
+	free(io);
+	free(agg);
+	free(min);
+	free(max);
+	free(ioalt);
+	free(aggalt);
+	free(minalt);
+	free(maxalt);
+}
+
 void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 {
 	char *io, *agg, *min, *max;
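`show_mixed_group_stats()` sums `iobytes` and `agg` across the directions that actually ran (non-zero `max_run`). It takes the smallest `min_bw`/`min_run` and the largest `max_bw`/`max_run`. `min_mix` and `min_run` start at `-1`, which converts to `UINT64_MAX`, so the first active direction always sets the minimum. The sketch below shows just that reduction. It uses hypothetical `struct dir_stats` and `struct mixed_stats` types in place of `struct group_run_stats`.

```c
#include <stdint.h>

#define NR_DIRS 3	/* read, write, trim */

/* Hypothetical per-direction subset of struct group_run_stats. */
struct dir_stats {
	uint64_t iobytes[NR_DIRS], agg[NR_DIRS];
	uint64_t min_bw[NR_DIRS], max_bw[NR_DIRS];
	uint64_t min_run[NR_DIRS], max_run[NR_DIRS];
};

struct mixed_stats {
	uint64_t io, agg, min_bw, max_bw, min_run, max_run;
};

static struct mixed_stats mix_dirs(const struct dir_stats *rs)
{
	/* UINT64_MAX seeds the minimums; 0 seeds the sums and maximums. */
	struct mixed_stats m = { 0, 0, UINT64_MAX, 0, UINT64_MAX, 0 };

	for (int i = 0; i < NR_DIRS; i++) {
		if (!rs->max_run[i])	/* this direction did not run */
			continue;
		m.io += rs->iobytes[i];
		m.agg += rs->agg[i];
		if (rs->min_bw[i] < m.min_bw)
			m.min_bw = rs->min_bw[i];
		if (rs->max_bw[i] > m.max_bw)
			m.max_bw = rs->max_bw[i];
		if (rs->min_run[i] < m.min_run)
			m.min_run = rs->min_run[i];
		if (rs->max_run[i] > m.max_run)
			m.max_run = rs->max_run[i];
	}
	return m;
}
```

Skipping idle directions matters because their zero minimums would otherwise win the `min` comparisons.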
@@ -306,7 +346,7 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		max = num2str(rs->max_bw[i], rs->sig_figs, 1, i2p, rs->unit_base);
 		maxalt = num2str(rs->max_bw[i], rs->sig_figs, 1, !i2p, rs->unit_base);
 		log_buf(out, "%s: bw=%s (%s), %s-%s (%s-%s), io=%s (%s), run=%llu-%llumsec\n",
-				rs->unified_rw_rep ? "  MIXED" : str[i],
+				(rs->unified_rw_rep == UNIFIED_MIXED) ? "  MIXED" : str[i],
 				agg, aggalt, min, max, minalt, maxalt, io, ioalt,
 				(unsigned long long) rs->min_run[i],
 				(unsigned long long) rs->max_run[i]);
@@ -320,6 +360,10 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		free(minalt);
 		free(maxalt);
 	}
+
+	/* Need to aggregate statistics to show mixed values */
+	if (rs->unified_rw_rep == UNIFIED_BOTH)
+		show_mixed_group_stats(rs, out);
 }
 
 void stat_calc_dist(uint64_t *map, unsigned long total, double *io_u_dist)
@@ -426,6 +470,168 @@ static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, i
 	return p_of_agg;
 }
 
+static void show_mixed_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
+			     struct buf_output *out)
+{
+	unsigned long runt;
+	unsigned long long min, max, bw, iops;
+	double mean, dev;
+	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
+	struct thread_stat *ts_lcl;
+
+	int i2p;
+	int ddir = 0, i;
+
+	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	ts_lcl = malloc(sizeof(struct thread_stat));
+	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
+		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
+	}
+	ts_lcl->sync_stat.min_val = ULONG_MAX;
+
+	sum_thread_stats(ts_lcl, ts, 1);
+
+	assert(ddir_rw(ddir));
+
+	if (!ts_lcl->runtime[ddir]) {
+		free(ts_lcl);
+		return;
+	}
+
+	i2p = is_power_of_2(rs->kb_base);
+	runt = ts_lcl->runtime[ddir];
+
+	bw = (1000 * ts_lcl->io_bytes[ddir]) / runt;
+	io_p = num2str(ts_lcl->io_bytes[ddir], ts->sig_figs, 1, i2p, N2S_BYTE);
+	bw_p = num2str(bw, ts->sig_figs, 1, i2p, ts->unit_base);
+	bw_p_alt = num2str(bw, ts->sig_figs, 1, !i2p, ts->unit_base);
+
+	iops = (1000 * ts_lcl->total_io_u[ddir]) / runt;
+	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
+
+	log_buf(out, "  mixed: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
+			iops_p, bw_p, bw_p_alt, io_p,
+			(unsigned long long) ts_lcl->runtime[ddir],
+			post_st ? : "");
+
+	free(post_st);
+	free(io_p);
+	free(bw_p);
+	free(bw_p_alt);
+	free(iops_p);
+
+	if (calc_lat(&ts_lcl->slat_stat[ddir], &min, &max, &mean, &dev))
+		display_lat("slat", min, max, mean, dev, out);
+	if (calc_lat(&ts_lcl->clat_stat[ddir], &min, &max, &mean, &dev))
+		display_lat("clat", min, max, mean, dev, out);
+	if (calc_lat(&ts_lcl->lat_stat[ddir], &min, &max, &mean, &dev))
+		display_lat(" lat", min, max, mean, dev, out);
+	if (calc_lat(&ts_lcl->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
+		display_lat(ts->lat_percentiles ? "high prio_lat" : "high prio_clat",
+				min, max, mean, dev, out);
+		if (calc_lat(&ts_lcl->clat_low_prio_stat[ddir], &min, &max, &mean, &dev))
+			display_lat(ts->lat_percentiles ? "low prio_lat" : "low prio_clat",
+					min, max, mean, dev, out);
+	}
+
+	if (ts->slat_percentiles && ts_lcl->slat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts_lcl->io_u_plat[FIO_SLAT][ddir],
+				ts_lcl->slat_stat[ddir].samples,
+				ts->percentile_list,
+				ts->percentile_precision, "slat", out);
+	if (ts->clat_percentiles && ts_lcl->clat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts_lcl->io_u_plat[FIO_CLAT][ddir],
+				ts_lcl->clat_stat[ddir].samples,
+				ts->percentile_list,
+				ts->percentile_precision, "clat", out);
+	if (ts->lat_percentiles && ts_lcl->lat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts_lcl->io_u_plat[FIO_LAT][ddir],
+				ts_lcl->lat_stat[ddir].samples,
+				ts->percentile_list,
+				ts->percentile_precision, "lat", out);
+
+	if (ts->clat_percentiles || ts->lat_percentiles) {
+		const char *name = ts->lat_percentiles ? "lat" : "clat";
+		char prio_name[32];
+		uint64_t samples;
+
+		if (ts->lat_percentiles)
+			samples = ts_lcl->lat_stat[ddir].samples;
+		else
+			samples = ts_lcl->clat_stat[ddir].samples;
+
+		/* Only print this if some high and low priority stats were collected */
+		if (ts_lcl->clat_high_prio_stat[ddir].samples > 0 &&
+				ts_lcl->clat_low_prio_stat[ddir].samples > 0)
+		{
+			sprintf(prio_name, "high prio (%.2f%%) %s",
+					100. * (double) ts_lcl->clat_high_prio_stat[ddir].samples / (double) samples,
+					name);
+			show_clat_percentiles(ts_lcl->io_u_plat_high_prio[ddir],
+					ts_lcl->clat_high_prio_stat[ddir].samples,
+					ts->percentile_list,
+					ts->percentile_precision, prio_name, out);
+
+			sprintf(prio_name, "low prio (%.2f%%) %s",
+					100. * (double) ts_lcl->clat_low_prio_stat[ddir].samples / (double) samples,
+					name);
+			show_clat_percentiles(ts_lcl->io_u_plat_low_prio[ddir],
+					ts_lcl->clat_low_prio_stat[ddir].samples,
+					ts->percentile_list,
+					ts->percentile_precision, prio_name, out);
+		}
+	}
+
+	if (calc_lat(&ts_lcl->bw_stat[ddir], &min, &max, &mean, &dev)) {
+		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
+		const char *bw_str;
+
+		if ((rs->unit_base == 1) && i2p)
+			bw_str = "Kibit";
+		else if (rs->unit_base == 1)
+			bw_str = "kbit";
+		else if (i2p)
+			bw_str = "KiB";
+		else
+			bw_str = "kB";
+
+		p_of_agg = convert_agg_kbytes_percent(rs, ddir, mean);
+
+		if (rs->unit_base == 1) {
+			min *= 8.0;
+			max *= 8.0;
+			mean *= 8.0;
+			dev *= 8.0;
+		}
+
+		if (mean > fkb_base * fkb_base) {
+			min /= fkb_base;
+			max /= fkb_base;
+			mean /= fkb_base;
+			dev /= fkb_base;
+			bw_str = (rs->unit_base == 1 ? "Mibit" : "MiB");
+		}
+
+		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, "
+			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
+			bw_str, min, max, p_of_agg, mean, dev,
+			(&ts_lcl->bw_stat[ddir])->samples);
+	}
+	if (calc_lat(&ts_lcl->iops_stat[ddir], &min, &max, &mean, &dev)) {
+		log_buf(out, "   iops        : min=%5llu, max=%5llu, "
+			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
+			min, max, mean, dev, (&ts_lcl->iops_stat[ddir])->samples);
+	}
+
+	free(ts_lcl);
+}
+
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
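In `show_mixed_ddir_status()`, `runtime[]` is kept in milliseconds. Multiplying by 1000 before dividing therefore turns byte and I/O counts into per-second rates using integer arithmetic only. The small sketch below shows that arithmetic; `per_second()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Per-second rate from a count accumulated over runt_ms milliseconds. */
static uint64_t per_second(uint64_t count, uint64_t runt_ms)
{
	if (!runt_ms)		/* the caller returns early in this case */
		return 0;
	return (1000 * count) / runt_ms;
}
```

Multiplying first preserves precision that dividing first would truncate, at the cost of overflowing only for counts above roughly 1.8e16.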
@@ -477,7 +683,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	}
 
 	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
-			rs->unified_rw_rep ? "mixed" : io_ddir_name(ddir),
+			(ts->unified_rw_rep == UNIFIED_MIXED) ? "mixed" : io_ddir_name(ddir),
 			iops_p, bw_p, bw_p_alt, io_p,
 			(unsigned long long) ts->runtime[ddir],
 			post_st ? : "");
@@ -1083,6 +1289,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 			show_ddir_status(rs, ts, ddir, out);
 	}
 
+	if (ts->unified_rw_rep == UNIFIED_BOTH)
+		show_mixed_ddir_status(rs, ts, out);
+
 	show_latencies(ts, out);
 
 	if (ts->sync_stat.samples)
@@ -1205,7 +1414,7 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 					&minv);
 	else
 		len = 0;
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
 		if (i >= len) {
 			log_buf(out, ";0%%=0");
@@ -1249,6 +1458,40 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	}
 }
 
+static void show_mixed_ddir_status_terse(struct thread_stat *ts,
+				   struct group_run_stats *rs,
+				   int ver, struct buf_output *out)
+{
+	struct thread_stat *ts_lcl;
+	int i;
+
+	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	ts_lcl = malloc(sizeof(struct thread_stat));
+	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
+		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
+	}
+	ts_lcl->sync_stat.min_val = ULONG_MAX;
+	ts_lcl->lat_percentiles = ts->lat_percentiles;
+	ts_lcl->clat_percentiles = ts->clat_percentiles;
+	ts_lcl->slat_percentiles = ts->slat_percentiles;
+	ts_lcl->percentile_precision = ts->percentile_precision;		
+	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
+	
+	sum_thread_stats(ts_lcl, ts, 1);
+
+	/* print the aggregated stats in terse format */
+	show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
+	free(ts_lcl);
+}
+
 static struct json_object *add_ddir_lat_json(struct thread_stat *ts, uint32_t percentiles,
 		struct io_stat *lat_stat, uint64_t *io_u_plat)
 {
@@ -1310,12 +1553,12 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 	assert(ddir_rw(ddir) || ddir_sync(ddir));
 
-	if (ts->unified_rw_rep && ddir != DDIR_READ)
+	if ((ts->unified_rw_rep == UNIFIED_MIXED) && ddir != DDIR_READ)
 		return;
 
 	dir_object = json_create_object();
 	json_object_add_value_object(parent,
-		ts->unified_rw_rep ? "mixed" : io_ddir_name(ddir), dir_object);
+		(ts->unified_rw_rep == UNIFIED_MIXED) ? "mixed" : io_ddir_name(ddir), dir_object);
 
 	if (ddir_rw(ddir)) {
 		bw_bytes = 0;
@@ -1418,6 +1661,39 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	}
 }
 
+static void add_mixed_ddir_status_json(struct thread_stat *ts,
+		struct group_run_stats *rs, struct json_object *parent)
+{
+	struct thread_stat *ts_lcl;
+	int i;
+
+	/* Handle aggregation of Reads (ddir = 0), Writes (ddir = 1), and Trims (ddir = 2) */
+	ts_lcl = malloc(sizeof(struct thread_stat));
+	memset((void *)ts_lcl, 0, sizeof(struct thread_stat));
+	ts_lcl->unified_rw_rep = UNIFIED_MIXED;               /* calculate mixed stats  */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		ts_lcl->clat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->slat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->lat_stat[i].min_val = ULONG_MAX;
+		ts_lcl->bw_stat[i].min_val = ULONG_MAX;
+		ts_lcl->iops_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_high_prio_stat[i].min_val = ULONG_MAX;
+		ts_lcl->clat_low_prio_stat[i].min_val = ULONG_MAX;
+	}
+	ts_lcl->sync_stat.min_val = ULONG_MAX;
+	ts_lcl->lat_percentiles = ts->lat_percentiles;
+	ts_lcl->clat_percentiles = ts->clat_percentiles;
+	ts_lcl->slat_percentiles = ts->slat_percentiles;
+	ts_lcl->percentile_precision = ts->percentile_precision;		
+	memcpy(ts_lcl->percentile_list, ts->percentile_list, sizeof(ts->percentile_list));
+
+	sum_thread_stats(ts_lcl, ts, 1);
+
+	/* add the aggregated stats to json parent */
+	add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
+	free(ts_lcl);
+}
+
 static void show_thread_status_terse_all(struct thread_stat *ts,
 					 struct group_run_stats *rs, int ver,
 					 struct buf_output *out)
@@ -1435,14 +1711,17 @@ static void show_thread_status_terse_all(struct thread_stat *ts,
 		log_buf(out, "%d;%s;%s;%d;%d", ver, fio_version_string,
 			ts->name, ts->groupid, ts->error);
 
-	/* Log Read Status */
+	/* Log Read Status, or mixed if unified_rw_rep = 1 */
 	show_ddir_status_terse(ts, rs, DDIR_READ, ver, out);
-	/* Log Write Status */
-	show_ddir_status_terse(ts, rs, DDIR_WRITE, ver, out);
-	/* Log Trim Status */
-	if (ver == 2 || ver == 4 || ver == 5)
-		show_ddir_status_terse(ts, rs, DDIR_TRIM, ver, out);
-
+	if (ts->unified_rw_rep != UNIFIED_MIXED) {
+		/* Log Write Status */
+		show_ddir_status_terse(ts, rs, DDIR_WRITE, ver, out);
+		/* Log Trim Status */
+		if (ver == 2 || ver == 4 || ver == 5)
+			show_ddir_status_terse(ts, rs, DDIR_TRIM, ver, out);
+	}
+	if (ts->unified_rw_rep == UNIFIED_BOTH)
+		show_mixed_ddir_status_terse(ts, rs, ver, out);
 	/* CPU Usage */
 	if (ts->total_run_time) {
 		double runt = (double) ts->total_run_time;
@@ -1547,6 +1826,9 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	add_ddir_status_json(ts, rs, DDIR_TRIM, root);
 	add_ddir_status_json(ts, rs, DDIR_SYNC, root);
 
+	if (ts->unified_rw_rep == UNIFIED_BOTH)
+		add_mixed_ddir_status_json(ts, rs, root);
+
 	/* CPU Usage */
 	if (ts->total_run_time) {
 		double runt = (double) ts->total_run_time;
@@ -1875,7 +2157,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	int k, l, m;
 
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
-		if (!dst->unified_rw_rep) {
+		if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
 			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
 			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], first, false);
@@ -1931,7 +2213,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
 
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
-		if (!dst->unified_rw_rep) {
+		if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
 			dst->total_io_u[k] += src->total_io_u[k];
 			dst->short_io_u[k] += src->short_io_u[k];
 			dst->drop_io_u[k] += src->drop_io_u[k];
@@ -1947,7 +2229,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	for (k = 0; k < FIO_LAT_CNT; k++)
 		for (l = 0; l < DDIR_RWDIR_CNT; l++)
 			for (m = 0; m < FIO_IO_U_PLAT_NR; m++)
-				if (!dst->unified_rw_rep)
+				if (!(dst->unified_rw_rep == UNIFIED_MIXED))
 					dst->io_u_plat[k][l][m] += src->io_u_plat[k][l][m];
 				else
 					dst->io_u_plat[k][0][m] += src->io_u_plat[k][l][m];
@@ -1957,7 +2239,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
-			if (!dst->unified_rw_rep) {
+			if (!(dst->unified_rw_rep == UNIFIED_MIXED)) {
 				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
 				dst->io_u_plat_low_prio[k][m] += src->io_u_plat_low_prio[k][m];
 			} else {
@@ -2166,7 +2448,7 @@ void __show_run_stats(void)
 		rs->kb_base = ts->kb_base;
 		rs->unit_base = ts->unit_base;
 		rs->sig_figs = ts->sig_figs;
-		rs->unified_rw_rep += ts->unified_rw_rep;
+		rs->unified_rw_rep |= ts->unified_rw_rep;
 
 		for (j = 0; j < DDIR_RWDIR_CNT; j++) {
 			if (!ts->runtime[j])
diff --git a/stat.h b/stat.h
index 6dd5ef74..d08d4dc0 100644
--- a/stat.h
+++ b/stat.h
@@ -146,6 +146,9 @@ enum block_info_state {
 #define FIO_JOBNAME_SIZE	128
 #define FIO_JOBDESC_SIZE	256
 #define FIO_VERROR_SIZE		128
+#define UNIFIED_SPLIT		0
+#define UNIFIED_MIXED		1
+#define UNIFIED_BOTH		2
 
 enum fio_lat {
 	FIO_SLAT = 0,

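Both show_mixed_ddir_status_terse() and add_mixed_ddir_status_json() zero a scratch struct thread_stat and then set every min_val to ULONG_MAX before calling sum_thread_stats(). A running minimum that starts at 0 can never move, because no latency or bandwidth sample is smaller than 0. The sketch below isolates that idea. struct toy_stat, toy_stat_init() and toy_stat_add() are hypothetical and are not fio's struct io_stat or sum_stat().

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical running statistic; not fio's struct io_stat. */
struct toy_stat {
	unsigned long min_val;	/* must start at ULONG_MAX, not 0 */
	unsigned long max_val;
	uint64_t samples;
};

/* Start an empty statistic with a sentinel minimum. */
static void toy_stat_init(struct toy_stat *s)
{
	s->min_val = ULONG_MAX;
	s->max_val = 0;
	s->samples = 0;
}

/* Fold one sample into the running min/max. */
static void toy_stat_add(struct toy_stat *s, unsigned long val)
{
	if (val < s->min_val)
		s->min_val = val;
	if (val > s->max_val)
		s->max_val = val;
	s->samples++;
}
```

With a zero-initialized struct toy_stat, the minimum stays at 0 after toy_stat_add(&s, 7). After toy_stat_init(), the same call yields a minimum of 7.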

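__show_run_stats() now merges each job's unified_rw_rep into the group run stats with |= instead of +=. Under the new encoding (UNIFIED_SPLIT 0, UNIFIED_MIXED 1, UNIFIED_BOTH 2), summing two jobs that both use UNIFIED_MIXED gives 2, which reads as UNIFIED_BOTH. OR-ing them keeps the result at 1. The sketch below reproduces only that arithmetic. merge_add() and merge_or() are hypothetical helpers. Note that a group mixing UNIFIED_MIXED and UNIFIED_BOTH jobs yields 3 under either operator, which matches neither constant.

```c
/* Values copied from the stat.h hunk above. */
#define UNIFIED_SPLIT	0
#define UNIFIED_MIXED	1
#define UNIFIED_BOTH	2

/* Hypothetical: aggregate per-job settings the old way (+=). */
static int merge_add(const int *jobs, int n)
{
	int acc = 0;

	for (int i = 0; i < n; i++)
		acc += jobs[i];
	return acc;
}

/* Hypothetical: aggregate per-job settings the new way (|=). */
static int merge_or(const int *jobs, int n)
{
	int acc = 0;

	for (int i = 0; i < n; i++)
		acc |= jobs[i];
	return acc;
}
```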

* Recent changes (master)
@ 2021-03-18 12:00 Jens Axboe
From: Jens Axboe @ 2021-03-18 12:00 UTC
  To: fio

The following changes since commit 014ab48afcbcf442464acc7427fcd0f194f64bf4:

  Merge branch 'dev_luye_github' of https://github.com/louisluSCU/fio (2021-03-11 11:50:49 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dede9b9fae3ab670c1ca864ac66aea5e997e1f34:

  Merge branch 'free-dump-options' of https://github.com/floatious/fio (2021-03-17 09:25:46 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'dfs_engine' of https://github.com/johannlombardi/fio
      Merge branch 'patch-1' of https://github.com/ihsinme/fio
      Merge branch 'free-dump-options' of https://github.com/floatious/fio

Johann Lombardi (2):
      Disable pthread_condattr_setclock on cygwin
      engines/dfs: add DAOS File System (dfs) engine

Niklas Cassel (1):
      options: free dump options list on exit

ihsinme (1):
      fix loop with unreachable exit condition

 HOWTO            |  21 ++
 Makefile         |   5 +
 backend.c        |   1 +
 configure        |  52 ++++-
 engines/dfs.c    | 583 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/dfs.fio |  33 ++++
 fio.1            |  18 ++
 init.c           |  13 --
 optgroup.c       |   4 +
 optgroup.h       |   2 +
 options.c        |  18 ++
 options.h        |   1 +
 server.c         |   2 +-
 13 files changed, 730 insertions(+), 23 deletions(-)
 create mode 100644 engines/dfs.c
 create mode 100644 examples/dfs.fio
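
The new dfs engine shares a single pool connection, container handle and DFS mount across all fio threads. daos_fio_init() connects under daos_mutex only when daos_initialized is false, and daos_fio_cleanup() tears the stack down when num_threads drops to zero. The sketch below shows that reference-counting pattern in isolation. shared_get(), shared_put() and the connects/disconnects counters are hypothetical placeholders for the DAOS calls, not part of libdaos or fio. The error-path cleanup that daos_fio_init() also performs is omitted.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t shared_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool shared_up;			/* mirrors daos_initialized */
static int shared_users;		/* mirrors num_threads */
static int connects, disconnects;	/* stand-ins for the real setup/teardown */

/* Per-thread init hook: connect only on first use. */
static void shared_get(void)
{
	pthread_mutex_lock(&shared_mutex);
	if (!shared_up) {
		connects++;	/* placeholder: pool connect, cont open, dfs mount */
		shared_up = true;
	}
	shared_users++;
	pthread_mutex_unlock(&shared_mutex);
}

/* Per-thread cleanup hook: tear down when the last user leaves. */
static void shared_put(void)
{
	pthread_mutex_lock(&shared_mutex);
	if (--shared_users == 0 && shared_up) {
		disconnects++;	/* placeholder: umount, cont close, disconnect, fini */
		shared_up = false;
	}
	pthread_mutex_unlock(&shared_mutex);
}
```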

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 1e5ebd5d..041b91fa 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2067,6 +2067,9 @@ I/O engine
 			unless :option:`verify` is set or :option:`cuda_io` is `posix`.
 			:option:`iomem` must not be `cudamalloc`. This ioengine defines
 			engine specific options.
+		**dfs**
+			I/O engine supporting asynchronous read and write operations to the
+			DAOS File System (DFS) via libdfs.
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2452,6 +2455,24 @@ with the caveat that when used on the command line, they must come after the
 		GPU to RAM before a write and copied from RAM to GPU after a
 		read. :option:`verify` does not affect use of cudaMemcpy.
 
+.. option:: pool=str : [dfs]
+
+	Specify the UUID of the DAOS pool to connect to.
+
+.. option:: cont=str : [dfs]
+
+	Specify the UUID of the DAOS container to open.
+
+.. option:: chunk_size=int : [dfs]
+
+	Specify a different chunk size (in bytes) for the dfs file.
+	Defaults to the DAOS container's chunk size.
+
+.. option:: object_class=str : [dfs]
+
+	Specify a different object class for the dfs file.
+	Defaults to the DAOS container's object class.
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index 612344d1..87a47b66 100644
--- a/Makefile
+++ b/Makefile
@@ -130,6 +130,11 @@ ifdef CONFIG_HTTP
   http_LIBS = -lcurl -lssl -lcrypto
   ENGINES += http
 endif
+ifdef CONFIG_DFS
+  dfs_SRCS = engines/dfs.c
+  dfs_LIBS = -luuid -ldaos -ldfs
+  ENGINES += dfs
+endif
 SOURCE += oslib/asprintf.c
 ifndef CONFIG_STRSEP
   SOURCE += oslib/strsep.c
diff --git a/backend.c b/backend.c
index f2efddd6..52b4ca7e 100644
--- a/backend.c
+++ b/backend.c
@@ -2537,6 +2537,7 @@ int fio_backend(struct sk_out *sk_out)
 	for_each_td(td, i) {
 		steadystate_free(td);
 		fio_options_free(td);
+		fio_dump_options_free(td);
 		if (td->rusage_sem) {
 			fio_sem_remove(td->rusage_sem);
 			td->rusage_sem = NULL;
diff --git a/configure b/configure
index 71b31868..d79f6521 100755
--- a/configure
+++ b/configure
@@ -171,6 +171,7 @@ march_set="no"
 libiscsi="no"
 libnbd="no"
 libzbc=""
+dfs=""
 dynamic_engines="no"
 prefix=/usr/local
 
@@ -242,6 +243,8 @@ for opt do
   ;;
   --dynamic-libengines) dynamic_engines="yes"
   ;;
+  --disable-dfs) dfs="no"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -284,6 +287,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc	Disable tcmalloc support"
   echo "--dynamic-libengines	Lib-based ioengines as dynamic libraries"
+  echo "--disable-dfs		Disable DAOS File System support even if found"
   exit $exit_val
 fi
 
@@ -413,6 +417,7 @@ CYGWIN*)
   clock_gettime="yes" # clock_monotonic probe has dependency on this
   clock_monotonic="yes"
   sched_idle="yes"
+  pthread_condattr_setclock="no"
   ;;
 esac
 
@@ -758,10 +763,8 @@ print_config "POSIX pshared support" "$posix_pshared"
 
 ##########################################
 # POSIX pthread_condattr_setclock() probe
-if test "$pthread_condattr_setclock" != "yes" ; then
-  pthread_condattr_setclock="no"
-fi
-cat > $TMPC <<EOF
+if test "$pthread_condattr_setclock" != "no" ; then
+  cat > $TMPC <<EOF
 #include <pthread.h>
 int main(void)
 {
@@ -770,11 +773,12 @@ int main(void)
   return 0;
 }
 EOF
-if compile_prog "" "$LIBS" "pthread_condattr_setclock" ; then
-  pthread_condattr_setclock=yes
-elif compile_prog "" "$LIBS -lpthread" "pthread_condattr_setclock" ; then
-  pthread_condattr_setclock=yes
-  LIBS="$LIBS -lpthread"
+  if compile_prog "" "$LIBS" "pthread_condattr_setclock" ; then
+    pthread_condattr_setclock=yes
+  elif compile_prog "" "$LIBS -lpthread" "pthread_condattr_setclock" ; then
+    pthread_condattr_setclock=yes
+    LIBS="$LIBS -lpthread"
+  fi
 fi
 print_config "pthread_condattr_setclock()" "$pthread_condattr_setclock"
 
@@ -2179,6 +2183,33 @@ if test "$libnbd" != "no" ; then
 fi
 print_config "NBD engine" "$libnbd"
 
+##########################################
+# check for dfs (DAOS File System)
+if test "$dfs" != "no" ; then
+  cat > $TMPC << EOF
+#include <fcntl.h>
+#include <daos.h>
+#include <daos_fs.h>
+
+int main(int argc, char **argv)
+{
+  daos_handle_t	poh;
+  daos_handle_t	coh;
+  dfs_t		*dfs;
+
+  (void) dfs_mount(poh, coh, O_RDWR, &dfs);
+
+  return 0;
+}
+EOF
+  if compile_prog "" "-luuid -ldfs -ldaos" "dfs"; then
+    dfs="yes"
+  else
+    dfs="no"
+  fi
+fi
+print_config "DAOS File System (dfs) Engine" "$dfs"
+
 ##########################################
 # Check if we have lex/yacc available
 yacc="no"
@@ -2988,6 +3019,9 @@ fi
 if test "$libcufile" = "yes" ; then
   output_sym "CONFIG_LIBCUFILE"
 fi
+if test "$dfs" = "yes" ; then
+  output_sym "CONFIG_DFS"
+fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
diff --git a/engines/dfs.c b/engines/dfs.c
new file mode 100644
index 00000000..0343b101
--- /dev/null
+++ b/engines/dfs.c
@@ -0,0 +1,583 @@
+/**
+ * FIO engine for DAOS File System (dfs).
+ *
+ * (C) Copyright 2020-2021 Intel Corporation.
+ */
+
+#include <fio.h>
+#include <optgroup.h>
+
+#include <daos.h>
+#include <daos_fs.h>
+
+static bool		daos_initialized;
+static int		num_threads;
+static pthread_mutex_t	daos_mutex = PTHREAD_MUTEX_INITIALIZER;
+daos_handle_t		poh;  /* pool handle */
+daos_handle_t		coh;  /* container handle */
+daos_oclass_id_t	cid = OC_UNKNOWN;  /* object class */
+dfs_t			*dfs; /* dfs mount reference */
+
+struct daos_iou {
+	struct io_u	*io_u;
+	daos_event_t	ev;
+	d_sg_list_t	sgl;
+	d_iov_t		iov;
+	daos_size_t	size;
+	bool		complete;
+};
+
+struct daos_data {
+	daos_handle_t	eqh;
+	dfs_obj_t	*obj;
+	struct io_u	**io_us;
+	int		queued;
+	int		num_ios;
+};
+
+struct daos_fio_options {
+	void		*pad;
+	char		*pool;   /* Pool UUID */
+	char		*cont;   /* Container UUID */
+	daos_size_t	chsz;    /* Chunk size */
+	char		*oclass; /* object class */
+#if !defined(DAOS_API_VERSION_MAJOR) || DAOS_API_VERSION_MAJOR < 1
+	char		*svcl;   /* service replica list, deprecated */
+#endif
+};
+
+static struct fio_option options[] = {
+	{
+		.name		= "pool",
+		.lname		= "pool uuid",
+		.type		= FIO_OPT_STR_STORE,
+		.off1		= offsetof(struct daos_fio_options, pool),
+		.help		= "DAOS pool uuid",
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_DFS,
+	},
+	{
+		.name           = "cont",
+		.lname          = "container uuid",
+		.type           = FIO_OPT_STR_STORE,
+		.off1           = offsetof(struct daos_fio_options, cont),
+		.help           = "DAOS container uuid",
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_DFS,
+	},
+	{
+		.name           = "chunk_size",
+		.lname          = "DFS chunk size",
+		.type           = FIO_OPT_ULL,
+		.off1           = offsetof(struct daos_fio_options, chsz),
+		.help           = "DFS chunk size in bytes",
+		.def		= "0", /* use container default */
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_DFS,
+	},
+	{
+		.name           = "object_class",
+		.lname          = "object class",
+		.type           = FIO_OPT_STR_STORE,
+		.off1           = offsetof(struct daos_fio_options, oclass),
+		.help           = "DAOS object class",
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_DFS,
+	},
+#if !defined(DAOS_API_VERSION_MAJOR) || DAOS_API_VERSION_MAJOR < 1
+	{
+		.name           = "svcl",
+		.lname          = "List of service ranks",
+		.type           = FIO_OPT_STR_STORE,
+		.off1           = offsetof(struct daos_fio_options, svcl),
+		.help           = "List of pool replicated service ranks",
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_DFS,
+	},
+#endif
+	{
+		.name           = NULL,
+	},
+};
+
+static int daos_fio_global_init(struct thread_data *td)
+{
+	struct daos_fio_options	*eo = td->eo;
+	uuid_t			pool_uuid, co_uuid;
+	daos_pool_info_t	pool_info;
+	daos_cont_info_t	co_info;
+	int			rc = 0;
+
+#if !defined(DAOS_API_VERSION_MAJOR) || DAOS_API_VERSION_MAJOR < 1
+	if (!eo->pool || !eo->cont || !eo->svcl) {
+#else
+	if (!eo->pool || !eo->cont) {
+#endif
+		log_err("Missing required DAOS options\n");
+		return EINVAL;
+	}
+
+	rc = daos_init();
+	if (rc != -DER_ALREADY && rc) {
+		log_err("Failed to initialize daos %d\n", rc);
+		td_verror(td, rc, "daos_init");
+		return rc;
+	}
+
+	rc = uuid_parse(eo->pool, pool_uuid);
+	if (rc) {
+		log_err("Failed to parse 'Pool uuid': %s\n", eo->pool);
+		td_verror(td, EINVAL, "uuid_parse(eo->pool)");
+		return EINVAL;
+	}
+
+	rc = uuid_parse(eo->cont, co_uuid);
+	if (rc) {
+		log_err("Failed to parse 'Cont uuid': %s\n", eo->cont);
+		td_verror(td, EINVAL, "uuid_parse(eo->cont)");
+		return EINVAL;
+	}
+
+	/* Connect to the DAOS pool */
+#if !defined(DAOS_API_VERSION_MAJOR) || DAOS_API_VERSION_MAJOR < 1
+	d_rank_list_t *svcl = NULL;
+
+	svcl = daos_rank_list_parse(eo->svcl, ":");
+	if (svcl == NULL) {
+		log_err("Failed to parse svcl\n");
+		td_verror(td, EINVAL, "daos_rank_list_parse");
+		return EINVAL;
+	}
+
+	rc = daos_pool_connect(pool_uuid, NULL, svcl, DAOS_PC_RW,
+			&poh, &pool_info, NULL);
+	d_rank_list_free(svcl);
+#else
+	rc = daos_pool_connect(pool_uuid, NULL, DAOS_PC_RW, &poh, &pool_info,
+			       NULL);
+#endif
+	if (rc) {
+		log_err("Failed to connect to pool %d\n", rc);
+		td_verror(td, rc, "daos_pool_connect");
+		return rc;
+	}
+
+	/* Open the DAOS container */
+	rc = daos_cont_open(poh, co_uuid, DAOS_COO_RW, &coh, &co_info, NULL);
+	if (rc) {
+		log_err("Failed to open container: %d\n", rc);
+		td_verror(td, rc, "daos_cont_open");
+		(void)daos_pool_disconnect(poh, NULL);
+		return rc;
+	}
+
+	/* Mount encapsulated filesystem */
+	rc = dfs_mount(poh, coh, O_RDWR, &dfs);
+	if (rc) {
+		log_err("Failed to mount DFS namespace: %d\n", rc);
+		td_verror(td, rc, "dfs_mount");
+		(void)daos_pool_disconnect(poh, NULL);
+		(void)daos_cont_close(coh, NULL);
+		return rc;
+	}
+
+	/* Retrieve object class to use, if specified */
+	if (eo->oclass)
+		cid = daos_oclass_name2id(eo->oclass);
+
+	return 0;
+}
+
+static int daos_fio_global_cleanup()
+{
+	int rc;
+	int ret = 0;
+
+	rc = dfs_umount(dfs);
+	if (rc) {
+		log_err("failed to umount dfs: %d\n", rc);
+		ret = rc;
+	}
+	rc = daos_cont_close(coh, NULL);
+	if (rc) {
+		log_err("failed to close container: %d\n", rc);
+		if (ret == 0)
+			ret = rc;
+	}
+	rc = daos_pool_disconnect(poh, NULL);
+	if (rc) {
+		log_err("failed to disconnect pool: %d\n", rc);
+		if (ret == 0)
+			ret = rc;
+	}
+	rc = daos_fini();
+	if (rc) {
+		log_err("failed to finalize daos: %d\n", rc);
+		if (ret == 0)
+			ret = rc;
+	}
+
+	return ret;
+}
+
+static int daos_fio_setup(struct thread_data *td)
+{
+	return 0;
+}
+
+static int daos_fio_init(struct thread_data *td)
+{
+	struct daos_data	*dd;
+	int			rc = 0;
+
+	pthread_mutex_lock(&daos_mutex);
+
+	dd = malloc(sizeof(*dd));
+	if (dd == NULL) {
+		log_err("Failed to allocate DAOS-private data\n");
+		rc = ENOMEM;
+		goto out;
+	}
+
+	dd->queued	= 0;
+	dd->num_ios	= td->o.iodepth;
+	dd->io_us	= calloc(dd->num_ios, sizeof(struct io_u *));
+	if (dd->io_us == NULL) {
+		log_err("Failed to allocate IO queue\n");
+		rc = ENOMEM;
+		goto out;
+	}
+
+	/* initialize DAOS stack if not already up */
+	if (!daos_initialized) {
+		rc = daos_fio_global_init(td);
+		if (rc)
+			goto out;
+		daos_initialized = true;
+	}
+
+	rc = daos_eq_create(&dd->eqh);
+	if (rc) {
+		log_err("Failed to create event queue: %d\n", rc);
+		td_verror(td, rc, "daos_eq_create");
+		goto out;
+	}
+
+	td->io_ops_data = dd;
+	num_threads++;
+out:
+	if (rc) {
+		if (dd) {
+			free(dd->io_us);
+			free(dd);
+		}
+		if (num_threads == 0 && daos_initialized) {
+			/* don't clobber error return value */
+			(void)daos_fio_global_cleanup();
+			daos_initialized = false;
+		}
+	}
+	pthread_mutex_unlock(&daos_mutex);
+	return rc;
+}
+
+static void daos_fio_cleanup(struct thread_data *td)
+{
+	struct daos_data	*dd = td->io_ops_data;
+	int			rc;
+
+	if (dd == NULL)
+		return;
+
+	rc = daos_eq_destroy(dd->eqh, DAOS_EQ_DESTROY_FORCE);
+	if (rc < 0) {
+		log_err("failed to destroy event queue: %d\n", rc);
+		td_verror(td, rc, "daos_eq_destroy");
+	}
+
+	free(dd->io_us);
+	free(dd);
+
+	pthread_mutex_lock(&daos_mutex);
+	num_threads--;
+	if (daos_initialized && num_threads == 0) {
+		int ret;
+
+		ret = daos_fio_global_cleanup();
+		if (ret < 0 && rc == 0) {
+			log_err("failed to clean up: %d\n", ret);
+			td_verror(td, ret, "daos_fio_global_cleanup");
+		}
+		daos_initialized = false;
+	}
+	pthread_mutex_unlock(&daos_mutex);
+}
+
+static int daos_fio_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	char		*file_name = f->file_name;
+	struct stat	stbuf = {0};
+	int		rc;
+
+	dprint(FD_FILE, "dfs stat %s\n", f->file_name);
+
+	if (!daos_initialized)
+		return 0;
+
+	rc = dfs_stat(dfs, NULL, file_name, &stbuf);
+	if (rc) {
+		log_err("Failed to stat %s: %d\n", f->file_name, rc);
+		td_verror(td, rc, "dfs_stat");
+		return rc;
+	}
+
+	f->real_file_size = stbuf.st_size;
+	return 0;
+}
+
+static int daos_fio_close(struct thread_data *td, struct fio_file *f)
+{
+	struct daos_data	*dd = td->io_ops_data;
+	int			rc;
+
+	dprint(FD_FILE, "dfs release %s\n", f->file_name);
+
+	rc = dfs_release(dd->obj);
+	if (rc) {
+		log_err("Failed to release %s: %d\n", f->file_name, rc);
+		td_verror(td, rc, "dfs_release");
+		return rc;
+	}
+
+	return 0;
+}
+
+static int daos_fio_open(struct thread_data *td, struct fio_file *f)
+{
+	struct daos_data	*dd = td->io_ops_data;
+	struct daos_fio_options	*eo = td->eo;
+	int			flags = 0;
+	int			rc;
+
+	dprint(FD_FILE, "dfs open %s (%s/%d/%d)\n",
+	       f->file_name, td_write(td) && !read_only ? "rw" : "r",
+	       td->o.create_on_open, td->o.allow_create);
+
+	if (td->o.create_on_open && td->o.allow_create)
+		flags |= O_CREAT;
+
+	if (td_write(td)) {
+		if (!read_only)
+			flags |= O_RDWR;
+		if (td->o.allow_create)
+			flags |= O_CREAT;
+	} else if (td_read(td)) {
+		flags |= O_RDONLY;
+	}
+
+	rc = dfs_open(dfs, NULL, f->file_name,
+		      S_IFREG | S_IRUSR | S_IWUSR,
+		      flags, cid, eo->chsz, NULL, &dd->obj);
+	if (rc) {
+		log_err("Failed to open %s: %d\n", f->file_name, rc);
+		td_verror(td, rc, "dfs_open");
+		return rc;
+	}
+
+	return 0;
+}
+
+static int daos_fio_unlink(struct thread_data *td, struct fio_file *f)
+{
+	int rc;
+
+	dprint(FD_FILE, "dfs remove %s\n", f->file_name);
+
+	rc = dfs_remove(dfs, NULL, f->file_name, false, NULL);
+	if (rc) {
+		log_err("Failed to remove %s: %d\n", f->file_name, rc);
+		td_verror(td, rc, "dfs_remove");
+		return rc;
+	}
+
+	return 0;
+}
+
+static int daos_fio_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	dprint(FD_FILE, "dfs invalidate %s\n", f->file_name);
+	return 0;
+}
+
+static void daos_fio_io_u_free(struct thread_data *td, struct io_u *io_u)
+{
+	struct daos_iou *io = io_u->engine_data;
+
+	if (io) {
+		io_u->engine_data = NULL;
+		free(io);
+	}
+}
+
+static int daos_fio_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	struct daos_iou *io;
+
+	io = malloc(sizeof(struct daos_iou));
+	if (!io) {
+		td_verror(td, ENOMEM, "malloc");
+		return ENOMEM;
+	}
+	io->io_u = io_u;
+	io_u->engine_data = io;
+	return 0;
+}
+
+static struct io_u * daos_fio_event(struct thread_data *td, int event)
+{
+	struct daos_data *dd = td->io_ops_data;
+
+	return dd->io_us[event];
+}
+
+static int daos_fio_getevents(struct thread_data *td, unsigned int min,
+			      unsigned int max, const struct timespec *t)
+{
+	struct daos_data	*dd = td->io_ops_data;
+	daos_event_t		*evp[max];
+	unsigned int		events = 0;
+	int			i;
+	int			rc;
+
+	while (events < min) {
+		rc = daos_eq_poll(dd->eqh, 0, DAOS_EQ_NOWAIT, max, evp);
+		if (rc < 0) {
+			log_err("Event poll failed: %d\n", rc);
+			td_verror(td, rc, "daos_eq_poll");
+			return events;
+		}
+
+		for (i = 0; i < rc; i++) {
+			struct daos_iou	*io;
+			struct io_u	*io_u;
+
+			io = container_of(evp[i], struct daos_iou, ev);
+			if (io->complete)
+				log_err("Completion on already completed I/O\n");
+
+			io_u = io->io_u;
+			if (io->ev.ev_error)
+				io_u->error = io->ev.ev_error;
+			else
+				io_u->resid = 0;
+
+			dd->io_us[events] = io_u;
+			dd->queued--;
+			daos_event_fini(&io->ev);
+			io->complete = true;
+			events++;
+		}
+	}
+
+	dprint(FD_IO, "dfs eq_poll returning %u (%u/%u)\n", events, min, max);
+
+	return events;
+}
+
+static enum fio_q_status daos_fio_queue(struct thread_data *td,
+					struct io_u *io_u)
+{
+	struct daos_data	*dd = td->io_ops_data;
+	struct daos_iou		*io = io_u->engine_data;
+	daos_off_t		offset = io_u->offset;
+	int			rc;
+
+	if (dd->queued == td->o.iodepth)
+		return FIO_Q_BUSY;
+
+	io->sgl.sg_nr = 1;
+	io->sgl.sg_nr_out = 0;
+	d_iov_set(&io->iov, io_u->xfer_buf, io_u->xfer_buflen);
+	io->sgl.sg_iovs = &io->iov;
+	io->size = io_u->xfer_buflen;
+
+	io->complete = false;
+	rc = daos_event_init(&io->ev, dd->eqh, NULL);
+	if (rc) {
+		log_err("Event init failed: %d\n", rc);
+		io_u->error = rc;
+		return FIO_Q_COMPLETED;
+	}
+
+	switch (io_u->ddir) {
+	case DDIR_WRITE:
+		rc = dfs_write(dfs, dd->obj, &io->sgl, offset, &io->ev);
+		if (rc) {
+			log_err("dfs_write failed: %d\n", rc);
+			io_u->error = rc;
+			return FIO_Q_COMPLETED;
+		}
+		break;
+	case DDIR_READ:
+		rc = dfs_read(dfs, dd->obj, &io->sgl, offset, &io->size,
+			      &io->ev);
+		if (rc) {
+			log_err("dfs_read failed: %d\n", rc);
+			io_u->error = rc;
+			return FIO_Q_COMPLETED;
+		}
+		break;
+	case DDIR_SYNC:
+		io_u->error = 0;
+		return FIO_Q_COMPLETED;
+	default:
+		dprint(FD_IO, "Invalid IO type: %d\n", io_u->ddir);
+		io_u->error = -DER_INVAL;
+		return FIO_Q_COMPLETED;
+	}
+
+	dd->queued++;
+	return FIO_Q_QUEUED;
+}
+
+static int daos_fio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
+{
+	return 0;
+}
+
+/* ioengine_ops for get_ioengine() */
+FIO_STATIC struct ioengine_ops ioengine = {
+	.name			= "dfs",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_DISKLESSIO | FIO_NODISKUTIL,
+
+	.setup			= daos_fio_setup,
+	.init			= daos_fio_init,
+	.prep			= daos_fio_prep,
+	.cleanup		= daos_fio_cleanup,
+
+	.open_file		= daos_fio_open,
+	.invalidate		= daos_fio_invalidate,
+	.get_file_size		= daos_fio_get_file_size,
+	.close_file		= daos_fio_close,
+	.unlink_file		= daos_fio_unlink,
+
+	.queue			= daos_fio_queue,
+	.getevents		= daos_fio_getevents,
+	.event			= daos_fio_event,
+	.io_u_init		= daos_fio_io_u_init,
+	.io_u_free		= daos_fio_io_u_free,
+
+	.option_struct_size	= sizeof(struct daos_fio_options),
+	.options		= options,
+};
+
+static void fio_init fio_dfs_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_dfs_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/dfs.fio b/examples/dfs.fio
new file mode 100644
index 00000000..5de887d7
--- /dev/null
+++ b/examples/dfs.fio
@@ -0,0 +1,33 @@
+[global]
+ioengine=dfs
+pool=${POOL}
+cont=${CONT}
+filename_format=fio-test.$jobnum
+
+cpus_allowed_policy=split
+group_reporting=1
+time_based=0
+percentile_list=99.0:99.9:99.99:99.999:99.9999:100
+disable_slat=1
+disable_clat=1
+
+bs=1M
+size=100G
+iodepth=16
+numjobs=16
+
+[daos-seqwrite]
+rw=write
+stonewall
+
+[daos-seqread]
+rw=read
+stonewall
+
+[daos-randwrite]
+rw=randwrite
+stonewall
+
+[daos-randread]
+rw=randread
+stonewall
diff --git a/fio.1 b/fio.1
index b95a67ab..27cf2f15 100644
--- a/fio.1
+++ b/fio.1
@@ -1856,6 +1856,10 @@ GPUDirect Storage-supported filesystem. This engine performs
 I/O without transferring buffers between user-space and the kernel,
 unless \fBverify\fR is set or \fBcuda_io\fR is \fBposix\fR. \fBiomem\fR must
 not be \fBcudamalloc\fR. This ioengine defines engine specific options.
+.TP
+.B dfs
+I/O engine supporting asynchronous read and write operations to the DAOS File
+System (DFS) via libdfs.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2209,6 +2213,20 @@ from RAM to GPU after a read. \fBverify\fR does not affect
 the use of cudaMemcpy.
 .RE
 .RE
+.TP
+.BI (dfs)pool
+Specify the UUID of the DAOS pool to connect to.
+.TP
+.BI (dfs)cont
+Specify the UUID of the DAOS container to open.
+.TP
+.BI (dfs)chunk_size
+Specify a different chunk size (in bytes) for the dfs file.
+Defaults to the DAOS container's chunk size.
+.TP
+.BI (dfs)object_class
+Specify a different object class for the dfs file.
+Defaults to the DAOS container's object class.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/init.c b/init.c
index e8530fec..37bff876 100644
--- a/init.c
+++ b/init.c
@@ -448,19 +448,6 @@ static void dump_opt_list(struct thread_data *td)
 	}
 }
 
-static void fio_dump_options_free(struct thread_data *td)
-{
-	while (!flist_empty(&td->opt_list)) {
-		struct print_option *p;
-
-		p = flist_first_entry(&td->opt_list, struct print_option, list);
-		flist_del_init(&p->list);
-		free(p->name);
-		free(p->value);
-		free(p);
-	}
-}
-
 static void copy_opt_list(struct thread_data *dst, struct thread_data *src)
 {
 	struct flist_head *entry;
diff --git a/optgroup.c b/optgroup.c
index 64774896..4cdea71f 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -177,6 +177,10 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.name	= "libcufile I/O engine", /* libcufile */
 		.mask	= FIO_OPT_G_LIBCUFILE,
 	},
+	{
+		.name	= "DAOS File System (dfs) I/O engine", /* dfs */
+		.mask	= FIO_OPT_G_DFS,
+	},
 	{
 		.name	= NULL,
 	},
diff --git a/optgroup.h b/optgroup.h
index d2f1ceb3..25b7fec1 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -68,6 +68,7 @@ enum opt_category_group {
 	__FIO_OPT_G_FILESTAT,
 	__FIO_OPT_G_NR,
 	__FIO_OPT_G_LIBCUFILE,
+	__FIO_OPT_G_DFS,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -110,6 +111,7 @@ enum opt_category_group {
 	FIO_OPT_G_IOURING	= (1ULL << __FIO_OPT_G_IOURING),
 	FIO_OPT_G_FILESTAT	= (1ULL << __FIO_OPT_G_FILESTAT),
 	FIO_OPT_G_LIBCUFILE	= (1ULL << __FIO_OPT_G_LIBCUFILE),
+	FIO_OPT_G_DFS		= (1ULL << __FIO_OPT_G_DFS),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);
diff --git a/options.c b/options.c
index 0791101e..151e7a7e 100644
--- a/options.c
+++ b/options.c
@@ -2011,6 +2011,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  { .ival = "nbd",
 			    .help = "Network Block Device (NBD) IO engine"
 			  },
+#ifdef CONFIG_DFS
+			  { .ival = "dfs",
+			    .help = "DAOS File System (dfs) IO engine",
+			  },
+#endif
 		},
 	},
 	{
@@ -5456,6 +5461,19 @@ void fio_options_free(struct thread_data *td)
 	}
 }
 
+void fio_dump_options_free(struct thread_data *td)
+{
+	while (!flist_empty(&td->opt_list)) {
+		struct print_option *p;
+
+		p = flist_first_entry(&td->opt_list, struct print_option, list);
+		flist_del_init(&p->list);
+		free(p->name);
+		free(p->value);
+		free(p);
+	}
+}
+
 struct fio_option *fio_option_find(const char *name)
 {
 	return find_option(fio_options, name);
diff --git a/options.h b/options.h
index 5276f31e..df80fd98 100644
--- a/options.h
+++ b/options.h
@@ -16,6 +16,7 @@ void add_opt_posval(const char *, const char *, const char *);
 void del_opt_posval(const char *, const char *);
 struct thread_data;
 void fio_options_free(struct thread_data *);
+void fio_dump_options_free(struct thread_data *);
 char *get_next_str(char **ptr);
 int get_max_str_idx(char *input);
 char* get_name_by_idx(char *input, int index);
diff --git a/server.c b/server.c
index 1b65297e..8daefbab 100644
--- a/server.c
+++ b/server.c
@@ -1909,7 +1909,7 @@ static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
 			break;
 		}
 		flist_add_tail(&entry->list, &first->next);
-	} while (ret != Z_STREAM_END);
+	}
 
 	ret = deflateEnd(&stream);
 	if (ret == Z_OK)


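The new fio_dump_options_free() empties td->opt_list by repeatedly taking the first entry. It unlinks that entry with flist_del_init() and then frees its name, its value and the node itself. "Pop until empty" avoids the need for a deletion-safe iterator while nodes are being freed. The sketch below shows the same shape in standalone C. The list type, node_entry(), struct print_opt and the opts_* helpers are hypothetical stand-ins and are not fio's flist.h API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal circular doubly linked list, in the style of flist.h. */
struct list_node {
	struct list_node *next, *prev;
};

/* Recover the containing struct from an embedded list_node. */
#define node_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *h) { h->next = h->prev = h; }
static int list_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);			/* leave the node self-linked */
}

static size_t list_count(const struct list_node *h)
{
	size_t n = 0;

	for (const struct list_node *it = h->next; it != h; it = it->next)
		n++;
	return n;
}

/* Hypothetical stand-in for struct print_option. */
struct print_opt {
	char *name;
	char *value;
	struct list_node list;
};

static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *d = malloc(len);

	if (d)
		memcpy(d, s, len);
	return d;
}

/* Append a copy of name/value. Returns 0 on success, -1 on allocation failure. */
static int opts_add(struct list_node *head, const char *name, const char *value)
{
	struct print_opt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->name = dup_str(name);
	p->value = dup_str(value);
	if (!p->name || !p->value) {
		free(p->name);
		free(p->value);
		free(p);
		return -1;
	}
	list_add_tail(&p->list, head);
	return 0;
}

/* Same loop shape as fio_dump_options_free(): pop the first entry until empty. */
static void opts_free_all(struct list_node *head)
{
	while (!list_empty(head)) {
		struct print_opt *p = node_entry(head->next, struct print_opt, list);

		list_del_init(&p->list);	/* unlink before the memory goes away */
		free(p->name);
		free(p->value);
		free(p);
	}
}
```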

* Recent changes (master)
@ 2021-03-12 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-03-12 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9697503cacf01c2b83f09973a6c8f9439156ce63:

  Merge branch 'fallock-blkdev' of https://github.com/dmonakhov/fio (2021-03-10 13:30:23 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 014ab48afcbcf442464acc7427fcd0f194f64bf4:

  Merge branch 'dev_luye_github' of https://github.com/louisluSCU/fio (2021-03-11 11:50:49 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'dev_luye_github' of https://github.com/louisluSCU/fio

luye (1):
      Fix reading multiple blktrace replay files

 iolog.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index fa40c857..cf264916 100644
--- a/iolog.c
+++ b/iolog.c
@@ -607,12 +607,11 @@ static int open_socket(const char *path)
 /*
  * open iolog, check version, and call appropriate parser
  */
-static bool init_iolog_read(struct thread_data *td)
+static bool init_iolog_read(struct thread_data *td, char *fname)
 {
-	char buffer[256], *p, *fname;
+	char buffer[256], *p;
 	FILE *f = NULL;
 
-	fname = get_name_by_idx(td->o.read_iolog_file, td->subjob_number);
 	dprint(FD_IO, "iolog: name=%s\n", fname);
 
 	if (is_socket(fname)) {
@@ -701,15 +700,16 @@ bool init_iolog(struct thread_data *td)
 
 	if (td->o.read_iolog_file) {
 		int need_swap;
+		char * fname = get_name_by_idx(td->o.read_iolog_file, td->subjob_number);
 
 		/*
 		 * Check if it's a blktrace file and load that if possible.
 		 * Otherwise assume it's a normal log file and load that.
 		 */
-		if (is_blktrace(td->o.read_iolog_file, &need_swap))
-			ret = load_blktrace(td, td->o.read_iolog_file, need_swap);
+		if (is_blktrace(fname, &need_swap))
+			ret = load_blktrace(td, fname, need_swap);
 		else
-			ret = init_iolog_read(td);
+			ret = init_iolog_read(td, fname);
 	} else if (td->o.write_iolog_file)
 		ret = init_iolog_write(td);
 	else


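Before this fix, is_blktrace() and load_blktrace() received the raw read_iolog string. Only init_iolog_read() picked the entry for the current subjob. The patch resolves the name once with get_name_by_idx() and passes it to both paths. The body of get_name_by_idx() is not part of this diff. The sketch below therefore assumes a plain ':' separator and wrap-around indexing (index modulo the entry count), which is our reading of it. name_by_idx() is a hypothetical helper and is not fio's code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the idx-th entry of a
 * ':'-separated list, wrapping around when idx >= number of entries.
 * Returns NULL for a NULL/empty list, a negative idx, or on allocation failure.
 */
static char *name_by_idx(const char *list, int idx)
{
	const char *p, *start = list;
	int count = 1;
	size_t len;
	char *out;

	if (!list || !*list || idx < 0)
		return NULL;
	for (p = list; *p; p++)
		if (*p == ':')
			count++;

	idx %= count;			/* e.g. subjob 4 of a 3-entry list uses entry 1 */
	while (idx-- > 0)
		start = strchr(start, ':') + 1;	/* safe: at least idx separators exist */

	p = strchr(start, ':');
	len = p ? (size_t)(p - start) : strlen(start);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, start, len);
	out[len] = '\0';
	return out;
}
```

Resolving the name once and passing the same string to every consumer ensures that the blktrace check and the parser always examine the same file.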

* Recent changes (master)
@ 2021-03-11 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-03-11 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 6b8cadb66c62394420a39b46af1a2967b916c829:

  Merge branch 'master' of https://github.com/DevriesL/fio (2021-03-09 07:58:39 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9697503cacf01c2b83f09973a6c8f9439156ce63:

  Merge branch 'fallock-blkdev' of https://github.com/dmonakhov/fio (2021-03-10 13:30:23 -0700)

----------------------------------------------------------------
Dmitry Monakhov (1):
      engines/falloc: add blockdevice as a target

Jens Axboe (2):
      Merge branch 'master' of https://github.com/venkatrag1/fio
      Merge branch 'fallock-blkdev' of https://github.com/dmonakhov/fio

Venkat Ramesh (1):
      options: allow separate values for max_latency

 HOWTO            |  5 +++--
 cconv.c          |  6 ++++--
 engines/falloc.c |  4 ++--
 fio.1            |  5 +++--
 init.c           |  4 +++-
 io_u.c           | 23 +++++++++++++++--------
 options.c        |  6 ++++--
 server.h         |  2 +-
 thread_options.h |  4 ++--
 9 files changed, 37 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 52812cc7..1e5ebd5d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2663,11 +2663,12 @@ I/O latency
 	true, fio will continue running and try to meet :option:`latency_target`
 	by adjusting queue depth.
 
-.. option:: max_latency=time
+.. option:: max_latency=time[,time][,time]
 
 	If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
 	maximum latency. When the unit is omitted, the value is interpreted in
-	microseconds.
+	microseconds. Comma-separated values may be specified for reads, writes,
+	and trims as described in :option:`blocksize`.
 
 .. option:: rate_cycle=int
 
diff --git a/cconv.c b/cconv.c
index b10868fb..aa06e3ea 100644
--- a/cconv.c
+++ b/cconv.c
@@ -143,6 +143,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 		o->rate_iops_min[i] = le32_to_cpu(top->rate_iops_min[i]);
 
 		o->perc_rand[i] = le32_to_cpu(top->perc_rand[i]);
+
+		o->max_latency[i] = le64_to_cpu(top->max_latency[i]);
 	}
 
 	o->ratecycle = le32_to_cpu(top->ratecycle);
@@ -289,7 +291,6 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->sync_file_range = le32_to_cpu(top->sync_file_range);
 	o->latency_target = le64_to_cpu(top->latency_target);
 	o->latency_window = le64_to_cpu(top->latency_window);
-	o->max_latency = le64_to_cpu(top->max_latency);
 	o->latency_percentile.u.f = fio_uint64_to_double(le64_to_cpu(top->latency_percentile.u.i));
 	o->latency_run = le32_to_cpu(top->latency_run);
 	o->compress_percentage = le32_to_cpu(top->compress_percentage);
@@ -491,7 +492,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->sync_file_range = cpu_to_le32(o->sync_file_range);
 	top->latency_target = __cpu_to_le64(o->latency_target);
 	top->latency_window = __cpu_to_le64(o->latency_window);
-	top->max_latency = __cpu_to_le64(o->max_latency);
 	top->latency_percentile.u.i = __cpu_to_le64(fio_double_to_uint64(o->latency_percentile.u.f));
 	top->latency_run = __cpu_to_le32(o->latency_run);
 	top->compress_percentage = cpu_to_le32(o->compress_percentage);
@@ -550,6 +550,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 		top->rate_iops_min[i] = cpu_to_le32(o->rate_iops_min[i]);
 
 		top->perc_rand[i] = cpu_to_le32(o->perc_rand[i]);
+
+		top->max_latency[i] = __cpu_to_le64(o->max_latency[i]);
 	}
 
 	memcpy(top->verify_pattern, o->verify_pattern, MAX_PATTERN_SIZE);
diff --git a/engines/falloc.c b/engines/falloc.c
index 6382569b..4b05ed68 100644
--- a/engines/falloc.c
+++ b/engines/falloc.c
@@ -25,8 +25,8 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 
 	dprint(FD_FILE, "fd open %s\n", f->file_name);
 
-	if (f->filetype != FIO_TYPE_FILE) {
-		log_err("fio: only files are supported fallocate \n");
+	if (f->filetype != FIO_TYPE_FILE && f->filetype != FIO_TYPE_BLOCK) {
+		log_err("fio: only files and blockdev are supported fallocate \n");
 		return 1;
 	}
 	if (!strcmp(f->file_name, "-")) {
diff --git a/fio.1 b/fio.1
index 2c0d8348..b95a67ab 100644
--- a/fio.1
+++ b/fio.1
@@ -2403,10 +2403,11 @@ Used with \fBlatency_target\fR. If false (default), fio will find the highest
 queue depth that meets \fBlatency_target\fR and exit. If true, fio will continue
 running and try to meet \fBlatency_target\fR by adjusting queue depth.
 .TP
-.BI max_latency \fR=\fPtime
+.BI max_latency \fR=\fPtime[,time][,time]
 If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
 maximum latency. When the unit is omitted, the value is interpreted in
-microseconds.
+microseconds. Comma-separated values may be specified for reads, writes,
+and trims as described in \fBblocksize\fR.
 .TP
 .BI rate_cycle \fR=\fPint
 Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number
diff --git a/init.c b/init.c
index eea6e546..e8530fec 100644
--- a/init.c
+++ b/init.c
@@ -961,7 +961,9 @@ static int fixup_options(struct thread_data *td)
 	/*
 	 * Fix these up to be nsec internally
 	 */
-	o->max_latency *= 1000ULL;
+	for_each_rw_ddir(ddir)
+		o->max_latency[ddir] *= 1000ULL;
+
 	o->latency_target *= 1000ULL;
 
 	return ret;
diff --git a/io_u.c b/io_u.c
index 00a219c2..b421a579 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1389,11 +1389,16 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static void lat_fatal(struct thread_data *td, struct io_completion_data *icd,
+static void lat_fatal(struct thread_data *td, struct io_u *io_u, struct io_completion_data *icd,
 		      unsigned long long tnsec, unsigned long long max_nsec)
 {
-	if (!td->error)
-		log_err("fio: latency of %llu nsec exceeds specified max (%llu nsec)\n", tnsec, max_nsec);
+	if (!td->error) {
+		log_err("fio: latency of %llu nsec exceeds specified max (%llu nsec): %s %s %llu %llu\n",
+					tnsec, max_nsec,
+					io_u->file->file_name,
+					io_ddir_name(io_u->ddir),
+					io_u->offset, io_u->buflen);
+	}
 	td_verror(td, ETIMEDOUT, "max latency exceeded");
 	icd->error = ETIMEDOUT;
 }
@@ -1888,11 +1893,13 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 				icd->error = ops->io_u_lat(td, tnsec);
 		}
 
-		if (td->o.max_latency && tnsec > td->o.max_latency)
-			lat_fatal(td, icd, tnsec, td->o.max_latency);
-		if (td->o.latency_target && tnsec > td->o.latency_target) {
-			if (lat_target_failed(td))
-				lat_fatal(td, icd, tnsec, td->o.latency_target);
+		if (ddir_rw(idx)) {
+			if (td->o.max_latency[idx] && tnsec > td->o.max_latency[idx])
+				lat_fatal(td, io_u, icd, tnsec, td->o.max_latency[idx]);
+			if (td->o.latency_target && tnsec > td->o.latency_target) {
+				if (lat_target_failed(td))
+					lat_fatal(td, io_u, icd, tnsec, td->o.latency_target);
+			}
 		}
 	}
 
diff --git a/options.c b/options.c
index e3b0c4ef..0791101e 100644
--- a/options.c
+++ b/options.c
@@ -3756,8 +3756,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "max_latency",
 		.lname	= "Max Latency (usec)",
-		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= offsetof(struct thread_options, max_latency),
+		.type	= FIO_OPT_ULL,
+		.off1	= offsetof(struct thread_options, max_latency[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, max_latency[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, max_latency[DDIR_TRIM]),
 		.help	= "Maximum tolerated IO latency (usec)",
 		.is_time = 1,
 		.category = FIO_OPT_C_IO,
diff --git a/server.h b/server.h
index 74618ca7..b45b319b 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 88,
+	FIO_SERVER_VER			= 89,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 88fd7ad9..5ecc72d7 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -207,7 +207,7 @@ struct thread_options {
 	enum fio_memtype mem_type;
 	unsigned int mem_align;
 
-	unsigned long long max_latency;
+	unsigned long long max_latency[DDIR_RWDIR_CNT];
 
 	unsigned int exit_what;
 	unsigned int stonewall;
@@ -629,7 +629,7 @@ struct thread_options_pack {
 
 	uint64_t latency_target;
 	uint64_t latency_window;
-	uint64_t max_latency;
+	uint64_t max_latency[DDIR_RWDIR_CNT];
 	uint32_t pad5;
 	fio_fp64_t latency_percentile;
 	uint32_t latency_run;


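After this change, max_latency holds one limit per data direction (off1/off2/off3 map to DDIR_READ, DDIR_WRITE and DDIR_TRIM). fixup_options() converts each entry from microseconds to nanoseconds. account_io_completion() compares a completion only against the limit for its own direction, and it skips directions without a slot. The following standalone sketch shows that bookkeeping. The enum, struct and function names are hypothetical stand-ins for fio's DDIR_* values and thread_options.

```c
#include <stdbool.h>

/* Hypothetical mirror of the data directions; only the first three have limits. */
enum dir { DIR_READ = 0, DIR_WRITE, DIR_TRIM, DIR_RWDIR_CNT, DIR_SYNC = DIR_RWDIR_CNT };

struct lat_limits {
	unsigned long long max_nsec[DIR_RWDIR_CNT];	/* 0 means "no limit" */
};

/* Options arrive in usec; scale once at setup, like the *= 1000ULL loop. */
static void lat_limits_from_usec(struct lat_limits *l,
				 const unsigned long long usec[DIR_RWDIR_CNT])
{
	for (int d = 0; d < DIR_RWDIR_CNT; d++)
		l->max_nsec[d] = usec[d] * 1000ULL;
}

/* True when a completion in direction d taking tnsec should fail the job. */
static bool lat_exceeded(const struct lat_limits *l, int d, unsigned long long tnsec)
{
	if (d < 0 || d >= DIR_RWDIR_CNT)	/* e.g. syncs: no per-direction limit */
		return false;
	return l->max_nsec[d] && tnsec > l->max_nsec[d];
}
```

A zero entry leaves that direction unchecked. For example, a read limit and a trim limit can be set without constraining writes.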

* Recent changes (master)
@ 2021-03-10 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-03-10 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 267b164c372d57145880f365bab8d8a52bf8baa7:

  Fio 3.26 (2021-03-08 17:44:38 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6b8cadb66c62394420a39b46af1a2967b916c829:

  Merge branch 'master' of https://github.com/DevriesL/fio (2021-03-09 07:58:39 -0700)

----------------------------------------------------------------
DevriesL (1):
      engines/io_uring: fix compilation conflict with Android NDK

Jens Axboe (1):
      Merge branch 'master' of https://github.com/DevriesL/fio

 engines/io_uring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index c9036ba0..b962e804 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -696,11 +696,11 @@ static int fio_ioring_post_init(struct thread_data *td)
 
 	err = fio_ioring_queue_init(td);
 	if (err) {
-		int __errno = errno;
+		int init_err = errno;
 
-		if (__errno == ENOSYS)
+		if (init_err == ENOSYS)
 			log_err("fio: your kernel doesn't support io_uring\n");
-		td_verror(td, __errno, "io_queue_init");
+		td_verror(td, init_err, "io_queue_init");
 		return 1;
 	}
 


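Renaming the local __errno to init_err avoids a clash with the C library. As we understand it, Android's Bionic implements errno as a macro that expands to (*__errno()). A local variable named __errno therefore hides that function inside its own initializer. ISO C also reserves identifiers that begin with two underscores for the implementation. The sketch below follows the same pattern: it copies errno into a plainly named local immediately after the failing call, before any logging can overwrite it. open_checked() is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>	/* close() for callers of open_checked() */

/*
 * Open path read-only. On failure, save errno right away, log it, and return
 * it as a positive error code. On success, store the fd and return 0.
 */
static int open_checked(const char *path, int *fd_out)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		int open_err = errno;	/* plain name, not __errno */

		if (open_err == ENOENT)
			fprintf(stderr, "%s: no such file\n", path);
		else
			fprintf(stderr, "%s: %s\n", path, strerror(open_err));
		return open_err;
	}
	*fd_out = fd;
	return 0;
}
```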

* Recent changes (master)
@ 2021-03-09 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-03-09 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit ea9be958d8948cab0c5593a7afc695d17bd6ba79:

  Merge branch 'clock_monotonic_unused' of https://github.com/foxeng/fio (2021-03-06 15:37:56 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 267b164c372d57145880f365bab8d8a52bf8baa7:

  Fio 3.26 (2021-03-08 17:44:38 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      t/io_uring: SQPOLL fixes
      Fio 3.26

 FIO-VERSION-GEN |  2 +-
 t/io_uring.c    | 28 ++++++++++++++++------------
 2 files changed, 17 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 81a6355b..29486071 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.25
+DEF_VER=fio-3.26
 
 LF='
 '
diff --git a/t/io_uring.c b/t/io_uring.c
index 044f9195..ff4c7a7c 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -233,8 +233,7 @@ static int prep_more_ios(struct submitter *s, int max_ios)
 	next_tail = tail = *ring->tail;
 	do {
 		next_tail++;
-		read_barrier();
-		if (next_tail == *ring->head)
+		if (next_tail == atomic_load_acquire(ring->head))
 			break;
 
 		index = tail & sq_ring_mask;
@@ -244,10 +243,8 @@ static int prep_more_ios(struct submitter *s, int max_ios)
 		tail = next_tail;
 	} while (prepped < max_ios);
 
-	if (*ring->tail != tail) {
-		*ring->tail = tail;
-		write_barrier();
-	}
+	if (prepped)
+		atomic_store_release(ring->tail, tail);
 	return prepped;
 }
 
@@ -284,7 +281,7 @@ static int reap_events(struct submitter *s)
 		struct file *f;
 
 		read_barrier();
-		if (head == *ring->tail)
+		if (head == atomic_load_acquire(ring->tail))
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
 		if (!do_nop) {
@@ -301,9 +298,10 @@ static int reap_events(struct submitter *s)
 		head++;
 	} while (1);
 
-	s->inflight -= reaped;
-	*ring->head = head;
-	write_barrier();
+	if (reaped) {
+		s->inflight -= reaped;
+		atomic_store_release(ring->head, head);
+	}
 	return reaped;
 }
 
@@ -320,6 +318,7 @@ static void *submitter_fn(void *data)
 	prepped = 0;
 	do {
 		int to_wait, to_submit, this_reap, to_prep;
+		unsigned ring_flags = 0;
 
 		if (!prepped && s->inflight < depth) {
 			to_prep = min(depth - s->inflight, batch_submit);
@@ -338,15 +337,20 @@ submit:
 		 * Only need to call io_uring_enter if we're not using SQ thread
 		 * poll, or if IORING_SQ_NEED_WAKEUP is set.
 		 */
-		if (!sq_thread_poll || (*ring->flags & IORING_SQ_NEED_WAKEUP)) {
+		if (sq_thread_poll)
+			ring_flags = atomic_load_acquire(ring->flags);
+		if (!sq_thread_poll || ring_flags & IORING_SQ_NEED_WAKEUP) {
 			unsigned flags = 0;
 
 			if (to_wait)
 				flags = IORING_ENTER_GETEVENTS;
-			if ((*ring->flags & IORING_SQ_NEED_WAKEUP))
+			if (ring_flags & IORING_SQ_NEED_WAKEUP)
 				flags |= IORING_ENTER_SQ_WAKEUP;
 			ret = io_uring_enter(s, to_submit, to_wait, flags);
 			s->calls++;
+		} else {
+			/* for SQPOLL, we submitted it all effectively */
+			ret = to_submit;
 		}
 
 		/*


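The SQPOLL fixes replace separate read_barrier()/write_barrier() calls on the ring indices with fio's atomic_load_acquire() and atomic_store_release() helpers. They also touch the shared index only when entries were actually added or reaped. The sketch below applies the same discipline with C11 <stdatomic.h> to a userspace single-producer/single-consumer ring. In io_uring, the kernel is the other party on each ring. The ring layout and all names here are hypothetical. The sketch uses the conventional free-running unsigned index formulation rather than t/io_uring's exact loop.

```c
#include <stdatomic.h>

#define RING_ENTRIES 8u			/* power of two, so masking replaces modulo */
#define RING_MASK (RING_ENTRIES - 1)

struct spsc_ring {
	_Atomic unsigned head;		/* written only by the consumer */
	_Atomic unsigned tail;		/* written only by the producer */
	int slots[RING_ENTRIES];
};

/*
 * Producer (the role prep_more_ios() plays for the SQ ring): acquire-load
 * head so slots the consumer released are really free, fill entries, then
 * publish them all with one release-store of tail.
 */
static unsigned ring_produce(struct spsc_ring *r, const int *vals, unsigned n)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
	unsigned done = 0;

	while (done < n && tail - head < RING_ENTRIES) {	/* wrap-safe fill level */
		r->slots[tail & RING_MASK] = vals[done++];
		tail++;
	}
	if (done)			/* only publish if something was added */
		atomic_store_explicit(&r->tail, tail, memory_order_release);
	return done;
}

/*
 * Consumer (the role reap_events() plays for the CQ ring): acquire-load
 * tail so the slot contents written before it are visible, read entries,
 * then release-store head to hand the slots back.
 */
static unsigned ring_consume(struct spsc_ring *r, int *out, unsigned max)
{
	unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	unsigned done = 0;

	while (done < max && head != tail) {
		out[done++] = r->slots[head & RING_MASK];
		head++;
	}
	if (done)
		atomic_store_explicit(&r->head, head, memory_order_release);
	return done;
}
```

Each side loads its own index relaxed because it is the only writer of that index. The acquire/release pair is needed only on the index owned by the other side.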

* Recent changes (master)
@ 2021-03-07 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2021-03-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit ab02be41807ec9451c47b17129cf61457ef21db6:

  Add a new file to gitignore (2021-02-21 06:14:40 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ea9be958d8948cab0c5593a7afc695d17bd6ba79:

  Merge branch 'clock_monotonic_unused' of https://github.com/foxeng/fio (2021-03-06 15:37:56 -0700)

----------------------------------------------------------------
Alexey Dobriyan (4):
      parse: simplify parse_is_percent()
      zbd: simplify zoneskip= validness check
      zbd: fix check against 32-bit zone size
      zbd: support 'z' suffix for zone granularity

Fotis Xenakis (1):
      configure: remove unused CLOCK_MONOTONIC_* symbols

Friendy.Su@sony.com (1):
      engines/filecreate: remove improper message print

Jens Axboe (1):
      Merge branch 'clock_monotonic_unused' of https://github.com/foxeng/fio

 configure              |  6 ------
 engines/filecreate.c   |  2 +-
 filesetup.c            | 18 +++++++++++++-----
 fio.1                  | 13 ++++++++-----
 options.c              | 48 +++++++++++++++++++++++++++++++++++++----------
 parse.c                | 30 ++++++++++++++++++++++++++++-
 parse.h                | 11 +++++++++--
 server.h               |  2 +-
 t/zbd/test-zbd-support | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 thread_options.h       | 17 +++++++++++++----
 zbd.c                  | 51 +++++++++++++++++++++++++++++++++++++++++++-------
 zbd.h                  |  2 ++
 12 files changed, 206 insertions(+), 42 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 748f7014..71b31868 100755
--- a/configure
+++ b/configure
@@ -2794,12 +2794,6 @@ fi
 if test "$clock_monotonic" = "yes" ; then
   output_sym "CONFIG_CLOCK_MONOTONIC"
 fi
-if test "$clock_monotonic_raw" = "yes" ; then
-  output_sym "CONFIG_CLOCK_MONOTONIC_RAW"
-fi
-if test "$clock_monotonic_precise" = "yes" ; then
-  output_sym "CONFIG_CLOCK_MONOTONIC_PRECISE"
-fi
 if test "$clockid_t" = "yes"; then
   output_sym "CONFIG_CLOCKID_T"
 fi
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 5fec8544..16c64928 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -22,7 +22,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 	dprint(FD_FILE, "fd open %s\n", f->file_name);
 
 	if (f->filetype != FIO_TYPE_FILE) {
-		log_err("fio: only files are supported fallocate \n");
+		log_err("fio: only files are supported\n");
 		return 1;
 	}
 	if (!strcmp(f->file_name, "-")) {
diff --git a/filesetup.c b/filesetup.c
index 661d4c2f..e664f8b4 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1118,6 +1118,13 @@ int setup_files(struct thread_data *td)
 	if (o->read_iolog_file)
 		goto done;
 
+	if (td->o.zone_mode == ZONE_MODE_ZBD) {
+		err = zbd_init_files(td);
+		if (err)
+			goto err_out;
+	}
+	zbd_recalc_options_with_zone_granularity(td);
+
 	/*
 	 * check sizes. if the files/devices do not exist and the size
 	 * isn't passed to fio, abort.
@@ -1395,16 +1402,17 @@ int setup_files(struct thread_data *td)
 	}
 
 done:
-	if (o->create_only)
-		td->done = 1;
-
-	td_restore_runstate(td, old_state);
-
 	if (td->o.zone_mode == ZONE_MODE_ZBD) {
 		err = zbd_setup_files(td);
 		if (err)
 			goto err_out;
 	}
+
+	if (o->create_only)
+		td->done = 1;
+
+	td_restore_runstate(td, old_state);
+
 	return 0;
 
 err_offset:
diff --git a/fio.1 b/fio.1
index accc6a32..2c0d8348 100644
--- a/fio.1
+++ b/fio.1
@@ -348,6 +348,9 @@ us or usec means microseconds
 .PD
 .RE
 .P
+`z' suffix specifies that the value is measured in zones.
+Value is recalculated once block device's zone size becomes known.
+.P
 If the option accepts an upper and lower range, use a colon ':' or
 minus '\-' to separate such values. See \fIirange\fR parameter type.
 If the lower value specified happens to be larger than the upper value
@@ -783,7 +786,7 @@ If not specified it defaults to the zone size. If the target device is a zoned
 block device, the zone capacity is obtained from the device information and this
 option is ignored.
 .TP
-.BI zoneskip \fR=\fPint
+.BI zoneskip \fR=\fPint[z]
 For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR
 bytes of data have been transferred.
 
@@ -1033,7 +1036,7 @@ The values are all relative to each other, and no absolute meaning
 should be associated with them.
 .RE
 .TP
-.BI offset \fR=\fPint
+.BI offset \fR=\fPint[%|z]
 Start I/O at the provided offset in the file, given as either a fixed size in
 bytes or a percentage. If a percentage is given, the generated offset will be
 aligned to the minimum \fBblocksize\fR or to the value of \fBoffset_align\fR if
@@ -1048,7 +1051,7 @@ If set to non-zero value, the byte offset generated by a percentage \fBoffset\fR
 is aligned upwards to this value. Defaults to 0 meaning that a percentage
 offset is aligned to the minimum block size.
 .TP
-.BI offset_increment \fR=\fPint
+.BI offset_increment \fR=\fPint[%|z]
 If this is provided, then the real offset becomes `\fBoffset\fR + \fBoffset_increment\fR
 * thread_number', where the thread number is a counter that starts at 0 and
 is incremented for each sub-job (i.e. when \fBnumjobs\fR option is
@@ -1570,7 +1573,7 @@ Pin the specified amount of memory with \fBmlock\fR\|(2). Can be used to
 simulate a smaller amount of memory. The amount specified is per worker.
 .SS "I/O size"
 .TP
-.BI size \fR=\fPint
+.BI size \fR=\fPint[%|z]
 The total size of file I/O for each thread of this job. Fio will run until
 this many bytes has been transferred, unless runtime is limited by other options
 (such as \fBruntime\fR, for instance, or increased/decreased by \fBio_size\fR).
@@ -1585,7 +1588,7 @@ given, fio will use 20% of the full size of the given files or devices.
 Can be combined with \fBoffset\fR to constrain the start and end range
 that I/O will be done within.
 .TP
-.BI io_size \fR=\fPint "\fR,\fB io_limit" \fR=\fPint
+.BI io_size \fR=\fPint[%|z] "\fR,\fB io_limit" \fR=\fPint[%|z]
 Normally fio operates within the region set by \fBsize\fR, which means
 that the \fBsize\fR option sets both the region and size of I/O to be
 performed. Sometimes that is not what you want. With this option, it is
diff --git a/options.c b/options.c
index e62e0cfb..e3b0c4ef 100644
--- a/options.c
+++ b/options.c
@@ -1471,8 +1471,13 @@ static int str_offset_cb(void *data, unsigned long long *__val)
 	if (parse_is_percent(v)) {
 		td->o.start_offset = 0;
 		td->o.start_offset_percent = -1ULL - v;
+		td->o.start_offset_nz = 0;
 		dprint(FD_PARSE, "SET start_offset_percent %d\n",
 					td->o.start_offset_percent);
+	} else if (parse_is_zone(v)) {
+		td->o.start_offset = 0;
+		td->o.start_offset_percent = 0;
+		td->o.start_offset_nz = v - ZONE_BASE_VAL;
 	} else
 		td->o.start_offset = v;
 
@@ -1487,8 +1492,13 @@ static int str_offset_increment_cb(void *data, unsigned long long *__val)
 	if (parse_is_percent(v)) {
 		td->o.offset_increment = 0;
 		td->o.offset_increment_percent = -1ULL - v;
+		td->o.offset_increment_nz = 0;
 		dprint(FD_PARSE, "SET offset_increment_percent %d\n",
 					td->o.offset_increment_percent);
+	} else if (parse_is_zone(v)) {
+		td->o.offset_increment = 0;
+		td->o.offset_increment_percent = 0;
+		td->o.offset_increment_nz = v - ZONE_BASE_VAL;
 	} else
 		td->o.offset_increment = v;
 
@@ -1505,6 +1515,10 @@ static int str_size_cb(void *data, unsigned long long *__val)
 		td->o.size_percent = -1ULL - v;
 		dprint(FD_PARSE, "SET size_percent %d\n",
 					td->o.size_percent);
+	} else if (parse_is_zone(v)) {
+		td->o.size = 0;
+		td->o.size_percent = 0;
+		td->o.size_nz = v - ZONE_BASE_VAL;
 	} else
 		td->o.size = v;
 
@@ -1525,12 +1539,30 @@ static int str_io_size_cb(void *data, unsigned long long *__val)
 		}
 		dprint(FD_PARSE, "SET io_size_percent %d\n",
 					td->o.io_size_percent);
+	} else if (parse_is_zone(v)) {
+		td->o.io_size = 0;
+		td->o.io_size_percent = 0;
+		td->o.io_size_nz = v - ZONE_BASE_VAL;
 	} else
 		td->o.io_size = v;
 
 	return 0;
 }
 
+static int str_zoneskip_cb(void *data, unsigned long long *__val)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	unsigned long long v = *__val;
+
+	if (parse_is_zone(v)) {
+		td->o.zone_skip = 0;
+		td->o.zone_skip_nz = v - ZONE_BASE_VAL;
+	} else
+		td->o.zone_skip = v;
+
+	return 0;
+}
+
 static int str_write_bw_log_cb(void *data, const char *str)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -2081,11 +2113,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "size",
 		.lname	= "Size",
-		.type	= FIO_OPT_STR_VAL,
+		.type	= FIO_OPT_STR_VAL_ZONE,
 		.cb	= str_size_cb,
 		.off1	= offsetof(struct thread_options, size),
 		.help	= "Total size of device or files",
-		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
@@ -2093,11 +2124,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "io_size",
 		.alias	= "io_limit",
 		.lname	= "IO Size",
-		.type	= FIO_OPT_STR_VAL,
+		.type	= FIO_OPT_STR_VAL_ZONE,
 		.cb	= str_io_size_cb,
 		.off1	= offsetof(struct thread_options, io_size),
 		.help	= "Total size of I/O to be performed",
-		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
@@ -2138,12 +2168,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "offset",
 		.lname	= "IO offset",
 		.alias	= "fileoffset",
-		.type	= FIO_OPT_STR_VAL,
+		.type	= FIO_OPT_STR_VAL_ZONE,
 		.cb	= str_offset_cb,
 		.off1	= offsetof(struct thread_options, start_offset),
 		.help	= "Start IO from this offset",
 		.def	= "0",
-		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
@@ -2161,14 +2190,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "offset_increment",
 		.lname	= "IO offset increment",
-		.type	= FIO_OPT_STR_VAL,
+		.type	= FIO_OPT_STR_VAL_ZONE,
 		.cb	= str_offset_increment_cb,
 		.off1	= offsetof(struct thread_options, offset_increment),
 		.help	= "What is the increment from one offset to the next",
 		.parent = "offset",
 		.hide	= 1,
 		.def	= "0",
-		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
 	},
@@ -3404,11 +3432,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "zoneskip",
 		.lname	= "Zone skip",
-		.type	= FIO_OPT_STR_VAL,
+		.type	= FIO_OPT_STR_VAL_ZONE,
+		.cb	= str_zoneskip_cb,
 		.off1	= offsetof(struct thread_options, zone_skip),
 		.help	= "Space between IO zones",
 		.def	= "0",
-		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_ZONE,
 	},
diff --git a/parse.c b/parse.c
index 44bf9507..45f4f2d3 100644
--- a/parse.c
+++ b/parse.c
@@ -37,6 +37,7 @@ static const char *opt_type_names[] = {
 	"OPT_BOOL",
 	"OPT_FLOAT_LIST",
 	"OPT_STR_SET",
+	"OPT_STR_VAL_ZONE",
 	"OPT_DEPRECATED",
 	"OPT_SOFT_DEPRECATED",
 	"OPT_UNSUPPORTED",
@@ -599,9 +600,35 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		fallthrough;
 	case FIO_OPT_ULL:
 	case FIO_OPT_INT:
-	case FIO_OPT_STR_VAL: {
+	case FIO_OPT_STR_VAL:
+	case FIO_OPT_STR_VAL_ZONE:
+	{
 		fio_opt_str_val_fn *fn = o->cb;
 		char tmp[128], *p;
+		size_t len = strlen(ptr);
+
+		if (len > 0 && ptr[len - 1] == 'z') {
+			if (o->type == FIO_OPT_STR_VAL_ZONE) {
+				char *ep;
+				unsigned long long val;
+
+				errno = 0;
+				val = strtoul(ptr, &ep, 10);
+				if (errno == 0 && ep != ptr && *ep == 'z') {
+					ull = ZONE_BASE_VAL + (uint32_t)val;
+					ret = 0;
+					goto store_option_value;
+				} else {
+					log_err("%s: unexpected zone value '%s'\n",
+						o->name, ptr);
+					return 1;
+				}
+			} else {
+				log_err("%s: 'z' suffix isn't applicable\n",
+					o->name);
+				return 1;
+			}
+		}
 
 		if (!is_time && o->is_time)
 			is_time = o->is_time;
@@ -655,6 +682,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 			}
 		}
 
+store_option_value:
 		if (fn)
 			ret = fn(data, &ull);
 		else {
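The new FIO_OPT_STR_VAL_ZONE branch in __handle_option() accepts a decimal count followed by 'z' and stores ZONE_BASE_VAL + count. The hunk uses strtoul(), checks that the character after the number is 'z', and truncates the count with (uint32_t). As a result, it also accepts input such as "3zz" or a leading sign. The sketch below is a stricter variant: it requires digits followed by exactly one 'z' and rejects counts that do not fit in 32 bits. parse_zone_suffix() is a hypothetical helper and is not fio's code.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same constant as the parse.h hunk: bit 63 set, low bits carry the count. */
#define ZONE_BASE_VAL ((-1ULL >> 1) + 1)

/* Parse "<N>z" into ZONE_BASE_VAL + N. Returns false on any other input. */
static bool parse_zone_suffix(const char *str, unsigned long long *out)
{
	unsigned long long n;
	char *end;

	if (!isdigit((unsigned char)str[0]))
		return false;		/* rejects "", "z", "-1z", " 1z" */

	errno = 0;
	n = strtoull(str, &end, 10);
	if (errno != 0 || end[0] != 'z' || end[1] != '\0')
		return false;		/* overflow, "3xz", "3zz", plain "3" */
	if (n > UINT32_MAX)
		return false;		/* the *_nz fields are 32-bit */

	*out = ZONE_BASE_VAL + n;
	return true;
}
```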
diff --git a/parse.h b/parse.h
index e6663ed4..4cf08fd2 100644
--- a/parse.h
+++ b/parse.h
@@ -21,6 +21,7 @@ enum fio_opt_type {
 	FIO_OPT_BOOL,
 	FIO_OPT_FLOAT_LIST,
 	FIO_OPT_STR_SET,
+	FIO_OPT_STR_VAL_ZONE,
 	FIO_OPT_DEPRECATED,
 	FIO_OPT_SOFT_DEPRECATED,
 	FIO_OPT_UNSUPPORTED,	/* keep this last */
@@ -130,12 +131,18 @@ static inline void *td_var(void *to, const struct fio_option *o,
 
 static inline int parse_is_percent(unsigned long long val)
 {
-	return val <= -1ULL && val >= (-1ULL - 100ULL);
+	return val >= -101;
 }
 
+#define ZONE_BASE_VAL ((-1ULL >> 1) + 1)
 static inline int parse_is_percent_uncapped(unsigned long long val)
 {
-	return (long long)val <= -1;
+	return ZONE_BASE_VAL + -1U < val;
+}
+
+static inline int parse_is_zone(unsigned long long val)
+{
+	return (val - ZONE_BASE_VAL) <= -1U;
 }
 
 struct print_option {
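With this encoding, a single 64-bit option slot can hold three kinds of value. Zone counts occupy [ZONE_BASE_VAL, ZONE_BASE_VAL + 2^32). Percentages p are stored as -1ULL - p at the very top of the range, as the str_*_cb callbacks do. Values below 2^63 are plain byte counts. The sketch below copies the two predicates from the hunk; is_percent() is written as v >= -1ULL - 100, which is equivalent to the source's val >= -101. The encode_* helpers are hypothetical.

```c
#include <stdbool.h>

#define ZONE_BASE_VAL ((-1ULL >> 1) + 1)	/* 2^63, as in parse.h */

/* Hypothetical encoders matching what the option callbacks store. */
static unsigned long long encode_percent(unsigned int p) { return -1ULL - p; }
static unsigned long long encode_zones(unsigned int nz) { return ZONE_BASE_VAL + nz; }

/* Top 101 values: 100%, 99%, ..., 0%. */
static bool is_percent(unsigned long long v) { return v >= -1ULL - 100; }

/* Unsigned wrap-around: only [2^63, 2^63 + 2^32) maps into [0, 2^32). */
static bool is_zone(unsigned long long v) { return (v - ZONE_BASE_VAL) <= -1U; }
```

Decoding reverses the encoders. The callbacks compute v - ZONE_BASE_VAL for zones and -1ULL - v for percentages.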
diff --git a/server.h b/server.h
index 9256d44c..74618ca7 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 87,
+	FIO_SERVER_VER			= 88,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 1658dc25..be129615 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -1153,6 +1153,54 @@ test54() {
 		>> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
+# test 'z' suffix parsing only
+test55() {
+	local bs
+	bs=$((logical_block_size))
+
+	require_zbd || return $SKIP_TESTCASE
+	# offset=1z + offset_increment=10z + size=2z
+	require_seq_zones 13 || return $SKIP_TESTCASE
+
+	run_fio	--name=j		\
+		--filename=${dev}	\
+		--direct=1		\
+		"$(ioengine "psync")"	\
+		--zonemode=zbd		\
+		--zonesize=${zone_size}	\
+		--rw=write		\
+		--bs=${bs}		\
+		--numjobs=2		\
+		--offset_increment=10z	\
+		--offset=1z		\
+		--size=2z		\
+		--io_size=3z		\
+		${job_var_opts[@]} --debug=zbd \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# test 'z' suffix parsing only
+test56() {
+	local bs
+	bs=$((logical_block_size))
+
+	require_regular_block_dev || return $SKIP_TESTCASE
+	require_seq_zones 10 || return $SKIP_TESTCASE
+
+	run_fio	--name=j		\
+		--filename=${dev}	\
+		--direct=1		\
+		"$(ioengine "psync")"	\
+		--zonemode=strided	\
+		--zonesize=${zone_size}	\
+		--rw=write		\
+		--bs=${bs}		\
+		--size=10z		\
+		--zoneskip=2z		\
+		${job_var_opts[@]} --debug=zbd \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
 SECONDS=0
 tests=()
 dynamic_analyzer=()
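test55 runs two jobs with offset=1z, offset_increment=10z and size=2z, and it asks for 13 sequential zones. The fio.1 text gives the rule: the real offset is offset + offset_increment * thread_number, with the count starting at 0. The second job therefore starts at zone 11 and ends at zone 13. The following sketch works through that arithmetic. It assumes that zbd_recalc_options_with_zone_granularity(), whose body is not in this excerpt, scales each *_nz count by the zone size once that size is known. The struct and both helpers are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-job I/O window in bytes. */
struct job_window {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Byte window of job 'job' when offset, increment and size are zone counts. */
static struct job_window zone_window(uint32_t offset_nz, uint32_t incr_nz,
				     uint32_t size_nz, unsigned int job,
				     uint64_t zone_size)
{
	struct job_window w;

	w.start = ((uint64_t)offset_nz + (uint64_t)incr_nz * job) * zone_size;
	w.end = w.start + (uint64_t)size_nz * zone_size;
	return w;
}

/* Zones needed, counted from the first addressable zone (numjobs >= 1). */
static uint64_t zones_needed(uint32_t offset_nz, uint32_t incr_nz,
			     uint32_t size_nz, unsigned int numjobs)
{
	uint64_t last_start = (uint64_t)offset_nz + (uint64_t)incr_nz * (numjobs - 1);

	return last_start + size_nz;
}
```

For test56, size=10z with zoneskip=2z expresses the same kind of values in zone units for zonemode=strided.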
diff --git a/thread_options.h b/thread_options.h
index f6b15403..88fd7ad9 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -83,13 +83,16 @@ struct thread_options {
 	unsigned long long size;
 	unsigned long long io_size;
 	unsigned int size_percent;
+	unsigned int size_nz;
 	unsigned int io_size_percent;
+	unsigned int io_size_nz;
 	unsigned int fill_device;
 	unsigned int file_append;
 	unsigned long long file_size_low;
 	unsigned long long file_size_high;
 	unsigned long long start_offset;
 	unsigned long long start_offset_align;
+	unsigned int start_offset_nz;
 
 	unsigned long long bs[DDIR_RWDIR_CNT];
 	unsigned long long ba[DDIR_RWDIR_CNT];
@@ -198,6 +201,7 @@ struct thread_options {
 	unsigned long long zone_size;
 	unsigned long long zone_capacity;
 	unsigned long long zone_skip;
+	uint32_t zone_skip_nz;
 	enum fio_zone_mode zone_mode;
 	unsigned long long lockmem;
 	enum fio_memtype mem_type;
@@ -315,6 +319,7 @@ struct thread_options {
 	unsigned int gid;
 
 	unsigned int offset_increment_percent;
+	unsigned int offset_increment_nz;
 	unsigned long long offset_increment;
 	unsigned long long number_ios;
 
@@ -384,14 +389,19 @@ struct thread_options_pack {
 	uint64_t size;
 	uint64_t io_size;
 	uint32_t size_percent;
+	uint32_t size_nz;
 	uint32_t io_size_percent;
+	uint32_t io_size_nz;
 	uint32_t fill_device;
 	uint32_t file_append;
 	uint32_t unique_filename;
+	uint32_t pad3;
 	uint64_t file_size_low;
 	uint64_t file_size_high;
 	uint64_t start_offset;
 	uint64_t start_offset_align;
+	uint32_t start_offset_nz;
+	uint32_t pad4;
 
 	uint64_t bs[DDIR_RWDIR_CNT];
 	uint64_t ba[DDIR_RWDIR_CNT];
@@ -464,8 +474,6 @@ struct thread_options_pack {
 	struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
 	uint32_t zone_split_nr[DDIR_RWDIR_CNT];
 
-	uint8_t pad1[4];
-
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
 	fio_fp64_t gauss_dev;
@@ -501,6 +509,7 @@ struct thread_options_pack {
 	uint64_t zone_capacity;
 	uint64_t zone_skip;
 	uint64_t lockmem;
+	uint32_t zone_skip_nz;
 	uint32_t mem_type;
 	uint32_t mem_align;
 
@@ -509,8 +518,6 @@ struct thread_options_pack {
 	uint32_t new_group;
 	uint32_t numjobs;
 
-	uint8_t pad3[4];
-
 	/*
 	 * We currently can't convert these, so don't enable them
 	 */
@@ -616,12 +623,14 @@ struct thread_options_pack {
 	uint32_t gid;
 
 	uint32_t offset_increment_percent;
+	uint32_t offset_increment_nz;
 	uint64_t offset_increment;
 	uint64_t number_ios;
 
 	uint64_t latency_target;
 	uint64_t latency_window;
 	uint64_t max_latency;
+	uint32_t pad5;
 	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
 
diff --git a/zbd.c b/zbd.c
index 6a26fe10..d16b890f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -285,9 +285,7 @@ static bool zbd_verify_sizes(void)
 				return false;
 			}
 
-			if (td->o.zone_skip &&
-			    (td->o.zone_skip < td->o.zone_size ||
-			     td->o.zone_skip % td->o.zone_size)) {
+			if (td->o.zone_skip % td->o.zone_size) {
 				log_err("%s: zoneskip %llu is not a multiple of the device zone size %llu.\n",
 					f->file_name, (unsigned long long) td->o.zone_skip,
 					(unsigned long long) td->o.zone_size);
@@ -335,20 +333,21 @@ static bool zbd_verify_bs(void)
 {
 	struct thread_data *td;
 	struct fio_file *f;
-	uint32_t zone_size;
 	int i, j, k;
 
 	for_each_td(td, i) {
 		for_each_file(td, f, j) {
+			uint64_t zone_size;
+
 			if (!f->zbd_info)
 				continue;
 			zone_size = f->zbd_info->zone_size;
 			for (k = 0; k < FIO_ARRAY_SIZE(td->o.bs); k++) {
 				if (td->o.verify != VERIFY_NONE &&
 				    zone_size % td->o.bs[k] != 0) {
-					log_info("%s: block size %llu is not a divisor of the zone size %d\n",
+					log_info("%s: block size %llu is not a divisor of the zone size %llu\n",
 						 f->file_name, td->o.bs[k],
-						 zone_size);
+						 (unsigned long long)zone_size);
 					return false;
 				}
 			}
@@ -648,7 +647,7 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 			  struct fio_zone_info *z);
 
-int zbd_setup_files(struct thread_data *td)
+int zbd_init_files(struct thread_data *td)
 {
 	struct fio_file *f;
 	int i;
@@ -657,6 +656,44 @@ int zbd_setup_files(struct thread_data *td)
 		if (zbd_init_zone_info(td, f))
 			return 1;
 	}
+	return 0;
+}
+
+void zbd_recalc_options_with_zone_granularity(struct thread_data *td)
+{
+	struct fio_file *f;
+	int i;
+
+	for_each_file(td, f, i) {
+		struct zoned_block_device_info *zbd = f->zbd_info;
+		// zonemode=strided doesn't get per-file zone size.
+		uint64_t zone_size = zbd ? zbd->zone_size : td->o.zone_size;
+
+		if (zone_size == 0)
+			continue;
+
+		if (td->o.size_nz > 0) {
+			td->o.size = td->o.size_nz * zone_size;
+		}
+		if (td->o.io_size_nz > 0) {
+			td->o.io_size = td->o.io_size_nz * zone_size;
+		}
+		if (td->o.start_offset_nz > 0) {
+			td->o.start_offset = td->o.start_offset_nz * zone_size;
+		}
+		if (td->o.offset_increment_nz > 0) {
+			td->o.offset_increment = td->o.offset_increment_nz * zone_size;
+		}
+		if (td->o.zone_skip_nz > 0) {
+			td->o.zone_skip = td->o.zone_skip_nz * zone_size;
+		}
+	}
+}
+
+int zbd_setup_files(struct thread_data *td)
+{
+	struct fio_file *f;
+	int i;
 
 	if (!zbd_using_direct_io()) {
 		log_err("Using direct I/O is mandatory for writing to ZBD drives\n\n");
diff --git a/zbd.h b/zbd.h
index cc3ab624..64534393 100644
--- a/zbd.h
+++ b/zbd.h
@@ -87,6 +87,8 @@ struct zoned_block_device_info {
 	struct fio_zone_info	zone_info[0];
 };
 
+int zbd_init_files(struct thread_data *td);
+void zbd_recalc_options_with_zone_granularity(struct thread_data *td);
 int zbd_setup_files(struct thread_data *td);
 void zbd_free_zone_info(struct fio_file *f);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
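The new zbd_recalc_options_with_zone_granularity() turns counts given with the 'z' suffix (kept in the *_nz fields) into byte values by multiplying them by the zone size. For zonemode=strided it falls back to td->o.zone_size because such files carry no zbd_info. A minimal standalone sketch of that arithmetic follows. The struct and function names are hypothetical stand-ins for the thread_options fields, not fio's API. As written, the committed loop runs once per file and writes into the shared td->o fields, so the result reflects the last file with a non-zero zone size.

```c
#include <stdint.h>

/* Hypothetical subset of the options that accept the 'z' suffix. */
struct zoned_opts {
	uint64_t size, io_size, start_offset, offset_increment, zone_skip;
	unsigned int size_nz, io_size_nz, start_offset_nz, offset_increment_nz;
	uint32_t zone_skip_nz;
};

/*
 * Sketch of the recalculation: every non-zero *_nz count overrides the
 * corresponding byte value with count * zone_size. A zone size of 0
 * (no zone information) leaves the options untouched.
 */
static void recalc_with_zone_size(struct zoned_opts *o, uint64_t zone_size)
{
	if (zone_size == 0)
		return;
	if (o->size_nz > 0)
		o->size = o->size_nz * zone_size;
	if (o->io_size_nz > 0)
		o->io_size = o->io_size_nz * zone_size;
	if (o->start_offset_nz > 0)
		o->start_offset = o->start_offset_nz * zone_size;
	if (o->offset_increment_nz > 0)
		o->offset_increment = o->offset_increment_nz * zone_size;
	if (o->zone_skip_nz > 0)
		o->zone_skip = o->zone_skip_nz * zone_size;
}
```

With a 1 MiB zone size, the values used by test55 (size=2z, io_size=3z, offset=1z, offset_increment=10z) become 2 MiB, 3 MiB, 1 MiB and 10 MiB.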


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2021-02-22 13:00 Jens Axboe
From: Jens Axboe @ 2021-02-22 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c4f5c92fac8a39ffff29d57e99c3c0163358dd7a:

  engines/io_uring: add verbose error for ENOSYS (2021-02-16 12:07:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ab02be41807ec9451c47b17129cf61457ef21db6:

  Add a new file to gitignore (2021-02-21 06:14:40 -0700)

----------------------------------------------------------------
Hongwei Qin (1):
      Add a new file to gitignore

 .gitignore | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index 0aa4a361..6651f96e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,3 +30,4 @@ doc/output
 /tags
 /TAGS
 /t/zbd/test-zbd-support.log.*
+/t/fuzz/fuzz_parseini


* Recent changes (master)
@ 2021-02-17 13:00 Jens Axboe
From: Jens Axboe @ 2021-02-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit d16e84e256ffdcd3143c9439cf1e408d8db61c1a:

  Merge branch 'per-engine-pre-write-function' of https://github.com/lukaszstolarczuk/fio (2021-02-14 13:21:05 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c4f5c92fac8a39ffff29d57e99c3c0163358dd7a:

  engines/io_uring: add verbose error for ENOSYS (2021-02-16 12:07:14 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/io_uring: add verbose error for ENOSYS

 engines/io_uring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
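The io_uring change copies errno into a local (__errno) before calling log_err(). Logging goes through library calls that may overwrite errno, so the saved value is what td_verror() should report. A minimal standalone sketch of the same pattern follows. close(-1) is only an arbitrary stand-in for a call known to fail; fio itself checks the result of fio_ioring_queue_init().

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch: capture errno immediately after the failing call, before any
 * other library call (such as fprintf) has a chance to change it.
 */
static int failing_call_errno(void)
{
	if (close(-1) < 0) {
		int saved_errno = errno;

		if (saved_errno == ENOSYS)
			fprintf(stderr, "kernel lacks support for this call\n");
		/* fprintf() may have changed errno; saved_errno has not. */
		return saved_errno;
	}
	return 0;
}
```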

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 9ce2ae80..c9036ba0 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -696,7 +696,11 @@ static int fio_ioring_post_init(struct thread_data *td)
 
 	err = fio_ioring_queue_init(td);
 	if (err) {
-		td_verror(td, errno, "io_queue_init");
+		int __errno = errno;
+
+		if (__errno == ENOSYS)
+			log_err("fio: your kernel doesn't support io_uring\n");
+		td_verror(td, __errno, "io_queue_init");
 		return 1;
 	}
 


* Recent changes (master)
@ 2021-02-15 13:00 Jens Axboe
From: Jens Axboe @ 2021-02-15 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit b02c5eda07966d2e1c41870f64b741413b67a9aa:

  Merge branch 'taras/clientuid' of https://github.com/tarasglek/fio-1 (2021-02-10 13:22:04 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d16e84e256ffdcd3143c9439cf1e408d8db61c1a:

  Merge branch 'per-engine-pre-write-function' of https://github.com/lukaszstolarczuk/fio (2021-02-14 13:21:05 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'per-engine-pre-write-function' of https://github.com/lukaszstolarczuk/fio

Łukasz Stolarczuk (1):
      filesetup: add engine's io_ops to prepopulate file with data

 engines/libpmem.c |   1 +
 file.h            |   1 +
 filesetup.c       | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ioengines.h       |   3 +-
 4 files changed, 130 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/libpmem.c b/engines/libpmem.c
index eefb7767..2338f0fa 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -255,6 +255,7 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.open_file	= fio_libpmem_open_file,
 	.close_file	= fio_libpmem_close_file,
 	.get_file_size	= generic_get_file_size,
+	.prepopulate_file = generic_prepopulate_file,
 	.flags		= FIO_SYNCIO | FIO_RAWIO | FIO_DISKLESSIO | FIO_NOEXTEND |
 				FIO_NODISKUTIL | FIO_BARRIER | FIO_MEMALIGN,
 };
diff --git a/file.h b/file.h
index 493ec04a..faf65a2a 100644
--- a/file.h
+++ b/file.h
@@ -207,6 +207,7 @@ extern "C" {
 extern int __must_check generic_open_file(struct thread_data *, struct fio_file *);
 extern int __must_check generic_close_file(struct thread_data *, struct fio_file *);
 extern int __must_check generic_get_file_size(struct thread_data *, struct fio_file *);
+extern int __must_check generic_prepopulate_file(struct thread_data *, struct fio_file *);
 #ifdef __cplusplus
 }
 #endif
diff --git a/filesetup.c b/filesetup.c
index 9d033757..661d4c2f 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -338,6 +338,95 @@ error:
 	return ret;
 }
 
+/*
+ * Generic function to prepopulate regular file with data.
+ * Useful if you want to make sure I/O engine has data to read.
+ * Leaves f->fd open on success, caller must close.
+ */
+int generic_prepopulate_file(struct thread_data *td, struct fio_file *f)
+{
+	int flags;
+	unsigned long long left, bs;
+	char *b = NULL;
+
+	/* generic function for regular files only */
+	assert(f->filetype == FIO_TYPE_FILE);
+
+	if (read_only) {
+		log_err("fio: refusing to write a file due to read-only\n");
+		return 0;
+	}
+
+	flags = O_WRONLY;
+	if (td->o.allow_create)
+		flags |= O_CREAT;
+
+#ifdef WIN32
+	flags |= _O_BINARY;
+#endif
+
+	dprint(FD_FILE, "open file %s, flags %x\n", f->file_name, flags);
+	f->fd = open(f->file_name, flags, 0644);
+	if (f->fd < 0) {
+		int err = errno;
+
+		if (err == ENOENT && !td->o.allow_create)
+			log_err("fio: file creation disallowed by "
+					"allow_file_create=0\n");
+		else
+			td_verror(td, err, "open");
+		return 1;
+	}
+
+	left = f->real_file_size;
+	bs = td->o.max_bs[DDIR_WRITE];
+	if (bs > left)
+		bs = left;
+
+	b = malloc(bs);
+	if (!b) {
+		td_verror(td, errno, "malloc");
+		goto err;
+	}
+
+	while (left && !td->terminate) {
+		ssize_t r;
+
+		if (bs > left)
+			bs = left;
+
+		fill_io_buffer(td, b, bs, bs);
+
+		r = write(f->fd, b, bs);
+
+		if (r > 0) {
+			left -= r;
+		} else {
+			td_verror(td, errno, "write");
+			goto err;
+		}
+	}
+
+	if (td->terminate) {
+		dprint(FD_FILE, "terminate unlink %s\n", f->file_name);
+		td_io_unlink_file(td, f);
+	} else if (td->o.create_fsync) {
+		if (fsync(f->fd) < 0) {
+			td_verror(td, errno, "fsync");
+			goto err;
+		}
+	}
+
+	free(b);
+	return 0;
+err:
+	close(f->fd);
+	f->fd = -1;
+	if (b)
+		free(b);
+	return 1;
+}
+
 unsigned long long get_rand_file_size(struct thread_data *td)
 {
 	unsigned long long ret, sized;
@@ -1254,6 +1343,43 @@ int setup_files(struct thread_data *td)
 		temp_stall_ts = 0;
 	}
 
+	if (err)
+		goto err_out;
+
+	/*
+	 * Prepopulate files with data. It might be expected to read some
+	 * "real" data instead of zero'ed files (if no writes to file occurred
+	 * prior to a read job). Engine has to provide a way to do that.
+	 */
+	if (td->io_ops->prepopulate_file) {
+		temp_stall_ts = 1;
+
+		for_each_file(td, f, i) {
+			if (output_format & FIO_OUTPUT_NORMAL) {
+				log_info("%s: Prepopulating IO file (%s)\n",
+							o->name, f->file_name);
+			}
+
+			err = td->io_ops->prepopulate_file(td, f);
+			if (err)
+				break;
+
+			err = __file_invalidate_cache(td, f, f->file_offset,
+								f->io_size);
+
+			/*
+			 * Shut up static checker
+			 */
+			if (f->fd != -1)
+				close(f->fd);
+
+			f->fd = -1;
+			if (err)
+				break;
+		}
+		temp_stall_ts = 0;
+	}
+
 	if (err)
 		goto err_out;
 
diff --git a/ioengines.h b/ioengines.h
index 839b318d..1d01ab0a 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -8,7 +8,7 @@
 #include "io_u.h"
 #include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	28
+#define FIO_IOOPS_VERSION	29
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
@@ -47,6 +47,7 @@ struct ioengine_ops {
 	int (*invalidate)(struct thread_data *, struct fio_file *);
 	int (*unlink_file)(struct thread_data *, struct fio_file *);
 	int (*get_file_size)(struct thread_data *, struct fio_file *);
+	int (*prepopulate_file)(struct thread_data *, struct fio_file *);
 	void (*terminate)(struct thread_data *);
 	int (*iomem_alloc)(struct thread_data *, size_t);
 	void (*iomem_free)(struct thread_data *);
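generic_prepopulate_file() writes f->real_file_size bytes in chunks of at most max_bs[DDIR_WRITE]. It subtracts whatever write() actually returned, so short writes are simply continued on the next pass. The sketch below isolates that loop for a plain file descriptor, under a few assumptions. prepopulate_fd() is a hypothetical name, and memset() stands in for fio's fill_io_buffer(). Unlike the committed code, it also retries on EINTR and maps a zero-byte write to EIO, because errno is not set in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'size' bytes of filler data to 'fd' in
 * chunks of at most 'bs' bytes. Returns 0 on success, -1 with errno set.
 */
static int prepopulate_fd(int fd, unsigned long long size, size_t bs)
{
	unsigned long long left = size;
	char *buf;

	if (size == 0)
		return 0;
	if (bs == 0) {
		errno = EINVAL;
		return -1;
	}
	if (bs > size)
		bs = (size_t)size;	/* never allocate more than needed */

	buf = malloc(bs);
	if (!buf)
		return -1;		/* malloc() sets ENOMEM */
	memset(buf, 0xa5, bs);		/* stand-in for fill_io_buffer() */

	while (left) {
		size_t chunk = left < bs ? (size_t)left : bs;
		ssize_t r = write(fd, buf, chunk);

		if (r < 0 && errno == EINTR)
			continue;	/* interrupted: retry this chunk */
		if (r <= 0) {
			int err = (r == 0) ? EIO : errno;

			free(buf);
			errno = err;	/* keep a meaningful errno for the caller */
			return -1;
		}
		left -= (unsigned long long)r;	/* short write: loop again */
	}

	free(buf);
	return 0;
}
```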


* Recent changes (master)
@ 2021-02-11 13:00 Jens Axboe
From: Jens Axboe @ 2021-02-11 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2ef3c1b02473a14bf7b8b52e28d0cdded9c5cc9a:

  zbd: relocate Coverity annotation (2021-01-29 22:06:49 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b02c5eda07966d2e1c41870f64b741413b67a9aa:

  Merge branch 'taras/clientuid' of https://github.com/tarasglek/fio-1 (2021-02-10 13:22:04 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'taras/clientuid' of https://github.com/tarasglek/fio-1

Taras Glek (1):
      $clientuid keyword to differentiate clients in client/server mode.

 HOWTO  |  2 ++
 fio.1  |  3 +++
 init.c | 19 ++++++++++++++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index b6d1b58a..52812cc7 100644
--- a/HOWTO
+++ b/HOWTO
@@ -809,6 +809,8 @@ Target file/device
 
 		**$jobname**
 				The name of the worker thread or process.
+		**$clientuid**
+				IP of the fio process when using client/server mode.
 		**$jobnum**
 				The incremental number of the worker thread or process.
 		**$filenum**
diff --git a/fio.1 b/fio.1
index aa248a3b..accc6a32 100644
--- a/fio.1
+++ b/fio.1
@@ -584,6 +584,9 @@ string:
 .B $jobname
 The name of the worker thread or process.
 .TP
+.B $clientuid
+IP of the fio process when using client/server mode.
+.TP
 .B $jobnum
 The incremental number of the worker thread or process.
 .TP
diff --git a/init.c b/init.c
index d6dbaf7c..eea6e546 100644
--- a/init.c
+++ b/init.c
@@ -1238,7 +1238,8 @@ enum {
 	FPRE_NONE = 0,
 	FPRE_JOBNAME,
 	FPRE_JOBNUM,
-	FPRE_FILENUM
+	FPRE_FILENUM,
+	FPRE_CLIENTUID
 };
 
 static struct fpre_keyword {
@@ -1249,6 +1250,7 @@ static struct fpre_keyword {
 	{ .keyword = "$jobname",	.key = FPRE_JOBNAME, },
 	{ .keyword = "$jobnum",		.key = FPRE_JOBNUM, },
 	{ .keyword = "$filenum",	.key = FPRE_FILENUM, },
+	{ .keyword = "$clientuid",	.key = FPRE_CLIENTUID, },
 	{ .keyword = NULL, },
 	};
 
@@ -1338,6 +1340,21 @@ static char *make_filename(char *buf, size_t buf_size,struct thread_options *o,
 				}
 				break;
 				}
+			case FPRE_CLIENTUID: {
+				int ret;
+				ret = snprintf(dst, dst_left, "%s", client_sockaddr_str);
+				if (ret < 0)
+					break;
+				else if (ret > dst_left) {
+					log_err("fio: truncated filename\n");
+					dst += dst_left;
+					dst_left = 0;
+				} else {
+					dst += ret;
+					dst_left -= ret;
+				}
+				break;
+				}
 			default:
 				assert(0);
 				break;
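The FPRE_CLIENTUID case appends client_sockaddr_str with snprintf() and advances dst/dst_left. snprintf() returns the length it would have produced, not the number of bytes stored. Because one byte is always reserved for the terminating NUL, output is truncated whenever that return value is greater than or equal to the buffer size. The committed check (ret > dst_left) therefore does not flag the equal case. The sketch below uses >= instead. append_keyword() is a hypothetical helper, not part of fio.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper in the style of make_filename()'s FPRE_* cases:
 * append 's' at *dst and advance the cursor. Returns 0 if it fit,
 * 1 if it was truncated, -1 on an output error from snprintf().
 */
static int append_keyword(char **dst, size_t *left, const char *s)
{
	int ret;

	if (*left == 0)
		return 1;		/* no room, not even for the NUL */

	ret = snprintf(*dst, *left, "%s", s);
	if (ret < 0)
		return -1;
	if ((size_t)ret >= *left) {
		/* snprintf() stored *left - 1 chars plus the terminating NUL */
		*dst += *left - 1;
		*left = 1;
		return 1;
	}
	*dst += ret;
	*left -= (size_t)ret;
	return 0;
}
```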


* Recent changes (master)
@ 2021-01-30 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-30 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 119bb82143bd6c4a577c135ece4ed6b702443f50:

  Merge branch 'fio-fix-detecting-libpmem' of https://github.com/ldorau/fio (2021-01-27 09:51:01 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ef3c1b02473a14bf7b8b52e28d0cdded9c5cc9a:

  zbd: relocate Coverity annotation (2021-01-29 22:06:49 -0700)

----------------------------------------------------------------
Aravind Ramesh (1):
      zbd: initialize sectors with data at start time

Dmitry Fomichev (26):
      zbd: return ENOMEM if zone buffer allocation fails
      zbd: use zbd_zone_nr() more actively in the code
      zbd: add get_zone() helper function
      zbd: introduce zone_unlock()
      zbd: engines/libzbc: don't fail on assert for offline zones
      zbd: remove dependency on zone type during i/o
      zbd: skip offline zones in zbd_convert_to_open_zone()
      zbd: avoid zone buffer overrun
      zbd: don't unlock zone mutex after verify replay
      zbd: use zone_lock() in zbd_process_swd()
      zbd: don't log "zone nnnn is not open" message
      zbd: handle conventional start zone in zbd_convert_to_open_zone()
      zbd: improve replay range validation
      engines/libzbc: enable block backend
      zbd: avoid failing assertion in zbd_convert_to_open_zone()
      zbd: set thread errors in zbd_adjust_block()
      t/zbd: check for error in test #2
      t/zbd: add run-tests-against-nullb script
      t/zbd: add an option to bail on a failed test
      t/zbd: prevent test #31 from looping
      t/zbd: add checks for offline zone condition
      t/zbd: add test #54 to exercise ZBD verification
      t/zbd: show elapsed time in test-zbd-support
      t/zbd: increase timeout in test #48
      t/zbd: avoid looping on invalid command line options
      zbd: relocate Coverity annotation

Jens Axboe (1):
      zbd: fix 32-bit compile warnings for logging

Shin'ichiro Kawasaki (12):
      zbd: do not lock conventional zones on I/O adjustment
      zbd: do not set zbd handlers for conventional zones
      zbd: count sectors with data for write pointer zones
      zbd: initialize min_zone and max_zone for all zone types
      zbd: disable crossing from conventional to sequential zones
      t/zbd: add -t option to run-tests-against-nullb
      t/zbd: skip tests when test prerequisites are not met
      t/zbd: skip tests that need too many sequential zones
      t/zbd: test that conventional zones are not locked during random i/o
      t/zbd: test that zone_reset_threshold calculation is correct
      t/zbd: test random I/O direction in all-conventional case
      t/zbd: fix wrong units in test case #37

 Makefile                              |   5 +-
 engines/libzbc.c                      |   5 +-
 oslib/linux-blkzoned.c                |   2 +-
 t/run-fio-tests.py                    |   8 +-
 t/zbd/functions                       |  56 +++++-
 t/zbd/run-tests-against-nullb         | 354 +++++++++++++++++++++++++++++++++
 t/zbd/run-tests-against-regular-nullb |  27 ---
 t/zbd/run-tests-against-zoned-nullb   |  53 -----
 t/zbd/test-zbd-support                | 299 ++++++++++++++++++++++++----
 zbd.c                                 | 357 +++++++++++++++++++++-------------
 zbd.h                                 |   5 +
 11 files changed, 911 insertions(+), 260 deletions(-)
 create mode 100755 t/zbd/run-tests-against-nullb
 delete mode 100755 t/zbd/run-tests-against-regular-nullb
 delete mode 100755 t/zbd/run-tests-against-zoned-nullb
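Several of these patches keep fio and the test helpers away from offline zones. The report-zones code in engines/libzbc.c and oslib/linux-blkzoned.c maps unusable conditions to ZBD_ZONE_COND_OFFLINE. The sed filters in t/zbd/functions only accept zones whose blkzone `zcond` is 0-4 or 14 (`cond 0x0`-`0x4` or `0xe` in zbc_report_zones output). first_online_zone() and last_online_zone() accept zone types 1 and 2, while first_sequential_zone() accepts type 2 only. The C sketch below restates the online-zone filter as a predicate. The names are hypothetical, and the condition codes in the enum follow the ZBC/ZAC zone-condition values.

```c
#include <stdbool.h>

/* Zone-condition codes as reported by blkzone / ZBC (subset). */
enum zone_cond {
	COND_NOT_WP   = 0x0,	/* conventional zone, no write pointer */
	COND_EMPTY    = 0x1,
	COND_IMP_OPEN = 0x2,
	COND_EXP_OPEN = 0x3,
	COND_CLOSED   = 0x4,
	COND_READONLY = 0xd,
	COND_FULL     = 0xe,
	COND_OFFLINE  = 0xf,
};

/*
 * Mirrors the first_online_zone()/last_online_zone() filters: zone type 1
 * (conventional) or 2 (sequential write required), and a condition of
 * 0-4 or FULL. Read-only and offline zones are rejected.
 */
static bool zone_is_online(unsigned int type, unsigned int cond)
{
	if (type != 1 && type != 2)
		return false;
	return cond <= COND_CLOSED || cond == COND_FULL;
}
```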

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index f74e59e1..612344d1 100644
--- a/Makefile
+++ b/Makefile
@@ -626,9 +626,10 @@ fulltest:
 	   make -j &&						 	\
 	   sudo make install)						\
 	fi &&					 			\
-	sudo t/zbd/run-tests-against-regular-nullb &&		 	\
+	sudo t/zbd/run-tests-against-nullb -s 1 &&		 	\
 	if [ -e /sys/module/null_blk/parameters/zoned ]; then		\
-		sudo t/zbd/run-tests-against-zoned-nullb;	 	\
+		sudo t/zbd/run-tests-against-nullb -s 2;	 	\
+		sudo t/zbd/run-tests-against-nullb -s 4;	 	\
 	fi
 
 install: $(PROGS) $(SCRIPTS) $(ENGS_OBJS) tools/plot/fio2gnuplot.1 FORCE
diff --git a/engines/libzbc.c b/engines/libzbc.c
index 4b900233..2aacf7bb 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -86,7 +86,8 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 		return -ENOMEM;
 
 	ret = zbc_open(f->file_name,
-		       flags | ZBC_O_DRV_SCSI | ZBC_O_DRV_ATA, &ld->zdev);
+		       flags | ZBC_O_DRV_BLOCK | ZBC_O_DRV_SCSI | ZBC_O_DRV_ATA,
+		       &ld->zdev);
 	if (ret) {
 		log_err("%s: zbc_open() failed, err=%d\n",
 			f->file_name, ret);
@@ -283,7 +284,7 @@ static int libzbc_report_zones(struct thread_data *td, struct fio_file *f,
 		default:
 			/* Treat all these conditions as offline (don't use!) */
 			zbdz->cond = ZBD_ZONE_COND_OFFLINE;
-			break;
+			zbdz->wp = zbdz->start;
 		}
 	}
 
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 0a8a577a..f37c67fc 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -203,7 +203,7 @@ int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
 		default:
 			/* Treat all these conditions as offline (don't use!) */
 			z->cond = ZBD_ZONE_COND_OFFLINE;
-			break;
+			z->wp = z->start;
 		}
 	}
 
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index e5c2f17c..a59cdfe0 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -879,8 +879,8 @@ TEST_LIST = [
     {
         'test_id':          1007,
         'test_class':       FioExeTest,
-        'exe':              't/zbd/run-tests-against-regular-nullb',
-        'parameters':       None,
+        'exe':              't/zbd/run-tests-against-nullb',
+        'parameters':       ['-s', '1'],
         'success':          SUCCESS_DEFAULT,
         'requirements':     [Requirements.linux, Requirements.zbd,
                              Requirements.root],
@@ -888,8 +888,8 @@ TEST_LIST = [
     {
         'test_id':          1008,
         'test_class':       FioExeTest,
-        'exe':              't/zbd/run-tests-against-zoned-nullb',
-        'parameters':       None,
+        'exe':              't/zbd/run-tests-against-nullb',
+        'parameters':       ['-s', '2'],
         'success':          SUCCESS_DEFAULT,
         'requirements':     [Requirements.linux, Requirements.zbd,
                              Requirements.root, Requirements.zoned_nullb],
diff --git a/t/zbd/functions b/t/zbd/functions
index 1a64a215..40ffe1de 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -71,7 +71,7 @@ first_sequential_zone() {
 
     if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
 	${blkzone} report "$dev" |
-	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*type:[[:blank:]]2(.*/\1 \2/p' |
+	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*zcond:\(14\|[[:blank:]][0-4]\)(.*type:[[:blank:]]\([2]\)(.*/\1 \2/p' |
 	    {
 		read -r starting_sector length &&
 		    # Convert from hex to decimal
@@ -79,7 +79,7 @@ first_sequential_zone() {
 	    }
     else
 	${zbc_report_zones} "$dev" |
-	    sed -n 's/^Zone [0-9]*: type 0x2 .*, sector \([0-9]*\), \([0-9]*\) sectors,.*$/\1 \2/p' |
+	    sed -n 's/^Zone [0-9]*: type 0x2 .*,[[:blank:]]cond[[:blank:]]0x[0-4e][[:blank:]].*, sector \([0-9]*\), \([0-9]*\) sectors.*$/\1 \2/p' |
 	    head -n1
     fi
 }
@@ -121,6 +121,58 @@ total_zone_capacity() {
 	echo $((capacity * 512))
 }
 
+# Reports the starting sector of the first zone of device $1
+# that is not in offline (or similar) condition.
+first_online_zone() {
+    local dev=$1
+
+    if [ -z "$is_zbd" ]; then
+	echo 0
+	return
+    fi
+
+    if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
+	${blkzone} report "$dev" |
+	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*zcond:\(14\|[[:blank:]][0-4]\)(.*type:[[:blank:]][12](.*/\1/p' |
+	    head -n1 |
+	    {
+		read -r starting_sector &&
+		    # Convert from hex to decimal
+		    echo $((starting_sector))
+	    }
+    else
+	${zbc_report_zones} "$dev" |
+	    sed -n 's/^Zone[[:blank:]][0-9]*:[[:blank:]]type[[:blank:]]0x[12][[:blank:]].*,[[:blank:]]cond[[:blank:]]0x[0-4e][[:blank:]].*,[[:blank:]]sector[[:blank:]]\([0-9]*\),.*$/\1/p' |
+	    head -n1
+    fi
+}
+
+# Reports the starting sector of the last zone of device $1
+# that is not in offline (or similar) condition.
+last_online_zone() {
+    local dev=$1
+
+    if [ -z "$is_zbd" ]; then
+	echo 0
+	return
+    fi
+
+    if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
+	${blkzone} report "$dev" |
+	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*zcond:\(14\|[[:blank:]][0-4]\)(.*type:[[:blank:]][12](.*/\1/p' |
+	    tail -1 |
+	    {
+		read -r starting_sector &&
+		    # Convert from hex to decimal
+		    echo $((starting_sector))
+	    }
+    else
+	${zbc_report_zones} "$dev" |
+	    sed -n 's/^Zone[[:blank:]][0-9]*:[[:blank:]]type[[:blank:]]0x[12][[:blank:]].*,[[:blank:]]cond[[:blank:]]0x[0-4e][[:blank:]].*,[[:blank:]]sector[[:blank:]]\([0-9]*\),.*$/\1/p' |
+	    tail -1
+    fi
+}
+
 max_open_zones() {
     local dev=$1
 
diff --git a/t/zbd/run-tests-against-nullb b/t/zbd/run-tests-against-nullb
new file mode 100755
index 00000000..db901179
--- /dev/null
+++ b/t/zbd/run-tests-against-nullb
@@ -0,0 +1,354 @@
+#!/bin/bash
+#
+# Copyright (C) 2020 Western Digital Corporation or its affiliates.
+#
+# This file is released under the GPL.
+#
+# Run t/zbd/test-zbd-support script against a variety of conventional,
+# zoned and mixed zone configurations.
+#
+
+usage()
+{
+	echo "This script runs the tests from t/zbd/test-zbd-support script"
+        echo "against a nullb device in a variety of conventional and zoned"
+	echo "configurations."
+	echo "Usage: ${0} [OPTIONS]"
+	echo "Options:"
+	echo -e "\t-h Show this message."
+	echo -e "\t-L List the device layouts for every section without running"
+	echo -e "\t   tests."
+	echo -e "\t-s <#section> Only run the section with the given number."
+	echo -e "\t-l Use libzbc ioengine to run the tests."
+	echo -e "\t-t <#test> Only run the test with the given number in every section."
+	echo -e "\t-o <max_open_zones> Specify MaxOpen value, (${set_max_open} by default)."
+	echo -e "\t-n <#number of runs> Set the number of times to run the entire suite "
+	echo -e "\t   or an individual section/test."
+	echo -e "\t-q Quit t/zbd/test-zbd-support run after any failed test."
+	echo -e "\t-r Remove the /dev/nullb0 device that may still exist after"
+	echo -e "\t   running this script."
+	exit 1
+}
+
+cleanup_nullb()
+{
+	for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
+	modprobe -r null_blk
+	modprobe null_blk nr_devices=0 || exit $?
+	for d in /sys/kernel/config/nullb/*; do
+		[ -d "$d" ] && rmdir "$d"
+	done
+	modprobe -r null_blk
+	[ -e /sys/module/null_blk ] && exit $?
+}
+
+create_nullb()
+{
+	modprobe null_blk nr_devices=0 &&
+	cd /sys/kernel/config/nullb &&
+	mkdir nullb0 &&
+	cd nullb0 || return $?
+}
+
+configure_nullb()
+{
+	echo 0 > completion_nsec &&
+		echo ${dev_blocksize} > blocksize &&
+		echo ${dev_size} > size &&
+		echo 1 > memory_backed || return $?
+
+	if ((conv_pcnt < 100)); then
+		echo 1 > zoned &&
+			echo "${zone_size}" > zone_size || return $?
+
+		if ((zone_capacity < zone_size)); then
+			if ((!zcap_supported)); then
+				echo "null_blk does not support zone capacity"
+				return 2
+			fi
+			echo "${zone_capacity}" > zone_capacity
+		fi
+		if ((conv_pcnt)); then
+			if ((!conv_supported)); then
+				echo "null_blk does not support conventional zones"
+				return 2
+			fi
+			nr_conv=$((dev_size/zone_size*conv_pcnt/100))
+			echo "${nr_conv}" > zone_nr_conv
+		fi
+	fi
+
+	echo 1 > power || return $?
+	return 0
+}
+
+show_nullb_config()
+{
+	if ((conv_pcnt < 100)); then
+		echo "    $(printf "Zoned Device, %d%% Conventional Zones (%d)" \
+			  ${conv_pcnt} ${nr_conv})"
+		echo "    $(printf "Zone Size: %d MB" ${zone_size})"
+		echo "    $(printf "Zone Capacity: %d MB" ${zone_capacity})"
+		if ((max_open)); then
+			echo "    $(printf "Max Open: %d Zones" ${max_open})"
+		else
+			echo "    Max Open: Unlimited Zones"
+		fi
+	else
+		echo "    Non-zoned Device"
+	fi
+}
+
+#
+# Test sections.
+#
+# Fully conventional device.
+section1()
+{
+	conv_pcnt=100
+	max_open=0
+}
+
+# Zoned device with no conventional zones, ZCAP == ZSIZE, unlimited MaxOpen.
+section2()
+{
+	conv_pcnt=0
+	zone_size=1
+	zone_capacity=1
+	max_open=0
+}
+
+# Zoned device with no conventional zones, ZCAP < ZSIZE, unlimited MaxOpen.
+section3()
+{
+	conv_pcnt=0
+	zone_size=4
+	zone_capacity=3
+	max_open=0
+}
+
+# Zoned device with mostly sequential zones, ZCAP == ZSIZE, unlimited MaxOpen.
+section4()
+{
+	conv_pcnt=10
+	zone_size=1
+	zone_capacity=1
+	max_open=0
+}
+
+# Zoned device with mostly sequential zones, ZCAP < ZSIZE, unlimited MaxOpen.
+section5()
+{
+	conv_pcnt=10
+	zone_size=4
+	zone_capacity=3
+	max_open=0
+}
+
+# Zoned device with mostly conventional zones, ZCAP == ZSIZE, unlimited MaxOpen.
+section6()
+{
+	conv_pcnt=66
+	zone_size=1
+	zone_capacity=1
+	max_open=0
+}
+
+# Zoned device with mostly conventional zones, ZCAP < ZSIZE, unlimited MaxOpen.
+section7()
+{
+	dev_size=2048
+	conv_pcnt=66
+	zone_size=4
+	zone_capacity=3
+	max_open=0
+}
+
+# Zoned device with no conventional zones, ZCAP == ZSIZE, limited MaxOpen.
+section8()
+{
+	dev_size=1024
+	conv_pcnt=0
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+# Zoned device with no conventional zones, ZCAP < ZSIZE, limited MaxOpen.
+section9()
+{
+	conv_pcnt=0
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+# Zoned device with mostly sequential zones, ZCAP == ZSIZE, limited MaxOpen.
+section10()
+{
+	conv_pcnt=10
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+# Zoned device with mostly sequential zones, ZCAP < ZSIZE, limited MaxOpen.
+section11()
+{
+	conv_pcnt=10
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+# Zoned device with mostly conventional zones, ZCAP == ZSIZE, limited MaxOpen.
+section12()
+{
+	conv_pcnt=66
+	zone_size=1
+	zone_capacity=1
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+# Zoned device with mostly conventional zones, ZCAP < ZSIZE, limited MaxOpen.
+section13()
+{
+	dev_size=2048
+	conv_pcnt=66
+	zone_size=4
+	zone_capacity=3
+	max_open=${set_max_open}
+	zbd_test_opts+=("-o ${max_open}")
+}
+
+#
+# Entry point.
+#
+SECONDS=0
+scriptdir="$(cd "$(dirname "$0")" && pwd)"
+sections=()
+zcap_supported=1
+conv_supported=1
+list_only=0
+dev_size=1024
+dev_blocksize=4096
+set_max_open=8
+zbd_test_opts=()
+libzbc=0
+num_of_runs=1
+test_case=0
+quit_on_err=0
+
+while (($#)); do
+	case "$1" in
+		-s) sections+=("$2"); shift; shift;;
+		-o) set_max_open="${2}"; shift; shift;;
+		-L) list_only=1; shift;;
+		-r) cleanup_nullb; exit 0;;
+		-l) libzbc=1; shift;;
+		-n) num_of_runs="${2}"; shift; shift;;
+		-t) test_case="${2}"; shift; shift;;
+		-q) quit_on_err=1; shift;;
+		-h) usage; break;;
+		--) shift; break;;
+		 *) usage; exit 1;;
+	esac
+done
+
+if [ "${#sections[@]}" = 0 ]; then
+	readarray -t sections < <(declare -F | grep "section[0-9]*" |  tr -c -d "[:digit:]\n" | sort -n)
+fi
+
+cleanup_nullb
+
+#
+# Test creating null_blk device and check if newer features are supported
+#
+if ! eval "create_nullb"; then
+	echo "can't create nullb"
+	exit 1
+fi
+if ! cat /sys/kernel/config/nullb/features | grep -q zone_capacity; then
+	zcap_supported=0
+fi
+if ! cat /sys/kernel/config/nullb/features | grep -q zone_nr_conv; then
+	conv_supported=0
+fi
+
+rc=0
+test_rc=0
+intr=0
+run_nr=1
+trap 'kill ${zbd_test_pid}; intr=1' SIGINT
+
+while ((run_nr <= $num_of_runs)); do
+	echo -e "\nRun #$run_nr:"
+	for section_number in "${sections[@]}"; do
+		cleanup_nullb
+		echo "---------- Section $(printf "%02d" $section_number) ----------"
+		if ! eval "create_nullb"; then
+			echo "error creating nullb"
+			exit 1
+		fi
+		zbd_test_opts=()
+		if ((test_case)); then
+			zbd_test_opts+=("-t" "${test_case}")
+		fi
+		if ((quit_on_err)); then
+			zbd_test_opts+=("-q")
+		fi
+		section$section_number
+		configure_nullb
+		rc=$?
+		((rc == 2)) && continue
+		if ((rc)); then
+			echo "can't set up nullb for section $(printf "%02d" $section_number)"
+			exit 1
+		fi
+		show_nullb_config
+		if ((libzbc)); then
+			if ((zone_capacity < zone_size)); then
+				echo "libzbc doesn't support zone capacity, skipping section $(printf "%02d" $section_number)"
+				continue
+			fi
+			if ((conv_pcnt == 100)); then
+				echo "libzbc only supports zoned devices, skipping section $(printf "%02d" $section_number)"
+				continue
+			fi
+			zbd_test_opts+=("-l")
+		fi
+		cd "${scriptdir}"
+		((intr)) && exit 1
+		((list_only)) && continue
+
+		./test-zbd-support ${zbd_test_opts[@]} /dev/nullb0 &
+		zbd_test_pid=$!
+		if kill -0 "${zbd_test_pid}"; then
+			wait "${zbd_test_pid}"
+			test_rc=$?
+		else
+			echo "can't run ZBD tests"
+			exit 1
+		fi
+		((intr)) && exit 1
+		if (($test_rc)); then
+			rc=1
+			((quit_on_err)) && break
+		fi
+	done
+
+	((rc && quit_on_err)) && break
+	run_nr=$((run_nr + 1))
+done
+
+if ((!list_only)); then
+	echo "--------------------------------"
+	echo "Total run time: $(TZ=UTC0 printf "%(%H:%M:%S)T\n" $(( SECONDS )) )"
+fi
+
+exit $rc
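The entry point above builds the default section list from the names of the declared shell functions instead of a hard-coded list. A minimal standalone sketch of that idiom follows. The three empty functions are hypothetical stand-ins for section1() .. section13(). The pattern "section[0-9]*" matches any line containing "section" (the `[0-9]*` part may match zero digits), so the idiom relies on no other function name containing that word. The final `sort -n` is what places section 10 after section 2.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the section1() .. section13() functions.
section2()  { :; }
section10() { :; }
section1()  { :; }

# "declare -F" prints one "declare -f <name>" line per function. Keep the
# lines mentioning "section", delete every character except digits and
# newlines, then sort numerically so that 10 comes after 2.
readarray -t sections < <(declare -F | grep "section[0-9]*" |
			  tr -c -d "[:digit:]\n" | sort -n)

printf 'section %s\n' "${sections[@]}"
```

This needs bash 4 or later for `readarray`. It prints one line per discovered section: section 1, section 2, section 10.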
diff --git a/t/zbd/run-tests-against-regular-nullb b/t/zbd/run-tests-against-regular-nullb
deleted file mode 100755
index 5b7b4009..00000000
--- a/t/zbd/run-tests-against-regular-nullb
+++ /dev/null
@@ -1,27 +0,0 @@
-#!/bin/bash
-#
-# Copyright (C) 2018 Western Digital Corporation or its affiliates.
-#
-# This file is released under the GPL.
-
-scriptdir="$(cd "$(dirname "$0")" && pwd)"
-
-for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
-modprobe -r null_blk
-modprobe null_blk nr_devices=0 || exit $?
-for d in /sys/kernel/config/nullb/*; do
-    [ -d "$d" ] && rmdir "$d"
-done
-modprobe -r null_blk
-[ -e /sys/module/null_blk ] && exit $?
-modprobe null_blk nr_devices=0 &&
-    cd /sys/kernel/config/nullb &&
-    mkdir nullb0 &&
-    cd nullb0 &&
-    echo 0 > completion_nsec &&
-    echo 4096 > blocksize &&
-    echo 1024 > size &&
-    echo 1 > memory_backed &&
-    echo 1 > power || exit $?
-
-"${scriptdir}"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
deleted file mode 100755
index f9c9530c..00000000
--- a/t/zbd/run-tests-against-zoned-nullb
+++ /dev/null
@@ -1,53 +0,0 @@
-#!/bin/bash
-#
-# Copyright (C) 2018 Western Digital Corporation or its affiliates.
-#
-# This file is released under the GPL.
-
-scriptdir="$(cd "$(dirname "$0")" && pwd)"
-
-zone_size=1
-zone_capacity=1
-if [[ ${1} == "-h" ]]; then
-    echo "Usage: ${0} [OPTIONS]"
-    echo "Options:"
-    echo -e "\t-h Show this message."
-    echo -e "\t-zone-cap Use null blk with zone capacity less than zone size."
-    echo -e "\tany option supported by test-zbd-support script."
-    exit 1
-elif [[ ${1} == "-zone-cap" ]]; then
-    zone_size=4
-    zone_capacity=3
-    shift
-fi
-
-for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
-modprobe -r null_blk
-modprobe null_blk nr_devices=0 || exit $?
-for d in /sys/kernel/config/nullb/*; do
-    [ -d "$d" ] && rmdir "$d"
-done
-modprobe -r null_blk
-[ -e /sys/module/null_blk ] && exit $?
-modprobe null_blk nr_devices=0 &&
-    cd /sys/kernel/config/nullb &&
-    mkdir nullb0 &&
-    cd nullb0 || exit $?
-
-if ((zone_capacity < zone_size)); then
-    if [[ ! -w zone_capacity ]]; then
-        echo "null blk does not support zone capacity"
-        exit 1
-    fi
-    echo "${zone_capacity}" > zone_capacity
-fi
-
-echo 1 > zoned &&
-    echo "${zone_size}" > zone_size &&
-    echo 0 > completion_nsec &&
-    echo 4096 > blocksize &&
-    echo 1024 > size &&
-    echo 1 > memory_backed &&
-    echo 1 > power || exit $?
-
-"${scriptdir}"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index acde3b3a..1658dc25 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -14,6 +14,7 @@ usage() {
 	echo -e "\t-r Reset all zones before test start"
 	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
 	echo -e "\t-t <test #> Run only a single test case with specified number"
+	echo -e "\t-q Quit the test run after any failed test"
 	echo -e "\t-z Run fio with debug=zbd option"
 }
 
@@ -190,6 +191,64 @@ prep_write() {
 		reset_zone "${dev}" -1
 }
 
+SKIP_TESTCASE=255
+
+require_scsi_dev() {
+	if ! is_scsi_device "$dev"; then
+		SKIP_REASON="$dev is not a SCSI device"
+		return 1
+	fi
+	return 0
+}
+
+require_conv_zone_bytes() {
+	local req_bytes=${1}
+
+	if ((req_bytes > first_sequential_zone_sector * 512)); then
+		SKIP_REASON="$dev does not have enough conventional zones"
+		return 1
+	fi
+	return 0
+}
+
+require_zbd() {
+	if [[ -z ${is_zbd} ]]; then
+		SKIP_REASON="$dev is not a zoned block device"
+		return 1
+	fi
+	return 0
+}
+
+require_regular_block_dev() {
+	if [[ -n ${is_zbd} ]]; then
+		SKIP_REASON="$dev is not a regular block device"
+		return 1
+	fi
+	return 0
+}
+
+require_seq_zones() {
+	local req_seq_zones=${1}
+	local seq_bytes=$((disk_size - first_sequential_zone_sector * 512))
+
+	if ((req_seq_zones > seq_bytes / zone_size)); then
+		SKIP_REASON="$dev does not have $req_seq_zones sequential zones"
+		return 1
+	fi
+	return 0
+}
+
+require_conv_zones() {
+	local req_c_zones=${1}
+	local conv_bytes=$((first_sequential_zone_sector * 512))
+
+	if ((req_c_zones > conv_bytes / zone_size)); then
+		SKIP_REASON="$dev does not have $req_c_zones conventional zones"
+		return 1
+	fi
+	return 0
+}
+
 # Check whether buffered writes are refused.
 test1() {
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
@@ -221,14 +280,15 @@ test2() {
     if [ -z "$is_zbd" ]; then
 	opts+=("--zonesize=${zone_size}")
     fi
-    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
-    ! grep -q 'WRITE:' "${logfile}.${test_number}"
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'buflen exceeds zone size' "${logfile}.${test_number}"
 }
 
 # Run fio against an empty zone. This causes fio to report "No I/O performed".
 test3() {
     local off opts=() rc
 
+    require_seq_zones 129 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 128 * zone_size))
     size=$((zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
@@ -246,6 +306,7 @@ test3() {
 test4() {
     local off opts=()
 
+    require_seq_zones 130 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 129 * zone_size))
     size=$((zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
@@ -327,10 +388,7 @@ test8() {
 test9() {
     local size
 
-    if ! is_scsi_device "$dev"; then
-	echo "$dev is not a SCSI device" >>"${logfile}.${test_number}"
-	return 0
-    fi
+    require_scsi_dev || return $SKIP_TESTCASE
 
     prep_write
     size=$((4 * zone_size))
@@ -346,10 +404,7 @@ test9() {
 test10() {
     local size
 
-    if ! is_scsi_device "$dev"; then
-	echo "$dev is not a SCSI device" >>"${logfile}.${test_number}"
-	return 0
-    fi
+    require_scsi_dev || return $SKIP_TESTCASE
 
     prep_write
     size=$((4 * zone_size))
@@ -409,18 +464,20 @@ test13() {
 
 # Random write to conventional zones.
 test14() {
-    local size
+    local off size
 
+    if ! result=($(first_online_zone "$dev")); then
+	echo "Failed to determine first online zone"
+	exit 1
+    fi
+    off=${result[0]}
     prep_write
     size=$((16 * 2**20)) # 20 MB
-    if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
-	echo "$dev does not have enough sequential zones" \
-	     >>"${logfile}.${test_number}"
-	return 0
-    fi
+    require_conv_zone_bytes "${size}" || return $SKIP_TESTCASE
+
     run_one_fio_job "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --do_verify=1 \
-		    --verify=md5 --size=$size				   \
+		    --verify=md5 --offset=$off --size=$size\
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $((size)) || return $?
     check_read $((size)) || return $?
@@ -477,17 +534,26 @@ test16() {
 
 # Random reads and writes in the last zone.
 test17() {
-    local io off read size written
+    local io off last read size written
 
     off=$(((disk_size / zone_size - 1) * zone_size))
     size=$((disk_size - off))
+    if ! last=($(last_online_zone "$dev")); then
+	echo "Failed to determine last online zone"
+	exit 1
+    fi
+    if [[ "$((last * 512))" -lt "$off" ]]; then
+	off=$((last * 512))
+	size=$zone_size
+    fi
     if [ -n "$is_zbd" ]; then
 	reset_zone "$dev" $((off / 512)) || return $?
     fi
     prep_write
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
 		    --zonemode=zbd --zonesize="${zone_size}"		\
-		    --offset=$off --loops=2 --norandommap=1\
+		    --offset=$off --loops=2 --norandommap=1 \
+		    --size="$size"\
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
     written=$(fio_written <"${logfile}.${test_number}")
     read=$(fio_read <"${logfile}.${test_number}")
@@ -604,6 +670,7 @@ test27() {
 test28() {
     local i jobs=16 off opts
 
+    require_seq_zones 65 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     prep_write
@@ -628,6 +695,7 @@ test28() {
 test29() {
     local i jobs=16 off opts=()
 
+    require_seq_zones 80 || return $SKIP_TESTCASE
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
     prep_write
@@ -664,12 +732,18 @@ test31() {
     local bs inc nz off opts size
 
     prep_write
-    # Start with writing 128 KB to 128 sequential zones.
+    # Start with writing 128 KB to max_open_zones sequential zones.
     bs=128K
-    nz=128
+    nz=$((max_open_zones))
+    if [[ $nz -eq 0 ]]; then
+	nz=128
+    fi
     # shellcheck disable=SC2017
     inc=$(((disk_size - (first_sequential_zone_sector * 512)) / (nz * zone_size)
 	   * zone_size))
+    if [ "$inc" -eq 0 ]; then
+	require_seq_zones $nz || return $SKIP_TESTCASE
+    fi
     opts=()
     for ((off = first_sequential_zone_sector * 512; off < disk_size;
 	  off += inc)); do
@@ -696,6 +770,8 @@ test31() {
 test32() {
     local off opts=() size
 
+    require_zbd || return $SKIP_TESTCASE
+
     prep_write
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
@@ -773,7 +849,7 @@ test37() {
     local bs off size capacity
 
     prep_write
-    capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
+    capacity=$(total_zone_capacity 1 $((first_sequential_zone_sector*512)) $dev)
     if [ "$first_sequential_zone_sector" = 0 ]; then
 	off=0
     else
@@ -805,16 +881,23 @@ test38() {
 
 # Read one block from a block device.
 read_one_block() {
+    local off
     local bs
 
+    if ! result=($(first_online_zone "$dev")); then
+	echo "Failed to determine first online zone"
+	exit 1
+    fi
+    off=${result[0]}
     bs=$((logical_block_size))
-    run_one_fio_job --rw=read "$(ioengine "psync")" --bs=$bs --size=$bs "$@" 2>&1 |
+    run_one_fio_job --rw=read "$(ioengine "psync")" --offset=$off --bs=$bs \
+		    --size=$bs "$@" 2>&1 |
 	tee -a "${logfile}.${test_number}"
 }
 
 # Check whether fio accepts --zonemode=none for zoned block devices.
 test39() {
-    [ -n "$is_zbd" ] || return 0
+    require_zbd || return $SKIP_TESTCASE
     read_one_block --zonemode=none >/dev/null || return $?
     check_read $((logical_block_size)) || return $?
 }
@@ -824,7 +907,7 @@ test40() {
     local bs
 
     bs=$((logical_block_size))
-    [ -n "$is_zbd" ] || return 0
+    require_zbd || return $SKIP_TESTCASE
     read_one_block --zonemode=strided |
 	grep -q 'fio: --zonesize must be specified when using --zonemode=strided' ||
 	return $?
@@ -834,21 +917,21 @@ test40() {
 
 # Check whether fio checks the zone size for zoned block devices.
 test41() {
-    [ -n "$is_zbd" ] || return 0
+    require_zbd || return $SKIP_TESTCASE
     read_one_block --zonemode=zbd --zonesize=$((2 * zone_size)) |
 	grep -q 'job parameter zonesize.*does not match disk zone size'
 }
 
 # Check whether fio handles --zonesize=0 correctly for regular block devices.
 test42() {
-    [ -n "$is_zbd" ] && return 0
+    require_regular_block_dev || return $SKIP_TESTCASE
     read_one_block --zonemode=zbd --zonesize=0 |
 	grep -q 'Specifying the zone size is mandatory for regular block devices with --zonemode=zbd'
 }
 
 # Check whether fio handles --zonesize=1 correctly for regular block devices.
 test43() {
-    [ -n "$is_zbd" ] && return 0
+    require_regular_block_dev || return $SKIP_TESTCASE
     read_one_block --zonemode=zbd --zonesize=1 |
 	grep -q 'zone size must be at least 512 bytes for --zonemode=zbd'
 }
@@ -862,7 +945,7 @@ test44() {
 test45() {
     local bs i
 
-    [ -z "$is_zbd" ] && return 0
+    require_zbd || return $SKIP_TESTCASE
     prep_write
     bs=$((logical_block_size))
     run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
@@ -901,6 +984,9 @@ test47() {
 test48() {
     local i jobs=16 off opts=()
 
+    require_zbd || return $SKIP_TESTCASE
+    require_seq_zones 80 || return $SKIP_TESTCASE
+
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
     prep_write
@@ -922,7 +1008,7 @@ test48() {
 
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
-    timeout -v -s KILL 45s \
+    timeout -v -s KILL 180s \
 	    "${dynamic_analyzer[@]}" "$fio" "${opts[@]}" \
 	    >> "${logfile}.${test_number}" 2>&1 || return $?
 }
@@ -930,11 +1016,7 @@ test48() {
 # Check if fio handles --zonecapacity on a normal block device correctly
 test49() {
 
-    if [ -n "$is_zbd" ]; then
-	echo "$dev is not a regular block device" \
-	     >>"${logfile}.${test_number}"
-	return 0
-    fi
+    require_regular_block_dev || return $SKIP_TESTCASE
 
     size=$((2 * zone_size))
     capacity=$((zone_size * 3 / 4))
@@ -948,12 +1030,137 @@ test49() {
     check_read $((capacity * 2)) || return $?
 }
 
+# Verify that conv zones are not locked and only seq zones are locked during
+# random read on conv-seq mixed zones.
+test50() {
+	local off
+
+	require_zbd || return $SKIP_TESTCASE
+	require_conv_zones 8 || return $SKIP_TESTCASE
+	require_seq_zones 8 || return $SKIP_TESTCASE
+
+	reset_zone "${dev}" -1
+
+	off=$((first_sequential_zone_sector * 512 - 8 * zone_size))
+	run_fio --name=job --filename=${dev} --offset=${off} --bs=64K \
+		--size=$((16 * zone_size)) "$(ioengine "libaio")" --rw=randread\
+		--time_based --runtime=3 --zonemode=zbd --zonesize=${zone_size}\
+		--direct=1 --group_reporting=1 ${job_var_opts[@]} \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# Verify that conv zones are neither locked nor opened during random write on
+# conv-seq mixed zones. Zone lock and zone open shall happen only on seq zones.
+test51() {
+	local off jobs=16
+	local -a opts
+
+	require_zbd || return $SKIP_TESTCASE
+	require_conv_zones 8 || return $SKIP_TESTCASE
+	require_seq_zones 8 || return $SKIP_TESTCASE
+
+	prep_write
+
+	off=$((first_sequential_zone_sector * 512 - 8 * zone_size))
+	opts+=("--size=$((16 * zone_size))" "$(ioengine "libaio")")
+	opts+=("--zonemode=zbd" "--direct=1" "--zonesize=${zone_size}")
+	opts+=("--max_open_zones=2" "--offset=$off")
+	opts+=("--thread=1" "--group_reporting=1")
+	opts+=("--time_based" "--runtime=30" "--rw=randwrite")
+	for ((i=0;i<jobs;i++)); do
+		opts+=("--name=job${i}" "--filename=$dev")
+		opts+=("--bs=$(((i+1)*16))K")
+		opts+=($(job_var_opts_exclude "--max_open_zones"))
+	done
+	run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# Verify that zone_reset_threshold only takes logical blocks from seq
+# zones into account, and logical blocks of conv zones are not counted.
+test52() {
+	local off io_size
+
+	require_zbd || return $SKIP_TESTCASE
+	require_conv_zones 8 || return $SKIP_TESTCASE
+	require_seq_zones 8 || return $SKIP_TESTCASE
+
+	reset_zone "${dev}" -1
+
+	# Total I/O size is 1/8 = 0.125 of the I/O range of conv + seq zones.
+	# Set zone_reset_threshold to 0.1, which is less than 0.125. Even so,
+	# a reset count of zero is expected.
+	# On the other hand, half of the I/O range is covered by conv zones.
+	# If fio counted the conv zones for zone_reset_threshold, the ratio
+	# would be more than 0.5 and would trigger zone resets.
+
+	off=$((first_sequential_zone_sector * 512 - 8 * zone_size))
+	io_size=$((zone_size * 16 / 8))
+	run_fio --name=job --filename=$dev --rw=randwrite --bs=$((zone_size/16))\
+		--size=$((zone_size * 16)) --softrandommap=1 \
+		--io_size=$((io_size)) "$(ioengine "psync")" --offset=$off \
+		--zonemode=zbd --direct=1 --zonesize=${zone_size} \
+		--zone_reset_threshold=.1 --zone_reset_frequency=1.0 \
+		${job_var_opts[@]} --debug=zbd \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+
+	check_written ${io_size} || return $?
+	check_reset_count -eq 0 || return $?
+}
+
+# Check that both reads and writes are executed by random I/O to conventional zones.
+test53() {
+	local off capacity io read_b=0 written_b=0
+
+	require_zbd || return $SKIP_TESTCASE
+	require_conv_zones 4 || return $SKIP_TESTCASE
+
+	off=$((first_sequential_zone_sector * 512 - 4 * zone_size))
+	capacity=$(total_zone_capacity 4 $off $dev)
+	run_fio --name=job --filename=${dev} --rw=randrw --bs=64K \
+		--size=$((4 * zone_size)) "$(ioengine "psync")" --offset=${off}\
+		--zonemode=zbd --direct=1 --zonesize=${zone_size} \
+		${job_var_opts[@]} \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+
+	written_b=$(fio_written <"${logfile}.${test_number}")
+	read_b=$(fio_read <"${logfile}.${test_number}")
+	io=$((written_b + read_b))
+	echo "Number of bytes read: $read_b" >>"${logfile}.${test_number}"
+	echo "Number of bytes written: $written_b" >>"${logfile}.${test_number}"
+	echo "Total number of bytes read and written: $io <> $capacity" \
+	     >>"${logfile}.${test_number}"
+	if ((io==capacity && written_b != 0 && read_b != 0)); then
+		return 0
+	fi
+	return 1
+}
+
+# Test read/write mix with verify.
+test54() {
+	require_zbd || return $SKIP_TESTCASE
+	require_seq_zones 8 || return $SKIP_TESTCASE
+
+	run_fio --name=job --filename=${dev} "$(ioengine "libaio")" \
+		--time_based=1 --runtime=30s --continue_on_error=0 \
+		--offset=$((first_sequential_zone_sector * 512)) \
+		--size=$((8*zone_size)) --direct=1 --iodepth=1 \
+		--rw=randrw:2 --rwmixwrite=25 --bsrange=4k-${zone_size} \
+		--zonemode=zbd --zonesize=${zone_size} \
+		--verify=crc32c --do_verify=1 --verify_backlog=2 \
+		--experimental_verify=1 \
+		--alloc-size=65536 --random_generator=tausworthe64 \
+		${job_var_opts[@]} --debug=zbd \
+		>> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+SECONDS=0
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
 use_libzbc=
 zbd_debug=
 max_open_zones_opt=
+quit_on_err=
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -968,8 +1175,10 @@ while [ "${1#-}" != "$1" ]; do
     -o) max_open_zones_opt="${2}"; shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
 	shift;;
+    -q) quit_on_err=1; shift;;
     -z) zbd_debug=1; shift;;
     --) shift; break;;
+     *) usage; exit 1;;
   esac
 done
 
@@ -1087,10 +1296,12 @@ fi
 logfile=$0.log
 
 passed=0
+skipped=0
 failed=0
 if [ -t 1 ]; then
     red="\e[1;31m"
     green="\e[1;32m"
+    cyan="\e[1;36m"
     end="\e[m"
 else
     red=""
@@ -1101,14 +1312,23 @@ rc=0
 
 intr=0
 trap 'intr=1' SIGINT
+ret=0
 
 for test_number in "${tests[@]}"; do
     rm -f "${logfile}.${test_number}"
+    unset SKIP_REASON
     echo -n "Running test $(printf "%02d" $test_number) ... "
-    if eval "test$test_number" && check_log $test_number; then
+    eval "test$test_number"
+    ret=$?
+    if ((!ret)) && check_log $test_number; then
 	status="PASS"
 	cc_status="${green}${status}${end}"
 	((passed++))
+    elif ((ret==SKIP_TESTCASE)); then
+	status="SKIP"
+	echo "${SKIP_REASON}" >> "${logfile}.${test_number}"
+	cc_status="${cyan}${status}${end}    ${SKIP_REASON}"
+	((skipped++))
     else
 	status="FAIL"
 	cc_status="${red}${status}${end}"
@@ -1118,10 +1338,15 @@ for test_number in "${tests[@]}"; do
     echo -e "$cc_status"
     echo "$status" >> "${logfile}.${test_number}"
     [ $intr -ne 0 ] && exit 1
+    [ -n "$quit_on_err" -a "$rc" -ne 0 ] && exit 1
 done
 
 echo "$passed tests passed"
+if [ $skipped -gt 0 ]; then
+    echo " $skipped tests skipped"
+fi
 if [ $failed -gt 0 ]; then
-    echo " and $failed tests failed"
+    echo " $failed tests failed"
 fi
+echo "Run time: $(TZ=UTC0 printf "%(%H:%M:%S)T\n" $(( SECONDS )) )"
 exit $rc
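With these changes a test case ends in one of three states instead of two. A `require_*` helper stores a reason in SKIP_REASON and returns 1. The test then returns SKIP_TESTCASE (255), and the main loop reports PASS, SKIP or FAIL. Previously, tests that could not run on a given device returned 0 and were counted as passes. The sketch below replays that control flow outside fio. The body of require_seq_zones() is copied from the patch. The device name, the zone geometry and the testA/testB/testC functions are invented for illustration, and the check_log step of the real loop is left out.

```shell
#!/usr/bin/env bash
SKIP_TESTCASE=255

# Invented geometry: 64 zones of 8 MiB, the first 4 of them conventional.
dev=/dev/fake0
zone_size=$((8 * 2**20))
first_sequential_zone_sector=$((4 * zone_size / 512))
disk_size=$((64 * zone_size))

# Same logic as require_seq_zones() in test-zbd-support.
require_seq_zones() {
	local req_seq_zones=${1}
	local seq_bytes=$((disk_size - first_sequential_zone_sector * 512))

	if ((req_seq_zones > seq_bytes / zone_size)); then
		SKIP_REASON="$dev does not have $req_seq_zones sequential zones"
		return 1
	fi
	return 0
}

# Hypothetical test cases: one passes, one is skipped, one fails.
testA() { require_seq_zones 8 || return $SKIP_TESTCASE; return 0; }
testB() { require_seq_zones 129 || return $SKIP_TESTCASE; return 0; }
testC() { return 1; }

passed=0 skipped=0 failed=0
for t in A B C; do
	unset SKIP_REASON
	"test$t"
	ret=$?
	if ((!ret)); then
		passed=$((passed + 1))
		echo "test$t PASS"
	elif ((ret == SKIP_TESTCASE)); then
		skipped=$((skipped + 1))
		echo "test$t SKIP    ${SKIP_REASON}"
	else
		failed=$((failed + 1))
		echo "test$t FAIL"
	fi
done
echo "$passed passed, $skipped skipped, $failed failed"
```

With the assumed geometry, 60 sequential zones remain. As a result, testA runs, testB is skipped with the reason "/dev/fake0 does not have 129 sequential zones", and testC fails. Using a distinct return value for skips, rather than 0, is what keeps unsupported cases out of the pass count.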
diff --git a/zbd.c b/zbd.c
index f2599bd4..6a26fe10 100644
--- a/zbd.c
+++ b/zbd.c
@@ -131,15 +131,6 @@ static uint32_t zbd_zone_idx(const struct fio_file *f, uint64_t offset)
 	return min(zone_idx, f->zbd_info->nr_zones);
 }
 
-/**
- * zbd_zone_swr - Test whether a zone requires sequential writes
- * @z: zone info pointer.
- */
-static inline bool zbd_zone_swr(struct fio_zone_info *z)
-{
-	return z->type == ZBD_ZONE_TYPE_SWR;
-}
-
 /**
  * zbd_zone_end - Return zone end location
  * @z: zone info pointer.
@@ -171,11 +162,12 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 {
 	assert((required & 511) == 0);
 
-	return zbd_zone_swr(z) &&
+	return z->has_wp &&
 		z->wp + required > zbd_zone_capacity_end(z);
 }
 
-static void zone_lock(struct thread_data *td, struct fio_file *f, struct fio_zone_info *z)
+static void zone_lock(struct thread_data *td, const struct fio_file *f,
+		      struct fio_zone_info *z)
 {
 	struct zoned_block_device_info *zbd = f->zbd_info;
 	uint32_t nz = z - zbd->zone_info;
@@ -183,6 +175,8 @@ static void zone_lock(struct thread_data *td, struct fio_file *f, struct fio_zon
 	/* A thread should never lock zones outside its working area. */
 	assert(f->min_zone <= nz && nz < f->max_zone);
 
+	assert(z->has_wp);
+
 	/*
 	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
 	 * is changed or when io_u completes and zbd_put_io() executed.
@@ -199,11 +193,26 @@ static void zone_lock(struct thread_data *td, struct fio_file *f, struct fio_zon
 	}
 }
 
+static inline void zone_unlock(struct fio_zone_info *z)
+{
+	int ret;
+
+	assert(z->has_wp);
+	ret = pthread_mutex_unlock(&z->mutex);
+	assert(!ret);
+}
+
 static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
 {
 	return (uint64_t)(offset - f->file_offset) < f->io_size;
 }
 
+static inline struct fio_zone_info *get_zone(const struct fio_file *f,
+					     unsigned int zone_nr)
+{
+	return &f->zbd_info->zone_info[zone_nr];
+}
+
 /* Verify whether direct I/O is used for all host-managed zoned drives. */
 static bool zbd_using_direct_io(void)
 {
@@ -235,7 +244,7 @@ static bool zbd_is_seq_job(struct fio_file *f)
 	zone_idx_b = zbd_zone_idx(f, f->file_offset);
 	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size - 1);
 	for (zone_idx = zone_idx_b; zone_idx <= zone_idx_e; zone_idx++)
-		if (zbd_zone_swr(&f->zbd_info->zone_info[zone_idx]))
+		if (get_zone(f, zone_idx)->has_wp)
 			return true;
 
 	return false;
@@ -286,7 +295,7 @@ static bool zbd_verify_sizes(void)
 			}
 
 			zone_idx = zbd_zone_idx(f, f->file_offset);
-			z = &f->zbd_info->zone_info[zone_idx];
+			z = get_zone(f, zone_idx);
 			if ((f->file_offset != z->start) &&
 			    (td->o.td_ddir != TD_DDIR_READ)) {
 				new_offset = zbd_zone_end(z);
@@ -302,7 +311,7 @@ static bool zbd_verify_sizes(void)
 				f->file_offset = new_offset;
 			}
 			zone_idx = zbd_zone_idx(f, f->file_offset + f->io_size);
-			z = &f->zbd_info->zone_info[zone_idx];
+			z = get_zone(f, zone_idx);
 			new_end = z->start;
 			if ((td->o.td_ddir != TD_DDIR_READ) &&
 			    (f->file_offset + f->io_size != new_end)) {
@@ -316,10 +325,6 @@ static bool zbd_verify_sizes(void)
 					 (unsigned long long) new_end - f->file_offset);
 				f->io_size = new_end - f->file_offset;
 			}
-
-			f->min_zone = zbd_zone_idx(f, f->file_offset);
-			f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
-			assert(f->min_zone < f->max_zone);
 		}
 	}
 
@@ -415,6 +420,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		p->type = ZBD_ZONE_TYPE_SWR;
 		p->cond = ZBD_ZONE_COND_EMPTY;
 		p->capacity = zone_capacity;
+		p->has_wp = 1;
 	}
 	/* a sentinel */
 	p->start = nr_zones * zone_size;
@@ -443,7 +449,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	struct fio_zone_info *p;
 	uint64_t zone_size, offset;
 	struct zoned_block_device_info *zbd_info = NULL;
-	int i, j, ret = 0;
+	int i, j, ret = -ENOMEM;
 
 	zones = calloc(ZBD_REPORT_MAX_ZONES, sizeof(struct zbd_zone));
 	if (!zones)
@@ -475,7 +481,6 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
-	ret = -ENOMEM;
 	if (!zbd_info)
 		goto out;
 	mutex_init_pshared(&zbd_info->mutex);
@@ -499,8 +504,17 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 				p->wp = z->wp;
 				break;
 			}
+
+			switch (z->type) {
+			case ZBD_ZONE_TYPE_SWR:
+				p->has_wp = 1;
+				break;
+			default:
+				p->has_wp = 0;
+			}
 			p->type = z->type;
 			p->cond = z->cond;
+
 			if (j > 0 && p->start != p[-1].start + zone_size) {
 				log_info("%s: invalid zone data\n",
 					 f->file_name);
@@ -512,8 +526,9 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		offset = z->start + z->len;
 		if (j >= nr_zones)
 			break;
-		nrz = zbd_report_zones(td, f, offset,
-					    zones, ZBD_REPORT_MAX_ZONES);
+		nrz = zbd_report_zones(td, f, offset, zones,
+				       min((uint32_t)(nr_zones - j),
+					   ZBD_REPORT_MAX_ZONES));
 		if (nrz < 0) {
 			ret = nrz;
 			log_info("fio: report zones (offset %llu) failed for %s (%d).\n",
@@ -662,6 +677,18 @@ int zbd_setup_files(struct thread_data *td)
 		if (!zbd)
 			continue;
 
+		f->min_zone = zbd_zone_idx(f, f->file_offset);
+		f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
+
+		/*
+		 * When all zones in the I/O range are conventional, io_size
+		 * can be smaller than zone size, making min_zone the same
+		 * as max_zone. This is why the assert below needs to be made
+		 * conditional.
+		 */
+		if (zbd_is_seq_job(f))
+			assert(f->min_zone < f->max_zone);
+
 		zbd->max_open_zones = zbd->max_open_zones ?: ZBD_MAX_OPEN_ZONES;
 
 		if (td->o.max_open_zones > 0 &&
@@ -695,10 +722,10 @@ int zbd_setup_files(struct thread_data *td)
 	return 0;
 }
 
-static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
-				struct fio_zone_info *zone)
+static inline unsigned int zbd_zone_nr(const struct fio_file *f,
+				       struct fio_zone_info *zone)
 {
-	return zone - zbd_info->zone_info;
+	return zone - f->zbd_info->zone_info;
 }
 
 /**
@@ -716,15 +743,16 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 {
 	uint64_t offset = z->start;
 	uint64_t length = (z+1)->start - offset;
+	uint64_t data_in_zone = z->wp - z->start;
 	int ret = 0;
 
-	if (z->wp == z->start)
+	if (!data_in_zone)
 		return 0;
 
 	assert(is_valid_offset(f, offset + length - 1));
 
 	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
-		zbd_zone_nr(f->zbd_info, z));
+		zbd_zone_nr(f, z));
 	switch (f->zbd_info->model) {
 	case ZBD_HOST_AWARE:
 	case ZBD_HOST_MANAGED:
@@ -737,7 +765,8 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 	}
 
 	pthread_mutex_lock(&f->zbd_info->mutex);
-	f->zbd_info->sectors_with_data -= z->wp - z->start;
+	f->zbd_info->sectors_with_data -= data_in_zone;
+	f->zbd_info->wp_sectors_with_data -= data_in_zone;
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 	z->wp = z->start;
 	z->verify_block = 0;
@@ -757,11 +786,8 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
 			break;
 	}
-	if (open_zone_idx == f->zbd_info->num_open_zones) {
-		dprint(FD_ZBD, "%s: zone %d is not open\n",
-		       f->file_name, zone_idx);
+	if (open_zone_idx == f->zbd_info->num_open_zones)
 		return;
-	}
 
 	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
 	memmove(f->zbd_info->open_zones + open_zone_idx,
@@ -770,7 +796,7 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 		sizeof(f->zbd_info->open_zones[0]));
 	f->zbd_info->num_open_zones--;
 	td->num_open_zones--;
-	f->zbd_info->zone_info[zone_idx].open = 0;
+	get_zone(f, zone_idx)->open = 0;
 }
 
 /*
@@ -794,11 +820,11 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	assert(min_bs);
 
 	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
-		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
+		zbd_zone_nr(f, zb), zbd_zone_nr(f, ze));
 	for (z = zb; z < ze; z++) {
-		uint32_t nz = z - f->zbd_info->zone_info;
+		uint32_t nz = zbd_zone_nr(f, z);
 
-		if (!zbd_zone_swr(z))
+		if (!z->has_wp)
 			continue;
 		zone_lock(td, f, z);
 		if (all_zones) {
@@ -812,12 +838,11 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 		}
 		if (reset_wp) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
-			       f->file_name,
-			       zbd_zone_nr(f->zbd_info, z));
+			       f->file_name, zbd_zone_nr(f, z));
 			if (zbd_reset_zone(td, f, z) < 0)
 				res = 1;
 		}
-		pthread_mutex_unlock(&z->mutex);
+		zone_unlock(z);
 	}
 
 	return res;
@@ -866,29 +891,37 @@ enum swd_action {
 };
 
 /* Calculate the number of sectors with data (swd) and perform action 'a' */
-static uint64_t zbd_process_swd(const struct fio_file *f, enum swd_action a)
+static uint64_t zbd_process_swd(struct thread_data *td,
+				const struct fio_file *f, enum swd_action a)
 {
 	struct fio_zone_info *zb, *ze, *z;
 	uint64_t swd = 0;
+	uint64_t wp_swd = 0;
 
-	zb = &f->zbd_info->zone_info[f->min_zone];
-	ze = &f->zbd_info->zone_info[f->max_zone];
+	zb = get_zone(f, f->min_zone);
+	ze = get_zone(f, f->max_zone);
 	for (z = zb; z < ze; z++) {
-		pthread_mutex_lock(&z->mutex);
+		if (z->has_wp) {
+			zone_lock(td, f, z);
+			wp_swd += z->wp - z->start;
+		}
 		swd += z->wp - z->start;
 	}
 	pthread_mutex_lock(&f->zbd_info->mutex);
 	switch (a) {
 	case CHECK_SWD:
 		assert(f->zbd_info->sectors_with_data == swd);
+		assert(f->zbd_info->wp_sectors_with_data == wp_swd);
 		break;
 	case SET_SWD:
 		f->zbd_info->sectors_with_data = swd;
+		f->zbd_info->wp_sectors_with_data = wp_swd;
 		break;
 	}
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 	for (z = zb; z < ze; z++)
-		pthread_mutex_unlock(&z->mutex);
+		if (z->has_wp)
+			zone_unlock(z);
 
 	return swd;
 }
@@ -899,37 +932,28 @@ static uint64_t zbd_process_swd(const struct fio_file *f, enum swd_action a)
  */
 static const bool enable_check_swd = false;
 
-/* Check whether the value of zbd_info.sectors_with_data is correct. */
-static void zbd_check_swd(const struct fio_file *f)
-{
-	if (!enable_check_swd)
-		return;
-
-	zbd_process_swd(f, CHECK_SWD);
-}
-
-static void zbd_init_swd(struct fio_file *f)
+/* Check whether the values of zbd_info.*sectors_with_data are correct. */
+static void zbd_check_swd(struct thread_data *td, const struct fio_file *f)
 {
-	uint64_t swd;
-
 	if (!enable_check_swd)
 		return;
 
-	swd = zbd_process_swd(f, SET_SWD);
-	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
-	       swd);
+	zbd_process_swd(td, f, CHECK_SWD);
 }
 
 void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_zone_info *zb, *ze;
+	uint64_t swd;
 
 	if (!f->zbd_info || !td_write(td))
 		return;
 
-	zb = &f->zbd_info->zone_info[f->min_zone];
-	ze = &f->zbd_info->zone_info[f->max_zone];
-	zbd_init_swd(f);
+	zb = get_zone(f, f->min_zone);
+	ze = get_zone(f, f->max_zone);
+	swd = zbd_process_swd(td, f, SET_SWD);
+	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
+	       swd);
 	/*
 	 * If data verification is enabled reset the affected zones before
 	 * writing any data to avoid that a zone reset has to be issued while
@@ -968,7 +992,7 @@ static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 			  uint32_t zone_idx)
 {
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
-	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
+	struct fio_zone_info *z = get_zone(f, zone_idx);
 	bool res = true;
 
 	if (z->cond == ZBD_ZONE_COND_OFFLINE)
@@ -1019,7 +1043,8 @@ static uint32_t pick_random_zone_idx(const struct fio_file *f,
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
- * necessary. This algorithm can only work correctly if all write pointers are
+ * necessary. The open zone is searched across sequential zones.
+ * This algorithm can only work correctly if all write pointers are
  * a multiple of the fio block size. The caller must neither hold z->mutex
  * nor f->zbd_info->mutex. Returns with z->mutex held upon success.
  */
@@ -1061,16 +1086,19 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	for (;;) {
 		uint32_t tmp_idx;
 
-		z = &f->zbd_info->zone_info[zone_idx];
-
-		zone_lock(td, f, z);
+		z = get_zone(f, zone_idx);
+		if (z->has_wp)
+			zone_lock(td, f, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
-			goto examine_zone;
-		if (f->zbd_info->num_open_zones == 0) {
-			dprint(FD_ZBD, "%s(%s): no zones are open\n",
-			       __func__, f->file_name);
-			goto open_other_zone;
+		if (z->has_wp) {
+			if (z->cond != ZBD_ZONE_COND_OFFLINE &&
+			    td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
+				goto examine_zone;
+			if (f->zbd_info->num_open_zones == 0) {
+				dprint(FD_ZBD, "%s(%s): no zones are open\n",
+				       __func__, f->file_name);
+				goto open_other_zone;
+			}
 		}
 
 		/*
@@ -1079,7 +1107,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		 * Ignore zones which don't belong to thread's offset/size area.
 		 */
 		open_zone_idx = pick_random_zone_idx(f, io_u);
-		assert(open_zone_idx < f->zbd_info->num_open_zones);
+		assert(!open_zone_idx ||
+		       open_zone_idx < f->zbd_info->num_open_zones);
 		tmp_idx = open_zone_idx;
 		for (i = 0; i < f->zbd_info->num_open_zones; i++) {
 			uint32_t tmpz;
@@ -1098,7 +1127,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		dprint(FD_ZBD, "%s(%s): no candidate zone\n",
 			__func__, f->file_name);
 		pthread_mutex_unlock(&f->zbd_info->mutex);
-		pthread_mutex_unlock(&z->mutex);
+		if (z->has_wp)
+			zone_unlock(z);
 		return NULL;
 
 found_candidate_zone:
@@ -1107,7 +1137,8 @@ found_candidate_zone:
 			break;
 		zone_idx = new_zone_idx;
 		pthread_mutex_unlock(&f->zbd_info->mutex);
-		pthread_mutex_unlock(&z->mutex);
+		if (z->has_wp)
+			zone_unlock(z);
 	}
 
 	/* Both z->mutex and f->zbd_info->mutex are held. */
@@ -1144,14 +1175,17 @@ open_other_zone:
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
 		zone_idx++;
-		pthread_mutex_unlock(&z->mutex);
+		if (z->has_wp)
+			zone_unlock(z);
 		z++;
 		if (!is_valid_offset(f, z->start)) {
 			/* Wrap-around. */
 			zone_idx = f->min_zone;
-			z = &f->zbd_info->zone_info[zone_idx];
+			z = get_zone(f, zone_idx);
 		}
 		assert(is_valid_offset(f, z->start));
+		if (!z->has_wp)
+			continue;
 		zone_lock(td, f, z);
 		if (z->open)
 			continue;
@@ -1168,9 +1202,9 @@ open_other_zone:
 		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
 			continue;
 		pthread_mutex_unlock(&f->zbd_info->mutex);
-		pthread_mutex_unlock(&z->mutex);
+		zone_unlock(z);
 
-		z = &f->zbd_info->zone_info[zone_idx];
+		z = get_zone(f, zone_idx);
 
 		zone_lock(td, f, z);
 		if (z->wp + min_bs <= zbd_zone_capacity_end(z))
@@ -1178,7 +1212,7 @@ open_other_zone:
 		pthread_mutex_lock(&f->zbd_info->mutex);
 	}
 	pthread_mutex_unlock(&f->zbd_info->mutex);
-	pthread_mutex_unlock(&z->mutex);
+	zone_unlock(z);
 	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
 	       f->file_name);
 	return NULL;
@@ -1187,6 +1221,8 @@ out:
 	dprint(FD_ZBD, "%s(%s): returning zone %d\n", __func__, f->file_name,
 	       zone_idx);
 	io_u->offset = z->start;
+	assert(z->has_wp);
+	assert(z->cond != ZBD_ZONE_COND_OFFLINE);
 	return z;
 }
 
@@ -1198,26 +1234,39 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	const struct fio_file *f = io_u->file;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 
-	if (!zbd_open_zone(td, f, z - f->zbd_info->zone_info)) {
-		pthread_mutex_unlock(&z->mutex);
+	if (!zbd_open_zone(td, f, zbd_zone_nr(f, z))) {
+		zone_unlock(z);
 		z = zbd_convert_to_open_zone(td, io_u);
 		assert(z);
 	}
 
-	if (z->verify_block * min_bs >= z->capacity)
+	if (z->verify_block * min_bs >= z->capacity) {
 		log_err("%s: %d * %d >= %llu\n", f->file_name, z->verify_block,
 			min_bs, (unsigned long long)z->capacity);
-	io_u->offset = z->start + z->verify_block++ * min_bs;
+		/*
+		 * If the assertion below fails during a test run, adding
+		 * "--experimental_verify=1" to the command line may help.
+		 */
+		assert(false);
+	}
+	io_u->offset = z->start + z->verify_block * min_bs;
+	if (io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
+		log_err("%s: %llu + %llu >= %llu\n", f->file_name, io_u->offset,
+			io_u->buflen, (unsigned long long) zbd_zone_capacity_end(z));
+		assert(false);
+	}
+	z->verify_block += io_u->buflen / min_bs;
+
 	return z;
 }
 
 /*
- * Find another zone for which @io_u fits below the write pointer. Start
- * searching in zones @zb + 1 .. @zl and continue searching in zones
- * @zf .. @zb - 1.
+ * Find another zone for which @io_u fits in the readable data in the zone.
+ * Search in zones @zb + 1 .. @zl. For random workload, also search in zones
+ * @zb - 1 .. @zf.
  *
- * Either returns NULL or returns a zone pointer and holds the mutex for that
- * zone.
+ * Either returns NULL or returns a zone pointer. When the zone has write
+ * pointer, hold the mutex for the zone.
  */
 static struct fio_zone_info *
 zbd_find_zone(struct thread_data *td, struct io_u *io_u,
@@ -1226,8 +1275,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z1, *z2;
-	const struct fio_zone_info *const zf =
-		&f->zbd_info->zone_info[f->min_zone];
+	const struct fio_zone_info *const zf = get_zone(f, f->min_zone);
 
 	/*
 	 * Skip to the next non-empty zone in case of sequential I/O and to
@@ -1235,19 +1283,23 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	 */
 	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
 		if (z1 < zl && z1->cond != ZBD_ZONE_COND_OFFLINE) {
-			zone_lock(td, f, z1);
+			if (z1->has_wp)
+				zone_lock(td, f, z1);
 			if (z1->start + min_bs <= z1->wp)
 				return z1;
-			pthread_mutex_unlock(&z1->mutex);
+			if (z1->has_wp)
+				zone_unlock(z1);
 		} else if (!td_random(td)) {
 			break;
 		}
 		if (td_random(td) && z2 >= zf &&
 		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
-			zone_lock(td, f, z2);
+			if (z2->has_wp)
+				zone_lock(td, f, z2);
 			if (z2->start + min_bs <= z2->wp)
 				return z2;
-			pthread_mutex_unlock(&z2->mutex);
+			if (z2->has_wp)
+				zone_unlock(z2);
 		}
 	}
 	dprint(FD_ZBD, "%s: adjusting random read offset failed\n",
@@ -1272,7 +1324,7 @@ static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
 	if (io_u->ddir == DDIR_WRITE &&
 	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		zbd_close_zone(td, f, z - f->zbd_info->zone_info);
+		zbd_close_zone(td, f, zbd_zone_nr(f, z));
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 	}
 }
@@ -1300,10 +1352,9 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 
 	zone_idx = zbd_zone_idx(f, io_u->offset);
 	assert(zone_idx < zbd_info->nr_zones);
-	z = &zbd_info->zone_info[zone_idx];
+	z = get_zone(f, zone_idx);
 
-	if (!zbd_zone_swr(z))
-		return;
+	assert(z->has_wp);
 
 	if (!success)
 		goto unlock;
@@ -1321,8 +1372,10 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 		 * z->wp > zone_end means that one or more I/O errors
 		 * have occurred.
 		 */
-		if (z->wp <= zone_end)
+		if (z->wp <= zone_end) {
 			zbd_info->sectors_with_data += zone_end - z->wp;
+			zbd_info->wp_sectors_with_data += zone_end - z->wp;
+		}
 		pthread_mutex_unlock(&zbd_info->mutex);
 		z->wp = zone_end;
 		break;
@@ -1339,7 +1392,7 @@ static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
 unlock:
 	if (!success || q != FIO_Q_QUEUED) {
 		/* BUSY or COMPLETED: unlock the zone */
-		pthread_mutex_unlock(&z->mutex);
+		zone_unlock(z);
 		io_u->zbd_put_io = NULL;
 	}
 }
@@ -1354,17 +1407,15 @@ static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
 	uint32_t zone_idx;
-	int ret;
 
 	if (!zbd_info)
 		return;
 
 	zone_idx = zbd_zone_idx(f, io_u->offset);
 	assert(zone_idx < zbd_info->nr_zones);
-	z = &zbd_info->zone_info[zone_idx];
+	z = get_zone(f, zone_idx);
 
-	if (!zbd_zone_swr(z))
-		return;
+	assert(z->has_wp);
 
 	dprint(FD_ZBD,
 	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
@@ -1372,9 +1423,8 @@ static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 
 	zbd_end_zone_io(td, io_u, z);
 
-	ret = pthread_mutex_unlock(&z->mutex);
-	assert(ret == 0);
-	zbd_check_swd(f);
+	zone_unlock(z);
+	zbd_check_swd(td, f);
 }
 
 /*
@@ -1417,7 +1467,7 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	assert(td->o.zone_size);
 
 	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
-	z = &f->zbd_info->zone_info[zone_idx];
+	z = get_zone(f, zone_idx);
 
 	/*
 	 * When the zone capacity is smaller than the zone size and the I/O is
@@ -1431,8 +1481,7 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 		       "%s: Jump from zone capacity limit to zone end:"
 		       " (%llu -> %llu) for zone %u (%llu)\n",
 		       f->file_name, (unsigned long long) f->last_pos[ddir],
-		       (unsigned long long) zbd_zone_end(z),
-		       zbd_zone_nr(f->zbd_info, z),
+		       (unsigned long long) zbd_zone_end(z), zone_idx,
 		       (unsigned long long) z->capacity);
 		td->io_skip_bytes += zbd_zone_end(z) - f->last_pos[ddir];
 		f->last_pos[ddir] = zbd_zone_end(z);
@@ -1526,12 +1575,34 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	assert(is_valid_offset(f, io_u->offset));
 	assert(io_u->buflen);
 	zone_idx_b = zbd_zone_idx(f, io_u->offset);
-	zb = &f->zbd_info->zone_info[zone_idx_b];
+	zb = get_zone(f, zone_idx_b);
 	orig_zb = zb;
 
-	/* Accept the I/O offset for conventional zones. */
-	if (!zbd_zone_swr(zb))
+	if (!zb->has_wp) {
+		/* Accept non-write I/Os for conventional zones. */
+		if (io_u->ddir != DDIR_WRITE)
+			return io_u_accept;
+		/*
+		 * Make sure that writes to conventional zones
+		 * don't cross over to any sequential zones.
+		 */
+		if (!(zb + 1)->has_wp ||
+		    io_u->offset + io_u->buflen <= (zb + 1)->start)
+			return io_u_accept;
+
+		if (io_u->offset + min_bs > (zb + 1)->start) {
+			dprint(FD_IO,
+			       "%s: off=%llu + min_bs=%u > next zone %llu\n",
+			       f->file_name, io_u->offset,
+			       min_bs, (unsigned long long) (zb + 1)->start);
+			io_u->offset = zb->start + (zb + 1)->start - io_u->offset;
+			new_len = min(io_u->buflen, (zb + 1)->start - io_u->offset);
+		} else {
+			new_len = (zb + 1)->start - io_u->offset;
+		}
+		io_u->buflen = new_len / min_bs * min_bs;
 		return io_u_accept;
+	}
 
 	/*
 	 * Accept the I/O offset for reads if reading beyond the write pointer
@@ -1541,7 +1612,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	    io_u->ddir == DDIR_READ && td->o.read_beyond_wp)
 		return io_u_accept;
 
-	zbd_check_swd(f);
+	zbd_check_swd(td, f);
 
 	zone_lock(td, f, zb);
 
@@ -1549,7 +1620,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING && td_write(td)) {
 			zb = zbd_replay_write_order(td, io_u, zb);
-			pthread_mutex_unlock(&zb->mutex);
 			goto accept;
 		}
 		/*
@@ -1561,8 +1631,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			zb->wp - zb->start : 0;
 		if (range < min_bs ||
 		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
-			pthread_mutex_unlock(&zb->mutex);
-			zl = &f->zbd_info->zone_info[f->max_zone];
+			zone_unlock(zb);
+			zl = get_zone(f, f->max_zone);
 			zb = zbd_find_zone(td, io_u, zb, zl);
 			if (!zb) {
 				dprint(FD_ZBD,
@@ -1591,6 +1661,12 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			io_u->offset = zb->start +
 				((io_u->offset - orig_zb->start) %
 				 (range - io_u->buflen)) / min_bs * min_bs;
+		/*
+		 * When zbd_find_zone() returns a conventional zone,
+		 * we can simply accept the new i/o offset here.
+		 */
+		if (!zb->has_wp)
+			return io_u_accept;
 		/*
 		 * Make sure the I/O does not cross over the zone wp position.
 		 */
@@ -1606,18 +1682,27 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		assert(io_u->offset + io_u->buflen <= zb->wp);
 		goto accept;
 	case DDIR_WRITE:
-		if (io_u->buflen > f->zbd_info->zone_size)
+		if (io_u->buflen > f->zbd_info->zone_size) {
+			td_verror(td, EINVAL, "I/O buflen exceeds zone size");
+			dprint(FD_IO,
+			       "%s: I/O buflen %llu exceeds zone size %llu\n",
+			       f->file_name, io_u->buflen,
+			       (unsigned long long) f->zbd_info->zone_size);
 			goto eof;
+		}
 		if (!zbd_open_zone(td, f, zone_idx_b)) {
-			pthread_mutex_unlock(&zb->mutex);
+			zone_unlock(zb);
 			zb = zbd_convert_to_open_zone(td, io_u);
-			if (!zb)
+			if (!zb) {
+				dprint(FD_IO, "%s: can't convert to open zone",
+				       f->file_name);
 				goto eof;
-			zone_idx_b = zb - f->zbd_info->zone_info;
+			}
+			zone_idx_b = zbd_zone_nr(f, zb);
 		}
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
-			if (f->zbd_info->sectors_with_data >=
+			if (f->zbd_info->wp_sectors_with_data >=
 			    f->io_size * td->o.zrt.u.f &&
 			    zbd_dec_and_reset_write_cnt(td, f)) {
 				zb->reset_zone = 1;
@@ -1639,6 +1724,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				goto eof;
 
 			if (zb->capacity < min_bs) {
+				td_verror(td, EINVAL, "ZCAP is less min_bs");
 				log_err("zone capacity %llu smaller than minimum block size %d\n",
 					(unsigned long long)zb->capacity,
 					min_bs);
@@ -1649,8 +1735,9 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		assert(!zbd_zone_full(f, zb, min_bs));
 		io_u->offset = zb->wp;
 		if (!is_valid_offset(f, io_u->offset)) {
-			dprint(FD_ZBD, "Dropped request with offset %llu\n",
-			       io_u->offset);
+			td_verror(td, EINVAL, "invalid WP value");
+			dprint(FD_ZBD, "%s: dropped request with offset %llu\n",
+			       f->file_name, io_u->offset);
 			goto eof;
 		}
 		/*
@@ -1669,9 +1756,9 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			       orig_len, io_u->buflen);
 			goto accept;
 		}
-		log_err("Zone remainder %lld smaller than minimum block size %d\n",
-			(zbd_zone_capacity_end(zb) - io_u->offset),
-			min_bs);
+		td_verror(td, EIO, "zone remainder too small");
+		log_err("zone remainder %lld smaller than min block size %d\n",
+			(zbd_zone_capacity_end(zb) - io_u->offset), min_bs);
 		goto eof;
 	case DDIR_TRIM:
 		/* fall-through */
@@ -1687,17 +1774,23 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	assert(false);
 
 accept:
-	assert(zb);
+	assert(zb->has_wp);
 	assert(zb->cond != ZBD_ZONE_COND_OFFLINE);
 	assert(!io_u->zbd_queue_io);
 	assert(!io_u->zbd_put_io);
 	io_u->zbd_queue_io = zbd_queue_io;
 	io_u->zbd_put_io = zbd_put_io;
+	/*
+	 * Since we return with the zone lock still held,
+	 * add an annotation to let Coverity know that it
+	 * is intentional.
+	 */
+	/* coverity[missing_unlock] */
 	return io_u_accept;
 
 eof:
-	if (zb)
-		pthread_mutex_unlock(&zb->mutex);
+	if (zb && zb->has_wp)
+		zone_unlock(zb);
 	return io_u_eof;
 }
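
The new conventional-zone branch in zbd_adjust_block() stops a write that starts in a conventional zone from running into the following sequential zone. The sketch below shows the truncation rule. It assumes a simplified zone descriptor: `struct zone` and `clamp_conv_write()` are hypothetical names, not fio code. When fewer than `min_bs` bytes fit before the boundary, the sketch only reports it by returning 0, whereas the patch moves the offset instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified zone descriptor (not fio's struct fio_zone_info). */
struct zone {
	uint64_t start;   /* first byte of the zone */
	bool has_wp;      /* true for zones with a write pointer */
};

/*
 * Clamp a write that starts in a conventional zone so that it does not
 * spill into 'next', the zone that follows it. Returns the (possibly
 * shortened) length, a multiple of min_bs when truncated, or 0 when fewer
 * than min_bs bytes remain before the boundary. That last case is only
 * reported here; relocating the offset is not modelled.
 */
static uint64_t clamp_conv_write(const struct zone *next, uint64_t offset,
				 uint64_t buflen, uint32_t min_bs)
{
	assert(min_bs > 0);
	/* Next zone is also conventional, or the write ends before it. */
	if (!next->has_wp || offset + buflen <= next->start)
		return buflen;
	/* Not even one minimum-size block fits before the boundary. */
	if (offset + min_bs > next->start)
		return 0;
	/* Truncate at the boundary, rounded down to a multiple of min_bs. */
	return (next->start - offset) / min_bs * min_bs;
}
```

For example, a 64 KiB write that starts 40 KiB before a sequential zone is cut to 40 KiB when `min_bs` is 4096.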
 
diff --git a/zbd.h b/zbd.h
index bff55f99..cc3ab624 100644
--- a/zbd.h
+++ b/zbd.h
@@ -28,6 +28,7 @@ enum io_u_action {
  * @mutex: protects the modifiable members in this structure
  * @type: zone type (BLK_ZONE_TYPE_*)
  * @cond: zone state (BLK_ZONE_COND_*)
+ * @has_wp: whether or not this zone can have a valid write pointer
  * @open: whether or not this zone is currently open. Only relevant if
  *		max_open_zones > 0.
  * @reset_zone: whether or not this zone should be reset before writing to it
@@ -40,6 +41,7 @@ struct fio_zone_info {
 	uint32_t		verify_block;
 	enum zbd_zone_type	type:2;
 	enum zbd_zone_cond	cond:4;
+	unsigned int		has_wp:1;
 	unsigned int		open:1;
 	unsigned int		reset_zone:1;
 };
@@ -53,6 +55,8 @@ struct fio_zone_info {
  *		num_open_zones).
  * @zone_size: size of a single zone in bytes.
  * @sectors_with_data: total size of data in all zones in units of 512 bytes
+ * @wp_sectors_with_data: total size of data in zones with write pointers in
+ *                        units of 512 bytes
  * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0
  *		if the zone size is not a power of 2.
  * @nr_zones: number of zones
@@ -73,6 +77,7 @@ struct zoned_block_device_info {
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		sectors_with_data;
+	uint64_t		wp_sectors_with_data;
 	uint32_t		zone_size_log2;
 	uint32_t		nr_zones;
 	uint32_t		refcount;
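
After this patch, zbd_process_swd() keeps two totals. sectors_with_data covers every zone, and wp_sectors_with_data covers only zones that have a write pointer. The reset-threshold check in zbd_adjust_block() now compares the second total against `f->io_size * td->o.zrt.u.f`, so data in conventional zones no longer counts toward the threshold. The sketch below computes the two sums. The struct and function names are hypothetical, and the per-zone locking from the patch is omitted. It assumes a conventional zone's `wp` holds some value at or after its `start`; how fio initializes that value is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified zone state; start and wp share the same unit. */
struct zone {
	uint64_t start;  /* first unit of the zone */
	uint64_t wp;     /* write pointer (assumed >= start) */
	bool has_wp;     /* false for conventional zones */
};

struct swd_totals {
	uint64_t swd;     /* data in all zones */
	uint64_t wp_swd;  /* data only in zones that have a write pointer */
};

/* Sum the written amount over zones [zb, ze), keeping both counters. */
static struct swd_totals count_swd(const struct zone *zb, const struct zone *ze)
{
	struct swd_totals t = { 0, 0 };

	for (const struct zone *z = zb; z < ze; z++) {
		assert(z->wp >= z->start);
		t.swd += z->wp - z->start;
		if (z->has_wp)
			t.wp_swd += z->wp - z->start;
	}
	return t;
}
```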


* Recent changes (master)
@ 2021-01-28 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-28 13:00 UTC
  To: fio

The following changes since commit 7216b664d93ef9c59ca5dbc8f54bad2118231ee3:

  Calculate min_rate with the consideration of thinktime (2021-01-26 08:58:42 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 119bb82143bd6c4a577c135ece4ed6b702443f50:

  Merge branch 'fio-fix-detecting-libpmem' of https://github.com/ldorau/fio (2021-01-27 09:51:01 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fio-fix-detecting-libpmem' of https://github.com/ldorau/fio

Lukasz Dorau (1):
      fio: fix detecting libpmem

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index e6e33d7f..748f7014 100755
--- a/configure
+++ b/configure
@@ -2055,7 +2055,7 @@ cat > $TMPC << EOF
 int main(int argc, char **argv)
 {
   int rc;
-  rc = pmem_is_pmem(NULL, NULL);
+  rc = pmem_is_pmem(NULL, 0);
   return 0;
 }
 EOF


* Recent changes (master)
@ 2021-01-27 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-27 13:00 UTC
  To: fio

The following changes since commit e493ceaeccfaebda9d30435cbbe30e97058313a7:

  HOWTO: add sg 'hipri' option (2021-01-25 14:06:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7216b664d93ef9c59ca5dbc8f54bad2118231ee3:

  Calculate min_rate with the consideration of thinktime (2021-01-26 08:58:42 -0700)

----------------------------------------------------------------
Hongwei Qin (2):
      Add thinktime_blocks_type parameter
      Calculate min_rate with the consideration of thinktime

 HOWTO            |  7 +++++++
 backend.c        | 22 ++++++++++++++++------
 cconv.c          |  2 ++
 engines/cpu.c    |  1 +
 fio.1            |  6 ++++++
 fio.h            |  5 +++++
 options.c        | 22 ++++++++++++++++++++++
 thread_options.h |  5 +++++
 8 files changed, 64 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 3ec86aff..b6d1b58a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2574,6 +2574,13 @@ I/O rate
 	before we have to complete it and do our :option:`thinktime`. In other words, this
 	setting effectively caps the queue depth if the latter is larger.
 
+.. option:: thinktime_blocks_type=str
+
+	Only valid if :option:`thinktime` is set - control how :option:`thinktime_blocks`
+	triggers. The default is `complete`, which triggers thinktime when fio completes
+	:option:`thinktime_blocks` blocks. If this is set to `issue`, then the trigger happens
+	at the issue side.
+
 .. option:: rate=int[,int][,int]
 
 	Cap the bandwidth used by this job. The number is in bytes/sec, the normal
diff --git a/backend.c b/backend.c
index e20a2e07..f2efddd6 100644
--- a/backend.c
+++ b/backend.c
@@ -858,14 +858,15 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 	return 0;
 }
 
-static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
+static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir,
+			     struct timespec *time)
 {
 	unsigned long long b;
 	uint64_t total;
 	int left;
 
-	b = ddir_rw_sum(td->io_blocks);
-	if (b % td->o.thinktime_blocks)
+	b = ddir_rw_sum(td->thinktime_blocks_counter);
+	if (b % td->o.thinktime_blocks || !b)
 		return;
 
 	io_u_quiesce(td);
@@ -898,6 +899,9 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
 		/* adjust for rate_process=poisson */
 		td->last_usec[ddir] += total;
 	}
+
+	if (time && should_check_rate(td))
+		fio_gettime(time, NULL);
 }
 
 /*
@@ -1076,6 +1080,10 @@ reap:
 		}
 		if (ret < 0)
 			break;
+
+		if (ddir_rw(ddir) && td->o.thinktime)
+			handle_thinktime(td, ddir, &comp_time);
+
 		if (!ddir_rw_sum(td->bytes_done) &&
 		    !td_ioengine_flagged(td, FIO_NOIO))
 			continue;
@@ -1090,9 +1098,6 @@ reap:
 		}
 		if (!in_ramp_time(td) && td->o.latency_target)
 			lat_target_check(td);
-
-		if (ddir_rw(ddir) && td->o.thinktime)
-			handle_thinktime(td, ddir);
 	}
 
 	check_update_rusage(td);
@@ -1744,6 +1749,11 @@ static void *thread_main(void *data)
 	if (rate_submit_init(td, sk_out))
 		goto err;
 
+	if (td->o.thinktime_blocks_type == THINKTIME_BLOCKS_TYPE_COMPLETE)
+		td->thinktime_blocks_counter = td->io_blocks;
+	else
+		td->thinktime_blocks_counter = td->io_issues;
+
 	set_epoch_time(td, o->log_unix_epoch);
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
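
With this change, thread_main() points td->thinktime_blocks_counter at one of two counters. It uses td->io_blocks when counting completions and td->io_issues when counting submissions. handle_thinktime() then pauses only when the summed counter is a non-zero multiple of thinktime_blocks. The sketch below models that rule. `struct job`, `job_init()`, `should_think()` and `NR_DIRS` are hypothetical stand-ins, not fio's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_DIRS 3  /* read, write, trim in this sketch */

enum { BLOCKS_TYPE_COMPLETE = 0, BLOCKS_TYPE_ISSUE = 1 };

/* Hypothetical per-job state standing in for the fio thread_data fields. */
struct job {
	uint64_t io_blocks[NR_DIRS];  /* completed I/Os per direction */
	uint64_t io_issues[NR_DIRS];  /* issued I/Os per direction */
	uint64_t *thinktime_counter;  /* points at one of the arrays above */
	unsigned int thinktime_blocks;
};

/* Choose which counter drives thinktime, once, at job start. */
static void job_init(struct job *j, int blocks_type)
{
	j->thinktime_counter = blocks_type == BLOCKS_TYPE_COMPLETE ?
			       j->io_blocks : j->io_issues;
}

/* True when the job should pause: the sum is a non-zero multiple. */
static bool should_think(const struct job *j)
{
	uint64_t b = 0;

	assert(j->thinktime_blocks > 0);
	for (int i = 0; i < NR_DIRS; i++)
		b += j->thinktime_counter[i];
	/* 0 % n == 0, so an empty counter must be excluded explicitly. */
	return b != 0 && b % j->thinktime_blocks == 0;
}
```

Without the `b != 0` test, a check made before any block has been counted would pass, because `0 % n` is 0.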
diff --git a/cconv.c b/cconv.c
index 62c2fc29..b10868fb 100644
--- a/cconv.c
+++ b/cconv.c
@@ -210,6 +210,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->thinktime = le32_to_cpu(top->thinktime);
 	o->thinktime_spin = le32_to_cpu(top->thinktime_spin);
 	o->thinktime_blocks = le32_to_cpu(top->thinktime_blocks);
+	o->thinktime_blocks_type = le32_to_cpu(top->thinktime_blocks_type);
 	o->fsync_blocks = le32_to_cpu(top->fsync_blocks);
 	o->fdatasync_blocks = le32_to_cpu(top->fdatasync_blocks);
 	o->barrier_blocks = le32_to_cpu(top->barrier_blocks);
@@ -431,6 +432,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->thinktime = cpu_to_le32(o->thinktime);
 	top->thinktime_spin = cpu_to_le32(o->thinktime_spin);
 	top->thinktime_blocks = cpu_to_le32(o->thinktime_blocks);
+	top->thinktime_blocks_type = __cpu_to_le32(o->thinktime_blocks_type);
 	top->fsync_blocks = cpu_to_le32(o->fsync_blocks);
 	top->fdatasync_blocks = cpu_to_le32(o->fdatasync_blocks);
 	top->barrier_blocks = cpu_to_le32(o->barrier_blocks);
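
Each new thread option must also be converted in both directions in cconv.c. thread_options_pack stores fixed-width fields, here `uint32_t thinktime_blocks_type`, in little-endian order, presumably so that peers with different native byte orders read the same value. The sketch below shows that encoding portably. It uses hypothetical helpers `put_le32()` and `get_le32()` rather than fio's `cpu_to_le32()` and `le32_to_cpu()` macros.

```c
#include <assert.h>
#include <stdint.h>

/* Store v as 4 little-endian bytes, independent of host byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v);
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Read 4 little-endian bytes back into a host-order value. */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
	       (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}
```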
diff --git a/engines/cpu.c b/engines/cpu.c
index ccbfe003..ce74dbce 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -268,6 +268,7 @@ static int fio_cpuio_init(struct thread_data *td)
 	 * set thinktime_sleep and thinktime_spin appropriately
 	 */
 	o->thinktime_blocks = 1;
+	o->thinktime_blocks_type = THINKTIME_BLOCKS_TYPE_COMPLETE;
 	o->thinktime_spin = 0;
 	o->thinktime = ((unsigned long long) co->cpucycle *
 				(100 - co->cpuload)) / co->cpuload;
diff --git a/fio.1 b/fio.1
index 9636a85f..aa248a3b 100644
--- a/fio.1
+++ b/fio.1
@@ -2323,6 +2323,12 @@ queue depth setting redundant, since no more than 1 I/O will be queued
 before we have to complete it and do our \fBthinktime\fR. In other words, this
 setting effectively caps the queue depth if the latter is larger.
 .TP
+.BI thinktime_blocks_type \fR=\fPstr
+Only valid if \fBthinktime\fR is set - control how \fBthinktime_blocks\fR triggers.
+The default is `complete', which triggers \fBthinktime\fR when fio completes
+\fBthinktime_blocks\fR blocks. If this is set to `issue', then the trigger happens
+at the issue side.
+.TP
 .BI rate \fR=\fPint[,int][,int]
 Cap the bandwidth used by this job. The number is in bytes/sec, the normal
 suffix rules apply. Comma-separated values may be specified for reads,
diff --git a/fio.h b/fio.h
index 062abfa7..b05cb3df 100644
--- a/fio.h
+++ b/fio.h
@@ -149,6 +149,9 @@ enum {
 
 	RATE_PROCESS_LINEAR = 0,
 	RATE_PROCESS_POISSON = 1,
+
+	THINKTIME_BLOCKS_TYPE_COMPLETE = 0,
+	THINKTIME_BLOCKS_TYPE_ISSUE = 1,
 };
 
 enum {
@@ -354,6 +357,8 @@ struct thread_data {
 	struct fio_sem *sem;
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 
+	uint64_t *thinktime_blocks_counter;
+
 	/*
 	 * State for random io, a bitmap of blocks done vs not done
 	 */
diff --git a/options.c b/options.c
index 955bf959..e62e0cfb 100644
--- a/options.c
+++ b/options.c
@@ -3608,6 +3608,28 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_THINKTIME,
 	},
+	{
+		.name	= "thinktime_blocks_type",
+		.lname	= "Thinktime blocks type",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, thinktime_blocks_type),
+		.help	= "How thinktime_blocks takes effect",
+		.def	= "complete",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_THINKTIME,
+		.posval = {
+			  { .ival = "complete",
+			    .oval = THINKTIME_BLOCKS_TYPE_COMPLETE,
+			    .help = "thinktime_blocks takes effect at the completion side",
+			  },
+			  {
+			    .ival = "issue",
+			    .oval = THINKTIME_BLOCKS_TYPE_ISSUE,
+			    .help = "thinktime_blocks takes effect at the issue side",
+			  },
+		},
+		.parent = "thinktime",
+	},
 	{
 		.name	= "rate",
 		.lname	= "I/O rate",
diff --git a/thread_options.h b/thread_options.h
index 0a033430..f6b15403 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -177,6 +177,7 @@ struct thread_options {
 	unsigned int thinktime;
 	unsigned int thinktime_spin;
 	unsigned int thinktime_blocks;
+	unsigned int thinktime_blocks_type;
 	unsigned int fsync_blocks;
 	unsigned int fdatasync_blocks;
 	unsigned int barrier_blocks;
@@ -479,6 +480,7 @@ struct thread_options_pack {
 	uint32_t thinktime;
 	uint32_t thinktime_spin;
 	uint32_t thinktime_blocks;
+	uint32_t thinktime_blocks_type;
 	uint32_t fsync_blocks;
 	uint32_t fdatasync_blocks;
 	uint32_t barrier_blocks;
@@ -506,6 +508,9 @@ struct thread_options_pack {
 	uint32_t stonewall;
 	uint32_t new_group;
 	uint32_t numjobs;
+
+	uint8_t pad3[4];
+
 	/*
 	 * We currently can't convert these, so don't enable them
 	 */


* Recent changes (master)
@ 2021-01-26 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-26 13:00 UTC
  To: fio

The following changes since commit 42664db2e3ba38faae8ee4c8375f8958206a8c5d:

  Merge branch 'esx-timerfd-bypass' of https://github.com/brianredbeard/fio (2021-01-23 11:04:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e493ceaeccfaebda9d30435cbbe30e97058313a7:

  HOWTO: add sg 'hipri' option (2021-01-25 14:06:48 -0700)

----------------------------------------------------------------
Douglas Gilbert (1):
      fio: add hipri option to sg engine

Eric Sandeen (2):
      fio: move dynamic library handle to io_ops structure
      fio: fix dlopen refcounting of dynamic engines

Jens Axboe (1):
      HOWTO: add sg 'hipri' option

 HOWTO        | 12 ++++++++++++
 engines/sg.c | 22 +++++++++++++++++++++-
 fio.1        | 10 ++++++++++
 fio.h        |  1 -
 init.c       |  9 +++------
 ioengines.c  | 16 ++++++++++------
 ioengines.h  |  3 ++-
 7 files changed, 58 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 372f268f..3ec86aff 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2359,6 +2359,18 @@ with the caveat that when used on the command line, they must come after the
 		transferred to the device. The writefua option is ignored with this
 		selection.
 
+.. option:: hipri : [sg]
+
+	If this option is set, fio will attempt to use polled IO completions.
+	This will have a similar effect as (io_uring)hipri. Only SCSI READ and
+	WRITE commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor
+	VERIFY). Older versions of the Linux sg driver that do not support
+	hipri will simply ignore this flag and do normal IO. The Linux SCSI
+	Low Level Driver (LLD) that "owns" the device also needs to support
+	hipri (also known as iopoll and mq_poll). The MegaRAID driver is an
+	example of a SCSI LLD. Default: clear (0) which does normal
+	(interrupted based) IO.
+
 .. option:: http_host=str : [http]
 
 	Hostname to connect to. For S3, this could be the bucket hostname.
diff --git a/engines/sg.c b/engines/sg.c
index a1a6de4c..0c2d2c8b 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -60,6 +60,10 @@
 
 #ifdef FIO_HAVE_SGIO
 
+#ifndef SGV4_FLAG_HIPRI
+#define SGV4_FLAG_HIPRI 0x800
+#endif
+
 enum {
 	FIO_SG_WRITE		= 1,
 	FIO_SG_WRITE_VERIFY	= 2,
@@ -68,12 +72,22 @@ enum {
 
 struct sg_options {
 	void *pad;
+	unsigned int hipri;
 	unsigned int readfua;
 	unsigned int writefua;
 	unsigned int write_mode;
 };
 
 static struct fio_option options[] = {
+        {
+                .name   = "hipri",
+                .lname  = "High Priority",
+                .type   = FIO_OPT_STR_SET,
+                .off1   = offsetof(struct sg_options, hipri),
+                .help   = "Use polled IO completions",
+                .category = FIO_OPT_C_ENGINE,
+                .group  = FIO_OPT_G_SG,
+        },
 	{
 		.name	= "readfua",
 		.lname	= "sg engine read fua flag support",
@@ -527,6 +541,8 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 		else
 			hdr->cmdp[0] = 0x88; // read(16)
 
+		if (o->hipri)
+			hdr->flags |= SGV4_FLAG_HIPRI;
 		if (o->readfua)
 			hdr->cmdp[1] |= 0x08;
 
@@ -542,6 +558,8 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 				hdr->cmdp[0] = 0x2a; // write(10)
 			else
 				hdr->cmdp[0] = 0x8a; // write(16)
+			if (o->hipri)
+				hdr->flags |= SGV4_FLAG_HIPRI;
 			if (o->writefua)
 				hdr->cmdp[1] |= 0x08;
 			break;
@@ -865,6 +883,7 @@ static int fio_sgio_init(struct thread_data *td)
 {
 	struct sgio_data *sd;
 	struct sgio_trim *st;
+	struct sg_io_hdr *h3p;
 	int i;
 
 	sd = calloc(1, sizeof(*sd));
@@ -880,12 +899,13 @@ static int fio_sgio_init(struct thread_data *td)
 #ifdef FIO_SGIO_DEBUG
 	sd->trim_queue_map = calloc(td->o.iodepth, sizeof(int));
 #endif
-	for (i = 0; i < td->o.iodepth; i++) {
+	for (i = 0, h3p = sd->sgbuf; i < td->o.iodepth; i++, ++h3p) {
 		sd->trim_queues[i] = calloc(1, sizeof(struct sgio_trim));
 		st = sd->trim_queues[i];
 		st->unmap_param = calloc(td->o.iodepth + 1, sizeof(char[16]));
 		st->unmap_range_count = 0;
 		st->trim_io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
+		h3p->interface_id = 'S';
 	}
 
 	td->io_ops_data = sd;
diff --git a/fio.1 b/fio.1
index d477b508..9636a85f 100644
--- a/fio.1
+++ b/fio.1
@@ -2114,6 +2114,16 @@ client and the server or in certain loopback configurations.
 Specify stat system call type to measure lookup/getattr performance.
 Default is \fBstat\fR for \fBstat\fR\|(2).
 .TP
+.BI (sg)hipri
+If this option is set, fio will attempt to use polled IO completions. This
+will have a similar effect as (io_uring)hipri. Only SCSI READ and WRITE
+commands will have the SGV4_FLAG_HIPRI set (not UNMAP (trim) nor VERIFY).
+Older versions of the Linux sg driver that do not support hipri will simply
+ignore this flag and do normal IO. The Linux SCSI Low Level Driver (LLD)
+that "owns" the device also needs to support hipri (also known as iopoll
+and mq_poll). The MegaRAID driver is an example of a SCSI LLD.
+Default: clear (0) which does normal (interrupted based) IO.
+.TP
 .BI (sg)readfua \fR=\fPbool
 With readfua option set to 1, read operations include the force
 unit access (fua) flag. Default: 0.
diff --git a/fio.h b/fio.h
index ee582a72..062abfa7 100644
--- a/fio.h
+++ b/fio.h
@@ -281,7 +281,6 @@ struct thread_data {
 	 * IO engine private data and dlhandle.
 	 */
 	void *io_ops_data;
-	void *io_ops_dlhandle;
 
 	/*
 	 * Queue depth of io_u's that fio MIGHT do
diff --git a/init.c b/init.c
index 1d14df16..d6dbaf7c 100644
--- a/init.c
+++ b/init.c
@@ -1104,18 +1104,15 @@ int ioengine_load(struct thread_data *td)
 		 * for this name and see if they match. If they do, then
 		 * the engine is unchanged.
 		 */
-		dlhandle = td->io_ops_dlhandle;
+		dlhandle = td->io_ops->dlhandle;
 		ops = load_ioengine(td);
 		if (!ops)
 			goto fail;
 
-		if (ops == td->io_ops && dlhandle == td->io_ops_dlhandle) {
-			if (dlhandle)
-				dlclose(dlhandle);
+		if (ops == td->io_ops && dlhandle == td->io_ops->dlhandle)
 			return 0;
-		}
 
-		if (dlhandle && dlhandle != td->io_ops_dlhandle)
+		if (dlhandle && dlhandle != td->io_ops->dlhandle)
 			dlclose(dlhandle);
 
 		/* Unload the old engine. */
diff --git a/ioengines.c b/ioengines.c
index 5ac512ae..f88b0537 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -95,6 +95,7 @@ static void *dlopen_external(struct thread_data *td, const char *engine)
 
 	sprintf(engine_path, "%s/fio-%s.so", FIO_EXT_ENG_DIR, engine);
 
+	dprint(FD_IO, "dlopen external %s\n", engine_path);
 	dlhandle = dlopen(engine_path, RTLD_LAZY);
 	if (!dlhandle)
 		log_info("Engine %s not found; Either name is invalid, was not built, or fio-engine-%s package is missing.\n",
@@ -116,7 +117,7 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 	    !strncmp(engine_lib, "aio", 3))
 		engine_lib = "libaio";
 
-	dprint(FD_IO, "dload engine %s\n", engine_lib);
+	dprint(FD_IO, "dlopen engine %s\n", engine_lib);
 
 	dlerror();
 	dlhandle = dlopen(engine_lib, RTLD_LAZY);
@@ -155,7 +156,7 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 		return NULL;
 	}
 
-	td->io_ops_dlhandle = dlhandle;
+	ops->dlhandle = dlhandle;
 	return ops;
 }
 
@@ -194,7 +195,9 @@ struct ioengine_ops *load_ioengine(struct thread_data *td)
 	 * so as not to break job files not using the prefix.
 	 */
 	ops = __load_ioengine(td->o.ioengine);
-	if (!ops)
+
+	/* We do re-dlopen existing handles, for reference counting */
+	if (!ops || ops->dlhandle)
 		ops = dlopen_ioengine(td, name);
 
 	/*
@@ -228,9 +231,10 @@ void free_ioengine(struct thread_data *td)
 		td->eo = NULL;
 	}
 
-	if (td->io_ops_dlhandle) {
-		dlclose(td->io_ops_dlhandle);
-		td->io_ops_dlhandle = NULL;
+	if (td->io_ops->dlhandle) {
+		dprint(FD_IO, "dlclose ioengine %s\n", td->io_ops->name);
+		dlclose(td->io_ops->dlhandle);
+		td->io_ops->dlhandle = NULL;
 	}
 
 	td->io_ops = NULL;
diff --git a/ioengines.h b/ioengines.h
index a928b211..839b318d 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -8,7 +8,7 @@
 #include "io_u.h"
 #include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	27
+#define FIO_IOOPS_VERSION	28
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static
@@ -30,6 +30,7 @@ struct ioengine_ops {
 	const char *name;
 	int version;
 	int flags;
+	void *dlhandle;
 	int (*setup)(struct thread_data *);
 	int (*init)(struct thread_data *);
 	int (*post_init)(struct thread_data *);



* Recent changes (master)
@ 2021-01-24 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-24 13:00 UTC
  To: fio

The following changes since commit 548b363c08875165a018788195e8fd2304c2ce24:

  Merge branch 'fix_filename_overrun' of https://github.com/sitsofe/fio (2021-01-16 13:36:27 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 42664db2e3ba38faae8ee4c8375f8958206a8c5d:

  Merge branch 'esx-timerfd-bypass' of https://github.com/brianredbeard/fio (2021-01-23 11:04:54 -0700)

----------------------------------------------------------------
Brian 'Redbeard' Harrington (1):
      configure: ESX does not have timerfd support

Jens Axboe (1):
      Merge branch 'esx-timerfd-bypass' of https://github.com/brianredbeard/fio

 configure | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 1306f1b3..e6e33d7f 100755
--- a/configure
+++ b/configure
@@ -2695,6 +2695,7 @@ print_config "Windows PDB generation" "$pdb"
 ##########################################
 # check for timerfd support
 timerfd_create="no"
+if test "$esx" != "yes" ; then
 cat > $TMPC << EOF
 #include <sys/time.h>
 #include <sys/timerfd.h>
@@ -2704,8 +2705,9 @@ int main(int argc, char **argv)
 	return timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
 }
 EOF
-if compile_prog "" "" "timerfd_create"; then
-  timerfd_create="yes"
+  if compile_prog "" "" "timerfd_create"; then
+    timerfd_create="yes"
+  fi
 fi
 print_config "timerfd_create" "$timerfd_create"
 



* Recent changes (master)
@ 2021-01-17 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-17 13:00 UTC
  To: fio

The following changes since commit e4d9a7bf68d0ffb9fd7ab328a4f0edddc89297be:

  Merge branch 'fix_keyword_sub' of https://github.com/sitsofe/fio (2021-01-15 20:52:57 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 548b363c08875165a018788195e8fd2304c2ce24:

  Merge branch 'fix_filename_overrun' of https://github.com/sitsofe/fio (2021-01-16 13:36:27 -0700)

----------------------------------------------------------------
HongweiQin (1):
      Fix a rate limit issue.

Jens Axboe (1):
      Merge branch 'fix_filename_overrun' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      options: fix buffer overrun

 backend.c |  4 ++--
 fio.h     | 10 +---------
 options.c |  1 +
 parse.c   |  5 +++++
 4 files changed, 9 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 2e6a377c..e20a2e07 100644
--- a/backend.c
+++ b/backend.c
@@ -439,7 +439,7 @@ static int wait_for_completions(struct thread_data *td, struct timespec *time)
 	if ((full && !min_evts) || !td->o.iodepth_batch_complete_min)
 		min_evts = 1;
 
-	if (time && __should_check_rate(td))
+	if (time && should_check_rate(td))
 		fio_gettime(time, NULL);
 
 	do {
@@ -494,7 +494,7 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 			requeue_io_u(td, &io_u);
 		} else {
 sync_done:
-			if (comp_time && __should_check_rate(td))
+			if (comp_time && should_check_rate(td))
 				fio_gettime(comp_time, NULL);
 
 			*ret = io_u_sync_complete(td, io_u);
diff --git a/fio.h b/fio.h
index 4d439d98..ee582a72 100644
--- a/fio.h
+++ b/fio.h
@@ -757,17 +757,9 @@ static inline bool option_check_rate(struct thread_data *td, enum fio_ddir ddir)
 	return false;
 }
 
-static inline bool __should_check_rate(struct thread_data *td)
-{
-	return (td->flags & TD_F_CHECK_RATE) != 0;
-}
-
 static inline bool should_check_rate(struct thread_data *td)
 {
-	if (!__should_check_rate(td))
-		return false;
-
-	return ddir_rw_sum(td->bytes_done) != 0;
+	return (td->flags & TD_F_CHECK_RATE) != 0;
 }
 
 static inline unsigned long long td_max_bs(struct thread_data *td)
diff --git a/options.c b/options.c
index 0b4c48d6..955bf959 100644
--- a/options.c
+++ b/options.c
@@ -1672,6 +1672,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Filename(s)",
 		.type	= FIO_OPT_STR_STORE,
 		.off1	= offsetof(struct thread_options, filename),
+		.maxlen	= PATH_MAX,
 		.cb	= str_filename_cb,
 		.prio	= -1, /* must come after "directory" */
 		.help	= "File(s) to use for the workload",
diff --git a/parse.c b/parse.c
index c28d82ef..44bf9507 100644
--- a/parse.c
+++ b/parse.c
@@ -786,6 +786,11 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		if (o->off1) {
 			cp = td_var(data, o, o->off1);
 			*cp = strdup(ptr);
+			if (strlen(ptr) > o->maxlen - 1) {
+				log_err("value exceeds max length of %d\n",
+					o->maxlen);
+				return 1;
+			}
 		}
 
 		if (fn)



* Recent changes (master)
@ 2021-01-16 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-16 13:00 UTC
  To: fio

The following changes since commit 4008b7fc8e2bff60a4e98de0005e6bc71b1a8641:

  Merge branch 'zipf-pareto-lock' of https://github.com/aclamk/fio (2021-01-12 10:52:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e4d9a7bf68d0ffb9fd7ab328a4f0edddc89297be:

  Merge branch 'fix_keyword_sub' of https://github.com/sitsofe/fio (2021-01-15 20:52:57 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix_keyword_sub' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      options: fix keyword substitution heap overrun

 options.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 47b20c24..0b4c48d6 100644
--- a/options.c
+++ b/options.c
@@ -5115,10 +5115,10 @@ static char *fio_keyword_replace(char *opt)
 			 * If there's more in the original string, copy that
 			 * in too
 			 */
-			opt += strlen(kw->word) + olen;
+			opt += olen + strlen(kw->word);
 			/* keeps final zero thanks to calloc */
 			if (strlen(opt))
-				memcpy(new + olen + len, opt, opt - o_org - 1);
+				memcpy(new + olen + len, opt, strlen(opt));
 
 			/*
 			 * replace opt and free the old opt



* Recent changes (master)
@ 2021-01-13 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-13 13:00 UTC
  To: fio

The following changes since commit 674428a527931d86bfb164abcc847508b3be2742:

  Merge branch 'num2str-patch' of https://github.com/gloit042/fio (2021-01-09 15:28:44 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4008b7fc8e2bff60a4e98de0005e6bc71b1a8641:

  Merge branch 'zipf-pareto-lock' of https://github.com/aclamk/fio (2021-01-12 10:52:54 -0700)

----------------------------------------------------------------
Adam Kupczyk (1):
      distibutions: Extend flexibility of non-uniform random distributions

Jens Axboe (3):
      Merge branch 'fuzz' of https://github.com/catenacyber/fio
      Merge branch 'osx_fix' of https://github.com/sitsofe/fio
      Merge branch 'zipf-pareto-lock' of https://github.com/aclamk/fio

Philippe Antoine (2):
      fuzz: Adds fuzz target for parse_jobs_ini
      options: Fix buffer over read in fio_keyword_replace

Sitsofe Wheeler (1):
      configure: fix compilation on recent macOS Xcode versions

 HOWTO                  | 10 +++++++++-
 Makefile               | 26 +++++++++++++++++++++++++
 cconv.c                |  2 ++
 configure              | 12 ++++++------
 filesetup.c            |  6 +++---
 fio.1                  | 10 +++++++++-
 fio.h                  |  1 +
 init.c                 |  8 +++++---
 lib/gauss.c            |  8 ++++++--
 lib/gauss.h            |  3 ++-
 lib/zipf.c             | 12 +++++++-----
 lib/zipf.h             |  6 ++++--
 options.c              | 42 ++++++++++++++++++++++++++++++++++++++---
 server.h               |  2 +-
 t/fuzz/fuzz_parseini.c | 41 ++++++++++++++++++++++++++++++++++++++++
 t/fuzz/onefile.c       | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
 t/genzipf.c            |  6 +++---
 thread_options.h       |  2 ++
 18 files changed, 217 insertions(+), 31 deletions(-)
 create mode 100644 t/fuzz/fuzz_parseini.c
 create mode 100644 t/fuzz/onefile.c

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0547c721..372f268f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1361,7 +1361,7 @@ I/O type
 	limit reads or writes to a certain rate.  If that is the case, then the
 	distribution may be skewed. Default: 50.
 
-.. option:: random_distribution=str:float[,str:float][,str:float]
+.. option:: random_distribution=str:float[:float][,str:float][,str:float]
 
 	By default, fio will use a completely uniform random distribution when asked
 	to perform random I/O. Sometimes it is useful to skew the distribution in
@@ -1396,6 +1396,14 @@ I/O type
 	map. For the **normal** distribution, a normal (Gaussian) deviation is
 	supplied as a value between 0 and 100.
 
+	The second, optional float is allowed for **pareto**, **zipf** and **normal** distributions.
+	It allows to set base of distribution in non-default place, giving more control
+	over most probable outcome. This value is in range [0-1] which maps linearly to
+	range of possible random values.
+	Defaults are: random for **pareto** and **zipf**, and 0.5 for **normal**.
+	If you wanted to use **zipf** with a `theta` of 1.2 centered on 1/4 of allowed value range,
+	you would use ``random_distibution=zipf:1.2:0.25``.
+
 	For a **zoned** distribution, fio supports specifying percentages of I/O
 	access that should fall within what range of the file or device. For
 	example, given a criteria of:
diff --git a/Makefile b/Makefile
index a838af9a..f74e59e1 100644
--- a/Makefile
+++ b/Makefile
@@ -346,6 +346,23 @@ T_MEMLOCK_PROGS = t/memlock
 T_TT_OBJS = t/time-test.o
 T_TT_PROGS = t/time-test
 
+T_FUZZ_OBJS = t/fuzz/fuzz_parseini.o
+T_FUZZ_OBJS += $(OBJS)
+ifdef CONFIG_ARITHMETIC
+T_FUZZ_OBJS += lex.yy.o y.tab.o
+endif
+# in case there is no fuzz driver defined by environment variable LIB_FUZZING_ENGINE, use a simple one
+# For instance, with compiler clang, address sanitizer and libFuzzer as a fuzzing engine, you should define
+# export CFLAGS="-fsanitize=address,fuzzer-no-link"
+# export LIB_FUZZING_ENGINE="-fsanitize=address"
+# export CC=clang
+# before running configure && make
+# You can adapt this with different compilers, sanitizers, and fuzzing engines
+ifndef LIB_FUZZING_ENGINE
+T_FUZZ_OBJS += t/fuzz/onefile.o
+endif
+T_FUZZ_PROGS = t/fuzz/fuzz_parseini
+
 T_OBJS = $(T_SMALLOC_OBJS)
 T_OBJS += $(T_IEEE_OBJS)
 T_OBJS += $(T_ZIPF_OBJS)
@@ -359,6 +376,7 @@ T_OBJS += $(T_PIPE_ASYNC_OBJS)
 T_OBJS += $(T_MEMLOCK_OBJS)
 T_OBJS += $(T_TT_OBJS)
 T_OBJS += $(T_IOU_RING_OBJS)
+T_OBJS += $(T_FUZZ_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
     T_DEDUPE_OBJS += $(WINDOWS_OBJS)
@@ -382,6 +400,7 @@ endif
 ifneq (,$(findstring Linux,$(CONFIG_TARGET_OS)))
 T_TEST_PROGS += $(T_IOU_RING_PROGS)
 endif
+T_TEST_PROGS += $(T_FUZZ_PROGS)
 
 PROGS += $(T_PROGS)
 
@@ -533,6 +552,13 @@ t/ieee754: $(T_IEEE_OBJS)
 fio: $(FIO_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(FIO_OBJS) $(LIBS) $(HDFSLIB)
 
+t/fuzz/fuzz_parseini: $(T_FUZZ_OBJS)
+ifndef LIB_FUZZING_ENGINE
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_FUZZ_OBJS) $(LIBS) $(HDFSLIB)
+else
+	$(QUIET_LINK)$(CXX) $(LDFLAGS) -o $@ $(T_FUZZ_OBJS) $(LIB_FUZZING_ENGINE) $(LIBS) $(HDFSLIB)
+endif
+
 gfio: $(GFIO_OBJS)
 	$(QUIET_LINK)$(CC) $(filter-out -static, $(LDFLAGS)) -o gfio $(GFIO_OBJS) $(LIBS) $(GFIO_LIBS) $(GTK_LDFLAGS) $(HDFSLIB)
 
diff --git a/cconv.c b/cconv.c
index 488dd799..62c2fc29 100644
--- a/cconv.c
+++ b/cconv.c
@@ -203,6 +203,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->zipf_theta.u.f = fio_uint64_to_double(le64_to_cpu(top->zipf_theta.u.i));
 	o->pareto_h.u.f = fio_uint64_to_double(le64_to_cpu(top->pareto_h.u.i));
 	o->gauss_dev.u.f = fio_uint64_to_double(le64_to_cpu(top->gauss_dev.u.i));
+	o->random_center.u.f = fio_uint64_to_double(le64_to_cpu(top->random_center.u.i));
 	o->random_generator = le32_to_cpu(top->random_generator);
 	o->hugepage_size = le32_to_cpu(top->hugepage_size);
 	o->rw_min_bs = le64_to_cpu(top->rw_min_bs);
@@ -423,6 +424,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->zipf_theta.u.i = __cpu_to_le64(fio_double_to_uint64(o->zipf_theta.u.f));
 	top->pareto_h.u.i = __cpu_to_le64(fio_double_to_uint64(o->pareto_h.u.f));
 	top->gauss_dev.u.i = __cpu_to_le64(fio_double_to_uint64(o->gauss_dev.u.f));
+	top->random_center.u.i = __cpu_to_le64(fio_double_to_uint64(o->random_center.u.f));
 	top->random_generator = cpu_to_le32(o->random_generator);
 	top->hugepage_size = cpu_to_le32(o->hugepage_size);
 	top->rw_min_bs = __cpu_to_le64(o->rw_min_bs);
diff --git a/configure b/configure
index e3e37d56..1306f1b3 100755
--- a/configure
+++ b/configure
@@ -45,6 +45,7 @@ print_config() {
 
 # Default CFLAGS
 CFLAGS="-D_GNU_SOURCE -include config-host.h $CFLAGS"
+CONFIGURE_CFLAGS="-Werror-implicit-function-declaration"
 BUILD_CFLAGS=""
 
 # Print a helpful header at the top of config.log
@@ -88,14 +89,14 @@ do_cc() {
 }
 
 compile_object() {
-  do_cc $CFLAGS -Werror-implicit-function-declaration -c -o $TMPO $TMPC
+  do_cc $CFLAGS $CONFIGURE_CFLAGS -c -o $TMPO $TMPC
 }
 
 compile_prog() {
   local_cflags="$1"
   local_ldflags="$2 $LIBS"
   echo "Compiling test case $3" >> config.log
-  do_cc $CFLAGS -Werror-implicit-function-declaration $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
+  do_cc $CFLAGS $CONFIGURE_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
 }
 
 feature_not_found() {
@@ -360,16 +361,15 @@ Darwin)
   if test -z "$cpu" && test "$(sysctl -n hw.optional.x86_64)" = "1"; then
     cpu="x86_64"
   fi
-  # Error at compile time linking of weak/partial symbols if possible...
+  # Avoid configure feature detection of features provided by weak symbols
 cat > $TMPC <<EOF
 int main(void)
 {
   return 0;
 }
 EOF
-  if compile_prog "" "-Wl,-no_weak_imports" "disable weak symbols"; then
-    echo "Disabling weak symbols"
-    LDFLAGS="$LDFLAGS -Wl,-no_weak_imports"
+  if compile_prog "" "-Werror=partial-availability" "error on weak symbols"; then
+    CONFIGURE_CFLAGS="$CONFIGURE_CFLAGS -Werror=partial-availability"
   fi
   ;;
 SunOS)
diff --git a/filesetup.c b/filesetup.c
index 76b3f935..9d033757 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1319,11 +1319,11 @@ static void __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 		seed = td->rand_seeds[4];
 
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
-		zipf_init(&f->zipf, nranges, td->o.zipf_theta.u.f, seed);
+		zipf_init(&f->zipf, nranges, td->o.zipf_theta.u.f, td->o.random_center.u.f, seed);
 	else if (td->o.random_distribution == FIO_RAND_DIST_PARETO)
-		pareto_init(&f->zipf, nranges, td->o.pareto_h.u.f, seed);
+		pareto_init(&f->zipf, nranges, td->o.pareto_h.u.f, td->o.random_center.u.f, seed);
 	else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
-		gauss_init(&f->gauss, nranges, td->o.gauss_dev.u.f, seed);
+		gauss_init(&f->gauss, nranges, td->o.gauss_dev.u.f, td->o.random_center.u.f, seed);
 }
 
 static bool init_rand_distribution(struct thread_data *td)
diff --git a/fio.1 b/fio.1
index e361b05f..d477b508 100644
--- a/fio.1
+++ b/fio.1
@@ -1132,7 +1132,7 @@ first. This may interfere with a given rate setting, if fio is asked to
 limit reads or writes to a certain rate. If that is the case, then the
 distribution may be skewed. Default: 50.
 .TP
-.BI random_distribution \fR=\fPstr:float[,str:float][,str:float]
+.BI random_distribution \fR=\fPstr:float[:float][,str:float][,str:float]
 By default, fio will use a completely uniform random distribution when asked
 to perform random I/O. Sometimes it is useful to skew the distribution in
 specific ways, ensuring that some parts of the data is more hot than others.
@@ -1168,6 +1168,14 @@ option. If a non\-uniform model is used, fio will disable use of the random
 map. For the \fBnormal\fR distribution, a normal (Gaussian) deviation is
 supplied as a value between 0 and 100.
 .P
+The second, optional float is allowed for \fBpareto\fR, \fBzipf\fR and \fBnormal\fR
+distributions. It allows to set base of distribution in non-default place, giving
+more control over most probable outcome. This value is in range [0-1] which maps linearly to
+range of possible random values.
+Defaults are: random for \fBpareto\fR and \fBzipf\fR, and 0.5 for \fBnormal\fR.
+If you wanted to use \fBzipf\fR with a `theta` of 1.2 centered on 1/4 of allowed value range,
+you would use `random_distibution=zipf:1.2:0.25`.
+.P
 For a \fBzoned\fR distribution, fio supports specifying percentages of I/O
 access that should fall within what range of the file or device. For
 example, given a criteria of:
diff --git a/fio.h b/fio.h
index fffec001..4d439d98 100644
--- a/fio.h
+++ b/fio.h
@@ -229,6 +229,7 @@ struct thread_data {
 		double pareto_h;
 		double gauss_dev;
 	};
+	double random_center;
 	int error;
 	int sig;
 	int done;
diff --git a/init.c b/init.c
index f9c20bdb..1d14df16 100644
--- a/init.c
+++ b/init.c
@@ -327,6 +327,7 @@ void free_threads_shm(void)
 
 static void free_shm(void)
 {
+#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
 	if (nr_segments) {
 		flow_exit();
 		fio_debug_jobp = NULL;
@@ -343,6 +344,7 @@ static void free_shm(void)
 	fio_filelock_exit();
 	file_hash_exit();
 	scleanup();
+#endif
 }
 
 static int add_thread_segment(void)
@@ -971,13 +973,13 @@ static void init_rand_file_service(struct thread_data *td)
 	const unsigned int seed = td->rand_seeds[FIO_RAND_FILE_OFF];
 
 	if (td->o.file_service_type == FIO_FSERVICE_ZIPF) {
-		zipf_init(&td->next_file_zipf, nranges, td->zipf_theta, seed);
+		zipf_init(&td->next_file_zipf, nranges, td->zipf_theta, td->random_center, seed);
 		zipf_disable_hash(&td->next_file_zipf);
 	} else if (td->o.file_service_type == FIO_FSERVICE_PARETO) {
-		pareto_init(&td->next_file_zipf, nranges, td->pareto_h, seed);
+		pareto_init(&td->next_file_zipf, nranges, td->pareto_h, td->random_center, seed);
 		zipf_disable_hash(&td->next_file_zipf);
 	} else if (td->o.file_service_type == FIO_FSERVICE_GAUSS) {
-		gauss_init(&td->next_file_gauss, nranges, td->gauss_dev, seed);
+		gauss_init(&td->next_file_gauss, nranges, td->gauss_dev, td->random_center, seed);
 		gauss_disable_hash(&td->next_file_gauss);
 	}
 }
diff --git a/lib/gauss.c b/lib/gauss.c
index 3f84dbc6..c64f61e7 100644
--- a/lib/gauss.c
+++ b/lib/gauss.c
@@ -40,11 +40,11 @@ unsigned long long gauss_next(struct gauss_state *gs)
 	if (!gs->disable_hash)
 		sum = __hash_u64(sum);
 
-	return sum % gs->nranges;
+	return (sum + gs->rand_off) % gs->nranges;
 }
 
 void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
-		unsigned int seed)
+		double center, unsigned int seed)
 {
 	memset(gs, 0, sizeof(*gs));
 	init_rand_seed(&gs->r, seed, 0);
@@ -55,6 +55,10 @@ void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
 		if (gs->stddev > nranges / 2)
 			gs->stddev = nranges / 2;
 	}
+	if (center == -1)
+	  gs->rand_off = 0;
+	else
+	  gs->rand_off = nranges * (center - 0.5);
 }
 
 void gauss_disable_hash(struct gauss_state *gs)
diff --git a/lib/gauss.h b/lib/gauss.h
index 478aa146..19e3a666 100644
--- a/lib/gauss.h
+++ b/lib/gauss.h
@@ -8,11 +8,12 @@ struct gauss_state {
 	struct frand_state r;
 	uint64_t nranges;
 	unsigned int stddev;
+	unsigned int rand_off;
 	bool disable_hash;
 };
 
 void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
-		unsigned int seed);
+		double center, unsigned int seed);
 unsigned long long gauss_next(struct gauss_state *gs);
 void gauss_disable_hash(struct gauss_state *gs);
 
diff --git a/lib/zipf.c b/lib/zipf.c
index 321a4fb9..14d7928f 100644
--- a/lib/zipf.c
+++ b/lib/zipf.c
@@ -23,19 +23,21 @@ static void zipf_update(struct zipf_state *zs)
 }
 
 static void shared_rand_init(struct zipf_state *zs, uint64_t nranges,
-			     unsigned int seed)
+			     double center, unsigned int seed)
 {
 	memset(zs, 0, sizeof(*zs));
 	zs->nranges = nranges;
 
 	init_rand_seed(&zs->rand, seed, 0);
 	zs->rand_off = __rand(&zs->rand);
+	if (center != -1)
+		zs->rand_off = nranges * center;
 }
 
 void zipf_init(struct zipf_state *zs, uint64_t nranges, double theta,
-	       unsigned int seed)
+	       double center, unsigned int seed)
 {
-	shared_rand_init(zs, nranges, seed);
+	shared_rand_init(zs, nranges, center, seed);
 
 	zs->theta = theta;
 	zs->zeta2 = pow(1.0, zs->theta) + pow(0.5, zs->theta);
@@ -71,9 +73,9 @@ uint64_t zipf_next(struct zipf_state *zs)
 }
 
 void pareto_init(struct zipf_state *zs, uint64_t nranges, double h,
-		 unsigned int seed)
+		 double center, unsigned int seed)
 {
-	shared_rand_init(zs, nranges, seed);
+	shared_rand_init(zs, nranges, center, seed);
 	zs->pareto_pow = log(h) / log(1.0 - h);
 }
 
diff --git a/lib/zipf.h b/lib/zipf.h
index 16b65f57..332e3b2f 100644
--- a/lib/zipf.h
+++ b/lib/zipf.h
@@ -16,10 +16,12 @@ struct zipf_state {
 	bool disable_hash;
 };
 
-void zipf_init(struct zipf_state *zs, uint64_t nranges, double theta, unsigned int seed);
+void zipf_init(struct zipf_state *zs, uint64_t nranges, double theta,
+	       double center, unsigned int seed);
 uint64_t zipf_next(struct zipf_state *zs);
 
-void pareto_init(struct zipf_state *zs, uint64_t nranges, double h, unsigned int seed);
+void pareto_init(struct zipf_state *zs, uint64_t nranges, double h,
+		 double center, unsigned int seed);
 uint64_t pareto_next(struct zipf_state *zs);
 void zipf_disable_hash(struct zipf_state *zs);
 
diff --git a/options.c b/options.c
index 4c472589..47b20c24 100644
--- a/options.c
+++ b/options.c
@@ -44,6 +44,27 @@ static char *get_opt_postfix(const char *str)
 	return strdup(p);
 }
 
+static bool split_parse_distr(const char *str, double *val, double *center)
+{
+	char *cp, *p;
+	bool r;
+
+	p = strdup(str);
+	if (!p)
+		return false;
+
+	cp = strstr(p, ":");
+	r = true;
+	if (cp) {
+		*cp = '\0';
+		cp++;
+		r = str_to_float(cp, center, 0);
+	}
+	r = r && str_to_float(p, val, 0);
+	free(p);
+	return r;
+}
+
 static int bs_cmp(const void *p1, const void *p2)
 {
 	const struct bssplit *bsp1 = p1;
@@ -787,6 +808,7 @@ static int str_fst_cb(void *data, const char *str)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	double val;
+	double center = -1;
 	bool done = false;
 	char *nr;
 
@@ -821,7 +843,7 @@ static int str_fst_cb(void *data, const char *str)
 		return 0;
 
 	nr = get_opt_postfix(str);
-	if (nr && !str_to_float(nr, &val, 0)) {
+	if (nr && !split_parse_distr(nr, &val, &center)) {
 		log_err("fio: file service type random postfix parsing failed\n");
 		free(nr);
 		return 1;
@@ -829,6 +851,12 @@ static int str_fst_cb(void *data, const char *str)
 
 	free(nr);
 
+	if (center != -1 && (center < 0.00 || center > 1.00)) {
+		log_err("fio: distribution center out of range (0 <= center <= 1.0)\n");
+		return 1;
+	}
+	td->random_center = center;
+
 	switch (td->o.file_service_type) {
 	case FIO_FSERVICE_ZIPF:
 		if (val == 1.00) {
@@ -1030,6 +1058,7 @@ static int str_random_distribution_cb(void *data, const char *str)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	double val;
+	double center = -1;
 	char *nr;
 
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
@@ -1046,7 +1075,7 @@ static int str_random_distribution_cb(void *data, const char *str)
 		return 0;
 
 	nr = get_opt_postfix(str);
-	if (nr && !str_to_float(nr, &val, 0)) {
+	if (nr && !split_parse_distr(nr, &val, &center)) {
 		log_err("fio: random postfix parsing failed\n");
 		free(nr);
 		return 1;
@@ -1054,6 +1083,12 @@ static int str_random_distribution_cb(void *data, const char *str)
 
 	free(nr);
 
+	if (center != -1 && (center < 0.00 || center > 1.00)) {
+		log_err("fio: distribution center out of range (0 <= center <= 1.0)\n");
+		return 1;
+	}
+	td->o.random_center.u.f = center;
+
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF) {
 		if (val == 1.00) {
 			log_err("fio: zipf theta must different than 1.0\n");
@@ -5064,7 +5099,7 @@ static char *fio_keyword_replace(char *opt)
 		struct fio_keyword *kw = &fio_keywords[i];
 
 		while ((s = strstr(opt, kw->word)) != NULL) {
-			char *new = malloc(strlen(opt) + 1);
+			char *new = calloc(strlen(opt) + 1, 1);
 			char *o_org = opt;
 			int olen = s - opt;
 			int len;
@@ -5081,6 +5116,7 @@ static char *fio_keyword_replace(char *opt)
 			 * in too
 			 */
 			opt += strlen(kw->word) + olen;
+			/* keeps final zero thanks to calloc */
 			if (strlen(opt))
 				memcpy(new + olen + len, opt, opt - o_org - 1);
 
diff --git a/server.h b/server.h
index 6d444749..9256d44c 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 86,
+	FIO_SERVER_VER			= 87,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/fuzz/fuzz_parseini.c b/t/fuzz/fuzz_parseini.c
new file mode 100644
index 00000000..7e422c18
--- /dev/null
+++ b/t/fuzz/fuzz_parseini.c
@@ -0,0 +1,41 @@
+#include "fio.h"
+
+static int initialized = 0;
+
+const char *const fakeargv[] = {(char *) "fuzz",
+	(char *) "--output", (char *) "/dev/null",
+	(char *) "--parse-only",
+	0};
+
+int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
+{
+	char *fuzzedini;
+
+	if (size < 2)
+		return 0;
+
+	if (initialized == 0) {
+		if (fio_init_options()) {
+			printf("Failed fio_init_options\n");
+			return 1;
+		}
+
+		parse_cmd_line(4, (char **) fakeargv, 0);
+		sinit();
+
+		initialized = 1;
+	}
+	fuzzedini = malloc(size);
+	if (!fuzzedini) {
+		printf("Failed malloc\n");
+		return 1;
+	}
+	/* final character is type for parse_jobs_ini */
+	memcpy(fuzzedini, data, size - 1);
+	/* ensures final 0 */
+	fuzzedini[size - 1] = 0;
+
+	parse_jobs_ini(fuzzedini, 1, 0, data[size - 1]);
+	free(fuzzedini);
+	return 0;
+}
diff --git a/t/fuzz/onefile.c b/t/fuzz/onefile.c
new file mode 100644
index 00000000..2ed3bbe6
--- /dev/null
+++ b/t/fuzz/onefile.c
@@ -0,0 +1,51 @@
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);
+
+int main(int argc, char** argv)
+{
+	FILE *fp;
+	uint8_t *data;
+	size_t size;
+
+	if (argc != 2)
+		return 1;
+
+	/* opens the file, get its size, and reads it into a buffer */
+	fp = fopen(argv[1], "rb");
+	if (fp == NULL)
+		return 2;
+
+	if (fseek(fp, 0L, SEEK_END) != 0) {
+		fclose(fp);
+		return 2;
+	}
+	size = ftell(fp);
+	if (size == (size_t) -1) {
+		fclose(fp);
+		return 2;
+	}
+	if (fseek(fp, 0L, SEEK_SET) != 0) {
+		fclose(fp);
+		return 2;
+	}
+	data = malloc(size);
+	if (data == NULL) {
+		fclose(fp);
+		return 2;
+	}
+	if (fread(data, size, 1, fp) != 1) {
+		fclose(fp);
+		free(data);
+		return 2;
+	}
+
+	/* launch fuzzer */
+	LLVMFuzzerTestOneInput(data, size);
+	free(data);
+	fclose(fp);
+
+	return 0;
+}
diff --git a/t/genzipf.c b/t/genzipf.c
index 4fc10ae7..cd62e584 100644
--- a/t/genzipf.c
+++ b/t/genzipf.c
@@ -297,11 +297,11 @@ int main(int argc, char *argv[])
 	nranges /= block_size;
 
 	if (dist_type == TYPE_ZIPF)
-		zipf_init(&zs, nranges, dist_val, 1);
+		zipf_init(&zs, nranges, dist_val, -1, 1);
 	else if (dist_type == TYPE_PARETO)
-		pareto_init(&zs, nranges, dist_val, 1);
+		pareto_init(&zs, nranges, dist_val, -1, 1);
 	else
-		gauss_init(&gs, nranges, dist_val, 1);
+		gauss_init(&gs, nranges, dist_val, -1, 1);
 
 	hash_bits = 0;
 	hash_size = nranges;
diff --git a/thread_options.h b/thread_options.h
index 97c400fe..0a033430 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -166,6 +166,7 @@ struct thread_options {
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
 	fio_fp64_t gauss_dev;
+	fio_fp64_t random_center;
 
 	unsigned int random_generator;
 
@@ -467,6 +468,7 @@ struct thread_options_pack {
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
 	fio_fp64_t gauss_dev;
+	fio_fp64_t random_center;
 
 	uint32_t random_generator;
 



* Recent changes (master)
@ 2021-01-10 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-10 13:00 UTC
  To: fio

The following changes since commit 8f14dc61422df3b1eaee1293a7d10ba791c8084c:

  Merge branch 'cpu-engine' of https://github.com/bvanassche/fio (2021-01-07 20:53:00 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 674428a527931d86bfb164abcc847508b3be2742:

  Merge branch 'num2str-patch' of https://github.com/gloit042/fio (2021-01-09 15:28:44 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'num2str-patch' of https://github.com/gloit042/fio

gloit042 (1):
      num2str: fix precision loss bug when the fractional part is close to 1

 lib/num2str.c | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/lib/num2str.c b/lib/num2str.c
index 3597de2f..cd89a0e5 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -110,6 +110,9 @@ done:
 	sprintf(tmp, "%.*f", (int)(maxlen - strlen(tmp) - 1),
 		(double)modulo / (double)thousand);
 
+	if (tmp[0] == '1')
+		num++;
+
 	if (asprintf(&buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
 		     unitprefix[post_index], unitstr[units]) < 0)
 		buf = NULL;



* Recent changes (master)
@ 2021-01-08 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-08 13:00 UTC
  To: fio

The following changes since commit 59f94d26f98e9c0bc18d4e013f3361c51a2c6b25:

  Change ARRAY_SIZE to FIO_ARRAY_SIZE (2021-01-06 11:32:59 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8f14dc61422df3b1eaee1293a7d10ba791c8084c:

  Merge branch 'cpu-engine' of https://github.com/bvanassche/fio (2021-01-07 20:53:00 -0700)

----------------------------------------------------------------
Bart Van Assche (1):
      engines/cpu: Fix td_vmsg() call

Erwan Velu (1):
      engines/cpu: Adding qsort capabilities

Jens Axboe (3):
      Merge branch 'evelu-qsort' of https://github.com/ErwanAliasr1/fio
      engines/cpu: style cleanups
      Merge branch 'cpu-engine' of https://github.com/bvanassche/fio

 HOWTO              |   6 +-
 engines/cpu.c      | 233 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 examples/cpuio.fio |  14 +++-
 fio.1              |  21 +++--
 4 files changed, 254 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d663166d..0547c721 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1912,12 +1912,14 @@ I/O engine
 
 		**cpuio**
 			Doesn't transfer any data, but burns CPU cycles according to the
-			:option:`cpuload` and :option:`cpuchunks` options. Setting
-			:option:`cpuload`\=85 will cause that job to do nothing but burn 85%
+			:option:`cpuload`, :option:`cpuchunks` and :option:`cpumode` options.
+			Setting :option:`cpuload`\=85 will cause that job to do nothing but burn 85%
 			of the CPU. In case of SMP machines, use :option:`numjobs`\=<nr_of_cpu>
 			to get desired CPU usage, as the cpuload only loads a
 			single CPU at the desired rate. A job never finishes unless there is
 			at least one non-cpuio job.
+			Setting :option:`cpumode`\=qsort replaces the default noop instruction loop
+			with a qsort algorithm, which consumes more energy.
 
 		**rdma**
 			The RDMA I/O engine supports both RDMA memory semantics
diff --git a/engines/cpu.c b/engines/cpu.c
index 4d572b44..ccbfe003 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -8,11 +8,26 @@
 #include "../fio.h"
 #include "../optgroup.h"
 
+// number of 32-bit integers to sort
+size_t qsort_size = (256 * (1ULL << 10)); // 256Ki entries, i.e. 1 MiB
+
+struct mwc {
+	uint32_t w;
+	uint32_t z;
+};
+
+enum stress_mode {
+	FIO_CPU_NOOP = 0,
+	FIO_CPU_QSORT = 1,
+};
+
 struct cpu_options {
 	void *pad;
 	unsigned int cpuload;
 	unsigned int cpucycle;
+	enum stress_mode cpumode;
 	unsigned int exit_io_done;
+	int32_t *qsort_data;
 };
 
 static struct fio_option options[] = {
@@ -25,6 +40,26 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name     = "cpumode",
+		.lname    = "cpumode",
+		.type     = FIO_OPT_STR,
+		.help     = "Stress mode",
+		.off1     = offsetof(struct cpu_options, cpumode),
+		.def      = "noop",
+		.posval = {
+			  { .ival = "noop",
+			    .oval = FIO_CPU_NOOP,
+			    .help = "NOOP instructions",
+			  },
+			  { .ival = "qsort",
+			    .oval = FIO_CPU_QSORT,
+			    .help = "QSORT computation",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= "cpuchunks",
 		.lname	= "CPU chunk",
@@ -52,6 +87,91 @@ static struct fio_option options[] = {
 	},
 };
 
+/*
+ *  mwc32()
+ *      Multiply-with-carry random numbers
+ *      fast pseudo random number generator, see
+ *      http://www.cse.yorku.ca/~oz/marsaglia-rng.html
+ */
+uint32_t mwc32(struct mwc *mwc)
+{
+        mwc->z = 36969 * (mwc->z & 65535) + (mwc->z >> 16);
+        mwc->w = 18000 * (mwc->w & 65535) + (mwc->w >> 16);
+        return (mwc->z << 16) + mwc->w;
+}
+
+/*
+ *  stress_qsort_cmp_1()
+ *	qsort comparison - sort on int32 values
+ */
+static int stress_qsort_cmp_1(const void *p1, const void *p2)
+{
+	const int32_t *i1 = (const int32_t *)p1;
+	const int32_t *i2 = (const int32_t *)p2;
+
+	if (*i1 > *i2)
+		return 1;
+	else if (*i1 < *i2)
+		return -1;
+	else
+		return 0;
+}
+
+/*
+ *  stress_qsort_cmp_2()
+ *	qsort comparison - reverse sort on int32 values
+ */
+static int stress_qsort_cmp_2(const void *p1, const void *p2)
+{
+	return stress_qsort_cmp_1(p2, p1);
+}
+
+/*
+ *  stress_qsort_cmp_3()
+ *	qsort comparison - sort on int8 values
+ */
+static int stress_qsort_cmp_3(const void *p1, const void *p2)
+{
+	const int8_t *i1 = (const int8_t *)p1;
+	const int8_t *i2 = (const int8_t *)p2;
+
+	/* Force re-ordering on 8 bit value */
+	return *i1 - *i2;
+}
+
+static int do_qsort(struct thread_data *td)
+{
+	struct thread_options *o = &td->o;
+	struct cpu_options *co = td->eo;
+	struct timespec start, now;
+
+	fio_get_mono_time(&start);
+
+	/* Sort "random" data */
+	qsort(co->qsort_data, qsort_size, sizeof(*(co->qsort_data)), stress_qsort_cmp_1);
+
+	/* Reverse sort */
+	qsort(co->qsort_data, qsort_size, sizeof(*(co->qsort_data)), stress_qsort_cmp_2);
+
+	/* And re-order by byte compare */
+	qsort((uint8_t *)co->qsort_data, qsort_size * 4, sizeof(uint8_t), stress_qsort_cmp_3);
+
+	/* Reverse sort this again */
+	qsort(co->qsort_data, qsort_size, sizeof(*(co->qsort_data)), stress_qsort_cmp_2);
+	fio_get_mono_time(&now);
+
+	/* Adjust cpucycle automatically to stay as close as possible to the
+	 * expected cpuload. The time to execute do_qsort() may change over
+	 * time with the job concurrency and with the CPU clock adjusted by
+	 * power management, so after every do_qsort() call the next
+	 * thinktime is recomputed from the last run's performance.
+	 */
+	co->cpucycle = utime_since(&start, &now);
+	o->thinktime = ((unsigned long long) co->cpucycle *
+				(100 - co->cpuload)) / co->cpuload;
+
+	return 0;
+}
 
 static enum fio_q_status fio_cpuio_queue(struct thread_data *td,
 					 struct io_u fio_unused *io_u)
@@ -63,14 +183,69 @@ static enum fio_q_status fio_cpuio_queue(struct thread_data *td,
 		return FIO_Q_BUSY;
 	}
 
-	usec_spin(co->cpucycle);
+	switch (co->cpumode) {
+	case FIO_CPU_NOOP:
+		usec_spin(co->cpucycle);
+		break;
+	case FIO_CPU_QSORT:
+		do_qsort(td);
+		break;
+	}
+
 	return FIO_Q_COMPLETED;
 }
 
+static int noop_init(struct thread_data *td)
+{
+	struct cpu_options *co = td->eo;
+
+	log_info("%s (noop): ioengine=%s, cpuload=%u, cpucycle=%u\n",
+		td->o.name, td->io_ops->name, co->cpuload, co->cpucycle);
+	return 0;
+}
+
+static int qsort_cleanup(struct thread_data *td)
+{
+	struct cpu_options *co = td->eo;
+
+	if (co->qsort_data) {
+		free(co->qsort_data);
+		co->qsort_data = NULL;
+	}
+
+	return 0;
+}
+
+static int qsort_init(struct thread_data *td)
+{
+	/* Setting up a default entropy */
+	struct mwc mwc = { 521288629UL, 362436069UL };
+	struct cpu_options *co = td->eo;
+	int32_t *ptr;
+	int i;
+
+	co->qsort_data = calloc(qsort_size, sizeof(*co->qsort_data));
+	if (co->qsort_data == NULL) {
+		td_verror(td, ENOMEM, "qsort_init");
+		return 1;
+	}
+
+	/* This is expensive, init the memory once */
+	for (ptr = co->qsort_data, i = 0; i < qsort_size; i++)
+		*ptr++ = mwc32(&mwc);
+
+	log_info("%s (qsort): ioengine=%s, cpuload=%u, cpucycle=%u\n",
+		td->o.name, td->io_ops->name, co->cpuload, co->cpucycle);
+
+	return 0;
+}
+
 static int fio_cpuio_init(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
 	struct cpu_options *co = td->eo;
+	int td_previous_state;
+	char *msg;
 
 	if (!co->cpuload) {
 		td_vmsg(td, EINVAL, "cpu thread needs rate (cpuload=)","cpuio");
@@ -80,21 +255,58 @@ static int fio_cpuio_init(struct thread_data *td)
 	if (co->cpuload > 100)
 		co->cpuload = 100;
 
+	/* Saving the current thread state */
+	td_previous_state = td->runstate;
+
+	/* Report that we are preparing the engine. This is useful because
+	 * the qsort() calibration takes time, and it prevents the job from
+	 * starting before init is completed.
+	 */
+	td_set_runstate(td, TD_SETTING_UP);
+
 	/*
 	 * set thinktime_sleep and thinktime_spin appropriately
 	 */
 	o->thinktime_blocks = 1;
 	o->thinktime_spin = 0;
-	o->thinktime = ((unsigned long long) co->cpucycle * (100 - co->cpuload)) / co->cpuload;
+	o->thinktime = ((unsigned long long) co->cpucycle *
+				(100 - co->cpuload)) / co->cpuload;
 
 	o->nr_files = o->open_files = 1;
 
-	log_info("%s: ioengine=%s, cpuload=%u, cpucycle=%u\n",
-		td->o.name, td->io_ops->name, co->cpuload, co->cpucycle);
+	switch (co->cpumode) {
+	case FIO_CPU_NOOP:
+		noop_init(td);
+		break;
+	case FIO_CPU_QSORT:
+		qsort_init(td);
+		break;
+	default:
+		if (asprintf(&msg, "bad cpu engine mode: %d", co->cpumode) < 0)
+			msg = NULL;
+		td_vmsg(td, EINVAL, msg ? : "(?)", __func__);
+		free(msg);
+		return 1;
+	}
 
+	/* Let's restore the previous state. */
+	td_set_runstate(td, td_previous_state);
 	return 0;
 }
 
+static void fio_cpuio_cleanup(struct thread_data *td)
+{
+	struct cpu_options *co = td->eo;
+
+	switch (co->cpumode) {
+	case FIO_CPU_NOOP:
+		break;
+	case FIO_CPU_QSORT:
+		qsort_cleanup(td);
+		break;
+	}
+}
+
 static int fio_cpuio_open(struct thread_data fio_unused *td,
 			  struct fio_file fio_unused *f)
 {
@@ -102,12 +314,13 @@ static int fio_cpuio_open(struct thread_data fio_unused *td,
 }
 
 static struct ioengine_ops ioengine = {
-	.name		= "cpuio",
-	.version	= FIO_IOOPS_VERSION,
-	.queue		= fio_cpuio_queue,
-	.init		= fio_cpuio_init,
-	.open_file	= fio_cpuio_open,
-	.flags		= FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOIO,
+	.name			= "cpuio",
+	.version		= FIO_IOOPS_VERSION,
+	.queue			= fio_cpuio_queue,
+	.init			= fio_cpuio_init,
+	.cleanup		= fio_cpuio_cleanup,
+	.open_file		= fio_cpuio_open,
+	.flags			= FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOIO,
 	.options		= options,
 	.option_struct_size	= sizeof(struct cpu_options),
 };
diff --git a/examples/cpuio.fio b/examples/cpuio.fio
index 577e0729..471cf4b2 100644
--- a/examples/cpuio.fio
+++ b/examples/cpuio.fio
@@ -1,8 +1,18 @@
 [global]
 ioengine=cpuio
 time_based
-runtime=10
+runtime=15
 
-[burn50percent]
+# The following example loads 2 cores at 50% with the noop (default) mode
+[burn_2x50_noop]
 cpuload=50
+numjobs=2
+cpumode=noop
 
+# Once burn_2x50_noop is over,
+# fio loads 2 cores at 50% with the qsort mode, which draws much more power
+[burn_2x50%_qsort]
+stonewall
+cpuload=50
+numjobs=2
+cpumode=qsort
diff --git a/fio.1 b/fio.1
index b29ac437..e361b05f 100644
--- a/fio.1
+++ b/fio.1
@@ -1690,12 +1690,21 @@ This engine defines engine specific options.
 .TP
 .B cpuio
 Doesn't transfer any data, but burns CPU cycles according to the
-\fBcpuload\fR and \fBcpuchunks\fR options. Setting
-\fBcpuload\fR\=85 will cause that job to do nothing but burn 85%
-of the CPU. In case of SMP machines, use `numjobs=<nr_of_cpu>'
-to get desired CPU usage, as the cpuload only loads a
-single CPU at the desired rate. A job never finishes unless there is
-at least one non-cpuio job.
+\fBcpuload\fR, \fBcpuchunks\fR and \fBcpumode\fR options.
+A job never finishes unless there is at least one non-cpuio job.
+.RS
+.P
+.PD 0
+\fBcpuload\fR\=85 will cause that job to do nothing but burn 85% of the CPU.
+In case of SMP machines, use \fBnumjobs=<nr_of_cpu>\fR\ to get desired CPU usage,
+as the cpuload only loads a single CPU at the desired rate.
+
+.P
+\fBcpumode\fR\=qsort replaces the default noop instruction loop
+with a qsort algorithm, which consumes more energy.
+
+.P
+.RE
 .TP
 .B rdma
 The RDMA I/O engine supports both RDMA memory semantics



* Recent changes (master)
@ 2021-01-07 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-07 13:00 UTC
  To: fio

The following changes since commit 4065c6a4949fa85e5f2d8de8a5556130231dd680:

  log: only compile log_prevalist() if FIO_INC_DEBUG is set (2021-01-05 13:14:28 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 59f94d26f98e9c0bc18d4e013f3361c51a2c6b25:

  Change ARRAY_SIZE to FIO_ARRAY_SIZE (2021-01-06 11:32:59 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'drop_xp' of https://github.com/sitsofe/fio
      Change ARRAY_SIZE to FIO_ARRAY_SIZE

Sitsofe Wheeler (1):
      windows: drop XP support

 .appveyor.yml                        |  2 +-
 README                               |  9 ++--
 compiler/compiler.h                  |  4 +-
 configure                            |  8 +---
 diskutil.c                           |  6 +--
 engines/io_uring.c                   |  2 +-
 filesetup.c                          |  4 +-
 gclient.c                            |  4 +-
 gfio.c                               |  4 +-
 helper_thread.c                      |  6 +--
 lib/num2str.c                        |  8 ++--
 lib/prio_tree.c                      |  6 +--
 options.c                            |  4 +-
 os/os-windows-xp.h                   |  3 --
 os/os-windows.h                      |  6 +--
 os/windows/cpu-affinity.c            | 73 ------------------------------
 os/windows/posix.c                   | 87 ------------------------------------
 os/windows/posix/include/arpa/inet.h |  6 ---
 os/windows/posix/include/poll.h      | 14 ------
 oslib/libmtd.c                       |  4 +-
 oslib/libmtd_common.h                |  1 -
 parse.c                              |  2 +-
 td_error.c                           |  2 +-
 unittests/lib/num2str.c              |  2 +-
 zbd.c                                |  2 +-
 25 files changed, 37 insertions(+), 232 deletions(-)
 delete mode 100644 os/os-windows-xp.h

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index fad16326..42b79958 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -19,7 +19,7 @@ environment:
       CONFIGURE_OPTIONS:
       DISTRO: cygwin
     - ARCHITECTURE: x86
-      CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
+      CONFIGURE_OPTIONS: --build-32bit-win
       DISTRO: cygwin
 
 install:
diff --git a/README b/README
index 0f943bcc..2fecf0e0 100644
--- a/README
+++ b/README
@@ -164,8 +164,9 @@ configure.
 Windows
 ~~~~~~~
 
-On Windows, Cygwin (https://www.cygwin.com/) is required in order to build
-fio. To create an MSI installer package install WiX from
+The minimum versions of Windows for building/running fio are Windows 7/Windows
+Server 2008 R2. On Windows, Cygwin (https://www.cygwin.com/) is required in
+order to build fio. To create an MSI installer package install WiX from
 https://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
 directory.
 
@@ -181,9 +182,7 @@ How to compile fio on 64-bit Windows:
 
 To build fio for 32-bit Windows, ensure the -i686 versions of the previously
 mentioned -x86_64 packages are installed and run ``./configure
---build-32bit-win`` before ``make``. To build an fio that supports versions of
-Windows below Windows 7/Windows Server 2008 R2 also add ``--target-win-ver=xp``
-to the end of the configure line that you run before doing ``make``.
+--build-32bit-win`` before ``make``.
 
 It's recommended that once built or installed, fio be run in a Command Prompt or
 other 'native' console such as console2, since there are known to be display and
diff --git a/compiler/compiler.h b/compiler/compiler.h
index 8988236c..44fa87b9 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -62,8 +62,8 @@
 #endif
 
 #ifdef FIO_INTERNAL
-#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
-#define FIELD_SIZE(s, f) (sizeof(((__typeof__(s))0)->f))
+#define FIO_ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
+#define FIO_FIELD_SIZE(s, f) (sizeof(((__typeof__(s))0)->f))
 #endif
 
 #ifndef __has_attribute
diff --git a/configure b/configure
index d247a041..e3e37d56 100755
--- a/configure
+++ b/configure
@@ -257,7 +257,7 @@ if test "$show_help" = "yes" ; then
   echo "--cc=                   Specify compiler to use"
   echo "--extra-cflags=         Specify extra CFLAGS to pass to compiler"
   echo "--build-32bit-win       Enable 32-bit build on Windows"
-  echo "--target-win-ver=       Minimum version of Windows to target (XP or 7)"
+  echo "--target-win-ver=       Minimum version of Windows to target (only accepts 7)"
   echo "--enable-pdb            Enable Windows PDB symbols generation (needs clang/lld)"
   echo "--build-static          Build a static fio"
   echo "--esx                   Configure build options for esx"
@@ -395,11 +395,7 @@ CYGWIN*)
     # Default Windows API target
     target_win_ver="7"
   fi
-  if test "$target_win_ver" = "XP"; then
-    output_sym "CONFIG_WINDOWS_XP"
-    # Technically the below is targeting 2003
-    CFLAGS="$CFLAGS -D_WIN32_WINNT=0x0502"
-  elif test "$target_win_ver" = "7"; then
+  if test "$target_win_ver" = "7"; then
     output_sym "CONFIG_WINDOWS_7"
     CFLAGS="$CFLAGS -D_WIN32_WINNT=0x0601"
   else
diff --git a/diskutil.c b/diskutil.c
index 6c6380bb..0051a7a0 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -181,7 +181,7 @@ static int get_device_numbers(char *file_name, int *maj, int *min)
 		/*
 		 * must be a file, open "." in that path
 		 */
-		snprintf(tempname, ARRAY_SIZE(tempname), "%s", file_name);
+		snprintf(tempname, FIO_ARRAY_SIZE(tempname), "%s", file_name);
 		p = dirname(tempname);
 		if (stat(p, &st)) {
 			perror("disk util stat");
@@ -313,7 +313,7 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 		sfree(du);
 		return NULL;
 	}
-	snprintf((char *) du->dus.name, ARRAY_SIZE(du->dus.name), "%s",
+	snprintf((char *) du->dus.name, FIO_ARRAY_SIZE(du->dus.name), "%s",
 		 basename(path));
 	du->sysfs_root = strdup(path);
 	du->major = majdev;
@@ -435,7 +435,7 @@ static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 			log_err("unknown sysfs layout\n");
 			return NULL;
 		}
-		snprintf(tmp, ARRAY_SIZE(tmp), "%s", p);
+		snprintf(tmp, FIO_ARRAY_SIZE(tmp), "%s", p);
 		sprintf(path, "%s", tmp);
 	}
 
diff --git a/engines/io_uring.c b/engines/io_uring.c
index b997c8d8..9ce2ae80 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -507,7 +507,7 @@ static void fio_ioring_unmap(struct ioring_data *ld)
 {
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(ld->mmap); i++)
+	for (i = 0; i < FIO_ARRAY_SIZE(ld->mmap); i++)
 		munmap(ld->mmap[i].ptr, ld->mmap[i].len);
 	close(ld->ring_fd);
 }
diff --git a/filesetup.c b/filesetup.c
index d3c370ca..76b3f935 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -815,7 +815,7 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 		} else if (f->filetype != FIO_TYPE_FILE)
 			continue;
 
-		snprintf(buf, ARRAY_SIZE(buf), "%s", f->file_name);
+		snprintf(buf, FIO_ARRAY_SIZE(buf), "%s", f->file_name);
 
 		if (stat(buf, &sb) < 0) {
 			if (errno != ENOENT)
@@ -838,7 +838,7 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 			continue;
 
 		fm = calloc(1, sizeof(*fm));
-		snprintf(fm->__base, ARRAY_SIZE(fm->__base), "%s", buf);
+		snprintf(fm->__base, FIO_ARRAY_SIZE(fm->__base), "%s", buf);
 		fm->base = basename(fm->__base);
 		fm->key = sb.st_dev;
 		flist_add(&fm->list, &list);
diff --git a/gclient.c b/gclient.c
index fe83382f..e0e0e7bf 100644
--- a/gclient.c
+++ b/gclient.c
@@ -48,7 +48,7 @@ static GtkActionEntry results_menu_items[] = {
 	{ "PrintFile", GTK_STOCK_PRINT, "Print", "<Control>P", NULL, G_CALLBACK(results_print) },
 	{ "CloseFile", GTK_STOCK_CLOSE, "Close", "<Control>W", NULL, G_CALLBACK(results_close) },
 };
-static gint results_nmenu_items = ARRAY_SIZE(results_menu_items);
+static gint results_nmenu_items = FIO_ARRAY_SIZE(results_menu_items);
 
 static const gchar *results_ui_string = " \
 	<ui> \
@@ -755,7 +755,7 @@ static void gfio_show_io_depths(GtkWidget *vbox, struct thread_stat *ts)
 	GtkListStore *model;
 	int i;
 	const char *labels[] = { "Depth", "0", "1", "2", "4", "8", "16", "32", "64", ">= 64" };
-	const int nr_labels = ARRAY_SIZE(labels);
+	const int nr_labels = FIO_ARRAY_SIZE(labels);
 	GType types[nr_labels];
 
 	frame = gtk_frame_new("IO depths");
diff --git a/gfio.c b/gfio.c
index 734651b6..22c5314d 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1274,7 +1274,7 @@ static GtkActionEntry menu_items[] = {
 	{ "Quit", GTK_STOCK_QUIT, NULL,   "<Control>Q", NULL, G_CALLBACK(quit_clicked) },
 	{ "About", GTK_STOCK_ABOUT, NULL,  NULL, NULL, G_CALLBACK(about_dialog) },
 };
-static gint nmenu_items = ARRAY_SIZE(menu_items);
+static gint nmenu_items = FIO_ARRAY_SIZE(menu_items);
 
 static const gchar *ui_string = " \
 	<ui> \
@@ -1447,7 +1447,7 @@ static GtkWidget *new_client_page(struct gui_entry *ge)
 	gtk_container_add(GTK_CONTAINER(bottom_align), ge->buttonbox);
 	gtk_box_pack_start(GTK_BOX(main_vbox), bottom_align, FALSE, FALSE, 0);
 
-	add_buttons(ge, buttonspeclist, ARRAY_SIZE(buttonspeclist));
+	add_buttons(ge, buttonspeclist, FIO_ARRAY_SIZE(buttonspeclist));
 
 	/*
 	 * Set up thread status progress bar
diff --git a/helper_thread.c b/helper_thread.c
index 2d553654..d8e7ebfe 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -311,7 +311,7 @@ static void *helper_thread_main(void *data)
 	block_signals();
 
 	fio_get_mono_time(&ts);
-	msec_to_next_event = reset_timers(timer, ARRAY_SIZE(timer), &ts);
+	msec_to_next_event = reset_timers(timer, FIO_ARRAY_SIZE(timer), &ts);
 
 	fio_sem_up(hd->startup_sem);
 
@@ -329,9 +329,9 @@ static void *helper_thread_main(void *data)
 
 		if (action == A_RESET)
 			msec_to_next_event = reset_timers(timer,
-						ARRAY_SIZE(timer), &ts);
+						FIO_ARRAY_SIZE(timer), &ts);
 
-		for (i = 0; i < ARRAY_SIZE(timer); ++i)
+		for (i = 0; i < FIO_ARRAY_SIZE(timer); ++i)
 			ret = eval_timer(&timer[i], &ts, &msec_to_next_event);
 
 		if (action == A_DO_STAT)
diff --git a/lib/num2str.c b/lib/num2str.c
index 726f1c44..3597de2f 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -7,8 +7,6 @@
 #include "../oslib/asprintf.h"
 #include "num2str.h"
 
-#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
-
 /**
  * num2str() - Cheesy number->string conversion, complete with carry rounding error.
  * @num: quantity (e.g., number of blocks, bytes or bits)
@@ -38,7 +36,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 	char *buf;
 
 	compiletime_assert(sizeof(sistr) == sizeof(iecstr), "unit prefix arrays must be identical sizes");
-	assert(units < ARRAY_SIZE(unitstr));
+	assert(units < FIO_ARRAY_SIZE(unitstr));
 
 	if (pow2)
 		unitprefix = iecstr;
@@ -69,7 +67,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 	 * Divide by K/Ki until string length of num <= maxlen.
 	 */
 	modulo = -1U;
-	while (post_index < ARRAY_SIZE(sistr)) {
+	while (post_index < FIO_ARRAY_SIZE(sistr)) {
 		sprintf(tmp, "%llu", (unsigned long long) num);
 		if (strlen(tmp) <= maxlen)
 			break;
@@ -80,7 +78,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 		post_index++;
 	}
 
-	if (post_index >= ARRAY_SIZE(sistr))
+	if (post_index >= FIO_ARRAY_SIZE(sistr))
 		post_index = 0;
 
 	/*
diff --git a/lib/prio_tree.c b/lib/prio_tree.c
index d8e1b89a..c4f66a49 100644
--- a/lib/prio_tree.c
+++ b/lib/prio_tree.c
@@ -18,8 +18,6 @@
 #include "../compiler/compiler.h"
 #include "prio_tree.h"
 
-#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
-
 /*
  * A clever mix of heap and radix trees forms a radix priority search tree (PST)
  * which is useful for storing intervals, e.g, we can consider a vma as a closed
@@ -57,9 +55,9 @@ static void fio_init prio_tree_init(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < ARRAY_SIZE(index_bits_to_maxindex) - 1; i++)
+	for (i = 0; i < FIO_ARRAY_SIZE(index_bits_to_maxindex) - 1; i++)
 		index_bits_to_maxindex[i] = (1UL << (i + 1)) - 1;
-	index_bits_to_maxindex[ARRAY_SIZE(index_bits_to_maxindex) - 1] = ~0UL;
+	index_bits_to_maxindex[FIO_ARRAY_SIZE(index_bits_to_maxindex) - 1] = ~0UL;
 }
 
 /*
diff --git a/options.c b/options.c
index 1e91b3e9..4c472589 100644
--- a/options.c
+++ b/options.c
@@ -22,7 +22,7 @@ char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 static const struct pattern_fmt_desc fmt_desc[] = {
 	{
 		.fmt   = "%o",
-		.len   = FIELD_SIZE(struct io_u *, offset),
+		.len   = FIO_FIELD_SIZE(struct io_u *, offset),
 		.paste = paste_blockoff
 	},
 	{ }
@@ -1387,7 +1387,7 @@ static int str_verify_pattern_cb(void *data, const char *input)
 	struct thread_data *td = cb_data_to_td(data);
 	int ret;
 
-	td->o.verify_fmt_sz = ARRAY_SIZE(td->o.verify_fmt);
+	td->o.verify_fmt_sz = FIO_ARRAY_SIZE(td->o.verify_fmt);
 	ret = parse_and_fill_pattern(input, strlen(input), td->o.verify_pattern,
 				     MAX_PATTERN_SIZE, fmt_desc,
 				     td->o.verify_fmt, &td->o.verify_fmt_sz);
diff --git a/os/os-windows-xp.h b/os/os-windows-xp.h
deleted file mode 100644
index fbc23e2c..00000000
--- a/os/os-windows-xp.h
+++ /dev/null
@@ -1,3 +0,0 @@
-#define FIO_MAX_CPUS	MAXIMUM_PROCESSORS
-
-typedef DWORD_PTR os_cpu_mask_t;
diff --git a/os/os-windows.h b/os/os-windows.h
index fa2955f9..ddfae413 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -21,6 +21,7 @@
 #include "../lib/types.h"
 
 #include "windows/posix.h"
+#include "os-windows-7.h"
 
 #ifndef PTHREAD_STACK_MIN
 #define PTHREAD_STACK_MIN 65535
@@ -215,13 +216,8 @@ static inline int fio_mkdir(const char *path, mode_t mode) {
 	return 0;
 }
 
-#ifdef CONFIG_WINDOWS_XP
-#include "os-windows-xp.h"
-#else
 #define FIO_HAVE_CPU_ONLINE_SYSCONF
 unsigned int cpus_online(void);
-#include "os-windows-7.h"
-#endif
 
 int first_set_cpu(os_cpu_mask_t *cpumask);
 int fio_setaffinity(int pid, os_cpu_mask_t cpumask);
diff --git a/os/windows/cpu-affinity.c b/os/windows/cpu-affinity.c
index 46fd048d..7601970f 100644
--- a/os/windows/cpu-affinity.c
+++ b/os/windows/cpu-affinity.c
@@ -2,78 +2,6 @@
 
 #include <windows.h>
 
-#ifdef CONFIG_WINDOWS_XP
-int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
-{
-	HANDLE h;
-	BOOL bSuccess = FALSE;
-
-	h = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION, TRUE,
-		       pid);
-	if (h != NULL) {
-		bSuccess = SetThreadAffinityMask(h, cpumask);
-		if (!bSuccess)
-			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n",
-				pid, (long long unsigned) cpumask);
-
-		CloseHandle(h);
-	} else {
-		log_err("fio_setaffinity failed: failed to get handle for pid %d\n",
-			pid);
-	}
-
-	return bSuccess ? 0 : -1;
-}
-
-int fio_getaffinity(int pid, os_cpu_mask_t *mask)
-{
-	os_cpu_mask_t systemMask;
-
-	HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
-
-	if (h != NULL) {
-		GetProcessAffinityMask(h, mask, &systemMask);
-		CloseHandle(h);
-	} else {
-		log_err("fio_getaffinity failed: failed to get handle for pid %d\n",
-			pid);
-		return -1;
-	}
-
-	return 0;
-}
-
-void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
-{
-	*mask &= ~(1ULL << cpu);
-}
-
-void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
-{
-	*mask |= 1ULL << cpu;
-}
-
-int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
-{
-	return (*mask & (1ULL << cpu)) != 0;
-}
-
-int fio_cpu_count(os_cpu_mask_t *mask)
-{
-	return hweight64(*mask);
-}
-
-int fio_cpuset_init(os_cpu_mask_t *mask)
-{
-	*mask = 0;
-	return 0;
-}
-
-int fio_cpuset_exit(os_cpu_mask_t *mask)
-{
-	return 0;
-}
-#else /* CONFIG_WINDOWS_XP */
 /* Return all processors regardless of processor group */
 unsigned int cpus_online(void)
 {
@@ -443,4 +371,3 @@ int fio_cpuset_exit(os_cpu_mask_t *mask)
 {
 	return 0;
 }
-#endif /* CONFIG_WINDOWS_XP */
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 9e9f12ef..09c2e4a7 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -1026,90 +1026,3 @@ in_addr_t inet_network(const char *cp)
 	hbo = ((nbo & 0xFF) << 24) + ((nbo & 0xFF00) << 8) + ((nbo & 0xFF0000) >> 8) + ((nbo & 0xFF000000) >> 24);
 	return hbo;
 }
-
-#ifdef CONFIG_WINDOWS_XP
-const char *inet_ntop(int af, const void *restrict src, char *restrict dst,
-		      socklen_t size)
-{
-	INT status = SOCKET_ERROR;
-	WSADATA wsd;
-	char *ret = NULL;
-
-	if (af != AF_INET && af != AF_INET6) {
-		errno = EAFNOSUPPORT;
-		return NULL;
-	}
-
-	WSAStartup(MAKEWORD(2,2), &wsd);
-
-	if (af == AF_INET) {
-		struct sockaddr_in si;
-		DWORD len = size;
-
-		memset(&si, 0, sizeof(si));
-		si.sin_family = af;
-		memcpy(&si.sin_addr, src, sizeof(si.sin_addr));
-		status = WSAAddressToString((struct sockaddr*)&si, sizeof(si), NULL, dst, &len);
-	} else if (af == AF_INET6) {
-		struct sockaddr_in6 si6;
-		DWORD len = size;
-
-		memset(&si6, 0, sizeof(si6));
-		si6.sin6_family = af;
-		memcpy(&si6.sin6_addr, src, sizeof(si6.sin6_addr));
-		status = WSAAddressToString((struct sockaddr*)&si6, sizeof(si6), NULL, dst, &len);
-	}
-
-	if (status != SOCKET_ERROR)
-		ret = dst;
-	else
-		errno = ENOSPC;
-
-	WSACleanup();
-
-	return ret;
-}
-
-int inet_pton(int af, const char *restrict src, void *restrict dst)
-{
-	INT status = SOCKET_ERROR;
-	WSADATA wsd;
-	int ret = 1;
-
-	if (af != AF_INET && af != AF_INET6) {
-		errno = EAFNOSUPPORT;
-		return -1;
-	}
-
-	WSAStartup(MAKEWORD(2,2), &wsd);
-
-	if (af == AF_INET) {
-		struct sockaddr_in si;
-		INT len = sizeof(si);
-
-		memset(&si, 0, sizeof(si));
-		si.sin_family = af;
-		status = WSAStringToAddressA((char*)src, af, NULL, (struct sockaddr*)&si, &len);
-		if (status != SOCKET_ERROR)
-			memcpy(dst, &si.sin_addr, sizeof(si.sin_addr));
-	} else if (af == AF_INET6) {
-		struct sockaddr_in6 si6;
-		INT len = sizeof(si6);
-
-		memset(&si6, 0, sizeof(si6));
-		si6.sin6_family = af;
-		status = WSAStringToAddressA((char*)src, af, NULL, (struct sockaddr*)&si6, &len);
-		if (status != SOCKET_ERROR)
-			memcpy(dst, &si6.sin6_addr, sizeof(si6.sin6_addr));
-	}
-
-	if (status == SOCKET_ERROR) {
-		errno = ENOSPC;
-		ret = 0;
-	}
-
-	WSACleanup();
-
-	return ret;
-}
-#endif /* CONFIG_WINDOWS_XP */
diff --git a/os/windows/posix/include/arpa/inet.h b/os/windows/posix/include/arpa/inet.h
index 056f1dd5..1024db37 100644
--- a/os/windows/posix/include/arpa/inet.h
+++ b/os/windows/posix/include/arpa/inet.h
@@ -12,10 +12,4 @@ typedef int in_addr_t;
 
 in_addr_t inet_network(const char *cp);
 
-#ifdef CONFIG_WINDOWS_XP
-const char *inet_ntop(int af, const void *restrict src,
-        char *restrict dst, socklen_t size);
-int inet_pton(int af, const char *restrict src, void *restrict dst);
-#endif
-
 #endif /* ARPA_INET_H */
diff --git a/os/windows/posix/include/poll.h b/os/windows/posix/include/poll.h
index 25b8183f..5099cf2e 100644
--- a/os/windows/posix/include/poll.h
+++ b/os/windows/posix/include/poll.h
@@ -5,20 +5,6 @@
 
 typedef int nfds_t;
 
-#ifdef CONFIG_WINDOWS_XP
-struct pollfd
-{
-	int fd;
-	short events;
-	short revents;
-};
-
-#define POLLOUT	1
-#define POLLIN	2
-#define POLLERR	0
-#define POLLHUP	1
-#endif /* CONFIG_WINDOWS_XP */
-
 int poll(struct pollfd fds[], nfds_t nfds, int timeout);
 
 #endif /* POLL_H */
diff --git a/oslib/libmtd.c b/oslib/libmtd.c
index 385b9d2f..5fca3a01 100644
--- a/oslib/libmtd.c
+++ b/oslib/libmtd.c
@@ -35,6 +35,8 @@
 #include <sys/ioctl.h>
 #include <inttypes.h>
 
+#include "../compiler/compiler.h"
+
 #include <mtd/mtd-user.h>
 #include "libmtd.h"
 
@@ -960,7 +962,7 @@ int mtd_torture(libmtd_t desc, const struct mtd_dev_info *mtd, int fd, int eb)
 	void *buf;
 
 	normsg("run torture test for PEB %d", eb);
-	patt_count = ARRAY_SIZE(patterns);
+	patt_count = FIO_ARRAY_SIZE(patterns);
 
 	buf = xmalloc(mtd->eb_size);
 
diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index 4ed9f0ba..db0494dd 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -47,7 +47,6 @@ extern "C" {
 #define MAX(a, b) ((a) > (b) ? (a) : (b))
 #endif
 #define min(a, b) MIN(a, b) /* glue for linux kernel source */
-#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
 
 #define ALIGN(x,a) __ALIGN_MASK(x,(__typeof__(x))(a)-1)
 #define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))
diff --git a/parse.c b/parse.c
index f4cefcf6..c28d82ef 100644
--- a/parse.c
+++ b/parse.c
@@ -501,7 +501,7 @@ static int str_match_len(const struct value_pair *vp, const char *str)
 
 static const char *opt_type_name(const struct fio_option *o)
 {
-	compiletime_assert(ARRAY_SIZE(opt_type_names) - 1 == FIO_OPT_UNSUPPORTED,
+	compiletime_assert(FIO_ARRAY_SIZE(opt_type_names) - 1 == FIO_OPT_UNSUPPORTED,
 				"opt_type_names[] index");
 
 	if (o->type <= FIO_OPT_UNSUPPORTED)
diff --git a/td_error.c b/td_error.c
index 9d58a314..13408f2e 100644
--- a/td_error.c
+++ b/td_error.c
@@ -20,7 +20,7 @@ int td_non_fatal_error(struct thread_data *td, enum error_type_bit etype,
 
 	if (!td->o.ignore_error[etype]) {
 		td->o.ignore_error[etype] = __NON_FATAL_ERR;
-		td->o.ignore_error_nr[etype] = ARRAY_SIZE(__NON_FATAL_ERR);
+		td->o.ignore_error_nr[etype] = FIO_ARRAY_SIZE(__NON_FATAL_ERR);
 	}
 
 	if (!(td->o.continue_on_error & (1 << etype)))
diff --git a/unittests/lib/num2str.c b/unittests/lib/num2str.c
index a3492a8d..8f12cf83 100644
--- a/unittests/lib/num2str.c
+++ b/unittests/lib/num2str.c
@@ -29,7 +29,7 @@ static void test_num2str(void)
 	char *str;
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(testcases); ++i) {
+	for (i = 0; i < FIO_ARRAY_SIZE(testcases); ++i) {
 		p = &testcases[i];
 		str = num2str(p->num, p->maxlen, p->base, p->pow2, p->unit);
 		CU_ASSERT_STRING_EQUAL(str, p->expected);
diff --git a/zbd.c b/zbd.c
index 9327816a..f2599bd4 100644
--- a/zbd.c
+++ b/zbd.c
@@ -338,7 +338,7 @@ static bool zbd_verify_bs(void)
 			if (!f->zbd_info)
 				continue;
 			zone_size = f->zbd_info->zone_size;
-			for (k = 0; k < ARRAY_SIZE(td->o.bs); k++) {
+			for (k = 0; k < FIO_ARRAY_SIZE(td->o.bs); k++) {
 				if (td->o.verify != VERIFY_NONE &&
 				    zone_size % td->o.bs[k] != 0) {
 					log_info("%s: block size %llu is not a divisor of the zone size %d\n",


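The Windows XP shims removed above emulated inet_ntop() and inet_pton() on top of WSAAddressToString() and WSAStringToAddressA(). With XP support dropped, fio presumably relies on the platform's native functions instead. The sketch below shows the round-trip contract those shims were emulating, written against the POSIX <arpa/inet.h> interface. addr_roundtrip() is a hypothetical helper, not part of fio. inet_pton() returns 1 on success, 0 for text that is not a valid address of the given family, and -1 (errno EAFNOSUPPORT) for an unsupported family, which is the convention the removed shim followed.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6, socklen_t */
#include <netinet/in.h>  /* struct in6_addr, INET6_ADDRSTRLEN */
#include <arpa/inet.h>   /* inet_pton(), inet_ntop() */

/*
 * Hypothetical helper: parse 'text' as an address of family 'af' and
 * format it back into 'out'. Returns 1 on success, 0 if 'text' is not a
 * valid address for 'af', and -1 on error (unsupported family, or 'out'
 * too small), with errno set by the failing call.
 */
static int addr_roundtrip(int af, const char *text, char *out, socklen_t outlen)
{
	unsigned char bin[sizeof(struct in6_addr)]; /* fits IPv4 or IPv6 */
	int r = inet_pton(af, text, bin);

	if (r != 1)
		return r;
	if (inet_ntop(af, bin, out, outlen) == NULL)
		return -1;
	return 1;
}
```

For example, "192.0.2.1" with AF_INET and "2001:db8::1" with AF_INET6 both come back unchanged, while "not-an-address" yields 0.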

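The ARRAY_SIZE() to FIO_ARRAY_SIZE() rename gives fio's element-count macro a project prefix, which presumably avoids clashes with same-named macros from other headers (this diff drops libmtd_common.h's private ARRAY_SIZE() in favour of the one from compiler/compiler.h). The sketch below assumes the usual sizeof-based definition; ARRAY_COUNT() and patterns[] are hypothetical names, so check compiler/compiler.h for the exact macro.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical element-count macro, assumed to have the same sizeof-based
 * shape as FIO_ARRAY_SIZE(). Only valid for real arrays, never pointers.
 */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table; the count follows the initializer automatically. */
static const int patterns[] = { 0xa5, 0x5a, 0x00 };

static int sum_patterns(void)
{
	int sum = 0;

	for (size_t i = 0; i < ARRAY_COUNT(patterns); i++)
		sum += patterns[i];
	return sum;
}
```

The macro only works on true arrays: applied to a pointer (including a function parameter declared with array syntax) it silently divides the pointer size by the element size.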
* Recent changes (master)
@ 2021-01-06 13:00 Jens Axboe
From: Jens Axboe @ 2021-01-06 13:00 UTC
  To: fio

The following changes since commit a64b7d6e9b3e5174e034f7c147de71e4b51b2a01:

  Merge branch 'fix-get-next-file' of https://github.com/aclamk/fio (2020-12-29 16:36:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4065c6a4949fa85e5f2d8de8a5556130231dd680:

  log: only compile log_prevalist() if FIO_INC_DEBUG is set (2021-01-05 13:14:28 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      log: only compile log_prevalist() if FIO_INC_DEBUG is set

 log.c | 2 ++
 1 file changed, 2 insertions(+)

---

Diff of recent changes:

diff --git a/log.c b/log.c
index 6c36813d..562a29aa 100644
--- a/log.c
+++ b/log.c
@@ -42,6 +42,7 @@ size_t log_valist(const char *fmt, va_list args)
 }
 
 /* add prefix for the specified type in front of the valist */
+#ifdef FIO_INC_DEBUG
 void log_prevalist(int type, const char *fmt, va_list args)
 {
 	char *buf1, *buf2;
@@ -64,6 +65,7 @@ void log_prevalist(int type, const char *fmt, va_list args)
 	len = log_info_buf(buf2, len);
 	free(buf2);
 }
+#endif
 
 ssize_t log_info(const char *format, ...)
 {



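log_prevalist() formats a caller-supplied va_list and puts a type prefix in front of it; the commit above only compiles it when FIO_INC_DEBUG is defined. Below is a minimal sketch of the same prefix-then-message pattern using vsnprintf() and snprintf() with a fixed-size scratch buffer. prefix_vformat() and prefix_format() are hypothetical names, and unlike log_prevalist(), which appears to use heap buffers (it frees buf2), they write into a caller-provided buffer. A debug-only helper like this would sit behind the same kind of #ifdef guard.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of "prefix in front of a va_list": render the
 * caller's message first, then write "<prefix>: <message>" into dst.
 * Messages longer than the scratch buffer are truncated. Returns the
 * snprintf() result (length that would have been written), or -1.
 */
static int prefix_vformat(char *dst, size_t n, const char *prefix,
			  const char *fmt, va_list args)
{
	char msg[256];

	if (vsnprintf(msg, sizeof(msg), fmt, args) < 0)
		return -1;
	return snprintf(dst, n, "%s: %s", prefix, msg);
}

/* Variadic front end that builds the va_list for prefix_vformat(). */
static int prefix_format(char *dst, size_t n, const char *prefix,
			 const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = prefix_vformat(dst, n, prefix, fmt, args);
	va_end(args);
	return r;
}
```

Splitting the function into a va_list worker and a variadic front end lets other variadic loggers forward their arguments without re-parsing them.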
* Recent changes (master)
@ 2020-12-30 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-30 13:00 UTC
  To: fio

The following changes since commit b0340c5c7de38b3f4632366247489da7c52d5cfb:

  Merge branch 'terse_units' of https://github.com/sitsofe/fio (2020-12-24 07:38:18 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a64b7d6e9b3e5174e034f7c147de71e4b51b2a01:

  Merge branch 'fix-get-next-file' of https://github.com/aclamk/fio (2020-12-29 16:36:32 -0700)

----------------------------------------------------------------
Adam Kupczyk (1):
      io_u: Fix bad interaction with --openfiles and non-sequential file selection policy

Jens Axboe (1):
      Merge branch 'fix-get-next-file' of https://github.com/aclamk/fio

 io_u.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index f30fc037..00a219c2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1326,8 +1326,10 @@ static struct fio_file *__get_next_file(struct thread_data *td)
 	if (f && fio_file_open(f) && !fio_file_closing(f)) {
 		if (td->o.file_service_type == FIO_FSERVICE_SEQ)
 			goto out;
-		if (td->file_service_left--)
-			goto out;
+		if (td->file_service_left) {
+		  td->file_service_left--;
+		  goto out;
+		}
 	}
 
 	if (td->o.file_service_type == FIO_FSERVICE_RR ||



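The io_u.c fix changes when td->file_service_left is decremented. With `if (td->file_service_left--)` the decrement happens even when the counter is already zero; assuming the field is an unsigned int, it wraps to UINT_MAX, so a later check sees a huge remaining budget unless the counter is reset first. The sketch models both versions of the check in isolation; keep_file_old() and keep_file_new() are hypothetical helpers, not fio functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * 'left' models td->file_service_left, assumed to be an unsigned int.
 * Both functions return true when the current file should be reused.
 */

/* Old form: the post-decrement runs even when *left is already 0,
 * so a zero counter wraps to UINT_MAX. */
static bool keep_file_old(unsigned int *left)
{
	if ((*left)--)
		return true;
	return false;
}

/* Fixed form: only spend budget that exists; 0 stays 0. */
static bool keep_file_new(unsigned int *left)
{
	if (*left) {
		(*left)--;
		return true;
	}
	return false;
}
```

Both forms return the same answer on any single call; the difference is only in the value left behind, which is why the bug shows up on later calls rather than immediately.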
* Recent changes (master)
@ 2020-12-25 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-25 13:00 UTC
  To: fio

The following changes since commit ced224611b039df68ceebde4733269f4f6606912:

  Merge branch 'github_issue' of https://github.com/sitsofe/fio (2020-12-17 16:09:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b0340c5c7de38b3f4632366247489da7c52d5cfb:

  Merge branch 'terse_units' of https://github.com/sitsofe/fio (2020-12-24 07:38:18 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'terse_units' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      docs: add missing units to terse headings

 HOWTO | 2 +-
 fio.1 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 7e46cee0..d663166d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3963,7 +3963,7 @@ will be a disk utilization section.
 Below is a single line containing short names for each of the fields in the
 minimal output v3, separated by semicolons::
 
-        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 In client/server mode terse output differs from what appears when jobs are run
 locally. Disk utilization data is omitted from the standard terse output and
diff --git a/fio.1 b/fio.1
index 45ec8d43..b29ac437 100644
--- a/fio.1
+++ b/fio.1
@@ -3678,7 +3678,7 @@ Below is a single line containing short names for each of the fields in the
 minimal output v3, separated by semicolons:
 .P
 .nf
-		terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+		terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 .fi
 .P
 In client/server mode terse output differs from what appears when jobs are run



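The terse heading line acts as a schema: field names appear in output order, so a value can be located by its zero-based position in the semicolon-separated line, and the _kb/_us suffixes added by this commit document the units. For example, read_bandwidth_kb is at index 6. Below is a minimal sketch of pulling one field out of a terse line by index; terse_field() is a hypothetical helper, and the sample line mentioned afterwards uses made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the zero-based 'idx'-th ';'-separated field
 * of a terse line into 'out'. Returns 0 on success, -1 if the line has
 * fewer fields or the field does not fit. Empty fields are kept, which
 * strtok() would not do.
 */
static int terse_field(const char *line, size_t idx, char *out, size_t outlen)
{
	const char *start = line;

	for (size_t i = 0; i < idx; i++) {
		start = strchr(start, ';');
		if (!start)
			return -1;
		start++; /* step past the separator */
	}

	size_t len = strcspn(start, ";\n"); /* field ends at ';' or newline */
	if (len + 1 > outlen)
		return -1;
	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

With the made-up line "3;fio-3.25;job1;0;0;409600;13653;3413;30000", terse_field(line, 6, buf, sizeof(buf)) copies "13653", the read_bandwidth_kb value.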
* Recent changes (master)
@ 2020-12-18 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-18 13:00 UTC
  To: fio

The following changes since commit dee9b29bef5bc344815d7a53dda6bb21426f2bfa:

  Merge branch 'wip-rbd-engine-tweaks' of https://github.com/dillaman/fio (2020-12-15 09:18:28 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ced224611b039df68ceebde4733269f4f6606912:

  Merge branch 'github_issue' of https://github.com/sitsofe/fio (2020-12-17 16:09:54 -0700)

----------------------------------------------------------------
Erwan Velu (4):
      examples/fsx: Removing deprecated rwmixcycle options
      examples: Clarify time_based usage
      examples: Clarify thread usage
      examples: Clarify group_reporting usage

Jens Axboe (2):
      Merge branch 'evelu-examples' of https://github.com/ErwanAliasr1/fio
      Merge branch 'github_issue' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      docs: add new section to REPORTING-BUGS and github issue templates

 .github/ISSUE_TEMPLATE.md             |  5 +++++
 .github/ISSUE_TEMPLATE/bug-report.md  | 20 +++++++++++++++++++
 .github/ISSUE_TEMPLATE/config.yml     |  6 ++++++
 .github/ISSUE_TEMPLATE/enhancement.md | 11 +++++++++++
 .github/SUPPORT.md                    | 36 +++++++++++++++++++++++++++++++++++
 REPORTING-BUGS                        | 31 +++++++++++++++++++++++++-----
 examples/cpp_null.fio                 |  2 +-
 examples/cross-stripe-verify.fio      |  2 +-
 examples/dev-dax.fio                  |  4 ++--
 examples/e4defrag.fio                 |  2 +-
 examples/e4defrag2.fio                |  5 ++---
 examples/exitwhat.fio                 |  2 +-
 examples/falloc.fio                   |  4 ++--
 examples/fio-rand-RW.fio              |  2 +-
 examples/fio-rand-read.fio            |  2 +-
 examples/fio-rand-write.fio           |  2 +-
 examples/fio-seq-RW.fio               |  2 +-
 examples/fio-seq-read.fio             |  2 +-
 examples/fio-seq-write.fio            |  2 +-
 examples/fsx.fio                      |  1 -
 examples/jesd219.fio                  |  2 +-
 examples/libpmem.fio                  |  4 ++--
 examples/libzbc-rand-write.fio        |  2 +-
 examples/null.fio                     |  1 -
 examples/pmemblk.fio                  |  4 ++--
 examples/steadystate.fio              |  2 +-
 examples/surface-scan.fio             |  2 +-
 examples/waitfor.fio                  |  2 +-
 examples/zbd-rand-write.fio           |  2 +-
 29 files changed, 130 insertions(+), 34 deletions(-)
 create mode 100644 .github/ISSUE_TEMPLATE.md
 create mode 100644 .github/ISSUE_TEMPLATE/bug-report.md
 create mode 100644 .github/ISSUE_TEMPLATE/config.yml
 create mode 100644 .github/ISSUE_TEMPLATE/enhancement.md
 create mode 100644 .github/SUPPORT.md

---

Diff of recent changes:

diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
new file mode 100644
index 00000000..272968f8
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE.md
@@ -0,0 +1,5 @@
+**Please acknowledge you have done the following before creating a ticket**
+
+- [ ] I have read the GitHub issues section of [REPORTING-BUGS](../blob/master/REPORTING-BUGS).
+
+<!-- replace me with bug report / enhancement request -->
diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md
new file mode 100644
index 00000000..10738165
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug-report.md
@@ -0,0 +1,20 @@
+---
+name: Report a bug
+about: For bugs that are reproducible with the latest fio releases
+
+---
+
+**Please acknowledge the following before creating a ticket**
+
+- [ ] I have read the GitHub issues section of [REPORTING-BUGS](../blob/master/REPORTING-BUGS).
+
+**Description of the bug:**
+<!--replaceme-->
+
+**Environment**: <!-- Name and version of operating system -->
+
+**fio version**: <!--replaceme-->
+
+**Reproduction steps**
+<!-- Please minimise the job file/command line options down to only those
+necessary to reproduce the issue (https://stackoverflow.com/help/mcve ) -->
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000..c7e3b372
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,6 @@
+blank_issues_enabled: true
+
+contact_links:
+- name: General questions (e.g. "How do I...", "Why is...") that are related to fio
+  url: http://vger.kernel.org/vger-lists.html#fio
+  about: Please send questions to the fio mailing list (plain-text emails ONLY)
diff --git a/.github/ISSUE_TEMPLATE/enhancement.md b/.github/ISSUE_TEMPLATE/enhancement.md
new file mode 100644
index 00000000..1d4ba77d
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/enhancement.md
@@ -0,0 +1,11 @@
+---
+name: Feature enhancement request
+about: Suggest a new fio feature
+labels: enhancement
+
+---
+
+**Description of the new feature**
+<!-- Please be aware regular fio developers are busy with non-fio work. Because
+of this, most requests are only completed if someone from outside the project
+contributes the code. -->
diff --git a/.github/SUPPORT.md b/.github/SUPPORT.md
new file mode 100644
index 00000000..8d9df863
--- /dev/null
+++ b/.github/SUPPORT.md
@@ -0,0 +1,36 @@
+# Getting support for fio
+
+## General questions
+
+Please use the fio mailing list for asking general fio questions (e.g. "How do
+I do X?", "Why does Y happen?"). See the Mailing list section of the
+[README][readme] for details).
+
+## Reporting bugs
+
+As mentioned in [REPORTING-BUGS][reportingbugs], fio bugs and enhancements can
+be reported to the fio mailing list or fio's GitHub issues tracker.
+
+When reporting bugs please include ALL of the following:
+- Description of the issue
+- fio version number tested. If your fio isn't among the recent releases (see
+  the [fio releases page][releases]) please build a new one from source (see
+  the Source and Building sections of the [README][readme] for how to do this)
+  and reproduce the issue with the fresh build before filing an issue.
+- Reproduction steps and minimal job file/command line parameters.
+
+When requesting an enhancement only the description is needed.
+
+### GitHub issues specific information
+
+[Formatting terminal output with markdown][quotingcode] will help people who
+are reading your report. However, if the output is large (e.g. over 15 lines
+long) please consider including it as a text attachment. Avoid attaching
+pictures of screenshots as these are not searchable/selectable.
+
+<!-- Definitions -->
+
+[readme]: ../README
+[reportingbugs]: ../REPORTING-BUGS
+[releases]: ../../../releases
+[quotingcode]: https://docs.github.com/en/free-pro-team@latest/github/writing-on-github/basic-writing-and-formatting-syntax#quoting-code
diff --git a/REPORTING-BUGS b/REPORTING-BUGS
index 327b6caa..c0204d7e 100644
--- a/REPORTING-BUGS
+++ b/REPORTING-BUGS
@@ -1,16 +1,20 @@
 Reporting a bug
 ---------------
 
-If you notice anything that seems like a fio bug, please do send email
-to the list (fio@vger.kernel.org, see README) about it. If you are not
-running the newest release of fio, upgrading first is recommended.
+...via the mailing list
+=======================
+
+If you notice anything that seems like a fio bug or want to ask fio related
+questions, please send a plain-text only email to the list
+(fio@vger.kernel.org, see README) about it. If you are not running the newest
+release of fio please upgrade first.
 
 When reporting a bug, you'll need to include:
 
 1) A description of what you think the bug is
-2) Environment (Linux distro version, kernel version). This is mostly
+2) Environment (e.g. Linux distro version, kernel version). This is mostly
    needed if it's a build bug.
-3) The output from fio --version.
+3) The output from fio --version .
 4) How to reproduce. Please include a full list of the parameters
    passed to fio and the job file used (if any).
 
@@ -19,3 +23,20 @@ is left out and has to be asked for will add to the turn-around time
 of getting to the bottom of the issue, and an eventual fix.
 
 That's it!
+
+...via GitHub issues
+====================
+
+Please create an issue in the GitHub issue tracker
+(https://github.com/axboe/fio/issues ) but observe the following:
+
+a) If you are asking a question on how to do something ("How do I/Why is?")
+   please send it to the mailing list and not GitHub issues. The fio project
+   uses GitHub issues for reproducible bugs/enhancement requests.
+b) Please reproduce your bug using the latest fio listed on
+   https://github.com/axboe/fio/releases (see the Source and Building sections
+   of the README for how to build fio from source).
+c) Include all of the information requested in the mailing list section above
+   (description, environment, version, reproduction steps and all job parameters).
+
+Thanks!
diff --git a/examples/cpp_null.fio b/examples/cpp_null.fio
index 436ed90a..7c62beaf 100644
--- a/examples/cpp_null.fio
+++ b/examples/cpp_null.fio
@@ -7,4 +7,4 @@ ioengine=cpp_null
 size=100g
 rw=randread
 norandommap
-time_based=0
+time_based
diff --git a/examples/cross-stripe-verify.fio b/examples/cross-stripe-verify.fio
index 68664ed0..47c0889c 100644
--- a/examples/cross-stripe-verify.fio
+++ b/examples/cross-stripe-verify.fio
@@ -17,7 +17,7 @@ verify_backlog=1
 offset_increment=124g
 io_size=120g
 offset=120k
-group_reporting=1
+group_reporting
 verify_dump=1
 loops=2
 
diff --git a/examples/dev-dax.fio b/examples/dev-dax.fio
index d9f430eb..88bce31b 100644
--- a/examples/dev-dax.fio
+++ b/examples/dev-dax.fio
@@ -2,7 +2,7 @@
 bs=2m
 ioengine=dev-dax
 norandommap
-time_based=1
+time_based
 runtime=30
 group_reporting
 disable_lat=1
@@ -18,7 +18,7 @@ cpus_allowed_policy=split
 #
 iodepth=1
 direct=0
-thread=1
+thread
 numjobs=16
 #
 # The dev-dax engine does IO to DAX device that are special character
diff --git a/examples/e4defrag.fio b/examples/e4defrag.fio
index cb94e85a..d6495f7a 100644
--- a/examples/e4defrag.fio
+++ b/examples/e4defrag.fio
@@ -18,7 +18,7 @@ rw=write
 # Run e4defrag and aio-dio workers in parallel
 [e4defrag]
 stonewall
-time_based=30
+time_based
 runtime=30
 ioengine=e4defrag
 buffered=0
diff --git a/examples/e4defrag2.fio b/examples/e4defrag2.fio
index c6485997..2d4e1a87 100644
--- a/examples/e4defrag2.fio
+++ b/examples/e4defrag2.fio
@@ -55,7 +55,7 @@ inplace=1
 bs=4k
 donorname=file3.def
 filename=file3
-time_based=30
+time_based
 rw=randwrite
 
 [buffered-aio-32k]
@@ -68,7 +68,7 @@ bs=32k
 filename=file3
 rw=randrw
 runtime=30
-time_based=30
+time_based
 numjobs=4
 
 [direct-aio-32k]
@@ -82,7 +82,6 @@ bs=32k
 filename=file3
 rw=randrw
 runtime=30
-time_based=30
 numjobs=4
 
 
diff --git a/examples/exitwhat.fio b/examples/exitwhat.fio
index c91d7375..864508c6 100644
--- a/examples/exitwhat.fio
+++ b/examples/exitwhat.fio
@@ -11,7 +11,7 @@
 filename=/tmp/test
 filesize=1G
 blocksize=4096
-group_reporting=1
+group_reporting
 exitall=1
 
 [slow1]
diff --git a/examples/falloc.fio b/examples/falloc.fio
index fa307314..fadf1321 100644
--- a/examples/falloc.fio
+++ b/examples/falloc.fio
@@ -15,7 +15,7 @@ group_reporting
 [falloc-fuzzer]
 stonewall
 runtime=10
-time_based=10
+time_based
 bssplit=4k/10:64k/50:32k/40
 rw=randwrite
 numjobs=1
@@ -24,7 +24,7 @@ filename=fragmented_file
 [punch hole-fuzzer]
 bs=4k
 runtime=10
-time_based=10
+time_based
 rw=randtrim
 numjobs=2
 filename=fragmented_file
diff --git a/examples/fio-rand-RW.fio b/examples/fio-rand-RW.fio
index 0df0bc17..a1074a1a 100644
--- a/examples/fio-rand-RW.fio
+++ b/examples/fio-rand-RW.fio
@@ -9,7 +9,7 @@ rwmixwrite=40
 bs=4K
 direct=0
 numjobs=4
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fio-rand-read.fio b/examples/fio-rand-read.fio
index bc154668..319a9209 100644
--- a/examples/fio-rand-read.fio
+++ b/examples/fio-rand-read.fio
@@ -7,7 +7,7 @@ rw=randread
 bs=4K
 direct=0
 numjobs=1
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fio-rand-write.fio b/examples/fio-rand-write.fio
index bd1b73a9..55ededbd 100644
--- a/examples/fio-rand-write.fio
+++ b/examples/fio-rand-write.fio
@@ -7,7 +7,7 @@ rw=randwrite
 bs=4K
 direct=0
 numjobs=4
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fio-seq-RW.fio b/examples/fio-seq-RW.fio
index 8f7090f3..89e5c679 100644
--- a/examples/fio-seq-RW.fio
+++ b/examples/fio-seq-RW.fio
@@ -9,7 +9,7 @@ rwmixwrite=40
 bs=256K
 direct=0
 numjobs=4
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fio-seq-read.fio b/examples/fio-seq-read.fio
index 28de93c8..2b272480 100644
--- a/examples/fio-seq-read.fio
+++ b/examples/fio-seq-read.fio
@@ -5,7 +5,7 @@ rw=read
 bs=256K
 direct=1
 numjobs=1
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fio-seq-write.fio b/examples/fio-seq-write.fio
index b291a15a..ac6c9eef 100644
--- a/examples/fio-seq-write.fio
+++ b/examples/fio-seq-write.fio
@@ -7,7 +7,7 @@ rw=write
 bs=256K
 direct=0
 numjobs=1
-time_based=1
+time_based
 runtime=900
 
 [file1]
diff --git a/examples/fsx.fio b/examples/fsx.fio
index 6b48c6fd..22152dc0 100644
--- a/examples/fsx.fio
+++ b/examples/fsx.fio
@@ -9,4 +9,3 @@ bs=4k
 norandommap
 direct=1
 loops=500000
-rwmixcycle=40
diff --git a/examples/jesd219.fio b/examples/jesd219.fio
index 24f16f77..deddd9a7 100644
--- a/examples/jesd219.fio
+++ b/examples/jesd219.fio
@@ -17,4 +17,4 @@ bssplit=512/4:1024/1:1536/1:2048/1:2560/1:3072/1:3584/1:4k/67:8k/10:16k/7:32k/3:
 blockalign=4k
 random_distribution=zoned:50/5:30/15:20/80
 filename=/dev/nvme0n1
-group_reporting=1
+group_reporting
diff --git a/examples/libpmem.fio b/examples/libpmem.fio
index 65b1d687..0ff681f0 100644
--- a/examples/libpmem.fio
+++ b/examples/libpmem.fio
@@ -3,7 +3,7 @@ bs=4k
 size=8g
 ioengine=libpmem
 norandommap
-time_based=1
+time_based
 group_reporting
 invalidate=1
 disable_lat=1
@@ -13,7 +13,7 @@ clat_percentiles=0
 
 iodepth=1
 iodepth_batch=1
-thread=1
+thread
 numjobs=1
 runtime=300
 
diff --git a/examples/libzbc-rand-write.fio b/examples/libzbc-rand-write.fio
index ce5870e4..41496219 100644
--- a/examples/libzbc-rand-write.fio
+++ b/examples/libzbc-rand-write.fio
@@ -12,7 +12,7 @@ max_open_zones=32
 bs=512K
 direct=1
 numjobs=16
-time_based=1
+time_based
 runtime=300
 
 [dev1]
diff --git a/examples/null.fio b/examples/null.fio
index 9d2f3e00..4534cbdd 100644
--- a/examples/null.fio
+++ b/examples/null.fio
@@ -7,4 +7,3 @@ ioengine=null
 size=100g
 rw=randread
 norandommap
-time_based=0
diff --git a/examples/pmemblk.fio b/examples/pmemblk.fio
index 2d5ecfce..f8131741 100644
--- a/examples/pmemblk.fio
+++ b/examples/pmemblk.fio
@@ -2,7 +2,7 @@
 bs=1m
 ioengine=pmemblk
 norandommap
-time_based=1
+time_based
 runtime=30
 group_reporting
 disable_lat=1
@@ -19,7 +19,7 @@ cpus_allowed_policy=split
 #
 iodepth=1
 direct=1
-thread=1
+thread
 numjobs=16
 #
 # Unlink can be used to remove the files when done, but if you are
diff --git a/examples/steadystate.fio b/examples/steadystate.fio
index 26fb8083..a38a3438 100644
--- a/examples/steadystate.fio
+++ b/examples/steadystate.fio
@@ -7,7 +7,7 @@
 
 [global]
 threads=1
-group_reporting=1
+group_reporting
 time_based
 size=128m
 
diff --git a/examples/surface-scan.fio b/examples/surface-scan.fio
index dc3373a2..98faf69a 100644
--- a/examples/surface-scan.fio
+++ b/examples/surface-scan.fio
@@ -1,7 +1,7 @@
 ; writes 512 byte verification blocks until the disk is full,
 ; then verifies written data
 [global]
-thread=1
+thread
 bs=64k
 direct=1
 ioengine=sync
diff --git a/examples/waitfor.fio b/examples/waitfor.fio
index 95fad005..096c3153 100644
--- a/examples/waitfor.fio
+++ b/examples/waitfor.fio
@@ -1,6 +1,6 @@
 [global]
 threads=1
-group_reporting=1
+group_reporting
 filename=/tmp/data
 filesize=128m
 
diff --git a/examples/zbd-rand-write.fio b/examples/zbd-rand-write.fio
index 1b3f2088..46cddd06 100644
--- a/examples/zbd-rand-write.fio
+++ b/examples/zbd-rand-write.fio
@@ -12,7 +12,7 @@ max_open_zones=32
 bs=512K
 direct=1
 numjobs=16
-time_based=1
+time_based
 runtime=180
 
 [dev1]



* Recent changes (master)
@ 2020-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-16 13:00 UTC
  To: fio

The following changes since commit 731365d849407426fe32981c97a2f9b42cdc0149:

  flow: fix hang with flow control and zoned block devices (2020-12-07 16:56:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dee9b29bef5bc344815d7a53dda6bb21426f2bfa:

  Merge branch 'wip-rbd-engine-tweaks' of https://github.com/dillaman/fio (2020-12-15 09:18:28 -0700)

----------------------------------------------------------------
Jason Dillaman (2):
      engines/rbd: add support for "direct=1" option
      engines/rbd: issue initial flush to enable writeback/around mode

Jens Axboe (1):
      Merge branch 'wip-rbd-engine-tweaks' of https://github.com/dillaman/fio

 engines/rbd.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index 268b6ebd..c6203d4c 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -227,12 +227,30 @@ static int _fio_rbd_connect(struct thread_data *td)
 		goto failed_shutdown;
 	}
 
+        if (td->o.odirect) {
+		r = rados_conf_set(rbd->cluster, "rbd_cache", "false");
+		if (r < 0) {
+			log_info("failed to disable RBD in-memory cache\n");
+		}
+	}
+
 	r = rbd_open(rbd->io_ctx, o->rbd_name, &rbd->image, NULL /*snap */ );
 	if (r < 0) {
 		log_err("rbd_open failed.\n");
 		goto failed_open;
 	}
 
+	if (!td->o.odirect) {
+		/*
+		 * ensure cache enables writeback/around mode unless explicitly
+		 * configured for writethrough mode
+		 */
+		r = rbd_flush(rbd->image);
+		if (r < 0) {
+			log_info("rbd: failed to issue initial flush\n");
+		}
+	}
+
 	if (!_fio_rbd_setup_poll(rbd))
 		goto failed_poll;
 



* Recent changes (master)
@ 2020-12-08 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-08 13:00 UTC
  To: fio

The following changes since commit a210654b03dbd04f17984d1cf791b1fd56862f1b:

  Merge branch 'cufile' of https://github.com/SystemFabricWorks/fio (2020-12-05 14:45:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 731365d849407426fe32981c97a2f9b42cdc0149:

  flow: fix hang with flow control and zoned block devices (2020-12-07 16:56:37 -0700)

----------------------------------------------------------------
Aravind Ramesh (1):
      flow: fix hang with flow control and zoned block devices

Jens Axboe (1):
      Merge branch 'reword-toolarge' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      filesetup: reword block size too large message

 filesetup.c | 2 +-
 flow.c      | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 42c5f630..d3c370ca 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1199,7 +1199,7 @@ int setup_files(struct thread_data *td)
 		o->size = total_size;
 
 	if (o->size < td_min_bs(td)) {
-		log_err("fio: blocksize too large for data set\n");
+		log_err("fio: blocksize is larger than data set range\n");
 		goto err_out;
 	}
 
diff --git a/flow.c b/flow.c
index ea6b0ec9..c64bb3b2 100644
--- a/flow.c
+++ b/flow.c
@@ -37,6 +37,8 @@ int flow_threshold_exceeded(struct thread_data *td)
 		if (td->o.flow_sleep) {
 			io_u_quiesce(td);
 			usleep(td->o.flow_sleep);
+		} else if (td->o.zone_mode == ZONE_MODE_ZBD) {
+			io_u_quiesce(td);
 		}
 
 		return 1;



* Recent changes (master)
@ 2020-12-06 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-06 13:00 UTC
  To: fio

The following changes since commit fd80924b22fef6ce0d5580724d91490347445f90:

  Fio 3.25 (2020-12-04 11:47:42 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a210654b03dbd04f17984d1cf791b1fd56862f1b:

  Merge branch 'cufile' of https://github.com/SystemFabricWorks/fio (2020-12-05 14:45:16 -0700)

----------------------------------------------------------------
Brian T. Smith (1):
      ioengine: Add libcufile I/O engine

Jens Axboe (1):
      Merge branch 'cufile' of https://github.com/SystemFabricWorks/fio

 HOWTO                         |  30 ++
 Makefile                      |   3 +
 configure                     |  30 ++
 engines/libcufile.c           | 627 ++++++++++++++++++++++++++++++++++++++++++
 examples/libcufile-cufile.fio |  42 +++
 examples/libcufile-posix.fio  |  41 +++
 fio.1                         |  38 ++-
 optgroup.c                    |   4 +
 optgroup.h                    |   2 +
 9 files changed, 816 insertions(+), 1 deletion(-)
 create mode 100644 engines/libcufile.c
 create mode 100644 examples/libcufile-cufile.fio
 create mode 100644 examples/libcufile-posix.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 386fd12a..7e46cee0 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2048,6 +2048,14 @@ I/O engine
 		**nbd**
 			Read and write a Network Block Device (NBD).
 
+		**libcufile**
+			I/O engine supporting libcufile synchronous access to nvidia-fs and a
+			GPUDirect Storage-supported filesystem. This engine performs
+			I/O without transferring buffers between user-space and the kernel,
+			unless :option:`verify` is set or :option:`cuda_io` is `posix`.
+			:option:`iomem` must not be `cudamalloc`. This ioengine defines
+			engine specific options.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2398,6 +2406,28 @@ with the caveat that when used on the command line, they must come after the
 	nbd+unix:///?socket=/tmp/socket
 	nbds://tlshost/exportname
 
+.. option:: gpu_dev_ids=str : [libcufile]
+
+	Specify the GPU IDs to use with CUDA. This is a colon-separated list of
+	int. GPUs are assigned to workers roundrobin. Default is 0.
+
+.. option:: cuda_io=str : [libcufile]
+
+	Specify the type of I/O to use with CUDA. Default is **cufile**.
+
+	**cufile**
+		Use libcufile and nvidia-fs. This option performs I/O directly
+		between a GPUDirect Storage filesystem and GPU buffers,
+		avoiding use of a bounce buffer. If :option:`verify` is set,
+		cudaMemcpy is used to copy verification data between RAM and GPU.
+		Verification data is copied from RAM to GPU before a write
+		and from GPU to RAM after a read. :option:`direct` must be 1.
+	**posix**
+		Use POSIX to perform I/O with a RAM buffer, and use cudaMemcpy
+		to transfer data between RAM and the GPUs. Data is copied from
+		GPU to RAM before a write and copied from RAM to GPU after a
+		read. :option:`verify` does not affect use of cudaMemcpy.
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index ecfaa3e0..a838af9a 100644
--- a/Makefile
+++ b/Makefile
@@ -103,6 +103,9 @@ endif
 ifdef CONFIG_LINUX_EXT4_MOVE_EXTENT
   SOURCE += engines/e4defrag.c
 endif
+ifdef CONFIG_LIBCUFILE
+  SOURCE += engines/libcufile.c
+endif
 ifdef CONFIG_LINUX_SPLICE
   SOURCE += engines/splice.c
 endif
diff --git a/configure b/configure
index d2ca8934..d247a041 100755
--- a/configure
+++ b/configure
@@ -162,6 +162,7 @@ pmemblk="no"
 devdax="no"
 pmem="no"
 cuda="no"
+libcufile="no"
 disable_lex=""
 disable_pmem="no"
 disable_native="no"
@@ -224,6 +225,8 @@ for opt do
   ;;
   --enable-cuda) cuda="yes"
   ;;
+  --enable-libcufile) libcufile="yes"
+  ;;
   --disable-native) disable_native="yes"
   ;;
   --with-ime=*) ime_path="$optarg"
@@ -272,6 +275,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-shm           Disable SHM support"
   echo "--disable-optimizations Don't enable compiler optimizations"
   echo "--enable-cuda           Enable GPUDirect RDMA support"
+  echo "--enable-libcufile      Enable GPUDirect Storage cuFile support"
   echo "--disable-native        Don't build for native host"
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
@@ -2495,6 +2499,29 @@ EOF
 fi
 print_config "cuda" "$cuda"
 
+##########################################
+# libcufile probe
+if test "$libcufile" != "no" ; then
+cat > $TMPC << EOF
+#include <cufile.h>
+
+int main(int argc, char* argv[]) {
+   cuFileDriverOpen();
+   return 0;
+}
+EOF
+  if compile_prog "" "-lcuda -lcudart -lcufile" "libcufile"; then
+    libcufile="yes"
+    LIBS="-lcuda -lcudart -lcufile $LIBS"
+  else
+    if test "$libcufile" = "yes" ; then
+      feature_not_found "libcufile" ""
+    fi
+    libcufile="no"
+  fi
+fi
+print_config "libcufile" "$libcufile"
+
 ##########################################
 # check for cc -march=native
 build_native="no"
@@ -2966,6 +2993,9 @@ fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"
 fi
+if test "$libcufile" = "yes" ; then
+  output_sym "CONFIG_LIBCUFILE"
+fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
diff --git a/engines/libcufile.c b/engines/libcufile.c
new file mode 100644
index 00000000..e575b786
--- /dev/null
+++ b/engines/libcufile.c
@@ -0,0 +1,627 @@
+/*
+ * Copyright (c)2020 System Fabric Works, Inc. All Rights Reserved.
+ * mailto:info@systemfabricworks.com
+ *
+ * License: GPLv2, see COPYING.
+ *
+ * libcufile engine
+ *
+ * fio I/O engine using the NVIDIA cuFile API.
+ *
+ */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <cufile.h>
+#include <cuda.h>
+#include <cuda_runtime.h>
+#include <pthread.h>
+
+#include "../fio.h"
+#include "../lib/pow2.h"
+#include "../optgroup.h"
+#include "../lib/memalign.h"
+
+#define ALIGNED_4KB(v) (((v) & 0x0fff) == 0)
+
+#define LOGGED_BUFLEN_NOT_ALIGNED     0x01
+#define LOGGED_GPU_OFFSET_NOT_ALIGNED 0x02
+#define GPU_ID_SEP ":"
+
+enum {
+	IO_CUFILE    = 1,
+	IO_POSIX     = 2
+};
+
+struct libcufile_options {
+	struct thread_data *td;
+	char               *gpu_ids;       /* colon-separated list of GPU ids,
+					      one per job */
+	void               *cu_mem_ptr;    /* GPU memory */
+	void               *junk_buf;      /* buffer to simulate cudaMemcpy with
+					      posix I/O write */
+	int                 my_gpu_id;     /* GPU id to use for this job */
+	unsigned int        cuda_io;       /* Type of I/O to use with CUDA */
+	size_t              total_mem;     /* size for cu_mem_ptr and junk_buf */
+	int                 logged;        /* bitmask of log messages that have
+					      been output, prevent flood */
+};
+
+struct fio_libcufile_data {
+	CUfileDescr_t  cf_descr;
+	CUfileHandle_t cf_handle;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	  = "gpu_dev_ids",
+		.lname	  = "libcufile engine gpu dev ids",
+		.type	  = FIO_OPT_STR_STORE,
+		.off1	  = offsetof(struct libcufile_options, gpu_ids),
+		.help	  = "GPU IDs, one per subjob, separated by " GPU_ID_SEP,
+		.category = FIO_OPT_C_ENGINE,
+		.group	  = FIO_OPT_G_LIBCUFILE,
+	},
+	{
+		.name	  = "cuda_io",
+		.lname	  = "libcufile cuda io",
+		.type	  = FIO_OPT_STR,
+		.off1	  = offsetof(struct libcufile_options, cuda_io),
+		.help	  = "Type of I/O to use with CUDA",
+		.def      = "cufile",
+		.posval   = {
+			    { .ival = "cufile",
+			      .oval = IO_CUFILE,
+			      .help = "libcufile nvidia-fs"
+			    },
+			    { .ival = "posix",
+			      .oval = IO_POSIX,
+			      .help = "POSIX I/O"
+			    }
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	  = FIO_OPT_G_LIBCUFILE,
+	},
+	{
+		.name	 = NULL,
+	},
+};
+
+static int running = 0;
+static int cufile_initialized = 0;
+static pthread_mutex_t running_lock = PTHREAD_MUTEX_INITIALIZER;
+
+#define check_cudaruntimecall(fn, rc)                                               \
+	do {                                                                        \
+		cudaError_t res = fn;                                               \
+		if (res != cudaSuccess) {                                           \
+			const char *str = cudaGetErrorName(res);                    \
+			log_err("cuda runtime api call failed %s:%d : err=%d:%s\n", \
+				#fn, __LINE__, res, str);                           \
+			rc = -1;                                                    \
+		} else                                                              \
+			rc = 0;                                                     \
+	} while(0)
+
+static const char *fio_libcufile_get_cuda_error(CUfileError_t st)
+{
+	if (IS_CUFILE_ERR(st.err))
+		return cufileop_status_error(st.err);
+	return "unknown";
+}
+
+/*
+ * Assign GPU to subjob roundrobin, similar to how multiple
+ * entries in 'directory' are handled by fio.
+ */
+static int fio_libcufile_find_gpu_id(struct thread_data *td)
+{
+	struct libcufile_options *o = td->eo;
+	int gpu_id = 0;
+
+	if (o->gpu_ids != NULL) {
+		char *gpu_ids, *pos, *cur;
+		int i, id_count, gpu_idx;
+
+		for (id_count = 0, cur = o->gpu_ids; cur != NULL; id_count++) {
+			cur = strchr(cur, GPU_ID_SEP[0]);
+			if (cur != NULL)
+				cur++;
+		}
+
+		gpu_idx = td->subjob_number % id_count;
+
+		pos = gpu_ids = strdup(o->gpu_ids);
+		if (gpu_ids == NULL) {
+			log_err("strdup(gpu_ids): err=%d\n", errno);
+			return -1;
+		}
+
+		i = 0;
+		while (pos != NULL && i <= gpu_idx) {
+			i++;
+			cur = strsep(&pos, GPU_ID_SEP);
+		}
+
+		if (cur)
+			gpu_id = atoi(cur);
+
+		free(gpu_ids);
+	}
+
+	return gpu_id;
+}
+
+static int fio_libcufile_init(struct thread_data *td)
+{
+	struct libcufile_options *o = td->eo;
+	CUfileError_t status;
+	int initialized;
+	int rc;
+
+	pthread_mutex_lock(&running_lock);
+	if (running == 0) {
+		assert(cufile_initialized == 0);
+		if (o->cuda_io == IO_CUFILE) {
+			/* only open the driver if this is the first worker thread */
+			status = cuFileDriverOpen();
+			if (status.err != CU_FILE_SUCCESS)
+				log_err("cuFileDriverOpen: err=%d:%s\n", status.err,
+					fio_libcufile_get_cuda_error(status));
+			else
+				cufile_initialized = 1;
+		}
+	}
+	running++;
+	initialized = cufile_initialized;
+	pthread_mutex_unlock(&running_lock);
+
+	if (o->cuda_io == IO_CUFILE && !initialized)
+		return 1;
+
+	o->my_gpu_id = fio_libcufile_find_gpu_id(td);
+	if (o->my_gpu_id < 0)
+		return 1;
+
+	dprint(FD_MEM, "Subjob %d uses GPU %d\n", td->subjob_number, o->my_gpu_id);
+	check_cudaruntimecall(cudaSetDevice(o->my_gpu_id), rc);
+	if (rc != 0)
+		return 1;
+
+	return 0;
+}
+
+static inline int fio_libcufile_pre_write(struct thread_data *td,
+					  struct libcufile_options *o,
+					  struct io_u *io_u,
+					  size_t gpu_offset)
+{
+	int rc = 0;
+
+	if (o->cuda_io == IO_CUFILE) {
+		if (td->o.verify) {
+			/*
+			  Data is being verified, copy the io_u buffer to GPU memory.
+			  This isn't done in the non-verify case because the data would
+			  already be in GPU memory in a normal cuFile application.
+			*/
+			check_cudaruntimecall(cudaMemcpy(((char*) o->cu_mem_ptr) + gpu_offset,
+							 io_u->xfer_buf,
+							 io_u->xfer_buflen,
+							 cudaMemcpyHostToDevice), rc);
+			if (rc != 0) {
+				log_err("DDIR_WRITE cudaMemcpy H2D failed\n");
+				io_u->error = EIO;
+			}
+		}
+	} else if (o->cuda_io == IO_POSIX) {
+
+		/*
+		  POSIX I/O is being used, the data has to be copied out of the
+		  GPU into a CPU buffer. GPU memory doesn't contain the actual
+		  data to write, copy the data to the junk buffer. The purpose
+		  of this is to add the overhead of cudaMemcpy() that would be
+		  present in a POSIX I/O CUDA application.
+		*/
+		check_cudaruntimecall(cudaMemcpy(o->junk_buf + gpu_offset,
+						 ((char*) o->cu_mem_ptr) + gpu_offset,
+						 io_u->xfer_buflen,
+						 cudaMemcpyDeviceToHost), rc);
+		if (rc != 0) {
+			log_err("DDIR_WRITE cudaMemcpy D2H failed\n");
+			io_u->error = EIO;
+		}
+	} else {
+		log_err("Illegal CUDA IO type: %d\n", o->cuda_io);
+		assert(0);
+		rc = EINVAL;
+	}
+
+	return rc;
+}
+
+static inline int fio_libcufile_post_read(struct thread_data *td,
+					  struct libcufile_options *o,
+					  struct io_u *io_u,
+					  size_t gpu_offset)
+{
+	int rc = 0;
+
+	if (o->cuda_io == IO_CUFILE) {
+		if (td->o.verify) {
+			/* Copy GPU memory to CPU buffer for verify */
+			check_cudaruntimecall(cudaMemcpy(io_u->xfer_buf,
+							 ((char*) o->cu_mem_ptr) + gpu_offset,
+							 io_u->xfer_buflen,
+							 cudaMemcpyDeviceToHost), rc);
+			if (rc != 0) {
+				log_err("DDIR_READ cudaMemcpy D2H failed\n");
+				io_u->error = EIO;
+			}
+		}
+	} else if (o->cuda_io == IO_POSIX) {
+		/* POSIX I/O read, copy the CPU buffer to GPU memory */
+		check_cudaruntimecall(cudaMemcpy(((char*) o->cu_mem_ptr) + gpu_offset,
+						 io_u->xfer_buf,
+						 io_u->xfer_buflen,
+						 cudaMemcpyHostToDevice), rc);
+		if (rc != 0) {
+			log_err("DDIR_READ cudaMemcpy H2D failed\n");
+			io_u->error = EIO;
+		}
+	} else {
+		log_err("Illegal CUDA IO type: %d\n", o->cuda_io);
+		assert(0);
+		rc = EINVAL;
+	}
+
+	return rc;
+}
+
+static enum fio_q_status fio_libcufile_queue(struct thread_data *td,
+					     struct io_u *io_u)
+{
+	struct libcufile_options *o = td->eo;
+	struct fio_libcufile_data *fcd = FILE_ENG_DATA(io_u->file);
+	unsigned long long io_offset;
+	ssize_t sz;
+	ssize_t remaining;
+	size_t xfered;
+	size_t gpu_offset;
+	int rc;
+
+	if (o->cuda_io == IO_CUFILE && fcd == NULL) {
+		io_u->error = EINVAL;
+		td_verror(td, EINVAL, "xfer");
+		return FIO_Q_COMPLETED;
+	}
+
+	fio_ro_check(td, io_u);
+
+	switch(io_u->ddir) {
+	case DDIR_SYNC:
+		rc = fsync(io_u->file->fd);
+		if (rc != 0) {
+			io_u->error = errno;
+			log_err("fsync: err=%d\n", errno);
+		}
+		break;
+
+	case DDIR_DATASYNC:
+		rc = fdatasync(io_u->file->fd);
+		if (rc != 0) {
+			io_u->error = errno;
+			log_err("fdatasync: err=%d\n", errno);
+		}
+		break;
+
+	case DDIR_READ:
+	case DDIR_WRITE:
+		/*
+		  There may be a better way to calculate gpu_offset. The intent is
+		  that gpu_offset equals the difference between io_u->xfer_buf and
+		  the page-aligned base address for io_u buffers.
+		*/
+		gpu_offset = io_u->index * io_u->xfer_buflen;
+		io_offset = io_u->offset;
+		remaining = io_u->xfer_buflen;
+
+		xfered = 0;
+		sz = 0;
+
+		assert(gpu_offset + io_u->xfer_buflen <= o->total_mem);
+
+		if (o->cuda_io == IO_CUFILE) {
+			if (!(ALIGNED_4KB(io_u->xfer_buflen) ||
+			      (o->logged & LOGGED_BUFLEN_NOT_ALIGNED))) {
+				log_err("buflen not 4KB-aligned: %llu\n", io_u->xfer_buflen);
+				o->logged |= LOGGED_BUFLEN_NOT_ALIGNED;
+			}
+
+			if (!(ALIGNED_4KB(gpu_offset) ||
+			      (o->logged & LOGGED_GPU_OFFSET_NOT_ALIGNED))) {
+				log_err("gpu_offset not 4KB-aligned: %lu\n", gpu_offset);
+				o->logged |= LOGGED_GPU_OFFSET_NOT_ALIGNED;
+			}
+		}
+
+		if (io_u->ddir == DDIR_WRITE)
+			rc = fio_libcufile_pre_write(td, o, io_u, gpu_offset);
+
+		if (io_u->error != 0)
+			break;
+
+		while (remaining > 0) {
+			assert(gpu_offset + xfered <= o->total_mem);
+			if (io_u->ddir == DDIR_READ) {
+				if (o->cuda_io == IO_CUFILE) {
+					sz = cuFileRead(fcd->cf_handle, o->cu_mem_ptr, remaining,
+							io_offset + xfered, gpu_offset + xfered);
+					if (sz == -1) {
+						io_u->error = errno;
+						log_err("cuFileRead: err=%d\n", errno);
+					} else if (sz < 0) {
+						io_u->error = EIO;
+						log_err("cuFileRead: err=%ld:%s\n", sz,
+							cufileop_status_error(-sz));
+					}
+				} else if (o->cuda_io == IO_POSIX) {
+					sz = pread(io_u->file->fd, ((char*) io_u->xfer_buf) + xfered,
+						   remaining, io_offset + xfered);
+					if (sz < 0) {
+						io_u->error = errno;
+						log_err("pread: err=%d\n", errno);
+					}
+				} else {
+					log_err("Illegal CUDA IO type: %d\n", o->cuda_io);
+					io_u->error = -1;
+					assert(0);
+				}
+			} else if (io_u->ddir == DDIR_WRITE) {
+				if (o->cuda_io == IO_CUFILE) {
+					sz = cuFileWrite(fcd->cf_handle, o->cu_mem_ptr, remaining,
+							 io_offset + xfered, gpu_offset + xfered);
+					if (sz == -1) {
+						io_u->error = errno;
+						log_err("cuFileWrite: err=%d\n", errno);
+					} else if (sz < 0) {
+						io_u->error = EIO;
+						log_err("cuFileWrite: err=%ld:%s\n", sz,
+							cufileop_status_error(-sz));
+					}
+				} else if (o->cuda_io == IO_POSIX) {
+					sz = pwrite(io_u->file->fd,
+						    ((char*) io_u->xfer_buf) + xfered,
+						    remaining, io_offset + xfered);
+					if (sz < 0) {
+						io_u->error = errno;
+						log_err("pwrite: err=%d\n", errno);
+					}
+				} else {
+					log_err("Illegal CUDA IO type: %d\n", o->cuda_io);
+					io_u->error = -1;
+					assert(0);
+				}
+			} else {
+				log_err("not DDIR_READ or DDIR_WRITE: %d\n", io_u->ddir);
+				io_u->error = -1;
+				assert(0);
+				break;
+			}
+
+			if (io_u->error != 0)
+				break;
+
+			remaining -= sz;
+			xfered += sz;
+
+			if (remaining != 0)
+				log_info("Incomplete %s: %ld bytes remaining\n",
+					 io_u->ddir == DDIR_READ? "read" : "write", remaining);
+		}
+
+		if (io_u->error != 0)
+			break;
+
+		if (io_u->ddir == DDIR_READ)
+			rc = fio_libcufile_post_read(td, o, io_u, gpu_offset);
+		break;
+
+	default:
+		io_u->error = EINVAL;
+		break;
+	}
+
+	if (io_u->error != 0) {
+		log_err("IO failed\n");
+		td_verror(td, io_u->error, "xfer");
+	}
+
+	return FIO_Q_COMPLETED;
+}
+
+static int fio_libcufile_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct libcufile_options *o = td->eo;
+	struct fio_libcufile_data *fcd = NULL;
+	int rc;
+	CUfileError_t status;
+
+	rc = generic_open_file(td, f);
+	if (rc)
+		return rc;
+
+	if (o->cuda_io == IO_CUFILE) {
+		fcd = calloc(1, sizeof(*fcd));
+		if (fcd == NULL) {
+			rc = ENOMEM;
+			goto exit_err;
+		}
+
+		fcd->cf_descr.handle.fd = f->fd;
+		fcd->cf_descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
+		status = cuFileHandleRegister(&fcd->cf_handle, &fcd->cf_descr);
+		if (status.err != CU_FILE_SUCCESS) {
+			log_err("cufile register: err=%d:%s\n", status.err,
+				fio_libcufile_get_cuda_error(status));
+			rc = EINVAL;
+			goto exit_err;
+		}
+	}
+
+	FILE_SET_ENG_DATA(f, fcd);
+	return 0;
+
+exit_err:
+	if (fcd) {
+		free(fcd);
+		fcd = NULL;
+	}
+	if (f) {
+		int rc2 = generic_close_file(td, f);
+		if (rc2)
+			log_err("generic_close_file: err=%d\n", rc2);
+	}
+	return rc;
+}
+
+static int fio_libcufile_close_file(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_libcufile_data *fcd = FILE_ENG_DATA(f);
+	int rc;
+
+	if (fcd != NULL) {
+		cuFileHandleDeregister(fcd->cf_handle);
+		FILE_SET_ENG_DATA(f, NULL);
+		free(fcd);
+	}
+
+	rc = generic_close_file(td, f);
+
+	return rc;
+}
+
+static int fio_libcufile_iomem_alloc(struct thread_data *td, size_t total_mem)
+{
+	struct libcufile_options *o = td->eo;
+	int rc;
+	CUfileError_t status;
+
+	o->total_mem = total_mem;
+	o->logged = 0;
+	o->cu_mem_ptr = NULL;
+	o->junk_buf = NULL;
+	td->orig_buffer = calloc(1, total_mem);
+	if (!td->orig_buffer) {
+		log_err("orig_buffer calloc failed: err=%d\n", errno);
+		goto exit_error;
+	}
+
+	if (o->cuda_io == IO_POSIX) {
+		o->junk_buf = calloc(1, total_mem);
+		if (o->junk_buf == NULL) {
+			log_err("junk_buf calloc failed: err=%d\n", errno);
+			goto exit_error;
+		}
+	}
+
+	dprint(FD_MEM, "Alloc %zu for GPU %d\n", total_mem, o->my_gpu_id);
+	check_cudaruntimecall(cudaMalloc(&o->cu_mem_ptr, total_mem), rc);
+	if (rc != 0)
+		goto exit_error;
+	check_cudaruntimecall(cudaMemset(o->cu_mem_ptr, 0xab, total_mem), rc);
+	if (rc != 0)
+		goto exit_error;
+
+	if (o->cuda_io == IO_CUFILE) {
+		status = cuFileBufRegister(o->cu_mem_ptr, total_mem, 0);
+		if (status.err != CU_FILE_SUCCESS) {
+			log_err("cuFileBufRegister: err=%d:%s\n", status.err,
+				fio_libcufile_get_cuda_error(status));
+			goto exit_error;
+		}
+	}
+
+	return 0;
+
+exit_error:
+	if (td->orig_buffer) {
+		free(td->orig_buffer);
+		td->orig_buffer = NULL;
+	}
+	if (o->junk_buf) {
+		free(o->junk_buf);
+		o->junk_buf = NULL;
+	}
+	if (o->cu_mem_ptr) {
+		cudaFree(o->cu_mem_ptr);
+		o->cu_mem_ptr = NULL;
+	}
+	return 1;
+}
+
+static void fio_libcufile_iomem_free(struct thread_data *td)
+{
+	struct libcufile_options *o = td->eo;
+
+	if (o->junk_buf) {
+		free(o->junk_buf);
+		o->junk_buf = NULL;
+	}
+	if (o->cu_mem_ptr) {
+		if (o->cuda_io == IO_CUFILE)
+			cuFileBufDeregister(o->cu_mem_ptr);
+		cudaFree(o->cu_mem_ptr);
+		o->cu_mem_ptr = NULL;
+	}
+	if (td->orig_buffer) {
+		free(td->orig_buffer);
+		td->orig_buffer = NULL;
+	}
+}
+
+static void fio_libcufile_cleanup(struct thread_data *td)
+{
+	struct libcufile_options *o = td->eo;
+
+	pthread_mutex_lock(&running_lock);
+	running--;
+	assert(running >= 0);
+	if (running == 0) {
+		/* only close the driver if initialized and
+		   this is the last worker thread */
+		if (o->cuda_io == IO_CUFILE && cufile_initialized)
+			cuFileDriverClose();
+		cufile_initialized = 0;
+	}
+	pthread_mutex_unlock(&running_lock);
+}
+
+FIO_STATIC struct ioengine_ops ioengine = {
+	.name                = "libcufile",
+	.version             = FIO_IOOPS_VERSION,
+	.init                = fio_libcufile_init,
+	.queue               = fio_libcufile_queue,
+	.open_file           = fio_libcufile_open_file,
+	.close_file          = fio_libcufile_close_file,
+	.iomem_alloc         = fio_libcufile_iomem_alloc,
+	.iomem_free          = fio_libcufile_iomem_free,
+	.cleanup             = fio_libcufile_cleanup,
+	.flags               = FIO_SYNCIO,
+	.options             = options,
+	.option_struct_size  = sizeof(struct libcufile_options)
+};
+
+void fio_init fio_libcufile_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+void fio_exit fio_libcufile_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/libcufile-cufile.fio b/examples/libcufile-cufile.fio
new file mode 100644
index 00000000..94a64b5a
--- /dev/null
+++ b/examples/libcufile-cufile.fio
@@ -0,0 +1,42 @@
+# Example libcufile job, using cufile I/O
+#
+# Required environment variables:
+#     GPU_DEV_IDS : refer to option 'gpu_dev_ids'
+#     FIO_DIR     : 'directory'. This job uses cuda_io=cufile, so path(s) must
+#                   point to GPUDirect Storage filesystem(s)
+#
+
+[global]
+ioengine=libcufile
+directory=${FIO_DIR}
+gpu_dev_ids=${GPU_DEV_IDS}
+cuda_io=cufile
+# 'direct' must be 1 when using cuda_io=cufile
+direct=1
+# Performance is negatively affected if 'bs' is not a multiple of 4k.
+# Refer to GDS cuFile documentation.
+bs=1m
+size=1m
+numjobs=16
+# cudaMalloc fails if too many processes attach to the GPU, use threads.
+thread
+
+[read]
+rw=read
+
+[write]
+rw=write
+
+[randread]
+rw=randread
+
+[randwrite]
+rw=randwrite
+
+[verify]
+rw=write
+verify=md5
+
+[randverify]
+rw=randwrite
+verify=md5
diff --git a/examples/libcufile-posix.fio b/examples/libcufile-posix.fio
new file mode 100644
index 00000000..2bce22e6
--- /dev/null
+++ b/examples/libcufile-posix.fio
@@ -0,0 +1,41 @@
+# Example libcufile job, using POSIX I/O
+#
+# Required environment variables:
+#     GPU_DEV_IDS : refer to option 'gpu_dev_ids'
+#     FIO_DIR     : 'directory'. cuda_io=posix, so the path(s) may point
+#                   to any POSIX filesystem(s)
+#
+
+[global]
+ioengine=libcufile
+directory=${FIO_DIR}
+gpu_dev_ids=${GPU_DEV_IDS}
+cuda_io=posix
+# 'direct' may be 1 or 0 when using cuda_io=posix
+direct=0
+# there are no unusual requirements for 'bs' when cuda_io=posix
+bs=1m
+size=1G
+numjobs=16
+# cudaMalloc fails if too many processes attach to the GPU, use threads
+thread
+
+[read]
+rw=read
+
+[write]
+rw=write
+
+[randread]
+rw=randread
+
+[randwrite]
+rw=randwrite
+
+[verify]
+rw=write
+verify=md5
+
+[randverify]
+rw=randwrite
+verify=md5
diff --git a/fio.1 b/fio.1
index 48119325..45ec8d43 100644
--- a/fio.1
+++ b/fio.1
@@ -1826,6 +1826,13 @@ Read and write iscsi lun with libiscsi.
 .TP
 .B nbd
 Synchronous read and write a Network Block Device (NBD).
+.TP
+.B libcufile
+I/O engine supporting libcufile synchronous access to nvidia-fs and a
+GPUDirect Storage-supported filesystem. This engine performs
+I/O without transferring buffers between user-space and the kernel,
+unless \fBverify\fR is set or \fBcuda_io\fR is \fBposix\fR. \fBiomem\fR must
+not be \fBcudamalloc\fR. This ioengine defines engine specific options.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2139,7 +2146,36 @@ Example URIs:
 \fInbd+unix:///?socket=/tmp/socket\fR
 .TP
 \fInbds://tlshost/exportname\fR
-
+.RE
+.RE
+.TP
+.BI (libcufile)gpu_dev_ids\fR=\fPstr
+Specify the GPU IDs to use with CUDA. This is a colon-separated list of int.
+GPUs are assigned to workers roundrobin. Default is 0.
+.TP
+.BI (libcufile)cuda_io\fR=\fPstr
+Specify the type of I/O to use with CUDA. This option
+takes the following values:
+.RS
+.RS
+.TP
+.B cufile (default)
+Use libcufile and nvidia-fs. This option performs I/O directly
+between a GPUDirect Storage filesystem and GPU buffers,
+avoiding use of a bounce buffer. If \fBverify\fR is set,
+cudaMemcpy is used to copy verification data between RAM and GPU(s).
+Verification data is copied from RAM to GPU before a write
+and from GPU to RAM after a read.
+\fBdirect\fR must be 1.
+.TP
+.BI posix
+Use POSIX to perform I/O with a RAM buffer, and use
+cudaMemcpy to transfer data between RAM and the GPU(s).
+Data is copied from GPU to RAM before a write and copied
+from RAM to GPU after a read. \fBverify\fR does not affect
+the use of cudaMemcpy.
+.RE
+.RE
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/optgroup.c b/optgroup.c
index c228ff29..64774896 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -173,6 +173,10 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.name	= "NBD I/O engine", /* NBD */
 		.mask	= FIO_OPT_G_NBD,
 	},
+	{
+		.name	= "libcufile I/O engine", /* libcufile */
+		.mask	= FIO_OPT_G_LIBCUFILE,
+	},
 	{
 		.name	= NULL,
 	},
diff --git a/optgroup.h b/optgroup.h
index 5789afd3..d2f1ceb3 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -67,6 +67,7 @@ enum opt_category_group {
 	__FIO_OPT_G_IOURING,
 	__FIO_OPT_G_FILESTAT,
 	__FIO_OPT_G_NR,
+	__FIO_OPT_G_LIBCUFILE,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -108,6 +109,7 @@ enum opt_category_group {
 	FIO_OPT_G_NBD		= (1ULL << __FIO_OPT_G_NBD),
 	FIO_OPT_G_IOURING	= (1ULL << __FIO_OPT_G_IOURING),
 	FIO_OPT_G_FILESTAT	= (1ULL << __FIO_OPT_G_FILESTAT),
+	FIO_OPT_G_LIBCUFILE	= (1ULL << __FIO_OPT_G_LIBCUFILE),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);



* Recent changes (master)
@ 2020-12-05 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-05 13:00 UTC
  To: fio

The following changes since commit f9eb98cba4cfc5351bace61c21eda67fb625266b:

  Merge branch 'stat-int-creep3' of https://github.com/jeffreyalien/fio (2020-12-03 16:09:43 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fd80924b22fef6ce0d5580724d91490347445f90:

  Fio 3.25 (2020-12-04 11:47:42 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.25

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 86ea0c5d..81a6355b 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.24
+DEF_VER=fio-3.25
 
 LF='
 '



* Recent changes (master)
@ 2020-12-04 13:00 Jens Axboe
From: Jens Axboe @ 2020-12-04 13:00 UTC
  To: fio

The following changes since commit 7914c6147adaf3ef32804519ced850168fff1711:

  Merge branch 'regrow_agg_logs' of https://github.com/pmoust/fio (2020-11-27 08:55:12 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f9eb98cba4cfc5351bace61c21eda67fb625266b:

  Merge branch 'stat-int-creep3' of https://github.com/jeffreyalien/fio (2020-12-03 16:09:43 -0700)

----------------------------------------------------------------
Jeff Lien (1):
      stat: Prevent the BW and IOPS logging interval from creeping up

Jens Axboe (1):
      Merge branch 'stat-int-creep3' of https://github.com/jeffreyalien/fio

 stat.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index d42586e7..b7237953 100644
--- a/stat.c
+++ b/stat.c
@@ -2747,7 +2747,8 @@ static unsigned long add_log_sample(struct thread_data *td,
 
 	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0, priority_bit);
 
-	iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
+	iolog->avg_last[ddir] = elapsed - (elapsed % iolog->avg_msec);
+
 	return iolog->avg_msec;
 }
 
@@ -2985,7 +2986,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 	next_log = avg_time;
 
 	spent = mtime_since(parent_tv, t);
-	if (spent < avg_time && avg_time - spent >= LOG_MSEC_SLACK)
+	if (spent < avg_time && avg_time - spent > LOG_MSEC_SLACK)
 		return avg_time - spent;
 
 	if (needs_lock)
@@ -3078,13 +3079,16 @@ static int add_iops_samples(struct thread_data *td, struct timespec *t)
 int calc_log_samples(void)
 {
 	struct thread_data *td;
-	unsigned int next = ~0U, tmp;
+	unsigned int next = ~0U, tmp = 0, next_mod = 0, log_avg_msec_min = -1U;
 	struct timespec now;
 	int i;
+	long elapsed_time = 0;
 
 	fio_gettime(&now, NULL);
 
 	for_each_td(td, i) {
+		elapsed_time = mtime_since_now(&td->epoch);
+
 		if (!td->o.stats)
 			continue;
 		if (in_ramp_time(td) ||
@@ -3095,17 +3099,34 @@ int calc_log_samples(void)
 		if (!td->bw_log ||
 			(td->bw_log && !per_unit_log(td->bw_log))) {
 			tmp = add_bw_samples(td, &now);
-			if (tmp < next)
-				next = tmp;
+
+			if (td->bw_log)
+				log_avg_msec_min = min(log_avg_msec_min, (unsigned int)td->bw_log->avg_msec);
 		}
 		if (!td->iops_log ||
 			(td->iops_log && !per_unit_log(td->iops_log))) {
 			tmp = add_iops_samples(td, &now);
-			if (tmp < next)
-				next = tmp;
+
+			if (td->iops_log)
+				log_avg_msec_min = min(log_avg_msec_min, (unsigned int)td->iops_log->avg_msec);
 		}
+
+		if (tmp < next)
+			next = tmp;
 	}
 
+	/* if log_avg_msec_min has not been changed, set it to 0 */
+	if (log_avg_msec_min == -1U)
+		log_avg_msec_min = 0;
+
+	if (log_avg_msec_min == 0)
+		next_mod = elapsed_time;
+	else
+		next_mod = elapsed_time % log_avg_msec_min;
+
+	/* correction to keep the time on the log avg msec boundary */
+	next = min(next, (log_avg_msec_min - next_mod));
+
 	return next == ~0U ? 0 : next;
 }
 

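The interval fix changes how the window start is computed. Each logged window start is now snapped to a multiple of avg_msec instead of carrying the previous offset forward. The next wakeup is also trimmed so it lands on that boundary. The sketch below shows this arithmetic with hypothetical helper names. The slack is passed in as a parameter because LOG_MSEC_SLACK's value is not shown here. Returning 0 for "due now" in time_left() is a convention of this sketch; fio's __add_samples() goes on to take the sample instead.

```c
#include <assert.h>

/* Start of the averaging window that contains 'elapsed' (ms): the patched
 * avg_last computation, elapsed - (elapsed % avg_msec). */
static unsigned long window_start(unsigned long elapsed, unsigned long avg_msec)
{
	assert(avg_msec > 0);
	return elapsed - (elapsed % avg_msec);
}

/* Milliseconds until the next avg_msec boundary, i.e. the
 * "log_avg_msec_min - next_mod" correction in calc_log_samples(). */
static unsigned long ms_to_boundary(unsigned long elapsed, unsigned long avg_msec)
{
	assert(avg_msec > 0);
	return avg_msec - (elapsed % avg_msec);
}

/* Time left before the next sample is due, or 0 if it is due now. With the
 * patched '>' comparison, a remainder of exactly 'slack' ms counts as due. */
static unsigned long time_left(unsigned long spent, unsigned long avg_time,
			       unsigned long slack)
{
	if (spent < avg_time && avg_time - spent > slack)
		return avg_time - spent;
	return 0;
}
```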


* Recent changes (master)
@ 2020-11-28 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-28 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit d04162b62c9d7d4f5e8ea70be9cb419abaced160:

  Merge branch 'update-fio-ioops-version' of https://github.com/diameter/fio (2020-11-25 11:01:00 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7914c6147adaf3ef32804519ced850168fff1711:

  Merge branch 'regrow_agg_logs' of https://github.com/pmoust/fio (2020-11-27 08:55:12 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'regrow_agg_logs' of https://github.com/pmoust/fio

Panagiotis Moustafellos (1):
      stat: allow bandwidth log stats to grow to MAX_LOG_ENTRIES

 eta.c   | 1 +
 iolog.h | 1 +
 stat.c  | 8 ++++++++
 3 files changed, 10 insertions(+)

---

Diff of recent changes:

diff --git a/eta.c b/eta.c
index d1c9449f..97843012 100644
--- a/eta.c
+++ b/eta.c
@@ -507,6 +507,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
 				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
+		regrow_agg_logs();
 		for_each_rw_ddir(ddir) {
 			add_agg_sample(sample_val(je->rate[ddir]), ddir, 0, 0);
 		}
diff --git a/iolog.h b/iolog.h
index 981081f9..9e382cc0 100644
--- a/iolog.h
+++ b/iolog.h
@@ -182,6 +182,7 @@ static inline struct io_sample *__get_sample(void *samples, int log_offset,
 struct io_logs *iolog_cur_log(struct io_log *);
 uint64_t iolog_nr_samples(struct io_log *);
 void regrow_logs(struct thread_data *);
+void regrow_agg_logs(void);
 
 static inline struct io_sample *get_sample(struct io_log *iolog,
 					   struct io_logs *cur_log,
diff --git a/stat.c b/stat.c
index eb40bd7f..d42586e7 100644
--- a/stat.c
+++ b/stat.c
@@ -2536,6 +2536,14 @@ void regrow_logs(struct thread_data *td)
 	td->flags &= ~TD_F_REGROW_LOGS;
 }
 
+void regrow_agg_logs(void)
+{
+	enum fio_ddir ddir;
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
+		regrow_log(agg_io_log[ddir]);
+}
+
 static struct io_logs *get_cur_log(struct io_log *iolog)
 {
 	struct io_logs *cur_log;

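The aggregate per-direction logs now get a regrow pass as well. It is called from calc_thread_status() just before the aggregate samples are added. The underlying idea is that a log grows on demand up to a fixed maximum instead of stopping at its initial size. The sketch below is a simplified realloc-based illustration of that idea. The struct, function names, and cap are hypothetical, and fio's own regrow_log() is not reproduced here.

```c
#include <stdlib.h>

#define SKETCH_MAX_LOG_ENTRIES	1024	/* hypothetical cap */

/* Simplified sample log (not fio's struct io_log). */
struct sample_log {
	unsigned long *val;
	size_t nr, cap;
};

/* Append a sample, doubling capacity when full but never past the cap.
 * Returns 0 on success, -1 when the cap is reached or allocation fails. */
static int log_add(struct sample_log *l, unsigned long v)
{
	if (l->nr == l->cap) {
		size_t ncap = l->cap ? l->cap * 2 : 16;
		unsigned long *nv;

		if (ncap > SKETCH_MAX_LOG_ENTRIES)
			ncap = SKETCH_MAX_LOG_ENTRIES;
		if (ncap == l->cap)
			return -1;	/* already at the cap */
		nv = realloc(l->val, ncap * sizeof(*nv));
		if (!nv)
			return -1;
		l->val = nv;
		l->cap = ncap;
	}
	l->val[l->nr++] = v;
	return 0;
}
```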


* Recent changes (master)
@ 2020-11-26 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-26 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9107a641e67f27003fa6cbe7b55b1ec6a0239197:

  error out if ENOSPC during file layout (2020-11-22 09:54:44 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d04162b62c9d7d4f5e8ea70be9cb419abaced160:

  Merge branch 'update-fio-ioops-version' of https://github.com/diameter/fio (2020-11-25 11:01:00 -0700)

----------------------------------------------------------------
Ivan Andreyev (1):
      ioengines: increment FIO_IOOPS_VERSION

Jens Axboe (1):
      Merge branch 'update-fio-ioops-version' of https://github.com/diameter/fio

 ioengines.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/ioengines.h b/ioengines.h
index fbe52fa4..a928b211 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -8,7 +8,7 @@
 #include "io_u.h"
 #include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	26
+#define FIO_IOOPS_VERSION	27
 
 #ifndef CONFIG_DYNAMIC_ENGINES
 #define FIO_STATIC	static

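Bumping FIO_IOOPS_VERSION matters because a loader can refuse an engine compiled against a different ops-structure layout. The sketch below assumes the loader compares the engine's version field with the compiled-in constant. The struct, constant, and function names are hypothetical stand-ins, not fio's ioengine_ops.

```c
#include <stdio.h>

#define SKETCH_IOOPS_VERSION	27	/* mirrors FIO_IOOPS_VERSION after this bump */

/* Hypothetical stand-in for an engine's ops table. */
struct engine_ops {
	const char *name;
	int version;
};

/* Reject an engine built against a different ops layout: 0 = ok, 1 = bad. */
static int check_ops_version(const struct engine_ops *ops)
{
	if (ops->version != SKETCH_IOOPS_VERSION) {
		fprintf(stderr, "%s: bad ioops version %d (want %d)\n",
			ops->name, ops->version, SKETCH_IOOPS_VERSION);
		return 1;
	}
	return 0;
}
```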


* Recent changes (master)
@ 2020-11-23 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-23 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2ee239fb084355e115cf8c2bf8051e8807c4222a:

  Merge branch 'segmented-threads' (2020-11-13 10:06:26 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9107a641e67f27003fa6cbe7b55b1ec6a0239197:

  error out if ENOSPC during file layout (2020-11-22 09:54:44 -0700)

----------------------------------------------------------------
Kushal Kumaran (1):
      error out if ENOSPC during file layout

 filesetup.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index f4360a6f..42c5f630 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -231,13 +231,12 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 						break;
 					log_info("fio: ENOSPC on laying out "
 						 "file, stopping\n");
-					break;
 				}
 				td_verror(td, errno, "write");
 			} else
 				td_verror(td, EIO, "write");
 
-			break;
+			goto err;
 		}
 	}
 

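With this fix, a failed write during file layout jumps to the error path instead of breaking out of the loop and carrying on as if the file were complete. The sketch below shows that control flow using plain POSIX write(). The function name and the negative-errno return convention are hypothetical. It omits the special cases and td_verror() reporting of the real extend_file().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 'total' zero bytes to fd in chunks of 'bs'. Returns 0 on success or a
 * negative errno. Any failed or short write aborts the layout (the analogue
 * of "goto err") rather than leaving a partial file that looks finished. */
static int lay_out_file(int fd, size_t total, size_t bs)
{
	char *buf;
	int ret = 0;

	if (!bs)
		return -EINVAL;
	buf = calloc(1, bs);
	if (!buf)
		return -ENOMEM;

	while (total) {
		size_t n = total < bs ? total : bs;
		ssize_t r = write(fd, buf, n);

		if (r == (ssize_t) n) {
			total -= n;
			continue;
		}
		if (r < 0) {
			ret = -errno;	/* capture errno before any library call */
			if (ret == -ENOSPC)
				fprintf(stderr, "ENOSPC on laying out file, stopping\n");
		} else {
			ret = -EIO;	/* short write: report EIO */
		}
		break;			/* error path: stop laying out */
	}

	free(buf);
	return ret;
}
```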


* Recent changes (master)
@ 2020-11-14 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 84f9318fc16e33633ac9f789dcef7cc58c3b8595:

  t/latency_percentiles.py: tweak terse output parse (2020-11-12 11:26:58 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ee239fb084355e115cf8c2bf8051e8807c4222a:

  Merge branch 'segmented-threads' (2020-11-13 10:06:26 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      Wrap thread_data in thread_segment
      Add thread_segments as needed
      Kill off 'max_jobs'
      Merge branch 'segmented-threads'

 backend.c        |   5 +-
 fio.h            |  27 +++++++++--
 gettime-thread.c |   2 +-
 init.c           | 142 +++++++++++++++++++++++++++++--------------------------
 libfio.c         |   5 ++
 os/os-mac.h      |   6 ---
 os/os.h          |   4 --
 server.c         |   2 +-
 8 files changed, 108 insertions(+), 85 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index f91f3caf..2e6a377c 100644
--- a/backend.c
+++ b/backend.c
@@ -62,8 +62,9 @@ struct io_log *agg_io_log[DDIR_RWDIR_CNT];
 
 int groupid = 0;
 unsigned int thread_number = 0;
+unsigned int nr_segments = 0;
+unsigned int cur_segment = 0;
 unsigned int stat_number = 0;
-int shm_id = 0;
 int temp_stall_ts;
 unsigned long done_secs = 0;
 #ifdef PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
@@ -76,7 +77,7 @@ pthread_mutex_t overlap_check = PTHREAD_MUTEX_INITIALIZER;
 
 static void sig_int(int sig)
 {
-	if (threads) {
+	if (nr_segments) {
 		if (is_backend)
 			fio_server_got_signal(sig);
 		else {
diff --git a/fio.h b/fio.h
index 9d189eb8..fffec001 100644
--- a/fio.h
+++ b/fio.h
@@ -467,6 +467,12 @@ struct thread_data {
 
 };
 
+struct thread_segment {
+	struct thread_data *threads;
+	int shm_id;
+	int nr_threads;
+};
+
 /*
  * when should interactive ETA output be generated
  */
@@ -510,10 +516,15 @@ enum {
 #define __fio_stringify_1(x)	#x
 #define __fio_stringify(x)	__fio_stringify_1(x)
 
+#define REAL_MAX_JOBS		4096
+#define JOBS_PER_SEG		8
+#define REAL_MAX_SEG		(REAL_MAX_JOBS / JOBS_PER_SEG)
+
 extern bool exitall_on_terminate;
 extern unsigned int thread_number;
 extern unsigned int stat_number;
-extern int shm_id;
+extern unsigned int nr_segments;
+extern unsigned int cur_segment;
 extern int groupid;
 extern int output_format;
 extern int append_terse_output;
@@ -542,7 +553,15 @@ extern char *trigger_remote_cmd;
 extern long long trigger_timeout;
 extern char *aux_path;
 
-extern struct thread_data *threads;
+extern struct thread_segment segments[REAL_MAX_SEG];
+
+static inline struct thread_data *tnumber_to_td(unsigned int tnumber)
+{
+	struct thread_segment *seg;
+
+	seg = &segments[tnumber / JOBS_PER_SEG];
+	return &seg->threads[tnumber & (JOBS_PER_SEG - 1)];
+}
 
 static inline bool is_running_backend(void)
 {
@@ -557,8 +576,6 @@ static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 	       !(io_u->ddir == DDIR_TRIM && !td_trim(td)));
 }
 
-#define REAL_MAX_JOBS		4096
-
 static inline bool should_fsync(struct thread_data *td)
 {
 	if (td->last_was_sync)
@@ -709,7 +726,7 @@ extern void lat_target_reset(struct thread_data *);
  * Iterates all threads/processes within all the defined jobs
  */
 #define for_each_td(td, i)	\
-	for ((i) = 0, (td) = &threads[0]; (i) < (int) thread_number; (i)++, (td)++)
+	for ((i) = 0, (td) = &segments[0].threads[0]; (i) < (int) thread_number; (i)++, (td) = tnumber_to_td((i)))
 #define for_each_file(td, f, i)	\
 	if ((td)->files_index)						\
 		for ((i) = 0, (f) = (td)->files[0];			\
diff --git a/gettime-thread.c b/gettime-thread.c
index 953e4e67..86c2e2ef 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -58,7 +58,7 @@ static void *gtod_thread_main(void *data)
 	 * but I'm not sure what to use outside of a simple CPU nop to relax
 	 * it - we don't want to lose precision.
 	 */
-	while (threads) {
+	while (nr_segments) {
 		fio_gtod_update();
 		nop;
 	}
diff --git a/init.c b/init.c
index 7f64ce21..f9c20bdb 100644
--- a/init.c
+++ b/init.c
@@ -45,13 +45,12 @@ const char fio_version_string[] = FIO_VERSION;
 #define FIO_RANDSEED		(0xb1899bedUL)
 
 static char **ini_file;
-static int max_jobs = FIO_MAX_JOBS;
 static bool dump_cmdline;
 static bool parse_only;
 static bool merge_blktrace_only;
 
 static struct thread_data def_thread;
-struct thread_data *threads = NULL;
+struct thread_segment segments[REAL_MAX_SEG];
 static char **job_sections;
 static int nr_job_sections;
 
@@ -301,25 +300,34 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 
 void free_threads_shm(void)
 {
-	if (threads) {
-		void *tp = threads;
+	int i;
+
+	for (i = 0; i < nr_segments; i++) {
+		struct thread_segment *seg = &segments[i];
+
+		if (seg->threads) {
+			void *tp = seg->threads;
 #ifndef CONFIG_NO_SHM
-		struct shmid_ds sbuf;
+			struct shmid_ds sbuf;
 
-		threads = NULL;
-		shmdt(tp);
-		shmctl(shm_id, IPC_RMID, &sbuf);
-		shm_id = -1;
+			seg->threads = NULL;
+			shmdt(tp);
+			shmctl(seg->shm_id, IPC_RMID, &sbuf);
+			seg->shm_id = -1;
 #else
-		threads = NULL;
-		free(tp);
+			seg->threads = NULL;
+			free(tp);
 #endif
+		}
 	}
+
+	nr_segments = 0;
+	cur_segment = 0;
 }
 
 static void free_shm(void)
 {
-	if (threads) {
+	if (nr_segments) {
 		flow_exit();
 		fio_debug_jobp = NULL;
 		fio_warned = NULL;
@@ -337,71 +345,79 @@ static void free_shm(void)
 	scleanup();
 }
 
-/*
- * The thread area is shared between the main process and the job
- * threads/processes. So setup a shared memory segment that will hold
- * all the job info. We use the end of the region for keeping track of
- * open files across jobs, for file sharing.
- */
-static int setup_thread_area(void)
+static int add_thread_segment(void)
 {
+	struct thread_segment *seg = &segments[nr_segments];
+	size_t size = JOBS_PER_SEG * sizeof(struct thread_data);
 	int i;
 
-	if (threads)
-		return 0;
-
-	/*
-	 * 1024 is too much on some machines, scale max_jobs if
-	 * we get a failure that looks like too large a shm segment
-	 */
-	do {
-		size_t size = max_jobs * sizeof(struct thread_data);
+	if (nr_segments + 1 >= REAL_MAX_SEG) {
+		log_err("error: maximum number of jobs reached.\n");
+		return -1;
+	}
 
-		size += 2 * sizeof(unsigned int);
+	size += 2 * sizeof(unsigned int);
 
 #ifndef CONFIG_NO_SHM
-		shm_id = shmget(0, size, IPC_CREAT | 0600);
-		if (shm_id != -1)
-			break;
-		if (errno != EINVAL && errno != ENOMEM && errno != ENOSPC) {
+	seg->shm_id = shmget(0, size, IPC_CREAT | 0600);
+	if (seg->shm_id == -1) {
+		if (errno != EINVAL && errno != ENOMEM && errno != ENOSPC)
 			perror("shmget");
-			break;
-		}
+		return -1;
+	}
 #else
-		threads = malloc(size);
-		if (threads)
-			break;
+	seg->threads = malloc(size);
+	if (!seg->threads)
+		return -1;
 #endif
 
-		max_jobs >>= 1;
-	} while (max_jobs);
-
 #ifndef CONFIG_NO_SHM
-	if (shm_id == -1)
-		return 1;
-
-	threads = shmat(shm_id, NULL, 0);
-	if (threads == (void *) -1) {
+	seg->threads = shmat(seg->shm_id, NULL, 0);
+	if (seg->threads == (void *) -1) {
 		perror("shmat");
 		return 1;
 	}
 	if (shm_attach_to_open_removed())
-		shmctl(shm_id, IPC_RMID, NULL);
+		shmctl(seg->shm_id, IPC_RMID, NULL);
 #endif
 
-	memset(threads, 0, max_jobs * sizeof(struct thread_data));
-	for (i = 0; i < max_jobs; i++)
-		DRD_IGNORE_VAR(threads[i]);
-	fio_debug_jobp = (unsigned int *)(threads + max_jobs);
+	nr_segments++;
+
+	memset(seg->threads, 0, JOBS_PER_SEG * sizeof(struct thread_data));
+	for (i = 0; i < JOBS_PER_SEG; i++)
+		DRD_IGNORE_VAR(seg->threads[i]);
+	seg->nr_threads = 0;
+
+	/* Not first segment, we're done */
+	if (nr_segments != 1) {
+		cur_segment++;
+		return 0;
+	}
+
+	fio_debug_jobp = (unsigned int *)(seg->threads + JOBS_PER_SEG);
 	*fio_debug_jobp = -1;
 	fio_warned = fio_debug_jobp + 1;
 	*fio_warned = 0;
 
 	flow_init();
-
 	return 0;
 }
 
+/*
+ * The thread areas are shared between the main process and the job
+ * threads/processes, and is split into chunks of JOBS_PER_SEG. If the current
+ * segment has no more room, add a new chunk.
+ */
+static int expand_thread_area(void)
+{
+	struct thread_segment *seg = &segments[cur_segment];
+
+	if (nr_segments && seg->nr_threads < JOBS_PER_SEG)
+		return 0;
+
+	return add_thread_segment();
+}
+
 static void dump_print_option(struct print_option *p)
 {
 	const char *delim;
@@ -470,21 +486,19 @@ static void copy_opt_list(struct thread_data *dst, struct thread_data *src)
 static struct thread_data *get_new_job(bool global, struct thread_data *parent,
 				       bool preserve_eo, const char *jobname)
 {
+	struct thread_segment *seg;
 	struct thread_data *td;
 
 	if (global)
 		return &def_thread;
-	if (setup_thread_area()) {
+	if (expand_thread_area()) {
 		log_err("error: failed to setup shm segment\n");
 		return NULL;
 	}
-	if (thread_number >= max_jobs) {
-		log_err("error: maximum number of jobs (%d) reached.\n",
-				max_jobs);
-		return NULL;
-	}
 
-	td = &threads[thread_number++];
+	seg = &segments[cur_segment];
+	td = &seg->threads[seg->nr_threads++];
+	thread_number++;
 	*td = *parent;
 
 	INIT_FLIST_HEAD(&td->opt_list);
@@ -534,7 +548,8 @@ static void put_job(struct thread_data *td)
 	if (td->o.name)
 		free(td->o.name);
 
-	memset(&threads[td->thread_number - 1], 0, sizeof(*td));
+	memset(td, 0, sizeof(*td));
+	segments[cur_segment].nr_threads--;
 	thread_number--;
 }
 
@@ -2722,12 +2737,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			warnings_fatal = 1;
 			break;
 		case 'j':
-			max_jobs = atoi(optarg);
-			if (!max_jobs || max_jobs > REAL_MAX_JOBS) {
-				log_err("fio: invalid max jobs: %d\n", max_jobs);
-				do_exit++;
-				exit_val = 1;
-			}
+			/* we don't track/need this anymore, ignore it */
 			break;
 		case 'S':
 			did_arg = true;
diff --git a/libfio.c b/libfio.c
index 7348b164..6144a474 100644
--- a/libfio.c
+++ b/libfio.c
@@ -156,8 +156,13 @@ void reset_all_stats(struct thread_data *td)
 
 void reset_fio_state(void)
 {
+	int i;
+
 	groupid = 0;
 	thread_number = 0;
+	cur_segment = 0;
+	for (i = 0; i < nr_segments; i++)
+		segments[i].nr_threads = 0;
 	stat_number = 0;
 	done_secs = 0;
 }
diff --git a/os/os-mac.h b/os/os-mac.h
index 2852ac67..683aab32 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -27,12 +27,6 @@
 #define fio_swap32(x)	OSSwapInt32(x)
 #define fio_swap64(x)	OSSwapInt64(x)
 
-/*
- * OSX has a pitifully small shared memory segment by default,
- * so default to a lower number of max jobs supported
- */
-#define FIO_MAX_JOBS		128
-
 #ifndef CONFIG_CLOCKID_T
 typedef unsigned int clockid_t;
 #endif
diff --git a/os/os.h b/os/os.h
index 9a280e54..b46f4164 100644
--- a/os/os.h
+++ b/os/os.h
@@ -172,10 +172,6 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 #endif
 
-#ifndef FIO_MAX_JOBS
-#define FIO_MAX_JOBS		4096
-#endif
-
 #ifndef CONFIG_SOCKLEN_T
 typedef unsigned int socklen_t;
 #endif
diff --git a/server.c b/server.c
index 248a2d44..1b65297e 100644
--- a/server.c
+++ b/server.c
@@ -950,7 +950,7 @@ static int handle_update_job_cmd(struct fio_net_cmd *cmd)
 		return 0;
 	}
 
-	td = &threads[tnumber - 1];
+	td = tnumber_to_td(tnumber);
 	convert_thread_options_to_cpu(&td->o, &pdu->top);
 	send_update_job_reply(cmd->tag, 0);
 	return 0;

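The segmented layout replaces one large region sized for every possible job with chunks of JOBS_PER_SEG jobs, allocated on demand. A job index maps to a segment (index / JOBS_PER_SEG) and a slot within it. The slot is computed as index & (JOBS_PER_SEG - 1), which is valid because JOBS_PER_SEG is a power of two. Jobs are contiguous only within a segment, so for_each_td now recomputes the pointer from the index instead of incrementing it. The sketch below uses calloc() in place of SysV shared memory, much like the CONFIG_NO_SHM branch. All names are hypothetical, and index 0 is the first job.

```c
#include <stdlib.h>

#define JOBS_PER_SEG	8
#define MAX_SEG		(4096 / JOBS_PER_SEG)

_Static_assert((JOBS_PER_SEG & (JOBS_PER_SEG - 1)) == 0,
	       "JOBS_PER_SEG must be a power of two for the slot mask");

struct job { int id; };			/* stand-in for struct thread_data */
struct segment { struct job *jobs; int nr; };

static struct segment segs[MAX_SEG];
static unsigned int nr_segs, nr_jobs;

/* Zero-based job index -> slot: segment idx / N, offset idx & (N - 1). */
static struct job *idx_to_job(unsigned int idx)
{
	return &segs[idx / JOBS_PER_SEG].jobs[idx & (JOBS_PER_SEG - 1)];
}

/* Only compute a pointer for in-range indices; never step past the end. */
#define for_each_job(j, i)						\
	for ((i) = 0; (i) < nr_jobs && ((j) = idx_to_job(i)) != NULL; (i)++)

/* Append a job, adding a zeroed segment only when the last one is full. */
static struct job *new_job(void)
{
	struct segment *seg = nr_segs ? &segs[nr_segs - 1] : NULL;

	if (!seg || seg->nr == JOBS_PER_SEG) {
		if (nr_segs == MAX_SEG)
			return NULL;
		seg = &segs[nr_segs];
		seg->jobs = calloc(JOBS_PER_SEG, sizeof(*seg->jobs));
		if (!seg->jobs)
			return NULL;
		seg->nr = 0;
		nr_segs++;
	}
	nr_jobs++;
	return &seg->jobs[seg->nr++];
}
```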


* Recent changes (master)
@ 2020-11-13 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-13 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e82ec77644f4fb7eccb3441485762c1c1c574b2f:

  t/latency_percentiles.py: correct terse parse (2020-11-09 12:13:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 84f9318fc16e33633ac9f789dcef7cc58c3b8595:

  t/latency_percentiles.py: tweak terse output parse (2020-11-12 11:26:58 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Makefile: fix fio version gen
      Fio 3.24
      t/latency_percentiles.py: tweak terse output parse

 FIO-VERSION-GEN          | 2 +-
 Makefile                 | 8 ++++----
 t/latency_percentiles.py | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 5ee7735c..86ea0c5d 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.23
+DEF_VER=fio-3.24
 
 LF='
 '
diff --git a/Makefile b/Makefile
index 0d3c877e..ecfaa3e0 100644
--- a/Makefile
+++ b/Makefile
@@ -263,6 +263,10 @@ CFLAGS += $$($(1)_CFLAGS)
 endef
 endif
 
+FIO-VERSION-FILE: FORCE
+	@$(SHELL) $(SRCDIR)/FIO-VERSION-GEN
+-include FIO-VERSION-FILE
+
 override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(FIO_CFLAGS) $(CFLAGS)
 
 $(foreach eng,$(ENGINES),$(eval $(call engine_template,$(eng))))
@@ -433,10 +437,6 @@ all: $(PROGS) $(T_TEST_PROGS) $(UT_PROGS) $(SCRIPTS) $(ENGS_OBJS) FORCE
 .PHONY: all install clean test
 .PHONY: FORCE cscope
 
-FIO-VERSION-FILE: FORCE
-	@$(SHELL) $(SRCDIR)/FIO-VERSION-GEN
--include FIO-VERSION-FILE
-
 %.o : %.c
 	@mkdir -p $(dir $@)
 	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index a9aee019..cc437426 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -216,7 +216,7 @@ class FioLatTest():
             file_data = file.read()
 
         #
-        # Read the first few lines and see if any of them begin with '3;fio-'
+        # Read the first few lines and see if any of them begin with '3;'
         # If so, the line is probably terse output. Obviously, this only
         # works for fio terse version 3 and it does not work for
         # multi-line terse output
@@ -224,7 +224,7 @@ class FioLatTest():
         lines = file_data.splitlines()
         for i in range(8):
             file_data = lines[i]
-            if file_data.startswith('3;;latency'):
+            if file_data.startswith('3;'):
                 self.terse_data = file_data.split(';')
                 return True
 

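The terse check now keys only on the leading terse-version field ("3;"). The test below is an equivalent in C. Unlike the Python loop above, which indexes lines[i] for every i < 8, it is bounded by the number of lines actually present. The function name is hypothetical.

```c
#include <string.h>

/* Return the index of the first line, among the first 'max_lines', that
 * looks like fio terse version 3 output (starts with "3;"), or -1. */
static int find_terse_v3(const char *const *lines, int nr_lines, int max_lines)
{
	int n = nr_lines < max_lines ? nr_lines : max_lines;

	for (int i = 0; i < n; i++)
		if (strncmp(lines[i], "3;", 2) == 0)
			return i;
	return -1;
}
```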


* Re: Recent changes (master)
  2020-11-06 13:00 Jens Axboe
@ 2020-11-12 20:51 ` Rebecca Cran
  0 siblings, 0 replies; 1305+ messages in thread
From: Rebecca Cran @ 2020-11-12 20:51 UTC (permalink / raw)
  To: Jens Axboe, fio

On 11/6/20 6:00 AM, Jens Axboe wrote:
> The following changes since commit 38c2f9384db8dbd93f59d965d70ab0d3a53343fa:
>
>    Windows: update dobuild.cmd to run the configure/make (2020-11-04 16:43:14 -0700)

Sorry, I got sidetracked by other work. I'll try and get a new patch 
this weekend.


-- 
Rebecca Cran





* Recent changes (master)
@ 2020-11-10 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-10 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit ea693b6e4501a1385bf62a01f6fb1f3609d31a4a:

  Revert "Windows: update dobuild.cmd to run the configure/make" (2020-11-05 15:33:00 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e82ec77644f4fb7eccb3441485762c1c1c574b2f:

  t/latency_percentiles.py: correct terse parse (2020-11-09 12:13:25 -0700)

----------------------------------------------------------------
Eric Sandeen (3):
      fix dynamic engine build
      fix dynamic engine loading for libaio engine etc
      list all available dynamic ioengines with --enghelp

Jens Axboe (6):
      FIO_EXT_ENG_DIR should be default path
      Remove the "libaio over io_uring" mode
      Makefile: ensure that external libs are linked properly with dynamic engine
      configure: remove libaio-uring remnant
      Make sure we do libaio engine compatability names
      t/latency_percentiles.py: correct terse parse

 Makefile                 | 46 +++++++++++++++++++---------------------------
 configure                | 19 +------------------
 ioengines.c              | 42 +++++++++++++++++++++++++++++++++++++++---
 os/os-linux.h            |  2 +-
 t/latency_percentiles.py |  2 +-
 5 files changed, 61 insertions(+), 50 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 8c8a8fce..0d3c877e 100644
--- a/Makefile
+++ b/Makefile
@@ -66,10 +66,10 @@ ifdef CONFIG_LIBHDFS
 endif
 
 ifdef CONFIG_LIBISCSI
-  iscsi_SRCS = engines/libiscsi.c
-  iscsi_LIBS = $(LIBISCSI_LIBS)
-  iscsi_CFLAGS = $(LIBISCSI_CFLAGS)
-  ENGINES += iscsi
+  libiscsi_SRCS = engines/libiscsi.c
+  libiscsi_LIBS = $(LIBISCSI_LIBS)
+  libiscsi_CFLAGS = $(LIBISCSI_CFLAGS)
+  ENGINES += libiscsi
 endif
 
 ifdef CONFIG_LIBNBD
@@ -85,14 +85,9 @@ else ifdef CONFIG_32BIT
   CPPFLAGS += -DBITS_PER_LONG=32
 endif
 ifdef CONFIG_LIBAIO
-  aio_SRCS = engines/libaio.c
-  aio_LIBS = -laio
-  ifdef CONFIG_LIBAIO_URING
-    aio_LIBS = -luring
-  else
-    aio_LIBS = -laio
-  endif
-  ENGINES += aio
+  libaio_SRCS = engines/libaio.c
+  libaio_LIBS = -laio
+  ENGINES += libaio
 endif
 ifdef CONFIG_RDMA
   rdma_SRCS = engines/rdma.c
@@ -179,17 +174,17 @@ ifdef CONFIG_LINUX_DEVDAX
   ENGINES += dev-dax
 endif
 ifdef CONFIG_LIBPMEM
-  pmem_SRCS = engines/libpmem.c
-  pmem_LIBS = -lpmem
-  ENGINES += pmem
+  libpmem_SRCS = engines/libpmem.c
+  libpmem_LIBS = -lpmem
+  ENGINES += libpmem
 endif
 ifdef CONFIG_IME
   SOURCE += engines/ime.c
 endif
 ifdef CONFIG_LIBZBC
-  zbc_SRCS = engines/libzbc.c
-  zbc_LIBS = -lzbc
-  ENGINES += zbc
+  libzbc_SRCS = engines/libzbc.c
+  libzbc_LIBS = -lzbc
+  ENGINES += libzbc
 endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
@@ -255,8 +250,10 @@ ifdef CONFIG_DYNAMIC_ENGINES
  DYNAMIC_ENGS := $(ENGINES)
 define engine_template =
 $(1)_OBJS := $$($(1)_SRCS:.c=.o)
-$$($(1)_OBJS): FIO_CFLAGS += -fPIC $$($(1)_CFLAGS)
-ENGS_OBJS += engines/lib$(1).so
+$$($(1)_OBJS): CFLAGS := -fPIC $$($(1)_CFLAGS) $(CFLAGS)
+engines/fio-$(1).so: $$($(1)_OBJS)
+	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 -o $$@ $$< $$($(1)_LIBS)
+ENGS_OBJS += engines/fio-$(1).so
 endef
 else # !CONFIG_DYNAMIC_ENGINES
 define engine_template =
@@ -266,6 +263,8 @@ CFLAGS += $$($(1)_CFLAGS)
 endef
 endif
 
+override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(FIO_CFLAGS) $(CFLAGS)
+
 $(foreach eng,$(ENGINES),$(eval $(call engine_template,$(eng))))
 
 OBJS := $(SOURCE:.c=.o)
@@ -438,8 +437,6 @@ FIO-VERSION-FILE: FORCE
 	@$(SHELL) $(SRCDIR)/FIO-VERSION-GEN
 -include FIO-VERSION-FILE
 
-override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(FIO_CFLAGS) $(CFLAGS)
-
 %.o : %.c
 	@mkdir -p $(dir $@)
 	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
@@ -567,11 +564,6 @@ unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit $(LIBS)
 endif
 
-ifdef CONFIG_DYNAMIC_ENGINES
-engines/lib$(1).so: $$($(1)_OBJS)
-	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
-endif
-
 clean: FORCE
 	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] engines/*.so profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -f t/fio-btrace2fio t/io_uring t/read-to-pipe-async
diff --git a/configure b/configure
index 39a9248d..d2ca8934 100755
--- a/configure
+++ b/configure
@@ -168,7 +168,6 @@ disable_native="no"
 march_set="no"
 libiscsi="no"
 libnbd="no"
-libaio_uring="no"
 libzbc=""
 dynamic_engines="no"
 prefix=/usr/local
@@ -237,8 +236,6 @@ for opt do
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
-  --enable-libaio-uring) libaio_uring="yes"
-  ;;
   --dynamic-libengines) dynamic_engines="yes"
   ;;
   --help)
@@ -281,7 +278,6 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc	Disable tcmalloc support"
-  echo "--enable-libaio-uring   Enable libaio emulated over io_uring"
   echo "--dynamic-libengines	Lib-based ioengines as dynamic libraries"
   exit $exit_val
 fi
@@ -653,22 +649,13 @@ int main(void)
   return 0;
 }
 EOF
-  if test "$libaio_uring" = "yes"; then
-    if compile_prog "" "-luring" "libaio io_uring" ; then
-      libaio=yes
-      LIBS="-luring $LIBS"
-    else
-      feature_not_found "libaio io_uring" ""
-    fi
-  elif compile_prog "" "-laio" "libaio" ; then
+  if compile_prog "" "-laio" "libaio" ; then
     libaio=yes
-    libaio_uring=no
   else
     if test "$libaio" = "yes" ; then
       feature_not_found "linux AIO" "libaio-dev or libaio-devel"
     fi
     libaio=no
-    libaio_uring=no
   fi
 
   cat > $TMPC <<EOF
@@ -689,7 +676,6 @@ EOF
 fi
 print_config "Linux AIO support" "$libaio"
 print_config "Linux AIO support rw flags" "$libaio_rw_flags"
-print_config "Linux AIO over io_uring" "$libaio_uring"
 
 ##########################################
 # posix aio probe
@@ -2722,9 +2708,6 @@ if test "$libaio" = "yes" ; then
   if test "$libaio_rw_flags" = "yes" ; then
     output_sym "CONFIG_LIBAIO_RW_FLAGS"
   fi
-  if test "$libaio_uring" = "yes" ; then
-    output_sym "CONFIG_LIBAIO_URING"
-  fi
 fi
 if test "$posix_aio" = "yes" ; then
   output_sym "CONFIG_POSIXAIO"
diff --git a/ioengines.c b/ioengines.c
index 3e43ef2f..5ac512ae 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -15,6 +15,8 @@
 #include <dlfcn.h>
 #include <fcntl.h>
 #include <assert.h>
+#include <sys/types.h>
+#include <dirent.h>
 
 #include "fio.h"
 #include "diskutil.h"
@@ -91,7 +93,7 @@ static void *dlopen_external(struct thread_data *td, const char *engine)
 	char engine_path[PATH_MAX];
 	void *dlhandle;
 
-	sprintf(engine_path, "%s/lib%s.so", FIO_EXT_ENG_DIR, engine);
+	sprintf(engine_path, "%s/fio-%s.so", FIO_EXT_ENG_DIR, engine);
 
 	dlhandle = dlopen(engine_path, RTLD_LAZY);
 	if (!dlhandle)
@@ -110,6 +112,10 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 	struct ioengine_ops *ops;
 	void *dlhandle;
 
+	if (!strncmp(engine_lib, "linuxaio", 8) ||
+	    !strncmp(engine_lib, "aio", 3))
+		engine_lib = "libaio";
+
 	dprint(FD_IO, "dload engine %s\n", engine_lib);
 
 	dlerror();
@@ -158,7 +164,7 @@ static struct ioengine_ops *__load_ioengine(const char *engine)
 	/*
 	 * linux libaio has alias names, so convert to what we want
 	 */
-	if (!strncmp(engine, "linuxaio", 8)) {
+	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3)) {
 		dprint(FD_IO, "converting ioengine name: %s -> libaio\n",
 		       engine);
 		engine = "libaio";
@@ -630,6 +636,34 @@ int td_io_get_file_size(struct thread_data *td, struct fio_file *f)
 	return td->io_ops->get_file_size(td, f);
 }
 
+#ifdef CONFIG_DYNAMIC_ENGINES
+/* Load all dynamic engines in FIO_EXT_ENG_DIR for enghelp command */
+static void
+fio_load_dynamic_engines(struct thread_data *td)
+{
+	DIR *dirhandle = NULL;
+	struct dirent *dirent = NULL;
+	char engine_path[PATH_MAX];
+
+	dirhandle = opendir(FIO_EXT_ENG_DIR);
+	if (!dirhandle)
+		return;
+
+	while ((dirent = readdir(dirhandle)) != NULL) {
+		if (!strcmp(dirent->d_name, ".") ||
+		    !strcmp(dirent->d_name, ".."))
+			continue;
+
+		sprintf(engine_path, "%s/%s", FIO_EXT_ENG_DIR, dirent->d_name);
+		dlopen_ioengine(td, engine_path);
+	}
+
+	closedir(dirhandle);
+}
+#else
+#define fio_load_dynamic_engines(td) do { } while (0)
+#endif
+
 int fio_show_ioengine_help(const char *engine)
 {
 	struct flist_head *entry;
@@ -638,8 +672,11 @@ int fio_show_ioengine_help(const char *engine)
 	char *sep;
 	int ret = 1;
 
+	memset(&td, 0, sizeof(struct thread_data));
+
 	if (!engine || !*engine) {
 		log_info("Available IO engines:\n");
+		fio_load_dynamic_engines(&td);
 		flist_for_each(entry, &engine_list) {
 			io_ops = flist_entry(entry, struct ioengine_ops, list);
 			log_info("\t%s\n", io_ops->name);
@@ -652,7 +689,6 @@ int fio_show_ioengine_help(const char *engine)
 		sep++;
 	}
 
-	memset(&td, 0, sizeof(struct thread_data));
 	td.o.ioengine = (char *)engine;
 	io_ops = load_ioengine(&td);
 
diff --git a/os/os-linux.h b/os/os-linux.h
index 65d3b429..5562b0da 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -58,7 +58,7 @@
 
 #define OS_MAP_ANON		MAP_ANONYMOUS
 
-#define FIO_EXT_ENG_DIR	"/usr/lib/fio"
+#define FIO_EXT_ENG_DIR	"/usr/local/lib/fio"
 
 typedef cpu_set_t os_cpu_mask_t;
 
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index 6ce4579a..a9aee019 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -224,7 +224,7 @@ class FioLatTest():
         lines = file_data.splitlines()
         for i in range(8):
             file_data = lines[i]
-            if file_data.startswith('3;fio-'):
+            if file_data.startswith('3;;latency'):
                 self.terse_data = file_data.split(';')
                 return True
 

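The diff above makes two changes to dynamic engine naming. External engines are now looked up as `fio-<name>.so`, and both `linuxaio` and `aio` are folded into `libaio`. The standalone sketch below combines the two ideas in one place for illustration; fio itself does these steps in separate functions.

It takes the `FIO_EXT_ENG_DIR` value from the os-linux.h hunk. The helpers `canonical_engine_name()` and `build_engine_path()` are hypothetical, not fio functions. Unlike the `sprintf()` in the patch, the sketch uses `snprintf()` and reports truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value taken from the os/os-linux.h hunk above. */
#define FIO_EXT_ENG_DIR "/usr/local/lib/fio"

/*
 * Hypothetical helper: fold the libaio aliases into "libaio", using the
 * same prefix comparisons as the patch ("linuxaio" and "aio").
 */
static const char *canonical_engine_name(const char *engine)
{
	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3))
		return "libaio";
	return engine;
}

/*
 * Hypothetical helper: build "<dir>/fio-<engine>.so" into buf.
 * Returns 0 on success, -1 if the path would not fit in len bytes.
 */
static int build_engine_path(char *buf, size_t len, const char *engine)
{
	int n;

	assert(buf != NULL && engine != NULL);
	n = snprintf(buf, len, "%s/fio-%s.so", FIO_EXT_ENG_DIR,
		     canonical_engine_name(engine));
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For example, asking for `aio` produces `/usr/local/lib/fio/fio-libaio.so`. Because the comparison is a prefix match, any engine name that merely begins with `aio` is folded into `libaio` as well.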


* Recent changes (master)
@ 2020-11-06 13:00 Jens Axboe
  2020-11-12 20:51 ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2020-11-06 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 38c2f9384db8dbd93f59d965d70ab0d3a53343fa:

  Windows: update dobuild.cmd to run the configure/make (2020-11-04 16:43:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ea693b6e4501a1385bf62a01f6fb1f3609d31a4a:

  Revert "Windows: update dobuild.cmd to run the configure/make" (2020-11-05 15:33:00 -0700)

----------------------------------------------------------------
Albert Chang (2):
      Makefile: fix usage of JAVA_HOME environmental variable
      engines/hdfs: swap fio_hdfsio_init and fio_hdfsio_io_u_init

Jens Axboe (3):
      Merge branch 'patch-1' of https://github.com/gloit042/fio
      Merge branch 'master' of https://github.com/albert-chang0/fio
      Revert "Windows: update dobuild.cmd to run the configure/make"

gloit042 (1):
      goptions: correct postfix

 Makefile               |  2 +-
 engines/libhdfs.c      |  4 ++--
 goptions.c             |  2 +-
 os/windows/_domake.sh  | 17 -----------------
 os/windows/dobuild.cmd | 23 ++++++++++-------------
 5 files changed, 14 insertions(+), 34 deletions(-)
 delete mode 100755 os/windows/_domake.sh

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 48788f24..8c8a8fce 100644
--- a/Makefile
+++ b/Makefile
@@ -60,7 +60,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
-  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server $(FIO_LIBHDFS_LIB)/libhdfs.a -ljvm
+  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/lib/$(FIO_HDFS_CPU)/server $(FIO_LIBHDFS_LIB)/libhdfs.a -ljvm
   FIO_CFLAGS += $(HDFSFLAGS)
   SOURCE += engines/libhdfs.c
 endif
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index 9ca82f78..eb55c3c5 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -240,7 +240,7 @@ int fio_hdfsio_close_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static int fio_hdfsio_init(struct thread_data *td)
+static int fio_hdfsio_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
 	struct hdfsio_options *options = td->eo;
 	struct hdfsio_data *hd = td->io_ops_data;
@@ -349,7 +349,7 @@ static int fio_hdfsio_setup(struct thread_data *td)
 	return 0;
 }
 
-static int fio_hdfsio_io_u_init(struct thread_data *td, struct io_u *io_u)
+static int fio_hdfsio_init(struct thread_data *td)
 {
 	struct hdfsio_data *hd = td->io_ops_data;
 	struct hdfsio_options *options = td->eo;
diff --git a/goptions.c b/goptions.c
index f44254bf..0b8c56a2 100644
--- a/goptions.c
+++ b/goptions.c
@@ -826,7 +826,7 @@ static struct gopt *gopt_new_str_val(struct gopt_job_view *gjv,
 				     unsigned long long *p, unsigned int idx)
 {
 	struct gopt_str_val *g;
-	const gchar *postfix[] = { "B", "KiB", "MiB", "GiB", "PiB", "PiB", "" };
+	const gchar *postfix[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
 	GtkWidget *label;
 	int i;
 
diff --git a/os/windows/_domake.sh b/os/windows/_domake.sh
deleted file mode 100755
index 05625ff4..00000000
--- a/os/windows/_domake.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-cd "$2"
-cd ../..
-if [ -e "fio.exe" ]; then
-  make clean
-fi
-
-if [ "$1" = "x86" ]; then
-  ./configure --disable-native --build-32bit-win
-else
-  ./configure --disable-native
-fi
-
-make -j
diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index ea79dd03..08df3e87 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -1,18 +1,5 @@
 @echo off
 setlocal enabledelayedexpansion
-
-if "%1"=="x86" set FIO_ARCH=x86
-if "%1"=="x64" set FIO_ARCH=x64
-
-if not defined FIO_ARCH (
-  echo Error: must specify the architecture.
-  echo Usage: dobuild x86
-  echo Usage: dobuild x64
-  goto end
-)
-
-C:\Cygwin64\bin\bash -l %cd%/_domake.sh %FIO_ARCH% %cd%
-
 set /a counter=1
 for /f "tokens=3" %%i in (..\..\FIO-VERSION-FILE) do (
  if "!counter!"=="1" set FIO_VERSION=%%i
@@ -29,6 +16,16 @@ if not defined FIO_VERSION_NUMBERS (
   goto end
 )
 
+if "%1"=="x86" set FIO_ARCH=x86
+if "%1"=="x64" set FIO_ARCH=x64
+
+if not defined FIO_ARCH (
+  echo Error: must specify the architecture.
+  echo Usage: dobuild x86
+  echo Usage: dobuild x64
+  goto end
+)
+
 if defined SIGN_FIO (
   signtool sign /n "%SIGNING_CN%" /t http://timestamp.digicert.com ..\..\fio.exe
   signtool sign /as /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 ..\..\fio.exe

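The goptions.c fix replaces a table that listed "PiB" twice and ended in an empty string. After the fix there is one postfix per power of 1024, from B up to EiB.

The sketch below shows why the order matters when an index is derived by repeated division by 1024. `scale_to_postfix()` is a hypothetical helper, not the code in `gopt_new_str_val()`. It only scales values that are exact multiples of 1024, so the displayed number stays exact.

```c
#include <assert.h>
#include <stddef.h>

/* The corrected table from the goptions.c hunk: one entry per power of 1024. */
static const char *postfix[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };

#define NR_POSTFIX (sizeof(postfix) / sizeof(postfix[0]))

/*
 * Hypothetical helper: divide *val by 1024 while it is a non-zero exact
 * multiple of 1024 and a larger unit exists. Returns the postfix index.
 */
static unsigned int scale_to_postfix(unsigned long long *val)
{
	unsigned int i = 0;

	assert(val != NULL);
	while (*val && (*val % 1024) == 0 && i + 1 < NR_POSTFIX) {
		*val /= 1024;
		i++;
	}
	return i;
}
```

With an index computed this way, the old table would have labeled index 4 (an exact TiB multiple) as "PiB" and index 6 with an empty string.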


* Recent changes (master)
@ 2020-11-05 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-05 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 08a3f6fef1fe6c9fcd18d5ed40ca81097922bb14:

  fio: fix dynamic engines soname definition (2020-11-01 07:16:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 38c2f9384db8dbd93f59d965d70ab0d3a53343fa:

  Windows: update dobuild.cmd to run the configure/make (2020-11-04 16:43:14 -0700)

----------------------------------------------------------------
Rebecca Cran (1):
      Windows: update dobuild.cmd to run the configure/make

 os/windows/_domake.sh  | 17 +++++++++++++++++
 os/windows/dobuild.cmd | 23 +++++++++++++----------
 2 files changed, 30 insertions(+), 10 deletions(-)
 create mode 100755 os/windows/_domake.sh

---

Diff of recent changes:

diff --git a/os/windows/_domake.sh b/os/windows/_domake.sh
new file mode 100755
index 00000000..05625ff4
--- /dev/null
+++ b/os/windows/_domake.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+
+set -e
+
+cd "$2"
+cd ../..
+if [ -e "fio.exe" ]; then
+  make clean
+fi
+
+if [ "$1" = "x86" ]; then
+  ./configure --disable-native --build-32bit-win
+else
+  ./configure --disable-native
+fi
+
+make -j
diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index 08df3e87..ea79dd03 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -1,5 +1,18 @@
 @echo off
 setlocal enabledelayedexpansion
+
+if "%1"=="x86" set FIO_ARCH=x86
+if "%1"=="x64" set FIO_ARCH=x64
+
+if not defined FIO_ARCH (
+  echo Error: must specify the architecture.
+  echo Usage: dobuild x86
+  echo Usage: dobuild x64
+  goto end
+)
+
+C:\Cygwin64\bin\bash -l %cd%/_domake.sh %FIO_ARCH% %cd%
+
 set /a counter=1
 for /f "tokens=3" %%i in (..\..\FIO-VERSION-FILE) do (
  if "!counter!"=="1" set FIO_VERSION=%%i
@@ -16,16 +29,6 @@ if not defined FIO_VERSION_NUMBERS (
   goto end
 )
 
-if "%1"=="x86" set FIO_ARCH=x86
-if "%1"=="x64" set FIO_ARCH=x64
-
-if not defined FIO_ARCH (
-  echo Error: must specify the architecture.
-  echo Usage: dobuild x86
-  echo Usage: dobuild x64
-  goto end
-)
-
 if defined SIGN_FIO (
   signtool sign /n "%SIGNING_CN%" /t http://timestamp.digicert.com ..\..\fio.exe
   signtool sign /as /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 ..\..\fio.exe



* Recent changes (master)
@ 2020-11-02 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-11-02 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8c17a6248bd227a9f3cc3da52bd1cb922dc6cf81:

  t/zbd: Fix test target size of test case #48 (2020-10-30 07:24:59 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 08a3f6fef1fe6c9fcd18d5ed40ca81097922bb14:

  fio: fix dynamic engines soname definition (2020-11-01 07:16:16 -0700)

----------------------------------------------------------------
Yigal Korman (1):
      fio: fix dynamic engines soname definition

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5ed8a808..48788f24 100644
--- a/Makefile
+++ b/Makefile
@@ -569,7 +569,7 @@ endif
 
 ifdef CONFIG_DYNAMIC_ENGINES
 engines/lib$(1).so: $$($(1)_OBJS)
-	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,lib$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
+	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,fio-$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
 endif
 
 clean: FORCE



* Recent changes (master)
@ 2020-10-31 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-10-31 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a0b72421064b5dd7312812509e9babe923063deb:

  Merge branch 'o_dsync' of https://github.com/anarazel/fio (2020-10-28 16:20:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8c17a6248bd227a9f3cc3da52bd1cb922dc6cf81:

  t/zbd: Fix test target size of test case #48 (2020-10-30 07:24:59 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      zbd: Avoid excessive zone resets
      t/zbd: Fix test target size of test case #48

 t/zbd/test-zbd-support | 2 +-
 zbd.c                  | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 248423bb..acde3b3a 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -913,7 +913,7 @@ test48() {
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
 	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
-	opts+=("--group_reporting=1")
+	opts+=("--size=$size" "--group_reporting=1")
 	# max_open_zones is already specified
 	opts+=($(job_var_opts_exclude "--max_open_zones"))
     done
diff --git a/zbd.c b/zbd.c
index 905c0c2b..9327816a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -718,6 +718,9 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 	uint64_t length = (z+1)->start - offset;
 	int ret = 0;
 
+	if (z->wp == z->start)
+		return 0;
+
 	assert(is_valid_offset(f, offset + length - 1));
 
 	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,

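The zbd.c hunk makes `zbd_reset_zone()` return early when a zone's write pointer is still at the zone start. Such a zone is already empty, so resetting it again is wasted work.

Below is a minimal sketch of that guard. `struct zone`, `reset_zone()` and the `resets_issued` counter are hypothetical stand-ins. fio's own zone structure has more fields, and the real function issues the reset to the device where this sketch only bumps a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified zone descriptor. */
struct zone {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* current write pointer */
};

/* Number of resets actually issued; stands in for device commands. */
static unsigned int resets_issued;

/*
 * Sketch of the early return added to zbd_reset_zone(): an empty zone
 * (wp == start) is left alone, so no reset is issued for it.
 */
static int reset_zone(struct zone *z)
{
	if (z->wp == z->start)
		return 0;

	assert(z->wp > z->start);
	resets_issued++;	/* fio would send the real reset here */
	z->wp = z->start;
	return 0;
}
```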


* Recent changes (master)
@ 2020-10-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-10-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8bfe330eb42739d503d35d0b7d96f98c5c544204:

  Disallow offload IO mode for engines marked with FIO_NO_OFFLOAD (2020-10-14 20:11:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a0b72421064b5dd7312812509e9babe923063deb:

  Merge branch 'o_dsync' of https://github.com/anarazel/fio (2020-10-28 16:20:43 -0600)

----------------------------------------------------------------
Andres Freund (1):
      extend --sync to allow {sync,dsync,0,1}, to support O_DSYNC

Jens Axboe (1):
      Merge branch 'o_dsync' of https://github.com/anarazel/fio

 HOWTO               | 24 +++++++++++++++++++++---
 engines/glusterfs.c |  3 +--
 engines/ime.c       |  3 +--
 engines/libpmem.c   |  1 +
 filesetup.c         |  3 +--
 fio.1               | 28 +++++++++++++++++++++++++---
 options.c           | 26 ++++++++++++++++++++++----
 7 files changed, 72 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2d8c7a02..386fd12a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1677,10 +1677,28 @@ Buffers and memory
 	This will be ignored if :option:`pre_read` is also specified for the
 	same job.
 
-.. option:: sync=bool
+.. option:: sync=str
+
+	Whether, and what type, of synchronous I/O to use for writes.  The allowed
+	values are:
+
+		**none**
+			Do not use synchronous IO, the default.
+
+		**0**
+			Same as **none**.
+
+		**sync**
+			Use synchronous file IO. For the majority of I/O engines,
+			this means using O_SYNC.
+
+		**1**
+			Same as **sync**.
+
+		**dsync**
+			Use synchronous data IO. For the majority of I/O engines,
+			this means using O_DSYNC.
 
-	Use synchronous I/O for buffered writes. For the majority of I/O engines,
-	this means using O_SYNC. Default: false.
 
 .. option:: iomem=str, mem=str
 
diff --git a/engines/glusterfs.c b/engines/glusterfs.c
index f2b84a2a..fc6fee19 100644
--- a/engines/glusterfs.c
+++ b/engines/glusterfs.c
@@ -271,8 +271,7 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 
 	if (td->o.odirect)
 		flags |= OS_O_DIRECT;
-	if (td->o.sync_io)
-		flags |= O_SYNC;
+	flags |= td->o.sync_io;
 
 	dprint(FD_FILE, "fio file %s open mode %s td rw %s\n", f->file_name,
 	       flags & O_RDONLY ? "ro" : "rw", td_read(td) ? "read" : "write");
diff --git a/engines/ime.c b/engines/ime.c
index 42984021..440cc29e 100644
--- a/engines/ime.c
+++ b/engines/ime.c
@@ -194,8 +194,7 @@ static int fio_ime_open_file(struct thread_data *td, struct fio_file *f)
 	}
 	if (td->o.odirect)
 		flags |= O_DIRECT;
-	if (td->o.sync_io)
-		flags |= O_SYNC;
+	flags |= td->o.sync_io;
 	if (td->o.create_on_open && td->o.allow_create)
 		flags |= O_CREAT;
 
diff --git a/engines/libpmem.c b/engines/libpmem.c
index a9b3e29b..eefb7767 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -193,6 +193,7 @@ static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 
 	dprint(FD_IO, "DEBUG fio_libpmem_queue\n");
 	dprint(FD_IO,"td->o.odirect %d td->o.sync_io %d \n",td->o.odirect, td->o.sync_io);
+	/* map both O_SYNC / DSYNC to not using NODRAIN */
 	flags = td->o.sync_io ? 0 : PMEM_F_MEM_NODRAIN;
 	flags |= td->o.odirect ? PMEM_F_MEM_NONTEMPORAL : PMEM_F_MEM_TEMPORAL;
 
diff --git a/filesetup.c b/filesetup.c
index e44f31c7..f4360a6f 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -655,8 +655,7 @@ int generic_open_file(struct thread_data *td, struct fio_file *f)
 		}
 		flags |= OS_O_DIRECT | FIO_O_ATOMIC;
 	}
-	if (td->o.sync_io)
-		flags |= O_SYNC;
+	flags |= td->o.sync_io;
 	if (td->o.create_on_open && td->o.allow_create)
 		flags |= O_CREAT;
 skip_flags:
diff --git a/fio.1 b/fio.1
index a881277c..48119325 100644
--- a/fio.1
+++ b/fio.1
@@ -1462,9 +1462,31 @@ starting I/O if the platform and file type support it. Defaults to true.
 This will be ignored if \fBpre_read\fR is also specified for the
 same job.
 .TP
-.BI sync \fR=\fPbool
-Use synchronous I/O for buffered writes. For the majority of I/O engines,
-this means using O_SYNC. Default: false.
+.BI sync \fR=\fPstr
+Whether, and what type, of synchronous I/O to use for writes.  The allowed
+values are:
+.RS
+.RS
+.TP
+.B none
+Do not use synchronous IO, the default.
+.TP
+.B 0
+Same as \fBnone\fR.
+.TP
+.B sync
+Use synchronous file IO. For the majority of I/O engines,
+this means using O_SYNC.
+.TP
+.B 1
+Same as \fBsync\fR.
+.TP
+.B dsync
+Use synchronous data IO. For the majority of I/O engines,
+this means using O_DSYNC.
+.PD
+.RE
+.RE
 .TP
 .BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
 Fio can use various types of memory as the I/O unit buffer. The allowed
diff --git a/options.c b/options.c
index b497d973..1e91b3e9 100644
--- a/options.c
+++ b/options.c
@@ -3733,14 +3733,32 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "sync",
 		.lname	= "Synchronous I/O",
-		.type	= FIO_OPT_BOOL,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, sync_io),
-		.help	= "Use O_SYNC for buffered writes",
-		.def	= "0",
-		.parent = "buffered",
+		.help	= "Use synchronous write IO",
+		.def	= "none",
 		.hide	= 1,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_TYPE,
+		.posval = {
+			  { .ival = "none",
+			    .oval = 0,
+			  },
+			  { .ival = "0",
+			    .oval = 0,
+			  },
+			  { .ival = "sync",
+			    .oval = O_SYNC,
+			  },
+			  { .ival = "1",
+			    .oval = O_SYNC,
+			  },
+#ifdef O_DSYNC
+			  { .ival = "dsync",
+			    .oval = O_DSYNC,
+			  },
+#endif
+		},
 	},
 #ifdef FIO_HAVE_WRITE_HINT
 	{

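Turning `sync` from a boolean into a string option lets each accepted value map directly to an open flag (0, `O_SYNC` or `O_DSYNC`). That is why the engines can now simply do `flags |= td->o.sync_io`.

The sketch below shows the lookup in isolation. `struct sync_val` and `parse_sync()` are hypothetical; fio's option parser walks the `.posval` table itself. The sketch also assumes exact, case-sensitive matching.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Hypothetical mirror of the .posval entries for the "sync" option. */
struct sync_val {
	const char *ival;	/* string accepted on the command line */
	int oval;		/* open(2) flag OR-ed into the open flags */
};

static const struct sync_val sync_vals[] = {
	{ "none", 0 },
	{ "0", 0 },
	{ "sync", O_SYNC },
	{ "1", O_SYNC },
#ifdef O_DSYNC
	{ "dsync", O_DSYNC },
#endif
};

/* Stores the matching flag in *flag and returns 0, or returns -1 if unknown. */
static int parse_sync(const char *str, int *flag)
{
	size_t i;

	assert(str != NULL && flag != NULL);
	for (i = 0; i < sizeof(sync_vals) / sizeof(sync_vals[0]); i++) {
		if (!strcmp(str, sync_vals[i].ival)) {
			*flag = sync_vals[i].oval;
			return 0;
		}
	}
	return -1;
}
```

A caller would then open the file with `O_WRONLY | flag`, mirroring `flags |= td->o.sync_io` in `generic_open_file()`.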


* Recent changes (master)
@ 2020-10-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-10-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8cf503ce5956e4c287e13d5c7761a03fbb4b54cc:

  Merge branch 'patch-1' of https://github.com/chengli-rutgers/fio into master (2020-10-13 13:07:37 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8bfe330eb42739d503d35d0b7d96f98c5c544204:

  Disallow offload IO mode for engines marked with FIO_NO_OFFLOAD (2020-10-14 20:11:56 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Disallow offload IO mode for engines marked with FIO_NO_OFFLOAD

 engines/io_uring.c | 2 +-
 ioengines.c        | 2 +-
 ioengines.h        | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 69f48859..b997c8d8 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -806,7 +806,7 @@ static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 static struct ioengine_ops ioengine = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_ASYNCIO_SYNC_TRIM,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
diff --git a/ioengines.c b/ioengines.c
index d3be8026..3e43ef2f 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -45,7 +45,7 @@ static bool check_engine_ops(struct thread_data *td, struct ioengine_ops *ops)
 	 * async engines aren't reliable with offload
 	 */
 	if ((td->o.io_submit_mode == IO_MODE_OFFLOAD) &&
-	    !(ops->flags & FIO_FAKEIO)) {
+	    (ops->flags & FIO_NO_OFFLOAD)) {
 		log_err("%s: can't be used with offloaded submit. Use a sync "
 			"engine\n", ops->name);
 		return true;
diff --git a/ioengines.h b/ioengines.h
index 54dadba2..fbe52fa4 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -77,7 +77,8 @@ enum fio_ioengine_flags {
 	FIO_NOSTATS	= 1 << 12,	/* don't do IO stats */
 	FIO_NOFILEHASH	= 1 << 13,	/* doesn't hash the files for lookup later. */
 	FIO_ASYNCIO_SYNC_TRIM
-			= 1 << 14	/* io engine has async ->queue except for trim */
+			= 1 << 14,	/* io engine has async ->queue except for trim */
+	FIO_NO_OFFLOAD	= 1 << 15,	/* no async offload */
 };
 
 /*

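This change inverts the offload check in `check_engine_ops()`. Before, offloaded submission was refused for every engine not flagged `FIO_FAKEIO`. Now it is refused only for engines that opt out with `FIO_NO_OFFLOAD`, which in this diff is io_uring.

The sketch below reproduces the new condition. The flag values are copied from the ioengines.h hunk. `struct engine`, `enum submit_mode` and `offload_conflict()` are hypothetical, trimmed-down stand-ins for fio's own types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit values copied from the ioengines.h hunk above. */
enum engine_flags {
	FIO_ASYNCIO_SYNC_TRIM	= 1 << 14,
	FIO_NO_OFFLOAD		= 1 << 15,
};

/* Hypothetical stand-ins for fio's submit mode and engine ops. */
enum submit_mode { MODE_INLINE, MODE_OFFLOAD };

struct engine {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true (an error) only when offloaded submission is requested and
 * the engine carries FIO_NO_OFFLOAD, matching the new condition.
 */
static bool offload_conflict(enum submit_mode mode, const struct engine *e)
{
	assert(e != NULL);
	if (mode == MODE_OFFLOAD && (e->flags & FIO_NO_OFFLOAD)) {
		fprintf(stderr, "%s: can't be used with offloaded submit. "
			"Use a sync engine\n", e->name);
		return true;
	}
	return false;
}
```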


* Recent changes (master)
@ 2020-10-14 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-10-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 60c52212520b905a1740d3c8815c34cc48471c5c:

  Merge branch 'master' of https://github.com/bvanassche/fio into master (2020-10-10 20:30:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8cf503ce5956e4c287e13d5c7761a03fbb4b54cc:

  Merge branch 'patch-1' of https://github.com/chengli-rutgers/fio into master (2020-10-13 13:07:37 -0600)

----------------------------------------------------------------
Cheng Li (1):
      getopt_long: avoid variable global initialization

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/chengli-rutgers/fio into master

 oslib/getopt_long.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/oslib/getopt_long.c b/oslib/getopt_long.c
index 8ec77413..463919fb 100644
--- a/oslib/getopt_long.c
+++ b/oslib/getopt_long.c
@@ -16,8 +16,8 @@
 
 #include "getopt.h"
 
-char *optarg = NULL;
-int optind = 0, opterr = 0, optopt = 0;
+char *optarg;
+int optind, opterr, optopt;
 
 static struct getopt_private_state {
 	const char *optptr;

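Dropping the explicit `= NULL` and `= 0` initializers does not change the starting values of these globals. In C, an object with static storage duration and no initializer starts out as a null pointer (pointer types) or zero (arithmetic types).

The sketch below checks this with hypothetical `demo_*` names rather than redefining `optarg` and friends.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope objects with no initializer: zero/NULL-initialized by C. */
char *demo_optarg;
int demo_optind, demo_opterr, demo_optopt;

/* Returns 1 if every demo global still holds its implicit initial value. */
static int demo_globals_are_zero(void)
{
	return demo_optarg == NULL && demo_optind == 0 &&
	       demo_opterr == 0 && demo_optopt == 0;
}
```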


* Recent changes (master)
@ 2020-10-11 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-10-11 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 729071019ee23804d52bed47357e4324be6b06bf:

  Merge branch 'master' of https://github.com/chengli-rutgers/fio into master (2020-10-09 07:28:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 60c52212520b905a1740d3c8815c34cc48471c5c:

  Merge branch 'master' of https://github.com/bvanassche/fio into master (2020-10-10 20:30:52 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      simplify MB/s, IOPS interactive printout code

Bart Van Assche (9):
      configure: Remove the CLOCK_MONOTONIC_PRECISE probe
      gettime: Introduce fio_get_mono_time()
      gettime: Simplify get_cycles_per_msec()
      gettime: Introduce rel_time_since()
      helper_thread: Introduce a function for blocking Unix signals
      helper_thread: Introduce the wait_for_action() function
      Change the return value of two functions from 'void' into 'int'
      helper_thread: Rework the interval timer code
      helper_thread: Increase timer accuracy

Jens Axboe (3):
      flow: avoid holes in struct fio_flow
      flow: use unsigned long for the counter
      Merge branch 'master' of https://github.com/bvanassche/fio into master

 configure          |  61 ++++--------
 engines/posixaio.c |  41 +-------
 eta.c              |  70 +++++--------
 fio_time.h         |   2 +
 flow.c             |   6 +-
 gettime.c          |  76 ++++++++------
 gettime.h          |   1 +
 helper_thread.c    | 285 ++++++++++++++++++++++++++++++++++++-----------------
 stat.c             |   4 +-
 stat.h             |   2 +-
 steadystate.c      |   3 +-
 steadystate.h      |   2 +-
 12 files changed, 306 insertions(+), 247 deletions(-)
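
The `rel_time_since()` helper added to gettime.c in this series returns a signed difference in milliseconds. As its comment notes, rounding is asymmetric: +1 ns gives 0 and -1 ns gives -1. That is floor division. It follows from borrowing one second whenever the nanosecond difference is negative, which keeps the remainder in [0, 1e9).

The standalone copy below uses the same logic as the gettime.c hunk so the rounding can be checked. It is renamed `rel_ms_since()` (a hypothetical name) and uses signed constants.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Returns *e - *s in milliseconds. Borrowing a second when the nanosecond
 * difference is negative keeps nsec in [0, 1e9), so the final division
 * rounds toward negative infinity.
 */
static int64_t rel_ms_since(const struct timespec *s, const struct timespec *e)
{
	int64_t sec = (int64_t)e->tv_sec - s->tv_sec;
	int64_t nsec = (int64_t)e->tv_nsec - s->tv_nsec;

	if (nsec < 0) {
		sec--;
		nsec += NSEC_PER_SEC;
	}
	assert(0 <= nsec && nsec < NSEC_PER_SEC);

	return sec * 1000 + nsec / (1000 * 1000);
}
```

For instance, a start of 1.5 ms and an end of 0 ms yields -2, not -1, because of the floor rounding.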

---

Diff of recent changes:

diff --git a/configure b/configure
index 12b7cb58..39a9248d 100755
--- a/configure
+++ b/configure
@@ -1104,46 +1104,6 @@ EOF
 fi
 print_config "CLOCK_MONOTONIC" "$clock_monotonic"
 
-##########################################
-# CLOCK_MONOTONIC_RAW probe
-if test "$clock_monotonic_raw" != "yes" ; then
-  clock_monotonic_raw="no"
-fi
-if test "$clock_gettime" = "yes" ; then
-  cat > $TMPC << EOF
-#include <stdio.h>
-#include <time.h>
-int main(int argc, char **argv)
-{
-  return clock_gettime(CLOCK_MONOTONIC_RAW, NULL);
-}
-EOF
-  if compile_prog "" "$LIBS" "clock monotonic"; then
-      clock_monotonic_raw="yes"
-  fi
-fi
-print_config "CLOCK_MONOTONIC_RAW" "$clock_monotonic_raw"
-
-##########################################
-# CLOCK_MONOTONIC_PRECISE probe
-if test "$clock_monotonic_precise" != "yes" ; then
-  clock_monotonic_precise="no"
-fi
-if test "$clock_gettime" = "yes" ; then
-  cat > $TMPC << EOF
-#include <stdio.h>
-#include <time.h>
-int main(int argc, char **argv)
-{
-  return clock_gettime(CLOCK_MONOTONIC_PRECISE, NULL);
-}
-EOF
-  if compile_prog "" "$LIBS" "clock monotonic precise"; then
-      clock_monotonic_precise="yes"
-  fi
-fi
-print_config "CLOCK_MONOTONIC_PRECISE" "$clock_monotonic_precise"
-
 ##########################################
 # clockid_t probe
 if test "$clockid_t" != "yes" ; then
@@ -2722,6 +2682,24 @@ else
   pdb=no
 fi
 print_config "Windows PDB generation" "$pdb"
+
+##########################################
+# check for timerfd support
+timerfd_create="no"
+cat > $TMPC << EOF
+#include <sys/time.h>
+#include <sys/timerfd.h>
+
+int main(int argc, char **argv)
+{
+	return timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
+}
+EOF
+if compile_prog "" "" "timerfd_create"; then
+  timerfd_create="yes"
+fi
+print_config "timerfd_create" "$timerfd_create"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -3023,6 +3001,9 @@ fi
 if test "$statx_syscall" = "yes"; then
   output_sym "CONFIG_HAVE_STATX_SYSCALL"
 fi
+if test "$timerfd_create" = "yes"; then
+  output_sym "CONFIG_HAVE_TIMERFD_CREATE"
+fi
 if test "$fallthrough" = "yes"; then
   CFLAGS="$CFLAGS -Wimplicit-fallthrough"
 fi
diff --git a/engines/posixaio.c b/engines/posixaio.c
index 82c6aa65..135d088c 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -17,47 +17,14 @@ struct posixaio_data {
 	unsigned int queued;
 };
 
-static int fill_timespec(struct timespec *ts)
+static unsigned long long ts_utime_since_now(const struct timespec *start)
 {
-#ifdef CONFIG_CLOCK_GETTIME
-#ifdef CONFIG_CLOCK_MONOTONIC
-	clockid_t clk = CLOCK_MONOTONIC;
-#else
-	clockid_t clk = CLOCK_REALTIME;
-#endif
-	if (!clock_gettime(clk, ts))
-		return 0;
-
-	perror("clock_gettime");
-	return 1;
-#else
-	struct timeval tv;
-
-	gettimeofday(&tv, NULL);
-	ts->tv_sec = tv.tv_sec;
-	ts->tv_nsec = tv.tv_usec * 1000;
-	return 0;
-#endif
-}
-
-static unsigned long long ts_utime_since_now(struct timespec *t)
-{
-	long long sec, nsec;
 	struct timespec now;
 
-	if (fill_timespec(&now))
+	if (fio_get_mono_time(&now) < 0)
 		return 0;
-	
-	sec = now.tv_sec - t->tv_sec;
-	nsec = now.tv_nsec - t->tv_nsec;
-	if (sec > 0 && nsec < 0) {
-		sec--;
-		nsec += 1000000000;
-	}
 
-	sec *= 1000000;
-	nsec /= 1000;
-	return sec + nsec;
+	return utime_since(start, &now);
 }
 
 static int fio_posixaio_cancel(struct thread_data fio_unused *td,
@@ -102,7 +69,7 @@ static int fio_posixaio_getevents(struct thread_data *td, unsigned int min,
 	unsigned int r;
 	int i;
 
-	if (t && !fill_timespec(&start))
+	if (t && fio_get_mono_time(&start) == 0)
 		have_timeout = 1;
 	else
 		memset(&start, 0, sizeof(start));
diff --git a/eta.c b/eta.c
index e8c72780..d1c9449f 100644
--- a/eta.c
+++ b/eta.c
@@ -534,56 +534,38 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 static int gen_eta_str(struct jobs_eta *je, char *p, size_t left,
 		       char **rate_str, char **iops_str)
 {
-	bool has_r = je->rate[DDIR_READ] || je->iops[DDIR_READ];
-	bool has_w = je->rate[DDIR_WRITE] || je->iops[DDIR_WRITE];
-	bool has_t = je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM];
+	static const char c[DDIR_RWDIR_CNT] = {'r', 'w', 't'};
+	bool has[DDIR_RWDIR_CNT];
+	bool has_any = false;
+	const char *sep;
 	int l = 0;
 
-	if (!has_r && !has_w && !has_t)
+	for_each_rw_ddir(ddir) {
+		has[ddir] = (je->rate[ddir] || je->iops[ddir]);
+		has_any |= has[ddir];
+	}
+	if (!has_any)
 		return 0;
 
-	if (has_r) {
-		l += snprintf(p + l, left - l, "[r=%s", rate_str[DDIR_READ]);
-		if (!has_w)
-			l += snprintf(p + l, left - l, "]");
-	}
-	if (has_w) {
-		if (has_r)
-			l += snprintf(p + l, left - l, ",");
-		else
-			l += snprintf(p + l, left - l, "[");
-		l += snprintf(p + l, left - l, "w=%s", rate_str[DDIR_WRITE]);
-		if (!has_t)
-			l += snprintf(p + l, left - l, "]");
-	}
-	if (has_t) {
-		if (has_r || has_w)
-			l += snprintf(p + l, left - l, ",");
-		else if (!has_r && !has_w)
-			l += snprintf(p + l, left - l, "[");
-		l += snprintf(p + l, left - l, "t=%s]", rate_str[DDIR_TRIM]);
-	}
-	if (has_r) {
-		l += snprintf(p + l, left - l, "[r=%s", iops_str[DDIR_READ]);
-		if (!has_w)
-			l += snprintf(p + l, left - l, " IOPS]");
-	}
-	if (has_w) {
-		if (has_r)
-			l += snprintf(p + l, left - l, ",");
-		else
-			l += snprintf(p + l, left - l, "[");
-		l += snprintf(p + l, left - l, "w=%s", iops_str[DDIR_WRITE]);
-		if (!has_t)
-			l += snprintf(p + l, left - l, " IOPS]");
+	l += snprintf(p + l, left - l, "[");
+	sep = "";
+	for_each_rw_ddir(ddir) {
+		if (has[ddir]) {
+			l += snprintf(p + l, left - l, "%s%c=%s",
+					sep, c[ddir], rate_str[ddir]);
+			sep = ",";
+		}
 	}
-	if (has_t) {
-		if (has_r || has_w)
-			l += snprintf(p + l, left - l, ",");
-		else if (!has_r && !has_w)
-			l += snprintf(p + l, left - l, "[");
-		l += snprintf(p + l, left - l, "t=%s IOPS]", iops_str[DDIR_TRIM]);
+	l += snprintf(p + l, left - l, "][");
+	sep = "";
+	for_each_rw_ddir(ddir) {
+		if (has[ddir]) {
+			l += snprintf(p + l, left - l, "%s%c=%s",
+					sep, c[ddir], iops_str[ddir]);
+			sep = ",";
+		}
 	}
+	l += snprintf(p + l, left - l, " IOPS]");
 
 	return l;
 }
diff --git a/fio_time.h b/fio_time.h
index c00f8e78..b3bbd4c0 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -13,6 +13,8 @@ extern uint64_t ntime_since(const struct timespec *, const struct timespec *);
 extern uint64_t ntime_since_now(const struct timespec *);
 extern uint64_t utime_since(const struct timespec *, const struct timespec *);
 extern uint64_t utime_since_now(const struct timespec *);
+extern int64_t rel_time_since(const struct timespec *,
+			      const struct timespec *);
 extern uint64_t mtime_since(const struct timespec *, const struct timespec *);
 extern uint64_t mtime_since_now(const struct timespec *);
 extern uint64_t mtime_since_tv(const struct timeval *, const struct timeval *);
diff --git a/flow.c b/flow.c
index ee4d761d..ea6b0ec9 100644
--- a/flow.c
+++ b/flow.c
@@ -5,9 +5,9 @@
 
 struct fio_flow {
 	unsigned int refs;
-	struct flist_head list;
 	unsigned int id;
-	unsigned long long flow_counter;
+	struct flist_head list;
+	unsigned long flow_counter;
 	unsigned int total_weight;
 };
 
@@ -90,7 +90,7 @@ static struct fio_flow *flow_get(unsigned int id)
 	return flow;
 }
 
-static void flow_put(struct fio_flow *flow, unsigned long long flow_counter,
+static void flow_put(struct fio_flow *flow, unsigned long flow_counter,
 				        unsigned int weight)
 {
 	if (!flow_lock)
diff --git a/gettime.c b/gettime.c
index c3a4966b..f85da6e0 100644
--- a/gettime.c
+++ b/gettime.c
@@ -127,18 +127,33 @@ static void fio_init gtod_init(void)
 
 #endif /* FIO_DEBUG_TIME */
 
-#ifdef CONFIG_CLOCK_GETTIME
-static int fill_clock_gettime(struct timespec *ts)
+/*
+ * Queries the value of the monotonic clock if a monotonic clock is available
+ * or the wall clock time if no monotonic clock is available. Returns 0 if
+ * querying the clock succeeded or -1 if querying the clock failed.
+ */
+int fio_get_mono_time(struct timespec *ts)
 {
-#if defined(CONFIG_CLOCK_MONOTONIC_RAW)
-	return clock_gettime(CLOCK_MONOTONIC_RAW, ts);
-#elif defined(CONFIG_CLOCK_MONOTONIC)
-	return clock_gettime(CLOCK_MONOTONIC, ts);
+	int ret;
+
+#ifdef CONFIG_CLOCK_GETTIME
+#if defined(CONFIG_CLOCK_MONOTONIC)
+	ret = clock_gettime(CLOCK_MONOTONIC, ts);
 #else
-	return clock_gettime(CLOCK_REALTIME, ts);
+	ret = clock_gettime(CLOCK_REALTIME, ts);
 #endif
-}
+#else
+	struct timeval tv;
+
+	ret = gettimeofday(&tv, NULL);
+	if (ret == 0) {
+		ts->tv_sec = tv.tv_sec;
+		ts->tv_nsec = tv.tv_usec * 1000;
+	}
 #endif
+	assert(ret <= 0);
+	return ret;
+}
 
 static void __fio_gettime(struct timespec *tp)
 {
@@ -155,8 +170,8 @@ static void __fio_gettime(struct timespec *tp)
 #endif
 #ifdef CONFIG_CLOCK_GETTIME
 	case CS_CGETTIME: {
-		if (fill_clock_gettime(tp) < 0) {
-			log_err("fio: clock_gettime fails\n");
+		if (fio_get_mono_time(tp) < 0) {
+			log_err("fio: fio_get_mono_time() fails\n");
 			assert(0);
 		}
 		break;
@@ -224,19 +239,13 @@ static unsigned long get_cycles_per_msec(void)
 {
 	struct timespec s, e;
 	uint64_t c_s, c_e;
-	enum fio_cs old_cs = fio_clock_source;
 	uint64_t elapsed;
 
-#ifdef CONFIG_CLOCK_GETTIME
-	fio_clock_source = CS_CGETTIME;
-#else
-	fio_clock_source = CS_GTOD;
-#endif
-	__fio_gettime(&s);
+	fio_get_mono_time(&s);
 
 	c_s = get_cpu_clock();
 	do {
-		__fio_gettime(&e);
+		fio_get_mono_time(&e);
 		c_e = get_cpu_clock();
 
 		elapsed = ntime_since(&s, &e);
@@ -244,7 +253,6 @@ static unsigned long get_cycles_per_msec(void)
 			break;
 	} while (1);
 
-	fio_clock_source = old_cs;
 	return (c_e - c_s) * 1000000 / elapsed;
 }
 
@@ -516,23 +524,33 @@ uint64_t mtime_since_now(const struct timespec *s)
 	return mtime_since(s, &t);
 }
 
-uint64_t mtime_since(const struct timespec *s, const struct timespec *e)
+/*
+ * Returns *e - *s in milliseconds as a signed integer. Note: rounding is
+ * asymmetric. If the difference yields +1 ns then 0 is returned. If the
+ * difference yields -1 ns then -1 is returned.
+ */
+int64_t rel_time_since(const struct timespec *s, const struct timespec *e)
 {
-	int64_t sec, usec;
+	int64_t sec, nsec;
 
 	sec = e->tv_sec - s->tv_sec;
-	usec = (e->tv_nsec - s->tv_nsec) / 1000;
-	if (sec > 0 && usec < 0) {
+	nsec = e->tv_nsec - s->tv_nsec;
+	if (nsec < 0) {
 		sec--;
-		usec += 1000000;
+		nsec += 1000ULL * 1000 * 1000;
 	}
+	assert(0 <= nsec && nsec < 1000ULL * 1000 * 1000);
 
-	if (sec < 0 || (sec == 0 && usec < 0))
-		return 0;
+	return sec * 1000 + nsec / (1000 * 1000);
+}
 
-	sec *= 1000;
-	usec /= 1000;
-	return sec + usec;
+/*
+ * Returns *e - *s in milliseconds as an unsigned integer. Returns 0 if
+ * *e < *s.
+ */
+uint64_t mtime_since(const struct timespec *s, const struct timespec *e)
+{
+	return max(rel_time_since(s, e), (int64_t)0);
 }
 
 uint64_t time_since_now(const struct timespec *s)
diff --git a/gettime.h b/gettime.h
index c55f5cba..f1d619ad 100644
--- a/gettime.h
+++ b/gettime.h
@@ -16,6 +16,7 @@ enum fio_cs {
 	CS_INVAL,
 };
 
+extern int fio_get_mono_time(struct timespec *);
 extern void fio_gettime(struct timespec *, void *);
 extern void fio_gtod_init(void);
 extern void fio_clock_init(void);
diff --git a/helper_thread.c b/helper_thread.c
index a2fb7c29..2d553654 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -1,4 +1,8 @@
 #include <signal.h>
+#include <unistd.h>
+#ifdef CONFIG_HAVE_TIMERFD_CREATE
+#include <sys/timerfd.h>
+#endif
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
 #else
@@ -11,6 +15,9 @@
 #include "steadystate.h"
 #include "pshared.h"
 
+static int sleep_accuracy_ms;
+static int timerfd = -1;
+
 enum action {
 	A_EXIT		= 1,
 	A_RESET		= 2,
@@ -25,6 +32,13 @@ static struct helper_data {
 	struct fio_sem *startup_sem;
 } *helper_data;
 
+struct interval_timer {
+	const char	*name;
+	struct timespec	expires;
+	uint32_t	interval_ms;
+	int		(*func)(void);
+};
+
 void helper_thread_destroy(void)
 {
 	if (!helper_data)
@@ -83,6 +97,18 @@ static int read_from_pipe(int fd, void *buf, size_t len)
 }
 #endif
 
+static void block_signals(void)
+{
+#ifdef HAVE_PTHREAD_SIGMASK
+	sigset_t sigmask;
+	int ret;
+	ret = pthread_sigmask(SIG_UNBLOCK, NULL, &sigmask);
+	assert(ret == 0);
+	ret = pthread_sigmask(SIG_BLOCK, &sigmask, NULL);
+	assert(ret == 0);
+#endif
+}
+
 static void submit_action(enum action a)
 {
 	const char data = a;
@@ -128,128 +154,207 @@ void helper_thread_exit(void)
 	pthread_join(helper_data->thread, NULL);
 }
 
-static unsigned int task_helper(struct timespec *last, struct timespec *now, unsigned int period, void do_task())
+/* Resets timers and returns the time in milliseconds until the next event. */
+static int reset_timers(struct interval_timer timer[], int num_timers,
+			struct timespec *now)
 {
-	unsigned int next, since;
-
-	since = mtime_since(last, now);
-	if (since >= period || period - since < 10) {
-		do_task();
-		timespec_add_msec(last, since);
-		if (since > period)
-			next = period - (since - period);
-		else
-			next = period;
-	} else
-		next = period - since;
-
-	return next;
+	uint32_t msec_to_next_event = INT_MAX;
+	int i;
+
+	for (i = 0; i < num_timers; ++i) {
+		timer[i].expires = *now;
+		timespec_add_msec(&timer[i].expires, timer[i].interval_ms);
+		msec_to_next_event = min_not_zero(msec_to_next_event,
+						  timer[i].interval_ms);
+	}
+
+	return msec_to_next_event;
 }
 
-static void *helper_thread_main(void *data)
+/*
+ * Waits for an action on fd; returns early if one arrives, else after at
+ * least timeout_ms. `fd` must be in non-blocking mode.
+ */
+static uint8_t wait_for_action(int fd, unsigned int timeout_ms)
 {
-	struct helper_data *hd = data;
-	unsigned int msec_to_next_event, next_log, next_si = status_interval;
-	unsigned int next_ss = STEADYSTATE_MSEC;
-	struct timespec ts, last_du, last_ss, last_si;
-	char action;
-	int ret = 0;
-
-	sk_out_assign(hd->sk_out);
+	struct timeval timeout = {
+		.tv_sec  = timeout_ms / 1000,
+		.tv_usec = (timeout_ms % 1000) * 1000,
+	};
+	fd_set rfds, efds;
+	uint8_t action = 0;
+	uint64_t exp;
+	int res;
 
-#ifdef HAVE_PTHREAD_SIGMASK
+	res = read_from_pipe(fd, &action, sizeof(action));
+	if (res > 0 || timeout_ms == 0)
+		return action;
+	FD_ZERO(&rfds);
+	FD_SET(fd, &rfds);
+	FD_ZERO(&efds);
+	FD_SET(fd, &efds);
+#ifdef CONFIG_HAVE_TIMERFD_CREATE
 	{
-	sigset_t sigmask;
-
-	/* Let another thread handle signals. */
-	ret = pthread_sigmask(SIG_UNBLOCK, NULL, &sigmask);
-	assert(ret == 0);
-	ret = pthread_sigmask(SIG_BLOCK, &sigmask, NULL);
-	assert(ret == 0);
+		/*
+		 * If the timer frequency is 100 Hz, select() will round up
+		 * `timeout` to the next multiple of 1 / 100 Hz = 10 ms. Hence
+		 * use a high-resolution timer if possible to increase
+		 * select() timeout accuracy.
+		 */
+		struct itimerspec delta = {};
+
+		delta.it_value.tv_sec = timeout.tv_sec;
+		delta.it_value.tv_nsec = timeout.tv_usec * 1000;
+		res = timerfd_settime(timerfd, 0, &delta, NULL);
+		assert(res == 0);
+		FD_SET(timerfd, &rfds);
 	}
 #endif
+	res = select(max(fd, timerfd) + 1, &rfds, NULL, &efds,
+		     timerfd >= 0 ? NULL : &timeout);
+	if (res < 0) {
+		log_err("fio: select() call in helper thread failed: %s\n",
+			strerror(errno));
+		return A_EXIT;
+	}
+	if (FD_ISSET(fd, &rfds))
+		read_from_pipe(fd, &action, sizeof(action));
+	if (timerfd >= 0 && FD_ISSET(timerfd, &rfds)) {
+		res = read(timerfd, &exp, sizeof(exp));
+		assert(res == sizeof(exp));
+	}
+	return action;
+}
+
+/*
+ * Verify whether or not timer @it has expired. If timer @it has expired, call
+ * @it->func(). @now is the current time. @msec_to_next_event is an
+ * input/output parameter that represents the time until the next event.
+ */
+static int eval_timer(struct interval_timer *it, const struct timespec *now,
+		      unsigned int *msec_to_next_event)
+{
+	int64_t delta_ms;
+	bool expired;
+
+	/* interval == 0 means that the timer is disabled. */
+	if (it->interval_ms == 0)
+		return 0;
+
+	delta_ms = rel_time_since(now, &it->expires);
+	expired = delta_ms <= sleep_accuracy_ms;
+	if (expired) {
+		timespec_add_msec(&it->expires, it->interval_ms);
+		delta_ms = rel_time_since(now, &it->expires);
+		if (delta_ms < it->interval_ms - sleep_accuracy_ms ||
+		    delta_ms > it->interval_ms + sleep_accuracy_ms) {
+			dprint(FD_HELPERTHREAD,
+			       "%s: delta = %" PRIi64 " <> %u. Clock jump?\n",
+			       it->name, delta_ms, it->interval_ms);
+			delta_ms = it->interval_ms;
+			it->expires = *now;
+			timespec_add_msec(&it->expires, it->interval_ms);
+		}
+	}
+	*msec_to_next_event = min((unsigned int)delta_ms, *msec_to_next_event);
+	return expired ? it->func() : 0;
+}
+
+static void *helper_thread_main(void *data)
+{
+	struct helper_data *hd = data;
+	unsigned int msec_to_next_event, next_log;
+	struct interval_timer timer[] = {
+		{
+			.name = "disk_util",
+			.interval_ms = DISK_UTIL_MSEC,
+			.func = update_io_ticks,
+		},
+		{
+			.name = "status_interval",
+			.interval_ms = status_interval,
+			.func = __show_running_run_stats,
+		},
+		{
+			.name = "steadystate",
+			.interval_ms = steadystate_enabled ? STEADYSTATE_MSEC :
+				0,
+			.func = steadystate_check,
+		}
+	};
+	struct timespec ts;
+	int clk_tck, ret = 0;
 
-#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
-	clock_gettime(CLOCK_MONOTONIC, &ts);
+#ifdef _SC_CLK_TCK
+	clk_tck = sysconf(_SC_CLK_TCK);
 #else
-	clock_gettime(CLOCK_REALTIME, &ts);
+	/*
+	 * The timer frequency is variable on Windows. Instead of trying to
+	 * query it, use 64 Hz, the clock frequency lower bound. See also
+	 * https://carpediemsystems.co.uk/2019/07/18/windows-system-timer-granularity/.
+	 */
+	clk_tck = 64;
+#endif
+	dprint(FD_HELPERTHREAD, "clk_tck = %d\n", clk_tck);
+	assert(clk_tck > 0);
+	sleep_accuracy_ms = (1000 + clk_tck - 1) / clk_tck;
+
+#ifdef CONFIG_HAVE_TIMERFD_CREATE
+	timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
+	assert(timerfd >= 0);
+	sleep_accuracy_ms = 1;
 #endif
-	memcpy(&last_du, &ts, sizeof(ts));
-	memcpy(&last_ss, &ts, sizeof(ts));
-	memcpy(&last_si, &ts, sizeof(ts));
+
+	sk_out_assign(hd->sk_out);
+
+	/* Let another thread handle signals. */
+	block_signals();
+
+	fio_get_mono_time(&ts);
+	msec_to_next_event = reset_timers(timer, ARRAY_SIZE(timer), &ts);
 
 	fio_sem_up(hd->startup_sem);
 
-	msec_to_next_event = DISK_UTIL_MSEC;
 	while (!ret && !hd->exit) {
-		uint64_t since_du;
-		struct timeval timeout = {
-			.tv_sec  = msec_to_next_event / 1000,
-			.tv_usec = (msec_to_next_event % 1000) * 1000,
-		};
-		fd_set rfds, efds;
-
-		if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) < 0) {
-			FD_ZERO(&rfds);
-			FD_SET(hd->pipe[0], &rfds);
-			FD_ZERO(&efds);
-			FD_SET(hd->pipe[0], &efds);
-			if (select(1, &rfds, NULL, &efds, &timeout) < 0) {
-				log_err("fio: select() call in helper thread failed: %s",
-					strerror(errno));
-				ret = 1;
-			}
-			if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) <
-			    0)
-				action = 0;
-		}
+		uint8_t action;
+		int i;
 
-#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
-		clock_gettime(CLOCK_MONOTONIC, &ts);
-#else
-		clock_gettime(CLOCK_REALTIME, &ts);
-#endif
+		action = wait_for_action(hd->pipe[0], msec_to_next_event);
+		if (action == A_EXIT)
+			break;
 
-		if (action == A_RESET) {
-			last_du = ts;
-			last_ss = ts;
-		}
+		fio_get_mono_time(&ts);
+
+		msec_to_next_event = INT_MAX;
 
-		since_du = mtime_since(&last_du, &ts);
-		if (since_du >= DISK_UTIL_MSEC || DISK_UTIL_MSEC - since_du < 10) {
-			ret = update_io_ticks();
-			timespec_add_msec(&last_du, DISK_UTIL_MSEC);
-			msec_to_next_event = DISK_UTIL_MSEC;
-			if (since_du >= DISK_UTIL_MSEC)
-				msec_to_next_event -= (since_du - DISK_UTIL_MSEC);
-		} else
-			msec_to_next_event = DISK_UTIL_MSEC - since_du;
+		if (action == A_RESET)
+			msec_to_next_event = reset_timers(timer,
+						ARRAY_SIZE(timer), &ts);
+
+		for (i = 0; i < ARRAY_SIZE(timer); ++i)
+			ret = eval_timer(&timer[i], &ts, &msec_to_next_event);
 
 		if (action == A_DO_STAT)
 			__show_running_run_stats();
 
-		if (status_interval) {
-			next_si = task_helper(&last_si, &ts, status_interval, __show_running_run_stats);
-			msec_to_next_event = min(next_si, msec_to_next_event);
-		}
-
 		next_log = calc_log_samples();
 		if (!next_log)
 			next_log = DISK_UTIL_MSEC;
 
-		if (steadystate_enabled) {
-			next_ss = task_helper(&last_ss, &ts, STEADYSTATE_MSEC, steadystate_check);
-			msec_to_next_event = min(next_ss, msec_to_next_event);
-                }
-
 		msec_to_next_event = min(next_log, msec_to_next_event);
-		dprint(FD_HELPERTHREAD, "next_si: %u, next_ss: %u, next_log: %u, msec_to_next_event: %u\n",
-			next_si, next_ss, next_log, msec_to_next_event);
+		dprint(FD_HELPERTHREAD,
+		       "next_log: %u, msec_to_next_event: %u\n",
+		       next_log, msec_to_next_event);
 
 		if (!is_backend)
 			print_thread_status();
 	}
 
+	if (timerfd >= 0) {
+		close(timerfd);
+		timerfd = -1;
+	}
+
 	fio_writeout_logs(false);
 
 	sk_out_drop();
diff --git a/stat.c b/stat.c
index 7f987c7f..eb40bd7f 100644
--- a/stat.c
+++ b/stat.c
@@ -2299,7 +2299,7 @@ void __show_run_stats(void)
 	free(opt_lists);
 }
 
-void __show_running_run_stats(void)
+int __show_running_run_stats(void)
 {
 	struct thread_data *td;
 	unsigned long long *rt;
@@ -2350,6 +2350,8 @@ void __show_running_run_stats(void)
 
 	free(rt);
 	fio_sem_up(stat_sem);
+
+	return 0;
 }
 
 static bool status_file_disabled;
diff --git a/stat.h b/stat.h
index 0d141666..6dd5ef74 100644
--- a/stat.h
+++ b/stat.h
@@ -319,7 +319,7 @@ extern void show_group_stats(struct group_run_stats *rs, struct buf_output *);
 extern bool calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void __show_run_stats(void);
-extern void __show_running_run_stats(void);
+extern int __show_running_run_stats(void);
 extern void show_running_run_stats(void);
 extern void check_for_running_stats(void);
 extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, bool first);
diff --git a/steadystate.c b/steadystate.c
index bd2f70dd..2e3da1db 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -196,7 +196,7 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 	return false;
 }
 
-void steadystate_check(void)
+int steadystate_check(void)
 {
 	int i, j, ddir, prev_groupid, group_ramp_time_over = 0;
 	unsigned long rate_time;
@@ -302,6 +302,7 @@ void steadystate_check(void)
 			}
 		}
 	}
+	return 0;
 }
 
 int td_steadystate_init(struct thread_data *td)
diff --git a/steadystate.h b/steadystate.h
index 51472c46..bbb86fbb 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -4,7 +4,7 @@
 #include "thread_options.h"
 
 extern void steadystate_free(struct thread_data *);
-extern void steadystate_check(void);
+extern int steadystate_check(void);
 extern void steadystate_setup(void);
 extern int td_steadystate_init(struct thread_data *);
 extern uint64_t steadystate_bw_mean(struct thread_stat *);



* Recent changes (master)
@ 2020-10-10 12:00 Jens Axboe
From: Jens Axboe @ 2020-10-10 12:00 UTC
  To: fio

The following changes since commit 7064f8942a3b8070acf60b8e5fabc16f8221ae40:

  Merge branch 'msys2' of https://github.com/sitsofe/fio into master (2020-09-14 19:43:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 729071019ee23804d52bed47357e4324be6b06bf:

  Merge branch 'master' of https://github.com/chengli-rutgers/fio into master (2020-10-09 07:28:55 -0600)

----------------------------------------------------------------
Cheng Li (1):
      td_var: avoid arithmetic on a pointer to void cast (#1096)

Jens Axboe (1):
      Merge branch 'master' of https://github.com/chengli-rutgers/fio into master

 parse.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/parse.h b/parse.h
index 1d2cbf74..e6663ed4 100644
--- a/parse.h
+++ b/parse.h
@@ -125,7 +125,7 @@ static inline void *td_var(void *to, const struct fio_option *o,
 	else
 		ret = to;
 
-	return ret + offset;
+	return (void *) ((uintptr_t) ret + offset);
 }
 
 static inline int parse_is_percent(unsigned long long val)



* Recent changes (master)
@ 2020-09-15 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-15 12:00 UTC
  To: fio

The following changes since commit 695611a9d4cd554d44d8b2ec5da2811061950a2e:

  Allow offload with FAKEIO engines (2020-09-11 09:58:15 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7064f8942a3b8070acf60b8e5fabc16f8221ae40:

  Merge branch 'msys2' of https://github.com/sitsofe/fio into master (2020-09-14 19:43:39 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      backend: Remove two superfluous casts
      backend: Use asprintf() instead of strlen() + sprintf()

Jens Axboe (2):
      Merge branch 'backend' of https://github.com/bvanassche/fio into master
      Merge branch 'msys2' of https://github.com/sitsofe/fio into master

Sitsofe Wheeler (12):
      configure: pass non-null pointer to (v)asprintf
      net: coerce the result of htonl before printing
      windows: fix wrong format strings
      windows: fix DWORD format string complaints
      configure: be explicit about "XP" Windows API version
      configure/Makefile: add option to generate pdb symbols
      Makefile/ci: Don't pass CFLAGS when linking
      appveyor: cleanup and add separate install script
      Makefile: introduce FIO_CFLAGS
      memlock: avoid type confusion in format string
      configure: cleanup lex/yacc tests
      travis: cleanup build script

 .appveyor.yml             | 40 ++++++++++++++++--------
 Makefile                  | 77 ++++++++++++++++++++++++++---------------------
 backend.c                 | 15 ++++-----
 ci/appveyor-install.sh    | 41 +++++++++++++++++++++++++
 ci/travis-build.sh        |  8 +++--
 ci/travis-install.sh      |  5 ++-
 configure                 | 60 ++++++++++++++++++++++++++----------
 engines/net.c             |  5 +--
 engines/windowsaio.c      |  8 ++---
 os/windows/cpu-affinity.c | 26 ++++++++--------
 os/windows/dobuild.cmd    | 10 ++++--
 os/windows/install.wxs    |  8 +++++
 os/windows/posix.c        |  5 +--
 t/memlock.c               |  2 +-
 14 files changed, 211 insertions(+), 99 deletions(-)
 create mode 100755 ci/appveyor-install.sh

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 352caeee..fad16326 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -5,33 +5,49 @@ image:
 
 environment:
   CYG_MIRROR: http://cygwin.mirror.constant.com
-  CYG_ROOT: C:\cygwin64
-  MAKEFLAGS: -j 2
   matrix:
-    - platform: x64
-      PACKAGE_ARCH: x86_64
+    - ARCHITECTURE: x64
+      CC: clang
+      CONFIGURE_OPTIONS: --enable-pdb
+      DISTRO: msys2
+# Skip 32 bit clang build
+#    - ARCHITECTURE: x86
+#      CC: clang
+#      CONFIGURE_OPTIONS: --enable-pdb
+#      DISTRO: msys2
+    - ARCHITECTURE: x64
       CONFIGURE_OPTIONS:
-    - platform: x86
-      PACKAGE_ARCH: i686
+      DISTRO: cygwin
+    - ARCHITECTURE: x86
       CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
+      DISTRO: cygwin
 
 install:
-  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib,mingw64-%PACKAGE_ARCH%-CUnit" > NUL'
-  - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
+  - if %DISTRO%==cygwin (
+      SET "PATH=C:\cygwin64\bin;C:\cygwin64;%PATH%"
+    )
+  - if %DISTRO%==msys2 if %ARCHITECTURE%==x86 (
+      SET "PATH=C:\msys64\mingw32\bin;C:\msys64\usr\bin;%PATH%"
+    )
+  - if %DISTRO%==msys2 if %ARCHITECTURE%==x64 (
+      SET "PATH=C:\msys64\mingw64\bin;C:\msys64\usr\bin;%PATH%"
+    )
+  - SET PATH=C:\Python38-x64;%PATH% # NB: Changed env variables persist to later sections
   - SET PYTHONUNBUFFERED=TRUE
-  - python.exe -m pip install scipy six
+  - bash.exe ci\appveyor-install.sh
 
 build_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
+  - bash.exe configure --extra-cflags=-Werror --disable-native %CONFIGURE_OPTIONS%
+  - make.exe -j2
 
 after_build:
   - file.exe fio.exe
   - make.exe test
-  - 'cd os\windows && dobuild.cmd %PLATFORM% && cd ..'
+  - 'cd os\windows && dobuild.cmd %ARCHITECTURE% && cd ..'
   - ps: Get-ChildItem .\os\windows\*.msi | % { Push-AppveyorArtifact $_.FullName -FileName $_.Name -DeploymentName fio.msi }
 
 test_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -f fio.exe ] && python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug'
+  - python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug
 
 on_finish:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test && appveyor PushArtifact test-artifacts.7z'
diff --git a/Makefile b/Makefile
index b00daca2..5ed8a808 100644
--- a/Makefile
+++ b/Makefile
@@ -22,16 +22,22 @@ endif
 DEBUGFLAGS = -DFIO_INC_DEBUG
 CPPFLAGS= -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL $(DEBUGFLAGS)
 OPTFLAGS= -g -ffast-math
-CFLAGS	:= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR) $(CFLAGS)
+FIO_CFLAGS= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
 SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/hist/fio-histo-log-pctiles.py tools/fio_jsonplus_clat2csv)
 
 ifndef CONFIG_FIO_NO_OPT
-  CFLAGS := -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $(CFLAGS)
+  FIO_CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
 endif
 ifdef CONFIG_BUILD_NATIVE
-  CFLAGS := -march=native $(CFLAGS)
+  FIO_CFLAGS += -march=native
+endif
+
+ifdef CONFIG_PDB
+  LINK_PDBFILE ?= -Wl,-pdb,$(dir $@)/$(basename $(@F)).pdb
+  FIO_CFLAGS += -gcodeview
+  LDFLAGS += -fuse-ld=lld $(LINK_PDBFILE)
 endif
 
 ifdef CONFIG_GFIO
@@ -55,7 +61,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
   HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server $(FIO_LIBHDFS_LIB)/libhdfs.a -ljvm
-  CFLAGS := $(HDFSFLAGS) $(CFLAGS)
+  FIO_CFLAGS += $(HDFSFLAGS)
   SOURCE += engines/libhdfs.c
 endif
 
@@ -74,10 +80,9 @@ ifdef CONFIG_LIBNBD
 endif
 
 ifdef CONFIG_64BIT
-  CFLAGS := -DBITS_PER_LONG=64 $(CFLAGS)
-endif
-ifdef CONFIG_32BIT
-  CFLAGS := -DBITS_PER_LONG=32 $(CFLAGS)
+  CPPFLAGS += -DBITS_PER_LONG=64
+else ifdef CONFIG_32BIT
+  CPPFLAGS += -DBITS_PER_LONG=32
 endif
 ifdef CONFIG_LIBAIO
   aio_SRCS = engines/libaio.c
@@ -155,7 +160,7 @@ ifdef CONFIG_GFAPI
   SOURCE += engines/glusterfs_async.c
   LIBS += -lgfapi -lglusterfs
   ifdef CONFIG_GF_FADVISE
-    CFLAGS := "-DGFAPI_USE_FADVISE" $(CFLAGS)
+    FIO_CFLAGS += "-DGFAPI_USE_FADVISE"
   endif
 endif
 ifdef CONFIG_MTD
@@ -234,7 +239,7 @@ ifeq ($(CONFIG_TARGET_OS), AIX)
 endif
 ifeq ($(CONFIG_TARGET_OS), HP-UX)
   LIBS   += -lpthread -ldl -lrt
-  CFLAGS := -D_LARGEFILE64_SOURCE -D_XOPEN_SOURCE_EXTENDED $(CFLAGS)
+  FIO_CFLAGS += -D_LARGEFILE64_SOURCE -D_XOPEN_SOURCE_EXTENDED
 endif
 ifeq ($(CONFIG_TARGET_OS), Darwin)
   LIBS	 += -lpthread -ldl
@@ -243,24 +248,21 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   SOURCE += os/windows/cpu-affinity.c os/windows/posix.c
   WINDOWS_OBJS = os/windows/cpu-affinity.o os/windows/posix.o lib/hweight.o
   LIBS	 += -lpthread -lpsapi -lws2_32 -lssp
-  CFLAGS := -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format $(CFLAGS)
+  FIO_CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
 
 ifdef CONFIG_DYNAMIC_ENGINES
  DYNAMIC_ENGS := $(ENGINES)
 define engine_template =
 $(1)_OBJS := $$($(1)_SRCS:.c=.o)
-$$($(1)_OBJS): CFLAGS := -fPIC $$($(1)_CFLAGS) $(CFLAGS)
-engines/lib$(1).so: $$($(1)_OBJS)
-	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,lib$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
+$$($(1)_OBJS): FIO_CFLAGS += -fPIC $$($(1)_CFLAGS)
 ENGS_OBJS += engines/lib$(1).so
-all install: $(ENGS_OBJS)
 endef
 else # !CONFIG_DYNAMIC_ENGINES
 define engine_template =
 SOURCE += $$($(1)_SRCS)
 LIBS += $$($(1)_LIBS)
-CFLAGS := $$($(1)_CFLAGS) $(CFLAGS)
+CFLAGS += $$($(1)_CFLAGS)
 endef
 endif
 
@@ -427,7 +429,7 @@ mandir = $(prefix)/man
 sharedir = $(prefix)/share/fio
 endif
 
-all: $(PROGS) $(T_TEST_PROGS) $(UT_PROGS) $(SCRIPTS) FORCE
+all: $(PROGS) $(T_TEST_PROGS) $(UT_PROGS) $(SCRIPTS) $(ENGS_OBJS) FORCE
 
 .PHONY: all install clean test
 .PHONY: FORCE cscope
@@ -436,7 +438,7 @@ FIO-VERSION-FILE: FORCE
 	@$(SHELL) $(SRCDIR)/FIO-VERSION-GEN
 -include FIO-VERSION-FILE
 
-override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(CFLAGS)
+override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(FIO_CFLAGS) $(CFLAGS)
 
 %.o : %.c
 	@mkdir -p $(dir $@)
@@ -478,7 +480,7 @@ lexer.h: lex.yy.c
 exp/test-expression-parser.o: exp/test-expression-parser.c
 	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
 exp/test-expression-parser: exp/test-expression-parser.o
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) $< y.tab.o lex.yy.o -o $@ $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $< y.tab.o lex.yy.o -o $@ $(LIBS)
 
 parse.o: lex.yy.o y.tab.o
 endif
@@ -514,55 +516,60 @@ printing.o: printing.c printing.h
 
 t/io_uring.o: os/linux/io_uring.h
 t/io_uring: $(T_IOU_RING_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_IOU_RING_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_IOU_RING_OBJS) $(LIBS)
 
 t/read-to-pipe-async: $(T_PIPE_ASYNC_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_PIPE_ASYNC_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_PIPE_ASYNC_OBJS) $(LIBS)
 
 t/memlock: $(T_MEMLOCK_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_MEMLOCK_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_MEMLOCK_OBJS) $(LIBS)
 
 t/stest: $(T_SMALLOC_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_SMALLOC_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_SMALLOC_OBJS) $(LIBS)
 
 t/ieee754: $(T_IEEE_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_IEEE_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_IEEE_OBJS) $(LIBS)
 
 fio: $(FIO_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(FIO_OBJS) $(LIBS) $(HDFSLIB)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(FIO_OBJS) $(LIBS) $(HDFSLIB)
 
 gfio: $(GFIO_OBJS)
 	$(QUIET_LINK)$(CC) $(filter-out -static, $(LDFLAGS)) -o gfio $(GFIO_OBJS) $(LIBS) $(GFIO_LIBS) $(GTK_LDFLAGS) $(HDFSLIB)
 
 t/fio-genzipf: $(T_ZIPF_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_ZIPF_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_ZIPF_OBJS) $(LIBS)
 
 t/axmap: $(T_AXMAP_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_AXMAP_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_AXMAP_OBJS) $(LIBS)
 
 t/lfsr-test: $(T_LFSR_TEST_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_LFSR_TEST_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_LFSR_TEST_OBJS) $(LIBS)
 
 t/gen-rand: $(T_GEN_RAND_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_GEN_RAND_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_GEN_RAND_OBJS) $(LIBS)
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
 t/fio-btrace2fio: $(T_BTRACE_FIO_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_BTRACE_FIO_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_BTRACE_FIO_OBJS) $(LIBS)
 endif
 
 t/fio-dedupe: $(T_DEDUPE_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_DEDUPE_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_DEDUPE_OBJS) $(LIBS)
 
 t/fio-verify-state: $(T_VS_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_VS_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_VS_OBJS) $(LIBS)
 
 t/time-test: $(T_TT_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_TT_OBJS) $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(T_TT_OBJS) $(LIBS)
 
 ifdef CONFIG_HAVE_CUNIT
 unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit $(LIBS)
+endif
+
+ifdef CONFIG_DYNAMIC_ENGINES
+engines/lib$(1).so: $$($(1)_OBJS)
+	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,lib$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
 endif
 
 clean: FORCE
@@ -603,7 +610,7 @@ fulltest:
 		sudo t/zbd/run-tests-against-zoned-nullb;	 	\
 	fi
 
-install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
+install: $(PROGS) $(SCRIPTS) $(ENGS_OBJS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
 	$(INSTALL) $(PROGS) $(SCRIPTS) $(DESTDIR)$(bindir)
 ifdef CONFIG_DYNAMIC_ENGINES
diff --git a/backend.c b/backend.c
index 05453ae2..f91f3caf 100644
--- a/backend.c
+++ b/backend.c
@@ -1458,16 +1458,17 @@ static bool keep_running(struct thread_data *td)
 	return false;
 }
 
-static int exec_string(struct thread_options *o, const char *string, const char *mode)
+static int exec_string(struct thread_options *o, const char *string,
+		       const char *mode)
 {
-	size_t newlen = strlen(string) + strlen(o->name) + strlen(mode) + 13 + 1;
 	int ret;
 	char *str;
 
-	str = malloc(newlen);
-	sprintf(str, "%s > %s.%s.txt 2>&1", string, o->name, mode);
+	if (asprintf(&str, "%s > %s.%s.txt 2>&1", string, o->name, mode) < 0)
+		return -1;
 
-	log_info("%s : Saving output of %s in %s.%s.txt\n",o->name, mode, o->name, mode);
+	log_info("%s : Saving output of %s in %s.%s.txt\n", o->name, mode,
+		 o->name, mode);
 	ret = system(str);
 	if (ret == -1)
 		log_err("fio: exec of cmd <%s> failed\n", str);
@@ -1731,7 +1732,7 @@ static void *thread_main(void *data)
 	if (!init_random_map(td))
 		goto err;
 
-	if (o->exec_prerun && exec_string(o, o->exec_prerun, (const char *)"prerun"))
+	if (o->exec_prerun && exec_string(o, o->exec_prerun, "prerun"))
 		goto err;
 
 	if (o->pre_read && !pre_read_files(td))
@@ -1890,7 +1891,7 @@ static void *thread_main(void *data)
 	rate_submit_exit(td);
 
 	if (o->exec_postrun)
-		exec_string(o, o->exec_postrun, (const char *)"postrun");
+		exec_string(o, o->exec_postrun, "postrun");
 
 	if (exitall_on_terminate || (o->exitall_error && td->error))
 		fio_terminate_threads(td->groupid, td->o.exit_what);
diff --git a/ci/appveyor-install.sh b/ci/appveyor-install.sh
new file mode 100755
index 00000000..c73e4cb5
--- /dev/null
+++ b/ci/appveyor-install.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+# The PATH to appropriate distro commands must already be set before invoking
+# this script
+# The following environment variables must be set:
+# ARCHITECTURE={x64,x86}
+# DISTRO={cygwin,msys2}
+# The following environment variable can optionally be set:
+# CYG_MIRROR=<URL>
+set -eu
+
+case "${ARCHITECTURE}" in
+    "x64")
+        PACKAGE_ARCH="x86_64"
+        ;;
+    "x86")
+        PACKAGE_ARCH="i686"
+        ;;
+esac
+
+echo "Installing packages..."
+case "${DISTRO}" in
+    "cygwin")
+        CYG_MIRROR=${CYG_MIRROR:-"http://cygwin.mirror.constant.com"}
+        setup-x86_64.exe --quiet-mode --no-shortcuts --only-site \
+            --site "${CYG_MIRROR}" --packages \
+            "mingw64-${PACKAGE_ARCH}-CUnit,mingw64-${PACKAGE_ARCH}-zlib"
+        ;;
+    "msys2")
+        #pacman --noconfirm -Syuu # MSYS2 core update
+        #pacman --noconfirm -Syuu # MSYS2 normal update
+        pacman.exe --noconfirm -S \
+            mingw-w64-${PACKAGE_ARCH}-clang \
+            mingw-w64-${PACKAGE_ARCH}-cunit \
+            mingw-w64-${PACKAGE_ARCH}-lld
+        ;;
+esac
+
+python.exe -m pip install scipy six
+
+echo "Python3 path: $(type -p python3 2>&1)"
+echo "Python3 version: $(python3 -V 2>&1)"
diff --git a/ci/travis-build.sh b/ci/travis-build.sh
index 231417e2..923d882d 100755
--- a/ci/travis-build.sh
+++ b/ci/travis-build.sh
@@ -1,8 +1,9 @@
 #!/bin/bash
+set -eu
 
 CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
 EXTRA_CFLAGS="-Werror"
-PYTHONUNBUFFERED=TRUE
+export PYTHONUNBUFFERED=TRUE
 CONFIGURE_FLAGS=()
 
 case "$TRAVIS_OS_NAME" in
@@ -11,6 +12,7 @@ case "$TRAVIS_OS_NAME" in
         case "$CI_TARGET_ARCH" in
             "x86")
                 EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
+                export LDFLAGS="-m32"
                 ;;
             "amd64")
                 CONFIGURE_FLAGS+=(--enable-cuda)
@@ -24,7 +26,7 @@ CONFIGURE_FLAGS+=(--extra-cflags="${EXTRA_CFLAGS}")
     make &&
     make test &&
     if [[ "$CI_TARGET_ARCH" == "arm64" ]]; then
-	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
     else
-	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
     fi
diff --git a/ci/travis-install.sh b/ci/travis-install.sh
index b6895e82..103695dc 100755
--- a/ci/travis-install.sh
+++ b/ci/travis-install.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-set -e
+set -eu
 
 CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
 case "$TRAVIS_OS_NAME" in
@@ -51,6 +51,5 @@ case "$TRAVIS_OS_NAME" in
 	;;
 esac
 
-echo "Python version: $(/usr/bin/python -V 2>&1)"
-echo "Python3 path: $(which python3 2>&1)"
+echo "Python3 path: $(type -p python3 2>&1)"
 echo "Python3 version: $(python3 -V 2>&1)"
diff --git a/configure b/configure
index 08571fb0..12b7cb58 100755
--- a/configure
+++ b/configure
@@ -193,6 +193,8 @@ for opt do
   ;;
   --target-win-ver=*) target_win_ver="$optarg"
   ;;
+  --enable-pdb) pdb="yes"
+  ;;
   --build-static) build_static="yes"
   ;;
   --enable-gfio) gfio_check="yes"
@@ -256,6 +258,7 @@ if test "$show_help" = "yes" ; then
   echo "--extra-cflags=         Specify extra CFLAGS to pass to compiler"
   echo "--build-32bit-win       Enable 32-bit build on Windows"
   echo "--target-win-ver=       Minimum version of Windows to target (XP or 7)"
+  echo "--enable-pdb            Enable Windows PDB symbols generation (needs clang/lld)"
   echo "--build-static          Build a static fio"
   echo "--esx                   Configure build options for esx"
   echo "--enable-gfio           Enable building of gtk gfio"
@@ -394,6 +397,8 @@ CYGWIN*)
   fi
   if test "$target_win_ver" = "XP"; then
     output_sym "CONFIG_WINDOWS_XP"
+    # Technically the below is targeting 2003
+    CFLAGS="$CFLAGS -D_WIN32_WINNT=0x0502"
   elif test "$target_win_ver" = "7"; then
     output_sym "CONFIG_WINDOWS_7"
     CFLAGS="$CFLAGS -D_WIN32_WINNT=0x0601"
@@ -939,7 +944,8 @@ cat > $TMPC << EOF
 
 int main(int argc, char **argv)
 {
-  return asprintf(NULL, "%s", "str") == 0;
+  char *buf;
+  return asprintf(&buf, "%s", "str") == 0;
 }
 EOF
 if compile_prog "" "" "have_asprintf"; then
@@ -956,7 +962,8 @@ cat > $TMPC << EOF
 int main(int argc, char **argv)
 {
   va_list ap;
-  return vasprintf(NULL, "%s", ap) == 0;
+  char *buf;
+  return vasprintf(&buf, "%s", ap) == 0;
 }
 EOF
 if compile_prog "" "" "have_vasprintf"; then
@@ -2234,19 +2241,14 @@ lex="no"
 arith="no"
 if test "$disable_lex" = "no" || test -z "$disable_lex" ; then
 if test "$targetos" != "SunOS" ; then
-LEX=$(which lex 2> /dev/null)
-if test -x "$LEX" ; then
+if has lex; then
   lex="yes"
 fi
-YACC=$(which bison 2> /dev/null)
-if test -x "$YACC" ; then
+if has bison; then
   yacc="yes"
   yacc_is_bison="yes"
-else
-  YACC=$(which yacc 2> /dev/null)
-  if test -x "$YACC" ; then
-    yacc="yes"
-  fi
+elif has yacc; then
+  yacc="yes"
 fi
 if test "$yacc" = "yes" && test "$lex" = "yes" ; then
   arith="yes"
@@ -2262,7 +2264,9 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if compile_prog "" "-ll" "lex"; then
+if compile_prog "" "-lfl" "flex"; then
+  LIBS="-lfl $LIBS"
+elif compile_prog "" "-ll" "lex"; then
   LIBS="-ll $LIBS"
 else
   arith="no"
@@ -2276,8 +2280,7 @@ if test "$arith" = "yes" ; then
 if test "$force_no_lex_o" = "yes" ; then
   lex_use_o="no"
 else
-$LEX -o lex.yy.c exp/expression-parser.l 2> /dev/null
-if test "$?" = "0" ; then
+if lex -o lex.yy.c exp/expression-parser.l 2> /dev/null; then
   lex_use_o="yes"
 else
   lex_use_o="no"
@@ -2698,6 +2701,27 @@ if compile_prog "" "" "statx_syscall"; then
 fi
 print_config "statx(2)/syscall" "$statx_syscall"
 
+##########################################
+# check for Windows PDB generation support
+if test "$pdb" != "no" ; then
+  cat > $TMPC <<EOF
+int main(void)
+{
+  return 0;
+}
+EOF
+  if compile_prog "-g -gcodeview" "-fuse-ld=lld -Wl,-pdb,$TMPO" "pdb"; then
+    pdb=yes
+  else
+    if test "$pdb" = "yes"; then
+      feature_not_found "PDB" "clang and lld"
+    fi
+    pdb=no
+  fi
+else
+  pdb=no
+fi
+print_config "Windows PDB generation" "$pdb"
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2931,9 +2955,9 @@ fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
-    echo "YACC=$YACC -y" >> $config_host_mak
+    echo "YACC=bison -y" >> $config_host_mak
   else
-    echo "YACC=$YACC" >> $config_host_mak
+    echo "YACC=yacc" >> $config_host_mak
   fi
   if test "$lex_use_o" = "yes" ; then
     echo "CONFIG_LEX_USE_O=y" >> $config_host_mak
@@ -3020,6 +3044,10 @@ fi
 if test "$dynamic_engines" = "yes" ; then
   output_sym "CONFIG_DYNAMIC_ENGINES"
 fi
+if test "$pdb" = yes; then
+  output_sym "CONFIG_PDB"
+fi
+
 print_config "Lib-based ioengines dynamic" "$dynamic_engines"
 cat > $TMPC << EOF
 int main(int argc, char **argv)
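The `asprintf()`/`vasprintf()` probes in the `configure` diff above were changed to pass a real `char **` instead of `NULL`, because these functions store the address of the newly allocated buffer through that argument. A minimal sketch of correct usage follows, assuming a glibc- or BSD-style `asprintf()` (a GNU/BSD extension rather than ISO C; glibc needs `_GNU_SOURCE`). The helper `format_pair()` is hypothetical and not part of fio.

```c
#define _GNU_SOURCE /* asprintf() is a GNU/BSD extension, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical helper: build "key=value" in a freshly allocated buffer.
 * asprintf() writes the buffer address through its first argument, so it
 * must be a valid char ** (passing NULL, as the old probe did, is invalid).
 * Returns NULL on failure; the caller owns and frees the result.
 */
static char *format_pair(const char *key, const char *value)
{
	char *buf;

	if (asprintf(&buf, "%s=%s", key, value) < 0)
		return NULL; /* buf is undefined on failure, never use it */
	return buf;
}
```

Calling `format_pair("ioengine", "io_uring")` yields a heap string `"ioengine=io_uring"` that must be released with `free()`.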
diff --git a/engines/net.c b/engines/net.c
index 91f25774..c6cec584 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -938,8 +938,9 @@ static int fio_netio_udp_recv_open(struct thread_data *td, struct fio_file *f)
 
 	if (ntohl(msg.magic) != FIO_LINK_OPEN_CLOSE_MAGIC ||
 	    ntohl(msg.cmd) != FIO_LINK_OPEN) {
-		log_err("fio: bad udp open magic %x/%x\n", ntohl(msg.magic),
-								ntohl(msg.cmd));
+		log_err("fio: bad udp open magic %x/%x\n",
+			(unsigned int) ntohl(msg.magic),
+			(unsigned int) ntohl(msg.cmd));
 		return -1;
 	}
 
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 5c7e7964..9868e816 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -161,15 +161,15 @@ static int windowsaio_invalidate_cache(struct fio_file *f)
 	if (ihFile != INVALID_HANDLE_VALUE) {
 		if (!CloseHandle(ihFile)) {
 			error = GetLastError();
-			log_info("windowsaio: invalidation fd close %s "
-				 "failed: error %d\n", f->file_name, error);
+			log_info("windowsaio: invalidation fd close %s failed: error %lu\n",
+				 f->file_name, error);
 			rc = 1;
 		}
 	} else {
 		error = GetLastError();
 		if (error != ERROR_FILE_NOT_FOUND) {
-			log_info("windowsaio: cache invalidation of %s failed: "
-					"error %d\n", f->file_name, error);
+			log_info("windowsaio: cache invalidation of %s failed: error %lu\n",
+				 f->file_name, error);
 			rc = 1;
 		}
 	}
diff --git a/os/windows/cpu-affinity.c b/os/windows/cpu-affinity.c
index 69997b24..46fd048d 100644
--- a/os/windows/cpu-affinity.c
+++ b/os/windows/cpu-affinity.c
@@ -14,7 +14,7 @@ int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
 		bSuccess = SetThreadAffinityMask(h, cpumask);
 		if (!bSuccess)
 			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n",
-				pid, cpumask);
+				pid, (long long unsigned) cpumask);
 
 		CloseHandle(h);
 	} else {
@@ -83,7 +83,7 @@ unsigned int cpus_online(void)
 static void print_mask(os_cpu_mask_t *cpumask)
 {
 	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
-		dprint(FD_PROCESS, "cpumask[%d]=%lu\n", i, cpumask->row[i]);
+		dprint(FD_PROCESS, "cpumask[%d]=%" PRIu64 "\n", i, cpumask->row[i]);
 }
 
 /* Return the index of the least significant set CPU in cpumask or -1 if no
@@ -99,7 +99,7 @@ int first_set_cpu(os_cpu_mask_t *cpumask)
 		int row_first_cpu;
 
 		row_first_cpu = __builtin_ffsll(cpumask->row[row]) - 1;
-		dprint(FD_PROCESS, "row_first_cpu=%d cpumask->row[%d]=%lu\n",
+		dprint(FD_PROCESS, "row_first_cpu=%d cpumask->row[%d]=%" PRIu64 "\n",
 		       row_first_cpu, row, cpumask->row[row]);
 		if (row_first_cpu > -1) {
 			mask_first_cpu = cpus_offset + row_first_cpu;
@@ -136,7 +136,7 @@ static int last_set_cpu(os_cpu_mask_t *cpumask)
 			    row_last_cpu++;
 		}
 
-		dprint(FD_PROCESS, "row_last_cpu=%d cpumask->row[%d]=%lu\n",
+		dprint(FD_PROCESS, "row_last_cpu=%d cpumask->row[%d]=%" PRIu64 "\n",
 		       row_last_cpu, row, cpumask->row[row]);
 		if (row_last_cpu > -1) {
 			mask_last_cpu = cpus_offset + row_last_cpu;
@@ -213,13 +213,17 @@ static int mask_to_group_mask(os_cpu_mask_t *cpumask, int *processor_group, uint
 		needed_shift = FIO_CPU_MASK_STRIDE - bit_offset;
 		needed_mask_shift = FIO_CPU_MASK_STRIDE - needed;
 		needed_mask = (uint64_t)-1 >> needed_mask_shift;
-		dprint(FD_PROCESS, "bit_offset=%d end=%d needed=%d needed_shift=%d needed_mask=%ld needed_mask_shift=%d\n", bit_offset, end, needed, needed_shift, needed_mask, needed_mask_shift);
+		dprint(FD_PROCESS,
+		       "bit_offset=%d end=%d needed=%d needed_shift=%d needed_mask=%" PRIu64 " needed_mask_shift=%d\n",
+		       bit_offset, end, needed, needed_shift, needed_mask,
+		       needed_mask_shift);
 		group_cpumask |= (cpumask->row[row + 1] & needed_mask) << needed_shift;
 	}
 	group_cpumask &= (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - group_size);
 
 	/* Return group and mask */
-	dprint(FD_PROCESS, "Returning group=%d group_mask=%lu\n", group, group_cpumask);
+	dprint(FD_PROCESS, "Returning group=%d group_mask=%" PRIu64 "\n",
+	       group, group_cpumask);
 	*processor_group = group;
 	*affinity_mask = group_cpumask;
 
@@ -257,10 +261,8 @@ int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
 	if (SetThreadGroupAffinity(handle, &new_group_affinity, NULL) != 0)
 		ret = 0;
 	else {
-		log_err("fio_setaffinity: failed to set thread affinity "
-			 "(pid %d, group %d, mask %" PRIx64 ", "
-			 "GetLastError=%d)\n", pid, group, group_mask,
-			 GetLastError());
+		log_err("fio_setaffinity: failed to set thread affinity (pid %d, group %d, mask %" PRIx64 ", GetLastError=%lu)\n",
+			pid, group, group_mask, GetLastError());
 		goto err;
 	}
 
@@ -319,7 +321,7 @@ int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 		goto err;
 	}
 	if (!GetProcessGroupAffinity(handle, &group_count, current_groups)) {
-		log_err("%s: failed to get single group affinity for pid %d (%d)\n",
+		log_err("%s: failed to get single group affinity for pid %d (%lu)\n",
 			__func__, pid, GetLastError());
 		goto err;
 	}
@@ -329,7 +331,7 @@ int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 		goto err;
 	}
 	if (!GetProcessAffinityMask(handle, &process_mask, &system_mask)) {
-		log_err("%s: GetProcessAffinityMask() failed for pid\n",
+		log_err("%s: GetProcessAffinityMask() failed for pid %d\n",
 			__func__, pid);
 		goto err;
 	}
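The Windows hunks above swap `%lu`/`%ld` for `PRIu64` when printing 64-bit mask rows, and `%d` for `%lu` when printing `DWORD` error codes. `DWORD` is an `unsigned long`, which is only 32 bits wide on LLP64 Windows, so `%lu` matches it while a `uint64_t` needs the `<inttypes.h>` macros. The sketch below illustrates that pattern with `snprintf()`; `format_mask_row()` and `format_error()` are hypothetical helpers, not fio functions.

```c
#include <inttypes.h> /* PRIu64, PRIx64 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical helper mirroring the dprint() calls above: format one
 * 64-bit cpumask row. PRIu64/PRIx64 expand to the correct conversion
 * for uint64_t on every platform, unlike a hard-coded "%lu".
 */
static int format_mask_row(char *out, size_t len, int row, uint64_t mask)
{
	return snprintf(out, len, "cpumask[%d]=%" PRIu64 " (0x%" PRIx64 ")",
			row, mask, mask);
}

/* A Windows DWORD is an unsigned long, hence "%lu" in the patches above. */
static int format_error(char *out, size_t len, unsigned long err)
{
	return snprintf(out, len, "error %lu", err);
}
```

The same reasoning explains the `engines/net.c` hunk: `ntohl()` returns a `uint32_t`, whose underlying type varies by platform, so it is cast to `unsigned int` before being printed with `%x`.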
diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index d06a2afa..08df3e87 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -34,7 +34,13 @@ if defined SIGN_FIO (
   signtool sign /as /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 ..\..\t\*.exe
 )
 
-"%WIX%bin\candle" -nologo -arch %FIO_ARCH% -dFioVersionNumbers="%FIO_VERSION_NUMBERS%" install.wxs
+if exist ..\..\fio.pdb (
+  set FIO_PDB=true
+) else (
+  set FIO_PDB=false
+)
+
+"%WIX%bin\candle" -nologo -arch %FIO_ARCH% -dFioVersionNumbers="%FIO_VERSION_NUMBERS%" -dFioPDB="%FIO_PDB%" install.wxs
 @if ERRORLEVEL 1 goto end
 "%WIX%bin\candle" -nologo -arch %FIO_ARCH% examples.wxs
 @if ERRORLEVEL 1 goto end
@@ -43,4 +49,4 @@ if defined SIGN_FIO (
 
 if defined SIGN_FIO (
   signtool sign /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 %FIO_VERSION%-%FIO_ARCH%.msi
-)
\ No newline at end of file
+)
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index dcb8c92c..f73ec5e2 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -27,6 +27,11 @@
 							<File Source="..\..\fio.exe"/>
 							<Environment Action="set" Part="last" Id="PATH" Name="PATH" Value="[INSTALLDIR]fio\" System="yes"/>
 						</Component>
+						<?if $(var.FioPDB) = true?>
+						<Component>
+							<File Id="fio.pdb" Name="fio.pdb" Source="..\..\fio.pdb"/>
+						</Component>
+						<?endif?>
 						<Component>
 							<File Id="README" Name="README.txt" Source="..\..\README"/>
 						</Component>
@@ -76,6 +81,9 @@
 
 	<Feature Id="AlwaysInstall" Absent="disallow" ConfigurableDirectory="INSTALLDIR" Display="hidden" Level="1" Title="Flexible I/O Tester">
 		<ComponentRef Id="fio.exe"/>
+		<?if $(var.FioPDB) = true?>
+		<ComponentRef Id="fio.pdb"/>
+		<?endif?>
 		<ComponentRef Id="HOWTO"/>
 		<ComponentRef Id="README"/>
 		<ComponentRef Id="REPORTING_BUGS"/>
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 31271de0..9e9f12ef 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -168,7 +168,7 @@ int win_to_posix_error(DWORD winerr)
 	case ERROR_FILE_INVALID:
 		return ENXIO;
 	default:
-		log_err("fio: windows error %d not handled\n", winerr);
+		log_err("fio: windows error %lu not handled\n", winerr);
 		return EIO;
 	}
 
@@ -188,7 +188,8 @@ int GetNumLogicalProcessors(void)
 		if (error == ERROR_INSUFFICIENT_BUFFER)
 			processor_info = malloc(len);
 		else {
-			log_err("Error: GetLogicalProcessorInformation failed: %d\n", error);
+			log_err("Error: GetLogicalProcessorInformation failed: %lu\n",
+				error);
 			return -1;
 		}
 
diff --git a/t/memlock.c b/t/memlock.c
index 418dc3c4..9f5a3ea8 100644
--- a/t/memlock.c
+++ b/t/memlock.c
@@ -22,7 +22,7 @@ static void *worker(void *data)
 		for (index = 0; index + 4096 < size; index += 4096)
 			memset(&buf[index+512], 0x89, 512);
 		if (first) {
-			printf("loop%d: did %lu MiB\n", i+1, size/(1024UL*1024UL));
+			printf("loop%d: did %lu MiB\n", i+1, td->mib);
 			first = 0;
 		}
 	}



* Recent changes (master)
@ 2020-09-12 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-12 12:00 UTC
  To: fio

The following changes since commit e55138eed3296b642229271b0dcec492ec776702:

  Merge branch 'evelu-enghelp' of https://github.com/ErwanAliasr1/fio into master (2020-09-09 10:48:27 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 695611a9d4cd554d44d8b2ec5da2811061950a2e:

  Allow offload with FAKEIO engines (2020-09-11 09:58:15 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      engines/io_uring: mark as not compatible with io_submit_mode=offload
      Disable io_submit_mode=offload with async engines
      Allow offload with FAKEIO engines

 HOWTO              |  3 ++-
 engines/io_uring.c |  6 ++++++
 fio.1              |  2 +-
 ioengines.c        | 14 ++++++++++++--
 4 files changed, 21 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d8586723..2d8c7a02 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2474,7 +2474,8 @@ I/O depth
 	can increase latencies. The benefit is that fio can manage submission rates
 	independently of the device completion rates. This avoids skewed latency
 	reporting if I/O gets backed up on the device side (the coordinated omission
-	problem).
+	problem). Note that this option cannot reliably be used with async IO
+	engines.
 
 
 I/O rate
diff --git a/engines/io_uring.c b/engines/io_uring.c
index e2b5e6ee..69f48859 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -724,6 +724,12 @@ static int fio_ioring_init(struct thread_data *td)
 	struct ioring_data *ld;
 	struct thread_options *to = &td->o;
 
+	if (to->io_submit_mode == IO_MODE_OFFLOAD) {
+		log_err("fio: io_submit_mode=offload is not compatible (or "
+			"useful) with io_uring\n");
+		return 1;
+	}
+
 	/* sqthread submission requires registered files */
 	if (o->sqpoll_thread)
 		o->registerfiles = 1;
diff --git a/fio.1 b/fio.1
index 74509bbd..a881277c 100644
--- a/fio.1
+++ b/fio.1
@@ -2215,7 +2215,7 @@ has a bit of extra overhead, especially for lower queue depth I/O where it
 can increase latencies. The benefit is that fio can manage submission rates
 independently of the device completion rates. This avoids skewed latency
 reporting if I/O gets backed up on the device side (the coordinated omission
-problem).
+problem). Note that this option cannot reliably be used with async IO engines.
 .SS "I/O rate"
 .TP
 .BI thinktime \fR=\fPtime
diff --git a/ioengines.c b/ioengines.c
index 476df58d..d3be8026 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -22,7 +22,7 @@
 
 static FLIST_HEAD(engine_list);
 
-static bool check_engine_ops(struct ioengine_ops *ops)
+static bool check_engine_ops(struct thread_data *td, struct ioengine_ops *ops)
 {
 	if (ops->version != FIO_IOOPS_VERSION) {
 		log_err("bad ioops version %d (want %d)\n", ops->version,
@@ -41,6 +41,16 @@ static bool check_engine_ops(struct ioengine_ops *ops)
 	if (ops->flags & FIO_SYNCIO)
 		return false;
 
+	/*
+	 * async engines aren't reliable with offload
+	 */
+	if ((td->o.io_submit_mode == IO_MODE_OFFLOAD) &&
+	    !(ops->flags & FIO_FAKEIO)) {
+		log_err("%s: can't be used with offloaded submit. Use a sync "
+			"engine\n", ops->name);
+		return true;
+	}
+
 	if (!ops->event || !ops->getevents) {
 		log_err("%s: no event/getevents handler\n", ops->name);
 		return true;
@@ -193,7 +203,7 @@ struct ioengine_ops *load_ioengine(struct thread_data *td)
 	/*
 	 * Check that the required methods are there.
 	 */
-	if (check_engine_ops(ops))
+	if (check_engine_ops(td, ops))
 		return NULL;
 
 	return ops;
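The `check_engine_ops()` change above boils down to one compatibility rule: sync engines return early and are accepted, while any other engine is rejected under `io_submit_mode=offload` unless it carries `FIO_FAKEIO`. A minimal model of just that rule follows. The flag and enum names are hypothetical stand-ins (fio's real values live in its own headers), and the sketch leaves out the remaining checks, such as the event/getevents handler test.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical stand-ins for FIO_SYNCIO and FIO_FAKEIO. */
enum {
	SKETCH_SYNCIO = 1 << 0,
	SKETCH_FAKEIO = 1 << 1,
};

/* Hypothetical stand-ins for the io_submit_mode settings. */
enum submit_mode { MODE_INLINE, MODE_OFFLOAD };

/*
 * Returns true if the engine must be rejected, matching the polarity of
 * check_engine_ops(): sync engines are accepted before the offload check,
 * and async engines are refused under offload unless they are fake-I/O.
 */
static bool reject_engine(unsigned int flags, enum submit_mode mode)
{
	if (flags & SKETCH_SYNCIO)
		return false;
	if (mode == MODE_OFFLOAD && !(flags & SKETCH_FAKEIO))
		return true;
	return false;
}
```

The io_uring engine additionally refuses offload in `fio_ioring_init()`, so it fails with a dedicated message before this generic check comes into play.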



* Recent changes (master)
@ 2020-09-10 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-10 12:00 UTC
  To: fio

The following changes since commit ca8ff0968cec7ee47ca7fe5b40f592c2b332b062:

  Merge branch 'zbd' of https://github.com/bvanassche/fio into master (2020-09-08 10:25:48 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e55138eed3296b642229271b0dcec492ec776702:

  Merge branch 'evelu-enghelp' of https://github.com/ErwanAliasr1/fio into master (2020-09-09 10:48:27 -0600)

----------------------------------------------------------------
Erwan Velu (1):
      init: exiting with fio_show_ioengine_help return code

Jens Axboe (1):
      Merge branch 'evelu-enghelp' of https://github.com/ErwanAliasr1/fio into master

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 3cd0238b..7f64ce21 100644
--- a/init.c
+++ b/init.c
@@ -2543,7 +2543,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 		case 'i':
 			did_arg = true;
 			if (!cur_client) {
-				fio_show_ioengine_help(optarg);
+				exit_val = fio_show_ioengine_help(optarg);
 				do_exit++;
 			}
 			break;



* Recent changes (master)
@ 2020-09-09 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-09 12:00 UTC
  To: fio

The following changes since commit be23e6be4fadb723f925824f88fbaedbd3502251:

  Kill off old GUASI IO engine (2020-09-07 13:38:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca8ff0968cec7ee47ca7fe5b40f592c2b332b062:

  Merge branch 'zbd' of https://github.com/bvanassche/fio into master (2020-09-08 10:25:48 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      zbd: Add a missing pthread_mutex_unlock() call

Jens Axboe (2):
      engines/io_uring: allow setting of IOSQE_ASYNC
      Merge branch 'zbd' of https://github.com/bvanassche/fio into master

 engines/io_uring.c | 16 ++++++++++++++++
 zbd.c              |  1 +
 2 files changed, 17 insertions(+)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ca5b90c9..e2b5e6ee 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -66,6 +66,7 @@ struct ioring_data {
 	unsigned iodepth;
 	bool ioprio_class_set;
 	bool ioprio_set;
+	int prepped;
 
 	struct ioring_mmap mmap[3];
 };
@@ -82,6 +83,7 @@ struct ioring_options {
 	unsigned int nonvectored;
 	unsigned int uncached;
 	unsigned int nowait;
+	unsigned int force_async;
 };
 
 static const int ddir_to_op[2][2] = {
@@ -197,6 +199,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "force_async",
+		.lname	= "Force async",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, force_async),
+		.help	= "Set IOSQE_ASYNC every N requests",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= NULL,
 	},
@@ -277,6 +288,11 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		}
 	}
 
+	if (o->force_async && ++ld->prepped == o->force_async) {
+		ld->prepped = 0;
+		sqe->flags |= IOSQE_ASYNC;
+	}
+
 	sqe->user_data = (unsigned long) io_u;
 	return 0;
 }
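The `force_async` option added above ("Set IOSQE_ASYNC every N requests") is implemented in `fio_ioring_prep()` as a per-ring counter. The counter is bumped on each prep, and when it reaches N the SQE gets `IOSQE_ASYNC` and the counter resets. A value of 0 short-circuits the check, so the counter never moves. The standalone model below isolates that logic. `struct async_counter` and `should_force_async()` are hypothetical names, not fio code.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical model of the two fields the patch adds. */
struct async_counter {
	unsigned int force_async; /* N from the job option; 0 disables it */
	int prepped;              /* requests prepped since the last forced one */
};

/*
 * Same shape as the patch: short-circuit when N == 0, otherwise
 * pre-increment and compare, resetting the counter on every Nth request.
 */
static bool should_force_async(struct async_counter *c)
{
	if (c->force_async && (unsigned int) ++c->prepped == c->force_async) {
		c->prepped = 0;
		return true;
	}
	return false;
}
```

Under this logic, `force_async=4` in an io_uring job would mark the 4th, 8th, 12th, ... prepped SQE with `IOSQE_ASYNC` and leave the rest unchanged.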
diff --git a/zbd.c b/zbd.c
index e8ecbb6f..905c0c2b 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1546,6 +1546,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING && td_write(td)) {
 			zb = zbd_replay_write_order(td, io_u, zb);
+			pthread_mutex_unlock(&zb->mutex);
 			goto accept;
 		}
 		/*



* Recent changes (master)
@ 2020-09-08 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-08 12:00 UTC
  To: fio

The following changes since commit 03eabd8a7a53c80a6c9d1376b38a9b7ddce2268a:

  Merge branch 'guasi_fixes' of https://github.com/sitsofe/fio into master (2020-09-06 16:14:58 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to be23e6be4fadb723f925824f88fbaedbd3502251:

  Kill off old GUASI IO engine (2020-09-07 13:38:03 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      fio: cap io_size=N% at 100%, update man page

Jens Axboe (1):
      Kill off old GUASI IO engine

 HOWTO           |   8 --
 Makefile        |   5 --
 configure       |  22 -----
 engines/guasi.c | 270 --------------------------------------------------------
 fio.1           |   8 +-
 io_u.h          |   6 --
 options.c       |  11 ++-
 parse.h         |   5 ++
 8 files changed, 12 insertions(+), 323 deletions(-)
 delete mode 100644 engines/guasi.c

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 5dc571f8..d8586723 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1901,14 +1901,6 @@ I/O engine
 			single CPU at the desired rate. A job never finishes unless there is
 			at least one non-cpuio job.
 
-		**guasi**
-			The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-			Interface approach to async I/O. See
-
-			http://www.xmailserver.org/guasi-lib.html
-
-			for more info on GUASI.
-
 		**rdma**
 			The RDMA I/O engine supports both RDMA memory semantics
 			(RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
diff --git a/Makefile b/Makefile
index 6678c2fd..b00daca2 100644
--- a/Makefile
+++ b/Makefile
@@ -106,11 +106,6 @@ endif
 ifdef CONFIG_LINUX_SPLICE
   SOURCE += engines/splice.c
 endif
-ifdef CONFIG_GUASI
-  guasi_SRCS = engines/guasi.c
-  guasi_LIBS = -lguasi
-  ENGINES += guasi
-endif
 ifdef CONFIG_SOLARISAIO
   SOURCE += engines/solarisaio.c
 endif
diff --git a/configure b/configure
index 81bfb270..08571fb0 100755
--- a/configure
+++ b/configure
@@ -1311,25 +1311,6 @@ if compile_prog "" "" "linux splice"; then
 fi
 print_config "Linux splice(2)" "$linux_splice"
 
-##########################################
-# GUASI probe
-if test "$guasi" != "yes" ; then
-  guasi="no"
-fi
-cat > $TMPC << EOF
-#include <guasi.h>
-#include <guasi_syscalls.h>
-int main(int argc, char **argv)
-{
-  guasi_t ctx = guasi_create(0, 0, 0);
-  return 0;
-}
-EOF
-if compile_prog "" "-lguasi" "guasi"; then
-  guasi="yes"
-fi
-print_config "GUASI" "$guasi"
-
 ##########################################
 # libnuma probe
 if test "$libnuma" != "yes" ; then
@@ -2847,9 +2828,6 @@ fi
 if test "$linux_splice" = "yes" ; then
   output_sym "CONFIG_LINUX_SPLICE"
 fi
-if test "$guasi" = "yes" ; then
-  output_sym "CONFIG_GUASI"
-fi
 if test "$libnuma_v2" = "yes" ; then
   output_sym "CONFIG_LIBNUMA"
 fi
diff --git a/engines/guasi.c b/engines/guasi.c
deleted file mode 100644
index d4121757..00000000
--- a/engines/guasi.c
+++ /dev/null
@@ -1,270 +0,0 @@
-/*
- * guasi engine
- *
- * IO engine using the GUASI library.
- *
- * Before running make. You'll need the GUASI lib as well:
- *
- * http://www.xmailserver.org/guasi-lib.html
- *
- */
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <errno.h>
-#include <assert.h>
-
-#include "../fio.h"
-
-#define GFIO_MIN_THREADS 32
-#ifndef GFIO_MAX_THREADS
-#define GFIO_MAX_THREADS 2000
-#endif
-
-#include <guasi.h>
-#include <guasi_syscalls.h>
-
-#ifdef GFIO_DEBUG
-#define GDBG_PRINT(a) printf a
-#else
-#define GDBG_PRINT(a) (void) 0
-#endif
-
-struct guasi_data {
-	guasi_t hctx;
-	int max_reqs;
-	guasi_req_t *reqs;
-	struct io_u **io_us;
-	int queued_nr;
-	int reqs_nr;
-};
-
-static int fio_guasi_prep(struct thread_data fio_unused *td, struct io_u *io_u)
-{
-
-	GDBG_PRINT(("fio_guasi_prep(%p)\n", io_u));
-	io_u->greq = NULL;
-
-	return 0;
-}
-
-static struct io_u *fio_guasi_event(struct thread_data *td, int event)
-{
-	struct guasi_data *ld = td->io_ops_data;
-	struct io_u *io_u;
-	struct guasi_reqinfo rinf;
-
-	GDBG_PRINT(("fio_guasi_event(%d)\n", event));
-	if (guasi_req_info(ld->reqs[event], &rinf) < 0) {
-		log_err("guasi_req_info(%d) FAILED!\n", event);
-		return NULL;
-	}
-	io_u = rinf.asid;
-	io_u->error = EINPROGRESS;
-	GDBG_PRINT(("fio_guasi_event(%d) -> %p\n", event, io_u));
-	if (rinf.status == GUASI_STATUS_COMPLETE) {
-		io_u->error = rinf.result;
-		if (io_u->ddir == DDIR_READ ||
-		    io_u->ddir == DDIR_WRITE) {
-			io_u->error = 0;
-			if (rinf.result != (long) io_u->xfer_buflen) {
-				if (rinf.result >= 0)
-					io_u->resid = io_u->xfer_buflen - rinf.result;
-				else
-					io_u->error = rinf.error;
-			}
-		}
-	}
-
-	return io_u;
-}
-
-static int fio_guasi_getevents(struct thread_data *td, unsigned int min,
-			       unsigned int max, const struct timespec *t)
-{
-	struct guasi_data *ld = td->io_ops_data;
-	int n, r;
-	long timeo = -1;
-
-	GDBG_PRINT(("fio_guasi_getevents(%d, %d)\n", min, max));
-	if (min > ld->max_reqs)
-		min = ld->max_reqs;
-	if (max > ld->max_reqs)
-		max = ld->max_reqs;
-	if (t)
-		timeo = t->tv_sec * 1000L + t->tv_nsec / 1000000L;
-	for (n = 0; n < ld->reqs_nr; n++)
-		guasi_req_free(ld->reqs[n]);
-	n = 0;
-	do {
-		r = guasi_fetch(ld->hctx, ld->reqs + n, min - n,
-				max - n, timeo);
-		if (r < 0) {
-			log_err("guasi_fetch() FAILED! (%d)\n", r);
-			break;
-		}
-		n += r;
-		if (n >= min)
-			break;
-	} while (1);
-	ld->reqs_nr = n;
-	GDBG_PRINT(("fio_guasi_getevents() -> %d\n", n));
-
-	return n;
-}
-
-static enum fio_q_status fio_guasi_queue(struct thread_data *td,
-					 struct io_u *io_u)
-{
-	struct guasi_data *ld = td->io_ops_data;
-
-	fio_ro_check(td, io_u);
-
-	GDBG_PRINT(("fio_guasi_queue(%p)\n", io_u));
-	if (ld->queued_nr == (int) td->o.iodepth)
-		return FIO_Q_BUSY;
-
-	ld->io_us[ld->queued_nr] = io_u;
-	ld->queued_nr++;
-	return FIO_Q_QUEUED;
-}
-
-static void fio_guasi_queued(struct thread_data *td, struct io_u **io_us, int nr)
-{
-	int i;
-	struct io_u *io_u;
-	struct timespec now;
-
-	if (!fio_fill_issue_time(td))
-		return;
-
-	io_u_mark_submit(td, nr);
-	fio_gettime(&now, NULL);
-	for (i = 0; i < nr; i++) {
-		io_u = io_us[i];
-		memcpy(&io_u->issue_time, &now, sizeof(now));
-		io_u_queued(td, io_u);
-	}
-}
-
-static int fio_guasi_commit(struct thread_data *td)
-{
-	struct guasi_data *ld = td->io_ops_data;
-	int i;
-	struct io_u *io_u;
-	struct fio_file *f;
-
-	GDBG_PRINT(("fio_guasi_commit(%d)\n", ld->queued_nr));
-	for (i = 0; i < ld->queued_nr; i++) {
-		io_u = ld->io_us[i];
-		GDBG_PRINT(("fio_guasi_commit(%d) --> %p\n", i, io_u));
-		f = io_u->file;
-		io_u->greq = NULL;
-		if (io_u->ddir == DDIR_READ)
-			io_u->greq = guasi__pread(ld->hctx, ld, io_u, 0,
-						  f->fd, io_u->xfer_buf, io_u->xfer_buflen,
-						  io_u->offset);
-		else if (io_u->ddir == DDIR_WRITE)
-			io_u->greq = guasi__pwrite(ld->hctx, ld, io_u, 0,
-						   f->fd, io_u->xfer_buf, io_u->xfer_buflen,
-						   io_u->offset);
-		else if (ddir_sync(io_u->ddir))
-			io_u->greq = guasi__fsync(ld->hctx, ld, io_u, 0, f->fd);
-		else {
-			log_err("fio_guasi_commit() FAILED: unknow request %d\n",
-				io_u->ddir);
-		}
-		if (io_u->greq == NULL) {
-			log_err("fio_guasi_commit() FAILED: submit failed (%s)\n",
-				strerror(errno));
-			return -1;
-		}
-	}
-	fio_guasi_queued(td, ld->io_us, i);
-	ld->queued_nr = 0;
-	GDBG_PRINT(("fio_guasi_commit() -> %d\n", i));
-
-	return 0;
-}
-
-static int fio_guasi_cancel(struct thread_data fio_unused *td,
-			    struct io_u *io_u)
-{
-	GDBG_PRINT(("fio_guasi_cancel(%p) req=%p\n", io_u, io_u->greq));
-	if (io_u->greq != NULL)
-		guasi_req_cancel(io_u->greq);
-
-	return 0;
-}
-
-static void fio_guasi_cleanup(struct thread_data *td)
-{
-	struct guasi_data *ld = td->io_ops_data;
-	int n;
-
-	GDBG_PRINT(("fio_guasi_cleanup(%p)\n", ld));
-	if (ld) {
-		for (n = 0; n < ld->reqs_nr; n++)
-			guasi_req_free(ld->reqs[n]);
-		guasi_free(ld->hctx);
-		free(ld->reqs);
-		free(ld->io_us);
-		free(ld);
-	}
-	GDBG_PRINT(("fio_guasi_cleanup(%p) DONE\n", ld));
-}
-
-static int fio_guasi_init(struct thread_data *td)
-{
-	int maxthr;
-	struct guasi_data *ld = malloc(sizeof(*ld));
-
-	GDBG_PRINT(("fio_guasi_init(): depth=%d\n", td->o.iodepth));
-	memset(ld, 0, sizeof(*ld));
-	maxthr = td->o.iodepth > GFIO_MIN_THREADS ? td->o.iodepth: GFIO_MIN_THREADS;
-	if (maxthr > GFIO_MAX_THREADS)
-		maxthr = GFIO_MAX_THREADS;
-	if ((ld->hctx = guasi_create(GFIO_MIN_THREADS, maxthr, 1)) == NULL) {
-		td_verror(td, errno, "guasi_create");
-		free(ld);
-		return 1;
-	}
-	ld->max_reqs = td->o.iodepth;
-	ld->reqs = malloc(ld->max_reqs * sizeof(guasi_req_t));
-	ld->io_us = malloc(ld->max_reqs * sizeof(struct io_u *));
-	memset(ld->io_us, 0, ld->max_reqs * sizeof(struct io_u *));
-	ld->queued_nr = 0;
-	ld->reqs_nr = 0;
-
-	td->io_ops_data = ld;
-	GDBG_PRINT(("fio_guasi_init(): depth=%d -> %p\n", td->o.iodepth, ld));
-
-	return 0;
-}
-
-FIO_STATIC struct ioengine_ops ioengine = {
-	.name		= "guasi",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= fio_guasi_init,
-	.prep		= fio_guasi_prep,
-	.queue		= fio_guasi_queue,
-	.commit		= fio_guasi_commit,
-	.cancel		= fio_guasi_cancel,
-	.getevents	= fio_guasi_getevents,
-	.event		= fio_guasi_event,
-	.cleanup	= fio_guasi_cleanup,
-	.open_file	= generic_open_file,
-	.close_file	= generic_close_file,
-	.get_file_size	= generic_get_file_size,
-};
-
-static void fio_init fio_guasi_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_guasi_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
-
diff --git a/fio.1 b/fio.1
index f15194ff..74509bbd 100644
--- a/fio.1
+++ b/fio.1
@@ -1561,7 +1561,8 @@ if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio
 will perform I/O within the first 20GiB but exit when 5GiB have been
 done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
 and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
-the 0..20GiB region.
+the 0..20GiB region. The value can also be given as a percentage,
+\fBio_size\fR=N%, in which case fio does N% of \fBsize\fR; values above 100% are rejected.
 .TP
 .BI filesize \fR=\fPirange(int)
 Individual file sizes. May be a range, in which case fio will select sizes
@@ -1674,11 +1675,6 @@ to get desired CPU usage, as the cpuload only loads a
 single CPU at the desired rate. A job never finishes unless there is
 at least one non-cpuio job.
 .TP
-.B guasi
-The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR
-for more info on GUASI.
-.TP
 .B rdma
 The RDMA I/O engine supports both RDMA memory semantics
 (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
diff --git a/io_u.h b/io_u.h
index 5a28689c..d4c5be43 100644
--- a/io_u.h
+++ b/io_u.h
@@ -11,9 +11,6 @@
 #ifdef CONFIG_LIBAIO
 #include <libaio.h>
 #endif
-#ifdef CONFIG_GUASI
-#include <guasi.h>
-#endif
 
 enum {
 	IO_U_F_FREE		= 1 << 0,
@@ -125,9 +122,6 @@ struct io_u {
 #ifdef FIO_HAVE_SGIO
 		struct sg_io_hdr hdr;
 #endif
-#ifdef CONFIG_GUASI
-		guasi_req_t greq;
-#endif
 #ifdef CONFIG_SOLARISAIO
 		aio_result_t resultp;
 #endif
diff --git a/options.c b/options.c
index e27bb9cb..b497d973 100644
--- a/options.c
+++ b/options.c
@@ -1481,9 +1481,13 @@ static int str_io_size_cb(void *data, unsigned long long *__val)
 	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
 
-	if (parse_is_percent(v)) {
+	if (parse_is_percent_uncapped(v)) {
 		td->o.io_size = 0;
 		td->o.io_size_percent = -1ULL - v;
+		if (td->o.io_size_percent > 100) {
+			log_err("fio: io_size values greater than 100%% aren't supported\n");
+			return 1;
+		}
 		dprint(FD_PARSE, "SET io_size_percent %d\n",
 					td->o.io_size_percent);
 	} else
@@ -1868,11 +1872,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  { .ival = "cpuio",
 			    .help = "CPU cycle burner engine",
 			  },
-#ifdef CONFIG_GUASI
-			  { .ival = "guasi",
-			    .help = "GUASI IO engine",
-			  },
-#endif
 #ifdef CONFIG_RDMA
 			  { .ival = "rdma",
 			    .help = "RDMA IO engine",
diff --git a/parse.h b/parse.h
index 5828654f..1d2cbf74 100644
--- a/parse.h
+++ b/parse.h
@@ -133,6 +133,11 @@ static inline int parse_is_percent(unsigned long long val)
 	return val <= -1ULL && val >= (-1ULL - 100ULL);
 }
 
+static inline int parse_is_percent_uncapped(unsigned long long val)
+{
+	return (long long)val <= -1;
+}
+
 struct print_option {
 	struct flist_head list;
 	char *name;



* Recent changes (master)
@ 2020-09-07 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8b38f40138047975ea7a59b9ce06681ffdd2c31d:

  fio: support io_size=N% (N <= 100) (2020-09-05 09:37:59 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 03eabd8a7a53c80a6c9d1376b38a9b7ddce2268a:

  Merge branch 'guasi_fixes' of https://github.com/sitsofe/fio into master (2020-09-06 16:14:58 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fio 3.23
      Merge branch 'guasi_fixes' of https://github.com/sitsofe/fio into master

Sitsofe Wheeler (1):
      Makefile/configure: fix guasi build

 FIO-VERSION-GEN | 2 +-
 Makefile        | 1 +
 configure       | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index bf0fee99..5ee7735c 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.22
+DEF_VER=fio-3.23
 
 LF='
 '
diff --git a/Makefile b/Makefile
index 8e1ebc90..6678c2fd 100644
--- a/Makefile
+++ b/Makefile
@@ -108,6 +108,7 @@ ifdef CONFIG_LINUX_SPLICE
 endif
 ifdef CONFIG_GUASI
   guasi_SRCS = engines/guasi.c
+  guasi_LIBS = -lguasi
   ENGINES += guasi
 endif
 ifdef CONFIG_SOLARISAIO
diff --git a/configure b/configure
index 6d672fe5..81bfb270 100755
--- a/configure
+++ b/configure
@@ -1325,7 +1325,7 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if compile_prog "" "" "guasi"; then
+if compile_prog "" "-lguasi" "guasi"; then
   guasi="yes"
 fi
 print_config "GUASI" "$guasi"



* Recent changes (master)
@ 2020-09-06 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-06 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2dd96cc46fa83a73acc1c9238c3ac59203e10213:

  engines/io_uring: use the atomic load acquire instead of a barrier (2020-09-03 08:49:51 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8b38f40138047975ea7a59b9ce06681ffdd2c31d:

  fio: support io_size=N% (N <= 100) (2020-09-05 09:37:59 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      fio: support io_size=N% (N <= 100)

 cconv.c          |  2 ++
 filesetup.c      | 11 ++++++++++-
 options.c        | 17 +++++++++++++++++
 server.h         |  2 +-
 thread_options.h |  4 ++++
 5 files changed, 34 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/cconv.c b/cconv.c
index 5dc0569f..488dd799 100644
--- a/cconv.c
+++ b/cconv.c
@@ -102,6 +102,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->size = le64_to_cpu(top->size);
 	o->io_size = le64_to_cpu(top->io_size);
 	o->size_percent = le32_to_cpu(top->size_percent);
+	o->io_size_percent = le32_to_cpu(top->io_size_percent);
 	o->fill_device = le32_to_cpu(top->fill_device);
 	o->file_append = le32_to_cpu(top->file_append);
 	o->file_size_low = le64_to_cpu(top->file_size_low);
@@ -367,6 +368,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->iodepth_batch_complete_max = cpu_to_le32(o->iodepth_batch_complete_max);
 	top->serialize_overlap = cpu_to_le32(o->serialize_overlap);
 	top->size_percent = cpu_to_le32(o->size_percent);
+	top->io_size_percent = cpu_to_le32(o->io_size_percent);
 	top->fill_device = cpu_to_le32(o->fill_device);
 	top->file_append = cpu_to_le32(o->file_append);
 	top->ratecycle = cpu_to_le32(o->ratecycle);
diff --git a/filesetup.c b/filesetup.c
index d382fa24..e44f31c7 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1139,6 +1139,8 @@ int setup_files(struct thread_data *td)
 		if (f->io_size == -1ULL)
 			total_size = -1ULL;
 		else {
+			uint64_t io_size;
+
                         if (o->size_percent && o->size_percent != 100) {
 				uint64_t file_size;
 
@@ -1150,7 +1152,14 @@ int setup_files(struct thread_data *td)
 
 				f->io_size -= (f->io_size % td_min_bs(td));
 			}
-			total_size += f->io_size;
+
+			io_size = f->io_size;
+			if (o->io_size_percent && o->io_size_percent != 100) {
+				io_size *= o->io_size_percent;
+				io_size /= 100;
+			}
+
+			total_size += io_size;
 		}
 
 		if (f->filetype == FIO_TYPE_FILE &&
diff --git a/options.c b/options.c
index 0d64e7c0..e27bb9cb 100644
--- a/options.c
+++ b/options.c
@@ -1476,6 +1476,22 @@ static int str_size_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
+static int str_io_size_cb(void *data, unsigned long long *__val)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	unsigned long long v = *__val;
+
+	if (parse_is_percent(v)) {
+		td->o.io_size = 0;
+		td->o.io_size_percent = -1ULL - v;
+		dprint(FD_PARSE, "SET io_size_percent %d\n",
+					td->o.io_size_percent);
+	} else
+		td->o.io_size = v;
+
+	return 0;
+}
+
 static int str_write_bw_log_cb(void *data, const char *str)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -2043,6 +2059,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "io_limit",
 		.lname	= "IO Size",
 		.type	= FIO_OPT_STR_VAL,
+		.cb	= str_io_size_cb,
 		.off1	= offsetof(struct thread_options, io_size),
 		.help	= "Total size of I/O to be performed",
 		.interval = 1024 * 1024,
diff --git a/server.h b/server.h
index 3cd60096..6d444749 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 85,
+	FIO_SERVER_VER			= 86,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 7c0a3158..97c400fe 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -83,6 +83,7 @@ struct thread_options {
 	unsigned long long size;
 	unsigned long long io_size;
 	unsigned int size_percent;
+	unsigned int io_size_percent;
 	unsigned int fill_device;
 	unsigned int file_append;
 	unsigned long long file_size_low;
@@ -381,6 +382,7 @@ struct thread_options_pack {
 	uint64_t size;
 	uint64_t io_size;
 	uint32_t size_percent;
+	uint32_t io_size_percent;
 	uint32_t fill_device;
 	uint32_t file_append;
 	uint32_t unique_filename;
@@ -460,6 +462,8 @@ struct thread_options_pack {
 	struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
 	uint32_t zone_split_nr[DDIR_RWDIR_CNT];
 
+	uint8_t pad1[4];
+
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
 	fio_fp64_t gauss_dev;



* Recent changes (master)
@ 2020-09-04 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3412afb7b365b97ba515df9c72dfc89bf75aca0a:

  t/zbd: Remove unnecessary option for zbc_reset_zone (2020-09-01 08:37:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2dd96cc46fa83a73acc1c9238c3ac59203e10213:

  engines/io_uring: use the atomic load acquire instead of a barrier (2020-09-03 08:49:51 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      engines/io_uring: move sqe clear out of hot path
      t/io_uring: allow setting fixed files/buffers as arguments
      engines/io_uring: use the atomic load acquire instead of a barrier

 engines/io_uring.c | 20 +++++++++++++++-----
 t/io_uring.c       | 10 ++++++++--
 2 files changed, 23 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ec8cb18a..ca5b90c9 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -218,9 +218,6 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 
 	sqe = &ld->sqes[io_u->index];
 
-	/* zero out fields not used in this submission */
-	memset(sqe, 0, sizeof(*sqe));
-
 	if (o->registerfiles) {
 		sqe->fd = f->engine_pos;
 		sqe->flags = IOSQE_FIXED_FILE;
@@ -262,13 +259,18 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		if (ld->ioprio_set)
 			sqe->ioprio |= td->o.ioprio;
 		sqe->off = io_u->offset;
+		sqe->rw_flags = 0;
 	} else if (ddir_sync(io_u->ddir)) {
+		sqe->ioprio = 0;
 		if (io_u->ddir == DDIR_SYNC_FILE_RANGE) {
 			sqe->off = f->first_write;
 			sqe->len = f->last_write - f->first_write;
 			sqe->sync_range_flags = td->o.sync_file_range;
 			sqe->opcode = IORING_OP_SYNC_FILE_RANGE;
 		} else {
+			sqe->off = 0;
+			sqe->addr = 0;
+			sqe->len = 0;
 			if (io_u->ddir == DDIR_DATASYNC)
 				sqe->fsync_flags |= IORING_FSYNC_DATASYNC;
 			sqe->opcode = IORING_OP_FSYNC;
@@ -444,9 +446,10 @@ static int fio_ioring_commit(struct thread_data *td)
 	 */
 	if (o->sqpoll_thread) {
 		struct io_sq_ring *ring = &ld->sq_ring;
+		unsigned flags;
 
-		read_barrier();
-		if (*ring->flags & IORING_SQ_NEED_WAKEUP)
+		flags = atomic_load_acquire(ring->flags);
+		if (flags & IORING_SQ_NEED_WAKEUP)
 			io_uring_enter(ld, ld->queued, 0,
 					IORING_ENTER_SQ_WAKEUP);
 		ld->queued = 0;
@@ -681,6 +684,13 @@ static int fio_ioring_post_init(struct thread_data *td)
 		return 1;
 	}
 
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct io_uring_sqe *sqe;
+
+		sqe = &ld->sqes[i];
+		memset(sqe, 0, sizeof(*sqe));
+	}
+
 	if (o->registerfiles) {
 		err = fio_ioring_register_files(td);
 		if (err) {
diff --git a/t/io_uring.c b/t/io_uring.c
index 8d258136..044f9195 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -535,7 +535,7 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
-	while ((opt = getopt(argc, argv, "d:s:c:b:p:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:B:F:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
@@ -552,6 +552,12 @@ int main(int argc, char *argv[])
 		case 'p':
 			polled = !!atoi(optarg);
 			break;
+		case 'B':
+			fixedbufs = !!atoi(optarg);
+			break;
+		case 'F':
+			register_files = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -628,7 +634,7 @@ int main(int argc, char *argv[])
 		printf("ring setup failed: %s, %d\n", strerror(errno), err);
 		return 1;
 	}
-	printf("polled=%d, fixedbufs=%d, buffered=%d", polled, fixedbufs, buffered);
+	printf("polled=%d, fixedbufs=%d, register_files=%d, buffered=%d", polled, fixedbufs, register_files, buffered);
 	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", depth, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);



* Recent changes (master)
@ 2020-09-02 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 20c7a244e75e4aa705a31a74e7067de4c890dff7:

  options: flow should parse as FIO_OPT_INT (2020-08-31 09:07:12 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3412afb7b365b97ba515df9c72dfc89bf75aca0a:

  t/zbd: Remove unnecessary option for zbc_reset_zone (2020-09-01 08:37:45 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (7):
      zbd: Decrement open zones count at write command completion
      oslib/linux-blkzoned: Allow reset zone before file set up
      zbd: Initialize open zones list referring zone status at fio start
      t/zbd: Improve usage message of test-zbd-support script
      t/zbd: Add -o option to t/zbd/test-zoned-support
      t/zbd: Reset all zones before test when max open zones is specified
      t/zbd: Remove unnecessary option for zbc_reset_zone

 io_u.c                 |   4 +-
 io_u.h                 |   5 +-
 ioengines.c            |   4 +-
 oslib/linux-blkzoned.c |  18 +++++--
 t/zbd/functions        |   2 +-
 t/zbd/test-zbd-support | 122 ++++++++++++++++++++++++++++++++++++++++-------
 zbd.c                  | 127 +++++++++++++++++++++++++++++++++++++++----------
 zbd.h                  |   9 ++--
 8 files changed, 236 insertions(+), 55 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 155d0a32..f30fc037 100644
--- a/io_u.c
+++ b/io_u.c
@@ -795,7 +795,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	const bool needs_lock = td_async_processing(td);
 
-	zbd_put_io_u(io_u);
+	zbd_put_io_u(td, io_u);
 
 	if (td->parent)
 		td = td->parent;
@@ -1369,7 +1369,7 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		if (!fill_io_u(td, io_u))
 			break;
 
-		zbd_put_io_u(io_u);
+		zbd_put_io_u(td, io_u);
 
 		put_file_log(td, f);
 		td_io_close_file(td, f);
diff --git a/io_u.h b/io_u.h
index 31100928..5a28689c 100644
--- a/io_u.h
+++ b/io_u.h
@@ -101,13 +101,14 @@ struct io_u {
 	 * @success == true means that the I/O operation has been queued or
 	 * completed successfully.
 	 */
-	void (*zbd_queue_io)(struct io_u *, int q, bool success);
+	void (*zbd_queue_io)(struct thread_data *td, struct io_u *, int q,
+			     bool success);
 
 	/*
 	 * ZBD mode zbd_put_io callback: called in after completion of an I/O
 	 * or commit of an async I/O to unlock the I/O target zone.
 	 */
-	void (*zbd_put_io)(const struct io_u *);
+	void (*zbd_put_io)(struct thread_data *td, const struct io_u *);
 
 	/*
 	 * Callback for io completion
diff --git a/ioengines.c b/ioengines.c
index 1c5970a4..476df58d 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -352,7 +352,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	ret = td->io_ops->queue(td, io_u);
-	zbd_queue_io_u(io_u, ret);
+	zbd_queue_io_u(td, io_u, ret);
 
 	unlock_file(td, io_u->file);
 
@@ -394,7 +394,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
-		zbd_put_io_u(io_u);
+		zbd_put_io_u(td, io_u);
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 6fe78b9c..0a8a577a 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -222,9 +222,21 @@ int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
 		.sector         = offset >> 9,
 		.nr_sectors     = length >> 9,
 	};
+	int fd, ret = 0;
+
+	/* If the file is not yet opened, open it for this function. */
+	fd = f->fd;
+	if (fd < 0) {
+		fd = open(f->file_name, O_RDWR | O_LARGEFILE);
+		if (fd < 0)
+			return -errno;
+	}
 
-	if (ioctl(f->fd, BLKRESETZONE, &zr) < 0)
-		return -errno;
+	if (ioctl(fd, BLKRESETZONE, &zr) < 0)
+		ret = -errno;
 
-	return 0;
+	if (f->fd < 0)
+		close(fd);
+
+	return ret;
 }
diff --git a/t/zbd/functions b/t/zbd/functions
index 81b6f3f7..1a64a215 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -185,7 +185,7 @@ reset_zone() {
 	fi
     else
 	if [ "$offset" -lt 0 ]; then
-	    ${zbc_reset_zone} -all "$dev" "${offset}" >/dev/null
+	    ${zbc_reset_zone} -all "$dev" >/dev/null
 	else
 	    ${zbc_reset_zone} -sector "$dev" "${offset}" >/dev/null
 	fi
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 139495d3..248423bb 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -5,7 +5,16 @@
 # This file is released under the GPL.
 
 usage() {
-    echo "Usage: $(basename "$0") [-d] [-e] [-l] [-r] [-v] [-t <test>] [-z] <SMR drive device node>"
+	echo "Usage: $(basename "$0") [OPTIONS] <test target device file>"
+	echo "Options:"
+	echo -e "\t-d Run fio with valgrind using DRD tool"
+	echo -e "\t-e Run fio with valgrind using helgrind tool"
+	echo -e "\t-v Run fio with valgrind --read-var-info option"
+	echo -e "\t-l Test with libzbc ioengine"
+	echo -e "\t-r Reset all zones before test start"
+	echo -e "\t-o <max_open_zones> Run fio with max_open_zones limit"
+	echo -e "\t-t <test #> Run only a single test case with specified number"
+	echo -e "\t-z Run fio with debug=zbd option"
 }
 
 max() {
@@ -95,14 +104,41 @@ is_scsi_device() {
     return 1
 }
 
+job_var_opts_exclude() {
+	local o
+	local ex_key="${1}"
+
+	for o in "${job_var_opts[@]}"; do
+		if [[ ${o} =~ "${ex_key}" ]]; then
+			continue
+		fi
+		echo -n "${o}"
+	done
+}
+
+has_max_open_zones() {
+	while (($# > 1)); do
+		if [[ ${1} =~ "--max_open_zones" ]]; then
+			return 0
+		fi
+		shift
+	done
+	return 1
+}
+
 run_fio() {
     local fio opts
 
     fio=$(dirname "$0")/../../fio
 
-    opts=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
-	  "--significant_figures=10" "$@")
-    opts+=(${var_opts[@]})
+    opts=(${global_var_opts[@]})
+    opts+=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
+			   "--significant_figures=10" "$@")
+    # When max_open_zones option is specified to this test script, add
+    # max_open_zones option to the fio command unless the test case already adds it.
+    if [[ -n ${max_open_zones_opt} ]] && ! has_max_open_zones "${opts[@]}"; then
+	    opts+=("--max_open_zones=${max_open_zones_opt}")
+    fi
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
     "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
@@ -120,13 +156,16 @@ write_and_run_one_fio_job() {
     local r
     local write_offset="${1}"
     local write_size="${2}"
+    local -a write_opts
 
     shift 2
     r=$(((RANDOM << 16) | RANDOM))
-    run_fio --filename="$dev" --randseed="$r"  --name="write_job" --rw=write \
-	    "$(ioengine "psync")" --bs="${logical_block_size}" \
-	    --zonemode=zbd --zonesize="${zone_size}" --thread=1 --direct=1 \
-	    --offset="${write_offset}" --size="${write_size}" \
+    write_opts=(--name="write_job" --rw=write "$(ioengine "psync")" \
+		      --bs="${logical_block_size}" --zonemode=zbd \
+		      --zonesize="${zone_size}" --thread=1 --direct=1 \
+		      --offset="${write_offset}" --size="${write_size}")
+    write_opts+=("${job_var_opts[@]}")
+    run_fio --filename="$dev" --randseed="$r" "${write_opts[@]}" \
 	    --name="$dev" --wait_for="write_job" "$@" --thread=1 --direct=1
 }
 
@@ -142,6 +181,15 @@ run_fio_on_seq() {
     run_one_fio_job "${opts[@]}" "$@"
 }
 
+# Prepare for write test by resetting zones. When max_open_zones option is
+# specified, reset all zones of the test target to ensure that zones out of the
+# test target range do not have open zones. This allows the write test to the
+# target range to be able to open zones up to max_open_zones.
+prep_write() {
+	[[ -n "${max_open_zones_opt}" && -n "${is_zbd}" ]] &&
+		reset_zone "${dev}" -1
+}
+
 # Check whether buffered writes are refused.
 test1() {
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
@@ -213,6 +261,7 @@ test4() {
 test5() {
     local size off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
@@ -228,6 +277,7 @@ test5() {
 test6() {
     local size off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
@@ -246,6 +296,7 @@ test7() {
     local size=$((zone_size))
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
@@ -260,6 +311,7 @@ test7() {
 test8() {
     local size off capacity
 
+    prep_write
     size=$((4 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
@@ -280,6 +332,7 @@ test9() {
 	return 0
     fi
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq --ioengine=sg					\
 		   --iodepth=1 --rw=randwrite --bs=16K			\
@@ -298,6 +351,7 @@ test10() {
 	return 0
     fi
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq --ioengine=sg 					\
 		   --iodepth=64 --rw=randwrite --bs=16K			\
@@ -311,6 +365,7 @@ test10() {
 test11() {
     local size off capacity
 
+    prep_write
     size=$((4 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 4 $off $dev)
@@ -325,6 +380,7 @@ test11() {
 test12() {
     local size off capacity
 
+    prep_write
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -339,6 +395,7 @@ test12() {
 test13() {
     local size off capacity
 
+    prep_write
     size=$((8 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 8 $off $dev)
@@ -354,6 +411,7 @@ test13() {
 test14() {
     local size
 
+    prep_write
     size=$((16 * 2**20)) # 16 MiB
     if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
 	echo "$dev does not have enough sequential zones" \
@@ -378,6 +436,7 @@ test15() {
 	    reset_zone "$dev" $((first_sequential_zone_sector +
 				 i*sectors_per_zone))
     done
+    prep_write
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
     w_capacity=$(total_zone_capacity 2 $w_off $dev)
@@ -402,6 +461,7 @@ test16() {
 	    reset_zone "$dev" $((first_sequential_zone_sector +
 				 i*sectors_per_zone))
     done
+    prep_write
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
     w_capacity=$(total_zone_capacity 2 $w_off $dev)
@@ -424,6 +484,7 @@ test17() {
     if [ -n "$is_zbd" ]; then
 	reset_zone "$dev" $((off / 512)) || return $?
     fi
+    prep_write
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
 		    --zonemode=zbd --zonesize="${zone_size}"		\
 		    --offset=$off --loops=2 --norandommap=1\
@@ -477,6 +538,7 @@ test24() {
     local bs loops=9 size=$((zone_size))
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
 
@@ -499,12 +561,13 @@ test25() {
         [ -n "$is_zbd" ] &&
 	    reset_zone "$dev" $((first_sequential_zone_sector + i*sectors_per_zone))
     done
+    prep_write
     for ((i=0;i<16;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
 	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
 	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=write" "--bs=16K")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
@@ -513,6 +576,7 @@ write_to_first_seq_zone() {
     local loops=4 r
     local off capacity
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
 
@@ -542,6 +606,7 @@ test28() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write
     opts=("--debug=zbd")
     capacity=$(total_zone_capacity 1 $off $dev)
     for ((i=0;i<jobs;i++)); do
@@ -549,7 +614,7 @@ test28() {
 	opts+=("--size=$zone_size" "--io_size=$capacity" "$(ioengine "psync")" "--rw=randwrite")
 	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
 	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * $capacity)) || return $?
@@ -565,7 +630,7 @@ test29() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write
     opts=("--debug=zbd")
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
@@ -573,7 +638,8 @@ test29() {
 	opts+=("$(ioengine "psync")" "--rw=randwrite" "--direct=1")
 	opts+=("--max_open_zones=4" "--group_reporting=1")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-	opts+=(${var_opts[@]})
+	# max_open_zones is already specified
+	opts+=($(job_var_opts_exclude "--max_open_zones"))
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * zone_size)) || return $?
@@ -583,6 +649,7 @@ test29() {
 test30() {
     local off
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
 		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
@@ -596,6 +663,7 @@ test30() {
 test31() {
     local bs inc nz off opts size
 
+    prep_write
     # Start with writing 128 KB to 128 sequential zones.
     bs=128K
     nz=128
@@ -609,7 +677,7 @@ test31() {
 	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
 	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
-	opts+=(${var_opts[@]})
+	opts+=(${job_var_opts[@]})
     done
     "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
     # Next, run the test.
@@ -619,6 +687,7 @@ test31() {
     opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
     opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
     opts+=("--zonesize=${zone_size}")
+    opts+=(${job_var_opts[@]})
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
@@ -627,6 +696,7 @@ test31() {
 test32() {
     local off opts=() size
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
@@ -643,6 +713,7 @@ test33() {
     local bs io_size size
     local off capacity=0;
 
+    prep_write
     off=$((first_sequential_zone_sector * 512))
     capacity=$(total_zone_capacity 1 $off $dev)
     size=$((2 * zone_size))
@@ -659,6 +730,7 @@ test33() {
 test34() {
     local size
 
+    prep_write
     size=$((2 * zone_size))
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
 		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
@@ -670,6 +742,7 @@ test34() {
 test35() {
     local bs off io_size size
 
+    prep_write
     off=$(((first_sequential_zone_sector + 1) * 512))
     size=$((zone_size - 2 * 512))
     bs=$((zone_size / 4))
@@ -684,6 +757,7 @@ test35() {
 test36() {
     local bs off io_size size
 
+    prep_write
     off=$(((first_sequential_zone_sector) * 512))
     size=$((zone_size - 512))
     bs=$((zone_size / 4))
@@ -698,6 +772,7 @@ test36() {
 test37() {
     local bs off size capacity
 
+    prep_write
     capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
     if [ "$first_sequential_zone_sector" = 0 ]; then
 	off=0
@@ -717,6 +792,7 @@ test37() {
 test38() {
     local bs off size
 
+    prep_write
     size=$((logical_block_size))
     off=$((disk_size - logical_block_size))
     bs=$((logical_block_size))
@@ -787,6 +863,7 @@ test45() {
     local bs i
 
     [ -z "$is_zbd" ] && return 0
+    prep_write
     bs=$((logical_block_size))
     run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
 		    --offset=$((first_sequential_zone_sector * 512)) \
@@ -799,6 +876,7 @@ test45() {
 test46() {
     local size
 
+    prep_write
     size=$((4 * zone_size))
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=4K \
 		   --group_reporting=1 --numjobs=8 \
@@ -810,6 +888,7 @@ test46() {
 test47() {
     local bs
 
+    prep_write
     bs=$((logical_block_size))
     run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
@@ -824,7 +903,7 @@ test48() {
 
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     size=$((16*zone_size))
-    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    prep_write
     opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
     opts+=("--debug=zbd")
     opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")
@@ -835,6 +914,8 @@ test48() {
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
 	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
 	opts+=("--group_reporting=1")
+	# max_open_zones is already specified
+	opts+=($(job_var_opts_exclude "--max_open_zones"))
     done
 
     fio=$(dirname "$0")/../../fio
@@ -872,6 +953,7 @@ dynamic_analyzer=()
 reset_all_zones=
 use_libzbc=
 zbd_debug=
+max_open_zones_opt=
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -883,6 +965,7 @@ while [ "${1#-}" != "$1" ]; do
     -l) use_libzbc=1; shift;;
     -r) reset_all_zones=1; shift;;
     -t) tests+=("$2"); shift; shift;;
+    -o) max_open_zones_opt="${2}"; shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
 	shift;;
     -z) zbd_debug=1; shift;;
@@ -898,9 +981,10 @@ fi
 # shellcheck source=functions
 source "$(dirname "$0")/functions" || exit $?
 
-var_opts=()
+global_var_opts=()
+job_var_opts=()
 if [ -n "$zbd_debug" ]; then
-    var_opts+=("--debug=zbd")
+    global_var_opts+=("--debug=zbd")
 fi
 dev=$1
 realdev=$(readlink -f "$dev")
@@ -986,6 +1070,12 @@ elif [[ -c "$realdev" ]]; then
 	fi
 fi
 
+if [[ -n ${max_open_zones_opt} ]]; then
+	# Override max_open_zones with the script option value
+	max_open_zones="${max_open_zones_opt}"
+	job_var_opts+=("--max_open_zones=${max_open_zones_opt}")
+fi
+
 echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
 echo " zone size: $((zone_size >> 20)) MB"
 
diff --git a/zbd.c b/zbd.c
index 584d3640..e8ecbb6f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -628,6 +628,11 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 	return ret;
 }
 
+static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
+			  uint32_t zone_idx);
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			  struct fio_zone_info *z);
+
 int zbd_setup_files(struct thread_data *td)
 {
 	struct fio_file *f;
@@ -651,6 +656,8 @@ int zbd_setup_files(struct thread_data *td)
 
 	for_each_file(td, f, i) {
 		struct zoned_block_device_info *zbd = f->zbd_info;
+		struct fio_zone_info *z;
+		int zi;
 
 		if (!zbd)
 			continue;
@@ -666,6 +673,23 @@ int zbd_setup_files(struct thread_data *td)
 			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
 			return 1;
 		}
+
+		for (zi = f->min_zone; zi < f->max_zone; zi++) {
+			z = &zbd->zone_info[zi];
+			if (z->cond != ZBD_ZONE_COND_IMP_OPEN &&
+			    z->cond != ZBD_ZONE_COND_EXP_OPEN)
+				continue;
+			if (zbd_open_zone(td, f, zi))
+				continue;
+			/*
+			 * If the number of open zones exceeds specified limits,
+			 * reset all extra open zones.
+			 */
+			if (zbd_reset_zone(td, f, z) < 0) {
+				log_err("Failed to reset zone %d\n", zi);
+				return 1;
+			}
+		}
 	}
 
 	return 0;
@@ -722,12 +746,21 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 
 /* The caller must hold f->zbd_info->mutex */
 static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   unsigned int open_zone_idx)
+			   unsigned int zone_idx)
 {
-	uint32_t zone_idx;
+	uint32_t open_zone_idx = 0;
 
-	assert(open_zone_idx < f->zbd_info->num_open_zones);
-	zone_idx = f->zbd_info->open_zones[open_zone_idx];
+	for (; open_zone_idx < f->zbd_info->num_open_zones; open_zone_idx++) {
+		if (f->zbd_info->open_zones[open_zone_idx] == zone_idx)
+			break;
+	}
+	if (open_zone_idx == f->zbd_info->num_open_zones) {
+		dprint(FD_ZBD, "%s: zone %d is not open\n",
+		       f->file_name, zone_idx);
+		return;
+	}
+
+	dprint(FD_ZBD, "%s: closing zone %d\n", f->file_name, zone_idx);
 	memmove(f->zbd_info->open_zones + open_zone_idx,
 		f->zbd_info->open_zones + open_zone_idx + 1,
 		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
@@ -766,13 +799,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			continue;
 		zone_lock(td, f, z);
 		if (all_zones) {
-			unsigned int i;
-
 			pthread_mutex_lock(&f->zbd_info->mutex);
-			for (i = 0; i < f->zbd_info->num_open_zones; i++) {
-				if (f->zbd_info->open_zones[i] == nz)
-					zbd_close_zone(td, f, i);
-			}
+			zbd_close_zone(td, f, nz);
 			pthread_mutex_unlock(&f->zbd_info->mutex);
 
 			reset_wp = z->wp != z->start;
@@ -933,11 +961,10 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
  * was not yet open and opening a new zone would cause the zone limit to be
  * exceeded.
  */
-static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
+static bool zbd_open_zone(struct thread_data *td, const struct fio_file *f,
 			  uint32_t zone_idx)
 {
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
-	const struct fio_file *f = io_u->file;
 	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
 	bool res = true;
 
@@ -952,8 +979,15 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
 		return false;
 
 	pthread_mutex_lock(&f->zbd_info->mutex);
-	if (is_zone_open(td, f, zone_idx))
+	if (is_zone_open(td, f, zone_idx)) {
+		/*
+		 * If the zone is already open and going to be full by writes
+		 * in-flight, handle it as a full zone instead of an open zone.
+		 */
+		if (z->wp >= zbd_zone_capacity_end(z))
+			res = false;
 		goto out;
+	}
 	res = false;
 	/* Zero means no limit */
 	if (td->o.job_max_open_zones > 0 &&
@@ -995,6 +1029,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	unsigned int open_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
 	int i;
+	bool wait_zone_close;
 
 	assert(is_valid_offset(f, io_u->offset));
 
@@ -1030,11 +1065,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
 			goto examine_zone;
 		if (f->zbd_info->num_open_zones == 0) {
-			pthread_mutex_unlock(&f->zbd_info->mutex);
-			pthread_mutex_unlock(&z->mutex);
 			dprint(FD_ZBD, "%s(%s): no zones are open\n",
 			       __func__, f->file_name);
-			return NULL;
+			goto open_other_zone;
 		}
 
 		/*
@@ -1081,14 +1114,30 @@ examine_zone:
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 		goto out;
 	}
-	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
-	       zone_idx);
-	if (td->o.max_open_zones || td->o.job_max_open_zones)
-		zbd_close_zone(td, f, open_zone_idx);
+
+open_other_zone:
+	/* Check if number of open zones reaches one of limits. */
+	wait_zone_close =
+		f->zbd_info->num_open_zones == f->max_zone - f->min_zone ||
+		(td->o.max_open_zones &&
+		 f->zbd_info->num_open_zones == td->o.max_open_zones) ||
+		(td->o.job_max_open_zones &&
+		 td->num_open_zones == td->o.job_max_open_zones);
+
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 
 	/* Only z->mutex is held. */
 
+	/*
+	 * When number of open zones reaches to one of limits, wait for
+	 * zone close before opening a new zone.
+	 */
+	if (wait_zone_close) {
+		dprint(FD_ZBD, "%s(%s): quiesce to allow open zones to close\n",
+		       __func__, f->file_name);
+		io_u_quiesce(td);
+	}
+
 	/* Zone 'z' is full, so try to open a new zone. */
 	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
 		zone_idx++;
@@ -1103,7 +1152,7 @@ examine_zone:
 		zone_lock(td, f, z);
 		if (z->open)
 			continue;
-		if (zbd_open_zone(td, io_u, zone_idx))
+		if (zbd_open_zone(td, f, zone_idx))
 			goto out;
 	}
 
@@ -1146,7 +1195,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	const struct fio_file *f = io_u->file;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 
-	if (!zbd_open_zone(td, io_u, z - f->zbd_info->zone_info)) {
+	if (!zbd_open_zone(td, f, z - f->zbd_info->zone_info)) {
 		pthread_mutex_unlock(&z->mutex);
 		z = zbd_convert_to_open_zone(td, io_u);
 		assert(z);
@@ -1203,6 +1252,28 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	return NULL;
 }
 
+/**
+ * zbd_end_zone_io - update zone status at command completion
+ * @io_u: I/O unit
+ * @z: zone info pointer
+ *
+ * If the write command made the zone full, close it.
+ *
+ * The caller must hold z->mutex.
+ */
+static void zbd_end_zone_io(struct thread_data *td, const struct io_u *io_u,
+			    struct fio_zone_info *z)
+{
+	const struct fio_file *f = io_u->file;
+
+	if (io_u->ddir == DDIR_WRITE &&
+	    io_u->offset + io_u->buflen >= zbd_zone_capacity_end(z)) {
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		zbd_close_zone(td, f, z - f->zbd_info->zone_info);
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+	}
+}
+
 /**
  * zbd_queue_io - update the write pointer of a sequential zone
  * @io_u: I/O unit
@@ -1212,7 +1283,8 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
  * For write and trim operations, update the write pointer of the I/O unit
  * target zone.
  */
-static void zbd_queue_io(struct io_u *io_u, int q, bool success)
+static void zbd_queue_io(struct thread_data *td, struct io_u *io_u, int q,
+			 bool success)
 {
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
@@ -1258,6 +1330,9 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
 		break;
 	}
 
+	if (q == FIO_Q_COMPLETED && !io_u->error)
+		zbd_end_zone_io(td, io_u, z);
+
 unlock:
 	if (!success || q != FIO_Q_QUEUED) {
 		/* BUSY or COMPLETED: unlock the zone */
@@ -1270,7 +1345,7 @@ unlock:
  * zbd_put_io - Unlock an I/O unit target zone lock
  * @io_u: I/O unit
  */
-static void zbd_put_io(const struct io_u *io_u)
+static void zbd_put_io(struct thread_data *td, const struct io_u *io_u)
 {
 	const struct fio_file *f = io_u->file;
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
@@ -1292,6 +1367,8 @@ static void zbd_put_io(const struct io_u *io_u)
 	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
 	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
 
+	zbd_end_zone_io(td, io_u, z);
+
 	ret = pthread_mutex_unlock(&z->mutex);
 	assert(ret == 0);
 	zbd_check_swd(f);
@@ -1527,7 +1604,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	case DDIR_WRITE:
 		if (io_u->buflen > f->zbd_info->zone_size)
 			goto eof;
-		if (!zbd_open_zone(td, io_u, zone_idx_b)) {
+		if (!zbd_open_zone(td, f, zone_idx_b)) {
 			pthread_mutex_unlock(&zb->mutex);
 			zb = zbd_convert_to_open_zone(td, io_u);
 			if (!zb)
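After this change, zbd_close_zone() takes a zone index and searches the open_zones array for it, returning early if the zone is not open. Previously it took a slot in that array and asserted that the slot was valid. Because the search is now built in, zbd_reset_zones() and the new zbd_end_zone_io() can call it without scanning first. The following is a minimal standalone sketch of that lookup-then-compact step. `struct open_zone_list`, `open_list_close()` and `MAX_OPEN_ZONES` are hypothetical stand-ins for `zbd_info`, `zbd_close_zone()` and `ZBD_MAX_OPEN_ZONES`. No locking is modelled; fio requires the caller to hold `f->zbd_info->mutex`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ZBD_MAX_OPEN_ZONES. */
#define MAX_OPEN_ZONES 8

/* Hypothetical open-zone bookkeeping, loosely modelled on zbd_info. */
struct open_zone_list {
	uint32_t zones[MAX_OPEN_ZONES]; /* zone indices, densely packed */
	unsigned int num;               /* number of valid entries */
};

/*
 * Close a zone given its zone index, not its slot in the array.
 * Returns false (and changes nothing) if the zone is not open.
 */
static bool open_list_close(struct open_zone_list *l, uint32_t zone_idx)
{
	unsigned int slot;

	assert(l->num <= MAX_OPEN_ZONES);

	for (slot = 0; slot < l->num; slot++)
		if (l->zones[slot] == zone_idx)
			break;
	if (slot == l->num)
		return false;

	/* Shift the live tail down by one so the array stays dense. */
	memmove(&l->zones[slot], &l->zones[slot + 1],
		(l->num - (slot + 1)) * sizeof(l->zones[0]));
	l->num--;
	return true;
}
```

For example, closing zone 9 in the list {4, 9, 17} leaves {4, 17}, and a second close of zone 9 is a harmless no-op. The sketch moves only the `num - (slot + 1)` live entries. The fio hunk moves the whole remainder of the fixed-size array, which leaves the live entries in the same state.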
diff --git a/zbd.h b/zbd.h
index 021174c1..bff55f99 100644
--- a/zbd.h
+++ b/zbd.h
@@ -98,18 +98,19 @@ static inline void zbd_close_file(struct fio_file *f)
 		zbd_free_zone_info(f);
 }
 
-static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
+static inline void zbd_queue_io_u(struct thread_data *td, struct io_u *io_u,
+				  enum fio_q_status status)
 {
 	if (io_u->zbd_queue_io) {
-		io_u->zbd_queue_io(io_u, status, io_u->error == 0);
+		io_u->zbd_queue_io(td, io_u, status, io_u->error == 0);
 		io_u->zbd_queue_io = NULL;
 	}
 }
 
-static inline void zbd_put_io_u(struct io_u *io_u)
+static inline void zbd_put_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	if (io_u->zbd_put_io) {
-		io_u->zbd_put_io(io_u);
+		io_u->zbd_put_io(td, io_u);
 		io_u->zbd_queue_io = NULL;
 		io_u->zbd_put_io = NULL;
 	}
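Two new decisions drive the zone bookkeeping in this patch. First, zbd_end_zone_io() closes a zone once a completed write reaches zbd_zone_capacity_end(z). Second, zbd_convert_to_open_zone() computes wait_zone_close, which is true when the file's open-zone count equals the number of zones in its range or equals max_open_zones, or when the job's count equals job_max_open_zones. In that case it calls io_u_quiesce() before trying another zone. Both predicates can be sketched as pure functions. This sketch assumes the capacity end is the zone start plus its writable capacity; `struct zone_limits`, `open_limit_reached()` and `write_fills_zone()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the counters consulted by wait_zone_close. */
struct zone_limits {
	unsigned int file_open;      /* like f->zbd_info->num_open_zones */
	unsigned int job_open;       /* like td->num_open_zones */
	unsigned int zones_in_range; /* like f->max_zone - f->min_zone */
	unsigned int max_open;       /* max_open_zones; 0 means no limit */
	unsigned int job_max_open;   /* job_max_open_zones; 0 means no limit */
};

/* True when any open-zone limit is reached, so no new zone can be opened yet. */
static bool open_limit_reached(const struct zone_limits *s)
{
	assert(s->file_open <= s->zones_in_range);

	return s->file_open == s->zones_in_range ||
	       (s->max_open && s->file_open == s->max_open) ||
	       (s->job_max_open && s->job_open == s->job_max_open);
}

/*
 * True when a write of len bytes at offset reaches the end of the zone's
 * writable capacity (assumed here to be zone_start + zone_capacity).
 */
static bool write_fills_zone(uint64_t offset, uint64_t len,
			     uint64_t zone_start, uint64_t zone_capacity)
{
	return offset + len >= zone_start + zone_capacity;
}
```

In the patch itself, the quiesce happens only after f->zbd_info->mutex has been released, so the job still holds z->mutex while in-flight writes drain.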


* Recent changes (master)
@ 2020-09-01 12:00 Jens Axboe
From: Jens Axboe @ 2020-09-01 12:00 UTC
  To: fio

The following changes since commit e68470637ae7c03d5f85a9243e148d7c46ad4487:

  Update the year to 2020 in os/windows/eula.rtf (2020-08-29 18:54:17 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 20c7a244e75e4aa705a31a74e7067de4c890dff7:

  options: flow should parse as FIO_OPT_INT (2020-08-31 09:07:12 -0600)

----------------------------------------------------------------
David, Bar (2):
      flow: reclaim flow when job is reaped
      flow: add ability for weight-based flow control on multiple jobs

Jens Axboe (2):
      Merge branch 'multi_job_flow' of https://github.com/bardavid/fio into master
      options: flow should parse as FIO_OPT_INT

 HOWTO                     |   9 +---
 arch/arch.h               |   7 +++
 backend.c                 |   1 +
 cconv.c                   |   6 +--
 examples/butterfly.fio    |   2 +-
 examples/flow.fio         |   5 +-
 fio.1                     |  23 ++++----
 fio.h                     |   1 +
 flow.c                    |  50 ++++++++++++------
 flow.h                    |   2 +
 options.c                 |  10 +---
 server.h                  |   2 +-
 t/jobs/t0011-5d2788d5.fio |   4 +-
 t/jobs/t0012.fio          |  23 ++++----
 t/jobs/t0014.fio          |  29 +++++++++++
 t/run-fio-tests.py        | 130 ++++++++++++++++++++++++++++++++++++++++++++--
 thread_options.h          |  24 +++++----
 17 files changed, 250 insertions(+), 78 deletions(-)
 create mode 100644 t/jobs/t0014.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e0403b08..5dc571f8 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2861,15 +2861,10 @@ Threads, processes and job synchronization
 	``flow=8`` and another job has ``flow=-1``, then there will be a roughly 1:8
 	ratio in how much one runs vs the other.
 
-.. option:: flow_watermark=int
-
-	The maximum value that the absolute value of the flow counter is allowed to
-	reach before the job must wait for a lower value of the counter.
-
 .. option:: flow_sleep=int
 
-	The period of time, in microseconds, to wait after the flow watermark has
-	been exceeded before retrying operations.
+	The period of time, in microseconds, to wait after the flow counter
+	has exceeded its proportion before retrying operations.
 
 .. option:: stonewall, wait_for_previous
 
diff --git a/arch/arch.h b/arch/arch.h
index 08c3d703..a25779d4 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -36,6 +36,13 @@ extern unsigned long arch_flags;
 
 #define ARCH_CPU_CLOCK_WRAPS
 
+#define atomic_add(p, v)					\
+	atomic_fetch_add((_Atomic typeof(*(p)) *)(p), v)
+#define atomic_sub(p, v)					\
+	atomic_fetch_sub((_Atomic typeof(*(p)) *)(p), v)
+#define atomic_load_relaxed(p)					\
+	atomic_load_explicit((_Atomic typeof(*(p)) *)(p),	\
+			     memory_order_relaxed)
 #define atomic_load_acquire(p)					\
 	atomic_load_explicit((_Atomic typeof(*(p)) *)(p),	\
 			     memory_order_acquire)
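The new atomic_add(), atomic_sub() and atomic_load_relaxed() macros cast a plain pointer to an `_Atomic`-qualified pointer of the same type and then call the C11 `<stdatomic.h>` generic functions. flow.c uses them on the shared `flow_counter` and `total_weight`. That cast assumes the atomic and plain types have the same size and alignment, which C11 permits but does not guarantee. The following is a more portable sketch of the same operations that declares the fields `_Atomic` up front. `struct shared_flow` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared flow state, declared _Atomic instead of cast at each use. */
struct shared_flow {
	_Atomic unsigned long long counter; /* starts at 1, like flow_get() */
	_Atomic unsigned int total_weight;  /* sum of member weights */
};

static void shared_flow_init(struct shared_flow *f)
{
	atomic_init(&f->counter, 1);
	atomic_init(&f->total_weight, 0);
}

static void shared_flow_join(struct shared_flow *f, unsigned int weight)
{
	atomic_fetch_add(&f->total_weight, weight);
}

/* Returns the value *before* the increment, as atomic_fetch_add() does. */
static unsigned long long shared_flow_tick(struct shared_flow *f)
{
	return atomic_fetch_add(&f->counter, 1);
}

/* Relaxed ordering: only the counter's value is needed, not ordering with other data. */
static unsigned long long shared_flow_count(struct shared_flow *f)
{
	return atomic_load_explicit(&f->counter, memory_order_relaxed);
}

/* Drop a departing member's contribution, as flow_put() does. */
static void shared_flow_leave(struct shared_flow *f, unsigned long long done,
			      unsigned int weight)
{
	atomic_fetch_sub(&f->counter, done);
	atomic_fetch_sub(&f->total_weight, weight);
	assert(atomic_load(&f->counter) >= 1);
}
```

With a single member of weight 3 that ticks twice, the shared counter goes from 1 to 3. After that member leaves, the counter returns to 1, the same value flow_put() asserts when the last reference is dropped.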
diff --git a/backend.c b/backend.c
index a4367672..05453ae2 100644
--- a/backend.c
+++ b/backend.c
@@ -2042,6 +2042,7 @@ reaped:
 
 		done_secs += mtime_since_now(&td->epoch) / 1000;
 		profile_td_exit(td);
+		flow_exit_job(td);
 	}
 
 	if (*nr_running == cputhreads && !pending && realthreads)
diff --git a/cconv.c b/cconv.c
index 4b0c3490..5dc0569f 100644
--- a/cconv.c
+++ b/cconv.c
@@ -281,8 +281,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->uid = le32_to_cpu(top->uid);
 	o->gid = le32_to_cpu(top->gid);
 	o->flow_id = __le32_to_cpu(top->flow_id);
-	o->flow = __le32_to_cpu(top->flow);
-	o->flow_watermark = __le32_to_cpu(top->flow_watermark);
+	o->flow = le32_to_cpu(top->flow);
 	o->flow_sleep = le32_to_cpu(top->flow_sleep);
 	o->sync_file_range = le32_to_cpu(top->sync_file_range);
 	o->latency_target = le64_to_cpu(top->latency_target);
@@ -481,8 +480,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->uid = cpu_to_le32(o->uid);
 	top->gid = cpu_to_le32(o->gid);
 	top->flow_id = __cpu_to_le32(o->flow_id);
-	top->flow = __cpu_to_le32(o->flow);
-	top->flow_watermark = __cpu_to_le32(o->flow_watermark);
+	top->flow = cpu_to_le32(o->flow);
 	top->flow_sleep = cpu_to_le32(o->flow_sleep);
 	top->sync_file_range = cpu_to_le32(o->sync_file_range);
 	top->latency_target = __cpu_to_le64(o->latency_target);
diff --git a/examples/butterfly.fio b/examples/butterfly.fio
index 42d253d5..9678aa85 100644
--- a/examples/butterfly.fio
+++ b/examples/butterfly.fio
@@ -15,5 +15,5 @@ flow=2
 
 [backward]
 rw=read:-8k
-flow=-2
+flow=2
 #offset=50%
diff --git a/examples/flow.fio b/examples/flow.fio
index 4b078cf8..e34c6856 100644
--- a/examples/flow.fio
+++ b/examples/flow.fio
@@ -11,15 +11,14 @@ iodepth=256
 size=100g
 bs=8k
 filename=/tmp/testfile
-flow_watermark=100
 flow_sleep=1000
 
 [job2]
 numjobs=1
 rw=write
-flow=-8
+flow=1
 
 [job1]
 numjobs=1
 rw=randread
-flow=1
+flow=8
diff --git a/fio.1 b/fio.1
index 1c90e4a5..f15194ff 100644
--- a/fio.1
+++ b/fio.1
@@ -2549,21 +2549,18 @@ The ID of the flow. If not specified, it defaults to being a global
 flow. See \fBflow\fR.
 .TP
 .BI flow \fR=\fPint
-Weight in token-based flow control. If this value is used, then there is
-a 'flow counter' which is used to regulate the proportion of activity between
-two or more jobs. Fio attempts to keep this flow counter near zero. The
-\fBflow\fR parameter stands for how much should be added or subtracted to the
-flow counter on each iteration of the main I/O loop. That is, if one job has
-`flow=8' and another job has `flow=\-1', then there will be a roughly 1:8
-ratio in how much one runs vs the other.
-.TP
-.BI flow_watermark \fR=\fPint
-The maximum value that the absolute value of the flow counter is allowed to
-reach before the job must wait for a lower value of the counter.
+Weight in token-based flow control. If this value is used,
+then fio regulates the activity between two or more jobs
+sharing the same flow_id.
+Fio attempts to keep each job activity proportional to other jobs' activities
+in the same flow_id group, with respect to requested weight per job.
+That is, if one job has `flow=3', another job has `flow=2'
+and another with `flow=1`, then there will be a roughly 3:2:1 ratio
+in how much one runs vs the others.
 .TP
 .BI flow_sleep \fR=\fPint
-The period of time, in microseconds, to wait after the flow watermark has
-been exceeded before retrying operations.
+The period of time, in microseconds, to wait after the flow counter
+has exceeded its proportion before retrying operations.
 .TP
 .BI stonewall "\fR,\fB wait_for_previous"
 Wait for preceding jobs in the job file to exit, before starting this
diff --git a/fio.h b/fio.h
index 8045c32f..9d189eb8 100644
--- a/fio.h
+++ b/fio.h
@@ -440,6 +440,7 @@ struct thread_data {
 	int first_error;
 
 	struct fio_flow *flow;
+	unsigned long long flow_counter;
 
 	/*
 	 * Can be overloaded by profiles
diff --git a/flow.c b/flow.c
index a8dbfb9b..ee4d761d 100644
--- a/flow.c
+++ b/flow.c
@@ -7,7 +7,8 @@ struct fio_flow {
 	unsigned int refs;
 	struct flist_head list;
 	unsigned int id;
-	long long int flow_counter;
+	unsigned long long flow_counter;
+	unsigned int total_weight;
 };
 
 static struct flist_head *flow_list;
@@ -16,17 +17,23 @@ static struct fio_sem *flow_lock;
 int flow_threshold_exceeded(struct thread_data *td)
 {
 	struct fio_flow *flow = td->flow;
-	long long flow_counter;
+	double flow_counter_ratio, flow_weight_ratio;
 
 	if (!flow)
 		return 0;
 
-	if (td->o.flow > 0)
-		flow_counter = flow->flow_counter;
-	else
-		flow_counter = -flow->flow_counter;
-
-	if (flow_counter > td->o.flow_watermark) {
+	flow_counter_ratio = (double)td->flow_counter /
+		atomic_load_relaxed(&flow->flow_counter);
+	flow_weight_ratio = (double)td->o.flow /
+		atomic_load_relaxed(&flow->total_weight);
+
+	/*
+	 * each thread/process executing a fio job will stall based on the
+	 * expected  user ratio for a given flow_id group. the idea is to keep
+	 * 2 counters, flow and job-specific counter to test if the
+	 * ratio between them is proportional to other jobs in the same flow_id
+	 */
+	if (flow_counter_ratio > flow_weight_ratio) {
 		if (td->o.flow_sleep) {
 			io_u_quiesce(td);
 			usleep(td->o.flow_sleep);
@@ -35,9 +42,13 @@ int flow_threshold_exceeded(struct thread_data *td)
 		return 1;
 	}
 
-	/* No synchronization needed because it doesn't
-	 * matter if the flow count is slightly inaccurate */
-	flow->flow_counter += td->o.flow;
+	/*
+	 * increment flow(shared counter, therefore atomically)
+	 * and job-specific counter
+	 */
+	atomic_add(&flow->flow_counter, 1);
+	++td->flow_counter;
+
 	return 0;
 }
 
@@ -68,7 +79,8 @@ static struct fio_flow *flow_get(unsigned int id)
 		flow->refs = 0;
 		INIT_FLIST_HEAD(&flow->list);
 		flow->id = id;
-		flow->flow_counter = 0;
+		flow->flow_counter = 1;
+		flow->total_weight = 0;
 
 		flist_add_tail(&flow->list, flow_list);
 	}
@@ -78,14 +90,19 @@ static struct fio_flow *flow_get(unsigned int id)
 	return flow;
 }
 
-static void flow_put(struct fio_flow *flow)
+static void flow_put(struct fio_flow *flow, unsigned long long flow_counter,
+				        unsigned int weight)
 {
 	if (!flow_lock)
 		return;
 
 	fio_sem_down(flow_lock);
 
+	atomic_sub(&flow->flow_counter, flow_counter);
+	atomic_sub(&flow->total_weight, weight);
+
 	if (!--flow->refs) {
+		assert(flow->flow_counter == 1);
 		flist_del(&flow->list);
 		sfree(flow);
 	}
@@ -95,14 +112,17 @@ static void flow_put(struct fio_flow *flow)
 
 void flow_init_job(struct thread_data *td)
 {
-	if (td->o.flow)
+	if (td->o.flow) {
 		td->flow = flow_get(td->o.flow_id);
+		td->flow_counter = 0;
+		atomic_add(&td->flow->total_weight, td->o.flow);
+	}
 }
 
 void flow_exit_job(struct thread_data *td)
 {
 	if (td->flow) {
-		flow_put(td->flow);
+		flow_put(td->flow, td->flow_counter, td->o.flow);
 		td->flow = NULL;
 	}
 }
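The reworked flow_threshold_exceeded() stalls a job when its share of the group's iterations (td->flow_counter / flow->flow_counter) exceeds its share of the group's weight (td->o.flow / flow->total_weight). Otherwise it increments both counters. The shared counter starts at 1, so the division is never by zero. flow_put() later subtracts a departing job's count and weight, so the remaining jobs rebalance; this is the behaviour t0014.fio exercises. The following is a single-threaded, round-robin model of that rule. The `sim_*` names are hypothetical, and the model leaves out io_u_quiesce(), flow_sleep and the atomics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of one flow_id group. */
struct sim_flow {
	unsigned long long counter; /* shared counter; starts at 1, never 0 */
	unsigned int total_weight;  /* sum of members' flow= weights */
};

struct sim_job {
	unsigned int weight;        /* this job's flow= value */
	unsigned long long counter; /* iterations this job was allowed */
};

static void sim_join(struct sim_flow *f, struct sim_job *j, unsigned int weight)
{
	j->weight = weight;
	j->counter = 0;
	f->total_weight += weight;
}

/* Like flow_put(): remove the job's share so the others rebalance. */
static void sim_leave(struct sim_flow *f, struct sim_job *j)
{
	f->counter -= j->counter;
	f->total_weight -= j->weight;
}

/* Return 1 if the job must stall; otherwise count one iteration and return 0. */
static int sim_threshold_exceeded(struct sim_flow *f, struct sim_job *j)
{
	double counter_ratio, weight_ratio;

	assert(f->counter > 0 && f->total_weight > 0);
	counter_ratio = (double)j->counter / (double)f->counter;
	weight_ratio = (double)j->weight / (double)f->total_weight;

	if (counter_ratio > weight_ratio)
		return 1;
	f->counter++;
	j->counter++;
	return 0;
}

/* Offer every job one turn per round, in array order. */
static void sim_run(struct sim_flow *f, struct sim_job *jobs, size_t n,
		    unsigned int rounds)
{
	for (unsigned int r = 0; r < rounds; r++)
		for (size_t i = 0; i < n; i++)
			(void)sim_threshold_exceeded(f, &jobs[i]);
}
```

With weights 1, 2 and 3, the per-job counters settle near 1:2:3. If the weight-3 job then leaves, the other two continue at roughly 1:2, which matches the expectations written into t0014.fio. A job is never more than about one iteration above its weighted share, because it stalls as soon as it gets ahead.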
diff --git a/flow.h b/flow.h
index c0a45c3c..95e766de 100644
--- a/flow.h
+++ b/flow.h
@@ -1,6 +1,8 @@
 #ifndef FIO_FLOW_H
 #define FIO_FLOW_H
 
+#define FLOW_MAX_WEIGHT 1000
+
 int flow_threshold_exceeded(struct thread_data *td);
 void flow_init_job(struct thread_data *td);
 void flow_exit_job(struct thread_data *td);
diff --git a/options.c b/options.c
index 251ad2c1..0d64e7c0 100644
--- a/options.c
+++ b/options.c
@@ -4702,20 +4702,14 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.parent	= "flow_id",
 		.hide	= 1,
 		.def	= "0",
+		.maxval	= FLOW_MAX_WEIGHT,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_FLOW,
 	},
 	{
 		.name	= "flow_watermark",
 		.lname	= "I/O flow watermark",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct thread_options, flow_watermark),
-		.help	= "High watermark for flow control. This option"
-			" should be set to the same value for all threads"
-			" with non-zero flow.",
-		.parent	= "flow_id",
-		.hide	= 1,
-		.def	= "1024",
+		.type	= FIO_OPT_SOFT_DEPRECATED,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_FLOW,
 	},
diff --git a/server.h b/server.h
index efa70e7c..3cd60096 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 84,
+	FIO_SERVER_VER			= 85,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/t/jobs/t0011-5d2788d5.fio b/t/jobs/t0011-5d2788d5.fio
index f90cee90..ad11f921 100644
--- a/t/jobs/t0011-5d2788d5.fio
+++ b/t/jobs/t0011-5d2788d5.fio
@@ -11,8 +11,8 @@ runtime=10
 flow_id=1
 
 [flow1]
-flow=-8
+flow=1
 rate_iops=1000
 
 [flow2]
-flow=1
+flow=8
diff --git a/t/jobs/t0012.fio b/t/jobs/t0012.fio
index 03fea627..d7123966 100644
--- a/t/jobs/t0012.fio
+++ b/t/jobs/t0012.fio
@@ -1,20 +1,25 @@
-# Expected results: no parse warnings, runs and with roughly 1/8 iops between
-#			the two jobs.
-# Buggy result: parse warning on flow value overflow, no 1/8 division between
-#			jobs.
+# Expected results: no parse warnings, runs and with roughly 1:5:10 iops
+#			between the three jobs.
+# Buggy result: parse warning on flow value overflow, no 1:5:10 division
+#			between jobs.
 #
 
 [global]
 bs=4k
 ioengine=null
 size=100g
-runtime=10
+runtime=12
 flow_id=1
-gtod_cpu=1
+flow_sleep=100
+thread
+log_avg_msec=1000
+write_iops_log=t0012.fio
 
 [flow1]
-flow=-8
-rate_iops=1000
+flow=1
 
 [flow2]
-flow=1
+flow=5
+
+[flow3]
+flow=10
diff --git a/t/jobs/t0014.fio b/t/jobs/t0014.fio
new file mode 100644
index 00000000..d9b45651
--- /dev/null
+++ b/t/jobs/t0014.fio
@@ -0,0 +1,29 @@
+# Expected results: no parse warnings, runs and with roughly 1:2:3 iops
+#			between the three jobs for the first 5 seconds, then
+#			runs with roughly 1:2 iops between the two jobs for
+#			the remaining 5 seconds.
+#
+# Buggy result: parse warning on flow value overflow, no 1:2:3 division between
+#			the three jobs for the first 5 seconds or no 1:2 division between
+#			the first two jobs for the remaining 5 seconds.
+#
+
+[global]
+bs=4k
+ioengine=null
+size=100g
+runtime=12
+flow_id=1
+thread
+log_avg_msec=1000
+write_iops_log=t0014.fio
+
+[flow1]
+flow=1
+
+[flow2]
+flow=2
+
+[flow3]
+flow=3
+runtime=5
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 6f1fc092..e5c2f17c 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -420,6 +420,118 @@ class FioJobTest_t0009(FioJobTest):
             self.passed = False
 
 
+class FioJobTest_t0012(FioJobTest):
+    """Test consists of fio test job t0012
+    Confirm ratios of job iops are 1:5:10
+    job1,job2,job3 respectively"""
+
+    def check_result(self):
+        super(FioJobTest_t0012, self).check_result()
+
+        if not self.passed:
+            return
+
+        iops_files = []
+        for i in range(1,4):
+            file_data, success = self.get_file(os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(self.fio_job), i)))
+
+            if not success:
+                self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
+                self.passed = False
+                return
+
+            iops_files.append(file_data.splitlines())
+
+        # there are 9 samples for job1 and job2, 4 samples for job3
+        iops1 = 0.0
+        iops2 = 0.0
+        iops3 = 0.0
+        for i in range(9):
+            iops1 = iops1 + float(iops_files[0][i].split(',')[1])
+            iops2 = iops2 + float(iops_files[1][i].split(',')[1])
+            iops3 = iops3 + float(iops_files[2][i].split(',')[1])
+
+            ratio1 = iops3/iops2
+            ratio2 = iops3/iops1
+            logging.debug(
+                "sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} job3/job2={4:.3f} job3/job1={5:.3f}".format(
+                    i, iops1, iops2, iops3, ratio1, ratio2
+                )
+            )
+
+        # test job1 and job2 succeeded to recalibrate
+        if ratio1 < 1 or ratio1 > 3 or ratio2 < 7 or ratio2 > 13:
+            self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} iops3={3} expected r1~2 r2~10 got r1={4:.3f} r2={5:.3f},".format(
+                self.failure_reason, iops1, iops2, iops3, ratio1, ratio2
+            )
+            self.passed = False
+            return
+
+
+class FioJobTest_t0014(FioJobTest):
+    """Test consists of fio test job t0014
+	Confirm that job1_iops / job2_iops ~ 1:2 for entire duration
+	and that job1_iops / job3_iops ~ 1:3 for first half of duration.
+
+    The test is about making sure the flow feature can
+    re-calibrate the activity dynamically"""
+
+    def check_result(self):
+        super(FioJobTest_t0014, self).check_result()
+
+        if not self.passed:
+            return
+
+        iops_files = []
+        for i in range(1,4):
+            file_data, success = self.get_file(os.path.join(self.test_dir, "{0}_iops.{1}.log".format(os.path.basename(self.fio_job), i)))
+
+            if not success:
+                self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
+                self.passed = False
+                return
+
+            iops_files.append(file_data.splitlines())
+
+        # there are 9 samples for job1 and job2, 4 samples for job3
+        iops1 = 0.0
+        iops2 = 0.0
+        iops3 = 0.0
+        for i in range(9):
+            if i < 4:
+                iops3 = iops3 + float(iops_files[2][i].split(',')[1])
+            elif i == 4:
+                ratio1 = iops1 / iops2
+                ratio2 = iops1 / iops3
+
+
+                if ratio1 < 0.43 or ratio1 > 0.57 or ratio2 < 0.21 or ratio2 > 0.45:
+                    self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} iops3={3}\
+                                                expected r1~0.5 r2~0.33 got r1={4:.3f} r2={5:.3f},".format(
+                        self.failure_reason, iops1, iops2, iops3, ratio1, ratio2
+                    )
+                    self.passed = False
+
+            iops1 = iops1 + float(iops_files[0][i].split(',')[1])
+            iops2 = iops2 + float(iops_files[1][i].split(',')[1])
+
+            ratio1 = iops1/iops2
+            ratio2 = iops1/iops3
+            logging.debug(
+                "sample {0}: job1 iops={1} job2 iops={2} job3 iops={3} job1/job2={4:.3f} job1/job3={5:.3f}".format(
+                    i, iops1, iops2, iops3, ratio1, ratio2
+                )
+            )
+
+        # test job1 and job2 succeeded to recalibrate
+        if ratio1 < 0.43 or ratio1 > 0.57:
+            self.failure_reason = "{0} iops ratio mismatch iops1={1} iops2={2} expected ratio~0.5 got ratio={3:.3f},".format(
+                self.failure_reason, iops1, iops2, ratio1
+            )
+            self.passed = False
+            return
+
+
 class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
@@ -442,7 +554,7 @@ class FioJobTest_iops_rate(FioJobTest):
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
             self.passed = False
 
-        if ratio < 7 or ratio > 9:
+        if ratio < 6 or ratio > 10:
             self.failure_reason = "{0} iops ratio mismatch,".format(self.failure_reason)
             self.passed = False
 
@@ -680,15 +792,13 @@ TEST_LIST = [
     },
     {
         'test_id':          12,
-        'test_class':       FioJobTest_iops_rate,
+        'test_class':       FioJobTest_t0012,
         'job':              't0012.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
         'pre_success':      None,
         'output_format':    'json',
-        'requirements':     [Requirements.not_macos],
-        # mac os does not support CPU affinity
-        # which is required for gtod offloading
+        'requirements':     [],
     },
     {
         'test_id':          13,
@@ -700,6 +810,16 @@ TEST_LIST = [
         'output_format':    'json',
         'requirements':     [],
     },
+    {
+        'test_id':          14,
+        'test_class':       FioJobTest_t0014,
+        'job':              't0014.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
diff --git a/thread_options.h b/thread_options.h
index 14f1cbe9..7c0a3158 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -311,11 +311,6 @@ struct thread_options {
 	unsigned int uid;
 	unsigned int gid;
 
-	int flow_id;
-	int flow;
-	int flow_watermark;
-	unsigned int flow_sleep;
-
 	unsigned int offset_increment_percent;
 	unsigned long long offset_increment;
 	unsigned long long number_ios;
@@ -327,6 +322,13 @@ struct thread_options {
 	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
 
+	/*
+	 * flow support
+	 */
+	int flow_id;
+	unsigned int flow;
+	unsigned int flow_sleep;
+
 	unsigned int sig_figs;
 
 	unsigned block_error_hist;
@@ -602,11 +604,6 @@ struct thread_options_pack {
 	uint32_t uid;
 	uint32_t gid;
 
-	int32_t flow_id;
-	int32_t flow;
-	int32_t flow_watermark;
-	uint32_t flow_sleep;
-
 	uint32_t offset_increment_percent;
 	uint64_t offset_increment;
 	uint64_t number_ios;
@@ -617,6 +614,13 @@ struct thread_options_pack {
 	fio_fp64_t latency_percentile;
 	uint32_t latency_run;
 
+	/*
+	 * flow support
+	 */
+	int32_t flow_id;
+	uint32_t flow;
+	uint32_t flow_sleep;
+
 	uint32_t sig_figs;
 
 	uint32_t block_error_hist;


* Recent changes (master)
@ 2020-08-30 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-30 12:00 UTC
  To: fio

The following changes since commit db83b0abd16bbd6b8f589a993e6f70d9812be6e3:

  Use fallthrough attribute (2020-08-28 09:14:38 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e68470637ae7c03d5f85a9243e148d7c46ad4487:

  Update the year to 2020 in os/windows/eula.rtf (2020-08-29 18:54:17 -0600)

----------------------------------------------------------------
Rebecca Cran (2):
      Update os/windows/dobuild.cmd to support signing binaries/installer
      Update the year to 2020 in os/windows/eula.rtf

 os/windows/dobuild.cmd |  12 ++++++++++++
 os/windows/eula.rtf    | Bin 1077 -> 1075 bytes
 2 files changed, 12 insertions(+)

---

Diff of recent changes:

diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index ef12d82d..d06a2afa 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -26,9 +26,21 @@ if not defined FIO_ARCH (
   goto end
 )
 
+if defined SIGN_FIO (
+  signtool sign /n "%SIGNING_CN%" /t http://timestamp.digicert.com ..\..\fio.exe
+  signtool sign /as /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 ..\..\fio.exe
+
+  signtool sign /n "%SIGNING_CN%" /t http://timestamp.digicert.com ..\..\t\*.exe
+  signtool sign /as /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 ..\..\t\*.exe
+)
+
 "%WIX%bin\candle" -nologo -arch %FIO_ARCH% -dFioVersionNumbers="%FIO_VERSION_NUMBERS%" install.wxs
 @if ERRORLEVEL 1 goto end
 "%WIX%bin\candle" -nologo -arch %FIO_ARCH% examples.wxs
 @if ERRORLEVEL 1 goto end
 "%WIX%bin\light" -nologo -sice:ICE61 install.wixobj examples.wixobj -ext WixUIExtension -out %FIO_VERSION%-%FIO_ARCH%.msi
 :end
+
+if defined SIGN_FIO (
+  signtool sign /n "%SIGNING_CN%" /tr http://timestamp.digicert.com /td sha256 /fd sha256 %FIO_VERSION%-%FIO_ARCH%.msi
+)
\ No newline at end of file
diff --git a/os/windows/eula.rtf b/os/windows/eula.rtf
index 01472be9..a931017c 100755
Binary files a/os/windows/eula.rtf and b/os/windows/eula.rtf differ


* Recent changes (master)
@ 2020-08-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-08-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 0ed7d55e6610020c4642ddb246f6ce1ab697052d:

  zbd: don't read past the WP on a read only workload with verify (2020-08-27 12:27:36 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to db83b0abd16bbd6b8f589a993e6f70d9812be6e3:

  Use fallthrough attribute (2020-08-28 09:14:38 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'master' of https://github.com/donny372/fio into master
      Use fallthrough attribute

donny372 (1):
      Avoid multiple instance read iolog from stdin.

 compiler/compiler.h | 10 ++++++++++
 crc/murmur3.c       |  5 +++--
 hash.h              | 24 +++++++++++++-----------
 init.c              | 24 ++++++++++++++----------
 io_u.c              |  5 +++++
 lib/lfsr.c          | 32 ++++++++++++++++----------------
 parse.c             |  4 ++--
 t/lfsr-test.c       |  7 ++++---
 8 files changed, 67 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index 8a784b92..8988236c 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -66,4 +66,14 @@
 #define FIELD_SIZE(s, f) (sizeof(((__typeof__(s))0)->f))
 #endif
 
+#ifndef __has_attribute
+#define __GCC4_has_attribute___fallthrough__	0
+#endif
+
+#if __has_attribute(__fallthrough__)
+#define fallthrough	 __attribute__((__fallthrough__))
+#else
+#define fallthrough	do {} while (0)  /* fallthrough */
+#endif
+
 #endif
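The hunk above picks between the compiler's `__fallthrough__` attribute and an empty statement. The self-contained sketch below shows how the macro is meant to be used. It follows the pattern of the Linux kernel's `compiler_attributes.h`, which also defines `__has_attribute(x)` as `__GCC4_has_attribute_##x` when the compiler lacks it. The hunk above defines only the `__GCC4_has_attribute___fallthrough__` constant, so on a compiler without `__has_attribute` its `#if` line would likely fail to preprocess; the function-like fallback here is an assumption added for the sketch. `cases_hit()` is a hypothetical helper, not part of fio.

```c
/* Assumed fallback in the style of the Linux kernel: on compilers without
 * __has_attribute (e.g. GCC < 5), map each queried attribute to a
 * per-attribute constant that is 0, meaning "unsupported". */
#ifndef __has_attribute
#define __has_attribute(x) __GCC4_has_attribute_##x
#define __GCC4_has_attribute___fallthrough__ 0
#endif

#if __has_attribute(__fallthrough__)
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical helper: counts how many case bodies run for n. Every
 * intentional fall-through is marked, so -Wimplicit-fallthrough is quiet. */
static int cases_hit(int n)
{
	int hits = 0;

	switch (n) {
	case 3:
		hits++;
		fallthrough;
	case 2:
		hits++;
		fallthrough;
	case 1:
		hits++;
		break;
	default:
		break;
	}
	return hits;
}
```

Because `fallthrough` expands to a statement-like token sequence, it is always followed by `;`, exactly as in the converted call sites below.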
diff --git a/crc/murmur3.c b/crc/murmur3.c
index f4f2f2c6..ba408a9e 100644
--- a/crc/murmur3.c
+++ b/crc/murmur3.c
@@ -1,4 +1,5 @@
 #include "murmur3.h"
+#include "../compiler/compiler.h"
 
 static inline uint32_t rotl32(uint32_t x, int8_t r)
 {
@@ -29,10 +30,10 @@ static uint32_t murmur3_tail(const uint8_t *data, const int nblocks,
 	switch (len & 3) {
 	case 3:
 		k1 ^= tail[2] << 16;
-		/* fall through */
+		fallthrough;
 	case 2:
 		k1 ^= tail[1] << 8;
-		/* fall through */
+		fallthrough;
 	case 1:
 		k1 ^= tail[0];
 		k1 *= c1;
diff --git a/hash.h b/hash.h
index 66dd3d69..2c04bc29 100644
--- a/hash.h
+++ b/hash.h
@@ -3,6 +3,7 @@
 
 #include <inttypes.h>
 #include "arch/arch.h"
+#include "compiler/compiler.h"
 
 /* Fast hashing routine for a long.
    (C) 2002 William Lee Irwin III, IBM */
@@ -141,19 +142,20 @@ static inline uint32_t jhash(const void *key, uint32_t length, uint32_t initval)
 	/* Last block: affect all 32 bits of (c) */
 	/* All the case statements fall through */
 	switch (length) {
-	case 12: c += (uint32_t) k[11] << 24;	/* fall through */
-	case 11: c += (uint32_t) k[10] << 16;	/* fall through */
-	case 10: c += (uint32_t) k[9] << 8;	/* fall through */
-	case 9:  c += k[8];			/* fall through */
-	case 8:  b += (uint32_t) k[7] << 24;	/* fall through */
-	case 7:  b += (uint32_t) k[6] << 16;	/* fall through */
-	case 6:  b += (uint32_t) k[5] << 8;	/* fall through */
-	case 5:  b += k[4];			/* fall through */
-	case 4:  a += (uint32_t) k[3] << 24;	/* fall through */
-	case 3:  a += (uint32_t) k[2] << 16;	/* fall through */
-	case 2:  a += (uint32_t) k[1] << 8;	/* fall through */
+	case 12: c += (uint32_t) k[11] << 24;	fallthrough;
+	case 11: c += (uint32_t) k[10] << 16;	fallthrough;
+	case 10: c += (uint32_t) k[9] << 8;	fallthrough;
+	case 9:  c += k[8];			fallthrough;
+	case 8:  b += (uint32_t) k[7] << 24;	fallthrough;
+	case 7:  b += (uint32_t) k[6] << 16;	fallthrough;
+	case 6:  b += (uint32_t) k[5] << 8;	fallthrough;
+	case 5:  b += k[4];			fallthrough;
+	case 4:  a += (uint32_t) k[3] << 24;	fallthrough;
+	case 3:  a += (uint32_t) k[2] << 16;	fallthrough;
+	case 2:  a += (uint32_t) k[1] << 8;	fallthrough;
 	case 1:  a += k[0];
 		 __jhash_final(a, b, c);
+		 fallthrough;
 	case 0: /* Nothing left to add */
 		break;
 	}
diff --git a/init.c b/init.c
index 491b46e6..3cd0238b 100644
--- a/init.c
+++ b/init.c
@@ -1835,6 +1835,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 		int nested, char *name, char ***popts, int *aopts, int *nopts)
 {
 	bool global = false;
+	bool stdin_occupied = false;
 	char *string;
 	FILE *f;
 	char *p;
@@ -1851,9 +1852,10 @@ static int __parse_jobs_ini(struct thread_data *td,
 	if (is_buf)
 		f = NULL;
 	else {
-		if (!strcmp(file, "-"))
+		if (!strcmp(file, "-")) {
 			f = stdin;
-		else
+			stdin_occupied = true;
+		} else
 			f = fopen(file, "r");
 
 		if (!f) {
@@ -2056,15 +2058,17 @@ static int __parse_jobs_ini(struct thread_data *td,
 
 		ret = fio_options_parse(td, opts, num_opts);
 
-		if (!ret) {
-			if (!strcmp(file, "-") && td->o.read_iolog_file != NULL) {
-				char *fname = get_name_by_idx(td->o.read_iolog_file,
-							      td->subjob_number);
-				if (!strcmp(fname, "-")) {
-					log_err("fio: we can't read both iolog "
-						"and job file from stdin.\n");
+		if (!ret && td->o.read_iolog_file != NULL) {
+			char *fname = get_name_by_idx(td->o.read_iolog_file,
+						      td->subjob_number);
+			if (!strcmp(fname, "-")) {
+				if (stdin_occupied) {
+					log_err("fio: only one user (read_iolog_file/job "
+						"file) of stdin is permitted at once but "
+						"more than one was found.\n");
 					ret = 1;
 				}
+				stdin_occupied = true;
 			}
 		}
 		if (!ret) {
@@ -2891,7 +2895,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			log_err("%s: unrecognized option '%s'\n", argv[0],
 							argv[optind - 1]);
 			show_closest_option(argv[optind - 1]);
-			/* fall through */
+			fallthrough;
 		default:
 			do_exit++;
 			exit_val = 1;
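With the init.c change, the stdin check runs for every job parsed from a job file, not only when the job file itself was read from stdin: the first consumer of `-` (the job file, or a job's `read_iolog_file`) claims stdin, and any later one is reported as an error. The sketch below models that bookkeeping with a hypothetical `claim_stdin()` helper; it is not a fio function, and it skips the `get_name_by_idx()` lookup that fio uses for per-subjob file names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the check in __parse_jobs_ini(): the name "-"
 * means stdin, and only one user (job file or read_iolog_file) may claim
 * it. Returns 0 on success, 1 if stdin was already claimed.
 */
static int claim_stdin(bool *stdin_occupied, const char *name)
{
	if (name == NULL || strcmp(name, "-") != 0)
		return 0;	/* not stdin, nothing to claim */

	if (*stdin_occupied) {
		fprintf(stderr, "only one user of stdin is permitted\n");
		return 1;
	}
	*stdin_occupied = true;
	return 0;
}
```

Calling it once for the job file and then once per job's iolog name reproduces the new behaviour: two jobs that both use `read_iolog=-` are now rejected.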
diff --git a/io_u.c b/io_u.c
index 2ef5acec..155d0a32 100644
--- a/io_u.c
+++ b/io_u.c
@@ -993,6 +993,7 @@ static void __io_u_mark_map(uint64_t *map, unsigned int nr)
 		break;
 	case 1 ... 4:
 		idx = 1;
+		fallthrough;
 	case 0:
 		break;
 	}
@@ -1034,6 +1035,7 @@ void io_u_mark_depth(struct thread_data *td, unsigned int nr)
 		break;
 	case 2 ... 3:
 		idx = 1;
+		fallthrough;
 	case 1:
 		break;
 	}
@@ -1074,6 +1076,7 @@ static void io_u_mark_lat_nsec(struct thread_data *td, unsigned long long nsec)
 		break;
 	case 2 ... 3:
 		idx = 1;
+		fallthrough;
 	case 0 ... 1:
 		break;
 	}
@@ -1115,6 +1118,7 @@ static void io_u_mark_lat_usec(struct thread_data *td, unsigned long long usec)
 		break;
 	case 2 ... 3:
 		idx = 1;
+		fallthrough;
 	case 0 ... 1:
 		break;
 	}
@@ -1162,6 +1166,7 @@ static void io_u_mark_lat_msec(struct thread_data *td, unsigned long long msec)
 		break;
 	case 2 ... 3:
 		idx = 1;
+		fallthrough;
 	case 0 ... 1:
 		break;
 	}
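The io_u.c hunks all annotate the same histogram-bucketing shape: a value is mapped to a slot with GNU case ranges (a GCC/Clang extension), and the `2 ... 3` range sets `idx = 1` and then deliberately runs into the shared `break` of the lowest range. The sketch below reproduces that shape in a standalone, hypothetical `depth_bucket()`; the bucket boundaries (1, 2-3, 4-7, ..., 64+) are assumed for illustration rather than copied from fio.

```c
/*
 * Hypothetical bucketing helper in the style of io_u_mark_depth().
 * Values 2..3 set idx = 1 and then fall into the "break" of the 0..1
 * range; that is the spot where fio now writes "fallthrough;".
 * Case ranges ("a ... b") are a GNU extension.
 */
static int depth_bucket(unsigned int nr)
{
	int idx = 0;

	switch (nr) {
	default:
		idx = 6;
		break;
	case 32 ... 63:
		idx = 5;
		break;
	case 16 ... 31:
		idx = 4;
		break;
	case 8 ... 15:
		idx = 3;
		break;
	case 4 ... 7:
		idx = 2;
		break;
	case 2 ... 3:
		idx = 1;
		/* fall through */
	case 0 ... 1:
		break;
	}
	return idx;
}
```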
diff --git a/lib/lfsr.c b/lib/lfsr.c
index 1ef6ebbf..a32e850a 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -88,37 +88,37 @@ static inline void __lfsr_next(struct fio_lfsr *fl, unsigned int spin)
 	 */
 	switch (spin) {
 		case 15: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case 14: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case 13: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case 12: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case 11: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case 10: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  9: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  8: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  7: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  6: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  5: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  4: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  3: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  2: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  1: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		case  0: __LFSR_NEXT(fl, fl->last_val);
-		/* fall through */
+		fallthrough;
 		default: break;
 	}
 }
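In `__lfsr_next()`, the switch is an unrolled loop: entering at `case spin` executes `__LFSR_NEXT()` `spin + 1` times by falling through every lower case, and any larger `spin` hits `default` and does nothing. The sketch below shows the same idea on a 16-bit Galois LFSR with only four cases. The taps (0xB400, i.e. x^16 + x^14 + x^13 + x^11 + 1) are a common maximal-length choice assumed here; fio's `__LFSR_NEXT()` uses its own per-width taps. `lfsr16_loop()` is a reference loop used to check the unrolled version.

```c
#include <stdint.h>

/* One step of a 16-bit Galois LFSR with taps 0xB400 (assumed). */
static uint16_t lfsr16_step(uint16_t v)
{
	uint16_t lsb = v & 1u;

	v >>= 1;
	if (lsb)
		v ^= 0xB400u;
	return v;
}

/* Unrolled "advance spin + 1 times" for spin in 0..3, mirroring the
 * fall-through chain in __lfsr_next(); larger spin values leave v alone. */
static uint16_t lfsr16_spin(uint16_t v, unsigned int spin)
{
	switch (spin) {
	case 3: v = lfsr16_step(v); /* fall through */
	case 2: v = lfsr16_step(v); /* fall through */
	case 1: v = lfsr16_step(v); /* fall through */
	case 0: v = lfsr16_step(v); /* fall through */
	default: break;
	}
	return v;
}

/* Reference implementation: plain loop of the given number of steps. */
static uint16_t lfsr16_loop(uint16_t v, unsigned int steps)
{
	while (steps--)
		v = lfsr16_step(v);
	return v;
}
```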
diff --git a/parse.c b/parse.c
index 04b2e198..f4cefcf6 100644
--- a/parse.c
+++ b/parse.c
@@ -596,7 +596,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	}
 	case FIO_OPT_STR_VAL_TIME:
 		is_time = 1;
-		/* fall through */
+		fallthrough;
 	case FIO_OPT_ULL:
 	case FIO_OPT_INT:
 	case FIO_OPT_STR_VAL: {
@@ -941,7 +941,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	}
 	case FIO_OPT_DEPRECATED:
 		ret = 1;
-		/* fall through */
+		fallthrough;
 	case FIO_OPT_SOFT_DEPRECATED:
 		log_info("Option %s is deprecated\n", o->name);
 		break;
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index ea8c8ddb..279e07f0 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -6,6 +6,7 @@
 #include "../lib/lfsr.h"
 #include "../gettime.h"
 #include "../fio_time.h"
+#include "../compiler/compiler.h"
 
 static void usage(void)
 {
@@ -40,11 +41,11 @@ int main(int argc, char *argv[])
 	switch (argc) {
 		case 5: if (strncmp(argv[4], "verify", 7) == 0)
 				verify = 1;
-			/* fall through */
+			fallthrough;
 		case 4: spin = atoi(argv[3]);
-			/* fall through */
+			fallthrough;
 		case 3: seed = atol(argv[2]);
-			/* fall through */
+			fallthrough;
 		case 2: numbers = strtol(argv[1], NULL, 16);
 				break;
 		default: usage();


* Recent changes (master)
@ 2020-08-28 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-08-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 84106576cefbbd9f5dfa5ee33b245f77938d0269:

  t/io_uring: cleanup vectored vs non-vectored (2020-08-22 11:26:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0ed7d55e6610020c4642ddb246f6ce1ab697052d:

  zbd: don't read past the WP on a read only workload with verify (2020-08-27 12:27:36 -0600)

----------------------------------------------------------------
Aravind Ramesh (1):
      zbd: don't read past the WP on a read only workload with verify

 zbd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index 5af8af4a..584d3640 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1467,9 +1467,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
-		if (td->runstate == TD_VERIFYING) {
-			if (td_write(td))
-				zb = zbd_replay_write_order(td, io_u, zb);
+		if (td->runstate == TD_VERIFYING && td_write(td)) {
+			zb = zbd_replay_write_order(td, io_u, zb);
 			goto accept;
 		}
 		/*
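The zbd.c change narrows when a verify-phase read skips the checks that follow in `zbd_adjust_block()`. Before, any read issued while `td->runstate == TD_VERIFYING` jumped straight to `accept`; after, that only happens when the job also writes, which is the case where `zbd_replay_write_order()` applies. A read-only workload with verify therefore goes through the remaining checks, which, per the commit title, keep it from reading past the write pointer. The two hypothetical predicates below only restate that condition as a truth table; they are not fio functions.

```c
#include <stdbool.h>

/* Before the patch: every verify-phase read was accepted early. */
static bool accept_early_old(bool verifying, bool job_writes)
{
	(void)job_writes;	/* ignored by the old condition */
	return verifying;
}

/* After the patch: only verify reads of a job that also writes. */
static bool accept_early_new(bool verifying, bool job_writes)
{
	return verifying && job_writes;
}
```

The only input that changes outcome is `verifying && !job_writes`, i.e. the read-only verify case the commit targets.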


* Recent changes (master)
@ 2020-08-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-08-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d836624b3a7eb3433bdf8f7193b44daacd5ba6d1:

  engines/io_uring: don't attempt to set RLIMITs (2020-08-21 16:22:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 84106576cefbbd9f5dfa5ee33b245f77938d0269:

  t/io_uring: cleanup vectored vs non-vectored (2020-08-22 11:26:39 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      engines/io_uring: use non-vectored read/write if available
      t/io_uring: use non-vectored reads if available
      t/io_uring: cleanup vectored vs non-vectored

 engines/io_uring.c  |  37 +++++++++++++++
 os/linux/io_uring.h | 131 +++++++++++++++++++++++++++++++++++++++++++++++-----
 t/io_uring.c        |  32 +++++++++++++
 3 files changed, 188 insertions(+), 12 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 2b1b1357..ec8cb18a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -174,6 +174,7 @@ static struct fio_option options[] = {
 		.lname	= "Non-vectored",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options, nonvectored),
+		.def	= "-1",
 		.help	= "Use non-vectored read/write commands",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
@@ -547,6 +548,40 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	return 0;
 }
 
+static void fio_ioring_probe(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct io_uring_probe *p;
+	int ret;
+
+	/* already set by user, don't touch */
+	if (o->nonvectored != -1)
+		return;
+
+	/* default to off, as that's always safe */
+	o->nonvectored = 0;
+
+	p = malloc(sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	if (!p)
+		return;
+
+	memset(p, 0, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	ret = syscall(__NR_io_uring_register, ld->ring_fd,
+			IORING_REGISTER_PROBE, p, 256);
+	if (ret < 0)
+		goto out;
+
+	if (IORING_OP_WRITE > p->ops_len)
+		goto out;
+
+	if ((p->ops[IORING_OP_READ].flags & IO_URING_OP_SUPPORTED) &&
+	    (p->ops[IORING_OP_WRITE].flags & IO_URING_OP_SUPPORTED))
+		o->nonvectored = 1;
+out:
+	free(p);
+}
+
 static int fio_ioring_queue_init(struct thread_data *td)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -573,6 +608,8 @@ static int fio_ioring_queue_init(struct thread_data *td)
 
 	ld->ring_fd = ret;
 
+	fio_ioring_probe(td);
+
 	if (o->fixedbufs) {
 		ret = syscall(__NR_io_uring_register, ld->ring_fd,
 				IORING_REGISTER_BUFFERS, ld->iovecs, depth);
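The `nonvectored` option now defaults to `-1`, meaning "decide automatically", and `fio_ioring_probe()` turns it on only if the kernel reports both `IORING_OP_READ` and `IORING_OP_WRITE` as supported. The sketch below isolates that decision from the `io_uring_register()` syscall so it can be exercised without a kernel. The `sk_` structures are hypothetical local mirrors of `struct io_uring_probe` from the header diff (with a fixed 256-entry array instead of a flexible one), and 22/23 are the values of `IORING_OP_READ`/`IORING_OP_WRITE` in the upstream enum. Assuming `ops_len` counts the valid entries, as the header comment "length of ops[] array below" suggests, index `i` is valid only when `i < ops_len`, so the sketch uses `>=` in the bound check. The engine's `IORING_OP_WRITE > p->ops_len` also admits `IORING_OP_WRITE == ops_len`, but because fio zeroes the buffer first, that entry's flags read as 0 and the answer is still "unsupported".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_OP_READ	22		/* IORING_OP_READ upstream */
#define SK_OP_WRITE	23		/* IORING_OP_WRITE upstream */
#define SK_OP_SUPPORTED	(1U << 0)	/* IO_URING_OP_SUPPORTED */

struct sk_probe_op {
	uint8_t op;
	uint8_t resv;
	uint16_t flags;
	uint32_t resv2;
};

struct sk_probe {
	uint8_t last_op;
	uint8_t ops_len;	/* number of valid entries in ops[] */
	uint16_t resv;
	uint32_t resv2[3];
	struct sk_probe_op ops[256];
};

/* Zero the buffer before probing, as fio does, so unreported ops read 0. */
static void sk_probe_init(struct sk_probe *p)
{
	memset(p, 0, sizeof(*p));
}

/* True if both non-vectored READ and WRITE are reported as supported. */
static bool sk_probe_nonvectored(const struct sk_probe *p)
{
	if (SK_OP_WRITE >= p->ops_len)
		return false;
	return (p->ops[SK_OP_READ].flags & SK_OP_SUPPORTED) &&
	       (p->ops[SK_OP_WRITE].flags & SK_OP_SUPPORTED);
}
```

Defaulting to vectored I/O when the probe fails or is unavailable keeps older kernels working, which matches the "default to off, as that's always safe" comment in the engine.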
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 03d2dde4..d39b45fd 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
 /*
  * Header file for the io_uring interface.
  *
@@ -11,6 +11,10 @@
 #include <linux/fs.h>
 #include <linux/types.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * IO submission data structure (Submission Queue Entry)
  */
@@ -23,12 +27,16 @@ struct io_uring_sqe {
 		__u64	off;	/* offset into file */
 		__u64	addr2;
 	};
-	__u64	addr;		/* pointer to buffer or iovecs */
+	union {
+		__u64	addr;	/* pointer to buffer or iovecs */
+		__u64	splice_off_in;
+	};
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
 		__kernel_rwf_t	rw_flags;
 		__u32		fsync_flags;
-		__u16		poll_events;
+		__u16		poll_events;	/* compatibility */
+		__u32		poll32_events;	/* word-reversed for BE */
 		__u32		sync_range_flags;
 		__u32		msg_flags;
 		__u32		timeout_flags;
@@ -36,22 +44,51 @@ struct io_uring_sqe {
 		__u32		cancel_flags;
 		__u32		open_flags;
 		__u32		statx_flags;
+		__u32		fadvise_advice;
+		__u32		splice_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
-		__u16	buf_index;	/* index into fixed buffers, if used */
+		struct {
+			/* pack this to avoid bogus arm OABI complaints */
+			union {
+				/* index into fixed buffers, if used */
+				__u16	buf_index;
+				/* for grouped buffer selection */
+				__u16	buf_group;
+			} __attribute__((packed));
+			/* personality to use, if used */
+			__u16	personality;
+			__s32	splice_fd_in;
+		};
 		__u64	__pad2[3];
 	};
 };
 
+enum {
+	IOSQE_FIXED_FILE_BIT,
+	IOSQE_IO_DRAIN_BIT,
+	IOSQE_IO_LINK_BIT,
+	IOSQE_IO_HARDLINK_BIT,
+	IOSQE_ASYNC_BIT,
+	IOSQE_BUFFER_SELECT_BIT,
+};
+
 /*
  * sqe->flags
  */
-#define IOSQE_FIXED_FILE	(1U << 0)	/* use fixed fileset */
-#define IOSQE_IO_DRAIN		(1U << 1)	/* issue after inflight IO */
-#define IOSQE_IO_LINK		(1U << 2)	/* links next sqe */
-#define IOSQE_IO_HARDLINK	(1U << 3)	/* like LINK, but stronger */
-#define IOSQE_ASYNC		(1U << 4)	/* always go async */
+/* use fixed fileset */
+#define IOSQE_FIXED_FILE	(1U << IOSQE_FIXED_FILE_BIT)
+/* issue after inflight IO */
+#define IOSQE_IO_DRAIN		(1U << IOSQE_IO_DRAIN_BIT)
+/* links next sqe */
+#define IOSQE_IO_LINK		(1U << IOSQE_IO_LINK_BIT)
+/* like LINK, but stronger */
+#define IOSQE_IO_HARDLINK	(1U << IOSQE_IO_HARDLINK_BIT)
+/* always go async */
+#define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
+/* select buffer from sqe->buf_group */
+#define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
 
 /*
  * io_uring_setup() flags
@@ -60,6 +97,8 @@ struct io_uring_sqe {
 #define IORING_SETUP_SQPOLL	(1U << 1)	/* SQ poll thread */
 #define IORING_SETUP_SQ_AFF	(1U << 2)	/* sq_thread_cpu is valid */
 #define IORING_SETUP_CQSIZE	(1U << 3)	/* app defines CQ size */
+#define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
+#define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
 
 enum {
 	IORING_OP_NOP,
@@ -86,6 +125,16 @@ enum {
 	IORING_OP_STATX,
 	IORING_OP_READ,
 	IORING_OP_WRITE,
+	IORING_OP_FADVISE,
+	IORING_OP_MADVISE,
+	IORING_OP_SEND,
+	IORING_OP_RECV,
+	IORING_OP_OPENAT2,
+	IORING_OP_EPOLL_CTL,
+	IORING_OP_SPLICE,
+	IORING_OP_PROVIDE_BUFFERS,
+	IORING_OP_REMOVE_BUFFERS,
+	IORING_OP_TEE,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
@@ -101,6 +150,12 @@ enum {
  */
 #define IORING_TIMEOUT_ABS	(1U << 0)
 
+/*
+ * sqe->splice_flags
+ * extends splice(2) flags
+ */
+#define SPLICE_F_FD_IN_FIXED	(1U << 31) /* the last bit of __u32 */
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
@@ -110,6 +165,17 @@ struct io_uring_cqe {
 	__u32	flags;
 };
 
+/*
+ * cqe->flags
+ *
+ * IORING_CQE_F_BUFFER	If set, the upper 16 bits are the buffer ID
+ */
+#define IORING_CQE_F_BUFFER		(1U << 0)
+
+enum {
+	IORING_CQE_BUFFER_SHIFT		= 16,
+};
+
 /*
  * Magic offsets for the application to mmap the data it needs
  */
@@ -136,6 +202,7 @@ struct io_sqring_offsets {
  * sq_ring->flags
  */
 #define IORING_SQ_NEED_WAKEUP	(1U << 0) /* needs io_uring_enter wakeup */
+#define IORING_SQ_CQ_OVERFLOW	(1U << 1) /* CQ ring is overflown */
 
 struct io_cqring_offsets {
 	__u32 head;
@@ -144,9 +211,18 @@ struct io_cqring_offsets {
 	__u32 ring_entries;
 	__u32 overflow;
 	__u32 cqes;
-	__u64 resv[2];
+	__u32 flags;
+	__u32 resv1;
+	__u64 resv2;
 };
 
+/*
+ * cq_ring->flags
+ */
+
+/* disable eventfd notifications */
+#define IORING_CQ_EVENTFD_DISABLED	(1U << 0)
+
 /*
  * io_uring_enter(2) flags
  */
@@ -163,7 +239,8 @@ struct io_uring_params {
 	__u32 sq_thread_cpu;
 	__u32 sq_thread_idle;
 	__u32 features;
-	__u32 resv[4];
+	__u32 wq_fd;
+	__u32 resv[3];
 	struct io_sqring_offsets sq_off;
 	struct io_cqring_offsets cq_off;
 };
@@ -174,6 +251,10 @@ struct io_uring_params {
 #define IORING_FEAT_SINGLE_MMAP		(1U << 0)
 #define IORING_FEAT_NODROP		(1U << 1)
 #define IORING_FEAT_SUBMIT_STABLE	(1U << 2)
+#define IORING_FEAT_RW_CUR_POS		(1U << 3)
+#define IORING_FEAT_CUR_PERSONALITY	(1U << 4)
+#define IORING_FEAT_FAST_POLL		(1U << 5)
+#define IORING_FEAT_POLL_32BITS 	(1U << 6)
 
 /*
  * io_uring_register(2) opcodes and arguments
@@ -185,10 +266,36 @@ struct io_uring_params {
 #define IORING_REGISTER_EVENTFD		4
 #define IORING_UNREGISTER_EVENTFD	5
 #define IORING_REGISTER_FILES_UPDATE	6
+#define IORING_REGISTER_EVENTFD_ASYNC	7
+#define IORING_REGISTER_PROBE		8
+#define IORING_REGISTER_PERSONALITY	9
+#define IORING_UNREGISTER_PERSONALITY	10
 
 struct io_uring_files_update {
 	__u32 offset;
-	__s32 *fds;
+	__u32 resv;
+	__aligned_u64 /* __s32 * */ fds;
 };
 
+#define IO_URING_OP_SUPPORTED	(1U << 0)
+
+struct io_uring_probe_op {
+	__u8 op;
+	__u8 resv;
+	__u16 flags;	/* IO_URING_OP_* flags */
+	__u32 resv2;
+};
+
+struct io_uring_probe {
+	__u8 last_op;	/* last opcode supported */
+	__u8 ops_len;	/* length of ops[] array below */
+	__u16 resv;
+	__u32 resv2[3];
+	struct io_uring_probe_op ops[0];
+};
+
+#ifdef __cplusplus
+}
+#endif
+
 #endif
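The header update derives each `IOSQE_*` flag from a bit-index enum and adds `IORING_CQE_F_BUFFER` with `IORING_CQE_BUFFER_SHIFT`: per the header comment, when that flag is set in `cqe->flags`, the upper 16 bits carry the ID of the provided buffer the kernel selected. The diff itself only adds these definitions. The sketch below uses hypothetical `SK_` copies of the values so it builds without kernel headers, and shows how a completion's buffer ID would be decoded.

```c
#include <stdint.h>

/* Local copies of the bit indices added to os/linux/io_uring.h. */
enum {
	SK_IOSQE_FIXED_FILE_BIT,
	SK_IOSQE_IO_DRAIN_BIT,
	SK_IOSQE_IO_LINK_BIT,
	SK_IOSQE_IO_HARDLINK_BIT,
	SK_IOSQE_ASYNC_BIT,
	SK_IOSQE_BUFFER_SELECT_BIT,
};

#define SK_IOSQE_BUFFER_SELECT	(1U << SK_IOSQE_BUFFER_SELECT_BIT)
#define SK_CQE_F_BUFFER		(1U << 0)	/* IORING_CQE_F_BUFFER */
#define SK_CQE_BUFFER_SHIFT	16		/* IORING_CQE_BUFFER_SHIFT */

/* Returns the buffer ID carried in cqe->flags, or -1 when the kernel did
 * not select a provided buffer (IORING_CQE_F_BUFFER clear). */
static int sk_cqe_buffer_id(uint32_t cqe_flags)
{
	if (!(cqe_flags & SK_CQE_F_BUFFER))
		return -1;
	return (int)(cqe_flags >> SK_CQE_BUFFER_SHIFT);
}
```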
diff --git a/t/io_uring.c b/t/io_uring.c
index 7fa84f99..8d258136 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -94,6 +94,8 @@ static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
 static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
 static int do_nop = 0;		/* no-op SQ ring commands */
 
+static int vectored = 1;
+
 static int io_uring_register_buffers(struct submitter *s)
 {
 	if (do_nop)
@@ -125,6 +127,29 @@ static int io_uring_setup(unsigned entries, struct io_uring_params *p)
 	return syscall(__NR_io_uring_setup, entries, p);
 }
 
+static void io_uring_probe(int fd)
+{
+	struct io_uring_probe *p;
+	int ret;
+
+	p = malloc(sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	if (!p)
+		return;
+
+	memset(p, 0, sizeof(*p) + 256 * sizeof(struct io_uring_probe_op));
+	ret = syscall(__NR_io_uring_register, fd, IORING_REGISTER_PROBE, p, 256);
+	if (ret < 0)
+		goto out;
+
+	if (IORING_OP_READ > p->ops_len)
+		goto out;
+
+	if ((p->ops[IORING_OP_READ].flags & IO_URING_OP_SUPPORTED))
+		vectored = 0;
+out:
+	free(p);
+}
+
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
@@ -184,6 +209,11 @@ static void init_io(struct submitter *s, unsigned index)
 		sqe->addr = (unsigned long) s->iovecs[index].iov_base;
 		sqe->len = bs;
 		sqe->buf_index = index;
+	} else if (!vectored) {
+		sqe->opcode = IORING_OP_READ;
+		sqe->addr = (unsigned long) s->iovecs[index].iov_base;
+		sqe->len = bs;
+		sqe->buf_index = 0;
 	} else {
 		sqe->opcode = IORING_OP_READV;
 		sqe->addr = (unsigned long) &s->iovecs[index];
@@ -414,6 +444,8 @@ static int setup_ring(struct submitter *s)
 	}
 	s->ring_fd = fd;
 
+	io_uring_probe(fd);
+
 	if (fixedbufs) {
 		ret = io_uring_register_buffers(s);
 		if (ret < 0) {
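In t/io_uring.c, `init_io()` now has three ways to describe the same read, and `io_uring_probe()` clears `vectored` only when `IORING_OP_READ` is supported (the tool only reads, so WRITE is not probed). The sketch below condenses the three branches into a hypothetical `sk_init_read()` over a trimmed-down SQE. The opcode values follow the upstream enum; using `iov_len` for the length stands in for `bs`, and `len = 1` for the vectored branch (one iovec) is assumed from the READV semantics, since those lines fall outside the hunk shown.

```c
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical SQE holding only the fields init_io() fills in. */
struct sk_sqe {
	uint8_t opcode;
	uint64_t addr;
	uint32_t len;
	uint16_t buf_index;
};

/* Opcode numbers as in the upstream io_uring enum. */
enum { SK_OP_READV = 1, SK_OP_READ_FIXED = 4, SK_OP_READ = 22 };

static void sk_init_read(struct sk_sqe *sqe, struct iovec *iov,
			 unsigned int index, int fixedbufs, int vectored)
{
	if (fixedbufs) {
		/* registered buffer: data address plus registration index */
		sqe->opcode = SK_OP_READ_FIXED;
		sqe->addr = (uint64_t)(uintptr_t) iov->iov_base;
		sqe->len = iov->iov_len;
		sqe->buf_index = index;
	} else if (!vectored) {
		/* IORING_OP_READ: address and length of a single buffer */
		sqe->opcode = SK_OP_READ;
		sqe->addr = (uint64_t)(uintptr_t) iov->iov_base;
		sqe->len = iov->iov_len;
		sqe->buf_index = 0;
	} else {
		/* IORING_OP_READV: address of an iovec array, len = count */
		sqe->opcode = SK_OP_READV;
		sqe->addr = (uint64_t)(uintptr_t) iov;
		sqe->len = 1;
		sqe->buf_index = 0;
	}
}
```

The non-vectored form saves the kernel from copying in an iovec for every request, which is the motivation for preferring it when the probe allows.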


* Recent changes (master)
@ 2020-08-22 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-08-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e711df54082b5d2d739e9ee3e46a2bc23b1b3c7c:

  file: provider fio_file_free() helper (2020-08-19 13:02:42 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d836624b3a7eb3433bdf8f7193b44daacd5ba6d1:

  engines/io_uring: don't attempt to set RLIMITs (2020-08-21 16:22:43 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'adjusting-libpmem' of https://github.com/lukaszstolarczuk/fio into master
      engines/io_uring: don't attempt to set RLIMITs

Łukasz Stolarczuk (1):
      engines/libpmem: adjust for PMDK >=1.5 usage

 configure            |  30 ++-
 engines/io_uring.c   |   8 -
 engines/libpmem.c    | 501 +++++++++------------------------------------------
 examples/libpmem.fio |  17 +-
 4 files changed, 127 insertions(+), 429 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index d3997b5f..6d672fe5 100755
--- a/configure
+++ b/configure
@@ -2117,10 +2117,11 @@ if test "$libpmem" != "yes" ; then
 fi
 cat > $TMPC << EOF
 #include <libpmem.h>
+#include <stdlib.h>
 int main(int argc, char **argv)
 {
   int rc;
-  rc = pmem_is_pmem(0, 0);
+  rc = pmem_is_pmem(NULL, NULL);
   return 0;
 }
 EOF
@@ -2129,6 +2130,27 @@ if compile_prog "" "-lpmem" "libpmem"; then
 fi
 print_config "libpmem" "$libpmem"
 
+##########################################
+# Check whether libpmem's version >= 1.5
+if test "$libpmem1_5" != "yes" ; then
+  libpmem1_5="no"
+fi
+if test "$libpmem" = "yes"; then
+  cat > $TMPC << EOF
+#include <libpmem.h>
+#include <stdlib.h>
+int main(int argc, char **argv)
+{
+  pmem_memcpy(NULL, NULL, NULL, NULL);
+  return 0;
+}
+EOF
+  if compile_prog "" "-lpmem" "libpmem1_5"; then
+    libpmem1_5="yes"
+  fi
+fi
+print_config "libpmem1_5" "$libpmem1_5"
+
 ##########################################
 # Check whether we have libpmemblk
 # libpmem is a prerequisite
@@ -2151,10 +2173,12 @@ EOF
 fi
 print_config "libpmemblk" "$libpmemblk"
 
-# Choose the ioengines
+# Choose libpmem-based ioengines
 if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
-  pmem="yes"
   devdax="yes"
+  if test "$libpmem1_5" = "yes"; then
+    pmem="yes"
+  fi
   if test "$libpmemblk" = "yes"; then
     pmemblk="yes"
   fi
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 57925594..2b1b1357 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -574,14 +574,6 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	ld->ring_fd = ret;
 
 	if (o->fixedbufs) {
-		struct rlimit rlim = {
-			.rlim_cur = RLIM_INFINITY,
-			.rlim_max = RLIM_INFINITY,
-		};
-
-		if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
-			return -1;
-
 		ret = syscall(__NR_io_uring_register, ld->ring_fd,
 				IORING_REGISTER_BUFFERS, ld->iovecs, depth);
 		if (ret < 0)
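The removed block asked for an unlimited `RLIMIT_MEMLOCK` and aborted queue setup if `setrlimit()` refused. An unprivileged process cannot raise its hard limit, so that request fails with `EPERM` even when the existing limit would already cover the buffers. After the change, fio simply attempts `IORING_REGISTER_BUFFERS` and reports its error. If a tool wanted an early warning instead, a read-only check could look like the hypothetical `memlock_allows()` below. It assumes that registered buffers are charged against `RLIMIT_MEMLOCK` for processes without `CAP_IPC_LOCK`, and it is not part of fio.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical pre-flight check: report whether the current soft
 * RLIMIT_MEMLOCK covers the bytes that registering fixed buffers would
 * pin, without trying to change any limit.
 */
static bool memlock_allows(size_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) < 0)
		return false;	/* limit unreadable: treat as insufficient */
	if (rl.rlim_cur == RLIM_INFINITY)
		return true;
	return (rlim_t) bytes <= rl.rlim_cur;
}
```

A `false` result is only advisory: privileged processes are typically exempt, so the registration call remains the authoritative test.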
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 3f63055c..a9b3e29b 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -2,6 +2,7 @@
  * libpmem: IO engine that uses PMDK libpmem to read and write data
  *
  * Copyright (C) 2017 Nippon Telegraph and Telephone Corporation.
+ * Copyright 2018-2020, Intel Corporation
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License,
@@ -17,7 +18,7 @@
 /*
  * libpmem engine
  *
- * IO engine that uses libpmem to read and write data
+ * IO engine that uses libpmem to write data (and memcpy to read)
  *
  * To use:
  *   ioengine=libpmem
@@ -25,21 +26,23 @@
  * Other relevant settings:
  *   iodepth=1
  *   direct=1
+ *   sync=1
  *   directory=/mnt/pmem0/
  *   bs=4k
  *
- *   direct=1 means that pmem_drain() is executed for each write operation.
- *   In contrast, direct=0 means that pmem_drain() is not executed.
+ *   sync=1 means that pmem_drain() is executed for each write operation.
+ *   Otherwise is not and should be called on demand.
+ *
+ *   direct=1 means PMEM_F_MEM_NONTEMPORAL flag is set in pmem_memcpy().
  *
  *   The pmem device must have a DAX-capable filesystem and be mounted
- *   with DAX enabled. directory must point to a mount point of DAX FS.
+ *   with DAX enabled. Directory must point to a mount point of DAX FS.
  *
  *   Example:
  *     mkfs.xfs /dev/pmem0
  *     mkdir /mnt/pmem0
  *     mount -o dax /dev/pmem0 /mnt/pmem0
  *
- *
  * See examples/libpmem.fio for more.
  *
  *
@@ -47,7 +50,7 @@
  *   By default, the libpmem engine will let the system find the libpmem.so
  *   that it uses. You can use an alternative libpmem by setting the
  *   FIO_PMEM_LIB environment variable to the full path to the desired
- *   libpmem.so.
+ *   libpmem.so. This engine requires PMDK >= 1.5.
  */
 
 #include <stdio.h>
@@ -64,394 +67,117 @@
 #include "../fio.h"
 #include "../verify.h"
 
-/*
- * Limits us to 1GiB of mapped files in total to model after
- * libpmem engine behavior
- */
-#define MMAP_TOTAL_SZ   (1 * 1024 * 1024 * 1024UL)
-
 struct fio_libpmem_data {
 	void *libpmem_ptr;
 	size_t libpmem_sz;
 	off_t libpmem_off;
 };
 
-#define MEGABYTE ((uintptr_t)1 << 20)
-#define GIGABYTE ((uintptr_t)1 << 30)
-#define PROCMAXLEN 2048 /* maximum expected line length in /proc files */
-#define roundup(x, y)   ((((x) + ((y) - 1)) / (y)) * (y))
-
-static bool Mmap_no_random;
-static void *Mmap_hint;
-static unsigned long long Mmap_align;
-
-/*
- * util_map_hint_align -- choose the desired mapping alignment
- *
- * Use 2MB/1GB page alignment only if the mapping length is at least
- * twice as big as the page size.
- */
-static inline size_t util_map_hint_align(size_t len, size_t req_align)
-{
-	size_t align = Mmap_align;
-
-	dprint(FD_IO, "DEBUG util_map_hint_align\n" );
-
-	if (req_align)
-		align = req_align;
-	else if (len >= 2 * GIGABYTE)
-		align = GIGABYTE;
-	else if (len >= 4 * MEGABYTE)
-		align = 2 * MEGABYTE;
-
-	dprint(FD_IO, "align=%d\n", (int)align);
-	return align;
-}
-
-#ifdef __FreeBSD__
-static const char *sscanf_os = "%p %p";
-#define MAP_NORESERVE 0
-#define OS_MAPFILE "/proc/curproc/map"
-#else
-static const char *sscanf_os = "%p-%p";
-#define OS_MAPFILE "/proc/self/maps"
-#endif
-
-/*
- * util_map_hint_unused -- use /proc to determine a hint address for mmap()
- *
- * This is a helper function for util_map_hint().
- * It opens up /proc/self/maps and looks for the first unused address
- * in the process address space that is:
- * - greater or equal 'minaddr' argument,
- * - large enough to hold range of given length,
- * - aligned to the specified unit.
- *
- * Asking for aligned address like this will allow the DAX code to use large
- * mappings.  It is not an error if mmap() ignores the hint and chooses
- * different address.
- */
-static char *util_map_hint_unused(void *minaddr, size_t len, size_t align)
+static int fio_libpmem_init(struct thread_data *td)
 {
-	char *lo = NULL;        /* beginning of current range in maps file */
-	char *hi = NULL;        /* end of current range in maps file */
-	char *raddr = minaddr;  /* ignore regions below 'minaddr' */
-
-#ifdef WIN32
-	MEMORY_BASIC_INFORMATION mi;
-#else
-	FILE *fp;
-	char line[PROCMAXLEN];  /* for fgets() */
-#endif
-
-	dprint(FD_IO, "DEBUG util_map_hint_unused\n");
-	assert(align > 0);
-
-	if (raddr == NULL)
-		raddr += page_size;
-
-	raddr = (char *)roundup((uintptr_t)raddr, align);
-
-#ifdef WIN32
-	while ((uintptr_t)raddr < UINTPTR_MAX - len) {
-		size_t ret = VirtualQuery(raddr, &mi, sizeof(mi));
-		if (ret == 0) {
-			ERR("VirtualQuery %p", raddr);
-			return MAP_FAILED;
-		}
-		dprint(FD_IO, "addr %p len %zu state %d",
-				mi.BaseAddress, mi.RegionSize, mi.State);
-
-		if ((mi.State != MEM_FREE) || (mi.RegionSize < len)) {
-			raddr = (char *)mi.BaseAddress + mi.RegionSize;
-			raddr = (char *)roundup((uintptr_t)raddr, align);
-			dprint(FD_IO, "nearest aligned addr %p", raddr);
-		} else {
-			dprint(FD_IO, "unused region of size %zu found at %p",
-					mi.RegionSize, mi.BaseAddress);
-			return mi.BaseAddress;
-		}
-	}
-
-	dprint(FD_IO, "end of address space reached");
-	return MAP_FAILED;
-#else
-	fp = fopen(OS_MAPFILE, "r");
-	if (!fp) {
-		log_err("!%s\n", OS_MAPFILE);
-		return MAP_FAILED;
-	}
-
-	while (fgets(line, PROCMAXLEN, fp) != NULL) {
-		/* check for range line */
-		if (sscanf(line, sscanf_os, &lo, &hi) == 2) {
-			dprint(FD_IO, "%p-%p\n", lo, hi);
-			if (lo > raddr) {
-				if ((uintptr_t)(lo - raddr) >= len) {
-					dprint(FD_IO, "unused region of size "
-							"%zu found at %p\n",
-							lo - raddr, raddr);
-					break;
-				} else {
-					dprint(FD_IO, "region is too small: "
-							"%zu < %zu\n",
-							lo - raddr, len);
-				}
-			}
-
-			if (hi > raddr) {
-				raddr = (char *)roundup((uintptr_t)hi, align);
-				dprint(FD_IO, "nearest aligned addr %p\n",
-						raddr);
-			}
-
-			if (raddr == 0) {
-				dprint(FD_IO, "end of address space reached\n");
-				break;
-			}
-		}
-	}
-
-	/*
-	 * Check for a case when this is the last unused range in the address
-	 * space, but is not large enough. (very unlikely)
-	 */
-	if ((raddr != NULL) && (UINTPTR_MAX - (uintptr_t)raddr < len)) {
-		dprint(FD_IO, "end of address space reached");
-		raddr = MAP_FAILED;
-	}
-
-	fclose(fp);
-
-	dprint(FD_IO, "returning %p", raddr);
-	return raddr;
-#endif
-}
+	struct thread_options *o = &td->o;
 
-/*
- * util_map_hint -- determine hint address for mmap()
- *
- * If PMEM_MMAP_HINT environment variable is not set, we let the system to pick
- * the randomized mapping address.  Otherwise, a user-defined hint address
- * is used.
- *
- * Windows Environment:
- *   XXX - Windows doesn't support large DAX pages yet, so there is
- *   no point in aligning for the same.
- *
- * Except for Windows Environment:
- *   ALSR in 64-bit Linux kernel uses 28-bit of randomness for mmap
- *   (bit positions 12-39), which means the base mapping address is randomized
- *   within [0..1024GB] range, with 4KB granularity.  Assuming additional
- *   1GB alignment, it results in 1024 possible locations.
- *
- *   Configuring the hint address via PMEM_MMAP_HINT environment variable
- *   disables address randomization.  In such case, the function will search for
- *   the first unused, properly aligned region of given size, above the
- *   specified address.
- */
-static char *util_map_hint(size_t len, size_t req_align)
-{
-	char *addr;
-	size_t align = 0;
-	char *e = NULL;
-
-	dprint(FD_IO, "DEBUG util_map_hint\n");
-	dprint(FD_IO, "len %zu req_align %zu\n", len, req_align);
-
-	/* choose the desired alignment based on the requested length */
-	align = util_map_hint_align(len, req_align);
-
-	e = getenv("PMEM_MMAP_HINT");
-	if (e) {
-		char *endp;
-		unsigned long long val = 0;
-
-		errno = 0;
-
-		val = strtoull(e, &endp, 16);
-		if (errno || endp == e) {
-			dprint(FD_IO, "Invalid PMEM_MMAP_HINT\n");
-		} else {
-			Mmap_hint = (void *)val;
-			Mmap_no_random = true;
-			dprint(FD_IO, "PMEM_MMAP_HINT set to %p\n", Mmap_hint);
-		}
-	}
+	dprint(FD_IO,"o->rw_min_bs %llu \n o->fsync_blocks %u \n o->fdatasync_blocks %u \n",
+			o->rw_min_bs,o->fsync_blocks,o->fdatasync_blocks);
+	dprint(FD_IO, "DEBUG fio_libpmem_init\n");
 
-	if (Mmap_no_random) {
-		dprint(FD_IO, "user-defined hint %p\n", (void *)Mmap_hint);
-		addr = util_map_hint_unused((void *)Mmap_hint, len, align);
-	} else {
-		/*
-		 * Create dummy mapping to find an unused region of given size.
-		 * * Request for increased size for later address alignment.
-		 *
-		 * Windows Environment: 
-		 *   Use MAP_NORESERVE flag to only reserve the range of pages
-		 *   rather than commit.  We don't want the pages to be actually
-		 *   backed by the operating system paging file, as the swap
-		 *   file is usually too small to handle terabyte pools.
-		 *
-		 * Except for Windows Environment:
-		 *   Use MAP_PRIVATE with read-only access to simulate
-		 *   zero cost for overcommit accounting.  Note: MAP_NORESERVE
-		 *   flag is ignored if overcommit is disabled (mode 2).
-		 */
-#ifndef WIN32
-		addr = mmap(NULL, len + align, PROT_READ,
-				MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
-#else
-		addr = mmap(NULL, len + align, PROT_READ,
-				MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
-#endif
-		if (addr != MAP_FAILED) {
-			dprint(FD_IO, "system choice %p\n", addr);
-			munmap(addr, len + align);
-			addr = (char *)roundup((uintptr_t)addr, align);
-		}
+	if ((o->rw_min_bs & page_mask) &&
+	    (o->fsync_blocks || o->fdatasync_blocks)) {
+		log_err("libpmem: mmap options dictate a minimum block size of "
+				"%llu bytes\n",	(unsigned long long) page_size);
+		return 1;
 	}
-
-	dprint(FD_IO, "hint %p\n", addr);
-
-	return addr;
+	return 0;
 }
 
 /*
- * This is the mmap execution function
+ * This is the pmem_map_file execution function
  */
 static int fio_libpmem_file(struct thread_data *td, struct fio_file *f,
 			    size_t length, off_t off)
 {
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-	int flags = 0;
-	void *addr = NULL;
-
-	dprint(FD_IO, "DEBUG fio_libpmem_file\n");
-
-	if (td_rw(td))
-		flags = PROT_READ | PROT_WRITE;
-	else if (td_write(td)) {
-		flags = PROT_WRITE;
+	mode_t mode = 0;
+	size_t mapped_len;
+	int is_pmem;
 
-		if (td->o.verify != VERIFY_NONE)
-			flags |= PROT_READ;
-	} else
-		flags = PROT_READ;
+	if(td_rw(td))
+		mode = S_IWUSR | S_IRUSR;
+	else if (td_write(td))
+		mode = S_IWUSR;
+	else
+		mode = S_IRUSR;
 
-	dprint(FD_IO, "f->file_name = %s  td->o.verify = %d \n", f->file_name,
+	dprint(FD_IO, "DEBUG fio_libpmem_file\n");
+	dprint(FD_IO, "f->file_name = %s td->o.verify = %d \n", f->file_name,
 			td->o.verify);
-	dprint(FD_IO, "length = %ld  flags = %d  f->fd = %d off = %ld \n",
-			length, flags, f->fd,off);
-
-	addr = util_map_hint(length, 0);
+	dprint(FD_IO, "length = %ld f->fd = %d off = %ld file mode = %d \n",
+			length, f->fd, off, mode);
 
-	fdd->libpmem_ptr = mmap(addr, length, flags, MAP_SHARED, f->fd, off);
-	if (fdd->libpmem_ptr == MAP_FAILED) {
+	/* unmap any existing mapping */
+	if (fdd->libpmem_ptr) {
+		dprint(FD_IO,"pmem_unmap \n");
+		if (pmem_unmap(fdd->libpmem_ptr, fdd->libpmem_sz) < 0)
+			return errno;
 		fdd->libpmem_ptr = NULL;
-		td_verror(td, errno, "mmap");
 	}
 
-	if (td->error && fdd->libpmem_ptr)
-		munmap(fdd->libpmem_ptr, length);
-
-	return td->error;
-}
-
-/*
- * XXX Just mmap an appropriate portion, we cannot mmap the full extent
- */
-static int fio_libpmem_prep_limited(struct thread_data *td, struct io_u *io_u)
-{
-	struct fio_file *f = io_u->file;
-	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-
-	dprint(FD_IO, "DEBUG fio_libpmem_prep_limited\n" );
-
-	if (io_u->buflen > f->real_file_size) {
-		log_err("libpmem: bs too big for libpmem engine\n");
-		return EIO;
+	if((fdd->libpmem_ptr = pmem_map_file(f->file_name, length, PMEM_FILE_CREATE, mode, &mapped_len, &is_pmem)) == NULL) {
+		td_verror(td, errno, pmem_errormsg());
+		goto err;
 	}
 
-	fdd->libpmem_sz = min(MMAP_TOTAL_SZ, f->real_file_size);
-	if (fdd->libpmem_sz > f->io_size)
-		fdd->libpmem_sz = f->io_size;
+	if (!is_pmem) {
+		td_verror(td, errno, "file_name does not point to persistent memory");
+	}
 
-	fdd->libpmem_off = io_u->offset;
+err:
+	if (td->error && fdd->libpmem_ptr)
+		pmem_unmap(fdd->libpmem_ptr, length);
 
-	return fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
+	return td->error;
 }
 
-/*
- * Attempt to mmap the entire file
- */
-static int fio_libpmem_prep_full(struct thread_data *td, struct io_u *io_u)
+static int fio_libpmem_open_file(struct thread_data *td, struct fio_file *f)
 {
-	struct fio_file *f = io_u->file;
-	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-	int ret;
-
-	dprint(FD_IO, "DEBUG fio_libpmem_prep_full\n" );
+	struct fio_libpmem_data *fdd;
 
-	if (fio_file_partial_mmap(f))
-		return EINVAL;
+	dprint(FD_IO,"DEBUG fio_libpmem_open_file\n");
+	dprint(FD_IO,"f->io_size=%ld \n",f->io_size);
+	dprint(FD_IO,"td->o.size=%lld \n",td->o.size);
+	dprint(FD_IO,"td->o.iodepth=%d\n",td->o.iodepth);
+	dprint(FD_IO,"td->o.iodepth_batch=%d \n",td->o.iodepth_batch);
 
-	dprint(FD_IO," f->io_size %ld : io_u->offset %lld \n",
-			f->io_size, io_u->offset);
+	if (fio_file_open(f))
+		td_io_close_file(td, f);
 
-	if (io_u->offset != (size_t) io_u->offset ||
-	    f->io_size != (size_t) f->io_size) {
-		fio_file_set_partial_mmap(f);
-		return EINVAL;
+	fdd = calloc(1, sizeof(*fdd));
+	if (!fdd) {
+		return 1;
 	}
+	FILE_SET_ENG_DATA(f, fdd);
 	fdd->libpmem_sz = f->io_size;
 	fdd->libpmem_off = 0;
 
-	ret = fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
-	if (ret)
-		fio_file_set_partial_mmap(f);
-
-	return ret;
+	return fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
 }
 
 static int fio_libpmem_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
-	int ret;
 
 	dprint(FD_IO, "DEBUG fio_libpmem_prep\n" );
-	/*
-	 * It fits within existing mapping, use it
-	 */
-	dprint(FD_IO," io_u->offset %llu : fdd->libpmem_off %llu : "
-			"io_u->buflen %llu : fdd->libpmem_sz %llu\n",
-			io_u->offset, (unsigned long long) fdd->libpmem_off,
-			io_u->buflen, (unsigned long long) fdd->libpmem_sz);
-
-	if (io_u->offset >= fdd->libpmem_off &&
-	    (io_u->offset + io_u->buflen <=
-	     fdd->libpmem_off + fdd->libpmem_sz))
-		goto done;
-
-	/*
-	 * unmap any existing mapping
-	 */
-	if (fdd->libpmem_ptr) {
-		dprint(FD_IO,"munmap \n");
-		if (munmap(fdd->libpmem_ptr, fdd->libpmem_sz) < 0)
-			return errno;
-		fdd->libpmem_ptr = NULL;
-	}
+	dprint(FD_IO," io_u->offset %llu : fdd->libpmem_off %ld : "
+			"io_u->buflen %llu : fdd->libpmem_sz %ld\n",
+			io_u->offset, fdd->libpmem_off,
+			io_u->buflen, fdd->libpmem_sz);
 
-	if (fio_libpmem_prep_full(td, io_u)) {
-		td_clear_error(td);
-		ret = fio_libpmem_prep_limited(td, io_u);
-		if (ret)
-			return ret;
+	if (io_u->buflen > f->real_file_size) {
+		log_err("libpmem: bs bigger than the file size\n");
+		return EIO;
 	}
 
-done:
 	io_u->mmap_data = fdd->libpmem_ptr + io_u->offset - fdd->libpmem_off
 				- f->file_offset;
 	return 0;
@@ -460,10 +186,15 @@ done:
 static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 					   struct io_u *io_u)
 {
+	unsigned flags = 0;
+
 	fio_ro_check(td, io_u);
 	io_u->error = 0;
 
 	dprint(FD_IO, "DEBUG fio_libpmem_queue\n");
+	dprint(FD_IO,"td->o.odirect %d td->o.sync_io %d \n",td->o.odirect, td->o.sync_io);
+	flags = td->o.sync_io ? 0 : PMEM_F_MEM_NODRAIN;
+	flags |= td->o.odirect ? PMEM_F_MEM_NONTEMPORAL : PMEM_F_MEM_TEMPORAL;
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
@@ -472,20 +203,15 @@ static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 	case DDIR_WRITE:
 		dprint(FD_IO, "DEBUG mmap_data=%p, xfer_buf=%p\n",
 				io_u->mmap_data, io_u->xfer_buf );
-		dprint(FD_IO,"td->o.odirect %d \n",td->o.odirect);
-		if (td->o.odirect) {
-			pmem_memcpy_persist(io_u->mmap_data,
-						io_u->xfer_buf,
-						io_u->xfer_buflen);
-		} else {
-			pmem_memcpy_nodrain(io_u->mmap_data,
-						io_u->xfer_buf,
-						io_u->xfer_buflen);
-		}
+		pmem_memcpy(io_u->mmap_data,
+					io_u->xfer_buf,
+					io_u->xfer_buflen,
+					flags);
 		break;
 	case DDIR_SYNC:
 	case DDIR_DATASYNC:
 	case DDIR_SYNC_FILE_RANGE:
+		pmem_drain();
 		break;
 	default:
 		io_u->error = EINVAL;
@@ -495,53 +221,10 @@ static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
 	return FIO_Q_COMPLETED;
 }
 
-static int fio_libpmem_init(struct thread_data *td)
-{
-	struct thread_options *o = &td->o;
-
-	dprint(FD_IO,"o->rw_min_bs %llu \n o->fsync_blocks %d \n o->fdatasync_blocks %d \n",
-			o->rw_min_bs,o->fsync_blocks,o->fdatasync_blocks);
-	dprint(FD_IO, "DEBUG fio_libpmem_init\n");
-
-	if ((o->rw_min_bs & page_mask) &&
-	    (o->fsync_blocks || o->fdatasync_blocks)) {
-		log_err("libpmem: mmap options dictate a minimum block size of "
-				"%llu bytes\n",	(unsigned long long) page_size);
-		return 1;
-	}
-	return 0;
-}
-
-static int fio_libpmem_open_file(struct thread_data *td, struct fio_file *f)
-{
-	struct fio_libpmem_data *fdd;
-	int ret;
-
-	dprint(FD_IO,"DEBUG fio_libpmem_open_file\n");
-	dprint(FD_IO,"f->io_size=%ld \n",f->io_size);
-	dprint(FD_IO,"td->o.size=%lld \n",td->o.size);
-	dprint(FD_IO,"td->o.iodepth=%d\n",td->o.iodepth);
-	dprint(FD_IO,"td->o.iodepth_batch=%d \n",td->o.iodepth_batch);
-
-	ret = generic_open_file(td, f);
-	if (ret)
-		return ret;
-
-	fdd = calloc(1, sizeof(*fdd));
-	if (!fdd) {
-		int fio_unused __ret;
-		__ret = generic_close_file(td, f);
-		return 1;
-	}
-
-	FILE_SET_ENG_DATA(f, fdd);
-
-	return 0;
-}
-
 static int fio_libpmem_close_file(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+	int ret = 0;
 
 	dprint(FD_IO,"DEBUG fio_libpmem_close_file\n");
 	dprint(FD_IO,"td->o.odirect %d \n",td->o.odirect);
@@ -551,11 +234,15 @@ static int fio_libpmem_close_file(struct thread_data *td, struct fio_file *f)
 		pmem_drain();
 	}
 
+	if (fdd->libpmem_ptr)
+		ret = pmem_unmap(fdd->libpmem_ptr, fdd->libpmem_sz);
+	if (fio_file_open(f))
+		ret &= generic_close_file(td, f);
+
 	FILE_SET_ENG_DATA(f, NULL);
 	free(fdd);
-	fio_file_clear_partial_mmap(f);
 
-	return generic_close_file(td, f);
+	return ret;
 }
 
 FIO_STATIC struct ioengine_ops ioengine = {
@@ -567,22 +254,12 @@ FIO_STATIC struct ioengine_ops ioengine = {
 	.open_file	= fio_libpmem_open_file,
 	.close_file	= fio_libpmem_close_file,
 	.get_file_size	= generic_get_file_size,
-	.flags		= FIO_SYNCIO |FIO_NOEXTEND,
+	.flags		= FIO_SYNCIO | FIO_RAWIO | FIO_DISKLESSIO | FIO_NOEXTEND |
+				FIO_NODISKUTIL | FIO_BARRIER | FIO_MEMALIGN,
 };
 
 static void fio_init fio_libpmem_register(void)
 {
-#ifndef WIN32
-	Mmap_align = page_size;
-#else
-	if (Mmap_align == 0) {
-		SYSTEM_INFO si;
-
-		GetSystemInfo(&si);
-		Mmap_align = si.dwAllocationGranularity;
-	}
-#endif
-
 	register_ioengine(&ioengine);
 }
 
diff --git a/examples/libpmem.fio b/examples/libpmem.fio
index d44fcfa7..65b1d687 100644
--- a/examples/libpmem.fio
+++ b/examples/libpmem.fio
@@ -15,6 +15,7 @@ iodepth=1
 iodepth_batch=1
 thread=1
 numjobs=1
+runtime=300
 
 #
 # In case of 'scramble_buffers=1', the source buffer
@@ -27,13 +28,17 @@ numjobs=1
 scramble_buffers=0
 
 #
-# direct=0:
-#   Using pmem_memcpy_nodrain() for write operation
+# depends on direct option, flags are set for pmem_memcpy() call:
+# direct=1 - PMEM_F_MEM_NONTEMPORAL,
+# direct=0 - PMEM_F_MEM_TEMPORAL.
 #
-# direct=1:
-#   Using pmem_memcpy_persist() for write operation
+direct=1
+
+#
+# sync=1 means that pmem_drain() is executed for each write operation.
 #
-direct=0
+sync=1
+
 
 #
 # Setting for fio process's CPU Node and Memory Node
@@ -47,7 +52,7 @@ numa_mem_policy=bind:0
 cpus_allowed_policy=split
 
 #
-# The pmemblk engine does IO to files in a DAX-mounted filesystem.
+# The libpmem engine does IO to files in a DAX-mounted filesystem.
 # The filesystem should be created on an NVDIMM (e.g /dev/pmem0)
 # and then mounted with the '-o dax' option.  Note that the engine
 # accesses the underlying NVDIMM directly, bypassing the kernel block



* Recent changes (master)
@ 2020-08-20 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-20 12:00 UTC
  To: fio

The following changes since commit 3c1f3ca75b447a2bd0a93deb0a2a1210529d2ccb:

  Merge branch 'filelock_assert_fix' of https://github.com/bardavid/fio into master (2020-08-18 08:33:49 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e711df54082b5d2d739e9ee3e46a2bc23b1b3c7c:

  file: provider fio_file_free() helper (2020-08-19 13:02:42 -0600)

----------------------------------------------------------------
Dmitry Fomichev (1):
      configure: fix syntax error with NetBSD

Jens Axboe (5):
      engines/windowsaio: fix silly thinky on IO thread creation
      Merge branch 'force-windows-artifact' of https://github.com/sitsofe/fio into master
      file: track allocation origin
      init: add_job() needs to use right file freeing functions
      file: provider fio_file_free() helper

Sitsofe Wheeler (1):
      ci: always upload Windows MSI if smoke test passes

 .appveyor.yml        | 10 ++++------
 configure            | 14 +++++++-------
 engines/windowsaio.c |  8 +++-----
 file.h               |  3 +++
 filesetup.c          | 31 ++++++++++++++++---------------
 init.c               |  7 ++-----
 6 files changed, 35 insertions(+), 38 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 5c0266a1..352caeee 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -25,15 +25,13 @@ build_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
 
 after_build:
-  - cd os\windows && dobuild.cmd %PLATFORM%
+  - file.exe fio.exe
+  - make.exe test
+  - 'cd os\windows && dobuild.cmd %PLATFORM% && cd ..'
+  - ps: Get-ChildItem .\os\windows\*.msi | % { Push-AppveyorArtifact $_.FullName -FileName $_.Name -DeploymentName fio.msi }
 
 test_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -f fio.exe ] && python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug'
 
-artifacts:
-  - path: os\windows\*.msi
-    name: msi
-
 on_finish:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test && appveyor PushArtifact test-artifacts.7z'
diff --git a/configure b/configure
index dd7fe3d2..d3997b5f 100755
--- a/configure
+++ b/configure
@@ -134,18 +134,18 @@ output_sym() {
 }
 
 check_min_lib_version() {
-  local feature=$3
+  _feature=$3
 
-  if ${cross_prefix}pkg-config --atleast-version=$2 $1 > /dev/null 2>&1; then
+  if "${cross_prefix}"pkg-config --atleast-version="$2" "$1" > /dev/null 2>&1; then
     return 0
   fi
-  : ${feature:=${1}}
-  if ${cross_prefix}pkg-config --version > /dev/null 2>&1; then
-    if test ${!feature} = "yes" ; then
-      feature_not_found "$feature" "$1 >= $2"
+  : "${_feature:=${1}}"
+  if "${cross_prefix}"pkg-config --version > /dev/null 2>&1; then
+    if eval "echo \$$_feature" = "yes" ; then
+      feature_not_found "$_feature" "$1 >= $2"
     fi
   else
-    print_config "$1" "missing pkg-config, can't check $feature version"
+    print_config "$1" "missing pkg-config, can't check $_feature version"
   fi
   return 1
 }
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index ff8b6e1b..5c7e7964 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -106,12 +106,10 @@ static int fio_windowsaio_init(struct thread_data *td)
 			ctx->iocp = hFile;
 			ctx->wd = wd;
 			wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, &threadid);
-
-			if (wd->iothread != NULL &&
-			    fio_option_is_set(&td->o, cpumask))
-				fio_setaffinity(threadid, td->o.cpumask);
-			else
+			if (!wd->iothread)
 				log_err("windowsaio: failed to create io completion thread\n");
+			else if (fio_option_is_set(&td->o, cpumask))
+				fio_setaffinity(threadid, td->o.cpumask);
 		}
 
 		if (rc || wd->iothread == NULL)
diff --git a/file.h b/file.h
index 375bbfd3..493ec04a 100644
--- a/file.h
+++ b/file.h
@@ -33,6 +33,7 @@ enum fio_file_flags {
 	FIO_FILE_partial_mmap	= 1 << 6,	/* can't do full mmap */
 	FIO_FILE_axmap		= 1 << 7,	/* uses axmap */
 	FIO_FILE_lfsr		= 1 << 8,	/* lfsr is used */
+	FIO_FILE_smalloc	= 1 << 9,	/* smalloc file/file_name */
 };
 
 enum file_lock_mode {
@@ -188,6 +189,7 @@ FILE_FLAG_FNS(hashed);
 FILE_FLAG_FNS(partial_mmap);
 FILE_FLAG_FNS(axmap);
 FILE_FLAG_FNS(lfsr);
+FILE_FLAG_FNS(smalloc);
 #undef FILE_FLAG_FNS
 
 /*
@@ -229,5 +231,6 @@ extern void fio_file_reset(struct thread_data *, struct fio_file *);
 extern bool fio_files_done(struct thread_data *);
 extern bool exists_and_not_regfile(const char *);
 extern int fio_set_directio(struct thread_data *, struct fio_file *);
+extern void fio_file_free(struct fio_file *);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index 49c54b81..d382fa24 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1445,11 +1445,23 @@ void close_files(struct thread_data *td)
 	}
 }
 
+void fio_file_free(struct fio_file *f)
+{
+	if (fio_file_axmap(f))
+		axmap_free(f->io_axmap);
+	if (!fio_file_smalloc(f)) {
+		free(f->file_name);
+		free(f);
+	} else {
+		sfree(f->file_name);
+		sfree(f);
+	}
+}
+
 void close_and_free_files(struct thread_data *td)
 {
 	struct fio_file *f;
 	unsigned int i;
-	bool use_free = td_ioengine_flagged(td, FIO_NOFILEHASH);
 
 	dprint(FD_FILE, "close files\n");
 
@@ -1470,20 +1482,7 @@ void close_and_free_files(struct thread_data *td)
 		}
 
 		zbd_close_file(f);
-
-		if (use_free)
-			free(f->file_name);
-		else
-			sfree(f->file_name);
-		f->file_name = NULL;
-		if (fio_file_axmap(f)) {
-			axmap_free(f->io_axmap);
-			f->io_axmap = NULL;
-		}
-		if (use_free)
-			free(f);
-		else
-			sfree(f);
+		fio_file_free(f);
 	}
 
 	td->o.filename = NULL;
@@ -1609,6 +1608,8 @@ static struct fio_file *alloc_new_file(struct thread_data *td)
 	f->fd = -1;
 	f->shadow_fd = -1;
 	fio_file_reset(td, f);
+	if (!td_ioengine_flagged(td, FIO_NOFILEHASH))
+		fio_file_set_smalloc(f);
 	return f;
 }
 
diff --git a/init.c b/init.c
index 6ff7c68d..491b46e6 100644
--- a/init.c
+++ b/init.c
@@ -1735,11 +1735,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		if (file_alloced) {
 			if (td_new->files) {
 				struct fio_file *f;
-				for_each_file(td_new, f, i) {
-					if (f->file_name)
-						sfree(f->file_name);
-					sfree(f);
-				}
+				for_each_file(td_new, f, i)
+					fio_file_free(f);
 				free(td_new->files);
 				td_new->files = NULL;
 			}



* Recent changes (master)
@ 2020-08-19 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-19 12:00 UTC
  To: fio

The following changes since commit 09c1aa8b322765afae56ea1ebc9eaa06f94da6a6:

  engines/windowsaio: only set IOCP thread affinity if specified (2020-08-17 15:42:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3c1f3ca75b447a2bd0a93deb0a2a1210529d2ccb:

  Merge branch 'filelock_assert_fix' of https://github.com/bardavid/fio into master (2020-08-18 08:33:49 -0700)

----------------------------------------------------------------
David, Bar (1):
      filelock: fix wrong file trylock assertion.

Jens Axboe (2):
      Merge branch 'asprintf1' of https://github.com/kusumi/fio into master
      Merge branch 'filelock_assert_fix' of https://github.com/bardavid/fio into master

Tomohiro Kusumi (1):
      oslib: fix asprintf build failure

 filelock.c       | 2 +-
 oslib/asprintf.c | 1 -
 oslib/asprintf.h | 2 ++
 3 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/filelock.c b/filelock.c
index 7e92f63d..fd3a6b29 100644
--- a/filelock.c
+++ b/filelock.c
@@ -179,7 +179,7 @@ static bool __fio_lock_file(const char *fname, int trylock)
 	fio_sem_up(&fld->lock);
 
 	if (!ff) {
-		assert(!trylock);
+		assert(trylock);
 		return true;
 	}
 
diff --git a/oslib/asprintf.c b/oslib/asprintf.c
index ff503c52..2d9f811c 100644
--- a/oslib/asprintf.c
+++ b/oslib/asprintf.c
@@ -1,4 +1,3 @@
-#include <stdarg.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include "oslib/asprintf.h"
diff --git a/oslib/asprintf.h b/oslib/asprintf.h
index 7425300f..43bbb56b 100644
--- a/oslib/asprintf.h
+++ b/oslib/asprintf.h
@@ -1,6 +1,8 @@
 #ifndef FIO_ASPRINTF_H
 #define FIO_ASPRINTF_H
 
+#include <stdarg.h>
+
 #ifndef CONFIG_HAVE_VASPRINTF
 int vasprintf(char **strp, const char *fmt, va_list ap);
 #endif



* Recent changes (master)
@ 2020-08-18 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-18 12:00 UTC
  To: fio

The following changes since commit dde39f8c356158207eb2fc8ff6c9d2daad910a84:

  fio: add for_each_rw_ddir() macro (2020-08-16 21:01:22 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 09c1aa8b322765afae56ea1ebc9eaa06f94da6a6:

  engines/windowsaio: only set IOCP thread affinity if specified (2020-08-17 15:42:16 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/windowsaio: only set IOCP thread affinity if specified

 engines/windowsaio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 13d7f194..ff8b6e1b 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -107,7 +107,8 @@ static int fio_windowsaio_init(struct thread_data *td)
 			ctx->wd = wd;
 			wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, &threadid);
 
-			if (wd->iothread != NULL)
+			if (wd->iothread != NULL &&
+			    fio_option_is_set(&td->o, cpumask))
 				fio_setaffinity(threadid, td->o.cpumask);
 			else
 				log_err("windowsaio: failed to create io completion thread\n");



* Recent changes (master)
@ 2020-08-17 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-17 12:00 UTC
  To: fio

The following changes since commit 0ce7e3e584be6f4f834fb2fec8e52deb23e1dd12:

  Merge branch 'issue-1065' of https://github.com/XeS0r/fio into master (2020-08-14 16:02:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dde39f8c356158207eb2fc8ff6c9d2daad910a84:

  fio: add for_each_rw_ddir() macro (2020-08-16 21:01:22 -0700)

----------------------------------------------------------------
Alexey Dobriyan (1):
      fio: add for_each_rw_ddir() macro

Jens Axboe (1):
      Fio 3.22

 FIO-VERSION-GEN |  2 +-
 backend.c       | 16 +++++++--------
 eta.c           | 12 +++++------
 init.c          | 62 ++++++++++++++++++++++++---------------------------------
 io_ddir.h       |  2 ++
 stat.c          | 16 +++++++--------
 6 files changed, 48 insertions(+), 62 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 48e575fc..bf0fee99 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.21
+DEF_VER=fio-3.22
 
 LF='
 '
diff --git a/backend.c b/backend.c
index 0e454cdd..a4367672 100644
--- a/backend.c
+++ b/backend.c
@@ -223,12 +223,10 @@ static bool check_min_rate(struct thread_data *td, struct timespec *now)
 {
 	bool ret = false;
 
-	if (td->bytes_done[DDIR_READ])
-		ret |= __check_min_rate(td, now, DDIR_READ);
-	if (td->bytes_done[DDIR_WRITE])
-		ret |= __check_min_rate(td, now, DDIR_WRITE);
-	if (td->bytes_done[DDIR_TRIM])
-		ret |= __check_min_rate(td, now, DDIR_TRIM);
+	for_each_rw_ddir(ddir) {
+		if (td->bytes_done[ddir])
+			ret |= __check_min_rate(td, now, ddir);
+	}
 
 	return ret;
 }
@@ -1876,9 +1874,9 @@ static void *thread_main(void *data)
 
 	update_rusage_stat(td);
 	td->ts.total_run_time = mtime_since_now(&td->epoch);
-	td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ];
-	td->ts.io_bytes[DDIR_WRITE] = td->io_bytes[DDIR_WRITE];
-	td->ts.io_bytes[DDIR_TRIM] = td->io_bytes[DDIR_TRIM];
+	for_each_rw_ddir(ddir) {
+		td->ts.io_bytes[ddir] = td->io_bytes[ddir];
+	}
 
 	if (td->o.verify_state_save && !(td->flags & TD_F_VSTATE_SAVED) &&
 	    (td->o.verify != VERIFY_NONE && td_write(td)))
diff --git a/eta.c b/eta.c
index 13f61ba4..e8c72780 100644
--- a/eta.c
+++ b/eta.c
@@ -383,8 +383,8 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	struct thread_data *td;
 	int i, unified_rw_rep;
 	uint64_t rate_time, disp_time, bw_avg_time, *eta_secs;
-	unsigned long long io_bytes[DDIR_RWDIR_CNT];
-	unsigned long long io_iops[DDIR_RWDIR_CNT];
+	unsigned long long io_bytes[DDIR_RWDIR_CNT] = {};
+	unsigned long long io_iops[DDIR_RWDIR_CNT] = {};
 	struct timespec now;
 
 	static unsigned long long rate_io_bytes[DDIR_RWDIR_CNT];
@@ -413,8 +413,6 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 	je->elapsed_sec = (mtime_since_genesis() + 999) / 1000;
 
-	io_bytes[DDIR_READ] = io_bytes[DDIR_WRITE] = io_bytes[DDIR_TRIM] = 0;
-	io_iops[DDIR_READ] = io_iops[DDIR_WRITE] = io_iops[DDIR_TRIM] = 0;
 	bw_avg_time = ULONG_MAX;
 	unified_rw_rep = 0;
 	for_each_td(td, i) {
@@ -509,9 +507,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
 				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
-		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0, 0);
-		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0, 0);
-		add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0, 0);
+		for_each_rw_ddir(ddir) {
+			add_agg_sample(sample_val(je->rate[ddir]), ddir, 0, 0);
+		}
 	}
 
 	disp_time = mtime_since(&disp_prev_time, &now);
diff --git a/init.c b/init.c
index 84325f1e..6ff7c68d 100644
--- a/init.c
+++ b/init.c
@@ -564,13 +564,11 @@ static int setup_rate(struct thread_data *td)
 {
 	int ret = 0;
 
-	if (td->o.rate[DDIR_READ] || td->o.rate_iops[DDIR_READ])
-		ret = __setup_rate(td, DDIR_READ);
-	if (td->o.rate[DDIR_WRITE] || td->o.rate_iops[DDIR_WRITE])
-		ret |= __setup_rate(td, DDIR_WRITE);
-	if (td->o.rate[DDIR_TRIM] || td->o.rate_iops[DDIR_TRIM])
-		ret |= __setup_rate(td, DDIR_TRIM);
-
+	for_each_rw_ddir(ddir) {
+		if (td->o.rate[ddir] || td->o.rate_iops[ddir]) {
+			ret |= __setup_rate(td, ddir);
+		}
+	}
 	return ret;
 }
 
@@ -662,31 +660,25 @@ static int fixup_options(struct thread_data *td)
 	if (td_read(td))
 		o->overwrite = 1;
 
-	if (!o->min_bs[DDIR_READ])
-		o->min_bs[DDIR_READ] = o->bs[DDIR_READ];
-	if (!o->max_bs[DDIR_READ])
-		o->max_bs[DDIR_READ] = o->bs[DDIR_READ];
-	if (!o->min_bs[DDIR_WRITE])
-		o->min_bs[DDIR_WRITE] = o->bs[DDIR_WRITE];
-	if (!o->max_bs[DDIR_WRITE])
-		o->max_bs[DDIR_WRITE] = o->bs[DDIR_WRITE];
-	if (!o->min_bs[DDIR_TRIM])
-		o->min_bs[DDIR_TRIM] = o->bs[DDIR_TRIM];
-	if (!o->max_bs[DDIR_TRIM])
-		o->max_bs[DDIR_TRIM] = o->bs[DDIR_TRIM];
-
-	o->rw_min_bs = min(o->min_bs[DDIR_READ], o->min_bs[DDIR_WRITE]);
-	o->rw_min_bs = min(o->min_bs[DDIR_TRIM], o->rw_min_bs);
+	for_each_rw_ddir(ddir) {
+		if (!o->min_bs[ddir])
+			o->min_bs[ddir] = o->bs[ddir];
+		if (!o->max_bs[ddir])
+			o->max_bs[ddir] = o->bs[ddir];
+	}
+
+	o->rw_min_bs = -1;
+	for_each_rw_ddir(ddir) {
+		o->rw_min_bs = min(o->rw_min_bs, o->min_bs[ddir]);
+	}
 
 	/*
 	 * For random IO, allow blockalign offset other than min_bs.
 	 */
-	if (!o->ba[DDIR_READ] || !td_random(td))
-		o->ba[DDIR_READ] = o->min_bs[DDIR_READ];
-	if (!o->ba[DDIR_WRITE] || !td_random(td))
-		o->ba[DDIR_WRITE] = o->min_bs[DDIR_WRITE];
-	if (!o->ba[DDIR_TRIM] || !td_random(td))
-		o->ba[DDIR_TRIM] = o->min_bs[DDIR_TRIM];
+	for_each_rw_ddir(ddir) {
+		if (!o->ba[ddir] || !td_random(td))
+			o->ba[ddir] = o->min_bs[ddir];
+	}
 
 	if ((o->ba[DDIR_READ] != o->min_bs[DDIR_READ] ||
 	    o->ba[DDIR_WRITE] != o->min_bs[DDIR_WRITE] ||
@@ -765,14 +757,12 @@ static int fixup_options(struct thread_data *td)
 		log_err("fio: rate and rate_iops are mutually exclusive\n");
 		ret |= 1;
 	}
-	if ((o->rate[DDIR_READ] && (o->rate[DDIR_READ] < o->ratemin[DDIR_READ])) ||
-	    (o->rate[DDIR_WRITE] && (o->rate[DDIR_WRITE] < o->ratemin[DDIR_WRITE])) ||
-	    (o->rate[DDIR_TRIM] && (o->rate[DDIR_TRIM] < o->ratemin[DDIR_TRIM])) ||
-	    (o->rate_iops[DDIR_READ] && (o->rate_iops[DDIR_READ] < o->rate_iops_min[DDIR_READ])) ||
-	    (o->rate_iops[DDIR_WRITE] && (o->rate_iops[DDIR_WRITE] < o->rate_iops_min[DDIR_WRITE])) ||
-	    (o->rate_iops[DDIR_TRIM] && (o->rate_iops[DDIR_TRIM] < o->rate_iops_min[DDIR_TRIM]))) {
-		log_err("fio: minimum rate exceeds rate\n");
-		ret |= 1;
+	for_each_rw_ddir(ddir) {
+		if ((o->rate[ddir] && (o->rate[ddir] < o->ratemin[ddir])) ||
+		    (o->rate_iops[ddir] && (o->rate_iops[ddir] < o->rate_iops_min[ddir]))) {
+			log_err("fio: minimum rate exceeds rate, ddir %d\n", +ddir);
+			ret |= 1;
+		}
 	}
 
 	if (!o->timeout && o->time_based) {
diff --git a/io_ddir.h b/io_ddir.h
index deaa8b5a..a42da97a 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -16,6 +16,8 @@ enum fio_ddir {
 	DDIR_RWDIR_SYNC_CNT = 4,
 };
 
+#define for_each_rw_ddir(ddir)	for (enum fio_ddir ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
+
 static inline const char *io_ddir_name(enum fio_ddir ddir)
 {
 	static const char *name[] = { "read", "write", "trim", "sync",
diff --git a/stat.c b/stat.c
index 23657cee..7f987c7f 100644
--- a/stat.c
+++ b/stat.c
@@ -1078,12 +1078,10 @@ static void show_thread_status_normal(struct thread_stat *ts,
 	if (strlen(ts->description))
 		log_buf(out, "  Description  : [%s]\n", ts->description);
 
-	if (ts->io_bytes[DDIR_READ])
-		show_ddir_status(rs, ts, DDIR_READ, out);
-	if (ts->io_bytes[DDIR_WRITE])
-		show_ddir_status(rs, ts, DDIR_WRITE, out);
-	if (ts->io_bytes[DDIR_TRIM])
-		show_ddir_status(rs, ts, DDIR_TRIM, out);
+	for_each_rw_ddir(ddir) {
+		if (ts->io_bytes[ddir])
+			show_ddir_status(rs, ts, ddir, out);
+	}
 
 	show_latencies(ts, out);
 
@@ -2315,9 +2313,9 @@ void __show_running_run_stats(void)
 
 	for_each_td(td, i) {
 		td->update_rusage = 1;
-		td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ];
-		td->ts.io_bytes[DDIR_WRITE] = td->io_bytes[DDIR_WRITE];
-		td->ts.io_bytes[DDIR_TRIM] = td->io_bytes[DDIR_TRIM];
+		for_each_rw_ddir(ddir) {
+			td->ts.io_bytes[ddir] = td->io_bytes[ddir];
+		}
 		td->ts.total_run_time = mtime_since(&td->epoch, &ts);
 
 		rt[i] = mtime_since(&td->start, &ts);

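The `for_each_rw_ddir()` helper above replaces hand-unrolled READ/WRITE/TRIM statements with a single loop. Below is a minimal standalone sketch of the same pattern. The enum is a reduced stand-in for fio's `enum fio_ddir`, assuming `DDIR_READ`..`DDIR_TRIM` are 0..2 and `DDIR_RWDIR_CNT` is 3. `copy_io_bytes()` and `min_rate_ok()` are hypothetical helpers that mirror the `io_bytes` copy and the `rate`/`ratemin` check in the hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for fio's enum fio_ddir (assumed values). */
enum fio_ddir {
	DDIR_READ = 0,
	DDIR_WRITE = 1,
	DDIR_TRIM = 2,
	DDIR_RWDIR_CNT = 3,
};

/* Same shape as the macro added to io_ddir.h (C99 loop declaration). */
#define for_each_rw_ddir(ddir) \
	for (enum fio_ddir ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)

/* Hypothetical: copy per-direction byte counts, as __show_running_run_stats() does. */
static void copy_io_bytes(unsigned long long dst[DDIR_RWDIR_CNT],
			  const unsigned long long src[DDIR_RWDIR_CNT])
{
	for_each_rw_ddir(ddir)
		dst[ddir] = src[ddir];
}

/* Hypothetical: a configured rate must not be below its minimum, per direction. */
static bool min_rate_ok(const unsigned int rate[DDIR_RWDIR_CNT],
			const unsigned int ratemin[DDIR_RWDIR_CNT])
{
	for_each_rw_ddir(ddir) {
		if (rate[ddir] && rate[ddir] < ratemin[ddir])
			return false;
	}
	return true;
}
```

A rate of 0 means "not set" and is skipped, which matches the `o->rate[ddir] &&` guard in the patch.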


* Recent changes (master)
@ 2020-08-15 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-15 12:00 UTC
  To: fio

The following changes since commit 5c79c32f6afb39fae910912475e8fea786c3353e:

  zbd: use ->min_zone, ->max_zone in more places (2020-08-13 16:06:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0ce7e3e584be6f4f834fb2fec8e52deb23e1dd12:

  Merge branch 'issue-1065' of https://github.com/XeS0r/fio into master (2020-08-14 16:02:37 -0700)

----------------------------------------------------------------
André Wild (1):
      thread_options: Use unsigned int type for exit_what and stonewall

Jens Axboe (2):
      Fixup examples/exitwhat.fio
      Merge branch 'issue-1065' of https://github.com/XeS0r/fio into master

 cconv.c               |  8 ++++----
 examples/exitwhat.fio |  8 ++++----
 fio.1                 | 29 +++++++++++++++++++++--------
 server.h              |  2 +-
 thread_options.h      |  9 ++++-----
 5 files changed, 34 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/cconv.c b/cconv.c
index 2469389b..4b0c3490 100644
--- a/cconv.c
+++ b/cconv.c
@@ -237,8 +237,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->loops = le32_to_cpu(top->loops);
 	o->mem_type = le32_to_cpu(top->mem_type);
 	o->mem_align = le32_to_cpu(top->mem_align);
-	o->exit_what = le16_to_cpu(top->exit_what);
-	o->stonewall = le16_to_cpu(top->stonewall);
+	o->exit_what = le32_to_cpu(top->exit_what);
+	o->stonewall = le32_to_cpu(top->stonewall);
 	o->new_group = le32_to_cpu(top->new_group);
 	o->numjobs = le32_to_cpu(top->numjobs);
 	o->cpus_allowed_policy = le32_to_cpu(top->cpus_allowed_policy);
@@ -437,8 +437,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->loops = cpu_to_le32(o->loops);
 	top->mem_type = cpu_to_le32(o->mem_type);
 	top->mem_align = cpu_to_le32(o->mem_align);
-	top->exit_what = cpu_to_le16(o->exit_what);
-	top->stonewall = cpu_to_le16(o->stonewall);
+	top->exit_what = cpu_to_le32(o->exit_what);
+	top->stonewall = cpu_to_le32(o->stonewall);
 	top->new_group = cpu_to_le32(o->new_group);
 	top->numjobs = cpu_to_le32(o->numjobs);
 	top->cpus_allowed_policy = cpu_to_le32(o->cpus_allowed_policy);
diff --git a/examples/exitwhat.fio b/examples/exitwhat.fio
index a1099f0f..c91d7375 100644
--- a/examples/exitwhat.fio
+++ b/examples/exitwhat.fio
@@ -1,7 +1,7 @@
 # We want to run fast1 as long as slow1 is running, but also have a cumulative
 # report of fast1 (group_reporting=1/new_group=1).  exitall=1 would not cause
 # fast1 to stop after slow1 is done. Setting exit_what=stonewall will cause
-# alls jobs up until the next stonewall=1 setting to be stopped, when job slow1
+# alls jobs up until the next stonewall setting to be stopped, when job slow1
 # finishes.
 # In this example skipping forward to slow2/fast2. slow2 has exit_what=all set,
 # which means all jobs will be cancelled when slow2 finishes. In particular,
@@ -15,7 +15,7 @@ group_reporting=1
 exitall=1
 
 [slow1]
-rw=r
+rw=read
 numjobs=1
 ioengine=sync
 new_group=1
@@ -32,8 +32,8 @@ iodepth=32
 rate=300,300,300
 
 [slow2]
-stonewall=1
-rw=w
+stonewall
+rw=write
 numjobs=1
 ioengine=sync
 new_group=1
diff --git a/fio.1 b/fio.1
index cdd105d7..1c90e4a5 100644
--- a/fio.1
+++ b/fio.1
@@ -2569,7 +2569,8 @@ been exceeded before retrying operations.
 Wait for preceding jobs in the job file to exit, before starting this
 one. Can be used to insert serialization points in the job file. A stone
 wall also implies starting a new reporting group, see
-\fBgroup_reporting\fR.
+\fBgroup_reporting\fR. Optionally you can use `stonewall=0` to disable or
+`stonewall=1` to enable it.
 .TP
 .BI exitall
 By default, fio will continue running all other jobs when one job finishes.
@@ -2577,15 +2578,27 @@ Sometimes this is not the desired action. Setting \fBexitall\fR will instead
 make fio terminate all jobs in the same group, as soon as one job of that
 group finishes.
 .TP
-.BI exit_what
+.BI exit_what \fR=\fPstr
 By default, fio will continue running all other jobs when one job finishes.
-Sometimes this is not the desired action. Setting \fBexit_all\fR will instead
+Sometimes this is not the desired action. Setting \fBexitall\fR will instead
 make fio terminate all jobs in the same group. The option \fBexit_what\fR
-allows to control which jobs get terminated when \fBexitall\fR is enabled. The
-default is \fBgroup\fR and does not change the behaviour of \fBexitall\fR. The
-setting \fBall\fR terminates all jobs. The setting \fBstonewall\fR terminates
-all currently running jobs across all groups and continues execution with the
-next stonewalled group.
+allows you to control which jobs get terminated when \fBexitall\fR is enabled.
+The default value is \fBgroup\fR.
+The allowed values are:
+.RS
+.RS
+.TP
+.B all
+terminates all jobs.
+.TP
+.B group
+is the default and does not change the behaviour of \fBexitall\fR.
+.TP
+.B stonewall
+terminates all currently running jobs across all groups and continues
+execution with the next stonewalled group.
+.RE
+.RE
 .TP
 .BI exec_prerun \fR=\fPstr
 Before running this job, issue the command specified through
diff --git a/server.h b/server.h
index de01a5c8..efa70e7c 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 83,
+	FIO_SERVER_VER			= 84,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 3fe48ecc..14f1cbe9 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -202,8 +202,8 @@ struct thread_options {
 
 	unsigned long long max_latency;
 
-	unsigned short exit_what;
-	unsigned short stonewall;
+	unsigned int exit_what;
+	unsigned int stonewall;
 	unsigned int new_group;
 	unsigned int numjobs;
 	os_cpu_mask_t cpumask;
@@ -494,8 +494,8 @@ struct thread_options_pack {
 	uint32_t mem_type;
 	uint32_t mem_align;
 
-	uint16_t exit_what;
-	uint16_t stonewall;
+	uint32_t exit_what;
+	uint32_t stonewall;
 	uint32_t new_group;
 	uint32_t numjobs;
 	/*
@@ -546,7 +546,6 @@ struct thread_options_pack {
 	uint32_t lat_percentiles;
 	uint32_t slat_percentiles;
 	uint32_t percentile_precision;
-	uint32_t pad3;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];

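This batch widens `exit_what` and `stonewall` to 32 bits in both `struct thread_options` and the on-the-wire `struct thread_options_pack`, and switches `cconv.c` to the matching `cpu_to_le32()`/`le32_to_cpu()` helpers. `FIO_SERVER_VER` is presumably bumped because the packed layout changed. The removed `pad3` appears to offset the 4 bytes the wider fields add. The sketch below shows the byte-order idea behind such helpers, using hypothetical `put_le32()`/`get_le32()` functions that write to and read from a byte buffer. It is not fio's implementation, which converts values in place.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store v little-endian, whatever the host byte order. */
static void put_le32(uint8_t p[4], uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: load a little-endian 32-bit value. */
static uint32_t get_le32(const uint8_t p[4])
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

A value above 0xffff survives a round trip through these helpers. Storing the same value in a 16-bit field would keep only its low half. For that reason, the field width and the conversion helper have to change together on both sides.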


* Recent changes (master)
@ 2020-08-14 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-14 12:00 UTC
  To: fio

The following changes since commit 84abeef7d9f2f5cded36dcfc127b3f33db89ea57:

  io_u: calculate incremental residuals correctly (2020-08-12 11:48:15 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5c79c32f6afb39fae910912475e8fea786c3353e:

  zbd: use ->min_zone, ->max_zone in more places (2020-08-13 16:06:29 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      zbd: use ->min_zone, ->max_zone in more places

 zbd.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index e4a480b7..5af8af4a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -319,6 +319,7 @@ static bool zbd_verify_sizes(void)
 
 			f->min_zone = zbd_zone_idx(f, f->file_offset);
 			f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
+			assert(f->min_zone < f->max_zone);
 		}
 	}
 
@@ -839,9 +840,8 @@ static uint64_t zbd_process_swd(const struct fio_file *f, enum swd_action a)
 	struct fio_zone_info *zb, *ze, *z;
 	uint64_t swd = 0;
 
-	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
-	ze = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset +
-						  f->io_size)];
+	zb = &f->zbd_info->zone_info[f->min_zone];
+	ze = &f->zbd_info->zone_info[f->max_zone];
 	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
 		swd += z->wp - z->start;
@@ -1175,7 +1175,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z1, *z2;
 	const struct fio_zone_info *const zf =
-		&f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
+		&f->zbd_info->zone_info[f->min_zone];
 
 	/*
 	 * Skip to the next non-empty zone in case of sequential I/O and to
@@ -1482,8 +1482,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		if (range < min_bs ||
 		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
 			pthread_mutex_unlock(&zb->mutex);
-			zl = &f->zbd_info->zone_info[zbd_zone_idx(f,
-						f->file_offset + f->io_size)];
+			zl = &f->zbd_info->zone_info[f->max_zone];
 			zb = zbd_find_zone(td, io_u, zb, zl);
 			if (!zb) {
 				dprint(FD_ZBD,

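After this change, the first and last zone indices of a file's I/O range are computed once in `zbd_verify_sizes()` and cached as `f->min_zone` and `f->max_zone`. Loops then walk the half-open range `[min_zone, max_zone)`. The sketch below models this with a uniform zone size. `zone_idx()`, `struct zone_range`, `file_zone_range()` and `written_bytes()` are hypothetical simplifications, and the clamping in `zone_idx()` is an assumption of this model. fio's own `zbd_zone_idx()` may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: index of the zone containing offset, assuming equal-size zones. */
static uint32_t zone_idx(uint64_t offset, uint64_t zone_size, uint32_t nr_zones)
{
	uint64_t idx = offset / zone_size;

	/* Assumption: clamp past-the-end offsets to nr_zones (one past the last zone). */
	return idx < nr_zones ? (uint32_t)idx : nr_zones;
}

struct zone_range {
	uint32_t min_zone;	/* first zone touched by the file's I/O range */
	uint32_t max_zone;	/* exclusive upper bound */
};

/* Compute the range once, as zbd_verify_sizes() now does. */
static struct zone_range file_zone_range(uint64_t file_offset, uint64_t io_size,
					 uint64_t zone_size, uint32_t nr_zones)
{
	struct zone_range r = {
		.min_zone = zone_idx(file_offset, zone_size, nr_zones),
		.max_zone = zone_idx(file_offset + io_size, zone_size, nr_zones),
	};

	assert(r.min_zone < r.max_zone);
	return r;
}

/* Data written in [min_zone, max_zone), in the spirit of zbd_process_swd(). */
static uint64_t written_bytes(const uint64_t *start, const uint64_t *wp,
			      struct zone_range r)
{
	uint64_t swd = 0;

	for (uint32_t z = r.min_zone; z < r.max_zone; z++)
		swd += wp[z] - start[z];
	return swd;
}
```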


* Recent changes (master)
@ 2020-08-13 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-13 12:00 UTC
  To: fio

The following changes since commit 0d578085cc7b97b783e1719b205dd563b406ecbc:

  t/zbd: check log file for failed assertions (2020-08-11 10:42:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 84abeef7d9f2f5cded36dcfc127b3f33db89ea57:

  io_u: calculate incremental residuals correctly (2020-08-12 11:48:15 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      io_u: calculate incremental residuals correctly

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index dbb0a6f8..2ef5acec 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1977,7 +1977,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	td->last_ddir = ddir;
 
 	if (!io_u->error && ddir_rw(ddir)) {
-		unsigned long long bytes = io_u->buflen - io_u->resid;
+		unsigned long long bytes = io_u->xfer_buflen - io_u->resid;
 		int ret;
 
 		/*

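Since "io_uring: notice short IO on completion path", an `io_u` can complete in several pieces. Each completion must measure only what *that* attempt transferred. `buflen` is the length of the whole original request, so using it overstates the increment from the second partial completion onward. `xfer_buflen` is the length of the current attempt. The simplified model below shows the arithmetic. `struct mini_io_u` and `on_completion()` are hypothetical, and the model ignores fio's buffer pointer and statistics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for fio's struct io_u. */
struct mini_io_u {
	unsigned long long buflen;	  /* length of the original request */
	unsigned long long xfer_buflen;	  /* length of the current attempt */
	unsigned long long offset;	  /* where the current attempt starts */
	unsigned long long verify_offset; /* original offset, never advanced */
	unsigned long long resid;	  /* bytes the last attempt did NOT move */
};

/*
 * Handle one completion; returns true if the remainder should be requeued.
 * Mirrors the short-I/O branch of io_completed(), minus buffers and stats.
 */
static bool on_completion(struct mini_io_u *u, unsigned long long file_size)
{
	/* Bytes moved by THIS attempt: xfer_buflen (not buflen) minus resid. */
	unsigned long long bytes = u->xfer_buflen - u->resid;

	if (!u->resid)
		return false;

	u->xfer_buflen = u->resid;	/* retry only what is left */
	u->offset += bytes;		/* starting right after what was done */
	u->resid = 0;			/* fio clears resid when popping a requeue */
	return u->offset < file_size;
}
```

With the old `buflen - resid`, the second short completion in the test below would have advanced `offset` by 4096 - 2096 = 2000 bytes instead of the 1000 actually transferred.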


* Recent changes (master)
@ 2020-08-12 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-12 12:00 UTC
  To: fio

The following changes since commit cb7d7abbab67e03c901bfaf9517e0cae40a548bf:

  io_u: set io_u->verify_offset in fill_io_u() (2020-08-10 21:40:59 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0d578085cc7b97b783e1719b205dd563b406ecbc:

  t/zbd: check log file for failed assertions (2020-08-11 10:42:43 -0600)

----------------------------------------------------------------
Dmitry Fomichev (4):
      configure: improve libzbc version check
      configure: check if pkg-config is installed
      zbd: simplify zone reset code
      t/zbd: check log file for failed assertions

 configure              | 62 +++++++++++++++++++++++++--------------------
 t/zbd/test-zbd-support |  9 ++++++-
 zbd.c                  | 68 +++++++++++++++++---------------------------------
 3 files changed, 66 insertions(+), 73 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 5925e94f..dd7fe3d2 100755
--- a/configure
+++ b/configure
@@ -133,6 +133,23 @@ output_sym() {
   echo "#define $1" >> $config_host_h
 }
 
+check_min_lib_version() {
+  local feature=$3
+
+  if ${cross_prefix}pkg-config --atleast-version=$2 $1 > /dev/null 2>&1; then
+    return 0
+  fi
+  : ${feature:=${1}}
+  if ${cross_prefix}pkg-config --version > /dev/null 2>&1; then
+    if test ${!feature} = "yes" ; then
+      feature_not_found "$feature" "$1 >= $2"
+    fi
+  else
+    print_config "$1" "missing pkg-config, can't check $feature version"
+  fi
+  return 1
+}
+
 targetos=""
 cpu=""
 
@@ -152,6 +169,7 @@ march_set="no"
 libiscsi="no"
 libnbd="no"
 libaio_uring="no"
+libzbc=""
 dynamic_engines="no"
 prefix=/usr/local
 
@@ -213,6 +231,8 @@ for opt do
   ;;
   --enable-libnbd) libnbd="yes"
   ;;
+  --disable-libzbc) libzbc="no"
+  ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
   --enable-libaio-uring) libaio_uring="yes"
@@ -256,6 +276,7 @@ if test "$show_help" = "yes" ; then
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
+  echo "--disable-libzbc        Disable libzbc even if found"
   echo "--disable-tcmalloc	Disable tcmalloc support"
   echo "--enable-libaio-uring   Enable libaio emulated over io_uring"
   echo "--dynamic-libengines	Lib-based ioengines as dynamic libraries"
@@ -1517,18 +1538,17 @@ if test "$?" != "0" ; then
   echo "configure: gtk and gthread not found"
   exit 1
 fi
-if ! ${cross_prefix}pkg-config --atleast-version 2.18.0 gtk+-2.0; then
-  echo "GTK found, but need version 2.18 or higher"
-  gfio="no"
-else
+gfio="yes"
+if check_min_lib_version gtk+-2.0 2.18.0 "gfio"; then
   if compile_prog "$GTK_CFLAGS" "$GTK_LIBS" "gfio" ; then
-    gfio="yes"
     GFIO_LIBS="$LIBS $GTK_LIBS"
     CFLAGS="$CFLAGS $GTK_CFLAGS"
   else
     echo "Please install gtk and gdk libraries"
     gfio="no"
   fi
+else
+  gfio="no"
 fi
 LDFLAGS=$ORG_LDFLAGS
 fi
@@ -2178,15 +2198,11 @@ print_config "DDN's Infinite Memory Engine" "$libime"
 ##########################################
 # Check if we have libiscsi
 if test "$libiscsi" != "no" ; then
-  minimum_libiscsi=1.9.0
-  if $(pkg-config --atleast-version=$minimum_libiscsi libiscsi); then
+  if check_min_lib_version libiscsi 1.9.0; then
     libiscsi="yes"
     libiscsi_cflags=$(pkg-config --cflags libiscsi)
     libiscsi_libs=$(pkg-config --libs libiscsi)
   else
-    if test "$libiscsi" = "yes" ; then
-      feature_not_found "libiscsi" "libiscsi >= $minimum_libiscsi"
-    fi
     libiscsi="no"
   fi
 fi
@@ -2195,15 +2211,11 @@ print_config "iscsi engine" "$libiscsi"
 ##########################################
 # Check if we have libnbd (for NBD support)
 if test "$libnbd" != "no" ; then
-  minimum_libnbd=0.9.8
-  if $(pkg-config --atleast-version=$minimum_libnbd libnbd); then
+  if check_min_lib_version libnbd 0.9.8; then
     libnbd="yes"
     libnbd_cflags=$(pkg-config --cflags libnbd)
     libnbd_libs=$(pkg-config --libs libnbd)
   else
-    if test "$libnbd" = "yes" ; then
-      feature_not_found "libnbd" "libnbd >= $minimum_libnbd"
-    fi
     libnbd="no"
   fi
 fi
@@ -2454,9 +2466,6 @@ fi
 
 ##########################################
 # libzbc probe
-if test "$libzbc" != "yes" ; then
-  libzbc="no"
-fi
 cat > $TMPC << EOF
 #include <libzbc/zbc.h>
 int main(int argc, char **argv)
@@ -2466,19 +2475,18 @@ int main(int argc, char **argv)
   return zbc_open("foo=bar", O_RDONLY, &dev);
 }
 EOF
-if compile_prog "" "-lzbc" "libzbc"; then
-  libzbcvermaj=$(pkg-config --modversion libzbc | sed 's/\.[0-9]*\.[0-9]*//')
-  if test "$libzbcvermaj" -ge "5" ; then
+if test "$libzbc" != "no" ; then
+  if compile_prog "" "-lzbc" "libzbc"; then
     libzbc="yes"
+    if ! check_min_lib_version libzbc 5; then
+      libzbc="no"
+    fi
   else
-    print_config "libzbc engine" "Unsupported libzbc version (version 5 or above required)"
-    libzbc="no"
-  fi
-else
-  if test "$libzbc" = "yes" ; then
+    if test "$libzbc" = "yes" ; then
       feature_not_found "libzbc" "libzbc or libzbc/zbc.h"
+    fi
+    libzbc="no"
   fi
-  libzbc="no"
 fi
 print_config "libzbc engine" "$libzbc"
 
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 471a3487..139495d3 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -77,6 +77,13 @@ check_reset_count() {
     eval "[ '$reset_count' '$1' '$2' ]"
 }
 
+# Check log for failed assertions and crashes. Without these checks,
+# a test can succeed even when these events happen, but it must fail.
+check_log() {
+     [ ! -f "${logfile}.${1}" ] && return 0
+     ! grep -q -e "Assertion " -e "Aborted " "${logfile}.${1}"
+}
+
 # Whether or not $1 (/dev/...) is a SCSI device.
 is_scsi_device() {
     local d f
@@ -1008,7 +1015,7 @@ trap 'intr=1' SIGINT
 for test_number in "${tests[@]}"; do
     rm -f "${logfile}.${test_number}"
     echo -n "Running test $(printf "%02d" $test_number) ... "
-    if eval "test$test_number"; then
+    if eval "test$test_number" && check_log $test_number; then
 	status="PASS"
 	cc_status="${green}${status}${end}"
 	((passed++))
diff --git a/zbd.c b/zbd.c
index 3eac5df3..e4a480b7 100644
--- a/zbd.c
+++ b/zbd.c
@@ -670,24 +670,33 @@ int zbd_setup_files(struct thread_data *td)
 	return 0;
 }
 
+static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
+				struct fio_zone_info *zone)
+{
+	return zone - zbd_info->zone_info;
+}
+
 /**
- * zbd_reset_range - reset zones for a range of sectors
+ * zbd_reset_zone - reset the write pointer of a single zone
  * @td: FIO thread data.
- * @f: Fio file for which to reset zones
- * @sector: Starting sector in units of 512 bytes
- * @nr_sectors: Number of sectors in units of 512 bytes
+ * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @z: Zone to reset.
  *
  * Returns 0 upon success and a negative error code upon failure.
+ *
+ * The caller must hold z->mutex.
  */
-static int zbd_reset_range(struct thread_data *td, struct fio_file *f,
-			   uint64_t offset, uint64_t length)
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
+			  struct fio_zone_info *z)
 {
-	uint32_t zone_idx_b, zone_idx_e;
-	struct fio_zone_info *zb, *ze, *z;
+	uint64_t offset = z->start;
+	uint64_t length = (z+1)->start - offset;
 	int ret = 0;
 
 	assert(is_valid_offset(f, offset + length - 1));
 
+	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
+		zbd_zone_nr(f->zbd_info, z));
 	switch (f->zbd_info->model) {
 	case ZBD_HOST_AWARE:
 	case ZBD_HOST_MANAGED:
@@ -699,48 +708,17 @@ static int zbd_reset_range(struct thread_data *td, struct fio_file *f,
 		break;
 	}
 
-	zone_idx_b = zbd_zone_idx(f, offset);
-	zb = &f->zbd_info->zone_info[zone_idx_b];
-	zone_idx_e = zbd_zone_idx(f, offset + length);
-	ze = &f->zbd_info->zone_info[zone_idx_e];
-	for (z = zb; z < ze; z++) {
-		pthread_mutex_lock(&z->mutex);
-		pthread_mutex_lock(&f->zbd_info->mutex);
-		f->zbd_info->sectors_with_data -= z->wp - z->start;
-		pthread_mutex_unlock(&f->zbd_info->mutex);
-		z->wp = z->start;
-		z->verify_block = 0;
-		pthread_mutex_unlock(&z->mutex);
-	}
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	f->zbd_info->sectors_with_data -= z->wp - z->start;
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	z->wp = z->start;
+	z->verify_block = 0;
 
-	td->ts.nr_zone_resets += ze - zb;
+	td->ts.nr_zone_resets++;
 
 	return ret;
 }
 
-static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
-				struct fio_zone_info *zone)
-{
-	return zone - zbd_info->zone_info;
-}
-
-/**
- * zbd_reset_zone - reset the write pointer of a single zone
- * @td: FIO thread data.
- * @f: FIO file associated with the disk for which to reset a write pointer.
- * @z: Zone to reset.
- *
- * Returns 0 upon success and a negative error code upon failure.
- */
-static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
-			  struct fio_zone_info *z)
-{
-	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
-		zbd_zone_nr(f->zbd_info, z));
-
-	return zbd_reset_range(td, f, z->start, zbd_zone_end(z) - z->start);
-}
-
 /* The caller must hold f->zbd_info->mutex */
 static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 			   unsigned int open_zone_idx)

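The rewritten `zbd_reset_zone()` derives the zone length from the next entry's start (`(z+1)->start - offset`) and now counts exactly one reset per call. The sketch below models that bookkeeping. `struct mini_zone`, `struct mini_zbd` and the helper functions are hypothetical. The trailing sentinel entry is an assumption of this model. The locking that fio requires is omitted: the caller holds `z->mutex`, and `zbd_info->mutex` is taken around `sectors_with_data`.

```c
#include <assert.h>
#include <stdint.h>

struct mini_zone {
	uint64_t start;	/* first offset of the zone */
	uint64_t wp;	/* write pointer */
};

struct mini_zbd {
	struct mini_zone *zones;	/* nr_zones entries plus one sentinel */
	uint32_t nr_zones;
	uint64_t sectors_with_data;	/* sum of (wp - start) over all zones */
	unsigned long nr_zone_resets;
};

/* Like zbd_zone_nr(): the pointer difference is the zone number. */
static uint32_t zone_nr(const struct mini_zbd *d, const struct mini_zone *z)
{
	return (uint32_t)(z - d->zones);
}

/* Zone length from the next entry's start; valid thanks to the sentinel. */
static uint64_t zone_len(const struct mini_zone *z)
{
	return (z + 1)->start - z->start;
}

/* Reset one zone's write pointer and update the accounting. */
static void reset_zone(struct mini_zbd *d, struct mini_zone *z)
{
	assert(zone_nr(d, z) < d->nr_zones);
	d->sectors_with_data -= z->wp - z->start;
	z->wp = z->start;
	d->nr_zone_resets++;
}
```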


* Recent changes (master)
@ 2020-08-11 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-11 12:00 UTC
  To: fio

The following changes since commit f0ed01ed095cf1ca7c1945a5a0267e8f73b7b4a9:

  Merge branch 'master' of https://github.com/donny372/fio into master (2020-08-07 18:21:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cb7d7abbab67e03c901bfaf9517e0cae40a548bf:

  io_u: set io_u->verify_offset in fill_io_u() (2020-08-10 21:40:59 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      engines/io_uring: make sure state is updated for requeues
      io_u: reset ->resid on starting a requeue IO
      io_uring: notice short IO on completion path
      verify: use origina offset for verification
      io_u: get_next_offset() should always set io_u->verify_offset
      io_u: set io_u->verify_offset in fill_io_u()

 engines/io_uring.c | 16 ++++++++++++----
 io_u.c             | 23 +++++++++++++++++++++--
 io_u.h             |  3 ++-
 iolog.c            |  1 +
 verify.c           | 27 ++++++++++++++-------------
 5 files changed, 50 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 0ccd2318..57925594 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -234,13 +234,21 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->len = io_u->xfer_buflen;
 			sqe->buf_index = io_u->index;
 		} else {
+			struct iovec *iov = &ld->iovecs[io_u->index];
+
+			/*
+			 * Update based on actual io_u, requeue could have
+			 * adjusted these
+			 */
+			iov->iov_base = io_u->xfer_buf;
+			iov->iov_len = io_u->xfer_buflen;
+
 			sqe->opcode = ddir_to_op[io_u->ddir][!!o->nonvectored];
 			if (o->nonvectored) {
-				sqe->addr = (unsigned long)
-						ld->iovecs[io_u->index].iov_base;
-				sqe->len = ld->iovecs[io_u->index].iov_len;
+				sqe->addr = (unsigned long) iov->iov_base;
+				sqe->len = iov->iov_len;
 			} else {
-				sqe->addr = (unsigned long) &ld->iovecs[io_u->index];
+				sqe->addr = (unsigned long) iov;
 				sqe->len = 1;
 			}
 		}
diff --git a/io_u.c b/io_u.c
index 6a729e51..dbb0a6f8 100644
--- a/io_u.c
+++ b/io_u.c
@@ -464,6 +464,7 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 			log_err("fio: bug in offset generation: offset=%llu, b=%llu\n", (unsigned long long) offset, (unsigned long long) b);
 			ret = 1;
 		}
+		io_u->verify_offset = io_u->offset;
 	}
 
 	return ret;
@@ -506,6 +507,7 @@ static int get_next_offset(struct thread_data *td, struct io_u *io_u,
 		return 1;
 	}
 
+	io_u->verify_offset = io_u->offset;
 	return 0;
 }
 
@@ -964,6 +966,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 
 out:
 	dprint_io_u(io_u, "fill");
+	io_u->verify_offset = io_u->offset;
 	td->zone_bytes += io_u->buflen;
 	return 0;
 }
@@ -1564,9 +1567,10 @@ struct io_u *__get_io_u(struct thread_data *td)
 		__td_io_u_lock(td);
 
 again:
-	if (!io_u_rempty(&td->io_u_requeues))
+	if (!io_u_rempty(&td->io_u_requeues)) {
 		io_u = io_u_rpop(&td->io_u_requeues);
-	else if (!queue_full(td)) {
+		io_u->resid = 0;
+	} else if (!queue_full(td)) {
 		io_u = io_u_qpop(&td->io_u_freelist);
 
 		io_u->file = NULL;
@@ -1976,6 +1980,21 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		unsigned long long bytes = io_u->buflen - io_u->resid;
 		int ret;
 
+		/*
+		 * Make sure we notice short IO from here, and requeue them
+		 * appropriately!
+		 */
+		if (io_u->resid) {
+			io_u->xfer_buflen = io_u->resid;
+			io_u->xfer_buf += bytes;
+			io_u->offset += bytes;
+			td->ts.short_io_u[io_u->ddir]++;
+			if (io_u->offset < io_u->file->real_file_size) {
+				requeue_io_u(td, io_u_ptr);
+				return;
+			}
+		}
+
 		td->io_blocks[ddir]++;
 		td->io_bytes[ddir] += bytes;
 
diff --git a/io_u.h b/io_u.h
index 87c29201..31100928 100644
--- a/io_u.h
+++ b/io_u.h
@@ -53,7 +53,8 @@ struct io_u {
 	 * Allocated/set buffer and length
 	 */
 	unsigned long long buflen;
-	unsigned long long offset;
+	unsigned long long offset;	/* is really ->xfer_offset... */
+	unsigned long long verify_offset;	/* is really ->offset */
 	void *buf;
 
 	/*
diff --git a/iolog.c b/iolog.c
index 7f21be51..fa40c857 100644
--- a/iolog.c
+++ b/iolog.c
@@ -174,6 +174,7 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 		io_u->ddir = ipo->ddir;
 		if (ipo->ddir != DDIR_WAIT) {
 			io_u->offset = ipo->offset;
+			io_u->verify_offset = ipo->offset;
 			io_u->buflen = ipo->len;
 			io_u->file = td->files[ipo->fileno];
 			get_file(io_u->file);
diff --git a/verify.c b/verify.c
index 5ee0029d..a418c054 100644
--- a/verify.c
+++ b/verify.c
@@ -302,7 +302,7 @@ static void __dump_verify_buffers(struct verify_header *hdr, struct vcont *vc)
 	 */
 	hdr_offset = vc->hdr_num * hdr->len;
 
-	dump_buf(io_u->buf + hdr_offset, hdr->len, io_u->offset + hdr_offset,
+	dump_buf(io_u->buf + hdr_offset, hdr->len, io_u->verify_offset + hdr_offset,
 			"received", vc->io_u->file);
 
 	/*
@@ -317,7 +317,7 @@ static void __dump_verify_buffers(struct verify_header *hdr, struct vcont *vc)
 
 	fill_pattern_headers(td, &dummy, hdr->rand_seed, 1);
 
-	dump_buf(buf + hdr_offset, hdr->len, io_u->offset + hdr_offset,
+	dump_buf(buf + hdr_offset, hdr->len, io_u->verify_offset + hdr_offset,
 			"expected", vc->io_u->file);
 	free(buf);
 }
@@ -339,12 +339,12 @@ static void log_verify_failure(struct verify_header *hdr, struct vcont *vc)
 {
 	unsigned long long offset;
 
-	offset = vc->io_u->offset;
+	offset = vc->io_u->verify_offset;
 	offset += vc->hdr_num * hdr->len;
 	log_err("%.8s: verify failed at file %s offset %llu, length %u"
-			" (requested block: offset=%llu, length=%llu)\n",
+			" (requested block: offset=%llu, length=%llu, flags=%x)\n",
 			vc->name, vc->io_u->file->file_name, offset, hdr->len,
-			vc->io_u->offset, vc->io_u->buflen);
+			vc->io_u->verify_offset, vc->io_u->buflen, vc->io_u->flags);
 
 	if (vc->good_crc && vc->bad_crc) {
 		log_err("       Expected CRC: ");
@@ -801,7 +801,7 @@ static int verify_trimmed_io_u(struct thread_data *td, struct io_u *io_u)
 
 	log_err("trim: verify failed at file %s offset %llu, length %llu"
 		", block offset %lu\n",
-			io_u->file->file_name, io_u->offset, io_u->buflen,
+			io_u->file->file_name, io_u->verify_offset, io_u->buflen,
 			(unsigned long) offset);
 	return EILSEQ;
 }
@@ -829,10 +829,10 @@ static int verify_header(struct io_u *io_u, struct thread_data *td,
 			hdr->rand_seed, io_u->rand_seed);
 		goto err;
 	}
-	if (hdr->offset != io_u->offset + hdr_num * td->o.verify_interval) {
+	if (hdr->offset != io_u->verify_offset + hdr_num * td->o.verify_interval) {
 		log_err("verify: bad header offset %"PRIu64
 			", wanted %llu",
-			hdr->offset, io_u->offset);
+			hdr->offset, io_u->verify_offset);
 		goto err;
 	}
 
@@ -864,11 +864,11 @@ err:
 	log_err(" at file %s offset %llu, length %u"
 		" (requested block: offset=%llu, length=%llu)\n",
 		io_u->file->file_name,
-		io_u->offset + hdr_num * hdr_len, hdr_len,
-		io_u->offset, io_u->buflen);
+		io_u->verify_offset + hdr_num * hdr_len, hdr_len,
+		io_u->verify_offset, io_u->buflen);
 
 	if (td->o.verify_dump)
-		dump_buf(p, hdr_len, io_u->offset + hdr_num * hdr_len,
+		dump_buf(p, hdr_len, io_u->verify_offset + hdr_num * hdr_len,
 				"hdr_fail", io_u->file);
 
 	return EILSEQ;
@@ -1156,7 +1156,7 @@ static void __fill_hdr(struct thread_data *td, struct io_u *io_u,
 	hdr->verify_type = td->o.verify;
 	hdr->len = header_len;
 	hdr->rand_seed = rand_seed;
-	hdr->offset = io_u->offset + header_num * td->o.verify_interval;
+	hdr->offset = io_u->verify_offset + header_num * td->o.verify_interval;
 	hdr->time_sec = io_u->start_time.tv_sec;
 	hdr->time_nsec = io_u->start_time.tv_nsec;
 	hdr->thread = td->thread_number;
@@ -1334,6 +1334,7 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		td->io_hist_len--;
 
 		io_u->offset = ipo->offset;
+		io_u->verify_offset = ipo->offset;
 		io_u->buflen = ipo->len;
 		io_u->numberio = ipo->numberio;
 		io_u->file = ipo->file;
@@ -1866,7 +1867,7 @@ int verify_state_should_stop(struct thread_data *td, struct io_u *io_u)
 	for (i = 0; i < s->no_comps; i++) {
 		if (s->comps[i].fileno != f->fileno)
 			continue;
-		if (io_u->offset == s->comps[i].offset)
+		if (io_u->verify_offset == s->comps[i].offset)
 			return 0;
 	}
 

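A requeued short I/O now advances `io_u->offset`, so this batch adds `io_u->verify_offset` to record where the block originally started. Every verify path (header fill, header check, failure logging) switches to it. The sketch below isolates the header-offset check. `expected_hdr_offset()` and `hdr_offset_ok()` are hypothetical helpers, and `verify_interval` plays the role of `td->o.verify_interval`.

```c
#include <assert.h>
#include <stdbool.h>

/* Offset that header number hdr_num of a block should carry. */
static unsigned long long expected_hdr_offset(unsigned long long verify_offset,
					      unsigned int hdr_num,
					      unsigned int verify_interval)
{
	return verify_offset + (unsigned long long)hdr_num * verify_interval;
}

/* Hypothetical check in the spirit of verify_header(): compare against the
 * original block offset, not the possibly advanced transfer offset. */
static bool hdr_offset_ok(unsigned long long hdr_offset,
			  unsigned long long verify_offset,
			  unsigned int hdr_num, unsigned int verify_interval)
{
	return hdr_offset == expected_hdr_offset(verify_offset, hdr_num,
						 verify_interval);
}
```

Suppose a short I/O had advanced `offset` from 8192 to 9192. Checking header 3 with a 512-byte interval against `offset` would then expect 10728 instead of 9728 and report a spurious mismatch.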


* Recent changes (master)
@ 2020-08-08 12:00 Jens Axboe
From: Jens Axboe @ 2020-08-08 12:00 UTC
  To: fio

The following changes since commit 87622bf5295880682bfad5ba14116cf5facbaf2c:

  Merge branch 'master' of https://github.com/bvanassche/fio into master (2020-08-01 15:07:38 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f0ed01ed095cf1ca7c1945a5a0267e8f73b7b4a9:

  Merge branch 'master' of https://github.com/donny372/fio into master (2020-08-07 18:21:52 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/donny372/fio into master

donny372 (1):
      Add support for reading iolog from stdin.

 HOWTO   |  3 +++
 fio.1   |  4 +++-
 init.c  | 12 ++++++++++++
 iolog.c |  2 ++
 4 files changed, 20 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 35ead0cb..e0403b08 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2625,6 +2625,9 @@ I/O replay
 	character. See the :option:`filename` option for information on how to
 	escape ':' characters within the file names. These files will
 	be sequentially assigned to job clones created by :option:`numjobs`.
+	'-' is a reserved name, meaning read from stdin, notably if
+	:option:`filename` is set to '-' which means stdin as well, then
+	this flag can't be set to '-'.
 
 .. option:: read_iolog_chunked=bool
 
diff --git a/fio.1 b/fio.1
index a3d348b2..cdd105d7 100644
--- a/fio.1
+++ b/fio.1
@@ -2336,7 +2336,9 @@ replay, the file needs to be turned into a blkparse binary data file first
 You can specify a number of files by separating the names with a ':' character.
 See the \fBfilename\fR option for information on how to escape ':'
 characters within the file names. These files will be sequentially assigned to
-job clones created by \fBnumjobs\fR.
+job clones created by \fBnumjobs\fR. '-' is a reserved name, meaning read from
+stdin, notably if \fBfilename\fR is set to '-' which means stdin as well,
+then this flag can't be set to '-'.
 .TP
 .BI read_iolog_chunked \fR=\fPbool
 Determines how iolog is read. If false (default) entire \fBread_iolog\fR will
diff --git a/init.c b/init.c
index 3710e3d4..84325f1e 100644
--- a/init.c
+++ b/init.c
@@ -2068,6 +2068,18 @@ static int __parse_jobs_ini(struct thread_data *td,
 		}
 
 		ret = fio_options_parse(td, opts, num_opts);
+
+		if (!ret) {
+			if (!strcmp(file, "-") && td->o.read_iolog_file != NULL) {
+				char *fname = get_name_by_idx(td->o.read_iolog_file,
+							      td->subjob_number);
+				if (!strcmp(fname, "-")) {
+					log_err("fio: we can't read both iolog "
+						"and job file from stdin.\n");
+					ret = 1;
+				}
+			}
+		}
 		if (!ret) {
 			if (dump_cmdline)
 				dump_opt_list(td);
diff --git a/iolog.c b/iolog.c
index 4af10da3..7f21be51 100644
--- a/iolog.c
+++ b/iolog.c
@@ -620,6 +620,8 @@ static bool init_iolog_read(struct thread_data *td)
 		fd = open_socket(fname);
 		if (fd >= 0)
 			f = fdopen(fd, "r");
+	} else if (!strcmp(fname, "-")) {
+		f = stdin;
 	} else
 		f = fopen(fname, "r");
 

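The stdin support reduces to two rules. First, the reserved name `-` selects `stdin` instead of `fopen()`. Second, fio rejects the case where both the job file and the iolog are `-`, since both cannot be read from the same stream. Below is a minimal sketch with hypothetical helpers `open_iolog()` and `stdin_conflict()`. fio's real logic lives in `init_iolog_read()` and `__parse_jobs_ini()`, which also handle sockets and `:`-separated name lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Open an iolog for reading; "-" is reserved for standard input. */
static FILE *open_iolog(const char *fname)
{
	if (!strcmp(fname, "-"))
		return stdin;
	return fopen(fname, "r");
}

/* True if the job file and the iolog would both be read from stdin. */
static bool stdin_conflict(const char *job_file, const char *iolog)
{
	return !strcmp(job_file, "-") && iolog && !strcmp(iolog, "-");
}
```

A caller that gets `stdin` back should not `fclose()` it like an ordinary log file.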


* Recent changes (master)
@ 2020-08-02 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-08-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b5aba537d844f73187eb931179ac59e7da570e7c:

  iolog: ensure that dynamic log entries are at least queue depth sized (2020-07-27 16:00:20 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 87622bf5295880682bfad5ba14116cf5facbaf2c:

  Merge branch 'master' of https://github.com/bvanassche/fio into master (2020-08-01 15:07:38 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Prevent that fio hangs when using io_submit_mode=offload

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio into master

 rate-submit.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/rate-submit.c b/rate-submit.c
index b7b70372..13dbe7a2 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -97,8 +97,11 @@ static int io_workqueue_fn(struct submit_worker *sw,
 			td->cur_depth -= ret;
 	}
 
-	if (error || td->error)
+	if (error || td->error) {
+		pthread_mutex_lock(&td->io_u_lock);
 		pthread_cond_signal(&td->parent->free_cond);
+		pthread_mutex_unlock(&td->io_u_lock);
+	}
 
 	return 0;
 }

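The rate-submit.c fix takes td->io_u_lock around pthread_cond_signal(). Assuming the waiter checks its condition while holding that same mutex, signaling under the lock guarantees the wakeup cannot land in the gap between the waiter's check and its call to pthread_cond_wait(). The sketch below shows the general POSIX pattern; the free_slot struct and its fields are hypothetical stand-ins, not fio's data structures.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state standing in for a lock/cond pair. */
struct free_slot {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Signaling side: update the predicate and signal under the lock. */
static void notify_done(struct free_slot *s)
{
	pthread_mutex_lock(&s->lock);
	s->done = true;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

/*
 * Waiting side: re-check the predicate in a loop, because
 * pthread_cond_wait() may return spuriously.
 */
static void wait_done(struct free_slot *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->done)
		pthread_cond_wait(&s->cond, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Thread entry point that performs the notification. */
static void *worker(void *arg)
{
	notify_done(arg);
	return NULL;
}
```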


* Recent changes (master)
@ 2020-07-28 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 11b280c158db01744effde3863bfe9c65f7af090:

  t/jobs/t001[1-2].fio: run for 10 seconds instead of 3 (2020-07-26 22:19:47 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b5aba537d844f73187eb931179ac59e7da570e7c:

  iolog: ensure that dynamic log entries are at least queue depth sized (2020-07-27 16:00:20 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Add roundup_pow2() as a generic helper
      iolog: ensure that dynamic log entries are at least queue depth sized

 engines/io_uring.c |  6 +-----
 iolog.c            |  6 +++++-
 lib/roundup.h      | 11 +++++++++++
 3 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 lib/roundup.h

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ecff0657..0ccd2318 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -17,6 +17,7 @@
 #include "../optgroup.h"
 #include "../lib/memalign.h"
 #include "../lib/fls.h"
+#include "../lib/roundup.h"
 
 #ifdef ARCH_HAVE_IOURING
 
@@ -654,11 +655,6 @@ static int fio_ioring_post_init(struct thread_data *td)
 	return 0;
 }
 
-static unsigned roundup_pow2(unsigned depth)
-{
-	return 1UL << __fls(depth - 1);
-}
-
 static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
diff --git a/iolog.c b/iolog.c
index 4a79fc46..4af10da3 100644
--- a/iolog.c
+++ b/iolog.c
@@ -19,6 +19,7 @@
 #include "smalloc.h"
 #include "blktrace.h"
 #include "pshared.h"
+#include "lib/roundup.h"
 
 #include <netinet/in.h>
 #include <netinet/tcp.h>
@@ -748,10 +749,13 @@ void setup_log(struct io_log **log, struct log_params *p,
 	}
 
 	if (l->td && l->td->o.io_submit_mode != IO_MODE_OFFLOAD) {
+		unsigned int def_samples = DEF_LOG_ENTRIES;
 		struct io_logs *__p;
 
 		__p = calloc(1, sizeof(*l->pending));
-		__p->max_samples = DEF_LOG_ENTRIES;
+		if (l->td->o.iodepth > DEF_LOG_ENTRIES)
+			def_samples = roundup_pow2(l->td->o.iodepth);
+		__p->max_samples = def_samples;
 		__p->log = calloc(__p->max_samples, log_entry_sz(l));
 		l->pending = __p;
 	}
diff --git a/lib/roundup.h b/lib/roundup.h
new file mode 100644
index 00000000..5a99c8ac
--- /dev/null
+++ b/lib/roundup.h
@@ -0,0 +1,11 @@
+#ifndef FIO_ROUNDUP_H
+#define FIO_ROUNDUP_H
+
+#include "lib/fls.h"
+
+static inline unsigned roundup_pow2(unsigned depth)
+{
+	return 1UL << __fls(depth - 1);
+}
+
+#endif

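The new lib/roundup.h helper computes 1 << fls(depth - 1), and setup_log() now sizes the pending log to at least the queue depth rounded up to a power of two. The sketch below reproduces that arithmetic with a portable loop-based fls, assuming fio's __fls() follows the kernel fls() convention (fls(0) == 0, fls(1) == 1). The default of 1024 entries is an assumption for illustration only; the real value is whatever DEF_LOG_ENTRIES is defined as.

```c
/* 1-based index of the most significant set bit; fls_u32(0) == 0. */
static int fls_u32(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Round up to a power of two; depth must be in [1, 2^31]. */
static unsigned int roundup_pow2(unsigned int depth)
{
	return 1U << fls_u32(depth - 1);
}

/* Assumed default, used here only for illustration. */
#define EXAMPLE_DEF_LOG_ENTRIES 1024U

/* Same sizing rule as the setup_log() hunk above. */
static unsigned int pending_log_samples(unsigned int iodepth)
{
	unsigned int def_samples = EXAMPLE_DEF_LOG_ENTRIES;

	if (iodepth > def_samples)
		def_samples = roundup_pow2(iodepth);
	return def_samples;
}
```

For example, a depth of 9 rounds up to 16, while an iodepth of 1025 yields 2048 pending samples instead of the default.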


* Recent changes (master)
@ 2020-07-27 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e235e74dfd0627b17690194f957da509b8ace808:

  Merge branch 'test-cleanup' of https://github.com/vincentkfu/fio (2020-07-25 13:55:16 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 11b280c158db01744effde3863bfe9c65f7af090:

  t/jobs/t001[1-2].fio: run for 10 seconds instead of 3 (2020-07-26 22:19:47 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'travis_cleanup' of https://github.com/sitsofe/fio
      Merge branch 'enable_or_exit' of https://github.com/sitsofe/fio
      Merge branch 'fix_travis_libiscsi' of https://github.com/sitsofe/fio
      t/jobs/t001[1-2].fio: run for 10 seconds instead of 3

Sitsofe Wheeler (12):
      travis: simplify yml file
      travis: improve installation dependencies
      configure: check for Debian/Ubuntu tcmalloc_minimal
      travis: use install section instead of before_install section
      travis: add dependency for cuda ioengine
      travis: enable libiscsi and cuda ioengines
      travis: make CI install script bail out on first error
      memory: fix incorrect pointer comparison when freeing cuda memory
      Makefile: fix incorrectly set libiscsi cflags
      configure: fail when explicit enabling doesn't succeed
      travis: remove unneeded dependency
      travis: fix x86 libiscsi detection

 .travis.yml               | 18 ++++----------
 Makefile                  |  2 +-
 ci/travis-build.sh        | 23 ++++++++++++++----
 ci/travis-install.sh      | 30 ++++++++++++++++-------
 configure                 | 62 +++++++++++++++++++++++++++++------------------
 memory.c                  |  2 +-
 t/jobs/t0011-5d2788d5.fio |  2 +-
 t/jobs/t0012.fio          |  2 +-
 8 files changed, 87 insertions(+), 54 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index b64f0a95..e35aff39 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -9,36 +9,28 @@ arch:
   - amd64
   - arm64
 env:
-  matrix:
-    - BUILD_ARCH="x86"
-    - BUILD_ARCH="x86_64"
   global:
     - MAKEFLAGS="-j 2"
 matrix:
   include:
+    - os: linux
+      compiler: gcc
+      arch: amd64
+      env: BUILD_ARCH="x86" # Only do the gcc x86 build to reduce clutter
     # Default xcode image
     - os: osx
       compiler: clang # Workaround travis setting CC=["clang", "gcc"]
-      env: BUILD_ARCH="x86_64"
       arch: amd64
     # Latest xcode image (needs periodic updating)
     - os: osx
       compiler: clang
       osx_image: xcode11.2
-      env: BUILD_ARCH="x86_64"
       arch: amd64
   exclude:
     - os: osx
       compiler: gcc
-    - os: linux
-      compiler: clang
-      arch: amd64
-      env: BUILD_ARCH="x86" # Only do the gcc x86 build to reduce clutter
-    - os: linux
-      env: BUILD_ARCH="x86"
-      arch: arm64
 
-before_install:
+install:
   - ci/travis-install.sh
 
 script:
diff --git a/Makefile b/Makefile
index f374ac84..8e1ebc90 100644
--- a/Makefile
+++ b/Makefile
@@ -62,7 +62,7 @@ endif
 ifdef CONFIG_LIBISCSI
   iscsi_SRCS = engines/libiscsi.c
   iscsi_LIBS = $(LIBISCSI_LIBS)
-  iscsi_CFLAGS = $(LIBISCSI_LIBS)
+  iscsi_CFLAGS = $(LIBISCSI_CFLAGS)
   ENGINES += iscsi
 endif
 
diff --git a/ci/travis-build.sh b/ci/travis-build.sh
index 06012e89..231417e2 100755
--- a/ci/travis-build.sh
+++ b/ci/travis-build.sh
@@ -1,16 +1,29 @@
 #!/bin/bash
 
+CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
 EXTRA_CFLAGS="-Werror"
 PYTHONUNBUFFERED=TRUE
+CONFIGURE_FLAGS=()
 
-if [[ "$BUILD_ARCH" == "x86" ]]; then
-    EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
-fi
+case "$TRAVIS_OS_NAME" in
+    "linux")
+        CONFIGURE_FLAGS+=(--enable-libiscsi)
+        case "$CI_TARGET_ARCH" in
+            "x86")
+                EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
+                ;;
+            "amd64")
+                CONFIGURE_FLAGS+=(--enable-cuda)
+                ;;
+        esac
+    ;;
+esac
+CONFIGURE_FLAGS+=(--extra-cflags="${EXTRA_CFLAGS}")
 
-./configure --extra-cflags="${EXTRA_CFLAGS}" &&
+./configure "${CONFIGURE_FLAGS[@]}" &&
     make &&
     make test &&
-    if [[ "$TRAVIS_CPU_ARCH" == "arm64" ]]; then
+    if [[ "$CI_TARGET_ARCH" == "arm64" ]]; then
 	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
     else
 	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
diff --git a/ci/travis-install.sh b/ci/travis-install.sh
index 232ab6b1..b6895e82 100755
--- a/ci/travis-install.sh
+++ b/ci/travis-install.sh
@@ -1,13 +1,15 @@
 #!/bin/bash
+set -e
 
+CI_TARGET_ARCH="${BUILD_ARCH:-$TRAVIS_CPU_ARCH}"
 case "$TRAVIS_OS_NAME" in
     "linux")
 	# Architecture-dependent packages.
 	pkgs=(
 	    libaio-dev
-	    libcunit1
 	    libcunit1-dev
-	    libgoogle-perftools4
+	    libfl-dev
+	    libgoogle-perftools-dev
 	    libibverbs-dev
 	    libiscsi-dev
 	    libnuma-dev
@@ -15,15 +17,26 @@ case "$TRAVIS_OS_NAME" in
 	    librdmacm-dev
 	    libz-dev
 	)
-	if [[ "$BUILD_ARCH" == "x86" ]]; then
-	    pkgs=("${pkgs[@]/%/:i386}")
-	    pkgs+=(gcc-multilib)
-	else
-	    pkgs+=(glusterfs-common)
+	case "$CI_TARGET_ARCH" in
+	    "x86")
+		pkgs=("${pkgs[@]/%/:i386}")
+		pkgs+=(
+		    gcc-multilib
+		    pkg-config:i386
+	        )
+		;;
+	    "amd64")
+		pkgs+=(nvidia-cuda-dev)
+		;;
+	esac
+	if [[ $CI_TARGET_ARCH != "x86" ]]; then
+		pkgs+=(glusterfs-common)
 	fi
 	# Architecture-independent packages and packages for which we don't
 	# care about the architecture.
 	pkgs+=(
+	    bison
+	    flex
 	    python3
 	    python3-scipy
 	    python3-six
@@ -34,8 +47,7 @@ case "$TRAVIS_OS_NAME" in
     "osx")
 	brew update >/dev/null 2>&1
 	brew install cunit
-	pip3 install scipy
-	pip3 install six
+	pip3 install scipy six
 	;;
 esac
 
diff --git a/configure b/configure
index 25216c63..5925e94f 100755
--- a/configure
+++ b/configure
@@ -144,6 +144,7 @@ libhdfs="no"
 pmemblk="no"
 devdax="no"
 pmem="no"
+cuda="no"
 disable_lex=""
 disable_pmem="no"
 disable_native="no"
@@ -202,7 +203,7 @@ for opt do
   ;;
   --disable-pmem) disable_pmem="yes"
   ;;
-  --enable-cuda) enable_cuda="yes"
+  --enable-cuda) cuda="yes"
   ;;
   --disable-native) disable_native="yes"
   ;;
@@ -626,8 +627,13 @@ int main(void)
   return 0;
 }
 EOF
-  if test "$libaio_uring" = "yes" && compile_prog "" "-luring" "libaio io_uring" ; then
-    libaio=yes
+  if test "$libaio_uring" = "yes"; then
+    if compile_prog "" "-luring" "libaio io_uring" ; then
+      libaio=yes
+      LIBS="-luring $LIBS"
+    else
+      feature_not_found "libaio io_uring" ""
+    fi
   elif compile_prog "" "-laio" "libaio" ; then
     libaio=yes
     libaio_uring=no
@@ -2052,7 +2058,7 @@ if test "$libhdfs" = "yes" ; then
     hdfs_conf_error=1
   fi
   if test "$hdfs_conf_error" = "1" ; then
-    exit 1
+    feature_not_found "libhdfs" ""
   fi
   FIO_HDFS_CPU=$cpu
   if test "$FIO_HDFS_CPU" = "x86_64" ; then
@@ -2170,15 +2176,16 @@ fi
 print_config "DDN's Infinite Memory Engine" "$libime"
 
 ##########################################
-# Check if we have required environment variables configured for libiscsi
-if test "$libiscsi" = "yes" ; then
-  if $(pkg-config --atleast-version=1.9.0 libiscsi); then
+# Check if we have libiscsi
+if test "$libiscsi" != "no" ; then
+  minimum_libiscsi=1.9.0
+  if $(pkg-config --atleast-version=$minimum_libiscsi libiscsi); then
     libiscsi="yes"
     libiscsi_cflags=$(pkg-config --cflags libiscsi)
     libiscsi_libs=$(pkg-config --libs libiscsi)
   else
     if test "$libiscsi" = "yes" ; then
-      echo "libiscsi" "Install libiscsi >= 1.9.0"
+      feature_not_found "libiscsi" "libiscsi >= $minimum_libiscsi"
     fi
     libiscsi="no"
   fi
@@ -2186,16 +2193,16 @@ fi
 print_config "iscsi engine" "$libiscsi"
 
 ##########################################
-# Check if we have libnbd (for NBD support).
-minimum_libnbd=0.9.8
-if test "$libnbd" = "yes" ; then
+# Check if we have libnbd (for NBD support)
+if test "$libnbd" != "no" ; then
+  minimum_libnbd=0.9.8
   if $(pkg-config --atleast-version=$minimum_libnbd libnbd); then
     libnbd="yes"
     libnbd_cflags=$(pkg-config --cflags libnbd)
     libnbd_libs=$(pkg-config --libs libnbd)
   else
     if test "$libnbd" = "yes" ; then
-      echo "libnbd" "Install libnbd >= $minimum_libnbd"
+      feature_not_found "libnbd" "libnbd >= $minimum_libnbd"
     fi
     libnbd="no"
   fi
@@ -2506,9 +2513,7 @@ print_config "march_armv8_a_crc_crypto" "$march_armv8_a_crc_crypto"
 
 ##########################################
 # cuda probe
-if test "$cuda" != "yes" ; then
-  cuda="no"
-fi
+if test "$cuda" != "no" ; then
 cat > $TMPC << EOF
 #include <cuda.h>
 int main(int argc, char **argv)
@@ -2516,9 +2521,15 @@ int main(int argc, char **argv)
   return cuInit(0);
 }
 EOF
-if test "$enable_cuda" = "yes" && compile_prog "" "-lcuda" "cuda"; then
-  cuda="yes"
-  LIBS="-lcuda $LIBS"
+  if compile_prog "" "-lcuda" "cuda"; then
+    cuda="yes"
+    LIBS="-lcuda $LIBS"
+  else
+    if test "$cuda" = "yes" ; then
+      feature_not_found "cuda" ""
+    fi
+    cuda="no"
+  fi
 fi
 print_config "cuda" "$cuda"
 
@@ -3006,11 +3017,16 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if test "$disable_tcmalloc" != "yes"  && compile_prog "" "-ltcmalloc" "tcmalloc"; then
-  LIBS="-ltcmalloc $LIBS"
-  tcmalloc="yes"
-else
-  tcmalloc="no"
+if test "$disable_tcmalloc" != "yes"; then
+  if compile_prog "" "-ltcmalloc" "tcmalloc"; then
+    tcmalloc="yes"
+    LIBS="-ltcmalloc $LIBS"
+  elif compile_prog "" "-l:libtcmalloc_minimal.so.4" "tcmalloc_minimal4"; then
+    tcmalloc="yes"
+    LIBS="-l:libtcmalloc_minimal.so.4 $LIBS"
+  else
+    tcmalloc="no"
+  fi
 fi
 print_config "TCMalloc support" "$tcmalloc"
 
diff --git a/memory.c b/memory.c
index 5f0225f7..6cf73333 100644
--- a/memory.c
+++ b/memory.c
@@ -274,7 +274,7 @@ static int alloc_mem_cudamalloc(struct thread_data *td, size_t total_mem)
 static void free_mem_cudamalloc(struct thread_data *td)
 {
 #ifdef CONFIG_CUDA
-	if (td->dev_mem_ptr != NULL)
+	if (td->dev_mem_ptr)
 		cuMemFree(td->dev_mem_ptr);
 
 	if (cuCtxDestroy(td->cu_ctx) != CUDA_SUCCESS)
diff --git a/t/jobs/t0011-5d2788d5.fio b/t/jobs/t0011-5d2788d5.fio
index 50daf612..f90cee90 100644
--- a/t/jobs/t0011-5d2788d5.fio
+++ b/t/jobs/t0011-5d2788d5.fio
@@ -7,7 +7,7 @@
 bs=4k
 ioengine=null
 size=100g
-runtime=3
+runtime=10
 flow_id=1
 
 [flow1]
diff --git a/t/jobs/t0012.fio b/t/jobs/t0012.fio
index 985eb16b..03fea627 100644
--- a/t/jobs/t0012.fio
+++ b/t/jobs/t0012.fio
@@ -8,7 +8,7 @@
 bs=4k
 ioengine=null
 size=100g
-runtime=3
+runtime=10
 flow_id=1
 gtod_cpu=1
 

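Both CI scripts now derive CI_TARGET_ARCH from "${BUILD_ARCH:-$TRAVIS_CPU_ARCH}". The ":-" form falls back to the second value when BUILD_ARCH is unset or set to the empty string. The C sketch below spells out those semantics with getenv(); the function names are hypothetical and belong to neither fio nor Travis.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors ${BUILD_ARCH:-$TRAVIS_CPU_ARCH} (unset or empty falls back). */
static const char *ci_target_arch(void)
{
	const char *arch = getenv("BUILD_ARCH");

	if (arch == NULL || arch[0] == '\0')
		arch = getenv("TRAVIS_CPU_ARCH");
	/* The shell expands to "" when both are unset; do the same. */
	return arch ? arch : "";
}

/* Equivalent of [[ "$CI_TARGET_ARCH" == "<want>" ]]. */
static bool ci_target_is(const char *want)
{
	return strcmp(ci_target_arch(), want) == 0;
}
```

This is why the x86 matrix entry only needs to set BUILD_ARCH, while the amd64 and arm64 jobs pick up TRAVIS_CPU_ARCH automatically.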


* Recent changes (master)
@ 2020-07-26 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c355011a2509fdf6caa2a220e1534d61f14c4801:

  Merge branch 'sribs-patch-1039' of https://github.com/sribs/fio (2020-07-24 11:05:21 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e235e74dfd0627b17690194f957da509b8ace808:

  Merge branch 'test-cleanup' of https://github.com/vincentkfu/fio (2020-07-25 13:55:16 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      configure: error early on too old compiler
      Merge branch 'check-atomics' of https://github.com/sitsofe/fio
      Merge branch 'test-cleanup' of https://github.com/vincentkfu/fio

Sitsofe Wheeler (1):
      configure: check for C11 atomics support

Vincent Fu (4):
      t/run-fio-tests: catch modprobe exceptions
      t/run-fio-tests: fix issues identified by pylint3 and pyflakes3
      ci: set PYTHONUNBUFFERED=TRUE
      t/run-fio-tests: add description to each test result line

 .appveyor.yml       |  1 +
 ci/travis-build.sh  |  1 +
 compiler/compiler.h |  7 -------
 configure           | 19 +++++++++++++++++++
 t/run-fio-tests.py  | 34 ++++++++++++++++------------------
 5 files changed, 37 insertions(+), 25 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 5f4a33e2..5c0266a1 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -18,6 +18,7 @@ environment:
 install:
   - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib,mingw64-%PACKAGE_ARCH%-CUnit" > NUL'
   - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
+  - SET PYTHONUNBUFFERED=TRUE
   - python.exe -m pip install scipy six
 
 build_script:
diff --git a/ci/travis-build.sh b/ci/travis-build.sh
index fff9c088..06012e89 100755
--- a/ci/travis-build.sh
+++ b/ci/travis-build.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 
 EXTRA_CFLAGS="-Werror"
+PYTHONUNBUFFERED=TRUE
 
 if [[ "$BUILD_ARCH" == "x86" ]]; then
     EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
diff --git a/compiler/compiler.h b/compiler/compiler.h
index 8c0eb9d1..8a784b92 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -1,13 +1,6 @@
 #ifndef FIO_COMPILER_H
 #define FIO_COMPILER_H
 
-/* IWYU pragma: begin_exports */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9) || __clang_major__ >= 6
-#else
-#error Compiler too old, need at least gcc 4.9
-#endif
-/* IWYU pragma: end_exports */
-
 #define __must_check		__attribute__((warn_unused_result))
 
 #define __compiletime_warning(message)	__attribute__((warning(message)))
diff --git a/configure b/configure
index b079a2a5..25216c63 100755
--- a/configure
+++ b/configure
@@ -550,6 +550,25 @@ else
 fi
 print_config "Static build" "$build_static"
 
+##########################################
+# check for C11 atomics support
+cat > $TMPC <<EOF
+#include <stdatomic.h>
+int main(void)
+{
+  _Atomic unsigned v;
+  atomic_load(&v);
+  return 0;
+}
+EOF
+if ! compile_prog "" "" "C11 atomics"; then
+  echo
+  echo "Your compiler doesn't support C11 atomics. gcc 4.9/clang 3.6 are the"
+  echo "minimum versions with it - perhaps your compiler is too old?"
+  fatal "C11 atomics support not found"
+fi
+
+
 ##########################################
 # check for wordsize
 wordsize="0"
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index c116bf5a..6f1fc092 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -447,13 +447,6 @@ class FioJobTest_iops_rate(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0013(FioJobTest):
-    """Runs fio test job t0013"""
-
-    def check_result(self):
-        super(FioJobTest_t0013, self).check_result()
-
-
 class Requirements(object):
     """Requirements consists of multiple run environment characteristics.
     These are to determine if a particular test can be run"""
@@ -485,11 +478,14 @@ class Requirements(object):
 
             Requirements._root = (os.geteuid() == 0)
             if Requirements._zbd and Requirements._root:
-                subprocess.run(["modprobe", "null_blk"],
-                               stdout=subprocess.PIPE,
-                               stderr=subprocess.PIPE)
-                if os.path.exists("/sys/module/null_blk/parameters/zoned"):
-                    Requirements._zoned_nullb = True
+                try:
+                    subprocess.run(["modprobe", "null_blk"],
+                                   stdout=subprocess.PIPE,
+                                   stderr=subprocess.PIPE)
+                    if os.path.exists("/sys/module/null_blk/parameters/zoned"):
+                        Requirements._zoned_nullb = True
+                except Exception:
+                    pass
 
         if platform.system() == "Windows":
             utest_exe = "unittest.exe"
@@ -690,13 +686,13 @@ TEST_LIST = [
         'pre_job':          None,
         'pre_success':      None,
         'output_format':    'json',
-        'requirements':     [],
         'requirements':     [Requirements.not_macos],
         # mac os does not support CPU affinity
+        # which is required for gtod offloading
     },
     {
         'test_id':          13,
-        'test_class':       FioJobTest_t0013,
+        'test_class':       FioJobTest,
         'job':              't0013.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -846,9 +842,9 @@ def main():
                 print("Invalid --pass-through argument '%s'" % arg)
                 print("Syntax for --pass-through is TESTNUMBER:ARGUMENT")
                 return
-            split = arg.split(":",1)
+            split = arg.split(":", 1)
             pass_through[int(split[0])] = split[1]
-        logging.debug("Pass-through arguments: %s" % pass_through)
+        logging.debug("Pass-through arguments: %s", pass_through)
 
     if args.fio_root:
         fio_root = args.fio_root
@@ -908,6 +904,7 @@ def main():
                 fio_pre_job=fio_pre_job,
                 fio_pre_success=fio_pre_success,
                 output_format=output_format)
+            desc = config['job']
         elif issubclass(config['test_class'], FioExeTest):
             exe_path = os.path.join(fio_root, config['exe'])
             if config['parameters']:
@@ -921,6 +918,7 @@ def main():
                 parameters += pass_through[config['test_id']].split()
             test = config['test_class'](exe_path, parameters,
                                         config['success'])
+            desc = config['exe']
         else:
             print("Test {0} FAILED: unable to process test config".format(config['test_id']))
             failed = failed + 1
@@ -935,7 +933,7 @@ def main():
                 if not reqs_met:
                     break
             if not reqs_met:
-                print("Test {0} SKIPPED ({1})".format(config['test_id'], reason))
+                print("Test {0} SKIPPED ({1}) {2}".format(config['test_id'], reason, desc))
                 skipped = skipped + 1
                 continue
 
@@ -952,7 +950,7 @@ def main():
             logging.debug("Test %d: stderr:\n%s", config['test_id'], contents)
             contents, _ = FioJobTest.get_file(test.stdout_file)
             logging.debug("Test %d: stdout:\n%s", config['test_id'], contents)
-        print("Test {0} {1}".format(config['test_id'], result))
+        print("Test {0} {1} {2}".format(config['test_id'], result, desc))
 
     print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
 

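The new configure probe only checks that a program using _Atomic and atomic_load() from <stdatomic.h> compiles (it never runs it, so reading the uninitialized variable is harmless there). A minimal sketch of the kind of C11 atomic usage that this requirement enables; the counter and function names are hypothetical.

```c
#include <stdatomic.h>

/* Static storage: zero-initialized, which is a valid atomic state. */
static _Atomic unsigned int completions;

/* atomic_fetch_add() returns the value before the increment. */
static unsigned int record_completion(void)
{
	return atomic_fetch_add(&completions, 1) + 1;
}

static unsigned int read_completions(void)
{
	return atomic_load(&completions);
}
```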


* Recent changes (master)
@ 2020-07-25 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-25 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5090d1d0f2a109c276384c93308566b7a3bfa5ad:

  zbd: fix %lu -> %llu dprint() formatting (2020-07-21 09:40:07 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c355011a2509fdf6caa2a220e1534d61f14c4801:

  Merge branch 'sribs-patch-1039' of https://github.com/sribs/fio (2020-07-24 11:05:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'sribs-patch-1039' of https://github.com/sribs/fio

Shin'ichiro Kawasaki (1):
      t/zbd: Improve pass condition of test case #49

sribs (2):
      io_u: fix exit failure case when using rates and timeout
      stat: stop triggering division by zero on bandwidth lower than 1KBps

 io_u.c                 | 19 +++++++++++++++++++
 stat.c                 | 25 +++++++++++++++----------
 t/zbd/test-zbd-support |  1 +
 3 files changed, 35 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 7f50906b..6a729e51 100644
--- a/io_u.c
+++ b/io_u.c
@@ -680,7 +680,22 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 	if (td->o.io_submit_mode == IO_MODE_INLINE)
 		io_u_quiesce(td);
 
+	if (td->o.timeout && ((usec + now) > td->o.timeout)) {
+		/*
+		 * check if the usec is capable of taking negative values
+		 */
+		if (now > td->o.timeout) {
+			ddir = DDIR_INVAL;
+			return ddir;
+		}
+		usec = td->o.timeout - now;
+	}
 	usec_sleep(td, usec);
+
+	now = utime_since_now(&td->epoch);
+	if ((td->o.timeout && (now > td->o.timeout)) || td->terminate)
+		ddir = DDIR_INVAL;
+
 	return ddir;
 }
 
@@ -896,6 +911,10 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 
 	set_rw_ddir(td, io_u);
 
+	if (io_u->ddir == DDIR_INVAL) {
+		dprint(FD_IO, "invalid direction received ddir = %d", io_u->ddir);
+		return 1;
+	}
 	/*
 	 * fsync() or fdatasync() or trim etc, we are done
 	 */
diff --git a/stat.c b/stat.c
index b3951199..23657cee 100644
--- a/stat.c
+++ b/stat.c
@@ -414,6 +414,18 @@ static void display_lat(const char *name, unsigned long long min,
 	free(maxp);
 }
 
+static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, int mean)
+{
+	double p_of_agg = 100.0;
+	if (rs && rs->agg[ddir] > 1024) {
+		p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024.0);
+
+		if (p_of_agg > 100.0)
+			p_of_agg = 100.0;
+	}
+	return p_of_agg;
+}
+
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
@@ -551,11 +563,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		else
 			bw_str = "kB";
 
-		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
-			if (p_of_agg > 100.0)
-				p_of_agg = 100.0;
-		}
+		p_of_agg = convert_agg_kbytes_percent(rs, ddir, mean);
 
 		if (rs->unit_base == 1) {
 			min *= 8.0;
@@ -1376,11 +1384,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	}
 
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
-		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
-			if (p_of_agg > 100.0)
-				p_of_agg = 100.0;
-		}
+		p_of_agg = convert_agg_kbytes_percent(rs, ddir, mean);
 	} else {
 		min = max = 0;
 		p_of_agg = mean = dev = 0.0;
@@ -3130,3 +3134,4 @@ uint32_t *io_u_block_info(struct thread_data *td, struct io_u *io_u)
 	assert(idx < td->ts.nr_block_infos);
 	return info;
 }
+
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index e53a20c5..471a3487 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -856,6 +856,7 @@ test49() {
 		    --zonecapacity=${capacity} \
 		    --verify=md5  --size=${size} >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
+    check_written $((capacity * 2)) || return $?
     check_read $((capacity * 2)) || return $?
 }
 

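The old stat.c code divided rs->agg[ddir] by the integer 1024, so any aggregate below 1 KiB/s truncated to 0 and the percentage became a division by zero. The new convert_agg_kbytes_percent() only divides when the aggregate exceeds 1024 and uses 1024.0, otherwise reporting 100%. The sketch below reproduces that logic without fio's structs: agg_bytes stands in for rs->agg[ddir], and, unlike the patch's helper (which takes an int), it takes the mean as a double.

```c
#include <stdint.h>

/*
 * Share of the group aggregate bandwidth, in percent.
 * mean_kib: job mean bandwidth in KiB/s; agg_bytes: aggregate in B/s.
 */
static double agg_kbytes_percent(uint64_t agg_bytes, double mean_kib)
{
	double p_of_agg = 100.0;

	/* Below 1 KiB/s the integer agg / 1024 would be 0, so skip it. */
	if (agg_bytes > 1024) {
		p_of_agg = mean_kib * 100 / (agg_bytes / 1024.0);
		if (p_of_agg > 100.0)
			p_of_agg = 100.0;
	}
	return p_of_agg;
}
```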

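The rate_ddir() change in io_u.c clamps the rate-limiting sleep so that it never runs past the job's timeout, and gives up (DDIR_INVAL) if the deadline has already passed. A minimal sketch of that clamping step, assuming all values are microseconds since the job started and that a timeout of 0 means no limit; clamp_rate_sleep() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp a rate-limit sleep to the job deadline. Returns false when the
 * deadline has already passed, the case the patch maps to DDIR_INVAL.
 */
static bool clamp_rate_sleep(uint64_t now, uint64_t timeout, uint64_t *usec)
{
	if (timeout && now + *usec > timeout) {
		/* Checked first so that timeout - now cannot wrap around. */
		if (now > timeout)
			return false;
		*usec = timeout - now;
	}
	return true;
}
```

The patch additionally re-reads the clock after usec_sleep() and returns DDIR_INVAL if the timeout expired or the job was told to terminate during the sleep.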

* Recent changes (master)
@ 2020-07-22 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d13596b225baf61425a9ca92b0583fc3fa97765d:

  Fio 3.21 (2020-07-20 16:37:50 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5090d1d0f2a109c276384c93308566b7a3bfa5ad:

  zbd: fix %lu -> %llu dprint() formatting (2020-07-21 09:40:07 -0600)

----------------------------------------------------------------
Hans Holmberg (3):
      options: Add zonecapacity option for zonemode=zbd
      t/zbd: Support testing zone capacity smaller than zone size
      t/zbd: Add test case to check zonecapacity option

Jens Axboe (1):
      zbd: fix %lu -> %llu dprint() formatting

Shin'ichiro Kawasaki (3):
      zbd: Support zone capacity smaller than zone size
      t/zbd: Mandate blkzone capacity report for devices with zone capacity
      t/zbd: Support testing zone capacity smaller than zone size with null_blk

 HOWTO                               |  18 +++++-
 cconv.c                             |   2 +
 configure                           |  19 ++++++
 engines/libzbc.c                    |   5 ++
 fio.1                               |  13 +++-
 options.c                           |  11 ++++
 oslib/linux-blkzoned.c              |  11 ++++
 t/zbd/functions                     |  82 ++++++++++++++++++++++++
 t/zbd/run-tests-against-zoned-nullb |  30 ++++++++-
 t/zbd/test-zbd-support              | 123 ++++++++++++++++++++++++++----------
 thread_options.h                    |   2 +
 zbd.c                               |  87 ++++++++++++++++++++-----
 zbd.h                               |   2 +
 zbd_types.h                         |   1 +
 14 files changed, 348 insertions(+), 58 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 8cf8d650..35ead0cb 100644
--- a/HOWTO
+++ b/HOWTO
@@ -970,14 +970,15 @@ Target file/device
 	Accepted values are:
 
 		**none**
-				The :option:`zonerange`, :option:`zonesize` and
-				:option:`zoneskip` parameters are ignored.
+				The :option:`zonerange`, :option:`zonesize`,
+				:option:`zonecapacity` and :option:`zoneskip`
+				parameters are ignored.
 		**strided**
 				I/O happens in a single zone until
 				:option:`zonesize` bytes have been transferred.
 				After that number of bytes has been
 				transferred processing of the next zone
-				starts.
+				starts. :option:`zonecapacity` is ignored.
 		**zbd**
 				Zoned block device mode. I/O happens
 				sequentially in each zone, even if random I/O
@@ -1004,6 +1005,17 @@ Target file/device
 	For :option:`zonemode` =zbd, this is the size of a single zone. The
 	:option:`zonerange` parameter is ignored in this mode.
 
+
+.. option:: zonecapacity=int
+
+	For :option:`zonemode` =zbd, this defines the capacity of a single zone,
+	which is the accessible area starting from the zone start address.
+	This parameter only applies when using :option:`zonemode` =zbd in
+	combination with regular block devices. If not specified it defaults to
+	the zone size. If the target device is a zoned block device, the zone
+	capacity is obtained from the device information and this option is
+	ignored.
+
 .. option:: zoneskip=int
 
 	For :option:`zonemode` =strided, the number of bytes to skip after
diff --git a/cconv.c b/cconv.c
index 449bcf7b..2469389b 100644
--- a/cconv.c
+++ b/cconv.c
@@ -223,6 +223,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->ss_limit.u.f = fio_uint64_to_double(le64_to_cpu(top->ss_limit.u.i));
 	o->zone_range = le64_to_cpu(top->zone_range);
 	o->zone_size = le64_to_cpu(top->zone_size);
+	o->zone_capacity = le64_to_cpu(top->zone_capacity);
 	o->zone_skip = le64_to_cpu(top->zone_skip);
 	o->zone_mode = le32_to_cpu(top->zone_mode);
 	o->lockmem = le64_to_cpu(top->lockmem);
@@ -563,6 +564,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->ss_limit.u.i = __cpu_to_le64(fio_double_to_uint64(o->ss_limit.u.f));
 	top->zone_range = __cpu_to_le64(o->zone_range);
 	top->zone_size = __cpu_to_le64(o->zone_size);
+	top->zone_capacity = __cpu_to_le64(o->zone_capacity);
 	top->zone_skip = __cpu_to_le64(o->zone_skip);
 	top->zone_mode = __cpu_to_le32(o->zone_mode);
 	top->lockmem = __cpu_to_le64(o->lockmem);
diff --git a/configure b/configure
index 6991393b..b079a2a5 100755
--- a/configure
+++ b/configure
@@ -2390,6 +2390,7 @@ if compile_prog "" "" "valgrind_dev"; then
 fi
 print_config "Valgrind headers" "$valgrind_dev"
 
+if test "$targetos" = "Linux" ; then
 ##########################################
 # <linux/blkzoned.h> probe
 if test "$linux_blkzoned" != "yes" ; then
@@ -2407,6 +2408,24 @@ if compile_prog "" "" "linux_blkzoned"; then
 fi
 print_config "Zoned block device support" "$linux_blkzoned"
 
+##########################################
+# Check BLK_ZONE_REP_CAPACITY
+cat > $TMPC << EOF
+#include <linux/blkzoned.h>
+int main(void)
+{
+  return BLK_ZONE_REP_CAPACITY;
+}
+EOF
+if compile_prog "" "" "blkzoned report capacity"; then
+  output_sym "CONFIG_HAVE_REP_CAPACITY"
+  rep_capacity="yes"
+else
+  rep_capacity="no"
+fi
+print_config "Zoned block device capacity" "$rep_capacity"
+fi
+
 ##########################################
 # libzbc probe
 if test "$libzbc" != "yes" ; then
diff --git a/engines/libzbc.c b/engines/libzbc.c
index fdde8ca6..4b900233 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -235,6 +235,11 @@ static int libzbc_report_zones(struct thread_data *td, struct fio_file *f,
 		zbdz->start = zones[i].zbz_start << 9;
 		zbdz->len = zones[i].zbz_length << 9;
 		zbdz->wp = zones[i].zbz_write_pointer << 9;
+		/*
+		 * ZBC/ZAC do not define zone capacity, so use the zone size as
+		 * the zone capacity.
+		 */
+		zbdz->capacity = zbdz->len;
 
 		switch (zones[i].zbz_type) {
 		case ZBC_ZT_CONVENTIONAL:
diff --git a/fio.1 b/fio.1
index f134e0bf..a3d348b2 100644
--- a/fio.1
+++ b/fio.1
@@ -738,12 +738,13 @@ Accepted values are:
 .RS
 .TP
 .B none
-The \fBzonerange\fR, \fBzonesize\fR and \fBzoneskip\fR parameters are ignored.
+The \fBzonerange\fR, \fBzonesize\fR, \fBzonecapacity\fR and \fBzoneskip\fR
+parameters are ignored.
 .TP
 .B strided
 I/O happens in a single zone until \fBzonesize\fR bytes have been transferred.
 After that number of bytes has been transferred processing of the next zone
-starts.
+starts. The \fBzonecapacity\fR parameter is ignored.
 .TP
 .B zbd
 Zoned block device mode. I/O happens sequentially in each zone, even if random
@@ -771,6 +772,14 @@ zoned block device, the specified \fBzonesize\fR must be 0 or equal to the
 device zone size. For a regular block device or file, the specified
 \fBzonesize\fR must be at least 512B.
 .TP
+.BI zonecapacity \fR=\fPint
+For \fBzonemode\fR=zbd, this defines the capacity of a single zone, which is
+the accessible area starting from the zone start address. This parameter only
+applies when using \fBzonemode\fR=zbd in combination with regular block devices.
+If not specified it defaults to the zone size. If the target device is a zoned
+block device, the zone capacity is obtained from the device information and this
+option is ignored.
+.TP
 .BI zoneskip \fR=\fPint
 For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR
 bytes of data have been transferred.
diff --git a/options.c b/options.c
index 85a0f490..251ad2c1 100644
--- a/options.c
+++ b/options.c
@@ -3327,6 +3327,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_ZONE,
 	},
+	{
+		.name	= "zonecapacity",
+		.lname	= "Zone capacity",
+		.type	= FIO_OPT_STR_VAL,
+		.off1	= offsetof(struct thread_options, zone_capacity),
+		.help	= "Capacity per zone",
+		.def	= "0",
+		.interval = 1024 * 1024,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_ZONE,
+	},
 	{
 		.name	= "zonerange",
 		.lname	= "Zone range",
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 1cf06363..6fe78b9c 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -113,6 +113,16 @@ out:
 	return 0;
 }
 
+static uint64_t zone_capacity(struct blk_zone_report *hdr,
+			      struct blk_zone *blkz)
+{
+#ifdef CONFIG_HAVE_REP_CAPACITY
+	if (hdr->flags & BLK_ZONE_REP_CAPACITY)
+		return blkz->capacity << 9;
+#endif
+	return blkz->len << 9;
+}
+
 int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
 			  uint64_t offset, struct zbd_zone *zones,
 			  unsigned int nr_zones)
@@ -149,6 +159,7 @@ int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
 		z->start = blkz->start << 9;
 		z->wp = blkz->wp << 9;
 		z->len = blkz->len << 9;
+		z->capacity = zone_capacity(hdr, blkz);
 
 		switch (blkz->type) {
 		case BLK_ZONE_TYPE_CONVENTIONAL:
diff --git a/t/zbd/functions b/t/zbd/functions
index 1bd22ec4..81b6f3f7 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -19,6 +19,51 @@ if [ -n "${use_libzbc}" ] &&
     exit 1
 fi
 
+blkzone_reports_capacity() {
+	local dev="${1}"
+
+	[[ -n "${blkzone}" ]] &&
+		"${blkzone}" report -c 1 -o 0 "${dev}" | grep -q 'cap '
+}
+
+# Whether or not $1 (/dev/...) is an NVMe ZNS device.
+is_nvme_zns() {
+	local s
+
+	s=/sys/block/$(basename "${1}")/device/subsystem
+
+	if [[ ! -h "${s}" || $(realpath "${s}") != /sys/class/nvme ]]; then
+		return 1
+	fi
+
+	[[ $(</sys/block/$(basename "${1}")/queue/zoned) == host-managed ]]
+}
+
+# Whether or not $1 (/dev/...) is a null_blk device with zone capacity smaller
+# than zone size.
+is_nullb_with_zone_cap() {
+	local f
+
+	f=/sys/kernel/config/nullb/$(basename "${1}")
+	[[ -r "${f}/zone_capacity" &&
+		   $(<"${f}/zone_capacity") -lt $(<"${f}/zone_size") ]]
+}
+
+# Check if blkzone is available and suitable for the test target device. If not
+# available, print error message and return 1. Otherwise return 0.
+check_blkzone() {
+	local dev="${1}"
+
+	# If the device supports zone capacity, mandate zone capacity report by
+	# blkzone.
+	if (is_nvme_zns "${dev}" || is_nullb_with_zone_cap "${dev}") &&
+				! blkzone_reports_capacity "${dev}"; then
+		echo "Error: blkzone does not report zone capacity"
+		echo "Error: install latest util-linux with blkzone"
+		return 1
+	fi
+}
+
 # Reports the starting sector and length of the first sequential zone of device
 # $1.
 first_sequential_zone() {
@@ -39,6 +84,43 @@ first_sequential_zone() {
     fi
 }
 
+# Reports the summed zone capacity of $1 number of zones starting from offset $2
+# on device $3.
+total_zone_capacity() {
+	local nr_zones=$1
+	local sector=$(($2 / 512))
+	local dev=$3
+	local capacity=0 num
+	local grep_str
+
+	if [ -z "$is_zbd" ]; then
+		# For regular block devices, handle zone size as zone capacity.
+		echo $((zone_size * nr_zones))
+		return
+	fi
+
+	if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
+		if blkzone_reports_capacity "${dev}"; then
+			grep_str='cap \K[0-9a-zA-Z]*'
+		else
+			# If zone capacity is not reported, refer to the zone length.
+			grep_str='len \K[0-9a-zA-Z]*'
+		fi
+		while read num; do
+			capacity=$((capacity + num))
+		done < <(${blkzone} report -c "$nr_zones" -o "$sector" "$dev" |
+				grep -Po "${grep_str}")
+	else
+		# ZBC devices do not have zone capacity. Use zone size.
+		while read num; do
+			capacity=$((capacity + num))
+		done < <(${zbc_report_zones} -nz "$nr_zones" -start "$sector" \
+				"$dev" | grep -Po 'sector [0-9]*, \K[0-9]*')
+	fi
+
+	echo $((capacity * 512))
+}
+
 max_open_zones() {
     local dev=$1
 
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
index 53aee3e8..f9c9530c 100755
--- a/t/zbd/run-tests-against-zoned-nullb
+++ b/t/zbd/run-tests-against-zoned-nullb
@@ -6,6 +6,21 @@
 
 scriptdir="$(cd "$(dirname "$0")" && pwd)"
 
+zone_size=1
+zone_capacity=1
+if [[ ${1} == "-h" ]]; then
+    echo "Usage: ${0} [OPTIONS]"
+    echo "Options:"
+    echo -e "\t-h Show this message."
+    echo -e "\t-zone-cap Use null blk with zone capacity less than zone size."
+    echo -e "\tany option supported by test-zbd-support script."
+    exit 1
+elif [[ ${1} == "-zone-cap" ]]; then
+    zone_size=4
+    zone_capacity=3
+    shift
+fi
+
 for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
 modprobe -r null_blk
 modprobe null_blk nr_devices=0 || exit $?
@@ -17,9 +32,18 @@ modprobe -r null_blk
 modprobe null_blk nr_devices=0 &&
     cd /sys/kernel/config/nullb &&
     mkdir nullb0 &&
-    cd nullb0 &&
-    echo 1 > zoned &&
-    echo 1 > zone_size &&
+    cd nullb0 || exit $?
+
+if ((zone_capacity < zone_size)); then
+    if [[ ! -w zone_capacity ]]; then
+        echo "null blk does not support zone capacity"
+        exit 1
+    fi
+    echo "${zone_capacity}" > zone_capacity
+fi
+
+echo 1 > zoned &&
+    echo "${zone_size}" > zone_size &&
     echo 0 > completion_nsec &&
     echo 4096 > blocksize &&
     echo 1024 > size &&
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 80dc3f30..e53a20c5 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -204,55 +204,64 @@ test4() {
 
 # Sequential write to sequential zones.
 test5() {
-    local size
+    local size off capacity
 
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Sequential read from sequential zones.
 test6() {
-    local size
+    local size off capacity
 
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 4 $off $dev)
     size=$((4 * zone_size))
     write_and_run_one_fio_job \
 	    $((first_sequential_zone_sector * 512)) "${size}" \
-	    --offset=$((first_sequential_zone_sector * 512)) \
+	    --offset="${off}" \
 	    --size="${size}" --zonemode=zbd --zonesize="${zone_size}" \
 	    "$(ioengine "psync")" --iodepth=1 --rw=read \
 	    --bs="$(max $((zone_size / 64)) "$logical_block_size")" \
 	    >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_read $size || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to sequential zones, libaio, queue depth 1.
 test7() {
     local size=$((zone_size))
+    local off capacity
 
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 1 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
 		   --bs="$(min 16384 "${zone_size}")"			\
 		   --do_verify=1 --verify=md5 --size="$size"		\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to sequential zones, libaio, queue depth 64.
 test8() {
-    local size
+    local size off capacity
 
     size=$((4 * zone_size))
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 4 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite	\
 		   --bs="$(min 16384 "${zone_size}")"			\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to sequential zones, sg, queue depth 1.
@@ -293,39 +302,45 @@ test10() {
 
 # Random write to sequential zones, libaio, queue depth 64, random block size.
 test11() {
-    local size
+    local size off capacity
 
     size=$((4 * zone_size))
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 4 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite	\
 		   --bsrange=4K-64K --do_verify=1 --verify=md5		\
 		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to sequential zones, libaio, queue depth 64, max 1 open zone.
 test12() {
-    local size
+    local size off capacity
 
     size=$((8 * zone_size))
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 8 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		   --max_open_zones=1 --size=$size --do_verify=1 --verify=md5 \
 		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to sequential zones, libaio, queue depth 64, max 4 open zones.
 test13() {
-    local size
+    local size off capacity
 
     size=$((8 * zone_size))
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 8 $off $dev)
     run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		   --max_open_zones=4 --size=$size --do_verify=1 --verify=md5 \
 		   --debug=zbd						      \
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $size || return $?
-    check_read $size || return $?
+    check_written $capacity || return $?
+    check_read $capacity || return $?
 }
 
 # Random write to conventional zones.
@@ -349,7 +364,7 @@ test14() {
 # Sequential read on a mix of empty and full zones.
 test15() {
     local i off size
-    local w_off w_size
+    local w_off w_size w_capacity
 
     for ((i=0;i<4;i++)); do
 	[ -n "$is_zbd" ] &&
@@ -358,6 +373,7 @@ test15() {
     done
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
+    w_capacity=$(total_zone_capacity 2 $w_off $dev)
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
     write_and_run_one_fio_job "${w_off}" "${w_size}" \
@@ -365,14 +381,14 @@ test15() {
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$((size)) >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
-    check_written $((w_size)) || return $?
-    check_read $((size / 2))
+    check_written $((w_capacity)) || return $?
+    check_read $((w_capacity))
 }
 
 # Random read on a mix of empty and full zones.
 test16() {
     local off size
-    local i w_off w_size
+    local i w_off w_size w_capacity
 
     for ((i=0;i<4;i++)); do
 	[ -n "$is_zbd" ] &&
@@ -381,13 +397,14 @@ test16() {
     done
     w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     w_size=$((2 * zone_size))
+    w_capacity=$(total_zone_capacity 2 $w_off $dev)
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
     write_and_run_one_fio_job "${w_off}" "${w_size}" \
 		    "$(ioengine "libaio")" --iodepth=64 --rw=randread --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$size >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written $w_size || return $?
+    check_written $w_capacity || return $?
     check_read $size || return $?
 }
 
@@ -451,13 +468,17 @@ test23() {
 
 test24() {
     local bs loops=9 size=$((zone_size))
+    local off capacity
+
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 1 $off $dev)
 
     bs=$(min $((256*1024)) "$zone_size")
     run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs"		\
 		   --size=$size --loops=$loops				\
 		   --zone_reset_frequency=.01 --zone_reset_threshold=.90 \
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_written $((size * loops)) || return $?
+    check_written $((capacity * loops)) || return $?
     check_reset_count -eq 8 ||
 	check_reset_count -eq 9 ||
 	check_reset_count -eq 10 || return $?
@@ -483,15 +504,19 @@ test25() {
 
 write_to_first_seq_zone() {
     local loops=4 r
+    local off capacity
+
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 1 $off $dev)
 
     r=$(((RANDOM << 16) | RANDOM))
     run_fio --name="$dev" --filename="$dev" "$(ioengine "psync")" --rw="$1" \
 	    --thread=1 --do_verify=1 --verify=md5 --direct=1 --bs=4K	\
-	    --offset=$((first_sequential_zone_sector * 512))		\
-	    "--size=$zone_size" --loops=$loops --randseed="$r"		\
+	    --offset=$off						\
+	    --size=$zone_size --loops=$loops --randseed="$r"		\
 	    --zonemode=zbd --zonesize="${zone_size}" --group_reporting=1	\
 	    --gtod_reduce=1 >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_written $((loops * zone_size)) || return $?
+    check_written $((loops * capacity)) || return $?
 }
 
 # Overwrite the first sequential zone four times sequentially.
@@ -511,15 +536,16 @@ test28() {
     off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     opts=("--debug=zbd")
+    capacity=$(total_zone_capacity 1 $off $dev)
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
-	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=randwrite")
+	opts+=("--size=$zone_size" "--io_size=$capacity" "$(ioengine "psync")" "--rw=randwrite")
 	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
 	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
 	opts+=(${var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
-    check_written $((jobs * zone_size)) || return $?
+    check_written $((jobs * $capacity)) || return $?
     check_reset_count -eq $jobs ||
 	check_reset_count -eq $((jobs - 1)) ||
 	return $?
@@ -608,10 +634,13 @@ test32() {
 # zone size.
 test33() {
     local bs io_size size
+    local off capacity=0;
 
+    off=$((first_sequential_zone_sector * 512))
+    capacity=$(total_zone_capacity 1 $off $dev)
     size=$((2 * zone_size))
-    io_size=$((5 * zone_size))
-    bs=$((3 * zone_size / 4))
+    io_size=$((5 * capacity))
+    bs=$((3 * capacity / 4))
     run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --size=$size --io_size=$io_size --bs=$bs	\
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
@@ -660,8 +689,9 @@ test36() {
 
 # Test 3/4 for the I/O boundary rounding code: $size > $zone_size.
 test37() {
-    local bs off size
+    local bs off size capacity
 
+    capacity=$(total_zone_capacity 1 $first_sequential_zone_sector $dev)
     if [ "$first_sequential_zone_sector" = 0 ]; then
 	off=0
     else
@@ -673,7 +703,7 @@ test37() {
 		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
 		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
 		    >> "${logfile}.${test_number}" 2>&1
-    check_written $((zone_size)) || return $?
+    check_written $capacity || return $?
 }
 
 # Test 4/4 for the I/O boundary rounding code: $offset > $disk_size - $zone_size
@@ -809,6 +839,26 @@ test48() {
 	    >> "${logfile}.${test_number}" 2>&1 || return $?
 }
 
+# Check if fio handles --zonecapacity on a normal block device correctly
+test49() {
+
+    if [ -n "$is_zbd" ]; then
+	echo "$dev is not a regular block device" \
+	     >>"${logfile}.${test_number}"
+	return 0
+    fi
+
+    size=$((2 * zone_size))
+    capacity=$((zone_size * 3 / 4))
+
+    run_one_fio_job "$(ioengine "psync")" --rw=write \
+		    --zonemode=zbd --zonesize="${zone_size}" \
+		    --zonecapacity=${capacity} \
+		    --verify=md5  --size=${size} >>"${logfile}.${test_number}" 2>&1 ||
+	return $?
+    check_read $((capacity * 2)) || return $?
+}
+
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
@@ -863,6 +913,9 @@ if [[ -b "$realdev" ]]; then
 	case "$(<"/sys/class/block/$basename/queue/zoned")" in
 	host-managed|host-aware)
 		is_zbd=true
+		if ! check_blkzone "${dev}"; then
+			exit 1
+		fi
 		if ! result=($(first_sequential_zone "$dev")); then
 			echo "Failed to determine first sequential zone"
 			exit 1
diff --git a/thread_options.h b/thread_options.h
index 968ea0ab..3fe48ecc 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -193,6 +193,7 @@ struct thread_options {
 	unsigned int loops;
 	unsigned long long zone_range;
 	unsigned long long zone_size;
+	unsigned long long zone_capacity;
 	unsigned long long zone_skip;
 	enum fio_zone_mode zone_mode;
 	unsigned long long lockmem;
@@ -487,6 +488,7 @@ struct thread_options_pack {
 	uint32_t loops;
 	uint64_t zone_range;
 	uint64_t zone_size;
+	uint64_t zone_capacity;
 	uint64_t zone_skip;
 	uint64_t lockmem;
 	uint32_t mem_type;
diff --git a/zbd.c b/zbd.c
index cf2cded9..3eac5df3 100644
--- a/zbd.c
+++ b/zbd.c
@@ -140,6 +140,24 @@ static inline bool zbd_zone_swr(struct fio_zone_info *z)
 	return z->type == ZBD_ZONE_TYPE_SWR;
 }
 
+/**
+ * zbd_zone_end - Return zone end location
+ * @z: zone info pointer.
+ */
+static inline uint64_t zbd_zone_end(const struct fio_zone_info *z)
+{
+	return (z+1)->start;
+}
+
+/**
+ * zbd_zone_capacity_end - Return zone capacity limit end location
+ * @z: zone info pointer.
+ */
+static inline uint64_t zbd_zone_capacity_end(const struct fio_zone_info *z)
+{
+	return z->start + z->capacity;
+}
+
 /**
  * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
  * @f: file pointer.
@@ -154,7 +172,7 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 	assert((required & 511) == 0);
 
 	return zbd_zone_swr(z) &&
-		z->wp + required > z->start + f->zbd_info->zone_size;
+		z->wp + required > zbd_zone_capacity_end(z);
 }
 
 static void zone_lock(struct thread_data *td, struct fio_file *f, struct fio_zone_info *z)
@@ -271,7 +289,7 @@ static bool zbd_verify_sizes(void)
 			z = &f->zbd_info->zone_info[zone_idx];
 			if ((f->file_offset != z->start) &&
 			    (td->o.td_ddir != TD_DDIR_READ)) {
-				new_offset = (z+1)->start;
+				new_offset = zbd_zone_end(z);
 				if (new_offset >= f->file_offset + f->io_size) {
 					log_info("%s: io_size must be at least one zone\n",
 						 f->file_name);
@@ -353,6 +371,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	uint32_t nr_zones;
 	struct fio_zone_info *p;
 	uint64_t zone_size = td->o.zone_size;
+	uint64_t zone_capacity = td->o.zone_capacity;
 	struct zoned_block_device_info *zbd_info = NULL;
 	int i;
 
@@ -368,6 +387,16 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		return 1;
 	}
 
+	if (zone_capacity == 0)
+		zone_capacity = zone_size;
+
+	if (zone_capacity > zone_size) {
+		log_err("%s: job parameter zonecapacity %llu is larger than zone size %llu\n",
+			f->file_name, (unsigned long long) td->o.zone_capacity,
+			(unsigned long long) td->o.zone_size);
+		return 1;
+	}
+
 	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -384,6 +413,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		p->wp = p->start;
 		p->type = ZBD_ZONE_TYPE_SWR;
 		p->cond = ZBD_ZONE_COND_EMPTY;
+		p->capacity = zone_capacity;
 	}
 	/* a sentinel */
 	p->start = nr_zones * zone_size;
@@ -456,10 +486,11 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			mutex_init_pshared_with_type(&p->mutex,
 						     PTHREAD_MUTEX_RECURSIVE);
 			p->start = z->start;
+			p->capacity = z->capacity;
 			switch (z->cond) {
 			case ZBD_ZONE_COND_NOT_WP:
 			case ZBD_ZONE_COND_FULL:
-				p->wp = p->start + zone_size;
+				p->wp = p->start + p->capacity;
 				break;
 			default:
 				assert(z->start <= z->wp);
@@ -707,7 +738,7 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
 		zbd_zone_nr(f->zbd_info, z));
 
-	return zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
+	return zbd_reset_range(td, f, z->start, zbd_zone_end(z) - z->start);
 }
 
 /* The caller must hold f->zbd_info->mutex */
@@ -1068,7 +1099,7 @@ found_candidate_zone:
 	/* Both z->mutex and f->zbd_info->mutex are held. */
 
 examine_zone:
-	if (z->wp + min_bs <= (z+1)->start) {
+	if (z->wp + min_bs <= zbd_zone_capacity_end(z)) {
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 		goto out;
 	}
@@ -1112,7 +1143,7 @@ examine_zone:
 		z = &f->zbd_info->zone_info[zone_idx];
 
 		zone_lock(td, f, z);
-		if (z->wp + min_bs <= (z+1)->start)
+		if (z->wp + min_bs <= zbd_zone_capacity_end(z))
 			goto out;
 		pthread_mutex_lock(&f->zbd_info->mutex);
 	}
@@ -1143,9 +1174,9 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 		assert(z);
 	}
 
-	if (z->verify_block * min_bs >= f->zbd_info->zone_size)
+	if (z->verify_block * min_bs >= z->capacity)
 		log_err("%s: %d * %d >= %llu\n", f->file_name, z->verify_block,
-			min_bs, (unsigned long long) f->zbd_info->zone_size);
+			min_bs, (unsigned long long)z->capacity);
 	io_u->offset = z->start + z->verify_block++ * min_bs;
 	return z;
 }
@@ -1231,7 +1262,7 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
 	switch (io_u->ddir) {
 	case DDIR_WRITE:
 		zone_end = min((uint64_t)(io_u->offset + io_u->buflen),
-			       (z + 1)->start);
+			       zbd_zone_capacity_end(z));
 		pthread_mutex_lock(&zbd_info->mutex);
 		/*
 		 * z->wp > zone_end means that one or more I/O errors
@@ -1327,6 +1358,28 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	assert(td->o.zone_mode == ZONE_MODE_ZBD);
 	assert(td->o.zone_size);
 
+	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
+	z = &f->zbd_info->zone_info[zone_idx];
+
+	/*
+	 * When the zone capacity is smaller than the zone size and the I/O is
+	 * sequential write, skip to zone end if the latest position is at the
+	 * zone capacity limit.
+	 */
+	if (z->capacity < f->zbd_info->zone_size && !td_random(td) &&
+	    ddir == DDIR_WRITE &&
+	    f->last_pos[ddir] >= zbd_zone_capacity_end(z)) {
+		dprint(FD_ZBD,
+		       "%s: Jump from zone capacity limit to zone end:"
+		       " (%llu -> %llu) for zone %u (%llu)\n",
+		       f->file_name, (unsigned long long) f->last_pos[ddir],
+		       (unsigned long long) zbd_zone_end(z),
+		       zbd_zone_nr(f->zbd_info, z),
+		       (unsigned long long) z->capacity);
+		td->io_skip_bytes += zbd_zone_end(z) - f->last_pos[ddir];
+		f->last_pos[ddir] = zbd_zone_end(z);
+	}
+
 	/*
 	 * zone_skip is valid only for sequential workloads.
 	 */
@@ -1340,11 +1393,8 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	 * - For reads with td->o.read_beyond_wp == false, the last position
 	 *   reached the zone write pointer.
 	 */
-	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
-	z = &f->zbd_info->zone_info[zone_idx];
-
 	if (td->zone_bytes >= td->o.zone_size ||
-	    f->last_pos[ddir] >= (z+1)->start ||
+	    f->last_pos[ddir] >= zbd_zone_end(z) ||
 	    (ddir == DDIR_READ &&
 	     (!td->o.read_beyond_wp) && f->last_pos[ddir] >= z->wp)) {
 		/*
@@ -1530,6 +1580,13 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			zb->reset_zone = 0;
 			if (zbd_reset_zone(td, f, zb) < 0)
 				goto eof;
+
+			if (zb->capacity < min_bs) {
+				log_err("zone capacity %llu smaller than minimum block size %d\n",
+					(unsigned long long)zb->capacity,
+					min_bs);
+				goto eof;
+			}
 		}
 		/* Make writes occur at the write pointer */
 		assert(!zbd_zone_full(f, zb, min_bs));
@@ -1545,7 +1602,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		 * small.
 		 */
 		new_len = min((unsigned long long)io_u->buflen,
-			      (zb + 1)->start - io_u->offset);
+			      zbd_zone_capacity_end(zb) - io_u->offset);
 		new_len = new_len / min_bs * min_bs;
 		if (new_len == io_u->buflen)
 			goto accept;
@@ -1556,7 +1613,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			goto accept;
 		}
 		log_err("Zone remainder %lld smaller than minimum block size %d\n",
-			((zb + 1)->start - io_u->offset),
+			(zbd_zone_capacity_end(zb) - io_u->offset),
 			min_bs);
 		goto eof;
 	case DDIR_TRIM:
diff --git a/zbd.h b/zbd.h
index e942a7f6..021174c1 100644
--- a/zbd.h
+++ b/zbd.h
@@ -23,6 +23,7 @@ enum io_u_action {
  * struct fio_zone_info - information about a single ZBD zone
  * @start: zone start location (bytes)
  * @wp: zone write pointer location (bytes)
+ * @capacity: maximum size usable from the start of a zone (bytes)
  * @verify_block: number of blocks that have been verified for this zone
  * @mutex: protects the modifiable members in this structure
  * @type: zone type (BLK_ZONE_TYPE_*)
@@ -35,6 +36,7 @@ struct fio_zone_info {
 	pthread_mutex_t		mutex;
 	uint64_t		start;
 	uint64_t		wp;
+	uint64_t		capacity;
 	uint32_t		verify_block;
 	enum zbd_zone_type	type:2;
 	enum zbd_zone_cond	cond:4;
diff --git a/zbd_types.h b/zbd_types.h
index d63c0d0a..5ed41aa0 100644
--- a/zbd_types.h
+++ b/zbd_types.h
@@ -50,6 +50,7 @@ struct zbd_zone {
 	uint64_t		start;
 	uint64_t		wp;
 	uint64_t		len;
+	uint64_t		capacity;
 	enum zbd_zone_type	type;
 	enum zbd_zone_cond	cond;
 };



* Recent changes (master)
@ 2020-07-21 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-21 12:00 UTC
  To: fio

The following changes since commit fa443634fbfa38fd5d6418a96a45022c78b90df4:

  Merge branch 'patch-1' of https://github.com/avlasov-mos-de/fio (2020-07-18 08:39:04 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d13596b225baf61425a9ca92b0583fc3fa97765d:

  Fio 3.21 (2020-07-20 16:37:50 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.21

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 7050f84e..48e575fc 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.20
+DEF_VER=fio-3.21
 
 LF='
 '



* Recent changes (master)
@ 2020-07-19 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-19 12:00 UTC
  To: fio


The following changes since commit e04241512bb69f1fe8ff6eed9402343af436ba75:

  t/zbd: Enable regular block devices for test case #47 (2020-07-17 07:32:12 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fa443634fbfa38fd5d6418a96a45022c78b90df4:

  Merge branch 'patch-1' of https://github.com/avlasov-mos-de/fio (2020-07-18 08:39:04 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/avlasov-mos-de/fio

avlasov-mos-de (1):
      Fix scale on gnuplot graphs

 tools/fio_generate_plots | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/tools/fio_generate_plots b/tools/fio_generate_plots
index 8872206e..e4558788 100755
--- a/tools/fio_generate_plots
+++ b/tools/fio_generate_plots
@@ -123,10 +123,10 @@ plot () {
 # plot <sub title> <file name tag> <y axis label> <y axis scale>
 #
 
-plot "I/O Latency" lat "Time (msec)" 1000
+plot "I/O Latency" lat "Time (msec)" 1000000
 plot "I/O Operations Per Second" iops "IOPS" 1
-plot "I/O Submission Latency" slat "Time (µsec)" 1
-plot "I/O Completion Latency" clat "Time (msec)" 1000
+plot "I/O Submission Latency" slat "Time (µsec)" 1000
+plot "I/O Completion Latency" clat "Time (msec)" 1000000
 plot "I/O Bandwidth" bw "Throughput (KB/s)" 1
 
 



* Recent changes (master)
@ 2020-07-18 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-18 12:00 UTC
  To: fio

The following changes since commit 37e8cc62319b30927a3147e25b16c3e00b84692f:

  Merge branch 'fix_devdax' of https://github.com/harish-24/fio (2020-07-14 10:10:23 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e04241512bb69f1fe8ff6eed9402343af436ba75:

  t/zbd: Enable regular block devices for test case #47 (2020-07-17 07:32:12 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (8):
      zbd: Fix initial zone write pointer of regular block devices
      t/zbd: Fix pass condition of test case #3
      t/zbd: Add write_and_run_one_fio_job() helper function
      t/zbd: Combine write and read fio commands for test case #6
      t/zbd: Combine write and read fio commands for test case #15
      t/zbd: Combine write and read fio commands for test case #16
      t/zbd: Remove write before random read/write from test case #17
      t/zbd: Enable regular block devices for test case #47

 t/zbd/test-zbd-support | 78 ++++++++++++++++++++++++++++----------------------
 zbd.c                  |  2 +-
 2 files changed, 44 insertions(+), 36 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 4001be3b..80dc3f30 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -109,6 +109,20 @@ run_one_fio_job() {
 	    --thread=1 --direct=1
 }
 
+write_and_run_one_fio_job() {
+    local r
+    local write_offset="${1}"
+    local write_size="${2}"
+
+    shift 2
+    r=$(((RANDOM << 16) | RANDOM))
+    run_fio --filename="$dev" --randseed="$r"  --name="write_job" --rw=write \
+	    "$(ioengine "psync")" --bs="${logical_block_size}" \
+	    --zonemode=zbd --zonesize="${zone_size}" --thread=1 --direct=1 \
+	    --offset="${write_offset}" --size="${write_size}" \
+	    --name="$dev" --wait_for="write_job" "$@" --thread=1 --direct=1
+}
+
 # Run fio on the first four sequential zones of the disk.
 run_fio_on_seq() {
     local opts=()
@@ -170,13 +184,7 @@ test3() {
 	opts+=("--zonesize=${zone_size}")
     fi
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
-    grep -q 'READ:' "${logfile}.${test_number}"
-    rc=$?
-    if [ -n "$is_zbd" ]; then
-	[ $rc != 0 ]
-    else
-	[ $rc = 0 ]
-    fi
+    ! grep -q 'READ:' "${logfile}.${test_number}"
 }
 
 # Run fio with --read_beyond_wp=1 against an empty zone.
@@ -207,14 +215,18 @@ test5() {
     check_read $size || return $?
 }
 
-# Sequential read from sequential zones. Must be run after test5.
+# Sequential read from sequential zones.
 test6() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=read	\
-		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
-		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    write_and_run_one_fio_job \
+	    $((first_sequential_zone_sector * 512)) "${size}" \
+	    --offset=$((first_sequential_zone_sector * 512)) \
+	    --size="${size}" --zonemode=zbd --zonesize="${zone_size}" \
+	    "$(ioengine "psync")" --iodepth=1 --rw=read \
+	    --bs="$(max $((zone_size / 64)) "$logical_block_size")" \
+	    >>"${logfile}.${test_number}" 2>&1 || return $?
     check_read $size || return $?
 }
 
@@ -337,41 +349,45 @@ test14() {
 # Sequential read on a mix of empty and full zones.
 test15() {
     local i off size
+    local w_off w_size
 
     for ((i=0;i<4;i++)); do
 	[ -n "$is_zbd" ] &&
 	    reset_zone "$dev" $((first_sequential_zone_sector +
 				 i*sectors_per_zone))
     done
-    off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
-    size=$((2 * zone_size))
-    run_one_fio_job "$(ioengine "psync")" --rw=write --bs=$((zone_size / 16))\
-		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
-		    --size=$size >>"${logfile}.${test_number}" 2>&1 ||
-	return $?
-    check_written $size || return $?
+    w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
+    w_size=$((2 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
-    run_one_fio_job "$(ioengine "psync")" --rw=read --bs=$((zone_size / 16)) \
+    write_and_run_one_fio_job "${w_off}" "${w_size}" \
+		    "$(ioengine "psync")" --rw=read --bs=$((zone_size / 16)) \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$((size)) >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
-    if [ -n "$is_zbd" ]; then
-	check_read $((size / 2))
-    else
-	check_read $size
-    fi
+    check_written $((w_size)) || return $?
+    check_read $((size / 2))
 }
 
-# Random read on a mix of empty and full zones. Must be run after test15.
+# Random read on a mix of empty and full zones.
 test16() {
     local off size
+    local i w_off w_size
 
+    for ((i=0;i<4;i++)); do
+	[ -n "$is_zbd" ] &&
+	    reset_zone "$dev" $((first_sequential_zone_sector +
+				 i*sectors_per_zone))
+    done
+    w_off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
+    w_size=$((2 * zone_size))
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
-    run_one_fio_job "$(ioengine "libaio")" --iodepth=64 --rw=randread --bs=16K \
+    write_and_run_one_fio_job "${w_off}" "${w_size}" \
+		    "$(ioengine "libaio")" --iodepth=64 --rw=randread --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$size >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $w_size || return $?
     check_read $size || return $?
 }
 
@@ -381,15 +397,9 @@ test17() {
 
     off=$(((disk_size / zone_size - 1) * zone_size))
     size=$((disk_size - off))
-    # Overwrite the last zone to avoid that reading from that zone fails.
     if [ -n "$is_zbd" ]; then
 	reset_zone "$dev" $((off / 512)) || return $?
     fi
-    run_one_fio_job "$(ioengine "psync")" --rw=write --offset="$off"	\
-		    --zonemode=zbd --zonesize="${zone_size}"		\
-		    --bs="$zone_size" --size="$zone_size"		\
-		    >>"${logfile}.${test_number}" 2>&1 || return $?
-    check_written "$zone_size" || return $?
     run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
 		    --zonemode=zbd --zonesize="${zone_size}"		\
 		    --offset=$off --loops=2 --norandommap=1\
@@ -763,10 +773,8 @@ test46() {
 test47() {
     local bs
 
-    [ -z "$is_zbd" ] && return 0
     bs=$((logical_block_size))
-    run_one_fio_job "$(ioengine "psync")" --rw=write --bs=$bs \
-		    --zonemode=zbd --zoneskip=1		 \
+    run_fio_on_seq "$(ioengine "psync")" --rw=write --bs=$bs --zoneskip=1 \
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
 }
diff --git a/zbd.c b/zbd.c
index 8cf8f812..cf2cded9 100644
--- a/zbd.c
+++ b/zbd.c
@@ -381,7 +381,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		mutex_init_pshared_with_type(&p->mutex,
 					     PTHREAD_MUTEX_RECURSIVE);
 		p->start = i * zone_size;
-		p->wp = p->start + zone_size;
+		p->wp = p->start;
 		p->type = ZBD_ZONE_TYPE_SWR;
 		p->cond = ZBD_ZONE_COND_EMPTY;
 	}



* Recent changes (master)
@ 2020-07-15 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-15 12:00 UTC
  To: fio

The following changes since commit 255f09d6d266c3e73abb9776ae481bb4d79caf00:

  Merge branch 'io_uring-opt' of https://github.com/antonblanchard/fio (2020-07-13 08:01:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 37e8cc62319b30927a3147e25b16c3e00b84692f:

  Merge branch 'fix_devdax' of https://github.com/harish-24/fio (2020-07-14 10:10:23 -0600)

----------------------------------------------------------------
Harish (1):
      Fix: dev-dax engine building with make

Jens Axboe (1):
      Merge branch 'fix_devdax' of https://github.com/harish-24/fio

 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 8f8d9b9a..f374ac84 100644
--- a/Makefile
+++ b/Makefile
@@ -173,8 +173,8 @@ ifdef CONFIG_PMEMBLK
   ENGINES += pmemblk
 endif
 ifdef CONFIG_LINUX_DEVDAX
-  devdax_SRCS = engines/dev-dax.c
-  devdax_LIBS = -lpmem
+  dev-dax_SRCS = engines/dev-dax.c
+  dev-dax_LIBS = -lpmem
   ENGINES += dev-dax
 endif
 ifdef CONFIG_LIBPMEM



* Recent changes (master)
@ 2020-07-14 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-14 12:00 UTC
  To: fio

The following changes since commit 5bd526f2a614e75a929906c91a5fa3bab293d319:

  t/io_uring: make bs and polled IO configurable at runtime (2020-07-08 15:48:11 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 255f09d6d266c3e73abb9776ae481bb4d79caf00:

  Merge branch 'io_uring-opt' of https://github.com/antonblanchard/fio (2020-07-13 08:01:52 -0600)

----------------------------------------------------------------
Anton Blanchard (1):
      io_uring: Avoid needless update of completion queue head pointer

Jens Axboe (1):
      Merge branch 'io_uring-opt' of https://github.com/antonblanchard/fio

 engines/io_uring.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index cd0810f4..ecff0657 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -307,7 +307,9 @@ static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 		head++;
 	} while (reaped + events < max);
 
-	atomic_store_release(ring->head, head);
+	if (reaped)
+		atomic_store_release(ring->head, head);
+
 	return reaped;
 }
 



* Recent changes (master)
@ 2020-07-09 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-09 12:00 UTC
  To: fio

The following changes since commit 430083bf2a3db52bcb374919140d3a978391fbf0:

  Merge branch 'windows' of https://github.com/bvanassche/fio (2020-07-04 17:29:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5bd526f2a614e75a929906c91a5fa3bab293d319:

  t/io_uring: make bs and polled IO configurable at runtime (2020-07-08 15:48:11 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: make bs and polled IO configurable at runtime

 t/io_uring.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index d48db1e9..7fa84f99 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -46,7 +46,6 @@ struct io_cq_ring {
 #define DEPTH			128
 #define BATCH_SUBMIT		32
 #define BATCH_COMPLETE		32
-
 #define BS			4096
 
 #define MAX_FDS			16
@@ -86,6 +85,7 @@ static volatile int finish;
 static int depth = DEPTH;
 static int batch_submit = BATCH_SUBMIT;
 static int batch_complete = BATCH_COMPLETE;
+static int bs = BS;
 static int polled = 1;		/* use IO polling */
 static int fixedbufs = 1;	/* use fixed user buffers */
 static int register_files = 1;	/* use fixed files */
@@ -170,7 +170,7 @@ static void init_io(struct submitter *s, unsigned index)
 	f->pending_ios++;
 
 	r = lrand48();
-	offset = (r % (f->max_blocks - 1)) * BS;
+	offset = (r % (f->max_blocks - 1)) * bs;
 
 	if (register_files) {
 		sqe->flags = IOSQE_FIXED_FILE;
@@ -182,7 +182,7 @@ static void init_io(struct submitter *s, unsigned index)
 	if (fixedbufs) {
 		sqe->opcode = IORING_OP_READ_FIXED;
 		sqe->addr = (unsigned long) s->iovecs[index].iov_base;
-		sqe->len = BS;
+		sqe->len = bs;
 		sqe->buf_index = index;
 	} else {
 		sqe->opcode = IORING_OP_READV;
@@ -233,10 +233,10 @@ static int get_file_size(struct file *f)
 		if (ioctl(f->real_fd, BLKGETSIZE64, &bytes) != 0)
 			return -1;
 
-		f->max_blocks = bytes / BS;
+		f->max_blocks = bytes / bs;
 		return 0;
 	} else if (S_ISREG(st.st_mode)) {
-		f->max_blocks = st.st_size / BS;
+		f->max_blocks = st.st_size / bs;
 		return 0;
 	}
 
@@ -260,7 +260,7 @@ static int reap_events(struct submitter *s)
 		if (!do_nop) {
 			f = (struct file *) (uintptr_t) cqe->user_data;
 			f->pending_ios--;
-			if (cqe->res != BS) {
+			if (cqe->res != bs) {
 				printf("io: unexpected ret=%d\n", cqe->res);
 				if (polled && cqe->res == -EOPNOTSUPP)
 					printf("Your filesystem/driver/kernel doesn't support polled IO\n");
@@ -483,8 +483,10 @@ static void usage(char *argv)
 	printf("%s [options] -- [filenames]\n"
 		" -d <int> : IO Depth, default %d\n"
 		" -s <int> : Batch submit, default %d\n"
-		" -c <int> : Batch complete, default %d\n",
-		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE);
+		" -c <int> : Batch complete, default %d\n"
+		" -b <int> : Block size, default %d\n"
+		" -p <bool> : Polled IO, default %d\n",
+		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE, BS, polled);
 	exit(0);
 }
 
@@ -501,7 +503,7 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
-	while ((opt = getopt(argc, argv, "d:s:c:h?")) != -1) {
+	while ((opt = getopt(argc, argv, "d:s:c:b:p:h?")) != -1) {
 		switch (opt) {
 		case 'd':
 			depth = atoi(optarg);
@@ -512,6 +514,12 @@ int main(int argc, char *argv[])
 		case 'c':
 			batch_complete = atoi(optarg);
 			break;
+		case 'b':
+			bs = atoi(optarg);
+			break;
+		case 'p':
+			polled = !!atoi(optarg);
+			break;
 		case 'h':
 		case '?':
 		default:
@@ -575,12 +583,12 @@ int main(int argc, char *argv[])
 	for (i = 0; i < depth; i++) {
 		void *buf;
 
-		if (posix_memalign(&buf, BS, BS)) {
+		if (posix_memalign(&buf, bs, bs)) {
 			printf("failed alloc\n");
 			return 1;
 		}
 		s->iovecs[i].iov_base = buf;
-		s->iovecs[i].iov_len = BS;
+		s->iovecs[i].iov_len = bs;
 	}
 
 	err = setup_ring(s);



* Recent changes (master)
@ 2020-07-05 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-05 12:00 UTC
  To: fio

The following changes since commit a504a3902e00b6db0183872b0ff62e35abae7119:

  configure: fix bad indentation (2020-07-03 08:34:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 430083bf2a3db52bcb374919140d3a978391fbf0:

  Merge branch 'windows' of https://github.com/bvanassche/fio (2020-07-04 17:29:32 -0600)

----------------------------------------------------------------
Bart Van Assche (9):
      .appveyor.yml: Select the latest software image
      t/run-fio-tests.py: Increase IOPS tolerance further
      Fix a potentially infinite loop in check_overlap()
      Enable error checking for the mutex that serializes overlapping I/O
      Add a test for serialize_overlap=1
      workqueue: Rework while loop locking
      workqueue: Fix race conditions in the workqueue mechanism
      os/windows/posix.c: Strip trailing whitespace
      os/windows/posix.c: Use INVALID_SOCKET instead of < 0

Jens Axboe (3):
      Merge branch 'overlap' of https://github.com/bvanassche/fio
      Merge branch 'workqueue' of https://github.com/bvanassche/fio
      Merge branch 'windows' of https://github.com/bvanassche/fio

 .appveyor.yml      |  3 +++
 backend.c          | 18 ++++++++++-----
 ioengines.c        |  6 +++--
 os/windows/posix.c |  8 +++----
 rate-submit.c      | 66 ++++++++++++++++++++++++++++--------------------------
 t/jobs/t0013.fio   | 14 ++++++++++++
 t/run-fio-tests.py | 19 +++++++++++++++-
 workqueue.c        | 50 +++++++++++++++++++++--------------------
 8 files changed, 116 insertions(+), 68 deletions(-)
 create mode 100644 t/jobs/t0013.fio

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 70c337f8..5f4a33e2 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -1,5 +1,8 @@
 clone_depth: 1 # NB: this stops FIO-VERSION-GEN making tag based versions
 
+image:
+  - Visual Studio 2019
+
 environment:
   CYG_MIRROR: http://cygwin.mirror.constant.com
   CYG_ROOT: C:\cygwin64
diff --git a/backend.c b/backend.c
index 0075a733..0e454cdd 100644
--- a/backend.c
+++ b/backend.c
@@ -66,7 +66,11 @@ unsigned int stat_number = 0;
 int shm_id = 0;
 int temp_stall_ts;
 unsigned long done_secs = 0;
+#ifdef PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
+pthread_mutex_t overlap_check = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+#else
 pthread_mutex_t overlap_check = PTHREAD_MUTEX_INITIALIZER;
+#endif
 
 #define JOB_START_TIMEOUT	(5 * 1000)
 
@@ -1535,7 +1539,7 @@ static void *thread_main(void *data)
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 	int deadlock_loop_cnt;
 	bool clear_state;
-	int ret;
+	int res, ret;
 
 	sk_out_assign(sk_out);
 	free(fd);
@@ -1860,11 +1864,15 @@ static void *thread_main(void *data)
 	 * offload mode so that we don't clean up this job while
 	 * another thread is checking its io_u's for overlap
 	 */
-	if (td_offload_overlap(td))
-		pthread_mutex_lock(&overlap_check);
+	if (td_offload_overlap(td)) {
+		int res = pthread_mutex_lock(&overlap_check);
+		assert(res == 0);
+	}
 	td_set_runstate(td, TD_FINISHING);
-	if (td_offload_overlap(td))
-		pthread_mutex_unlock(&overlap_check);
+	if (td_offload_overlap(td)) {
+		res = pthread_mutex_unlock(&overlap_check);
+		assert(res == 0);
+	}
 
 	update_rusage_stat(td);
 	td->ts.total_run_time = mtime_since_now(&td->epoch);
diff --git a/ioengines.c b/ioengines.c
index c1b430a1..1c5970a4 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -313,8 +313,10 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	 * started the overlap check because the IO_U_F_FLIGHT
 	 * flag is now set
 	 */
-	if (td_offload_overlap(td))
-		pthread_mutex_unlock(&overlap_check);
+	if (td_offload_overlap(td)) {
+		int res = pthread_mutex_unlock(&overlap_check);
+		assert(res == 0);
+	}
 
 	assert(fio_file_open(io_u->file));
 
diff --git a/os/windows/posix.c b/os/windows/posix.c
index e36453e9..31271de0 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -750,7 +750,7 @@ int setgid(gid_t gid)
 int nice(int incr)
 {
 	DWORD prioclass = NORMAL_PRIORITY_CLASS;
-	
+
 	if (incr < -15)
 		prioclass = HIGH_PRIORITY_CLASS;
 	else if (incr < 0)
@@ -759,7 +759,7 @@ int nice(int incr)
 		prioclass = IDLE_PRIORITY_CLASS;
 	else if (incr > 0)
 		prioclass = BELOW_NORMAL_PRIORITY_CLASS;
-	
+
 	if (!SetPriorityClass(GetCurrentProcess(), prioclass))
 		log_err("fio: SetPriorityClass failed\n");
 
@@ -883,7 +883,7 @@ int poll(struct pollfd fds[], nfds_t nfds, int timeout)
 	FD_ZERO(&exceptfds);
 
 	for (i = 0; i < nfds; i++) {
-		if (fds[i].fd < 0) {
+		if (fds[i].fd == INVALID_SOCKET) {
 			fds[i].revents = 0;
 			continue;
 		}
@@ -900,7 +900,7 @@ int poll(struct pollfd fds[], nfds_t nfds, int timeout)
 
 	if (rc != SOCKET_ERROR) {
 		for (i = 0; i < nfds; i++) {
-			if (fds[i].fd < 0)
+			if (fds[i].fd == INVALID_SOCKET)
 				continue;
 
 			if ((fds[i].events & POLLIN) && FD_ISSET(fds[i].fd, &readfds))
diff --git a/rate-submit.c b/rate-submit.c
index cf00d9bc..b7b70372 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2015 Jens Axboe <axboe@kernel.dk>
  *
  */
+#include <assert.h>
 #include "fio.h"
 #include "ioengines.h"
 #include "lib/getrusage.h"
@@ -11,40 +12,41 @@
 
 static void check_overlap(struct io_u *io_u)
 {
-	int i;
+	int i, res;
 	struct thread_data *td;
-	bool overlap = false;
 
-	do {
-		/*
-		 * Allow only one thread to check for overlap at a
-		 * time to prevent two threads from thinking the coast
-		 * is clear and then submitting IOs that overlap with
-		 * each other
-		 *
-		 * If an overlap is found, release the lock and
-		 * re-acquire it before checking again to give other
-		 * threads a chance to make progress
-		 *
-		 * If an overlap is not found, release the lock when the
-		 * io_u's IO_U_F_FLIGHT flag is set so that this io_u
-		 * can be checked by other threads as they assess overlap
-		 */
-		pthread_mutex_lock(&overlap_check);
-		for_each_td(td, i) {
-			if (td->runstate <= TD_SETTING_UP ||
-				td->runstate >= TD_FINISHING ||
-				!td->o.serialize_overlap ||
-				td->o.io_submit_mode != IO_MODE_OFFLOAD)
-				continue;
-
-			overlap = in_flight_overlap(&td->io_u_all, io_u);
-			if (overlap) {
-				pthread_mutex_unlock(&overlap_check);
-				break;
-			}
-		}
-	} while (overlap);
+	/*
+	 * Allow only one thread to check for overlap at a time to prevent two
+	 * threads from thinking the coast is clear and then submitting IOs
+	 * that overlap with each other.
+	 *
+	 * If an overlap is found, release the lock and re-acquire it before
+	 * checking again to give other threads a chance to make progress.
+	 *
+	 * If no overlap is found, release the lock when the io_u's
+	 * IO_U_F_FLIGHT flag is set so that this io_u can be checked by other
+	 * threads as they assess overlap.
+	 */
+	res = pthread_mutex_lock(&overlap_check);
+	assert(res == 0);
+
+retry:
+	for_each_td(td, i) {
+		if (td->runstate <= TD_SETTING_UP ||
+		    td->runstate >= TD_FINISHING ||
+		    !td->o.serialize_overlap ||
+		    td->o.io_submit_mode != IO_MODE_OFFLOAD)
+			continue;
+
+		if (!in_flight_overlap(&td->io_u_all, io_u))
+			continue;
+
+		res = pthread_mutex_unlock(&overlap_check);
+		assert(res == 0);
+		res = pthread_mutex_lock(&overlap_check);
+		assert(res == 0);
+		goto retry;
+	}
 }
 
 static int io_workqueue_fn(struct submit_worker *sw,
diff --git a/t/jobs/t0013.fio b/t/jobs/t0013.fio
new file mode 100644
index 00000000..b4ec1b42
--- /dev/null
+++ b/t/jobs/t0013.fio
@@ -0,0 +1,14 @@
+# Trigger the fio code that serializes overlapping I/O. The job size is very
+# small to make overlapping I/O more likely.
+
+[test]
+ioengine=null
+thread=1
+size=4K
+blocksize=4K
+io_submit_mode=offload
+iodepth=1
+serialize_overlap=1
+numjobs=8
+loops=1000000
+runtime=10
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index ae2cb096..c116bf5a 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -438,7 +438,7 @@ class FioJobTest_iops_rate(FioJobTest):
         logging.debug("Test %d: iops1: %f", self.testnum, iops1)
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
-        if iops1 < 995 or iops1 > 1005:
+        if iops1 < 950 or iops1 > 1050:
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
             self.passed = False
 
@@ -447,6 +447,13 @@ class FioJobTest_iops_rate(FioJobTest):
             self.passed = False
 
 
+class FioJobTest_t0013(FioJobTest):
+    """Runs fio test job t0013"""
+
+    def check_result(self):
+        super(FioJobTest_t0013, self).check_result()
+
+
 class Requirements(object):
     """Requirements consists of multiple run environment characteristics.
     These are to determine if a particular test can be run"""
@@ -687,6 +694,16 @@ TEST_LIST = [
         'requirements':     [Requirements.not_macos],
         # mac os does not support CPU affinity
     },
+    {
+        'test_id':          13,
+        'test_class':       FioJobTest_t0013,
+        'job':              't0013.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
diff --git a/workqueue.c b/workqueue.c
index b5959512..9e6c41ff 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -85,15 +85,14 @@ static bool all_sw_idle(struct workqueue *wq)
  */
 void workqueue_flush(struct workqueue *wq)
 {
+	pthread_mutex_lock(&wq->flush_lock);
 	wq->wake_idle = 1;
 
-	while (!all_sw_idle(wq)) {
-		pthread_mutex_lock(&wq->flush_lock);
+	while (!all_sw_idle(wq))
 		pthread_cond_wait(&wq->flush_cond, &wq->flush_lock);
-		pthread_mutex_unlock(&wq->flush_lock);
-	}
 
 	wq->wake_idle = 0;
+	pthread_mutex_unlock(&wq->flush_lock);
 }
 
 /*
@@ -159,12 +158,10 @@ static void *worker_thread(void *data)
 	if (sw->flags & SW_F_ERROR)
 		goto done;
 
+	pthread_mutex_lock(&sw->lock);
 	while (1) {
-		pthread_mutex_lock(&sw->lock);
-
 		if (flist_empty(&sw->work_list)) {
 			if (sw->flags & SW_F_EXIT) {
-				pthread_mutex_unlock(&sw->lock);
 				break;
 			}
 
@@ -173,34 +170,41 @@ static void *worker_thread(void *data)
 				workqueue_pre_sleep(sw);
 				pthread_mutex_lock(&sw->lock);
 			}
-
-			/*
-			 * We dropped and reaquired the lock, check
-			 * state again.
-			 */
-			if (!flist_empty(&sw->work_list))
-				goto handle_work;
-
+		}
+		/*
+		 * We may have dropped and reaquired the lock, check state
+		 * again.
+		 */
+		if (flist_empty(&sw->work_list)) {
 			if (sw->flags & SW_F_EXIT) {
-				pthread_mutex_unlock(&sw->lock);
 				break;
-			} else if (!(sw->flags & SW_F_IDLE)) {
+			}
+			if (!(sw->flags & SW_F_IDLE)) {
 				sw->flags |= SW_F_IDLE;
 				wq->next_free_worker = sw->index;
+				pthread_mutex_unlock(&sw->lock);
+				pthread_mutex_lock(&wq->flush_lock);
 				if (wq->wake_idle)
 					pthread_cond_signal(&wq->flush_cond);
+				pthread_mutex_unlock(&wq->flush_lock);
+				pthread_mutex_lock(&sw->lock);
+			}
+		}
+		if (flist_empty(&sw->work_list)) {
+			if (sw->flags & SW_F_EXIT) {
+				break;
 			}
-
 			pthread_cond_wait(&sw->cond, &sw->lock);
 		} else {
-handle_work:
 			flist_splice_init(&sw->work_list, &local_list);
 		}
 		pthread_mutex_unlock(&sw->lock);
 		handle_list(sw, &local_list);
 		if (wq->ops.update_acct_fn)
 			wq->ops.update_acct_fn(sw);
+		pthread_mutex_lock(&sw->lock);
 	}
+	pthread_mutex_unlock(&sw->lock);
 
 done:
 	sk_out_drop();
@@ -336,11 +340,11 @@ int workqueue_init(struct thread_data *td, struct workqueue *wq,
 	 * Wait for them all to be started and initialized
 	 */
 	error = 0;
+	pthread_mutex_lock(&wq->flush_lock);
 	do {
 		struct submit_worker *sw;
 
 		running = 0;
-		pthread_mutex_lock(&wq->flush_lock);
 		for (i = 0; i < wq->max_workers; i++) {
 			sw = &wq->workers[i];
 			pthread_mutex_lock(&sw->lock);
@@ -351,14 +355,12 @@ int workqueue_init(struct thread_data *td, struct workqueue *wq,
 			pthread_mutex_unlock(&sw->lock);
 		}
 
-		if (error || running == wq->max_workers) {
-			pthread_mutex_unlock(&wq->flush_lock);
+		if (error || running == wq->max_workers)
 			break;
-		}
 
 		pthread_cond_wait(&wq->flush_cond, &wq->flush_lock);
-		pthread_mutex_unlock(&wq->flush_lock);
 	} while (1);
+	pthread_mutex_unlock(&wq->flush_lock);
 
 	if (!error)
 		return 0;



* Recent changes (master)
@ 2020-07-04 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-04 12:00 UTC
  To: fio

The following changes since commit 5144df064b2fc011615868e59083cfc1786bd073:

  Merge branch 'num2str' of https://github.com/bvanassche/fio (2020-07-02 18:08:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a504a3902e00b6db0183872b0ff62e35abae7119:

  configure: fix bad indentation (2020-07-03 08:34:56 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      configure: fix bad indentation

Yigal Korman (4):
      fio: don't retry engine search on failure
      configure/Makefile: engine LIBS consistency
      configure: new --dynamic-libengines build option
      fio: better info when engine not found

 .gitignore         |  1 +
 Makefile           | 86 +++++++++++++++++++++++++++++++++++++++++++-----------
 configure          | 20 +++++--------
 engines/dev-dax.c  |  2 +-
 engines/guasi.c    |  2 +-
 engines/http.c     |  2 +-
 engines/libaio.c   |  2 +-
 engines/libhdfs.c  |  6 ++--
 engines/libiscsi.c |  6 ++--
 engines/libpmem.c  |  2 +-
 engines/libzbc.c   |  2 +-
 engines/nbd.c      |  2 +-
 engines/pmemblk.c  |  2 +-
 engines/rados.c    |  2 +-
 engines/rbd.c      |  2 +-
 engines/rdma.c     |  9 +++---
 init.c             | 14 ++++++---
 ioengines.c        | 26 +++++++++++++++--
 ioengines.h        |  6 ++++
 os/os-linux.h      |  2 ++
 20 files changed, 141 insertions(+), 55 deletions(-)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index b84b0fda..0aa4a361 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 *.d
 *.o
+*.so
 *.exe
 /.depend
 /FIO-VERSION-FILE
diff --git a/Makefile b/Makefile
index 99e96635..8f8d9b9a 100644
--- a/Makefile
+++ b/Makefile
@@ -60,15 +60,17 @@ ifdef CONFIG_LIBHDFS
 endif
 
 ifdef CONFIG_LIBISCSI
-  CFLAGS := $(LIBISCSI_CFLAGS) $(CFLAGS)
-  LIBS += $(LIBISCSI_LIBS)
-  SOURCE += engines/libiscsi.c
+  iscsi_SRCS = engines/libiscsi.c
+  iscsi_LIBS = $(LIBISCSI_LIBS)
+  iscsi_CFLAGS = $(LIBISCSI_LIBS)
+  ENGINES += iscsi
 endif
 
 ifdef CONFIG_LIBNBD
-  CFLAGS := $(LIBNBD_CFLAGS) $(CFLAGS)
-  LIBS += $(LIBNBD_LIBS)
-  SOURCE += engines/nbd.c
+  nbd_SRCS = engines/nbd.c
+  nbd_LIBS = $(LIBNBD_LIBS)
+  nbd_CFLAGS = $(LIBNBD_CFLAGS)
+  ENGINES += nbd
 endif
 
 ifdef CONFIG_64BIT
@@ -78,10 +80,19 @@ ifdef CONFIG_32BIT
   CFLAGS := -DBITS_PER_LONG=32 $(CFLAGS)
 endif
 ifdef CONFIG_LIBAIO
-  SOURCE += engines/libaio.c
+  aio_SRCS = engines/libaio.c
+  aio_LIBS = -laio
+  ifdef CONFIG_LIBAIO_URING
+    aio_LIBS = -luring
+  else
+    aio_LIBS = -laio
+  endif
+  ENGINES += aio
 endif
 ifdef CONFIG_RDMA
-  SOURCE += engines/rdma.c
+  rdma_SRCS = engines/rdma.c
+  rdma_LIBS = -libverbs -lrdmacm
+  ENGINES += rdma
 endif
 ifdef CONFIG_POSIXAIO
   SOURCE += engines/posixaio.c
@@ -96,7 +107,8 @@ ifdef CONFIG_LINUX_SPLICE
   SOURCE += engines/splice.c
 endif
 ifdef CONFIG_GUASI
-  SOURCE += engines/guasi.c
+  guasi_SRCS = engines/guasi.c
+  ENGINES += guasi
 endif
 ifdef CONFIG_SOLARISAIO
   SOURCE += engines/solarisaio.c
@@ -105,13 +117,19 @@ ifdef CONFIG_WINDOWSAIO
   SOURCE += engines/windowsaio.c
 endif
 ifdef CONFIG_RADOS
-  SOURCE += engines/rados.c
+  rados_SRCS = engines/rados.c
+  rados_LIBS = -lrados
+  ENGINES += rados
 endif
 ifdef CONFIG_RBD
-  SOURCE += engines/rbd.c
+  rbd_SRCS = engines/rbd.c
+  rbd_LIBS = -lrbd -lrados
+  ENGINES += rbd
 endif
 ifdef CONFIG_HTTP
-  SOURCE += engines/http.c
+  http_SRCS = engines/http.c
+  http_LIBS = -lcurl -lssl -lcrypto
+  ENGINES += http
 endif
 SOURCE += oslib/asprintf.c
 ifndef CONFIG_STRSEP
@@ -139,6 +157,7 @@ ifdef CONFIG_GFAPI
   SOURCE += engines/glusterfs.c
   SOURCE += engines/glusterfs_sync.c
   SOURCE += engines/glusterfs_async.c
+  LIBS += -lgfapi -lglusterfs
   ifdef CONFIG_GF_FADVISE
     CFLAGS := "-DGFAPI_USE_FADVISE" $(CFLAGS)
   endif
@@ -149,19 +168,27 @@ ifdef CONFIG_MTD
   SOURCE += oslib/libmtd_legacy.c
 endif
 ifdef CONFIG_PMEMBLK
-  SOURCE += engines/pmemblk.c
+  pmemblk_SRCS = engines/pmemblk.c
+  pmemblk_LIBS = -lpmemblk
+  ENGINES += pmemblk
 endif
 ifdef CONFIG_LINUX_DEVDAX
-  SOURCE += engines/dev-dax.c
+  devdax_SRCS = engines/dev-dax.c
+  devdax_LIBS = -lpmem
+  ENGINES += dev-dax
 endif
 ifdef CONFIG_LIBPMEM
-  SOURCE += engines/libpmem.c
+  pmem_SRCS = engines/libpmem.c
+  pmem_LIBS = -lpmem
+  ENGINES += pmem
 endif
 ifdef CONFIG_IME
   SOURCE += engines/ime.c
 endif
 ifdef CONFIG_LIBZBC
-  SOURCE += engines/libzbc.c
+  zbc_SRCS = engines/libzbc.c
+  zbc_LIBS = -lzbc
+  ENGINES += zbc
 endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
@@ -223,6 +250,26 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   CFLAGS := -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format $(CFLAGS)
 endif
 
+ifdef CONFIG_DYNAMIC_ENGINES
+ DYNAMIC_ENGS := $(ENGINES)
+define engine_template =
+$(1)_OBJS := $$($(1)_SRCS:.c=.o)
+$$($(1)_OBJS): CFLAGS := -fPIC $$($(1)_CFLAGS) $(CFLAGS)
+engines/lib$(1).so: $$($(1)_OBJS)
+	$$(QUIET_LINK)$(CC) -shared -rdynamic -fPIC -Wl,-soname,lib$(1).so.1 $$($(1)_LIBS) -o $$@ $$<
+ENGS_OBJS += engines/lib$(1).so
+all install: $(ENGS_OBJS)
+endef
+else # !CONFIG_DYNAMIC_ENGINES
+define engine_template =
+SOURCE += $$($(1)_SRCS)
+LIBS += $$($(1)_LIBS)
+CFLAGS := $$($(1)_CFLAGS) $(CFLAGS)
+endef
+endif
+
+$(foreach eng,$(ENGINES),$(eval $(call engine_template,$(eng))))
+
 OBJS := $(SOURCE:.c=.o)
 
 FIO_OBJS = $(OBJS) fio.o
@@ -374,6 +421,7 @@ else
 endif
 prefix = $(INSTALL_PREFIX)
 bindir = $(prefix)/bin
+libdir = $(prefix)/lib/fio
 
 ifeq ($(CONFIG_TARGET_OS), Darwin)
 mandir = /usr/share/man
@@ -522,7 +570,7 @@ unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
 endif
 
 clean: FORCE
-	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
+	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] engines/*.so profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -f t/fio-btrace2fio t/io_uring t/read-to-pipe-async
 	@rm -rf  doc/output
 
@@ -562,6 +610,10 @@ fulltest:
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
 	$(INSTALL) $(PROGS) $(SCRIPTS) $(DESTDIR)$(bindir)
+ifdef CONFIG_DYNAMIC_ENGINES
+	$(INSTALL) -m 755 -d $(DESTDIR)$(libdir)
+	$(INSTALL) -m 755 $(SRCDIR)/engines/*.so $(DESTDIR)$(libdir)
+endif
 	$(INSTALL) -m 755 -d $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 644 $(SRCDIR)/fio.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 644 $(SRCDIR)/tools/fio_generate_plots.1 $(DESTDIR)$(mandir)/man1
diff --git a/configure b/configure
index 63b30555..6991393b 100755
--- a/configure
+++ b/configure
@@ -151,6 +151,7 @@ march_set="no"
 libiscsi="no"
 libnbd="no"
 libaio_uring="no"
+dynamic_engines="no"
 prefix=/usr/local
 
 # parse options
@@ -215,6 +216,8 @@ for opt do
   ;;
   --enable-libaio-uring) libaio_uring="yes"
   ;;
+  --dynamic-libengines) dynamic_engines="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -254,6 +257,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-tcmalloc	Disable tcmalloc support"
   echo "--enable-libaio-uring   Enable libaio emulated over io_uring"
+  echo "--dynamic-libengines	Lib-based ioengines as dynamic libraries"
   exit $exit_val
 fi
 
@@ -605,11 +609,9 @@ int main(void)
 EOF
   if test "$libaio_uring" = "yes" && compile_prog "" "-luring" "libaio io_uring" ; then
     libaio=yes
-    LIBS="-luring $LIBS"
   elif compile_prog "" "-laio" "libaio" ; then
     libaio=yes
     libaio_uring=no
-    LIBS="-laio $LIBS"
   else
     if test "$libaio" = "yes" ; then
       feature_not_found "linux AIO" "libaio-dev or libaio-devel"
@@ -859,7 +861,6 @@ int main(int argc, char **argv)
 EOF
 if test "$disable_rdma" != "yes" && compile_prog "" "-libverbs" "libverbs" ; then
     libverbs="yes"
-    LIBS="-libverbs $LIBS"
 fi
 print_config "libverbs" "$libverbs"
 
@@ -879,7 +880,6 @@ int main(int argc, char **argv)
 EOF
 if test "$disable_rdma" != "yes" && compile_prog "" "-lrdmacm" "rdma"; then
     rdmacm="yes"
-    LIBS="-lrdmacm $LIBS"
 fi
 print_config "rdmacm" "$rdmacm"
 
@@ -1770,10 +1770,8 @@ if test "$disable_http" != "yes"; then
   if compile_prog "" "$HTTP_LIBS" "curl-new-ssl"; then
     output_sym "CONFIG_HAVE_OPAQUE_HMAC_CTX"
     http="yes"
-    LIBS="$HTTP_LIBS $LIBS"
   elif mv $TMPC2 $TMPC && compile_prog "" "$HTTP_LIBS" "curl-old-ssl"; then
     http="yes"
-    LIBS="$HTTP_LIBS $LIBS"
   fi
 fi
 print_config "http engine" "$http"
@@ -1802,7 +1800,6 @@ int main(int argc, char **argv)
 }
 EOF
 if test "$disable_rados" != "yes"  && compile_prog "" "-lrados" "rados"; then
-  LIBS="-lrados $LIBS"
   rados="yes"
 fi
 print_config "Rados engine" "$rados"
@@ -1833,7 +1830,6 @@ int main(int argc, char **argv)
 }
 EOF
 if test "$disable_rbd" != "yes"  && compile_prog "" "-lrbd -lrados" "rbd"; then
-  LIBS="-lrbd -lrados $LIBS"
   rbd="yes"
 fi
 print_config "Rados Block Device engine" "$rbd"
@@ -1924,7 +1920,6 @@ int main(int argc, char **argv)
 }
 EOF
 if test "$disable_gfapi" != "yes"  && compile_prog "" "-lgfapi -lglusterfs" "gfapi"; then
-  LIBS="-lgfapi -lglusterfs $LIBS"
   gfapi="yes"
 fi
 print_config "Gluster API engine" "$gfapi"
@@ -2086,7 +2081,6 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "-lpmem" "libpmem"; then
   libpmem="yes"
-  LIBS="-lpmem $LIBS"
 fi
 print_config "libpmem" "$libpmem"
 
@@ -2108,7 +2102,6 @@ int main(int argc, char **argv)
 EOF
   if compile_prog "" "-lpmemblk" "libpmemblk"; then
     libpmemblk="yes"
-    LIBS="-lpmemblk $LIBS"
   fi
 fi
 print_config "libpmemblk" "$libpmemblk"
@@ -2432,7 +2425,6 @@ if compile_prog "" "-lzbc" "libzbc"; then
   libzbcvermaj=$(pkg-config --modversion libzbc | sed 's/\.[0-9]*\.[0-9]*//')
   if test "$libzbcvermaj" -ge "5" ; then
     libzbc="yes"
-    LIBS="-lzbc $LIBS"
   else
     print_config "libzbc engine" "Unsupported libzbc version (version 5 or above required)"
     libzbc="no"
@@ -2966,6 +2958,10 @@ if test "$libnbd" = "yes" ; then
   echo "LIBNBD_CFLAGS=$libnbd_cflags" >> $config_host_mak
   echo "LIBNBD_LIBS=$libnbd_libs" >> $config_host_mak
 fi
+if test "$dynamic_engines" = "yes" ; then
+  output_sym "CONFIG_DYNAMIC_ENGINES"
+fi
+print_config "Lib-based ioengines dynamic" "$dynamic_engines"
 cat > $TMPC << EOF
 int main(int argc, char **argv)
 {
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index 422ea634..1d0f66cb 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -328,7 +328,7 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name		= "dev-dax",
 	.version	= FIO_IOOPS_VERSION,
 	.init		= fio_devdax_init,
diff --git a/engines/guasi.c b/engines/guasi.c
index cb26802c..d4121757 100644
--- a/engines/guasi.c
+++ b/engines/guasi.c
@@ -242,7 +242,7 @@ static int fio_guasi_init(struct thread_data *td)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name		= "guasi",
 	.version	= FIO_IOOPS_VERSION,
 	.init		= fio_guasi_init,
diff --git a/engines/http.c b/engines/http.c
index 275fcab5..7a61b132 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -639,7 +639,7 @@ static int fio_http_invalidate(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name = "http",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_DISKLESSIO | FIO_SYNCIO,
diff --git a/engines/libaio.c b/engines/libaio.c
index 398fdf91..b909b79e 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -445,7 +445,7 @@ static int fio_libaio_init(struct thread_data *td)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_ASYNCIO_SYNC_TRIM,
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index c57fcea6..9ca82f78 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -393,7 +393,7 @@ static void fio_hdfsio_io_u_free(struct thread_data *td, struct io_u *io_u)
 	}
 }
 
-static struct ioengine_ops ioengine_hdfs = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name = "libhdfs",
 	.version = FIO_IOOPS_VERSION,
 	.flags = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NODISKUTIL,
@@ -412,10 +412,10 @@ static struct ioengine_ops ioengine_hdfs = {
 
 static void fio_init fio_hdfsio_register(void)
 {
-	register_ioengine(&ioengine_hdfs);
+	register_ioengine(&ioengine);
 }
 
 static void fio_exit fio_hdfsio_unregister(void)
 {
-	unregister_ioengine(&ioengine_hdfs);
+	unregister_ioengine(&ioengine);
 }
diff --git a/engines/libiscsi.c b/engines/libiscsi.c
index 35761a61..c97b5709 100644
--- a/engines/libiscsi.c
+++ b/engines/libiscsi.c
@@ -383,7 +383,7 @@ static struct io_u *fio_iscsi_event(struct thread_data *td, int event)
 	return io_u;
 }
 
-static struct ioengine_ops ioengine_iscsi = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name               = "libiscsi",
 	.version            = FIO_IOOPS_VERSION,
 	.flags              = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NODISKUTIL,
@@ -402,10 +402,10 @@ static struct ioengine_ops ioengine_iscsi = {
 
 static void fio_init fio_iscsi_register(void)
 {
-	register_ioengine(&ioengine_iscsi);
+	register_ioengine(&ioengine);
 }
 
 static void fio_exit fio_iscsi_unregister(void)
 {
-	unregister_ioengine(&ioengine_iscsi);
+	unregister_ioengine(&ioengine);
 }
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 99c7b50d..3f63055c 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -558,7 +558,7 @@ static int fio_libpmem_close_file(struct thread_data *td, struct fio_file *f)
 	return generic_close_file(td, f);
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name		= "libpmem",
 	.version	= FIO_IOOPS_VERSION,
 	.init		= fio_libpmem_init,
diff --git a/engines/libzbc.c b/engines/libzbc.c
index 9e568334..fdde8ca6 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -397,7 +397,7 @@ static enum fio_q_status libzbc_queue(struct thread_data *td, struct io_u *io_u)
 	return FIO_Q_COMPLETED;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "libzbc",
 	.version		= FIO_IOOPS_VERSION,
 	.open_file		= libzbc_open_file,
diff --git a/engines/nbd.c b/engines/nbd.c
index 53237929..b0ba75e6 100644
--- a/engines/nbd.c
+++ b/engines/nbd.c
@@ -328,7 +328,7 @@ static int nbd_invalidate(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "nbd",
 	.version		= FIO_IOOPS_VERSION,
 	.options		= options,
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index e2eaa15e..fc6358e8 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -426,7 +426,7 @@ static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name = "pmemblk",
 	.version = FIO_IOOPS_VERSION,
 	.queue = fio_pmemblk_queue,
diff --git a/engines/rados.c b/engines/rados.c
index d4413427..42ee48ff 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -444,7 +444,7 @@ static int fio_rados_io_u_init(struct thread_data *td, struct io_u *io_u)
 }
 
 /* ioengine_ops for get_ioengine() */
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name = "rados",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_DISKLESSIO,
diff --git a/engines/rbd.c b/engines/rbd.c
index a08f4775..268b6ebd 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -668,7 +668,7 @@ static int fio_rbd_io_u_init(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "rbd",
 	.version		= FIO_IOOPS_VERSION,
 	.setup			= fio_rbd_setup,
diff --git a/engines/rdma.c b/engines/rdma.c
index f192f432..f4471869 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -226,7 +226,8 @@ static int client_recv(struct thread_data *td, struct ibv_wc *wc)
 		rd->rmt_nr = ntohl(rd->recv_buf.nr);
 
 		for (i = 0; i < rd->rmt_nr; i++) {
-			rd->rmt_us[i].buf = be64_to_cpu(rd->recv_buf.rmt_us[i].buf);
+			rd->rmt_us[i].buf = __be64_to_cpu(
+						rd->recv_buf.rmt_us[i].buf);
 			rd->rmt_us[i].rkey = ntohl(rd->recv_buf.rmt_us[i].rkey);
 			rd->rmt_us[i].size = ntohl(rd->recv_buf.rmt_us[i].size);
 
@@ -1389,7 +1390,7 @@ static int fio_rdmaio_setup(struct thread_data *td)
 	return 0;
 }
 
-static struct ioengine_ops ioengine_rw = {
+FIO_STATIC struct ioengine_ops ioengine = {
 	.name			= "rdma",
 	.version		= FIO_IOOPS_VERSION,
 	.setup			= fio_rdmaio_setup,
@@ -1410,10 +1411,10 @@ static struct ioengine_ops ioengine_rw = {
 
 static void fio_init fio_rdmaio_register(void)
 {
-	register_ioengine(&ioengine_rw);
+	register_ioengine(&ioengine);
 }
 
 static void fio_exit fio_rdmaio_unregister(void)
 {
-	unregister_ioengine(&ioengine_rw);
+	unregister_ioengine(&ioengine);
 }
diff --git a/init.c b/init.c
index e53be35a..3710e3d4 100644
--- a/init.c
+++ b/init.c
@@ -1099,6 +1099,9 @@ int ioengine_load(struct thread_data *td)
 		 */
 		dlhandle = td->io_ops_dlhandle;
 		ops = load_ioengine(td);
+		if (!ops)
+			goto fail;
+
 		if (ops == td->io_ops && dlhandle == td->io_ops_dlhandle) {
 			if (dlhandle)
 				dlclose(dlhandle);
@@ -1113,10 +1116,8 @@ int ioengine_load(struct thread_data *td)
 	}
 
 	td->io_ops = load_ioengine(td);
-	if (!td->io_ops) {
-		log_err("fio: failed to load engine\n");
-		return 1;
-	}
+	if (!td->io_ops)
+		goto fail;
 
 	if (td->io_ops->option_struct_size && td->io_ops->options) {
 		/*
@@ -1155,6 +1156,11 @@ int ioengine_load(struct thread_data *td)
 
 	td_set_ioengine_flags(td);
 	return 0;
+
+fail:
+	log_err("fio: failed to load engine\n");
+	return 1;
+
 }
 
 static void init_flags(struct thread_data *td)
diff --git a/ioengines.c b/ioengines.c
index 2c7a0df9..c1b430a1 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -75,6 +75,25 @@ static struct ioengine_ops *find_ioengine(const char *name)
 	return NULL;
 }
 
+#ifdef CONFIG_DYNAMIC_ENGINES
+static void *dlopen_external(struct thread_data *td, const char *engine)
+{
+	char engine_path[PATH_MAX];
+	void *dlhandle;
+
+	sprintf(engine_path, "%s/lib%s.so", FIO_EXT_ENG_DIR, engine);
+
+	dlhandle = dlopen(engine_path, RTLD_LAZY);
+	if (!dlhandle)
+		log_info("Engine %s not found; Either name is invalid, was not built, or fio-engine-%s package is missing.\n",
+			 engine, engine);
+
+	return dlhandle;
+}
+#else
+#define dlopen_external(td, engine) (NULL)
+#endif
+
 static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 					    const char *engine_lib)
 {
@@ -86,8 +105,11 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 	dlerror();
 	dlhandle = dlopen(engine_lib, RTLD_LAZY);
 	if (!dlhandle) {
-		td_vmsg(td, -1, dlerror(), "dlopen");
-		return NULL;
+		dlhandle = dlopen_external(td, engine_lib);
+		if (!dlhandle) {
+			td_vmsg(td, -1, dlerror(), "dlopen");
+			return NULL;
+		}
 	}
 
 	/*
diff --git a/ioengines.h b/ioengines.h
index f48b4db9..54dadba2 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -10,6 +10,12 @@
 
 #define FIO_IOOPS_VERSION	26
 
+#ifndef CONFIG_DYNAMIC_ENGINES
+#define FIO_STATIC	static
+#else
+#define FIO_STATIC
+#endif
+
 /*
  * io_ops->queue() return values
  */
diff --git a/os/os-linux.h b/os/os-linux.h
index 6ec7243d..65d3b429 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -58,6 +58,8 @@
 
 #define OS_MAP_ANON		MAP_ANONYMOUS
 
+#define FIO_EXT_ENG_DIR	"/usr/lib/fio"
+
 typedef cpu_set_t os_cpu_mask_t;
 
 #ifdef CONFIG_3ARG_AFFINITY



* Recent changes (master)
@ 2020-07-03 12:00 Jens Axboe
From: Jens Axboe @ 2020-07-03 12:00 UTC
  To: fio

The following changes since commit 797ef7b209d7423f34dfe513beeede21d2efddc4:

  Merge branch 'pmemblk' of https://github.com/bvanassche/fio (2020-06-28 07:08:36 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5144df064b2fc011615868e59083cfc1786bd073:

  Merge branch 'num2str' of https://github.com/bvanassche/fio (2020-07-02 18:08:14 -0600)

----------------------------------------------------------------
Bart Van Assche (5):
      num2str(): Use asprintf() instead of malloc()
      num2str(): Remove the fmt[] array
      num2str(): Fix overflow handling
      Add a num2str() unit test
      num2str(): Add the E (exa) prefix

Jens Axboe (1):
      Merge branch 'num2str' of https://github.com/bvanassche/fio

 Makefile                |  2 ++
 lib/num2str.c           | 31 ++++++++++++++---------------
 unittests/lib/num2str.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
 unittests/unittest.c    |  1 +
 unittests/unittest.h    |  1 +
 5 files changed, 72 insertions(+), 16 deletions(-)
 create mode 100644 unittests/lib/num2str.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 7eb5e899..99e96635 100644
--- a/Makefile
+++ b/Makefile
@@ -337,12 +337,14 @@ PROGS += $(T_PROGS)
 ifdef CONFIG_HAVE_CUNIT
 UT_OBJS = unittests/unittest.o
 UT_OBJS += unittests/lib/memalign.o
+UT_OBJS += unittests/lib/num2str.o
 UT_OBJS += unittests/lib/strntol.o
 UT_OBJS += unittests/oslib/strlcat.o
 UT_OBJS += unittests/oslib/strndup.o
 UT_OBJS += unittests/oslib/strcasestr.o
 UT_OBJS += unittests/oslib/strsep.o
 UT_TARGET_OBJS = lib/memalign.o
+UT_TARGET_OBJS += lib/num2str.o
 UT_TARGET_OBJS += lib/strntol.o
 UT_TARGET_OBJS += oslib/strlcat.o
 UT_TARGET_OBJS += oslib/strndup.o
diff --git a/lib/num2str.c b/lib/num2str.c
index 1abe22f3..726f1c44 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -4,6 +4,7 @@
 #include <string.h>
 
 #include "../compiler/compiler.h"
+#include "../oslib/asprintf.h"
 #include "num2str.h"
 
 #define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
@@ -19,8 +20,8 @@
  */
 char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 {
-	const char *sistr[] = { "", "k", "M", "G", "T", "P" };
-	const char *iecstr[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi" };
+	const char *sistr[] = { "", "k", "M", "G", "T", "P", "E" };
+	const char *iecstr[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei" };
 	const char **unitprefix;
 	static const char *const unitstr[] = {
 		[N2S_NONE]	= "",
@@ -33,16 +34,12 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 	const unsigned int thousand = pow2 ? 1024 : 1000;
 	unsigned int modulo;
 	int post_index, carry = 0;
-	char tmp[32], fmt[32];
+	char tmp[32];
 	char *buf;
 
 	compiletime_assert(sizeof(sistr) == sizeof(iecstr), "unit prefix arrays must be identical sizes");
 	assert(units < ARRAY_SIZE(unitstr));
 
-	buf = malloc(128);
-	if (!buf)
-		return NULL;
-
 	if (pow2)
 		unitprefix = iecstr;
 	else
@@ -83,16 +80,17 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 		post_index++;
 	}
 
+	if (post_index >= ARRAY_SIZE(sistr))
+		post_index = 0;
+
 	/*
 	 * If no modulo, then we're done.
 	 */
 	if (modulo == -1U) {
 done:
-		if (post_index >= ARRAY_SIZE(sistr))
-			post_index = 0;
-
-		sprintf(buf, "%llu%s%s", (unsigned long long) num,
-			unitprefix[post_index], unitstr[units]);
+		if (asprintf(&buf, "%llu%s%s", (unsigned long long) num,
+			     unitprefix[post_index], unitstr[units]) < 0)
+			buf = NULL;
 		return buf;
 	}
 
@@ -111,10 +109,11 @@ done:
 	 */
 	assert(maxlen - strlen(tmp) - 1 > 0);
 	assert(modulo < thousand);
-	sprintf(fmt, "%%.%df", (int)(maxlen - strlen(tmp) - 1));
-	sprintf(tmp, fmt, (double)modulo / (double)thousand);
+	sprintf(tmp, "%.*f", (int)(maxlen - strlen(tmp) - 1),
+		(double)modulo / (double)thousand);
 
-	sprintf(buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
-			unitprefix[post_index], unitstr[units]);
+	if (asprintf(&buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
+		     unitprefix[post_index], unitstr[units]) < 0)
+		buf = NULL;
 	return buf;
 }
diff --git a/unittests/lib/num2str.c b/unittests/lib/num2str.c
new file mode 100644
index 00000000..a3492a8d
--- /dev/null
+++ b/unittests/lib/num2str.c
@@ -0,0 +1,53 @@
+#include <limits.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include "../../compiler/compiler.h"
+#include "../../lib/num2str.h"
+#include "../unittest.h"
+
+struct testcase {
+	uint64_t num;
+	int maxlen;
+	int base;
+	int pow2;
+	enum n2s_unit unit;
+	const char *expected;
+};
+
+static const struct testcase testcases[] = {
+	{ 1, 1, 1, 0, N2S_NONE, "1" },
+	{ UINT64_MAX, 99, 1, 0, N2S_NONE, "18446744073709551615" },
+	{ 18446744073709551, 2, 1, 0, N2S_NONE, "18P" },
+	{ 18446744073709551, 4, 1, 0, N2S_NONE, "18.4P" },
+	{ UINT64_MAX, 2, 1, 0, N2S_NONE, "18E" },
+	{ UINT64_MAX, 4, 1, 0, N2S_NONE, "18.4E" },
+};
+
+static void test_num2str(void)
+{
+	const struct testcase *p;
+	char *str;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(testcases); ++i) {
+		p = &testcases[i];
+		str = num2str(p->num, p->maxlen, p->base, p->pow2, p->unit);
+		CU_ASSERT_STRING_EQUAL(str, p->expected);
+		free(str);
+	}
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "num2str/1",
+		.fn	= test_num2str,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_lib_num2str(void)
+{
+	return fio_unittest_add_suite("lib/num2str.c", NULL, NULL, tests);
+}
diff --git a/unittests/unittest.c b/unittests/unittest.c
index c37e1971..f490b485 100644
--- a/unittests/unittest.c
+++ b/unittests/unittest.c
@@ -48,6 +48,7 @@ int main(void)
 	}
 
 	fio_unittest_register(fio_unittest_lib_memalign);
+	fio_unittest_register(fio_unittest_lib_num2str);
 	fio_unittest_register(fio_unittest_lib_strntol);
 	fio_unittest_register(fio_unittest_oslib_strlcat);
 	fio_unittest_register(fio_unittest_oslib_strndup);
diff --git a/unittests/unittest.h b/unittests/unittest.h
index 786c1c97..ecb7d124 100644
--- a/unittests/unittest.h
+++ b/unittests/unittest.h
@@ -15,6 +15,7 @@ CU_ErrorCode fio_unittest_add_suite(const char*, CU_InitializeFunc,
 	CU_CleanupFunc, struct fio_unittest_entry*);
 
 CU_ErrorCode fio_unittest_lib_memalign(void);
+CU_ErrorCode fio_unittest_lib_num2str(void);
 CU_ErrorCode fio_unittest_lib_strntol(void);
 CU_ErrorCode fio_unittest_oslib_strlcat(void);
 CU_ErrorCode fio_unittest_oslib_strndup(void);



* Recent changes (master)
@ 2020-06-29 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-29 12:00 UTC
  To: fio

The following changes since commit dc6143a58735b781820bb4e3094114078b055810:

  Merge branch 'compiler' of https://github.com/bvanassche/fio (2020-06-24 09:02:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 797ef7b209d7423f34dfe513beeede21d2efddc4:

  Merge branch 'pmemblk' of https://github.com/bvanassche/fio (2020-06-28 07:08:36 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Unbreak the pmemblk engine

Jens Axboe (1):
      Merge branch 'pmemblk' of https://github.com/bvanassche/fio

 engines/pmemblk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 730f4d77..e2eaa15e 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -220,14 +220,14 @@ static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
 		pmb->pmb_nblocks = pmemblk_nblock(pmb->pmb_pool);
 
 		fio_pmemblk_cache_insert(pmb);
+	} else {
+		free(path);
 	}
 
 	pmb->pmb_refcnt += 1;
 
 	pthread_mutex_unlock(&CacheLock);
 
-	free(path);
-
 	return pmb;
 
 error:



* Recent changes (master)
@ 2020-06-25 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-25 12:00 UTC
  To: fio

The following changes since commit c7f72c4bdb3586025cc7c5da707e69cc7a1baf07:

  Merge branch 'master' of https://github.com/safl/fio (2020-06-23 12:41:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dc6143a58735b781820bb4e3094114078b055810:

  Merge branch 'compiler' of https://github.com/bvanassche/fio (2020-06-24 09:02:06 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Merge compiler/gcc4.h into compiler/compiler.h

Jens Axboe (1):
      Merge branch 'compiler' of https://github.com/bvanassche/fio

 compiler/compiler-gcc4.h | 17 -----------------
 compiler/compiler.h      |  8 ++++----
 2 files changed, 4 insertions(+), 21 deletions(-)
 delete mode 100644 compiler/compiler-gcc4.h

---

Diff of recent changes:

diff --git a/compiler/compiler-gcc4.h b/compiler/compiler-gcc4.h
deleted file mode 100644
index e8701cf0..00000000
--- a/compiler/compiler-gcc4.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef FIO_COMPILER_GCC4_H
-#define FIO_COMPILER_GCC4_H
-
-#ifndef __must_check
-#define __must_check		__attribute__((warn_unused_result))
-#endif
-
-#define GCC_VERSION (__GNUC__ * 10000		\
-			+ __GNUC_MINOR__ * 100	\
-			+ __GNUC_PATCHLEVEL__)
-
-#if GCC_VERSION >= 40300
-#define __compiletime_warning(message)	__attribute__((warning(message)))
-#define __compiletime_error(message)	__attribute__((error(message)))
-#endif
-
-#endif
diff --git a/compiler/compiler.h b/compiler/compiler.h
index 8eba2929..8c0eb9d1 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -3,15 +3,15 @@
 
 /* IWYU pragma: begin_exports */
 #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9) || __clang_major__ >= 6
-#include "compiler-gcc4.h"
 #else
 #error Compiler too old, need at least gcc 4.9
 #endif
 /* IWYU pragma: end_exports */
 
-#ifndef __must_check
-#define __must_check
-#endif
+#define __must_check		__attribute__((warn_unused_result))
+
+#define __compiletime_warning(message)	__attribute__((warning(message)))
+#define __compiletime_error(message)	__attribute__((error(message)))
 
 /*
  * Mark unused variables passed to ops functions as unused, to silence gcc



* Recent changes (master)
@ 2020-06-24 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-24 12:00 UTC
  To: fio

The following changes since commit 653241de1eb5b9abe21cb6feb036df202d388c68:

  Merge branch 'atomics' of https://github.com/bvanassche/fio (2020-06-21 20:48:05 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c7f72c4bdb3586025cc7c5da707e69cc7a1baf07:

  Merge branch 'master' of https://github.com/safl/fio (2020-06-23 12:41:35 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      compiler/compiler.h: minimum GCC version is 4.9
      oslib/linux-blkzoned: fix bogus poiter alignment warning
      compiler/compiler.h: include clang 6.0 and above
      Merge branch 'master' of https://github.com/safl/fio

Simon A. F. Lund (1):
      Changed signedness of seqlock.sequence fixing comparison-warning

 compiler/compiler.h    | 4 ++--
 lib/seqlock.h          | 2 +-
 oslib/linux-blkzoned.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index ddfbcc12..8eba2929 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -2,10 +2,10 @@
 #define FIO_COMPILER_H
 
 /* IWYU pragma: begin_exports */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)
+#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9) || __clang_major__ >= 6
 #include "compiler-gcc4.h"
 #else
-#error Compiler too old, need at least gcc 4.1.0
+#error Compiler too old, need at least gcc 4.9
 #endif
 /* IWYU pragma: end_exports */
 
diff --git a/lib/seqlock.h b/lib/seqlock.h
index afa9fd31..56f3e37d 100644
--- a/lib/seqlock.h
+++ b/lib/seqlock.h
@@ -5,7 +5,7 @@
 #include "../arch/arch.h"
 
 struct seqlock {
-	volatile int sequence;
+	volatile unsigned int sequence;
 };
 
 static inline void seqlock_init(struct seqlock *s)
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
index 61ea3a53..1cf06363 100644
--- a/oslib/linux-blkzoned.c
+++ b/oslib/linux-blkzoned.c
@@ -143,7 +143,7 @@ int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
 	}
 
 	nr_zones = hdr->nr_zones;
-	blkz = &hdr->zones[0];
+	blkz = (void *) hdr + sizeof(*hdr);
 	z = &zones[0];
 	for (i = 0; i < nr_zones; i++, z++, blkz++) {
 		z->start = blkz->start << 9;



* Recent changes (master)
@ 2020-06-22 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-22 12:00 UTC
  To: fio

The following changes since commit 1a953d975847e248be1718105621796bf9481878:

  Priority bit log file format documentation update (2020-06-12 16:24:46 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 653241de1eb5b9abe21cb6feb036df202d388c68:

  Merge branch 'atomics' of https://github.com/bvanassche/fio (2020-06-21 20:48:05 -0600)

----------------------------------------------------------------
Bart Van Assche (11):
      configure: Use -Wimplicit-fallthrough=2 instead of -Wimplicit-fallthrough=3
      Make __rand_0_1() compatible with clang
      fio_sem: Remove a read_barrier() call
      arch/arch.h: Introduce atomic_{load_acquire,store_release}()
      engines/libaio: Use atomic_store_release() instead of read_barrier()
      engines/io_uring: Use atomic_{load_acquire,store_release}()
      fio: Use atomic_load_acquire() and atomic_store_release() where appropriate
      t/run-fio-tests.py: Increase IOPS tolerance further
      Add a test that sets gtod_cpu=1
      Optimize the seqlock implementation
      Optimize fio_gettime_offload()

Jens Axboe (1):
      Merge branch 'atomics' of https://github.com/bvanassche/fio

 arch/arch.h        |  9 +++++++++
 configure          |  6 +++---
 engines/io_uring.c | 12 ++++--------
 engines/libaio.c   |  4 ++--
 fio_sem.c          |  1 -
 gettime-thread.c   | 23 +++++++++++++----------
 gettime.h          | 15 +++++++++------
 io_u.c             |  4 ++--
 lib/rand.h         | 10 ++++++----
 lib/seqlock.h      |  9 +++------
 t/debug.c          |  2 +-
 t/jobs/t0012.fio   | 20 ++++++++++++++++++++
 t/run-fio-tests.py | 20 ++++++++++++++++----
 verify.c           |  7 +++----
 14 files changed, 91 insertions(+), 51 deletions(-)
 create mode 100644 t/jobs/t0012.fio

---

Diff of recent changes:

diff --git a/arch/arch.h b/arch/arch.h
index 30c0d205..08c3d703 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -1,6 +1,8 @@
 #ifndef ARCH_H
 #define ARCH_H
 
+#include <stdatomic.h>
+
 #include "../lib/types.h"
 
 enum {
@@ -34,6 +36,13 @@ extern unsigned long arch_flags;
 
 #define ARCH_CPU_CLOCK_WRAPS
 
+#define atomic_load_acquire(p)					\
+	atomic_load_explicit((_Atomic typeof(*(p)) *)(p),	\
+			     memory_order_acquire)
+#define atomic_store_release(p, v)				\
+	atomic_store_explicit((_Atomic typeof(*(p)) *)(p), (v),	\
+			      memory_order_release)
+
 /* IWYU pragma: begin_exports */
 #if defined(__i386__)
 #include "arch-x86.h"
diff --git a/configure b/configure
index 3ee8aaf2..63b30555 100755
--- a/configure
+++ b/configure
@@ -2548,7 +2548,7 @@ fi
 print_config "__kernel_rwf_t" "$__kernel_rwf_t"
 
 ##########################################
-# check if gcc has -Wimplicit-fallthrough
+# check if gcc has -Wimplicit-fallthrough=2
 fallthrough="no"
 cat > $TMPC << EOF
 int main(int argc, char **argv)
@@ -2556,10 +2556,10 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if compile_prog "-Wimplicit-fallthrough" "" "-Wimplicit-fallthrough"; then
+if compile_prog "-Wimplicit-fallthrough=2" "" "-Wimplicit-fallthrough=2"; then
   fallthrough="yes"
 fi
-print_config "-Wimplicit-fallthrough" "$fallthrough"
+print_config "-Wimplicit-fallthrough=2" "$fallthrough"
 
 ##########################################
 # check for MADV_HUGEPAGE support
diff --git a/engines/io_uring.c b/engines/io_uring.c
index cab7ecaf..cd0810f4 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -301,15 +301,13 @@ static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 
 	head = *ring->head;
 	do {
-		read_barrier();
-		if (head == *ring->tail)
+		if (head == atomic_load_acquire(ring->tail))
 			break;
 		reaped++;
 		head++;
 	} while (reaped + events < max);
 
-	*ring->head = head;
-	write_barrier();
+	atomic_store_release(ring->head, head);
 	return reaped;
 }
 
@@ -384,15 +382,13 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 
 	tail = *ring->tail;
 	next_tail = tail + 1;
-	read_barrier();
-	if (next_tail == *ring->head)
+	if (next_tail == atomic_load_acquire(ring->head))
 		return FIO_Q_BUSY;
 
 	if (o->cmdprio_percentage)
 		fio_ioring_prio_prep(td, io_u);
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
-	*ring->tail = next_tail;
-	write_barrier();
+	atomic_store_release(ring->tail, next_tail);
 
 	ld->queued++;
 	return FIO_Q_QUEUED;
diff --git a/engines/libaio.c b/engines/libaio.c
index daa576da..398fdf91 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -195,8 +195,8 @@ static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
 		} else {
 			/* There is another completion to reap */
 			events[i] = ring->events[head];
-			read_barrier();
-			ring->head = (head + 1) % ring->nr;
+			atomic_store_release(&ring->head,
+					     (head + 1) % ring->nr);
 			i++;
 		}
 	}
diff --git a/fio_sem.c b/fio_sem.c
index c34d8bf7..c7806acb 100644
--- a/fio_sem.c
+++ b/fio_sem.c
@@ -169,7 +169,6 @@ void fio_sem_up(struct fio_sem *sem)
 	assert(sem->magic == FIO_SEM_MAGIC);
 
 	pthread_mutex_lock(&sem->lock);
-	read_barrier();
 	if (!sem->value && sem->waiters)
 		do_wake = 1;
 	sem->value++;
diff --git a/gettime-thread.c b/gettime-thread.c
index 0a2cc6c4..953e4e67 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -2,9 +2,10 @@
 #include <time.h>
 
 #include "fio.h"
+#include "lib/seqlock.h"
 #include "smalloc.h"
 
-struct timespec *fio_ts = NULL;
+struct fio_ts *fio_ts;
 int fio_gtod_offload = 0;
 static pthread_t gtod_thread;
 static os_cpu_mask_t fio_gtod_cpumask;
@@ -19,15 +20,17 @@ void fio_gtod_init(void)
 
 static void fio_gtod_update(void)
 {
-	if (fio_ts) {
-		struct timeval __tv;
-
-		gettimeofday(&__tv, NULL);
-		fio_ts->tv_sec = __tv.tv_sec;
-		write_barrier();
-		fio_ts->tv_nsec = __tv.tv_usec * 1000;
-		write_barrier();
-	}
+	struct timeval __tv;
+
+	if (!fio_ts)
+		return;
+
+	gettimeofday(&__tv, NULL);
+
+	write_seqlock_begin(&fio_ts->seqlock);
+	fio_ts->ts.tv_sec = __tv.tv_sec;
+	fio_ts->ts.tv_nsec = __tv.tv_usec * 1000;
+	write_seqlock_end(&fio_ts->seqlock);
 }
 
 struct gtod_cpu_data {
diff --git a/gettime.h b/gettime.h
index f92ee8c4..c55f5cba 100644
--- a/gettime.h
+++ b/gettime.h
@@ -4,6 +4,7 @@
 #include <sys/time.h>
 
 #include "arch/arch.h"
+#include "lib/seqlock.h"
 
 /*
  * Clock sources
@@ -22,20 +23,22 @@ extern int fio_start_gtod_thread(void);
 extern int fio_monotonic_clocktest(int debug);
 extern void fio_local_clock_init(void);
 
-extern struct timespec *fio_ts;
+extern struct fio_ts {
+	struct seqlock seqlock;
+	struct timespec ts;
+} *fio_ts;
 
 static inline int fio_gettime_offload(struct timespec *ts)
 {
-	time_t last_sec;
+	unsigned int seq;
 
 	if (!fio_ts)
 		return 0;
 
 	do {
-		read_barrier();
-		last_sec = ts->tv_sec = fio_ts->tv_sec;
-		ts->tv_nsec = fio_ts->tv_nsec;
-	} while (fio_ts->tv_sec != last_sec);
+		seq = read_seqlock_begin(&fio_ts->seqlock);
+		*ts = fio_ts->ts;
+	} while (read_seqlock_retry(&fio_ts->seqlock, seq));
 
 	return 1;
 }
diff --git a/io_u.c b/io_u.c
index ae1438fd..7f50906b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1934,8 +1934,8 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		if (io_u->error)
 			unlog_io_piece(td, io_u);
 		else {
-			io_u->ipo->flags &= ~IP_F_IN_FLIGHT;
-			write_barrier();
+			atomic_store_release(&io_u->ipo->flags,
+					io_u->ipo->flags & ~IP_F_IN_FLIGHT);
 		}
 	}
 
diff --git a/lib/rand.h b/lib/rand.h
index 2ccc1b37..46c1c5e0 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -6,7 +6,9 @@
 #include "types.h"
 
 #define FRAND32_MAX	(-1U)
+#define FRAND32_MAX_PLUS_ONE	(1.0 * (1ULL << 32))
 #define FRAND64_MAX	(-1ULL)
+#define FRAND64_MAX_PLUS_ONE	(1.0 * (1ULL << 32) * (1ULL << 32))
 
 struct taus88_state {
 	unsigned int s1, s2, s3;
@@ -106,11 +108,11 @@ static inline double __rand_0_1(struct frand_state *state)
 	if (state->use64) {
 		uint64_t val = __rand64(&state->state64);
 
-		return (val + 1.0) / (FRAND64_MAX + 1.0);
+		return (val + 1.0) / FRAND64_MAX_PLUS_ONE;
 	} else {
 		uint32_t val = __rand32(&state->state32);
 
-		return (val + 1.0) / (FRAND32_MAX + 1.0);
+		return (val + 1.0) / FRAND32_MAX_PLUS_ONE;
 	}
 }
 
@@ -122,7 +124,7 @@ static inline uint32_t rand32_upto(struct frand_state *state, uint32_t end)
 
 	r = __rand32(&state->state32);
 	end++;
-	return (int) ((double)end * (r / (FRAND32_MAX + 1.0)));
+	return (int) ((double)end * (r / FRAND32_MAX_PLUS_ONE));
 }
 
 static inline uint64_t rand64_upto(struct frand_state *state, uint64_t end)
@@ -133,7 +135,7 @@ static inline uint64_t rand64_upto(struct frand_state *state, uint64_t end)
 
 	r = __rand64(&state->state64);
 	end++;
-	return (uint64_t) ((double)end * (r / (FRAND64_MAX + 1.0)));
+	return (uint64_t) ((double)end * (r / FRAND64_MAX_PLUS_ONE));
 }
 
 /*
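The `rand.h` change replaces `FRAND32_MAX + 1.0` and `FRAND64_MAX + 1.0` with constants built from exact powers of two. Because 2^64 - 1 has no exact `double` representation, the old 64-bit expression relied on an implicit rounding conversion, which compilers such as Clang can warn about. By contrast, `1.0 * (1ULL << 32) * (1ULL << 32)` is exactly 2^64. The sketch below applies the same mapping to 32-bit values; all names are hypothetical. Unlike `rand32_upto()`, it adds one to `end` in `double` arithmetic, so `end == UINT32_MAX` does not wrap to 0.

```c
#include <stdint.h>

/* Exact 2^32 and 2^64 as doubles, in the style of FRAND*_MAX_PLUS_ONE. */
#define U32_PLUS_ONE (1.0 * (1ULL << 32))
#define U64_PLUS_ONE (1.0 * (1ULL << 32) * (1ULL << 32))

/* Map a 32-bit random value into (0, 1]. */
static double u32_to_unit(uint32_t val)
{
	return (val + 1.0) / U32_PLUS_ONE;
}

/* Scale a 32-bit random value into [0, end]. */
static uint32_t u32_upto(uint32_t r, uint32_t end)
{
	double scale = (double)end + 1.0;	/* no uint32_t overflow */

	return (uint32_t)(scale * (r / U32_PLUS_ONE));
}
```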
diff --git a/lib/seqlock.h b/lib/seqlock.h
index 762b6ec1..afa9fd31 100644
--- a/lib/seqlock.h
+++ b/lib/seqlock.h
@@ -18,13 +18,12 @@ static inline unsigned int read_seqlock_begin(struct seqlock *s)
 	unsigned int seq;
 
 	do {
-		seq = s->sequence;
+		seq = atomic_load_acquire(&s->sequence);
 		if (!(seq & 1))
 			break;
 		nop;
 	} while (1);
 
-	read_barrier();
 	return seq;
 }
 
@@ -36,14 +35,12 @@ static inline bool read_seqlock_retry(struct seqlock *s, unsigned int seq)
 
 static inline void write_seqlock_begin(struct seqlock *s)
 {
-	s->sequence++;
-	write_barrier();
+	s->sequence = atomic_load_acquire(&s->sequence) + 1;
 }
 
 static inline void write_seqlock_end(struct seqlock *s)
 {
-	write_barrier();
-	s->sequence++;
+	atomic_store_release(&s->sequence, s->sequence + 1);
 }
 
 #endif
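fio's `lib/seqlock.h` is built on fio's own `atomic_load_acquire()`/`atomic_store_release()` helpers. The sketch below instead uses C11 `<stdatomic.h>` with the fence-based seqlock pattern: the payload is stored as relaxed atomics, and release/acquire fences bracket it, so readers racing with the writer do not perform data races under the C11 memory model. The result mirrors how `fio_gettime_offload()` retries until it sees a stable, even sequence. `struct ts_seqlock`, `ts_write()` and `ts_read()` are hypothetical names.

```c
#include <stdatomic.h>
#include <time.h>

struct ts_seqlock {
	_Atomic unsigned int sequence;	/* odd while a write is in progress */
	_Atomic long sec;		/* payload kept as relaxed atomics */
	_Atomic long nsec;
};

/* Single writer (e.g. a gtod-style update thread). */
static void ts_write(struct ts_seqlock *s, long sec, long nsec)
{
	unsigned int seq = atomic_load_explicit(&s->sequence, memory_order_relaxed);

	atomic_store_explicit(&s->sequence, seq + 1, memory_order_relaxed);
	/* Order the odd sequence before the payload stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&s->nsec, nsec, memory_order_relaxed);
	/* Even again: publish the complete payload. */
	atomic_store_explicit(&s->sequence, seq + 2, memory_order_release);
}

/* Any number of readers; retries until an unmodified snapshot is read. */
static struct timespec ts_read(struct ts_seqlock *s)
{
	struct timespec ts;
	unsigned int s1, s2;

	do {
		do {
			s1 = atomic_load_explicit(&s->sequence, memory_order_acquire);
		} while (s1 & 1);
		ts.tv_sec = atomic_load_explicit(&s->sec, memory_order_relaxed);
		ts.tv_nsec = atomic_load_explicit(&s->nsec, memory_order_relaxed);
		/* Order the payload loads before re-reading the sequence. */
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&s->sequence, memory_order_relaxed);
	} while (s1 != s2);
	return ts;
}
```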
diff --git a/t/debug.c b/t/debug.c
index 8965cfbc..0c913368 100644
--- a/t/debug.c
+++ b/t/debug.c
@@ -1,7 +1,7 @@
 #include <stdio.h>
 
 FILE *f_err;
-struct timespec *fio_ts = NULL;
+void *fio_ts;
 unsigned long fio_debug = 0;
 
 void __dprint(int type, const char *str, ...)
diff --git a/t/jobs/t0012.fio b/t/jobs/t0012.fio
new file mode 100644
index 00000000..985eb16b
--- /dev/null
+++ b/t/jobs/t0012.fio
@@ -0,0 +1,20 @@
+# Expected results: no parse warnings, runs with roughly 1/8 iops between
+#			the two jobs.
+# Buggy result: parse warning on flow value overflow, no 1/8 division between
+#			jobs.
+#
+
+[global]
+bs=4k
+ioengine=null
+size=100g
+runtime=3
+flow_id=1
+gtod_cpu=1
+
+[flow1]
+flow=-8
+rate_iops=1000
+
+[flow2]
+flow=1
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index c2352d80..ae2cb096 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -420,14 +420,14 @@ class FioJobTest_t0009(FioJobTest):
             self.passed = False
 
 
-class FioJobTest_t0011(FioJobTest):
+class FioJobTest_iops_rate(FioJobTest):
     """Test consists of fio test job t0009
     Confirm that job0 iops == 1000
     and that job1_iops / job0_iops ~ 8
     With two runs of fio-3.16 I observed a ratio of 8.3"""
 
     def check_result(self):
-        super(FioJobTest_t0011, self).check_result()
+        super(FioJobTest_iops_rate, self).check_result()
 
         if not self.passed:
             return
@@ -438,7 +438,7 @@ class FioJobTest_t0011(FioJobTest):
         logging.debug("Test %d: iops1: %f", self.testnum, iops1)
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
-        if iops1 < 997 or iops1 > 1003:
+        if iops1 < 995 or iops1 > 1005:
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
             self.passed = False
 
@@ -667,7 +667,7 @@ TEST_LIST = [
     },
     {
         'test_id':          11,
-        'test_class':       FioJobTest_t0011,
+        'test_class':       FioJobTest_iops_rate,
         'job':              't0011-5d2788d5.fio',
         'success':          SUCCESS_DEFAULT,
         'pre_job':          None,
@@ -675,6 +675,18 @@ TEST_LIST = [
         'output_format':    'json',
         'requirements':     [],
     },
+    {
+        'test_id':          12,
+        'test_class':       FioJobTest_iops_rate,
+        'job':              't0012.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+        'requirements':     [Requirements.not_macos],
+        # mac os does not support CPU affinity
+    },
     {
         'test_id':          1000,
         'test_class':       FioExeTest,
diff --git a/verify.c b/verify.c
index b7fa6693..5ee0029d 100644
--- a/verify.c
+++ b/verify.c
@@ -8,6 +8,7 @@
 #include <pthread.h>
 #include <libgen.h>
 
+#include "arch/arch.h"
 #include "fio.h"
 #include "verify.h"
 #include "trim.h"
@@ -1309,8 +1310,7 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		/*
 		 * Ensure that the associated IO has completed
 		 */
-		read_barrier();
-		if (ipo->flags & IP_F_IN_FLIGHT)
+		if (atomic_load_acquire(&ipo->flags) & IP_F_IN_FLIGHT)
 			goto nothing;
 
 		rb_erase(n, &td->io_hist_tree);
@@ -1322,8 +1322,7 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		/*
 		 * Ensure that the associated IO has completed
 		 */
-		read_barrier();
-		if (ipo->flags & IP_F_IN_FLIGHT)
+		if (atomic_load_acquire(&ipo->flags) & IP_F_IN_FLIGHT)
 			goto nothing;
 
 		flist_del(&ipo->list);
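`io_u.c` clears `IP_F_IN_FLIGHT` with a release store, and `verify.c` tests the flag with an acquire load. As a result, a verifier that sees the bit clear also sees everything the completion path wrote before clearing it. The C11 sketch below shows that pairing. `struct piece` and `PIECE_IN_FLIGHT` are hypothetical stand-ins. The sketch also uses `atomic_fetch_and_explicit()`, whereas fio does a plain read followed by a release store, which assumes no concurrent writer of the flags word.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PIECE_IN_FLIGHT (1u << 0)	/* hypothetical stand-in for IP_F_IN_FLIGHT */

struct piece {
	unsigned long long verify_offset;	/* written before the flag is cleared */
	_Atomic unsigned int flags;
};

/* Completion side: record results, then clear the bit with release semantics. */
static void piece_complete(struct piece *p, unsigned long long off)
{
	p->verify_offset = off;
	atomic_fetch_and_explicit(&p->flags, ~PIECE_IN_FLIGHT, memory_order_release);
}

/* Verify side: an acquire load that sees the bit clear also sees verify_offset. */
static bool piece_ready(struct piece *p)
{
	return !(atomic_load_explicit(&p->flags, memory_order_acquire) & PIECE_IN_FLIGHT);
}
```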



* Recent changes (master)
@ 2020-06-13 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-13 12:00 UTC
  To: fio

The following changes since commit 14060ebb90ce5a0a164d0e5e52c13e31b53b282d:

  Merge branch 'latency_window' of https://github.com/liu-song-6/fio (2020-06-09 19:43:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1a953d975847e248be1718105621796bf9481878:

  Priority bit log file format documentation update (2020-06-12 16:24:46 -0600)

----------------------------------------------------------------
Bart Van Assche (3):
      zbd: Fix spelling of the "zonemode" job option
      zbd: Fix zoned_block_device_info.zone_size documentation
      pshared: Improve mutex_init_pshared_with_type()

Jens Axboe (2):
      Merge branch 'master' of https://github.com/raphael-nutanix/fio
      Merge branch 'zbd' of https://github.com/bvanassche/fio

Phillip Chen (1):
      Priority bit log file format documentation update

Raphael Norwitz (1):
      Fix typo in libiscsi error message

 HOWTO              |  5 ++++-
 engines/libiscsi.c |  2 +-
 fio.1              |  6 +++++-
 pshared.c          | 18 ++++++++++--------
 zbd.c              |  2 +-
 zbd.h              |  2 +-
 6 files changed, 22 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 9e339bb8..8cf8d650 100644
--- a/HOWTO
+++ b/HOWTO
@@ -4165,7 +4165,7 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
 and IOPS. The logs share a common format, which looks like this:
 
     *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`)
+    *offset* (`bytes`), *command priority*
 
 *Time* for the log entry is always in milliseconds. The *value* logged depends
 on the type of log, it will be one of the following:
@@ -4190,6 +4190,9 @@ The entry's *block size* is always in bytes. The *offset* is the position in byt
 from the start of the file for that particular I/O. The logging of the offset can be
 toggled with :option:`log_offset`.
 
+*Command priority* is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine-specific :option:`cmdprio_percentage`.
+
 Fio defaults to logging every individual I/O but when windowed logging is set
 through :option:`log_avg_msec`, either the average (by default) or the maximum
 (:option:`log_max_value` is set) *value* seen over the specified period of time
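With this change, each log line gains a trailing command-priority column. The minimal C sketch below reads one entry. It assumes `log_offset=1`, so all six comma-separated columns are present. `struct log_entry` and `parse_log_line()` are hypothetical, and the field types are chosen generously rather than taken from fio.

```c
#include <stdio.h>

struct log_entry {
	unsigned long msec;		/* time in milliseconds */
	unsigned long long value;	/* latency, bandwidth or IOPS value */
	int ddir;			/* data direction */
	unsigned long long bs;		/* block size in bytes */
	unsigned long long offset;	/* offset in bytes */
	int prio;			/* 0 = normal, 1 = high priority */
};

/* Returns the number of fields parsed; 6 means a complete entry. */
static int parse_log_line(const char *line, struct log_entry *e)
{
	return sscanf(line, "%lu, %llu, %d, %llu, %llu, %d",
		      &e->msec, &e->value, &e->ddir, &e->bs, &e->offset,
		      &e->prio);
}
```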
diff --git a/engines/libiscsi.c b/engines/libiscsi.c
index 58667fb2..35761a61 100644
--- a/engines/libiscsi.c
+++ b/engines/libiscsi.c
@@ -109,7 +109,7 @@ static int fio_iscsi_setup_lun(struct iscsi_info *iscsi_info,
 	if (iscsi_full_connect_sync(iscsi_lun->iscsi,
 				    iscsi_lun->url->portal,
 				    iscsi_lun->url->lun)) {
-		log_err("sicsi: failed to connect to LUN : %s\n",
+		log_err("iscsi: failed to connect to LUN : %s\n",
 			iscsi_get_error(iscsi_lun->iscsi));
 		ret = EINVAL;
 		goto out;
diff --git a/fio.1 b/fio.1
index f469c46e..f134e0bf 100644
--- a/fio.1
+++ b/fio.1
@@ -3863,7 +3863,8 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
 and IOPS. The logs share a common format, which looks like this:
 .RS
 .P
-time (msec), value, data direction, block size (bytes), offset (bytes)
+time (msec), value, data direction, block size (bytes), offset (bytes),
+command priority
 .RE
 .P
 `Time' for the log entry is always in milliseconds. The `value' logged depends
@@ -3897,6 +3898,9 @@ The entry's `block size' is always in bytes. The `offset' is the position in byt
 from the start of the file for that particular I/O. The logging of the offset can be
 toggled with \fBlog_offset\fR.
 .P
+`Command priority' is 0 for normal priority and 1 for high priority. This is controlled
+by the ioengine-specific \fBcmdprio_percentage\fR.
+.P
 Fio defaults to logging every individual I/O but when windowed logging is set
 through \fBlog_avg_msec\fR, either the average (by default) or the maximum
 (\fBlog_max_value\fR is set) `value' seen over the specified period of time
diff --git a/pshared.c b/pshared.c
index e671c87f..182a3652 100644
--- a/pshared.c
+++ b/pshared.c
@@ -39,6 +39,10 @@ int cond_init_pshared(pthread_cond_t *cond)
 	return 0;
 }
 
+/*
+ * 'type' must be a mutex type, e.g. PTHREAD_MUTEX_NORMAL,
+ * PTHREAD_MUTEX_ERRORCHECK, PTHREAD_MUTEX_RECURSIVE or PTHREAD_MUTEX_DEFAULT.
+ */
 int mutex_init_pshared_with_type(pthread_mutex_t *mutex, int type)
 {
 	pthread_mutexattr_t mattr;
@@ -60,26 +64,24 @@ int mutex_init_pshared_with_type(pthread_mutex_t *mutex, int type)
 		return ret;
 	}
 #endif
-	if (type) {
-		ret = pthread_mutexattr_settype(&mattr, type);
-		if (ret) {
-			log_err("pthread_mutexattr_settype: %s\n",
-				strerror(ret));
-			return ret;
-		}
+	ret = pthread_mutexattr_settype(&mattr, type);
+	if (ret) {
+		log_err("pthread_mutexattr_settype: %s\n", strerror(ret));
+		return ret;
 	}
 	ret = pthread_mutex_init(mutex, &mattr);
 	if (ret) {
 		log_err("pthread_mutex_init: %s\n", strerror(ret));
 		return ret;
 	}
+	pthread_mutexattr_destroy(&mattr);
 
 	return 0;
 }
 
 int mutex_init_pshared(pthread_mutex_t *mutex)
 {
-	return mutex_init_pshared_with_type(mutex, 0);
+	return mutex_init_pshared_with_type(mutex, PTHREAD_MUTEX_DEFAULT);
 }
 
 int mutex_cond_init_pshared(pthread_mutex_t *mutex, pthread_cond_t *cond)
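After this change, `mutex_init_pshared_with_type()` always calls `pthread_mutexattr_settype()` and destroys the attribute object after a successful init. The sketch below follows the same sequence but also releases the attribute object on the error paths. `init_pshared_mutex()` is a hypothetical name, not fio's function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for PTHREAD_MUTEX_* type constants */
#endif
#include <pthread.h>

/* Initialise a process-shared mutex of the given type. */
static int init_pshared_mutex(pthread_mutex_t *mutex, int type)
{
	pthread_mutexattr_t mattr;
	int ret;

	ret = pthread_mutexattr_init(&mattr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
	if (!ret)
		ret = pthread_mutexattr_settype(&mattr, type);
	if (!ret)
		ret = pthread_mutex_init(mutex, &mattr);
	/* The attribute object is no longer needed on any path. */
	pthread_mutexattr_destroy(&mattr);
	return ret;
}
```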
diff --git a/zbd.c b/zbd.c
index dc302606..8cf8f812 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1365,7 +1365,7 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 }
 
 /**
- * zbd_adjust_ddir - Adjust an I/O direction for zonedmode=zbd.
+ * zbd_adjust_ddir - Adjust an I/O direction for zonemode=zbd.
  *
  * @td: FIO thread data.
  * @io_u: FIO I/O unit.
diff --git a/zbd.h b/zbd.h
index 9c447af4..e942a7f6 100644
--- a/zbd.h
+++ b/zbd.h
@@ -49,7 +49,7 @@ struct fio_zone_info {
  *	sequential write zones.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
- * @zone_size: size of a single zone in units of 512 bytes
+ * @zone_size: size of a single zone in bytes.
  * @sectors_with_data: total size of data in all zones in units of 512 bytes
  * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0
  *		if the zone size is not a power of 2.
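After the comment fix, `zone_size` is a byte count. By contrast, `sectors_with_data` stays in 512-byte units, and `zone_size_log2` is 0 when the zone size is not a power of two. The hypothetical helpers below show how those conventions combine when mapping an offset to a zone.

```c
#include <stdint.h>

/* Zone index for a byte offset; zone_size_log2 == 0 means "not a power of 2". */
static uint64_t zone_index(uint64_t offset, uint64_t zone_size,
			   unsigned int zone_size_log2)
{
	if (zone_size_log2)
		return offset >> zone_size_log2;	/* fast path */
	return offset / zone_size;
}

/* Convert bytes to 512-byte sectors, the unit sectors_with_data uses. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> 9;
}
```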



* Recent changes (master)
@ 2020-06-10 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-10 12:00 UTC
  To: fio

The following changes since commit c9beb194ef608dc49dd2f7537b90951f3c8b432d:

  t/run-fio-tests.py: Accept a wider range of IOPS values (2020-06-07 17:30:42 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 14060ebb90ce5a0a164d0e5e52c13e31b53b282d:

  Merge branch 'latency_window' of https://github.com/liu-song-6/fio (2020-06-09 19:43:52 -0600)

----------------------------------------------------------------
Bart Van Assche (5):
      Make json_object_add_value_string() duplicate its 'value' argument
      client: Fix two memory leaks in handle_job_opt()
      client: Make skipping option appending in handle_job_opt() more selective
      client: Fix a memory leak in an error path
      client: Fix another memory leak in an error path

Jens Axboe (4):
      Merge branch 'client-leak-fix' of https://github.com/bvanassche/fio
      Merge branch 'master' of https://github.com/bvanassche/fio
      Merge branch 'nowait' of https://github.com/koct9i/fio
      Merge branch 'latency_window' of https://github.com/liu-song-6/fio

Konstantin Khlebnikov (1):
      engines: pvsync2 libaio io_uring: add support for RWF_NOWAIT

Song Liu (1):
      init: fix unit of latency_window

 HOWTO              | 20 ++++++++++++++++++++
 client.c           | 37 ++++++++++++++-----------------------
 configure          | 20 ++++++++++++++++++++
 engines/io_uring.c | 12 ++++++++++++
 engines/libaio.c   | 22 +++++++++++++++++++++-
 engines/sync.c     | 12 ++++++++++++
 fio.1              | 16 ++++++++++++++++
 init.c             |  1 -
 json.h             |  2 +-
 os/os-linux.h      |  3 +++
 stat.c             |  6 +-----
 11 files changed, 120 insertions(+), 31 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 9e71a619..9e339bb8 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2097,6 +2097,26 @@ with the caveat that when used on the command line, they must come after the
 	When hipri is set this determines the probability of a pvsync2 I/O being high
 	priority. The default is 100%.
 
+.. option:: nowait : [pvsync2] [libaio] [io_uring]
+
+	By default if a request cannot be executed immediately (e.g. resource starvation,
+	waiting on locks) it is queued and the initiating process will be blocked until
+	the required resource becomes free.
+
+	This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+	the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+	It is useful to also use ignore_error=EAGAIN when using this option.
+
+	Note: glibc 2.27 and 2.28 have a bug in the preadv2 and pwritev2 syscall wrappers:
+	they return EOPNOTSUPP instead of EAGAIN.
+
+	For cached I/O, using this option usually means a request operates only with
+	cached data. Currently the RWF_NOWAIT flag is not supported for cached writes.
+
+	For direct I/O, requests will only succeed if cache invalidation isn't required,
+	file blocks are fully allocated and the disk request could be issued immediately.
+
 .. option:: cpuload=int : [cpuio]
 
 	Attempt to use the specified percentage of CPU cycles. This is a mandatory
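The minimal Linux sketch below issues a read with `RWF_NOWAIT` through `preadv2(2)` and classifies the outcome as the text describes. The outcome is either data (possibly short), `EAGAIN`, or `EOPNOTSUPP` from the affected glibc wrappers. `read_nowait()` and `nowait_result_kind()` are hypothetical. Treating `EOPNOTSUPP` as "would block" is an assumption that fits only the glibc bug case. The fallback value for `RWF_NOWAIT` matches the one added to `os/os-linux.h`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for preadv2() */
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008
#endif

/* Non-blocking positional read: bytes read, or -1 with errno set. */
static ssize_t read_nowait(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off, RWF_NOWAIT);
}

/* 1 = got data, 0 = would block (or buggy wrapper), -1 = hard error. */
static int nowait_result_kind(ssize_t ret, int err)
{
	if (ret >= 0)
		return 1;
	if (err == EAGAIN || err == EOPNOTSUPP)
		return 0;
	return -1;
}
```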
diff --git a/client.c b/client.c
index b7575596..29d8750a 100644
--- a/client.c
+++ b/client.c
@@ -390,8 +390,6 @@ struct fio_client *fio_client_add_explicit(struct client_ops *ops,
 
 	client = get_new_client();
 
-	client->hostname = strdup(hostname);
-
 	if (type == Fio_client_socket)
 		client->is_sock = true;
 	else {
@@ -410,6 +408,7 @@ struct fio_client *fio_client_add_explicit(struct client_ops *ops,
 	client->ops = ops;
 	client->refs = 1;
 	client->type = ops->client_type;
+	client->hostname = strdup(hostname);
 
 	__fio_client_add_cmd_option(client, "fio");
 
@@ -471,8 +470,10 @@ int fio_client_add(struct client_ops *ops, const char *hostname, void **cookie)
 					&client->is_sock, &client->port,
 					&client->addr.sin_addr,
 					&client->addr6.sin6_addr,
-					&client->ipv6))
+					&client->ipv6)) {
+		fio_put_client(client);
 		return -1;
+	}
 
 	client->fd = -1;
 	client->ops = ops;
@@ -1140,37 +1141,27 @@ static void handle_gs(struct fio_client *client, struct fio_net_cmd *cmd)
 static void handle_job_opt(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct cmd_job_option *pdu = (struct cmd_job_option *) cmd->payload;
-	struct print_option *p;
-
-	if (!job_opt_object)
-		return;
 
 	pdu->global = le16_to_cpu(pdu->global);
 	pdu->truncated = le16_to_cpu(pdu->truncated);
 	pdu->groupid = le32_to_cpu(pdu->groupid);
 
-	p = malloc(sizeof(*p));
-	p->name = strdup((char *) pdu->name);
-	if (pdu->value[0] != '\0')
-		p->value = strdup((char *) pdu->value);
-	else
-		p->value = NULL;
-
 	if (pdu->global) {
-		const char *pos = "";
+		if (!job_opt_object)
+			return;
 
-		if (p->value)
-			pos = p->value;
-
-		json_object_add_value_string(job_opt_object, p->name, pos);
+		json_object_add_value_string(job_opt_object,
+					     (const char *)pdu->name,
+					     (const char *)pdu->value);
 	} else if (client->opt_lists) {
 		struct flist_head *opt_list = &client->opt_lists[pdu->groupid];
+		struct print_option *p;
 
+		p = malloc(sizeof(*p));
+		p->name = strdup((const char *)pdu->name);
+		p->value = pdu->value[0] ? strdup((const char *)pdu->value) :
+			NULL;
 		flist_add_tail(&p->list, opt_list);
-	} else {
-		free(p->value);
-		free(p->name);
-		free(p);
 	}
 }
 
diff --git a/configure b/configure
index cf8b88e4..3ee8aaf2 100755
--- a/configure
+++ b/configure
@@ -617,8 +617,25 @@ EOF
     libaio=no
     libaio_uring=no
   fi
+
+  cat > $TMPC <<EOF
+#include <libaio.h>
+#include <stddef.h>
+int main(void)
+{
+  io_prep_preadv2(NULL, 0, NULL, 0, 0, 0);
+  io_prep_pwritev2(NULL, 0, NULL, 0, 0, 0);
+  return 0;
+}
+EOF
+  if compile_prog "" "" "libaio rw flags" ; then
+    libaio_rw_flags=yes
+  else
+    libaio_rw_flags=no
+  fi
 fi
 print_config "Linux AIO support" "$libaio"
+print_config "Linux AIO support rw flags" "$libaio_rw_flags"
 print_config "Linux AIO over io_uring" "$libaio_uring"
 
 ##########################################
@@ -2646,6 +2663,9 @@ if test "$zlib" = "yes" ; then
 fi
 if test "$libaio" = "yes" ; then
   output_sym "CONFIG_LIBAIO"
+  if test "$libaio_rw_flags" = "yes" ; then
+    output_sym "CONFIG_LIBAIO_RW_FLAGS"
+  fi
   if test "$libaio_uring" = "yes" ; then
     output_sym "CONFIG_LIBAIO_URING"
   fi
diff --git a/engines/io_uring.c b/engines/io_uring.c
index ac57af8f..cab7ecaf 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -80,6 +80,7 @@ struct ioring_options {
 	unsigned int sqpoll_cpu;
 	unsigned int nonvectored;
 	unsigned int uncached;
+	unsigned int nowait;
 };
 
 static const int ddir_to_op[2][2] = {
@@ -185,6 +186,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "nowait",
+		.lname	= "RWF_NOWAIT",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct ioring_options, nowait),
+		.help	= "Use RWF_NOWAIT for reads/writes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= NULL,
 	},
@@ -235,6 +245,8 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		}
 		if (!td->o.odirect && o->uncached)
 			sqe->rw_flags = RWF_UNCACHED;
+		if (o->nowait)
+			sqe->rw_flags |= RWF_NOWAIT;
 		if (ld->ioprio_class_set)
 			sqe->ioprio = td->o.ioprio_class << 13;
 		if (ld->ioprio_set)
diff --git a/engines/libaio.c b/engines/libaio.c
index 299798ae..daa576da 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -21,6 +21,11 @@
 #define IOCB_FLAG_IOPRIO    (1 << 1)
 #endif
 
+/* Hack for libaio < 0.3.111 */
+#ifndef CONFIG_LIBAIO_RW_FLAGS
+#define aio_rw_flags __pad2
+#endif
+
 static int fio_libaio_commit(struct thread_data *td);
 static int fio_libaio_init(struct thread_data *td);
 
@@ -51,6 +56,7 @@ struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
 	unsigned int cmdprio_percentage;
+	unsigned int nowait;
 };
 
 static struct fio_option options[] = {
@@ -83,6 +89,15 @@ static struct fio_option options[] = {
 		.help	= "Your platform does not support I/O priority classes",
 	},
 #endif
+	{
+		.name	= "nowait",
+		.lname	= "RWF_NOWAIT",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct libaio_options, nowait),
+		.help	= "Set RWF_NOWAIT for reads/writes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
 	{
 		.name	= NULL,
 	},
@@ -97,15 +112,20 @@ static inline void ring_inc(struct libaio_data *ld, unsigned int *val,
 		*val = (*val + add) % ld->entries;
 }
 
-static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
+static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
 {
+	struct libaio_options *o = td->eo;
 	struct fio_file *f = io_u->file;
 	struct iocb *iocb = &io_u->iocb;
 
 	if (io_u->ddir == DDIR_READ) {
 		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (o->nowait)
+			iocb->aio_rw_flags |= RWF_NOWAIT;
 	} else if (io_u->ddir == DDIR_WRITE) {
 		io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (o->nowait)
+			iocb->aio_rw_flags |= RWF_NOWAIT;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
diff --git a/engines/sync.c b/engines/sync.c
index 65fd210c..339ba999 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -40,6 +40,7 @@ struct psyncv2_options {
 	unsigned int hipri;
 	unsigned int hipri_percentage;
 	unsigned int uncached;
+	unsigned int nowait;
 };
 
 static struct fio_option options[] = {
@@ -73,6 +74,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "nowait",
+		.lname	= "RWF_NOWAIT",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct psyncv2_options, nowait),
+		.help	= "Set RWF_NOWAIT for pwritev2/preadv2",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= NULL,
 	},
@@ -164,6 +174,8 @@ static enum fio_q_status fio_pvsyncio2_queue(struct thread_data *td,
 		flags |= RWF_HIPRI;
 	if (!td->o.odirect && o->uncached)
 		flags |= RWF_UNCACHED;
+	if (o->nowait)
+		flags |= RWF_NOWAIT;
 
 	iov->iov_base = io_u->xfer_buf;
 	iov->iov_len = io_u->xfer_buflen;
diff --git a/fio.1 b/fio.1
index 47bc1592..f469c46e 100644
--- a/fio.1
+++ b/fio.1
@@ -1857,6 +1857,22 @@ than normal.
 When hipri is set this determines the probability of a pvsync2 I/O being high
 priority. The default is 100%.
 .TP
+.BI (pvsync2,libaio,io_uring)nowait
+By default if a request cannot be executed immediately (e.g. resource starvation,
+waiting on locks) it is queued and the initiating process will be blocked until
+the required resource becomes free.
+This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and
+the call will return instantly with EAGAIN or a partial result rather than waiting.
+
+It is useful to also use \fBignore_error\fR=EAGAIN when using this option.
+Note: glibc 2.27 and 2.28 have a bug in the preadv2 and pwritev2 syscall wrappers:
+they return EOPNOTSUPP instead of EAGAIN.
+
+For cached I/O, using this option usually means a request operates only with
+cached data. Currently the RWF_NOWAIT flag is not supported for cached writes.
+For direct I/O, requests will only succeed if cache invalidation isn't required,
+file blocks are fully allocated and the disk request could be issued immediately.
+.TP
 .BI (cpuio)cpuload \fR=\fPint
 Attempt to use the specified percentage of CPU cycles. This is a mandatory
 option when using cpuio I/O engine.
diff --git a/init.c b/init.c
index e220c323..e53be35a 100644
--- a/init.c
+++ b/init.c
@@ -956,7 +956,6 @@ static int fixup_options(struct thread_data *td)
 	 */
 	o->max_latency *= 1000ULL;
 	o->latency_target *= 1000ULL;
-	o->latency_window *= 1000ULL;
 
 	return ret;
 }
diff --git a/json.h b/json.h
index 1544ed76..d9824263 100644
--- a/json.h
+++ b/json.h
@@ -82,7 +82,7 @@ static inline int json_object_add_value_string(struct json_object *obj,
 		.type = JSON_TYPE_STRING,
 	};
 
-	arg.string = (char *)val;
+	arg.string = strdup(val ? : "");
 	return json_object_add_value_type(obj, name, &arg);
 }
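`json_object_add_value_string()` now stores `strdup(val ? : "")`. As a result, callers keep ownership of their buffers, and a NULL value becomes an empty string. This is why the callers in `client.c` and `stat.c` can now pass their strings directly. The sketch below applies the same ownership rule in portable C (the `?:` shorthand is a GNU extension). `struct kv`, `kv_set()` and `kv_free()` are hypothetical, and unlike the `json.h` change, the sketch duplicates the name as well.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#endif
#include <stdlib.h>
#include <string.h>

struct kv {
	char *name;
	char *value;	/* owned copy, never NULL after kv_set() succeeds */
};

static int kv_set(struct kv *kv, const char *name, const char *val)
{
	kv->name = strdup(name);
	kv->value = strdup(val ? val : "");	/* portable form of val ? : "" */
	if (!kv->name || !kv->value) {
		free(kv->name);
		free(kv->value);
		return -1;
	}
	return 0;
}

static void kv_free(struct kv *kv)
{
	free(kv->name);
	free(kv->value);
}
```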
 
diff --git a/os/os-linux.h b/os/os-linux.h
index 0f0bcc3a..6ec7243d 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -325,6 +325,9 @@ static inline int fio_set_sched_idle(void)
 #ifndef RWF_SYNC
 #define RWF_SYNC	0x00000004
 #endif
+#ifndef RWF_NOWAIT
+#define RWF_NOWAIT	0x00000008
+#endif
 
 #ifndef RWF_UNCACHED
 #define RWF_UNCACHED	0x00000040
diff --git a/stat.c b/stat.c
index 2cf11947..b3951199 100644
--- a/stat.c
+++ b/stat.c
@@ -1506,12 +1506,8 @@ static void json_add_job_opts(struct json_object *root, const char *name,
 	json_object_add_value_object(root, name, dir_object);
 
 	flist_for_each(entry, opt_list) {
-		const char *pos = "";
-
 		p = flist_entry(entry, struct print_option, list);
-		if (p->value)
-			pos = p->value;
-		json_object_add_value_string(dir_object, p->name, pos);
+		json_object_add_value_string(dir_object, p->name, p->value);
 	}
 }
 



* Recent changes (master)
@ 2020-06-08 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-08 12:00 UTC
  To: fio

The following changes since commit 08541925ec1b5887fc91e59a4bc253d54757c4c6:

  Makefile: include linux-blkzoned.c for Android, if set (2020-06-05 07:06:46 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c9beb194ef608dc49dd2f7537b90951f3c8b432d:

  t/run-fio-tests.py: Accept a wider range of IOPS values (2020-06-07 17:30:42 -0600)

----------------------------------------------------------------
Bart Van Assche (4):
      .travis.yml: Move shell code out of .travis.yml
      ci/travis-install.sh: Install python3-six package
      Switch to Python3
      t/run-fio-tests.py: Accept a wider range of IOPS values

Vincent Fu (1):
      ci/travis-install.sh, MacOS: Install the Python 'six' package

 .travis.yml                         | 28 +++--------------------
 ci/travis-build.sh                  | 16 ++++++++++++++
 ci/travis-install.sh                | 44 +++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py                  |  2 +-
 t/sgunmap-perf.py                   |  2 +-
 t/sgunmap-test.py                   |  2 +-
 tools/fio_jsonplus_clat2csv         |  2 +-
 tools/fiologparser.py               |  3 ++-
 tools/hist/fio-histo-log-pctiles.py |  5 +++--
 tools/hist/fiologparser_hist.py     | 11 ++++------
 tools/hist/half-bins.py             |  2 +-
 tools/plot/fio2gnuplot              |  2 +-
 12 files changed, 78 insertions(+), 41 deletions(-)
 create mode 100755 ci/travis-build.sh
 create mode 100755 ci/travis-install.sh

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index eba16baa..b64f0a95 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,29 +39,7 @@ matrix:
       arch: arm64
 
 before_install:
-  - EXTRA_CFLAGS="-Werror"
-  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
-        pkgs=(libaio-dev libcunit1 libcunit1-dev libgoogle-perftools4 libibverbs-dev libiscsi-dev libnuma-dev librbd-dev librdmacm-dev libz-dev);
-        if [[ "$BUILD_ARCH" == "x86" ]]; then
-            pkgs=("${pkgs[@]/%/:i386}");
-            pkgs+=(gcc-multilib python3-scipy);
-            EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32";
-        else
-            pkgs+=(glusterfs-common python3-scipy);
-        fi;
-        sudo apt-get -qq update;
-        sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}";
-    fi;
-  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
-        brew update;
-        brew install cunit;
-        pip3 install scipy;
-    fi;
+  - ci/travis-install.sh
+
 script:
-  - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
-  - make test
-  - if [[ "$TRAVIS_CPU_ARCH" == "arm64" ]]; then
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20";
-    else
-        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug;
-    fi;
+  - ci/travis-build.sh
diff --git a/ci/travis-build.sh b/ci/travis-build.sh
new file mode 100755
index 00000000..fff9c088
--- /dev/null
+++ b/ci/travis-build.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+
+EXTRA_CFLAGS="-Werror"
+
+if [[ "$BUILD_ARCH" == "x86" ]]; then
+    EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32"
+fi
+
+./configure --extra-cflags="${EXTRA_CFLAGS}" &&
+    make &&
+    make test &&
+    if [[ "$TRAVIS_CPU_ARCH" == "arm64" ]]; then
+	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20"
+    else
+	sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
+    fi
diff --git a/ci/travis-install.sh b/ci/travis-install.sh
new file mode 100755
index 00000000..232ab6b1
--- /dev/null
+++ b/ci/travis-install.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+case "$TRAVIS_OS_NAME" in
+    "linux")
+	# Architecture-dependent packages.
+	pkgs=(
+	    libaio-dev
+	    libcunit1
+	    libcunit1-dev
+	    libgoogle-perftools4
+	    libibverbs-dev
+	    libiscsi-dev
+	    libnuma-dev
+	    librbd-dev
+	    librdmacm-dev
+	    libz-dev
+	)
+	if [[ "$BUILD_ARCH" == "x86" ]]; then
+	    pkgs=("${pkgs[@]/%/:i386}")
+	    pkgs+=(gcc-multilib)
+	else
+	    pkgs+=(glusterfs-common)
+	fi
+	# Architecture-independent packages and packages for which we don't
+	# care about the architecture.
+	pkgs+=(
+	    python3
+	    python3-scipy
+	    python3-six
+	)
+	sudo apt-get -qq update
+	sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}"
+	;;
+    "osx")
+	brew update >/dev/null 2>&1
+	brew install cunit
+	pip3 install scipy
+	pip3 install six
+	;;
+esac
+
+echo "Python version: $(/usr/bin/python -V 2>&1)"
+echo "Python3 path: $(which python3 2>&1)"
+echo "Python3 version: $(python3 -V 2>&1)"
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index e7063d3e..c2352d80 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -438,7 +438,7 @@ class FioJobTest_t0011(FioJobTest):
         logging.debug("Test %d: iops1: %f", self.testnum, iops1)
         logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
-        if iops1 < 998 or iops1 > 1002:
+        if iops1 < 997 or iops1 > 1003:
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
             self.passed = False
 
diff --git a/t/sgunmap-perf.py b/t/sgunmap-perf.py
index fadbb859..962d187d 100755
--- a/t/sgunmap-perf.py
+++ b/t/sgunmap-perf.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 #
 # sgunmap-test.py
 #
diff --git a/t/sgunmap-test.py b/t/sgunmap-test.py
index f8f10ab3..4960a040 100755
--- a/t/sgunmap-test.py
+++ b/t/sgunmap-test.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 # Note: this script is python2 and python 3 compatible.
 #
 # sgunmap-test.py
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index 9544ab74..7f310fcc 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 # Note: this script is python2 and python3 compatible.
 
 """
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index cc29f1c7..054f1f60 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 # Note: this script is python2 and python 3 compatible.
 #
 # fiologparser.py
@@ -18,6 +18,7 @@ from __future__ import absolute_import
 from __future__ import print_function
 import argparse
 import math
+from functools import reduce
 
 def parse_args():
     parser = argparse.ArgumentParser()
diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index f9df2a3d..08e7722d 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 
 # module to parse fio histogram log files, not using pandas
 # runs in python v2 or v3
@@ -24,6 +24,7 @@
 import sys, os, math, copy, time
 from copy import deepcopy
 import argparse
+from functools import reduce
 
 unittest2_imported = True
 try:
@@ -82,7 +83,7 @@ def parse_hist_file(logfn, buckets_per_interval, log_hist_msec):
         except ValueError as e:
             raise FioHistoLogExc('non-integer value %s' % exception_suffix(k+1, logfn))
 
-        neg_ints = list(filter( lambda tk : tk < 0, int_tokens ))
+        neg_ints = list([tk for tk in int_tokens if tk < 0])
         if len(neg_ints) > 0:
             raise FioHistoLogExc('negative integer value %s' % exception_suffix(k+1, logfn))
 
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 8910d5fa..159454b1 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 """ 
     Utility for converting *_clat_hist* files generated by fio into latency statistics.
     
@@ -124,7 +124,7 @@ def gen_output_columns(ctx):
         columns = ["end-time", "dir", "samples", "min", "avg", "median"]
     else:
         columns = ["end-time", "samples", "min", "avg", "median"]
-    columns.extend(list(map(lambda x: x+'%', strpercs)))
+    columns.extend(list([x+'%' for x in strpercs]))
     columns.append("max")
 
 def fmt_float_list(ctx, num=1):
@@ -339,7 +339,7 @@ def guess_max_from_bins(ctx, hist_cols):
     else:
         bins = [1216,1280,1344,1408,1472,1536,1600,1664]
     coarses = range(max_coarse + 1)
-    fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else -10, coarses))
+    fncn = lambda z: list([z/2**x if z % 2**x == 0 else -10 for x in coarses])
     
     arr = np.transpose(list(map(fncn, bins)))
     idx = np.where(arr == hist_cols)
@@ -470,10 +470,7 @@ def output_interval_data(ctx,directions):
 def main(ctx):
 
     if ctx.job_file:
-        try:
-            from configparser import SafeConfigParser, NoOptionError
-        except ImportError:
-            from ConfigParser import SafeConfigParser, NoOptionError
+        from configparser import SafeConfigParser, NoOptionError
 
         cp = SafeConfigParser(allow_no_value=True)
         with open(ctx.job_file, 'r') as fp:
diff --git a/tools/hist/half-bins.py b/tools/hist/half-bins.py
index 1bba8ff7..42af9540 100755
--- a/tools/hist/half-bins.py
+++ b/tools/hist/half-bins.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 """ Cut the number bins in half in fio histogram output. Example usage:
 
         $ half-bins.py -c 2 output_clat_hist.1.log > smaller_clat_hist.1.log
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index 69aa791e..78ee82fb 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -1,4 +1,4 @@
-#!/usr/bin/python2.7
+#!/usr/bin/env python3
 # Note: this script is python2 and python3 compatible.
 #
 #  Copyright (C) 2013 eNovance SAS <licensing@enovance.com>



* Recent changes (master)
@ 2020-06-06 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-06 12:00 UTC
  To: fio

The following changes since commit 00ca8df5468e90bbf1256ec90fc9ae14b1706ccc:

  zbd: Fix max_open_zones checks (2020-06-03 20:15:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 08541925ec1b5887fc91e59a4bc253d54757c4c6:

  Makefile: include linux-blkzoned.c for Android, if set (2020-06-05 07:06:46 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Makefile: include linux-blkzoned.c for Android, if set

 Makefile | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index e3962195..7eb5e899 100644
--- a/Makefile
+++ b/Makefile
@@ -176,6 +176,9 @@ endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
 		oslib/linux-dev-lookup.c
+ifdef CONFIG_HAS_BLKZONED
+  SOURCE += oslib/linux-blkzoned.c
+endif
   LIBS += -ldl -llog
   LDFLAGS += -rdynamic
 endif



* Recent changes (master)
@ 2020-06-04 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-04 12:00 UTC
  To: fio

The following changes since commit 63a4b9cca4ba3aa4101051402cbbe946ced17a49:

  gfio: don't have multiple versions of main_ui (2020-06-02 08:20:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 00ca8df5468e90bbf1256ec90fc9ae14b1706ccc:

  zbd: Fix max_open_zones checks (2020-06-03 20:15:35 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      zbd: Fix max_open_zones checks

 zbd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index a7572c9a..dc302606 100644
--- a/zbd.c
+++ b/zbd.c
@@ -989,7 +989,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (td->o.job_max_open_zones) {
+	if (td->o.max_open_zones || td->o.job_max_open_zones) {
 		/*
 		 * This statement accesses f->zbd_info->open_zones[] on purpose
 		 * without locking.
@@ -1018,7 +1018,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		zone_lock(td, f, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		if (td->o.job_max_open_zones == 0)
+		if (td->o.max_open_zones == 0 && td->o.job_max_open_zones == 0)
 			goto examine_zone;
 		if (f->zbd_info->num_open_zones == 0) {
 			pthread_mutex_unlock(&f->zbd_info->mutex);
@@ -1074,7 +1074,7 @@ examine_zone:
 	}
 	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
 	       zone_idx);
-	if (td->o.job_max_open_zones)
+	if (td->o.max_open_zones || td->o.job_max_open_zones)
 		zbd_close_zone(td, f, open_zone_idx);
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 



* Recent changes (master)
@ 2020-06-03 12:00 Jens Axboe
From: Jens Axboe @ 2020-06-03 12:00 UTC
  To: fio

The following changes since commit 2f6cbfdcf0ce3b53e83713e3a6b5a0bb6a76a1e9:

  Merge branch 'pshared1' of https://github.com/kusumi/fio (2020-05-29 08:16:10 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63a4b9cca4ba3aa4101051402cbbe946ced17a49:

  gfio: don't have multiple versions of main_ui (2020-06-02 08:20:03 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      gfio: don't have multiple versions of main_ui

 gfio.c | 4 +++-
 gfio.h | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/gfio.c b/gfio.c
index 28053968..734651b6 100644
--- a/gfio.c
+++ b/gfio.c
@@ -38,6 +38,8 @@
 #include "gclient.h"
 #include "graph.h"
 
+struct gui main_ui;
+
 static bool gfio_server_running;
 static unsigned int gfio_graph_limit = 100;
 
@@ -223,7 +225,7 @@ static void update_button_states(struct gui *ui, struct gui_entry *ge)
 	switch (ge->state) {
 	default:
 		gfio_report_error(ge, "Bad client state: %u\n", ge->state);
-		/* fall through to new state */
+		/* fall-through */
 	case GE_STATE_NEW:
 		connect_state = 1;
 		edit_state = 1;
diff --git a/gfio.h b/gfio.h
index aa14e3c7..2bf0ea24 100644
--- a/gfio.h
+++ b/gfio.h
@@ -78,7 +78,9 @@ struct gui {
 	int handler_running;
 
 	GHashTable *ge_hash;
-} main_ui;
+};
+
+extern struct gui main_ui;
 
 enum {
 	GE_STATE_NEW = 1,

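Before this change, gfio.h both declared `struct gui` and defined the variable `main_ui`. As a result, every .c file that included the header carried its own tentative definition. Older toolchains merged these as common symbols, but GCC 10 made `-fno-common` the default, and under that setting duplicate definitions fail at link time. That is a likely motivation here, although the commit does not say so. The usual fix, which the patch applies, is an `extern` declaration in the header plus a single definition in one source file. The single-file sketch below shows the pattern. The trimmed-down `struct gui` and `set_handler_running()` are hypothetical.

```c
#include <assert.h>

/*
 * What belongs in a shared header (gfio.h in the patch): the type and an
 * extern declaration only. This trimmed-down struct is hypothetical.
 */
struct gui {
	int handler_running;
};
extern struct gui main_ui;

/*
 * What belongs in exactly one .c file (gfio.c in the patch): the single
 * definition. Objects with static storage duration start zeroed.
 */
struct gui main_ui;

/* Any translation unit that includes the header can use main_ui. */
static void set_handler_running(int running)
{
	assert(running == 0 || running == 1);
	main_ui.handler_running = running;
}
```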


* Recent changes (master)
@ 2020-05-30 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-30 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dacac3b936092aa73e29257fefcb7bf5658ec7cb:

  Merge branch 'python3-testing' of https://github.com/vincentkfu/fio (2020-05-28 13:27:15 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2f6cbfdcf0ce3b53e83713e3a6b5a0bb6a76a1e9:

  Merge branch 'pshared1' of https://github.com/kusumi/fio (2020-05-29 08:16:10 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'pshared1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      pshared: fix comment on supported platforms

 pshared.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/pshared.c b/pshared.c
index 791faf95..e671c87f 100644
--- a/pshared.c
+++ b/pshared.c
@@ -51,7 +51,7 @@ int mutex_init_pshared_with_type(pthread_mutex_t *mutex, int type)
 	}
 
 	/*
-	 * Not all platforms support process shared mutexes (FreeBSD)
+	 * Not all platforms support process shared mutexes (NetBSD/OpenBSD)
 	 */
 #ifdef CONFIG_PSHARED
 	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);



* Recent changes (master)
@ 2020-05-29 12:00 Jens Axboe
From: Jens Axboe @ 2020-05-29 12:00 UTC
  To: fio

The following changes since commit c96b385b6e0c78478697713e6da9174fba2432d3:

  t/zbd: make the test script easier to terminate (2020-05-25 18:21:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dacac3b936092aa73e29257fefcb7bf5658ec7cb:

  Merge branch 'python3-testing' of https://github.com/vincentkfu/fio (2020-05-28 13:27:15 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'testing' of https://github.com/vincentkfu/fio
      Merge branch 'python3-testing' of https://github.com/vincentkfu/fio

Shin'ichiro Kawasaki (2):
      pshared: Add mutex_init_pshared_with_type()
      zbd: Fix compilation error on BSD

Vincent Fu (5):
      appveyor: use on_finish section to upload artifacts
      t/run-fio-tests: pass-through arguments to test scripts
      .travis: enable arm64 architecture builds
      testing: change two test scripts to refer to python3
      travis: install python3 scipy for Linux and macOS tests

 .appveyor.yml          |  6 +++---
 .travis.yml            | 26 ++++++++++++++++++--------
 pshared.c              | 15 ++++++++++++++-
 pshared.h              |  1 +
 t/run-fio-tests.py     | 27 ++++++++++++++++++---------
 t/steadystate_tests.py |  3 +--
 t/strided.py           |  3 +--
 zbd.c                  | 22 +++++++---------------
 8 files changed, 63 insertions(+), 40 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index e2351be7..70c337f8 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -26,10 +26,10 @@ after_build:
 test_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -f fio.exe ] && python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug'
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test'
 
 artifacts:
   - path: os\windows\*.msi
     name: msi
-  - path: test-artifacts.7z
-    name: test-artifacts
+
+on_finish:
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test && appveyor PushArtifact test-artifacts.7z'
diff --git a/.travis.yml b/.travis.yml
index 77c31b77..eba16baa 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,6 +5,9 @@ os:
 compiler:
   - clang
   - gcc
+arch:
+  - amd64
+  - arm64
 env:
   matrix:
     - BUILD_ARCH="x86"
@@ -17,28 +20,34 @@ matrix:
     - os: osx
       compiler: clang # Workaround travis setting CC=["clang", "gcc"]
       env: BUILD_ARCH="x86_64"
+      arch: amd64
     # Latest xcode image (needs periodic updating)
     - os: osx
       compiler: clang
       osx_image: xcode11.2
       env: BUILD_ARCH="x86_64"
+      arch: amd64
   exclude:
     - os: osx
       compiler: gcc
-  exclude:
     - os: linux
       compiler: clang
+      arch: amd64
       env: BUILD_ARCH="x86" # Only do the gcc x86 build to reduce clutter
+    - os: linux
+      env: BUILD_ARCH="x86"
+      arch: arm64
+
 before_install:
   - EXTRA_CFLAGS="-Werror"
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
         pkgs=(libaio-dev libcunit1 libcunit1-dev libgoogle-perftools4 libibverbs-dev libiscsi-dev libnuma-dev librbd-dev librdmacm-dev libz-dev);
         if [[ "$BUILD_ARCH" == "x86" ]]; then
             pkgs=("${pkgs[@]/%/:i386}");
-            pkgs+=(gcc-multilib python-scipy);
+            pkgs+=(gcc-multilib python3-scipy);
             EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32";
         else
-            pkgs+=(glusterfs-common python-scipy);
+            pkgs+=(glusterfs-common python3-scipy);
         fi;
         sudo apt-get -qq update;
         sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}";
@@ -46,12 +55,13 @@ before_install:
   - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
         brew update;
         brew install cunit;
-        if [[ "$TRAVIS_OSX_IMAGE" == "xcode11.2" ]]; then
-            pip3 install scipy;
-        fi;
-        pip install scipy;
+        pip3 install scipy;
     fi;
 script:
   - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
   - make test
-  - sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
+  - if [[ "$TRAVIS_CPU_ARCH" == "arm64" ]]; then
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug -p 1010:"--skip 15 16 17 18 19 20";
+    else
+        sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug;
+    fi;
diff --git a/pshared.c b/pshared.c
index 21192556..791faf95 100644
--- a/pshared.c
+++ b/pshared.c
@@ -39,7 +39,7 @@ int cond_init_pshared(pthread_cond_t *cond)
 	return 0;
 }
 
-int mutex_init_pshared(pthread_mutex_t *mutex)
+int mutex_init_pshared_with_type(pthread_mutex_t *mutex, int type)
 {
 	pthread_mutexattr_t mattr;
 	int ret;
@@ -60,6 +60,14 @@ int mutex_init_pshared(pthread_mutex_t *mutex)
 		return ret;
 	}
 #endif
+	if (type) {
+		ret = pthread_mutexattr_settype(&mattr, type);
+		if (ret) {
+			log_err("pthread_mutexattr_settype: %s\n",
+				strerror(ret));
+			return ret;
+		}
+	}
 	ret = pthread_mutex_init(mutex, &mattr);
 	if (ret) {
 		log_err("pthread_mutex_init: %s\n", strerror(ret));
@@ -69,6 +77,11 @@ int mutex_init_pshared(pthread_mutex_t *mutex)
 	return 0;
 }
 
+int mutex_init_pshared(pthread_mutex_t *mutex)
+{
+	return mutex_init_pshared_with_type(mutex, 0);
+}
+
 int mutex_cond_init_pshared(pthread_mutex_t *mutex, pthread_cond_t *cond)
 {
 	int ret;
diff --git a/pshared.h b/pshared.h
index a58df6fe..f33be462 100644
--- a/pshared.h
+++ b/pshared.h
@@ -3,6 +3,7 @@
 
 #include <pthread.h>
 
+extern int mutex_init_pshared_with_type(pthread_mutex_t *, int);
 extern int mutex_init_pshared(pthread_mutex_t *);
 extern int cond_init_pshared(pthread_cond_t *);
 extern int mutex_cond_init_pshared(pthread_mutex_t *, pthread_cond_t *);
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 763e0103..e7063d3e 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -122,10 +122,7 @@ class FioExeTest(FioTest):
     def run(self):
         """Execute the binary or script described by this instance."""
 
-        if self.parameters:
-            command = [self.exe_path] + self.parameters
-        else:
-            command = [self.exe_path]
+        command = [self.exe_path] + self.parameters
         command_file = open(self.command_file, "w+")
         command_file.write("%s\n" % command)
         command_file.close()
@@ -797,6 +794,8 @@ def parse_args():
                         help='provide debug output')
     parser.add_argument('-k', '--skip-req', action='store_true',
                         help='skip requirements checking')
+    parser.add_argument('-p', '--pass-through', action='append',
+                        help='pass-through an argument to an executable test')
     args = parser.parse_args()
 
     return args
@@ -811,6 +810,17 @@ def main():
     else:
         logging.basicConfig(level=logging.INFO)
 
+    pass_through = {}
+    if args.pass_through:
+        for arg in args.pass_through:
+            if not ':' in arg:
+                print("Invalid --pass-through argument '%s'" % arg)
+                print("Syntax for --pass-through is TESTNUMBER:ARGUMENT")
+                return
+            split = arg.split(":",1)
+            pass_through[int(split[0])] = split[1]
+        logging.debug("Pass-through arguments: %s" % pass_through)
+
     if args.fio_root:
         fio_root = args.fio_root
     else:
@@ -874,13 +884,12 @@ def main():
             if config['parameters']:
                 parameters = [p.format(fio_path=fio_path) for p in config['parameters']]
             else:
-                parameters = None
+                parameters = []
             if Path(exe_path).suffix == '.py' and platform.system() == "Windows":
-                if parameters:
-                    parameters.insert(0, exe_path)
-                else:
-                    parameters = [exe_path]
+                parameters.insert(0, exe_path)
                 exe_path = "python.exe"
+            if config['test_id'] in pass_through:
+                parameters += pass_through[config['test_id']].split()
             test = config['test_class'](exe_path, parameters,
                                         config['success'])
         else:
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index e99b655c..e8bd768c 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -1,5 +1,4 @@
-#!/usr/bin/env python
-# Note: this script is python2 and python3 compatible.
+#!/usr/bin/env python3
 #
 # steadystate_tests.py
 #
diff --git a/t/strided.py b/t/strided.py
index 6d34be8a..45e6f148 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -1,5 +1,4 @@
-#!/usr/bin/python
-# Note: this script is python2 and python3 compatible.
+#!/usr/bin/env python3
 #
 # strided.py
 #
diff --git a/zbd.c b/zbd.c
index 72352db0..a7572c9a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -19,6 +19,7 @@
 #include "oslib/asprintf.h"
 #include "smalloc.h"
 #include "verify.h"
+#include "pshared.h"
 #include "zbd.h"
 
 /**
@@ -353,7 +354,6 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	struct fio_zone_info *p;
 	uint64_t zone_size = td->o.zone_size;
 	struct zoned_block_device_info *zbd_info = NULL;
-	pthread_mutexattr_t attr;
 	int i;
 
 	if (zone_size == 0) {
@@ -374,14 +374,12 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	if (!zbd_info)
 		return -ENOMEM;
 
-	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_setpshared(&attr, true);
-	pthread_mutex_init(&zbd_info->mutex, &attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+	mutex_init_pshared(&zbd_info->mutex);
 	zbd_info->refcount = 1;
 	p = &zbd_info->zone_info[0];
 	for (i = 0; i < nr_zones; i++, p++) {
-		pthread_mutex_init(&p->mutex, &attr);
+		mutex_init_pshared_with_type(&p->mutex,
+					     PTHREAD_MUTEX_RECURSIVE);
 		p->start = i * zone_size;
 		p->wp = p->start + zone_size;
 		p->type = ZBD_ZONE_TYPE_SWR;
@@ -395,7 +393,6 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
 		ilog2(zone_size) : 0;
 	f->zbd_info->nr_zones = nr_zones;
-	pthread_mutexattr_destroy(&attr);
 	return 0;
 }
 
@@ -415,12 +412,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	struct fio_zone_info *p;
 	uint64_t zone_size, offset;
 	struct zoned_block_device_info *zbd_info = NULL;
-	pthread_mutexattr_t attr;
 	int i, j, ret = 0;
 
-	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_setpshared(&attr, true);
-
 	zones = calloc(ZBD_REPORT_MAX_ZONES, sizeof(struct zbd_zone));
 	if (!zones)
 		goto out;
@@ -454,14 +447,14 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	ret = -ENOMEM;
 	if (!zbd_info)
 		goto out;
-	pthread_mutex_init(&zbd_info->mutex, &attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+	mutex_init_pshared(&zbd_info->mutex);
 	zbd_info->refcount = 1;
 	p = &zbd_info->zone_info[0];
 	for (offset = 0, j = 0; j < nr_zones;) {
 		z = &zones[0];
 		for (i = 0; i < nrz; i++, j++, z++, p++) {
-			pthread_mutex_init(&p->mutex, &attr);
+			mutex_init_pshared_with_type(&p->mutex,
+						     PTHREAD_MUTEX_RECURSIVE);
 			p->start = z->start;
 			switch (z->cond) {
 			case ZBD_ZONE_COND_NOT_WP:
@@ -512,7 +505,6 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 out:
 	sfree(zbd_info);
 	free(zones);
-	pthread_mutexattr_destroy(&attr);
 	return ret;
 }
 

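The zbd.c hunks replace hand-rolled mutex attribute handling with the new pshared helpers. The old code called `pthread_mutexattr_settype(..., PTHREAD_MUTEX_RECURSIVE)` only after initializing `zbd_info->mutex`, so only the per-zone mutexes were recursive. The new calls keep that split explicit. The old code also passed `true` where POSIX expects `PTHREAD_PROCESS_SHARED`.

Below is a sketch of what a helper like `mutex_init_pshared_with_type()` does. It is not fio's implementation. The name `sketch_mutex_init_pshared_with_type()` is hypothetical, and it logs with `fprintf()` instead of fio's `log_err()`. It also calls `pthread_mutexattr_setpshared()` unconditionally, whereas fio guards that call with `CONFIG_PSHARED` because not every platform supports process-shared mutexes.

```c
#define _POSIX_C_SOURCE 200809L /* for PTHREAD_MUTEX_RECURSIVE under strict modes */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Build one attribute object, mark it process-shared, optionally set the
 * mutex type (type == 0 keeps the default), then initialize the mutex.
 * Returns 0 or the pthread error code.
 */
static int sketch_mutex_init_pshared_with_type(pthread_mutex_t *mutex, int type)
{
	pthread_mutexattr_t mattr;
	int ret;

	assert(mutex != NULL);

	ret = pthread_mutexattr_init(&mattr);
	if (ret) {
		fprintf(stderr, "pthread_mutexattr_init: %s\n", strerror(ret));
		return ret;
	}

	/* Use the named POSIX constant rather than a boolean. */
	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
	if (ret) {
		fprintf(stderr, "pthread_mutexattr_setpshared: %s\n", strerror(ret));
		goto out;
	}

	if (type) {
		ret = pthread_mutexattr_settype(&mattr, type);
		if (ret) {
			fprintf(stderr, "pthread_mutexattr_settype: %s\n", strerror(ret));
			goto out;
		}
	}

	ret = pthread_mutex_init(mutex, &mattr);
	if (ret)
		fprintf(stderr, "pthread_mutex_init: %s\n", strerror(ret));
out:
	/* Destroying the attribute does not affect an initialized mutex. */
	pthread_mutexattr_destroy(&mattr);
	return ret;
}
```

A recursive mutex initialized this way can be re-locked by the thread that owns it, which is what the per-zone locks rely on.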

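The new `-p/--pass-through` option of t/run-fio-tests.py takes `TESTNUMBER:ARGUMENT` and splits it at the first colon. The script then appends the whitespace-split argument to that test's parameters. The arm64 Travis job uses it as `-p 1010:"--skip 15 16 17 18 19 20"`. For consistency with the other examples, the split is sketched in C below. `parse_pass_through()` is a hypothetical helper, not part of the script. Like `split(":", 1)`, it leaves any later colons in the argument. Where the script's `int()` would raise on a non-numeric prefix, the sketch returns -1.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "TESTNUMBER:ARGUMENT" at the first ':'. On success, store the test
 * number and a pointer to the text after the colon, and return 0.
 * Return -1 if there is no ':' or the prefix is not an int.
 */
static int parse_pass_through(const char *spec, int *testnum, const char **args)
{
	const char *colon;
	char *end;
	long num;

	assert(spec != NULL && testnum != NULL && args != NULL);

	/* No ':' or an empty test number: reject. */
	colon = strchr(spec, ':');
	if (colon == NULL || colon == spec)
		return -1;

	/* The whole prefix before ':' must be a base-10 integer that fits an int. */
	errno = 0;
	num = strtol(spec, &end, 10);
	if (errno != 0 || end != colon || num < INT_MIN || num > INT_MAX)
		return -1;

	*testnum = (int)num;
	*args = colon + 1; /* everything after the first ':' */
	return 0;
}
```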

* Recent changes (master)
@ 2020-05-26 12:00 Jens Axboe
From: Jens Axboe @ 2020-05-26 12:00 UTC
  To: fio

The following changes since commit 5df5882b267b1cc7d08f7e9038e6c1046d492936:

  Merge branch 'parse-and-fill-pattern' of https://github.com/bvanassche/fio (2020-05-24 12:03:56 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c96b385b6e0c78478697713e6da9174fba2432d3:

  t/zbd: make the test script easier to terminate (2020-05-25 18:21:45 -0600)

----------------------------------------------------------------
Dmitry Fomichev (4):
      libzbc: cleanup init code
      libzbc: fix whitespace errors
      t/zbd: beautify test script output
      t/zbd: make the test script easier to terminate

 engines/libzbc.c       | 33 +++++++++++++++++----------------
 t/zbd/test-zbd-support | 20 ++++++++++++++++++--
 2 files changed, 35 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libzbc.c b/engines/libzbc.c
index 8c682de6..9e568334 100644
--- a/engines/libzbc.c
+++ b/engines/libzbc.c
@@ -47,7 +47,7 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 			   struct libzbc_data **p_ld)
 {
 	struct libzbc_data *ld = td->io_ops_data;
-        int ret, flags = OS_O_DIRECT;
+	int ret, flags = OS_O_DIRECT;
 
 	if (ld) {
 		/* Already open */
@@ -61,7 +61,7 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 		return -EINVAL;
 	}
 
-        if (td_write(td)) {
+	if (td_write(td)) {
 		if (!read_only)
 			flags |= O_RDWR;
 	} else if (td_read(td)) {
@@ -71,17 +71,15 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 			flags |= O_RDONLY;
 	} else if (td_trim(td)) {
 		td_verror(td, EINVAL, "libzbc does not support trim");
-                log_err("%s: libzbc does not support trim\n",
-                        f->file_name);
-                return -EINVAL;
+		log_err("%s: libzbc does not support trim\n", f->file_name);
+		return -EINVAL;
 	}
 
-        if (td->o.oatomic) {
+	if (td->o.oatomic) {
 		td_verror(td, EINVAL, "libzbc does not support O_ATOMIC");
-                log_err("%s: libzbc does not support O_ATOMIC\n",
-                        f->file_name);
-                return -EINVAL;
-        }
+		log_err("%s: libzbc does not support O_ATOMIC\n", f->file_name);
+		return -EINVAL;
+	}
 
 	ld = calloc(1, sizeof(*ld));
 	if (!ld)
@@ -92,15 +90,12 @@ static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
 	if (ret) {
 		log_err("%s: zbc_open() failed, err=%d\n",
 			f->file_name, ret);
-		return ret;
+		goto err;
 	}
 
 	ret = libzbc_get_dev_info(ld, f);
-	if (ret) {
-		zbc_close(ld->zdev);
-		free(ld);
-		return ret;
-	}
+	if (ret)
+		goto err_close;
 
 	td->io_ops_data = ld;
 out:
@@ -108,6 +103,12 @@ out:
 		*p_ld = ld;
 
 	return 0;
+
+err_close:
+	zbc_close(ld->zdev);
+err:
+	free(ld);
+	return ret;
 }
 
 static int libzbc_close_dev(struct thread_data *td)
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index de05f438..4001be3b 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -929,20 +929,36 @@ logfile=$0.log
 
 passed=0
 failed=0
+if [ -t 1 ]; then
+    red="\e[1;31m"
+    green="\e[1;32m"
+    end="\e[m"
+else
+    red=""
+    green=""
+    end=""
+fi
 rc=0
+
+intr=0
+trap 'intr=1' SIGINT
+
 for test_number in "${tests[@]}"; do
     rm -f "${logfile}.${test_number}"
-    echo -n "Running test $test_number ... "
+    echo -n "Running test $(printf "%02d" $test_number) ... "
     if eval "test$test_number"; then
 	status="PASS"
+	cc_status="${green}${status}${end}"
 	((passed++))
     else
 	status="FAIL"
+	cc_status="${red}${status}${end}"
 	((failed++))
 	rc=1
     fi
-    echo "$status"
+    echo -e "$cc_status"
     echo "$status" >> "${logfile}.${test_number}"
+    [ $intr -ne 0 ] && exit 1
 done
 
 echo "$passed tests passed"

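The libzbc cleanup converts libzbc_open_dev() to the goto-based unwinding common in C: each failure jumps to a label that releases everything acquired so far, in reverse order. The diff also shows that the old code returned without freeing `ld` when `zbc_open()` failed. The new `goto err` path covers that case.

The self-contained sketch below shows the pattern. `fake_open()`, `fake_close()`, and `open_dev()` are hypothetical stand-ins for `zbc_open()`, `zbc_close()`, and `libzbc_open_dev()`. The `fail_step` parameter exists only to exercise each path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resources standing in for a libzbc device handle. */
struct dev { int id; };
struct dev_data { struct dev *zdev; };

static int fake_open(struct dev **out, int fail)
{
	if (fail)
		return -EIO;
	*out = calloc(1, sizeof(**out));
	return *out ? 0 : -ENOMEM;
}

static void fake_close(struct dev *d)
{
	free(d);
}

/*
 * Acquire resources in order. On failure, jump to the label that frees
 * what was acquired so far. fail_step: 0 = none, 1 = open, 2 = dev info.
 */
static int open_dev(struct dev_data **out, int fail_step)
{
	struct dev_data *ld;
	int ret;

	assert(out != NULL);

	ld = calloc(1, sizeof(*ld));
	if (!ld)
		return -ENOMEM;

	ret = fake_open(&ld->zdev, fail_step == 1);
	if (ret)
		goto err;

	/* Stand-in for libzbc_get_dev_info(). */
	ret = (fail_step == 2) ? -EINVAL : 0;
	if (ret)
		goto err_close;

	*out = ld;
	return 0;

err_close:
	fake_close(ld->zdev);
err:
	free(ld);
	return ret;
}
```

Because the labels fall through from `err_close` into `err`, a later failure releases both resources, while an earlier one skips the close.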

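The test-zbd-support changes use two shell idioms. The first, `[ -t 1 ]`, emits ANSI colors only when stdout is a terminal. The second, `trap 'intr=1' SIGINT`, records Ctrl-C so that the loop exits once the current test's status has been logged. A rough C analogue is sketched below, assuming POSIX `isatty()` and `sigaction()`. All function names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for sigaction() under strict modes */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set from the signal handler; the loop checks it only between tests. */
static volatile sig_atomic_t interrupted;

static void on_sigint(int sig)
{
	(void)sig;
	interrupted = 1;
}

/* Like `[ -t 1 ]`: true when stdout is a terminal. */
static int stdout_is_tty(void)
{
	return isatty(STDOUT_FILENO);
}

/* Format PASS/FAIL, wrapped in the script's escape codes when use_color. */
static void format_status(char *buf, size_t len, int passed, int use_color)
{
	const char *status = passed ? "PASS" : "FAIL";
	const char *color = passed ? "\033[1;32m" : "\033[1;31m";

	assert(buf != NULL && len > 0);
	if (use_color)
		snprintf(buf, len, "%s%s\033[m", color, status);
	else
		snprintf(buf, len, "%s", status);
}

/* Like `trap 'intr=1' SIGINT`: record the interrupt instead of dying. */
static int install_sigint_flag(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}
```

In a test loop, one would call `format_status(buf, sizeof(buf), ok, stdout_is_tty())`, print `buf`, and exit once `interrupted` is set. That mirrors `[ $intr -ne 0 ] && exit 1`.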

* Recent changes (master)
@ 2020-05-25 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-25 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d51e4ffa596b617cdb41680b31a4a3895a5307fc:

  Fio 3.20 (2020-05-23 11:14:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5df5882b267b1cc7d08f7e9038e6c1046d492936:

  Merge branch 'parse-and-fill-pattern' of https://github.com/bvanassche/fio (2020-05-24 12:03:56 -0600)

----------------------------------------------------------------
Bart Van Assche (3):
      Fix spelling in a source code comment
      Declare a static variable 'const'
      Do not read past the end of fmt_desc[]

Jens Axboe (1):
      Merge branch 'parse-and-fill-pattern' of https://github.com/bvanassche/fio

 lib/pattern.c | 13 ++++---------
 lib/pattern.h |  1 -
 options.c     |  9 +++++----
 3 files changed, 9 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/lib/pattern.c b/lib/pattern.c
index 2024f2e9..680a12be 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -204,8 +204,7 @@ static const char *parse_number(const char *beg, char *out,
  * @out - output buffer where space for format should be reserved
  * @parsed - number of bytes which were already parsed so far
  * @out_len - length of the output buffer
- * @fmt_desc - format descritor array, what we expect to find
- * @fmt_desc_sz - size of the format descritor array
+ * @fmt_desc - format descriptor array, what we expect to find
  * @fmt - format array, the output
  * @fmt_sz - size of format array
  *
@@ -223,19 +222,18 @@ static const char *parse_number(const char *beg, char *out,
 static const char *parse_format(const char *in, char *out, unsigned int parsed,
 				unsigned int out_len, unsigned int *filled,
 				const struct pattern_fmt_desc *fmt_desc,
-				unsigned int fmt_desc_sz,
 				struct pattern_fmt *fmt, unsigned int fmt_sz)
 {
 	int i;
 	struct pattern_fmt *f = NULL;
 	unsigned int len = 0;
 
-	if (!out_len || !fmt_desc || !fmt_desc_sz || !fmt || !fmt_sz)
+	if (!out_len || !fmt_desc || !fmt || !fmt_sz)
 		return NULL;
 
 	assert(*in == '%');
 
-	for (i = 0; i < fmt_desc_sz; i++) {
+	for (i = 0; fmt_desc[i].fmt; i++) {
 		const struct pattern_fmt_desc *desc;
 
 		desc = &fmt_desc[i];
@@ -267,7 +265,6 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
  * @out - output buffer where parsed result will be put
  * @out_len - lengths of the output buffer
  * @fmt_desc - array of pattern format descriptors [input]
- * @fmt_desc_sz - size of the format descriptor array
  * @fmt - array of pattern formats [output]
  * @fmt_sz - pointer where the size of pattern formats array stored [input],
  *           after successfull parsing this pointer will contain the number
@@ -311,7 +308,6 @@ static const char *parse_format(const char *in, char *out, unsigned int parsed,
 int parse_and_fill_pattern(const char *in, unsigned int in_len,
 			   char *out, unsigned int out_len,
 			   const struct pattern_fmt_desc *fmt_desc,
-			   unsigned int fmt_desc_sz,
 			   struct pattern_fmt *fmt,
 			   unsigned int *fmt_sz_out)
 {
@@ -340,8 +336,7 @@ int parse_and_fill_pattern(const char *in, unsigned int in_len,
 			break;
 		case '%':
 			end = parse_format(beg, out, out - out_beg, out_len,
-					   &filled, fmt_desc, fmt_desc_sz,
-					   fmt, fmt_rem);
+					   &filled, fmt_desc, fmt, fmt_rem);
 			parsed_fmt = 1;
 			break;
 		default:
diff --git a/lib/pattern.h b/lib/pattern.h
index 2d655ad0..a6d9d6b4 100644
--- a/lib/pattern.h
+++ b/lib/pattern.h
@@ -24,7 +24,6 @@ struct pattern_fmt {
 int parse_and_fill_pattern(const char *in, unsigned int in_len,
 			   char *out, unsigned int out_len,
 			   const struct pattern_fmt_desc *fmt_desc,
-			   unsigned int fmt_desc_sz,
 			   struct pattern_fmt *fmt,
 			   unsigned int *fmt_sz_out);
 
diff --git a/options.c b/options.c
index f2d98fa6..85a0f490 100644
--- a/options.c
+++ b/options.c
@@ -19,12 +19,13 @@ char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 
 #define cb_data_to_td(data)	container_of(data, struct thread_data, o)
 
-static struct pattern_fmt_desc fmt_desc[] = {
+static const struct pattern_fmt_desc fmt_desc[] = {
 	{
 		.fmt   = "%o",
 		.len   = FIELD_SIZE(struct io_u *, offset),
 		.paste = paste_blockoff
-	}
+	},
+	{ }
 };
 
 /*
@@ -1339,7 +1340,7 @@ static int str_buffer_pattern_cb(void *data, const char *input)
 
 	/* FIXME: for now buffer pattern does not support formats */
 	ret = parse_and_fill_pattern(input, strlen(input), td->o.buffer_pattern,
-				     MAX_PATTERN_SIZE, NULL, 0, NULL, NULL);
+				     MAX_PATTERN_SIZE, NULL, NULL, NULL);
 	if (ret < 0)
 		return 1;
 
@@ -1388,7 +1389,7 @@ static int str_verify_pattern_cb(void *data, const char *input)
 
 	td->o.verify_fmt_sz = ARRAY_SIZE(td->o.verify_fmt);
 	ret = parse_and_fill_pattern(input, strlen(input), td->o.verify_pattern,
-				     MAX_PATTERN_SIZE, fmt_desc, sizeof(fmt_desc),
+				     MAX_PATTERN_SIZE, fmt_desc,
 				     td->o.verify_fmt, &td->o.verify_fmt_sz);
 	if (ret < 0)
 		return 1;

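The patch above drops the explicit `fmt_desc_sz` count and terminates `fmt_desc[]` with an empty entry. Note that the old `str_verify_pattern_cb()` call passed `sizeof(fmt_desc)`, which is a size in bytes, not an element count. As a result, the old `i < fmt_desc_sz` loop could index past the end of the one-entry table. Below is a minimal sketch of the sentinel-terminated pattern. `struct fmt_desc`, `count_descs()` and `find_desc()` are hypothetical simplifications, not fio's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct pattern_fmt_desc. */
struct fmt_desc {
	const char *fmt;	/* format token such as "%o"; NULL ends the table */
	unsigned int len;	/* bytes the token expands to (illustrative) */
};

/*
 * fio writes the terminator as "{ }" (an empty initializer accepted by
 * GCC/Clang and standard since C23); "{ .fmt = NULL }" is the portable
 * C99 spelling used here.
 */
static const struct fmt_desc descs[] = {
	{ .fmt = "%o", .len = 8 },
	{ .fmt = NULL },
};

/* Count entries by walking to the sentinel; no size argument needed. */
static size_t count_descs(const struct fmt_desc *d)
{
	size_t n = 0;

	while (d[n].fmt)
		n++;
	return n;
}

/* Same loop shape as the patched parse_format(): stop at the NULL entry. */
static const struct fmt_desc *find_desc(const struct fmt_desc *d,
					const char *in)
{
	for (size_t i = 0; d[i].fmt; i++)
		if (!strncmp(in, d[i].fmt, strlen(d[i].fmt)))
			return &d[i];
	return NULL;
}
```

`sizeof(descs)` is the table's size in bytes. `sizeof(descs) / sizeof(descs[0])` is its element count, including the sentinel. The walk-to-NULL loop needs neither value.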

* Recent changes (master)
@ 2020-05-24 12:00 Jens Axboe
From: Jens Axboe @ 2020-05-24 12:00 UTC
  To: fio

The following changes since commit 1bb1bcade4acbd3cf69ba1c825ad01eebf0b3cdf:

  zbd: make zbd_info->mutex non-recursive (2020-05-21 17:23:13 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d51e4ffa596b617cdb41680b31a4a3895a5307fc:

  Fio 3.20 (2020-05-23 11:14:14 -0600)

----------------------------------------------------------------
Fabrice Fontaine (1):
      Makefile: fix build of io_uring on sh4

Jens Axboe (2):
      Merge branch 'master' of https://github.com/ffontaine/fio
      Fio 3.20

 FIO-VERSION-GEN | 2 +-
 Makefile        | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 3220aaa1..7050f84e 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.19
+DEF_VER=fio-3.20
 
 LF='
 '
diff --git a/Makefile b/Makefile
index f1e984f5..e3962195 100644
--- a/Makefile
+++ b/Makefile
@@ -283,6 +283,7 @@ T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
 T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
 
 T_IOU_RING_OBJS = t/io_uring.o
+T_IOU_RING_OBJS += t/arch.o
 T_IOU_RING_PROGS = t/io_uring
 
 T_MEMLOCK_OBJS = t/memlock.o


* Recent changes (master)
@ 2020-05-22 12:00 Jens Axboe
From: Jens Axboe @ 2020-05-22 12:00 UTC
  To: fio

The following changes since commit 0e21f5c6f64a73bcede20e1a29a885845e453b8e:

  Merge branch 'testing' of https://github.com/vincentkfu/fio (2020-05-20 14:19:12 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1bb1bcade4acbd3cf69ba1c825ad01eebf0b3cdf:

  zbd: make zbd_info->mutex non-recursive (2020-05-21 17:23:13 -0600)

----------------------------------------------------------------
Alexey Dobriyan (5):
      verify: decouple seed generation from buffer fill
      zbd: bump ZBD_MAX_OPEN_ZONES
      zbd: don't lock zones outside working area
      zbd: introduce per job maximum open zones limit
      zbd: make zbd_info->mutex non-recursive

Jens Axboe (1):
      Merge branch 'latency_run' of https://github.com/liu-song-6/fio

Song Liu (1):
      Add option latency_run to continue enable latency_target

 HOWTO            |   7 ++++
 backend.c        |   6 ---
 cconv.c          |   2 +
 file.h           |   3 ++
 fio.1            |  11 +++++-
 fio.h            |   2 +
 io_u.c           |  18 ++++++++-
 options.c        |  26 +++++++++++--
 server.h         |   2 +-
 thread_options.h |   3 ++
 verify.c         |  20 ++++------
 zbd.c            | 110 ++++++++++++++++++++++++++++++++++++++-----------------
 zbd.h            |   3 ++
 zbd_types.h      |   2 +-
 14 files changed, 155 insertions(+), 60 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index cd628552..9e71a619 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2552,6 +2552,13 @@ I/O latency
 	defaults to 100.0, meaning that all I/Os must be equal or below to the value
 	set by :option:`latency_target`.
 
+.. option:: latency_run=bool
+
+	Used with :option:`latency_target`. If false (default), fio will find
+	the highest queue depth that meets :option:`latency_target` and exit. If
+	true, fio will continue running and try to meet :option:`latency_target`
+	by adjusting queue depth.
+
 .. option:: max_latency=time
 
 	If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
diff --git a/backend.c b/backend.c
index f519728c..0075a733 100644
--- a/backend.c
+++ b/backend.c
@@ -1006,12 +1006,6 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		if (td->o.verify != VERIFY_NONE && io_u->ddir == DDIR_READ &&
 		    ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
 
-			if (!td->o.verify_pattern_bytes) {
-				io_u->rand_seed = __rand(&td->verify_state);
-				if (sizeof(int) != sizeof(long *))
-					io_u->rand_seed *= __rand(&td->verify_state);
-			}
-
 			if (verify_state_should_stop(td, io_u)) {
 				put_io_u(td, io_u);
 				break;
diff --git a/cconv.c b/cconv.c
index 48218dc4..449bcf7b 100644
--- a/cconv.c
+++ b/cconv.c
@@ -288,6 +288,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->latency_window = le64_to_cpu(top->latency_window);
 	o->max_latency = le64_to_cpu(top->max_latency);
 	o->latency_percentile.u.f = fio_uint64_to_double(le64_to_cpu(top->latency_percentile.u.i));
+	o->latency_run = le32_to_cpu(top->latency_run);
 	o->compress_percentage = le32_to_cpu(top->compress_percentage);
 	o->compress_chunk = le32_to_cpu(top->compress_chunk);
 	o->dedupe_percentage = le32_to_cpu(top->dedupe_percentage);
@@ -487,6 +488,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->latency_window = __cpu_to_le64(o->latency_window);
 	top->max_latency = __cpu_to_le64(o->max_latency);
 	top->latency_percentile.u.i = __cpu_to_le64(fio_double_to_uint64(o->latency_percentile.u.f));
+	top->latency_run = __cpu_to_le32(o->latency_run);
 	top->compress_percentage = cpu_to_le32(o->compress_percentage);
 	top->compress_chunk = cpu_to_le32(o->compress_chunk);
 	top->dedupe_percentage = cpu_to_le32(o->dedupe_percentage);
diff --git a/file.h b/file.h
index ae0e6fc8..375bbfd3 100644
--- a/file.h
+++ b/file.h
@@ -104,6 +104,9 @@ struct fio_file {
 	 * Zoned block device information. See also zonemode=zbd.
 	 */
 	struct zoned_block_device_info *zbd_info;
+	/* zonemode=zbd working area */
+	uint32_t min_zone;	/* inclusive */
+	uint32_t max_zone;	/* exclusive */
 
 	/*
 	 * Track last end and last start of IO for a given data direction
diff --git a/fio.1 b/fio.1
index 9e9e1b1a..47bc1592 100644
--- a/fio.1
+++ b/fio.1
@@ -804,7 +804,11 @@ so. Default: false.
 When running a random write test across an entire drive many more zones will be
 open than in a typical application workload. Hence this command line option
 that allows to limit the number of open zones. The number of open zones is
-defined as the number of zones to which write commands are issued.
+defined as the number of zones to which write commands are issued by all
+threads/processes.
+.TP
+.BI job_max_open_zones \fR=\fPint
+Limit on the number of simultaneously opened zones per single thread/process.
 .TP
 .BI zone_reset_threshold \fR=\fPfloat
 A number between zero and one that indicates the ratio of logical blocks with
@@ -2276,6 +2280,11 @@ The percentage of I/Os that must fall within the criteria specified by
 defaults to 100.0, meaning that all I/Os must be equal or below to the value
 set by \fBlatency_target\fR.
 .TP
+.BI latency_run \fR=\fPbool
+Used with \fBlatency_target\fR. If false (default), fio will find the highest
+queue depth that meets \fBlatency_target\fR and exit. If true, fio will continue
+running and try to meet \fBlatency_target\fR by adjusting queue depth.
+.TP
 .BI max_latency \fR=\fPtime
 If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
 maximum latency. When the unit is omitted, the value is interpreted in
diff --git a/fio.h b/fio.h
index dbdfdf86..8045c32f 100644
--- a/fio.h
+++ b/fio.h
@@ -260,6 +260,7 @@ struct thread_data {
 	struct frand_state prio_state;
 
 	struct zone_split_index **zone_state_index;
+	unsigned int num_open_zones;
 
 	unsigned int verify_batch;
 	unsigned int trim_batch;
@@ -377,6 +378,7 @@ struct thread_data {
 	unsigned int latency_qd_high;
 	unsigned int latency_qd_low;
 	unsigned int latency_failed;
+	unsigned int latency_stable_count;
 	uint64_t latency_ios;
 	int latency_end_run;
 
diff --git a/io_u.c b/io_u.c
index aa8808b8..ae1438fd 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1391,6 +1391,7 @@ static bool __lat_target_failed(struct thread_data *td)
 		td->latency_qd_low--;
 
 	td->latency_qd = (td->latency_qd + td->latency_qd_low) / 2;
+	td->latency_stable_count = 0;
 
 	dprint(FD_RATE, "Ramped down: %d %d %d\n", td->latency_qd_low, td->latency_qd, td->latency_qd_high);
 
@@ -1440,6 +1441,21 @@ static void lat_target_success(struct thread_data *td)
 
 	td->latency_qd_low = td->latency_qd;
 
+	if (td->latency_qd + 1 == td->latency_qd_high) {
+		/*
+		 * latency_qd will not incease on lat_target_success(), so
+		 * called stable. If we stick with this queue depth, the
+		 * final latency is likely lower than latency_target. Fix
+		 * this by increasing latency_qd_high slowly. Use a naive
+		 * heuristic here. If we get lat_target_success() 3 times
+		 * in a row, increase latency_qd_high by 1.
+		 */
+		if (++td->latency_stable_count >= 3) {
+			td->latency_qd_high++;
+			td->latency_stable_count = 0;
+		}
+	}
+
 	/*
 	 * If we haven't failed yet, we double up to a failing value instead
 	 * of bisecting from highest possible queue depth. If we have set
@@ -1459,7 +1475,7 @@ static void lat_target_success(struct thread_data *td)
 	 * Same as last one, we are done. Let it run a latency cycle, so
 	 * we get only the results from the targeted depth.
 	 */
-	if (td->latency_qd == qd) {
+	if (!o->latency_run && td->latency_qd == qd) {
 		if (td->latency_end_run) {
 			dprint(FD_RATE, "We are done\n");
 			td->done = 1;
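The `latency_stable_count` heuristic added to `lat_target_success()` can be isolated as follows. Once the search has converged, with `latency_qd` one below `latency_qd_high`, three consecutive successes raise the upper bound by one. A failure resets the counter. This is a sketch under assumptions: `struct lat_state`, `note_success()` and `note_failure()` are hypothetical, and the rest of fio's queue-depth search lies outside this diff and is not modelled.

```c
#include <assert.h>

/* Hypothetical subset of the thread_data fields touched by the patch. */
struct lat_state {
	unsigned int qd;		/* latency_qd */
	unsigned int qd_low;		/* latency_qd_low */
	unsigned int qd_high;		/* latency_qd_high */
	unsigned int stable_count;	/* latency_stable_count */
};

/* Mirrors the lines added to lat_target_success(). */
static void note_success(struct lat_state *s)
{
	s->qd_low = s->qd;
	if (s->qd + 1 == s->qd_high) {
		/* Converged: after 3 successes in a row, allow one more. */
		if (++s->stable_count >= 3) {
			s->qd_high++;
			s->stable_count = 0;
		}
	}
}

/* Mirrors the reset added to __lat_target_failed(); ramp-down omitted. */
static void note_failure(struct lat_state *s)
{
	s->stable_count = 0;
}
```

When `latency_run=0` this heuristic matters little, because fio stops once the depth settles. With `latency_run=1` the job keeps running, so the slowly rising bound lets the queue depth creep back up when the latency target allows it.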
diff --git a/options.c b/options.c
index b18cea33..f2d98fa6 100644
--- a/options.c
+++ b/options.c
@@ -3360,12 +3360,22 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "max_open_zones",
-		.lname	= "Maximum number of open zones",
+		.lname	= "Per device/file maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, max_open_zones),
 		.maxval	= ZBD_MAX_OPEN_ZONES,
-		.help	= "Limit random writes to SMR drives to the specified"
-			  " number of sequential zones",
+		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "job_max_open_zones",
+		.lname	= "Job maximum number of open zones",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, job_max_open_zones),
+		.maxval	= ZBD_MAX_OPEN_ZONES,
+		.help	= "Limit on the number of simultaneously opened sequential write zones with zonemode=zbd by one thread/process",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
@@ -3672,6 +3682,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_LATPROF,
 	},
+	{
+		.name	= "latency_run",
+		.lname	= "Latency Run",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, latency_run),
+		.help	= "Keep adjusting queue depth to match latency_target",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_LATPROF,
+	},
 	{
 		.name	= "invalidate",
 		.lname	= "Cache invalidate",
diff --git a/server.h b/server.h
index 279b6917..de01a5c8 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 82,
+	FIO_SERVER_VER			= 83,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index c78ed43d..968ea0ab 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -324,6 +324,7 @@ struct thread_options {
 	unsigned long long latency_target;
 	unsigned long long latency_window;
 	fio_fp64_t latency_percentile;
+	uint32_t latency_run;
 
 	unsigned int sig_figs;
 
@@ -342,6 +343,7 @@ struct thread_options {
 	/* Parameters that affect zonemode=zbd */
 	unsigned int read_beyond_wp;
 	int max_open_zones;
+	unsigned int job_max_open_zones;
 	fio_fp64_t zrt;
 	fio_fp64_t zrf;
 };
@@ -612,6 +614,7 @@ struct thread_options_pack {
 	uint64_t latency_window;
 	uint64_t max_latency;
 	fio_fp64_t latency_percentile;
+	uint32_t latency_run;
 
 	uint32_t sig_figs;
 
diff --git a/verify.c b/verify.c
index cf299ebf..b7fa6693 100644
--- a/verify.c
+++ b/verify.c
@@ -46,15 +46,6 @@ static void __fill_buffer(struct thread_options *o, uint64_t seed, void *p,
 	__fill_random_buf_percentage(seed, p, o->compress_percentage, len, len, o->buffer_pattern, o->buffer_pattern_bytes);
 }
 
-static uint64_t fill_buffer(struct thread_data *td, void *p,
-			    unsigned int len)
-{
-	struct frand_state *fs = &td->verify_state;
-	struct thread_options *o = &td->o;
-
-	return fill_random_buf_percentage(fs, p, o->compress_percentage, len, len, o->buffer_pattern, o->buffer_pattern_bytes);
-}
-
 void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
 			 struct io_u *io_u, uint64_t seed, int use_seed)
 {
@@ -63,10 +54,13 @@ void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
 	if (!o->verify_pattern_bytes) {
 		dprint(FD_VERIFY, "fill random bytes len=%u\n", len);
 
-		if (use_seed)
-			__fill_buffer(o, seed, p, len);
-		else
-			io_u->rand_seed = fill_buffer(td, p, len);
+		if (!use_seed) {
+			seed = __rand(&td->verify_state);
+			if (sizeof(int) != sizeof(long *))
+				seed *= (unsigned long)__rand(&td->verify_state);
+		}
+		io_u->rand_seed = seed;
+		__fill_buffer(o, seed, p, len);
 		return;
 	}
 
diff --git a/zbd.c b/zbd.c
index 36de29fb..72352db0 100644
--- a/zbd.c
+++ b/zbd.c
@@ -156,8 +156,14 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 		z->wp + required > z->start + f->zbd_info->zone_size;
 }
 
-static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
+static void zone_lock(struct thread_data *td, struct fio_file *f, struct fio_zone_info *z)
 {
+	struct zoned_block_device_info *zbd = f->zbd_info;
+	uint32_t nz = z - zbd->zone_info;
+
+	/* A thread should never lock zones outside its working area. */
+	assert(f->min_zone <= nz && nz < f->max_zone);
+
 	/*
 	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
 	 * is changed or when io_u completes and zbd_put_io() executed.
@@ -291,6 +297,9 @@ static bool zbd_verify_sizes(void)
 					 (unsigned long long) new_end - f->file_offset);
 				f->io_size = new_end - f->file_offset;
 			}
+
+			f->min_zone = zbd_zone_idx(f, f->file_offset);
+			f->max_zone = zbd_zone_idx(f, f->file_offset + f->io_size);
 		}
 	}
 
@@ -366,9 +375,9 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		return -ENOMEM;
 
 	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutexattr_setpshared(&attr, true);
 	pthread_mutex_init(&zbd_info->mutex, &attr);
+	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	zbd_info->refcount = 1;
 	p = &zbd_info->zone_info[0];
 	for (i = 0; i < nr_zones; i++, p++) {
@@ -410,7 +419,6 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	int i, j, ret = 0;
 
 	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutexattr_setpshared(&attr, true);
 
 	zones = calloc(ZBD_REPORT_MAX_ZONES, sizeof(struct zbd_zone));
@@ -447,6 +455,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	if (!zbd_info)
 		goto out;
 	pthread_mutex_init(&zbd_info->mutex, &attr);
+	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	zbd_info->refcount = 1;
 	p = &zbd_info->zone_info[0];
 	for (offset = 0, j = 0; j < nr_zones;) {
@@ -539,8 +548,10 @@ static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 		return -EINVAL;
 	}
 
-	if (ret == 0)
+	if (ret == 0) {
 		f->zbd_info->model = zbd_model;
+		f->zbd_info->max_open_zones = td->o.max_open_zones;
+	}
 	return ret;
 }
 
@@ -614,6 +625,25 @@ int zbd_setup_files(struct thread_data *td)
 	if (!zbd_verify_bs())
 		return 1;
 
+	for_each_file(td, f, i) {
+		struct zoned_block_device_info *zbd = f->zbd_info;
+
+		if (!zbd)
+			continue;
+
+		zbd->max_open_zones = zbd->max_open_zones ?: ZBD_MAX_OPEN_ZONES;
+
+		if (td->o.max_open_zones > 0 &&
+		    zbd->max_open_zones != td->o.max_open_zones) {
+			log_err("Different 'max_open_zones' values\n");
+			return 1;
+		}
+		if (zbd->max_open_zones > ZBD_MAX_OPEN_ZONES) {
+			log_err("'max_open_zones' value is limited by %u\n", ZBD_MAX_OPEN_ZONES);
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
@@ -701,6 +731,7 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
 		sizeof(f->zbd_info->open_zones[0]));
 	f->zbd_info->num_open_zones--;
+	td->num_open_zones--;
 	f->zbd_info->zone_info[zone_idx].open = 0;
 }
 
@@ -731,7 +762,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 
 		if (!zbd_zone_swr(z))
 			continue;
-		zone_lock(td, z);
+		zone_lock(td, f, z);
 		if (all_zones) {
 			unsigned int i;
 
@@ -763,14 +794,20 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
  * Reset zbd_info.write_cnt, the counter that counts down towards the next
  * zone reset.
  */
-static void zbd_reset_write_cnt(const struct thread_data *td,
-				const struct fio_file *f)
+static void _zbd_reset_write_cnt(const struct thread_data *td,
+				 const struct fio_file *f)
 {
 	assert(0 <= td->o.zrf.u.f && td->o.zrf.u.f <= 1);
 
-	pthread_mutex_lock(&f->zbd_info->mutex);
 	f->zbd_info->write_cnt = td->o.zrf.u.f ?
 		min(1.0 / td->o.zrf.u.f, 0.0 + UINT_MAX) : UINT_MAX;
+}
+
+static void zbd_reset_write_cnt(const struct thread_data *td,
+				const struct fio_file *f)
+{
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	_zbd_reset_write_cnt(td, f);
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 }
 
@@ -784,7 +821,7 @@ static bool zbd_dec_and_reset_write_cnt(const struct thread_data *td,
 	if (f->zbd_info->write_cnt)
 		write_cnt = --f->zbd_info->write_cnt;
 	if (write_cnt == 0)
-		zbd_reset_write_cnt(td, f);
+		_zbd_reset_write_cnt(td, f);
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 
 	return write_cnt == 0;
@@ -854,14 +891,12 @@ static void zbd_init_swd(struct fio_file *f)
 void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_zone_info *zb, *ze;
-	uint32_t zone_idx_e;
 
 	if (!f->zbd_info || !td_write(td))
 		return;
 
-	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
-	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size);
-	ze = &f->zbd_info->zone_info[zone_idx_e];
+	zb = &f->zbd_info->zone_info[f->min_zone];
+	ze = &f->zbd_info->zone_info[f->max_zone];
 	zbd_init_swd(f);
 	/*
 	 * If data verification is enabled reset the affected zones before
@@ -880,8 +915,9 @@ static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
 	struct zoned_block_device_info *zbdi = f->zbd_info;
 	int i;
 
-	assert(td->o.max_open_zones <= ARRAY_SIZE(zbdi->open_zones));
-	assert(zbdi->num_open_zones <= td->o.max_open_zones);
+	assert(td->o.job_max_open_zones == 0 || td->num_open_zones <= td->o.job_max_open_zones);
+	assert(td->o.job_max_open_zones <= zbdi->max_open_zones);
+	assert(zbdi->num_open_zones <= zbdi->max_open_zones);
 
 	for (i = 0; i < zbdi->num_open_zones; i++)
 		if (zbdi->open_zones[i] == zone_idx)
@@ -914,18 +950,19 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
 	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
 		return false;
 
-	/* Zero means no limit */
-	if (!td->o.max_open_zones)
-		return true;
-
 	pthread_mutex_lock(&f->zbd_info->mutex);
 	if (is_zone_open(td, f, zone_idx))
 		goto out;
 	res = false;
-	if (f->zbd_info->num_open_zones >= td->o.max_open_zones)
+	/* Zero means no limit */
+	if (td->o.job_max_open_zones > 0 &&
+	    td->num_open_zones >= td->o.job_max_open_zones)
+		goto out;
+	if (f->zbd_info->num_open_zones >= f->zbd_info->max_open_zones)
 		goto out;
 	dprint(FD_ZBD, "%s: opening zone %d\n", f->file_name, zone_idx);
 	f->zbd_info->open_zones[f->zbd_info->num_open_zones++] = zone_idx;
+	td->num_open_zones++;
 	z->open = 1;
 	res = true;
 
@@ -952,7 +989,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 						      struct io_u *io_u)
 {
 	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
-	const struct fio_file *f = io_u->file;
+	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z;
 	unsigned int open_zone_idx = -1;
 	uint32_t zone_idx, new_zone_idx;
@@ -960,7 +997,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 	assert(is_valid_offset(f, io_u->offset));
 
-	if (td->o.max_open_zones) {
+	if (td->o.job_max_open_zones) {
 		/*
 		 * This statement accesses f->zbd_info->open_zones[] on purpose
 		 * without locking.
@@ -969,6 +1006,10 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	} else {
 		zone_idx = zbd_zone_idx(f, io_u->offset);
 	}
+	if (zone_idx < f->min_zone)
+		zone_idx = f->min_zone;
+	else if (zone_idx >= f->max_zone)
+		zone_idx = f->max_zone - 1;
 	dprint(FD_ZBD, "%s(%s): starting from zone %d (offset %lld, buflen %lld)\n",
 	       __func__, f->file_name, zone_idx, io_u->offset, io_u->buflen);
 
@@ -983,9 +1024,9 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		zone_lock(td, z);
+		zone_lock(td, f, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
-		if (td->o.max_open_zones == 0)
+		if (td->o.job_max_open_zones == 0)
 			goto examine_zone;
 		if (f->zbd_info->num_open_zones == 0) {
 			pthread_mutex_unlock(&f->zbd_info->mutex);
@@ -1009,8 +1050,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 			if (tmp_idx >= f->zbd_info->num_open_zones)
 				tmp_idx = 0;
 			tmpz = f->zbd_info->open_zones[tmp_idx];
-
-			if (is_valid_offset(f, f->zbd_info->zone_info[tmpz].start)) {
+			if (f->min_zone <= tmpz && tmpz < f->max_zone) {
 				open_zone_idx = tmp_idx;
 				goto found_candidate_zone;
 			}
@@ -1042,7 +1082,7 @@ examine_zone:
 	}
 	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
 	       zone_idx);
-	if (td->o.max_open_zones)
+	if (td->o.job_max_open_zones)
 		zbd_close_zone(td, f, open_zone_idx);
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 
@@ -1055,11 +1095,11 @@ examine_zone:
 		z++;
 		if (!is_valid_offset(f, z->start)) {
 			/* Wrap-around. */
-			zone_idx = zbd_zone_idx(f, f->file_offset);
+			zone_idx = f->min_zone;
 			z = &f->zbd_info->zone_info[zone_idx];
 		}
 		assert(is_valid_offset(f, z->start));
-		zone_lock(td, z);
+		zone_lock(td, f, z);
 		if (z->open)
 			continue;
 		if (zbd_open_zone(td, io_u, zone_idx))
@@ -1072,12 +1112,14 @@ examine_zone:
 	pthread_mutex_lock(&f->zbd_info->mutex);
 	for (i = 0; i < f->zbd_info->num_open_zones; i++) {
 		zone_idx = f->zbd_info->open_zones[i];
+		if (zone_idx < f->min_zone || zone_idx >= f->max_zone)
+			continue;
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 		pthread_mutex_unlock(&z->mutex);
 
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		zone_lock(td, z);
+		zone_lock(td, f, z);
 		if (z->wp + min_bs <= (z+1)->start)
 			goto out;
 		pthread_mutex_lock(&f->zbd_info->mutex);
@@ -1129,7 +1171,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	      struct fio_zone_info *zb, struct fio_zone_info *zl)
 {
 	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
-	const struct fio_file *f = io_u->file;
+	struct fio_file *f = io_u->file;
 	struct fio_zone_info *z1, *z2;
 	const struct fio_zone_info *const zf =
 		&f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
@@ -1140,7 +1182,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	 */
 	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
 		if (z1 < zl && z1->cond != ZBD_ZONE_COND_OFFLINE) {
-			zone_lock(td, z1);
+			zone_lock(td, f, z1);
 			if (z1->start + min_bs <= z1->wp)
 				return z1;
 			pthread_mutex_unlock(&z1->mutex);
@@ -1149,7 +1191,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 		}
 		if (td_random(td) && z2 >= zf &&
 		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
-			zone_lock(td, z2);
+			zone_lock(td, f, z2);
 			if (z2->start + min_bs <= z2->wp)
 				return z2;
 			pthread_mutex_unlock(&z2->mutex);
@@ -1401,7 +1443,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	zone_lock(td, zb);
+	zone_lock(td, f, zb);
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
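After these zbd.c changes, two limits gate the opening of a zone. The first is a per-job limit, `job_max_open_zones`, where 0 means unlimited. The second is a per-device limit, `zbd_info->max_open_zones`, where `zbd_setup_files()` replaces 0 with `ZBD_MAX_OPEN_ZONES`. Candidate zones are also clamped to the job's working area `[min_zone, max_zone)`. The sketch below models only that bookkeeping. The structs and the `clamp_zone()` and `try_open_zone()` helpers are hypothetical simplifications, and the already-open check and the locking done in `zbd_open_zone()` are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state (cf. struct zoned_block_device_info). */
struct dev_state {
	uint32_t max_open_zones;	/* global limit, already defaulted */
	uint32_t num_open_zones;	/* zones open across all jobs */
};

/* Hypothetical per-job state (cf. fio_file min/max_zone, thread_data). */
struct job_state {
	uint32_t min_zone, max_zone;	/* working area, assumes min < max */
	uint32_t job_max_open_zones;	/* 0 means no per-job limit */
	uint32_t num_open_zones;	/* zones opened by this job */
};

/* Clamp a start index into the working area, as in
 * zbd_convert_to_open_zone(). */
static uint32_t clamp_zone(const struct job_state *j, uint32_t idx)
{
	if (idx < j->min_zone)
		return j->min_zone;
	if (idx >= j->max_zone)
		return j->max_zone - 1;
	return idx;
}

/* Per-job limit first, then the device limit; both counters grow. */
static bool try_open_zone(struct dev_state *d, struct job_state *j)
{
	if (j->job_max_open_zones > 0 &&
	    j->num_open_zones >= j->job_max_open_zones)
		return false;
	if (d->num_open_zones >= d->max_open_zones)
		return false;
	d->num_open_zones++;
	j->num_open_zones++;
	return true;
}
```

For example, with a device limit of 3 and two jobs each limited to 2, the first job stops at 2 open zones. The second job can then open only one more before the device-wide limit is reached.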
diff --git a/zbd.h b/zbd.h
index e8dd3d6d..9c447af4 100644
--- a/zbd.h
+++ b/zbd.h
@@ -45,6 +45,8 @@ struct fio_zone_info {
 /**
  * zoned_block_device_info - zoned block device characteristics
  * @model: Device model.
+ * @max_open_zones: global limit on the number of simultaneously opened
+ *	sequential write zones.
  * @mutex: Protects the modifiable members in this structure (refcount and
  *		num_open_zones).
  * @zone_size: size of a single zone in units of 512 bytes
@@ -65,6 +67,7 @@ struct fio_zone_info {
  */
 struct zoned_block_device_info {
 	enum zbd_zoned_model	model;
+	uint32_t		max_open_zones;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		sectors_with_data;
diff --git a/zbd_types.h b/zbd_types.h
index 2f2f1324..d63c0d0a 100644
--- a/zbd_types.h
+++ b/zbd_types.h
@@ -8,7 +8,7 @@
 
 #include <inttypes.h>
 
-#define ZBD_MAX_OPEN_ZONES	128
+#define ZBD_MAX_OPEN_ZONES	4096
 
 /*
  * Zoned block device models.

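"zbd: make zbd_info->mutex non-recursive" moves `pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE)` after `pthread_mutex_init(&zbd_info->mutex, &attr)`. Mutex attributes are consulted only at initialization, so the recursive type now applies only to mutexes initialized later. Code that already holds `zbd_info->mutex` therefore must not lock it again. This is why `zbd_reset_write_cnt()` gains an unlocked helper, `_zbd_reset_write_cnt()`. Below is a sketch of that split. `struct wc_info` and the function names are hypothetical stand-ins, and the unlocked helper is renamed here to avoid a leading-underscore identifier.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

/* Hypothetical stand-in for the zbd_info fields involved. */
struct wc_info {
	pthread_mutex_t mutex;	/* default (non-recursive) mutex */
	unsigned int write_cnt;	/* writes left before the next zone reset */
};

/* Unlocked helper: caller must already hold info->mutex. */
static void reset_write_cnt_nolock(struct wc_info *info, double zrf)
{
	double n = zrf ? 1.0 / zrf : (double)UINT_MAX;

	assert(0 <= zrf && zrf <= 1);
	if (n > (double)UINT_MAX)
		n = (double)UINT_MAX;
	info->write_cnt = (unsigned int)n;
}

/* Locked wrapper for callers that do not hold the mutex. */
static void reset_write_cnt(struct wc_info *info, double zrf)
{
	pthread_mutex_lock(&info->mutex);
	reset_write_cnt_nolock(info, zrf);
	pthread_mutex_unlock(&info->mutex);
}

/*
 * Already holds the mutex, so it calls the unlocked helper. Calling
 * reset_write_cnt() here would relock a non-recursive mutex: undefined
 * for a default mutex and a self-deadlock for PTHREAD_MUTEX_NORMAL.
 */
static int dec_and_reset_write_cnt(struct wc_info *info, double zrf)
{
	unsigned int write_cnt = 0;

	pthread_mutex_lock(&info->mutex);
	if (info->write_cnt)
		write_cnt = --info->write_cnt;
	if (write_cnt == 0)
		reset_write_cnt_nolock(info, zrf);
	pthread_mutex_unlock(&info->mutex);
	return write_cnt == 0;
}
```

With `zrf = 0.25` the counter is refilled to 4 each time it reaches zero. With `zrf = 0` it is set to `UINT_MAX`, which effectively disables the periodic reset.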

* Recent changes (master)
@ 2020-05-21 12:00 Jens Axboe
From: Jens Axboe @ 2020-05-21 12:00 UTC
  To: fio

The following changes since commit 26949a8bbc544094740c61f79b4ea778978e7d8e:

  Merge branch '32-bit-fixes' of https://github.com/sitsofe/fio (2020-05-19 16:14:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0e21f5c6f64a73bcede20e1a29a885845e453b8e:

  Merge branch 'testing' of https://github.com/vincentkfu/fio (2020-05-20 14:19:12 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'issue-989' of https://github.com/Nordix/fio
      Merge branch 'testing' of https://github.com/vincentkfu/fio

Lars Ekman (1):
      Corrected scope of for-loop

Vincent Fu (6):
      t/jsonplus2csv_test: reduce file size
      t/run-fio-tests: better catch file errors
      t/latency_percentiles: run cmdprio_percentage tests only if root
      docs: update cmdprio_percentage with note about root user
      testing: use max-jobs to speed up testing
      t/zbd: improve error handling for test scripts

 HOWTO                                 |  3 ++-
 fio.1                                 |  3 ++-
 t/jsonplus2csv_test.py                |  5 ++++-
 t/latency_percentiles.py              |  5 +++--
 t/readonly.py                         |  1 +
 t/run-fio-tests.py                    | 37 +++++++++++++++++++++++------------
 t/steadystate_tests.py                |  2 +-
 t/strided.py                          |  1 +
 t/zbd/run-tests-against-regular-nullb |  4 ++--
 t/zbd/run-tests-against-zoned-nullb   |  2 +-
 tools/plot/fio2gnuplot                |  8 ++++----
 11 files changed, 45 insertions(+), 26 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 430c7b62..cd628552 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2040,7 +2040,8 @@ with the caveat that when used on the command line, they must come after the
     the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
     This option cannot be used with the `prio` or `prioclass` options. For this
     option to set the priority bit properly, NCQ priority must be supported and
-    enabled and :option:`direct`\=1 option must be used.
+    enabled and :option:`direct`\=1 option must be used. fio must also be run as
+    the root user.
 
 .. option:: fixedbufs : [io_uring]
 
diff --git a/fio.1 b/fio.1
index a2379f98..9e9e1b1a 100644
--- a/fio.1
+++ b/fio.1
@@ -1806,7 +1806,8 @@ Set the percentage of I/O that will be issued with higher priority by setting
 the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
 This option cannot be used with the `prio` or `prioclass` options. For this
 option to set the priority bit properly, NCQ priority must be supported and
-enabled and `direct=1' option must be used.
+enabled and `direct=1' option must be used. fio must also be run as the root
+user.
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
diff --git a/t/jsonplus2csv_test.py b/t/jsonplus2csv_test.py
index 2b34ef25..f01a9f32 100755
--- a/t/jsonplus2csv_test.py
+++ b/t/jsonplus2csv_test.py
@@ -44,6 +44,7 @@ def run_fio(fio):
         fio     path to fio executable.
     """
 
+# We need an async ioengine to get submission latencies
     if platform.system() == 'Linux':
         aio = 'libaio'
     elif platform.system() == 'Windows':
@@ -52,13 +53,14 @@ def run_fio(fio):
         aio = 'posixaio'
 
     fio_args = [
+        "--max-jobs=4",
         "--output=fio-output.json",
         "--output-format=json+",
         "--filename=fio_jsonplus_clat2csv.test",
         "--ioengine=" + aio,
         "--time_based",
         "--runtime=3s",
-        "--size=1G",
+        "--size=1M",
         "--slat_percentiles=1",
         "--clat_percentiles=1",
         "--lat_percentiles=1",
@@ -87,6 +89,7 @@ def check_output(fio_output, script_path):
     """
 
     if fio_output.returncode != 0:
+        print("ERROR: fio run failed")
         return False
 
     if platform.system() == 'Windows':
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index 5cdd49cf..6ce4579a 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -109,6 +109,7 @@ class FioLatTest():
         """Run a test."""
 
         fio_args = [
+            "--max-jobs=16",
             "--name=latency",
             "--randrepeat=0",
             "--norandommap",
@@ -1303,9 +1304,9 @@ def main():
            (args.run_only and test['test_id'] not in args.run_only):
             skipped = skipped + 1
             outcome = 'SKIPPED (User request)'
-        elif platform.system() != 'Linux' and 'cmdprio_percentage' in test:
+        elif (platform.system() != 'Linux' or os.geteuid() != 0) and 'cmdprio_percentage' in test:
             skipped = skipped + 1
-            outcome = 'SKIPPED (Linux required for cmdprio_percentage tests)'
+            outcome = 'SKIPPED (Linux root required for cmdprio_percentage tests)'
         else:
             test_obj = test['test_obj'](artifact_root, test, args.debug)
             status = test_obj.run_fio(fio)
diff --git a/t/readonly.py b/t/readonly.py
index 43686c9c..464847c6 100755
--- a/t/readonly.py
+++ b/t/readonly.py
@@ -36,6 +36,7 @@ def parse_args():
 
 def run_fio(fio, test, index):
     fio_args = [
+                "--max-jobs=16",
                 "--name=readonly",
                 "--ioengine=null",
                 "--time_based",
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 8e326ed5..763e0103 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -226,6 +226,7 @@ class FioJobTest(FioExeTest):
         self.json_data = None
         self.fio_output = "{0}.output".format(os.path.basename(self.fio_job))
         self.fio_args = [
+            "--max-jobs=16",
             "--output-format={0}".format(self.output_format),
             "--output={0}".format(self.fio_output),
             self.fio_job,
@@ -273,6 +274,20 @@ class FioJobTest(FioExeTest):
         else:
             logging.debug("Test %d: precondition step failed", self.testnum)
 
+    @classmethod
+    def get_file(cls, filename):
+        """Safely read a file."""
+        file_data = ''
+        success = True
+
+        try:
+            with open(filename, "r") as output_file:
+                file_data = output_file.read()
+        except OSError:
+            success = False
+
+        return file_data, success
+
     def check_result(self):
         """Check fio job results."""
 
@@ -289,10 +304,8 @@ class FioJobTest(FioExeTest):
         if 'json' not in self.output_format:
             return
 
-        try:
-            with open(os.path.join(self.test_dir, self.fio_output), "r") as output_file:
-                file_data = output_file.read()
-        except EnvironmentError:
+        file_data, success = self.get_file(os.path.join(self.test_dir, self.fio_output))
+        if not success:
             self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
             self.passed = False
             return
@@ -457,11 +470,9 @@ class Requirements(object):
         Requirements._linux = platform.system() == "Linux"
 
         if Requirements._linux:
-            try:
-                config_file = os.path.join(fio_root, "config-host.h")
-                with open(config_file, "r") as config:
-                    contents = config.read()
-            except Exception:
+            config_file = os.path.join(fio_root, "config-host.h")
+            contents, success = FioJobTest.get_file(config_file)
+            if not success:
                 print("Unable to open {0} to check requirements".format(config_file))
                 Requirements._zbd = True
             else:
@@ -899,10 +910,10 @@ def main():
         else:
             result = "FAILED: {0}".format(test.failure_reason)
             failed = failed + 1
-            with open(test.stderr_file, "r") as stderr_file:
-                logging.debug("Test %d: stderr:\n%s", config['test_id'], stderr_file.read())
-            with open(test.stdout_file, "r") as stdout_file:
-                logging.debug("Test %d: stdout:\n%s", config['test_id'], stdout_file.read())
+            contents, _ = FioJobTest.get_file(test.stderr_file)
+            logging.debug("Test %d: stderr:\n%s", config['test_id'], contents)
+            contents, _ = FioJobTest.get_file(test.stdout_file)
+            logging.debug("Test %d: stdout:\n%s", config['test_id'], contents)
         print("Test {0} {1}".format(config['test_id'], result))
 
     print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index b55a67ac..e99b655c 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -122,7 +122,7 @@ if __name__ == '__main__':
     for job in reads:
 
         tf = "steadystate_job{0}.json".format(jobnum)
-        parameters = [ "--name=job{0}".format(jobnum) ]
+        parameters = [ "--max-jobs=16", "--name=job{0}".format(jobnum) ]
         parameters.extend([ "--thread",
                             "--output-format=json",
                             "--output={0}".format(tf),
diff --git a/t/strided.py b/t/strided.py
index aac15d10..6d34be8a 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -52,6 +52,7 @@ def parse_args():
 def run_fio(fio, test, index):
     filename = "strided"
     fio_args = [
+                "--max-jobs=16",
                 "--name=strided",
                 "--zonemode=strided",
                 "--log_offset=1",
diff --git a/t/zbd/run-tests-against-regular-nullb b/t/zbd/run-tests-against-regular-nullb
index 0f6e4b66..5b7b4009 100755
--- a/t/zbd/run-tests-against-regular-nullb
+++ b/t/zbd/run-tests-against-regular-nullb
@@ -8,7 +8,7 @@ scriptdir="$(cd "$(dirname "$0")" && pwd)"
 
 for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
 modprobe -r null_blk
-modprobe null_blk nr_devices=0 || return $?
+modprobe null_blk nr_devices=0 || exit $?
 for d in /sys/kernel/config/nullb/*; do
     [ -d "$d" ] && rmdir "$d"
 done
@@ -22,6 +22,6 @@ modprobe null_blk nr_devices=0 &&
     echo 4096 > blocksize &&
     echo 1024 > size &&
     echo 1 > memory_backed &&
-    echo 1 > power
+    echo 1 > power || exit $?
 
 "${scriptdir}"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
index 0952011c..53aee3e8 100755
--- a/t/zbd/run-tests-against-zoned-nullb
+++ b/t/zbd/run-tests-against-zoned-nullb
@@ -8,7 +8,7 @@ scriptdir="$(cd "$(dirname "$0")" && pwd)"
 
 for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
 modprobe -r null_blk
-modprobe null_blk nr_devices=0 || return $?
+modprobe null_blk nr_devices=0 || exit $?
 for d in /sys/kernel/config/nullb/*; do
     [ -d "$d" ] && rmdir "$d"
 done
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index cc4ea4c7..69aa791e 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -93,10 +93,10 @@ set style line 1 lt 1 lw 3 pt 3 linecolor rgb "green"
 		compare_smooth.write("plot %s w l ls 1 ti 'Global average value (%.2f)'" % (global_avg,global_avg));
 		compare_trend.write("plot %s w l ls 1 ti 'Global average value (%.2f)'" % (global_avg,global_avg));
 
-		pos=0
-		# Let's create a temporary file for each selected fio file
-		for file in fio_data_file:
-			tmp_filename = "gnuplot_temp_file.%d" % pos
+	pos=0
+	# Let's create a temporary file for each selected fio file
+	for file in fio_data_file:
+		tmp_filename = "gnuplot_temp_file.%d" % pos
 
 		# Plotting comparing graphs doesn't have a meaning unless if there is at least 2 traces
 		if len(fio_data_file) > 1:


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2020-05-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f09a7773f5821d5b89428419dcef1987ced39b67:

  Allow more flexibility in zone start and span (2020-05-18 16:56:00 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 26949a8bbc544094740c61f79b4ea778978e7d8e:

  Merge branch '32-bit-fixes' of https://github.com/sitsofe/fio (2020-05-19 16:14:19 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'stephen/rate-ull' of https://github.com/sbates130272/fio
      Merge branch '32-bit-fixes' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      Fix 32-bit/LLP64 platform truncation issues

Stephen Bates (1):
      rate: Convert the rate and rate_min options to FIO_OPTS_ULL

 backend.c  | 10 +++++-----
 fio.h      |  2 +-
 init.c     |  6 +++---
 lib/rand.c |  4 ++--
 lib/rand.h |  2 +-
 options.c  |  4 ++--
 6 files changed, 14 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index feb34e51..f519728c 100644
--- a/backend.c
+++ b/backend.c
@@ -134,8 +134,8 @@ static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 	unsigned long long bytes = 0;
 	unsigned long iops = 0;
 	unsigned long spent;
-	unsigned long rate;
-	unsigned int ratemin = 0;
+	unsigned long long rate;
+	unsigned long long ratemin = 0;
 	unsigned int rate_iops = 0;
 	unsigned int rate_iops_min = 0;
 
@@ -169,7 +169,7 @@ static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 			 * check bandwidth specified rate
 			 */
 			if (bytes < td->rate_bytes[ddir]) {
-				log_err("%s: rate_min=%uB/s not met, only transferred %lluB\n",
+				log_err("%s: rate_min=%lluB/s not met, only transferred %lluB\n",
 					td->o.name, ratemin, bytes);
 				return true;
 			} else {
@@ -180,7 +180,7 @@ static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 
 				if (rate < ratemin ||
 				    bytes < td->rate_bytes[ddir]) {
-					log_err("%s: rate_min=%uB/s not met, got %luB/s\n",
+					log_err("%s: rate_min=%lluB/s not met, got %lluB/s\n",
 						td->o.name, ratemin, rate);
 					return true;
 				}
@@ -201,7 +201,7 @@ static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 
 				if (rate < rate_iops_min ||
 				    iops < td->rate_blocks[ddir]) {
-					log_err("%s: rate_iops_min=%u not met, got %lu IOPS\n",
+					log_err("%s: rate_iops_min=%u not met, got %llu IOPS\n",
 						td->o.name, rate_iops_min, rate);
 					return true;
 				}
diff --git a/fio.h b/fio.h
index bbf057c1..dbdfdf86 100644
--- a/fio.h
+++ b/fio.h
@@ -319,7 +319,7 @@ struct thread_data {
 	 */
 	uint64_t rate_bps[DDIR_RWDIR_CNT];
 	uint64_t rate_next_io_time[DDIR_RWDIR_CNT];
-	unsigned long rate_bytes[DDIR_RWDIR_CNT];
+	unsigned long long rate_bytes[DDIR_RWDIR_CNT];
 	unsigned long rate_blocks[DDIR_RWDIR_CNT];
 	unsigned long long rate_io_issue_bytes[DDIR_RWDIR_CNT];
 	struct timespec lastrate[DDIR_RWDIR_CNT];
diff --git a/init.c b/init.c
index b5315334..e220c323 100644
--- a/init.c
+++ b/init.c
@@ -993,9 +993,9 @@ void td_fill_verify_state_seed(struct thread_data *td)
 
 static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
-	unsigned int read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
-	unsigned int write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
-	unsigned int trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
+	uint64_t read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
+	uint64_t write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
+	uint64_t trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
 	int i;
 
 	/*
diff --git a/lib/rand.c b/lib/rand.c
index 69acb06c..5eb6e60a 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -85,12 +85,12 @@ void init_rand(struct frand_state *state, bool use64)
 		__init_rand64(&state->state64, 1);
 }
 
-void init_rand_seed(struct frand_state *state, unsigned int seed, bool use64)
+void init_rand_seed(struct frand_state *state, uint64_t seed, bool use64)
 {
 	state->use64 = use64;
 
 	if (!use64)
-		__init_rand32(&state->state32, seed);
+		__init_rand32(&state->state32, (unsigned int) seed);
 	else
 		__init_rand64(&state->state64, seed);
 }
diff --git a/lib/rand.h b/lib/rand.h
index 95d4f6d4..2ccc1b37 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -149,7 +149,7 @@ static inline uint64_t rand_between(struct frand_state *state, uint64_t start,
 }
 
 extern void init_rand(struct frand_state *, bool);
-extern void init_rand_seed(struct frand_state *, unsigned int seed, bool);
+extern void init_rand_seed(struct frand_state *, uint64_t seed, bool);
 extern void __fill_random_buf(void *buf, unsigned int len, uint64_t seed);
 extern uint64_t fill_random_buf(struct frand_state *, void *buf, unsigned int len);
 extern void __fill_random_buf_percentage(uint64_t, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
diff --git a/options.c b/options.c
index 2372c042..b18cea33 100644
--- a/options.c
+++ b/options.c
@@ -3537,7 +3537,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "rate",
 		.lname	= "I/O rate",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_ULL,
 		.off1	= offsetof(struct thread_options, rate[DDIR_READ]),
 		.off2	= offsetof(struct thread_options, rate[DDIR_WRITE]),
 		.off3	= offsetof(struct thread_options, rate[DDIR_TRIM]),
@@ -3549,7 +3549,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rate_min",
 		.alias	= "ratemin",
 		.lname	= "I/O min rate",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_ULL,
 		.off1	= offsetof(struct thread_options, ratemin[DDIR_READ]),
 		.off2	= offsetof(struct thread_options, ratemin[DDIR_WRITE]),
 		.off3	= offsetof(struct thread_options, ratemin[DDIR_TRIM]),


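The rate/rate_min conversions above matter on LLP64 platforms such as 64-bit Windows. There `unsigned long` is only 32 bits wide, so a byte rate above 4 GiB/s wraps silently. `ratemin` was an `unsigned int`, which is 32 bits on common platforms.

The minimal sketch below models that truncation portably, using `uint32_t` as a stand-in for a 32-bit `unsigned long`. The helper names are hypothetical and not part of fio.

```c
#include <stdint.h>

/*
 * Hypothetical helper: models storing a 64-bit value in a 32-bit
 * "unsigned long", as happens on LLP64 platforms.
 */
static uint64_t store_in_llp64_ulong(uint64_t value)
{
	uint32_t llp64_ulong = (uint32_t) value;	/* high 32 bits are dropped */

	return llp64_ulong;
}

/* Returns nonzero if the rate survives the round trip unchanged. */
static int rate_fits_llp64_ulong(uint64_t rate)
{
	return store_in_llp64_ulong(rate) == rate;
}
```

For example, 5 GiB/s (5 << 30 bytes) comes back as 1 GiB/s after truncation. That silent wrap is why the rate fields and format strings move to `unsigned long long` / `%llu`.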

* Recent changes (master)
@ 2020-05-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 723f9bc5fe77ae8bedc08ed3ec3a25426b48c096:

  Merge branch 'rados' of https://github.com/vincentkfu/fio (2020-05-14 11:47:17 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f09a7773f5821d5b89428419dcef1987ced39b67:

  Allow more flexibility in zone start and span (2020-05-18 16:56:00 -0600)

----------------------------------------------------------------
Damien Le Moal (7):
      iolog: Fix write_iolog_close()
      zbd: Fix potential deadlock on read operations
      zbd: Fix read with verify
      zbd: Optimize zbd_file_reset()
      zbd: Rename zbd_init()
      io_u: Optimize set_rw_ddir()
      t/zbd: Use max-jobs=16 option

Pierre Labat (1):
      Allow more flexibility in zone start and span

 filesetup.c            |  4 ++--
 io_u.c                 |  3 ++-
 iolog.c                |  3 +++
 t/zbd/test-zbd-support |  4 ++--
 zbd.c                  | 28 +++++++++++++---------------
 zbd.h                  |  8 +++++++-
 6 files changed, 29 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 8a4091fc..49c54b81 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1268,7 +1268,7 @@ done:
 	td_restore_runstate(td, old_state);
 
 	if (td->o.zone_mode == ZONE_MODE_ZBD) {
-		err = zbd_init(td);
+		err = zbd_setup_files(td);
 		if (err)
 			goto err_out;
 	}
@@ -1469,7 +1469,7 @@ void close_and_free_files(struct thread_data *td)
 			td_io_unlink_file(td, f);
 		}
 
-		zbd_free_zone_info(f);
+		zbd_close_file(f);
 
 		if (use_free)
 			free(f->file_name);
diff --git a/io_u.c b/io_u.c
index 18e94617..aa8808b8 100644
--- a/io_u.c
+++ b/io_u.c
@@ -746,7 +746,8 @@ static void set_rw_ddir(struct thread_data *td, struct io_u *io_u)
 {
 	enum fio_ddir ddir = get_rw_ddir(td);
 
-	ddir = zbd_adjust_ddir(td, io_u, ddir);
+	if (td->o.zone_mode == ZONE_MODE_ZBD)
+		ddir = zbd_adjust_ddir(td, io_u, ddir);
 
 	if (td_trimwrite(td)) {
 		struct fio_file *f = io_u->file;
diff --git a/iolog.c b/iolog.c
index 917a446c..4a79fc46 100644
--- a/iolog.c
+++ b/iolog.c
@@ -342,6 +342,9 @@ void trim_io_piece(const struct io_u *io_u)
 
 void write_iolog_close(struct thread_data *td)
 {
+	if (!td->iolog_f)
+		return;
+
 	fflush(td->iolog_f);
 	fclose(td->iolog_f);
 	free(td->iolog_buf);
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index be889f34..de05f438 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -93,8 +93,8 @@ run_fio() {
 
     fio=$(dirname "$0")/../../fio
 
-    opts=("--aux-path=/tmp" "--allow_file_create=0" \
-			    "--significant_figures=10" "$@")
+    opts=("--max-jobs=16" "--aux-path=/tmp" "--allow_file_create=0" \
+	  "--significant_figures=10" "$@")
     opts+=(${var_opts[@]})
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
diff --git a/zbd.c b/zbd.c
index 8dc3c397..36de29fb 100644
--- a/zbd.c
+++ b/zbd.c
@@ -262,7 +262,8 @@ static bool zbd_verify_sizes(void)
 
 			zone_idx = zbd_zone_idx(f, f->file_offset);
 			z = &f->zbd_info->zone_info[zone_idx];
-			if (f->file_offset != z->start) {
+			if ((f->file_offset != z->start) &&
+			    (td->o.td_ddir != TD_DDIR_READ)) {
 				new_offset = (z+1)->start;
 				if (new_offset >= f->file_offset + f->io_size) {
 					log_info("%s: io_size must be at least one zone\n",
@@ -278,7 +279,8 @@ static bool zbd_verify_sizes(void)
 			zone_idx = zbd_zone_idx(f, f->file_offset + f->io_size);
 			z = &f->zbd_info->zone_info[zone_idx];
 			new_end = z->start;
-			if (f->file_offset + f->io_size != new_end) {
+			if ((td->o.td_ddir != TD_DDIR_READ) &&
+			    (f->file_offset + f->io_size != new_end)) {
 				if (new_end <= f->file_offset) {
 					log_info("%s: io_size must be at least one zone\n",
 						 f->file_name);
@@ -546,8 +548,7 @@ void zbd_free_zone_info(struct fio_file *f)
 {
 	uint32_t refcount;
 
-	if (!f->zbd_info)
-		return;
+	assert(f->zbd_info);
 
 	pthread_mutex_lock(&f->zbd_info->mutex);
 	refcount = --f->zbd_info->refcount;
@@ -592,7 +593,7 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 	return ret;
 }
 
-int zbd_init(struct thread_data *td)
+int zbd_setup_files(struct thread_data *td)
 {
 	struct fio_file *f;
 	int i;
@@ -743,8 +744,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 
 			reset_wp = z->wp != z->start;
 		} else {
-			reset_wp = (td->o.td_ddir & TD_DDIR_WRITE) &&
-					z->wp % min_bs != 0;
+			reset_wp = z->wp % min_bs != 0;
 		}
 		if (reset_wp) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
@@ -856,7 +856,7 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	struct fio_zone_info *zb, *ze;
 	uint32_t zone_idx_e;
 
-	if (!f->zbd_info)
+	if (!f->zbd_info || !td_write(td))
 		return;
 
 	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
@@ -869,7 +869,6 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	 * writing data, which causes data loss.
 	 */
 	zbd_reset_zones(td, f, zb, ze, td->o.verify != VERIFY_NONE &&
-			(td->o.td_ddir & TD_DDIR_WRITE) &&
 			td->runstate != TD_VERIFYING);
 	zbd_reset_write_cnt(td, f);
 }
@@ -1141,7 +1140,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	 */
 	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
 		if (z1 < zl && z1->cond != ZBD_ZONE_COND_OFFLINE) {
-			pthread_mutex_lock(&z1->mutex);
+			zone_lock(td, z1);
 			if (z1->start + min_bs <= z1->wp)
 				return z1;
 			pthread_mutex_unlock(&z1->mutex);
@@ -1150,7 +1149,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 		}
 		if (td_random(td) && z2 >= zf &&
 		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
-			pthread_mutex_lock(&z2->mutex);
+			zone_lock(td, z2);
 			if (z2->start + min_bs <= z2->wp)
 				return z2;
 			pthread_mutex_unlock(&z2->mutex);
@@ -1349,9 +1348,7 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 	 * devices with all empty zones. Overwrite the first I/O direction as
 	 * write to make sure data to read exists.
 	 */
-	if (td->o.zone_mode != ZONE_MODE_ZBD ||
-	    ddir != DDIR_READ ||
-	    !td_rw(td))
+	if (ddir != DDIR_READ || !td_rw(td))
 		return ddir;
 
 	if (io_u->file->zbd_info->sectors_with_data ||
@@ -1409,7 +1406,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	switch (io_u->ddir) {
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING) {
-			zb = zbd_replay_write_order(td, io_u, zb);
+			if (td_write(td))
+				zb = zbd_replay_write_order(td, io_u, zb);
 			goto accept;
 		}
 		/*
diff --git a/zbd.h b/zbd.h
index 5a660399..e8dd3d6d 100644
--- a/zbd.h
+++ b/zbd.h
@@ -77,8 +77,8 @@ struct zoned_block_device_info {
 	struct fio_zone_info	zone_info[0];
 };
 
+int zbd_setup_files(struct thread_data *td);
 void zbd_free_zone_info(struct fio_file *f);
-int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
 void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u);
@@ -87,6 +87,12 @@ enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 
+static inline void zbd_close_file(struct fio_file *f)
+{
+	if (f->zbd_info)
+		zbd_free_zone_info(f);
+}
+
 static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
 {
 	if (io_u->zbd_queue_io) {


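With this change, zbd_free_zone_info() asserts that zone information exists. The NULL check moves into the new zbd_close_file() wrapper, which close_and_free_files() now calls.

The sketch below shows that "caller checks, callee asserts" split for a refcounted shared structure. It is loosely modelled on the hunks above but uses simplified, hypothetical types rather than fio's real `struct fio_file` and `struct zoned_block_device_info`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the shared zone info and a file using it. */
struct shared_info {
	pthread_mutex_t mutex;
	unsigned int refcount;
};

struct file {
	struct shared_info *info;
};

/* Callee requires a valid pointer, mirroring the assert() added above. */
static void free_shared_info(struct file *f)
{
	unsigned int refcount;

	assert(f->info);

	pthread_mutex_lock(&f->info->mutex);
	refcount = --f->info->refcount;
	pthread_mutex_unlock(&f->info->mutex);

	/* The last user tears the shared structure down. */
	if (!refcount) {
		pthread_mutex_destroy(&f->info->mutex);
		free(f->info);
	}
	f->info = NULL;
}

/* Caller-side wrapper: files without shared info are simply skipped. */
static void close_file(struct file *f)
{
	if (f->info)
		free_shared_info(f);
}
```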

* Recent changes (master)
@ 2020-05-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3ac77f9f5f7e3f2a5f9dababae810565f4c72eb7:

  Merge branch 'helper-thread-select' of https://github.com/vincentkfu/fio (2020-05-13 08:10:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 723f9bc5fe77ae8bedc08ed3ec3a25426b48c096:

  Merge branch 'rados' of https://github.com/vincentkfu/fio (2020-05-14 11:47:17 -0600)

----------------------------------------------------------------
Adam Kupczyk (1):
      engines/rados: Added waiting for completion on cleanup.

Jens Axboe (2):
      Merge branch 'rados-cleanup-wait' of https://github.com/aclamk/fio
      Merge branch 'rados' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      engines/rados: fix build issue with thread_cond_t vs pthread_cond_t

 engines/rados.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rados.c b/engines/rados.c
index 30fcebb5..d4413427 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -12,13 +12,15 @@
 #include "../optgroup.h"
 
 struct rados_data {
-        rados_t cluster;
-        rados_ioctx_t io_ctx;
-        struct io_u **aio_events;
-        bool connected;
-        pthread_mutex_t completed_lock;
-        pthread_cond_t completed_more_io;
-        struct flist_head completed_operations;
+	rados_t cluster;
+	rados_ioctx_t io_ctx;
+	struct io_u **aio_events;
+	bool connected;
+	pthread_mutex_t completed_lock;
+	pthread_cond_t completed_more_io;
+	struct flist_head completed_operations;
+	uint64_t ops_scheduled;
+	uint64_t ops_completed;
 };
 
 struct fio_rados_iou {
@@ -101,6 +103,8 @@ static int _fio_setup_rados_data(struct thread_data *td,
 	pthread_mutex_init(&rados->completed_lock, NULL);
 	pthread_cond_init(&rados->completed_more_io, NULL);
 	INIT_FLIST_HEAD(&rados->completed_operations);
+	rados->ops_scheduled = 0;
+	rados->ops_completed = 0;
 	*rados_data_ptr = rados;
 	return 0;
 
@@ -227,8 +231,11 @@ static void _fio_rados_disconnect(struct rados_data *rados)
 static void fio_rados_cleanup(struct thread_data *td)
 {
 	struct rados_data *rados = td->io_ops_data;
-
 	if (rados) {
+		pthread_mutex_lock(&rados->completed_lock);
+		while (rados->ops_scheduled != rados->ops_completed)
+			pthread_cond_wait(&rados->completed_more_io, &rados->completed_lock);
+		pthread_mutex_unlock(&rados->completed_lock);
 		_fio_rados_rm_objects(td, rados);
 		_fio_rados_disconnect(rados);
 		free(rados->aio_events);
@@ -244,6 +251,7 @@ static void complete_callback(rados_completion_t cb, void *arg)
 	assert(rados_aio_is_complete(fri->completion));
 	pthread_mutex_lock(&rados->completed_lock);
 	flist_add_tail(&fri->list, &rados->completed_operations);
+	rados->ops_completed++;
 	pthread_mutex_unlock(&rados->completed_lock);
 	pthread_cond_signal(&rados->completed_more_io);
 }
@@ -272,6 +280,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 			log_err("rados_write failed.\n");
 			goto failed_comp;
 		}
+		rados->ops_scheduled++;
 		return FIO_Q_QUEUED;
 	} else if (io_u->ddir == DDIR_READ) {
 		r = rados_aio_create_completion(fri, complete_callback,
@@ -286,6 +295,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 			log_err("rados_aio_read failed.\n");
 			goto failed_comp;
 		}
+		rados->ops_scheduled++;
 		return FIO_Q_QUEUED;
 	} else if (io_u->ddir == DDIR_TRIM) {
 		r = rados_aio_create_completion(fri, complete_callback,
@@ -307,6 +317,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 			log_err("rados_aio_write_op_operate failed.\n");
 			goto failed_write_op;
 		}
+		rados->ops_scheduled++;
 		return FIO_Q_QUEUED;
 	 }
 


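The rados cleanup change is a drain pattern. The engine counts submissions and completions, and the completion callback signals a condition variable. Cleanup then waits until the two counters match before removing objects and disconnecting.

The standalone sketch below uses hypothetical names, not the rados engine's own functions. Unlike the patch, it also takes the lock when incrementing the submission count.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical in-flight tracker modelled on the rados_data fields above. */
struct inflight {
	pthread_mutex_t lock;
	pthread_cond_t more_done;
	uint64_t scheduled;
	uint64_t completed;
};

static void inflight_init(struct inflight *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->more_done, NULL);
	t->scheduled = 0;
	t->completed = 0;
}

/* Submission path: count an op once it has been queued successfully. */
static void inflight_submit(struct inflight *t)
{
	pthread_mutex_lock(&t->lock);
	t->scheduled++;
	pthread_mutex_unlock(&t->lock);
}

/* Completion callback: count the op and wake any waiter. */
static void inflight_complete(struct inflight *t)
{
	pthread_mutex_lock(&t->lock);
	t->completed++;
	pthread_mutex_unlock(&t->lock);
	pthread_cond_signal(&t->more_done);
}

/* Cleanup path: block until every submitted op has completed. */
static void inflight_drain(struct inflight *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->scheduled != t->completed)
		pthread_cond_wait(&t->more_done, &t->lock);
	pthread_mutex_unlock(&t->lock);
}
```

The `while` loop, rather than a single `if`, guards against spurious wakeups from pthread_cond_wait().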

* Recent changes (master)
@ 2020-05-14 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9bc878ed59deb6c4a031f4f3a8d1ff3bbdbbd14d:

  Merge branch 'btrace2fio' of https://github.com/liu-song-6/fio (2020-05-11 12:09:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3ac77f9f5f7e3f2a5f9dababae810565f4c72eb7:

  Merge branch 'helper-thread-select' of https://github.com/vincentkfu/fio (2020-05-13 08:10:32 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'helper-thread-select' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      helper_thread: better handle select() return value

 helper_thread.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/helper_thread.c b/helper_thread.c
index ad1a08f3..a2fb7c29 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -194,10 +194,11 @@ static void *helper_thread_main(void *data)
 			FD_SET(hd->pipe[0], &rfds);
 			FD_ZERO(&efds);
 			FD_SET(hd->pipe[0], &efds);
-			ret = select(1, &rfds, NULL, &efds, &timeout);
-			if (ret < 0)
+			if (select(1, &rfds, NULL, &efds, &timeout) < 0) {
 				log_err("fio: select() call in helper thread failed: %s",
 					strerror(errno));
+				ret = 1;
+			}
 			if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) <
 			    0)
 				action = 0;


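The helper thread waits on its pipe with select() and then reads an action byte. After this fix, a negative return from select() is treated as fatal by setting `ret`.

The sketch below shows that wait-then-read pattern with hypothetical helper names. It distinguishes timeout, readiness and error. It passes the highest descriptor plus one as `nfds`, since select() only examines descriptors numbered below `nfds`.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to msec milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 if select() failed.
 */
static int wait_readable(int fd, unsigned int msec)
{
	fd_set rfds;
	struct timeval timeout = {
		.tv_sec  = msec / 1000,
		.tv_usec = (msec % 1000) * 1000,
	};
	int ret;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	/* nfds must be one more than the highest descriptor in the sets */
	ret = select(fd + 1, &rfds, NULL, NULL, &timeout);
	if (ret < 0)
		return -1;	/* caller decides whether this is fatal */

	return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/*
 * Wait for one action byte. Returns 1 if a byte was read, 0 on timeout,
 * -1 if select() or read() failed.
 */
static int read_action(int fd, char *action, unsigned int msec)
{
	int ret = wait_readable(fd, msec);

	if (ret <= 0)
		return ret;

	return read(fd, action, 1) == 1 ? 1 : -1;
}
```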

* Recent changes (master)
@ 2020-05-12 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-05-12 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b2ed1c4a07c6afc4ae90482ca5f70026a4bc34e6:

  Merge branch 'helper_thread_test' of https://github.com/vincentkfu/fio (2020-04-29 09:05:11 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9bc878ed59deb6c4a031f4f3a8d1ff3bbdbbd14d:

  Merge branch 'btrace2fio' of https://github.com/liu-song-6/fio (2020-05-11 12:09:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'btrace2fio' of https://github.com/liu-song-6/fio

Song Liu (1):
      btrace2fio: create separate jobs for pid with both read/write and trim

 t/btrace2fio.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/t/btrace2fio.c b/t/btrace2fio.c
index a8a9d629..6dca39cb 100644
--- a/t/btrace2fio.c
+++ b/t/btrace2fio.c
@@ -645,7 +645,8 @@ static void __output_p_ascii(struct btrace_pid *p, unsigned long *ios)
 	printf("\n");
 }
 
-static int __output_p_fio(struct btrace_pid *p, unsigned long *ios)
+static int __output_p_fio(struct btrace_pid *p, unsigned long *ios,
+			  const char *name_postfix)
 {
 	struct btrace_out *o = &p->o;
 	unsigned long total;
@@ -654,15 +655,30 @@ static int __output_p_fio(struct btrace_pid *p, unsigned long *ios)
 	int i, j;
 
 	if ((o->ios[0] + o->ios[1]) && o->ios[2]) {
-		log_err("fio: trace has both read/write and trim\n");
-		return 1;
+		unsigned long ios_bak[DDIR_RWDIR_CNT];
+
+		memcpy(ios_bak, o->ios, DDIR_RWDIR_CNT * sizeof(unsigned long));
+
+		/* create job for read/write */
+		o->ios[2] = 0;
+		__output_p_fio(p, ios, "");
+		o->ios[2] = ios_bak[2];
+
+		/* create job for trim */
+		o->ios[0] = 0;
+		o->ios[1] = 0;
+		__output_p_fio(p, ios, "_trim");
+		o->ios[0] = ios_bak[0];
+		o->ios[1] = ios_bak[1];
+
+		return 0;
 	}
 	if (!p->nr_files) {
 		log_err("fio: no devices found\n");
 		return 1;
 	}
 
-	printf("[pid%u", p->pid);
+	printf("[pid%u%s", p->pid, name_postfix);
 	if (p->nr_merge_pids)
 		for (i = 0; i < p->nr_merge_pids; i++)
 			printf(",pid%u", p->merge_pids[i]);
@@ -783,7 +799,7 @@ static int __output_p(struct btrace_pid *p, unsigned long *ios)
 	if (output_ascii)
 		__output_p_ascii(p, ios);
 	else
-		ret = __output_p_fio(p, ios);
+		ret = __output_p_fio(p, ios, "");
 
 	return ret;
 }
@@ -1037,7 +1053,7 @@ static int trace_needs_swap(const char *trace_file, int *swap)
 	int fd, ret;
 
 	*swap = -1;
-	
+
 	fd = open(trace_file, O_RDONLY);
 	if (fd < 0) {
 		perror("open");


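__output_p_fio() no longer rejects a pid whose trace mixes read/write and trim I/O. Instead it emits two jobs by temporarily zeroing counters and recursing, adding a "_trim" postfix to the second job name.

The same split can be expressed without mutating shared state. The sketch below uses a hypothetical `struct job` and assumes counters are indexed read=0, write=1, trim=2, as in the diff.

```c
#include <stdio.h>
#include <string.h>

enum { RD, WR, TRIM, NDIR };	/* assumed to match ios[0..2] in the diff */

/* Hypothetical job description: section name plus per-direction counts. */
struct job {
	char name[32];
	unsigned long ios[NDIR];
};

/*
 * Fill out[] with one job, or two when the trace mixes read/write and
 * trim. Returns the number of jobs produced.
 */
static int split_trace_jobs(const unsigned long ios[NDIR], unsigned int pid,
			    struct job out[2])
{
	/* Read/write job (or the only job): named "pid<N>". */
	snprintf(out[0].name, sizeof(out[0].name), "pid%u", pid);
	memcpy(out[0].ios, ios, sizeof(out[0].ios));

	if (!((ios[RD] + ios[WR]) && ios[TRIM]))
		return 1;

	/* Mixed trace: mask trim out of the first job ... */
	out[0].ios[TRIM] = 0;

	/* ... and emit a second, trim-only job with a "_trim" postfix. */
	snprintf(out[1].name, sizeof(out[1].name), "pid%u_trim", pid);
	memcpy(out[1].ios, ios, sizeof(out[1].ios));
	out[1].ios[RD] = 0;
	out[1].ios[WR] = 0;

	return 2;
}
```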

* Recent changes (master)
@ 2020-04-30 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-04-30 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3966740cd903ed95f306e59892075faf9fd2a151:

  Merge branch 'gcc1' of https://github.com/kusumi/fio (2020-04-21 15:44:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b2ed1c4a07c6afc4ae90482ca5f70026a4bc34e6:

  Merge branch 'helper_thread_test' of https://github.com/vincentkfu/fio (2020-04-29 09:05:11 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'helper_thread_test' of https://github.com/vincentkfu/fio

Vincent Fu (3):
      helper_thread: cleanups
      helper_thread: fix inconsistent status intervals
      helper_thread: refactor status-interval and steadystate code

 helper_thread.c | 54 +++++++++++++++++++++++++++++++++++-------------------
 stat.c          | 12 ------------
 2 files changed, 35 insertions(+), 31 deletions(-)

---

Diff of recent changes:

diff --git a/helper_thread.c b/helper_thread.c
index 78749855..ad1a08f3 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -128,11 +128,30 @@ void helper_thread_exit(void)
 	pthread_join(helper_data->thread, NULL);
 }
 
+static unsigned int task_helper(struct timespec *last, struct timespec *now, unsigned int period, void do_task())
+{
+	unsigned int next, since;
+
+	since = mtime_since(last, now);
+	if (since >= period || period - since < 10) {
+		do_task();
+		timespec_add_msec(last, since);
+		if (since > period)
+			next = period - (since - period);
+		else
+			next = period;
+	} else
+		next = period - since;
+
+	return next;
+}
+
 static void *helper_thread_main(void *data)
 {
 	struct helper_data *hd = data;
-	unsigned int msec_to_next_event, next_log, next_ss = STEADYSTATE_MSEC;
-	struct timespec ts, last_du, last_ss;
+	unsigned int msec_to_next_event, next_log, next_si = status_interval;
+	unsigned int next_ss = STEADYSTATE_MSEC;
+	struct timespec ts, last_du, last_ss, last_si;
 	char action;
 	int ret = 0;
 
@@ -157,20 +176,19 @@ static void *helper_thread_main(void *data)
 #endif
 	memcpy(&last_du, &ts, sizeof(ts));
 	memcpy(&last_ss, &ts, sizeof(ts));
+	memcpy(&last_si, &ts, sizeof(ts));
 
 	fio_sem_up(hd->startup_sem);
 
 	msec_to_next_event = DISK_UTIL_MSEC;
 	while (!ret && !hd->exit) {
-		uint64_t since_du, since_ss = 0;
+		uint64_t since_du;
 		struct timeval timeout = {
-			.tv_sec  = DISK_UTIL_MSEC / 1000,
-			.tv_usec = (DISK_UTIL_MSEC % 1000) * 1000,
+			.tv_sec  = msec_to_next_event / 1000,
+			.tv_usec = (msec_to_next_event % 1000) * 1000,
 		};
 		fd_set rfds, efds;
 
-		timespec_add_msec(&ts, msec_to_next_event);
-
 		if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) < 0) {
 			FD_ZERO(&rfds);
 			FD_SET(hd->pipe[0], &rfds);
@@ -209,25 +227,23 @@ static void *helper_thread_main(void *data)
 		if (action == A_DO_STAT)
 			__show_running_run_stats();
 
+		if (status_interval) {
+			next_si = task_helper(&last_si, &ts, status_interval, __show_running_run_stats);
+			msec_to_next_event = min(next_si, msec_to_next_event);
+		}
+
 		next_log = calc_log_samples();
 		if (!next_log)
 			next_log = DISK_UTIL_MSEC;
 
 		if (steadystate_enabled) {
-			since_ss = mtime_since(&last_ss, &ts);
-			if (since_ss >= STEADYSTATE_MSEC || STEADYSTATE_MSEC - since_ss < 10) {
-				steadystate_check();
-				timespec_add_msec(&last_ss, since_ss);
-				if (since_ss > STEADYSTATE_MSEC)
-					next_ss = STEADYSTATE_MSEC - (since_ss - STEADYSTATE_MSEC);
-				else
-					next_ss = STEADYSTATE_MSEC;
-			} else
-				next_ss = STEADYSTATE_MSEC - since_ss;
+			next_ss = task_helper(&last_ss, &ts, STEADYSTATE_MSEC, steadystate_check);
+			msec_to_next_event = min(next_ss, msec_to_next_event);
                 }
 
-		msec_to_next_event = min(min(next_log, msec_to_next_event), next_ss);
-		dprint(FD_HELPERTHREAD, "since_ss: %llu, next_ss: %u, next_log: %u, msec_to_next_event: %u\n", (unsigned long long)since_ss, next_ss, next_log, msec_to_next_event);
+		msec_to_next_event = min(next_log, msec_to_next_event);
+		dprint(FD_HELPERTHREAD, "next_si: %u, next_ss: %u, next_log: %u, msec_to_next_event: %u\n",
+			next_si, next_ss, next_log, msec_to_next_event);
 
 		if (!is_backend)
 			print_thread_status();
diff --git a/stat.c b/stat.c
index efa811d2..2cf11947 100644
--- a/stat.c
+++ b/stat.c
@@ -2354,8 +2354,6 @@ void __show_running_run_stats(void)
 	fio_sem_up(stat_sem);
 }
 
-static bool status_interval_init;
-static struct timespec status_time;
 static bool status_file_disabled;
 
 #define FIO_STATUS_FILE		"fio-dump-status"
@@ -2398,16 +2396,6 @@ static int check_status_file(void)
 
 void check_for_running_stats(void)
 {
-	if (status_interval) {
-		if (!status_interval_init) {
-			fio_gettime(&status_time, NULL);
-			status_interval_init = true;
-		} else if (mtime_since_now(&status_time) >= status_interval) {
-			show_running_run_stats();
-			fio_gettime(&status_time, NULL);
-			return;
-		}
-	}
 	if (check_status_file()) {
 		show_running_run_stats();
 		return;


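The scheduling rule in `task_helper()` can be modelled as a pure function of the elapsed time and the period. This is a simplified sketch, not fio's code. `next_delay_ms()` is a hypothetical name. It takes plain millisecond counts instead of `struct timespec`, and it reports through a flag whether the task is due rather than calling the task itself. It keeps the patch's 10 ms early-fire window. The patch computes `period - (since - period)` in unsigned arithmetic; the sketch instead clamps the result to 0 when the task is a full period or more late, so the subtraction cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of task_helper(). Given the milliseconds elapsed since
 * the task last ran and its period, set *fire if the task is due now and
 * return how many milliseconds to wait before checking again.
 */
static unsigned int next_delay_ms(unsigned int since_ms,
				  unsigned int period_ms, bool *fire)
{
	assert(fire != NULL);

	/* Due if the period has elapsed, or will elapse within 10 ms. */
	if (since_ms >= period_ms || period_ms - since_ms < 10) {
		*fire = true;
		if (since_ms > period_ms) {
			unsigned int late = since_ms - period_ms;

			/* Shorten the next wait by however late this run was. */
			return late < period_ms ? period_ms - late : 0;
		}
		return period_ms;
	}

	/* Not due yet: wait out the remainder of the period. */
	*fire = false;
	return period_ms - since_ms;
}
```

For a 1000 ms period, 400 ms elapsed gives a 600 ms wait; 995 ms fires the task and waits a full period; 1030 ms fires it and shortens the next wait by the 30 ms of lateness, to 970 ms.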

* Recent changes (master)
@ 2020-04-22 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-22 12:00 UTC
  To: fio

The following changes since commit 173ff874d01466fa19f41998225d173cafd7e3bc:

  json: don't use named initializers for anonymous unions (2020-04-20 21:20:03 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3966740cd903ed95f306e59892075faf9fd2a151:

  Merge branch 'gcc1' of https://github.com/kusumi/fio (2020-04-21 15:44:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'gcc1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      json: Fix compile error on RHEL6

 json.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/json.h b/json.h
index b2bb457e..1544ed76 100644
--- a/json.h
+++ b/json.h
@@ -92,9 +92,9 @@ static inline int json_object_add_value_object(struct json_object *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_OBJECT,
-		.object = val,
 	};
 
+	arg.object = val;
 	return json_object_add_value_type(obj, name, &arg);
 }
 



* Recent changes (master)
@ 2020-04-21 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-21 12:00 UTC
  To: fio

The following changes since commit c65057f93d1de62a48f98578e24716128ce77a75:

  zbd: Fix I/O direction adjustment step for random read/write (2020-04-17 08:32:15 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 173ff874d01466fa19f41998225d173cafd7e3bc:

  json: don't use named initializers for anonymous unions (2020-04-20 21:20:03 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      json: don't use named initializers for anonymous unions

 json.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/json.h b/json.h
index 09c2f187..b2bb457e 100644
--- a/json.h
+++ b/json.h
@@ -57,9 +57,9 @@ static inline int json_object_add_value_int(struct json_object *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_INTEGER,
-		.integer_number = val,
 	};
 
+	arg.integer_number = val;
 	return json_object_add_value_type(obj, name, &arg);
 }
 
@@ -68,9 +68,9 @@ static inline int json_object_add_value_float(struct json_object *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_FLOAT,
-		.float_number = val,
 	};
 
+	arg.float_number = val;
 	return json_object_add_value_type(obj, name, &arg);
 }
 
@@ -80,9 +80,9 @@ static inline int json_object_add_value_string(struct json_object *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_STRING,
-		.string = (char *)val,
 	};
 
+	arg.string = (char *)val;
 	return json_object_add_value_type(obj, name, &arg);
 }
 
@@ -104,9 +104,9 @@ static inline int json_object_add_value_array(struct json_object *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_ARRAY,
-		.array = val,
 	};
 
+	arg.array = val;
 	return json_object_add_value_type(obj, name, &arg);
 }
 
@@ -118,9 +118,9 @@ static inline int json_array_add_value_int(struct json_array *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_INTEGER,
-		.integer_number = val,
 	};
 
+	arg.integer_number = val;
 	return json_array_add_value_type(obj, &arg);
 }
 
@@ -129,9 +129,9 @@ static inline int json_array_add_value_float(struct json_array *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_FLOAT,
-		.float_number = val,
 	};
 
+	arg.float_number = val;
 	return json_array_add_value_type(obj, &arg);
 }
 
@@ -140,9 +140,9 @@ static inline int json_array_add_value_string(struct json_array *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_STRING,
-		.string = (char *)val,
 	};
 
+	arg.string = (char *)val;
 	return json_array_add_value_type(obj, &arg);
 }
 
@@ -151,9 +151,9 @@ static inline int json_array_add_value_object(struct json_array *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_OBJECT,
-		.object = val,
 	};
 
+	arg.object = val;
 	return json_array_add_value_type(obj, &arg);
 }
 
@@ -162,9 +162,9 @@ static inline int json_array_add_value_array(struct json_array *obj,
 {
 	struct json_value arg = {
 		.type = JSON_TYPE_ARRAY,
-		.array = val,
 	};
 
+	arg.array = val;
 	return json_array_add_value_type(obj, &arg);
 }
 

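Both json.h commits apply the same pattern. They keep `.type` in the designated initializer and assign the anonymous-union member in a separate statement. A minimal sketch of that pattern follows, using a hypothetical `struct value` in place of fio's `struct json_value`. Anonymous unions are standard in C11 and a GNU extension under the `-std=gnu99` flag in fio's Makefile. The RHEL6 commit suggests that platform's compiler rejected the initializer form, though the exact compiler version is not stated here.

```c
#include <assert.h>

enum value_type { TYPE_INTEGER, TYPE_STRING };

/* Hypothetical stand-in for a tagged value with an anonymous union. */
struct value {
	enum value_type type;
	union {
		long long integer_number;
		const char *string;
	};
};

static struct value make_integer(long long v)
{
	/* Only the named member goes in the initializer... */
	struct value arg = {
		.type = TYPE_INTEGER,
	};

	/* ...and the anonymous-union member is assigned afterwards. */
	arg.integer_number = v;
	return arg;
}

static struct value make_string(const char *s)
{
	struct value arg = {
		.type = TYPE_STRING,
	};

	arg.string = s;
	return arg;
}

/* Read back an integer, checking the tag before touching the union. */
static long long value_as_integer(const struct value *v)
{
	assert(v->type == TYPE_INTEGER);
	return v->integer_number;
}
```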


* Recent changes (master)
@ 2020-04-18 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-18 12:00 UTC
  To: fio

The following changes since commit 8dbd327fbaa267eef862843a270fac61d06df00b:

  Merge branch 'patch-1' of https://github.com/aakarshg/fio (2020-04-16 14:04:26 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c65057f93d1de62a48f98578e24716128ce77a75:

  zbd: Fix I/O direction adjustment step for random read/write (2020-04-17 08:32:15 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      zbd: Fix I/O direction adjustment step for random read/write

 io_u.c |  2 ++
 zbd.c  | 40 ++++++++++++++++++++++++++++++----------
 zbd.h  |  2 ++
 3 files changed, 34 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 5d62a76c..18e94617 100644
--- a/io_u.c
+++ b/io_u.c
@@ -746,6 +746,8 @@ static void set_rw_ddir(struct thread_data *td, struct io_u *io_u)
 {
 	enum fio_ddir ddir = get_rw_ddir(td);
 
+	ddir = zbd_adjust_ddir(td, io_u, ddir);
+
 	if (td_trimwrite(td)) {
 		struct fio_file *f = io_u->file;
 		if (f->last_pos[DDIR_WRITE] == f->last_pos[DDIR_TRIM])
diff --git a/zbd.c b/zbd.c
index de0c5bf4..8dc3c397 100644
--- a/zbd.c
+++ b/zbd.c
@@ -1331,6 +1331,36 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
 	}
 }
 
+/**
+ * zbd_adjust_ddir - Adjust an I/O direction for zonedmode=zbd.
+ *
+ * @td: FIO thread data.
+ * @io_u: FIO I/O unit.
+ * @ddir: I/O direction before adjustment.
+ *
+ * Return adjusted I/O direction.
+ */
+enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
+			      enum fio_ddir ddir)
+{
+	/*
+	 * In case read direction is chosen for the first random I/O, fio with
+	 * zonemode=zbd stops because no data can be read from zoned block
+	 * devices with all empty zones. Overwrite the first I/O direction as
+	 * write to make sure data to read exists.
+	 */
+	if (td->o.zone_mode != ZONE_MODE_ZBD ||
+	    ddir != DDIR_READ ||
+	    !td_rw(td))
+		return ddir;
+
+	if (io_u->file->zbd_info->sectors_with_data ||
+	    td->o.read_beyond_wp)
+		return DDIR_READ;
+
+	return DDIR_WRITE;
+}
+
 /**
  * zbd_adjust_block - adjust the offset and length as necessary for ZBD drives
  * @td: FIO thread data.
@@ -1364,16 +1394,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	if (!zbd_zone_swr(zb))
 		return io_u_accept;
 
-	/*
-	 * In case read direction is chosen for the first random I/O, fio with
-	 * zonemode=zbd stops because no data can be read from zoned block
-	 * devices with all empty zones. Overwrite the first I/O direction as
-	 * write to make sure data to read exists.
-	 */
-	if (td_rw(td) && !f->zbd_info->sectors_with_data
-	    && !td->o.read_beyond_wp)
-		io_u->ddir = DDIR_WRITE;
-
 	/*
 	 * Accept the I/O offset for reads if reading beyond the write pointer
 	 * is enabled.
diff --git a/zbd.h b/zbd.h
index 4eaf902e..5a660399 100644
--- a/zbd.h
+++ b/zbd.h
@@ -82,6 +82,8 @@ int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
 void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u);
+enum fio_ddir zbd_adjust_ddir(struct thread_data *td, struct io_u *io_u,
+			      enum fio_ddir ddir);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 


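`zbd_adjust_ddir()` moves the read-to-write override out of `zbd_adjust_block()` and into `set_rw_ddir()`, so the direction is settled when it is first chosen. The decision itself can be sketched as a small pure function. The types below are hypothetical simplifications. `struct job` merges fields that fio keeps in `td->o` and in the per-file `zbd_info`, and the enum names are invented to avoid clashing with fio's `DDIR_*` constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for fio's types. */
enum dir { DIR_READ, DIR_WRITE, DIR_TRIM };

struct job {
	bool zbd_mode;			/* zonemode=zbd */
	bool mixed_rw;			/* like td_rw(): reads and writes */
	bool read_beyond_wp;		/* read_beyond_wp=1 */
	uint64_t sectors_with_data;	/* written sectors on the device */
};

/*
 * Mirror of the zbd_adjust_ddir() decision: in a mixed read/write zbd job,
 * a read is turned into a write while the device holds no data, unless
 * reads beyond the write pointer are allowed.
 */
static enum dir adjust_dir(const struct job *j, enum dir d)
{
	assert(j != NULL);

	if (!j->zbd_mode || d != DIR_READ || !j->mixed_rw)
		return d;

	if (j->sectors_with_data || j->read_beyond_wp)
		return DIR_READ;

	return DIR_WRITE;
}
```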

* Recent changes (master)
@ 2020-04-17 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-17 12:00 UTC
  To: fio

The following changes since commit f5501dd2e466e53cd606e304f1cb6b0a49b481dc:

  Merge branch 'appveyor-artifacts' of https://github.com/vincentkfu/fio (2020-04-15 08:29:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8dbd327fbaa267eef862843a270fac61d06df00b:

  Merge branch 'patch-1' of https://github.com/aakarshg/fio (2020-04-16 14:04:26 -0600)

----------------------------------------------------------------
Aakarsh Gopi (1):
      Add fio-histo-log-pctiles to make file

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/aakarshg/fio

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index cb9e1775..f1e984f5 100644
--- a/Makefile
+++ b/Makefile
@@ -25,7 +25,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	:= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR) $(CFLAGS)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/fio_jsonplus_clat2csv)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/hist/fio-histo-log-pctiles.py tools/fio_jsonplus_clat2csv)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS := -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $(CFLAGS)



* Recent changes (master)
@ 2020-04-16 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-16 12:00 UTC
  To: fio

The following changes since commit 25dc6606fbbaca35aec3009c4ff9512ed02d41ba:

  zbd: fix sequential write pattern with verify= and max_open_zones= (2020-04-13 17:18:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f5501dd2e466e53cd606e304f1cb6b0a49b481dc:

  Merge branch 'appveyor-artifacts' of https://github.com/vincentkfu/fio (2020-04-15 08:29:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'appveyor-artifacts' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      appveyor: make test artifacts available for inspection

 .appveyor.yml | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 2f962c4b..e2351be7 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -25,8 +25,11 @@ after_build:
 
 test_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && python.exe t/run-fio-tests.py --debug'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -f fio.exe ] && python.exe t/run-fio-tests.py --artifact-root test-artifacts --debug'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && [ -d test-artifacts ] && 7z a -t7z test-artifacts.7z test-artifacts -xr!foo.0.0 -xr!latency.?.0 -xr!fio_jsonplus_clat2csv.test'
 
 artifacts:
   - path: os\windows\*.msi
     name: msi
+  - path: test-artifacts.7z
+    name: test-artifacts



* Recent changes (master)
@ 2020-04-14 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-14 12:00 UTC
  To: fio

The following changes since commit 3f2dcb7f855d43244ec178aa2a34bb1475bf6901:

  Merge branch 'zbd-build' of https://github.com/vincentkfu/fio (2020-04-08 08:46:35 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 25dc6606fbbaca35aec3009c4ff9512ed02d41ba:

  zbd: fix sequential write pattern with verify= and max_open_zones= (2020-04-13 17:18:31 -0600)

----------------------------------------------------------------
Alexey Dobriyan (2):
      zbd: fix zonemode=zbd with NDEBUG
      zbd: fix sequential write pattern with verify= and max_open_zones=

Jens Axboe (1):
      Merge branch 'fix-cflags' of https://github.com/Hi-Angel/fio

Konstantin Kharlamov (1):
      configure/Makefile: don't override user CFLAGS

Shin'ichiro Kawasaki (3):
      t/zbd: Fix a bug in max_open_zones()
      t/zbd: Fix a bug in reset_zone() for all zones reset
      zbd: Ensure first I/O is write for random read/write to sequential zones

 Makefile        | 24 ++++++++++-----------
 configure       |  2 +-
 t/zbd/functions |  6 +++---
 zbd.c           | 66 ++++++++++++++++++++++++++++++++++++++++-----------------
 4 files changed, 62 insertions(+), 36 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5bcd6064..cb9e1775 100644
--- a/Makefile
+++ b/Makefile
@@ -22,16 +22,16 @@ endif
 DEBUGFLAGS = -DFIO_INC_DEBUG
 CPPFLAGS= -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL $(DEBUGFLAGS)
 OPTFLAGS= -g -ffast-math
-CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
+CFLAGS	:= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR) $(CFLAGS)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
 SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/fio_jsonplus_clat2csv)
 
 ifndef CONFIG_FIO_NO_OPT
-  CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
+  CFLAGS := -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $(CFLAGS)
 endif
 ifdef CONFIG_BUILD_NATIVE
-  CFLAGS += -march=native
+  CFLAGS := -march=native $(CFLAGS)
 endif
 
 ifdef CONFIG_GFIO
@@ -55,27 +55,27 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
   HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server $(FIO_LIBHDFS_LIB)/libhdfs.a -ljvm
-  CFLAGS += $(HDFSFLAGS)
+  CFLAGS := $(HDFSFLAGS) $(CFLAGS)
   SOURCE += engines/libhdfs.c
 endif
 
 ifdef CONFIG_LIBISCSI
-  CFLAGS += $(LIBISCSI_CFLAGS)
+  CFLAGS := $(LIBISCSI_CFLAGS) $(CFLAGS)
   LIBS += $(LIBISCSI_LIBS)
   SOURCE += engines/libiscsi.c
 endif
 
 ifdef CONFIG_LIBNBD
-  CFLAGS += $(LIBNBD_CFLAGS)
+  CFLAGS := $(LIBNBD_CFLAGS) $(CFLAGS)
   LIBS += $(LIBNBD_LIBS)
   SOURCE += engines/nbd.c
 endif
 
 ifdef CONFIG_64BIT
-  CFLAGS += -DBITS_PER_LONG=64
+  CFLAGS := -DBITS_PER_LONG=64 $(CFLAGS)
 endif
 ifdef CONFIG_32BIT
-  CFLAGS += -DBITS_PER_LONG=32
+  CFLAGS := -DBITS_PER_LONG=32 $(CFLAGS)
 endif
 ifdef CONFIG_LIBAIO
   SOURCE += engines/libaio.c
@@ -140,7 +140,7 @@ ifdef CONFIG_GFAPI
   SOURCE += engines/glusterfs_sync.c
   SOURCE += engines/glusterfs_async.c
   ifdef CONFIG_GF_FADVISE
-    CFLAGS += "-DGFAPI_USE_FADVISE"
+    CFLAGS := "-DGFAPI_USE_FADVISE" $(CFLAGS)
   endif
 endif
 ifdef CONFIG_MTD
@@ -208,7 +208,7 @@ ifeq ($(CONFIG_TARGET_OS), AIX)
 endif
 ifeq ($(CONFIG_TARGET_OS), HP-UX)
   LIBS   += -lpthread -ldl -lrt
-  CFLAGS += -D_LARGEFILE64_SOURCE -D_XOPEN_SOURCE_EXTENDED
+  CFLAGS := -D_LARGEFILE64_SOURCE -D_XOPEN_SOURCE_EXTENDED $(CFLAGS)
 endif
 ifeq ($(CONFIG_TARGET_OS), Darwin)
   LIBS	 += -lpthread -ldl
@@ -217,7 +217,7 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   SOURCE += os/windows/cpu-affinity.c os/windows/posix.c
   WINDOWS_OBJS = os/windows/cpu-affinity.o os/windows/posix.o lib/hweight.o
   LIBS	 += -lpthread -lpsapi -lws2_32 -lssp
-  CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
+  CFLAGS := -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format $(CFLAGS)
 endif
 
 OBJS := $(SOURCE:.c=.o)
@@ -386,7 +386,7 @@ FIO-VERSION-FILE: FORCE
 	@$(SHELL) $(SRCDIR)/FIO-VERSION-GEN
 -include FIO-VERSION-FILE
 
-override CFLAGS += -DFIO_VERSION='"$(FIO_VERSION)"'
+override CFLAGS := -DFIO_VERSION='"$(FIO_VERSION)"' $(CFLAGS)
 
 %.o : %.c
 	@mkdir -p $(dir $@)
diff --git a/configure b/configure
index ae2b3589..cf8b88e4 100755
--- a/configure
+++ b/configure
@@ -44,7 +44,7 @@ print_config() {
 }
 
 # Default CFLAGS
-CFLAGS="-D_GNU_SOURCE -include config-host.h"
+CFLAGS="-D_GNU_SOURCE -include config-host.h $CFLAGS"
 BUILD_CFLAGS=""
 
 # Print a helpful header at the top of config.log
diff --git a/t/zbd/functions b/t/zbd/functions
index 35087b15..1bd22ec4 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -43,7 +43,8 @@ max_open_zones() {
     local dev=$1
 
     if [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
-	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" 2> /dev/null; then
+	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" \
+		 > /dev/null 2>&1; then
 	    # Non scsi device such as null_blk can not return max open zones.
 	    # Use default value.
 	    echo 128
@@ -96,8 +97,7 @@ reset_zone() {
 
     if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
 	if [ "$offset" -lt 0 ]; then
-	    sectors=$(<"/sys/class/block/${dev#/dev/}/size")
-	    ${blkzone} reset -o "${offset}" -l "$sectors" "$dev"
+	    ${blkzone} reset "$dev"
 	else
 	    ${blkzone} reset -o "${offset}" -c 1 "$dev"
 	fi
diff --git a/zbd.c b/zbd.c
index 0b0d4f40..de0c5bf4 100644
--- a/zbd.c
+++ b/zbd.c
@@ -687,6 +687,22 @@ static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 	return zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
 }
 
+/* The caller must hold f->zbd_info->mutex */
+static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
+			   unsigned int open_zone_idx)
+{
+	uint32_t zone_idx;
+
+	assert(open_zone_idx < f->zbd_info->num_open_zones);
+	zone_idx = f->zbd_info->open_zones[open_zone_idx];
+	memmove(f->zbd_info->open_zones + open_zone_idx,
+		f->zbd_info->open_zones + open_zone_idx + 1,
+		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
+		sizeof(f->zbd_info->open_zones[0]));
+	f->zbd_info->num_open_zones--;
+	f->zbd_info->zone_info[zone_idx].open = 0;
+}
+
 /*
  * Reset a range of zones. Returns 0 upon success and 1 upon failure.
  * @td: fio thread data.
@@ -710,12 +726,26 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
 		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
 	for (z = zb; z < ze; z++) {
+		uint32_t nz = z - f->zbd_info->zone_info;
+
 		if (!zbd_zone_swr(z))
 			continue;
 		zone_lock(td, z);
-		reset_wp = all_zones ? z->wp != z->start :
-				(td->o.td_ddir & TD_DDIR_WRITE) &&
-				z->wp % min_bs != 0;
+		if (all_zones) {
+			unsigned int i;
+
+			pthread_mutex_lock(&f->zbd_info->mutex);
+			for (i = 0; i < f->zbd_info->num_open_zones; i++) {
+				if (f->zbd_info->open_zones[i] == nz)
+					zbd_close_zone(td, f, i);
+			}
+			pthread_mutex_unlock(&f->zbd_info->mutex);
+
+			reset_wp = z->wp != z->start;
+		} else {
+			reset_wp = (td->o.td_ddir & TD_DDIR_WRITE) &&
+					z->wp % min_bs != 0;
+		}
 		if (reset_wp) {
 			dprint(FD_ZBD, "%s: resetting zone %u\n",
 			       f->file_name,
@@ -905,22 +935,6 @@ out:
 	return res;
 }
 
-/* The caller must hold f->zbd_info->mutex */
-static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
-			   unsigned int open_zone_idx)
-{
-	uint32_t zone_idx;
-
-	assert(open_zone_idx < f->zbd_info->num_open_zones);
-	zone_idx = f->zbd_info->open_zones[open_zone_idx];
-	memmove(f->zbd_info->open_zones + open_zone_idx,
-		f->zbd_info->open_zones + open_zone_idx + 1,
-		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
-		sizeof(f->zbd_info->open_zones[0]));
-	f->zbd_info->num_open_zones--;
-	f->zbd_info->zone_info[zone_idx].open = 0;
-}
-
 /* Anything goes as long as it is not a constant. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
@@ -1220,6 +1234,7 @@ static void zbd_put_io(const struct io_u *io_u)
 	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
 	uint32_t zone_idx;
+	int ret;
 
 	if (!zbd_info)
 		return;
@@ -1235,7 +1250,8 @@ static void zbd_put_io(const struct io_u *io_u)
 	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
 	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
 
-	assert(pthread_mutex_unlock(&z->mutex) == 0);
+	ret = pthread_mutex_unlock(&z->mutex);
+	assert(ret == 0);
 	zbd_check_swd(f);
 }
 
@@ -1348,6 +1364,16 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	if (!zbd_zone_swr(zb))
 		return io_u_accept;
 
+	/*
+	 * In case read direction is chosen for the first random I/O, fio with
+	 * zonemode=zbd stops because no data can be read from zoned block
+	 * devices with all empty zones. Overwrite the first I/O direction as
+	 * write to make sure data to read exists.
+	 */
+	if (td_rw(td) && !f->zbd_info->sectors_with_data
+	    && !td->o.read_beyond_wp)
+		io_u->ddir = DDIR_WRITE;
+
 	/*
 	 * Accept the I/O offset for reads if reading beyond the write pointer
 	 * is enabled.

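The all-zones reset path now drops any matching entry from `open_zones[]` before resetting a zone. It does this with `zbd_close_zone()`, which removes one slot from a packed array using `memmove()`. The sketch below shows that removal on its own. The helper names are hypothetical, and the locking done under `f->zbd_info->mutex` is left out. The patch moves the tail up to `ZBD_MAX_OPEN_ZONES`; this sketch moves only the live entries after the removed one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Remove entry idx from a packed array of *n zone indices. */
static void remove_open_zone(uint32_t *zones, unsigned int *n,
			     unsigned int idx)
{
	assert(idx < *n);
	/* Slide the live entries after idx one slot to the left. */
	memmove(zones + idx, zones + idx + 1,
		(*n - (idx + 1)) * sizeof(zones[0]));
	(*n)--;
}

/* Remove every occurrence of zone from the array. */
static void remove_zone_value(uint32_t *zones, unsigned int *n,
			      uint32_t zone)
{
	unsigned int i = 0;

	while (i < *n) {
		if (zones[i] == zone)
			remove_open_zone(zones, n, i);	/* next entry is now at i */
		else
			i++;
	}
}
```

`remove_zone_value()` does not advance its index after a removal, because the following entry has just slid into the current slot.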

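The `zbd_put_io()` hunk is the NDEBUG fix. When `NDEBUG` is defined, `assert(expr)` expands to `((void)0)` and `expr` is never evaluated. As a result, `assert(pthread_mutex_unlock(&z->mutex) == 0)` would leave the zone locked in builds without assertions. A minimal sketch of the corrected shape, with a hypothetical helper name:

```c
#include <assert.h>
#include <pthread.h>

/*
 * Keep the side effect outside assert(): only the check on its result
 * may disappear under NDEBUG, never the unlock itself.
 */
static void unlock_checked(pthread_mutex_t *m)
{
	int ret = pthread_mutex_unlock(m);

	assert(ret == 0);
	(void)ret;	/* silence unused-variable warnings when NDEBUG is set */
}
```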

* Recent changes (master)
@ 2020-04-09 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-09 12:00 UTC
  To: fio

The following changes since commit 9d87c646c45227c86c5a15faee2a6717a4bf1b46:

  zbd: Fix build errors on Windows and MacOS (2020-04-07 20:20:36 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3f2dcb7f855d43244ec178aa2a34bb1475bf6901:

  Merge branch 'zbd-build' of https://github.com/vincentkfu/fio (2020-04-08 08:46:35 -0600)

----------------------------------------------------------------
Damien Le Moal (3):
      zbd: Fix missing mutex unlock and warnings detected with coverity
      examples: add zonemode=zbd example scripts
      examples: add libzbc ioengine example scripts

Jens Axboe (1):
      Merge branch 'zbd-build' of https://github.com/vincentkfu/fio

Vincent Fu (2):
      zbd: fix Windows build errors
      Revert ".travis.yml: remove pip line from xcode11.2 config"

 .travis.yml                    |  3 +--
 examples/libzbc-rand-write.fio | 20 ++++++++++++++++++++
 examples/libzbc-seq-read.fio   | 19 +++++++++++++++++++
 examples/zbd-rand-write.fio    | 20 ++++++++++++++++++++
 examples/zbd-seq-read.fio      | 19 +++++++++++++++++++
 zbd.c                          |  6 ++++++
 6 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 examples/libzbc-rand-write.fio
 create mode 100644 examples/libzbc-seq-read.fio
 create mode 100644 examples/zbd-rand-write.fio
 create mode 100644 examples/zbd-seq-read.fio

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 6b710cc3..77c31b77 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -48,9 +48,8 @@ before_install:
         brew install cunit;
         if [[ "$TRAVIS_OSX_IMAGE" == "xcode11.2" ]]; then
             pip3 install scipy;
-        else
-            pip install scipy;
         fi;
+        pip install scipy;
     fi;
 script:
   - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
diff --git a/examples/libzbc-rand-write.fio b/examples/libzbc-rand-write.fio
new file mode 100644
index 00000000..ce5870e4
--- /dev/null
+++ b/examples/libzbc-rand-write.fio
@@ -0,0 +1,20 @@
+; Using the libzbc ioengine, random write to a (zoned) block device,
+; writing at most 32 zones at a time. Target zones are chosen randomly
+; and writes directed at the write pointer of the chosen zones
+
+[global]
+name=libzbc-rand-write
+group_reporting
+rw=randwrite
+zonemode=zbd
+zonesize=256M
+max_open_zones=32
+bs=512K
+direct=1
+numjobs=16
+time_based=1
+runtime=300
+
+[dev1]
+filename=/dev/sdj
+ioengine=libzbc
diff --git a/examples/libzbc-seq-read.fio b/examples/libzbc-seq-read.fio
new file mode 100644
index 00000000..f4d265a0
--- /dev/null
+++ b/examples/libzbc-seq-read.fio
@@ -0,0 +1,19 @@
+; Using the libzbc ioengine, sequentially read 40 zones of a (zoned)
+; block device, reading only written data from the 524th zone
+; (524 * 256M = 140660178944)
+
+[global]
+name=libzbc-seq-read
+rw=read
+bs=1M
+direct=1
+numjobs=1
+zonemode=zbd
+zonesize=256M
+read_beyond_wp=0
+
+[dev1]
+filename=/dev/sdd
+offset=140660178944
+size=10G
+ioengine=libzbc
diff --git a/examples/zbd-rand-write.fio b/examples/zbd-rand-write.fio
new file mode 100644
index 00000000..1b3f2088
--- /dev/null
+++ b/examples/zbd-rand-write.fio
@@ -0,0 +1,20 @@
+; Using the libaio ioengine, random write to a (zoned) block device,
+; writing at most 32 zones at a time. Target zones are chosen randomly
+; and writes directed at the write pointer of the chosen zones
+
+[global]
+name=zbd-rand-write
+group_reporting
+rw=randwrite
+zonemode=zbd
+zonesize=256M
+max_open_zones=32
+bs=512K
+direct=1
+numjobs=16
+time_based=1
+runtime=180
+
+[dev1]
+filename=/dev/sdj
+ioengine=psync
diff --git a/examples/zbd-seq-read.fio b/examples/zbd-seq-read.fio
new file mode 100644
index 00000000..82f2b1db
--- /dev/null
+++ b/examples/zbd-seq-read.fio
@@ -0,0 +1,19 @@
+; Sequentially read 40 zones of a (zoned) block device, reading only
+; written data from the 524th zone (524 * 256M = 140660178944)
+
+[global]
+name=zbd-seq-read
+rw=read
+bs=256K
+direct=1
+numjobs=1
+zonemode=zbd
+zonesize=256M
+read_beyond_wp=0
+
+[dev1]
+filename=/dev/sdd
+offset=140660178944
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/zbd.c b/zbd.c
index f4067802..0b0d4f40 100644
--- a/zbd.c
+++ b/zbd.c
@@ -11,6 +11,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "os/os.h"
 #include "file.h"
 #include "fio.h"
 #include "lib/pow2.h"
@@ -704,6 +705,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	bool reset_wp;
 	int res = 0;
 
+	assert(min_bs);
+
 	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
 		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
 	for (z = zb; z < ze; z++) {
@@ -1004,6 +1007,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 
 		dprint(FD_ZBD, "%s(%s): no candidate zone\n",
 			__func__, f->file_name);
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&z->mutex);
 		return NULL;
 
 found_candidate_zone:
@@ -1332,6 +1337,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	if (!f->zbd_info)
 		return io_u_accept;
 
+	assert(min_bs);
 	assert(is_valid_offset(f, io_u->offset));
 	assert(io_u->buflen);
 	zone_idx_b = zbd_zone_idx(f, io_u->offset);

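The coverity fix adds the two missing `pthread_mutex_unlock()` calls to the "no candidate zone" return in `zbd_convert_to_open_zone()`. A common C idiom for avoiding this kind of leak is to route every outcome through a single unlock sequence. The sketch below is a hypothetical illustration of that idiom, not fio's function. The structs stand in for a zone lock and a device-wide lock, and the function returns a zone index instead of a locked zone.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a per-zone lock and a device-wide lock. */
struct zone {
	pthread_mutex_t mutex;
	int usable;
};

struct device {
	pthread_mutex_t mutex;
	struct zone *zones;
	unsigned int nr_zones;
};

/*
 * With the caller's zone lock and the device lock held, look for a usable
 * zone. Found and not-found both fall through to the same unlock sequence,
 * so no return path can leave a lock held. Returns -1 if none is usable.
 */
static int find_usable_zone(struct device *dev, struct zone *cur)
{
	unsigned int i;
	int idx = -1;

	assert(dev != NULL && cur != NULL);

	pthread_mutex_lock(&cur->mutex);
	pthread_mutex_lock(&dev->mutex);

	for (i = 0; i < dev->nr_zones; i++) {
		if (dev->zones[i].usable) {
			idx = (int)i;
			break;
		}
	}

	pthread_mutex_unlock(&dev->mutex);
	pthread_mutex_unlock(&cur->mutex);
	return idx;
}
```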


* Recent changes (master)
@ 2020-04-08 12:00 Jens Axboe
From: Jens Axboe @ 2020-04-08 12:00 UTC
  To: fio

The following changes since commit ebc403fe282864eddfd68ab1793f149a1b0eb1cd:

  zbd: fixup ->zone_size_log2 if zone size is not power of 2 (2020-04-06 19:41:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9d87c646c45227c86c5a15faee2a6717a4bf1b46:

  zbd: Fix build errors on Windows and MacOS (2020-04-07 20:20:36 -0600)

----------------------------------------------------------------
Damien Le Moal (3):
      fio: Generalize zonemode=zbd
      ioengines: Add zoned block device operations
      zbd: Fix build errors on Windows and MacOS

Dmitry Fomichev (2):
      fio: Introduce libzbc IO engine
      t/zbd: Add support for libzbc IO engine tests

Dmitry Monakhov (2):
      engines: check options before dereference
      engine/rdmaio: fix io_u initialization

Jens Axboe (1):
      Merge branch 'rdma-fixes' of https://github.com/dmonakhov/fio

 Makefile                    |   9 +-
 configure                   |  36 +++-
 engines/e4defrag.c          |   2 +-
 engines/libzbc.c            | 422 ++++++++++++++++++++++++++++++++++++++++++++
 engines/rbd.c               |   8 +
 engines/rdma.c              |  17 +-
 engines/skeleton_external.c |  43 +++++
 fio.1                       |   6 +
 fio.h                       |   2 -
 io_u.h                      |   2 -
 ioengines.h                 |   9 +-
 options.c                   |   3 +-
 oslib/blkzoned.h            |  49 +++++
 oslib/linux-blkzoned.c      | 219 +++++++++++++++++++++++
 t/run-fio-tests.py          |   2 +-
 t/zbd/functions             |  38 +++-
 t/zbd/test-zbd-support      | 221 +++++++++++++++--------
 zbd.c                       | 404 ++++++++++++++++++++----------------------
 zbd.h                       |  70 +-------
 zbd_types.h                 |  57 ++++++
 20 files changed, 1247 insertions(+), 372 deletions(-)
 create mode 100644 engines/libzbc.c
 create mode 100644 oslib/blkzoned.h
 create mode 100644 oslib/linux-blkzoned.c
 create mode 100644 zbd_types.h

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 9a5dea7f..5bcd6064 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
 		workqueue.c rate-submit.c optgroup.c helper_thread.c \
-		steadystate.c zone-dist.c
+		steadystate.c zone-dist.c zbd.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
@@ -160,13 +160,16 @@ endif
 ifdef CONFIG_IME
   SOURCE += engines/ime.c
 endif
-ifdef CONFIG_LINUX_BLKZONED
-  SOURCE += zbd.c
+ifdef CONFIG_LIBZBC
+  SOURCE += engines/libzbc.c
 endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c
+ifdef CONFIG_HAS_BLKZONED
+  SOURCE += oslib/linux-blkzoned.c
+endif
   LIBS += -lpthread -ldl
   LDFLAGS += -rdynamic
 endif
diff --git a/configure b/configure
index d17929f1..ae2b3589 100755
--- a/configure
+++ b/configure
@@ -2397,6 +2397,37 @@ if compile_prog "" "" "linux_blkzoned"; then
 fi
 print_config "Zoned block device support" "$linux_blkzoned"
 
+##########################################
+# libzbc probe
+if test "$libzbc" != "yes" ; then
+  libzbc="no"
+fi
+cat > $TMPC << EOF
+#include <libzbc/zbc.h>
+int main(int argc, char **argv)
+{
+  struct zbc_device *dev = NULL;
+
+  return zbc_open("foo=bar", O_RDONLY, &dev);
+}
+EOF
+if compile_prog "" "-lzbc" "libzbc"; then
+  libzbcvermaj=$(pkg-config --modversion libzbc | sed 's/\.[0-9]*\.[0-9]*//')
+  if test "$libzbcvermaj" -ge "5" ; then
+    libzbc="yes"
+    LIBS="-lzbc $LIBS"
+  else
+    print_config "libzbc engine" "Unsupported libzbc version (version 5 or above required)"
+    libzbc="no"
+  fi
+else
+  if test "$libzbc" = "yes" ; then
+      feature_not_found "libzbc" "libzbc or libzbc/zbc.h"
+  fi
+  libzbc="no"
+fi
+print_config "libzbc engine" "$libzbc"
+
 ##########################################
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
@@ -2862,7 +2893,10 @@ if test "$valgrind_dev" = "yes"; then
   output_sym "CONFIG_VALGRIND_DEV"
 fi
 if test "$linux_blkzoned" = "yes" ; then
-  output_sym "CONFIG_LINUX_BLKZONED"
+  output_sym "CONFIG_HAS_BLKZONED"
+fi
+if test "$libzbc" = "yes" ; then
+  output_sym "CONFIG_LIBZBC"
 fi
 if test "$zlib" = "no" ; then
   echo "Consider installing zlib-dev (zlib-devel, some fio features depend on it."
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index 8f71d02c..0a0004d0 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -72,7 +72,7 @@ static int fio_e4defrag_init(struct thread_data *td)
 	struct stat stub;
 	char donor_name[PATH_MAX];
 
-	if (!strlen(o->donor_name)) {
+	if (!o->donor_name || !strlen(o->donor_name)) {
 		log_err("'donorname' options required\n");
 		return 1;
 	}
diff --git a/engines/libzbc.c b/engines/libzbc.c
new file mode 100644
index 00000000..8c682de6
--- /dev/null
+++ b/engines/libzbc.c
@@ -0,0 +1,422 @@
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ *
+ * libzbc engine
+ * IO engine using libzbc library to talk to SMR disks.
+ */
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <libzbc/zbc.h>
+
+#include "fio.h"
+#include "err.h"
+#include "zbd_types.h"
+
+struct libzbc_data {
+	struct zbc_device	*zdev;
+	enum zbc_dev_model	model;
+	uint64_t		nr_sectors;
+};
+
+static int libzbc_get_dev_info(struct libzbc_data *ld, struct fio_file *f)
+{
+	struct zbc_device_info *zinfo;
+
+	zinfo = calloc(1, sizeof(*zinfo));
+	if (!zinfo)
+		return -ENOMEM;
+
+	zbc_get_device_info(ld->zdev, zinfo);
+	ld->model = zinfo->zbd_model;
+	ld->nr_sectors = zinfo->zbd_sectors;
+
+	dprint(FD_ZBD, "%s: vendor_id:%s, type: %s, model: %s\n",
+	       f->file_name, zinfo->zbd_vendor_id,
+	       zbc_device_type_str(zinfo->zbd_type),
+	       zbc_device_model_str(zinfo->zbd_model));
+
+	free(zinfo);
+
+	return 0;
+}
+
+static int libzbc_open_dev(struct thread_data *td, struct fio_file *f,
+			   struct libzbc_data **p_ld)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+        int ret, flags = OS_O_DIRECT;
+
+	if (ld) {
+		/* Already open */
+		assert(ld->zdev);
+		goto out;
+	}
+
+	if (f->filetype != FIO_TYPE_BLOCK && f->filetype != FIO_TYPE_CHAR) {
+		td_verror(td, EINVAL, "wrong file type");
+		log_err("ioengine libzbc only works on block or character devices\n");
+		return -EINVAL;
+	}
+
+        if (td_write(td)) {
+		if (!read_only)
+			flags |= O_RDWR;
+	} else if (td_read(td)) {
+		if (f->filetype == FIO_TYPE_CHAR && !read_only)
+			flags |= O_RDWR;
+		else
+			flags |= O_RDONLY;
+	} else if (td_trim(td)) {
+		td_verror(td, EINVAL, "libzbc does not support trim");
+                log_err("%s: libzbc does not support trim\n",
+                        f->file_name);
+                return -EINVAL;
+	}
+
+        if (td->o.oatomic) {
+		td_verror(td, EINVAL, "libzbc does not support O_ATOMIC");
+                log_err("%s: libzbc does not support O_ATOMIC\n",
+                        f->file_name);
+                return -EINVAL;
+        }
+
+	ld = calloc(1, sizeof(*ld));
+	if (!ld)
+		return -ENOMEM;
+
+	ret = zbc_open(f->file_name,
+		       flags | ZBC_O_DRV_SCSI | ZBC_O_DRV_ATA, &ld->zdev);
+	if (ret) {
+		log_err("%s: zbc_open() failed, err=%d\n",
+			f->file_name, ret);
+		return ret;
+	}
+
+	ret = libzbc_get_dev_info(ld, f);
+	if (ret) {
+		zbc_close(ld->zdev);
+		free(ld);
+		return ret;
+	}
+
+	td->io_ops_data = ld;
+out:
+	if (p_ld)
+		*p_ld = ld;
+
+	return 0;
+}
+
+static int libzbc_close_dev(struct thread_data *td)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+	int ret = 0;
+
+	td->io_ops_data = NULL;
+	if (ld) {
+		if (ld->zdev)
+			ret = zbc_close(ld->zdev);
+		free(ld);
+	}
+
+	return ret;
+}
+static int libzbc_open_file(struct thread_data *td, struct fio_file *f)
+{
+	return libzbc_open_dev(td, f, NULL);
+}
+
+static int libzbc_close_file(struct thread_data *td, struct fio_file *f)
+{
+	int ret;
+
+	ret = libzbc_close_dev(td);
+	if (ret)
+		log_err("%s: close device failed err %d\n",
+			f->file_name, ret);
+
+	return ret;
+}
+
+static void libzbc_cleanup(struct thread_data *td)
+{
+	libzbc_close_dev(td);
+}
+
+static int libzbc_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	/* Passthrough IO do not cache data. Nothing to do */
+	return 0;
+}
+
+static int libzbc_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	struct libzbc_data *ld;
+	int ret;
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	ret = libzbc_open_dev(td, f, &ld);
+	if (ret)
+		return ret;
+
+	f->real_file_size = ld->nr_sectors << 9;
+	fio_file_set_size_known(f);
+
+	return 0;
+}
+
+static int libzbc_get_zoned_model(struct thread_data *td, struct fio_file *f,
+				  enum zbd_zoned_model *model)
+{
+	struct libzbc_data *ld;
+	int ret;
+
+	if (f->filetype != FIO_TYPE_BLOCK && f->filetype != FIO_TYPE_CHAR) {
+		*model = ZBD_IGNORE;
+		return 0;
+	}
+
+	ret = libzbc_open_dev(td, f, &ld);
+	if (ret)
+		return ret;
+
+	switch (ld->model) {
+	case ZBC_DM_HOST_AWARE:
+		*model = ZBD_HOST_AWARE;
+		break;
+	case ZBC_DM_HOST_MANAGED:
+		*model = ZBD_HOST_MANAGED;
+		break;
+	default:
+		*model = ZBD_NONE;
+		break;
+	}
+
+	return 0;
+}
+
+static int libzbc_report_zones(struct thread_data *td, struct fio_file *f,
+			       uint64_t offset, struct zbd_zone *zbdz,
+			       unsigned int nr_zones)
+{
+	struct libzbc_data *ld;
+	uint64_t sector = offset >> 9;
+	struct zbc_zone *zones;
+	unsigned int i;
+	int ret;
+
+	ret = libzbc_open_dev(td, f, &ld);
+	if (ret)
+		return ret;
+
+	if (sector >= ld->nr_sectors)
+		return 0;
+
+	zones = calloc(nr_zones, sizeof(struct zbc_zone));
+	if (!zones) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = zbc_report_zones(ld->zdev, sector, ZBC_RO_ALL, zones, &nr_zones);
+	if (ret < 0) {
+		log_err("%s: zbc_report_zones failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+
+	for (i = 0; i < nr_zones; i++, zbdz++) {
+		zbdz->start = zones[i].zbz_start << 9;
+		zbdz->len = zones[i].zbz_length << 9;
+		zbdz->wp = zones[i].zbz_write_pointer << 9;
+
+		switch (zones[i].zbz_type) {
+		case ZBC_ZT_CONVENTIONAL:
+			zbdz->type = ZBD_ZONE_TYPE_CNV;
+			break;
+		case ZBC_ZT_SEQUENTIAL_REQ:
+			zbdz->type = ZBD_ZONE_TYPE_SWR;
+			break;
+		case ZBC_ZT_SEQUENTIAL_PREF:
+			zbdz->type = ZBD_ZONE_TYPE_SWP;
+			break;
+		default:
+			td_verror(td, errno, "invalid zone type");
+			log_err("%s: invalid type for zone at sector %llu.\n",
+				f->file_name, (unsigned long long)zbdz->start);
+			ret = -EIO;
+			goto out;
+		}
+
+		switch (zones[i].zbz_condition) {
+		case ZBC_ZC_NOT_WP:
+			zbdz->cond = ZBD_ZONE_COND_NOT_WP;
+			break;
+		case ZBC_ZC_EMPTY:
+			zbdz->cond = ZBD_ZONE_COND_EMPTY;
+			break;
+		case ZBC_ZC_IMP_OPEN:
+			zbdz->cond = ZBD_ZONE_COND_IMP_OPEN;
+			break;
+		case ZBC_ZC_EXP_OPEN:
+			zbdz->cond = ZBD_ZONE_COND_EXP_OPEN;
+			break;
+		case ZBC_ZC_CLOSED:
+			zbdz->cond = ZBD_ZONE_COND_CLOSED;
+			break;
+		case ZBC_ZC_FULL:
+			zbdz->cond = ZBD_ZONE_COND_FULL;
+			break;
+		case ZBC_ZC_RDONLY:
+		case ZBC_ZC_OFFLINE:
+		default:
+			/* Treat all these conditions as offline (don't use!) */
+			zbdz->cond = ZBD_ZONE_COND_OFFLINE;
+			break;
+		}
+	}
+
+	ret = nr_zones;
+out:
+	free(zones);
+	return ret;
+}
+
+static int libzbc_reset_wp(struct thread_data *td, struct fio_file *f,
+			   uint64_t offset, uint64_t length)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+	uint64_t sector = offset >> 9;
+	uint64_t end_sector = (offset + length) >> 9;
+	unsigned int nr_zones;
+	struct zbc_errno err;
+	int i, ret;
+
+	assert(ld);
+	assert(ld->zdev);
+
+	nr_zones = (length + td->o.zone_size - 1) / td->o.zone_size;
+	if (!sector && end_sector >= ld->nr_sectors) {
+		/* Reset all zones */
+		ret = zbc_reset_zone(ld->zdev, 0, ZBC_OP_ALL_ZONES);
+		if (ret)
+			goto err;
+
+		return 0;
+	}
+
+	for (i = 0; i < nr_zones; i++, sector += td->o.zone_size >> 9) {
+		ret = zbc_reset_zone(ld->zdev, sector, 0);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	zbc_errno(ld->zdev, &err);
+	td_verror(td, errno, "zbc_reset_zone failed");
+	if (err.sk)
+		log_err("%s: reset wp failed %s:%s\n",
+			f->file_name,
+			zbc_sk_str(err.sk), zbc_asc_ascq_str(err.asc_ascq));
+	return -ret;
+}
+
+ssize_t libzbc_rw(struct thread_data *td, struct io_u *io_u)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+	struct fio_file *f = io_u->file;
+	uint64_t sector = io_u->offset >> 9;
+	size_t count = io_u->xfer_buflen >> 9;
+	struct zbc_errno err;
+	ssize_t ret;
+
+	if (io_u->ddir == DDIR_WRITE)
+		ret = zbc_pwrite(ld->zdev, io_u->xfer_buf, count, sector);
+	else
+		ret = zbc_pread(ld->zdev, io_u->xfer_buf, count, sector);
+	if (ret == count)
+		return ret;
+
+	if (ret > 0) {
+		log_err("Short %s, len=%zu, ret=%zd\n",
+			io_u->ddir == DDIR_READ ? "read" : "write",
+			count << 9, ret << 9);
+		return -EIO;
+	}
+
+	/* I/O error */
+	zbc_errno(ld->zdev, &err);
+	td_verror(td, errno, "libzbc i/o failed");
+	if (err.sk) {
+		log_err("%s: op %u offset %llu+%llu failed (%s:%s), err %zd\n",
+			f->file_name, io_u->ddir,
+			io_u->offset, io_u->xfer_buflen,
+			zbc_sk_str(err.sk),
+			zbc_asc_ascq_str(err.asc_ascq), ret);
+	} else {
+		log_err("%s: op %u offset %llu+%llu failed, err %zd\n",
+			f->file_name, io_u->ddir,
+			io_u->offset, io_u->xfer_buflen, ret);
+	}
+
+	return -EIO;
+}
+
+static enum fio_q_status libzbc_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct libzbc_data *ld = td->io_ops_data;
+	struct fio_file *f = io_u->file;
+	ssize_t ret = 0;
+
+	fio_ro_check(td, io_u);
+
+	dprint(FD_ZBD, "%p:%s: libzbc queue %llu\n",
+	       td, f->file_name, io_u->offset);
+
+	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		ret = libzbc_rw(td, io_u);
+	} else if (ddir_sync(io_u->ddir)) {
+		ret = zbc_flush(ld->zdev);
+		if (ret)
+			log_err("zbc_flush error %zd\n", ret);
+	} else if (io_u->ddir != DDIR_TRIM) {
+		log_err("Unsupported operation %u\n", io_u->ddir);
+		ret = -EINVAL;
+	}
+	if (ret < 0)
+		io_u->error = -ret;
+
+	return FIO_Q_COMPLETED;
+}
+
+static struct ioengine_ops ioengine = {
+	.name			= "libzbc",
+	.version		= FIO_IOOPS_VERSION,
+	.open_file		= libzbc_open_file,
+	.close_file		= libzbc_close_file,
+	.cleanup		= libzbc_cleanup,
+	.invalidate		= libzbc_invalidate,
+	.get_file_size		= libzbc_get_file_size,
+	.get_zoned_model	= libzbc_get_zoned_model,
+	.report_zones		= libzbc_report_zones,
+	.reset_wp		= libzbc_reset_wp,
+	.queue			= libzbc_queue,
+	.flags			= FIO_SYNCIO | FIO_NOEXTEND | FIO_RAWIO,
+};
+
+static void fio_init fio_libzbc_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_libzbc_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/engines/rbd.c b/engines/rbd.c
index 7d4d3faf..a08f4775 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -200,6 +200,14 @@ static int _fio_rbd_connect(struct thread_data *td)
 		log_err("rados_create failed.\n");
 		goto failed_early;
 	}
+	if (o->pool_name == NULL) {
+		log_err("rbd pool name must be provided.\n");
+		goto failed_early;
+	}
+	if (!o->rbd_name) {
+		log_err("rbdname must be provided.\n");
+		goto failed_early;
+	}
 
 	r = rados_conf_read_file(rbd->cluster, NULL);
 	if (r < 0) {
diff --git a/engines/rdma.c b/engines/rdma.c
index 2569a8e3..f192f432 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -1050,7 +1050,7 @@ static int fio_rdmaio_setup_connect(struct thread_data *td, const char *host,
 		return err;
 
 	/* resolve route */
-	if (strcmp(o->bindname, "") != 0) {
+	if (o->bindname && strlen(o->bindname)) {
 		addrb.ss_family = AF_INET;
 		err = aton(td, o->bindname, (struct sockaddr_in *)&addrb);
 		if (err)
@@ -1116,7 +1116,7 @@ static int fio_rdmaio_setup_listen(struct thread_data *td, short port)
 	rd->addr.sin_family = AF_INET;
 	rd->addr.sin_port = htons(port);
 
-	if (strcmp(o->bindname, "") == 0)
+	if (!o->bindname || !strlen(o->bindname))
 		rd->addr.sin_addr.s_addr = htonl(INADDR_ANY);
 	else
 		rd->addr.sin_addr.s_addr = htonl(*o->bindname);
@@ -1249,8 +1249,7 @@ static int fio_rdmaio_init(struct thread_data *td)
 {
 	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdmaio_options *o = td->eo;
-	unsigned int max_bs;
-	int ret, i;
+	int ret;
 
 	if (td_rw(td)) {
 		log_err("fio: rdma connections must be read OR write\n");
@@ -1318,6 +1317,13 @@ static int fio_rdmaio_init(struct thread_data *td)
 		rd->is_client = 1;
 		ret = fio_rdmaio_setup_connect(td, td->o.filename, o->port);
 	}
+	return ret;
+}
+static int fio_rdmaio_post_init(struct thread_data *td)
+{
+	unsigned int max_bs;
+	int i;
+	struct rdmaio_data *rd = td->io_ops_data;
 
 	max_bs = max(td->o.max_bs[DDIR_READ], td->o.max_bs[DDIR_WRITE]);
 	rd->send_buf.max_bs = htonl(max_bs);
@@ -1351,7 +1357,7 @@ static int fio_rdmaio_init(struct thread_data *td)
 
 	rd->send_buf.nr = htonl(i);
 
-	return ret;
+	return 0;
 }
 
 static void fio_rdmaio_cleanup(struct thread_data *td)
@@ -1388,6 +1394,7 @@ static struct ioengine_ops ioengine_rw = {
 	.version		= FIO_IOOPS_VERSION,
 	.setup			= fio_rdmaio_setup,
 	.init			= fio_rdmaio_init,
+	.post_init		= fio_rdmaio_post_init,
 	.prep			= fio_rdmaio_prep,
 	.queue			= fio_rdmaio_queue,
 	.commit			= fio_rdmaio_commit,
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 1b6625b2..7f3e4cb3 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -153,6 +153,46 @@ static int fio_skeleton_close(struct thread_data *td, struct fio_file *f)
 	return generic_close_file(td, f);
 }
 
+/*
+ * Hook for getting the zoned model of a zoned block device for zonemode=zbd.
+ * The zoned model can be one of (see zbd_types.h):
+ * - ZBD_IGNORE: skip regular files
+ * - ZBD_NONE: regular block device (zone emulation will be used)
+ * - ZBD_HOST_AWARE: host aware zoned block device
+ * - ZBD_HOST_MANAGED: host managed zoned block device
+ */
+static int fio_skeleton_get_zoned_model(struct thread_data *td,
+			struct fio_file *f, enum zbd_zoned_model *model)
+{
+	*model = ZBD_NONE;
+	return 0;
+}
+
+/*
+ * Hook called for getting zone information of a ZBD_HOST_AWARE or
+ * ZBD_HOST_MANAGED zoned block device. Up to @nr_zones zone information
+ * structures can be reported using the array zones for zones starting from
+ * @offset. The number of zones reported must be returned or a negative error
+ * code in case of error.
+ */
+static int fio_skeleton_report_zones(struct thread_data *td, struct fio_file *f,
+				     uint64_t offset, struct zbd_zone *zones,
+				     unsigned int nr_zones)
+{
+	return 0;
+}
+
+/*
+ * Hook called for resetting the write pointer position of zones of a
+ * ZBD_HOST_AWARE or ZBD_HOST_MANAGED zoned block device. The write pointer
+ * position of all zones in the range @offset..@offset + @length must be reset.
+ */
+static int fio_skeleton_reset_wp(struct thread_data *td, struct fio_file *f,
+				 uint64_t offset, uint64_t length)
+{
+	return 0;
+}
+
 /*
  * Note that the structure is exported, so that fio can get it via
  * dlsym(..., "ioengine"); for (and only for) external engines.
@@ -169,6 +209,9 @@ struct ioengine_ops ioengine = {
 	.cleanup	= fio_skeleton_cleanup,
 	.open_file	= fio_skeleton_open,
 	.close_file	= fio_skeleton_close,
+	.get_zoned_model = fio_skeleton_get_zoned_model,
+	.report_zones	= fio_skeleton_report_zones,
+	.reset_wp	= fio_skeleton_reset_wp,
 	.options	= options,
 	.option_struct_size	= sizeof(struct fio_skeleton_options),
 };
diff --git a/fio.1 b/fio.1
index 1db12c2f..a2379f98 100644
--- a/fio.1
+++ b/fio.1
@@ -1629,6 +1629,12 @@ I/O. Requires \fBfilename\fR option to specify either block or
 character devices. This engine supports trim operations. The
 sg engine includes engine specific options.
 .TP
+.B libzbc
+Synchronous I/O engine for SMR hard-disks using the \fBlibzbc\fR
+library. The target can be either an sg character device or
+a block device file. This engine supports the zonemode=zbd zone
+operations.
+.TP
 .B null
 Doesn't transfer any data, just pretends to. This is mainly used to
 exercise fio itself and for debugging/testing purposes.
diff --git a/fio.h b/fio.h
index 2a9eef45..bbf057c1 100644
--- a/fio.h
+++ b/fio.h
@@ -172,8 +172,6 @@ struct zone_split_index {
 	uint64_t size_prev;
 };
 
-#define FIO_MAX_OPEN_ZBD_ZONES 128
-
 /*
  * This describes a single thread/process executing a fio job.
  */
diff --git a/io_u.h b/io_u.h
index 0f63cdd0..87c29201 100644
--- a/io_u.h
+++ b/io_u.h
@@ -93,7 +93,6 @@ struct io_u {
 		struct workqueue_work work;
 	};
 
-#ifdef CONFIG_LINUX_BLKZONED
 	/*
 	 * ZBD mode zbd_queue_io callback: called after engine->queue operation
 	 * to advance a zone write pointer and eventually unlock the I/O zone.
@@ -108,7 +107,6 @@ struct io_u {
 	 * or commit of an async I/O to unlock the I/O target zone.
 	 */
 	void (*zbd_put_io)(const struct io_u *);
-#endif
 
 	/*
 	 * Callback for io completion
diff --git a/ioengines.h b/ioengines.h
index 01a9b586..f48b4db9 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -6,8 +6,9 @@
 #include "compiler/compiler.h"
 #include "flist.h"
 #include "io_u.h"
+#include "zbd_types.h"
 
-#define FIO_IOOPS_VERSION	25
+#define FIO_IOOPS_VERSION	26
 
 /*
  * io_ops->queue() return values
@@ -44,6 +45,12 @@ struct ioengine_ops {
 	void (*iomem_free)(struct thread_data *);
 	int (*io_u_init)(struct thread_data *, struct io_u *);
 	void (*io_u_free)(struct thread_data *, struct io_u *);
+	int (*get_zoned_model)(struct thread_data *td,
+			       struct fio_file *f, enum zbd_zoned_model *);
+	int (*report_zones)(struct thread_data *, struct fio_file *,
+			    uint64_t, struct zbd_zone *, unsigned int);
+	int (*reset_wp)(struct thread_data *, struct fio_file *,
+			uint64_t, uint64_t);
 	int option_struct_size;
 	struct fio_option *options;
 };
diff --git a/options.c b/options.c
index 4714a3a1..2372c042 100644
--- a/options.c
+++ b/options.c
@@ -13,6 +13,7 @@
 #include "lib/pattern.h"
 #include "options.h"
 #include "optgroup.h"
+#include "zbd.h"
 
 char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 
@@ -3362,7 +3363,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Maximum number of open zones",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, max_open_zones),
-		.maxval	= FIO_MAX_OPEN_ZBD_ZONES,
+		.maxval	= ZBD_MAX_OPEN_ZONES,
 		.help	= "Limit random writes to SMR drives to the specified"
 			  " number of sequential zones",
 		.def	= "0",
diff --git a/oslib/blkzoned.h b/oslib/blkzoned.h
new file mode 100644
index 00000000..4cc071dc
--- /dev/null
+++ b/oslib/blkzoned.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ */
+#ifndef FIO_BLKZONED_H
+#define FIO_BLKZONED_H
+
+#include "zbd_types.h"
+
+#ifdef CONFIG_HAS_BLKZONED
+extern int blkzoned_get_zoned_model(struct thread_data *td,
+			struct fio_file *f, enum zbd_zoned_model *model);
+extern int blkzoned_report_zones(struct thread_data *td,
+				struct fio_file *f, uint64_t offset,
+				struct zbd_zone *zones, unsigned int nr_zones);
+extern int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
+				uint64_t offset, uint64_t length);
+#else
+/*
+ * Define stubs for systems that do not have zoned block device support.
+ */
+static inline int blkzoned_get_zoned_model(struct thread_data *td,
+			struct fio_file *f, enum zbd_zoned_model *model)
+{
+	/*
+	 * If this is a block device file, allow zbd emulation.
+	 */
+	if (f->filetype == FIO_TYPE_BLOCK) {
+		*model = ZBD_NONE;
+		return 0;
+	}
+
+	return -ENODEV;
+}
+static inline int blkzoned_report_zones(struct thread_data *td,
+				struct fio_file *f, uint64_t offset,
+				struct zbd_zone *zones, unsigned int nr_zones)
+{
+	return -EIO;
+}
+static inline int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
+				    uint64_t offset, uint64_t length)
+{
+	return -EIO;
+}
+#endif
+
+#endif /* FIO_BLKZONED_H */
diff --git a/oslib/linux-blkzoned.c b/oslib/linux-blkzoned.c
new file mode 100644
index 00000000..61ea3a53
--- /dev/null
+++ b/oslib/linux-blkzoned.c
@@ -0,0 +1,219 @@
+/*
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ */
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "file.h"
+#include "fio.h"
+#include "lib/pow2.h"
+#include "log.h"
+#include "oslib/asprintf.h"
+#include "smalloc.h"
+#include "verify.h"
+#include "zbd_types.h"
+
+#include <linux/blkzoned.h>
+
+/*
+ * Read up to 255 characters from the first line of a file. Strip the trailing
+ * newline.
+ */
+static char *read_file(const char *path)
+{
+	char line[256], *p = line;
+	FILE *f;
+
+	f = fopen(path, "rb");
+	if (!f)
+		return NULL;
+	if (!fgets(line, sizeof(line), f))
+		line[0] = '\0';
+	strsep(&p, "\n");
+	fclose(f);
+
+	return strdup(line);
+}
+
+int blkzoned_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			     enum zbd_zoned_model *model)
+{
+	const char *file_name = f->file_name;
+	char *zoned_attr_path = NULL;
+	char *model_str = NULL;
+	struct stat statbuf;
+	char *sys_devno_path = NULL;
+	char *part_attr_path = NULL;
+	char *part_str = NULL;
+	char sys_path[PATH_MAX];
+	ssize_t sz;
+	char *delim = NULL;
+
+	if (f->filetype != FIO_TYPE_BLOCK) {
+		*model = ZBD_IGNORE;
+		return 0;
+	}
+
+	*model = ZBD_NONE;
+
+	if (stat(file_name, &statbuf) < 0)
+		goto out;
+
+	if (asprintf(&sys_devno_path, "/sys/dev/block/%d:%d",
+		     major(statbuf.st_rdev), minor(statbuf.st_rdev)) < 0)
+		goto out;
+
+	sz = readlink(sys_devno_path, sys_path, sizeof(sys_path) - 1);
+	if (sz < 0)
+		goto out;
+	sys_path[sz] = '\0';
+
+	/*
+	 * If the device is a partition device, cut the device name in the
+	 * canonical sysfs path to obtain the sysfs path of the holder device.
+	 *   e.g.:  /sys/devices/.../sda/sda1 -> /sys/devices/.../sda
+	 */
+	if (asprintf(&part_attr_path, "/sys/dev/block/%s/partition",
+		     sys_path) < 0)
+		goto out;
+	part_str = read_file(part_attr_path);
+	if (part_str && *part_str == '1') {
+		delim = strrchr(sys_path, '/');
+		if (!delim)
+			goto out;
+		*delim = '\0';
+	}
+
+	if (asprintf(&zoned_attr_path,
+		     "/sys/dev/block/%s/queue/zoned", sys_path) < 0)
+		goto out;
+
+	model_str = read_file(zoned_attr_path);
+	if (!model_str)
+		goto out;
+	dprint(FD_ZBD, "%s: zbd model string: %s\n", file_name, model_str);
+	if (strcmp(model_str, "host-aware") == 0)
+		*model = ZBD_HOST_AWARE;
+	else if (strcmp(model_str, "host-managed") == 0)
+		*model = ZBD_HOST_MANAGED;
+out:
+	free(model_str);
+	free(zoned_attr_path);
+	free(part_str);
+	free(part_attr_path);
+	free(sys_devno_path);
+	return 0;
+}
+
+int blkzoned_report_zones(struct thread_data *td, struct fio_file *f,
+			  uint64_t offset, struct zbd_zone *zones,
+			  unsigned int nr_zones)
+{
+	struct blk_zone_report *hdr = NULL;
+	struct blk_zone *blkz;
+	struct zbd_zone *z;
+	unsigned int i;
+	int fd = -1, ret;
+
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	hdr = calloc(1, sizeof(struct blk_zone_report) +
+			nr_zones * sizeof(struct blk_zone));
+	if (!hdr) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	hdr->nr_zones = nr_zones;
+	hdr->sector = offset >> 9;
+	ret = ioctl(fd, BLKREPORTZONE, hdr);
+	if (ret) {
+		ret = -errno;
+		goto out;
+	}
+
+	nr_zones = hdr->nr_zones;
+	blkz = &hdr->zones[0];
+	z = &zones[0];
+	for (i = 0; i < nr_zones; i++, z++, blkz++) {
+		z->start = blkz->start << 9;
+		z->wp = blkz->wp << 9;
+		z->len = blkz->len << 9;
+
+		switch (blkz->type) {
+		case BLK_ZONE_TYPE_CONVENTIONAL:
+			z->type = ZBD_ZONE_TYPE_CNV;
+			break;
+		case BLK_ZONE_TYPE_SEQWRITE_REQ:
+			z->type = ZBD_ZONE_TYPE_SWR;
+			break;
+		case BLK_ZONE_TYPE_SEQWRITE_PREF:
+			z->type = ZBD_ZONE_TYPE_SWP;
+			break;
+		default:
+			td_verror(td, errno, "invalid zone type");
+			log_err("%s: invalid type for zone at sector %llu.\n",
+				f->file_name, (unsigned long long)offset >> 9);
+			ret = -EIO;
+			goto out;
+		}
+
+		switch (blkz->cond) {
+		case BLK_ZONE_COND_NOT_WP:
+			z->cond = ZBD_ZONE_COND_NOT_WP;
+			break;
+		case BLK_ZONE_COND_EMPTY:
+			z->cond = ZBD_ZONE_COND_EMPTY;
+			break;
+		case BLK_ZONE_COND_IMP_OPEN:
+			z->cond = ZBD_ZONE_COND_IMP_OPEN;
+			break;
+		case BLK_ZONE_COND_EXP_OPEN:
+			z->cond = ZBD_ZONE_COND_EXP_OPEN;
+			break;
+		case BLK_ZONE_COND_CLOSED:
+			z->cond = ZBD_ZONE_COND_CLOSED;
+			break;
+		case BLK_ZONE_COND_FULL:
+			z->cond = ZBD_ZONE_COND_FULL;
+			break;
+		case BLK_ZONE_COND_READONLY:
+		case BLK_ZONE_COND_OFFLINE:
+		default:
+			/* Treat all these conditions as offline (don't use!) */
+			z->cond = ZBD_ZONE_COND_OFFLINE;
+			break;
+		}
+	}
+
+	ret = nr_zones;
+out:
+	free(hdr);
+	close(fd);
+
+	return ret;
+}
+
+int blkzoned_reset_wp(struct thread_data *td, struct fio_file *f,
+		      uint64_t offset, uint64_t length)
+{
+	struct blk_zone_range zr = {
+		.sector         = offset >> 9,
+		.nr_sectors     = length >> 9,
+	};
+
+	if (ioctl(f->fd, BLKRESETZONE, &zr) < 0)
+		return -errno;
+
+	return 0;
+}
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index ea5abc4e..8e326ed5 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -465,7 +465,7 @@ class Requirements(object):
                 print("Unable to open {0} to check requirements".format(config_file))
                 Requirements._zbd = True
             else:
-                Requirements._zbd = "CONFIG_LINUX_BLKZONED" in contents
+                Requirements._zbd = "CONFIG_HAS_BLKZONED" in contents
                 Requirements._libaio = "CONFIG_LIBAIO" in contents
 
             Requirements._root = (os.geteuid() == 0)
diff --git a/t/zbd/functions b/t/zbd/functions
index d49555a8..35087b15 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -4,18 +4,27 @@ blkzone=$(type -p blkzone 2>/dev/null)
 sg_inq=$(type -p sg_inq 2>/dev/null)
 zbc_report_zones=$(type -p zbc_report_zones 2>/dev/null)
 zbc_reset_zone=$(type -p zbc_reset_zone 2>/dev/null)
+zbc_info=$(type -p zbc_info 2>/dev/null)
 if [ -z "${blkzone}" ] &&
        { [ -z "${zbc_report_zones}" ] || [ -z "${zbc_reset_zone}" ]; }; then
     echo "Error: neither blkzone nor zbc_report_zones is available"
     exit 1
 fi
 
+if [ -n "${use_libzbc}" ] &&
+       { [ -z "${zbc_report_zones}" ] || [ -z "${zbc_reset_zone}" ] ||
+         [ -z "${zbc_info}" ]; }; then
+    echo "Error: zbc_report_zones, or zbc_reset_zone or zbc_info is not available"
+    echo "Error: reinstall libzbc tools"
+    exit 1
+fi
+
 # Reports the starting sector and length of the first sequential zone of device
 # $1.
 first_sequential_zone() {
     local dev=$1
 
-    if [ -n "${blkzone}" ]; then
+    if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
 	${blkzone} report "$dev" |
 	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*type:[[:blank:]]2(.*/\1 \2/p' |
 	    {
@@ -33,7 +42,7 @@ first_sequential_zone() {
 max_open_zones() {
     local dev=$1
 
-    if [ -n "${sg_inq}" ]; then
+    if [ -n "${sg_inq}" ] && [ ! -n "${use_libzbc}" ]; then
 	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" 2> /dev/null; then
 	    # Non scsi device such as null_blk can not return max open zones.
 	    # Use default value.
@@ -56,13 +65,36 @@ max_open_zones() {
     fi
 }
 
+is_zbc() {
+	local dev=$1
+
+	[[ -z "$(${zbc_info} "$dev" | grep "is not a zoned block device")" ]]
+}
+
+zbc_logical_block_size() {
+	local dev=$1
+
+	${zbc_info} "$dev" |
+		grep "logical blocks" |
+		sed -n 's/^[[:blank:]]*[0-9]* logical blocks of[[:blank:]]*//p' |
+		sed 's/ B//'
+}
+
+zbc_disk_sectors() {
+        local dev=$1
+
+	zbc_info "$dev" |
+		grep "512-bytes sectors" |
+		sed -e 's/[[:blank:]]*\([0-9]*\)512-bytes sectors.*/\1/'
+}
+
 # Reset the write pointer of one zone on device $1 at offset $2. The offset
 # must be specified in units of 512 byte sectors. Offset -1 means reset all
 # zones.
 reset_zone() {
     local dev=$1 offset=$2 sectors
 
-    if [ -n "${blkzone}" ]; then
+    if [ -n "${blkzone}" ] && [ ! -n "${use_libzbc}" ]; then
 	if [ "$offset" -lt 0 ]; then
 	    sectors=$(<"/sys/class/block/${dev#/dev/}/size")
 	    ${blkzone} reset -o "${offset}" -l "$sectors" "$dev"
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index bd41fffb..be889f34 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -5,7 +5,7 @@
 # This file is released under the GPL.
 
 usage() {
-    echo "Usage: $(basename "$0") [-d] [-e] [-r] [-v] [-t <test>] <SMR drive device node>"
+    echo "Usage: $(basename "$0") [-d] [-e] [-l] [-r] [-v] [-t <test>] [-z] <SMR drive device node>"
 }
 
 max() {
@@ -24,6 +24,14 @@ min() {
     fi
 }
 
+ioengine() {
+	if [ -n "$use_libzbc" ]; then
+		echo -n "--ioengine=libzbc"
+	else
+		echo -n "--ioengine=$1"
+	fi
+}
+
 set_io_scheduler() {
     local dev=$1 sched=$2
 
@@ -87,6 +95,7 @@ run_fio() {
 
     opts=("--aux-path=/tmp" "--allow_file_create=0" \
 			    "--significant_figures=10" "$@")
+    opts+=(${var_opts[@]})
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
     "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
@@ -115,7 +124,7 @@ run_fio_on_seq() {
 # Check whether buffered writes are refused.
 test1() {
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
-	    --size="${zone_size}" --thread=1				\
+	    "$(ioengine "psync")" --size="${zone_size}" --thread=1	\
 	    --zonemode=zbd --zonesize="${zone_size}" 2>&1 |
 	tee -a "${logfile}.${test_number}" |
 	grep -q 'Using direct I/O is mandatory for writing to ZBD drives'
@@ -137,6 +146,7 @@ test2() {
 
     off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     bs=$((2 * zone_size))
+    opts+=("$(ioengine "psync")")
     opts+=("--name=job1" "--filename=$dev" "--rw=write" "--direct=1")
     opts+=("--zonemode=zbd" "--offset=$off" "--bs=$bs" "--size=$bs")
     if [ -z "$is_zbd" ]; then
@@ -155,7 +165,7 @@ test3() {
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--bs=4K")
     opts+=("--size=$size" "--zonemode=zbd")
-    opts+=("--ioengine=psync" "--rw=read" "--direct=1" "--thread=1")
+    opts+=("$(ioengine "psync")" "--rw=read" "--direct=1" "--thread=1")
     if [ -z "$is_zbd" ]; then
 	opts+=("--zonesize=${zone_size}")
     fi
@@ -178,7 +188,7 @@ test4() {
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--bs=$size")
     opts+=("--size=$size" "--thread=1" "--read_beyond_wp=1")
-    opts+=("--ioengine=psync" "--rw=read" "--direct=1" "--disable_lat=1")
+    opts+=("$(ioengine "psync")" "--rw=read" "--direct=1" "--disable_lat=1")
     opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_read $size || return $?
@@ -189,7 +199,7 @@ test5() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write		\
+    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
 		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -202,7 +212,7 @@ test6() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=read		\
+    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=read	\
 		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
     check_read $size || return $?
@@ -212,7 +222,7 @@ test6() {
 test7() {
     local size=$((zone_size))
 
-    run_fio_on_seq --ioengine=libaio --iodepth=1 --rw=randwrite		\
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=1 --rw=randwrite	\
 		   --bs="$(min 16384 "${zone_size}")"			\
 		   --do_verify=1 --verify=md5 --size="$size"		\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -225,7 +235,7 @@ test8() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite	\
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite	\
 		   --bs="$(min 16384 "${zone_size}")"			\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -243,7 +253,8 @@ test9() {
     fi
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=sg --iodepth=1 --rw=randwrite --bs=16K	\
+    run_fio_on_seq --ioengine=sg					\
+		   --iodepth=1 --rw=randwrite --bs=16K			\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $size || return $?
@@ -260,7 +271,8 @@ test10() {
     fi
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=sg --iodepth=64 --rw=randwrite --bs=16K	\
+    run_fio_on_seq --ioengine=sg 					\
+		   --iodepth=64 --rw=randwrite --bs=16K			\
 		   --do_verify=1 --verify=md5				\
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $size || return $?
@@ -272,7 +284,7 @@ test11() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite	\
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite	\
 		   --bsrange=4K-64K --do_verify=1 --verify=md5		\
 		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $size || return $?
@@ -284,7 +296,7 @@ test12() {
     local size
 
     size=$((8 * zone_size))
-    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K     \
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		   --max_open_zones=1 --size=$size --do_verify=1 --verify=md5 \
 		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written $size || return $?
@@ -296,7 +308,7 @@ test13() {
     local size
 
     size=$((8 * zone_size))
-    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K     \
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		   --max_open_zones=4 --size=$size --do_verify=1 --verify=md5 \
 		   --debug=zbd						      \
 		   >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -314,7 +326,7 @@ test14() {
 	     >>"${logfile}.${test_number}"
 	return 0
     fi
-    run_one_fio_job --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K \
+    run_one_fio_job "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --do_verify=1 \
 		    --verify=md5 --size=$size				   \
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -333,14 +345,14 @@ test15() {
     done
     off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
     size=$((2 * zone_size))
-    run_one_fio_job --ioengine=psync --rw=write --bs=$((zone_size / 16))\
+    run_one_fio_job "$(ioengine "psync")" --rw=write --bs=$((zone_size / 16))\
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$size >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
     check_written $size || return $?
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
-    run_one_fio_job --ioengine=psync --rw=read --bs=$((zone_size / 16))	\
+    run_one_fio_job "$(ioengine "psync")" --rw=read --bs=$((zone_size / 16)) \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$((size)) >>"${logfile}.${test_number}" 2>&1 ||
 	return $?
@@ -357,7 +369,7 @@ test16() {
 
     off=$((first_sequential_zone_sector * 512))
     size=$((4 * zone_size))
-    run_one_fio_job --ioengine=libaio --iodepth=64 --rw=randread --bs=16K \
+    run_one_fio_job "$(ioengine "libaio")" --iodepth=64 --rw=randread --bs=16K \
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
 		    --size=$size >>"${logfile}.${test_number}" 2>&1 || return $?
     check_read $size || return $?
@@ -373,12 +385,12 @@ test17() {
     if [ -n "$is_zbd" ]; then
 	reset_zone "$dev" $((off / 512)) || return $?
     fi
-    run_one_fio_job --ioengine=psync --rw=write --offset="$off"		\
+    run_one_fio_job "$(ioengine "psync")" --rw=write --offset="$off"	\
 		    --zonemode=zbd --zonesize="${zone_size}"		\
 		    --bs="$zone_size" --size="$zone_size"		\
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
     check_written "$zone_size" || return $?
-    run_one_fio_job --ioengine=libaio --iodepth=8 --rw=randrw --bs=4K	\
+    run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw --bs=4K \
 		    --zonemode=zbd --zonesize="${zone_size}"		\
 		    --offset=$off --loops=2 --norandommap=1\
 		    >>"${logfile}.${test_number}" 2>&1 || return $?
@@ -431,8 +443,8 @@ test24() {
     local bs loops=9 size=$((zone_size))
 
     bs=$(min $((256*1024)) "$zone_size")
-    run_fio_on_seq --ioengine=psync --rw=write --bs="$bs" --size=$size	 \
-		   --loops=$loops					 \
+    run_fio_on_seq "$(ioengine "psync")" --rw=write --bs="$bs"		\
+		   --size=$size --loops=$loops				\
 		   --zone_reset_frequency=.01 --zone_reset_threshold=.90 \
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((size * loops)) || return $?
@@ -452,8 +464,9 @@ test25() {
     for ((i=0;i<16;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
 	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
-	opts+=("--size=$zone_size" "--ioengine=psync" "--rw=write" "--bs=16K")
+	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=write" "--bs=16K")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
+	opts+=(${var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
 }
@@ -462,7 +475,7 @@ write_to_first_seq_zone() {
     local loops=4 r
 
     r=$(((RANDOM << 16) | RANDOM))
-    run_fio --name="$dev" --filename="$dev" --ioengine=psync --rw="$1"	\
+    run_fio --name="$dev" --filename="$dev" "$(ioengine "psync")" --rw="$1" \
 	    --thread=1 --do_verify=1 --verify=md5 --direct=1 --bs=4K	\
 	    --offset=$((first_sequential_zone_sector * 512))		\
 	    "--size=$zone_size" --loops=$loops --randseed="$r"		\
@@ -490,9 +503,10 @@ test28() {
     opts=("--debug=zbd")
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
-	opts+=("--size=$zone_size" "--ioengine=psync" "--rw=randwrite")
+	opts+=("--size=$zone_size" "$(ioengine "psync")" "--rw=randwrite")
 	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
 	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
+	opts+=(${var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * zone_size)) || return $?
@@ -513,9 +527,10 @@ test29() {
     for ((i=0;i<jobs;i++)); do
 	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
 	opts+=("--size=$size" "--io_size=$zone_size" "--thread=1")
-	opts+=("--ioengine=psync" "--rw=randwrite" "--direct=1")
+	opts+=("$(ioengine "psync")" "--rw=randwrite" "--direct=1")
 	opts+=("--max_open_zones=4" "--group_reporting=1")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+	opts+=(${var_opts[@]})
     done
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((jobs * zone_size)) || return $?
@@ -526,7 +541,7 @@ test30() {
     local off
 
     off=$((first_sequential_zone_sector * 512))
-    run_one_fio_job --ioengine=libaio --iodepth=8 --rw=randrw		\
+    run_one_fio_job "$(ioengine "libaio")" --iodepth=8 --rw=randrw	\
 		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
 		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off\
 		    --loops=2 --time_based --runtime=30s --norandommap=1\
@@ -548,16 +563,17 @@ test31() {
     for ((off = first_sequential_zone_sector * 512; off < disk_size;
 	  off += inc)); do
 	opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--io_size=$bs")
-	opts+=("--bs=$bs" "--size=$zone_size" "--ioengine=libaio")
+	opts+=("--bs=$bs" "--size=$zone_size" "$(ioengine "libaio")")
 	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
 	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+	opts+=(${var_opts[@]})
     done
     "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
     # Next, run the test.
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
-    opts+=("--bs=$bs" "--ioengine=psync" "--rw=randread" "--direct=1")
+    opts+=("--bs=$bs" "$(ioengine "psync")" "--rw=randread" "--direct=1")
     opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
     opts+=("--zonesize=${zone_size}")
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
@@ -571,7 +587,7 @@ test32() {
     off=$((first_sequential_zone_sector * 512))
     size=$((disk_size - off))
     opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
-    opts+=("--bs=128K" "--ioengine=psync" "--rw=randwrite" "--direct=1")
+    opts+=("--bs=128K" "$(ioengine "psync")" "--rw=randwrite" "--direct=1")
     opts+=("--thread=1" "--time_based" "--runtime=30")
     opts+=("--max_open_zones=$max_open_zones" "--zonemode=zbd")
     opts+=("--zonesize=${zone_size}")
@@ -586,8 +602,8 @@ test33() {
     size=$((2 * zone_size))
     io_size=$((5 * zone_size))
     bs=$((3 * zone_size / 4))
-    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write --size=$size	\
-		   --io_size=$io_size --bs=$bs				\
+    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write	\
+		   --size=$size --io_size=$io_size --bs=$bs	\
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $(((io_size + bs - 1) / bs * bs)) || return $?
 }
@@ -598,7 +614,7 @@ test34() {
     local size
 
     size=$((2 * zone_size))
-    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write --size=$size	  \
+    run_fio_on_seq "$(ioengine "psync")" --iodepth=1 --rw=write --size=$size \
 		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
 		   >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'not a divisor of' "${logfile}.${test_number}"
@@ -611,9 +627,9 @@ test35() {
     off=$(((first_sequential_zone_sector + 1) * 512))
     size=$((zone_size - 2 * 512))
     bs=$((zone_size / 4))
-    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
-		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
-		    --zonemode=zbd --zonesize="${zone_size}"		    \
+    run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
+		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
+		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
 }
@@ -625,9 +641,9 @@ test36() {
     off=$(((first_sequential_zone_sector) * 512))
     size=$((zone_size - 512))
     bs=$((zone_size / 4))
-    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
-		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
-		    --zonemode=zbd --zonesize="${zone_size}"		    \
+    run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
+		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
+		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
 }
@@ -643,9 +659,9 @@ test37() {
     fi
     size=$((zone_size + 2 * 512))
     bs=$((zone_size / 4))
-    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
-		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
-		    --zonemode=zbd --zonesize="${zone_size}"		    \
+    run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
+		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
+		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
 		    >> "${logfile}.${test_number}" 2>&1
     check_written $((zone_size)) || return $?
 }
@@ -657,9 +673,9 @@ test38() {
     size=$((logical_block_size))
     off=$((disk_size - logical_block_size))
     bs=$((logical_block_size))
-    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
-		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
-		    --zonemode=zbd --zonesize="${zone_size}"		    \
+    run_one_fio_job --offset=$off --size=$size "$(ioengine "psync")"	\
+		    --iodepth=1 --rw=write --do_verify=1 --verify=md5	\
+		    --bs=$bs --zonemode=zbd --zonesize="${zone_size}"	\
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
 }
@@ -669,7 +685,7 @@ read_one_block() {
     local bs
 
     bs=$((logical_block_size))
-    run_one_fio_job --rw=read --ioengine=psync --bs=$bs --size=$bs "$@" 2>&1 |
+    run_one_fio_job --rw=read "$(ioengine "psync")" --bs=$bs --size=$bs "$@" 2>&1 |
 	tee -a "${logfile}.${test_number}"
 }
 
@@ -725,7 +741,7 @@ test45() {
 
     [ -z "$is_zbd" ] && return 0
     bs=$((logical_block_size))
-    run_one_fio_job --ioengine=psync --iodepth=1 --rw=randwrite --bs=$bs\
+    run_one_fio_job "$(ioengine "psync")" --iodepth=1 --rw=randwrite --bs=$bs\
 		    --offset=$((first_sequential_zone_sector * 512)) \
 		    --size="$zone_size" --do_verify=1 --verify=md5 2>&1 |
 	tee -a "${logfile}.${test_number}" |
@@ -737,7 +753,7 @@ test46() {
     local size
 
     size=$((4 * zone_size))
-    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=4K \
+    run_fio_on_seq "$(ioengine "libaio")" --iodepth=64 --rw=randwrite --bs=4K \
 		   --group_reporting=1 --numjobs=8 \
 		   >> "${logfile}.${test_number}" 2>&1 || return $?
     check_written $((size * 8)) || return $?
@@ -749,7 +765,7 @@ test47() {
 
     [ -z "$is_zbd" ] && return 0
     bs=$((logical_block_size))
-    run_one_fio_job --ioengine=psync --rw=write --bs=$bs \
+    run_one_fio_job "$(ioengine "psync")" --rw=write --bs=$bs \
 		    --zonemode=zbd --zoneskip=1		 \
 		    >> "${logfile}.${test_number}" 2>&1 && return 1
     grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
@@ -766,7 +782,7 @@ test48() {
     [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
     opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
     opts+=("--debug=zbd")
-    opts+=("--ioengine=libaio" "--rw=randwrite" "--direct=1")
+    opts+=("$(ioengine "libaio")" "--rw=randwrite" "--direct=1")
     opts+=("--time_based" "--runtime=30")
     opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
     opts+=("--max_open_zones=4")
@@ -788,6 +804,8 @@ test48() {
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
+use_libzbc=
+zbd_debug=
 
 while [ "${1#-}" != "$1" ]; do
   case "$1" in
@@ -796,10 +814,12 @@ while [ "${1#-}" != "$1" ]; do
 	shift;;
     -e) dynamic_analyzer=(valgrind "--read-var-info=yes" "--tool=helgrind");
 	shift;;
+    -l) use_libzbc=1; shift;;
     -r) reset_all_zones=1; shift;;
     -t) tests+=("$2"); shift; shift;;
     -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
 	shift;;
+    -z) zbd_debug=1; shift;;
     --) shift; break;;
   esac
 done
@@ -812,48 +832,93 @@ fi
 # shellcheck source=functions
 source "$(dirname "$0")/functions" || exit $?
 
+var_opts=()
+if [ -n "$zbd_debug" ]; then
+    var_opts+=("--debug=zbd")
+fi
 dev=$1
 realdev=$(readlink -f "$dev")
 basename=$(basename "$realdev")
-major=$((0x$(stat -L -c '%t' "$realdev"))) || exit $?
-minor=$((0x$(stat -L -c '%T' "$realdev"))) || exit $?
-disk_size=$(($(<"/sys/dev/block/$major:$minor/size")*512))
-# When the target is a partition device, get basename of its holder device to
-# access sysfs path of the holder device
-if [[ -r "/sys/dev/block/$major:$minor/partition" ]]; then
-	realsysfs=$(readlink "/sys/dev/block/$major:$minor")
-	basename=$(basename "${realsysfs%/*}")
-fi
-logical_block_size=$(<"/sys/block/$basename/queue/logical_block_size")
-case "$(<"/sys/class/block/$basename/queue/zoned")" in
-    host-managed|host-aware)
+
+if [[ -b "$realdev" ]]; then
+	major=$((0x$(stat -L -c '%t' "$realdev"))) || exit $?
+	minor=$((0x$(stat -L -c '%T' "$realdev"))) || exit $?
+	disk_size=$(($(<"/sys/dev/block/$major:$minor/size")*512))
+
+	# When the target is a partition device, get basename of its
+	# holder device to access sysfs path of the holder device
+	if [[ -r "/sys/dev/block/$major:$minor/partition" ]]; then
+		realsysfs=$(readlink "/sys/dev/block/$major:$minor")
+		basename=$(basename "${realsysfs%/*}")
+	fi
+	logical_block_size=$(<"/sys/block/$basename/queue/logical_block_size")
+	case "$(<"/sys/class/block/$basename/queue/zoned")" in
+	host-managed|host-aware)
+		is_zbd=true
+		if ! result=($(first_sequential_zone "$dev")); then
+			echo "Failed to determine first sequential zone"
+			exit 1
+		fi
+		first_sequential_zone_sector=${result[0]}
+		sectors_per_zone=${result[1]}
+		zone_size=$((sectors_per_zone * 512))
+		if ! max_open_zones=$(max_open_zones "$dev"); then
+			echo "Failed to determine maximum number of open zones"
+			exit 1
+		fi
+		set_io_scheduler "$basename" deadline || exit $?
+		if [ -n "$reset_all_zones" ]; then
+			reset_zone "$dev" -1
+		fi
+		;;
+	*)
+		first_sequential_zone_sector=$(((disk_size / 2) &
+						(logical_block_size - 1)))
+		zone_size=$(max 65536 "$logical_block_size")
+		sectors_per_zone=$((zone_size / 512))
+		max_open_zones=128
+		set_io_scheduler "$basename" none || exit $?
+		;;
+	esac
+elif [[ -c "$realdev" ]]; then
+	# For an SG node, we must have libzbc option specified
+	if [[ ! -n "$use_libzbc" ]]; then
+		echo "Character device files can only be used with -l (libzbc) option"
+		exit 1
+	fi
+
+	if ! $(is_zbc "$dev"); then
+		echo "Device is not a ZBC disk"
+		exit 1
+	fi
 	is_zbd=true
+
+	if ! disk_size=($(( $(zbc_disk_sectors "$dev") * 512))); then
+		echo "Failed to determine disk size"
+		exit 1
+	fi
+	if ! logical_block_size=($(zbc_logical_block_size "$dev")); then
+		echo "Failed to determine logical block size"
+		exit 1
+	fi
 	if ! result=($(first_sequential_zone "$dev")); then
-	    echo "Failed to determine first sequential zone"
-	    exit 1
+		echo "Failed to determine first sequential zone"
+		exit 1
 	fi
 	first_sequential_zone_sector=${result[0]}
 	sectors_per_zone=${result[1]}
 	zone_size=$((sectors_per_zone * 512))
 	if ! max_open_zones=$(max_open_zones "$dev"); then
-	    echo "Failed to determine maximum number of open zones"
-	    exit 1
+		echo "Failed to determine maximum number of open zones"
+		exit 1
 	fi
-	echo "First sequential zone starts at sector $first_sequential_zone_sector; zone size: $((zone_size >> 20)) MB"
-	set_io_scheduler "$basename" deadline || exit $?
 	if [ -n "$reset_all_zones" ]; then
-	    reset_zone "$dev" -1
+		reset_zone "$dev" -1
 	fi
-	;;
-    *)
-	first_sequential_zone_sector=$(((disk_size / 2) &
-					(logical_block_size - 1)))
-	zone_size=$(max 65536 "$logical_block_size")
-	sectors_per_zone=$((zone_size / 512))
-	max_open_zones=128
-	set_io_scheduler "$basename" none || exit $?
-	;;
-esac
+fi
+
+echo -n "First sequential zone starts at sector $first_sequential_zone_sector;"
+echo " zone size: $((zone_size >> 20)) MB"
 
 if [ "${#tests[@]}" = 0 ]; then
     readarray -t tests < <(declare -F | grep "test[0-9]*" | \
diff --git a/zbd.c b/zbd.c
index e2f3f52f..f4067802 100644
--- a/zbd.c
+++ b/zbd.c
@@ -7,12 +7,9 @@
 #include <errno.h>
 #include <string.h>
 #include <stdlib.h>
-#include <dirent.h>
 #include <fcntl.h>
-#include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <unistd.h>
-#include <linux/blkzoned.h>
 
 #include "file.h"
 #include "fio.h"
@@ -23,6 +20,97 @@
 #include "verify.h"
 #include "zbd.h"
 
+/**
+ * zbd_get_zoned_model - Get a device zoned model
+ * @td: FIO thread data
+ * @f: FIO file for which to get model information
+ */
+int zbd_get_zoned_model(struct thread_data *td, struct fio_file *f,
+			enum zbd_zoned_model *model)
+{
+	int ret;
+
+	if (td->io_ops && td->io_ops->get_zoned_model)
+		ret = td->io_ops->get_zoned_model(td, f, model);
+	else
+		ret = blkzoned_get_zoned_model(td, f, model);
+	if (ret < 0) {
+		td_verror(td, errno, "get zoned model failed");
+		log_err("%s: get zoned model failed (%d).\n",
+			f->file_name, errno);
+	}
+
+	return ret;
+}
+
+/**
+ * zbd_report_zones - Get zone information
+ * @td: FIO thread data.
+ * @f: FIO file for which to get zone information
+ * @offset: offset from which to report zones
+ * @zones: Array of struct zbd_zone
+ * @nr_zones: Size of @zones array
+ *
+ * Get zone information into @zones starting from the zone at offset @offset
+ * for the device specified by @f.
+ *
+ * Returns the number of zones reported upon success and a negative error code
+ * upon failure. If the zone report is empty, always assume an error (device
+ * problem) and return -EIO.
+ */
+int zbd_report_zones(struct thread_data *td, struct fio_file *f,
+		     uint64_t offset, struct zbd_zone *zones,
+		     unsigned int nr_zones)
+{
+	int ret;
+
+	if (td->io_ops && td->io_ops->report_zones)
+		ret = td->io_ops->report_zones(td, f, offset, zones, nr_zones);
+	else
+		ret = blkzoned_report_zones(td, f, offset, zones, nr_zones);
+	if (ret < 0) {
+		td_verror(td, errno, "report zones failed");
+		log_err("%s: report zones from sector %llu failed (%d).\n",
+			f->file_name, (unsigned long long)offset >> 9, errno);
+	} else if (ret == 0) {
+		td_verror(td, errno, "Empty zone report");
+		log_err("%s: report zones from sector %llu is empty.\n",
+			f->file_name, (unsigned long long)offset >> 9);
+		ret = -EIO;
+	}
+
+	return ret;
+}
+
+/**
+ * zbd_reset_wp - reset the write pointer of a range of zones
+ * @td: FIO thread data.
+ * @f: FIO file for which to reset zones
+ * @offset: Starting offset of the first zone to reset
+ * @length: Length of the range of zones to reset
+ *
+ * Reset the write pointer of all zones in the range @offset...@offset+@length.
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+int zbd_reset_wp(struct thread_data *td, struct fio_file *f,
+		 uint64_t offset, uint64_t length)
+{
+	int ret;
+
+	if (td->io_ops && td->io_ops->reset_wp)
+		ret = td->io_ops->reset_wp(td, f, offset, length);
+	else
+		ret = blkzoned_reset_wp(td, f, offset, length);
+	if (ret < 0) {
+		td_verror(td, errno, "resetting wp failed");
+		log_err("%s: resetting wp for %llu sectors at sector %llu failed (%d).\n",
+			f->file_name, (unsigned long long)length >> 9,
+			(unsigned long long)offset >> 9, errno);
+	}
+
+	return ret;
+}
+
 /**
  * zbd_zone_idx - convert an offset into a zone number
  * @f: file pointer.
@@ -41,6 +129,15 @@ static uint32_t zbd_zone_idx(const struct fio_file *f, uint64_t offset)
 	return min(zone_idx, f->zbd_info->nr_zones);
 }
 
+/**
+ * zbd_zone_swr - Test whether a zone requires sequential writes
+ * @z: zone info pointer.
+ */
+static inline bool zbd_zone_swr(struct fio_zone_info *z)
+{
+	return z->type == ZBD_ZONE_TYPE_SWR;
+}
+
 /**
  * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
  * @f: file pointer.
@@ -54,7 +151,7 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 {
 	assert((required & 511) == 0);
 
-	return z->type == BLK_ZONE_TYPE_SEQWRITE_REQ &&
+	return zbd_zone_swr(z) &&
 		z->wp + required > z->start + f->zbd_info->zone_size;
 }
 
@@ -93,7 +190,7 @@ static bool zbd_using_direct_io(void)
 			continue;
 		for_each_file(td, f, j) {
 			if (f->zbd_info &&
-			    f->zbd_info->model == ZBD_DM_HOST_MANAGED)
+			    f->zbd_info->model == ZBD_HOST_MANAGED)
 				return false;
 		}
 	}
@@ -112,8 +209,7 @@ static bool zbd_is_seq_job(struct fio_file *f)
 	zone_idx_b = zbd_zone_idx(f, f->file_offset);
 	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size - 1);
 	for (zone_idx = zone_idx_b; zone_idx <= zone_idx_e; zone_idx++)
-		if (f->zbd_info->zone_info[zone_idx].type ==
-		    BLK_ZONE_TYPE_SEQWRITE_REQ)
+		if (zbd_zone_swr(&f->zbd_info->zone_info[zone_idx]))
 			return true;
 
 	return false;
@@ -224,119 +320,6 @@ static bool zbd_verify_bs(void)
 	return true;
 }
 
-/*
- * Read zone information into @buf starting from sector @start_sector.
- * @fd is a file descriptor that refers to a block device and @bufsz is the
- * size of @buf.
- *
- * Returns 0 upon success and a negative error code upon failure.
- * If the zone report is empty, always assume an error (device problem) and
- * return -EIO.
- */
-static int read_zone_info(int fd, uint64_t start_sector,
-			  void *buf, unsigned int bufsz)
-{
-	struct blk_zone_report *hdr = buf;
-	int ret;
-
-	if (bufsz < sizeof(*hdr))
-		return -EINVAL;
-
-	memset(hdr, 0, sizeof(*hdr));
-
-	hdr->nr_zones = (bufsz - sizeof(*hdr)) / sizeof(struct blk_zone);
-	hdr->sector = start_sector;
-	ret = ioctl(fd, BLKREPORTZONE, hdr);
-	if (ret)
-		return -errno;
-	if (!hdr->nr_zones)
-		return -EIO;
-	return 0;
-}
-
-/*
- * Read up to 255 characters from the first line of a file. Strip the trailing
- * newline.
- */
-static char *read_file(const char *path)
-{
-	char line[256], *p = line;
-	FILE *f;
-
-	f = fopen(path, "rb");
-	if (!f)
-		return NULL;
-	if (!fgets(line, sizeof(line), f))
-		line[0] = '\0';
-	strsep(&p, "\n");
-	fclose(f);
-
-	return strdup(line);
-}
-
-static enum blk_zoned_model get_zbd_model(const char *file_name)
-{
-	enum blk_zoned_model model = ZBD_DM_NONE;
-	char *zoned_attr_path = NULL;
-	char *model_str = NULL;
-	struct stat statbuf;
-	char *sys_devno_path = NULL;
-	char *part_attr_path = NULL;
-	char *part_str = NULL;
-	char sys_path[PATH_MAX];
-	ssize_t sz;
-	char *delim = NULL;
-
-	if (stat(file_name, &statbuf) < 0)
-		goto out;
-
-	if (asprintf(&sys_devno_path, "/sys/dev/block/%d:%d",
-		     major(statbuf.st_rdev), minor(statbuf.st_rdev)) < 0)
-		goto out;
-
-	sz = readlink(sys_devno_path, sys_path, sizeof(sys_path) - 1);
-	if (sz < 0)
-		goto out;
-	sys_path[sz] = '\0';
-
-	/*
-	 * If the device is a partition device, cut the device name in the
-	 * canonical sysfs path to obtain the sysfs path of the holder device.
-	 *   e.g.:  /sys/devices/.../sda/sda1 -> /sys/devices/.../sda
-	 */
-	if (asprintf(&part_attr_path, "/sys/dev/block/%s/partition",
-		     sys_path) < 0)
-		goto out;
-	part_str = read_file(part_attr_path);
-	if (part_str && *part_str == '1') {
-		delim = strrchr(sys_path, '/');
-		if (!delim)
-			goto out;
-		*delim = '\0';
-	}
-
-	if (asprintf(&zoned_attr_path,
-		     "/sys/dev/block/%s/queue/zoned", sys_path) < 0)
-		goto out;
-
-	model_str = read_file(zoned_attr_path);
-	if (!model_str)
-		goto out;
-	dprint(FD_ZBD, "%s: zbd model string: %s\n", file_name, model_str);
-	if (strcmp(model_str, "host-aware") == 0)
-		model = ZBD_DM_HOST_AWARE;
-	else if (strcmp(model_str, "host-managed") == 0)
-		model = ZBD_DM_HOST_MANAGED;
-
-out:
-	free(model_str);
-	free(zoned_attr_path);
-	free(part_str);
-	free(part_attr_path);
-	free(sys_devno_path);
-	return model;
-}
-
 static int ilog2(uint64_t i)
 {
 	int log = -1;
@@ -389,8 +372,8 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 		pthread_mutex_init(&p->mutex, &attr);
 		p->start = i * zone_size;
 		p->wp = p->start + zone_size;
-		p->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
-		p->cond = BLK_ZONE_COND_EMPTY;
+		p->type = ZBD_ZONE_TYPE_SWR;
+		p->cond = ZBD_ZONE_COND_EMPTY;
 	}
 	/* a sentinel */
 	p->start = nr_zones * zone_size;
@@ -405,51 +388,41 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 }
 
 /*
- * Parse the BLKREPORTZONE output and store it in f->zbd_info. Must be called
- * only for devices that support this ioctl, namely zoned block devices.
+ * Maximum number of zones to report in one operation.
+ */
+#define ZBD_REPORT_MAX_ZONES	8192U
+
+/*
+ * Parse the device zone report and store it in f->zbd_info. Must be called
+ * only for devices that are zoned, namely those with a model != ZBD_NONE.
  */
 static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 {
-	const unsigned int bufsz = sizeof(struct blk_zone_report) +
-		4096 * sizeof(struct blk_zone);
-	uint32_t nr_zones;
-	struct blk_zone_report *hdr;
-	const struct blk_zone *z;
+	int nr_zones, nrz;
+	struct zbd_zone *zones, *z;
 	struct fio_zone_info *p;
-	uint64_t zone_size, start_sector;
+	uint64_t zone_size, offset;
 	struct zoned_block_device_info *zbd_info = NULL;
 	pthread_mutexattr_t attr;
-	void *buf;
-	int fd, i, j, ret = 0;
+	int i, j, ret = 0;
 
 	pthread_mutexattr_init(&attr);
 	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutexattr_setpshared(&attr, true);
 
-	buf = malloc(bufsz);
-	if (!buf)
+	zones = calloc(ZBD_REPORT_MAX_ZONES, sizeof(struct zbd_zone));
+	if (!zones)
 		goto out;
 
-	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
-	if (fd < 0) {
-		ret = -errno;
-		goto free;
+	nrz = zbd_report_zones(td, f, 0, zones, ZBD_REPORT_MAX_ZONES);
+	if (nrz < 0) {
+		ret = nrz;
+		log_info("fio: report zones (offset 0) failed for %s (%d).\n",
+			 f->file_name, -ret);
+		goto out;
 	}
 
-	ret = read_zone_info(fd, 0, buf, bufsz);
-	if (ret < 0) {
-		log_info("fio: BLKREPORTZONE(%lu) failed for %s (%d).\n",
-			 0UL, f->file_name, -ret);
-		goto close;
-	}
-	hdr = buf;
-	if (hdr->nr_zones < 1) {
-		log_info("fio: %s has invalid zone information.\n",
-			 f->file_name);
-		goto close;
-	}
-	z = (void *)(hdr + 1);
-	zone_size = z->len << 9;
+	zone_size = zones[0].len;
 	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 
 	if (td->o.zone_size == 0) {
@@ -459,7 +432,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			f->file_name, (unsigned long long) td->o.zone_size,
 			(unsigned long long) zone_size);
 		ret = -EINVAL;
-		goto close;
+		goto out;
 	}
 
 	dprint(FD_ZBD, "Device %s has %d zones of size %llu KB\n", f->file_name,
@@ -469,24 +442,24 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
 	ret = -ENOMEM;
 	if (!zbd_info)
-		goto close;
+		goto out;
 	pthread_mutex_init(&zbd_info->mutex, &attr);
 	zbd_info->refcount = 1;
 	p = &zbd_info->zone_info[0];
-	for (start_sector = 0, j = 0; j < nr_zones;) {
-		z = (void *)(hdr + 1);
-		for (i = 0; i < hdr->nr_zones; i++, j++, z++, p++) {
+	for (offset = 0, j = 0; j < nr_zones;) {
+		z = &zones[0];
+		for (i = 0; i < nrz; i++, j++, z++, p++) {
 			pthread_mutex_init(&p->mutex, &attr);
-			p->start = z->start << 9;
+			p->start = z->start;
 			switch (z->cond) {
-			case BLK_ZONE_COND_NOT_WP:
-			case BLK_ZONE_COND_FULL:
+			case ZBD_ZONE_COND_NOT_WP:
+			case ZBD_ZONE_COND_FULL:
 				p->wp = p->start + zone_size;
 				break;
 			default:
 				assert(z->start <= z->wp);
-				assert(z->wp <= z->start + (zone_size >> 9));
-				p->wp = z->wp << 9;
+				assert(z->wp <= z->start + zone_size);
+				p->wp = z->wp;
 				break;
 			}
 			p->type = z->type;
@@ -495,22 +468,26 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 				log_info("%s: invalid zone data\n",
 					 f->file_name);
 				ret = -EINVAL;
-				goto close;
+				goto out;
 			}
 		}
 		z--;
-		start_sector = z->start + z->len;
+		offset = z->start + z->len;
 		if (j >= nr_zones)
 			break;
-		ret = read_zone_info(fd, start_sector, buf, bufsz);
-		if (ret < 0) {
-			log_info("fio: BLKREPORTZONE(%llu) failed for %s (%d).\n",
-				 (unsigned long long) start_sector, f->file_name, -ret);
-			goto close;
+		nrz = zbd_report_zones(td, f, offset,
+					    zones, ZBD_REPORT_MAX_ZONES);
+		if (nrz < 0) {
+			ret = nrz;
+			log_info("fio: report zones (offset %llu) failed for %s (%d).\n",
+			 	 (unsigned long long)offset,
+				 f->file_name, -ret);
+			goto out;
 		}
 	}
+
 	/* a sentinel */
-	zbd_info->zone_info[nr_zones].start = start_sector << 9;
+	zbd_info->zone_info[nr_zones].start = offset;
 
 	f->zbd_info = zbd_info;
 	f->zbd_info->zone_size = zone_size;
@@ -520,12 +497,9 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	zbd_info = NULL;
 	ret = 0;
 
-close:
-	sfree(zbd_info);
-	close(fd);
-free:
-	free(buf);
 out:
+	sfree(zbd_info);
+	free(zones);
 	pthread_mutexattr_destroy(&attr);
 	return ret;
 }
@@ -537,21 +511,31 @@ out:
  */
 static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 {
-	enum blk_zoned_model zbd_model;
-	int ret = 0;
+	enum zbd_zoned_model zbd_model;
+	int ret;
 
 	assert(td->o.zone_mode == ZONE_MODE_ZBD);
 
-	zbd_model = get_zbd_model(f->file_name);
+	ret = zbd_get_zoned_model(td, f, &zbd_model);
+	if (ret)
+		return ret;
+
 	switch (zbd_model) {
-	case ZBD_DM_HOST_AWARE:
-	case ZBD_DM_HOST_MANAGED:
+	case ZBD_IGNORE:
+		return 0;
+	case ZBD_HOST_AWARE:
+	case ZBD_HOST_MANAGED:
 		ret = parse_zone_info(td, f);
 		break;
-	case ZBD_DM_NONE:
+	case ZBD_NONE:
 		ret = init_zone_info(td, f);
 		break;
+	default:
+		td_verror(td, EINVAL, "Unsupported zoned model");
+		log_err("Unsupported zoned model\n");
+		return -EINVAL;
 	}
+
 	if (ret == 0)
 		f->zbd_info->model = zbd_model;
 	return ret;
@@ -613,8 +597,6 @@ int zbd_init(struct thread_data *td)
 	int i;
 
 	for_each_file(td, f, i) {
-		if (f->filetype != FIO_TYPE_BLOCK)
-			continue;
 		if (zbd_init_zone_info(td, f))
 			return 1;
 	}
@@ -642,31 +624,23 @@ int zbd_init(struct thread_data *td)
  *
  * Returns 0 upon success and a negative error code upon failure.
  */
-static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
+static int zbd_reset_range(struct thread_data *td, struct fio_file *f,
 			   uint64_t offset, uint64_t length)
 {
-	struct blk_zone_range zr = {
-		.sector         = offset >> 9,
-		.nr_sectors     = length >> 9,
-	};
 	uint32_t zone_idx_b, zone_idx_e;
 	struct fio_zone_info *zb, *ze, *z;
 	int ret = 0;
 
-	assert(f->fd != -1);
 	assert(is_valid_offset(f, offset + length - 1));
+
 	switch (f->zbd_info->model) {
-	case ZBD_DM_HOST_AWARE:
-	case ZBD_DM_HOST_MANAGED:
-		ret = ioctl(f->fd, BLKRESETZONE, &zr);
-		if (ret < 0) {
-			td_verror(td, errno, "resetting wp failed");
-			log_err("%s: resetting wp for %llu sectors at sector %llu failed (%d).\n",
-				f->file_name, zr.nr_sectors, zr.sector, errno);
+	case ZBD_HOST_AWARE:
+	case ZBD_HOST_MANAGED:
+		ret = zbd_reset_wp(td, f, offset, length);
+		if (ret < 0)
 			return ret;
-		}
 		break;
-	case ZBD_DM_NONE:
+	default:
 		break;
 	}
 
@@ -703,7 +677,7 @@ static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
  *
  * Returns 0 upon success and a negative error code upon failure.
  */
-static int zbd_reset_zone(struct thread_data *td, const struct fio_file *f,
+static int zbd_reset_zone(struct thread_data *td, struct fio_file *f,
 			  struct fio_zone_info *z)
 {
 	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
@@ -732,9 +706,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 
 	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
 		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
-	assert(f->fd != -1);
 	for (z = zb; z < ze; z++) {
-		if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		if (!zbd_zone_swr(z))
 			continue;
 		zone_lock(td, z);
 		reset_wp = all_zones ? z->wp != z->start :
@@ -899,7 +872,7 @@ static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
 	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
 	bool res = true;
 
-	if (z->cond == BLK_ZONE_COND_OFFLINE)
+	if (z->cond == ZBD_ZONE_COND_OFFLINE)
 		return false;
 
 	/*
@@ -939,7 +912,7 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 	zone_idx = f->zbd_info->open_zones[open_zone_idx];
 	memmove(f->zbd_info->open_zones + open_zone_idx,
 		f->zbd_info->open_zones + open_zone_idx + 1,
-		(FIO_MAX_OPEN_ZBD_ZONES - (open_zone_idx + 1)) *
+		(ZBD_MAX_OPEN_ZONES - (open_zone_idx + 1)) *
 		sizeof(f->zbd_info->open_zones[0]));
 	f->zbd_info->num_open_zones--;
 	f->zbd_info->zone_info[zone_idx].open = 0;
@@ -1148,7 +1121,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	 * the nearest non-empty zone in case of random I/O.
 	 */
 	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
-		if (z1 < zl && z1->cond != BLK_ZONE_COND_OFFLINE) {
+		if (z1 < zl && z1->cond != ZBD_ZONE_COND_OFFLINE) {
 			pthread_mutex_lock(&z1->mutex);
 			if (z1->start + min_bs <= z1->wp)
 				return z1;
@@ -1157,7 +1130,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 			break;
 		}
 		if (td_random(td) && z2 >= zf &&
-		    z2->cond != BLK_ZONE_COND_OFFLINE) {
+		    z2->cond != ZBD_ZONE_COND_OFFLINE) {
 			pthread_mutex_lock(&z2->mutex);
 			if (z2->start + min_bs <= z2->wp)
 				return z2;
@@ -1193,7 +1166,7 @@ static void zbd_queue_io(struct io_u *io_u, int q, bool success)
 	assert(zone_idx < zbd_info->nr_zones);
 	z = &zbd_info->zone_info[zone_idx];
 
-	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+	if (!zbd_zone_swr(z))
 		return;
 
 	if (!success)
@@ -1250,7 +1223,7 @@ static void zbd_put_io(const struct io_u *io_u)
 	assert(zone_idx < zbd_info->nr_zones);
 	z = &zbd_info->zone_info[zone_idx];
 
-	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+	if (!zbd_zone_swr(z))
 		return;
 
 	dprint(FD_ZBD,
@@ -1261,6 +1234,13 @@ static void zbd_put_io(const struct io_u *io_u)
 	zbd_check_swd(f);
 }
 
+/*
+ * Windows and MacOS do not define this.
+ */
+#ifndef EREMOTEIO
+#define EREMOTEIO	121	/* POSIX value */
+#endif
+
 bool zbd_unaligned_write(int error_code)
 {
 	switch (error_code) {
@@ -1341,7 +1321,7 @@ void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
  */
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 {
-	const struct fio_file *f = io_u->file;
+	struct fio_file *f = io_u->file;
 	uint32_t zone_idx_b;
 	struct fio_zone_info *zb, *zl, *orig_zb;
 	uint32_t orig_len = io_u->buflen;
@@ -1359,14 +1339,14 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	orig_zb = zb;
 
 	/* Accept the I/O offset for conventional zones. */
-	if (zb->type == BLK_ZONE_TYPE_CONVENTIONAL)
+	if (!zbd_zone_swr(zb))
 		return io_u_accept;
 
 	/*
 	 * Accept the I/O offset for reads if reading beyond the write pointer
 	 * is enabled.
 	 */
-	if (zb->cond != BLK_ZONE_COND_OFFLINE &&
+	if (zb->cond != ZBD_ZONE_COND_OFFLINE &&
 	    io_u->ddir == DDIR_READ && td->o.read_beyond_wp)
 		return io_u_accept;
 
@@ -1385,7 +1365,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		 * I/O of at least min_bs B. If there isn't, find a new zone for
 		 * the I/O.
 		 */
-		range = zb->cond != BLK_ZONE_COND_OFFLINE ?
+		range = zb->cond != ZBD_ZONE_COND_OFFLINE ?
 			zb->wp - zb->start : 0;
 		if (range < min_bs ||
 		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
@@ -1510,7 +1490,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 accept:
 	assert(zb);
-	assert(zb->cond != BLK_ZONE_COND_OFFLINE);
+	assert(zb->cond != ZBD_ZONE_COND_OFFLINE);
 	assert(!io_u->zbd_queue_io);
 	assert(!io_u->zbd_put_io);
 	io_u->zbd_queue_io = zbd_queue_io;
diff --git a/zbd.h b/zbd.h
index e0a7e447..4eaf902e 100644
--- a/zbd.h
+++ b/zbd.h
@@ -7,23 +7,13 @@
 #ifndef FIO_ZBD_H
 #define FIO_ZBD_H
 
-#include <inttypes.h>
-#include "fio.h"	/* FIO_MAX_OPEN_ZBD_ZONES */
-#ifdef CONFIG_LINUX_BLKZONED
-#include <linux/blkzoned.h>
-#endif
+#include "io_u.h"
+#include "ioengines.h"
+#include "oslib/blkzoned.h"
+#include "zbd_types.h"
 
 struct fio_file;
 
-/*
- * Zoned block device models.
- */
-enum blk_zoned_model {
-	ZBD_DM_NONE,	/* Regular block device */
-	ZBD_DM_HOST_AWARE,	/* Host-aware zoned block device */
-	ZBD_DM_HOST_MANAGED,	/* Host-managed zoned block device */
-};
-
 enum io_u_action {
 	io_u_accept	= 0,
 	io_u_eof	= 1,
@@ -42,16 +32,14 @@ enum io_u_action {
  * @reset_zone: whether or not this zone should be reset before writing to it
  */
 struct fio_zone_info {
-#ifdef CONFIG_LINUX_BLKZONED
 	pthread_mutex_t		mutex;
 	uint64_t		start;
 	uint64_t		wp;
 	uint32_t		verify_block;
-	enum blk_zone_type	type:2;
-	enum blk_zone_cond	cond:4;
+	enum zbd_zone_type	type:2;
+	enum zbd_zone_cond	cond:4;
 	unsigned int		open:1;
 	unsigned int		reset_zone:1;
-#endif
 };
 
 /**
@@ -76,7 +64,7 @@ struct fio_zone_info {
  * will be smaller than 'zone_size'.
  */
 struct zoned_block_device_info {
-	enum blk_zoned_model	model;
+	enum zbd_zoned_model	model;
 	pthread_mutex_t		mutex;
 	uint64_t		zone_size;
 	uint64_t		sectors_with_data;
@@ -85,11 +73,10 @@ struct zoned_block_device_info {
 	uint32_t		refcount;
 	uint32_t		num_open_zones;
 	uint32_t		write_cnt;
-	uint32_t		open_zones[FIO_MAX_OPEN_ZBD_ZONES];
+	uint32_t		open_zones[ZBD_MAX_OPEN_ZONES];
 	struct fio_zone_info	zone_info[0];
 };
 
-#ifdef CONFIG_LINUX_BLKZONED
 void zbd_free_zone_info(struct fio_file *f);
 int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
@@ -115,45 +102,4 @@ static inline void zbd_put_io_u(struct io_u *io_u)
 	}
 }
 
-#else
-static inline void zbd_free_zone_info(struct fio_file *f)
-{
-}
-
-static inline int zbd_init(struct thread_data *td)
-{
-	return 0;
-}
-
-static inline void zbd_file_reset(struct thread_data *td, struct fio_file *f)
-{
-}
-
-static inline bool zbd_unaligned_write(int error_code)
-{
-	return false;
-}
-
-static inline enum io_u_action zbd_adjust_block(struct thread_data *td,
-						struct io_u *io_u)
-{
-	return io_u_accept;
-}
-
-static inline char *zbd_write_status(const struct thread_stat *ts)
-{
-	return NULL;
-}
-
-static inline void zbd_queue_io_u(struct io_u *io_u,
-				  enum fio_q_status status) {}
-static inline void zbd_put_io_u(struct io_u *io_u) {}
-
-static inline void setup_zbd_zone_mode(struct thread_data *td,
-					struct io_u *io_u)
-{
-}
-
-#endif
-
 #endif /* FIO_ZBD_H */
diff --git a/zbd_types.h b/zbd_types.h
new file mode 100644
index 00000000..2f2f1324
--- /dev/null
+++ b/zbd_types.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ */
+#ifndef FIO_ZBD_TYPES_H
+#define FIO_ZBD_TYPES_H
+
+#include <inttypes.h>
+
+#define ZBD_MAX_OPEN_ZONES	128
+
+/*
+ * Zoned block device models.
+ */
+enum zbd_zoned_model {
+	ZBD_IGNORE,		/* Ignore file */
+	ZBD_NONE,		/* Regular block device */
+	ZBD_HOST_AWARE,		/* Host-aware zoned block device */
+	ZBD_HOST_MANAGED,	/* Host-managed zoned block device */
+};
+
+/*
+ * Zone types.
+ */
+enum zbd_zone_type {
+	ZBD_ZONE_TYPE_CNV	= 0x1,	/* Conventional */
+	ZBD_ZONE_TYPE_SWR	= 0x2,	/* Sequential write required */
+	ZBD_ZONE_TYPE_SWP	= 0x3,	/* Sequential write preferred */
+};
+
+/*
+ * Zone conditions.
+ */
+enum zbd_zone_cond {
+        ZBD_ZONE_COND_NOT_WP    = 0x0,
+        ZBD_ZONE_COND_EMPTY     = 0x1,
+        ZBD_ZONE_COND_IMP_OPEN  = 0x2,
+        ZBD_ZONE_COND_EXP_OPEN  = 0x3,
+        ZBD_ZONE_COND_CLOSED    = 0x4,
+        ZBD_ZONE_COND_READONLY  = 0xD,
+        ZBD_ZONE_COND_FULL      = 0xE,
+        ZBD_ZONE_COND_OFFLINE   = 0xF,
+};
+
+/*
+ * Zone descriptor.
+ */
+struct zbd_zone {
+	uint64_t		start;
+	uint64_t		wp;
+	uint64_t		len;
+	enum zbd_zone_type	type;
+	enum zbd_zone_cond	cond;
+};
+
+#endif /* FIO_ZBD_TYPES_H */
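
The reworked parse_zone_info() fetches zone descriptors in fixed-size batches of ZBD_REPORT_MAX_ZONES. It resumes each new report at the end of the last zone returned (offset = z->start + z->len), and all positions are now in bytes rather than 512-byte sectors, so the old "<< 9" shifts are gone. The standalone sketch below mimics that loop against a simulated device. demo_report_zones(), demo_count_zones() and DEMO_MAX_ZONES are hypothetical names, not fio's zbd_report_zones() backend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct zbd_zone; start and len are in bytes. */
struct demo_zone {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical batch size, playing the role of ZBD_REPORT_MAX_ZONES. */
#define DEMO_MAX_ZONES 4

/*
 * Hypothetical report call on a simulated device of dev_size bytes split
 * into zsize-byte zones (the last one may be shorter). Reports up to max
 * zones starting with the zone that contains offset; returns the count.
 */
int demo_report_zones(uint64_t dev_size, uint64_t zsize, uint64_t offset,
		      struct demo_zone *zones, int max)
{
	uint64_t start;
	int n = 0;

	assert(zsize > 0);
	start = offset - offset % zsize;	/* align down to a zone start */
	while (n < max && start < dev_size) {
		uint64_t left = dev_size - start;

		zones[n].start = start;
		zones[n].len = left < zsize ? left : zsize;
		start += zones[n].len;
		n++;
	}
	return n;
}

/* Walk the whole device in batches, resuming after the last zone seen. */
int demo_count_zones(uint64_t dev_size, uint64_t zsize)
{
	struct demo_zone zones[DEMO_MAX_ZONES];
	uint64_t offset = 0;
	int total = 0;

	while (offset < dev_size) {
		int nrz = demo_report_zones(dev_size, zsize, offset,
					    zones, DEMO_MAX_ZONES);

		if (nrz <= 0)
			return nrz < 0 ? nrz : total;	/* error or nothing left */
		total += nrz;
		/* Same resume rule as the patch: last zone's start + len. */
		offset = zones[nrz - 1].start + zones[nrz - 1].len;
	}
	return total;
}
```

With a trailing partial zone the count agrees with the patch's nr_zones formula, (real_file_size + zone_size - 1) / zone_size.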


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2020-04-07 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-04-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 6463db6c1d3a2a961008e87a86d464b596886f1a:

  fio: fix interaction between offset/size limited threads and "max_open_zones" (2020-04-02 13:33:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ebc403fe282864eddfd68ab1793f149a1b0eb1cd:

  zbd: fixup ->zone_size_log2 if zone size is not power of 2 (2020-04-06 19:41:45 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      zbd: fixup ->zone_size_log2 if zone size is not power of 2

Damien Le Moal (1):
      zbd: Fix potential zone lock deadlock

 zbd.c | 64 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 32 insertions(+), 32 deletions(-)
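
The zone_size_log2 change replaces the -1 fallback with 0 for zone sizes that are not a power of 2. A likely reason follows, assuming the field is an unsigned 32-bit integer and that the zone-index helper takes the shift path whenever zone_size_log2 > 0. Under that assumption, -1 is stored as UINT32_MAX, passes the test, and produces an out-of-range (undefined) shift. The sketch uses hypothetical demo_* names and treats 0 as "divide by zone_size instead". A zone size of 1 also yields 0, which is harmless because dividing by 1 gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of zoned_block_device_info; field types assumed. */
struct demo_zbd_info {
	uint64_t zone_size;
	uint32_t zone_size_log2;	/* 0 means "use division" */
};

static int demo_is_power_of_2(uint64_t x)
{
	return x && !(x & (x - 1));
}

static uint32_t demo_ilog2(uint64_t x)
{
	uint32_t r = 0;

	while (x >>= 1)
		r++;
	return r;
}

void demo_set_zone_size(struct demo_zbd_info *info, uint64_t zone_size)
{
	info->zone_size = zone_size;
	/* Storing -1 here would wrap to UINT32_MAX and look like a valid shift. */
	info->zone_size_log2 = demo_is_power_of_2(zone_size) ?
		demo_ilog2(zone_size) : 0;
}

uint32_t demo_zone_idx(const struct demo_zbd_info *info, uint64_t offset)
{
	if (info->zone_size_log2 > 0)
		return (uint32_t)(offset >> info->zone_size_log2);
	return (uint32_t)(offset / info->zone_size);
}
```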

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index 0dd5a619..e2f3f52f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -58,6 +58,24 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 		z->wp + required > z->start + f->zbd_info->zone_size;
 }
 
+static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
+{
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&z->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&z->mutex);
+	}
+}
+
 static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
 {
 	return (uint64_t)(offset - f->file_offset) < f->io_size;
@@ -380,7 +398,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info = zbd_info;
 	f->zbd_info->zone_size = zone_size;
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
-		ilog2(zone_size) : -1;
+		ilog2(zone_size) : 0;
 	f->zbd_info->nr_zones = nr_zones;
 	pthread_mutexattr_destroy(&attr);
 	return 0;
@@ -497,7 +515,7 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info = zbd_info;
 	f->zbd_info->zone_size = zone_size;
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
-		ilog2(zone_size) : -1;
+		ilog2(zone_size) : 0;
 	f->zbd_info->nr_zones = nr_zones;
 	zbd_info = NULL;
 	ret = 0;
@@ -716,18 +734,18 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
 	assert(f->fd != -1);
 	for (z = zb; z < ze; z++) {
-		pthread_mutex_lock(&z->mutex);
-		if (z->type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
-			reset_wp = all_zones ? z->wp != z->start :
-					(td->o.td_ddir & TD_DDIR_WRITE) &&
-					z->wp % min_bs != 0;
-			if (reset_wp) {
-				dprint(FD_ZBD, "%s: resetting zone %u\n",
-				       f->file_name,
-				       zbd_zone_nr(f->zbd_info, z));
-				if (zbd_reset_zone(td, f, z) < 0)
-					res = 1;
-			}
+		if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+			continue;
+		zone_lock(td, z);
+		reset_wp = all_zones ? z->wp != z->start :
+				(td->o.td_ddir & TD_DDIR_WRITE) &&
+				z->wp % min_bs != 0;
+		if (reset_wp) {
+			dprint(FD_ZBD, "%s: resetting zone %u\n",
+			       f->file_name,
+			       zbd_zone_nr(f->zbd_info, z));
+			if (zbd_reset_zone(td, f, z) < 0)
+				res = 1;
 		}
 		pthread_mutex_unlock(&z->mutex);
 	}
@@ -927,24 +945,6 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 	f->zbd_info->zone_info[zone_idx].open = 0;
 }
 
-static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
-{
-	/*
-	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
-	 * is changed or when io_u completes and zbd_put_io() executed.
-	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
-	 * other waiting for zone locks when building an io_u batch, first
-	 * only trylock the zone. If the zone is already locked by another job,
-	 * process the currently queued I/Os so that I/O progress is made and
-	 * zones unlocked.
-	 */
-	if (pthread_mutex_trylock(&z->mutex) != 0) {
-		if (!td_ioengine_flagged(td, FIO_SYNCIO))
-			io_u_quiesce(td);
-		pthread_mutex_lock(&z->mutex);
-	}
-}
-
 /* Anything goes as long as it is not a constant. */
 static uint32_t pick_random_zone_idx(const struct fio_file *f,
 				     const struct io_u *io_u)
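
zone_lock() is moved above zbd_reset_zones() so that function can use it. It first only tries the lock and blocks only after flushing the job's own queued I/O, so two jobs that each hold a zone through queued I/O cannot wait on each other forever. The single-threaded sketch below uses a default (non-recursive) mutex so contention can be simulated. The job's own "queued I/O" already holds the lock, and the hypothetical drain hook, standing in for io_u_quiesce(), completes it by unlocking. fio's zone mutexes are recursive and process-shared, so this illustrates only the ordering, not fio's exact setup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical drain hook standing in for io_u_quiesce(). */
typedef void (*drain_fn)(void *arg);

/* Hypothetical: completing our queued I/O releases the zone it held. */
void demo_complete_queued_io(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/*
 * Trylock first; on contention, flush queued I/O (skipped for synchronous
 * engines, which have nothing queued), then block on the lock.
 * Returns true if a drain was performed.
 */
bool demo_zone_lock(pthread_mutex_t *m, bool sync_engine,
		    drain_fn drain, void *arg)
{
	bool drained = false;

	if (pthread_mutex_trylock(m) != 0) {
		if (!sync_engine) {
			drain(arg);
			drained = true;
		}
		pthread_mutex_lock(m);
	}
	return drained;
}
```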



* Recent changes (master)
@ 2020-04-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-04-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 839f607b5771081e96942977a1ff9f1b24b77bca:

  Merge branch 'github-issue-947' of https://github.com/vincentkfu/fio (2020-03-31 10:21:48 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6463db6c1d3a2a961008e87a86d464b596886f1a:

  fio: fix interaction between offset/size limited threads and "max_open_zones" (2020-04-02 13:33:49 -0600)

----------------------------------------------------------------
Alexey Dobriyan (1):
      fio: fix interaction between offset/size limited threads and "max_open_zones"

 zbd.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)
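
This fix addresses jobs whose offset/size window covers only part of the device. Because the open-zone list is shared by all jobs on the device, such a job could previously pick an open zone outside its window. The selection logic in the diff can be restated as a plain function. It starts at a quasi-random slot derived from offset * nr_open / real_file_size (as in pick_random_zone_idx()), then scans the list circularly for the first open zone whose start passes an is_valid_offset()-style check. All names are hypothetical. The sketch returns -1 where fio logs "no candidate zone" and returns NULL.

```c
#include <assert.h>
#include <stdint.h>

/* is_valid_offset()-style check: offset in [file_offset, file_offset + io_size)? */
static int demo_in_job_range(uint64_t offset, uint64_t file_offset,
			     uint64_t io_size)
{
	/* Unsigned wrap-around makes offsets below file_offset fail too. */
	return (uint64_t)(offset - file_offset) < io_size;
}

/* Returns the open-zone slot to use, or -1 if none is in this job's range. */
int demo_pick_open_zone(const uint64_t *open_starts, uint32_t nr_open,
			uint64_t io_offset, uint64_t real_file_size,
			uint64_t file_offset, uint64_t io_size)
{
	uint32_t idx, i;

	if (nr_open == 0 || real_file_size == 0)
		return -1;
	/* Quasi-random starting slot, same shape as pick_random_zone_idx(). */
	idx = (uint32_t)(io_offset * nr_open / real_file_size);
	for (i = 0; i < nr_open; i++, idx++) {
		if (idx >= nr_open)
			idx = 0;	/* wrap around the shared list */
		if (demo_in_job_range(open_starts[idx], file_offset, io_size))
			return (int)idx;
	}
	return -1;
}
```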

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index b2d94249..0dd5a619 100644
--- a/zbd.c
+++ b/zbd.c
@@ -945,6 +945,13 @@ static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
 	}
 }
 
+/* Anything goes as long as it is not a constant. */
+static uint32_t pick_random_zone_idx(const struct fio_file *f,
+				     const struct io_u *io_u)
+{
+	return io_u->offset * f->zbd_info->num_open_zones / f->real_file_size;
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -969,9 +976,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 		 * This statement accesses f->zbd_info->open_zones[] on purpose
 		 * without locking.
 		 */
-		zone_idx = f->zbd_info->open_zones[(io_u->offset -
-						    f->file_offset) *
-				f->zbd_info->num_open_zones / f->io_size];
+		zone_idx = f->zbd_info->open_zones[pick_random_zone_idx(f, io_u)];
 	} else {
 		zone_idx = zbd_zone_idx(f, io_u->offset);
 	}
@@ -985,6 +990,8 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	 * has been obtained. Hence the loop.
 	 */
 	for (;;) {
+		uint32_t tmp_idx;
+
 		z = &f->zbd_info->zone_info[zone_idx];
 
 		zone_lock(td, z);
@@ -998,9 +1005,35 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 			       __func__, f->file_name);
 			return NULL;
 		}
-		open_zone_idx = (io_u->offset - f->file_offset) *
-			f->zbd_info->num_open_zones / f->io_size;
+
+		/*
+		 * List of opened zones is per-device, shared across all threads.
+		 * Start with quasi-random candidate zone.
+		 * Ignore zones which don't belong to thread's offset/size area.
+		 */
+		open_zone_idx = pick_random_zone_idx(f, io_u);
 		assert(open_zone_idx < f->zbd_info->num_open_zones);
+		tmp_idx = open_zone_idx;
+		for (i = 0; i < f->zbd_info->num_open_zones; i++) {
+			uint32_t tmpz;
+
+			if (tmp_idx >= f->zbd_info->num_open_zones)
+				tmp_idx = 0;
+			tmpz = f->zbd_info->open_zones[tmp_idx];
+
+			if (is_valid_offset(f, f->zbd_info->zone_info[tmpz].start)) {
+				open_zone_idx = tmp_idx;
+				goto found_candidate_zone;
+			}
+
+			tmp_idx++;
+		}
+
+		dprint(FD_ZBD, "%s(%s): no candidate zone\n",
+			__func__, f->file_name);
+		return NULL;
+
+found_candidate_zone:
 		new_zone_idx = f->zbd_info->open_zones[open_zone_idx];
 		if (new_zone_idx == zone_idx)
 			break;



* Recent changes (master)
@ 2020-04-01 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-04-01 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 983319d02626347d178b34320ac5835ae7ecad02:

  Merge branch 'jsonplus2csv' of https://github.com/vincentkfu/fio (2020-03-26 09:03:40 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 839f607b5771081e96942977a1ff9f1b24b77bca:

  Merge branch 'github-issue-947' of https://github.com/vincentkfu/fio (2020-03-31 10:21:48 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'github-issue-947' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      stat: eliminate extra log samples

 stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index d8c01d14..efa811d2 100644
--- a/stat.c
+++ b/stat.c
@@ -2749,7 +2749,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 			return diff;
 	}
 
-	_add_stat_to_log(iolog, elapsed, td->o.log_max != 0, priority_bit);
+	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0, priority_bit);
 
 	iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
 	return iolog->avg_msec;



* Recent changes (master)
@ 2020-03-27 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2020-03-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3bd2078bdd1c173f9d02bc20e2d630302555a8a0:

  zbd: add test for stressing zone locking (2020-03-17 20:05:54 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 983319d02626347d178b34320ac5835ae7ecad02:

  Merge branch 'jsonplus2csv' of https://github.com/vincentkfu/fio (2020-03-26 09:03:40 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'jsonplus2csv' of https://github.com/vincentkfu/fio

Vincent Fu (3):
      tools/fio_jsonplus2csv: accommodate multiple lat measurements
      t/jsonplus2csv_test.py: test script for tools/fio_jsonplus_clat2csv
      .travis.yml: remove pip line from xcode11.2 config

 .appveyor.yml               |   2 +-
 .travis.yml                 |   3 +-
 t/jsonplus2csv_test.py      | 150 ++++++++++++++++++
 t/run-fio-tests.py          |   8 +
 tools/fio_jsonplus_clat2csv | 379 +++++++++++++++++++++++++++++++-------------
 5 files changed, 428 insertions(+), 114 deletions(-)
 create mode 100755 t/jsonplus2csv_test.py

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index bf0978ad..2f962c4b 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -15,7 +15,7 @@ environment:
 install:
   - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib,mingw64-%PACKAGE_ARCH%-CUnit" > NUL'
   - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
-  - python.exe -m pip install scipy
+  - python.exe -m pip install scipy six
 
 build_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
diff --git a/.travis.yml b/.travis.yml
index 77c31b77..6b710cc3 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -48,8 +48,9 @@ before_install:
         brew install cunit;
         if [[ "$TRAVIS_OSX_IMAGE" == "xcode11.2" ]]; then
             pip3 install scipy;
+        else
+            pip install scipy;
         fi;
-        pip install scipy;
     fi;
 script:
   - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
diff --git a/t/jsonplus2csv_test.py b/t/jsonplus2csv_test.py
new file mode 100755
index 00000000..2b34ef25
--- /dev/null
+++ b/t/jsonplus2csv_test.py
@@ -0,0 +1,150 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2020 Western Digital Corporation or its affiliates.
+#
+"""
+jsonplus2csv_test.py
+
+Do one basic test of tools/fio_jsonplus_clat2csv
+
+USAGE
+python jsonplus2csv_test.py [-f fio-executable] [-s script-location]
+
+EXAMPLES
+python t/jsonplus2csv_test.py
+python t/jsonplus2csv_test.py -f ./fio -s tools
+
+REQUIREMENTS
+Python 3.5+
+"""
+
+import os
+import sys
+import platform
+import argparse
+import subprocess
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-s', '--script',
+                        help='directory containing fio_jsonplus2csv script')
+    return parser.parse_args()
+
+
+def run_fio(fio):
+    """Run fio to generate json+ data.
+
+    Parameters:
+        fio     path to fio executable.
+    """
+
+    if platform.system() == 'Linux':
+        aio = 'libaio'
+    elif platform.system() == 'Windows':
+        aio = 'windowsaio'
+    else:
+        aio = 'posixaio'
+
+    fio_args = [
+        "--output=fio-output.json",
+        "--output-format=json+",
+        "--filename=fio_jsonplus_clat2csv.test",
+        "--ioengine=" + aio,
+        "--time_based",
+        "--runtime=3s",
+        "--size=1G",
+        "--slat_percentiles=1",
+        "--clat_percentiles=1",
+        "--lat_percentiles=1",
+        "--thread=1",
+        "--name=test1",
+        "--rw=randrw",
+        "--name=test2",
+        "--rw=read",
+        "--name=test3",
+        "--rw=write",
+        ]
+
+    output = subprocess.run([fio] + fio_args, stdout=subprocess.PIPE,
+                            stderr=subprocess.PIPE)
+
+    return output
+
+
+def check_output(fio_output, script_path):
+    """Run tools/fio_jsonplus_clat2csv and validate the generated CSV files
+    against the original json+ fio output.
+
+    Parameters:
+        fio_output      subprocess.run object describing fio run.
+        script_path     path to fio_jsonplus_clat2csv script.
+    """
+
+    if fio_output.returncode != 0:
+        return False
+
+    if platform.system() == 'Windows':
+        script = ['python.exe', script_path]
+    else:
+        script = [script_path]
+
+    script_args = ["fio-output.json", "fio-output.csv"]
+    script_args_validate = script_args + ["--validate"]
+
+    script_output = subprocess.run(script + script_args)
+    if script_output.returncode != 0:
+        return False
+
+    script_output = subprocess.run(script + script_args_validate)
+    if script_output.returncode != 0:
+        return False
+
+    return True
+
+
+def main():
+    """Entry point for this script."""
+
+    args = parse_args()
+
+    index = 1
+    passed = 0
+    failed = 0
+
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = os.path.join(os.path.dirname(__file__), '../fio')
+        if not os.path.exists(fio_path):
+            fio_path = 'fio'
+    print("fio path is", fio_path)
+
+    if args.script:
+        script_path = args.script
+    else:
+        script_path = os.path.join(os.path.dirname(__file__), '../tools/fio_jsonplus_clat2csv')
+        if not os.path.exists(script_path):
+            script_path = 'fio_jsonplus_clat2csv'
+    print("script path is", script_path)
+
+    fio_output = run_fio(fio_path)
+    status = check_output(fio_output, script_path)
+    print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
+    if status:
+        passed = passed + 1
+    else:
+        failed = failed + 1
+    index = index + 1
+
+    print("{0} tests passed, {1} failed".format(passed, failed))
+
+    sys.exit(failed)
+
+if __name__ == '__main__':
+    main()
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 36fcb2f4..ea5abc4e 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -757,6 +757,14 @@ TEST_LIST = [
         'success':          SUCCESS_DEFAULT,
         'requirements':     [],
     },
+    {
+        'test_id':          1011,
+        'test_class':       FioExeTest,
+        'exe':              't/jsonplus2csv_test.py',
+        'parameters':       ['-f', '{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
 ]
 
 
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index 78a007e5..9544ab74 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -1,76 +1,97 @@
 #!/usr/bin/python2.7
 # Note: this script is python2 and python3 compatible.
-#
-# fio_jsonplus_clat2csv
-#
-# This script converts fio's json+ completion latency data to CSV format.
-#
-# For example:
-#
-# Run the following fio jobs:
-# ../fio --output=fio-jsonplus.output --output-format=json+ --name=test1
-#  	--ioengine=null --time_based --runtime=5s --size=1G --rw=randrw
-# 	--name=test2 --ioengine=null --time_based --runtime=3s --size=1G
-# 	--rw=read --name=test3 --ioengine=null --time_based --runtime=4s
-# 	--size=8G --rw=write
-#
-# Then run:
-# fio_jsonplus_clat2csv fio-jsonplus.output fio-latency.csv
-#
-# You will end up with the following 3 files
-#
-# -rw-r--r-- 1 root root  6467 Jun 27 14:57 fio-latency_job0.csv
-# -rw-r--r-- 1 root root  3985 Jun 27 14:57 fio-latency_job1.csv
-# -rw-r--r-- 1 root root  4490 Jun 27 14:57 fio-latency_job2.csv
-#
-# fio-latency_job0.csv will look something like:
-#
-# clat_nsec, read_count, read_cumulative, read_percentile, write_count,
-# 	write_cumulative, write_percentile, trim_count, trim_cumulative,
-# 	trim_percentile,
-# 25, 1, 1, 1.50870705013e-07, , , , , , ,
-# 26, 12, 13, 1.96131916517e-06, 947, 947, 0.000142955890032, , , ,
-# 27, 843677, 843690, 0.127288105112, 838347, 839294, 0.126696959629, , , ,
-# 28, 1877982, 2721672, 0.410620573454, 1870189, 2709483, 0.409014312345, , , ,
-# 29, 4471, 2726143, 0.411295116376, 7718, 2717201, 0.410179395301, , , ,
-# 30, 2142885, 4869028, 0.734593687087, 2138164, 4855365, 0.732949340025, , , ,
-# ...
-# 2544, , , , 2, 6624404, 0.999997433738, , , ,
-# 2576, 3, 6628178, 0.99999788781, 4, 6624408, 0.999998037564, , , ,
-# 2608, 4, 6628182, 0.999998491293, 4, 6624412, 0.999998641391, , , ,
-# 2640, 3, 6628185, 0.999998943905, 2, 6624414, 0.999998943304, , , ,
-# 2672, 1, 6628186, 0.999999094776, 3, 6624417, 0.999999396174, , , ,
-# 2736, 1, 6628187, 0.999999245646, 1, 6624418, 0.99999954713, , , ,
-# 2768, 2, 6628189, 0.999999547388, 1, 6624419, 0.999999698087, , , ,
-# 2800, , , , 1, 6624420, 0.999999849043, , , ,
-# 2832, 1, 6628190, 0.999999698259, , , , , , ,
-# 4192, 1, 6628191, 0.999999849129, , , , , , ,
-# 5792, , , , 1, 6624421, 1.0, , , ,
-# 10304, 1, 6628192, 1.0, , , , , , ,
-#
-# The first line says that you had one read IO with 25ns clat,
-# the cumulative number of read IOs at or below 25ns is 1, and
-# 25ns is the 0.00001509th percentile for read latency
-#
-# The job had 2 write IOs complete in 2544ns,
-# 6624404 write IOs completed in 2544ns or less,
-# and this represents the 99.99974th percentile for write latency
-#
-# The last line says that one read IO had 10304ns clat,
-# 6628192 read IOs had 10304ns or shorter clat, and
-# 10304ns is the 100th percentile for read latency
-#
+
+"""
+fio_jsonplus_clat2csv
+
+This script converts fio's json+ latency data to CSV format.
+
+For example:
+
+Run the following fio jobs:
+$ fio --output=fio-jsonplus.output --output-format=json+ --ioengine=null \
+    --time_based --runtime=3s --size=1G --slat_percentiles=1 \
+    --clat_percentiles=1 --lat_percentiles=1 \
+    --name=test1 --rw=randrw \
+    --name=test2 --rw=read \
+    --name=test3 --rw=write
+
+Then run:
+$ fio_jsonplus_clat2csv fio-jsonplus.output fio-jsonplus.csv
+
+You will end up with the following 3 files:
+
+-rw-r--r-- 1 root root 77547 Mar 24 15:17 fio-jsonplus_job0.csv
+-rw-r--r-- 1 root root 65413 Mar 24 15:17 fio-jsonplus_job1.csv
+-rw-r--r-- 1 root root 63291 Mar 24 15:17 fio-jsonplus_job2.csv
+
+fio-jsonplus_job0.csv will look something like:
+
+nsec, read_slat_ns_count, read_slat_ns_cumulative, read_slat_ns_percentile, read_clat_ns_count, read_clat_ns_cumulative, read_clat_ns_percentile, read_lat_ns_count, read_lat_ns_cumulative, read_lat_ns_percentile, write_slat_ns_count, write_slat_ns_cumulative, write_slat_ns_percentile, write_clat_ns_count, write_clat_ns_cumulative, write_clat_ns_percentile, write_lat_ns_count, write_lat_ns_cumulative, write_lat_ns_percentile, trim_slat_ns_count, trim_slat_ns_cumulative, trim_slat_ns_percentile, trim_clat_ns_count, trim_clat_ns_cumulative, trim_clat_ns_percentile, trim_lat_ns_count, trim_lat_ns_cumulative, trim_lat_ns_percentile,
+12, , , , 3, 3, 6.11006798673e-07, , , , , , , 2, 2, 4.07580840603e-07, , , , , , , , , , , , ,
+13, , , , 1364, 1367, 0.000278415431262, , , , , , , 1776, 1778, 0.000362339367296, , , , , , , , , , , , ,
+14, , , , 181872, 183239, 0.037320091594, , , , , , , 207436, 209214, 0.0426358089929, , , , , , , , , , , , ,
+15, , , , 1574811, 1758050, 0.358060167469, , , , , , , 1661435, 1870649, 0.381220345946, , , , , , , , , , , , ,
+16, , , , 2198478, 3956528, 0.805821835713, , , , , , , 2154571, 4025220, 0.820301275606, , , , , , , , , , , , ,
+17, , , , 724335, 4680863, 0.953346372218, , , , , , , 645351, 4670571, 0.951817627138, , , , , , , , , , , , ,
+18, , , , 71837, 4752700, 0.96797733735, , , , , , , 61084, 4731655, 0.964265961171, , , , , , , , , , , , ,
+19, , , , 15915, 4768615, 0.971218728417, , , , , , , 18419, 4750074, 0.968019576923, , , , , , , , , , , , ,
+20, , , , 12651, 4781266, 0.973795344087, , , , , , , 14176, 4764250, 0.970908509921, , , , , , , , , , , , ,
+...
+168960, , , , , , , , , , , , , 1, 4906999, 0.999999388629, 1, 4906997, 0.999998981048, , , , , , , , , ,
+177152, , , , , , , , , , , , , 1, 4907000, 0.999999592419, 1, 4906998, 0.999999184838, , , , , , , , , ,
+183296, , , , , , , , , , , , , 1, 4907001, 0.99999979621, 1, 4906999, 0.999999388629, , , , , , , , , ,
+189440, , , , , , , 1, 4909925, 0.999999185324, , , , , , , , , , , , , , , , , , ,
+214016, , , , 1, 4909928, 0.999999796331, 2, 4909927, 0.999999592662, , , , , , , , , , , , , , , , , , ,
+246784, , , , , , , , , , , , , , , , 1, 4907000, 0.999999592419, , , , , , , , , ,
+272384, , , , 1, 4909929, 1.0, 1, 4909928, 0.999999796331, , , , , , , , , , , , , , , , , , ,
+329728, , , , , , , , , , , , , 1, 4907002, 1.0, 1, 4907001, 0.99999979621, , , , , , , , , ,
+1003520, , , , , , , , , , , , , , , , 1, 4907002, 1.0, , , , , , , , , ,
+1089536, , , , , , , 1, 4909929, 1.0, , , , , , , , , , , , , , , , , , ,
+
+The first line says that there were three read IOs with 12ns clat,
+the cumulative number of read IOs at or below 12ns was three, and
+12ns was the 0.0000611th percentile for read latency. There were
+two write IOs with 12ns clat, the cumulative number of write IOs
+at or below 12ns was two, and 12ns was the 0.0000408th percentile
+for write latency.
+
+The job had one write IO complete at 168960ns and 4906999 write IOs
+completed at or below this duration. Also this duration was the
+99.99994th percentile for write latency. There was one write IO
+with a total latency of 168960ns, this duration had a cumulative
+frequency of 4906997 write IOs and was the 99.9998981048th percentile
+for write total latency.
+
+The last line says that one read IO had 1089536ns total latency, this
+duration had a cumulative frequency of 4909929 and represented the 100th
+percentile for read total latency.
+
+Running the following:
+
+$ fio_jsonplus_clat2csv fio-jsonplus.output fio-jsonplus.csv --validate
+fio-jsonplus_job0.csv validated
+fio-jsonplus_job1.csv validated
+fio-jsonplus_job2.csv validated
+
+will check the CSV data against the json+ output to confirm that the CSV
+data matches.
+"""
 
 from __future__ import absolute_import
 from __future__ import print_function
 import os
 import json
 import argparse
+import itertools
 import six
-from six.moves import range
 
+DDIR_LIST = ['read', 'write', 'trim']
+LAT_LIST = ['slat_ns', 'clat_ns', 'lat_ns']
 
 def parse_args():
+    """Parse command-line arguments."""
+
     parser = argparse.ArgumentParser()
     parser.add_argument('source',
                         help='fio json+ output file containing completion '
@@ -78,12 +99,26 @@ def parse_args():
     parser.add_argument('dest',
                         help='destination file stub for latency data in CSV '
                              'format. job number will be appended to filename')
+    parser.add_argument('--debug', '-d', action='store_true',
+                        help='enable debug prints')
+    parser.add_argument('--validate', action='store_true',
+                        help='validate CSV against JSON output')
     args = parser.parse_args()
 
     return args
 
 
 def percentile(idx, run_total):
+    """Return a percentile for a specified index based on a running total.
+
+    Parameters:
+        idx         index for which to generate percentile.
+        run_total   list of cumulative sums.
+
+    Returns:
+        Percentile represented by the specified index.
+    """
+
     total = run_total[len(run_total)-1]
     if total == 0:
         return 0
@@ -91,7 +126,18 @@ def percentile(idx, run_total):
     return float(run_total[idx]) / total
 
 
-def more_lines(indices, bins):
+def more_bins(indices, bins):
+    """Determine whether we have more bins to process.
+
+    Parameters:
+        indices     a dict containing the last index processed in each bin.
+        bins        a dict containing a set of bins to process.
+
+    Returns:
+        True if the indices do not yet point to the end of each bin in bins.
+        False if the indices point beyond their respective bins.
+    """
+
     for key, value in six.iteritems(indices):
         if value < len(bins[key]):
             return True
@@ -99,78 +145,187 @@ def more_lines(indices, bins):
     return False
 
 
+def debug_print(debug, *args):
+    """Print debug messages.
+
+    Parameters:
+        debug       emit messages if True.
+        *args       arguments for print().
+    """
+
+    if debug:
+        print(*args)
+
+
+def get_csvfile(dest, jobnum):
+    """Generate CSV filename from command-line arguments and job numbers.
+
+    Parameters:
+        dest        file specification for CSV filename.
+        jobnum      job number.
+
+    Returns:
+        A string that is a new filename that incorporates the job number.
+    """
+
+    stub, ext = os.path.splitext(dest)
+    return stub + '_job' + str(jobnum) + ext
+
+
+def validate(args, jsondata, col_labels):
+    """Validate CSV data against json+ output.
+
+    This function checks the CSV data to make sure that it was correctly
+    generated from the original json+ output. json+ 'bins' objects are
+    constructed from the CSV data and then compared to the corresponding
+    objects in the json+ data. An AssertionError will appear if a mismatch
+    is found.
+
+    Percentiles and cumulative counts are not checked.
+
+    Parameters:
+        args        command-line arguments for this script.
+        jsondata    json+ output to compare against.
+        col_labels  column labels for CSV data.
+
+    Returns:
+        0 if no mismatches found.
+    """
+
+    colnames = [c.strip() for c in col_labels.split(',')]
+
+    for jobnum in range(len(jsondata['jobs'])):
+        job_data = jsondata['jobs'][jobnum]
+        csvfile = get_csvfile(args.dest, jobnum)
+
+        with open(csvfile, 'r') as csvsource:
+            csvlines = csvsource.read().split('\n')
+
+        assert csvlines[0] == col_labels
+        debug_print(args.debug, 'col_labels match for', csvfile)
+
+        # create 'bins' objects from the CSV data
+        counts = {}
+        for ddir in DDIR_LIST:
+            counts[ddir] = {}
+            for lat in LAT_LIST:
+                counts[ddir][lat] = {}
+
+        csvlines.pop(0)
+        for line in csvlines:
+            if line.strip() == "":
+                continue
+            values = line.split(',')
+            nsec = values[0]
+            for col in colnames:
+                if 'count' in col:
+                    val = values[colnames.index(col)]
+                    if val.strip() != "":
+                        count = int(val)
+                        ddir, lat, _, _ = col.split('_')
+                        lat = lat + '_ns'
+                        counts[ddir][lat][nsec] = count
+                        try:
+                            assert count == job_data[ddir][lat]['bins'][nsec]
+                        except Exception:
+                            print("mismatch:", csvfile, ddir, lat, nsec, "ns")
+                            return 1
+
+        # compare 'bins' objects created from the CSV data
+        # with corresponding 'bins' objects in the json+ output
+        for ddir in DDIR_LIST:
+            for lat in LAT_LIST:
+                if lat in job_data[ddir] and 'bins' in job_data[ddir][lat]:
+                    assert job_data[ddir][lat]['bins'] == counts[ddir][lat]
+                    debug_print(args.debug, csvfile, ddir, lat, "bins match")
+                else:
+                    assert counts[ddir][lat] == {}
+                    debug_print(args.debug, csvfile, ddir, lat, "bins empty")
+
+        print(csvfile, "validated")
+
+    return 0
+
+
 def main():
+    """Starting point for this script.
+
+    In standard mode, this script will generate CSV data from fio json+ output.
+    In validation mode it will check to make sure that counts in CSV files
+    match the counts in the json+ data.
+    """
+
     args = parse_args()
 
     with open(args.source, 'r') as source:
         jsondata = json.loads(source.read())
 
+    ddir_lat_list = list(ddir + '_' + lat for ddir, lat in itertools.product(DDIR_LIST, LAT_LIST))
+    debug_print(args.debug, 'ddir_lat_list: ', ddir_lat_list)
+    col_labels = 'nsec, '
+    for ddir_lat in ddir_lat_list:
+        col_labels += "{0}_count, {0}_cumulative, {0}_percentile, ".format(ddir_lat)
+    debug_print(args.debug, 'col_labels: ', col_labels)
+
+    if args.validate:
+        return validate(args, jsondata, col_labels)
+
     for jobnum in range(0, len(jsondata['jobs'])):
         bins = {}
         run_total = {}
-        ddir_set = set(['read', 'write', 'trim'])
-
-        prev_ddir = None
-        for ddir in ddir_set:
-            if 'bins' in jsondata['jobs'][jobnum][ddir]['clat_ns']:
-                bins_loc = 'clat_ns'
-            elif 'bins' in jsondata['jobs'][jobnum][ddir]['lat_ns']:
-                bins_loc = 'lat_ns'
-            else:
-                raise RuntimeError("Latency bins not found. "
-                                   "Are you sure you are using json+ output?")
-
-            bins[ddir] = [[int(key), value] for key, value in
-                          six.iteritems(jsondata['jobs'][jobnum][ddir][bins_loc]
-                          ['bins'])]
-            bins[ddir] = sorted(bins[ddir], key=lambda bin: bin[0])
-
-            run_total[ddir] = [0 for x in range(0, len(bins[ddir]))]
-            if len(bins[ddir]) > 0:
-                run_total[ddir][0] = bins[ddir][0][1]
-                for x in range(1, len(bins[ddir])):
-                    run_total[ddir][x] = run_total[ddir][x-1] + \
-                        bins[ddir][x][1]
-
-        stub, ext = os.path.splitext(args.dest)
-        outfile = stub + '_job' + str(jobnum) + ext
-
-        with open(outfile, 'w') as output:
-            output.write("{0}ec, ".format(bins_loc))
-            ddir_list = list(ddir_set)
-            for ddir in ddir_list:
-                output.write("{0}_count, {0}_cumulative, {0}_percentile, ".
-                             format(ddir))
-            output.write("\n")
+
+        for ddir in DDIR_LIST:
+            ddir_data = jsondata['jobs'][jobnum][ddir]
+            for lat in LAT_LIST:
+                ddir_lat = ddir + '_' + lat
+                if lat not in ddir_data or 'bins' not in ddir_data[lat]:
+                    bins[ddir_lat] = []
+                    debug_print(args.debug, 'job', jobnum, ddir_lat, 'not found')
+                    continue
+
+                debug_print(args.debug, 'job', jobnum, ddir_lat, 'processing')
+                bins[ddir_lat] = [[int(key), value] for key, value in
+                                  six.iteritems(ddir_data[lat]['bins'])]
+                bins[ddir_lat] = sorted(bins[ddir_lat], key=lambda bin: bin[0])
+
+                run_total[ddir_lat] = [0 for x in range(0, len(bins[ddir_lat]))]
+                run_total[ddir_lat][0] = bins[ddir_lat][0][1]
+                for index in range(1, len(bins[ddir_lat])):
+                    run_total[ddir_lat][index] = run_total[ddir_lat][index-1] + \
+                        bins[ddir_lat][index][1]
+
+        csvfile = get_csvfile(args.dest, jobnum)
+        with open(csvfile, 'w') as output:
+            output.write(col_labels + "\n")
 
 #
-# Have a counter for each ddir
+# Have a counter for each ddir_lat pairing
 # In each round, pick the shortest remaining duration
 # and output a line with any values for that duration
 #
-            indices = {x: 0 for x in ddir_list}
-            while more_lines(indices, bins):
+            indices = {x: 0 for x in ddir_lat_list}
+            while more_bins(indices, bins):
+                debug_print(args.debug, 'indices: ', indices)
                 min_lat = 17112760320
-                for ddir in ddir_list:
-                    if indices[ddir] < len(bins[ddir]):
-                        min_lat = min(bins[ddir][indices[ddir]][0], min_lat)
+                for ddir_lat in ddir_lat_list:
+                    if indices[ddir_lat] < len(bins[ddir_lat]):
+                        min_lat = min(bins[ddir_lat][indices[ddir_lat]][0], min_lat)
 
                 output.write("{0}, ".format(min_lat))
 
-                for ddir in ddir_list:
-                    if indices[ddir] < len(bins[ddir]) and \
-                       min_lat == bins[ddir][indices[ddir]][0]:
-                        count = bins[ddir][indices[ddir]][1]
-                        cumulative = run_total[ddir][indices[ddir]]
-                        ptile = percentile(indices[ddir], run_total[ddir])
-                        output.write("{0}, {1}, {2}, ".format(count,
-                                     cumulative, ptile))
-                        indices[ddir] += 1
+                for ddir_lat in ddir_lat_list:
+                    if indices[ddir_lat] < len(bins[ddir_lat]) and \
+                       min_lat == bins[ddir_lat][indices[ddir_lat]][0]:
+                        count = bins[ddir_lat][indices[ddir_lat]][1]
+                        cumulative = run_total[ddir_lat][indices[ddir_lat]]
+                        ptile = percentile(indices[ddir_lat], run_total[ddir_lat])
+                        output.write("{0}, {1}, {2}, ".format(count, cumulative, ptile))
+                        indices[ddir_lat] += 1
                     else:
                         output.write(", , , ")
                 output.write("\n")
 
-            print("{0} generated".format(outfile))
+            print("{0} generated".format(csvfile))
 
 
 if __name__ == '__main__':

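To make the count, cumulative and percentile columns described in the script's docstring concrete, here is a minimal Python sketch that derives them for a single json+ `bins` object. It is not the script's own code: the helper name `bin_rows` is hypothetical, and it assumes, as in json+ output, that `bins` maps durations in nanoseconds (string keys) to IO counts.

```python
def bin_rows(bins):
    """Return (nsec, count, cumulative, fraction) tuples sorted by nsec.

    bins -- dict mapping a duration in nanoseconds (string key, as in
            json+ output) to the number of IOs with that latency.
    """
    # Sort numerically: the keys are strings, so "100" would sort before "12".
    items = sorted((int(nsec), count) for nsec, count in bins.items())
    total = sum(count for _, count in items)

    rows = []
    cumulative = 0
    for nsec, count in items:
        cumulative += count
        # Like the script's percentile(), this is a fraction in [0, 1];
        # multiply by 100 to get the conventional percentile figure.
        fraction = float(cumulative) / total if total else 0
        rows.append((nsec, count, cumulative, fraction))
    return rows
```

For example, `bin_rows({"12": 3, "13": 1364})` reports a cumulative count of 3 at 12ns and 1367 at 13ns, with the last row's fraction equal to 1.0.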

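The "pick the shortest remaining duration" loop in the script's main() is a k-way merge of sorted bin lists: each round emits one CSV line for the smallest duration still pending in any series, and only the series that have a bin at that duration advance. A minimal sketch of the same idea, using a hypothetical `merge_rows` helper, follows. It takes `min()` over the remaining heads rather than starting from the script's large sentinel value `17112760320`; that is a choice of this sketch, not of the script.

```python
def merge_rows(series):
    """Merge several sorted [(nsec, count), ...] lists into aligned rows.

    series -- dict mapping a column prefix (e.g. 'read_clat_ns') to a list
              of (nsec, count) pairs sorted by nsec.
    Returns a list of (nsec, {prefix: count}) tuples; a prefix appears in
    a row only if that series has a bin at that duration (the script
    writes ", , , " for the missing ones).
    """
    names = sorted(series)
    indices = {name: 0 for name in names}
    rows = []
    while any(indices[n] < len(series[n]) for n in names):
        # Shortest duration not yet emitted by any series.
        min_lat = min(series[n][indices[n]][0]
                      for n in names if indices[n] < len(series[n]))
        row = {}
        for n in names:
            if indices[n] < len(series[n]) and series[n][indices[n]][0] == min_lat:
                row[n] = series[n][indices[n]][1]
                indices[n] += 1  # advance only the series that matched
        rows.append((min_lat, row))
    return rows
```

With `{'read': [(12, 3), (14, 5)], 'write': [(12, 2), (13, 1)]}` this yields three rows: 12ns with both series, 13ns with write only, and 14ns with read only.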

* Recent changes (master)
@ 2020-03-18 12:00 Jens Axboe
From: Jens Axboe @ 2020-03-18 12:00 UTC
  To: fio

The following changes since commit 5e8865c0e08861558c1253c521dc9098d0c773ee:

  t/io_uring: don't use *rand48_r() variants (2020-03-16 08:30:36 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3bd2078bdd1c173f9d02bc20e2d630302555a8a0:

  zbd: add test for stressing zone locking (2020-03-17 20:05:54 -0600)

----------------------------------------------------------------
Feng Tang (1):
      gauss.c: correct the stddev initializtion

Naohiro Aota (6):
      zbd: avoid initializing swd when unnecessary
      zbd: reset one zone at a time
      zbd: use zone_lock to lock a zone
      backend: always clean up pending aios
      io_u: ensure io_u_quiesce() to process all the IOs
      zbd: add test for stressing zone locking

 backend.c              |  5 ---
 io_u.c                 |  6 ++--
 lib/gauss.c            |  2 +-
 t/zbd/test-zbd-support | 30 ++++++++++++++++++
 zbd.c                  | 84 +++++++++++++++++++-------------------------------
 5 files changed, 66 insertions(+), 61 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 936203dc..feb34e51 100644
--- a/backend.c
+++ b/backend.c
@@ -237,15 +237,10 @@ static void cleanup_pending_aio(struct thread_data *td)
 {
 	int r;
 
-	if (td->error)
-		return;
-
 	/*
 	 * get immediately available events, if any
 	 */
 	r = io_u_queued_complete(td, 0);
-	if (r < 0)
-		return;
 
 	/*
 	 * now cancel remaining active events
diff --git a/io_u.c b/io_u.c
index bcb893c5..5d62a76c 100644
--- a/io_u.c
+++ b/io_u.c
@@ -606,7 +606,7 @@ static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 
 int io_u_quiesce(struct thread_data *td)
 {
-	int ret = 0, completed = 0;
+	int ret = 0, completed = 0, err = 0;
 
 	/*
 	 * We are going to sleep, ensure that we flush anything pending as
@@ -625,7 +625,7 @@ int io_u_quiesce(struct thread_data *td)
 		if (ret > 0)
 			completed += ret;
 		else if (ret < 0)
-			break;
+			err = ret;
 	}
 
 	if (td->flags & TD_F_REGROW_LOGS)
@@ -634,7 +634,7 @@ int io_u_quiesce(struct thread_data *td)
 	if (completed)
 		return completed;
 
-	return ret;
+	return err;
 }
 
 static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
diff --git a/lib/gauss.c b/lib/gauss.c
index 1d24e187..3f84dbc6 100644
--- a/lib/gauss.c
+++ b/lib/gauss.c
@@ -51,7 +51,7 @@ void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
 	gs->nranges = nranges;
 
 	if (dev != 0.0) {
-		gs->stddev = ceil((double) (nranges * 100.0) / dev);
+		gs->stddev = ceil((double)(nranges * dev) / 100.0);
 		if (gs->stddev > nranges / 2)
 			gs->stddev = nranges / 2;
 	}
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 5d079a8b..bd41fffb 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -755,6 +755,36 @@ test47() {
     grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
 }
 
+# Multiple overlapping random write jobs for the same drive and with a
+# limited number of open zones. This is similar to test29, but uses libaio
+# to stress test zone locking.
+test48() {
+    local i jobs=16 off opts=()
+
+    off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
+    size=$((16*zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts=("--aux-path=/tmp" "--allow_file_create=0" "--significant_figures=10")
+    opts+=("--debug=zbd")
+    opts+=("--ioengine=libaio" "--rw=randwrite" "--direct=1")
+    opts+=("--time_based" "--runtime=30")
+    opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+    opts+=("--max_open_zones=4")
+    for ((i=0;i<jobs;i++)); do
+	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
+	opts+=("--io_size=$zone_size" "--iodepth=256" "--thread=1")
+	opts+=("--group_reporting=1")
+    done
+
+    fio=$(dirname "$0")/../../fio
+
+    { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
+
+    timeout -v -s KILL 45s \
+	    "${dynamic_analyzer[@]}" "$fio" "${opts[@]}" \
+	    >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
diff --git a/zbd.c b/zbd.c
index ee8bcb30..b2d94249 100644
--- a/zbd.c
+++ b/zbd.c
@@ -707,7 +707,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			   struct fio_zone_info *const zb,
 			   struct fio_zone_info *const ze, bool all_zones)
 {
-	struct fio_zone_info *z, *start_z = ze;
+	struct fio_zone_info *z;
 	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 	bool reset_wp;
 	int res = 0;
@@ -717,48 +717,20 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	assert(f->fd != -1);
 	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
-		switch (z->type) {
-		case BLK_ZONE_TYPE_SEQWRITE_REQ:
+		if (z->type == BLK_ZONE_TYPE_SEQWRITE_REQ) {
 			reset_wp = all_zones ? z->wp != z->start :
 					(td->o.td_ddir & TD_DDIR_WRITE) &&
 					z->wp % min_bs != 0;
-			if (start_z == ze && reset_wp) {
-				start_z = z;
-			} else if (start_z < ze && !reset_wp) {
-				dprint(FD_ZBD,
-				       "%s: resetting zones %u .. %u\n",
+			if (reset_wp) {
+				dprint(FD_ZBD, "%s: resetting zone %u\n",
 				       f->file_name,
-					zbd_zone_nr(f->zbd_info, start_z),
-					zbd_zone_nr(f->zbd_info, z));
-				if (zbd_reset_range(td, f, start_z->start,
-						z->start - start_z->start) < 0)
+				       zbd_zone_nr(f->zbd_info, z));
+				if (zbd_reset_zone(td, f, z) < 0)
 					res = 1;
-				start_z = ze;
 			}
-			break;
-		default:
-			if (start_z == ze)
-				break;
-			dprint(FD_ZBD, "%s: resetting zones %u .. %u\n",
-			       f->file_name, zbd_zone_nr(f->zbd_info, start_z),
-			       zbd_zone_nr(f->zbd_info, z));
-			if (zbd_reset_range(td, f, start_z->start,
-					    z->start - start_z->start) < 0)
-				res = 1;
-			start_z = ze;
-			break;
 		}
-	}
-	if (start_z < ze) {
-		dprint(FD_ZBD, "%s: resetting zones %u .. %u\n", f->file_name,
-			zbd_zone_nr(f->zbd_info, start_z),
-			zbd_zone_nr(f->zbd_info, z));
-		if (zbd_reset_range(td, f, start_z->start,
-				    z->start - start_z->start) < 0)
-			res = 1;
-	}
-	for (z = zb; z < ze; z++)
 		pthread_mutex_unlock(&z->mutex);
+	}
 
 	return res;
 }
@@ -847,6 +819,9 @@ static void zbd_init_swd(struct fio_file *f)
 {
 	uint64_t swd;
 
+	if (!enable_check_swd)
+		return;
+
 	swd = zbd_process_swd(f, SET_SWD);
 	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
 	       swd);
@@ -952,6 +927,24 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
 	f->zbd_info->zone_info[zone_idx].open = 0;
 }
 
+static void zone_lock(struct thread_data *td, struct fio_zone_info *z)
+{
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&z->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&z->mutex);
+	}
+}
+
 /*
  * Modify the offset of an I/O unit that does not refer to an open zone such
  * that it refers to an open zone. Close an open zone and open a new zone if
@@ -994,7 +987,7 @@ static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	for (;;) {
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		pthread_mutex_lock(&f->zbd_info->mutex);
 		if (td->o.max_open_zones == 0)
 			goto examine_zone;
@@ -1042,7 +1035,7 @@ examine_zone:
 			z = &f->zbd_info->zone_info[zone_idx];
 		}
 		assert(is_valid_offset(f, z->start));
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		if (z->open)
 			continue;
 		if (zbd_open_zone(td, io_u, zone_idx))
@@ -1060,7 +1053,7 @@ examine_zone:
 
 		z = &f->zbd_info->zone_info[zone_idx];
 
-		pthread_mutex_lock(&z->mutex);
+		zone_lock(td, z);
 		if (z->wp + min_bs <= (z+1)->start)
 			goto out;
 		pthread_mutex_lock(&f->zbd_info->mutex);
@@ -1346,20 +1339,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	/*
-	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
-	 * is changed or when io_u completes and zbd_put_io() executed.
-	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
-	 * other waiting for zone locks when building an io_u batch, first
-	 * only trylock the zone. If the zone is already locked by another job,
-	 * process the currently queued I/Os so that I/O progress is made and
-	 * zones unlocked.
-	 */
-	if (pthread_mutex_trylock(&zb->mutex) != 0) {
-		if (!td_ioengine_flagged(td, FIO_SYNCIO))
-			io_u_quiesce(td);
-		pthread_mutex_lock(&zb->mutex);
-	}
+	zone_lock(td, zb);
 
 	switch (io_u->ddir) {
 	case DDIR_READ:

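The io_u.c change alters what io_u_quiesce() does when reaping fails: an error no longer ends the loop early, so IOs still in flight keep being reaped. The Python model below sketches only that control flow. It assumes the list `reap_results` stands in for successive io_u_queued_complete() return values (positive for reaped IOs, negative for an error); the function name is hypothetical.

```python
def quiesce_result(reap_results):
    """Model the return value of io_u_quiesce() after this change.

    reap_results -- stand-in for successive io_u_queued_complete()
                    results: > 0 IOs reaped, < 0 an error, 0 nothing.
    """
    completed = 0
    err = 0
    for ret in reap_results:
        if ret > 0:
            completed += ret
        elif ret < 0:
            err = ret  # remember the error but keep draining
    # A positive completion count takes precedence over any error.
    return completed if completed else err
```

With `[2, -5, 3]` the sketch returns 5; the pre-change code would have stopped at the error and returned 2, leaving the remaining IOs in flight.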

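The lib/gauss.c fix treats `dev` as a percentage of `nranges`: the deviation becomes nranges * dev / 100, rounded up and capped at half the number of ranges. The previous expression, nranges * 100 / dev, inverted that ratio. Below is a Python sketch of the corrected arithmetic; the function name is hypothetical, and note that the C code only performs this computation when dev is non-zero.

```python
import math


def gauss_stddev(nranges, dev):
    """Standard deviation (in ranges) as computed by the fixed gauss_init().

    nranges -- number of ranges in the distribution.
    dev     -- deviation as a percentage of nranges (non-zero).
    """
    stddev = math.ceil(nranges * dev / 100.0)
    # Cap at half the ranges; nranges / 2 is integer division in the C code.
    if stddev > nranges // 2:
        stddev = nranges // 2
    return stddev
```

With 1000 ranges and dev = 10, the fixed formula gives 100, while the old one gave ceil(100000 / 10) = 10000, which the cap then clamped to 500.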

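The new zone_lock() helper in zbd.c factors out a try-lock, then flush, then block sequence. The Python sketch below reproduces that ordering with `threading.Lock`. It is an illustration only: `sync_engine` stands in for the FIO_SYNCIO engine flag and `quiesce` for io_u_quiesce(); both are stand-in parameters, not fio APIs.

```python
import threading


def zone_lock(zone_mutex, sync_engine, quiesce):
    """Lock a zone in the order zbd.c's zone_lock() uses.

    zone_mutex  -- threading.Lock guarding one zone.
    sync_engine -- stand-in for td_ioengine_flagged(td, FIO_SYNCIO).
    quiesce     -- stand-in callable for io_u_quiesce(td).
    """
    # Fast path: the zone is free, so take it without blocking.
    if zone_mutex.acquire(blocking=False):
        return
    # Contended: an async job first completes its own queued IOs so that
    # zones it holds can be unlocked, and only then waits for this zone.
    if not sync_engine:
        quiesce()
    zone_mutex.acquire()
```

Flushing before blocking matters for asynchronous engines: a job may hold zone locks on behalf of IOs it has queued but not yet completed, and reaping those IOs lets the locks be released, so two jobs do not end up waiting on each other.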
* Recent changes (master)
@ 2020-03-17 12:00 Jens Axboe
From: Jens Axboe @ 2020-03-17 12:00 UTC
  To: fio

The following changes since commit dcdfc515ab7a245f9086c901a675a246e3b4db4e:

  Merge branch 'patch-1' of https://github.com/neheb/fio (2020-03-15 18:59:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5e8865c0e08861558c1253c521dc9098d0c773ee:

  t/io_uring: don't use *rand48_r() variants (2020-03-16 08:30:36 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: don't use *rand48_r() variants

 t/io_uring.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index e84a2b6b..d48db1e9 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -63,7 +63,6 @@ struct file {
 struct submitter {
 	pthread_t thread;
 	int ring_fd;
-	struct drand48_data rand;
 	struct io_sq_ring sq_ring;
 	struct io_uring_sqe *sqes;
 	struct io_cq_ring cq_ring;
@@ -170,7 +169,7 @@ static void init_io(struct submitter *s, unsigned index)
 	}
 	f->pending_ios++;
 
-	lrand48_r(&s->rand, &r);
+	r = lrand48();
 	offset = (r % (f->max_blocks - 1)) * BS;
 
 	if (register_files) {
@@ -286,7 +285,7 @@ static void *submitter_fn(void *data)
 
 	printf("submitter=%d\n", gettid());
 
-	srand48_r(pthread_self(), &s->rand);
+	srand48(pthread_self());
 
 	prepped = 0;
 	do {



* Recent changes (master)
@ 2020-03-16 12:00 Jens Axboe
From: Jens Axboe @ 2020-03-16 12:00 UTC
  To: fio

The following changes since commit f75248a9d9554b668484b089713e7c2b0a154ad6:

  Fio 3.19 (2020-03-12 11:12:50 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dcdfc515ab7a245f9086c901a675a246e3b4db4e:

  Merge branch 'patch-1' of https://github.com/neheb/fio (2020-03-15 18:59:19 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/neheb/fio

Rosen Penev (1):
      configure: fix vasprintf check under musl

 configure | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 5de86ca2..d17929f1 100755
--- a/configure
+++ b/configure
@@ -892,7 +892,8 @@ cat > $TMPC << EOF
 
 int main(int argc, char **argv)
 {
-  return vasprintf(NULL, "%s", NULL) == 0;
+  va_list ap;
+  return vasprintf(NULL, "%s", ap) == 0;
 }
 EOF
 if compile_prog "" "" "have_vasprintf"; then



* Recent changes (master)
@ 2020-03-13 12:00 Jens Axboe
From: Jens Axboe @ 2020-03-13 12:00 UTC
  To: fio

The following changes since commit 8324e5d5ffbe87080eebec1cd0c3c8a0751257de:

  Merge branch 'patch-1' of https://github.com/felixonmars/fio (2020-03-03 08:11:08 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f75248a9d9554b668484b089713e7c2b0a154ad6:

  Fio 3.19 (2020-03-12 11:12:50 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.19

Xiaoguang Wang (1):
      engines/io_uring: delete fio_option_is_set() calls when submitting sqes

 FIO-VERSION-GEN    |  2 +-
 engines/io_uring.c | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 6c2bcc8a..3220aaa1 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.18
+DEF_VER=fio-3.19
 
 LF='
 '
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 1efc6cff..ac57af8f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -63,6 +63,8 @@ struct ioring_data {
 	int queued;
 	int cq_ring_off;
 	unsigned iodepth;
+	bool ioprio_class_set;
+	bool ioprio_set;
 
 	struct ioring_mmap mmap[3];
 };
@@ -233,9 +235,9 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		}
 		if (!td->o.odirect && o->uncached)
 			sqe->rw_flags = RWF_UNCACHED;
-		if (fio_option_is_set(&td->o, ioprio_class))
+		if (ld->ioprio_class_set)
 			sqe->ioprio = td->o.ioprio_class << 13;
-		if (fio_option_is_set(&td->o, ioprio))
+		if (ld->ioprio_set)
 			sqe->ioprio |= td->o.ioprio;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
@@ -685,6 +687,12 @@ static int fio_ioring_init(struct thread_data *td)
 		td_verror(td, EINVAL, "fio_io_uring_init");
 		return 1;
 	}
+
+	if (fio_option_is_set(&td->o, ioprio_class))
+		ld->ioprio_class_set = true;
+	if (fio_option_is_set(&td->o, ioprio))
+		ld->ioprio_set = true;
+
 	return 0;
 }
 

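In the io_uring engine, the SQE's ioprio field carries the priority class shifted left by 13 bits, with the per-class level OR-ed into the low bits. This commit does not change that encoding; it only moves the two fio_option_is_set() checks from fio_ioring_prep() into fio_ioring_init() and caches the results in `ld->ioprio_class_set` and `ld->ioprio_set`, so they are no longer evaluated for every submitted SQE. Below is a Python sketch of the resulting value. It assumes the field starts at zero and uses `None` for an option that was not set; the function name is hypothetical.

```python
IOPRIO_CLASS_SHIFT = 13  # the class sits above the low 13 level bits


def sqe_ioprio(ioprio_class=None, ioprio=None):
    """Compute the ioprio value the engine would place in an SQE.

    ioprio_class -- priority class (e.g. 1 RT, 2 BE, 3 IDLE), or None.
    ioprio       -- level within the class, or None.
    Assumes the field starts at 0 when neither option is set.
    """
    value = 0
    if ioprio_class is not None:
        value = ioprio_class << IOPRIO_CLASS_SHIFT
    if ioprio is not None:
        value |= ioprio
    return value
```

For example, `sqe_ioprio(2, 4)` (best-effort class, level 4) yields 16388, that is 2 << 13 = 16384 OR-ed with 4.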


* Recent changes (master)
@ 2020-03-04 13:00 Jens Axboe
From: Jens Axboe @ 2020-03-04 13:00 UTC
  To: fio

The following changes since commit 41ee2319ffef9a38fff09d9c80dd997929fb51e4:

  Merge branch 'filestat3' of https://github.com/kusumi/fio (2020-03-02 09:34:38 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8324e5d5ffbe87080eebec1cd0c3c8a0751257de:

  Merge branch 'patch-1' of https://github.com/felixonmars/fio (2020-03-03 08:11:08 -0700)

----------------------------------------------------------------
Felix Yan (1):
      Correct multiple typos in engines/libhdfs.c

Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/felixonmars/fio

 engines/libhdfs.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index 60001601..c57fcea6 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -2,7 +2,7 @@
  * libhdfs engine
  *
  * this engine helps perform read/write operations on hdfs cluster using
- * libhdfs. hdfs doesnot support modification of data once file is created.
+ * libhdfs. hdfs does not support modification of data once file is created.
  *
  * so to mimic that create many files of small size (e.g 256k), and this
  * engine select a file based on the offset generated by fio.
@@ -75,7 +75,7 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_STR_STORE,
 		.off1   = offsetof(struct hdfsio_options, directory),
 		.def    = "/",
-		.help	= "The HDFS directory where fio will create chuncks",
+		.help	= "The HDFS directory where fio will create chunks",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_HDFS,
 	},
@@ -86,7 +86,7 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct hdfsio_options, chunck_size),
 		.def    = "1048576",
-		.help	= "Size of individual chunck",
+		.help	= "Size of individual chunk",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_HDFS,
 	},
@@ -177,7 +177,7 @@ static enum fio_q_status fio_hdfsio_queue(struct thread_data *td,
 	
 	if( (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) && 
 	     hdfsTell(hd->fs, hd->fp) != offset && hdfsSeek(hd->fs, hd->fp, offset) != 0 ) {
-		log_err("hdfs: seek failed: %s, are you doing random write smaller than chunck size ?\n", strerror(errno));
+		log_err("hdfs: seek failed: %s, are you doing random write smaller than chunk size ?\n", strerror(errno));
 		io_u->error = errno;
 		return FIO_Q_COMPLETED;
 	};
@@ -338,9 +338,9 @@ static int fio_hdfsio_setup(struct thread_data *td)
 		}
 		f->real_file_size = file_size;
 	}
-	/* If the size doesn't divide nicely with the chunck size,
+	/* If the size doesn't divide nicely with the chunk size,
 	 * make the last files bigger.
-	 * Used only if filesize was not explicitely given
+	 * Used only if filesize was not explicitly given
 	 */
 	if (!td->o.file_size_low && total_file_size < td->o.size) {
 		f->real_file_size += (td->o.size - total_file_size);
@@ -374,7 +374,7 @@ static int fio_hdfsio_io_u_init(struct thread_data *td, struct io_u *io_u)
 	}
 	hd->fs = hdfsBuilderConnect(bld);
 	
-	/* hdfsSetWorkingDirectory succeed on non existend directory */
+	/* hdfsSetWorkingDirectory succeed on non-existent directory */
 	if (hdfsExists(hd->fs, options->directory) < 0 || hdfsSetWorkingDirectory(hd->fs, options->directory) < 0) {
 		failure = errno;
 		log_err("hdfs: invalid working directory %s: %s\n", options->directory, strerror(errno));


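The libhdfs engine comments describe the data layout: the job's size is split into many small chunk files, the file is picked from the offset fio generates, and when the size does not divide evenly by the chunk size, the last file absorbs the remainder. Below is a sketch of that arithmetic. It assumes a single trailing file takes the remainder. The names are hypothetical, and this is not the engine's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: nfiles chunk files; all but the last hold
 * chunk_size bytes, and the last also absorbs total % chunk_size. */
struct sketch_chunk_layout {
	uint64_t nfiles;
	uint64_t chunk_size;
	uint64_t last_size;
};

static struct sketch_chunk_layout sketch_layout(uint64_t total,
						uint64_t chunk_size)
{
	struct sketch_chunk_layout l;

	assert(chunk_size > 0 && total >= chunk_size);
	l.chunk_size = chunk_size;
	l.nfiles = total / chunk_size;
	l.last_size = chunk_size + total % chunk_size;	/* "make the last file bigger" */
	return l;
}

/* Map a global offset to the chunk file holding it; offsets past the
 * last full chunk land in the enlarged last file. */
static uint64_t sketch_file_index(const struct sketch_chunk_layout *l,
				  uint64_t offset)
{
	uint64_t idx = offset / l->chunk_size;

	return idx < l->nfiles ? idx : l->nfiles - 1;
}
```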

* Recent changes (master)
@ 2020-03-03 13:00 Jens Axboe
From: Jens Axboe @ 2020-03-03 13:00 UTC
  To: fio

The following changes since commit 941e3356a7edde556ffd81c9767ded218de20a50:

  Merge branch 'genfio-bash' of https://github.com/sitsofe/fio (2020-03-01 15:43:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 41ee2319ffef9a38fff09d9c80dd997929fb51e4:

  Merge branch 'filestat3' of https://github.com/kusumi/fio (2020-03-02 09:34:38 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'filestat3' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      engines/filestat: add statx(2) syscall support

 Makefile           |  3 +++
 configure          | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 engines/filestat.c | 24 ++++++++++++++++++++----
 oslib/statx.c      | 23 +++++++++++++++++++++++
 oslib/statx.h      | 14 ++++++++++++++
 5 files changed, 110 insertions(+), 4 deletions(-)
 create mode 100644 oslib/statx.c
 create mode 100644 oslib/statx.h

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 72ab39e9..9a5dea7f 100644
--- a/Makefile
+++ b/Makefile
@@ -132,6 +132,9 @@ endif
 ifndef CONFIG_INET_ATON
   SOURCE += oslib/inet_aton.c
 endif
+ifndef CONFIG_HAVE_STATX
+  SOURCE += oslib/statx.c
+endif
 ifdef CONFIG_GFAPI
   SOURCE += engines/glusterfs.c
   SOURCE += engines/glusterfs_sync.c
diff --git a/configure b/configure
index a1217700..5de86ca2 100755
--- a/configure
+++ b/configure
@@ -2551,6 +2551,50 @@ if compile_prog "" "" "gettid"; then
 fi
 print_config "gettid" "$gettid"
 
+##########################################
+# check for statx(2) support by libc
+statx="no"
+cat > $TMPC << EOF
+#include <unistd.h>
+#include <sys/stat.h>
+
+int main(int argc, char **argv)
+{
+	struct statx st;
+	return statx(-1, *argv, 0, 0, &st);
+}
+EOF
+if compile_prog "" "" "statx"; then
+  statx="yes"
+fi
+print_config "statx(2)/libc" "$statx"
+
+##########################################
+# check for statx(2) support by kernel
+statx_syscall="no"
+cat > $TMPC << EOF
+#include <unistd.h>
+#include <linux/stat.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+
+static int _statx(int dfd, const char *pathname, int flags, unsigned int mask,
+		  struct statx *buffer)
+{
+	return syscall(__NR_statx, dfd, pathname, flags, mask, buffer);
+}
+
+int main(int argc, char **argv)
+{
+	struct statx st;
+	return _statx(-1, *argv, 0, 0, &st);
+}
+EOF
+if compile_prog "" "" "statx_syscall"; then
+  statx_syscall="yes"
+fi
+print_config "statx(2)/syscall" "$statx_syscall"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2843,6 +2887,12 @@ fi
 if test "$gettid" = "yes"; then
   output_sym "CONFIG_HAVE_GETTID"
 fi
+if test "$statx" = "yes"; then
+  output_sym "CONFIG_HAVE_STATX"
+fi
+if test "$statx_syscall" = "yes"; then
+  output_sym "CONFIG_HAVE_STATX_SYSCALL"
+fi
 if test "$fallthrough" = "yes"; then
   CFLAGS="$CFLAGS -Wimplicit-fallthrough"
 fi
diff --git a/engines/filestat.c b/engines/filestat.c
index 68a340bd..405f028d 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -5,6 +5,7 @@
  * of the file stat.
  */
 #include <stdio.h>
+#include <stdlib.h>
 #include <fcntl.h>
 #include <errno.h>
 #include <sys/types.h>
@@ -12,6 +13,7 @@
 #include <unistd.h>
 #include "../fio.h"
 #include "../optgroup.h"
+#include "../oslib/statx.h"
 
 struct fc_data {
 	enum fio_ddir stat_ddir;
@@ -25,7 +27,7 @@ struct filestat_options {
 enum {
 	FIO_FILESTAT_STAT	= 1,
 	FIO_FILESTAT_LSTAT	= 2,
-	/*FIO_FILESTAT_STATX	= 3,*/
+	FIO_FILESTAT_STATX	= 3,
 };
 
 static struct fio_option options[] = {
@@ -45,12 +47,10 @@ static struct fio_option options[] = {
 			    .oval = FIO_FILESTAT_LSTAT,
 			    .help = "Use lstat(2)",
 			  },
-			  /*
 			  { .ival = "statx",
 			    .oval = FIO_FILESTAT_STATX,
-			    .help = "Use statx(2)",
+			    .help = "Use statx(2) if exists",
 			  },
-			  */
 		},
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_FILESTAT,
@@ -66,6 +66,10 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 	struct timespec start;
 	int do_lat = !td->o.disable_lat;
 	struct stat statbuf;
+#ifndef WIN32
+	struct statx statxbuf;
+	char *abspath;
+#endif
 	int ret;
 
 	dprint(FD_FILE, "fd stat %s\n", f->file_name);
@@ -89,6 +93,18 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 	case FIO_FILESTAT_LSTAT:
 		ret = lstat(f->file_name, &statbuf);
 		break;
+	case FIO_FILESTAT_STATX:
+#ifndef WIN32
+		abspath = realpath(f->file_name, NULL);
+		if (abspath) {
+			ret = statx(-1, abspath, 0, STATX_ALL, &statxbuf);
+			free(abspath);
+		} else
+			ret = -1;
+#else
+		ret = -1;
+#endif
+		break;
 	default:
 		ret = -1;
 		break;
diff --git a/oslib/statx.c b/oslib/statx.c
new file mode 100644
index 00000000..1ca81ada
--- /dev/null
+++ b/oslib/statx.c
@@ -0,0 +1,23 @@
+#ifndef CONFIG_HAVE_STATX
+#include "statx.h"
+
+#ifdef CONFIG_HAVE_STATX_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+
+int statx(int dfd, const char *pathname, int flags, unsigned int mask,
+	  struct statx *buffer)
+{
+	return syscall(__NR_statx, dfd, pathname, flags, mask, buffer);
+}
+#else
+#include <errno.h>
+
+int statx(int dfd, const char *pathname, int flags, unsigned int mask,
+	  struct statx *buffer)
+{
+	errno = EINVAL;
+	return -1;
+}
+#endif
+#endif
diff --git a/oslib/statx.h b/oslib/statx.h
new file mode 100644
index 00000000..d9758f73
--- /dev/null
+++ b/oslib/statx.h
@@ -0,0 +1,14 @@
+#ifndef CONFIG_HAVE_STATX
+#ifdef CONFIG_HAVE_STATX_SYSCALL
+#include <linux/stat.h>
+#include <sys/stat.h>
+#else
+#define STATX_ALL 0
+#undef statx
+struct statx
+{
+};
+#endif
+int statx(int dfd, const char *pathname, int flags, unsigned int mask,
+	  struct statx *buffer);
+#endif



* Recent changes (master)
@ 2020-03-02 13:00 Jens Axboe
From: Jens Axboe @ 2020-03-02 13:00 UTC
  To: fio

The following changes since commit 58d3994d9be0783990af82571cea819b499d526c:

  io_uring: we should not need two write barriers for SQ updates (2020-02-26 19:54:12 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 941e3356a7edde556ffd81c9767ded218de20a50:

  Merge branch 'genfio-bash' of https://github.com/sitsofe/fio (2020-03-01 15:43:14 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'clean1' of https://github.com/kusumi/fio
      Merge branch 'fix-win-raw' of https://github.com/sitsofe/fio
      Merge branch 'genfio-bash' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      genfio: use /bin/bash hashbang
      filesetup: fix win raw disk access and improve dir creation failure msg

Tomohiro Kusumi (1):
      Makefile: don't fail to remove conditionally compiled binaries on clean

 Makefile        |  1 +
 filesetup.c     | 52 +++++++++++++++++++++++++++++++++++++++++++++++++---
 os/os-windows.h |  6 +++++-
 tools/genfio    |  2 +-
 4 files changed, 56 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 027b62bc..72ab39e9 100644
--- a/Makefile
+++ b/Makefile
@@ -511,6 +511,7 @@ endif
 
 clean: FORCE
 	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
+	@rm -f t/fio-btrace2fio t/io_uring t/read-to-pipe-async
 	@rm -rf  doc/output
 
 distclean: clean FORCE
diff --git a/filesetup.c b/filesetup.c
index b45a5826..8a4091fc 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -913,15 +913,61 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 	return offset;
 }
 
+/*
+ * Find longest path component that exists and return its length
+ */
+int longest_existing_path(char *path) {
+	char buf[PATH_MAX];
+	bool done;
+	char *buf_pos;
+	int offset;
+#ifdef WIN32
+	DWORD dwAttr;
+#else
+	struct stat sb;
+#endif
+
+	sprintf(buf, "%s", path);
+	done = false;
+	while (!done) {
+		buf_pos = strrchr(buf, FIO_OS_PATH_SEPARATOR);
+		if (!buf_pos) {
+			done = true;
+			offset = 0;
+			break;
+		}
+
+		*(buf_pos + 1) = '\0';
+
+#ifdef WIN32
+		dwAttr = GetFileAttributesA(buf);
+		if (dwAttr != INVALID_FILE_ATTRIBUTES) {
+			done = true;
+		}
+#else
+		if (stat(buf, &sb) == 0)
+			done = true;
+#endif
+		if (done)
+			offset = buf_pos - buf;
+		else
+			*buf_pos = '\0';
+	}
+
+	return offset;
+}
+
 static bool create_work_dirs(struct thread_data *td, const char *fname)
 {
 	char path[PATH_MAX];
 	char *start, *end;
+	int offset;
 
 	snprintf(path, PATH_MAX, "%s", fname);
 	start = path;
 
-	end = start;
+	offset = longest_existing_path(path);
+	end = start + offset;
 	while ((end = strchr(end, FIO_OS_PATH_SEPARATOR)) != NULL) {
 		if (end == start) {
 			end++;
@@ -930,8 +976,8 @@ static bool create_work_dirs(struct thread_data *td, const char *fname)
 		*end = '\0';
 		errno = 0;
 		if (fio_mkdir(path, 0700) && errno != EEXIST) {
-			log_err("fio: failed to create dir (%s): %d\n",
-				start, errno);
+			log_err("fio: failed to create dir (%s): %s\n",
+				start, strerror(errno));
 			return false;
 		}
 		*end = FIO_OS_PATH_SEPARATOR;
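`create_work_dirs()` now begins its `fio_mkdir()` walk at the offset returned by `longest_existing_path()` instead of at the start of the path. As a result, leading components that already exist are skipped rather than created again. The sketch below is a POSIX-only version of that search, using `/` as the separator. It uses `snprintf()` with an explicit length check so it stays safe for arbitrary input. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return the index of the separator that ends the longest existing
 * directory prefix of path, or 0 if no such prefix is found. */
static int sketch_longest_existing(const char *path)
{
	char buf[4096];
	struct stat sb;
	char *sep;

	assert(path != NULL);
	/* Give up on paths that would be truncated. */
	if ((size_t)snprintf(buf, sizeof(buf), "%s", path) >= sizeof(buf))
		return 0;

	while ((sep = strrchr(buf, '/')) != NULL) {
		sep[1] = '\0';		/* probe "prefix/" */
		if (stat(buf, &sb) == 0)
			return (int)(sep - buf);
		sep[0] = '\0';		/* drop this component, try its parent */
	}
	return 0;
}
```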
diff --git a/os/os-windows.h b/os/os-windows.h
index 6d48ffe8..fa2955f9 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -203,7 +203,11 @@ static inline int fio_mkdir(const char *path, mode_t mode) {
 	}
 
 	if (CreateDirectoryA(path, NULL) == 0) {
-		log_err("CreateDirectoryA = %d\n", GetLastError());
+		/* Ignore errors if path is a device namespace */
+		if (strcmp(path, "\\\\.") == 0) {
+			errno = EEXIST;
+			return -1;
+		}
 		errno = win_to_posix_error(GetLastError());
 		return -1;
 	}
diff --git a/tools/genfio b/tools/genfio
index 286d814d..8518bbcc 100755
--- a/tools/genfio
+++ b/tools/genfio
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash
 #
 #  Copyright (C) 2013 eNovance SAS <licensing@enovance.com>
 #  Author: Erwan Velu  <erwan@enovance.com>



* Recent changes (master)
@ 2020-02-27 13:00 Jens Axboe
From: Jens Axboe @ 2020-02-27 13:00 UTC
  To: fio

The following changes since commit fb34110eac72f01651543095656693f196d895ca:

  Merge branch 'div-by-zero' of https://github.com/vincentkfu/fio (2020-02-24 08:23:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 58d3994d9be0783990af82571cea819b499d526c:

  io_uring: we should not need two write barriers for SQ updates (2020-02-26 19:54:12 -0700)

----------------------------------------------------------------
Eric Sandeen (1):
      fio: remove duplicate global definition of tsc_reliable

Jens Axboe (1):
      io_uring: we should not need two write barriers for SQ updates

 engines/io_uring.c | 2 --
 t/arch.c           | 1 -
 t/io_uring.c       | 2 --
 3 files changed, 5 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5e59f975..1efc6cff 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -374,8 +374,6 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == *ring->head)
 		return FIO_Q_BUSY;
 
-	/* ensure sqe stores are ordered with tail update */
-	write_barrier();
 	if (o->cmdprio_percentage)
 		fio_ioring_prio_prep(td, io_u);
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
diff --git a/t/arch.c b/t/arch.c
index bd28a848..a72cef3a 100644
--- a/t/arch.c
+++ b/t/arch.c
@@ -1,5 +1,4 @@
 #include "../arch/arch.h"
 
 unsigned long arch_flags = 0;
-bool tsc_reliable;
 int arch_random;
diff --git a/t/io_uring.c b/t/io_uring.c
index 55b75f6e..e84a2b6b 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -216,8 +216,6 @@ static int prep_more_ios(struct submitter *s, int max_ios)
 	} while (prepped < max_ios);
 
 	if (*ring->tail != tail) {
-		/* order tail store with writes to sqes above */
-		write_barrier();
 		*ring->tail = tail;
 		write_barrier();
 	}


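Both hunks concern the ordering between writing submission entries and publishing them through the ring tail. For comparison, portable C11 usually expresses this publish step as a release store of the tail, paired with an acquire load on the consumer side. With that pairing, the consumer cannot observe the new tail before the entry writes that precede it. The sketch below is a generic single-producer/single-consumer ring with hypothetical names. It is not fio's `write_barrier()` code and does not follow the io_uring ring ABI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_RING_ENTRIES	8u			/* must be a power of two */
#define SK_RING_MASK	(SK_RING_ENTRIES - 1)

_Static_assert((SK_RING_ENTRIES & SK_RING_MASK) == 0, "power of two");

struct sk_ring {
	unsigned int array[SK_RING_ENTRIES];	/* payload slots */
	_Atomic unsigned int head;		/* advanced by the consumer */
	_Atomic unsigned int tail;		/* advanced by the producer */
};

/* Producer: fill the slot, then publish it with a release store. */
static bool sk_ring_push(struct sk_ring *r, unsigned int val)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == SK_RING_ENTRIES)
		return false;			/* full */
	r->array[tail & SK_RING_MASK] = val;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: the acquire load of tail pairs with the producer's release. */
static bool sk_ring_pop(struct sk_ring *r, unsigned int *val)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false;			/* empty */
	*val = r->array[head & SK_RING_MASK];
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```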

* Recent changes (master)
@ 2020-02-25 13:00 Jens Axboe
From: Jens Axboe @ 2020-02-25 13:00 UTC
  To: fio

The following changes since commit 1b5e13beb3acc2a08321ce687727e2cbbb3b954f:

  Merge branch 'master' of https://github.com/vincentkfu/fio (2020-02-06 12:17:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fb34110eac72f01651543095656693f196d895ca:

  Merge branch 'div-by-zero' of https://github.com/vincentkfu/fio (2020-02-24 08:23:52 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'div-by-zero' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      t/latency_percentiles: avoid division by zero

 t/latency_percentiles.py | 5 +++++
 1 file changed, 5 insertions(+)

---

Diff of recent changes:

diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index 0c8d0c19..5cdd49cf 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -395,6 +395,11 @@ class FioLatTest():
         approximation   value of the bin used by fio to store a given latency
         actual          actual latency value
         """
+
+        # Avoid a division by zero. The smallest latency values have no error.
+        if actual == 0:
+            return approximation == 0
+
         delta = abs(approximation - actual) / actual
         return delta <= 1/128
 


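The new guard in `FioLatTest` returns early when `actual` is 0, because the smallest latency bins carry no error. In every other case, a bin value is accepted when its relative error is at most 1/128. The sketch below renders the same acceptance rule in C for illustration. The function name is hypothetical, and latencies are assumed to be non-negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Accept approximation if its relative error vs actual is <= 1/128.
 * An actual latency of 0 only matches an exact 0, which also avoids
 * dividing by zero. */
static bool sk_similar(double approximation, double actual)
{
	double delta;

	assert(actual >= 0);
	if (actual == 0)
		return approximation == 0;
	delta = approximation > actual ? approximation - actual
				       : actual - approximation;
	return delta / actual <= 1.0 / 128;
}
```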

* Recent changes (master)
@ 2020-02-07 13:00 Jens Axboe
From: Jens Axboe @ 2020-02-07 13:00 UTC
  To: fio

The following changes since commit ac694f66968fe7b18c820468abd8333f3df333fb:

  Fio 3.18 (2020-02-05 07:59:58 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1b5e13beb3acc2a08321ce687727e2cbbb3b954f:

  Merge branch 'master' of https://github.com/vincentkfu/fio (2020-02-06 12:17:25 -0700)

----------------------------------------------------------------
Bart Van Assche (1):
      Make the JSON code easier to analyze

Jens Axboe (3):
      Unify architecture io_uring syscall numbers
      Merge branch 'master' of https://github.com/bvanassche/fio
      Merge branch 'master' of https://github.com/vincentkfu/fio

Vincent Fu (4):
      stat: summary statistics for both high/low priority latencies
      .gitignore: add some test programs
      gfio: add high/low priority latency results
      t/run-fio-tests: fix style issues

 .gitignore             |   3 +
 arch/arch-aarch64.h    |  12 --
 arch/arch-ppc.h        |  12 --
 arch/arch-x86-common.h |  11 -
 arch/arch.h            |  28 +++
 engines/io_uring.c     |   8 +-
 gclient.c              |  55 ++++-
 json.c                 |  68 +++---
 json.h                 | 140 ++++++++++--
 stat.c                 |  10 +-
 t/io_uring.c           |  10 +-
 t/run-fio-tests.py     | 575 ++++++++++++++++++++++++++-----------------------
 12 files changed, 564 insertions(+), 368 deletions(-)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index b228938d..b84b0fda 100644
--- a/.gitignore
+++ b/.gitignore
@@ -16,7 +16,10 @@
 /t/fio-verify-state
 /t/gen-rand
 /t/ieee754
+t/io_uring
 /t/lfsr-test
+t/memlock
+t/read-to-pipe-async
 /t/stest
 /unittests/unittest
 y.tab.*
diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index de9b349b..2a86cc5a 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -8,18 +8,6 @@
 
 #define FIO_ARCH	(arch_aarch64)
 
-#define ARCH_HAVE_IOURING
-
-#ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup		425
-#endif
-#ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter		426
-#endif
-#ifndef __NR_sys_io_uring_register
-#define __NR_sys_io_uring_register	427
-#endif
-
 #define nop		do { __asm__ __volatile__ ("yield"); } while (0)
 #define read_barrier()	do { __sync_synchronize(); } while (0)
 #define write_barrier()	do { __sync_synchronize(); } while (0)
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 46246bae..804d596a 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -24,18 +24,6 @@
 #define PPC_CNTLZL "cntlzw"
 #endif
 
-#define ARCH_HAVE_IOURING
-
-#ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup		425
-#endif
-#ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter		426
-#endif
-#ifndef __NR_sys_io_uring_register
-#define __NR_sys_io_uring_register	427
-#endif
-
 static inline int __ilog2(unsigned long bitmask)
 {
 	int lz;
diff --git a/arch/arch-x86-common.h b/arch/arch-x86-common.h
index 87925bdc..f32835cc 100644
--- a/arch/arch-x86-common.h
+++ b/arch/arch-x86-common.h
@@ -3,16 +3,6 @@
 
 #include <string.h>
 
-#ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup		425
-#endif
-#ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter		426
-#endif
-#ifndef __NR_sys_io_uring_register
-#define __NR_sys_io_uring_register	427
-#endif
-
 static inline void cpuid(unsigned int op,
 			 unsigned int *eax, unsigned int *ebx,
 			 unsigned int *ecx, unsigned int *edx)
@@ -23,7 +13,6 @@ static inline void cpuid(unsigned int op,
 }
 
 #define ARCH_HAVE_INIT
-#define ARCH_HAVE_IOURING
 
 extern bool tsc_reliable;
 extern int arch_random;
diff --git a/arch/arch.h b/arch/arch.h
index 0ec3f10f..30c0d205 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -76,4 +76,32 @@ static inline int arch_init(char *envp[])
 }
 #endif
 
+#ifdef __alpha__
+/*
+ * alpha is the only exception, all other architectures
+ * have common numbers for new system calls.
+ */
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		535
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		536
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	537
+# endif
+#else /* !__alpha__ */
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		425
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		426
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	427
+# endif
+#endif
+
+#define ARCH_HAVE_IOURING
+
 #endif
diff --git a/engines/io_uring.c b/engines/io_uring.c
index f1ffc712..5e59f975 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -191,7 +191,7 @@ static struct fio_option options[] = {
 static int io_uring_enter(struct ioring_data *ld, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
-	return syscall(__NR_sys_io_uring_enter, ld->ring_fd, to_submit,
+	return syscall(__NR_io_uring_enter, ld->ring_fd, to_submit,
 			min_complete, flags, NULL, 0);
 }
 
@@ -548,7 +548,7 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		}
 	}
 
-	ret = syscall(__NR_sys_io_uring_setup, depth, &p);
+	ret = syscall(__NR_io_uring_setup, depth, &p);
 	if (ret < 0)
 		return ret;
 
@@ -563,7 +563,7 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
 			return -1;
 
-		ret = syscall(__NR_sys_io_uring_register, ld->ring_fd,
+		ret = syscall(__NR_io_uring_register, ld->ring_fd,
 				IORING_REGISTER_BUFFERS, ld->iovecs, depth);
 		if (ret < 0)
 			return ret;
@@ -589,7 +589,7 @@ static int fio_ioring_register_files(struct thread_data *td)
 		f->engine_pos = i;
 	}
 
-	ret = syscall(__NR_sys_io_uring_register, ld->ring_fd,
+	ret = syscall(__NR_io_uring_register, ld->ring_fd,
 			IORING_REGISTER_FILES, ld->fds, td->o.nr_files);
 	if (ret) {
 err:
diff --git a/gclient.c b/gclient.c
index d2044f32..fe83382f 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1155,18 +1155,21 @@ out:
 #define GFIO_CLAT	1
 #define GFIO_SLAT	2
 #define GFIO_LAT	4
+#define GFIO_HILAT	8
+#define GFIO_LOLAT	16
 
 static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				  struct group_run_stats *rs,
 				  struct thread_stat *ts, int ddir)
 {
 	const char *ddir_label[3] = { "Read", "Write", "Trim" };
+	const char *hilat, *lolat;
 	GtkWidget *frame, *label, *box, *vbox, *main_vbox;
-	unsigned long long min[3], max[3];
+	unsigned long long min[5], max[5];
 	unsigned long runt;
 	unsigned long long bw, iops;
 	unsigned int flags = 0;
-	double mean[3], dev[3];
+	double mean[5], dev[5];
 	char *io_p, *io_palt, *bw_p, *bw_palt, *iops_p;
 	char tmp[128];
 	int i2p;
@@ -1265,6 +1268,14 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		flags |= GFIO_CLAT;
 	if (calc_lat(&ts->lat_stat[ddir], &min[2], &max[2], &mean[2], &dev[2]))
 		flags |= GFIO_LAT;
+	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min[3], &max[3], &mean[3], &dev[3])) {
+		flags |= GFIO_HILAT;
+		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min[4], &max[4], &mean[4], &dev[4]))
+			flags |= GFIO_LOLAT;
+		/* we only want to print low priority statistics if other IOs were
+		 * submitted with the priority bit set
+		 */
+	}
 
 	if (flags) {
 		frame = gtk_frame_new("Latency");
@@ -1273,12 +1284,24 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		vbox = gtk_vbox_new(FALSE, 3);
 		gtk_container_add(GTK_CONTAINER(frame), vbox);
 
+		if (ts->lat_percentiles) {
+			hilat = "High priority total latency";
+			lolat = "Low priority total latency";
+		} else {
+			hilat = "High priority completion latency";
+			lolat = "Low priority completion latency";
+		}
+
 		if (flags & GFIO_SLAT)
 			gfio_show_lat(vbox, "Submission latency", min[0], max[0], mean[0], dev[0]);
 		if (flags & GFIO_CLAT)
 			gfio_show_lat(vbox, "Completion latency", min[1], max[1], mean[1], dev[1]);
 		if (flags & GFIO_LAT)
 			gfio_show_lat(vbox, "Total latency", min[2], max[2], mean[2], dev[2]);
+		if (flags & GFIO_HILAT)
+			gfio_show_lat(vbox, hilat, min[3], max[3], mean[3], dev[3]);
+		if (flags & GFIO_LOLAT)
+			gfio_show_lat(vbox, lolat, min[4], max[4], mean[4], dev[4]);
 	}
 
 	if (ts->slat_percentiles && flags & GFIO_SLAT)
@@ -1286,16 +1309,40 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				ts->io_u_plat[FIO_SLAT][ddir],
 				ts->slat_stat[ddir].samples,
 				"Submission");
-	if (ts->clat_percentiles && flags & GFIO_CLAT)
+	if (ts->clat_percentiles && flags & GFIO_CLAT) {
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_CLAT][ddir],
 				ts->clat_stat[ddir].samples,
 				"Completion");
-	if (ts->lat_percentiles && flags & GFIO_LAT)
+		if (!ts->lat_percentiles) {
+			if (flags & GFIO_HILAT)
+				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+						ts->io_u_plat_high_prio[ddir],
+						ts->clat_high_prio_stat[ddir].samples,
+						"High priority completion");
+			if (flags & GFIO_LOLAT)
+				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+						ts->io_u_plat_low_prio[ddir],
+						ts->clat_low_prio_stat[ddir].samples,
+						"Low priority completion");
+		}
+	}
+	if (ts->lat_percentiles && flags & GFIO_LAT) {
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_LAT][ddir],
 				ts->lat_stat[ddir].samples,
 				"Total");
+		if (flags & GFIO_HILAT)
+			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+					ts->io_u_plat_high_prio[ddir],
+					ts->clat_high_prio_stat[ddir].samples,
+					"High priority total");
+		if (flags & GFIO_LOLAT)
+			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+					ts->io_u_plat_low_prio[ddir],
+					ts->clat_low_prio_stat[ddir].samples,
+					"Low priority total");
+	}
 
 	free(io_p);
 	free(bw_p);
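In `gfio_show_ddir_status()`, the new `GFIO_HILAT` and `GFIO_LOLAT` bits are set only when high-priority samples exist. Low-priority statistics are shown only if some I/O was actually submitted with the priority bit set. The sketch below restates that flag logic and the label choice. It assumes that `calc_lat()` succeeds only when samples were recorded, so a nonzero sample count stands in for a successful call. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Bit values copied from the GFIO_* defines in the hunk above. */
#define SK_CLAT		1
#define SK_SLAT		2
#define SK_LAT		4
#define SK_HILAT	8
#define SK_LOLAT	16

/* Each argument is a sample count; nonzero means "statistics available". */
static unsigned int sk_lat_flags(unsigned long slat, unsigned long clat,
				 unsigned long lat, unsigned long hi,
				 unsigned long lo)
{
	unsigned int flags = 0;

	if (slat)
		flags |= SK_SLAT;
	if (clat)
		flags |= SK_CLAT;
	if (lat)
		flags |= SK_LAT;
	/* Low-priority stats only matter if some I/O carried the priority bit. */
	if (hi) {
		flags |= SK_HILAT;
		if (lo)
			flags |= SK_LOLAT;
	}
	return flags;
}

/* The label depends on whether total (lat) percentiles are enabled. */
static const char *sk_hilat_label(bool lat_percentiles)
{
	return lat_percentiles ? "High priority total latency"
			       : "High priority completion latency";
}
```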
diff --git a/json.c b/json.c
index e2819a65..cd3d5d74 100644
--- a/json.c
+++ b/json.c
@@ -194,25 +194,31 @@ static int json_object_add_pair(struct json_object *obj, struct json_pair *pair)
 	return 0;
 }
 
-int json_object_add_value_type(struct json_object *obj, const char *name, int type, ...)
+int json_object_add_value_type(struct json_object *obj, const char *name,
+			       const struct json_value *arg)
 {
 	struct json_value *value;
 	struct json_pair *pair;
-	va_list args;
 	int ret;
 
-	va_start(args, type);
-	if (type == JSON_TYPE_STRING)
-		value = json_create_value_string(va_arg(args, char *));
-	else if (type == JSON_TYPE_INTEGER)
-		value = json_create_value_int(va_arg(args, long long));
-	else if (type == JSON_TYPE_FLOAT)
-		value = json_create_value_float(va_arg(args, double));
-	else if (type == JSON_TYPE_OBJECT)
-		value = json_create_value_object(va_arg(args, struct json_object *));
-	else
-		value = json_create_value_array(va_arg(args, struct json_array *));
-	va_end(args);
+	switch (arg->type) {
+	case JSON_TYPE_STRING:
+		value = json_create_value_string(arg->string);
+		break;
+	case JSON_TYPE_INTEGER:
+		value = json_create_value_int(arg->integer_number);
+		break;
+	case JSON_TYPE_FLOAT:
+		value = json_create_value_float(arg->float_number);
+		break;
+	case JSON_TYPE_OBJECT:
+		value = json_create_value_object(arg->object);
+		break;
+	default:
+	case JSON_TYPE_ARRAY:
+		value = json_create_value_array(arg->array);
+		break;
+	}
 
 	if (!value)
 		return ENOMEM;
@@ -230,24 +236,30 @@ int json_object_add_value_type(struct json_object *obj, const char *name, int ty
 	return 0;
 }
 
-int json_array_add_value_type(struct json_array *array, int type, ...)
+int json_array_add_value_type(struct json_array *array,
+			      const struct json_value *arg)
 {
 	struct json_value *value;
-	va_list args;
 	int ret;
 
-	va_start(args, type);
-	if (type == JSON_TYPE_STRING)
-		value = json_create_value_string(va_arg(args, char *));
-	else if (type == JSON_TYPE_INTEGER)
-		value = json_create_value_int(va_arg(args, long long));
-	else if (type == JSON_TYPE_FLOAT)
-		value = json_create_value_float(va_arg(args, double));
-	else if (type == JSON_TYPE_OBJECT)
-		value = json_create_value_object(va_arg(args, struct json_object *));
-	else
-		value = json_create_value_array(va_arg(args, struct json_array *));
-	va_end(args);
+	switch (arg->type) {
+	case JSON_TYPE_STRING:
+		value = json_create_value_string(arg->string);
+		break;
+	case JSON_TYPE_INTEGER:
+		value = json_create_value_int(arg->integer_number);
+		break;
+	case JSON_TYPE_FLOAT:
+		value = json_create_value_float(arg->float_number);
+		break;
+	case JSON_TYPE_OBJECT:
+		value = json_create_value_object(arg->object);
+		break;
+	default:
+	case JSON_TYPE_ARRAY:
+		value = json_create_value_array(arg->array);
+		break;
+	}
 
 	if (!value)
 		return ENOMEM;
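The JSON rework replaces the variadic `int type, ...` entry points with a `const struct json_value *` argument. It also adds typed `static inline` wrappers, so the compiler checks each argument at the call site instead of relying on `va_arg()` matching what the caller passed. For example, the old `json_array_add_value_int` macro passed `val` without the `(long long)` cast that its object counterpart used. A narrower argument would therefore have been fetched with `va_arg(args, long long)`. The sketch below is a reduced version of the tagged-value-plus-wrapper pattern. It formats values instead of inserting them into a tree, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum sk_type { SK_INT, SK_FLOAT, SK_STRING };

/* Tagged value with an anonymous union (C11), like struct json_value. */
struct sk_value {
	enum sk_type type;
	union {
		long long integer_number;
		double float_number;
		const char *string;
	};
};

/* Single non-variadic entry point that dispatches on the tag. */
static int sk_describe(const struct sk_value *v, char *out, size_t len)
{
	switch (v->type) {
	case SK_INT:
		return snprintf(out, len, "%lld", v->integer_number);
	case SK_FLOAT:
		return snprintf(out, len, "%g", v->float_number);
	default:
	case SK_STRING:
		return snprintf(out, len, "\"%s\"", v->string);
	}
}

/* Typed wrappers: each argument type is checked by the compiler. */
static inline int sk_describe_int(long long val, char *out, size_t len)
{
	struct sk_value v = { .type = SK_INT, .integer_number = val };

	return sk_describe(&v, out, len);
}

static inline int sk_describe_float(double val, char *out, size_t len)
{
	struct sk_value v = { .type = SK_FLOAT, .float_number = val };

	return sk_describe(&v, out, len);
}

static inline int sk_describe_string(const char *val, char *out, size_t len)
{
	struct sk_value v = { .type = SK_STRING, .string = val };

	return sk_describe(&v, out, len);
}
```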
diff --git a/json.h b/json.h
index bcc712cd..09c2f187 100644
--- a/json.h
+++ b/json.h
@@ -49,28 +49,124 @@ struct json_array *json_create_array(void);
 
 void json_free_object(struct json_object *obj);
 
-int json_object_add_value_type(struct json_object *obj, const char *name, int type, ...);
-#define json_object_add_value_int(obj, name, val) \
-	json_object_add_value_type((obj), name, JSON_TYPE_INTEGER, (long long) (val))
-#define json_object_add_value_float(obj, name, val) \
-	json_object_add_value_type((obj), name, JSON_TYPE_FLOAT, (val))
-#define json_object_add_value_string(obj, name, val) \
-	json_object_add_value_type((obj), name, JSON_TYPE_STRING, (val))
-#define json_object_add_value_object(obj, name, val) \
-	json_object_add_value_type((obj), name, JSON_TYPE_OBJECT, (val))
-#define json_object_add_value_array(obj, name, val) \
-	json_object_add_value_type((obj), name, JSON_TYPE_ARRAY, (val))
-int json_array_add_value_type(struct json_array *array, int type, ...);
-#define json_array_add_value_int(obj, val) \
-	json_array_add_value_type((obj), JSON_TYPE_INTEGER, (val))
-#define json_array_add_value_float(obj, val) \
-	json_array_add_value_type((obj), JSON_TYPE_FLOAT, (val))
-#define json_array_add_value_string(obj, val) \
-	json_array_add_value_type((obj), JSON_TYPE_STRING, (val))
-#define json_array_add_value_object(obj, val) \
-	json_array_add_value_type((obj), JSON_TYPE_OBJECT, (val))
-#define json_array_add_value_array(obj, val) \
-	json_array_add_value_type((obj), JSON_TYPE_ARRAY, (val))
+int json_object_add_value_type(struct json_object *obj, const char *name,
+			       const struct json_value *val);
+
+static inline int json_object_add_value_int(struct json_object *obj,
+					    const char *name, long long val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_INTEGER,
+		.integer_number = val,
+	};
+
+	return json_object_add_value_type(obj, name, &arg);
+}
+
+static inline int json_object_add_value_float(struct json_object *obj,
+					      const char *name, double val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_FLOAT,
+		.float_number = val,
+	};
+
+	return json_object_add_value_type(obj, name, &arg);
+}
+
+static inline int json_object_add_value_string(struct json_object *obj,
+					       const char *name,
+					       const char *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_STRING,
+		.string = (char *)val,
+	};
+
+	return json_object_add_value_type(obj, name, &arg);
+}
+
+static inline int json_object_add_value_object(struct json_object *obj,
+					       const char *name,
+					       struct json_object *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_OBJECT,
+		.object = val,
+	};
+
+	return json_object_add_value_type(obj, name, &arg);
+}
+
+static inline int json_object_add_value_array(struct json_object *obj,
+					      const char *name,
+					      struct json_array *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_ARRAY,
+		.array = val,
+	};
+
+	return json_object_add_value_type(obj, name, &arg);
+}
+
+int json_array_add_value_type(struct json_array *array,
+			      const struct json_value *val);
+
+static inline int json_array_add_value_int(struct json_array *obj,
+					   long long val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_INTEGER,
+		.integer_number = val,
+	};
+
+	return json_array_add_value_type(obj, &arg);
+}
+
+static inline int json_array_add_value_float(struct json_array *obj,
+					     double val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_FLOAT,
+		.float_number = val,
+	};
+
+	return json_array_add_value_type(obj, &arg);
+}
+
+static inline int json_array_add_value_string(struct json_array *obj,
+					      const char *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_STRING,
+		.string = (char *)val,
+	};
+
+	return json_array_add_value_type(obj, &arg);
+}
+
+static inline int json_array_add_value_object(struct json_array *obj,
+					      struct json_object *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_OBJECT,
+		.object = val,
+	};
+
+	return json_array_add_value_type(obj, &arg);
+}
+
+static inline int json_array_add_value_array(struct json_array *obj,
+					     struct json_array *val)
+{
+	struct json_value arg = {
+		.type = JSON_TYPE_ARRAY,
+		.array = val,
+	};
+
+	return json_array_add_value_type(obj, &arg);
+}
 
 #define json_array_last_value_object(obj) \
 	(obj->values[obj->value_cnt - 1]->object)
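The json.h hunk replaces the variadic `json_object_add_value_type(obj, name, type, ...)` with a function that takes a `const struct json_value *`. It also turns each convenience macro into a typed `static inline` wrapper. With a prototype, the compiler converts or rejects each argument, whereas `...` accepts anything. For example, the old `json_array_add_value_int` macro passed `val` through without the `(long long)` cast that its object counterpart applied. The sketch below illustrates the same pattern in a minimal stand-alone form; it is not fio's json.c. `enum value_type`, `struct value`, `add_value()` and the `add_value_*()` wrappers are hypothetical names, and the sketch formats into a caller-supplied buffer instead of building a JSON tree.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down analogues of fio's JSON_TYPE_* and struct json_value. */
enum value_type { TYPE_INTEGER, TYPE_FLOAT, TYPE_STRING };

struct value {
	enum value_type type;
	union {				/* C11 anonymous union, as in the patch */
		long long integer_number;
		double float_number;
		const char *string;
	};
};

/*
 * Generic entry point: receives one fully typed descriptor instead of a
 * type tag plus "...", so nothing has to be fetched with va_arg().
 * Writes "name" : value into buf and returns snprintf()'s result.
 */
static int add_value(char *buf, size_t len, const char *name,
		     const struct value *val)
{
	assert(buf && name && val);

	switch (val->type) {
	case TYPE_INTEGER:
		return snprintf(buf, len, "\"%s\" : %lld", name,
				val->integer_number);
	case TYPE_FLOAT:
		return snprintf(buf, len, "\"%s\" : %f", name, val->float_number);
	case TYPE_STRING:
		return snprintf(buf, len, "\"%s\" : \"%s\"", name, val->string);
	}
	return -1;
}

/* Typed wrapper: an int argument is widened to long long by the prototype. */
static inline int add_value_int(char *buf, size_t len, const char *name,
				long long v)
{
	struct value arg = { .type = TYPE_INTEGER, .integer_number = v };

	return add_value(buf, len, name, &arg);
}

static inline int add_value_float(char *buf, size_t len, const char *name,
				  double v)
{
	struct value arg = { .type = TYPE_FLOAT, .float_number = v };

	return add_value(buf, len, name, &arg);
}

/* Passing a uint8_t * here draws a -Wpointer-sign warning under -Wall. */
static inline int add_value_string(char *buf, size_t len, const char *name,
				   const char *v)
{
	struct value arg = { .type = TYPE_STRING, .string = v };

	return add_value(buf, len, name, &arg);
}
```

Because of the typed wrapper, `add_value_int(buf, sizeof(buf), "read_ios", x)` works for an `int x`: the prototype widens it to `long long` before it reaches the union. The same type checking presumably explains the `(const char *)` cast that the stat.c hunk below adds to `dus->name`, assuming that field is declared as a `uint8_t` array.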
diff --git a/stat.c b/stat.c
index 69d57b69..d8c01d14 100644
--- a/stat.c
+++ b/stat.c
@@ -482,9 +482,13 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat("clat", min, max, mean, dev, out);
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
-	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev))
-		display_lat(ts->lat_percentiles ? "prio_lat" : "prio_clat",
+	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
+		display_lat(ts->lat_percentiles ? "high prio_lat" : "high prio_clat",
 				min, max, mean, dev, out);
+		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min, &max, &mean, &dev))
+			display_lat(ts->lat_percentiles ? "low prio_lat" : "low prio_clat",
+					min, max, mean, dev, out);
+	}
 
 	if (ts->slat_percentiles && ts->slat_stat[ddir].samples > 0)
 		show_clat_percentiles(ts->io_u_plat[FIO_SLAT][ddir],
@@ -950,7 +954,7 @@ void json_array_add_disk_util(struct disk_util_stat *dus,
 	obj = json_create_object();
 	json_array_add_value_object(array, obj);
 
-	json_object_add_value_string(obj, "name", dus->name);
+	json_object_add_value_string(obj, "name", (const char *)dus->name);
 	json_object_add_value_int(obj, "read_ios", dus->s.ios[0]);
 	json_object_add_value_int(obj, "write_ios", dus->s.ios[1]);
 	json_object_add_value_int(obj, "read_merges", dus->s.merges[0]);
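After this change, `show_ddir_status()` prints a low-priority latency line only inside the branch that found high-priority samples, and both labels gain a `high`/`low` prefix. One plausible reading is that when no I/O was issued at high priority, the overall `clat`/`lat` line already covers every I/O, so a separate low-priority line would add nothing. The sketch below mimics that nesting and returns how many lines it printed. `struct lat_stat` and `calc_mean()` are hypothetical stand-ins for fio's `struct io_stat` and `calc_lat()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-direction latency stats fio keeps. */
struct lat_stat {
	unsigned long long samples;
	double mean;
};

/* Like calc_lat() in the hunk: report nothing when no samples were recorded. */
static bool calc_mean(const struct lat_stat *s, double *mean)
{
	if (!s->samples)
		return false;
	*mean = s->mean;
	return true;
}

/*
 * Follows the nesting of the patched show_ddir_status(): the low-priority
 * line is only considered once high-priority samples exist (presumably
 * because otherwise the overall line already describes every I/O).
 * Returns the number of lines written to out.
 */
static int show_prio_lat(FILE *out, const struct lat_stat *high,
			 const struct lat_stat *low, bool lat_percentiles)
{
	double mean;
	int lines = 0;

	if (calc_mean(high, &mean)) {
		fprintf(out, "%s: avg=%.2f\n",
			lat_percentiles ? "high prio_lat" : "high prio_clat", mean);
		lines++;
		if (calc_mean(low, &mean)) {
			fprintf(out, "%s: avg=%.2f\n",
				lat_percentiles ? "low prio_lat" : "low prio_clat",
				mean);
			lines++;
		}
	}
	return lines;
}
```

With no high-priority samples, nothing is printed even if low-priority samples exist; with both present, two lines are printed.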
diff --git a/t/io_uring.c b/t/io_uring.c
index c2e5e098..55b75f6e 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -100,7 +100,7 @@ static int io_uring_register_buffers(struct submitter *s)
 	if (do_nop)
 		return 0;
 
-	return syscall(__NR_sys_io_uring_register, s->ring_fd,
+	return syscall(__NR_io_uring_register, s->ring_fd,
 			IORING_REGISTER_BUFFERS, s->iovecs, depth);
 }
 
@@ -117,20 +117,20 @@ static int io_uring_register_files(struct submitter *s)
 		s->files[i].fixed_fd = i;
 	}
 
-	return syscall(__NR_sys_io_uring_register, s->ring_fd,
+	return syscall(__NR_io_uring_register, s->ring_fd,
 			IORING_REGISTER_FILES, s->fds, s->nr_files);
 }
 
 static int io_uring_setup(unsigned entries, struct io_uring_params *p)
 {
-	return syscall(__NR_sys_io_uring_setup, entries, p);
+	return syscall(__NR_io_uring_setup, entries, p);
 }
 
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
-	return syscall(__NR_sys_io_uring_enter, s->ring_fd, to_submit,
-			min_complete, flags, NULL, 0);
+	return syscall(__NR_io_uring_enter, s->ring_fd, to_submit, min_complete,
+			flags, NULL, 0);
 }
 
 #ifndef CONFIG_HAVE_GETTID
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 003ff664..36fcb2f4 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -67,8 +67,14 @@ class FioTest(object):
         self.test_dir = None
         self.passed = True
         self.failure_reason = ''
+        self.command_file = None
+        self.stdout_file = None
+        self.stderr_file = None
+        self.exitcode_file = None
 
     def setup(self, artifact_root, testnum):
+        """Set up instance variables for test."""
+
         self.artifact_root = artifact_root
         self.testnum = testnum
         self.test_dir = os.path.join(artifact_root, "{:04d}".format(testnum))
@@ -76,22 +82,26 @@ class FioTest(object):
             os.mkdir(self.test_dir)
 
         self.command_file = os.path.join(
-                self.test_dir,
-                "{0}.command".format(os.path.basename(self.exe_path)))
+            self.test_dir,
+            "{0}.command".format(os.path.basename(self.exe_path)))
         self.stdout_file = os.path.join(
-                self.test_dir,
-                "{0}.stdout".format(os.path.basename(self.exe_path)))
+            self.test_dir,
+            "{0}.stdout".format(os.path.basename(self.exe_path)))
         self.stderr_file = os.path.join(
-                self.test_dir,
-                "{0}.stderr".format(os.path.basename(self.exe_path)))
-        self.exticode_file = os.path.join(
-                self.test_dir,
-                "{0}.exitcode".format(os.path.basename(self.exe_path)))
+            self.test_dir,
+            "{0}.stderr".format(os.path.basename(self.exe_path)))
+        self.exitcode_file = os.path.join(
+            self.test_dir,
+            "{0}.exitcode".format(os.path.basename(self.exe_path)))
 
     def run(self):
+        """Run the test."""
+
         raise NotImplementedError()
 
     def check_result(self):
+        """Check test results."""
+
         raise NotImplementedError()
 
 
@@ -109,10 +119,9 @@ class FioExeTest(FioTest):
 
         FioTest.__init__(self, exe_path, parameters, success)
 
-    def setup(self, artifact_root, testnum):
-        super(FioExeTest, self).setup(artifact_root, testnum)
-
     def run(self):
+        """Execute the binary or script described by this instance."""
+
         if self.parameters:
             command = [self.exe_path] + self.parameters
         else:
@@ -123,7 +132,7 @@ class FioExeTest(FioTest):
 
         stdout_file = open(self.stdout_file, "w+")
         stderr_file = open(self.stderr_file, "w+")
-        exticode_file = open(self.exticode_file, "w+")
+        exitcode_file = open(self.exitcode_file, "w+")
         try:
             proc = None
             # Avoid using subprocess.run() here because when a timeout occurs,
@@ -136,8 +145,8 @@ class FioExeTest(FioTest):
                                     cwd=self.test_dir,
                                     universal_newlines=True)
             proc.communicate(timeout=self.success['timeout'])
-            exticode_file.write('{0}\n'.format(proc.returncode))
-            logging.debug("Test %d: return code: %d" % (self.testnum, proc.returncode))
+            exitcode_file.write('{0}\n'.format(proc.returncode))
+            logging.debug("Test %d: return code: %d", self.testnum, proc.returncode)
             self.output['proc'] = proc
         except subprocess.TimeoutExpired:
             proc.terminate()
@@ -154,17 +163,19 @@ class FioExeTest(FioTest):
         finally:
             stdout_file.close()
             stderr_file.close()
-            exticode_file.close()
+            exitcode_file.close()
 
     def check_result(self):
+        """Check results of test run."""
+
         if 'proc' not in self.output:
             if self.output['failure'] == 'timeout':
                 self.failure_reason = "{0} timeout,".format(self.failure_reason)
             else:
                 assert self.output['failure'] == 'exception'
                 self.failure_reason = '{0} exception: {1}, {2}'.format(
-                        self.failure_reason, self.output['exc_info'][0],
-                        self.output['exc_info'][1])
+                    self.failure_reason, self.output['exc_info'][0],
+                    self.output['exc_info'][1])
 
             self.passed = False
             return
@@ -222,22 +233,26 @@ class FioJobTest(FioExeTest):
         FioExeTest.__init__(self, fio_path, self.fio_args, success)
 
     def setup(self, artifact_root, testnum):
+        """Set up instance variables for fio job test."""
+
         super(FioJobTest, self).setup(artifact_root, testnum)
 
         self.command_file = os.path.join(
-                self.test_dir,
-                "{0}.command".format(os.path.basename(self.fio_job)))
+            self.test_dir,
+            "{0}.command".format(os.path.basename(self.fio_job)))
         self.stdout_file = os.path.join(
-                self.test_dir,
-                "{0}.stdout".format(os.path.basename(self.fio_job)))
+            self.test_dir,
+            "{0}.stdout".format(os.path.basename(self.fio_job)))
         self.stderr_file = os.path.join(
-                self.test_dir,
-                "{0}.stderr".format(os.path.basename(self.fio_job)))
-        self.exticode_file = os.path.join(
-                self.test_dir,
-                "{0}.exitcode".format(os.path.basename(self.fio_job)))
+            self.test_dir,
+            "{0}.stderr".format(os.path.basename(self.fio_job)))
+        self.exitcode_file = os.path.join(
+            self.test_dir,
+            "{0}.exitcode".format(os.path.basename(self.fio_job)))
 
     def run_pre_job(self):
+        """Run fio job precondition step."""
+
         precon = FioJobTest(self.exe_path, self.fio_pre_job,
                             self.fio_pre_success,
                             output_format=self.output_format)
@@ -248,15 +263,19 @@ class FioJobTest(FioExeTest):
         self.failure_reason = precon.failure_reason
 
     def run(self):
+        """Run fio job test."""
+
         if self.fio_pre_job:
             self.run_pre_job()
 
         if not self.precon_failed:
             super(FioJobTest, self).run()
         else:
-            logging.debug("Test %d: precondition step failed" % self.testnum)
+            logging.debug("Test %d: precondition step failed", self.testnum)
 
     def check_result(self):
+        """Check fio job results."""
+
         if self.precon_failed:
             self.passed = False
             self.failure_reason = "{0} precondition step failed,".format(self.failure_reason)
@@ -267,7 +286,7 @@ class FioJobTest(FioExeTest):
         if not self.passed:
             return
 
-        if not 'json' in self.output_format:
+        if 'json' not in self.output_format:
             return
 
         try:
@@ -291,7 +310,7 @@ class FioJobTest(FioExeTest):
             except json.JSONDecodeError:
                 continue
             else:
-                logging.debug("Test %d: skipped %d lines decoding JSON data" % (self.testnum, i))
+                logging.debug("Test %d: skipped %d lines decoding JSON data", self.testnum, i)
                 return
 
         self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
@@ -328,7 +347,7 @@ class FioJobTest_t0006(FioJobTest):
 
         ratio = self.json_data['jobs'][0]['read']['io_kbytes'] \
             / self.json_data['jobs'][0]['write']['io_kbytes']
-        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
+        logging.debug("Test %d: ratio: %f", self.testnum, ratio)
         if ratio < 1.99 or ratio > 2.01:
             self.failure_reason = "{0} read/write ratio mismatch,".format(self.failure_reason)
             self.passed = False
@@ -364,7 +383,7 @@ class FioJobTest_t0008(FioJobTest):
             return
 
         ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16568
-        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
+        logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
         if ratio < 0.99 or ratio > 1.01:
             self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
@@ -384,7 +403,7 @@ class FioJobTest_t0009(FioJobTest):
         if not self.passed:
             return
 
-        logging.debug('Test %d: elapsed: %d' % (self.testnum, self.json_data['jobs'][0]['elapsed']))
+        logging.debug('Test %d: elapsed: %d', self.testnum, self.json_data['jobs'][0]['elapsed'])
 
         if self.json_data['jobs'][0]['elapsed'] < 60:
             self.failure_reason = "{0} elapsed time mismatch,".format(self.failure_reason)
@@ -406,8 +425,8 @@ class FioJobTest_t0011(FioJobTest):
         iops1 = self.json_data['jobs'][0]['read']['iops']
         iops2 = self.json_data['jobs'][1]['read']['iops']
         ratio = iops2 / iops1
-        logging.debug("Test %d: iops1: %f" % (self.testnum, iops1))
-        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
+        logging.debug("Test %d: iops1: %f", self.testnum, iops1)
+        logging.debug("Test %d: ratio: %f", self.testnum, ratio)
 
         if iops1 < 998 or iops1 > 1002:
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
@@ -451,11 +470,11 @@ class Requirements(object):
 
             Requirements._root = (os.geteuid() == 0)
             if Requirements._zbd and Requirements._root:
-                    subprocess.run(["modprobe", "null_blk"],
-                                   stdout=subprocess.PIPE,
-                                   stderr=subprocess.PIPE)
-                    if os.path.exists("/sys/module/null_blk/parameters/zoned"):
-                        Requirements._zoned_nullb = True
+                subprocess.run(["modprobe", "null_blk"],
+                               stdout=subprocess.PIPE,
+                               stderr=subprocess.PIPE)
+                if os.path.exists("/sys/module/null_blk/parameters/zoned"):
+                    Requirements._zoned_nullb = True
 
         if platform.system() == "Windows":
             utest_exe = "unittest.exe"
@@ -477,253 +496,273 @@ class Requirements(object):
                     Requirements.cpucount4]
         for req in req_list:
             value, desc = req()
-            logging.debug("Requirements: Requirement '%s' met? %s" % (desc, value))
+            logging.debug("Requirements: Requirement '%s' met? %s", desc, value)
 
-    def linux():
+    @classmethod
+    def linux(cls):
+        """Are we running on Linux?"""
         return Requirements._linux, "Linux required"
 
-    def libaio():
+    @classmethod
+    def libaio(cls):
+        """Is libaio available?"""
         return Requirements._libaio, "libaio required"
 
-    def zbd():
+    @classmethod
+    def zbd(cls):
+        """Is ZBD support available?"""
         return Requirements._zbd, "Zoned block device support required"
 
-    def root():
+    @classmethod
+    def root(cls):
+        """Are we running as root?"""
         return Requirements._root, "root required"
 
-    def zoned_nullb():
+    @classmethod
+    def zoned_nullb(cls):
+        """Are zoned null block devices available?"""
         return Requirements._zoned_nullb, "Zoned null block device support required"
 
-    def not_macos():
+    @classmethod
+    def not_macos(cls):
+        """Are we running on a platform other than macOS?"""
         return Requirements._not_macos, "platform other than macOS required"
 
-    def not_windows():
+    @classmethod
+    def not_windows(cls):
+        """Are we running on a platform other than Windows?"""
         return Requirements._not_windows, "platform other than Windows required"
 
-    def unittests():
+    @classmethod
+    def unittests(cls):
+        """Were unittests built?"""
         return Requirements._unittests, "Unittests support required"
 
-    def cpucount4():
+    @classmethod
+    def cpucount4(cls):
+        """Do we have at least 4 CPUs?"""
         return Requirements._cpucount4, "4+ CPUs required"
 
 
 SUCCESS_DEFAULT = {
-        'zero_return': True,
-        'stderr_empty': True,
-        'timeout': 600,
-        }
+    'zero_return': True,
+    'stderr_empty': True,
+    'timeout': 600,
+    }
 SUCCESS_NONZERO = {
-        'zero_return': False,
-        'stderr_empty': False,
-        'timeout': 600,
-        }
+    'zero_return': False,
+    'stderr_empty': False,
+    'timeout': 600,
+    }
 SUCCESS_STDERR = {
-        'zero_return': True,
-        'stderr_empty': False,
-        'timeout': 600,
-        }
+    'zero_return': True,
+    'stderr_empty': False,
+    'timeout': 600,
+    }
 TEST_LIST = [
-        {
-            'test_id':          1,
-            'test_class':       FioJobTest,
-            'job':              't0001-52c58027.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'requirements':     [],
-        },
-        {
-            'test_id':          2,
-            'test_class':       FioJobTest,
-            'job':              't0002-13af05ae-post.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          't0002-13af05ae-pre.fio',
-            'pre_success':      None,
-            'requirements':     [Requirements.linux, Requirements.libaio],
-        },
-        {
-            'test_id':          3,
-            'test_class':       FioJobTest,
-            'job':              't0003-0ae2c6e1-post.fio',
-            'success':          SUCCESS_NONZERO,
-            'pre_job':          't0003-0ae2c6e1-pre.fio',
-            'pre_success':      SUCCESS_DEFAULT,
-            'requirements':     [Requirements.linux, Requirements.libaio],
-        },
-        {
-            'test_id':          4,
-            'test_class':       FioJobTest,
-            'job':              't0004-8a99fdf6.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'requirements':     [Requirements.linux, Requirements.libaio],
-        },
-        {
-            'test_id':          5,
-            'test_class':       FioJobTest_t0005,
-            'job':              't0005-f7078f7b.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [Requirements.not_windows],
-        },
-        {
-            'test_id':          6,
-            'test_class':       FioJobTest_t0006,
-            'job':              't0006-82af2a7c.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [Requirements.linux, Requirements.libaio],
-        },
-        {
-            'test_id':          7,
-            'test_class':       FioJobTest_t0007,
-            'job':              't0007-37cf9e3c.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [],
-        },
-        {
-            'test_id':          8,
-            'test_class':       FioJobTest_t0008,
-            'job':              't0008-ae2fafc8.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [],
-        },
-        {
-            'test_id':          9,
-            'test_class':       FioJobTest_t0009,
-            'job':              't0009-f8b0bd10.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [Requirements.not_macos,
-                                 Requirements.cpucount4],
-                                # mac os does not support CPU affinity
-        },
-        {
-            'test_id':          10,
-            'test_class':       FioJobTest,
-            'job':              't0010-b7aae4ba.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'requirements':     [],
-        },
-        {
-            'test_id':          11,
-            'test_class':       FioJobTest_t0011,
-            'job':              't0011-5d2788d5.fio',
-            'success':          SUCCESS_DEFAULT,
-            'pre_job':          None,
-            'pre_success':      None,
-            'output_format':    'json',
-            'requirements':     [],
-        },
-        {
-            'test_id':          1000,
-            'test_class':       FioExeTest,
-            'exe':              't/axmap',
-            'parameters':       None,
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1001,
-            'test_class':       FioExeTest,
-            'exe':              't/ieee754',
-            'parameters':       None,
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1002,
-            'test_class':       FioExeTest,
-            'exe':              't/lfsr-test',
-            'parameters':       ['0xFFFFFF', '0', '0', 'verify'],
-            'success':          SUCCESS_STDERR,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1003,
-            'test_class':       FioExeTest,
-            'exe':              't/readonly.py',
-            'parameters':       ['-f', '{fio_path}'],
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1004,
-            'test_class':       FioExeTest,
-            'exe':              't/steadystate_tests.py',
-            'parameters':       ['{fio_path}'],
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1005,
-            'test_class':       FioExeTest,
-            'exe':              't/stest',
-            'parameters':       None,
-            'success':          SUCCESS_STDERR,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1006,
-            'test_class':       FioExeTest,
-            'exe':              't/strided.py',
-            'parameters':       ['{fio_path}'],
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
-        {
-            'test_id':          1007,
-            'test_class':       FioExeTest,
-            'exe':              't/zbd/run-tests-against-regular-nullb',
-            'parameters':       None,
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [Requirements.linux, Requirements.zbd,
-                                 Requirements.root],
-        },
-        {
-            'test_id':          1008,
-            'test_class':       FioExeTest,
-            'exe':              't/zbd/run-tests-against-zoned-nullb',
-            'parameters':       None,
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [Requirements.linux, Requirements.zbd,
-                                 Requirements.root, Requirements.zoned_nullb],
-        },
-        {
-            'test_id':          1009,
-            'test_class':       FioExeTest,
-            'exe':              'unittests/unittest',
-            'parameters':       None,
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [Requirements.unittests],
-        },
-        {
-            'test_id':          1010,
-            'test_class':       FioExeTest,
-            'exe':              't/latency_percentiles.py',
-            'parameters':       ['-f', '{fio_path}'],
-            'success':          SUCCESS_DEFAULT,
-            'requirements':     [],
-        },
+    {
+        'test_id':          1,
+        'test_class':       FioJobTest,
+        'job':              't0001-52c58027.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
+    {
+        'test_id':          2,
+        'test_class':       FioJobTest,
+        'job':              't0002-13af05ae-post.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          't0002-13af05ae-pre.fio',
+        'pre_success':      None,
+        'requirements':     [Requirements.linux, Requirements.libaio],
+    },
+    {
+        'test_id':          3,
+        'test_class':       FioJobTest,
+        'job':              't0003-0ae2c6e1-post.fio',
+        'success':          SUCCESS_NONZERO,
+        'pre_job':          't0003-0ae2c6e1-pre.fio',
+        'pre_success':      SUCCESS_DEFAULT,
+        'requirements':     [Requirements.linux, Requirements.libaio],
+    },
+    {
+        'test_id':          4,
+        'test_class':       FioJobTest,
+        'job':              't0004-8a99fdf6.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [Requirements.linux, Requirements.libaio],
+    },
+    {
+        'test_id':          5,
+        'test_class':       FioJobTest_t0005,
+        'job':              't0005-f7078f7b.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [Requirements.not_windows],
+    },
+    {
+        'test_id':          6,
+        'test_class':       FioJobTest_t0006,
+        'job':              't0006-82af2a7c.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [Requirements.linux, Requirements.libaio],
+    },
+    {
+        'test_id':          7,
+        'test_class':       FioJobTest_t0007,
+        'job':              't0007-37cf9e3c.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
+    {
+        'test_id':          8,
+        'test_class':       FioJobTest_t0008,
+        'job':              't0008-ae2fafc8.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
+    {
+        'test_id':          9,
+        'test_class':       FioJobTest_t0009,
+        'job':              't0009-f8b0bd10.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [Requirements.not_macos,
+                             Requirements.cpucount4],
+        # mac os does not support CPU affinity
+    },
+    {
+        'test_id':          10,
+        'test_class':       FioJobTest,
+        'job':              't0010-b7aae4ba.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'requirements':     [],
+    },
+    {
+        'test_id':          11,
+        'test_class':       FioJobTest_t0011,
+        'job':              't0011-5d2788d5.fio',
+        'success':          SUCCESS_DEFAULT,
+        'pre_job':          None,
+        'pre_success':      None,
+        'output_format':    'json',
+        'requirements':     [],
+    },
+    {
+        'test_id':          1000,
+        'test_class':       FioExeTest,
+        'exe':              't/axmap',
+        'parameters':       None,
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1001,
+        'test_class':       FioExeTest,
+        'exe':              't/ieee754',
+        'parameters':       None,
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1002,
+        'test_class':       FioExeTest,
+        'exe':              't/lfsr-test',
+        'parameters':       ['0xFFFFFF', '0', '0', 'verify'],
+        'success':          SUCCESS_STDERR,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1003,
+        'test_class':       FioExeTest,
+        'exe':              't/readonly.py',
+        'parameters':       ['-f', '{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1004,
+        'test_class':       FioExeTest,
+        'exe':              't/steadystate_tests.py',
+        'parameters':       ['{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1005,
+        'test_class':       FioExeTest,
+        'exe':              't/stest',
+        'parameters':       None,
+        'success':          SUCCESS_STDERR,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1006,
+        'test_class':       FioExeTest,
+        'exe':              't/strided.py',
+        'parameters':       ['{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
+    {
+        'test_id':          1007,
+        'test_class':       FioExeTest,
+        'exe':              't/zbd/run-tests-against-regular-nullb',
+        'parameters':       None,
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [Requirements.linux, Requirements.zbd,
+                             Requirements.root],
+    },
+    {
+        'test_id':          1008,
+        'test_class':       FioExeTest,
+        'exe':              't/zbd/run-tests-against-zoned-nullb',
+        'parameters':       None,
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [Requirements.linux, Requirements.zbd,
+                             Requirements.root, Requirements.zoned_nullb],
+    },
+    {
+        'test_id':          1009,
+        'test_class':       FioExeTest,
+        'exe':              'unittests/unittest',
+        'parameters':       None,
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [Requirements.unittests],
+    },
+    {
+        'test_id':          1010,
+        'test_class':       FioExeTest,
+        'exe':              't/latency_percentiles.py',
+        'parameters':       ['-f', '{fio_path}'],
+        'success':          SUCCESS_DEFAULT,
+        'requirements':     [],
+    },
 ]
 
 
 def parse_args():
+    """Parse command-line arguments."""
+
     parser = argparse.ArgumentParser()
     parser.add_argument('-r', '--fio-root',
                         help='fio root path')
@@ -745,6 +784,8 @@ def parse_args():
 
 
 def main():
+    """Entry point."""
+
     args = parse_args()
     if args.debug:
         logging.basicConfig(level=logging.DEBUG)
@@ -829,14 +870,14 @@ def main():
             continue
 
         if not args.skip_req:
-            skip = False
+            reqs_met = True
             for req in config['requirements']:
-                ok, reason = req()
-                skip = not ok
-                logging.debug("Test %d: Requirement '%s' met? %s" % (config['test_id'], reason, ok))
-                if skip:
+                reqs_met, reason = req()
+                logging.debug("Test %d: Requirement '%s' met? %s", config['test_id'], reason,
+                              reqs_met)
+                if not reqs_met:
                     break
-            if skip:
+            if not reqs_met:
                 print("Test {0} SKIPPED ({1})".format(config['test_id'], reason))
                 skipped = skipped + 1
                 continue
@@ -851,9 +892,9 @@ def main():
             result = "FAILED: {0}".format(test.failure_reason)
             failed = failed + 1
             with open(test.stderr_file, "r") as stderr_file:
-                logging.debug("Test %d: stderr:\n%s" % (config['test_id'], stderr_file.read()))
+                logging.debug("Test %d: stderr:\n%s", config['test_id'], stderr_file.read())
             with open(test.stdout_file, "r") as stdout_file:
-                logging.debug("Test %d: stdout:\n%s" % (config['test_id'], stdout_file.read()))
+                logging.debug("Test %d: stdout:\n%s", config['test_id'], stdout_file.read())
         print("Test {0} {1}".format(config['test_id'], result))
 
     print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
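
The refactored requirement check in `main()` stops at the first unmet requirement and reports that requirement's reason in the SKIPPED message. It also passes arguments to `logging.debug()` instead of pre-formatting them with `%`, so the message is only built when debug logging is enabled. Below is a minimal standalone sketch of that loop. It assumes requirement callables that return an `(ok, reason)` tuple, as in the diff. The names `check_requirements`, `has_linux` and `has_root` are hypothetical stand-ins and are not part of `t/run-fio-tests.py`:

```python
import logging


def check_requirements(test_id, requirements):
    """Return (reqs_met, reason), stopping at the first unmet requirement.

    Each requirement is a callable returning an (ok, reason) tuple, as in
    the patched main(). With no requirements, reason stays None.
    """
    reqs_met, reason = True, None
    for req in requirements:
        reqs_met, reason = req()
        # Lazy %-style arguments: the string is only formatted if DEBUG is on.
        logging.debug("Test %d: Requirement '%s' met? %s", test_id, reason, reqs_met)
        if not reqs_met:
            break
    return reqs_met, reason


# Hypothetical requirement callables, for illustration only.
def has_linux():
    return True, "Linux required"


def has_root():
    return False, "root required"


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    met, why = check_requirements(1007, [has_linux, has_root])
    if not met:
        print("Test {0} SKIPPED ({1})".format(1007, why))
```

Run directly, the sketch logs two DEBUG lines to stderr and then prints `Test 1007 SKIPPED (root required)`.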


* Recent changes (master)
@ 2020-02-06 13:00 Jens Axboe
From: Jens Axboe @ 2020-02-06 13:00 UTC
  To: fio

The following changes since commit f4cd67d8dc96ca947b294f6a5c9fdced2b64215d:

  Merge branch 'latency-rebase-again' of https://github.com/vincentkfu/fio (2020-02-04 10:05:09 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ac694f66968fe7b18c820468abd8333f3df333fb:

  Fio 3.18 (2020-02-05 07:59:58 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.18

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 13ce8c16..6c2bcc8a 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.17
+DEF_VER=fio-3.18
 
 LF='
 '


* Recent changes (master)
@ 2020-02-05 13:00 Jens Axboe
From: Jens Axboe @ 2020-02-05 13:00 UTC
  To: fio

The following changes since commit ab45cf076766ce1ed49d28380c059343305cde4a:

  Merge branch 'stat-averaging-interval-start-fix' of https://github.com/maciejsszmigiero/fio (2020-01-28 14:15:35 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f4cd67d8dc96ca947b294f6a5c9fdced2b64215d:

  Merge branch 'latency-rebase-again' of https://github.com/vincentkfu/fio (2020-02-04 10:05:09 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'filestat2' of https://github.com/kusumi/fio
      Merge branch 'latency-rebase-again' of https://github.com/vincentkfu/fio

Tomohiro Kusumi (1):
      engines/filestat: change "lstat" bool option to "stat_type" str option

Vincent Fu (10):
      fio: groundwork for adding slat, lat percentiles
      fio: report percentiles for slat, clat, lat
      gfio: display slat, clat, and lat percentiles
      docs: updates for slat, clat, lat percentile reporting
      stat: make priority summary statistics consistent with percentiles
      fio: better distinguish between high and low priority
      stat: fix high/low prio unified rw bug
      t/latency_percentiles: test latency percentile reporting
      t/run-fio-tests: add latency_percentiles.py
      t/run-fio-tests: increase time allowed for tests to pass

 HOWTO                    |   30 +-
 cconv.c                  |    2 +
 client.c                 |   14 +-
 engines/filestat.c       |   53 +-
 fio.1                    |   28 +-
 gclient.c                |   27 +-
 init.c                   |   19 +-
 options.c                |   31 +-
 server.c                 |   14 +-
 server.h                 |    2 +-
 stat.c                   |  440 +++++++--------
 stat.h                   |   22 +-
 t/latency_percentiles.py | 1324 ++++++++++++++++++++++++++++++++++++++++++++++
 t/run-fio-tests.py       |   14 +-
 thread_options.h         |   11 +-
 15 files changed, 1672 insertions(+), 359 deletions(-)
 create mode 100755 t/latency_percentiles.py
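
After this series, `slat_percentiles`, `clat_percentiles` and `lat_percentiles` are independent options. Each entry in `percentile_list` names the latency below which that percentage of the observed samples fell. fio derives these values from its internal log-linear histogram (`io_u_plat`, `plat_idx_to_val()`). The sketch below only illustrates the idea. It works on a plain histogram of `(bucket upper bound, count)` pairs and picks the first bucket whose cumulative count reaches p% of all samples. It is not fio's `calc_clat_percentiles()` and will not reproduce fio's exact numbers. The histogram values are made up:

```python
from typing import List, Sequence, Tuple


def histogram_percentiles(
    buckets: Sequence[Tuple[int, int]], percentiles: Sequence[float]
) -> List[int]:
    """For each percentile (ascending), return the upper bound of the first
    bucket whose cumulative sample count reaches that share of all samples.

    buckets: (upper_bound_ns, count) pairs, sorted by upper bound.
    percentiles: values in (0, 100], in any order.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return []  # no samples, nothing to report
    results = []
    for p in sorted(percentiles):
        threshold = p / 100.0 * total
        cumulative = 0
        for upper, count in buckets:
            cumulative += count
            if cumulative >= threshold:
                results.append(upper)
                break
    return results


# Hypothetical completion-latency histogram in nanoseconds, 1000 samples.
hist = [(1000, 900), (2000, 90), (4000, 8), (8000, 2)]
print(histogram_percentiles(hist, [50.0, 99.5, 99.9]))  # [1000, 4000, 8000]
```

Running the same computation once per latency type (submission, completion, total), each over its own histogram, corresponds to the separate `FIO_SLAT`, `FIO_CLAT` and `FIO_LAT` arrays added in the diff below.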

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f19f9226..430c7b62 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2261,9 +2261,10 @@ with the caveat that when used on the command line, they must come after the
 	multiple paths exist between the client and the server or in certain loopback
 	configurations.
 
-.. option:: lstat=bool : [filestat]
+.. option:: stat_type=str : [filestat]
 
-	Use lstat(2) to measure lookup/getattr performance. Default is 0.
+	Specify stat system call type to measure lookup/getattr performance.
+	Default is **stat** for :manpage:`stat(2)`.
 
 .. option:: readfua=bool : [sg]
 
@@ -3346,27 +3347,28 @@ Measurements and reporting
 	Disable measurements of throughput/bandwidth numbers. See
 	:option:`disable_lat`.
 
+.. option:: slat_percentiles=bool
+
+	Report submission latency percentiles. Submission latency is not recorded
+	for synchronous ioengines.
+
 .. option:: clat_percentiles=bool
 
-	Enable the reporting of percentiles of completion latencies.  This
-	option is mutually exclusive with :option:`lat_percentiles`.
+	Report completion latency percentiles.
 
 .. option:: lat_percentiles=bool
 
-	Enable the reporting of percentiles of I/O latencies. This is similar
-	to :option:`clat_percentiles`, except that this includes the
-	submission latency. This option is mutually exclusive with
-	:option:`clat_percentiles`.
+	Report total latency percentiles. Total latency is the sum of submission
+	latency and completion latency.
 
 .. option:: percentile_list=float_list
 
-	Overwrite the default list of percentiles for completion latencies and
-	the block error histogram.  Each number is a floating number in the
-	range (0,100], and the maximum length of the list is 20. Use ``:`` to
-	separate the numbers, and list the numbers in ascending order. For
+	Overwrite the default list of percentiles for latencies and the block error
+	histogram.  Each number is a floating point number in the range (0,100], and
+	the maximum length of the list is 20. Use ``:`` to separate the numbers. For
 	example, ``--percentile_list=99.5:99.9`` will cause fio to report the
-	values of completion latency below which 99.5% and 99.9% of the observed
-	latencies fell, respectively.
+	latency durations below which 99.5% and 99.9% of the observed latencies fell,
+	respectively.
 
 .. option:: significant_figures=int
 
diff --git a/cconv.c b/cconv.c
index 04854b0e..48218dc4 100644
--- a/cconv.c
+++ b/cconv.c
@@ -271,6 +271,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->trim_zero = le32_to_cpu(top->trim_zero);
 	o->clat_percentiles = le32_to_cpu(top->clat_percentiles);
 	o->lat_percentiles = le32_to_cpu(top->lat_percentiles);
+	o->slat_percentiles = le32_to_cpu(top->slat_percentiles);
 	o->percentile_precision = le32_to_cpu(top->percentile_precision);
 	o->sig_figs = le32_to_cpu(top->sig_figs);
 	o->continue_on_error = le32_to_cpu(top->continue_on_error);
@@ -469,6 +470,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->trim_zero = cpu_to_le32(o->trim_zero);
 	top->clat_percentiles = cpu_to_le32(o->clat_percentiles);
 	top->lat_percentiles = cpu_to_le32(o->lat_percentiles);
+	top->slat_percentiles = cpu_to_le32(o->slat_percentiles);
 	top->percentile_precision = cpu_to_le32(o->percentile_precision);
 	top->sig_figs = cpu_to_le32(o->sig_figs);
 	top->continue_on_error = cpu_to_le32(o->continue_on_error);
diff --git a/client.c b/client.c
index 4aed39e7..b7575596 100644
--- a/client.c
+++ b/client.c
@@ -944,7 +944,7 @@ static void convert_io_stat(struct io_stat *dst, struct io_stat *src)
 
 static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 {
-	int i, j;
+	int i, j, k;
 
 	dst->error		= le32_to_cpu(src->error);
 	dst->thread_number	= le32_to_cpu(src->thread_number);
@@ -969,6 +969,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->majf		= le64_to_cpu(src->majf);
 	dst->clat_percentiles	= le32_to_cpu(src->clat_percentiles);
 	dst->lat_percentiles	= le32_to_cpu(src->lat_percentiles);
+	dst->slat_percentiles	= le32_to_cpu(src->slat_percentiles);
 	dst->percentile_precision = le64_to_cpu(src->percentile_precision);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
@@ -991,9 +992,10 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
 		dst->io_u_lat_m[i]	= le64_to_cpu(src->io_u_lat_m[i]);
 
-	for (i = 0; i < DDIR_RWDIR_CNT; i++)
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
-			dst->io_u_plat[i][j] = le64_to_cpu(src->io_u_plat[i][j]);
+	for (i = 0; i < FIO_LAT_CNT; i++)
+		for (j = 0; j < DDIR_RWDIR_CNT; j++)
+			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+				dst->io_u_plat[i][j][k] = le64_to_cpu(src->io_u_plat[i][j][k]);
 
 	for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
 		dst->io_u_sync_plat[j] = le64_to_cpu(src->io_u_sync_plat[j]);
@@ -1035,10 +1037,10 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
 			dst->io_u_plat_high_prio[i][j] = le64_to_cpu(src->io_u_plat_high_prio[i][j]);
-			dst->io_u_plat_prio[i][j] = le64_to_cpu(src->io_u_plat_prio[i][j]);
+			dst->io_u_plat_low_prio[i][j] = le64_to_cpu(src->io_u_plat_low_prio[i][j]);
 		}
 		convert_io_stat(&dst->clat_high_prio_stat[i], &src->clat_high_prio_stat[i]);
-		convert_io_stat(&dst->clat_prio_stat[i], &src->clat_prio_stat[i]);
+		convert_io_stat(&dst->clat_low_prio_stat[i], &src->clat_low_prio_stat[i]);
 	}
 
 	dst->ss_dur		= le64_to_cpu(src->ss_dur);
diff --git a/engines/filestat.c b/engines/filestat.c
index 6c87c4c2..68a340bd 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -19,17 +19,39 @@ struct fc_data {
 
 struct filestat_options {
 	void *pad;
-	unsigned int lstat;
+	unsigned int stat_type;
+};
+
+enum {
+	FIO_FILESTAT_STAT	= 1,
+	FIO_FILESTAT_LSTAT	= 2,
+	/*FIO_FILESTAT_STATX	= 3,*/
 };
 
 static struct fio_option options[] = {
 	{
-		.name	= "lstat",
-		.lname	= "lstat",
-		.type	= FIO_OPT_BOOL,
-		.off1	= offsetof(struct filestat_options, lstat),
-		.help	= "Use lstat(2) to measure lookup/getattr performance",
-		.def	= "0",
+		.name	= "stat_type",
+		.lname	= "stat_type",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct filestat_options, stat_type),
+		.help	= "Specify stat system call type to measure lookup/getattr performance",
+		.def	= "stat",
+		.posval = {
+			  { .ival = "stat",
+			    .oval = FIO_FILESTAT_STAT,
+			    .help = "Use stat(2)",
+			  },
+			  { .ival = "lstat",
+			    .oval = FIO_FILESTAT_LSTAT,
+			    .help = "Use lstat(2)",
+			  },
+			  /*
+			  { .ival = "statx",
+			    .oval = FIO_FILESTAT_STATX,
+			    .help = "Use statx(2)",
+			  },
+			  */
+		},
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_FILESTAT,
 	},
@@ -60,17 +82,24 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 	if (do_lat)
 		fio_gettime(&start, NULL);
 
-	if (o->lstat)
-		ret = lstat(f->file_name, &statbuf);
-	else
+	switch (o->stat_type){
+	case FIO_FILESTAT_STAT:
 		ret = stat(f->file_name, &statbuf);
+		break;
+	case FIO_FILESTAT_LSTAT:
+		ret = lstat(f->file_name, &statbuf);
+		break;
+	default:
+		ret = -1;
+		break;
+	}
 
 	if (ret == -1) {
 		char buf[FIO_VERROR_SIZE];
 		int e = errno;
 
-		snprintf(buf, sizeof(buf), "%sstat(%s)",
-			o->lstat ? "l" : "", f->file_name);
+		snprintf(buf, sizeof(buf), "stat(%s) type=%u", f->file_name,
+			o->stat_type);
 		td_verror(td, e, buf);
 		return 1;
 	}
diff --git a/fio.1 b/fio.1
index a58632b4..1db12c2f 100644
--- a/fio.1
+++ b/fio.1
@@ -2032,8 +2032,9 @@ on the client site it will be used in the rdma_resolve_add()
 function. This can be useful when multiple paths exist between the
 client and the server or in certain loopback configurations.
 .TP
-.BI (filestat)lstat \fR=\fPbool
-Use \fBlstat\fR\|(2) to measure lookup/getattr performance. Default: 0.
+.BI (filestat)stat_type \fR=\fPstr
+Specify stat system call type to measure lookup/getattr performance.
+Default is \fBstat\fR for \fBstat\fR\|(2).
 .TP
 .BI (sg)readfua \fR=\fPbool
 With readfua option set to 1, read operations include the force
@@ -3000,23 +3001,24 @@ Disable measurements of submission latency numbers. See
 Disable measurements of throughput/bandwidth numbers. See
 \fBdisable_lat\fR.
 .TP
+.BI slat_percentiles \fR=\fPbool
+Report submission latency percentiles. Submission latency is not recorded
+for synchronous ioengines.
+.TP
 .BI clat_percentiles \fR=\fPbool
-Enable the reporting of percentiles of completion latencies. This option is
-mutually exclusive with \fBlat_percentiles\fR.
+Report completion latency percentiles.
 .TP
 .BI lat_percentiles \fR=\fPbool
-Enable the reporting of percentiles of I/O latencies. This is similar to
-\fBclat_percentiles\fR, except that this includes the submission latency.
-This option is mutually exclusive with \fBclat_percentiles\fR.
+Report total latency percentiles. Total latency is the sum of submission
+latency and completion latency.
 .TP
 .BI percentile_list \fR=\fPfloat_list
-Overwrite the default list of percentiles for completion latencies and the
-block error histogram. Each number is a floating number in the range
+Overwrite the default list of percentiles for latencies and the
+block error histogram. Each number is a floating point number in the range
 (0,100], and the maximum length of the list is 20. Use ':' to separate the
-numbers, and list the numbers in ascending order. For example,
-`\-\-percentile_list=99.5:99.9' will cause fio to report the values of
-completion latency below which 99.5% and 99.9% of the observed latencies
-fell, respectively.
+numbers. For example, `\-\-percentile_list=99.5:99.9' will cause fio to
+report the latency durations below which 99.5% and 99.9% of the observed
+latencies fell, respectively.
 .TP
 .BI significant_figures \fR=\fPint
 If using \fB\-\-output\-format\fR of `normal', set the significant figures
diff --git a/gclient.c b/gclient.c
index d8dc62d2..d2044f32 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1097,10 +1097,9 @@ static struct graph *setup_clat_graph(char *title, unsigned long long *ovals,
 
 static void gfio_show_clat_percentiles(struct gfio_client *gc,
 				       GtkWidget *vbox, struct thread_stat *ts,
-				       int ddir)
+				       int ddir, uint64_t *io_u_plat,
+				       unsigned long long nr, const char *type)
 {
-	uint64_t *io_u_plat = ts->io_u_plat[ddir];
-	unsigned long long nr = ts->clat_stat[ddir].samples;
 	fio_fp64_t *plist = ts->percentile_list;
 	unsigned int len, scale_down;
 	unsigned long long *ovals, minv, maxv;
@@ -1128,10 +1127,7 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 		base = "nsec";
         }
 
-	if (ts->clat_percentiles)
-		sprintf(tmp, "Completion percentiles (%s)", base);
-	else
-		sprintf(tmp, "Latency percentiles (%s)", base);
+	sprintf(tmp, "%s latency percentiles (%s)", type, base);
 
 	tree_view = gfio_output_clat_percentiles(ovals, plist, len, base, scale_down);
 	ge->clat_graph = setup_clat_graph(tmp, ovals, plist, len, 700.0, 300.0);
@@ -1285,8 +1281,21 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 			gfio_show_lat(vbox, "Total latency", min[2], max[2], mean[2], dev[2]);
 	}
 
-	if (ts->clat_percentiles)
-		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir);
+	if (ts->slat_percentiles && flags & GFIO_SLAT)
+		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+				ts->io_u_plat[FIO_SLAT][ddir],
+				ts->slat_stat[ddir].samples,
+				"Submission");
+	if (ts->clat_percentiles && flags & GFIO_CLAT)
+		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+				ts->io_u_plat[FIO_CLAT][ddir],
+				ts->clat_stat[ddir].samples,
+				"Completion");
+	if (ts->lat_percentiles && flags & GFIO_LAT)
+		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
+				ts->io_u_plat[FIO_LAT][ddir],
+				ts->lat_stat[ddir].samples,
+				"Total");
 
 	free(io_p);
 	free(bw_p);
diff --git a/init.c b/init.c
index dca44bca..b5315334 100644
--- a/init.c
+++ b/init.c
@@ -944,24 +944,12 @@ static int fixup_options(struct thread_data *td)
 		ret |= 1;
 	}
 
-	if (fio_option_is_set(o, clat_percentiles) &&
-	    !fio_option_is_set(o, lat_percentiles)) {
-		o->lat_percentiles = !o->clat_percentiles;
-	} else if (fio_option_is_set(o, lat_percentiles) &&
-		   !fio_option_is_set(o, clat_percentiles)) {
-		o->clat_percentiles = !o->lat_percentiles;
-	} else if (fio_option_is_set(o, lat_percentiles) &&
-		   fio_option_is_set(o, clat_percentiles) &&
-		   o->lat_percentiles && o->clat_percentiles) {
-		log_err("fio: lat_percentiles and clat_percentiles are "
-			"mutually exclusive\n");
-		ret |= 1;
-	}
-
 	if (o->disable_lat)
 		o->lat_percentiles = 0;
 	if (o->disable_clat)
 		o->clat_percentiles = 0;
+	if (o->disable_slat)
+		o->slat_percentiles = 0;
 
 	/*
 	 * Fix these up to be nsec internally
@@ -1509,6 +1497,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 
 	td->ts.clat_percentiles = o->clat_percentiles;
 	td->ts.lat_percentiles = o->lat_percentiles;
+	td->ts.slat_percentiles = o->slat_percentiles;
 	td->ts.percentile_precision = o->percentile_precision;
 	memcpy(td->ts.percentile_list, o->percentile_list, sizeof(o->percentile_list));
 	td->ts.sig_figs = o->sig_figs;
@@ -1520,7 +1509,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td->ts.bw_stat[i].min_val = ULONG_MAX;
 		td->ts.iops_stat[i].min_val = ULONG_MAX;
 		td->ts.clat_high_prio_stat[i].min_val = ULONG_MAX;
-		td->ts.clat_prio_stat[i].min_val = ULONG_MAX;
+		td->ts.clat_low_prio_stat[i].min_val = ULONG_MAX;
 	}
 	td->ts.sync_stat.min_val = ULONG_MAX;
 	td->ddir_seq_nr = o->ddir_seq_nr;
diff --git a/options.c b/options.c
index 287f0435..4714a3a1 100644
--- a/options.c
+++ b/options.c
@@ -1408,13 +1408,20 @@ static int str_gtod_reduce_cb(void *data, int *il)
 	struct thread_data *td = cb_data_to_td(data);
 	int val = *il;
 
-	td->o.disable_lat = !!val;
-	td->o.disable_clat = !!val;
-	td->o.disable_slat = !!val;
-	td->o.disable_bw = !!val;
-	td->o.clat_percentiles = !val;
-	if (val)
+	/*
+	 * Only modify options if gtod_reduce==1
+	 * Otherwise leave settings alone.
+	 */
+	if (val) {
+		td->o.disable_lat = 1;
+		td->o.disable_clat = 1;
+		td->o.disable_slat = 1;
+		td->o.disable_bw = 1;
+		td->o.clat_percentiles = 0;
+		td->o.lat_percentiles = 0;
+		td->o.slat_percentiles = 0;
 		td->ts_cache_mask = 63;
+	}
 
 	return 0;
 }
@@ -4312,7 +4319,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.off1	= offsetof(struct thread_options, clat_percentiles),
 		.help	= "Enable the reporting of completion latency percentiles",
 		.def	= "1",
-		.inverse = "lat_percentiles",
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
 	},
@@ -4323,7 +4329,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.off1	= offsetof(struct thread_options, lat_percentiles),
 		.help	= "Enable the reporting of IO latency percentiles",
 		.def	= "0",
-		.inverse = "clat_percentiles",
+		.category = FIO_OPT_C_STAT,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "slat_percentiles",
+		.lname	= "Submission latency percentiles",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, slat_percentiles),
+		.help	= "Enable the reporting of submission latency percentiles",
+		.def	= "0",
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
 	},
diff --git a/server.c b/server.c
index a5af5e74..248a2d44 100644
--- a/server.c
+++ b/server.c
@@ -1463,7 +1463,7 @@ static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
 void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 {
 	struct cmd_ts_pdu p;
-	int i, j;
+	int i, j, k;
 	void *ss_buf;
 	uint64_t *ss_iops, *ss_bw;
 
@@ -1499,6 +1499,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.majf		= cpu_to_le64(ts->majf);
 	p.ts.clat_percentiles	= cpu_to_le32(ts->clat_percentiles);
 	p.ts.lat_percentiles	= cpu_to_le32(ts->lat_percentiles);
+	p.ts.slat_percentiles	= cpu_to_le32(ts->slat_percentiles);
 	p.ts.percentile_precision = cpu_to_le64(ts->percentile_precision);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
@@ -1521,9 +1522,10 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
 		p.ts.io_u_lat_m[i]	= cpu_to_le64(ts->io_u_lat_m[i]);
 
-	for (i = 0; i < DDIR_RWDIR_CNT; i++)
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
-			p.ts.io_u_plat[i][j] = cpu_to_le64(ts->io_u_plat[i][j]);
+	for (i = 0; i < FIO_LAT_CNT; i++)
+		for (j = 0; j < DDIR_RWDIR_CNT; j++)
+			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+				p.ts.io_u_plat[i][j][k] = cpu_to_le64(ts->io_u_plat[i][j][k]);
 
 	for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
 		p.ts.io_u_sync_plat[j] = cpu_to_le64(ts->io_u_sync_plat[j]);
@@ -1577,10 +1579,10 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
 			p.ts.io_u_plat_high_prio[i][j] = cpu_to_le64(ts->io_u_plat_high_prio[i][j]);
-			p.ts.io_u_plat_prio[i][j] = cpu_to_le64(ts->io_u_plat_prio[i][j]);
+			p.ts.io_u_plat_low_prio[i][j] = cpu_to_le64(ts->io_u_plat_low_prio[i][j]);
 		}
 		convert_io_stat(&p.ts.clat_high_prio_stat[i], &ts->clat_high_prio_stat[i]);
-		convert_io_stat(&p.ts.clat_prio_stat[i], &ts->clat_prio_stat[i]);
+		convert_io_stat(&p.ts.clat_low_prio_stat[i], &ts->clat_low_prio_stat[i]);
 	}
 
 	convert_gs(&p.rs, rs);
diff --git a/server.h b/server.h
index 6ac75366..279b6917 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 81,
+	FIO_SERVER_VER			= 82,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index cc1c360e..69d57b69 100644
--- a/stat.c
+++ b/stat.c
@@ -483,26 +483,38 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
 	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev))
-		display_lat("prio_clat", min, max, mean, dev, out);
+		display_lat(ts->lat_percentiles ? "prio_lat" : "prio_clat",
+				min, max, mean, dev, out);
+
+	if (ts->slat_percentiles && ts->slat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts->io_u_plat[FIO_SLAT][ddir],
+					ts->slat_stat[ddir].samples,
+					ts->percentile_list,
+					ts->percentile_precision, "slat", out);
+	if (ts->clat_percentiles && ts->clat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts->io_u_plat[FIO_CLAT][ddir],
+					ts->clat_stat[ddir].samples,
+					ts->percentile_list,
+					ts->percentile_precision, "clat", out);
+	if (ts->lat_percentiles && ts->lat_stat[ddir].samples > 0)
+		show_clat_percentiles(ts->io_u_plat[FIO_LAT][ddir],
+					ts->lat_stat[ddir].samples,
+					ts->percentile_list,
+					ts->percentile_precision, "lat", out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
-		const char *name = ts->clat_percentiles ? "clat" : " lat";
+		const char *name = ts->lat_percentiles ? "lat" : "clat";
 		char prio_name[32];
 		uint64_t samples;
 
-		if (ts->clat_percentiles)
-			samples = ts->clat_stat[ddir].samples;
-		else
+		if (ts->lat_percentiles)
 			samples = ts->lat_stat[ddir].samples;
-
-		show_clat_percentiles(ts->io_u_plat[ddir],
-					samples,
-					ts->percentile_list,
-					ts->percentile_precision, name, out);
+		else
+			samples = ts->clat_stat[ddir].samples;
 
 		/* Only print this if some high and low priority stats were collected */
 		if (ts->clat_high_prio_stat[ddir].samples > 0 &&
-			ts->clat_prio_stat[ddir].samples > 0)
+			ts->clat_low_prio_stat[ddir].samples > 0)
 		{
 			sprintf(prio_name, "high prio (%.2f%%) %s",
 					100. * (double) ts->clat_high_prio_stat[ddir].samples / (double) samples,
@@ -513,14 +525,15 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 						ts->percentile_precision, prio_name, out);
 
 			sprintf(prio_name, "low prio (%.2f%%) %s",
-					100. * (double) ts->clat_prio_stat[ddir].samples / (double) samples,
+					100. * (double) ts->clat_low_prio_stat[ddir].samples / (double) samples,
 					name);
-			show_clat_percentiles(ts->io_u_plat_prio[ddir],
-						ts->clat_prio_stat[ddir].samples,
+			show_clat_percentiles(ts->io_u_plat_low_prio[ddir],
+						ts->clat_low_prio_stat[ddir].samples,
 						ts->percentile_list,
 						ts->percentile_precision, prio_name, out);
 		}
 	}
+
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
 		const char *bw_str;
@@ -1170,12 +1183,17 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	else
 		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
-	if (ts->clat_percentiles || ts->lat_percentiles) {
-		len = calc_clat_percentiles(ts->io_u_plat[ddir],
+	if (ts->lat_percentiles)
+		len = calc_clat_percentiles(ts->io_u_plat[FIO_LAT][ddir],
+					ts->lat_stat[ddir].samples,
+					ts->percentile_list, &ovals, &maxv,
+					&minv);
+	else if (ts->clat_percentiles)
+		len = calc_clat_percentiles(ts->io_u_plat[FIO_CLAT][ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
 					&minv);
-	} else
+	else
 		len = 0;
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
@@ -1221,18 +1239,63 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	}
 }
 
+static struct json_object *add_ddir_lat_json(struct thread_stat *ts, uint32_t percentiles,
+		struct io_stat *lat_stat, uint64_t *io_u_plat)
+{
+	char buf[120];
+	double mean, dev;
+	unsigned int i, len;
+	struct json_object *lat_object, *percentile_object, *clat_bins_object;
+	unsigned long long min, max, maxv, minv, *ovals = NULL;
+
+	if (!calc_lat(lat_stat, &min, &max, &mean, &dev)) {
+		min = max = 0;
+		mean = dev = 0.0;
+	}
+	lat_object = json_create_object();
+	json_object_add_value_int(lat_object, "min", min);
+	json_object_add_value_int(lat_object, "max", max);
+	json_object_add_value_float(lat_object, "mean", mean);
+	json_object_add_value_float(lat_object, "stddev", dev);
+	json_object_add_value_int(lat_object, "N", lat_stat->samples);
+
+	if (percentiles && lat_stat->samples) {
+		len = calc_clat_percentiles(io_u_plat, lat_stat->samples,
+				ts->percentile_list, &ovals, &maxv, &minv);
+
+		if (len > FIO_IO_U_LIST_MAX_LEN)
+			len = FIO_IO_U_LIST_MAX_LEN;
+
+		percentile_object = json_create_object();
+		json_object_add_value_object(lat_object, "percentile", percentile_object);
+		for (i = 0; i < len; i++) {
+			snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
+			json_object_add_value_int(percentile_object, buf, ovals[i]);
+		}
+		free(ovals);
+
+		if (output_format & FIO_OUTPUT_JSON_PLUS) {
+			clat_bins_object = json_create_object();
+			json_object_add_value_object(lat_object, "bins", clat_bins_object);
+
+			for(i = 0; i < FIO_IO_U_PLAT_NR; i++)
+				if (io_u_plat[i]) {
+					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
+					json_object_add_value_int(clat_bins_object, buf, io_u_plat[i]);
+				}
+		}
+	}
+
+	return lat_object;
+}
+
 static void add_ddir_status_json(struct thread_stat *ts,
 		struct group_run_stats *rs, int ddir, struct json_object *parent)
 {
-	unsigned long long min, max, minv, maxv;
+	unsigned long long min, max;
 	unsigned long long bw_bytes, bw;
-	unsigned long long *ovals = NULL;
 	double mean, dev, iops;
-	unsigned int len;
-	int i;
-	struct json_object *dir_object, *tmp_object, *percentile_object = NULL,
-		*clat_bins_object = NULL;
-	char buf[120];
+	struct json_object *dir_object, *tmp_object;
 	double p_of_agg = 100.0;
 
 	assert(ddir_rw(ddir) || ddir_sync(ddir));
@@ -1266,222 +1329,47 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		json_object_add_value_int(dir_object, "short_ios", ts->short_io_u[ddir]);
 		json_object_add_value_int(dir_object, "drop_ios", ts->drop_io_u[ddir]);
 
-		if (!calc_lat(&ts->slat_stat[ddir], &min, &max, &mean, &dev)) {
-			min = max = 0;
-			mean = dev = 0.0;
-		}
-		tmp_object = json_create_object();
+		tmp_object = add_ddir_lat_json(ts, ts->slat_percentiles,
+				&ts->slat_stat[ddir], ts->io_u_plat[FIO_SLAT][ddir]);
 		json_object_add_value_object(dir_object, "slat_ns", tmp_object);
-		json_object_add_value_int(tmp_object, "min", min);
-		json_object_add_value_int(tmp_object, "max", max);
-		json_object_add_value_float(tmp_object, "mean", mean);
-		json_object_add_value_float(tmp_object, "stddev", dev);
-
-		if (!calc_lat(&ts->clat_stat[ddir], &min, &max, &mean, &dev)) {
-			min = max = 0;
-			mean = dev = 0.0;
-		}
-		tmp_object = json_create_object();
+
+		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles,
+				&ts->clat_stat[ddir], ts->io_u_plat[FIO_CLAT][ddir]);
 		json_object_add_value_object(dir_object, "clat_ns", tmp_object);
-		json_object_add_value_int(tmp_object, "min", min);
-		json_object_add_value_int(tmp_object, "max", max);
-		json_object_add_value_float(tmp_object, "mean", mean);
-		json_object_add_value_float(tmp_object, "stddev", dev);
-	} else {
-		if (!calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
-			min = max = 0;
-			mean = dev = 0.0;
-		}
 
-		tmp_object = json_create_object();
+		tmp_object = add_ddir_lat_json(ts, ts->lat_percentiles,
+				&ts->lat_stat[ddir], ts->io_u_plat[FIO_LAT][ddir]);
 		json_object_add_value_object(dir_object, "lat_ns", tmp_object);
+	} else {
 		json_object_add_value_int(dir_object, "total_ios", ts->total_io_u[DDIR_SYNC]);
-		json_object_add_value_int(tmp_object, "min", min);
-		json_object_add_value_int(tmp_object, "max", max);
-		json_object_add_value_float(tmp_object, "mean", mean);
-		json_object_add_value_float(tmp_object, "stddev", dev);
-	}
-
-	if (ts->clat_percentiles || ts->lat_percentiles) {
-		if (ddir_rw(ddir)) {
-			uint64_t samples;
-
-			if (ts->clat_percentiles)
-				samples = ts->clat_stat[ddir].samples;
-			else
-				samples = ts->lat_stat[ddir].samples;
-
-			len = calc_clat_percentiles(ts->io_u_plat[ddir],
-					samples, ts->percentile_list, &ovals,
-					&maxv, &minv);
-		} else {
-			len = calc_clat_percentiles(ts->io_u_sync_plat,
-					ts->sync_stat.samples,
-					ts->percentile_list, &ovals, &maxv,
-					&minv);
-		}
-
-		if (len > FIO_IO_U_LIST_MAX_LEN)
-			len = FIO_IO_U_LIST_MAX_LEN;
-	} else
-		len = 0;
-
-	if (ts->clat_percentiles) {
-		percentile_object = json_create_object();
-		json_object_add_value_object(tmp_object, "percentile", percentile_object);
-		for (i = 0; i < len; i++) {
-			snprintf(buf, sizeof(buf), "%f",
-				 ts->percentile_list[i].u.f);
-			json_object_add_value_int(percentile_object, buf,
-						  ovals[i]);
-		}
-	}
-
-	free(ovals);
-
-	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->clat_percentiles) {
-		clat_bins_object = json_create_object();
-		json_object_add_value_object(tmp_object, "bins",
-					     clat_bins_object);
-
-		for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
-			if (ddir_rw(ddir)) {
-				if (ts->io_u_plat[ddir][i]) {
-					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
-					json_object_add_value_int(clat_bins_object, buf, ts->io_u_plat[ddir][i]);
-				}
-			} else {
-				if (ts->io_u_sync_plat[i]) {
-					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
-					json_object_add_value_int(clat_bins_object, buf, ts->io_u_sync_plat[i]);
-				}
-			}
-		}
+		tmp_object = add_ddir_lat_json(ts, ts->lat_percentiles | ts->clat_percentiles,
+				&ts->sync_stat, ts->io_u_sync_plat);
+		json_object_add_value_object(dir_object, "lat_ns", tmp_object);
 	}
 
+	if (!ddir_rw(ddir))
+		return;
 
 	/* Only print PRIO latencies if some high priority samples were gathered */
 	if (ts->clat_high_prio_stat[ddir].samples > 0) {
-		/* START OF HIGH PRIO CLAT */
-	    if (!calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
-			min = max = 0;
-			mean = dev = 0.0;
-		}
-		tmp_object = json_create_object();
-		json_object_add_value_object(dir_object, "clat_prio",
-				tmp_object);
-		json_object_add_value_int(tmp_object, "samples",
-				ts->clat_high_prio_stat[ddir].samples);
-		json_object_add_value_int(tmp_object, "min", min);
-		json_object_add_value_int(tmp_object, "max", max);
-		json_object_add_value_float(tmp_object, "mean", mean);
-		json_object_add_value_float(tmp_object, "stddev", dev);
-
-		if (ts->clat_percentiles) {
-			len = calc_clat_percentiles(ts->io_u_plat_high_prio[ddir],
-						ts->clat_high_prio_stat[ddir].samples,
-						ts->percentile_list, &ovals, &maxv,
-						&minv);
-		} else
-			len = 0;
-
-		percentile_object = json_create_object();
-		json_object_add_value_object(tmp_object, "percentile", percentile_object);
-		for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
-			if (i >= len) {
-				json_object_add_value_int(percentile_object, "0.00", 0);
-				continue;
-			}
-			snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
-			json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
-		}
-
-		if (output_format & FIO_OUTPUT_JSON_PLUS) {
-			clat_bins_object = json_create_object();
-			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
-			for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
-				snprintf(buf, sizeof(buf), "%d", i);
-				json_object_add_value_int(clat_bins_object, (const char *)buf,
-						ts->io_u_plat_high_prio[ddir][i]);
-			}
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_BITS",
-					FIO_IO_U_PLAT_BITS);
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_VAL",
-					FIO_IO_U_PLAT_VAL);
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_NR",
-					FIO_IO_U_PLAT_NR);
-		}
-		/* END OF HIGH PRIO CLAT */
-
-		/* START OF PRIO CLAT */
-	    if (!calc_lat(&ts->clat_prio_stat[ddir], &min, &max, &mean, &dev)) {
-			min = max = 0;
-			mean = dev = 0.0;
-		}
-		tmp_object = json_create_object();
-		json_object_add_value_object(dir_object, "clat_low_prio",
-				tmp_object);
-		json_object_add_value_int(tmp_object, "samples",
-				ts->clat_prio_stat[ddir].samples);
-		json_object_add_value_int(tmp_object, "min", min);
-		json_object_add_value_int(tmp_object, "max", max);
-		json_object_add_value_float(tmp_object, "mean", mean);
-		json_object_add_value_float(tmp_object, "stddev", dev);
-
-		if (ts->clat_percentiles) {
-			len = calc_clat_percentiles(ts->io_u_plat_prio[ddir],
-						ts->clat_prio_stat[ddir].samples,
-						ts->percentile_list, &ovals, &maxv,
-						&minv);
-		} else
-			len = 0;
-
-		percentile_object = json_create_object();
-		json_object_add_value_object(tmp_object, "percentile", percentile_object);
-		for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
-			if (i >= len) {
-				json_object_add_value_int(percentile_object, "0.00", 0);
-				continue;
-			}
-			snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
-			json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
-		}
+		const char *high, *low;
 
-		if (output_format & FIO_OUTPUT_JSON_PLUS) {
-			clat_bins_object = json_create_object();
-			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
-			for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
-				snprintf(buf, sizeof(buf), "%d", i);
-				json_object_add_value_int(clat_bins_object, (const char *)buf,
-						ts->io_u_plat_prio[ddir][i]);
-			}
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_BITS",
-					FIO_IO_U_PLAT_BITS);
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_VAL",
-					FIO_IO_U_PLAT_VAL);
-			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_NR",
-					FIO_IO_U_PLAT_NR);
+		if (ts->lat_percentiles) {
+			high = "lat_high_prio";
+			low = "lat_low_prio";
+		} else {
+			high = "clat_high_prio";
+			low = "clat_low_prio";
 		}
-		/* END OF PRIO CLAT */
-	}
 
-	if (!ddir_rw(ddir))
-		return;
+		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
+				&ts->clat_high_prio_stat[ddir], ts->io_u_plat_high_prio[ddir]);
+		json_object_add_value_object(dir_object, high, tmp_object);
 
-	if (!calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev)) {
-		min = max = 0;
-		mean = dev = 0.0;
+		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
+				&ts->clat_low_prio_stat[ddir], ts->io_u_plat_low_prio[ddir]);
+		json_object_add_value_object(dir_object, low, tmp_object);
 	}
-	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "lat_ns", tmp_object);
-	json_object_add_value_int(tmp_object, "min", min);
-	json_object_add_value_int(tmp_object, "max", max);
-	json_object_add_value_float(tmp_object, "mean", mean);
-	json_object_add_value_float(tmp_object, "stddev", dev);
-	if (ts->lat_percentiles)
-		json_object_add_value_object(tmp_object, "percentile", percentile_object);
-	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->lat_percentiles)
-		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
 
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		if (rs->agg[ddir]) {
@@ -1493,6 +1381,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		min = max = 0;
 		p_of_agg = mean = dev = 0.0;
 	}
+
 	json_object_add_value_int(dir_object, "bw_min", min);
 	json_object_add_value_int(dir_object, "bw_max", max);
 	json_object_add_value_float(dir_object, "bw_agg", p_of_agg);
@@ -1981,13 +1870,13 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		      bool first)
 {
-	int l, k;
+	int k, l, m;
 
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (!dst->unified_rw_rep) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
 			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
-			sum_stat(&dst->clat_prio_stat[l], &src->clat_prio_stat[l], first, false);
+			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], first, false);
 			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first, false);
 			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first, false);
 			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first, true);
@@ -1999,8 +1888,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 				dst->runtime[l] = src->runtime[l];
 		} else {
 			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], first, false);
-			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
-			sum_stat(&dst->clat_prio_stat[l], &src->clat_prio_stat[l], first, false);
+			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], first, false);
+			sum_stat(&dst->clat_low_prio_stat[0], &src->clat_low_prio_stat[l], first, false);
 			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first, false);
 			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first, false);
 			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first, true);
@@ -2039,9 +1928,6 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++)
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
 
-	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
-		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
-
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		if (!dst->unified_rw_rep) {
 			dst->total_io_u[k] += src->total_io_u[k];
@@ -2056,18 +1942,25 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 
 	dst->total_io_u[DDIR_SYNC] += src->total_io_u[DDIR_SYNC];
 
-	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
-		int m;
+	for (k = 0; k < FIO_LAT_CNT; k++)
+		for (l = 0; l < DDIR_RWDIR_CNT; l++)
+			for (m = 0; m < FIO_IO_U_PLAT_NR; m++)
+				if (!dst->unified_rw_rep)
+					dst->io_u_plat[k][l][m] += src->io_u_plat[k][l][m];
+				else
+					dst->io_u_plat[k][0][m] += src->io_u_plat[k][l][m];
 
+	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
+
+	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
 			if (!dst->unified_rw_rep) {
-				dst->io_u_plat[k][m] += src->io_u_plat[k][m];
 				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_prio[k][m] += src->io_u_plat_prio[k][m];
+				dst->io_u_plat_low_prio[k][m] += src->io_u_plat_low_prio[k][m];
 			} else {
-				dst->io_u_plat[0][m] += src->io_u_plat[k][m];
 				dst->io_u_plat_high_prio[0][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_prio[0][m] += src->io_u_plat_prio[k][m];
+				dst->io_u_plat_low_prio[0][m] += src->io_u_plat_low_prio[k][m];
 			}
 
 		}
@@ -2103,7 +1996,7 @@ void init_thread_stat(struct thread_stat *ts)
 		ts->bw_stat[j].min_val = -1UL;
 		ts->iops_stat[j].min_val = -1UL;
 		ts->clat_high_prio_stat[j].min_val = -1UL;
-		ts->clat_prio_stat[j].min_val = -1UL;
+		ts->clat_low_prio_stat[j].min_val = -1UL;
 	}
 	ts->sync_stat.min_val = -1UL;
 	ts->groupid = -1;
@@ -2173,6 +2066,7 @@ void __show_run_stats(void)
 
 		ts->clat_percentiles = td->o.clat_percentiles;
 		ts->lat_percentiles = td->o.lat_percentiles;
+		ts->slat_percentiles = td->o.slat_percentiles;
 		ts->percentile_precision = td->o.percentile_precision;
 		memcpy(ts->percentile_list, td->o.percentile_list, sizeof(td->o.percentile_list));
 		opt_lists[j] = &td->opt_list;
@@ -2728,11 +2622,11 @@ static inline void reset_io_stat(struct io_stat *ios)
 void reset_io_stats(struct thread_data *td)
 {
 	struct thread_stat *ts = &td->ts;
-	int i, j;
+	int i, j, k;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		reset_io_stat(&ts->clat_high_prio_stat[i]);
-		reset_io_stat(&ts->clat_prio_stat[i]);
+		reset_io_stat(&ts->clat_low_prio_stat[i]);
 		reset_io_stat(&ts->clat_stat[i]);
 		reset_io_stat(&ts->slat_stat[i]);
 		reset_io_stat(&ts->lat_stat[i]);
@@ -2746,14 +2640,18 @@ void reset_io_stats(struct thread_data *td)
 		ts->drop_io_u[i] = 0;
 
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			ts->io_u_plat[i][j] = 0;
 			ts->io_u_plat_high_prio[i][j] = 0;
-			ts->io_u_plat_prio[i][j] = 0;
+			ts->io_u_plat_low_prio[i][j] = 0;
 			if (!i)
 				ts->io_u_sync_plat[j] = 0;
 		}
 	}
 
+	for (i = 0; i < FIO_LAT_CNT; i++)
+		for (j = 0; j < DDIR_RWDIR_CNT; j++)
+			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+				ts->io_u_plat[i][j][k] = 0;
+
 	ts->total_io_u[DDIR_SYNC] = 0;
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
@@ -2892,19 +2790,27 @@ void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
 	add_stat_sample(&ts->sync_stat, nsec);
 }
 
-static void add_clat_percentile_sample(struct thread_stat *ts,
-				unsigned long long nsec, enum fio_ddir ddir, uint8_t priority_bit)
+static void add_lat_percentile_sample_noprio(struct thread_stat *ts,
+				unsigned long long nsec, enum fio_ddir ddir, enum fio_lat lat)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 	assert(idx < FIO_IO_U_PLAT_NR);
 
-	ts->io_u_plat[ddir][idx]++;
+	ts->io_u_plat[lat][ddir][idx]++;
+}
 
-	if (!priority_bit) {
-		ts->io_u_plat_prio[ddir][idx]++;
-	} else {
+static void add_lat_percentile_sample(struct thread_stat *ts,
+				unsigned long long nsec, enum fio_ddir ddir, uint8_t priority_bit,
+				enum fio_lat lat)
+{
+	unsigned int idx = plat_val_to_idx(nsec);
+
+	add_lat_percentile_sample_noprio(ts, nsec, ddir, lat);
+
+	if (!priority_bit)
+		ts->io_u_plat_low_prio[ddir][idx]++;
+	else
 		ts->io_u_plat_high_prio[ddir][idx]++;
-	}
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
@@ -2921,10 +2827,11 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
-	if (priority_bit) {
-		add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
-	} else {
-		add_stat_sample(&ts->clat_prio_stat[ddir], nsec);
+	if (!ts->lat_percentiles) {
+		if (priority_bit)
+			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
+		else
+			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
 	}
 
 	if (td->clat_log)
@@ -2932,7 +2839,10 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			       offset, priority_bit);
 
 	if (ts->clat_percentiles) {
-		add_clat_percentile_sample(ts, nsec, ddir, priority_bit);
+		if (ts->lat_percentiles)
+			add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_CLAT);
+		else
+			add_lat_percentile_sample(ts, nsec, ddir, priority_bit, FIO_CLAT);
 	}
 
 	if (iolog && iolog->hist_msec) {
@@ -2955,7 +2865,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			 * located in iolog.c after printing this sample to the
 			 * log file.
 			 */
-			io_u_plat = (uint64_t *) td->ts.io_u_plat[ddir];
+			io_u_plat = (uint64_t *) td->ts.io_u_plat[FIO_CLAT][ddir];
 			dst = malloc(sizeof(struct io_u_plat_entry));
 			memcpy(&(dst->io_u_plat), io_u_plat,
 				FIO_IO_U_PLAT_NR * sizeof(uint64_t));
@@ -2978,7 +2888,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
-			unsigned long usec, unsigned long long bs, uint64_t offset,
+			unsigned long long nsec, unsigned long long bs, uint64_t offset,
 			uint8_t priority_bit)
 {
 	const bool needs_lock = td_async_processing(td);
@@ -2990,12 +2900,15 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 	if (needs_lock)
 		__td_io_u_lock(td);
 
-	add_stat_sample(&ts->slat_stat[ddir], usec);
+	add_stat_sample(&ts->slat_stat[ddir], nsec);
 
 	if (td->slat_log)
-		add_log_sample(td, td->slat_log, sample_val(usec), ddir, bs, offset,
+		add_log_sample(td, td->slat_log, sample_val(nsec), ddir, bs, offset,
 			priority_bit);
 
+	if (ts->slat_percentiles)
+		add_lat_percentile_sample_noprio(ts, nsec, ddir, FIO_SLAT);
+
 	if (needs_lock)
 		__td_io_u_unlock(td);
 }
@@ -3019,9 +2932,14 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
 			       offset, priority_bit);
 
-	if (ts->lat_percentiles)
-		add_clat_percentile_sample(ts, nsec, ddir, priority_bit);
+	if (ts->lat_percentiles) {
+		add_lat_percentile_sample(ts, nsec, ddir, priority_bit, FIO_LAT);
+		if (priority_bit)
+			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
+		else
+			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
 
+	}
 	if (needs_lock)
 		__td_io_u_unlock(td);
 }
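For readers following the stat.c changes above, here is a minimal sketch of how samples are routed after this patch. It is not fio code. The io_u_plat histogram gains a leading enum fio_lat dimension. Completion-latency samples feed the high/low priority histograms only when lat_percentiles is off; when it is on, total-latency samples feed them instead. When unified_rw_rep is set, summing folds every direction into slot 0. The struct mini_stat, the on_slat()/on_clat()/on_lat()/sum_plat() helpers, and the tiny array sizes are hypothetical stand-ins for struct thread_stat, add_{slat,clat,lat}_sample() and sum_thread_stats(). Bucket indices are assumed to be computed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, shrunken stand-ins for enum fio_lat and fio's array sizes. */
enum lat_kind { K_SLAT = 0, K_CLAT, K_LAT, K_CNT };
#define NDIR    3	/* read, write, trim */
#define NBUCKET 8

struct mini_stat {
	bool slat_pct, clat_pct, lat_pct;	/* the *_percentiles options */
	bool unified;				/* unified_rw_rep */
	uint64_t plat[K_CNT][NDIR][NBUCKET];	/* like io_u_plat[FIO_LAT_CNT][...] */
	uint64_t plat_high[NDIR][NBUCKET];	/* like io_u_plat_high_prio */
	uint64_t plat_low[NDIR][NBUCKET];	/* like io_u_plat_low_prio */
};

static void mini_stat_init(struct mini_stat *s)
{
	memset(s, 0, sizeof(*s));
}

/* Counterpart of add_lat_percentile_sample_noprio(): one histogram per latency kind. */
static void add_noprio(struct mini_stat *s, int dir, int bucket, enum lat_kind k)
{
	assert(dir >= 0 && dir < NDIR && bucket >= 0 && bucket < NBUCKET);
	s->plat[k][dir][bucket]++;
}

/* Counterpart of add_lat_percentile_sample(): also feeds the priority histograms. */
static void add_prio(struct mini_stat *s, int dir, int bucket, bool high, enum lat_kind k)
{
	add_noprio(s, dir, bucket, k);
	if (high)
		s->plat_high[dir][bucket]++;
	else
		s->plat_low[dir][bucket]++;
}

static void on_slat(struct mini_stat *s, int dir, int bucket)
{
	if (s->slat_pct)
		add_noprio(s, dir, bucket, K_SLAT);
}

/* With lat_percentiles on, the priority histograms are left to on_lat(). */
static void on_clat(struct mini_stat *s, int dir, int bucket, bool high)
{
	if (!s->clat_pct)
		return;
	if (s->lat_pct)
		add_noprio(s, dir, bucket, K_CLAT);
	else
		add_prio(s, dir, bucket, high, K_CLAT);
}

static void on_lat(struct mini_stat *s, int dir, int bucket, bool high)
{
	if (s->lat_pct)
		add_prio(s, dir, bucket, high, K_LAT);
}

/* Like the io_u_plat loop in sum_thread_stats(): unified reporting folds all directions into slot 0. */
static void sum_plat(struct mini_stat *dst, const struct mini_stat *src)
{
	for (int k = 0; k < K_CNT; k++)
		for (int d = 0; d < NDIR; d++)
			for (int b = 0; b < NBUCKET; b++)
				dst->plat[k][dst->unified ? 0 : d][b] += src->plat[k][d][b];
}
```

Keeping the priority split tied to a single latency kind means the per-priority output always describes one measurement. That is why the JSON keys switch between clat_*_prio and lat_*_prio.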
diff --git a/stat.h b/stat.h
index 9320c6bd..0d141666 100644
--- a/stat.h
+++ b/stat.h
@@ -37,7 +37,7 @@ struct group_run_stats {
 					list of percentiles */
 
 /*
- * Aggregate clat samples to report percentile(s) of them.
+ * Aggregate latency samples for reporting percentile(s).
  *
  * EXECUTIVE SUMMARY
  *
@@ -58,7 +58,7 @@ struct group_run_stats {
  *
  * DETAILS
  *
- * Suppose the clat varies from 0 to 999 (usec), the straightforward
+ * Suppose the lat varies from 0 to 999 (usec), the straightforward
  * method is to keep an array of (999 + 1) buckets, in which a counter
  * keeps the count of samples which fall in the bucket, e.g.,
  * {[0],[1],...,[999]}. However this consumes a huge amount of space,
@@ -147,6 +147,14 @@ enum block_info_state {
 #define FIO_JOBDESC_SIZE	256
 #define FIO_VERROR_SIZE		128
 
+enum fio_lat {
+	FIO_SLAT = 0,
+	FIO_CLAT,
+	FIO_LAT,
+
+	FIO_LAT_CNT = 3,
+};
+
 struct thread_stat {
 	char name[FIO_JOBNAME_SIZE];
 	char verror[FIO_VERROR_SIZE];
@@ -181,6 +189,8 @@ struct thread_stat {
 	 */
 	uint32_t clat_percentiles;
 	uint32_t lat_percentiles;
+	uint32_t slat_percentiles;
+	uint32_t pad;
 	uint64_t percentile_precision;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
@@ -190,7 +200,7 @@ struct thread_stat {
 	uint64_t io_u_lat_n[FIO_IO_U_LAT_N_NR];
 	uint64_t io_u_lat_u[FIO_IO_U_LAT_U_NR];
 	uint64_t io_u_lat_m[FIO_IO_U_LAT_M_NR];
-	uint64_t io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
+	uint64_t io_u_plat[FIO_LAT_CNT][DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
 	uint64_t io_u_sync_plat[FIO_IO_U_PLAT_NR];
 
 	uint64_t total_io_u[DDIR_RWDIR_SYNC_CNT];
@@ -240,9 +250,9 @@ struct thread_stat {
 	fio_fp64_t ss_criterion;
 
 	uint64_t io_u_plat_high_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] __attribute__((aligned(8)));;
-	uint64_t io_u_plat_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
+	uint64_t io_u_plat_low_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
 	struct io_stat clat_high_prio_stat[DDIR_RWDIR_CNT] __attribute__((aligned(8)));
-	struct io_stat clat_prio_stat[DDIR_RWDIR_CNT];
+	struct io_stat clat_low_prio_stat[DDIR_RWDIR_CNT];
 
 	union {
 		uint64_t *ss_iops_data;
@@ -331,7 +341,7 @@ extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long lo
 				unsigned long long, uint64_t, uint8_t);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned long long, uint64_t, uint8_t);
-extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
+extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned long long, uint64_t, uint8_t);
 extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long bs,
 				uint8_t priority_bit);
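The stat.h comment describes aggregating latency samples into logarithmic buckets instead of one counter per value. Below is a sketch of that scheme, modeled on fio's plat_val_to_idx()/plat_idx_to_val() but not copied from this patch. Values below 2 * PLAT_VAL get one bucket each. Larger values keep PLAT_BITS bits below the most significant bit, and each bucket reports its midpoint. The proportional error is therefore at most 1/2^(PLAT_BITS+1), which is 1/128 with PLAT_BITS = 6, the bound t/latency_percentiles.py relies on. hist_percentile() returns the bucket holding the sample of rank ceil(p/100 * N), the same rank the test script computes from raw latencies. The PLAT_* constants are assumed values and hist_percentile() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modeled on fio's stat.h defaults. */
#define PLAT_BITS     6
#define PLAT_VAL      (1 << PLAT_BITS)	/* buckets per group */
#define PLAT_GROUP_NR 29
#define PLAT_NR       (PLAT_GROUP_NR * PLAT_VAL)

/* Map a latency value to its bucket index. */
static unsigned int plat_val_to_idx(unsigned long long val)
{
	unsigned int msb = 0, error_bits, base, offset, idx;

	while (msb < 63 && (val >> (msb + 1)))	/* index of the highest set bit */
		msb++;

	/* Small values (val < 2 * PLAT_VAL) get one bucket each. */
	if (msb <= PLAT_BITS)
		return (unsigned int)val;

	error_bits = msb - PLAT_BITS;			/* low bits discarded */
	base = (error_bits + 1) << PLAT_BITS;		/* buckets in earlier groups */
	offset = (PLAT_VAL - 1) & (unsigned int)(val >> error_bits);
	idx = base + offset;
	return idx < PLAT_NR - 1 ? idx : PLAT_NR - 1;	/* clamp to the last bucket */
}

/* Map a bucket index back to the midpoint of the values it covers. */
static unsigned long long plat_idx_to_val(unsigned int idx)
{
	unsigned int error_bits;
	unsigned long long base, k;

	assert(idx < PLAT_NR);
	if (idx < (PLAT_VAL << 1))
		return idx;

	error_bits = (idx >> PLAT_BITS) - 1;		/* always >= 1 here */
	base = 1ULL << (error_bits + PLAT_BITS);	/* smallest value in the group */
	k = idx % PLAT_VAL;				/* bucket within the group */
	return base + (k << error_bits) + (1ULL << (error_bits - 1));
}

/*
 * Value at percentile p (0 < p <= 100) of n samples: the first bucket whose
 * cumulative count reaches p% of n, reported as that bucket's midpoint.
 */
static unsigned long long hist_percentile(const uint64_t *hist, uint64_t n, double p)
{
	uint64_t sum = 0;
	double target = p / 100.0 * (double)n;

	for (unsigned int i = 0; i < PLAT_NR; i++) {
		sum += hist[i];
		if (sum > 0 && (double)sum >= target)
			return plat_idx_to_val(i);
	}
	return 0;
}
```

With the values 1 to 100 loaded one sample each, the 50th percentile comes back as exactly 50, because values below 128 are stored without rounding.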
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
new file mode 100755
index 00000000..0c8d0c19
--- /dev/null
+++ b/t/latency_percentiles.py
@@ -0,0 +1,1324 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2020 Western Digital Corporation or its affiliates.
+#
+"""
+# latency_percentiles.py
+#
+# Test the code that produces latency percentiles
+# This is mostly to test the code changes to allow reporting
+# of slat, clat, and lat percentiles
+#
+# USAGE
+# python3 latency_percentiles.py [-f fio-path] [-a artifact-root] [--debug]
+#
+#
+# Test scenarios:
+#
+# - DONE json
+#   unified rw reporting
+#   compare with latency log
+#   try various combinations of the ?lat_percentile options
+#   null, aio
+#   r, w, t
+# - DONE json+
+#   check presence of latency bins
+#   if the json percentiles match those from the raw data
+#   then the latency bin values and counts are probably ok
+# - DONE terse
+#   produce both terse, JSON output and confirm that they match
+#   lat only; both lat and clat
+# - DONE sync_lat
+#   confirm that sync_lat data appears
+# - MANUAL TESTING normal output:
+#       null ioengine
+#           enable all, but only clat and lat appear
+#           enable subset of latency types
+#           read, write, trim, unified
+#       libaio ioengine
+#           enable all latency types
+#           enable subset of latency types
+#           read, write, trim, unified
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=null --slat_percentiles=1 --clat_percentiles=1 --lat_percentiles=1
+# echo confirm that clat and lat percentiles appear
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=null --slat_percentiles=0 --clat_percentiles=0 --lat_percentiles=1
+# echo confirm that only lat percentiles appear
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=null --slat_percentiles=0 --clat_percentiles=1 --lat_percentiles=0
+# echo confirm that only clat percentiles appear
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=libaio --slat_percentiles=1 --clat_percentiles=1 --lat_percentiles=1
+# echo confirm that slat, clat, lat percentiles appear
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=libaio --slat_percentiles=0 --clat_percentiles=1 --lat_percentiles=1
+# echo confirm that clat and lat percentiles appear
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=libaio --rw=randrw
+# echo confirm that clat percentiles appear for reads and writes
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=libaio --slat_percentiles=1 --clat_percentiles=0 --lat_percentiles=0 --rw=randrw
+# echo confirm that slat percentiles appear for both reads and writes
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=libaio --slat_percentiles=1 --clat_percentiles=1 --lat_percentiles=1 \
+#       --rw=randrw --unified_rw_reporting=1
+# echo confirm that slat, clat, and lat percentiles appear for 'mixed' IOs
+# ./fio/fio --name=test --randrepeat=0 --norandommap --time_based --runtime=2s --size=512M \
+#       --ioengine=null --slat_percentiles=1 --clat_percentiles=1 --lat_percentiles=1 \
+#       --rw=randrw --fsync=32
+# echo confirm that fsync latencies appear
+"""
+
+import os
+import csv
+import sys
+import json
+import math
+import time
+import argparse
+import platform
+import subprocess
+from pathlib import Path
+
+
+class FioLatTest():
+    """fio latency percentile test."""
+
+    def __init__(self, artifact_root, test_options, debug):
+        """
+        artifact_root   root directory for artifacts (subdirectory will be created under here)
+        test_options    test specification
+        debug           True to print additional diagnostic output
+        """
+        self.artifact_root = artifact_root
+        self.test_options = test_options
+        self.debug = debug
+        self.filename = None
+        self.json_data = None
+        self.terse_data = None
+
+        self.test_dir = os.path.join(self.artifact_root,
+                                     "{:03d}".format(self.test_options['test_id']))
+        if not os.path.exists(self.test_dir):
+            os.mkdir(self.test_dir)
+
+        self.filename = "latency{:03d}".format(self.test_options['test_id'])
+
+    def run_fio(self, fio_path):
+        """Run a test."""
+
+        fio_args = [
+            "--name=latency",
+            "--randrepeat=0",
+            "--norandommap",
+            "--time_based",
+            "--size=16M",
+            "--rwmixread=50",
+            "--group_reporting=1",
+            "--write_lat_log={0}".format(self.filename),
+            "--output={0}.out".format(self.filename),
+            "--ioengine={ioengine}".format(**self.test_options),
+            "--rw={rw}".format(**self.test_options),
+            "--runtime={runtime}".format(**self.test_options),
+            "--output-format={output-format}".format(**self.test_options),
+        ]
+        for opt in ['slat_percentiles', 'clat_percentiles', 'lat_percentiles',
+                    'unified_rw_reporting', 'fsync', 'fdatasync', 'numjobs', 'cmdprio_percentage']:
+            if opt in self.test_options:
+                option = '--{0}={{{0}}}'.format(opt)
+                fio_args.append(option.format(**self.test_options))
+
+        command = [fio_path] + fio_args
+        with open(os.path.join(self.test_dir, "{0}.command".format(self.filename)), "w+") as \
+                command_file:
+            command_file.write("%s\n" % command)
+
+        passed = True
+        stdout_file = open(os.path.join(self.test_dir, "{0}.stdout".format(self.filename)), "w+")
+        stderr_file = open(os.path.join(self.test_dir, "{0}.stderr".format(self.filename)), "w+")
+        exitcode_file = open(os.path.join(self.test_dir,
+                                          "{0}.exitcode".format(self.filename)), "w+")
+        try:
+            proc = None
+            # Avoid using subprocess.run() here because when a timeout occurs,
+            # fio will be stopped with SIGKILL. This does not give fio a
+            # chance to clean up and means that child processes may continue
+            # running and submitting IO.
+            proc = subprocess.Popen(command,
+                                    stdout=stdout_file,
+                                    stderr=stderr_file,
+                                    cwd=self.test_dir,
+                                    universal_newlines=True)
+            proc.communicate(timeout=300)
+            exitcode_file.write('{0}\n'.format(proc.returncode))
+            passed &= (proc.returncode == 0)
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            print("Timeout expired")
+            passed = False
+        except Exception:
+            if proc:
+                if not proc.poll():
+                    proc.terminate()
+                    proc.communicate()
+            print("Exception: %s" % sys.exc_info())
+            passed = False
+        finally:
+            stdout_file.close()
+            stderr_file.close()
+            exitcode_file.close()
+
+        if passed:
+            if 'json' in self.test_options['output-format']:
+                if not self.get_json():
+                    print('Unable to decode JSON data')
+                    passed = False
+            if 'terse' in self.test_options['output-format']:
+                if not self.get_terse():
+                    print('Unable to decode terse data')
+                    passed = False
+
+        return passed
+
+    def get_json(self):
+        """Convert fio JSON output into a python JSON object"""
+
+        filename = os.path.join(self.test_dir, "{0}.out".format(self.filename))
+        with open(filename, 'r') as file:
+            file_data = file.read()
+
+        #
+        # Sometimes fio informational messages are included at the top of the
+        # JSON output, especially under Windows. Try to decode output as JSON
+        # data, lopping off up to the first four lines
+        #
+        lines = file_data.splitlines()
+        for i in range(5):
+            file_data = '\n'.join(lines[i:])
+            try:
+                self.json_data = json.loads(file_data)
+            except json.JSONDecodeError:
+                continue
+            else:
+                return True
+
+        return False
+
+    def get_terse(self):
+        """Read fio output and return terse format data."""
+
+        filename = os.path.join(self.test_dir, "{0}.out".format(self.filename))
+        with open(filename, 'r') as file:
+            file_data = file.read()
+
+        #
+        # Read the first few lines and see if any of them begin with '3;fio-'
+        # If so, the line is probably terse output. Obviously, this only
+        # works for fio terse version 3 and it does not work for
+        # multi-line terse output
+        #
+        lines = file_data.splitlines()
+        for file_data in lines[:8]:
+            if file_data.startswith('3;fio-'):
+                self.terse_data = file_data.split(';')
+                return True
+
+        return False
+
+    def check_latencies(self, jsondata, ddir, slat=True, clat=True, tlat=True, plus=False,
+                        unified=False):
+        """Check fio latency data.
+
+        ddir                data direction to check (0=read, 1=write, 2=trim)
+        slat                True if submission latency data available to check
+        clat                True if completion latency data available to check
+        tlat                True if total latency data available to check
+        plus                True if we actually have json+ format data where additional checks can
+                            be carried out
+        unified             True if fio is reporting unified r/w data
+        """
+
+        types = {
+            'slat': slat,
+            'clat': clat,
+            'lat': tlat
+        }
+
+        retval = True
+
+        for lat in ['slat', 'clat', 'lat']:
+            this_iter = True
+            if not types[lat]:
+                if 'percentile' in jsondata[lat+'_ns']:
+                    this_iter = False
+                    print('unexpected %s percentiles found' % lat)
+                else:
+                    print("%s percentiles skipped" % lat)
+                continue
+            else:
+                if 'percentile' not in jsondata[lat+'_ns']:
+                    this_iter = False
+                    print('%s percentiles not found in fio output' % lat)
+
+            #
+            # Check only for the presence/absence of json+
+            # latency bins. Future work can check the
+            # accuracy of the bin values and counts.
+            #
+            # Because the latency percentiles are based on
+            # the bins, we can be confident that the bin
+            # values and counts are correct if fio's
+            # latency percentiles match what we compute
+            # from the raw data.
+            #
+            if plus:
+                if 'bins' not in jsondata[lat+'_ns']:
+                    print('bins not found with json+ output format')
+                    this_iter = False
+                else:
+                    if not self.check_jsonplus(jsondata[lat+'_ns']):
+                        this_iter = False
+            else:
+                if 'bins' in jsondata[lat+'_ns']:
+                    print('json+ bins found with json output format')
+                    this_iter = False
+
+            latencies = []
+            for i in range(10):
+                lat_file = os.path.join(self.test_dir, "%s_%s.%s.log" % (self.filename, lat, i+1))
+                if not os.path.exists(lat_file):
+                    break
+                with open(lat_file, 'r', newline='') as file:
+                    reader = csv.reader(file)
+                    for line in reader:
+                        if unified or int(line[2]) == ddir:
+                            latencies.append(int(line[1]))
+
+            if int(jsondata['total_ios']) != len(latencies):
+                this_iter = False
+                print('%s: total_ios = %s, latencies logged = %d' % \
+                        (lat, jsondata['total_ios'], len(latencies)))
+            elif self.debug:
+                print("total_ios %s match latencies logged" % jsondata['total_ios'])
+
+            latencies.sort()
+            ptiles = jsondata[lat+'_ns']['percentile']
+
+            for percentile in ptiles.keys():
+                #
+                # numpy.percentile(latencies, float(percentile),
+                #       interpolation='higher')
+                # produces values that mostly match what fio reports;
+                # however, in the tails of the distribution, the values produced
+                # by fio's and numpy.percentile's algorithms are occasionally off
+                # by one latency measurement. So instead of relying on the canned
+                # numpy.percentile routine, implement fio's algorithm here.
+                #
+                rank = math.ceil(float(percentile)/100 * len(latencies))
+                if rank > 0:
+                    index = rank - 1
+                else:
+                    index = 0
+                value = latencies[int(index)]
+                fio_val = int(ptiles[percentile])
+                # The theory in stat.h says that the proportional error will be
+                # less than 1/128
+                if not self.similar(fio_val, value):
+                    delta = abs(fio_val - value) / value
+                    print("Error with %s %sth percentile: "
+                          "fio: %d, expected: %d, proportional delta: %f" %
+                          (lat, percentile, fio_val, value, delta))
+                    print("Rank: %d, index: %d" % (rank, index))
+                    this_iter = False
+                elif self.debug:
+                    print('%s %sth percentile values match: %d, %d' %
+                          (lat, percentile, fio_val, value))
+
+            if this_iter:
+                print("%s percentiles match" % lat)
+            else:
+                retval = False
+
+        return retval
+
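The percentile check above reads each `<filename>_<lat>.N.log` file and keeps the column-1 latency of every row whose column-2 direction matches (or every row, for unified reporting). It then sorts the values and picks a nearest-rank element. A minimal standalone sketch of those two steps follows. The helper names `read_latencies` and `fio_percentile` and the sample log text are hypothetical, and the column layout is assumed only from how the test indexes `line[1]` and `line[2]`.

```python
import csv
import io
import math


def read_latencies(log_text, ddir, unified=False):
    """Return latency values (column 1) for rows whose direction (column 2) matches."""
    latencies = []
    for row in csv.reader(io.StringIO(log_text)):
        if unified or int(row[2]) == ddir:
            latencies.append(int(row[1]))
    return latencies


def fio_percentile(sorted_values, percentile):
    """Nearest-rank percentile: element at ceil(p/100 * n) - 1, clamped at 0."""
    rank = math.ceil(float(percentile) / 100 * len(sorted_values))
    index = rank - 1 if rank > 0 else 0
    return sorted_values[index]


# Hypothetical log excerpt: time, latency (ns), direction (0=read, 1=write), block size
LOG = """1, 1200, 0, 4096
2, 900, 1, 4096
3, 1500, 0, 4096
4, 1100, 0, 4096
5, 3000, 0, 4096
"""

reads = sorted(read_latencies(LOG, ddir=0))
print(reads)                               # [1100, 1200, 1500, 3000]
print(fio_percentile(reads, "50.000000"))  # 1200
print(fio_percentile(reads, "99.000000"))  # 3000
```

Because the index is always a valid position in the sorted sample, the result is one of the recorded latencies rather than an interpolated value.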
+    @staticmethod
+    def check_empty(job):
+        """
+        Make sure JSON data is empty.
+
+        Some data structures should be empty. This function makes sure that they are.
+
+        job         JSON object that we need to check for emptiness
+        """
+
+        return job['total_ios'] == 0 and \
+                job['slat_ns']['N'] == 0 and \
+                job['clat_ns']['N'] == 0 and \
+                job['lat_ns']['N'] == 0
+
+    def check_nocmdprio_lat(self, job):
+        """
+        Make sure no high/low priority latencies appear.
+
+        job         JSON object to check
+        """
+
+        for ddir in ['read', 'write', 'trim']:
+            if ddir in job:
+                if 'lat_high_prio' in job[ddir] or 'lat_low_prio' in job[ddir] or \
+                    'clat_high_prio' in job[ddir] or 'clat_low_prio' in job[ddir]:
+                    print("Unexpected high/low priority latencies found in %s output" % ddir)
+                    return False
+
+        if self.debug:
+            print("No high/low priority latencies found")
+
+        return True
+
+    @staticmethod
+    def similar(approximation, actual):
+        """
+        Check whether the approximate values recorded by fio are within the theoretical bound.
+
+        Since it is impractical to store exact latency measurements for each and every IO, fio
+        groups similar latency measurements into variable-sized bins. The theory in stat.h says
+        that the proportional error will be less than 1/128. This function checks whether this
+        is true.
+
+        TODO This test will fail when comparing a value from the largest latency bin against its
+        actual measurement. Find some way to detect this and avoid failing.
+
+        approximation   value of the bin used by fio to store a given latency
+        actual          actual latency value
+        """
+        delta = abs(approximation - actual) / actual
+        return delta <= 1/128
+
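`similar()` applies a relative bound, so the absolute difference it tolerates grows with the latency. At 1000 ns, for example, a bin value may be off by at most 1000/128 = 7.8125 ns. The sketch below restates the check and adds a guard for `actual == 0`, which the original would turn into a `ZeroDivisionError`. That guard, and its choice to accept only an exact zero, are assumptions rather than part of fio's test.

```python
def similar(approximation, actual, bound=1 / 128):
    """Return True if approximation is within a relative error `bound` of actual."""
    if actual == 0:
        # Assumption: a zero measurement only matches an exact zero; the original
        # check would raise ZeroDivisionError here.
        return approximation == 0
    return abs(approximation - actual) / actual <= bound


print(similar(1007, 1000))  # True  (0.007 <= 0.0078125)
print(similar(1008, 1000))  # False (0.008 >  0.0078125)
print(similar(1032, 1024))  # True  (exactly 8/1024 == 1/128)
```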
+    def check_jsonplus(self, jsondata):
+        """Check consistency of json+ data
+
+        When we have json+ data we can check the min value, max value, and
+        sample size reported by fio
+
+        jsondata            json+ data that we need to check
+        """
+
+        retval = True
+
+        keys = [int(k) for k in jsondata['bins'].keys()]
+        values = [int(jsondata['bins'][k]) for k in jsondata['bins'].keys()]
+        smallest = min(keys)
+        biggest = max(keys)
+        sampsize = sum(values)
+
+        if not self.similar(jsondata['min'], smallest):
+            retval = False
+            print('reported min %d does not match json+ min %d' % (jsondata['min'], smallest))
+        elif self.debug:
+            print('json+ min values match: %d' % jsondata['min'])
+
+        if not self.similar(jsondata['max'], biggest):
+            retval = False
+            print('reported max %d does not match json+ max %d' % (jsondata['max'], biggest))
+        elif self.debug:
+            print('json+ max values match: %d' % jsondata['max'])
+
+        if sampsize != jsondata['N']:
+            retval = False
+            print('reported sample size %d does not match json+ total count %d' % \
+                    (jsondata['N'], sampsize))
+        elif self.debug:
+            print('json+ sample sizes match: %d' % sampsize)
+
+        return retval
+
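In json+ output, each histogram is a `bins` object whose keys are bin values and whose values are sample counts. Because JSON object keys are always strings, `check_jsonplus()` converts them with `int()` before taking the min and max. The sketch below performs the same reduction to (min, max, N) on a made-up `bins` dictionary.

```python
def summarize_bins(bins):
    """Reduce a json+ 'bins' mapping (str bin value -> count) to (min, max, N)."""
    values = [int(k) for k in bins]
    counts = [int(c) for c in bins.values()]
    return min(values), max(values), sum(counts)


# Hypothetical json+ histogram for one latency type
bins = {"1008": 3, "2016": 10, "65536": 1}
print(summarize_bins(bins))  # (1008, 65536, 14)
```

The test compares min and max through `similar()` rather than `==`, presumably because the reported `min`/`max` are exact measurements while bin keys are bucket approximations. The sample count, by contrast, must match exactly.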
+    def check_sync_lat(self, jsondata, plus=False):
+        """Check fsync latency percentile data.
+
+        All we can check is that some percentiles are reported, unless we have json+ data.
+        If we actually have json+ data then we can do more checking.
+
+        jsondata        JSON data for fsync operations
+        plus            True if we actually have json+ data
+        """
+        retval = True
+
+        if 'percentile' not in jsondata['lat_ns']:
+            print("Sync percentile data not found")
+            return False
+
+        if int(jsondata['total_ios']) != int(jsondata['lat_ns']['N']):
+            retval = False
+            print('Mismatch between total_ios and lat_ns sample size')
+        elif self.debug:
+            print('sync sample sizes match: %d' % jsondata['total_ios'])
+
+        if not plus:
+            if 'bins' in jsondata['lat_ns']:
+                print('Unexpected json+ bin data found')
+                return False
+        elif not self.check_jsonplus(jsondata['lat_ns']):
+            # json+ bins exist (and can be checked) only when plus is True
+            retval = False
+
+        return retval
+
+    def check_terse(self, terse, jsondata):
+        """Compare terse latencies with JSON latencies.
+
+        terse           terse format data for checking
+        jsondata        JSON format data for checking
+        """
+
+        retval = True
+
+        for lat in terse:
+            split = lat.split('%')
+            pct = split[0]
+            terse_val = int(split[1][1:])
+            json_val = math.floor(jsondata[pct]/1000)
+            if terse_val != json_val:
+                retval = False
+                print('Mismatch with %sth percentile: json value=%d,%d terse value=%d' % \
+                        (pct, jsondata[pct], json_val, terse_val))
+            elif self.debug:
+                print('Terse %sth percentile matches JSON value: %d' % (pct, terse_val))
+
+        return retval
+
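`check_terse()` treats each terse percentile field as `<percentile>%=<value>` and compares the value with the nanosecond JSON percentile floored to whole microseconds. The sketch below isolates that parsing and comparison. The field shape is inferred from the test's `split('%')` and `[1:]` slicing, and the sample numbers are invented.

```python
import math


def parse_terse_percentile(field):
    """Split a field like '99.000000%=2048' into ('99.000000', 2048)."""
    pct, value = field.split('%=')
    return pct, int(value)


def terse_matches_json(field, json_percentiles):
    """Compare a terse percentile (usec) with the JSON percentile (nsec)."""
    pct, terse_usec = parse_terse_percentile(field)
    # Mirror the test: floor the nanosecond value to whole microseconds.
    return terse_usec == math.floor(json_percentiles[pct] / 1000)


json_pcts = {"99.000000": 2048511}
print(terse_matches_json("99.000000%=2048", json_pcts))  # True
print(terse_matches_json("99.000000%=2049", json_pcts))  # False
```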
+    def check_prio_latencies(self, jsondata, clat=True, plus=False):
+        """Check consistency of high/low priority latencies.
+
+        jsondata            JSON data for one data direction
+        clat                True if we should check clat data; otherwise check lat data
+        plus                True if we have json+ format data where additional checks can
+                            be carried out
+        """
+
+        if clat:
+            high = 'clat_high_prio'
+            low = 'clat_low_prio'
+            combined = 'clat_ns'
+        else:
+            high = 'lat_high_prio'
+            low = 'lat_low_prio'
+            combined = 'lat_ns'
+
+        if high not in jsondata or low not in jsondata or combined not in jsondata:
+            print("Error identifying high/low priority latencies")
+            return False
+
+        if jsondata[high]['N'] + jsondata[low]['N'] != jsondata[combined]['N']:
+            print("High %d + low %d != combined sample size %d" % \
+                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+            return False
+        elif self.debug:
+            print("High %d + low %d == combined sample size %d" % \
+                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+
+        if min(jsondata[high]['min'], jsondata[low]['min']) != jsondata[combined]['min']:
+            print("Min of high %d, low %d min latencies does not match min %d from combined data" % \
+                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+            return False
+        elif self.debug:
+            print("Min of high %d, low %d min latencies matches min %d from combined data" % \
+                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+
+        if max(jsondata[high]['max'], jsondata[low]['max']) != jsondata[combined]['max']:
+            print("Max of high %d, low %d max latencies does not match max %d from combined data" % \
+                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+            return False
+        elif self.debug:
+            print("Max of high %d, low %d max latencies matches max %d from combined data" % \
+                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+
+        weighted_avg = (jsondata[high]['mean'] * jsondata[high]['N'] + \
+                        jsondata[low]['mean'] * jsondata[low]['N']) / jsondata[combined]['N']
+        delta = abs(weighted_avg - jsondata[combined]['mean'])
+        if (delta / jsondata[combined]['mean']) > 0.0001:
+            print("Difference between weighted average %f of high, low means "
+                  "and actual mean %f exceeds 0.01%%" % (weighted_avg, jsondata[combined]['mean']))
+            return False
+        elif self.debug:
+            print("Weighted average %f of high, low means matches actual mean %f" % \
+                    (weighted_avg, jsondata[combined]['mean']))
+
+        if plus:
+            if not self.check_jsonplus(jsondata[high]):
+                return False
+            if not self.check_jsonplus(jsondata[low]):
+                return False
+
+            bins = {**jsondata[high]['bins'], **jsondata[low]['bins']}
+            for duration in bins.keys():
+                if duration in jsondata[high]['bins'] and duration in jsondata[low]['bins']:
+                    bins[duration] = jsondata[high]['bins'][duration] + \
+                            jsondata[low]['bins'][duration]
+
+            if len(bins) != len(jsondata[combined]['bins']):
+                print("Number of combined high/low bins does not match number of overall bins")
+                return False
+            elif self.debug:
+                print("Number of bins from merged high/low data matches number of overall bins")
+
+            for duration in bins.keys():
+                if bins[duration] != jsondata[combined]['bins'].get(duration):
+                    print("Merged high/low count does not match overall count for duration %s" \
+                            % duration)
+                    return False
+
+        print("Merged high/low priority latency data match combined latency data")
+        return True
+
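The json+ branch of `check_prio_latencies()` merges the high- and low-priority bins and adds the counts of any duration present in both. It then compares the result against the combined histogram. The mean check relies on the combined mean being the sample-weighted average of the two priority means. The sketch below does both, using `collections.Counter.update()` (which adds counts) in place of the dictionary-unpacking loop. The input dictionaries are hypothetical but internally consistent.

```python
from collections import Counter


def merge_prio_bins(high_bins, low_bins):
    """Sum per-duration counts from the high- and low-priority json+ bins."""
    merged = Counter(high_bins)
    merged.update(low_bins)  # update() adds counts instead of replacing them
    return dict(merged)


def weighted_mean(high, low):
    """Sample-weighted mean of two latency summaries with 'N' and 'mean' keys."""
    total = high['N'] + low['N']
    return (high['mean'] * high['N'] + low['mean'] * low['N']) / total


high = {"N": 2, "mean": 100.0, "bins": {"100": 2}}
low = {"N": 6, "mean": 200.0, "bins": {"100": 2, "250": 4}}
combined_bins = {"100": 4, "250": 4}

print(merge_prio_bins(high["bins"], low["bins"]) == combined_bins)  # True
print(weighted_mean(high, low))  # 175.0
```

A single `==` between the merged dictionary and the combined `bins` object covers both the bin-count check and the per-duration check.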
+    def check(self):
+        """Check test output."""
+
+        raise NotImplementedError()
+
+
+class Test001(FioLatTest):
+    """Test object for Test 1."""
+
+    def check(self):
+        """Check Test 1 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False)
+
+        return retval
+
+
+class Test002(FioLatTest):
+    """Test object for Test 2."""
+
+    def check(self):
+        """Check Test 2 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['read']):
+            print("Unexpected read data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['write'], 1, slat=False, clat=False)
+
+        return retval
+
+
+class Test003(FioLatTest):
+    """Test object for Test 3."""
+
+    def check(self):
+        """Check Test 3 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['read']):
+            print("Unexpected read data found in output")
+            retval = False
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['trim'], 2, slat=False, tlat=False)
+
+        return retval
+
+
+class Test004(FioLatTest):
+    """Test object for Tests 4, 13."""
+
+    def check(self):
+        """Check Test 4, 13 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, plus=True)
+
+        return retval
+
+
+class Test005(FioLatTest):
+    """Test object for Test 5."""
+
+    def check(self):
+        """Check Test 5 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['read']):
+            print("Unexpected read data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
+
+        return retval
+
+
+class Test006(FioLatTest):
+    """Test object for Test 6."""
+
+    def check(self):
+        """Check Test 6 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
+
+        return retval
+
+
+class Test007(FioLatTest):
+    """Test object for Test 7."""
+
+    def check(self):
+        """Check Test 7 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, clat=False, tlat=False, plus=True)
+        retval &= self.check_latencies(job['write'], 1, clat=False, tlat=False, plus=True)
+
+        return retval
+
+
+class Test008(FioLatTest):
+    """Test object for Tests 8, 14."""
+
+    def check(self):
+        """Check Test 8, 14 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if 'read' in job or 'write' in job or 'trim' in job:
+            print("Unexpected data direction found in fio output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['mixed'], 0, plus=True, unified=True)
+
+        return retval
+
+
+class Test009(FioLatTest):
+    """Test object for Test 9."""
+
+    def check(self):
+        """Check Test 9 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['read']):
+            print("Unexpected read data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_sync_lat(job['sync'], plus=True):
+            print("Error checking fsync latency data")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
+
+        return retval
+
+
+class Test010(FioLatTest):
+    """Test object for Test 10."""
+
+    def check(self):
+        """Check Test 10 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, plus=True)
+        retval &= self.check_latencies(job['write'], 1, plus=True)
+        retval &= self.check_terse(self.terse_data[17:34], job['read']['lat_ns']['percentile'])
+        retval &= self.check_terse(self.terse_data[58:75], job['write']['lat_ns']['percentile'])
+        # Terse data checking only works for default percentiles.
+        # This needs to be changed if something other than the default is ever used.
+
+        return retval
+
+
+class Test011(FioLatTest):
+    """Test object for Test 11."""
+
+    def check(self):
+        """Check Test 11 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+        if not self.check_nocmdprio_lat(job):
+            print("Unexpected high/low priority latencies found")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False, clat=False, plus=True)
+        retval &= self.check_latencies(job['write'], 1, slat=False, clat=False, plus=True)
+        retval &= self.check_terse(self.terse_data[17:34], job['read']['lat_ns']['percentile'])
+        retval &= self.check_terse(self.terse_data[58:75], job['write']['lat_ns']['percentile'])
+        # Terse data checking only works for default percentiles.
+        # This needs to be changed if something other than the default is ever used.
+
+        return retval
+
+
+class Test015(FioLatTest):
+    """Test object for Test 15."""
+
+    def check(self):
+        """Check Test 15 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, plus=True)
+        retval &= self.check_prio_latencies(job['read'], clat=False, plus=True)
+
+        return retval
+
+
+class Test016(FioLatTest):
+    """Test object for Test 16."""
+
+    def check(self):
+        """Check Test 16 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['read']):
+            print("Unexpected read data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
+        retval &= self.check_prio_latencies(job['write'], clat=False, plus=True)
+
+        return retval
+
+
+class Test017(FioLatTest):
+    """Test object for Test 17."""
+
+    def check(self):
+        """Check Test 17 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['write']):
+            print("Unexpected write data found in output")
+            retval = False
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
+        retval &= self.check_prio_latencies(job['read'], plus=True)
+
+        return retval
+
+
+class Test018(FioLatTest):
+    """Test object for Test 18."""
+
+    def check(self):
+        """Check Test 18 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, clat=False, tlat=False, plus=True)
+        retval &= self.check_latencies(job['write'], 1, clat=False, tlat=False, plus=True)
+
+        # We actually have json+ data, but setting plus=False below avoids checking the
+        # json+ bins, which do not exist for clat and lat because this job is run with
+        # clat_percentiles=0 and lat_percentiles=0. However, we can still check the summary
+        # statistics.
+        retval &= self.check_prio_latencies(job['write'], plus=False)
+        retval &= self.check_prio_latencies(job['read'], plus=False)
+
+        return retval
+
+
+class Test019(FioLatTest):
+    """Test object for Tests 19, 20."""
+
+    def check(self):
+        """Check Test 19, 20 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if 'read' in job or 'write' in job or 'trim' in job:
+            print("Unexpected data direction found in fio output")
+            retval = False
+
+        retval &= self.check_latencies(job['mixed'], 0, plus=True, unified=True)
+        retval &= self.check_prio_latencies(job['mixed'], clat=False, plus=True)
+
+        return retval
+
+
+def parse_args():
+    """Parse command-line arguments."""
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio', help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root', help='artifact root directory')
+    parser.add_argument('-d', '--debug', help='enable debug output', action='store_true')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    args = parser.parse_args()
+
+    return args
+
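Both `--skip` and `--run-only` use `nargs='+'` with `type=int`, so each yields a list of integers, or `None` when the option is omitted. `main()` skips a test if its id appears in `--skip` or is missing from `--run-only`. The sketch below shows that decision with a hypothetical `should_skip()` helper and an explicit argument list.

```python
import argparse


def build_parser():
    """Subset of the test's parser: only the test-selection options."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', '--skip', nargs='+', type=int,
                        help='list of test(s) to skip')
    parser.add_argument('-o', '--run-only', nargs='+', type=int,
                        help='list of test(s) to run, skipping all others')
    return parser


def should_skip(test_id, args):
    """Hypothetical helper mirroring the user-request skip condition in main()."""
    return bool((args.skip and test_id in args.skip) or
                (args.run_only and test_id not in args.run_only))


args = build_parser().parse_args(['--run-only', '1', '4', '10'])
print(args.run_only)                                         # [1, 4, 10]
print([t for t in range(1, 6) if not should_skip(t, args)])  # [1, 4]
```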
+
+def main():
+    """Run tests of fio latency percentile reporting"""
+
+    args = parse_args()
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        "latency-test-{0}".format(time.strftime("%Y%m%d-%H%M%S"))
+    os.mkdir(artifact_root)
+    print("Artifact directory is %s" % artifact_root)
+
+    if args.fio:
+        fio = str(Path(args.fio).absolute())
+    else:
+        fio = 'fio'
+    print("fio path is %s" % fio)
+
+    if platform.system() == 'Linux':
+        aio = 'libaio'
+    elif platform.system() == 'Windows':
+        aio = 'windowsaio'
+    else:
+        aio = 'posixaio'
+
+    test_list = [
+        {
+            # randread, null
+            # enable slat, clat, lat
+            # only clat and lat will appear
+            # because the null ioengine is synchronous
+            "test_id": 1,
+            "runtime": 2,
+            "output-format": "json",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": 'null',
+            'rw': 'randread',
+            "test_obj": Test001,
+        },
+        {
+            # randwrite, null
+            # enable lat only
+            "test_id": 2,
+            "runtime": 2,
+            "output-format": "json",
+            "slat_percentiles": 0,
+            "clat_percentiles": 0,
+            "lat_percentiles": 1,
+            "ioengine": 'null',
+            'rw': 'randwrite',
+            "test_obj": Test002,
+        },
+        {
+            # randtrim, null
+            # enable clat only
+            "test_id": 3,
+            "runtime": 2,
+            "output-format": "json",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 0,
+            "ioengine": 'null',
+            'rw': 'randtrim',
+            "test_obj": Test003,
+        },
+        {
+            # randread, aio
+            # enable slat, clat, lat
+            # all will appear because libaio is asynchronous
+            "test_id": 4,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randread',
+            "test_obj": Test004,
+        },
+        {
+            # randwrite, aio
+            # enable only clat, lat
+            "test_id": 5,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randwrite',
+            "test_obj": Test005,
+        },
+        {
+            # randread, aio
+            # by default only clat should appear
+            "test_id": 6,
+            "runtime": 5,
+            "output-format": "json+",
+            "ioengine": aio,
+            'rw': 'randread',
+            "test_obj": Test006,
+        },
+        {
+            # 50/50 r/w, aio
+            # enable only slat
+            "test_id": 7,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 0,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            "test_obj": Test007,
+        },
+        {
+            # 50/50 r/w, aio, unified_rw_reporting
+            # enable slat, clat, lat
+            "test_id": 8,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'unified_rw_reporting': 1,
+            "test_obj": Test008,
+        },
+        {
+            # randwrite, null
+            # enable slat, clat, lat
+            # fsync
+            "test_id": 9,
+            "runtime": 2,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": 'null',
+            'rw': 'randwrite',
+            'fsync': 32,
+            "test_obj": Test009,
+        },
+        {
+            # 50/50 r/w, aio
+            # enable slat, clat, lat
+            "test_id": 10,
+            "runtime": 5,
+            "output-format": "terse,json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            "test_obj": Test010,
+        },
+        {
+            # 50/50 r/w, aio
+            # enable only lat
+            "test_id": 11,
+            "runtime": 5,
+            "output-format": "terse,json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 0,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            "test_obj": Test011,
+        },
+        {
+            # randread, null
+            # enable slat, clat, lat
+            # only clat and lat will appear
+            # because the null ioengine is synchronous
+            # same as Test 1 except
+            # numjobs = 4 to test sum_thread_stats() changes
+            "test_id": 12,
+            "runtime": 2,
+            "output-format": "json",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": 'null',
+            'rw': 'randread',
+            'numjobs': 4,
+            "test_obj": Test001,
+        },
+        {
+            # randread, aio
+            # enable slat, clat, lat
+            # all will appear because libaio is asynchronous
+            # same as Test 4 except
+            # numjobs = 4 to test sum_thread_stats() changes
+            "test_id": 13,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randread',
+            'numjobs': 4,
+            "test_obj": Test004,
+        },
+        {
+            # 50/50 r/w, aio, unified_rw_reporting
+            # enable slat, clat, lat
+            # same as Test 8 except
+            # numjobs = 4 to test sum_thread_stats() changes
+            "test_id": 14,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'unified_rw_reporting': 1,
+            'numjobs': 4,
+            "test_obj": Test008,
+        },
+        {
+            # randread, aio
+            # enable slat, clat, lat
+            # all will appear because libaio is asynchronous
+            # same as Test 4 except add cmdprio_percentage
+            "test_id": 15,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randread',
+            'cmdprio_percentage': 50,
+            "test_obj": Test015,
+        },
+        {
+            # randwrite, aio
+            # enable only clat, lat
+            # same as Test 5 except add cmdprio_percentage
+            "test_id": 16,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randwrite',
+            'cmdprio_percentage': 50,
+            "test_obj": Test016,
+        },
+        {
+            # randread, aio
+            # by default only clat should appear
+            # same as Test 6 except add cmdprio_percentage
+            "test_id": 17,
+            "runtime": 5,
+            "output-format": "json+",
+            "ioengine": aio,
+            'rw': 'randread',
+            'cmdprio_percentage': 50,
+            "test_obj": Test017,
+        },
+        {
+            # 50/50 r/w, aio
+            # enable only slat
+            # same as Test 7 except add cmdprio_percentage
+            "test_id": 18,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 0,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'cmdprio_percentage': 50,
+            "test_obj": Test018,
+        },
+        {
+            # 50/50 r/w, aio, unified_rw_reporting
+            # enable slat, clat, lat
+            # same as Test 8 except add cmdprio_percentage
+            "test_id": 19,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'unified_rw_reporting': 1,
+            'cmdprio_percentage': 50,
+            "test_obj": Test019,
+        },
+        {
+            # 50/50 r/w, aio, unified_rw_reporting
+            # enable slat, clat, lat
+            # same as Test 19 except
+            # add numjobs = 4 to test sum_thread_stats() changes
+            "test_id": 20,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 1,
+            "clat_percentiles": 1,
+            "lat_percentiles": 1,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'unified_rw_reporting': 1,
+            'cmdprio_percentage': 50,
+            'numjobs': 4,
+            "test_obj": Test019,
+        },
+    ]
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for test in test_list:
+        if (args.skip and test['test_id'] in args.skip) or \
+           (args.run_only and test['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            outcome = 'SKIPPED (User request)'
+        elif platform.system() != 'Linux' and 'cmdprio_percentage' in test:
+            skipped = skipped + 1
+            outcome = 'SKIPPED (Linux required for cmdprio_percentage tests)'
+        else:
+            test_obj = test['test_obj'](artifact_root, test, args.debug)
+            status = test_obj.run_fio(fio)
+            if status:
+                status = test_obj.check()
+            if status:
+                passed = passed + 1
+                outcome = 'PASSED'
+            else:
+                failed = failed + 1
+                outcome = 'FAILED'
+
+        print("**********Test {0} {1}**********".format(test['test_id'], outcome))
+
+    print("{0} tests passed, {1} failed, {2} skipped".format(passed, failed, skipped))
+
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 3d236e37..003ff664 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -510,17 +510,17 @@ class Requirements(object):
 SUCCESS_DEFAULT = {
         'zero_return': True,
         'stderr_empty': True,
-        'timeout': 300,
+        'timeout': 600,
         }
 SUCCESS_NONZERO = {
         'zero_return': False,
         'stderr_empty': False,
-        'timeout': 300,
+        'timeout': 600,
         }
 SUCCESS_STDERR = {
         'zero_return': True,
         'stderr_empty': False,
-        'timeout': 300,
+        'timeout': 600,
         }
 TEST_LIST = [
         {
@@ -712,6 +712,14 @@ TEST_LIST = [
             'success':          SUCCESS_DEFAULT,
             'requirements':     [Requirements.unittests],
         },
+        {
+            'test_id':          1010,
+            'test_class':       FioExeTest,
+            'exe':              't/latency_percentiles.py',
+            'parameters':       ['-f', '{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
+        },
 ]
 
 
diff --git a/thread_options.h b/thread_options.h
index 4b131bda..c78ed43d 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -250,6 +250,7 @@ struct thread_options {
 	unsigned int trim_zero;
 	unsigned long long trim_backlog;
 	unsigned int clat_percentiles;
+	unsigned int slat_percentiles;
 	unsigned int lat_percentiles;
 	unsigned int percentile_precision;	/* digits after decimal for percentiles */
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
@@ -370,7 +371,7 @@ struct thread_options_pack {
 	uint32_t iodepth_batch_complete_min;
 	uint32_t iodepth_batch_complete_max;
 	uint32_t serialize_overlap;
-	uint32_t lat_percentiles;
+	uint32_t pad;
 
 	uint64_t size;
 	uint64_t io_size;
@@ -430,7 +431,7 @@ struct thread_options_pack {
 	uint32_t override_sync;
 	uint32_t rand_repeatable;
 	uint32_t allrand_repeatable;
-	uint32_t pad;
+	uint32_t pad2;
 	uint64_t rand_seed;
 	uint32_t log_avg_msec;
 	uint32_t log_hist_msec;
@@ -464,7 +465,6 @@ struct thread_options_pack {
 
 	uint32_t hugepage_size;
 	uint64_t rw_min_bs;
-	uint32_t pad2;
 	uint32_t thinktime;
 	uint32_t thinktime_spin;
 	uint32_t thinktime_blocks;
@@ -539,7 +539,10 @@ struct thread_options_pack {
 	uint32_t trim_zero;
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
+	uint32_t lat_percentiles;
+	uint32_t slat_percentiles;
 	uint32_t percentile_precision;
+	uint32_t pad3;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
@@ -573,7 +576,6 @@ struct thread_options_pack {
 	uint32_t rate_iops_min[DDIR_RWDIR_CNT];
 	uint32_t rate_process;
 	uint32_t rate_ign_think;
-	uint32_t pad3;
 
 	uint8_t ioscheduler[FIO_TOP_STR_MAX];
 
@@ -603,7 +605,6 @@ struct thread_options_pack {
 	uint32_t flow_sleep;
 
 	uint32_t offset_increment_percent;
-	uint32_t pad4;
 	uint64_t offset_increment;
 	uint64_t number_ios;
 


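The thread_options_pack hunks above add 32-bit members and move explicit `pad` fields around. Below is a minimal ctypes sketch of the general idea behind such pads. It assumes the goal is a layout that does not depend on compiler-inserted padding. The two structures are hypothetical: they borrow field names from the diff for readability but are not the real thread_options_pack.

```python
import ctypes

# Hypothetical layouts; field names are borrowed from the diff for readability only.
class ImplicitPad(ctypes.Structure):
    # A 32-bit member directly followed by a 64-bit one: whether hidden padding
    # is inserted depends on the ABI's alignment rule for 64-bit integers.
    _fields_ = [
        ("percentile_precision", ctypes.c_uint32),
        ("next_u64", ctypes.c_uint64),
    ]

class ExplicitPad(ctypes.Structure):
    # Spelling out the pad puts the 64-bit member at offset 8 on common ABIs.
    _fields_ = [
        ("percentile_precision", ctypes.c_uint32),
        ("pad3", ctypes.c_uint32),
        ("next_u64", ctypes.c_uint64),
    ]

def layout(cls):
    """Return (total size, offset of next_u64) for a ctypes structure."""
    return ctypes.sizeof(cls), cls.next_u64.offset

if __name__ == "__main__":
    print("implicit:", layout(ImplicitPad))  # (16, 8) on x86-64; may differ on 32-bit x86
    print("explicit:", layout(ExplicitPad))
```

When every hole is filled by a named field, the byte layout is fixed by the source. It no longer depends on the alignment rules of each platform that exchanges the structure.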

* Recent changes (master)
@ 2020-01-29 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-29 13:00 UTC
  To: fio

The following changes since commit 8178434ccba8e2d06684ce0c730b0eda571c5280:

  Merge branch 'filestat1' of https://github.com/kusumi/fio (2020-01-23 11:35:23 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ab45cf076766ce1ed49d28380c059343305cde4a:

  Merge branch 'stat-averaging-interval-start-fix' of https://github.com/maciejsszmigiero/fio (2020-01-28 14:15:35 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'no_cpu_clock_no_tls_thread' of https://github.com/bsdkurt/fio
      Merge branch 'openbsd_swap' of https://github.com/bsdkurt/fio
      Merge branch 'stat-averaging-interval-start-fix' of https://github.com/maciejsszmigiero/fio

Kurt Miller (2):
      Fix build on architectures that don't have both cpu clock and __thread support.
      Use swap16/32/64 on OpenBSD.

Maciej S. Szmigiero (1):
      stat: fix calculation of bw and iops statistics based on samples

 gettime.c       | 4 ++--
 os/os-openbsd.h | 6 +++---
 stat.c          | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/gettime.c b/gettime.c
index 272a3e62..c3a4966b 100644
--- a/gettime.c
+++ b/gettime.c
@@ -371,7 +371,7 @@ static int calibrate_cpu_clock(void)
 }
 #endif // ARCH_HAVE_CPU_CLOCK
 
-#ifndef CONFIG_TLS_THREAD
+#if defined(ARCH_HAVE_CPU_CLOCK) && !defined(CONFIG_TLS_THREAD)
 void fio_local_clock_init(void)
 {
 	struct tv_valid *t;
@@ -398,7 +398,7 @@ void fio_clock_init(void)
 	if (fio_clock_source == fio_clock_source_inited)
 		return;
 
-#ifndef CONFIG_TLS_THREAD
+#if defined(ARCH_HAVE_CPU_CLOCK) && !defined(CONFIG_TLS_THREAD)
 	if (pthread_key_create(&tv_tls_key, kill_tv_tls_key))
 		log_err("fio: can't create TLS key\n");
 #endif
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 085a6f2b..994bf078 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -31,9 +31,9 @@
 #define PTHREAD_STACK_MIN 4096
 #endif
 
-#define fio_swap16(x)	bswap16(x)
-#define fio_swap32(x)	bswap32(x)
-#define fio_swap64(x)	bswap64(x)
+#define fio_swap16(x)	swap16(x)
+#define fio_swap32(x)	swap32(x)
+#define fio_swap64(x)	swap64(x)
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
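The OpenBSD change only swaps which system helpers back `fio_swap16/32/64`. Either way, the macro reverses the byte order of a fixed-width integer. Below is a Python sketch of that operation. The helper names are illustrative and are not the OpenBSD functions themselves.

```python
def byte_swap(value, nbytes):
    """Reverse the byte order of an unsigned integer that is nbytes wide."""
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")

def swap16(x):
    return byte_swap(x, 2)

def swap32(x):
    return byte_swap(x, 4)

def swap64(x):
    return byte_swap(x, 8)

if __name__ == "__main__":
    print(hex(swap32(0x12345678)))  # 0x78563412
```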
diff --git a/stat.c b/stat.c
index 9d93dcd1..cc1c360e 100644
--- a/stat.c
+++ b/stat.c
@@ -3106,7 +3106,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 		stat_io_bytes[ddir] = this_io_bytes[ddir];
 	}
 
-	timespec_add_msec(parent_tv, avg_time);
+	*parent_tv = *t;
 
 	if (needs_lock)
 		__td_io_u_unlock(td);


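The stat.c change makes the next averaging interval start at the time the sample was actually taken (`*t`). Previously it started at the previous start plus `avg_time`. Because `stat_io_bytes` is also refreshed at `*t`, the old rule could divide one interval's bytes by a longer elapsed time whenever sampling ran late.

Below is a simplified model, assuming constant throughput, a single data direction, and no LOG_MSEC_SLACK handling. `rate_samples` is a hypothetical helper, not fio code.

```python
def rate_samples(call_times_ms, avg_time_ms, bytes_per_ms, reset_to_now):
    """Simplified model of __add_samples(): one rate sample per elapsed interval."""
    parent = 0          # stands in for *parent_tv (start of the current interval)
    last_bytes = 0      # stands in for stat_io_bytes[ddir]
    samples = []
    for t in call_times_ms:
        spent = t - parent
        if spent < avg_time_ms:
            continue    # interval not over yet, so no sample is taken
        this_bytes = bytes_per_ms * t          # this_io_bytes[ddir] at time t
        delta = this_bytes - last_bytes
        samples.append(delta / spent if spent else 0.0)
        last_bytes = this_bytes                # refreshed at the sample time
        if reset_to_now:
            parent = t                         # new: *parent_tv = *t
        else:
            parent += avg_time_ms              # old: timespec_add_msec(parent_tv, avg_time)
    return samples

if __name__ == "__main__":
    print("old:", rate_samples([600, 1200], 500, 10, reset_to_now=False))
    print("new:", rate_samples([600, 1200], 500, 10, reset_to_now=True))
```

Consider calls at 600 ms and 1200 ms with a 500 ms interval and 10 bytes/ms. The old rule yields samples of 10.0 and about 8.57 bytes/ms. The new rule yields 10.0 twice.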

* Recent changes (master)
@ 2020-01-24 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-24 13:00 UTC
  To: fio

The following changes since commit 4ee9c375b64de236356cc97843d84978edeba491:

  stat: ensure we align correctly (2020-01-22 19:53:14 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8178434ccba8e2d06684ce0c730b0eda571c5280:

  Merge branch 'filestat1' of https://github.com/kusumi/fio (2020-01-23 11:35:23 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'filestat1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      engines/filestat: add "lstat" option to use lstat(2)

 HOWTO              |  4 ++++
 engines/filestat.c | 33 +++++++++++++++++++++++++++++++--
 fio.1              |  3 +++
 optgroup.h         |  2 ++
 4 files changed, 40 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0a366168..f19f9226 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2261,6 +2261,10 @@ with the caveat that when used on the command line, they must come after the
 	multiple paths exist between the client and the server or in certain loopback
 	configurations.
 
+.. option:: lstat=bool : [filestat]
+
+	Use lstat(2) to measure lookup/getattr performance. Default is 0.
+
 .. option:: readfua=bool : [sg]
 
 	With readfua option set to 1, read operations include
diff --git a/engines/filestat.c b/engines/filestat.c
index 79525934..6c87c4c2 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -11,13 +11,36 @@
 #include <sys/stat.h>
 #include <unistd.h>
 #include "../fio.h"
+#include "../optgroup.h"
 
 struct fc_data {
 	enum fio_ddir stat_ddir;
 };
 
+struct filestat_options {
+	void *pad;
+	unsigned int lstat;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "lstat",
+		.lname	= "lstat",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct filestat_options, lstat),
+		.help	= "Use lstat(2) to measure lookup/getattr performance",
+		.def	= "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_FILESTAT,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
 static int stat_file(struct thread_data *td, struct fio_file *f)
 {
+	struct filestat_options *o = td->eo;
 	struct timespec start;
 	int do_lat = !td->o.disable_lat;
 	struct stat statbuf;
@@ -37,13 +60,17 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 	if (do_lat)
 		fio_gettime(&start, NULL);
 
-	ret = stat(f->file_name, &statbuf);
+	if (o->lstat)
+		ret = lstat(f->file_name, &statbuf);
+	else
+		ret = stat(f->file_name, &statbuf);
 
 	if (ret == -1) {
 		char buf[FIO_VERROR_SIZE];
 		int e = errno;
 
-		snprintf(buf, sizeof(buf), "stat(%s)", f->file_name);
+		snprintf(buf, sizeof(buf), "%sstat(%s)",
+			o->lstat ? "l" : "", f->file_name);
 		td_verror(td, e, buf);
 		return 1;
 	}
@@ -103,6 +130,8 @@ static struct ioengine_ops ioengine = {
 	.open_file	= stat_file,
 	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
 				FIO_NOSTATS | FIO_NOFILEHASH,
+	.options	= options,
+	.option_struct_size = sizeof(struct filestat_options),
 };
 
 static void fio_init fio_filestat_register(void)
diff --git a/fio.1 b/fio.1
index 05896e61..a58632b4 100644
--- a/fio.1
+++ b/fio.1
@@ -2032,6 +2032,9 @@ on the client site it will be used in the rdma_resolve_add()
 function. This can be useful when multiple paths exist between the
 client and the server or in certain loopback configurations.
 .TP
+.BI (filestat)lstat \fR=\fPbool
+Use \fBlstat\fR\|(2) to measure lookup/getattr performance. Default: 0.
+.TP
 .BI (sg)readfua \fR=\fPbool
 With readfua option set to 1, read operations include the force
 unit access (fua) flag. Default: 0.
diff --git a/optgroup.h b/optgroup.h
index 55ef5934..5789afd3 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -65,6 +65,7 @@ enum opt_category_group {
 	__FIO_OPT_G_ISCSI,
 	__FIO_OPT_G_NBD,
 	__FIO_OPT_G_IOURING,
+	__FIO_OPT_G_FILESTAT,
 	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
@@ -106,6 +107,7 @@ enum opt_category_group {
 	FIO_OPT_G_ISCSI         = (1ULL << __FIO_OPT_G_ISCSI),
 	FIO_OPT_G_NBD		= (1ULL << __FIO_OPT_G_NBD),
 	FIO_OPT_G_IOURING	= (1ULL << __FIO_OPT_G_IOURING),
+	FIO_OPT_G_FILESTAT	= (1ULL << __FIO_OPT_G_FILESTAT),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);


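With `lstat=1`, the filestat engine calls lstat(2) instead of stat(2). The two differ only for symbolic links: stat(2) follows the link to its target, while lstat(2) reports on the link itself.

Below is a small Python sketch of that choice. `os.stat`/`os.lstat` and `time.perf_counter_ns` stand in for the engine's syscalls and fio's clock. `timed_stat` is a hypothetical name.

```python
import os
import stat
import tempfile
import time

def timed_stat(path, use_lstat=False):
    """Stat path once and return (stat_result, elapsed nanoseconds)."""
    fn = os.lstat if use_lstat else os.stat   # same choice as o->lstat in stat_file()
    start = time.perf_counter_ns()
    st = fn(path)
    return st, time.perf_counter_ns() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        with open(target, "wb") as f:
            f.write(b"x" * 4096)
        link = os.path.join(d, "link")
        os.symlink(target, link)
        st, ns = timed_stat(link)
        print("stat sees a regular file:", stat.S_ISREG(st.st_mode), st.st_size, ns)
        lst, ns = timed_stat(link, use_lstat=True)
        print("lstat sees the link:", stat.S_ISLNK(lst.st_mode), ns)
```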

* Recent changes (master)
@ 2020-01-23 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-23 13:00 UTC
  To: fio

The following changes since commit d9b7596a1fad5adf7f6731d067e1513c56eabe96:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-18 09:35:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4ee9c375b64de236356cc97843d84978edeba491:

  stat: ensure we align correctly (2020-01-22 19:53:14 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'priorityQueuing' of https://github.com/Seagate/fio
      stat: ensure we align correctly

Phillip Chen (2):
      Whitespace standardization
      Per-command priority: Priority logging and libaio/io_uring cmdprio_percentage

Su, Friendy (1):
      engines: add engine for file stat

 HOWTO                          |  43 +++++---
 Makefile                       |   2 +-
 client.c                       |  10 +-
 engines/filecreate.c           |   2 +-
 engines/filestat.c             | 116 +++++++++++++++++++++
 engines/io_uring.c             |  47 +++++++++
 engines/libaio.c               |  54 ++++++++++
 eta.c                          |   6 +-
 examples/filestat-ioengine.fio |  19 ++++
 fio.1                          |  32 ++++--
 fio.h                          |   4 +-
 init.c                         |   3 +
 io_u.c                         |   8 +-
 io_u.h                         |   3 +
 ioengines.c                    |   1 +
 iolog.c                        |   8 +-
 iolog.h                        |   1 +
 server.c                       |  13 ++-
 server.h                       |   2 +-
 stat.c                         | 229 +++++++++++++++++++++++++++++++++++------
 stat.h                         |  14 ++-
 21 files changed, 541 insertions(+), 76 deletions(-)
 create mode 100644 engines/filestat.c
 create mode 100644 examples/filestat-ioengine.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 41d32c04..0a366168 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1996,6 +1996,11 @@ I/O engine
 			set  `filesize` so that all the accounting still occurs, but no
 			actual I/O will be done other than creating the file.
 
+		**filestat**
+			Simply do stat() and do no I/O to the file. You need to set 'filesize'
+			and 'nrfiles', so that files will be created.
+			This engine is to measure file lookup and meta data access.
+
 		**libpmem**
 			Read and write using mmap I/O to a file on a filesystem
 			mounted with DAX on a persistent memory device through the PMDK
@@ -2029,21 +2034,29 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 :option:`ioengine` that defines them is selected.
 
-.. option:: hipri : [io_uring]
+.. option:: cmdprio_percentage=int : [io_uring] [libaio]
 
-	If this option is set, fio will attempt to use polled IO completions.
-	Normal IO completions generate interrupts to signal the completion of
-	IO, polled completions do not. Hence they are require active reaping
-	by the application. The benefits are more efficient IO for high IOPS
-	scenarios, and lower latencies for low queue depth IO.
+    Set the percentage of I/O that will be issued with higher priority by setting
+    the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
+    This option cannot be used with the `prio` or `prioclass` options. For this
+    option to set the priority bit properly, NCQ priority must be supported and
+    enabled and :option:`direct`\=1 option must be used.
 
 .. option:: fixedbufs : [io_uring]
 
-	If fio is asked to do direct IO, then Linux will map pages for each
-	IO call, and release them when IO is done. If this option is set, the
-	pages are pre-mapped before IO is started. This eliminates the need to
-	map and release for each IO. This is more efficient, and reduces the
-	IO latency as well.
+    If fio is asked to do direct IO, then Linux will map pages for each
+    IO call, and release them when IO is done. If this option is set, the
+    pages are pre-mapped before IO is started. This eliminates the need to
+    map and release for each IO. This is more efficient, and reduces the
+    IO latency as well.
+
+.. option:: hipri : [io_uring]
+
+    If this option is set, fio will attempt to use polled IO completions.
+    Normal IO completions generate interrupts to signal the completion of
+    IO, polled completions do not. Hence they are require active reaping
+    by the application. The benefits are more efficient IO for high IOPS
+    scenarios, and lower latencies for low queue depth IO.
 
 .. option:: registerfiles : [io_uring]
 
@@ -2687,11 +2700,15 @@ Threads, processes and job synchronization
 	Set the I/O priority value of this job. Linux limits us to a positive value
 	between 0 and 7, with 0 being the highest.  See man
 	:manpage:`ionice(1)`. Refer to an appropriate manpage for other operating
-	systems since meaning of priority may differ.
+	systems since meaning of priority may differ. For per-command priority
+	setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
+	options.
 
 .. option:: prioclass=int
 
-	Set the I/O priority class. See man :manpage:`ionice(1)`.
+	Set the I/O priority class. See man :manpage:`ionice(1)`. For per-command
+	priority setting, see I/O engine specific `cmdprio_percentage` and
+	`hipri_percentage` options.
 
 .. option:: cpus_allowed=str
 
diff --git a/Makefile b/Makefile
index 3c5e0f5b..027b62bc 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		pshared.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
-		engines/ftruncate.c engines/filecreate.c \
+		engines/ftruncate.c engines/filecreate.c engines/filestat.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
diff --git a/client.c b/client.c
index 93bca5df..4aed39e7 100644
--- a/client.c
+++ b/client.c
@@ -1032,6 +1032,14 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->nr_block_infos	= le64_to_cpu(src->nr_block_infos);
 	for (i = 0; i < dst->nr_block_infos; i++)
 		dst->block_infos[i] = le32_to_cpu(src->block_infos[i]);
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
+			dst->io_u_plat_high_prio[i][j] = le64_to_cpu(src->io_u_plat_high_prio[i][j]);
+			dst->io_u_plat_prio[i][j] = le64_to_cpu(src->io_u_plat_prio[i][j]);
+		}
+		convert_io_stat(&dst->clat_high_prio_stat[i], &src->clat_high_prio_stat[i]);
+		convert_io_stat(&dst->clat_prio_stat[i], &src->clat_prio_stat[i]);
+	}
 
 	dst->ss_dur		= le64_to_cpu(src->ss_dur);
 	dst->ss_state		= le32_to_cpu(src->ss_state);
@@ -1693,7 +1701,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 
 		s->time		= le64_to_cpu(s->time);
 		s->data.val	= le64_to_cpu(s->data.val);
-		s->__ddir	= le32_to_cpu(s->__ddir);
+		s->__ddir	= __le32_to_cpu(s->__ddir);
 		s->bs		= le64_to_cpu(s->bs);
 
 		if (ret->log_offset) {
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 39a29502..5fec8544 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -49,7 +49,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/filestat.c b/engines/filestat.c
new file mode 100644
index 00000000..79525934
--- /dev/null
+++ b/engines/filestat.c
@@ -0,0 +1,116 @@
+/*
+ * filestat engine
+ *
+ * IO engine that doesn't do any IO, just stat files and tracks the latency
+ * of the file stat.
+ */
+#include <stdio.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include "../fio.h"
+
+struct fc_data {
+	enum fio_ddir stat_ddir;
+};
+
+static int stat_file(struct thread_data *td, struct fio_file *f)
+{
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+	struct stat statbuf;
+	int ret;
+
+	dprint(FD_FILE, "fd stat %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	ret = stat(f->file_name, &statbuf);
+
+	if (ret == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "stat(%s)", f->file_name);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0);
+	}
+
+	return 0;
+}
+
+static enum fio_q_status queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
+{
+	return FIO_Q_COMPLETED;
+}
+
+static int init(struct thread_data *td)
+{
+	struct fc_data *data;
+
+	data = calloc(1, sizeof(*data));
+
+	if (td_read(td))
+		data->stat_ddir = DDIR_READ;
+	else if (td_write(td))
+		data->stat_ddir = DDIR_WRITE;
+
+	td->io_ops_data = data;
+	return 0;
+}
+
+static void cleanup(struct thread_data *td)
+{
+	struct fc_data *data = td->io_ops_data;
+
+	free(data);
+}
+
+static int stat_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	/* do nothing because file not opened */
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "filestat",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.cleanup	= cleanup,
+	.queue		= queue_io,
+	.invalidate	= stat_invalidate,
+	.get_file_size	= generic_get_file_size,
+	.open_file	= stat_file,
+	.flags		=  FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+static void fio_init fio_filestat_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_filestat_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
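The new filestat engine does no data I/O. `queue_io` completes immediately, and the measured work happens in the `open_file` hook, which times one stat(2) per file and records it with `add_clat_sample()`.

Below is a rough Python model of that flow. It assumes files are created up front, as the HOWTO text requires (`filesize` and `nrfiles` set). `filestat_job` and the file naming are illustrative only.

```python
import os
import statistics
import tempfile
import time

def filestat_job(directory, nrfiles, filesize):
    """Rough model of the filestat engine: create files, then time one stat() each.

    File creation stands in for fio's setup phase; the returned list of
    nanosecond latencies stands in for the add_clat_sample() calls.
    """
    paths = []
    for i in range(nrfiles):
        p = os.path.join(directory, f"job.0.{i}")   # naming is illustrative only
        with open(p, "wb") as f:
            f.write(b"\0" * filesize)
        paths.append(p)

    clat_ns = []
    for p in paths:
        start = time.perf_counter_ns()
        os.stat(p)                                  # the only work: a metadata lookup
        clat_ns.append(time.perf_counter_ns() - start)
    return clat_ns

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lat = filestat_job(d, nrfiles=200, filesize=4096)
        print(len(lat), "samples, median", statistics.median(lat), "ns")
```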
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 329f2f07..f1ffc712 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -70,6 +70,7 @@ struct ioring_data {
 struct ioring_options {
 	void *pad;
 	unsigned int hipri;
+	unsigned int cmdprio_percentage;
 	unsigned int fixedbufs;
 	unsigned int registerfiles;
 	unsigned int sqpoll_thread;
@@ -108,6 +109,26 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+#ifdef FIO_HAVE_IOPRIO_CLASS
+	{
+		.name	= "cmdprio_percentage",
+		.lname	= "high priority percentage",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, cmdprio_percentage),
+		.minval	= 1,
+		.maxval	= 100,
+		.help	= "Send high priority I/O this percentage of the time",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+#else
+	{
+		.name	= "cmdprio_percentage",
+		.lname	= "high priority percentage",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+#endif
 	{
 		.name	= "fixedbufs",
 		.lname	= "Fixed (pre-mapped) IO buffers",
@@ -313,11 +334,23 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
+static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct ioring_options *o = td->eo;
+	struct ioring_data *ld = td->io_ops_data;
+	if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
+		ld->sqes[io_u->index].ioprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
+		io_u->flags |= IO_U_F_PRIORITY;
+	}
+	return;
+}
+
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct io_sq_ring *ring = &ld->sq_ring;
+	struct ioring_options *o = td->eo;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -343,6 +376,8 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 
 	/* ensure sqe stores are ordered with tail update */
 	write_barrier();
+	if (o->cmdprio_percentage)
+		fio_ioring_prio_prep(td, io_u);
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	*ring->tail = next_tail;
 	write_barrier();
@@ -618,6 +653,7 @@ static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
+	struct thread_options *to = &td->o;
 
 	/* sqthread submission requires registered files */
 	if (o->sqpoll_thread)
@@ -640,6 +676,17 @@ static int fio_ioring_init(struct thread_data *td)
 	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
 
 	td->io_ops_data = ld;
+
+	/*
+	 * Check for option conflicts
+	 */
+	if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
+			o->cmdprio_percentage != 0) {
+		log_err("%s: cmdprio_percentage option and mutually exclusive "
+				"prio or prioclass option is set, exiting\n", to->name);
+		td_verror(td, EINVAL, "fio_io_uring_init");
+		return 1;
+	}
 	return 0;
 }
 
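`fio_ioring_prio_prep()` draws an integer in [0, 99] from `td->prio_state`. If the draw is below `cmdprio_percentage`, it tags the SQE with the real-time priority class (`IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT`) and sets `IO_U_F_PRIORITY`.

Below is a Python sketch of just the selection step. The constants are the Linux ioprio values (class RT = 1, shift 13). `prio_prep` is a hypothetical helper, and a seeded `random.Random` stands in for fio's generator.

```python
import random

IOPRIO_CLASS_SHIFT = 13   # Linux ioprio encoding: class lives above bit 13
IOPRIO_CLASS_RT = 1

def prio_prep(rng, cmdprio_percentage):
    """Return the ioprio value for one I/O, or None if it keeps default priority."""
    # randint() is inclusive on both ends, matching a draw in [0, 99].
    if rng.randint(0, 99) < cmdprio_percentage:
        return IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT
    return None

if __name__ == "__main__":
    rng = random.Random(42)   # seeded stand-in for td->prio_state
    n = 100_000
    high = sum(prio_prep(rng, 50) is not None for _ in range(n))
    print(f"{high / n:.3f} of I/Os marked high priority")
```

Because the decision is made independently per I/O, the high-priority share converges on the configured percentage only over many I/Os. It is not an exact split within any short window.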
diff --git a/engines/libaio.c b/engines/libaio.c
index b047b746..299798ae 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -16,7 +16,13 @@
 #include "../optgroup.h"
 #include "../lib/memalign.h"
 
+/* Should be defined in newest aio_abi.h */
+#ifndef IOCB_FLAG_IOPRIO
+#define IOCB_FLAG_IOPRIO    (1 << 1)
+#endif
+
 static int fio_libaio_commit(struct thread_data *td);
+static int fio_libaio_init(struct thread_data *td);
 
 struct libaio_data {
 	io_context_t aio_ctx;
@@ -44,6 +50,7 @@ struct libaio_data {
 struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
+	unsigned int cmdprio_percentage;
 };
 
 static struct fio_option options[] = {
@@ -56,6 +63,26 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
+#ifdef FIO_HAVE_IOPRIO_CLASS
+	{
+		.name	= "cmdprio_percentage",
+		.lname	= "high priority percentage",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct libaio_options, cmdprio_percentage),
+		.minval	= 1,
+		.maxval	= 100,
+		.help	= "Send high priority I/O this percentage of the time",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+#else
+	{
+		.name	= "cmdprio_percentage",
+		.lname	= "high priority percentage",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support I/O priority classes",
+	},
+#endif
 	{
 		.name	= NULL,
 	},
@@ -85,6 +112,17 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 	return 0;
 }
 
+static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct libaio_options *o = td->eo;
+	if (rand_between(&td->prio_state, 0, 99) < o->cmdprio_percentage) {
+		io_u->iocb.aio_reqprio = IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT;
+		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
+		io_u->flags |= IO_U_F_PRIORITY;
+	}
+	return;
+}
+
 static struct io_u *fio_libaio_event(struct thread_data *td, int event)
 {
 	struct libaio_data *ld = td->io_ops_data;
@@ -188,6 +226,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
+	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -218,6 +257,9 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
+	if (o->cmdprio_percentage)
+		fio_libaio_prio_prep(td, io_u);
+
 	ld->iocbs[ld->head] = &io_u->iocb;
 	ld->io_us[ld->head] = io_u;
 	ring_inc(ld, &ld->head, 1);
@@ -358,6 +400,8 @@ static int fio_libaio_post_init(struct thread_data *td)
 static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_data *ld;
+	struct thread_options *to = &td->o;
+	struct libaio_options *o = td->eo;
 
 	ld = calloc(1, sizeof(*ld));
 
@@ -368,6 +412,16 @@ static int fio_libaio_init(struct thread_data *td)
 	ld->io_us = calloc(ld->entries, sizeof(struct io_u *));
 
 	td->io_ops_data = ld;
+	/*
+	 * Check for option conflicts
+	 */
+	if ((fio_option_is_set(to, ioprio) || fio_option_is_set(to, ioprio_class)) &&
+			o->cmdprio_percentage != 0) {
+		log_err("%s: cmdprio_percentage option and mutually exclusive "
+				"prio or prioclass option is set, exiting\n", to->name);
+		td_verror(td, EINVAL, "fio_libaio_init");
+		return 1;
+	}
 	return 0;
 }
 
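`fio_libaio_init()`, like `fio_ioring_init()`, rejects a job that sets `cmdprio_percentage` together with `prio` or `prioclass`. Separately, `fio_libaio_prio_prep()` marks a chosen iocb by setting `aio_reqprio` and `IOCB_FLAG_IOPRIO`.

Below is a Python sketch of both rules. Plain booleans stand in for `fio_option_is_set()`, and a tuple stands in for the iocb fields. All names here are hypothetical.

```python
IOCB_FLAG_IOPRIO = 1 << 1            # fallback value defined in the libaio.c hunk
RT_PRIO = 1 << 13                    # IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT on Linux

def check_cmdprio_conflict(prio_set, prioclass_set, cmdprio_percentage):
    """Return an error message if the options conflict, else None."""
    if (prio_set or prioclass_set) and cmdprio_percentage != 0:
        return ("cmdprio_percentage option and mutually exclusive "
                "prio or prioclass option is set, exiting")
    return None

def libaio_prio_fields(flags, high_prio):
    """Return (aio_reqprio, flags) for a hypothetical iocb."""
    if high_prio:
        return RT_PRIO, flags | IOCB_FLAG_IOPRIO
    return 0, flags                  # untouched when the draw does not select it

if __name__ == "__main__":
    print(check_cmdprio_conflict(prio_set=True, prioclass_set=False, cmdprio_percentage=50))
    print(libaio_prio_fields(0, high_prio=True))
```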
diff --git a/eta.c b/eta.c
index 9950ef30..13f61ba4 100644
--- a/eta.c
+++ b/eta.c
@@ -509,9 +509,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
 				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
-		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0);
-		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0);
-		add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0, 0);
 	}
 
 	disp_time = mtime_since(&disp_prev_time, &now);
diff --git a/examples/filestat-ioengine.fio b/examples/filestat-ioengine.fio
new file mode 100644
index 00000000..932fced8
--- /dev/null
+++ b/examples/filestat-ioengine.fio
@@ -0,0 +1,19 @@
+# Example filestat job
+
+# 'filestat' engine only do 'stat(filename)', file will not be open().
+# 'filesize' must be set, then files will be created at setup stage.
+
+[global]
+ioengine=filestat
+numjobs=1
+filesize=4k
+nrfiles=200
+thread
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
+
diff --git a/fio.1 b/fio.1
index cf5dd853..05896e61 100644
--- a/fio.1
+++ b/fio.1
@@ -1760,6 +1760,11 @@ Simply create the files and do no I/O to them.  You still need to set
 \fBfilesize\fR so that all the accounting still occurs, but no actual I/O will be
 done other than creating the file.
 .TP
+.B filestat
+Simply do stat() and do no I/O to the file. You need to set 'filesize'
+and 'nrfiles', so that files will be created.
+This engine is to measure file lookup and meta data access.
+.TP
 .B libpmem
 Read and write using mmap I/O to a file on a filesystem
 mounted with DAX on a persistent memory device through the PMDK
@@ -1790,12 +1795,12 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 \fBioengine\fR that defines them is selected.
 .TP
-.BI (io_uring)hipri
-If this option is set, fio will attempt to use polled IO completions. Normal IO
-completions generate interrupts to signal the completion of IO, polled
-completions do not. Hence they are require active reaping by the application.
-The benefits are more efficient IO for high IOPS scenarios, and lower latencies
-for low queue depth IO.
+.BI (io_uring, libaio)cmdprio_percentage \fR=\fPint
+Set the percentage of I/O that will be issued with higher priority by setting
+the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``.
+This option cannot be used with the `prio` or `prioclass` options. For this
+option to set the priority bit properly, NCQ priority must be supported and
+enabled and `direct=1' option must be used.
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
@@ -1803,6 +1808,13 @@ release them when IO is done. If this option is set, the pages are pre-mapped
 before IO is started. This eliminates the need to map and release for each IO.
 This is more efficient, and reduces the IO latency as well.
 .TP
+.BI (io_uring)hipri
+If this option is set, fio will attempt to use polled IO completions. Normal IO
+completions generate interrupts to signal the completion of IO, polled
+completions do not. Hence they require active reaping by the application.
+The benefits are more efficient IO for high IOPS scenarios, and lower latencies
+for low queue depth IO.
+.TP
 .BI (io_uring)registerfiles
 With this option, fio registers the set of files being used with the kernel.
 This avoids the overhead of managing file counts in the kernel, making the
@@ -2381,10 +2393,14 @@ priority class.
 Set the I/O priority value of this job. Linux limits us to a positive value
 between 0 and 7, with 0 being the highest. See man
 \fBionice\fR\|(1). Refer to an appropriate manpage for other operating
-systems since meaning of priority may differ.
+systems since the meaning of priority may differ. For per-command priority
+setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
+options.
 .TP
 .BI prioclass \fR=\fPint
-Set the I/O priority class. See man \fBionice\fR\|(1).
+Set the I/O priority class. See man \fBionice\fR\|(1). For per-command
+priority setting, see I/O engine specific `cmdprio_percentage` and `hipri_percentage`
+options.
 .TP
 .BI cpus_allowed \fR=\fPstr
 Controls the same options as \fBcpumask\fR, but accepts a textual
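The `cmdprio_percentage` text describes a per-I/O decision: roughly that percentage of commands get the priority bit. A minimal sketch of such a decision, assuming a uniform random source, follows. xorshift32 is a stand-in for fio's own random state (this series adds `prio_state` for the purpose), and `want_prio()` is a hypothetical name, not the engine code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PRNG state; must be seeded with a non-zero value. */
struct prio_rng {
	uint32_t s;
};

/* Marsaglia xorshift32 step; a stand-in for fio's frand helpers. */
static uint32_t xorshift32(struct prio_rng *r)
{
	uint32_t x = r->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	r->s = x;
	return x;
}

/* Returns true for roughly pct percent of calls (pct in 0..100). */
static bool want_prio(struct prio_rng *r, unsigned int pct)
{
	assert(pct <= 100);
	return (xorshift32(r) % 100) < pct;
}
```

With pct = 0 no command is marked and with pct = 100 every command is; in between, the marked share converges on pct over many I/Os.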
diff --git a/fio.h b/fio.h
index 6a5ead4d..2a9eef45 100644
--- a/fio.h
+++ b/fio.h
@@ -139,6 +139,7 @@ enum {
 	FIO_RAND_ZONE_OFF,
 	FIO_RAND_POISSON2_OFF,
 	FIO_RAND_POISSON3_OFF,
+	FIO_RAND_PRIO_CMDS,
 	FIO_RAND_NR_OFFS,
 };
 
@@ -258,6 +259,7 @@ struct thread_data {
 	struct frand_state buf_state_prev;
 	struct frand_state dedupe_state;
 	struct frand_state zone_state;
+	struct frand_state prio_state;
 
 	struct zone_split_index **zone_state_index;
 
@@ -460,7 +462,7 @@ struct thread_data {
 	CUdevice  cu_dev;
 	CUcontext cu_ctx;
 	CUdeviceptr dev_mem_ptr;
-#endif	
+#endif
 
 };
 
diff --git a/init.c b/init.c
index 2f64726c..dca44bca 100644
--- a/init.c
+++ b/init.c
@@ -1042,6 +1042,7 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 	init_rand_seed(&td->poisson_state[2], td->rand_seeds[FIO_RAND_POISSON3_OFF], 0);
 	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], false);
 	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], false);
+	init_rand_seed(&td->prio_state, td->rand_seeds[FIO_RAND_PRIO_CMDS], false);
 
 	if (!td_random(td))
 		return;
@@ -1518,6 +1519,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td->ts.lat_stat[i].min_val = ULONG_MAX;
 		td->ts.bw_stat[i].min_val = ULONG_MAX;
 		td->ts.iops_stat[i].min_val = ULONG_MAX;
+		td->ts.clat_high_prio_stat[i].min_val = ULONG_MAX;
+		td->ts.clat_prio_stat[i].min_val = ULONG_MAX;
 	}
 	td->ts.sync_stat.min_val = ULONG_MAX;
 	td->ddir_seq_nr = o->ddir_seq_nr;
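The init.c hunk seeds a dedicated `prio_state` from `rand_seeds[FIO_RAND_PRIO_CMDS]`, and fio.h adds that index just before `FIO_RAND_NR_OFFS`, so the existing indices keep their values. A plausible benefit is that drawing priority decisions cannot shift the sequences used for offsets, zones and the rest. The sketch below demonstrates that property with hypothetical stream indices and an xorshift64 generator standing in for fio's frand_state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stream indices; new streams go just before RAND_NR so
 * existing indices (and therefore existing seeds) do not move. */
enum { RAND_ZONE, RAND_PRIO, RAND_NR };

struct rng {
	uint64_t s;
};

/* xorshift64 step (Marsaglia); stands in for fio's frand_state. */
static uint64_t rng_next(struct rng *r)
{
	uint64_t x = r->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	r->s = x;
	return x;
}

/* Give every stream its own seed; xorshift must not start at 0. */
static void seed_streams(struct rng st[RAND_NR], const uint64_t seeds[RAND_NR])
{
	for (int i = 0; i < RAND_NR; i++)
		st[i].s = seeds[i] ? seeds[i] : 1;
}
```

Interleaving draws on the priority stream leaves the zone stream's sequence unchanged, which keeps seeded runs comparable whether or not priorities are in use.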
diff --git a/io_u.c b/io_u.c
index 03f5c21f..bcb893c5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1541,7 +1541,7 @@ again:
 		assert(io_u->flags & IO_U_F_FREE);
 		io_u_clear(td, io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
 				 IO_U_F_TRIMMED | IO_U_F_BARRIER |
-				 IO_U_F_VER_LIST);
+				 IO_U_F_VER_LIST | IO_U_F_PRIORITY);
 
 		io_u->error = 0;
 		io_u->acct_ddir = -1;
@@ -1830,7 +1830,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 		unsigned long long tnsec;
 
 		tnsec = ntime_since(&io_u->start_time, &icd->time);
-		add_lat_sample(td, idx, tnsec, bytes, io_u->offset);
+		add_lat_sample(td, idx, tnsec, bytes, io_u->offset, io_u_is_prio(io_u));
 
 		if (td->flags & TD_F_PROFILE_OPS) {
 			struct prof_io_ops *ops = &td->prof_io_ops;
@@ -1849,7 +1849,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 
 	if (ddir_rw(idx)) {
 		if (!td->o.disable_clat) {
-			add_clat_sample(td, idx, llnsec, bytes, io_u->offset);
+			add_clat_sample(td, idx, llnsec, bytes, io_u->offset, io_u_is_prio(io_u));
 			io_u_mark_latency(td, llnsec);
 		}
 
@@ -2091,7 +2091,7 @@ void io_u_queued(struct thread_data *td, struct io_u *io_u)
 			td = td->parent;
 
 		add_slat_sample(td, io_u->ddir, slat_time, io_u->xfer_buflen,
-				io_u->offset);
+				io_u->offset, io_u_is_prio(io_u));
 	}
 }
 
diff --git a/io_u.h b/io_u.h
index e75993bd..0f63cdd0 100644
--- a/io_u.h
+++ b/io_u.h
@@ -24,6 +24,7 @@ enum {
 	IO_U_F_TRIMMED		= 1 << 5,
 	IO_U_F_BARRIER		= 1 << 6,
 	IO_U_F_VER_LIST		= 1 << 7,
+	IO_U_F_PRIORITY		= 1 << 8,
 };
 
 /*
@@ -193,5 +194,7 @@ static inline enum fio_ddir acct_ddir(struct io_u *io_u)
 	td_flags_clear((td), &(io_u->flags), (val))
 #define io_u_set(td, io_u, val)		\
 	td_flags_set((td), &(io_u)->flags, (val))
+#define io_u_is_prio(io_u)	\
+	(((io_u)->flags & (unsigned int) IO_U_F_PRIORITY) != 0)
 
 #endif
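`IO_U_F_PRIORITY` is an ordinary flag bit. io_u.c clears it together with the other per-I/O flags when a unit is taken off the free list, so a recycled io_u does not inherit the previous request's priority. A minimal sketch of that set/clear/test pattern, using hypothetical names and a static inline function instead of a macro so the test is safe inside larger expressions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of two io_u flag bits. */
enum {
	U_F_FREE     = 1 << 0,
	U_F_PRIORITY = 1 << 8,
};

struct unit {
	unsigned int flags;
};

/* Equivalent of io_u_is_prio(), as a function. */
static inline bool unit_is_prio(const struct unit *u)
{
	return (u->flags & (unsigned int)U_F_PRIORITY) != 0;
}

/* On reuse, drop both the free and the priority bits so the next I/O
 * starts without a stale priority marking. */
static void unit_recycle(struct unit *u)
{
	u->flags &= ~(unsigned int)(U_F_FREE | U_F_PRIORITY);
}
```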
diff --git a/ioengines.c b/ioengines.c
index b9200ba9..2c7a0df9 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -318,6 +318,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 					sizeof(io_u->issue_time));
 	}
 
+
 	if (ddir_rw(ddir)) {
 		if (!(io_u->flags & IO_U_F_VER_LIST)) {
 			td->io_issues[ddir]++;
diff --git a/iolog.c b/iolog.c
index b72dcf97..917a446c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -896,18 +896,18 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 		s = __get_sample(samples, log_offset, i);
 
 		if (!log_offset) {
-			fprintf(f, "%lu, %" PRId64 ", %u, %llu\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %u\n",
 					(unsigned long) s->time,
 					s->data.val,
-					io_sample_ddir(s), (unsigned long long) s->bs);
+					io_sample_ddir(s), (unsigned long long) s->bs, s->priority_bit);
 		} else {
 			struct io_sample_offset *so = (void *) s;
 
-			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %llu\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %llu, %u\n",
 					(unsigned long) s->time,
 					s->data.val,
 					io_sample_ddir(s), (unsigned long long) s->bs,
-					(unsigned long long) so->offset);
+					(unsigned long long) so->offset, s->priority_bit);
 		}
 	}
 }
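With this change every log line gains a trailing priority column: `time, value, ddir, bs, prio` without offsets, or `time, value, ddir, bs, offset, prio` with `log_offset`. Scripts that expect exactly four or five columns will need updating. A minimal parser for the non-offset form, assuming the field order of the fprintf() above (`struct log_row` and `parse_log_row()` are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical row matching the non-offset format written by flush_samples(). */
struct log_row {
	unsigned long time;
	long long val;
	unsigned int ddir;
	unsigned long long bs;
	unsigned int prio;
};

/* Returns 1 if line holds all five columns, 0 otherwise. */
static int parse_log_row(const char *line, struct log_row *row)
{
	assert(line != NULL && row != NULL);
	return sscanf(line, "%lu, %lld, %u, %llu, %u",
		      &row->time, &row->val, &row->ddir,
		      &row->bs, &row->prio) == 5;
}
```

A line in the old four-column format is rejected rather than silently misread.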
diff --git a/iolog.h b/iolog.h
index 17be908f..981081f9 100644
--- a/iolog.h
+++ b/iolog.h
@@ -42,6 +42,7 @@ struct io_sample {
 	uint64_t time;
 	union io_sample_data data;
 	uint32_t __ddir;
+	uint8_t priority_bit;
 	uint64_t bs;
 };
 
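The new `priority_bit` is a single byte placed right after the 32-bit `__ddir`. On common 64-bit ABIs that byte lands in what was previously padding before the 8-byte-aligned `bs`, so `struct io_sample` keeps its size. The sketch below checks this with hypothetical mirror structs. It assumes `union io_sample_data` is 8 bytes, and the equality only holds where `uint64_t` members are 8-byte aligned; on i386 the struct grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte stand-in for union io_sample_data. */
union sample_data {
	uint64_t val;
	void *ptr;
};

struct sample_old {
	uint64_t time;
	union sample_data data;
	uint32_t ddir;
	uint64_t bs;
};

struct sample_new {
	uint64_t time;
	union sample_data data;
	uint32_t ddir;
	uint8_t priority_bit;	/* sits where padding used to be on LP64 */
	uint64_t bs;
};

/* Non-zero when adding the byte did not change the struct size. */
static int priority_bit_is_free(void)
{
	return sizeof(struct sample_new) == sizeof(struct sample_old);
}
```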
diff --git a/server.c b/server.c
index 1a070e56..a5af5e74 100644
--- a/server.c
+++ b/server.c
@@ -1262,7 +1262,7 @@ static int handle_connection(struct sk_out *sk_out)
 	_exit(ret);
 }
 
-/* get the address on this host bound by the input socket, 
+/* get the address on this host bound by the input socket,
  * whether it is ipv6 or ipv4 */
 
 static int get_my_addr_str(int sk)
@@ -1574,6 +1574,15 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.cachehit		= cpu_to_le64(ts->cachehit);
 	p.ts.cachemiss		= cpu_to_le64(ts->cachemiss);
 
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
+			p.ts.io_u_plat_high_prio[i][j] = cpu_to_le64(ts->io_u_plat_high_prio[i][j]);
+			p.ts.io_u_plat_prio[i][j] = cpu_to_le64(ts->io_u_plat_prio[i][j]);
+		}
+		convert_io_stat(&p.ts.clat_high_prio_stat[i], &ts->clat_high_prio_stat[i]);
+		convert_io_stat(&p.ts.clat_prio_stat[i], &ts->clat_prio_stat[i]);
+	}
+
 	convert_gs(&p.rs, rs);
 
 	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
@@ -1998,7 +2007,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 
 			s->time		= cpu_to_le64(s->time);
 			s->data.val	= cpu_to_le64(s->data.val);
-			s->__ddir	= cpu_to_le32(s->__ddir);
+			s->__ddir	= __cpu_to_le32(s->__ddir);
 			s->bs		= cpu_to_le64(s->bs);
 
 			if (log->log_offset) {
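The new server.c loops convert every histogram bucket and stat field with cpu_to_le64() before sending, so client and server agree on byte order. Because the wire layout of the thread stats changed, FIO_SERVER_VER is bumped from 80 to 81 in server.h. cpu_to_le64() is a no-op on little-endian hosts and a byte swap on big-endian ones. The sketch below reaches the same little-endian wire format with portable shifts; put_le64(), get_le64() and put_le64_array() are hypothetical helpers, not fio functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v as 8 little-endian bytes, independent of host byte order. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Inverse of put_le64(). */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}

/* Encode a row of histogram buckets; dst must hold nr * 8 bytes. */
static void put_le64_array(uint8_t *dst, const uint64_t *src, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		put_le64(dst + 8 * i, src[i]);
}
```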
diff --git a/server.h b/server.h
index de1d7f9b..6ac75366 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 80,
+	FIO_SERVER_VER			= 81,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 4d3b728c..9d93dcd1 100644
--- a/stat.c
+++ b/stat.c
@@ -482,9 +482,12 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat("clat", min, max, mean, dev, out);
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
+	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev))
+		display_lat("prio_clat", min, max, mean, dev, out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
 		const char *name = ts->clat_percentiles ? "clat" : " lat";
+		char prio_name[32];
 		uint64_t samples;
 
 		if (ts->clat_percentiles)
@@ -496,6 +499,27 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 					samples,
 					ts->percentile_list,
 					ts->percentile_precision, name, out);
+
+		/* Only print this if some high and low priority stats were collected */
+		if (ts->clat_high_prio_stat[ddir].samples > 0 &&
+			ts->clat_prio_stat[ddir].samples > 0)
+		{
+			sprintf(prio_name, "high prio (%.2f%%) %s",
+					100. * (double) ts->clat_high_prio_stat[ddir].samples / (double) samples,
+					name);
+			show_clat_percentiles(ts->io_u_plat_high_prio[ddir],
+						ts->clat_high_prio_stat[ddir].samples,
+						ts->percentile_list,
+						ts->percentile_precision, prio_name, out);
+
+			sprintf(prio_name, "low prio (%.2f%%) %s",
+					100. * (double) ts->clat_prio_stat[ddir].samples / (double) samples,
+					name);
+			show_clat_percentiles(ts->io_u_plat_prio[ddir],
+						ts->clat_prio_stat[ddir].samples,
+						ts->percentile_list,
+						ts->percentile_precision, prio_name, out);
+		}
 	}
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
@@ -1009,7 +1033,7 @@ static void show_thread_status_normal(struct thread_stat *ts,
 
 	if (!ddir_rw_sum(ts->io_bytes) && !ddir_rw_sum(ts->total_io_u))
 		return;
-		
+
 	memset(time_buf, 0, sizeof(time_buf));
 
 	time(&time_p);
@@ -1335,6 +1359,112 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		}
 	}
 
+
+	/* Only print PRIO latencies if some high priority samples were gathered */
+	if (ts->clat_high_prio_stat[ddir].samples > 0) {
+		/* START OF HIGH PRIO CLAT */
+		if (!calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
+			min = max = 0;
+			mean = dev = 0.0;
+		}
+		tmp_object = json_create_object();
+		json_object_add_value_object(dir_object, "clat_prio",
+				tmp_object);
+		json_object_add_value_int(tmp_object, "samples",
+				ts->clat_high_prio_stat[ddir].samples);
+		json_object_add_value_int(tmp_object, "min", min);
+		json_object_add_value_int(tmp_object, "max", max);
+		json_object_add_value_float(tmp_object, "mean", mean);
+		json_object_add_value_float(tmp_object, "stddev", dev);
+
+		if (ts->clat_percentiles) {
+			len = calc_clat_percentiles(ts->io_u_plat_high_prio[ddir],
+						ts->clat_high_prio_stat[ddir].samples,
+						ts->percentile_list, &ovals, &maxv,
+						&minv);
+		} else
+			len = 0;
+
+		percentile_object = json_create_object();
+		json_object_add_value_object(tmp_object, "percentile", percentile_object);
+		for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
+			if (i >= len) {
+				json_object_add_value_int(percentile_object, "0.00", 0);
+				continue;
+			}
+			snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
+			json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
+		}
+
+		if (output_format & FIO_OUTPUT_JSON_PLUS) {
+			clat_bins_object = json_create_object();
+			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+			for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
+				snprintf(buf, sizeof(buf), "%d", i);
+				json_object_add_value_int(clat_bins_object, (const char *)buf,
+						ts->io_u_plat_high_prio[ddir][i]);
+			}
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_BITS",
+					FIO_IO_U_PLAT_BITS);
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_VAL",
+					FIO_IO_U_PLAT_VAL);
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_NR",
+					FIO_IO_U_PLAT_NR);
+		}
+		/* END OF HIGH PRIO CLAT */
+
+		/* START OF PRIO CLAT */
+		if (!calc_lat(&ts->clat_prio_stat[ddir], &min, &max, &mean, &dev)) {
+			min = max = 0;
+			mean = dev = 0.0;
+		}
+		tmp_object = json_create_object();
+		json_object_add_value_object(dir_object, "clat_low_prio",
+				tmp_object);
+		json_object_add_value_int(tmp_object, "samples",
+				ts->clat_prio_stat[ddir].samples);
+		json_object_add_value_int(tmp_object, "min", min);
+		json_object_add_value_int(tmp_object, "max", max);
+		json_object_add_value_float(tmp_object, "mean", mean);
+		json_object_add_value_float(tmp_object, "stddev", dev);
+
+		if (ts->clat_percentiles) {
+			len = calc_clat_percentiles(ts->io_u_plat_prio[ddir],
+						ts->clat_prio_stat[ddir].samples,
+						ts->percentile_list, &ovals, &maxv,
+						&minv);
+		} else
+			len = 0;
+
+		percentile_object = json_create_object();
+		json_object_add_value_object(tmp_object, "percentile", percentile_object);
+		for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
+			if (i >= len) {
+				json_object_add_value_int(percentile_object, "0.00", 0);
+				continue;
+			}
+			snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
+			json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
+		}
+
+		if (output_format & FIO_OUTPUT_JSON_PLUS) {
+			clat_bins_object = json_create_object();
+			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+			for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
+				snprintf(buf, sizeof(buf), "%d", i);
+				json_object_add_value_int(clat_bins_object, (const char *)buf,
+						ts->io_u_plat_prio[ddir][i]);
+			}
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_BITS",
+					FIO_IO_U_PLAT_BITS);
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_VAL",
+					FIO_IO_U_PLAT_VAL);
+			json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_NR",
+					FIO_IO_U_PLAT_NR);
+		}
+		/* END OF PRIO CLAT */
+	}
+
 	if (!ddir_rw(ddir))
 		return;
 
@@ -1856,6 +1986,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (!dst->unified_rw_rep) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
+			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], first, false);
+			sum_stat(&dst->clat_prio_stat[l], &src->clat_prio_stat[l], first, false);
 			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first, false);
 			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first, false);
 			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first, true);
@@ -1867,6 +1999,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 				dst->runtime[l] = src->runtime[l];
 		} else {
 			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], first, false);
+			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], first, false);
+			sum_stat(&dst->clat_prio_stat[0], &src->clat_prio_stat[l], first, false);
 			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first, false);
 			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first, false);
 			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first, true);
@@ -1926,10 +2060,16 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		int m;
 
 		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
-			if (!dst->unified_rw_rep)
+			if (!dst->unified_rw_rep) {
 				dst->io_u_plat[k][m] += src->io_u_plat[k][m];
-			else
+				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
+				dst->io_u_plat_prio[k][m] += src->io_u_plat_prio[k][m];
+			} else {
 				dst->io_u_plat[0][m] += src->io_u_plat[k][m];
+				dst->io_u_plat_high_prio[0][m] += src->io_u_plat_high_prio[k][m];
+				dst->io_u_plat_prio[0][m] += src->io_u_plat_prio[k][m];
+			}
+
 		}
 	}
 
@@ -1962,6 +2102,8 @@ void init_thread_stat(struct thread_stat *ts)
 		ts->slat_stat[j].min_val = -1UL;
 		ts->bw_stat[j].min_val = -1UL;
 		ts->iops_stat[j].min_val = -1UL;
+		ts->clat_high_prio_stat[j].min_val = -1UL;
+		ts->clat_prio_stat[j].min_val = -1UL;
 	}
 	ts->sync_stat.min_val = -1UL;
 	ts->groupid = -1;
@@ -2542,7 +2684,7 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 
 static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 			     enum fio_ddir ddir, unsigned long long bs,
-			     unsigned long t, uint64_t offset)
+			     unsigned long t, uint64_t offset, uint8_t priority_bit)
 {
 	struct io_logs *cur_log;
 
@@ -2561,6 +2703,7 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 		s->time = t + (iolog->td ? iolog->td->unix_epoch : 0);
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
+		s->priority_bit = priority_bit;
 
 		if (iolog->log_offset) {
 			struct io_sample_offset *so = (void *) s;
@@ -2588,6 +2731,8 @@ void reset_io_stats(struct thread_data *td)
 	int i, j;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		reset_io_stat(&ts->clat_high_prio_stat[i]);
+		reset_io_stat(&ts->clat_prio_stat[i]);
 		reset_io_stat(&ts->clat_stat[i]);
 		reset_io_stat(&ts->slat_stat[i]);
 		reset_io_stat(&ts->lat_stat[i]);
@@ -2602,6 +2747,8 @@ void reset_io_stats(struct thread_data *td)
 
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
 			ts->io_u_plat[i][j] = 0;
+			ts->io_u_plat_high_prio[i][j] = 0;
+			ts->io_u_plat_prio[i][j] = 0;
 			if (!i)
 				ts->io_u_sync_plat[j] = 0;
 		}
@@ -2629,7 +2776,7 @@ void reset_io_stats(struct thread_data *td)
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
-			      unsigned long elapsed, bool log_max)
+			      unsigned long elapsed, bool log_max, uint8_t priority_bit)
 {
 	/*
 	 * Note an entry in the log. Use the mean from the logged samples,
@@ -2644,26 +2791,26 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 		else
 			data.val = iolog->avg_window[ddir].mean.u.f + 0.50;
 
-		__add_log_sample(iolog, data, ddir, 0, elapsed, 0);
+		__add_log_sample(iolog, data, ddir, 0, elapsed, 0, priority_bit);
 	}
 
 	reset_io_stat(&iolog->avg_window[ddir]);
 }
 
 static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
-			     bool log_max)
+			     bool log_max, uint8_t priority_bit)
 {
 	int ddir;
 
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
-		__add_stat_to_log(iolog, ddir, elapsed, log_max);
+		__add_stat_to_log(iolog, ddir, elapsed, log_max, priority_bit);
 }
 
 static unsigned long add_log_sample(struct thread_data *td,
 				    struct io_log *iolog,
 				    union io_sample_data data,
 				    enum fio_ddir ddir, unsigned long long bs,
-				    uint64_t offset)
+				    uint64_t offset, uint8_t priority_bit)
 {
 	unsigned long elapsed, this_window;
 
@@ -2676,7 +2823,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 	 * If no time averaging, just add the log sample.
 	 */
 	if (!iolog->avg_msec) {
-		__add_log_sample(iolog, data, ddir, bs, elapsed, offset);
+		__add_log_sample(iolog, data, ddir, bs, elapsed, offset, priority_bit);
 		return 0;
 	}
 
@@ -2700,7 +2847,7 @@ static unsigned long add_log_sample(struct thread_data *td,
 			return diff;
 	}
 
-	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0);
+	_add_stat_to_log(iolog, elapsed, td->o.log_max != 0, priority_bit);
 
 	iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
 	return iolog->avg_msec;
@@ -2713,18 +2860,19 @@ void finalize_logs(struct thread_data *td, bool unit_logs)
 	elapsed = mtime_since_now(&td->epoch);
 
 	if (td->clat_log && unit_logs)
-		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0, 0);
 	if (td->slat_log && unit_logs)
-		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0, 0);
 	if (td->lat_log && unit_logs)
-		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0, 0);
 	if (td->bw_log && (unit_logs == per_unit_log(td->bw_log)))
-		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0, 0);
 	if (td->iops_log && (unit_logs == per_unit_log(td->iops_log)))
-		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
+		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0, 0);
 }
 
-void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long long bs)
+void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long long bs,
+					uint8_t priority_bit)
 {
 	struct io_log *iolog;
 
@@ -2732,7 +2880,7 @@ void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long
 		return;
 
 	iolog = agg_io_log[ddir];
-	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0);
+	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0, priority_bit);
 }
 
 void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
@@ -2745,17 +2893,23 @@ void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
 }
 
 static void add_clat_percentile_sample(struct thread_stat *ts,
-				unsigned long long nsec, enum fio_ddir ddir)
+				unsigned long long nsec, enum fio_ddir ddir, uint8_t priority_bit)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 	assert(idx < FIO_IO_U_PLAT_NR);
 
 	ts->io_u_plat[ddir][idx]++;
+
+	if (!priority_bit) {
+		ts->io_u_plat_prio[ddir][idx]++;
+	} else {
+		ts->io_u_plat_high_prio[ddir][idx]++;
+	}
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long long nsec, unsigned long long bs,
-		     uint64_t offset)
+		     uint64_t offset, uint8_t priority_bit)
 {
 	const bool needs_lock = td_async_processing(td);
 	unsigned long elapsed, this_window;
@@ -2767,12 +2921,19 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
+	if (priority_bit) {
+		add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
+	} else {
+		add_stat_sample(&ts->clat_prio_stat[ddir], nsec);
+	}
+
 	if (td->clat_log)
 		add_log_sample(td, td->clat_log, sample_val(nsec), ddir, bs,
-			       offset);
+			       offset, priority_bit);
 
-	if (ts->clat_percentiles)
-		add_clat_percentile_sample(ts, nsec, ddir);
+	if (ts->clat_percentiles) {
+		add_clat_percentile_sample(ts, nsec, ddir, priority_bit);
+	}
 
 	if (iolog && iolog->hist_msec) {
 		struct io_hist *hw = &iolog->hist_window[ddir];
@@ -2782,7 +2943,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		if (!hw->hist_last)
 			hw->hist_last = elapsed;
 		this_window = elapsed - hw->hist_last;
-		
+
 		if (this_window >= iolog->hist_msec) {
 			uint64_t *io_u_plat;
 			struct io_u_plat_entry *dst;
@@ -2800,7 +2961,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 				FIO_IO_U_PLAT_NR * sizeof(uint64_t));
 			flist_add(&dst->list, &hw->list);
 			__add_log_sample(iolog, sample_plat(dst), ddir, bs,
-						elapsed, offset);
+						elapsed, offset, priority_bit);
 
 			/*
 			 * Update the last time we recorded as being now, minus
@@ -2817,7 +2978,8 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
-		     unsigned long usec, unsigned long long bs, uint64_t offset)
+			unsigned long usec, unsigned long long bs, uint64_t offset,
+			uint8_t priority_bit)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -2831,7 +2993,8 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->slat_stat[ddir], usec);
 
 	if (td->slat_log)
-		add_log_sample(td, td->slat_log, sample_val(usec), ddir, bs, offset);
+		add_log_sample(td, td->slat_log, sample_val(usec), ddir, bs, offset,
+			priority_bit);
 
 	if (needs_lock)
 		__td_io_u_unlock(td);
@@ -2839,7 +3002,7 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		    unsigned long long nsec, unsigned long long bs,
-		    uint64_t offset)
+		    uint64_t offset, uint8_t priority_bit)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -2854,10 +3017,10 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	if (td->lat_log)
 		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
-			       offset);
+			       offset, priority_bit);
 
 	if (ts->lat_percentiles)
-		add_clat_percentile_sample(ts, nsec, ddir);
+		add_clat_percentile_sample(ts, nsec, ddir, priority_bit);
 
 	if (needs_lock)
 		__td_io_u_unlock(td);
@@ -2882,7 +3045,7 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 
 	if (td->bw_log)
 		add_log_sample(td, td->bw_log, sample_val(rate), io_u->ddir,
-			       bytes, io_u->offset);
+			       bytes, io_u->offset, io_u_is_prio(io_u));
 
 	td->stat_io_bytes[io_u->ddir] = td->this_io_bytes[io_u->ddir];
 
@@ -2936,7 +3099,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			next = add_log_sample(td, log, sample_val(rate), ddir, bs, 0);
+			next = add_log_sample(td, log, sample_val(rate), ddir, bs, 0, 0);
 			next_log = min(next_log, next);
 		}
 
@@ -2976,7 +3139,7 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 
 	if (td->iops_log)
 		add_log_sample(td, td->iops_log, sample_val(1), io_u->ddir,
-			       bytes, io_u->offset);
+			       bytes, io_u->offset, io_u_is_prio(io_u));
 
 	td->stat_io_blocks[io_u->ddir] = td->this_io_blocks[io_u->ddir];
 
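After this change, add_clat_percentile_sample() bumps the combined bucket and also exactly one of the high- or low-priority buckets. The two split histograms therefore always sum to the combined one, and the "high prio (x.xx%)" label is a share of the total sample count. A minimal sketch with a simplified linear bucketing; NR_BUCKETS and bucket_of() are hypothetical, and fio's FIO_IO_U_PLAT_NR log-linear bins differ.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BUCKETS 16	/* hypothetical bucket count */

struct prio_hist {
	uint64_t all[NR_BUCKETS];
	uint64_t high[NR_BUCKETS];
	uint64_t low[NR_BUCKETS];
};

/* Hypothetical linear bucketing: 1 us per bucket, last bucket catches the rest. */
static unsigned int bucket_of(unsigned long long nsec)
{
	unsigned long long idx = nsec / 1000;

	return idx < NR_BUCKETS ? (unsigned int)idx : NR_BUCKETS - 1;
}

/* Count a sample in the combined histogram and in one priority histogram. */
static void hist_add(struct prio_hist *h, unsigned long long nsec, uint8_t prio)
{
	unsigned int idx = bucket_of(nsec);

	assert(idx < NR_BUCKETS);
	h->all[idx]++;
	if (prio)
		h->high[idx]++;
	else
		h->low[idx]++;
}

static uint64_t hist_total(const uint64_t *b)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NR_BUCKETS; i++)
		sum += b[i];
	return sum;
}

/* Percentage of all samples that fall in part, as in "high prio (25.00%)". */
static double hist_share(const uint64_t *part, const uint64_t *all)
{
	uint64_t total = hist_total(all);

	return total ? 100.0 * (double)hist_total(part) / (double)total : 0.0;
}
```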
diff --git a/stat.h b/stat.h
index 2ce91ff0..9320c6bd 100644
--- a/stat.h
+++ b/stat.h
@@ -239,6 +239,11 @@ struct thread_stat {
 	fio_fp64_t ss_deviation;
 	fio_fp64_t ss_criterion;
 
+	uint64_t io_u_plat_high_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] __attribute__((aligned(8)));
+	uint64_t io_u_plat_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
+	struct io_stat clat_high_prio_stat[DDIR_RWDIR_CNT] __attribute__((aligned(8)));
+	struct io_stat clat_prio_stat[DDIR_RWDIR_CNT];
+
 	union {
 		uint64_t *ss_iops_data;
 		uint64_t pad4;
@@ -323,12 +328,13 @@ extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
 extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned long long, uint64_t);
+				unsigned long long, uint64_t, uint8_t);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned long long, uint64_t);
+				unsigned long long, uint64_t, uint8_t);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
-				unsigned long long, uint64_t);
-extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long);
+				unsigned long long, uint64_t, uint8_t);
+extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long bs,
+				uint8_t priority_bit);
 extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
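The new clat_high_prio_stat and clat_prio_stat entries are ordinary io_stat accumulators. That is why init.c and init_thread_stat() start their min_val at ULONG_MAX (-1UL): the first real sample is always smaller. A simplified stand-in with a running mean and sum of squared deviations in the style of Welford's method follows. struct run_stat and its functions are hypothetical, not fio's add_stat_sample().

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, simplified counterpart of struct io_stat. */
struct run_stat {
	unsigned long min_val, max_val;
	unsigned long long samples;
	double mean;
	double S;	/* running sum of squared deviations */
};

static void run_stat_init(struct run_stat *s)
{
	s->min_val = ULONG_MAX;	/* any real sample will be smaller */
	s->max_val = 0;
	s->samples = 0;
	s->mean = 0.0;
	s->S = 0.0;
}

/* Update min/max and the running mean and S with one sample. */
static void run_stat_add(struct run_stat *s, unsigned long val)
{
	double delta = (double)val - s->mean;

	if (val < s->min_val)
		s->min_val = val;
	if (val > s->max_val)
		s->max_val = val;
	s->samples++;
	s->mean += delta / (double)s->samples;
	s->S += delta * ((double)val - s->mean);
}

/* Sample variance; 0 for fewer than two samples. */
static double run_stat_variance(const struct run_stat *s)
{
	return s->samples > 1 ? s->S / (double)(s->samples - 1) : 0.0;
}
```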



* Recent changes (master)
@ 2020-01-19 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-19 13:00 UTC
  To: fio

The following changes since commit e08e2dd7b77f99e4bb904fc1df2395c2fe2ffbbe:

  Merge branch 'fix_verify_push' of https://github.com/gwendalcr/fio (2020-01-16 15:44:44 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d9b7596a1fad5adf7f6731d067e1513c56eabe96:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-18 09:35:25 -0700)

----------------------------------------------------------------
Bart Van Assche (8):
      stat: Remove two superfluous casts
      stat: Remove more superfluous casts
      stat: Remove several superfluous if-tests
      stat: Fix a memory leak in add_ddir_status_json()
      stat: Fix another memory leak in add_ddir_status_json()
      lib/memcpy: Suppress a Coverity leak report for setup_tests()
      pmemblk: Fix a memory leak
      client: Fix memory leaks in handle_job_opt()

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 client.c          |  4 ++++
 engines/pmemblk.c |  2 ++
 lib/memcpy.c      |  3 +++
 stat.c            | 43 ++++++++++++++++++++++---------------------
 4 files changed, 31 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 55d89a0e..93bca5df 100644
--- a/client.c
+++ b/client.c
@@ -1157,6 +1157,10 @@ static void handle_job_opt(struct fio_client *client, struct fio_net_cmd *cmd)
 		struct flist_head *opt_list = &client->opt_lists[pdu->groupid];
 
 		flist_add_tail(&p->list, opt_list);
+	} else {
+		free(p->value);
+		free(p->name);
+		free(p);
 	}
 }
 
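The client.c fix follows a simple ownership rule: once handle_job_opt() has built an option node, every path must either hand it to a list or free it. A minimal sketch of that rule follows. struct opt, opt_new(), opt_store() and the "group index in range" condition are hypothetical, and the sketch prepends rather than using fio's flist_add_tail().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option node. */
struct opt {
	char *name;
	char *value;
	struct opt *next;
};

/* free(NULL) is a no-op, so a partially built node is safe to pass. */
static void opt_free(struct opt *p)
{
	free(p->value);
	free(p->name);
	free(p);
}

static struct opt *opt_new(const char *name, const char *value)
{
	struct opt *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->name = strdup(name);
	p->value = strdup(value);
	if (!p->name || !p->value) {
		opt_free(p);
		return NULL;
	}
	return p;
}

/* Takes ownership of p: links it into lists[group] when group is valid,
 * otherwise frees it so nothing leaks. Returns true if p was kept. */
static bool opt_store(struct opt **lists, size_t nr_lists, size_t group,
		      struct opt *p)
{
	assert(p != NULL);
	if (group < nr_lists) {
		p->next = lists[group];
		lists[group] = p;
		return true;
	}
	opt_free(p);
	return false;
}
```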
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 45f6fb65..730f4d77 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -226,6 +226,8 @@ static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
 
 	pthread_mutex_unlock(&CacheLock);
 
+	free(path);
+
 	return pmb;
 
 error:
diff --git a/lib/memcpy.c b/lib/memcpy.c
index cf8572e2..a5521343 100644
--- a/lib/memcpy.c
+++ b/lib/memcpy.c
@@ -201,6 +201,9 @@ static int setup_tests(void)
 	void *src, *dst;
 	int i;
 
+	if (!tests[0].name)
+		return 0;
+
 	src = malloc(BUF_SIZE);
 	dst = malloc(BUF_SIZE);
 	if (!src || !dst) {
diff --git a/stat.c b/stat.c
index 55d83fcc..4d3b728c 100644
--- a/stat.c
+++ b/stat.c
@@ -159,7 +159,7 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	 * isn't a worry. Also note that this does not work for NaN values.
 	 */
 	if (len > 1)
-		qsort((void *)plist, len, sizeof(plist[0]), double_cmp);
+		qsort(plist, len, sizeof(plist[0]), double_cmp);
 
 	ovals = malloc(len * sizeof(*ovals));
 	if (!ovals)
@@ -259,8 +259,7 @@ static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	}
 
 out:
-	if (ovals)
-		free(ovals);
+	free(ovals);
 }
 
 bool calc_lat(struct io_stat *is, unsigned long long *min,
@@ -684,7 +683,7 @@ static int calc_block_percentiles(int nr_block_infos, uint32_t *block_infos,
 	 * isn't a worry. Also note that this does not work for NaN values.
 	 */
 	if (len > 1)
-		qsort((void *)plist, len, sizeof(plist[0]), double_cmp);
+		qsort(plist, len, sizeof(plist[0]), double_cmp);
 
 	/* Start only after the uninit entries end */
 	for (nr_uninit = 0;
@@ -1168,8 +1167,7 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	else
 		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
-	if (ovals)
-		free(ovals);
+	free(ovals);
 
 	bw_stat = calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev);
 	if (bw_stat) {
@@ -1208,7 +1206,8 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	double mean, dev, iops;
 	unsigned int len;
 	int i;
-	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object = NULL;
+	struct json_object *dir_object, *tmp_object, *percentile_object = NULL,
+		*clat_bins_object = NULL;
 	char buf[120];
 	double p_of_agg = 100.0;
 
@@ -1303,29 +1302,34 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	} else
 		len = 0;
 
-	percentile_object = json_create_object();
-	if (ts->clat_percentiles)
+	if (ts->clat_percentiles) {
+		percentile_object = json_create_object();
 		json_object_add_value_object(tmp_object, "percentile", percentile_object);
-	for (i = 0; i < len; i++) {
-		snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
-		json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
+		for (i = 0; i < len; i++) {
+			snprintf(buf, sizeof(buf), "%f",
+				 ts->percentile_list[i].u.f);
+			json_object_add_value_int(percentile_object, buf,
+						  ovals[i]);
+		}
 	}
 
-	if (output_format & FIO_OUTPUT_JSON_PLUS) {
+	free(ovals);
+
+	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->clat_percentiles) {
 		clat_bins_object = json_create_object();
-		if (ts->clat_percentiles)
-			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+		json_object_add_value_object(tmp_object, "bins",
+					     clat_bins_object);
 
 		for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
 			if (ddir_rw(ddir)) {
 				if (ts->io_u_plat[ddir][i]) {
 					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
-					json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_plat[ddir][i]);
+					json_object_add_value_int(clat_bins_object, buf, ts->io_u_plat[ddir][i]);
 				}
 			} else {
 				if (ts->io_u_sync_plat[i]) {
 					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
-					json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_sync_plat[i]);
+					json_object_add_value_int(clat_bins_object, buf, ts->io_u_sync_plat[i]);
 				}
 			}
 		}
@@ -1349,9 +1353,6 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->lat_percentiles)
 		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
 
-	if (ovals)
-		free(ovals);
-
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		if (rs->agg[ddir]) {
 			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
@@ -1661,7 +1662,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 				snprintf(buf, sizeof(buf), "%f",
 					 ts->percentile_list[i].u.f);
 				json_object_add_value_int(percentile_object,
-							  (const char *)buf,
+							  buf,
 							  percentiles[i]);
 			}
 


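The stat.c cleanup above replaces `if (ovals) free(ovals);` with a bare `free(ovals);`. This is safe because the C standard defines `free(NULL)` as a no-op. Below is a minimal sketch of the same pattern. `collect_values()` and `sum_and_release()` are hypothetical helpers standing in for fio's percentile code, not fio functions.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd array of n values, or NULL when
 * n is 0 (similar to how the percentile code may leave ovals unset). */
static unsigned long long *collect_values(size_t n)
{
	unsigned long long *vals;
	size_t i;

	if (!n)
		return NULL;
	vals = malloc(n * sizeof(*vals));
	if (!vals)
		return NULL;
	for (i = 0; i < n; i++)
		vals[i] = i * 10;
	return vals;
}

/* Sum the values, then release them. No NULL check is needed before
 * free(): passing NULL to free() performs no action. */
static unsigned long long sum_and_release(size_t n)
{
	unsigned long long *vals = collect_values(n);
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; vals && i < n; i++)
		sum += vals[i];

	free(vals);
	return sum;
}
```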
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2020-01-17 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3f1e3af7cff07d4aedcd9a58ae00cb1a2189fcc2:

  engines/io_uring: use fixed opcodes for pre-mapped buffers (2020-01-14 14:27:22 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e08e2dd7b77f99e4bb904fc1df2395c2fe2ffbbe:

  Merge branch 'fix_verify_push' of https://github.com/gwendalcr/fio (2020-01-16 15:44:44 -0700)

----------------------------------------------------------------
Andrey Kuzmin (1):
      Use aux_path, if set, when loading verify state

Bart Van Assche (7):
      blktrace: Pass a positive error code to td_verror()
      blktrace: Check value of 'merge_buf' pointer before using it
      blktrace: Fix memory leaks in error paths
      server: Make it explicit that the setsockopt() return value is ignored
      t/memlock: Verify 'threads' argument
      t/read-to-pipe-async: Do not divide by zero
      t/read-to-pipe-async: Complain if pthread_detach() fails

Gwendal Grignou (1):
      verify: Fix test to not check for numberio when verify_only is true

Jens Axboe (3):
      Merge branch 'issue-825' of https://github.com/LeaflessMelospiza/fio
      Merge branch 'master' of https://github.com/bvanassche/fio
      Merge branch 'fix_verify_push' of https://github.com/gwendalcr/fio

LeaflessMelospiza (1):
      Moved diskutil reporting functions to stat.c

 backend.c              |  12 ++-
 blktrace.c             |  23 ++++--
 diskutil.c             | 217 -------------------------------------------------
 diskutil.h             |  12 ---
 server.c               |   3 +-
 stat.c                 | 212 +++++++++++++++++++++++++++++++++++++++++++++++
 stat.h                 |   8 +-
 t/memlock.c            |   4 +
 t/read-to-pipe-async.c |  16 ++--
 verify.c               |   6 +-
 10 files changed, 264 insertions(+), 249 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 0d1f4734..936203dc 100644
--- a/backend.c
+++ b/backend.c
@@ -2120,8 +2120,16 @@ static int fio_verify_load_state(struct thread_data *td)
 					td->thread_number - 1, &data);
 		if (!ret)
 			verify_assign_state(td, data);
-	} else
-		ret = verify_load_state(td, "local");
+	} else {
+		char prefix[PATH_MAX];
+
+		if (aux_path)
+			sprintf(prefix, "%s%clocal", aux_path,
+					FIO_OS_PATH_SEPARATOR);
+		else
+			strcpy(prefix, "local");
+		ret = verify_load_state(td, prefix);
+	}
 
 	return ret;
 }
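The backend.c change loads verify state from `<aux_path><separator>local` when `aux_path` is set, and from `local` otherwise. The patch itself uses `sprintf()` into a `PATH_MAX` buffer. The sketch below builds the same prefix with `snprintf()` so that truncation can be detected. It is a variant, not what the patch does. `build_state_prefix()` is a hypothetical name, and `'/'` is assumed in place of fio's `FIO_OS_PATH_SEPARATOR`.

```c
#include <stdio.h>

/* Assumption: '/' stands in for fio's FIO_OS_PATH_SEPARATOR. */
#define PATH_SEP '/'

/*
 * Build the verify-state prefix: "<aux_path>/local" when an auxiliary
 * path is set, plain "local" otherwise. Returns the prefix length, or -1
 * if it would not fit in buf (snprintf reports the untruncated length).
 */
static int build_state_prefix(char *buf, size_t len, const char *aux_path)
{
	int n;

	if (aux_path)
		n = snprintf(buf, len, "%s%clocal", aux_path, PATH_SEP);
	else
		n = snprintf(buf, len, "%s", "local");

	if (n < 0 || (size_t) n >= len)
		return -1;
	return n;
}
```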
diff --git a/blktrace.c b/blktrace.c
index 8a246613..64a610a9 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -28,8 +28,11 @@ static int refill_fifo(struct thread_data *td, struct fifo *fifo, int fd)
 
 	ret = read(fd, buf, total);
 	if (ret < 0) {
-		td_verror(td, errno, "read blktrace file");
-		return -1;
+		int read_err = errno;
+
+		assert(read_err > 0);
+		td_verror(td, read_err, "read blktrace file");
+		return -read_err;
 	}
 
 	if (ret > 0)
@@ -486,7 +489,7 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 		}
 		ret = discard_pdu(td, fifo, fd, &t);
 		if (ret < 0) {
-			td_verror(td, ret, "blktrace lseek");
+			td_verror(td, -ret, "blktrace lseek");
 			goto err;
 		} else if (t.pdu_len != ret) {
 			log_err("fio: discarded %d of %d\n", ret, t.pdu_len);
@@ -663,7 +666,7 @@ read_skip:
 	    t_get_ddir(t) == DDIR_INVAL) {
 		ret = discard_pdu(td, bc->fifo, bc->fd, t);
 		if (ret < 0) {
-			td_verror(td, ret, "blktrace lseek");
+			td_verror(td, -ret, "blktrace lseek");
 			return ret;
 		} else if (t->pdu_len != ret) {
 			log_err("fio: discarded %d of %d\n", ret,
@@ -716,9 +719,11 @@ int merge_blktrace_iologs(struct thread_data *td)
 	/* setup output file */
 	merge_fp = fopen(td->o.merge_blktrace_file, "w");
 	merge_buf = malloc(128 * 1024);
+	if (!merge_buf)
+		goto err_out_file;
 	ret = setvbuf(merge_fp, merge_buf, _IOFBF, 128 * 1024);
 	if (ret)
-		goto err_out_file;
+		goto err_merge_buf;
 
 	/* setup input files */
 	str = ptr = strdup(td->o.read_iolog_file);
@@ -728,6 +733,7 @@ int merge_blktrace_iologs(struct thread_data *td)
 		if (bcs[i].fd < 0) {
 			log_err("fio: could not open file: %s\n", name);
 			ret = bcs[i].fd;
+			free(str);
 			goto err_file;
 		}
 		bcs[i].fifo = fifo_alloc(TRACE_FIFO_SIZE);
@@ -735,11 +741,13 @@ int merge_blktrace_iologs(struct thread_data *td)
 
 		if (!is_blktrace(name, &bcs[i].swap)) {
 			log_err("fio: file is not a blktrace: %s\n", name);
+			free(str);
 			goto err_file;
 		}
 
 		ret = read_trace(td, &bcs[i]);
 		if (ret < 0) {
+			free(str);
 			goto err_file;
 		} else if (!ret) {
 			merge_finish_file(bcs, i, &nr_logs);
@@ -755,7 +763,7 @@ int merge_blktrace_iologs(struct thread_data *td)
 		/* skip over the pdu */
 		ret = discard_pdu(td, bc->fifo, bc->fd, &bc->t);
 		if (ret < 0) {
-			td_verror(td, ret, "blktrace lseek");
+			td_verror(td, -ret, "blktrace lseek");
 			goto err_file;
 		} else if (bc->t.pdu_len != ret) {
 			log_err("fio: discarded %d of %d\n", ret,
@@ -781,10 +789,11 @@ err_file:
 		fifo_free(bcs[i].fifo);
 		close(bcs[i].fd);
 	}
+err_merge_buf:
+	free(merge_buf);
 err_out_file:
 	fflush(merge_fp);
 	fclose(merge_fp);
-	free(merge_buf);
 err_param:
 	free(bcs);
 
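In the blktrace.c fixes, `refill_fifo()` now returns `-errno` on a failed `read()`, and each caller passes `-ret` to `td_verror()`, which expects a positive error code. Below is a minimal sketch of that sign convention. `read_chunk()` and `read_error_code()` are hypothetical helpers, not fio functions.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd. On failure, return -errno so the error
 * code survives later libc calls that may overwrite errno.
 */
static ssize_t read_chunk(int fd, void *buf, size_t len)
{
	ssize_t ret = read(fd, buf, len);

	if (ret < 0)
		return -errno;
	return ret;
}

/* Caller side: flip the sign back to get a positive errno value, which
 * is what an error reporter like td_verror() expects. Returns 0 when the
 * read succeeded. */
static int read_error_code(int fd)
{
	char buf[16];
	ssize_t ret = read_chunk(fd, buf, sizeof(buf));

	return ret < 0 ? (int) -ret : 0;
}
```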
diff --git a/diskutil.c b/diskutil.c
index f0744015..6c6380bb 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -498,72 +498,6 @@ void init_disk_util(struct thread_data *td)
 		f->du = __init_disk_util(td, f);
 }
 
-static void show_agg_stats(struct disk_util_agg *agg, int terse,
-			   struct buf_output *out)
-{
-	if (!agg->slavecount)
-		return;
-
-	if (!terse) {
-		log_buf(out, ", aggrios=%llu/%llu, aggrmerge=%llu/%llu, "
-			 "aggrticks=%llu/%llu, aggrin_queue=%llu, "
-			 "aggrutil=%3.2f%%",
-			(unsigned long long) agg->ios[0] / agg->slavecount,
-			(unsigned long long) agg->ios[1] / agg->slavecount,
-			(unsigned long long) agg->merges[0] / agg->slavecount,
-			(unsigned long long) agg->merges[1] / agg->slavecount,
-			(unsigned long long) agg->ticks[0] / agg->slavecount,
-			(unsigned long long) agg->ticks[1] / agg->slavecount,
-			(unsigned long long) agg->time_in_queue / agg->slavecount,
-			agg->max_util.u.f);
-	} else {
-		log_buf(out, ";slaves;%llu;%llu;%llu;%llu;%llu;%llu;%llu;%3.2f%%",
-			(unsigned long long) agg->ios[0] / agg->slavecount,
-			(unsigned long long) agg->ios[1] / agg->slavecount,
-			(unsigned long long) agg->merges[0] / agg->slavecount,
-			(unsigned long long) agg->merges[1] / agg->slavecount,
-			(unsigned long long) agg->ticks[0] / agg->slavecount,
-			(unsigned long long) agg->ticks[1] / agg->slavecount,
-			(unsigned long long) agg->time_in_queue / agg->slavecount,
-			agg->max_util.u.f);
-	}
-}
-
-static void aggregate_slaves_stats(struct disk_util *masterdu)
-{
-	struct disk_util_agg *agg = &masterdu->agg;
-	struct disk_util_stat *dus;
-	struct flist_head *entry;
-	struct disk_util *slavedu;
-	double util;
-
-	flist_for_each(entry, &masterdu->slaves) {
-		slavedu = flist_entry(entry, struct disk_util, slavelist);
-		dus = &slavedu->dus;
-		agg->ios[0] += dus->s.ios[0];
-		agg->ios[1] += dus->s.ios[1];
-		agg->merges[0] += dus->s.merges[0];
-		agg->merges[1] += dus->s.merges[1];
-		agg->sectors[0] += dus->s.sectors[0];
-		agg->sectors[1] += dus->s.sectors[1];
-		agg->ticks[0] += dus->s.ticks[0];
-		agg->ticks[1] += dus->s.ticks[1];
-		agg->time_in_queue += dus->s.time_in_queue;
-		agg->slavecount++;
-
-		util = (double) (100 * dus->s.io_ticks / (double) slavedu->dus.s.msec);
-		/* System utilization is the utilization of the
-		 * component with the highest utilization.
-		 */
-		if (util > agg->max_util.u.f)
-			agg->max_util.u.f = util;
-
-	}
-
-	if (agg->max_util.u.f > 100.0)
-		agg->max_util.u.f = 100.0;
-}
-
 void disk_util_prune_entries(void)
 {
 	fio_sem_down(disk_util_sem);
@@ -581,157 +515,6 @@ void disk_util_prune_entries(void)
 	fio_sem_remove(disk_util_sem);
 }
 
-void print_disk_util(struct disk_util_stat *dus, struct disk_util_agg *agg,
-		     int terse, struct buf_output *out)
-{
-	double util = 0;
-
-	if (dus->s.msec)
-		util = (double) 100 * dus->s.io_ticks / (double) dus->s.msec;
-	if (util > 100.0)
-		util = 100.0;
-
-	if (!terse) {
-		if (agg->slavecount)
-			log_buf(out, "  ");
-
-		log_buf(out, "  %s: ios=%llu/%llu, merge=%llu/%llu, "
-			 "ticks=%llu/%llu, in_queue=%llu, util=%3.2f%%",
-				dus->name,
-				(unsigned long long) dus->s.ios[0],
-				(unsigned long long) dus->s.ios[1],
-				(unsigned long long) dus->s.merges[0],
-				(unsigned long long) dus->s.merges[1],
-				(unsigned long long) dus->s.ticks[0],
-				(unsigned long long) dus->s.ticks[1],
-				(unsigned long long) dus->s.time_in_queue,
-				util);
-	} else {
-		log_buf(out, ";%s;%llu;%llu;%llu;%llu;%llu;%llu;%llu;%3.2f%%",
-				dus->name,
-				(unsigned long long) dus->s.ios[0],
-				(unsigned long long) dus->s.ios[1],
-				(unsigned long long) dus->s.merges[0],
-				(unsigned long long) dus->s.merges[1],
-				(unsigned long long) dus->s.ticks[0],
-				(unsigned long long) dus->s.ticks[1],
-				(unsigned long long) dus->s.time_in_queue,
-				util);
-	}
-
-	/*
-	 * If the device has slaves, aggregate the stats for
-	 * those slave devices also.
-	 */
-	show_agg_stats(agg, terse, out);
-
-	if (!terse)
-		log_buf(out, "\n");
-}
-
-void json_array_add_disk_util(struct disk_util_stat *dus,
-		struct disk_util_agg *agg, struct json_array *array)
-{
-	struct json_object *obj;
-	double util = 0;
-
-	if (dus->s.msec)
-		util = (double) 100 * dus->s.io_ticks / (double) dus->s.msec;
-	if (util > 100.0)
-		util = 100.0;
-
-	obj = json_create_object();
-	json_array_add_value_object(array, obj);
-
-	json_object_add_value_string(obj, "name", dus->name);
-	json_object_add_value_int(obj, "read_ios", dus->s.ios[0]);
-	json_object_add_value_int(obj, "write_ios", dus->s.ios[1]);
-	json_object_add_value_int(obj, "read_merges", dus->s.merges[0]);
-	json_object_add_value_int(obj, "write_merges", dus->s.merges[1]);
-	json_object_add_value_int(obj, "read_ticks", dus->s.ticks[0]);
-	json_object_add_value_int(obj, "write_ticks", dus->s.ticks[1]);
-	json_object_add_value_int(obj, "in_queue", dus->s.time_in_queue);
-	json_object_add_value_float(obj, "util", util);
-
-	/*
-	 * If the device has slaves, aggregate the stats for
-	 * those slave devices also.
-	 */
-	if (!agg->slavecount)
-		return;
-	json_object_add_value_int(obj, "aggr_read_ios",
-				agg->ios[0] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_write_ios",
-				agg->ios[1] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_read_merges",
-				agg->merges[0] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_write_merge",
-				agg->merges[1] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_read_ticks",
-				agg->ticks[0] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_write_ticks",
-				agg->ticks[1] / agg->slavecount);
-	json_object_add_value_int(obj, "aggr_in_queue",
-				agg->time_in_queue / agg->slavecount);
-	json_object_add_value_float(obj, "aggr_util", agg->max_util.u.f);
-}
-
-static void json_object_add_disk_utils(struct json_object *obj,
-				       struct flist_head *head)
-{
-	struct json_array *array = json_create_array();
-	struct flist_head *entry;
-	struct disk_util *du;
-
-	json_object_add_value_array(obj, "disk_util", array);
-
-	flist_for_each(entry, head) {
-		du = flist_entry(entry, struct disk_util, list);
-
-		aggregate_slaves_stats(du);
-		json_array_add_disk_util(&du->dus, &du->agg, array);
-	}
-}
-
-void show_disk_util(int terse, struct json_object *parent,
-		    struct buf_output *out)
-{
-	struct flist_head *entry;
-	struct disk_util *du;
-	bool do_json;
-
-	if (!is_running_backend())
-		return;
-
-	fio_sem_down(disk_util_sem);
-
-	if (flist_empty(&disk_list)) {
-		fio_sem_up(disk_util_sem);
-		return;
-	}
-
-	if ((output_format & FIO_OUTPUT_JSON) && parent)
-		do_json = true;
-	else
-		do_json = false;
-
-	if (!terse && !do_json)
-		log_buf(out, "\nDisk stats (read/write):\n");
-
-	if (do_json)
-		json_object_add_disk_utils(parent, &disk_list);
-	else if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
-		flist_for_each(entry, &disk_list) {
-			du = flist_entry(entry, struct disk_util, list);
-
-			aggregate_slaves_stats(du);
-			print_disk_util(&du->dus, &du->agg, terse, out);
-		}
-	}
-
-	fio_sem_up(disk_util_sem);
-}
-
 void setup_disk_util(void)
 {
 	disk_util_sem = fio_sem_init(FIO_SEM_UNLOCKED);
diff --git a/diskutil.h b/diskutil.h
index f6b09d22..83bcbf89 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -1,6 +1,5 @@
 #ifndef FIO_DISKUTIL_H
 #define FIO_DISKUTIL_H
-#include "json.h"
 #define FIO_DU_NAME_SZ		64
 
 #include "helper_thread.h"
@@ -105,26 +104,15 @@ extern struct flist_head disk_list;
  * disk util stuff
  */
 #ifdef FIO_HAVE_DISK_UTIL
-extern void print_disk_util(struct disk_util_stat *, struct disk_util_agg *, int terse, struct buf_output *);
-extern void show_disk_util(int terse, struct json_object *parent, struct buf_output *);
-extern void json_array_add_disk_util(struct disk_util_stat *dus,
-		struct disk_util_agg *agg, struct json_array *parent);
 extern void init_disk_util(struct thread_data *);
 extern int update_io_ticks(void);
 extern void setup_disk_util(void);
 extern void disk_util_prune_entries(void);
 #else
 /* keep this as a function to avoid a warning in handle_du() */
-static inline void print_disk_util(struct disk_util_stat *du,
-				   struct disk_util_agg *agg, int terse,
-				   struct buf_output *out)
-{
-}
-#define show_disk_util(terse, parent, out) do { } while (0)
 #define disk_util_prune_entries()
 #define init_disk_util(td)
 #define setup_disk_util()
-#define json_array_add_disk_util(dus, agg, parent)
 
 static inline int update_io_ticks(void)
 {
diff --git a/server.c b/server.c
index b7347b43..1a070e56 100644
--- a/server.c
+++ b/server.c
@@ -2154,7 +2154,8 @@ static int fio_init_server_ip(void)
 	/*
 	 * Not fatal if fails, so just ignore it if that happens
 	 */
-	setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));
+	if (setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt))) {
+	}
 #endif
 
 	if (use_ipv6) {
diff --git a/stat.c b/stat.c
index 2b303494..55d83fcc 100644
--- a/stat.c
+++ b/stat.c
@@ -786,6 +786,218 @@ static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
 	free(p2);
 }
 
+static void show_agg_stats(struct disk_util_agg *agg, int terse,
+			   struct buf_output *out)
+{
+	if (!agg->slavecount)
+		return;
+
+	if (!terse) {
+		log_buf(out, ", aggrios=%llu/%llu, aggrmerge=%llu/%llu, "
+			 "aggrticks=%llu/%llu, aggrin_queue=%llu, "
+			 "aggrutil=%3.2f%%",
+			(unsigned long long) agg->ios[0] / agg->slavecount,
+			(unsigned long long) agg->ios[1] / agg->slavecount,
+			(unsigned long long) agg->merges[0] / agg->slavecount,
+			(unsigned long long) agg->merges[1] / agg->slavecount,
+			(unsigned long long) agg->ticks[0] / agg->slavecount,
+			(unsigned long long) agg->ticks[1] / agg->slavecount,
+			(unsigned long long) agg->time_in_queue / agg->slavecount,
+			agg->max_util.u.f);
+	} else {
+		log_buf(out, ";slaves;%llu;%llu;%llu;%llu;%llu;%llu;%llu;%3.2f%%",
+			(unsigned long long) agg->ios[0] / agg->slavecount,
+			(unsigned long long) agg->ios[1] / agg->slavecount,
+			(unsigned long long) agg->merges[0] / agg->slavecount,
+			(unsigned long long) agg->merges[1] / agg->slavecount,
+			(unsigned long long) agg->ticks[0] / agg->slavecount,
+			(unsigned long long) agg->ticks[1] / agg->slavecount,
+			(unsigned long long) agg->time_in_queue / agg->slavecount,
+			agg->max_util.u.f);
+	}
+}
+
+static void aggregate_slaves_stats(struct disk_util *masterdu)
+{
+	struct disk_util_agg *agg = &masterdu->agg;
+	struct disk_util_stat *dus;
+	struct flist_head *entry;
+	struct disk_util *slavedu;
+	double util;
+
+	flist_for_each(entry, &masterdu->slaves) {
+		slavedu = flist_entry(entry, struct disk_util, slavelist);
+		dus = &slavedu->dus;
+		agg->ios[0] += dus->s.ios[0];
+		agg->ios[1] += dus->s.ios[1];
+		agg->merges[0] += dus->s.merges[0];
+		agg->merges[1] += dus->s.merges[1];
+		agg->sectors[0] += dus->s.sectors[0];
+		agg->sectors[1] += dus->s.sectors[1];
+		agg->ticks[0] += dus->s.ticks[0];
+		agg->ticks[1] += dus->s.ticks[1];
+		agg->time_in_queue += dus->s.time_in_queue;
+		agg->slavecount++;
+
+		util = (double) (100 * dus->s.io_ticks / (double) slavedu->dus.s.msec);
+		/* System utilization is the utilization of the
+		 * component with the highest utilization.
+		 */
+		if (util > agg->max_util.u.f)
+			agg->max_util.u.f = util;
+
+	}
+
+	if (agg->max_util.u.f > 100.0)
+		agg->max_util.u.f = 100.0;
+}
+
+void print_disk_util(struct disk_util_stat *dus, struct disk_util_agg *agg,
+		     int terse, struct buf_output *out)
+{
+	double util = 0;
+
+	if (dus->s.msec)
+		util = (double) 100 * dus->s.io_ticks / (double) dus->s.msec;
+	if (util > 100.0)
+		util = 100.0;
+
+	if (!terse) {
+		if (agg->slavecount)
+			log_buf(out, "  ");
+
+		log_buf(out, "  %s: ios=%llu/%llu, merge=%llu/%llu, "
+			 "ticks=%llu/%llu, in_queue=%llu, util=%3.2f%%",
+				dus->name,
+				(unsigned long long) dus->s.ios[0],
+				(unsigned long long) dus->s.ios[1],
+				(unsigned long long) dus->s.merges[0],
+				(unsigned long long) dus->s.merges[1],
+				(unsigned long long) dus->s.ticks[0],
+				(unsigned long long) dus->s.ticks[1],
+				(unsigned long long) dus->s.time_in_queue,
+				util);
+	} else {
+		log_buf(out, ";%s;%llu;%llu;%llu;%llu;%llu;%llu;%llu;%3.2f%%",
+				dus->name,
+				(unsigned long long) dus->s.ios[0],
+				(unsigned long long) dus->s.ios[1],
+				(unsigned long long) dus->s.merges[0],
+				(unsigned long long) dus->s.merges[1],
+				(unsigned long long) dus->s.ticks[0],
+				(unsigned long long) dus->s.ticks[1],
+				(unsigned long long) dus->s.time_in_queue,
+				util);
+	}
+
+	/*
+	 * If the device has slaves, aggregate the stats for
+	 * those slave devices also.
+	 */
+	show_agg_stats(agg, terse, out);
+
+	if (!terse)
+		log_buf(out, "\n");
+}
+
+void json_array_add_disk_util(struct disk_util_stat *dus,
+		struct disk_util_agg *agg, struct json_array *array)
+{
+	struct json_object *obj;
+	double util = 0;
+
+	if (dus->s.msec)
+		util = (double) 100 * dus->s.io_ticks / (double) dus->s.msec;
+	if (util > 100.0)
+		util = 100.0;
+
+	obj = json_create_object();
+	json_array_add_value_object(array, obj);
+
+	json_object_add_value_string(obj, "name", dus->name);
+	json_object_add_value_int(obj, "read_ios", dus->s.ios[0]);
+	json_object_add_value_int(obj, "write_ios", dus->s.ios[1]);
+	json_object_add_value_int(obj, "read_merges", dus->s.merges[0]);
+	json_object_add_value_int(obj, "write_merges", dus->s.merges[1]);
+	json_object_add_value_int(obj, "read_ticks", dus->s.ticks[0]);
+	json_object_add_value_int(obj, "write_ticks", dus->s.ticks[1]);
+	json_object_add_value_int(obj, "in_queue", dus->s.time_in_queue);
+	json_object_add_value_float(obj, "util", util);
+
+	/*
+	 * If the device has slaves, aggregate the stats for
+	 * those slave devices also.
+	 */
+	if (!agg->slavecount)
+		return;
+	json_object_add_value_int(obj, "aggr_read_ios",
+				agg->ios[0] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_write_ios",
+				agg->ios[1] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_read_merges",
+				agg->merges[0] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_write_merge",
+				agg->merges[1] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_read_ticks",
+				agg->ticks[0] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_write_ticks",
+				agg->ticks[1] / agg->slavecount);
+	json_object_add_value_int(obj, "aggr_in_queue",
+				agg->time_in_queue / agg->slavecount);
+	json_object_add_value_float(obj, "aggr_util", agg->max_util.u.f);
+}
+
+static void json_object_add_disk_utils(struct json_object *obj,
+				       struct flist_head *head)
+{
+	struct json_array *array = json_create_array();
+	struct flist_head *entry;
+	struct disk_util *du;
+
+	json_object_add_value_array(obj, "disk_util", array);
+
+	flist_for_each(entry, head) {
+		du = flist_entry(entry, struct disk_util, list);
+
+		aggregate_slaves_stats(du);
+		json_array_add_disk_util(&du->dus, &du->agg, array);
+	}
+}
+
+void show_disk_util(int terse, struct json_object *parent,
+		    struct buf_output *out)
+{
+	struct flist_head *entry;
+	struct disk_util *du;
+	bool do_json;
+
+	if (!is_running_backend())
+		return;
+
+	if (flist_empty(&disk_list)) {
+		return;
+	}
+
+	if ((output_format & FIO_OUTPUT_JSON) && parent)
+		do_json = true;
+	else
+		do_json = false;
+
+	if (!terse && !do_json)
+		log_buf(out, "\nDisk stats (read/write):\n");
+
+	if (do_json)
+		json_object_add_disk_utils(parent, &disk_list);
+	else if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
+		flist_for_each(entry, &disk_list) {
+			du = flist_entry(entry, struct disk_util, list);
+
+			aggregate_slaves_stats(du);
+			print_disk_util(&du->dus, &du->agg, terse, out);
+		}
+	}
+}
+
 static void show_thread_status_normal(struct thread_stat *ts,
 				      struct group_run_stats *rs,
 				      struct buf_output *out)
diff --git a/stat.h b/stat.h
index ba7e290d..2ce91ff0 100644
--- a/stat.h
+++ b/stat.h
@@ -3,6 +3,8 @@
 
 #include "iolog.h"
 #include "lib/output_buffer.h"
+#include "diskutil.h"
+#include "json.h"
 
 struct group_run_stats {
 	uint64_t max_run[DDIR_RWDIR_CNT], min_run[DDIR_RWDIR_CNT];
@@ -332,9 +334,13 @@ extern void add_iops_sample(struct thread_data *, struct io_u *,
 extern void add_bw_sample(struct thread_data *, struct io_u *,
 				unsigned int, unsigned long long);
 extern void add_sync_clat_sample(struct thread_stat *ts,
-					unsigned long long nsec);
+				unsigned long long nsec);
 extern int calc_log_samples(void);
 
+extern void print_disk_util(struct disk_util_stat *, struct disk_util_agg *, int terse, struct buf_output *);
+extern void json_array_add_disk_util(struct disk_util_stat *dus,
+				struct disk_util_agg *agg, struct json_array *parent);
+
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
 extern bool write_bw_log;
 
diff --git a/t/memlock.c b/t/memlock.c
index ebedb91d..418dc3c4 100644
--- a/t/memlock.c
+++ b/t/memlock.c
@@ -43,6 +43,10 @@ int main(int argc, char *argv[])
 
 	mib = strtoul(argv[1], NULL, 10);
 	threads = strtoul(argv[2], NULL, 10);
+	if (threads < 1 || threads > 65536) {
+		printf("%s: invalid 'threads' argument\n", argv[0]);
+		return 1;
+	}
 
 	pthreads = calloc(threads, sizeof(pthread_t));
 	td.mib = mib;
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index bc7986f7..586e3c95 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -392,10 +392,13 @@ static void queue_work(struct reader_thread *rt, struct work_item *work)
 		pthread_cond_signal(&rt->thread.cond);
 	} else {
 		int ret = pthread_create(&work->thread, NULL, reader_one_off, work);
-		if (ret)
+		if (ret) {
 			fprintf(stderr, "pthread_create=%d\n", ret);
-		else
-			pthread_detach(work->thread);
+		} else {
+			ret = pthread_detach(work->thread);
+			if (ret)
+				fprintf(stderr, "pthread_detach=%d\n", ret);
+		}
 	}
 }
 
@@ -581,6 +584,7 @@ int main(int argc, char *argv[])
 	struct reader_thread *rt;
 	struct writer_thread *wt;
 	unsigned long rate;
+	uint64_t elapsed;
 	struct stat sb;
 	size_t bytes;
 	off_t off;
@@ -684,9 +688,11 @@ int main(int argc, char *argv[])
 	show_latencies(&wt->s, "WRITERS");
 
 	bytes /= 1024;
-	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &re);
+	elapsed = utime_since(&s, &re);
+	rate = elapsed ? (bytes * 1000UL * 1000UL) / elapsed : 0;
 	fprintf(stderr, "Read rate (KiB/sec) : %lu\n", rate);
-	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &we);
+	elapsed = utime_since(&s, &we);
+	rate = elapsed ? (bytes * 1000UL * 1000UL) / elapsed : 0;
 	fprintf(stderr, "Write rate (KiB/sec): %lu\n", rate);
 
 	close(fd);
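The t/read-to-pipe-async.c fix stores `utime_since()` in `elapsed` and reports a rate of 0 instead of dividing by zero when no time has passed. Below is a minimal sketch of the same guard as a standalone function. `kib_per_sec()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * KiB/sec from a byte count and elapsed microseconds. Returns 0 instead
 * of dividing by zero when no time elapsed.
 */
static unsigned long kib_per_sec(uint64_t bytes, uint64_t elapsed_usec)
{
	uint64_t kib = bytes / 1024;

	if (!elapsed_usec)
		return 0;
	return (unsigned long) (kib * 1000000ULL / elapsed_usec);
}
```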
diff --git a/verify.c b/verify.c
index a2c0d41d..cf299ebf 100644
--- a/verify.c
+++ b/verify.c
@@ -845,13 +845,11 @@ static int verify_header(struct io_u *io_u, struct thread_data *td,
 	 * For read-only workloads, the program cannot be certain of the
 	 * last numberio written to a block. Checking of numberio will be
 	 * done only for workloads that write data.  For verify_only,
-	 * numberio will be checked in the last iteration when the correct
-	 * state of numberio, that would have been written to each block
-	 * in a previous run of fio, has been reached.
+	 * numberio check is skipped.
 	 */
 	if (td_write(td) && (td_min_bs(td) == td_max_bs(td)) &&
 	    !td->o.time_based)
-		if (!td->o.verify_only || td->o.loops == 0)
+		if (!td->o.verify_only)
 			if (hdr->numberio != io_u->numberio) {
 				log_err("verify: bad header numberio %"PRIu16
 					", wanted %"PRIu16,



* Recent changes (master)
@ 2020-01-15 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-15 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7eff05d723d1330a5407b2bdd9145f1bfb6dd0e1:

  time: limit usec_sleep() to maximum intervals of 1 second (2020-01-13 14:51:35 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3f1e3af7cff07d4aedcd9a58ae00cb1a2189fcc2:

  engines/io_uring: use fixed opcodes for pre-mapped buffers (2020-01-14 14:27:22 -0700)

----------------------------------------------------------------
Keith Busch (1):
      engines/io_uring: use fixed opcodes for pre-mapped buffers

 engines/io_uring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 4f6a9678..329f2f07 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -84,6 +84,11 @@ static const int ddir_to_op[2][2] = {
 	{ IORING_OP_WRITEV, IORING_OP_WRITE }
 };
 
+static const int fixed_ddir_to_op[2] = {
+	IORING_OP_READ_FIXED,
+	IORING_OP_WRITE_FIXED
+};
+
 static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 {
 	struct ioring_options *o = data;
@@ -189,12 +194,13 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
-		sqe->opcode = ddir_to_op[io_u->ddir][!!o->nonvectored];
 		if (o->fixedbufs) {
+			sqe->opcode = fixed_ddir_to_op[io_u->ddir];
 			sqe->addr = (unsigned long) io_u->xfer_buf;
 			sqe->len = io_u->xfer_buflen;
 			sqe->buf_index = io_u->index;
 		} else {
+			sqe->opcode = ddir_to_op[io_u->ddir][!!o->nonvectored];
 			if (o->nonvectored) {
 				sqe->addr = (unsigned long)
 						ld->iovecs[io_u->index].iov_base;


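With `fixedbufs`, the buffers are pre-registered, so the submission must use the `*_FIXED` opcodes. Before this patch, the vectored or non-vectored opcode was chosen even when fixed buffers were in use. Below is a minimal sketch of the table-driven selection. The `sketch_op` enum values are stand-ins for the real `IORING_OP_*` constants from `<linux/io_uring.h>`, and `pick_opcode()` is a hypothetical name.

```c
/* Stand-in opcodes; the real IORING_OP_* values come from <linux/io_uring.h>. */
enum sketch_op {
	OP_READV, OP_WRITEV, OP_READ, OP_WRITE, OP_READ_FIXED, OP_WRITE_FIXED
};
enum sketch_dir { DIR_READ = 0, DIR_WRITE = 1 };

/* [direction][nonvectored] -> opcode, as in fio's ddir_to_op table. */
static const enum sketch_op ddir_to_op[2][2] = {
	{ OP_READV, OP_READ },
	{ OP_WRITEV, OP_WRITE },
};

/* Registered buffers need the fixed variants, indexed by direction only. */
static const enum sketch_op fixed_ddir_to_op[2] = {
	OP_READ_FIXED, OP_WRITE_FIXED
};

static enum sketch_op pick_opcode(enum sketch_dir d, int fixedbufs,
				  int nonvectored)
{
	if (fixedbufs)
		return fixed_ddir_to_op[d];
	return ddir_to_op[d][!!nonvectored];
}
```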

* Recent changes (master)
@ 2020-01-14 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit b7ed2a862ddafa7dbaa6299ee8633b6f8221e283:

  io_uring: set sqe iopriority, if prio/prioclass is set (2020-01-09 14:58:51 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7eff05d723d1330a5407b2bdd9145f1bfb6dd0e1:

  time: limit usec_sleep() to maximum intervals of 1 second (2020-01-13 14:51:35 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      time: limit usec_sleep() to maximum intervals of 1 second

 time.c | 7 +++++++
 1 file changed, 7 insertions(+)

---

Diff of recent changes:

diff --git a/time.c b/time.c
index 19999699..cd0e2a89 100644
--- a/time.c
+++ b/time.c
@@ -57,6 +57,13 @@ uint64_t usec_sleep(struct thread_data *td, unsigned long usec)
 		if (ts >= 1000000) {
 			req.tv_sec = ts / 1000000;
 			ts -= 1000000 * req.tv_sec;
+			/*
+			 * Limit sleep to ~1 second at most, otherwise we
+			 * don't notice then someone signaled the job to
+			 * exit manually.
+			 */
+			if (req.tv_sec > 1)
+				req.tv_sec = 1;
 		} else
 			req.tv_sec = 0;
 


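The time.c change caps the seconds part of each `nanosleep()` request at 1. The surrounding loop in `usec_sleep()` then sleeps again for the remainder, so a job told to exit is noticed within roughly a second. Below is a minimal sketch of how a single sleep step is computed. `one_sleep_step()` is a hypothetical name, and the caller is assumed to loop until the full delay has elapsed.

```c
#include <time.h>

/*
 * Convert a microsecond delay into a timespec for one nanosleep() call,
 * capping the seconds part at 1 so the caller can re-check for an exit
 * request at least once per second. The caller is expected to loop and
 * sleep again for whatever time remains.
 */
static struct timespec one_sleep_step(unsigned long usec)
{
	struct timespec req;

	if (usec >= 1000000) {
		req.tv_sec = usec / 1000000;
		usec -= 1000000 * req.tv_sec;
		if (req.tv_sec > 1)
			req.tv_sec = 1;
	} else
		req.tv_sec = 0;

	req.tv_nsec = usec * 1000;
	return req;
}
```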

* Recent changes (master)
@ 2020-01-10 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-10 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2198a6b5a9f40726b40aced24cf2dcdb3b639898:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-06 18:38:02 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b7ed2a862ddafa7dbaa6299ee8633b6f8221e283:

  io_uring: set sqe iopriority, if prio/prioclass is set (2020-01-09 14:58:51 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io_uring: set sqe iopriority, if prio/prioclass is set

 engines/io_uring.c | 4 ++++
 1 file changed, 4 insertions(+)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 7c19294b..4f6a9678 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -206,6 +206,10 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		}
 		if (!td->o.odirect && o->uncached)
 			sqe->rw_flags = RWF_UNCACHED;
+		if (fio_option_is_set(&td->o, ioprio_class))
+			sqe->ioprio = td->o.ioprio_class << 13;
+		if (fio_option_is_set(&td->o, ioprio))
+			sqe->ioprio |= td->o.ioprio;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
 		if (io_u->ddir == DDIR_SYNC_FILE_RANGE) {
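The two added lines pack `prioclass` above bit 13 and `prio` into the low bits of `sqe->ioprio`. This matches the Linux ioprio layout (`IOPRIO_CLASS_SHIFT` is 13; classes are 1 = RT, 2 = BE, 3 = IDLE). Below is a minimal encode/decode sketch of that layout. The `SK_*` macros and `sk_ioprio_*` helpers are local, hypothetical names and are not kernel or fio APIs.

```c
#include <stdint.h>

/* Local copy of the Linux layout: class above bit 13, level in the low 13 bits. */
#define SK_IOPRIO_CLASS_SHIFT	13
#define SK_IOPRIO_LEVEL_MASK	((1U << SK_IOPRIO_CLASS_SHIFT) - 1)

/* Pack a class and a per-class level the same way the patch fills sqe->ioprio. */
static uint16_t sk_ioprio_encode(unsigned int cls, unsigned int level)
{
	return (uint16_t)((cls << SK_IOPRIO_CLASS_SHIFT) |
			  (level & SK_IOPRIO_LEVEL_MASK));
}

/* Recover the class from an encoded value. */
static unsigned int sk_ioprio_class(uint16_t v)
{
	return v >> SK_IOPRIO_CLASS_SHIFT;
}

/* Recover the per-class level from an encoded value. */
static unsigned int sk_ioprio_level(uint16_t v)
{
	return v & SK_IOPRIO_LEVEL_MASK;
}
```

For example, best-effort (class 2) at level 4 encodes to `(2 << 13) | 4`, which is 16388.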



* Recent changes (master)
@ 2020-01-07 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-07 13:00 UTC
  To: fio

The following changes since commit dbab52955aeb0b58cc88c8eff1b1c2239241f0bd:

  Merge branch 'memalign1' of https://github.com/kusumi/fio (2020-01-05 10:31:40 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2198a6b5a9f40726b40aced24cf2dcdb3b639898:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-06 18:38:02 -0700)

----------------------------------------------------------------
Bart Van Assche (12):
      Makefile: Avoid duplicating code
      Makefile: Use 'tr' if 'fmt' is not available
      configure: Improve the ibverbs test
      configure: Improve the getopt_long_only() test
      Windows: Fix multiple configure tests
      Fix the build in case FIO_HAVE_DISK_UTIL is not defined
      Change off64_t into uint64_t
      Windows: Remove more unused OS dependent code
      Include "oslib/asprintf.h" where necessary
      Windows: Use snprintf() instead of StringCch*()
      Windows: Uninline CPU affinity functions
      Windows >= 7: Make fio_getaffinity() error reporting more detailed

Jens Axboe (3):
      Merge branch 'json1' of https://github.com/kusumi/fio
      Merge branch 'master' of https://github.com/vincentkfu/fio
      Merge branch 'master' of https://github.com/bvanassche/fio

Tomohiro Kusumi (1):
      json: remove two redundant json_print_array() prototypes

Vincent Fu (2):
      t/run-fio-tests: automatically skip t/jobs/t0005 on Windows
      t/steadystate_tests: relax acceptance criterion

 .appveyor.yml             |   2 +-
 Makefile                  |  37 ++--
 configure                 |  43 ++++-
 diskutil.h                |   2 +-
 helpers.c                 |   3 +-
 helpers.h                 |   4 +-
 init.c                    |   1 +
 io_u.c                    |   2 +-
 json.c                    |   3 +-
 os/os-dragonfly.h         |   1 -
 os/os-freebsd.h           |   2 -
 os/os-mac.h               |   2 -
 os/os-netbsd.h            |   2 -
 os/os-openbsd.h           |   2 -
 os/os-windows-7.h         | 360 -------------------------------------
 os/os-windows-xp.h        |  67 -------
 os/os-windows.h           |  17 +-
 os/windows/cpu-affinity.c | 444 ++++++++++++++++++++++++++++++++++++++++++++++
 os/windows/posix.c        |  23 ++-
 os/windows/posix.h        |   1 -
 stat.c                    |   1 +
 t/run-fio-tests.py        |   8 +-
 t/steadystate_tests.py    |  10 +-
 verify.c                  |   1 +
 zbd.c                     |   2 +
 25 files changed, 542 insertions(+), 498 deletions(-)
 create mode 100644 os/windows/cpu-affinity.c

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index 4fb0a90d..bf0978ad 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -25,7 +25,7 @@ after_build:
 
 test_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && python.exe t/run-fio-tests.py --skip 5 --debug'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && python.exe t/run-fio-tests.py --debug'
 
 artifacts:
   - path: os\windows\*.msi
diff --git a/Makefile b/Makefile
index dd26afca..3c5e0f5b 100644
--- a/Makefile
+++ b/Makefile
@@ -208,7 +208,8 @@ ifeq ($(CONFIG_TARGET_OS), Darwin)
   LIBS	 += -lpthread -ldl
 endif
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
-  SOURCE += os/windows/posix.c
+  SOURCE += os/windows/cpu-affinity.c os/windows/posix.c
+  WINDOWS_OBJS = os/windows/cpu-affinity.o os/windows/posix.o lib/hweight.o
   LIBS	 += -lpthread -lpsapi -lws2_32 -lssp
   CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
@@ -299,9 +300,9 @@ T_OBJS += $(T_TT_OBJS)
 T_OBJS += $(T_IOU_RING_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
-    T_DEDUPE_OBJS += os/windows/posix.o lib/hweight.o
-    T_SMALLOC_OBJS += os/windows/posix.o lib/hweight.o
-    T_LFSR_TEST_OBJS += os/windows/posix.o lib/hweight.o
+    T_DEDUPE_OBJS += $(WINDOWS_OBJS)
+    T_SMALLOC_OBJS += $(WINDOWS_OBJS)
+    T_LFSR_TEST_OBJS += $(WINDOWS_OBJS)
 endif
 
 T_TEST_PROGS = $(T_SMALLOC_PROGS)
@@ -387,13 +388,14 @@ override CFLAGS += -DFIO_VERSION='"$(FIO_VERSION)"'
 	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
 	@mv -f $*.d $*.d.tmp
 	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
-ifeq ($(CONFIG_TARGET_OS), NetBSD)
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | tr -cs "[:graph:]" "\n" | \
-		sed -e 's/^ *//' -e '/^$$/ d' -e 's/$$/:/' >> $*.d
-else
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
-		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
-endif
+	@if type -p fmt >/dev/null 2>&1; then				\
+		sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 |	\
+		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d;			\
+	else								\
+		sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp |		\
+		tr -cs "[:graph:]" "\n" |				\
+		sed -e 's/^ *//' -e '/^$$/ d' -e 's/$$/:/' >> $*.d;	\
+	fi
 	@rm -f $*.d.tmp
 
 ifdef CONFIG_ARITHMETIC
@@ -426,19 +428,6 @@ parse.o: lex.yy.o y.tab.o
 endif
 
 init.o: init.c FIO-VERSION-FILE
-	@mkdir -p $(dir $@)
-	$(QUIET_CC)$(CC) -o $@ $(CFLAGS) $(CPPFLAGS) -c $<
-	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
-	@mv -f $*.d $*.d.tmp
-	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
-ifeq ($(CONFIG_TARGET_OS), NetBSD)
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | tr -cs "[:graph:]" "\n" | \
-		sed -e 's/^ *//' -e '/^$$/ d' -e 's/$$/:/' >> $*.d
-else
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
-		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
-endif
-	@rm -f $*.d.tmp
 
 gcompat.o: gcompat.c gcompat.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
diff --git a/configure b/configure
index fa6df532..a1217700 100755
--- a/configure
+++ b/configure
@@ -381,15 +381,11 @@ CYGWIN*)
   # We now take the regular configuration path without having exit 0 here.
   # Flags below are still necessary mostly for MinGW.
   build_static="yes"
-  socklen_t="yes"
   rusage_thread="yes"
   fdatasync="yes"
   clock_gettime="yes" # clock_monotonic probe has dependency on this
   clock_monotonic="yes"
-  gettimeofday="yes"
   sched_idle="yes"
-  tcp_nodelay="yes"
-  ipv6="yes"
   ;;
 esac
 
@@ -841,7 +837,7 @@ cat > $TMPC << EOF
 int main(int argc, char **argv)
 {
   struct ibv_pd *pd = ibv_alloc_pd(NULL);
-  return 0;
+  return pd != NULL;
 }
 EOF
 if test "$disable_rdma" != "yes" && compile_prog "" "-libverbs" "libverbs" ; then
@@ -1374,7 +1370,7 @@ cat > $TMPC << EOF
 #include <getopt.h>
 int main(int argc, char **argv)
 {
-  int c = getopt_long_only(argc, argv, NULL, NULL, NULL);
+  int c = getopt_long_only(argc, argv, "", NULL, NULL);
   return c;
 }
 EOF
@@ -1389,8 +1385,12 @@ if test "$inet_aton" != "yes" ; then
   inet_aton="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#else
 #include <sys/socket.h>
 #include <arpa/inet.h>
+#endif
 #include <stdio.h>
 int main(int argc, char **argv)
 {
@@ -1409,7 +1409,12 @@ if test "$socklen_t" != "yes" ; then
   socklen_t="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#include <ws2tcpip.h>
+#else
 #include <sys/socket.h>
+#endif
 int main(int argc, char **argv)
 {
   socklen_t len = 0;
@@ -1533,10 +1538,14 @@ if test "$tcp_nodelay" != "yes" ; then
   tcp_nodelay="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#else
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/tcp.h>
+#endif
 int main(int argc, char **argv)
 {
   return getsockopt(0, 0, TCP_NODELAY, NULL, NULL);
@@ -1544,6 +1553,9 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "" "TCP_NODELAY"; then
   tcp_nodelay="yes"
+elif compile_prog "" "-lws2_32" "TCP_NODELAY"; then
+  tcp_nodelay="yes"
+  LIBS="$LIBS -lws2_32"
 fi
 print_config "TCP_NODELAY" "$tcp_nodelay"
 
@@ -1553,10 +1565,14 @@ if test "$window_size" != "yes" ; then
   window_size="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#else
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/tcp.h>
+#endif
 int main(int argc, char **argv)
 {
   setsockopt(0, SOL_SOCKET, SO_SNDBUF, NULL, 0);
@@ -1565,6 +1581,9 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "" "SO_SNDBUF"; then
   window_size="yes"
+elif compile_prog "" "-lws2_32" "SO_SNDBUF"; then
+  window_size="yes"
+  LIBS="$LIBS -lws2_32"
 fi
 print_config "Net engine window_size" "$window_size"
 
@@ -1574,12 +1593,16 @@ if test "$mss" != "yes" ; then
   mss="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#else
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>
 #include <netinet/in.h>
+#endif
 int main(int argc, char **argv)
 {
   return setsockopt(0, IPPROTO_TCP, TCP_MAXSEG, NULL, 0);
@@ -1587,6 +1610,9 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "" "TCP_MAXSEG"; then
   mss="yes"
+elif compile_prog "" "-lws2_32" "TCP_MAXSEG"; then
+  mss="yes"
+  LIBS="$LIBS -lws2_32"
 fi
 print_config "TCP_MAXSEG" "$mss"
 
@@ -1651,10 +1677,15 @@ if test "$ipv6" != "yes" ; then
   ipv6="no"
 fi
 cat > $TMPC << EOF
+#ifdef _WIN32
+#include <winsock2.h>
+#include <ws2tcpip.h>
+#else
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
 #include <netdb.h>
+#endif
 #include <stdio.h>
 int main(int argc, char **argv)
 {
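The `getopt_long_only()` probe in the configure diff now passes `""` as the optstring (meaning "no short options") instead of NULL. Below is a minimal sketch of that call shape. It assumes a glibc- or BSD-style `<getopt.h>`, because `getopt_long_only()` is a GNU extension and not part of POSIX. The option table and `parse_bs()` are hypothetical.

```c
#define _GNU_SOURCE
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical table: one long option taking an argument. */
static const struct option sketch_opts[] = {
	{ "bs", required_argument, NULL, 'b' },
	{ NULL, 0, NULL, 0 },
};

/* Return the value of -bs/--bs, or -1 if it is absent. The optstring is
 * "" (no short options), never NULL. */
static long parse_bs(int argc, char **argv)
{
	long bs = -1;
	int c;

	optind = 1;	/* restart the scan for each call */
	while ((c = getopt_long_only(argc, argv, "", sketch_opts, NULL)) != -1) {
		if (c == 'b')
			bs = strtol(optarg, NULL, 10);
	}
	return bs;
}
```

Because it is the `_only` variant, a single-dash `-bs 4096` is matched as the long option `bs`, just as `--bs=4096` would be.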
diff --git a/diskutil.h b/diskutil.h
index 15ec681a..f6b09d22 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -120,7 +120,7 @@ static inline void print_disk_util(struct disk_util_stat *du,
 				   struct buf_output *out)
 {
 }
-#define show_disk_util(terse, parent, out)
+#define show_disk_util(terse, parent, out) do { } while (0)
 #define disk_util_prune_entries()
 #define init_disk_util(td)
 #define setup_disk_util()
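With the old empty definition, `show_disk_util(terse, parent, out);` expands to a lone `;`. As the body of an `if`, that can trigger warnings such as GCC's `-Wempty-body`. The `do { } while (0)` form expands to one real statement that still needs its trailing semicolon. Below is a minimal illustration using a hypothetical stub macro.

```c
/* Hypothetical stub: the feature is compiled out, but the macro still
 * behaves like one complete statement wherever a call appears. */
#define report_stub(verbose, out)	do { } while (0)

/* Returns 1 when the else branch ran, 0 when the stub branch ran. */
static int report_or_fallback(int verbose)
{
	int fallback = 0;

	if (verbose)
		report_stub(verbose, &fallback);	/* arguments are discarded */
	else
		fallback = 1;
	return fallback;
}
```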
diff --git a/helpers.c b/helpers.c
index a0ee3704..ab9d706d 100644
--- a/helpers.c
+++ b/helpers.c
@@ -18,7 +18,8 @@ int posix_fallocate(int fd, off_t offset, off_t len)
 #endif
 
 #ifndef CONFIG_SYNC_FILE_RANGE
-int sync_file_range(int fd, off64_t offset, off64_t nbytes, unsigned int flags)
+int sync_file_range(int fd, uint64_t offset, uint64_t nbytes,
+		    unsigned int flags)
 {
 	errno = ENOSYS;
 	return -1;
diff --git a/helpers.h b/helpers.h
index a0b32858..4ec0f052 100644
--- a/helpers.h
+++ b/helpers.h
@@ -7,8 +7,10 @@
 
 extern int fallocate(int fd, int mode, off_t offset, off_t len);
 extern int posix_fallocate(int fd, off_t offset, off_t len);
-extern int sync_file_range(int fd, off64_t offset, off64_t nbytes,
+#ifndef CONFIG_SYNC_FILE_RANGE
+extern int sync_file_range(int fd, uint64_t offset, uint64_t nbytes,
 					unsigned int flags);
+#endif
 extern int posix_fadvise(int fd, off_t offset, off_t len, int advice);
 
 #endif /* FIO_HELPERS_H_ */
diff --git a/init.c b/init.c
index 60c85761..2f64726c 100644
--- a/init.c
+++ b/init.c
@@ -32,6 +32,7 @@
 #include "steadystate.h"
 #include "blktrace.h"
 
+#include "oslib/asprintf.h"
 #include "oslib/getopt.h"
 #include "oslib/strcasestr.h"
 
diff --git a/io_u.c b/io_u.c
index 4a0c725a..03f5c21f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2182,7 +2182,7 @@ void io_u_fill_buffer(struct thread_data *td, struct io_u *io_u,
 static int do_sync_file_range(const struct thread_data *td,
 			      struct fio_file *f)
 {
-	off64_t offset, nbytes;
+	uint64_t offset, nbytes;
 
 	offset = f->first_write;
 	nbytes = f->last_write - f->first_write;
diff --git a/json.c b/json.c
index 75212c85..e2819a65 100644
--- a/json.c
+++ b/json.c
@@ -230,7 +230,6 @@ int json_object_add_value_type(struct json_object *obj, const char *name, int ty
 	return 0;
 }
 
-static void json_print_array(struct json_array *array, struct buf_output *);
 int json_array_add_value_type(struct json_array *array, int type, ...)
 {
 	struct json_value *value;
@@ -296,8 +295,8 @@ static void json_print_level(int level, struct buf_output *out)
 }
 
 static void json_print_pair(struct json_pair *pair, struct buf_output *);
-static void json_print_array(struct json_array *array, struct buf_output *);
 static void json_print_value(struct json_value *value, struct buf_output *);
+
 void json_print_object(struct json_object *obj, struct buf_output *out)
 {
 	int i;
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 3c460ae2..44bfcd5d 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -47,7 +47,6 @@
 /* This is supposed to equal (sizeof(cpumask_t)*8) */
 #define FIO_MAX_CPUS	SMP_MAXCPU
 
-typedef off_t off64_t;
 typedef cpumask_t os_cpu_mask_t;
 
 /*
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index 789da178..b3addf98 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -30,8 +30,6 @@
 #define fio_swap32(x)	bswap32(x)
 #define fio_swap64(x)	bswap64(x)
 
-typedef off_t off64_t;
-
 typedef cpuset_t os_cpu_mask_t;
 
 #define fio_cpu_clear(mask, cpu)        (void) CPU_CLR((cpu), (mask))
diff --git a/os/os-mac.h b/os/os-mac.h
index 0d97f6b9..2852ac67 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -33,8 +33,6 @@
  */
 #define FIO_MAX_JOBS		128
 
-typedef off_t off64_t;
-
 #ifndef CONFIG_CLOCKID_T
 typedef unsigned int clockid_t;
 #endif
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 88fb3ef1..abc1d3cb 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -35,8 +35,6 @@
 #define fio_swap32(x)	bswap32(x)
 #define fio_swap64(x)	bswap64(x)
 
-typedef off_t off64_t;
-
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
 	struct disklabel dl;
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 43a649d4..085a6f2b 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -35,8 +35,6 @@
 #define fio_swap32(x)	bswap32(x)
 #define fio_swap64(x)	bswap64(x)
 
-typedef off_t off64_t;
-
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
 	struct disklabel dl;
diff --git a/os/os-windows-7.h b/os/os-windows-7.h
index 0a6eaa3c..b8bd9e7c 100644
--- a/os/os-windows-7.h
+++ b/os/os-windows-7.h
@@ -5,363 +5,3 @@
 typedef struct {
 	uint64_t row[FIO_CPU_MASK_ROWS];
 } os_cpu_mask_t;
-
-#define FIO_HAVE_CPU_ONLINE_SYSCONF
-/* Return all processors regardless of processor group */
-static inline unsigned int cpus_online(void)
-{
-	return GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
-}
-
-static inline void print_mask(os_cpu_mask_t *cpumask)
-{
-	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
-		dprint(FD_PROCESS, "cpumask[%d]=%lu\n", i, cpumask->row[i]);
-}
-
-/* Return the index of the least significant set CPU in cpumask or -1 if no
- * CPUs are set */
-static inline int first_set_cpu(os_cpu_mask_t *cpumask)
-{
-	int cpus_offset, mask_first_cpu, row;
-
-	cpus_offset = 0;
-	row = 0;
-	mask_first_cpu = -1;
-	while (mask_first_cpu < 0 && row < FIO_CPU_MASK_ROWS) {
-		int row_first_cpu;
-
-		row_first_cpu = __builtin_ffsll(cpumask->row[row]) - 1;
-		dprint(FD_PROCESS, "row_first_cpu=%d cpumask->row[%d]=%lu\n",
-		       row_first_cpu, row, cpumask->row[row]);
-		if (row_first_cpu > -1) {
-			mask_first_cpu = cpus_offset + row_first_cpu;
-			dprint(FD_PROCESS, "first set cpu in mask is at index %d\n",
-			       mask_first_cpu);
-		} else {
-			cpus_offset += FIO_CPU_MASK_STRIDE;
-			row++;
-		}
-	}
-
-	return mask_first_cpu;
-}
-
-/* Return the index of the most significant set CPU in cpumask or -1 if no
- * CPUs are set */
-static inline int last_set_cpu(os_cpu_mask_t *cpumask)
-{
-	int cpus_offset, mask_last_cpu, row;
-
-	cpus_offset = (FIO_CPU_MASK_ROWS - 1) * FIO_CPU_MASK_STRIDE;
-	row = FIO_CPU_MASK_ROWS - 1;
-	mask_last_cpu = -1;
-	while (mask_last_cpu < 0 && row >= 0) {
-		int row_last_cpu;
-
-		if (cpumask->row[row] == 0)
-			row_last_cpu = -1;
-		else {
-			uint64_t tmp = cpumask->row[row];
-
-			row_last_cpu = 0;
-			while (tmp >>= 1)
-			    row_last_cpu++;
-		}
-
-		dprint(FD_PROCESS, "row_last_cpu=%d cpumask->row[%d]=%lu\n",
-		       row_last_cpu, row, cpumask->row[row]);
-		if (row_last_cpu > -1) {
-			mask_last_cpu = cpus_offset + row_last_cpu;
-			dprint(FD_PROCESS, "last set cpu in mask is at index %d\n",
-			       mask_last_cpu);
-		} else {
-			cpus_offset -= FIO_CPU_MASK_STRIDE;
-			row--;
-		}
-	}
-
-	return mask_last_cpu;
-}
-
-static inline int mask_to_group_mask(os_cpu_mask_t *cpumask, int *processor_group, uint64_t *affinity_mask)
-{
-	WORD online_groups, group, group_size;
-	bool found;
-	int cpus_offset, search_cpu, last_cpu, bit_offset, row, end;
-	uint64_t group_cpumask;
-
-	search_cpu = first_set_cpu(cpumask);
-	if (search_cpu < 0) {
-		log_info("CPU mask doesn't set any CPUs\n");
-		return 1;
-	}
-
-	/* Find processor group first set CPU applies to */
-	online_groups = GetActiveProcessorGroupCount();
-	group = 0;
-	found = false;
-	cpus_offset = 0;
-	group_size = 0;
-	while (!found && group < online_groups) {
-		group_size = GetActiveProcessorCount(group);
-		dprint(FD_PROCESS, "group=%d group_start=%d group_size=%u search_cpu=%d\n",
-		       group, cpus_offset, group_size, search_cpu);
-		if (cpus_offset + group_size > search_cpu)
-			found = true;
-		else {
-			cpus_offset += group_size;
-			group++;
-		}
-	}
-
-	if (!found) {
-		log_err("CPU mask contains processor beyond last active processor index (%d)\n",
-			 cpus_offset - 1);
-		print_mask(cpumask);
-		return 1;
-	}
-
-	/* Check all the CPUs in the mask apply to ONLY that processor group */
-	last_cpu = last_set_cpu(cpumask);
-	if (last_cpu > (cpus_offset + group_size - 1)) {
-		log_info("CPU mask cannot bind CPUs (e.g. %d, %d) that are "
-			 "in different processor groups\n", search_cpu,
-			 last_cpu);
-		print_mask(cpumask);
-		return 1;
-	}
-
-	/* Extract the current processor group mask from the cpumask */
-	row = cpus_offset / FIO_CPU_MASK_STRIDE;
-	bit_offset = cpus_offset % FIO_CPU_MASK_STRIDE;
-	group_cpumask = cpumask->row[row] >> bit_offset;
-	end = bit_offset + group_size;
-	if (end > FIO_CPU_MASK_STRIDE && (row + 1 < FIO_CPU_MASK_ROWS)) {
-		/* Some of the next row needs to be part of the mask */
-		int needed, needed_shift, needed_mask_shift;
-		uint64_t needed_mask;
-
-		needed = end - FIO_CPU_MASK_STRIDE;
-		needed_shift = FIO_CPU_MASK_STRIDE - bit_offset;
-		needed_mask_shift = FIO_CPU_MASK_STRIDE - needed;
-		needed_mask = (uint64_t)-1 >> needed_mask_shift;
-		dprint(FD_PROCESS, "bit_offset=%d end=%d needed=%d needed_shift=%d needed_mask=%ld needed_mask_shift=%d\n", bit_offset, end, needed, needed_shift, needed_mask, needed_mask_shift);
-		group_cpumask |= (cpumask->row[row + 1] & needed_mask) << needed_shift;
-	}
-	group_cpumask &= (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - group_size);
-
-	/* Return group and mask */
-	dprint(FD_PROCESS, "Returning group=%d group_mask=%lu\n", group, group_cpumask);
-	*processor_group = group;
-	*affinity_mask = group_cpumask;
-
-	return 0;
-}
-
-static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
-{
-	HANDLE handle = NULL;
-	int group, ret;
-	uint64_t group_mask = 0;
-	GROUP_AFFINITY new_group_affinity;
-
-	ret = -1;
-
-	if (mask_to_group_mask(&cpumask, &group, &group_mask) != 0)
-		goto err;
-
-	handle = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION,
-			    TRUE, pid);
-	if (handle == NULL) {
-		log_err("fio_setaffinity: failed to get handle for pid %d\n", pid);
-		goto err;
-	}
-
-	/* Set group and mask.
-	 * Note: if the GROUP_AFFINITY struct's Reserved members are not
-	 * initialised to 0 then SetThreadGroupAffinity will fail with
-	 * GetLastError() set to ERROR_INVALID_PARAMETER */
-	new_group_affinity.Mask = (KAFFINITY) group_mask;
-	new_group_affinity.Group = group;
-	new_group_affinity.Reserved[0] = 0;
-	new_group_affinity.Reserved[1] = 0;
-	new_group_affinity.Reserved[2] = 0;
-	if (SetThreadGroupAffinity(handle, &new_group_affinity, NULL) != 0)
-		ret = 0;
-	else {
-		log_err("fio_setaffinity: failed to set thread affinity "
-			 "(pid %d, group %d, mask %" PRIx64 ", "
-			 "GetLastError=%d)\n", pid, group, group_mask,
-			 GetLastError());
-		goto err;
-	}
-
-err:
-	if (handle)
-		CloseHandle(handle);
-	return ret;
-}
-
-static inline void cpu_to_row_offset(int cpu, int *row, int *offset)
-{
-	*row = cpu / FIO_CPU_MASK_STRIDE;
-	*offset = cpu << FIO_CPU_MASK_STRIDE * *row;
-}
-
-static inline int fio_cpuset_init(os_cpu_mask_t *mask)
-{
-	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
-		mask->row[i] = 0;
-	return 0;
-}
-
-/*
- * fio_getaffinity() should not be called once a fio_setaffinity() call has
- * been made because fio_setaffinity() may put the process into multiple
- * processor groups
- */
-static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
-{
-	int ret;
-	int row, offset, end, group, group_size, group_start_cpu;
-	DWORD_PTR process_mask, system_mask;
-	HANDLE handle;
-	PUSHORT current_groups;
-	USHORT group_count;
-	WORD online_groups;
-
-	ret = -1;
-	current_groups = NULL;
-	handle = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
-	if (handle == NULL) {
-		log_err("fio_getaffinity: failed to get handle for pid %d\n",
-			pid);
-		goto err;
-	}
-
-	group_count = 1;
-	/*
-	 * GetProcessGroupAffinity() seems to expect more than the natural
-	 * alignment for a USHORT from the area pointed to by current_groups so
-	 * arrange for maximum alignment by allocating via malloc()
-	 */
-	current_groups = malloc(sizeof(USHORT));
-	if (!current_groups) {
-		log_err("fio_getaffinity: malloc failed\n");
-		goto err;
-	}
-	if (GetProcessGroupAffinity(handle, &group_count, current_groups) == 0) {
-		/* NB: we also fail here if we are a multi-group process */
-		log_err("fio_getaffinity: failed to get single group affinity for pid %d\n", pid);
-		goto err;
-	}
-	GetProcessAffinityMask(handle, &process_mask, &system_mask);
-
-	/* Convert group and group relative mask to full CPU mask */
-	online_groups = GetActiveProcessorGroupCount();
-	if (online_groups == 0) {
-		log_err("fio_getaffinity: error retrieving total processor groups\n");
-		goto err;
-	}
-
-	group = 0;
-	group_start_cpu = 0;
-	group_size = 0;
-	dprint(FD_PROCESS, "current_groups=%d group_count=%d\n",
-	       current_groups[0], group_count);
-	while (true) {
-		group_size = GetActiveProcessorCount(group);
-		if (group_size == 0) {
-			log_err("fio_getaffinity: error retrieving size of "
-				"processor group %d\n", group);
-			goto err;
-		} else if (group >= current_groups[0] || group >= online_groups)
-			break;
-		else {
-			group_start_cpu += group_size;
-			group++;
-		}
-	}
-
-	if (group != current_groups[0]) {
-		log_err("fio_getaffinity: could not find processor group %d\n",
-			current_groups[0]);
-		goto err;
-	}
-
-	dprint(FD_PROCESS, "group_start_cpu=%d, group size=%u\n",
-	       group_start_cpu, group_size);
-	if ((group_start_cpu + group_size) >= FIO_MAX_CPUS) {
-		log_err("fio_getaffinity failed: current CPU affinity (group "
-			"%d, group_start_cpu %d, group_size %d) extends "
-			"beyond mask's highest CPU (%d)\n", group,
-			group_start_cpu, group_size, FIO_MAX_CPUS);
-		goto err;
-	}
-
-	fio_cpuset_init(mask);
-	cpu_to_row_offset(group_start_cpu, &row, &offset);
-	mask->row[row] = process_mask;
-	mask->row[row] <<= offset;
-	end = offset + group_size;
-	if (end > FIO_CPU_MASK_STRIDE) {
-		int needed;
-		uint64_t needed_mask;
-
-		needed = FIO_CPU_MASK_STRIDE - end;
-		needed_mask = (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - needed);
-		row++;
-		mask->row[row] = process_mask;
-		mask->row[row] >>= needed;
-		mask->row[row] &= needed_mask;
-	}
-	ret = 0;
-
-err:
-	if (handle)
-		CloseHandle(handle);
-	if (current_groups)
-		free(current_groups);
-
-	return ret;
-}
-
-static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
-{
-	int row, offset;
-	cpu_to_row_offset(cpu, &row, &offset);
-
-	mask->row[row] &= ~(1ULL << offset);
-}
-
-static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
-{
-	int row, offset;
-	cpu_to_row_offset(cpu, &row, &offset);
-
-	mask->row[row] |= 1ULL << offset;
-}
-
-static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
-{
-	int row, offset;
-	cpu_to_row_offset(cpu, &row, &offset);
-
-	return (mask->row[row] & (1ULL << offset)) != 0;
-}
-
-static inline int fio_cpu_count(os_cpu_mask_t *mask)
-{
-	int count = 0;
-
-	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
-		count += hweight64(mask->row[i]);
-
-	return count;
-}
-
-static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
-{
-	return 0;
-}
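The multi-row `os_cpu_mask_t` above keeps CPU *n* in row `n / stride`, bit `n % stride`. This sketch assumes 64 bits per row, as the `uint64_t` row type suggests. Note that the removed `cpu_to_row_offset()` computes `*offset = cpu << FIO_CPU_MASK_STRIDE * *row`. For CPUs in row 1 and above, that expression appears to shift an `int` by 64 or more bits, which is undefined in C. The sketch therefore uses the remainder instead. All `sk_*` names and the 4-row size are hypothetical.

```c
#include <stdint.h>

#define SK_STRIDE	64	/* bits per row (uint64_t) */
#define SK_ROWS		4	/* hypothetical: room for 256 CPUs */

typedef struct {
	uint64_t row[SK_ROWS];
} sk_cpu_mask;

/* Split a CPU index into (row, bit) for a row-major multi-word mask. */
static void sk_row_offset(int cpu, int *row, int *offset)
{
	*row = cpu / SK_STRIDE;
	*offset = cpu % SK_STRIDE;
}

/* Mark a CPU as present in the mask. */
static void sk_cpu_set(sk_cpu_mask *m, int cpu)
{
	int row, off;

	sk_row_offset(cpu, &row, &off);
	m->row[row] |= 1ULL << off;
}

/* Remove a CPU from the mask. */
static void sk_cpu_clear(sk_cpu_mask *m, int cpu)
{
	int row, off;

	sk_row_offset(cpu, &row, &off);
	m->row[row] &= ~(1ULL << off);
}

/* Return 1 if the CPU is in the mask, 0 otherwise. */
static int sk_cpu_isset(const sk_cpu_mask *m, int cpu)
{
	int row, off;

	sk_row_offset(cpu, &row, &off);
	return (int)((m->row[row] >> off) & 1);
}

/* Number of CPUs set across all rows (GCC/Clang popcount builtin). */
static int sk_cpu_count(const sk_cpu_mask *m)
{
	int n = 0;

	for (int i = 0; i < SK_ROWS; i++)
		n += __builtin_popcountll(m->row[i]);
	return n;
}

/* Lowest set CPU, or -1 if the mask is empty. */
static int sk_first_set(const sk_cpu_mask *m)
{
	for (int i = 0; i < SK_ROWS; i++)
		if (m->row[i])
			return i * SK_STRIDE +
			       __builtin_ffsll((long long)m->row[i]) - 1;
	return -1;
}

/* Highest set CPU, or -1 if the mask is empty. */
static int sk_last_set(const sk_cpu_mask *m)
{
	for (int i = SK_ROWS - 1; i >= 0; i--)
		if (m->row[i])
			return i * SK_STRIDE + 63 - __builtin_clzll(m->row[i]);
	return -1;
}
```

Scanning whole rows with the bit builtins replaces the per-bit shift loop used in `last_set_cpu()`. The result is the same index, with fewer iterations.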
diff --git a/os/os-windows-xp.h b/os/os-windows-xp.h
index 1ce9ab31..fbc23e2c 100644
--- a/os/os-windows-xp.h
+++ b/os/os-windows-xp.h
@@ -1,70 +1,3 @@
 #define FIO_MAX_CPUS	MAXIMUM_PROCESSORS
 
 typedef DWORD_PTR os_cpu_mask_t;
-
-static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
-{
-	HANDLE h;
-	BOOL bSuccess = FALSE;
-
-	h = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION, TRUE, pid);
-	if (h != NULL) {
-		bSuccess = SetThreadAffinityMask(h, cpumask);
-		if (!bSuccess)
-			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n", pid, cpumask);
-
-		CloseHandle(h);
-	} else {
-		log_err("fio_setaffinity failed: failed to get handle for pid %d\n", pid);
-	}
-
-	return (bSuccess)? 0 : -1;
-}
-
-static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
-{
-	os_cpu_mask_t systemMask;
-
-	HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
-
-	if (h != NULL) {
-		GetProcessAffinityMask(h, mask, &systemMask);
-		CloseHandle(h);
-	} else {
-		log_err("fio_getaffinity failed: failed to get handle for pid %d\n", pid);
-		return -1;
-	}
-
-	return 0;
-}
-
-static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
-{
-	*mask &= ~(1ULL << cpu);
-}
-
-static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
-{
-	*mask |= 1ULL << cpu;
-}
-
-static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
-{
-	return (*mask & (1ULL << cpu)) != 0;
-}
-
-static inline int fio_cpu_count(os_cpu_mask_t *mask)
-{
-	return hweight64(*mask);
-}
-
-static inline int fio_cpuset_init(os_cpu_mask_t *mask)
-{
-	*mask = 0;
-	return 0;
-}
-
-static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
-{
-	return 0;
-}
diff --git a/os/os-windows.h b/os/os-windows.h
index 6061d8c7..6d48ffe8 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -22,11 +22,6 @@
 
 #include "windows/posix.h"
 
-/* MinGW won't declare rand_r unless _POSIX is defined */
-#if defined(WIN32) && !defined(rand_r)
-int rand_r(unsigned *);
-#endif
-
 #ifndef PTHREAD_STACK_MIN
 #define PTHREAD_STACK_MIN 65535
 #endif
@@ -219,7 +214,19 @@ static inline int fio_mkdir(const char *path, mode_t mode) {
 #ifdef CONFIG_WINDOWS_XP
 #include "os-windows-xp.h"
 #else
+#define FIO_HAVE_CPU_ONLINE_SYSCONF
+unsigned int cpus_online(void);
 #include "os-windows-7.h"
 #endif
 
+int first_set_cpu(os_cpu_mask_t *cpumask);
+int fio_setaffinity(int pid, os_cpu_mask_t cpumask);
+int fio_cpuset_init(os_cpu_mask_t *mask);
+int fio_getaffinity(int pid, os_cpu_mask_t *mask);
+void fio_cpu_clear(os_cpu_mask_t *mask, int cpu);
+void fio_cpu_set(os_cpu_mask_t *mask, int cpu);
+int fio_cpu_isset(os_cpu_mask_t *mask, int cpu);
+int fio_cpu_count(os_cpu_mask_t *mask);
+int fio_cpuset_exit(os_cpu_mask_t *mask);
+
 #endif /* FIO_OS_WINDOWS_H */
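`mask_to_group_mask()` first finds the processor group that holds the lowest set CPU. A Windows processor group holds at most 64 logical processors. The function then extracts that group's bits into one 64-bit mask for `SetThreadGroupAffinity()`. The group can start mid-row, so the window may span two rows. Below is a minimal sketch of that extraction step alone, assuming 64-bit rows. `sk_mask_window()` is a hypothetical helper and not fio's code.

```c
#include <stdint.h>

#define SK_STRIDE	64	/* bits per row, as with uint64_t rows */

/*
 * Return 'width' (1..64) consecutive bits of a multi-row CPU mask starting
 * at absolute bit 'start', shifted so that result bit 0 is 'start'.
 * Bits beyond the last row read as zero.
 */
static uint64_t sk_mask_window(const uint64_t *rows, int nrows, int start,
			       int width)
{
	int row = start / SK_STRIDE;
	int bit = start % SK_STRIDE;
	uint64_t out;

	if (row >= nrows)
		return 0;
	out = rows[row] >> bit;
	/* The window crosses a row boundary: append low bits of the next row. */
	if (bit + width > SK_STRIDE && row + 1 < nrows)
		out |= rows[row + 1] << (SK_STRIDE - bit);
	/* Trim to 'width' bits; skip when width == 64 to avoid a 64-bit shift. */
	if (width < SK_STRIDE)
		out &= ((uint64_t)1 << width) - 1;
	return out;
}
```

For a processor group, `start` would be the sum of the sizes of all earlier groups, and `width` would be the group's own size.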
diff --git a/os/windows/cpu-affinity.c b/os/windows/cpu-affinity.c
new file mode 100644
index 00000000..69997b24
--- /dev/null
+++ b/os/windows/cpu-affinity.c
@@ -0,0 +1,444 @@
+#include "os/os.h"
+
+#include <windows.h>
+
+#ifdef CONFIG_WINDOWS_XP
+int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
+{
+	HANDLE h;
+	BOOL bSuccess = FALSE;
+
+	h = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION, TRUE,
+		       pid);
+	if (h != NULL) {
+		bSuccess = SetThreadAffinityMask(h, cpumask);
+		if (!bSuccess)
+			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n",
+				pid, cpumask);
+
+		CloseHandle(h);
+	} else {
+		log_err("fio_setaffinity failed: failed to get handle for pid %d\n",
+			pid);
+	}
+
+	return bSuccess ? 0 : -1;
+}
+
+int fio_getaffinity(int pid, os_cpu_mask_t *mask)
+{
+	os_cpu_mask_t systemMask;
+
+	HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
+
+	if (h != NULL) {
+		GetProcessAffinityMask(h, mask, &systemMask);
+		CloseHandle(h);
+	} else {
+		log_err("fio_getaffinity failed: failed to get handle for pid %d\n",
+			pid);
+		return -1;
+	}
+
+	return 0;
+}
+
+void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
+{
+	*mask &= ~(1ULL << cpu);
+}
+
+void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
+{
+	*mask |= 1ULL << cpu;
+}
+
+int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+{
+	return (*mask & (1ULL << cpu)) != 0;
+}
+
+int fio_cpu_count(os_cpu_mask_t *mask)
+{
+	return hweight64(*mask);
+}
+
+int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	*mask = 0;
+	return 0;
+}
+
+int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
+#else /* CONFIG_WINDOWS_XP */
+/* Return all processors regardless of processor group */
+unsigned int cpus_online(void)
+{
+	return GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
+}
+
+static void print_mask(os_cpu_mask_t *cpumask)
+{
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		dprint(FD_PROCESS, "cpumask[%d]=%lu\n", i, cpumask->row[i]);
+}
+
+/* Return the index of the least significant set CPU in cpumask or -1 if no
+ * CPUs are set */
+int first_set_cpu(os_cpu_mask_t *cpumask)
+{
+	int cpus_offset, mask_first_cpu, row;
+
+	cpus_offset = 0;
+	row = 0;
+	mask_first_cpu = -1;
+	while (mask_first_cpu < 0 && row < FIO_CPU_MASK_ROWS) {
+		int row_first_cpu;
+
+		row_first_cpu = __builtin_ffsll(cpumask->row[row]) - 1;
+		dprint(FD_PROCESS, "row_first_cpu=%d cpumask->row[%d]=%lu\n",
+		       row_first_cpu, row, cpumask->row[row]);
+		if (row_first_cpu > -1) {
+			mask_first_cpu = cpus_offset + row_first_cpu;
+			dprint(FD_PROCESS, "first set cpu in mask is at index %d\n",
+			       mask_first_cpu);
+		} else {
+			cpus_offset += FIO_CPU_MASK_STRIDE;
+			row++;
+		}
+	}
+
+	return mask_first_cpu;
+}
+
+/* Return the index of the most significant set CPU in cpumask or -1 if no
+ * CPUs are set */
+static int last_set_cpu(os_cpu_mask_t *cpumask)
+{
+	int cpus_offset, mask_last_cpu, row;
+
+	cpus_offset = (FIO_CPU_MASK_ROWS - 1) * FIO_CPU_MASK_STRIDE;
+	row = FIO_CPU_MASK_ROWS - 1;
+	mask_last_cpu = -1;
+	while (mask_last_cpu < 0 && row >= 0) {
+		int row_last_cpu;
+
+		if (cpumask->row[row] == 0)
+			row_last_cpu = -1;
+		else {
+			uint64_t tmp = cpumask->row[row];
+
+			row_last_cpu = 0;
+			while (tmp >>= 1)
+			    row_last_cpu++;
+		}
+
+		dprint(FD_PROCESS, "row_last_cpu=%d cpumask->row[%d]=%lu\n",
+		       row_last_cpu, row, cpumask->row[row]);
+		if (row_last_cpu > -1) {
+			mask_last_cpu = cpus_offset + row_last_cpu;
+			dprint(FD_PROCESS, "last set cpu in mask is at index %d\n",
+			       mask_last_cpu);
+		} else {
+			cpus_offset -= FIO_CPU_MASK_STRIDE;
+			row--;
+		}
+	}
+
+	return mask_last_cpu;
+}
+
+static int mask_to_group_mask(os_cpu_mask_t *cpumask, int *processor_group, uint64_t *affinity_mask)
+{
+	WORD online_groups, group, group_size;
+	bool found;
+	int cpus_offset, search_cpu, last_cpu, bit_offset, row, end;
+	uint64_t group_cpumask;
+
+	search_cpu = first_set_cpu(cpumask);
+	if (search_cpu < 0) {
+		log_info("CPU mask doesn't set any CPUs\n");
+		return 1;
+	}
+
+	/* Find processor group first set CPU applies to */
+	online_groups = GetActiveProcessorGroupCount();
+	group = 0;
+	found = false;
+	cpus_offset = 0;
+	group_size = 0;
+	while (!found && group < online_groups) {
+		group_size = GetActiveProcessorCount(group);
+		dprint(FD_PROCESS, "group=%d group_start=%d group_size=%u search_cpu=%d\n",
+		       group, cpus_offset, group_size, search_cpu);
+		if (cpus_offset + group_size > search_cpu)
+			found = true;
+		else {
+			cpus_offset += group_size;
+			group++;
+		}
+	}
+
+	if (!found) {
+		log_err("CPU mask contains processor beyond last active processor index (%d)\n",
+			 cpus_offset - 1);
+		print_mask(cpumask);
+		return 1;
+	}
+
+	/* Check all the CPUs in the mask apply to ONLY that processor group */
+	last_cpu = last_set_cpu(cpumask);
+	if (last_cpu > (cpus_offset + group_size - 1)) {
+		log_info("CPU mask cannot bind CPUs (e.g. %d, %d) that are "
+			 "in different processor groups\n", search_cpu,
+			 last_cpu);
+		print_mask(cpumask);
+		return 1;
+	}
+
+	/* Extract the current processor group mask from the cpumask */
+	row = cpus_offset / FIO_CPU_MASK_STRIDE;
+	bit_offset = cpus_offset % FIO_CPU_MASK_STRIDE;
+	group_cpumask = cpumask->row[row] >> bit_offset;
+	end = bit_offset + group_size;
+	if (end > FIO_CPU_MASK_STRIDE && (row + 1 < FIO_CPU_MASK_ROWS)) {
+		/* Some of the next row needs to be part of the mask */
+		int needed, needed_shift, needed_mask_shift;
+		uint64_t needed_mask;
+
+		needed = end - FIO_CPU_MASK_STRIDE;
+		needed_shift = FIO_CPU_MASK_STRIDE - bit_offset;
+		needed_mask_shift = FIO_CPU_MASK_STRIDE - needed;
+		needed_mask = (uint64_t)-1 >> needed_mask_shift;
+		dprint(FD_PROCESS, "bit_offset=%d end=%d needed=%d needed_shift=%d needed_mask=%ld needed_mask_shift=%d\n", bit_offset, end, needed, needed_shift, needed_mask, needed_mask_shift);
+		group_cpumask |= (cpumask->row[row + 1] & needed_mask) << needed_shift;
+	}
+	group_cpumask &= (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - group_size);
+
+	/* Return group and mask */
+	dprint(FD_PROCESS, "Returning group=%d group_mask=%lu\n", group, group_cpumask);
+	*processor_group = group;
+	*affinity_mask = group_cpumask;
+
+	return 0;
+}
+
+int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
+{
+	HANDLE handle = NULL;
+	int group, ret;
+	uint64_t group_mask = 0;
+	GROUP_AFFINITY new_group_affinity;
+
+	ret = -1;
+
+	if (mask_to_group_mask(&cpumask, &group, &group_mask) != 0)
+		goto err;
+
+	handle = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION,
+			    TRUE, pid);
+	if (handle == NULL) {
+		log_err("fio_setaffinity: failed to get handle for pid %d\n", pid);
+		goto err;
+	}
+
+	/* Set group and mask.
+	 * Note: if the GROUP_AFFINITY struct's Reserved members are not
+	 * initialised to 0 then SetThreadGroupAffinity will fail with
+	 * GetLastError() set to ERROR_INVALID_PARAMETER */
+	new_group_affinity.Mask = (KAFFINITY) group_mask;
+	new_group_affinity.Group = group;
+	new_group_affinity.Reserved[0] = 0;
+	new_group_affinity.Reserved[1] = 0;
+	new_group_affinity.Reserved[2] = 0;
+	if (SetThreadGroupAffinity(handle, &new_group_affinity, NULL) != 0)
+		ret = 0;
+	else {
+		log_err("fio_setaffinity: failed to set thread affinity "
+			 "(pid %d, group %d, mask %" PRIx64 ", "
+			 "GetLastError=%d)\n", pid, group, group_mask,
+			 GetLastError());
+		goto err;
+	}
+
+err:
+	if (handle)
+		CloseHandle(handle);
+	return ret;
+}
+
+static void cpu_to_row_offset(int cpu, int *row, int *offset)
+{
+	*row = cpu / FIO_CPU_MASK_STRIDE;
+	*offset = cpu << FIO_CPU_MASK_STRIDE * *row;
+}
+
+int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		mask->row[i] = 0;
+	return 0;
+}
+
+/*
+ * fio_getaffinity() should not be called once a fio_setaffinity() call has
+ * been made because fio_setaffinity() may put the process into multiple
+ * processor groups
+ */
+int fio_getaffinity(int pid, os_cpu_mask_t *mask)
+{
+	int ret;
+	int row, offset, end, group, group_size, group_start_cpu;
+	DWORD_PTR process_mask, system_mask;
+	HANDLE handle;
+	PUSHORT current_groups;
+	USHORT group_count;
+	WORD online_groups;
+
+	ret = -1;
+	current_groups = NULL;
+	handle = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
+	if (handle == NULL) {
+		log_err("fio_getaffinity: failed to get handle for pid %d\n",
+			pid);
+		goto err;
+	}
+
+	group_count = 16;
+	/*
+	 * GetProcessGroupAffinity() seems to expect more than the natural
+	 * alignment for a USHORT from the area pointed to by current_groups so
+	 * arrange for maximum alignment by allocating via malloc()
+	 */
+	current_groups = malloc(group_count * sizeof(USHORT));
+	if (!current_groups) {
+		log_err("fio_getaffinity: malloc failed\n");
+		goto err;
+	}
+	if (!GetProcessGroupAffinity(handle, &group_count, current_groups)) {
+		log_err("%s: failed to get single group affinity for pid %d (%d)\n",
+			__func__, pid, GetLastError());
+		goto err;
+	}
+	if (group_count > 1) {
+		log_err("%s: pid %d is associated with %d process groups\n",
+			__func__, pid, group_count);
+		goto err;
+	}
+	if (!GetProcessAffinityMask(handle, &process_mask, &system_mask)) {
+		log_err("%s: GetProcessAffinityMask() failed for pid\n",
+			__func__, pid);
+		goto err;
+	}
+
+	/* Convert group and group relative mask to full CPU mask */
+	online_groups = GetActiveProcessorGroupCount();
+	if (online_groups == 0) {
+		log_err("fio_getaffinity: error retrieving total processor groups\n");
+		goto err;
+	}
+
+	group = 0;
+	group_start_cpu = 0;
+	group_size = 0;
+	dprint(FD_PROCESS, "current_groups=%d group_count=%d\n",
+	       current_groups[0], group_count);
+	while (true) {
+		group_size = GetActiveProcessorCount(group);
+		if (group_size == 0) {
+			log_err("fio_getaffinity: error retrieving size of "
+				"processor group %d\n", group);
+			goto err;
+		} else if (group >= current_groups[0] || group >= online_groups)
+			break;
+		else {
+			group_start_cpu += group_size;
+			group++;
+		}
+	}
+
+	if (group != current_groups[0]) {
+		log_err("fio_getaffinity: could not find processor group %d\n",
+			current_groups[0]);
+		goto err;
+	}
+
+	dprint(FD_PROCESS, "group_start_cpu=%d, group size=%u\n",
+	       group_start_cpu, group_size);
+	if ((group_start_cpu + group_size) >= FIO_MAX_CPUS) {
+		log_err("fio_getaffinity failed: current CPU affinity (group "
+			"%d, group_start_cpu %d, group_size %d) extends "
+			"beyond mask's highest CPU (%d)\n", group,
+			group_start_cpu, group_size, FIO_MAX_CPUS);
+		goto err;
+	}
+
+	fio_cpuset_init(mask);
+	cpu_to_row_offset(group_start_cpu, &row, &offset);
+	mask->row[row] = process_mask;
+	mask->row[row] <<= offset;
+	end = offset + group_size;
+	if (end > FIO_CPU_MASK_STRIDE) {
+		int needed;
+		uint64_t needed_mask;
+
+		needed = FIO_CPU_MASK_STRIDE - end;
+		needed_mask = (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - needed);
+		row++;
+		mask->row[row] = process_mask;
+		mask->row[row] >>= needed;
+		mask->row[row] &= needed_mask;
+	}
+	ret = 0;
+
+err:
+	if (handle)
+		CloseHandle(handle);
+	if (current_groups)
+		free(current_groups);
+
+	return ret;
+}
+
+void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	mask->row[row] &= ~(1ULL << offset);
+}
+
+void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	mask->row[row] |= 1ULL << offset;
+}
+
+int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	return (mask->row[row] & (1ULL << offset)) != 0;
+}
+
+int fio_cpu_count(os_cpu_mask_t *mask)
+{
+	int count = 0;
+
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		count += hweight64(mask->row[i]);
+
+	return count;
+}
+
+int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
+#endif /* CONFIG_WINDOWS_XP */
diff --git a/os/windows/posix.c b/os/windows/posix.c
index fd1d5582..e36453e9 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -28,10 +28,6 @@
 extern unsigned long mtime_since_now(struct timespec *);
 extern void fio_gettime(struct timespec *, void *);
 
-/* These aren't defined in the MinGW headers */
-HRESULT WINAPI StringCchCopyA(char *pszDest, size_t cchDest, const char *pszSrc);
-HRESULT WINAPI StringCchPrintfA(char *pszDest, size_t cchDest, const char *pszFormat, ...);
-
 int win_to_posix_error(DWORD winerr)
 {
 	switch (winerr) {
@@ -312,11 +308,11 @@ char *ctime_r(const time_t *t, char *buf)
 	 * We don't know how long `buf` is, but assume it's rounded up from
 	 * the minimum of 25 to 32
 	 */
-	StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d\n",
-				dayOfWeek[systime.wDayOfWeek % 7],
-				monthOfYear[(systime.wMonth - 1) % 12],
-				systime.wDay, systime.wHour, systime.wMinute,
-				systime.wSecond, systime.wYear);
+	snprintf(buf, 32, "%s %s %d %02d:%02d:%02d %04d\n",
+		 dayOfWeek[systime.wDayOfWeek % 7],
+		 monthOfYear[(systime.wMonth - 1) % 12],
+		 systime.wDay, systime.wHour, systime.wMinute,
+		 systime.wSecond, systime.wYear);
 	return buf;
 }
 
@@ -958,8 +954,8 @@ DIR *opendir(const char *dirname)
 				OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
 	if (file != INVALID_HANDLE_VALUE) {
 		CloseHandle(file);
-		dc = (struct dirent_ctx*)malloc(sizeof(struct dirent_ctx));
-		StringCchCopyA(dc->dirname, MAX_PATH, dirname);
+		dc = malloc(sizeof(struct dirent_ctx));
+		snprintf(dc->dirname, sizeof(dc->dirname), "%s", dirname);
 		dc->find_handle = INVALID_HANDLE_VALUE;
 	} else {
 		DWORD error = GetLastError();
@@ -999,7 +995,8 @@ struct dirent *readdir(DIR *dirp)
 	if (dirp->find_handle == INVALID_HANDLE_VALUE) {
 		char search_pattern[MAX_PATH];
 
-		StringCchPrintfA(search_pattern, MAX_PATH-1, "%s\\*", dirp->dirname);
+		snprintf(search_pattern, sizeof(search_pattern), "%s\\*",
+			 dirp->dirname);
 		dirp->find_handle = FindFirstFileA(search_pattern, &find_data);
 		if (dirp->find_handle == INVALID_HANDLE_VALUE)
 			return NULL;
@@ -1008,7 +1005,7 @@ struct dirent *readdir(DIR *dirp)
 			return NULL;
 	}
 
-	StringCchCopyA(de.d_name, MAX_PATH, find_data.cFileName);
+	snprintf(de.d_name, sizeof(de.d_name), find_data.cFileName);
 	de.d_ino = 0;
 
 	return &de;
diff --git a/os/windows/posix.h b/os/windows/posix.h
index 85640a21..02a9075b 100644
--- a/os/windows/posix.h
+++ b/os/windows/posix.h
@@ -1,7 +1,6 @@
 #ifndef FIO_WINDOWS_POSIX_H
 #define FIO_WINDOWS_POSIX_H
 
-typedef off_t off64_t;
 typedef int clockid_t;
 
 extern int clock_gettime(clockid_t clock_id, struct timespec *tp);
diff --git a/stat.c b/stat.c
index e2bc8ddb..2b303494 100644
--- a/stat.c
+++ b/stat.c
@@ -15,6 +15,7 @@
 #include "helper_thread.h"
 #include "smalloc.h"
 #include "zbd.h"
+#include "oslib/asprintf.h"
 
 #define LOG_MSEC_SLACK	1
 
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index a0a1e8fa..3d236e37 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -428,11 +428,13 @@ class Requirements(object):
     _root = False
     _zoned_nullb = False
     _not_macos = False
+    _not_windows = False
     _unittests = False
     _cpucount4 = False
 
     def __init__(self, fio_root):
         Requirements._not_macos = platform.system() != "Darwin"
+        Requirements._not_windows = platform.system() != "Windows"
         Requirements._linux = platform.system() == "Linux"
 
         if Requirements._linux:
@@ -470,6 +472,7 @@ class Requirements(object):
                     Requirements.root,
                     Requirements.zoned_nullb,
                     Requirements.not_macos,
+                    Requirements.not_windows,
                     Requirements.unittests,
                     Requirements.cpucount4]
         for req in req_list:
@@ -494,6 +497,9 @@ class Requirements(object):
     def not_macos():
         return Requirements._not_macos, "platform other than macOS required"
 
+    def not_windows():
+        return Requirements._not_windows, "platform other than Windows required"
+
     def unittests():
         return Requirements._unittests, "Unittests support required"
 
@@ -561,7 +567,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
-            'requirements':     [],
+            'requirements':     [Requirements.not_windows],
         },
         {
             'test_id':          6,
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index 9122a60f..b55a67ac 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -187,7 +187,7 @@ if __name__ == '__main__':
                     # check runtime, confirm criterion calculation, and confirm that criterion was not met
                     expected = job['timeout'] * 1000
                     actual = jsonjob['read']['runtime']
-                    if abs(expected - actual) > 10:
+                    if abs(expected - actual) > 50:
                         line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
                     else:
                         line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
@@ -215,12 +215,12 @@ if __name__ == '__main__':
             else:
                 expected = job['timeout'] * 1000
                 actual = jsonjob['read']['runtime']
-                if abs(expected - actual) < 10:
-                    result = 'PASSED '
-                    passed = passed + 1
-                else:
+                if abs(expected - actual) > 50:
                     result = 'FAILED '
                     failed = failed + 1
+                else:
+                    result = 'PASSED '
+                    passed = passed + 1
                 line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
             print(line)
             if 'steadystate' in jsonjob:
diff --git a/verify.c b/verify.c
index 37d2be8d..a2c0d41d 100644
--- a/verify.c
+++ b/verify.c
@@ -14,6 +14,7 @@
 #include "lib/rand.h"
 #include "lib/hweight.h"
 #include "lib/pattern.h"
+#include "oslib/asprintf.h"
 
 #include "crc/md5.h"
 #include "crc/crc64.h"
diff --git a/zbd.c b/zbd.c
index 99310c49..ee8bcb30 100644
--- a/zbd.c
+++ b/zbd.c
@@ -13,10 +13,12 @@
 #include <sys/stat.h>
 #include <unistd.h>
 #include <linux/blkzoned.h>
+
 #include "file.h"
 #include "fio.h"
 #include "lib/pow2.h"
 #include "log.h"
+#include "oslib/asprintf.h"
 #include "smalloc.h"
 #include "verify.h"
 #include "zbd.h"



* Recent changes (master)
@ 2020-01-06 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-06 13:00 UTC
  To: fio

The following changes since commit fa6e7f4fb827adb124dbb97a7f72d64e76b2fe6a:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-04 20:22:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dbab52955aeb0b58cc88c8eff1b1c2239241f0bd:

  Merge branch 'memalign1' of https://github.com/kusumi/fio (2020-01-05 10:31:40 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'memalign1' of https://github.com/kusumi/fio

Tomohiro Kusumi (1):
      lib/memalign: remove smalloc()/sfree() dependency

 fio.h                    | 12 ++++++++++++
 lib/memalign.c           | 15 ++++-----------
 lib/memalign.h           |  7 +++++--
 t/dedupe.c               |  2 +-
 unittests/lib/memalign.c |  3 ++-
 unittests/unittest.c     | 11 -----------
 unittests/unittest.h     |  4 ----
 7 files changed, 24 insertions(+), 30 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index e943ad16..6a5ead4d 100644
--- a/fio.h
+++ b/fio.h
@@ -36,6 +36,8 @@
 #include "lib/rand.h"
 #include "lib/rbtree.h"
 #include "lib/num2str.h"
+#include "lib/memalign.h"
+#include "smalloc.h"
 #include "client.h"
 #include "server.h"
 #include "stat.h"
@@ -856,4 +858,14 @@ extern void check_trigger_file(void);
 extern bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u);
 extern pthread_mutex_t overlap_check;
 
+static inline void *fio_memalign(size_t alignment, size_t size, bool shared)
+{
+	return __fio_memalign(alignment, size, shared ? smalloc : malloc);
+}
+
+static inline void fio_memfree(void *ptr, size_t size, bool shared)
+{
+	return __fio_memfree(ptr, size, shared ? sfree : free);
+}
+
 #endif
diff --git a/lib/memalign.c b/lib/memalign.c
index 537bb9fb..214a66fa 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -11,18 +11,14 @@ struct align_footer {
 	unsigned int offset;
 };
 
-void *fio_memalign(size_t alignment, size_t size, bool shared)
+void *__fio_memalign(size_t alignment, size_t size, malloc_fn fn)
 {
 	struct align_footer *f;
 	void *ptr, *ret = NULL;
 
 	assert(!(alignment & (alignment - 1)));
 
-	if (shared)
-		ptr = smalloc(size + alignment + sizeof(*f) - 1);
-	else
-		ptr = malloc(size + alignment + sizeof(*f) - 1);
-
+	ptr = fn(size + alignment + sizeof(*f) - 1);
 	if (ptr) {
 		ret = PTR_ALIGN(ptr, alignment - 1);
 		f = ret + size;
@@ -32,12 +28,9 @@ void *fio_memalign(size_t alignment, size_t size, bool shared)
 	return ret;
 }
 
-void fio_memfree(void *ptr, size_t size, bool shared)
+void __fio_memfree(void *ptr, size_t size, free_fn fn)
 {
 	struct align_footer *f = ptr + size;
 
-	if (shared)
-		sfree(ptr - f->offset);
-	else
-		free(ptr - f->offset);
+	fn(ptr - f->offset);
 }
diff --git a/lib/memalign.h b/lib/memalign.h
index d7030870..815e3aa2 100644
--- a/lib/memalign.h
+++ b/lib/memalign.h
@@ -4,7 +4,10 @@
 #include <inttypes.h>
 #include <stdbool.h>
 
-extern void *fio_memalign(size_t alignment, size_t size, bool shared);
-extern void fio_memfree(void *ptr, size_t size, bool shared);
+typedef void* (*malloc_fn)(size_t);
+typedef void (*free_fn)(void*);
+
+extern void *__fio_memalign(size_t alignment, size_t size, malloc_fn fn);
+extern void __fio_memfree(void *ptr, size_t size, free_fn fn);
 
 #endif
diff --git a/t/dedupe.c b/t/dedupe.c
index 2ef8dc53..68d31f19 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -10,13 +10,13 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#include "../fio.h"
 #include "../flist.h"
 #include "../log.h"
 #include "../fio_sem.h"
 #include "../smalloc.h"
 #include "../minmax.h"
 #include "../crc/md5.h"
-#include "../lib/memalign.h"
 #include "../os/os.h"
 #include "../gettime.h"
 #include "../fio_time.h"
diff --git a/unittests/lib/memalign.c b/unittests/lib/memalign.c
index 854c2744..42a2e31a 100644
--- a/unittests/lib/memalign.c
+++ b/unittests/lib/memalign.c
@@ -1,3 +1,4 @@
+#include <stdlib.h>
 #include "../unittest.h"
 
 #include "../../lib/memalign.h"
@@ -5,7 +6,7 @@
 static void test_memalign_1(void)
 {
 	size_t align = 4096;
-	void *p = fio_memalign(align, 1234, false);
+	void *p = __fio_memalign(align, 1234, malloc);
 
 	if (p)
 		CU_ASSERT_EQUAL(((int)(uintptr_t)p) & (align - 1), 0);
diff --git a/unittests/unittest.c b/unittests/unittest.c
index 66789e4f..c37e1971 100644
--- a/unittests/unittest.c
+++ b/unittests/unittest.c
@@ -8,17 +8,6 @@
 
 #include "./unittest.h"
 
-/* XXX workaround lib/memalign.c's dependency on smalloc.c */
-void *smalloc(size_t size)
-{
-	return malloc(size);
-}
-
-void sfree(void *ptr)
-{
-	free(ptr);
-}
-
 CU_ErrorCode fio_unittest_add_suite(const char *name, CU_InitializeFunc initfn,
 	CU_CleanupFunc cleanfn, struct fio_unittest_entry *tvec)
 {
diff --git a/unittests/unittest.h b/unittests/unittest.h
index bbc49613..786c1c97 100644
--- a/unittests/unittest.h
+++ b/unittests/unittest.h
@@ -11,10 +11,6 @@ struct fio_unittest_entry {
 	CU_TestFunc fn;
 };
 
-/* XXX workaround lib/memalign.c's dependency on smalloc.c */
-void *smalloc(size_t);
-void sfree(void*);
-
 CU_ErrorCode fio_unittest_add_suite(const char*, CU_InitializeFunc,
 	CU_CleanupFunc, struct fio_unittest_entry*);
 



* Recent changes (master)
@ 2020-01-05 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-05 13:00 UTC
  To: fio

The following changes since commit 235b09b68d9bbc5150891faca47f5397d7e806bb:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-03 21:00:17 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fa6e7f4fb827adb124dbb97a7f72d64e76b2fe6a:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-04 20:22:16 -0700)

----------------------------------------------------------------
Bart Van Assche (6):
      Handle helper_thread_create() failures properly
      Fix a potential deadlock in helper_do_stat()
      Block signals for the helper thread
      helper_thread: Complain if select() fails
      t/memlock: Free allocated memory
      t/read-to-pipe-async: Complain if option -f is specified multiple times

Jens Axboe (2):
      Merge branch 'master' of https://github.com/bvanassche/fio
      Merge branch 'master' of https://github.com/bvanassche/fio

 backend.c              |   3 +-
 configure              |  66 ++++++++++++++++
 helper_thread.c        | 211 ++++++++++++++++++++++++++++++++++++++++---------
 t/memlock.c            |   1 +
 t/read-to-pipe-async.c |   2 +
 5 files changed, 244 insertions(+), 39 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d0d691b3..0d1f4734 100644
--- a/backend.c
+++ b/backend.c
@@ -2495,7 +2495,8 @@ int fio_backend(struct sk_out *sk_out)
 
 	set_genesis_time();
 	stat_init();
-	helper_thread_create(startup_sem, sk_out);
+	if (helper_thread_create(startup_sem, sk_out))
+		log_err("fio: failed to create helper thread\n");
 
 	cgroup_list = smalloc(sizeof(*cgroup_list));
 	if (cgroup_list)
diff --git a/configure b/configure
index 3a675a46..fa6df532 100755
--- a/configure
+++ b/configure
@@ -726,6 +726,27 @@ elif compile_prog "" "$LIBS -lpthread" "pthread_condattr_setclock" ; then
 fi
 print_config "pthread_condattr_setclock()" "$pthread_condattr_setclock"
 
+##########################################
+# pthread_sigmask() probe
+if test "$pthread_sigmask" != "yes" ; then
+  pthread_sigmask="no"
+fi
+cat > $TMPC <<EOF
+#include <stddef.h> /* NULL */
+#include <signal.h> /* pthread_sigmask() */
+int main(void)
+{
+  return pthread_sigmask(0, NULL, NULL);
+}
+EOF
+if compile_prog "" "$LIBS" "pthread_sigmask" ; then
+  pthread_sigmask=yes
+elif compile_prog "" "$LIBS -lpthread" "pthread_sigmask" ; then
+  pthread_sigmask=yes
+  LIBS="$LIBS -lpthread"
+fi
+print_config "pthread_sigmask()" "$pthread_sigmask"
+
 ##########################################
 # solaris aio probe
 if test "$solaris_aio" != "yes" ; then
@@ -1113,6 +1134,42 @@ if compile_prog "" "" "fdatasync"; then
 fi
 print_config "fdatasync" "$fdatasync"
 
+##########################################
+# pipe() probe
+if test "$pipe" != "yes" ; then
+  pipe="no"
+fi
+cat > $TMPC << EOF
+#include <unistd.h>
+int main(int argc, char **argv)
+{
+  int fd[2];
+  return pipe(fd);
+}
+EOF
+if compile_prog "" "" "pipe"; then
+  pipe="yes"
+fi
+print_config "pipe()" "$pipe"
+
+##########################################
+# pipe2() probe
+if test "$pipe2" != "yes" ; then
+  pipe2="no"
+fi
+cat > $TMPC << EOF
+#include <unistd.h>
+int main(int argc, char **argv)
+{
+  int fd[2];
+  return pipe2(fd, 0);
+}
+EOF
+if compile_prog "" "" "pipe2"; then
+  pipe2="yes"
+fi
+print_config "pipe2()" "$pipe2"
+
 ##########################################
 # pread() probe
 if test "$pread" != "yes" ; then
@@ -2498,6 +2555,9 @@ fi
 if test "$pthread_condattr_setclock" = "yes" ; then
   output_sym "CONFIG_PTHREAD_CONDATTR_SETCLOCK"
 fi
+if test "$pthread_sigmask" = "yes" ; then
+  output_sym "CONFIG_PTHREAD_SIGMASK"
+fi
 if test "$have_asprintf" = "yes" ; then
     output_sym "CONFIG_HAVE_ASPRINTF"
 fi
@@ -2513,6 +2573,12 @@ fi
 if test "$fdatasync" = "yes" ; then
   output_sym "CONFIG_FDATASYNC"
 fi
+if test "$pipe" = "yes" ; then
+  output_sym "CONFIG_PIPE"
+fi
+if test "$pipe2" = "yes" ; then
+  output_sym "CONFIG_PIPE2"
+fi
 if test "$pread" = "yes" ; then
   output_sym "CONFIG_PREAD"
 fi
diff --git a/helper_thread.c b/helper_thread.c
index eba2898a..78749855 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -1,3 +1,4 @@
+#include <signal.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
 #else
@@ -10,48 +11,103 @@
 #include "steadystate.h"
 #include "pshared.h"
 
+enum action {
+	A_EXIT		= 1,
+	A_RESET		= 2,
+	A_DO_STAT	= 3,
+};
+
 static struct helper_data {
 	volatile int exit;
-	volatile int reset;
-	volatile int do_stat;
+	int pipe[2]; /* 0: read end; 1: write end. */
 	struct sk_out *sk_out;
 	pthread_t thread;
-	pthread_mutex_t lock;
-	pthread_cond_t cond;
 	struct fio_sem *startup_sem;
 } *helper_data;
 
 void helper_thread_destroy(void)
 {
-	pthread_cond_destroy(&helper_data->cond);
-	pthread_mutex_destroy(&helper_data->lock);
+	if (!helper_data)
+		return;
+
+	close(helper_data->pipe[0]);
+	close(helper_data->pipe[1]);
 	sfree(helper_data);
 }
 
-void helper_reset(void)
+#ifdef _WIN32
+static void sock_init(void)
 {
-	if (!helper_data)
-		return;
+	WSADATA wsaData;
+	int res;
 
-	pthread_mutex_lock(&helper_data->lock);
+	/* It is allowed to call WSAStartup() more than once. */
+	res = WSAStartup(MAKEWORD(2, 2), &wsaData);
+	assert(res == 0);
+}
 
-	if (!helper_data->reset) {
-		helper_data->reset = 1;
-		pthread_cond_signal(&helper_data->cond);
-	}
+static int make_nonblocking(int fd)
+{
+	unsigned long arg = 1;
 
-	pthread_mutex_unlock(&helper_data->lock);
+	return ioctlsocket(fd, FIONBIO, &arg);
 }
 
-void helper_do_stat(void)
+static int write_to_pipe(int fd, const void *buf, size_t len)
+{
+	return send(fd, buf, len, 0);
+}
+
+static int read_from_pipe(int fd, void *buf, size_t len)
+{
+	return recv(fd, buf, len, 0);
+}
+#else
+static void sock_init(void)
+{
+}
+
+static int make_nonblocking(int fd)
+{
+	return fcntl(fd, F_SETFL, O_NONBLOCK);
+}
+
+static int write_to_pipe(int fd, const void *buf, size_t len)
+{
+	return write(fd, buf, len);
+}
+
+static int read_from_pipe(int fd, void *buf, size_t len)
+{
+	return read(fd, buf, len);
+}
+#endif
+
+static void submit_action(enum action a)
 {
+	const char data = a;
+	int ret;
+
 	if (!helper_data)
 		return;
 
-	pthread_mutex_lock(&helper_data->lock);
-	helper_data->do_stat = 1;
-	pthread_cond_signal(&helper_data->cond);
-	pthread_mutex_unlock(&helper_data->lock);
+	ret = write_to_pipe(helper_data->pipe[1], &data, sizeof(data));
+	assert(ret == 1);
+}
+
+void helper_reset(void)
+{
+	submit_action(A_RESET);
+}
+
+/*
+ * May be invoked in signal handler context and hence must only call functions
+ * that are async-signal-safe. See also
+ * https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03.
+ */
+void helper_do_stat(void)
+{
+	submit_action(A_DO_STAT);
 }
 
 bool helper_should_exit(void)
@@ -64,14 +120,12 @@ bool helper_should_exit(void)
 
 void helper_thread_exit(void)
 {
-	void *ret;
+	if (!helper_data)
+		return;
 
-	pthread_mutex_lock(&helper_data->lock);
 	helper_data->exit = 1;
-	pthread_cond_signal(&helper_data->cond);
-	pthread_mutex_unlock(&helper_data->lock);
-
-	pthread_join(helper_data->thread, &ret);
+	submit_action(A_EXIT);
+	pthread_join(helper_data->thread, NULL);
 }
 
 static void *helper_thread_main(void *data)
@@ -79,10 +133,23 @@ static void *helper_thread_main(void *data)
 	struct helper_data *hd = data;
 	unsigned int msec_to_next_event, next_log, next_ss = STEADYSTATE_MSEC;
 	struct timespec ts, last_du, last_ss;
+	char action;
 	int ret = 0;
 
 	sk_out_assign(hd->sk_out);
 
+#ifdef HAVE_PTHREAD_SIGMASK
+	{
+	sigset_t sigmask;
+
+	/* Let another thread handle signals. */
+	ret = pthread_sigmask(SIG_UNBLOCK, NULL, &sigmask);
+	assert(ret == 0);
+	ret = pthread_sigmask(SIG_BLOCK, &sigmask, NULL);
+	assert(ret == 0);
+	}
+#endif
+
 #ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
 	clock_gettime(CLOCK_MONOTONIC, &ts);
 #else
@@ -96,11 +163,27 @@ static void *helper_thread_main(void *data)
 	msec_to_next_event = DISK_UTIL_MSEC;
 	while (!ret && !hd->exit) {
 		uint64_t since_du, since_ss = 0;
+		struct timeval timeout = {
+			.tv_sec  = DISK_UTIL_MSEC / 1000,
+			.tv_usec = (DISK_UTIL_MSEC % 1000) * 1000,
+		};
+		fd_set rfds, efds;
 
 		timespec_add_msec(&ts, msec_to_next_event);
 
-		pthread_mutex_lock(&hd->lock);
-		pthread_cond_timedwait(&hd->cond, &hd->lock, &ts);
+		if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) < 0) {
+			FD_ZERO(&rfds);
+			FD_SET(hd->pipe[0], &rfds);
+			FD_ZERO(&efds);
+			FD_SET(hd->pipe[0], &efds);
+			ret = select(1, &rfds, NULL, &efds, &timeout);
+			if (ret < 0)
+				log_err("fio: select() call in helper thread failed: %s",
+					strerror(errno));
+			if (read_from_pipe(hd->pipe[0], &action, sizeof(action)) <
+			    0)
+				action = 0;
+		}
 
 #ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
 		clock_gettime(CLOCK_MONOTONIC, &ts);
@@ -108,14 +191,11 @@ static void *helper_thread_main(void *data)
 		clock_gettime(CLOCK_REALTIME, &ts);
 #endif
 
-		if (hd->reset) {
-			memcpy(&last_du, &ts, sizeof(ts));
-			memcpy(&last_ss, &ts, sizeof(ts));
-			hd->reset = 0;
+		if (action == A_RESET) {
+			last_du = ts;
+			last_ss = ts;
 		}
 
-		pthread_mutex_unlock(&hd->lock);
-
 		since_du = mtime_since(&last_du, &ts);
 		if (since_du >= DISK_UTIL_MSEC || DISK_UTIL_MSEC - since_du < 10) {
 			ret = update_io_ticks();
@@ -126,10 +206,8 @@ static void *helper_thread_main(void *data)
 		} else
 			msec_to_next_event = DISK_UTIL_MSEC - since_du;
 
-		if (hd->do_stat) {
-			hd->do_stat = 0;
+		if (action == A_DO_STAT)
 			__show_running_run_stats();
-		}
 
 		next_log = calc_log_samples();
 		if (!next_log)
@@ -161,6 +239,54 @@ static void *helper_thread_main(void *data)
 	return NULL;
 }
 
+/*
+ * Connect two sockets to each other to emulate the pipe() system call on Windows.
+ */
+int pipe_over_loopback(int fd[2])
+{
+	struct sockaddr_in addr = { .sin_family = AF_INET };
+	socklen_t len = sizeof(addr);
+	int res;
+
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	sock_init();
+
+	fd[0] = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd[0] < 0)
+		goto err;
+	fd[1] = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd[1] < 0)
+		goto close_fd_0;
+	res = bind(fd[0], (struct sockaddr *)&addr, len);
+	if (res < 0)
+		goto close_fd_1;
+	res = getsockname(fd[0], (struct sockaddr *)&addr, &len);
+	if (res < 0)
+		goto close_fd_1;
+	res = listen(fd[0], 1);
+	if (res < 0)
+		goto close_fd_1;
+	res = connect(fd[1], (struct sockaddr *)&addr, len);
+	if (res < 0)
+		goto close_fd_1;
+	res = accept(fd[0], NULL, NULL);
+	if (res < 0)
+		goto close_fd_1;
+	close(fd[0]);
+	fd[0] = res;
+	return 0;
+
+close_fd_1:
+	close(fd[1]);
+
+close_fd_0:
+	close(fd[0]);
+
+err:
+	return -1;
+}
+
 int helper_thread_create(struct fio_sem *startup_sem, struct sk_out *sk_out)
 {
 	struct helper_data *hd;
@@ -173,10 +299,19 @@ int helper_thread_create(struct fio_sem *startup_sem, struct sk_out *sk_out)
 
 	hd->sk_out = sk_out;
 
-	ret = mutex_cond_init_pshared(&hd->lock, &hd->cond);
+#if defined(CONFIG_PIPE2)
+	ret = pipe2(hd->pipe, O_CLOEXEC);
+#elif defined(CONFIG_PIPE)
+	ret = pipe(hd->pipe);
+#else
+	ret = pipe_over_loopback(hd->pipe);
+#endif
 	if (ret)
 		return 1;
 
+	ret = make_nonblocking(hd->pipe[0]);
+	assert(ret >= 0);
+
 	hd->startup_sem = startup_sem;
 
 	DRD_IGNORE_VAR(helper_data);
diff --git a/t/memlock.c b/t/memlock.c
index 3d3579ad..ebedb91d 100644
--- a/t/memlock.c
+++ b/t/memlock.c
@@ -26,6 +26,7 @@ static void *worker(void *data)
 			first = 0;
 		}
 	}
+	free(buf);
 	return NULL;
 }
 
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index 69799336..bc7986f7 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -520,6 +520,8 @@ static int parse_options(int argc, char *argv[])
 	while ((c = getopt(argc, argv, "f:b:t:w:")) != -1) {
 		switch (c) {
 		case 'f':
+			if (file)
+				return usage(argv);
 			file = strdup(optarg);
 			break;
 		case 'b':



* Recent changes (master)
@ 2020-01-04 13:00 Jens Axboe
From: Jens Axboe @ 2020-01-04 13:00 UTC
  To: fio

The following changes since commit aae515f4e1cb0b3c003e127200d344d807032a79:

  io_uring: Enable io_uring ioengine on aarch64 arch (2019-12-25 08:02:57 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 235b09b68d9bbc5150891faca47f5397d7e806bb:

  Merge branch 'master' of https://github.com/bvanassche/fio (2020-01-03 21:00:17 -0700)

----------------------------------------------------------------
Bart Van Assche (9):
      Micro-optimize __load_ioengine()
      Suppress a Coverity taint warning in check_status_file()
      Makefile: Build more test code
      Use CLOCK_MONOTONIC for condition variables used by pthread_cond_timedwait()
      t/read-to-pipe-async: Use the monotonic clock when measuring time intervals
      Fix a potential deadlock in helper_do_stat()
      .appveyor.yml: Convert to ASCII
      Only build t/read-to-pipe-async if pread() is available
      Improve the pthread_condattr_setclock() test

Jens Axboe (5):
      Merge branch 'unit1' of https://github.com/kusumi/fio
      Merge branch 'master' of https://github.com/bvanassche/fio
      Revert "Fix a potential deadlock in helper_do_stat()"
      Merge branch 'master' of https://github.com/kenbarr1/fio
      Merge branch 'master' of https://github.com/bvanassche/fio

Ken Barr (1):
      io_u: fix rate limiting to handle file wrap-around

Tomohiro Kusumi (2):
      unittests: add unittest suite for oslib/strcasestr.c
      unittests: add unittest suite for oslib/strsep.c

 .appveyor.yml                |   2 +-
 Makefile                     |  11 +++++
 configure                    |  45 ++++++++++++++++++
 fio_sem.c                    |  11 +++--
 helper_thread.c              |  17 ++++---
 idletime.c                   |  16 ++++++-
 io_u.c                       |   2 +-
 ioengines.c                  |  11 ++---
 pshared.c                    |   9 ++++
 stat.c                       |   3 ++
 t/read-to-pipe-async.c       |  68 +++++++++++++++++----------
 unittests/oslib/strcasestr.c |  87 +++++++++++++++++++++++++++++++++++
 unittests/oslib/strsep.c     | 107 +++++++++++++++++++++++++++++++++++++++++++
 unittests/unittest.c         |   2 +
 unittests/unittest.h         |   2 +
 15 files changed, 348 insertions(+), 45 deletions(-)
 create mode 100644 unittests/oslib/strcasestr.c
 create mode 100644 unittests/oslib/strsep.c
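Taken together, the fio_sem.c, helper_thread.c, idletime.c, pshared.c and t/read-to-pipe-async.c changes in this batch apply one rule: an absolute pthread_cond_timedwait() deadline must be read from the same clock the condition variable was initialised with. When configure defines CONFIG_PTHREAD_CONDATTR_SETCLOCK, the condvars are bound to CLOCK_MONOTONIC, which does not jump when the wall clock is set. Otherwise, both the condvar and the deadline fall back to CLOCK_REALTIME. Below is a minimal sketch of the pattern, assuming POSIX threads. The function names are hypothetical. Treating Linux as always providing pthread_condattr_setclock() is an assumption that stands in for the configure probe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Assumption: on Linux, pthread_condattr_setclock() is available. This stands
 * in for the CONFIG_PTHREAD_CONDATTR_SETCLOCK symbol fio's configure emits.
 */
#if defined(__linux__) && !defined(CONFIG_PTHREAD_CONDATTR_SETCLOCK)
#define CONFIG_PTHREAD_CONDATTR_SETCLOCK
#endif

#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
#define WAIT_CLOCK CLOCK_MONOTONIC
#else
#define WAIT_CLOCK CLOCK_REALTIME
#endif

/* Initialise a condition variable whose timed waits are measured on WAIT_CLOCK. */
static int cond_init_waitclock(pthread_cond_t *cond)
{
	pthread_condattr_t cattr;
	int ret;

	ret = pthread_condattr_init(&cattr);
	if (ret)
		return ret;
#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
	if (ret) {
		pthread_condattr_destroy(&cattr);
		return ret;
	}
#endif
	ret = pthread_cond_init(cond, &cattr);
	pthread_condattr_destroy(&cattr);
	return ret;
}

/* Absolute deadline msecs from now, read from the same clock as the condvar. */
static struct timespec deadline_after_ms(unsigned int msecs)
{
	struct timespec t;

	clock_gettime(WAIT_CLOCK, &t);
	t.tv_sec += msecs / 1000;
	t.tv_nsec += (long)(msecs % 1000) * 1000000L;
	if (t.tv_nsec >= 1000000000L) {	/* keep tv_nsec in [0, 1e9) */
		t.tv_sec++;
		t.tv_nsec -= 1000000000L;
	}
	return t;
}

/* Wait until *flag is true or msecs elapse; returns the final value of *flag. */
static bool wait_flag_ms(pthread_mutex_t *lock, pthread_cond_t *cond,
			 const bool *flag, unsigned int msecs)
{
	struct timespec deadline = deadline_after_ms(msecs);
	bool set;

	pthread_mutex_lock(lock);
	/* Loop, because timedwait may wake spuriously before the flag is set. */
	while (!*flag) {
		if (pthread_cond_timedwait(cond, lock, &deadline) == ETIMEDOUT)
			break;
	}
	set = *flag;
	pthread_mutex_unlock(lock);
	return set;
}
```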

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index f6934096..4fb0a90d 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -14,7 +14,7 @@ environment:
 
 install:
   - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib,mingw64-%PACKAGE_ARCH%-CUnit" > NUL'
-  - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% #��NB: Changed env variables persist to later sections
+  - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
   - python.exe -m pip install scipy
 
 build_script:
diff --git a/Makefile b/Makefile
index 4a07fab3..dd26afca 100644
--- a/Makefile
+++ b/Makefile
@@ -313,6 +313,13 @@ T_TEST_PROGS += $(T_GEN_RAND_PROGS)
 T_PROGS += $(T_BTRACE_FIO_PROGS)
 T_PROGS += $(T_DEDUPE_PROGS)
 T_PROGS += $(T_VS_PROGS)
+T_TEST_PROGS += $(T_MEMLOCK_PROGS)
+ifdef CONFIG_PREAD
+T_TEST_PROGS += $(T_PIPE_ASYNC_PROGS)
+endif
+ifneq (,$(findstring Linux,$(CONFIG_TARGET_OS)))
+T_TEST_PROGS += $(T_IOU_RING_PROGS)
+endif
 
 PROGS += $(T_PROGS)
 
@@ -322,10 +329,14 @@ UT_OBJS += unittests/lib/memalign.o
 UT_OBJS += unittests/lib/strntol.o
 UT_OBJS += unittests/oslib/strlcat.o
 UT_OBJS += unittests/oslib/strndup.o
+UT_OBJS += unittests/oslib/strcasestr.o
+UT_OBJS += unittests/oslib/strsep.o
 UT_TARGET_OBJS = lib/memalign.o
 UT_TARGET_OBJS += lib/strntol.o
 UT_TARGET_OBJS += oslib/strlcat.o
 UT_TARGET_OBJS += oslib/strndup.o
+UT_TARGET_OBJS += oslib/strcasestr.o
+UT_TARGET_OBJS += oslib/strsep.o
 UT_PROGS = unittests/unittest
 else
 UT_OBJS =
diff --git a/configure b/configure
index a1279693..3a675a46 100755
--- a/configure
+++ b/configure
@@ -704,6 +704,28 @@ if compile_prog "" "$LIBS" "posix_pshared" ; then
 fi
 print_config "POSIX pshared support" "$posix_pshared"
 
+##########################################
+# POSIX pthread_condattr_setclock() probe
+if test "$pthread_condattr_setclock" != "yes" ; then
+  pthread_condattr_setclock="no"
+fi
+cat > $TMPC <<EOF
+#include <pthread.h>
+int main(void)
+{
+  pthread_condattr_t condattr;
+  pthread_condattr_setclock(&condattr, CLOCK_MONOTONIC);
+  return 0;
+}
+EOF
+if compile_prog "" "$LIBS" "pthread_condattr_setclock" ; then
+  pthread_condattr_setclock=yes
+elif compile_prog "" "$LIBS -lpthread" "pthread_condattr_setclock" ; then
+  pthread_condattr_setclock=yes
+  LIBS="$LIBS -lpthread"
+fi
+print_config "pthread_condattr_setclock()" "$pthread_condattr_setclock"
+
 ##########################################
 # solaris aio probe
 if test "$solaris_aio" != "yes" ; then
@@ -1091,6 +1113,23 @@ if compile_prog "" "" "fdatasync"; then
 fi
 print_config "fdatasync" "$fdatasync"
 
+##########################################
+# pread() probe
+if test "$pread" != "yes" ; then
+  pread="no"
+fi
+cat > $TMPC << EOF
+#include <unistd.h>
+int main(int argc, char **argv)
+{
+  return pread(0, NULL, 0, 0);
+}
+EOF
+if compile_prog "" "" "pread"; then
+  pread="yes"
+fi
+print_config "pread()" "$pread"
+
 ##########################################
 # sync_file_range() probe
 if test "$sync_file_range" != "yes" ; then
@@ -2456,6 +2495,9 @@ fi
 if test "$posix_pshared" = "yes" ; then
   output_sym "CONFIG_PSHARED"
 fi
+if test "$pthread_condattr_setclock" = "yes" ; then
+  output_sym "CONFIG_PTHREAD_CONDATTR_SETCLOCK"
+fi
 if test "$have_asprintf" = "yes" ; then
     output_sym "CONFIG_HAVE_ASPRINTF"
 fi
@@ -2471,6 +2513,9 @@ fi
 if test "$fdatasync" = "yes" ; then
   output_sym "CONFIG_FDATASYNC"
 fi
+if test "$pread" = "yes" ; then
+  output_sym "CONFIG_PREAD"
+fi
 if test "$sync_file_range" = "yes" ; then
   output_sym "CONFIG_SYNC_FILE_RANGE"
 fi
diff --git a/fio_sem.c b/fio_sem.c
index 3b48061c..c34d8bf7 100644
--- a/fio_sem.c
+++ b/fio_sem.c
@@ -85,16 +85,19 @@ static bool sem_timed_out(struct timespec *t, unsigned int msecs)
 
 int fio_sem_down_timeout(struct fio_sem *sem, unsigned int msecs)
 {
-	struct timeval tv_s;
 	struct timespec base;
 	struct timespec t;
 	int ret = 0;
 
 	assert(sem->magic == FIO_SEM_MAGIC);
 
-	gettimeofday(&tv_s, NULL);
-	base.tv_sec = t.tv_sec = tv_s.tv_sec;
-	base.tv_nsec = t.tv_nsec = tv_s.tv_usec * 1000;
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	clock_gettime(CLOCK_MONOTONIC, &t);
+#else
+	clock_gettime(CLOCK_REALTIME, &t);
+#endif
+
+	base = t;
 
 	t.tv_sec += msecs / 1000;
 	t.tv_nsec += ((msecs * 1000000ULL) % 1000000000);
diff --git a/helper_thread.c b/helper_thread.c
index f0c717f5..eba2898a 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -78,15 +78,16 @@ static void *helper_thread_main(void *data)
 {
 	struct helper_data *hd = data;
 	unsigned int msec_to_next_event, next_log, next_ss = STEADYSTATE_MSEC;
-	struct timeval tv;
 	struct timespec ts, last_du, last_ss;
 	int ret = 0;
 
 	sk_out_assign(hd->sk_out);
 
-	gettimeofday(&tv, NULL);
-	ts.tv_sec = tv.tv_sec;
-	ts.tv_nsec = tv.tv_usec * 1000;
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+	clock_gettime(CLOCK_REALTIME, &ts);
+#endif
 	memcpy(&last_du, &ts, sizeof(ts));
 	memcpy(&last_ss, &ts, sizeof(ts));
 
@@ -101,9 +102,11 @@ static void *helper_thread_main(void *data)
 		pthread_mutex_lock(&hd->lock);
 		pthread_cond_timedwait(&hd->cond, &hd->lock, &ts);
 
-		gettimeofday(&tv, NULL);
-		ts.tv_sec = tv.tv_sec;
-		ts.tv_nsec = tv.tv_usec * 1000;
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+		clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+		clock_gettime(CLOCK_REALTIME, &ts);
+#endif
 
 		if (hd->reset) {
 			memcpy(&last_du, &ts, sizeof(ts));
diff --git a/idletime.c b/idletime.c
index 2f59f510..fc1df8e9 100644
--- a/idletime.c
+++ b/idletime.c
@@ -186,6 +186,7 @@ void fio_idle_prof_init(void)
 	int i, ret;
 	struct timespec ts;
 	pthread_attr_t tattr;
+	pthread_condattr_t cattr;
 	struct idle_prof_thread *ipt;
 
 	ipc.nr_cpus = cpus_online();
@@ -194,6 +195,13 @@ void fio_idle_prof_init(void)
 	if (ipc.opt == IDLE_PROF_OPT_NONE)
 		return;
 
+	ret = pthread_condattr_init(&cattr);
+	assert(ret == 0);
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
+	assert(ret == 0);
+#endif
+
 	if ((ret = pthread_attr_init(&tattr))) {
 		log_err("fio: pthread_attr_init %s\n", strerror(ret));
 		return;
@@ -239,7 +247,7 @@ void fio_idle_prof_init(void)
 			break;
 		}
 
-		if ((ret = pthread_cond_init(&ipt->cond, NULL))) {
+		if ((ret = pthread_cond_init(&ipt->cond, &cattr))) {
 			ipc.status = IDLE_PROF_STATUS_ABORT;
 			log_err("fio: pthread_cond_init %s\n", strerror(ret));
 			break;
@@ -282,7 +290,11 @@ void fio_idle_prof_init(void)
 		pthread_mutex_lock(&ipt->init_lock);
 		while ((ipt->state != TD_EXITED) &&
 		       (ipt->state!=TD_INITIALIZED)) {
-			fio_gettime(&ts, NULL);
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+			clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+			clock_gettime(CLOCK_REALTIME, &ts);
+#endif
 			ts.tv_sec += 1;
 			pthread_cond_timedwait(&ipt->cond, &ipt->init_lock, &ts);
 		}
diff --git a/io_u.c b/io_u.c
index b5c31335..4a0c725a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -644,7 +644,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 	uint64_t now;
 
 	assert(ddir_rw(ddir));
-	now = utime_since_now(&td->start);
+	now = utime_since_now(&td->epoch);
 
 	/*
 	 * if rate_next_io_time is in the past, need to catch up to rate
diff --git a/ioengines.c b/ioengines.c
index 9e3fcc9f..b9200ba9 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -121,18 +121,15 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 	return ops;
 }
 
-static struct ioengine_ops *__load_ioengine(const char *name)
+static struct ioengine_ops *__load_ioengine(const char *engine)
 {
-	char engine[64];
-
-	snprintf(engine, sizeof(engine), "%s", name);
-
 	/*
 	 * linux libaio has alias names, so convert to what we want
 	 */
 	if (!strncmp(engine, "linuxaio", 8)) {
-		dprint(FD_IO, "converting ioengine name: %s -> libaio\n", name);
-		strcpy(engine, "libaio");
+		dprint(FD_IO, "converting ioengine name: %s -> libaio\n",
+		       engine);
+		engine = "libaio";
 	}
 
 	dprint(FD_IO, "load ioengine %s\n", engine);
diff --git a/pshared.c b/pshared.c
index 74812ede..21192556 100644
--- a/pshared.c
+++ b/pshared.c
@@ -21,6 +21,15 @@ int cond_init_pshared(pthread_cond_t *cond)
 		return ret;
 	}
 #endif
+
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
+	if (ret) {
+		log_err("pthread_condattr_setclock: %s\n", strerror(ret));
+		return ret;
+	}
+#endif
+
 	ret = pthread_cond_init(cond, &cattr);
 	if (ret) {
 		log_err("pthread_cond_init: %s\n", strerror(ret));
diff --git a/stat.c b/stat.c
index 05663e07..e2bc8ddb 100644
--- a/stat.c
+++ b/stat.c
@@ -2123,6 +2123,9 @@ static int check_status_file(void)
 	}
 	if (temp_dir == NULL)
 		temp_dir = "/tmp";
+#ifdef __COVERITY__
+	__coverity_tainted_data_sanitize__(temp_dir);
+#endif
 
 	snprintf(fio_status_file_path, sizeof(fio_status_file_path), "%s/%s", temp_dir, FIO_STATUS_FILE);
 
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index ebdd8f10..69799336 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -20,6 +20,7 @@
  * Copyright (C) 2016 Jens Axboe
  *
  */
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -100,13 +101,13 @@ struct work_item {
 static struct reader_thread reader_thread;
 static struct writer_thread writer_thread;
 
-uint64_t utime_since(const struct timeval *s, const struct timeval *e)
+uint64_t utime_since(const struct timespec *s, const struct timespec *e)
 {
 	long sec, usec;
 	uint64_t ret;
 
 	sec = e->tv_sec - s->tv_sec;
-	usec = e->tv_usec - s->tv_usec;
+	usec = (e->tv_nsec - s->tv_nsec) / 1000;
 	if (sec > 0 && usec < 0) {
 		sec--;
 		usec += 1000000;
@@ -218,12 +219,12 @@ static void add_lat(struct stats *s, unsigned int us, const char *name)
 
 static int write_work(struct work_item *work)
 {
-	struct timeval s, e;
+	struct timespec s, e;
 	ssize_t ret;
 
-	gettimeofday(&s, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &s);
 	ret = write(STDOUT_FILENO, work->buf, work->buf_size);
-	gettimeofday(&e, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &e);
 	assert(ret == work->buf_size);
 
 	add_lat(&work->writer->s, utime_since(&s, &e), "write");
@@ -269,13 +270,13 @@ static void *writer_fn(void *data)
 
 static void reader_work(struct work_item *work)
 {
-	struct timeval s, e;
+	struct timespec s, e;
 	ssize_t ret;
 	size_t left;
 	void *buf;
 	off_t off;
 
-	gettimeofday(&s, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &s);
 
 	left = work->buf_size;
 	buf = work->buf;
@@ -294,7 +295,7 @@ static void reader_work(struct work_item *work)
 		buf += ret;
 	}
 
-	gettimeofday(&e, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &e);
 
 	add_lat(&work->reader->s, utime_since(&s, &e), "read");
 
@@ -461,8 +462,17 @@ static void show_latencies(struct stats *s, const char *msg)
 
 static void init_thread(struct thread_data *thread)
 {
-	pthread_cond_init(&thread->cond, NULL);
-	pthread_cond_init(&thread->done_cond, NULL);
+	pthread_condattr_t cattr;
+	int ret;
+
+	ret = pthread_condattr_init(&cattr);
+	assert(ret == 0);
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
+	assert(ret == 0);
+#endif
+	pthread_cond_init(&thread->cond, &cattr);
+	pthread_cond_init(&thread->done_cond, &cattr);
 	pthread_mutex_init(&thread->lock, NULL);
 	pthread_mutex_init(&thread->done_lock, NULL);
 	thread->exit = 0;
@@ -479,12 +489,14 @@ static void exit_thread(struct thread_data *thread,
 		pthread_mutex_lock(&thread->done_lock);
 
 		if (fn) {
-			struct timeval tv;
 			struct timespec ts;
 
-			gettimeofday(&tv, NULL);
-			ts.tv_sec = tv.tv_sec + 1;
-			ts.tv_nsec = tv.tv_usec * 1000ULL;
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+			clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+			clock_gettime(CLOCK_REALTIME, &ts);
+#endif
+			ts.tv_sec++;
 
 			pthread_cond_timedwait(&thread->done_cond, &thread->done_lock, &ts);
 			fn(wt);
@@ -562,7 +574,8 @@ static void prune_done_entries(struct writer_thread *wt)
 
 int main(int argc, char *argv[])
 {
-	struct timeval s, re, we;
+	pthread_condattr_t cattr;
+	struct timespec s, re, we;
 	struct reader_thread *rt;
 	struct writer_thread *wt;
 	unsigned long rate;
@@ -570,6 +583,7 @@ int main(int argc, char *argv[])
 	size_t bytes;
 	off_t off;
 	int fd, seq;
+	int ret;
 
 	if (parse_options(argc, argv))
 		return 1;
@@ -605,13 +619,19 @@ int main(int argc, char *argv[])
 	seq = 0;
 	bytes = 0;
 
-	gettimeofday(&s, NULL);
+	ret = pthread_condattr_init(&cattr);
+	assert(ret == 0);
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+	ret = pthread_condattr_setclock(&cattr, CLOCK_MONOTONIC);
+	assert(ret == 0);
+#endif
+
+	clock_gettime(CLOCK_MONOTONIC, &s);
 
 	while (sb.st_size) {
 		struct work_item *work;
 		size_t this_len;
 		struct timespec ts;
-		struct timeval tv;
 
 		prune_done_entries(wt);
 
@@ -627,14 +647,16 @@ int main(int argc, char *argv[])
 		work->seq = ++seq;
 		work->writer = wt;
 		work->reader = rt;
-		pthread_cond_init(&work->cond, NULL);
+		pthread_cond_init(&work->cond, &cattr);
 		pthread_mutex_init(&work->lock, NULL);
 
 		queue_work(rt, work);
 
-		gettimeofday(&tv, NULL);
-		ts.tv_sec = tv.tv_sec;
-		ts.tv_nsec = tv.tv_usec * 1000ULL;
+#ifdef CONFIG_PTHREAD_CONDATTR_SETCLOCK
+		clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+		clock_gettime(CLOCK_REALTIME, &ts);
+#endif
 		ts.tv_nsec += max_us * 1000ULL;
 		if (ts.tv_nsec >= 1000000000ULL) {
 			ts.tv_nsec -= 1000000000ULL;
@@ -651,10 +673,10 @@ int main(int argc, char *argv[])
 	}
 
 	exit_thread(&rt->thread, NULL, NULL);
-	gettimeofday(&re, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &re);
 
 	exit_thread(&wt->thread, prune_done_entries, wt);
-	gettimeofday(&we, NULL);
+	clock_gettime(CLOCK_MONOTONIC, &we);
 
 	show_latencies(&rt->s, "READERS");
 	show_latencies(&wt->s, "WRITERS");
diff --git a/unittests/oslib/strcasestr.c b/unittests/oslib/strcasestr.c
new file mode 100644
index 00000000..19a2de37
--- /dev/null
+++ b/unittests/oslib/strcasestr.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2019 Tomohiro Kusumi <tkusumi@netbsd.org>
+ */
+#include "../unittest.h"
+
+#ifndef CONFIG_STRCASESTR
+#include "../../oslib/strcasestr.h"
+#else
+#include <string.h>
+#endif
+
+static void test_strcasestr_1(void)
+{
+	const char *haystack = "0123456789";
+	const char *p;
+
+	p = strcasestr(haystack, "012");
+	CU_ASSERT_EQUAL(p, haystack);
+
+	p = strcasestr(haystack, "12345");
+	CU_ASSERT_EQUAL(p, haystack + 1);
+
+	p = strcasestr(haystack, "1234567890");
+	CU_ASSERT_EQUAL(p, NULL);
+
+	p = strcasestr(haystack, "");
+	CU_ASSERT_EQUAL(p, haystack); /* is this expected ? */
+}
+
+static void test_strcasestr_2(void)
+{
+	const char *haystack = "ABCDEFG";
+	const char *p;
+
+	p = strcasestr(haystack, "ABC");
+	CU_ASSERT_EQUAL(p, haystack);
+
+	p = strcasestr(haystack, "BCD");
+	CU_ASSERT_EQUAL(p, haystack + 1);
+
+	p = strcasestr(haystack, "ABCDEFGH");
+	CU_ASSERT_EQUAL(p, NULL);
+
+	p = strcasestr(haystack, "");
+	CU_ASSERT_EQUAL(p, haystack); /* is this expected ? */
+}
+
+static void test_strcasestr_3(void)
+{
+	const char *haystack = "ABCDEFG";
+	const char *p;
+
+	p = strcasestr(haystack, "AbC");
+	CU_ASSERT_EQUAL(p, haystack);
+
+	p = strcasestr(haystack, "bCd");
+	CU_ASSERT_EQUAL(p, haystack + 1);
+
+	p = strcasestr(haystack, "AbcdEFGH");
+	CU_ASSERT_EQUAL(p, NULL);
+
+	p = strcasestr(haystack, "");
+	CU_ASSERT_EQUAL(p, haystack); /* is this expected ? */
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "strcasestr/1",
+		.fn	= test_strcasestr_1,
+	},
+	{
+		.name	= "strcasestr/2",
+		.fn	= test_strcasestr_2,
+	},
+	{
+		.name	= "strcasestr/3",
+		.fn	= test_strcasestr_3,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_oslib_strcasestr(void)
+{
+	return fio_unittest_add_suite("oslib/strcasestr.c", NULL, NULL, tests);
+}
diff --git a/unittests/oslib/strsep.c b/unittests/oslib/strsep.c
new file mode 100644
index 00000000..7f645f40
--- /dev/null
+++ b/unittests/oslib/strsep.c
@@ -0,0 +1,107 @@
+/*
+ * Copyright (C) 2019 Tomohiro Kusumi <tkusumi@netbsd.org>
+ */
+#include "../unittest.h"
+
+#ifndef CONFIG_STRSEP
+#include "../../oslib/strsep.h"
+#else
+#include <string.h>
+#endif
+
+/*
+ * strsep(3) - "If *stringp is NULL, the strsep() function returns NULL and does
+ * nothing else."
+ */
+static void test_strsep_1(void)
+{
+	char *string = NULL;
+	const char *p;
+
+	p = strsep(&string, "");
+	CU_ASSERT_EQUAL(p, NULL);
+	CU_ASSERT_EQUAL(string, NULL);
+
+	p = strsep(&string, "ABC");
+	CU_ASSERT_EQUAL(p, NULL);
+	CU_ASSERT_EQUAL(string, NULL);
+}
+
+/*
+ * strsep(3) - "In case no delimiter was found, the token is taken to be the
+ * entire string *stringp, and *stringp is made NULL."
+ */
+static void test_strsep_2(void)
+{
+	char src[] = "ABCDEFG";
+	char *string = src;
+	const char *p;
+
+	p = strsep(&string, "");
+	CU_ASSERT_EQUAL(p, src);
+	CU_ASSERT_EQUAL(*p, 'A');
+	CU_ASSERT_EQUAL(string, NULL);
+
+	string = src;
+	p = strsep(&string, "@");
+	CU_ASSERT_EQUAL(p, src);
+	CU_ASSERT_EQUAL(*p, 'A');
+	CU_ASSERT_EQUAL(string, NULL);
+}
+
+/*
+ * strsep(3) - "This token is terminated with a '\0' character (by overwriting
+ * the delimiter) and *stringp is updated to point past the token."
+ */
+static void test_strsep_3(void)
+{
+	char src[] = "ABCDEFG";
+	char *string = src;
+	const char *p;
+
+	p = strsep(&string, "ABC");
+	CU_ASSERT_EQUAL(p, &src[0]);
+	CU_ASSERT_EQUAL(*p, '\0');
+	CU_ASSERT_EQUAL(strcmp(string, "BCDEFG"), 0);
+	CU_ASSERT_EQUAL(*string, 'B');
+
+	p = strsep(&string, "ABC");
+	CU_ASSERT_EQUAL(p, &src[1]);
+	CU_ASSERT_EQUAL(*p, '\0');
+	CU_ASSERT_EQUAL(strcmp(string, "CDEFG"), 0);
+	CU_ASSERT_EQUAL(*string, 'C');
+
+	p = strsep(&string, "ABC");
+	CU_ASSERT_EQUAL(p, &src[2]);
+	CU_ASSERT_EQUAL(*p, '\0');
+	CU_ASSERT_EQUAL(strcmp(string, "DEFG"), 0);
+	CU_ASSERT_EQUAL(*string, 'D');
+
+	p = strsep(&string, "ABC");
+	CU_ASSERT_EQUAL(p, &src[3]);
+	CU_ASSERT_EQUAL(*p, 'D');
+	CU_ASSERT_EQUAL(string, NULL);
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "strsep/1",
+		.fn	= test_strsep_1,
+	},
+	{
+		.name	= "strsep/2",
+		.fn	= test_strsep_2,
+	},
+	{
+		.name	= "strsep/3",
+		.fn	= test_strsep_3,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_oslib_strsep(void)
+{
+	return fio_unittest_add_suite("oslib/strsep.c", NULL, NULL, tests);
+}
diff --git a/unittests/unittest.c b/unittests/unittest.c
index 1166e6ef..66789e4f 100644
--- a/unittests/unittest.c
+++ b/unittests/unittest.c
@@ -62,6 +62,8 @@ int main(void)
 	fio_unittest_register(fio_unittest_lib_strntol);
 	fio_unittest_register(fio_unittest_oslib_strlcat);
 	fio_unittest_register(fio_unittest_oslib_strndup);
+	fio_unittest_register(fio_unittest_oslib_strcasestr);
+	fio_unittest_register(fio_unittest_oslib_strsep);
 
 	CU_basic_set_mode(CU_BRM_VERBOSE);
 	CU_basic_run_tests();
diff --git a/unittests/unittest.h b/unittests/unittest.h
index d3e3822f..bbc49613 100644
--- a/unittests/unittest.h
+++ b/unittests/unittest.h
@@ -22,5 +22,7 @@ CU_ErrorCode fio_unittest_lib_memalign(void);
 CU_ErrorCode fio_unittest_lib_strntol(void);
 CU_ErrorCode fio_unittest_oslib_strlcat(void);
 CU_ErrorCode fio_unittest_oslib_strndup(void);
+CU_ErrorCode fio_unittest_oslib_strcasestr(void);
+CU_ErrorCode fio_unittest_oslib_strsep(void);
 
 #endif

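The new unittests/oslib/strsep.c suite pins down three cases from strsep(3):
- A NULL *stringp yields NULL.
- A string with no delimiter is returned whole, and *stringp becomes NULL.
- Otherwise, the first delimiter is overwritten with '\0' and *stringp moves just past it.

The sketch below is a small strsep() written to satisfy exactly those cases. It is named my_strsep to avoid clashing with the C library. It is not the oslib/strsep.c fallback, whose source this diff does not show.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strsep() stand-in following the strsep(3) contract quoted in
 * the tests. Returns the next token and advances *stringp past it.
 */
static char *my_strsep(char **stringp, const char *delim)
{
	char *token = *stringp;
	char *end;

	if (token == NULL)		/* test 1: NULL in, NULL out, nothing else */
		return NULL;

	end = token + strcspn(token, delim);	/* first delimiter, or the NUL */
	if (*end == '\0') {
		*stringp = NULL;	/* test 2: no delimiter, whole string is the token */
	} else {
		*end = '\0';		/* test 3: overwrite the delimiter ... */
		*stringp = end + 1;	/* ... and continue just past it */
	}
	return token;
}
```

An empty delimiter set never matches, because strcspn() then returns the full string length. This is why strsep(&string, "") in test_strsep_2 hands back the whole string.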


* Recent changes (master)
@ 2019-12-26 13:00 Jens Axboe
From: Jens Axboe @ 2019-12-26 13:00 UTC
  To: fio

The following changes since commit b10b1e70afaff8c9b00005e9238f2ad347a9c00a:

  io_uring: add option for non-vectored read/write commands (2019-12-23 08:54:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to aae515f4e1cb0b3c003e127200d344d807032a79:

  io_uring: Enable io_uring ioengine on aarch64 arch (2019-12-25 08:02:57 -0700)

----------------------------------------------------------------
Zhenyu Ye (1):
      io_uring: Enable io_uring ioengine on aarch64 arch

 arch/arch-aarch64.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

---

Diff of recent changes:

diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 2a86cc5a..de9b349b 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -8,6 +8,18 @@
 
 #define FIO_ARCH	(arch_aarch64)
 
+#define ARCH_HAVE_IOURING
+
+#ifndef __NR_sys_io_uring_setup
+#define __NR_sys_io_uring_setup		425
+#endif
+#ifndef __NR_sys_io_uring_enter
+#define __NR_sys_io_uring_enter		426
+#endif
+#ifndef __NR_sys_io_uring_register
+#define __NR_sys_io_uring_register	427
+#endif
+
 #define nop		do { __asm__ __volatile__ ("yield"); } while (0)
 #define read_barrier()	do { __sync_synchronize(); } while (0)
 #define write_barrier()	do { __sync_synchronize(); } while (0)



* Recent changes (master)
@ 2019-12-24 13:00 Jens Axboe
From: Jens Axboe @ 2019-12-24 13:00 UTC
  To: fio

The following changes since commit ab1665679b00fd2b3f955596da837d9694732b24:

  Merge branch 'doc_fixes' of https://github.com/sitsofe/fio (2019-12-21 07:04:58 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b10b1e70afaff8c9b00005e9238f2ad347a9c00a:

  io_uring: add option for non-vectored read/write commands (2019-12-23 08:54:25 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io_uring: add option for non-vectored read/write commands

 engines/io_uring.c  | 34 +++++++++++++++++-------
 os/linux/io_uring.h | 75 ++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 87 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 9ba126d8..7c19294b 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -75,9 +75,15 @@ struct ioring_options {
 	unsigned int sqpoll_thread;
 	unsigned int sqpoll_set;
 	unsigned int sqpoll_cpu;
+	unsigned int nonvectored;
 	unsigned int uncached;
 };
 
+static const int ddir_to_op[2][2] = {
+	{ IORING_OP_READV, IORING_OP_READ },
+	{ IORING_OP_WRITEV, IORING_OP_WRITE }
+};
+
 static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 {
 	struct ioring_options *o = data;
@@ -133,6 +139,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "nonvectored",
+		.lname	= "Non-vectored",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, nonvectored),
+		.help	= "Use non-vectored read/write commands",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= "uncached",
 		.lname	= "Uncached",
@@ -174,21 +189,20 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		sqe->opcode = ddir_to_op[io_u->ddir][!!o->nonvectored];
 		if (o->fixedbufs) {
-			if (io_u->ddir == DDIR_READ)
-				sqe->opcode = IORING_OP_READ_FIXED;
-			else
-				sqe->opcode = IORING_OP_WRITE_FIXED;
 			sqe->addr = (unsigned long) io_u->xfer_buf;
 			sqe->len = io_u->xfer_buflen;
 			sqe->buf_index = io_u->index;
 		} else {
-			if (io_u->ddir == DDIR_READ)
-				sqe->opcode = IORING_OP_READV;
-			else
-				sqe->opcode = IORING_OP_WRITEV;
-			sqe->addr = (unsigned long) &ld->iovecs[io_u->index];
-			sqe->len = 1;
+			if (o->nonvectored) {
+				sqe->addr = (unsigned long)
+						ld->iovecs[io_u->index].iov_base;
+				sqe->len = ld->iovecs[io_u->index].iov_len;
+			} else {
+				sqe->addr = (unsigned long) &ld->iovecs[io_u->index];
+				sqe->len = 1;
+			}
 		}
 		if (!td->o.odirect && o->uncached)
 			sqe->rw_flags = RWF_UNCACHED;
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index ce03151e..03d2dde4 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -19,7 +19,10 @@ struct io_uring_sqe {
 	__u8	flags;		/* IOSQE_ flags */
 	__u16	ioprio;		/* ioprio for the request */
 	__s32	fd;		/* file descriptor to do IO on */
-	__u64	off;		/* offset into file */
+	union {
+		__u64	off;	/* offset into file */
+		__u64	addr2;
+	};
 	__u64	addr;		/* pointer to buffer or iovecs */
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
@@ -27,6 +30,12 @@ struct io_uring_sqe {
 		__u32		fsync_flags;
 		__u16		poll_events;
 		__u32		sync_range_flags;
+		__u32		msg_flags;
+		__u32		timeout_flags;
+		__u32		accept_flags;
+		__u32		cancel_flags;
+		__u32		open_flags;
+		__u32		statx_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
@@ -40,7 +49,9 @@ struct io_uring_sqe {
  */
 #define IOSQE_FIXED_FILE	(1U << 0)	/* use fixed fileset */
 #define IOSQE_IO_DRAIN		(1U << 1)	/* issue after inflight IO */
-#define IOSQE_IO_LINK		(1U << 2)	/* next IO depends on this one */
+#define IOSQE_IO_LINK		(1U << 2)	/* links next sqe */
+#define IOSQE_IO_HARDLINK	(1U << 3)	/* like LINK, but stronger */
+#define IOSQE_ASYNC		(1U << 4)	/* always go async */
 
 /*
  * io_uring_setup() flags
@@ -48,22 +59,48 @@ struct io_uring_sqe {
 #define IORING_SETUP_IOPOLL	(1U << 0)	/* io_context is polled */
 #define IORING_SETUP_SQPOLL	(1U << 1)	/* SQ poll thread */
 #define IORING_SETUP_SQ_AFF	(1U << 2)	/* sq_thread_cpu is valid */
+#define IORING_SETUP_CQSIZE	(1U << 3)	/* app defines CQ size */
+
+enum {
+	IORING_OP_NOP,
+	IORING_OP_READV,
+	IORING_OP_WRITEV,
+	IORING_OP_FSYNC,
+	IORING_OP_READ_FIXED,
+	IORING_OP_WRITE_FIXED,
+	IORING_OP_POLL_ADD,
+	IORING_OP_POLL_REMOVE,
+	IORING_OP_SYNC_FILE_RANGE,
+	IORING_OP_SENDMSG,
+	IORING_OP_RECVMSG,
+	IORING_OP_TIMEOUT,
+	IORING_OP_TIMEOUT_REMOVE,
+	IORING_OP_ACCEPT,
+	IORING_OP_ASYNC_CANCEL,
+	IORING_OP_LINK_TIMEOUT,
+	IORING_OP_CONNECT,
+	IORING_OP_FALLOCATE,
+	IORING_OP_OPENAT,
+	IORING_OP_CLOSE,
+	IORING_OP_FILES_UPDATE,
+	IORING_OP_STATX,
+	IORING_OP_READ,
+	IORING_OP_WRITE,
 
-#define IORING_OP_NOP		0
-#define IORING_OP_READV		1
-#define IORING_OP_WRITEV	2
-#define IORING_OP_FSYNC		3
-#define IORING_OP_READ_FIXED	4
-#define IORING_OP_WRITE_FIXED	5
-#define IORING_OP_POLL_ADD	6
-#define IORING_OP_POLL_REMOVE	7
-#define IORING_OP_SYNC_FILE_RANGE	8
+	/* this goes last, obviously */
+	IORING_OP_LAST,
+};
 
 /*
  * sqe->fsync_flags
  */
 #define IORING_FSYNC_DATASYNC	(1U << 0)
 
+/*
+ * sqe->timeout_flags
+ */
+#define IORING_TIMEOUT_ABS	(1U << 0)
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
@@ -125,11 +162,19 @@ struct io_uring_params {
 	__u32 flags;
 	__u32 sq_thread_cpu;
 	__u32 sq_thread_idle;
-	__u32 resv[5];
+	__u32 features;
+	__u32 resv[4];
 	struct io_sqring_offsets sq_off;
 	struct io_cqring_offsets cq_off;
 };
 
+/*
+ * io_uring_params->features flags
+ */
+#define IORING_FEAT_SINGLE_MMAP		(1U << 0)
+#define IORING_FEAT_NODROP		(1U << 1)
+#define IORING_FEAT_SUBMIT_STABLE	(1U << 2)
+
 /*
  * io_uring_register(2) opcodes and arguments
  */
@@ -139,5 +184,11 @@ struct io_uring_params {
 #define IORING_UNREGISTER_FILES		3
 #define IORING_REGISTER_EVENTFD		4
 #define IORING_UNREGISTER_EVENTFD	5
+#define IORING_REGISTER_FILES_UPDATE	6
+
+struct io_uring_files_update {
+	__u32 offset;
+	__s32 *fds;
+};
 
 #endif

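In the io_uring nonvectored change, ddir_to_op is indexed by data direction (row) and by !!o->nonvectored (column). The double negation folds any non-zero option value to 1, so it is always a valid column index. The row index relies on DDIR_READ and DDIR_WRITE being 0 and 1. As the hunk reads, the fixedbufs branch no longer assigns IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED, so this sketch keeps a separate table for registered buffers. That table, pick_opcode() and the DIR_* names are assumptions for illustration. The opcode values are copied from the enum in the os/linux/io_uring.h hunk.

```c
#include <stdbool.h>

/* Opcode values as numbered in the enum of the os/linux/io_uring.h hunk. */
enum {
	IORING_OP_READV		= 1,
	IORING_OP_WRITEV	= 2,
	IORING_OP_READ_FIXED	= 4,
	IORING_OP_WRITE_FIXED	= 5,
	IORING_OP_READ		= 22,
	IORING_OP_WRITE		= 23,
};

/* Hypothetical stand-ins for fio's DDIR_READ (0) and DDIR_WRITE (1). */
enum dir { DIR_READ = 0, DIR_WRITE = 1 };

/* Row: direction. Column: 0 = vectored (iovec), 1 = non-vectored (flat buffer). */
static const int ddir_to_op[2][2] = {
	{ IORING_OP_READV, IORING_OP_READ },
	{ IORING_OP_WRITEV, IORING_OP_WRITE },
};

/* Assumption: registered (fixed) buffers keep their dedicated opcodes. */
static const int fixed_ddir_to_op[2] = {
	IORING_OP_READ_FIXED, IORING_OP_WRITE_FIXED,
};

static int pick_opcode(enum dir d, bool fixedbufs, unsigned int nonvectored)
{
	if (fixedbufs)
		return fixed_ddir_to_op[d];
	/* !! maps any non-zero option value to 1, a valid column index. */
	return ddir_to_op[d][!!nonvectored];
}
```

When nonvectored is set, the hunk also changes what the SQE carries. sqe->addr holds the buffer's iov_base and sqe->len its byte length, instead of a pointer to a single struct iovec with sqe->len = 1.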


* Recent changes (master)
@ 2019-12-22 13:00 Jens Axboe
From: Jens Axboe @ 2019-12-22 13:00 UTC
  To: fio

The following changes since commit 8d853d0bbf4487af9445f7253c1951970ce75200:

  Merge branch 'mine/patch-2' of https://github.com/hannesweisbach/fio (2019-12-19 05:38:30 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ab1665679b00fd2b3f955596da837d9694732b24:

  Merge branch 'doc_fixes' of https://github.com/sitsofe/fio (2019-12-21 07:04:58 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'travis' of https://github.com/sitsofe/fio
      Merge branch 'doc_fixes' of https://github.com/sitsofe/fio

Sitsofe Wheeler (5):
      travis: remove duplicate xcode image and add comments
      travis: switch to ubuntu 18.04 and install more libraries
      doc: stop saying backslashes need escaping
      HOWTO: fix up broken formatting in registerfiles section
      doc: fix up sphinx warnings

 .travis.yml |  9 ++++-----
 HOWTO       | 13 +++++++------
 doc/conf.py |  2 +-
 fio.1       |  8 ++++----
 4 files changed, 16 insertions(+), 16 deletions(-)
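The HOWTO change in this batch narrows the escaping rule for filename=: only colons need a leading backslash. Under the new rule, /dev/dsk/foo@3,0:c is written filename=/dev/dsk/foo@3,0\:c, and F:\filename is written filename=F\:\filename. Unescaped colons still separate multiple file names. The sketch below splits such a value under exactly that rule. split_filenames() is hypothetical and is not fio's option parser.

```c
#include <stddef.h>

/*
 * Hypothetical parser for a filename= value: ':' separates names, "\:" is a
 * literal colon, and any other backslash is copied through unchanged.
 * Rewrites buf in place and stores up to max pointers into names[].
 * Returns the number of names found.
 */
static size_t split_filenames(char *buf, char **names, size_t max)
{
	char *src = buf, *dst = buf;
	size_t n = 0;

	if (max == 0)
		return 0;
	names[n++] = dst;
	for (; *src; src++) {
		if (src[0] == '\\' && src[1] == ':') {
			*dst++ = ':';	/* escaped colon: keep it, drop the backslash */
			src++;
		} else if (*src == ':') {
			*dst++ = '\0';	/* unescaped colon ends this name */
			if (n == max)
				return n;
			names[n++] = dst;
		} else {
			*dst++ = *src;	/* ordinary character, including a lone '\' */
		}
	}
	*dst = '\0';
	return n;
}
```

Any backslash not followed by a colon is copied unchanged. This is what lets the Windows-style path keep its single backslash after the drive letter's colon is unescaped.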

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index eee31988..77c31b77 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,4 +1,5 @@
 language: c
+dist: bionic
 os:
   - linux
 compiler:
@@ -12,13 +13,11 @@ env:
     - MAKEFLAGS="-j 2"
 matrix:
   include:
+    # Default xcode image
     - os: osx
       compiler: clang # Workaround travis setting CC=["clang", "gcc"]
       env: BUILD_ARCH="x86_64"
-    - os: osx
-      compiler: clang
-      osx_image: xcode9.4
-      env: BUILD_ARCH="x86_64"
+    # Latest xcode image (needs periodic updating)
     - os: osx
       compiler: clang
       osx_image: xcode11.2
@@ -33,7 +32,7 @@ matrix:
 before_install:
   - EXTRA_CFLAGS="-Werror"
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
-        pkgs=(libaio-dev libnuma-dev libz-dev librbd-dev libibverbs-dev librdmacm-dev libcunit1 libcunit1-dev);
+        pkgs=(libaio-dev libcunit1 libcunit1-dev libgoogle-perftools4 libibverbs-dev libiscsi-dev libnuma-dev librbd-dev librdmacm-dev libz-dev);
         if [[ "$BUILD_ARCH" == "x86" ]]; then
             pkgs=("${pkgs[@]/%/:i386}");
             pkgs+=(gcc-multilib python-scipy);
diff --git a/HOWTO b/HOWTO
index 41a667af..41d32c04 100644
--- a/HOWTO
+++ b/HOWTO
@@ -765,8 +765,8 @@ Target file/device
 	`filename` semantic (which generates a file for each clone if not
 	specified, but lets all clones use the same file if set).
 
-	See the :option:`filename` option for information on how to escape "``:``" and
-	"``\``" characters within the directory path itself.
+	See the :option:`filename` option for information on how to escape "``:``"
+	characters within the directory path itself.
 
 	Note: To control the directory fio will use for internal state files
 	use :option:`--aux-path`.
@@ -785,10 +785,10 @@ Target file/device
 	by this option will be :option:`size` divided by number of files unless an
 	explicit size is specified by :option:`filesize`.
 
-	Each colon and backslash in the wanted path must be escaped with a ``\``
+	Each colon in the wanted path must be escaped with a ``\``
 	character.  For instance, if the path is :file:`/dev/dsk/foo@3,0:c` then you
 	would use ``filename=/dev/dsk/foo@3,0\:c`` and if the path is
-	:file:`F:\\filename` then you would use ``filename=F\:\\filename``.
+	:file:`F:\\filename` then you would use ``filename=F\:\filename``.
 
 	On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
 	the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
@@ -2046,6 +2046,7 @@ with the caveat that when used on the command line, they must come after the
 	IO latency as well.
 
 .. option:: registerfiles : [io_uring]
+
 	With this option, fio registers the set of files being used with the
 	kernel. This avoids the overhead of managing file counts in the kernel,
 	making the submission and completion part more lightweight. Required
@@ -2564,7 +2565,7 @@ I/O replay
 	(``blkparse <device> -o /dev/null -d file_for_fio.bin``).
 	You can specify a number of files by separating the names with a ':'
 	character. See the :option:`filename` option for information on how to
-	escape ':' and '\' characters within the file names. These files will
+	escape ':' characters within the file names. These files will
 	be sequentially assigned to job clones created by :option:`numjobs`.
 
 .. option:: read_iolog_chunked=bool
@@ -3990,7 +3991,7 @@ only file passed to :option:`read_iolog`. An example would look like::
 	$ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
 
 Creating only the merged file can be done by passing the command line argument
-:option:`merge-blktrace-only`.
+:option:`--merge-blktrace-only`.
 
 Scaling traces can be done to see the relative impact of any particular trace
 being slowed down or sped up. :option:`merge_blktrace_scalars` takes in a colon
diff --git a/doc/conf.py b/doc/conf.py
index 087a9a11..10b72ecb 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -177,7 +177,7 @@ html_theme = 'alabaster'
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+# html_static_path = ['_static']
 
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
diff --git a/fio.1 b/fio.1
index a60863f6..cf5dd853 100644
--- a/fio.1
+++ b/fio.1
@@ -536,7 +536,7 @@ set fio will use the first listed directory, and thereby matching the
 specified, but lets all clones use the same file if set).
 .RS
 .P
-See the \fBfilename\fR option for information on how to escape ':' and '\\'
+See the \fBfilename\fR option for information on how to escape ':'
 characters within the directory path itself.
 .P
 Note: To control the directory fio will use for internal state files
@@ -557,10 +557,10 @@ by this option will be \fBsize\fR divided by number of files unless an
 explicit size is specified by \fBfilesize\fR.
 .RS
 .P
-Each colon and backslash in the wanted path must be escaped with a '\\'
+Each colon in the wanted path must be escaped with a '\\'
 character. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
 would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
-`F:\\filename' then you would use `filename=F\\:\\\\filename'.
+`F:\\filename' then you would use `filename=F\\:\\filename'.
 .P
 On Windows, disk devices are accessed as `\\\\.\\PhysicalDrive0' for
 the first device, `\\\\.\\PhysicalDrive1' for the second etc.
@@ -2277,7 +2277,7 @@ to replay a workload captured by blktrace. See
 replay, the file needs to be turned into a blkparse binary data file first
 (`blkparse <device> \-o /dev/null \-d file_for_fio.bin').
 You can specify a number of files by separating the names with a ':' character.
-See the \fBfilename\fR option for information on how to escape ':' and '\'
+See the \fBfilename\fR option for information on how to escape ':'
 characters within the file names. These files will be sequentially assigned to
 job clones created by \fBnumjobs\fR.
 .TP

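These doc fixes settle on one rule for filename=. Only ':' needs a '\' in front of it. A '\' that is already part of the path is passed through unchanged. Below is a minimal C sketch of that documented rule. escape_fio_filename() is a hypothetical helper for illustration, not a fio function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of fio): escape a path for filename=
 * following the documented rule. Each ':' gets a '\' prefix; every other
 * character, including '\', is copied unchanged. Returns a malloc()ed
 * string that the caller must free(), or NULL if allocation fails.
 */
static char *escape_fio_filename(const char *path)
{
	size_t len = strlen(path), colons = 0;
	const char *p;
	char *out, *q;

	/* Count colons first so the output can be sized exactly. */
	for (p = path; *p; p++)
		if (*p == ':')
			colons++;

	out = malloc(len + colons + 1);
	if (!out)
		return NULL;

	for (p = path, q = out; *p; p++) {
		if (*p == ':')
			*q++ = '\\';	/* escape the colon */
		*q++ = *p;
	}
	*q = '\0';
	return out;
}
```

The docs give two example paths. For /dev/dsk/foo@3,0:c the sketch produces /dev/dsk/foo@3,0\:c. For F:\filename it produces F\:\filename.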

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2019-12-19 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-12-19 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 397dddce43664df92101350f1fe5cd4c9cd2a2c7:

  Merge branch 'travis-xcode11.2-python' of https://github.com/vincentkfu/fio (2019-12-16 16:08:35 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8d853d0bbf4487af9445f7253c1951970ce75200:

  Merge branch 'mine/patch-2' of https://github.com/hannesweisbach/fio (2019-12-19 05:38:30 -0700)

----------------------------------------------------------------
Hannes Weisbach (2):
      Expand choices for exitall
      Add example job file for exit_what

Jens Axboe (4):
      Merge branch 'cygwin-build-error' of https://github.com/vincentkfu/fio
      Merge branch 'issue-878' of https://github.com/vincentkfu/fio
      Merge branch 'windows_mkdir' of https://github.com/sitsofe/fio
      Merge branch 'mine/patch-2' of https://github.com/hannesweisbach/fio

Sitsofe Wheeler (1):
      filesetup: fix directory creation issues

Vincent Fu (2):
      Makefile: add libssp for Windows
      client/server: add missing fsync data structures

 HOWTO                 | 18 ++++++++++++++---
 Makefile              |  4 ++--
 backend.c             | 12 +++++------
 cconv.c               |  6 ++++--
 client.c              |  8 +++++++-
 configure             | 19 -----------------
 examples/exitwhat.fio | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 filesetup.c           | 22 +++++++-------------
 fio.1                 | 17 +++++++++++++---
 fio.h                 | 10 +++++++--
 libfio.c              |  7 +++++--
 options.c             | 27 ++++++++++++++++++++++++-
 os/os-windows.h       | 19 +++++++++++++++++
 os/os.h               |  4 ++++
 server.c              | 12 ++++++++---
 thread_options.h      |  6 ++++--
 16 files changed, 186 insertions(+), 61 deletions(-)
 create mode 100644 examples/exitwhat.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 88dbb03f..41a667af 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2814,9 +2814,21 @@ Threads, processes and job synchronization
 
 .. option:: exitall
 
-	By default, fio will continue running all other jobs when one job finishes
-	but sometimes this is not the desired action.  Setting ``exitall`` will
-	instead make fio terminate all other jobs when one job finishes.
+	By default, fio will continue running all other jobs when one job finishes.
+	Sometimes this is not the desired action.  Setting ``exitall`` will instead
+	make fio terminate all jobs in the same group, as soon as one job of that
+	group finishes.
+
+.. option:: exit_what
+
+	By default, fio will continue running all other jobs when one job finishes.
+	Sometimes this is not the desired action. Setting ``exit_all`` will
+	instead make fio terminate all jobs in the same group. The option
+        ``exit_what`` allows to control which jobs get terminated when ``exitall`` is
+        enabled. The default is ``group`` and does not change the behaviour of
+        ``exitall``. The setting ``all`` terminates all jobs. The setting ``stonewall``
+        terminates all currently running jobs across all groups and continues execution
+        with the next stonewalled group.
 
 .. option:: exec_prerun=str
 
diff --git a/Makefile b/Makefile
index 7aab6abd..4a07fab3 100644
--- a/Makefile
+++ b/Makefile
@@ -209,7 +209,7 @@ ifeq ($(CONFIG_TARGET_OS), Darwin)
 endif
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   SOURCE += os/windows/posix.c
-  LIBS	 += -lpthread -lpsapi -lws2_32
+  LIBS	 += -lpthread -lpsapi -lws2_32 -lssp
   CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
 
@@ -506,7 +506,7 @@ t/time-test: $(T_TT_OBJS)
 
 ifdef CONFIG_HAVE_CUNIT
 unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit $(LIBS)
 endif
 
 clean: FORCE
diff --git a/backend.c b/backend.c
index 1c339408..d0d691b3 100644
--- a/backend.c
+++ b/backend.c
@@ -81,7 +81,7 @@ static void sig_int(int sig)
 			exit_value = 128;
 		}
 
-		fio_terminate_threads(TERMINATE_ALL);
+		fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 	}
 }
 
@@ -1091,7 +1091,7 @@ reap:
 		if (!in_ramp_time(td) && should_check_rate(td)) {
 			if (check_min_rate(td, &comp_time)) {
 				if (exitall_on_terminate || td->o.exitall_error)
-					fio_terminate_threads(td->groupid);
+					fio_terminate_threads(td->groupid, td->o.exit_what);
 				td_verror(td, EIO, "check_min_rate");
 				break;
 			}
@@ -1898,7 +1898,7 @@ static void *thread_main(void *data)
 		exec_string(o, o->exec_postrun, (const char *)"postrun");
 
 	if (exitall_on_terminate || (o->exitall_error && td->error))
-		fio_terminate_threads(td->groupid);
+		fio_terminate_threads(td->groupid, td->o.exit_what);
 
 err:
 	if (td->error)
@@ -2050,7 +2050,7 @@ reaped:
 	}
 
 	if (*nr_running == cputhreads && !pending && realthreads)
-		fio_terminate_threads(TERMINATE_ALL);
+		fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 }
 
 static bool __check_trigger_file(void)
@@ -2100,7 +2100,7 @@ void check_trigger_file(void)
 			fio_clients_send_trigger(trigger_remote_cmd);
 		else {
 			verify_save_state(IO_LIST_ALL);
-			fio_terminate_threads(TERMINATE_ALL);
+			fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 			exec_trigger(trigger_cmd);
 		}
 	}
@@ -2373,7 +2373,7 @@ reap:
 			dprint(FD_MUTEX, "wait on startup_sem\n");
 			if (fio_sem_down_timeout(startup_sem, 10000)) {
 				log_err("fio: job startup hung? exiting.\n");
-				fio_terminate_threads(TERMINATE_ALL);
+				fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 				fio_abort = true;
 				nr_started--;
 				free(fd);
diff --git a/cconv.c b/cconv.c
index bff5e34f..04854b0e 100644
--- a/cconv.c
+++ b/cconv.c
@@ -236,7 +236,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->loops = le32_to_cpu(top->loops);
 	o->mem_type = le32_to_cpu(top->mem_type);
 	o->mem_align = le32_to_cpu(top->mem_align);
-	o->stonewall = le32_to_cpu(top->stonewall);
+	o->exit_what = le16_to_cpu(top->exit_what);
+	o->stonewall = le16_to_cpu(top->stonewall);
 	o->new_group = le32_to_cpu(top->new_group);
 	o->numjobs = le32_to_cpu(top->numjobs);
 	o->cpus_allowed_policy = le32_to_cpu(top->cpus_allowed_policy);
@@ -433,7 +434,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->loops = cpu_to_le32(o->loops);
 	top->mem_type = cpu_to_le32(o->mem_type);
 	top->mem_align = cpu_to_le32(o->mem_align);
-	top->stonewall = cpu_to_le32(o->stonewall);
+	top->exit_what = cpu_to_le16(o->exit_what);
+	top->stonewall = cpu_to_le16(o->stonewall);
 	top->new_group = cpu_to_le32(o->new_group);
 	top->numjobs = cpu_to_le32(o->numjobs);
 	top->cpus_allowed_policy = cpu_to_le32(o->cpus_allowed_policy);
diff --git a/client.c b/client.c
index e0047af0..55d89a0e 100644
--- a/client.c
+++ b/client.c
@@ -960,6 +960,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 		convert_io_stat(&dst->bw_stat[i], &src->bw_stat[i]);
 		convert_io_stat(&dst->iops_stat[i], &src->iops_stat[i]);
 	}
+	convert_io_stat(&dst->sync_stat, &src->sync_stat);
 
 	dst->usr_time		= le64_to_cpu(src->usr_time);
 	dst->sys_time		= le64_to_cpu(src->sys_time);
@@ -994,8 +995,13 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
 			dst->io_u_plat[i][j] = le64_to_cpu(src->io_u_plat[i][j]);
 
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+	for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
+		dst->io_u_sync_plat[j] = le64_to_cpu(src->io_u_sync_plat[j]);
+
+	for (i = 0; i < DDIR_RWDIR_SYNC_CNT; i++)
 		dst->total_io_u[i]	= le64_to_cpu(src->total_io_u[i]);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		dst->short_io_u[i]	= le64_to_cpu(src->short_io_u[i]);
 		dst->drop_io_u[i]	= le64_to_cpu(src->drop_io_u[i]);
 	}
diff --git a/configure b/configure
index e32d5dcf..a1279693 100755
--- a/configure
+++ b/configure
@@ -2316,22 +2316,6 @@ if test "$enable_cuda" = "yes" && compile_prog "" "-lcuda" "cuda"; then
 fi
 print_config "cuda" "$cuda"
 
-##########################################
-# mkdir() probe. mingw apparently has a one-argument mkdir :/
-mkdir_two="no"
-cat > $TMPC << EOF
-#include <sys/stat.h>
-#include <sys/types.h>
-int main(int argc, char **argv)
-{
-  return mkdir("/tmp/bla", 0600);
-}
-EOF
-if compile_prog "" "" "mkdir(a, b)"; then
-  mkdir_two="yes"
-fi
-print_config "mkdir(a, b)" "$mkdir_two"
-
 ##########################################
 # check for cc -march=native
 build_native="no"
@@ -2705,9 +2689,6 @@ fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"
 fi
-if test "$mkdir_two" = "yes" ; then
-  output_sym "CONFIG_HAVE_MKDIR_TWO"
-fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
diff --git a/examples/exitwhat.fio b/examples/exitwhat.fio
new file mode 100644
index 00000000..a1099f0f
--- /dev/null
+++ b/examples/exitwhat.fio
@@ -0,0 +1,56 @@
+# We want to run fast1 as long as slow1 is running, but also have a cumulative
+# report of fast1 (group_reporting=1/new_group=1).  exitall=1 would not cause
+# fast1 to stop after slow1 is done. Setting exit_what=stonewall will cause
+# alls jobs up until the next stonewall=1 setting to be stopped, when job slow1
+# finishes.
+# In this example skipping forward to slow2/fast2. slow2 has exit_what=all set,
+# which means all jobs will be cancelled when slow2 finishes. In particular,
+# runsnever will never run.
+
+[global]
+filename=/tmp/test
+filesize=1G
+blocksize=4096
+group_reporting=1
+exitall=1
+
+[slow1]
+rw=r
+numjobs=1
+ioengine=sync
+new_group=1
+thinktime=2000
+number_ios=1000
+exit_what=stonewall
+
+[fast1]
+new_group=1
+rw=randrw
+numjobs=3
+ioengine=libaio
+iodepth=32
+rate=300,300,300
+
+[slow2]
+stonewall=1
+rw=w
+numjobs=1
+ioengine=sync
+new_group=1
+thinktime=2000
+number_ios=1000
+exit_what=all
+
+[fast2]
+rw=randrw
+numjobs=3
+ioengine=libaio
+iodepth=32
+rate=300,300,300
+
+[runsnever]
+rw=randrw
+numjobs=3
+ioengine=libaio
+iodepth=32
+rate=300,300,300
diff --git a/filesetup.c b/filesetup.c
index ed3646a4..b45a5826 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -918,26 +918,18 @@ static bool create_work_dirs(struct thread_data *td, const char *fname)
 	char path[PATH_MAX];
 	char *start, *end;
 
-	if (td->o.directory) {
-		snprintf(path, PATH_MAX, "%s%c%s", td->o.directory,
-			 FIO_OS_PATH_SEPARATOR, fname);
-		start = strstr(path, fname);
-	} else {
-		snprintf(path, PATH_MAX, "%s", fname);
-		start = path;
-	}
+	snprintf(path, PATH_MAX, "%s", fname);
+	start = path;
 
 	end = start;
 	while ((end = strchr(end, FIO_OS_PATH_SEPARATOR)) != NULL) {
-		if (end == start)
-			break;
+		if (end == start) {
+			end++;
+			continue;
+		}
 		*end = '\0';
 		errno = 0;
-#ifdef CONFIG_HAVE_MKDIR_TWO
-		if (mkdir(path, 0600) && errno != EEXIST) {
-#else
-		if (mkdir(path) && errno != EEXIST) {
-#endif
+		if (fio_mkdir(path, 0700) && errno != EEXIST) {
 			log_err("fio: failed to create dir (%s): %d\n",
 				start, errno);
 			return false;
diff --git a/fio.1 b/fio.1
index 14569e9f..a60863f6 100644
--- a/fio.1
+++ b/fio.1
@@ -2509,9 +2509,20 @@ wall also implies starting a new reporting group, see
 \fBgroup_reporting\fR.
 .TP
 .BI exitall
-By default, fio will continue running all other jobs when one job finishes
-but sometimes this is not the desired action. Setting \fBexitall\fR will
-instead make fio terminate all other jobs when one job finishes.
+By default, fio will continue running all other jobs when one job finishes.
+Sometimes this is not the desired action. Setting \fBexitall\fR will instead
+make fio terminate all jobs in the same group, as soon as one job of that
+group finishes.
+.TP
+.BI exit_what
+By default, fio will continue running all other jobs when one job finishes.
+Sometimes this is not the desired action. Setting \fBexit_all\fR will instead
+make fio terminate all jobs in the same group. The option \fBexit_what\fR
+allows to control which jobs get terminated when \fBexitall\fR is enabled. The
+default is \fBgroup\fR and does not change the behaviour of \fBexitall\fR. The
+setting \fBall\fR terminates all jobs. The setting \fBstonewall\fR terminates
+all currently running jobs across all groups and continues execution with the
+next stonewalled group.
 .TP
 .BI exec_prerun \fR=\fPstr
 Before running this job, issue the command specified through
diff --git a/fio.h b/fio.h
index 2094d30b..e943ad16 100644
--- a/fio.h
+++ b/fio.h
@@ -660,8 +660,14 @@ extern const char *runstate_to_name(int runstate);
  */
 #define FIO_REAP_TIMEOUT	300
 
-#define TERMINATE_ALL		(-1U)
-extern void fio_terminate_threads(unsigned int);
+enum {
+	TERMINATE_NONE = 0,
+	TERMINATE_GROUP = 1,
+	TERMINATE_STONEWALL = 2,
+	TERMINATE_ALL = -1,
+};
+
+extern void fio_terminate_threads(unsigned int, unsigned int);
 extern void fio_mark_td_terminate(struct thread_data *);
 
 /*
diff --git a/libfio.c b/libfio.c
index 674bc1dc..7348b164 100644
--- a/libfio.c
+++ b/libfio.c
@@ -233,7 +233,7 @@ void fio_mark_td_terminate(struct thread_data *td)
 	td->terminate = true;
 }
 
-void fio_terminate_threads(unsigned int group_id)
+void fio_terminate_threads(unsigned int group_id, unsigned int terminate)
 {
 	struct thread_data *td;
 	pid_t pid = getpid();
@@ -242,7 +242,10 @@ void fio_terminate_threads(unsigned int group_id)
 	dprint(FD_PROCESS, "terminate group_id=%d\n", group_id);
 
 	for_each_td(td, i) {
-		if (group_id == TERMINATE_ALL || group_id == td->groupid) {
+		if ((terminate == TERMINATE_GROUP && group_id == TERMINATE_ALL) ||
+		    (terminate == TERMINATE_GROUP && group_id == td->groupid) ||
+		    (terminate == TERMINATE_STONEWALL && td->runstate >= TD_RUNNING) ||
+		    (terminate == TERMINATE_ALL)) {
 			dprint(FD_PROCESS, "setting terminate on %s/%d\n",
 						td->o.name, (int) td->pid);
 
diff --git a/options.c b/options.c
index fad1857e..287f0435 100644
--- a/options.c
+++ b/options.c
@@ -1235,7 +1235,8 @@ int set_name_idx(char *target, size_t tlen, char *input, int index,
 		len = snprintf(target, tlen, "%s/%s.", fname,
 				client_sockaddr_str);
 	} else
-		len = snprintf(target, tlen, "%s/", fname);
+		len = snprintf(target, tlen, "%s%c", fname,
+				FIO_OS_PATH_SEPARATOR);
 
 	target[tlen - 1] = '\0';
 	free(p);
@@ -3940,6 +3941,30 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_PROCESS,
 	},
+	{
+		.name	= "exit_what",
+		.lname	= "What jobs to quit on terminate",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, exit_what),
+		.help	= "Fine-grained control for exitall",
+		.def	= "group",
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_PROCESS,
+		.posval	= {
+			  { .ival = "group",
+			    .oval = TERMINATE_GROUP,
+			    .help = "exit_all=1 default behaviour",
+			  },
+			  { .ival = "stonewall",
+			    .oval = TERMINATE_STONEWALL,
+			    .help = "quit all currently running jobs; continue with next stonewall",
+			  },
+			  { .ival = "all",
+			    .oval = TERMINATE_ALL,
+			    .help = "Quit everything",
+			  },
+		},
+	},
 	{
 		.name	= "exitall_on_error",
 		.lname	= "Exit-all on terminate in error",
diff --git a/os/os-windows.h b/os/os-windows.h
index 3e9f7341..6061d8c7 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -35,6 +35,7 @@ int rand_r(unsigned *);
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_GETTID
+#define FIO_EMULATED_MKDIR_TWO
 
 #define FIO_PREFERRED_ENGINE		"windowsaio"
 #define FIO_PREFERRED_CLOCK_SOURCE	CS_CGETTIME
@@ -197,6 +198,24 @@ static inline int fio_set_sched_idle(void)
 	return (SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_IDLE))? 0 : -1;
 }
 
+static inline int fio_mkdir(const char *path, mode_t mode) {
+	DWORD dwAttr = GetFileAttributesA(path);
+
+	if (dwAttr != INVALID_FILE_ATTRIBUTES &&
+	    (dwAttr & FILE_ATTRIBUTE_DIRECTORY)) {
+		errno = EEXIST;
+		return -1;
+	}
+
+	if (CreateDirectoryA(path, NULL) == 0) {
+		log_err("CreateDirectoryA = %d\n", GetLastError());
+		errno = win_to_posix_error(GetLastError());
+		return -1;
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_WINDOWS_XP
 #include "os-windows-xp.h"
 #else
diff --git a/os/os.h b/os/os.h
index dadcd87b..9a280e54 100644
--- a/os/os.h
+++ b/os/os.h
@@ -407,4 +407,8 @@ static inline bool os_cpu_has(cpu_features feature)
 }
 #endif
 
+#ifndef FIO_EMULATED_MKDIR_TWO
+# define fio_mkdir(path, mode)	mkdir(path, mode)
 #endif
+
+#endif /* FIO_OS_H */
diff --git a/server.c b/server.c
index e7846227..b7347b43 100644
--- a/server.c
+++ b/server.c
@@ -975,7 +975,7 @@ static int handle_trigger_cmd(struct fio_net_cmd *cmd, struct flist_head *job_li
 	} else
 		fio_net_queue_cmd(FIO_NET_CMD_VTRIGGER, rep, sz, NULL, SK_F_FREE | SK_F_INLINE);
 
-	fio_terminate_threads(TERMINATE_ALL);
+	fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 	fio_server_check_jobs(job_list);
 	exec_trigger(buf);
 	return 0;
@@ -992,7 +992,7 @@ static int handle_command(struct sk_out *sk_out, struct flist_head *job_list,
 
 	switch (cmd->opcode) {
 	case FIO_NET_CMD_QUIT:
-		fio_terminate_threads(TERMINATE_ALL);
+		fio_terminate_threads(TERMINATE_ALL, TERMINATE_ALL);
 		ret = 0;
 		break;
 	case FIO_NET_CMD_EXIT:
@@ -1490,6 +1490,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 		convert_io_stat(&p.ts.bw_stat[i], &ts->bw_stat[i]);
 		convert_io_stat(&p.ts.iops_stat[i], &ts->iops_stat[i]);
 	}
+	convert_io_stat(&p.ts.sync_stat, &ts->sync_stat);
 
 	p.ts.usr_time		= cpu_to_le64(ts->usr_time);
 	p.ts.sys_time		= cpu_to_le64(ts->sys_time);
@@ -1524,8 +1525,13 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
 			p.ts.io_u_plat[i][j] = cpu_to_le64(ts->io_u_plat[i][j]);
 
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+	for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
+		p.ts.io_u_sync_plat[j] = cpu_to_le64(ts->io_u_sync_plat[j]);
+
+	for (i = 0; i < DDIR_RWDIR_SYNC_CNT; i++)
 		p.ts.total_io_u[i]	= cpu_to_le64(ts->total_io_u[i]);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		p.ts.short_io_u[i]	= cpu_to_le64(ts->short_io_u[i]);
 		p.ts.drop_io_u[i]	= cpu_to_le64(ts->drop_io_u[i]);
 	}
diff --git a/thread_options.h b/thread_options.h
index ee6e4d6d..4b131bda 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -201,7 +201,8 @@ struct thread_options {
 
 	unsigned long long max_latency;
 
-	unsigned int stonewall;
+	unsigned short exit_what;
+	unsigned short stonewall;
 	unsigned int new_group;
 	unsigned int numjobs;
 	os_cpu_mask_t cpumask;
@@ -489,7 +490,8 @@ struct thread_options_pack {
 	uint32_t mem_type;
 	uint32_t mem_align;
 
-	uint32_t stonewall;
+	uint16_t exit_what;
+	uint16_t stonewall;
 	uint32_t new_group;
 	uint32_t numjobs;
 	/*

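With exit_what, fio_terminate_threads() now takes two arguments: the caller's group id and a termination mode. It flags every job that matches the condition in the libfio.c hunk. The sketch below lifts that condition out so it can be tried in isolation. struct job and should_terminate() are hypothetical stand-ins for struct thread_data and the loop body. The running flag only approximates td->runstate >= TD_RUNNING.

```c
#include <stdbool.h>

/* Values as introduced in the fio.h hunk. */
enum {
	TERMINATE_NONE = 0,
	TERMINATE_GROUP = 1,
	TERMINATE_STONEWALL = 2,
	TERMINATE_ALL = -1,
};

/* Hypothetical, trimmed-down stand-in for struct thread_data. */
struct job {
	unsigned int groupid;
	bool running;	/* approximates td->runstate >= TD_RUNNING */
};

/*
 * Same condition as the loop in fio_terminate_threads(): group_id is the
 * id passed by the caller, terminate is the exit_what mode.
 */
static bool should_terminate(const struct job *j, unsigned int group_id,
			     unsigned int terminate)
{
	/* TERMINATE_ALL is -1, i.e. UINT_MAX once converted to unsigned. */
	const unsigned int all = (unsigned int)TERMINATE_ALL;

	return (terminate == TERMINATE_GROUP && group_id == all) ||
	       (terminate == TERMINATE_GROUP && group_id == j->groupid) ||
	       (terminate == TERMINATE_STONEWALL && j->running) ||
	       (terminate == all);
}
```

Under this condition, only the all mode catches a job that has not started yet, like runsnever in examples/exitwhat.fio. The stonewall mode requires the job to have reached the running state. Callers such as sig_int() now pass TERMINATE_ALL for both arguments, which matches the last clause for every job.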

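The reworked create_work_dirs() loop is essentially "mkdir -p" on the directory part of a file name. It cuts the copied path at each separator, creates that prefix, and tolerates EEXIST. After this change it also skips a leading separator (an absolute path) instead of giving up. Below is a minimal POSIX-only sketch of that loop, under these assumptions:

- '/' is hard-coded in place of FIO_OS_PATH_SEPARATOR.
- Plain mkdir() replaces fio_mkdir().
- The hunk does not show the lines after the mkdir() call, so restoring the separator and stepping past it is assumed.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* fallback if <limits.h> does not provide it */
#endif

/* Assumption: POSIX-style separator; fio uses FIO_OS_PATH_SEPARATOR. */
#define SKETCH_PATH_SEP '/'

/*
 * Create every directory component of fname (but not fname itself),
 * tolerating components that already exist.
 */
static bool create_parent_dirs(const char *fname)
{
	char path[PATH_MAX];
	char *start, *end;

	snprintf(path, sizeof(path), "%s", fname);
	start = path;

	end = start;
	while ((end = strchr(end, SKETCH_PATH_SEP)) != NULL) {
		if (end == start) {
			/* Leading separator of an absolute path: skip it. */
			end++;
			continue;
		}
		*end = '\0';		/* temporarily cut the path here */
		errno = 0;
		if (mkdir(path, 0700) && errno != EEXIST) {
			fprintf(stderr, "failed to create dir (%s): %d\n",
				path, errno);
			return false;
		}
		*end = SKETCH_PATH_SEP;	/* restore, then move past it */
		end++;
	}
	return true;
}
```

The mode change from 0600 to 0700 matters. A non-root owner cannot enter a directory created without the search (x) bit, so creating the next nested level under it would fail with EACCES.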

* Recent changes (master)
@ 2019-12-17 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-12-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 41ceb6c79aea52bd46ea13c5036722a294bb719e:

  t/run-fio-tests: relax acceptance criterion for t0011 (2019-12-11 20:55:07 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 397dddce43664df92101350f1fe5cd4c9cd2a2c7:

  Merge branch 'travis-xcode11.2-python' of https://github.com/vincentkfu/fio (2019-12-16 16:08:35 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      t/io_uring: check for CONFIG_HAVE_GETTID
      Fio 3.17
      Merge branch 'travis-xcode11.2-python' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      .travis.yml: xcode11.2 scipy issue

 .travis.yml     | 9 ++++-----
 FIO-VERSION-GEN | 2 +-
 t/io_uring.c    | 2 ++
 3 files changed, 7 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 0017db56..eee31988 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -43,17 +43,16 @@ before_install:
         fi;
         sudo apt-get -qq update;
         sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}";
-    fi
+    fi;
   - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
         brew update;
         brew install cunit;
         if [[ "$TRAVIS_OSX_IMAGE" == "xcode11.2" ]]; then
             pip3 install scipy;
-        else
-            pip install scipy;
         fi;
-    fi
+        pip install scipy;
+    fi;
 script:
   - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
   - make test
-  - sudo python3 t/run-fio-tests.py --skip 6 1007 1008
+  - sudo python3 t/run-fio-tests.py --skip 6 1007 1008 --debug
diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index d5cec22e..13ce8c16 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.16
+DEF_VER=fio-3.17
 
 LF='
 '
diff --git a/t/io_uring.c b/t/io_uring.c
index 62dee805..c2e5e098 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -133,10 +133,12 @@ static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			min_complete, flags, NULL, 0);
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static int gettid(void)
 {
 	return syscall(__NR_gettid);
 }
+#endif
 
 static unsigned file_depth(struct submitter *s)
 {

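The t/io_uring.c fix wraps the local gettid() wrapper in #ifndef CONFIG_HAVE_GETTID. Presumably this avoids a clash with the gettid() that newer C libraries declare (glibc 2.30 and later, when _GNU_SOURCE is defined). Below is a standalone sketch of the same guard. CONFIG_HAVE_GETTID is the macro the commit checks. Defining _DEFAULT_SOURCE is an assumption of this sketch, made only so syscall() is declared under strict compiler modes.

```c
#define _DEFAULT_SOURCE	/* assumption: makes syscall() visible in strict modes */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Only provide a local gettid() when the build did not detect one in the
 * C library; otherwise the libc declaration is used as-is.
 */
#ifndef CONFIG_HAVE_GETTID
static int gettid(void)
{
	/* Linux-specific: ask the kernel for the calling thread's id. */
	return syscall(__NR_gettid);
}
#endif
```

On Linux, the thread id of a process's main thread equals its process id, so gettid() == getpid() there.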


* Recent changes (master)
@ 2019-12-12 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-12-12 13:00 UTC (permalink / raw)
  To: fio

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 25764 bytes --]

The following changes since commit fd9882facefa0f5b09c09d2bc5cb3a2b6eabda1a:

  filesetup: ensure to setup random generator properly (2019-12-06 22:03:04 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 41ceb6c79aea52bd46ea13c5036722a294bb719e:

  t/run-fio-tests: relax acceptance criterion for t0011 (2019-12-11 20:55:07 -0700)

----------------------------------------------------------------
Vincent Fu (9):
      .gitignore: ignore zbd test output files
      t/run-fio-tests: a few small improvements
      t/run-fio-tests: detect requirements and skip tests accordingly
      t/run-fio-tests: improve Windows support
      t/run-fio-tests: identify test id for debug messages
      t/steadystate_tests: use null ioengine for tests
      .travis.yml: run t/run-fio.tests.py as part of build
      .appveyor.yml: run run-fio-tests.py
      t/run-fio-tests: relax acceptance criterion for t0011

 .appveyor.yml          |   6 +-
 .gitignore             |   1 +
 .travis.yml            |  26 +++---
 t/run-fio-tests.py     | 219 +++++++++++++++++++++++++++++++++++++++++++------
 t/steadystate_tests.py |  21 +----
 5 files changed, 219 insertions(+), 54 deletions(-)

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
index ca8b2ab1..f6934096 100644
--- a/.appveyor.yml
+++ b/.appveyor.yml
@@ -13,8 +13,9 @@ environment:
       CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
 
 install:
-  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NUL'
-  - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
+  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib,mingw64-%PACKAGE_ARCH%-CUnit" > NUL'
+  - SET PATH=C:\Python38-x64;%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
+  - python.exe -m pip install scipy
 
 build_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
@@ -24,6 +25,7 @@ after_build:
 
 test_script:
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && python.exe t/run-fio-tests.py --skip 5 --debug'
 
 artifacts:
   - path: os\windows\*.msi
diff --git a/.gitignore b/.gitignore
index f86bec64..b228938d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,3 +25,4 @@ lex.yy.c
 doc/output
 /tags
 /TAGS
+/t/zbd/test-zbd-support.log.*
diff --git a/.travis.yml b/.travis.yml
index 4a87fe6c..0017db56 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -15,19 +15,13 @@ matrix:
     - os: osx
       compiler: clang # Workaround travis setting CC=["clang", "gcc"]
       env: BUILD_ARCH="x86_64"
-    # Build using the 10.12 SDK but target and run on OSX 10.11
-#   - os: osx
-#     compiler: clang
-#     osx_image: xcode8
-#     env: SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk MACOSX_DEPLOYMENT_TARGET=10.11
-    # Build on the latest OSX version (will eventually become obsolete)
     - os: osx
       compiler: clang
-      osx_image: xcode8.3
+      osx_image: xcode9.4
       env: BUILD_ARCH="x86_64"
     - os: osx
       compiler: clang
-      osx_image: xcode9.4
+      osx_image: xcode11.2
       env: BUILD_ARCH="x86_64"
   exclude:
     - os: osx
@@ -39,17 +33,27 @@ matrix:
 before_install:
   - EXTRA_CFLAGS="-Werror"
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
-        pkgs=(libaio-dev libnuma-dev libz-dev librbd-dev libibverbs-dev librdmacm-dev);
+        pkgs=(libaio-dev libnuma-dev libz-dev librbd-dev libibverbs-dev librdmacm-dev libcunit1 libcunit1-dev);
         if [[ "$BUILD_ARCH" == "x86" ]]; then
             pkgs=("${pkgs[@]/%/:i386}");
-            pkgs+=(gcc-multilib);
+            pkgs+=(gcc-multilib python-scipy);
             EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32";
         else
-            pkgs+=(glusterfs-common);
+            pkgs+=(glusterfs-common python-scipy);
         fi;
         sudo apt-get -qq update;
         sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}";
     fi
+  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then
+        brew update;
+        brew install cunit;
+        if [[ "$TRAVIS_OSX_IMAGE" == "xcode11.2" ]]; then
+            pip3 install scipy;
+        else
+            pip install scipy;
+        fi;
+    fi
 script:
   - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
   - make test
+  - sudo python3 t/run-fio-tests.py --skip 6 1007 1008
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index cf8de093..a0a1e8fa 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -21,7 +21,7 @@
 #
 #
 # REQUIREMENTS
-# - Python 3
+# - Python 3.5 (subprocess.run)
 # - Linux (libaio ioengine, zbd tests, etc)
 # - The artifact directory must be on a file system that accepts 512-byte IO
 #   (t0002, t0003, t0004).
@@ -39,16 +39,18 @@
 #
 # TODO  run multiple tests simultaneously
 # TODO  Add sgunmap tests (requires SAS SSD)
-# TODO  automatically detect dependencies and skip tests accordingly
 #
 
 import os
 import sys
 import json
 import time
+import shutil
 import logging
 import argparse
+import platform
 import subprocess
+import multiprocessing
 from pathlib import Path
 
 
@@ -135,7 +137,7 @@ class FioExeTest(FioTest):
                                     universal_newlines=True)
             proc.communicate(timeout=self.success['timeout'])
             exticode_file.write('{0}\n'.format(proc.returncode))
-            logging.debug("return code: %d" % proc.returncode)
+            logging.debug("Test %d: return code: %d" % (self.testnum, proc.returncode))
             self.output['proc'] = proc
         except subprocess.TimeoutExpired:
             proc.terminate()
@@ -177,8 +179,8 @@ class FioExeTest(FioTest):
                     self.failure_reason = "{0} zero return code,".format(self.failure_reason)
                     self.passed = False
 
+        stderr_size = os.path.getsize(self.stderr_file)
         if 'stderr_empty' in self.success:
-            stderr_size = os.path.getsize(self.stderr_file)
             if self.success['stderr_empty']:
                 if stderr_size != 0:
                     self.failure_reason = "{0} stderr not empty,".format(self.failure_reason)
@@ -252,7 +254,7 @@ class FioJobTest(FioExeTest):
         if not self.precon_failed:
             super(FioJobTest, self).run()
         else:
-            logging.debug("precondition step failed")
+            logging.debug("Test %d: precondition step failed" % self.testnum)
 
     def check_result(self):
         if self.precon_failed:
@@ -262,15 +264,38 @@ class FioJobTest(FioExeTest):
 
         super(FioJobTest, self).check_result()
 
-        if 'json' in self.output_format:
-            output_file = open(os.path.join(self.test_dir, self.fio_output), "r")
-            file_data = output_file.read()
-            output_file.close()
+        if not self.passed:
+            return
+
+        if not 'json' in self.output_format:
+            return
+
+        try:
+            with open(os.path.join(self.test_dir, self.fio_output), "r") as output_file:
+                file_data = output_file.read()
+        except EnvironmentError:
+            self.failure_reason = "{0} unable to open output file,".format(self.failure_reason)
+            self.passed = False
+            return
+
+        #
+        # Sometimes fio informational messages are included at the top of the
+        # JSON output, especially under Windows. Try to decode output as JSON
+        # data, lopping off up to the first four lines
+        #
+        lines = file_data.splitlines()
+        for i in range(5):
+            file_data = '\n'.join(lines[i:])
             try:
                 self.json_data = json.loads(file_data)
             except json.JSONDecodeError:
-                self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
-                self.passed = False
+                continue
+            else:
+                logging.debug("Test %d: skipped %d lines decoding JSON data" % (self.testnum, i))
+                return
+
+        self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
+        self.passed = False
 
 
 class FioJobTest_t0005(FioJobTest):
@@ -303,7 +328,7 @@ class FioJobTest_t0006(FioJobTest):
 
         ratio = self.json_data['jobs'][0]['read']['io_kbytes'] \
             / self.json_data['jobs'][0]['write']['io_kbytes']
-        logging.debug("ratio: %f" % ratio)
+        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
         if ratio < 1.99 or ratio > 2.01:
             self.failure_reason = "{0} read/write ratio mismatch,".format(self.failure_reason)
             self.passed = False
@@ -339,7 +364,7 @@ class FioJobTest_t0008(FioJobTest):
             return
 
         ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16568
-        logging.debug("ratio: %f" % ratio)
+        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
 
         if ratio < 0.99 or ratio > 1.01:
             self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
@@ -359,7 +384,7 @@ class FioJobTest_t0009(FioJobTest):
         if not self.passed:
             return
 
-        logging.debug('elapsed: %d' % self.json_data['jobs'][0]['elapsed'])
+        logging.debug('Test %d: elapsed: %d' % (self.testnum, self.json_data['jobs'][0]['elapsed']))
 
         if self.json_data['jobs'][0]['elapsed'] < 60:
             self.failure_reason = "{0} elapsed time mismatch,".format(self.failure_reason)
@@ -381,9 +406,10 @@ class FioJobTest_t0011(FioJobTest):
         iops1 = self.json_data['jobs'][0]['read']['iops']
         iops2 = self.json_data['jobs'][1]['read']['iops']
         ratio = iops2 / iops1
-        logging.debug("ratio: %f" % ratio)
+        logging.debug("Test %d: iops1: %f" % (self.testnum, iops1))
+        logging.debug("Test %d: ratio: %f" % (self.testnum, ratio))
 
-        if iops1 < 999 or iops1 > 1001:
+        if iops1 < 998 or iops1 > 1002:
             self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
             self.passed = False
 
@@ -392,6 +418,89 @@ class FioJobTest_t0011(FioJobTest):
             self.passed = False
 
 
+class Requirements(object):
+    """Requirements consists of multiple run environment characteristics.
+    These are to determine if a particular test can be run"""
+
+    _linux = False
+    _libaio = False
+    _zbd = False
+    _root = False
+    _zoned_nullb = False
+    _not_macos = False
+    _unittests = False
+    _cpucount4 = False
+
+    def __init__(self, fio_root):
+        Requirements._not_macos = platform.system() != "Darwin"
+        Requirements._linux = platform.system() == "Linux"
+
+        if Requirements._linux:
+            try:
+                config_file = os.path.join(fio_root, "config-host.h")
+                with open(config_file, "r") as config:
+                    contents = config.read()
+            except Exception:
+                print("Unable to open {0} to check requirements".format(config_file))
+                Requirements._zbd = True
+            else:
+                Requirements._zbd = "CONFIG_LINUX_BLKZONED" in contents
+                Requirements._libaio = "CONFIG_LIBAIO" in contents
+
+            Requirements._root = (os.geteuid() == 0)
+            if Requirements._zbd and Requirements._root:
+                    subprocess.run(["modprobe", "null_blk"],
+                                   stdout=subprocess.PIPE,
+                                   stderr=subprocess.PIPE)
+                    if os.path.exists("/sys/module/null_blk/parameters/zoned"):
+                        Requirements._zoned_nullb = True
+
+        if platform.system() == "Windows":
+            utest_exe = "unittest.exe"
+        else:
+            utest_exe = "unittest"
+        unittest_path = os.path.join(fio_root, "unittests", utest_exe)
+        Requirements._unittests = os.path.exists(unittest_path)
+
+        Requirements._cpucount4 = multiprocessing.cpu_count() >= 4
+
+        req_list = [Requirements.linux,
+                    Requirements.libaio,
+                    Requirements.zbd,
+                    Requirements.root,
+                    Requirements.zoned_nullb,
+                    Requirements.not_macos,
+                    Requirements.unittests,
+                    Requirements.cpucount4]
+        for req in req_list:
+            value, desc = req()
+            logging.debug("Requirements: Requirement '%s' met? %s" % (desc, value))
+
+    def linux():
+        return Requirements._linux, "Linux required"
+
+    def libaio():
+        return Requirements._libaio, "libaio required"
+
+    def zbd():
+        return Requirements._zbd, "Zoned block device support required"
+
+    def root():
+        return Requirements._root, "root required"
+
+    def zoned_nullb():
+        return Requirements._zoned_nullb, "Zoned null block device support required"
+
+    def not_macos():
+        return Requirements._not_macos, "platform other than macOS required"
+
+    def unittests():
+        return Requirements._unittests, "Unittests support required"
+
+    def cpucount4():
+        return Requirements._cpucount4, "4+ CPUs required"
+
+
 SUCCESS_DEFAULT = {
         'zero_return': True,
         'stderr_empty': True,
@@ -415,6 +524,7 @@ TEST_LIST = [
             'success':          SUCCESS_DEFAULT,
             'pre_job':          None,
             'pre_success':      None,
+            'requirements':     [],
         },
         {
             'test_id':          2,
@@ -423,6 +533,7 @@ TEST_LIST = [
             'success':          SUCCESS_DEFAULT,
             'pre_job':          't0002-13af05ae-pre.fio',
             'pre_success':      None,
+            'requirements':     [Requirements.linux, Requirements.libaio],
         },
         {
             'test_id':          3,
@@ -431,6 +542,7 @@ TEST_LIST = [
             'success':          SUCCESS_NONZERO,
             'pre_job':          't0003-0ae2c6e1-pre.fio',
             'pre_success':      SUCCESS_DEFAULT,
+            'requirements':     [Requirements.linux, Requirements.libaio],
         },
         {
             'test_id':          4,
@@ -439,6 +551,7 @@ TEST_LIST = [
             'success':          SUCCESS_DEFAULT,
             'pre_job':          None,
             'pre_success':      None,
+            'requirements':     [Requirements.linux, Requirements.libaio],
         },
         {
             'test_id':          5,
@@ -448,6 +561,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [],
         },
         {
             'test_id':          6,
@@ -457,6 +571,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [Requirements.linux, Requirements.libaio],
         },
         {
             'test_id':          7,
@@ -466,6 +581,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [],
         },
         {
             'test_id':          8,
@@ -475,6 +591,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [],
         },
         {
             'test_id':          9,
@@ -484,6 +601,9 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [Requirements.not_macos,
+                                 Requirements.cpucount4],
+                                # mac os does not support CPU affinity
         },
         {
             'test_id':          10,
@@ -492,6 +612,7 @@ TEST_LIST = [
             'success':          SUCCESS_DEFAULT,
             'pre_job':          None,
             'pre_success':      None,
+            'requirements':     [],
         },
         {
             'test_id':          11,
@@ -501,6 +622,7 @@ TEST_LIST = [
             'pre_job':          None,
             'pre_success':      None,
             'output_format':    'json',
+            'requirements':     [],
         },
         {
             'test_id':          1000,
@@ -508,6 +630,7 @@ TEST_LIST = [
             'exe':              't/axmap',
             'parameters':       None,
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
         },
         {
             'test_id':          1001,
@@ -515,6 +638,7 @@ TEST_LIST = [
             'exe':              't/ieee754',
             'parameters':       None,
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
         },
         {
             'test_id':          1002,
@@ -522,6 +646,7 @@ TEST_LIST = [
             'exe':              't/lfsr-test',
             'parameters':       ['0xFFFFFF', '0', '0', 'verify'],
             'success':          SUCCESS_STDERR,
+            'requirements':     [],
         },
         {
             'test_id':          1003,
@@ -529,6 +654,7 @@ TEST_LIST = [
             'exe':              't/readonly.py',
             'parameters':       ['-f', '{fio_path}'],
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
         },
         {
             'test_id':          1004,
@@ -536,6 +662,7 @@ TEST_LIST = [
             'exe':              't/steadystate_tests.py',
             'parameters':       ['{fio_path}'],
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
         },
         {
             'test_id':          1005,
@@ -543,6 +670,7 @@ TEST_LIST = [
             'exe':              't/stest',
             'parameters':       None,
             'success':          SUCCESS_STDERR,
+            'requirements':     [],
         },
         {
             'test_id':          1006,
@@ -550,6 +678,7 @@ TEST_LIST = [
             'exe':              't/strided.py',
             'parameters':       ['{fio_path}'],
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [],
         },
         {
             'test_id':          1007,
@@ -557,6 +686,8 @@ TEST_LIST = [
             'exe':              't/zbd/run-tests-against-regular-nullb',
             'parameters':       None,
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [Requirements.linux, Requirements.zbd,
+                                 Requirements.root],
         },
         {
             'test_id':          1008,
@@ -564,6 +695,8 @@ TEST_LIST = [
             'exe':              't/zbd/run-tests-against-zoned-nullb',
             'parameters':       None,
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [Requirements.linux, Requirements.zbd,
+                                 Requirements.root, Requirements.zoned_nullb],
         },
         {
             'test_id':          1009,
@@ -571,6 +704,7 @@ TEST_LIST = [
             'exe':              'unittests/unittest',
             'parameters':       None,
             'success':          SUCCESS_DEFAULT,
+            'requirements':     [Requirements.unittests],
         },
 ]
 
@@ -587,32 +721,48 @@ def parse_args():
                         help='list of test(s) to skip')
     parser.add_argument('-o', '--run-only', nargs='+', type=int,
                         help='list of test(s) to run, skipping all others')
+    parser.add_argument('-d', '--debug', action='store_true',
+                        help='provide debug output')
+    parser.add_argument('-k', '--skip-req', action='store_true',
+                        help='skip requirements checking')
     args = parser.parse_args()
 
     return args
 
 
 def main():
-    logging.basicConfig(level=logging.INFO)
-
     args = parse_args()
+    if args.debug:
+        logging.basicConfig(level=logging.DEBUG)
+    else:
+        logging.basicConfig(level=logging.INFO)
+
     if args.fio_root:
         fio_root = args.fio_root
     else:
-        fio_root = Path(__file__).absolute().parent.parent
-    logging.debug("fio_root: %s" % fio_root)
+        fio_root = str(Path(__file__).absolute().parent.parent)
+    print("fio root is %s" % fio_root)
 
     if args.fio:
         fio_path = args.fio
     else:
-        fio_path = os.path.join(fio_root, "fio")
-    logging.debug("fio_path: %s" % fio_path)
+        if platform.system() == "Windows":
+            fio_exe = "fio.exe"
+        else:
+            fio_exe = "fio"
+        fio_path = os.path.join(fio_root, fio_exe)
+    print("fio path is %s" % fio_path)
+    if not shutil.which(fio_path):
+        print("Warning: fio executable not found")
 
     artifact_root = args.artifact_root if args.artifact_root else \
         "fio-test-{0}".format(time.strftime("%Y%m%d-%H%M%S"))
     os.mkdir(artifact_root)
     print("Artifact directory is %s" % artifact_root)
 
+    if not args.skip_req:
+        req = Requirements(fio_root)
+
     passed = 0
     failed = 0
     skipped = 0
@@ -621,7 +771,7 @@ def main():
         if (args.skip and config['test_id'] in args.skip) or \
            (args.run_only and config['test_id'] not in args.run_only):
             skipped = skipped + 1
-            print("Test {0} SKIPPED".format(config['test_id']))
+            print("Test {0} SKIPPED (User request)".format(config['test_id']))
             continue
 
         if issubclass(config['test_class'], FioJobTest):
@@ -651,6 +801,12 @@ def main():
                 parameters = [p.format(fio_path=fio_path) for p in config['parameters']]
             else:
                 parameters = None
+            if Path(exe_path).suffix == '.py' and platform.system() == "Windows":
+                if parameters:
+                    parameters.insert(0, exe_path)
+                else:
+                    parameters = [exe_path]
+                exe_path = "python.exe"
             test = config['test_class'](exe_path, parameters,
                                         config['success'])
         else:
@@ -658,6 +814,19 @@ def main():
             failed = failed + 1
             continue
 
+        if not args.skip_req:
+            skip = False
+            for req in config['requirements']:
+                ok, reason = req()
+                skip = not ok
+                logging.debug("Test %d: Requirement '%s' met? %s" % (config['test_id'], reason, ok))
+                if skip:
+                    break
+            if skip:
+                print("Test {0} SKIPPED ({1})".format(config['test_id'], reason))
+                skipped = skipped + 1
+                continue
+
         test.setup(artifact_root, config['test_id'])
         test.run()
         test.check_result()
@@ -667,6 +836,10 @@ def main():
         else:
             result = "FAILED: {0}".format(test.failure_reason)
             failed = failed + 1
+            with open(test.stderr_file, "r") as stderr_file:
+                logging.debug("Test %d: stderr:\n%s" % (config['test_id'], stderr_file.read()))
+            with open(test.stdout_file, "r") as stdout_file:
+                logging.debug("Test %d: stdout:\n%s" % (config['test_id'], stdout_file.read()))
         print("Test {0} {1}".format(config['test_id'], result))
 
     print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index 53b0f35e..9122a60f 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -31,12 +31,7 @@ from scipy import stats
 
 def parse_args():
     parser = argparse.ArgumentParser()
-    parser.add_argument('fio',
-                        help='path to fio executable')
-    parser.add_argument('--read',
-                        help='target for read testing')
-    parser.add_argument('--write',
-                        help='target for write testing')
+    parser.add_argument('fio', help='path to fio executable')
     args = parser.parse_args()
 
     return args
@@ -123,26 +118,16 @@ if __name__ == '__main__':
               {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
             ]
 
-    if args.read == None:
-        if os.name == 'posix':
-            args.read = '/dev/zero'
-            extra = [ "--size=128M" ]
-        else:
-            print("ERROR: file for read testing must be specified on non-posix systems")
-            sys.exit(1)
-    else:
-        extra = []
-
     jobnum = 0
     for job in reads:
 
         tf = "steadystate_job{0}.json".format(jobnum)
         parameters = [ "--name=job{0}".format(jobnum) ]
-        parameters.extend(extra)
         parameters.extend([ "--thread",
                             "--output-format=json",
                             "--output={0}".format(tf),
-                            "--filename={0}".format(args.read),
+                            "--ioengine=null",
+                            "--size=1G",
                             "--rw=randrw",
                             "--rwmixread=100",
                             "--stonewall",

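The reworked `check_result()` in `t/run-fio-tests.py` retries JSON decoding after dropping leading lines, because fio may print informational messages before the JSON document. Below is a standalone sketch of that retry loop. The function name `decode_json_skipping_preamble` and its `max_skip` parameter are hypothetical, chosen for illustration. The default of four mirrors the `range(5)` loop in the patch.

```python
import json


def decode_json_skipping_preamble(text, max_skip=4):
    """Try to parse text as JSON, dropping up to max_skip leading lines.

    Returns (data, lines_skipped) on success, or (None, None) if every
    attempt fails, in which case the caller should mark the test failed.
    """
    lines = text.splitlines()
    for skipped in range(max_skip + 1):
        candidate = '\n'.join(lines[skipped:])
        try:
            return json.loads(candidate), skipped
        except json.JSONDecodeError:
            # Leading non-JSON text (or trailing junk) makes this attempt fail;
            # drop one more line and retry.
            continue
    return None, None
```

For example, `decode_json_skipping_preamble('fio: informational note\n{"jobs": []}')` returns `({'jobs': []}, 1)`.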

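The new `requirements` field pairs each test with callables that return an `(ok, reason)` tuple. `main()` then skips a test at the first requirement that is not met. The sketch below reproduces that gating outside the `Requirements` class. `first_unmet` is a hypothetical helper name, and the two checks copy the platform and CPU-count tests from the patch.

```python
import multiprocessing
import platform


def not_macos():
    # The patch notes that macOS does not support the CPU affinity test 9 uses.
    return platform.system() != "Darwin", "platform other than macOS required"


def cpucount4():
    return multiprocessing.cpu_count() >= 4, "4+ CPUs required"


def first_unmet(requirements):
    """Return the reason of the first unmet requirement, or None if all are met."""
    for req in requirements:
        ok, reason = req()
        if not ok:
            return reason  # stop at the first failure, as main() does
    return None


reason = first_unmet([not_macos, cpucount4])
if reason:
    print("Test 9 SKIPPED ({0})".format(reason))
else:
    print("Test 9 would run")
```

An empty requirements list never skips, which matches the `'requirements': []` entries in `TEST_LIST`.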

* Recent changes (master)
@ 2019-12-07 13:00 Jens Axboe
From: Jens Axboe @ 2019-12-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit bef74db41fb5a1607fd55cb86544165fc08acac1:

  Merge branch 'engine-rados-fix-first' of https://github.com/aclamk/fio (2019-11-27 10:23:12 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fd9882facefa0f5b09c09d2bc5cb3a2b6eabda1a:

  filesetup: ensure to setup random generator properly (2019-12-06 22:03:04 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      io_uring: add support for RWF_UNCACHED
      pvsync2: add support for RWF_UNCACHED
      Renumber RWF_UNCACHED
      filesetup: ensure to setup random generator properly

 engines/io_uring.c | 12 ++++++++++++
 engines/sync.c     | 12 ++++++++++++
 filesetup.c        |  4 +++-
 os/os-linux.h      |  4 ++++
 4 files changed, 31 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index ef56345b..9ba126d8 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -75,6 +75,7 @@ struct ioring_options {
 	unsigned int sqpoll_thread;
 	unsigned int sqpoll_set;
 	unsigned int sqpoll_cpu;
+	unsigned int uncached;
 };
 
 static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
@@ -132,6 +133,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+	{
+		.name	= "uncached",
+		.lname	= "Uncached",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, uncached),
+		.help	= "Use RWF_UNCACHED for buffered read/writes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
 	{
 		.name	= NULL,
 	},
@@ -180,6 +190,8 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->addr = (unsigned long) &ld->iovecs[io_u->index];
 			sqe->len = 1;
 		}
+		if (!td->o.odirect && o->uncached)
+			sqe->rw_flags = RWF_UNCACHED;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
 		if (io_u->ddir == DDIR_SYNC_FILE_RANGE) {
diff --git a/engines/sync.c b/engines/sync.c
index b3e1c9db..65fd210c 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -39,6 +39,7 @@ struct psyncv2_options {
 	void *pad;
 	unsigned int hipri;
 	unsigned int hipri_percentage;
+	unsigned int uncached;
 };
 
 static struct fio_option options[] = {
@@ -63,6 +64,15 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "uncached",
+		.lname	= "Uncached",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct psyncv2_options, uncached),
+		.help	= "Use RWF_UNCACHED for buffered read/writes",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+	},
 	{
 		.name	= NULL,
 	},
@@ -152,6 +162,8 @@ static enum fio_q_status fio_pvsyncio2_queue(struct thread_data *td,
 	if (o->hipri &&
 	    (rand_between(&sd->rand_state, 1, 100) <= o->hipri_percentage))
 		flags |= RWF_HIPRI;
+	if (!td->o.odirect && o->uncached)
+		flags |= RWF_UNCACHED;
 
 	iov->iov_base = io_u->xfer_buf;
 	iov->iov_len = io_u->xfer_buflen;
diff --git a/filesetup.c b/filesetup.c
index 7fe2ebd4..ed3646a4 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1287,7 +1287,9 @@ static bool init_rand_distribution(struct thread_data *td)
 	unsigned int i;
 	int state;
 
-	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM)
+	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM ||
+	    td->o.random_distribution == FIO_RAND_DIST_ZONED ||
+	    td->o.random_distribution == FIO_RAND_DIST_ZONED_ABS)
 		return false;
 
 	state = td_bump_runstate(td, TD_SETTING_UP);
diff --git a/os/os-linux.h b/os/os-linux.h
index 36339ef3..0f0bcc3a 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -326,6 +326,10 @@ static inline int fio_set_sched_idle(void)
 #define RWF_SYNC	0x00000004
 #endif
 
+#ifndef RWF_UNCACHED
+#define RWF_UNCACHED	0x00000040
+#endif
+
 #ifndef RWF_WRITE_LIFE_SHIFT
 #define RWF_WRITE_LIFE_SHIFT		4
 #define RWF_WRITE_LIFE_SHORT		(1 << RWF_WRITE_LIFE_SHIFT)

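Both engines set the new flag only for buffered I/O. `fio_pvsyncio2_queue()` ORs `RWF_UNCACHED` into the per-call flags when `direct` is off and `uncached` is set. The io_uring engine assigns the flag to `sqe->rw_flags` under the same condition. Below is a Python sketch of the pvsync2 flag computation. It assumes the `RWF_UNCACHED` fallback value that `os/os-linux.h` defines above. The random 1-100 roll is passed in as `hipri_roll`, a hypothetical parameter, so the result is deterministic.

```python
RWF_HIPRI = 0x00000001     # Linux uapi value for polled (high-priority) I/O
RWF_UNCACHED = 0x00000040  # fallback value fio defines in os/os-linux.h


def pvsync2_flags(odirect, uncached, hipri=False, hipri_percentage=100, hipri_roll=1):
    """Mirror the flag logic in fio_pvsyncio2_queue().

    hipri_roll stands in for rand_between(&sd->rand_state, 1, 100).
    """
    flags = 0
    if hipri and hipri_roll <= hipri_percentage:
        flags |= RWF_HIPRI
    # RWF_UNCACHED only applies to buffered I/O, so O_DIRECT jobs never get it.
    if not odirect and uncached:
        flags |= RWF_UNCACHED
    return flags
```

The check on `td->o.odirect` keeps a job that sets both `direct=1` and `uncached=1` from passing a page-cache hint that would have no meaning for direct I/O.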


* Recent changes (master)
@ 2019-11-28 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-28 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit b4d867339d3e89ca54104df104f830aa374e31c0:

  Merge branch 'fallocate-truncate' of https://github.com/tripped/fio (2019-11-26 11:57:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bef74db41fb5a1607fd55cb86544165fc08acac1:

  Merge branch 'engine-rados-fix-first' of https://github.com/aclamk/fio (2019-11-27 10:23:12 -0700)

----------------------------------------------------------------
Adam Kupczyk (1):
      engines/rados: fix error with getting last instead of first element from list

Jens Axboe (1):
      Merge branch 'engine-rados-fix-first' of https://github.com/aclamk/fio

 engines/rados.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/rados.c b/engines/rados.c
index cde538b9..30fcebb5 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -342,7 +342,7 @@ int fio_rados_getevents(struct thread_data *td, unsigned int min,
 		}
 		assert(!flist_empty(&rados->completed_operations));
 		
-		fri = flist_last_entry(&rados->completed_operations, struct fio_rados_iou, list);
+		fri = flist_first_entry(&rados->completed_operations, struct fio_rados_iou, list);
 		assert(fri->completion);
 		assert(rados_aio_is_complete(fri->completion));
 		if (fri->write_op != NULL) {

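The rados fix changes which completed operation `fio_rados_getevents()` takes from `completed_operations`. The sketch below illustrates the difference using `collections.deque` as a stand-in for the list. It assumes completions are appended at the tail of the list, which the hunk does not show. Under that assumption, taking the first entry reaps completions oldest-first, while taking the last entry reaps the newest one first.

```python
from collections import deque

completed = deque()
for name in ("io0", "io1", "io2"):
    completed.append(name)  # assumed: each completion is appended at the tail

fifo = deque(completed)
lifo = deque(completed)
first_order = [fifo.popleft() for _ in range(len(fifo))]  # like flist_first_entry()
last_order = [lifo.pop() for _ in range(len(lifo))]       # like flist_last_entry()
print(first_order)  # ['io0', 'io1', 'io2']
print(last_order)   # ['io2', 'io1', 'io0']
```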


* Recent changes (master)
@ 2019-11-27 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-27 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5a9d5c4ab5f7cea83dc94e7cd23ecceb45b68134:

  Merge branch 'rados-now-use-completion-callbacks' of https://github.com/aclamk/fio (2019-11-25 21:40:38 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b4d867339d3e89ca54104df104f830aa374e31c0:

  Merge branch 'fallocate-truncate' of https://github.com/tripped/fio (2019-11-26 11:57:52 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fallocate-truncate' of https://github.com/tripped/fio

Trip Volpe (1):
      filesetup: add fallocate=truncate option.

 HOWTO       | 14 +++++++++++++-
 file.h      |  1 +
 filesetup.c | 12 ++++++++++++
 fio.1       | 14 +++++++++++++-
 init.c      |  5 -----
 options.c   | 17 ++++++++---------
 os/os.h     |  2 +-
 7 files changed, 48 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 64c6a033..88dbb03f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1173,6 +1173,10 @@ I/O type
 			Pre-allocate via :manpage:`fallocate(2)` with
 			FALLOC_FL_KEEP_SIZE set.
 
+		**truncate**
+			Extend file to final size via :manpage:`ftruncate(2)`
+			instead of allocating.
+
 		**0**
 			Backward-compatible alias for **none**.
 
@@ -1182,7 +1186,15 @@ I/O type
 	May not be available on all supported platforms. **keep** is only available
 	on Linux. If using ZFS on Solaris this cannot be set to **posix**
 	because ZFS doesn't support pre-allocation. Default: **native** if any
-	pre-allocation methods are available, **none** if not.
+	pre-allocation methods except **truncate** are available, **none** if not.
+
+	Note that using **truncate** on Windows will interact surprisingly
+	with non-sequential write patterns. When writing to a file that has
+	been extended by setting the end-of-file information, Windows will
+	backfill the unwritten portion of the file up to that offset with
+	zeroes before issuing the new write. This means that a single small
+	write to the end of an extended file will stall until the entire
+	file has been filled with zeroes.
 
 .. option:: fadvise_hint=str
 
diff --git a/file.h b/file.h
index e50c0f9c..ae0e6fc8 100644
--- a/file.h
+++ b/file.h
@@ -67,6 +67,7 @@ enum fio_fallocate_mode {
 	FIO_FALLOCATE_POSIX	= 2,
 	FIO_FALLOCATE_KEEP_SIZE	= 3,
 	FIO_FALLOCATE_NATIVE	= 4,
+	FIO_FALLOCATE_TRUNCATE	= 5,
 };
 
 /*
diff --git a/filesetup.c b/filesetup.c
index 7d54c9f1..7fe2ebd4 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -95,6 +95,18 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 		break;
 		}
 #endif /* CONFIG_LINUX_FALLOCATE */
+	case FIO_FALLOCATE_TRUNCATE: {
+		int r;
+
+		dprint(FD_FILE, "ftruncate file %s size %llu\n",
+				f->file_name,
+				(unsigned long long) f->real_file_size);
+		r = ftruncate(f->fd, f->real_file_size);
+		if (r != 0)
+			td_verror(td, errno, "ftruncate");
+
+		break;
+	}
 	default:
 		log_err("fio: unknown fallocate mode: %d\n", td->o.fallocate_mode);
 		assert(0);
diff --git a/fio.1 b/fio.1
index 087d3778..14569e9f 100644
--- a/fio.1
+++ b/fio.1
@@ -943,6 +943,10 @@ Pre-allocate via \fBposix_fallocate\fR\|(3).
 Pre-allocate via \fBfallocate\fR\|(2) with
 FALLOC_FL_KEEP_SIZE set.
 .TP
+.B truncate
+Extend file to final size using \fBftruncate\fR|(2)
+instead of allocating.
+.TP
 .B 0
 Backward-compatible alias for \fBnone\fR.
 .TP
@@ -953,7 +957,15 @@ Backward-compatible alias for \fBposix\fR.
 May not be available on all supported platforms. \fBkeep\fR is only available
 on Linux. If using ZFS on Solaris this cannot be set to \fBposix\fR
 because ZFS doesn't support pre-allocation. Default: \fBnative\fR if any
-pre-allocation methods are available, \fBnone\fR if not.
+pre-allocation methods except \fBtruncate\fR are available, \fBnone\fR if not.
+.P
+Note that using \fBtruncate\fR on Windows will interact surprisingly
+with non-sequential write patterns. When writing to a file that has
+been extended by setting the end-of-file information, Windows will
+backfill the unwritten portion of the file up to that offset with
+zeroes before issuing the new write. This means that a single small
+write to the end of an extended file will stall until the entire
+file has been filled with zeroes.
 .RE
 .TP
 .BI fadvise_hint \fR=\fPstr
diff --git a/init.c b/init.c
index 63f2168e..60c85761 100644
--- a/init.c
+++ b/init.c
@@ -852,11 +852,6 @@ static int fixup_options(struct thread_data *td)
 			o->unit_base = N2S_BYTEPERSEC;
 	}
 
-#ifndef FIO_HAVE_ANY_FALLOCATE
-	/* Platform doesn't support any fallocate so force it to none */
-	o->fallocate_mode = FIO_FALLOCATE_NONE;
-#endif
-
 #ifndef CONFIG_FDATASYNC
 	if (o->fdatasync_blocks) {
 		log_info("fio: this platform does not support fdatasync()"
diff --git a/options.c b/options.c
index 2c5bf5e0..fad1857e 100644
--- a/options.c
+++ b/options.c
@@ -2412,14 +2412,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.parent = "nrfiles",
 		.hide	= 1,
 	},
-#ifdef FIO_HAVE_ANY_FALLOCATE
 	{
 		.name	= "fallocate",
 		.lname	= "Fallocate",
 		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, fallocate_mode),
 		.help	= "Whether pre-allocation is performed when laying out files",
+#ifdef FIO_HAVE_DEFAULT_FALLOCATE
 		.def	= "native",
+#else
+		.def	= "none",
+#endif
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 		.posval	= {
@@ -2443,6 +2446,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Use fallocate(..., FALLOC_FL_KEEP_SIZE, ...)",
 			  },
 #endif
+			  { .ival = "truncate",
+			    .oval = FIO_FALLOCATE_TRUNCATE,
+			    .help = "Truncate file to final size instead of allocating"
+			  },
 			  /* Compatibility with former boolean values */
 			  { .ival = "0",
 			    .oval = FIO_FALLOCATE_NONE,
@@ -2456,14 +2463,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #endif
 		},
 	},
-#else	/* FIO_HAVE_ANY_FALLOCATE */
-	{
-		.name	= "fallocate",
-		.lname	= "Fallocate",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support fallocate",
-	},
-#endif /* FIO_HAVE_ANY_FALLOCATE */
 	{
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
diff --git a/os/os.h b/os/os.h
index e4729680..dadcd87b 100644
--- a/os/os.h
+++ b/os/os.h
@@ -397,7 +397,7 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t l
 #endif
 
 #if defined(CONFIG_POSIX_FALLOCATE) || defined(FIO_HAVE_NATIVE_FALLOCATE)
-# define FIO_HAVE_ANY_FALLOCATE
+# define FIO_HAVE_DEFAULT_FALLOCATE
 #endif
 
 #ifndef FIO_HAVE_CPU_HAS


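The new truncate mode only moves the file's end-of-file instead of reserving space. The sketch below shows that idea with plain POSIX calls, assuming a Linux/POSIX system. layout_with_truncate() and file_size() are hypothetical names for illustration, not fio functions. On common Linux file systems the result is typically a sparse file, with blocks allocated only when data is later written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not fio's code): lay out a file by extending it to
 * its final size with ftruncate(2) instead of reserving blocks with
 * fallocate(2). Returns 0 on success or a negative errno value.
 */
static int layout_with_truncate(const char *path, off_t final_size)
{
	int fd, err;

	assert(path != NULL && final_size >= 0);

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -errno;

	/* Only the file size changes here; no data is written. */
	if (ftruncate(fd, final_size) < 0) {
		err = -errno;
		close(fd);
		return err;
	}

	return close(fd) < 0 ? -errno : 0;
}

/* Apparent size (st_size) of 'path', or -1 if stat(2) fails. */
static off_t file_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return st.st_size;
}
```

For example, layout_with_truncate("/tmp/job.dat", 1 << 20) should leave a file whose st_size is 1048576 without writing any data to it.
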

* Recent changes (master)
@ 2019-11-26 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-26 13:00 UTC
  To: fio

The following changes since commit b048455fef67dad4ef96ce0393937322e3165b58:

  t/run-fio-tests: improve error handling (2019-11-14 14:07:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5a9d5c4ab5f7cea83dc94e7cd23ecceb45b68134:

  Merge branch 'rados-now-use-completion-callbacks' of https://github.com/aclamk/fio (2019-11-25 21:40:38 -0700)

----------------------------------------------------------------
Adam Kupczyk (1):
      engines/rados: changed polling to completion callbacks methodology

Jens Axboe (2):
      Merge branch 'const1' of https://github.com/kusumi/fio
      Merge branch 'rados-now-use-completion-callbacks' of https://github.com/aclamk/fio

Tomohiro Kusumi (1):
      parse: Silence discard-const warning on OpenBSD

 engines/rados.c | 105 +++++++++++++++++++++++++++++---------------------------
 parse.c         |  15 +++++++-
 2 files changed, 68 insertions(+), 52 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rados.c b/engines/rados.c
index 86100dc4..cde538b9 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -11,20 +11,24 @@
 #include "fio.h"
 #include "../optgroup.h"
 
+struct rados_data {
+        rados_t cluster;
+        rados_ioctx_t io_ctx;
+        struct io_u **aio_events;
+        bool connected;
+        pthread_mutex_t completed_lock;
+        pthread_cond_t completed_more_io;
+        struct flist_head completed_operations;
+};
+
 struct fio_rados_iou {
+	struct flist_head list;
 	struct thread_data *td;
 	struct io_u *io_u;
 	rados_completion_t completion;
 	rados_write_op_t write_op;
 };
 
-struct rados_data {
-	rados_t cluster;
-	rados_ioctx_t io_ctx;
-	struct io_u **aio_events;
-	bool connected;
-};
-
 /* fio configuration options read from the job file */
 struct rados_options {
 	void *pad;
@@ -94,6 +98,9 @@ static int _fio_setup_rados_data(struct thread_data *td,
 	rados->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
 	if (!rados->aio_events)
 		goto failed;
+	pthread_mutex_init(&rados->completed_lock, NULL);
+	pthread_cond_init(&rados->completed_more_io, NULL);
+	INIT_FLIST_HEAD(&rados->completed_operations);
 	*rados_data_ptr = rados;
 	return 0;
 
@@ -229,6 +236,18 @@ static void fio_rados_cleanup(struct thread_data *td)
 	}
 }
 
+static void complete_callback(rados_completion_t cb, void *arg)
+{
+	struct fio_rados_iou *fri = (struct fio_rados_iou *)arg;
+	struct rados_data *rados = fri->td->io_ops_data;
+	assert(fri->completion);
+	assert(rados_aio_is_complete(fri->completion));
+	pthread_mutex_lock(&rados->completed_lock);
+	flist_add_tail(&fri->list, &rados->completed_operations);
+	pthread_mutex_unlock(&rados->completed_lock);
+	pthread_cond_signal(&rados->completed_more_io);
+}
+
 static enum fio_q_status fio_rados_queue(struct thread_data *td,
 					 struct io_u *io_u)
 {
@@ -240,7 +259,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 	fio_ro_check(td, io_u);
 
 	if (io_u->ddir == DDIR_WRITE) {
-		 r = rados_aio_create_completion(fri, NULL,
+		 r = rados_aio_create_completion(fri, complete_callback,
 			NULL, &fri->completion);
 		if (r < 0) {
 			log_err("rados_aio_create_completion failed.\n");
@@ -255,7 +274,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 		}
 		return FIO_Q_QUEUED;
 	} else if (io_u->ddir == DDIR_READ) {
-		r = rados_aio_create_completion(fri, NULL,
+		r = rados_aio_create_completion(fri, complete_callback,
 			NULL, &fri->completion);
 		if (r < 0) {
 			log_err("rados_aio_create_completion failed.\n");
@@ -269,7 +288,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 		}
 		return FIO_Q_QUEUED;
 	} else if (io_u->ddir == DDIR_TRIM) {
-		r = rados_aio_create_completion(fri, NULL,
+		r = rados_aio_create_completion(fri, complete_callback,
 			NULL , &fri->completion);
 		if (r < 0) {
 			log_err("rados_aio_create_completion failed.\n");
@@ -313,50 +332,33 @@ int fio_rados_getevents(struct thread_data *td, unsigned int min,
 	unsigned int max, const struct timespec *t)
 {
 	struct rados_data *rados = td->io_ops_data;
-	struct rados_options *o = td->eo;
-	int busy_poll = o->busy_poll;
 	unsigned int events = 0;
-	struct io_u *u;
 	struct fio_rados_iou *fri;
-	unsigned int i;
-	rados_completion_t first_unfinished;
-	int observed_new = 0;
-
-	/* loop through inflight ios until we find 'min' completions */
-	do {
-		first_unfinished = NULL;
-		io_u_qiter(&td->io_u_all, u, i) {
-			if (!(u->flags & IO_U_F_FLIGHT))
-				continue;
-
-			fri = u->engine_data;
-			if (fri->completion) {
-				if (rados_aio_is_complete(fri->completion)) {
-					if (fri->write_op != NULL) {
-						rados_release_write_op(fri->write_op);
-						fri->write_op = NULL;
-					}
-					rados_aio_release(fri->completion);
-					fri->completion = NULL;
-					rados->aio_events[events] = u;
-					events++;
-					observed_new = 1;
-				} else if (first_unfinished == NULL) {
-					first_unfinished = fri->completion;
-				}
-			}
-			if (events >= max)
-				break;
+
+	pthread_mutex_lock(&rados->completed_lock);
+	while (events < min) {
+		while (flist_empty(&rados->completed_operations)) {
+			pthread_cond_wait(&rados->completed_more_io, &rados->completed_lock);
 		}
-		if (events >= min)
-			return events;
-		if (first_unfinished == NULL || busy_poll)
-			continue;
-
-		if (!observed_new)
-			rados_aio_wait_for_complete(first_unfinished);
-	} while (1);
-  return events;
+		assert(!flist_empty(&rados->completed_operations));
+		
+		fri = flist_last_entry(&rados->completed_operations, struct fio_rados_iou, list);
+		assert(fri->completion);
+		assert(rados_aio_is_complete(fri->completion));
+		if (fri->write_op != NULL) {
+			rados_release_write_op(fri->write_op);
+			fri->write_op = NULL;
+		}
+		rados_aio_release(fri->completion);
+		fri->completion = NULL;
+
+		rados->aio_events[events] = fri->io_u;
+		events ++;
+		flist_del(&fri->list);
+		if (events >= max) break;
+	}
+	pthread_mutex_unlock(&rados->completed_lock);
+	return events;
 }
 
 static int fio_rados_setup(struct thread_data *td)
@@ -425,6 +427,7 @@ static int fio_rados_io_u_init(struct thread_data *td, struct io_u *io_u)
 	fri = calloc(1, sizeof(*fri));
 	fri->io_u = io_u;
 	fri->td = td;
+	INIT_FLIST_HEAD(&fri->list);
 	io_u->engine_data = fri;
 	return 0;
 }
diff --git a/parse.c b/parse.c
index 483a62f6..04b2e198 100644
--- a/parse.c
+++ b/parse.c
@@ -1048,7 +1048,20 @@ struct fio_option *find_option(struct fio_option *options, const char *opt)
 const struct fio_option *
 find_option_c(const struct fio_option *options, const char *opt)
 {
-	return find_option((struct fio_option *)options, opt);
+	const struct fio_option *o;
+
+	for (o = &options[0]; o->name; o++) {
+		if (!o_match(o, opt))
+			continue;
+		if (o->type == FIO_OPT_UNSUPPORTED) {
+			log_err("Option <%s>: %s\n", o->name, o->help);
+			continue;
+		}
+
+		return o;
+	}
+
+	return NULL;
 }
 
 static const struct fio_option *


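The rados change replaces the busy scan of in-flight io_us with a producer/consumer queue: complete_callback() appends to completed_operations under completed_lock and signals completed_more_io, and fio_rados_getevents() sleeps on that condition until it has min events. A self-contained pthreads sketch of the same pattern follows. struct req, struct completion_queue and the cq_* functions are hypothetical stand-ins. One deliberate difference: the patch takes entries with flist_last_entry(), which after flist_add_tail() yields the most recent completion, while this sketch reaps the oldest first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fio_rados_iou: one finished request. */
struct req {
	struct req *next;
	int id;
};

/* Plays the role of completed_lock, completed_more_io and
 * completed_operations in struct rados_data. */
struct completion_queue {
	pthread_mutex_t lock;
	pthread_cond_t more;
	struct req *head;	/* oldest completion */
	struct req *tail;	/* newest completion */
};

static void cq_init(struct completion_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->more, NULL);
	q->head = q->tail = NULL;
}

/* In the real engine this runs in the librados callback thread: append
 * the finished request under the lock, then wake one waiting reaper. */
static void cq_complete(struct completion_queue *q, struct req *r)
{
	r->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	pthread_mutex_unlock(&q->lock);
	pthread_cond_signal(&q->more);
}

/* Reaper shaped like fio_rados_getevents(): block until 'min'
 * completions have been collected, never returning more than 'max'. */
static unsigned int cq_reap(struct completion_queue *q, struct req **out,
			    unsigned int min, unsigned int max)
{
	unsigned int events = 0;

	pthread_mutex_lock(&q->lock);
	while (events < min) {
		/* Re-check after every wakeup: spurious wakeups are allowed. */
		while (q->head == NULL)
			pthread_cond_wait(&q->more, &q->lock);
		assert(q->head != NULL);

		out[events++] = q->head;
		q->head = q->head->next;
		if (q->head == NULL)
			q->tail = NULL;

		if (events >= max)
			break;
	}
	pthread_mutex_unlock(&q->lock);
	return events;
}
```

Holding the mutex across the emptiness check and pthread_cond_wait() is what prevents a lost wakeup between the callback's append and the reaper going to sleep.
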

* Recent changes (master)
@ 2019-11-15 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-15 13:00 UTC
  To: fio

The following changes since commit cc16261082a4d0f027cef9d019ad023129bf6012:

  Merge branch 'testing' of https://github.com/vincentkfu/fio (2019-11-06 17:04:17 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b048455fef67dad4ef96ce0393937322e3165b58:

  t/run-fio-tests: improve error handling (2019-11-14 14:07:25 -0700)

----------------------------------------------------------------
Vincent Fu (3):
      filesetup: improve LFSR init failure error message
      io_u: move to next zone even if zoneskip is unset
      t/run-fio-tests: improve error handling

 filesetup.c        |  3 +++
 io_u.c             |  3 +--
 t/run-fio-tests.py | 10 ++++++----
 t/strided.py       |  3 +--
 4 files changed, 11 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 1d3094c1..7d54c9f1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1354,6 +1354,9 @@ bool init_random_map(struct thread_data *td)
 			if (!lfsr_init(&f->lfsr, blocks, seed, 0)) {
 				fio_file_set_lfsr(f);
 				continue;
+			} else {
+				log_err("fio: failed initializing LFSR\n");
+				return false;
 			}
 		} else if (!td->o.norandommap) {
 			f->io_axmap = axmap_new(blocks);
diff --git a/io_u.c b/io_u.c
index 5cbbe85a..b5c31335 100644
--- a/io_u.c
+++ b/io_u.c
@@ -850,8 +850,7 @@ static void setup_strided_zone_mode(struct thread_data *td, struct io_u *io_u)
 	/*
 	 * See if it's time to switch to a new zone
 	 */
-	if (td->zone_bytes >= td->o.zone_size &&
-			fio_option_is_set(&td->o, zone_skip)) {
+	if (td->zone_bytes >= td->o.zone_size) {
 		td->zone_bytes = 0;
 		f->file_offset += td->o.zone_range + td->o.zone_skip;
 
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
index 1b8ca0a2..cf8de093 100755
--- a/t/run-fio-tests.py
+++ b/t/run-fio-tests.py
@@ -14,7 +14,7 @@
 #
 #
 # EXAMPLE
-# # git clone [fio-repository]
+# # git clone git://git.kernel.dk/fio.git
 # # cd fio
 # # make -j
 # # python3 t/run-fio-tests.py
@@ -123,6 +123,7 @@ class FioExeTest(FioTest):
         stderr_file = open(self.stderr_file, "w+")
         exticode_file = open(self.exticode_file, "w+")
         try:
+            proc = None
             # Avoid using subprocess.run() here because when a timeout occurs,
             # fio will be stopped with SIGKILL. This does not give fio a
             # chance to clean up and means that child processes may continue
@@ -142,9 +143,10 @@ class FioExeTest(FioTest):
             assert proc.poll()
             self.output['failure'] = 'timeout'
         except Exception:
-            if not proc.poll():
-                proc.terminate()
-                proc.communicate()
+            if proc:
+                if not proc.poll():
+                    proc.terminate()
+                    proc.communicate()
             self.output['failure'] = 'exception'
             self.output['exc_info'] = sys.exc_info()
         finally:
diff --git a/t/strided.py b/t/strided.py
index c159dc0b..aac15d10 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -22,7 +22,7 @@
 #
 # ===TEST MATRIX===
 #
-# --zonemode=strided, zoneskip >= 0
+# --zonemode=strided, zoneskip unset
 #   w/ randommap and LFSR
 #       zonesize=zonerange  all blocks in zonerange touched
 #       zonesize>zonerange  all blocks touched and roll-over back into zone
@@ -57,7 +57,6 @@ def run_fio(fio, test, index):
                 "--log_offset=1",
                 "--randrepeat=0",
                 "--rw=randread",
-                "--zoneskip=0",
                 "--write_iops_log={0}{1:03d}".format(filename, index),
                 "--output={0}{1:03d}.out".format(filename, index),
                 "--zonerange={zonerange}".format(**test),


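As we read the io_u.c change, strided zone mode previously switched zones only when zoneskip had been set explicitly, so a job that left zoneskip unset never moved past its first zone. The fix switches whenever zone_bytes reaches zone_size and advances by zone_range + zone_skip. The simplified sketch below shows only that arithmetic, assuming an unset zoneskip behaves as 0; struct strided_state and strided_account() are hypothetical, and the sketch ignores wrap-around and the exact point at which fio updates zone_bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-file state for zonemode=strided. */
struct strided_state {
	uint64_t file_offset;	/* start of the current zone */
	uint64_t zone_bytes;	/* bytes issued in the current zone */
};

/*
 * Account one I/O of 'bs' bytes. Once zone_size bytes have been issued,
 * jump ahead by zone_range + zone_skip. With zone_skip == 0 the jump is
 * just zone_range, so no "is zoneskip set?" test is needed.
 */
static void strided_account(struct strided_state *s, uint64_t bs,
			    uint64_t zone_size, uint64_t zone_range,
			    uint64_t zone_skip)
{
	assert(zone_size > 0);

	s->zone_bytes += bs;
	if (s->zone_bytes >= zone_size) {
		s->zone_bytes = 0;
		s->file_offset += zone_range + zone_skip;
	}
}
```

With zone_size=8192, zone_range=16384 and zoneskip unset, two 4 KiB I/Os move file_offset from 0 to 16384.
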

* Recent changes (master)
@ 2019-11-07 15:25 Jens Axboe
From: Jens Axboe @ 2019-11-07 15:25 UTC
  To: fio

The following changes since commit d5c4f97458d59689c3d1a13831519617d000fb19:

  arch-arm: Consider armv7ve arch as well (2019-11-05 06:41:55 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cc16261082a4d0f027cef9d019ad023129bf6012:

  Merge branch 'testing' of https://github.com/vincentkfu/fio (2019-11-06 17:04:17 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'testing' of https://github.com/vincentkfu/fio

Vincent Fu (12):
      t/iee754: add return value
      t/readonly: replace shell script with python script
      t/strided.py: change LFSR tests
      t/steadystate_tests: better support automated testing
      t/sgunmap-test.py: drop six.moves dependency
      t/stest: non-zero exit value on failure
      t/jobs: fixup t0002 test jobs
      t/jobs: use current directory for test file for t0003 and t0004
      t/jobs: drop time_based in t0007
      t/jobs: clean up t0009 and use only 4 CPUs
      t/jobs: fix t0011 syntax error
      t/run-fio-tests: a script to automate running fio tests

 t/ieee754.c                                        |  15 +-
 t/jobs/readonly-r.fio                              |   5 -
 t/jobs/readonly-t.fio                              |   5 -
 t/jobs/readonly-w.fio                              |   5 -
 ...t0002-13af05ae-post => t0002-13af05ae-post.fio} |  48 +-
 .../{t0002-13af05ae-pre => t0002-13af05ae-pre.fio} |  46 +-
 t/jobs/t0003-0ae2c6e1-post.fio                     |   2 +-
 t/jobs/t0003-0ae2c6e1-pre.fio                      |   2 +-
 t/jobs/t0004-8a99fdf6.fio                          |   2 +-
 t/jobs/t0007-37cf9e3c.fio                          |   1 -
 t/jobs/t0009-f8b0bd10.fio                          |   6 +-
 t/jobs/t0011-5d2788d5.fio                          |   2 +-
 t/readonly.py                                      | 138 +++++
 t/readonly.sh                                      |  84 ---
 t/run-fio-tests.py                                 | 676 +++++++++++++++++++++
 t/sgunmap-test.py                                  |   1 -
 t/steadystate_tests.py                             |  40 +-
 t/stest.c                                          |  15 +-
 t/strided.py                                       |  36 +-
 19 files changed, 936 insertions(+), 193 deletions(-)
 delete mode 100644 t/jobs/readonly-r.fio
 delete mode 100644 t/jobs/readonly-t.fio
 delete mode 100644 t/jobs/readonly-w.fio
 rename t/jobs/{t0002-13af05ae-post => t0002-13af05ae-post.fio} (85%)
 rename t/jobs/{t0002-13af05ae-pre => t0002-13af05ae-pre.fio} (85%)
 create mode 100755 t/readonly.py
 delete mode 100755 t/readonly.sh
 create mode 100755 t/run-fio-tests.py

---

Diff of recent changes:

diff --git a/t/ieee754.c b/t/ieee754.c
index 3898ab74..b6526394 100644
--- a/t/ieee754.c
+++ b/t/ieee754.c
@@ -1,21 +1,26 @@
 #include <stdio.h>
 #include "../lib/ieee754.h"
 
-static double values[] = { -17.23, 17.23, 123.4567, 98765.4321, 0.0 };
+static double values[] = { -17.23, 17.23, 123.4567, 98765.4321,
+	3.14159265358979323, 0.0 };
 
 int main(int argc, char *argv[])
 {
 	uint64_t i;
-	double f;
-	int j;
+	double f, delta;
+	int j, differences = 0;
 
 	j = 0;
 	do {
 		i = fio_double_to_uint64(values[j]);
 		f = fio_uint64_to_double(i);
-		printf("%f -> %f\n", values[j], f);
+		delta = values[j] - f;
+		printf("%26.20lf -> %26.20lf, delta = %26.20lf\n", values[j],
+			f, delta);
+		if (f != values[j])
+			differences++;
 		j++;
 	} while (values[j] != 0.0);
 
-	return 0;
+	return differences;
 }
diff --git a/t/jobs/readonly-r.fio b/t/jobs/readonly-r.fio
deleted file mode 100644
index 34ba9b5e..00000000
--- a/t/jobs/readonly-r.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randread
-time_based
-runtime=1s
diff --git a/t/jobs/readonly-t.fio b/t/jobs/readonly-t.fio
deleted file mode 100644
index f3e093c1..00000000
--- a/t/jobs/readonly-t.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randtrim
-time_based
-runtime=1s
diff --git a/t/jobs/readonly-w.fio b/t/jobs/readonly-w.fio
deleted file mode 100644
index 26029ef2..00000000
--- a/t/jobs/readonly-w.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randwrite
-time_based
-runtime=1s
diff --git a/t/jobs/t0002-13af05ae-post b/t/jobs/t0002-13af05ae-post.fio
similarity index 85%
rename from t/jobs/t0002-13af05ae-post
rename to t/jobs/t0002-13af05ae-post.fio
index b7d5bab9..d141d406 100644
--- a/t/jobs/t0002-13af05ae-post
+++ b/t/jobs/t0002-13af05ae-post.fio
@@ -1,24 +1,24 @@
-[global]
-ioengine=libaio
-direct=1
-filename=/dev/fioa
-iodepth=128
-size=1G
-loops=1
-group_reporting=1
-readwrite=read
-do_verify=1
-verify=md5
-verify_fatal=1
-numjobs=1
-thread
-bssplit=512/50:1M/50
-
-[thread0]
-offset=0G
-
-[thread-mix0]
-offset=4G
-size=1G
-readwrite=rw
-bsrange=512:1M
+[global]
+ioengine=libaio
+direct=1
+filename=t0002file
+iodepth=128
+size=1G
+loops=1
+group_reporting=1
+readwrite=read
+do_verify=1
+verify=md5
+verify_fatal=1
+numjobs=1
+thread
+bssplit=512/50:1M/50
+
+[thread0]
+offset=0G
+
+[thread-mix0]
+offset=4G
+size=1G
+readwrite=rw
+bsrange=512:1M
diff --git a/t/jobs/t0002-13af05ae-pre b/t/jobs/t0002-13af05ae-pre.fio
similarity index 85%
rename from t/jobs/t0002-13af05ae-pre
rename to t/jobs/t0002-13af05ae-pre.fio
index 77dd48fd..0e044d47 100644
--- a/t/jobs/t0002-13af05ae-pre
+++ b/t/jobs/t0002-13af05ae-pre.fio
@@ -1,23 +1,23 @@
-[global]
-ioengine=libaio
-direct=1
-filename=/dev/fioa
-iodepth=128
-size=1G
-loops=1
-group_reporting=1
-readwrite=write
-do_verify=0
-verify=md5
-numjobs=1
-thread
-bssplit=512/50:1M/50
-
-[thread0]
-offset=0G
-
-[thread-mix0]
-offset=4G
-readwrite=rw
-size=1G
-bsrange=512:1M
+[global]
+ioengine=libaio
+direct=1
+filename=t0002file
+iodepth=128
+size=1G
+loops=1
+group_reporting=1
+readwrite=write
+do_verify=0
+verify=md5
+numjobs=1
+thread
+bssplit=512/50:1M/50
+
+[thread0]
+offset=0G
+
+[thread-mix0]
+offset=4G
+readwrite=rw
+size=1G
+bsrange=512:1M
diff --git a/t/jobs/t0003-0ae2c6e1-post.fio b/t/jobs/t0003-0ae2c6e1-post.fio
index 8bc4f05a..4e7887a3 100644
--- a/t/jobs/t0003-0ae2c6e1-post.fio
+++ b/t/jobs/t0003-0ae2c6e1-post.fio
@@ -3,7 +3,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=1M
 loops=1
diff --git a/t/jobs/t0003-0ae2c6e1-pre.fio b/t/jobs/t0003-0ae2c6e1-pre.fio
index 46f452cb..a9a9f319 100644
--- a/t/jobs/t0003-0ae2c6e1-pre.fio
+++ b/t/jobs/t0003-0ae2c6e1-pre.fio
@@ -1,7 +1,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=10M
 loops=1
diff --git a/t/jobs/t0004-8a99fdf6.fio b/t/jobs/t0004-8a99fdf6.fio
index 09ae9b26..0fc3e0de 100644
--- a/t/jobs/t0004-8a99fdf6.fio
+++ b/t/jobs/t0004-8a99fdf6.fio
@@ -3,7 +3,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=10M
 loops=1
diff --git a/t/jobs/t0007-37cf9e3c.fio b/t/jobs/t0007-37cf9e3c.fio
index fd70c21c..d3c98751 100644
--- a/t/jobs/t0007-37cf9e3c.fio
+++ b/t/jobs/t0007-37cf9e3c.fio
@@ -4,7 +4,6 @@
 size=128mb
 rw=read:512k
 bs=1m
-time_based
 norandommap
 write_iolog=log
 direct=1
diff --git a/t/jobs/t0009-f8b0bd10.fio b/t/jobs/t0009-f8b0bd10.fio
index 90e07ad8..20f376e6 100644
--- a/t/jobs/t0009-f8b0bd10.fio
+++ b/t/jobs/t0009-f8b0bd10.fio
@@ -16,21 +16,21 @@ numjobs=1
 #numjobs=24
 # number_ios=1
 # runtime=216000
-runtime=3600
+#runtime=3600
 time_based=1
 group_reporting=1
 thread
 gtod_reduce=1
 iodepth_batch=4
 iodepth_batch_complete=4
-cpus_allowed=0-5
+cpus_allowed=0-3
 cpus_allowed_policy=split
 rw=randwrite
 verify=crc32c-intel
 verify_backlog=1m
 do_verify=1
 verify_async=6
-verify_async_cpus=0-5
+verify_async_cpus=0-3
 runtime=1m
 
 [4_KiB_RR_drive_r]
diff --git a/t/jobs/t0011-5d2788d5.fio b/t/jobs/t0011-5d2788d5.fio
index 09861f7f..50daf612 100644
--- a/t/jobs/t0011-5d2788d5.fio
+++ b/t/jobs/t0011-5d2788d5.fio
@@ -1,7 +1,7 @@
 # Expected results: no parse warnings, runs and with roughly 1/8 iops between
 #			the two jobs.
 # Buggy result: parse warning on flow value overflow, no 1/8 division between
-			jobs.
+#			jobs.
 #
 [global]
 bs=4k
diff --git a/t/readonly.py b/t/readonly.py
new file mode 100755
index 00000000..43686c9c
--- /dev/null
+++ b/t/readonly.py
@@ -0,0 +1,138 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2019 Western Digital Corporation or its affiliates.
+#
+#
+# readonly.py
+#
+# Do some basic tests of the --readonly parameter
+#
+# USAGE
+# python readonly.py [-f fio-executable]
+#
+# EXAMPLES
+# python t/readonly.py
+# python t/readonly.py -f ./fio
+#
+# REQUIREMENTS
+# Python 3.5+
+#
+#
+
+import sys
+import argparse
+import subprocess
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    args = parser.parse_args()
+
+    return args
+
+
+def run_fio(fio, test, index):
+    fio_args = [
+                "--name=readonly",
+                "--ioengine=null",
+                "--time_based",
+                "--runtime=1s",
+                "--size=1M",
+                "--rw={rw}".format(**test),
+               ]
+    if 'readonly-pre' in test:
+        fio_args.insert(0, "--readonly")
+    if 'readonly-post' in test:
+        fio_args.append("--readonly")
+
+    output = subprocess.run([fio] + fio_args, stdout=subprocess.PIPE,
+                            stderr=subprocess.PIPE)
+
+    return output
+
+
+def check_output(output, test):
+    expect_error = False
+    if 'readonly-pre' in test or 'readonly-post' in test:
+        if 'write' in test['rw'] or 'trim' in test['rw']:
+            expect_error = True
+
+#    print(output.stdout)
+#    print(output.stderr)
+
+    if output.returncode == 0:
+        if expect_error:
+            return False
+        else:
+            return True
+    else:
+        if expect_error:
+            return True
+        else:
+            return False
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    tests = [
+                {
+                    "rw": "randread",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randwrite",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randtrim",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randread",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randwrite",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randtrim",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randread",
+                },
+                {
+                    "rw": "randwrite",
+                },
+                {
+                    "rw": "randtrim",
+                },
+            ]
+
+    index = 1
+    passed = 0
+    failed = 0
+
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = 'fio'
+
+    for test in tests:
+        output = run_fio(fio_path, test, index)
+        status = check_output(output, test)
+        print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
+        if status:
+            passed = passed + 1
+        else:
+            failed = failed + 1
+        index = index + 1
+
+    print("{0} tests passed, {1} failed".format(passed, failed))
+
+    sys.exit(failed)
diff --git a/t/readonly.sh b/t/readonly.sh
deleted file mode 100755
index d7094146..00000000
--- a/t/readonly.sh
+++ /dev/null
@@ -1,84 +0,0 @@
-#!/bin/bash
-#
-# Do some basic test of the --readonly parameter
-#
-# DUT should be a device that accepts read, write, and trim operations
-#
-# Example usage:
-#
-# DUT=/dev/fioa t/readonly.sh
-#
-TESTNUM=1
-
-#
-# The first parameter is the return code
-# The second parameter is 0        if the return code should be 0
-#                         positive if the return code should be positive
-#
-check () {
-	echo "********************"
-
-	if [ $2 -gt 0 ]; then
-		if [ $1 -eq 0 ]; then
-			echo "Test $TESTNUM failed"
-			echo "********************"
-			exit 1
-		else
-			echo "Test $TESTNUM passed"
-		fi
-	else
-		if [ $1 -gt 0 ]; then
-			echo "Test $TESTNUM failed"
-			echo "********************"
-			exit 1
-		else
-			echo "Test $TESTNUM passed"
-		fi
-	fi
-
-	echo "********************"
-	echo
-	TESTNUM=$((TESTNUM+1))
-}
-
-./fio --name=test --filename=$DUT --rw=randread  --readonly --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randwrite --readonly --time_based --runtime=1s &> /dev/null
-check $? 1
-./fio --name=test --filename=$DUT --rw=randtrim  --readonly --time_based --runtime=1s &> /dev/null
-check $? 1
-
-./fio --name=test --filename=$DUT --readonly --rw=randread  --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --readonly --rw=randwrite --time_based --runtime=1s &> /dev/null
-check $? 1
-./fio --name=test --filename=$DUT --readonly --rw=randtrim  --time_based --runtime=1s &> /dev/null
-check $? 1
-
-./fio --name=test --filename=$DUT --rw=randread  --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randwrite --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randtrim  --time_based --runtime=1s &> /dev/null
-check $? 0
-
-./fio t/jobs/readonly-r.fio --readonly &> /dev/null
-check $? 0
-./fio t/jobs/readonly-w.fio --readonly &> /dev/null
-check $? 1
-./fio t/jobs/readonly-t.fio --readonly &> /dev/null
-check $? 1
-
-./fio --readonly t/jobs/readonly-r.fio &> /dev/null
-check $? 0
-./fio --readonly t/jobs/readonly-w.fio &> /dev/null
-check $? 1
-./fio --readonly t/jobs/readonly-t.fio &> /dev/null
-check $? 1
-
-./fio t/jobs/readonly-r.fio &> /dev/null
-check $? 0
-./fio t/jobs/readonly-w.fio &> /dev/null
-check $? 0
-./fio t/jobs/readonly-t.fio &> /dev/null
-check $? 0
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
new file mode 100755
index 00000000..1b8ca0a2
--- /dev/null
+++ b/t/run-fio-tests.py
@@ -0,0 +1,676 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2019 Western Digital Corporation or its affiliates.
+#
+"""
+# run-fio-tests.py
+#
+# Automate running of fio tests
+#
+# USAGE
+# python3 run-fio-tests.py [-r fio-root] [-f fio-path] [-a artifact-root]
+#                           [--skip # # #...] [--run-only # # #...]
+#
+#
+# EXAMPLE
+# # git clone [fio-repository]
+# # cd fio
+# # make -j
+# # python3 t/run-fio-tests.py
+#
+#
+# REQUIREMENTS
+# - Python 3
+# - Linux (libaio ioengine, zbd tests, etc)
+# - The artifact directory must be on a file system that accepts 512-byte IO
+#   (t0002, t0003, t0004).
+# - The artifact directory needs to be on an SSD. Otherwise tests that carry
+#   out file-based IO will trigger a timeout (t0006).
+# - 4 CPUs (t0009)
+# - SciPy (steadystate_tests.py)
+# - libzbc (zbd tests)
+# - root privileges (zbd test)
+# - kernel 4.19 or later for zoned null block devices (zbd tests)
+# - CUnit support (unittests)
+#
+"""
+
+#
+# TODO  run multiple tests simultaneously
+# TODO  Add sgunmap tests (requires SAS SSD)
+# TODO  automatically detect dependencies and skip tests accordingly
+#
+
+import os
+import sys
+import json
+import time
+import logging
+import argparse
+import subprocess
+from pathlib import Path
+
+
+class FioTest(object):
+    """Base for all fio tests."""
+
+    def __init__(self, exe_path, parameters, success):
+        self.exe_path = exe_path
+        self.parameters = parameters
+        self.success = success
+        self.output = {}
+        self.artifact_root = None
+        self.testnum = None
+        self.test_dir = None
+        self.passed = True
+        self.failure_reason = ''
+
+    def setup(self, artifact_root, testnum):
+        self.artifact_root = artifact_root
+        self.testnum = testnum
+        self.test_dir = os.path.join(artifact_root, "{:04d}".format(testnum))
+        if not os.path.exists(self.test_dir):
+            os.mkdir(self.test_dir)
+
+        self.command_file = os.path.join(
+                self.test_dir,
+                "{0}.command".format(os.path.basename(self.exe_path)))
+        self.stdout_file = os.path.join(
+                self.test_dir,
+                "{0}.stdout".format(os.path.basename(self.exe_path)))
+        self.stderr_file = os.path.join(
+                self.test_dir,
+                "{0}.stderr".format(os.path.basename(self.exe_path)))
+        self.exticode_file = os.path.join(
+                self.test_dir,
+                "{0}.exitcode".format(os.path.basename(self.exe_path)))
+
+    def run(self):
+        raise NotImplementedError()
+
+    def check_result(self):
+        raise NotImplementedError()
+
+
+class FioExeTest(FioTest):
+    """Test consists of an executable binary or script"""
+
+    def __init__(self, exe_path, parameters, success):
+        """Construct a FioExeTest which is a FioTest consisting of an
+        executable binary or script.
+
+        exe_path:       location of executable binary or script
+        parameters:     list of parameters for executable
+        success:        Definition of test success
+        """
+
+        FioTest.__init__(self, exe_path, parameters, success)
+
+    def setup(self, artifact_root, testnum):
+        super(FioExeTest, self).setup(artifact_root, testnum)
+
+    def run(self):
+        if self.parameters:
+            command = [self.exe_path] + self.parameters
+        else:
+            command = [self.exe_path]
+        command_file = open(self.command_file, "w+")
+        command_file.write("%s\n" % command)
+        command_file.close()
+
+        stdout_file = open(self.stdout_file, "w+")
+        stderr_file = open(self.stderr_file, "w+")
+        exticode_file = open(self.exticode_file, "w+")
+        try:
+            # Avoid using subprocess.run() here because when a timeout occurs,
+            # fio will be stopped with SIGKILL. This does not give fio a
+            # chance to clean up and means that child processes may continue
+            # running and submitting IO.
+            proc = subprocess.Popen(command,
+                                    stdout=stdout_file,
+                                    stderr=stderr_file,
+                                    cwd=self.test_dir,
+                                    universal_newlines=True)
+            proc.communicate(timeout=self.success['timeout'])
+            exticode_file.write('{0}\n'.format(proc.returncode))
+            logging.debug("return code: %d" % proc.returncode)
+            self.output['proc'] = proc
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            self.output['failure'] = 'timeout'
+        except Exception:
+            if not proc.poll():
+                proc.terminate()
+                proc.communicate()
+            self.output['failure'] = 'exception'
+            self.output['exc_info'] = sys.exc_info()
+        finally:
+            stdout_file.close()
+            stderr_file.close()
+            exticode_file.close()
+
+    def check_result(self):
+        if 'proc' not in self.output:
+            if self.output['failure'] == 'timeout':
+                self.failure_reason = "{0} timeout,".format(self.failure_reason)
+            else:
+                assert self.output['failure'] == 'exception'
+                self.failure_reason = '{0} exception: {1}, {2}'.format(
+                        self.failure_reason, self.output['exc_info'][0],
+                        self.output['exc_info'][1])
+
+            self.passed = False
+            return
+
+        if 'zero_return' in self.success:
+            if self.success['zero_return']:
+                if self.output['proc'].returncode != 0:
+                    self.passed = False
+                    self.failure_reason = "{0} non-zero return code,".format(self.failure_reason)
+            else:
+                if self.output['proc'].returncode == 0:
+                    self.failure_reason = "{0} zero return code,".format(self.failure_reason)
+                    self.passed = False
+
+        if 'stderr_empty' in self.success:
+            stderr_size = os.path.getsize(self.stderr_file)
+            if self.success['stderr_empty']:
+                if stderr_size != 0:
+                    self.failure_reason = "{0} stderr not empty,".format(self.failure_reason)
+                    self.passed = False
+            else:
+                if stderr_size == 0:
+                    self.failure_reason = "{0} stderr empty,".format(self.failure_reason)
+                    self.passed = False
+
+
+class FioJobTest(FioExeTest):
+    """Test consists of a fio job"""
+
+    def __init__(self, fio_path, fio_job, success, fio_pre_job=None,
+                 fio_pre_success=None, output_format="normal"):
+        """Construct a FioJobTest which is a FioExeTest consisting of a
+        single fio job file with an optional setup step.
+
+        fio_path:           location of fio executable
+        fio_job:            location of fio job file
+        success:            Definition of test success
+        fio_pre_job:        fio job for preconditioning
+        fio_pre_success:    Definition of test success for fio precon job
+        output_format:      normal (default), json, jsonplus, or terse
+        """
+
+        self.fio_job = fio_job
+        self.fio_pre_job = fio_pre_job
+        self.fio_pre_success = fio_pre_success if fio_pre_success else success
+        self.output_format = output_format
+        self.precon_failed = False
+        self.json_data = None
+        self.fio_output = "{0}.output".format(os.path.basename(self.fio_job))
+        self.fio_args = [
+            "--output-format={0}".format(self.output_format),
+            "--output={0}".format(self.fio_output),
+            self.fio_job,
+            ]
+        FioExeTest.__init__(self, fio_path, self.fio_args, success)
+
+    def setup(self, artifact_root, testnum):
+        super(FioJobTest, self).setup(artifact_root, testnum)
+
+        self.command_file = os.path.join(
+                self.test_dir,
+                "{0}.command".format(os.path.basename(self.fio_job)))
+        self.stdout_file = os.path.join(
+                self.test_dir,
+                "{0}.stdout".format(os.path.basename(self.fio_job)))
+        self.stderr_file = os.path.join(
+                self.test_dir,
+                "{0}.stderr".format(os.path.basename(self.fio_job)))
+        self.exticode_file = os.path.join(
+                self.test_dir,
+                "{0}.exitcode".format(os.path.basename(self.fio_job)))
+
+    def run_pre_job(self):
+        precon = FioJobTest(self.exe_path, self.fio_pre_job,
+                            self.fio_pre_success,
+                            output_format=self.output_format)
+        precon.setup(self.artifact_root, self.testnum)
+        precon.run()
+        precon.check_result()
+        self.precon_failed = not precon.passed
+        self.failure_reason = precon.failure_reason
+
+    def run(self):
+        if self.fio_pre_job:
+            self.run_pre_job()
+
+        if not self.precon_failed:
+            super(FioJobTest, self).run()
+        else:
+            logging.debug("precondition step failed")
+
+    def check_result(self):
+        if self.precon_failed:
+            self.passed = False
+            self.failure_reason = "{0} precondition step failed,".format(self.failure_reason)
+            return
+
+        super(FioJobTest, self).check_result()
+
+        if 'json' in self.output_format:
+            output_file = open(os.path.join(self.test_dir, self.fio_output), "r")
+            file_data = output_file.read()
+            output_file.close()
+            try:
+                self.json_data = json.loads(file_data)
+            except json.JSONDecodeError:
+                self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
+                self.passed = False
+
+
+class FioJobTest_t0005(FioJobTest):
+    """Test consists of fio test job t0005
+    Confirm that read['io_kbytes'] == write['io_kbytes'] == 102400"""
+
+    def check_result(self):
+        super(FioJobTest_t0005, self).check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 102400:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+        if self.json_data['jobs'][0]['write']['io_kbytes'] != 102400:
+            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0006(FioJobTest):
+    """Test consists of fio test job t0006
+    Confirm that read['io_kbytes'] ~ 2*write['io_kbytes']"""
+
+    def check_result(self):
+        super(FioJobTest_t0006, self).check_result()
+
+        if not self.passed:
+            return
+
+        ratio = self.json_data['jobs'][0]['read']['io_kbytes'] \
+            / self.json_data['jobs'][0]['write']['io_kbytes']
+        logging.debug("ratio: %f" % ratio)
+        if ratio < 1.99 or ratio > 2.01:
+            self.failure_reason = "{0} read/write ratio mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0007(FioJobTest):
+    """Test consists of fio test job t0007
+    Confirm that read['io_kbytes'] = 87040"""
+
+    def check_result(self):
+        super(FioJobTest_t0007, self).check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 87040:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0008(FioJobTest):
+    """Test consists of fio test job t0008
+    Confirm that read['io_kbytes'] = 32768 and that
+                write['io_kbytes'] ~ 16568
+
+    I did runs with fio-ae2fafc8 and saw write['io_kbytes'] values of
+    16585, 16588. With two runs of fio-3.16 I obtained 16568"""
+
+    def check_result(self):
+        super(FioJobTest_t0008, self).check_result()
+
+        if not self.passed:
+            return
+
+        ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16568
+        logging.debug("ratio: %f" % ratio)
+
+        if ratio < 0.99 or ratio > 1.01:
+            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.passed = False
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 32768:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0009(FioJobTest):
+    """Test consists of fio test job t0009
+    Confirm that runtime >= 60s"""
+
+    def check_result(self):
+        super(FioJobTest_t0009, self).check_result()
+
+        if not self.passed:
+            return
+
+        logging.debug('elapsed: %d' % self.json_data['jobs'][0]['elapsed'])
+
+        if self.json_data['jobs'][0]['elapsed'] < 60:
+            self.failure_reason = "{0} elapsed time mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0011(FioJobTest):
+    """Test consists of fio test job t0011
+    Confirm that job0 iops == 1000
+    and that job1_iops / job0_iops ~ 8
+    With two runs of fio-3.16 I observed a ratio of 8.3"""
+
+    def check_result(self):
+        super(FioJobTest_t0011, self).check_result()
+
+        if not self.passed:
+            return
+
+        iops1 = self.json_data['jobs'][0]['read']['iops']
+        iops2 = self.json_data['jobs'][1]['read']['iops']
+        ratio = iops2 / iops1
+        logging.debug("ratio: %f" % ratio)
+
+        if iops1 < 999 or iops1 > 1001:
+            self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
+            self.passed = False
+
+        if ratio < 7 or ratio > 9:
+            self.failure_reason = "{0} iops ratio mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+SUCCESS_DEFAULT = {
+        'zero_return': True,
+        'stderr_empty': True,
+        'timeout': 300,
+        }
+SUCCESS_NONZERO = {
+        'zero_return': False,
+        'stderr_empty': False,
+        'timeout': 300,
+        }
+SUCCESS_STDERR = {
+        'zero_return': True,
+        'stderr_empty': False,
+        'timeout': 300,
+        }
+TEST_LIST = [
+        {
+            'test_id':          1,
+            'test_class':       FioJobTest,
+            'job':              't0001-52c58027.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          2,
+            'test_class':       FioJobTest,
+            'job':              't0002-13af05ae-post.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          't0002-13af05ae-pre.fio',
+            'pre_success':      None,
+        },
+        {
+            'test_id':          3,
+            'test_class':       FioJobTest,
+            'job':              't0003-0ae2c6e1-post.fio',
+            'success':          SUCCESS_NONZERO,
+            'pre_job':          't0003-0ae2c6e1-pre.fio',
+            'pre_success':      SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          4,
+            'test_class':       FioJobTest,
+            'job':              't0004-8a99fdf6.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          5,
+            'test_class':       FioJobTest_t0005,
+            'job':              't0005-f7078f7b.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          6,
+            'test_class':       FioJobTest_t0006,
+            'job':              't0006-82af2a7c.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          7,
+            'test_class':       FioJobTest_t0007,
+            'job':              't0007-37cf9e3c.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          8,
+            'test_class':       FioJobTest_t0008,
+            'job':              't0008-ae2fafc8.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          9,
+            'test_class':       FioJobTest_t0009,
+            'job':              't0009-f8b0bd10.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          10,
+            'test_class':       FioJobTest,
+            'job':              't0010-b7aae4ba.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          11,
+            'test_class':       FioJobTest_t0011,
+            'job':              't0011-5d2788d5.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          1000,
+            'test_class':       FioExeTest,
+            'exe':              't/axmap',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1001,
+            'test_class':       FioExeTest,
+            'exe':              't/ieee754',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1002,
+            'test_class':       FioExeTest,
+            'exe':              't/lfsr-test',
+            'parameters':       ['0xFFFFFF', '0', '0', 'verify'],
+            'success':          SUCCESS_STDERR,
+        },
+        {
+            'test_id':          1003,
+            'test_class':       FioExeTest,
+            'exe':              't/readonly.py',
+            'parameters':       ['-f', '{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1004,
+            'test_class':       FioExeTest,
+            'exe':              't/steadystate_tests.py',
+            'parameters':       ['{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1005,
+            'test_class':       FioExeTest,
+            'exe':              't/stest',
+            'parameters':       None,
+            'success':          SUCCESS_STDERR,
+        },
+        {
+            'test_id':          1006,
+            'test_class':       FioExeTest,
+            'exe':              't/strided.py',
+            'parameters':       ['{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1007,
+            'test_class':       FioExeTest,
+            'exe':              't/zbd/run-tests-against-regular-nullb',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1008,
+            'test_class':       FioExeTest,
+            'exe':              't/zbd/run-tests-against-zoned-nullb',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1009,
+            'test_class':       FioExeTest,
+            'exe':              'unittests/unittest',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+]
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-r', '--fio-root',
+                        help='fio root path')
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root',
+                        help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    logging.basicConfig(level=logging.INFO)
+
+    args = parse_args()
+    if args.fio_root:
+        fio_root = args.fio_root
+    else:
+        fio_root = Path(__file__).absolute().parent.parent
+    logging.debug("fio_root: %s" % fio_root)
+
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = os.path.join(fio_root, "fio")
+    logging.debug("fio_path: %s" % fio_path)
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        "fio-test-{0}".format(time.strftime("%Y%m%d-%H%M%S"))
+    os.mkdir(artifact_root)
+    print("Artifact directory is %s" % artifact_root)
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for config in TEST_LIST:
+        if (args.skip and config['test_id'] in args.skip) or \
+           (args.run_only and config['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            print("Test {0} SKIPPED".format(config['test_id']))
+            continue
+
+        if issubclass(config['test_class'], FioJobTest):
+            if config['pre_job']:
+                fio_pre_job = os.path.join(fio_root, 't', 'jobs',
+                                           config['pre_job'])
+            else:
+                fio_pre_job = None
+            if config['pre_success']:
+                fio_pre_success = config['pre_success']
+            else:
+                fio_pre_success = None
+            if 'output_format' in config:
+                output_format = config['output_format']
+            else:
+                output_format = 'normal'
+            test = config['test_class'](
+                fio_path,
+                os.path.join(fio_root, 't', 'jobs', config['job']),
+                config['success'],
+                fio_pre_job=fio_pre_job,
+                fio_pre_success=fio_pre_success,
+                output_format=output_format)
+        elif issubclass(config['test_class'], FioExeTest):
+            exe_path = os.path.join(fio_root, config['exe'])
+            if config['parameters']:
+                parameters = [p.format(fio_path=fio_path) for p in config['parameters']]
+            else:
+                parameters = None
+            test = config['test_class'](exe_path, parameters,
+                                        config['success'])
+        else:
+            print("Test {0} FAILED: unable to process test config".format(config['test_id']))
+            failed = failed + 1
+            continue
+
+        test.setup(artifact_root, config['test_id'])
+        test.run()
+        test.check_result()
+        if test.passed:
+            result = "PASSED"
+            passed = passed + 1
+        else:
+            result = "FAILED: {0}".format(test.failure_reason)
+            failed = failed + 1
+        print("Test {0} {1}".format(config['test_id'], result))
+
+    print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
+
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
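FioExeTest.run() calls subprocess.Popen() with communicate(timeout=...) instead of subprocess.run(). On a timeout, subprocess.run() kills the child with kill() (SIGKILL on POSIX). Here the child gets terminate() (SIGTERM on POSIX) instead, so fio has a chance to stop its own workers. The sketch below shows the same pattern on its own. The function name run_with_timeout and its ('ok' | 'timeout', returncode) result are illustrative only and are not part of run-fio-tests.py:

```python
import subprocess
import sys


def run_with_timeout(command, timeout):
    """Run command; return ('ok', returncode) or ('timeout', returncode).

    Hypothetical helper mirroring FioExeTest.run(): on timeout the child is
    sent terminate() (SIGTERM on POSIX) rather than kill(), so it can clean
    up any child processes before exiting.
    """
    proc = subprocess.Popen(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        proc.communicate(timeout=timeout)
        return 'ok', proc.returncode
    except subprocess.TimeoutExpired:
        proc.terminate()
        # Wait for the child to exit so it is reaped and returncode is set.
        proc.communicate()
        return 'timeout', proc.returncode


if __name__ == '__main__':
    print(run_with_timeout([sys.executable, '-c', 'pass'], timeout=10))
    print(run_with_timeout([sys.executable, '-c', 'import time; time.sleep(30)'],
                           timeout=0.5))
```

On POSIX, a child stopped this way reports a negative return code (minus the signal number). This is why the original code can `assert proc.poll()` after terminating it.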
diff --git a/t/sgunmap-test.py b/t/sgunmap-test.py
index d2caa5fd..f8f10ab3 100755
--- a/t/sgunmap-test.py
+++ b/t/sgunmap-test.py
@@ -45,7 +45,6 @@ import json
 import argparse
 import traceback
 import subprocess
-from six.moves import range
 
 
 def parse_args():
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index 50254dcc..53b0f35e 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -1,5 +1,5 @@
-#!/usr/bin/python2.7
-# Note: this script is python2 and python 3 compatible.
+#!/usr/bin/env python
+# Note: this script is python2 and python3 compatible.
 #
 # steadystate_tests.py
 #
@@ -24,12 +24,10 @@ from __future__ import print_function
 import os
 import sys
 import json
-import uuid
 import pprint
 import argparse
 import subprocess
 from scipy import stats
-from six.moves import range
 
 def parse_args():
     parser = argparse.ArgumentParser()
@@ -53,7 +51,7 @@ def check(data, iops, slope, pct, limit, dur, criterion):
         m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
         m = abs(m)
         if pct:
-            target = m / mean * 100
+            target = (m / mean * 100) if mean != 0 else 0
             criterion = criterion[:-1]
         else:
             target = m
@@ -68,7 +66,11 @@ def check(data, iops, slope, pct, limit, dur, criterion):
             target = maxdev
 
     criterion = float(criterion)
-    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
+    if criterion == 0.0:
+        objsame = False
+    else:
+        objsame = abs(target - criterion) / criterion < 0.005
+    return (objsame, target < limit, mean, target)
 
 
 if __name__ == '__main__':
@@ -76,6 +78,9 @@ if __name__ == '__main__':
 
     pp = pprint.PrettyPrinter(indent=4)
 
+    passed = 0
+    failed = 0
+
 #
 # test option parsing
 #
@@ -96,8 +101,10 @@ if __name__ == '__main__':
         output = subprocess.check_output([args.fio] + test['args'])
         if test['output'] in output.decode():
             print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
+            passed = passed + 1
         else:
             print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
+            failed = failed + 1
 
 #
 # test some read workloads
@@ -119,7 +126,7 @@ if __name__ == '__main__':
     if args.read == None:
         if os.name == 'posix':
             args.read = '/dev/zero'
-            extra = [ "--size=134217728" ]  # 128 MiB
+            extra = [ "--size=128M" ]
         else:
             print("ERROR: file for read testing must be specified on non-posix systems")
             sys.exit(1)
@@ -129,7 +136,7 @@ if __name__ == '__main__':
     jobnum = 0
     for job in reads:
 
-        tf = uuid.uuid4().hex
+        tf = "steadystate_job{0}.json".format(jobnum)
         parameters = [ "--name=job{0}".format(jobnum) ]
         parameters.extend(extra)
         parameters.extend([ "--thread",
@@ -160,10 +167,10 @@ if __name__ == '__main__':
         output = subprocess.call([args.fio] + parameters)
         with open(tf, 'r') as source:
             jsondata = json.loads(source.read())
-        os.remove(tf)
+            source.close()
 
         for jsonjob in jsondata['jobs']:
-            line = "job {0}".format(jsonjob['job options']['name'])
+            line = "{0}".format(jsonjob['job options']['name'])
             if job['s']:
                 if jsonjob['steadystate']['attained'] == 1:
                     # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
@@ -171,6 +178,7 @@ if __name__ == '__main__':
                     actual = jsonjob['read']['runtime']
                     if mintime > actual:
                         line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        failed = failed + 1
                     else:
                         line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
                         objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
@@ -182,11 +190,14 @@ if __name__ == '__main__':
                             criterion=jsonjob['steadystate']['criterion'])
                         if not objsame:
                             line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                            failed = failed + 1
                         else:
                             if met:
                                 line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
+                                passed = passed + 1
                             else:
                                 line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                                failed = failed + 1
                 else:
                     # check runtime, confirm criterion calculation, and confirm that criterion was not met
                     expected = job['timeout'] * 1000
@@ -205,22 +216,31 @@ if __name__ == '__main__':
                         if not objsame:
                             if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
                                 line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                                failed = failed + 1
                             else:
                                 line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
+                                passed = passed + 1
                         else:
                             if met:
                                 line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                                failed = failed + 1
                             else:
                                 line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
+                                passed = passed + 1
             else:
                 expected = job['timeout'] * 1000
                 actual = jsonjob['read']['runtime']
                 if abs(expected - actual) < 10:
                     result = 'PASSED '
+                    passed = passed + 1
                 else:
                     result = 'FAILED '
+                    failed = failed + 1
                 line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
             print(line)
             if 'steadystate' in jsonjob:
                 pp.pprint(jsonjob['steadystate'])
         jobnum += 1
+
+    print("{0} test(s) PASSED, {1} test(s) FAILED".format(passed,failed))
+    sys.exit(failed)
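The patched check() in steadystate_tests.py now guards against two divisions by zero. The first occurs when the slope is expressed as a percentage of a zero mean. The second occurs when the calculated target is compared with a reported criterion of 0.0. The sketch below isolates both guards and keeps the same 0.5% relative tolerance. The function names are hypothetical and do not appear in the script:

```python
def slope_percent(slope, mean):
    """Absolute slope as a percentage of the mean, or 0 when the mean is 0."""
    return (abs(slope) / mean * 100) if mean != 0 else 0


def criterion_matches(target, criterion, tolerance=0.005):
    """True when target is within `tolerance` (relative) of criterion.

    As in the patch, a zero criterion cannot serve as a divisor, so it is
    treated as a mismatch rather than raising ZeroDivisionError.
    """
    criterion = float(criterion)
    if criterion == 0.0:
        return False
    return abs(target - criterion) / criterion < tolerance


if __name__ == '__main__':
    print(criterion_matches(100.4, 100.0))
    print(criterion_matches(1.0, 0.0))
    print(slope_percent(-2.0, 0))
```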
diff --git a/t/stest.c b/t/stest.c
index 515ae5a5..c6bf2d1e 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -25,7 +25,7 @@ static FLIST_HEAD(list);
 
 static int do_rand_allocs(void)
 {
-	unsigned int size, nr, rounds = 0;
+	unsigned int size, nr, rounds = 0, ret = 0;
 	unsigned long total;
 	struct elem *e;
 	bool error;
@@ -41,6 +41,7 @@ static int do_rand_allocs(void)
 			e = smalloc(size);
 			if (!e) {
 				printf("fail at %lu, size %u\n", total, size);
+				ret++;
 				break;
 			}
 			e->magic1 = MAGIC1;
@@ -65,6 +66,7 @@ static int do_rand_allocs(void)
 				e = smalloc(LARGESMALLOC);
 				if (!e) {
 					error = true;
+					ret++;
 					printf("failure allocating %u bytes at %lu allocated during sfree phase\n",
 						LARGESMALLOC, total);
 				}
@@ -74,18 +76,21 @@ static int do_rand_allocs(void)
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 int main(int argc, char *argv[])
 {
+	int ret;
+
 	arch_init(argv);
 	sinit();
 	debug_init();
 
-	do_rand_allocs();
-	smalloc_debug(0);	/* free and total blocks should match */
+	ret = do_rand_allocs();
+	smalloc_debug(0);	/* TODO: check that free and total blocks
+				** match */
 
 	scleanup();
-	return 0;
+	return ret;
 }
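Several scripts in this series now report their failure count as the process exit status: t/stest.c returns ret from main(), and steadystate_tests.py and run-fio-tests.py call sys.exit(failed). On POSIX systems a parent process sees only the low 8 bits of that status. A run with exactly 256 failures would therefore read as 0 (success). The sketch below shows one way to saturate the count. failure_exit_status is a hypothetical helper and is not part of fio:

```python
import subprocess
import sys


def failure_exit_status(failed):
    """Map a failure count to an exit status a POSIX parent can observe.

    0 stays 0 (success); any positive count is clamped to at most 255 so it
    can never wrap around to 0.
    """
    if failed < 0:
        raise ValueError("failure count cannot be negative")
    return min(failed, 255)


if __name__ == '__main__':
    # Show the wrap-around the helper avoids (on POSIX this prints 0).
    raw = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(256)'])
    print("sys.exit(256) observed as:", raw.returncode)
    print("clamped status:", failure_exit_status(256))
```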
diff --git a/t/strided.py b/t/strided.py
index 47ce5523..c159dc0b 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -202,20 +202,20 @@ if __name__ == '__main__':
                 # lfsr
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "offset": 8*4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "offset": 8*4096*1024,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -227,11 +227,11 @@ if __name__ == '__main__':
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4*4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4*4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -243,11 +243,11 @@ if __name__ == '__main__':
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 8192,
-                    "zonesize": 4096,
+                    "zonerange": 8192*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -313,7 +313,7 @@ if __name__ == '__main__':
                     "zonesize": 8*1024*1024,
                     "bs": 4096,
                     "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
+                    "io_size": 256*1024*1024,
                 },
 
             ]
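In the LFSR cases above, bs stays at 4096 while zonerange, zonesize, offset, size and io_size are all scaled by 1024. The arithmetic below shows what that does to the zones. Before the change, a 4 KiB zone held exactly one block. After it, each zone has room for 1024 blocks, and the number of zones in the region stays the same. Presumably this gives the random generator more than one block to choose from within a zone. The last hunk also raises io_size from 256*1024*204 bytes (51 MiB) to 256 MiB, which matches size. zone_counts is a hypothetical helper, not part of strided.py:

```python
KIB = 1024
MIB = 1024 * KIB
BS = 4096


def zone_counts(zonerange, size, bs=BS):
    """Return (blocks that fit in one zone, number of zones in the region)."""
    return zonerange // bs, size // zonerange


# First LFSR case, before and after the patch.
print("before:", zone_counts(4096, 16 * 4096))              # (1, 16)
print("after:", zone_counts(4096 * KIB, 16 * 4096 * KIB))   # (1024, 16)

# io_size in the final test case, in MiB.
print("old io_size:", 256 * KIB * 204 // MIB)   # 51
print("new io_size:", 256 * KIB * KIB // MIB)   # 256
```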



* Recent changes (master)
@ 2019-11-07 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-07 13:00 UTC
  To: fio

The following changes since commit d5c4f97458d59689c3d1a13831519617d000fb19:

  arch-arm: Consider armv7ve arch as well (2019-11-05 06:41:55 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cc16261082a4d0f027cef9d019ad023129bf6012:

  Merge branch 'testing' of https://github.com/vincentkfu/fio (2019-11-06 17:04:17 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'testing' of https://github.com/vincentkfu/fio

Vincent Fu (12):
      t/iee754: add return value
      t/readonly: replace shell script with python script
      t/strided.py: change LFSR tests
      t/steadystate_tests: better support automated testing
      t/sgunmap-test.py: drop six.moves dependency
      t/stest: non-zero exit value on failure
      t/jobs: fixup t0002 test jobs
      t/jobs: use current directory for test file for t0003 and t0004
      t/jobs: drop time_based in t0007
      t/jobs: clean up t0009 and use only 4 CPUs
      t/jobs: fix t0011 syntax error
      t/run-fio-tests: a script to automate running fio tests

 t/ieee754.c                                        |  15 +-
 t/jobs/readonly-r.fio                              |   5 -
 t/jobs/readonly-t.fio                              |   5 -
 t/jobs/readonly-w.fio                              |   5 -
 ...t0002-13af05ae-post => t0002-13af05ae-post.fio} |  48 +-
 .../{t0002-13af05ae-pre => t0002-13af05ae-pre.fio} |  46 +-
 t/jobs/t0003-0ae2c6e1-post.fio                     |   2 +-
 t/jobs/t0003-0ae2c6e1-pre.fio                      |   2 +-
 t/jobs/t0004-8a99fdf6.fio                          |   2 +-
 t/jobs/t0007-37cf9e3c.fio                          |   1 -
 t/jobs/t0009-f8b0bd10.fio                          |   6 +-
 t/jobs/t0011-5d2788d5.fio                          |   2 +-
 t/readonly.py                                      | 138 +++++
 t/readonly.sh                                      |  84 ---
 t/run-fio-tests.py                                 | 676 +++++++++++++++++++++
 t/sgunmap-test.py                                  |   1 -
 t/steadystate_tests.py                             |  40 +-
 t/stest.c                                          |  15 +-
 t/strided.py                                       |  36 +-
 19 files changed, 936 insertions(+), 193 deletions(-)
 delete mode 100644 t/jobs/readonly-r.fio
 delete mode 100644 t/jobs/readonly-t.fio
 delete mode 100644 t/jobs/readonly-w.fio
 rename t/jobs/{t0002-13af05ae-post => t0002-13af05ae-post.fio} (85%)
 rename t/jobs/{t0002-13af05ae-pre => t0002-13af05ae-pre.fio} (85%)
 create mode 100755 t/readonly.py
 delete mode 100755 t/readonly.sh
 create mode 100755 t/run-fio-tests.py

---

Diff of recent changes:

diff --git a/t/ieee754.c b/t/ieee754.c
index 3898ab74..b6526394 100644
--- a/t/ieee754.c
+++ b/t/ieee754.c
@@ -1,21 +1,26 @@
 #include <stdio.h>
 #include "../lib/ieee754.h"
 
-static double values[] = { -17.23, 17.23, 123.4567, 98765.4321, 0.0 };
+static double values[] = { -17.23, 17.23, 123.4567, 98765.4321,
+	3.14159265358979323, 0.0 };
 
 int main(int argc, char *argv[])
 {
 	uint64_t i;
-	double f;
-	int j;
+	double f, delta;
+	int j, differences = 0;
 
 	j = 0;
 	do {
 		i = fio_double_to_uint64(values[j]);
 		f = fio_uint64_to_double(i);
-		printf("%f -> %f\n", values[j], f);
+		delta = values[j] - f;
+		printf("%26.20lf -> %26.20lf, delta = %26.20lf\n", values[j],
+			f, delta);
+		if (f != values[j])
+			differences++;
 		j++;
 	} while (values[j] != 0.0);
 
-	return 0;
+	return differences;
 }
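With this change t/ieee754 returns the number of sample values that do not survive the double -> uint64 -> double round trip, so any non-zero exit status marks a failure. The Python sketch below shows the same check. It assumes that fio's fio_double_to_uint64()/fio_uint64_to_double() behave like a plain IEEE 754 binary64 bit reinterpretation, and it uses the standard `struct` module as a stand-in. The helper names are hypothetical and are not fio APIs.

```python
import struct

# Same sample values as the updated t/ieee754.c. In the C loop, 0.0 is the
# end-of-list sentinel and is never converted, so it is left out here.
VALUES = [-17.23, 17.23, 123.4567, 98765.4321, 3.14159265358979323]


def double_to_uint64(value):
    """Reinterpret a double's IEEE 754 bits as an unsigned 64-bit integer."""
    return struct.unpack('<Q', struct.pack('<d', value))[0]


def uint64_to_double(bits):
    """Reinterpret an unsigned 64-bit integer as an IEEE 754 double."""
    return struct.unpack('<d', struct.pack('<Q', bits))[0]


def count_differences(values):
    """Round-trip each value and count the ones that come back changed."""
    differences = 0
    for value in values:
        result = uint64_to_double(double_to_uint64(value))
        delta = value - result
        print("%26.20f -> %26.20f, delta = %26.20f" % (value, result, delta))
        if result != value:
            differences += 1
    return differences
```

Passing the count to `sys.exit(count_differences(VALUES))` gives the same convention as the C test's `return differences`: 0 means every value round-tripped exactly.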
diff --git a/t/jobs/readonly-r.fio b/t/jobs/readonly-r.fio
deleted file mode 100644
index 34ba9b5e..00000000
--- a/t/jobs/readonly-r.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randread
-time_based
-runtime=1s
diff --git a/t/jobs/readonly-t.fio b/t/jobs/readonly-t.fio
deleted file mode 100644
index f3e093c1..00000000
--- a/t/jobs/readonly-t.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randtrim
-time_based
-runtime=1s
diff --git a/t/jobs/readonly-w.fio b/t/jobs/readonly-w.fio
deleted file mode 100644
index 26029ef2..00000000
--- a/t/jobs/readonly-w.fio
+++ /dev/null
@@ -1,5 +0,0 @@
-[test]
-filename=${DUT}
-rw=randwrite
-time_based
-runtime=1s
diff --git a/t/jobs/t0002-13af05ae-post b/t/jobs/t0002-13af05ae-post.fio
similarity index 85%
rename from t/jobs/t0002-13af05ae-post
rename to t/jobs/t0002-13af05ae-post.fio
index b7d5bab9..d141d406 100644
--- a/t/jobs/t0002-13af05ae-post
+++ b/t/jobs/t0002-13af05ae-post.fio
@@ -1,24 +1,24 @@
-[global]
-ioengine=libaio
-direct=1
-filename=/dev/fioa
-iodepth=128
-size=1G
-loops=1
-group_reporting=1
-readwrite=read
-do_verify=1
-verify=md5
-verify_fatal=1
-numjobs=1
-thread
-bssplit=512/50:1M/50
-
-[thread0]
-offset=0G
-
-[thread-mix0]
-offset=4G
-size=1G
-readwrite=rw
-bsrange=512:1M
+[global]
+ioengine=libaio
+direct=1
+filename=t0002file
+iodepth=128
+size=1G
+loops=1
+group_reporting=1
+readwrite=read
+do_verify=1
+verify=md5
+verify_fatal=1
+numjobs=1
+thread
+bssplit=512/50:1M/50
+
+[thread0]
+offset=0G
+
+[thread-mix0]
+offset=4G
+size=1G
+readwrite=rw
+bsrange=512:1M
diff --git a/t/jobs/t0002-13af05ae-pre b/t/jobs/t0002-13af05ae-pre.fio
similarity index 85%
rename from t/jobs/t0002-13af05ae-pre
rename to t/jobs/t0002-13af05ae-pre.fio
index 77dd48fd..0e044d47 100644
--- a/t/jobs/t0002-13af05ae-pre
+++ b/t/jobs/t0002-13af05ae-pre.fio
@@ -1,23 +1,23 @@
-[global]
-ioengine=libaio
-direct=1
-filename=/dev/fioa
-iodepth=128
-size=1G
-loops=1
-group_reporting=1
-readwrite=write
-do_verify=0
-verify=md5
-numjobs=1
-thread
-bssplit=512/50:1M/50
-
-[thread0]
-offset=0G
-
-[thread-mix0]
-offset=4G
-readwrite=rw
-size=1G
-bsrange=512:1M
+[global]
+ioengine=libaio
+direct=1
+filename=t0002file
+iodepth=128
+size=1G
+loops=1
+group_reporting=1
+readwrite=write
+do_verify=0
+verify=md5
+numjobs=1
+thread
+bssplit=512/50:1M/50
+
+[thread0]
+offset=0G
+
+[thread-mix0]
+offset=4G
+readwrite=rw
+size=1G
+bsrange=512:1M
diff --git a/t/jobs/t0003-0ae2c6e1-post.fio b/t/jobs/t0003-0ae2c6e1-post.fio
index 8bc4f05a..4e7887a3 100644
--- a/t/jobs/t0003-0ae2c6e1-post.fio
+++ b/t/jobs/t0003-0ae2c6e1-post.fio
@@ -3,7 +3,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=1M
 loops=1
diff --git a/t/jobs/t0003-0ae2c6e1-pre.fio b/t/jobs/t0003-0ae2c6e1-pre.fio
index 46f452cb..a9a9f319 100644
--- a/t/jobs/t0003-0ae2c6e1-pre.fio
+++ b/t/jobs/t0003-0ae2c6e1-pre.fio
@@ -1,7 +1,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=10M
 loops=1
diff --git a/t/jobs/t0004-8a99fdf6.fio b/t/jobs/t0004-8a99fdf6.fio
index 09ae9b26..0fc3e0de 100644
--- a/t/jobs/t0004-8a99fdf6.fio
+++ b/t/jobs/t0004-8a99fdf6.fio
@@ -3,7 +3,7 @@
 [global]
 ioengine=libaio
 direct=1
-filename=/tmp/foo
+filename=foo
 iodepth=128
 size=10M
 loops=1
diff --git a/t/jobs/t0007-37cf9e3c.fio b/t/jobs/t0007-37cf9e3c.fio
index fd70c21c..d3c98751 100644
--- a/t/jobs/t0007-37cf9e3c.fio
+++ b/t/jobs/t0007-37cf9e3c.fio
@@ -4,7 +4,6 @@
 size=128mb
 rw=read:512k
 bs=1m
-time_based
 norandommap
 write_iolog=log
 direct=1
diff --git a/t/jobs/t0009-f8b0bd10.fio b/t/jobs/t0009-f8b0bd10.fio
index 90e07ad8..20f376e6 100644
--- a/t/jobs/t0009-f8b0bd10.fio
+++ b/t/jobs/t0009-f8b0bd10.fio
@@ -16,21 +16,21 @@ numjobs=1
 #numjobs=24
 # number_ios=1
 # runtime=216000
-runtime=3600
+#runtime=3600
 time_based=1
 group_reporting=1
 thread
 gtod_reduce=1
 iodepth_batch=4
 iodepth_batch_complete=4
-cpus_allowed=0-5
+cpus_allowed=0-3
 cpus_allowed_policy=split
 rw=randwrite
 verify=crc32c-intel
 verify_backlog=1m
 do_verify=1
 verify_async=6
-verify_async_cpus=0-5
+verify_async_cpus=0-3
 runtime=1m
 
 [4_KiB_RR_drive_r]
diff --git a/t/jobs/t0011-5d2788d5.fio b/t/jobs/t0011-5d2788d5.fio
index 09861f7f..50daf612 100644
--- a/t/jobs/t0011-5d2788d5.fio
+++ b/t/jobs/t0011-5d2788d5.fio
@@ -1,7 +1,7 @@
 # Expected results: no parse warnings, runs and with roughly 1/8 iops between
 #			the two jobs.
 # Buggy result: parse warning on flow value overflow, no 1/8 division between
-			jobs.
+#			jobs.
 #
 [global]
 bs=4k
diff --git a/t/readonly.py b/t/readonly.py
new file mode 100755
index 00000000..43686c9c
--- /dev/null
+++ b/t/readonly.py
@@ -0,0 +1,138 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2019 Western Digital Corporation or its affiliates.
+#
+#
+# readonly.py
+#
+# Do some basic tests of the --readonly parameter
+#
+# USAGE
+# python readonly.py [-f fio-executable]
+#
+# EXAMPLES
+# python t/readonly.py
+# python t/readonly.py -f ./fio
+#
+# REQUIREMENTS
+# Python 3.5+
+#
+#
+
+import sys
+import argparse
+import subprocess
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    args = parser.parse_args()
+
+    return args
+
+
+def run_fio(fio, test, index):
+    fio_args = [
+                "--name=readonly",
+                "--ioengine=null",
+                "--time_based",
+                "--runtime=1s",
+                "--size=1M",
+                "--rw={rw}".format(**test),
+               ]
+    if 'readonly-pre' in test:
+        fio_args.insert(0, "--readonly")
+    if 'readonly-post' in test:
+        fio_args.append("--readonly")
+
+    output = subprocess.run([fio] + fio_args, stdout=subprocess.PIPE,
+                            stderr=subprocess.PIPE)
+
+    return output
+
+
+def check_output(output, test):
+    expect_error = False
+    if 'readonly-pre' in test or 'readonly-post' in test:
+        if 'write' in test['rw'] or 'trim' in test['rw']:
+            expect_error = True
+
+#    print(output.stdout)
+#    print(output.stderr)
+
+    if output.returncode == 0:
+        if expect_error:
+            return False
+        else:
+            return True
+    else:
+        if expect_error:
+            return True
+        else:
+            return False
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    tests = [
+                {
+                    "rw": "randread",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randwrite",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randtrim",
+                    "readonly-pre": 1,
+                },
+                {
+                    "rw": "randread",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randwrite",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randtrim",
+                    "readonly-post": 1,
+                },
+                {
+                    "rw": "randread",
+                },
+                {
+                    "rw": "randwrite",
+                },
+                {
+                    "rw": "randtrim",
+                },
+            ]
+
+    index = 1
+    passed = 0
+    failed = 0
+
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = 'fio'
+
+    for test in tests:
+        output = run_fio(fio_path, test, index)
+        status = check_output(output, test)
+        print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
+        if status:
+            passed = passed + 1
+        else:
+            failed = failed + 1
+        index = index + 1
+
+    print("{0} tests passed, {1} failed".format(passed, failed))
+
+    sys.exit(failed)
diff --git a/t/readonly.sh b/t/readonly.sh
deleted file mode 100755
index d7094146..00000000
--- a/t/readonly.sh
+++ /dev/null
@@ -1,84 +0,0 @@
-#!/bin/bash
-#
-# Do some basic test of the --readonly parameter
-#
-# DUT should be a device that accepts read, write, and trim operations
-#
-# Example usage:
-#
-# DUT=/dev/fioa t/readonly.sh
-#
-TESTNUM=1
-
-#
-# The first parameter is the return code
-# The second parameter is 0        if the return code should be 0
-#                         positive if the return code should be positive
-#
-check () {
-	echo "********************"
-
-	if [ $2 -gt 0 ]; then
-		if [ $1 -eq 0 ]; then
-			echo "Test $TESTNUM failed"
-			echo "********************"
-			exit 1
-		else
-			echo "Test $TESTNUM passed"
-		fi
-	else
-		if [ $1 -gt 0 ]; then
-			echo "Test $TESTNUM failed"
-			echo "********************"
-			exit 1
-		else
-			echo "Test $TESTNUM passed"
-		fi
-	fi
-
-	echo "********************"
-	echo
-	TESTNUM=$((TESTNUM+1))
-}
-
-./fio --name=test --filename=$DUT --rw=randread  --readonly --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randwrite --readonly --time_based --runtime=1s &> /dev/null
-check $? 1
-./fio --name=test --filename=$DUT --rw=randtrim  --readonly --time_based --runtime=1s &> /dev/null
-check $? 1
-
-./fio --name=test --filename=$DUT --readonly --rw=randread  --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --readonly --rw=randwrite --time_based --runtime=1s &> /dev/null
-check $? 1
-./fio --name=test --filename=$DUT --readonly --rw=randtrim  --time_based --runtime=1s &> /dev/null
-check $? 1
-
-./fio --name=test --filename=$DUT --rw=randread  --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randwrite --time_based --runtime=1s &> /dev/null
-check $? 0
-./fio --name=test --filename=$DUT --rw=randtrim  --time_based --runtime=1s &> /dev/null
-check $? 0
-
-./fio t/jobs/readonly-r.fio --readonly &> /dev/null
-check $? 0
-./fio t/jobs/readonly-w.fio --readonly &> /dev/null
-check $? 1
-./fio t/jobs/readonly-t.fio --readonly &> /dev/null
-check $? 1
-
-./fio --readonly t/jobs/readonly-r.fio &> /dev/null
-check $? 0
-./fio --readonly t/jobs/readonly-w.fio &> /dev/null
-check $? 1
-./fio --readonly t/jobs/readonly-t.fio &> /dev/null
-check $? 1
-
-./fio t/jobs/readonly-r.fio &> /dev/null
-check $? 0
-./fio t/jobs/readonly-w.fio &> /dev/null
-check $? 0
-./fio t/jobs/readonly-t.fio &> /dev/null
-check $? 0
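The t/run-fio-tests.py runner added in the next diff deliberately avoids subprocess.run(), which kills a timed-out child with SIGKILL. Instead it uses Popen.communicate(timeout=...) and, on expiry, calls terminate() so that fio can clean up its own children. It then grades each run against a success dictionary. The sketch below condenses that run-then-check flow. It assumes only the 'zero_return', 'stderr_empty' and 'timeout' keys used by SUCCESS_DEFAULT, and the function names are hypothetical rather than taken from the script.

```python
import subprocess
import tempfile

# Criteria in the same shape as SUCCESS_DEFAULT in t/run-fio-tests.py.
SUCCESS_DEFAULT = {'zero_return': True, 'stderr_empty': True, 'timeout': 300}


def run_with_timeout(command, timeout, cwd=None):
    """Run command, sending SIGTERM (not SIGKILL) if it exceeds timeout.

    Returns (returncode, stderr_text); returncode is None on timeout.
    """
    with tempfile.TemporaryFile(mode='w+') as stderr_file:
        proc = subprocess.Popen(command, stdout=subprocess.DEVNULL,
                                stderr=stderr_file, cwd=cwd,
                                universal_newlines=True)
        try:
            proc.communicate(timeout=timeout)
            returncode = proc.returncode
        except subprocess.TimeoutExpired:
            proc.terminate()    # SIGTERM on POSIX, so the child can clean up
            proc.communicate()  # wait for the child to exit and reap it
            returncode = None
        stderr_file.seek(0)
        return returncode, stderr_file.read()


def check(returncode, stderr_text, success):
    """Return a list of failure reasons; an empty list means the run passed."""
    if returncode is None:
        return ['timeout']
    reasons = []
    if 'zero_return' in success and success['zero_return'] != (returncode == 0):
        reasons.append('zero return code' if returncode == 0
                       else 'non-zero return code')
    if 'stderr_empty' in success and success['stderr_empty'] != (stderr_text == ''):
        reasons.append('stderr empty' if stderr_text == ''
                       else 'stderr not empty')
    return reasons
```

For example, `check(*run_with_timeout(['./fio', 'job.fio'], SUCCESS_DEFAULT['timeout']), SUCCESS_DEFAULT)` returns an empty list for a clean run. Unlike the real runner, this sketch does not save the command, stdout, or exit code as per-test artifacts.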
diff --git a/t/run-fio-tests.py b/t/run-fio-tests.py
new file mode 100755
index 00000000..1b8ca0a2
--- /dev/null
+++ b/t/run-fio-tests.py
@@ -0,0 +1,676 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (c) 2019 Western Digital Corporation or its affiliates.
+#
+"""
+# run-fio-tests.py
+#
+# Automate running of fio tests
+#
+# USAGE
+# python3 run-fio-tests.py [-r fio-root] [-f fio-path] [-a artifact-root]
+#                           [--skip # # #...] [--run-only # # #...]
+#
+#
+# EXAMPLE
+# # git clone [fio-repository]
+# # cd fio
+# # make -j
+# # python3 t/run-fio-tests.py
+#
+#
+# REQUIREMENTS
+# - Python 3
+# - Linux (libaio ioengine, zbd tests, etc.)
+# - The artifact directory must be on a file system that accepts 512-byte IO
+#   (t0002, t0003, t0004).
+# - The artifact directory needs to be on an SSD. Otherwise tests that carry
+#   out file-based IO will trigger a timeout (t0006).
+# - 4 CPUs (t0009)
+# - SciPy (steadystate_tests.py)
+# - libzbc (zbd tests)
+# - root privileges (zbd tests)
+# - kernel 4.19 or later for zoned null block devices (zbd tests)
+# - CUnit support (unittests)
+#
+"""
+
+#
+# TODO  run multiple tests simultaneously
+# TODO  Add sgunmap tests (requires SAS SSD)
+# TODO  automatically detect dependencies and skip tests accordingly
+#
+
+import os
+import sys
+import json
+import time
+import logging
+import argparse
+import subprocess
+from pathlib import Path
+
+
+class FioTest(object):
+    """Base for all fio tests."""
+
+    def __init__(self, exe_path, parameters, success):
+        self.exe_path = exe_path
+        self.parameters = parameters
+        self.success = success
+        self.output = {}
+        self.artifact_root = None
+        self.testnum = None
+        self.test_dir = None
+        self.passed = True
+        self.failure_reason = ''
+
+    def setup(self, artifact_root, testnum):
+        self.artifact_root = artifact_root
+        self.testnum = testnum
+        self.test_dir = os.path.join(artifact_root, "{:04d}".format(testnum))
+        if not os.path.exists(self.test_dir):
+            os.mkdir(self.test_dir)
+
+        self.command_file = os.path.join(
+                self.test_dir,
+                "{0}.command".format(os.path.basename(self.exe_path)))
+        self.stdout_file = os.path.join(
+                self.test_dir,
+                "{0}.stdout".format(os.path.basename(self.exe_path)))
+        self.stderr_file = os.path.join(
+                self.test_dir,
+                "{0}.stderr".format(os.path.basename(self.exe_path)))
+        self.exticode_file = os.path.join(
+                self.test_dir,
+                "{0}.exitcode".format(os.path.basename(self.exe_path)))
+
+    def run(self):
+        raise NotImplementedError()
+
+    def check_result(self):
+        raise NotImplementedError()
+
+
+class FioExeTest(FioTest):
+    """Test consists of an executable binary or script"""
+
+    def __init__(self, exe_path, parameters, success):
+        """Construct a FioExeTest which is a FioTest consisting of an
+        executable binary or script.
+
+        exe_path:       location of executable binary or script
+        parameters:     list of parameters for executable
+        success:        Definition of test success
+        """
+
+        FioTest.__init__(self, exe_path, parameters, success)
+
+    def setup(self, artifact_root, testnum):
+        super(FioExeTest, self).setup(artifact_root, testnum)
+
+    def run(self):
+        if self.parameters:
+            command = [self.exe_path] + self.parameters
+        else:
+            command = [self.exe_path]
+        command_file = open(self.command_file, "w+")
+        command_file.write("%s\n" % command)
+        command_file.close()
+
+        stdout_file = open(self.stdout_file, "w+")
+        stderr_file = open(self.stderr_file, "w+")
+        exticode_file = open(self.exticode_file, "w+")
+        try:
+            # Avoid using subprocess.run() here because when a timeout occurs,
+            # fio will be stopped with SIGKILL. This does not give fio a
+            # chance to clean up and means that child processes may continue
+            # running and submitting IO.
+            proc = subprocess.Popen(command,
+                                    stdout=stdout_file,
+                                    stderr=stderr_file,
+                                    cwd=self.test_dir,
+                                    universal_newlines=True)
+            proc.communicate(timeout=self.success['timeout'])
+            exticode_file.write('{0}\n'.format(proc.returncode))
+            logging.debug("return code: %d" % proc.returncode)
+            self.output['proc'] = proc
+        except subprocess.TimeoutExpired:
+            proc.terminate()
+            proc.communicate()
+            assert proc.poll()
+            self.output['failure'] = 'timeout'
+        except Exception:
+            if 'proc' in locals() and proc.poll() is None:
+                proc.terminate()
+                proc.communicate()
+            self.output['failure'] = 'exception'
+            self.output['exc_info'] = sys.exc_info()
+        finally:
+            stdout_file.close()
+            stderr_file.close()
+            exticode_file.close()
+
+    def check_result(self):
+        if 'proc' not in self.output:
+            if self.output['failure'] == 'timeout':
+                self.failure_reason = "{0} timeout,".format(self.failure_reason)
+            else:
+                assert self.output['failure'] == 'exception'
+                self.failure_reason = '{0} exception: {1}, {2}'.format(
+                        self.failure_reason, self.output['exc_info'][0],
+                        self.output['exc_info'][1])
+
+            self.passed = False
+            return
+
+        if 'zero_return' in self.success:
+            if self.success['zero_return']:
+                if self.output['proc'].returncode != 0:
+                    self.passed = False
+                    self.failure_reason = "{0} non-zero return code,".format(self.failure_reason)
+            else:
+                if self.output['proc'].returncode == 0:
+                    self.failure_reason = "{0} zero return code,".format(self.failure_reason)
+                    self.passed = False
+
+        if 'stderr_empty' in self.success:
+            stderr_size = os.path.getsize(self.stderr_file)
+            if self.success['stderr_empty']:
+                if stderr_size != 0:
+                    self.failure_reason = "{0} stderr not empty,".format(self.failure_reason)
+                    self.passed = False
+            else:
+                if stderr_size == 0:
+                    self.failure_reason = "{0} stderr empty,".format(self.failure_reason)
+                    self.passed = False
+
+
+class FioJobTest(FioExeTest):
+    """Test consists of a fio job"""
+
+    def __init__(self, fio_path, fio_job, success, fio_pre_job=None,
+                 fio_pre_success=None, output_format="normal"):
+        """Construct a FioJobTest which is a FioExeTest consisting of a
+        single fio job file with an optional setup step.
+
+        fio_path:           location of fio executable
+        fio_job:            location of fio job file
+        success:            Definition of test success
+        fio_pre_job:        fio job for preconditioning
+        fio_pre_success:    Definition of test success for fio precon job
+        output_format:      normal (default), json, jsonplus, or terse
+        """
+
+        self.fio_job = fio_job
+        self.fio_pre_job = fio_pre_job
+        self.fio_pre_success = fio_pre_success if fio_pre_success else success
+        self.output_format = output_format
+        self.precon_failed = False
+        self.json_data = None
+        self.fio_output = "{0}.output".format(os.path.basename(self.fio_job))
+        self.fio_args = [
+            "--output-format={0}".format(self.output_format),
+            "--output={0}".format(self.fio_output),
+            self.fio_job,
+            ]
+        FioExeTest.__init__(self, fio_path, self.fio_args, success)
+
+    def setup(self, artifact_root, testnum):
+        super(FioJobTest, self).setup(artifact_root, testnum)
+
+        self.command_file = os.path.join(
+                self.test_dir,
+                "{0}.command".format(os.path.basename(self.fio_job)))
+        self.stdout_file = os.path.join(
+                self.test_dir,
+                "{0}.stdout".format(os.path.basename(self.fio_job)))
+        self.stderr_file = os.path.join(
+                self.test_dir,
+                "{0}.stderr".format(os.path.basename(self.fio_job)))
+        self.exticode_file = os.path.join(
+                self.test_dir,
+                "{0}.exitcode".format(os.path.basename(self.fio_job)))
+
+    def run_pre_job(self):
+        precon = FioJobTest(self.exe_path, self.fio_pre_job,
+                            self.fio_pre_success,
+                            output_format=self.output_format)
+        precon.setup(self.artifact_root, self.testnum)
+        precon.run()
+        precon.check_result()
+        self.precon_failed = not precon.passed
+        self.failure_reason = precon.failure_reason
+
+    def run(self):
+        if self.fio_pre_job:
+            self.run_pre_job()
+
+        if not self.precon_failed:
+            super(FioJobTest, self).run()
+        else:
+            logging.debug("precondition step failed")
+
+    def check_result(self):
+        if self.precon_failed:
+            self.passed = False
+            self.failure_reason = "{0} precondition step failed,".format(self.failure_reason)
+            return
+
+        super(FioJobTest, self).check_result()
+
+        if 'json' in self.output_format:
+            output_file = open(os.path.join(self.test_dir, self.fio_output), "r")
+            file_data = output_file.read()
+            output_file.close()
+            try:
+                self.json_data = json.loads(file_data)
+            except json.JSONDecodeError:
+                self.failure_reason = "{0} unable to decode JSON data,".format(self.failure_reason)
+                self.passed = False
+
+
+class FioJobTest_t0005(FioJobTest):
+    """Test consists of fio test job t0005
+    Confirm that read['io_kbytes'] == write['io_kbytes'] == 102400"""
+
+    def check_result(self):
+        super(FioJobTest_t0005, self).check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 102400:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+        if self.json_data['jobs'][0]['write']['io_kbytes'] != 102400:
+            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0006(FioJobTest):
+    """Test consists of fio test job t0006
+    Confirm that read['io_kbytes'] ~ 2*write['io_kbytes']"""
+
+    def check_result(self):
+        super(FioJobTest_t0006, self).check_result()
+
+        if not self.passed:
+            return
+
+        ratio = self.json_data['jobs'][0]['read']['io_kbytes'] \
+            / self.json_data['jobs'][0]['write']['io_kbytes']
+        logging.debug("ratio: %f" % ratio)
+        if ratio < 1.99 or ratio > 2.01:
+            self.failure_reason = "{0} read/write ratio mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0007(FioJobTest):
+    """Test consists of fio test job t0007
+    Confirm that read['io_kbytes'] == 87040"""
+
+    def check_result(self):
+        super(FioJobTest_t0007, self).check_result()
+
+        if not self.passed:
+            return
+
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 87040:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0008(FioJobTest):
+    """Test consists of fio test job t0008
+    Confirm that read['io_kbytes'] == 32768 and that
+                write['io_kbytes'] ~ 16568
+
+    I did runs with fio-ae2fafc8 and saw write['io_kbytes'] values of
+    16585, 16588. With two runs of fio-3.16 I obtained 16568"""
+
+    def check_result(self):
+        super(FioJobTest_t0008, self).check_result()
+
+        if not self.passed:
+            return
+
+        ratio = self.json_data['jobs'][0]['write']['io_kbytes'] / 16568
+        logging.debug("ratio: %f" % ratio)
+
+        if ratio < 0.99 or ratio > 1.01:
+            self.failure_reason = "{0} bytes written mismatch,".format(self.failure_reason)
+            self.passed = False
+        if self.json_data['jobs'][0]['read']['io_kbytes'] != 32768:
+            self.failure_reason = "{0} bytes read mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0009(FioJobTest):
+    """Test consists of fio test job t0009
+    Confirm that runtime >= 60s"""
+
+    def check_result(self):
+        super(FioJobTest_t0009, self).check_result()
+
+        if not self.passed:
+            return
+
+        logging.debug('elapsed: %d' % self.json_data['jobs'][0]['elapsed'])
+
+        if self.json_data['jobs'][0]['elapsed'] < 60:
+            self.failure_reason = "{0} elapsed time mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+class FioJobTest_t0011(FioJobTest):
+    """Test consists of fio test job t0011
+    Confirm that job0 iops == 1000
+    and that job1_iops / job0_iops ~ 8
+    With two runs of fio-3.16 I observed a ratio of 8.3"""
+
+    def check_result(self):
+        super(FioJobTest_t0011, self).check_result()
+
+        if not self.passed:
+            return
+
+        iops1 = self.json_data['jobs'][0]['read']['iops']
+        iops2 = self.json_data['jobs'][1]['read']['iops']
+        ratio = iops2 / iops1
+        logging.debug("ratio: %f" % ratio)
+
+        if iops1 < 999 or iops1 > 1001:
+            self.failure_reason = "{0} iops value mismatch,".format(self.failure_reason)
+            self.passed = False
+
+        if ratio < 7 or ratio > 9:
+            self.failure_reason = "{0} iops ratio mismatch,".format(self.failure_reason)
+            self.passed = False
+
+
+SUCCESS_DEFAULT = {
+        'zero_return': True,
+        'stderr_empty': True,
+        'timeout': 300,
+        }
+SUCCESS_NONZERO = {
+        'zero_return': False,
+        'stderr_empty': False,
+        'timeout': 300,
+        }
+SUCCESS_STDERR = {
+        'zero_return': True,
+        'stderr_empty': False,
+        'timeout': 300,
+        }
+TEST_LIST = [
+        {
+            'test_id':          1,
+            'test_class':       FioJobTest,
+            'job':              't0001-52c58027.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          2,
+            'test_class':       FioJobTest,
+            'job':              't0002-13af05ae-post.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          't0002-13af05ae-pre.fio',
+            'pre_success':      None,
+        },
+        {
+            'test_id':          3,
+            'test_class':       FioJobTest,
+            'job':              't0003-0ae2c6e1-post.fio',
+            'success':          SUCCESS_NONZERO,
+            'pre_job':          't0003-0ae2c6e1-pre.fio',
+            'pre_success':      SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          4,
+            'test_class':       FioJobTest,
+            'job':              't0004-8a99fdf6.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          5,
+            'test_class':       FioJobTest_t0005,
+            'job':              't0005-f7078f7b.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          6,
+            'test_class':       FioJobTest_t0006,
+            'job':              't0006-82af2a7c.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          7,
+            'test_class':       FioJobTest_t0007,
+            'job':              't0007-37cf9e3c.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          8,
+            'test_class':       FioJobTest_t0008,
+            'job':              't0008-ae2fafc8.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          9,
+            'test_class':       FioJobTest_t0009,
+            'job':              't0009-f8b0bd10.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          10,
+            'test_class':       FioJobTest,
+            'job':              't0010-b7aae4ba.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+        },
+        {
+            'test_id':          11,
+            'test_class':       FioJobTest_t0011,
+            'job':              't0011-5d2788d5.fio',
+            'success':          SUCCESS_DEFAULT,
+            'pre_job':          None,
+            'pre_success':      None,
+            'output_format':    'json',
+        },
+        {
+            'test_id':          1000,
+            'test_class':       FioExeTest,
+            'exe':              't/axmap',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1001,
+            'test_class':       FioExeTest,
+            'exe':              't/ieee754',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1002,
+            'test_class':       FioExeTest,
+            'exe':              't/lfsr-test',
+            'parameters':       ['0xFFFFFF', '0', '0', 'verify'],
+            'success':          SUCCESS_STDERR,
+        },
+        {
+            'test_id':          1003,
+            'test_class':       FioExeTest,
+            'exe':              't/readonly.py',
+            'parameters':       ['-f', '{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1004,
+            'test_class':       FioExeTest,
+            'exe':              't/steadystate_tests.py',
+            'parameters':       ['{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1005,
+            'test_class':       FioExeTest,
+            'exe':              't/stest',
+            'parameters':       None,
+            'success':          SUCCESS_STDERR,
+        },
+        {
+            'test_id':          1006,
+            'test_class':       FioExeTest,
+            'exe':              't/strided.py',
+            'parameters':       ['{fio_path}'],
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1007,
+            'test_class':       FioExeTest,
+            'exe':              't/zbd/run-tests-against-regular-nullb',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1008,
+            'test_class':       FioExeTest,
+            'exe':              't/zbd/run-tests-against-zoned-nullb',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+        {
+            'test_id':          1009,
+            'test_class':       FioExeTest,
+            'exe':              'unittests/unittest',
+            'parameters':       None,
+            'success':          SUCCESS_DEFAULT,
+        },
+]
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-r', '--fio-root',
+                        help='fio root path')
+    parser.add_argument('-f', '--fio',
+                        help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-a', '--artifact-root',
+                        help='artifact root directory')
+    parser.add_argument('-s', '--skip', nargs='+', type=int,
+                        help='list of test(s) to skip')
+    parser.add_argument('-o', '--run-only', nargs='+', type=int,
+                        help='list of test(s) to run, skipping all others')
+    args = parser.parse_args()
+
+    return args
+
+
+def main():
+    logging.basicConfig(level=logging.INFO)
+
+    args = parse_args()
+    if args.fio_root:
+        fio_root = args.fio_root
+    else:
+        fio_root = Path(__file__).absolute().parent.parent
+    logging.debug("fio_root: %s" % fio_root)
+
+    if args.fio:
+        fio_path = args.fio
+    else:
+        fio_path = os.path.join(fio_root, "fio")
+    logging.debug("fio_path: %s" % fio_path)
+
+    artifact_root = args.artifact_root if args.artifact_root else \
+        "fio-test-{0}".format(time.strftime("%Y%m%d-%H%M%S"))
+    os.mkdir(artifact_root)
+    print("Artifact directory is %s" % artifact_root)
+
+    passed = 0
+    failed = 0
+    skipped = 0
+
+    for config in TEST_LIST:
+        if (args.skip and config['test_id'] in args.skip) or \
+           (args.run_only and config['test_id'] not in args.run_only):
+            skipped = skipped + 1
+            print("Test {0} SKIPPED".format(config['test_id']))
+            continue
+
+        if issubclass(config['test_class'], FioJobTest):
+            if config['pre_job']:
+                fio_pre_job = os.path.join(fio_root, 't', 'jobs',
+                                           config['pre_job'])
+            else:
+                fio_pre_job = None
+            if config['pre_success']:
+                fio_pre_success = config['pre_success']
+            else:
+                fio_pre_success = None
+            if 'output_format' in config:
+                output_format = config['output_format']
+            else:
+                output_format = 'normal'
+            test = config['test_class'](
+                fio_path,
+                os.path.join(fio_root, 't', 'jobs', config['job']),
+                config['success'],
+                fio_pre_job=fio_pre_job,
+                fio_pre_success=fio_pre_success,
+                output_format=output_format)
+        elif issubclass(config['test_class'], FioExeTest):
+            exe_path = os.path.join(fio_root, config['exe'])
+            if config['parameters']:
+                parameters = [p.format(fio_path=fio_path) for p in config['parameters']]
+            else:
+                parameters = None
+            test = config['test_class'](exe_path, parameters,
+                                        config['success'])
+        else:
+            print("Test {0} FAILED: unable to process test config".format(config['test_id']))
+            failed = failed + 1
+            continue
+
+        test.setup(artifact_root, config['test_id'])
+        test.run()
+        test.check_result()
+        if test.passed:
+            result = "PASSED"
+            passed = passed + 1
+        else:
+            result = "FAILED: {0}".format(test.failure_reason)
+            failed = failed + 1
+        print("Test {0} {1}".format(config['test_id'], result))
+
+    print("{0} test(s) passed, {1} failed, {2} skipped".format(passed, failed, skipped))
+
+    sys.exit(failed)
+
+
+if __name__ == '__main__':
+    main()
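In `main()` above, test selection combines `--skip` and `--run-only`. A test is skipped if its ID is listed in `--skip`, or if `--run-only` was given and its ID is not listed. Here is a minimal sketch of that predicate as a standalone function. The name `should_skip` is hypothetical and is not part of run-tests.

```python
def should_skip(test_id, skip=None, run_only=None):
    """Mirror the skip/run-only check in run-tests' main().

    `skip` and `run_only` are lists of integer test IDs, or None when the
    option was not given (argparse with nargs='+' and type=int).
    """
    # --skip wins: an ID listed there is never run.
    if skip and test_id in skip:
        return True
    # With --run-only, anything not listed is skipped.
    if run_only and test_id not in run_only:
        return True
    return False


# Example: run only tests 1 and 1000, but also skip 1000.
ids = [1, 2, 1000, 1001]
selected = [t for t in ids if not should_skip(t, skip=[1000], run_only=[1, 1000])]
print(selected)  # [1]
```

Because `--skip` is checked first, an ID passed to both options is skipped.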
diff --git a/t/sgunmap-test.py b/t/sgunmap-test.py
index d2caa5fd..f8f10ab3 100755
--- a/t/sgunmap-test.py
+++ b/t/sgunmap-test.py
@@ -45,7 +45,6 @@ import json
 import argparse
 import traceback
 import subprocess
-from six.moves import range
 
 
 def parse_args():
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
index 50254dcc..53b0f35e 100755
--- a/t/steadystate_tests.py
+++ b/t/steadystate_tests.py
@@ -1,5 +1,5 @@
-#!/usr/bin/python2.7
-# Note: this script is python2 and python 3 compatible.
+#!/usr/bin/env python
+# Note: this script is python2 and python3 compatible.
 #
 # steadystate_tests.py
 #
@@ -24,12 +24,10 @@ from __future__ import print_function
 import os
 import sys
 import json
-import uuid
 import pprint
 import argparse
 import subprocess
 from scipy import stats
-from six.moves import range
 
 def parse_args():
     parser = argparse.ArgumentParser()
@@ -53,7 +51,7 @@ def check(data, iops, slope, pct, limit, dur, criterion):
         m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
         m = abs(m)
         if pct:
-            target = m / mean * 100
+            target = (m / mean * 100) if mean != 0 else 0
             criterion = criterion[:-1]
         else:
             target = m
@@ -68,7 +66,11 @@ def check(data, iops, slope, pct, limit, dur, criterion):
             target = maxdev
 
     criterion = float(criterion)
-    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
+    if criterion == 0.0:
+        objsame = False
+    else:
+        objsame = abs(target - criterion) / criterion < 0.005
+    return (objsame, target < limit, mean, target)
 
 
 if __name__ == '__main__':
@@ -76,6 +78,9 @@ if __name__ == '__main__':
 
     pp = pprint.PrettyPrinter(indent=4)
 
+    passed = 0
+    failed = 0
+
 #
 # test option parsing
 #
@@ -96,8 +101,10 @@ if __name__ == '__main__':
         output = subprocess.check_output([args.fio] + test['args'])
         if test['output'] in output.decode():
             print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
+            passed = passed + 1
         else:
             print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
+            failed = failed + 1
 
 #
 # test some read workloads
@@ -119,7 +126,7 @@ if __name__ == '__main__':
     if args.read == None:
         if os.name == 'posix':
             args.read = '/dev/zero'
-            extra = [ "--size=134217728" ]  # 128 MiB
+            extra = [ "--size=128M" ]
         else:
             print("ERROR: file for read testing must be specified on non-posix systems")
             sys.exit(1)
@@ -129,7 +136,7 @@ if __name__ == '__main__':
     jobnum = 0
     for job in reads:
 
-        tf = uuid.uuid4().hex
+        tf = "steadystate_job{0}.json".format(jobnum)
         parameters = [ "--name=job{0}".format(jobnum) ]
         parameters.extend(extra)
         parameters.extend([ "--thread",
@@ -160,10 +167,10 @@ if __name__ == '__main__':
         output = subprocess.call([args.fio] + parameters)
         with open(tf, 'r') as source:
             jsondata = json.loads(source.read())
-        os.remove(tf)
+            source.close()
 
         for jsonjob in jsondata['jobs']:
-            line = "job {0}".format(jsonjob['job options']['name'])
+            line = "{0}".format(jsonjob['job options']['name'])
             if job['s']:
                 if jsonjob['steadystate']['attained'] == 1:
                     # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
@@ -171,6 +178,7 @@ if __name__ == '__main__':
                     actual = jsonjob['read']['runtime']
                     if mintime > actual:
                         line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        failed = failed + 1
                     else:
                         line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
                         objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
@@ -182,11 +190,14 @@ if __name__ == '__main__':
                             criterion=jsonjob['steadystate']['criterion'])
                         if not objsame:
                             line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                            failed = failed + 1
                         else:
                             if met:
                                 line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
+                                passed = passed + 1
                             else:
                                 line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                                failed = failed + 1
                 else:
                     # check runtime, confirm criterion calculation, and confirm that criterion was not met
                     expected = job['timeout'] * 1000
@@ -205,22 +216,31 @@ if __name__ == '__main__':
                         if not objsame:
                             if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
                                 line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                                failed = failed + 1
                             else:
                                 line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
+                                passed = passed + 1
                         else:
                             if met:
                                 line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                                failed = failed + 1
                             else:
                                 line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
+                                passed = passed + 1
             else:
                 expected = job['timeout'] * 1000
                 actual = jsonjob['read']['runtime']
                 if abs(expected - actual) < 10:
                     result = 'PASSED '
+                    passed = passed + 1
                 else:
                     result = 'FAILED '
+                    failed = failed + 1
                 line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
             print(line)
             if 'steadystate' in jsonjob:
                 pp.pprint(jsonjob['steadystate'])
         jobnum += 1
+
+    print("{0} test(s) PASSED, {1} test(s) FAILED".format(passed,failed))
+    sys.exit(failed)
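The patched `check()` avoids two divisions by zero. The first is the slope percentage when the mean is 0. The second is the relative difference when fio reports a criterion of 0.0. The comparison can be sketched on its own as below. The name `criteria_match` is hypothetical; the 0.5% tolerance is the one used in the patch.

```python
def criteria_match(target, criterion, tol=0.005):
    """Return True if `target` is within `tol` (relative) of fio's criterion.

    A criterion of 0.0 cannot be used as a divisor, so, as in the patched
    check(), it is treated as "not the same".
    """
    criterion = float(criterion)
    if criterion == 0.0:
        return False
    return abs(target - criterion) / criterion < tol


print(criteria_match(10.02, "10.0"))  # True  (0.2% apart)
print(criteria_match(10.1, "10.0"))   # False (1% apart)
print(criteria_match(0.0, 0.0))       # False (no division by zero)
```

This has a consequence in the main loop. A job whose criterion is still 0.0, because ss_dur + ss_ramp has not elapsed, always takes the `not objsame` branch, and there it is judged by its runtime instead.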
diff --git a/t/stest.c b/t/stest.c
index 515ae5a5..c6bf2d1e 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -25,7 +25,7 @@ static FLIST_HEAD(list);
 
 static int do_rand_allocs(void)
 {
-	unsigned int size, nr, rounds = 0;
+	unsigned int size, nr, rounds = 0, ret = 0;
 	unsigned long total;
 	struct elem *e;
 	bool error;
@@ -41,6 +41,7 @@ static int do_rand_allocs(void)
 			e = smalloc(size);
 			if (!e) {
 				printf("fail at %lu, size %u\n", total, size);
+				ret++;
 				break;
 			}
 			e->magic1 = MAGIC1;
@@ -65,6 +66,7 @@ static int do_rand_allocs(void)
 				e = smalloc(LARGESMALLOC);
 				if (!e) {
 					error = true;
+					ret++;
 					printf("failure allocating %u bytes at %lu allocated during sfree phase\n",
 						LARGESMALLOC, total);
 				}
@@ -74,18 +76,21 @@ static int do_rand_allocs(void)
 		}
 	}
 
-	return 0;
+	return ret;
 }
 
 int main(int argc, char *argv[])
 {
+	int ret;
+
 	arch_init(argv);
 	sinit();
 	debug_init();
 
-	do_rand_allocs();
-	smalloc_debug(0);	/* free and total blocks should match */
+	ret = do_rand_allocs();
+	smalloc_debug(0);	/* TODO: check that free and total blocks
+				** match */
 
 	scleanup();
-	return 0;
+	return ret;
 }
diff --git a/t/strided.py b/t/strided.py
index 47ce5523..c159dc0b 100755
--- a/t/strided.py
+++ b/t/strided.py
@@ -202,20 +202,20 @@ if __name__ == '__main__':
                 # lfsr
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "offset": 8*4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "offset": 8*4096*1024,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -227,11 +227,11 @@ if __name__ == '__main__':
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 4096,
-                    "zonesize": 4*4096,
+                    "zonerange": 4096*1024,
+                    "zonesize": 4*4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -243,11 +243,11 @@ if __name__ == '__main__':
                 },
                 {
                     "random_generator": "lfsr",
-                    "zonerange": 8192,
-                    "zonesize": 4096,
+                    "zonerange": 8192*1024,
+                    "zonesize": 4096*1024,
                     "bs": 4096,
-                    "size": 16*4096,
-                    "io_size": 16*4096,
+                    "size": 16*4096*1024,
+                    "io_size": 16*4096*1024,
                 },
                 {
                     "random_generator": "lfsr",
@@ -313,7 +313,7 @@ if __name__ == '__main__':
                     "zonesize": 8*1024*1024,
                     "bs": 4096,
                     "size": 256*1024*1024,
-                    "io_size": 256*1024*204,
+                    "io_size": 256*1024*1024,
                 },
 
             ]
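The last strided.py hunk fixes what appears to be a typo: `256*1024*204` is 51 MiB, not the 256 MiB implied by `size`. In the lfsr cases, every size is scaled by 1024 except `bs`. The ratios between `size`, `zonesize` and `zonerange` are therefore unchanged, but each 4 MiB zone now spans 1024 blocks of 4096 bytes instead of a single block. A quick arithmetic check:

```python
MiB = 1024 * 1024

old_io_size = 256 * 1024 * 204
new_io_size = 256 * 1024 * 1024
print(old_io_size, old_io_size / MiB)  # 53477376 51.0
print(new_io_size == 256 * MiB)        # True

# One lfsr case before and after scaling by 1024 (values from the hunk above).
old = {"zonerange": 4096, "zonesize": 4 * 4096, "size": 16 * 4096}
new = {k: v * 1024 for k, v in old.items()}
for a, b in [("size", "zonesize"), ("zonesize", "zonerange")]:
    # Scaling every size by the same factor preserves their ratios.
    assert old[a] / old[b] == new[a] / new[b]

bs = 4096
print(new["zonerange"] // bs)  # 1024 blocks per zonerange (was 1)
```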



* Recent changes (master)
@ 2019-11-06 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-06 13:00 UTC
  To: fio

The following changes since commit 35e3ade0fe63f5149e07024ee31a02ac14d2dde7:

  Merge branch 'master' of https://github.com/aphreet/fio (2019-11-03 06:35:44 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d5c4f97458d59689c3d1a13831519617d000fb19:

  arch-arm: Consider armv7ve arch as well (2019-11-05 06:41:55 -0700)

----------------------------------------------------------------
Khem Raj (1):
      arch-arm: Consider armv7ve arch as well

 arch/arch-arm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index 78cb2ebe..b3567122 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -11,7 +11,7 @@
 #define nop             __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t")
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
 #define write_barrier()	__asm__ __volatile__ ("" : : : "memory")
-#elif defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_8A__)
+#elif defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7VE__) || defined(__ARM_ARCH_8A__)
 #define	nop		__asm__ __volatile__ ("nop")
 #define read_barrier()	__sync_synchronize()
 #define write_barrier()	__sync_synchronize()



* Recent changes (master)
@ 2019-11-04 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-04 13:00 UTC
  To: fio

The following changes since commit f89cfed7d28f76e99ee7520d6d350b9847fab07c:

  engines/libaio.c: remove unused 'hipri' setting (2019-11-02 14:48:24 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 35e3ade0fe63f5149e07024ee31a02ac14d2dde7:

  Merge branch 'master' of https://github.com/aphreet/fio (2019-11-03 06:35:44 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/aphreet/fio

Mikhail Malygin (1):
      Enable io_uring engine on powerpc arch

 arch/arch-ppc.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

---

Diff of recent changes:

diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 804d596a..46246bae 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -24,6 +24,18 @@
 #define PPC_CNTLZL "cntlzw"
 #endif
 
+#define ARCH_HAVE_IOURING
+
+#ifndef __NR_sys_io_uring_setup
+#define __NR_sys_io_uring_setup		425
+#endif
+#ifndef __NR_sys_io_uring_enter
+#define __NR_sys_io_uring_enter		426
+#endif
+#ifndef __NR_sys_io_uring_register
+#define __NR_sys_io_uring_register	427
+#endif
+
 static inline int __ilog2(unsigned long bitmask)
 {
 	int lz;



* Recent changes (master)
@ 2019-11-03 13:00 Jens Axboe
From: Jens Axboe @ 2019-11-03 13:00 UTC
  To: fio

The following changes since commit 8c302eb9706963e07d6d79998e15bede77b94520:

  Merge branch 'patch-1' of https://github.com/hannesweisbach/fio (2019-10-29 13:39:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f89cfed7d28f76e99ee7520d6d350b9847fab07c:

  engines/libaio.c: remove unused 'hipri' setting (2019-11-02 14:48:24 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      engines/libaio.c: remove unused 'hipri' setting

 engines/libaio.c | 1 -
 1 file changed, 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index cd5b89f9..b047b746 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -44,7 +44,6 @@ struct libaio_data {
 struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
-	unsigned int hipri;
 };
 
 static struct fio_option options[] = {



* Recent changes (master)
@ 2019-10-30 12:00 Jens Axboe
From: Jens Axboe @ 2019-10-30 12:00 UTC
  To: fio

The following changes since commit c656b9056e45f9acd1d374a85c5becf7cc6a00ca:

  Merge branch 'doc_fixes' of https://github.com/sitsofe/fio (2019-10-24 12:04:09 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8c302eb9706963e07d6d79998e15bede77b94520:

  Merge branch 'patch-1' of https://github.com/hannesweisbach/fio (2019-10-29 13:39:14 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch '1029_tet' of https://github.com/optimistyzy/fio
      Merge branch 'patch-1' of https://github.com/hannesweisbach/fio

Ziye Yang (1):
      backend: fix the memory leak if fio_memalign fails,

hannesweisbach (1):
      Fix output redirection of exec_prerun/_postrun

 backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index fe868271..1c339408 100644
--- a/backend.c
+++ b/backend.c
@@ -1236,7 +1236,7 @@ static int init_io_u(struct thread_data *td)
 		ptr = fio_memalign(cl_align, sizeof(*io_u), td_offload_overlap(td));
 		if (!ptr) {
 			log_err("fio: unable to allocate aligned memory\n");
-			break;
+			return 1;
 		}
 
 		io_u = ptr;
@@ -1469,12 +1469,12 @@ static bool keep_running(struct thread_data *td)
 
 static int exec_string(struct thread_options *o, const char *string, const char *mode)
 {
-	size_t newlen = strlen(string) + strlen(o->name) + strlen(mode) + 9 + 1;
+	size_t newlen = strlen(string) + strlen(o->name) + strlen(mode) + 13 + 1;
 	int ret;
 	char *str;
 
 	str = malloc(newlen);
-	sprintf(str, "%s &> %s.%s.txt", string, o->name, mode);
+	sprintf(str, "%s > %s.%s.txt 2>&1", string, o->name, mode);
 
 	log_info("%s : Saving output of %s in %s.%s.txt\n",o->name, mode, o->name, mode);
 	ret = system(str);
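The buffer length changes because the fixed text around the three `%s` arguments grows. `&>` is a bash extension. `system()` runs the command through `/bin/sh`, and a strictly POSIX shell such as dash parses `cmd &> file` as `cmd &` followed by a separate `> file` redirection, so the output is not captured. The portable form is `> file 2>&1`. Counting the literal characters of each format string confirms the constants in the patch (9 and 13, plus 1 for the terminating NUL). The argument values below are only illustrative.

```python
def literal_len(fmt):
    """Length of a printf-style format with every "%s" removed."""
    return len(fmt.replace("%s", ""))


old_fmt = "%s &> %s.%s.txt"
new_fmt = "%s > %s.%s.txt 2>&1"
print(literal_len(old_fmt), literal_len(new_fmt))  # 9 13

# Illustrative arguments: the command string, the job name and the mode.
string, name, mode = "ls", "job1", "prerun"
cmd = new_fmt % (string, name, mode)
newlen = len(string) + len(name) + len(mode) + 13 + 1  # +1 for the C NUL
assert len(cmd) + 1 == newlen
print(cmd)  # ls > job1.prerun.txt 2>&1
```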



* Recent changes (master)
@ 2019-10-25 12:00 Jens Axboe
From: Jens Axboe @ 2019-10-25 12:00 UTC
  To: fio

The following changes since commit f6830780c2022a607b252191d4a5a1c6804e7513:

  Merge branch 'android-log-fix' of https://github.com/kdrag0n/fio (2019-10-21 08:46:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c656b9056e45f9acd1d374a85c5becf7cc6a00ca:

  Merge branch 'doc_fixes' of https://github.com/sitsofe/fio (2019-10-24 12:04:09 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'doc_fixes' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      man: don't use non-breaking minuses when they're not necessary
      doc: delete repeated word "will"

 HOWTO |   2 +-
 fio.1 | 138 +++++++++++++++++++++++++++++++++---------------------------------
 2 files changed, 70 insertions(+), 70 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 96a047de..64c6a033 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2713,7 +2713,7 @@ Threads, processes and job synchronization
 			Each job will get a unique CPU from the CPU set.
 
 	**shared** is the default behavior, if the option isn't specified. If
-	**split** is specified, then fio will will assign one cpu per job. If not
+	**split** is specified, then fio will assign one cpu per job. If not
 	enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
 	in the set.
 
diff --git a/fio.1 b/fio.1
index 6685e507..087d3778 100644
--- a/fio.1
+++ b/fio.1
@@ -255,7 +255,7 @@ default unit is bytes. For quantities of time, the default unit is seconds
 unless otherwise specified.
 .P
 With `kb_base=1000', fio follows international standards for unit
-prefixes. To specify power\-of\-10 decimal values defined in the
+prefixes. To specify power-of-10 decimal values defined in the
 International System of Units (SI):
 .RS
 .P
@@ -272,7 +272,7 @@ P means peta (P) or 1000**5
 .PD
 .RE
 .P
-To specify power\-of\-2 binary values defined in IEC 80000\-13:
+To specify power-of-2 binary values defined in IEC 80000-13:
 .RS
 .P
 .PD 0
@@ -289,7 +289,7 @@ Pi means pebi (Pi) or 1024**5
 .RE
 .P
 With `kb_base=1024' (the default), the unit prefixes are opposite
-from those specified in the SI and IEC 80000\-13 standards to provide
+from those specified in the SI and IEC 80000-13 standards to provide
 compatibility with old scripts. For example, 4k means 4096.
 .P
 For quantities of data, an optional unit of 'B' may be included
@@ -376,14 +376,14 @@ Select the interpretation of unit prefixes in input parameters.
 .RS
 .TP
 .B 1000
-Inputs comply with IEC 80000\-13 and the International
+Inputs comply with IEC 80000-13 and the International
 System of Units (SI). Use:
 .RS
 .P
 .PD 0
-\- power\-of\-2 values with IEC prefixes (e.g., KiB)
+\- power-of-2 values with IEC prefixes (e.g., KiB)
 .P
-\- power\-of\-10 values with SI prefixes (e.g., kB)
+\- power-of-10 values with SI prefixes (e.g., kB)
 .PD
 .RE
 .TP
@@ -392,9 +392,9 @@ Compatibility mode (default). To avoid breaking old scripts:
 .P
 .RS
 .PD 0
-\- power\-of\-2 values with SI prefixes
+\- power-of-2 values with SI prefixes
 .P
-\- power\-of\-10 values with IEC prefixes
+\- power-of-10 values with IEC prefixes
 .PD
 .RE
 .RE
@@ -402,7 +402,7 @@ Compatibility mode (default). To avoid breaking old scripts:
 See \fBbs\fR for more details on input parameters.
 .P
 Outputs always use correct prefixes. Most outputs include both
-side\-by\-side, like:
+side-by-side, like:
 .P
 .RS
 bw=2383.3kB/s (2327.4KiB/s)
@@ -425,7 +425,7 @@ Base unit for reporting. Allowed values are:
 .RS
 .TP
 .B 0
-Use auto\-detection (default).
+Use auto-detection (default).
 .TP
 .B 8
 Byte based.
@@ -565,7 +565,7 @@ would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
 On Windows, disk devices are accessed as `\\\\.\\PhysicalDrive0' for
 the first device, `\\\\.\\PhysicalDrive1' for the second etc.
 Note: Windows and FreeBSD prevent write access to areas
-of the disk containing in\-use data (e.g. filesystems).
+of the disk containing in-use data (e.g. filesystems).
 .P
 The filename `\-' is a reserved name, meaning *stdin* or *stdout*. Which
 of the two depends on the read/write direction set.
@@ -676,7 +676,7 @@ Alias for normal.
 For \fBrandom\fR, \fBroundrobin\fR, and \fBsequential\fR, a postfix can be appended to
 tell fio how many I/Os to issue before switching to a new file. For example,
 specifying `file_service_type=random:8' would cause fio to issue
-8 I/Os before selecting a new file at random. For the non\-uniform
+8 I/Os before selecting a new file at random. For the non-uniform
 distributions, a floating point postfix can be given to influence how the
 distribution is skewed. See \fBrandom_distribution\fR for a description
 of how that would work.
@@ -695,8 +695,8 @@ used and even the number of processors in the system. Default: true.
 \fBfsync\fR\|(2) the data file after creation. This is the default.
 .TP
 .BI create_on_open \fR=\fPbool
-If true, don't pre\-create files but allow the job's open() to create a file
-when it's time to do I/O. Default: false \-\- pre\-create all necessary files
+If true, don't pre-create files but allow the job's open() to create a file
+when it's time to do I/O. Default: false \-\- pre-create all necessary files
 when the job starts.
 .TP
 .BI create_only \fR=\fPbool
@@ -717,11 +717,11 @@ destroy data on the mounted file system. Note that some platforms don't allow
 writing against a mounted device regardless of this option. Default: false.
 .TP
 .BI pre_read \fR=\fPbool
-If this is given, files will be pre\-read into memory before starting the
+If this is given, files will be pre-read into memory before starting the
 given I/O operation. This will also clear the \fBinvalidate\fR flag,
-since it is pointless to pre\-read and then drop the cache. This will only
-work for I/O engines that are seek\-able, since they allow you to read the
-same data multiple times. Thus it will not work on non\-seekable I/O engines
+since it is pointless to pre-read and then drop the cache. This will only
+work for I/O engines that are seek-able, since they allow you to read the
+same data multiple times. Thus it will not work on non-seekable I/O engines
 (e.g. network, splice). Default: false.
 .TP
 .BI unlink \fR=\fPbool
@@ -820,7 +820,7 @@ previous parameter can be used to simulate garbage collection activity.
 .SS "I/O type"
 .TP
 .BI direct \fR=\fPbool
-If value is true, use non\-buffered I/O. This is usually O_DIRECT. Note that
+If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that
 OpenBSD and ZFS on Solaris don't support direct I/O. On Windows the synchronous
 ioengines don't support direct I/O. Default: false.
 .TP
@@ -924,36 +924,36 @@ control what sequence of output is being generated. If not set, the random
 sequence depends on the \fBrandrepeat\fR setting.
 .TP
 .BI fallocate \fR=\fPstr
-Whether pre\-allocation is performed when laying down files.
+Whether pre-allocation is performed when laying down files.
 Accepted values are:
 .RS
 .RS
 .TP
 .B none
-Do not pre\-allocate space.
+Do not pre-allocate space.
 .TP
 .B native
-Use a platform's native pre\-allocation call but fall back to
+Use a platform's native pre-allocation call but fall back to
 \fBnone\fR behavior if it fails/is not implemented.
 .TP
 .B posix
-Pre\-allocate via \fBposix_fallocate\fR\|(3).
+Pre-allocate via \fBposix_fallocate\fR\|(3).
 .TP
 .B keep
-Pre\-allocate via \fBfallocate\fR\|(2) with
+Pre-allocate via \fBfallocate\fR\|(2) with
 FALLOC_FL_KEEP_SIZE set.
 .TP
 .B 0
-Backward\-compatible alias for \fBnone\fR.
+Backward-compatible alias for \fBnone\fR.
 .TP
 .B 1
-Backward\-compatible alias for \fBposix\fR.
+Backward-compatible alias for \fBposix\fR.
 .RE
 .P
 May not be available on all supported platforms. \fBkeep\fR is only available
 on Linux. If using ZFS on Solaris this cannot be set to \fBposix\fR
-because ZFS doesn't support pre\-allocation. Default: \fBnative\fR if any
-pre\-allocation methods are available, \fBnone\fR if not.
+because ZFS doesn't support pre-allocation. Default: \fBnative\fR if any
+pre-allocation methods are available, \fBnone\fR if not.
 .RE
 .TP
 .BI fadvise_hint \fR=\fPstr
@@ -1023,7 +1023,7 @@ offset is aligned to the minimum block size.
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes `\fBoffset\fR + \fBoffset_increment\fR
 * thread_number', where the thread number is a counter that starts at 0 and
-is incremented for each sub\-job (i.e. when \fBnumjobs\fR option is
+is incremented for each sub-job (i.e. when \fBnumjobs\fR option is
 specified). This option is useful if there are several jobs which are
 intended to operate on a file in parallel disjoint segments, with even
 spacing between the starting points. Percentages can be used for this option.
@@ -1037,13 +1037,13 @@ condition). With this setting, the range/size can be set independently of
 the number of I/Os to perform. When fio reaches this number, it will exit
 normally and report status. Note that this does not extend the amount of I/O
 that will be done, it will only stop fio if this condition is met before
-other end\-of\-job criteria.
+other end-of-job criteria.
 .TP
 .BI fsync \fR=\fPint
 If writing to a file, issue an \fBfsync\fR\|(2) (or its equivalent) of
 the dirty data for every number of blocks given. For example, if you give 32
 as a parameter, fio will sync the file after every 32 writes issued. If fio is
-using non\-buffered I/O, we may not sync the file. The exception is the sg
+using non-buffered I/O, we may not sync the file. The exception is the sg
 I/O engine, which synchronizes the disk cache anyway. Defaults to 0, which
 means fio does not periodically issue and wait for a sync to complete. Also
 see \fBend_fsync\fR and \fBfsync_on_close\fR.
@@ -1053,7 +1053,7 @@ Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
 not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
 \fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
 Defaults to 0, which means fio does not periodically issue and wait for a
-data\-only sync to complete.
+data-only sync to complete.
 .TP
 .BI write_barrier \fR=\fPint
 Make every N\-th write a barrier write.
@@ -1200,7 +1200,7 @@ For a random workload, set how big a percentage should be random. This
 defaults to 100%, in which case the workload is fully random. It can be set
 from anywhere from 0 to 100. Setting it to 0 would make the workload fully
 sequential. Any setting in between will result in a random mix of sequential
-and random I/O, at the given percentages. Comma\-separated values may be
+and random I/O, at the given percentages. Comma-separated values may be
 specified for reads, writes, and trims as described in \fBblocksize\fR.
 .TP
 .BI norandommap
@@ -1209,7 +1209,7 @@ this option is given, fio will just get a new random offset without looking
 at past I/O history. This means that some blocks may not be read or written,
 and that some blocks may be read/written more than once. If this option is
 used with \fBverify\fR and multiple blocksizes (via \fBbsrange\fR),
-only intact blocks are verified, i.e., partially\-overwritten blocks are
+only intact blocks are verified, i.e., partially-overwritten blocks are
 ignored.  With an async I/O engine and an I/O depth > 1, it is possible for
 the same block to be overwritten, which can cause verification errors.  Either
 do not use norandommap in this case, or also use the lfsr random generator.
@@ -1250,7 +1250,7 @@ selected automatically.
 .TP
 .BI blocksize \fR=\fPint[,int][,int] "\fR,\fB bs" \fR=\fPint[,int][,int]
 The block size in bytes used for I/O units. Default: 4096. A single value
-applies to reads, writes, and trims. Comma\-separated values may be
+applies to reads, writes, and trims. Comma-separated values may be
 specified for reads, writes, and trims. A value not terminated in a comma
 applies to subsequent types. Examples:
 .RS
@@ -1274,7 +1274,7 @@ bs=,8k,        means default for reads, 8k for writes, and default for trims.
 A range of block sizes in bytes for I/O units. The issued I/O unit will
 always be a multiple of the minimum size, unless
 \fBblocksize_unaligned\fR is set.
-Comma\-separated ranges may be specified for reads, writes, and trims as
+Comma-separated ranges may be specified for reads, writes, and trims as
 described in \fBblocksize\fR. Example:
 .RS
 .RS
@@ -1311,7 +1311,7 @@ bssplit=4k/50:1k/:32k/
 would have 50% 4k ios, and 25% 1k and 32k ios. The percentages always add up
 to 100, if bssplit is given a range that adds up to more, it will error out.
 .P
-Comma\-separated values may be specified for reads, writes, and trims as
+Comma-separated values may be specified for reads, writes, and trims as
 described in \fBblocksize\fR.
 .P
 If you want a workload that has 50% 2k reads and 50% 4k reads, while having
@@ -1341,7 +1341,7 @@ Boundary to which fio will align random I/O units. Default:
 \fBblocksize\fR. Minimum alignment is typically 512b for using direct
 I/O, though it usually depends on the hardware block size. This option is
 mutually exclusive with using a random map for files, so it will turn off
-that option. Comma\-separated values may be specified for reads, writes, and
+that option. Comma-separated values may be specified for reads, writes, and
 trims as described in \fBblocksize\fR.
 .SS "Buffers and memory"
 .TP
@@ -1357,7 +1357,7 @@ verification is enabled, \fBrefill_buffers\fR is also automatically enabled.
 .BI scramble_buffers \fR=\fPbool
 If \fBrefill_buffers\fR is too costly and the target is using data
 deduplication, then setting this option will slightly modify the I/O buffer
-contents to defeat normal de\-dupe attempts. This is not enough to defeat
+contents to defeat normal de-dupe attempts. This is not enough to defeat
 more clever block compression attempts, but it will stop naive dedupe of
 blocks. Default: true.
 .TP
@@ -1482,7 +1482,7 @@ is 4MiB in size. So to calculate the number of huge pages you need for a
 given job file, add up the I/O depth of all jobs (normally one unless
 \fBiodepth\fR is used) and multiply by the maximum bs set. Then divide
 that number by the huge page size. You can see the size of the huge pages in
-`/proc/meminfo'. If no huge pages are allocated by having a non\-zero
+`/proc/meminfo'. If no huge pages are allocated by having a non-zero
 number in `nr_hugepages', using \fBmmaphuge\fR or \fBshmhuge\fR will fail. Also
 see \fBhugepage\-size\fR.
 .P
@@ -1505,7 +1505,7 @@ of subsequent I/O memory buffers is the sum of the \fBiomem_align\fR and
 Defines the size of a huge page. Must at least be equal to the system
 setting, see `/proc/meminfo'. Defaults to 4MiB. Should probably
 always be a multiple of megabytes, so using `hugepage\-size=Xm' is the
-preferred way to set this to avoid setting a non\-pow\-2 bad value.
+preferred way to set this to avoid setting a non-pow-2 bad value.
 .TP
 .BI lockmem \fR=\fPint
 Pin the specified amount of memory with \fBmlock\fR\|(2). Can be used to
@@ -1549,7 +1549,7 @@ this value is used as a fixed size or possible range of each file.
 Perform I/O after the end of the file. Normally fio will operate within the
 size of a file. If this option is set, then fio will append to the file
 instead. This has identical behavior to setting \fBoffset\fR to the size
-of a file. This option is ignored on non\-regular files.
+of a file. This option is ignored on non-regular files.
 .TP
 .BI fill_device \fR=\fPbool "\fR,\fB fill_fs" \fR=\fPbool
 Sets size to something really large and waits for ENOSPC (no space left on
@@ -1557,7 +1557,7 @@ device) as the terminating condition. Only makes sense with sequential
 write. For a read workload, the mount point will be filled first then I/O
 started on the result. This option doesn't make sense if operating on a raw
 device node, since the size of that is already known by the file system.
-Additionally, writing beyond end\-of\-device will not return ENOSPC there.
+Additionally, writing beyond end-of-device will not return ENOSPC there.
 .SS "I/O engine"
 .TP
 .BI ioengine \fR=\fPstr
@@ -1586,7 +1586,7 @@ Basic \fBpreadv2\fR\|(2) or \fBpwritev2\fR\|(2) I/O.
 .TP
 .B libaio
 Linux native asynchronous I/O. Note that Linux may only support
-queued behavior with non\-buffered I/O (set `direct=1' or
+queued behavior with non-buffered I/O (set `direct=1' or
 `buffered=0').
 This engine defines engine specific options.
 .TP
@@ -1641,11 +1641,11 @@ Doesn't transfer any data, but burns CPU cycles according to the
 of the CPU. In case of SMP machines, use `numjobs=<nr_of_cpu>'
 to get desired CPU usage, as the cpuload only loads a
 single CPU at the desired rate. A job never finishes unless there is
-at least one non\-cpuio job.
+at least one non-cpuio job.
 .TP
 .B guasi
 The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
-Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi\-lib.html\fR
+Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR
 for more info on GUASI.
 .TP
 .B rdma
@@ -1810,7 +1810,7 @@ should be used for the polling thread.
 .BI (libaio)userspace_reap
 Normally, with the libaio engine in use, fio will use the
 \fBio_getevents\fR\|(3) system call to reap newly returned events. With
-this flag turned on, the AIO ring will be read directly from user\-space to
+this flag turned on, the AIO ring will be read directly from user-space to
 reap events. The reaping mode is only enabled when polling for a minimum of
 0 events (e.g. when `iodepth_batch_complete=0').
 .TP
@@ -2165,13 +2165,13 @@ When the unit is omitted, the value is interpreted in microseconds. See
 \fBthinktime_blocks\fR and \fBthinktime_spin\fR.
 .TP
 .BI thinktime_spin \fR=\fPtime
-Only valid if \fBthinktime\fR is set \- pretend to spend CPU time doing
+Only valid if \fBthinktime\fR is set - pretend to spend CPU time doing
 something with the data received, before falling back to sleeping for the
 rest of the period specified by \fBthinktime\fR. When the unit is
 omitted, the value is interpreted in microseconds.
 .TP
 .BI thinktime_blocks \fR=\fPint
-Only valid if \fBthinktime\fR is set \- control how many blocks to issue,
+Only valid if \fBthinktime\fR is set - control how many blocks to issue,
 before waiting \fBthinktime\fR usecs. If not set, defaults to 1 which will make
 fio wait \fBthinktime\fR usecs after every block. This effectively makes any
 queue depth setting redundant, since no more than 1 I/O will be queued
@@ -2180,7 +2180,7 @@ setting effectively caps the queue depth if the latter is larger.
 .TP
 .BI rate \fR=\fPint[,int][,int]
 Cap the bandwidth used by this job. The number is in bytes/sec, the normal
-suffix rules apply. Comma\-separated values may be specified for reads,
+suffix rules apply. Comma-separated values may be specified for reads,
 writes, and trims as described in \fBblocksize\fR.
 .RS
 .P
@@ -2192,7 +2192,7 @@ latter will only limit reads.
 .TP
 .BI rate_min \fR=\fPint[,int][,int]
 Tell fio to do whatever it can to maintain at least this bandwidth. Failing
-to meet this requirement will cause the job to exit. Comma\-separated values
+to meet this requirement will cause the job to exit. Comma-separated values
 may be specified for reads, writes, and trims as described in
 \fBblocksize\fR.
 .TP
@@ -2200,12 +2200,12 @@ may be specified for reads, writes, and trims as described in
 Cap the bandwidth to this number of IOPS. Basically the same as
 \fBrate\fR, just specified independently of bandwidth. If the job is
 given a block size range instead of a fixed value, the smallest block size
-is used as the metric. Comma\-separated values may be specified for reads,
+is used as the metric. Comma-separated values may be specified for reads,
 writes, and trims as described in \fBblocksize\fR.
 .TP
 .BI rate_iops_min \fR=\fPint[,int][,int]
 If fio doesn't meet this rate of I/O, it will cause the job to exit.
-Comma\-separated values may be specified for reads, writes, and trims as
+Comma-separated values may be specified for reads, writes, and trims as
 described in \fBblocksize\fR.
 .TP
 .BI rate_process \fR=\fPstr
@@ -2411,7 +2411,7 @@ Each job will get a unique CPU from the CPU set.
 .RE
 .P
 \fBshared\fR is the default behavior, if the option isn't specified. If
-\fBsplit\fR is specified, then fio will will assign one cpu per job. If not
+\fBsplit\fR is specified, then fio will assign one cpu per job. If not
 enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
 in the set.
 .RE
@@ -2474,7 +2474,7 @@ The ID of the flow. If not specified, it defaults to being a global
 flow. See \fBflow\fR.
 .TP
 .BI flow \fR=\fPint
-Weight in token\-based flow control. If this value is used, then there is
+Weight in token-based flow control. If this value is used, then there is
 a 'flow counter' which is used to regulate the proportion of activity between
 two or more jobs. Fio attempts to keep this flow counter near zero. The
 \fBflow\fR parameter stands for how much should be added or subtracted to the
@@ -2795,8 +2795,8 @@ true.
 It may sometimes be interesting to display statistics for groups of jobs as
 a whole instead of for each individual job. This is especially true if
 \fBnumjobs\fR is used; looking at individual thread/process output
-quickly becomes unwieldy. To see the final report per\-group instead of
-per\-job, use \fBgroup_reporting\fR. Jobs in a file will be part of the
+quickly becomes unwieldy. To see the final report per-group instead of
+per-job, use \fBgroup_reporting\fR. Jobs in a file will be part of the
 same reporting group, unless if separated by a \fBstonewall\fR, or by
 using \fBnew_group\fR.
 .TP
@@ -2915,11 +2915,11 @@ parameter. The files will be stored with a `.fz' suffix.
 .TP
 .BI log_unix_epoch \fR=\fPbool
 If set, fio will log Unix timestamps to the log files produced by enabling
-write_type_log for each log type, instead of the default zero\-based
+write_type_log for each log type, instead of the default zero-based
 timestamps.
 .TP
 .BI block_error_percentiles \fR=\fPbool
-If set, record errors in trim block\-sized units from writes and trims and
+If set, record errors in trim block-sized units from writes and trims and
 output a histogram of how many trims it took to get to errors, and what kind
 of error was encountered.
 .TP
@@ -2989,7 +2989,7 @@ for each job to finish.
 .TP
 .BI continue_on_error \fR=\fPstr
 Normally fio will exit the job on the first observed failure. If this option
-is set, fio will continue the job when there is a 'non\-fatal error' (EIO or
+is set, fio will continue the job when there is a 'non-fatal error' (EIO or
 EILSEQ) until the runtime is exceeded or the I/O size specified is
 completed. If this option is used, there are two more stats that are
 appended, the total error count and the first error. The error field given
@@ -3017,17 +3017,17 @@ Continue on verify errors, exit on all others.
 Continue on all errors.
 .TP
 .B 0
-Backward\-compatible alias for 'none'.
+Backward-compatible alias for 'none'.
 .TP
 .B 1
-Backward\-compatible alias for 'all'.
+Backward-compatible alias for 'all'.
 .RE
 .RE
 .TP
 .BI ignore_error \fR=\fPstr
 Sometimes you want to ignore some errors during test in that case you can
 specify error list for each error type, instead of only being able to
-ignore the default 'non\-fatal error' using \fBcontinue_on_error\fR.
+ignore the default 'non-fatal error' using \fBcontinue_on_error\fR.
 `ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST' errors for
 given error type is separated with ':'. Error may be symbol ('ENOSPC', 'ENOMEM')
 or integer. Example:
@@ -3131,7 +3131,7 @@ Thread created.
 Thread initialized, waiting or generating necessary data.
 .TP
 .B p
-Thread running pre\-reading file(s).
+Thread running pre-reading file(s).
 .TP
 .B /
 Thread is in ramp period.
@@ -3337,8 +3337,8 @@ For each data direction it prints:
 .B bw
 Aggregate bandwidth of threads in this group followed by the
 minimum and maximum bandwidth of all the threads in this group.
-Values outside of brackets are power\-of\-2 format and those
-within are the equivalent value in a power\-of\-10 format.
+Values outside of brackets are power-of-2 format and those
+within are the equivalent value in a power-of-10 format.
 .TP
 .B io
 Aggregate I/O performed of all threads in this group. The
@@ -3555,7 +3555,7 @@ This data indicates that one I/O required 87,552ns to complete, two I/Os require
 100,864ns to complete, and 7529 I/Os required 107,008ns to complete.
 .P
 Also included with fio is a Python script \fBfio_jsonplus_clat2csv\fR that takes
-json+ output and generates CSV\-formatted latency data suitable for plotting.
+json+ output and generates CSV-formatted latency data suitable for plotting.
 .P
 The latency durations actually represent the midpoints of latency intervals.
 For details refer to `stat.h' in the fio source.
@@ -3828,7 +3828,7 @@ is recorded. Each `data direction' seen within the window period will aggregate
 its values in a separate row. Further, when using windowed logging the `block
 size' and `offset' entries will always contain 0.
 .SH CLIENT / SERVER
-Normally fio is invoked as a stand\-alone application on the machine where the
+Normally fio is invoked as a stand-alone application on the machine where the
 I/O workload should be generated. However, the backend and frontend of fio can
 be run separately i.e., the fio server can generate an I/O workload on the "Device
 Under Test" while being controlled by a client on another machine.
@@ -3911,7 +3911,7 @@ The fio command would then be:
 $ fio \-\-client=host.list <job file(s)>
 .RE
 .P
-In this mode, you cannot input server\-specific parameters or job files \-\- all
+In this mode, you cannot input server-specific parameters or job files \-\- all
 servers receive the same job file.
 .P
 In order to let `fio \-\-client' runs use a shared filesystem from multiple
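The bssplit context in the fio.1 hunks above gives one resolution rule. Entries written without a percentage share whatever is left of 100%, so `bssplit=4k/50:1k/:32k/` yields 50% 4k and 25% each of 1k and 32k. Explicit percentages that add up to more than 100 are an error. Below is a minimal Python sketch of only that rule. `resolve_bssplit` is a hypothetical helper, not fio's parser. It assumes a single-direction spec with plain `k`/`m` suffixes, and it does not handle comma-separated read/write/trim lists or ranges.

```python
def resolve_bssplit(spec):
    """Turn a single-direction bssplit string into (size_in_bytes, percent) pairs.

    Hypothetical helper: entries without a percentage split the remainder of
    100% equally; explicit percentages above 100 are rejected.
    """
    multipliers = {"": 1, "k": 1024, "m": 1024 * 1024}
    entries = []
    for item in spec.split(":"):
        size_str, _, pct_str = item.partition("/")
        size_str = size_str.strip().lower()
        suffix = size_str[-1] if size_str[-1:].isalpha() else ""
        digits = size_str[:-1] if suffix else size_str
        size = int(digits) * multipliers[suffix]
        entries.append((size, int(pct_str) if pct_str else None))

    explicit = sum(pct for _, pct in entries if pct is not None)
    if explicit > 100:
        raise ValueError("bssplit percentages add up to more than 100")

    # Spread the remaining percentage evenly over the entries left blank.
    unset = [i for i, (_, pct) in enumerate(entries) if pct is None]
    if unset:
        share = (100 - explicit) / float(len(unset))
        for i in unset:
            entries[i] = (entries[i][0], share)
    return entries


print(resolve_bssplit("4k/50:1k/:32k/"))
# [(4096, 50), (1024, 25.0), (32768, 25.0)]
```

The text above does not say how fio rounds a share that does not divide evenly, so the sketch keeps it as a float.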



* Recent changes (master)
@ 2019-10-22 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2e97fa1b0d76edc6517fa4a8a4f6e0792b458e8c:

  Merge branch 'fix-fsync-on-close' of https://github.com/sitsofe/fio (2019-10-15 09:27:06 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f6830780c2022a607b252191d4a5a1c6804e7513:

  Merge branch 'android-log-fix' of https://github.com/kdrag0n/fio (2019-10-21 08:46:43 -0600)

----------------------------------------------------------------
Danny Lin (1):
      Makefile: Link to the system logging library on Android

Jens Axboe (1):
      Merge branch 'android-log-fix' of https://github.com/kdrag0n/fio

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 7c21ef83..7aab6abd 100644
--- a/Makefile
+++ b/Makefile
@@ -170,7 +170,7 @@ endif
 ifeq ($(CONFIG_TARGET_OS), Android)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
 		oslib/linux-dev-lookup.c
-  LIBS += -ldl
+  LIBS += -ldl -llog
   LDFLAGS += -rdynamic
 endif
 ifeq ($(CONFIG_TARGET_OS), SunOS)



* Recent changes (master)
@ 2019-10-16 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5cd4efe903798f2185a347911a9440324558c89f:

  parse: improve detection of bad input string (2019-10-14 08:03:53 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2e97fa1b0d76edc6517fa4a8a4f6e0792b458e8c:

  Merge branch 'fix-fsync-on-close' of https://github.com/sitsofe/fio (2019-10-15 09:27:06 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-fsync-on-close' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      backend: fix final fsync behaviour

Vincent Fu (3):
      io_u: skip to the next zone when zoneskip is set to zero
      filesetup: use zonerange for map and LFSR with zonemode=strided
      testing: add test script for zonemode=strided

 backend.c    |  11 +-
 filesetup.c  |   3 +
 io_u.c       |   3 +-
 ioengines.c  |   6 +-
 t/strided.py | 350 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 366 insertions(+), 7 deletions(-)
 create mode 100755 t/strided.py

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 2f463293..fe868271 100644
--- a/backend.c
+++ b/backend.c
@@ -281,6 +281,7 @@ static bool fio_io_sync(struct thread_data *td, struct fio_file *f)
 
 	io_u->ddir = DDIR_SYNC;
 	io_u->file = f;
+	io_u_set(td, io_u, IO_U_F_NO_FILE_PUT);
 
 	if (td_io_prep(td, io_u)) {
 		put_io_u(td, io_u);
@@ -314,7 +315,7 @@ requeue:
 
 static int fio_file_fsync(struct thread_data *td, struct fio_file *f)
 {
-	int ret;
+	int ret, ret2;
 
 	if (fio_file_open(f))
 		return fio_io_sync(td, f);
@@ -323,8 +324,10 @@ static int fio_file_fsync(struct thread_data *td, struct fio_file *f)
 		return 1;
 
 	ret = fio_io_sync(td, f);
-	td_io_close_file(td, f);
-	return ret;
+	ret2 = 0;
+	if (fio_file_open(f))
+		ret2 = td_io_close_file(td, f);
+	return (ret || ret2);
 }
 
 static inline void __update_ts_cache(struct thread_data *td)
@@ -1124,7 +1127,7 @@ reap:
 				td->error = 0;
 		}
 
-		if (should_fsync(td) && td->o.end_fsync) {
+		if (should_fsync(td) && (td->o.end_fsync || td->o.fsync_on_close)) {
 			td_set_runstate(td, TD_FSYNCING);
 
 			for_each_file(td, f, i) {
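The backend.c hunks above change the final sync in two ways. A file that fio had to reopen is closed again only if it is still open, and a failure in either the sync or the close is now reported. The end-of-job sync loop also runs when `fsync_on_close` is set, not only for `end_fsync`. The Python model below mirrors only that branching. `FakeFile` and its methods are hypothetical stand-ins for fio's file state and the `td_io_open_file()`, `fio_io_sync()` and `td_io_close_file()` calls.

```python
class FakeFile(object):
    """Hypothetical stand-in for a fio file.

    Return values follow the C convention used in the diff:
    0 means success, non-zero means failure.
    """

    def __init__(self, is_open, open_fails=False, sync_fails=False,
                 close_fails=False):
        self.is_open = is_open
        self.open_fails = open_fails
        self.sync_fails = sync_fails
        self.close_fails = close_fails
        self.closes = 0

    def open(self):
        if self.open_fails:
            return 1
        self.is_open = True
        return 0

    def sync(self):
        return 1 if self.sync_fails else 0

    def close(self):
        self.is_open = False
        self.closes += 1
        return 1 if self.close_fails else 0


def file_fsync(f):
    """Control flow of fio_file_fsync() after the patch."""
    if f.is_open:
        return f.sync()               # already open: sync it and leave it open
    if f.open():
        return 1                      # could not reopen the file
    ret = f.sync()
    ret2 = 0
    if f.is_open:                     # only close what is still open
        ret2 = f.close()
    return 1 if (ret or ret2) else 0  # report a failure in either step


def wants_final_fsync(should_fsync, end_fsync, fsync_on_close):
    """End-of-job condition from the last backend.c hunk."""
    return bool(should_fsync and (end_fsync or fsync_on_close))
```

For example, `file_fsync(FakeFile(is_open=False, close_fails=True))` returns 1 even though the sync itself succeeded. The pre-patch `return ret;` ignored the result of the close.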
diff --git a/filesetup.c b/filesetup.c
index a439b6d6..1d3094c1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1338,6 +1338,9 @@ bool init_random_map(struct thread_data *td)
 	for_each_file(td, f, i) {
 		uint64_t fsize = min(f->real_file_size, f->io_size);
 
+		if (td->o.zone_mode == ZONE_MODE_STRIDED)
+			fsize = td->o.zone_range;
+
 		blocks = fsize / (unsigned long long) td->o.rw_min_bs;
 
 		if (check_rand_gen_limits(td, f, blocks))
diff --git a/io_u.c b/io_u.c
index 94899552..5cbbe85a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -850,7 +850,8 @@ static void setup_strided_zone_mode(struct thread_data *td, struct io_u *io_u)
 	/*
 	 * See if it's time to switch to a new zone
 	 */
-	if (td->zone_bytes >= td->o.zone_size && td->o.zone_skip) {
+	if (td->zone_bytes >= td->o.zone_size &&
+			fio_option_is_set(&td->o, zone_skip)) {
 		td->zone_bytes = 0;
 		f->file_offset += td->o.zone_range + td->o.zone_skip;
 
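The filesetup.c and io_u.c hunks above make two changes for `zonemode=strided`. First, the random map is sized to one `zonerange`. Second, fio moves to the next zone once `zonesize` bytes have been issued whenever `zoneskip` was given at all, including `zoneskip=0`. Before the patch, a zero skip tested false, so the job did not advance past its first zone. The Python sketch below simulates random-read offsets under those rules. It is a simplified model, not fio's code, and `strided_offsets` is hypothetical. It models the random map as a shuffled list of the `zonerange // bs` blocks that is refilled when exhausted. Wrapping back to the start offset at end of file follows the logic in `check_output()` in `t/strided.py`.

```python
import random


def strided_offsets(start, filesize, zonerange, zonesize, bs, n_ios,
                    zoneskip=0, zoneskip_set=True, seed=0):
    """Simulate byte offsets for a simplified zonemode=strided random read.

    zoneskip_set models fio_option_is_set(): after the fix, an explicit
    zoneskip=0 still lets the job move on to the next zone.
    """
    rng = random.Random(seed)
    blocks_per_map = zonerange // bs    # random map covers one zonerange
    zone_start = start
    zone_bytes = 0
    unused = []                         # blocks left in the current map pass
    offsets = []
    for _ in range(n_ios):
        if zone_bytes >= zonesize and zoneskip_set:
            zone_bytes = 0
            zone_start += zonerange + zoneskip
            if zone_start >= filesize:  # wrap around to the start offset
                zone_start = start
        if not unused:                  # map exhausted: every block touched
            unused = list(range(blocks_per_map))
            rng.shuffle(unused)
        offsets.append(zone_start + unused.pop() * bs)
        zone_bytes += bs
    return offsets


# One 4 KiB block per zone: with zoneskip=0 set, each I/O lands in a new zone.
print(strided_offsets(0, 16 * 4096, 4096, 4096, 4096, 4))
# [0, 4096, 8192, 12288]
# Treating zoneskip=0 as unset (the old behaviour) never leaves zone 0.
print(strided_offsets(0, 16 * 4096, 4096, 4096, 4096, 4, zoneskip_set=False))
# [0, 0, 0, 0]
```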
diff --git a/ioengines.c b/ioengines.c
index 40fa75c3..9e3fcc9f 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -376,14 +376,16 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
-		if (ddir_rw(io_u->ddir) || ddir_sync(io_u->ddir)) {
+		if (ddir_rw(io_u->ddir) ||
+		    (ddir_sync(io_u->ddir) && td->runstate != TD_FSYNCING)) {
 			io_u_mark_depth(td, 1);
 			td->ts.total_io_u[io_u->ddir]++;
 		}
 	} else if (ret == FIO_Q_QUEUED) {
 		td->io_u_queued++;
 
-		if (ddir_rw(io_u->ddir) || ddir_sync(io_u->ddir))
+		if (ddir_rw(io_u->ddir) ||
+		    (ddir_sync(io_u->ddir) && td->runstate != TD_FSYNCING))
 			td->ts.total_io_u[io_u->ddir]++;
 
 		if (td->io_u_queued >= td->o.iodepth_batch)
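The ioengines.c hunk above stops counting sync I/Os in the per-direction issued totals while the job is in the `TD_FSYNCING` state. That is the end-of-job sync enabled by `end_fsync` or `fsync_on_close`. Syncs issued while the job is running, for example through `fsync=N`, are still counted. A small Python model of that predicate follows. The direction and state names are plain strings chosen for the sketch, not fio identifiers. The exact members of `ddir_rw()` and `ddir_sync()` listed in the sketch are an assumption.

```python
from collections import Counter

RW_DIRS = ("read", "write", "trim")                  # assumed ddir_rw() set
SYNC_DIRS = ("sync", "datasync", "sync_file_range")  # assumed ddir_sync() set


def counts_as_issued(ddir, runstate):
    """Predicate used by td_io_queue() after the patch."""
    return ddir in RW_DIRS or (ddir in SYNC_DIRS and runstate != "fsyncing")


def tally(ios):
    """Count issued I/Os per direction from (ddir, runstate) pairs."""
    totals = Counter()
    for ddir, runstate in ios:
        if counts_as_issued(ddir, runstate):
            totals[ddir] += 1
    return totals


# 32 writes, one sync during the run, one final sync at end of job.
totals = tally([("write", "running")] * 32
               + [("sync", "running"), ("sync", "fsyncing")])
# totals["write"] is 32 and totals["sync"] is 1: the final sync is not counted.
```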
diff --git a/t/strided.py b/t/strided.py
new file mode 100755
index 00000000..47ce5523
--- /dev/null
+++ b/t/strided.py
@@ -0,0 +1,350 @@
+#!/usr/bin/python
+# Note: this script is python2 and python3 compatible.
+#
+# strided.py
+#
+# Test zonemode=strided. This uses the null ioengine when no file is
+# specified. If a file is specified, use it for random read testing.
+# Some of the zoneranges in the tests are 16MiB. So when using a file
+# a minimum size of 32MiB is recommended.
+#
+# USAGE
+# python strided.py fio-executable [-f file/device]
+#
+# EXAMPLES
+# python t/strided.py ./fio
+# python t/strided.py ./fio -f /dev/sda
+# dd if=/dev/zero of=temp bs=1M count=32
+# python t/strided.py ./fio -f temp
+#
+# REQUIREMENTS
+# Python 2.6+
+#
+# ===TEST MATRIX===
+#
+# --zonemode=strided, zoneskip >= 0
+#   w/ randommap and LFSR
+#       zonesize=zonerange  all blocks in zonerange touched
+#       zonesize>zonerange  all blocks touched and roll-over back into zone
+#       zonesize<zonerange  all blocks inside zone
+#
+#   w/o randommap       all blocks inside zone
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+import os
+import sys
+import argparse
+import subprocess
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('fio',
+                        help='path to fio executable (e.g., ./fio)')
+    parser.add_argument('-f', '--filename', help="file/device to test")
+    args = parser.parse_args()
+
+    return args
+
+
+def run_fio(fio, test, index):
+    filename = "strided"
+    fio_args = [
+                "--name=strided",
+                "--zonemode=strided",
+                "--log_offset=1",
+                "--randrepeat=0",
+                "--rw=randread",
+                "--zoneskip=0",
+                "--write_iops_log={0}{1:03d}".format(filename, index),
+                "--output={0}{1:03d}.out".format(filename, index),
+                "--zonerange={zonerange}".format(**test),
+                "--zonesize={zonesize}".format(**test),
+                "--bs={bs}".format(**test),
+               ]
+    if 'norandommap' in test:
+        fio_args.append('--norandommap')
+    if 'random_generator' in test:
+        fio_args.append('--random_generator={random_generator}'.format(**test))
+    if 'offset' in test:
+        fio_args.append('--offset={offset}'.format(**test))
+    if 'filename' in test:
+        fio_args.append('--filename={filename}'.format(**test))
+        fio_args.append('--filesize={filesize}'.format(**test))
+    else:
+        fio_args.append('--ioengine=null')
+        fio_args.append('--size={size}'.format(**test))
+        fio_args.append('--io_size={io_size}'.format(**test))
+        fio_args.append('--filesize={size}'.format(**test))
+
+    output = subprocess.check_output([fio] + fio_args, universal_newlines=True)
+
+    f = open("{0}{1:03d}_iops.1.log".format(filename, index), "r")
+    log = f.read()
+    f.close()
+
+    return log
+
+
+def check_output(iops_log, test):
+    zonestart = 0 if 'offset' not in test else test['offset']
+    iospersize = test['zonesize'] // test['bs']
+    iosperrange = test['zonerange'] // test['bs']
+    iosperzone = 0
+    lines = iops_log.split('\n')
+    zoneset = set()
+
+    for line in lines:
+        if len(line) == 0:
+            continue
+
+        if iosperzone == iospersize:
+            # time to move to a new zone
+            iosperzone = 0
+            zoneset = set()
+            zonestart += test['zonerange']
+            if zonestart >= test['filesize']:
+                zonestart = 0 if 'offset' not in test else test['offset']
+
+        iosperzone = iosperzone + 1
+        tokens = line.split(',')
+        offset = int(tokens[4])
+        if offset < zonestart or offset >= zonestart + test['zonerange']:
+            print("Offset {0} outside of zone starting at {1}".format(
+                    offset, zonestart))
+            return False
+
+        # skip next section if norandommap is enabled with no
+        # random_generator or with a random_generator != lfsr
+        if 'norandommap' in test:
+            if 'random_generator' in test:
+                if test['random_generator'] != 'lfsr':
+                    continue
+            else:
+                continue
+
+        # we either have a random map enabled or we
+        # are using an LFSR
+        # so all blocks should be unique and we should have
+        # covered the entire zone when iosperzone % iosperrange == 0
+        block = (offset - zonestart) // test['bs']
+        if block in zoneset:
+            print("Offset {0} in zone already touched".format(offset))
+            return False
+
+        zoneset.add(block)
+        if iosperzone % iosperrange == 0:
+            if len(zoneset) != iosperrange:
+                print("Expected {0} blocks in zone but only saw {1}".format(
+                        iosperrange, len(zoneset)))
+                return False
+            zoneset = set()
+
+    return True
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    tests = [   # randommap enabled
+                {
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "offset": 8*4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 16*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "zonerange": 4096,
+                    "zonesize": 4*4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 32*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "zonerange": 8192,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 8*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                # lfsr
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "offset": 8*4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 16*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 4096,
+                    "zonesize": 4*4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 32*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 8192,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "random_generator": "lfsr",
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 8*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                # norandommap
+                {
+                    "norandommap": 1,
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "offset": 8*4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 4096,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 16*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 4096,
+                    "zonesize": 8192,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 32*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 8192,
+                    "zonesize": 4096,
+                    "bs": 4096,
+                    "size": 16*4096,
+                    "io_size": 16*4096,
+                },
+                {
+                    "norandommap": 1,
+                    "zonerange": 16*1024*1024,
+                    "zonesize": 8*1024*1024,
+                    "bs": 4096,
+                    "size": 256*1024*1024,
+                    "io_size": 256*1024*204,
+                },
+
+            ]
+
+    index = 1
+    passed = 0
+    failed = 0
+
+    if args.filename:
+        statinfo = os.stat(args.filename)
+        filesize = statinfo.st_size
+        if filesize == 0:
+            f = os.open(args.filename, os.O_RDONLY)
+            filesize = os.lseek(f, 0, os.SEEK_END)
+            os.close(f)
+
+    for test in tests:
+        if args.filename:
+            test['filename'] = args.filename
+            test['filesize'] = filesize
+        else:
+            test['filesize'] = test['size']
+        iops_log = run_fio(args.fio, test, index)
+        status = check_output(iops_log, test)
+        print("Test {0} {1}".format(index, ("PASSED" if status else "FAILED")))
+        if status:
+            passed = passed + 1
+        else:
+            failed = failed + 1
+        index = index + 1
+
+    print("{0} tests passed, {1} failed".format(passed, failed))
+
+    sys.exit(failed)


* Recent changes (master)
@ 2019-10-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9663e751c25d3bf52e959d8c9025460d2f645b1e:

  Merge branch 'fix-corrupt-hist-log' of https://github.com/sitsofe/fio (2019-10-13 11:02:00 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5cd4efe903798f2185a347911a9440324558c89f:

  parse: improve detection of bad input string (2019-10-14 08:03:53 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      parse: improve detection of bad input string

 parse.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index c4fd4626..483a62f6 100644
--- a/parse.c
+++ b/parse.c
@@ -373,12 +373,16 @@ int str_to_decimal(const char *str, long long *val, int kilo, void *data,
 #endif
 
 	if (rc == 1) {
+		char *endptr;
+
 		if (strstr(str, "0x") || strstr(str, "0X"))
 			base = 16;
 		else
 			base = 10;
 
-		*val = strtoll(str, NULL, base);
+		*val = strtoll(str, &endptr, base);
+		if (*val == 0 && endptr == str)
+			return 1;
 		if (*val == LONG_MAX && errno == ERANGE)
 			return 1;
 	}
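The parse.c change relies on how strtoll() reports a failed conversion: when no digits are consumed it returns 0 and leaves the end pointer equal to the start of the string. Below is a minimal sketch of the same check. It uses a hypothetical helper, parse_ll(), rather than fio's str_to_decimal(), and tests errno == ERANGE directly for overflow instead of comparing against a limit constant.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not fio's str_to_decimal): parse a decimal or
 * 0x-prefixed hex integer. Reject input where strtoll() consumed no
 * digits, and reject values that do not fit in a long long.
 */
static bool parse_ll(const char *str, long long *val)
{
	char *endptr;
	int base = (strstr(str, "0x") || strstr(str, "0X")) ? 16 : 10;

	errno = 0;	/* strtoll() only sets errno on error */
	*val = strtoll(str, &endptr, base);

	/* No digits consumed: strtoll() returned 0 and endptr == str. */
	if (endptr == str)
		return false;
	/* Overflow: strtoll() clamps the result and sets ERANGE. */
	if (errno == ERANGE)
		return false;
	return true;
}
```

A failed conversion always yields 0, so endptr == str is enough on its own. The patch's additional *val == 0 test is redundant but harmless.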


* Recent changes (master)
@ 2019-10-14 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 983a656c11bf35043759de48aa5a85bdb4ffb106:

  Merge branch 'bumpflocks' of https://github.com/earlephilhower/fio (2019-10-08 20:58:57 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9663e751c25d3bf52e959d8c9025460d2f645b1e:

  Merge branch 'fix-corrupt-hist-log' of https://github.com/sitsofe/fio (2019-10-13 11:02:00 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-corrupt-hist-log' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      stat: fix corruption in histogram logs

 stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 33637900..05663e07 100644
--- a/stat.c
+++ b/stat.c
@@ -2580,7 +2580,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			io_u_plat = (uint64_t *) td->ts.io_u_plat[ddir];
 			dst = malloc(sizeof(struct io_u_plat_entry));
 			memcpy(&(dst->io_u_plat), io_u_plat,
-				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
+				FIO_IO_U_PLAT_NR * sizeof(uint64_t));
 			flist_add(&dst->list, &hw->list);
 			__add_log_sample(iolog, sample_plat(dst), ddir, bs,
 						elapsed, offset);
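The stat.c fix sizes the memcpy() from the element type of the source array. The plat buckets are uint64_t, so multiplying the bucket count by sizeof(unsigned int) (typically 4 bytes) copied only part of the histogram. The sketch below uses a hypothetical bucket count and struct, not fio's actual definitions, and takes the copy size from the destination array so the element type cannot drift out of sync.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bucket count; fio derives FIO_IO_U_PLAT_NR from its histogram settings. */
#define PLAT_NR 8

/* Hypothetical stand-in for struct io_u_plat_entry: one histogram snapshot. */
struct plat_snapshot {
	uint64_t buckets[PLAT_NR];
};

/* Copy every bucket. sizeof(dst->buckets) equals PLAT_NR * sizeof(uint64_t). */
static void snapshot_buckets(struct plat_snapshot *dst, const uint64_t *src)
{
	memcpy(dst->buckets, src, sizeof(dst->buckets));
}
```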


* Recent changes (master)
@ 2019-10-09 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-09 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 998baa29f571df9d2e4b626bedd317a2fd28c68a:

  Merge branch 'replay-blktrace-fixes' of https://github.com/shimrot/fio (2019-10-07 21:23:26 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 983a656c11bf35043759de48aa5a85bdb4ffb106:

  Merge branch 'bumpflocks' of https://github.com/earlephilhower/fio (2019-10-08 20:58:57 -0600)

----------------------------------------------------------------
Earle F. Philhower, III (1):
      Increase MAX_FILELOCKS for highly parallel IO test

Jens Axboe (1):
      Merge branch 'bumpflocks' of https://github.com/earlephilhower/fio

 filelock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/filelock.c b/filelock.c
index cc98aafc..7e92f63d 100644
--- a/filelock.c
+++ b/filelock.c
@@ -22,7 +22,7 @@ struct fio_filelock {
 	unsigned int references;
 };
 
-#define MAX_FILELOCKS	128
+#define MAX_FILELOCKS	1024
 	
 static struct filelock_data {
 	struct flist_head list;


* Recent changes (master)
@ 2019-10-08 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-08 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 58d68a0dad21fb8aed34bb987e00b9b7dee65296:

  Merge branch 'error-on-implicit-decl' of https://github.com/sitsofe/fio (2019-10-06 09:15:51 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 998baa29f571df9d2e4b626bedd317a2fd28c68a:

  Merge branch 'replay-blktrace-fixes' of https://github.com/shimrot/fio (2019-10-07 21:23:26 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'replay-blktrace-fixes' of https://github.com/shimrot/fio

krisd (1):
      Fix assert error on blktrace replay containing trims

 blktrace.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index efe9ce24..8a246613 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -544,16 +544,19 @@ bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 	    !ios[DDIR_SYNC]) {
 		log_err("fio: found no ios in blktrace data\n");
 		return false;
-	} else if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
-		td->o.td_ddir = TD_DDIR_READ;
-		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
-	} else if (!ios[DDIR_READ] && ios[DDIR_WRITE]) {
-		td->o.td_ddir = TD_DDIR_WRITE;
-		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
-	} else {
-		td->o.td_ddir = TD_DDIR_RW;
+	}
+
+	td->o.td_ddir = 0;
+	if (ios[DDIR_READ]) {
+		td->o.td_ddir |= TD_DDIR_READ;
 		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
+	}
+	if (ios[DDIR_WRITE]) {
+		td->o.td_ddir |= TD_DDIR_WRITE;
 		td->o.max_bs[DDIR_WRITE] = rw_bs[DDIR_WRITE];
+	}
+	if (ios[DDIR_TRIM]) {
+		td->o.td_ddir |= TD_DDIR_TRIM;
 		td->o.max_bs[DDIR_TRIM] = rw_bs[DDIR_TRIM];
 	}
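The blktrace change replaces the read/write/mixed if-chain with a bitmask. Each data direction present in the trace ORs in its own bit, so a trace containing trims no longer falls into the read/write catch-all. Below is a minimal sketch of that pattern. The DIR_* names and bit values are assumptions for illustration; fio's real TD_DDIR_* constants live in its own headers.

```c
/* Assumed, illustrative bit values; not fio's TD_DDIR_* definitions. */
enum {
	DIR_READ  = 1 << 0,
	DIR_WRITE = 1 << 1,
	DIR_TRIM  = 1 << 2,
};

/*
 * Build a direction mask from per-direction I/O counts: every direction
 * seen in the trace contributes its bit, independently of the others.
 */
static unsigned int ddir_mask(unsigned long reads, unsigned long writes,
			      unsigned long trims)
{
	unsigned int mask = 0;

	if (reads)
		mask |= DIR_READ;
	if (writes)
		mask |= DIR_WRITE;
	if (trims)
		mask |= DIR_TRIM;
	return mask;
}
```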
 


* Recent changes (master)
@ 2019-10-07 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 13e9c0b09c0c8d892b790aeaf736263dd76f2d2e:

  Windows: Update URLs to https, and remove mention of WiX version (2019-10-02 08:23:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 58d68a0dad21fb8aed34bb987e00b9b7dee65296:

  Merge branch 'error-on-implicit-decl' of https://github.com/sitsofe/fio (2019-10-06 09:15:51 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'error-on-implicit-decl' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      configure: stop enabling fdatasync on OSX

 HOWTO       | 2 +-
 configure   | 4 ++--
 fio.1       | 2 +-
 os/os-mac.h | 6 ------
 4 files changed, 4 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4fef1504..96a047de 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1279,7 +1279,7 @@ I/O type
 .. option:: fdatasync=int
 
 	Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
-	not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
+	not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
 	:manpage:`fdatasync(2)` so this falls back to using :manpage:`fsync(2)`.
 	Defaults to 0, which means fio does not periodically issue and wait for a
 	data-only sync to complete.
diff --git a/configure b/configure
index 59da2f7e..e32d5dcf 100755
--- a/configure
+++ b/configure
@@ -88,14 +88,14 @@ do_cc() {
 }
 
 compile_object() {
-  do_cc $CFLAGS -c -o $TMPO $TMPC
+  do_cc $CFLAGS -Werror-implicit-function-declaration -c -o $TMPO $TMPC
 }
 
 compile_prog() {
   local_cflags="$1"
   local_ldflags="$2 $LIBS"
   echo "Compiling test case $3" >> config.log
-  do_cc $CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
+  do_cc $CFLAGS -Werror-implicit-function-declaration $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags
 }
 
 feature_not_found() {
diff --git a/fio.1 b/fio.1
index 77a2d799..6685e507 100644
--- a/fio.1
+++ b/fio.1
@@ -1050,7 +1050,7 @@ see \fBend_fsync\fR and \fBfsync_on_close\fR.
 .TP
 .BI fdatasync \fR=\fPint
 Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
-not metadata blocks. In Windows, FreeBSD, and DragonFlyBSD there is no
+not metadata blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is no
 \fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
 Defaults to 0, which means fio does not periodically issue and wait for a
 data\-only sync to complete.
diff --git a/os/os-mac.h b/os/os-mac.h
index a073300c..0d97f6b9 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -97,12 +97,6 @@ static inline int gettid(void)
 }
 #endif
 
-/*
- * For some reason, there's no header definition for fdatasync(), even
- * if it exists.
- */
-extern int fdatasync(int fd);
-
 static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t len)
 {
 	fstore_t store = {F_ALLOCATEALL, F_PEOFPOSMODE, offset, len};


* Recent changes (master)
@ 2019-10-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f95183e40c5d48cd119d3fbcd752b70c39c97683:

  Update the email and web address for Windows binaries. (2019-10-01 21:34:18 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 13e9c0b09c0c8d892b790aeaf736263dd76f2d2e:

  Windows: Update URLs to https, and remove mention of WiX version (2019-10-02 08:23:19 -0600)

----------------------------------------------------------------
Rebecca Cran (1):
      Windows: Update URLs to https, and remove mention of WiX version

 README | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index 7c65906d..0f943bcc 100644
--- a/README
+++ b/README
@@ -164,9 +164,9 @@ configure.
 Windows
 ~~~~~~~
 
-On Windows, Cygwin (http://www.cygwin.com/) is required in order to build
-fio. To create an MSI installer package install WiX 3.8 from
-http://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
+On Windows, Cygwin (https://www.cygwin.com/) is required in order to build
+fio. To create an MSI installer package install WiX from
+https://wixtoolset.org and run :file:`dobuild.cmd` from the :file:`os/windows`
 directory.
 
 How to compile fio on 64-bit Windows:


* Recent changes (master)
@ 2019-10-02 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-10-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e2defc22798dcd884356fbbaf8856689839b52e5:

  configure: add --enable-libaio-uring parameter (2019-09-27 13:51:12 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f95183e40c5d48cd119d3fbcd752b70c39c97683:

  Update the email and web address for Windows binaries. (2019-10-01 21:34:18 -0600)

----------------------------------------------------------------
Rebecca Cran (1):
      Update the email and web address for Windows binaries.

 README | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index 38022bbb..7c65906d 100644
--- a/README
+++ b/README
@@ -119,8 +119,8 @@ Solaris:
 	``pkgutil -i fio``.
 
 Windows:
-	Rebecca Cran <rebecca+fio@bluestop.org> has fio packages for Windows at
-	https://www.bluestop.org/fio/ . The latest builds for Windows can also
+	Rebecca Cran <rebecca@bsdio.com> has fio packages for Windows at
+	https://bsdio.com/fio/ . The latest builds for Windows can also
 	be grabbed from https://ci.appveyor.com/project/axboe/fio by clicking
 	the latest x86 or x64 build, then selecting the ARTIFACTS tab.
 


* Recent changes (master)
@ 2019-09-28 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 106a71cd6c24f31fce9d95c49e00ff4b28aa6fdc:

  t/zbd: Avoid magic number of test case count (2019-09-26 00:47:14 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e2defc22798dcd884356fbbaf8856689839b52e5:

  configure: add --enable-libaio-uring parameter (2019-09-27 13:51:12 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      configure: add --enable-libaio-uring parameter

 configure | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index b174a6fc..59da2f7e 100755
--- a/configure
+++ b/configure
@@ -150,6 +150,7 @@ disable_native="no"
 march_set="no"
 libiscsi="no"
 libnbd="no"
+libaio_uring="no"
 prefix=/usr/local
 
 # parse options
@@ -212,6 +213,8 @@ for opt do
   ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
+  --enable-libaio-uring) libaio_uring="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -250,6 +253,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-libiscsi       Enable iscsi support"
   echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-tcmalloc	Disable tcmalloc support"
+  echo "--enable-libaio-uring   Enable libaio emulated over io_uring"
   exit $exit_val
 fi
 
@@ -603,17 +607,23 @@ int main(void)
   return 0;
 }
 EOF
-  if compile_prog "" "-laio" "libaio" ; then
+  if test "$libaio_uring" = "yes" && compile_prog "" "-luring" "libaio io_uring" ; then
+    libaio=yes
+    LIBS="-luring $LIBS"
+  elif compile_prog "" "-laio" "libaio" ; then
     libaio=yes
+    libaio_uring=no
     LIBS="-laio $LIBS"
   else
     if test "$libaio" = "yes" ; then
       feature_not_found "linux AIO" "libaio-dev or libaio-devel"
     fi
     libaio=no
+    libaio_uring=no
   fi
 fi
 print_config "Linux AIO support" "$libaio"
+print_config "Linux AIO over io_uring" "$libaio_uring"
 
 ##########################################
 # posix aio probe
@@ -2449,6 +2459,9 @@ if test "$zlib" = "yes" ; then
 fi
 if test "$libaio" = "yes" ; then
   output_sym "CONFIG_LIBAIO"
+  if test "$libaio_uring" = "yes" ; then
+    output_sym "CONFIG_LIBAIO_URING"
+  fi
 fi
 if test "$posix_aio" = "yes" ; then
   output_sym "CONFIG_POSIXAIO"


* Recent changes (master)
@ 2019-09-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3c029ac46c3478243932f76cadf04ca10b64ab3e:

  filesetup: Extend file size for 'null' and 'filecreate' ioengines (2019-09-25 03:12:05 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 106a71cd6c24f31fce9d95c49e00ff4b28aa6fdc:

  t/zbd: Avoid magic number of test case count (2019-09-26 00:47:14 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      t/zbd: Avoid magic number of test case count

 t/zbd/test-zbd-support | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 90f9f87b..5d079a8b 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -826,9 +826,8 @@ case "$(<"/sys/class/block/$basename/queue/zoned")" in
 esac
 
 if [ "${#tests[@]}" = 0 ]; then
-    for ((i=1;i<=46;i++)); do
-	tests+=("$i")
-    done
+    readarray -t tests < <(declare -F | grep "test[0-9]*" | \
+				   tr -c -d "[:digit:]\n" | sort -n)
 fi
 
 logfile=$0.log


* Recent changes (master)
@ 2019-09-25 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-25 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3982ec03ab0c125d01b62876f5139b2c07082c7a:

  verify: check that the block size is big enough (2019-09-24 02:43:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3c029ac46c3478243932f76cadf04ca10b64ab3e:

  filesetup: Extend file size for 'null' and 'filecreate' ioengines (2019-09-25 03:12:05 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (2):
      Revert "filesetup: honor the offset option"
      filesetup: Extend file size for 'null' and 'filecreate' ioengines

 filesetup.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index b8d1d838..a439b6d6 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1047,7 +1047,7 @@ int setup_files(struct thread_data *td)
 			 * doesn't divide nicely with the min blocksize,
 			 * make the first files bigger.
 			 */
-			f->io_size = fs - f->file_offset;
+			f->io_size = fs;
 			if (nr_fs_extra) {
 				nr_fs_extra--;
 				f->io_size += bs;
@@ -1104,13 +1104,15 @@ int setup_files(struct thread_data *td)
 		}
 
 		if (f->filetype == FIO_TYPE_FILE &&
-		    (f->io_size + f->file_offset) > f->real_file_size &&
-		    !td_ioengine_flagged(td, FIO_DISKLESSIO)) {
-			if (!o->create_on_open) {
+		    (f->io_size + f->file_offset) > f->real_file_size) {
+			if (!td_ioengine_flagged(td, FIO_DISKLESSIO) &&
+			    !o->create_on_open) {
 				need_extend++;
 				extend_size += (f->io_size + f->file_offset);
 				fio_file_set_extend(f);
-			} else
+			} else if (!td_ioengine_flagged(td, FIO_DISKLESSIO) ||
+				   (td_ioengine_flagged(td, FIO_DISKLESSIO) &&
+				    td_ioengine_flagged(td, FIO_FAKEIO)))
 				f->real_file_size = f->io_size + f->file_offset;
 		}
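After the filesetup.c change, an oversized regular file (io_size + file_offset beyond real_file_size) can be handled in one of three ways. Going by the commit title, the intent is that diskless engines such as 'null' and 'filecreate' get their size recorded rather than being left unchanged. The sketch below isolates that decision as a pure function. The enum and function names are hypothetical. The booleans stand in for the FIO_DISKLESSIO and FIO_FAKEIO engine flags and the create_on_open option.

```c
#include <stdbool.h>

/* Hypothetical names for the three outcomes of the check in setup_files(). */
enum size_action {
	SIZE_KEEP,	/* leave real_file_size unchanged */
	SIZE_EXTEND,	/* mark the file to be extended on disk */
	SIZE_ADJUST,	/* only record the larger size */
};

/*
 * Decision after the patch, assuming the file is FIO_TYPE_FILE and
 * io_size + file_offset already exceeds real_file_size.
 */
static enum size_action oversize_action(bool diskless, bool fakeio,
					bool create_on_open)
{
	if (!diskless && !create_on_open)
		return SIZE_EXTEND;
	/* The patch's "!diskless || (diskless && fakeio)" reduces to this. */
	if (!diskless || fakeio)
		return SIZE_ADJUST;
	return SIZE_KEEP;
}
```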
 	}


* Recent changes (master)
@ 2019-09-24 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-24 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 92f75708b530989fdb13b50be6604f44b80d038d:

  Fio 3.16 (2019-09-19 19:01:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3982ec03ab0c125d01b62876f5139b2c07082c7a:

  verify: check that the block size is big enough (2019-09-24 02:43:39 -0600)

----------------------------------------------------------------
Anatol Pomozov (1):
      Fix compilation error with gfio

Jens Axboe (2):
      Merge branch 'master' of https://github.com/anatol/fio
      verify: check that the block size is big enough

 gclient.c | 2 +-
 verify.c  | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/gclient.c b/gclient.c
index 64324177..d8dc62d2 100644
--- a/gclient.c
+++ b/gclient.c
@@ -330,7 +330,7 @@ static void gfio_update_thread_status_all(struct gui *ui, char *status_message,
 	static char message[100];
 	const char *m = message;
 
-	strncpy(message, sizeof(message), "%s", status_message);
+	snprintf(message, sizeof(message), "%s", status_message);
 	gtk_progress_bar_set_text(GTK_PROGRESS_BAR(ui->thread_status_pb), m);
 	gtk_progress_bar_set_fraction(GTK_PROGRESS_BAR(ui->thread_status_pb), perc / 100.0);
 	gtk_widget_queue_draw(ui->window);
diff --git a/verify.c b/verify.c
index 48ba051d..37d2be8d 100644
--- a/verify.c
+++ b/verify.c
@@ -1191,6 +1191,10 @@ static void populate_hdr(struct thread_data *td, struct io_u *io_u,
 
 	fill_hdr(td, io_u, hdr, header_num, header_len, io_u->rand_seed);
 
+	if (header_len <= hdr_size(td, hdr)) {
+		td_verror(td, EINVAL, "Blocksize too small");
+		return;
+	}
 	data_len = header_len - hdr_size(td, hdr);
 
 	data = p + hdr_size(td, hdr);
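The verify.c guard rejects a block that cannot hold the verify header plus at least one byte of data. It must run before the subtraction: if the lengths are unsigned, header_len - hdr_size wraps around to a huge value instead of going negative. A minimal sketch of the same check, using hypothetical names rather than fio's:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Compute the payload length of a verify block, refusing blocks that are
 * no larger than the header (the patch's "Blocksize too small" case).
 * Checking first avoids unsigned wrap-around in block_len - hdr_len.
 */
static bool payload_len(size_t block_len, size_t hdr_len, size_t *out)
{
	if (block_len <= hdr_len)
		return false;
	*out = block_len - hdr_len;
	return true;
}
```

As in the patch, the comparison is <=, so a block exactly the size of the header is also refused because it would carry no data.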


* Recent changes (master)
@ 2019-09-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ae3ae8367271e6df7be5eae37c15c00e6c2dc76e:

  Merge branch 'fio_reset_sqe' of https://github.com/anarazel/fio (2019-09-13 14:56:16 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 92f75708b530989fdb13b50be6604f44b80d038d:

  Fio 3.16 (2019-09-19 19:01:52 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      engines/io_uring: remove debug printf
      Fio 3.16

 FIO-VERSION-GEN    | 2 +-
 engines/io_uring.c | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 350da551..d5cec22e 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.15
+DEF_VER=fio-3.16
 
 LF='
 '
diff --git a/engines/io_uring.c b/engines/io_uring.c
index e5edfcd2..ef56345b 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -562,7 +562,6 @@ static int fio_ioring_post_init(struct thread_data *td)
 		return 1;
 	}
 
-	printf("files=%d\n", o->registerfiles);
 	if (o->registerfiles) {
 		err = fio_ioring_register_files(td);
 		if (err) {


* Recent changes (master)
@ 2019-09-14 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4c29a34fcc8cae333ec8b7af7657495745153b44:

  Merge branch 'ioring_add_sync_file_range' of https://github.com/anarazel/fio (2019-09-12 14:24:23 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ae3ae8367271e6df7be5eae37c15c00e6c2dc76e:

  Merge branch 'fio_reset_sqe' of https://github.com/anarazel/fio (2019-09-13 14:56:16 -0600)

----------------------------------------------------------------
Andres Freund (1):
      engines/io_uring: Fully clear out previous SQE contents.

Jens Axboe (1):
      Merge branch 'fio_reset_sqe' of https://github.com/anarazel/fio

 engines/io_uring.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 53cb60c5..e5edfcd2 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -152,15 +152,16 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	struct io_uring_sqe *sqe;
 
 	sqe = &ld->sqes[io_u->index];
+
+	/* zero out fields not used in this submission */
+	memset(sqe, 0, sizeof(*sqe));
+
 	if (o->registerfiles) {
 		sqe->fd = f->engine_pos;
 		sqe->flags = IOSQE_FIXED_FILE;
 	} else {
 		sqe->fd = f->fd;
-		sqe->flags = 0;
 	}
-	sqe->ioprio = 0;
-	sqe->buf_index = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
 		if (o->fixedbufs) {
@@ -187,7 +188,6 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->sync_range_flags = td->o.sync_file_range;
 			sqe->opcode = IORING_OP_SYNC_FILE_RANGE;
 		} else {
-			sqe->fsync_flags = 0;
 			if (io_u->ddir == DDIR_DATASYNC)
 				sqe->fsync_flags |= IORING_FSYNC_DATASYNC;
 			sqe->opcode = IORING_OP_FSYNC;
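The io_uring change zeroes each SQE before filling it in. Slots in the preallocated sqes array are reused across different request types, and any field a new request does not set would otherwise keep the previous request's value. The sketch below shows the pattern on a simplified, hypothetical struct; it is not the kernel's struct io_uring_sqe. The opcode value is likewise an illustrative constant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode value, for illustration only. */
#define FAKE_OP_FSYNC 3

/* Simplified, hypothetical stand-in for struct io_uring_sqe. */
struct fake_sqe {
	uint8_t  opcode;
	uint8_t  flags;
	int32_t  fd;
	uint64_t off;
	uint32_t len;
	uint32_t op_flags;	/* e.g. fsync_flags or sync_range_flags */
	uint64_t user_data;
};

/* Prepare a reused slot: clear it first so stale fields cannot leak in. */
static void prep_fsync(struct fake_sqe *sqe, int fd, uint64_t tag)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = FAKE_OP_FSYNC;
	sqe->fd = fd;
	sqe->user_data = tag;
}
```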


* Recent changes (master)
@ 2019-09-13 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-13 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 27f436d9f72a9d2d3da3adfdf712757152eab29e:

  engines/io_uring: use its own option group (2019-09-05 09:15:41 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4c29a34fcc8cae333ec8b7af7657495745153b44:

  Merge branch 'ioring_add_sync_file_range' of https://github.com/anarazel/fio (2019-09-12 14:24:23 -0600)

----------------------------------------------------------------
Andres Freund (2):
      engines/io_uring: Handle EINTR.
      engines/io_uring: Add support for sync_file_range.

Jens Axboe (3):
      engines/io_uring: fix crash with registerfiles=1
      Merge branch 'fix_iouring_eintr' of https://github.com/anarazel/fio
      Merge branch 'ioring_add_sync_file_range' of https://github.com/anarazel/fio

Vincent Fu (2):
      doc: clarify what --alloc-size does
      filesetup: honor the offset option

 HOWTO              |  4 ++--
 engines/io_uring.c | 25 +++++++++++++++++--------
 filesetup.c        |  2 +-
 fio.1              |  4 ++--
 4 files changed, 22 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6b449e97..4fef1504 100644
--- a/HOWTO
+++ b/HOWTO
@@ -222,8 +222,8 @@ Command line options
 
 .. option:: --alloc-size=kb
 
-	Set the internal smalloc pool size to `kb` in KiB.  The
-	``--alloc-size`` switch allows one to use a larger pool size for smalloc.
+	Allocate additional internal smalloc pools of size `kb` in KiB.  The
+	``--alloc-size`` option increases shared memory set aside for use by fio.
 	If running large jobs with randommap enabled, fio can run out of memory.
 	Smalloc is an internal allocator for shared structures from a fixed size
 	memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 10cfe9f2..53cb60c5 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -181,10 +181,17 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		}
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
-		sqe->fsync_flags = 0;
-		if (io_u->ddir == DDIR_DATASYNC)
-			sqe->fsync_flags |= IORING_FSYNC_DATASYNC;
-		sqe->opcode = IORING_OP_FSYNC;
+		if (io_u->ddir == DDIR_SYNC_FILE_RANGE) {
+			sqe->off = f->first_write;
+			sqe->len = f->last_write - f->first_write;
+			sqe->sync_range_flags = td->o.sync_file_range;
+			sqe->opcode = IORING_OP_SYNC_FILE_RANGE;
+		} else {
+			sqe->fsync_flags = 0;
+			if (io_u->ddir == DDIR_DATASYNC)
+				sqe->fsync_flags |= IORING_FSYNC_DATASYNC;
+			sqe->opcode = IORING_OP_FSYNC;
+		}
 	}
 
 	sqe->user_data = (unsigned long) io_u;
@@ -259,7 +266,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 			r = io_uring_enter(ld, 0, actual_min,
 						IORING_ENTER_GETEVENTS);
 			if (r < 0) {
-				if (errno == EAGAIN)
+				if (errno == EAGAIN || errno == EINTR)
 					continue;
 				td_verror(td, errno, "io_uring_enter");
 				break;
@@ -370,7 +377,7 @@ static int fio_ioring_commit(struct thread_data *td)
 			io_u_mark_submit(td, ret);
 			continue;
 		} else {
-			if (errno == EAGAIN) {
+			if (errno == EAGAIN || errno == EINTR) {
 				ret = fio_ioring_cqring_reap(td, 0, ld->queued);
 				if (ret)
 					continue;
@@ -555,6 +562,7 @@ static int fio_ioring_post_init(struct thread_data *td)
 		return 1;
 	}
 
+	printf("files=%d\n", o->registerfiles);
 	if (o->registerfiles) {
 		err = fio_ioring_register_files(td);
 		if (err) {
@@ -613,7 +621,7 @@ static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 
-	if (!o->registerfiles)
+	if (!ld || !o->registerfiles)
 		return generic_open_file(td, f);
 
 	f->fd = ld->fds[f->engine_pos];
@@ -622,9 +630,10 @@ static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 
 static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 {
+	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 
-	if (!o->registerfiles)
+	if (!ld || !o->registerfiles)
 		return generic_close_file(td, f);
 
 	f->fd = -1;
diff --git a/filesetup.c b/filesetup.c
index 7904d187..b8d1d838 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1047,7 +1047,7 @@ int setup_files(struct thread_data *td)
 			 * doesn't divide nicely with the min blocksize,
 			 * make the first files bigger.
 			 */
-			f->io_size = fs;
+			f->io_size = fs - f->file_offset;
 			if (nr_fs_extra) {
 				nr_fs_extra--;
 				f->io_size += bs;
diff --git a/fio.1 b/fio.1
index e0283f7f..77a2d799 100644
--- a/fio.1
+++ b/fio.1
@@ -112,8 +112,8 @@ only applies to job sections. The reserved *global* section is always
 parsed and used.
 .TP
 .BI \-\-alloc\-size \fR=\fPkb
-Set the internal smalloc pool size to \fIkb\fR in KiB. The
-\fB\-\-alloc\-size\fR switch allows one to use a larger pool size for smalloc.
+Allocate additional internal smalloc pools of size \fIkb\fR in KiB. The
+\fB\-\-alloc\-size\fR option increases shared memory set aside for use by fio.
 If running large jobs with randommap enabled, fio can run out of memory.
 Smalloc is an internal allocator for shared structures from a fixed size
 memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
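The io_uring change in this batch treats EINTR from io_uring_enter() like EAGAIN: a signal interrupted the call, so the call is simply retried. The general pattern is sketched below on plain read(). read_retry() is a hypothetical helper, not part of fio. It retries only on EINTR, because blindly spinning on EAGAIN is only sensible when, as in fio's commit path, something else (reaping completions) makes progress between attempts.

```c
#include <errno.h>
#include <unistd.h>

/* Retry a read() that a signal interrupted before any data was transferred. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
	ssize_t r;

	do {
		r = read(fd, buf, count);
	} while (r < 0 && errno == EINTR);
	return r;
}
```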


* Recent changes (master)
@ 2019-09-06 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-06 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4a479420d50eada0a7b9a972c529d75e2884732d:

  smalloc: use SMALLOC_BPI instead of SMALLOC_BPB in add_pool() (2019-09-03 12:32:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 27f436d9f72a9d2d3da3adfdf712757152eab29e:

  engines/io_uring: use its own option group (2019-09-05 09:15:41 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      engines/io_uring: add support for registered files
      engines/io_uring: use its own option group

 HOWTO              |   6 +++
 engines/io_uring.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++----
 fio.1              |   6 +++
 optgroup.h         |   2 +
 4 files changed, 123 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4201e2e9..6b449e97 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2033,6 +2033,12 @@ with the caveat that when used on the command line, they must come after the
 	map and release for each IO. This is more efficient, and reduces the
 	IO latency as well.
 
+.. option:: registerfiles : [io_uring]
+	With this option, fio registers the set of files being used with the
+	kernel. This avoids the overhead of managing file counts in the kernel,
+	making the submission and completion part more lightweight. Required
+	for the below :option:`sqthread_poll` option.
+
 .. option:: sqthread_poll : [io_uring]
 
 	Normally fio will submit IO by issuing a system call to notify the
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 9bcfec17..10cfe9f2 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -50,6 +50,8 @@ struct ioring_data {
 
 	struct io_u **io_u_index;
 
+	int *fds;
+
 	struct io_sq_ring sq_ring;
 	struct io_uring_sqe *sqes;
 	struct iovec *iovecs;
@@ -69,6 +71,7 @@ struct ioring_options {
 	void *pad;
 	unsigned int hipri;
 	unsigned int fixedbufs;
+	unsigned int registerfiles;
 	unsigned int sqpoll_thread;
 	unsigned int sqpoll_set;
 	unsigned int sqpoll_cpu;
@@ -91,7 +94,7 @@ static struct fio_option options[] = {
 		.off1	= offsetof(struct ioring_options, hipri),
 		.help	= "Use polled IO completions",
 		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
+		.group	= FIO_OPT_G_IOURING,
 	},
 	{
 		.name	= "fixedbufs",
@@ -100,7 +103,16 @@ static struct fio_option options[] = {
 		.off1	= offsetof(struct ioring_options, fixedbufs),
 		.help	= "Pre map IO buffers",
 		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
+		.group	= FIO_OPT_G_IOURING,
+	},
+	{
+		.name	= "registerfiles",
+		.lname	= "Register file set",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct ioring_options, registerfiles),
+		.help	= "Pre-open/register files",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
 	},
 	{
 		.name	= "sqthread_poll",
@@ -109,7 +121,7 @@ static struct fio_option options[] = {
 		.off1	= offsetof(struct ioring_options, sqpoll_thread),
 		.help	= "Offload submission/completion to kernel thread",
 		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
+		.group	= FIO_OPT_G_IOURING,
 	},
 	{
 		.name	= "sqthread_poll_cpu",
@@ -118,7 +130,7 @@ static struct fio_option options[] = {
 		.cb	= fio_ioring_sqpoll_cb,
 		.help	= "What CPU to run SQ thread polling on",
 		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
+		.group	= FIO_OPT_G_IOURING,
 	},
 	{
 		.name	= NULL,
@@ -140,8 +152,13 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	struct io_uring_sqe *sqe;
 
 	sqe = &ld->sqes[io_u->index];
-	sqe->fd = f->fd;
-	sqe->flags = 0;
+	if (o->registerfiles) {
+		sqe->fd = f->engine_pos;
+		sqe->flags = IOSQE_FIXED_FILE;
+	} else {
+		sqe->fd = f->fd;
+		sqe->flags = 0;
+	}
 	sqe->ioprio = 0;
 	sqe->buf_index = 0;
 
@@ -388,6 +405,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 
 		free(ld->io_u_index);
 		free(ld->iovecs);
+		free(ld->fds);
 		free(ld);
 	}
 }
@@ -476,9 +494,50 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	return fio_ioring_mmap(ld, &p);
 }
 
+static int fio_ioring_register_files(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct fio_file *f;
+	unsigned int i;
+	int ret;
+
+	ld->fds = calloc(td->o.nr_files, sizeof(int));
+
+	for_each_file(td, f, i) {
+		ret = generic_open_file(td, f);
+		if (ret)
+			goto err;
+		ld->fds[i] = f->fd;
+		f->engine_pos = i;
+	}
+
+	ret = syscall(__NR_sys_io_uring_register, ld->ring_fd,
+			IORING_REGISTER_FILES, ld->fds, td->o.nr_files);
+	if (ret) {
+err:
+		free(ld->fds);
+		ld->fds = NULL;
+	}
+
+	/*
+	 * Pretend the file is closed again, and really close it if we hit
+	 * an error.
+	 */
+	for_each_file(td, f, i) {
+		if (ret) {
+			int fio_unused ret2;
+			ret2 = generic_close_file(td, f);
+		} else
+			f->fd = -1;
+	}
+
+	return ret;
+}
+
 static int fio_ioring_post_init(struct thread_data *td)
 {
 	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	struct io_u *io_u;
 	int err, i;
 
@@ -496,6 +555,14 @@ static int fio_ioring_post_init(struct thread_data *td)
 		return 1;
 	}
 
+	if (o->registerfiles) {
+		err = fio_ioring_register_files(td);
+		if (err) {
+			td_verror(td, errno, "ioring_register_files");
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
@@ -506,8 +573,19 @@ static unsigned roundup_pow2(unsigned depth)
 
 static int fio_ioring_init(struct thread_data *td)
 {
+	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
 
+	/* sqthread submission requires registered files */
+	if (o->sqpoll_thread)
+		o->registerfiles = 1;
+
+	if (o->registerfiles && td->o.nr_files != td->o.open_files) {
+		log_err("fio: io_uring registered files require nr_files to "
+			"be identical to open_files\n");
+		return 1;
+	}
+
 	ld = calloc(1, sizeof(*ld));
 
 	/* ring depth must be a power-of-2 */
@@ -530,6 +608,29 @@ static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
+static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+
+	if (!o->registerfiles)
+		return generic_open_file(td, f);
+
+	f->fd = ld->fds[f->engine_pos];
+	return 0;
+}
+
+static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
+{
+	struct ioring_options *o = td->eo;
+
+	if (!o->registerfiles)
+		return generic_close_file(td, f);
+
+	f->fd = -1;
+	return 0;
+}
+
 static struct ioengine_ops ioengine = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
@@ -543,8 +644,8 @@ static struct ioengine_ops ioengine = {
 	.getevents		= fio_ioring_getevents,
 	.event			= fio_ioring_event,
 	.cleanup		= fio_ioring_cleanup,
-	.open_file		= generic_open_file,
-	.close_file		= generic_close_file,
+	.open_file		= fio_ioring_open_file,
+	.close_file		= fio_ioring_close_file,
 	.get_file_size		= generic_get_file_size,
 	.options		= options,
 	.option_struct_size	= sizeof(struct ioring_options),
diff --git a/fio.1 b/fio.1
index 3e872bce..e0283f7f 100644
--- a/fio.1
+++ b/fio.1
@@ -1791,6 +1791,12 @@ release them when IO is done. If this option is set, the pages are pre-mapped
 before IO is started. This eliminates the need to map and release for each IO.
 This is more efficient, and reduces the IO latency as well.
 .TP
+.BI (io_uring)registerfiles
+With this option, fio registers the set of files being used with the kernel.
+This avoids the overhead of managing file counts in the kernel, making the
+submission and completion part more lightweight. Required for the below
+sqthread_poll option.
+.TP
 .BI (io_uring)sqthread_poll
 Normally fio will submit IO by issuing a system call to notify the kernel of
 available items in the SQ ring. If this option is set, the act of submitting IO
diff --git a/optgroup.h b/optgroup.h
index 8009bf25..55ef5934 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -64,6 +64,7 @@ enum opt_category_group {
 	__FIO_OPT_G_MMAP,
 	__FIO_OPT_G_ISCSI,
 	__FIO_OPT_G_NBD,
+	__FIO_OPT_G_IOURING,
 	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
@@ -104,6 +105,7 @@ enum opt_category_group {
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
 	FIO_OPT_G_ISCSI         = (1ULL << __FIO_OPT_G_ISCSI),
 	FIO_OPT_G_NBD		= (1ULL << __FIO_OPT_G_NBD),
+	FIO_OPT_G_IOURING	= (1ULL << __FIO_OPT_G_IOURING),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);


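With registerfiles set, fio_ioring_prep() puts the file's index in the registered set (f->engine_pos) into sqe->fd and sets IOSQE_FIXED_FILE. Otherwise it uses the real descriptor. fio_ioring_init() also forces registerfiles on when sqthread_poll is set, and it rejects jobs where nr_files differs from open_files.

The sketch below models both decisions. It uses hypothetical stand-ins (struct sketch_sqe, SKETCH_FIXED_FILE) instead of the real io_uring structures and flag values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's io_uring_sqe or IOSQE_FIXED_FILE. */
#define SKETCH_FIXED_FILE (1U << 0)

struct sketch_sqe {
	int fd;
	unsigned int flags;
};

/* Models the fd/flags choice made in fio_ioring_prep(). */
static void sketch_prep_fd(struct sketch_sqe *sqe, bool registerfiles,
			   int real_fd, unsigned int engine_pos)
{
	if (registerfiles) {
		/* index into the array passed to IORING_REGISTER_FILES */
		sqe->fd = (int) engine_pos;
		sqe->flags = SKETCH_FIXED_FILE;
	} else {
		/* ordinary file descriptor */
		sqe->fd = real_fd;
		sqe->flags = 0;
	}
}

/* Models the option checks in fio_ioring_init(): 0 if valid, 1 on error. */
static int sketch_check_options(bool sqpoll_thread, bool *registerfiles,
				unsigned int nr_files, unsigned int open_files)
{
	/* sqthread submission requires registered files */
	if (sqpoll_thread)
		*registerfiles = true;
	/* every file must be open (and registered) for the whole job */
	if (*registerfiles && nr_files != open_files)
		return 1;
	return 0;
}
```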

* Recent changes (master)
@ 2019-09-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-09-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 971d6a22bad5942234496683d89a2f8deed57172:

  zbd: Improve job zonesize initialization checks (2019-08-29 20:51:17 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4a479420d50eada0a7b9a972c529d75e2884732d:

  smalloc: use SMALLOC_BPI instead of SMALLOC_BPB in add_pool() (2019-09-03 12:32:01 -0600)

----------------------------------------------------------------
Vincent Fu (2):
      smalloc: allocate struct pool array from shared memory
      smalloc: use SMALLOC_BPI instead of SMALLOC_BPB in add_pool()

 smalloc.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/smalloc.c b/smalloc.c
index 125e07bf..fa00f0ee 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -54,7 +54,7 @@ struct block_hdr {
  */
 static const bool enable_smalloc_debug = false;
 
-static struct pool mp[MAX_POOLS];
+static struct pool *mp;
 static unsigned int nr_pools;
 static unsigned int last_pool;
 
@@ -173,7 +173,7 @@ static bool add_pool(struct pool *pool, unsigned int alloc_size)
 	pool->mmap_size = alloc_size;
 
 	pool->nr_blocks = bitmap_blocks;
-	pool->free_blocks = bitmap_blocks * SMALLOC_BPB;
+	pool->free_blocks = bitmap_blocks * SMALLOC_BPI;
 
 	mmap_flags = OS_MAP_ANON;
 #ifdef CONFIG_ESX
@@ -208,6 +208,20 @@ void sinit(void)
 	bool ret;
 	int i;
 
+	/*
+	 * sinit() can be called more than once if alloc-size is
+	 * set. But we want to allocate space for the struct pool
+	 * instances only once.
+	 */
+	if (!mp) {
+		mp = (struct pool *) mmap(NULL,
+			MAX_POOLS * sizeof(struct pool),
+			PROT_READ | PROT_WRITE,
+			OS_MAP_ANON | MAP_SHARED, -1, 0);
+
+		assert(mp != MAP_FAILED);
+	}
+
 	for (i = 0; i < INITIAL_POOLS; i++) {
 		ret = add_pool(&mp[nr_pools], smalloc_pool_size);
 		if (!ret)
@@ -239,6 +253,8 @@ void scleanup(void)
 
 	for (i = 0; i < nr_pools; i++)
 		cleanup_pool(&mp[i]);
+
+	munmap(mp, MAX_POOLS * sizeof(struct pool));
 }
 
 #ifdef SMALLOC_REDZONE


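The smalloc change moves the struct pool array out of a static array and into anonymous MAP_SHARED memory. fio can run jobs as forked processes, and each child gets its own private copy of a static array. Pool bookkeeping updated in one process would therefore presumably not be visible in another.

The sketch below demonstrates that difference with POSIX mmap() and fork(). struct sketch_pool and the helper are hypothetical, and the code assumes a Linux/glibc system where MAP_ANONYMOUS is available.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_MAX_POOLS 16

/* Hypothetical stand-in for smalloc's struct pool; one field kept. */
struct sketch_pool {
	unsigned int free_blocks;
};

/*
 * Allocate a pool array either from anonymous MAP_SHARED memory (as the new
 * sinit() does) or from calloc() (private per process after fork(), like the
 * old static array). A forked child stores `value` in pool 0; the parent
 * returns what it sees there afterwards, or -1 on error.
 */
static long parent_sees_after_child_update(int use_shared, unsigned int value)
{
	size_t len = SKETCH_MAX_POOLS * sizeof(struct sketch_pool);
	struct sketch_pool *mp;
	long seen = -1;
	pid_t pid;

	if (use_shared) {
		mp = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
		if (mp == MAP_FAILED)
			return -1;
	} else {
		mp = calloc(SKETCH_MAX_POOLS, sizeof(*mp));
		if (!mp)
			return -1;
	}
	mp[0].free_blocks = 0;

	pid = fork();
	if (pid == 0) {
		/* child: update the pool bookkeeping, then exit */
		mp[0].free_blocks = value;
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, NULL, 0) == pid)
		seen = mp[0].free_blocks;

	if (use_shared)
		munmap(mp, len);
	else
		free(mp);
	return seen;
}
```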

* Recent changes (master)
@ 2019-08-30 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-30 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 0b288ba164c10009ea9f4a2c737bd29863ebc60c:

  options: allow offset_increment to understand percentages (2019-08-28 13:45:37 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 971d6a22bad5942234496683d89a2f8deed57172:

  zbd: Improve job zonesize initialization checks (2019-08-29 20:51:17 -0600)

----------------------------------------------------------------
Damien Le Moal (8):
      zbd: Cleanup zbd_init()
      zbd: Fix initialization error message
      zbd: Fix error message
      man page: Fix read_beyond_wp description
      man: Improve zonemode=zbd information
      zbd: Add support for zoneskip option
      zbd: Fix job zone size initialization
      zbd: Improve job zonesize initialization checks

Jens Axboe (1):
      zbd: provide empty setup_zbd_zone_mode()

 fio.1                  |  29 ++++++++----
 io_u.c                 |   2 +
 t/zbd/test-zbd-support |  15 ++++++-
 zbd.c                  | 117 ++++++++++++++++++++++++++++++++++++++++++-------
 zbd.h                  |   7 +++
 5 files changed, 143 insertions(+), 27 deletions(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index a06a12da..3e872bce 100644
--- a/fio.1
+++ b/fio.1
@@ -753,7 +753,10 @@ restricted to a single zone.
 .RE
 .TP
 .BI zonerange \fR=\fPint
-Size of a single zone. See also \fBzonesize\fR and \fBzoneskip\fR.
+For \fBzonemode\fR=strided, this is the size of a single zone. See also
+\fBzonesize\fR and \fBzoneskip\fR.
+
+For \fBzonemode\fR=zbd, this parameter is ignored.
 .TP
 .BI zonesize \fR=\fPint
 For \fBzonemode\fR=strided, this is the number of bytes to transfer before
@@ -762,13 +765,21 @@ skipping \fBzoneskip\fR bytes. If this parameter is smaller than
 will be accessed.  If this parameter is larger than \fBzonerange\fR then each
 zone will be accessed multiple times before skipping to the next zone.
 
-For \fBzonemode\fR=zbd, this is the size of a single zone. The \fBzonerange\fR
-parameter is ignored in this mode.
+For \fBzonemode\fR=zbd, this is the size of a single zone. The
+\fBzonerange\fR parameter is ignored in this mode. For a job accessing a
+zoned block device, the specified \fBzonesize\fR must be 0 or equal to the
+device zone size. For a regular block device or file, the specified
+\fBzonesize\fR must be at least 512B.
 .TP
 .BI zoneskip \fR=\fPint
 For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR
-bytes of data have been transferred. This parameter must be zero for
-\fBzonemode\fR=zbd.
+bytes of data have been transferred.
+
+For \fBzonemode\fR=zbd, the \fBzonesize\fR aligned number of bytes to skip
+once a zone is fully written (write workloads) or all written data in the
+zone have been read (read workloads). This parameter is valid only for
+sequential workloads and ignored for random workloads. For read workloads,
+see also \fBread_beyond_wp\fR.
 
 .TP
 .BI read_beyond_wp \fR=\fPbool
@@ -778,10 +789,10 @@ Zoned block devices are block devices that consist of multiple zones. Each
 zone has a type, e.g. conventional or sequential. A conventional zone can be
 written at any offset that is a multiple of the block size. Sequential zones
 must be written sequentially. The position at which a write must occur is
-called the write pointer. A zoned block device can be either drive
-managed, host managed or host aware. For host managed devices the host must
-ensure that writes happen sequentially. Fio recognizes host managed devices
-and serializes writes to sequential zones for these devices.
+called the write pointer. A zoned block device can be either host managed or
+host aware. For host managed devices the host must ensure that writes happen
+sequentially. Fio recognizes host managed devices and serializes writes to
+sequential zones for these devices.
 
 If a read occurs in a sequential zone beyond the write pointer then the zoned
 block device will complete the read without reading any data from the storage
diff --git a/io_u.c b/io_u.c
index 80df2854..94899552 100644
--- a/io_u.c
+++ b/io_u.c
@@ -901,6 +901,8 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 
 	if (td->o.zone_mode == ZONE_MODE_STRIDED)
 		setup_strided_zone_mode(td, io_u);
+	else if (td->o.zone_mode == ZONE_MODE_ZBD)
+		setup_zbd_zone_mode(td, io_u);
 
 	/*
 	 * No log, let the seq/rand engine retrieve the next buflen and
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index ed54a0aa..90f9f87b 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -707,8 +707,9 @@ test42() {
 	grep -q 'Specifying the zone size is mandatory for regular block devices with --zonemode=zbd'
 }
 
-# Check whether fio handles --zonesize=1 correctly.
+# Check whether fio handles --zonesize=1 correctly for regular block devices.
 test43() {
+    [ -n "$is_zbd" ] && return 0
     read_one_block --zonemode=zbd --zonesize=1 |
 	grep -q 'zone size must be at least 512 bytes for --zonemode=zbd'
 }
@@ -742,6 +743,18 @@ test46() {
     check_written $((size * 8)) || return $?
 }
 
+# Check whether fio handles --zonemode=zbd --zoneskip=1 correctly.
+test47() {
+    local bs
+
+    [ -z "$is_zbd" ] && return 0
+    bs=$((logical_block_size))
+    run_one_fio_job --ioengine=psync --rw=write --bs=$bs \
+		    --zonemode=zbd --zoneskip=1		 \
+		    >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'zoneskip 1 is not a multiple of the device zone size' "${logfile}.${test_number}"
+}
+
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
diff --git a/zbd.c b/zbd.c
index 2383c57d..99310c49 100644
--- a/zbd.c
+++ b/zbd.c
@@ -119,6 +119,30 @@ static bool zbd_verify_sizes(void)
 				continue;
 			if (!zbd_is_seq_job(f))
 				continue;
+
+			if (!td->o.zone_size) {
+				td->o.zone_size = f->zbd_info->zone_size;
+				if (!td->o.zone_size) {
+					log_err("%s: invalid 0 zone size\n",
+						f->file_name);
+					return false;
+				}
+			} else if (td->o.zone_size != f->zbd_info->zone_size) {
+				log_err("%s: job parameter zonesize %llu does not match disk zone size %llu.\n",
+					f->file_name, (unsigned long long) td->o.zone_size,
+					(unsigned long long) f->zbd_info->zone_size);
+				return false;
+			}
+
+			if (td->o.zone_skip &&
+			    (td->o.zone_skip < td->o.zone_size ||
+			     td->o.zone_skip % td->o.zone_size)) {
+				log_err("%s: zoneskip %llu is not a multiple of the device zone size %llu.\n",
+					f->file_name, (unsigned long long) td->o.zone_skip,
+					(unsigned long long) td->o.zone_size);
+				return false;
+			}
+
 			zone_idx = zbd_zone_idx(f, f->file_offset);
 			z = &f->zbd_info->zone_info[zone_idx];
 			if (f->file_offset != z->start) {
@@ -312,13 +336,23 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 {
 	uint32_t nr_zones;
 	struct fio_zone_info *p;
-	uint64_t zone_size;
+	uint64_t zone_size = td->o.zone_size;
 	struct zoned_block_device_info *zbd_info = NULL;
 	pthread_mutexattr_t attr;
 	int i;
 
-	zone_size = td->o.zone_size;
-	assert(zone_size);
+	if (zone_size == 0) {
+		log_err("%s: Specifying the zone size is mandatory for regular block devices with --zonemode=zbd\n\n",
+			f->file_name);
+		return 1;
+	}
+
+	if (zone_size < 512) {
+		log_err("%s: zone size must be at least 512 bytes for --zonemode=zbd\n\n",
+			f->file_name);
+		return 1;
+	}
+
 	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -401,8 +435,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	if (td->o.zone_size == 0) {
 		td->o.zone_size = zone_size;
 	} else if (td->o.zone_size != zone_size) {
-		log_info("fio: %s job parameter zonesize %llu does not match disk zone size %llu.\n",
-			 f->file_name, (unsigned long long) td->o.zone_size,
+		log_err("fio: %s job parameter zonesize %llu does not match disk zone size %llu.\n",
+			f->file_name, (unsigned long long) td->o.zone_size,
 			(unsigned long long) zone_size);
 		ret = -EINVAL;
 		goto close;
@@ -549,7 +583,7 @@ static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
 
 	ret = zbd_create_zone_info(td, file);
 	if (ret < 0)
-		td_verror(td, -ret, "BLKREPORTZONE failed");
+		td_verror(td, -ret, "zbd_create_zone_info() failed");
 	return ret;
 }
 
@@ -561,18 +595,8 @@ int zbd_init(struct thread_data *td)
 	for_each_file(td, f, i) {
 		if (f->filetype != FIO_TYPE_BLOCK)
 			continue;
-		if (td->o.zone_size && td->o.zone_size < 512) {
-			log_err("%s: zone size must be at least 512 bytes for --zonemode=zbd\n\n",
-				f->file_name);
+		if (zbd_init_zone_info(td, f))
 			return 1;
-		}
-		if (td->o.zone_size == 0 &&
-		    get_zbd_model(f->file_name) == ZBD_DM_NONE) {
-			log_err("%s: Specifying the zone size is mandatory for regular block devices with --zonemode=zbd\n\n",
-				f->file_name);
-			return 1;
-		}
-		zbd_init_zone_info(td, f);
 	}
 
 	if (!zbd_using_direct_io()) {
@@ -1219,6 +1243,65 @@ bool zbd_unaligned_write(int error_code)
 	return false;
 }
 
+/**
+ * setup_zbd_zone_mode - handle zoneskip as necessary for ZBD drives
+ * @td: FIO thread data.
+ * @io_u: FIO I/O unit.
+ *
+ * For sequential workloads, change the file offset to skip zoneskip bytes when
+ * no more IO can be performed in the current zone.
+ * - For read workloads, zoneskip is applied when the io has reached the end of
+ *   the zone or the zone write position (when td->o.read_beyond_wp is false).
+ * - For write workloads, zoneskip is applied when the zone is full.
+ * This applies only to read and write operations.
+ */
+void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	enum fio_ddir ddir = io_u->ddir;
+	struct fio_zone_info *z;
+	uint32_t zone_idx;
+
+	assert(td->o.zone_mode == ZONE_MODE_ZBD);
+	assert(td->o.zone_size);
+
+	/*
+	 * zone_skip is valid only for sequential workloads.
+	 */
+	if (td_random(td) || !td->o.zone_skip)
+		return;
+
+	/*
+	 * It is time to switch to a new zone if:
+	 * - zone_bytes == zone_size bytes have already been accessed
+	 * - The last position reached the end of the current zone.
+	 * - For reads with td->o.read_beyond_wp == false, the last position
+	 *   reached the zone write pointer.
+	 */
+	zone_idx = zbd_zone_idx(f, f->last_pos[ddir]);
+	z = &f->zbd_info->zone_info[zone_idx];
+
+	if (td->zone_bytes >= td->o.zone_size ||
+	    f->last_pos[ddir] >= (z+1)->start ||
+	    (ddir == DDIR_READ &&
+	     (!td->o.read_beyond_wp) && f->last_pos[ddir] >= z->wp)) {
+		/*
+		 * Skip zones.
+		 */
+		td->zone_bytes = 0;
+		f->file_offset += td->o.zone_size + td->o.zone_skip;
+
+		/*
+		 * Wrap from the beginning, if we exceed the file size
+		 */
+		if (f->file_offset >= f->real_file_size)
+			f->file_offset = get_start_offset(td, f);
+
+		f->last_pos[ddir] = f->file_offset;
+		td->io_skip_bytes += td->o.zone_skip;
+	}
+}
+
 /**
  * zbd_adjust_block - adjust the offset and length as necessary for ZBD drives
  * @td: FIO thread data.
diff --git a/zbd.h b/zbd.h
index 521283b2..e0a7e447 100644
--- a/zbd.h
+++ b/zbd.h
@@ -94,6 +94,7 @@ void zbd_free_zone_info(struct fio_file *f);
 int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
+void setup_zbd_zone_mode(struct thread_data *td, struct io_u *io_u);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 
@@ -147,6 +148,12 @@ static inline char *zbd_write_status(const struct thread_stat *ts)
 static inline void zbd_queue_io_u(struct io_u *io_u,
 				  enum fio_q_status status) {}
 static inline void zbd_put_io_u(struct io_u *io_u) {}
+
+static inline void setup_zbd_zone_mode(struct thread_data *td,
+					struct io_u *io_u)
+{
+}
+
 #endif
 
 #endif /* FIO_ZBD_H */


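For zonemode=zbd, the patch applies two rules to zoneskip:

- zbd_verify_sizes() accepts a non-zero zoneskip only if it is at least one zone and a multiple of the zone size.
- setup_zbd_zone_mode() advances file_offset by zone_size + zone_skip, wrapping back to the job's start offset once it passes the end of the file.

The sketch below models these rules as standalone functions. The names are hypothetical, and zone_size is assumed to be non-zero, which the patch's checks ensure before this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the zoneskip check added to zbd_verify_sizes(). */
static bool sketch_zoneskip_valid(uint64_t zone_skip, uint64_t zone_size)
{
	assert(zone_size);
	if (!zone_skip)
		return true; /* zoneskip unset: nothing to check */
	return zone_skip >= zone_size && zone_skip % zone_size == 0;
}

/*
 * Models the offset update in setup_zbd_zone_mode(): move past the current
 * zone plus zone_skip bytes, wrapping to start_offset past the end of file.
 */
static uint64_t sketch_next_zone_offset(uint64_t file_offset,
					uint64_t zone_size, uint64_t zone_skip,
					uint64_t real_file_size,
					uint64_t start_offset)
{
	file_offset += zone_size + zone_skip;
	if (file_offset >= real_file_size)
		file_offset = start_offset;
	return file_offset;
}
```

With 256 MiB zones, a zoneskip of 1 is rejected. That matches the error message test47 greps for.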

* Recent changes (master)
@ 2019-08-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4e8c82b4e9804c52bf2c78327cc5bfca9d8aedfc:

  nbd: Update for libnbd 0.9.8 (2019-08-15 10:42:01 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0b288ba164c10009ea9f4a2c737bd29863ebc60c:

  options: allow offset_increment to understand percentages (2019-08-28 13:45:37 -0600)

----------------------------------------------------------------
Vincent Fu (2):
      docs: small HOWTO fixes
      options: allow offset_increment to understand percentages

 HOWTO            | 12 ++++++++++--
 cconv.c          |  2 ++
 filesetup.c      | 32 ++++++++++++++++++++++----------
 fio.1            |  4 +++-
 options.c        | 17 +++++++++++++++++
 server.h         |  2 +-
 thread_options.h |  3 +++
 7 files changed, 58 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 41b959d1..4201e2e9 100644
--- a/HOWTO
+++ b/HOWTO
@@ -93,6 +93,12 @@ Command line options
 			Dump info related to I/O rate switching.
 	*compress*
 			Dump info related to log compress/decompress.
+	*steadystate*
+			Dump info related to steadystate detection.
+	*helperthread*
+			Dump info related to the helper thread.
+	*zbd*
+			Dump info related to support for zoned block devices.
 	*?* or *help*
 			Show available debug options.
 
@@ -1246,7 +1252,9 @@ I/O type
 	is incremented for each sub-job (i.e. when :option:`numjobs` option is
 	specified). This option is useful if there are several jobs which are
 	intended to operate on a file in parallel disjoint segments, with even
-	spacing between the starting points.
+	spacing between the starting points. Percentages can be used for this option.
+	If a percentage is given, the generated offset will be aligned to the minimum
+	``blocksize`` or to the value of ``offset_align`` if provided.
 
 .. option:: number_ios=int
 
@@ -2388,7 +2396,7 @@ I/O depth
 	this option can reduce both performance and the :option:`iodepth` achieved.
 
 	This option only applies to I/Os issued for a single job except when it is
-	enabled along with :option:`io_submit_mode`=offload. In offload mode, fio
+	enabled along with :option:`io_submit_mode`\=offload. In offload mode, fio
 	will check for overlap among all I/Os submitted by offload jobs with :option:`serialize_overlap`
 	enabled.
 
diff --git a/cconv.c b/cconv.c
index 0e657246..bff5e34f 100644
--- a/cconv.c
+++ b/cconv.c
@@ -226,6 +226,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->zone_skip = le64_to_cpu(top->zone_skip);
 	o->zone_mode = le32_to_cpu(top->zone_mode);
 	o->lockmem = le64_to_cpu(top->lockmem);
+	o->offset_increment_percent = le32_to_cpu(top->offset_increment_percent);
 	o->offset_increment = le64_to_cpu(top->offset_increment);
 	o->number_ios = le64_to_cpu(top->number_ios);
 
@@ -566,6 +567,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->start_offset_align = __cpu_to_le64(o->start_offset_align);
 	top->start_offset_percent = __cpu_to_le32(o->start_offset_percent);
 	top->trim_backlog = __cpu_to_le64(o->trim_backlog);
+	top->offset_increment_percent = __cpu_to_le32(o->offset_increment_percent);
 	top->offset_increment = __cpu_to_le64(o->offset_increment);
 	top->number_ios = __cpu_to_le64(o->number_ios);
 	top->rate_process = cpu_to_le32(o->rate_process);
diff --git a/filesetup.c b/filesetup.c
index 57eca1bf..7904d187 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -852,16 +852,37 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 
 uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 {
+	bool align = false;
 	struct thread_options *o = &td->o;
 	unsigned long long align_bs;
 	unsigned long long offset;
+	unsigned long long increment;
 
 	if (o->file_append && f->filetype == FIO_TYPE_FILE)
 		return f->real_file_size;
 
+	if (o->offset_increment_percent) {
+		assert(!o->offset_increment);
+		increment = o->offset_increment_percent * f->real_file_size / 100;
+		align = true;
+	} else
+		increment = o->offset_increment;
+
 	if (o->start_offset_percent > 0) {
+		/* calculate the raw offset */
+		offset = (f->real_file_size * o->start_offset_percent / 100) +
+			(td->subjob_number * increment);
+
+		align = true;
+	} else {
+		/* start_offset_percent not set */
+		offset = o->start_offset +
+				td->subjob_number * increment;
+	}
+
+	if (align) {
 		/*
-		 * if offset_align is provided, set initial offset
+		 * if offset_align is provided, use it
 		 */
 		if (fio_option_is_set(o, start_offset_align)) {
 			align_bs = o->start_offset_align;
@@ -870,20 +891,11 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 			align_bs = td_min_bs(td);
 		}
 
-		/* calculate the raw offset */
-		offset = (f->real_file_size * o->start_offset_percent / 100) +
-			(td->subjob_number * o->offset_increment);
-
 		/*
 		 * block align the offset at the next available boundary at
 		 * ceiling(offset / align_bs) * align_bs
 		 */
 		offset = (offset / align_bs + (offset % align_bs != 0)) * align_bs;
-
-	} else {
-		/* start_offset_percent not set */
-		offset = o->start_offset +
-				td->subjob_number * o->offset_increment;
 	}
 
 	return offset;
diff --git a/fio.1 b/fio.1
index 97371d77..a06a12da 100644
--- a/fio.1
+++ b/fio.1
@@ -1015,7 +1015,9 @@ If this is provided, then the real offset becomes `\fBoffset\fR + \fBoffset_incr
 is incremented for each sub\-job (i.e. when \fBnumjobs\fR option is
 specified). This option is useful if there are several jobs which are
 intended to operate on a file in parallel disjoint segments, with even
-spacing between the starting points.
+spacing between the starting points. Percentages can be used for this option.
+If a percentage is given, the generated offset will be aligned to the minimum
+\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.
 .TP
 .BI number_ios \fR=\fPint
 Fio will normally perform I/Os until it has exhausted the size of the region
diff --git a/options.c b/options.c
index 447f231e..2c5bf5e0 100644
--- a/options.c
+++ b/options.c
@@ -1434,6 +1434,22 @@ static int str_offset_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
+static int str_offset_increment_cb(void *data, unsigned long long *__val)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	unsigned long long v = *__val;
+
+	if (parse_is_percent(v)) {
+		td->o.offset_increment = 0;
+		td->o.offset_increment_percent = -1ULL - v;
+		dprint(FD_PARSE, "SET offset_increment_percent %d\n",
+					td->o.offset_increment_percent);
+	} else
+		td->o.offset_increment = v;
+
+	return 0;
+}
+
 static int str_size_cb(void *data, unsigned long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -2084,6 +2100,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "offset_increment",
 		.lname	= "IO offset increment",
 		.type	= FIO_OPT_STR_VAL,
+		.cb	= str_offset_increment_cb,
 		.off1	= offsetof(struct thread_options, offset_increment),
 		.help	= "What is the increment from one offset to the next",
 		.parent = "offset",
diff --git a/server.h b/server.h
index ac713252..de1d7f9b 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 79,
+	FIO_SERVER_VER			= 80,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 14c6969f..ee6e4d6d 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -313,6 +313,7 @@ struct thread_options {
 	int flow_watermark;
 	unsigned int flow_sleep;
 
+	unsigned int offset_increment_percent;
 	unsigned long long offset_increment;
 	unsigned long long number_ios;
 
@@ -599,6 +600,8 @@ struct thread_options_pack {
 	int32_t flow_watermark;
 	uint32_t flow_sleep;
 
+	uint32_t offset_increment_percent;
+	uint32_t pad4;
 	uint64_t offset_increment;
 	uint64_t number_ios;
 


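When offset_increment is given as a percentage, get_start_offset() now computes the per-subjob increment as that percentage of the file size. It then rounds the resulting offset up to the minimum block size, or to offset_align when that is set. The sketch below covers the case where start_offset_percent is unset and passes align_bs in explicitly. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models get_start_offset() for a percentage offset_increment with
 * start_offset_percent unset: increment = percent of the file size,
 * offset = start_offset + subjob_number * increment, then rounded up.
 */
static uint64_t sketch_start_offset(uint64_t start_offset,
				    uint64_t real_file_size,
				    unsigned int increment_percent,
				    unsigned int subjob_number,
				    uint64_t align_bs)
{
	uint64_t increment = increment_percent * real_file_size / 100;
	uint64_t offset = start_offset + subjob_number * increment;

	assert(align_bs);
	/* ceiling(offset / align_bs) * align_bs, as in the patch */
	return (offset / align_bs + (offset % align_bs != 0)) * align_bs;
}
```

For example, take a 1000000-byte file with offset_increment=10% and 4 KiB alignment. Subjob 1 starts at 102400 rather than 100000.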

* Recent changes (master)
@ 2019-08-16 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dc54b6ca3209c2da3df30e96d097f6de29d56d24:

  eta: use struct jobs_eta_packed (2019-08-14 15:16:09 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4e8c82b4e9804c52bf2c78327cc5bfca9d8aedfc:

  nbd: Update for libnbd 0.9.8 (2019-08-15 10:42:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      stat: ensure that struct jobs_eta packs nicely

Richard W.M. Jones (1):
      nbd: Update for libnbd 0.9.8

 configure     |  2 +-
 engines/nbd.c | 33 ++++++++++++---------------------
 server.h      |  2 +-
 stat.h        | 15 +++++++++------
 4 files changed, 23 insertions(+), 29 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index b11b2dce..b174a6fc 100755
--- a/configure
+++ b/configure
@@ -2020,7 +2020,7 @@ print_config "iscsi engine" "$libiscsi"
 
 ##########################################
 # Check if we have libnbd (for NBD support).
-minimum_libnbd=0.9.6
+minimum_libnbd=0.9.8
 if test "$libnbd" = "yes" ; then
   if $(pkg-config --atleast-version=$minimum_libnbd libnbd); then
     libnbd="yes"
diff --git a/engines/nbd.c b/engines/nbd.c
index f8583812..53237929 100644
--- a/engines/nbd.c
+++ b/engines/nbd.c
@@ -152,15 +152,12 @@ static int nbd_init(struct thread_data *td)
 }
 
 /* A command in flight has been completed. */
-static int cmd_completed (unsigned valid_flag, void *vp, int *error)
+static int cmd_completed (void *vp, int *error)
 {
 	struct io_u *io_u;
 	struct nbd_data *nbd_data;
 	struct io_u **completed;
 
-	if (!(valid_flag & LIBNBD_CALLBACK_VALID))
-		return 0;
-
 	io_u = vp;
 	nbd_data = io_u->engine_data;
 
@@ -195,6 +192,8 @@ static enum fio_q_status nbd_queue(struct thread_data *td,
 				   struct io_u *io_u)
 {
 	struct nbd_data *nbd_data = td->io_ops_data;
+	nbd_completion_callback completion = { .callback = cmd_completed,
+					       .user_data = io_u };
 	int r;
 
 	fio_ro_check(td, io_u);
@@ -206,32 +205,24 @@ static enum fio_q_status nbd_queue(struct thread_data *td,
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
-		r = nbd_aio_pread_callback(nbd_data->nbd,
-					   io_u->xfer_buf, io_u->xfer_buflen,
-					   io_u->offset,
-					   cmd_completed, io_u,
-					   0);
+		r = nbd_aio_pread(nbd_data->nbd,
+				  io_u->xfer_buf, io_u->xfer_buflen,
+				  io_u->offset, completion, 0);
 		break;
 	case DDIR_WRITE:
-		r = nbd_aio_pwrite_callback(nbd_data->nbd,
-					    io_u->xfer_buf, io_u->xfer_buflen,
-					    io_u->offset,
-					    cmd_completed, io_u,
-					    0);
+		r = nbd_aio_pwrite(nbd_data->nbd,
+				   io_u->xfer_buf, io_u->xfer_buflen,
+				   io_u->offset, completion, 0);
 		break;
 	case DDIR_TRIM:
-		r = nbd_aio_trim_callback(nbd_data->nbd, io_u->xfer_buflen,
-					  io_u->offset,
-					  cmd_completed, io_u,
-					  0);
+		r = nbd_aio_trim(nbd_data->nbd, io_u->xfer_buflen,
+				 io_u->offset, completion, 0);
 		break;
 	case DDIR_SYNC:
 		/* XXX We could probably also handle
 		 * DDIR_SYNC_FILE_RANGE with a bit of effort.
 		 */
-		r = nbd_aio_flush_callback(nbd_data->nbd,
-					   cmd_completed, io_u,
-					   0);
+		r = nbd_aio_flush(nbd_data->nbd, completion, 0);
 		break;
 	default:
 		io_u->error = EINVAL;
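The libnbd 0.9.8 update replaces the separate `callback, user_data` arguments with a single completion struct passed by value. In the patch this is `nbd_completion_callback` with `.callback` and `.user_data`. The sketch below mimics that shape with hypothetical types (`demo_completion`, `demo_req`, `demo_aio_submit`) so it compiles without libnbd. It is not the libnbd API, and the real completion struct may carry members not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion struct: the callback and the opaque
 * pointer handed back to it travel together as one value. */
typedef struct {
	int (*callback)(void *user_data, int *error);
	void *user_data;
} demo_completion;

/* Hypothetical request type standing in for fio's io_u. */
struct demo_req {
	int done;
	int error;
};

/* Same shape as cmd_completed() after the patch: no valid_flag argument. */
static int demo_cmd_completed(void *vp, int *error)
{
	struct demo_req *req = vp;

	req->done = 1;
	req->error = *error;
	return 0;	/* this sketch does not interpret the return value */
}

/* Hypothetical submit call: a real async library would queue the request and
 * run the completion later; this sketch completes it immediately. */
static int demo_aio_submit(demo_completion completion, int status)
{
	int err = status;

	assert(completion.callback != NULL);
	completion.callback(completion.user_data, &err);
	return 0;
}
```

In the patch, the struct is built once in `nbd_queue()` and reused for read, write, trim and flush. `cmd_completed()` drops the `LIBNBD_CALLBACK_VALID` check because the callback no longer receives a flag argument.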
diff --git a/server.h b/server.h
index abb23bad..ac713252 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 78,
+	FIO_SERVER_VER			= 79,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.h b/stat.h
index c209ab6c..ba7e290d 100644
--- a/stat.h
+++ b/stat.h
@@ -258,13 +258,15 @@ struct thread_stat {
 	uint32_t nr_pending;						\
 	uint32_t nr_setting_up;						\
 									\
-	uint64_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];	\
+	uint64_t m_rate[DDIR_RWDIR_CNT];				\
+	uint64_t t_rate[DDIR_RWDIR_CNT];				\
 	uint64_t rate[DDIR_RWDIR_CNT];					\
-	uint32_t m_iops[DDIR_RWDIR_CNT] __attribute__((packed));	\
-	uint32_t t_iops[DDIR_RWDIR_CNT] __attribute__((packed));	\
-	uint32_t iops[DDIR_RWDIR_CNT] __attribute__((packed));		\
-	uint64_t elapsed_sec __attribute__((packed));			\
-	uint64_t eta_sec __attribute__((packed));			\
+	uint32_t m_iops[DDIR_RWDIR_CNT];				\
+	uint32_t t_iops[DDIR_RWDIR_CNT];				\
+	uint32_t iops[DDIR_RWDIR_CNT];					\
+	uint32_t pad;							\
+	uint64_t elapsed_sec;						\
+	uint64_t eta_sec;						\
 	uint32_t is_pow2;						\
 	uint32_t unit_base;						\
 									\
@@ -276,6 +278,7 @@ struct thread_stat {
 	 * Network 'copy' of run_str[]					\
 	 */								\
 	uint32_t nr_threads;						\
+	uint32_t pad2;							\
 	uint8_t run_str[];						\
 }
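With `pad` and `pad2` in place, every 64-bit member starts on an 8-byte boundary and `run_str[]` starts at a multiple of 8. As a result, the packed and naturally aligned variants get identical layouts. The sketch below reproduces the field list from the hunk under hypothetical names (`demo_eta`, `DEMO_ETA_FIELDS`). It assumes `DDIR_RWDIR_CNT` is 3 (read, write, trim) and a typical LP64 ABI such as x86-64; the offsets in the comments follow from those assumptions. Without `pad2`, `run_str` would start at offset 164, so the aligned variant would round up to 168 bytes while the packed one stays at 164. That is the same 168 vs 164 split reported in the Windows build thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RWDIR_CNT 3	/* assumption: mirrors DDIR_RWDIR_CNT */

/* One field list, instantiated twice below (the JOBS_ETA technique). */
#define DEMO_ETA_FIELDS {						\
	uint32_t nr_running;						\
	uint32_t nr_ramp;						\
	uint32_t nr_pending;						\
	uint32_t nr_setting_up;		/* offsets 0..15 */		\
	uint64_t m_rate[DEMO_RWDIR_CNT];				\
	uint64_t t_rate[DEMO_RWDIR_CNT];				\
	uint64_t rate[DEMO_RWDIR_CNT];	/* 16..87 */			\
	uint32_t m_iops[DEMO_RWDIR_CNT];				\
	uint32_t t_iops[DEMO_RWDIR_CNT];				\
	uint32_t iops[DEMO_RWDIR_CNT];	/* 88..123 */			\
	uint32_t pad;			/* 124: aligns elapsed_sec */	\
	uint64_t elapsed_sec;		/* 128 */			\
	uint64_t eta_sec;		/* 136 */			\
	uint32_t is_pow2;						\
	uint32_t unit_base;						\
	uint32_t sig_figs;						\
	uint32_t files_open;						\
	uint32_t nr_threads;		/* 160 */			\
	uint32_t pad2;			/* 164: run_str lands on 168 */	\
	uint8_t run_str[];		/* 168 */			\
}

struct demo_eta DEMO_ETA_FIELDS;
struct demo_eta_packed DEMO_ETA_FIELDS __attribute__((packed));

/* The same check eta.c performs with compiletime_assert(). */
static_assert(sizeof(struct demo_eta) == sizeof(struct demo_eta_packed),
	      "demo_eta has holes or tail padding");
static_assert(offsetof(struct demo_eta, run_str) == 168,
	      "run_str should start on an 8-byte boundary");
```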
 



* Re: Recent changes (master)
  2019-08-15 15:17       ` Jens Axboe
@ 2019-08-15 15:35         ` Rebecca Cran
  0 siblings, 0 replies; 1305+ messages in thread
From: Rebecca Cran @ 2019-08-15 15:35 UTC (permalink / raw)
  To: Jens Axboe, fio

On 8/15/2019 9:17 AM, Jens Axboe wrote:

>
> Can you check current -git? I took a look at the struct and cleaned it a
> bit.
>
> http://git.kernel.dk/cgit/fio/commit/?id=cd8920a40ef04625d9e76f63fa09fcaac9e9977e

That works. Thanks!


-- 
Rebecca Cran




* Re: Recent changes (master)
  2019-08-15 15:05     ` Rebecca Cran
@ 2019-08-15 15:17       ` Jens Axboe
  2019-08-15 15:35         ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2019-08-15 15:17 UTC (permalink / raw)
  To: Rebecca Cran, fio

On 8/15/19 9:05 AM, Rebecca Cran wrote:
> On 8/15/2019 8:28 AM, Jens Axboe wrote:
> 
>>
>> I did notice, I just didn't see what was wrong here. Can you remove the
>> assert and check the struct sizes of jobs_eta and jobs_eta_packed?
> 
> 
> jobs_eta is 168 bytes; jobs_eta_packed is 164.
> 
> Adding an extra 32-bit field before run_str fixes the error.

Can you check current -git? I took a look at the struct and cleaned it a
bit.

http://git.kernel.dk/cgit/fio/commit/?id=cd8920a40ef04625d9e76f63fa09fcaac9e9977e

-- 
Jens Axboe




* Re: Recent changes (master)
  2019-08-15 14:28   ` Jens Axboe
@ 2019-08-15 15:05     ` Rebecca Cran
  2019-08-15 15:17       ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Rebecca Cran @ 2019-08-15 15:05 UTC (permalink / raw)
  To: Jens Axboe, fio

On 8/15/2019 8:28 AM, Jens Axboe wrote:

>
> I did notice, I just didn't see what was wrong here. Can you remove the
> assert and check the struct sizes of jobs_eta and jobs_eta_packed?


jobs_eta is 168 bytes; jobs_eta_packed is 164.

Adding an extra 32-bit field before run_str fixes the error.


-- 
Rebecca Cran




* Re: Recent changes (master)
  2019-08-15 14:27 ` Rebecca Cran
@ 2019-08-15 14:28   ` Jens Axboe
  2019-08-15 15:05     ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2019-08-15 14:28 UTC (permalink / raw)
  To: Rebecca Cran, fio

On 8/15/19 8:27 AM, Rebecca Cran wrote:
> On 8/15/2019 6:00 AM, Jens Axboe wrote:
>>    void print_status_init(int thr_number)
>>    {
>> +	struct jobs_eta_packed jep;
>> +
>> +	compiletime_assert(sizeof(struct jobs_eta) == sizeof(jep), "jobs_eta");
>> +
> 
> This broke the build on Windows:
> 
> 
>       CC eta.o
> In file included from fio.h:17:0,
>                    from eta.c:12:
> eta.c: In function ‘print_status_init’:
> compiler/compiler.h:38:44: error: static assertion failed: "jobs_eta"
>    #define compiletime_assert(condition, msg) _Static_assert(condition, msg)
>                                               ^
> eta.c:738:2: note: in expansion of macro ‘compiletime_assert’
>     compiletime_assert(sizeof(struct jobs_eta) == sizeof(jep), "jobs_eta");
>     ^

I did notice, I just didn't see what was wrong here. Can you remove the
assert and check the struct sizes of jobs_eta and jobs_eta_packed?

-- 
Jens Axboe




* Re: Recent changes (master)
  2019-08-15 12:00 Jens Axboe
@ 2019-08-15 14:27 ` Rebecca Cran
  2019-08-15 14:28   ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Rebecca Cran @ 2019-08-15 14:27 UTC (permalink / raw)
  To: Jens Axboe, fio

On 8/15/2019 6:00 AM, Jens Axboe wrote:
>   void print_status_init(int thr_number)
>   {
> +	struct jobs_eta_packed jep;
> +
> +	compiletime_assert(sizeof(struct jobs_eta) == sizeof(jep), "jobs_eta");
> +

This broke the build on Windows:


     CC eta.o
In file included from fio.h:17:0,
                  from eta.c:12:
eta.c: In function ‘print_status_init’:
compiler/compiler.h:38:44: error: static assertion failed: "jobs_eta"
  #define compiletime_assert(condition, msg) _Static_assert(condition, msg)
                                             ^
eta.c:738:2: note: in expansion of macro ‘compiletime_assert’
   compiletime_assert(sizeof(struct jobs_eta) == sizeof(jep), "jobs_eta");
   ^

-- 
Rebecca Cran




* Recent changes (master)
@ 2019-08-15 12:00 Jens Axboe
  2019-08-15 14:27 ` Rebecca Cran
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2019-08-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 937ec971236d98089b63217635294c788ea00bce:

  t/zbd: Fix I/O bytes rounding errors (2019-08-08 21:36:32 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dc54b6ca3209c2da3df30e96d097f6de29d56d24:

  eta: use struct jobs_eta_packed (2019-08-14 15:16:09 -0600)

----------------------------------------------------------------
Bart Van Assche (7):
      zbd: Declare local functions 'static'
      zbd: Improve robustness of unit tests
      Optimize the code that copies strings
      Refine packed annotations in stat.h
      Verify the absence of holes in struct jobs_eta at compile time
      Restore type checking in calc_thread_status()
      Makefile: Add 'fulltest' target

Jens Axboe (1):
      eta: use struct jobs_eta_packed

 Makefile                            | 17 ++++++++++++
 cconv.c                             |  7 +++--
 client.c                            |  5 ++--
 diskutil.c                          |  9 +++----
 engines/net.c                       |  6 ++---
 engines/sg.c                        |  4 +--
 eta.c                               | 13 ++++-----
 filesetup.c                         |  6 ++---
 gclient.c                           |  4 +--
 init.c                              | 19 ++++---------
 ioengines.c                         |  3 +--
 options.c                           |  3 +--
 parse.c                             |  6 ++---
 server.c                            | 26 ++++++++----------
 stat.c                              | 15 ++++++-----
 stat.h                              | 54 ++++++++++++++++++++-----------------
 t/zbd/run-tests-against-zoned-nullb |  2 +-
 t/zbd/test-zbd-support              |  4 +--
 verify.c                            |  3 +--
 zbd.c                               |  6 ++---
 20 files changed, 106 insertions(+), 106 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index fe02bf1d..7c21ef83 100644
--- a/Makefile
+++ b/Makefile
@@ -531,6 +531,21 @@ doc: tools/plot/fio2gnuplot.1
 test: fio
 	./fio --minimal --thread --exitall_on_error --runtime=1s --name=nulltest --ioengine=null --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifyfstest --filename=fiotestfile.tmp --unlink=1 --rw=write --verify=crc32c --verify_state_save=0 --size=16K
 
+fulltest:
+	sudo modprobe null_blk &&				 	\
+	if [ ! -e /usr/include/libzbc/zbc.h ]; then			\
+	  git clone https://github.com/hgst/libzbc &&		 	\
+	  (cd libzbc &&						 	\
+	   ./autogen.sh &&					 	\
+	   ./configure --prefix=/usr &&				 	\
+	   make -j &&						 	\
+	   sudo make install)						\
+	fi &&					 			\
+	sudo t/zbd/run-tests-against-regular-nullb &&		 	\
+	if [ -e /sys/module/null_blk/parameters/zoned ]; then		\
+		sudo t/zbd/run-tests-against-zoned-nullb;	 	\
+	fi
+
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
 	$(INSTALL) $(PROGS) $(SCRIPTS) $(DESTDIR)$(bindir)
@@ -541,3 +556,5 @@ install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 644 $(SRCDIR)/tools/hist/fiologparser_hist.py.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 755 -d $(DESTDIR)$(sharedir)
 	$(INSTALL) -m 644 $(SRCDIR)/tools/plot/*gpm $(DESTDIR)$(sharedir)/
+
+.PHONY: test fulltest
diff --git a/cconv.c b/cconv.c
index 50e45c63..0e657246 100644
--- a/cconv.c
+++ b/cconv.c
@@ -13,10 +13,9 @@ static void string_to_cpu(char **dst, const uint8_t *src)
 
 static void __string_to_net(uint8_t *dst, const char *src, size_t dst_size)
 {
-	if (src) {
-		dst[dst_size - 1] = '\0';
-		strncpy((char *) dst, src, dst_size - 1);
-	} else
+	if (src)
+		snprintf((char *) dst, dst_size, "%s", src);
+	else
 		dst[0] = '\0';
 }
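This series replaces `strncpy()` plus a manual terminator with `snprintf(dst, size, "%s", src)`. When `src` is at least as long as the limit, `strncpy()` does not NUL-terminate, and otherwise it zero-fills the rest of the buffer. `snprintf()` always terminates for a nonzero size and returns the length it would have written, which makes truncation detectable. The helper below, `copy_str()`, is a hypothetical name used only for illustration; it keeps the NULL handling of `__string_to_net()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity dst_size), always NUL-terminating.
 * Returns true if the whole string fit, false if it was truncated. */
static bool copy_str(char *dst, size_t dst_size, const char *src)
{
	int n;

	assert(dst_size > 0);
	if (!src) {
		dst[0] = '\0';	/* same NULL handling as __string_to_net() */
		return true;
	}
	/* "%s" keeps src from being interpreted as a format string */
	n = snprintf(dst, dst_size, "%s", src);
	return n >= 0 && (size_t)n < dst_size;
}
```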
 
diff --git a/client.c b/client.c
index 43cfbd43..e0047af0 100644
--- a/client.c
+++ b/client.c
@@ -520,7 +520,7 @@ static void probe_client(struct fio_client *client)
 
 	sname = server_name(client, buf, sizeof(buf));
 	memset(pdu.server, 0, sizeof(pdu.server));
-	strncpy((char *) pdu.server, sname, sizeof(pdu.server) - 1);
+	snprintf((char *) pdu.server, sizeof(pdu.server), "%s", sname);
 
 	fio_net_send_cmd(client->fd, FIO_NET_CMD_PROBE, &pdu, sizeof(pdu), &tag, &client->cmd_list);
 }
@@ -574,7 +574,8 @@ static int fio_client_connect_sock(struct fio_client *client)
 
 	memset(addr, 0, sizeof(*addr));
 	addr->sun_family = AF_UNIX;
-	strncpy(addr->sun_path, client->hostname, sizeof(addr->sun_path) - 1);
+	snprintf(addr->sun_path, sizeof(addr->sun_path), "%s",
+		 client->hostname);
 
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0) {
diff --git a/diskutil.c b/diskutil.c
index 7be4c022..f0744015 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -181,8 +181,7 @@ static int get_device_numbers(char *file_name, int *maj, int *min)
 		/*
 		 * must be a file, open "." in that path
 		 */
-		tempname[PATH_MAX - 1] = '\0';
-		strncpy(tempname, file_name, PATH_MAX - 1);
+		snprintf(tempname, ARRAY_SIZE(tempname), "%s", file_name);
 		p = dirname(tempname);
 		if (stat(p, &st)) {
 			perror("disk util stat");
@@ -314,7 +313,8 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 		sfree(du);
 		return NULL;
 	}
-	strncpy((char *) du->dus.name, basename(path), FIO_DU_NAME_SZ - 1);
+	snprintf((char *) du->dus.name, ARRAY_SIZE(du->dus.name), "%s",
+		 basename(path));
 	du->sysfs_root = strdup(path);
 	du->major = majdev;
 	du->minor = mindev;
@@ -435,8 +435,7 @@ static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 			log_err("unknown sysfs layout\n");
 			return NULL;
 		}
-		tmp[PATH_MAX - 1] = '\0';
-		strncpy(tmp, p, PATH_MAX - 1);
+		snprintf(tmp, ARRAY_SIZE(tmp), "%s", p);
 		sprintf(path, "%s", tmp);
 	}
 
diff --git a/engines/net.c b/engines/net.c
index ca6fb344..91f25774 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -1105,8 +1105,7 @@ static int fio_netio_setup_connect_unix(struct thread_data *td,
 	struct sockaddr_un *soun = &nd->addr_un;
 
 	soun->sun_family = AF_UNIX;
-	memset(soun->sun_path, 0, sizeof(soun->sun_path));
-	strncpy(soun->sun_path, path, sizeof(soun->sun_path) - 1);
+	snprintf(soun->sun_path, sizeof(soun->sun_path), "%s", path);
 	return 0;
 }
 
@@ -1135,9 +1134,8 @@ static int fio_netio_setup_listen_unix(struct thread_data *td, const char *path)
 
 	mode = umask(000);
 
-	memset(addr, 0, sizeof(*addr));
 	addr->sun_family = AF_UNIX;
-	strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
+	snprintf(addr->sun_path, sizeof(addr->sun_path), "%s", path);
 	unlink(path);
 
 	len = sizeof(addr->sun_family) + strlen(path) + 1;
diff --git a/engines/sg.c b/engines/sg.c
index c46b9aba..a1a6de4c 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -1181,8 +1181,8 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 	}
 
 	if (!(hdr->info & SG_INFO_CHECK) && !strlen(msg))
-		strncpy(msg, "SG Driver did not report a Host, Driver or Device check",
-			MAXERRDETAIL - 1);
+		snprintf(msg, MAXERRDETAIL, "%s",
+			 "SG Driver did not report a Host, Driver or Device check");
 
 	return msg;
 }
diff --git a/eta.c b/eta.c
index 647a1bdd..9950ef30 100644
--- a/eta.c
+++ b/eta.c
@@ -392,9 +392,6 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	static unsigned long long disp_io_iops[DDIR_RWDIR_CNT];
 	static struct timespec rate_prev_time, disp_prev_time;
 
-	void *je_rate = (void *) je->rate;
-	void *je_iops = (void *) je->iops;
-
 	if (!force) {
 		if (!(output_format & FIO_OUTPUT_NORMAL) &&
 		    f_out == stdout)
@@ -510,7 +507,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 	if (write_bw_log && rate_time > bw_avg_time && !in_ramp_time(td)) {
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
-				je_rate);
+				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
 		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0);
 		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0);
@@ -522,8 +519,8 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	if (!force && !eta_time_within_slack(disp_time))
 		return false;
 
-	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je_rate);
-	calc_iops(unified_rw_rep, disp_time, io_iops, disp_io_iops, je_iops);
+	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je->rate);
+	calc_iops(unified_rw_rep, disp_time, io_iops, disp_io_iops, je->iops);
 
 	memcpy(&disp_prev_time, &now, sizeof(now));
 
@@ -736,6 +733,10 @@ void print_thread_status(void)
 
 void print_status_init(int thr_number)
 {
+	struct jobs_eta_packed jep;
+
+	compiletime_assert(sizeof(struct jobs_eta) == sizeof(jep), "jobs_eta");
+
 	DRD_IGNORE_VAR(__run_str);
 	__run_str[thr_number] = 'P';
 	update_condensed_str(__run_str, run_str);
diff --git a/filesetup.c b/filesetup.c
index 17fa31fb..57eca1bf 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -805,8 +805,7 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 		} else if (f->filetype != FIO_TYPE_FILE)
 			continue;
 
-		buf[255] = '\0';
-		strncpy(buf, f->file_name, 255);
+		snprintf(buf, ARRAY_SIZE(buf), "%s", f->file_name);
 
 		if (stat(buf, &sb) < 0) {
 			if (errno != ENOENT)
@@ -829,8 +828,7 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 			continue;
 
 		fm = calloc(1, sizeof(*fm));
-		strncpy(fm->__base, buf, sizeof(fm->__base));
-		fm->__base[255] = '\0'; 
+		snprintf(fm->__base, ARRAY_SIZE(fm->__base), "%s", buf);
 		fm->base = basename(fm->__base);
 		fm->key = sb.st_dev;
 		flist_add(&fm->list, &list);
diff --git a/gclient.c b/gclient.c
index 04275a13..64324177 100644
--- a/gclient.c
+++ b/gclient.c
@@ -318,7 +318,7 @@ static void gfio_update_thread_status(struct gui_entry *ge,
 	static char message[100];
 	const char *m = message;
 
-	strncpy(message, status_message, sizeof(message) - 1);
+	snprintf(message, sizeof(message), "%s", status_message);
 	gtk_progress_bar_set_text(GTK_PROGRESS_BAR(ge->thread_status_pb), m);
 	gtk_progress_bar_set_fraction(GTK_PROGRESS_BAR(ge->thread_status_pb), perc / 100.0);
 	gtk_widget_queue_draw(ge->ui->window);
@@ -330,7 +330,7 @@ static void gfio_update_thread_status_all(struct gui *ui, char *status_message,
 	static char message[100];
 	const char *m = message;
 
-	strncpy(message, status_message, sizeof(message) - 1);
+	snprintf(message, sizeof(message), "%s", status_message);
 	gtk_progress_bar_set_text(GTK_PROGRESS_BAR(ui->thread_status_pb), m);
 	gtk_progress_bar_set_fraction(GTK_PROGRESS_BAR(ui->thread_status_pb), perc / 100.0);
 	gtk_widget_queue_draw(ui->window);
diff --git a/init.c b/init.c
index c9f6198e..63f2168e 100644
--- a/init.c
+++ b/init.c
@@ -1273,8 +1273,7 @@ static char *make_filename(char *buf, size_t buf_size,struct thread_options *o,
 	for (f = &fpre_keywords[0]; f->keyword; f++)
 		f->strlen = strlen(f->keyword);
 
-	buf[buf_size - 1] = '\0';
-	strncpy(buf, o->filename_format, buf_size - 1);
+	snprintf(buf, buf_size, "%s", o->filename_format);
 
 	memset(copy, 0, sizeof(copy));
 	for (f = &fpre_keywords[0]; f->keyword; f++) {
@@ -1353,7 +1352,7 @@ static char *make_filename(char *buf, size_t buf_size,struct thread_options *o,
 			if (post_start)
 				strncpy(dst, buf + post_start, dst_left);
 
-			strncpy(buf, copy, buf_size - 1);
+			snprintf(buf, buf_size, "%s", copy);
 		} while (1);
 	}
 
@@ -2029,20 +2028,12 @@ static int __parse_jobs_ini(struct thread_data *td,
 				 */
 				if (access(filename, F_OK) &&
 				    (ts = strrchr(file, '/'))) {
-					int len = ts - file +
-						strlen(filename) + 2;
-
-					if (!(full_fn = calloc(1, len))) {
+					if (asprintf(&full_fn, "%.*s%s",
+						 (int)(ts - file + 1), file,
+						 filename) < 0) {
 						ret = ENOMEM;
 						break;
 					}
-
-					strncpy(full_fn,
-						file, (ts - file) + 1);
-					strncpy(full_fn + (ts - file) + 1,
-						filename,
-						len - (ts - file) - 1);
-					full_fn[len - 1] = 0;
 					filename = full_fn;
 				}
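The `asprintf()` call uses the `%.*s` precision to copy only the directory part of the including job file (everything up to and including the last `/`), then appends the included file name, allocating exactly enough memory. In `__parse_jobs_ini()` this path is only taken when the include is not found as given and the job file path contains a `/`. Below is a minimal sketch with a hypothetical helper, `join_include_path()`. `asprintf()` is a GNU/BSD extension, so glibc needs `_GNU_SOURCE` defined before the first include.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<directory of file>/<filename>". Returns a malloc'ed string, or NULL
 * on allocation failure or when file contains no '/'. */
static char *join_include_path(const char *file, const char *filename)
{
	const char *ts;
	char *full_fn;

	assert(file && filename);
	ts = strrchr(file, '/');
	if (!ts)
		return NULL;
	/* %.*s prints only the first (ts - file + 1) bytes: the directory and '/' */
	if (asprintf(&full_fn, "%.*s%s", (int)(ts - file + 1), file,
		     filename) < 0)
		return NULL;
	return full_fn;
}
```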
 
diff --git a/ioengines.c b/ioengines.c
index aa4ccd27..40fa75c3 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -125,8 +125,7 @@ static struct ioengine_ops *__load_ioengine(const char *name)
 {
 	char engine[64];
 
-	engine[sizeof(engine) - 1] = '\0';
-	strncpy(engine, name, sizeof(engine) - 1);
+	snprintf(engine, sizeof(engine), "%s", name);
 
 	/*
 	 * linux libaio has alias names, so convert to what we want
diff --git a/options.c b/options.c
index f4c9bedf..447f231e 100644
--- a/options.c
+++ b/options.c
@@ -4902,8 +4902,7 @@ char *fio_option_dup_subs(const char *opt)
 		return NULL;
 	}
 
-	in[OPT_LEN_MAX] = '\0';
-	strncpy(in, opt, OPT_LEN_MAX);
+	snprintf(in, sizeof(in), "%s", opt);
 
 	while (*inptr && nchr > 0) {
 		if (inptr[0] == '$' && inptr[1] == '{') {
diff --git a/parse.c b/parse.c
index a7d4516e..c4fd4626 100644
--- a/parse.c
+++ b/parse.c
@@ -602,8 +602,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		if (!is_time && o->is_time)
 			is_time = o->is_time;
 
-		tmp[sizeof(tmp) - 1] = '\0';
-		strncpy(tmp, ptr, sizeof(tmp) - 1);
+		snprintf(tmp, sizeof(tmp), "%s", ptr);
 		p = strchr(tmp, ',');
 		if (p)
 			*p = '\0';
@@ -829,8 +828,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		char tmp[128];
 		char *p1, *p2;
 
-		tmp[sizeof(tmp) - 1] = '\0';
-		strncpy(tmp, ptr, sizeof(tmp) - 1);
+		snprintf(tmp, sizeof(tmp), "%s", ptr);
 
 		/* Handle bsrange with separate read,write values: */
 		p1 = strchr(tmp, ',');
diff --git a/server.c b/server.c
index 23e549a5..e7846227 100644
--- a/server.c
+++ b/server.c
@@ -865,7 +865,8 @@ static int handle_probe_cmd(struct fio_net_cmd *cmd)
 	strcpy(me, (char *) pdu->server);
 
 	gethostname((char *) probe.hostname, sizeof(probe.hostname));
-	strncpy((char *) probe.fio_version, fio_version_string, sizeof(probe.fio_version) - 1);
+	snprintf((char *) probe.fio_version, sizeof(probe.fio_version), "%s",
+		 fio_version_string);
 
 	/*
 	 * If the client supports compression and we do too, then enable it
@@ -1470,12 +1471,10 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 
 	memset(&p, 0, sizeof(p));
 
-	strncpy(p.ts.name, ts->name, FIO_JOBNAME_SIZE);
-	p.ts.name[FIO_JOBNAME_SIZE - 1] = '\0';
-	strncpy(p.ts.verror, ts->verror, FIO_VERROR_SIZE);
-	p.ts.verror[FIO_VERROR_SIZE - 1] = '\0';
-	strncpy(p.ts.description, ts->description, FIO_JOBDESC_SIZE);
-	p.ts.description[FIO_JOBDESC_SIZE - 1] = '\0';
+	snprintf(p.ts.name, sizeof(p.ts.name), "%s", ts->name);
+	snprintf(p.ts.verror, sizeof(p.ts.verror), "%s", ts->verror);
+	snprintf(p.ts.description, sizeof(p.ts.description), "%s",
+		 ts->description);
 
 	p.ts.error		= cpu_to_le32(ts->error);
 	p.ts.thread_number	= cpu_to_le32(ts->thread_number);
@@ -1666,8 +1665,7 @@ static void convert_dus(struct disk_util_stat *dst, struct disk_util_stat *src)
 {
 	int i;
 
-	dst->name[FIO_DU_NAME_SZ - 1] = '\0';
-	strncpy((char *) dst->name, (char *) src->name, FIO_DU_NAME_SZ - 1);
+	snprintf((char *) dst->name, sizeof(dst->name), "%s", src->name);
 
 	for (i = 0; i < 2; i++) {
 		dst->s.ios[i]		= cpu_to_le64(src->s.ios[i]);
@@ -1977,8 +1975,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 	else
 		pdu.compressed = 0;
 
-	strncpy((char *) pdu.name, name, FIO_NET_NAME_MAX);
-	pdu.name[FIO_NET_NAME_MAX - 1] = '\0';
+	snprintf((char *) pdu.name, sizeof(pdu.name), "%s", name);
 
 	/*
 	 * We can't do this for a pre-compressed log, but for that case,
@@ -2195,9 +2192,8 @@ static int fio_init_server_sock(void)
 
 	mode = umask(000);
 
-	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_UNIX;
-	strncpy(addr.sun_path, bind_sock, sizeof(addr.sun_path) - 1);
+	snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", bind_sock);
 
 	len = sizeof(addr.sun_family) + strlen(bind_sock) + 1;
 
@@ -2247,9 +2243,9 @@ static int fio_init_server_connection(void)
 		if (p)
 			strcat(p, port);
 		else
-			strncpy(bind_str, port, sizeof(bind_str) - 1);
+			snprintf(bind_str, sizeof(bind_str), "%s", port);
 	} else
-		strncpy(bind_str, bind_sock, sizeof(bind_str) - 1);
+		snprintf(bind_str, sizeof(bind_str), "%s", bind_sock);
 
 	log_info("fio: server listening on %s\n", bind_str);
 
diff --git a/stat.c b/stat.c
index bf87917c..33637900 100644
--- a/stat.c
+++ b/stat.c
@@ -1828,10 +1828,11 @@ void __show_run_stats(void)
 			/*
 			 * These are per-group shared already
 			 */
-			strncpy(ts->name, td->o.name, FIO_JOBNAME_SIZE - 1);
+			snprintf(ts->name, sizeof(ts->name), "%s", td->o.name);
 			if (td->o.description)
-				strncpy(ts->description, td->o.description,
-						FIO_JOBDESC_SIZE - 1);
+				snprintf(ts->description,
+					 sizeof(ts->description), "%s",
+					 td->o.description);
 			else
 				memset(ts->description, 0, FIO_JOBDESC_SIZE);
 
@@ -1868,12 +1869,12 @@ void __show_run_stats(void)
 			if (!td->error && td->o.continue_on_error &&
 			    td->first_error) {
 				ts->error = td->first_error;
-				ts->verror[sizeof(ts->verror) - 1] = '\0';
-				strncpy(ts->verror, td->verror, sizeof(ts->verror) - 1);
+				snprintf(ts->verror, sizeof(ts->verror), "%s",
+					 td->verror);
 			} else  if (td->error) {
 				ts->error = td->error;
-				ts->verror[sizeof(ts->verror) - 1] = '\0';
-				strncpy(ts->verror, td->verror, sizeof(ts->verror) - 1);
+				snprintf(ts->verror, sizeof(ts->verror), "%s",
+					 td->verror);
 			}
 		}
 
diff --git a/stat.h b/stat.h
index e9551381..c209ab6c 100644
--- a/stat.h
+++ b/stat.h
@@ -251,32 +251,36 @@ struct thread_stat {
 	uint64_t cachemiss;
 } __attribute__((packed));
 
-struct jobs_eta {
-	uint32_t nr_running;
-	uint32_t nr_ramp;
-
-	uint32_t nr_pending;
-	uint32_t nr_setting_up;
-
-	uint64_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];
-	uint64_t rate[DDIR_RWDIR_CNT];
-	uint32_t m_iops[DDIR_RWDIR_CNT], t_iops[DDIR_RWDIR_CNT];
-	uint32_t iops[DDIR_RWDIR_CNT];
-	uint64_t elapsed_sec;
-	uint64_t eta_sec;
-	uint32_t is_pow2;
-	uint32_t unit_base;
-
-	uint32_t sig_figs;
-
-	uint32_t files_open;
+#define JOBS_ETA {							\
+	uint32_t nr_running;						\
+	uint32_t nr_ramp;						\
+									\
+	uint32_t nr_pending;						\
+	uint32_t nr_setting_up;						\
+									\
+	uint64_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];	\
+	uint64_t rate[DDIR_RWDIR_CNT];					\
+	uint32_t m_iops[DDIR_RWDIR_CNT] __attribute__((packed));	\
+	uint32_t t_iops[DDIR_RWDIR_CNT] __attribute__((packed));	\
+	uint32_t iops[DDIR_RWDIR_CNT] __attribute__((packed));		\
+	uint64_t elapsed_sec __attribute__((packed));			\
+	uint64_t eta_sec __attribute__((packed));			\
+	uint32_t is_pow2;						\
+	uint32_t unit_base;						\
+									\
+	uint32_t sig_figs;						\
+									\
+	uint32_t files_open;						\
+									\
+	/*								\
+	 * Network 'copy' of run_str[]					\
+	 */								\
+	uint32_t nr_threads;						\
+	uint8_t run_str[];						\
+}
 
-	/*
-	 * Network 'copy' of run_str[]
-	 */
-	uint32_t nr_threads;
-	uint8_t run_str[];
-} __attribute__((packed));
+struct jobs_eta JOBS_ETA;
+struct jobs_eta_packed JOBS_ETA __attribute__((packed));
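Defining the body once in `JOBS_ETA` and instantiating it as both a normal and a `packed` struct turns any compiler-inserted padding into a size difference. The `compiletime_assert()` added to `print_status_init()` then rejects that difference. The sketch below shows the effect with hypothetical structs. A 32-bit field followed by a 64-bit field leaves a 4-byte hole on ABIs where `uint64_t` is 8-byte aligned (e.g. x86-64), so the two variants differ in size; an explicit pad removes the hole. The sizes in the comments assume such an ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOLEY_BODY {		\
	uint32_t a;		\
	uint64_t b;		\
}

#define PADDED_BODY {		\
	uint32_t a;		\
	uint32_t pad;		\
	uint64_t b;		\
}

struct holey HOLEY_BODY;				/* 16 bytes: hole after a */
struct holey_packed HOLEY_BODY __attribute__((packed));	/* 12 bytes */

struct padded PADDED_BODY;				/* 16 bytes */
struct padded_packed PADDED_BODY __attribute__((packed));	/* 16 bytes */

/* Passes: the explicit pad leaves nothing for the compiler to insert.
 * The same assertion on struct holey would fail to compile on x86-64. */
static_assert(sizeof(struct padded) == sizeof(struct padded_packed),
	      "padded has holes");

/* Number of padding bytes the compiler added to struct holey. */
static size_t holey_padding(void)
{
	return sizeof(struct holey) - sizeof(struct holey_packed);
}
```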
 
 struct io_u_plat_entry {
 	struct flist_head list;
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
index 9336716d..0952011c 100755
--- a/t/zbd/run-tests-against-zoned-nullb
+++ b/t/zbd/run-tests-against-zoned-nullb
@@ -24,6 +24,6 @@ modprobe null_blk nr_devices=0 &&
     echo 4096 > blocksize &&
     echo 1024 > size &&
     echo 1 > memory_backed &&
-    echo 1 > power
+    echo 1 > power || exit $?
 
 "${scriptdir}"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 6fb48ef0..ed54a0aa 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -772,8 +772,8 @@ source "$(dirname "$0")/functions" || exit $?
 dev=$1
 realdev=$(readlink -f "$dev")
 basename=$(basename "$realdev")
-major=$((0x$(stat -L -c '%t' "$realdev")))
-minor=$((0x$(stat -L -c '%T' "$realdev")))
+major=$((0x$(stat -L -c '%t' "$realdev"))) || exit $?
+minor=$((0x$(stat -L -c '%T' "$realdev"))) || exit $?
 disk_size=$(($(<"/sys/dev/block/$major:$minor/size")*512))
 # When the target is a partition device, get basename of its holder device to
 # access sysfs path of the holder device
diff --git a/verify.c b/verify.c
index f79ab43a..48ba051d 100644
--- a/verify.c
+++ b/verify.c
@@ -1635,8 +1635,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 			s->rand.state32.s[3] = 0;
 			s->rand.use64 = 0;
 		}
-		s->name[sizeof(s->name) - 1] = '\0';
-		strncpy((char *) s->name, td->o.name, sizeof(s->name) - 1);
+		snprintf((char *) s->name, sizeof(s->name), "%s", td->o.name);
 		next = io_list_next(s);
 	}
 
diff --git a/zbd.c b/zbd.c
index d7e91e37..2383c57d 100644
--- a/zbd.c
+++ b/zbd.c
@@ -481,7 +481,7 @@ out:
  *
  * Returns 0 upon success and a negative error code upon failure.
  */
-int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
+static int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
 {
 	enum blk_zoned_model zbd_model;
 	int ret = 0;
@@ -933,8 +933,8 @@ static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
  * a multiple of the fio block size. The caller must neither hold z->mutex
  * nor f->zbd_info->mutex. Returns with z->mutex held upon success.
  */
-struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
-					       struct io_u *io_u)
+static struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
+						      struct io_u *io_u)
 {
 	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
 	const struct fio_file *f = io_u->file;



* Recent changes (master)
@ 2019-08-09 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-09 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c1f6f32ea74316df1e8707eba4fb95ab14fae6f7:

  Add tests from t/ to the Windows installer (2019-08-05 14:45:50 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 937ec971236d98089b63217635294c788ea00bce:

  t/zbd: Fix I/O bytes rounding errors (2019-08-08 21:36:32 -0600)

----------------------------------------------------------------
Shin'ichiro Kawasaki (1):
      t/zbd: Fix I/O bytes rounding errors

 t/zbd/test-zbd-support | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 10c78e9a..6fb48ef0 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -85,7 +85,8 @@ run_fio() {
 
     fio=$(dirname "$0")/../../fio
 
-    opts=("--aux-path=/tmp" "--allow_file_create=0" "$@")
+    opts=("--aux-path=/tmp" "--allow_file_create=0" \
+			    "--significant_figures=10" "$@")
     { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
     "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"



* Recent changes (master)
@ 2019-08-06 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-06 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit acd44dec8cfa476d91dc1695df64f249e52d06e3:

  nbd: Remove copy and paste error in example (2019-08-03 09:51:44 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c1f6f32ea74316df1e8707eba4fb95ab14fae6f7:

  Add tests from t/ to the Windows installer (2019-08-05 14:45:50 -0600)

----------------------------------------------------------------
Rebecca Cran (1):
      Add tests from t/ to the Windows installer

 os/windows/install.wxs | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

---

Diff of recent changes:

diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 97d88e9f..dcb8c92c 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -43,6 +43,32 @@
 							<File Id="MORAL_LICENSE" Name="MORAL-LICENSE.txt" Source="..\..\MORAL-LICENSE"/>
 						</Component>
 						<Directory Id="examples" Name="examples"/>
+						<Directory Id="tests" Name="tests">
+							<Component>
+								<File Source="../../t/fio-genzipf.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/fio-dedupe.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/stest.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/ieee754.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/axmap.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/lfsr-test.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/gen-rand.exe"/>
+							</Component>
+							<Component>
+								<File Source="../../t/fio-verify-state.exe"/>
+							</Component>
+						</Directory>
 					</Directory>
 				</Directory>
 			</Directory>
@@ -56,6 +82,14 @@
 		<ComponentRef Id="COPYING"/>
 		<ComponentRef Id="MORAL_LICENSE"/>
 		<ComponentGroupRef Id="examples"/>
+		<ComponentRef Id="fio_genzipf.exe"/>
+		<ComponentRef Id="fio_dedupe.exe"/>
+		<ComponentRef Id="stest.exe"/>
+		<ComponentRef Id="ieee754.exe"/>
+		<ComponentRef Id="axmap.exe"/>
+		<ComponentRef Id="lfsr_test.exe"/>
+		<ComponentRef Id="gen_rand.exe"/>
+		<ComponentRef Id="fio_verify_state.exe"/>
 	</Feature>
 
 	<Property Id="ARPURLINFOABOUT" Value="http://git.kernel.dk/cgit/fio/" />



* Recent changes (master)
@ 2019-08-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f2d6de5d997b039cebac9c34912871baa5e12d49:

  nbd: Document the NBD-specific uri parameter (2019-08-02 11:05:28 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to acd44dec8cfa476d91dc1695df64f249e52d06e3:

  nbd: Remove copy and paste error in example (2019-08-03 09:51:44 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      engines/splice: remove buggy ->mem_align set

Richard W.M. Jones (1):
      nbd: Remove copy and paste error in example

 engines/splice.c | 7 -------
 examples/nbd.fio | 2 +-
 2 files changed, 1 insertion(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/engines/splice.c b/engines/splice.c
index feb764fe..6fc36bb6 100644
--- a/engines/splice.c
+++ b/engines/splice.c
@@ -278,13 +278,6 @@ static int fio_spliceio_init(struct thread_data *td)
 	 */
 	sd->vmsplice_to_user_map = 1;
 
-	/*
-	 * And if vmsplice_to_user works, we definitely need aligned
-	 * buffers. Just set ->odirect to force that.
-	 */
-	if (td_read(td))
-		td->o.mem_align = 1;
-
 	td->io_ops_data = sd;
 	return 0;
 }
diff --git a/examples/nbd.fio b/examples/nbd.fio
index 27e15086..6900ebe7 100644
--- a/examples/nbd.fio
+++ b/examples/nbd.fio
@@ -26,7 +26,7 @@ iodepth=64
 offset=0
 
 [job1]
-offset=064m
+offset=64m
 
 [job2]
 offset=128m



* Recent changes (master)
@ 2019-08-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 66b182f90c3f08dcbd0592ce70cb350ca5ac0cc0:

  smalloc: cleanup firstfree() (2019-07-31 15:29:48 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f2d6de5d997b039cebac9c34912871baa5e12d49:

  nbd: Document the NBD-specific uri parameter (2019-08-02 11:05:28 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      parse: bump max value pairs supported from 24 to 32

Richard W.M. Jones (2):
      engines: Add Network Block Device (NBD) support using libnbd.
      nbd: Document the NBD-specific uri parameter

 HOWTO            |  11 ++
 Makefile         |   6 +
 configure        |  27 ++++
 engines/nbd.c    | 368 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/nbd.fio |  35 ++++++
 fio.1            |  19 +++
 optgroup.c       |   4 +
 optgroup.h       |   2 +
 options.c        |   3 +
 parse.h          |   2 +-
 10 files changed, 476 insertions(+), 1 deletion(-)
 create mode 100644 engines/nbd.c
 create mode 100644 examples/nbd.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 81244064..41b959d1 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1998,6 +1998,8 @@ I/O engine
 			requests for IME. FIO will then decide when to commit these requests.
 		**libiscsi**
 			Read and write iscsi lun with libiscsi.
+		**nbd**
+			Read and write a Network Block Device (NBD).
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2299,6 +2301,15 @@ with the caveat that when used on the command line, they must come after the
 	turns on verbose logging from libcurl, 2 additionally enables
 	HTTP IO tracing. Default is **0**
 
+.. option:: uri=str : [nbd]
+
+	Specify the NBD URI of the server to test.  The string
+	is a standard NBD URI
+	(see https://github.com/NetworkBlockDevice/nbd/tree/master/doc).
+	Example URIs: nbd://localhost:10809
+	nbd+unix:///?socket=/tmp/socket
+	nbds://tlshost/exportname
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index d7e5fca7..fe02bf1d 100644
--- a/Makefile
+++ b/Makefile
@@ -65,6 +65,12 @@ ifdef CONFIG_LIBISCSI
   SOURCE += engines/libiscsi.c
 endif
 
+ifdef CONFIG_LIBNBD
+  CFLAGS += $(LIBNBD_CFLAGS)
+  LIBS += $(LIBNBD_LIBS)
+  SOURCE += engines/nbd.c
+endif
+
 ifdef CONFIG_64BIT
   CFLAGS += -DBITS_PER_LONG=64
 endif
diff --git a/configure b/configure
index a0692d58..b11b2dce 100755
--- a/configure
+++ b/configure
@@ -149,6 +149,7 @@ disable_pmem="no"
 disable_native="no"
 march_set="no"
 libiscsi="no"
+libnbd="no"
 prefix=/usr/local
 
 # parse options
@@ -207,6 +208,8 @@ for opt do
   ;;
   --enable-libiscsi) libiscsi="yes"
   ;;
+  --enable-libnbd) libnbd="yes"
+  ;;
   --disable-tcmalloc) disable_tcmalloc="yes"
   ;;
   --help)
@@ -245,6 +248,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-native        Don't build for native host"
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
+  echo "--enable-libnbd         Enable libnbd (NBD engine) support"
   echo "--disable-tcmalloc	Disable tcmalloc support"
   exit $exit_val
 fi
@@ -2014,6 +2018,23 @@ if test "$libiscsi" = "yes" ; then
 fi
 print_config "iscsi engine" "$libiscsi"
 
+##########################################
+# Check if we have libnbd (for NBD support).
+minimum_libnbd=0.9.6
+if test "$libnbd" = "yes" ; then
+  if $(pkg-config --atleast-version=$minimum_libnbd libnbd); then
+    libnbd="yes"
+    libnbd_cflags=$(pkg-config --cflags libnbd)
+    libnbd_libs=$(pkg-config --libs libnbd)
+  else
+    if test "$libnbd" = "yes" ; then
+      echo "libnbd" "Install libnbd >= $minimum_libnbd"
+    fi
+    libnbd="no"
+  fi
+fi
+print_config "NBD engine" "$libnbd"
+
 ##########################################
 # Check if we have lex/yacc available
 yacc="no"
@@ -2698,6 +2719,12 @@ if test "$libiscsi" = "yes" ; then
   echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
   echo "LIBISCSI_LIBS=$libiscsi_libs" >> $config_host_mak
 fi
+if test "$libnbd" = "yes" ; then
+  output_sym "CONFIG_LIBNBD"
+  echo "CONFIG_LIBNBD=m" >> $config_host_mak
+  echo "LIBNBD_CFLAGS=$libnbd_cflags" >> $config_host_mak
+  echo "LIBNBD_LIBS=$libnbd_libs" >> $config_host_mak
+fi
 cat > $TMPC << EOF
 int main(int argc, char **argv)
 {
diff --git a/engines/nbd.c b/engines/nbd.c
new file mode 100644
index 00000000..f8583812
--- /dev/null
+++ b/engines/nbd.c
@@ -0,0 +1,368 @@
+/*
+ * NBD engine
+ *
+ * IO engine that talks to an NBD server.
+ *
+ * Copyright (C) 2019 Red Hat Inc.
+ * Written by Richard W.M. Jones <rjones@redhat.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <errno.h>
+
+#include <libnbd.h>
+
+#include "../fio.h"
+#include "../optgroup.h"
+
+/* Actually this differs across servers, but for nbdkit ... */
+#define NBD_MAX_REQUEST_SIZE (64 * 1024 * 1024)
+
+/* Storage for the NBD handle. */
+struct nbd_data {
+	struct nbd_handle *nbd;
+	int debug;
+
+	/* The list of completed io_u structs. */
+	struct io_u **completed;
+	size_t nr_completed;
+};
+
+/* Options. */
+struct nbd_options {
+	void *padding;
+	char *uri;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "uri",
+		.lname	= "NBD URI",
+		.help	= "Name of NBD URI",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_NBD,
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct nbd_options, uri),
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+/* Allocates nbd_data. */
+static int nbd_setup(struct thread_data *td)
+{
+	struct nbd_data *nbd_data;
+	struct nbd_options *o = td->eo;
+	struct fio_file *f;
+	int r;
+	int64_t size;
+
+	nbd_data = calloc(1, sizeof(*nbd_data));
+	if (!nbd_data) {
+		td_verror(td, errno, "calloc");
+		return 1;
+	}
+	td->io_ops_data = nbd_data;
+
+	/* Pretend to deal with files.	See engines/rbd.c */
+	if (!td->files_index) {
+		add_file(td, "nbd", 0, 0);
+		td->o.nr_files = td->o.nr_files ? : 1;
+		td->o.open_files++;
+	}
+	f = td->files[0];
+
+	nbd_data->nbd = nbd_create();
+	if (!nbd_data->nbd) {
+		log_err("fio: nbd_create: %s\n", nbd_get_error());
+		return 1;
+	}
+
+	/* Get the debug flag which can be set through LIBNBD_DEBUG=1. */
+	nbd_data->debug = nbd_get_debug(nbd_data->nbd);
+
+	/* Connect synchronously here so we can check for the size and
+	 * in future other properties of the server.
+	 */
+	if (!o->uri) {
+		log_err("fio: nbd: uri parameter was not specified\n");
+		return 1;
+	}
+	r = nbd_connect_uri(nbd_data->nbd, o->uri);
+	if (r == -1) {
+		log_err("fio: nbd_connect_uri: %s\n", nbd_get_error());
+		return 1;
+	}
+	size = nbd_get_size(nbd_data->nbd);
+	if (size == -1) {
+		log_err("fio: nbd_get_size: %s\n", nbd_get_error());
+		return 1;
+	}
+
+	f->real_file_size = size;
+
+	nbd_close (nbd_data->nbd);
+	nbd_data->nbd = NULL;
+
+	return 0;
+}
+
+/* Closes socket and frees nbd_data -- the opposite of nbd_setup. */
+static void nbd_cleanup(struct thread_data *td)
+{
+	struct nbd_data *nbd_data = td->io_ops_data;
+
+	if (nbd_data) {
+		if (nbd_data->nbd)
+			nbd_close(nbd_data->nbd);
+		free(nbd_data);
+	}
+}
+
+/* Connect to the server from each thread. */
+static int nbd_init(struct thread_data *td)
+{
+	struct nbd_options *o = td->eo;
+	struct nbd_data *nbd_data = td->io_ops_data;
+	int r;
+
+	if (!o->uri) {
+		log_err("fio: nbd: uri parameter was not specified\n");
+		return 1;
+	}
+
+	nbd_data->nbd = nbd_create();
+	if (!nbd_data->nbd) {
+		log_err("fio: nbd_create: %s\n", nbd_get_error());
+		return 1;
+	}
+	/* This is actually a synchronous connect and handshake. */
+	r = nbd_connect_uri(nbd_data->nbd, o->uri);
+	if (r == -1) {
+		log_err("fio: nbd_connect_uri: %s\n", nbd_get_error());
+		return 1;
+	}
+
+	log_info("fio: connected to NBD server\n");
+	return 0;
+}
+
+/* A command in flight has been completed. */
+static int cmd_completed (unsigned valid_flag, void *vp, int *error)
+{
+	struct io_u *io_u;
+	struct nbd_data *nbd_data;
+	struct io_u **completed;
+
+	if (!(valid_flag & LIBNBD_CALLBACK_VALID))
+		return 0;
+
+	io_u = vp;
+	nbd_data = io_u->engine_data;
+
+	if (nbd_data->debug)
+		log_info("fio: nbd: command completed\n");
+
+	if (*error != 0)
+		io_u->error = *error;
+	else
+		io_u->error = 0;
+
+	/* Add this completion to the list so it can be picked up
+	 * later by ->event.
+	 */
+	completed = realloc(nbd_data->completed,
+			    sizeof(struct io_u *) *
+			    (nbd_data->nr_completed+1));
+	if (completed == NULL) {
+		io_u->error = errno;
+		return 0;
+	}
+
+	nbd_data->completed = completed;
+	nbd_data->completed[nbd_data->nr_completed] = io_u;
+	nbd_data->nr_completed++;
+
+	return 0;
+}
+
+/* Begin read or write request. */
+static enum fio_q_status nbd_queue(struct thread_data *td,
+				   struct io_u *io_u)
+{
+	struct nbd_data *nbd_data = td->io_ops_data;
+	int r;
+
+	fio_ro_check(td, io_u);
+
+	io_u->engine_data = nbd_data;
+
+	if (io_u->ddir == DDIR_WRITE || io_u->ddir == DDIR_READ)
+		assert(io_u->xfer_buflen <= NBD_MAX_REQUEST_SIZE);
+
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		r = nbd_aio_pread_callback(nbd_data->nbd,
+					   io_u->xfer_buf, io_u->xfer_buflen,
+					   io_u->offset,
+					   cmd_completed, io_u,
+					   0);
+		break;
+	case DDIR_WRITE:
+		r = nbd_aio_pwrite_callback(nbd_data->nbd,
+					    io_u->xfer_buf, io_u->xfer_buflen,
+					    io_u->offset,
+					    cmd_completed, io_u,
+					    0);
+		break;
+	case DDIR_TRIM:
+		r = nbd_aio_trim_callback(nbd_data->nbd, io_u->xfer_buflen,
+					  io_u->offset,
+					  cmd_completed, io_u,
+					  0);
+		break;
+	case DDIR_SYNC:
+		/* XXX We could probably also handle
+		 * DDIR_SYNC_FILE_RANGE with a bit of effort.
+		 */
+		r = nbd_aio_flush_callback(nbd_data->nbd,
+					   cmd_completed, io_u,
+					   0);
+		break;
+	default:
+		io_u->error = EINVAL;
+		return FIO_Q_COMPLETED;
+	}
+
+	if (r == -1) {
+		/* errno is optional information on libnbd error path;
+		 * if it's 0, set it to a default value
+		 */
+		io_u->error = nbd_get_errno();
+		if (io_u->error == 0)
+			io_u->error = EIO;
+		return FIO_Q_COMPLETED;
+	}
+
+	if (nbd_data->debug)
+		log_info("fio: nbd: command issued\n");
+	io_u->error = 0;
+	return FIO_Q_QUEUED;
+}
+
+static unsigned retire_commands(struct nbd_handle *nbd)
+{
+	int64_t cookie;
+	unsigned r = 0;
+
+	while ((cookie = nbd_aio_peek_command_completed(nbd)) > 0) {
+		/* Ignore the return value.  cmd_completed has already
+		 * checked for an error and set io_u->error.  We only
+		 * have to call this to retire the command.
+		 */
+		nbd_aio_command_completed(nbd, cookie);
+		r++;
+	}
+
+	if (nbd_get_debug(nbd))
+		log_info("fio: nbd: %u commands retired\n", r);
+	return r;
+}
+
+static int nbd_getevents(struct thread_data *td, unsigned int min,
+			 unsigned int max, const struct timespec *t)
+{
+	struct nbd_data *nbd_data = td->io_ops_data;
+	int r;
+	unsigned events = 0;
+	int timeout;
+
+	/* XXX This handling of timeout is wrong because it will wait
+	 * for up to loop iterations * timeout.
+	 */
+	timeout = !t ? -1 : t->tv_sec * 1000 + t->tv_nsec / 1000000;
+
+	while (events < min) {
+		r = nbd_poll(nbd_data->nbd, timeout);
+		if (r == -1) {
+			/* error in poll */
+			log_err("fio: nbd_poll: %s\n", nbd_get_error());
+			return -1;
+		}
+		else {
+			/* poll made progress */
+			events += retire_commands(nbd_data->nbd);
+		}
+	}
+
+	return events;
+}
+
+static struct io_u *nbd_event(struct thread_data *td, int event)
+{
+	struct nbd_data *nbd_data = td->io_ops_data;
+
+	if (nbd_data->nr_completed == 0)
+		return NULL;
+
+	/* XXX We ignore the event number and assume fio calls us
+	 * exactly once for [0..nr_events-1].
+	 */
+	nbd_data->nr_completed--;
+	return nbd_data->completed[nbd_data->nr_completed];
+}
+
+static int nbd_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	io_u->engine_data = NULL;
+	return 0;
+}
+
+static void nbd_io_u_free(struct thread_data *td, struct io_u *io_u)
+{
+	/* Nothing needs to be done. */
+}
+
+static int nbd_open_file(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static int nbd_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name			= "nbd",
+	.version		= FIO_IOOPS_VERSION,
+	.options		= options,
+	.option_struct_size	= sizeof(struct nbd_options),
+	.flags			= FIO_DISKLESSIO | FIO_NOEXTEND,
+
+	.setup			= nbd_setup,
+	.init			= nbd_init,
+	.cleanup		= nbd_cleanup,
+	.queue			= nbd_queue,
+	.getevents		= nbd_getevents,
+	.event			= nbd_event,
+	.io_u_init		= nbd_io_u_init,
+	.io_u_free		= nbd_io_u_free,
+
+	.open_file		= nbd_open_file,
+	.invalidate		= nbd_invalidate,
+};
+
+static void fio_init fio_nbd_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_nbd_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
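
A note on cmd_completed() above: it grows the completed array with realloc() on every completion. If realloc() fails, the io_u is given an error code but is never appended, so nbd_event() has no way to hand it back to fio. A common alternative is to size the array once for the queue depth so the completion callback never allocates. The following is a hypothetical sketch, not the engine's code. fake_io_u and the completion_list_* names are invented for illustration. Pop order is LIFO, as in nbd_event().

```c
#include <stdlib.h>

/* Hypothetical stand-in for fio's struct io_u. */
struct fake_io_u {
	int error;
};

/* Completion list allocated once, up front, for the maximum iodepth. */
struct completion_list {
	struct fake_io_u **items;
	size_t nr;
	size_t cap;
};

static int completion_list_init(struct completion_list *cl, size_t iodepth)
{
	cl->items = calloc(iodepth, sizeof(*cl->items));
	cl->nr = 0;
	cl->cap = iodepth;
	return cl->items ? 0 : -1;
}

/* Called from the completion callback: record the result and queue the
 * unit. It never allocates, so it cannot fail while at most iodepth
 * commands are in flight. */
static int completion_list_push(struct completion_list *cl,
				struct fake_io_u *io_u, int error)
{
	if (cl->nr == cl->cap)
		return -1; /* more completions than queued commands */
	io_u->error = error;
	cl->items[cl->nr++] = io_u;
	return 0;
}

/* Like nbd_event(): hand back the most recent completion, or NULL. */
static struct fake_io_u *completion_list_pop(struct completion_list *cl)
{
	if (cl->nr == 0)
		return NULL;
	return cl->items[--cl->nr];
}

static void completion_list_free(struct completion_list *cl)
{
	free(cl->items);
	cl->items = NULL;
	cl->nr = cl->cap = 0;
}
```

In the engine, the init step would naturally go in nbd_init(), where td->o.iodepth is known.
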
diff --git a/examples/nbd.fio b/examples/nbd.fio
new file mode 100644
index 00000000..27e15086
--- /dev/null
+++ b/examples/nbd.fio
@@ -0,0 +1,35 @@
+# To use fio to test nbdkit:
+#
+# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
+#
+# To use fio to test qemu-nbd:
+#
+# rm -f /tmp/disk.img /tmp/socket
+# truncate -s 256M /tmp/disk.img
+# export unixsocket=/tmp/socket
+# qemu-nbd -t -k $unixsocket -f raw /tmp/disk.img &
+# fio examples/nbd.fio
+# killall qemu-nbd
+
+[global]
+ioengine=nbd
+uri=nbd+unix:///?socket=${unixsocket}
+# Starting from nbdkit 1.14 the following will work:
+#uri=${uri}
+rw=randrw
+time_based
+runtime=60
+group_reporting
+iodepth=64
+
+[job0]
+offset=0
+
+[job1]
+offset=064m
+
+[job2]
+offset=128m
+
+[job3]
+offset=192m
diff --git a/fio.1 b/fio.1
index 2966d9d5..97371d77 100644
--- a/fio.1
+++ b/fio.1
@@ -1756,6 +1756,9 @@ FIO will then decide when to commit these requests.
 .TP
 .B libiscsi
 Read and write iscsi lun with libiscsi.
+.TP
+.B nbd
+Read and write a Network Block Device (NBD).
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2019,6 +2022,22 @@ blocksize=8k will write 16 sectors with each command. fio will still
 generate 8k of data for each command butonly the first 512 bytes will
 be used and transferred to the device. The writefua option is ignored
 with this selection.
+.RE
+.RE
+.TP
+.BI (nbd)uri \fR=\fPstr
+Specify the NBD URI of the server to test.
+The string is a standard NBD URI (see
+\fIhttps://github.com/NetworkBlockDevice/nbd/tree/master/doc\fR).
+Example URIs:
+.RS
+.RS
+.TP
+\fInbd://localhost:10809\fR
+.TP
+\fInbd+unix:///?socket=/tmp/socket\fR
+.TP
+\fInbds://tlshost/exportname\fR
 
 .SS "I/O depth"
 .TP
diff --git a/optgroup.c b/optgroup.c
index 04ceec7e..c228ff29 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -169,6 +169,10 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.name	= "libhdfs I/O engine", /* libhdfs */
 		.mask	= FIO_OPT_G_HDFS,
 	},
+	{
+		.name	= "NBD I/O engine", /* NBD */
+		.mask	= FIO_OPT_G_NBD,
+	},
 	{
 		.name	= NULL,
 	},
diff --git a/optgroup.h b/optgroup.h
index 148c8da1..8009bf25 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -63,6 +63,7 @@ enum opt_category_group {
 	__FIO_OPT_G_SG,
 	__FIO_OPT_G_MMAP,
 	__FIO_OPT_G_ISCSI,
+	__FIO_OPT_G_NBD,
 	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
@@ -102,6 +103,7 @@ enum opt_category_group {
 	FIO_OPT_G_MMAP		= (1ULL << __FIO_OPT_G_MMAP),
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
 	FIO_OPT_G_ISCSI         = (1ULL << __FIO_OPT_G_ISCSI),
+	FIO_OPT_G_NBD		= (1ULL << __FIO_OPT_G_NBD),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);
diff --git a/options.c b/options.c
index 95086074..f4c9bedf 100644
--- a/options.c
+++ b/options.c
@@ -1899,6 +1899,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "HTTP (WebDAV/S3) IO engine",
 			  },
 #endif
+			  { .ival = "nbd",
+			    .help = "Network Block Device (NBD) IO engine"
+			  },
 		},
 	},
 	{
diff --git a/parse.h b/parse.h
index 9b4e2f32..5828654f 100644
--- a/parse.h
+++ b/parse.h
@@ -38,7 +38,7 @@ struct value_pair {
 };
 
 #define OPT_LEN_MAX 	8192
-#define PARSE_MAX_VP	24
+#define PARSE_MAX_VP	32
 
 /*
  * Option define
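
The XXX comment in nbd_getevents() notes that passing the full timeout to every nbd_poll() call can make the total wait a multiple of that timeout. One usual remedy is to compute an absolute deadline once and give each poll only the time still remaining. This is our own sketch under stated assumptions, not code from a later fio commit. The poll_once_fn hook is hypothetical; in the engine it would wrap nbd_poll() followed by retire_commands().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical hook: one poll pass returning the number of commands
 * retired (>= 0), or -1 on error. */
typedef int (*poll_once_fn)(void *ctx, int timeout_ms);

/* start + t, normalised so tv_nsec stays below one second. */
static struct timespec deadline_after(const struct timespec *start,
				      const struct timespec *t)
{
	struct timespec d;

	d.tv_sec = start->tv_sec + t->tv_sec;
	d.tv_nsec = start->tv_nsec + t->tv_nsec;
	if (d.tv_nsec >= 1000000000L) {
		d.tv_sec++;
		d.tv_nsec -= 1000000000L;
	}
	return d;
}

/* Milliseconds left before deadline, rounded up, never negative. */
static int ms_remaining(const struct timespec *deadline,
			const struct timespec *now)
{
	long long ns = (long long) (deadline->tv_sec - now->tv_sec) * 1000000000LL +
		       (deadline->tv_nsec - now->tv_nsec);

	return ns > 0 ? (int) ((ns + 999999) / 1000000) : 0;
}

/* Reap at least min events, but wait no longer than *t in total.
 * t == NULL means block until min events arrive (timeout -1). */
int wait_for_events(poll_once_fn poll_once, void *ctx,
		    unsigned int min, const struct timespec *t)
{
	struct timespec now, deadline = { 0, 0 };
	unsigned int events = 0;

	if (t) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		deadline = deadline_after(&now, t);
	}

	while (events < min) {
		int timeout = -1;
		int r;

		if (t) {
			clock_gettime(CLOCK_MONOTONIC, &now);
			timeout = ms_remaining(&deadline, &now);
		}
		r = poll_once(ctx, timeout);
		if (r < 0)
			return -1;
		events += (unsigned int) r;
		if (t && timeout == 0)
			break; /* deadline reached: return what we have */
	}
	return (int) events;
}
```

Using CLOCK_MONOTONIC keeps the deadline immune to wall-clock adjustments while the job runs.
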



* Recent changes (master)
@ 2019-08-01 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-08-01 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 029b42ace698eae477c5e261d2f82b191507526b:

  Document io_uring feature (2019-07-26 09:53:43 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 66b182f90c3f08dcbd0592ce70cb350ca5ac0cc0:

  smalloc: cleanup firstfree() (2019-07-31 15:29:48 -0600)

----------------------------------------------------------------
Alexander Kapshuna (1):
      fio2gnuplot: fix TabErrors when running with Python 3

Damien Le Moal (1):
      bssplit: Fix handling of 0 percentage

Jens Axboe (5):
      Merge branch 'gnuplot-tabs' of https://github.com/kapsh/fio
      Merge branch 'dev' of https://github.com/smartxworks/fio
      Merge branch 'smalloc-gc' of https://github.com/vincentkfu/fio
      Remove unused fio_assert()
      smalloc: cleanup firstfree()

Kyle Zhang (1):
      libiscsi: continue working when meets EINTR or EAGAIN

Vincent Fu (4):
      smalloc: print debug info on oom error
      t/stest: make the test more challenging
      smalloc: fix garbage collection problem
      smalloc: fix compiler warning on Windows

 engines/libiscsi.c     |   3 +
 fio.h                  |  10 ---
 io_u.c                 |   4 +-
 smalloc.c              | 108 ++++++++++++++++++++++-
 smalloc.h              |   1 +
 t/stest.c              |  21 ++++-
 tools/plot/fio2gnuplot | 234 ++++++++++++++++++++++++-------------------------
 7 files changed, 247 insertions(+), 134 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libiscsi.c b/engines/libiscsi.c
index bea94c5a..58667fb2 100644
--- a/engines/libiscsi.c
+++ b/engines/libiscsi.c
@@ -351,6 +351,9 @@ static int fio_iscsi_getevents(struct thread_data *td, unsigned int min,
 
 		ret = poll(iscsi_info->pfds, iscsi_info->nr_luns, -1);
 		if (ret < 0) {
+			if (errno == EINTR || errno == EAGAIN) {
+				continue;
+			}
 			log_err("iscsi: failed to poll events: %s.\n",
 				strerror(errno));
 			break;
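
The libiscsi change retries poll() when it fails with EINTR (a signal interrupted the call) or EAGAIN (POSIX permits this when internal resources could not be allocated). Pulled out into a helper, the pattern looks roughly like the following. poll_restart() is a hypothetical name. For finite timeouts, restarting with the same timeout_ms can stretch the total wait. libiscsi passes -1, so that does not matter there.

```c
#include <errno.h>
#include <poll.h>

/* poll(2) that transparently restarts after EINTR/EAGAIN.
 * Returns >0 (ready fds), 0 (timeout) or -1 with errno set. */
static int poll_restart(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int ret;

	do {
		ret = poll(fds, nfds, timeout_ms);
	} while (ret < 0 && (errno == EINTR || errno == EAGAIN));

	return ret;
}
```
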
diff --git a/fio.h b/fio.h
index 2103151d..2094d30b 100644
--- a/fio.h
+++ b/fio.h
@@ -705,16 +705,6 @@ extern void lat_target_reset(struct thread_data *);
 	    	 (i) < (td)->o.nr_files && ((f) = (td)->files[i]) != NULL; \
 		 (i)++)
 
-#define fio_assert(td, cond)	do {	\
-	if (!(cond)) {			\
-		int *__foo = NULL;	\
-		fprintf(stderr, "file:%s:%d, assert %s failed\n", __FILE__, __LINE__, #cond);	\
-		td_set_runstate((td), TD_EXITED);	\
-		(td)->error = EFAULT;		\
-		*__foo = 0;			\
-	}	\
-} while (0)
-
 static inline bool fio_fill_issue_time(struct thread_data *td)
 {
 	if (td->o.read_iolog_file ||
diff --git a/io_u.c b/io_u.c
index 910b7deb..80df2854 100644
--- a/io_u.c
+++ b/io_u.c
@@ -557,10 +557,10 @@ static unsigned long long get_next_buflen(struct thread_data *td, struct io_u *i
 			for (i = 0; i < td->o.bssplit_nr[ddir]; i++) {
 				struct bssplit *bsp = &td->o.bssplit[ddir][i];
 
+				if (!bsp->perc)
+					continue;
 				buflen = bsp->bs;
 				perc += bsp->perc;
-				if (!perc)
-					break;
 				if ((r / perc <= frand_max / 100ULL) &&
 				    io_u_fits(td, io_u, buflen))
 					break;
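
The bssplit fix makes zero-percent entries be skipped outright. Before it, a 0% entry at the front of the list left the running total at 0, the `!perc` test broke out of the loop, and that entry's block size was chosen anyway. The selection can be sketched with a plain integer roll in [0, 100). This is a simplification: fio's real loop compares `r / perc` against `frand_max / 100` and also checks io_u_fits(). bs_entry and pick_bs() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, simplified form of fio's struct bssplit. */
struct bs_entry {
	unsigned long long bs; /* block size in bytes */
	unsigned int perc;     /* share of I/Os, in percent */
};

/* Pick a block size for roll in [0, 100) from cumulative percentages.
 * Entries with a 0% share can never be selected. Returns 0 in this
 * sketch if the percentages sum to less than 100. */
static unsigned long long pick_bs(const struct bs_entry *e, size_t n,
				  unsigned int roll)
{
	unsigned int cum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!e[i].perc)
			continue;
		cum += e[i].perc;
		if (roll < cum)
			return e[i].bs;
	}
	return 0;
}
```

For `bssplit=4k/0:8k/50:16k/50`, rolls 0..49 pick 8k and 50..99 pick 16k. 4k is never chosen.
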
diff --git a/smalloc.c b/smalloc.c
index a2ad25a0..125e07bf 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -48,6 +48,12 @@ struct block_hdr {
 #endif
 };
 
+/*
+ * This suppresses the voluminous potential bitmap printout when
+ * smalloc encounters an OOM error
+ */
+static const bool enable_smalloc_debug = false;
+
 static struct pool mp[MAX_POOLS];
 static unsigned int nr_pools;
 static unsigned int last_pool;
@@ -332,6 +338,25 @@ void sfree(void *ptr)
 	log_err("smalloc: ptr %p not from smalloc pool\n", ptr);
 }
 
+static unsigned int find_best_index(struct pool *pool)
+{
+	unsigned int i;
+
+	assert(pool->free_blocks);
+
+	for (i = pool->next_non_full; pool->bitmap[i] == -1U; i++) {
+		if (i == pool->nr_blocks - 1) {
+			unsigned int j;
+
+			for (j = 0; j < pool->nr_blocks; j++)
+				if (pool->bitmap[j] != -1U)
+					return j;
+		}
+	}
+
+	return i;
+}
+
 static void *__smalloc_pool(struct pool *pool, size_t size)
 {
 	size_t nr_blocks;
@@ -346,15 +371,16 @@ static void *__smalloc_pool(struct pool *pool, size_t size)
 	if (nr_blocks > pool->free_blocks)
 		goto fail;
 
-	i = pool->next_non_full;
+	pool->next_non_full = find_best_index(pool);
+
 	last_idx = 0;
 	offset = -1U;
+	i = pool->next_non_full;
 	while (i < pool->nr_blocks) {
 		unsigned int idx;
 
 		if (pool->bitmap[i] == -1U) {
 			i++;
-			pool->next_non_full = i;
 			last_idx = 0;
 			continue;
 		}
@@ -387,10 +413,9 @@ fail:
 	return ret;
 }
 
-static void *smalloc_pool(struct pool *pool, size_t size)
+static size_t size_to_alloc_size(size_t size)
 {
 	size_t alloc_size = size + sizeof(struct block_hdr);
-	void *ptr;
 
 	/*
 	 * Round to int alignment, so that the postred pointer will
@@ -401,6 +426,14 @@ static void *smalloc_pool(struct pool *pool, size_t size)
 	alloc_size = (alloc_size + int_mask) & ~int_mask;
 #endif
 
+	return alloc_size;
+}
+
+static void *smalloc_pool(struct pool *pool, size_t size)
+{
+	size_t alloc_size = size_to_alloc_size(size);
+	void *ptr;
+
 	ptr = __smalloc_pool(pool, alloc_size);
 	if (ptr) {
 		struct block_hdr *hdr = ptr;
@@ -415,6 +448,72 @@ static void *smalloc_pool(struct pool *pool, size_t size)
 	return ptr;
 }
 
+static void smalloc_print_bitmap(struct pool *pool)
+{
+	size_t nr_blocks = pool->nr_blocks;
+	unsigned int *bitmap = pool->bitmap;
+	unsigned int i, j;
+	char *buffer;
+
+	if (!enable_smalloc_debug)
+		return;
+
+	buffer = malloc(SMALLOC_BPI + 1);
+	if (!buffer)
+		return;
+	buffer[SMALLOC_BPI] = '\0';
+
+	for (i = 0; i < nr_blocks; i++) {
+		unsigned int line = bitmap[i];
+
+		/* skip completely full lines */
+		if (line == -1U)
+			continue;
+
+		for (j = 0; j < SMALLOC_BPI; j++)
+			if ((1 << j) & line)
+				buffer[SMALLOC_BPI-1-j] = '1';
+			else
+				buffer[SMALLOC_BPI-1-j] = '0';
+
+		log_err("smalloc: bitmap %5u, %s\n", i, buffer);
+	}
+
+	free(buffer);
+}
+
+void smalloc_debug(size_t size)
+{
+	unsigned int i;
+	size_t alloc_size = size_to_alloc_size(size);
+	size_t alloc_blocks;
+
+	alloc_blocks = size_to_blocks(alloc_size);
+
+	if (size)
+		log_err("smalloc: size = %lu, alloc_size = %lu, blocks = %lu\n",
+			(unsigned long) size, (unsigned long) alloc_size,
+			(unsigned long) alloc_blocks);
+	for (i = 0; i < nr_pools; i++) {
+		log_err("smalloc: pool %u, free/total blocks %u/%u\n", i,
+			(unsigned int) (mp[i].free_blocks),
+			(unsigned int) (mp[i].nr_blocks*sizeof(unsigned int)*8));
+		if (size && mp[i].free_blocks >= alloc_blocks) {
+			void *ptr = smalloc_pool(&mp[i], size);
+			if (ptr) {
+				sfree(ptr);
+				last_pool = i;
+				log_err("smalloc: smalloc_pool %u succeeded\n", i);
+			} else {
+				log_err("smalloc: smalloc_pool %u failed\n", i);
+				log_err("smalloc: next_non_full=%u, nr_blocks=%u\n",
+					(unsigned int) mp[i].next_non_full, (unsigned int) mp[i].nr_blocks);
+				smalloc_print_bitmap(&mp[i]);
+			}
+		}
+	}
+}
+
 void *smalloc(size_t size)
 {
 	unsigned int i, end_pool;
@@ -445,6 +544,7 @@ void *smalloc(size_t size)
 
 	log_err("smalloc: OOM. Consider using --alloc-size to increase the "
 		"shared memory available.\n");
+	smalloc_debug(size);
 	return NULL;
 }
 
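After this change, __smalloc_pool() no longer advances next_non_full while it skips full bitmap words. Instead, find_best_index() chooses the starting word up front. If every word from next_non_full to the last one is full, it rescans from word 0 and returns the first word that still has a free bit. The same search can be written as a single wrap-around loop. first_non_full() below is a hypothetical, simplified version that returns nr as a sentinel instead of asserting that the pool has free blocks.

```c
#include <stddef.h>

/* Index of the first bitmap word with at least one free (zero) bit,
 * scanning from start to the end and then wrapping to 0.
 * Returns nr if every word is full. */
static size_t first_non_full(const unsigned int *bitmap, size_t nr,
			     size_t start)
{
	size_t k;

	for (k = 0; k < nr; k++) {
		size_t i = (start + k) % nr;

		if (bitmap[i] != ~0U)
			return i;
	}
	return nr;
}
```

Because the words in [start, nr) were all full when the wrap happens, scanning [0, start) here finds the same word that find_best_index()'s rescan from 0 would.
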
diff --git a/smalloc.h b/smalloc.h
index 8df10e6f..1f7716f4 100644
--- a/smalloc.h
+++ b/smalloc.h
@@ -9,6 +9,7 @@ extern void sfree(void *);
 extern char *smalloc_strdup(const char *);
 extern void sinit(void);
 extern void scleanup(void);
+extern void smalloc_debug(size_t);
 
 extern unsigned int smalloc_pool_size;
 
diff --git a/t/stest.c b/t/stest.c
index b95968f1..515ae5a5 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -11,11 +11,14 @@
 #define MAGIC2	0xf0a1e9b3
 
 #define LOOPS	32
+#define MAXSMALLOC	120*1024*1024UL
+#define LARGESMALLOC	128*1024U
 
 struct elem {
 	unsigned int magic1;
 	struct flist_head list;
 	unsigned int magic2;
+	unsigned int size;
 };
 
 static FLIST_HEAD(list);
@@ -25,13 +28,15 @@ static int do_rand_allocs(void)
 	unsigned int size, nr, rounds = 0;
 	unsigned long total;
 	struct elem *e;
+	bool error;
 
 	while (rounds++ < LOOPS) {
 #ifdef STEST_SEED
 		srand(MAGIC1);
 #endif
+		error = false;
 		nr = total = 0;
-		while (total < 120*1024*1024UL) {
+		while (total < MAXSMALLOC) {
 			size = 8 * sizeof(struct elem) + (int) (999.0 * (rand() / (RAND_MAX + 1.0)));
 			e = smalloc(size);
 			if (!e) {
@@ -40,6 +45,7 @@ static int do_rand_allocs(void)
 			}
 			e->magic1 = MAGIC1;
 			e->magic2 = MAGIC2;
+			e->size = size;
 			total += size;
 			flist_add_tail(&e->list, &list);
 			nr++;
@@ -51,8 +57,20 @@ static int do_rand_allocs(void)
 			e = flist_entry(list.next, struct elem, list);
 			assert(e->magic1 == MAGIC1);
 			assert(e->magic2 == MAGIC2);
+			total -= e->size;
 			flist_del(&e->list);
 			sfree(e);
+
+			if (!error) {
+				e = smalloc(LARGESMALLOC);
+				if (!e) {
+					error = true;
+					printf("failure allocating %u bytes at %lu allocated during sfree phase\n",
+						LARGESMALLOC, total);
+				}
+				else
+					sfree(e);
+			}
 		}
 	}
 
@@ -66,6 +84,7 @@ int main(int argc, char *argv[])
 	debug_init();
 
 	do_rand_allocs();
+	smalloc_debug(0);	/* free and total blocks should match */
 
 	scleanup();
 	return 0;
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index 4d1815cf..cc4ea4c7 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -36,10 +36,10 @@ def find_file(path, pattern):
 	fio_data_file=[]
 	# For all the local files
 	for file in os.listdir(path):
-	    # If the file matches the glob
-	    if fnmatch.fnmatch(file, pattern):
-		# Let's consider this file
-		fio_data_file.append(file)
+		# If the file matches the glob
+		if fnmatch.fnmatch(file, pattern):
+			# Let's consider this file
+			fio_data_file.append(file)
 
 	return fio_data_file
 
@@ -51,7 +51,7 @@ def generate_gnuplot_script(fio_data_file,title,gnuplot_output_filename,gnuplot_
 
 	# Plotting 3D or comparing graphs doesn't have a meaning unless if there is at least 2 traces
 	if len(fio_data_file) > 1:
-        	f.write("call \'%s/graph3D.gpm\' \'%s' \'%s\' \'\' \'%s\' \'%s\'\n" % (gpm_dir,title,gnuplot_output_filename,gnuplot_output_filename,mode))
+		f.write("call \'%s/graph3D.gpm\' \'%s' \'%s\' \'\' \'%s\' \'%s\'\n" % (gpm_dir,title,gnuplot_output_filename,gnuplot_output_filename,mode))
 
 		# Setting up the compare files that will be plot later
 		compare=open(gnuplot_output_dir + 'compare.gnuplot','w')
@@ -93,10 +93,10 @@ set style line 1 lt 1 lw 3 pt 3 linecolor rgb "green"
 		compare_smooth.write("plot %s w l ls 1 ti 'Global average value (%.2f)'" % (global_avg,global_avg));
 		compare_trend.write("plot %s w l ls 1 ti 'Global average value (%.2f)'" % (global_avg,global_avg));
 
-        pos=0
-        # Let's create a temporary file for each selected fio file
-        for file in fio_data_file:
-                tmp_filename = "gnuplot_temp_file.%d" % pos
+		pos=0
+		# Let's create a temporary file for each selected fio file
+		for file in fio_data_file:
+			tmp_filename = "gnuplot_temp_file.%d" % pos
 
 		# Plotting comparing graphs doesn't have a meaning unless if there is at least 2 traces
 		if len(fio_data_file) > 1:
@@ -106,12 +106,12 @@ set style line 1 lt 1 lw 3 pt 3 linecolor rgb "green"
 			compare_trend.write(",\\\n'%s' using 2:3 smooth bezier title '%s'" % (tmp_filename,fio_data_file[pos]))
 
 		png_file=file.replace('.log','')
-                raw_filename = "%s-2Draw" % (png_file)
-                smooth_filename = "%s-2Dsmooth" % (png_file)
-                trend_filename = "%s-2Dtrend" % (png_file)
-                avg  = average(disk_perf[pos])
-                f.write("call \'%s/graph2D.gpm\' \'%s' \'%s\' \'%s\' \'%s\' \'%s\' \'%s\' \'%s\' \'%f\'\n" % (gpm_dir,title,tmp_filename,fio_data_file[pos],raw_filename,mode,smooth_filename,trend_filename,avg))
-                pos = pos +1
+		raw_filename = "%s-2Draw" % (png_file)
+		smooth_filename = "%s-2Dsmooth" % (png_file)
+		trend_filename = "%s-2Dtrend" % (png_file)
+		avg  = average(disk_perf[pos])
+		f.write("call \'%s/graph2D.gpm\' \'%s' \'%s\' \'%s\' \'%s\' \'%s\' \'%s\' \'%s\' \'%f\'\n" % (gpm_dir,title,tmp_filename,fio_data_file[pos],raw_filename,mode,smooth_filename,trend_filename,avg))
+		pos = pos +1
 
 	# Plotting comparing graphs doesn't have a meaning unless if there is at least 2 traces
 	if len(fio_data_file) > 1:
@@ -125,7 +125,7 @@ def generate_gnuplot_math_script(title,gnuplot_output_filename,mode,average,gnup
 	filename=gnuplot_output_dir+'mymath';
 	temporary_files.append(filename)
 	f=open(filename,'a')
-        f.write("call \'%s/math.gpm\' \'%s' \'%s\' \'\' \'%s\' \'%s\' %s\n" % (gpm_dir,title,gnuplot_output_filename,gnuplot_output_filename,mode,average))
+	f.write("call \'%s/math.gpm\' \'%s' \'%s\' \'\' \'%s\' \'%s\' %s\n" % (gpm_dir,title,gnuplot_output_filename,gnuplot_output_filename,mode,average))
 	f.close()
 
 def compute_aggregated_file(fio_data_file, gnuplot_output_filename, gnuplot_output_dir):
@@ -250,10 +250,10 @@ def compute_math(fio_data_file, title,gnuplot_output_filename,gnuplot_output_dir
 	stddev_file.write('DiskName %s\n'% mode )
 	for disk in range(len(fio_data_file)):
 #		print disk_perf[disk]
-	    	min_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
-	    	max_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
-	    	average_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
-	    	stddev_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
+		min_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
+		max_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
+		average_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
+		stddev_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
 		avg  = average(disk_perf[disk])
 		variance = [(x - avg)**2 for x in disk_perf[disk]]
 		standard_deviation = math.sqrt(average(variance))
@@ -406,126 +406,126 @@ def main(argv):
     force_keep_temp_files=False
 
     if not os.path.isfile(gpm_dir+'math.gpm'):
-	    gpm_dir="/usr/local/share/fio/"
-    	    if not os.path.isfile(gpm_dir+'math.gpm'):
-		    print("Looks like fio didn't get installed properly as no gpm files found in '/usr/share/fio' or '/usr/local/share/fio'\n")
-		    sys.exit(3)
+        gpm_dir="/usr/local/share/fio/"
+        if not os.path.isfile(gpm_dir+'math.gpm'):
+            print("Looks like fio didn't get installed properly as no gpm files found in '/usr/share/fio' or '/usr/local/share/fio'\n")
+            sys.exit(3)
 
     try:
-	    opts, args = getopt.getopt(argv[1:],"ghkbivo:d:t:p:G:m:M:",['bandwidth', 'iops', 'pattern', 'outputfile', 'outputdir', 'title', 'min_time', 'max_time', 'gnuplot', 'Global', 'help', 'verbose','keep'])
+        opts, args = getopt.getopt(argv[1:],"ghkbivo:d:t:p:G:m:M:",['bandwidth', 'iops', 'pattern', 'outputfile', 'outputdir', 'title', 'min_time', 'max_time', 'gnuplot', 'Global', 'help', 'verbose','keep'])
     except getopt.GetoptError:
-	 print("Error: One of the options passed to the cmdline was not supported")
-	 print("Please fix your command line or read the help (-h option)")
-         sys.exit(2)
+        print("Error: One of the options passed to the cmdline was not supported")
+        print("Please fix your command line or read the help (-h option)")
+        sys.exit(2)
 
     for opt, arg in opts:
-      if opt in ("-b", "--bandwidth"):
-         pattern='*_bw.log'
-      elif opt in ("-i", "--iops"):
-         pattern='*_iops.log'
-      elif opt in ("-v", "--verbose"):
-	 verbose=True
-      elif opt in ("-k", "--keep"):
-	 #User really wants to keep the temporary files
-	 force_keep_temp_files=True
-      elif opt in ("-p", "--pattern"):
-         pattern_set_by_user=True
-	 pattern=arg
-	 pattern=pattern.replace('\\','')
-      elif opt in ("-o", "--outputfile"):
-         gnuplot_output_filename=arg
-      elif opt in ("-d", "--outputdir"):
-         gnuplot_output_dir=arg
-	 if not gnuplot_output_dir.endswith('/'):
-		gnuplot_output_dir=gnuplot_output_dir+'/'
-	 if not os.path.exists(gnuplot_output_dir):
-		os.makedirs(gnuplot_output_dir)
-      elif opt in ("-t", "--title"):
-         title=arg
-      elif opt in ("-m", "--min_time"):
-	 min_time=arg
-      elif opt in ("-M", "--max_time"):
-	 max_time=arg
-      elif opt in ("-g", "--gnuplot"):
-	 run_gnuplot=True
-      elif opt in ("-G", "--Global"):
-	 parse_global=True
-	 global_search=arg
-      elif opt in ("-h", "--help"):
-	  print_help()
-	  sys.exit(1)
+        if opt in ("-b", "--bandwidth"):
+            pattern='*_bw.log'
+        elif opt in ("-i", "--iops"):
+            pattern='*_iops.log'
+        elif opt in ("-v", "--verbose"):
+            verbose=True
+        elif opt in ("-k", "--keep"):
+            #User really wants to keep the temporary files
+            force_keep_temp_files=True
+        elif opt in ("-p", "--pattern"):
+            pattern_set_by_user=True
+            pattern=arg
+            pattern=pattern.replace('\\','')
+        elif opt in ("-o", "--outputfile"):
+            gnuplot_output_filename=arg
+        elif opt in ("-d", "--outputdir"):
+            gnuplot_output_dir=arg
+            if not gnuplot_output_dir.endswith('/'):
+                gnuplot_output_dir=gnuplot_output_dir+'/'
+            if not os.path.exists(gnuplot_output_dir):
+                os.makedirs(gnuplot_output_dir)
+        elif opt in ("-t", "--title"):
+            title=arg
+        elif opt in ("-m", "--min_time"):
+            min_time=arg
+        elif opt in ("-M", "--max_time"):
+            max_time=arg
+        elif opt in ("-g", "--gnuplot"):
+            run_gnuplot=True
+        elif opt in ("-G", "--Global"):
+            parse_global=True
+            global_search=arg
+        elif opt in ("-h", "--help"):
+            print_help()
+            sys.exit(1)
 
     # Adding .global extension to the file
     if parse_global==True:
-	    if not gnuplot_output_filename.endswith('.global'):
-	    	pattern = pattern+'.global'
+        if not gnuplot_output_filename.endswith('.global'):
+            pattern = pattern+'.global'
 
     fio_data_file=find_file('.',pattern)
     if len(fio_data_file) == 0:
-	    print("No log file found with pattern %s!" % pattern)
-	    # Try numjob log file format if per_numjob_logs=1
-            if (pattern == '*_bw.log'):
-                fio_data_file=find_file('.','*_bw.*.log')
-            if (pattern == '*_iops.log'):
-                fio_data_file=find_file('.','*_iops.*.log')
-            if len(fio_data_file) == 0:
-                sys.exit(1)
-            else:
-                print("Using log file per job format instead")
+        print("No log file found with pattern %s!" % pattern)
+        # Try numjob log file format if per_numjob_logs=1
+        if (pattern == '*_bw.log'):
+            fio_data_file=find_file('.','*_bw.*.log')
+        if (pattern == '*_iops.log'):
+            fio_data_file=find_file('.','*_iops.*.log')
+        if len(fio_data_file) == 0:
+            sys.exit(1)
+        else:
+            print("Using log file per job format instead")
     else:
-	    print("%d files Selected with pattern '%s'" % (len(fio_data_file), pattern))
+        print("%d files Selected with pattern '%s'" % (len(fio_data_file), pattern))
 
     fio_data_file=sorted(fio_data_file, key=str.lower)
     for file in fio_data_file:
-	print(' |-> %s' % file)
-	if "_bw.log" in file :
-		mode="Bandwidth (KB/sec)"
-	if "_iops.log" in file :
-		mode="IO per Seconds (IO/sec)"
+        print(' |-> %s' % file)
+        if "_bw.log" in file :
+            mode="Bandwidth (KB/sec)"
+        if "_iops.log" in file :
+            mode="IO per Seconds (IO/sec)"
     if (title == 'No title') and (mode != 'unknown'):
-	    if "Bandwidth" in mode:
-		    title='Bandwidth benchmark with %d fio results' % len(fio_data_file)
-	    if "IO" in mode:
-		    title='IO benchmark with %d fio results' % len(fio_data_file)
+        if "Bandwidth" in mode:
+            title='Bandwidth benchmark with %d fio results' % len(fio_data_file)
+        if "IO" in mode:
+            title='IO benchmark with %d fio results' % len(fio_data_file)
 
     print()
     #We need to adjust the output filename regarding the pattern required by the user
     if (pattern_set_by_user == True):
-	    gnuplot_output_filename=pattern
-	    # As we do have some glob in the pattern, let's make this simpliest
-	    # We do remove the simpliest parts of the expression to get a clear file name
-	    gnuplot_output_filename=gnuplot_output_filename.replace('-*-','-')
-	    gnuplot_output_filename=gnuplot_output_filename.replace('*','-')
-	    gnuplot_output_filename=gnuplot_output_filename.replace('--','-')
-	    gnuplot_output_filename=gnuplot_output_filename.replace('.log','')
-	    # Insure that we don't have any starting or trailing dash to the filename
-	    gnuplot_output_filename = gnuplot_output_filename[:-1] if gnuplot_output_filename.endswith('-') else gnuplot_output_filename
-	    gnuplot_output_filename = gnuplot_output_filename[1:] if gnuplot_output_filename.startswith('-') else gnuplot_output_filename
-	    if (gnuplot_output_filename == ''):
-            	gnuplot_output_filename='default'	
+        gnuplot_output_filename=pattern
+        # As we do have some glob in the pattern, let's make this simpliest
+        # We do remove the simpliest parts of the expression to get a clear file name
+        gnuplot_output_filename=gnuplot_output_filename.replace('-*-','-')
+        gnuplot_output_filename=gnuplot_output_filename.replace('*','-')
+        gnuplot_output_filename=gnuplot_output_filename.replace('--','-')
+        gnuplot_output_filename=gnuplot_output_filename.replace('.log','')
+        # Insure that we don't have any starting or trailing dash to the filename
+        gnuplot_output_filename = gnuplot_output_filename[:-1] if gnuplot_output_filename.endswith('-') else gnuplot_output_filename
+        gnuplot_output_filename = gnuplot_output_filename[1:] if gnuplot_output_filename.startswith('-') else gnuplot_output_filename
+        if (gnuplot_output_filename == ''):
+            gnuplot_output_filename='default'
 
     if parse_global==True:
-	parse_global_files(fio_data_file, global_search)
+        parse_global_files(fio_data_file, global_search)
     else:
-    	blk_size=compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir,min_time,max_time)
-    	title="%s @ Blocksize = %dK" % (title,blk_size/1024)
-    	compute_aggregated_file(fio_data_file, gnuplot_output_filename, gnuplot_output_dir)
-    	compute_math(fio_data_file,title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir)
-    	generate_gnuplot_script(fio_data_file,title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir)
-
-    	if (run_gnuplot==True):
-    		render_gnuplot(fio_data_file, gnuplot_output_dir)
-
-	# Shall we clean the temporary files ?
-	if keep_temp_files==False and force_keep_temp_files==False:
-	    	# Cleaning temporary files
-		if verbose: print("Cleaning temporary files")
-		for f in enumerate(temporary_files):
-			if verbose: print(" -> %s"%f[1])
-			try:
-			    os.remove(f[1])
-			except:
-			    True
+        blk_size=compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir,min_time,max_time)
+        title="%s @ Blocksize = %dK" % (title,blk_size/1024)
+        compute_aggregated_file(fio_data_file, gnuplot_output_filename, gnuplot_output_dir)
+        compute_math(fio_data_file,title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir)
+        generate_gnuplot_script(fio_data_file,title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir)
+
+        if (run_gnuplot==True):
+            render_gnuplot(fio_data_file, gnuplot_output_dir)
+
+        # Shall we clean the temporary files ?
+        if keep_temp_files==False and force_keep_temp_files==False:
+            # Cleaning temporary files
+            if verbose: print("Cleaning temporary files")
+            for f in enumerate(temporary_files):
+                if verbose: print(" -> %s"%f[1])
+                try:
+                    os.remove(f[1])
+                except:
+                    True
 
 #Main
 if __name__ == "__main__":



* Recent changes (master)
@ 2019-07-27 12:00 Jens Axboe
From: Jens Axboe @ 2019-07-27 12:00 UTC
  To: fio

The following changes since commit fc220349e45144360917db48010b503a9874930d:

  Merge branch 'dev' of https://github.com/smartxworks/fio (2019-07-12 10:44:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 029b42ace698eae477c5e261d2f82b191507526b:

  Document io_uring feature (2019-07-26 09:53:43 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      engines/libaio: remove remnants of abandoned aio features
      Document io_uring feature

 HOWTO            | 34 ++++++++++++++++++++++++++++++++
 engines/libaio.c | 60 +++-----------------------------------------------------
 fio.1            | 23 ++++++++++++++++++++++
 3 files changed, 60 insertions(+), 57 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 142b83e5..81244064 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1805,6 +1805,11 @@ I/O engine
 		**pvsync2**
 			Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O.
 
+		**io_uring**
+			Fast Linux native asynchronous I/O. Supports async IO
+			for both direct and buffered IO.
+			This engine defines engine specific options.
+
 		**libaio**
 			Linux native asynchronous I/O. Note that Linux may only support
 			queued behavior with non-buffered I/O (set ``direct=1`` or
@@ -2002,6 +2007,35 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 :option:`ioengine` that defines them is selected.
 
+.. option:: hipri : [io_uring]
+
+	If this option is set, fio will attempt to use polled IO completions.
+	Normal IO completions generate interrupts to signal the completion of
+	IO, polled completions do not. Hence they require active reaping
+	by the application. The benefits are more efficient IO for high IOPS
+	scenarios, and lower latencies for low queue depth IO.
+
+.. option:: fixedbufs : [io_uring]
+
+	If fio is asked to do direct IO, then Linux will map pages for each
+	IO call, and release them when IO is done. If this option is set, the
+	pages are pre-mapped before IO is started. This eliminates the need to
+	map and release for each IO. This is more efficient, and reduces the
+	IO latency as well.
+
+.. option:: sqthread_poll : [io_uring]
+
+	Normally fio will submit IO by issuing a system call to notify the
+	kernel of available items in the SQ ring. If this option is set, the
+	act of submitting IO will be done by a polling thread in the kernel.
+	This frees up cycles for fio, at the cost of using more CPU in the
+	system.
+
+.. option:: sqthread_poll_cpu : [io_uring]
+
+	When :option:`sqthread_poll` is set, this option provides a way to
+	define which CPU should be used for the polling thread.
+
 .. option:: userspace_reap : [libaio]
 
 	Normally, with the libaio engine in use, fio will use the
diff --git a/engines/libaio.c b/engines/libaio.c
index cc6ca66b..cd5b89f9 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -16,14 +16,6 @@
 #include "../optgroup.h"
 #include "../lib/memalign.h"
 
-#ifndef IOCB_FLAG_HIPRI
-#define IOCB_FLAG_HIPRI	(1 << 2)
-#endif
-
-#ifndef IOCTX_FLAG_IOPOLL
-#define IOCTX_FLAG_IOPOLL	(1 << 0)
-#endif
-
 static int fio_libaio_commit(struct thread_data *td);
 
 struct libaio_data {
@@ -65,15 +57,6 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
-	{
-		.name	= "hipri",
-		.lname	= "High Priority",
-		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct libaio_options, hipri),
-		.help	= "Use polled IO completions",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
 	{
 		.name	= NULL,
 	},
@@ -91,19 +74,12 @@ static inline void ring_inc(struct libaio_data *ld, unsigned int *val,
 static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
-	struct libaio_options *o = td->eo;
-	struct iocb *iocb;
-
-	iocb = &io_u->iocb;
+	struct iocb *iocb = &io_u->iocb;
 
 	if (io_u->ddir == DDIR_READ) {
 		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-		if (o->hipri)
-			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (io_u->ddir == DDIR_WRITE) {
 		io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-		if (o->hipri)
-			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
@@ -366,42 +342,12 @@ static void fio_libaio_cleanup(struct thread_data *td)
 	}
 }
 
-static int fio_libaio_old_queue_init(struct libaio_data *ld, unsigned int depth,
-				     bool hipri)
-{
-	if (hipri) {
-		log_err("fio: polled aio not available on your platform\n");
-		return 1;
-	}
-
-	return io_queue_init(depth, &ld->aio_ctx);
-}
-
-static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
-				 bool hipri)
-{
-#ifdef __NR_sys_io_setup2
-	int ret, flags = 0;
-
-	if (hipri)
-		flags |= IOCTX_FLAG_IOPOLL;
-
-	ret = syscall(__NR_sys_io_setup2, depth, flags, NULL, NULL,
-			&ld->aio_ctx);
-	if (!ret)
-		return 0;
-	/* fall through to old syscall */
-#endif
-	return fio_libaio_old_queue_init(ld, depth, hipri);
-}
-
 static int fio_libaio_post_init(struct thread_data *td)
 {
 	struct libaio_data *ld = td->io_ops_data;
-	struct libaio_options *o = td->eo;
-	int err = 0;
+	int err;
 
-	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri);
+	err = io_queue_init(td->o.iodepth, &ld->aio_ctx);
 	if (err) {
 		td_verror(td, -err, "io_queue_init");
 		return 1;
diff --git a/fio.1 b/fio.1
index 156201ad..2966d9d5 100644
--- a/fio.1
+++ b/fio.1
@@ -1762,6 +1762,29 @@ In addition, there are some parameters which are only valid when a specific
 with the caveat that when used on the command line, they must come after the
 \fBioengine\fR that defines them is selected.
 .TP
+.BI (io_uring)hipri
+If this option is set, fio will attempt to use polled IO completions. Normal IO
+completions generate interrupts to signal the completion of IO, polled
+completions do not. Hence they require active reaping by the application.
+The benefits are more efficient IO for high IOPS scenarios, and lower latencies
+for low queue depth IO.
+.TP
+.BI (io_uring)fixedbufs
+If fio is asked to do direct IO, then Linux will map pages for each IO call, and
+release them when IO is done. If this option is set, the pages are pre-mapped
+before IO is started. This eliminates the need to map and release for each IO.
+This is more efficient, and reduces the IO latency as well.
+.TP
+.BI (io_uring)sqthread_poll
+Normally fio will submit IO by issuing a system call to notify the kernel of
+available items in the SQ ring. If this option is set, the act of submitting IO
+will be done by a polling thread in the kernel. This frees up cycles for fio, at
+the cost of using more CPU in the system.
+.TP
+.BI (io_uring)sqthread_poll_cpu
+When `sqthread_poll` is set, this option provides a way to define which CPU
+should be used for the polling thread.
+.TP
 .BI (libaio)userspace_reap
 Normally, with the libaio engine in use, fio will use the
 \fBio_getevents\fR\|(3) system call to reap newly returned events. With



* Recent changes (master)
@ 2019-07-13 12:00 Jens Axboe
From: Jens Axboe @ 2019-07-13 12:00 UTC
  To: fio

The following changes since commit f32a30d4c4eb2490b5c1bdac9ae3c2fc7a7ab20e:

  engines/http: set FIO_SYNCIO flag (2019-07-09 08:55:31 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fc220349e45144360917db48010b503a9874930d:

  Merge branch 'dev' of https://github.com/smartxworks/fio (2019-07-12 10:44:45 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fio 3.15
      Merge branch 'dev' of https://github.com/smartxworks/fio

Kyle Zhang (1):
      libiscsi: log reason of error when readcapacity failed

 FIO-VERSION-GEN    | 2 +-
 engines/libiscsi.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 69222008..350da551 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.14
+DEF_VER=fio-3.15
 
 LF='
 '
diff --git a/engines/libiscsi.c b/engines/libiscsi.c
index e4eb0bab..bea94c5a 100644
--- a/engines/libiscsi.c
+++ b/engines/libiscsi.c
@@ -117,7 +117,8 @@ static int fio_iscsi_setup_lun(struct iscsi_info *iscsi_info,
 
 	task = iscsi_readcapacity16_sync(iscsi_lun->iscsi, iscsi_lun->url->lun);
 	if (task == NULL || task->status != SCSI_STATUS_GOOD) {
-		log_err("iscsi: failed to send readcapacity command\n");
+		log_err("iscsi: failed to send readcapacity command: %s\n",
+			iscsi_get_error(iscsi_lun->iscsi));
 		ret = EINVAL;
 		goto out;
 	}



* Recent changes (master)
@ 2019-07-10 12:00 Jens Axboe
From: Jens Axboe @ 2019-07-10 12:00 UTC
  To: fio

The following changes since commit 308aa5d011158d5f7fa533a60199066dd1858e4c:

  Merge branch 'optlenmax' of https://github.com/powernap/fio (2019-07-01 14:26:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f32a30d4c4eb2490b5c1bdac9ae3c2fc7a7ab20e:

  engines/http: set FIO_SYNCIO flag (2019-07-09 08:55:31 -0600)

----------------------------------------------------------------
Vincent Fu (2):
      fio: fix aio trim completion latencies
      engines/http: set FIO_SYNCIO flag

 engines/http.c     | 2 +-
 engines/io_uring.c | 1 +
 engines/libaio.c   | 1 +
 engines/posixaio.c | 1 +
 ioengines.c        | 8 ++++++--
 ioengines.h        | 2 ++
 6 files changed, 12 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/http.c b/engines/http.c
index a35c0332..275fcab5 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -642,7 +642,7 @@ static int fio_http_invalidate(struct thread_data *td, struct fio_file *f)
 static struct ioengine_ops ioengine = {
 	.name = "http",
 	.version		= FIO_IOOPS_VERSION,
-	.flags			= FIO_DISKLESSIO,
+	.flags			= FIO_DISKLESSIO | FIO_SYNCIO,
 	.setup			= fio_http_setup,
 	.queue			= fio_http_queue,
 	.getevents		= fio_http_getevents,
diff --git a/engines/io_uring.c b/engines/io_uring.c
index a5e77d8f..9bcfec17 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -533,6 +533,7 @@ static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 static struct ioengine_ops ioengine = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM,
 	.init			= fio_ioring_init,
 	.post_init		= fio_ioring_post_init,
 	.io_u_init		= fio_ioring_io_u_init,
diff --git a/engines/libaio.c b/engines/libaio.c
index 8844ac8b..cc6ca66b 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -429,6 +429,7 @@ static int fio_libaio_init(struct thread_data *td)
 static struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM,
 	.init			= fio_libaio_init,
 	.post_init		= fio_libaio_post_init,
 	.prep			= fio_libaio_prep,
diff --git a/engines/posixaio.c b/engines/posixaio.c
index 4ac01957..82c6aa65 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -243,6 +243,7 @@ static int fio_posixaio_init(struct thread_data *td)
 static struct ioengine_ops ioengine = {
 	.name		= "posixaio",
 	.version	= FIO_IOOPS_VERSION,
+	.flags		= FIO_ASYNCIO_SYNC_TRIM,
 	.init		= fio_posixaio_init,
 	.prep		= fio_posixaio_prep,
 	.queue		= fio_posixaio_queue,
diff --git a/ioengines.c b/ioengines.c
index 7e5a50cc..aa4ccd27 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -308,7 +308,9 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	io_u->error = 0;
 	io_u->resid = 0;
 
-	if (td_ioengine_flagged(td, FIO_SYNCIO)) {
+	if (td_ioengine_flagged(td, FIO_SYNCIO) ||
+		(td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) && 
+		io_u->ddir == DDIR_TRIM)) {
 		if (fio_fill_issue_time(td))
 			fio_gettime(&io_u->issue_time, NULL);
 
@@ -389,7 +391,9 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 			td_io_commit(td);
 	}
 
-	if (!td_ioengine_flagged(td, FIO_SYNCIO)) {
+	if (!td_ioengine_flagged(td, FIO_SYNCIO) &&
+		(!td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) ||
+		 io_u->ddir != DDIR_TRIM)) {
 		if (fio_fill_issue_time(td))
 			fio_gettime(&io_u->issue_time, NULL);
 
diff --git a/ioengines.h b/ioengines.h
index b9cd33d5..01a9b586 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -63,6 +63,8 @@ enum fio_ioengine_flags {
 	FIO_FAKEIO	= 1 << 11,	/* engine pretends to do IO */
 	FIO_NOSTATS	= 1 << 12,	/* don't do IO stats */
 	FIO_NOFILEHASH	= 1 << 13,	/* doesn't hash the files for lookup later. */
+	FIO_ASYNCIO_SYNC_TRIM
+			= 1 << 14	/* io engine has async ->queue except for trim */
 };
 
 /*



* Recent changes (master)
@ 2019-07-02 12:00 Jens Axboe
From: Jens Axboe @ 2019-07-02 12:00 UTC
  To: fio

The following changes since commit 971341ad741db8d2ae57d8f9b368d75f6c02c1a9:

  Merge branch 'wip-tcmalloc' of https://github.com/dillaman/fio (2019-05-31 15:37:18 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 308aa5d011158d5f7fa533a60199066dd1858e4c:

  Merge branch 'optlenmax' of https://github.com/powernap/fio (2019-07-01 14:26:29 -0600)

----------------------------------------------------------------
Damien Le Moal (2):
      Fix string copy compilation warnings
      eta: Fix compiler warning

Jens Axboe (2):
      engines/rbd: hide rbd_io_u_seen() if not used
      Merge branch 'optlenmax' of https://github.com/powernap/fio

Nick Principe (1):
      Increase maximum length of line in jobs file to 8192

 engines/rbd.c           |  2 ++
 eta.c                   |  9 ++++++---
 exp/expression-parser.y |  6 +++---
 filesetup.c             |  3 ++-
 fio.1                   |  2 ++
 init.c                  | 11 ++++++-----
 parse.h                 |  2 +-
 server.c                |  9 ++++++---
 8 files changed, 28 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index 081b4a04..7d4d3faf 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -316,12 +316,14 @@ static inline int fri_check_complete(struct rbd_data *rbd, struct io_u *io_u,
 	return 0;
 }
 
+#ifndef CONFIG_RBD_POLL
 static inline int rbd_io_u_seen(struct io_u *io_u)
 {
 	struct fio_rbd_iou *fri = io_u->engine_data;
 
 	return fri->io_seen;
 }
+#endif
 
 static void rbd_io_u_wait_complete(struct io_u *io_u)
 {
diff --git a/eta.c b/eta.c
index b69dd194..647a1bdd 100644
--- a/eta.c
+++ b/eta.c
@@ -392,6 +392,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	static unsigned long long disp_io_iops[DDIR_RWDIR_CNT];
 	static struct timespec rate_prev_time, disp_prev_time;
 
+	void *je_rate = (void *) je->rate;
+	void *je_iops = (void *) je->iops;
+
 	if (!force) {
 		if (!(output_format & FIO_OUTPUT_NORMAL) &&
 		    f_out == stdout)
@@ -507,7 +510,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 	if (write_bw_log && rate_time > bw_avg_time && !in_ramp_time(td)) {
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
-				je->rate);
+				je_rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
 		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0);
 		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0);
@@ -519,8 +522,8 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	if (!force && !eta_time_within_slack(disp_time))
 		return false;
 
-	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je->rate);
-	calc_iops(unified_rw_rep, disp_time, io_iops, disp_io_iops, je->iops);
+	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je_rate);
+	calc_iops(unified_rw_rep, disp_time, io_iops, disp_io_iops, je_iops);
 
 	memcpy(&disp_prev_time, &now, sizeof(now));
 
diff --git a/exp/expression-parser.y b/exp/expression-parser.y
index 04a6e07a..8619025c 100644
--- a/exp/expression-parser.y
+++ b/exp/expression-parser.y
@@ -204,9 +204,9 @@ static void setup_to_parse_string(const char *string)
 {
 	unsigned int len;
 
-	len = strlen(string);
-	if (len > sizeof(lexer_input_buffer) - 3)
-		len = sizeof(lexer_input_buffer) - 3;
+	len = sizeof(lexer_input_buffer) - 3;
+	if (len > strlen(string))
+		len = strlen(string);
 
 	strncpy(lexer_input_buffer, string, len);
 	lexer_input_buffer[len] = '\0'; 
diff --git a/filesetup.c b/filesetup.c
index 24e6fb07..17fa31fb 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -829,7 +829,8 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 			continue;
 
 		fm = calloc(1, sizeof(*fm));
-		strncpy(fm->__base, buf, sizeof(fm->__base) - 1);
+		strncpy(fm->__base, buf, sizeof(fm->__base));
+		fm->__base[255] = '\0'; 
 		fm->base = basename(fm->__base);
 		fm->key = sb.st_dev;
 		flist_add(&fm->list, &list);
diff --git a/fio.1 b/fio.1
index 84b80eee..156201ad 100644
--- a/fio.1
+++ b/fio.1
@@ -201,6 +201,8 @@ argument, \fB\-\-cmdhelp\fR will detail the given \fIcommand\fR.
 See the `examples/' directory for inspiration on how to write job files. Note
 the copyright and license requirements currently apply to
 `examples/' files.
+
+Note that the maximum length of a line in the job file is 8192 bytes.
 .SH "JOB FILE PARAMETERS"
 Some parameters take an option of a given type, such as an integer or a
 string. Anywhere a numeric value is required, an arithmetic expression may be
diff --git a/init.c b/init.c
index 73834279..c9f6198e 100644
--- a/init.c
+++ b/init.c
@@ -1438,7 +1438,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		   int recursed, int client_type)
 {
 	unsigned int i;
-	char fname[PATH_MAX];
+	char fname[PATH_MAX + 1];
 	int numjobs, file_alloced;
 	struct thread_options *o = &td->o;
 	char logname[PATH_MAX + 32];
@@ -1887,7 +1887,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 		}
 	}
 
-	string = malloc(4096);
+	string = malloc(OPT_LEN_MAX);
 
 	/*
 	 * it's really 256 + small bit, 280 should suffice
@@ -1920,7 +1920,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 			if (is_buf)
 				p = strsep(&file, "\n");
 			else
-				p = fgets(string, 4096, f);
+				p = fgets(string, OPT_LEN_MAX, f);
 			if (!p)
 				break;
 		}
@@ -1989,7 +1989,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 				if (is_buf)
 					p = strsep(&file, "\n");
 				else
-					p = fgets(string, 4096, f);
+					p = fgets(string, OPT_LEN_MAX, f);
 				if (!p)
 					break;
 				dprint(FD_PARSE, "%s", p);
@@ -2040,7 +2040,8 @@ static int __parse_jobs_ini(struct thread_data *td,
 					strncpy(full_fn,
 						file, (ts - file) + 1);
 					strncpy(full_fn + (ts - file) + 1,
-						filename, strlen(filename));
+						filename,
+						len - (ts - file) - 1);
 					full_fn[len - 1] = 0;
 					filename = full_fn;
 				}
diff --git a/parse.h b/parse.h
index b47a02c7..9b4e2f32 100644
--- a/parse.h
+++ b/parse.h
@@ -37,7 +37,7 @@ struct value_pair {
 	void *cb;			/* sub-option callback */
 };
 
-#define OPT_LEN_MAX 	4096
+#define OPT_LEN_MAX 	8192
 #define PARSE_MAX_VP	24
 
 /*
diff --git a/server.c b/server.c
index 2a337707..23e549a5 100644
--- a/server.c
+++ b/server.c
@@ -1470,9 +1470,12 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 
 	memset(&p, 0, sizeof(p));
 
-	strncpy(p.ts.name, ts->name, FIO_JOBNAME_SIZE - 1);
-	strncpy(p.ts.verror, ts->verror, FIO_VERROR_SIZE - 1);
-	strncpy(p.ts.description, ts->description, FIO_JOBDESC_SIZE - 1);
+	strncpy(p.ts.name, ts->name, FIO_JOBNAME_SIZE);
+	p.ts.name[FIO_JOBNAME_SIZE - 1] = '\0';
+	strncpy(p.ts.verror, ts->verror, FIO_VERROR_SIZE);
+	p.ts.verror[FIO_VERROR_SIZE - 1] = '\0';
+	strncpy(p.ts.description, ts->description, FIO_JOBDESC_SIZE);
+	p.ts.description[FIO_JOBDESC_SIZE - 1] = '\0';
 
 	p.ts.error		= cpu_to_le32(ts->error);
 	p.ts.thread_number	= cpu_to_le32(ts->thread_number);
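The server.c hunk above switches from `strncpy(dst, src, SIZE - 1)` to a full-size `strncpy()` followed by an explicit terminator in the last byte. Below is a minimal sketch of that pattern as a standalone helper. `copy_fixed()` is a hypothetical name used only for illustration and is not part of fio.

```c
#include <string.h>

/*
 * Copy src into a fixed-size buffer of dst_size bytes, following the
 * strncpy() + explicit terminator pattern from the server.c hunk.
 * strncpy() does not NUL-terminate when src has dst_size or more bytes,
 * so the last byte is forced to '\0' afterwards.
 */
static void copy_fixed(char *dst, const char *src, size_t dst_size)
{
	if (dst_size == 0)
		return;
	strncpy(dst, src, dst_size);
	dst[dst_size - 1] = '\0';
}
```

For example, copying `"a-long-job-name"` into an 8-byte buffer leaves the 7-character prefix `"a-long-"` plus the terminator.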

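With `OPT_LEN_MAX` raised to 8192, `__parse_jobs_ini()` reads job files with `fgets(string, OPT_LEN_MAX, f)`, so a longer line comes back from `fgets()` in pieces rather than being rejected. The sketch below shows one way a caller could detect that case. `read_job_line()` and `SKETCH_LINE_MAX` are hypothetical, and the 8192 value is assumed to mirror the new `OPT_LEN_MAX`.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_LINE_MAX 8192	/* assumed to match the new OPT_LEN_MAX */

/*
 * Read one line into buf (at least SKETCH_LINE_MAX bytes).
 * Returns 1 for a complete line, 0 at EOF, and -1 if the line did not fit:
 * fgets() stops after SKETCH_LINE_MAX - 1 bytes, and the rest of the line
 * would be returned by the next call as if it were a new line.
 * A final unterminated line of exactly SKETCH_LINE_MAX - 1 bytes is also
 * reported as -1, because fgets() stops before it sees EOF.
 */
static int read_job_line(FILE *f, char *buf)
{
	size_t len;

	if (!fgets(buf, SKETCH_LINE_MAX, f))
		return 0;
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		return 1;
	/* No newline: either the last line of the file, or truncated. */
	if (feof(f))
		return 1;
	return -1;
}
```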


* Recent changes (master)
@ 2019-06-01 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-06-01 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ce4d13ca162df4127ec3b5911553802c53396705:

  glusterfs: update for new API (2019-05-23 14:57:52 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 971341ad741db8d2ae57d8f9b368d75f6c02c1a9:

  Merge branch 'wip-tcmalloc' of https://github.com/dillaman/fio (2019-05-31 15:37:18 -0600)

----------------------------------------------------------------
Helmut Grohne (2):
      configure: apply ${cross_prefix} to pkg-config calls
      configure: check for gtk version using pkg-config

Jason Dillaman (1):
      configure: attempt to link against tcmalloc by default if available

Jens Axboe (1):
      Merge branch 'wip-tcmalloc' of https://github.com/dillaman/fio

 configure | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index b0052dc1..a0692d58 100755
--- a/configure
+++ b/configure
@@ -207,6 +207,8 @@ for opt do
   ;;
   --enable-libiscsi) libiscsi="yes"
   ;;
+  --disable-tcmalloc) disable_tcmalloc="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -243,6 +245,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-native        Don't build for native host"
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   echo "--enable-libiscsi       Enable iscsi support"
+  echo "--disable-tcmalloc	Disable tcmalloc support"
   exit $exit_val
 fi
 
@@ -1344,31 +1347,30 @@ int main(void)
   return GTK_CHECK_VERSION(2, 18, 0) ? 0 : 1; /* 0 on success */
 }
 EOF
-GTK_CFLAGS=$(pkg-config --cflags gtk+-2.0 gthread-2.0)
+GTK_CFLAGS=$(${cross_prefix}pkg-config --cflags gtk+-2.0 gthread-2.0)
 ORG_LDFLAGS=$LDFLAGS
 LDFLAGS=$(echo $LDFLAGS | sed s/"-static"//g)
 if test "$?" != "0" ; then
   echo "configure: gtk and gthread not found"
   exit 1
 fi
-GTK_LIBS=$(pkg-config --libs gtk+-2.0 gthread-2.0)
+GTK_LIBS=$(${cross_prefix}pkg-config --libs gtk+-2.0 gthread-2.0)
 if test "$?" != "0" ; then
   echo "configure: gtk and gthread not found"
   exit 1
 fi
-if compile_prog "$GTK_CFLAGS" "$GTK_LIBS" "gfio" ; then
-  $TMPE
-  if test "$?" = "0" ; then
+if ! ${cross_prefix}pkg-config --atleast-version 2.18.0 gtk+-2.0; then
+  echo "GTK found, but need version 2.18 or higher"
+  gfio="no"
+else
+  if compile_prog "$GTK_CFLAGS" "$GTK_LIBS" "gfio" ; then
     gfio="yes"
     GFIO_LIBS="$LIBS $GTK_LIBS"
     CFLAGS="$CFLAGS $GTK_CFLAGS"
   else
-    echo "GTK found, but need version 2.18 or higher"
+    echo "Please install gtk and gdk libraries"
     gfio="no"
   fi
-else
-  echo "Please install gtk and gdk libraries"
-  gfio="no"
 fi
 LDFLAGS=$ORG_LDFLAGS
 fi
@@ -2696,6 +2698,19 @@ if test "$libiscsi" = "yes" ; then
   echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
   echo "LIBISCSI_LIBS=$libiscsi_libs" >> $config_host_mak
 fi
+cat > $TMPC << EOF
+int main(int argc, char **argv)
+{
+  return 0;
+}
+EOF
+if test "$disable_tcmalloc" != "yes"  && compile_prog "" "-ltcmalloc" "tcmalloc"; then
+  LIBS="-ltcmalloc $LIBS"
+  tcmalloc="yes"
+else
+  tcmalloc="no"
+fi
+print_config "TCMalloc support" "$tcmalloc"
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak



* Recent changes (master)
@ 2019-05-24 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-24 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit de5ed0e4d398bc9d4576f9b2b82d7686989c27e1:

  configure: add gettid() test (2019-05-22 17:12:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ce4d13ca162df4127ec3b5911553802c53396705:

  glusterfs: update for new API (2019-05-23 14:57:52 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      glusterfs: update for new API

 configure                 | 21 +++++++++++++++++++++
 engines/glusterfs.c       |  8 ++++++++
 engines/glusterfs_async.c |  5 +++++
 engines/glusterfs_sync.c  |  8 ++++++++
 4 files changed, 42 insertions(+)

---

Diff of recent changes:

diff --git a/configure b/configure
index ee421663..b0052dc1 100755
--- a/configure
+++ b/configure
@@ -1789,6 +1789,24 @@ fi
 print_config "Gluster API use fadvise" "$gf_fadvise"
 fi
 
+##########################################
+# check for newer gfapi
+if test "$gfapi" = "yes" ; then
+gf_new="no"
+cat > $TMPC << EOF
+#include <glusterfs/api/glfs.h>
+
+int main(int argc, char **argv)
+{
+  return glfs_fsync(NULL, NULL, NULL) && glfs_ftruncate(NULL, 0, NULL, NULL);
+}
+EOF
+if compile_prog "" "-lgfapi -lglusterfs" "gf new api"; then
+  gf_new="yes"
+fi
+print_config "Gluster new API" "$gf_new"
+fi
+
 ##########################################
 # check for gfapi trim support
 if test "$gf_trim" != "yes" ; then
@@ -2576,6 +2594,9 @@ fi
 if test "$gf_trim" = "yes" ; then
   output_sym "CONFIG_GF_TRIM"
 fi
+if test "$gf_new" = "yes" ; then
+  output_sym "CONFIG_GF_NEW_API"
+fi
 if test "$libhdfs" = "yes" ; then
   output_sym "CONFIG_LIBHDFS"
   echo "FIO_HDFS_CPU=$FIO_HDFS_CPU" >> $config_host_mak
diff --git a/engines/glusterfs.c b/engines/glusterfs.c
index d0250b70..f2b84a2a 100644
--- a/engines/glusterfs.c
+++ b/engines/glusterfs.c
@@ -288,7 +288,11 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 		    || sb.st_size < f->real_file_size) {
 			dprint(FD_FILE, "fio extend file %s from %jd to %" PRIu64 "\n",
 			       f->file_name, (intmax_t) sb.st_size, f->real_file_size);
+#if defined(CONFIG_GF_NEW_API)
+			ret = glfs_ftruncate(g->fd, f->real_file_size, NULL, NULL);
+#else
 			ret = glfs_ftruncate(g->fd, f->real_file_size);
+#endif
 			if (ret) {
 				log_err("failed fio extend file %s to %" PRIu64 "\n",
 					f->file_name, f->real_file_size);
@@ -350,7 +354,11 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 					       f->file_name);
 					glfs_unlink(g->fs, f->file_name);
 				} else if (td->o.create_fsync) {
+#if defined(CONFIG_GF_NEW_API)
+					if (glfs_fsync(g->fd, NULL, NULL) < 0) {
+#else
 					if (glfs_fsync(g->fd) < 0) {
+#endif
 						dprint(FD_FILE,
 						       "failed to sync, close %s\n",
 						       f->file_name);
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 9e1c4bf0..0392ad6e 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -84,7 +84,12 @@ static int fio_gf_io_u_init(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
+#if defined(CONFIG_GF_NEW_API)
+static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, struct glfs_stat *prestat,
+			struct glfs_stat *poststat, void *data)
+#else
 static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
+#endif
 {
 	struct io_u *io_u = data;
 	struct fio_gf_iou *iou = io_u->engine_data;
diff --git a/engines/glusterfs_sync.c b/engines/glusterfs_sync.c
index 099a5af1..de73261f 100644
--- a/engines/glusterfs_sync.c
+++ b/engines/glusterfs_sync.c
@@ -42,9 +42,17 @@ static enum fio_q_status fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 	else if (io_u->ddir == DDIR_WRITE)
 		ret = glfs_write(g->fd, io_u->xfer_buf, io_u->xfer_buflen, 0);
 	else if (io_u->ddir == DDIR_SYNC)
+#if defined(CONFIG_GF_NEW_API)
+		ret = glfs_fsync(g->fd, NULL, NULL);
+#else
 		ret = glfs_fsync(g->fd);
+#endif
 	else if (io_u->ddir == DDIR_DATASYNC)
+#if defined(CONFIG_GF_NEW_API)
+		ret = glfs_fdatasync(g->fd, NULL, NULL);
+#else
 		ret = glfs_fdatasync(g->fd);
+#endif
 	else {
 		log_err("unsupported operation.\n");
 		io_u->error = EINVAL;
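The glusterfs hunks repeat the same `#if defined(CONFIG_GF_NEW_API)` switch at every call site. Below is a sketch of an alternative that confines the switch to a single wrapper macro. `old_api_fsync()`, `new_api_fsync()`, `struct sketch_stat` and `SKETCH_FSYNC()` are hypothetical stand-ins that let the example compile without gfapi. Only their arity mirrors the two `glfs_fsync()` forms in the diff.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the two glfs_fsync() forms: per the diff,
 * the newer one takes two extra pointer arguments (passed as NULL).
 */
struct sketch_stat {
	long size;
};

static inline int old_api_fsync(int fd)
{
	return fd < 0 ? -1 : 0;
}

static inline int new_api_fsync(int fd, struct sketch_stat *prestat,
				struct sketch_stat *poststat)
{
	(void) prestat;
	(void) poststat;
	return fd < 0 ? -1 : 0;
}

/* The only place that needs to know which API configure detected. */
#if defined(CONFIG_GF_NEW_API)
#define SKETCH_FSYNC(fd)	new_api_fsync((fd), NULL, NULL)
#else
#define SKETCH_FSYNC(fd)	old_api_fsync(fd)
#endif
```

Compiling with `-DCONFIG_GF_NEW_API` selects the three-argument form. The call sites stay unchanged either way.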



* Recent changes (master)
@ 2019-05-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a819dfb6b6b1e1e1339bbd8c3a446b52b5e7575c:

  io_uring: sync with liburing/kernel (2019-05-20 08:49:49 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to de5ed0e4d398bc9d4576f9b2b82d7686989c27e1:

  configure: add gettid() test (2019-05-22 17:12:55 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fio 3.14
      configure: add gettid() test

 FIO-VERSION-GEN   |  2 +-
 configure         | 18 ++++++++++++++++++
 os/os-dragonfly.h |  2 ++
 os/os-linux.h     |  2 ++
 os/os-mac.h       |  2 ++
 os/os-netbsd.h    |  2 ++
 os/os-openbsd.h   |  2 ++
 os/os-solaris.h   |  2 ++
 os/os-windows.h   |  2 ++
 os/os.h           |  2 ++
 10 files changed, 35 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 37fb1a7a..69222008 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.13
+DEF_VER=fio-3.14
 
 LF='
 '
diff --git a/configure b/configure
index d71387c0..ee421663 100755
--- a/configure
+++ b/configure
@@ -2374,6 +2374,21 @@ EOF
 fi
 print_config "MADV_HUGEPAGE" "$thp"
 
+##########################################
+# check for gettid()
+gettid="no"
+cat > $TMPC << EOF
+#include <unistd.h>
+int main(int argc, char **argv)
+{
+  return gettid();
+}
+EOF
+if compile_prog "" "" "gettid"; then
+  gettid="yes"
+fi
+print_config "gettid" "$gettid"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2645,6 +2660,9 @@ fi
 if test "$__kernel_rwf_t" = "yes"; then
   output_sym "CONFIG_HAVE_KERNEL_RWF_T"
 fi
+if test "$gettid" = "yes"; then
+  output_sym "CONFIG_HAVE_GETTID"
+fi
 if test "$fallthrough" = "yes"; then
   CFLAGS="$CFLAGS -Wimplicit-fallthrough"
 fi
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index eb92521f..3c460ae2 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -202,10 +202,12 @@ static inline unsigned long long os_phys_mem(void)
 	return mem;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return (int) lwp_gettid();
 }
+#endif
 
 static inline unsigned long long get_fs_free_size(const char *path)
 {
diff --git a/os/os-linux.h b/os/os-linux.h
index ba58bf7d..36339ef3 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -124,10 +124,12 @@ static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
 	return syscall(__NR_ioprio_set, which, who, ioprio);
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return syscall(__NR_gettid);
 }
+#endif
 
 #define SPLICE_DEF_SIZE	(64*1024)
 
diff --git a/os/os-mac.h b/os/os-mac.h
index 0b9c8707..a073300c 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -90,10 +90,12 @@ static inline unsigned long long os_phys_mem(void)
 	return mem;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return mach_thread_self();
 }
+#endif
 
 /*
  * For some reason, there's no header definition for fdatasync(), even
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index c06261d4..88fb3ef1 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -65,10 +65,12 @@ static inline unsigned long long os_phys_mem(void)
 	return mem;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return (int) _lwp_self();
 }
+#endif
 
 static inline unsigned long long get_fs_free_size(const char *path)
 {
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 70f58b49..43a649d4 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -65,10 +65,12 @@ static inline unsigned long long os_phys_mem(void)
 	return mem;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return (int)(intptr_t) pthread_self();
 }
+#endif
 
 static inline unsigned long long get_fs_free_size(const char *path)
 {
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 1a411af6..f1966f44 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -164,10 +164,12 @@ static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
 	return 0;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return pthread_self();
 }
+#endif
 
 /*
  * Should be enough, not aware of what (if any) restrictions Solaris has
diff --git a/os/os-windows.h b/os/os-windows.h
index dc958f5c..3e9f7341 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -162,10 +162,12 @@ static inline unsigned long long os_phys_mem(void)
 	return (unsigned long long) pages * (unsigned long long) pagesize;
 }
 
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return GetCurrentThreadId();
 }
+#endif
 
 static inline int init_random_seeds(uint64_t *rand_seeds, int size)
 {
diff --git a/os/os.h b/os/os.h
index 756ece4b..e4729680 100644
--- a/os/os.h
+++ b/os/os.h
@@ -373,11 +373,13 @@ static inline int CPU_COUNT(os_cpu_mask_t *mask)
 #endif
 
 #ifndef FIO_HAVE_GETTID
+#ifndef CONFIG_HAVE_GETTID
 static inline int gettid(void)
 {
 	return getpid();
 }
 #endif
+#endif
 
 #ifndef FIO_HAVE_SHM_ATTACH_REMOVED
 static inline int shm_attach_to_open_removed(void)
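The `os/*.h` hunks guard each local `gettid()` with `#ifndef CONFIG_HAVE_GETTID`. As a result, when the C library already provides `gettid()` (which the new configure test detects), the build does not add a conflicting definition of its own. Below is a minimal sketch of the same selection logic as a separately named wrapper. `sketch_gettid()` is hypothetical, and the Linux branch assumes `SYS_gettid` from `<sys/syscall.h>`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes gettid() in newer glibc */
#endif
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/*
 * Hypothetical wrapper mirroring the selection in the os-*.h headers:
 * prefer the C library's gettid() when the build detected it
 * (CONFIG_HAVE_GETTID), otherwise fall back to a platform substitute.
 * A distinct name avoids clashing with a libc declaration of gettid().
 */
static inline int sketch_gettid(void)
{
#if defined(CONFIG_HAVE_GETTID)
	return (int) gettid();
#elif defined(__linux__)
	return (int) syscall(SYS_gettid);
#else
	return (int) getpid();	/* process ID, like the generic os/os.h fallback */
#endif
}
```

On Linux, the main thread's ID equals the process ID, so `sketch_gettid() == getpid()` holds when called from `main()`.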



* Recent changes (master)
@ 2019-05-21 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-21 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d5495f0b72f8fbecb97192da430720aa56f8feb9:

  stat: remove terse v2 blank lines with description not set (2019-05-16 08:42:45 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a819dfb6b6b1e1e1339bbd8c3a446b52b5e7575c:

  io_uring: sync with liburing/kernel (2019-05-20 08:49:49 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      io_uring: sync with liburing/kernel

 os/linux/io_uring.h | 6 ++++++
 1 file changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index e2340869..ce03151e 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -26,6 +26,7 @@ struct io_uring_sqe {
 		__kernel_rwf_t	rw_flags;
 		__u32		fsync_flags;
 		__u16		poll_events;
+		__u32		sync_range_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
@@ -38,6 +39,8 @@ struct io_uring_sqe {
  * sqe->flags
  */
 #define IOSQE_FIXED_FILE	(1U << 0)	/* use fixed fileset */
+#define IOSQE_IO_DRAIN		(1U << 1)	/* issue after inflight IO */
+#define IOSQE_IO_LINK		(1U << 2)	/* next IO depends on this one */
 
 /*
  * io_uring_setup() flags
@@ -54,6 +57,7 @@ struct io_uring_sqe {
 #define IORING_OP_WRITE_FIXED	5
 #define IORING_OP_POLL_ADD	6
 #define IORING_OP_POLL_REMOVE	7
+#define IORING_OP_SYNC_FILE_RANGE	8
 
 /*
  * sqe->fsync_flags
@@ -133,5 +137,7 @@ struct io_uring_params {
 #define IORING_UNREGISTER_BUFFERS	1
 #define IORING_REGISTER_FILES		2
 #define IORING_UNREGISTER_FILES		3
+#define IORING_REGISTER_EVENTFD		4
+#define IORING_UNREGISTER_EVENTFD	5
 
 #endif
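The io_uring.h sync adds `IOSQE_IO_DRAIN` and `IOSQE_IO_LINK` to the per-SQE `flags` byte. Below is a minimal sketch of how a submitter might mark a dependency chain with `IOSQE_IO_LINK`. Every entry except the last gets the flag, and the first entry without it ends the chain. `struct sketch_sqe` and `link_chain()` are hypothetical; a real submitter would set these bits on `struct io_uring_sqe`.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the os/linux/io_uring.h hunk above. */
#define IOSQE_FIXED_FILE	(1U << 0)	/* use fixed fileset */
#define IOSQE_IO_DRAIN		(1U << 1)	/* issue after inflight IO */
#define IOSQE_IO_LINK		(1U << 2)	/* next IO depends on this one */

/* Hypothetical, trimmed-down SQE: only the flags byte matters here. */
struct sketch_sqe {
	uint8_t flags;
};

/*
 * Mark sqes[0..n-1] as one dependency chain: every entry except the last
 * carries IOSQE_IO_LINK, so the entry after it depends on it. The last
 * entry has the link bit cleared, which ends the chain. Other flag bits
 * are left untouched.
 */
static void link_chain(struct sketch_sqe *sqes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i + 1 < n)
			sqes[i].flags |= IOSQE_IO_LINK;
		else
			sqes[i].flags &= (uint8_t) ~IOSQE_IO_LINK;
	}
}
```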



* Recent changes (master)
@ 2019-05-17 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8066f6b6177acba17f61fecb9f04d04767fb1a96:

  t/io_uring: improve EOPNOTSUPP message (2019-05-09 09:56:29 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d5495f0b72f8fbecb97192da430720aa56f8feb9:

  stat: remove terse v2 blank lines with description not set (2019-05-16 08:42:45 -0600)

----------------------------------------------------------------
Vincent Fu (5):
      client: do not print disk utilization for terse v2
      client: handle disk util for all output formats
      client: add a newline after terse disk util
      docs: improve terse output format documentation
      stat: remove terse v2 blank lines with description not set

 HOWTO    | 12 +++++++++++-
 client.c |  9 ++++++---
 fio.1    | 11 ++++++++++-
 stat.c   |  7 ++++---
 4 files changed, 31 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4fd4da14..142b83e5 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3693,7 +3693,8 @@ is one long line of values, such as::
     2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
     A description of this job goes here.
 
-The job description (if provided) follows on a second line.
+The job description (if provided) follows on a second line for terse v2.
+It appears on the same line for other terse versions.
 
 To enable terse output, use the :option:`--minimal` or
 :option:`--output-format`\=terse command line options. The
@@ -3778,6 +3779,11 @@ minimal output v3, separated by semicolons::
 
         terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
+In client/server mode terse output differs from what appears when jobs are run
+locally. Disk utilization data is omitted from the standard terse output and
+for v3 and later appears on its own separate line at the end of each terse
+reporting cycle.
+
 
 JSON output
 ------------
@@ -4056,6 +4062,7 @@ is recorded. Each *data direction* seen within the window period will aggregate
 its values in a separate row. Further, when using windowed logging the *block
 size* and *offset* entries will always contain 0.
 
+
 Client/Server
 -------------
 
@@ -4143,3 +4150,6 @@ containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
 
 	/mnt/nfs/fio/192.168.10.120.fileio.tmp
 	/mnt/nfs/fio/192.168.10.121.fileio.tmp
+
+Terse output in client/server mode will differ slightly from what is produced
+when fio is run in stand-alone mode. See the terse output section for details.
diff --git a/client.c b/client.c
index 4cbffb62..43cfbd43 100644
--- a/client.c
+++ b/client.c
@@ -1219,12 +1219,15 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 		json_array_add_disk_util(&du->dus, &du->agg, du_array);
 		duobj = json_array_last_value_object(du_array);
 		json_object_add_client_info(duobj, client);
-	} else if (output_format & FIO_OUTPUT_TERSE)
-		print_disk_util(&du->dus, &du->agg, 1, &client->buf);
-	else if (output_format & FIO_OUTPUT_NORMAL) {
+	}
+	if (output_format & FIO_OUTPUT_NORMAL) {
 		__log_buf(&client->buf, "\nDisk stats (read/write):\n");
 		print_disk_util(&du->dus, &du->agg, 0, &client->buf);
 	}
+	if (output_format & FIO_OUTPUT_TERSE && terse_version >= 3) {
+		print_disk_util(&du->dus, &du->agg, 1, &client->buf);
+		__log_buf(&client->buf, "\n");
+	}
 }
 
 static void convert_jobs_eta(struct jobs_eta *je)
diff --git a/fio.1 b/fio.1
index 2708b503..84b80eee 100644
--- a/fio.1
+++ b/fio.1
@@ -3330,7 +3330,8 @@ is one long line of values, such as:
 		A description of this job goes here.
 .fi
 .P
-The job description (if provided) follows on a second line.
+The job description (if provided) follows on a second line for terse v2.
+It appears on the same line for other terse versions.
 .P
 To enable terse output, use the \fB\-\-minimal\fR or
 `\-\-output\-format=terse' command line options. The
@@ -3465,6 +3466,11 @@ minimal output v3, separated by semicolons:
 .nf
 		terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 .fi
+.P
+In client/server mode terse output differs from what appears when jobs are run
+locally. Disk utilization data is omitted from the standard terse output and
+for v3 and later appears on its own separate line at the end of each terse
+reporting cycle.
 .SH JSON OUTPUT
 The \fBjson\fR output format is intended to be both human readable and convenient
 for automated parsing. For the most part its sections mirror those of the
@@ -3859,6 +3865,9 @@ containing two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
 /mnt/nfs/fio/192.168.10.121.fileio.tmp
 .PD
 .RE
+.P
+Terse output in client/server mode will differ slightly from what is produced
+when fio is run in stand-alone mode. See the terse output section for details.
 .SH AUTHORS
 .B fio
 was written by Jens Axboe <axboe@kernel.dk>.
diff --git a/stat.c b/stat.c
index 2bc21dad..bf87917c 100644
--- a/stat.c
+++ b/stat.c
@@ -1244,12 +1244,13 @@ static void show_thread_status_terse_all(struct thread_stat *ts,
 	/* Additional output if continue_on_error set - default off*/
 	if (ts->continue_on_error)
 		log_buf(out, ";%llu;%d", (unsigned long long) ts->total_err_count, ts->first_error);
-	if (ver == 2)
-		log_buf(out, "\n");
 
 	/* Additional output if description is set */
-	if (strlen(ts->description))
+	if (strlen(ts->description)) {
+		if (ver == 2)
+			log_buf(out, "\n");
 		log_buf(out, ";%s", ts->description);
+	}
 
 	log_buf(out, "\n");
 }
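The stat.c hunk moves the terse v2 newline inside the description check. As a result, a v2 record without a description no longer ends with a blank line, and other terse versions keep the description on the same line. Below is a sketch of the resulting tail formatting. `terse_tail()` is a hypothetical helper that writes into a caller-supplied buffer instead of using fio's `log_buf()`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format the end of a terse record, following the stat.c hunk: a
 * description (if any) starts a new line for terse v2 and stays on the
 * same line for other versions; the record always ends in '\n'.
 * Returns the snprintf() result for the formatted record.
 */
static int terse_tail(char *out, size_t size, const char *fields,
		      int ver, const char *description)
{
	if (description && strlen(description)) {
		const char *sep = (ver == 2) ? "\n" : "";

		return snprintf(out, size, "%s%s;%s\n", fields, sep,
				description);
	}
	return snprintf(out, size, "%s\n", fields);
}
```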



* Recent changes (master)
@ 2019-05-10 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-10 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2857c34bd39fcd67c489a8e2b19d9455032bad0f:

  t/io_uring: clarify polled support is fs + device (2019-05-08 11:56:05 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8066f6b6177acba17f61fecb9f04d04767fb1a96:

  t/io_uring: improve EOPNOTSUPP message (2019-05-09 09:56:29 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: improve EOPNOTSUPP message

 t/io_uring.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 208b58a5..62dee805 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -264,8 +264,7 @@ static int reap_events(struct submitter *s)
 			if (cqe->res != BS) {
 				printf("io: unexpected ret=%d\n", cqe->res);
 				if (polled && cqe->res == -EOPNOTSUPP)
-					printf("Your filesystem/device doesn't "
-						"support polled IO\n");
+					printf("Your filesystem/driver/kernel doesn't support polled IO\n");
 				return -1;
 			}
 		}



* RE: Recent changes (master)
  2019-05-09 15:52   ` Sebastien Boisvert
@ 2019-05-09 16:12     ` Elliott, Robert (Servers)
  0 siblings, 0 replies; 1305+ messages in thread
From: Elliott, Robert (Servers) @ 2019-05-09 16:12 UTC (permalink / raw)
  To: Sebastien Boisvert, Jens Axboe, fio



> This
> 
> 
> +					printf("Your filesystem/device doesn't "
> +						"support polled IO\n");
> 
> will print one line in standard output:
> 
> Your filesystem/device doesn't support polled IO
> 
> 
> Do you mean that you want to grep in the source code?

Yes, like Linus Torvalds complained about years ago:
    "I've had grep's that fail due to people splitting the actual
    string etc, which just drives me wild."
    https://lkml.org/lkml/2009/12/17/229

It's part of the linux kernel coding style (which fio doesn't
strictly follow, but generally does):

    "Statements longer than 80 columns will be broken into sensible
    chunks, unless exceeding 80 columns significantly increases
    readability and does not hide information. Descendants are
    always substantially shorter than the parent and are placed
    substantially to the right. The same applies to function headers
    with a long argument list. However, never break user-visible
    strings such as printk messages, because that breaks the
    ability to grep for them."




* Re: Recent changes (master)
  2019-05-09 15:47 ` Elliott, Robert (Servers)
  2019-05-09 15:52   ` Sebastien Boisvert
@ 2019-05-09 15:57   ` Jens Axboe
  1 sibling, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-09 15:57 UTC (permalink / raw)
  To: Elliott, Robert (Servers), fio

On 5/9/19 9:47 AM, Elliott, Robert (Servers) wrote:
> 
> 
>> -----Original Message-----
>> From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf
>> Of Jens Axboe
>> Sent: Thursday, May 09, 2019 7:00 AM
>> To: fio@vger.kernel.org
>> Subject: Recent changes (master)
> ...
>> -					printf("Your filesystem doesn't support poll\n");
>> +					printf("Your filesystem/device doesn't "
>> +						"support polled IO\n");
> 
> Can you keep that string on one line so it is grep-able?

To keep everyone happy, I changed it to one line AND added a kernel
mention too.

Next up, color of the build ;-)

-- 
Jens Axboe




* Re: Recent changes (master)
  2019-05-09 15:47 ` Elliott, Robert (Servers)
@ 2019-05-09 15:52   ` Sebastien Boisvert
  2019-05-09 16:12     ` Elliott, Robert (Servers)
  2019-05-09 15:57   ` Jens Axboe
  1 sibling, 1 reply; 1305+ messages in thread
From: Sebastien Boisvert @ 2019-05-09 15:52 UTC (permalink / raw)
  To: Elliott, Robert (Servers), Jens Axboe, fio



On 2019-05-09 11:47 a.m., Elliott, Robert (Servers) wrote:
> 
> 
>> -----Original Message-----
>> From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf
>> Of Jens Axboe
>> Sent: Thursday, May 09, 2019 7:00 AM
>> To: fio@vger.kernel.org
>> Subject: Recent changes (master)
> ...
>> -					printf("Your filesystem doesn't support poll\n");
>> +					printf("Your filesystem/device doesn't "
>> +						"support polled IO\n");
> 
> Can you keep that string on one line so it is grep-able?
> 
> 

This 


+					printf("Your filesystem/device doesn't "
+						"support polled IO\n");

will print one line in standard output:

Your filesystem/device doesn't support polled IO


Do you mean that you want to grep in the source code?
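Both readings are consistent with C semantics. Adjacent string literals are concatenated during translation, so the split literal prints exactly like the joined one. A search of the source text for the full message, however, only matches the joined form. Below is a minimal illustration; the variable names are hypothetical.

```c
#include <string.h>

/*
 * Adjacent string literals are merged into a single literal at compile
 * time, so both arrays hold identical bytes. Only the source text
 * differs: grepping for "doesn't support polled IO" matches the second
 * declaration but not the first.
 */
static const char split_msg[] = "Your filesystem/device doesn't "
				"support polled IO\n";
static const char joined_msg[] = "Your filesystem/device doesn't support polled IO\n";
```

Here `strcmp(split_msg, joined_msg)` returns 0, and both arrays have the same `sizeof`.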



* RE: Recent changes (master)
  2019-05-09 12:00 Jens Axboe
  2019-05-09 12:47 ` Erwan Velu
@ 2019-05-09 15:47 ` Elliott, Robert (Servers)
  2019-05-09 15:52   ` Sebastien Boisvert
  2019-05-09 15:57   ` Jens Axboe
  1 sibling, 2 replies; 1305+ messages in thread
From: Elliott, Robert (Servers) @ 2019-05-09 15:47 UTC (permalink / raw)
  To: Jens Axboe, fio



> -----Original Message-----
> From: fio-owner@vger.kernel.org <fio-owner@vger.kernel.org> On Behalf
> Of Jens Axboe
> Sent: Thursday, May 09, 2019 7:00 AM
> To: fio@vger.kernel.org
> Subject: Recent changes (master)
...
> -					printf("Your filesystem doesn't support poll\n");
> +					printf("Your filesystem/device doesn't "
> +						"support polled IO\n");

Can you keep that string on one line so it is grep-able?





* Re: Recent changes (master)
  2019-05-09 12:47 ` Erwan Velu
@ 2019-05-09 14:07   ` Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-09 14:07 UTC (permalink / raw)
  To: Erwan Velu, fio

On 5/9/19 6:47 AM, Erwan Velu wrote:
> Isn't that also related to the kernel version you use?
> 
> This message might mislead the user, as he might just have to upgrade his
> kernel to gain the feature?

Yes, the message is for the specific kernel, it doesn't speak for any
future kernels. But I think we're getting into documentation at this
point, the message just means that on this setup, either the fs or
device driver (or both) doesn't support polled IO.

-- 
Jens Axboe


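Jens's description matches the check in `reap_events()` from the diff: when polling is enabled and a completion returns `-EOPNOTSUPP`, the tool reports that the filesystem or device driver on this particular setup lacks polled-I/O support. A minimal sketch of that decision as a standalone function; `check_completion()` is a hypothetical name, and `bs` stands in for the `BS` block-size constant used by `t/io_uring.c`. It also keeps the message on one source line, as Robert asked:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns 0 if the completion transferred a full block, -1 otherwise.
 * When polling was requested and the kernel answered -EOPNOTSUPP, the
 * extra message explains that the fs and/or device driver on this
 * setup cannot do polled I/O. */
static int check_completion(int res, int bs, bool polled)
{
    if (res == bs)
        return 0;

    printf("io: unexpected ret=%d\n", res);
    if (polled && res == -EOPNOTSUPP)
        printf("Your filesystem/device doesn't support polled IO\n");
    return -1;
}
```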


* Re: Recent changes (master)
  2019-05-09 12:00 Jens Axboe
@ 2019-05-09 12:47 ` Erwan Velu
  2019-05-09 14:07   ` Jens Axboe
  2019-05-09 15:47 ` Elliott, Robert (Servers)
  1 sibling, 1 reply; 1305+ messages in thread
From: Erwan Velu @ 2019-05-09 12:47 UTC
  To: Jens Axboe, fio

Isn't that also related to the kernel version you use?

This message might mislead the user, as he might just have to upgrade his
kernel to gain the feature?

On 09/05/2019 at 14:00, Jens Axboe wrote:
> The following changes since commit f22dd97a5a1342c7dcb84f777a77bd30859cc35b:
>
>    Update CFLAGS and LDFLAGS for FreeBSD builds (2019-05-06 22:18:55 -0600)
>
> are available in the Git repository at:
>
>    git://git.kernel.dk/fio.git master
>
> for you to fetch changes up to 2857c34bd39fcd67c489a8e2b19d9455032bad0f:
>
>    t/io_uring: clarify polled support is fs + device (2019-05-08 11:56:05 -0600)
>
> ----------------------------------------------------------------
> Jens Axboe (1):
>        t/io_uring: clarify polled support is fs + device
>
>   t/io_uring.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> ---
>
> Diff of recent changes:
>
> diff --git a/t/io_uring.c b/t/io_uring.c
> index 79a92311..208b58a5 100644
> --- a/t/io_uring.c
> +++ b/t/io_uring.c
> @@ -264,7 +264,8 @@ static int reap_events(struct submitter *s)
>   			if (cqe->res != BS) {
>   				printf("io: unexpected ret=%d\n", cqe->res);
>   				if (polled && cqe->res == -EOPNOTSUPP)
> -					printf("Your filesystem doesn't support poll\n");
> +					printf("Your filesystem/device doesn't "
> +						"support polled IO\n");
>   				return -1;
>   			}
>   		}


* Recent changes (master)
@ 2019-05-09 12:00 Jens Axboe
  2019-05-09 12:47 ` Erwan Velu
  2019-05-09 15:47 ` Elliott, Robert (Servers)
  0 siblings, 2 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-09 12:00 UTC
  To: fio

The following changes since commit f22dd97a5a1342c7dcb84f777a77bd30859cc35b:

  Update CFLAGS and LDFLAGS for FreeBSD builds (2019-05-06 22:18:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2857c34bd39fcd67c489a8e2b19d9455032bad0f:

  t/io_uring: clarify polled support is fs + device (2019-05-08 11:56:05 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: clarify polled support is fs + device

 t/io_uring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 79a92311..208b58a5 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -264,7 +264,8 @@ static int reap_events(struct submitter *s)
 			if (cqe->res != BS) {
 				printf("io: unexpected ret=%d\n", cqe->res);
 				if (polled && cqe->res == -EOPNOTSUPP)
-					printf("Your filesystem doesn't support poll\n");
+					printf("Your filesystem/device doesn't "
+						"support polled IO\n");
 				return -1;
 			}
 		}



* Recent changes (master)
@ 2019-05-07 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-05-07 12:00 UTC
  To: fio

The following changes since commit 051382218cbe5101a5caa83eab55ed04608f8475:

  io_uring: remove cachehit information (2019-04-25 13:27:54 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f22dd97a5a1342c7dcb84f777a77bd30859cc35b:

  Update CFLAGS and LDFLAGS for FreeBSD builds (2019-05-06 22:18:55 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-infinite-loop-of-io_uring' of https://github.com/satoru-takeuchi/fio

Rebecca Cran (1):
      Update CFLAGS and LDFLAGS for FreeBSD builds

Satoru Takeuchi (1):
      io_uring: fix possible infinite loop

 configure          | 4 ++++
 engines/io_uring.c | 2 ++
 2 files changed, 6 insertions(+)

---

Diff of recent changes:

diff --git a/configure b/configure
index c7a7c0ae..d71387c0 100755
--- a/configure
+++ b/configure
@@ -307,6 +307,10 @@ AIX|OpenBSD|NetBSD)
     force_no_lex_o="yes"
   fi
   ;;
+FreeBSD)
+  CFLAGS="$CFLAGS -I/usr/local/include"
+  LDFLAGS="$LDFLAGS -L/usr/local/lib"
+  ;;
 Darwin)
   # on Leopard most of the system is 32-bit, so we have to ask the kernel if
   # we can run 64-bit userspace code.
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5b3509a9..a5e77d8f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -233,6 +233,8 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 		r = fio_ioring_cqring_reap(td, events, max);
 		if (r) {
 			events += r;
+			if (actual_min != 0)
+				actual_min -= r;
 			continue;
 		}
 

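The `actual_min -= r` change matters because `actual_min` is what the loop later asks the kernel to wait for. If events that were already reaped are not subtracted, the next wait can request more completions than remain in flight, which is presumably the "possible infinite loop" in the commit title. The simulation below is a hypothetical sketch, not fio's code: `batches[]` stands in for successive return values of `fio_ioring_cqring_reap()`, a zero entry models a blocking wait, and the subtraction saturates at zero as a defensive choice where the patch uses a guarded plain `-=`.

```c
#include <stddef.h>

struct sim_result {
    unsigned events;        /* total events collected */
    unsigned waits;         /* number of times the loop would block */
    unsigned last_wait_min; /* minimum requested on the last wait */
};

/* Reap in batches until at least `min` events are collected. */
static struct sim_result simulate_getevents(const unsigned *batches,
                                            size_t nbatches, unsigned min)
{
    struct sim_result res = {0, 0, 0};
    unsigned actual_min = min;
    size_t i = 0;

    while (res.events < min && i < nbatches) {
        unsigned r = batches[i++];

        if (r) {
            res.events += r;
            /* The fix: shrink the outstanding minimum by what was
             * reaped, so the next wait does not ask for all of `min`. */
            if (actual_min != 0)
                actual_min = r >= actual_min ? 0 : actual_min - r;
            continue;
        }
        /* Nothing to reap: this is where the kernel wait would happen. */
        res.waits++;
        res.last_wait_min = actual_min;
    }
    return res;
}
```

With batches `{3, 0, 2}` and `min = 5`, the single wait asks for 2 more events instead of 5.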


* Recent changes (master)
@ 2019-04-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-04-26 12:00 UTC
  To: fio

The following changes since commit afc22c98609ee80a99fbed24231181bdab2bc659:

  Merge branch 'libiscsi' of https://github.com/smartxworks/fio (2019-04-22 08:38:55 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 051382218cbe5101a5caa83eab55ed04608f8475:

  io_uring: remove cachehit information (2019-04-25 13:27:54 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fix __FIO_OPT_G_ISCSI numbering
      io_uring: remove cachehit information

Phillip Chen (1):
      zbd random read conventional zones

 engines/io_uring.c  | 13 -------------
 optgroup.h          |  2 +-
 os/linux/io_uring.h |  5 -----
 t/io_uring.c        | 28 ++++------------------------
 zbd.c               |  2 --
 5 files changed, 5 insertions(+), 45 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 014f954e..5b3509a9 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -62,9 +62,6 @@ struct ioring_data {
 	int cq_ring_off;
 	unsigned iodepth;
 
-	uint64_t cachehit;
-	uint64_t cachemiss;
-
 	struct ioring_mmap mmap[3];
 };
 
@@ -197,13 +194,6 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	} else
 		io_u->error = 0;
 
-	if (io_u->ddir == DDIR_READ) {
-		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
-			ld->cachehit++;
-		else
-			ld->cachemiss++;
-	}
-
 	return io_u;
 }
 
@@ -391,9 +381,6 @@ static void fio_ioring_cleanup(struct thread_data *td)
 	struct ioring_data *ld = td->io_ops_data;
 
 	if (ld) {
-		td->ts.cachehit += ld->cachehit;
-		td->ts.cachemiss += ld->cachemiss;
-
 		if (!(td->flags & TD_F_CHILD))
 			fio_ioring_unmap(ld);
 
diff --git a/optgroup.h b/optgroup.h
index 483adddd..148c8da1 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -62,8 +62,8 @@ enum opt_category_group {
 	__FIO_OPT_G_HDFS,
 	__FIO_OPT_G_SG,
 	__FIO_OPT_G_MMAP,
-	__FIO_OPT_G_NR,
 	__FIO_OPT_G_ISCSI,
+	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
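The `optgroup.h` hunk moves `__FIO_OPT_G_NR` back to the end of the enum. The count sentinel has to come last: every real group index must be smaller than it, so that `1ULL << __FIO_OPT_G_NR` (`FIO_OPT_G_INVALID`) is a bit no valid group uses. With the previous ordering the iSCSI group received index `__FIO_OPT_G_NR + 1`, placing its mask past the invalid marker. A small illustration with hypothetical names (chosen without leading double underscores, which C reserves for the implementation):

```c
/* Indices of the option groups; the count sentinel must stay last. */
enum demo_group_idx {
    IDX_RATE,
    IDX_MMAP,
    IDX_ISCSI,   /* new groups are added before the sentinel */
    IDX_NR,      /* number of real groups */
};

#define GROUP_ISCSI   (1ULL << IDX_ISCSI)
#define GROUP_INVALID (1ULL << IDX_NR)   /* first bit no group may use */

/* Returns 1 if `mask` only contains bits of real groups. */
static int is_valid_group_mask(unsigned long long mask)
{
    return mask != 0 && mask < GROUP_INVALID;
}
```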
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 24906e99..e2340869 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -69,11 +69,6 @@ struct io_uring_cqe {
 	__u32	flags;
 };
 
-/*
- * io_uring_event->flags
- */
-#define IOCQE_FLAG_CACHEHIT	(1U << 0)	/* IO did not hit media */
-
 /*
  * Magic offsets for the application to mmap the data it needs
  */
diff --git a/t/io_uring.c b/t/io_uring.c
index 363cba3e..79a92311 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -71,7 +71,6 @@ struct submitter {
 	unsigned long reaps;
 	unsigned long done;
 	unsigned long calls;
-	unsigned long cachehit, cachemiss;
 	volatile int finish;
 
 	__s32 *fds;
@@ -269,10 +268,6 @@ static int reap_events(struct submitter *s)
 				return -1;
 			}
 		}
-		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
-			s->cachehit++;
-		else
-			s->cachemiss++;
 		reaped++;
 		head++;
 	} while (1);
@@ -497,7 +492,7 @@ static void usage(char *argv)
 int main(int argc, char *argv[])
 {
 	struct submitter *s;
-	unsigned long done, calls, reap, cache_hit, cache_miss;
+	unsigned long done, calls, reap;
 	int err, i, flags, fd, opt;
 	char *fdepths;
 	void *ret;
@@ -600,44 +595,29 @@ int main(int argc, char *argv[])
 	pthread_create(&s->thread, NULL, submitter_fn, s);
 
 	fdepths = malloc(8 * s->nr_files);
-	cache_hit = cache_miss = reap = calls = done = 0;
+	reap = calls = done = 0;
 	do {
 		unsigned long this_done = 0;
 		unsigned long this_reap = 0;
 		unsigned long this_call = 0;
-		unsigned long this_cache_hit = 0;
-		unsigned long this_cache_miss = 0;
 		unsigned long rpc = 0, ipc = 0;
-		double hit = 0.0;
 
 		sleep(1);
 		this_done += s->done;
 		this_call += s->calls;
 		this_reap += s->reaps;
-		this_cache_hit += s->cachehit;
-		this_cache_miss += s->cachemiss;
-		if (this_cache_hit && this_cache_miss) {
-			unsigned long hits, total;
-
-			hits = this_cache_hit - cache_hit;
-			total = hits + this_cache_miss - cache_miss;
-			hit = (double) hits / (double) total;
-			hit *= 100.0;
-		}
 		if (this_call - calls) {
 			rpc = (this_done - done) / (this_call - calls);
 			ipc = (this_reap - reap) / (this_call - calls);
 		} else
 			rpc = ipc = -1;
 		file_depths(fdepths);
-		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (%s), Cachehit=%0.2f%%\n",
+		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (%s)\n",
 				this_done - done, rpc, ipc, s->inflight,
-				fdepths, hit);
+				fdepths);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;
-		cache_hit = s->cachehit;
-		cache_miss = s->cachemiss;
 	} while (!finish);
 
 	pthread_join(s->thread, &ret);
diff --git a/zbd.c b/zbd.c
index 1c46b452..d7e91e37 100644
--- a/zbd.c
+++ b/zbd.c
@@ -426,8 +426,6 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			p->start = z->start << 9;
 			switch (z->cond) {
 			case BLK_ZONE_COND_NOT_WP:
-				p->wp = p->start;
-				break;
 			case BLK_ZONE_COND_FULL:
 				p->wp = p->start + zone_size;
 				break;

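In the `zbd.c` hunk, `BLK_ZONE_COND_NOT_WP` (a conventional zone) now falls through to the `BLK_ZONE_COND_FULL` case, so its write pointer is set to the end of the zone instead of its start. Presumably this is what lets random reads target conventional zones, since data below the write pointer is what counts as readable. A sketch of the rule with stand-in condition names; returning the reported write pointer for other conditions is an assumption, because that part of the switch lies outside the diff context:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's BLK_ZONE_COND_* values. */
enum zone_cond { COND_NOT_WP, COND_EMPTY, COND_FULL };

/* Write pointer to use for a zone starting at `start`. */
static uint64_t zone_wp(enum zone_cond cond, uint64_t start,
                        uint64_t zone_size, uint64_t reported_wp)
{
    switch (cond) {
    case COND_NOT_WP:
        /* fall through: conventional zones are treated as full */
    case COND_FULL:
        return start + zone_size;
    default:
        return reported_wp;  /* assumed handling for other states */
    }
}
```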


* Recent changes (master)
@ 2019-04-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-04-23 12:00 UTC
  To: fio

The following changes since commit 7f125e7f3879d23e79bc2ef5eed678ddab3b5c70:

  zbd: Fix zone report handling (2019-04-19 09:11:34 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to afc22c98609ee80a99fbed24231181bdab2bc659:

  Merge branch 'libiscsi' of https://github.com/smartxworks/fio (2019-04-22 08:38:55 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'libiscsi' of https://github.com/smartxworks/fio

Kyle Zhang (2):
      filesetup: don't call create_work_dirs() for ioengine with FIO_DISKLESSIO
      fio: add libiscsi engine

 HOWTO                 |   2 +
 Makefile              |   6 +
 configure             |  28 +++-
 engines/libiscsi.c    | 407 ++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/libiscsi.fio |   3 +
 filesetup.c           |  85 ++++++-----
 fio.1                 |   3 +
 optgroup.h            |   2 +
 8 files changed, 494 insertions(+), 42 deletions(-)
 create mode 100644 engines/libiscsi.c
 create mode 100644 examples/libiscsi.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 468772d7..4fd4da14 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1991,6 +1991,8 @@ I/O engine
 			Asynchronous read and write using DDN's Infinite Memory Engine (IME).
 			This engine will try to stack as much IOs as possible by creating
 			requests for IME. FIO will then decide when to commit these requests.
+		**libiscsi**
+			Read and write iscsi lun with libiscsi.
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Makefile b/Makefile
index fd138dd2..d7e5fca7 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,12 @@ ifdef CONFIG_LIBHDFS
   SOURCE += engines/libhdfs.c
 endif
 
+ifdef CONFIG_LIBISCSI
+  CFLAGS += $(LIBISCSI_CFLAGS)
+  LIBS += $(LIBISCSI_LIBS)
+  SOURCE += engines/libiscsi.c
+endif
+
 ifdef CONFIG_64BIT
   CFLAGS += -DBITS_PER_LONG=64
 endif
diff --git a/configure b/configure
index 3c882f0f..c7a7c0ae 100755
--- a/configure
+++ b/configure
@@ -148,6 +148,7 @@ disable_lex=""
 disable_pmem="no"
 disable_native="no"
 march_set="no"
+libiscsi="no"
 prefix=/usr/local
 
 # parse options
@@ -204,6 +205,8 @@ for opt do
   ;;
   --with-ime=*) ime_path="$optarg"
   ;;
+  --enable-libiscsi) libiscsi="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -239,6 +242,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-cuda           Enable GPUDirect RDMA support"
   echo "--disable-native        Don't build for native host"
   echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
+  echo "--enable-libiscsi       Enable iscsi support"
   exit $exit_val
 fi
 
@@ -1970,6 +1974,22 @@ if compile_prog "-I${ime_path}/include" "-L${ime_path}/lib -lim_client" "libime"
 fi
 print_config "DDN's Infinite Memory Engine" "$libime"
 
+##########################################
+# Check if we have required environment variables configured for libiscsi
+if test "$libiscsi" = "yes" ; then
+  if $(pkg-config --atleast-version=1.9.0 libiscsi); then
+    libiscsi="yes"
+    libiscsi_cflags=$(pkg-config --cflags libiscsi)
+    libiscsi_libs=$(pkg-config --libs libiscsi)
+  else
+    if test "$libiscsi" = "yes" ; then
+      echo "libiscsi" "Install libiscsi >= 1.9.0"
+    fi
+    libiscsi="no"
+  fi
+fi
+print_config "iscsi engine" "$libiscsi"
+
 ##########################################
 # Check if we have lex/yacc available
 yacc="no"
@@ -2543,7 +2563,7 @@ if test "$libhdfs" = "yes" ; then
   echo "JAVA_HOME=$JAVA_HOME" >> $config_host_mak
   echo "FIO_LIBHDFS_INCLUDE=$FIO_LIBHDFS_INCLUDE" >> $config_host_mak
   echo "FIO_LIBHDFS_LIB=$FIO_LIBHDFS_LIB" >> $config_host_mak
- fi
+fi
 if test "$mtd" = "yes" ; then
   output_sym "CONFIG_MTD"
 fi
@@ -2627,6 +2647,12 @@ fi
 if test "$thp" = "yes" ; then
   output_sym "CONFIG_HAVE_THP"
 fi
+if test "$libiscsi" = "yes" ; then
+  output_sym "CONFIG_LIBISCSI"
+  echo "CONFIG_LIBISCSI=m" >> $config_host_mak
+  echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
+  echo "LIBISCSI_LIBS=$libiscsi_libs" >> $config_host_mak
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/engines/libiscsi.c b/engines/libiscsi.c
new file mode 100644
index 00000000..e4eb0bab
--- /dev/null
+++ b/engines/libiscsi.c
@@ -0,0 +1,407 @@
+/*
+ * libiscsi engine
+ *
+ * this engine read/write iscsi lun with libiscsi.
+ */
+
+
+#include "../fio.h"
+#include "../optgroup.h"
+
+#include <stdlib.h>
+#include <iscsi/iscsi.h>
+#include <iscsi/scsi-lowlevel.h>
+#include <poll.h>
+
+struct iscsi_lun;
+struct iscsi_info;
+
+struct iscsi_task {
+	struct scsi_task	*scsi_task;
+	struct iscsi_lun	*iscsi_lun;
+	struct io_u		*io_u;
+};
+
+struct iscsi_lun {
+	struct iscsi_info	*iscsi_info;
+	struct iscsi_context	*iscsi;
+	struct iscsi_url        *url;
+	int			 block_size;
+	uint64_t		 num_blocks;
+};
+
+struct iscsi_info {
+	struct iscsi_lun	**luns;
+	int			  nr_luns;
+	struct pollfd		 *pfds;
+	struct iscsi_task	**complete_events;
+	int			  nr_events;
+};
+
+struct iscsi_options {
+	void	*pad;
+	char	*initiator;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	  = "initiator",
+		.lname	  = "initiator",
+		.type	  = FIO_OPT_STR_STORE,
+		.off1	  = offsetof(struct iscsi_options, initiator),
+		.def	  = "iqn.2019-04.org.fio:fio",
+		.help	  = "initiator name",
+		.category = FIO_OPT_C_ENGINE,
+		.group	  = FIO_OPT_G_ISCSI,
+	},
+
+	{
+		.name = NULL,
+	},
+};
+
+static int fio_iscsi_setup_lun(struct iscsi_info *iscsi_info,
+			       char *initiator, struct fio_file *f, int i)
+{
+	struct iscsi_lun		*iscsi_lun  = NULL;
+	struct scsi_task		*task	    = NULL;
+	struct scsi_readcapacity16	*rc16	    = NULL;
+	int				 ret	    = 0;
+
+	iscsi_lun = malloc(sizeof(struct iscsi_lun));
+	memset(iscsi_lun, 0, sizeof(struct iscsi_lun));
+
+	iscsi_lun->iscsi_info = iscsi_info;
+
+	iscsi_lun->url = iscsi_parse_full_url(NULL, f->file_name);
+	if (iscsi_lun->url == NULL) {
+		log_err("iscsi: failed to parse url: %s\n", f->file_name);
+		ret = EINVAL;
+		goto out;
+	}
+
+	iscsi_lun->iscsi = iscsi_create_context(initiator);
+	if (iscsi_lun->iscsi == NULL) {
+		log_err("iscsi: failed to create iscsi context.\n");
+		ret = 1;
+		goto out;
+	}
+
+	if (iscsi_set_targetname(iscsi_lun->iscsi, iscsi_lun->url->target)) {
+		log_err("iscsi: failed to set target name.\n");
+		ret = EINVAL;
+		goto out;
+	}
+
+	if (iscsi_set_session_type(iscsi_lun->iscsi, ISCSI_SESSION_NORMAL) != 0) {
+		log_err("iscsi: failed to set session type.\n");
+		ret = EINVAL;
+		goto out;
+	}
+
+	if (iscsi_set_header_digest(iscsi_lun->iscsi,
+				    ISCSI_HEADER_DIGEST_NONE_CRC32C) != 0) {
+		log_err("iscsi: failed to set header digest.\n");
+		ret = EINVAL;
+		goto out;
+	}
+
+	if (iscsi_full_connect_sync(iscsi_lun->iscsi,
+				    iscsi_lun->url->portal,
+				    iscsi_lun->url->lun)) {
+		log_err("iscsi: failed to connect to LUN: %s\n",
+			iscsi_get_error(iscsi_lun->iscsi));
+		ret = EINVAL;
+		goto out;
+	}
+
+	task = iscsi_readcapacity16_sync(iscsi_lun->iscsi, iscsi_lun->url->lun);
+	if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+		log_err("iscsi: failed to send readcapacity command\n");
+		ret = EINVAL;
+		goto out;
+	}
+
+	rc16 = scsi_datain_unmarshall(task);
+	if (rc16 == NULL) {
+		log_err("iscsi: failed to unmarshal readcapacity16 data.\n");
+		ret = EINVAL;
+		goto out;
+	}
+
+	iscsi_lun->block_size = rc16->block_length;
+	iscsi_lun->num_blocks = rc16->returned_lba + 1;
+
+	scsi_free_scsi_task(task);
+	task = NULL;
+
+	f->real_file_size = iscsi_lun->num_blocks * iscsi_lun->block_size;
+	f->engine_data	  = iscsi_lun;
+
+	iscsi_info->luns[i]    = iscsi_lun;
+	iscsi_info->pfds[i].fd = iscsi_get_fd(iscsi_lun->iscsi);
+
+out:
+	if (task) {
+		scsi_free_scsi_task(task);
+	}
+
+	if (ret && iscsi_lun) {
+		if (iscsi_lun->iscsi != NULL) {
+			if (iscsi_is_logged_in(iscsi_lun->iscsi)) {
+				iscsi_logout_sync(iscsi_lun->iscsi);
+			}
+			iscsi_destroy_context(iscsi_lun->iscsi);
+		}
+		free(iscsi_lun);
+	}
+
+	return ret;
+}
+
+static int fio_iscsi_setup(struct thread_data *td)
+{
+	struct iscsi_options	*options    = td->eo;
+	struct iscsi_info	*iscsi_info = NULL;
+	int			 ret	    = 0;
+	struct fio_file		*f;
+	int			 i;
+
+	iscsi_info	    = malloc(sizeof(struct iscsi_info));
+	iscsi_info->nr_luns = td->o.nr_files;
+	iscsi_info->luns    = calloc(iscsi_info->nr_luns, sizeof(struct iscsi_lun*));
+	iscsi_info->pfds    = calloc(iscsi_info->nr_luns, sizeof(struct pollfd));
+
+	iscsi_info->nr_events	    = 0;
+	iscsi_info->complete_events = calloc(td->o.iodepth, sizeof(struct iscsi_task*));
+
+	td->io_ops_data = iscsi_info;
+
+	for_each_file(td, f, i) {
+		ret = fio_iscsi_setup_lun(iscsi_info, options->initiator, f, i);
+		if (ret) break;
+	}
+
+	return ret;
+}
+
+static int fio_iscsi_init(struct thread_data *td) {
+	return 0;
+}
+
+static void fio_iscsi_cleanup_lun(struct iscsi_lun *iscsi_lun) {
+	if (iscsi_lun->iscsi != NULL) {
+		if (iscsi_is_logged_in(iscsi_lun->iscsi)) {
+			iscsi_logout_sync(iscsi_lun->iscsi);
+		}
+		iscsi_destroy_context(iscsi_lun->iscsi);
+	}
+	free(iscsi_lun);
+}
+
+static void fio_iscsi_cleanup(struct thread_data *td)
+{
+	struct iscsi_info *iscsi_info = td->io_ops_data;
+
+	for (int i = 0; i < iscsi_info->nr_luns; i++) {
+		if (iscsi_info->luns[i]) {
+			fio_iscsi_cleanup_lun(iscsi_info->luns[i]);
+			iscsi_info->luns[i] = NULL;
+		}
+	}
+
+	free(iscsi_info->luns);
+	free(iscsi_info->pfds);
+	free(iscsi_info->complete_events);
+	free(iscsi_info);
+}
+
+static int fio_iscsi_prep(struct thread_data *td, struct io_u *io_u)
+{
+	return 0;
+}
+
+static int fio_iscsi_open_file(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static int fio_iscsi_close_file(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static void iscsi_cb(struct iscsi_context *iscsi, int status,
+		     void *command_data, void *private_data)
+{
+	struct iscsi_task	*iscsi_task = (struct iscsi_task*)private_data;
+	struct iscsi_lun	*iscsi_lun  = iscsi_task->iscsi_lun;
+	struct iscsi_info       *iscsi_info = iscsi_lun->iscsi_info;
+	struct io_u             *io_u	    = iscsi_task->io_u;
+
+	if (status == SCSI_STATUS_GOOD) {
+		io_u->error = 0;
+	} else {
+		log_err("iscsi: request failed with error %s.\n",
+			iscsi_get_error(iscsi_lun->iscsi));
+
+		io_u->error = 1;
+		io_u->resid = io_u->xfer_buflen;
+	}
+
+	iscsi_info->complete_events[iscsi_info->nr_events] = iscsi_task;
+	iscsi_info->nr_events++;
+}
+
+static enum fio_q_status fio_iscsi_queue(struct thread_data *td,
+					 struct io_u *io_u)
+{
+	struct iscsi_lun	*iscsi_lun  = io_u->file->engine_data;
+	struct scsi_task	*scsi_task  = NULL;
+	struct iscsi_task	*iscsi_task = malloc(sizeof(struct iscsi_task));
+	int			 ret	    = -1;
+
+	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		if (io_u->offset % iscsi_lun->block_size != 0) {
+			log_err("iscsi: offset is not aligned to block size.\n");
+			ret = -1;
+			goto out;
+		}
+
+		if (io_u->xfer_buflen % iscsi_lun->block_size != 0) {
+			log_err("iscsi: buflen is not aligned to block size.\n");
+			ret = -1;
+			goto out;
+		}
+	}
+
+	if (io_u->ddir == DDIR_READ) {
+		scsi_task = scsi_cdb_read16(io_u->offset / iscsi_lun->block_size,
+					    io_u->xfer_buflen,
+					    iscsi_lun->block_size,
+					    0, 0, 0, 0, 0);
+		ret = scsi_task_add_data_in_buffer(scsi_task, io_u->xfer_buflen,
+						   io_u->xfer_buf);
+		if (ret < 0) {
+			log_err("iscsi: failed to add data in buffer.\n");
+			goto out;
+		}
+	} else if (io_u->ddir == DDIR_WRITE) {
+		scsi_task = scsi_cdb_write16(io_u->offset / iscsi_lun->block_size,
+					     io_u->xfer_buflen,
+					     iscsi_lun->block_size,
+					     0, 0, 0, 0, 0);
+		ret = scsi_task_add_data_out_buffer(scsi_task, io_u->xfer_buflen,
+						    io_u->xfer_buf);
+		if (ret < 0) {
+			log_err("iscsi: failed to add data out buffer.\n");
+			goto out;
+		}
+	} else if (ddir_sync(io_u->ddir)) {
+		scsi_task = scsi_cdb_synchronizecache16(
+			0, iscsi_lun->num_blocks * iscsi_lun->block_size, 0, 0);
+	} else {
+		log_err("iscsi: invalid I/O operation: %d\n", io_u->ddir);
+		ret = EINVAL;
+		goto out;
+	}
+
+	iscsi_task->scsi_task = scsi_task;
+	iscsi_task->iscsi_lun = iscsi_lun;
+	iscsi_task->io_u      = io_u;
+
+	ret = iscsi_scsi_command_async(iscsi_lun->iscsi, iscsi_lun->url->lun,
+				       scsi_task, iscsi_cb, NULL, iscsi_task);
+	if (ret < 0) {
+		log_err("iscsi: failed to send scsi command.\n");
+		goto out;
+	}
+
+	return FIO_Q_QUEUED;
+
+out:
+	if (iscsi_task) {
+		free(iscsi_task);
+	}
+
+	if (scsi_task) {
+		scsi_free_scsi_task(scsi_task);
+	}
+
+	if (ret) {
+		io_u->error = ret;
+	}
+	return FIO_Q_COMPLETED;
+}
+
+static int fio_iscsi_getevents(struct thread_data *td, unsigned int min,
+			       unsigned int max, const struct timespec *t)
+{
+	struct iscsi_info	*iscsi_info = td->io_ops_data;
+	int			 ret	    = 0;
+
+	iscsi_info->nr_events = 0;
+
+	while (iscsi_info->nr_events < min) {
+		for (int i = 0; i < iscsi_info->nr_luns; i++) {
+			int events = iscsi_which_events(iscsi_info->luns[i]->iscsi);
+			iscsi_info->pfds[i].events = events;
+		}
+
+		ret = poll(iscsi_info->pfds, iscsi_info->nr_luns, -1);
+		if (ret < 0) {
+			log_err("iscsi: failed to poll events: %s.\n",
+				strerror(errno));
+			break;
+		}
+
+		for (int i = 0; i < iscsi_info->nr_luns; i++) {
+			ret = iscsi_service(iscsi_info->luns[i]->iscsi,
+					    iscsi_info->pfds[i].revents);
+			assert(ret >= 0);
+		}
+	}
+
+	return ret < 0 ? ret : iscsi_info->nr_events;
+}
+
+static struct io_u *fio_iscsi_event(struct thread_data *td, int event)
+{
+	struct iscsi_info	*iscsi_info = (struct iscsi_info*)td->io_ops_data;
+	struct iscsi_task	*iscsi_task = iscsi_info->complete_events[event];
+	struct io_u		*io_u	    = iscsi_task->io_u;
+
+	iscsi_info->complete_events[event] = NULL;
+
+	scsi_free_scsi_task(iscsi_task->scsi_task);
+	free(iscsi_task);
+
+	return io_u;
+}
+
+static struct ioengine_ops ioengine_iscsi = {
+	.name               = "libiscsi",
+	.version            = FIO_IOOPS_VERSION,
+	.flags              = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NODISKUTIL,
+	.setup              = fio_iscsi_setup,
+	.init               = fio_iscsi_init,
+	.prep               = fio_iscsi_prep,
+	.queue              = fio_iscsi_queue,
+	.getevents          = fio_iscsi_getevents,
+	.event              = fio_iscsi_event,
+	.cleanup            = fio_iscsi_cleanup,
+	.open_file          = fio_iscsi_open_file,
+	.close_file         = fio_iscsi_close_file,
+	.option_struct_size = sizeof(struct iscsi_options),
+	.options	    = options,
+};
+
+static void fio_init fio_iscsi_register(void)
+{
+	register_ioengine(&ioengine_iscsi);
+}
+
+static void fio_exit fio_iscsi_unregister(void)
+{
+	unregister_ioengine(&ioengine_iscsi);
+}
diff --git a/examples/libiscsi.fio b/examples/libiscsi.fio
new file mode 100644
index 00000000..565604dd
--- /dev/null
+++ b/examples/libiscsi.fio
@@ -0,0 +1,3 @@
+[iscsi]
+ioengine=libiscsi
+filename=iscsi\://127.0.0.1/iqn.2016-02.com.fio\:system\:fio/1
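The example job escapes every colon in the iSCSI URL (`iscsi\://...`) because fio uses `:` in `filename=` to separate multiple files, and documents `\` as the escape for a literal colon. The hypothetical helper below is not fio's parser; it only shows the string the engine presumably ends up passing to `iscsi_parse_full_url()` once those escapes are removed:

```c
#include <stddef.h>

/* Copy src into dst (at most dstsz bytes including the terminator),
 * turning each "\:" into ":". Returns the length written. */
static size_t unescape_colons(const char *src, char *dst, size_t dstsz)
{
    size_t n = 0;

    if (dstsz == 0)
        return 0;
    while (*src && n + 1 < dstsz) {
        if (src[0] == '\\' && src[1] == ':')
            src++;              /* drop the backslash, keep the colon */
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```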
diff --git a/filesetup.c b/filesetup.c
index 47c889a0..24e6fb07 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -890,6 +890,42 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 	return offset;
 }
 
+static bool create_work_dirs(struct thread_data *td, const char *fname)
+{
+	char path[PATH_MAX];
+	char *start, *end;
+
+	if (td->o.directory) {
+		snprintf(path, PATH_MAX, "%s%c%s", td->o.directory,
+			 FIO_OS_PATH_SEPARATOR, fname);
+		start = strstr(path, fname);
+	} else {
+		snprintf(path, PATH_MAX, "%s", fname);
+		start = path;
+	}
+
+	end = start;
+	while ((end = strchr(end, FIO_OS_PATH_SEPARATOR)) != NULL) {
+		if (end == start)
+			break;
+		*end = '\0';
+		errno = 0;
+#ifdef CONFIG_HAVE_MKDIR_TWO
+		if (mkdir(path, 0600) && errno != EEXIST) {
+#else
+		if (mkdir(path) && errno != EEXIST) {
+#endif
+			log_err("fio: failed to create dir (%s): %d\n",
+				start, errno);
+			return false;
+		}
+		*end = FIO_OS_PATH_SEPARATOR;
+		end++;
+	}
+	td->flags |= TD_F_DIRS_CREATED;
+	return true;
+}
+
 /*
  * Open the files and setup files sizes, creating files if necessary.
  */
@@ -908,6 +944,14 @@ int setup_files(struct thread_data *td)
 
 	old_state = td_bump_runstate(td, TD_SETTING_UP);
 
+	for_each_file(td, f, i) {
+		if (!td_ioengine_flagged(td, FIO_DISKLESSIO) &&
+		    strchr(f->file_name, FIO_OS_PATH_SEPARATOR) &&
+		    !(td->flags & TD_F_DIRS_CREATED) &&
+		    !create_work_dirs(td, f->file_name))
+			goto err_out;
+	}
+
 	/*
 	 * Find out physical size of files or devices for this thread,
 	 * before we determine I/O size and range of our targets.
@@ -1517,42 +1561,6 @@ bool exists_and_not_regfile(const char *filename)
 	return true;
 }
 
-static bool create_work_dirs(struct thread_data *td, const char *fname)
-{
-	char path[PATH_MAX];
-	char *start, *end;
-
-	if (td->o.directory) {
-		snprintf(path, PATH_MAX, "%s%c%s", td->o.directory,
-			 FIO_OS_PATH_SEPARATOR, fname);
-		start = strstr(path, fname);
-	} else {
-		snprintf(path, PATH_MAX, "%s", fname);
-		start = path;
-	}
-
-	end = start;
-	while ((end = strchr(end, FIO_OS_PATH_SEPARATOR)) != NULL) {
-		if (end == start)
-			break;
-		*end = '\0';
-		errno = 0;
-#ifdef CONFIG_HAVE_MKDIR_TWO
-		if (mkdir(path, 0600) && errno != EEXIST) {
-#else
-		if (mkdir(path) && errno != EEXIST) {
-#endif
-			log_err("fio: failed to create dir (%s): %d\n",
-				start, errno);
-			return false;
-		}
-		*end = FIO_OS_PATH_SEPARATOR;
-		end++;
-	}
-	td->flags |= TD_F_DIRS_CREATED;
-	return true;
-}
-
 int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 {
 	int cur_files = td->files_index;
@@ -1568,11 +1576,6 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 
 	sprintf(file_name + len, "%s", fname);
 
-	if (strchr(fname, FIO_OS_PATH_SEPARATOR) &&
-	    !(td->flags & TD_F_DIRS_CREATED) &&
-	    !create_work_dirs(td, fname))
-		return 1;
-
 	/* clean cloned siblings using existing files */
 	if (numjob && is_already_allocated(file_name) &&
 	    !exists_and_not_regfile(fname))
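The `filesetup.c` change moves `create_work_dirs()` out of `add_file()` and into `setup_files()`, where it is skipped for engines flagged `FIO_DISKLESSIO`; presumably this keeps an iSCSI URL, which contains `/`, from being treated as a directory path. The function truncates the path at each separator and calls `mkdir()` on every prefix, ignoring `EEXIST`. A minimal POSIX sketch of that technique; `make_parent_dirs()` is a hypothetical name, it does not prepend `td->o.directory`, it skips a leading `/` where fio's loop stops, and it uses mode 0755 because a directory created with 0600 lacks the search bit an unprivileged user needs to create entries inside it:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create every parent directory of fname, like "mkdir -p" on its
 * dirname. The last component is a file name and is not created.
 * Returns 0 on success, -1 if the path is too long or mkdir() fails. */
static int make_parent_dirs(const char *fname)
{
    char path[4096];   /* fixed size assumed here instead of PATH_MAX */
    char *end;

    if (snprintf(path, sizeof(path), "%s", fname) >= (int)sizeof(path))
        return -1;

    /* Skip a leading '/' so an absolute path does not try mkdir(""). */
    end = path + (path[0] == '/');
    while ((end = strchr(end, '/')) != NULL) {
        *end = '\0';                /* cut the path at this separator */
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
        *end = '/';                 /* restore it and keep scanning */
        end++;
    }
    return 0;
}
```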
diff --git a/fio.1 b/fio.1
index ed492682..2708b503 100644
--- a/fio.1
+++ b/fio.1
@@ -1751,6 +1751,9 @@ are "contiguous" and the IO depth is not exceeded) before issuing a call to IME.
 Asynchronous read and write using DDN's Infinite Memory Engine (IME). This
 engine will try to stack as much IOs as possible by creating requests for IME.
 FIO will then decide when to commit these requests.
+.TP
+.B libiscsi
+Read and write iscsi lun with libiscsi.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
diff --git a/optgroup.h b/optgroup.h
index bf1bb036..483adddd 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -63,6 +63,7 @@ enum opt_category_group {
 	__FIO_OPT_G_SG,
 	__FIO_OPT_G_MMAP,
 	__FIO_OPT_G_NR,
+	__FIO_OPT_G_ISCSI,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
 	FIO_OPT_G_ZONE		= (1ULL << __FIO_OPT_G_ZONE),
@@ -100,6 +101,7 @@ enum opt_category_group {
 	FIO_OPT_G_SG		= (1ULL << __FIO_OPT_G_SG),
 	FIO_OPT_G_MMAP		= (1ULL << __FIO_OPT_G_MMAP),
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
+	FIO_OPT_G_ISCSI         = (1ULL << __FIO_OPT_G_ISCSI),
 };
 
 extern const struct opt_group *opt_group_from_mask(uint64_t *mask);
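Two pieces of arithmetic in the libiscsi engine are easy to get wrong. READ CAPACITY(16) returns the address of the last logical block, which is why `fio_iscsi_setup_lun()` adds one before multiplying by the block length. And `fio_iscsi_queue()` rejects offsets and lengths that are not multiples of the block size before dividing the offset to get the starting LBA. A sketch of both as standalone functions; the names are hypothetical, and the block count is computed only for illustration, since the engine passes the byte length to `scsi_cdb_read16()`/`scsi_cdb_write16()`:

```c
#include <stdint.h>

/* Device size from READ CAPACITY(16): returned_lba is the LAST block. */
static uint64_t lun_size_bytes(uint64_t returned_lba, uint32_t block_length)
{
    return (returned_lba + 1) * (uint64_t)block_length;
}

/* Map a byte offset/length to a starting LBA and block count.
 * Returns 0 on success, -1 if either value is not block aligned. */
static int bytes_to_blocks(uint64_t offset, uint64_t len, uint32_t block_size,
                           uint64_t *lba, uint64_t *nblocks)
{
    if (block_size == 0 || offset % block_size || len % block_size)
        return -1;
    *lba = offset / block_size;
    *nblocks = len / block_size;
    return 0;
}
```

For example, a LUN reporting last LBA 2097151 with 512-byte blocks is exactly 1 GiB, and a 4 KiB request at offset 8192 maps to LBA 16 spanning 8 blocks.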



* Recent changes (master)
@ 2019-04-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2019-04-20 12:00 UTC
  To: fio

The following changes since commit ff547caad147b437f654881717fedc750c0c9b17:

  fio: Add advise THP option to mmap engine (2019-04-18 10:46:19 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7f125e7f3879d23e79bc2ef5eed678ddab3b5c70:

  zbd: Fix zone report handling (2019-04-19 09:11:34 -0600)

----------------------------------------------------------------
Damien Le Moal (1):
      zbd: Fix zone report handling

 zbd.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/zbd.c b/zbd.c
index 2da742b7..1c46b452 100644
--- a/zbd.c
+++ b/zbd.c
@@ -186,11 +186,14 @@ static bool zbd_verify_bs(void)
  * size of @buf.
  *
  * Returns 0 upon success and a negative error code upon failure.
+ * If the zone report is empty, always assume an error (device problem) and
+ * return -EIO.
  */
 static int read_zone_info(int fd, uint64_t start_sector,
 			  void *buf, unsigned int bufsz)
 {
 	struct blk_zone_report *hdr = buf;
+	int ret;
 
 	if (bufsz < sizeof(*hdr))
 		return -EINVAL;
@@ -199,7 +202,12 @@ static int read_zone_info(int fd, uint64_t start_sector,
 
 	hdr->nr_zones = (bufsz - sizeof(*hdr)) / sizeof(struct blk_zone);
 	hdr->sector = start_sector;
-	return ioctl(fd, BLKREPORTZONE, hdr) >= 0 ? 0 : -errno;
+	ret = ioctl(fd, BLKREPORTZONE, hdr);
+	if (ret)
+		return -errno;
+	if (!hdr->nr_zones)
+		return -EIO;
+	return 0;
 }
 
 /*
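
The patched `read_zone_info()` separates three outcomes of the `BLKREPORTZONE` ioctl. On entry, the report header's `nr_zones` field holds the buffer's capacity; on return, it holds the number of zones actually reported. The three outcomes are:

- The ioctl fails: return `-errno`.
- It succeeds but reports no zones: treat this as a device problem and return `-EIO`.
- It succeeds with at least one zone: return 0.

The sketch below expresses that decision as a pure function, so it can be exercised without a zoned device. `classify_zone_report()` is hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical helper mirroring the patched read_zone_info():
 * ioctl_ret/saved_errno are the BLKREPORTZONE result and errno,
 * nr_zones is the count the kernel wrote back into the header.
 */
static int classify_zone_report(int ioctl_ret, int saved_errno,
				unsigned int nr_zones)
{
	if (ioctl_ret)
		return -saved_errno;	/* the ioctl itself failed */
	if (!nr_zones)
		return -EIO;		/* empty report: assume a device problem */
	return 0;
}
```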


* Recent changes (master)
@ 2019-04-19 12:00 Jens Axboe
From: Jens Axboe @ 2019-04-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 96416da576d47c01dd3c9481b03e969908b3ff74:

  Merge branch 'master' of https://github.com/mingnus/fio (2019-04-17 06:50:25 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ff547caad147b437f654881717fedc750c0c9b17:

  fio: Add advise THP option to mmap engine (2019-04-18 10:46:19 -0600)

----------------------------------------------------------------
Keith Busch (1):
      fio: Add advise THP option to mmap engine

 configure      | 27 +++++++++++++++++++++++++++
 engines/mmap.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 optgroup.h     |  2 ++
 3 files changed, 81 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 6e549cdc..3c882f0f 100755
--- a/configure
+++ b/configure
@@ -2326,6 +2326,30 @@ if compile_prog "-Wimplicit-fallthrough" "" "-Wimplicit-fallthrough"; then
 fi
 print_config "-Wimplicit-fallthrough" "$fallthrough"
 
+##########################################
+# check for MADV_HUGEPAGE support
+if test "$thp" != "yes" ; then
+  thp="no"
+fi
+if test "$esx" != "yes" ; then
+  cat > $TMPC <<EOF
+#include <sys/mman.h>
+int main(void)
+{
+  return madvise(0, 0x1000, MADV_HUGEPAGE);
+}
+EOF
+  if compile_prog "" "" "thp" ; then
+    thp=yes
+  else
+    if test "$thp" = "yes" ; then
+      feature_not_found "Transparent Huge Page" ""
+    fi
+    thp=no
+  fi
+fi
+print_config "MADV_HUGEPAGE" "$thp"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2600,6 +2624,9 @@ fi
 if test "$fallthrough" = "yes"; then
   CFLAGS="$CFLAGS -Wimplicit-fallthrough"
 fi
+if test "$thp" = "yes" ; then
+  output_sym "CONFIG_HAVE_THP"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/engines/mmap.c b/engines/mmap.c
index 308b4665..55ba1ab3 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -11,6 +11,7 @@
 #include <sys/mman.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 #include "../verify.h"
 
 /*
@@ -26,11 +27,40 @@ struct fio_mmap_data {
 	off_t mmap_off;
 };
 
+#ifdef CONFIG_HAVE_THP
+struct mmap_options {
+	void *pad;
+	unsigned int thp;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "thp",
+		.lname	= "Transparent Huge Pages",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct mmap_options, thp),
+		.help	= "Memory Advise Huge Page",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_MMAP,
+	},
+	{
+		.name = NULL,
+	},
+};
+#endif
+
 static bool fio_madvise_file(struct thread_data *td, struct fio_file *f,
 			     size_t length)
 
 {
 	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
+#ifdef CONFIG_HAVE_THP
+	struct mmap_options *o = td->eo;
+
+	/* Ignore errors on this optional advisory */
+	if (o->thp)
+		madvise(fmd->mmap_ptr, length, MADV_HUGEPAGE);
+#endif
 
 	if (!td->o.fadvise_hint)
 		return true;
@@ -50,11 +80,27 @@ static bool fio_madvise_file(struct thread_data *td, struct fio_file *f,
 	return true;
 }
 
+#ifdef CONFIG_HAVE_THP
+static int fio_mmap_get_shared(struct thread_data *td)
+{
+	struct mmap_options *o = td->eo;
+
+	if (o->thp)
+		return MAP_PRIVATE;
+	return MAP_SHARED;
+}
+#else
+static int fio_mmap_get_shared(struct thread_data *td)
+{
+	return MAP_SHARED;
+}
+#endif
+
 static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 			 size_t length, off_t off)
 {
 	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
-	int flags = 0;
+	int flags = 0, shared = fio_mmap_get_shared(td);
 
 	if (td_rw(td) && !td->o.verify_only)
 		flags = PROT_READ | PROT_WRITE;
@@ -66,7 +112,7 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 	} else
 		flags = PROT_READ;
 
-	fmd->mmap_ptr = mmap(NULL, length, flags, MAP_SHARED, f->fd, off);
+	fmd->mmap_ptr = mmap(NULL, length, flags, shared, f->fd, off);
 	if (fmd->mmap_ptr == MAP_FAILED) {
 		fmd->mmap_ptr = NULL;
 		td_verror(td, errno, "mmap");
@@ -275,6 +321,10 @@ static struct ioengine_ops ioengine = {
 	.close_file	= fio_mmapio_close_file,
 	.get_file_size	= generic_get_file_size,
 	.flags		= FIO_SYNCIO | FIO_NOEXTEND,
+#ifdef CONFIG_HAVE_THP
+	.options	= options,
+	.option_struct_size = sizeof(struct mmap_options),
+#endif
 };
 
 static void fio_init fio_mmapio_register(void)
diff --git a/optgroup.h b/optgroup.h
index adf4d09b..bf1bb036 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -61,6 +61,7 @@ enum opt_category_group {
 	__FIO_OPT_G_MTD,
 	__FIO_OPT_G_HDFS,
 	__FIO_OPT_G_SG,
+	__FIO_OPT_G_MMAP,
 	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
@@ -97,6 +98,7 @@ enum opt_category_group {
 	FIO_OPT_G_MTD		= (1ULL << __FIO_OPT_G_MTD),
 	FIO_OPT_G_HDFS		= (1ULL << __FIO_OPT_G_HDFS),
 	FIO_OPT_G_SG		= (1ULL << __FIO_OPT_G_SG),
+	FIO_OPT_G_MMAP		= (1ULL << __FIO_OPT_G_MMAP),
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
 };
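
With `CONFIG_HAVE_THP` defined, the mmap engine gains a `thp` option. When it is set, `fio_mmap_get_shared()` selects `MAP_PRIVATE` instead of `MAP_SHARED`. In addition, `fio_madvise_file()` issues `madvise(..., MADV_HUGEPAGE)` on the mapping and ignores any error, since the advice is optional. The commit does not say why the private mapping is needed.

The configure probe only builds its test program, so the `madvise(0, 0x1000, MADV_HUGEPAGE)` call in it is never executed.

A standalone sketch of the same flag selection follows. It assumes an anonymous mapping instead of fio's file-backed one, so that it runs anywhere. `map_region()` is hypothetical.

```c
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MADV_HUGEPAGE */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory, using
 * MAP_PRIVATE when huge pages are wanted (as the patched engine does)
 * and MAP_SHARED otherwise. Returns NULL on failure.
 */
static void *map_region(size_t len, int want_thp)
{
	int shared = want_thp ? MAP_PRIVATE : MAP_SHARED;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       shared | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
#ifdef MADV_HUGEPAGE
	/* Advisory only: errors are ignored, like in fio_madvise_file(). */
	if (want_thp)
		(void) madvise(p, len, MADV_HUGEPAGE);
#endif
	return p;
}
```

The `#ifdef` guard plays the same role at compile time that the configure check plays for fio: on systems without `MADV_HUGEPAGE`, the option simply does nothing.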
 


* Recent changes (master)
@ 2019-04-18 12:00 Jens Axboe
From: Jens Axboe @ 2019-04-18 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ef32da643574162c2b1aae67af38ecffb43db036:

  Merge branch 'patch-1' of https://github.com/neheb/fio (2019-04-01 06:57:22 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 96416da576d47c01dd3c9481b03e969908b3ff74:

  Merge branch 'master' of https://github.com/mingnus/fio (2019-04-17 06:50:25 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/mingnus/fio

Ming-Hung Tsai (1):
      rand: fix truncated rand_seed on Windows

 filesetup.c     |  2 +-
 fio.h           |  2 +-
 init.c          |  2 +-
 lib/lfsr.c      |  6 +++---
 lib/lfsr.h      |  4 ++--
 lib/rand.c      | 20 ++++++++++----------
 lib/rand.h      |  8 ++++----
 os/os-windows.h |  2 +-
 os/os.h         |  2 +-
 verify.c        | 10 +++++-----
 verify.h        |  2 +-
 11 files changed, 30 insertions(+), 30 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index aa1a3945..47c889a0 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1287,7 +1287,7 @@ bool init_random_map(struct thread_data *td)
 			return false;
 
 		if (td->o.random_generator == FIO_RAND_GEN_LFSR) {
-			unsigned long seed;
+			uint64_t seed;
 
 			seed = td->rand_seeds[FIO_RAND_BLOCK_OFF];
 
diff --git a/fio.h b/fio.h
index b3ba5db2..2103151d 100644
--- a/fio.h
+++ b/fio.h
@@ -245,7 +245,7 @@ struct thread_data {
 	void *iolog_buf;
 	FILE *iolog_f;
 
-	unsigned long rand_seeds[FIO_RAND_NR_OFFS];
+	uint64_t rand_seeds[FIO_RAND_NR_OFFS];
 
 	struct frand_state bsrange_state[DDIR_RWDIR_CNT];
 	struct frand_state verify_state;
diff --git a/init.c b/init.c
index e6378715..73834279 100644
--- a/init.c
+++ b/init.c
@@ -1217,7 +1217,7 @@ static void init_flags(struct thread_data *td)
 
 static int setup_random_seeds(struct thread_data *td)
 {
-	unsigned long seed;
+	uint64_t seed;
 	unsigned int i;
 
 	if (!td->o.rand_repeatable && !fio_option_is_set(&td->o, rand_seed)) {
diff --git a/lib/lfsr.c b/lib/lfsr.c
index 32fbec56..1ef6ebbf 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -232,7 +232,7 @@ static int prepare_spin(struct fio_lfsr *fl, unsigned int spin)
 	return 0;
 }
 
-int lfsr_reset(struct fio_lfsr *fl, unsigned long seed)
+int lfsr_reset(struct fio_lfsr *fl, uint64_t seed)
 {
 	uint64_t bitmask = (fl->cached_bit << 1) - 1;
 
@@ -246,8 +246,8 @@ int lfsr_reset(struct fio_lfsr *fl, unsigned long seed)
 	return 0;
 }
 
-int lfsr_init(struct fio_lfsr *fl, uint64_t nums, unsigned long seed,
-		unsigned int spin)
+int lfsr_init(struct fio_lfsr *fl, uint64_t nums, uint64_t seed,
+	      unsigned int spin)
 {
 	uint8_t *taps;
 
diff --git a/lib/lfsr.h b/lib/lfsr.h
index c2d55693..95bc07fd 100644
--- a/lib/lfsr.h
+++ b/lib/lfsr.h
@@ -24,7 +24,7 @@ struct fio_lfsr {
 
 int lfsr_next(struct fio_lfsr *fl, uint64_t *off);
 int lfsr_init(struct fio_lfsr *fl, uint64_t size,
-		unsigned long seed, unsigned int spin);
-int lfsr_reset(struct fio_lfsr *fl, unsigned long seed);
+	      uint64_t seed, unsigned int spin);
+int lfsr_reset(struct fio_lfsr *fl, uint64_t seed);
 
 #endif
diff --git a/lib/rand.c b/lib/rand.c
index f18bd8d8..69acb06c 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -95,7 +95,7 @@ void init_rand_seed(struct frand_state *state, unsigned int seed, bool use64)
 		__init_rand64(&state->state64, seed);
 }
 
-void __fill_random_buf(void *buf, unsigned int len, unsigned long seed)
+void __fill_random_buf(void *buf, unsigned int len, uint64_t seed)
 {
 	void *ptr = buf;
 
@@ -122,10 +122,10 @@ void __fill_random_buf(void *buf, unsigned int len, unsigned long seed)
 	}
 }
 
-unsigned long fill_random_buf(struct frand_state *fs, void *buf,
-			      unsigned int len)
+uint64_t fill_random_buf(struct frand_state *fs, void *buf,
+			 unsigned int len)
 {
-	unsigned long r = __rand(fs);
+	uint64_t r = __rand(fs);
 
 	if (sizeof(int) != sizeof(long *))
 		r *= (unsigned long) __rand(fs);
@@ -134,7 +134,7 @@ unsigned long fill_random_buf(struct frand_state *fs, void *buf,
 	return r;
 }
 
-void __fill_random_buf_percentage(unsigned long seed, void *buf,
+void __fill_random_buf_percentage(uint64_t seed, void *buf,
 				  unsigned int percentage,
 				  unsigned int segment, unsigned int len,
 				  char *pattern, unsigned int pbytes)
@@ -183,12 +183,12 @@ void __fill_random_buf_percentage(unsigned long seed, void *buf,
 	}
 }
 
-unsigned long fill_random_buf_percentage(struct frand_state *fs, void *buf,
-					 unsigned int percentage,
-					 unsigned int segment, unsigned int len,
-					 char *pattern, unsigned int pbytes)
+uint64_t fill_random_buf_percentage(struct frand_state *fs, void *buf,
+				    unsigned int percentage,
+				    unsigned int segment, unsigned int len,
+				    char *pattern, unsigned int pbytes)
 {
-	unsigned long r = __rand(fs);
+	uint64_t r = __rand(fs);
 
 	if (sizeof(int) != sizeof(long *))
 		r *= (unsigned long) __rand(fs);
diff --git a/lib/rand.h b/lib/rand.h
index 1676cf98..95d4f6d4 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -150,9 +150,9 @@ static inline uint64_t rand_between(struct frand_state *state, uint64_t start,
 
 extern void init_rand(struct frand_state *, bool);
 extern void init_rand_seed(struct frand_state *, unsigned int seed, bool);
-extern void __fill_random_buf(void *buf, unsigned int len, unsigned long seed);
-extern unsigned long fill_random_buf(struct frand_state *, void *buf, unsigned int len);
-extern void __fill_random_buf_percentage(unsigned long, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
-extern unsigned long fill_random_buf_percentage(struct frand_state *, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
+extern void __fill_random_buf(void *buf, unsigned int len, uint64_t seed);
+extern uint64_t fill_random_buf(struct frand_state *, void *buf, unsigned int len);
+extern void __fill_random_buf_percentage(uint64_t, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
+extern uint64_t fill_random_buf_percentage(struct frand_state *, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
 
 #endif
diff --git a/os/os-windows.h b/os/os-windows.h
index ef955dc3..dc958f5c 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -167,7 +167,7 @@ static inline int gettid(void)
 	return GetCurrentThreadId();
 }
 
-static inline int init_random_seeds(unsigned long *rand_seeds, int size)
+static inline int init_random_seeds(uint64_t *rand_seeds, int size)
 {
 	HCRYPTPROV hCryptProv;
 
diff --git a/os/os.h b/os/os.h
index 36b6bb2e..756ece4b 100644
--- a/os/os.h
+++ b/os/os.h
@@ -323,7 +323,7 @@ static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 #endif
 
 #ifdef FIO_USE_GENERIC_INIT_RANDOM_STATE
-static inline int init_random_seeds(unsigned long *rand_seeds, int size)
+static inline int init_random_seeds(uint64_t *rand_seeds, int size)
 {
 	int fd;
 
diff --git a/verify.c b/verify.c
index da429e79..f79ab43a 100644
--- a/verify.c
+++ b/verify.c
@@ -39,14 +39,14 @@ void fill_buffer_pattern(struct thread_data *td, void *p, unsigned int len)
 	(void)cpy_pattern(td->o.buffer_pattern, td->o.buffer_pattern_bytes, p, len);
 }
 
-static void __fill_buffer(struct thread_options *o, unsigned long seed, void *p,
+static void __fill_buffer(struct thread_options *o, uint64_t seed, void *p,
 			  unsigned int len)
 {
 	__fill_random_buf_percentage(seed, p, o->compress_percentage, len, len, o->buffer_pattern, o->buffer_pattern_bytes);
 }
 
-static unsigned long fill_buffer(struct thread_data *td, void *p,
-				 unsigned int len)
+static uint64_t fill_buffer(struct thread_data *td, void *p,
+			    unsigned int len)
 {
 	struct frand_state *fs = &td->verify_state;
 	struct thread_options *o = &td->o;
@@ -55,7 +55,7 @@ static unsigned long fill_buffer(struct thread_data *td, void *p,
 }
 
 void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
-			 struct io_u *io_u, unsigned long seed, int use_seed)
+			 struct io_u *io_u, uint64_t seed, int use_seed)
 {
 	struct thread_options *o = &td->o;
 
@@ -100,7 +100,7 @@ static unsigned int get_hdr_inc(struct thread_data *td, struct io_u *io_u)
 }
 
 static void fill_pattern_headers(struct thread_data *td, struct io_u *io_u,
-				 unsigned long seed, int use_seed)
+				 uint64_t seed, int use_seed)
 {
 	unsigned int hdr_inc, header_num;
 	struct verify_header *hdr;
diff --git a/verify.h b/verify.h
index 64121a51..539e6f6c 100644
--- a/verify.h
+++ b/verify.h
@@ -97,7 +97,7 @@ extern void populate_verify_io_u(struct thread_data *, struct io_u *);
 extern int __must_check get_next_verify(struct thread_data *td, struct io_u *);
 extern int __must_check verify_io_u(struct thread_data *, struct io_u **);
 extern int verify_io_u_async(struct thread_data *, struct io_u **);
-extern void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len, struct io_u *io_u, unsigned long seed, int use_seed);
+extern void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len, struct io_u *io_u, uint64_t seed, int use_seed);
 extern void fill_buffer_pattern(struct thread_data *td, void *p, unsigned int len);
 extern void fio_verify_init(struct thread_data *td);
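
Windows uses the LLP64 data model, in which `unsigned long` is 32 bits even in 64-bit builds. As a result, 64-bit seeds stored in `unsigned long` lost their upper half on that platform. Switching the seed variables, parameters and return types to `uint64_t` keeps all 64 bits on every platform.

The sketch below models Windows' `unsigned long` with `uint32_t`, an assumption made so that it behaves the same on any host. It shows that two seeds differing only in their high bits collide once truncated.

```c
#include <stdint.h>

/* Model of Windows' 32-bit unsigned long under LLP64. */
typedef uint32_t llp64_ulong;

/* What an unsigned long seed slot retained on Windows before the fix. */
static uint64_t store_as_ulong(uint64_t seed)
{
	llp64_ulong slot = (llp64_ulong) seed;	/* upper 32 bits dropped */
	return slot;
}

/* What a uint64_t seed slot retains after the fix. */
static uint64_t store_as_u64(uint64_t seed)
{
	uint64_t slot = seed;
	return slot;
}
```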
 


* Recent changes (master)
@ 2019-04-02 12:00 Jens Axboe
From: Jens Axboe @ 2019-04-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit bf14b39eb98873ebd61e78d37b51233d47ed8aef:

  stat: eliminate unneeded curly braces (2019-03-25 08:23:13 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ef32da643574162c2b1aae67af38ecffb43db036:

  Merge branch 'patch-1' of https://github.com/neheb/fio (2019-04-01 06:57:22 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'patch-1' of https://github.com/neheb/fio

Rosen Penev (1):
      arch: fix build breakage on armv6 again

 arch/arch-arm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index fc1c4844..78cb2ebe 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -7,7 +7,7 @@
 	|| defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5T__) || defined (__ARM_ARCH_5E__)\
 	|| defined (__ARM_ARCH_5TE__) || defined (__ARM_ARCH_5TEJ__) \
 	|| defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
-	|| defined(__ARM_ARCH_6KZ__)
+	|| defined(__ARM_ARCH_6KZ__) || defined(__ARM_ARCH_6K__)
 #define nop             __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t")
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
 #define write_barrier()	__asm__ __volatile__ ("" : : : "memory")


* Recent changes (master)
@ 2019-03-26 12:00 Jens Axboe
From: Jens Axboe @ 2019-03-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c5daece64fd56763f264a59965a547433d4da799:

  stat: fix accumulation of latency buckets (2019-03-21 10:53:39 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bf14b39eb98873ebd61e78d37b51233d47ed8aef:

  stat: eliminate unneeded curly braces (2019-03-25 08:23:13 -0600)

----------------------------------------------------------------
Vincent Fu (2):
      client: put All clients section at end of normal output
      stat: eliminate unneeded curly braces

 client.c |  6 +++++-
 stat.c   | 11 +++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 8d7c0331..4cbffb62 100644
--- a/client.c
+++ b/client.c
@@ -59,6 +59,7 @@ struct group_run_stats client_gs;
 int sum_stat_clients;
 
 static int sum_stat_nr;
+static struct buf_output allclients;
 static struct json_object *root = NULL;
 static struct json_object *job_opt_object = NULL;
 static struct json_array *clients_array = NULL;
@@ -1103,7 +1104,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	if (++sum_stat_nr == sum_stat_clients) {
 		strcpy(client_ts.name, "All clients");
-		tsobj = show_thread_status(&client_ts, &client_gs, NULL, &client->buf);
+		tsobj = show_thread_status(&client_ts, &client_gs, NULL, &allclients);
 		if (tsobj) {
 			json_object_add_client_info(tsobj, client);
 			json_array_add_value_object(clients_array, tsobj);
@@ -2129,6 +2130,9 @@ int fio_handle_clients(struct client_ops *ops)
 		}
 	}
 
+	log_info_buf(allclients.buf, allclients.buflen);
+	buf_output_free(&allclients);
+
 	fio_client_json_fini();
 
 	free(pfds);
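
Before this change, the aggregate "All clients" report was rendered into the buffer of whichever client reported last. It was therefore emitted together with that client's output rather than at the end. The patch renders the report into a dedicated static `buf_output` (`allclients`). Once every client has been handled, it prints that buffer with `log_info_buf()` and frees it.

The sketch below shows this collect-then-flush pattern. It assumes a hypothetical `struct outbuf` in place of fio's `buf_output`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer, standing in for fio's buf_output. */
struct outbuf {
	char *buf;
	size_t len;
};

/* Append a NUL-terminated string; returns 0 on success, -1 on OOM. */
static int outbuf_add(struct outbuf *o, const char *s)
{
	size_t n = strlen(s);
	char *p = realloc(o->buf, o->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + o->len, s, n + 1);	/* copy including the NUL */
	o->buf = p;
	o->len += n;
	return 0;
}

/* Emit everything collected so far, then release it. */
static void outbuf_flush(struct outbuf *o, FILE *out)
{
	if (o->len)
		fwrite(o->buf, 1, o->len, out);
	free(o->buf);
	o->buf = NULL;
	o->len = 0;
}
```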
diff --git a/stat.c b/stat.c
index ecef1099..2bc21dad 100644
--- a/stat.c
+++ b/stat.c
@@ -1682,15 +1682,14 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		dst->io_u_submit[k] += src->io_u_submit[k];
 		dst->io_u_complete[k] += src->io_u_complete[k];
 	}
-	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++) {
+
+	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++)
 		dst->io_u_lat_n[k] += src->io_u_lat_n[k];
-	}
-	for (k = 0; k < FIO_IO_U_LAT_U_NR; k++) {
+	for (k = 0; k < FIO_IO_U_LAT_U_NR; k++)
 		dst->io_u_lat_u[k] += src->io_u_lat_u[k];
-	}
-	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++) {
+	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++)
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
-	}
+
 	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
 		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
 


* Recent changes (master)
@ 2019-03-22 12:00 Jens Axboe
From: Jens Axboe @ 2019-03-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d9c50de7c1d95e5c173ac6d7f5b3f0d63131f8b4:

  t/io_uring: memset() allocated memory (2019-03-11 10:46:47 -0600)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c5daece64fd56763f264a59965a547433d4da799:

  stat: fix accumulation of latency buckets (2019-03-21 10:53:39 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      stat: fix accumulation of latency buckets

 stat.c | 4 ++++
 1 file changed, 4 insertions(+)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 66a13bca..ecef1099 100644
--- a/stat.c
+++ b/stat.c
@@ -1684,7 +1684,11 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	}
 	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++) {
 		dst->io_u_lat_n[k] += src->io_u_lat_n[k];
+	}
+	for (k = 0; k < FIO_IO_U_LAT_U_NR; k++) {
 		dst->io_u_lat_u[k] += src->io_u_lat_u[k];
+	}
+	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++) {
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
 	}
 	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
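
Before this fix, `io_u_lat_u` and `io_u_lat_m` were summed inside the loop bounded by `FIO_IO_U_LAT_N_NR`. Whenever one of those arrays is longer than the nanosecond array, its trailing buckets were never accumulated; a shorter one would instead be indexed past its end. Giving each array its own loop bound, as the patch does, removes that coupling.

The sketch below uses hypothetical bucket counts, chosen only so that they differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket counts; they only need to differ. */
#define LAT_N_NR 10
#define LAT_M_NR 12

struct lat_stats {
	uint64_t lat_n[LAT_N_NR];
	uint64_t lat_m[LAT_M_NR];
};

/* Add src[0..n) into dst[0..n). */
static void sum_buckets(uint64_t *dst, const uint64_t *src, size_t n)
{
	size_t k;

	for (k = 0; k < n; k++)
		dst[k] += src[k];
}

/* Each array is summed with its own length, as in the fixed loops. */
static void sum_lat_stats(struct lat_stats *dst, const struct lat_stats *src)
{
	sum_buckets(dst->lat_n, src->lat_n, LAT_N_NR);
	sum_buckets(dst->lat_m, src->lat_m, LAT_M_NR);
}
```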


* Recent changes (master)
@ 2019-03-12 12:00 Jens Axboe
From: Jens Axboe @ 2019-03-12 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e39863e3cb61bb750f5e7d87fcf11c2bf4651996:

  t/io_uring: add depth options (2019-03-08 21:30:25 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d9c50de7c1d95e5c173ac6d7f5b3f0d63131f8b4:

  t/io_uring: memset() allocated memory (2019-03-11 10:46:47 -0600)

----------------------------------------------------------------
Keith Busch (1):
      t/io_uring: memset() allocated memory

 t/io_uring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 36aede9b..363cba3e 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -526,7 +526,8 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	submitter = malloc(sizeof(*submitter) * depth * sizeof(struct iovec));
+	submitter = malloc(sizeof(*submitter) + depth * sizeof(struct iovec));
+	memset(submitter, 0, sizeof(*submitter) + depth * sizeof(struct iovec));
 	s = submitter;
 
 	flags = O_RDONLY | O_NOATIME;
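
`struct submitter` ends in the flexible array member `struct iovec iovecs[]`, introduced by the depth-options commit further down. Its allocation must therefore be `sizeof(*submitter)` plus `depth` iovecs. The earlier version multiplied the two, which over-allocates, and it also left the memory uninitialized.

The sketch below performs the same allocation. It assumes a hypothetical `struct ring_state` in place of `struct submitter`. It uses `calloc()`, which zero-fills in one call; this is an alternative to the commit's `malloc()` + `memset()`, not what the commit does.

```c
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical stand-in for t/io_uring's struct submitter. */
struct ring_state {
	int nr_files;
	unsigned long reaps;
	struct iovec iovecs[];	/* flexible array member, sized at runtime */
};

/* Allocate the header plus depth iovecs, all zeroed; NULL on failure. */
static struct ring_state *ring_state_alloc(size_t depth)
{
	return calloc(1, sizeof(struct ring_state) +
			 depth * sizeof(struct iovec));
}
```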


* Recent changes (master)
@ 2019-03-09 13:00 Jens Axboe
From: Jens Axboe @ 2019-03-09 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 44364a9937d01825557f4a7d78c6f153886e1115:

  engines/skeleton_external: update gcc incantation (2019-03-07 16:54:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e39863e3cb61bb750f5e7d87fcf11c2bf4651996:

  t/io_uring: add depth options (2019-03-08 21:30:25 -0700)

----------------------------------------------------------------
Keith Busch (1):
      t/io_uring: add depth options

 t/io_uring.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index c7139f87..36aede9b 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -44,7 +44,6 @@ struct io_cq_ring {
 };
 
 #define DEPTH			128
-
 #define BATCH_SUBMIT		32
 #define BATCH_COMPLETE		32
 
@@ -67,7 +66,6 @@ struct submitter {
 	struct drand48_data rand;
 	struct io_sq_ring sq_ring;
 	struct io_uring_sqe *sqes;
-	struct iovec iovecs[DEPTH];
 	struct io_cq_ring cq_ring;
 	int inflight;
 	unsigned long reaps;
@@ -81,11 +79,15 @@ struct submitter {
 	struct file files[MAX_FDS];
 	unsigned nr_files;
 	unsigned cur_file;
+	struct iovec iovecs[];
 };
 
-static struct submitter submitters[1];
+static struct submitter *submitter;
 static volatile int finish;
 
+static int depth = DEPTH;
+static int batch_submit = BATCH_SUBMIT;
+static int batch_complete = BATCH_COMPLETE;
 static int polled = 1;		/* use IO polling */
 static int fixedbufs = 1;	/* use fixed user buffers */
 static int register_files = 1;	/* use fixed files */
@@ -100,7 +102,7 @@ static int io_uring_register_buffers(struct submitter *s)
 		return 0;
 
 	return syscall(__NR_sys_io_uring_register, s->ring_fd,
-			IORING_REGISTER_BUFFERS, s->iovecs, DEPTH);
+			IORING_REGISTER_BUFFERS, s->iovecs, depth);
 }
 
 static int io_uring_register_files(struct submitter *s)
@@ -139,7 +141,7 @@ static int gettid(void)
 
 static unsigned file_depth(struct submitter *s)
 {
-	return (DEPTH + s->nr_files - 1) / s->nr_files;
+	return (depth + s->nr_files - 1) / s->nr_files;
 }
 
 static void init_io(struct submitter *s, unsigned index)
@@ -295,18 +297,18 @@ static void *submitter_fn(void *data)
 	do {
 		int to_wait, to_submit, this_reap, to_prep;
 
-		if (!prepped && s->inflight < DEPTH) {
-			to_prep = min(DEPTH - s->inflight, BATCH_SUBMIT);
+		if (!prepped && s->inflight < depth) {
+			to_prep = min(depth - s->inflight, batch_submit);
 			prepped = prep_more_ios(s, to_prep);
 		}
 		s->inflight += prepped;
 submit_more:
 		to_submit = prepped;
 submit:
-		if (to_submit && (s->inflight + to_submit <= DEPTH))
+		if (to_submit && (s->inflight + to_submit <= depth))
 			to_wait = 0;
 		else
-			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
+			to_wait = min(s->inflight + to_submit, batch_complete);
 
 		/*
 		 * Only need to call io_uring_enter if we're not using SQ thread
@@ -377,7 +379,7 @@ submit:
 static void sig_int(int sig)
 {
 	printf("Exiting on signal %d\n", sig);
-	submitters[0].finish = 1;
+	submitter->finish = 1;
 	finish = 1;
 }
 
@@ -411,7 +413,7 @@ static int setup_ring(struct submitter *s)
 		}
 	}
 
-	fd = io_uring_setup(DEPTH, &p);
+	fd = io_uring_setup(depth, &p);
 	if (fd < 0) {
 		perror("io_uring_setup");
 		return 1;
@@ -466,7 +468,7 @@ static int setup_ring(struct submitter *s)
 
 static void file_depths(char *buf)
 {
-	struct submitter *s = &submitters[0];
+	struct submitter *s = submitter;
 	char *p;
 	int i;
 
@@ -482,24 +484,56 @@ static void file_depths(char *buf)
 	}
 }
 
+static void usage(char *argv)
+{
+	printf("%s [options] -- [filenames]\n"
+		" -d <int> : IO Depth, default %d\n"
+		" -s <int> : Batch submit, default %d\n"
+		" -c <int> : Batch complete, default %d\n",
+		argv, DEPTH, BATCH_SUBMIT, BATCH_COMPLETE);
+	exit(0);
+}
+
 int main(int argc, char *argv[])
 {
-	struct submitter *s = &submitters[0];
+	struct submitter *s;
 	unsigned long done, calls, reap, cache_hit, cache_miss;
-	int err, i, flags, fd;
+	int err, i, flags, fd, opt;
 	char *fdepths;
 	void *ret;
 
 	if (!do_nop && argc < 2) {
-		printf("%s: filename\n", argv[0]);
+		printf("%s: filename [options]\n", argv[0]);
 		return 1;
 	}
 
+	while ((opt = getopt(argc, argv, "d:s:c:h?")) != -1) {
+		switch (opt) {
+		case 'd':
+			depth = atoi(optarg);
+			break;
+		case 's':
+			batch_submit = atoi(optarg);
+			break;
+		case 'c':
+			batch_complete = atoi(optarg);
+			break;
+		case 'h':
+		case '?':
+		default:
+			usage(argv[0]);
+			break;
+		}
+	}
+
+	submitter = malloc(sizeof(*submitter) * depth * sizeof(struct iovec));
+	s = submitter;
+
 	flags = O_RDONLY | O_NOATIME;
 	if (!buffered)
 		flags |= O_DIRECT;
 
-	i = 1;
+	i = optind;
 	while (!do_nop && i < argc) {
 		struct file *f;
 
@@ -543,7 +577,7 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	for (i = 0; i < DEPTH; i++) {
+	for (i = 0; i < depth; i++) {
 		void *buf;
 
 		if (posix_memalign(&buf, BS, BS)) {
@@ -560,7 +594,7 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 	printf("polled=%d, fixedbufs=%d, buffered=%d", polled, fixedbufs, buffered);
-	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", DEPTH, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
+	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", depth, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);
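
This commit turns the compile-time `DEPTH`, `BATCH_SUBMIT` and `BATCH_COMPLETE` constants into defaults for runtime variables set with `-d`, `-s` and `-c`. It also starts the filename loop at `optind` instead of 1. Note that the `malloc()` added here multiplies `sizeof(*submitter)` by the iovec size; the 2019-03-12 "t/io_uring: memset() allocated memory" change above replaces the `*` with `+`.

The sketch below shows the same getopt pattern with a hypothetical `struct qd_opts`. The commit uses `atoi()`, which silently turns malformed input into 0. The sketch instead validates each value with `strtol()`; that is an assumption about desirable behavior, not fio's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the three tunables the commit adds. */
struct qd_opts {
	int depth;
	int batch_submit;
	int batch_complete;
};

/* Parse a strictly positive int; -1 on malformed or out-of-range input. */
static int parse_pos_int(const char *s)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0' || v <= 0 || v > INT_MAX)
		return -1;
	return (int) v;
}

/*
 * Fill *o from argv, starting from the commit's defaults (128/32/32).
 * Returns the index of the first non-option argument (the first
 * filename), or -1 on an unknown option or a bad value.
 */
static int parse_qd_opts(int argc, char *argv[], struct qd_opts *o)
{
	int opt, v;

	o->depth = 128;
	o->batch_submit = 32;
	o->batch_complete = 32;

	while ((opt = getopt(argc, argv, "d:s:c:")) != -1) {
		if (opt == '?')		/* unknown option or missing value */
			return -1;
		v = parse_pos_int(optarg);
		if (v < 0)
			return -1;
		if (opt == 'd')
			o->depth = v;
		else if (opt == 's')
			o->batch_submit = v;
		else
			o->batch_complete = v;
	}
	return optind;
}
```

The return value plays the role that `optind` plays in the commit: it is the position where the filename loop should start.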
 


* Recent changes (master)
@ 2019-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2019-03-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit be26a9823261c15c6e737e2e6c8762423cf325b8:

  t/io_uring: stop when max number of files is reached (2019-03-06 08:24:38 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 44364a9937d01825557f4a7d78c6f153886e1115:

  engines/skeleton_external: update gcc incantation (2019-03-07 16:54:54 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/skeleton_external: update gcc incantation

 engines/skeleton_external.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 21a36018..1b6625b2 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -3,7 +3,7 @@
  *
  * Should be compiled with:
  *
- * gcc -Wall -O2 -g -shared -rdynamic -fPIC -o skeleton_external.o skeleton_external.c
+ * gcc -Wall -O2 -g -D_GNU_SOURCE -include ../config-host.h -shared -rdynamic -fPIC -o skeleton_external.o skeleton_external.c
  * (also requires -D_GNU_SOURCE -DCONFIG_STRSEP on Linux)
  *
  */


* Recent changes (master)
@ 2019-03-07 13:00 Jens Axboe
From: Jens Axboe @ 2019-03-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7508b3948a216106196cec33a3a707d6f32d84a8:

  engines/sg: ensure we flag EIO on the right io_u (2019-02-28 10:08:08 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to be26a9823261c15c6e737e2e6c8762423cf325b8:

  t/io_uring: stop when max number of files is reached (2019-03-06 08:24:38 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: stop when max number of files is reached

 t/io_uring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 7c75c887..c7139f87 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -501,13 +501,19 @@ int main(int argc, char *argv[])
 
 	i = 1;
 	while (!do_nop && i < argc) {
-		struct file *f = &s->files[s->nr_files];
+		struct file *f;
 
+		if (s->nr_files == MAX_FDS) {
+			printf("Max number of files (%d) reached\n", MAX_FDS);
+			break;
+		}
 		fd = open(argv[i], flags);
 		if (fd < 0) {
 			perror("open");
 			return 1;
 		}
+
+		f = &s->files[s->nr_files];
 		f->real_fd = fd;
 		if (get_file_size(f)) {
 			printf("failed getting size of device/file\n");
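
The fix checks the table's capacity before opening the next file. The loop therefore stops at `MAX_FDS` instead of taking `&s->files[MAX_FDS]`, one past the end. It also takes the slot pointer only after `open()` succeeds.

The sketch below shows the check-before-use loop. It uses a hypothetical fixed-size `struct file_table` with a hypothetical `MAX_FILES` capacity, and it stores names instead of opening files.

```c
#include <stdio.h>

#define MAX_FILES 4	/* hypothetical capacity, standing in for MAX_FDS */

struct file_table {
	const char *names[MAX_FILES];
	unsigned int nr;
};

/*
 * Add names[0..count) until the table is full. The capacity check runs
 * before a slot is touched, so nothing past names[MAX_FILES - 1] is
 * ever written. Returns how many names were added.
 */
static unsigned int table_add_all(struct file_table *t,
				  const char *const *names, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (t->nr == MAX_FILES) {
			printf("Max number of files (%d) reached\n", MAX_FILES);
			break;
		}
		t->names[t->nr++] = names[i];
	}
	return i;
}
```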


* Recent changes (master)
@ 2019-03-01 13:00 Jens Axboe
From: Jens Axboe @ 2019-03-01 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 66a8a1bf98d714e013ee329dc975f3b6b552de6d:

  engines/sg: kill dead function (2019-02-24 08:13:39 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7508b3948a216106196cec33a3a707d6f32d84a8:

  engines/sg: ensure we flag EIO on the right io_u (2019-02-28 10:08:08 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      engines/sg: ensure we complete the right command for sync IO
      engines/sg: ensure we flag EIO on the right io_u

 engines/sg.c | 43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sg.c b/engines/sg.c
index bf437c8d..c46b9aba 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -424,7 +424,8 @@ static enum fio_q_status fio_sgio_ioctl_doio(struct thread_data *td,
 	return FIO_Q_COMPLETED;
 }
 
-static enum fio_q_status fio_sgio_rw_doio(struct fio_file *f,
+static enum fio_q_status fio_sgio_rw_doio(struct thread_data *td,
+					  struct fio_file *f,
 					  struct io_u *io_u, int do_sync)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
@@ -435,13 +436,32 @@ static enum fio_q_status fio_sgio_rw_doio(struct fio_file *f,
 		return ret;
 
 	if (do_sync) {
-		ret = read(f->fd, hdr, sizeof(*hdr));
-		if (ret < 0)
-			return ret;
+		/*
+		 * We can't just read back the first command that completes
+		 * and assume it's the one we need, it could be any command
+		 * that is inflight.
+		 */
+		do {
+			struct io_u *__io_u;
 
-		/* record if an io error occurred */
-		if (hdr->info & SG_INFO_CHECK)
-			io_u->error = EIO;
+			ret = read(f->fd, hdr, sizeof(*hdr));
+			if (ret < 0)
+				return ret;
+
+			__io_u = hdr->usr_ptr;
+
+			/* record if an io error occurred */
+			if (hdr->info & SG_INFO_CHECK)
+				__io_u->error = EIO;
+
+			if (__io_u == io_u)
+				break;
+
+			if (io_u_sync_complete(td, __io_u)) {
+				ret = -1;
+				break;
+			}
+		} while (1);
 
 		return FIO_Q_COMPLETED;
 	}
@@ -457,10 +477,11 @@ static enum fio_q_status fio_sgio_doio(struct thread_data *td,
 
 	if (f->filetype == FIO_TYPE_BLOCK) {
 		ret = fio_sgio_ioctl_doio(td, f, io_u);
-		td_verror(td, io_u->error, __func__);
+		if (io_u->error)
+			td_verror(td, io_u->error, __func__);
 	} else {
-		ret = fio_sgio_rw_doio(f, io_u, do_sync);
-		if (do_sync)
+		ret = fio_sgio_rw_doio(td, f, io_u, do_sync);
+		if (io_u->error && do_sync)
 			td_verror(td, io_u->error, __func__);
 	}
 
@@ -678,7 +699,7 @@ static int fio_sgio_commit(struct thread_data *td)
 
 	sd->current_queue = -1;
 
-	ret = fio_sgio_rw_doio(io_u->file, io_u, 0);
+	ret = fio_sgio_rw_doio(td, io_u->file, io_u, 0);
 
 	if (ret < 0 || hdr->status) {
 		int error;
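
The sync path in fio_sgio_rw_doio() now keeps reading sg_io_hdr completions until `hdr->usr_ptr` matches the io_u it submitted. Along the way it does two things:

- It completes any other in-flight io_u it reads.
- It charges SG_INFO_CHECK errors to the io_u that actually completed, not to the one being waited for.

The sketch below models only that matching loop, using hypothetical stand-ins:

- `struct req` stands in for io_u.
- `struct completion` stands in for sg_io_hdr.
- `cq_read()` stands in for read() on the sg device.
- Setting `completed` stands in for io_u_sync_complete().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_u. */
struct req {
	int id;
	int error;
	bool completed;
};

/* Hypothetical stand-in for a returned sg_io_hdr. */
struct completion {
	struct req *usr_ptr; /* which request this completion belongs to */
	bool failed;         /* models hdr->info & SG_INFO_CHECK */
};

/* Completions come back in device order, not submission order. */
struct cq {
	struct completion *done;
	size_t nr;
	size_t head;
};

static struct completion *cq_read(struct cq *q)
{
	if (q->head == q->nr)
		return NULL; /* nothing left: treated as an error below */
	return &q->done[q->head++];
}

/*
 * Wait for 'want' specifically. Other requests that finish first are
 * completed instead of being mistaken for 'want'. Returns 0 once 'want'
 * is seen, -1 if the queue runs dry first.
 */
static int wait_for(struct cq *q, struct req *want)
{
	for (;;) {
		struct completion *c = cq_read(q);

		if (!c)
			return -1;
		/* Attribute the error to the request that really finished. */
		if (c->failed)
			c->usr_ptr->error = EIO;
		c->usr_ptr->completed = true;
		if (c->usr_ptr == want)
			return 0;
	}
}
```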


* Recent changes (master)
@ 2019-02-25 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-25 13:00 UTC
  To: fio

The following changes since commit bc596cbcdbb58b81da53a29acf1370d8a7e94429:

  t/zbd: Add multi-job libaio test (2019-02-23 21:19:01 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 66a8a1bf98d714e013ee329dc975f3b6b552de6d:

  engines/sg: kill dead function (2019-02-24 08:13:39 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/sg: kill dead function

 engines/sg.c | 5 -----
 1 file changed, 5 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sg.c b/engines/sg.c
index d681ac93..bf437c8d 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -157,11 +157,6 @@ struct sgio_data {
 #endif
 };
 
-static inline uint16_t sgio_get_be16(uint8_t *buf)
-{
-	return be16_to_cpu(*((uint16_t *) buf));
-}
-
 static inline uint32_t sgio_get_be32(uint8_t *buf)
 {
 	return be32_to_cpu(*((uint32_t *) buf));


* Recent changes (master)
@ 2019-02-24 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-24 13:00 UTC
  To: fio

The following changes since commit 3a294b8704a4125f12e3c3dec36667e68d821be0:

  options: catch division by zero in setting CPU affinity (2019-02-21 10:55:32 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bc596cbcdbb58b81da53a29acf1370d8a7e94429:

  t/zbd: Add multi-job libaio test (2019-02-23 21:19:01 -0700)

----------------------------------------------------------------
Damien Le Moal (4):
      t/zbd: Fix test 2 and 3 result handling
      zbd: Fix zone locking for async I/O engines
      zbd: Avoid async I/O multi-job workload deadlock
      t/zbd: Add multi-job libaio test

Dmitry Fomichev (2):
      sg: Avoid READ CAPACITY failures
      sg: Clean up handling of big endian data fields

Shin'ichiro Kawasaki (3):
      zbd: Fix partition block device handling
      t/zbd: Fix handling of partition devices
      t/zbd: Default to using blkzone tool

 engines/sg.c           | 127 ++++++++++++++++++++++++++----------------------
 io_u.c                 |  10 +---
 io_u.h                 |  17 +++++--
 ioengines.c            |   6 +--
 os/os.h                |  24 +++++++++
 t/zbd/functions        |  25 +++++++---
 t/zbd/test-zbd-support |  34 ++++++++++---
 zbd.c                  | 129 +++++++++++++++++++++++++++++++++++++++++--------
 zbd.h                  |  22 +++++++++
 9 files changed, 289 insertions(+), 105 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sg.c b/engines/sg.c
index 3cc068f3..d681ac93 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -137,7 +137,7 @@ struct sgio_cmd {
 };
 
 struct sgio_trim {
-	char *unmap_param;
+	uint8_t *unmap_param;
 	unsigned int unmap_range_count;
 	struct io_u **trim_io_us;
 };
@@ -157,6 +157,42 @@ struct sgio_data {
 #endif
 };
 
+static inline uint16_t sgio_get_be16(uint8_t *buf)
+{
+	return be16_to_cpu(*((uint16_t *) buf));
+}
+
+static inline uint32_t sgio_get_be32(uint8_t *buf)
+{
+	return be32_to_cpu(*((uint32_t *) buf));
+}
+
+static inline uint64_t sgio_get_be64(uint8_t *buf)
+{
+	return be64_to_cpu(*((uint64_t *) buf));
+}
+
+static inline void sgio_set_be16(uint16_t val, uint8_t *buf)
+{
+	uint16_t t = cpu_to_be16(val);
+
+	memcpy(buf, &t, sizeof(uint16_t));
+}
+
+static inline void sgio_set_be32(uint32_t val, uint8_t *buf)
+{
+	uint32_t t = cpu_to_be32(val);
+
+	memcpy(buf, &t, sizeof(uint32_t));
+}
+
+static inline void sgio_set_be64(uint64_t val, uint8_t *buf)
+{
+	uint64_t t = cpu_to_be64(val);
+
+	memcpy(buf, &t, sizeof(uint64_t));
+}
+
 static inline bool sgio_unbuffered(struct thread_data *td)
 {
 	return (td->o.odirect || td->o.sync_io);
@@ -440,25 +476,11 @@ static void fio_sgio_rw_lba(struct sg_io_hdr *hdr, unsigned long long lba,
 			    unsigned long long nr_blocks)
 {
 	if (lba < MAX_10B_LBA) {
-		hdr->cmdp[2] = (unsigned char) ((lba >> 24) & 0xff);
-		hdr->cmdp[3] = (unsigned char) ((lba >> 16) & 0xff);
-		hdr->cmdp[4] = (unsigned char) ((lba >>  8) & 0xff);
-		hdr->cmdp[5] = (unsigned char) (lba & 0xff);
-		hdr->cmdp[7] = (unsigned char) ((nr_blocks >> 8) & 0xff);
-		hdr->cmdp[8] = (unsigned char) (nr_blocks & 0xff);
+		sgio_set_be32((uint32_t) lba, &hdr->cmdp[2]);
+		sgio_set_be16((uint16_t) nr_blocks, &hdr->cmdp[7]);
 	} else {
-		hdr->cmdp[2] = (unsigned char) ((lba >> 56) & 0xff);
-		hdr->cmdp[3] = (unsigned char) ((lba >> 48) & 0xff);
-		hdr->cmdp[4] = (unsigned char) ((lba >> 40) & 0xff);
-		hdr->cmdp[5] = (unsigned char) ((lba >> 32) & 0xff);
-		hdr->cmdp[6] = (unsigned char) ((lba >> 24) & 0xff);
-		hdr->cmdp[7] = (unsigned char) ((lba >> 16) & 0xff);
-		hdr->cmdp[8] = (unsigned char) ((lba >>  8) & 0xff);
-		hdr->cmdp[9] = (unsigned char) (lba & 0xff);
-		hdr->cmdp[10] = (unsigned char) ((nr_blocks >> 32) & 0xff);
-		hdr->cmdp[11] = (unsigned char) ((nr_blocks >> 16) & 0xff);
-		hdr->cmdp[12] = (unsigned char) ((nr_blocks >> 8) & 0xff);
-		hdr->cmdp[13] = (unsigned char) (nr_blocks & 0xff);
+		sgio_set_be64(lba, &hdr->cmdp[2]);
+		sgio_set_be32((uint32_t) nr_blocks, &hdr->cmdp[10]);
 	}
 
 	return;
@@ -552,18 +574,8 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 #endif
 
 		offset = 8 + 16 * st->unmap_range_count;
-		st->unmap_param[offset] = (unsigned char) ((lba >> 56) & 0xff);
-		st->unmap_param[offset+1] = (unsigned char) ((lba >> 48) & 0xff);
-		st->unmap_param[offset+2] = (unsigned char) ((lba >> 40) & 0xff);
-		st->unmap_param[offset+3] = (unsigned char) ((lba >> 32) & 0xff);
-		st->unmap_param[offset+4] = (unsigned char) ((lba >> 24) & 0xff);
-		st->unmap_param[offset+5] = (unsigned char) ((lba >> 16) & 0xff);
-		st->unmap_param[offset+6] = (unsigned char) ((lba >>  8) & 0xff);
-		st->unmap_param[offset+7] = (unsigned char) (lba & 0xff);
-		st->unmap_param[offset+8] = (unsigned char) ((nr_blocks >> 32) & 0xff);
-		st->unmap_param[offset+9] = (unsigned char) ((nr_blocks >> 16) & 0xff);
-		st->unmap_param[offset+10] = (unsigned char) ((nr_blocks >> 8) & 0xff);
-		st->unmap_param[offset+11] = (unsigned char) (nr_blocks & 0xff);
+		sgio_set_be64(lba, &st->unmap_param[offset]);
+		sgio_set_be32((uint32_t) nr_blocks, &st->unmap_param[offset + 8]);
 
 		st->unmap_range_count++;
 
@@ -582,14 +594,12 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 
 static void fio_sgio_unmap_setup(struct sg_io_hdr *hdr, struct sgio_trim *st)
 {
-	hdr->dxfer_len = st->unmap_range_count * 16 + 8;
-	hdr->cmdp[7] = (unsigned char) (((st->unmap_range_count * 16 + 8) >> 8) & 0xff);
-	hdr->cmdp[8] = (unsigned char) ((st->unmap_range_count * 16 + 8) & 0xff);
+	uint16_t cnt = st->unmap_range_count * 16;
 
-	st->unmap_param[0] = (unsigned char) (((16 * st->unmap_range_count + 6) >> 8) & 0xff);
-	st->unmap_param[1] = (unsigned char)  ((16 * st->unmap_range_count + 6) & 0xff);
-	st->unmap_param[2] = (unsigned char) (((16 * st->unmap_range_count) >> 8) & 0xff);
-	st->unmap_param[3] = (unsigned char)  ((16 * st->unmap_range_count) & 0xff);
+	hdr->dxfer_len = cnt + 8;
+	sgio_set_be16(cnt + 8, &hdr->cmdp[7]);
+	sgio_set_be16(cnt + 6, st->unmap_param);
+	sgio_set_be16(cnt, &st->unmap_param[2]);
 
 	return;
 }
@@ -723,6 +733,8 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 	 * io_u structures, which are not initialized until later.
 	 */
 	struct sg_io_hdr hdr;
+	unsigned long long hlba;
+	unsigned int blksz = 0;
 	unsigned char cmd[16];
 	unsigned char sb[64];
 	unsigned char buf[32];  // read capacity return
@@ -759,23 +771,23 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 		return ret;
 	}
 
-	*bs	 = ((unsigned long) buf[4] << 24) | ((unsigned long) buf[5] << 16) |
-		   ((unsigned long) buf[6] << 8) | (unsigned long) buf[7];
-	*max_lba = ((unsigned long) buf[0] << 24) | ((unsigned long) buf[1] << 16) |
-		   ((unsigned long) buf[2] << 8) | (unsigned long) buf[3];
+	if (hdr.info & SG_INFO_CHECK) {
+		/* RCAP(10) might be unsupported by device. Force RCAP(16) */
+		hlba = MAX_10B_LBA;
+	} else {
+		blksz = sgio_get_be32(&buf[4]);
+		hlba = sgio_get_be32(buf);
+	}
 
 	/*
 	 * If max lba masked by MAX_10B_LBA equals MAX_10B_LBA,
 	 * then need to retry with 16 byte Read Capacity command.
 	 */
-	if (*max_lba == MAX_10B_LBA) {
+	if (hlba == MAX_10B_LBA) {
 		hdr.cmd_len = 16;
 		hdr.cmdp[0] = 0x9e; // service action
 		hdr.cmdp[1] = 0x10; // Read Capacity(16)
-		hdr.cmdp[10] = (unsigned char) ((sizeof(buf) >> 24) & 0xff);
-		hdr.cmdp[11] = (unsigned char) ((sizeof(buf) >> 16) & 0xff);
-		hdr.cmdp[12] = (unsigned char) ((sizeof(buf) >> 8) & 0xff);
-		hdr.cmdp[13] = (unsigned char) (sizeof(buf) & 0xff);
+		sgio_set_be32(sizeof(buf), &hdr.cmdp[10]);
 
 		hdr.dxfer_direction = SG_DXFER_FROM_DEV;
 		hdr.dxferp = buf;
@@ -791,19 +803,20 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 		if (hdr.info & SG_INFO_CHECK)
 			td_verror(td, EIO, "fio_sgio_read_capacity");
 
-		*bs = (buf[8] << 24) | (buf[9] << 16) | (buf[10] << 8) | buf[11];
-		*max_lba = ((unsigned long long)buf[0] << 56) |
-				((unsigned long long)buf[1] << 48) |
-				((unsigned long long)buf[2] << 40) |
-				((unsigned long long)buf[3] << 32) |
-				((unsigned long long)buf[4] << 24) |
-				((unsigned long long)buf[5] << 16) |
-				((unsigned long long)buf[6] << 8) |
-				(unsigned long long)buf[7];
+		blksz = sgio_get_be32(&buf[8]);
+		hlba = sgio_get_be64(buf);
+	}
+
+	if (blksz) {
+		*bs = blksz;
+		*max_lba = hlba;
+		ret = 0;
+	} else {
+		ret = EIO;
 	}
 
 	close(fd);
-	return 0;
+	return ret;
 }
 
 static void fio_sgio_cleanup(struct thread_data *td)
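
The sg.c changes replace hand-rolled byte shifting with sgio_get_be*()/sgio_set_be*() helpers and restructure READ CAPACITY handling:

- RC(10) reports the maximum LBA in bytes 0-3 and the block length in bytes 4-7.
- A maximum LBA equal to MAX_10B_LBA, or a CHECK condition, forces a READ CAPACITY(16). Its data carries a 64-bit LBA in bytes 0-7 and the block length in bytes 8-11.
- A zero block length now yields EIO.

Below is a sketch of the same field layouts. It swaps fio's memcpy()-plus-cpu_to_be*() helpers for plain byte shifts, which are endian-independent without any byte-order macros. It also assumes MAX_10B_LBA is 0xFFFFFFFF. All names here are illustrative.

```c
#include <stdint.h>

/* Assumption: fio's MAX_10B_LBA is the largest LBA a 10-byte CDB can carry. */
#define MAX_10B_LBA 0xFFFFFFFFULL

/* Store big-endian values byte by byte (works on any host byte order). */
static void put_be16(uint16_t v, uint8_t *b)
{
	b[0] = (uint8_t)(v >> 8);
	b[1] = (uint8_t)v;
}

static void put_be32(uint32_t v, uint8_t *b)
{
	put_be16((uint16_t)(v >> 16), b);
	put_be16((uint16_t)v, b + 2);
}

static void put_be64(uint64_t v, uint8_t *b)
{
	put_be32((uint32_t)(v >> 32), b);
	put_be32((uint32_t)v, b + 4);
}

static uint32_t get_be32(const uint8_t *b)
{
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | (uint32_t)b[3];
}

static uint64_t get_be64(const uint8_t *b)
{
	return (uint64_t)get_be32(b) << 32 | get_be32(b + 4);
}

/* Field placement used by fio_sgio_rw_lba(): 10-byte vs 16-byte CDB. */
static void set_rw_lba(uint8_t *cdb, uint64_t lba, uint64_t nr_blocks)
{
	if (lba < MAX_10B_LBA) {
		put_be32((uint32_t)lba, &cdb[2]);        /* LBA: bytes 2-5 */
		put_be16((uint16_t)nr_blocks, &cdb[7]);  /* length: bytes 7-8 */
	} else {
		put_be64(lba, &cdb[2]);                  /* LBA: bytes 2-9 */
		put_be32((uint32_t)nr_blocks, &cdb[10]); /* length: bytes 10-13 */
	}
}

/* READ CAPACITY(10) data; returns 1 if READ CAPACITY(16) is needed. */
static int parse_rcap10(const uint8_t *buf, uint64_t *max_lba, uint32_t *bs)
{
	*max_lba = get_be32(buf);
	*bs = get_be32(&buf[4]);
	return *max_lba == MAX_10B_LBA;
}

/* READ CAPACITY(16) data: 64-bit max LBA, then 32-bit block length. */
static void parse_rcap16(const uint8_t *buf, uint64_t *max_lba, uint32_t *bs)
{
	*max_lba = get_be64(buf);
	*bs = get_be32(&buf[8]);
}
```

Calling parse_rcap10() first, then falling back to parse_rcap16() when it returns 1, roughly mirrors the control flow of fio_sgio_read_capacity(), minus the ioctl plumbing.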
diff --git a/io_u.c b/io_u.c
index bee99c37..910b7deb 100644
--- a/io_u.c
+++ b/io_u.c
@@ -775,10 +775,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	const bool needs_lock = td_async_processing(td);
 
-	if (io_u->post_submit) {
-		io_u->post_submit(io_u, io_u->error == 0);
-		io_u->post_submit = NULL;
-	}
+	zbd_put_io_u(io_u);
 
 	if (td->parent)
 		td = td->parent;
@@ -1340,10 +1337,7 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		if (!fill_io_u(td, io_u))
 			break;
 
-		if (io_u->post_submit) {
-			io_u->post_submit(io_u, false);
-			io_u->post_submit = NULL;
-		}
+		zbd_put_io_u(io_u);
 
 		put_file_log(td, f);
 		td_io_close_file(td, f);
diff --git a/io_u.h b/io_u.h
index 97270c94..e75993bd 100644
--- a/io_u.h
+++ b/io_u.h
@@ -92,11 +92,22 @@ struct io_u {
 		struct workqueue_work work;
 	};
 
+#ifdef CONFIG_LINUX_BLKZONED
 	/*
-	 * Post-submit callback. Used by the ZBD code. @success == true means
-	 * that the I/O operation has been queued or completed successfully.
+	 * ZBD mode zbd_queue_io callback: called after engine->queue operation
+	 * to advance a zone write pointer and eventually unlock the I/O zone.
+	 * @q indicates the I/O queue status (busy, queued or completed).
+	 * @success == true means that the I/O operation has been queued or
+	 * completed successfully.
 	 */
-	void (*post_submit)(const struct io_u *, bool success);
+	void (*zbd_queue_io)(struct io_u *, int q, bool success);
+
+	/*
+	 * ZBD mode zbd_put_io callback: called in after completion of an I/O
+	 * or commit of an async I/O to unlock the I/O target zone.
+	 */
+	void (*zbd_put_io)(const struct io_u *);
+#endif
 
 	/*
 	 * Callback for io completion
diff --git a/ioengines.c b/ioengines.c
index 45e769e6..7e5a50cc 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -329,10 +329,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	ret = td->io_ops->queue(td, io_u);
-	if (ret != FIO_Q_BUSY && io_u->post_submit) {
-		io_u->post_submit(io_u, io_u->error == 0);
-		io_u->post_submit = NULL;
-	}
+	zbd_queue_io_u(io_u, ret);
 
 	unlock_file(td, io_u->file);
 
@@ -374,6 +371,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
+		zbd_put_io_u(io_u);
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
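
This series splits the single post_submit callback in two:

- zbd_queue_io() runs right after the engine's ->queue(). It unlocks the zone unless the I/O was successfully queued.
- zbd_put_io() unlocks the zone later, when a queued I/O completes or is committed.

The sketch below models only that lock-lifetime rule with a pthread mutex. `struct zone`, `struct unit` and `enum q_status` are hypothetical simplifications of fio's zone info, io_u and enum fio_q_status. Write-pointer updates are omitted.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mirror of fio's enum fio_q_status. */
enum q_status { Q_COMPLETED, Q_QUEUED, Q_BUSY };

struct zone {
	pthread_mutex_t mutex;
};

/* Hypothetical io_u stand-in: the zone it locked, and whether the
 * deferred unlock is still owed (the role of io_u->zbd_put_io). */
struct unit {
	struct zone *z;
	bool put_pending;
};

/* Lock the target zone and arm the deferred unlock (cf. zbd_adjust_block()). */
static void lock_unit(struct unit *u, struct zone *z)
{
	pthread_mutex_lock(&z->mutex);
	u->z = z;
	u->put_pending = true;
}

/* Runs right after the engine queue hook (cf. zbd_queue_io()). */
static void after_queue(struct unit *u, enum q_status q, bool success)
{
	if (!success || q != Q_QUEUED) {
		/* BUSY, COMPLETED or failed: nothing in flight, unlock now. */
		pthread_mutex_unlock(&u->z->mutex);
		u->put_pending = false;
	}
	/* QUEUED: keep the zone locked until the I/O is reaped. */
}

/* Runs on completion or commit (cf. zbd_put_io_u()). */
static void put_unit(struct unit *u)
{
	if (u->put_pending) {
		pthread_mutex_unlock(&u->z->mutex);
		u->put_pending = false;
	}
}
```

The `put_pending` flag plays the part of clearing `io_u->zbd_put_io`. Once a BUSY or COMPLETED status has released the lock, the later put becomes a no-op, so the zone is never unlocked twice.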
diff --git a/os/os.h b/os/os.h
index 0b182c4a..36b6bb2e 100644
--- a/os/os.h
+++ b/os/os.h
@@ -210,19 +210,27 @@ static inline uint64_t fio_swap64(uint64_t val)
 
 #ifndef FIO_HAVE_BYTEORDER_FUNCS
 #ifdef CONFIG_LITTLE_ENDIAN
+#define __be16_to_cpu(x)		fio_swap16(x)
+#define __be32_to_cpu(x)		fio_swap32(x)
 #define __be64_to_cpu(x)		fio_swap64(x)
 #define __le16_to_cpu(x)		(x)
 #define __le32_to_cpu(x)		(x)
 #define __le64_to_cpu(x)		(x)
+#define __cpu_to_be16(x)		fio_swap16(x)
+#define __cpu_to_be32(x)		fio_swap32(x)
 #define __cpu_to_be64(x)		fio_swap64(x)
 #define __cpu_to_le16(x)		(x)
 #define __cpu_to_le32(x)		(x)
 #define __cpu_to_le64(x)		(x)
 #else
+#define __be16_to_cpu(x)		(x)
+#define __be32_to_cpu(x)		(x)
 #define __be64_to_cpu(x)		(x)
 #define __le16_to_cpu(x)		fio_swap16(x)
 #define __le32_to_cpu(x)		fio_swap32(x)
 #define __le64_to_cpu(x)		fio_swap64(x)
+#define __cpu_to_be16(x)		(x)
+#define __cpu_to_be32(x)		(x)
 #define __cpu_to_be64(x)		(x)
 #define __cpu_to_le16(x)		fio_swap16(x)
 #define __cpu_to_le32(x)		fio_swap32(x)
@@ -231,6 +239,14 @@ static inline uint64_t fio_swap64(uint64_t val)
 #endif /* FIO_HAVE_BYTEORDER_FUNCS */
 
 #ifdef FIO_INTERNAL
+#define be16_to_cpu(val) ({			\
+	typecheck(uint16_t, val);		\
+	__be16_to_cpu(val);			\
+})
+#define be32_to_cpu(val) ({			\
+	typecheck(uint32_t, val);		\
+	__be32_to_cpu(val);			\
+})
 #define be64_to_cpu(val) ({			\
 	typecheck(uint64_t, val);		\
 	__be64_to_cpu(val);			\
@@ -249,6 +265,14 @@ static inline uint64_t fio_swap64(uint64_t val)
 })
 #endif
 
+#define cpu_to_be16(val) ({			\
+	typecheck(uint16_t, val);		\
+	__cpu_to_be16(val);			\
+})
+#define cpu_to_be32(val) ({			\
+	typecheck(uint32_t, val);		\
+	__cpu_to_be32(val);			\
+})
 #define cpu_to_be64(val) ({			\
 	typecheck(uint64_t, val);		\
 	__cpu_to_be64(val);			\
diff --git a/t/zbd/functions b/t/zbd/functions
index 173f0ca6..d49555a8 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -1,8 +1,7 @@
 #!/bin/bash
 
-# To do: switch to blkzone once blkzone reset works correctly.
-blkzone=
-#blkzone=$(type -p blkzone 2>/dev/null)
+blkzone=$(type -p blkzone 2>/dev/null)
+sg_inq=$(type -p sg_inq 2>/dev/null)
 zbc_report_zones=$(type -p zbc_report_zones 2>/dev/null)
 zbc_reset_zone=$(type -p zbc_reset_zone 2>/dev/null)
 if [ -z "${blkzone}" ] &&
@@ -34,9 +33,23 @@ first_sequential_zone() {
 max_open_zones() {
     local dev=$1
 
-    if [ -n "${blkzone}" ]; then
-	# To do: query the maximum number of open zones using sg_raw
-	return 1
+    if [ -n "${sg_inq}" ]; then
+	if ! ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" 2> /dev/null; then
+	    # Non scsi device such as null_blk can not return max open zones.
+	    # Use default value.
+	    echo 128
+	else
+	    ${sg_inq} -e --page=0xB6 --len=20 --hex "$dev" | tail -1 |
+		{
+		    read -r offset b0 b1 b2 b3 trailer || return $?
+		    # Convert from hex to decimal
+		    max_nr_open_zones=$((0x${b0}))
+		    max_nr_open_zones=$((max_nr_open_zones * 256 + 0x${b1}))
+		    max_nr_open_zones=$((max_nr_open_zones * 256 + 0x${b2}))
+		    max_nr_open_zones=$((max_nr_open_zones * 256 + 0x${b3}))
+		    echo ${max_nr_open_zones}
+		}
+	fi
     else
 	${zbc_report_zones} "$dev" |
 	    sed -n 's/^[[:blank:]]*Maximum number of open sequential write required zones:[[:blank:]]*//p'
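
max_open_zones() now reads the last line of `sg_inq --hex` output. It folds the four bytes after the offset into one integer, most significant byte first (`n = n * 256 + 0xBYTE`). When sg_inq fails, for example on null_blk, it falls back to 128.

The same fold in C is sketched below. The function name is hypothetical, and it assumes the input has already been split into tokens.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Combine four hex byte tokens, most significant first, into a 32-bit
 * value using the script's arithmetic: n = n * 256 + byte.
 * Returns 0 and stores the result on success, -1 on a malformed token.
 */
static int be_hex_bytes_to_u32(const char *const tok[4], uint32_t *out)
{
	uint32_t n = 0;
	int i;

	for (i = 0; i < 4; i++) {
		char *end;
		unsigned long b = strtoul(tok[i], &end, 16);

		if (end == tok[i] || *end != '\0' || b > 0xff)
			return -1;
		n = n * 256 + (uint32_t)b;
	}
	*out = n;
	return 0;
}
```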
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 2d727910..10c78e9a 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -141,9 +141,8 @@ test2() {
     if [ -z "$is_zbd" ]; then
 	opts+=("--zonesize=${zone_size}")
     fi
-    run_fio "${opts[@]}" 2>&1 |
-	tee -a "${logfile}.${test_number}" |
-	grep -q 'No I/O performed'
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+    ! grep -q 'WRITE:' "${logfile}.${test_number}"
 }
 
 # Run fio against an empty zone. This causes fio to report "No I/O performed".
@@ -160,12 +159,12 @@ test3() {
 	opts+=("--zonesize=${zone_size}")
     fi
     run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
-    grep -q "No I/O performed" "${logfile}.${test_number}"
+    grep -q 'READ:' "${logfile}.${test_number}"
     rc=$?
     if [ -n "$is_zbd" ]; then
-	[ $rc = 0 ]
-    else
 	[ $rc != 0 ]
+    else
+	[ $rc = 0 ]
     fi
 }
 
@@ -731,6 +730,17 @@ test45() {
 	grep -q "fio: first I/O failed. If .* is a zoned block device, consider --zonemode=zbd"
 }
 
+# Random write to sequential zones, libaio, 8 jobs, queue depth 64 per job
+test46() {
+    local size
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=4K \
+		   --group_reporting=1 --numjobs=8 \
+		   >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((size * 8)) || return $?
+}
+
 tests=()
 dynamic_analyzer=()
 reset_all_zones=
@@ -761,7 +771,15 @@ source "$(dirname "$0")/functions" || exit $?
 dev=$1
 realdev=$(readlink -f "$dev")
 basename=$(basename "$realdev")
-disk_size=$(($(<"/sys/block/$basename/size")*512))
+major=$((0x$(stat -L -c '%t' "$realdev")))
+minor=$((0x$(stat -L -c '%T' "$realdev")))
+disk_size=$(($(<"/sys/dev/block/$major:$minor/size")*512))
+# When the target is a partition device, get basename of its holder device to
+# access sysfs path of the holder device
+if [[ -r "/sys/dev/block/$major:$minor/partition" ]]; then
+	realsysfs=$(readlink "/sys/dev/block/$major:$minor")
+	basename=$(basename "${realsysfs%/*}")
+fi
 logical_block_size=$(<"/sys/block/$basename/queue/logical_block_size")
 case "$(<"/sys/class/block/$basename/queue/zoned")" in
     host-managed|host-aware)
@@ -794,7 +812,7 @@ case "$(<"/sys/class/block/$basename/queue/zoned")" in
 esac
 
 if [ "${#tests[@]}" = 0 ]; then
-    for ((i=1;i<=45;i++)); do
+    for ((i=1;i<=46;i++)); do
 	tests+=("$i")
     done
 fi
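
The test-zbd-support hunk above and get_zbd_model() in the zbd.c diff that follows handle partitions the same way:

1. Resolve /sys/dev/block/MAJ:MIN to its canonical sysfs path.
2. For a partition, drop the last path component, so that queue/zoned is read from the holding disk (for example, .../sda/sda1 becomes .../sda).

Below is a sketch of the trimming step. It assumes the caller has already decided whether the device is a partition; the script does this by testing whether the `partition` attribute is readable. The function name and example paths are hypothetical.

```c
#include <string.h>

/*
 * path: canonical sysfs path of a block device, modified in place.
 * For a partition, cut the last component to get the holder's path.
 * Returns 0 on success, -1 if a partition path has no '/' to cut at.
 */
static int holder_sysfs_path(char *path, int is_partition)
{
	char *delim;

	if (!is_partition)
		return 0; /* whole disk: already the right directory */
	delim = strrchr(path, '/');
	if (!delim)
		return -1;
	*delim = '\0';
	return 0;
}
```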
diff --git a/zbd.c b/zbd.c
index 8acda1f6..2da742b7 100644
--- a/zbd.c
+++ b/zbd.c
@@ -228,12 +228,45 @@ static enum blk_zoned_model get_zbd_model(const char *file_name)
 	char *zoned_attr_path = NULL;
 	char *model_str = NULL;
 	struct stat statbuf;
+	char *sys_devno_path = NULL;
+	char *part_attr_path = NULL;
+	char *part_str = NULL;
+	char sys_path[PATH_MAX];
+	ssize_t sz;
+	char *delim = NULL;
 
 	if (stat(file_name, &statbuf) < 0)
 		goto out;
-	if (asprintf(&zoned_attr_path, "/sys/dev/block/%d:%d/queue/zoned",
+
+	if (asprintf(&sys_devno_path, "/sys/dev/block/%d:%d",
 		     major(statbuf.st_rdev), minor(statbuf.st_rdev)) < 0)
 		goto out;
+
+	sz = readlink(sys_devno_path, sys_path, sizeof(sys_path) - 1);
+	if (sz < 0)
+		goto out;
+	sys_path[sz] = '\0';
+
+	/*
+	 * If the device is a partition device, cut the device name in the
+	 * canonical sysfs path to obtain the sysfs path of the holder device.
+	 *   e.g.:  /sys/devices/.../sda/sda1 -> /sys/devices/.../sda
+	 */
+	if (asprintf(&part_attr_path, "/sys/dev/block/%s/partition",
+		     sys_path) < 0)
+		goto out;
+	part_str = read_file(part_attr_path);
+	if (part_str && *part_str == '1') {
+		delim = strrchr(sys_path, '/');
+		if (!delim)
+			goto out;
+		*delim = '\0';
+	}
+
+	if (asprintf(&zoned_attr_path,
+		     "/sys/dev/block/%s/queue/zoned", sys_path) < 0)
+		goto out;
+
 	model_str = read_file(zoned_attr_path);
 	if (!model_str)
 		goto out;
@@ -246,6 +279,9 @@ static enum blk_zoned_model get_zbd_model(const char *file_name)
 out:
 	free(model_str);
 	free(zoned_attr_path);
+	free(part_str);
+	free(part_attr_path);
+	free(sys_devno_path);
 	return model;
 }
 
@@ -1075,37 +1111,44 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	return NULL;
 }
 
-
 /**
- * zbd_post_submit - update the write pointer and unlock the zone lock
+ * zbd_queue_io - update the write pointer of a sequential zone
  * @io_u: I/O unit
- * @success: Whether or not the I/O unit has been executed successfully
+ * @success: Whether or not the I/O unit has been queued successfully
+ * @q: queueing status (busy, completed or queued).
  *
- * For write and trim operations, update the write pointer of all affected
- * zones.
+ * For write and trim operations, update the write pointer of the I/O unit
+ * target zone.
  */
-static void zbd_post_submit(const struct io_u *io_u, bool success)
+static void zbd_queue_io(struct io_u *io_u, int q, bool success)
 {
-	struct zoned_block_device_info *zbd_info;
+	const struct fio_file *f = io_u->file;
+	struct zoned_block_device_info *zbd_info = f->zbd_info;
 	struct fio_zone_info *z;
 	uint32_t zone_idx;
-	uint64_t end, zone_end;
+	uint64_t zone_end;
 
-	zbd_info = io_u->file->zbd_info;
 	if (!zbd_info)
 		return;
 
-	zone_idx = zbd_zone_idx(io_u->file, io_u->offset);
-	end = io_u->offset + io_u->buflen;
-	z = &zbd_info->zone_info[zone_idx];
+	zone_idx = zbd_zone_idx(f, io_u->offset);
 	assert(zone_idx < zbd_info->nr_zones);
+	z = &zbd_info->zone_info[zone_idx];
+
 	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
 		return;
+
 	if (!success)
 		goto unlock;
+
+	dprint(FD_ZBD,
+	       "%s: queued I/O (%lld, %llu) for zone %u\n",
+	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
+
 	switch (io_u->ddir) {
 	case DDIR_WRITE:
-		zone_end = min(end, (z + 1)->start);
+		zone_end = min((uint64_t)(io_u->offset + io_u->buflen),
+			       (z + 1)->start);
 		pthread_mutex_lock(&zbd_info->mutex);
 		/*
 		 * z->wp > zone_end means that one or more I/O errors
@@ -1122,10 +1165,42 @@ static void zbd_post_submit(const struct io_u *io_u, bool success)
 	default:
 		break;
 	}
+
 unlock:
-	pthread_mutex_unlock(&z->mutex);
+	if (!success || q != FIO_Q_QUEUED) {
+		/* BUSY or COMPLETED: unlock the zone */
+		pthread_mutex_unlock(&z->mutex);
+		io_u->zbd_put_io = NULL;
+	}
+}
 
-	zbd_check_swd(io_u->file);
+/**
+ * zbd_put_io - Unlock an I/O unit target zone lock
+ * @io_u: I/O unit
+ */
+static void zbd_put_io(const struct io_u *io_u)
+{
+	const struct fio_file *f = io_u->file;
+	struct zoned_block_device_info *zbd_info = f->zbd_info;
+	struct fio_zone_info *z;
+	uint32_t zone_idx;
+
+	if (!zbd_info)
+		return;
+
+	zone_idx = zbd_zone_idx(f, io_u->offset);
+	assert(zone_idx < zbd_info->nr_zones);
+	z = &zbd_info->zone_info[zone_idx];
+
+	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return;
+
+	dprint(FD_ZBD,
+	       "%s: terminate I/O (%lld, %llu) for zone %u\n",
+	       f->file_name, io_u->offset, io_u->buflen, zone_idx);
+
+	assert(pthread_mutex_unlock(&z->mutex) == 0);
+	zbd_check_swd(f);
 }
 
 bool zbd_unaligned_write(int error_code)
@@ -1180,7 +1255,21 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 
 	zbd_check_swd(f);
 
-	pthread_mutex_lock(&zb->mutex);
+	/*
+	 * Lock the io_u target zone. The zone will be unlocked if io_u offset
+	 * is changed or when io_u completes and zbd_put_io() executed.
+	 * To avoid multiple jobs doing asynchronous I/Os from deadlocking each
+	 * other waiting for zone locks when building an io_u batch, first
+	 * only trylock the zone. If the zone is already locked by another job,
+	 * process the currently queued I/Os so that I/O progress is made and
+	 * zones unlocked.
+	 */
+	if (pthread_mutex_trylock(&zb->mutex) != 0) {
+		if (!td_ioengine_flagged(td, FIO_SYNCIO))
+			io_u_quiesce(td);
+		pthread_mutex_lock(&zb->mutex);
+	}
+
 	switch (io_u->ddir) {
 	case DDIR_READ:
 		if (td->runstate == TD_VERIFYING) {
@@ -1318,8 +1407,10 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 accept:
 	assert(zb);
 	assert(zb->cond != BLK_ZONE_COND_OFFLINE);
-	assert(!io_u->post_submit);
-	io_u->post_submit = zbd_post_submit;
+	assert(!io_u->zbd_queue_io);
+	assert(!io_u->zbd_put_io);
+	io_u->zbd_queue_io = zbd_queue_io;
+	io_u->zbd_put_io = zbd_put_io;
 	return io_u_accept;
 
 eof:
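
zbd_adjust_block() now first only trylocks the target zone. If another job, or an earlier io_u of the same job, holds it, an async engine calls io_u_quiesce() to reap its own in-flight I/O before blocking. This lets jobs that each hold zones the others want keep making progress.

Below is a sketch of that pattern with a hypothetical drain callback in place of io_u_quiesce(). `unlock_on_drain` is a hypothetical demo hook that simulates our own queued I/O completing and releasing the zone.

```c
#include <pthread.h>

/* Hypothetical stand-in for io_u_quiesce(): reap this job's in-flight I/O. */
typedef void (*drain_fn)(void *arg);

/*
 * Lock a zone without sleeping while still holding unreaped I/O.
 * If the lock is contended, drain our own queue first (which may release
 * zones we or others are waiting on), then block.
 */
static void lock_zone(pthread_mutex_t *m, int sync_engine,
		      drain_fn drain, void *arg)
{
	if (pthread_mutex_trylock(m) != 0) {
		if (!sync_engine)
			drain(arg);
		pthread_mutex_lock(m);
	}
}

/* Demo hook: pretend the queued I/O holding the zone just completed. */
static int drains;

static void unlock_on_drain(void *arg)
{
	drains++;
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}
```

With `unlock_on_drain`, a second lock_zone() on a zone this thread already holds drains instead of self-deadlocking, which is the single-job form of the problem. The multi-job case relies on the same drain releasing zones that other jobs wait on.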
diff --git a/zbd.h b/zbd.h
index 33e6d8bd..521283b2 100644
--- a/zbd.h
+++ b/zbd.h
@@ -96,6 +96,24 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
+
+static inline void zbd_queue_io_u(struct io_u *io_u, enum fio_q_status status)
+{
+	if (io_u->zbd_queue_io) {
+		io_u->zbd_queue_io(io_u, status, io_u->error == 0);
+		io_u->zbd_queue_io = NULL;
+	}
+}
+
+static inline void zbd_put_io_u(struct io_u *io_u)
+{
+	if (io_u->zbd_put_io) {
+		io_u->zbd_put_io(io_u);
+		io_u->zbd_queue_io = NULL;
+		io_u->zbd_put_io = NULL;
+	}
+}
+
 #else
 static inline void zbd_free_zone_info(struct fio_file *f)
 {
@@ -125,6 +143,10 @@ static inline char *zbd_write_status(const struct thread_stat *ts)
 {
 	return NULL;
 }
+
+static inline void zbd_queue_io_u(struct io_u *io_u,
+				  enum fio_q_status status) {}
+static inline void zbd_put_io_u(struct io_u *io_u) {}
 #endif
 
 #endif /* FIO_ZBD_H */


* Recent changes (master)
@ 2019-02-22 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-22 13:00 UTC
  To: fio

The following changes since commit 71144e676e710b37966f447ccd8d944813dfa6d1:

  configure: enable -Wimplicit-fallthrough if we have it (2019-02-11 13:30:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3a294b8704a4125f12e3c3dec36667e68d821be0:

  options: catch division by zero in setting CPU affinity (2019-02-21 10:55:32 -0700)

----------------------------------------------------------------
Vincent Fu (2):
      stat: use long doubles to identify latency percentiles
      options: catch division by zero in setting CPU affinity

 options.c | 3 +++
 stat.c    | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 6d832354..95086074 100644
--- a/options.c
+++ b/options.c
@@ -493,6 +493,9 @@ int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu_index)
 	const long max_cpu = cpus_online();
 
 	cpus_in_mask = fio_cpu_count(mask);
+	if (!cpus_in_mask)
+		return 0;
+
 	cpu_index = cpu_index % cpus_in_mask;
 
 	index = 0;
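
fio_cpus_split() now returns early when the CPU mask is empty, since `cpu_index % cpus_in_mask` would otherwise divide by zero.

Below is a self-contained sketch of the same guard. It makes two simplifications:

- The mask is a 64-bit bitmap rather than os_cpu_mask_t.
- Unlike fio_cpus_split(), it returns the selected CPU number, or -1 for an empty mask.

```c
#include <stdint.h>

/*
 * Pick the (cpu_index mod N)-th set CPU in a 64-bit mask, where N is the
 * number of set bits. Returns the CPU number, or -1 if the mask is empty;
 * checking N first avoids the modulo-by-zero.
 */
static int pick_cpu(uint64_t mask, unsigned int cpu_index)
{
	unsigned int cpus_in_mask = 0, seen = 0;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			cpus_in_mask++;
	if (!cpus_in_mask)
		return -1;

	cpu_index %= cpus_in_mask;
	for (cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		if (seen++ == cpu_index)
			return cpu;
	}
	return -1; /* not reached */
}
```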
diff --git a/stat.c b/stat.c
index c1f46e1d..66a13bca 100644
--- a/stat.c
+++ b/stat.c
@@ -170,7 +170,7 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	is_last = false;
 	for (i = 0; i < FIO_IO_U_PLAT_NR && !is_last; i++) {
 		sum += io_u_plat[i];
-		while (sum >= (plist[j].u.f / 100.0 * nr)) {
+		while (sum >= ((long double) plist[j].u.f / 100.0 * nr)) {
 			assert(plist[j].u.f <= 100.0);
 
 			ovals[j] = plat_idx_to_val(i);
@@ -187,6 +187,9 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 		}
 	}
 
+	if (!is_last)
+		log_err("fio: error calculating latency percentiles\n");
+
 	*output = ovals;
 	return len;
 }
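
calc_clat_percentiles() now computes each threshold `pct / 100 * nr` in long double. It also logs an error if the histogram walk ends before every requested percentile is resolved.

Below is a simplified sketch of that walk, assuming a plain bucket array and percentiles in ascending order. The return value exposes the unresolved case that the hunk reports with log_err().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For each percentile pct[j] (ascending, in percent), store in out[j] the
 * index of the first bucket where the running count reaches pct/100 * total.
 * Thresholds are computed in long double. Returns how many percentiles
 * were resolved; fewer than npct signals the error case.
 */
static size_t bucket_percentiles(const uint64_t *hist, size_t nbuckets,
				 const double *pct, size_t npct, size_t *out)
{
	uint64_t total = 0, sum = 0;
	size_t i, j = 0;

	for (i = 0; i < nbuckets; i++)
		total += hist[i];

	for (i = 0; i < nbuckets && j < npct; i++) {
		sum += hist[i];
		while (j < npct && sum >= (long double)pct[j] / 100.0L * total)
			out[j++] = i;
	}
	return j;
}
```

A return value smaller than `npct` roughly corresponds to the `!is_last` condition in the hunk.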


* Recent changes (master)
@ 2019-02-12 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-12 13:00 UTC
  To: fio

The following changes since commit 6616949337d8409ec0999f2b3ad240ea2d037a82:

  io_uring: sync header with the kernel (2019-02-10 09:36:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 71144e676e710b37966f447ccd8d944813dfa6d1:

  configure: enable -Wimplicit-fallthrough if we have it (2019-02-11 13:30:52 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Document switch fall-through cases
      configure: enable -Wimplicit-fallthrough if we have it

 configure      | 17 +++++++++++++++++
 crc/murmur3.c  |  2 ++
 engines/http.c |  4 +++-
 hash.h         | 22 +++++++++++-----------
 init.c         |  1 +
 lib/lfsr.c     | 16 ++++++++++++++++
 t/lfsr-test.c  |  5 ++++-
 7 files changed, 54 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index c4fffd99..6e549cdc 100755
--- a/configure
+++ b/configure
@@ -2312,6 +2312,20 @@ if compile_prog "" "" "__kernel_rwf_t"; then
 fi
 print_config "__kernel_rwf_t" "$__kernel_rwf_t"
 
+##########################################
+# check if gcc has -Wimplicit-fallthrough
+fallthrough="no"
+cat > $TMPC << EOF
+int main(int argc, char **argv)
+{
+  return 0;
+}
+EOF
+if compile_prog "-Wimplicit-fallthrough" "" "-Wimplicit-fallthrough"; then
+  fallthrough="yes"
+fi
+print_config "-Wimplicit-fallthrough" "$fallthrough"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2583,6 +2597,9 @@ fi
 if test "$__kernel_rwf_t" = "yes"; then
   output_sym "CONFIG_HAVE_KERNEL_RWF_T"
 fi
+if test "$fallthrough" = "yes"; then
+  CFLAGS="$CFLAGS -Wimplicit-fallthrough"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
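
With -Wimplicit-fallthrough enabled by configure, each intentional fall-through needs a `/* fall through */` comment, which GCC accepts as a marker. The murmur3 tail handling below is one example.

Below is a minimal sketch of such a switch, which should compile without fall-through warnings under `-Wimplicit-fallthrough`. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fold the 1-3 trailing bytes of a key into one word, murmur3-style.
 * Cases 3 and 2 deliberately continue into the next case; the comments
 * mark that as intended for -Wimplicit-fallthrough.
 */
static uint32_t tail_mix(const uint8_t *tail, size_t len)
{
	uint32_t k1 = 0;

	switch (len & 3) {
	case 3:
		k1 ^= (uint32_t)tail[2] << 16;
		/* fall through */
	case 2:
		k1 ^= (uint32_t)tail[1] << 8;
		/* fall through */
	case 1:
		k1 ^= tail[0];
		break;
	default:
		break;
	}
	return k1;
}
```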
diff --git a/crc/murmur3.c b/crc/murmur3.c
index e316f592..f4f2f2c6 100644
--- a/crc/murmur3.c
+++ b/crc/murmur3.c
@@ -29,8 +29,10 @@ static uint32_t murmur3_tail(const uint8_t *data, const int nblocks,
 	switch (len & 3) {
 	case 3:
 		k1 ^= tail[2] << 16;
+		/* fall through */
 	case 2:
 		k1 ^= tail[1] << 8;
+		/* fall through */
 	case 1:
 		k1 ^= tail[0];
 		k1 *= c1;
diff --git a/engines/http.c b/engines/http.c
index d81e4288..a35c0332 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -296,9 +296,11 @@ static int _curl_trace(CURL *handle, curl_infotype type,
 
 	switch (type) {
 	case CURLINFO_TEXT:
-	fprintf(stderr, "== Info: %s", data);
+		fprintf(stderr, "== Info: %s", data);
+		/* fall through */
 	default:
 	case CURLINFO_SSL_DATA_OUT:
+		/* fall through */
 	case CURLINFO_SSL_DATA_IN:
 		return 0;
 
diff --git a/hash.h b/hash.h
index d227b938..66dd3d69 100644
--- a/hash.h
+++ b/hash.h
@@ -141,17 +141,17 @@ static inline uint32_t jhash(const void *key, uint32_t length, uint32_t initval)
 	/* Last block: affect all 32 bits of (c) */
 	/* All the case statements fall through */
 	switch (length) {
-	case 12: c += (uint32_t) k[11] << 24;
-	case 11: c += (uint32_t) k[10] << 16;
-	case 10: c += (uint32_t) k[9] << 8;
-	case 9:  c += k[8];
-	case 8:  b += (uint32_t) k[7] << 24;
-	case 7:  b += (uint32_t) k[6] << 16;
-	case 6:  b += (uint32_t) k[5] << 8;
-	case 5:  b += k[4];
-	case 4:  a += (uint32_t) k[3] << 24;
-	case 3:  a += (uint32_t) k[2] << 16;
-	case 2:  a += (uint32_t) k[1] << 8;
+	case 12: c += (uint32_t) k[11] << 24;	/* fall through */
+	case 11: c += (uint32_t) k[10] << 16;	/* fall through */
+	case 10: c += (uint32_t) k[9] << 8;	/* fall through */
+	case 9:  c += k[8];			/* fall through */
+	case 8:  b += (uint32_t) k[7] << 24;	/* fall through */
+	case 7:  b += (uint32_t) k[6] << 16;	/* fall through */
+	case 6:  b += (uint32_t) k[5] << 8;	/* fall through */
+	case 5:  b += k[4];			/* fall through */
+	case 4:  a += (uint32_t) k[3] << 24;	/* fall through */
+	case 3:  a += (uint32_t) k[2] << 16;	/* fall through */
+	case 2:  a += (uint32_t) k[1] << 8;	/* fall through */
 	case 1:  a += k[0];
 		 __jhash_final(a, b, c);
 	case 0: /* Nothing left to add */
diff --git a/init.c b/init.c
index a2b70c4a..e6378715 100644
--- a/init.c
+++ b/init.c
@@ -2907,6 +2907,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			log_err("%s: unrecognized option '%s'\n", argv[0],
 							argv[optind - 1]);
 			show_closest_option(argv[optind - 1]);
+			/* fall through */
 		default:
 			do_exit++;
 			exit_val = 1;
diff --git a/lib/lfsr.c b/lib/lfsr.c
index 49e34a8c..32fbec56 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -88,21 +88,37 @@ static inline void __lfsr_next(struct fio_lfsr *fl, unsigned int spin)
 	 */
 	switch (spin) {
 		case 15: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case 14: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case 13: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case 12: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case 11: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case 10: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  9: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  8: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  7: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  6: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  5: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  4: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  3: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  2: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  1: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		case  0: __LFSR_NEXT(fl, fl->last_val);
+		/* fall through */
 		default: break;
 	}
 }
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index a01f2cfc..ea8c8ddb 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -39,9 +39,12 @@ int main(int argc, char *argv[])
 	/* Read arguments */
 	switch (argc) {
 		case 5: if (strncmp(argv[4], "verify", 7) == 0)
-					verify = 1;
+				verify = 1;
+			/* fall through */
 		case 4: spin = atoi(argv[3]);
+			/* fall through */
 		case 3: seed = atol(argv[2]);
+			/* fall through */
 		case 2: numbers = strtol(argv[1], NULL, 16);
 				break;
 		default: usage();


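The `-Wimplicit-fallthrough` series above applies one pattern throughout. Wherever a `case` deliberately runs into the next one, a `/* fall through */` comment tells GCC that the fallthrough is intended. Below is a minimal sketch of that pattern, modelled loosely on the `murmur3_tail()` hunk. The function name `sketch_tail_word` is hypothetical, and it only folds the trailing bytes into a word; it is not fio's full hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the 1-3 trailing bytes of a buffer into one 32-bit word. */
static uint32_t sketch_tail_word(const uint8_t *tail, size_t len)
{
	uint32_t k1 = 0;

	switch (len & 3) {
	case 3:
		k1 ^= (uint32_t) tail[2] << 16;
		/* fall through */
	case 2:
		k1 ^= (uint32_t) tail[1] << 8;
		/* fall through */
	case 1:
		k1 ^= tail[0];
		break;
	default:		/* no trailing bytes */
		break;
	}
	return k1;
}
```

If you build this with `gcc -Wimplicit-fallthrough` and delete either comment, the warning should come back. As far as I know, Clang does not honour such comments and expects `__attribute__((fallthrough))` instead.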

* Recent changes (master)
@ 2019-02-11 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-11 13:00 UTC
  To: fio

The following changes since commit 2f75f022393e432210d01b15088f425ee5260340:

  client/server: inflate error handling (2019-02-08 16:33:34 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6616949337d8409ec0999f2b3ad240ea2d037a82:

  io_uring: sync header with the kernel (2019-02-10 09:36:48 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io_uring: sync header with the kernel

 os/linux/io_uring.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index b1504502..24906e99 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -11,8 +11,6 @@
 #include <linux/fs.h>
 #include <linux/types.h>
 
-#define IORING_MAX_ENTRIES	4096
-
 /*
  * IO submission data structure (Submission Queue Entry)
  */
@@ -94,7 +92,8 @@ struct io_sqring_offsets {
 	__u32 flags;
 	__u32 dropped;
 	__u32 array;
-	__u32 resv[3];
+	__u32 resv1;
+	__u64 resv2;
 };
 
 /*
@@ -109,7 +108,7 @@ struct io_cqring_offsets {
 	__u32 ring_entries;
 	__u32 overflow;
 	__u32 cqes;
-	__u32 resv[4];
+	__u64 resv[2];
 };
 
 /*


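The header sync changes the reserved fields to 64-bit types without changing the size of either offsets structure. The compile-time check below illustrates this. It uses `uint32_t`/`uint64_t` as stand-ins for the kernel's `__u32`/`__u64`, and the `sketch_` struct names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_sqring_offsets_old {
	uint32_t head, tail, ring_mask, ring_entries, flags, dropped, array;
	uint32_t resv[3];
};

struct sketch_sqring_offsets_new {
	uint32_t head, tail, ring_mask, ring_entries, flags, dropped, array;
	uint32_t resv1;		/* pads the 64-bit field to offset 32 */
	uint64_t resv2;
};

struct sketch_cqring_offsets_old {
	uint32_t head, tail, ring_mask, ring_entries, overflow, cqes;
	uint32_t resv[4];
};

struct sketch_cqring_offsets_new {
	uint32_t head, tail, ring_mask, ring_entries, overflow, cqes;
	uint64_t resv[2];	/* starts at offset 24, already 8-byte aligned */
};

/* Same size before and after, so the fields in front keep their offsets. */
_Static_assert(sizeof(struct sketch_sqring_offsets_old) == 40, "old sq size");
_Static_assert(sizeof(struct sketch_sqring_offsets_new) == 40, "new sq size");
_Static_assert(offsetof(struct sketch_sqring_offsets_new, resv2) == 32, "resv2 offset");
_Static_assert(sizeof(struct sketch_cqring_offsets_old) == 40, "old cq size");
_Static_assert(sizeof(struct sketch_cqring_offsets_new) == 40, "new cq size");
_Static_assert(offsetof(struct sketch_cqring_offsets_new, resv) == 24, "cq resv offset");
```

The assertions assume a common ABI in which `uint64_t` needs at most 8-byte alignment, as on x86 and x86-64.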

* Recent changes (master)
@ 2019-02-09 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-09 13:00 UTC
  To: fio

The following changes since commit e5eb6fbf91617be0c7d74147165909a493c90136:

  stat: put 'percentiles' object in appropriate 'clat_ns' or 'lat_ns' parent (2019-02-07 09:48:46 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2f75f022393e432210d01b15088f425ee5260340:

  client/server: inflate error handling (2019-02-08 16:33:34 -0700)

----------------------------------------------------------------
Jeff Furlong (1):
      client/server: inflate error handling

Jens Axboe (1):
      Fio 3.13

 FIO-VERSION-GEN | 2 +-
 client.c        | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index ea5be1a9..37fb1a7a 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.12
+DEF_VER=fio-3.13
 
 LF='
 '
diff --git a/client.c b/client.c
index 480425f6..8d7c0331 100644
--- a/client.c
+++ b/client.c
@@ -1598,6 +1598,11 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 		err = inflate(&stream, Z_NO_FLUSH);
 		/* may be Z_OK, or Z_STREAM_END */
 		if (err < 0) {
+			/*
+			 * Z_STREAM_ERROR and Z_BUF_ERROR can safely be
+			 * ignored */
+			if (err == Z_STREAM_ERROR || err == Z_BUF_ERROR)
+				break;
 			log_err("fio: inflate error %d\n", err);
 			free(ret);
 			ret = NULL;



* Recent changes (master)
@ 2019-02-08 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-08 13:00 UTC
  To: fio

The following changes since commit fda82fda074d910fd2939d004b3a73c06da40445:

  Improve wording in REPORTING-BUGS (2019-02-04 09:01:48 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e5eb6fbf91617be0c7d74147165909a493c90136:

  stat: put 'percentiles' object in appropriate 'clat_ns' or 'lat_ns' parent (2019-02-07 09:48:46 -0700)

----------------------------------------------------------------
Vincent Fu (2):
      stat: clean up calc_clat_percentiles
      stat: put 'percentiles' object in appropriate 'clat_ns' or 'lat_ns' parent

 stat.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 351c49cc..c1f46e1d 100644
--- a/stat.c
+++ b/stat.c
@@ -139,7 +139,6 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 {
 	unsigned long long sum = 0;
 	unsigned int len, i, j = 0;
-	unsigned int oval_len = 0;
 	unsigned long long *ovals = NULL;
 	bool is_last;
 
@@ -161,6 +160,10 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	if (len > 1)
 		qsort((void *)plist, len, sizeof(plist[0]), double_cmp);
 
+	ovals = malloc(len * sizeof(*ovals));
+	if (!ovals)
+		return 0;
+
 	/*
 	 * Calculate bucket values, note down max and min values
 	 */
@@ -170,11 +173,6 @@ unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 		while (sum >= (plist[j].u.f / 100.0 * nr)) {
 			assert(plist[j].u.f <= 100.0);
 
-			if (j == oval_len) {
-				oval_len += 100;
-				ovals = realloc(ovals, oval_len * sizeof(*ovals));
-			}
-
 			ovals[j] = plat_idx_to_val(i);
 			if (ovals[j] < *minv)
 				*minv = ovals[j];
@@ -1090,7 +1088,8 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		len = 0;
 
 	percentile_object = json_create_object();
-	json_object_add_value_object(tmp_object, "percentile", percentile_object);
+	if (ts->clat_percentiles)
+		json_object_add_value_object(tmp_object, "percentile", percentile_object);
 	for (i = 0; i < len; i++) {
 		snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
 		json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
@@ -1129,6 +1128,8 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_int(tmp_object, "max", max);
 	json_object_add_value_float(tmp_object, "mean", mean);
 	json_object_add_value_float(tmp_object, "stddev", dev);
+	if (ts->lat_percentiles)
+		json_object_add_value_object(tmp_object, "percentile", percentile_object);
 	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->lat_percentiles)
 		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
 


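The `calc_clat_percentiles()` cleanup depends on one property of the loop: it writes at most one value per entry of the sorted percentile list. A buffer of `len` entries is therefore always large enough, so a single `malloc()` with a failure check replaces the grow-by-100 `realloc()`, which also never checked for `NULL`. Below is a simplified, hypothetical stand-in that shows the idea. Its flat `count[]`/`val[]` histogram is an assumption, not fio's `plat` bucket encoding.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * count[i] samples fall into bucket i, represented by val[i].
 * plist[] holds len ascending percentiles (0-100); nr is the sample total,
 * assumed to equal the sum of count[]. Caller frees the result.
 */
static unsigned long long *sketch_percentiles(const uint64_t *count,
		const unsigned long long *val, size_t nbuckets,
		const double *plist, size_t len, unsigned long long nr)
{
	unsigned long long sum = 0, *ovals;
	size_t i, j = 0;

	if (!len)
		return NULL;
	/* One output per requested percentile: size the array once. */
	ovals = malloc(len * sizeof(*ovals));
	if (!ovals)
		return NULL;

	for (i = 0; i < nbuckets && j < len; i++) {
		sum += count[i];
		/* A single bucket may satisfy several percentiles. */
		while (j < len && sum >= plist[j] / 100.0 * nr)
			ovals[j++] = val[i];
	}
	return ovals;
}
```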

* Recent changes (master)
@ 2019-02-05 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-05 13:00 UTC
  To: fio

The following changes since commit 154a95828fe08b01774b59602544de394b2d3aa6:

  t/io_uring: verbose error for -95/-EOPNOTSUPP failure (2019-01-31 22:55:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fda82fda074d910fd2939d004b3a73c06da40445:

  Improve wording in REPORTING-BUGS (2019-02-04 09:01:48 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Improve wording in REPORTING-BUGS

 REPORTING-BUGS | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/REPORTING-BUGS b/REPORTING-BUGS
index d8876ae6..327b6caa 100644
--- a/REPORTING-BUGS
+++ b/REPORTING-BUGS
@@ -14,8 +14,8 @@ When reporting a bug, you'll need to include:
 4) How to reproduce. Please include a full list of the parameters
    passed to fio and the job file used (if any).
 
-A bug report can never have too much information. Any time information
-is left out and has to be asked for, it'll add to the turn-around time
-of getting to the bottom of it and committing a fix.
+A bug report can't have too much information. Any time information that
+is left out and has to be asked for will add to the turn-around time
+of getting to the bottom of the issue, and an eventual fix.
 
 That's it!



* Recent changes (master)
@ 2019-02-01 13:00 Jens Axboe
From: Jens Axboe @ 2019-02-01 13:00 UTC
  To: fio

The following changes since commit 521164fa90df413aa7aa8fb2956095a41eba7d6a:

  io_uring: ensure we use the right argument syscall (2019-01-29 12:20:02 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 154a95828fe08b01774b59602544de394b2d3aa6:

  t/io_uring: verbose error for -95/-EOPNOTSUPP failure (2019-01-31 22:55:52 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: verbose error for -95/-EOPNOTSUPP failure

 t/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 62b48e44..7c75c887 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -262,6 +262,8 @@ static int reap_events(struct submitter *s)
 			f->pending_ios--;
 			if (cqe->res != BS) {
 				printf("io: unexpected ret=%d\n", cqe->res);
+				if (polled && cqe->res == -EOPNOTSUPP)
+					printf("Your filesystem doesn't support poll\n");
 				return -1;
 			}
 		}


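A completion's `res` field holds one of two things: the number of bytes transferred, or a negated errno. That is why the check compares against `-EOPNOTSUPP` (95 on Linux, hence the "-95" in the subject). The sketch below turns `res` into a message. The function name and the wording are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* res >= 0: bytes transferred; res < 0: -errno from the kernel. */
static void sketch_format_cqe_res(int res, int polled, char *buf, size_t len)
{
	if (res >= 0)
		snprintf(buf, len, "ok: %d bytes", res);
	else if (polled && res == -EOPNOTSUPP)
		snprintf(buf, len, "polling not supported for this file (%s)",
			 strerror(-res));
	else
		snprintf(buf, len, "error: %s", strerror(-res));
}
```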

* Recent changes (master)
@ 2019-01-30 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-30 13:00 UTC
  To: fio

The following changes since commit b532dd6d476679b08e4a56a60e8a7dd958779df9:

  io_uring: sync with kernel (2019-01-28 11:42:20 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 521164fa90df413aa7aa8fb2956095a41eba7d6a:

  io_uring: ensure we use the right argument syscall (2019-01-29 12:20:02 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      io_uring: update to kernel struct io_uring_params
      t/io_uring: fix bad if
      io_uring: ensure we use the right argument syscall

 engines/io_uring.c  |  2 +-
 os/linux/io_uring.h | 24 ++++++++++++------------
 t/io_uring.c        |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 5279b1d0..014f954e 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -132,7 +132,7 @@ static int io_uring_enter(struct ioring_data *ld, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
 	return syscall(__NR_sys_io_uring_enter, ld->ring_fd, to_submit,
-			min_complete, flags);
+			min_complete, flags, NULL, 0);
 }
 
 static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 589b6402..b1504502 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -39,14 +39,14 @@ struct io_uring_sqe {
 /*
  * sqe->flags
  */
-#define IOSQE_FIXED_FILE	(1 << 0)	/* use fixed fileset */
+#define IOSQE_FIXED_FILE	(1U << 0)	/* use fixed fileset */
 
 /*
  * io_uring_setup() flags
  */
-#define IORING_SETUP_IOPOLL	(1 << 0)	/* io_context is polled */
-#define IORING_SETUP_SQPOLL	(1 << 1)	/* SQ poll thread */
-#define IORING_SETUP_SQ_AFF	(1 << 2)	/* sq_thread_cpu is valid */
+#define IORING_SETUP_IOPOLL	(1U << 0)	/* io_context is polled */
+#define IORING_SETUP_SQPOLL	(1U << 1)	/* SQ poll thread */
+#define IORING_SETUP_SQ_AFF	(1U << 2)	/* sq_thread_cpu is valid */
 
 #define IORING_OP_NOP		0
 #define IORING_OP_READV		1
@@ -60,7 +60,7 @@ struct io_uring_sqe {
 /*
  * sqe->fsync_flags
  */
-#define IORING_FSYNC_DATASYNC	(1 << 0)
+#define IORING_FSYNC_DATASYNC	(1U << 0)
 
 /*
  * IO completion data structure (Completion Queue Entry)
@@ -74,7 +74,7 @@ struct io_uring_cqe {
 /*
  * io_uring_event->flags
  */
-#define IOCQE_FLAG_CACHEHIT	(1 << 0)	/* IO did not hit media */
+#define IOCQE_FLAG_CACHEHIT	(1U << 0)	/* IO did not hit media */
 
 /*
  * Magic offsets for the application to mmap the data it needs
@@ -100,7 +100,7 @@ struct io_sqring_offsets {
 /*
  * sq_ring->flags
  */
-#define IORING_SQ_NEED_WAKEUP	(1 << 0) /* needs io_uring_enter wakeup */
+#define IORING_SQ_NEED_WAKEUP	(1U << 0) /* needs io_uring_enter wakeup */
 
 struct io_cqring_offsets {
 	__u32 head;
@@ -115,8 +115,8 @@ struct io_cqring_offsets {
 /*
  * io_uring_enter(2) flags
  */
-#define IORING_ENTER_GETEVENTS	(1 << 0)
-#define IORING_ENTER_SQ_WAKEUP	(1 << 1)
+#define IORING_ENTER_GETEVENTS	(1U << 0)
+#define IORING_ENTER_SQ_WAKEUP	(1U << 1)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -125,9 +125,9 @@ struct io_uring_params {
 	__u32 sq_entries;
 	__u32 cq_entries;
 	__u32 flags;
-	__u16 sq_thread_cpu;
-	__u16 sq_thread_idle;
-	__u16 resv[8];
+	__u32 sq_thread_cpu;
+	__u32 sq_thread_idle;
+	__u32 resv[5];
 	struct io_sqring_offsets sq_off;
 	struct io_cqring_offsets cq_off;
 };
diff --git a/t/io_uring.c b/t/io_uring.c
index 9ded1590..62b48e44 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -129,7 +129,7 @@ static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
 	return syscall(__NR_sys_io_uring_enter, s->ring_fd, to_submit,
-			min_complete, flags);
+			min_complete, flags, NULL, 0);
 }
 
 static int gettid(void)
@@ -315,7 +315,7 @@ submit:
 
 			if (to_wait)
 				flags = IORING_ENTER_GETEVENTS;
-			if (*ring->flags & IORING_SQ_NEED_WAKEUP)
+			if ((*ring->flags & IORING_SQ_NEED_WAKEUP))
 				flags |= IORING_ENTER_SQ_WAKEUP;
 			ret = io_uring_enter(s, to_submit, to_wait, flags);
 			s->calls++;


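At this point the kernel's `io_uring_enter()` takes six arguments; the last two are an optional signal mask and its size. Passing `NULL, 0` explicitly presumably keeps the kernel from reading whatever happens to be in those argument registers. The sketch below shows such a wrapper together with the unsigned `1U << n` flag style. It makes three assumptions: it uses the kernel header name `__NR_io_uring_enter` (fio spells its own macro `__NR_sys_io_uring_enter`), it falls back to the generic number 426, and the `sketch_`/`SKETCH_` names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_enter
#define __NR_io_uring_enter	426	/* assumed generic syscall number */
#endif

#define SKETCH_IORING_ENTER_GETEVENTS	(1U << 0)
#define SKETCH_IORING_ENTER_SQ_WAKEUP	(1U << 1)

/* io_uring_enter(fd, to_submit, min_complete, flags, sig, sigsz) */
static int sketch_io_uring_enter(int ring_fd, unsigned int to_submit,
				 unsigned int min_complete, unsigned int flags)
{
	/* No signal mask: pass NULL and a size of 0 explicitly. */
	return (int) syscall(__NR_io_uring_enter, ring_fd, to_submit,
			     min_complete, flags, (const sigset_t *) NULL,
			     (size_t) 0);
}
```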

* Recent changes (master)
@ 2019-01-29 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-29 13:00 UTC
  To: fio

The following changes since commit 4c085cf20f5c0d083aca18680c4323a1fb2b7a1f:

  rate-submit: call ioengine post_init when starting workers (2019-01-24 13:58:00 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b532dd6d476679b08e4a56a60e8a7dd958779df9:

  io_uring: sync with kernel (2019-01-28 11:42:20 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'hygon-support' of https://github.com/hygonsoc/fio
      io_uring: sync with kernel

hygonsoc (1):
      Add Hygon SoC support to enable tsc_reliable feature

 arch/arch-x86-common.h | 2 +-
 engines/io_uring.c     | 3 ++-
 os/linux/io_uring.h    | 9 ++++++++-
 t/io_uring.c           | 2 ++
 4 files changed, 13 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/arch/arch-x86-common.h b/arch/arch-x86-common.h
index 7c07a61c..87925bdc 100644
--- a/arch/arch-x86-common.h
+++ b/arch/arch-x86-common.h
@@ -81,7 +81,7 @@ static inline void arch_init(char *envp[])
 	str[12] = '\0';
 	if (!strcmp(str, "GenuineIntel"))
 		arch_init_intel();
-	else if (!strcmp(str, "AuthenticAMD"))
+	else if (!strcmp(str, "AuthenticAMD") || !strcmp(str, "HygonGenuine"))
 		arch_init_amd();
 }
 
diff --git a/engines/io_uring.c b/engines/io_uring.c
index c759ec19..5279b1d0 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -340,7 +340,8 @@ static int fio_ioring_commit(struct thread_data *td)
 
 		read_barrier();
 		if (*ring->flags & IORING_SQ_NEED_WAKEUP)
-			io_uring_enter(ld, ld->queued, 0, 0);
+			io_uring_enter(ld, ld->queued, 0,
+					IORING_ENTER_SQ_WAKEUP);
 		ld->queued = 0;
 		return 0;
 	}
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 9bb71816..589b6402 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -11,6 +11,8 @@
 #include <linux/fs.h>
 #include <linux/types.h>
 
+#define IORING_MAX_ENTRIES	4096
+
 /*
  * IO submission data structure (Submission Queue Entry)
  */
@@ -25,6 +27,7 @@ struct io_uring_sqe {
 	union {
 		__kernel_rwf_t	rw_flags;
 		__u32		fsync_flags;
+		__u16		poll_events;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
@@ -51,6 +54,8 @@ struct io_uring_sqe {
 #define IORING_OP_FSYNC		3
 #define IORING_OP_READ_FIXED	4
 #define IORING_OP_WRITE_FIXED	5
+#define IORING_OP_POLL_ADD	6
+#define IORING_OP_POLL_REMOVE	7
 
 /*
  * sqe->fsync_flags
@@ -111,6 +116,7 @@ struct io_cqring_offsets {
  * io_uring_enter(2) flags
  */
 #define IORING_ENTER_GETEVENTS	(1 << 0)
+#define IORING_ENTER_SQ_WAKEUP	(1 << 1)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -120,7 +126,8 @@ struct io_uring_params {
 	__u32 cq_entries;
 	__u32 flags;
 	__u16 sq_thread_cpu;
-	__u16 resv[9];
+	__u16 sq_thread_idle;
+	__u16 resv[8];
 	struct io_sqring_offsets sq_off;
 	struct io_cqring_offsets cq_off;
 };
diff --git a/t/io_uring.c b/t/io_uring.c
index da3b4d1f..9ded1590 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -315,6 +315,8 @@ submit:
 
 			if (to_wait)
 				flags = IORING_ENTER_GETEVENTS;
+			if (*ring->flags & IORING_SQ_NEED_WAKEUP)
+				flags |= IORING_ENTER_SQ_WAKEUP;
 			ret = io_uring_enter(s, to_submit, to_wait, flags);
 			s->calls++;
 		}


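With `IORING_SETUP_SQPOLL`, the kernel submission thread goes to sleep after its idle time and sets `IORING_SQ_NEED_WAKEUP` in the SQ ring flags. From then on, new submissions need an `io_uring_enter()` call with `IORING_ENTER_SQ_WAKEUP`; the engine previously passed `0` there. Below is a minimal sketch of how `t/io_uring` composes the flags. The helper name and the `SKETCH_` constants are hypothetical, and the real code reads the ring flags after a `read_barrier()`.

```c
#define SKETCH_IORING_SQ_NEED_WAKEUP	(1U << 0)	/* set by the kernel in sq_ring->flags */
#define SKETCH_IORING_ENTER_GETEVENTS	(1U << 0)
#define SKETCH_IORING_ENTER_SQ_WAKEUP	(1U << 1)

static unsigned int sketch_enter_flags(unsigned int to_wait,
				       unsigned int sq_ring_flags)
{
	unsigned int flags = 0;

	if (to_wait)
		flags |= SKETCH_IORING_ENTER_GETEVENTS;	/* wait for completions */
	if (sq_ring_flags & SKETCH_IORING_SQ_NEED_WAKEUP)
		flags |= SKETCH_IORING_ENTER_SQ_WAKEUP;	/* SQ thread is idle */
	return flags;
}
```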

* Recent changes (master)
@ 2019-01-25 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-25 13:00 UTC
  To: fio

The following changes since commit 63ed82548f4489b4c6d5df688f7405b9eb20ddc9:

  io_uring: system calls have been renumbered (2019-01-23 08:04:28 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4c085cf20f5c0d083aca18680c4323a1fb2b7a1f:

  rate-submit: call ioengine post_init when starting workers (2019-01-24 13:58:00 -0700)

----------------------------------------------------------------
Vincent Fu (1):
      rate-submit: call ioengine post_init when starting workers

 rate-submit.c | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/rate-submit.c b/rate-submit.c
index b07a2072..cf00d9bc 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -165,6 +165,9 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	if (td_io_init(td))
 		goto err_io_init;
 
+	if (td->io_ops->post_init && td->io_ops->post_init(td))
+		goto err_io_init;
+
 	set_epoch_time(td, td->o.log_unix_epoch);
 	fio_getrusage(&td->ru_start);
 	clear_io_state(td, 1);


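The rate-submit fix uses the usual C idiom for an optional engine hook. The worker calls `post_init` only when the engine defines it, and it treats a non-zero return as failure. The sketch below shows the idiom in isolation. The `sketch_td`/`sketch_ops` types and the callbacks are hypothetical stand-ins for fio's `thread_data` and `ioengine_ops`.

```c
#include <stddef.h>

struct sketch_td {
	int io_inited;
	int post_inited;
};

struct sketch_ops {
	int (*init)(struct sketch_td *td);	/* required */
	int (*post_init)(struct sketch_td *td);	/* optional, may be NULL */
};

static int sketch_init(struct sketch_td *td) { td->io_inited = 1; return 0; }
static int sketch_post_ok(struct sketch_td *td) { td->post_inited = 1; return 0; }
static int sketch_post_fail(struct sketch_td *td) { (void) td; return 1; }

/* Returns 0 on success, 1 if either stage failed. */
static int sketch_worker_init(const struct sketch_ops *ops, struct sketch_td *td)
{
	if (ops->init(td))
		return 1;
	/* Short-circuit: a NULL hook is skipped, never called. */
	if (ops->post_init && ops->post_init(td))
		return 1;
	return 0;
}
```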

* Recent changes (master)
@ 2019-01-24 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-24 13:00 UTC
  To: fio

The following changes since commit 2d644205f56953192d27ccbf193c899ae559fcb7:

  engines/io_uring: cleanup setrlimit() (2019-01-16 09:07:13 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63ed82548f4489b4c6d5df688f7405b9eb20ddc9:

  io_uring: system calls have been renumbered (2019-01-23 08:04:28 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io_uring: system calls have been renumbered

 arch/arch-x86-common.h | 11 +++++++++++
 arch/arch-x86.h        | 11 -----------
 arch/arch-x86_64.h     | 11 -----------
 3 files changed, 11 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/arch/arch-x86-common.h b/arch/arch-x86-common.h
index 5140f238..7c07a61c 100644
--- a/arch/arch-x86-common.h
+++ b/arch/arch-x86-common.h
@@ -3,6 +3,16 @@
 
 #include <string.h>
 
+#ifndef __NR_sys_io_uring_setup
+#define __NR_sys_io_uring_setup		425
+#endif
+#ifndef __NR_sys_io_uring_enter
+#define __NR_sys_io_uring_enter		426
+#endif
+#ifndef __NR_sys_io_uring_register
+#define __NR_sys_io_uring_register	427
+#endif
+
 static inline void cpuid(unsigned int op,
 			 unsigned int *eax, unsigned int *ebx,
 			 unsigned int *ecx, unsigned int *edx)
@@ -13,6 +23,7 @@ static inline void cpuid(unsigned int op,
 }
 
 #define ARCH_HAVE_INIT
+#define ARCH_HAVE_IOURING
 
 extern bool tsc_reliable;
 extern int arch_random;
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index c1c866ea..c6bcb54c 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -1,16 +1,6 @@
 #ifndef ARCH_X86_H
 #define ARCH_X86_H
 
-#ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup		387
-#endif
-#ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter		388
-#endif
-#ifndef __NR_sys_io_uring_register
-#define __NR_sys_io_uring_register	389
-#endif
-
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
 {
@@ -46,6 +36,5 @@ static inline unsigned long long get_cpu_clock(void)
 
 #define ARCH_HAVE_FFZ
 #define ARCH_HAVE_CPU_CLOCK
-#define ARCH_HAVE_IOURING
 
 #endif
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 0cd21b8f..25850f90 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -1,16 +1,6 @@
 #ifndef ARCH_X86_64_H
 #define ARCH_X86_64_H
 
-#ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup		335
-#endif
-#ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter		336
-#endif
-#ifndef __NR_sys_io_uring_register
-#define __NR_sys_io_uring_register	337
-#endif
-
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
 {
@@ -47,7 +37,6 @@ static inline unsigned long long get_cpu_clock(void)
 #define ARCH_HAVE_FFZ
 #define ARCH_HAVE_SSE4_2
 #define ARCH_HAVE_CPU_CLOCK
-#define ARCH_HAVE_IOURING
 
 #define RDRAND_LONG	".byte 0x48,0x0f,0xc7,0xf0"
 #define RDSEED_LONG	".byte 0x48,0x0f,0xc7,0xf8"


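After the renumbering, 32-bit and 64-bit x86 share the numbers 425-427, which is why the fallback defines can move into `arch-x86-common.h`. The sketch below shows the same `#ifndef` fallback pattern plus a runtime probe. The probe relies on `io_uring_setup(0, NULL)` always failing: the error is `ENOSYS` only when the syscall does not exist. Three things here are assumptions: the use of the kernel header names `__NR_io_uring_*` (fio uses `__NR_sys_io_uring_*`), the probe function itself, and its treatment of other errors such as a seccomp `EPERM` as "present".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for headers that predate io_uring. */
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup	425
#endif
#ifndef __NR_io_uring_enter
#define __NR_io_uring_enter	426
#endif
#ifndef __NR_io_uring_register
#define __NR_io_uring_register	427
#endif

/* Returns 1 if the kernel knows io_uring_setup(), 0 if it reports ENOSYS. */
static int sketch_have_io_uring(void)
{
	long ret = syscall(__NR_io_uring_setup, 0, NULL);	/* invalid args on purpose */

	return !(ret < 0 && errno == ENOSYS);
}
```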

* Recent changes (master)
@ 2019-01-17 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-17 13:00 UTC
  To: fio

The following changes since commit 2b9415ddc260726c3ea9ae3436826f9181811143:

  engines/io_uring: ensure sqe stores are ordered SQ ring tail update (2019-01-15 22:06:05 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2d644205f56953192d27ccbf193c899ae559fcb7:

  engines/io_uring: cleanup setrlimit() (2019-01-16 09:07:13 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      io_uring: sync with upstream API
      engines/io_uring: cleanup setrlimit()

 engines/io_uring.c  | 24 +++++++++---------------
 os/linux/io_uring.h | 21 +--------------------
 t/io_uring.c        | 16 ++++------------
 3 files changed, 14 insertions(+), 47 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 8c5d9deb..c759ec19 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -154,7 +154,7 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 				sqe->opcode = IORING_OP_READ_FIXED;
 			else
 				sqe->opcode = IORING_OP_WRITE_FIXED;
-			sqe->addr = io_u->xfer_buf;
+			sqe->addr = (unsigned long) io_u->xfer_buf;
 			sqe->len = io_u->xfer_buflen;
 			sqe->buf_index = io_u->index;
 		} else {
@@ -162,7 +162,7 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 				sqe->opcode = IORING_OP_READV;
 			else
 				sqe->opcode = IORING_OP_WRITEV;
-			sqe->addr = &ld->iovecs[io_u->index];
+			sqe->addr = (unsigned long) &ld->iovecs[io_u->index];
 			sqe->len = 1;
 		}
 		sqe->off = io_u->offset;
@@ -462,15 +462,6 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		}
 	}
 
-	if (o->fixedbufs) {
-		struct rlimit rlim = {
-			.rlim_cur = RLIM_INFINITY,
-			.rlim_max = RLIM_INFINITY,
-		};
-
-		setrlimit(RLIMIT_MEMLOCK, &rlim);
-	}
-
 	ret = syscall(__NR_sys_io_uring_setup, depth, &p);
 	if (ret < 0)
 		return ret;
@@ -478,13 +469,16 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	ld->ring_fd = ret;
 
 	if (o->fixedbufs) {
-		struct io_uring_register_buffers reg = {
-			.iovecs = ld->iovecs,
-			.nr_iovecs = depth
+		struct rlimit rlim = {
+			.rlim_cur = RLIM_INFINITY,
+			.rlim_max = RLIM_INFINITY,
 		};
 
+		if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
+			return -1;
+
 		ret = syscall(__NR_sys_io_uring_register, ld->ring_fd,
-				IORING_REGISTER_BUFFERS, &reg);
+				IORING_REGISTER_BUFFERS, ld->iovecs, depth);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 71e92026..9bb71816 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -20,10 +20,7 @@ struct io_uring_sqe {
 	__u16	ioprio;		/* ioprio for the request */
 	__s32	fd;		/* file descriptor to do IO on */
 	__u64	off;		/* offset into file */
-	union {
-		void	*addr;	/* buffer or iovecs */
-		__u64	__pad;
-	};
+	__u64	addr;		/* pointer to buffer or iovecs */
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
 		__kernel_rwf_t	rw_flags;
@@ -136,20 +133,4 @@ struct io_uring_params {
 #define IORING_REGISTER_FILES		2
 #define IORING_UNREGISTER_FILES		3
 
-struct io_uring_register_buffers {
-	union {
-		struct iovec *iovecs;
-		__u64 pad;
-	};
-	__u32 nr_iovecs;
-};
-
-struct io_uring_register_files {
-	union {
-		__s32 *fds;
-		__u64 pad;
-	};
-	__u32 nr_fds;
-};
-
 #endif
diff --git a/t/io_uring.c b/t/io_uring.c
index ef5d52d1..da3b4d1f 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -96,21 +96,15 @@ static int do_nop = 0;		/* no-op SQ ring commands */
 
 static int io_uring_register_buffers(struct submitter *s)
 {
-	struct io_uring_register_buffers reg = {
-		.iovecs = s->iovecs,
-		.nr_iovecs = DEPTH
-	};
-
 	if (do_nop)
 		return 0;
 
 	return syscall(__NR_sys_io_uring_register, s->ring_fd,
-			IORING_REGISTER_BUFFERS, &reg);
+			IORING_REGISTER_BUFFERS, s->iovecs, DEPTH);
 }
 
 static int io_uring_register_files(struct submitter *s)
 {
-	struct io_uring_register_files reg;
 	int i;
 
 	if (do_nop)
@@ -121,11 +115,9 @@ static int io_uring_register_files(struct submitter *s)
 		s->fds[i] = s->files[i].real_fd;
 		s->files[i].fixed_fd = i;
 	}
-	reg.fds = s->fds;
-	reg.nr_fds = s->nr_files;
 
 	return syscall(__NR_sys_io_uring_register, s->ring_fd,
-			IORING_REGISTER_FILES, &reg);
+			IORING_REGISTER_FILES, s->fds, s->nr_files);
 }
 
 static int io_uring_setup(unsigned entries, struct io_uring_params *p)
@@ -187,12 +179,12 @@ static void init_io(struct submitter *s, unsigned index)
 	}
 	if (fixedbufs) {
 		sqe->opcode = IORING_OP_READ_FIXED;
-		sqe->addr = s->iovecs[index].iov_base;
+		sqe->addr = (unsigned long) s->iovecs[index].iov_base;
 		sqe->len = BS;
 		sqe->buf_index = index;
 	} else {
 		sqe->opcode = IORING_OP_READV;
-		sqe->addr = &s->iovecs[index];
+		sqe->addr = (unsigned long) &s->iovecs[index];
 		sqe->len = 1;
 		sqe->buf_index = 0;
 	}


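With the upstream API sync, `sqe->addr` changes from a pointer/pad union to a plain `__u64`, so the SQE layout no longer depends on the pointer width. Callers now cast the pointer; fio writes the cast as `(unsigned long)`. The sketch below uses `uintptr_t` instead. Its `sketch_sqe` is a hypothetical cut-down SQE that keeps only the fields the example touches.

```c
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_IORING_OP_READV	1

struct sketch_sqe {
	uint8_t  opcode;
	int32_t  fd;
	uint64_t off;
	uint64_t addr;	/* buffer or iovec array address, as a 64-bit integer */
	uint32_t len;	/* buffer size or number of iovecs */
};

static void sketch_prep_readv(struct sketch_sqe *sqe, int fd,
			      const struct iovec *iov, uint64_t off)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = SKETCH_IORING_OP_READV;
	sqe->fd = fd;
	sqe->off = off;
	/* Same encoding on 32- and 64-bit builds. */
	sqe->addr = (uint64_t) (uintptr_t) iov;
	sqe->len = 1;
}
```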

* Recent changes (master)
@ 2019-01-16 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-16 13:00 UTC
  To: fio

The following changes since commit 93d1811ce6a510d36d61381840283d1ae9933b37:

  t/io_uring: pick next file if we're over the limti (2019-01-15 05:57:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b9415ddc260726c3ea9ae3436826f9181811143:

  engines/io_uring: ensure sqe stores are ordered SQ ring tail update (2019-01-15 22:06:05 -0700)

----------------------------------------------------------------
Jens Axboe (7):
      t/io_uring: print file depths
      t/io_uring: wait if we're at queue limit
      t/io_uring: terminate buf[] file depth string
      t/io_uring: fixes
      x86-64: correct read/write barriers
      t/io_uring: use fio provided memory barriers
      engines/io_uring: ensure sqe stores are ordered SQ ring tail update

 arch/arch-x86_64.h |  4 ++--
 engines/io_uring.c |  2 ++
 t/io_uring.c       | 51 ++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 40 insertions(+), 17 deletions(-)

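x86-64 does not reorder ordinary stores with earlier stores, or loads with earlier loads. This series therefore turns `read_barrier()`/`write_barrier()` into compiler-only barriers. What still matters is that the SQE and ring array stores are visible before the tail store that publishes them. The same ordering can be written portably with C11 release/acquire atomics. That is a swapped-in technique, not fio's code, and `sketch_sq_ring` is a hypothetical single-process stand-in for the shared ring.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_SQ_ENTRIES	8u	/* must be a power of two */

struct sketch_sq_ring {
	_Atomic unsigned int head;	/* advanced by the consumer */
	_Atomic unsigned int tail;	/* advanced by the producer */
	unsigned int array[SKETCH_SQ_ENTRIES];
};

static void sketch_sq_init(struct sketch_sq_ring *r)
{
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side; returns false when the ring is full. */
static bool sketch_sq_push(struct sketch_sq_ring *r, unsigned int sqe_index)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* acquire: observe the consumer's head update before reusing a slot */
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == SKETCH_SQ_ENTRIES)
		return false;
	r->array[tail & (SKETCH_SQ_ENTRIES - 1)] = sqe_index;
	/* release: the array store (and earlier SQE stores) precede the new tail */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

The head and tail indices are free-running unsigned counters, so `tail - head` stays correct across wraparound.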
---

Diff of recent changes:

diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 665c6b04..0cd21b8f 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -27,8 +27,8 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 #define	FIO_HUGE_PAGE		2097152
 
 #define nop		__asm__ __volatile__("rep;nop": : :"memory")
-#define read_barrier()	__asm__ __volatile__("lfence":::"memory")
-#define write_barrier()	__asm__ __volatile__("sfence":::"memory")
+#define read_barrier()	__asm__ __volatile__("":::"memory")
+#define write_barrier()	__asm__ __volatile__("":::"memory")
 
 static inline unsigned long arch_ffz(unsigned long bitmask)
 {
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 56af8d71..8c5d9deb 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -289,6 +289,8 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == *ring->head)
 		return FIO_Q_BUSY;
 
+	/* ensure sqe stores are ordered with tail update */
+	write_barrier();
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	*ring->tail = next_tail;
 	write_barrier();
diff --git a/t/io_uring.c b/t/io_uring.c
index b5f1e094..ef5d52d1 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -24,8 +24,6 @@
 #include "../lib/types.h"
 #include "../os/linux/io_uring.h"
 
-#define barrier()	__asm__ __volatile__("": : :"memory")
-
 #define min(a, b)		((a < b) ? (a) : (b))
 
 struct io_sq_ring {
@@ -47,8 +45,8 @@ struct io_cq_ring {
 
 #define DEPTH			128
 
-#define BATCH_SUBMIT		64
-#define BATCH_COMPLETE		64
+#define BATCH_SUBMIT		32
+#define BATCH_COMPLETE		32
 
 #define BS			4096
 
@@ -211,7 +209,7 @@ static int prep_more_ios(struct submitter *s, int max_ios)
 	next_tail = tail = *ring->tail;
 	do {
 		next_tail++;
-		barrier();
+		read_barrier();
 		if (next_tail == *ring->head)
 			break;
 
@@ -224,9 +222,9 @@ static int prep_more_ios(struct submitter *s, int max_ios)
 
 	if (*ring->tail != tail) {
 		/* order tail store with writes to sqes above */
-		barrier();
+		write_barrier();
 		*ring->tail = tail;
-		barrier();
+		write_barrier();
 	}
 	return prepped;
 }
@@ -263,7 +261,7 @@ static int reap_events(struct submitter *s)
 	do {
 		struct file *f;
 
-		barrier();
+		read_barrier();
 		if (head == *ring->tail)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
@@ -285,7 +283,7 @@ static int reap_events(struct submitter *s)
 
 	s->inflight -= reaped;
 	*ring->head = head;
-	barrier();
+	write_barrier();
 	return reaped;
 }
 
@@ -311,7 +309,7 @@ static void *submitter_fn(void *data)
 submit_more:
 		to_submit = prepped;
 submit:
-		if (to_submit && (s->inflight + to_submit < DEPTH))
+		if (to_submit && (s->inflight + to_submit <= DEPTH))
 			to_wait = 0;
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
@@ -338,9 +336,10 @@ submit:
 		do {
 			int r;
 			r = reap_events(s);
-			if (r == -1)
+			if (r == -1) {
+				s->finish = 1;
 				break;
-			else if (r > 0)
+			} else if (r > 0)
 				this_reap += r;
 		} while (sq_thread_poll && this_reap < to_wait);
 		s->reaps += this_reap;
@@ -406,7 +405,7 @@ static int setup_ring(struct submitter *s)
 
 	memset(&p, 0, sizeof(p));
 
-	if (polled)
+	if (polled && !do_nop)
 		p.flags |= IORING_SETUP_IOPOLL;
 	if (sq_thread_poll) {
 		p.flags |= IORING_SETUP_SQPOLL;
@@ -469,11 +468,30 @@ static int setup_ring(struct submitter *s)
 	return 0;
 }
 
+static void file_depths(char *buf)
+{
+	struct submitter *s = &submitters[0];
+	char *p;
+	int i;
+
+	buf[0] = '\0';
+	p = buf;
+	for (i = 0; i < s->nr_files; i++) {
+		struct file *f = &s->files[i];
+
+		if (i + 1 == s->nr_files)
+			p += sprintf(p, "%d", f->pending_ios);
+		else
+			p += sprintf(p, "%d, ", f->pending_ios);
+	}
+}
+
 int main(int argc, char *argv[])
 {
 	struct submitter *s = &submitters[0];
 	unsigned long done, calls, reap, cache_hit, cache_miss;
 	int err, i, flags, fd;
+	char *fdepths;
 	void *ret;
 
 	if (!do_nop && argc < 2) {
@@ -544,6 +562,7 @@ int main(int argc, char *argv[])
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);
 
+	fdepths = malloc(8 * s->nr_files);
 	cache_hit = cache_miss = reap = calls = done = 0;
 	do {
 		unsigned long this_done = 0;
@@ -573,9 +592,10 @@ int main(int argc, char *argv[])
 			ipc = (this_reap - reap) / (this_call - calls);
 		} else
 			rpc = ipc = -1;
-		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (head=%u tail=%u), Cachehit=%0.2f%%\n",
+		file_depths(fdepths);
+		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (%s), Cachehit=%0.2f%%\n",
 				this_done - done, rpc, ipc, s->inflight,
-				*s->cq_ring.head, *s->cq_ring.tail, hit);
+				fdepths, hit);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;
@@ -585,5 +605,6 @@ int main(int argc, char *argv[])
 
 	pthread_join(s->thread, &ret);
 	close(s->ring_fd);
+	free(fdepths);
 	return 0;
 }

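file_depths() above builds the per-file depth list with unbounded sprintf() calls into a buffer allocated as 8 bytes per file. That is enough while pending_ios stays at three digits (DEPTH is 128), but a bounded variant costs little. Below is a sketch that assumes the caller passes the buffer size explicitly. `format_depths()` is a hypothetical helper, not part of fio.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical bounded version of file_depths(): writes "d0, d1, ..., dn"
 * into buf, never more than len bytes, always NUL-terminated when len > 0.
 * Returns the length of the resulting string.
 */
static size_t format_depths(char *buf, size_t len,
			    const unsigned *depths, int nr)
{
	size_t used = 0;
	int i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < nr; i++) {
		int n = snprintf(buf + used, len - used, "%u%s",
				 depths[i], i + 1 == nr ? "" : ", ");
		if (n < 0)
			break;
		if ((size_t) n >= len - used) {
			/* Truncated: snprintf() already NUL-terminated buf. */
			return len - 1;
		}
		used += n;
	}
	return used;
}
```

The sketch prints with `%u` because pending_ios is declared `unsigned`. On truncation, it returns a clipped string instead of writing past the end of the buffer.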

* Recent changes (master)
@ 2019-01-15 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-15 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c21198d9cd52b6239c8f9de2373574a7683a0593:

  t/io_uring: use the right check for when to wait (2019-01-13 22:52:07 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 93d1811ce6a510d36d61381840283d1ae9933b37:

  t/io_uring: pick next file if we're over the limti (2019-01-15 05:57:54 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: pick next file if we're over the limti

 t/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/t/io_uring.c b/t/io_uring.c
index 7ddeef39..b5f1e094 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -172,6 +172,7 @@ static void init_io(struct submitter *s, unsigned index)
 			s->cur_file++;
 			if (s->cur_file == s->nr_files)
 				s->cur_file = 0;
+			f = &s->files[s->cur_file];
 		}
 	}
 	f->pending_ios++;

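The one-line fix above makes init_io() use the file it has just advanced to. Without it, cur_file moved on while f still pointed at the file that had hit its limit. Below is a sketch of that selection step. It assumes a hypothetical per-file cap `max_pending`, since the actual threshold is outside the hunk shown, and it uses trimmed-down copies of the struct file/struct submitter fields it touches.

```c
#include <assert.h>

#define MAX_FILES 16

/* Hypothetical, trimmed-down versions of t/io_uring's structs. */
struct file {
	unsigned pending_ios;
};

struct submitter {
	struct file files[MAX_FILES];
	unsigned nr_files;
	unsigned cur_file;
};

/*
 * Pick the file for the next I/O: stay on cur_file until it has
 * max_pending I/Os in flight, then advance round-robin. The detail the
 * fix adds is re-deriving f after cur_file changes.
 */
static struct file *pick_file(struct submitter *s, unsigned max_pending)
{
	struct file *f;

	assert(s->nr_files > 0);
	if (s->nr_files == 1) {
		f = &s->files[0];
	} else {
		f = &s->files[s->cur_file];
		if (f->pending_ios >= max_pending) {
			s->cur_file++;
			if (s->cur_file == s->nr_files)
				s->cur_file = 0;
			f = &s->files[s->cur_file];
		}
	}
	f->pending_ios++;
	return f;
}
```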

* Recent changes (master)
@ 2019-01-14 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8025517dfa599be4bc795e4af7c9012d10b81bc5:

  t/io_uring: add IORING_OP_NOP support (2019-01-12 22:14:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c21198d9cd52b6239c8f9de2373574a7683a0593:

  t/io_uring: use the right check for when to wait (2019-01-13 22:52:07 -0700)

----------------------------------------------------------------
Jens Axboe (7):
      Move io_uring to os/linux/
      io_uring: ensure that the io_uring_register() structs are 32-bit safe
      io_uring: fix pointer cast warning on 32-bit
      t/io_uring: add option for register_files
      io_uring: add 32-bit x86 support
      t/io_uring: only call setrlimit() for fixedbufs
      t/io_uring: use the right check for when to wait

 Makefile                  |  2 +-
 arch/arch-x86.h           | 11 +++++++++++
 engines/io_uring.c        |  4 ++--
 os/{ => linux}/io_uring.h | 17 ++++++++++++-----
 t/io_uring.c              | 41 ++++++++++++++++++++++++++---------------
 5 files changed, 52 insertions(+), 23 deletions(-)
 rename os/{ => linux}/io_uring.h (94%)

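The "use the right check for when to wait" change in this batch makes submitter_fn() skip waiting only when there is something to submit and it still fits in the ring. Otherwise it waits for up to BATCH_COMPLETE completions. Two related changes appear elsewhere in this archive:

- A subsequent update, in the digest above, relaxes `<` to `<=`, so that exactly filling the ring does not force a wait.
- The 2019-01-12 digest sets IORING_ENTER_GETEVENTS only when to_wait is non-zero.

Below is a sketch of both decisions as pure functions. It uses the later `<=` form and the later BATCH_COMPLETE of 32. `calc_to_wait()` and `want_getevents()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEPTH          128
#define BATCH_COMPLETE 32   /* value after the later batch-size change */

#define min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper mirroring submitter_fn(): how many completions to
 * wait for in this io_uring_enter() call. Waiting is skipped only if there
 * is something to submit and it fits in the ring.
 */
static unsigned calc_to_wait(unsigned inflight, unsigned to_submit)
{
	if (to_submit && inflight + to_submit <= DEPTH)
		return 0;
	return min(inflight + to_submit, BATCH_COMPLETE);
}

/* Only ask the kernel to reap events when we actually wait for some. */
static bool want_getevents(unsigned to_wait)
{
	return to_wait != 0;
}
```

With nothing to submit, the sketch always waits (for up to BATCH_COMPLETE of the inflight I/Os). The older `s->inflight + BATCH_SUBMIT < DEPTH` check ignored to_submit, so it could return without waiting even when there was nothing to do.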
---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 3701317e..fd138dd2 100644
--- a/Makefile
+++ b/Makefile
@@ -444,7 +444,7 @@ cairo_text_helpers.o: cairo_text_helpers.c cairo_text_helpers.h
 printing.o: printing.c printing.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
 
-t/io_uring.o: os/io_uring.h
+t/io_uring.o: os/linux/io_uring.h
 t/io_uring: $(T_IOU_RING_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_IOU_RING_OBJS) $(LIBS)
 
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index c6bcb54c..c1c866ea 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -1,6 +1,16 @@
 #ifndef ARCH_X86_H
 #define ARCH_X86_H
 
+#ifndef __NR_sys_io_uring_setup
+#define __NR_sys_io_uring_setup		387
+#endif
+#ifndef __NR_sys_io_uring_enter
+#define __NR_sys_io_uring_enter		388
+#endif
+#ifndef __NR_sys_io_uring_register
+#define __NR_sys_io_uring_register	389
+#endif
+
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
 {
@@ -36,5 +46,6 @@ static inline unsigned long long get_cpu_clock(void)
 
 #define ARCH_HAVE_FFZ
 #define ARCH_HAVE_CPU_CLOCK
+#define ARCH_HAVE_IOURING
 
 #endif
diff --git a/engines/io_uring.c b/engines/io_uring.c
index ca3e157f..56af8d71 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -21,7 +21,7 @@
 #ifdef ARCH_HAVE_IOURING
 
 #include "../lib/types.h"
-#include "../os/io_uring.h"
+#include "../os/linux/io_uring.h"
 
 struct io_sq_ring {
 	unsigned *head;
@@ -187,7 +187,7 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	cqe = &ld->cq_ring.cqes[index];
-	io_u = (struct io_u *) cqe->user_data;
+	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
 
 	if (cqe->res != io_u->xfer_buflen) {
 		if (cqe->res > io_u->xfer_buflen)
diff --git a/os/io_uring.h b/os/linux/io_uring.h
similarity index 94%
rename from os/io_uring.h
rename to os/linux/io_uring.h
index 0f4460d6..71e92026 100644
--- a/os/io_uring.h
+++ b/os/linux/io_uring.h
@@ -29,10 +29,11 @@ struct io_uring_sqe {
 		__kernel_rwf_t	rw_flags;
 		__u32		fsync_flags;
 	};
-	__u16	buf_index;	/* index into fixed buffers, if used */
-	__u16	__pad2;
-	__u32	__pad3;
 	__u64	user_data;	/* data to be passed back at completion time */
+	union {
+		__u16	buf_index;	/* index into fixed buffers, if used */
+		__u64	__pad2[3];
+	};
 };
 
 /*
@@ -136,12 +137,18 @@ struct io_uring_params {
 #define IORING_UNREGISTER_FILES		3
 
 struct io_uring_register_buffers {
-	struct iovec *iovecs;
+	union {
+		struct iovec *iovecs;
+		__u64 pad;
+	};
 	__u32 nr_iovecs;
 };
 
 struct io_uring_register_files {
-	__s32 *fds;
+	union {
+		__s32 *fds;
+		__u64 pad;
+	};
 	__u32 nr_fds;
 };
 
diff --git a/t/io_uring.c b/t/io_uring.c
index 8d3f3a9b..7ddeef39 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -22,7 +22,7 @@
 
 #include "../arch/arch.h"
 #include "../lib/types.h"
-#include "../os/io_uring.h"
+#include "../os/linux/io_uring.h"
 
 #define barrier()	__asm__ __volatile__("": : :"memory")
 
@@ -90,6 +90,7 @@ static volatile int finish;
 
 static int polled = 1;		/* use IO polling */
 static int fixedbufs = 1;	/* use fixed user buffers */
+static int register_files = 1;	/* use fixed files */
 static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
 static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
@@ -178,7 +179,13 @@ static void init_io(struct submitter *s, unsigned index)
 	lrand48_r(&s->rand, &r);
 	offset = (r % (f->max_blocks - 1)) * BS;
 
-	sqe->flags = IOSQE_FIXED_FILE;
+	if (register_files) {
+		sqe->flags = IOSQE_FIXED_FILE;
+		sqe->fd = f->fixed_fd;
+	} else {
+		sqe->flags = 0;
+		sqe->fd = f->real_fd;
+	}
 	if (fixedbufs) {
 		sqe->opcode = IORING_OP_READ_FIXED;
 		sqe->addr = s->iovecs[index].iov_base;
@@ -191,7 +198,6 @@ static void init_io(struct submitter *s, unsigned index)
 		sqe->buf_index = 0;
 	}
 	sqe->ioprio = 0;
-	sqe->fd = f->fixed_fd;
 	sqe->off = offset;
 	sqe->user_data = (unsigned long) f;
 }
@@ -261,7 +267,7 @@ static int reap_events(struct submitter *s)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
 		if (!do_nop) {
-			f = (struct file *) cqe->user_data;
+			f = (struct file *) (uintptr_t) cqe->user_data;
 			f->pending_ios--;
 			if (cqe->res != BS) {
 				printf("io: unexpected ret=%d\n", cqe->res);
@@ -304,7 +310,7 @@ static void *submitter_fn(void *data)
 submit_more:
 		to_submit = prepped;
 submit:
-		if (s->inflight + BATCH_SUBMIT < DEPTH)
+		if (to_submit && (s->inflight + to_submit < DEPTH))
 			to_wait = 0;
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
@@ -424,10 +430,12 @@ static int setup_ring(struct submitter *s)
 		}
 	}
 
-	ret = io_uring_register_files(s);
-	if (ret < 0) {
-		perror("io_uring_register_files");
-		return 1;
+	if (register_files) {
+		ret = io_uring_register_files(s);
+		if (ret < 0) {
+			perror("io_uring_register_files");
+			return 1;
+		}
 	}
 
 	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
@@ -465,7 +473,6 @@ int main(int argc, char *argv[])
 	struct submitter *s = &submitters[0];
 	unsigned long done, calls, reap, cache_hit, cache_miss;
 	int err, i, flags, fd;
-	struct rlimit rlim;
 	void *ret;
 
 	if (!do_nop && argc < 2) {
@@ -502,11 +509,15 @@ int main(int argc, char *argv[])
 		i++;
 	}
 
-	rlim.rlim_cur = RLIM_INFINITY;
-	rlim.rlim_max = RLIM_INFINITY;
-	if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
-		perror("setrlimit");
-		return 1;
+	if (fixedbufs) {
+		struct rlimit rlim;
+
+		rlim.rlim_cur = RLIM_INFINITY;
+		rlim.rlim_max = RLIM_INFINITY;
+		if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
+			perror("setrlimit");
+			return 1;
+		}
 	}
 
 	arm_sig_int();

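This batch wraps the pointer members of io_uring_register_buffers and io_uring_register_files in a union with a __u64 pad. That keeps every later field at the same offset whether userspace is built for 32-bit or 64-bit x86. Casting user_data through uintptr_t avoids the integer-to-pointer size warning on 32-bit. Below is a sketch that checks the offset at compile time and round-trips a pointer through a 64-bit field. It uses `<stdint.h>` types in place of the kernel's __u64/__s32 and a hypothetical `reg_files` struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of io_uring_register_files after the change. */
struct reg_files {
	union {
		int32_t *fds;    /* 4 or 8 bytes depending on the ABI ... */
		uint64_t pad;    /* ... but the union is always 8 bytes */
	};
	uint32_t nr_fds;
};

/* nr_fds sits at byte 8 on both ILP32 (i386) and LP64 (x86-64). */
_Static_assert(offsetof(struct reg_files, nr_fds) == 8, "nr_fds at byte 8");

/* Store a pointer in a 64-bit field without a size-mismatch cast. */
static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t) (uintptr_t) p;
}

/* Recover the pointer, as reap_events() does with cqe->user_data. */
static void *u64_to_ptr(uint64_t v)
{
	return (void *) (uintptr_t) v;
}
```

The check covers field offsets only. sizeof(struct reg_files) can still differ (12 bytes on i386, 16 on x86-64) because the i386 ABI aligns 64-bit integers to 4 bytes inside structs.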

* Recent changes (master)
@ 2019-01-13 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-13 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e0abe38815e1c8cf7a319c6fbd0b1d60691db3d5:

  t/io_uring: only set IORING_ENTER_GETEVENTS when actively reaping (2019-01-11 14:40:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8025517dfa599be4bc795e4af7c9012d10b81bc5:

  t/io_uring: add IORING_OP_NOP support (2019-01-12 22:14:54 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/io_uring: add IORING_OP_NOP support

 os/io_uring.h |  1 +
 t/io_uring.c  | 28 +++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/os/io_uring.h b/os/io_uring.h
index 74370aed..0f4460d6 100644
--- a/os/io_uring.h
+++ b/os/io_uring.h
@@ -47,6 +47,7 @@ struct io_uring_sqe {
 #define IORING_SETUP_SQPOLL	(1 << 1)	/* SQ poll thread */
 #define IORING_SETUP_SQ_AFF	(1 << 2)	/* sq_thread_cpu is valid */
 
+#define IORING_OP_NOP		0
 #define IORING_OP_READV		1
 #define IORING_OP_WRITEV	2
 #define IORING_OP_FSYNC		3
diff --git a/t/io_uring.c b/t/io_uring.c
index d4160c3d..8d3f3a9b 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -93,6 +93,7 @@ static int fixedbufs = 1;	/* use fixed user buffers */
 static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
 static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
+static int do_nop = 0;		/* no-op SQ ring commands */
 
 static int io_uring_register_buffers(struct submitter *s)
 {
@@ -101,6 +102,9 @@ static int io_uring_register_buffers(struct submitter *s)
 		.nr_iovecs = DEPTH
 	};
 
+	if (do_nop)
+		return 0;
+
 	return syscall(__NR_sys_io_uring_register, s->ring_fd,
 			IORING_REGISTER_BUFFERS, &reg);
 }
@@ -110,6 +114,9 @@ static int io_uring_register_files(struct submitter *s)
 	struct io_uring_register_files reg;
 	int i;
 
+	if (do_nop)
+		return 0;
+
 	s->fds = calloc(s->nr_files, sizeof(__s32));
 	for (i = 0; i < s->nr_files; i++) {
 		s->fds[i] = s->files[i].real_fd;
@@ -151,6 +158,11 @@ static void init_io(struct submitter *s, unsigned index)
 	struct file *f;
 	long r;
 
+	if (do_nop) {
+		sqe->opcode = IORING_OP_NOP;
+		return;
+	}
+
 	if (s->nr_files == 1) {
 		f = &s->files[0];
 	} else {
@@ -248,11 +260,13 @@ static int reap_events(struct submitter *s)
 		if (head == *ring->tail)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
-		f = (struct file *) cqe->user_data;
-		f->pending_ios--;
-		if (cqe->res != BS) {
-			printf("io: unexpected ret=%d\n", cqe->res);
-			return -1;
+		if (!do_nop) {
+			f = (struct file *) cqe->user_data;
+			f->pending_ios--;
+			if (cqe->res != BS) {
+				printf("io: unexpected ret=%d\n", cqe->res);
+				return -1;
+			}
 		}
 		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
 			s->cachehit++;
@@ -454,7 +468,7 @@ int main(int argc, char *argv[])
 	struct rlimit rlim;
 	void *ret;
 
-	if (argc < 2) {
+	if (!do_nop && argc < 2) {
 		printf("%s: filename\n", argv[0]);
 		return 1;
 	}
@@ -464,7 +478,7 @@ int main(int argc, char *argv[])
 		flags |= O_DIRECT;
 
 	i = 1;
-	while (i < argc) {
+	while (!do_nop && i < argc) {
 		struct file *f = &s->files[s->nr_files];
 
 		fd = open(argv[i], flags);

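With do_nop set, t/io_uring submits IORING_OP_NOP entries that carry no file. reap_events() therefore has to skip both the per-file bookkeeping keyed off user_data and the result check against BS. Below is a sketch of that reap loop over an in-memory CQ ring. It assumes C11 acquire/release atomics in place of barrier() and uses hypothetical `toy_cq`/`toy_cqe`/`toy_file` types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 16
#define CQ_MASK    (CQ_ENTRIES - 1)
#define BS         4096

struct toy_cqe {
	uint64_t user_data;   /* struct toy_file * for reads, unused for NOPs */
	int32_t  res;
};

struct toy_file {
	unsigned pending_ios;
};

struct toy_cq {
	_Atomic unsigned head;   /* advanced by the application */
	_Atomic unsigned tail;   /* advanced by the kernel */
	struct toy_cqe cqes[CQ_ENTRIES];
};

/* Returns the number of reaped events, or -1 on an unexpected result. */
static int toy_reap(struct toy_cq *cq, bool do_nop)
{
	unsigned head = atomic_load_explicit(&cq->head, memory_order_relaxed);
	int reaped = 0;

	for (;;) {
		/* Acquire: see the CQE contents written before tail moved. */
		unsigned tail = atomic_load_explicit(&cq->tail,
						     memory_order_acquire);
		struct toy_cqe *cqe;

		if (head == tail)
			break;
		cqe = &cq->cqes[head & CQ_MASK];
		if (!do_nop) {
			struct toy_file *f =
				(struct toy_file *) (uintptr_t) cqe->user_data;

			f->pending_ios--;
			if (cqe->res != BS)
				return -1;
		}
		reaped++;
		head++;
	}
	/* Release: let the producer reuse the slots consumed above. */
	atomic_store_explicit(&cq->head, head, memory_order_release);
	return reaped;
}
```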

* Recent changes (master)
@ 2019-01-12 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-12 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a7abc9fb769596d3bbf6d779e99d1cb8c1fcd49b:

  t/io_uring: add support for registered files (2019-01-10 22:29:27 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e0abe38815e1c8cf7a319c6fbd0b1d60691db3d5:

  t/io_uring: only set IORING_ENTER_GETEVENTS when actively reaping (2019-01-11 14:40:16 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      io_uring: update to newer API
      t/io_uring: remember to set p->sq_thread_cpu
      engines/io_uring: remove unused ld->io_us array
      t/io_uring: only set IORING_ENTER_GETEVENTS when actively reaping

 engines/io_uring.c | 27 +++++++++++++++------------
 os/io_uring.h      | 31 ++++++++++++++++++++-----------
 t/io_uring.c       | 47 ++++++++++++++++++++++++++++-------------------
 3 files changed, 63 insertions(+), 42 deletions(-)

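After this update, each file in t/io_uring carries two descriptors:

- `fixed_fd`, its index in the registered file array, which an SQE uses with IOSQE_FIXED_FILE set.
- `real_fd`, the descriptor from open(), which fstat() and the BLKGETSIZE64 ioctl() keep using.

Below is a sketch of the bookkeeping io_uring_register_files() performs, with the io_uring_register() syscall left out. `register_table()` is a hypothetical helper. The update keeps the array in s->fds instead of freeing it after the call; this sketch simply hands it back to the caller to own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct file {
	int real_fd;    /* descriptor from open(), used for fstat()/ioctl() */
	int fixed_fd;   /* index into the registered file table, used in SQEs */
};

/*
 * Build the int32_t array that IORING_REGISTER_FILES takes and record each
 * file's slot in it. Returns NULL on allocation failure.
 */
static int32_t *register_table(struct file *files, unsigned nr)
{
	int32_t *fds = calloc(nr, sizeof(*fds));
	unsigned i;

	if (!fds)
		return NULL;
	for (i = 0; i < nr; i++) {
		fds[i] = files[i].real_fd;
		files[i].fixed_fd = (int) i;
	}
	return fds;
}
```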
---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 7591190a..ca3e157f 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -48,7 +48,6 @@ struct ioring_mmap {
 struct ioring_data {
 	int ring_fd;
 
-	struct io_u **io_us;
 	struct io_u **io_u_index;
 
 	struct io_sq_ring sq_ring;
@@ -150,25 +149,31 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	sqe->buf_index = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
-		if (io_u->ddir == DDIR_READ)
-			sqe->opcode = IORING_OP_READV;
-		else
-			sqe->opcode = IORING_OP_WRITEV;
-
 		if (o->fixedbufs) {
-			sqe->flags |= IOSQE_FIXED_BUFFER;
+			if (io_u->ddir == DDIR_READ)
+				sqe->opcode = IORING_OP_READ_FIXED;
+			else
+				sqe->opcode = IORING_OP_WRITE_FIXED;
 			sqe->addr = io_u->xfer_buf;
 			sqe->len = io_u->xfer_buflen;
 			sqe->buf_index = io_u->index;
 		} else {
+			if (io_u->ddir == DDIR_READ)
+				sqe->opcode = IORING_OP_READV;
+			else
+				sqe->opcode = IORING_OP_WRITEV;
 			sqe->addr = &ld->iovecs[io_u->index];
 			sqe->len = 1;
 		}
 		sqe->off = io_u->offset;
-	} else if (ddir_sync(io_u->ddir))
+	} else if (ddir_sync(io_u->ddir)) {
+		sqe->fsync_flags = 0;
+		if (io_u->ddir == DDIR_DATASYNC)
+			sqe->fsync_flags |= IORING_FSYNC_DATASYNC;
 		sqe->opcode = IORING_OP_FSYNC;
+	}
 
-	sqe->data = (unsigned long) io_u;
+	sqe->user_data = (unsigned long) io_u;
 	return 0;
 }
 
@@ -182,7 +187,7 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	cqe = &ld->cq_ring.cqes[index];
-	io_u = (struct io_u *) cqe->data;
+	io_u = (struct io_u *) cqe->user_data;
 
 	if (cqe->res != io_u->xfer_buflen) {
 		if (cqe->res > io_u->xfer_buflen)
@@ -390,7 +395,6 @@ static void fio_ioring_cleanup(struct thread_data *td)
 			fio_ioring_unmap(ld);
 
 		free(ld->io_u_index);
-		free(ld->io_us);
 		free(ld->iovecs);
 		free(ld);
 	}
@@ -526,7 +530,6 @@ static int fio_ioring_init(struct thread_data *td)
 
 	/* io_u index */
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
-	ld->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
 	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
 
 	td->io_ops_data = ld;
diff --git a/os/io_uring.h b/os/io_uring.h
index e1d3df0b..74370aed 100644
--- a/os/io_uring.h
+++ b/os/io_uring.h
@@ -16,7 +16,7 @@
  */
 struct io_uring_sqe {
 	__u8	opcode;		/* type of operation for this sqe */
-	__u8	flags;		/* IOSQE_ flags below */
+	__u8	flags;		/* IOSQE_ flags */
 	__u16	ioprio;		/* ioprio for the request */
 	__s32	fd;		/* file descriptor to do IO on */
 	__u64	off;		/* offset into file */
@@ -27,18 +27,18 @@ struct io_uring_sqe {
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
 		__kernel_rwf_t	rw_flags;
-		__u32		__resv;
+		__u32		fsync_flags;
 	};
 	__u16	buf_index;	/* index into fixed buffers, if used */
-	__u16	__pad2[3];
-	__u64	data;		/* data to be passed back at completion time */
+	__u16	__pad2;
+	__u32	__pad3;
+	__u64	user_data;	/* data to be passed back at completion time */
 };
 
 /*
  * sqe->flags
  */
-#define IOSQE_FIXED_BUFFER	(1 << 0)	/* use fixed buffer */
-#define IOSQE_FIXED_FILE	(1 << 1)	/* use fixed fileset */
+#define IOSQE_FIXED_FILE	(1 << 0)	/* use fixed fileset */
 
 /*
  * io_uring_setup() flags
@@ -50,13 +50,19 @@ struct io_uring_sqe {
 #define IORING_OP_READV		1
 #define IORING_OP_WRITEV	2
 #define IORING_OP_FSYNC		3
-#define IORING_OP_FDSYNC	4
+#define IORING_OP_READ_FIXED	4
+#define IORING_OP_WRITE_FIXED	5
+
+/*
+ * sqe->fsync_flags
+ */
+#define IORING_FSYNC_DATASYNC	(1 << 0)
 
 /*
  * IO completion data structure (Completion Queue Entry)
  */
 struct io_uring_cqe {
-	__u64	data;		/* sqe->data submission passed back */
+	__u64	user_data;	/* sqe->data submission passed back */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
 };
@@ -87,6 +93,9 @@ struct io_sqring_offsets {
 	__u32 resv[3];
 };
 
+/*
+ * sq_ring->flags
+ */
 #define IORING_SQ_NEED_WAKEUP	(1 << 0) /* needs io_uring_enter wakeup */
 
 struct io_cqring_offsets {
@@ -127,12 +136,12 @@ struct io_uring_params {
 
 struct io_uring_register_buffers {
 	struct iovec *iovecs;
-	unsigned nr_iovecs;
+	__u32 nr_iovecs;
 };
 
 struct io_uring_register_files {
-	int *fds;
-	unsigned nr_fds;
+	__s32 *fds;
+	__u32 nr_fds;
 };
 
 #endif
diff --git a/t/io_uring.c b/t/io_uring.c
index 0461329b..d4160c3d 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -59,7 +59,8 @@ static unsigned sq_ring_mask, cq_ring_mask;
 struct file {
 	unsigned long max_blocks;
 	unsigned pending_ios;
-	int fd;
+	int real_fd;
+	int fixed_fd;
 };
 
 struct submitter {
@@ -77,6 +78,8 @@ struct submitter {
 	unsigned long cachehit, cachemiss;
 	volatile int finish;
 
+	__s32 *fds;
+
 	struct file files[MAX_FDS];
 	unsigned nr_files;
 	unsigned cur_file;
@@ -105,17 +108,18 @@ static int io_uring_register_buffers(struct submitter *s)
 static int io_uring_register_files(struct submitter *s)
 {
 	struct io_uring_register_files reg;
-	int i, ret;
+	int i;
 
-	reg.fds = calloc(s->nr_files, sizeof(int));
-	for (i = 0; i < s->nr_files; i++)
-		reg.fds[i] = s->files[i].fd;
+	s->fds = calloc(s->nr_files, sizeof(__s32));
+	for (i = 0; i < s->nr_files; i++) {
+		s->fds[i] = s->files[i].real_fd;
+		s->files[i].fixed_fd = i;
+	}
+	reg.fds = s->fds;
 	reg.nr_fds = s->nr_files;
 
-	ret = syscall(__NR_sys_io_uring_register, s->ring_fd,
+	return syscall(__NR_sys_io_uring_register, s->ring_fd,
 			IORING_REGISTER_FILES, &reg);
-	free(reg.fds);
-	return ret;
 }
 
 static int io_uring_setup(unsigned entries, struct io_uring_params *p)
@@ -163,21 +167,21 @@ static void init_io(struct submitter *s, unsigned index)
 	offset = (r % (f->max_blocks - 1)) * BS;
 
 	sqe->flags = IOSQE_FIXED_FILE;
-	sqe->opcode = IORING_OP_READV;
 	if (fixedbufs) {
+		sqe->opcode = IORING_OP_READ_FIXED;
 		sqe->addr = s->iovecs[index].iov_base;
 		sqe->len = BS;
 		sqe->buf_index = index;
-		sqe->flags |= IOSQE_FIXED_BUFFER;
 	} else {
+		sqe->opcode = IORING_OP_READV;
 		sqe->addr = &s->iovecs[index];
 		sqe->len = 1;
 		sqe->buf_index = 0;
 	}
 	sqe->ioprio = 0;
-	sqe->fd = f->fd;
+	sqe->fd = f->fixed_fd;
 	sqe->off = offset;
-	sqe->data = (unsigned long) f;
+	sqe->user_data = (unsigned long) f;
 }
 
 static int prep_more_ios(struct submitter *s, int max_ios)
@@ -212,12 +216,12 @@ static int get_file_size(struct file *f)
 {
 	struct stat st;
 
-	if (fstat(f->fd, &st) < 0)
+	if (fstat(f->real_fd, &st) < 0)
 		return -1;
 	if (S_ISBLK(st.st_mode)) {
 		unsigned long long bytes;
 
-		if (ioctl(f->fd, BLKGETSIZE64, &bytes) != 0)
+		if (ioctl(f->real_fd, BLKGETSIZE64, &bytes) != 0)
 			return -1;
 
 		f->max_blocks = bytes / BS;
@@ -244,7 +248,7 @@ static int reap_events(struct submitter *s)
 		if (head == *ring->tail)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
-		f = (struct file *) cqe->data;
+		f = (struct file *) cqe->user_data;
 		f->pending_ios--;
 		if (cqe->res != BS) {
 			printf("io: unexpected ret=%d\n", cqe->res);
@@ -296,8 +300,11 @@ submit:
 		 * poll, or if IORING_SQ_NEED_WAKEUP is set.
 		 */
 		if (!sq_thread_poll || (*ring->flags & IORING_SQ_NEED_WAKEUP)) {
-			ret = io_uring_enter(s, to_submit, to_wait,
-						IORING_ENTER_GETEVENTS);
+			unsigned flags = 0;
+
+			if (to_wait)
+				flags = IORING_ENTER_GETEVENTS;
+			ret = io_uring_enter(s, to_submit, to_wait, flags);
 			s->calls++;
 		}
 
@@ -382,8 +389,10 @@ static int setup_ring(struct submitter *s)
 		p.flags |= IORING_SETUP_IOPOLL;
 	if (sq_thread_poll) {
 		p.flags |= IORING_SETUP_SQPOLL;
-		if (sq_thread_cpu != -1)
+		if (sq_thread_cpu != -1) {
 			p.flags |= IORING_SETUP_SQ_AFF;
+			p.sq_thread_cpu = sq_thread_cpu;
+		}
 	}
 
 	fd = io_uring_setup(DEPTH, &p);
@@ -463,7 +472,7 @@ int main(int argc, char *argv[])
 			perror("open");
 			return 1;
 		}
-		f->fd = fd;
+		f->real_fd = fd;
 		if (get_file_size(f)) {
 			printf("failed getting size of device/file\n");
 			return 1;

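This API update replaces the IOSQE_FIXED_BUFFER flag with dedicated READ_FIXED/WRITE_FIXED opcodes and adds sqe->fsync_flags for data-only syncs. Below is a sketch of the resulting opcode and flag choice in fio_ioring_prep(). It uses the opcode and flag values from os/io_uring.h as of this update, and a hypothetical `enum dir` instead of fio's DDIR_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from os/io_uring.h as of this update. */
#define IORING_OP_READV        1
#define IORING_OP_WRITEV       2
#define IORING_OP_FSYNC        3
#define IORING_OP_READ_FIXED   4
#define IORING_OP_WRITE_FIXED  5
#define IORING_FSYNC_DATASYNC  (1U << 0)

/* Hypothetical stand-in for fio's DDIR_* values. */
enum dir { DIR_READ, DIR_WRITE, DIR_SYNC, DIR_DATASYNC };

struct prep {
	uint8_t  opcode;
	uint32_t fsync_flags;
};

/* Pick the SQE opcode and fsync flags for one I/O direction. */
static struct prep choose_opcode(enum dir d, bool fixedbufs)
{
	struct prep p = { 0, 0 };

	switch (d) {
	case DIR_READ:
		p.opcode = fixedbufs ? IORING_OP_READ_FIXED : IORING_OP_READV;
		break;
	case DIR_WRITE:
		p.opcode = fixedbufs ? IORING_OP_WRITE_FIXED : IORING_OP_WRITEV;
		break;
	case DIR_SYNC:
		p.opcode = IORING_OP_FSYNC;
		break;
	case DIR_DATASYNC:
		/* A datasync is an fsync with IORING_FSYNC_DATASYNC set. */
		p.opcode = IORING_OP_FSYNC;
		p.fsync_flags = IORING_FSYNC_DATASYNC;
		break;
	}
	return p;
}
```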

* Recent changes (master)
@ 2019-01-11 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-11 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 650346e17d045366b817814dd3e10dc94d0d990f:

  engines/io_uring: always setup ld->iovecs[] (2019-01-09 15:13:06 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a7abc9fb769596d3bbf6d779e99d1cb8c1fcd49b:

  t/io_uring: add support for registered files (2019-01-10 22:29:27 -0700)

----------------------------------------------------------------
Jens Axboe (10):
      Update to newer io_uring API
      Makefile: make t/io_uring depend on os/io_uring.h
      io_uring: io_uring_setup(2) takes a 'nr_iovecs' field now
      Update io_uring API
      io_uring: cleanup sq thread poll/cpu setup
      t/io_uring: restore usage of IORING_SETUP_IOPOLL
      t/io_uring: make more efficient for multiple files
      t/io_uring: enable SQ thread poll mode
      t/io_uring: make submits/reaps per-second reflected with sq thread poll
      t/io_uring: add support for registered files

 Makefile           |   1 +
 arch/arch-x86_64.h |   7 +-
 engines/io_uring.c | 100 +++++++++++-----------
 os/io_uring.h      |  50 ++++++++---
 t/io_uring.c       | 242 +++++++++++++++++++++++++++++++++++++----------------
 5 files changed, 264 insertions(+), 136 deletions(-)

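This batch folds the old sqthread/sqwq engine options into sqthread_poll and sqthread_poll_cpu:

- IORING_SETUP_SQPOLL is requested when polling is enabled.
- IORING_SETUP_SQ_AFF plus sq_thread_cpu are set only when a CPU was given.
- With SQPOLL active, the commit path calls io_uring_enter() only if the kernel has set IORING_SQ_NEED_WAKEUP.

Below is a sketch of that logic. It uses flag values as they appear in os/io_uring.h elsewhere in this archive, plus hypothetical `toy_*` types and helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IORING_SETUP_IOPOLL    (1U << 0)
#define IORING_SETUP_SQPOLL    (1U << 1)
#define IORING_SETUP_SQ_AFF    (1U << 2)
#define IORING_SQ_NEED_WAKEUP  (1U << 0)

/* Hypothetical subset of struct ioring_options. */
struct toy_opts {
	bool hipri;
	bool sqpoll_thread;    /* sqthread_poll */
	bool sqpoll_set;       /* sqthread_poll_cpu was given */
	unsigned sqpoll_cpu;
};

/* Hypothetical subset of struct io_uring_params. */
struct toy_params {
	uint32_t flags;
	uint32_t sq_thread_cpu;
};

static struct toy_params setup_params(const struct toy_opts *o)
{
	struct toy_params p = { 0, 0 };

	if (o->hipri)
		p.flags |= IORING_SETUP_IOPOLL;
	if (o->sqpoll_thread) {
		p.flags |= IORING_SETUP_SQPOLL;
		/* A CPU only matters when the SQ poll thread is enabled. */
		if (o->sqpoll_set) {
			p.flags |= IORING_SETUP_SQ_AFF;
			p.sq_thread_cpu = o->sqpoll_cpu;
		}
	}
	return p;
}

/* With SQPOLL, a syscall is only needed if the kernel thread went idle. */
static bool need_enter(bool sqpoll, uint32_t sq_flags)
{
	return !sqpoll || (sq_flags & IORING_SQ_NEED_WAKEUP);
}
```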
---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5bc82f9a..3701317e 100644
--- a/Makefile
+++ b/Makefile
@@ -444,6 +444,7 @@ cairo_text_helpers.o: cairo_text_helpers.c cairo_text_helpers.h
 printing.o: printing.c printing.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
 
+t/io_uring.o: os/io_uring.h
 t/io_uring: $(T_IOU_RING_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_IOU_RING_OBJS) $(LIBS)
 
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index a5864bab..665c6b04 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -2,10 +2,13 @@
 #define ARCH_X86_64_H
 
 #ifndef __NR_sys_io_uring_setup
-#define __NR_sys_io_uring_setup	335
+#define __NR_sys_io_uring_setup		335
 #endif
 #ifndef __NR_sys_io_uring_enter
-#define __NR_sys_io_uring_enter	336
+#define __NR_sys_io_uring_enter		336
+#endif
+#ifndef __NR_sys_io_uring_register
+#define __NR_sys_io_uring_register	337
 #endif
 
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 06355e9c..7591190a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -73,18 +73,17 @@ struct ioring_options {
 	void *pad;
 	unsigned int hipri;
 	unsigned int fixedbufs;
-	unsigned int sqthread;
-	unsigned int sqthread_set;
-	unsigned int sqthread_poll;
-	unsigned int sqwq;
+	unsigned int sqpoll_thread;
+	unsigned int sqpoll_set;
+	unsigned int sqpoll_cpu;
 };
 
-static int fio_ioring_sqthread_cb(void *data, unsigned long long *val)
+static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 {
 	struct ioring_options *o = data;
 
-	o->sqthread = *val;
-	o->sqthread_set = 1;
+	o->sqpoll_cpu = *val;
+	o->sqpoll_set = 1;
 	return 0;
 }
 
@@ -107,30 +106,21 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
-	{
-		.name	= "sqthread",
-		.lname	= "Use kernel SQ thread on this CPU",
-		.type	= FIO_OPT_INT,
-		.cb	= fio_ioring_sqthread_cb,
-		.help	= "Offload submission to kernel thread",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
 	{
 		.name	= "sqthread_poll",
-		.lname	= "Kernel SQ thread should poll",
-		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct ioring_options, sqthread_poll),
-		.help	= "Used with sqthread, enables kernel side polling",
+		.lname	= "Kernel SQ thread polling",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct ioring_options, sqpoll_thread),
+		.help	= "Offload submission/completion to kernel thread",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
 	{
-		.name	= "sqwq",
-		.lname	= "Offload submission to kernel workqueue",
-		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct ioring_options, sqwq),
-		.help	= "Offload submission to kernel workqueue",
+		.name	= "sqthread_poll_cpu",
+		.lname	= "SQ Thread Poll CPU",
+		.type	= FIO_OPT_INT,
+		.cb	= fio_ioring_sqpoll_cb,
+		.help	= "What CPU to run SQ thread polling on",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
@@ -157,20 +147,20 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	sqe->fd = f->fd;
 	sqe->flags = 0;
 	sqe->ioprio = 0;
+	sqe->buf_index = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		if (io_u->ddir == DDIR_READ)
+			sqe->opcode = IORING_OP_READV;
+		else
+			sqe->opcode = IORING_OP_WRITEV;
+
 		if (o->fixedbufs) {
-			if (io_u->ddir == DDIR_READ)
-				sqe->opcode = IORING_OP_READ_FIXED;
-			else
-				sqe->opcode = IORING_OP_WRITE_FIXED;
+			sqe->flags |= IOSQE_FIXED_BUFFER;
 			sqe->addr = io_u->xfer_buf;
 			sqe->len = io_u->xfer_buflen;
+			sqe->buf_index = io_u->index;
 		} else {
-			if (io_u->ddir == DDIR_READ)
-				sqe->opcode = IORING_OP_READV;
-			else
-				sqe->opcode = IORING_OP_WRITEV;
 			sqe->addr = &ld->iovecs[io_u->index];
 			sqe->len = 1;
 		}
@@ -178,6 +168,7 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	} else if (ddir_sync(io_u->ddir))
 		sqe->opcode = IORING_OP_FSYNC;
 
+	sqe->data = (unsigned long) io_u;
 	return 0;
 }
 
@@ -191,7 +182,7 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	cqe = &ld->cq_ring.cqes[index];
-	io_u = ld->io_u_index[cqe->index];
+	io_u = (struct io_u *) cqe->data;
 
 	if (cqe->res != io_u->xfer_buflen) {
 		if (cqe->res > io_u->xfer_buflen)
@@ -250,7 +241,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 			continue;
 		}
 
-		if (!o->sqthread_poll) {
+		if (!o->sqpoll_thread) {
 			r = io_uring_enter(ld, 0, actual_min,
 						IORING_ENTER_GETEVENTS);
 			if (r < 0) {
@@ -332,10 +323,15 @@ static int fio_ioring_commit(struct thread_data *td)
 	if (!ld->queued)
 		return 0;
 
-	/* Nothing to do */
-	if (o->sqthread_poll) {
+	/*
+	 * Kernel side does submission. just need to check if the ring is
+	 * flagged as needing a kick, if so, call io_uring_enter(). This
+	 * only happens if we've been idle too long.
+	 */
+	if (o->sqpoll_thread) {
 		struct io_sq_ring *ring = &ld->sq_ring;
 
+		read_barrier();
 		if (*ring->flags & IORING_SQ_NEED_WAKEUP)
 			io_uring_enter(ld, ld->queued, 0, 0);
 		ld->queued = 0;
@@ -445,7 +441,6 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 	int depth = td->o.iodepth;
-	struct iovec *vecs = NULL;
 	struct io_uring_params p;
 	int ret;
 
@@ -453,14 +448,13 @@ static int fio_ioring_queue_init(struct thread_data *td)
 
 	if (o->hipri)
 		p.flags |= IORING_SETUP_IOPOLL;
-	if (o->sqthread_set) {
-		p.sq_thread_cpu = o->sqthread;
-		p.flags |= IORING_SETUP_SQTHREAD;
-		if (o->sqthread_poll)
-			p.flags |= IORING_SETUP_SQPOLL;
+	if (o->sqpoll_thread) {
+		p.flags |= IORING_SETUP_SQPOLL;
+		if (o->sqpoll_set) {
+			p.flags |= IORING_SETUP_SQ_AFF;
+			p.sq_thread_cpu = o->sqpoll_cpu;
+		}
 	}
-	if (o->sqwq)
-		p.flags |= IORING_SETUP_SQWQ;
 
 	if (o->fixedbufs) {
 		struct rlimit rlim = {
@@ -469,14 +463,26 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		};
 
 		setrlimit(RLIMIT_MEMLOCK, &rlim);
-		vecs = ld->iovecs;
 	}
 
-	ret = syscall(__NR_sys_io_uring_setup, depth, vecs, &p);
+	ret = syscall(__NR_sys_io_uring_setup, depth, &p);
 	if (ret < 0)
 		return ret;
 
 	ld->ring_fd = ret;
+
+	if (o->fixedbufs) {
+		struct io_uring_register_buffers reg = {
+			.iovecs = ld->iovecs,
+			.nr_iovecs = depth
+		};
+
+		ret = syscall(__NR_sys_io_uring_register, ld->ring_fd,
+				IORING_REGISTER_BUFFERS, &reg);
+		if (ret < 0)
+			return ret;
+	}
+
 	return fio_ioring_mmap(ld, &p);
 }
 
diff --git a/os/io_uring.h b/os/io_uring.h
index 20e4c22e..e1d3df0b 100644
--- a/os/io_uring.h
+++ b/os/io_uring.h
@@ -15,42 +15,48 @@
  * IO submission data structure (Submission Queue Entry)
  */
 struct io_uring_sqe {
-	__u8	opcode;
-	__u8	flags;
-	__u16	ioprio;
-	__s32	fd;
-	__u64	off;
+	__u8	opcode;		/* type of operation for this sqe */
+	__u8	flags;		/* IOSQE_ flags below */
+	__u16	ioprio;		/* ioprio for the request */
+	__s32	fd;		/* file descriptor to do IO on */
+	__u64	off;		/* offset into file */
 	union {
-		void	*addr;
+		void	*addr;	/* buffer or iovecs */
 		__u64	__pad;
 	};
-	__u32	len;
+	__u32	len;		/* buffer size or number of iovecs */
 	union {
 		__kernel_rwf_t	rw_flags;
 		__u32		__resv;
 	};
+	__u16	buf_index;	/* index into fixed buffers, if used */
+	__u16	__pad2[3];
+	__u64	data;		/* data to be passed back at completion time */
 };
 
+/*
+ * sqe->flags
+ */
+#define IOSQE_FIXED_BUFFER	(1 << 0)	/* use fixed buffer */
+#define IOSQE_FIXED_FILE	(1 << 1)	/* use fixed fileset */
+
 /*
  * io_uring_setup() flags
  */
 #define IORING_SETUP_IOPOLL	(1 << 0)	/* io_context is polled */
-#define	IORING_SETUP_SQTHREAD	(1 << 1)	/* Use SQ thread */
-#define IORING_SETUP_SQWQ	(1 << 2)	/* Use SQ workqueue */
-#define IORING_SETUP_SQPOLL	(1 << 3)	/* SQ thread polls */
+#define IORING_SETUP_SQPOLL	(1 << 1)	/* SQ poll thread */
+#define IORING_SETUP_SQ_AFF	(1 << 2)	/* sq_thread_cpu is valid */
 
 #define IORING_OP_READV		1
 #define IORING_OP_WRITEV	2
 #define IORING_OP_FSYNC		3
 #define IORING_OP_FDSYNC	4
-#define IORING_OP_READ_FIXED	5
-#define IORING_OP_WRITE_FIXED	6
 
 /*
  * IO completion data structure (Completion Queue Entry)
  */
 struct io_uring_cqe {
-	__u64	index;		/* what sqe this event came from */
+	__u64	data;		/* sqe->data submission passed back */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
 };
@@ -111,4 +117,22 @@ struct io_uring_params {
 	struct io_cqring_offsets cq_off;
 };
 
+/*
+ * io_uring_register(2) opcodes and arguments
+ */
+#define IORING_REGISTER_BUFFERS		0
+#define IORING_UNREGISTER_BUFFERS	1
+#define IORING_REGISTER_FILES		2
+#define IORING_UNREGISTER_FILES		3
+
+struct io_uring_register_buffers {
+	struct iovec *iovecs;
+	unsigned nr_iovecs;
+};
+
+struct io_uring_register_files {
+	int *fds;
+	unsigned nr_fds;
+};
+
 #endif
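
With the reworked flags, IORING_SETUP_SQPOLL requests a kernel polling thread. IORING_SETUP_SQ_AFF additionally tells the kernel that sq_thread_cpu is valid, so the flag and the CPU number belong together. The sketch below uses a hypothetical cut-down params struct and copies the three flag values from the header hunk above. Note one thing about the setup_ring() hunk in t/io_uring.c below: as shown, it sets IORING_SETUP_SQ_AFF when sq_thread_cpu != -1 but no longer assigns p.sq_thread_cpu. The earlier memset() therefore leaves the requested CPU at 0.

```c
#include <assert.h>
#include <string.h>

/* Flag values as defined in the os/io_uring.h hunk above. */
#define IORING_SETUP_IOPOLL	(1U << 0)
#define IORING_SETUP_SQPOLL	(1U << 1)
#define IORING_SETUP_SQ_AFF	(1U << 2)

/* Hypothetical cut-down io_uring_params: only the fields used here. */
struct sketch_params {
	unsigned flags;
	unsigned sq_thread_cpu;
};

/*
 * Build setup parameters; cpu < 0 means "no affinity requested".
 * SQ_AFF only makes sense with SQPOLL, and it promises the kernel that
 * sq_thread_cpu is filled in, so both are set in the same branch.
 */
static void build_params(struct sketch_params *p, int hipri, int sqpoll,
			 int cpu)
{
	memset(p, 0, sizeof(*p));
	if (hipri)
		p->flags |= IORING_SETUP_IOPOLL;
	if (sqpoll) {
		p->flags |= IORING_SETUP_SQPOLL;
		if (cpu >= 0) {
			p->flags |= IORING_SETUP_SQ_AFF;
			p->sq_thread_cpu = (unsigned) cpu;
		}
	}
	/* Invariant: SQ_AFF never appears without SQPOLL. */
	assert(!(p->flags & IORING_SETUP_SQ_AFF) ||
	       (p->flags & IORING_SETUP_SQPOLL));
}
```
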
diff --git a/t/io_uring.c b/t/io_uring.c
index 3edc87c6..0461329b 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -33,6 +33,7 @@ struct io_sq_ring {
 	unsigned *tail;
 	unsigned *ring_mask;
 	unsigned *ring_entries;
+	unsigned *flags;
 	unsigned *array;
 };
 
@@ -44,18 +45,25 @@ struct io_cq_ring {
 	struct io_uring_cqe *cqes;
 };
 
-#define DEPTH			32
+#define DEPTH			128
 
-#define BATCH_SUBMIT		8
-#define BATCH_COMPLETE		8
+#define BATCH_SUBMIT		64
+#define BATCH_COMPLETE		64
 
 #define BS			4096
 
+#define MAX_FDS			16
+
 static unsigned sq_ring_mask, cq_ring_mask;
 
+struct file {
+	unsigned long max_blocks;
+	unsigned pending_ios;
+	int fd;
+};
+
 struct submitter {
 	pthread_t thread;
-	unsigned long max_blocks;
 	int ring_fd;
 	struct drand48_data rand;
 	struct io_sq_ring sq_ring;
@@ -68,22 +76,51 @@ struct submitter {
 	unsigned long calls;
 	unsigned long cachehit, cachemiss;
 	volatile int finish;
-	char filename[128];
+
+	struct file files[MAX_FDS];
+	unsigned nr_files;
+	unsigned cur_file;
 };
 
 static struct submitter submitters[1];
 static volatile int finish;
 
 static int polled = 1;		/* use IO polling */
-static int fixedbufs = 0;	/* use fixed user buffers */
+static int fixedbufs = 1;	/* use fixed user buffers */
 static int buffered = 0;	/* use buffered IO, not O_DIRECT */
-static int sq_thread = 0;	/* use kernel submission thread */
-static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
+static int sq_thread_poll = 0;	/* use kernel submission/poller thread */
+static int sq_thread_cpu = -1;	/* pin above thread to this CPU */
 
-static int io_uring_setup(unsigned entries, struct iovec *iovecs,
-			  struct io_uring_params *p)
+static int io_uring_register_buffers(struct submitter *s)
 {
-	return syscall(__NR_sys_io_uring_setup, entries, iovecs, p);
+	struct io_uring_register_buffers reg = {
+		.iovecs = s->iovecs,
+		.nr_iovecs = DEPTH
+	};
+
+	return syscall(__NR_sys_io_uring_register, s->ring_fd,
+			IORING_REGISTER_BUFFERS, &reg);
+}
+
+static int io_uring_register_files(struct submitter *s)
+{
+	struct io_uring_register_files reg;
+	int i, ret;
+
+	reg.fds = calloc(s->nr_files, sizeof(int));
+	for (i = 0; i < s->nr_files; i++)
+		reg.fds[i] = s->files[i].fd;
+	reg.nr_fds = s->nr_files;
+
+	ret = syscall(__NR_sys_io_uring_register, s->ring_fd,
+			IORING_REGISTER_FILES, &reg);
+	free(reg.fds);
+	return ret;
+}
+
+static int io_uring_setup(unsigned entries, struct io_uring_params *p)
+{
+	return syscall(__NR_sys_io_uring_setup, entries, p);
 }
 
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
@@ -98,31 +135,52 @@ static int gettid(void)
 	return syscall(__NR_gettid);
 }
 
-static void init_io(struct submitter *s, int fd, unsigned index)
+static unsigned file_depth(struct submitter *s)
+{
+	return (DEPTH + s->nr_files - 1) / s->nr_files;
+}
+
+static void init_io(struct submitter *s, unsigned index)
 {
 	struct io_uring_sqe *sqe = &s->sqes[index];
 	unsigned long offset;
+	struct file *f;
 	long r;
 
+	if (s->nr_files == 1) {
+		f = &s->files[0];
+	} else {
+		f = &s->files[s->cur_file];
+		if (f->pending_ios >= file_depth(s)) {
+			s->cur_file++;
+			if (s->cur_file == s->nr_files)
+				s->cur_file = 0;
+		}
+	}
+	f->pending_ios++;
+
 	lrand48_r(&s->rand, &r);
-	offset = (r % (s->max_blocks - 1)) * BS;
+	offset = (r % (f->max_blocks - 1)) * BS;
 
+	sqe->flags = IOSQE_FIXED_FILE;
+	sqe->opcode = IORING_OP_READV;
 	if (fixedbufs) {
-		sqe->opcode = IORING_OP_READ_FIXED;
 		sqe->addr = s->iovecs[index].iov_base;
 		sqe->len = BS;
+		sqe->buf_index = index;
+		sqe->flags |= IOSQE_FIXED_BUFFER;
 	} else {
-		sqe->opcode = IORING_OP_READV;
 		sqe->addr = &s->iovecs[index];
 		sqe->len = 1;
+		sqe->buf_index = 0;
 	}
-	sqe->flags = 0;
 	sqe->ioprio = 0;
-	sqe->fd = fd;
+	sqe->fd = f->fd;
 	sqe->off = offset;
+	sqe->data = (unsigned long) f;
 }
 
-static int prep_more_ios(struct submitter *s, int fd, int max_ios)
+static int prep_more_ios(struct submitter *s, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
 	unsigned index, tail, next_tail, prepped = 0;
@@ -135,7 +193,7 @@ static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 			break;
 
 		index = tail & sq_ring_mask;
-		init_io(s, fd, index);
+		init_io(s, index);
 		ring->array[index] = index;
 		prepped++;
 		tail = next_tail;
@@ -150,22 +208,22 @@ static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 	return prepped;
 }
 
-static int get_file_size(int fd, unsigned long *blocks)
+static int get_file_size(struct file *f)
 {
 	struct stat st;
 
-	if (fstat(fd, &st) < 0)
+	if (fstat(f->fd, &st) < 0)
 		return -1;
 	if (S_ISBLK(st.st_mode)) {
 		unsigned long long bytes;
 
-		if (ioctl(fd, BLKGETSIZE64, &bytes) != 0)
+		if (ioctl(f->fd, BLKGETSIZE64, &bytes) != 0)
 			return -1;
 
-		*blocks = bytes / BS;
+		f->max_blocks = bytes / BS;
 		return 0;
 	} else if (S_ISREG(st.st_mode)) {
-		*blocks = st.st_size / BS;
+		f->max_blocks = st.st_size / BS;
 		return 0;
 	}
 
@@ -180,17 +238,16 @@ static int reap_events(struct submitter *s)
 
 	head = *ring->head;
 	do {
+		struct file *f;
+
 		barrier();
 		if (head == *ring->tail)
 			break;
 		cqe = &ring->cqes[head & cq_ring_mask];
+		f = (struct file *) cqe->data;
+		f->pending_ios--;
 		if (cqe->res != BS) {
-			struct io_uring_sqe *sqe = &s->sqes[cqe->index];
-
 			printf("io: unexpected ret=%d\n", cqe->res);
-			printf("offset=%lu, size=%lu\n",
-					(unsigned long) sqe->off,
-					(unsigned long) sqe->len);
 			return -1;
 		}
 		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
@@ -210,29 +267,11 @@ static int reap_events(struct submitter *s)
 static void *submitter_fn(void *data)
 {
 	struct submitter *s = data;
-	int fd, ret, prepped, flags;
+	struct io_sq_ring *ring = &s->sq_ring;
+	int ret, prepped;
 
 	printf("submitter=%d\n", gettid());
 
-	flags = O_RDONLY;
-	if (!buffered)
-		flags |= O_DIRECT;
-	fd = open(s->filename, flags);
-	if (fd < 0) {
-		perror("open");
-		goto done;
-	}
-
-	if (get_file_size(fd, &s->max_blocks)) {
-		printf("failed getting size of device/file\n");
-		goto err;
-	}
-	if (s->max_blocks <= 1) {
-		printf("Zero file/device size?\n");
-		goto err;
-	}
-	s->max_blocks--;
-
 	srand48_r(pthread_self(), &s->rand);
 
 	prepped = 0;
@@ -241,7 +280,7 @@ static void *submitter_fn(void *data)
 
 		if (!prepped && s->inflight < DEPTH) {
 			to_prep = min(DEPTH - s->inflight, BATCH_SUBMIT);
-			prepped = prep_more_ios(s, fd, to_prep);
+			prepped = prep_more_ios(s, to_prep);
 		}
 		s->inflight += prepped;
 submit_more:
@@ -252,13 +291,30 @@ submit:
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
 
-		ret = io_uring_enter(s, to_submit, to_wait,
-					IORING_ENTER_GETEVENTS);
-		s->calls++;
+		/*
+		 * Only need to call io_uring_enter if we're not using SQ thread
+		 * poll, or if IORING_SQ_NEED_WAKEUP is set.
+		 */
+		if (!sq_thread_poll || (*ring->flags & IORING_SQ_NEED_WAKEUP)) {
+			ret = io_uring_enter(s, to_submit, to_wait,
+						IORING_ENTER_GETEVENTS);
+			s->calls++;
+		}
 
-		this_reap = reap_events(s);
-		if (this_reap == -1)
-			break;
+		/*
+		 * For non SQ thread poll, we already got the events we needed
+		 * through the io_uring_enter() above. For SQ thread poll, we
+		 * need to loop here until we find enough events.
+		 */
+		this_reap = 0;
+		do {
+			int r;
+			r = reap_events(s);
+			if (r == -1)
+				break;
+			else if (r > 0)
+				this_reap += r;
+		} while (sq_thread_poll && this_reap < to_wait);
 		s->reaps += this_reap;
 
 		if (ret >= 0) {
@@ -290,9 +346,7 @@ submit:
 			break;
 		}
 	} while (!s->finish);
-err:
-	close(fd);
-done:
+
 	finish = 1;
 	return NULL;
 }
@@ -319,30 +373,40 @@ static int setup_ring(struct submitter *s)
 	struct io_sq_ring *sring = &s->sq_ring;
 	struct io_cq_ring *cring = &s->cq_ring;
 	struct io_uring_params p;
+	int ret, fd;
 	void *ptr;
-	int fd;
 
 	memset(&p, 0, sizeof(p));
 
 	if (polled)
 		p.flags |= IORING_SETUP_IOPOLL;
-	if (buffered)
-		p.flags |= IORING_SETUP_SQWQ;
-	else if (sq_thread) {
-		p.flags |= IORING_SETUP_SQTHREAD;
-		p.sq_thread_cpu = sq_thread_cpu;
+	if (sq_thread_poll) {
+		p.flags |= IORING_SETUP_SQPOLL;
+		if (sq_thread_cpu != -1)
+			p.flags |= IORING_SETUP_SQ_AFF;
 	}
 
-	if (fixedbufs)
-		fd = io_uring_setup(DEPTH, s->iovecs, &p);
-	else
-		fd = io_uring_setup(DEPTH, NULL, &p);
+	fd = io_uring_setup(DEPTH, &p);
 	if (fd < 0) {
 		perror("io_uring_setup");
 		return 1;
 	}
-
 	s->ring_fd = fd;
+
+	if (fixedbufs) {
+		ret = io_uring_register_buffers(s);
+		if (ret < 0) {
+			perror("io_uring_register_buffers");
+			return 1;
+		}
+	}
+
+	ret = io_uring_register_files(s);
+	if (ret < 0) {
+		perror("io_uring_register_files");
+		return 1;
+	}
+
 	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_SQ_RING);
@@ -351,6 +415,7 @@ static int setup_ring(struct submitter *s)
 	sring->tail = ptr + p.sq_off.tail;
 	sring->ring_mask = ptr + p.sq_off.ring_mask;
 	sring->ring_entries = ptr + p.sq_off.ring_entries;
+	sring->flags = ptr + p.sq_off.flags;
 	sring->array = ptr + p.sq_off.array;
 	sq_ring_mask = *sring->ring_mask;
 
@@ -376,7 +441,7 @@ int main(int argc, char *argv[])
 {
 	struct submitter *s = &submitters[0];
 	unsigned long done, calls, reap, cache_hit, cache_miss;
-	int err, i;
+	int err, i, flags, fd;
 	struct rlimit rlim;
 	void *ret;
 
@@ -385,6 +450,35 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
+	flags = O_RDONLY | O_NOATIME;
+	if (!buffered)
+		flags |= O_DIRECT;
+
+	i = 1;
+	while (i < argc) {
+		struct file *f = &s->files[s->nr_files];
+
+		fd = open(argv[i], flags);
+		if (fd < 0) {
+			perror("open");
+			return 1;
+		}
+		f->fd = fd;
+		if (get_file_size(f)) {
+			printf("failed getting size of device/file\n");
+			return 1;
+		}
+		if (f->max_blocks <= 1) {
+			printf("Zero file/device size?\n");
+			return 1;
+		}
+		f->max_blocks--;
+
+		printf("Added file %s\n", argv[i]);
+		s->nr_files++;
+		i++;
+	}
+
 	rlim.rlim_cur = RLIM_INFINITY;
 	rlim.rlim_max = RLIM_INFINITY;
 	if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
@@ -412,7 +506,6 @@ int main(int argc, char *argv[])
 	}
 	printf("polled=%d, fixedbufs=%d, buffered=%d", polled, fixedbufs, buffered);
 	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", DEPTH, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
-	strcpy(s->filename, argv[1]);
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);
 
@@ -443,8 +536,9 @@ int main(int argc, char *argv[])
 		if (this_call - calls) {
 			rpc = (this_done - done) / (this_call - calls);
 			ipc = (this_reap - reap) / (this_call - calls);
-		}
-		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%u tail=%u), Cachehit=%0.2f%%\n",
+		} else
+			rpc = ipc = -1;
+		printf("IOPS=%lu, IOS/call=%ld/%ld, inflight=%u (head=%u tail=%u), Cachehit=%0.2f%%\n",
 				this_done - done, rpc, ipc, s->inflight,
 				*s->cq_ring.head, *s->cq_ring.tail, hit);
 		done = this_done;
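
t/io_uring.c now spreads the queue depth over several files. file_depth() gives each file a ceiling-divided share of DEPTH, and init_io() moves cur_file on once the current file has that many I/Os pending. The sketch below isolates that selection logic using hypothetical trimmed-down structs. It differs from the hunk in one deliberate way. In the hunk, f is looked up before cur_file advances, so the I/O that triggers the switch is still charged to the full file. The sketch re-reads files[cur_file] after advancing.

```c
#include <assert.h>

#define DEPTH	128	/* queue depth used by t/io_uring.c after this change */
#define MAX_FDS	16

/* Hypothetical trimmed-down versions of struct file and struct submitter. */
struct sk_file {
	unsigned pending_ios;
};

struct sk_submitter {
	struct sk_file files[MAX_FDS];
	unsigned nr_files;
	unsigned cur_file;
};

/* Each file's share of DEPTH, rounded up so the shares cover DEPTH. */
static unsigned file_depth(const struct sk_submitter *s)
{
	assert(s->nr_files > 0 && s->nr_files <= MAX_FDS);
	return (DEPTH + s->nr_files - 1) / s->nr_files;
}

/*
 * Choose the file for the next I/O: stay on cur_file until it holds its
 * share of pending I/Os, then rotate to the next file (wrapping to 0).
 */
static struct sk_file *pick_file(struct sk_submitter *s)
{
	struct sk_file *f = &s->files[s->cur_file];

	if (s->nr_files > 1 && f->pending_ios >= file_depth(s)) {
		s->cur_file = (s->cur_file + 1) % s->nr_files;
		f = &s->files[s->cur_file];
	}
	f->pending_ios++;
	return f;
}

/* Completion side: the reaper decrements the count for its file. */
static void complete_io(struct sk_file *f)
{
	assert(f->pending_ios > 0);
	f->pending_ios--;
}
```

With three files, each share is (128 + 2) / 3 = 43. The first 43 I/Os go to file 0, and the 44th moves on to file 1.
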

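The SQPOLL changes in both the engine and t/io_uring.c rest on one decision. Without a polling thread, io_uring_enter() is what submits and waits, so it is always called. With SQPOLL, the kernel thread consumes the SQ ring on its own. The call is then needed only when that thread has gone idle and flagged the ring with IORING_SQ_NEED_WAKEUP. The sketch reduces this to a predicate. The hunks above test the flag but do not define it, so its value here is an assumption taken from mainline <linux/io_uring.h>. Real code must also read the flags word after a barrier, as the engine's read_barrier() call does.

```c
#include <stdbool.h>

/*
 * Assumed value: the hunks above use IORING_SQ_NEED_WAKEUP without
 * defining it; mainline <linux/io_uring.h> uses (1U << 0).
 */
#define IORING_SQ_NEED_WAKEUP	(1U << 0)

/*
 * Must the submitter call io_uring_enter()? Always without SQPOLL,
 * since the syscall is what submits and waits. With SQPOLL, only when
 * the idle kernel thread has asked to be woken via the SQ ring flags.
 */
static bool need_enter(bool sq_thread_poll, unsigned sq_ring_flags)
{
	if (!sq_thread_poll)
		return true;
	return (sq_ring_flags & IORING_SQ_NEED_WAKEUP) != 0;
}
```
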


* Recent changes (master)
@ 2019-01-10 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-10 13:00 UTC
  To: fio

The following changes since commit b08e7d6b18b4a38f61800e7553cd5e5d282da4a8:

  engines/devdax: Make detection of device-dax instances more robust (2019-01-08 12:47:37 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 650346e17d045366b817814dd3e10dc94d0d990f:

  engines/io_uring: always setup ld->iovecs[] (2019-01-09 15:13:06 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Update to newer io_uring API
      engines/io_uring: always setup ld->iovecs[]

 engines/io_uring.c | 91 ++++++++++++++++++++++++++----------------------------
 os/io_uring.h      | 27 ++++++++--------
 t/io_uring.c       | 63 ++++++++++++++++++-------------------
 3 files changed, 89 insertions(+), 92 deletions(-)

---

Diff of recent changes:

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 55f48eda..06355e9c 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -37,7 +37,7 @@ struct io_cq_ring {
 	unsigned *tail;
 	unsigned *ring_mask;
 	unsigned *ring_entries;
-	struct io_uring_event *events;
+	struct io_uring_cqe *cqes;
 };
 
 struct ioring_mmap {
@@ -52,7 +52,7 @@ struct ioring_data {
 	struct io_u **io_u_index;
 
 	struct io_sq_ring sq_ring;
-	struct io_uring_iocb *iocbs;
+	struct io_uring_sqe *sqes;
 	struct iovec *iovecs;
 	unsigned sq_ring_mask;
 
@@ -151,30 +151,32 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 	struct fio_file *f = io_u->file;
-	struct io_uring_iocb *iocb;
+	struct io_uring_sqe *sqe;
 
-	iocb = &ld->iocbs[io_u->index];
-	iocb->fd = f->fd;
-	iocb->flags = 0;
-	iocb->ioprio = 0;
+	sqe = &ld->sqes[io_u->index];
+	sqe->fd = f->fd;
+	sqe->flags = 0;
+	sqe->ioprio = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
-		if (io_u->ddir == DDIR_READ) {
-			if (o->fixedbufs)
-				iocb->opcode = IORING_OP_READ_FIXED;
+		if (o->fixedbufs) {
+			if (io_u->ddir == DDIR_READ)
+				sqe->opcode = IORING_OP_READ_FIXED;
 			else
-				iocb->opcode = IORING_OP_READ;
+				sqe->opcode = IORING_OP_WRITE_FIXED;
+			sqe->addr = io_u->xfer_buf;
+			sqe->len = io_u->xfer_buflen;
 		} else {
-			if (o->fixedbufs)
-				iocb->opcode = IORING_OP_WRITE_FIXED;
+			if (io_u->ddir == DDIR_READ)
+				sqe->opcode = IORING_OP_READV;
 			else
-				iocb->opcode = IORING_OP_WRITE;
+				sqe->opcode = IORING_OP_WRITEV;
+			sqe->addr = &ld->iovecs[io_u->index];
+			sqe->len = 1;
 		}
-		iocb->off = io_u->offset;
-		iocb->addr = io_u->xfer_buf;
-		iocb->len = io_u->xfer_buflen;
+		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir))
-		iocb->opcode = IORING_OP_FSYNC;
+		sqe->opcode = IORING_OP_FSYNC;
 
 	return 0;
 }
@@ -182,25 +184,25 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 {
 	struct ioring_data *ld = td->io_ops_data;
-	struct io_uring_event *ev;
+	struct io_uring_cqe *cqe;
 	struct io_u *io_u;
 	unsigned index;
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
-	ev = &ld->cq_ring.events[index];
-	io_u = ld->io_u_index[ev->index];
+	cqe = &ld->cq_ring.cqes[index];
+	io_u = ld->io_u_index[cqe->index];
 
-	if (ev->res != io_u->xfer_buflen) {
-		if (ev->res > io_u->xfer_buflen)
-			io_u->error = -ev->res;
+	if (cqe->res != io_u->xfer_buflen) {
+		if (cqe->res > io_u->xfer_buflen)
+			io_u->error = -cqe->res;
 		else
-			io_u->resid = io_u->xfer_buflen - ev->res;
+			io_u->resid = io_u->xfer_buflen - cqe->res;
 	} else
 		io_u->error = 0;
 
 	if (io_u->ddir == DDIR_READ) {
-		if (ev->flags & IOEV_FLAG_CACHEHIT)
+		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
 			ld->cachehit++;
 		else
 			ld->cachemiss++;
@@ -417,14 +419,14 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	sring->array = ptr + p->sq_off.array;
 	ld->sq_ring_mask = *sring->ring_mask;
 
-	ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_iocb);
-	ld->iocbs = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
+	ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_sqe);
+	ld->sqes = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
 				MAP_SHARED | MAP_POPULATE, ld->ring_fd,
-				IORING_OFF_IOCB);
-	ld->mmap[1].ptr = ld->iocbs;
+				IORING_OFF_SQES);
+	ld->mmap[1].ptr = ld->sqes;
 
-	ld->mmap[2].len = p->cq_off.events +
-				p->cq_entries * sizeof(struct io_uring_event);
+	ld->mmap[2].len = p->cq_off.cqes +
+				p->cq_entries * sizeof(struct io_uring_cqe);
 	ptr = mmap(0, ld->mmap[2].len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 			IORING_OFF_CQ_RING);
@@ -433,7 +435,7 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	cring->tail = ptr + p->cq_off.tail;
 	cring->ring_mask = ptr + p->cq_off.ring_mask;
 	cring->ring_entries = ptr + p->cq_off.ring_entries;
-	cring->events = ptr + p->cq_off.events;
+	cring->cqes = ptr + p->cq_off.cqes;
 	ld->cq_ring_mask = *cring->ring_mask;
 	return 0;
 }
@@ -443,6 +445,7 @@ static int fio_ioring_queue_init(struct thread_data *td)
 	struct ioring_data *ld = td->io_ops_data;
 	struct ioring_options *o = td->eo;
 	int depth = td->o.iodepth;
+	struct iovec *vecs = NULL;
 	struct io_uring_params p;
 	int ret;
 
@@ -466,10 +469,10 @@ static int fio_ioring_queue_init(struct thread_data *td)
 		};
 
 		setrlimit(RLIMIT_MEMLOCK, &rlim);
-		p.flags |= IORING_SETUP_FIXEDBUFS;
+		vecs = ld->iovecs;
 	}
 
-	ret = syscall(__NR_sys_io_uring_setup, depth, ld->iovecs, &p);
+	ret = syscall(__NR_sys_io_uring_setup, depth, vecs, &p);
 	if (ret < 0)
 		return ret;
 
@@ -480,20 +483,15 @@ static int fio_ioring_queue_init(struct thread_data *td)
 static int fio_ioring_post_init(struct thread_data *td)
 {
 	struct ioring_data *ld = td->io_ops_data;
-	struct ioring_options *o = td->eo;
 	struct io_u *io_u;
-	int err;
-
-	if (o->fixedbufs) {
-		int i;
+	int err, i;
 
-		for (i = 0; i < td->o.iodepth; i++) {
-			struct iovec *iov = &ld->iovecs[i];
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct iovec *iov = &ld->iovecs[i];
 
-			io_u = ld->io_u_index[i];
-			iov->iov_base = io_u->buf;
-			iov->iov_len = td_max_bs(td);
-		}
+		io_u = ld->io_u_index[i];
+		iov->iov_base = io_u->buf;
+		iov->iov_len = td_max_bs(td);
 	}
 
 	err = fio_ioring_queue_init(td);
@@ -523,7 +521,6 @@ static int fio_ioring_init(struct thread_data *td)
 	/* io_u index */
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
 	ld->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
-
 	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
 
 	td->io_ops_data = ld;
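
After this change, the prep path fills an SQE in one of two shapes. The fixed-buffer opcodes take the data buffer and its length in bytes. READV/WRITEV take a pointer to an iovec array and the number of iovecs, which is always one here. The sketch below shows that choice. It assumes a hypothetical sketch_sqe holding only the fields involved and, for brevity, derives both shapes from a single iovec. The opcode values are the ones in the os/io_uring.h hunk below.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Opcode values as defined in the os/io_uring.h hunk below. */
#define IORING_OP_READV		1
#define IORING_OP_WRITEV	2
#define IORING_OP_READ_FIXED	5
#define IORING_OP_WRITE_FIXED	6

/* Hypothetical subset of struct io_uring_sqe. */
struct sketch_sqe {
	unsigned char opcode;
	void *addr;
	unsigned len;
};

/*
 * Fixed-buffer ops: addr is the data buffer, len is a byte count.
 * Vectored ops: addr points at an iovec array, len counts iovecs.
 */
static void prep_rw(struct sketch_sqe *sqe, int is_read, int fixedbufs,
		    struct iovec *iov)
{
	assert(iov != NULL);
	if (fixedbufs) {
		sqe->opcode = is_read ? IORING_OP_READ_FIXED : IORING_OP_WRITE_FIXED;
		sqe->addr = iov->iov_base;
		sqe->len = (unsigned) iov->iov_len;
	} else {
		sqe->opcode = is_read ? IORING_OP_READV : IORING_OP_WRITEV;
		sqe->addr = iov;
		sqe->len = 1;
	}
}
```
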
diff --git a/os/io_uring.h b/os/io_uring.h
index 7dd21126..20e4c22e 100644
--- a/os/io_uring.h
+++ b/os/io_uring.h
@@ -12,9 +12,9 @@
 #include <linux/types.h>
 
 /*
- * IO submission data structure
+ * IO submission data structure (Submission Queue Entry)
  */
-struct io_uring_iocb {
+struct io_uring_sqe {
 	__u8	opcode;
 	__u8	flags;
 	__u16	ioprio;
@@ -35,23 +35,22 @@ struct io_uring_iocb {
  * io_uring_setup() flags
  */
 #define IORING_SETUP_IOPOLL	(1 << 0)	/* io_context is polled */
-#define IORING_SETUP_FIXEDBUFS	(1 << 1)	/* IO buffers are fixed */
-#define IORING_SETUP_SQTHREAD	(1 << 2)	/* Use SQ thread */
-#define IORING_SETUP_SQWQ	(1 << 3)	/* Use SQ workqueue */
-#define IORING_SETUP_SQPOLL	(1 << 4)	/* SQ thread polls */
+#define	IORING_SETUP_SQTHREAD	(1 << 1)	/* Use SQ thread */
+#define IORING_SETUP_SQWQ	(1 << 2)	/* Use SQ workqueue */
+#define IORING_SETUP_SQPOLL	(1 << 3)	/* SQ thread polls */
 
-#define IORING_OP_READ		1
-#define IORING_OP_WRITE		2
+#define IORING_OP_READV		1
+#define IORING_OP_WRITEV	2
 #define IORING_OP_FSYNC		3
 #define IORING_OP_FDSYNC	4
 #define IORING_OP_READ_FIXED	5
 #define IORING_OP_WRITE_FIXED	6
 
 /*
- * IO completion data structure
+ * IO completion data structure (Completion Queue Entry)
  */
-struct io_uring_event {
-	__u64	index;		/* what iocb this event came from */
+struct io_uring_cqe {
+	__u64	index;		/* what sqe this event came from */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
 };
@@ -59,14 +58,14 @@ struct io_uring_event {
 /*
  * io_uring_event->flags
  */
-#define IOEV_FLAG_CACHEHIT	(1 << 0)	/* IO did not hit media */
+#define IOCQE_FLAG_CACHEHIT	(1 << 0)	/* IO did not hit media */
 
 /*
  * Magic offsets for the application to mmap the data it needs
  */
 #define IORING_OFF_SQ_RING		0ULL
 #define IORING_OFF_CQ_RING		0x8000000ULL
-#define IORING_OFF_IOCB			0x10000000ULL
+#define IORING_OFF_SQES			0x10000000ULL
 
 /*
  * Filled with the offset for mmap(2)
@@ -90,7 +89,7 @@ struct io_cqring_offsets {
 	__u32 ring_mask;
 	__u32 ring_entries;
 	__u32 overflow;
-	__u32 events;
+	__u32 cqes;
 	__u32 resv[4];
 };
 
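The renamed cq_off.cqes field is the offset of the completion array inside the CQ ring mapping. The mapping length is therefore that offset plus cq_entries times sizeof(struct io_uring_cqe). The SQ ring is sized the same way, from sq_off.array and its __u32 index array. The sketch below shows the arithmetic, using a local copy of the 16-byte CQE layout from the hunk above. Any offsets passed in are made-up values, since the real ones are reported by io_uring_setup().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the struct io_uring_cqe layout from the hunk above. */
struct sketch_cqe {
	uint64_t index;
	int32_t res;
	uint32_t flags;
};

static_assert(sizeof(struct sketch_cqe) == 16, "CQE is expected to be 16 bytes");

/* CQ ring mapping: header fields up to cqes[], then cq_entries CQEs. */
static size_t cq_ring_map_len(uint32_t cqes_off, uint32_t cq_entries)
{
	return (size_t) cqes_off + (size_t) cq_entries * sizeof(struct sketch_cqe);
}

/* SQ ring mapping: header fields up to array[], then sq_entries __u32s. */
static size_t sq_ring_map_len(uint32_t array_off, uint32_t sq_entries)
{
	return (size_t) array_off + (size_t) sq_entries * sizeof(uint32_t);
}
```

For example, with cqes at offset 64 and 256 entries, the CQ mapping is 64 + 256 * 16 = 4160 bytes.
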
diff --git a/t/io_uring.c b/t/io_uring.c
index fb2654a3..3edc87c6 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -41,7 +41,7 @@ struct io_cq_ring {
 	unsigned *tail;
 	unsigned *ring_mask;
 	unsigned *ring_entries;
-	struct io_uring_event *events;
+	struct io_uring_cqe *cqes;
 };
 
 #define DEPTH			32
@@ -59,7 +59,7 @@ struct submitter {
 	int ring_fd;
 	struct drand48_data rand;
 	struct io_sq_ring sq_ring;
-	struct io_uring_iocb *iocbs;
+	struct io_uring_sqe *sqes;
 	struct iovec iovecs[DEPTH];
 	struct io_cq_ring cq_ring;
 	int inflight;
@@ -74,9 +74,9 @@ struct submitter {
 static struct submitter submitters[1];
 static volatile int finish;
 
-static int polled = 0;		/* use IO polling */
+static int polled = 1;		/* use IO polling */
 static int fixedbufs = 0;	/* use fixed user buffers */
-static int buffered = 1;	/* use buffered IO, not O_DIRECT */
+static int buffered = 0;	/* use buffered IO, not O_DIRECT */
 static int sq_thread = 0;	/* use kernel submission thread */
 static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
 
@@ -100,23 +100,26 @@ static int gettid(void)
 
 static void init_io(struct submitter *s, int fd, unsigned index)
 {
-	struct io_uring_iocb *iocb = &s->iocbs[index];
+	struct io_uring_sqe *sqe = &s->sqes[index];
 	unsigned long offset;
 	long r;
 
 	lrand48_r(&s->rand, &r);
 	offset = (r % (s->max_blocks - 1)) * BS;
 
-	if (fixedbufs)
-		iocb->opcode = IORING_OP_READ_FIXED;
-	else
-		iocb->opcode = IORING_OP_READ;
-	iocb->flags = 0;
-	iocb->ioprio = 0;
-	iocb->fd = fd;
-	iocb->off = offset;
-	iocb->addr = s->iovecs[index].iov_base;
-	iocb->len = BS;
+	if (fixedbufs) {
+		sqe->opcode = IORING_OP_READ_FIXED;
+		sqe->addr = s->iovecs[index].iov_base;
+		sqe->len = BS;
+	} else {
+		sqe->opcode = IORING_OP_READV;
+		sqe->addr = &s->iovecs[index];
+		sqe->len = 1;
+	}
+	sqe->flags = 0;
+	sqe->ioprio = 0;
+	sqe->fd = fd;
+	sqe->off = offset;
 }
 
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
@@ -139,7 +142,7 @@ static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 	} while (prepped < max_ios);
 
 	if (*ring->tail != tail) {
-		/* order tail store with writes to iocbs above */
+		/* order tail store with writes to sqes above */
 		barrier();
 		*ring->tail = tail;
 		barrier();
@@ -172,7 +175,7 @@ static int get_file_size(int fd, unsigned long *blocks)
 static int reap_events(struct submitter *s)
 {
 	struct io_cq_ring *ring = &s->cq_ring;
-	struct io_uring_event *ev;
+	struct io_uring_cqe *cqe;
 	unsigned head, reaped = 0;
 
 	head = *ring->head;
@@ -180,17 +183,17 @@ static int reap_events(struct submitter *s)
 		barrier();
 		if (head == *ring->tail)
 			break;
-		ev = &ring->events[head & cq_ring_mask];
-		if (ev->res != BS) {
-			struct io_uring_iocb *iocb = &s->iocbs[ev->index];
+		cqe = &ring->cqes[head & cq_ring_mask];
+		if (cqe->res != BS) {
+			struct io_uring_sqe *sqe = &s->sqes[cqe->index];
 
-			printf("io: unexpected ret=%d\n", ev->res);
+			printf("io: unexpected ret=%d\n", cqe->res);
 			printf("offset=%lu, size=%lu\n",
-					(unsigned long) iocb->off,
-					(unsigned long) iocb->len);
+					(unsigned long) sqe->off,
+					(unsigned long) sqe->len);
 			return -1;
 		}
-		if (ev->flags & IOEV_FLAG_CACHEHIT)
+		if (cqe->flags & IOCQE_FLAG_CACHEHIT)
 			s->cachehit++;
 		else
 			s->cachemiss++;
@@ -323,8 +326,6 @@ static int setup_ring(struct submitter *s)
 
 	if (polled)
 		p.flags |= IORING_SETUP_IOPOLL;
-	if (fixedbufs)
-		p.flags |= IORING_SETUP_FIXEDBUFS;
 	if (buffered)
 		p.flags |= IORING_SETUP_SQWQ;
 	else if (sq_thread) {
@@ -353,12 +354,12 @@ static int setup_ring(struct submitter *s)
 	sring->array = ptr + p.sq_off.array;
 	sq_ring_mask = *sring->ring_mask;
 
-	s->iocbs = mmap(0, p.sq_entries * sizeof(struct io_uring_iocb),
+	s->sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
-			IORING_OFF_IOCB);
-	printf("iocbs ptr   = 0x%p\n", s->iocbs);
+			IORING_OFF_SQES);
+	printf("sqes ptr    = 0x%p\n", s->sqes);
 
-	ptr = mmap(0, p.cq_off.events + p.cq_entries * sizeof(struct io_uring_event),
+	ptr = mmap(0, p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_CQ_RING);
 	printf("cq_ring ptr = 0x%p\n", ptr);
@@ -366,7 +367,7 @@ static int setup_ring(struct submitter *s)
 	cring->tail = ptr + p.cq_off.tail;
 	cring->ring_mask = ptr + p.cq_off.ring_mask;
 	cring->ring_entries = ptr + p.cq_off.ring_entries;
-	cring->events = ptr + p.cq_off.events;
+	cring->cqes = ptr + p.cq_off.cqes;
 	cq_ring_mask = *cring->ring_mask;
 	return 0;
 }
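
reap_events() walks the CQ ring from head to tail, masking each index with cq_ring_mask to find the slot. It publishes the new head only after consuming the entries. The sketch below models that loop on an ordinary in-memory array, and its struct and function names are hypothetical. It omits the barrier() calls the real code needs, because there head, tail and cqes[] are shared with the kernel. head and tail are free-running unsigned counters, and only the masked value addresses the array, so wrap-around past UINT_MAX needs no special case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory CQ ring; the real one is mmap()ed from the kernel. */
struct sketch_cqe {
	uint64_t index;
	int32_t res;
	uint32_t flags;
};

struct sketch_cq_ring {
	unsigned head;		/* consumer index, advanced by the application */
	unsigned tail;		/* producer index, advanced by the kernel */
	unsigned ring_mask;	/* ring_entries - 1, entries a power of two */
	struct sketch_cqe *cqes;
};

/*
 * Consume every completion between head and tail. Returns the number
 * reaped, or -1 (leaving head untouched) on an unexpected result.
 */
static int reap(struct sketch_cq_ring *ring, int32_t expected_res)
{
	unsigned head = ring->head;
	int reaped = 0;

	assert(((ring->ring_mask + 1) & ring->ring_mask) == 0);
	while (head != ring->tail) {
		struct sketch_cqe *cqe = &ring->cqes[head & ring->ring_mask];

		if (cqe->res != expected_res)
			return -1;
		reaped++;
		head++;
	}
	/* Publish the new head so the kernel can reuse the slots. */
	ring->head = head;
	return reaped;
}
```
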



* Recent changes (master)
@ 2019-01-09 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-09 13:00 UTC
  To: fio

The following changes since commit 6e70fd303855575c99c520e8c46b7d85c9f21dc8:

  io_uring.h should include <linux/fs.h> (2019-01-08 05:43:38 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b08e7d6b18b4a38f61800e7553cd5e5d282da4a8:

  engines/devdax: Make detection of device-dax instances more robust (2019-01-08 12:47:37 -0700)

----------------------------------------------------------------
Dan Williams (1):
      engines/devdax: Make detection of device-dax instances more robust

Jens Axboe (4):
      io_uring: use kernel header directly
      configure: add __kernel_rwf_t check
      engines/io_uring: ensure to use the right opcode for fixed buffers
      t/io_uring: ensure to use the right opcode for fixed buffers

 configure          | 20 ++++++++++++++
 engines/dev-dax.c  |  5 ++--
 engines/io_uring.c | 47 +++++++++++++++++---------------
 lib/types.h        |  4 +++
 os/io_uring.h      | 78 ++++++++++++++++++++++++++++++++----------------------
 t/io_uring.c       | 37 ++++++++++++--------------
 6 files changed, 115 insertions(+), 76 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 1f4e50b1..c4fffd99 100755
--- a/configure
+++ b/configure
@@ -2295,6 +2295,23 @@ if compile_prog "" "-lcunit" "CUnit"; then
 fi
 print_config "CUnit" "$cunit"
 
+##########################################
+# check for __kernel_rwf_t
+__kernel_rwf_t="no"
+cat > $TMPC << EOF
+#include <linux/fs.h>
+int main(int argc, char **argv)
+{
+  __kernel_rwf_t x;
+  x = 0;
+  return x;
+}
+EOF
+if compile_prog "" "" "__kernel_rwf_t"; then
+  __kernel_rwf_t="yes"
+fi
+print_config "__kernel_rwf_t" "$__kernel_rwf_t"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2563,6 +2580,9 @@ fi
 if test "$cunit" = "yes" ; then
   output_sym "CONFIG_HAVE_CUNIT"
 fi
+if test "$__kernel_rwf_t" = "yes"; then
+  output_sym "CONFIG_HAVE_KERNEL_RWF_T"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index 0660bba5..422ea634 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -259,7 +259,7 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	char spath[PATH_MAX];
 	char npath[PATH_MAX];
-	char *rpath;
+	char *rpath, *basename;
 	FILE *sfile;
 	uint64_t size;
 	struct stat st;
@@ -289,7 +289,8 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 	}
 
 	/* check if DAX device */
-	if (strcmp("/sys/class/dax", rpath)) {
+	basename = strrchr(rpath, '/');
+	if (!basename || strcmp("dax", basename+1)) {
 		log_err("%s: %s not a DAX device!\n",
 			td->o.name, f->file_name);
 	}
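
The devdax change stops comparing the resolved subsystem path against the single string /sys/class/dax. It now accepts any path whose last component is dax. Below is a standalone version of that test, with a hypothetical function name. A path such as /sys/bus/dax is used here only to illustrate a second location that the looser match accepts.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the resolved sysfs path ends in a component named exactly
 * "dax", 0 otherwise, using the same strrchr() approach as the hunk.
 */
static int is_dax_subsystem(const char *rpath)
{
	const char *base;

	assert(rpath != NULL);
	base = strrchr(rpath, '/');
	return base != NULL && strcmp(base + 1, "dax") == 0;
}
```

is_dax_subsystem("/sys/class/dax") and is_dax_subsystem("/sys/bus/dax") both return 1. "/sys/class/block", "/sys/class/daxctl" and a bare "dax" with no slash all return 0.
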
diff --git a/engines/io_uring.c b/engines/io_uring.c
index ebca08c8..55f48eda 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -20,28 +20,23 @@
 
 #ifdef ARCH_HAVE_IOURING
 
-typedef uint64_t u64;
-typedef uint32_t u32;
-typedef int32_t s32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
+#include "../lib/types.h"
 #include "../os/io_uring.h"
 
 struct io_sq_ring {
-	u32 *head;
-	u32 *tail;
-	u32 *ring_mask;
-	u32 *ring_entries;
-	u32 *flags;
-	u32 *array;
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	unsigned *flags;
+	unsigned *array;
 };
 
 struct io_cq_ring {
-	u32 *head;
-	u32 *tail;
-	u32 *ring_mask;
-	u32 *ring_entries;
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
 	struct io_uring_event *events;
 };
 
@@ -154,6 +149,7 @@ static int io_uring_enter(struct ioring_data *ld, unsigned int to_submit,
 static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	struct fio_file *f = io_u->file;
 	struct io_uring_iocb *iocb;
 
@@ -163,10 +159,17 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	iocb->ioprio = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
-		if (io_u->ddir == DDIR_READ)
-			iocb->opcode = IORING_OP_READ;
-		else
-			iocb->opcode = IORING_OP_WRITE;
+		if (io_u->ddir == DDIR_READ) {
+			if (o->fixedbufs)
+				iocb->opcode = IORING_OP_READ_FIXED;
+			else
+				iocb->opcode = IORING_OP_READ;
+		} else {
+			if (o->fixedbufs)
+				iocb->opcode = IORING_OP_WRITE_FIXED;
+			else
+				iocb->opcode = IORING_OP_WRITE;
+		}
 		iocb->off = io_u->offset;
 		iocb->addr = io_u->xfer_buf;
 		iocb->len = io_u->xfer_buflen;
@@ -211,7 +214,7 @@ static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct io_cq_ring *ring = &ld->cq_ring;
-	u32 head, reaped = 0;
+	unsigned head, reaped = 0;
 
 	head = *ring->head;
 	do {
@@ -401,7 +404,7 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	struct io_cq_ring *cring = &ld->cq_ring;
 	void *ptr;
 
-	ld->mmap[0].len = p->sq_off.array + p->sq_entries * sizeof(u32);
+	ld->mmap[0].len = p->sq_off.array + p->sq_entries * sizeof(__u32);
 	ptr = mmap(0, ld->mmap[0].len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 			IORING_OFF_SQ_RING);
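With `fixedbufs` set, fio_ioring_prep() (and init_io() in t/io_uring) now issue the `*_FIXED` opcodes. Those opcodes refer to buffers that were pre-mapped at setup time, meaning the iovecs passed to io_uring_setup together with `IORING_SETUP_FIXEDBUFS`. The sketch below isolates that decision. The numeric values are copied from the early os/io_uring.h in this series and are an assumption of the sketch: the io_uring ABI that was eventually merged numbers its opcodes differently, so the `SKETCH_` prefix marks them as illustrative.

```c
#include <stdbool.h>

/*
 * Opcode values from the early os/io_uring.h in this series
 * (IORING_OP_READ = 1 ... IORING_OP_WRITE_FIXED = 6). They are not
 * the values of the io_uring ABI that was eventually merged.
 */
enum {
	SKETCH_OP_READ		= 1,
	SKETCH_OP_WRITE		= 2,
	SKETCH_OP_READ_FIXED	= 5,
	SKETCH_OP_WRITE_FIXED	= 6,
};

/* Pick the read/write opcode the way the patched fio_ioring_prep() does. */
static int pick_rw_opcode(bool is_read, bool fixedbufs)
{
	if (is_read)
		return fixedbufs ? SKETCH_OP_READ_FIXED : SKETCH_OP_READ;
	return fixedbufs ? SKETCH_OP_WRITE_FIXED : SKETCH_OP_WRITE;
}
```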
diff --git a/lib/types.h b/lib/types.h
index 236bf8a3..d92b064c 100644
--- a/lib/types.h
+++ b/lib/types.h
@@ -13,4 +13,8 @@ typedef int bool;
 #include <stdbool.h> /* IWYU pragma: export */
 #endif
 
+#if !defined(CONFIG_HAVE_KERNEL_RWF_T)
+typedef int __kernel_rwf_t;
+#endif
+
 #endif
diff --git a/os/io_uring.h b/os/io_uring.h
index 8dda7951..7dd21126 100644
--- a/os/io_uring.h
+++ b/os/io_uring.h
@@ -1,25 +1,33 @@
-#ifndef IO_URING_H
-#define IO_URING_H
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Header file for the io_uring interface.
+ *
+ * Copyright (C) 2019 Jens Axboe
+ * Copyright (C) 2019 Christoph Hellwig
+ */
+#ifndef LINUX_IO_URING_H
+#define LINUX_IO_URING_H
 
 #include <linux/fs.h>
+#include <linux/types.h>
 
 /*
  * IO submission data structure
  */
 struct io_uring_iocb {
-	u8	opcode;
-	u8	flags;
-	u16	ioprio;
-	s32	fd;
-	u64	off;
+	__u8	opcode;
+	__u8	flags;
+	__u16	ioprio;
+	__s32	fd;
+	__u64	off;
 	union {
 		void	*addr;
-		u64	__pad;
+		__u64	__pad;
 	};
-	u32	len;
+	__u32	len;
 	union {
 		__kernel_rwf_t	rw_flags;
-		u32		__resv;
+		__u32		__resv;
 	};
 };
 
@@ -44,10 +52,13 @@ struct io_uring_iocb {
  */
 struct io_uring_event {
 	__u64	index;		/* what iocb this event came from */
-	s32	res;		/* result code for this event */
-	u32	flags;
+	__s32	res;		/* result code for this event */
+	__u32	flags;
 };
 
+/*
+ * io_uring_event->flags
+ */
 #define IOEV_FLAG_CACHEHIT	(1 << 0)	/* IO did not hit media */
 
 /*
@@ -61,39 +72,42 @@ struct io_uring_event {
  * Filled with the offset for mmap(2)
  */
 struct io_sqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 flags;
-	u32 dropped;
-	u32 array;
-	u32 resv[3];
+	__u32 head;
+	__u32 tail;
+	__u32 ring_mask;
+	__u32 ring_entries;
+	__u32 flags;
+	__u32 dropped;
+	__u32 array;
+	__u32 resv[3];
 };
 
 #define IORING_SQ_NEED_WAKEUP	(1 << 0) /* needs io_uring_enter wakeup */
 
 struct io_cqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 overflow;
-	u32 events;
-	u32 resv[4];
+	__u32 head;
+	__u32 tail;
+	__u32 ring_mask;
+	__u32 ring_entries;
+	__u32 overflow;
+	__u32 events;
+	__u32 resv[4];
 };
 
+/*
+ * io_uring_enter(2) flags
+ */
 #define IORING_ENTER_GETEVENTS	(1 << 0)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
  */
 struct io_uring_params {
-	u32 sq_entries;
-	u32 cq_entries;
-	u32 flags;
-	u16 sq_thread_cpu;
-	u16 resv[9];
+	__u32 sq_entries;
+	__u32 cq_entries;
+	__u32 flags;
+	__u16 sq_thread_cpu;
+	__u16 resv[9];
 	struct io_sqring_offsets sq_off;
 	struct io_cqring_offsets cq_off;
 };
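Moving the header from fio-local `u8`/`u32`/`u64` typedefs to the kernel's `__u8`/`__u32`/`__u64` types should be a pure renaming, because these structures are shared with the kernel and must keep their layout. One way to pin the layout down is a compile-time check. The sketch below mirrors the early `io_uring_iocb` and `io_uring_event` using `<stdint.h>` types, an assumption made to avoid depending on `<linux/types.h>`. It reuses the `lib/types.h` fallback of `int` for `__kernel_rwf_t`. The expected offsets assume natural alignment, as on common Linux ABIs such as x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the lib/types.h fallback: typedef int __kernel_rwf_t. */
typedef int sketch_rwf_t;

/* Mirror of the early struct io_uring_iocb, with <stdint.h> types
 * standing in for __u8, __u16, __s32, __u32 and __u64. */
struct sketch_iocb {
	uint8_t		opcode;
	uint8_t		flags;
	uint16_t	ioprio;
	int32_t		fd;
	uint64_t	off;
	union {
		void		*addr;
		uint64_t	pad;
	};
	uint32_t	len;
	union {
		sketch_rwf_t	rw_flags;
		uint32_t	resv;
	};
};

/* Mirror of the early struct io_uring_event. */
struct sketch_event {
	uint64_t	index;	/* which iocb this event came from */
	int32_t		res;	/* result code for this event */
	uint32_t	flags;	/* e.g. the CACHEHIT bit */
};

/* Compile-time layout checks, assuming natural alignment. */
_Static_assert(offsetof(struct sketch_iocb, fd) == 4, "fd offset");
_Static_assert(offsetof(struct sketch_iocb, off) == 8, "off offset");
_Static_assert(offsetof(struct sketch_iocb, addr) == 16, "addr offset");
_Static_assert(offsetof(struct sketch_iocb, len) == 24, "len offset");
_Static_assert(offsetof(struct sketch_iocb, rw_flags) == 28, "rw_flags offset");
_Static_assert(sizeof(struct sketch_iocb) == 32, "iocb size");
_Static_assert(sizeof(struct sketch_event) == 16, "event size");
```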
diff --git a/t/io_uring.c b/t/io_uring.c
index 83d723f9..fb2654a3 100644
--- a/t/io_uring.c
+++ b/t/io_uring.c
@@ -21,13 +21,7 @@
 #include <sched.h>
 
 #include "../arch/arch.h"
-
-typedef uint64_t u64;
-typedef uint32_t u32;
-typedef int32_t s32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-
+#include "../lib/types.h"
 #include "../os/io_uring.h"
 
 #define barrier()	__asm__ __volatile__("": : :"memory")
@@ -35,18 +29,18 @@ typedef uint8_t u8;
 #define min(a, b)		((a < b) ? (a) : (b))
 
 struct io_sq_ring {
-	u32 *head;
-	u32 *tail;
-	u32 *ring_mask;
-	u32 *ring_entries;
-	u32 *array;
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	unsigned *array;
 };
 
 struct io_cq_ring {
-	u32 *head;
-	u32 *tail;
-	u32 *ring_mask;
-	u32 *ring_entries;
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
 	struct io_uring_event *events;
 };
 
@@ -113,7 +107,10 @@ static void init_io(struct submitter *s, int fd, unsigned index)
 	lrand48_r(&s->rand, &r);
 	offset = (r % (s->max_blocks - 1)) * BS;
 
-	iocb->opcode = IORING_OP_READ;
+	if (fixedbufs)
+		iocb->opcode = IORING_OP_READ_FIXED;
+	else
+		iocb->opcode = IORING_OP_READ;
 	iocb->flags = 0;
 	iocb->ioprio = 0;
 	iocb->fd = fd;
@@ -125,7 +122,7 @@ static void init_io(struct submitter *s, int fd, unsigned index)
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 {
 	struct io_sq_ring *ring = &s->sq_ring;
-	u32 index, tail, next_tail, prepped = 0;
+	unsigned index, tail, next_tail, prepped = 0;
 
 	next_tail = tail = *ring->tail;
 	do {
@@ -176,7 +173,7 @@ static int reap_events(struct submitter *s)
 {
 	struct io_cq_ring *ring = &s->cq_ring;
 	struct io_uring_event *ev;
-	u32 head, reaped = 0;
+	unsigned head, reaped = 0;
 
 	head = *ring->head;
 	do {
@@ -345,7 +342,7 @@ static int setup_ring(struct submitter *s)
 	}
 
 	s->ring_fd = fd;
-	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(u32),
+	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(__u32),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_SQ_RING);
 	printf("sq_ring ptr = 0x%p\n", ptr);
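The engine and t/io_uring both treat head and tail as free-running `unsigned` counters. They mask these counters with `ring_mask` only when indexing an array. Because the ring size is a power of two, wrapping past `UINT_MAX` is harmless. The sketch below shows that arithmetic in isolation using a hypothetical `sketch_ring`. It is single-threaded and omits both the shared memory mapping and the `barrier()` calls that the real rings need.

```c
#include <stdbool.h>

/* Hypothetical ring size; must be a power of two for the mask to work. */
#define SKETCH_RING_ENTRIES 8u

struct sketch_ring {
	unsigned head;		/* consumer position, free running */
	unsigned tail;		/* producer position, free running */
	unsigned ring_mask;	/* entries - 1 */
	unsigned slots[SKETCH_RING_ENTRIES];
};

static void ring_init(struct sketch_ring *r, unsigned start)
{
	r->head = r->tail = start;
	r->ring_mask = SKETCH_RING_ENTRIES - 1;
}

/* Producer: full when tail is a whole ring ahead of head (mod 2^N). */
static bool ring_push(struct sketch_ring *r, unsigned val)
{
	if (r->tail - r->head == SKETCH_RING_ENTRIES)
		return false;
	r->slots[r->tail & r->ring_mask] = val;
	r->tail++;
	return true;
}

/* Consumer: empty when head has caught up with tail. */
static bool ring_pop(struct sketch_ring *r, unsigned *val)
{
	if (r->head == r->tail)
		return false;
	*val = r->slots[r->head & r->ring_mask];
	r->head++;
	return true;
}
```

If you start the counters just below `UINT_MAX`, the tail wraps to a small value partway through filling the ring. Even so, `tail - head` still equals the number of queued entries.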



* Recent changes (master)
@ 2019-01-08 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-08 13:00 UTC
  To: fio

The following changes since commit f310970e737975088e41ea14c399450ba8ae3a49:

  t/aio-ring: cleanup the code a bit (2019-01-05 07:42:30 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6e70fd303855575c99c520e8c46b7d85c9f21dc8:

  io_uring.h should include <linux/fs.h> (2019-01-08 05:43:38 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Rename t/aio-ring to t/io_uring
      Rename aioring engine to io_uring
      io_uring.h should include <linux/fs.h>

 Makefile                          |  15 +-
 arch/arch-x86_64.h                |   2 +-
 engines/{aioring.c => io_uring.c} | 283 ++++++++++++++------------------------
 options.c                         |   8 +-
 os/io_uring.h                     | 101 ++++++++++++++
 t/{aio-ring.c => io_uring.c}      | 121 ++++++----------
 6 files changed, 254 insertions(+), 276 deletions(-)
 rename engines/{aioring.c => io_uring.c} (58%)
 create mode 100644 os/io_uring.h
 rename t/{aio-ring.c => io_uring.c} (81%)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index f111ae6a..5bc82f9a 100644
--- a/Makefile
+++ b/Makefile
@@ -68,9 +68,6 @@ endif
 ifdef CONFIG_LIBAIO
   SOURCE += engines/libaio.c
 endif
-ifdef CONFIG_LIBAIO
-  SOURCE += engines/aioring.c
-endif
 ifdef CONFIG_RDMA
   SOURCE += engines/rdma.c
 endif
@@ -154,7 +151,7 @@ endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
-		oslib/linux-dev-lookup.c
+		oslib/linux-dev-lookup.c engines/io_uring.c
   LIBS += -lpthread -ldl
   LDFLAGS += -rdynamic
 endif
@@ -266,8 +263,8 @@ T_VS_PROGS = t/fio-verify-state
 T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
 T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
 
-T_AIO_RING_OBJS = t/aio-ring.o
-T_AIO_RING_PROGS = t/aio-ring
+T_IOU_RING_OBJS = t/io_uring.o
+T_IOU_RING_PROGS = t/io_uring
 
 T_MEMLOCK_OBJS = t/memlock.o
 T_MEMLOCK_PROGS = t/memlock
@@ -287,7 +284,7 @@ T_OBJS += $(T_VS_OBJS)
 T_OBJS += $(T_PIPE_ASYNC_OBJS)
 T_OBJS += $(T_MEMLOCK_OBJS)
 T_OBJS += $(T_TT_OBJS)
-T_OBJS += $(T_AIO_RING_OBJS)
+T_OBJS += $(T_IOU_RING_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
     T_DEDUPE_OBJS += os/windows/posix.o lib/hweight.o
@@ -447,8 +444,8 @@ cairo_text_helpers.o: cairo_text_helpers.c cairo_text_helpers.h
 printing.o: printing.c printing.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
 
-t/aio-ring: $(T_AIO_RING_OBJS)
-	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_AIO_RING_OBJS) $(LIBS)
+t/io_uring: $(T_IOU_RING_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_IOU_RING_OBJS) $(LIBS)
 
 t/read-to-pipe-async: $(T_PIPE_ASYNC_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_PIPE_ASYNC_OBJS) $(LIBS)
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index d0a98b8b..a5864bab 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -44,7 +44,7 @@ static inline unsigned long long get_cpu_clock(void)
 #define ARCH_HAVE_FFZ
 #define ARCH_HAVE_SSE4_2
 #define ARCH_HAVE_CPU_CLOCK
-#define ARCH_HAVE_AIORING
+#define ARCH_HAVE_IOURING
 
 #define RDRAND_LONG	".byte 0x48,0x0f,0xc7,0xf0"
 #define RDSEED_LONG	".byte 0x48,0x0f,0xc7,0xf8"
diff --git a/engines/aioring.c b/engines/io_uring.c
similarity index 58%
rename from engines/aioring.c
rename to engines/io_uring.c
index 8cecb6ad..ebca08c8 100644
--- a/engines/aioring.c
+++ b/engines/io_uring.c
@@ -1,15 +1,14 @@
 /*
- * aioring engine
+ * io_uring engine
  *
- * IO engine using the new native Linux libaio ring interface. See:
+ * IO engine using the new native Linux aio io_uring interface. See:
  *
- * http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll
+ * http://git.kernel.dk/cgit/linux-block/log/?h=io_uring
  *
  */
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
-#include <libaio.h>
 #include <sys/time.h>
 #include <sys/resource.h>
 
@@ -19,81 +18,17 @@
 #include "../lib/memalign.h"
 #include "../lib/fls.h"
 
-#ifdef ARCH_HAVE_AIORING
-
-/*
- * io_uring_setup(2) flags
- */
-#ifndef IOCTX_FLAG_SCQRING
-#define IOCTX_FLAG_SCQRING	(1 << 0)
-#endif
-#ifndef IOCTX_FLAG_IOPOLL
-#define IOCTX_FLAG_IOPOLL	(1 << 1)
-#endif
-#ifndef IOCTX_FLAG_FIXEDBUFS
-#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
-#endif
-#ifndef IOCTX_FLAG_SQTHREAD
-#define IOCTX_FLAG_SQTHREAD	(1 << 3)
-#endif
-#ifndef IOCTX_FLAG_SQWQ
-#define IOCTX_FLAG_SQWQ		(1 << 4)
-#endif
-#ifndef IOCTX_FLAG_SQPOLL
-#define IOCTX_FLAG_SQPOLL	(1 << 5)
-#endif
-
-#define IORING_OFF_SQ_RING	0ULL
-#define IORING_OFF_CQ_RING	0x8000000ULL
-#define IORING_OFF_IOCB		0x10000000ULL
-
-/*
- * io_uring_enter(2) flags
- */
-#ifndef IORING_ENTER_GETEVENTS
-#define IORING_ENTER_GETEVENTS	(1 << 0)
-#endif
+#ifdef ARCH_HAVE_IOURING
 
 typedef uint64_t u64;
 typedef uint32_t u32;
+typedef int32_t s32;
 typedef uint16_t u16;
+typedef uint8_t u8;
 
-#define IORING_SQ_NEED_WAKEUP	(1 << 0)
-
-#define IOEV_RES2_CACHEHIT	(1 << 0)
-
-struct aio_sqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 flags;
-	u32 dropped;
-	u32 array;
-	u32 resv[3];
-};
-
-struct aio_cqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 overflow;
-	u32 events;
-	u32 resv[4];
-};
-
-struct aio_uring_params {
-	u32 sq_entries;
-	u32 cq_entries;
-	u32 flags;
-	u16 sq_thread_cpu;
-	u16 resv[9];
-	struct aio_sqring_offsets sq_off;
-	struct aio_cqring_offsets cq_off;
-};
+#include "../os/io_uring.h"
 
-struct aio_sq_ring {
+struct io_sq_ring {
 	u32 *head;
 	u32 *tail;
 	u32 *ring_mask;
@@ -102,32 +37,31 @@ struct aio_sq_ring {
 	u32 *array;
 };
 
-struct aio_cq_ring {
+struct io_cq_ring {
 	u32 *head;
 	u32 *tail;
 	u32 *ring_mask;
 	u32 *ring_entries;
-	struct io_event *events;
+	struct io_uring_event *events;
 };
 
-struct aioring_mmap {
+struct ioring_mmap {
 	void *ptr;
 	size_t len;
 };
 
-struct aioring_data {
+struct ioring_data {
 	int ring_fd;
 
 	struct io_u **io_us;
 	struct io_u **io_u_index;
 
-	struct aio_sq_ring sq_ring;
-	struct iocb *iocbs;
+	struct io_sq_ring sq_ring;
+	struct io_uring_iocb *iocbs;
 	struct iovec *iovecs;
 	unsigned sq_ring_mask;
 
-	struct aio_cq_ring cq_ring;
-	struct io_event *events;
+	struct io_cq_ring cq_ring;
 	unsigned cq_ring_mask;
 
 	int queued;
@@ -137,10 +71,10 @@ struct aioring_data {
 	uint64_t cachehit;
 	uint64_t cachemiss;
 
-	struct aioring_mmap mmap[3];
+	struct ioring_mmap mmap[3];
 };
 
-struct aioring_options {
+struct ioring_options {
 	void *pad;
 	unsigned int hipri;
 	unsigned int fixedbufs;
@@ -150,10 +84,9 @@ struct aioring_options {
 	unsigned int sqwq;
 };
 
-static int fio_aioring_sqthread_cb(void *data,
-				   unsigned long long *val)
+static int fio_ioring_sqthread_cb(void *data, unsigned long long *val)
 {
-	struct aioring_options *o = data;
+	struct ioring_options *o = data;
 
 	o->sqthread = *val;
 	o->sqthread_set = 1;
@@ -165,7 +98,7 @@ static struct fio_option options[] = {
 		.name	= "hipri",
 		.lname	= "High Priority",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct aioring_options, hipri),
+		.off1	= offsetof(struct ioring_options, hipri),
 		.help	= "Use polled IO completions",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -174,7 +107,7 @@ static struct fio_option options[] = {
 		.name	= "fixedbufs",
 		.lname	= "Fixed (pre-mapped) IO buffers",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct aioring_options, fixedbufs),
+		.off1	= offsetof(struct ioring_options, fixedbufs),
 		.help	= "Pre map IO buffers",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -183,7 +116,7 @@ static struct fio_option options[] = {
 		.name	= "sqthread",
 		.lname	= "Use kernel SQ thread on this CPU",
 		.type	= FIO_OPT_INT,
-		.cb	= fio_aioring_sqthread_cb,
+		.cb	= fio_ioring_sqthread_cb,
 		.help	= "Offload submission to kernel thread",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -192,7 +125,7 @@ static struct fio_option options[] = {
 		.name	= "sqthread_poll",
 		.lname	= "Kernel SQ thread should poll",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct aioring_options, sqthread_poll),
+		.off1	= offsetof(struct ioring_options, sqthread_poll),
 		.help	= "Used with sqthread, enables kernel side polling",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -201,7 +134,7 @@ static struct fio_option options[] = {
 		.name	= "sqwq",
 		.lname	= "Offload submission to kernel workqueue",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct aioring_options, sqwq),
+		.off1	= offsetof(struct ioring_options, sqwq),
 		.help	= "Offload submission to kernel workqueue",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -211,50 +144,49 @@ static struct fio_option options[] = {
 	},
 };
 
-static int io_uring_enter(struct aioring_data *ld, unsigned int to_submit,
+static int io_uring_enter(struct ioring_data *ld, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
 	return syscall(__NR_sys_io_uring_enter, ld->ring_fd, to_submit,
 			min_complete, flags);
 }
 
-static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
+static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 {
-	struct aioring_data *ld = td->io_ops_data;
+	struct ioring_data *ld = td->io_ops_data;
 	struct fio_file *f = io_u->file;
-	struct iocb *iocb;
+	struct io_uring_iocb *iocb;
 
 	iocb = &ld->iocbs[io_u->index];
+	iocb->fd = f->fd;
+	iocb->flags = 0;
+	iocb->ioprio = 0;
 
 	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
 		if (io_u->ddir == DDIR_READ)
-			iocb->aio_lio_opcode = IO_CMD_PREAD;
+			iocb->opcode = IORING_OP_READ;
 		else
-			iocb->aio_lio_opcode = IO_CMD_PWRITE;
-		iocb->aio_reqprio = 0;
-		iocb->aio_fildes = f->fd;
-		iocb->u.c.buf = io_u->xfer_buf;
-		iocb->u.c.nbytes = io_u->xfer_buflen;
-		iocb->u.c.offset = io_u->offset;
-		iocb->u.c.flags = 0;
+			iocb->opcode = IORING_OP_WRITE;
+		iocb->off = io_u->offset;
+		iocb->addr = io_u->xfer_buf;
+		iocb->len = io_u->xfer_buflen;
 	} else if (ddir_sync(io_u->ddir))
-		io_prep_fsync(iocb, f->fd);
+		iocb->opcode = IORING_OP_FSYNC;
 
-	iocb->data = io_u;
 	return 0;
 }
 
-static struct io_u *fio_aioring_event(struct thread_data *td, int event)
+static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct io_event *ev;
+	struct ioring_data *ld = td->io_ops_data;
+	struct io_uring_event *ev;
 	struct io_u *io_u;
 	unsigned index;
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	ev = &ld->cq_ring.events[index];
-	io_u = ev->data;
+	io_u = ld->io_u_index[ev->index];
 
 	if (ev->res != io_u->xfer_buflen) {
 		if (ev->res > io_u->xfer_buflen)
@@ -265,7 +197,7 @@ static struct io_u *fio_aioring_event(struct thread_data *td, int event)
 		io_u->error = 0;
 
 	if (io_u->ddir == DDIR_READ) {
-		if (ev->res2 & IOEV_RES2_CACHEHIT)
+		if (ev->flags & IOEV_FLAG_CACHEHIT)
 			ld->cachehit++;
 		else
 			ld->cachemiss++;
@@ -274,11 +206,11 @@ static struct io_u *fio_aioring_event(struct thread_data *td, int event)
 	return io_u;
 }
 
-static int fio_aioring_cqring_reap(struct thread_data *td, unsigned int events,
+static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 				   unsigned int max)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct aio_cq_ring *ring = &ld->cq_ring;
+	struct ioring_data *ld = td->io_ops_data;
+	struct io_cq_ring *ring = &ld->cq_ring;
 	u32 head, reaped = 0;
 
 	head = *ring->head;
@@ -295,19 +227,19 @@ static int fio_aioring_cqring_reap(struct thread_data *td, unsigned int events,
 	return reaped;
 }
 
-static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
-				 unsigned int max, const struct timespec *t)
+static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
+				unsigned int max, const struct timespec *t)
 {
-	struct aioring_data *ld = td->io_ops_data;
+	struct ioring_data *ld = td->io_ops_data;
 	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
-	struct aioring_options *o = td->eo;
-	struct aio_cq_ring *ring = &ld->cq_ring;
+	struct ioring_options *o = td->eo;
+	struct io_cq_ring *ring = &ld->cq_ring;
 	unsigned events = 0;
 	int r;
 
 	ld->cq_ring_off = *ring->head;
 	do {
-		r = fio_aioring_cqring_reap(td, events, max);
+		r = fio_ioring_cqring_reap(td, events, max);
 		if (r) {
 			events += r;
 			continue;
@@ -328,11 +260,11 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
-static enum fio_q_status fio_aioring_queue(struct thread_data *td,
-					   struct io_u *io_u)
+static enum fio_q_status fio_ioring_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct aio_sq_ring *ring = &ld->sq_ring;
+	struct ioring_data *ld = td->io_ops_data;
+	struct io_sq_ring *ring = &ld->sq_ring;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -364,9 +296,9 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 	return FIO_Q_QUEUED;
 }
 
-static void fio_aioring_queued(struct thread_data *td, int start, int nr)
+static void fio_ioring_queued(struct thread_data *td, int start, int nr)
 {
-	struct aioring_data *ld = td->io_ops_data;
+	struct ioring_data *ld = td->io_ops_data;
 	struct timespec now;
 
 	if (!fio_fill_issue_time(td))
@@ -375,7 +307,7 @@ static void fio_aioring_queued(struct thread_data *td, int start, int nr)
 	fio_gettime(&now, NULL);
 
 	while (nr--) {
-		struct aio_sq_ring *ring = &ld->sq_ring;
+		struct io_sq_ring *ring = &ld->sq_ring;
 		int index = ring->array[start & ld->sq_ring_mask];
 		struct io_u *io_u = ld->io_u_index[index];
 
@@ -386,10 +318,10 @@ static void fio_aioring_queued(struct thread_data *td, int start, int nr)
 	}
 }
 
-static int fio_aioring_commit(struct thread_data *td)
+static int fio_ioring_commit(struct thread_data *td)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct aioring_options *o = td->eo;
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	int ret;
 
 	if (!ld->queued)
@@ -397,7 +329,7 @@ static int fio_aioring_commit(struct thread_data *td)
 
 	/* Nothing to do */
 	if (o->sqthread_poll) {
-		struct aio_sq_ring *ring = &ld->sq_ring;
+		struct io_sq_ring *ring = &ld->sq_ring;
 
 		if (*ring->flags & IORING_SQ_NEED_WAKEUP)
 			io_uring_enter(ld, ld->queued, 0, 0);
@@ -411,7 +343,7 @@ static int fio_aioring_commit(struct thread_data *td)
 
 		ret = io_uring_enter(ld, nr, 0, IORING_ENTER_GETEVENTS);
 		if (ret > 0) {
-			fio_aioring_queued(td, start, ret);
+			fio_ioring_queued(td, start, ret);
 			io_u_mark_submit(td, ret);
 
 			ld->queued -= ret;
@@ -421,7 +353,7 @@ static int fio_aioring_commit(struct thread_data *td)
 			continue;
 		} else {
 			if (errno == EAGAIN) {
-				ret = fio_aioring_cqring_reap(td, 0, ld->queued);
+				ret = fio_ioring_cqring_reap(td, 0, ld->queued);
 				if (ret)
 					continue;
 				/* Shouldn't happen */
@@ -436,7 +368,7 @@ static int fio_aioring_commit(struct thread_data *td)
 	return ret;
 }
 
-static void fio_aioring_unmap(struct aioring_data *ld)
+static void fio_ioring_unmap(struct ioring_data *ld)
 {
 	int i;
 
@@ -445,22 +377,16 @@ static void fio_aioring_unmap(struct aioring_data *ld)
 	close(ld->ring_fd);
 }
 
-static void fio_aioring_cleanup(struct thread_data *td)
+static void fio_ioring_cleanup(struct thread_data *td)
 {
-	struct aioring_data *ld = td->io_ops_data;
+	struct ioring_data *ld = td->io_ops_data;
 
 	if (ld) {
 		td->ts.cachehit += ld->cachehit;
 		td->ts.cachemiss += ld->cachemiss;
 
-		/*
-		 * Work-around to avoid huge RCU stalls at exit time. If we
-		 * don't do this here, then it'll be torn down by exit_aio().
-		 * But for that case we can parallellize the freeing, thus
-		 * speeding it up a lot.
-		 */
 		if (!(td->flags & TD_F_CHILD))
-			fio_aioring_unmap(ld);
+			fio_ioring_unmap(ld);
 
 		free(ld->io_u_index);
 		free(ld->io_us);
@@ -469,10 +395,10 @@ static void fio_aioring_cleanup(struct thread_data *td)
 	}
 }
 
-static int fio_aioring_mmap(struct aioring_data *ld, struct aio_uring_params *p)
+static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 {
-	struct aio_sq_ring *sring = &ld->sq_ring;
-	struct aio_cq_ring *cring = &ld->cq_ring;
+	struct io_sq_ring *sring = &ld->sq_ring;
+	struct io_cq_ring *cring = &ld->cq_ring;
 	void *ptr;
 
 	ld->mmap[0].len = p->sq_off.array + p->sq_entries * sizeof(u32);
@@ -488,14 +414,14 @@ static int fio_aioring_mmap(struct aioring_data *ld, struct aio_uring_params *p)
 	sring->array = ptr + p->sq_off.array;
 	ld->sq_ring_mask = *sring->ring_mask;
 
-	ld->mmap[1].len = p->sq_entries * sizeof(struct iocb);
+	ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_iocb);
 	ld->iocbs = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
 				MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 				IORING_OFF_IOCB);
 	ld->mmap[1].ptr = ld->iocbs;
 
 	ld->mmap[2].len = p->cq_off.events +
-				p->cq_entries * sizeof(struct io_event);
+				p->cq_entries * sizeof(struct io_uring_event);
 	ptr = mmap(0, ld->mmap[2].len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 			IORING_OFF_CQ_RING);
@@ -509,27 +435,26 @@ static int fio_aioring_mmap(struct aioring_data *ld, struct aio_uring_params *p)
 	return 0;
 }
 
-static int fio_aioring_queue_init(struct thread_data *td)
+static int fio_ioring_queue_init(struct thread_data *td)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct aioring_options *o = td->eo;
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	int depth = td->o.iodepth;
-	struct aio_uring_params p;
+	struct io_uring_params p;
 	int ret;
 
 	memset(&p, 0, sizeof(p));
-	p.flags = IOCTX_FLAG_SCQRING;
 
 	if (o->hipri)
-		p.flags |= IOCTX_FLAG_IOPOLL;
+		p.flags |= IORING_SETUP_IOPOLL;
 	if (o->sqthread_set) {
 		p.sq_thread_cpu = o->sqthread;
-		p.flags |= IOCTX_FLAG_SQTHREAD;
+		p.flags |= IORING_SETUP_SQTHREAD;
 		if (o->sqthread_poll)
-			p.flags |= IOCTX_FLAG_SQPOLL;
+			p.flags |= IORING_SETUP_SQPOLL;
 	}
 	if (o->sqwq)
-		p.flags |= IOCTX_FLAG_SQWQ;
+		p.flags |= IORING_SETUP_SQWQ;
 
 	if (o->fixedbufs) {
 		struct rlimit rlim = {
@@ -538,7 +463,7 @@ static int fio_aioring_queue_init(struct thread_data *td)
 		};
 
 		setrlimit(RLIMIT_MEMLOCK, &rlim);
-		p.flags |= IOCTX_FLAG_FIXEDBUFS;
+		p.flags |= IORING_SETUP_FIXEDBUFS;
 	}
 
 	ret = syscall(__NR_sys_io_uring_setup, depth, ld->iovecs, &p);
@@ -546,13 +471,13 @@ static int fio_aioring_queue_init(struct thread_data *td)
 		return ret;
 
 	ld->ring_fd = ret;
-	return fio_aioring_mmap(ld, &p);
+	return fio_ioring_mmap(ld, &p);
 }
 
-static int fio_aioring_post_init(struct thread_data *td)
+static int fio_ioring_post_init(struct thread_data *td)
 {
-	struct aioring_data *ld = td->io_ops_data;
-	struct aioring_options *o = td->eo;
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
 	struct io_u *io_u;
 	int err;
 
@@ -568,7 +493,7 @@ static int fio_aioring_post_init(struct thread_data *td)
 		}
 	}
 
-	err = fio_aioring_queue_init(td);
+	err = fio_ioring_queue_init(td);
 	if (err) {
 		td_verror(td, errno, "io_queue_init");
 		return 1;
@@ -582,9 +507,9 @@ static unsigned roundup_pow2(unsigned depth)
 	return 1UL << __fls(depth - 1);
 }
 
-static int fio_aioring_init(struct thread_data *td)
+static int fio_ioring_init(struct thread_data *td)
 {
-	struct aioring_data *ld;
+	struct ioring_data *ld;
 
 	ld = calloc(1, sizeof(*ld));
 
@@ -602,39 +527,39 @@ static int fio_aioring_init(struct thread_data *td)
 	return 0;
 }
 
-static int fio_aioring_io_u_init(struct thread_data *td, struct io_u *io_u)
+static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
-	struct aioring_data *ld = td->io_ops_data;
+	struct ioring_data *ld = td->io_ops_data;
 
 	ld->io_u_index[io_u->index] = io_u;
 	return 0;
 }
 
 static struct ioengine_ops ioengine = {
-	.name			= "aio-ring",
+	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
-	.init			= fio_aioring_init,
-	.post_init		= fio_aioring_post_init,
-	.io_u_init		= fio_aioring_io_u_init,
-	.prep			= fio_aioring_prep,
-	.queue			= fio_aioring_queue,
-	.commit			= fio_aioring_commit,
-	.getevents		= fio_aioring_getevents,
-	.event			= fio_aioring_event,
-	.cleanup		= fio_aioring_cleanup,
+	.init			= fio_ioring_init,
+	.post_init		= fio_ioring_post_init,
+	.io_u_init		= fio_ioring_io_u_init,
+	.prep			= fio_ioring_prep,
+	.queue			= fio_ioring_queue,
+	.commit			= fio_ioring_commit,
+	.getevents		= fio_ioring_getevents,
+	.event			= fio_ioring_event,
+	.cleanup		= fio_ioring_cleanup,
 	.open_file		= generic_open_file,
 	.close_file		= generic_close_file,
 	.get_file_size		= generic_get_file_size,
 	.options		= options,
-	.option_struct_size	= sizeof(struct aioring_options),
+	.option_struct_size	= sizeof(struct ioring_options),
 };
 
-static void fio_init fio_aioring_register(void)
+static void fio_init fio_ioring_register(void)
 {
 	register_ioengine(&ioengine);
 }
 
-static void fio_exit fio_aioring_unregister(void)
+static void fio_exit fio_ioring_unregister(void)
 {
 	unregister_ioengine(&ioengine);
 }
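After the rename, fio_ioring_queue_init() no longer sets a base flag such as the old `IOCTX_FLAG_SCQRING`. It builds `p.flags` only from the engine options, and `sqthread_poll` is honored only when `sqthread` was also given. The sketch below reproduces that mapping. The flag values come from the early os/io_uring.h in this series; the merged kernel ABI defines a different set. `struct sketch_opts` is a hypothetical subset of `struct ioring_options`.

```c
#include <stdbool.h>

/* io_uring_setup() flag values from the early os/io_uring.h in this
 * series; the merged kernel ABI defines a different set. */
#define SKETCH_SETUP_IOPOLL	(1U << 0)
#define SKETCH_SETUP_FIXEDBUFS	(1U << 1)
#define SKETCH_SETUP_SQTHREAD	(1U << 2)
#define SKETCH_SETUP_SQWQ	(1U << 3)
#define SKETCH_SETUP_SQPOLL	(1U << 4)

/* Hypothetical subset of struct ioring_options. */
struct sketch_opts {
	bool hipri;
	bool fixedbufs;
	bool sqthread_set;
	bool sqthread_poll;
	bool sqwq;
};

/* Same decisions as fio_ioring_queue_init(). */
static unsigned setup_flags(const struct sketch_opts *o)
{
	unsigned flags = 0;

	if (o->hipri)
		flags |= SKETCH_SETUP_IOPOLL;
	if (o->sqthread_set) {
		flags |= SKETCH_SETUP_SQTHREAD;
		/* polling only makes sense with a kernel SQ thread */
		if (o->sqthread_poll)
			flags |= SKETCH_SETUP_SQPOLL;
	}
	if (o->sqwq)
		flags |= SKETCH_SETUP_SQWQ;
	if (o->fixedbufs)
		flags |= SKETCH_SETUP_FIXEDBUFS;
	return flags;
}
```

The real function also raises `RLIMIT_MEMLOCK` before requesting fixed buffers, because those buffers stay pinned in memory. The sketch leaves that step out.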
diff --git a/options.c b/options.c
index 626c7c17..6d832354 100644
--- a/options.c
+++ b/options.c
@@ -1773,13 +1773,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Linux native asynchronous IO",
 			  },
 #endif
-#ifdef CONFIG_LIBAIO
-#ifdef ARCH_HAVE_AIORING
-			  { .ival = "aio-ring",
-			    .help = "Linux native asynchronous IO",
+#ifdef ARCH_HAVE_IOURING
+			  { .ival = "io_uring",
+			    .help = "Fast Linux native aio",
 			  },
 #endif
-#endif
 #ifdef CONFIG_POSIXAIO
 			  { .ival = "posixaio",
 			    .help = "POSIX asynchronous IO",
diff --git a/os/io_uring.h b/os/io_uring.h
new file mode 100644
index 00000000..8dda7951
--- /dev/null
+++ b/os/io_uring.h
@@ -0,0 +1,101 @@
+#ifndef IO_URING_H
+#define IO_URING_H
+
+#include <linux/fs.h>
+
+/*
+ * IO submission data structure
+ */
+struct io_uring_iocb {
+	u8	opcode;
+	u8	flags;
+	u16	ioprio;
+	s32	fd;
+	u64	off;
+	union {
+		void	*addr;
+		u64	__pad;
+	};
+	u32	len;
+	union {
+		__kernel_rwf_t	rw_flags;
+		u32		__resv;
+	};
+};
+
+/*
+ * io_uring_setup() flags
+ */
+#define IORING_SETUP_IOPOLL	(1 << 0)	/* io_context is polled */
+#define IORING_SETUP_FIXEDBUFS	(1 << 1)	/* IO buffers are fixed */
+#define IORING_SETUP_SQTHREAD	(1 << 2)	/* Use SQ thread */
+#define IORING_SETUP_SQWQ	(1 << 3)	/* Use SQ workqueue */
+#define IORING_SETUP_SQPOLL	(1 << 4)	/* SQ thread polls */
+
+#define IORING_OP_READ		1
+#define IORING_OP_WRITE		2
+#define IORING_OP_FSYNC		3
+#define IORING_OP_FDSYNC	4
+#define IORING_OP_READ_FIXED	5
+#define IORING_OP_WRITE_FIXED	6
+
+/*
+ * IO completion data structure
+ */
+struct io_uring_event {
+	__u64	index;		/* what iocb this event came from */
+	s32	res;		/* result code for this event */
+	u32	flags;
+};
+
+#define IOEV_FLAG_CACHEHIT	(1 << 0)	/* IO did not hit media */
+
+/*
+ * Magic offsets for the application to mmap the data it needs
+ */
+#define IORING_OFF_SQ_RING		0ULL
+#define IORING_OFF_CQ_RING		0x8000000ULL
+#define IORING_OFF_IOCB			0x10000000ULL
+
+/*
+ * Filled with the offset for mmap(2)
+ */
+struct io_sqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 flags;
+	u32 dropped;
+	u32 array;
+	u32 resv[3];
+};
+
+#define IORING_SQ_NEED_WAKEUP	(1 << 0) /* needs io_uring_enter wakeup */
+
+struct io_cqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 overflow;
+	u32 events;
+	u32 resv[4];
+};
+
+#define IORING_ENTER_GETEVENTS	(1 << 0)
+
+/*
+ * Passed in for io_uring_setup(2). Copied back with updated info on success
+ */
+struct io_uring_params {
+	u32 sq_entries;
+	u32 cq_entries;
+	u32 flags;
+	u16 sq_thread_cpu;
+	u16 resv[9];
+	struct io_sqring_offsets sq_off;
+	struct io_cqring_offsets cq_off;
+};
+
+#endif
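fio_ioring_mmap() maps three regions of the ring fd at the magic offsets defined in this header:

- the SQ ring at `IORING_OFF_SQ_RING`, sized as `sq_off.array` plus one 32-bit index per SQ entry;
- the iocb array at `IORING_OFF_IOCB`;
- the CQ ring at `IORING_OFF_CQ_RING`, sized as `cq_off.events` plus the event array.

The sketch below isolates the length arithmetic. `struct sketch_params` is a hypothetical flattening of the kernel-filled `io_uring_params` fields. The entry sizes of 32 and 16 bytes assume the early structure layouts on common Linux ABIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattening of the io_uring_params fields that the
 * length calculation reads after io_uring_setup() returns. */
struct sketch_params {
	uint32_t sq_entries;
	uint32_t cq_entries;
	uint32_t sq_off_array;	/* p->sq_off.array */
	uint32_t cq_off_events;	/* p->cq_off.events */
};

struct sketch_mmap_lens {
	size_t sq_ring;	/* mapped at IORING_OFF_SQ_RING */
	size_t iocbs;	/* mapped at IORING_OFF_IOCB */
	size_t cq_ring;	/* mapped at IORING_OFF_CQ_RING */
};

/* Assumed sizes of the early io_uring_iocb and io_uring_event. */
#define SKETCH_IOCB_SIZE	32
#define SKETCH_EVENT_SIZE	16

static struct sketch_mmap_lens ring_mmap_lens(const struct sketch_params *p)
{
	struct sketch_mmap_lens l;

	/* SQ ring header plus one 32-bit iocb index per SQ entry */
	l.sq_ring = p->sq_off_array + (size_t)p->sq_entries * sizeof(uint32_t);
	/* The iocb array itself, one entry per SQ slot */
	l.iocbs = (size_t)p->sq_entries * SKETCH_IOCB_SIZE;
	/* CQ ring header plus the completion event array */
	l.cq_ring = p->cq_off_events + (size_t)p->cq_entries * SKETCH_EVENT_SIZE;
	return l;
}
```

For example, with illustrative offsets `sq_off.array = 64` and `cq_off.events = 128`, 32 SQ entries and 64 CQ entries give lengths of 192, 1024 and 1152 bytes.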
diff --git a/t/aio-ring.c b/t/io_uring.c
similarity index 81%
rename from t/aio-ring.c
rename to t/io_uring.c
index 1a4fe44b..83d723f9 100644
--- a/t/aio-ring.c
+++ b/t/io_uring.c
@@ -1,6 +1,3 @@
-/*
- * gcc -D_GNU_SOURCE -Wall -O2 -o aio-ring aio-ring.c  -lpthread -laio
- */
 #include <stdio.h>
 #include <errno.h>
 #include <assert.h>
@@ -15,69 +12,29 @@
 #include <sys/syscall.h>
 #include <sys/resource.h>
 #include <sys/mman.h>
+#include <sys/uio.h>
 #include <linux/fs.h>
 #include <fcntl.h>
 #include <unistd.h>
-#include <libaio.h>
 #include <string.h>
 #include <pthread.h>
 #include <sched.h>
 
 #include "../arch/arch.h"
 
-#define IOCTX_FLAG_SCQRING	(1 << 0)	/* Use SQ/CQ rings */
-#define IOCTX_FLAG_IOPOLL	(1 << 1)
-#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
-#define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
-#define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ wq */
-#define IOCTX_FLAG_SQPOLL	(1 << 5)
-
-#define IOEV_RES2_CACHEHIT	(1 << 0)
-
-#define barrier()	__asm__ __volatile__("": : :"memory")
-
-#define min(a, b)		((a < b) ? (a) : (b))
-
 typedef uint64_t u64;
 typedef uint32_t u32;
+typedef int32_t s32;
 typedef uint16_t u16;
+typedef uint8_t u8;
 
-#define IORING_OFF_SQ_RING	0ULL
-#define IORING_OFF_CQ_RING	0x8000000ULL
-#define IORING_OFF_IOCB		0x10000000ULL
-
-struct aio_sqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 flags;
-	u32 dropped;
-	u32 array;
-	u32 resv[3];
-};
+#include "../os/io_uring.h"
 
-struct aio_cqring_offsets {
-	u32 head;
-	u32 tail;
-	u32 ring_mask;
-	u32 ring_entries;
-	u32 overflow;
-	u32 events;
-	u32 resv[4];
-};
+#define barrier()	__asm__ __volatile__("": : :"memory")
 
-struct aio_uring_params {
-	u32 sq_entries;
-	u32 cq_entries;
-	u32 flags;
-	u16 sq_thread_cpu;
-	u16 resv[9];
-	struct aio_sqring_offsets sq_off;
-	struct aio_cqring_offsets cq_off;
-};
+#define min(a, b)		((a < b) ? (a) : (b))
 
-struct aio_sq_ring {
+struct io_sq_ring {
 	u32 *head;
 	u32 *tail;
 	u32 *ring_mask;
@@ -85,16 +42,14 @@ struct aio_sq_ring {
 	u32 *array;
 };
 
-struct aio_cq_ring {
+struct io_cq_ring {
 	u32 *head;
 	u32 *tail;
 	u32 *ring_mask;
 	u32 *ring_entries;
-	struct io_event *events;
+	struct io_uring_event *events;
 };
 
-#define IORING_ENTER_GETEVENTS	(1 << 0)
-
 #define DEPTH			32
 
 #define BATCH_SUBMIT		8
@@ -109,10 +64,10 @@ struct submitter {
 	unsigned long max_blocks;
 	int ring_fd;
 	struct drand48_data rand;
-	struct aio_sq_ring sq_ring;
-	struct iocb *iocbs;
+	struct io_sq_ring sq_ring;
+	struct io_uring_iocb *iocbs;
 	struct iovec iovecs[DEPTH];
-	struct aio_cq_ring cq_ring;
+	struct io_cq_ring cq_ring;
 	int inflight;
 	unsigned long reaps;
 	unsigned long done;
@@ -132,7 +87,7 @@ static int sq_thread = 0;	/* use kernel submission thread */
 static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
 
 static int io_uring_setup(unsigned entries, struct iovec *iovecs,
-			  struct aio_uring_params *p)
+			  struct io_uring_params *p)
 {
 	return syscall(__NR_sys_io_uring_setup, entries, iovecs, p);
 }
@@ -151,23 +106,25 @@ static int gettid(void)
 
 static void init_io(struct submitter *s, int fd, unsigned index)
 {
-	struct iocb *iocb = &s->iocbs[index];
+	struct io_uring_iocb *iocb = &s->iocbs[index];
 	unsigned long offset;
 	long r;
 
 	lrand48_r(&s->rand, &r);
 	offset = (r % (s->max_blocks - 1)) * BS;
 
-	iocb->aio_fildes = fd;
-	iocb->aio_lio_opcode = IO_CMD_PREAD;
-	iocb->u.c.buf = s->iovecs[index].iov_base;
-	iocb->u.c.nbytes = BS;
-	iocb->u.c.offset = offset;
+	iocb->opcode = IORING_OP_READ;
+	iocb->flags = 0;
+	iocb->ioprio = 0;
+	iocb->fd = fd;
+	iocb->off = offset;
+	iocb->addr = s->iovecs[index].iov_base;
+	iocb->len = BS;
 }
 
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 {
-	struct aio_sq_ring *ring = &s->sq_ring;
+	struct io_sq_ring *ring = &s->sq_ring;
 	u32 index, tail, next_tail, prepped = 0;
 
 	next_tail = tail = *ring->tail;
@@ -217,8 +174,8 @@ static int get_file_size(int fd, unsigned long *blocks)
 
 static int reap_events(struct submitter *s)
 {
-	struct aio_cq_ring *ring = &s->cq_ring;
-	struct io_event *ev;
+	struct io_cq_ring *ring = &s->cq_ring;
+	struct io_uring_event *ev;
 	u32 head, reaped = 0;
 
 	head = *ring->head;
@@ -228,15 +185,15 @@ static int reap_events(struct submitter *s)
 			break;
 		ev = &ring->events[head & cq_ring_mask];
 		if (ev->res != BS) {
-			struct iocb *iocb = ev->obj;
+			struct io_uring_iocb *iocb = &s->iocbs[ev->index];
 
-			printf("io: unexpected ret=%ld\n", ev->res);
+			printf("io: unexpected ret=%d\n", ev->res);
 			printf("offset=%lu, size=%lu\n",
-					(unsigned long) iocb->u.c.offset,
-					(unsigned long) iocb->u.c.nbytes);
+					(unsigned long) iocb->off,
+					(unsigned long) iocb->len);
 			return -1;
 		}
-		if (ev->res2 & IOEV_RES2_CACHEHIT)
+		if (ev->flags & IOEV_FLAG_CACHEHIT)
 			s->cachehit++;
 		else
 			s->cachemiss++;
@@ -359,23 +316,22 @@ static void arm_sig_int(void)
 
 static int setup_ring(struct submitter *s)
 {
-	struct aio_sq_ring *sring = &s->sq_ring;
-	struct aio_cq_ring *cring = &s->cq_ring;
-	struct aio_uring_params p;
+	struct io_sq_ring *sring = &s->sq_ring;
+	struct io_cq_ring *cring = &s->cq_ring;
+	struct io_uring_params p;
 	void *ptr;
 	int fd;
 
 	memset(&p, 0, sizeof(p));
 
-	p.flags = IOCTX_FLAG_SCQRING;
 	if (polled)
-		p.flags |= IOCTX_FLAG_IOPOLL;
+		p.flags |= IORING_SETUP_IOPOLL;
 	if (fixedbufs)
-		p.flags |= IOCTX_FLAG_FIXEDBUFS;
+		p.flags |= IORING_SETUP_FIXEDBUFS;
 	if (buffered)
-		p.flags |= IOCTX_FLAG_SQWQ;
+		p.flags |= IORING_SETUP_SQWQ;
 	else if (sq_thread) {
-		p.flags |= IOCTX_FLAG_SQTHREAD;
+		p.flags |= IORING_SETUP_SQTHREAD;
 		p.sq_thread_cpu = sq_thread_cpu;
 	}
 
@@ -400,12 +356,12 @@ static int setup_ring(struct submitter *s)
 	sring->array = ptr + p.sq_off.array;
 	sq_ring_mask = *sring->ring_mask;
 
-	s->iocbs = mmap(0, p.sq_entries * sizeof(struct iocb),
+	s->iocbs = mmap(0, p.sq_entries * sizeof(struct io_uring_iocb),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_IOCB);
 	printf("iocbs ptr   = 0x%p\n", s->iocbs);
 
-	ptr = mmap(0, p.cq_off.events + p.cq_entries * sizeof(struct io_event),
+	ptr = mmap(0, p.cq_off.events + p.cq_entries * sizeof(struct io_uring_event),
 			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
 			IORING_OFF_CQ_RING);
 	printf("cq_ring ptr = 0x%p\n", ptr);
@@ -501,5 +457,6 @@ int main(int argc, char *argv[])
 	} while (!finish);
 
 	pthread_join(s->thread, &ret);
+	close(s->ring_fd);
 	return 0;
 }
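
The renamed t/io_uring.c keeps the same ring discipline as before. Head and tail are free-running u32 counters, and a slot is found by masking (`head & cq_ring_mask`, `tail & sq_ring_mask`). The sketch below shows that pattern in one process. Several things in it are assumptions rather than fio code. The names `struct ring`, `ring_push()` and `ring_pop()` are hypothetical. C11 acquire/release atomics are used in place of the compiler-only `barrier()` macro, so the ordering also holds on weakly ordered CPUs. The full test `tail - head == entries` is the usual one for free-running counters and is not copied from the tool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES	8u
#define RING_MASK	(RING_ENTRIES - 1)

/* Masking only acts as a modulo when the size is a power of two. */
static_assert((RING_ENTRIES & RING_MASK) == 0,
	      "ring size must be a power of two");

/* Hypothetical in-process ring; in the tool, head and tail live in
 * memory shared with the kernel. */
struct ring {
	_Atomic uint32_t head;		/* advanced by the consumer */
	_Atomic uint32_t tail;		/* advanced by the producer */
	uint32_t slots[RING_ENTRIES];
};

/* Producer: returns 1 after publishing val, 0 if the ring is full. */
static int ring_push(struct ring *r, uint32_t val)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	/* Unsigned subtraction stays correct across the 2^32 wrap. */
	assert(tail - head <= RING_ENTRIES);
	if (tail - head == RING_ENTRIES)
		return 0;
	r->slots[tail & RING_MASK] = val;
	/* Release: the slot write is visible before the new tail. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 1;
}

/* Consumer: stores the oldest entry in *val and returns 1, or 0 if empty. */
static int ring_pop(struct ring *r, uint32_t *val)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire: pairs with the producer's release store of tail. */
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return 0;
	*val = r->slots[head & RING_MASK];
	/* Release: the slot read completes before the slot is handed back. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 1;
}
```

Because `tail - head` is computed in unsigned 32-bit arithmetic, the ring still works if both counters start just below `UINT32_MAX` and a full ring is pushed. With a power-of-two size, the masked slot index also stays continuous across that wrap.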



* Recent changes (master)
@ 2019-01-06 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-06 13:00 UTC
  To: fio

The following changes since commit ac122fea7540ca115c157e0a835a74b891f10484:

  aioring: update to newer API (2019-01-04 22:22:54 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f310970e737975088e41ea14c399450ba8ae3a49:

  t/aio-ring: cleanup the code a bit (2019-01-05 07:42:30 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      aioring: make sq/cqring_offsets a bit more future proof
      t/aio-ring: cleanup the code a bit

 engines/aioring.c |  3 +++
 t/aio-ring.c      | 43 +++++++++++++++++++++++++------------------
 2 files changed, 28 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/engines/aioring.c b/engines/aioring.c
index ca60b281..8cecb6ad 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -68,7 +68,9 @@ struct aio_sqring_offsets {
 	u32 ring_mask;
 	u32 ring_entries;
 	u32 flags;
+	u32 dropped;
 	u32 array;
+	u32 resv[3];
 };
 
 struct aio_cqring_offsets {
@@ -78,6 +80,7 @@ struct aio_cqring_offsets {
 	u32 ring_entries;
 	u32 overflow;
 	u32 events;
+	u32 resv[4];
 };
 
 struct aio_uring_params {
diff --git a/t/aio-ring.c b/t/aio-ring.c
index 71978c68..1a4fe44b 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -52,7 +52,9 @@ struct aio_sqring_offsets {
 	u32 ring_mask;
 	u32 ring_entries;
 	u32 flags;
+	u32 dropped;
 	u32 array;
+	u32 resv[3];
 };
 
 struct aio_cqring_offsets {
@@ -62,6 +64,7 @@ struct aio_cqring_offsets {
 	u32 ring_entries;
 	u32 overflow;
 	u32 events;
+	u32 resv[4];
 };
 
 struct aio_uring_params {
@@ -104,7 +107,7 @@ static unsigned sq_ring_mask, cq_ring_mask;
 struct submitter {
 	pthread_t thread;
 	unsigned long max_blocks;
-	int fd;
+	int ring_fd;
 	struct drand48_data rand;
 	struct aio_sq_ring sq_ring;
 	struct iocb *iocbs;
@@ -137,8 +140,8 @@ static int io_uring_setup(unsigned entries, struct iovec *iovecs,
 static int io_uring_enter(struct submitter *s, unsigned int to_submit,
 			  unsigned int min_complete, unsigned int flags)
 {
-	return syscall(__NR_sys_io_uring_enter, s->fd, to_submit, min_complete,
-			flags);
+	return syscall(__NR_sys_io_uring_enter, s->ring_fd, to_submit,
+			min_complete, flags);
 }
 
 static int gettid(void)
@@ -228,7 +231,9 @@ static int reap_events(struct submitter *s)
 			struct iocb *iocb = ev->obj;
 
 			printf("io: unexpected ret=%ld\n", ev->res);
-			printf("offset=%lu, size=%lu\n", (unsigned long) iocb->u.c.offset, (unsigned long) iocb->u.c.nbytes);
+			printf("offset=%lu, size=%lu\n",
+					(unsigned long) iocb->u.c.offset,
+					(unsigned long) iocb->u.c.nbytes);
 			return -1;
 		}
 		if (ev->res2 & IOEV_RES2_CACHEHIT)
@@ -265,21 +270,22 @@ static void *submitter_fn(void *data)
 		printf("failed getting size of device/file\n");
 		goto err;
 	}
-	if (!s->max_blocks) {
+	if (s->max_blocks <= 1) {
 		printf("Zero file/device size?\n");
 		goto err;
 	}
-
 	s->max_blocks--;
 
 	srand48_r(pthread_self(), &s->rand);
 
 	prepped = 0;
 	do {
-		int to_wait, to_submit, this_reap;
+		int to_wait, to_submit, this_reap, to_prep;
 
-		if (!prepped && s->inflight < DEPTH)
-			prepped = prep_more_ios(s, fd, min(DEPTH - s->inflight, BATCH_SUBMIT));
+		if (!prepped && s->inflight < DEPTH) {
+			to_prep = min(DEPTH - s->inflight, BATCH_SUBMIT);
+			prepped = prep_more_ios(s, fd, to_prep);
+		}
 		s->inflight += prepped;
 submit_more:
 		to_submit = prepped;
@@ -289,7 +295,8 @@ submit:
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
 
-		ret = io_uring_enter(s, to_submit, to_wait, IORING_ENTER_GETEVENTS);
+		ret = io_uring_enter(s, to_submit, to_wait,
+					IORING_ENTER_GETEVENTS);
 		s->calls++;
 
 		this_reap = reap_events(s);
@@ -381,11 +388,10 @@ static int setup_ring(struct submitter *s)
 		return 1;
 	}
 
-	s->fd = fd;
-
+	s->ring_fd = fd;
 	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(u32),
-			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-			fd, IORING_OFF_SQ_RING);
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_SQ_RING);
 	printf("sq_ring ptr = 0x%p\n", ptr);
 	sring->head = ptr + p.sq_off.head;
 	sring->tail = ptr + p.sq_off.tail;
@@ -394,13 +400,14 @@ static int setup_ring(struct submitter *s)
 	sring->array = ptr + p.sq_off.array;
 	sq_ring_mask = *sring->ring_mask;
 
-	s->iocbs = mmap(0, p.sq_entries * sizeof(struct iocb), PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_IOCB);
+	s->iocbs = mmap(0, p.sq_entries * sizeof(struct iocb),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_IOCB);
 	printf("iocbs ptr   = 0x%p\n", s->iocbs);
 
 	ptr = mmap(0, p.cq_off.events + p.cq_entries * sizeof(struct io_event),
-			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
-			fd, IORING_OFF_CQ_RING);
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+			IORING_OFF_CQ_RING);
 	printf("cq_ring ptr = 0x%p\n", ptr);
 	cring->head = ptr + p.cq_off.head;
 	cring->tail = ptr + p.cq_off.tail;
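
The `resv` arrays added in this message pad both offset structs to a fixed size of 10 u32 words. Later fields can then be carved out of the reserved words without changing `sizeof`. The minimal sketch below illustrates that idea. The struct names are invented for illustration, and the `new_field` member is purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* SQ offsets before this change: 6 words, no room to grow. */
struct sqring_offsets_before {
	u32 head, tail, ring_mask, ring_entries, flags, array;
};

/* SQ offsets after this change: 'dropped' added, plus 3 reserved words. */
struct sqring_offsets_after {
	u32 head, tail, ring_mask, ring_entries, flags, dropped, array;
	u32 resv[3];
};

/* Hypothetical later revision: one reserved word becomes a real field. */
struct sqring_offsets_next {
	u32 head, tail, ring_mask, ring_entries, flags, dropped, array;
	u32 new_field;		/* illustrative only, not a real field */
	u32 resv[2];
};

/* Size and pre-existing offsets stay put, so an older binary still
 * passes a correctly sized struct. */
static_assert(sizeof(struct sqring_offsets_after) ==
	      sizeof(struct sqring_offsets_next), "size must not change");
static_assert(offsetof(struct sqring_offsets_after, array) ==
	      offsetof(struct sqring_offsets_next, array), "array moved");
static_assert(offsetof(struct sqring_offsets_next, new_field) ==
	      offsetof(struct sqring_offsets_after, resv),
	      "new field must reuse resv[0]");
```

In `struct aio_uring_params`, `sq_off` sits before `cq_off`. Growing the SQ offsets struct directly would therefore also shift where `cq_off` starts, and the kernel and an older binary would disagree about its location.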



* Recent changes (master)
@ 2019-01-05 13:00 Jens Axboe
From: Jens Axboe @ 2019-01-05 13:00 UTC
  To: fio

The following changes since commit ac4f3d4e4cf16b1097249a819fe7111b2674b3f4:

  aioring: remove IOCB_FLAG_HIPRI (2018-12-30 17:19:40 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ac122fea7540ca115c157e0a835a74b891f10484:

  aioring: update to newer API (2019-01-04 22:22:54 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      t/aio-ring: update to newer mmap() API
      engines/aioring: update for newer mmap based API
      t/aio-ring: use syscall defines
      aioring: update to newer API

 arch/arch-x86_64.h |   8 +-
 engines/aioring.c  | 267 ++++++++++++++++++++++++++++++++---------------------
 t/aio-ring.c       | 262 +++++++++++++++++++++++++++++++---------------------
 3 files changed, 322 insertions(+), 215 deletions(-)
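
With the newer mmap() API, the kernel fills in byte offsets (`sq_off`, `cq_off`), and userspace turns them into pointers into the mapped region. `setup_ring()` and `fio_aioring_mmap()` do this in the diff below. The sketch here shows only that pointer derivation, using an ordinary buffer in place of the mapping. The offsets used in the check are made up, and `struct sq_ring_view`, `map_sq_ring()` and `sq_ring_map_len()` are hypothetical names. The sketch uses `char *` arithmetic because the patch's `ptr + p.sq_off.head` on a `void *` relies on a GCC extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Byte offsets reported back by the kernel (field names as in
 * struct aio_sqring_offsets in this message). */
struct sqring_offsets {
	u32 head, tail, ring_mask, ring_entries, flags, array;
};

/* Hypothetical userspace view: one pointer per shared field. */
struct sq_ring_view {
	u32 *head, *tail, *ring_mask, *ring_entries, *flags, *array;
};

/* Bytes to map for the SQ ring; same formula as the patch. */
static size_t sq_ring_map_len(const struct sqring_offsets *off, u32 entries)
{
	return off->array + (size_t)entries * sizeof(u32);
}

/* Turn base + byte-offset pairs into typed pointers.  base is assumed
 * to be suitably aligned, as a page-aligned mmap() result is. */
static void map_sq_ring(struct sq_ring_view *v, void *base,
			const struct sqring_offsets *off)
{
	char *p = base;		/* ISO C arithmetic on bytes */

	/* Every offset must keep the u32 fields naturally aligned. */
	assert(((off->head | off->tail | off->ring_mask | off->ring_entries |
		 off->flags | off->array) % sizeof(u32)) == 0);

	v->head = (u32 *)(p + off->head);
	v->tail = (u32 *)(p + off->tail);
	v->ring_mask = (u32 *)(p + off->ring_mask);
	v->ring_entries = (u32 *)(p + off->ring_entries);
	v->flags = (u32 *)(p + off->flags);
	v->array = (u32 *)(p + off->array);
}
```

A real caller would also compare each mmap() result against MAP_FAILED before deriving any pointers from it.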

---

Diff of recent changes:

diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index d49bcd7f..d0a98b8b 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -1,11 +1,11 @@
 #ifndef ARCH_X86_64_H
 #define ARCH_X86_64_H
 
-#ifndef __NR_sys_io_setup2
-#define __NR_sys_io_setup2	335
+#ifndef __NR_sys_io_uring_setup
+#define __NR_sys_io_uring_setup	335
 #endif
-#ifndef __NR_sys_io_ring_enter
-#define __NR_sys_io_ring_enter	336
+#ifndef __NR_sys_io_uring_enter
+#define __NR_sys_io_uring_enter	336
 #endif
 
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
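
The io_uring_setup/io_uring_enter calls in this series go through glibc's `syscall()`. That function reports failure by returning -1 and setting `errno`, not by returning a negative error code. This is why the t/aio-ring.c submit loop further down can drop its `ret == -EAGAIN` test and check `errno` alone. The small sketch below shows that convention. `sys_ret()` and `raw_close()` are hypothetical helpers, and `close(-1)` is used only because it fails predictably with EBADF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fold syscall()'s "-1 plus errno" failure report
 * into a single negative-errno return value. */
static long sys_ret(long ret)
{
	/* On failure, syscall() has already stored the error in errno. */
	assert(ret != -1 || errno != 0);
	return ret == -1 ? -errno : ret;
}

/* Example: a raw close() on the given descriptor. */
static long raw_close(int fd)
{
	return sys_ret(syscall(SYS_close, fd));
}
```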
diff --git a/engines/aioring.c b/engines/aioring.c
index f836009d..ca60b281 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -22,13 +22,13 @@
 #ifdef ARCH_HAVE_AIORING
 
 /*
- * io_setup2(2) flags
+ * io_uring_setup(2) flags
  */
-#ifndef IOCTX_FLAG_IOPOLL
-#define IOCTX_FLAG_IOPOLL	(1 << 0)
-#endif
 #ifndef IOCTX_FLAG_SCQRING
-#define IOCTX_FLAG_SCQRING	(1 << 1)
+#define IOCTX_FLAG_SCQRING	(1 << 0)
+#endif
+#ifndef IOCTX_FLAG_IOPOLL
+#define IOCTX_FLAG_IOPOLL	(1 << 1)
 #endif
 #ifndef IOCTX_FLAG_FIXEDBUFS
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
@@ -43,12 +43,15 @@
 #define IOCTX_FLAG_SQPOLL	(1 << 5)
 #endif
 
+#define IORING_OFF_SQ_RING	0ULL
+#define IORING_OFF_CQ_RING	0x8000000ULL
+#define IORING_OFF_IOCB		0x10000000ULL
 
 /*
- * io_ring_enter(2) flags
+ * io_uring_enter(2) flags
  */
-#ifndef IORING_FLAG_GETEVENTS
-#define IORING_FLAG_GETEVENTS	(1 << 0)
+#ifndef IORING_ENTER_GETEVENTS
+#define IORING_ENTER_GETEVENTS	(1 << 0)
 #endif
 
 typedef uint64_t u64;
@@ -59,43 +62,68 @@ typedef uint16_t u16;
 
 #define IOEV_RES2_CACHEHIT	(1 << 0)
 
+struct aio_sqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 flags;
+	u32 array;
+};
+
+struct aio_cqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 overflow;
+	u32 events;
+};
+
+struct aio_uring_params {
+	u32 sq_entries;
+	u32 cq_entries;
+	u32 flags;
+	u16 sq_thread_cpu;
+	u16 resv[9];
+	struct aio_sqring_offsets sq_off;
+	struct aio_cqring_offsets cq_off;
+};
+
 struct aio_sq_ring {
-	union {
-		struct {
-			u32 head;
-			u32 tail;
-			u32 nr_events;
-			u16 sq_thread_cpu;
-			u16 kflags;
-			u64 iocbs;
-		};
-		u32 pad[16];
-	};
-	u32 array[0];
+	u32 *head;
+	u32 *tail;
+	u32 *ring_mask;
+	u32 *ring_entries;
+	u32 *flags;
+	u32 *array;
 };
 
 struct aio_cq_ring {
-	union {
-		struct {
-			u32 head;
-			u32 tail;
-			u32 nr_events;
-		};
-		struct io_event pad;
-	};
-	struct io_event events[0];
+	u32 *head;
+	u32 *tail;
+	u32 *ring_mask;
+	u32 *ring_entries;
+	struct io_event *events;
+};
+
+struct aioring_mmap {
+	void *ptr;
+	size_t len;
 };
 
 struct aioring_data {
-	io_context_t aio_ctx;
+	int ring_fd;
+
 	struct io_u **io_us;
 	struct io_u **io_u_index;
 
-	struct aio_sq_ring *sq_ring;
+	struct aio_sq_ring sq_ring;
 	struct iocb *iocbs;
+	struct iovec *iovecs;
 	unsigned sq_ring_mask;
 
-	struct aio_cq_ring *cq_ring;
+	struct aio_cq_ring cq_ring;
 	struct io_event *events;
 	unsigned cq_ring_mask;
 
@@ -105,6 +133,8 @@ struct aioring_data {
 
 	uint64_t cachehit;
 	uint64_t cachemiss;
+
+	struct aioring_mmap mmap[3];
 };
 
 struct aioring_options {
@@ -178,11 +208,11 @@ static struct fio_option options[] = {
 	},
 };
 
-static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
+static int io_uring_enter(struct aioring_data *ld, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
-	return syscall(__NR_sys_io_ring_enter, ctx, to_submit, min_complete,
-			flags);
+	return syscall(__NR_sys_io_uring_enter, ld->ring_fd, to_submit,
+			min_complete, flags);
 }
 
 static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
@@ -220,7 +250,7 @@ static struct io_u *fio_aioring_event(struct thread_data *td, int event)
 
 	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
-	ev = &ld->cq_ring->events[index];
+	ev = &ld->cq_ring.events[index];
 	io_u = ev->data;
 
 	if (ev->res != io_u->xfer_buflen) {
@@ -245,19 +275,19 @@ static int fio_aioring_cqring_reap(struct thread_data *td, unsigned int events,
 				   unsigned int max)
 {
 	struct aioring_data *ld = td->io_ops_data;
-	struct aio_cq_ring *ring = ld->cq_ring;
+	struct aio_cq_ring *ring = &ld->cq_ring;
 	u32 head, reaped = 0;
 
-	head = ring->head;
+	head = *ring->head;
 	do {
 		read_barrier();
-		if (head == ring->tail)
+		if (head == *ring->tail)
 			break;
 		reaped++;
 		head++;
 	} while (reaped + events < max);
 
-	ring->head = head;
+	*ring->head = head;
 	write_barrier();
 	return reaped;
 }
@@ -268,11 +298,11 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 	struct aioring_data *ld = td->io_ops_data;
 	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
 	struct aioring_options *o = td->eo;
-	struct aio_cq_ring *ring = ld->cq_ring;
+	struct aio_cq_ring *ring = &ld->cq_ring;
 	unsigned events = 0;
 	int r;
 
-	ld->cq_ring_off = ring->head;
+	ld->cq_ring_off = *ring->head;
 	do {
 		r = fio_aioring_cqring_reap(td, events, max);
 		if (r) {
@@ -281,12 +311,12 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 		}
 
 		if (!o->sqthread_poll) {
-			r = io_ring_enter(ld->aio_ctx, 0, actual_min,
-						IORING_FLAG_GETEVENTS);
+			r = io_uring_enter(ld, 0, actual_min,
+						IORING_ENTER_GETEVENTS);
 			if (r < 0) {
 				if (errno == EAGAIN)
 					continue;
-				td_verror(td, errno, "io_ring_enter get");
+				td_verror(td, errno, "io_uring_enter");
 				break;
 			}
 		}
@@ -299,7 +329,7 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 					   struct io_u *io_u)
 {
 	struct aioring_data *ld = td->io_ops_data;
-	struct aio_sq_ring *ring = ld->sq_ring;
+	struct aio_sq_ring *ring = &ld->sq_ring;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -317,14 +347,14 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	tail = ring->tail;
+	tail = *ring->tail;
 	next_tail = tail + 1;
 	read_barrier();
-	if (next_tail == ring->head)
+	if (next_tail == *ring->head)
 		return FIO_Q_BUSY;
 
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
-	ring->tail = next_tail;
+	*ring->tail = next_tail;
 	write_barrier();
 
 	ld->queued++;
@@ -342,7 +372,8 @@ static void fio_aioring_queued(struct thread_data *td, int start, int nr)
 	fio_gettime(&now, NULL);
 
 	while (nr--) {
-		int index = ld->sq_ring->array[start & ld->sq_ring_mask];
+		struct aio_sq_ring *ring = &ld->sq_ring;
+		int index = ring->array[start & ld->sq_ring_mask];
 		struct io_u *io_u = ld->io_u_index[index];
 
 		memcpy(&io_u->issue_time, &now, sizeof(now));
@@ -363,19 +394,19 @@ static int fio_aioring_commit(struct thread_data *td)
 
 	/* Nothing to do */
 	if (o->sqthread_poll) {
-		struct aio_sq_ring *ring = ld->sq_ring;
+		struct aio_sq_ring *ring = &ld->sq_ring;
 
-		if (ring->kflags & IORING_SQ_NEED_WAKEUP)
-			io_ring_enter(ld->aio_ctx, ld->queued, 0, 0);
+		if (*ring->flags & IORING_SQ_NEED_WAKEUP)
+			io_uring_enter(ld, ld->queued, 0, 0);
 		ld->queued = 0;
 		return 0;
 	}
 
 	do {
-		unsigned start = ld->sq_ring->head;
+		unsigned start = *ld->sq_ring.head;
 		long nr = ld->queued;
 
-		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_GETEVENTS);
+		ret = io_uring_enter(ld, nr, 0, IORING_ENTER_GETEVENTS);
 		if (ret > 0) {
 			fio_aioring_queued(td, start, ret);
 			io_u_mark_submit(td, ret);
@@ -394,7 +425,7 @@ static int fio_aioring_commit(struct thread_data *td)
 				usleep(1);
 				continue;
 			}
-			td_verror(td, errno, "io_ring_enter sumit");
+			td_verror(td, errno, "io_uring_enter submit");
 			break;
 		}
 	} while (ld->queued);
@@ -402,24 +433,13 @@ static int fio_aioring_commit(struct thread_data *td)
 	return ret;
 }
 
-static size_t aioring_cq_size(struct thread_data *td)
+static void fio_aioring_unmap(struct aioring_data *ld)
 {
-	return sizeof(struct aio_cq_ring) + 2 * td->o.iodepth * sizeof(struct io_event);
-}
+	int i;
 
-static size_t aioring_sq_iocb(struct thread_data *td)
-{
-	return sizeof(struct iocb) * td->o.iodepth;
-}
-
-static size_t aioring_sq_size(struct thread_data *td)
-{
-	return sizeof(struct aio_sq_ring) + td->o.iodepth * sizeof(u32);
-}
-
-static unsigned roundup_pow2(unsigned depth)
-{
-	return 1UL << __fls(depth - 1);
+	for (i = 0; i < ARRAY_SIZE(ld->mmap); i++)
+		munmap(ld->mmap[i].ptr, ld->mmap[i].len);
+	close(ld->ring_fd);
 }
 
 static void fio_aioring_cleanup(struct thread_data *td)
@@ -437,33 +457,76 @@ static void fio_aioring_cleanup(struct thread_data *td)
 		 * speeding it up a lot.
 		 */
 		if (!(td->flags & TD_F_CHILD))
-			io_destroy(ld->aio_ctx);
+			fio_aioring_unmap(ld);
+
 		free(ld->io_u_index);
 		free(ld->io_us);
-		fio_memfree(ld->sq_ring, aioring_sq_size(td), false);
-		fio_memfree(ld->iocbs, aioring_sq_iocb(td), false);
-		fio_memfree(ld->cq_ring, aioring_cq_size(td), false);
+		free(ld->iovecs);
 		free(ld);
 	}
 }
 
+static int fio_aioring_mmap(struct aioring_data *ld, struct aio_uring_params *p)
+{
+	struct aio_sq_ring *sring = &ld->sq_ring;
+	struct aio_cq_ring *cring = &ld->cq_ring;
+	void *ptr;
+
+	ld->mmap[0].len = p->sq_off.array + p->sq_entries * sizeof(u32);
+	ptr = mmap(0, ld->mmap[0].len, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
+			IORING_OFF_SQ_RING);
+	ld->mmap[0].ptr = ptr;
+	sring->head = ptr + p->sq_off.head;
+	sring->tail = ptr + p->sq_off.tail;
+	sring->ring_mask = ptr + p->sq_off.ring_mask;
+	sring->ring_entries = ptr + p->sq_off.ring_entries;
+	sring->flags = ptr + p->sq_off.flags;
+	sring->array = ptr + p->sq_off.array;
+	ld->sq_ring_mask = *sring->ring_mask;
+
+	ld->mmap[1].len = p->sq_entries * sizeof(struct iocb);
+	ld->iocbs = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
+				MAP_SHARED | MAP_POPULATE, ld->ring_fd,
+				IORING_OFF_IOCB);
+	ld->mmap[1].ptr = ld->iocbs;
+
+	ld->mmap[2].len = p->cq_off.events +
+				p->cq_entries * sizeof(struct io_event);
+	ptr = mmap(0, ld->mmap[2].len, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
+			IORING_OFF_CQ_RING);
+	ld->mmap[2].ptr = ptr;
+	cring->head = ptr + p->cq_off.head;
+	cring->tail = ptr + p->cq_off.tail;
+	cring->ring_mask = ptr + p->cq_off.ring_mask;
+	cring->ring_entries = ptr + p->cq_off.ring_entries;
+	cring->events = ptr + p->cq_off.events;
+	ld->cq_ring_mask = *cring->ring_mask;
+	return 0;
+}
+
 static int fio_aioring_queue_init(struct thread_data *td)
 {
 	struct aioring_data *ld = td->io_ops_data;
 	struct aioring_options *o = td->eo;
-	int flags = IOCTX_FLAG_SCQRING;
 	int depth = td->o.iodepth;
+	struct aio_uring_params p;
+	int ret;
+
+	memset(&p, 0, sizeof(p));
+	p.flags = IOCTX_FLAG_SCQRING;
 
 	if (o->hipri)
-		flags |= IOCTX_FLAG_IOPOLL;
+		p.flags |= IOCTX_FLAG_IOPOLL;
 	if (o->sqthread_set) {
-		ld->sq_ring->sq_thread_cpu = o->sqthread;
-		flags |= IOCTX_FLAG_SQTHREAD;
+		p.sq_thread_cpu = o->sqthread;
+		p.flags |= IOCTX_FLAG_SQTHREAD;
 		if (o->sqthread_poll)
-			flags |= IOCTX_FLAG_SQPOLL;
+			p.flags |= IOCTX_FLAG_SQPOLL;
 	}
 	if (o->sqwq)
-		flags |= IOCTX_FLAG_SQWQ;
+		p.flags |= IOCTX_FLAG_SQWQ;
 
 	if (o->fixedbufs) {
 		struct rlimit rlim = {
@@ -472,11 +535,15 @@ static int fio_aioring_queue_init(struct thread_data *td)
 		};
 
 		setrlimit(RLIMIT_MEMLOCK, &rlim);
-		flags |= IOCTX_FLAG_FIXEDBUFS;
+		p.flags |= IOCTX_FLAG_FIXEDBUFS;
 	}
 
-	return syscall(__NR_sys_io_setup2, depth, flags,
-			ld->sq_ring, ld->cq_ring, &ld->aio_ctx);
+	ret = syscall(__NR_sys_io_uring_setup, depth, ld->iovecs, &p);
+	if (ret < 0)
+		return ret;
+
+	ld->ring_fd = ret;
+	return fio_aioring_mmap(ld, &p);
 }
 
 static int fio_aioring_post_init(struct thread_data *td)
@@ -484,22 +551,21 @@ static int fio_aioring_post_init(struct thread_data *td)
 	struct aioring_data *ld = td->io_ops_data;
 	struct aioring_options *o = td->eo;
 	struct io_u *io_u;
-	struct iocb *iocb;
-	int err = 0;
+	int err;
 
 	if (o->fixedbufs) {
 		int i;
 
 		for (i = 0; i < td->o.iodepth; i++) {
+			struct iovec *iov = &ld->iovecs[i];
+
 			io_u = ld->io_u_index[i];
-			iocb = &ld->iocbs[i];
-			iocb->u.c.buf = io_u->buf;
-			iocb->u.c.nbytes = td_max_bs(td);
+			iov->iov_base = io_u->buf;
+			iov->iov_len = td_max_bs(td);
 		}
 	}
 
 	err = fio_aioring_queue_init(td);
-
 	if (err) {
 		td_verror(td, errno, "io_queue_init");
 		return 1;
@@ -508,6 +574,11 @@ static int fio_aioring_post_init(struct thread_data *td)
 	return 0;
 }
 
+static unsigned roundup_pow2(unsigned depth)
+{
+	return 1UL << __fls(depth - 1);
+}
+
 static int fio_aioring_init(struct thread_data *td)
 {
 	struct aioring_data *ld;
@@ -522,19 +593,7 @@ static int fio_aioring_init(struct thread_data *td)
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
 	ld->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
 
-	ld->iocbs = fio_memalign(page_size, aioring_sq_iocb(td), false);
-	memset(ld->iocbs, 0, aioring_sq_iocb(td));
-
-	ld->sq_ring = fio_memalign(page_size, aioring_sq_size(td), false);
-	memset(ld->sq_ring, 0, aioring_sq_size(td));
-	ld->sq_ring->nr_events = td->o.iodepth;
-	ld->sq_ring->iocbs = (u64) (uintptr_t) ld->iocbs;
-	ld->sq_ring_mask = td->o.iodepth - 1;
-
-	ld->cq_ring = fio_memalign(page_size, aioring_cq_size(td), false);
-	memset(ld->cq_ring, 0, aioring_cq_size(td));
-	ld->cq_ring->nr_events = td->o.iodepth * 2;
-	ld->cq_ring_mask = (2 * td->o.iodepth) - 1;
+	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
 
 	td->io_ops_data = ld;
 	return 0;
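
The removed `fio_aioring_init()` lines derived the ring masks as `iodepth - 1` and `2 * iodepth - 1`. Masking only acts as a modulo when the ring size is a power of two. The new code instead reads `ring_mask` back from the kernel, and `roundup_pow2()` is kept, moved above `fio_aioring_init()`. The sketch below does the same rounding. It assumes `__fls()` returns a 1-based bit index, like the Linux kernel's `fls()`. fio's definition is not part of this diff, so `fls_u32()` below is a hypothetical portable stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based "find last set": fls_u32(0) == 0, fls_u32(1) == 1,
 * fls_u32(0x80000000) == 32. */
static unsigned fls_u32(uint32_t x)
{
	unsigned r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Smallest power of two >= depth, valid for 1 <= depth <= 2^31. */
static uint32_t roundup_pow2(uint32_t depth)
{
	assert(depth >= 1 && depth <= (UINT32_C(1) << 31));
	return UINT32_C(1) << fls_u32(depth - 1);
}
```

Once the size is a power of two, `size - 1` is a valid mask, and `index & mask` equals `index % size`.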
diff --git a/t/aio-ring.c b/t/aio-ring.c
index c0c5009e..71978c68 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -14,6 +14,7 @@
 #include <sys/ioctl.h>
 #include <sys/syscall.h>
 #include <sys/resource.h>
+#include <sys/mman.h>
 #include <linux/fs.h>
 #include <fcntl.h>
 #include <unistd.h>
@@ -22,11 +23,14 @@
 #include <pthread.h>
 #include <sched.h>
 
-#define IOCTX_FLAG_IOPOLL	(1 << 0)
-#define IOCTX_FLAG_SCQRING	(1 << 1)	/* Use SQ/CQ rings */
+#include "../arch/arch.h"
+
+#define IOCTX_FLAG_SCQRING	(1 << 0)	/* Use SQ/CQ rings */
+#define IOCTX_FLAG_IOPOLL	(1 << 1)
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
 #define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
 #define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ wq */
+#define IOCTX_FLAG_SQPOLL	(1 << 5)
 
 #define IOEV_RES2_CACHEHIT	(1 << 0)
 
@@ -38,33 +42,55 @@ typedef uint64_t u64;
 typedef uint32_t u32;
 typedef uint16_t u16;
 
+#define IORING_OFF_SQ_RING	0ULL
+#define IORING_OFF_CQ_RING	0x8000000ULL
+#define IORING_OFF_IOCB		0x10000000ULL
+
+struct aio_sqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 flags;
+	u32 array;
+};
+
+struct aio_cqring_offsets {
+	u32 head;
+	u32 tail;
+	u32 ring_mask;
+	u32 ring_entries;
+	u32 overflow;
+	u32 events;
+};
+
+struct aio_uring_params {
+	u32 sq_entries;
+	u32 cq_entries;
+	u32 flags;
+	u16 sq_thread_cpu;
+	u16 resv[9];
+	struct aio_sqring_offsets sq_off;
+	struct aio_cqring_offsets cq_off;
+};
+
 struct aio_sq_ring {
-	union {
-		struct {
-			u32 head;
-			u32 tail;
-			u32 nr_events;
-			u16 sq_thread_cpu;
-			u64 iocbs;
-		};
-		u32 pad[16];
-	};
-	u32 array[0];
+	u32 *head;
+	u32 *tail;
+	u32 *ring_mask;
+	u32 *ring_entries;
+	u32 *array;
 };
 
 struct aio_cq_ring {
-	union {
-		struct {
-			u32 head;
-			u32 tail;
-			u32 nr_events;
-		};
-		struct io_event pad;
-	};
-	struct io_event events[0];
+	u32 *head;
+	u32 *tail;
+	u32 *ring_mask;
+	u32 *ring_entries;
+	struct io_event *events;
 };
 
-#define IORING_FLAG_GETEVENTS	(1 << 0)
+#define IORING_ENTER_GETEVENTS	(1 << 0)
 
 #define DEPTH			32
 
@@ -73,17 +99,17 @@ struct aio_cq_ring {
 
 #define BS			4096
 
-static unsigned sq_ring_mask = DEPTH - 1;
-static unsigned cq_ring_mask = (2 * DEPTH) - 1;
+static unsigned sq_ring_mask, cq_ring_mask;
 
 struct submitter {
 	pthread_t thread;
 	unsigned long max_blocks;
-	io_context_t ioc;
+	int fd;
 	struct drand48_data rand;
-	struct aio_sq_ring *sq_ring;
+	struct aio_sq_ring sq_ring;
 	struct iocb *iocbs;
-	struct aio_cq_ring *cq_ring;
+	struct iovec iovecs[DEPTH];
+	struct aio_cq_ring cq_ring;
 	int inflight;
 	unsigned long reaps;
 	unsigned long done;
@@ -96,23 +122,23 @@ struct submitter {
 static struct submitter submitters[1];
 static volatile int finish;
 
-static int polled = 1;		/* use IO polling */
-static int fixedbufs = 1;	/* use fixed user buffers */
-static int buffered = 0;	/* use buffered IO, not O_DIRECT */
+static int polled = 0;		/* use IO polling */
+static int fixedbufs = 0;	/* use fixed user buffers */
+static int buffered = 1;	/* use buffered IO, not O_DIRECT */
 static int sq_thread = 0;	/* use kernel submission thread */
 static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
 
-static int io_setup2(unsigned int nr_events, unsigned int flags,
-		     struct aio_sq_ring *sq_ring, struct aio_cq_ring *cq_ring,
-		     io_context_t *ctx_idp)
+static int io_uring_setup(unsigned entries, struct iovec *iovecs,
+			  struct aio_uring_params *p)
 {
-	return syscall(335, nr_events, flags, sq_ring, cq_ring, ctx_idp);
+	return syscall(__NR_sys_io_uring_setup, entries, iovecs, p);
 }
 
-static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
-			 unsigned int min_complete, unsigned int flags)
+static int io_uring_enter(struct submitter *s, unsigned int to_submit,
+			  unsigned int min_complete, unsigned int flags)
 {
-	return syscall(336, ctx, to_submit, min_complete, flags);
+	return syscall(__NR_sys_io_uring_enter, s->fd, to_submit, min_complete,
+			flags);
 }
 
 static int gettid(void)
@@ -120,8 +146,9 @@ static int gettid(void)
 	return syscall(__NR_gettid);
 }
 
-static void init_io(struct submitter *s, int fd, struct iocb *iocb)
+static void init_io(struct submitter *s, int fd, unsigned index)
 {
+	struct iocb *iocb = &s->iocbs[index];
 	unsigned long offset;
 	long r;
 
@@ -130,34 +157,34 @@ static void init_io(struct submitter *s, int fd, struct iocb *iocb)
 
 	iocb->aio_fildes = fd;
 	iocb->aio_lio_opcode = IO_CMD_PREAD;
+	iocb->u.c.buf = s->iovecs[index].iov_base;
+	iocb->u.c.nbytes = BS;
 	iocb->u.c.offset = offset;
-	if (!fixedbufs)
-		iocb->u.c.nbytes = BS;
 }
 
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 {
-	struct aio_sq_ring *ring = s->sq_ring;
+	struct aio_sq_ring *ring = &s->sq_ring;
 	u32 index, tail, next_tail, prepped = 0;
 
-	next_tail = tail = ring->tail;
+	next_tail = tail = *ring->tail;
 	do {
 		next_tail++;
 		barrier();
-		if (next_tail == ring->head)
+		if (next_tail == *ring->head)
 			break;
 
 		index = tail & sq_ring_mask;
-		init_io(s, fd, &s->iocbs[index]);
-		s->sq_ring->array[index] = index;
+		init_io(s, fd, index);
+		ring->array[index] = index;
 		prepped++;
 		tail = next_tail;
 	} while (prepped < max_ios);
 
-	if (ring->tail != tail) {
+	if (*ring->tail != tail) {
 		/* order tail store with writes to iocbs above */
 		barrier();
-		ring->tail = tail;
+		*ring->tail = tail;
 		barrier();
 	}
 	return prepped;
@@ -187,14 +214,14 @@ static int get_file_size(int fd, unsigned long *blocks)
 
 static int reap_events(struct submitter *s)
 {
-	struct aio_cq_ring *ring = s->cq_ring;
+	struct aio_cq_ring *ring = &s->cq_ring;
 	struct io_event *ev;
 	u32 head, reaped = 0;
 
-	head = ring->head;
+	head = *ring->head;
 	do {
 		barrier();
-		if (head == ring->tail)
+		if (head == *ring->tail)
 			break;
 		ev = &ring->events[head & cq_ring_mask];
 		if (ev->res != BS) {
@@ -213,7 +240,7 @@ static int reap_events(struct submitter *s)
 	} while (1);
 
 	s->inflight -= reaped;
-	ring->head = head;
+	*ring->head = head;
 	barrier();
 	return reaped;
 }
@@ -262,8 +289,7 @@ submit:
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
 
-		ret = io_ring_enter(s->ioc, to_submit, to_wait,
-					IORING_FLAG_GETEVENTS);
+		ret = io_uring_enter(s, to_submit, to_wait, IORING_ENTER_GETEVENTS);
 		s->calls++;
 
 		this_reap = reap_events(s);
@@ -288,7 +314,7 @@ submit:
 			prepped = 0;
 			continue;
 		} else if (ret < 0) {
-			if ((ret == -1 && errno == EAGAIN) || ret == -EAGAIN) {
+			if (errno == EAGAIN) {
 				if (s->finish)
 					break;
 				if (this_reap)
@@ -296,10 +322,7 @@ submit:
 				to_submit = 0;
 				goto submit;
 			}
-			if (ret == -1)
-				printf("io_submit: %s\n", strerror(errno));
-			else
-				printf("io_submit: %s\n", strerror(-ret));
+			printf("io_submit: %s\n", strerror(errno));
 			break;
 		}
 	} while (!s->finish);
@@ -327,15 +350,74 @@ static void arm_sig_int(void)
 	sigaction(SIGINT, &act, NULL);
 }
 
+static int setup_ring(struct submitter *s)
+{
+	struct aio_sq_ring *sring = &s->sq_ring;
+	struct aio_cq_ring *cring = &s->cq_ring;
+	struct aio_uring_params p;
+	void *ptr;
+	int fd;
+
+	memset(&p, 0, sizeof(p));
+
+	p.flags = IOCTX_FLAG_SCQRING;
+	if (polled)
+		p.flags |= IOCTX_FLAG_IOPOLL;
+	if (fixedbufs)
+		p.flags |= IOCTX_FLAG_FIXEDBUFS;
+	if (buffered)
+		p.flags |= IOCTX_FLAG_SQWQ;
+	else if (sq_thread) {
+		p.flags |= IOCTX_FLAG_SQTHREAD;
+		p.sq_thread_cpu = sq_thread_cpu;
+	}
+
+	if (fixedbufs)
+		fd = io_uring_setup(DEPTH, s->iovecs, &p);
+	else
+		fd = io_uring_setup(DEPTH, NULL, &p);
+	if (fd < 0) {
+		perror("io_uring_setup");
+		return 1;
+	}
+
+	s->fd = fd;
+
+	ptr = mmap(0, p.sq_off.array + p.sq_entries * sizeof(u32),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			fd, IORING_OFF_SQ_RING);
+	printf("sq_ring ptr = 0x%p\n", ptr);
+	sring->head = ptr + p.sq_off.head;
+	sring->tail = ptr + p.sq_off.tail;
+	sring->ring_mask = ptr + p.sq_off.ring_mask;
+	sring->ring_entries = ptr + p.sq_off.ring_entries;
+	sring->array = ptr + p.sq_off.array;
+	sq_ring_mask = *sring->ring_mask;
+
+	s->iocbs = mmap(0, p.sq_entries * sizeof(struct iocb), PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_IOCB);
+	printf("iocbs ptr   = 0x%p\n", s->iocbs);
+
+	ptr = mmap(0, p.cq_off.events + p.cq_entries * sizeof(struct io_event),
+			PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+			fd, IORING_OFF_CQ_RING);
+	printf("cq_ring ptr = 0x%p\n", ptr);
+	cring->head = ptr + p.cq_off.head;
+	cring->tail = ptr + p.cq_off.tail;
+	cring->ring_mask = ptr + p.cq_off.ring_mask;
+	cring->ring_entries = ptr + p.cq_off.ring_entries;
+	cring->events = ptr + p.cq_off.events;
+	cq_ring_mask = *cring->ring_mask;
+	return 0;
+}
+
 int main(int argc, char *argv[])
 {
 	struct submitter *s = &submitters[0];
 	unsigned long done, calls, reap, cache_hit, cache_miss;
-	int flags = 0, err;
-	int j;
-	size_t size;
-	void *p, *ret;
+	int err, i;
 	struct rlimit rlim;
+	void *ret;
 
 	if (argc < 2) {
 		printf("%s: filename\n", argv[0]);
@@ -351,58 +433,24 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	size = sizeof(struct iocb) * DEPTH;
-	if (posix_memalign(&p, 4096, size))
-		return 1;
-	memset(p, 0, size);
-	s->iocbs = p;
+	for (i = 0; i < DEPTH; i++) {
+		void *buf;
 
-	size = sizeof(struct aio_sq_ring) + DEPTH * sizeof(u32);
-	if (posix_memalign(&p, 4096, size))
-		return 1;
-	s->sq_ring = p;
-	memset(p, 0, size);
-	s->sq_ring->nr_events = DEPTH;
-	s->sq_ring->iocbs = (u64) s->iocbs;
-
-	/* CQ ring must be twice as big */
-	size = sizeof(struct aio_cq_ring) +
-			2 * DEPTH * sizeof(struct io_event);
-	if (posix_memalign(&p, 4096, size))
-		return 1;
-	s->cq_ring = p;
-	memset(p, 0, size);
-	s->cq_ring->nr_events = 2 * DEPTH;
-
-	for (j = 0; j < DEPTH; j++) {
-		struct iocb *iocb = &s->iocbs[j];
-
-		if (posix_memalign(&iocb->u.c.buf, BS, BS)) {
+		if (posix_memalign(&buf, BS, BS)) {
 			printf("failed alloc\n");
 			return 1;
 		}
-		iocb->u.c.nbytes = BS;
-	}
-
-	flags = IOCTX_FLAG_SCQRING;
-	if (polled)
-		flags |= IOCTX_FLAG_IOPOLL;
-	if (fixedbufs)
-		flags |= IOCTX_FLAG_FIXEDBUFS;
-	if (buffered)
-		flags |= IOCTX_FLAG_SQWQ;
-	else if (sq_thread) {
-		flags |= IOCTX_FLAG_SQTHREAD;
-		s->sq_ring->sq_thread_cpu = sq_thread_cpu;
+		s->iovecs[i].iov_base = buf;
+		s->iovecs[i].iov_len = BS;
 	}
 
-	err = io_setup2(DEPTH, flags, s->sq_ring, s->cq_ring, &s->ioc);
+	err = setup_ring(s);
 	if (err) {
-		printf("ctx_init failed: %s, %d\n", strerror(errno), err);
+		printf("ring setup failed: %s, %d\n", strerror(errno), err);
 		return 1;
 	}
-	printf("polled=%d, fixedbufs=%d, buffered=%d\n", polled, fixedbufs, buffered);
-	printf("  QD=%d, sq_ring=%d, cq_ring=%d\n", DEPTH, s->sq_ring->nr_events, s->cq_ring->nr_events);
+	printf("polled=%d, fixedbufs=%d, buffered=%d", polled, fixedbufs, buffered);
+	printf(" QD=%d, sq_ring=%d, cq_ring=%d\n", DEPTH, *s->sq_ring.ring_entries, *s->cq_ring.ring_entries);
 	strcpy(s->filename, argv[1]);
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);
@@ -437,7 +485,7 @@ int main(int argc, char *argv[])
 		}
 		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%u tail=%u), Cachehit=%0.2f%%\n",
 				this_done - done, rpc, ipc, s->inflight,
-				s->cq_ring->head, s->cq_ring->tail, hit);
+				*s->cq_ring.head, *s->cq_ring.tail, hit);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;


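The reaping loop above drops the dual check `(ret == -1 && errno == EAGAIN) || ret == -EAGAIN` in favour of testing `errno` alone. This appears to assume that the new `io_uring_enter()` wrapper goes through syscall(2), which reports failure as -1 with `errno` set, rather than returning a negative error code as libaio does. Code that can sit on top of either convention can normalize both. `err_from_ret()` below is a hypothetical helper, not a fio function.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: return 0 on success, or a positive errno value on
 * failure. It accepts both the syscall(2) convention (-1 with errno set)
 * and the libaio convention (a negative errno returned directly).
 * Caveat: -1 is also -EPERM, so it is always treated as the syscall case.
 */
static int err_from_ret(long ret)
{
	if (ret >= 0)
		return 0;
	if (ret == -1)
		return errno;
	return (int)-ret;
}
```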

* Recent changes (master)
@ 2018-12-31 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-31 13:00 UTC
  To: fio

The following changes since commit 10c4d1318fff63eef1d22c6be6d816210277ae17:

  t/aio-ring: print head/tail as unsigneds (2018-12-21 15:37:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ac4f3d4e4cf16b1097249a819fe7111b2674b3f4:

  aioring: remove IOCB_FLAG_HIPRI (2018-12-30 17:19:40 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      aioring: update API
      aioring: remove IOCB_FLAG_HIPRI

 engines/aioring.c | 23 ++++-------------------
 t/aio-ring.c      | 16 ++++------------
 2 files changed, 8 insertions(+), 31 deletions(-)

---

Diff of recent changes:

diff --git a/engines/aioring.c b/engines/aioring.c
index 50826964..f836009d 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -21,10 +21,6 @@
 
 #ifdef ARCH_HAVE_AIORING
 
-#ifndef IOCB_FLAG_HIPRI
-#define IOCB_FLAG_HIPRI	(1 << 2)
-#endif
-
 /*
  * io_setup2(2) flags
  */
@@ -51,11 +47,8 @@
 /*
  * io_ring_enter(2) flags
  */
-#ifndef IORING_FLAG_SUBMIT
-#define IORING_FLAG_SUBMIT	(1 << 0)
-#endif
 #ifndef IORING_FLAG_GETEVENTS
-#define IORING_FLAG_GETEVENTS	(1 << 1)
+#define IORING_FLAG_GETEVENTS	(1 << 0)
 #endif
 
 typedef uint64_t u64;
@@ -196,7 +189,6 @@ static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct aioring_data *ld = td->io_ops_data;
 	struct fio_file *f = io_u->file;
-	struct aioring_options *o = td->eo;
 	struct iocb *iocb;
 
 	iocb = &ld->iocbs[io_u->index];
@@ -211,10 +203,7 @@ static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
 		iocb->u.c.buf = io_u->xfer_buf;
 		iocb->u.c.nbytes = io_u->xfer_buflen;
 		iocb->u.c.offset = io_u->offset;
-		if (o->hipri)
-			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		else
-			iocb->u.c.flags = 0;
+		iocb->u.c.flags = 0;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
@@ -377,7 +366,7 @@ static int fio_aioring_commit(struct thread_data *td)
 		struct aio_sq_ring *ring = ld->sq_ring;
 
 		if (ring->kflags & IORING_SQ_NEED_WAKEUP)
-			io_ring_enter(ld->aio_ctx, ld->queued, 0, IORING_FLAG_SUBMIT);
+			io_ring_enter(ld->aio_ctx, ld->queued, 0, 0);
 		ld->queued = 0;
 		return 0;
 	}
@@ -386,8 +375,7 @@ static int fio_aioring_commit(struct thread_data *td)
 		unsigned start = ld->sq_ring->head;
 		long nr = ld->queued;
 
-		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_SUBMIT |
-						IORING_FLAG_GETEVENTS);
+		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_GETEVENTS);
 		if (ret > 0) {
 			fio_aioring_queued(td, start, ret);
 			io_u_mark_submit(td, ret);
@@ -507,9 +495,6 @@ static int fio_aioring_post_init(struct thread_data *td)
 			iocb = &ld->iocbs[i];
 			iocb->u.c.buf = io_u->buf;
 			iocb->u.c.nbytes = td_max_bs(td);
-
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 		}
 	}
 
diff --git a/t/aio-ring.c b/t/aio-ring.c
index 900f4640..c0c5009e 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -22,8 +22,6 @@
 #include <pthread.h>
 #include <sched.h>
 
-#define IOCB_FLAG_HIPRI		(1 << 2)
-
 #define IOCTX_FLAG_IOPOLL	(1 << 0)
 #define IOCTX_FLAG_SCQRING	(1 << 1)	/* Use SQ/CQ rings */
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
@@ -66,8 +64,7 @@ struct aio_cq_ring {
 	struct io_event events[0];
 };
 
-#define IORING_FLAG_SUBMIT	(1 << 0)
-#define IORING_FLAG_GETEVENTS	(1 << 1)
+#define IORING_FLAG_GETEVENTS	(1 << 0)
 
 #define DEPTH			32
 
@@ -134,8 +131,6 @@ static void init_io(struct submitter *s, int fd, struct iocb *iocb)
 	iocb->aio_fildes = fd;
 	iocb->aio_lio_opcode = IO_CMD_PREAD;
 	iocb->u.c.offset = offset;
-	if (polled)
-		iocb->u.c.flags = IOCB_FLAG_HIPRI;
 	if (!fixedbufs)
 		iocb->u.c.nbytes = BS;
 }
@@ -254,7 +249,7 @@ static void *submitter_fn(void *data)
 
 	prepped = 0;
 	do {
-		int to_wait, flags, to_submit, this_reap;
+		int to_wait, to_submit, this_reap;
 
 		if (!prepped && s->inflight < DEPTH)
 			prepped = prep_more_ios(s, fd, min(DEPTH - s->inflight, BATCH_SUBMIT));
@@ -267,11 +262,8 @@ submit:
 		else
 			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
 
-		flags = IORING_FLAG_GETEVENTS;
-		if (to_submit)
-			flags |= IORING_FLAG_SUBMIT;
-
-		ret = io_ring_enter(s->ioc, to_submit, to_wait, flags);
+		ret = io_ring_enter(s->ioc, to_submit, to_wait,
+					IORING_FLAG_GETEVENTS);
 		s->calls++;
 
 		this_reap = reap_events(s);

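When `sqthread_poll` is set, `fio_aioring_commit()` calls `io_ring_enter()` only if the kernel has set `IORING_SQ_NEED_WAKEUP` in the ring's `kflags`. With `IORING_FLAG_SUBMIT` removed, that call now passes 0 as its flags, and the submit count alone appears to signal submission. The sketch below shows the enter-or-not decision. It assumes the flag value `(1 << 0)` that engines/aioring.c defines, and it uses a hypothetical `struct sq_flags_view` in place of `struct aio_sq_ring`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQ_NEED_WAKEUP	(1u << 0)	/* assumed to match IORING_SQ_NEED_WAKEUP */

/* Hypothetical view of the kernel-written flags word in the SQ ring. */
struct sq_flags_view {
	volatile uint16_t kflags;
};

/*
 * Decide whether the submitter must make a system call. Without a polling
 * thread every batch needs one. With a poller, only a sleeping poller
 * (NEED_WAKEUP set) does. A production version would likely also need a
 * full memory barrier between publishing the new SQ tail and reading
 * kflags; this sketch omits it.
 */
static bool sq_needs_enter(const struct sq_flags_view *r, bool sqthread_poll)
{
	if (!sqthread_poll)
		return true;
	return (r->kflags & SQ_NEED_WAKEUP) != 0;
}
```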


* Recent changes (master)
@ 2018-12-22 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-22 13:00 UTC
  To: fio

The following changes since commit d63a472d4b213533236ae9aab9cf9e0ec2854c31:

  engines/aio-ring: initialization error handling (2018-12-19 12:55:10 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 10c4d1318fff63eef1d22c6be6d816210277ae17:

  t/aio-ring: print head/tail as unsigneds (2018-12-21 15:37:16 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      engines/aioring: update for continually rolling ring
      t/aio-ring: update for continually rolling ring
      engines/aioring: fix harmless typo
      t/aio-ring: print head/tail as unsigneds

 engines/aioring.c | 49 ++++++++++++++++++++++++-------------------------
 t/aio-ring.c      | 34 ++++++++++++++++------------------
 2 files changed, 40 insertions(+), 43 deletions(-)

---

Diff of recent changes:

diff --git a/engines/aioring.c b/engines/aioring.c
index 59551f9c..50826964 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -17,6 +17,7 @@
 #include "../lib/pow2.h"
 #include "../optgroup.h"
 #include "../lib/memalign.h"
+#include "../lib/fls.h"
 
 #ifdef ARCH_HAVE_AIORING
 
@@ -99,12 +100,15 @@ struct aioring_data {
 
 	struct aio_sq_ring *sq_ring;
 	struct iocb *iocbs;
+	unsigned sq_ring_mask;
 
 	struct aio_cq_ring *cq_ring;
 	struct io_event *events;
+	unsigned cq_ring_mask;
 
 	int queued;
 	int cq_ring_off;
+	unsigned iodepth;
 
 	uint64_t cachehit;
 	uint64_t cachemiss;
@@ -223,11 +227,9 @@ static struct io_u *fio_aioring_event(struct thread_data *td, int event)
 	struct aioring_data *ld = td->io_ops_data;
 	struct io_event *ev;
 	struct io_u *io_u;
-	int index;
+	unsigned index;
 
-	index = event + ld->cq_ring_off;
-	if (index >= ld->cq_ring->nr_events)
-		index -= ld->cq_ring->nr_events;
+	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
 
 	ev = &ld->cq_ring->events[index];
 	io_u = ev->data;
@@ -264,8 +266,6 @@ static int fio_aioring_cqring_reap(struct thread_data *td, unsigned int events,
 			break;
 		reaped++;
 		head++;
-		if (head == ring->nr_events)
-			head = 0;
 	} while (reaped + events < max);
 
 	ring->head = head;
@@ -280,7 +280,8 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
 	struct aioring_options *o = td->eo;
 	struct aio_cq_ring *ring = ld->cq_ring;
-	int r, events = 0;
+	unsigned events = 0;
+	int r;
 
 	ld->cq_ring_off = ring->head;
 	do {
@@ -314,7 +315,7 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 
 	fio_ro_check(td, io_u);
 
-	if (ld->queued == td->o.iodepth)
+	if (ld->queued == ld->iodepth)
 		return FIO_Q_BUSY;
 
 	if (io_u->ddir == DDIR_TRIM) {
@@ -329,13 +330,11 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 
 	tail = ring->tail;
 	next_tail = tail + 1;
-	if (next_tail == ring->nr_events)
-		next_tail = 0;
 	read_barrier();
 	if (next_tail == ring->head)
 		return FIO_Q_BUSY;
 
-	ring->array[tail] = io_u->index;
+	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	ring->tail = next_tail;
 	write_barrier();
 
@@ -354,15 +353,13 @@ static void fio_aioring_queued(struct thread_data *td, int start, int nr)
 	fio_gettime(&now, NULL);
 
 	while (nr--) {
-		int index = ld->sq_ring->array[start];
-		struct io_u *io_u = io_u = ld->io_u_index[index];
+		int index = ld->sq_ring->array[start & ld->sq_ring_mask];
+		struct io_u *io_u = ld->io_u_index[index];
 
 		memcpy(&io_u->issue_time, &now, sizeof(now));
 		io_u_queued(td, io_u);
 
 		start++;
-		if (start == ld->sq_ring->nr_events)
-			start = 0;
 	}
 }
 
@@ -386,7 +383,7 @@ static int fio_aioring_commit(struct thread_data *td)
 	}
 
 	do {
-		int start = ld->sq_ring->head;
+		unsigned start = ld->sq_ring->head;
 		long nr = ld->queued;
 
 		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_SUBMIT |
@@ -432,6 +429,11 @@ static size_t aioring_sq_size(struct thread_data *td)
 	return sizeof(struct aio_sq_ring) + td->o.iodepth * sizeof(u32);
 }
 
+static unsigned roundup_pow2(unsigned depth)
+{
+	return 1UL << __fls(depth - 1);
+}
+
 static void fio_aioring_cleanup(struct thread_data *td)
 {
 	struct aioring_data *ld = td->io_ops_data;
@@ -440,9 +442,6 @@ static void fio_aioring_cleanup(struct thread_data *td)
 		td->ts.cachehit += ld->cachehit;
 		td->ts.cachemiss += ld->cachemiss;
 
-		/* Bump depth to match init depth */
-		td->o.iodepth++;
-
 		/*
 		 * Work-around to avoid huge RCU stalls at exit time. If we
 		 * don't do this here, then it'll be torn down by exit_aio().
@@ -516,9 +515,6 @@ static int fio_aioring_post_init(struct thread_data *td)
 
 	err = fio_aioring_queue_init(td);
 
-	/* Adjust depth back again */
-	td->o.iodepth--;
-
 	if (err) {
 		td_verror(td, errno, "io_queue_init");
 		return 1;
@@ -531,11 +527,12 @@ static int fio_aioring_init(struct thread_data *td)
 {
 	struct aioring_data *ld;
 
-	/* ring needs an extra entry, add one to achieve QD set */
-	td->o.iodepth++;
-
 	ld = calloc(1, sizeof(*ld));
 
+	/* ring depth must be a power-of-2 */
+	ld->iodepth = td->o.iodepth;
+	td->o.iodepth = roundup_pow2(td->o.iodepth);
+
 	/* io_u index */
 	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
 	ld->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
@@ -547,10 +544,12 @@ static int fio_aioring_init(struct thread_data *td)
 	memset(ld->sq_ring, 0, aioring_sq_size(td));
 	ld->sq_ring->nr_events = td->o.iodepth;
 	ld->sq_ring->iocbs = (u64) (uintptr_t) ld->iocbs;
+	ld->sq_ring_mask = td->o.iodepth - 1;
 
 	ld->cq_ring = fio_memalign(page_size, aioring_cq_size(td), false);
 	memset(ld->cq_ring, 0, aioring_cq_size(td));
 	ld->cq_ring->nr_events = td->o.iodepth * 2;
+	ld->cq_ring_mask = (2 * td->o.iodepth) - 1;
 
 	td->io_ops_data = ld;
 	return 0;
diff --git a/t/aio-ring.c b/t/aio-ring.c
index c813c4e7..900f4640 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -70,13 +70,15 @@ struct aio_cq_ring {
 #define IORING_FLAG_GETEVENTS	(1 << 1)
 
 #define DEPTH			32
-#define RING_SIZE		(DEPTH + 1)
 
 #define BATCH_SUBMIT		8
 #define BATCH_COMPLETE		8
 
 #define BS			4096
 
+static unsigned sq_ring_mask = DEPTH - 1;
+static unsigned cq_ring_mask = (2 * DEPTH) - 1;
+
 struct submitter {
 	pthread_t thread;
 	unsigned long max_blocks;
@@ -141,20 +143,18 @@ static void init_io(struct submitter *s, int fd, struct iocb *iocb)
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 {
 	struct aio_sq_ring *ring = s->sq_ring;
-	u32 tail, next_tail, prepped = 0;
+	u32 index, tail, next_tail, prepped = 0;
 
 	next_tail = tail = ring->tail;
 	do {
 		next_tail++;
-		if (next_tail == ring->nr_events)
-			next_tail = 0;
-
 		barrier();
 		if (next_tail == ring->head)
 			break;
 
-		init_io(s, fd, &s->iocbs[tail]);
-		s->sq_ring->array[tail] = tail;
+		index = tail & sq_ring_mask;
+		init_io(s, fd, &s->iocbs[index]);
+		s->sq_ring->array[index] = index;
 		prepped++;
 		tail = next_tail;
 	} while (prepped < max_ios);
@@ -201,7 +201,7 @@ static int reap_events(struct submitter *s)
 		barrier();
 		if (head == ring->tail)
 			break;
-		ev = &ring->events[head];
+		ev = &ring->events[head & cq_ring_mask];
 		if (ev->res != BS) {
 			struct iocb *iocb = ev->obj;
 
@@ -215,8 +215,6 @@ static int reap_events(struct submitter *s)
 			s->cachemiss++;
 		reaped++;
 		head++;
-		if (head == ring->nr_events)
-			head = 0;
 	} while (1);
 
 	s->inflight -= reaped;
@@ -361,30 +359,30 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	size = sizeof(struct iocb) * RING_SIZE;
+	size = sizeof(struct iocb) * DEPTH;
 	if (posix_memalign(&p, 4096, size))
 		return 1;
 	memset(p, 0, size);
 	s->iocbs = p;
 
-	size = sizeof(struct aio_sq_ring) + RING_SIZE * sizeof(u32);
+	size = sizeof(struct aio_sq_ring) + DEPTH * sizeof(u32);
 	if (posix_memalign(&p, 4096, size))
 		return 1;
 	s->sq_ring = p;
 	memset(p, 0, size);
-	s->sq_ring->nr_events = RING_SIZE;
+	s->sq_ring->nr_events = DEPTH;
 	s->sq_ring->iocbs = (u64) s->iocbs;
 
 	/* CQ ring must be twice as big */
 	size = sizeof(struct aio_cq_ring) +
-			2 * RING_SIZE * sizeof(struct io_event);
+			2 * DEPTH * sizeof(struct io_event);
 	if (posix_memalign(&p, 4096, size))
 		return 1;
 	s->cq_ring = p;
 	memset(p, 0, size);
-	s->cq_ring->nr_events = 2 * RING_SIZE;
+	s->cq_ring->nr_events = 2 * DEPTH;
 
-	for (j = 0; j < RING_SIZE; j++) {
+	for (j = 0; j < DEPTH; j++) {
 		struct iocb *iocb = &s->iocbs[j];
 
 		if (posix_memalign(&iocb->u.c.buf, BS, BS)) {
@@ -406,7 +404,7 @@ int main(int argc, char *argv[])
 		s->sq_ring->sq_thread_cpu = sq_thread_cpu;
 	}
 
-	err = io_setup2(RING_SIZE, flags, s->sq_ring, s->cq_ring, &s->ioc);
+	err = io_setup2(DEPTH, flags, s->sq_ring, s->cq_ring, &s->ioc);
 	if (err) {
 		printf("ctx_init failed: %s, %d\n", strerror(errno), err);
 		return 1;
@@ -445,7 +443,7 @@ int main(int argc, char *argv[])
 			rpc = (this_done - done) / (this_call - calls);
 			ipc = (this_reap - reap) / (this_call - calls);
 		}
-		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%d tail=%d), Cachehit=%0.2f%%\n",
+		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%u tail=%u), Cachehit=%0.2f%%\n",
 				this_done - done, rpc, ipc, s->inflight,
 				s->cq_ring->head, s->cq_ring->tail, hit);
 		done = this_done;

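The "continually rolling ring" change above stops wrapping head and tail back to 0 at `nr_events`. Instead, the indices grow freely as 32-bit unsigned values and are reduced with `& ring_mask` only when a slot is addressed, which requires a power-of-2 ring size. Unsigned subtraction `tail - head` then gives the fill level even after the counters wrap past `UINT32_MAX`. Below is a minimal single-producer/single-consumer sketch of the scheme. The names are hypothetical, and the sketch omits the `read_barrier()`/`write_barrier()` calls that fio needs when producer and consumer run on different CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES	8u			/* must be a power of 2 */
#define RING_MASK	(RING_ENTRIES - 1)

struct rolling_ring {
	uint32_t head;				/* next slot to consume */
	uint32_t tail;				/* next slot to fill */
	uint32_t slots[RING_ENTRIES];
};

/* Occupied slots; correct even after tail wraps past UINT32_MAX. */
static uint32_t ring_used(const struct rolling_ring *r)
{
	return r->tail - r->head;
}

static bool ring_push(struct rolling_ring *r, uint32_t val)
{
	if (ring_used(r) == RING_ENTRIES)
		return false;			/* full */
	r->slots[r->tail & RING_MASK] = val;
	r->tail++;				/* no explicit wrap needed */
	return true;
}

static bool ring_pop(struct rolling_ring *r, uint32_t *val)
{
	if (r->head == r->tail)
		return false;			/* empty */
	*val = r->slots[r->head & RING_MASK];
	r->head++;
	return true;
}
```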

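The same update sizes the rings with `roundup_pow2(depth)`, implemented as `1UL << __fls(depth - 1)`. This is correct if `__fls()` from lib/fls.h returns the 1-based index of the most significant set bit, with 0 for an input of 0, as the Linux kernel's `fls()` does. That convention is assumed here. The portable sketch below uses a loop-based `fls32()` as a stand-in for the library helper.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; fls32(0) == 0. */
static unsigned fls32(uint32_t x)
{
	unsigned r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Smallest power of 2 >= depth, valid for 1 <= depth <= 2^31. */
static uint32_t roundup_pow2(uint32_t depth)
{
	assert(depth >= 1 && depth <= (1u << 31));
	return (uint32_t)1 << fls32(depth - 1);
}
```

For example, with `iodepth=24` the rings are sized for 32 entries. `ld->iodepth` keeps the requested 24 as the queueing limit, which is why `fio_aioring_queue()` now compares against `ld->iodepth` instead of `td->o.iodepth`.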

* Recent changes (master)
@ 2018-12-20 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-20 13:00 UTC
  To: fio

The following changes since commit 5b8f19b7afe0cabc002c453a1a4abd7a494880bb:

  Fix 'min' latency times being 0 with ramp_time (2018-12-14 14:36:52 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d63a472d4b213533236ae9aab9cf9e0ec2854c31:

  engines/aio-ring: initialization error handling (2018-12-19 12:55:10 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      engines/aio-ring: cleanup read/write prep
      engines/aio-ring: initialization error handling

 engines/aioring.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/engines/aioring.c b/engines/aioring.c
index 925b8862..59551f9c 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -197,26 +197,20 @@ static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
 
 	iocb = &ld->iocbs[io_u->index];
 
-	if (io_u->ddir == DDIR_READ) {
-		if (o->fixedbufs) {
-			iocb->aio_fildes = f->fd;
+	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		if (io_u->ddir == DDIR_READ)
 			iocb->aio_lio_opcode = IO_CMD_PREAD;
-			iocb->u.c.offset = io_u->offset;
-		} else {
-			io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		}
-	} else if (io_u->ddir == DDIR_WRITE) {
-		if (o->fixedbufs) {
-			iocb->aio_fildes = f->fd;
+		else
 			iocb->aio_lio_opcode = IO_CMD_PWRITE;
-			iocb->u.c.offset = io_u->offset;
-		} else {
-			io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		}
+		iocb->aio_reqprio = 0;
+		iocb->aio_fildes = f->fd;
+		iocb->u.c.buf = io_u->xfer_buf;
+		iocb->u.c.nbytes = io_u->xfer_buflen;
+		iocb->u.c.offset = io_u->offset;
+		if (o->hipri)
+			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		else
+			iocb->u.c.flags = 0;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
@@ -521,13 +515,15 @@ static int fio_aioring_post_init(struct thread_data *td)
 	}
 
 	err = fio_aioring_queue_init(td);
+
+	/* Adjust depth back again */
+	td->o.iodepth--;
+
 	if (err) {
-		td_verror(td, -err, "io_queue_init");
+		td_verror(td, errno, "io_queue_init");
 		return 1;
 	}
 
-	/* Adjust depth back again */
-	td->o.iodepth--;
 	return 0;
 }
 


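The initialization error-handling change moves the `td->o.iodepth--` that undoes the earlier temporary increment so that it runs before the error check. The depth is therefore restored whether or not `fio_aioring_queue_init()` fails. The error is now reported from `errno`, presumably because that function signals failure with -1 rather than a negative error code. The sketch below compresses the same "restore on every path" ordering into one function. `init_with_extra_slot()` and `demo_init()` are hypothetical, and in fio the increment and decrement live in separate init hooks.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical init step: returns -1 and sets errno on failure. */
typedef int (*init_fn)(unsigned depth);

/* Example init that rejects depths above 64 with EINVAL. */
static int demo_init(unsigned depth)
{
	if (depth > 64) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/*
 * Temporarily add one slot for init(), then restore *depth on every path
 * before reporting the result. Returns 0 or a positive errno value.
 */
static int init_with_extra_slot(unsigned *depth, init_fn init)
{
	int err = 0;

	(*depth)++;
	if (init(*depth) < 0)
		err = errno;	/* capture before any later call can clobber it */
	(*depth)--;		/* restore unconditionally */
	return err;
}
```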

* Recent changes (master)
@ 2018-12-15 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-15 13:00 UTC
  To: fio

The following changes since commit 771c99012e26af0dc2a0b7e0762e5097534144bd:

  engines/aioring: enable IOCTX_FLAG_SQPOLL (2018-12-13 13:52:35 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5b8f19b7afe0cabc002c453a1a4abd7a494880bb:

  Fix 'min' latency times being 0 with ramp_time (2018-12-14 14:36:52 -0700)

----------------------------------------------------------------
Jens Axboe (6):
      engines/aioring: update to newer API
      client/server: convert nr_zone_resets on the wire
      Add cache hit stats
      t/aio-ring: add cache hit statistics
      engines/aioring: get rid of old error on sqwq and sqthread
      Fix 'min' latency times being 0 with ramp_time

 client.c          |  4 ++++
 engines/aioring.c | 31 ++++++++++++++++++++++++-------
 server.c          |  4 ++++
 server.h          |  2 +-
 stat.c            | 34 +++++++++++++++++++++++++++++-----
 stat.h            |  3 +++
 t/aio-ring.c      | 30 ++++++++++++++++++++++++++----
 7 files changed, 91 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 32489067..480425f6 100644
--- a/client.c
+++ b/client.c
@@ -1000,6 +1000,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 
 	dst->total_submit	= le64_to_cpu(src->total_submit);
 	dst->total_complete	= le64_to_cpu(src->total_complete);
+	dst->nr_zone_resets	= le64_to_cpu(src->nr_zone_resets);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		dst->io_bytes[i]	= le64_to_cpu(src->io_bytes[i]);
@@ -1038,6 +1039,9 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 			dst->ss_bw_data[i] = le64_to_cpu(src->ss_bw_data[i]);
 		}
 	}
+
+	dst->cachehit		= le64_to_cpu(src->cachehit);
+	dst->cachemiss		= le64_to_cpu(src->cachemiss);
 }
 
 static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
diff --git a/engines/aioring.c b/engines/aioring.c
index cb13b415..925b8862 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -61,6 +61,10 @@ typedef uint64_t u64;
 typedef uint32_t u32;
 typedef uint16_t u16;
 
+#define IORING_SQ_NEED_WAKEUP	(1 << 0)
+
+#define IOEV_RES2_CACHEHIT	(1 << 0)
+
 struct aio_sq_ring {
 	union {
 		struct {
@@ -68,6 +72,7 @@ struct aio_sq_ring {
 			u32 tail;
 			u32 nr_events;
 			u16 sq_thread_cpu;
+			u16 kflags;
 			u64 iocbs;
 		};
 		u32 pad[16];
@@ -100,6 +105,9 @@ struct aioring_data {
 
 	int queued;
 	int cq_ring_off;
+
+	uint64_t cachehit;
+	uint64_t cachemiss;
 };
 
 struct aioring_options {
@@ -238,6 +246,13 @@ static struct io_u *fio_aioring_event(struct thread_data *td, int event)
 	} else
 		io_u->error = 0;
 
+	if (io_u->ddir == DDIR_READ) {
+		if (ev->res2 & IOEV_RES2_CACHEHIT)
+			ld->cachehit++;
+		else
+			ld->cachemiss++;
+	}
+
 	return io_u;
 }
 
@@ -368,6 +383,10 @@ static int fio_aioring_commit(struct thread_data *td)
 
 	/* Nothing to do */
 	if (o->sqthread_poll) {
+		struct aio_sq_ring *ring = ld->sq_ring;
+
+		if (ring->kflags & IORING_SQ_NEED_WAKEUP)
+			io_ring_enter(ld->aio_ctx, ld->queued, 0, IORING_FLAG_SUBMIT);
 		ld->queued = 0;
 		return 0;
 	}
@@ -424,6 +443,9 @@ static void fio_aioring_cleanup(struct thread_data *td)
 	struct aioring_data *ld = td->io_ops_data;
 
 	if (ld) {
+		td->ts.cachehit += ld->cachehit;
+		td->ts.cachemiss += ld->cachemiss;
+
 		/* Bump depth to match init depth */
 		td->o.iodepth++;
 
@@ -458,7 +480,8 @@ static int fio_aioring_queue_init(struct thread_data *td)
 		flags |= IOCTX_FLAG_SQTHREAD;
 		if (o->sqthread_poll)
 			flags |= IOCTX_FLAG_SQPOLL;
-	} else if (o->sqwq)
+	}
+	if (o->sqwq)
 		flags |= IOCTX_FLAG_SQWQ;
 
 	if (o->fixedbufs) {
@@ -510,14 +533,8 @@ static int fio_aioring_post_init(struct thread_data *td)
 
 static int fio_aioring_init(struct thread_data *td)
 {
-	struct aioring_options *o = td->eo;
 	struct aioring_data *ld;
 
-	if (o->sqthread_set && o->sqwq) {
-		log_err("fio: aioring sqthread and sqwq are mutually exclusive\n");
-		return 1;
-	}
-
 	/* ring needs an extra entry, add one to achieve QD set */
 	td->o.iodepth++;
 
diff --git a/server.c b/server.c
index 90d3396b..2a337707 100644
--- a/server.c
+++ b/server.c
@@ -1530,6 +1530,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 
 	p.ts.total_submit	= cpu_to_le64(ts->total_submit);
 	p.ts.total_complete	= cpu_to_le64(ts->total_complete);
+	p.ts.nr_zone_resets	= cpu_to_le64(ts->nr_zone_resets);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		p.ts.io_bytes[i]	= cpu_to_le64(ts->io_bytes[i]);
@@ -1562,6 +1563,9 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.ss_deviation.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_deviation.u.f));
 	p.ts.ss_criterion.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_criterion.u.f));
 
+	p.ts.cachehit		= cpu_to_le64(ts->cachehit);
+	p.ts.cachemiss		= cpu_to_le64(ts->cachemiss);
+
 	convert_gs(&p.rs, rs);
 
 	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
diff --git a/server.h b/server.h
index 371e51ea..abb23bad 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 77,
+	FIO_SERVER_VER			= 78,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 887509fe..351c49cc 100644
--- a/stat.c
+++ b/stat.c
@@ -419,7 +419,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
-	char *io_p, *bw_p, *bw_p_alt, *iops_p, *zbd_w_st = NULL;
+	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
 	int i2p;
 
 	if (ddir_sync(ddir)) {
@@ -451,15 +451,25 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
 	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
 	if (ddir == DDIR_WRITE)
-		zbd_w_st = zbd_write_status(ts);
+		post_st = zbd_write_status(ts);
+	else if (ddir == DDIR_READ && ts->cachehit && ts->cachemiss) {
+		uint64_t total;
+		double hit;
+
+		total = ts->cachehit + ts->cachemiss;
+		hit = (double) ts->cachehit / (double) total;
+		hit *= 100.0;
+		if (asprintf(&post_st, "; Cachehit=%0.2f%%", hit) < 0)
+			post_st = NULL;
+	}
 
 	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
 			rs->unified_rw_rep ? "mixed" : io_ddir_name(ddir),
 			iops_p, bw_p, bw_p_alt, io_p,
 			(unsigned long long) ts->runtime[ddir],
-			zbd_w_st ? : "");
+			post_st ? : "");
 
-	free(zbd_w_st);
+	free(post_st);
 	free(io_p);
 	free(bw_p);
 	free(bw_p_alt);
@@ -1153,6 +1163,16 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_float(dir_object, "iops_stddev", dev);
 	json_object_add_value_int(dir_object, "iops_samples",
 				(&ts->iops_stat[ddir])->samples);
+
+	if (ts->cachehit + ts->cachemiss) {
+		uint64_t total;
+		double hit;
+
+		total = ts->cachehit + ts->cachemiss;
+		hit = (double) ts->cachehit / (double) total;
+		hit *= 100.0;
+		json_object_add_value_float(dir_object, "cachehit", hit);
+	}
 }
 
 static void show_thread_status_terse_all(struct thread_stat *ts,
@@ -1695,6 +1715,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	dst->total_submit += src->total_submit;
 	dst->total_complete += src->total_complete;
 	dst->nr_zone_resets += src->nr_zone_resets;
+	dst->cachehit += src->cachehit;
+	dst->cachemiss += src->cachemiss;
 }
 
 void init_group_run_stat(struct group_run_stats *gs)
@@ -2329,7 +2351,8 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 
 static inline void reset_io_stat(struct io_stat *ios)
 {
-	ios->max_val = ios->min_val = ios->samples = 0;
+	ios->min_val = -1ULL;
+	ios->max_val = ios->samples = 0;
 	ios->mean.u.f = ios->S.u.f = 0;
 }
 
@@ -2376,6 +2399,7 @@ void reset_io_stats(struct thread_data *td)
 	ts->total_submit = 0;
 	ts->total_complete = 0;
 	ts->nr_zone_resets = 0;
+	ts->cachehit = ts->cachemiss = 0;
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
diff --git a/stat.h b/stat.h
index b4ba71e3..e9551381 100644
--- a/stat.h
+++ b/stat.h
@@ -246,6 +246,9 @@ struct thread_stat {
 		uint64_t *ss_bw_data;
 		uint64_t pad5;
 	};
+
+	uint64_t cachehit;
+	uint64_t cachemiss;
 } __attribute__((packed));
 
 struct jobs_eta {
diff --git a/t/aio-ring.c b/t/aio-ring.c
index 322f2ffa..c813c4e7 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -30,6 +30,8 @@
 #define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
 #define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ wq */
 
+#define IOEV_RES2_CACHEHIT	(1 << 0)
+
 #define barrier()	__asm__ __volatile__("": : :"memory")
 
 #define min(a, b)		((a < b) ? (a) : (b))
@@ -87,6 +89,7 @@ struct submitter {
 	unsigned long reaps;
 	unsigned long done;
 	unsigned long calls;
+	unsigned long cachehit, cachemiss;
 	volatile int finish;
 	char filename[128];
 };
@@ -206,6 +209,10 @@ static int reap_events(struct submitter *s)
 			printf("offset=%lu, size=%lu\n", (unsigned long) iocb->u.c.offset, (unsigned long) iocb->u.c.nbytes);
 			return -1;
 		}
+		if (ev->res2 & IOEV_RES2_CACHEHIT)
+			s->cachehit++;
+		else
+			s->cachemiss++;
 		reaped++;
 		head++;
 		if (head == ring->nr_events)
@@ -333,7 +340,7 @@ static void arm_sig_int(void)
 int main(int argc, char *argv[])
 {
 	struct submitter *s = &submitters[0];
-	unsigned long done, calls, reap;
+	unsigned long done, calls, reap, cache_hit, cache_miss;
 	int flags = 0, err;
 	int j;
 	size_t size;
@@ -410,27 +417,42 @@ int main(int argc, char *argv[])
 
 	pthread_create(&s->thread, NULL, submitter_fn, s);
 
-	reap = calls = done = 0;
+	cache_hit = cache_miss = reap = calls = done = 0;
 	do {
 		unsigned long this_done = 0;
 		unsigned long this_reap = 0;
 		unsigned long this_call = 0;
+		unsigned long this_cache_hit = 0;
+		unsigned long this_cache_miss = 0;
 		unsigned long rpc = 0, ipc = 0;
+		double hit = 0.0;
 
 		sleep(1);
 		this_done += s->done;
 		this_call += s->calls;
 		this_reap += s->reaps;
+		this_cache_hit += s->cachehit;
+		this_cache_miss += s->cachemiss;
+		if (this_cache_hit && this_cache_miss) {
+			unsigned long hits, total;
+
+			hits = this_cache_hit - cache_hit;
+			total = hits + this_cache_miss - cache_miss;
+			hit = (double) hits / (double) total;
+			hit *= 100.0;
+		}
 		if (this_call - calls) {
 			rpc = (this_done - done) / (this_call - calls);
 			ipc = (this_reap - reap) / (this_call - calls);
 		}
-		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%d tail=%d), %lu, %lu\n",
+		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%d tail=%d), Cachehit=%0.2f%%\n",
 				this_done - done, rpc, ipc, s->inflight,
-				s->cq_ring->head, s->cq_ring->tail, s->reaps, s->done);
+				s->cq_ring->head, s->cq_ring->tail, hit);
 		done = this_done;
 		calls = this_call;
 		reap = this_reap;
+		cache_hit = s->cachehit;
+		cache_miss = s->cachemiss;
 	} while (!finish);
 
 	pthread_join(s->thread, &ret);

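The cache hit statistics above count reads whose completion has `IOEV_RES2_CACHEHIT` set in `res2`, and they report `hits / (hits + misses) * 100`. The text-output path in stat.c prints the ratio only when both counters are non-zero, so an all-hit or all-miss run shows no ratio there. The JSON path guards only against an empty total. The sketch below computes the percentage with just the zero-total guard. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percentage of reads served from the page cache. Returns 0.0 when no
 * reads completed, so callers need not special-case an empty total.
 */
static double cachehit_pct(uint64_t hits, uint64_t misses)
{
	uint64_t total = hits + misses;

	if (!total)
		return 0.0;
	return 100.0 * (double)hits / (double)total;
}
```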

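The ramp_time fix in stat.c resets `min_val` to `-1ULL` (that is, `UINT64_MAX`) instead of 0. With that value, the first recorded sample always becomes the minimum. Starting from 0 meant no positive latency could ever replace it, so the reported minimum stayed 0. The sketch below shows a min/max tracker built the same way, using a hypothetical struct rather than fio's `struct io_stat`. A reader of such a struct should treat `samples == 0` as "no data" rather than print the sentinel.

```c
#include <assert.h>
#include <stdint.h>

struct minmax {
	uint64_t min_val;
	uint64_t max_val;
	uint64_t samples;
};

/* Reset so that the first sample replaces both bounds. */
static void minmax_reset(struct minmax *s)
{
	s->min_val = UINT64_MAX;	/* same value as -1ULL */
	s->max_val = 0;
	s->samples = 0;
}

static void minmax_add(struct minmax *s, uint64_t val)
{
	if (val < s->min_val)
		s->min_val = val;
	if (val > s->max_val)
		s->max_val = val;
	s->samples++;
}
```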

* Recent changes (master)
@ 2018-12-14 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-12-14 13:00 UTC
  To: fio

The following changes since commit 702906e9e3e03e9836421d5e5b5eaae3cd99d398:

  engines/libaio: remove features deprecated from old interface (2018-12-12 22:02:16 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 771c99012e26af0dc2a0b7e0762e5097534144bd:

  engines/aioring: enable IOCTX_FLAG_SQPOLL (2018-12-13 13:52:35 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      engines/aioring: various updates and fixes
      io_u: ensure buflen is capped at maxbs
      engines/aioring: enable IOCTX_FLAG_SQPOLL

 engines/aioring.c | 157 +++++++++++++++++++++++++++++++++---------------------
 io_u.c            |   6 ++-
 2 files changed, 100 insertions(+), 63 deletions(-)

---

Diff of recent changes:

diff --git a/engines/aioring.c b/engines/aioring.c
index 1598cc12..cb13b415 100644
--- a/engines/aioring.c
+++ b/engines/aioring.c
@@ -1,7 +1,9 @@
 /*
  * aioring engine
  *
- * IO engine using the new native Linux libaio ring interface
+ * IO engine using the new native Linux libaio ring interface. See:
+ *
+ * http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll
  *
  */
 #include <stdlib.h>
@@ -40,6 +42,10 @@
 #ifndef IOCTX_FLAG_SQWQ
 #define IOCTX_FLAG_SQWQ		(1 << 4)
 #endif
+#ifndef IOCTX_FLAG_SQPOLL
+#define IOCTX_FLAG_SQPOLL	(1 << 5)
+#endif
+
 
 /*
  * io_ring_enter(2) flags
@@ -100,8 +106,22 @@ struct aioring_options {
 	void *pad;
 	unsigned int hipri;
 	unsigned int fixedbufs;
+	unsigned int sqthread;
+	unsigned int sqthread_set;
+	unsigned int sqthread_poll;
+	unsigned int sqwq;
 };
 
+static int fio_aioring_sqthread_cb(void *data,
+				   unsigned long long *val)
+{
+	struct aioring_options *o = data;
+
+	o->sqthread = *val;
+	o->sqthread_set = 1;
+	return 0;
+}
+
 static struct fio_option options[] = {
 	{
 		.name	= "hipri",
@@ -121,22 +141,43 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
+	{
+		.name	= "sqthread",
+		.lname	= "Use kernel SQ thread on this CPU",
+		.type	= FIO_OPT_INT,
+		.cb	= fio_aioring_sqthread_cb,
+		.help	= "Offload submission to kernel thread",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= "sqthread_poll",
+		.lname	= "Kernel SQ thread should poll",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct aioring_options, sqthread_poll),
+		.help	= "Used with sqthread, enables kernel side polling",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= "sqwq",
+		.lname	= "Offload submission to kernel workqueue",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct aioring_options, sqwq),
+		.help	= "Offload submission to kernel workqueue",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
 	{
 		.name	= NULL,
 	},
 };
 
-static int fio_aioring_commit(struct thread_data *td);
-
 static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
 			 unsigned int min_complete, unsigned int flags)
 {
-#ifdef __NR_sys_io_ring_enter
 	return syscall(__NR_sys_io_ring_enter, ctx, to_submit, min_complete,
 			flags);
-#else
-	return -1;
-#endif
 }
 
 static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
@@ -228,6 +269,7 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 {
 	struct aioring_data *ld = td->io_ops_data;
 	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
+	struct aioring_options *o = td->eo;
 	struct aio_cq_ring *ring = ld->cq_ring;
 	int r, events = 0;
 
@@ -239,13 +281,15 @@ static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
 			continue;
 		}
 
-		r = io_ring_enter(ld->aio_ctx, 0, actual_min,
-					IORING_FLAG_GETEVENTS);
-		if (r < 0) {
-			if (errno == EAGAIN)
-				continue;
-			perror("ring enter");
-			break;
+		if (!o->sqthread_poll) {
+			r = io_ring_enter(ld->aio_ctx, 0, actual_min,
+						IORING_FLAG_GETEVENTS);
+			if (r < 0) {
+				if (errno == EAGAIN)
+					continue;
+				td_verror(td, errno, "io_ring_enter get");
+				break;
+			}
 		}
 	} while (events < min);
 
@@ -264,20 +308,6 @@ static enum fio_q_status fio_aioring_queue(struct thread_data *td,
 	if (ld->queued == td->o.iodepth)
 		return FIO_Q_BUSY;
 
-	/*
-	 * fsync is tricky, since it can fail and we need to do it
-	 * serialized with other io. the reason is that linux doesn't
-	 * support aio fsync yet. So return busy for the case where we
-	 * have pending io, to let fio complete those first.
-	 */
-	if (ddir_sync(io_u->ddir)) {
-		if (ld->queued)
-			return FIO_Q_BUSY;
-
-		do_io_u_sync(td, io_u);
-		return FIO_Q_COMPLETED;
-	}
-
 	if (io_u->ddir == DDIR_TRIM) {
 		if (ld->queued)
 			return FIO_Q_BUSY;
@@ -330,55 +360,45 @@ static void fio_aioring_queued(struct thread_data *td, int start, int nr)
 static int fio_aioring_commit(struct thread_data *td)
 {
 	struct aioring_data *ld = td->io_ops_data;
+	struct aioring_options *o = td->eo;
 	int ret;
 
 	if (!ld->queued)
 		return 0;
 
+	/* Nothing to do */
+	if (o->sqthread_poll) {
+		ld->queued = 0;
+		return 0;
+	}
+
 	do {
 		int start = ld->sq_ring->head;
 		long nr = ld->queued;
 
 		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_SUBMIT |
 						IORING_FLAG_GETEVENTS);
-		if (ret == -1)
-			perror("io_ring_enter");
 		if (ret > 0) {
 			fio_aioring_queued(td, start, ret);
 			io_u_mark_submit(td, ret);
 
 			ld->queued -= ret;
 			ret = 0;
-		} else if (ret == -EINTR || !ret) {
-			if (!ret)
-				io_u_mark_submit(td, ret);
+		} else if (!ret) {
+			io_u_mark_submit(td, ret);
 			continue;
-		} else if (ret == -EAGAIN) {
-			/*
-			 * If we get EAGAIN, we should break out without
-			 * error and let the upper layer reap some
-			 * events for us. If we have no queued IO, we
-			 * must loop here. If we loop for more than 30s,
-			 * just error out, something must be buggy in the
-			 * IO path.
-			 */
-			if (ld->queued) {
-				ret = 0;
-				break;
+		} else {
+			if (errno == EAGAIN) {
+				ret = fio_aioring_cqring_reap(td, 0, ld->queued);
+				if (ret)
+					continue;
+				/* Shouldn't happen */
+				usleep(1);
+				continue;
 			}
-			usleep(1);
-			continue;
-		} else if (ret == -ENOMEM) {
-			/*
-			 * If we get -ENOMEM, reap events if we can. If
-			 * we cannot, treat it as a fatal event since there's
-			 * nothing we can do about it.
-			 */
-			if (ld->queued)
-				ret = 0;
-			break;
-		} else
+			td_verror(td, errno, "io_ring_enter submit");
 			break;
+		}
 	} while (ld->queued);
 
 	return ret;
@@ -404,6 +424,9 @@ static void fio_aioring_cleanup(struct thread_data *td)
 	struct aioring_data *ld = td->io_ops_data;
 
 	if (ld) {
+		/* Bump depth to match init depth */
+		td->o.iodepth++;
+
 		/*
 		 * Work-around to avoid huge RCU stalls at exit time. If we
 		 * don't do this here, then it'll be torn down by exit_aio().
@@ -423,7 +446,6 @@ static void fio_aioring_cleanup(struct thread_data *td)
 
 static int fio_aioring_queue_init(struct thread_data *td)
 {
-#ifdef __NR_sys_io_setup2
 	struct aioring_data *ld = td->io_ops_data;
 	struct aioring_options *o = td->eo;
 	int flags = IOCTX_FLAG_SCQRING;
@@ -431,6 +453,14 @@ static int fio_aioring_queue_init(struct thread_data *td)
 
 	if (o->hipri)
 		flags |= IOCTX_FLAG_IOPOLL;
+	if (o->sqthread_set) {
+		ld->sq_ring->sq_thread_cpu = o->sqthread;
+		flags |= IOCTX_FLAG_SQTHREAD;
+		if (o->sqthread_poll)
+			flags |= IOCTX_FLAG_SQPOLL;
+	} else if (o->sqwq)
+		flags |= IOCTX_FLAG_SQWQ;
+
 	if (o->fixedbufs) {
 		struct rlimit rlim = {
 			.rlim_cur = RLIM_INFINITY,
@@ -443,9 +473,6 @@ static int fio_aioring_queue_init(struct thread_data *td)
 
 	return syscall(__NR_sys_io_setup2, depth, flags,
 			ld->sq_ring, ld->cq_ring, &ld->aio_ctx);
-#else
-	return -1;
-#endif
 }
 
 static int fio_aioring_post_init(struct thread_data *td)
@@ -476,13 +503,21 @@ static int fio_aioring_post_init(struct thread_data *td)
 		return 1;
 	}
 
+	/* Adjust depth back again */
+	td->o.iodepth--;
 	return 0;
 }
 
 static int fio_aioring_init(struct thread_data *td)
 {
+	struct aioring_options *o = td->eo;
 	struct aioring_data *ld;
 
+	if (o->sqthread_set && o->sqwq) {
+		log_err("fio: aioring sqthread and sqwq are mutually exclusive\n");
+		return 1;
+	}
+
 	/* ring needs an extra entry, add one to achieve QD set */
 	td->o.iodepth++;
 
diff --git a/io_u.c b/io_u.c
index 1604ff84..bee99c37 100644
--- a/io_u.c
+++ b/io_u.c
@@ -570,8 +570,10 @@ static unsigned long long get_next_buflen(struct thread_data *td, struct io_u *i
 		power_2 = is_power_of_2(minbs);
 		if (!td->o.bs_unaligned && power_2)
 			buflen &= ~(minbs - 1);
-		else if (!td->o.bs_unaligned && !power_2) 
-			buflen -= buflen % minbs; 
+		else if (!td->o.bs_unaligned && !power_2)
+			buflen -= buflen % minbs;
+		if (buflen > maxbs)
+			buflen = maxbs;
 	} while (!io_u_fits(td, io_u, buflen));
 
 	return buflen;
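
After the io_u.c fix, get_next_buflen() first rounds the candidate length down to a multiple of minbs, unless bs_unaligned is set. It uses a mask when minbs is a power of two and a modulo otherwise. It then caps the result at maxbs. Below is a self-contained sketch of only that clamping step, using a hypothetical clamp_buflen() helper. It leaves out how fio picks the random candidate and the surrounding io_u_fits() retry loop.

```c
#include <assert.h>
#include <stdbool.h>

static bool is_pow2_ull(unsigned long long v)
{
	return v && !(v & (v - 1));
}

/* Round buflen down to a multiple of minbs (if aligned), then cap at maxbs */
static unsigned long long clamp_buflen(unsigned long long buflen,
				       unsigned long long minbs,
				       unsigned long long maxbs,
				       bool bs_unaligned)
{
	assert(minbs > 0 && minbs <= maxbs);

	if (!bs_unaligned) {
		if (is_pow2_ull(minbs))
			buflen &= ~(minbs - 1);		/* cheap mask for 2^n */
		else
			buflen -= buflen % minbs;	/* general case */
	}
	if (buflen > maxbs)	/* the cap added by this change */
		buflen = maxbs;
	return buflen;
}
```
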



* Recent changes (master)
@ 2018-12-13 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-13 13:00 UTC
  To: fio

The following changes since commit 0527b23f9112b59dc63f3b40a7f40d45c48ec60d:

  t/aio-ring: updates/cleanups (2018-12-10 15:14:36 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 702906e9e3e03e9836421d5e5b5eaae3cd99d398:

  engines/libaio: remove features deprecated from old interface (2018-12-12 22:02:16 -0700)

----------------------------------------------------------------
Jens Axboe (10):
      t/aio-ring: update to newer API
      t/aio-ring: update to new io_setup2(2)
      t/aio-ring: set nr_events after clear
      ioengine: remove ancient alias for libaio
      Add aioring engine
      t/aio-ring: update for new API
      aioring: hide it if archs don't define syscalls
      aioring: check for arch support AFTER including the headers
      aioring: remove qd > 1 restriction
      engines/libaio: remove features deprecated from old interface

 Makefile           |   3 +
 arch/arch-x86_64.h |   4 +
 engines/aioring.c  | 547 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 engines/libaio.c   | 150 ++-------------
 ioengines.c        |   2 +-
 options.c          |   7 +
 t/aio-ring.c       |  72 ++++---
 7 files changed, 621 insertions(+), 164 deletions(-)
 create mode 100644 engines/aioring.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 284621d3..f111ae6a 100644
--- a/Makefile
+++ b/Makefile
@@ -68,6 +68,9 @@ endif
 ifdef CONFIG_LIBAIO
   SOURCE += engines/libaio.c
 endif
+ifdef CONFIG_LIBAIO
+  SOURCE += engines/aioring.c
+endif
 ifdef CONFIG_RDMA
   SOURCE += engines/rdma.c
 endif
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index ac670d08..d49bcd7f 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -4,6 +4,9 @@
 #ifndef __NR_sys_io_setup2
 #define __NR_sys_io_setup2	335
 #endif
+#ifndef __NR_sys_io_ring_enter
+#define __NR_sys_io_ring_enter	336
+#endif
 
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
@@ -41,6 +44,7 @@ static inline unsigned long long get_cpu_clock(void)
 #define ARCH_HAVE_FFZ
 #define ARCH_HAVE_SSE4_2
 #define ARCH_HAVE_CPU_CLOCK
+#define ARCH_HAVE_AIORING
 
 #define RDRAND_LONG	".byte 0x48,0x0f,0xc7,0xf0"
 #define RDSEED_LONG	".byte 0x48,0x0f,0xc7,0xf8"
diff --git a/engines/aioring.c b/engines/aioring.c
new file mode 100644
index 00000000..1598cc12
--- /dev/null
+++ b/engines/aioring.c
@@ -0,0 +1,547 @@
+/*
+ * aioring engine
+ *
+ * IO engine using the new native Linux libaio ring interface
+ *
+ */
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <libaio.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+
+#include "../fio.h"
+#include "../lib/pow2.h"
+#include "../optgroup.h"
+#include "../lib/memalign.h"
+
+#ifdef ARCH_HAVE_AIORING
+
+#ifndef IOCB_FLAG_HIPRI
+#define IOCB_FLAG_HIPRI	(1 << 2)
+#endif
+
+/*
+ * io_setup2(2) flags
+ */
+#ifndef IOCTX_FLAG_IOPOLL
+#define IOCTX_FLAG_IOPOLL	(1 << 0)
+#endif
+#ifndef IOCTX_FLAG_SCQRING
+#define IOCTX_FLAG_SCQRING	(1 << 1)
+#endif
+#ifndef IOCTX_FLAG_FIXEDBUFS
+#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
+#endif
+#ifndef IOCTX_FLAG_SQTHREAD
+#define IOCTX_FLAG_SQTHREAD	(1 << 3)
+#endif
+#ifndef IOCTX_FLAG_SQWQ
+#define IOCTX_FLAG_SQWQ		(1 << 4)
+#endif
+
+/*
+ * io_ring_enter(2) flags
+ */
+#ifndef IORING_FLAG_SUBMIT
+#define IORING_FLAG_SUBMIT	(1 << 0)
+#endif
+#ifndef IORING_FLAG_GETEVENTS
+#define IORING_FLAG_GETEVENTS	(1 << 1)
+#endif
+
+typedef uint64_t u64;
+typedef uint32_t u32;
+typedef uint16_t u16;
+
+struct aio_sq_ring {
+	union {
+		struct {
+			u32 head;
+			u32 tail;
+			u32 nr_events;
+			u16 sq_thread_cpu;
+			u64 iocbs;
+		};
+		u32 pad[16];
+	};
+	u32 array[0];
+};
+
+struct aio_cq_ring {
+	union {
+		struct {
+			u32 head;
+			u32 tail;
+			u32 nr_events;
+		};
+		struct io_event pad;
+	};
+	struct io_event events[0];
+};
+
+struct aioring_data {
+	io_context_t aio_ctx;
+	struct io_u **io_us;
+	struct io_u **io_u_index;
+
+	struct aio_sq_ring *sq_ring;
+	struct iocb *iocbs;
+
+	struct aio_cq_ring *cq_ring;
+	struct io_event *events;
+
+	int queued;
+	int cq_ring_off;
+};
+
+struct aioring_options {
+	void *pad;
+	unsigned int hipri;
+	unsigned int fixedbufs;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "hipri",
+		.lname	= "High Priority",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct aioring_options, hipri),
+		.help	= "Use polled IO completions",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= "fixedbufs",
+		.lname	= "Fixed (pre-mapped) IO buffers",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct aioring_options, fixedbufs),
+		.help	= "Pre map IO buffers",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+static int fio_aioring_commit(struct thread_data *td);
+
+static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
+			 unsigned int min_complete, unsigned int flags)
+{
+#ifdef __NR_sys_io_ring_enter
+	return syscall(__NR_sys_io_ring_enter, ctx, to_submit, min_complete,
+			flags);
+#else
+	return -1;
+#endif
+}
+
+static int fio_aioring_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct fio_file *f = io_u->file;
+	struct aioring_options *o = td->eo;
+	struct iocb *iocb;
+
+	iocb = &ld->iocbs[io_u->index];
+
+	if (io_u->ddir == DDIR_READ) {
+		if (o->fixedbufs) {
+			iocb->aio_fildes = f->fd;
+			iocb->aio_lio_opcode = IO_CMD_PREAD;
+			iocb->u.c.offset = io_u->offset;
+		} else {
+			io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		}
+	} else if (io_u->ddir == DDIR_WRITE) {
+		if (o->fixedbufs) {
+			iocb->aio_fildes = f->fd;
+			iocb->aio_lio_opcode = IO_CMD_PWRITE;
+			iocb->u.c.offset = io_u->offset;
+		} else {
+			io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		}
+	} else if (ddir_sync(io_u->ddir))
+		io_prep_fsync(iocb, f->fd);
+
+	iocb->data = io_u;
+	return 0;
+}
+
+static struct io_u *fio_aioring_event(struct thread_data *td, int event)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct io_event *ev;
+	struct io_u *io_u;
+	int index;
+
+	index = event + ld->cq_ring_off;
+	if (index >= ld->cq_ring->nr_events)
+		index -= ld->cq_ring->nr_events;
+
+	ev = &ld->cq_ring->events[index];
+	io_u = ev->data;
+
+	if (ev->res != io_u->xfer_buflen) {
+		if (ev->res > io_u->xfer_buflen)
+			io_u->error = -ev->res;
+		else
+			io_u->resid = io_u->xfer_buflen - ev->res;
+	} else
+		io_u->error = 0;
+
+	return io_u;
+}
+
+static int fio_aioring_cqring_reap(struct thread_data *td, unsigned int events,
+				   unsigned int max)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct aio_cq_ring *ring = ld->cq_ring;
+	u32 head, reaped = 0;
+
+	head = ring->head;
+	do {
+		read_barrier();
+		if (head == ring->tail)
+			break;
+		reaped++;
+		head++;
+		if (head == ring->nr_events)
+			head = 0;
+	} while (reaped + events < max);
+
+	ring->head = head;
+	write_barrier();
+	return reaped;
+}
+
+static int fio_aioring_getevents(struct thread_data *td, unsigned int min,
+				 unsigned int max, const struct timespec *t)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
+	struct aio_cq_ring *ring = ld->cq_ring;
+	int r, events = 0;
+
+	ld->cq_ring_off = ring->head;
+	do {
+		r = fio_aioring_cqring_reap(td, events, max);
+		if (r) {
+			events += r;
+			continue;
+		}
+
+		r = io_ring_enter(ld->aio_ctx, 0, actual_min,
+					IORING_FLAG_GETEVENTS);
+		if (r < 0) {
+			if (errno == EAGAIN)
+				continue;
+			perror("ring enter");
+			break;
+		}
+	} while (events < min);
+
+	return r < 0 ? r : events;
+}
+
+static enum fio_q_status fio_aioring_queue(struct thread_data *td,
+					   struct io_u *io_u)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct aio_sq_ring *ring = ld->sq_ring;
+	unsigned tail, next_tail;
+
+	fio_ro_check(td, io_u);
+
+	if (ld->queued == td->o.iodepth)
+		return FIO_Q_BUSY;
+
+	/*
+	 * fsync is tricky, since it can fail and we need to do it
+	 * serialized with other io. the reason is that linux doesn't
+	 * support aio fsync yet. So return busy for the case where we
+	 * have pending io, to let fio complete those first.
+	 */
+	if (ddir_sync(io_u->ddir)) {
+		if (ld->queued)
+			return FIO_Q_BUSY;
+
+		do_io_u_sync(td, io_u);
+		return FIO_Q_COMPLETED;
+	}
+
+	if (io_u->ddir == DDIR_TRIM) {
+		if (ld->queued)
+			return FIO_Q_BUSY;
+
+		do_io_u_trim(td, io_u);
+		io_u_mark_submit(td, 1);
+		io_u_mark_complete(td, 1);
+		return FIO_Q_COMPLETED;
+	}
+
+	tail = ring->tail;
+	next_tail = tail + 1;
+	if (next_tail == ring->nr_events)
+		next_tail = 0;
+	read_barrier();
+	if (next_tail == ring->head)
+		return FIO_Q_BUSY;
+
+	ring->array[tail] = io_u->index;
+	ring->tail = next_tail;
+	write_barrier();
+
+	ld->queued++;
+	return FIO_Q_QUEUED;
+}
+
+static void fio_aioring_queued(struct thread_data *td, int start, int nr)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct timespec now;
+
+	if (!fio_fill_issue_time(td))
+		return;
+
+	fio_gettime(&now, NULL);
+
+	while (nr--) {
+		int index = ld->sq_ring->array[start];
+		struct io_u *io_u = ld->io_u_index[index];
+
+		memcpy(&io_u->issue_time, &now, sizeof(now));
+		io_u_queued(td, io_u);
+
+		start++;
+		if (start == ld->sq_ring->nr_events)
+			start = 0;
+	}
+}
+
+static int fio_aioring_commit(struct thread_data *td)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	int ret;
+
+	if (!ld->queued)
+		return 0;
+
+	do {
+		int start = ld->sq_ring->head;
+		long nr = ld->queued;
+
+		ret = io_ring_enter(ld->aio_ctx, nr, 0, IORING_FLAG_SUBMIT |
+						IORING_FLAG_GETEVENTS);
+		if (ret == -1)
+			perror("io_ring_enter");
+		if (ret > 0) {
+			fio_aioring_queued(td, start, ret);
+			io_u_mark_submit(td, ret);
+
+			ld->queued -= ret;
+			ret = 0;
+		} else if (ret == -EINTR || !ret) {
+			if (!ret)
+				io_u_mark_submit(td, ret);
+			continue;
+		} else if (ret == -EAGAIN) {
+			/*
+			 * If we get EAGAIN, we should break out without
+			 * error and let the upper layer reap some
+			 * events for us. If we have no queued IO, we
+			 * must loop here. If we loop for more than 30s,
+			 * just error out, something must be buggy in the
+			 * IO path.
+			 */
+			if (ld->queued) {
+				ret = 0;
+				break;
+			}
+			usleep(1);
+			continue;
+		} else if (ret == -ENOMEM) {
+			/*
+			 * If we get -ENOMEM, reap events if we can. If
+			 * we cannot, treat it as a fatal event since there's
+			 * nothing we can do about it.
+			 */
+			if (ld->queued)
+				ret = 0;
+			break;
+		} else
+			break;
+	} while (ld->queued);
+
+	return ret;
+}
+
+static size_t aioring_cq_size(struct thread_data *td)
+{
+	return sizeof(struct aio_cq_ring) + 2 * td->o.iodepth * sizeof(struct io_event);
+}
+
+static size_t aioring_sq_iocb(struct thread_data *td)
+{
+	return sizeof(struct iocb) * td->o.iodepth;
+}
+
+static size_t aioring_sq_size(struct thread_data *td)
+{
+	return sizeof(struct aio_sq_ring) + td->o.iodepth * sizeof(u32);
+}
+
+static void fio_aioring_cleanup(struct thread_data *td)
+{
+	struct aioring_data *ld = td->io_ops_data;
+
+	if (ld) {
+		/*
+		 * Work-around to avoid huge RCU stalls at exit time. If we
+		 * don't do this here, then it'll be torn down by exit_aio().
+		 * But for that case we can parallelize the freeing, thus
+		 * speeding it up a lot.
+		 */
+		if (!(td->flags & TD_F_CHILD))
+			io_destroy(ld->aio_ctx);
+		free(ld->io_u_index);
+		free(ld->io_us);
+		fio_memfree(ld->sq_ring, aioring_sq_size(td), false);
+		fio_memfree(ld->iocbs, aioring_sq_iocb(td), false);
+		fio_memfree(ld->cq_ring, aioring_cq_size(td), false);
+		free(ld);
+	}
+}
+
+static int fio_aioring_queue_init(struct thread_data *td)
+{
+#ifdef __NR_sys_io_setup2
+	struct aioring_data *ld = td->io_ops_data;
+	struct aioring_options *o = td->eo;
+	int flags = IOCTX_FLAG_SCQRING;
+	int depth = td->o.iodepth;
+
+	if (o->hipri)
+		flags |= IOCTX_FLAG_IOPOLL;
+	if (o->fixedbufs) {
+		struct rlimit rlim = {
+			.rlim_cur = RLIM_INFINITY,
+			.rlim_max = RLIM_INFINITY,
+		};
+
+		setrlimit(RLIMIT_MEMLOCK, &rlim);
+		flags |= IOCTX_FLAG_FIXEDBUFS;
+	}
+
+	return syscall(__NR_sys_io_setup2, depth, flags,
+			ld->sq_ring, ld->cq_ring, &ld->aio_ctx);
+#else
+	return -1;
+#endif
+}
+
+static int fio_aioring_post_init(struct thread_data *td)
+{
+	struct aioring_data *ld = td->io_ops_data;
+	struct aioring_options *o = td->eo;
+	struct io_u *io_u;
+	struct iocb *iocb;
+	int err = 0;
+
+	if (o->fixedbufs) {
+		int i;
+
+		for (i = 0; i < td->o.iodepth; i++) {
+			io_u = ld->io_u_index[i];
+			iocb = &ld->iocbs[i];
+			iocb->u.c.buf = io_u->buf;
+			iocb->u.c.nbytes = td_max_bs(td);
+
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		}
+	}
+
+	err = fio_aioring_queue_init(td);
+	if (err) {
+		td_verror(td, -err, "io_queue_init");
+		return 1;
+	}
+
+	return 0;
+}
+
+static int fio_aioring_init(struct thread_data *td)
+{
+	struct aioring_data *ld;
+
+	/* ring needs an extra entry, add one to achieve QD set */
+	td->o.iodepth++;
+
+	ld = calloc(1, sizeof(*ld));
+
+	/* io_u index */
+	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
+	ld->io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
+
+	ld->iocbs = fio_memalign(page_size, aioring_sq_iocb(td), false);
+	memset(ld->iocbs, 0, aioring_sq_iocb(td));
+
+	ld->sq_ring = fio_memalign(page_size, aioring_sq_size(td), false);
+	memset(ld->sq_ring, 0, aioring_sq_size(td));
+	ld->sq_ring->nr_events = td->o.iodepth;
+	ld->sq_ring->iocbs = (u64) (uintptr_t) ld->iocbs;
+
+	ld->cq_ring = fio_memalign(page_size, aioring_cq_size(td), false);
+	memset(ld->cq_ring, 0, aioring_cq_size(td));
+	ld->cq_ring->nr_events = td->o.iodepth * 2;
+
+	td->io_ops_data = ld;
+	return 0;
+}
+
+static int fio_aioring_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	struct aioring_data *ld = td->io_ops_data;
+
+	ld->io_u_index[io_u->index] = io_u;
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name			= "aio-ring",
+	.version		= FIO_IOOPS_VERSION,
+	.init			= fio_aioring_init,
+	.post_init		= fio_aioring_post_init,
+	.io_u_init		= fio_aioring_io_u_init,
+	.prep			= fio_aioring_prep,
+	.queue			= fio_aioring_queue,
+	.commit			= fio_aioring_commit,
+	.getevents		= fio_aioring_getevents,
+	.event			= fio_aioring_event,
+	.cleanup		= fio_aioring_cleanup,
+	.open_file		= generic_open_file,
+	.close_file		= generic_close_file,
+	.get_file_size		= generic_get_file_size,
+	.options		= options,
+	.option_struct_size	= sizeof(struct aioring_options),
+};
+
+static void fio_init fio_aioring_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_aioring_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
+#endif
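
The new aioring engine shares head/tail rings with the kernel. fio_aioring_queue() stores an io_u index at tail and advances it, and returns FIO_Q_BUSY when the next tail would land on head. fio_aioring_cqring_reap() advances head until it meets tail. One slot is always left empty so that "full" can be told apart from "empty". As a result, a ring of N entries holds at most N - 1 requests, which is what the td->o.iodepth++ in fio_aioring_init() compensates for.

The single-threaded sketch below models only that index arithmetic. The sketch_ring type and both helpers are hypothetical. The read/write barriers and the kernel side are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 4

/* Hypothetical ring: array[] holds io_u indices, as in struct aio_sq_ring */
struct sketch_ring {
	uint32_t head, tail, nr_events;
	uint32_t array[RING_ENTRIES];
};

/* Producer side: returns false when full (the FIO_Q_BUSY case) */
static bool ring_push(struct sketch_ring *r, uint32_t index)
{
	uint32_t next_tail = r->tail + 1;

	assert(r->nr_events <= RING_ENTRIES);
	if (next_tail == r->nr_events)
		next_tail = 0;		/* wrap around */
	if (next_tail == r->head)
		return false;		/* one slot must stay empty */

	r->array[r->tail] = index;
	r->tail = next_tail;
	return true;
}

/* Consumer side: advance head over up to max entries, return count */
static uint32_t ring_reap(struct sketch_ring *r, uint32_t max)
{
	uint32_t reaped = 0;

	while (reaped < max && r->head != r->tail) {
		r->head++;
		if (r->head == r->nr_events)
			r->head = 0;	/* wrap around */
		reaped++;
	}
	return reaped;
}
```

With RING_ENTRIES set to 4, three pushes succeed and the fourth reports busy until entries are reaped.
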
diff --git a/engines/libaio.c b/engines/libaio.c
index 03335094..8844ac8b 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -20,14 +20,8 @@
 #define IOCB_FLAG_HIPRI	(1 << 2)
 #endif
 
-#ifndef IOCTX_FLAG_USERIOCB
-#define IOCTX_FLAG_USERIOCB	(1 << 0)
-#endif
 #ifndef IOCTX_FLAG_IOPOLL
-#define IOCTX_FLAG_IOPOLL	(1 << 1)
-#endif
-#ifndef IOCTX_FLAG_FIXEDBUFS
-#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
+#define IOCTX_FLAG_IOPOLL	(1 << 0)
 #endif
 
 static int fio_libaio_commit(struct thread_data *td);
@@ -38,7 +32,6 @@ struct libaio_data {
 	struct iocb **iocbs;
 	struct io_u **io_us;
 
-	struct iocb *user_iocbs;
 	struct io_u **io_u_index;
 
 	/*
@@ -60,8 +53,6 @@ struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
 	unsigned int hipri;
-	unsigned int useriocb;
-	unsigned int fixedbufs;
 };
 
 static struct fio_option options[] = {
@@ -83,24 +74,6 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
 	},
-	{
-		.name	= "useriocb",
-		.lname	= "User IOCBs",
-		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct libaio_options, useriocb),
-		.help	= "Use user mapped IOCBs",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
-	{
-		.name	= "fixedbufs",
-		.lname	= "Fixed (pre-mapped) IO buffers",
-		.type	= FIO_OPT_STR_SET,
-		.off1	= offsetof(struct libaio_options, fixedbufs),
-		.help	= "Pre map IO buffers",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_LIBAIO,
-	},
 	{
 		.name	= NULL,
 	},
@@ -117,36 +90,20 @@ static inline void ring_inc(struct libaio_data *ld, unsigned int *val,
 
 static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 {
-	struct libaio_data *ld = td->io_ops_data;
 	struct fio_file *f = io_u->file;
 	struct libaio_options *o = td->eo;
 	struct iocb *iocb;
 
-	if (o->useriocb)
-		iocb = &ld->user_iocbs[io_u->index];
-	else
-		iocb = &io_u->iocb;
+	iocb = &io_u->iocb;
 
 	if (io_u->ddir == DDIR_READ) {
-		if (o->fixedbufs) {
-			iocb->aio_fildes = f->fd;
-			iocb->aio_lio_opcode = IO_CMD_PREAD;
-			iocb->u.c.offset = io_u->offset;
-		} else {
-			io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		}
+		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (o->hipri)
+			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (io_u->ddir == DDIR_WRITE) {
-		if (o->fixedbufs) {
-			iocb->aio_fildes = f->fd;
-			iocb->aio_lio_opcode = IO_CMD_PWRITE;
-			iocb->u.c.offset = io_u->offset;
-		} else {
-			io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		}
+		io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (o->hipri)
+			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
@@ -156,16 +113,11 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 static struct io_u *fio_libaio_event(struct thread_data *td, int event)
 {
 	struct libaio_data *ld = td->io_ops_data;
-	struct libaio_options *o = td->eo;
 	struct io_event *ev;
 	struct io_u *io_u;
 
 	ev = ld->aio_events + event;
-	if (o->useriocb) {
-		int index = (int) (uintptr_t) ev->obj;
-		io_u = ld->io_u_index[index];
-	} else
-		io_u = container_of(ev->obj, struct io_u, iocb);
+	io_u = container_of(ev->obj, struct io_u, iocb);
 
 	if (ev->res != io_u->xfer_buflen) {
 		if (ev->res > io_u->xfer_buflen)
@@ -261,7 +213,6 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
-	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -292,11 +243,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	if (o->useriocb)
-		ld->iocbs[ld->head] = (struct iocb *) (uintptr_t) io_u->index;
-	else
-		ld->iocbs[ld->head] = &io_u->iocb;
-
+	ld->iocbs[ld->head] = &io_u->iocb;
 	ld->io_us[ld->head] = io_u;
 	ring_inc(ld, &ld->head, 1);
 	ld->queued++;
@@ -415,87 +362,46 @@ static void fio_libaio_cleanup(struct thread_data *td)
 		free(ld->aio_events);
 		free(ld->iocbs);
 		free(ld->io_us);
-		if (ld->user_iocbs) {
-			size_t size = td->o.iodepth * sizeof(struct iocb);
-			fio_memfree(ld->user_iocbs, size, false);
-		}
 		free(ld);
 	}
 }
 
 static int fio_libaio_old_queue_init(struct libaio_data *ld, unsigned int depth,
-				     bool hipri, bool useriocb, bool fixedbufs)
+				     bool hipri)
 {
 	if (hipri) {
 		log_err("fio: polled aio not available on your platform\n");
 		return 1;
 	}
-	if (useriocb) {
-		log_err("fio: user mapped iocbs not available on your platform\n");
-		return 1;
-	}
-	if (fixedbufs) {
-		log_err("fio: fixed buffers not available on your platform\n");
-		return 1;
-	}
 
 	return io_queue_init(depth, &ld->aio_ctx);
 }
 
 static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
-				 bool hipri, bool useriocb, bool fixedbufs)
+				 bool hipri)
 {
 #ifdef __NR_sys_io_setup2
 	int ret, flags = 0;
 
 	if (hipri)
 		flags |= IOCTX_FLAG_IOPOLL;
-	if (useriocb)
-		flags |= IOCTX_FLAG_USERIOCB;
-	if (fixedbufs) {
-		struct rlimit rlim = {
-			.rlim_cur = RLIM_INFINITY,
-			.rlim_max = RLIM_INFINITY,
-		};
-
-		setrlimit(RLIMIT_MEMLOCK, &rlim);
-		flags |= IOCTX_FLAG_FIXEDBUFS;
-	}
 
-	ret = syscall(__NR_sys_io_setup2, depth, flags, ld->user_iocbs,
-			NULL, NULL, &ld->aio_ctx);
+	ret = syscall(__NR_sys_io_setup2, depth, flags, NULL, NULL,
+			&ld->aio_ctx);
 	if (!ret)
 		return 0;
 	/* fall through to old syscall */
 #endif
-	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb, fixedbufs);
+	return fio_libaio_old_queue_init(ld, depth, hipri);
 }
 
 static int fio_libaio_post_init(struct thread_data *td)
 {
 	struct libaio_data *ld = td->io_ops_data;
 	struct libaio_options *o = td->eo;
-	struct io_u *io_u;
-	struct iocb *iocb;
 	int err = 0;
 
-	if (o->fixedbufs) {
-		int i;
-
-		for (i = 0; i < td->o.iodepth; i++) {
-			io_u = ld->io_u_index[i];
-			iocb = &ld->user_iocbs[i];
-			iocb->u.c.buf = io_u->buf;
-			iocb->u.c.nbytes = td_max_bs(td);
-
-			iocb->u.c.flags = 0;
-			if (o->hipri)
-				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
-		}
-	}
-
-	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri, o->useriocb,
-					o->fixedbufs);
+	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri);
 	if (err) {
 		td_verror(td, -err, "io_queue_init");
 		return 1;
@@ -506,20 +412,10 @@ static int fio_libaio_post_init(struct thread_data *td)
 
 static int fio_libaio_init(struct thread_data *td)
 {
-	struct libaio_options *o = td->eo;
 	struct libaio_data *ld;
 
 	ld = calloc(1, sizeof(*ld));
 
-	if (o->useriocb) {
-		size_t size;
-
-		ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
-		size = td->o.iodepth * sizeof(struct iocb);
-		ld->user_iocbs = fio_memalign(page_size, size, false);
-		memset(ld->user_iocbs, 0, size);
-	}
-
 	ld->entries = td->o.iodepth;
 	ld->is_pow2 = is_power_of_2(ld->entries);
 	ld->aio_events = calloc(ld->entries, sizeof(struct io_event));
@@ -530,25 +426,11 @@ static int fio_libaio_init(struct thread_data *td)
 	return 0;
 }
 
-static int fio_libaio_io_u_init(struct thread_data *td, struct io_u *io_u)
-{
-	struct libaio_options *o = td->eo;
-
-	if (o->useriocb) {
-		struct libaio_data *ld = td->io_ops_data;
-
-		ld->io_u_index[io_u->index] = io_u;
-	}
-
-	return 0;
-}
-
 static struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
 	.init			= fio_libaio_init,
 	.post_init		= fio_libaio_post_init,
-	.io_u_init		= fio_libaio_io_u_init,
 	.prep			= fio_libaio_prep,
 	.queue			= fio_libaio_queue,
 	.commit			= fio_libaio_commit,
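
Both fio_aioring_event() and fio_libaio_event() decode a completion the same way. The res field of libaio's struct io_event is an unsigned long, so a negative errno from the kernel wraps to a value larger than any transfer length. A smaller value indicates a short transfer. Below is a sketch of that decoding. completion_result and decode_res() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result of decoding one completion */
struct completion_result {
	int error;		/* positive errno, 0 on success */
	unsigned long resid;	/* bytes not transferred on a short I/O */
};

static struct completion_result decode_res(unsigned long res,
					   unsigned long xfer_buflen)
{
	struct completion_result cr = { 0, 0 };

	if (res != xfer_buflen) {
		if (res > xfer_buflen)
			cr.error = -(long) res;	/* wrapped -errno, e.g. -EIO -> EIO */
		else
			cr.resid = xfer_buflen - res;
	}
	return cr;
}
```
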
diff --git a/ioengines.c b/ioengines.c
index b7df8608..45e769e6 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -131,7 +131,7 @@ static struct ioengine_ops *__load_ioengine(const char *name)
 	/*
 	 * linux libaio has alias names, so convert to what we want
 	 */
-	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3)) {
+	if (!strncmp(engine, "linuxaio", 8)) {
 		dprint(FD_IO, "converting ioengine name: %s -> libaio\n", name);
 		strcpy(engine, "libaio");
 	}
diff --git a/options.c b/options.c
index 7a7006c1..626c7c17 100644
--- a/options.c
+++ b/options.c
@@ -1773,6 +1773,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Linux native asynchronous IO",
 			  },
 #endif
+#ifdef CONFIG_LIBAIO
+#ifdef ARCH_HAVE_AIORING
+			  { .ival = "aio-ring",
+			    .help = "Linux native asynchronous IO",
+			  },
+#endif
+#endif
 #ifdef CONFIG_POSIXAIO
 			  { .ival = "posixaio",
 			    .help = "POSIX asynchronous IO",
diff --git a/t/aio-ring.c b/t/aio-ring.c
index c6106348..322f2ffa 100644
--- a/t/aio-ring.c
+++ b/t/aio-ring.c
@@ -24,38 +24,42 @@
 
 #define IOCB_FLAG_HIPRI		(1 << 2)
 
-#define IOCTX_FLAG_IOPOLL	(1 << 1)
-#define IOCTX_FLAG_USERIOCB	(1 << 0)
+#define IOCTX_FLAG_IOPOLL	(1 << 0)
+#define IOCTX_FLAG_SCQRING	(1 << 1)	/* Use SQ/CQ rings */
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
-#define IOCTX_FLAG_SCQRING	(1 << 3)	/* Use SQ/CQ rings */
-#define IOCTX_FLAG_SQTHREAD	(1 << 4)	/* Use SQ thread */
-#define IOCTX_FLAG_SQWQ		(1 << 5)	/* Use SQ wq */
+#define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
+#define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ wq */
 
 #define barrier()	__asm__ __volatile__("": : :"memory")
 
 #define min(a, b)		((a < b) ? (a) : (b))
 
+typedef uint64_t u64;
 typedef uint32_t u32;
+typedef uint16_t u16;
 
-struct aio_iocb_ring {
+struct aio_sq_ring {
 	union {
 		struct {
-			u32 head, tail;
+			u32 head;
+			u32 tail;
 			u32 nr_events;
-			u32 sq_thread_cpu;
+			u16 sq_thread_cpu;
+			u64 iocbs;
 		};
-		struct iocb pad_iocb;
+		u32 pad[16];
 	};
-	struct iocb iocbs[0];
+	u32 array[0];
 };
 
-struct aio_io_event_ring {
+struct aio_cq_ring {
 	union {
 		struct {
-			u32 head, tail;
+			u32 head;
+			u32 tail;
 			u32 nr_events;
 		};
-		struct io_event pad_event;
+		struct io_event pad;
 	};
 	struct io_event events[0];
 };
@@ -76,8 +80,9 @@ struct submitter {
 	unsigned long max_blocks;
 	io_context_t ioc;
 	struct drand48_data rand;
-	struct aio_iocb_ring *sq_ring;
-	struct aio_io_event_ring *cq_ring;
+	struct aio_sq_ring *sq_ring;
+	struct iocb *iocbs;
+	struct aio_cq_ring *cq_ring;
 	int inflight;
 	unsigned long reaps;
 	unsigned long done;
@@ -96,10 +101,10 @@ static int sq_thread = 0;	/* use kernel submission thread */
 static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
 
 static int io_setup2(unsigned int nr_events, unsigned int flags,
-		     struct iocb *iocbs, struct aio_iocb_ring *sq_ring,
-		     struct aio_io_event_ring *cq_ring, io_context_t *ctx_idp)
+		     struct aio_sq_ring *sq_ring, struct aio_cq_ring *cq_ring,
+		     io_context_t *ctx_idp)
 {
-	return syscall(335, nr_events, flags, iocbs, sq_ring, cq_ring, ctx_idp);
+	return syscall(335, nr_events, flags, sq_ring, cq_ring, ctx_idp);
 }
 
 static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
@@ -132,8 +137,7 @@ static void init_io(struct submitter *s, int fd, struct iocb *iocb)
 
 static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 {
-	struct aio_iocb_ring *ring = s->sq_ring;
-	struct iocb *iocb;
+	struct aio_sq_ring *ring = s->sq_ring;
 	u32 tail, next_tail, prepped = 0;
 
 	next_tail = tail = ring->tail;
@@ -146,8 +150,8 @@ static int prep_more_ios(struct submitter *s, int fd, int max_ios)
 		if (next_tail == ring->head)
 			break;
 
-		iocb = &s->sq_ring->iocbs[tail];
-		init_io(s, fd, iocb);
+		init_io(s, fd, &s->iocbs[tail]);
+		s->sq_ring->array[tail] = tail;
 		prepped++;
 		tail = next_tail;
 	} while (prepped < max_ios);
@@ -185,7 +189,7 @@ static int get_file_size(int fd, unsigned long *blocks)
 
 static int reap_events(struct submitter *s)
 {
-	struct aio_io_event_ring *ring = s->cq_ring;
+	struct aio_cq_ring *ring = s->cq_ring;
 	struct io_event *ev;
 	u32 head, reaped = 0;
 
@@ -196,8 +200,7 @@ static int reap_events(struct submitter *s)
 			break;
 		ev = &ring->events[head];
 		if (ev->res != BS) {
-			int index = (int) (uintptr_t) ev->obj;
-			struct iocb *iocb = &s->sq_ring->iocbs[index];
+			struct iocb *iocb = ev->obj;
 
 			printf("io: unexpected ret=%ld\n", ev->res);
 			printf("offset=%lu, size=%lu\n", (unsigned long) iocb->u.c.offset, (unsigned long) iocb->u.c.nbytes);
@@ -351,20 +354,31 @@ int main(int argc, char *argv[])
 
 	arm_sig_int();
 
-	size = sizeof(struct aio_iocb_ring) + RING_SIZE * sizeof(struct iocb);
+	size = sizeof(struct iocb) * RING_SIZE;
+	if (posix_memalign(&p, 4096, size))
+		return 1;
+	memset(p, 0, size);
+	s->iocbs = p;
+
+	size = sizeof(struct aio_sq_ring) + RING_SIZE * sizeof(u32);
 	if (posix_memalign(&p, 4096, size))
 		return 1;
 	s->sq_ring = p;
 	memset(p, 0, size);
+	s->sq_ring->nr_events = RING_SIZE;
+	s->sq_ring->iocbs = (u64) s->iocbs;
 
-	size = sizeof(struct aio_io_event_ring) + RING_SIZE * sizeof(struct io_event);
+	/* CQ ring must be twice as big */
+	size = sizeof(struct aio_cq_ring) +
+			2 * RING_SIZE * sizeof(struct io_event);
 	if (posix_memalign(&p, 4096, size))
 		return 1;
 	s->cq_ring = p;
 	memset(p, 0, size);
+	s->cq_ring->nr_events = 2 * RING_SIZE;
 
 	for (j = 0; j < RING_SIZE; j++) {
-		struct iocb *iocb = &s->sq_ring->iocbs[j];
+		struct iocb *iocb = &s->iocbs[j];
 
 		if (posix_memalign(&iocb->u.c.buf, BS, BS)) {
 			printf("failed alloc\n");
@@ -385,7 +399,7 @@ int main(int argc, char *argv[])
 		s->sq_ring->sq_thread_cpu = sq_thread_cpu;
 	}
 
-	err = io_setup2(RING_SIZE, flags, s->sq_ring->iocbs, s->sq_ring, s->cq_ring, &s->ioc);
+	err = io_setup2(RING_SIZE, flags, s->sq_ring, s->cq_ring, &s->ioc);
 	if (err) {
 		printf("ctx_init failed: %s, %d\n", strerror(errno), err);
 		return 1;



* Recent changes (master)
@ 2018-12-11 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-11 13:00 UTC
  To: fio

The following changes since commit 46961748140e291f0f1e966bac9aad1a3b9ad1c7:

  engines/libaio: increase RLIMIT_MEMLOCK for user buffers (2018-12-04 11:27:02 -0700)

are available in the Git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0527b23f9112b59dc63f3b40a7f40d45c48ec60d:

  t/aio-ring: updates/cleanups (2018-12-10 15:14:36 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Add aio-ring test app
      t/aio-ring: updates/cleanups

 Makefile     |   7 +
 t/aio-ring.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 431 insertions(+)
 create mode 100644 t/aio-ring.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 5ac568e9..284621d3 100644
--- a/Makefile
+++ b/Makefile
@@ -263,6 +263,9 @@ T_VS_PROGS = t/fio-verify-state
 T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
 T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
 
+T_AIO_RING_OBJS = t/aio-ring.o
+T_AIO_RING_PROGS = t/aio-ring
+
 T_MEMLOCK_OBJS = t/memlock.o
 T_MEMLOCK_PROGS = t/memlock
 
@@ -281,6 +284,7 @@ T_OBJS += $(T_VS_OBJS)
 T_OBJS += $(T_PIPE_ASYNC_OBJS)
 T_OBJS += $(T_MEMLOCK_OBJS)
 T_OBJS += $(T_TT_OBJS)
+T_OBJS += $(T_AIO_RING_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
     T_DEDUPE_OBJS += os/windows/posix.o lib/hweight.o
@@ -440,6 +444,9 @@ cairo_text_helpers.o: cairo_text_helpers.c cairo_text_helpers.h
 printing.o: printing.c printing.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
 
+t/aio-ring: $(T_AIO_RING_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_AIO_RING_OBJS) $(LIBS)
+
 t/read-to-pipe-async: $(T_PIPE_ASYNC_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_PIPE_ASYNC_OBJS) $(LIBS)
 
diff --git a/t/aio-ring.c b/t/aio-ring.c
new file mode 100644
index 00000000..c6106348
--- /dev/null
+++ b/t/aio-ring.c
@@ -0,0 +1,424 @@
+/*
+ * gcc -D_GNU_SOURCE -Wall -O2 -o aio-ring aio-ring.c  -lpthread -laio
+ */
+#include <stdio.h>
+#include <errno.h>
+#include <assert.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <signal.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/resource.h>
+#include <linux/fs.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <libaio.h>
+#include <string.h>
+#include <pthread.h>
+#include <sched.h>
+
+#define IOCB_FLAG_HIPRI		(1 << 2)
+
+#define IOCTX_FLAG_IOPOLL	(1 << 1)
+#define IOCTX_FLAG_USERIOCB	(1 << 0)
+#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
+#define IOCTX_FLAG_SCQRING	(1 << 3)	/* Use SQ/CQ rings */
+#define IOCTX_FLAG_SQTHREAD	(1 << 4)	/* Use SQ thread */
+#define IOCTX_FLAG_SQWQ		(1 << 5)	/* Use SQ wq */
+
+#define barrier()	__asm__ __volatile__("": : :"memory")
+
+#define min(a, b)		((a < b) ? (a) : (b))
+
+typedef uint32_t u32;
+
+struct aio_iocb_ring {
+	union {
+		struct {
+			u32 head, tail;
+			u32 nr_events;
+			u32 sq_thread_cpu;
+		};
+		struct iocb pad_iocb;
+	};
+	struct iocb iocbs[0];
+};
+
+struct aio_io_event_ring {
+	union {
+		struct {
+			u32 head, tail;
+			u32 nr_events;
+		};
+		struct io_event pad_event;
+	};
+	struct io_event events[0];
+};
+
+#define IORING_FLAG_SUBMIT	(1 << 0)
+#define IORING_FLAG_GETEVENTS	(1 << 1)
+
+#define DEPTH			32
+#define RING_SIZE		(DEPTH + 1)
+
+#define BATCH_SUBMIT		8
+#define BATCH_COMPLETE		8
+
+#define BS			4096
+
+struct submitter {
+	pthread_t thread;
+	unsigned long max_blocks;
+	io_context_t ioc;
+	struct drand48_data rand;
+	struct aio_iocb_ring *sq_ring;
+	struct aio_io_event_ring *cq_ring;
+	int inflight;
+	unsigned long reaps;
+	unsigned long done;
+	unsigned long calls;
+	volatile int finish;
+	char filename[128];
+};
+
+static struct submitter submitters[1];
+static volatile int finish;
+
+static int polled = 1;		/* use IO polling */
+static int fixedbufs = 1;	/* use fixed user buffers */
+static int buffered = 0;	/* use buffered IO, not O_DIRECT */
+static int sq_thread = 0;	/* use kernel submission thread */
+static int sq_thread_cpu = 0;	/* pin above thread to this CPU */
+
+static int io_setup2(unsigned int nr_events, unsigned int flags,
+		     struct iocb *iocbs, struct aio_iocb_ring *sq_ring,
+		     struct aio_io_event_ring *cq_ring, io_context_t *ctx_idp)
+{
+	return syscall(335, nr_events, flags, iocbs, sq_ring, cq_ring, ctx_idp);
+}
+
+static int io_ring_enter(io_context_t ctx, unsigned int to_submit,
+			 unsigned int min_complete, unsigned int flags)
+{
+	return syscall(336, ctx, to_submit, min_complete, flags);
+}
+
+static int gettid(void)
+{
+	return syscall(__NR_gettid);
+}
+
+static void init_io(struct submitter *s, int fd, struct iocb *iocb)
+{
+	unsigned long offset;
+	long r;
+
+	lrand48_r(&s->rand, &r);
+	offset = (r % (s->max_blocks - 1)) * BS;
+
+	iocb->aio_fildes = fd;
+	iocb->aio_lio_opcode = IO_CMD_PREAD;
+	iocb->u.c.offset = offset;
+	if (polled)
+		iocb->u.c.flags = IOCB_FLAG_HIPRI;
+	if (!fixedbufs)
+		iocb->u.c.nbytes = BS;
+}
+
+static int prep_more_ios(struct submitter *s, int fd, int max_ios)
+{
+	struct aio_iocb_ring *ring = s->sq_ring;
+	struct iocb *iocb;
+	u32 tail, next_tail, prepped = 0;
+
+	next_tail = tail = ring->tail;
+	do {
+		next_tail++;
+		if (next_tail == ring->nr_events)
+			next_tail = 0;
+
+		barrier();
+		if (next_tail == ring->head)
+			break;
+
+		iocb = &s->sq_ring->iocbs[tail];
+		init_io(s, fd, iocb);
+		prepped++;
+		tail = next_tail;
+	} while (prepped < max_ios);
+
+	if (ring->tail != tail) {
+		/* order tail store with writes to iocbs above */
+		barrier();
+		ring->tail = tail;
+		barrier();
+	}
+	return prepped;
+}
+
+static int get_file_size(int fd, unsigned long *blocks)
+{
+	struct stat st;
+
+	if (fstat(fd, &st) < 0)
+		return -1;
+	if (S_ISBLK(st.st_mode)) {
+		unsigned long long bytes;
+
+		if (ioctl(fd, BLKGETSIZE64, &bytes) != 0)
+			return -1;
+
+		*blocks = bytes / BS;
+		return 0;
+	} else if (S_ISREG(st.st_mode)) {
+		*blocks = st.st_size / BS;
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reap_events(struct submitter *s)
+{
+	struct aio_io_event_ring *ring = s->cq_ring;
+	struct io_event *ev;
+	u32 head, reaped = 0;
+
+	head = ring->head;
+	do {
+		barrier();
+		if (head == ring->tail)
+			break;
+		ev = &ring->events[head];
+		if (ev->res != BS) {
+			int index = (int) (uintptr_t) ev->obj;
+			struct iocb *iocb = &s->sq_ring->iocbs[index];
+
+			printf("io: unexpected ret=%ld\n", ev->res);
+			printf("offset=%lu, size=%lu\n", (unsigned long) iocb->u.c.offset, (unsigned long) iocb->u.c.nbytes);
+			return -1;
+		}
+		reaped++;
+		head++;
+		if (head == ring->nr_events)
+			head = 0;
+	} while (1);
+
+	s->inflight -= reaped;
+	ring->head = head;
+	barrier();
+	return reaped;
+}
+
+static void *submitter_fn(void *data)
+{
+	struct submitter *s = data;
+	int fd, ret, prepped, flags;
+
+	printf("submitter=%d\n", gettid());
+
+	flags = O_RDONLY;
+	if (!buffered)
+		flags |= O_DIRECT;
+	fd = open(s->filename, flags);
+	if (fd < 0) {
+		perror("open");
+		goto done;
+	}
+
+	if (get_file_size(fd, &s->max_blocks)) {
+		printf("failed getting size of device/file\n");
+		goto err;
+	}
+	if (!s->max_blocks) {
+		printf("Zero file/device size?\n");
+		goto err;
+	}
+
+	s->max_blocks--;
+
+	srand48_r(pthread_self(), &s->rand);
+
+	prepped = 0;
+	do {
+		int to_wait, flags, to_submit, this_reap;
+
+		if (!prepped && s->inflight < DEPTH)
+			prepped = prep_more_ios(s, fd, min(DEPTH - s->inflight, BATCH_SUBMIT));
+		s->inflight += prepped;
+submit_more:
+		to_submit = prepped;
+submit:
+		if (s->inflight + BATCH_SUBMIT < DEPTH)
+			to_wait = 0;
+		else
+			to_wait = min(s->inflight + to_submit, BATCH_COMPLETE);
+
+		flags = IORING_FLAG_GETEVENTS;
+		if (to_submit)
+			flags |= IORING_FLAG_SUBMIT;
+
+		ret = io_ring_enter(s->ioc, to_submit, to_wait, flags);
+		s->calls++;
+
+		this_reap = reap_events(s);
+		if (this_reap == -1)
+			break;
+		s->reaps += this_reap;
+
+		if (ret >= 0) {
+			if (!ret) {
+				to_submit = 0;
+				if (s->inflight)
+					goto submit;
+				continue;
+			} else if (ret < to_submit) {
+				int diff = to_submit - ret;
+
+				s->done += ret;
+				prepped -= diff;
+				goto submit_more;
+			}
+			s->done += ret;
+			prepped = 0;
+			continue;
+		} else if (ret < 0) {
+			if ((ret == -1 && errno == EAGAIN) || ret == -EAGAIN) {
+				if (s->finish)
+					break;
+				if (this_reap)
+					goto submit;
+				to_submit = 0;
+				goto submit;
+			}
+			if (ret == -1)
+				printf("io_submit: %s\n", strerror(errno));
+			else
+				printf("io_submit: %s\n", strerror(-ret));
+			break;
+		}
+	} while (!s->finish);
+err:
+	close(fd);
+done:
+	finish = 1;
+	return NULL;
+}
+
+static void sig_int(int sig)
+{
+	printf("Exiting on signal %d\n", sig);
+	submitters[0].finish = 1;
+	finish = 1;
+}
+
+static void arm_sig_int(void)
+{
+	struct sigaction act;
+
+	memset(&act, 0, sizeof(act));
+	act.sa_handler = sig_int;
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+}
+
+int main(int argc, char *argv[])
+{
+	struct submitter *s = &submitters[0];
+	unsigned long done, calls, reap;
+	int flags = 0, err;
+	int j;
+	size_t size;
+	void *p, *ret;
+	struct rlimit rlim;
+
+	if (argc < 2) {
+		printf("%s: filename\n", argv[0]);
+		return 1;
+	}
+
+	rlim.rlim_cur = RLIM_INFINITY;
+	rlim.rlim_max = RLIM_INFINITY;
+	if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) {
+		perror("setrlimit");
+		return 1;
+	}
+
+	arm_sig_int();
+
+	size = sizeof(struct aio_iocb_ring) + RING_SIZE * sizeof(struct iocb);
+	if (posix_memalign(&p, 4096, size))
+		return 1;
+	s->sq_ring = p;
+	memset(p, 0, size);
+
+	size = sizeof(struct aio_io_event_ring) + RING_SIZE * sizeof(struct io_event);
+	if (posix_memalign(&p, 4096, size))
+		return 1;
+	s->cq_ring = p;
+	memset(p, 0, size);
+
+	for (j = 0; j < RING_SIZE; j++) {
+		struct iocb *iocb = &s->sq_ring->iocbs[j];
+
+		if (posix_memalign(&iocb->u.c.buf, BS, BS)) {
+			printf("failed alloc\n");
+			return 1;
+		}
+		iocb->u.c.nbytes = BS;
+	}
+
+	flags = IOCTX_FLAG_SCQRING;
+	if (polled)
+		flags |= IOCTX_FLAG_IOPOLL;
+	if (fixedbufs)
+		flags |= IOCTX_FLAG_FIXEDBUFS;
+	if (buffered)
+		flags |= IOCTX_FLAG_SQWQ;
+	else if (sq_thread) {
+		flags |= IOCTX_FLAG_SQTHREAD;
+		s->sq_ring->sq_thread_cpu = sq_thread_cpu;
+	}
+
+	err = io_setup2(RING_SIZE, flags, s->sq_ring->iocbs, s->sq_ring, s->cq_ring, &s->ioc);
+	if (err) {
+		printf("ctx_init failed: %s, %d\n", strerror(errno), err);
+		return 1;
+	}
+	printf("polled=%d, fixedbufs=%d, buffered=%d\n", polled, fixedbufs, buffered);
+	printf("  QD=%d, sq_ring=%d, cq_ring=%d\n", DEPTH, s->sq_ring->nr_events, s->cq_ring->nr_events);
+	strcpy(s->filename, argv[1]);
+
+	pthread_create(&s->thread, NULL, submitter_fn, s);
+
+	reap = calls = done = 0;
+	do {
+		unsigned long this_done = 0;
+		unsigned long this_reap = 0;
+		unsigned long this_call = 0;
+		unsigned long rpc = 0, ipc = 0;
+
+		sleep(1);
+		this_done += s->done;
+		this_call += s->calls;
+		this_reap += s->reaps;
+		if (this_call - calls) {
+			rpc = (this_done - done) / (this_call - calls);
+			ipc = (this_reap - reap) / (this_call - calls);
+		}
+		printf("IOPS=%lu, IOS/call=%lu/%lu, inflight=%u (head=%d tail=%d), %lu, %lu\n",
+				this_done - done, rpc, ipc, s->inflight,
+				s->cq_ring->head, s->cq_ring->tail, s->reaps, s->done);
+		done = this_done;
+		calls = this_call;
+		reap = this_reap;
+	} while (!finish);
+
+	pthread_join(s->thread, &ret);
+	return 0;
+}



* Recent changes (master)
@ 2018-12-05 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-05 13:00 UTC
  To: fio

The following changes since commit 6f3a2c116481731abdf2fbed2d0f63cf4d3ddca2:

  engines/libaio: set IOCB_HIPRI if we are polling (2018-12-01 10:17:26 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 46961748140e291f0f1e966bac9aad1a3b9ad1c7:

  engines/libaio: increase RLIMIT_MEMLOCK for user buffers (2018-12-04 11:27:02 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      engines/libaio: update for newer io_setup2() system call
      engines/libaio: increase RLIMIT_MEMLOCK for user buffers

 engines/libaio.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index a780b2b..0333509 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -8,6 +8,8 @@
 #include <unistd.h>
 #include <errno.h>
 #include <libaio.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 
 #include "../fio.h"
 #include "../lib/pow2.h"
@@ -450,11 +452,18 @@ static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
 		flags |= IOCTX_FLAG_IOPOLL;
 	if (useriocb)
 		flags |= IOCTX_FLAG_USERIOCB;
-	if (fixedbufs)
+	if (fixedbufs) {
+		struct rlimit rlim = {
+			.rlim_cur = RLIM_INFINITY,
+			.rlim_max = RLIM_INFINITY,
+		};
+
+		setrlimit(RLIMIT_MEMLOCK, &rlim);
 		flags |= IOCTX_FLAG_FIXEDBUFS;
+	}
 
 	ret = syscall(__NR_sys_io_setup2, depth, flags, ld->user_iocbs,
-			&ld->aio_ctx);
+			NULL, NULL, &ld->aio_ctx);
 	if (!ret)
 		return 0;
 	/* fall through to old syscall */



* Recent changes (master)
@ 2018-12-02 13:00 Jens Axboe
From: Jens Axboe @ 2018-12-02 13:00 UTC
  To: fio

The following changes since commit 68afa5b570a7e0dce0470817037f7828cf36cd2f:

  stat: assign for first stat iteration, don't sum (2018-11-30 14:44:25 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6f3a2c116481731abdf2fbed2d0f63cf4d3ddca2:

  engines/libaio: set IOCB_HIPRI if we are polling (2018-12-01 10:17:26 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      engines/libaio: set IOCB_HIPRI if we are polling

 engines/libaio.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index 9c8a61b..a780b2b 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -130,15 +130,21 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 			iocb->aio_fildes = f->fd;
 			iocb->aio_lio_opcode = IO_CMD_PREAD;
 			iocb->u.c.offset = io_u->offset;
-		} else
+		} else {
 			io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		}
 	} else if (io_u->ddir == DDIR_WRITE) {
 		if (o->fixedbufs) {
 			iocb->aio_fildes = f->fd;
 			iocb->aio_lio_opcode = IO_CMD_PWRITE;
 			iocb->u.c.offset = io_u->offset;
-		} else
+		} else {
 			io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		}
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 



* Recent changes (master)
@ 2018-12-01 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-12-01 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a0318e176ace1e53b0049965f63b0c60b55ff8cd:

  gettime: use nsec in get_cycles_per_msec division (2018-11-29 11:21:18 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 68afa5b570a7e0dce0470817037f7828cf36cd2f:

  stat: assign for first stat iteration, don't sum (2018-11-30 14:44:25 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      engines/libaio: only initialize iocb members when we need to
      stat: only apply proper stat summing for event timestamps
      stat: assign for first stat iteration, don't sum

 engines/libaio.c | 24 +++++++++++++++--------
 stat.c           | 60 ++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 61 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index d386d14..9c8a61b 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -125,16 +125,20 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 	else
 		iocb = &io_u->iocb;
 
-	iocb->u.c.flags = 0;
-
 	if (io_u->ddir == DDIR_READ) {
-		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-		if (o->hipri)
-			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		if (o->fixedbufs) {
+			iocb->aio_fildes = f->fd;
+			iocb->aio_lio_opcode = IO_CMD_PREAD;
+			iocb->u.c.offset = io_u->offset;
+		} else
+			io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 	} else if (io_u->ddir == DDIR_WRITE) {
-		io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-		if (o->hipri)
-			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
+		if (o->fixedbufs) {
+			iocb->aio_fildes = f->fd;
+			iocb->aio_lio_opcode = IO_CMD_PWRITE;
+			iocb->u.c.offset = io_u->offset;
+		} else
+			io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(iocb, f->fd);
 
@@ -468,6 +472,10 @@ static int fio_libaio_post_init(struct thread_data *td)
 			iocb = &ld->user_iocbs[i];
 			iocb->u.c.buf = io_u->buf;
 			iocb->u.c.nbytes = td_max_bs(td);
+
+			iocb->u.c.flags = 0;
+			if (o->hipri)
+				iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 		}
 	}
 
diff --git a/stat.c b/stat.c
index 331abf6..887509f 100644
--- a/stat.c
+++ b/stat.c
@@ -1518,13 +1518,10 @@ struct json_object *show_thread_status(struct thread_stat *ts,
 	return ret;
 }
 
-static void sum_stat(struct io_stat *dst, struct io_stat *src, bool first)
+static void __sum_stat(struct io_stat *dst, struct io_stat *src, bool first)
 {
 	double mean, S;
 
-	if (src->samples == 0)
-		return;
-
 	dst->min_val = min(dst->min_val, src->min_val);
 	dst->max_val = max(dst->max_val, src->max_val);
 
@@ -1551,6 +1548,39 @@ static void sum_stat(struct io_stat *dst, struct io_stat *src, bool first)
 	dst->samples += src->samples;
 	dst->mean.u.f = mean;
 	dst->S.u.f = S;
+
+}
+
+/*
+ * We sum two kinds of stats - one that is time based, in which case we
+ * apply the proper summing technique, and then one that is iops/bw
+ * numbers. For group_reporting, we should just add those up, not make
+ * them the mean of everything.
+ */
+static void sum_stat(struct io_stat *dst, struct io_stat *src, bool first,
+		     bool pure_sum)
+{
+	if (src->samples == 0)
+		return;
+
+	if (!pure_sum) {
+		__sum_stat(dst, src, first);
+		return;
+	}
+
+	if (first) {
+		dst->min_val = src->min_val;
+		dst->max_val = src->max_val;
+		dst->samples = src->samples;
+		dst->mean.u.f = src->mean.u.f;
+		dst->S.u.f = src->S.u.f;
+	} else {
+		dst->min_val += src->min_val;
+		dst->max_val += src->max_val;
+		dst->samples += src->samples;
+		dst->mean.u.f += src->mean.u.f;
+		dst->S.u.f += src->S.u.f;
+	}
 }
 
 void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
@@ -1586,22 +1616,22 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (!dst->unified_rw_rep) {
-			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first);
-			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first);
-			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first);
-			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first);
-			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], first);
+			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], first, false);
+			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first, false);
+			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first, false);
+			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first, true);
+			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], first, true);
 
 			dst->io_bytes[l] += src->io_bytes[l];
 
 			if (dst->runtime[l] < src->runtime[l])
 				dst->runtime[l] = src->runtime[l];
 		} else {
-			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], first);
-			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first);
-			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first);
-			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first);
-			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], first);
+			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], first, false);
+			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first, false);
+			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first, false);
+			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first, true);
+			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], first, true);
 
 			dst->io_bytes[0] += src->io_bytes[l];
 
@@ -1616,7 +1646,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		}
 	}
 
-	sum_stat(&dst->sync_stat, &src->sync_stat, first);
+	sum_stat(&dst->sync_stat, &src->sync_stat, first, false);
 	dst->usr_time += src->usr_time;
 	dst->sys_time += src->sys_time;
 	dst->ctx += src->ctx;



* Recent changes (master)
@ 2018-11-30 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-30 13:00 UTC
  To: fio

The following changes since commit c36210b443a37c53978bbea88f1dabb0b61799d7:

  engines/libaio: use maximum buffer size for fixed user bufs (2018-11-27 19:43:30 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a0318e176ace1e53b0049965f63b0c60b55ff8cd:

  gettime: use nsec in get_cycles_per_msec division (2018-11-29 11:21:18 -0700)

----------------------------------------------------------------
Bari Antebi (1):
      rand: fix compressible data ratio per segment

Vincent Fu (1):
      gettime: use nsec in get_cycles_per_msec division

 gettime.c  | 12 ++++++------
 lib/rand.c |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/gettime.c b/gettime.c
index 7702193..272a3e6 100644
--- a/gettime.c
+++ b/gettime.c
@@ -239,13 +239,13 @@ static unsigned long get_cycles_per_msec(void)
 		__fio_gettime(&e);
 		c_e = get_cpu_clock();
 
-		elapsed = utime_since(&s, &e);
-		if (elapsed >= 1280)
+		elapsed = ntime_since(&s, &e);
+		if (elapsed >= 1280000)
 			break;
 	} while (1);
 
 	fio_clock_source = old_cs;
-	return (c_e - c_s) * 1000 / elapsed;
+	return (c_e - c_s) * 1000000 / elapsed;
 }
 
 #define NR_TIME_ITERS	50
@@ -298,10 +298,10 @@ static int calibrate_cpu_clock(void)
 
 	avg /= samples;
 	cycles_per_msec = avg;
-	dprint(FD_TIME, "avg: %llu\n", (unsigned long long) avg);
-	dprint(FD_TIME, "min=%llu, max=%llu, mean=%f, S=%f\n",
+	dprint(FD_TIME, "min=%llu, max=%llu, mean=%f, S=%f, N=%d\n",
 			(unsigned long long) minc,
-			(unsigned long long) maxc, mean, S);
+			(unsigned long long) maxc, mean, S, NR_TIME_ITERS);
+	dprint(FD_TIME, "trimmed mean=%llu, N=%d\n", (unsigned long long) avg, samples);
 
 	max_ticks = MAX_CLOCK_SEC * cycles_per_msec * 1000ULL;
 	max_mult = ULLONG_MAX / max_ticks;
diff --git a/lib/rand.c b/lib/rand.c
index 99846a8..f18bd8d 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -166,6 +166,7 @@ void __fill_random_buf_percentage(unsigned long seed, void *buf,
 		if (!len)
 			break;
 		buf += this_len;
+		this_len = segment - this_len;
 
 		if (this_len > len)
 			this_len = len;



* Recent changes (master)
@ 2018-11-28 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-28 13:00 UTC
  To: fio

The following changes since commit 4cf30b66c62f3f5e6501390d564cf0d966823591:

  workqueue: update IO counters promptly after handling IO (2018-11-26 09:16:52 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c36210b443a37c53978bbea88f1dabb0b61799d7:

  engines/libaio: use maximum buffer size for fixed user bufs (2018-11-27 19:43:30 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      engines/libaio: add preliminary support for pre-mapped IO buffers
      engines/libaio: use maximum buffer size for fixed user bufs

 backend.c        |  3 +++
 engines/libaio.c | 73 ++++++++++++++++++++++++++++++++++++++++----------------
 ioengines.h      |  3 ++-
 3 files changed, 57 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 2f103b3..2f46329 100644
--- a/backend.c
+++ b/backend.c
@@ -1704,6 +1704,9 @@ static void *thread_main(void *data)
 	if (init_io_u(td))
 		goto err;
 
+	if (td->io_ops->post_init && td->io_ops->post_init(td))
+		goto err;
+
 	if (o->verify_async && verify_async_init(td))
 		goto err;
 
diff --git a/engines/libaio.c b/engines/libaio.c
index 74238e4..d386d14 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -24,6 +24,9 @@
 #ifndef IOCTX_FLAG_IOPOLL
 #define IOCTX_FLAG_IOPOLL	(1 << 1)
 #endif
+#ifndef IOCTX_FLAG_FIXEDBUFS
+#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)
+#endif
 
 static int fio_libaio_commit(struct thread_data *td);
 
@@ -56,6 +59,7 @@ struct libaio_options {
 	unsigned int userspace_reap;
 	unsigned int hipri;
 	unsigned int useriocb;
+	unsigned int fixedbufs;
 };
 
 static struct fio_option options[] = {
@@ -87,6 +91,15 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_LIBAIO,
 	},
 	{
+		.name	= "fixedbufs",
+		.lname	= "Fixed (pre-mapped) IO buffers",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct libaio_options, fixedbufs),
+		.help	= "Pre map IO buffers",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -399,7 +412,7 @@ static void fio_libaio_cleanup(struct thread_data *td)
 }
 
 static int fio_libaio_old_queue_init(struct libaio_data *ld, unsigned int depth,
-				     bool hipri, bool useriocb)
+				     bool hipri, bool useriocb, bool fixedbufs)
 {
 	if (hipri) {
 		log_err("fio: polled aio not available on your platform\n");
@@ -409,12 +422,16 @@ static int fio_libaio_old_queue_init(struct libaio_data *ld, unsigned int depth,
 		log_err("fio: user mapped iocbs not available on your platform\n");
 		return 1;
 	}
+	if (fixedbufs) {
+		log_err("fio: fixed buffers not available on your platform\n");
+		return 1;
+	}
 
 	return io_queue_init(depth, &ld->aio_ctx);
 }
 
 static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
-				 bool hipri, bool useriocb)
+				 bool hipri, bool useriocb, bool fixedbufs)
 {
 #ifdef __NR_sys_io_setup2
 	int ret, flags = 0;
@@ -423,6 +440,8 @@ static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
 		flags |= IOCTX_FLAG_IOPOLL;
 	if (useriocb)
 		flags |= IOCTX_FLAG_USERIOCB;
+	if (fixedbufs)
+		flags |= IOCTX_FLAG_FIXEDBUFS;
 
 	ret = syscall(__NR_sys_io_setup2, depth, flags, ld->user_iocbs,
 			&ld->aio_ctx);
@@ -430,14 +449,42 @@ static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
 		return 0;
 	/* fall through to old syscall */
 #endif
-	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
+	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb, fixedbufs);
+}
+
+static int fio_libaio_post_init(struct thread_data *td)
+{
+	struct libaio_data *ld = td->io_ops_data;
+	struct libaio_options *o = td->eo;
+	struct io_u *io_u;
+	struct iocb *iocb;
+	int err = 0;
+
+	if (o->fixedbufs) {
+		int i;
+
+		for (i = 0; i < td->o.iodepth; i++) {
+			io_u = ld->io_u_index[i];
+			iocb = &ld->user_iocbs[i];
+			iocb->u.c.buf = io_u->buf;
+			iocb->u.c.nbytes = td_max_bs(td);
+		}
+	}
+
+	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri, o->useriocb,
+					o->fixedbufs);
+	if (err) {
+		td_verror(td, -err, "io_queue_init");
+		return 1;
+	}
+
+	return 0;
 }
 
 static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_options *o = td->eo;
 	struct libaio_data *ld;
-	int err = 0;
 
 	ld = calloc(1, sizeof(*ld));
 
@@ -450,23 +497,6 @@ static int fio_libaio_init(struct thread_data *td)
 		memset(ld->user_iocbs, 0, size);
 	}
 
-	/*
-	 * First try passing in 0 for queue depth, since we don't
-	 * care about the user ring. If that fails, the kernel is too old
-	 * and we need the right depth.
-	 */
-	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri, o->useriocb);
-	if (err) {
-		td_verror(td, -err, "io_queue_init");
-		log_err("fio: check /proc/sys/fs/aio-max-nr\n");
-		if (ld->user_iocbs) {
-			size_t size = td->o.iodepth * sizeof(struct iocb);
-			fio_memfree(ld->user_iocbs, size, false);
-		}
-		free(ld);
-		return 1;
-	}
-
 	ld->entries = td->o.iodepth;
 	ld->is_pow2 = is_power_of_2(ld->entries);
 	ld->aio_events = calloc(ld->entries, sizeof(struct io_event));
@@ -494,6 +524,7 @@ static struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
 	.init			= fio_libaio_init,
+	.post_init		= fio_libaio_post_init,
 	.io_u_init		= fio_libaio_io_u_init,
 	.prep			= fio_libaio_prep,
 	.queue			= fio_libaio_queue,
diff --git a/ioengines.h b/ioengines.h
index feb21db..b9cd33d 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -7,7 +7,7 @@
 #include "flist.h"
 #include "io_u.h"
 
-#define FIO_IOOPS_VERSION	24
+#define FIO_IOOPS_VERSION	25
 
 /*
  * io_ops->queue() return values
@@ -25,6 +25,7 @@ struct ioengine_ops {
 	int flags;
 	int (*setup)(struct thread_data *);
 	int (*init)(struct thread_data *);
+	int (*post_init)(struct thread_data *);
 	int (*prep)(struct thread_data *, struct io_u *);
 	enum fio_q_status (*queue)(struct thread_data *, struct io_u *);
 	int (*commit)(struct thread_data *);


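Sketch (not part of the patch): the new optional `post_init` hook is called from `thread_main()` after `init_io_u()`, which itself runs after the engine's `init` (via `td_io_init()`). An engine can therefore act on I/O buffers that already exist. `fio_libaio_post_init()` uses this to point the pre-mapped iocbs at `io_u->buf` before calling `fio_libaio_queue_init()`. The minimal model below only captures that ordering and the NULL check for engines that leave the hook unset. `engine_ctx`, `engine_ops` and `setup_engine()` are hypothetical names, not fio structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for fio's thread_data and
 * ioengine_ops; only the hook ordering is modelled here. */
struct engine_ctx {
	char log[64];		/* records which step ran, in order */
	int buffers_ready;	/* set once I/O buffers exist */
};

struct engine_ops {
	int (*init)(struct engine_ctx *);
	int (*post_init)(struct engine_ctx *);	/* optional, may be NULL */
};

static void note(struct engine_ctx *ctx, const char *step)
{
	strncat(ctx->log, step, sizeof(ctx->log) - strlen(ctx->log) - 1);
}

/* Same order as thread_main(): engine init (td_io_init), buffer setup
 * (init_io_u), then the optional post_init hook. Nonzero means error. */
static int setup_engine(struct engine_ctx *ctx, const struct engine_ops *ops)
{
	if (ops->init && ops->init(ctx))
		return 1;
	ctx->buffers_ready = 1;		/* stands in for init_io_u() */
	note(ctx, "bufs;");
	if (ops->post_init && ops->post_init(ctx))
		return 1;
	return 0;
}

static int demo_init(struct engine_ctx *ctx)
{
	note(ctx, "init;");
	return 0;
}

/* A post_init can rely on the buffers existing, e.g. to pre-map them. */
static int demo_post_init(struct engine_ctx *ctx)
{
	assert(ctx->buffers_ready);
	note(ctx, "post_init;");
	return 0;
}
```

Engines that do not set the hook behave exactly as before, which is why the call in `backend.c` is guarded by `td->io_ops->post_init &&`.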

* Recent changes (master)
@ 2018-11-27 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-27 13:00 UTC
  To: fio

The following changes since commit 41dd12d62f43160bc8d8574127d0c2b861e1ee1d:

  options: fix 'kb_base' being of the wrong type (2018-11-25 09:56:06 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4cf30b66c62f3f5e6501390d564cf0d966823591:

  workqueue: update IO counters promptly after handling IO (2018-11-26 09:16:52 -0700)

----------------------------------------------------------------
Vincent Fu (1):
      workqueue: update IO counters promptly after handling IO

 workqueue.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/workqueue.c b/workqueue.c
index faed245..b595951 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -190,8 +190,6 @@ static void *worker_thread(void *data)
 				if (wq->wake_idle)
 					pthread_cond_signal(&wq->flush_cond);
 			}
-			if (wq->ops.update_acct_fn)
-				wq->ops.update_acct_fn(sw);
 
 			pthread_cond_wait(&sw->cond, &sw->lock);
 		} else {
@@ -200,11 +198,10 @@ handle_work:
 		}
 		pthread_mutex_unlock(&sw->lock);
 		handle_list(sw, &local_list);
+		if (wq->ops.update_acct_fn)
+			wq->ops.update_acct_fn(sw);
 	}
 
-	if (wq->ops.update_acct_fn)
-		wq->ops.update_acct_fn(sw);
-
 done:
 	sk_out_drop();
 	return NULL;


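Sketch (not part of the patch): before this change, `update_acct_fn()` ran only in two places: when the worker was about to sleep on an empty queue, and once more on exit. Completed I/O could therefore sit uncounted while a busy worker kept pulling new batches. The patch calls it after every `handle_list()` instead. The single-threaded model below, with hypothetical names (`worker`, `run_worker()`), measures how far the published counter lags the real one under each placement.

```c
#include <assert.h>

/* Hypothetical model of a submit worker: 'done' is work the worker has
 * finished, 'visible' is what the shared accounting has seen so far. */
struct worker {
	unsigned long done;
	unsigned long visible;
	unsigned int acct_calls;
};

/* Stands in for wq->ops.update_acct_fn(sw). */
static void update_acct(struct worker *sw)
{
	sw->visible = sw->done;
	sw->acct_calls++;
}

/* Stands in for handle_list(): finish every item in one batch. */
static void handle_batch(struct worker *sw, unsigned int items)
{
	sw->done += items;
}

/* A batch size of 0 means the queue was empty and the worker went to
 * sleep. per_batch selects the new placement; otherwise the old one
 * (update before sleeping and at exit) is used. Returns the largest
 * gap between done and visible seen right after a batch. */
static unsigned long run_worker(struct worker *sw, const unsigned int *batches,
				unsigned int nr, int per_batch)
{
	unsigned long max_lag = 0;

	for (unsigned int i = 0; i < nr; i++) {
		if (batches[i] == 0) {
			if (!per_batch)
				update_acct(sw);	/* old: before sleeping */
			continue;
		}
		handle_batch(sw, batches[i]);
		if (per_batch)
			update_acct(sw);		/* new: after each batch */
		if (sw->done - sw->visible > max_lag)
			max_lag = sw->done - sw->visible;
	}
	if (!per_batch)
		update_acct(sw);			/* old: at exit */
	return max_lag;
}
```

With batches `{4, 4, 0, 2}`, the old placement lets the published count fall 8 I/Os behind. The per-batch placement never lags. Both agree once the worker exits.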

* Recent changes (master)
@ 2018-11-26 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-26 13:00 UTC
  To: fio

The following changes since commit 92a1a1d701f8af640859a75cc74f82b7bf9e3a0a:

  options: fix 'unit_base' being of the wrong type (2018-11-24 15:10:39 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 41dd12d62f43160bc8d8574127d0c2b861e1ee1d:

  options: fix 'kb_base' being of the wrong type (2018-11-25 09:56:06 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      options: fix 'kb_base' being of the wrong type

 options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index cf087ed..7a7006c 100644
--- a/options.c
+++ b/options.c
@@ -4530,7 +4530,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "kb_base",
 		.lname	= "KB Base",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, kb_base),
 		.prio	= 1,
 		.def	= "1024",



* Recent changes (master)
@ 2018-11-25 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-25 13:00 UTC
  To: fio

The following changes since commit 0fcbc00994f49f73fe815b9dc074bd6a15eab522:

  engines/libaio: cleanup new vs old io_setup() system call path (2018-11-21 11:33:22 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 92a1a1d701f8af640859a75cc74f82b7bf9e3a0a:

  options: fix 'unit_base' being of the wrong type (2018-11-24 15:10:39 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      options: fix 'unit_base' being of the wrong type

 options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 98187de..cf087ed 100644
--- a/options.c
+++ b/options.c
@@ -4551,7 +4551,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "unit_base",
 		.lname	= "Unit for quantities of data (Bits or Bytes)",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, unit_base),
 		.prio	= 1,
 		.posval = {


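Sketch (not part of the patches): `kb_base` and `unit_base` are now declared `FIO_OPT_STR`. As we understand fio's option parser, that type matches the user's string against the option's allowed values and stores the integer paired with the matching entry. The `.posval` list visible in the `unit_base` context above is such a list. The minimal model of that lookup below assumes a simplified table. `allowed_value`, `parse_str_option()` and the stored numbers are hypothetical, not fio's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair modelled on a posval entry: the string the user
 * types and the integer stored in the options struct. */
struct allowed_value {
	const char *ival;
	unsigned int oval;
};

/* Look 'input' up in a NULL-terminated table. On a match, store the
 * mapped integer in *out and return 0; otherwise leave *out alone and
 * return -1. */
static int parse_str_option(const struct allowed_value *table,
			    const char *input, unsigned int *out)
{
	for (size_t i = 0; table[i].ival; i++) {
		if (!strcmp(table[i].ival, input)) {
			*out = table[i].oval;
			return 0;
		}
	}
	return -1;
}

/* Choices documented for unit_base: 0 = auto-detect, 8 = byte based,
 * 1 = bit based. The oval numbers here are illustrative only. */
static const struct allowed_value unit_base_vals[] = {
	{ "0", 0 },
	{ "8", 8 },
	{ "1", 1 },
	{ NULL, 0 },
};
```

Anything outside the table, such as `16`, is rejected instead of being stored as an arbitrary integer.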

* Recent changes (master)
@ 2018-11-22 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-22 13:00 UTC
  To: fio

The following changes since commit e65ef9593436f0f62df366f506f7c4d318b5cd71:

  engines/libaio: fallback to old io_setup() system call (2018-11-21 05:53:38 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0fcbc00994f49f73fe815b9dc074bd6a15eab522:

  engines/libaio: cleanup new vs old io_setup() system call path (2018-11-21 11:33:22 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      engines/libaio: the IOCTX_FLAG_* flags changed
      engines/libaio: use fio_memalign() helper for user iocbs
      engines/libaio: cleanup new vs old io_setup() system call path

 engines/libaio.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index 991d588..74238e4 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -12,17 +12,18 @@
 #include "../fio.h"
 #include "../lib/pow2.h"
 #include "../optgroup.h"
+#include "../lib/memalign.h"
 
 #ifndef IOCB_FLAG_HIPRI
 #define IOCB_FLAG_HIPRI	(1 << 2)
 #endif
-#ifndef IOCTX_FLAG_IOPOLL
-#define IOCTX_FLAG_IOPOLL	(1 << 0)
-#endif
+
 #ifndef IOCTX_FLAG_USERIOCB
-#define IOCTX_FLAG_USERIOCB	(1 << 1)
+#define IOCTX_FLAG_USERIOCB	(1 << 0)
+#endif
+#ifndef IOCTX_FLAG_IOPOLL
+#define IOCTX_FLAG_IOPOLL	(1 << 1)
 #endif
-
 
 static int fio_libaio_commit(struct thread_data *td);
 
@@ -111,6 +112,8 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 	else
 		iocb = &io_u->iocb;
 
+	iocb->u.c.flags = 0;
+
 	if (io_u->ddir == DDIR_READ) {
 		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 		if (o->hipri)
@@ -387,8 +390,10 @@ static void fio_libaio_cleanup(struct thread_data *td)
 		free(ld->aio_events);
 		free(ld->iocbs);
 		free(ld->io_us);
-		if (ld->user_iocbs)
-			free(ld->user_iocbs);
+		if (ld->user_iocbs) {
+			size_t size = td->o.iodepth * sizeof(struct iocb);
+			fio_memfree(ld->user_iocbs, size, false);
+		}
 		free(ld);
 	}
 }
@@ -423,11 +428,9 @@ static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
 			&ld->aio_ctx);
 	if (!ret)
 		return 0;
-
-	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
-#else
-	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
+	/* fall through to old syscall */
 #endif
+	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
 }
 
 static int fio_libaio_init(struct thread_data *td)
@@ -440,16 +443,11 @@ static int fio_libaio_init(struct thread_data *td)
 
 	if (o->useriocb) {
 		size_t size;
-		void *p;
 
 		ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
 		size = td->o.iodepth * sizeof(struct iocb);
-		if (posix_memalign(&p, page_size, size)) {
-			log_err("fio: libaio iocb allocation failure\n");
-			free(ld);
-			return 1;
-		}
-		ld->user_iocbs = p;
+		ld->user_iocbs = fio_memalign(page_size, size, false);
+		memset(ld->user_iocbs, 0, size);
 	}
 
 	/*
@@ -461,8 +459,10 @@ static int fio_libaio_init(struct thread_data *td)
 	if (err) {
 		td_verror(td, -err, "io_queue_init");
 		log_err("fio: check /proc/sys/fs/aio-max-nr\n");
-		if (ld->user_iocbs)
-			free(ld->user_iocbs);
+		if (ld->user_iocbs) {
+			size_t size = td->o.iodepth * sizeof(struct iocb);
+			fio_memfree(ld->user_iocbs, size, false);
+		}
 		free(ld);
 		return 1;
 	}


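Sketch (not part of the patch): after this cleanup, `fio_libaio_queue_init()` does three things. It builds the `IOCTX_FLAG_*` mask and tries `io_setup2()` when `__NR_sys_io_setup2` is defined. If that fails, it falls through to `fio_libaio_old_queue_init()`, which refuses `hipri` and `useriocb` because plain `io_queue_init()` cannot honour them. In the model below, a NULL `new_setup` stands in for a build without the syscall number. `queue_init()`, `old_setup()` and the `setup2_*` stubs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as renumbered by this patch. */
#define DEMO_IOCTX_FLAG_USERIOCB	(1 << 0)
#define DEMO_IOCTX_FLAG_IOPOLL		(1 << 1)

/* Models io_setup2(): returns 0 on success, nonzero if unsupported. */
typedef int (*new_setup_fn)(unsigned int depth, int flags);

static int old_setup_calls;

/* Models io_queue_init() succeeding. */
static int old_setup(unsigned int depth)
{
	(void) depth;
	old_setup_calls++;
	return 0;
}

static int queue_init(new_setup_fn new_setup, unsigned int depth,
		      bool hipri, bool useriocb)
{
	int flags = 0;

	if (hipri)
		flags |= DEMO_IOCTX_FLAG_IOPOLL;
	if (useriocb)
		flags |= DEMO_IOCTX_FLAG_USERIOCB;

	if (new_setup && !new_setup(depth, flags))
		return 0;
	/* fall through to the old call, which supports neither feature */
	if (hipri || useriocb)
		return 1;
	return old_setup(depth);
}

static int last_flags = -1;

static int setup2_ok(unsigned int depth, int flags)
{
	(void) depth;
	last_flags = flags;
	return 0;
}

static int setup2_missing(unsigned int depth, int flags)
{
	(void) depth;
	(void) flags;
	return -1;
}
```

Keeping a single fall-through path means the "not available on your platform" checks live in one place, whichever way the build was configured.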

* Recent changes (master)
@ 2018-11-21 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-11-21 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a1b006fe1cd3aa7e1b567f55e5a4c827d54f7c41:

  engines/libaio: fix new aio poll API (2018-11-19 19:41:53 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e65ef9593436f0f62df366f506f7c4d318b5cd71:

  engines/libaio: fallback to old io_setup() system call (2018-11-21 05:53:38 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      Kill "No I/O performed by ..." message
      backend: initialize io engine before io_u buffers
      engines/libaio: add support for user mapped iocbs
      engines/libaio: fallback to old io_setup() system call

 backend.c        |  25 ++----------
 engines/libaio.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 105 insertions(+), 38 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index a92ec45..2f103b3 100644
--- a/backend.c
+++ b/backend.c
@@ -1542,7 +1542,7 @@ static void *thread_main(void *data)
 	struct sk_out *sk_out = fd->sk_out;
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 	int deadlock_loop_cnt;
-	bool clear_state, did_some_io;
+	bool clear_state;
 	int ret;
 
 	sk_out_assign(sk_out);
@@ -1698,6 +1698,9 @@ static void *thread_main(void *data)
 	if (!init_iolog(td))
 		goto err;
 
+	if (td_io_init(td))
+		goto err;
+
 	if (init_io_u(td))
 		goto err;
 
@@ -1728,9 +1731,6 @@ static void *thread_main(void *data)
 	if (!o->create_serialize && setup_files(td))
 		goto err;
 
-	if (td_io_init(td))
-		goto err;
-
 	if (!init_random_map(td))
 		goto err;
 
@@ -1763,7 +1763,6 @@ static void *thread_main(void *data)
 
 	memset(bytes_done, 0, sizeof(bytes_done));
 	clear_state = false;
-	did_some_io = false;
 
 	while (keep_running(td)) {
 		uint64_t verify_bytes;
@@ -1841,9 +1840,6 @@ static void *thread_main(void *data)
 		    td_ioengine_flagged(td, FIO_UNIDIR))
 			continue;
 
-		if (ddir_rw_sum(bytes_done))
-			did_some_io = true;
-
 		clear_io_state(td, 0);
 
 		fio_gettime(&td->start, NULL);
@@ -1865,19 +1861,6 @@ static void *thread_main(void *data)
 	}
 
 	/*
-	 * If td ended up with no I/O when it should have had,
-	 * then something went wrong unless FIO_NOIO or FIO_DISKLESSIO.
-	 * (Are we not missing other flags that can be ignored ?)
-	 */
-	if (!td->error && (td->o.size || td->o.io_size) &&
-	    !ddir_rw_sum(bytes_done) && !did_some_io && !td->o.create_only &&
-	    !(td_ioengine_flagged(td, FIO_NOIO) ||
-	      td_ioengine_flagged(td, FIO_DISKLESSIO)))
-		log_err("%s: No I/O performed by %s, "
-			 "perhaps try --debug=io option for details?\n",
-			 td->o.name, td->io_ops->name);
-
-	/*
 	 * Acquire this lock if we were doing overlap checking in
 	 * offload mode so that we don't clean up this job while
 	 * another thread is checking its io_u's for overlap
diff --git a/engines/libaio.c b/engines/libaio.c
index 2a4d653..991d588 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -19,6 +19,10 @@
 #ifndef IOCTX_FLAG_IOPOLL
 #define IOCTX_FLAG_IOPOLL	(1 << 0)
 #endif
+#ifndef IOCTX_FLAG_USERIOCB
+#define IOCTX_FLAG_USERIOCB	(1 << 1)
+#endif
+
 
 static int fio_libaio_commit(struct thread_data *td);
 
@@ -28,6 +32,9 @@ struct libaio_data {
 	struct iocb **iocbs;
 	struct io_u **io_us;
 
+	struct iocb *user_iocbs;
+	struct io_u **io_u_index;
+
 	/*
 	 * Basic ring buffer. 'head' is incremented in _queue(), and
 	 * 'tail' is incremented in _commit(). We keep 'queued' so
@@ -47,6 +54,7 @@ struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
 	unsigned int hipri;
+	unsigned int useriocb;
 };
 
 static struct fio_option options[] = {
@@ -69,6 +77,15 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_LIBAIO,
 	},
 	{
+		.name	= "useriocb",
+		.lname	= "User IOCBs",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct libaio_options, useriocb),
+		.help	= "Use user mapped IOCBs",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -84,19 +101,26 @@ static inline void ring_inc(struct libaio_data *ld, unsigned int *val,
 
 static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 {
+	struct libaio_data *ld = td->io_ops_data;
 	struct fio_file *f = io_u->file;
 	struct libaio_options *o = td->eo;
+	struct iocb *iocb;
+
+	if (o->useriocb)
+		iocb = &ld->user_iocbs[io_u->index];
+	else
+		iocb = &io_u->iocb;
 
 	if (io_u->ddir == DDIR_READ) {
-		io_prep_pread(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		io_prep_pread(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 		if (o->hipri)
-			io_u->iocb.u.c.flags |= IOCB_FLAG_HIPRI;
+			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (io_u->ddir == DDIR_WRITE) {
-		io_prep_pwrite(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		io_prep_pwrite(iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 		if (o->hipri)
-			io_u->iocb.u.c.flags |= IOCB_FLAG_HIPRI;
+			iocb->u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (ddir_sync(io_u->ddir))
-		io_prep_fsync(&io_u->iocb, f->fd);
+		io_prep_fsync(iocb, f->fd);
 
 	return 0;
 }
@@ -104,11 +128,16 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 static struct io_u *fio_libaio_event(struct thread_data *td, int event)
 {
 	struct libaio_data *ld = td->io_ops_data;
+	struct libaio_options *o = td->eo;
 	struct io_event *ev;
 	struct io_u *io_u;
 
 	ev = ld->aio_events + event;
-	io_u = container_of(ev->obj, struct io_u, iocb);
+	if (o->useriocb) {
+		int index = (int) (uintptr_t) ev->obj;
+		io_u = ld->io_u_index[index];
+	} else
+		io_u = container_of(ev->obj, struct io_u, iocb);
 
 	if (ev->res != io_u->xfer_buflen) {
 		if (ev->res > io_u->xfer_buflen)
@@ -204,6 +233,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
+	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -234,7 +264,11 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	ld->iocbs[ld->head] = &io_u->iocb;
+	if (o->useriocb)
+		ld->iocbs[ld->head] = (struct iocb *) (uintptr_t) io_u->index;
+	else
+		ld->iocbs[ld->head] = &io_u->iocb;
+
 	ld->io_us[ld->head] = io_u;
 	ring_inc(ld, &ld->head, 1);
 	ld->queued++;
@@ -353,26 +387,46 @@ static void fio_libaio_cleanup(struct thread_data *td)
 		free(ld->aio_events);
 		free(ld->iocbs);
 		free(ld->io_us);
+		if (ld->user_iocbs)
+			free(ld->user_iocbs);
 		free(ld);
 	}
 }
 
+static int fio_libaio_old_queue_init(struct libaio_data *ld, unsigned int depth,
+				     bool hipri, bool useriocb)
+{
+	if (hipri) {
+		log_err("fio: polled aio not available on your platform\n");
+		return 1;
+	}
+	if (useriocb) {
+		log_err("fio: user mapped iocbs not available on your platform\n");
+		return 1;
+	}
+
+	return io_queue_init(depth, &ld->aio_ctx);
+}
+
 static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
-				 bool hipri)
+				 bool hipri, bool useriocb)
 {
 #ifdef __NR_sys_io_setup2
-	int flags = 0;
+	int ret, flags = 0;
 
 	if (hipri)
-		flags = IOCTX_FLAG_IOPOLL;
+		flags |= IOCTX_FLAG_IOPOLL;
+	if (useriocb)
+		flags |= IOCTX_FLAG_USERIOCB;
 
-	return syscall(__NR_sys_io_setup2, depth, flags, &ld->aio_ctx);
+	ret = syscall(__NR_sys_io_setup2, depth, flags, ld->user_iocbs,
+			&ld->aio_ctx);
+	if (!ret)
+		return 0;
+
+	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
 #else
-	if (hipri) {
-		log_err("fio: polled aio not available on your platform\n");
-		return 1;
-	}
-	return io_queue_init(depth, &ld->aio_ctx);
+	return fio_libaio_old_queue_init(ld, depth, hipri, useriocb);
 #endif
 }
 
@@ -384,15 +438,31 @@ static int fio_libaio_init(struct thread_data *td)
 
 	ld = calloc(1, sizeof(*ld));
 
+	if (o->useriocb) {
+		size_t size;
+		void *p;
+
+		ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
+		size = td->o.iodepth * sizeof(struct iocb);
+		if (posix_memalign(&p, page_size, size)) {
+			log_err("fio: libaio iocb allocation failure\n");
+			free(ld);
+			return 1;
+		}
+		ld->user_iocbs = p;
+	}
+
 	/*
 	 * First try passing in 0 for queue depth, since we don't
 	 * care about the user ring. If that fails, the kernel is too old
 	 * and we need the right depth.
 	 */
-	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri);
+	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri, o->useriocb);
 	if (err) {
 		td_verror(td, -err, "io_queue_init");
 		log_err("fio: check /proc/sys/fs/aio-max-nr\n");
+		if (ld->user_iocbs)
+			free(ld->user_iocbs);
 		free(ld);
 		return 1;
 	}
@@ -407,10 +477,24 @@ static int fio_libaio_init(struct thread_data *td)
 	return 0;
 }
 
+static int fio_libaio_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	struct libaio_options *o = td->eo;
+
+	if (o->useriocb) {
+		struct libaio_data *ld = td->io_ops_data;
+
+		ld->io_u_index[io_u->index] = io_u;
+	}
+
+	return 0;
+}
+
 static struct ioengine_ops ioengine = {
 	.name			= "libaio",
 	.version		= FIO_IOOPS_VERSION,
 	.init			= fio_libaio_init,
+	.io_u_init		= fio_libaio_io_u_init,
 	.prep			= fio_libaio_prep,
 	.queue			= fio_libaio_queue,
 	.commit			= fio_libaio_commit,


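Sketch (not part of the patch): with `useriocb`, the entry placed in `ld->iocbs[]` is not a real pointer. It is the `io_u` index cast through `uintptr_t`. On completion, `fio_libaio_event()` casts `ev->obj` back and looks the `io_u` up in `ld->io_u_index[]`, which `fio_libaio_io_u_init()` fills in. Without the option, the engine still recovers the `io_u` from its embedded `iocb` with `container_of()`. The cut-down types and helpers below are hypothetical. The local `container_of` is the usual `offsetof`-based definition, and the round trip assumes integer/pointer conversion through `uintptr_t` preserves the value, as it does on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down io_u with an embedded control block. */
struct demo_iocb { int fd; };

struct demo_io_u {
	unsigned int index;
	struct demo_iocb iocb;
};

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))
#endif

/* useriocb path: the submitted "pointer" is really the slot index. */
static struct demo_iocb *encode_index(const struct demo_io_u *io_u)
{
	return (struct demo_iocb *) (uintptr_t) io_u->index;
}

static struct demo_io_u *decode_index(struct demo_iocb *obj,
				      struct demo_io_u **io_u_index)
{
	unsigned int index = (unsigned int) (uintptr_t) obj;

	return io_u_index[index];
}

/* Default path: recover the io_u from the address of its member. */
static struct demo_io_u *decode_embedded(struct demo_iocb *obj)
{
	return container_of(obj, struct demo_io_u, iocb);
}
```

The index table is what makes the first path work: the encoded value carries no address, so the lookup must be populated for every `io_u` before any I/O completes.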

* Recent changes (master)
@ 2018-11-20 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-20 13:00 UTC
  To: fio

The following changes since commit ee636f3fc5ddb9488c40aa2c6dd4168732e5b095:

  libaio: switch to newer libaio polled IO API (2018-11-15 20:31:35 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a1b006fe1cd3aa7e1b567f55e5a4c827d54f7c41:

  engines/libaio: fix new aio poll API (2018-11-19 19:41:53 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      engines/libaio: update to new io_setup2() system call
      engines/libaio: fix new aio poll API

 arch/arch-x86_64.h |  4 ++++
 engines/libaio.c   | 27 +++++++++++++++++++++++----
 2 files changed, 27 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 484ea0c..ac670d0 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -1,6 +1,10 @@
 #ifndef ARCH_X86_64_H
 #define ARCH_X86_64_H
 
+#ifndef __NR_sys_io_setup2
+#define __NR_sys_io_setup2	335
+#endif
+
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
 {
diff --git a/engines/libaio.c b/engines/libaio.c
index dc66462..2a4d653 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -16,6 +16,9 @@
 #ifndef IOCB_FLAG_HIPRI
 #define IOCB_FLAG_HIPRI	(1 << 2)
 #endif
+#ifndef IOCTX_FLAG_IOPOLL
+#define IOCTX_FLAG_IOPOLL	(1 << 0)
+#endif
 
 static int fio_libaio_commit(struct thread_data *td);
 
@@ -354,6 +357,25 @@ static void fio_libaio_cleanup(struct thread_data *td)
 	}
 }
 
+static int fio_libaio_queue_init(struct libaio_data *ld, unsigned int depth,
+				 bool hipri)
+{
+#ifdef __NR_sys_io_setup2
+	int flags = 0;
+
+	if (hipri)
+		flags = IOCTX_FLAG_IOPOLL;
+
+	return syscall(__NR_sys_io_setup2, depth, flags, &ld->aio_ctx);
+#else
+	if (hipri) {
+		log_err("fio: polled aio not available on your platform\n");
+		return 1;
+	}
+	return io_queue_init(depth, &ld->aio_ctx);
+#endif
+}
+
 static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_options *o = td->eo;
@@ -367,10 +389,7 @@ static int fio_libaio_init(struct thread_data *td)
 	 * care about the user ring. If that fails, the kernel is too old
 	 * and we need the right depth.
 	 */
-	if (!o->userspace_reap)
-		err = io_queue_init(INT_MAX, &ld->aio_ctx);
-	if (o->userspace_reap || err == -EINVAL)
-		err = io_queue_init(td->o.iodepth, &ld->aio_ctx);
+	err = fio_libaio_queue_init(ld, td->o.iodepth, o->hipri);
 	if (err) {
 		td_verror(td, -err, "io_queue_init");
 		log_err("fio: check /proc/sys/fs/aio-max-nr\n");



* Recent changes (master)
@ 2018-11-16 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-16 13:00 UTC
  To: fio

The following changes since commit 46ff70e8795e91acb2ffc041e6c1fdb4e157dff4:

  verify: add requested block information to failure trace (2018-11-06 21:30:32 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ee636f3fc5ddb9488c40aa2c6dd4168732e5b095:

  libaio: switch to newer libaio polled IO API (2018-11-15 20:31:35 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      workqueue: fix misleading comment
      workqueue: ensure we see deferred error for IOs
      backend: silence "No I/O performed by..." if jobs ends in error
      io_u: fall through to unlock path in case of error
      libaio: switch to newer libaio polled IO API

 backend.c        |  7 +++++--
 engines/libaio.c | 11 ++++++-----
 io_u.c           | 14 +++++++++-----
 rate-submit.c    | 11 +++++++++--
 workqueue.c      |  2 +-
 5 files changed, 30 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d6450ba..a92ec45 100644
--- a/backend.c
+++ b/backend.c
@@ -237,6 +237,9 @@ static void cleanup_pending_aio(struct thread_data *td)
 {
 	int r;
 
+	if (td->error)
+		return;
+
 	/*
 	 * get immediately available events, if any
 	 */
@@ -1866,8 +1869,8 @@ static void *thread_main(void *data)
 	 * then something went wrong unless FIO_NOIO or FIO_DISKLESSIO.
 	 * (Are we not missing other flags that can be ignored ?)
 	 */
-	if ((td->o.size || td->o.io_size) && !ddir_rw_sum(bytes_done) &&
-	    !did_some_io && !td->o.create_only &&
+	if (!td->error && (td->o.size || td->o.io_size) &&
+	    !ddir_rw_sum(bytes_done) && !did_some_io && !td->o.create_only &&
 	    !(td_ioengine_flagged(td, FIO_NOIO) ||
 	      td_ioengine_flagged(td, FIO_DISKLESSIO)))
 		log_err("%s: No I/O performed by %s, "
diff --git a/engines/libaio.c b/engines/libaio.c
index b241ed9..dc66462 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -13,8 +13,9 @@
 #include "../lib/pow2.h"
 #include "../optgroup.h"
 
-#define IOCB_CMD_PREAD_POLL 9
-#define IOCB_CMD_PWRITE_POLL 10
+#ifndef IOCB_FLAG_HIPRI
+#define IOCB_FLAG_HIPRI	(1 << 2)
+#endif
 
 static int fio_libaio_commit(struct thread_data *td);
 
@@ -57,7 +58,7 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "hipri",
-		.lname	= "RWF_HIPRI",
+		.lname	= "High Priority",
 		.type	= FIO_OPT_STR_SET,
 		.off1	= offsetof(struct libaio_options, hipri),
 		.help	= "Use polled IO completions",
@@ -86,11 +87,11 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 	if (io_u->ddir == DDIR_READ) {
 		io_prep_pread(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 		if (o->hipri)
-			io_u->iocb.aio_lio_opcode = IOCB_CMD_PREAD_POLL;
+			io_u->iocb.u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (io_u->ddir == DDIR_WRITE) {
 		io_prep_pwrite(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
 		if (o->hipri)
-			io_u->iocb.aio_lio_opcode = IOCB_CMD_PWRITE_POLL;
+			io_u->iocb.u.c.flags |= IOCB_FLAG_HIPRI;
 	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(&io_u->iocb, f->fd);
 
diff --git a/io_u.c b/io_u.c
index 56abe6f..1604ff8 100644
--- a/io_u.c
+++ b/io_u.c
@@ -604,7 +604,7 @@ static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 
 int io_u_quiesce(struct thread_data *td)
 {
-	int completed = 0;
+	int ret = 0, completed = 0;
 
 	/*
 	 * We are going to sleep, ensure that we flush anything pending as
@@ -619,17 +619,20 @@ int io_u_quiesce(struct thread_data *td)
 		td_io_commit(td);
 
 	while (td->io_u_in_flight) {
-		int ret;
-
 		ret = io_u_queued_complete(td, 1);
 		if (ret > 0)
 			completed += ret;
+		else if (ret < 0)
+			break;
 	}
 
 	if (td->flags & TD_F_REGROW_LOGS)
 		regrow_logs(td);
 
-	return completed;
+	if (completed)
+		return completed;
+
+	return ret;
 }
 
 static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
@@ -1556,7 +1559,8 @@ again:
 		assert(!(td->flags & TD_F_CHILD));
 		ret = pthread_cond_wait(&td->free_cond, &td->io_u_lock);
 		assert(ret == 0);
-		goto again;
+		if (!td->error)
+			goto again;
 	}
 
 	if (needs_lock)
diff --git a/rate-submit.c b/rate-submit.c
index e5c6204..b07a207 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -53,7 +53,7 @@ static int io_workqueue_fn(struct submit_worker *sw,
 	struct io_u *io_u = container_of(work, struct io_u, work);
 	const enum fio_ddir ddir = io_u->ddir;
 	struct thread_data *td = sw->priv;
-	int ret;
+	int ret, error;
 
 	if (td->o.serialize_overlap)
 		check_overlap(io_u);
@@ -71,12 +71,14 @@ static int io_workqueue_fn(struct submit_worker *sw,
 		ret = io_u_queued_complete(td, 1);
 		if (ret > 0)
 			td->cur_depth -= ret;
+		else if (ret < 0)
+			break;
 		io_u_clear(td, io_u, IO_U_F_FLIGHT);
 	} while (1);
 
 	dprint(FD_RATE, "io_u %p ret %d by %u\n", io_u, ret, gettid());
 
-	io_queue_event(td, io_u, &ret, ddir, NULL, 0, NULL);
+	error = io_queue_event(td, io_u, &ret, ddir, NULL, 0, NULL);
 
 	if (ret == FIO_Q_COMPLETED)
 		td->cur_depth--;
@@ -93,6 +95,9 @@ static int io_workqueue_fn(struct submit_worker *sw,
 			td->cur_depth -= ret;
 	}
 
+	if (error || td->error)
+		pthread_cond_signal(&td->parent->free_cond);
+
 	return 0;
 }
 
@@ -100,6 +105,8 @@ static bool io_workqueue_pre_sleep_flush_fn(struct submit_worker *sw)
 {
 	struct thread_data *td = sw->priv;
 
+	if (td->error)
+		return false;
 	if (td->io_u_queued || td->cur_depth || td->io_u_in_flight)
 		return true;
 
diff --git a/workqueue.c b/workqueue.c
index 841dbb9..faed245 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -97,7 +97,7 @@ void workqueue_flush(struct workqueue *wq)
 }
 
 /*
- * Must be serialized by caller. Returns true for queued, false for busy.
+ * Must be serialized by caller.
  */
 void workqueue_enqueue(struct workqueue *wq, struct workqueue_work *work)
 {


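Sketch (not part of the patch): `io_u_quiesce()` used to keep looping while `td->io_u_in_flight` was nonzero, ignoring negative returns from reaping. It now breaks on a negative return and, when nothing completed, returns that error instead of 0. The model below scripts the reap results. `demo_td`, `reap()` and `quiesce()` are hypothetical stand-ins for the fio structures and `io_u_queued_complete()`.

```c
#include <assert.h>

/* Hypothetical model: each reap() returns the number of I/Os it
 * completed (> 0), 0 if none were ready, or a negative error. */
struct demo_td {
	int in_flight;
	const int *script;	/* scripted reap() results */
	int pos;
};

static int reap(struct demo_td *td)
{
	int ret = td->script[td->pos++];

	if (ret > 0)
		td->in_flight -= ret;
	return ret;
}

/* Same shape as the patched io_u_quiesce(): stop on error instead of
 * looping, report completions if any, otherwise the last return. */
static int quiesce(struct demo_td *td)
{
	int ret = 0, completed = 0;

	while (td->in_flight) {
		ret = reap(td);
		if (ret > 0)
			completed += ret;
		else if (ret < 0)
			break;
	}

	if (completed)
		return completed;
	return ret;
}
```

As in the patch, if some I/Os completed before the error, the completion count is returned rather than the error.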

* Recent changes (master)
@ 2018-11-07 13:00 Jens Axboe
From: Jens Axboe @ 2018-11-07 13:00 UTC
  To: fio

The following changes since commit 16500b5a0b03ee0142d592bb74a46943a223b06e:

  Fio 3.12 (2018-11-02 12:41:50 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 46ff70e8795e91acb2ffc041e6c1fdb4e157dff4:

  verify: add requested block information to failure trace (2018-11-06 21:30:32 -0700)

----------------------------------------------------------------
Feng, Changyu (1):
      verify: add requested block information to failure trace

Jens Axboe (2):
      libaio: add support for polled IO
      libaio: fix 'hipri' help entry

 engines/libaio.c | 24 +++++++++++++++++++++---
 verify.c         | 12 ++++++++----
 2 files changed, 29 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libaio.c b/engines/libaio.c
index 7ac36b2..b241ed9 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -13,6 +13,9 @@
 #include "../lib/pow2.h"
 #include "../optgroup.h"
 
+#define IOCB_CMD_PREAD_POLL 9
+#define IOCB_CMD_PWRITE_POLL 10
+
 static int fio_libaio_commit(struct thread_data *td);
 
 struct libaio_data {
@@ -39,6 +42,7 @@ struct libaio_data {
 struct libaio_options {
 	void *pad;
 	unsigned int userspace_reap;
+	unsigned int hipri;
 };
 
 static struct fio_option options[] = {
@@ -52,6 +56,15 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_LIBAIO,
 	},
 	{
+		.name	= "hipri",
+		.lname	= "RWF_HIPRI",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct libaio_options, hipri),
+		.help	= "Use polled IO completions",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_LIBAIO,
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -68,12 +81,17 @@ static inline void ring_inc(struct libaio_data *ld, unsigned int *val,
 static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
+	struct libaio_options *o = td->eo;
 
-	if (io_u->ddir == DDIR_READ)
+	if (io_u->ddir == DDIR_READ) {
 		io_prep_pread(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-	else if (io_u->ddir == DDIR_WRITE)
+		if (o->hipri)
+			io_u->iocb.aio_lio_opcode = IOCB_CMD_PREAD_POLL;
+	} else if (io_u->ddir == DDIR_WRITE) {
 		io_prep_pwrite(&io_u->iocb, f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
-	else if (ddir_sync(io_u->ddir))
+		if (o->hipri)
+			io_u->iocb.aio_lio_opcode = IOCB_CMD_PWRITE_POLL;
+	} else if (ddir_sync(io_u->ddir))
 		io_prep_fsync(&io_u->iocb, f->fd);
 
 	return 0;
diff --git a/verify.c b/verify.c
index 01492f2..da429e7 100644
--- a/verify.c
+++ b/verify.c
@@ -345,8 +345,10 @@ static void log_verify_failure(struct verify_header *hdr, struct vcont *vc)
 
 	offset = vc->io_u->offset;
 	offset += vc->hdr_num * hdr->len;
-	log_err("%.8s: verify failed at file %s offset %llu, length %u\n",
-			vc->name, vc->io_u->file->file_name, offset, hdr->len);
+	log_err("%.8s: verify failed at file %s offset %llu, length %u"
+			" (requested block: offset=%llu, length=%llu)\n",
+			vc->name, vc->io_u->file->file_name, offset, hdr->len,
+			vc->io_u->offset, vc->io_u->buflen);
 
 	if (vc->good_crc && vc->bad_crc) {
 		log_err("       Expected CRC: ");
@@ -865,9 +867,11 @@ static int verify_header(struct io_u *io_u, struct thread_data *td,
 	return 0;
 
 err:
-	log_err(" at file %s offset %llu, length %u\n",
+	log_err(" at file %s offset %llu, length %u"
+		" (requested block: offset=%llu, length=%llu)\n",
 		io_u->file->file_name,
-		io_u->offset + hdr_num * hdr_len, hdr_len);
+		io_u->offset + hdr_num * hdr_len, hdr_len,
+		io_u->offset, io_u->buflen);
 
 	if (td->o.verify_dump)
 		dump_buf(p, hdr_len, io_u->offset + hdr_num * hdr_len,

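The `hipri` option only changes which opcode ends up in the iocb. `io_prep_pread()`/`io_prep_pwrite()` fill in the request as before, and the opcode is then overwritten with one of the two polled values defined at the top of the libaio.c diff. Below is a minimal sketch of that selection logic. `sketch_ddir` and `sketch_select_opcode` are hypothetical stand-ins for fio's types. 0, 1 and 2 are the standard `IOCB_CMD_PREAD`, `IOCB_CMD_PWRITE` and `IOCB_CMD_FSYNC` values from `<linux/aio_abi.h>`, and 9 and 10 are the polled opcodes as this patch defines them:

```c
#include <assert.h>
#include <stdbool.h>

/* Standard opcodes from <linux/aio_abi.h>. */
#define SKETCH_CMD_PREAD	0
#define SKETCH_CMD_PWRITE	1
#define SKETCH_CMD_FSYNC	2
/* Polled variants, using the values defined in the libaio.c patch. */
#define SKETCH_CMD_PREAD_POLL	9
#define SKETCH_CMD_PWRITE_POLL	10

/* Hypothetical stand-in for fio's enum fio_ddir. */
enum sketch_ddir { SKETCH_READ, SKETCH_WRITE, SKETCH_SYNC };

/*
 * Pick the normal opcode, then upgrade reads and writes (but not
 * fsync) to the polled opcode when hipri is set.
 */
static int sketch_select_opcode(enum sketch_ddir ddir, bool hipri)
{
	switch (ddir) {
	case SKETCH_READ:
		return hipri ? SKETCH_CMD_PREAD_POLL : SKETCH_CMD_PREAD;
	case SKETCH_WRITE:
		return hipri ? SKETCH_CMD_PWRITE_POLL : SKETCH_CMD_PWRITE;
	case SKETCH_SYNC:
		return SKETCH_CMD_FSYNC;
	}
	assert(0 && "unknown data direction");
	return -1;
}
```

Sync requests keep their normal opcode. This matches the patch, which adds the polled override only to the read and write branches.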

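The verify change still prints the offset of the failing verify header, and now also prints the whole requested block. This helps when a single I/O carries several headers. The sketch below is a hypothetical helper (`sketch_verify_failure_msg`, not part of fio) that builds the same message from plain parameters. It assumes `hdr_num` is the index of the failing header within the block:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper producing the same text as the patched log_err()
 * call. The failing header sits at io_offset + hdr_num * hdr_len, while
 * the requested block spans [io_offset, io_offset + buflen).
 */
static int sketch_verify_failure_msg(char *buf, size_t len, const char *name,
				     const char *file,
				     unsigned long long io_offset,
				     unsigned int hdr_num, unsigned int hdr_len,
				     unsigned long long buflen)
{
	unsigned long long offset;

	assert(buf != NULL && name != NULL && file != NULL);
	offset = io_offset + (unsigned long long)hdr_num * hdr_len;

	/* %.8s truncates the verify name to 8 characters, as fio does. */
	return snprintf(buf, len, "%.8s: verify failed at file %s offset %llu, "
			"length %u (requested block: offset=%llu, length=%llu)\n",
			name, file, offset, hdr_len, io_offset, buflen);
}
```

Take a failure in the fourth 4096-byte header (`hdr_num` = 3) of a 65536-byte request at offset 1048576. It is reported at offset 1048576 + 3 * 4096 = 1060864, followed by `(requested block: offset=1048576, length=65536)`.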

* Recent changes (master)
@ 2018-11-03 12:00 Jens Axboe
From: Jens Axboe @ 2018-11-03 12:00 UTC
  To: fio

The following changes since commit 199710f5822cf22bf76107f26993a9468f93a422:

  oslib: fix strlcat's incorrect copying (2018-10-26 10:24:28 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 16500b5a0b03ee0142d592bb74a46943a223b06e:

  Fio 3.12 (2018-11-02 12:41:50 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      Fix Windows CPU count

Jens Axboe (1):
      Fio 3.12

 FIO-VERSION-GEN   | 2 +-
 os/os-windows-7.h | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 17b215d..ea5be1a 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.11
+DEF_VER=fio-3.12
 
 LF='
 '
diff --git a/os/os-windows-7.h b/os/os-windows-7.h
index f5ddb8e..0a6eaa3 100644
--- a/os/os-windows-7.h
+++ b/os/os-windows-7.h
@@ -10,7 +10,7 @@ typedef struct {
 /* Return all processors regardless of processor group */
 static inline unsigned int cpus_online(void)
 {
-	return GetMaximumProcessorCount(ALL_PROCESSOR_GROUPS);
+	return GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
 }
 
 static inline void print_mask(os_cpu_mask_t *cpumask)
@@ -104,7 +104,7 @@ static inline int mask_to_group_mask(os_cpu_mask_t *cpumask, int *processor_grou
 	cpus_offset = 0;
 	group_size = 0;
 	while (!found && group < online_groups) {
-		group_size = GetMaximumProcessorCount(group);
+		group_size = GetActiveProcessorCount(group);
 		dprint(FD_PROCESS, "group=%d group_start=%d group_size=%u search_cpu=%d\n",
 		       group, cpus_offset, group_size, search_cpu);
 		if (cpus_offset + group_size > search_cpu)
@@ -271,7 +271,7 @@ static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 	dprint(FD_PROCESS, "current_groups=%d group_count=%d\n",
 	       current_groups[0], group_count);
 	while (true) {
-		group_size = GetMaximumProcessorCount(group);
+		group_size = GetActiveProcessorCount(group);
 		if (group_size == 0) {
 			log_err("fio_getaffinity: error retrieving size of "
 				"processor group %d\n", group);


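On Windows, `GetMaximumProcessorCount()` reports how many logical processors a group (or the system) can hold. `GetActiveProcessorCount()` reports how many are currently active. The old calls could therefore overcount CPUs on machines that have room for more processors than are present. Leaving the Windows APIs aside, the sketch below shows a loose POSIX analogue of the same distinction. It is not fio code, and it assumes a libc (such as glibc) that supports the `_SC_NPROCESSORS_ONLN` and `_SC_NPROCESSORS_CONF` `sysconf()` names:

```c
#include <assert.h>
#include <unistd.h>

/* Processors currently online: the closer match to "active". */
static long sketch_cpus_online(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;	/* treat a failed query as a single CPU */
}

/*
 * Processors configured in the system, which may include offline ones:
 * the closer match to the old "maximum" count.
 */
static long sketch_cpus_configured(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}
```

After this change, fio's Windows `cpus_online()` corresponds to the first helper rather than the second.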

* Recent changes (master)
@ 2018-10-27 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-27 12:00 UTC
  To: fio

The following changes since commit d7e92306bde2117702ed96b7c5647d9485869047:

  io_u: move trim error notification out-of-line (2018-10-24 05:00:53 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 199710f5822cf22bf76107f26993a9468f93a422:

  oslib: fix strlcat's incorrect copying (2018-10-26 10:24:28 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (7):
      unittests: add CUnit based unittest framework
      unittests: add unittest suite for lib/memalign.c
      unittests: add unittest suite for lib/strntol.c
      unittests: add unittest suite for oslib/strlcat.c
      unittests: add unittest suite for oslib/strndup.c
      lib: fix strntol's end pointer when str has leading spaces
      oslib: fix strlcat's incorrect copying

 .gitignore                |  1 +
 Makefile                  | 26 +++++++++++++++--
 configure                 | 26 +++++++++++++++++
 lib/strntol.c             |  2 +-
 oslib/strlcat.c           | 69 ++++++++++++++++++++++++++++++++-------------
 oslib/strlcat.h           |  2 +-
 unittests/lib/memalign.c  | 27 ++++++++++++++++++
 unittests/lib/strntol.c   | 59 +++++++++++++++++++++++++++++++++++++++
 unittests/oslib/strlcat.c | 52 ++++++++++++++++++++++++++++++++++
 unittests/oslib/strndup.c | 63 +++++++++++++++++++++++++++++++++++++++++
 unittests/unittest.c      | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 unittests/unittest.h      | 26 +++++++++++++++++
 12 files changed, 400 insertions(+), 24 deletions(-)
 create mode 100644 unittests/lib/memalign.c
 create mode 100644 unittests/lib/strntol.c
 create mode 100644 unittests/oslib/strlcat.c
 create mode 100644 unittests/oslib/strndup.c
 create mode 100644 unittests/unittest.c
 create mode 100644 unittests/unittest.h

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index 0c8cb7c..f86bec6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,6 +18,7 @@
 /t/ieee754
 /t/lfsr-test
 /t/stest
+/unittests/unittest
 y.tab.*
 lex.yy.c
 *.un~
diff --git a/Makefile b/Makefile
index 4721b78..5ac568e 100644
--- a/Makefile
+++ b/Makefile
@@ -300,6 +300,23 @@ T_PROGS += $(T_VS_PROGS)
 
 PROGS += $(T_PROGS)
 
+ifdef CONFIG_HAVE_CUNIT
+UT_OBJS = unittests/unittest.o
+UT_OBJS += unittests/lib/memalign.o
+UT_OBJS += unittests/lib/strntol.o
+UT_OBJS += unittests/oslib/strlcat.o
+UT_OBJS += unittests/oslib/strndup.o
+UT_TARGET_OBJS = lib/memalign.o
+UT_TARGET_OBJS += lib/strntol.o
+UT_TARGET_OBJS += oslib/strlcat.o
+UT_TARGET_OBJS += oslib/strndup.o
+UT_PROGS = unittests/unittest
+else
+UT_OBJS =
+UT_TARGET_OBJS =
+UT_PROGS =
+endif
+
 ifneq ($(findstring $(MAKEFLAGS),s),s)
 ifndef V
 	QUIET_CC	= @echo '   ' CC $@;
@@ -326,7 +343,7 @@ mandir = $(prefix)/man
 sharedir = $(prefix)/share/fio
 endif
 
-all: $(PROGS) $(T_TEST_PROGS) $(SCRIPTS) FORCE
+all: $(PROGS) $(T_TEST_PROGS) $(UT_PROGS) $(SCRIPTS) FORCE
 
 .PHONY: all install clean test
 .PHONY: FORCE cscope
@@ -467,8 +484,13 @@ t/fio-verify-state: $(T_VS_OBJS)
 t/time-test: $(T_TT_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_TT_OBJS) $(LIBS)
 
+ifdef CONFIG_HAVE_CUNIT
+unittests/unittest: $(UT_OBJS) $(UT_TARGET_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(UT_OBJS) $(UT_TARGET_OBJS) -lcunit
+endif
+
 clean: FORCE
-	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] profiles/*.[do] t/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
+	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(UT_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio unittests/unittest FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] profiles/*.[do] t/*.[do] unittests/*.[do] unittests/*/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -rf  doc/output
 
 distclean: clean FORCE
diff --git a/configure b/configure
index 5490e26..1f4e50b 100755
--- a/configure
+++ b/configure
@@ -2272,6 +2272,29 @@ if test "$disable_native" = "no" && test "$disable_opt" != "yes" && \
 fi
 print_config "Build march=native" "$build_native"
 
+##########################################
+# check for -lcunit
+if test "$cunit" != "yes" ; then
+  cunit="no"
+fi
+cat > $TMPC << EOF
+#include <CUnit/CUnit.h>
+#include <CUnit/Basic.h>
+int main(void)
+{
+  if (CU_initialize_registry() != CUE_SUCCESS)
+    return CU_get_error();
+  CU_basic_set_mode(CU_BRM_VERBOSE);
+  CU_basic_run_tests();
+  CU_cleanup_registry();
+  return CU_get_error();
+}
+EOF
+if compile_prog "" "-lcunit" "CUnit"; then
+  cunit="yes"
+fi
+print_config "CUnit" "$cunit"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2537,6 +2560,9 @@ fi
 if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
+if test "$cunit" = "yes" ; then
+  output_sym "CONFIG_HAVE_CUNIT"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/lib/strntol.c b/lib/strntol.c
index f622c8d..c3a55a1 100644
--- a/lib/strntol.c
+++ b/lib/strntol.c
@@ -28,6 +28,6 @@ long strntol(const char *str, size_t sz, char **end, int base)
 	if (ret == LONG_MIN || ret == LONG_MAX)
 		return ret;
 	if (end)
-		*end = (char *)str + (*end - buf);
+		*end = (char *)beg + (*end - buf);
 	return ret;
 }
diff --git a/oslib/strlcat.c b/oslib/strlcat.c
index 6c4c678..3e86eeb 100644
--- a/oslib/strlcat.c
+++ b/oslib/strlcat.c
@@ -1,28 +1,57 @@
 #ifndef CONFIG_STRLCAT
-
+/*
+ * Copyright (c) 1998, 2015 Todd C. Miller <Todd.Miller@courtesan.com>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <sys/types.h>
 #include <string.h>
 #include "strlcat.h"
 
-size_t strlcat(char *dst, const char *src, size_t size)
+/*
+ * Appends src to string dst of size dsize (unlike strncat, dsize is the
+ * full size of dst, not space left).  At most dsize-1 characters
+ * will be copied.  Always NUL terminates (unless dsize <= strlen(dst)).
+ * Returns strlen(src) + MIN(dsize, strlen(initial dst)).
+ * If retval >= dsize, truncation occurred.
+ */
+size_t
+strlcat(char *dst, const char *src, size_t dsize)
 {
-	size_t dstlen;
-	size_t srclen;
-
-	dstlen = strlen(dst);
-	size -= dstlen + 1;
-
-	/* return if no room */
-	if (!size)
-		return dstlen;
-
-	srclen = strlen(src);
-	if (srclen > size)
-		srclen = size;
-
-	memcpy(dst + dstlen, src, srclen);
-	dst[dstlen + srclen] = '\0';
-
-	return dstlen + srclen;
+	const char *odst = dst;
+	const char *osrc = src;
+	size_t n = dsize;
+	size_t dlen;
+
+	/* Find the end of dst and adjust bytes left but don't go past end. */
+	while (n-- != 0 && *dst != '\0')
+		dst++;
+	dlen = dst - odst;
+	n = dsize - dlen;
+
+	if (n-- == 0)
+		return(dlen + strlen(src));
+	while (*src != '\0') {
+		if (n != 0) {
+			*dst++ = *src;
+			n--;
+		}
+		src++;
+	}
+	*dst = '\0';
+
+	return(dlen + (src - osrc));	/* count does not include NUL */
 }
 
 #endif
diff --git a/oslib/strlcat.h b/oslib/strlcat.h
index f766392..85e4bda 100644
--- a/oslib/strlcat.h
+++ b/oslib/strlcat.h
@@ -5,7 +5,7 @@
 
 #include <stddef.h>
 
-size_t strlcat(char *dst, const char *src, size_t size);
+size_t strlcat(char *dst, const char *src, size_t dsize);
 
 #endif
 
diff --git a/unittests/lib/memalign.c b/unittests/lib/memalign.c
new file mode 100644
index 0000000..854c274
--- /dev/null
+++ b/unittests/lib/memalign.c
@@ -0,0 +1,27 @@
+#include "../unittest.h"
+
+#include "../../lib/memalign.h"
+
+static void test_memalign_1(void)
+{
+	size_t align = 4096;
+	void *p = fio_memalign(align, 1234, false);
+
+	if (p)
+		CU_ASSERT_EQUAL(((int)(uintptr_t)p) & (align - 1), 0);
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "memalign/1",
+		.fn	= test_memalign_1,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_lib_memalign(void)
+{
+	return fio_unittest_add_suite("lib/memalign.c", NULL, NULL, tests);
+}
diff --git a/unittests/lib/strntol.c b/unittests/lib/strntol.c
new file mode 100644
index 0000000..14adde2
--- /dev/null
+++ b/unittests/lib/strntol.c
@@ -0,0 +1,59 @@
+#include "../unittest.h"
+
+#include "../../lib/strntol.h"
+
+static void test_strntol_1(void)
+{
+	char s[] = "12345";
+	char *endp = NULL;
+	long ret = strntol(s, strlen(s), &endp, 10);
+
+	CU_ASSERT_EQUAL(ret, 12345);
+	CU_ASSERT_NOT_EQUAL(endp, NULL);
+	CU_ASSERT_EQUAL(*endp, '\0');
+}
+
+static void test_strntol_2(void)
+{
+	char s[] = "     12345";
+	char *endp = NULL;
+	long ret = strntol(s, strlen(s), &endp, 10);
+
+	CU_ASSERT_EQUAL(ret, 12345);
+	CU_ASSERT_NOT_EQUAL(endp, NULL);
+	CU_ASSERT_EQUAL(*endp, '\0');
+}
+
+static void test_strntol_3(void)
+{
+	char s[] = "0x12345";
+	char *endp = NULL;
+	long ret = strntol(s, strlen(s), &endp, 16);
+
+	CU_ASSERT_EQUAL(ret, 0x12345);
+	CU_ASSERT_NOT_EQUAL(endp, NULL);
+	CU_ASSERT_EQUAL(*endp, '\0');
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "strntol/1",
+		.fn	= test_strntol_1,
+	},
+	{
+		.name	= "strntol/2",
+		.fn	= test_strntol_2,
+	},
+	{
+		.name	= "strntol/3",
+		.fn	= test_strntol_3,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_lib_strntol(void)
+{
+	return fio_unittest_add_suite("lib/strntol.c", NULL, NULL, tests);
+}
diff --git a/unittests/oslib/strlcat.c b/unittests/oslib/strlcat.c
new file mode 100644
index 0000000..8d35d41
--- /dev/null
+++ b/unittests/oslib/strlcat.c
@@ -0,0 +1,52 @@
+#include "../unittest.h"
+
+#ifndef CONFIG_STRLCAT
+#include "../../oslib/strlcat.h"
+#else
+#include <string.h>
+#endif
+
+static void test_strlcat_1(void)
+{
+	char dst[32];
+	char src[] = "test";
+	size_t ret;
+
+	dst[0] = '\0';
+	ret = strlcat(dst, src, sizeof(dst));
+
+	CU_ASSERT_EQUAL(strcmp(dst, "test"), 0);
+	CU_ASSERT_EQUAL(ret, 4); /* total length it tried to create */
+}
+
+static void test_strlcat_2(void)
+{
+	char dst[32];
+	char src[] = "test";
+	size_t ret;
+
+	dst[0] = '\0';
+	ret = strlcat(dst, src, strlen(dst));
+
+	CU_ASSERT_EQUAL(strcmp(dst, ""), 0);
+	CU_ASSERT_EQUAL(ret, 4); /* total length it tried to create */
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "strlcat/1",
+		.fn	= test_strlcat_1,
+	},
+	{
+		.name	= "strlcat/2",
+		.fn	= test_strlcat_2,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_oslib_strlcat(void)
+{
+	return fio_unittest_add_suite("oslib/strlcat.c", NULL, NULL, tests);
+}
diff --git a/unittests/oslib/strndup.c b/unittests/oslib/strndup.c
new file mode 100644
index 0000000..2d1baf1
--- /dev/null
+++ b/unittests/oslib/strndup.c
@@ -0,0 +1,63 @@
+#include "../unittest.h"
+
+#ifndef CONFIG_HAVE_STRNDUP
+#include "../../oslib/strndup.h"
+#else
+#include <string.h>
+#endif
+
+static void test_strndup_1(void)
+{
+	char s[] = "test";
+	char *p = strndup(s, 3);
+
+	if (p) {
+		CU_ASSERT_EQUAL(strcmp(p, "tes"), 0);
+		CU_ASSERT_EQUAL(strlen(p), 3);
+	}
+}
+
+static void test_strndup_2(void)
+{
+	char s[] = "test";
+	char *p = strndup(s, 4);
+
+	if (p) {
+		CU_ASSERT_EQUAL(strcmp(p, s), 0);
+		CU_ASSERT_EQUAL(strlen(p), 4);
+	}
+}
+
+static void test_strndup_3(void)
+{
+	char s[] = "test";
+	char *p = strndup(s, 5);
+
+	if (p) {
+		CU_ASSERT_EQUAL(strcmp(p, s), 0);
+		CU_ASSERT_EQUAL(strlen(p), 4);
+	}
+}
+
+static struct fio_unittest_entry tests[] = {
+	{
+		.name	= "strndup/1",
+		.fn	= test_strndup_1,
+	},
+	{
+		.name	= "strndup/2",
+		.fn	= test_strndup_2,
+	},
+	{
+		.name	= "strndup/3",
+		.fn	= test_strndup_3,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+CU_ErrorCode fio_unittest_oslib_strndup(void)
+{
+	return fio_unittest_add_suite("oslib/strndup.c", NULL, NULL, tests);
+}
diff --git a/unittests/unittest.c b/unittests/unittest.c
new file mode 100644
index 0000000..1166e6e
--- /dev/null
+++ b/unittests/unittest.c
@@ -0,0 +1,71 @@
+/*
+ * fio unittest
+ * Copyright (C) 2018 Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "./unittest.h"
+
+/* XXX workaround lib/memalign.c's dependency on smalloc.c */
+void *smalloc(size_t size)
+{
+	return malloc(size);
+}
+
+void sfree(void *ptr)
+{
+	free(ptr);
+}
+
+CU_ErrorCode fio_unittest_add_suite(const char *name, CU_InitializeFunc initfn,
+	CU_CleanupFunc cleanfn, struct fio_unittest_entry *tvec)
+{
+	CU_pSuite pSuite;
+	struct fio_unittest_entry *t;
+
+	pSuite = CU_add_suite(name, initfn, cleanfn);
+	if (!pSuite) {
+		CU_cleanup_registry();
+		return CU_get_error();
+	}
+
+	t = tvec;
+	while (t && t->name) {
+		if (!CU_add_test(pSuite, t->name, t->fn)) {
+			CU_cleanup_registry();
+			return CU_get_error();
+		}
+		t++;
+	}
+
+	return CUE_SUCCESS;
+}
+
+static void fio_unittest_register(CU_ErrorCode (*fn)(void))
+{
+	if (fn && fn() != CUE_SUCCESS) {
+		fprintf(stderr, "%s\n", CU_get_error_msg());
+		exit(1);
+	}
+}
+
+int main(void)
+{
+	if (CU_initialize_registry() != CUE_SUCCESS) {
+		fprintf(stderr, "%s\n", CU_get_error_msg());
+		exit(1);
+	}
+
+	fio_unittest_register(fio_unittest_lib_memalign);
+	fio_unittest_register(fio_unittest_lib_strntol);
+	fio_unittest_register(fio_unittest_oslib_strlcat);
+	fio_unittest_register(fio_unittest_oslib_strndup);
+
+	CU_basic_set_mode(CU_BRM_VERBOSE);
+	CU_basic_run_tests();
+	CU_cleanup_registry();
+
+	return CU_get_error();
+}
diff --git a/unittests/unittest.h b/unittests/unittest.h
new file mode 100644
index 0000000..d3e3822
--- /dev/null
+++ b/unittests/unittest.h
@@ -0,0 +1,26 @@
+#ifndef FIO_UNITTEST_H
+#define FIO_UNITTEST_H
+
+#include <sys/types.h>
+
+#include <CUnit/CUnit.h>
+#include <CUnit/Basic.h>
+
+struct fio_unittest_entry {
+	const char *name;
+	CU_TestFunc fn;
+};
+
+/* XXX workaround lib/memalign.c's dependency on smalloc.c */
+void *smalloc(size_t);
+void sfree(void*);
+
+CU_ErrorCode fio_unittest_add_suite(const char*, CU_InitializeFunc,
+	CU_CleanupFunc, struct fio_unittest_entry*);
+
+CU_ErrorCode fio_unittest_lib_memalign(void);
+CU_ErrorCode fio_unittest_lib_strntol(void);
+CU_ErrorCode fio_unittest_oslib_strlcat(void);
+CU_ErrorCode fio_unittest_oslib_strndup(void);
+
+#endif
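
The strntol fix matters because `strntol()` skips leading spaces before copying the remaining bytes into a scratch buffer for `strtol()`. The end pointer that `strtol()` returns is an offset into that buffer, so it must be rebased onto `beg` (the first non-space byte), not onto `str`. With `str`, an input such as `"     12345"` left `*end` pointing at the `1` instead of the terminating NUL, which is exactly what test_strntol_2 checks. Below is a self-contained sketch of the corrected logic under the hypothetical name `sketch_strntol`, with an assumed 22-byte scratch buffer. Unlike the context lines in the diff, which return early when `strtol()` saturates at LONG_MIN or LONG_MAX, this sketch rebases `*end` on every successful path:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static long sketch_strntol(const char *str, size_t sz, char **end, int base)
{
	char buf[22];		/* assumed scratch size */
	const char *beg = str;
	long ret;

	assert(str != NULL);

	/* Skip leading spaces without running past sz. */
	for (; sz && *beg == ' '; beg++, sz--)
		;

	if (!sz || sz >= sizeof(buf)) {
		if (end)
			*end = (char *)str;
		return 0;
	}

	/* strtol() needs a NUL-terminated string, so copy at most sz bytes. */
	memcpy(buf, beg, sz);
	buf[sz] = '\0';
	ret = strtol(buf, end, base);

	/* *end points into buf; translate it relative to beg, not str. */
	if (end)
		*end = (char *)beg + (*end - buf);
	return ret;
}
```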

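The old strlcat() computed `size -= dstlen + 1`. With `size_t` arithmetic this wraps around whenever `dstlen + 1 > size` (for example, an empty `dst` with `size` 0, as in test_strlcat_2), and after that the length check no longer limits the copy. The old version also returned the truncated length, so callers could not tell that truncation had happened. The replacement follows the OpenBSD semantics: it returns the length it tried to create, so `ret >= dsize` signals truncation. Below is a compact re-expression of those semantics. It is renamed `sketch_strlcat` (a hypothetical name) so it cannot clash with a libc strlcat():

```c
#include <assert.h>
#include <string.h>

static size_t sketch_strlcat(char *dst, const char *src, size_t dsize)
{
	size_t dlen = 0;
	size_t slen;
	size_t room;

	assert(dst != NULL && src != NULL);
	slen = strlen(src);

	/* Find the end of dst, but never look past dsize bytes. */
	while (dlen < dsize && dst[dlen] != '\0')
		dlen++;

	/* No NUL within dsize: nothing can be appended safely. */
	if (dlen == dsize)
		return dlen + slen;

	/* Room for characters, keeping one byte for the terminating NUL. */
	room = dsize - dlen - 1;
	if (slen < room)
		room = slen;
	memcpy(dst + dlen, src, room);
	dst[dlen + room] = '\0';

	return dlen + slen;	/* length it tried to create, excluding NUL */
}
```

A caller that must not truncate silently checks `if (sketch_strlcat(buf, s, sizeof(buf)) >= sizeof(buf))`.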

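test_memalign_1 only checks that the returned pointer is aligned. The lib/memalign.c diff in the 2018-10-20 changes shows the technique behind `fio_memalign()`. It over-allocates by `alignment + sizeof(footer) - 1` bytes and rounds the pointer up. It then records the adjustment in a footer just past the caller's `size` bytes, so that `fio_memfree(ptr, size)` can recover the original pointer. Below is a standalone sketch with hypothetical names (`sketch_memalign`, `sketch_memfree`). It uses plain malloc() only, with no shared variant, and uses memcpy() for the footer because that location need not be suitably aligned:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* How far the aligned pointer was moved from the malloc() result. */
struct sketch_footer {
	unsigned int offset;
};

static void *sketch_memalign(size_t alignment, size_t size)
{
	struct sketch_footer f;
	char *raw;
	size_t pad;

	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

	/* Worst-case padding plus room for the footer. */
	raw = malloc(size + alignment + sizeof(f) - 1);
	if (!raw)
		return NULL;

	/* Distance up to the next multiple of alignment (0..alignment-1). */
	pad = (alignment - ((uintptr_t)raw & (alignment - 1))) & (alignment - 1);
	f.offset = (unsigned int)pad;
	memcpy(raw + pad + size, &f, sizeof(f));
	return raw + pad;
}

static void sketch_memfree(void *ptr, size_t size)
{
	struct sketch_footer f;

	if (!ptr)
		return;
	memcpy(&f, (char *)ptr + size, sizeof(f));
	free((char *)ptr - f.offset);
}
```

As with fio_memfree(), the caller must pass the same `size` to the free routine, because the footer is located from it.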

* Recent changes (master)
@ 2018-10-24 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-24 12:00 UTC
  To: fio

The following changes since commit 307f2246d65d9dbea7e5f56cd17734cb68c2817f:

  docs: serialize_overlap=1 with io_submit_mode=offload no longer requires threads (2018-10-19 11:09:23 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d7e92306bde2117702ed96b7c5647d9485869047:

  io_u: move trim error notification out-of-line (2018-10-24 05:00:53 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      stat: use helper for IO direction name
      io_u: move trim error notification out-of-line

 io_u.c | 24 ++++++++++++------------
 stat.c | 10 ++++------
 2 files changed, 16 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index a3540d1..56abe6f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1798,6 +1798,16 @@ static inline bool gtod_reduce(struct thread_data *td)
 			|| td->o.gtod_reduce;
 }
 
+static void trim_block_info(struct thread_data *td, struct io_u *io_u)
+{
+	uint32_t *info = io_u_block_info(td, io_u);
+
+	if (BLOCK_INFO_STATE(*info) >= BLOCK_STATE_TRIM_FAILURE)
+		return;
+
+	*info = BLOCK_INFO(BLOCK_STATE_TRIMMED, BLOCK_INFO_TRIMS(*info) + 1);
+}
+
 static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 				  struct io_completion_data *icd,
 				  const enum fio_ddir idx, unsigned int bytes)
@@ -1849,18 +1859,8 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	} else if (ddir_sync(idx) && !td->o.disable_clat)
 		add_sync_clat_sample(&td->ts, llnsec);
 
-	if (td->ts.nr_block_infos && io_u->ddir == DDIR_TRIM) {
-		uint32_t *info = io_u_block_info(td, io_u);
-		if (BLOCK_INFO_STATE(*info) < BLOCK_STATE_TRIM_FAILURE) {
-			if (io_u->ddir == DDIR_TRIM) {
-				*info = BLOCK_INFO(BLOCK_STATE_TRIMMED,
-						BLOCK_INFO_TRIMS(*info) + 1);
-			} else if (io_u->ddir == DDIR_WRITE) {
-				*info = BLOCK_INFO_SET_STATE(BLOCK_STATE_WRITTEN,
-								*info);
-			}
-		}
-	}
+	if (td->ts.nr_block_infos && io_u->ddir == DDIR_TRIM)
+		trim_block_info(td, io_u);
 }
 
 static void file_log_write_comp(const struct thread_data *td, struct fio_file *f,
diff --git a/stat.c b/stat.c
index ef9c4af..331abf6 100644
--- a/stat.c
+++ b/stat.c
@@ -416,7 +416,6 @@ static void display_lat(const char *name, unsigned long long min,
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
-	const char *str[] = { " read", "write", " trim", "sync" };
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
@@ -426,12 +425,12 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	if (ddir_sync(ddir)) {
 		if (calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
 			log_buf(out, "  %s:\n", "fsync/fdatasync/sync_file_range");
-			display_lat(str[ddir], min, max, mean, dev, out);
+			display_lat(io_ddir_name(ddir), min, max, mean, dev, out);
 			show_clat_percentiles(ts->io_u_sync_plat,
 						ts->sync_stat.samples,
 						ts->percentile_list,
 						ts->percentile_precision,
-						str[ddir], out);
+						io_ddir_name(ddir), out);
 		}
 		return;
 	}
@@ -455,7 +454,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		zbd_w_st = zbd_write_status(ts);
 
 	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
-			rs->unified_rw_rep ? "mixed" : str[ddir],
+			rs->unified_rw_rep ? "mixed" : io_ddir_name(ddir),
 			iops_p, bw_p, bw_p_alt, io_p,
 			(unsigned long long) ts->runtime[ddir],
 			zbd_w_st ? : "");
@@ -985,7 +984,6 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	double mean, dev, iops;
 	unsigned int len;
 	int i;
-	const char *ddirname[] = { "read", "write", "trim", "sync" };
 	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object = NULL;
 	char buf[120];
 	double p_of_agg = 100.0;
@@ -997,7 +995,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 	dir_object = json_create_object();
 	json_object_add_value_object(parent,
-		ts->unified_rw_rep ? "mixed" : ddirname[ddir], dir_object);
+		ts->unified_rw_rep ? "mixed" : io_ddir_name(ddir), dir_object);
 
 	if (ddir_rw(ddir)) {
 		bw_bytes = 0;

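The trim refactor does not change behavior. The caller already requires `io_u->ddir == DDIR_TRIM`, so the old `DDIR_WRITE` branch inside that block could never run, and it is simply dropped. Below is a minimal sketch of the remaining update. It assumes a 32-bit block-info word with the state in the upper bits and a trim counter in the lower bits, and it assumes the failure states sort after the normal ones. The `SK_*` shift, macros and state values are hypothetical stand-ins, not fio's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: state in the top 4 bits, trim count below. */
#define SK_STATE_SHIFT		28
#define SK_INFO(state, trims)	(((uint32_t)(state) << SK_STATE_SHIFT) | (trims))
#define SK_STATE(info)		((info) >> SK_STATE_SHIFT)
#define SK_TRIMS(info)		((info) & ((1u << SK_STATE_SHIFT) - 1))

/* Failure states sort last, which the ">= TRIM_FAILURE" test relies on. */
enum sk_state { SK_UNINIT, SK_TRIMMED, SK_WRITTEN, SK_TRIM_FAILURE,
		SK_WRITE_FAILURE };

/* Leave failed blocks alone, else mark trimmed and bump the counter. */
static void sk_trim_block_info(uint32_t *info)
{
	if (SK_STATE(*info) >= SK_TRIM_FAILURE)
		return;
	assert(SK_TRIMS(*info) + 1 < (1u << SK_STATE_SHIFT));
	*info = SK_INFO(SK_TRIMMED, SK_TRIMS(*info) + 1);
}
```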

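The stat.c change replaces two local string tables with the shared `io_ddir_name()` helper: one table was padded for the text report and the other was used for JSON. Below is a minimal sketch of an enum-indexed name lookup with a bounds check. `sk_ddir` and `sk_ddir_name()` are hypothetical, and fio's real enum and helper may cover more directions than the four shown:

```c
#include <assert.h>
#include <string.h>

enum sk_ddir { SK_DDIR_READ, SK_DDIR_WRITE, SK_DDIR_TRIM, SK_DDIR_SYNC,
	       SK_DDIR_LAST };

static const char *sk_ddir_name(enum sk_ddir ddir)
{
	static const char *const names[] = {
		[SK_DDIR_READ]	= "read",
		[SK_DDIR_WRITE]	= "write",
		[SK_DDIR_TRIM]	= "trim",
		[SK_DDIR_SYNC]	= "sync",
	};

	/* Catch a table that falls out of sync with the enum at build time. */
	_Static_assert(sizeof(names) / sizeof(names[0]) == SK_DDIR_LAST,
		       "name table out of sync with enum");

	/* Bounds check instead of indexing a local array blindly. */
	if ((unsigned int)ddir < SK_DDIR_LAST)
		return names[ddir];
	return "invalid";
}
```

Keeping one table in one helper means the text and JSON outputs cannot drift apart when a direction is added.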

* Recent changes (master)
@ 2018-10-20 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-20 12:00 UTC
  To: fio

The following changes since commit 8eb142ddd8bc3fe9428cd46b9fd98f32b2bc8c67:

  fio: reset more counters when ramp time has elapsed (2018-10-18 15:33:05 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 307f2246d65d9dbea7e5f56cd17734cb68c2817f:

  docs: serialize_overlap=1 with io_submit_mode=offload no longer requires threads (2018-10-19 11:09:23 -0600)

----------------------------------------------------------------
Adam Kupczyk (1):
      iolog: Fix problem with setup() not invoked when read_iolog is used.

Ben England (1):
      add rsp. time samples as column 2, use meaningful pctile names

Jens Axboe (3):
      Merge branch 'fix-init-read-iolog' of https://github.com/aclamk/fio
      filesetup: fix whitespace damage introduced by previous patch
      Merge branch 'samples-colnames' of https://github.com/parallel-fs-utils/fio

Vincent Fu (4):
      fio: add function to check for serialize_overlap with offload submission
      fio: enable cross-thread overlap checking with processes
      fio: document locking for overlap checking in offload mode
      docs: serialize_overlap=1 with io_submit_mode=offload no longer requires threads

 HOWTO                               |  2 +-
 backend.c                           | 21 +++++++++++++--------
 filesetup.c                         |  6 +++---
 fio.1                               |  2 +-
 fio.h                               |  5 +++++
 io_u_queue.c                        | 17 +++++++++++++----
 io_u_queue.h                        |  4 ++--
 ioengines.c                         |  9 ++++++++-
 lib/memalign.c                      | 16 ++++++++++++----
 lib/memalign.h                      |  5 +++--
 rate-submit.c                       |  8 ++++++++
 t/dedupe.c                          | 12 ++++++------
 tools/hist/fio-histo-log-pctiles.py | 23 ++++++++++++++++++++---
 13 files changed, 95 insertions(+), 35 deletions(-)

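In offload mode, overlap checking looks at in-flight I/O belonging to other jobs, which is why `io_u_all` and the io_u structures themselves become shared here. Allocated with calloc() and malloc(), those objects are private to each process, and that is why the documentation previously required threads. The changes below allocate them with smalloc() and release them with sfree() whenever `td_offload_overlap()` is true, while `io_u_freelist` stays private. The sketch below shows the underlying idea with a plain anonymous mmap() in place of fio's smalloc pool. It assumes, for illustration, that the pool is shared across fork() in a similar way, and `child_write_visible` is a hypothetical name:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * A forked child stores 42 into a freshly mapped int. Returns what the
 * parent then sees: 42 for a MAP_SHARED mapping, 0 for MAP_PRIVATE
 * (the child only modified its copy-on-write page). Returns -1 on error.
 */
static int child_write_visible(int shared)
{
	int flags = (shared ? MAP_SHARED : MAP_PRIVATE) | MAP_ANONYMOUS;
	int *slot = mmap(NULL, sizeof(*slot), PROT_READ | PROT_WRITE,
			 flags, -1, 0);
	pid_t pid;
	int seen;

	if (slot == MAP_FAILED)
		return -1;
	*slot = 0;

	pid = fork();
	if (pid < 0) {
		munmap(slot, sizeof(*slot));
		return -1;
	}
	if (pid == 0) {
		*slot = 42;	/* child publishes a value */
		_exit(0);
	}

	waitpid(pid, NULL, 0);
	seen = *slot;
	munmap(slot, sizeof(*slot));
	return seen;
}
```

Only the shared mapping lets one process observe what another wrote. The `overlap_check` mutex used in backend.c and ioengines.c only helps processes if it lives in such shared memory as well.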
---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 72ef872..468772d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2343,7 +2343,7 @@ I/O depth
 	This option only applies to I/Os issued for a single job except when it is
 	enabled along with :option:`io_submit_mode`=offload. In offload mode, fio
 	will check for overlap among all I/Os submitted by offload jobs with :option:`serialize_overlap`
-	enabled. Threads must be used for all such jobs.
+	enabled.
 
 	Default: false.
 
diff --git a/backend.c b/backend.c
index cc3c4e7..d6450ba 100644
--- a/backend.c
+++ b/backend.c
@@ -1189,14 +1189,14 @@ static void cleanup_io_u(struct thread_data *td)
 		if (td->io_ops->io_u_free)
 			td->io_ops->io_u_free(td, io_u);
 
-		fio_memfree(io_u, sizeof(*io_u));
+		fio_memfree(io_u, sizeof(*io_u), td_offload_overlap(td));
 	}
 
 	free_io_mem(td);
 
 	io_u_rexit(&td->io_u_requeues);
-	io_u_qexit(&td->io_u_freelist);
-	io_u_qexit(&td->io_u_all);
+	io_u_qexit(&td->io_u_freelist, false);
+	io_u_qexit(&td->io_u_all, td_offload_overlap(td));
 
 	free_file_completion_logging(td);
 }
@@ -1211,8 +1211,8 @@ static int init_io_u(struct thread_data *td)
 
 	err = 0;
 	err += !io_u_rinit(&td->io_u_requeues, td->o.iodepth);
-	err += !io_u_qinit(&td->io_u_freelist, td->o.iodepth);
-	err += !io_u_qinit(&td->io_u_all, td->o.iodepth);
+	err += !io_u_qinit(&td->io_u_freelist, td->o.iodepth, false);
+	err += !io_u_qinit(&td->io_u_all, td->o.iodepth, td_offload_overlap(td));
 
 	if (err) {
 		log_err("fio: failed setting up IO queues\n");
@@ -1227,7 +1227,7 @@ static int init_io_u(struct thread_data *td)
 		if (td->terminate)
 			return 1;
 
-		ptr = fio_memalign(cl_align, sizeof(*io_u));
+		ptr = fio_memalign(cl_align, sizeof(*io_u), td_offload_overlap(td));
 		if (!ptr) {
 			log_err("fio: unable to allocate aligned memory\n");
 			break;
@@ -1874,10 +1874,15 @@ static void *thread_main(void *data)
 			 "perhaps try --debug=io option for details?\n",
 			 td->o.name, td->io_ops->name);
 
-	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+	/*
+	 * Acquire this lock if we were doing overlap checking in
+	 * offload mode so that we don't clean up this job while
+	 * another thread is checking its io_u's for overlap
+	 */
+	if (td_offload_overlap(td))
 		pthread_mutex_lock(&overlap_check);
 	td_set_runstate(td, TD_FINISHING);
-	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+	if (td_offload_overlap(td))
 		pthread_mutex_unlock(&overlap_check);
 
 	update_rusage_stat(td);
diff --git a/filesetup.c b/filesetup.c
index c0fa3cd..aa1a394 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -908,9 +908,6 @@ int setup_files(struct thread_data *td)
 
 	old_state = td_bump_runstate(td, TD_SETTING_UP);
 
-	if (o->read_iolog_file)
-		goto done;
-
 	/*
 	 * Find out physical size of files or devices for this thread,
 	 * before we determine I/O size and range of our targets.
@@ -926,6 +923,9 @@ int setup_files(struct thread_data *td)
 	if (err)
 		goto err_out;
 
+	if (o->read_iolog_file)
+		goto done;
+
 	/*
 	 * check sizes. if the files/devices do not exist and the size
 	 * isn't passed to fio, abort.
diff --git a/fio.1 b/fio.1
index 7691b2b..ed49268 100644
--- a/fio.1
+++ b/fio.1
@@ -2075,7 +2075,7 @@ this option can reduce both performance and the \fBiodepth\fR achieved.
 This option only applies to I/Os issued for a single job except when it is
 enabled along with \fBio_submit_mode\fR=offload. In offload mode, fio
 will check for overlap among all I/Os submitted by offload jobs with \fBserialize_overlap\fR
-enabled. Threads must be used for all such jobs.
+enabled.
 .P
 Default: false.
 .RE
diff --git a/fio.h b/fio.h
index e394e16..b3ba5db 100644
--- a/fio.h
+++ b/fio.h
@@ -772,6 +772,11 @@ static inline bool td_async_processing(struct thread_data *td)
 	return (td->flags & TD_F_NEED_LOCK) != 0;
 }
 
+static inline bool td_offload_overlap(struct thread_data *td)
+{
+	return td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD;
+}
+
 /*
  * We currently only need to do locking if we have verifier threads
  * accessing our internal structures too
diff --git a/io_u_queue.c b/io_u_queue.c
index 8cf4c8c..41f98bc 100644
--- a/io_u_queue.c
+++ b/io_u_queue.c
@@ -1,9 +1,15 @@
 #include <stdlib.h>
+#include <string.h>
 #include "io_u_queue.h"
+#include "smalloc.h"
 
-bool io_u_qinit(struct io_u_queue *q, unsigned int nr)
+bool io_u_qinit(struct io_u_queue *q, unsigned int nr, bool shared)
 {
-	q->io_us = calloc(nr, sizeof(struct io_u *));
+	if (shared)
+		q->io_us = smalloc(nr * sizeof(struct io_u *));
+	else
+		q->io_us = calloc(nr, sizeof(struct io_u *));
+
 	if (!q->io_us)
 		return false;
 
@@ -12,9 +18,12 @@ bool io_u_qinit(struct io_u_queue *q, unsigned int nr)
 	return true;
 }
 
-void io_u_qexit(struct io_u_queue *q)
+void io_u_qexit(struct io_u_queue *q, bool shared)
 {
-	free(q->io_us);
+	if (shared)
+		sfree(q->io_us);
+	else
+		free(q->io_us);
 }
 
 bool io_u_rinit(struct io_u_ring *ring, unsigned int nr)
diff --git a/io_u_queue.h b/io_u_queue.h
index 545e2c4..87de894 100644
--- a/io_u_queue.h
+++ b/io_u_queue.h
@@ -45,8 +45,8 @@ static inline int io_u_qempty(const struct io_u_queue *q)
 #define io_u_qiter(q, io_u, i)	\
 	for (i = 0; i < (q)->nr && (io_u = (q)->io_us[i]); i++)
 
-bool io_u_qinit(struct io_u_queue *q, unsigned int nr);
-void io_u_qexit(struct io_u_queue *q);
+bool io_u_qinit(struct io_u_queue *q, unsigned int nr, bool shared);
+void io_u_qexit(struct io_u_queue *q, bool shared);
 
 struct io_u_ring {
 	unsigned int head;
diff --git a/ioengines.c b/ioengines.c
index 47f606a..b7df860 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -288,7 +288,14 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 
 	assert((io_u->flags & IO_U_F_FLIGHT) == 0);
 	io_u_set(td, io_u, IO_U_F_FLIGHT);
-	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+
+	/*
+	 * If overlap checking was enabled in offload mode we
+	 * can release this lock that was acquired when we
+	 * started the overlap check because the IO_U_F_FLIGHT
+	 * flag is now set
+	 */
+	if (td_offload_overlap(td))
 		pthread_mutex_unlock(&overlap_check);
 
 	assert(fio_file_open(io_u->file));
diff --git a/lib/memalign.c b/lib/memalign.c
index e774c19..537bb9f 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -2,6 +2,7 @@
 #include <stdlib.h>
 
 #include "memalign.h"
+#include "smalloc.h"
 
 #define PTR_ALIGN(ptr, mask)   \
 	(char *)((uintptr_t)((ptr) + (mask)) & ~(mask))
@@ -10,14 +11,18 @@ struct align_footer {
 	unsigned int offset;
 };
 
-void *fio_memalign(size_t alignment, size_t size)
+void *fio_memalign(size_t alignment, size_t size, bool shared)
 {
 	struct align_footer *f;
 	void *ptr, *ret = NULL;
 
 	assert(!(alignment & (alignment - 1)));
 
-	ptr = malloc(size + alignment + sizeof(*f) - 1);
+	if (shared)
+		ptr = smalloc(size + alignment + sizeof(*f) - 1);
+	else
+		ptr = malloc(size + alignment + sizeof(*f) - 1);
+
 	if (ptr) {
 		ret = PTR_ALIGN(ptr, alignment - 1);
 		f = ret + size;
@@ -27,9 +32,12 @@ void *fio_memalign(size_t alignment, size_t size)
 	return ret;
 }
 
-void fio_memfree(void *ptr, size_t size)
+void fio_memfree(void *ptr, size_t size, bool shared)
 {
 	struct align_footer *f = ptr + size;
 
-	free(ptr - f->offset);
+	if (shared)
+		sfree(ptr - f->offset);
+	else
+		free(ptr - f->offset);
 }
diff --git a/lib/memalign.h b/lib/memalign.h
index c2eb170..d703087 100644
--- a/lib/memalign.h
+++ b/lib/memalign.h
@@ -2,8 +2,9 @@
 #define FIO_MEMALIGN_H
 
 #include <inttypes.h>
+#include <stdbool.h>
 
-extern void *fio_memalign(size_t alignment, size_t size);
-extern void fio_memfree(void *ptr, size_t size);
+extern void *fio_memalign(size_t alignment, size_t size, bool shared);
+extern void fio_memfree(void *ptr, size_t size, bool shared);
 
 #endif
diff --git a/rate-submit.c b/rate-submit.c
index 68ad755..e5c6204 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -21,6 +21,14 @@ static void check_overlap(struct io_u *io_u)
 		 * time to prevent two threads from thinking the coast
 		 * is clear and then submitting IOs that overlap with
 		 * each other
+		 *
+		 * If an overlap is found, release the lock and
+		 * re-acquire it before checking again to give other
+		 * threads a chance to make progress
+		 *
+		 * If an overlap is not found, release the lock when the
+		 * io_u's IO_U_F_FLIGHT flag is set so that this io_u
+		 * can be checked by other threads as they assess overlap
 		 */
 		pthread_mutex_lock(&overlap_check);
 		for_each_td(td, i) {
diff --git a/t/dedupe.c b/t/dedupe.c
index 37120e1..2ef8dc5 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -158,8 +158,8 @@ static int col_check(struct chunk *c, struct item *i)
 	char *cbuf, *ibuf;
 	int ret = 1;
 
-	cbuf = fio_memalign(blocksize, blocksize);
-	ibuf = fio_memalign(blocksize, blocksize);
+	cbuf = fio_memalign(blocksize, blocksize, false);
+	ibuf = fio_memalign(blocksize, blocksize, false);
 
 	e = flist_entry(c->extent_list[0].next, struct extent, list);
 	if (read_block(file.fd, cbuf, e->offset))
@@ -170,8 +170,8 @@ static int col_check(struct chunk *c, struct item *i)
 
 	ret = memcmp(ibuf, cbuf, blocksize);
 out:
-	fio_memfree(cbuf, blocksize);
-	fio_memfree(ibuf, blocksize);
+	fio_memfree(cbuf, blocksize, false);
+	fio_memfree(ibuf, blocksize, false);
 	return ret;
 }
 
@@ -309,7 +309,7 @@ static void *thread_fn(void *data)
 	struct worker_thread *thread = data;
 	void *buf;
 
-	buf = fio_memalign(blocksize, chunk_size);
+	buf = fio_memalign(blocksize, chunk_size, false);
 
 	do {
 		if (get_work(&thread->cur_offset, &thread->size)) {
@@ -323,7 +323,7 @@ static void *thread_fn(void *data)
 	} while (1);
 
 	thread->done = 1;
-	fio_memfree(buf, chunk_size);
+	fio_memfree(buf, chunk_size, false);
 	return NULL;
 }
 
diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index 7f08f6e..f9df2a3 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -272,6 +272,13 @@ def add_to_histo_from( target, source ):
     for b in range(0, len(source)):
         target[b] += source[b]
 
+
+# calculate total samples in the histogram buckets
+
+def get_samples(buckets):
+    return reduce( lambda x,y: x + y, buckets)
+
+
 # compute percentiles
 # inputs:
 #   buckets: histogram bucket array 
@@ -453,14 +460,24 @@ def compute_percentiles_from_logs():
     # calculate percentiles across aggregate histogram for all threads
     # print CSV header just like fiologparser_hist does
 
-    header = 'msec-since-start, '
+    header = 'msec-since-start, samples, '
     for p in args.pctiles_wanted:
-        header += '%3.1f, ' % p
+        if p == 0.:
+            next_pctile_header = 'min'
+        elif p == 100.:
+            next_pctile_header = 'max'
+        elif p == 50.:
+            next_pctile_header = 'median'
+        else:
+            next_pctile_header = '%3.1f' % p
+        header += '%s, ' % next_pctile_header
+
     print('time (millisec), percentiles in increasing order with values in ' + args.output_unit)
     print(header)
 
     for (t_msec, all_threads_histo_t) in all_threads_histograms:
-        record = '%8d, ' % t_msec
+        samples = get_samples(all_threads_histo_t)
+        record = '%8d, %8d, ' % (t_msec, samples)
         pct = get_pctiles(all_threads_histo_t, args.pctiles_wanted, bucket_times)
         if not pct:
             for w in args.pctiles_wanted:

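The `fio_memalign()`/`fio_memfree()` change above only threads a `shared` flag through, choosing `smalloc()`/`sfree()` over `malloc()`/`free()`. The alignment technique underneath is unchanged. It over-allocates, rounds the pointer up, and records the rounding offset in a footer placed just past the usable region, so the free path can recover the raw pointer. Below is a minimal standalone sketch of that technique. `shared_alloc()`/`shared_free()` are hypothetical stand-ins for fio's smalloc pool and simply wrap `malloc()`/`free()` here. The footer is copied with `memcpy()` because `ret + size` need not be suitably aligned for an `unsigned int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-ins for fio's smalloc()/sfree(). In fio these
 * allocate from a shared-memory pool; here they wrap malloc()/free()
 * so the sketch is self-contained.
 */
static void *shared_alloc(size_t size) { return malloc(size); }
static void shared_free(void *ptr) { free(ptr); }

/* Round ptr up to the next multiple of (mask + 1). */
#define PTR_ALIGN(ptr, mask) \
	((char *)(((uintptr_t)(ptr) + (mask)) & ~(uintptr_t)(mask)))

struct align_footer {
	unsigned int offset;	/* aligned pointer minus raw allocation */
};

static void *memalign_sketch(size_t alignment, size_t size, bool shared)
{
	struct align_footer f;
	/* Worst-case alignment slack plus room for the footer. */
	size_t total = size + alignment + sizeof(f) - 1;
	char *raw, *ret;

	assert(alignment && !(alignment & (alignment - 1)));	/* power of 2 */

	raw = shared ? shared_alloc(total) : malloc(total);
	if (!raw)
		return NULL;

	ret = PTR_ALIGN(raw, alignment - 1);
	f.offset = (unsigned int)(ret - raw);
	/* The footer sits right after the usable region and may be unaligned. */
	memcpy(ret + size, &f, sizeof(f));
	return ret;
}

static void memfree_sketch(void *ptr, size_t size, bool shared)
{
	struct align_footer f;
	char *raw;

	memcpy(&f, (char *)ptr + size, sizeof(f));
	raw = (char *)ptr - f.offset;
	if (shared)
		shared_free(raw);
	else
		free(raw);
}
```

The footer lives at `ptr + size`, so the free path must receive the same `size` and `shared` values that were used for the allocation. This is why every `t/dedupe.c` call site passes `false` to both functions.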

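The `fio-histo-log-pctiles.py` hunk adds a `samples` column, which is the sum of all bucket counts in the aggregate histogram for each interval. It also labels the 0th, 50th and 100th percentile columns `min`, `median` and `max` rather than printing them as numbers. The two rules are rendered in C below purely for illustration. The script itself is Python, and the helper names here are hypothetical. Note also that on Python 3 `reduce` must be imported from `functools`; the hunk does not show whether the script already does this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Total samples in one histogram: the sum of all bucket counts. */
static unsigned long long histo_samples(const unsigned long long *buckets,
					size_t n)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		total += buckets[i];
	return total;
}

/* Column label for one percentile, mirroring the patched header logic. */
static void pctile_label(double p, char *buf, size_t len)
{
	if (p == 0.0)
		snprintf(buf, len, "min");
	else if (p == 100.0)
		snprintf(buf, len, "max");
	else if (p == 50.0)
		snprintf(buf, len, "median");
	else
		snprintf(buf, len, "%3.1f", p);
}

/* Build the CSV header; returns 0 on success, -1 if out is too small. */
static int build_header(const double *pctiles, size_t n, char *out, size_t len)
{
	char label[32];

	assert(out != NULL && len > 0);
	if (snprintf(out, len, "msec-since-start, samples, ") >= (int)len)
		return -1;
	for (size_t i = 0; i < n; i++) {
		size_t used = strlen(out);

		pctile_label(pctiles[i], label, sizeof(label));
		if (snprintf(out + used, len - used, "%s, ", label) >=
		    (int)(len - used))
			return -1;	/* truncated */
	}
	return 0;
}
```

With percentiles `{0, 50, 99.9, 100}`, the header becomes `msec-since-start, samples, min, median, 99.9, max, `. The trailing separator matches the script's `'%s, '` formatting.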
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-10-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-10-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d5dbacf662b1cc3fb09b5cd70b236ab98d1c0dbe:

  Merge branch 'offload-serialize-overlap2' of https://github.com/vincentkfu/fio (2018-10-15 14:09:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8eb142ddd8bc3fe9428cd46b9fd98f32b2bc8c67:

  fio: reset more counters when ramp time has elapsed (2018-10-18 15:33:05 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      fio: reset more counters when ramp time has elapsed

 time.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/time.c b/time.c
index c887682..1999969 100644
--- a/time.c
+++ b/time.c
@@ -118,6 +118,7 @@ bool ramp_time_over(struct thread_data *td)
 	if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
 		td->ramp_time_over = true;
 		reset_all_stats(td);
+		reset_io_stats(td);
 		td_set_runstate(td, TD_RAMP);
 
 		/*


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-10-16 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-10-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ee6ce26c029dbdb62184d2f011fdab61d3429d82:

  options: kill 'use_os_rand' (2018-10-08 13:43:23 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d5dbacf662b1cc3fb09b5cd70b236ab98d1c0dbe:

  Merge branch 'offload-serialize-overlap2' of https://github.com/vincentkfu/fio (2018-10-15 14:09:18 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'offload-serialize-overlap2' of https://github.com/vincentkfu/fio

Vincent Fu (4):
      init: loosen serialize_overlap restrictions
      fio: enable overlap checking with offload submission
      docs: enable serialize_overlap with io_submit_mode=offload
      rate-submit: remove code that can never be executed

 HOWTO         |  9 +++++++--
 backend.c     |  8 +++++++-
 fio.1         | 11 +++++++++--
 fio.h         |  3 +++
 init.c        | 13 +++----------
 ioengines.c   |  2 ++
 rate-submit.c | 37 +++++++++++++++++++++++++++++++++----
 7 files changed, 64 insertions(+), 19 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a2503e9..72ef872 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2339,8 +2339,13 @@ I/O depth
 	``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly
 	serializing in-flight I/Os that have a non-zero overlap. Note that setting
 	this option can reduce both performance and the :option:`iodepth` achieved.
-	Additionally this option does not work when :option:`io_submit_mode` is set to
-	offload. Default: false.
+
+	This option only applies to I/Os issued for a single job except when it is
+	enabled along with :option:`io_submit_mode`=offload. In offload mode, fio
+	will check for overlap among all I/Os submitted by offload jobs with :option:`serialize_overlap`
+	enabled. Threads must be used for all such jobs.
+
+	Default: false.
 
 .. option:: io_submit_mode=str
 
diff --git a/backend.c b/backend.c
index 76e456f..cc3c4e7 100644
--- a/backend.c
+++ b/backend.c
@@ -29,6 +29,7 @@
 #include <sys/stat.h>
 #include <sys/wait.h>
 #include <math.h>
+#include <pthread.h>
 
 #include "fio.h"
 #include "smalloc.h"
@@ -65,6 +66,7 @@ unsigned int stat_number = 0;
 int shm_id = 0;
 int temp_stall_ts;
 unsigned long done_secs = 0;
+pthread_mutex_t overlap_check = PTHREAD_MUTEX_INITIALIZER;
 
 #define JOB_START_TIMEOUT	(5 * 1000)
 
@@ -567,7 +569,7 @@ static int unlink_all_files(struct thread_data *td)
 /*
  * Check if io_u will overlap an in-flight IO in the queue
  */
-static bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u)
+bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u)
 {
 	bool overlap;
 	struct io_u *check_io_u;
@@ -1872,7 +1874,11 @@ static void *thread_main(void *data)
 			 "perhaps try --debug=io option for details?\n",
 			 td->o.name, td->io_ops->name);
 
+	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+		pthread_mutex_lock(&overlap_check);
 	td_set_runstate(td, TD_FINISHING);
+	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+		pthread_mutex_unlock(&overlap_check);
 
 	update_rusage_stat(td);
 	td->ts.total_run_time = mtime_since_now(&td->epoch);
diff --git a/fio.1 b/fio.1
index bf181b3..7691b2b 100644
--- a/fio.1
+++ b/fio.1
@@ -2070,8 +2070,15 @@ changing data and the overlapping region has a non-zero size. Setting
 \fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly
 serializing in-flight I/Os that have a non-zero overlap. Note that setting
 this option can reduce both performance and the \fBiodepth\fR achieved.
-Additionally this option does not work when \fBio_submit_mode\fR is set to
-offload. Default: false.
+.RS
+.P
+This option only applies to I/Os issued for a single job except when it is
+enabled along with \fBio_submit_mode\fR=offload. In offload mode, fio
+will check for overlap among all I/Os submitted by offload jobs with \fBserialize_overlap\fR
+enabled. Threads must be used for all such jobs.
+.P
+Default: false.
+.RE
 .TP
 .BI io_submit_mode \fR=\fPstr
 This option controls how fio submits the I/O to the I/O engine. The default
diff --git a/fio.h b/fio.h
index 53bcda1..e394e16 100644
--- a/fio.h
+++ b/fio.h
@@ -852,4 +852,7 @@ enum {
 extern void exec_trigger(const char *);
 extern void check_trigger_file(void);
 
+extern bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u);
+extern pthread_mutex_t overlap_check;
+
 #endif
diff --git a/init.c b/init.c
index 1eddc6f..a2b70c4 100644
--- a/init.c
+++ b/init.c
@@ -744,19 +744,12 @@ static int fixup_options(struct thread_data *td)
 	/*
 	 * There's no need to check for in-flight overlapping IOs if the job
 	 * isn't changing data or the maximum iodepth is guaranteed to be 1
+	 * when we are not in offload mode
 	 */
 	if (o->serialize_overlap && !(td->flags & TD_F_READ_IOLOG) &&
-	    (!(td_write(td) || td_trim(td)) || o->iodepth == 1))
+	    (!(td_write(td) || td_trim(td)) || o->iodepth == 1) &&
+	    o->io_submit_mode != IO_MODE_OFFLOAD)
 		o->serialize_overlap = 0;
-	/*
-	 * Currently can't check for overlaps in offload mode
-	 */
-	if (o->serialize_overlap && o->io_submit_mode == IO_MODE_OFFLOAD) {
-		log_err("fio: checking for in-flight overlaps when the "
-			"io_submit_mode is offload is not supported\n");
-		o->serialize_overlap = 0;
-		ret |= warnings_fatal;
-	}
 
 	if (o->nr_files > td->files_index)
 		o->nr_files = td->files_index;
diff --git a/ioengines.c b/ioengines.c
index ba02952..47f606a 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -288,6 +288,8 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 
 	assert((io_u->flags & IO_U_F_FLIGHT) == 0);
 	io_u_set(td, io_u, IO_U_F_FLIGHT);
+	if (td->o.serialize_overlap && td->o.io_submit_mode == IO_MODE_OFFLOAD)
+		pthread_mutex_unlock(&overlap_check);
 
 	assert(fio_file_open(io_u->file));
 
diff --git a/rate-submit.c b/rate-submit.c
index 2f02fe2..68ad755 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -9,6 +9,36 @@
 #include "lib/getrusage.h"
 #include "rate-submit.h"
 
+static void check_overlap(struct io_u *io_u)
+{
+	int i;
+	struct thread_data *td;
+	bool overlap = false;
+
+	do {
+		/*
+		 * Allow only one thread to check for overlap at a
+		 * time to prevent two threads from thinking the coast
+		 * is clear and then submitting IOs that overlap with
+		 * each other
+		 */
+		pthread_mutex_lock(&overlap_check);
+		for_each_td(td, i) {
+			if (td->runstate <= TD_SETTING_UP ||
+				td->runstate >= TD_FINISHING ||
+				!td->o.serialize_overlap ||
+				td->o.io_submit_mode != IO_MODE_OFFLOAD)
+				continue;
+
+			overlap = in_flight_overlap(&td->io_u_all, io_u);
+			if (overlap) {
+				pthread_mutex_unlock(&overlap_check);
+				break;
+			}
+		}
+	} while (overlap);
+}
+
 static int io_workqueue_fn(struct submit_worker *sw,
 			   struct workqueue_work *work)
 {
@@ -17,6 +47,9 @@ static int io_workqueue_fn(struct submit_worker *sw,
 	struct thread_data *td = sw->priv;
 	int ret;
 
+	if (td->o.serialize_overlap)
+		check_overlap(io_u);
+
 	dprint(FD_RATE, "io_u %p queued by %u\n", io_u, gettid());
 
 	io_u_set(td, io_u, IO_U_F_NO_FILE_PUT);
@@ -50,10 +83,6 @@ static int io_workqueue_fn(struct submit_worker *sw,
 		ret = io_u_queued_complete(td, min_evts);
 		if (ret > 0)
 			td->cur_depth -= ret;
-	} else if (ret == FIO_Q_BUSY) {
-		ret = io_u_queued_complete(td, td->cur_depth);
-		if (ret > 0)
-			td->cur_depth -= ret;
 	}
 
 	return 0;


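In offload mode, `check_overlap()` takes the global `overlap_check` mutex and scans the in-flight I/Os of every job that uses `serialize_overlap`. If it finds an overlap, it drops the lock and tries again. If it finds none, it returns with the lock still held, and `td_io_queue()` releases it only after setting `IO_U_F_FLIGHT`. That ordering ensures no other submitter can pass its own check in the window before this I/O becomes visible as in flight. Below is a simplified single-table sketch of the same pattern. `struct range`, `inflight[]` and the function names are hypothetical and not fio's, and the table stands in for all jobs' `io_u_all` queues.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-flight I/O record covering [offset, offset + len). */
struct range {
	unsigned long long offset;
	unsigned long long len;
	bool in_flight;
};

#define MAX_INFLIGHT 64

/* Plays the role of fio's global overlap_check mutex. */
static pthread_mutex_t overlap_lock = PTHREAD_MUTEX_INITIALIZER;
static struct range inflight[MAX_INFLIGHT];

static bool ranges_overlap(const struct range *a, const struct range *b)
{
	return a->offset < b->offset + b->len &&
	       b->offset < a->offset + a->len;
}

/*
 * Returns with overlap_lock held once io overlaps nothing in flight.
 * The caller must publish io as in flight before unlocking.
 */
static void wait_for_no_overlap(const struct range *io)
{
	bool overlap;

	pthread_mutex_lock(&overlap_lock);
	do {
		overlap = false;	/* cleared on every pass */
		for (size_t i = 0; i < MAX_INFLIGHT; i++) {
			if (inflight[i].in_flight &&
			    ranges_overlap(&inflight[i], io)) {
				overlap = true;
				break;
			}
		}
		if (overlap) {
			/* Drop the lock so completions can make progress. */
			pthread_mutex_unlock(&overlap_lock);
			sched_yield();
			pthread_mutex_lock(&overlap_lock);
		}
	} while (overlap);
}

/* Returns the slot used, or -1 if the table is full. */
static int submit_range(const struct range *io)
{
	int slot = -1;

	wait_for_no_overlap(io);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (!inflight[i].in_flight) {
			inflight[i] = *io;
			inflight[i].in_flight = true;	/* like IO_U_F_FLIGHT */
			slot = i;
			break;
		}
	}
	pthread_mutex_unlock(&overlap_lock);	/* like td_io_queue() */
	return slot;
}

static void complete_range(int slot)
{
	assert(slot >= 0 && slot < MAX_INFLIGHT);
	pthread_mutex_lock(&overlap_lock);
	assert(inflight[slot].in_flight);
	inflight[slot].in_flight = false;
	pthread_mutex_unlock(&overlap_lock);
}
```

The sketch clears `overlap` at the top of every pass. In the hunk as shown, `overlap` is initialised once before the `do` loop. A retry pass that skips every job, for example because the conflicting job has since reached `TD_FINISHING`, would therefore appear to leave it `true` and lock a mutex the thread already holds.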
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-10-09 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-10-09 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5857e8e0f53e81bd7096209fe5cabf91ce74b853:

  Merge branch 'patch-1' of https://github.com/joaomlneto/fio (2018-10-05 19:11:19 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ee6ce26c029dbdb62184d2f011fdab61d3429d82:

  options: kill 'use_os_rand' (2018-10-08 13:43:23 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Remove old OS dependent (unused) random code
      options: kill 'use_os_rand'

 options.c         |  8 --------
 os/os-aix.h       |  1 -
 os/os-android.h   | 17 -----------------
 os/os-dragonfly.h |  1 -
 os/os-freebsd.h   |  1 -
 os/os-hpux.h      |  1 -
 os/os-linux.h     | 15 ---------------
 os/os-mac.h       |  1 -
 os/os-netbsd.h    |  1 -
 os/os-openbsd.h   |  1 -
 os/os-solaris.h   | 16 ----------------
 os/os-windows.h   |  1 -
 os/os.h           | 17 -----------------
 thread_options.h  |  6 ++----
 14 files changed, 2 insertions(+), 85 deletions(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 66fb580..98187de 100644
--- a/options.c
+++ b/options.c
@@ -2203,14 +2203,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_RANDOM,
 	},
 	{
-		.name	= "use_os_rand",
-		.lname	= "Use OS random",
-		.type	= FIO_OPT_DEPRECATED,
-		.off1	= offsetof(struct thread_options, dep_use_os_rand),
-		.category = FIO_OPT_C_IO,
-		.group	= FIO_OPT_G_RANDOM,
-	},
-	{
 		.name	= "norandommap",
 		.lname	= "No randommap",
 		.type	= FIO_OPT_STR_SET,
diff --git a/os/os-aix.h b/os/os-aix.h
index e204d6f..1aab96e 100644
--- a/os/os-aix.h
+++ b/os/os-aix.h
@@ -11,7 +11,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 
 #define OS_MAP_ANON		MAP_ANON
diff --git a/os/os-android.h b/os/os-android.h
index 1483275..3c05077 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -201,23 +201,6 @@ static inline unsigned long long os_phys_mem(void)
 	return (unsigned long long) pages * (unsigned long long) pagesize;
 }
 
-typedef struct { unsigned short r[3]; } os_random_state_t;
-
-static inline void os_random_seed(unsigned long seed, os_random_state_t *rs)
-{
-	rs->r[0] = seed & 0xffff;
-	seed >>= 16;
-	rs->r[1] = seed & 0xffff;
-	seed >>= 16;
-	rs->r[2] = seed & 0xffff;
-	seed48(rs->r);
-}
-
-static inline long os_random_long(os_random_state_t *rs)
-{
-	return nrand48(rs->r);
-}
-
 #ifdef O_NOATIME
 #define FIO_O_NOATIME	O_NOATIME
 #else
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index e80ad8c..eb92521 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -25,7 +25,6 @@
 #include "../lib/types.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_TRIM
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index 97bc8ae..789da17 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -16,7 +16,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_FS_STAT
diff --git a/os/os-hpux.h b/os/os-hpux.h
index 515a525..c1dafe4 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -20,7 +20,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_CHARDEV_SIZE
 
diff --git a/os/os-linux.h b/os/os-linux.h
index 6b63d12..ba58bf7 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -60,8 +60,6 @@
 
 typedef cpu_set_t os_cpu_mask_t;
 
-typedef struct drand48_data os_random_state_t;
-
 #ifdef CONFIG_3ARG_AFFINITY
 #define fio_setaffinity(pid, cpumask)		\
 	sched_setaffinity((pid), sizeof(cpumask), &(cpumask))
@@ -170,19 +168,6 @@ static inline unsigned long long os_phys_mem(void)
 	return (unsigned long long) pages * (unsigned long long) pagesize;
 }
 
-static inline void os_random_seed(unsigned long seed, os_random_state_t *rs)
-{
-	srand48_r(seed, rs);
-}
-
-static inline long os_random_long(os_random_state_t *rs)
-{
-	long val;
-
-	lrand48_r(rs, &val);
-	return val;
-}
-
 static inline int fio_lookup_raw(dev_t dev, int *majdev, int *mindev)
 {
 	struct raw_config_request rq;
diff --git a/os/os-mac.h b/os/os-mac.h
index 92a60ee..0b9c870 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -16,7 +16,6 @@
 
 #include "../file.h"
 
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CHARDEV_SIZE
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 682a11c..c06261d 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -21,7 +21,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index b4c02c9..70f58b4 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -20,7 +20,6 @@
 
 #include "../file.h"
 
-#define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 2425ab9..1a411af 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -47,7 +47,6 @@ struct solaris_rand_seed {
 #define FIO_OS_HAS_CTIME_R
 
 typedef psetid_t os_cpu_mask_t;
-typedef struct solaris_rand_seed os_random_state_t;
 
 static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 {
@@ -92,21 +91,6 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
-static inline void os_random_seed(unsigned long seed, os_random_state_t *rs)
-{
-	rs->r[0] = seed & 0xffff;
-	seed >>= 16;
-	rs->r[1] = seed & 0xffff;
-	seed >>= 16;
-	rs->r[2] = seed & 0xffff;
-	seed48(rs->r);
-}
-
-static inline long os_random_long(os_random_state_t *rs)
-{
-	return nrand48(rs->r);
-}
-
 #define FIO_OS_DIRECTIO
 extern int directio(int, int);
 static inline int fio_set_odirect(struct fio_file *f)
diff --git a/os/os-windows.h b/os/os-windows.h
index aad446e..ef955dc 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -35,7 +35,6 @@ int rand_r(unsigned *);
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_GETTID
-#define FIO_USE_GENERIC_RAND
 
 #define FIO_PREFERRED_ENGINE		"windowsaio"
 #define FIO_PREFERRED_CLOCK_SOURCE	CS_CGETTIME
diff --git a/os/os.h b/os/os.h
index becc410..0b182c4 100644
--- a/os/os.h
+++ b/os/os.h
@@ -298,23 +298,6 @@ static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 }
 #endif
 
-#ifdef FIO_USE_GENERIC_RAND
-typedef unsigned int os_random_state_t;
-
-static inline void os_random_seed(unsigned long seed, os_random_state_t *rs)
-{
-	srand(seed);
-}
-
-static inline long os_random_long(os_random_state_t *rs)
-{
-	long val;
-
-	val = rand_r(rs);
-	return val;
-}
-#endif
-
 #ifdef FIO_USE_GENERIC_INIT_RANDOM_STATE
 static inline int init_random_seeds(unsigned long *rand_seeds, int size)
 {
diff --git a/thread_options.h b/thread_options.h
index 4f791cf..14c6969 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -140,7 +140,6 @@ struct thread_options {
 	unsigned int rand_repeatable;
 	unsigned int allrand_repeatable;
 	unsigned long long rand_seed;
-	unsigned int dep_use_os_rand;
 	unsigned int log_avg_msec;
 	unsigned int log_hist_msec;
 	unsigned int log_hist_coarseness;
@@ -173,7 +172,6 @@ struct thread_options {
 
 	unsigned int hugepage_size;
 	unsigned long long rw_min_bs;
-	unsigned int pad2;
 	unsigned int thinktime;
 	unsigned int thinktime_spin;
 	unsigned int thinktime_blocks;
@@ -430,8 +428,8 @@ struct thread_options_pack {
 	uint32_t override_sync;
 	uint32_t rand_repeatable;
 	uint32_t allrand_repeatable;
+	uint32_t pad;
 	uint64_t rand_seed;
-	uint32_t dep_use_os_rand;
 	uint32_t log_avg_msec;
 	uint32_t log_hist_msec;
 	uint32_t log_hist_coarseness;
@@ -572,7 +570,7 @@ struct thread_options_pack {
 	uint32_t rate_iops_min[DDIR_RWDIR_CNT];
 	uint32_t rate_process;
 	uint32_t rate_ign_think;
-	uint32_t pad;
+	uint32_t pad3;
 
 	uint8_t ioscheduler[FIO_TOP_STR_MAX];
 


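Removing the 32-bit `dep_use_os_rand` field changes the layout of `struct thread_options_pack`. The patch compensates by adding an explicit `uint32_t pad` ahead of the 64-bit `rand_seed` and renaming the later `pad` to `pad3`. The commit does not give a reason. Presumably, because the packed options are exchanged between fio client and server, the layout should not depend on compiler-inserted padding, which differs between ABIs. The struct below is a hypothetical illustration of that concern, not fio's real layout. It uses C11 `static_assert` to make the build fail if the layout drifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire-format struct (not fio's real layout): three 32-bit
 * fields followed by a 64-bit one.
 */
struct wire_opts {
	uint32_t override_sync;
	uint32_t rand_repeatable;
	uint32_t allrand_repeatable;
	uint32_t pad;		/* explicit filler so rand_seed lands on byte 16 */
	uint64_t rand_seed;
	uint32_t log_avg_msec;
	uint32_t log_hist_msec;
};

/*
 * Without `pad`, x86-64 would insert 4 hidden bytes before rand_seed,
 * while the common 32-bit x86 Linux ABI would not (it aligns uint64_t
 * to 4 bytes inside structs), so the two builds would disagree.
 */
static_assert(offsetof(struct wire_opts, rand_seed) == 16,
	      "rand_seed must sit at byte 16 on every ABI");
static_assert(sizeof(struct wire_opts) == 32,
	      "wire_opts must be 32 bytes on every ABI");
```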
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-10-06 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-10-06 12:00 UTC (permalink / raw)
  To: fio


The following changes since commit f1867a7f9e588acf67cf8fa96eab8a6e2fdedcf6:

  Bool conversions (2018-10-04 09:07:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5857e8e0f53e81bd7096209fe5cabf91ce74b853:

  Merge branch 'patch-1' of https://github.com/joaomlneto/fio (2018-10-05 19:11:19 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Add cross-stripe intel sample verify job
      Merge branch 'patch-1' of https://github.com/joaomlneto/fio

João Neto (1):
      Be careful when defining `MPOL_LOCAL`

 examples/cross-stripe-verify.fio | 25 +++++++++++++++++++++++++
 fio.h                            |  4 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 examples/cross-stripe-verify.fio

---

Diff of recent changes:

diff --git a/examples/cross-stripe-verify.fio b/examples/cross-stripe-verify.fio
new file mode 100644
index 0000000..68664ed
--- /dev/null
+++ b/examples/cross-stripe-verify.fio
@@ -0,0 +1,25 @@
+# Example of how to split a drive up into sections, manually, and perform
+# verify from a bunch of jobs. This example is special in that it assumes
+# the drive is at around 30 * 124G in size, so with the below settings, we'll
+# cover most of the drive. It's also special in that it doesn't write
+# everything, it just writes 16k at a specific boundary, for every 128k.
+# This is done to exercise the split path for Intel NVMe devices, most of
+# which have a 128k stripe size and require IOs to be split if they cross
+# the stripe boundary.
+#
+[global]
+bs=16k
+direct=1
+rw=write:112k
+verify=crc32c
+filename=/dev/nvme0n1
+verify_backlog=1
+offset_increment=124g
+io_size=120g
+offset=120k
+group_reporting=1
+verify_dump=1
+loops=2
+
+[write-verify]
+numjobs=30
diff --git a/fio.h b/fio.h
index 7b6611a..53bcda1 100644
--- a/fio.h
+++ b/fio.h
@@ -57,7 +57,9 @@
 /*
  * "local" is pseudo-policy
  */
-#define MPOL_LOCAL MPOL_MAX
+#ifndef MPOL_LOCAL
+#define MPOL_LOCAL 4
+#endif
 #endif
 
 #ifdef CONFIG_CUDA


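The `cross-stripe-verify.fio` comment says each job writes 16k for every 128k, starting at 120k, so that every write straddles a 128k stripe boundary. This follows from `rw=write:112k`, which skips 112k after each 16k write, giving a 128k stride. Because `offset_increment=124g` is itself a multiple of 128k, every job keeps the same 120k phase. The short check below confirms that arithmetic. The values are taken from the job file. The 128k stripe size is the assumption the comment makes about the device, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB 1024ULL
#define GiB (1024ULL * 1024 * 1024)

/* Values from examples/cross-stripe-verify.fio. */
static const uint64_t bs = 16 * KiB;		/* bs=16k */
static const uint64_t skip = 112 * KiB;		/* rw=write:112k */
static const uint64_t base = 120 * KiB;		/* offset=120k */
static const uint64_t job_step = 124 * GiB;	/* offset_increment=124g */
static const uint64_t stripe = 128 * KiB;	/* assumed device stripe size */

/* Start offset of the i-th write issued by job number `job`. */
static uint64_t write_offset(unsigned int job, uint64_t i)
{
	return base + job * job_step + i * (bs + skip);
}

/* True if [off, off + len) spans a multiple of `stripe`. */
static bool crosses_stripe(uint64_t off, uint64_t len)
{
	assert(len > 0);
	return off / stripe != (off + len - 1) / stripe;
}
```

For job 0, the first write covers 120k to 136k and therefore crosses the 128k boundary. Every later write sits at the same 120k position within its 128k window, so each one needs to be split by a device with that stripe size.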
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-10-05 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-10-05 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8772acb0435c6a378951458d8fd9b18e1ab1add0:

  Fio 3.11 (2018-10-03 12:30:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f1867a7f9e588acf67cf8fa96eab8a6e2fdedcf6:

  Bool conversions (2018-10-04 09:07:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Bool conversions

 backend.c |  4 ++--
 fio.h     | 10 +++++-----
 gfio.c    | 10 +++++-----
 init.c    | 32 ++++++++++++++++----------------
 options.c |  2 +-
 server.c  |  8 ++++----
 server.h  |  2 +-
 stat.h    |  2 +-
 8 files changed, 35 insertions(+), 35 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index bb8bd13..76e456f 100644
--- a/backend.c
+++ b/backend.c
@@ -53,7 +53,7 @@ static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
 static struct cgroup_mnt *cgroup_mnt;
 static int exit_value;
-static volatile int fio_abort;
+static volatile bool fio_abort;
 static unsigned int nr_process = 0;
 static unsigned int nr_thread = 0;
 
@@ -2371,7 +2371,7 @@ reap:
 			if (fio_sem_down_timeout(startup_sem, 10000)) {
 				log_err("fio: job startup hung? exiting.\n");
 				fio_terminate_threads(TERMINATE_ALL);
-				fio_abort = 1;
+				fio_abort = true;
 				nr_started--;
 				free(fd);
 				break;
diff --git a/fio.h b/fio.h
index 9e99da1..7b6611a 100644
--- a/fio.h
+++ b/fio.h
@@ -503,7 +503,7 @@ enum {
 #define __fio_stringify_1(x)	#x
 #define __fio_stringify(x)	__fio_stringify_1(x)
 
-extern int exitall_on_terminate;
+extern bool exitall_on_terminate;
 extern unsigned int thread_number;
 extern unsigned int stat_number;
 extern int shm_id;
@@ -512,7 +512,7 @@ extern int output_format;
 extern int append_terse_output;
 extern int temp_stall_ts;
 extern uintptr_t page_mask, page_size;
-extern int read_only;
+extern bool read_only;
 extern int eta_print;
 extern int eta_new_line;
 extern unsigned int eta_interval_msec;
@@ -523,10 +523,10 @@ extern enum fio_cs fio_clock_source;
 extern int fio_clock_source_set;
 extern int warnings_fatal;
 extern int terse_version;
-extern int is_backend;
-extern int is_local_backend;
+extern bool is_backend;
+extern bool is_local_backend;
 extern int nr_clients;
-extern int log_syslog;
+extern bool log_syslog;
 extern int status_interval;
 extern const char fio_version_string[];
 extern char *trigger_file;
diff --git a/gfio.c b/gfio.c
index f59238c..2805396 100644
--- a/gfio.c
+++ b/gfio.c
@@ -38,7 +38,7 @@
 #include "gclient.h"
 #include "graph.h"
 
-static int gfio_server_running;
+static bool gfio_server_running;
 static unsigned int gfio_graph_limit = 100;
 
 GdkColor gfio_color_white;
@@ -461,10 +461,10 @@ static int send_job_file(struct gui_entry *ge)
 static void *server_thread(void *arg)
 {
 	fio_server_create_sk_key();
-	is_backend = 1;
-	gfio_server_running = 1;
+	is_backend = true;
+	gfio_server_running = true;
 	fio_start_server(NULL);
-	gfio_server_running = 0;
+	gfio_server_running = false;
 	fio_server_destroy_sk_key();
 	return NULL;
 }
@@ -472,7 +472,7 @@ static void *server_thread(void *arg)
 static void gfio_start_server(struct gui *ui)
 {
 	if (!gfio_server_running) {
-		gfio_server_running = 1;
+		gfio_server_running = true;
 		pthread_create(&ui->server_t, NULL, server_thread, NULL);
 		pthread_detach(ui->server_t);
 	}
diff --git a/init.c b/init.c
index 560da8f..1eddc6f 100644
--- a/init.c
+++ b/init.c
@@ -45,16 +45,16 @@ const char fio_version_string[] = FIO_VERSION;
 
 static char **ini_file;
 static int max_jobs = FIO_MAX_JOBS;
-static int dump_cmdline;
-static int parse_only;
-static int merge_blktrace_only;
+static bool dump_cmdline;
+static bool parse_only;
+static bool merge_blktrace_only;
 
 static struct thread_data def_thread;
 struct thread_data *threads = NULL;
 static char **job_sections;
 static int nr_job_sections;
 
-int exitall_on_terminate = 0;
+bool exitall_on_terminate = false;
 int output_format = FIO_OUTPUT_NORMAL;
 int eta_print = FIO_ETA_AUTO;
 unsigned int eta_interval_msec = 1000;
@@ -64,13 +64,13 @@ FILE *f_err = NULL;
 char *exec_profile = NULL;
 int warnings_fatal = 0;
 int terse_version = 3;
-int is_backend = 0;
-int is_local_backend = 0;
+bool is_backend = false;
+bool is_local_backend = false;
 int nr_clients = 0;
-int log_syslog = 0;
+bool log_syslog = false;
 
-int write_bw_log = 0;
-int read_only = 0;
+bool write_bw_log = false;
+bool read_only = false;
 int status_interval = 0;
 
 char *trigger_file = NULL;
@@ -2487,7 +2487,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 	char *ostr = cmd_optstr;
 	char *pid_file = NULL;
 	void *cur_client = NULL;
-	int backend = 0;
+	bool backend = false;
 
 	/*
 	 * Reset optind handling, since we may call this multiple times
@@ -2513,7 +2513,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			exit_val = 1;
 			break;
 		case 'b':
-			write_bw_log = 1;
+			write_bw_log = true;
 			break;
 		case 'o': {
 			FILE *tmp;
@@ -2568,7 +2568,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			break;
 		case 's':
 			did_arg = true;
-			dump_cmdline = 1;
+			dump_cmdline = true;
 			break;
 		case 'r':
 			read_only = 1;
@@ -2634,7 +2634,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			break;
 		case 'P':
 			did_arg = true;
-			parse_only = 1;
+			parse_only = true;
 			break;
 		case 'x': {
 			size_t new_size;
@@ -2759,8 +2759,8 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			}
 			if (optarg)
 				fio_server_set_arg(optarg);
-			is_backend = 1;
-			backend = 1;
+			is_backend = true;
+			backend = true;
 #else
 			log_err("fio: client/server requires SHM support\n");
 			do_exit++;
@@ -2908,7 +2908,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 
 		case 'A':
 			did_arg = true;
-			merge_blktrace_only = 1;
+			merge_blktrace_only = true;
 			break;
 		case '?':
 			log_err("%s: unrecognized option '%s'\n", argv[0],
diff --git a/options.c b/options.c
index 9b27730..66fb580 100644
--- a/options.c
+++ b/options.c
@@ -482,7 +482,7 @@ static int str_rwmix_write_cb(void *data, unsigned long long *val)
 
 static int str_exitall_cb(void)
 {
-	exitall_on_terminate = 1;
+	exitall_on_terminate = true;
 	return 0;
 }
 
diff --git a/server.c b/server.c
index 1c07501..90d3396 100644
--- a/server.c
+++ b/server.c
@@ -28,7 +28,7 @@
 
 int fio_net_port = FIO_NET_PORT;
 
-int exit_backend = 0;
+bool exit_backend = false;
 
 enum {
 	SK_F_FREE	= 1,
@@ -995,7 +995,7 @@ static int handle_command(struct sk_out *sk_out, struct flist_head *job_list,
 		ret = 0;
 		break;
 	case FIO_NET_CMD_EXIT:
-		exit_backend = 1;
+		exit_backend = true;
 		return -1;
 	case FIO_NET_CMD_LOAD_FILE:
 		ret = handle_load_file_cmd(cmd);
@@ -2492,7 +2492,7 @@ void fio_server_got_signal(int signal)
 		sk_out->sk = -1;
 	else {
 		log_info("\nfio: terminating on signal %d\n", signal);
-		exit_backend = 1;
+		exit_backend = true;
 	}
 }
 
@@ -2574,7 +2574,7 @@ int fio_start_server(char *pidfile)
 
 	setsid();
 	openlog("fio", LOG_NDELAY|LOG_NOWAIT|LOG_PID, LOG_USER);
-	log_syslog = 1;
+	log_syslog = true;
 	close(STDIN_FILENO);
 	close(STDOUT_FILENO);
 	close(STDERR_FILENO);
diff --git a/server.h b/server.h
index 40b9eac..371e51e 100644
--- a/server.h
+++ b/server.h
@@ -232,7 +232,7 @@ extern int fio_net_send_quit(int sk);
 extern int fio_server_create_sk_key(void);
 extern void fio_server_destroy_sk_key(void);
 
-extern int exit_backend;
+extern bool exit_backend;
 extern int fio_net_port;
 
 #endif
diff --git a/stat.h b/stat.h
index 98de281..b4ba71e 100644
--- a/stat.h
+++ b/stat.h
@@ -326,7 +326,7 @@ extern void add_sync_clat_sample(struct thread_stat *ts,
 extern int calc_log_samples(void);
 
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
-extern int write_bw_log;
+extern bool write_bw_log;
 
 static inline bool nsec_to_usec(unsigned long long *min,
 				unsigned long long *max, double *mean,



* Recent changes (master)
@ 2018-10-04 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 21f277b89029e7729d2dc631572244361ea7718c:

  engines/rados: use fio provided file names (2018-10-01 08:18:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8772acb0435c6a378951458d8fd9b18e1ab1add0:

  Fio 3.11 (2018-10-03 12:30:40 -0600)

----------------------------------------------------------------
Dennis Zhou (2):
      Revert "blktrace: support for non-512b sector sizes"
      update replay_align and replay_scale documentation

Jens Axboe (3):
      iolog: fix up some style issues
      iolog: fix leak for unsupported iolog version
      Fio 3.11

 FIO-VERSION-GEN |  2 +-
 HOWTO           |  7 +++---
 blktrace.c      | 75 +++++++++++++++------------------------------------------
 file.h          |  1 -
 fio.1           |  7 +++---
 iolog.c         | 48 +++++++++++++++++++++---------------
 6 files changed, 57 insertions(+), 83 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index f1d25d0..17b215d 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.10
+DEF_VER=fio-3.11
 
 LF='
 '
diff --git a/HOWTO b/HOWTO
index 9528904..a2503e9 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2559,12 +2559,13 @@ I/O replay
 
 .. option:: replay_align=int
 
-	Force alignment of I/O offsets and lengths in a trace to this power of 2
-	value.
+	Force alignment of the byte offsets in a trace to this value. The value
+	must be a power of 2.
 
 .. option:: replay_scale=int
 
-	Scale sector offsets down by this factor when replaying traces.
+	Scale byte offsets down by this factor when replaying traces. Should most
+	likely use :option:`replay_align` as well.
 
 .. option:: replay_skip=str
 
diff --git a/blktrace.c b/blktrace.c
index 772a0c7..efe9ce2 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -3,9 +3,7 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <sys/ioctl.h>
 #include <unistd.h>
-#include <linux/fs.h>
 
 #include "flist.h"
 #include "fio.h"
@@ -129,37 +127,17 @@ static void trace_add_open_close_event(struct thread_data *td, int fileno, enum
 	flist_add_tail(&ipo->list, &td->io_log_list);
 }
 
-static int get_dev_blocksize(const char *dev, unsigned int *bs)
+static int trace_add_file(struct thread_data *td, __u32 device)
 {
-	int fd;
-
-	fd = open(dev, O_RDONLY);
-	if (fd < 0)
-		return 1;
-
-	if (ioctl(fd, BLKSSZGET, bs) < 0) {
-		close(fd);
-		return 1;
-	}
-
-	close(fd);
-	return 0;
-}
-
-static int trace_add_file(struct thread_data *td, __u32 device,
-			  unsigned int *bs)
-{
-	static unsigned int last_maj, last_min, last_fileno, last_bs;
+	static unsigned int last_maj, last_min, last_fileno;
 	unsigned int maj = FMAJOR(device);
 	unsigned int min = FMINOR(device);
 	struct fio_file *f;
-	unsigned int i;
 	char dev[256];
+	unsigned int i;
 
-	if (last_maj == maj && last_min == min) {
-		*bs = last_bs;
+	if (last_maj == maj && last_min == min)
 		return last_fileno;
-	}
 
 	last_maj = maj;
 	last_min = min;
@@ -167,17 +145,14 @@ static int trace_add_file(struct thread_data *td, __u32 device,
 	/*
 	 * check for this file in our list
 	 */
-	for_each_file(td, f, i) {
+	for_each_file(td, f, i)
 		if (f->major == maj && f->minor == min) {
 			last_fileno = f->fileno;
-			last_bs = f->bs;
-			goto out;
+			return last_fileno;
 		}
-	}
 
 	strcpy(dev, "/dev");
 	if (blktrace_lookup_device(td->o.replay_redirect, dev, maj, min)) {
-		unsigned int this_bs;
 		int fileno;
 
 		if (td->o.replay_redirect)
@@ -189,22 +164,13 @@ static int trace_add_file(struct thread_data *td, __u32 device,
 
 		dprint(FD_BLKTRACE, "add devices %s\n", dev);
 		fileno = add_file_exclusive(td, dev);
-
-		if (get_dev_blocksize(dev, &this_bs))
-			this_bs = 512;
-
 		td->o.open_files++;
 		td->files[fileno]->major = maj;
 		td->files[fileno]->minor = min;
-		td->files[fileno]->bs = this_bs;
 		trace_add_open_close_event(td, fileno, FIO_LOG_OPEN_FILE);
-
 		last_fileno = fileno;
-		last_bs = this_bs;
 	}
 
-out:
-	*bs = last_bs;
 	return last_fileno;
 }
 
@@ -221,14 +187,14 @@ static void t_bytes_align(struct thread_options *o, struct blk_io_trace *t)
  */
 static void store_ipo(struct thread_data *td, unsigned long long offset,
 		      unsigned int bytes, int rw, unsigned long long ttime,
-		      int fileno, unsigned int bs)
+		      int fileno)
 {
 	struct io_piece *ipo;
 
 	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
 
-	ipo->offset = offset * bs;
+	ipo->offset = offset * 512;
 	if (td->o.replay_scale)
 		ipo->offset = ipo->offset / td->o.replay_scale;
 	ipo_bytes_align(td->o.replay_align, ipo);
@@ -268,10 +234,9 @@ static void handle_trace_notify(struct blk_io_trace *t)
 static void handle_trace_discard(struct thread_data *td,
 				 struct blk_io_trace *t,
 				 unsigned long long ttime,
-				 unsigned long *ios, unsigned int *rw_bs)
+				 unsigned long *ios, unsigned int *bs)
 {
 	struct io_piece *ipo;
-	unsigned int bs;
 	int fileno;
 
 	if (td->o.replay_skip & (1u << DDIR_TRIM))
@@ -279,17 +244,17 @@ static void handle_trace_discard(struct thread_data *td,
 
 	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
-	fileno = trace_add_file(td, t->device, &bs);
+	fileno = trace_add_file(td, t->device);
 
 	ios[DDIR_TRIM]++;
-	if (t->bytes > rw_bs[DDIR_TRIM])
-		rw_bs[DDIR_TRIM] = t->bytes;
+	if (t->bytes > bs[DDIR_TRIM])
+		bs[DDIR_TRIM] = t->bytes;
 
 	td->o.size += t->bytes;
 
 	INIT_FLIST_HEAD(&ipo->list);
 
-	ipo->offset = t->sector * bs;
+	ipo->offset = t->sector * 512;
 	if (td->o.replay_scale)
 		ipo->offset = ipo->offset / td->o.replay_scale;
 	ipo_bytes_align(td->o.replay_align, ipo);
@@ -311,13 +276,12 @@ static void dump_trace(struct blk_io_trace *t)
 
 static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 			    unsigned long long ttime, unsigned long *ios,
-			    unsigned int *rw_bs)
+			    unsigned int *bs)
 {
-	unsigned int bs;
 	int rw;
 	int fileno;
 
-	fileno = trace_add_file(td, t->device, &bs);
+	fileno = trace_add_file(td, t->device);
 
 	rw = (t->action & BLK_TC_ACT(BLK_TC_WRITE)) != 0;
 
@@ -335,19 +299,18 @@ static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 		return;
 	}
 
-	if (t->bytes > rw_bs[rw])
-		rw_bs[rw] = t->bytes;
+	if (t->bytes > bs[rw])
+		bs[rw] = t->bytes;
 
 	ios[rw]++;
 	td->o.size += t->bytes;
-	store_ipo(td, t->sector, t->bytes, rw, ttime, fileno, bs);
+	store_ipo(td, t->sector, t->bytes, rw, ttime, fileno);
 }
 
 static void handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
 			       unsigned long long ttime, unsigned long *ios)
 {
 	struct io_piece *ipo;
-	unsigned int bs;
 	int fileno;
 
 	if (td->o.replay_skip & (1u << DDIR_SYNC))
@@ -355,7 +318,7 @@ static void handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
 
 	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
-	fileno = trace_add_file(td, t->device, &bs);
+	fileno = trace_add_file(td, t->device);
 
 	ipo->delay = ttime / 1000;
 	ipo->ddir = DDIR_SYNC;
diff --git a/file.h b/file.h
index 446a1fb..e50c0f9 100644
--- a/file.h
+++ b/file.h
@@ -89,7 +89,6 @@ struct fio_file {
 	 */
 	unsigned int major, minor;
 	int fileno;
-	unsigned long long bs;
 	char *file_name;
 
 	/*
diff --git a/fio.1 b/fio.1
index 5c11d96..bf181b3 100644
--- a/fio.1
+++ b/fio.1
@@ -2257,11 +2257,12 @@ Unfortunately this also breaks the strict time ordering between multiple
 device accesses.
 .TP
 .BI replay_align \fR=\fPint
-Force alignment of I/O offsets and lengths in a trace to this power of 2
-value.
+Force alignment of the byte offsets in a trace to this value. The value
+must be a power of 2.
 .TP
 .BI replay_scale \fR=\fPint
-Scale sector offsets down by this factor when replaying traces.
+Scale byte offsets down by this factor when replaying traces. Should most
+likely use \fBreplay_align\fR as well.
 .SS "Threads, processes and job synchronization"
 .TP
 .BI replay_skip \fR=\fPstr
diff --git a/iolog.c b/iolog.c
index 26c3458..b72dcf9 100644
--- a/iolog.c
+++ b/iolog.c
@@ -566,7 +566,9 @@ static bool read_iolog2(struct thread_data *td)
 static bool is_socket(const char *path)
 {
 	struct stat buf;
-	int r = stat(path, &buf);
+	int r;
+
+	r = stat(path, &buf);
 	if (r == -1)
 		return false;
 
@@ -575,19 +577,25 @@ static bool is_socket(const char *path)
 
 static int open_socket(const char *path)
 {
-	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	struct sockaddr_un addr;
+	int ret, fd;
+
+	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
 		return fd;
+
 	addr.sun_family = AF_UNIX;
 	if (snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path) >=
-	    sizeof(addr.sun_path))
+	    sizeof(addr.sun_path)) {
 		log_err("%s: path name %s is too long for a Unix socket\n",
 			__func__, path);
-	if (connect(fd, (const struct sockaddr *)&addr, strlen(path) + sizeof(addr.sun_family)) == 0)
+	}
+
+	ret = connect(fd, (const struct sockaddr *)&addr, strlen(path) + sizeof(addr.sun_family));
+	if (!ret)
 		return fd;
-	else
-		close(fd);
+
+	close(fd);
 	return -1;
 }
 
@@ -596,20 +604,23 @@ static int open_socket(const char *path)
  */
 static bool init_iolog_read(struct thread_data *td)
 {
-	char buffer[256], *p;
+	char buffer[256], *p, *fname;
 	FILE *f = NULL;
-	bool ret;
-	char* fname = get_name_by_idx(td->o.read_iolog_file, td->subjob_number);
+
+	fname = get_name_by_idx(td->o.read_iolog_file, td->subjob_number);
 	dprint(FD_IO, "iolog: name=%s\n", fname);
 
 	if (is_socket(fname)) {
-		int fd = open_socket(fname);
-		if (fd >= 0) {
+		int fd;
+
+		fd = open_socket(fname);
+		if (fd >= 0)
 			f = fdopen(fd, "r");
-		}
 	} else
 		f = fopen(fname, "r");
+
 	free(fname);
+
 	if (!f) {
 		perror("fopen read iolog");
 		return false;
@@ -622,21 +633,20 @@ static bool init_iolog_read(struct thread_data *td)
 		fclose(f);
 		return false;
 	}
-	td->io_log_rfile = f;
+
 	/*
 	 * version 2 of the iolog stores a specific string as the
 	 * first line, check for that
 	 */
 	if (!strncmp(iolog_ver2, buffer, strlen(iolog_ver2))) {
 		free_release_files(td);
-		ret = read_iolog2(td);
-	}
-	else {
-		log_err("fio: iolog version 1 is no longer supported\n");
-		ret = false;
+		td->io_log_rfile = f;
+		return read_iolog2(td);
 	}
 
-	return ret;
+	log_err("fio: iolog version 1 is no longer supported\n");
+	fclose(f);
+	return false;
 }
 
 /*

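After this change, blktrace replay always treats trace sector numbers as 512-byte units and then applies `replay_scale` and `replay_align` to the resulting byte offset, in that order (see `store_ipo()` and `handle_trace_discard()` above). The sketch below isolates that arithmetic. The helper name `replay_byte_offset` is hypothetical, and it assumes alignment rounds the offset down with a power-of-2 mask, which is why the value must be a power of 2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the byte offset fio would replay for a trace sector.
 * Order follows store_ipo(): sector * 512, divide by replay_scale, then
 * round down to replay_align (assumed to be a power-of-2 mask).
 */
static uint64_t replay_byte_offset(uint64_t sector, unsigned int replay_scale,
				   unsigned int replay_align)
{
	uint64_t offset = sector * 512;	/* trace sectors are 512-byte units */

	if (replay_scale)
		offset /= replay_scale;
	if (replay_align)	/* clear the low bits: round down to a multiple */
		offset &= ~((uint64_t)replay_align - 1);
	return offset;
}
```

With `sector = 100`, a `replay_scale` of 3 yields 17066, which is not block aligned; adding `replay_align=4096` snaps it back to 16384. That is why the updated documentation suggests using `replay_align` together with `replay_scale`.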

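The reworked `open_socket()` connects to an `AF_UNIX` stream socket so that `init_iolog_read()` can read an iolog through `fdopen()`. A minimal standalone sketch of the same pattern follows. Unlike the patched function, which logs an over-long path and then still attempts `connect()`, this version returns early when the path does not fit in `sun_path`. The name `connect_unix_stream` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect to a Unix stream socket; fd on success, else -1. */
static int connect_unix_stream(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Refuse paths that would be silently truncated in sun_path. */
	if (snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path) >=
	    (int)sizeof(addr.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr)) == 0)
		return fd;

	close(fd);	/* don't leak the descriptor on failure */
	return -1;
}
```

A caller would then wrap the descriptor with `fdopen(fd, "r")`, as `init_iolog_read()` does.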

* Recent changes (master)
@ 2018-10-02 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2bce52c621f0e1d7a360e8aedfe23300f8e56bb9:

  engines/cpu: fix potential overflow in thinktime calculation (2018-09-30 08:41:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 21f277b89029e7729d2dc631572244361ea7718c:

  engines/rados: use fio provided file names (2018-10-01 08:18:10 -0600)

----------------------------------------------------------------
Adam Kupczyk (1):
      engines/rados: use fio provided file names

 engines/rados.c | 52 +++++++++++++++-------------------------------------
 1 file changed, 15 insertions(+), 37 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rados.c b/engines/rados.c
index c6aec73..86100dc 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -21,8 +21,6 @@ struct fio_rados_iou {
 struct rados_data {
 	rados_t cluster;
 	rados_ioctx_t io_ctx;
-	char **objects;
-	size_t object_count;
 	struct io_u **aio_events;
 	bool connected;
 };
@@ -96,18 +94,11 @@ static int _fio_setup_rados_data(struct thread_data *td,
 	rados->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
 	if (!rados->aio_events)
 		goto failed;
-
-	rados->object_count = td->o.nr_files;
-	rados->objects = calloc(rados->object_count, sizeof(char*));
-	if (!rados->objects)
-		goto failed;
-
 	*rados_data_ptr = rados;
 	return 0;
 
 failed:
 	if (rados) {
-		rados->object_count = 0;
 		if (rados->aio_events)
 			free(rados->aio_events);
 		free(rados);
@@ -115,15 +106,12 @@ failed:
 	return 1;
 }
 
-static void _fio_rados_rm_objects(struct rados_data *rados)
+static void _fio_rados_rm_objects(struct thread_data *td, struct rados_data *rados)
 {
 	size_t i;
-	for (i = 0; i < rados->object_count; ++i) {
-		if (rados->objects[i]) {
-			rados_remove(rados->io_ctx, rados->objects[i]);
-			free(rados->objects[i]);
-			rados->objects[i] = NULL;
-		}
+	for (i = 0; i < td->o.nr_files; i++) {
+		struct fio_file *f = td->files[i];
+		rados_remove(rados->io_ctx, f->file_name);
 	}
 }
 
@@ -136,7 +124,6 @@ static int _fio_rados_connect(struct thread_data *td)
 		td->o.size / (td->o.nr_files ? td->o.nr_files : 1u);
 	struct fio_file *f;
 	uint32_t i;
-	size_t oname_len = 0;
 
 	if (o->cluster_name) {
 		char *client_name = NULL;
@@ -165,6 +152,11 @@ static int _fio_rados_connect(struct thread_data *td)
 	} else
 		r = rados_create(&rados->cluster, o->client_name);
 
+	if (o->pool_name == NULL) {
+		log_err("rados pool name must be provided.\n");
+		goto failed_early;
+	}
+
 	if (r < 0) {
 		log_err("rados_create failed.\n");
 		goto failed_early;
@@ -188,30 +180,18 @@ static int _fio_rados_connect(struct thread_data *td)
 		goto failed_shutdown;
 	}
 
-	for (i = 0; i < rados->object_count; i++) {
+	for (i = 0; i < td->o.nr_files; i++) {
 		f = td->files[i];
 		f->real_file_size = file_size;
-		f->engine_pos = i;
-
-		oname_len = strlen(f->file_name) + 32;
-		rados->objects[i] = malloc(oname_len);
-		/* vary objects for different jobs */
-		snprintf(rados->objects[i], oname_len - 1,
-			"fio_rados_bench.%s.%x",
-			f->file_name, td->thread_number);
-		r = rados_write(rados->io_ctx, rados->objects[i], "", 0, 0);
+		r = rados_write(rados->io_ctx, f->file_name, "", 0, 0);
 		if (r < 0) {
-			free(rados->objects[i]);
-			rados->objects[i] = NULL;
-			log_err("error creating object.\n");
 			goto failed_obj_create;
 		}
 	}
-
-  return 0;
+	return 0;
 
 failed_obj_create:
-	_fio_rados_rm_objects(rados);
+	_fio_rados_rm_objects(td, rados);
 	rados_ioctx_destroy(rados->io_ctx);
 	rados->io_ctx = NULL;
 failed_shutdown:
@@ -226,8 +206,6 @@ static void _fio_rados_disconnect(struct rados_data *rados)
 	if (!rados)
 		return;
 
-	_fio_rados_rm_objects(rados);
-
 	if (rados->io_ctx) {
 		rados_ioctx_destroy(rados->io_ctx);
 		rados->io_ctx = NULL;
@@ -244,8 +222,8 @@ static void fio_rados_cleanup(struct thread_data *td)
 	struct rados_data *rados = td->io_ops_data;
 
 	if (rados) {
+		_fio_rados_rm_objects(td, rados);
 		_fio_rados_disconnect(rados);
-		free(rados->objects);
 		free(rados->aio_events);
 		free(rados);
 	}
@@ -256,7 +234,7 @@ static enum fio_q_status fio_rados_queue(struct thread_data *td,
 {
 	struct rados_data *rados = td->io_ops_data;
 	struct fio_rados_iou *fri = io_u->engine_data;
-	char *object = rados->objects[io_u->file->engine_pos];
+	char *object = io_u->file->file_name;
 	int r = -1;
 
 	fio_ro_check(td, io_u);



* Recent changes (master)
@ 2018-10-01 12:00 Jens Axboe
From: Jens Axboe @ 2018-10-01 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 91d2513127442c4946dc99978870c4dc4f58427d:

  zbd: Avoid duplicating the code for calculating the number of sectors with data (2018-09-29 15:16:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2bce52c621f0e1d7a360e8aedfe23300f8e56bb9:

  engines/cpu: fix potential overflow in thinktime calculation (2018-09-30 08:41:40 -0600)

----------------------------------------------------------------
Jeff Moyer (1):
      fix hung fio process with large I/O sizes and verify= option

Jens Axboe (1):
      engines/cpu: fix potential overflow in thinktime calculation

 engines/cpu.c | 2 +-
 lib/rand.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/cpu.c b/engines/cpu.c
index 0987250..4d572b4 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -85,7 +85,7 @@ static int fio_cpuio_init(struct thread_data *td)
 	 */
 	o->thinktime_blocks = 1;
 	o->thinktime_spin = 0;
-	o->thinktime = (co->cpucycle * (100 - co->cpuload)) / co->cpuload;
+	o->thinktime = ((unsigned long long) co->cpucycle * (100 - co->cpuload)) / co->cpuload;
 
 	o->nr_files = o->open_files = 1;
 
diff --git a/lib/rand.c b/lib/rand.c
index 46ffe4f..99846a8 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -156,7 +156,7 @@ void __fill_random_buf_percentage(unsigned long seed, void *buf,
 		/*
 		 * Fill random chunk
 		 */
-		this_len = (segment * (100 - percentage)) / 100;
+		this_len = ((unsigned long long)segment * (100 - percentage)) / 100;
 		if (this_len > len)
 			this_len = len;
 

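Both hunks in this push fix the same class of bug. When both operands are 32-bit `unsigned int`, the product is computed in 32 bits and wraps before the division, so the result is silently wrong. Casting one operand to `unsigned long long` forces the whole expression into 64-bit arithmetic. The sketch below reproduces this with the thinktime formula. It assumes `cpucycle` and `cpuload` are 32-bit unsigned values on a platform with 32-bit `int`, and the function names are hypothetical. The `segment * (100 - percentage)` expression in `lib/rand.c` has the same shape.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-fix form: the multiplication happens in 32 bits and may wrap. */
static uint64_t thinktime_narrow(uint32_t cpucycle, uint32_t cpuload)
{
	return (cpucycle * (100 - cpuload)) / cpuload;
}

/* Post-fix form: widening one operand makes the whole product 64-bit. */
static uint64_t thinktime_wide(uint32_t cpucycle, uint32_t cpuload)
{
	return ((unsigned long long)cpucycle * (100 - cpuload)) / cpuload;
}
```

With `cpucycle = 50000000` and `cpuload = 1`, the true value is 4950000000. The narrow version wraps modulo 2^32 and returns 655032704.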


* Recent changes (master)
@ 2018-09-30 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-30 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 152529ae26d1167779138b6cd30d4de10623da1b:

  gettime: slightly improve CPU clock calibration (2018-09-27 17:38:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 91d2513127442c4946dc99978870c4dc4f58427d:

  zbd: Avoid duplicating the code for calculating the number of sectors with data (2018-09-29 15:16:26 -0600)

----------------------------------------------------------------
Bart Van Assche (6):
      t/zbd/functions: Make fio_reset_count() return 0 if no resets occurred
      t/zbd/test-zbd-support: Ensure that an assertion failure causes this test to fail
      t/zbd/test-zbd-support: Set fio aux path and forbid file creation
      t/zbd/test-zbd-support: Report a test summary when finished
      zbd: Restore zbd_check_swd()
      zbd: Avoid duplicating the code for calculating the number of sectors with data

Damien Le Moal (1):
      zbd: Fix incorrect comments

 t/zbd/functions        |  5 +++-
 t/zbd/test-zbd-support | 17 +++++++++---
 zbd.c                  | 75 ++++++++++++++++++++++++++++++++++++++++++--------
 zbd.h                  |  4 +--
 4 files changed, 82 insertions(+), 19 deletions(-)

---

Diff of recent changes:

diff --git a/t/zbd/functions b/t/zbd/functions
index 95f9bf4..173f0ca 100644
--- a/t/zbd/functions
+++ b/t/zbd/functions
@@ -102,5 +102,8 @@ fio_written() {
 }
 
 fio_reset_count() {
-    sed -n 's/^.*write:[^;]*; \([0-9]*\) zone resets$/\1/p'
+    local count
+
+    count=$(sed -n 's/^.*write:[^;]*; \([0-9]*\) zone resets$/\1/p')
+    echo "${count:-0}"
 }
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
index 6ee5055..2d72791 100755
--- a/t/zbd/test-zbd-support
+++ b/t/zbd/test-zbd-support
@@ -81,13 +81,14 @@ is_scsi_device() {
 }
 
 run_fio() {
-    local fio
+    local fio opts
 
     fio=$(dirname "$0")/../../fio
 
-    { echo; echo "fio $*"; echo; } >>"${logfile}.${test_number}"
+    opts=("--aux-path=/tmp" "--allow_file_create=0" "$@")
+    { echo; echo "fio ${opts[*]}"; echo; } >>"${logfile}.${test_number}"
 
-    "${dynamic_analyzer[@]}" "$fio" "$@"
+    "${dynamic_analyzer[@]}" "$fio" "${opts[@]}"
 }
 
 run_one_fio_job() {
@@ -113,7 +114,7 @@ run_fio_on_seq() {
 # Check whether buffered writes are refused.
 test1() {
     run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
-	    --size="${zone_size}"					\
+	    --size="${zone_size}" --thread=1				\
 	    --zonemode=zbd --zonesize="${zone_size}" 2>&1 |
 	tee -a "${logfile}.${test_number}" |
 	grep -q 'Using direct I/O is mandatory for writing to ZBD drives'
@@ -800,18 +801,26 @@ fi
 
 logfile=$0.log
 
+passed=0
+failed=0
 rc=0
 for test_number in "${tests[@]}"; do
     rm -f "${logfile}.${test_number}"
     echo -n "Running test $test_number ... "
     if eval "test$test_number"; then
 	status="PASS"
+	((passed++))
     else
 	status="FAIL"
+	((failed++))
 	rc=1
     fi
     echo "$status"
     echo "$status" >> "${logfile}.${test_number}"
 done
 
+echo "$passed tests passed"
+if [ $failed -gt 0 ]; then
+    echo " and $failed tests failed"
+fi
 exit $rc
diff --git a/zbd.c b/zbd.c
index aa08b81..8acda1f 100644
--- a/zbd.c
+++ b/zbd.c
@@ -726,29 +726,76 @@ static bool zbd_dec_and_reset_write_cnt(const struct thread_data *td,
 	return write_cnt == 0;
 }
 
-void zbd_file_reset(struct thread_data *td, struct fio_file *f)
+enum swd_action {
+	CHECK_SWD,
+	SET_SWD,
+};
+
+/* Calculate the number of sectors with data (swd) and perform action 'a' */
+static uint64_t zbd_process_swd(const struct fio_file *f, enum swd_action a)
 {
 	struct fio_zone_info *zb, *ze, *z;
-	uint32_t zone_idx_e;
 	uint64_t swd = 0;
 
-	if (!f->zbd_info)
-		return;
-
 	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
-	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size);
-	ze = &f->zbd_info->zone_info[zone_idx_e];
-	for (z = zb ; z < ze; z++) {
+	ze = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset +
+						  f->io_size)];
+	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
 		swd += z->wp - z->start;
 	}
 	pthread_mutex_lock(&f->zbd_info->mutex);
-	f->zbd_info->sectors_with_data = swd;
+	switch (a) {
+	case CHECK_SWD:
+		assert(f->zbd_info->sectors_with_data == swd);
+		break;
+	case SET_SWD:
+		f->zbd_info->sectors_with_data = swd;
+		break;
+	}
 	pthread_mutex_unlock(&f->zbd_info->mutex);
-	for (z = zb ; z < ze; z++)
+	for (z = zb; z < ze; z++)
 		pthread_mutex_unlock(&z->mutex);
-	dprint(FD_ZBD, "%s(%s): swd = %llu\n", __func__, f->file_name,
-		(unsigned long long) swd);
+
+	return swd;
+}
+
+/*
+ * The swd check is useful for debugging but takes too much time to leave
+ * it enabled all the time. Hence it is disabled by default.
+ */
+static const bool enable_check_swd = false;
+
+/* Check whether the value of zbd_info.sectors_with_data is correct. */
+static void zbd_check_swd(const struct fio_file *f)
+{
+	if (!enable_check_swd)
+		return;
+
+	zbd_process_swd(f, CHECK_SWD);
+}
+
+static void zbd_init_swd(struct fio_file *f)
+{
+	uint64_t swd;
+
+	swd = zbd_process_swd(f, SET_SWD);
+	dprint(FD_ZBD, "%s(%s): swd = %" PRIu64 "\n", __func__, f->file_name,
+	       swd);
+}
+
+void zbd_file_reset(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_zone_info *zb, *ze;
+	uint32_t zone_idx_e;
+
+	if (!f->zbd_info)
+		return;
+
+	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
+	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size);
+	ze = &f->zbd_info->zone_info[zone_idx_e];
+	zbd_init_swd(f);
 	/*
 	 * If data verification is enabled reset the affected zones before
 	 * writing any data to avoid that a zone reset has to be issued while
@@ -1077,6 +1124,8 @@ static void zbd_post_submit(const struct io_u *io_u, bool success)
 	}
 unlock:
 	pthread_mutex_unlock(&z->mutex);
+
+	zbd_check_swd(io_u->file);
 }
 
 bool zbd_unaligned_write(int error_code)
@@ -1129,6 +1178,8 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	    io_u->ddir == DDIR_READ && td->o.read_beyond_wp)
 		return io_u_accept;
 
+	zbd_check_swd(f);
+
 	pthread_mutex_lock(&zb->mutex);
 	switch (io_u->ddir) {
 	case DDIR_READ:
diff --git a/zbd.h b/zbd.h
index d750b67..33e6d8b 100644
--- a/zbd.h
+++ b/zbd.h
@@ -31,8 +31,8 @@ enum io_u_action {
 
 /**
  * struct fio_zone_info - information about a single ZBD zone
- * @start: zone start in 512 byte units
- * @wp: zone write pointer location in 512 byte units
+ * @start: zone start location (bytes)
+ * @wp: zone write pointer location (bytes)
  * @verify_block: number of blocks that have been verified for this zone
  * @mutex: protects the modifiable members in this structure
  * @type: zone type (BLK_ZONE_TYPE_*)

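`zbd_process_swd()` merges the former `zbd_file_reset()` body and the previously `#if 0`'d check into one routine. It sums `wp - start` over the zones that the file covers, then either stores that total (`SET_SWD`) or asserts that it matches the cached value (`CHECK_SWD`). The sketch below keeps only that control flow. The per-zone and per-device mutexes are omitted, and `struct zone`, `struct zbd` and `process_swd` are simplified, hypothetical stand-ins for the fio types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for fio_zone_info / zoned_block_device_info. */
struct zone { uint64_t start, wp; };	/* byte units, per the zbd.h change */
struct zbd {
	struct zone *zones;
	size_t nr_zones;
	uint64_t sectors_with_data;
};

enum swd_action { CHECK_SWD, SET_SWD };

/* Sum the written bytes in zones [zb, ze), then check or store the total. */
static uint64_t process_swd(struct zbd *d, size_t zb, size_t ze,
			    enum swd_action a)
{
	uint64_t swd = 0;

	for (size_t i = zb; i < ze; i++)
		swd += d->zones[i].wp - d->zones[i].start;

	switch (a) {
	case CHECK_SWD:
		assert(d->sectors_with_data == swd);
		break;
	case SET_SWD:
		d->sectors_with_data = swd;
		break;
	}
	return swd;
}
```

As in the patch, the check path is meant to be gated by a compile-time constant (`enable_check_swd`), because walking and locking every zone on each I/O is too slow to leave enabled by default.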


* Recent changes (master)
@ 2018-09-28 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d60be7d51cbb601cc59dccc9f2a418072046a985:

  zbd: Remove unused function and variable (2018-09-26 08:26:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 152529ae26d1167779138b6cd30d4de10623da1b:

  gettime: slightly improve CPU clock calibration (2018-09-27 17:38:21 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      zbd: Fix zbd_zone_nr()

Jens Axboe (3):
      Merge branch 'master' of https://github.com/bvanassche/fio
      server: be locally vocal about communication issues
      gettime: slightly improve CPU clock calibration

 gettime.c | 5 ++---
 server.c  | 4 ++++
 zbd.c     | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/gettime.c b/gettime.c
index c0f2638..7702193 100644
--- a/gettime.c
+++ b/gettime.c
@@ -237,12 +237,11 @@ static unsigned long get_cycles_per_msec(void)
 	c_s = get_cpu_clock();
 	do {
 		__fio_gettime(&e);
+		c_e = get_cpu_clock();
 
 		elapsed = utime_since(&s, &e);
-		if (elapsed >= 1280) {
-			c_e = get_cpu_clock();
+		if (elapsed >= 1280)
 			break;
-		}
 	} while (1);
 
 	fio_clock_source = old_cs;
diff --git a/server.c b/server.c
index b966c66..1c07501 100644
--- a/server.c
+++ b/server.c
@@ -296,6 +296,8 @@ static int verify_convert_cmd(struct fio_net_cmd *cmd)
 	if (crc != cmd->cmd_crc16) {
 		log_err("fio: server bad crc on command (got %x, wanted %x)\n",
 				cmd->cmd_crc16, crc);
+		fprintf(f_err, "fio: server bad crc on command (got %x, wanted %x)\n",
+				cmd->cmd_crc16, crc);
 		return 1;
 	}
 
@@ -310,6 +312,8 @@ static int verify_convert_cmd(struct fio_net_cmd *cmd)
 		break;
 	default:
 		log_err("fio: bad server cmd version %d\n", cmd->version);
+		fprintf(f_err, "fio: client/server version mismatch (%d != %d)\n",
+				cmd->version, FIO_SERVER_VER);
 		return 1;
 	}
 
diff --git a/zbd.c b/zbd.c
index 9c52587..aa08b81 100644
--- a/zbd.c
+++ b/zbd.c
@@ -606,7 +606,7 @@ static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
 static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
 				struct fio_zone_info *zone)
 {
-	return (uintptr_t) zone - (uintptr_t) zbd_info->zone_info;
+	return zone - zbd_info->zone_info;
 }
 
 /**

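The `zbd_zone_nr()` fix relies on C pointer arithmetic. Subtracting two pointers into the same array gives the distance in elements. Casting both to `uintptr_t` first gives the distance in bytes, which is `sizeof(struct fio_zone_info)` times too large for a zone index. The self-contained illustration below uses a hypothetical stand-in struct, not fio's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct zone_info { uint64_t start, wp; int type; };	/* stand-in layout */

/* Pre-fix form: byte distance between the two addresses. */
static size_t zone_nr_bytes(const struct zone_info *base,
			    const struct zone_info *z)
{
	return (uintptr_t)z - (uintptr_t)base;
}

/* Post-fix form: element distance, i.e. the zone index. */
static size_t zone_nr(const struct zone_info *base, const struct zone_info *z)
{
	return (size_t)(z - base);
}
```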


* Recent changes (master)
@ 2018-09-27 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7676a1c25fcdacfe27d84a0f86fe68077b7de79a:

  parse: fix negative FIO_OPT_INT too-large check (2018-09-25 20:15:05 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d60be7d51cbb601cc59dccc9f2a418072046a985:

  zbd: Remove unused function and variable (2018-09-26 08:26:01 -0600)

----------------------------------------------------------------
Damien Le Moal (2):
      eta: Avoid adjustements to a negative value
      zbd: Remove unused function and variable

 eta.c | 21 ++++++++++++++++++---
 zbd.c | 31 ++-----------------------------
 2 files changed, 20 insertions(+), 32 deletions(-)

---

Diff of recent changes:

diff --git a/eta.c b/eta.c
index 970a67d..b69dd19 100644
--- a/eta.c
+++ b/eta.c
@@ -177,12 +177,27 @@ static unsigned long thread_eta(struct thread_data *td)
 		bytes_total = td->fill_device_size;
 	}
 
-	if (td->o.zone_size && td->o.zone_skip && bytes_total) {
+	/*
+	 * If io_size is set, bytes_total is an exact value that does not need
+	 * adjustment.
+	 */
+	if (td->o.zone_size && td->o.zone_skip && bytes_total &&
+	    !fio_option_is_set(&td->o, io_size)) {
 		unsigned int nr_zones;
 		uint64_t zone_bytes;
 
-		zone_bytes = bytes_total + td->o.zone_size + td->o.zone_skip;
-		nr_zones = (zone_bytes - 1) / (td->o.zone_size + td->o.zone_skip);
+		/*
+		 * Calculate the upper bound of the number of zones that will
+		 * be processed, including skipped bytes between zones. If this
+		 * is larger than total_io_size (e.g. when --io_size or --size
+		 * specify a small value), use the lower bound to avoid
+		 * adjustments to a negative value that would result in a very
+		 * large bytes_total and an incorrect eta.
+		 */
+		zone_bytes = td->o.zone_size + td->o.zone_skip;
+		nr_zones = (bytes_total + zone_bytes - 1) / zone_bytes;
+		if (bytes_total < nr_zones * td->o.zone_skip)
+			nr_zones = bytes_total / zone_bytes;
 		bytes_total -= nr_zones * td->o.zone_skip;
 	}
 
diff --git a/zbd.c b/zbd.c
index 9c3092a..9c52587 100644
--- a/zbd.c
+++ b/zbd.c
@@ -620,12 +620,10 @@ static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
 static int zbd_reset_zone(struct thread_data *td, const struct fio_file *f,
 			  struct fio_zone_info *z)
 {
-	int ret;
-
 	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
 		zbd_zone_nr(f->zbd_info, z));
-	ret = zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
-	return ret;
+
+	return zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
 }
 
 /*
@@ -728,29 +726,6 @@ static bool zbd_dec_and_reset_write_cnt(const struct thread_data *td,
 	return write_cnt == 0;
 }
 
-/* Check whether the value of zbd_info.sectors_with_data is correct. */
-static void check_swd(const struct thread_data *td, const struct fio_file *f)
-{
-#if 0
-	struct fio_zone_info *zb, *ze, *z;
-	uint64_t swd;
-
-	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
-	ze = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset +
-						  f->io_size)];
-	swd = 0;
-	for (z = zb; z < ze; z++) {
-		pthread_mutex_lock(&z->mutex);
-		swd += z->wp - z->start;
-	}
-	pthread_mutex_lock(&f->zbd_info->mutex);
-	assert(f->zbd_info->sectors_with_data == swd);
-	pthread_mutex_unlock(&f->zbd_info->mutex);
-	for (z = zb; z < ze; z++)
-		pthread_mutex_unlock(&z->mutex);
-#endif
-}
-
 void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 {
 	struct fio_zone_info *zb, *ze, *z;
@@ -1227,7 +1202,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		}
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
-			check_swd(td, f);
 			if (f->zbd_info->sectors_with_data >=
 			    f->io_size * td->o.zrt.u.f &&
 			    zbd_dec_and_reset_write_cnt(td, f)) {
@@ -1248,7 +1222,6 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			zb->reset_zone = 0;
 			if (zbd_reset_zone(td, f, zb) < 0)
 				goto eof;
-			check_swd(td, f);
 		}
 		/* Make writes occur at the write pointer */
 		assert(!zbd_zone_full(f, zb, min_bs));



* Recent changes (master)
@ 2018-09-26 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-26 12:00 UTC
  To: fio

The following changes since commit 9f2cd5e0ccbce6b65276c1401cdcf2cb8b77b9ff:

  stat: print the right percentile variant for json output (2018-09-22 15:02:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7676a1c25fcdacfe27d84a0f86fe68077b7de79a:

  parse: fix negative FIO_OPT_INT too-large check (2018-09-25 20:15:05 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      parse: print option name for out-of-range settings
      parse: fix min/max val checking for FIO_OPT_INT
      Add regression test for flow/negative option parser breakage
      parse: fix minval checking
      parse: fix negative FIO_OPT_INT too-large check

 parse.c                   | 39 +++++++++++++++++++++++++++++++++------
 t/jobs/t0011-5d2788d5.fio | 18 ++++++++++++++++++
 2 files changed, 51 insertions(+), 6 deletions(-)
 create mode 100644 t/jobs/t0011-5d2788d5.fio

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index 5d88d91..a7d4516 100644
--- a/parse.c
+++ b/parse.c
@@ -506,6 +506,33 @@ static const char *opt_type_name(const struct fio_option *o)
 	return "OPT_UNKNOWN?";
 }
 
+static bool val_too_large(const struct fio_option *o, unsigned long long val,
+			  bool is_uint)
+{
+	if (!o->maxval)
+		return false;
+
+	if (is_uint) {
+		if ((int) val < 0)
+			return (int) val > (int) o->maxval;
+		return (unsigned int) val > o->maxval;
+	}
+
+	return val > o->maxval;
+}
+
+static bool val_too_small(const struct fio_option *o, unsigned long long val,
+			  bool is_uint)
+{
+	if (!o->minval)
+		return false;
+
+	if (is_uint)
+		return (int) val < o->minval;
+
+	return val < o->minval;
+}
+
 static int __handle_option(const struct fio_option *o, const char *ptr,
 			   void *data, int first, int more, int curr)
 {
@@ -595,14 +622,14 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 			return 1;
 		}
 
-		if (o->maxval && ull > o->maxval) {
-			log_err("max value out of range: %llu"
-					" (%llu max)\n", ull, o->maxval);
+		if (val_too_large(o, ull, o->type == FIO_OPT_INT)) {
+			log_err("%s: max value out of range: %llu"
+				" (%llu max)\n", o->name, ull, o->maxval);
 			return 1;
 		}
-		if (o->minval && ull < o->minval) {
-			log_err("min value out of range: %lld"
-					" (%d min)\n", ull, o->minval);
+		if (val_too_small(o, ull, o->type == FIO_OPT_INT)) {
+			log_err("%s: min value out of range: %lld"
+				" (%d min)\n", o->name, ull, o->minval);
 			return 1;
 		}
 		if (o->posval[0].ival) {
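The following standalone sketch shows why the old `ull > o->maxval` test tripped on `flow=-8`, the case exercised by the regression job below. It copies the logic of the two new helpers. `struct opt_limits` and `old_val_too_large()` are hypothetical stand-ins for the relevant fields of `struct fio_option` and for the pre-patch check. The sketch makes two assumptions. First, a negative option value ends up sign-extended in the `unsigned long long` that the range checks receive. Second, casting it back to `int` yields the original value, which holds on the usual two's-complement targets.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the limit fields of struct fio_option. */
struct opt_limits {
	unsigned long long maxval;	/* 0: no upper limit */
	int minval;			/* 0: no lower limit */
};

/* Same logic as the patched val_too_large(). */
static bool val_too_large(const struct opt_limits *o, unsigned long long val,
			  bool is_uint)
{
	if (!o->maxval)
		return false;

	if (is_uint) {
		/* A negative int was sign-extended into val: compare as int. */
		if ((int) val < 0)
			return (int) val > (int) o->maxval;
		return (unsigned int) val > o->maxval;
	}

	return val > o->maxval;
}

/* Same logic as the patched val_too_small(). */
static bool val_too_small(const struct opt_limits *o, unsigned long long val,
			  bool is_uint)
{
	if (!o->minval)
		return false;

	if (is_uint)
		return (int) val < o->minval;

	return val < o->minval;
}

/* The pre-patch upper-bound test, kept for comparison. */
static bool old_val_too_large(const struct opt_limits *o,
			      unsigned long long val)
{
	return o->maxval && val > o->maxval;
}
```

Take a hypothetical range of -1000..1000. As an `unsigned long long`, `-8` becomes 18446744073709551608, so the old check rejects it. The new helpers compare it as the `int` -8 and accept it.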
diff --git a/t/jobs/t0011-5d2788d5.fio b/t/jobs/t0011-5d2788d5.fio
new file mode 100644
index 0000000..09861f7
--- /dev/null
+++ b/t/jobs/t0011-5d2788d5.fio
@@ -0,0 +1,18 @@
+# Expected results: no parse warnings, runs with roughly 1/8 iops between
+#			the two jobs.
+# Buggy result: parse warning on flow value overflow, no 1/8 division between
+#			jobs.
+#
+[global]
+bs=4k
+ioengine=null
+size=100g
+runtime=3
+flow_id=1
+
+[flow1]
+flow=-8
+rate_iops=1000
+
+[flow2]
+flow=1



* Recent changes (master)
@ 2018-09-23 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-23 12:00 UTC
  To: fio

The following changes since commit 843c6b5e081900394da002b45253e541b794ac54:

  Merge branch 'steadystate-doc' of https://github.com/vincentkfu/fio (2018-09-21 09:50:07 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9f2cd5e0ccbce6b65276c1401cdcf2cb8b77b9ff:

  stat: print the right percentile variant for json output (2018-09-22 15:02:04 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      stat: print the right percentile variant for json output

 stat.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 5fca998..ef9c4af 100644
--- a/stat.c
+++ b/stat.c
@@ -1059,10 +1059,16 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
 		if (ddir_rw(ddir)) {
+			uint64_t samples;
+
+			if (ts->clat_percentiles)
+				samples = ts->clat_stat[ddir].samples;
+			else
+				samples = ts->lat_stat[ddir].samples;
+
 			len = calc_clat_percentiles(ts->io_u_plat[ddir],
-					ts->clat_stat[ddir].samples,
-					ts->percentile_list, &ovals, &maxv,
-					&minv);
+					samples, ts->percentile_list, &ovals,
+					&maxv, &minv);
 		} else {
 			len = calc_clat_percentiles(ts->io_u_sync_plat,
 					ts->sync_stat.samples,
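The fix pairs the sample count with the latency histogram that the JSON percentiles are computed from. Passing the clat count while the histogram holds total-latency samples would presumably skew the percentile positions. Below is a minimal sketch of that selection. It assumes, as the hunk implies, that `io_u_plat` holds completion-latency data when `clat_percentiles` is set and total-latency data otherwise. `struct lat_samples` and `percentile_samples()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down per-direction sample counters. */
struct lat_samples {
	uint64_t clat_samples;	/* completion-latency samples */
	uint64_t lat_samples;	/* total-latency samples */
};

/*
 * Pick the sample count that matches the histogram being summarized:
 * clat when clat percentiles are enabled, total latency otherwise.
 */
static uint64_t percentile_samples(const struct lat_samples *s,
				   bool clat_percentiles)
{
	return clat_percentiles ? s->clat_samples : s->lat_samples;
}
```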



* Recent changes (master)
@ 2018-09-22 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-22 12:00 UTC
  To: fio

The following changes since commit 4d2707ef027cc5b2418ab5de622318e0f70c096a:

  blktrace: fix leak of 'merge_buf' (2018-09-20 14:25:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 843c6b5e081900394da002b45253e541b794ac54:

  Merge branch 'steadystate-doc' of https://github.com/vincentkfu/fio (2018-09-21 09:50:07 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      engines/http: work-around crash in certain libssl versions
      Merge branch 'steadystate-doc' of https://github.com/vincentkfu/fio

Vincent Fu (1):
      docs: clarify usage of steadystate detection feature

 HOWTO          | 4 ++++
 engines/http.c | 2 ++
 fio.1          | 6 ++++++
 3 files changed, 12 insertions(+)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 45cf0bd..9528904 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3000,6 +3000,10 @@ Steady state
 	data from the rolling collection window. Threshold limits can be expressed
 	as a fixed value or as a percentage of the mean in the collection window.
 
+	When using this feature, most jobs should include the :option:`time_based`
+	and :option:`runtime` options or the :option:`loops` option so that fio does not
+	stop running after it has covered the full size of the specified file(s) or device(s).
+
 		**iops**
 			Collect IOPS data. Stop the job if all individual IOPS measurements
 			are within the specified limit of the mean IOPS (e.g., ``iops:2``
diff --git a/engines/http.c b/engines/http.c
index 93fcd0d..d81e428 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -273,6 +273,8 @@ static void _hmac(unsigned char *md, void *key, int key_len, char *data) {
 	ctx = HMAC_CTX_new();
 #else
 	ctx = &_ctx;
+	/* work-around crash in certain versions of libssl */
+	HMAC_CTX_init(ctx);
 #endif
 	HMAC_Init_ex(ctx, key, key_len, EVP_sha256(), NULL);
 	HMAC_Update(ctx, (unsigned char*)data, strlen(data));
diff --git a/fio.1 b/fio.1
index 81164ae..5c11d96 100644
--- a/fio.1
+++ b/fio.1
@@ -2671,6 +2671,12 @@ steady state assessment criteria. All assessments are carried out using only
 data from the rolling collection window. Threshold limits can be expressed
 as a fixed value or as a percentage of the mean in the collection window.
 .RS
+.P
+When using this feature, most jobs should include the \fBtime_based\fR
+and \fBruntime\fR options or the \fBloops\fR option so that fio does not
+stop running after it has covered the full size of the specified file(s)
+or device(s).
+.RS
 .RS
 .TP
 .B iops
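Both hunks describe the same threshold rule: every measurement in the rolling window must fall within a fixed or percentage limit of the window mean. The sketch below illustrates that reading of the documentation. It is not fio's steadystate.c. `window_is_steady()` is hypothetical and ignores fio's slope-based variants.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: is every sample in the window within `limit` of
 * the window mean? With pct set, `limit` is a percentage of the mean;
 * otherwise it is an absolute value in the samples' own units.
 */
static bool window_is_steady(const double *samples, size_t n, double limit,
			     bool pct)
{
	double sum = 0.0, mean, bound;
	size_t i;

	if (n == 0)
		return false;

	for (i = 0; i < n; i++)
		sum += samples[i];
	mean = sum / (double) n;

	bound = pct ? mean * limit / 100.0 : limit;

	for (i = 0; i < n; i++) {
		double dev = samples[i] - mean;

		if (dev < 0)
			dev = -dev;
		if (dev > bound)
			return false;
	}

	return true;
}
```

For the samples {100, 101, 99, 100}, the mean is 100. A fixed limit of 2 or a 2% limit passes, while a fixed limit of 0.5 or a 0.5% limit fails.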



* Recent changes (master)
@ 2018-09-21 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-21 12:00 UTC
  To: fio

The following changes since commit 5fa4aff59e4eec471b2c534702f0162006845c4b:

  Merge branch 'epoch-time-hist-logs' of https://github.com/parallel-fs-utils/fio (2018-09-19 09:42:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4d2707ef027cc5b2418ab5de622318e0f70c096a:

  blktrace: fix leak of 'merge_buf' (2018-09-20 14:25:56 -0600)

----------------------------------------------------------------
Dennis Zhou (4):
      options: rename name string operations for more general use
      blktrace: add support to interleave blktrace files
      blktrace: add option to scale a trace
      blktrace: add option to iterate over a trace multiple times

Jens Axboe (2):
      blktrace: provide empty merge_blktrace_iologs()
      blktrace: fix leak of 'merge_buf'

 HOWTO            |  71 +++++++++++++++++++
 blktrace.c       | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 blktrace.h       |  22 ++++++
 cconv.c          |  15 ++++
 fio.1            |  66 +++++++++++++++++
 init.c           |  21 ++++++
 options.c        |  47 ++++++++++---
 options.h        |   2 +
 server.h         |   2 +-
 thread_options.h |   6 ++
 10 files changed, 453 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0c5b710..45cf0bd 100644
--- a/HOWTO
+++ b/HOWTO
@@ -100,6 +100,10 @@ Command line options
 
 	Parse options only, don't start any I/O.
 
+.. option:: --merge-blktrace-only
+
+	Merge blktraces only, don't start any I/O.
+
 .. option:: --output=filename
 
 	Write output to file `filename`.
@@ -2491,6 +2495,33 @@ I/O replay
 	will be read at once. If selected true, input from iolog will be read
 	gradually. Useful when iolog is very large, or it is generated.
 
+.. option:: merge_blktrace_file=str
+
+	When specified, rather than replaying the logs passed to :option:`read_iolog`,
+	the logs go through a merge phase which aggregates them into a single
+	blktrace. The resulting file is then passed on as the :option:`read_iolog`
+	parameter. The intention here is to make the order of events consistent.
+	This limits the influence of the scheduler compared to replaying multiple
+	blktraces via concurrent jobs.
+
+.. option:: merge_blktrace_scalars=float_list
+
+	This is a percentage-based option that is index-paired with the list of
+	files passed to :option:`read_iolog`. When merging is performed, scale
+	the time of each event by the corresponding amount. For example,
+	``--merge_blktrace_scalars="50:100"`` runs the first trace in half the time
+	and the second trace in real time. This knob is tunable separately from
+	:option:`replay_time_scale`, which scales the trace during runtime and,
+	unlike this option, does not change the output of the merge.
+
+.. option:: merge_blktrace_iters=float_list
+
+	This is a whole number option that is index paired with the list of files
+	passed to :option:`read_iolog`. When merging is performed, run each trace
+	for the specified number of iterations. For example,
+	``--merge_blktrace_iters="2:1"`` runs the first trace for two iterations
+	and the second trace for one iteration.
+
 .. option:: replay_no_stall=bool
 
 	When replaying I/O with :option:`read_iolog` the default behavior is to
@@ -3839,6 +3870,46 @@ given in bytes. The `action` can be one of these:
 **trim**
 	   Trim the given file from the given `offset` for `length` bytes.
 
+
+I/O Replay - Merging Traces
+---------------------------
+
+Colocation is a common practice used to get the most out of a machine.
+Knowing which workloads play nicely with each other and which ones don't is
+a much harder task. While fio can replay workloads concurrently via multiple
+jobs, it leaves some variability up to the scheduler, making results harder to
+reproduce. Merging is a way to make the order of events consistent.
+
+Merging is integrated into I/O replay and is done when a
+:option:`merge_blktrace_file` is specified. The files passed to
+:option:`read_iolog` go through the merge process, which writes a single
+merged trace to the specified file. That file is then used as if it were the
+only file passed to :option:`read_iolog`. An example would look like::
+
+	$ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
+
+Creating only the merged file can be done by passing the command line argument
+:option:`--merge-blktrace-only`.
+
+Scaling traces can be done to see the relative impact of any particular trace
+being slowed down or sped up. :option:`merge_blktrace_scalars` takes in a
+colon-separated list of percentage scalars. It is index-paired with the files
+passed to :option:`read_iolog`.
+
+With scaling, it may be desirable to match the running time of all traces.
+This can be done with :option:`merge_blktrace_iters`. It is index-paired with
+:option:`read_iolog` just like :option:`merge_blktrace_scalars`.
+
+For example, given two traces, A and B, each 60s long, the following runs
+trace A with IOs issued twice as fast and repeats it over the runtime of
+trace B::
+
+	$ fio --read_iolog="<trace_a>:<trace_b>" --merge_blktrace_file="<output_file>" --merge_blktrace_scalars="50:100" --merge_blktrace_iters="2:1"
+
+This runs trace A at 2x the speed twice for approximately the same runtime as
+a single run of trace B.
+
+
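The scalar and iteration options both act on event timestamps while the traces are merged. In the `blktrace.c` diff of this message, `read_trace()` adjusts each event as `(time + iter * length) * scalar / 100`. The standalone helper here applies the same arithmetic so that the effect of a scalar can be checked by hand. Its name, `merged_event_time()`, is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the timestamp adjustment in read_trace():
 * shift the event by the completed iterations of its trace, then scale it
 * by the trace's percentage from merge_blktrace_scalars.
 */
static uint64_t merged_event_time(uint64_t time, unsigned int iter,
				  uint64_t length, unsigned int scalar_pct)
{
	return (time + (uint64_t) iter * length) * scalar_pct / 100;
}
```

Consider an event at time 10 in iteration 1 of a trace whose recorded length is 60. With a scalar of 50, it maps to (10 + 60) * 50 / 100 = 35, in the trace's own time units. The integer division truncates, so the same scalar maps an event at time 3 in iteration 0 to 1.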
 CPU idleness profiling
 ----------------------
 
diff --git a/blktrace.c b/blktrace.c
index 36a7180..772a0c7 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -4,6 +4,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/ioctl.h>
+#include <unistd.h>
 #include <linux/fs.h>
 
 #include "flist.h"
@@ -613,3 +614,213 @@ err:
 	fifo_free(fifo);
 	return false;
 }
+
+static int init_merge_param_list(fio_fp64_t *vals, struct blktrace_cursor *bcs,
+				 int nr_logs, int def, size_t off)
+{
+	int i = 0, len = 0;
+
+	while (len < FIO_IO_U_LIST_MAX_LEN && vals[len].u.f != 0.0)
+		len++;
+
+	if (len && len != nr_logs)
+		return len;
+
+	for (i = 0; i < nr_logs; i++) {
+		int *val = (int *)((char *)&bcs[i] + off);
+		*val = def;
+		if (len)
+			*val = (int)vals[i].u.f;
+	}
+
+	return 0;
+
+}
+
+static int find_earliest_io(struct blktrace_cursor *bcs, int nr_logs)
+{
+	__u64 time = ~(__u64)0;
+	int idx = 0, i;
+
+	for (i = 0; i < nr_logs; i++) {
+		if (bcs[i].t.time < time) {
+			time = bcs[i].t.time;
+			idx = i;
+		}
+	}
+
+	return idx;
+}
+
+static void merge_finish_file(struct blktrace_cursor *bcs, int i, int *nr_logs)
+{
+	bcs[i].iter++;
+	if (bcs[i].iter < bcs[i].nr_iter) {
+		lseek(bcs[i].fd, 0, SEEK_SET);
+		return;
+	}
+
+	*nr_logs -= 1;
+
+	/* close file */
+	fifo_free(bcs[i].fifo);
+	close(bcs[i].fd);
+
+	/* keep active files contiguous */
+	memmove(&bcs[i], &bcs[*nr_logs], sizeof(bcs[i]));
+}
+
+static int read_trace(struct thread_data *td, struct blktrace_cursor *bc)
+{
+	int ret = 0;
+	struct blk_io_trace *t = &bc->t;
+
+read_skip:
+	/* read an io trace */
+	ret = trace_fifo_get(td, bc->fifo, bc->fd, t, sizeof(*t));
+	if (ret < 0) {
+		return ret;
+	} else if (!ret) {
+		if (!bc->length)
+			bc->length = bc->t.time;
+		return ret;
+	} else if (ret < (int) sizeof(*t)) {
+		log_err("fio: short fifo get\n");
+		return -1;
+	}
+
+	if (bc->swap)
+		byteswap_trace(t);
+
+	/* skip over actions that fio does not care about */
+	if ((t->action & 0xffff) != __BLK_TA_QUEUE ||
+	    t_get_ddir(t) == DDIR_INVAL) {
+		ret = discard_pdu(td, bc->fifo, bc->fd, t);
+		if (ret < 0) {
+			td_verror(td, ret, "blktrace lseek");
+			return ret;
+		} else if (t->pdu_len != ret) {
+			log_err("fio: discarded %d of %d\n", ret,
+				t->pdu_len);
+			return -1;
+		}
+		goto read_skip;
+	}
+
+	t->time = (t->time + bc->iter * bc->length) * bc->scalar / 100;
+
+	return ret;
+}
+
+static int write_trace(FILE *fp, struct blk_io_trace *t)
+{
+	/* pdu is not used so just write out only the io trace */
+	t->pdu_len = 0;
+	return fwrite((void *)t, sizeof(*t), 1, fp);
+}
+
+int merge_blktrace_iologs(struct thread_data *td)
+{
+	int nr_logs = get_max_str_idx(td->o.read_iolog_file);
+	struct blktrace_cursor *bcs = malloc(sizeof(struct blktrace_cursor) *
+					     nr_logs);
+	struct blktrace_cursor *bc;
+	FILE *merge_fp;
+	char *str, *ptr, *name, *merge_buf;
+	int i, ret;
+
+	ret = init_merge_param_list(td->o.merge_blktrace_scalars, bcs, nr_logs,
+				    100, offsetof(struct blktrace_cursor,
+						  scalar));
+	if (ret) {
+		log_err("fio: merge_blktrace_scalars(%d) != nr_logs(%d)\n",
+			ret, nr_logs);
+		goto err_param;
+	}
+
+	ret = init_merge_param_list(td->o.merge_blktrace_iters, bcs, nr_logs,
+				    1, offsetof(struct blktrace_cursor,
+						nr_iter));
+	if (ret) {
+		log_err("fio: merge_blktrace_iters(%d) != nr_logs(%d)\n",
+			ret, nr_logs);
+		goto err_param;
+	}
+
+	/* setup output file */
+	merge_fp = fopen(td->o.merge_blktrace_file, "w");
+	merge_buf = malloc(128 * 1024);
+	ret = setvbuf(merge_fp, merge_buf, _IOFBF, 128 * 1024);
+	if (ret)
+		goto err_out_file;
+
+	/* setup input files */
+	str = ptr = strdup(td->o.read_iolog_file);
+	nr_logs = 0;
+	for (i = 0; (name = get_next_str(&ptr)) != NULL; i++) {
+		bcs[i].fd = open(name, O_RDONLY);
+		if (bcs[i].fd < 0) {
+			log_err("fio: could not open file: %s\n", name);
+			ret = bcs[i].fd;
+			goto err_file;
+		}
+		bcs[i].fifo = fifo_alloc(TRACE_FIFO_SIZE);
+		nr_logs++;
+
+		if (!is_blktrace(name, &bcs[i].swap)) {
+			log_err("fio: file is not a blktrace: %s\n", name);
+			goto err_file;
+		}
+
+		ret = read_trace(td, &bcs[i]);
+		if (ret < 0) {
+			goto err_file;
+		} else if (!ret) {
+			merge_finish_file(bcs, i, &nr_logs);
+			i--;
+		}
+	}
+	free(str);
+
+	/* merge files */
+	while (nr_logs) {
+		i = find_earliest_io(bcs, nr_logs);
+		bc = &bcs[i];
+		/* skip over the pdu */
+		ret = discard_pdu(td, bc->fifo, bc->fd, &bc->t);
+		if (ret < 0) {
+			td_verror(td, ret, "blktrace lseek");
+			goto err_file;
+		} else if (bc->t.pdu_len != ret) {
+			log_err("fio: discarded %d of %d\n", ret,
+				bc->t.pdu_len);
+			goto err_file;
+		}
+
+		ret = write_trace(merge_fp, &bc->t);
+		ret = read_trace(td, bc);
+		if (ret < 0)
+			goto err_file;
+		else if (!ret)
+			merge_finish_file(bcs, i, &nr_logs);
+	}
+
+	/* set iolog file to read from the newly merged file */
+	td->o.read_iolog_file = td->o.merge_blktrace_file;
+	ret = 0;
+
+err_file:
+	/* cleanup */
+	for (i = 0; i < nr_logs; i++) {
+		fifo_free(bcs[i].fifo);
+		close(bcs[i].fd);
+	}
+err_out_file:
+	fflush(merge_fp);
+	fclose(merge_fp);
+	free(merge_buf);
+err_param:
+	free(bcs);
+
+	return ret;
+}
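`merge_blktrace_iologs()` is a k-way merge. Each step emits the cursor with the earliest pending event, which `find_earliest_io()` finds by linear scan. `merge_finish_file()` removes an exhausted cursor by copying the last active cursor into its slot. The in-memory sketch below shows the same selection and compaction on arrays of timestamps. `struct cursor`, `find_earliest()` and `merge_cursors()` are hypothetical names. The sketch leaves out file I/O, PDU skipping and the per-trace iteration rewind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory cursor standing in for struct blktrace_cursor. */
struct cursor {
	const uint64_t *times;	/* event timestamps, ascending */
	size_t len;
	size_t pos;
};

/* Linear scan for the earliest pending event, as in find_earliest_io(). */
static int find_earliest(const struct cursor *cs, int nr)
{
	uint64_t best = UINT64_MAX;
	int idx = 0, i;

	for (i = 0; i < nr; i++) {
		uint64_t t = cs[i].times[cs[i].pos];

		if (t < best) {
			best = t;
			idx = i;
		}
	}
	return idx;
}

/*
 * Merge all cursors into out[], which the caller sizes to the total event
 * count, and return the number of events written. An exhausted cursor is
 * replaced by the last active one, keeping active cursors contiguous.
 */
static size_t merge_cursors(struct cursor *cs, int nr, uint64_t *out)
{
	size_t n = 0;
	int i;

	/* Drop empty inputs up front. */
	for (i = 0; i < nr; ) {
		if (cs[i].len == 0)
			cs[i] = cs[--nr];
		else
			i++;
	}

	while (nr > 0) {
		i = find_earliest(cs, nr);
		out[n++] = cs[i].times[cs[i].pos++];
		if (cs[i].pos == cs[i].len)
			cs[i] = cs[--nr];
	}
	return n;
}
```

The linear scan costs O(nr_logs) per event. That is cheap for a handful of traces; a min-heap would only pay off with many inputs.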
diff --git a/blktrace.h b/blktrace.h
index 096993e..a0e82fa 100644
--- a/blktrace.h
+++ b/blktrace.h
@@ -1,10 +1,27 @@
 #ifndef FIO_BLKTRACE_H
 #define FIO_BLKTRACE_H
 
+
 #ifdef FIO_HAVE_BLKTRACE
 
+#include <asm/types.h>
+
+#include "blktrace_api.h"
+
+struct blktrace_cursor {
+	struct fifo		*fifo;	// fifo queue for reading
+	int			fd;	// blktrace file
+	__u64			length; // length of trace
+	struct blk_io_trace	t;	// current io trace
+	int			swap;	// byte-swap required
+	int			scalar;	// scale percentage
+	int			iter;	// current iteration
+	int			nr_iter; // number of iterations to run
+};
+
 bool is_blktrace(const char *, int *);
 bool load_blktrace(struct thread_data *, const char *, int);
+int merge_blktrace_iologs(struct thread_data *td);
 
 #else
 
@@ -19,5 +36,10 @@ static inline bool load_blktrace(struct thread_data *td, const char *fname,
 	return false;
 }
 
+static inline int merge_blktrace_iologs(struct thread_data *td)
+{
+	return false;
+}
+
 #endif
 #endif
diff --git a/cconv.c b/cconv.c
index 1d7f6f2..50e45c6 100644
--- a/cconv.c
+++ b/cconv.c
@@ -37,6 +37,7 @@ static void free_thread_options_to_cpu(struct thread_options *o)
 	free(o->mmapfile);
 	free(o->read_iolog_file);
 	free(o->write_iolog_file);
+	free(o->merge_blktrace_file);
 	free(o->bw_log_file);
 	free(o->lat_log_file);
 	free(o->iops_log_file);
@@ -73,6 +74,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	string_to_cpu(&o->mmapfile, top->mmapfile);
 	string_to_cpu(&o->read_iolog_file, top->read_iolog_file);
 	string_to_cpu(&o->write_iolog_file, top->write_iolog_file);
+	string_to_cpu(&o->merge_blktrace_file, top->merge_blktrace_file);
 	string_to_cpu(&o->bw_log_file, top->bw_log_file);
 	string_to_cpu(&o->lat_log_file, top->lat_log_file);
 	string_to_cpu(&o->iops_log_file, top->iops_log_file);
@@ -304,6 +306,12 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		o->percentile_list[i].u.f = fio_uint64_to_double(le64_to_cpu(top->percentile_list[i].u.i));
+
+	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
+		o->merge_blktrace_scalars[i].u.f = fio_uint64_to_double(le64_to_cpu(top->merge_blktrace_scalars[i].u.i));
+
+	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
+		o->merge_blktrace_iters[i].u.f = fio_uint64_to_double(le64_to_cpu(top->merge_blktrace_iters[i].u.i));
 #if 0
 	uint8_t cpumask[FIO_TOP_STR_MAX];
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
@@ -330,6 +338,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	string_to_net(top->mmapfile, o->mmapfile);
 	string_to_net(top->read_iolog_file, o->read_iolog_file);
 	string_to_net(top->write_iolog_file, o->write_iolog_file);
+	string_to_net(top->merge_blktrace_file, o->merge_blktrace_file);
 	string_to_net(top->bw_log_file, o->bw_log_file);
 	string_to_net(top->lat_log_file, o->lat_log_file);
 	string_to_net(top->iops_log_file, o->iops_log_file);
@@ -565,6 +574,12 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		top->percentile_list[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->percentile_list[i].u.f));
+
+	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
+		top->merge_blktrace_scalars[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->merge_blktrace_scalars[i].u.f));
+
+	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
+		top->merge_blktrace_iters[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->merge_blktrace_iters[i].u.f));
 #if 0
 	uint8_t cpumask[FIO_TOP_STR_MAX];
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
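The new float lists cross the client/server boundary the same way `percentile_list` does: each `double` is encoded as a 64-bit integer and stored little-endian. The portable sketch below shows such a round trip and assumes 64-bit IEEE-754 doubles. `double_bits()`, `bits_double()`, `put_le64()` and `get_le64()` are hypothetical helpers, not fio's `fio_double_to_uint64()` or `__cpu_to_le64()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a double's bits as uint64_t and back. */
static uint64_t double_bits(double d)
{
	uint64_t u;

	memcpy(&u, &d, sizeof(u));
	return u;
}

static double bits_double(uint64_t u)
{
	double d;

	memcpy(&d, &u, sizeof(d));
	return d;
}

/* Store and load a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t) (v >> (8 * i));
}

static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t) in[i] << (8 * i);
	return v;
}
```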
diff --git a/fio.1 b/fio.1
index 593f4db..81164ae 100644
--- a/fio.1
+++ b/fio.1
@@ -20,6 +20,9 @@ file and memory debugging). `help' will list all available tracing options.
 .BI \-\-parse\-only
 Parse options only, don't start any I/O.
 .TP
+.BI \-\-merge\-blktrace\-only
+Merge blktraces only, don't start any I/O.
+.TP
 .BI \-\-output \fR=\fPfilename
 Write output to \fIfilename\fR.
 .TP
@@ -2198,6 +2201,30 @@ Determines how iolog is read. If false (default) entire \fBread_iolog\fR will
 be read at once. If selected true, input from iolog will be read gradually.
 Useful when iolog is very large, or it is generated.
 .TP
+.BI merge_blktrace_file \fR=\fPstr
+When specified, rather than replaying the logs passed to \fBread_iolog\fR,
+the logs go through a merge phase which aggregates them into a single blktrace.
+The resulting file is then passed on as the \fBread_iolog\fR parameter. The
+intention here is to make the order of events consistent. This limits the
+influence of the scheduler compared to replaying multiple blktraces via
+concurrent jobs.
+.TP
+.BI merge_blktrace_scalars \fR=\fPfloat_list
+This is a percentage-based option that is index-paired with the list of files
+passed to \fBread_iolog\fR. When merging is performed, scale the time of each
+event by the corresponding amount. For example,
+`\-\-merge_blktrace_scalars="50:100"' runs the first trace in half the time and
+the second trace in real time. This knob is tunable separately from
+\fBreplay_time_scale\fR, which scales the trace during runtime and, unlike
+this option, does not change the output of the merge.
+.TP
+.BI merge_blktrace_iters \fR=\fPfloat_list
+This is a whole number option that is index paired with the list of files
+passed to \fBread_iolog\fR. When merging is performed, run each trace for
+the specified number of iterations. For example,
+`\-\-merge_blktrace_iters="2:1"' runs the first trace for two iterations
+and the second trace for one iteration.
+.TP
 .BI replay_no_stall \fR=\fPbool
 When replaying I/O with \fBread_iolog\fR the default behavior is to
 attempt to respect the timestamps within the log and replay them with the
@@ -3531,6 +3558,45 @@ Write `length' bytes beginning from `offset'.
 Trim the given file from the given `offset' for `length' bytes.
 .RE
 .RE
+.SH I/O REPLAY \- MERGING TRACES
+Colocation is a common practice used to get the most out of a machine.
+Knowing which workloads play nicely with each other and which ones don't is
+a much harder task. While fio can replay workloads concurrently via multiple
+jobs, it leaves some variability up to the scheduler, making results harder to
+reproduce. Merging is a way to make the order of events consistent.
+.P
+Merging is integrated into I/O replay and is done when a
+\fBmerge_blktrace_file\fR is specified. The files passed to \fBread_iolog\fR
+go through the merge process, which writes a single merged trace to the
+specified file. That file is then used as if it were the only file passed to
+\fBread_iolog\fR. An example would look like:
+.RS
+.P
+$ fio \-\-read_iolog="<file1>:<file2>" \-\-merge_blktrace_file="<output_file>"
+.RE
+.P
+Creating only the merged file can be done by passing the command line argument
+\fB\-\-merge\-blktrace\-only\fR.
+.P
+Scaling traces can be done to see the relative impact of any particular trace
+being slowed down or sped up. \fBmerge_blktrace_scalars\fR takes in a colon
+separated list of percentage scalars. It is index paired with the files passed
+to \fBread_iolog\fR.
+.P
+With scaling, it may be desirable to match the running time of all traces.
+This can be done with \fBmerge_blktrace_iters\fR. It is index paired with
+\fBread_iolog\fR just like \fBmerge_blktrace_scalars\fR.
+.P
+For example, given two traces, A and B, each 60s long, the following runs
+trace A with IOs issued twice as fast and repeats it over the runtime of
+trace B:
+.RS
+.P
+$ fio \-\-read_iolog="<trace_a>:<trace_b>" \-\-merge_blktrace_file="<output_file>" \-\-merge_blktrace_scalars="50:100" \-\-merge_blktrace_iters="2:1"
+.RE
+.P
+This runs trace A at 2x the speed twice for approximately the same runtime as
+a single run of trace B.
 .SH CPU IDLENESS PROFILING
 In some cases, we want to understand CPU overhead in a test. For example, we
 test patches for the specific goodness of whether they reduce CPU usage.
diff --git a/init.c b/init.c
index c235b05..560da8f 100644
--- a/init.c
+++ b/init.c
@@ -30,6 +30,7 @@
 #include "idletime.h"
 #include "filelock.h"
 #include "steadystate.h"
+#include "blktrace.h"
 
 #include "oslib/getopt.h"
 #include "oslib/strcasestr.h"
@@ -46,6 +47,7 @@ static char **ini_file;
 static int max_jobs = FIO_MAX_JOBS;
 static int dump_cmdline;
 static int parse_only;
+static int merge_blktrace_only;
 
 static struct thread_data def_thread;
 struct thread_data *threads = NULL;
@@ -287,6 +289,11 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.val		= 'K',
 	},
 	{
+		.name		= (char *) "merge-blktrace-only",
+		.has_arg	= no_argument,
+		.val		= 'A' | FIO_CLIENT_FLAG,
+	},
+	{
 		.name		= NULL,
 	},
 };
@@ -1724,6 +1731,14 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (td_steadystate_init(td))
 		goto err;
 
+	if (o->merge_blktrace_file && !merge_blktrace_iologs(td))
+		goto err;
+
+	if (merge_blktrace_only) {
+		put_job(td);
+		return 0;
+	}
+
 	/*
 	 * recurse add identical jobs, clear numjobs and stonewall options
 	 * as they don't apply to sub-jobs
@@ -2173,6 +2188,7 @@ static void usage(const char *name)
 	printf("  --debug=options\tEnable debug logging. May be one/more of:\n");
 	show_debug_categories();
 	printf("  --parse-only\t\tParse options only, don't start any IO\n");
+	printf("  --merge-blktrace-only\tMerge blktraces only, don't start any IO\n");
 	printf("  --output\t\tWrite output to file\n");
 	printf("  --bandwidth-log\tGenerate aggregate bandwidth logs\n");
 	printf("  --minimal\t\tMinimal (terse) output\n");
@@ -2889,6 +2905,11 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			}
 			trigger_timeout /= 1000000;
 			break;
+
+		case 'A':
+			did_arg = true;
+			merge_blktrace_only = 1;
+			break;
 		case '?':
 			log_err("%s: unrecognized option '%s'\n", argv[0],
 							argv[optind - 1]);
diff --git a/options.c b/options.c
index 6bd7455..9b27730 100644
--- a/options.c
+++ b/options.c
@@ -1155,7 +1155,7 @@ static int str_steadystate_cb(void *data, const char *str)
  * is escaped with a '\', then that ':' is part of the filename and does not
  * indicate a new file.
  */
-static char *get_next_name(char **ptr)
+char *get_next_str(char **ptr)
 {
 	char *str = *ptr;
 	char *p, *start;
@@ -1197,14 +1197,14 @@ static char *get_next_name(char **ptr)
 }
 
 
-static int get_max_name_idx(char *input)
+int get_max_str_idx(char *input)
 {
 	unsigned int cur_idx;
 	char *str, *p;
 
 	p = str = strdup(input);
 	for (cur_idx = 0; ; cur_idx++)
-		if (get_next_name(&str) == NULL)
+		if (get_next_str(&str) == NULL)
 			break;
 
 	free(p);
@@ -1224,9 +1224,9 @@ int set_name_idx(char *target, size_t tlen, char *input, int index,
 
 	p = str = strdup(input);
 
-	index %= get_max_name_idx(input);
+	index %= get_max_str_idx(input);
 	for (cur_idx = 0; cur_idx <= index; cur_idx++)
-		fname = get_next_name(&str);
+		fname = get_next_str(&str);
 
 	if (client_sockaddr_str[0] && unique_filename) {
 		len = snprintf(target, tlen, "%s/%s.", fname,
@@ -1247,9 +1247,9 @@ char* get_name_by_idx(char *input, int index)
 
 	p = str = strdup(input);
 
-	index %= get_max_name_idx(input);
+	index %= get_max_str_idx(input);
 	for (cur_idx = 0; cur_idx <= index; cur_idx++)
-		fname = get_next_name(&str);
+		fname = get_next_str(&str);
 
 	fname = strdup(fname);
 	free(p);
@@ -1273,7 +1273,7 @@ static int str_filename_cb(void *data, const char *input)
 	if (!td->files_index)
 		td->o.nr_files = 0;
 
-	while ((fname = get_next_name(&str)) != NULL) {
+	while ((fname = get_next_str(&str)) != NULL) {
 		if (!strlen(fname))
 			break;
 		add_file(td, fname, 0, 1);
@@ -1294,7 +1294,7 @@ static int str_directory_cb(void *data, const char fio_unused *unused)
 		return 0;
 
 	p = str = strdup(td->o.directory);
-	while ((dirname = get_next_name(&str)) != NULL) {
+	while ((dirname = get_next_str(&str)) != NULL) {
 		if (lstat(dirname, &sb) < 0) {
 			ret = errno;
 
@@ -3199,6 +3199,35 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_IOLOG,
 	},
 	{
+		.name	= "merge_blktrace_file",
+		.lname	= "Merged blktrace output filename",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct thread_options, merge_blktrace_file),
+		.help	= "Merged blktrace output filename",
+		.category = FIO_OPT_C_IO,
+		.group = FIO_OPT_G_IOLOG,
+	},
+	{
+		.name	= "merge_blktrace_scalars",
+		.lname	= "Percentage to scale each trace",
+		.type	= FIO_OPT_FLOAT_LIST,
+		.off1	= offsetof(struct thread_options, merge_blktrace_scalars),
+		.maxlen	= FIO_IO_U_LIST_MAX_LEN,
+		.help	= "Percentage to scale each trace",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IOLOG,
+	},
+	{
+		.name	= "merge_blktrace_iters",
+		.lname	= "Number of iterations to run per trace",
+		.type	= FIO_OPT_FLOAT_LIST,
+		.off1	= offsetof(struct thread_options, merge_blktrace_iters),
+		.maxlen	= FIO_IO_U_LIST_MAX_LEN,
+		.help	= "Number of iterations to run per trace",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IOLOG,
+	},
+	{
 		.name	= "exec_prerun",
 		.lname	= "Pre-execute runnable",
 		.type	= FIO_OPT_STR_STORE,
diff --git a/options.h b/options.h
index 8fdd136..5276f31 100644
--- a/options.h
+++ b/options.h
@@ -16,6 +16,8 @@ void add_opt_posval(const char *, const char *, const char *);
 void del_opt_posval(const char *, const char *);
 struct thread_data;
 void fio_options_free(struct thread_data *);
+char *get_next_str(char **ptr);
+int get_max_str_idx(char *input);
 char* get_name_by_idx(char *input, int index);
 int set_name_idx(char *, size_t, char *, int, bool);
 
diff --git a/server.h b/server.h
index 37d2f76..40b9eac 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 74,
+	FIO_SERVER_VER			= 77,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 3931583..4f791cf 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -258,6 +258,9 @@ struct thread_options {
 	char *read_iolog_file;
 	bool read_iolog_chunked;
 	char *write_iolog_file;
+	char *merge_blktrace_file;
+	fio_fp64_t merge_blktrace_scalars[FIO_IO_U_LIST_MAX_LEN];
+	fio_fp64_t merge_blktrace_iters[FIO_IO_U_LIST_MAX_LEN];
 
 	unsigned int write_bw_log;
 	unsigned int write_lat_log;
@@ -540,6 +543,9 @@ struct thread_options_pack {
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
 	uint8_t write_iolog_file[FIO_TOP_STR_MAX];
+	uint8_t merge_blktrace_file[FIO_TOP_STR_MAX];
+	fio_fp64_t merge_blktrace_scalars[FIO_IO_U_LIST_MAX_LEN];
+	fio_fp64_t merge_blktrace_iters[FIO_IO_U_LIST_MAX_LEN];
 
 	uint32_t write_bw_log;
 	uint32_t write_lat_log;

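The renamed helpers above let get_name_by_idx() pick the index-th entry of a ':'-separated list, wrapping around once the index passes the number of entries (`index %= get_max_str_idx(input)`). The Python sketch below models that lookup. It is a simplification under an assumption: it splits on plain ':' and ignores the backslash-escaped colons that the C get_next_str() also handles.

```python
def get_name_by_idx(names: str, index: int) -> str:
    """Hypothetical model of get_name_by_idx(): the index-th ':'-separated
    entry, wrapping around. Assumes no backslash-escaped colons."""
    parts = names.split(":")
    # Mirrors `index %= get_max_str_idx(input)`: wrap past the last entry.
    return parts[index % len(parts)]
```

For example, with "/dev/sdb:/dev/sdc", index 3 wraps to the second entry, "/dev/sdc".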

* Recent changes (master)
@ 2018-09-20 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-20 12:00 UTC
  To: fio

The following changes since commit ed06087e5f4f5dcb6660d1095005c777bcd661cb:

  filesetup: use 64-bit type for file size generation (2018-09-17 13:46:51 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5fa4aff59e4eec471b2c534702f0162006845c4b:

  Merge branch 'epoch-time-hist-logs' of https://github.com/parallel-fs-utils/fio (2018-09-19 09:42:47 -0600)

----------------------------------------------------------------
Ben England (2):
      handle log_unix_epoch=1
      raise exception if test start time can't be estimated

Jens Axboe (1):
      Merge branch 'epoch-time-hist-logs' of https://github.com/parallel-fs-utils/fio

 tools/hist/fio-histo-log-pctiles.py | 207 +++++++++++++++++++++++++-----------
 1 file changed, 144 insertions(+), 63 deletions(-)
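
With log_unix_epoch=1, histogram log timestamps are absolute Unix-epoch milliseconds, so the patched parse_hist_file() has to estimate when logging began. The Python sketch below restates that heuristic: values under 1000000 ms are treated as relative timestamps starting at 0; otherwise it backs off one logging interval, taken from --log-hist-msec if given or inferred from the first two records. It takes a plain list of timestamps rather than parsed records, and raises ValueError as a stand-in for the script's FioHistoLogExc; both are assumptions of this sketch.

```python
def estimate_start_time(timestamps_ms, log_hist_msec=None):
    """Sketch of the start-time heuristic in the patched parse_hist_file()."""
    first = timestamps_ms[0]
    if first < 1000000:
        # Small values: relative timestamps (log_unix_epoch=0), start at 0.
        return 0
    if log_hist_msec is not None:
        # Logging interval supplied by the caller: back off one interval.
        return first - log_hist_msec
    if len(timestamps_ms) > 1:
        # Otherwise infer the interval from the gap between two records.
        return first - (timestamps_ms[1] - first)
    # Stand-in for FioHistoLogExc in the real script.
    raise ValueError("no way to estimate test start time")
```

A single epoch-stamped record with no --log-hist-msec gives the heuristic nothing to work with, which is the case the new exception reports.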

---

Diff of recent changes:

diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index 1e7b631..7f08f6e 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -21,7 +21,7 @@
 # if you do this, don't pass normal CLI parameters to it
 # otherwise it runs the CLI
 
-import sys, os, math, copy
+import sys, os, math, copy, time
 from copy import deepcopy
 import argparse
 
@@ -33,6 +33,8 @@ except ImportError:
 
 msec_per_sec = 1000
 nsec_per_usec = 1000
+direction_read = 0
+direction_write = 1
 
 class FioHistoLogExc(Exception):
     pass
@@ -57,10 +59,15 @@ def exception_suffix( record_num, pathname ):
 
 # log file parser raises FioHistoLogExc exceptions
 # it returns histogram buckets in whatever unit fio uses
-
-def parse_hist_file(logfn, buckets_per_interval):
-    max_timestamp_ms = 0.0
-    
+# inputs:
+#  logfn: pathname to histogram log file
+#  buckets_per_interval - how many histogram buckets to expect
+#  log_hist_msec - if not None, expected time interval between histogram records
+
+def parse_hist_file(logfn, buckets_per_interval, log_hist_msec):
+    previous_ts_ms_read = -1
+    previous_ts_ms_write = -1
+ 
     with open(logfn, 'r') as f:
         records = [ l.strip() for l in f.readlines() ]
     intervals = []
@@ -82,14 +89,20 @@ def parse_hist_file(logfn, buckets_per_interval):
         if len(int_tokens) < 3:
             raise FioHistoLogExc('too few numbers %s' % exception_suffix(k+1, logfn))
 
-        time_ms = int_tokens[0]
-        if time_ms > max_timestamp_ms:
-            max_timestamp_ms = time_ms
-
         direction = int_tokens[1]
-        if direction != 0 and direction != 1:
+        if direction != direction_read and direction != direction_write:
             raise FioHistoLogExc('invalid I/O direction %s' % exception_suffix(k+1, logfn))
 
+        time_ms = int_tokens[0]
+        if direction == direction_read:
+            if time_ms < previous_ts_ms_read:
+                raise FioHistoLogExc('read timestamp in column 1 decreased %s' % exception_suffix(k+1, logfn))
+            previous_ts_ms_read = time_ms
+        elif direction == direction_write:
+            if time_ms < previous_ts_ms_write:
+                raise FioHistoLogExc('write timestamp in column 1 decreased %s' % exception_suffix(k+1, logfn))
+            previous_ts_ms_write = time_ms
+
         bsz = int_tokens[2]
         if bsz > (1 << 24):
             raise FioHistoLogExc('block size too large %s' % exception_suffix(k+1, logfn))
@@ -110,7 +123,19 @@ def parse_hist_file(logfn, buckets_per_interval):
         intervals.append((time_ms, direction, bsz, buckets))
     if len(intervals) == 0:
         raise FioHistoLogExc('no records in %s' % logfn)
-    return (intervals, max_timestamp_ms)
+    (first_timestamp, _, _, _) = intervals[0]
+    if first_timestamp < 1000000:
+        start_time = 0    # assume log_unix_epoch = 0
+    elif log_hist_msec != None:
+        start_time = first_timestamp - log_hist_msec
+    elif len(intervals) > 1:
+        (second_timestamp, _, _, _) = intervals[1]
+        start_time = first_timestamp - (second_timestamp - first_timestamp)
+    else:
+        raise FioHistoLogExc('no way to estimate test start time')
+    (end_timestamp, _, _, _) = intervals[-1]
+
+    return (intervals, start_time, end_timestamp)
 
 
 # compute time range for each bucket index in histogram record
@@ -139,12 +164,13 @@ def time_ranges(groups, counters_per_group, fio_version=3):
 
 # compute number of time quantum intervals in the test
 
-def get_time_intervals(time_quantum, max_timestamp_ms):
+def get_time_intervals(time_quantum, min_timestamp_ms, max_timestamp_ms):
     # round down to nearest second
     max_timestamp = max_timestamp_ms // msec_per_sec
+    min_timestamp = min_timestamp_ms // msec_per_sec
     # round up to nearest whole multiple of time_quantum
-    time_interval_count = (max_timestamp + time_quantum) // time_quantum
-    end_time = time_interval_count * time_quantum
+    time_interval_count = ((max_timestamp - min_timestamp) + time_quantum) // time_quantum
+    end_time = min_timestamp + (time_interval_count * time_quantum)
     return (end_time, time_interval_count)
 
 # align raw histogram log data to time quantum so 
@@ -162,17 +188,17 @@ def get_time_intervals(time_quantum, max_timestamp_ms):
 # so the contribution of this bucket to this time quantum is
 # 515 x 0.99 = 509.85
 
-def align_histo_log(raw_histogram_log, time_quantum, bucket_count, max_timestamp_ms):
+def align_histo_log(raw_histogram_log, time_quantum, bucket_count, min_timestamp_ms, max_timestamp_ms):
 
     # slice up test time int intervals of time_quantum seconds
 
-    (end_time, time_interval_count) = get_time_intervals(time_quantum, max_timestamp_ms)
+    (end_time, time_interval_count) = get_time_intervals(time_quantum, min_timestamp_ms, max_timestamp_ms)
     time_qtm_ms = time_quantum * msec_per_sec
     end_time_ms = end_time * msec_per_sec
     aligned_intervals = []
     for j in range(0, time_interval_count):
         aligned_intervals.append((
-            j * time_qtm_ms,
+            min_timestamp_ms + (j * time_qtm_ms),
             [ 0.0 for j in range(0, bucket_count) ] ))
 
     log_record_count = len(raw_histogram_log)
@@ -205,14 +231,20 @@ def align_histo_log(raw_histogram_log, time_quantum, bucket_count, max_timestamp
 
         # calculate first quantum that overlaps this histogram record 
 
-        qtm_start_ms = (time_msec // time_qtm_ms) * time_qtm_ms
-        qtm_end_ms = ((time_msec + time_qtm_ms) // time_qtm_ms) * time_qtm_ms
-        qtm_index = qtm_start_ms // time_qtm_ms
+        offset_from_min_ts = time_msec - min_timestamp_ms
+        qtm_start_ms = min_timestamp_ms + (offset_from_min_ts // time_qtm_ms) * time_qtm_ms
+        qtm_end_ms = min_timestamp_ms + ((offset_from_min_ts + time_qtm_ms) // time_qtm_ms) * time_qtm_ms
+        qtm_index = offset_from_min_ts // time_qtm_ms
 
         # for each quantum that overlaps this histogram record's time interval
 
         while qtm_start_ms < time_msec_end:  # while quantum overlaps record
 
+            # some histogram logs may be longer than others
+
+            if len(aligned_intervals) <= qtm_index:
+                break
+
             # calculate fraction of time that this quantum 
             # overlaps histogram record's time interval
             
@@ -332,6 +364,9 @@ def compute_percentiles_from_logs():
     parser.add_argument("--time-quantum", dest="time_quantum", 
         default="1", type=int,
         help="time quantum in seconds (default=1)")
+    parser.add_argument("--log-hist-msec", dest="log_hist_msec", 
+        type=int, default=None,
+        help="log_hist_msec value in fio job file")
     parser.add_argument("--output-unit", dest="output_unit", 
         default="usec", type=str,
         help="Latency percentile output unit: msec|usec|nsec (default usec)")
@@ -355,30 +390,24 @@ def compute_percentiles_from_logs():
     buckets_per_interval = buckets_per_group * args.bucket_groups
     print('buckets per interval = %d ' % buckets_per_interval)
     bucket_index_range = range(0, buckets_per_interval)
+    if args.log_hist_msec != None:
+        print('log_hist_msec = %d' % args.log_hist_msec)
     if args.time_quantum == 0:
         print('ERROR: time-quantum must be a positive number of seconds')
     print('output unit = ' + args.output_unit)
     if args.output_unit == 'msec':
-        time_divisor = 1000.0
+        time_divisor = float(msec_per_sec)
     elif args.output_unit == 'usec':
         time_divisor = 1.0
 
-    # calculate response time interval associated with each histogram bucket
-
-    bucket_times = time_ranges(args.bucket_groups, buckets_per_group, fio_version=args.fio_version)
-
     # construct template for each histogram bucket array with buckets all zeroes
     # we just copy this for each new histogram
 
     zeroed_buckets = [ 0.0 for r in bucket_index_range ]
 
-    # print CSV header just like fiologparser_hist does
+    # calculate response time interval associated with each histogram bucket
 
-    header = 'msec, '
-    for p in args.pctiles_wanted:
-        header += '%3.1f, ' % p
-    print('time (millisec), percentiles in increasing order with values in ' + args.output_unit)
-    print(header)
+    bucket_times = time_ranges(args.bucket_groups, buckets_per_group, fio_version=args.fio_version)
 
     # parse the histogram logs
     # assumption: each bucket has a monotonically increasing time
@@ -386,33 +415,52 @@ def compute_percentiles_from_logs():
     # (exception: if randrw workload, then there is a read and a write 
     # record for the same time interval)
 
-    max_timestamp_all_logs = 0
+    test_start_time = 0
+    test_end_time = 1.0e18
     hist_files = {}
     for fn in args.file_list:
         try:
-            (hist_files[fn], max_timestamp_ms)  = parse_hist_file(fn, buckets_per_interval)
+            (hist_files[fn], log_start_time, log_end_time)  = parse_hist_file(fn, buckets_per_interval, args.log_hist_msec)
         except FioHistoLogExc as e:
             myabort(str(e))
-        max_timestamp_all_logs = max(max_timestamp_all_logs, max_timestamp_ms)
-
-    (end_time, time_interval_count) = get_time_intervals(args.time_quantum, max_timestamp_all_logs)
+        # we consider the test started when all threads have started logging
+        test_start_time = max(test_start_time, log_start_time)
+        # we consider the test over when one of the logs has ended
+        test_end_time = min(test_end_time, log_end_time)
+
+    if test_start_time >= test_end_time:
+        raise FioHistoLogExc('no time interval when all threads logs overlapped')
+    if test_start_time > 0:
+        print('all threads running as of unix epoch time %d = %s' % (
+               test_start_time/float(msec_per_sec), 
+               time.ctime(test_start_time/1000.0)))
+
+    (end_time, time_interval_count) = get_time_intervals(args.time_quantum, test_start_time, test_end_time)
     all_threads_histograms = [ ((j*args.time_quantum*msec_per_sec), deepcopy(zeroed_buckets))
-                                for j in range(0, time_interval_count) ]
+                               for j in range(0, time_interval_count) ]
 
     for logfn in hist_files.keys():
         aligned_per_thread = align_histo_log(hist_files[logfn], 
                                              args.time_quantum, 
                                              buckets_per_interval, 
-                                             max_timestamp_all_logs)
+                                             test_start_time,
+                                             test_end_time)
         for t in range(0, time_interval_count):
             (_, all_threads_histo_t) = all_threads_histograms[t]
             (_, log_histo_t) = aligned_per_thread[t]
             add_to_histo_from( all_threads_histo_t, log_histo_t )
 
     # calculate percentiles across aggregate histogram for all threads
+    # print CSV header just like fiologparser_hist does
+
+    header = 'msec-since-start, '
+    for p in args.pctiles_wanted:
+        header += '%3.1f, ' % p
+    print('time (millisec), percentiles in increasing order with values in ' + args.output_unit)
+    print(header)
 
     for (t_msec, all_threads_histo_t) in all_threads_histograms:
-        record = '%d, ' % t_msec
+        record = '%8d, ' % t_msec
         pct = get_pctiles(all_threads_histo_t, args.pctiles_wanted, bucket_times)
         if not pct:
             for w in args.pctiles_wanted:
@@ -471,8 +519,9 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('1234, 0, 4096, 1, 2, 3, 4\n')
             f.write('5678,1,16384,5,6,7,8 \n')
-        (raw_histo_log, max_timestamp) = parse_hist_file(self.fn, 4) # 4 buckets per interval
-        self.A(len(raw_histo_log) == 2 and max_timestamp == 5678)
+        (raw_histo_log, min_timestamp, max_timestamp) = parse_hist_file(self.fn, 4, None) # 4 buckets per interval
+        # if not log_unix_epoch=1, then min_timestamp will always be set to zero
+        self.A(len(raw_histo_log) == 2 and min_timestamp == 0 and max_timestamp == 5678)
         (time_ms, direction, bsz, histo) = raw_histo_log[0]
         self.A(time_ms == 1234 and direction == 0 and bsz == 4096 and histo == [ 1, 2, 3, 4 ])
         (time_ms, direction, bsz, histo) = raw_histo_log[1]
@@ -482,7 +531,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             pass
         try:
-            (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(should_not_get_here)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('no records'))
@@ -493,7 +542,7 @@ if unittest2_imported:
             f.write('1234, 0, 4096, 1, 2, 3, 4\n')
             f.write('5678,1,16384,5,6,7,8 \n')
             f.write('\n')
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+        (raw_histo_log, _, max_timestamp_ms) = parse_hist_file(self.fn, 4, None)
         self.A(len(raw_histo_log) == 2 and max_timestamp_ms == 5678)
         (time_ms, direction, bsz, histo) = raw_histo_log[0]
         self.A(time_ms == 1234 and direction == 0 and bsz == 4096 and histo == [ 1, 2, 3, 4 ])
@@ -504,7 +553,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('12, 0, 4096, 1a, 2, 3, 4\n')
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('non-integer'))
@@ -513,7 +562,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('-12, 0, 4096, 1, 2, 3, 4\n')
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('negative integer'))
@@ -522,7 +571,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('0, 0\n')
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('too few numbers'))
@@ -531,7 +580,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('100, 2, 4096, 1, 2, 3, 4\n')
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('invalid I/O direction'))
@@ -539,11 +588,11 @@ if unittest2_imported:
     def test_b8_parse_bsz_too_big(self):
         with open(self.fn+'_good', 'w') as f:
             f.write('100, 1, %d, 1, 2, 3, 4\n' % (1<<24))
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn+'_good', 4)
+        (raw_histo_log, _, _) = parse_hist_file(self.fn+'_good', 4, None)
         with open(self.fn+'_bad', 'w') as f:
             f.write('100, 1, 20000000, 1, 2, 3, 4\n')
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn+'_bad', 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn+'_bad', 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).startswith('block size too large'))
@@ -552,7 +601,7 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('100, 1, %d, 1, 2, 3, 4, 5\n' % (1<<24))
         try:
-            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            (raw_histo_log, _, _) = parse_hist_file(self.fn, 4, None)
             self.A(False)
         except FioHistoLogExc as e:
             self.A(str(e).__contains__('buckets per interval'))
@@ -581,12 +630,44 @@ if unittest2_imported:
     def test_d1_align_histo_log_1_quantum(self):
         with open(self.fn, 'w') as f:
             f.write('100, 1, 4096, 1, 2, 3, 4')
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
-        self.A(max_timestamp_ms == 100)
-        aligned_log = align_histo_log(raw_histo_log, 5, 4, max_timestamp_ms)
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, 4, None)
+        self.A(min_timestamp_ms == 0 and max_timestamp_ms == 100)
+        aligned_log = align_histo_log(raw_histo_log, 5, 4, min_timestamp_ms, max_timestamp_ms)
+        self.A(len(aligned_log) == 1)
+        (time_ms0, h) = aligned_log[0]
+        self.A(time_ms0 == 0 and h == [1., 2., 3., 4.])
+
+    # handle case with log_unix_epoch=1 timestamps, 1-second time quantum
+    # here both records will be separated into 2 aligned intervals
+
+    def test_d1a_align_2rec_histo_log_epoch_1_quantum_1sec(self):
+        with open(self.fn, 'w') as f:
+            f.write('1536504002123, 1, 4096, 1, 2, 3, 4\n')
+            f.write('1536504003123, 1, 4096, 4, 3, 2, 1\n')
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, 4, None)
+        self.A(min_timestamp_ms == 1536504001123 and max_timestamp_ms == 1536504003123)
+        aligned_log = align_histo_log(raw_histo_log, 1, 4, min_timestamp_ms, max_timestamp_ms)
+        self.A(len(aligned_log) == 3)
+        (time_ms0, h) = aligned_log[0]
+        self.A(time_ms0 == 1536504001123 and h == [0., 0., 0., 0.])
+        (time_ms1, h) = aligned_log[1]
+        self.A(time_ms1 == 1536504002123 and h == [1., 2., 3., 4.])
+        (time_ms2, h) = aligned_log[2]
+        self.A(time_ms2 == 1536504003123 and h == [4., 3., 2., 1.])
+
+    # handle case with log_unix_epoch=1 timestamps, 5-second time quantum
+    # here both records will be merged into a single aligned time interval
+
+    def test_d1b_align_2rec_histo_log_epoch_1_quantum_5sec(self):
+        with open(self.fn, 'w') as f:
+            f.write('1536504002123, 1, 4096, 1, 2, 3, 4\n')
+            f.write('1536504003123, 1, 4096, 4, 3, 2, 1\n')
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, 4, None)
+        self.A(min_timestamp_ms == 1536504001123 and max_timestamp_ms == 1536504003123)
+        aligned_log = align_histo_log(raw_histo_log, 5, 4, min_timestamp_ms, max_timestamp_ms)
         self.A(len(aligned_log) == 1)
         (time_ms0, h) = aligned_log[0]
-        self.A(time_ms0 == 0 and h == [1.0, 2.0, 3.0, 4.0])
+        self.A(time_ms0 == 1536504001123 and h == [5., 5., 5., 5.])
 
     # we need this to compare 2 lists of floating point numbers for equality
     # because of floating-point imprecision
@@ -608,11 +689,11 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             f.write('2000, 1, 4096, 1, 2, 3, 4\n')
             f.write('7000, 1, 4096, 1, 2, 3, 4\n')
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
-        self.A(max_timestamp_ms == 7000)
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, 4, None)
+        self.A(min_timestamp_ms == 0 and max_timestamp_ms == 7000)
         (_, _, _, raw_buckets1) = raw_histo_log[0]
         (_, _, _, raw_buckets2) = raw_histo_log[1]
-        aligned_log = align_histo_log(raw_histo_log, 5, 4, max_timestamp_ms)
+        aligned_log = align_histo_log(raw_histo_log, 5, 4, min_timestamp_ms, max_timestamp_ms)
         self.A(len(aligned_log) == 2)
         (time_ms1, h1) = aligned_log[0]
         (time_ms2, h2) = aligned_log[1]
@@ -630,9 +711,9 @@ if unittest2_imported:
         with open(self.fn, 'w') as f:
             buckets = [ 100 for j in range(0, 128) ]
             f.write('9000, 1, 4096, %s\n' % ', '.join([str(b) for b in buckets]))
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 128)
-        self.A(max_timestamp_ms == 9000)
-        aligned_log = align_histo_log(raw_histo_log, 5, 128, max_timestamp_ms)
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, 128, None)
+        self.A(min_timestamp_ms == 0 and max_timestamp_ms == 9000)
+        aligned_log = align_histo_log(raw_histo_log, 5, 128, min_timestamp_ms, max_timestamp_ms)
         time_intervals = time_ranges(4, 32)
         # since buckets are all equal, then median is halfway through time_intervals
         # and max latency interval is at end of time_intervals
@@ -654,9 +735,9 @@ if unittest2_imported:
             # add one I/O request to last bucket
             buckets[-1] = 1
             f.write('9000, 1, 4096, %s\n' % ', '.join([str(b) for b in buckets]))
-        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, fio_v3_bucket_count)
-        self.A(max_timestamp_ms == 9000)
-        aligned_log = align_histo_log(raw_histo_log, 5, fio_v3_bucket_count, max_timestamp_ms)
+        (raw_histo_log, min_timestamp_ms, max_timestamp_ms) = parse_hist_file(self.fn, fio_v3_bucket_count, None)
+        self.A(min_timestamp_ms == 0 and max_timestamp_ms == 9000)
+        aligned_log = align_histo_log(raw_histo_log, 5, fio_v3_bucket_count, min_timestamp_ms, max_timestamp_ms)
         (time_ms, histo) = aligned_log[1]
         time_intervals = time_ranges(29, 64)
         expected_pctiles = { 100.0:(64*(1<<28))/1000.0 }

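The patched get_time_intervals() and align_histo_log() now measure time quanta from the common test start time instead of from zero. The Python sketch below repeats that arithmetic in isolation: the interval count is computed from whole-second offsets, and a time point's first overlapping quantum is its offset from the minimum timestamp divided by the quantum length. The function quantum_index() is a hypothetical name for the `qtm_index` computation; it is not a helper in the script.

```python
MSEC_PER_SEC = 1000


def get_time_intervals(time_quantum, min_timestamp_ms, max_timestamp_ms):
    # Round both ends down to whole seconds.
    max_ts = max_timestamp_ms // MSEC_PER_SEC
    min_ts = min_timestamp_ms // MSEC_PER_SEC
    # Round the span up to a whole multiple of time_quantum.
    count = ((max_ts - min_ts) + time_quantum) // time_quantum
    end_time = min_ts + count * time_quantum
    return end_time, count


def quantum_index(time_msec, time_quantum, min_timestamp_ms):
    # Index of the first quantum overlapping time_msec (the patch's qtm_index).
    offset = time_msec - min_timestamp_ms
    return offset // (time_quantum * MSEC_PER_SEC)
```

With the epoch timestamps from test_d1a, a 1-second quantum yields three intervals and the two records land in intervals 1 and 2, while a 5-second quantum collapses everything into interval 0, matching test_d1b.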

* Recent changes (master)
@ 2018-09-18 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-18 12:00 UTC
  To: fio

The following changes since commit 0f9f1921197d79fe5069473b9d85ca4559ef8c42:

  axmap: isset_fn() should use 1ULL, not 1UL (2018-09-16 21:58:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ed06087e5f4f5dcb6660d1095005c777bcd661cb:

  filesetup: use 64-bit type for file size generation (2018-09-17 13:46:51 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      zbd: fix various 32-bit compilation warnings
      Random distribution 32-bit fixes
      filesetup: use 64-bit type for file size generation

 filesetup.c |  6 +++---
 lib/zipf.c  | 12 +++++------
 lib/zipf.h  |  8 ++++----
 zbd.c       | 66 ++++++++++++++++++++++++++++++++++---------------------------
 4 files changed, 50 insertions(+), 42 deletions(-)
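
On a 32-bit (ILP32) build, `unsigned long` is 32 bits wide, so storing a 64-bit range count such as `nranges` in it keeps only the low 32 bits. The Python sketch below models that truncation by masking to a chosen bit width. The 16 TiB file size and 512-byte minimum block size are illustrative values, not taken from the patch.

```python
def nranges(fsize, range_size, bits=64):
    # (fsize + range_size - 1) / range_size: ceiling division into ranges.
    n = (fsize + range_size - 1) // range_size
    # Storing the result in an unsigned `bits`-wide variable keeps only the
    # low `bits` bits, which models assignment to a 32-bit unsigned long.
    return n & ((1 << bits) - 1)
```

For a 2**44-byte file with 512-byte ranges, the 64-bit result is 2**35, but a 32-bit variable would hold 0, which is why the patch switches these variables to uint64_t.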

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 580403d..c0fa3cd 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -331,7 +331,7 @@ unsigned long long get_rand_file_size(struct thread_data *td)
 {
 	unsigned long long ret, sized;
 	uint64_t frand_max;
-	unsigned long r;
+	uint64_t r;
 
 	frand_max = rand_max(&td->file_size_state);
 	r = __rand(&td->file_size_state);
@@ -1192,13 +1192,13 @@ bool pre_read_files(struct thread_data *td)
 static void __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 {
 	unsigned int range_size, seed;
-	unsigned long nranges;
+	uint64_t nranges;
 	uint64_t fsize;
 
 	range_size = min(td->o.min_bs[DDIR_READ], td->o.min_bs[DDIR_WRITE]);
 	fsize = min(f->real_file_size, f->io_size);
 
-	nranges = (fsize + range_size - 1) / range_size;
+	nranges = (fsize + range_size - 1ULL) / range_size;
 
 	seed = jhash(f->file_name, strlen(f->file_name), 0) * td->thread_number;
 	if (!td->o.rand_repeatable)
diff --git a/lib/zipf.c b/lib/zipf.c
index 1ff8568..321a4fb 100644
--- a/lib/zipf.c
+++ b/lib/zipf.c
@@ -8,7 +8,7 @@
 
 static void zipf_update(struct zipf_state *zs)
 {
-	unsigned long to_gen;
+	uint64_t to_gen;
 	unsigned int i;
 
 	/*
@@ -22,7 +22,7 @@ static void zipf_update(struct zipf_state *zs)
 		zs->zetan += pow(1.0 / (double) (i + 1), zs->theta);
 }
 
-static void shared_rand_init(struct zipf_state *zs, unsigned long nranges,
+static void shared_rand_init(struct zipf_state *zs, uint64_t nranges,
 			     unsigned int seed)
 {
 	memset(zs, 0, sizeof(*zs));
@@ -32,7 +32,7 @@ static void shared_rand_init(struct zipf_state *zs, unsigned long nranges,
 	zs->rand_off = __rand(&zs->rand);
 }
 
-void zipf_init(struct zipf_state *zs, unsigned long nranges, double theta,
+void zipf_init(struct zipf_state *zs, uint64_t nranges, double theta,
 	       unsigned int seed)
 {
 	shared_rand_init(zs, nranges, seed);
@@ -43,7 +43,7 @@ void zipf_init(struct zipf_state *zs, unsigned long nranges, double theta,
 	zipf_update(zs);
 }
 
-unsigned long long zipf_next(struct zipf_state *zs)
+uint64_t zipf_next(struct zipf_state *zs)
 {
 	double alpha, eta, rand_uni, rand_z;
 	unsigned long long n = zs->nranges;
@@ -70,14 +70,14 @@ unsigned long long zipf_next(struct zipf_state *zs)
 	return (val + zs->rand_off) % zs->nranges;
 }
 
-void pareto_init(struct zipf_state *zs, unsigned long nranges, double h,
+void pareto_init(struct zipf_state *zs, uint64_t nranges, double h,
 		 unsigned int seed)
 {
 	shared_rand_init(zs, nranges, seed);
 	zs->pareto_pow = log(h) / log(1.0 - h);
 }
 
-unsigned long long pareto_next(struct zipf_state *zs)
+uint64_t pareto_next(struct zipf_state *zs)
 {
 	double rand = (double) __rand(&zs->rand) / (double) FRAND32_MAX;
 	unsigned long long n;
diff --git a/lib/zipf.h b/lib/zipf.h
index a4aa163..16b65f5 100644
--- a/lib/zipf.h
+++ b/lib/zipf.h
@@ -16,11 +16,11 @@ struct zipf_state {
 	bool disable_hash;
 };
 
-void zipf_init(struct zipf_state *zs, unsigned long nranges, double theta, unsigned int seed);
-unsigned long long zipf_next(struct zipf_state *zs);
+void zipf_init(struct zipf_state *zs, uint64_t nranges, double theta, unsigned int seed);
+uint64_t zipf_next(struct zipf_state *zs);
 
-void pareto_init(struct zipf_state *zs, unsigned long nranges, double h, unsigned int seed);
-unsigned long long pareto_next(struct zipf_state *zs);
+void pareto_init(struct zipf_state *zs, uint64_t nranges, double h, unsigned int seed);
+uint64_t pareto_next(struct zipf_state *zs);
 void zipf_disable_hash(struct zipf_state *zs);
 
 #endif
diff --git a/zbd.c b/zbd.c
index 0f3636a..9c3092a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -128,9 +128,9 @@ static bool zbd_verify_sizes(void)
 						 f->file_name);
 					return false;
 				}
-				log_info("%s: rounded up offset from %lu to %lu\n",
-					 f->file_name, f->file_offset,
-					 new_offset);
+				log_info("%s: rounded up offset from %llu to %llu\n",
+					 f->file_name, (unsigned long long) f->file_offset,
+					 (unsigned long long) new_offset);
 				f->io_size -= (new_offset - f->file_offset);
 				f->file_offset = new_offset;
 			}
@@ -143,9 +143,9 @@ static bool zbd_verify_sizes(void)
 						 f->file_name);
 					return false;
 				}
-				log_info("%s: rounded down io_size from %lu to %lu\n",
-					 f->file_name, f->io_size,
-					 new_end - f->file_offset);
+				log_info("%s: rounded down io_size from %llu to %llu\n",
+					 f->file_name, (unsigned long long) f->io_size,
+					 (unsigned long long) new_end - f->file_offset);
 				f->io_size = new_end - f->file_offset;
 			}
 		}
@@ -357,14 +357,15 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 	if (td->o.zone_size == 0) {
 		td->o.zone_size = zone_size;
 	} else if (td->o.zone_size != zone_size) {
-		log_info("fio: %s job parameter zonesize %lld does not match disk zone size %ld.\n",
-			 f->file_name, td->o.zone_size, zone_size);
+		log_info("fio: %s job parameter zonesize %llu does not match disk zone size %llu.\n",
+			 f->file_name, (unsigned long long) td->o.zone_size,
+			(unsigned long long) zone_size);
 		ret = -EINVAL;
 		goto close;
 	}
 
-	dprint(FD_ZBD, "Device %s has %d zones of size %lu KB\n", f->file_name,
-	       nr_zones, zone_size / 1024);
+	dprint(FD_ZBD, "Device %s has %d zones of size %llu KB\n", f->file_name,
+	       nr_zones, (unsigned long long) zone_size / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -407,8 +408,8 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 			break;
 		ret = read_zone_info(fd, start_sector, buf, bufsz);
 		if (ret < 0) {
-			log_info("fio: BLKREPORTZONE(%lu) failed for %s (%d).\n",
-				 start_sector, f->file_name, -ret);
+			log_info("fio: BLKREPORTZONE(%llu) failed for %s (%d).\n",
+				 (unsigned long long) start_sector, f->file_name, -ret);
 			goto close;
 		}
 	}
@@ -602,6 +603,12 @@ static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
 	return ret;
 }
 
+static unsigned int zbd_zone_nr(struct zoned_block_device_info *zbd_info,
+				struct fio_zone_info *zone)
+{
+	return (uintptr_t) zone - (uintptr_t) zbd_info->zone_info;
+}
+
 /**
  * zbd_reset_zone - reset the write pointer of a single zone
  * @td: FIO thread data.
@@ -615,8 +622,8 @@ static int zbd_reset_zone(struct thread_data *td, const struct fio_file *f,
 {
 	int ret;
 
-	dprint(FD_ZBD, "%s: resetting wp of zone %lu.\n", f->file_name,
-	       z - f->zbd_info->zone_info);
+	dprint(FD_ZBD, "%s: resetting wp of zone %u.\n", f->file_name,
+		zbd_zone_nr(f->zbd_info, z));
 	ret = zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
 	return ret;
 }
@@ -639,8 +646,8 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 	bool reset_wp;
 	int res = 0;
 
-	dprint(FD_ZBD, "%s: examining zones %lu .. %lu\n", f->file_name,
-	       zb - f->zbd_info->zone_info, ze - f->zbd_info->zone_info);
+	dprint(FD_ZBD, "%s: examining zones %u .. %u\n", f->file_name,
+		zbd_zone_nr(f->zbd_info, zb), zbd_zone_nr(f->zbd_info, ze));
 	assert(f->fd != -1);
 	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
@@ -653,10 +660,10 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 				start_z = z;
 			} else if (start_z < ze && !reset_wp) {
 				dprint(FD_ZBD,
-				       "%s: resetting zones %lu .. %lu\n",
+				       "%s: resetting zones %u .. %u\n",
 				       f->file_name,
-				       start_z - f->zbd_info->zone_info,
-				       z - f->zbd_info->zone_info);
+					zbd_zone_nr(f->zbd_info, start_z),
+					zbd_zone_nr(f->zbd_info, z));
 				if (zbd_reset_range(td, f, start_z->start,
 						z->start - start_z->start) < 0)
 					res = 1;
@@ -666,9 +673,9 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 		default:
 			if (start_z == ze)
 				break;
-			dprint(FD_ZBD, "%s: resetting zones %lu .. %lu\n",
-			       f->file_name, start_z - f->zbd_info->zone_info,
-			       z - f->zbd_info->zone_info);
+			dprint(FD_ZBD, "%s: resetting zones %u .. %u\n",
+			       f->file_name, zbd_zone_nr(f->zbd_info, start_z),
+			       zbd_zone_nr(f->zbd_info, z));
 			if (zbd_reset_range(td, f, start_z->start,
 					    z->start - start_z->start) < 0)
 				res = 1;
@@ -677,9 +684,9 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 		}
 	}
 	if (start_z < ze) {
-		dprint(FD_ZBD, "%s: resetting zones %lu .. %lu\n", f->file_name,
-		       start_z - f->zbd_info->zone_info,
-		       z - f->zbd_info->zone_info);
+		dprint(FD_ZBD, "%s: resetting zones %u .. %u\n", f->file_name,
+			zbd_zone_nr(f->zbd_info, start_z),
+			zbd_zone_nr(f->zbd_info, z));
 		if (zbd_reset_range(td, f, start_z->start,
 				    z->start - start_z->start) < 0)
 			res = 1;
@@ -765,7 +772,8 @@ void zbd_file_reset(struct thread_data *td, struct fio_file *f)
 	pthread_mutex_unlock(&f->zbd_info->mutex);
 	for (z = zb ; z < ze; z++)
 		pthread_mutex_unlock(&z->mutex);
-	dprint(FD_ZBD, "%s(%s): swd = %ld\n", __func__, f->file_name, swd);
+	dprint(FD_ZBD, "%s(%s): swd = %llu\n", __func__, f->file_name,
+		(unsigned long long) swd);
 	/*
 	 * If data verification is enabled reset the affected zones before
 	 * writing any data to avoid that a zone reset has to be issued while
@@ -995,8 +1003,8 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	}
 
 	if (z->verify_block * min_bs >= f->zbd_info->zone_size)
-		log_err("%s: %d * %d >= %ld\n", f->file_name, z->verify_block,
-			min_bs, f->zbd_info->zone_size);
+		log_err("%s: %d * %d >= %llu\n", f->file_name, z->verify_block,
+			min_bs, (unsigned long long) f->zbd_info->zone_size);
 	io_u->offset = z->start + z->verify_block++ * min_bs;
 	return z;
 }
@@ -1301,7 +1309,7 @@ char *zbd_write_status(const struct thread_stat *ts)
 {
 	char *res;
 
-	if (asprintf(&res, "; %ld zone resets", ts->nr_zone_resets) < 0)
+	if (asprintf(&res, "; %llu zone resets", (unsigned long long) ts->nr_zone_resets) < 0)
 		return NULL;
 	return res;
 }
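The new zbd_zone_nr() helper subtracts the two addresses after casting them to uintptr_t. That integer difference is a distance in bytes, i.e. the zone index multiplied by sizeof(struct fio_zone_info). Plain pointer subtraction (zone - zbd_info->zone_info), as in the dprint() arguments it replaces, already divides by the element size and yields the index. The sketch below shows the difference with a hypothetical stand-in struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fio_zone_info; only its size matters. */
struct zone {
	uint64_t start;
	uint64_t wp;
	unsigned int type;
};

/* Index of z within the array at base: pointer subtraction
 * divides the byte distance by sizeof(struct zone). */
static unsigned int zone_index(const struct zone *base, const struct zone *z)
{
	return (unsigned int)(z - base);
}

/* Byte distance: what subtracting the uintptr_t-cast addresses yields. */
static size_t zone_byte_offset(const struct zone *base, const struct zone *z)
{
	return (size_t)((uintptr_t) z - (uintptr_t) base);
}
```

The two results agree only when the element size is 1, so a zone-number helper wants the pointer form.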



* Recent changes (master)
@ 2018-09-17 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-17 12:00 UTC
  To: fio

The following changes since commit e26029be10ee2c570cba2c4cc2b1987568306cd2:

  Merge branch 'histo-log-dup-timestamp' of https://github.com/parallel-fs-utils/fio (2018-09-12 14:33:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0f9f1921197d79fe5069473b9d85ca4559ef8c42:

  axmap: isset_fn() should use 1ULL, not 1UL (2018-09-16 21:58:26 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      Merge branch 'windows_static' of https://github.com/sitsofe/fio
      axmap: use 64-bit type for number of bits
      axmap: use 64-bit index for the handlers
      lfsr: use unsigned long long for 64-bit values
      t/axmap: use a 64-bit type (not size_t) for axmap tests
      axmap: isset_fn() should use 1ULL, not 1UL

Sitsofe Wheeler (1):
      build: change where we set -static for Windows

 Makefile    |  2 +-
 configure   |  1 +
 lib/axmap.c | 19 ++++++++++---------
 lib/axmap.h |  2 +-
 lib/lfsr.c  |  8 ++++----
 t/axmap.c   | 10 +++++-----
 6 files changed, 22 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 42e5205..4721b78 100644
--- a/Makefile
+++ b/Makefile
@@ -198,7 +198,7 @@ endif
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   SOURCE += os/windows/posix.c
   LIBS	 += -lpthread -lpsapi -lws2_32
-  CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format -static
+  CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
 
 OBJS := $(SOURCE:.c=.o)
diff --git a/configure b/configure
index 26c345b..5490e26 100755
--- a/configure
+++ b/configure
@@ -361,6 +361,7 @@ CYGWIN*)
   output_sym "CONFIG_WINDOWSAIO"
   # We now take the regular configuration path without having exit 0 here.
   # Flags below are still necessary mostly for MinGW.
+  build_static="yes"
   socklen_t="yes"
   rusage_thread="yes"
   fdatasync="yes"
diff --git a/lib/axmap.c b/lib/axmap.c
index 03e712f..27301bd 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -110,7 +110,7 @@ void axmap_free(struct axmap *axmap)
 }
 
 /* Allocate memory for a set that can store the numbers 0 .. @nr_bits - 1. */
-struct axmap *axmap_new(unsigned long nr_bits)
+struct axmap *axmap_new(uint64_t nr_bits)
 {
 	struct axmap *axmap;
 	unsigned int i, levels;
@@ -135,13 +135,14 @@ struct axmap *axmap_new(unsigned long nr_bits)
 	for (i = 0; i < axmap->nr_levels; i++) {
 		struct axmap_level *al = &axmap->levels[i];
 
+		nr_bits = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
+
 		al->level = i;
-		al->map_size = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
+		al->map_size = nr_bits;
 		al->map = malloc(al->map_size * sizeof(unsigned long));
 		if (!al->map)
 			goto free_levels;
 
-		nr_bits = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
 	}
 
 	axmap_reset(axmap);
@@ -164,7 +165,7 @@ free_axmap:
  * returns true.
  */
 static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
-			  bool (*func)(struct axmap_level *, unsigned long, unsigned int,
+			  bool (*func)(struct axmap_level *, uint64_t, unsigned int,
 			  void *), void *data)
 {
 	struct axmap_level *al;
@@ -193,12 +194,12 @@ static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
  * returns true.
  */
 static bool axmap_handler_topdown(struct axmap *axmap, uint64_t bit_nr,
-	bool (*func)(struct axmap_level *, unsigned long, unsigned int, void *))
+	bool (*func)(struct axmap_level *, uint64_t, unsigned int, void *))
 {
 	int i;
 
 	for (i = axmap->nr_levels - 1; i >= 0; i--) {
-		unsigned long index = bit_nr >> (UNIT_SHIFT * i);
+		uint64_t index = bit_nr >> (UNIT_SHIFT * i);
 		unsigned long offset = index >> UNIT_SHIFT;
 		unsigned int bit = index & BLOCKS_PER_UNIT_MASK;
 
@@ -219,7 +220,7 @@ struct axmap_set_data {
  * the boundary of the element at offset @offset. Return the number of bits
  * that have been set in @__data->set_bits if @al->level == 0.
  */
-static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
+static bool axmap_set_fn(struct axmap_level *al, uint64_t offset,
 			 unsigned int bit, void *__data)
 {
 	struct axmap_set_data *data = __data;
@@ -321,10 +322,10 @@ unsigned int axmap_set_nr(struct axmap *axmap, uint64_t bit_nr,
 	return set_bits;
 }
 
-static bool axmap_isset_fn(struct axmap_level *al, unsigned long offset,
+static bool axmap_isset_fn(struct axmap_level *al, uint64_t offset,
 			   unsigned int bit, void *unused)
 {
-	return (al->map[offset] & (1UL << bit)) != 0;
+	return (al->map[offset] & (1ULL << bit)) != 0;
 }
 
 bool axmap_isset(struct axmap *axmap, uint64_t bit_nr)
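After this change, axmap_new() rounds nr_bits up to a whole number of units before storing it as the level's map_size. The same value therefore sizes the current level and becomes the bit count of the level above, so each level has one bit per unit of the level below. The computed sizes are the same as before. A minimal sketch of that sizing loop, plus the 1ULL bit test used in axmap_isset_fn(), assuming 64-bit units (UNIT_SHIFT of 6). The real UNIT_SHIFT and BLOCKS_PER_UNIT are defined outside this diff, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for this sketch: 64 bits per unit. */
#define UNIT_SHIFT	6
#define BLOCKS_PER_UNIT	(1U << UNIT_SHIFT)

/* Hypothetical helper: store the number of units per level in sizes[]
 * for a map of nr_bits bits; returns the number of levels (<= max). */
static int axmap_level_units(uint64_t nr_bits, uint64_t *sizes, int max)
{
	int levels = 0;

	do {
		/* Round up: one unit per BLOCKS_PER_UNIT bits of this level. */
		nr_bits = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
		sizes[levels++] = nr_bits;
	} while (nr_bits > 1 && levels < max);

	return levels;
}

/* Test bit 'bit' (0..63) of a 64-bit word. 1ULL is at least 64 bits
 * everywhere; 1UL is only 32 bits on 32-bit and LLP64 targets. */
static int word_isset(uint64_t word, unsigned int bit)
{
	return (word & (1ULL << bit)) != 0;
}
```

For example, 4096 bits need 64 units at level 0 and a single unit at level 1.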
diff --git a/lib/axmap.h b/lib/axmap.h
index 55349d8..aa59768 100644
--- a/lib/axmap.h
+++ b/lib/axmap.h
@@ -5,7 +5,7 @@
 #include "types.h"
 
 struct axmap;
-struct axmap *axmap_new(unsigned long nr_bits);
+struct axmap *axmap_new(uint64_t nr_bits);
 void axmap_free(struct axmap *bm);
 
 void axmap_set(struct axmap *axmap, uint64_t bit_nr);
diff --git a/lib/lfsr.c b/lib/lfsr.c
index a4f1fb1..49e34a8 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -78,7 +78,7 @@ static uint8_t lfsr_taps[64][FIO_MAX_TAPS] =
 
 #define __LFSR_NEXT(__fl, __v)						\
 	__v = ((__v >> 1) | __fl->cached_bit) ^			\
-			(((__v & 1UL) - 1UL) & __fl->xormask);
+			(((__v & 1ULL) - 1ULL) & __fl->xormask);
 
 static inline void __lfsr_next(struct fio_lfsr *fl, unsigned int spin)
 {
@@ -146,7 +146,7 @@ static uint64_t lfsr_create_xormask(uint8_t *taps)
 	uint64_t xormask = 0;
 
 	for(i = 0; i < FIO_MAX_TAPS && taps[i] != 0; i++)
-		xormask |= 1UL << (taps[i] - 1);
+		xormask |= 1ULL << (taps[i] - 1);
 
 	return xormask;
 }
@@ -161,7 +161,7 @@ static uint8_t *find_lfsr(uint64_t size)
 	 * take that into account.
 	 */
 	for (i = 3; i < 64; i++)
-		if ((1UL << i) > size)
+		if ((1ULL << i) > size)
 			return lfsr_taps[i];
 
 	return NULL;
@@ -241,7 +241,7 @@ int lfsr_init(struct fio_lfsr *fl, uint64_t nums, unsigned long seed,
 
 	fl->max_val = nums - 1;
 	fl->xormask = lfsr_create_xormask(taps);
-	fl->cached_bit = 1UL << (taps[0] - 1);
+	fl->cached_bit = 1ULL << (taps[0] - 1);
 
 	if (prepare_spin(fl, spin))
 		return 1;
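The lfsr.c hunks switch 1UL to 1ULL because xormask is a uint64_t: on targets where unsigned long is 32 bits, 1UL << (taps[i] - 1) is undefined for taps above 32. The sketch below repeats the mask construction of lfsr_create_xormask() and the width search of find_lfsr(), plus one step of a conventional right-shifting Galois LFSR. The helper names are hypothetical, and fio's own __LFSR_NEXT() differs (it also ORs in a cached bit), so treat the step function as a generic illustration rather than fio's update rule:

```c
#include <assert.h>
#include <stdint.h>

/* Build the XOR mask from a 0-terminated list of 1-based tap positions. */
static uint64_t xormask_from_taps(const uint8_t *taps, int max_taps)
{
	uint64_t mask = 0;
	int i;

	for (i = 0; i < max_taps && taps[i] != 0; i++)
		mask |= 1ULL << (taps[i] - 1);
	return mask;
}

/* Smallest width i (3 <= i < 64) with 2^i > size, or -1 if none. */
static int lfsr_width_for(uint64_t size)
{
	int i;

	for (i = 3; i < 64; i++)
		if ((1ULL << i) > size)
			return i;
	return -1;
}

/* One Galois step: XOR in the mask when the bit shifted out is 1.
 * -(v & 1ULL) is all ones or zero, so no branch is needed. */
static uint64_t galois_step(uint64_t v, uint64_t mask)
{
	return (v >> 1) ^ (-(v & 1ULL) & mask);
}
```

With taps {4, 3} (the primitive polynomial x^4 + x^3 + 1) the mask is 0xC, and the 4-bit register cycles through all 15 nonzero states before repeating.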
diff --git a/t/axmap.c b/t/axmap.c
index a2e6fd6..9d6bdee 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -5,7 +5,7 @@
 #include "../lib/lfsr.h"
 #include "../lib/axmap.h"
 
-static int test_regular(size_t size, int seed)
+static int test_regular(uint64_t size, int seed)
 {
 	struct fio_lfsr lfsr;
 	struct axmap *map;
@@ -61,11 +61,11 @@ static int check_next_free(struct axmap *map, uint64_t start, uint64_t expected)
 	return 0;
 }
 
-static int test_next_free(size_t size, int seed)
+static int test_next_free(uint64_t size, int seed)
 {
 	struct fio_lfsr lfsr;
 	struct axmap *map;
-	size_t osize;
+	uint64_t osize;
 	uint64_t ff, lastfree;
 	int err, i;
 
@@ -196,7 +196,7 @@ static int test_next_free(size_t size, int seed)
 	return 0;
 }
 
-static int test_multi(size_t size, unsigned int bit_off)
+static int test_multi(uint64_t size, unsigned int bit_off)
 {
 	unsigned int map_size = size;
 	struct axmap *map;
@@ -395,7 +395,7 @@ static int test_overlap(void)
 
 int main(int argc, char *argv[])
 {
-	size_t size = (1UL << 23) - 200;
+	uint64_t size = (1ULL << 23) - 200;
 	int seed = 1;
 
 	if (argc > 1) {



* Recent changes (master)
@ 2018-09-13 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-13 12:00 UTC
  To: fio

The following changes since commit e8ed50bc3ce67f449714c55c3fbf2f8eb50730c2:

  windows: fix the most egregious posix.c style errors (2018-09-11 16:54:39 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e26029be10ee2c570cba2c4cc2b1987568306cd2:

  Merge branch 'histo-log-dup-timestamp' of https://github.com/parallel-fs-utils/fio (2018-09-12 14:33:04 -0600)

----------------------------------------------------------------
Ben England (1):
      filter out records with duplicate timestamps

Jens Axboe (2):
      windows: make win_to_posix_error() more resilient
      Merge branch 'histo-log-dup-timestamp' of https://github.com/parallel-fs-utils/fio

 os/windows/posix.c                  |  2 ++
 tools/hist/fio-histo-log-pctiles.py | 11 +++++++++++
 2 files changed, 13 insertions(+)

---

Diff of recent changes:

diff --git a/os/windows/posix.c b/os/windows/posix.c
index 5b72bea..fd1d558 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -35,6 +35,8 @@ HRESULT WINAPI StringCchPrintfA(char *pszDest, size_t cchDest, const char *pszFo
 int win_to_posix_error(DWORD winerr)
 {
 	switch (winerr) {
+	case ERROR_SUCCESS:
+		return 0;
 	case ERROR_FILE_NOT_FOUND:
 		return ENOENT;
 	case ERROR_PATH_NOT_FOUND:
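Callers such as mmap() and munlock() pass GetLastError() straight to win_to_posix_error(). With the default case added by the 2018-09-12 change further down this thread, any code not listed is logged as unhandled and turned into EIO, so ERROR_SUCCESS needs an explicit mapping to 0. A table-driven sketch of the same lookup follows. The WIN_* values restate what winerror.h defines for the corresponding ERROR_* names so the example builds without Windows headers; the table and function names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values of the matching ERROR_* names in winerror.h. */
#define WIN_ERROR_SUCCESS		0UL
#define WIN_ERROR_FILE_NOT_FOUND	2UL
#define WIN_ERROR_PATH_NOT_FOUND	3UL
#define WIN_ERROR_ACCESS_DENIED		5UL
#define WIN_ERROR_INVALID_HANDLE	6UL
#define WIN_ERROR_NOT_READY		21UL

struct win_errmap {
	unsigned long winerr;
	int posix;
};

static const struct win_errmap errmap[] = {
	{ WIN_ERROR_SUCCESS,		0 },
	{ WIN_ERROR_FILE_NOT_FOUND,	ENOENT },
	{ WIN_ERROR_PATH_NOT_FOUND,	ENOENT },
	{ WIN_ERROR_ACCESS_DENIED,	EACCES },
	{ WIN_ERROR_INVALID_HANDLE,	EBADF },
	{ WIN_ERROR_NOT_READY,		EAGAIN },
};

/* Unknown codes fall back to EIO, like the switch's default case. */
static int map_win_error(unsigned long winerr)
{
	size_t i;

	for (i = 0; i < sizeof(errmap) / sizeof(errmap[0]); i++)
		if (errmap[i].winerr == winerr)
			return errmap[i].posix;
	return EIO;
}
```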
diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index bb016ea..1e7b631 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -64,6 +64,8 @@ def parse_hist_file(logfn, buckets_per_interval):
     with open(logfn, 'r') as f:
         records = [ l.strip() for l in f.readlines() ]
     intervals = []
+    last_time_ms = -1
+    last_direction = -1
     for k, r in enumerate(records):
         if r == '':
             continue
@@ -96,6 +98,15 @@ def parse_hist_file(logfn, buckets_per_interval):
         if len(buckets) != buckets_per_interval:
             raise FioHistoLogExc('%d buckets per interval but %d expected in %s' % 
                     (len(buckets), buckets_per_interval, exception_suffix(k+1, logfn)))
+
+        # hack to filter out records with the same timestamp
+        # we should not have to do this if fio logs histogram records correctly
+
+        if time_ms == last_time_ms and direction == last_direction:
+            continue
+        last_time_ms = time_ms
+        last_direction = direction
+
         intervals.append((time_ms, direction, bsz, buckets))
     if len(intervals) == 0:
         raise FioHistoLogExc('no records in %s' % logfn)
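The filter compares each record only with the previously accepted one. It therefore drops consecutive records that share both time_ms and direction, but keeps a repeat that is separated by a record of the other direction. The same rule in C, over a hypothetical record type standing in for the parsed tuples:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one parsed histogram record. */
struct hist_rec {
	long time_ms;
	int direction;
};

/* Compact recs[] in place, dropping any record whose (time_ms, direction)
 * equals that of the previously accepted record; returns the new count. */
static size_t drop_dup_timestamps(struct hist_rec *recs, size_t n)
{
	long last_time_ms = -1;
	int last_direction = -1;
	size_t i, out = 0;

	for (i = 0; i < n; i++) {
		if (recs[i].time_ms == last_time_ms &&
		    recs[i].direction == last_direction)
			continue;
		last_time_ms = recs[i].time_ms;
		last_direction = recs[i].direction;
		recs[out++] = recs[i];
	}
	return out;
}
```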



* Recent changes (master)
@ 2018-09-12 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-12 12:00 UTC
  To: fio

The following changes since commit 39281024d26b5dbd4c70ce7620aeadc8933ac8c7:

  Merge branch 'no-unittest-dep' of https://github.com/parallel-fs-utils/fio (2018-09-10 09:56:11 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e8ed50bc3ce67f449714c55c3fbf2f8eb50730c2:

  windows: fix the most egregious posix.c style errors (2018-09-11 16:54:39 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      Merge branch 'offload-flags-fix' of https://github.com/vincentkfu/fio
      Add regression test for recent offload locking bug
      Fio 3.10
      windows: handle ERROR_NOT_READY
      windows: fix the most egregious posix.c style errors

Vincent Fu (1):
      rate_submit: synchronize accesses to io_u_queue->nr

 FIO-VERSION-GEN           |   2 +-
 os/windows/posix.c        | 366 +++++++++++++++++++++++++++-------------------
 rate-submit.c             |   2 +-
 t/jobs/t0010-b7aae4ba.fio |   8 +
 4 files changed, 229 insertions(+), 149 deletions(-)
 create mode 100644 t/jobs/t0010-b7aae4ba.fio

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index f031594..f1d25d0 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.9
+DEF_VER=fio-3.10
 
 LF='
 '
diff --git a/os/windows/posix.c b/os/windows/posix.c
index d33250d..5b72bea 100644
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -29,87 +29,149 @@ extern unsigned long mtime_since_now(struct timespec *);
 extern void fio_gettime(struct timespec *, void *);
 
 /* These aren't defined in the MinGW headers */
-HRESULT WINAPI StringCchCopyA(
-  char *pszDest,
-  size_t cchDest,
-  const char *pszSrc);
-
-HRESULT WINAPI StringCchPrintfA(
-  char *pszDest,
-  size_t cchDest,
-  const char *pszFormat,
-  ...);
+HRESULT WINAPI StringCchCopyA(char *pszDest, size_t cchDest, const char *pszSrc);
+HRESULT WINAPI StringCchPrintfA(char *pszDest, size_t cchDest, const char *pszFormat, ...);
 
 int win_to_posix_error(DWORD winerr)
 {
-	switch (winerr)
-	{
-	case ERROR_FILE_NOT_FOUND:		return ENOENT;
-	case ERROR_PATH_NOT_FOUND:		return ENOENT;
-	case ERROR_ACCESS_DENIED:		return EACCES;
-	case ERROR_INVALID_HANDLE:		return EBADF;
-	case ERROR_NOT_ENOUGH_MEMORY:	return ENOMEM;
-	case ERROR_INVALID_DATA:		return EINVAL;
-	case ERROR_OUTOFMEMORY:			return ENOMEM;
-	case ERROR_INVALID_DRIVE:		return ENODEV;
-	case ERROR_NOT_SAME_DEVICE:		return EXDEV;
-	case ERROR_WRITE_PROTECT:		return EROFS;
-	case ERROR_BAD_UNIT:			return ENODEV;
-	case ERROR_SHARING_VIOLATION:	return EACCES;
-	case ERROR_LOCK_VIOLATION:		return EACCES;
-	case ERROR_SHARING_BUFFER_EXCEEDED:	return ENOLCK;
-	case ERROR_HANDLE_DISK_FULL:	return ENOSPC;
-	case ERROR_NOT_SUPPORTED:		return ENOSYS;
-	case ERROR_FILE_EXISTS:			return EEXIST;
-	case ERROR_CANNOT_MAKE:			return EPERM;
-	case ERROR_INVALID_PARAMETER:	return EINVAL;
-	case ERROR_NO_PROC_SLOTS:		return EAGAIN;
-	case ERROR_BROKEN_PIPE:			return EPIPE;
-	case ERROR_OPEN_FAILED:			return EIO;
-	case ERROR_NO_MORE_SEARCH_HANDLES:	return ENFILE;
-	case ERROR_CALL_NOT_IMPLEMENTED:	return ENOSYS;
-	case ERROR_INVALID_NAME:		return ENOENT;
-	case ERROR_WAIT_NO_CHILDREN:	return ECHILD;
-	case ERROR_CHILD_NOT_COMPLETE:	return EBUSY;
-	case ERROR_DIR_NOT_EMPTY:		return ENOTEMPTY;
-	case ERROR_SIGNAL_REFUSED:		return EIO;
-	case ERROR_BAD_PATHNAME:		return ENOENT;
-	case ERROR_SIGNAL_PENDING:		return EBUSY;
-	case ERROR_MAX_THRDS_REACHED:	return EAGAIN;
-	case ERROR_BUSY:				return EBUSY;
-	case ERROR_ALREADY_EXISTS:		return EEXIST;
-	case ERROR_NO_SIGNAL_SENT:		return EIO;
-	case ERROR_FILENAME_EXCED_RANGE:	return EINVAL;
-	case ERROR_META_EXPANSION_TOO_LONG:	return EINVAL;
-	case ERROR_INVALID_SIGNAL_NUMBER:	return EINVAL;
-	case ERROR_THREAD_1_INACTIVE:	return EINVAL;
-	case ERROR_BAD_PIPE:			return EINVAL;
-	case ERROR_PIPE_BUSY:			return EBUSY;
-	case ERROR_NO_DATA:				return EPIPE;
-	case ERROR_MORE_DATA:			return EAGAIN;
-	case ERROR_DIRECTORY:			return ENOTDIR;
-	case ERROR_PIPE_CONNECTED:		return EBUSY;
-	case ERROR_NO_TOKEN:			return EINVAL;
-	case ERROR_PROCESS_ABORTED:		return EFAULT;
-	case ERROR_BAD_DEVICE:			return ENODEV;
-	case ERROR_BAD_USERNAME:		return EINVAL;
-	case ERROR_OPEN_FILES:			return EAGAIN;
-	case ERROR_ACTIVE_CONNECTIONS:	return EAGAIN;
-	case ERROR_DEVICE_IN_USE:		return EAGAIN;
-	case ERROR_INVALID_AT_INTERRUPT_TIME:	return EINTR;
-	case ERROR_IO_DEVICE:			return EIO;
-	case ERROR_NOT_OWNER:			return EPERM;
-	case ERROR_END_OF_MEDIA:		return ENOSPC;
-	case ERROR_EOM_OVERFLOW:		return ENOSPC;
-	case ERROR_BEGINNING_OF_MEDIA:	return ESPIPE;
-	case ERROR_SETMARK_DETECTED:	return ESPIPE;
-	case ERROR_NO_DATA_DETECTED:	return ENOSPC;
-	case ERROR_POSSIBLE_DEADLOCK:	return EDEADLOCK;
-	case ERROR_CRC:					return EIO;
-	case ERROR_NEGATIVE_SEEK:		return EINVAL;
-	case ERROR_DISK_FULL:			return ENOSPC;
-	case ERROR_NOACCESS:			return EFAULT;
-	case ERROR_FILE_INVALID:		return ENXIO;
+	switch (winerr) {
+	case ERROR_FILE_NOT_FOUND:
+		return ENOENT;
+	case ERROR_PATH_NOT_FOUND:
+		return ENOENT;
+	case ERROR_ACCESS_DENIED:
+		return EACCES;
+	case ERROR_INVALID_HANDLE:
+		return EBADF;
+	case ERROR_NOT_ENOUGH_MEMORY:
+		return ENOMEM;
+	case ERROR_INVALID_DATA:
+		return EINVAL;
+	case ERROR_OUTOFMEMORY:
+		return ENOMEM;
+	case ERROR_INVALID_DRIVE:
+		return ENODEV;
+	case ERROR_NOT_SAME_DEVICE:
+		return EXDEV;
+	case ERROR_WRITE_PROTECT:
+		return EROFS;
+	case ERROR_BAD_UNIT:
+		return ENODEV;
+	case ERROR_NOT_READY:
+		return EAGAIN;
+	case ERROR_SHARING_VIOLATION:
+		return EACCES;
+	case ERROR_LOCK_VIOLATION:
+		return EACCES;
+	case ERROR_SHARING_BUFFER_EXCEEDED:
+		return ENOLCK;
+	case ERROR_HANDLE_DISK_FULL:
+		return ENOSPC;
+	case ERROR_NOT_SUPPORTED:
+		return ENOSYS;
+	case ERROR_FILE_EXISTS:
+		return EEXIST;
+	case ERROR_CANNOT_MAKE:
+		return EPERM;
+	case ERROR_INVALID_PARAMETER:
+		return EINVAL;
+	case ERROR_NO_PROC_SLOTS:
+		return EAGAIN;
+	case ERROR_BROKEN_PIPE:
+		return EPIPE;
+	case ERROR_OPEN_FAILED:
+		return EIO;
+	case ERROR_NO_MORE_SEARCH_HANDLES:
+		return ENFILE;
+	case ERROR_CALL_NOT_IMPLEMENTED:
+		return ENOSYS;
+	case ERROR_INVALID_NAME:
+		return ENOENT;
+	case ERROR_WAIT_NO_CHILDREN:
+		return ECHILD;
+	case ERROR_CHILD_NOT_COMPLETE:
+		return EBUSY;
+	case ERROR_DIR_NOT_EMPTY:
+		return ENOTEMPTY;
+	case ERROR_SIGNAL_REFUSED:
+		return EIO;
+	case ERROR_BAD_PATHNAME:
+		return ENOENT;
+	case ERROR_SIGNAL_PENDING:
+		return EBUSY;
+	case ERROR_MAX_THRDS_REACHED:
+		return EAGAIN;
+	case ERROR_BUSY:
+		return EBUSY;
+	case ERROR_ALREADY_EXISTS:
+		return EEXIST;
+	case ERROR_NO_SIGNAL_SENT:
+		return EIO;
+	case ERROR_FILENAME_EXCED_RANGE:
+		return EINVAL;
+	case ERROR_META_EXPANSION_TOO_LONG:
+		return EINVAL;
+	case ERROR_INVALID_SIGNAL_NUMBER:
+		return EINVAL;
+	case ERROR_THREAD_1_INACTIVE:
+		return EINVAL;
+	case ERROR_BAD_PIPE:
+		return EINVAL;
+	case ERROR_PIPE_BUSY:
+		return EBUSY;
+	case ERROR_NO_DATA:
+		return EPIPE;
+	case ERROR_MORE_DATA:
+		return EAGAIN;
+	case ERROR_DIRECTORY:
+		return ENOTDIR;
+	case ERROR_PIPE_CONNECTED:
+		return EBUSY;
+	case ERROR_NO_TOKEN:
+		return EINVAL;
+	case ERROR_PROCESS_ABORTED:
+		return EFAULT;
+	case ERROR_BAD_DEVICE:
+		return ENODEV;
+	case ERROR_BAD_USERNAME:
+		return EINVAL;
+	case ERROR_OPEN_FILES:
+		return EAGAIN;
+	case ERROR_ACTIVE_CONNECTIONS:
+		return EAGAIN;
+	case ERROR_DEVICE_IN_USE:
+		return EBUSY;
+	case ERROR_INVALID_AT_INTERRUPT_TIME:
+		return EINTR;
+	case ERROR_IO_DEVICE:
+		return EIO;
+	case ERROR_NOT_OWNER:
+		return EPERM;
+	case ERROR_END_OF_MEDIA:
+		return ENOSPC;
+	case ERROR_EOM_OVERFLOW:
+		return ENOSPC;
+	case ERROR_BEGINNING_OF_MEDIA:
+		return ESPIPE;
+	case ERROR_SETMARK_DETECTED:
+		return ESPIPE;
+	case ERROR_NO_DATA_DETECTED:
+		return ENOSPC;
+	case ERROR_POSSIBLE_DEADLOCK:
+		return EDEADLOCK;
+	case ERROR_CRC:
+		return EIO;
+	case ERROR_NEGATIVE_SEEK:
+		return EINVAL;
+	case ERROR_DISK_FULL:
+		return ENOSPC;
+	case ERROR_NOACCESS:
+		return EFAULT;
+	case ERROR_FILE_INVALID:
+		return ENXIO;
+	default:
+		log_err("fio: windows error %d not handled\n", winerr);
+		return EIO;
 	}
 
 	return winerr;
@@ -138,8 +200,7 @@ int GetNumLogicalProcessors(void)
 		}
 	}
 
-	for (i = 0; i < len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); i++)
-	{
+	for (i = 0; i < len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); i++) {
 		if (processor_info[i].Relationship == RelationProcessorCore)
 			num_processors += hweight64(processor_info[i].ProcessorMask);
 	}
@@ -155,8 +216,7 @@ long sysconf(int name)
 	SYSTEM_INFO sysInfo;
 	MEMORYSTATUSEX status;
 
-	switch (name)
-	{
+	switch (name) {
 	case _SC_NPROCESSORS_ONLN:
 		val = GetNumLogicalProcessors();
 		if (val == -1)
@@ -226,29 +286,36 @@ char *dlerror(void)
 /* Copied from http://blogs.msdn.com/b/joshpoley/archive/2007/12/19/date-time-formats-and-conversions.aspx */
 void Time_tToSystemTime(time_t dosTime, SYSTEMTIME *systemTime)
 {
-    FILETIME utcFT;
-    LONGLONG jan1970;
+	FILETIME utcFT;
+	LONGLONG jan1970;
 	SYSTEMTIME tempSystemTime;
 
-    jan1970 = Int32x32To64(dosTime, 10000000) + 116444736000000000;
-    utcFT.dwLowDateTime = (DWORD)jan1970;
-    utcFT.dwHighDateTime = jan1970 >> 32;
+	jan1970 = Int32x32To64(dosTime, 10000000) + 116444736000000000;
+	utcFT.dwLowDateTime = (DWORD)jan1970;
+	utcFT.dwHighDateTime = jan1970 >> 32;
 
-    FileTimeToSystemTime((FILETIME*)&utcFT, &tempSystemTime);
+	FileTimeToSystemTime((FILETIME*)&utcFT, &tempSystemTime);
 	SystemTimeToTzSpecificLocalTime(NULL, &tempSystemTime, systemTime);
 }
 
-char* ctime_r(const time_t *t, char *buf)
+char *ctime_r(const time_t *t, char *buf)
 {
-    SYSTEMTIME systime;
-    const char * const dayOfWeek[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
-    const char * const monthOfYear[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
-
-    Time_tToSystemTime(*t, &systime);
-    /* We don't know how long `buf` is, but assume it's rounded up from the minimum of 25 to 32 */
-    StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d\n", dayOfWeek[systime.wDayOfWeek % 7], monthOfYear[(systime.wMonth - 1) % 12],
-										 systime.wDay, systime.wHour, systime.wMinute, systime.wSecond, systime.wYear);
-    return buf;
+	SYSTEMTIME systime;
+	const char * const dayOfWeek[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
+	const char * const monthOfYear[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
+
+	Time_tToSystemTime(*t, &systime);
+
+	/*
+	 * We don't know how long `buf` is, but assume it's rounded up from
+	 * the minimum of 25 to 32
+	 */
+	StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d\n",
+				dayOfWeek[systime.wDayOfWeek % 7],
+				monthOfYear[(systime.wMonth - 1) % 12],
+				systime.wDay, systime.wHour, systime.wMinute,
+				systime.wSecond, systime.wYear);
+	return buf;
 }
 
 int gettimeofday(struct timeval *restrict tp, void *restrict tzp)
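Time_tToSystemTime() relies on FILETIME counting 100-nanosecond intervals since 1601-01-01 UTC. Seconds since the Unix epoch are multiplied by 10,000,000 and offset by 116444736000000000, the number of such intervals between 1601 and 1970. A portable sketch of the same arithmetic follows, with a hypothetical helper name. It multiplies in 64 bits, whereas Int32x32To64() takes 32-bit operands, so the code above would truncate time_t values that do not fit in 32 bits:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* 100 ns intervals between 1601-01-01 and 1970-01-01 (UTC). */
#define EPOCH_DIFF_100NS	116444736000000000ULL

/* Hypothetical helper: Unix seconds to a FILETIME-style 64-bit count,
 * also split into the low and high 32-bit halves FILETIME stores. */
static uint64_t unix_to_filetime(time_t t, uint32_t *low, uint32_t *high)
{
	uint64_t ft = (uint64_t) t * 10000000ULL + EPOCH_DIFF_100NS;

	*low = (uint32_t) ft;
	*high = (uint32_t) (ft >> 32);
	return ft;
}
```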
@@ -275,8 +342,7 @@ int gettimeofday(struct timeval *restrict tp, void *restrict tzp)
 	return 0;
 }
 
-int sigaction(int sig, const struct sigaction *act,
-		struct sigaction *oact)
+int sigaction(int sig, const struct sigaction *act, struct sigaction *oact)
 {
 	int rc = 0;
 	void (*prev_handler)(int);
@@ -291,13 +357,12 @@ int sigaction(int sig, const struct sigaction *act,
 	return rc;
 }
 
-int lstat(const char * path, struct stat * buf)
+int lstat(const char *path, struct stat *buf)
 {
 	return stat(path, buf);
 }
 
-void *mmap(void *addr, size_t len, int prot, int flags,
-		int fildes, off_t off)
+void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off)
 {
 	DWORD vaProt = 0;
 	DWORD mapAccess = 0;
@@ -323,25 +388,20 @@ void *mmap(void *addr, size_t len, int prot, int flags,
 	lenhigh = len >> 16;
 	/* If the low DWORD is zero and the high DWORD is non-zero, `CreateFileMapping`
 	   will return ERROR_INVALID_PARAMETER. To avoid this, set both to zero. */
-	if (lenlow == 0) {
+	if (lenlow == 0)
 		lenhigh = 0;
-	}
 
-	if (flags & MAP_ANON || flags & MAP_ANONYMOUS)
-	{
+	if (flags & MAP_ANON || flags & MAP_ANONYMOUS) {
 		allocAddr = VirtualAlloc(addr, len, MEM_COMMIT, vaProt);
 		if (allocAddr == NULL)
 			errno = win_to_posix_error(GetLastError());
-	}
-	else
-	{
-		hMap = CreateFileMapping((HANDLE)_get_osfhandle(fildes), NULL, vaProt, lenhigh, lenlow, NULL);
+	} else {
+		hMap = CreateFileMapping((HANDLE)_get_osfhandle(fildes), NULL,
+						vaProt, lenhigh, lenlow, NULL);
 
 		if (hMap != NULL)
-		{
-			allocAddr = MapViewOfFile(hMap, mapAccess, off >> 16, off & 0xFFFF, len);
-		}
-
+			allocAddr = MapViewOfFile(hMap, mapAccess, off >> 16,
+							off & 0xFFFF, len);
 		if (hMap == NULL || allocAddr == NULL)
 			errno = win_to_posix_error(GetLastError());
 
@@ -360,9 +420,7 @@ int munmap(void *addr, size_t len)
 	success = UnmapViewOfFile(addr);
 
 	if (!success)
-	{
 		success = VirtualFree(addr, 0, MEM_RELEASE);
-	}
 
 	return !success;
 }
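CreateFileMapping() takes the maximum object size, and MapViewOfFile() the file offset, as separate high-order and low-order 32-bit DWORD arguments. The shmget() hunk in this same patch splits its size with & 0xFFFFFFFF and >> 32, while the mmap() context lines above shift by 16; this patch only restyles that code. A minimal sketch of the 32-bit split and the round trip back to 64 bits, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit size or offset into high and low 32-bit halves. */
static void split_u64(uint64_t v, uint32_t *high, uint32_t *low)
{
	*low = (uint32_t) (v & 0xFFFFFFFFULL);
	*high = (uint32_t) (v >> 32);
}

/* Reassemble the halves into the original 64-bit value. */
static uint64_t join_u64(uint32_t high, uint32_t low)
{
	return ((uint64_t) high << 32) | low;
}
```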
@@ -390,8 +448,12 @@ static HANDLE log_file = INVALID_HANDLE_VALUE;
 
 void openlog(const char *ident, int logopt, int facility)
 {
-	if (log_file == INVALID_HANDLE_VALUE)
-		log_file = CreateFileA("syslog.txt", GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, 0, NULL);
+	if (log_file != INVALID_HANDLE_VALUE)
+		return;
+
+	log_file = CreateFileA("syslog.txt", GENERIC_WRITE,
+				FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+				OPEN_ALWAYS, 0, NULL);
 }
 
 void closelog(void)
@@ -408,7 +470,9 @@ void syslog(int priority, const char *message, ... /* argument */)
 	DWORD bytes_written;
 
 	if (log_file == INVALID_HANDLE_VALUE) {
-		log_file = CreateFileA("syslog.txt", GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, 0, NULL);
+		log_file = CreateFileA("syslog.txt", GENERIC_WRITE,
+					FILE_SHARE_READ | FILE_SHARE_WRITE,
+					NULL, OPEN_ALWAYS, 0, NULL);
 	}
 
 	if (log_file == INVALID_HANDLE_VALUE) {
@@ -483,8 +547,7 @@ int clock_gettime(clockid_t clock_id, struct timespec *tp)
 {
 	int rc = 0;
 
-	if (clock_id == CLOCK_MONOTONIC)
-	{
+	if (clock_id == CLOCK_MONOTONIC) {
 		static LARGE_INTEGER freq = {{0,0}};
 		LARGE_INTEGER counts;
 		uint64_t t;
@@ -503,9 +566,7 @@ int clock_gettime(clockid_t clock_id, struct timespec *tp)
 		 * and then divide by the frequency. */
 		t *= 1000000000;
 		tp->tv_nsec = t / freq.QuadPart;
-	}
-	else if (clock_id == CLOCK_REALTIME)
-	{
+	} else if (clock_id == CLOCK_REALTIME) {
 		/* clock_gettime(CLOCK_REALTIME,...) is just an alias for gettimeofday with a
 		 * higher-precision field. */
 		struct timeval tv;
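The CLOCK_MONOTONIC branch converts QueryPerformanceCounter() ticks to a timespec. The visible comment and the multiply-by-10^9-then-divide step suggest that whole seconds come from the quotient and nanoseconds from the remainder; the lines between the two hunks are not shown. Multiplying only the remainder, which is smaller than the frequency, keeps the intermediate within 64 bits for realistic counter frequencies. A portable sketch under that reading, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a tick count at 'freq' ticks per second
 * into a struct timespec without overflowing the 64-bit intermediate. */
static struct timespec ticks_to_timespec(uint64_t counts, uint64_t freq)
{
	struct timespec ts;
	uint64_t rem = counts % freq;	/* ticks past the last whole second */

	ts.tv_sec = (time_t) (counts / freq);
	ts.tv_nsec = (long) (rem * 1000000000ULL / freq);
	return ts;
}
```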
@@ -552,6 +613,7 @@ int mlock(const void * addr, size_t len)
 int munlock(const void * addr, size_t len)
 {
 	BOOL success = VirtualUnlock((LPVOID)addr, len);
+
 	if (!success) {
 		errno = win_to_posix_error(GetLastError());
 		return -1;
@@ -611,22 +673,26 @@ int shmget(key_t key, size_t size, int shmflg)
 	int mapid = -1;
 	uint32_t size_low = size & 0xFFFFFFFF;
 	uint32_t size_high = ((uint64_t)size) >> 32;
-	HANDLE hMapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, (PAGE_EXECUTE_READWRITE | SEC_RESERVE), size_high, size_low, NULL);
+	HANDLE hMapping;
+
+	hMapping = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
+					PAGE_EXECUTE_READWRITE | SEC_RESERVE,
+					size_high, size_low, NULL);
 	if (hMapping != NULL) {
 		fileMappings[nFileMappings] = hMapping;
 		mapid = nFileMappings;
 		nFileMappings++;
-	} else {
+	} else
 		errno = ENOSYS;
-	}
 
 	return mapid;
 }
 
 void *shmat(int shmid, const void *shmaddr, int shmflg)
 {
-	void* mapAddr;
+	void *mapAddr;
 	MEMORY_BASIC_INFORMATION memInfo;
+
 	mapAddr = MapViewOfFile(fileMappings[shmid], FILE_MAP_ALL_ACCESS, 0, 0, 0);
 	if (mapAddr == NULL) {
 		errno = win_to_posix_error(GetLastError());
@@ -662,9 +728,9 @@ int shmctl(int shmid, int cmd, struct shmid_ds *buf)
 	if (cmd == IPC_RMID) {
 		fileMappings[shmid] = INVALID_HANDLE_VALUE;
 		return 0;
-	} else {
-		log_err("%s is not implemented\n", __func__);
 	}
+
+	log_err("%s is not implemented\n", __func__);
 	errno = ENOSYS;
 	return -1;
 }
@@ -753,6 +819,7 @@ ssize_t pwrite(int fildes, const void *buf, size_t nbyte,
 {
 	int64_t pos = _telli64(fildes);
 	ssize_t len = _write(fildes, buf, nbyte);
+
 	_lseeki64(fildes, pos, SEEK_SET);
 	return len;
 }
@@ -761,6 +828,7 @@ ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset)
 {
 	int64_t pos = _telli64(fildes);
 	ssize_t len = read(fildes, buf, nbyte);
+
 	_lseeki64(fildes, pos, SEEK_SET);
 	return len;
 }
@@ -776,11 +844,12 @@ ssize_t writev(int fildes, const struct iovec *iov, int iovcnt)
 {
 	int i;
 	DWORD bytes_written = 0;
-	for (i = 0; i < iovcnt; i++)
-	{
-		int len = send((SOCKET)fildes, iov[i].iov_base, iov[i].iov_len, 0);
-		if (len == SOCKET_ERROR)
-		{
+
+	for (i = 0; i < iovcnt; i++) {
+		int len;
+
+		len = send((SOCKET)fildes, iov[i].iov_base, iov[i].iov_len, 0);
+		if (len == SOCKET_ERROR) {
 			DWORD err = GetLastError();
 			errno = win_to_posix_error(err);
 			bytes_written = -1;
@@ -792,8 +861,7 @@ ssize_t writev(int fildes, const struct iovec *iov, int iovcnt)
 	return bytes_written;
 }
 
-long long strtoll(const char *restrict str, char **restrict endptr,
-		int base)
+long long strtoll(const char *restrict str, char **restrict endptr, int base)
 {
 	return _strtoi64(str, endptr, base);
 }
@@ -816,8 +884,7 @@ int poll(struct pollfd fds[], nfds_t nfds, int timeout)
 	FD_ZERO(&writefds);
 	FD_ZERO(&exceptfds);
 
-	for (i = 0; i < nfds; i++)
-	{
+	for (i = 0; i < nfds; i++) {
 		if (fds[i].fd < 0) {
 			fds[i].revents = 0;
 			continue;
@@ -834,11 +901,9 @@ int poll(struct pollfd fds[], nfds_t nfds, int timeout)
 	rc = select(nfds, &readfds, &writefds, &exceptfds, to);
 
 	if (rc != SOCKET_ERROR) {
-		for (i = 0; i < nfds; i++)
-		{
-			if (fds[i].fd < 0) {
+		for (i = 0; i < nfds; i++) {
+			if (fds[i].fd < 0)
 				continue;
-			}
 
 			if ((fds[i].events & POLLIN) && FD_ISSET(fds[i].fd, &readfds))
 				fds[i].revents |= POLLIN;
@@ -884,9 +949,11 @@ int nanosleep(const struct timespec *rqtp, struct timespec *rmtp)
 DIR *opendir(const char *dirname)
 {
 	struct dirent_ctx *dc = NULL;
+	HANDLE file;
 
 	/* See if we can open it. If not, we'll return an error here */
-	HANDLE file = CreateFileA(dirname, 0, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
+	file = CreateFileA(dirname, 0, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+				OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
 	if (file != INVALID_HANDLE_VALUE) {
 		CloseHandle(file);
 		dc = (struct dirent_ctx*)malloc(sizeof(struct dirent_ctx));
@@ -929,6 +996,7 @@ struct dirent *readdir(DIR *dirp)
 
 	if (dirp->find_handle == INVALID_HANDLE_VALUE) {
 		char search_pattern[MAX_PATH];
+
 		StringCchPrintfA(search_pattern, MAX_PATH-1, "%s\\*", dirp->dirname);
 		dirp->find_handle = FindFirstFileA(search_pattern, &find_data);
 		if (dirp->find_handle == INVALID_HANDLE_VALUE)
@@ -960,8 +1028,8 @@ in_addr_t inet_network(const char *cp)
 }
 
 #ifdef CONFIG_WINDOWS_XP
-const char* inet_ntop(int af, const void *restrict src,
-		char *restrict dst, socklen_t size)
+const char *inet_ntop(int af, const void *restrict src, char *restrict dst,
+		      socklen_t size)
 {
 	INT status = SOCKET_ERROR;
 	WSADATA wsd;
@@ -977,6 +1045,7 @@ const char* inet_ntop(int af, const void *restrict src,
 	if (af == AF_INET) {
 		struct sockaddr_in si;
 		DWORD len = size;
+
 		memset(&si, 0, sizeof(si));
 		si.sin_family = af;
 		memcpy(&si.sin_addr, src, sizeof(si.sin_addr));
@@ -984,6 +1053,7 @@ const char* inet_ntop(int af, const void *restrict src,
 	} else if (af == AF_INET6) {
 		struct sockaddr_in6 si6;
 		DWORD len = size;
+
 		memset(&si6, 0, sizeof(si6));
 		si6.sin6_family = af;
 		memcpy(&si6.sin6_addr, src, sizeof(si6.sin6_addr));
@@ -1016,6 +1086,7 @@ int inet_pton(int af, const char *restrict src, void *restrict dst)
 	if (af == AF_INET) {
 		struct sockaddr_in si;
 		INT len = sizeof(si);
+
 		memset(&si, 0, sizeof(si));
 		si.sin_family = af;
 		status = WSAStringToAddressA((char*)src, af, NULL, (struct sockaddr*)&si, &len);
@@ -1024,6 +1095,7 @@ int inet_pton(int af, const char *restrict src, void *restrict dst)
 	} else if (af == AF_INET6) {
 		struct sockaddr_in6 si6;
 		INT len = sizeof(si6);
+
 		memset(&si6, 0, sizeof(si6));
 		si6.sin6_family = af;
 		status = WSAStringToAddressA((char*)src, af, NULL, (struct sockaddr*)&si6, &len);
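In the hunks shown, the Windows pread()/pwrite() shims save the file position with _telli64(), transfer from the current position, and then restore the saved position; the offset argument is not applied. Below is a minimal POSIX sketch of the full save/seek/read/restore pattern. It assumes plain lseek() and read() are available, and emulated_pread() is a hypothetical name, not a fio function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emulate pread() on top of lseek()/read().
 * Not atomic: another thread sharing the descriptor may move the
 * file position between the calls below.
 */
static ssize_t emulated_pread(int fd, void *buf, size_t nbyte, off_t offset)
{
	off_t saved;
	ssize_t len;
	int saved_errno;

	/* Remember where the caller's file position was. */
	saved = lseek(fd, 0, SEEK_CUR);
	if (saved == (off_t) -1)
		return -1;

	/* Move to the requested offset before transferring data. */
	if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
		return -1;

	len = read(fd, buf, nbyte);
	saved_errno = errno;

	/* Put the file position back where the caller left it. */
	if (lseek(fd, saved, SEEK_SET) == (off_t) -1 && len >= 0)
		return -1;	/* read worked but the restore failed */

	errno = saved_errno;	/* keep read()'s errno if it failed */
	return len;
}
```

Because the three calls are not atomic, a real pread() replacement on a shared descriptor would also need a lock or a positioned-I/O primitive from the platform.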
diff --git a/rate-submit.c b/rate-submit.c
index 5c77a4e..2f02fe2 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -126,7 +126,7 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	clear_io_state(td, 1);
 
 	td_set_runstate(td, TD_RUNNING);
-	td->flags |= TD_F_CHILD;
+	td->flags |= TD_F_CHILD | TD_F_NEED_LOCK;
 	td->parent = parent;
 	return 0;
 
diff --git a/t/jobs/t0010-b7aae4ba.fio b/t/jobs/t0010-b7aae4ba.fio
new file mode 100644
index 0000000..0223770
--- /dev/null
+++ b/t/jobs/t0010-b7aae4ba.fio
@@ -0,0 +1,8 @@
+# Expected result: fio runs and completes the job
+# Buggy result: fio segfaults
+#
+[test]
+ioengine=null
+size=10g
+io_submit_mode=offload
+iodepth=16


* Recent changes (master)
@ 2018-09-11 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-11 12:00 UTC
  To: fio

The following changes since commit ed06328df452330b7210db2558ae125f6e0d8fe2:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-09-09 17:42:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 39281024d26b5dbd4c70ce7620aeadc8933ac8c7:

  Merge branch 'no-unittest-dep' of https://github.com/parallel-fs-utils/fio (2018-09-10 09:56:11 -0600)

----------------------------------------------------------------
Ben England (1):
      remove dependency on unittest2 module

Jens Axboe (1):
      Merge branch 'no-unittest-dep' of https://github.com/parallel-fs-utils/fio

 tools/hist/fio-histo-log-pctiles.py | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index c398113..bb016ea 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -24,7 +24,12 @@
 import sys, os, math, copy
 from copy import deepcopy
 import argparse
-import unittest2
+
+unittest2_imported = True
+try:
+    import unittest2
+except ImportError:
+    unittest2_imported = False
 
 msec_per_sec = 1000
 nsec_per_usec = 1000
@@ -412,14 +417,14 @@ def compute_percentiles_from_logs():
 #end of MAIN PROGRAM
 
 
-
 ##### below are unit tests ##############
 
-import tempfile, shutil
-from os.path import join
-should_not_get_here = False
+if unittest2_imported:
+  import tempfile, shutil
+  from os.path import join
+  should_not_get_here = False
 
-class Test(unittest2.TestCase):
+  class Test(unittest2.TestCase):
     tempdir = None
 
     # a little less typing please
@@ -651,7 +656,10 @@ class Test(unittest2.TestCase):
 
 if __name__ == '__main__':
     if os.getenv('UNITTEST'):
-        sys.exit(unittest2.main())
+        if unittest2_imported:
+            sys.exit(unittest2.main())
+        else:
+            raise Exception('you must install unittest2 module to run unit test')
     else:
         compute_percentiles_from_logs()
 


* Recent changes (master)
@ 2018-09-10 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-10 12:00 UTC
  To: fio

The following changes since commit fd98fb689d5ad7e9977461e961fff3fdd37f9cb8:

  Kill fusion atomic write engine (2018-09-08 08:09:53 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ed06328df452330b7210db2558ae125f6e0d8fe2:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-09-09 17:42:10 -0600)

----------------------------------------------------------------
Bart Van Assche (3):
      iolog: Ensure that sockaddr_un.sun_path is '\0'-terminated
      Micro-optimize num2str()
      num2str(): Avoid an out-of-bounds array access

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 iolog.c       |  5 ++++-
 lib/num2str.c | 16 ++++++++--------
 2 files changed, 12 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index f3eedb5..26c3458 100644
--- a/iolog.c
+++ b/iolog.c
@@ -580,7 +580,10 @@ static int open_socket(const char *path)
 	if (fd < 0)
 		return fd;
 	addr.sun_family = AF_UNIX;
-	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+	if (snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path) >=
+	    sizeof(addr.sun_path))
+		log_err("%s: path name %s is too long for a Unix socket\n",
+			__func__, path);
 	if (connect(fd, (const struct sockaddr *)&addr, strlen(path) + sizeof(addr.sun_family)) == 0)
 		return fd;
 	else
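The iolog.c hunk replaces strncpy(), which leaves sun_path without a terminating '\0' when the path fills the whole buffer, with snprintf() and a length check. As patched, open_socket() logs the over-long path and still goes on to connect(). Below is a minimal sketch that rejects such a path instead. It assumes a POSIX system with <sys/un.h>, and fill_unix_addr() is a hypothetical helper, not part of fio.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill in a sockaddr_un for 'path'.
 * Returns 0 on success, -1 if the path plus its '\0' does not fit
 * in sun_path. snprintf() always NUL-terminates, unlike strncpy().
 */
static int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
	int n;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;

	n = snprintf(addr->sun_path, sizeof(addr->sun_path), "%s", path);
	if (n < 0 || (size_t) n >= sizeof(addr->sun_path))
		return -1;	/* output was (or would be) truncated */
	return 0;
}
```

A caller would treat a -1 return as an error and skip the connect() call entirely, so a truncated name is never used.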
diff --git a/lib/num2str.c b/lib/num2str.c
index 40fb3ae..1abe22f 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -30,7 +30,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 		[N2S_BYTEPERSEC]= "B/s",
 		[N2S_BITPERSEC]	= "bit/s"
 	};
-	const unsigned int thousand[] = { 1000, 1024 };
+	const unsigned int thousand = pow2 ? 1024 : 1000;
 	unsigned int modulo;
 	int post_index, carry = 0;
 	char tmp[32], fmt[32];
@@ -49,7 +49,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 		unitprefix = sistr;
 
 	for (post_index = 0; base > 1; post_index++)
-		base /= thousand[!!pow2];
+		base /= thousand;
 
 	switch (units) {
 	case N2S_NONE:
@@ -72,14 +72,14 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 	 * Divide by K/Ki until string length of num <= maxlen.
 	 */
 	modulo = -1U;
-	while (post_index < sizeof(sistr)) {
+	while (post_index < ARRAY_SIZE(sistr)) {
 		sprintf(tmp, "%llu", (unsigned long long) num);
 		if (strlen(tmp) <= maxlen)
 			break;
 
-		modulo = num % thousand[!!pow2];
-		num /= thousand[!!pow2];
-		carry = modulo >= thousand[!!pow2] / 2;
+		modulo = num % thousand;
+		num /= thousand;
+		carry = modulo >= thousand / 2;
 		post_index++;
 	}
 
@@ -110,9 +110,9 @@ done:
 	 * Fill in everything and return the result.
 	 */
 	assert(maxlen - strlen(tmp) - 1 > 0);
-	assert(modulo < thousand[!!pow2]);
+	assert(modulo < thousand);
 	sprintf(fmt, "%%.%df", (int)(maxlen - strlen(tmp) - 1));
-	sprintf(tmp, fmt, (double)modulo / (double)thousand[!!pow2]);
+	sprintf(tmp, fmt, (double)modulo / (double)thousand);
 
 	sprintf(buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
 			unitprefix[post_index], unitstr[units]);
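The out-of-bounds fix in num2str() depends on the difference between sizeof and an element count. sistr is an array of string pointers (it is assigned to unitprefix), so sizeof(sistr) is its size in bytes, for example 8 bytes per element on a typical 64-bit target. The old loop bound therefore let post_index run past the last prefix. The small illustration below assumes the ARRAY_SIZE definition and the prefix table; they are modelled on the hunk, not copied from fio's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Element count of a true array (not valid for a pointer). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Assumed prefix table, standing in for sistr[]. */
static const char *prefix[] = { "", "k", "M", "G", "T", "P" };

/*
 * Hypothetical simplified scaler: divide by 1000 (or 1024 if pow2)
 * until the decimal form fits in maxlen characters, never stepping
 * past the last entry of prefix[].
 */
static const char *scale(unsigned long long *num, unsigned int maxlen, int pow2)
{
	/* Computed once, as in the micro-optimized num2str(). */
	const unsigned int thousand = pow2 ? 1024 : 1000;
	size_t idx = 0;
	char tmp[32];

	for (;;) {
		snprintf(tmp, sizeof(tmp), "%llu", *num);
		if (strlen(tmp) <= maxlen || idx + 1 >= ARRAY_SIZE(prefix))
			break;
		*num /= thousand;
		idx++;
	}
	return prefix[idx];
}
```

With sizeof(prefix) as the bound instead, idx could reach 47 on a 64-bit build and prefix[idx] would read far past the six valid entries.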


* Recent changes (master)
@ 2018-09-09 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-09 12:00 UTC
  To: fio

The following changes since commit 2eca48926723d3ebe8f43d4999302fb826f4a250:

  client: cleanup output types (2018-09-07 15:59:51 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fd98fb689d5ad7e9977461e961fff3fdd37f9cb8:

  Kill fusion atomic write engine (2018-09-08 08:09:53 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Rename example job files (*.job -> *.fio)
      Kill fusion atomic write engine

 Makefile                                           |   3 -
 configure                                          |  25 ---
 engines/fusion-aw.c                                | 183 ---------------------
 examples/{fio-rand-RW.job => fio-rand-RW.fio}      |   0
 examples/{fio-rand-read.job => fio-rand-read.fio}  |   0
 .../{fio-rand-write.job => fio-rand-write.fio}     |   0
 examples/{fio-seq-RW.job => fio-seq-RW.fio}        |   0
 examples/{fio-seq-read.job => fio-seq-read.fio}    |   0
 examples/{fio-seq-write.job => fio-seq-write.fio}  |   0
 examples/fusion-aw-sync.fio                        |  18 --
 options.c                                          |   5 -
 os/windows/examples.wxs                            |  28 ++--
 12 files changed, 12 insertions(+), 250 deletions(-)
 delete mode 100644 engines/fusion-aw.c
 rename examples/{fio-rand-RW.job => fio-rand-RW.fio} (100%)
 rename examples/{fio-rand-read.job => fio-rand-read.fio} (100%)
 rename examples/{fio-rand-write.job => fio-rand-write.fio} (100%)
 rename examples/{fio-seq-RW.job => fio-seq-RW.fio} (100%)
 rename examples/{fio-seq-read.job => fio-seq-read.fio} (100%)
 rename examples/{fio-seq-write.job => fio-seq-write.fio} (100%)
 delete mode 100644 examples/fusion-aw-sync.fio

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 7e87b2f..42e5205 100644
--- a/Makefile
+++ b/Makefile
@@ -86,9 +86,6 @@ endif
 ifdef CONFIG_GUASI
   SOURCE += engines/guasi.c
 endif
-ifdef CONFIG_FUSION_AW
-  SOURCE += engines/fusion-aw.c
-endif
 ifdef CONFIG_SOLARISAIO
   SOURCE += engines/solarisaio.c
 endif
diff --git a/configure b/configure
index 5e11195..26c345b 100755
--- a/configure
+++ b/configure
@@ -1149,28 +1149,6 @@ fi
 print_config "GUASI" "$guasi"
 
 ##########################################
-# fusion-aw probe
-if test "$fusion_aw" != "yes" ; then
-  fusion_aw="no"
-fi
-cat > $TMPC << EOF
-#include <nvm/nvm_primitives.h>
-int main(int argc, char **argv)
-{
-  nvm_version_t ver_info;
-  nvm_handle_t handle;
-
-  handle = nvm_get_handle(0, &ver_info);
-  return nvm_atomic_write(handle, 0, 0, 0);
-}
-EOF
-if compile_prog "" "-L/usr/lib/fio -L/usr/lib/nvm -lnvm-primitives -ldl -lpthread" "fusion-aw"; then
-  LIBS="-L/usr/lib/fio -L/usr/lib/nvm -lnvm-primitives -ldl -lpthread $LIBS"
-  fusion_aw="yes"
-fi
-print_config "Fusion-io atomic engine" "$fusion_aw"
-
-##########################################
 # libnuma probe
 if test "$libnuma" != "yes" ; then
   libnuma="no"
@@ -2405,9 +2383,6 @@ fi
 if test "$guasi" = "yes" ; then
   output_sym "CONFIG_GUASI"
 fi
-if test "$fusion_aw" = "yes" ; then
-  output_sym "CONFIG_FUSION_AW"
-fi
 if test "$libnuma_v2" = "yes" ; then
   output_sym "CONFIG_LIBNUMA"
 fi
diff --git a/engines/fusion-aw.c b/engines/fusion-aw.c
deleted file mode 100644
index eb5fdf5..0000000
--- a/engines/fusion-aw.c
+++ /dev/null
@@ -1,183 +0,0 @@
-/*
- * Custom fio(1) engine that submits synchronous atomic writes to file.
- *
- * Copyright (C) 2013 Fusion-io, Inc.
- * Author: Santhosh Kumar Koundinya (skoundinya@fusionio.com).
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; under version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License version
- * 2 for more details.
- *
- * You should have received a copy of the GNU General Public License Version 2
- * along with this program; if not see <http://www.gnu.org/licenses/>
- */
-
-#include <stdlib.h>
-#include <stdint.h>
-
-#include "../fio.h"
-
-#include <nvm/nvm_primitives.h>
-
-#define NUM_ATOMIC_CAPABILITIES (5)
-
-struct fas_data {
-	nvm_handle_t nvm_handle;
-	size_t xfer_buf_align;
-	size_t xfer_buflen_align;
-	size_t xfer_buflen_max;
-	size_t sector_size;
-};
-
-static enum fio_q_status queue(struct thread_data *td, struct io_u *io_u)
-{
-	struct fas_data *d = FILE_ENG_DATA(io_u->file);
-	int rc;
-
-	if (io_u->ddir != DDIR_WRITE) {
-		td_vmsg(td, EINVAL, "only writes supported", "io_u->ddir");
-		rc = -EINVAL;
-		goto out;
-	}
-
-	if ((size_t) io_u->xfer_buf % d->xfer_buf_align) {
-		td_vmsg(td, EINVAL, "unaligned data buffer", "io_u->xfer_buf");
-		rc = -EINVAL;
-		goto out;
-	}
-
-	if (io_u->xfer_buflen % d->xfer_buflen_align) {
-		td_vmsg(td, EINVAL, "unaligned data size", "io_u->xfer_buflen");
-		rc = -EINVAL;
-		goto out;
-	}
-
-	if (io_u->xfer_buflen > d->xfer_buflen_max) {
-		td_vmsg(td, EINVAL, "data too big", "io_u->xfer_buflen");
-		rc = -EINVAL;
-		goto out;
-	}
-
-	rc = nvm_atomic_write(d->nvm_handle, (uint64_t) io_u->xfer_buf,
-		io_u->xfer_buflen, io_u->offset / d->sector_size);
-	if (rc == -1) {
-		td_verror(td, errno, "nvm_atomic_write");
-		rc = -errno;
-		goto out;
-	}
-	rc = FIO_Q_COMPLETED;
-out:
-	if (rc < 0)
-		io_u->error = -rc;
-
-	return rc;
-}
-
-static int open_file(struct thread_data *td, struct fio_file *f)
-{
-	int rc;
-	int fio_unused close_file_rc;
-	struct fas_data *d;
-	nvm_version_t nvm_version;
-	nvm_capability_t nvm_capability[NUM_ATOMIC_CAPABILITIES];
-
-
-	d = malloc(sizeof(*d));
-	if (!d) {
-		td_verror(td, ENOMEM, "malloc");
-		rc = ENOMEM;
-		goto error;
-	}
-	d->nvm_handle = -1;
-	FILE_SET_ENG_DATA(f, d);
-
-	rc = generic_open_file(td, f);
-
-	if (rc)
-		goto free_engine_data;
-
-	/* Set the version of the library as seen when engine is compiled */
-	nvm_version.major = NVM_PRIMITIVES_API_MAJOR;
-	nvm_version.minor = NVM_PRIMITIVES_API_MINOR;
-	nvm_version.micro = NVM_PRIMITIVES_API_MICRO;
-
-	d->nvm_handle = nvm_get_handle(f->fd, &nvm_version);
-	if (d->nvm_handle == -1) {
-		td_vmsg(td, errno, "nvm_get_handle failed", "nvm_get_handle");
-		rc = errno;
-		goto close_file;
-	}
-
-	nvm_capability[0].cap_id = NVM_CAP_ATOMIC_WRITE_START_ALIGN_ID;
-	nvm_capability[1].cap_id = NVM_CAP_ATOMIC_WRITE_MULTIPLICITY_ID;
-	nvm_capability[2].cap_id = NVM_CAP_ATOMIC_WRITE_MAX_VECTOR_SIZE_ID;
-	nvm_capability[3].cap_id = NVM_CAP_SECTOR_SIZE_ID;
-	nvm_capability[4].cap_id = NVM_CAP_ATOMIC_MAX_IOV_ID;
-	rc = nvm_get_capabilities(d->nvm_handle, nvm_capability,
-                                  NUM_ATOMIC_CAPABILITIES, false);
-	if (rc == -1) {
-		td_vmsg(td, errno, "error in getting atomic write capabilities", "nvm_get_capabilities");
-		rc = errno;
-		goto close_file;
-	} else if (rc < NUM_ATOMIC_CAPABILITIES) {
-		td_vmsg(td, EINVAL, "couldn't get all the atomic write capabilities" , "nvm_get_capabilities");
-		rc = ECANCELED;
-		goto close_file;
-	}
-	/* Reset rc to 0 because we got all capabilities we needed */
-	rc = 0;
-	d->xfer_buf_align = nvm_capability[0].cap_value;
-	d->xfer_buflen_align = nvm_capability[1].cap_value;
-	d->xfer_buflen_max = d->xfer_buflen_align * nvm_capability[2].cap_value * nvm_capability[4].cap_value;
-	d->sector_size = nvm_capability[3].cap_value;
-
-out:
-	return rc;
-close_file:
-	close_file_rc = generic_close_file(td, f);
-free_engine_data:
-	free(d);
-error:
-	f->fd = -1;
-	FILE_SET_ENG_DATA(f, NULL);
-	goto out;
-}
-
-static int close_file(struct thread_data *td, struct fio_file *f)
-{
-	struct fas_data *d = FILE_ENG_DATA(f);
-
-	if (d) {
-		if (d->nvm_handle != -1)
-			nvm_release_handle(d->nvm_handle);
-		free(d);
-		FILE_SET_ENG_DATA(f, NULL);
-	}
-
-	return generic_close_file(td, f);
-}
-
-static struct ioengine_ops ioengine = {
-	.name = "fusion-aw-sync",
-	.version = FIO_IOOPS_VERSION,
-	.queue = queue,
-	.open_file = open_file,
-	.close_file = close_file,
-	.get_file_size = generic_get_file_size,
-	.flags = FIO_SYNCIO | FIO_RAWIO | FIO_MEMALIGN
-};
-
-static void fio_init fio_fusion_aw_init(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_fusion_aw_exit(void)
-{
-	unregister_ioengine(&ioengine);
-}
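The deleted engine registered itself through fio_fusion_aw_init() and fio_fusion_aw_exit(), tagged fio_init and fio_exit. Those macros are assumed here to expand to the GCC/Clang constructor and destructor attributes, so registration would happen before main() runs. Below is a minimal sketch of that self-registration pattern. The struct, list, and function names are hypothetical and much simpler than fio's struct ioengine_ops and register_ioengine().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down engine descriptor. */
struct engine_ops {
	const char *name;
	struct engine_ops *next;
};

static struct engine_ops *engine_list;	/* singly linked registry */

static void register_engine(struct engine_ops *ops)
{
	ops->next = engine_list;
	engine_list = ops;
}

static void unregister_engine(struct engine_ops *ops)
{
	struct engine_ops **pp;

	for (pp = &engine_list; *pp; pp = &(*pp)->next) {
		if (*pp == ops) {
			*pp = ops->next;
			return;
		}
	}
}

static struct engine_ops *find_engine(const char *name)
{
	struct engine_ops *ops;

	for (ops = engine_list; ops; ops = ops->next)
		if (!strcmp(ops->name, name))
			return ops;
	return NULL;
}

static struct engine_ops example_engine = { .name = "example-sync" };

/* Runs before main(), like a function marked fio_init. */
static void __attribute__((constructor)) example_engine_init(void)
{
	register_engine(&example_engine);
}

/* Runs after main() returns or exit() is called, like fio_exit. */
static void __attribute__((destructor)) example_engine_exit(void)
{
	unregister_engine(&example_engine);
}
```

Because each engine file only needs its own constructor, removing an engine comes down to dropping its source file, its Makefile and configure entries, and its option value, which is what this commit does.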
diff --git a/examples/fio-rand-RW.fio b/examples/fio-rand-RW.fio
new file mode 100644
index 0000000..0df0bc1
--- /dev/null
+++ b/examples/fio-rand-RW.fio
@@ -0,0 +1,18 @@
+; fio-rand-RW.job for fiotest
+
+[global]
+name=fio-rand-RW
+filename=fio-rand-RW
+rw=randrw
+rwmixread=60
+rwmixwrite=40
+bs=4K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-rand-RW.job b/examples/fio-rand-RW.job
deleted file mode 100644
index 0df0bc1..0000000
--- a/examples/fio-rand-RW.job
+++ /dev/null
@@ -1,18 +0,0 @@
-; fio-rand-RW.job for fiotest
-
-[global]
-name=fio-rand-RW
-filename=fio-rand-RW
-rw=randrw
-rwmixread=60
-rwmixwrite=40
-bs=4K
-direct=0
-numjobs=4
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fio-rand-read.fio b/examples/fio-rand-read.fio
new file mode 100644
index 0000000..bc15466
--- /dev/null
+++ b/examples/fio-rand-read.fio
@@ -0,0 +1,16 @@
+; fio-rand-read.job for fiotest
+
+[global]
+name=fio-rand-read
+filename=fio-rand-read
+rw=randread
+bs=4K
+direct=0
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-rand-read.job b/examples/fio-rand-read.job
deleted file mode 100644
index bc15466..0000000
--- a/examples/fio-rand-read.job
+++ /dev/null
@@ -1,16 +0,0 @@
-; fio-rand-read.job for fiotest
-
-[global]
-name=fio-rand-read
-filename=fio-rand-read
-rw=randread
-bs=4K
-direct=0
-numjobs=1
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fio-rand-write.fio b/examples/fio-rand-write.fio
new file mode 100644
index 0000000..bd1b73a
--- /dev/null
+++ b/examples/fio-rand-write.fio
@@ -0,0 +1,16 @@
+; fio-rand-write.job for fiotest
+
+[global]
+name=fio-rand-write
+filename=fio-rand-write
+rw=randwrite
+bs=4K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-rand-write.job b/examples/fio-rand-write.job
deleted file mode 100644
index bd1b73a..0000000
--- a/examples/fio-rand-write.job
+++ /dev/null
@@ -1,16 +0,0 @@
-; fio-rand-write.job for fiotest
-
-[global]
-name=fio-rand-write
-filename=fio-rand-write
-rw=randwrite
-bs=4K
-direct=0
-numjobs=4
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fio-seq-RW.fio b/examples/fio-seq-RW.fio
new file mode 100644
index 0000000..8f7090f
--- /dev/null
+++ b/examples/fio-seq-RW.fio
@@ -0,0 +1,18 @@
+; fio-seq-RW.job for fiotest
+
+[global]
+name=fio-seq-RW
+filename=fio-seq-RW
+rw=rw
+rwmixread=60
+rwmixwrite=40
+bs=256K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-RW.job b/examples/fio-seq-RW.job
deleted file mode 100644
index 8f7090f..0000000
--- a/examples/fio-seq-RW.job
+++ /dev/null
@@ -1,18 +0,0 @@
-; fio-seq-RW.job for fiotest
-
-[global]
-name=fio-seq-RW
-filename=fio-seq-RW
-rw=rw
-rwmixread=60
-rwmixwrite=40
-bs=256K
-direct=0
-numjobs=4
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fio-seq-read.fio b/examples/fio-seq-read.fio
new file mode 100644
index 0000000..28de93c
--- /dev/null
+++ b/examples/fio-seq-read.fio
@@ -0,0 +1,14 @@
+[global]
+name=fio-seq-reads
+filename=fio-seq-reads
+rw=read
+bs=256K
+direct=1
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-read.job b/examples/fio-seq-read.job
deleted file mode 100644
index 28de93c..0000000
--- a/examples/fio-seq-read.job
+++ /dev/null
@@ -1,14 +0,0 @@
-[global]
-name=fio-seq-reads
-filename=fio-seq-reads
-rw=read
-bs=256K
-direct=1
-numjobs=1
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fio-seq-write.fio b/examples/fio-seq-write.fio
new file mode 100644
index 0000000..b291a15
--- /dev/null
+++ b/examples/fio-seq-write.fio
@@ -0,0 +1,16 @@
+; fio-seq-write.job for fiotest
+
+[global]
+name=fio-seq-write
+filename=fio-seq-write
+rw=write
+bs=256K
+direct=0
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-write.job b/examples/fio-seq-write.job
deleted file mode 100644
index b291a15..0000000
--- a/examples/fio-seq-write.job
+++ /dev/null
@@ -1,16 +0,0 @@
-; fio-seq-write.job for fiotest
-
-[global]
-name=fio-seq-write
-filename=fio-seq-write
-rw=write
-bs=256K
-direct=0
-numjobs=1
-time_based=1
-runtime=900
-
-[file1]
-size=10G
-ioengine=libaio
-iodepth=16
diff --git a/examples/fusion-aw-sync.fio b/examples/fusion-aw-sync.fio
deleted file mode 100644
index f2ca313..0000000
--- a/examples/fusion-aw-sync.fio
+++ /dev/null
@@ -1,18 +0,0 @@
-# Example Job File that randomly writes 8k worth of data atomically for
-# 60 seconds.
-[rw_aw_file_sync]
-rw=randwrite
-ioengine=fusion-aw-sync
-blocksize=8k
-blockalign=8k
-
-# if file system supports atomic write
-filename=/mnt/fs/file
-# or test on a direct block device instead
-#filename=/dev/fioa
-randrepeat=1
-fallocate=none
-direct=1
-invalidate=0
-runtime=60
-time_based
diff --git a/options.c b/options.c
index 534233b..6bd7455 100644
--- a/options.c
+++ b/options.c
@@ -1828,11 +1828,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "RDMA IO engine",
 			  },
 #endif
-#ifdef CONFIG_FUSION_AW
-			  { .ival = "fusion-aw-sync",
-			    .help = "Fusion-io atomic write engine",
-			  },
-#endif
 #ifdef CONFIG_LINUX_EXT4_MOVE_EXTENT
 			  { .ival = "e4defrag",
 			    .help = "ext4 defrag engine",
diff --git a/os/windows/examples.wxs b/os/windows/examples.wxs
index e8580d9..9308ba8 100755
--- a/os/windows/examples.wxs
+++ b/os/windows/examples.wxs
@@ -45,22 +45,22 @@
                   <File Source="..\..\examples\filecreate-ioengine.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-rand-read.job" />
+                  <File Source="..\..\examples\fio-rand-read.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-rand-RW.job" />
+                  <File Source="..\..\examples\fio-rand-RW.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-rand-write.job" />
+                  <File Source="..\..\examples\fio-rand-write.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-seq-read.job" />
+                  <File Source="..\..\examples\fio-seq-read.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-seq-RW.job" />
+                  <File Source="..\..\examples\fio-seq-RW.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fio-seq-write.job" />
+                  <File Source="..\..\examples\fio-seq-write.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\fixed-rate-submission.fio" />
@@ -75,9 +75,6 @@
                   <File Source="..\..\examples\ftruncate.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fusion-aw-sync.fio" />
-                </Component>
-                <Component>
                   <File Source="..\..\examples\gfapi.fio" />
                 </Component>
                 <Component>
@@ -188,17 +185,16 @@
             <ComponentRef Id="enospc_pressure.fio" />
             <ComponentRef Id="falloc.fio" />
             <ComponentRef Id="filecreate_ioengine.fio"/>
-            <ComponentRef Id="fio_rand_read.job"/>
-            <ComponentRef Id="fio_rand_RW.job"/>
-            <ComponentRef Id="fio_rand_write.job"/>
-            <ComponentRef Id="fio_seq_read.job"/>
-            <ComponentRef Id="fio_seq_RW.job"/>
-            <ComponentRef Id="fio_seq_write.job"/>
+            <ComponentRef Id="fio_rand_read.fio"/>
+            <ComponentRef Id="fio_rand_RW.fio"/>
+            <ComponentRef Id="fio_rand_write.fio"/>
+            <ComponentRef Id="fio_seq_read.fio"/>
+            <ComponentRef Id="fio_seq_RW.fio"/>
+            <ComponentRef Id="fio_seq_write.fio"/>
             <ComponentRef Id="fixed_rate_submission.fio" />
             <ComponentRef Id="flow.fio" />
             <ComponentRef Id="fsx.fio" />
             <ComponentRef Id="ftruncate.fio"/>
-            <ComponentRef Id="fusion_aw_sync.fio" />
             <ComponentRef Id="gfapi.fio" />
             <ComponentRef Id="gpudirect_rdmaio_client.fio"/>
             <ComponentRef Id="gpudirect_rdmaio_server.fio"/>


* Recent changes (master)
@ 2018-09-08 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-08 12:00 UTC
  To: fio

The following changes since commit bb661c4027e5a0482b9fcc1c1b4e7e918650ee72:

  Fio 3.9 (2018-09-06 09:07:55 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2eca48926723d3ebe8f43d4999302fb826f4a250:

  client: cleanup output types (2018-09-07 15:59:51 -0600)

----------------------------------------------------------------
Jens Axboe (9):
      Collect startup output before logging it
      client: use temp buffer for client text output
      log: use __log_buf() if we know buf != NULL
      init: use __log_buf() if we know buf != NULL
      log: remember to free output buffer when done
      Revert "client: respect terse output on client <--> backend relationship"
      client: use temp buffer for single output flush for json/disk util
      client: switch to per-client buffer
      client: cleanup output types

 backend.c | 16 +++++++-----
 client.c  | 90 +++++++++++++++++++++++++++++++++++----------------------------
 client.h  |  2 ++
 init.c    | 12 ++++++---
 stat.c    |  2 --
 5 files changed, 70 insertions(+), 52 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 8fec1ce..bb8bd13 100644
--- a/backend.c
+++ b/backend.c
@@ -2213,18 +2213,22 @@ static void run_threads(struct sk_out *sk_out)
 	}
 
 	if (output_format & FIO_OUTPUT_NORMAL) {
-		log_info("Starting ");
+		struct buf_output out;
+
+		buf_output_init(&out);
+		__log_buf(&out, "Starting ");
 		if (nr_thread)
-			log_info("%d thread%s", nr_thread,
+			__log_buf(&out, "%d thread%s", nr_thread,
 						nr_thread > 1 ? "s" : "");
 		if (nr_process) {
 			if (nr_thread)
-				log_info(" and ");
-			log_info("%d process%s", nr_process,
+				__log_buf(&out, " and ");
+			__log_buf(&out, "%d process%s", nr_process,
 						nr_process > 1 ? "es" : "");
 		}
-		log_info("\n");
-		log_info_flush();
+		__log_buf(&out, "\n");
+		log_info_buf(out.buf, out.buflen);
+		buf_output_free(&out);
 	}
 
 	todo = thread_number;
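The backend.c change builds the "Starting N threads and M processes" line in a struct buf_output and emits it with a single log_info_buf() call. Before, several separate log_info() calls produced the line. Below is a self-contained sketch of such an append-then-flush buffer built on vsnprintf(). The type and function names are hypothetical stand-ins for buf_output_init(), __log_buf() and buf_output_free(), not fio's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer. */
struct text_buf {
	char *buf;
	size_t len;	/* bytes used, excluding the trailing '\0' */
	size_t cap;	/* bytes allocated */
};

static void text_buf_init(struct text_buf *tb)
{
	tb->buf = NULL;
	tb->len = tb->cap = 0;
}

/* printf-style append; returns 0 on success, -1 on error. */
static int text_buf_printf(struct text_buf *tb, const char *fmt, ...)
{
	va_list ap;
	size_t need;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);	/* measure first */
	va_end(ap);
	if (n < 0)
		return -1;

	need = tb->len + (size_t) n + 1;
	if (need > tb->cap) {
		char *p = realloc(tb->buf, need * 2);

		if (!p)
			return -1;
		tb->buf = p;
		tb->cap = need * 2;
	}

	va_start(ap, fmt);
	vsnprintf(tb->buf + tb->len, tb->cap - tb->len, fmt, ap);
	va_end(ap);
	tb->len += (size_t) n;
	return 0;
}

static void text_buf_free(struct text_buf *tb)
{
	free(tb->buf);
	text_buf_init(tb);
}

/* Mirrors the message assembled in run_threads() in the hunk above. */
static void format_start_line(struct text_buf *out, int nr_thread,
			      int nr_process)
{
	text_buf_printf(out, "Starting ");
	if (nr_thread)
		text_buf_printf(out, "%d thread%s", nr_thread,
				nr_thread > 1 ? "s" : "");
	if (nr_process) {
		if (nr_thread)
			text_buf_printf(out, " and ");
		text_buf_printf(out, "%d process%s", nr_process,
				nr_process > 1 ? "es" : "");
	}
	text_buf_printf(out, "\n");
}
```

After format_start_line(), one write of out.buf (out.len bytes) emits the whole line at once, which is the role log_info_buf() plays in the patch.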
diff --git a/client.c b/client.c
index 31c7c64..3248906 100644
--- a/client.c
+++ b/client.c
@@ -198,14 +198,23 @@ static void fio_client_json_init(void)
 
 static void fio_client_json_fini(void)
 {
-	if (!(output_format & FIO_OUTPUT_JSON))
+	struct buf_output out;
+
+	if (!root)
 		return;
 
-	log_info("\n");
-	json_print_object(root, NULL);
-	log_info("\n");
+	buf_output_init(&out);
+
+	__log_buf(&out, "\n");
+	json_print_object(root, &out);
+	__log_buf(&out, "\n");
+	log_info_buf(out.buf, out.buflen);
+
+	buf_output_free(&out);
+
 	json_free_object(root);
 	root = NULL;
+	job_opt_object = NULL;
 	clients_array = NULL;
 	du_array = NULL;
 }
@@ -233,6 +242,9 @@ void fio_put_client(struct fio_client *client)
 	if (--client->refs)
 		return;
 
+	log_info_buf(client->buf.buf, client->buf.buflen);
+	buf_output_free(&client->buf);
+
 	free(client->hostname);
 	if (client->argv)
 		free(client->argv);
@@ -351,9 +363,7 @@ void fio_client_add_cmd_option(void *cookie, const char *opt)
 	}
 }
 
-struct fio_client *fio_client_add_explicit(struct client_ops *ops,
-					   const char *hostname, int type,
-					   int port)
+static struct fio_client *get_new_client(void)
 {
 	struct fio_client *client;
 
@@ -366,6 +376,19 @@ struct fio_client *fio_client_add_explicit(struct client_ops *ops,
 	INIT_FLIST_HEAD(&client->eta_list);
 	INIT_FLIST_HEAD(&client->cmd_list);
 
+	buf_output_init(&client->buf);
+
+	return client;
+}
+
+struct fio_client *fio_client_add_explicit(struct client_ops *ops,
+					   const char *hostname, int type,
+					   int port)
+{
+	struct fio_client *client;
+
+	client = get_new_client();
+
 	client->hostname = strdup(hostname);
 
 	if (type == Fio_client_socket)
@@ -441,14 +464,7 @@ int fio_client_add(struct client_ops *ops, const char *hostname, void **cookie)
 		}
 	}
 
-	client = malloc(sizeof(*client));
-	memset(client, 0, sizeof(*client));
-
-	INIT_FLIST_HEAD(&client->list);
-	INIT_FLIST_HEAD(&client->hash_list);
-	INIT_FLIST_HEAD(&client->arg_list);
-	INIT_FLIST_HEAD(&client->eta_list);
-	INIT_FLIST_HEAD(&client->cmd_list);
+	client = get_new_client();
 
 	if (fio_server_parse_string(hostname, &client->hostname,
 					&client->is_sock, &client->port,
@@ -1059,13 +1075,10 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 	struct flist_head *opt_list = NULL;
 	struct json_object *tsobj;
 
-	if (output_format & FIO_OUTPUT_TERSE)
-		return;
-
 	if (client->opt_lists && p->ts.thread_number <= client->jobs)
 		opt_list = &client->opt_lists[p->ts.thread_number - 1];
 
-	tsobj = show_thread_status(&p->ts, &p->rs, opt_list, NULL);
+	tsobj = show_thread_status(&p->ts, &p->rs, opt_list, &client->buf);
 	client->did_stat = true;
 	if (tsobj) {
 		json_object_add_client_info(tsobj, client);
@@ -1086,7 +1099,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	if (++sum_stat_nr == sum_stat_clients) {
 		strcpy(client_ts.name, "All clients");
-		tsobj = show_thread_status(&client_ts, &client_gs, NULL, NULL);
+		tsobj = show_thread_status(&client_ts, &client_gs, NULL, &client->buf);
 		if (tsobj) {
 			json_object_add_client_info(tsobj, client);
 			json_array_add_value_object(clients_array, tsobj);
@@ -1098,11 +1111,8 @@ static void handle_gs(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct group_run_stats *gs = (struct group_run_stats *) cmd->payload;
 
-	if (output_format & FIO_OUTPUT_TERSE)
-		return;
-
 	if (output_format & FIO_OUTPUT_NORMAL)
-		show_group_stats(gs, NULL);
+		show_group_stats(gs, &client->buf);
 }
 
 static void handle_job_opt(struct fio_client *client, struct fio_net_cmd *cmd)
@@ -1144,13 +1154,17 @@ static void handle_text(struct fio_client *client, struct fio_net_cmd *cmd)
 	const char *buf = (const char *) pdu->buf;
 	const char *name;
 	int fio_unused ret;
+	struct buf_output out;
+
+	buf_output_init(&out);
 
 	name = client->name ? client->name : client->hostname;
 
 	if (!client->skip_newline && !(output_format & FIO_OUTPUT_TERSE))
-		fprintf(f_out, "<%s> ", name);
-	ret = fwrite(buf, pdu->buf_len, 1, f_out);
-	fflush(f_out);
+		__log_buf(&out, "<%s> ", name);
+	__log_buf(&out, "%s", buf);
+	log_info_buf(out.buf, out.buflen);
+	buf_output_free(&out);
 	client->skip_newline = strchr(buf, '\n') == NULL;
 }
 
@@ -1191,23 +1205,21 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct cmd_du_pdu *du = (struct cmd_du_pdu *) cmd->payload;
 
-	if (output_format & FIO_OUTPUT_TERSE)
-		return;
-
-	if (!client->disk_stats_shown) {
+	if (!client->disk_stats_shown)
 		client->disk_stats_shown = true;
-		if (!(output_format & FIO_OUTPUT_JSON))
-			log_info("\nDisk stats (read/write):\n");
-	}
 
 	if (output_format & FIO_OUTPUT_JSON) {
 		struct json_object *duobj;
+
 		json_array_add_disk_util(&du->dus, &du->agg, du_array);
 		duobj = json_array_last_value_object(du_array);
 		json_object_add_client_info(duobj, client);
+	} else if (output_format & FIO_OUTPUT_TERSE)
+		print_disk_util(&du->dus, &du->agg, 1, &client->buf);
+	else if (output_format & FIO_OUTPUT_NORMAL) {
+		__log_buf(&client->buf, "\nDisk stats (read/write):\n");
+		print_disk_util(&du->dus, &du->agg, 0, &client->buf);
 	}
-	if (output_format & FIO_OUTPUT_NORMAL)
-		print_disk_util(&du->dus, &du->agg, 0, NULL);
 }
 
 static void convert_jobs_eta(struct jobs_eta *je)
@@ -1465,9 +1477,6 @@ static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd)
 	const char *os, *arch;
 	char bit[16];
 
-	if (output_format & FIO_OUTPUT_TERSE)
-		return;
-
 	os = fio_get_os_string(probe->os);
 	if (!os)
 		os = "unknown";
@@ -1479,10 +1488,11 @@ static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd)
 	sprintf(bit, "%d-bit", probe->bpp * 8);
 	probe->flags = le64_to_cpu(probe->flags);
 
-	if (!(output_format & FIO_OUTPUT_JSON))
+	if (output_format & FIO_OUTPUT_NORMAL) {
 		log_info("hostname=%s, be=%u, %s, os=%s, arch=%s, fio=%s, flags=%lx\n",
 			probe->hostname, probe->bigendian, bit, os, arch,
 			probe->fio_version, (unsigned long) probe->flags);
+	}
 
 	if (!client->name)
 		client->name = strdup((char *) probe->hostname);
diff --git a/client.h b/client.h
index a597449..8033325 100644
--- a/client.h
+++ b/client.h
@@ -74,6 +74,8 @@ struct fio_client {
 
 	struct client_file *files;
 	unsigned int nr_files;
+
+	struct buf_output buf;
 };
 
 typedef void (client_cmd_op)(struct fio_client *, struct fio_net_cmd *);
diff --git a/init.c b/init.c
index 09f58a3..c235b05 100644
--- a/init.c
+++ b/init.c
@@ -1681,6 +1681,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 				char *c1, *c2, *c3, *c4;
 				char *c5 = NULL, *c6 = NULL;
 				int i2p = is_power_of_2(o->kb_base);
+				struct buf_output out;
 
 				c1 = num2str(o->min_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
 				c2 = num2str(o->max_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
@@ -1692,19 +1693,22 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 					c6 = num2str(o->max_bs[DDIR_TRIM], o->sig_figs, 1, i2p, N2S_BYTE);
 				}
 
-				log_info("%s: (g=%d): rw=%s, ", td->o.name,
+				buf_output_init(&out);
+				__log_buf(&out, "%s: (g=%d): rw=%s, ", td->o.name,
 							td->groupid,
 							ddir_str(o->td_ddir));
 
 				if (o->bs_is_seq_rand)
-					log_info("bs=(R) %s-%s, (W) %s-%s, bs_is_seq_rand, ",
+					__log_buf(&out, "bs=(R) %s-%s, (W) %s-%s, bs_is_seq_rand, ",
 							c1, c2, c3, c4);
 				else
-					log_info("bs=(R) %s-%s, (W) %s-%s, (T) %s-%s, ",
+					__log_buf(&out, "bs=(R) %s-%s, (W) %s-%s, (T) %s-%s, ",
 							c1, c2, c3, c4, c5, c6);
 
-				log_info("ioengine=%s, iodepth=%u\n",
+				__log_buf(&out, "ioengine=%s, iodepth=%u\n",
 						td->io_ops->name, o->iodepth);
+				log_info_buf(out.buf, out.buflen);
+				buf_output_free(&out);
 
 				free(c1);
 				free(c2);
diff --git a/stat.c b/stat.c
index 1a9c553..5fca998 100644
--- a/stat.c
+++ b/stat.c
@@ -1928,8 +1928,6 @@ void __show_run_stats(void)
 		if (is_backend) {
 			fio_server_send_job_options(opt_lists[i], i);
 			fio_server_send_ts(ts, rs);
-			if (output_format & FIO_OUTPUT_TERSE)
-				show_thread_status_terse(ts, rs, &output[__FIO_OUTPUT_TERSE]);
 		} else {
 			if (output_format & FIO_OUTPUT_TERSE)
 				show_thread_status_terse(ts, rs, &output[__FIO_OUTPUT_TERSE]);
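
The client.c and init.c hunks above replace direct fprintf()/fwrite() calls on f_out and piecewise log_info() calls with text collected in a struct buf_output. In handle_text() and the add_job() banner, that text is emitted with a single log_info_buf() call; other handlers write into the per-client client->buf. The general pattern is to append formatted text to a growable buffer and then write it out in one call. A minimal standalone sketch follows. struct text_buf and its helpers are hypothetical and are not fio's buf_output or __log_buf() implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; not fio's struct buf_output. */
struct text_buf {
	char *buf;
	size_t len;	/* bytes used, excluding the trailing NUL */
	size_t cap;	/* bytes allocated */
};

static void text_buf_init(struct text_buf *tb)
{
	tb->buf = NULL;
	tb->len = 0;
	tb->cap = 0;
}

/* Append printf-style text, growing the buffer as needed.
 * Returns 0 on success, -1 on a formatting or allocation error. */
static int text_buf_printf(struct text_buf *tb, const char *fmt, ...)
{
	va_list ap, ap2;
	int n, m;

	va_start(ap, fmt);
	va_copy(ap2, ap);
	n = vsnprintf(NULL, 0, fmt, ap);	/* measure first */
	va_end(ap);
	if (n < 0) {
		va_end(ap2);
		return -1;
	}
	if (tb->len + (size_t)n + 1 > tb->cap) {
		size_t newcap = (tb->len + (size_t)n + 1) * 2;
		char *p = realloc(tb->buf, newcap);

		if (!p) {
			va_end(ap2);
			return -1;
		}
		tb->buf = p;
		tb->cap = newcap;
	}
	m = vsnprintf(tb->buf + tb->len, tb->cap - tb->len, fmt, ap2);
	va_end(ap2);
	assert(m == n);
	(void)m;
	tb->len += (size_t)n;
	return 0;
}

/* Write everything accumulated so far in one fwrite(), then reset. */
static void text_buf_flush(struct text_buf *tb, FILE *out)
{
	if (tb->len)
		fwrite(tb->buf, 1, tb->len, out);
	fflush(out);
	tb->len = 0;
}

static void text_buf_free(struct text_buf *tb)
{
	free(tb->buf);
	text_buf_init(tb);
}
```

Building the "<host> " prefix and the payload into one buffer before writing is one way to keep a single message from being split by output coming from elsewhere. That this is the motivation for the change above is an interpretation, not something the patch states.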



* Recent changes (master)
@ 2018-09-07 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-07 12:00 UTC
  To: fio

The following changes since commit f111f76c0e6687c0cc6fc562cb902e4a64e42b37:

  Merge branch 'master' of https://github.com/damien-lemoal/fio (2018-09-05 19:25:39 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bb661c4027e5a0482b9fcc1c1b4e7e918650ee72:

  Fio 3.9 (2018-09-06 09:07:55 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.9

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 99261fb..f031594 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.8
+DEF_VER=fio-3.9
 
 LF='
 '



* Recent changes (master)
@ 2018-09-06 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-06 12:00 UTC
  To: fio

The following changes since commit 53ee8c17adb00e3db4f2c9441777ba777390cb9f:

  engines/sg: improve error handling (2018-09-03 09:01:14 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f111f76c0e6687c0cc6fc562cb902e4a64e42b37:

  Merge branch 'master' of https://github.com/damien-lemoal/fio (2018-09-05 19:25:39 -0600)

----------------------------------------------------------------
Damien Le Moal (5):
      zbd: Fix test scripts
      zbd: Improve read randomness
      zbd: Remove inexistent functions declaration
      zbd: Fix zbd_zone_idx()
      zbd: Use bytes unit

Jens Axboe (4):
      Document oddity with --status-interval and --output-format=json
      Merge branch 'sz/log-names-need-help' of https://github.com/szaydel/fio
      init: move log name generation into helper
      Merge branch 'master' of https://github.com/damien-lemoal/fio

Rebecca Cran (1):
      Windows: update download URL and add missing examples

Sam Zaydel (1):
      Log files names start with _ when write_XX_log= keys in config file have empty value(s)

 HOWTO                                 |   5 +-
 fio.1                                 |   5 +-
 init.c                                |  20 ++++-
 os/windows/examples.wxs               |  90 ++++++++++++++++++--
 os/windows/install.wxs                |   2 +-
 t/zbd/run-tests-against-regular-nullb |   4 +-
 t/zbd/run-tests-against-zoned-nullb   |   4 +-
 zbd.c                                 | 149 +++++++++++++++++++---------------
 zbd.h                                 |  12 ---
 9 files changed, 196 insertions(+), 95 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 7bbd589..0c5b710 100644
--- a/HOWTO
+++ b/HOWTO
@@ -194,7 +194,10 @@ Command line options
 	Force a full status dump of cumulative (from job start) values at `time`
 	intervals. This option does *not* provide per-period measurements. So
 	values such as bandwidth are running averages. When the time unit is omitted,
-	`time` is interpreted in seconds.
+	`time` is interpreted in seconds. Note that using this option with
+	``--output-format=json`` will yield output that technically isn't valid
+	json, since the output will be collated sets of valid json. It will need
+	to be split into valid sets of json after the run.
 
 .. option:: --section=name
 
diff --git a/fio.1 b/fio.1
index b555b20..593f4db 100644
--- a/fio.1
+++ b/fio.1
@@ -93,7 +93,10 @@ the value is interpreted in seconds.
 Force a full status dump of cumulative (from job start) values at \fItime\fR
 intervals. This option does *not* provide per-period measurements. So
 values such as bandwidth are running averages. When the time unit is omitted,
-\fItime\fR is interpreted in seconds.
+\fItime\fR is interpreted in seconds. Note that using this option with
+`\-\-output-format=json' will yield output that technically isn't valid json,
+since the output will be collated sets of valid json. It will need to be split
+into valid sets of json after the run.
 .TP
 .BI \-\-section \fR=\fPname
 Only run specified section \fIname\fR in job file. Multiple sections can be specified.
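
The note added to HOWTO and fio.1 above says that --status-interval combined with --output-format=json produces several complete JSON documents back to back, which have to be split apart after the run. One way to do that is to scan for balanced top-level braces while ignoring braces inside string literals. A minimal sketch follows. It assumes the captured output contains only top-level JSON objects separated by whitespace; any other text printed alongside them would need to be filtered out first. split_json_objects() is a hypothetical helper, not part of fio.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split a buffer holding several concatenated top-level JSON objects
 * (e.g. "{...}\n{...}\n") into [begin, end) byte ranges. Braces inside
 * string literals, including after escaped quotes, are ignored.
 * Returns the number of complete objects found (at most max); a
 * trailing, unterminated object is not counted.
 */
static size_t split_json_objects(const char *s, size_t n,
				 size_t ranges[][2], size_t max)
{
	size_t i, found = 0, begin = 0;
	int depth = 0, in_str = 0, esc = 0;

	assert(s != NULL || n == 0);
	for (i = 0; i < n && found < max; i++) {
		char c = s[i];

		if (in_str) {
			if (esc)
				esc = 0;	/* character after a backslash */
			else if (c == '\\')
				esc = 1;
			else if (c == '"')
				in_str = 0;
			continue;
		}
		if (c == '"') {
			in_str = 1;
		} else if (c == '{') {
			if (depth++ == 0)
				begin = i;	/* start of a top-level object */
		} else if (c == '}') {
			if (depth == 0)
				break;		/* stray '}': stop scanning */
			if (--depth == 0) {
				ranges[found][0] = begin;
				ranges[found][1] = i + 1;
				found++;
			}
		}
	}
	return found;
}
```

Each returned range can then be handed to an ordinary JSON parser as a separate document.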
diff --git a/init.c b/init.c
index b925b4c..09f58a3 100644
--- a/init.c
+++ b/init.c
@@ -1419,6 +1419,17 @@ static bool wait_for_ok(const char *jobname, struct thread_options *o)
 }
 
 /*
+ * Treat an empty log file name the same as a one not given
+ */
+static const char *make_log_name(const char *logname, const char *jobname)
+{
+	if (logname && strcmp(logname, ""))
+		return logname;
+
+	return jobname;
+}
+
+/*
  * Adds a job to the list of things todo. Sanitizes the various options
  * to make sure we don't have conflicts, and initializes various
  * members of td.
@@ -1542,7 +1553,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
-		const char *pre = o->lat_log_file ? o->lat_log_file : o->name;
+		const char *pre = make_log_name(o->lat_log_file, o->name);
 		const char *suf;
 
 		if (p.log_gz_store)
@@ -1561,6 +1572,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		gen_log_name(logname, sizeof(logname), "clat", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->clat_log, &p, logname);
+
 	}
 
 	if (o->write_hist_log) {
@@ -1574,7 +1586,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
-		const char *pre = o->hist_log_file ? o->hist_log_file : o->name;
+		const char *pre = make_log_name(o->hist_log_file, o->name);
 		const char *suf;
 
 #ifndef CONFIG_ZLIB
@@ -1605,7 +1617,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
-		const char *pre = o->bw_log_file ? o->bw_log_file : o->name;
+		const char *pre = make_log_name(o->bw_log_file, o->name);
 		const char *suf;
 
 		if (fio_option_is_set(o, bw_avg_time))
@@ -1636,7 +1648,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
-		const char *pre = o->iops_log_file ? o->iops_log_file : o->name;
+		const char *pre = make_log_name(o->iops_log_file, o->name);
 		const char *suf;
 
 		if (fio_option_is_set(o, iops_avg_time))
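
make_log_name() returns the job name whenever the configured log file name is NULL or empty. As a result, an option such as write_lat_log= given with no value no longer produces file names beginning with "_", which is the symptom named in Sam Zaydel's commit title. The standalone sketch below restates that rule as pick_log_prefix(). It pairs the rule with a hypothetical format_log_name() that only approximates fio's documented name_type.N.log naming; it is not gen_log_name().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same rule as make_log_name(): NULL and "" both fall back to the job name. */
static const char *pick_log_prefix(const char *logname, const char *jobname)
{
	assert(jobname != NULL);
	if (logname && strcmp(logname, ""))
		return logname;
	return jobname;
}

/* Hypothetical approximation of the "<prefix>_<type>.<N>.log" pattern. */
static int format_log_name(char *buf, size_t len, const char *prefix,
			   const char *type, unsigned int thread_number)
{
	return snprintf(buf, len, "%s_%s.%u.log", prefix, type, thread_number);
}
```

Without the fallback, an empty prefix passed to a formatter like this yields "_clat.1.log".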
diff --git a/os/windows/examples.wxs b/os/windows/examples.wxs
index cc2ff5c..e8580d9 100755
--- a/os/windows/examples.wxs
+++ b/os/windows/examples.wxs
@@ -3,16 +3,22 @@
     <Fragment>
         <DirectoryRef Id="examples">
                 <Component>
-                    <File Source="..\..\examples\1mbs_clients.fio" />
+                  <File Source="..\..\examples\1mbs_clients.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\aio-read.fio" />
+                  <File Source="..\..\examples\aio-read.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\backwards-read.fio" />
+                  <File Source="..\..\examples\backwards-read.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\basic-verify.fio" />
+                  <File Source="..\..\examples\basic-verify.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\butterfly.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\cpp_null.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\cpuio.fio" />
@@ -21,7 +27,7 @@
                   <File Source="..\..\examples\dev-dax.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\disk-zone-profile.fio" />
+                  <File Source="..\..\examples\disk-zone-profile.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\e4defrag.fio" />
@@ -36,13 +42,37 @@
                   <File Source="..\..\examples\falloc.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\filecreate-ioengine.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-rand-read.job" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-rand-RW.job" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-rand-write.job" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-seq-read.job" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-seq-RW.job" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fio-seq-write.job" />
+                </Component>
+                <Component>
                   <File Source="..\..\examples\fixed-rate-submission.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\flow.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\fsx.fio" />
+                  <File Source="..\..\examples\fsx.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\ftruncate.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\fusion-aw-sync.fio" />
@@ -51,7 +81,25 @@
                   <File Source="..\..\examples\gfapi.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\iometer-file-access-server.fio" />
+                  <File Source="..\..\examples\gpudirect-rdmaio-client.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\gpudirect-rdmaio-server.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\http-s3.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\http-swift.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\http-webdav.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\ime.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\iometer-file-access-server.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\jesd219.fio" />
@@ -63,13 +111,16 @@
                   <File Source="..\..\examples\libhdfs.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\libpmem.fio" />
+                </Component>
+                <Component>
                   <File Source="..\..\examples\mtd.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\netio.fio" />
+                  <File Source="..\..\examples\netio.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\netio_multicast.fio" />
+                  <File Source="..\..\examples\netio_multicast.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\null.fio" />
@@ -84,6 +135,9 @@
                   <File Source="..\..\examples\poisson-rate-submission.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\rados.fio" />
+                </Component>
+                <Component>
                   <File Source="..\..\examples\rand-zones.fio" />
                 </Component>
                 <Component>
@@ -124,6 +178,8 @@
             <ComponentRef Id="aio_read.fio" />
             <ComponentRef Id="backwards_read.fio" />
             <ComponentRef Id="basic_verify.fio" />
+            <ComponentRef Id="butterfly.fio"/>
+            <ComponentRef Id="cpp_null.fio"/>
             <ComponentRef Id="cpuio.fio" />
             <ComponentRef Id="dev_dax.fio" />
             <ComponentRef Id="disk_zone_profile.fio" />
@@ -131,15 +187,30 @@
             <ComponentRef Id="e4defrag2.fio" />
             <ComponentRef Id="enospc_pressure.fio" />
             <ComponentRef Id="falloc.fio" />
+            <ComponentRef Id="filecreate_ioengine.fio"/>
+            <ComponentRef Id="fio_rand_read.job"/>
+            <ComponentRef Id="fio_rand_RW.job"/>
+            <ComponentRef Id="fio_rand_write.job"/>
+            <ComponentRef Id="fio_seq_read.job"/>
+            <ComponentRef Id="fio_seq_RW.job"/>
+            <ComponentRef Id="fio_seq_write.job"/>
             <ComponentRef Id="fixed_rate_submission.fio" />
             <ComponentRef Id="flow.fio" />
             <ComponentRef Id="fsx.fio" />
+            <ComponentRef Id="ftruncate.fio"/>
             <ComponentRef Id="fusion_aw_sync.fio" />
             <ComponentRef Id="gfapi.fio" />
+            <ComponentRef Id="gpudirect_rdmaio_client.fio"/>
+            <ComponentRef Id="gpudirect_rdmaio_server.fio"/>
+            <ComponentRef Id="http_s3.fio"/>
+            <ComponentRef Id="http_swift.fio"/>
+            <ComponentRef Id="http_webdav.fio"/>
+            <ComponentRef Id="ime.fio"/>
             <ComponentRef Id="iometer_file_access_server.fio" />
             <ComponentRef Id="jesd219.fio" />
             <ComponentRef Id="latency_profile.fio" />
             <ComponentRef Id="libhdfs.fio" />
+            <ComponentRef Id="libpmem.fio"/>
             <ComponentRef Id="mtd.fio" />
             <ComponentRef Id="netio.fio" />
             <ComponentRef Id="netio_multicast.fio" />
@@ -147,6 +218,7 @@
             <ComponentRef Id="numa.fio" />
             <ComponentRef Id="pmemblk.fio" />
             <ComponentRef Id="poisson_rate_submission.fio" />
+            <ComponentRef Id="rados.fio"/>
             <ComponentRef Id="rand_zones.fio" />
             <ComponentRef Id="rbd.fio" />
             <ComponentRef Id="rdmaio_client.fio" />
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 73b2810..97d88e9 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -61,7 +61,7 @@
 	<Property Id="ARPURLINFOABOUT" Value="http://git.kernel.dk/cgit/fio/" />
 	<Property Id='ARPCONTACT'>fio@vger.kernel.org</Property>
 	<Property Id='ARPHELPLINK'>http://www.spinics.net/lists/fio/</Property>
-	<Property Id='ARPURLUPDATEINFO'>http://bluestop.org/fio/</Property>
+	<Property Id='ARPURLUPDATEINFO'>https://bluestop.org/fio/</Property>
 
 	<WixVariable Id="WixUILicenseRtf" Value="eula.rtf" />
 
diff --git a/t/zbd/run-tests-against-regular-nullb b/t/zbd/run-tests-against-regular-nullb
index 133c7c4..0f6e4b6 100755
--- a/t/zbd/run-tests-against-regular-nullb
+++ b/t/zbd/run-tests-against-regular-nullb
@@ -4,6 +4,8 @@
 #
 # This file is released under the GPL.
 
+scriptdir="$(cd "$(dirname "$0")" && pwd)"
+
 for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
 modprobe -r null_blk
 modprobe null_blk nr_devices=0 || return $?
@@ -22,4 +24,4 @@ modprobe null_blk nr_devices=0 &&
     echo 1 > memory_backed &&
     echo 1 > power
 
-"$(dirname "$0")"/test-zbd-support "$@" /dev/nullb0
+"${scriptdir}"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
index 7d9eb43..9336716 100755
--- a/t/zbd/run-tests-against-zoned-nullb
+++ b/t/zbd/run-tests-against-zoned-nullb
@@ -4,6 +4,8 @@
 #
 # This file is released under the GPL.
 
+scriptdir="$(cd "$(dirname "$0")" && pwd)"
+
 for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
 modprobe -r null_blk
 modprobe null_blk nr_devices=0 || return $?
@@ -24,4 +26,4 @@ modprobe null_blk nr_devices=0 &&
     echo 1 > memory_backed &&
     echo 1 > power
 
-"$(dirname "$0")"/test-zbd-support "$@" /dev/nullb0
+"${scriptdir}"/test-zbd-support "$@" /dev/nullb0
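
The zbd.c changes that follow switch the zone bookkeeping (zone_size, start, wp) from 512-byte sectors to bytes. Sector conversions then happen only at the kernel interface, for example z->len << 9 when parsing the zone report and offset >> 9 when filling struct blk_zone_range. A minimal sketch of the resulting byte-based arithmetic follows. It uses a simplified, hypothetical struct zone_geom in place of fio's zbd_info structure.

- zone_idx() mirrors the patched zbd_zone_idx(). That function tests zone_size_log2 > 0 because -1 marks a zone size that is not a power of 2.
- read_offset() mirrors the random-read offset computation in the patched zbd_adjust_block().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down zone geometry; all sizes are in bytes. */
struct zone_geom {
	uint64_t zone_size;
	int zone_size_log2;	/* log2(zone_size), or -1 if not a power of 2 */
	uint32_t nr_zones;
};

static int ilog2_u64(uint64_t v)
{
	int l = -1;

	while (v) {
		v >>= 1;
		l++;
	}
	return l;
}

static void zone_geom_init(struct zone_geom *g, uint64_t dev_size,
			   uint64_t zone_size)
{
	assert(zone_size);
	g->zone_size = zone_size;
	g->zone_size_log2 = (zone_size & (zone_size - 1)) == 0 ?
		ilog2_u64(zone_size) : -1;
	/* Round up so a trailing partial zone is counted. */
	g->nr_zones = (uint32_t)((dev_size + zone_size - 1) / zone_size);
}

/* Byte offset -> zone index, clamped to nr_zones like zbd_zone_idx(). */
static uint32_t zone_idx(const struct zone_geom *g, uint64_t offset)
{
	uint64_t idx;

	if (g->zone_size_log2 > 0)
		idx = offset >> g->zone_size_log2;
	else
		idx = offset / g->zone_size;
	return idx < g->nr_zones ? (uint32_t)idx : g->nr_zones;
}

/*
 * Pick a min_bs-aligned read offset in a zone starting at 'start' whose
 * written range is 'range' bytes (wp - start), keeping a buflen-byte read
 * below the write pointer when range > buflen. If range <= buflen the
 * zone start is returned and the read length still has to be trimmed.
 */
static uint64_t read_offset(uint64_t start, uint64_t range,
			    uint64_t orig_start, uint64_t offset,
			    uint64_t buflen, uint64_t min_bs)
{
	if (range <= buflen)
		return start;
	return start + ((offset - orig_start) % (range - buflen)) /
		min_bs * min_bs;
}
```

For zone sizes that are a multiple of 512 bytes, the old (offset >> 9) / zone_size_in_sectors and the new offset / zone_size yield the same zone index.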
diff --git a/zbd.c b/zbd.c
index 5619769..0f3636a 100644
--- a/zbd.c
+++ b/zbd.c
@@ -31,10 +31,10 @@ static uint32_t zbd_zone_idx(const struct fio_file *f, uint64_t offset)
 {
 	uint32_t zone_idx;
 
-	if (f->zbd_info->zone_size_log2)
+	if (f->zbd_info->zone_size_log2 > 0)
 		zone_idx = offset >> f->zbd_info->zone_size_log2;
 	else
-		zone_idx = (offset >> 9) / f->zbd_info->zone_size;
+		zone_idx = offset / f->zbd_info->zone_size;
 
 	return min(zone_idx, f->zbd_info->nr_zones);
 }
@@ -53,7 +53,7 @@ static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
 	assert((required & 511) == 0);
 
 	return z->type == BLK_ZONE_TYPE_SEQWRITE_REQ &&
-		z->wp + (required >> 9) > z->start + f->zbd_info->zone_size;
+		z->wp + required > z->start + f->zbd_info->zone_size;
 }
 
 static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
@@ -121,8 +121,8 @@ static bool zbd_verify_sizes(void)
 				continue;
 			zone_idx = zbd_zone_idx(f, f->file_offset);
 			z = &f->zbd_info->zone_info[zone_idx];
-			if (f->file_offset != (z->start << 9)) {
-				new_offset = (z+1)->start << 9;
+			if (f->file_offset != z->start) {
+				new_offset = (z+1)->start;
 				if (new_offset >= f->file_offset + f->io_size) {
 					log_info("%s: io_size must be at least one zone\n",
 						 f->file_name);
@@ -136,7 +136,7 @@ static bool zbd_verify_sizes(void)
 			}
 			zone_idx = zbd_zone_idx(f, f->file_offset + f->io_size);
 			z = &f->zbd_info->zone_info[zone_idx];
-			new_end = z->start << 9;
+			new_end = z->start;
 			if (f->file_offset + f->io_size != new_end) {
 				if (new_end <= f->file_offset) {
 					log_info("%s: io_size must be at least one zone\n",
@@ -168,10 +168,10 @@ static bool zbd_verify_bs(void)
 			zone_size = f->zbd_info->zone_size;
 			for (k = 0; k < ARRAY_SIZE(td->o.bs); k++) {
 				if (td->o.verify != VERIFY_NONE &&
-				    (zone_size << 9) % td->o.bs[k] != 0) {
+				    zone_size % td->o.bs[k] != 0) {
 					log_info("%s: block size %llu is not a divisor of the zone size %d\n",
 						 f->file_name, td->o.bs[k],
-						 zone_size << 9);
+						 zone_size);
 					return false;
 				}
 			}
@@ -273,9 +273,9 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	pthread_mutexattr_t attr;
 	int i;
 
-	zone_size = td->o.zone_size >> 9;
+	zone_size = td->o.zone_size;
 	assert(zone_size);
-	nr_zones = ((f->real_file_size >> 9) + zone_size - 1) / zone_size;
+	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
 	if (!zbd_info)
@@ -300,7 +300,7 @@ static int init_zone_info(struct thread_data *td, struct fio_file *f)
 	f->zbd_info = zbd_info;
 	f->zbd_info->zone_size = zone_size;
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
-		ilog2(zone_size) + 9 : -1;
+		ilog2(zone_size) : -1;
 	f->zbd_info->nr_zones = nr_zones;
 	pthread_mutexattr_destroy(&attr);
 	return 0;
@@ -351,20 +351,20 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		goto close;
 	}
 	z = (void *)(hdr + 1);
-	zone_size = z->len;
-	nr_zones = ((f->real_file_size >> 9) + zone_size - 1) / zone_size;
+	zone_size = z->len << 9;
+	nr_zones = (f->real_file_size + zone_size - 1) / zone_size;
 
 	if (td->o.zone_size == 0) {
-		td->o.zone_size = zone_size << 9;
-	} else if (td->o.zone_size != zone_size << 9) {
+		td->o.zone_size = zone_size;
+	} else if (td->o.zone_size != zone_size) {
 		log_info("fio: %s job parameter zonesize %lld does not match disk zone size %ld.\n",
-			 f->file_name, td->o.zone_size, zone_size << 9);
+			 f->file_name, td->o.zone_size, zone_size);
 		ret = -EINVAL;
 		goto close;
 	}
 
 	dprint(FD_ZBD, "Device %s has %d zones of size %lu KB\n", f->file_name,
-	       nr_zones, zone_size / 2);
+	       nr_zones, zone_size / 1024);
 
 	zbd_info = scalloc(1, sizeof(*zbd_info) +
 			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
@@ -378,18 +378,18 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		z = (void *)(hdr + 1);
 		for (i = 0; i < hdr->nr_zones; i++, j++, z++, p++) {
 			pthread_mutex_init(&p->mutex, &attr);
-			p->start = z->start;
+			p->start = z->start << 9;
 			switch (z->cond) {
 			case BLK_ZONE_COND_NOT_WP:
-				p->wp = z->start;
+				p->wp = p->start;
 				break;
 			case BLK_ZONE_COND_FULL:
-				p->wp = z->start + zone_size;
+				p->wp = p->start + zone_size;
 				break;
 			default:
 				assert(z->start <= z->wp);
-				assert(z->wp <= z->start + zone_size);
-				p->wp = z->wp;
+				assert(z->wp <= z->start + (zone_size >> 9));
+				p->wp = z->wp << 9;
 				break;
 			}
 			p->type = z->type;
@@ -413,12 +413,12 @@ static int parse_zone_info(struct thread_data *td, struct fio_file *f)
 		}
 	}
 	/* a sentinel */
-	zbd_info->zone_info[nr_zones].start = start_sector;
+	zbd_info->zone_info[nr_zones].start = start_sector << 9;
 
 	f->zbd_info = zbd_info;
 	f->zbd_info->zone_size = zone_size;
 	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
-		ilog2(zone_size) + 9 : -1;
+		ilog2(zone_size) : -1;
 	f->zbd_info->nr_zones = nr_zones;
 	zbd_info = NULL;
 	ret = 0;
@@ -556,18 +556,18 @@ int zbd_init(struct thread_data *td)
  * Returns 0 upon success and a negative error code upon failure.
  */
 static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
-			   uint64_t sector, uint64_t nr_sectors)
+			   uint64_t offset, uint64_t length)
 {
 	struct blk_zone_range zr = {
-		.sector         = sector,
-		.nr_sectors     = nr_sectors,
+		.sector         = offset >> 9,
+		.nr_sectors     = length >> 9,
 	};
 	uint32_t zone_idx_b, zone_idx_e;
 	struct fio_zone_info *zb, *ze, *z;
 	int ret = 0;
 
 	assert(f->fd != -1);
-	assert(is_valid_offset(f, ((sector + nr_sectors) << 9) - 1));
+	assert(is_valid_offset(f, offset + length - 1));
 	switch (f->zbd_info->model) {
 	case ZBD_DM_HOST_AWARE:
 	case ZBD_DM_HOST_MANAGED:
@@ -583,9 +583,9 @@ static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
 		break;
 	}
 
-	zone_idx_b = zbd_zone_idx(f, sector << 9);
+	zone_idx_b = zbd_zone_idx(f, offset);
 	zb = &f->zbd_info->zone_info[zone_idx_b];
-	zone_idx_e = zbd_zone_idx(f, (sector + nr_sectors) << 9);
+	zone_idx_e = zbd_zone_idx(f, offset + length);
 	ze = &f->zbd_info->zone_info[zone_idx_e];
 	for (z = zb; z < ze; z++) {
 		pthread_mutex_lock(&z->mutex);
@@ -635,7 +635,7 @@ static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
 			   struct fio_zone_info *const ze, bool all_zones)
 {
 	struct fio_zone_info *z, *start_z = ze;
-	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE] >> 9;
+	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
 	bool reset_wp;
 	int res = 0;
 
@@ -921,7 +921,7 @@ struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
 	/* Both z->mutex and f->zbd_info->mutex are held. */
 
 examine_zone:
-	if ((z->wp << 9) + min_bs <= ((z+1)->start << 9)) {
+	if (z->wp + min_bs <= (z+1)->start) {
 		pthread_mutex_unlock(&f->zbd_info->mutex);
 		goto out;
 	}
@@ -938,12 +938,12 @@ examine_zone:
 		zone_idx++;
 		pthread_mutex_unlock(&z->mutex);
 		z++;
-		if (!is_valid_offset(f, z->start << 9)) {
+		if (!is_valid_offset(f, z->start)) {
 			/* Wrap-around. */
 			zone_idx = zbd_zone_idx(f, f->file_offset);
 			z = &f->zbd_info->zone_info[zone_idx];
 		}
-		assert(is_valid_offset(f, z->start << 9));
+		assert(is_valid_offset(f, z->start));
 		pthread_mutex_lock(&z->mutex);
 		if (z->open)
 			continue;
@@ -963,7 +963,7 @@ examine_zone:
 		z = &f->zbd_info->zone_info[zone_idx];
 
 		pthread_mutex_lock(&z->mutex);
-		if ((z->wp << 9) + min_bs <= ((z+1)->start << 9))
+		if (z->wp + min_bs <= (z+1)->start)
 			goto out;
 		pthread_mutex_lock(&f->zbd_info->mutex);
 	}
@@ -976,7 +976,7 @@ examine_zone:
 out:
 	dprint(FD_ZBD, "%s(%s): returning zone %d\n", __func__, f->file_name,
 	       zone_idx);
-	io_u->offset = z->start << 9;
+	io_u->offset = z->start;
 	return z;
 }
 
@@ -997,7 +997,7 @@ static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
 	if (z->verify_block * min_bs >= f->zbd_info->zone_size)
 		log_err("%s: %d * %d >= %ld\n", f->file_name, z->verify_block,
 			min_bs, f->zbd_info->zone_size);
-	io_u->offset = (z->start << 9) + z->verify_block++ * min_bs;
+	io_u->offset = z->start + z->verify_block++ * min_bs;
 	return z;
 }
 
@@ -1026,7 +1026,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
 		if (z1 < zl && z1->cond != BLK_ZONE_COND_OFFLINE) {
 			pthread_mutex_lock(&z1->mutex);
-			if (z1->start + (min_bs >> 9) <= z1->wp)
+			if (z1->start + min_bs <= z1->wp)
 				return z1;
 			pthread_mutex_unlock(&z1->mutex);
 		} else if (!td_random(td)) {
@@ -1035,7 +1035,7 @@ zbd_find_zone(struct thread_data *td, struct io_u *io_u,
 		if (td_random(td) && z2 >= zf &&
 		    z2->cond != BLK_ZONE_COND_OFFLINE) {
 			pthread_mutex_lock(&z2->mutex);
-			if (z2->start + (min_bs >> 9) <= z2->wp)
+			if (z2->start + min_bs <= z2->wp)
 				return z2;
 			pthread_mutex_unlock(&z2->mutex);
 		}
@@ -1066,7 +1066,7 @@ static void zbd_post_submit(const struct io_u *io_u, bool success)
 		return;
 
 	zone_idx = zbd_zone_idx(io_u->file, io_u->offset);
-	end = (io_u->offset + io_u->buflen) >> 9;
+	end = io_u->offset + io_u->buflen;
 	z = &zbd_info->zone_info[zone_idx];
 	assert(zone_idx < zbd_info->nr_zones);
 	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
@@ -1119,7 +1119,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 {
 	const struct fio_file *f = io_u->file;
 	uint32_t zone_idx_b;
-	struct fio_zone_info *zb, *zl;
+	struct fio_zone_info *zb, *zl, *orig_zb;
 	uint32_t orig_len = io_u->buflen;
 	uint32_t min_bs = td->o.min_bs[io_u->ddir];
 	uint64_t new_len;
@@ -1132,6 +1132,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 	assert(io_u->buflen);
 	zone_idx_b = zbd_zone_idx(f, io_u->offset);
 	zb = &f->zbd_info->zone_info[zone_idx_b];
+	orig_zb = zb;
 
 	/* Accept the I/O offset for conventional zones. */
 	if (zb->type == BLK_ZONE_TYPE_CONVENTIONAL)
@@ -1153,21 +1154,14 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			goto accept;
 		}
 		/*
-		 * Avoid reads past the write pointer because such reads do not
-		 * hit the medium.
+		 * Check that there is enough written data in the zone to do an
+		 * I/O of at least min_bs B. If there isn't, find a new zone for
+		 * the I/O.
 		 */
 		range = zb->cond != BLK_ZONE_COND_OFFLINE ?
-			((zb->wp - zb->start) << 9) - io_u->buflen : 0;
-		if (td_random(td) && range >= 0) {
-			io_u->offset = (zb->start << 9) +
-				((io_u->offset - (zb->start << 9)) %
-				 (range + 1)) / min_bs * min_bs;
-			assert(zb->start << 9 <= io_u->offset);
-			assert(io_u->offset + io_u->buflen <= zb->wp << 9);
-			goto accept;
-		}
-		if (zb->cond == BLK_ZONE_COND_OFFLINE ||
-		    (io_u->offset + io_u->buflen) >> 9 > zb->wp) {
+			zb->wp - zb->start : 0;
+		if (range < min_bs ||
+		    ((!td_random(td)) && (io_u->offset + min_bs > zb->wp))) {
 			pthread_mutex_unlock(&zb->mutex);
 			zl = &f->zbd_info->zone_info[zbd_zone_idx(f,
 						f->file_offset + f->io_size)];
@@ -1179,17 +1173,42 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 				       io_u->buflen);
 				goto eof;
 			}
-			io_u->offset = zb->start << 9;
+			/*
+			 * zbd_find_zone() returned a zone with a range of at
+			 * least min_bs.
+			 */
+			range = zb->wp - zb->start;
+			assert(range >= min_bs);
+
+			if (!td_random(td))
+				io_u->offset = zb->start;
 		}
-		if ((io_u->offset + io_u->buflen) >> 9 > zb->wp) {
-			dprint(FD_ZBD, "%s: %lld + %lld > %" PRIu64 "\n",
-			       f->file_name, io_u->offset, io_u->buflen,
-			       zb->wp);
-			goto eof;
+		/*
+		 * Make sure the I/O is within the zone valid data range while
+		 * maximizing the I/O size and preserving randomness.
+		 */
+		if (range <= io_u->buflen)
+			io_u->offset = zb->start;
+		else if (td_random(td))
+			io_u->offset = zb->start +
+				((io_u->offset - orig_zb->start) %
+				 (range - io_u->buflen)) / min_bs * min_bs;
+		/*
+		 * Make sure the I/O does not cross over the zone wp position.
+		 */
+		new_len = min((unsigned long long)io_u->buflen,
+			      (unsigned long long)(zb->wp - io_u->offset));
+		new_len = new_len / min_bs * min_bs;
+		if (new_len < io_u->buflen) {
+			io_u->buflen = new_len;
+			dprint(FD_IO, "Changed length from %u into %llu\n",
+			       orig_len, io_u->buflen);
 		}
+		assert(zb->start <= io_u->offset);
+		assert(io_u->offset + io_u->buflen <= zb->wp);
 		goto accept;
 	case DDIR_WRITE:
-		if (io_u->buflen > (f->zbd_info->zone_size << 9))
+		if (io_u->buflen > f->zbd_info->zone_size)
 			goto eof;
 		if (!zbd_open_zone(td, io_u, zone_idx_b)) {
 			pthread_mutex_unlock(&zb->mutex);
@@ -1201,7 +1220,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		/* Check whether the zone reset threshold has been exceeded */
 		if (td->o.zrf.u.f) {
 			check_swd(td, f);
-			if ((f->zbd_info->sectors_with_data << 9) >=
+			if (f->zbd_info->sectors_with_data >=
 			    f->io_size * td->o.zrt.u.f &&
 			    zbd_dec_and_reset_write_cnt(td, f)) {
 				zb->reset_zone = 1;
@@ -1225,7 +1244,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		}
 		/* Make writes occur at the write pointer */
 		assert(!zbd_zone_full(f, zb, min_bs));
-		io_u->offset = zb->wp << 9;
+		io_u->offset = zb->wp;
 		if (!is_valid_offset(f, io_u->offset)) {
 			dprint(FD_ZBD, "Dropped request with offset %llu\n",
 			       io_u->offset);
@@ -1237,7 +1256,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 		 * small.
 		 */
 		new_len = min((unsigned long long)io_u->buflen,
-			      ((zb + 1)->start << 9) - io_u->offset);
+			      (zb + 1)->start - io_u->offset);
 		new_len = new_len / min_bs * min_bs;
 		if (new_len == io_u->buflen)
 			goto accept;
@@ -1248,7 +1267,7 @@ enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
 			goto accept;
 		}
 		log_err("Zone remainder %lld smaller than minimum block size %d\n",
-			(((zb + 1)->start << 9) - io_u->offset),
+			((zb + 1)->start - io_u->offset),
 			min_bs);
 		goto eof;
 	case DDIR_TRIM:
diff --git a/zbd.h b/zbd.h
index 08751fd..d750b67 100644
--- a/zbd.h
+++ b/zbd.h
@@ -95,8 +95,6 @@ int zbd_init(struct thread_data *td);
 void zbd_file_reset(struct thread_data *td, struct fio_file *f);
 bool zbd_unaligned_write(int error_code);
 enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
-int zbd_do_trim(struct thread_data *td, const struct io_u *io_u);
-void zbd_update_wp(struct thread_data *td, const struct io_u *io_u);
 char *zbd_write_status(const struct thread_stat *ts);
 #else
 static inline void zbd_free_zone_info(struct fio_file *f)
@@ -123,16 +121,6 @@ static inline enum io_u_action zbd_adjust_block(struct thread_data *td,
 	return io_u_accept;
 }
 
-static inline int zbd_do_trim(struct thread_data *td, const struct io_u *io_u)
-{
-	return 1;
-}
-
-static inline void zbd_update_wp(struct thread_data *td,
-				 const struct io_u *io_u)
-{
-}
-
 static inline char *zbd_write_status(const struct thread_stat *ts)
 {
 	return NULL;
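After this change, zone `start`, `wp` and `zone_size` are kept in bytes rather than 512-byte sectors, which is why every `<< 9` and `>> 9` shift disappears from zbd.c. The reworked read path in zbd_adjust_block() can be modeled on its own as below. This is a simplified sketch, not fio code. `struct zone` and `clamp_read()` are hypothetical names. It assumes a single zone, whereas fio computes the random offset relative to the originally selected zone (`orig_zb`). It also assumes the caller passes an offset at or after `z->start`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone descriptor; start and wp are byte offsets. */
struct zone {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* write pointer: first byte not yet written */
};

/*
 * Fit a read into the written part of a zone, [start, wp).
 * Returns false if the caller would have to look for another zone:
 * less than min_bs bytes are written, or a sequential read would
 * start too close to the write pointer.
 */
static bool clamp_read(const struct zone *z, bool random, uint32_t min_bs,
		       uint64_t *offset, uint32_t *buflen)
{
	uint64_t range = z->wp - z->start;
	uint64_t new_len;

	assert(z->start <= *offset);
	if (range < min_bs || (!random && *offset + min_bs > z->wp))
		return false;

	if (range <= *buflen)
		*offset = z->start;	/* zone holds at most one request */
	else if (random)		/* fold the offset into the data range */
		*offset = z->start + (*offset - z->start) %
			(range - *buflen) / min_bs * min_bs;

	/* Never read past the write pointer; keep a multiple of min_bs. */
	new_len = z->wp - *offset;
	if (new_len > *buflen)
		new_len = *buflen;
	*buflen = (uint32_t)(new_len / min_bs * min_bs);

	assert(*offset + *buflen <= z->wp);
	return true;
}
```

With 64 KiB written in a zone and a `min_bs` of 4 KiB, a random 16 KiB read aimed past the write pointer is folded back into the written range. A sequential read that starts 4 KiB before the write pointer is shortened to those 4 KiB.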



* Recent changes (master)
@ 2018-09-04 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-04 12:00 UTC
  To: fio

The following changes since commit 7df9ac895f4af7ac4500379d5a5e204be9210fb2:

  client: fix nr_samples (2018-08-31 12:43:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 53ee8c17adb00e3db4f2c9441777ba777390cb9f:

  engines/sg: improve error handling (2018-09-03 09:01:14 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      engines/sg: improve error handling

 engines/sg.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sg.c b/engines/sg.c
index 7741f83..3cc068f 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -675,36 +675,37 @@ static int fio_sgio_commit(struct thread_data *td)
 
 	ret = fio_sgio_rw_doio(io_u->file, io_u, 0);
 
-	if (ret < 0)
-		for (i = 0; i < st->unmap_range_count; i++)
-			st->trim_io_us[i]->error = errno;
-	else if (hdr->status)
-		for (i = 0; i < st->unmap_range_count; i++) {
-			st->trim_io_us[i]->resid = hdr->resid;
-			st->trim_io_us[i]->error = EIO;
+	if (ret < 0 || hdr->status) {
+		int error;
+
+		if (ret < 0)
+			error = errno;
+		else {
+			error = EIO;
+			ret = -EIO;
 		}
-	else {
-		if (fio_fill_issue_time(td)) {
-			fio_gettime(&now, NULL);
-			for (i = 0; i < st->unmap_range_count; i++) {
-				struct io_u *io_u = st->trim_io_us[i];
-
-				memcpy(&io_u->issue_time, &now, sizeof(now));
-				io_u_queued(td, io_u);
-			}
+
+		for (i = 0; i < st->unmap_range_count; i++) {
+			st->trim_io_us[i]->error = error;
+			clear_io_u(td, st->trim_io_us[i]);
+			if (hdr->status)
+				st->trim_io_us[i]->resid = hdr->resid;
 		}
-		io_u_mark_submit(td, st->unmap_range_count);
+
+		td_verror(td, error, "xfer");
+		return ret;
 	}
 
-	if (io_u->error) {
-		td_verror(td, io_u->error, "xfer");
-		return 0;
+	if (fio_fill_issue_time(td)) {
+		fio_gettime(&now, NULL);
+		for (i = 0; i < st->unmap_range_count; i++) {
+			memcpy(&st->trim_io_us[i]->issue_time, &now, sizeof(now));
+			io_u_queued(td, io_u);
+		}
 	}
+	io_u_mark_submit(td, st->unmap_range_count);
 
-	if (ret == FIO_Q_QUEUED)
-		return 0;
-	else
-		return ret;
+	return 0;
 }
 
 static struct io_u *fio_sgio_event(struct thread_data *td, int event)
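The reworked commit path handles a failed submission and a nonzero `hdr->status` the same way. Every trim request batched into the single SG command receives the same error, and the function now returns a negative value instead of 0. The sketch below shows that fan-out on its own. It uses a hypothetical `struct req` in place of fio's `io_u` and omits the `clear_io_u()` and `td_verror()` calls.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request batched into one command. */
struct req {
	int error;
	unsigned int resid;
};

/*
 * Copy the outcome of one batched submission to every request in the
 * batch. ret < 0 means the submission itself failed and sys_errno holds
 * the cause. A nonzero status means the device rejected the command,
 * which is reported as EIO together with the residual byte count.
 * Returns 0 on success, otherwise ret itself or -EIO.
 */
static int propagate_batch_error(struct req *reqs, size_t n, int ret,
				 int sys_errno, int status, unsigned int resid)
{
	int error;
	size_t i;

	if (ret >= 0 && !status)
		return 0;

	if (ret < 0) {
		error = sys_errno;
	} else {
		error = EIO;
		ret = -EIO;
	}

	for (i = 0; i < n; i++) {
		reqs[i].error = error;
		if (status)
			reqs[i].resid = resid;
	}
	return ret;
}
```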



* Recent changes (master)
@ 2018-09-01 12:00 Jens Axboe
From: Jens Axboe @ 2018-09-01 12:00 UTC
  To: fio

The following changes since commit c6cc1cfed2aec5ea348cbe8b8762ba8fd5fad966:

  Merge branch 'configure-help' of https://github.com/hahnjo/fio (2018-08-30 08:22:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7df9ac895f4af7ac4500379d5a5e204be9210fb2:

  client: fix nr_samples (2018-08-31 12:43:59 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      client: fix nr_samples

 client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index a868e3a..31c7c64 100644
--- a/client.c
+++ b/client.c
@@ -1539,7 +1539,7 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 #ifdef CONFIG_ZLIB
 	struct cmd_iolog_pdu *ret;
 	z_stream stream;
-	uint32_t nr_samples;
+	uint64_t nr_samples;
 	size_t total;
 	char *p;
 

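The fix widens `nr_samples` from `uint32_t` to `uint64_t`. The commit message does not give a reason. One plausible reading, assuming the value stored there comes from a 64-bit field, is that the old declaration could silently truncate it. The two hypothetical helpers below only demonstrate the C conversion rule involved.

```c
#include <stdint.h>

/* Model of the old declaration: the count is kept in 32 bits. */
static uint32_t store_in_u32(uint64_t nr_samples)
{
	uint32_t nr = nr_samples;	/* conversion to unsigned wraps modulo 2^32 */
	return nr;
}

/* Model of the fixed declaration: the full 64-bit value survives. */
static uint64_t store_in_u64(uint64_t nr_samples)
{
	uint64_t nr = nr_samples;
	return nr;
}
```

For a count of `UINT32_MAX + 2`, the 32-bit variant yields 1 while the 64-bit variant keeps 4294967297.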


* Recent changes (master)
@ 2018-08-31 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-31 12:00 UTC
  To: fio

The following changes since commit 26b3a1880d38bc24b633a643339c9ca31f303d1c:

  Make td_io_u_lock/unlock() explicit (2018-08-25 10:22:31 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c6cc1cfed2aec5ea348cbe8b8762ba8fd5fad966:

  Merge branch 'configure-help' of https://github.com/hahnjo/fio (2018-08-30 08:22:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'configure-help' of https://github.com/hahnjo/fio

Jonas Hahnfeld (1):
      configure: Document more switches to disable features

 configure | 3 +++
 1 file changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/configure b/configure
index ab89df7..5e11195 100755
--- a/configure
+++ b/configure
@@ -226,6 +226,9 @@ if test "$show_help" = "yes" ; then
   echo "--enable-gfio           Enable building of gtk gfio"
   echo "--disable-numa          Disable libnuma even if found"
   echo "--disable-rdma          Disable RDMA support even if found"
+  echo "--disable-rados         Disable Rados support even if found"
+  echo "--disable-rbd           Disable Rados Block Device even if found"
+  echo "--disable-http          Disable HTTP support even if found"
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
   echo "--disable-lex           Disable use of lex/yacc for math"



* Recent changes (master)
@ 2018-08-26 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-26 12:00 UTC
  To: fio

The following changes since commit 632b28a93154cb1be203d911f758d5932c0a8f86:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-08-24 18:22:13 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 26b3a1880d38bc24b633a643339c9ca31f303d1c:

  Make td_io_u_lock/unlock() explicit (2018-08-25 10:22:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Make td_io_u_lock/unlock() explicit

Tomohiro Kusumi (1):
      client: suppress non JSON default outputs on --output-format=json/json+

 client.c      | 19 ++++++++++++++-----
 fio.h         | 10 ++++------
 io_u.c        | 26 ++++++++++++++++++++------
 stat.c        | 44 ++++++++++++++++++++++++++++++++------------
 steadystate.c |  9 +++++++--
 5 files changed, 77 insertions(+), 31 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index bc0275b..a868e3a 100644
--- a/client.c
+++ b/client.c
@@ -32,6 +32,7 @@ static void handle_stop(struct fio_client *client);
 static void handle_start(struct fio_client *client, struct fio_net_cmd *cmd);
 
 static void convert_text(struct fio_net_cmd *cmd);
+static void client_display_thread_status(struct jobs_eta *je);
 
 struct client_ops fio_client_ops = {
 	.text		= handle_text,
@@ -40,7 +41,7 @@ struct client_ops fio_client_ops = {
 	.group_stats	= handle_gs,
 	.stop		= handle_stop,
 	.start		= handle_start,
-	.eta		= display_thread_status,
+	.eta		= client_display_thread_status,
 	.probe		= handle_probe,
 	.eta_msec	= FIO_CLIENT_DEF_ETA_MSEC,
 	.client_type	= FIO_CLIENT_TYPE_CLI,
@@ -1195,7 +1196,8 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	if (!client->disk_stats_shown) {
 		client->disk_stats_shown = true;
-		log_info("\nDisk stats (read/write):\n");
+		if (!(output_format & FIO_OUTPUT_JSON))
+			log_info("\nDisk stats (read/write):\n");
 	}
 
 	if (output_format & FIO_OUTPUT_JSON) {
@@ -1477,9 +1479,10 @@ static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd)
 	sprintf(bit, "%d-bit", probe->bpp * 8);
 	probe->flags = le64_to_cpu(probe->flags);
 
-	log_info("hostname=%s, be=%u, %s, os=%s, arch=%s, fio=%s, flags=%lx\n",
-		probe->hostname, probe->bigendian, bit, os, arch,
-		probe->fio_version, (unsigned long) probe->flags);
+	if (!(output_format & FIO_OUTPUT_JSON))
+		log_info("hostname=%s, be=%u, %s, os=%s, arch=%s, fio=%s, flags=%lx\n",
+			probe->hostname, probe->bigendian, bit, os, arch,
+			probe->fio_version, (unsigned long) probe->flags);
 
 	if (!client->name)
 		client->name = strdup((char *) probe->hostname);
@@ -2112,3 +2115,9 @@ int fio_handle_clients(struct client_ops *ops)
 	free(pfds);
 	return retval || error_clients;
 }
+
+static void client_display_thread_status(struct jobs_eta *je)
+{
+	if (!(output_format & FIO_OUTPUT_JSON))
+		display_thread_status(je);
+}
diff --git a/fio.h b/fio.h
index 42015d3..9e99da1 100644
--- a/fio.h
+++ b/fio.h
@@ -774,16 +774,14 @@ static inline bool td_async_processing(struct thread_data *td)
  * We currently only need to do locking if we have verifier threads
  * accessing our internal structures too
  */
-static inline void td_io_u_lock(struct thread_data *td)
+static inline void __td_io_u_lock(struct thread_data *td)
 {
-	if (td_async_processing(td))
-		pthread_mutex_lock(&td->io_u_lock);
+	pthread_mutex_lock(&td->io_u_lock);
 }
 
-static inline void td_io_u_unlock(struct thread_data *td)
+static inline void __td_io_u_unlock(struct thread_data *td)
 {
-	if (td_async_processing(td))
-		pthread_mutex_unlock(&td->io_u_lock);
+	pthread_mutex_unlock(&td->io_u_lock);
 }
 
 static inline void td_io_u_free_notify(struct thread_data *td)
diff --git a/io_u.c b/io_u.c
index 3fbcf0f..a3540d1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -768,6 +768,8 @@ void put_file_log(struct thread_data *td, struct fio_file *f)
 
 void put_io_u(struct thread_data *td, struct io_u *io_u)
 {
+	const bool needs_lock = td_async_processing(td);
+
 	if (io_u->post_submit) {
 		io_u->post_submit(io_u, io_u->error == 0);
 		io_u->post_submit = NULL;
@@ -776,7 +778,8 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 	if (td->parent)
 		td = td->parent;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	if (io_u->file && !(io_u->flags & IO_U_F_NO_FILE_PUT))
 		put_file_log(td, io_u->file);
@@ -790,7 +793,9 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 	}
 	io_u_qpush(&td->io_u_freelist, io_u);
 	td_io_u_free_notify(td);
-	td_io_u_unlock(td);
+
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 void clear_io_u(struct thread_data *td, struct io_u *io_u)
@@ -801,6 +806,7 @@ void clear_io_u(struct thread_data *td, struct io_u *io_u)
 
 void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct io_u *__io_u = *io_u;
 	enum fio_ddir ddir = acct_ddir(__io_u);
 
@@ -809,7 +815,8 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 	if (td->parent)
 		td = td->parent;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	io_u_set(td, __io_u, IO_U_F_FREE);
 	if ((__io_u->flags & IO_U_F_FLIGHT) && ddir_rw(ddir))
@@ -823,7 +830,10 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 
 	io_u_rpush(&td->io_u_requeues, __io_u);
 	td_io_u_free_notify(td);
-	td_io_u_unlock(td);
+
+	if (needs_lock)
+		__td_io_u_unlock(td);
+
 	*io_u = NULL;
 }
 
@@ -1504,13 +1514,15 @@ bool queue_full(const struct thread_data *td)
 
 struct io_u *__get_io_u(struct thread_data *td)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct io_u *io_u = NULL;
 	int ret;
 
 	if (td->stop_io)
 		return NULL;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 again:
 	if (!io_u_rempty(&td->io_u_requeues))
@@ -1547,7 +1559,9 @@ again:
 		goto again;
 	}
 
-	td_io_u_unlock(td);
+	if (needs_lock)
+		__td_io_u_unlock(td);
+
 	return io_u;
 }
 
diff --git a/stat.c b/stat.c
index abdbb0e..1a9c553 100644
--- a/stat.c
+++ b/stat.c
@@ -2475,11 +2475,13 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long long nsec, unsigned long long bs,
 		     uint64_t offset)
 {
+	const bool needs_lock = td_async_processing(td);
 	unsigned long elapsed, this_window;
 	struct thread_stat *ts = &td->ts;
 	struct io_log *iolog = td->clat_hist_log;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
@@ -2528,37 +2530,43 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		}
 	}
 
-	td_io_u_unlock(td);
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long usec, unsigned long long bs, uint64_t offset)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
 
 	if (!ddir_rw(ddir))
 		return;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	add_stat_sample(&ts->slat_stat[ddir], usec);
 
 	if (td->slat_log)
 		add_log_sample(td, td->slat_log, sample_val(usec), ddir, bs, offset);
 
-	td_io_u_unlock(td);
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		    unsigned long long nsec, unsigned long long bs,
 		    uint64_t offset)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
 
 	if (!ddir_rw(ddir))
 		return;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	add_stat_sample(&ts->lat_stat[ddir], nsec);
 
@@ -2569,12 +2577,14 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 	if (ts->lat_percentiles)
 		add_clat_percentile_sample(ts, nsec, ddir);
 
-	td_io_u_unlock(td);
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 		   unsigned int bytes, unsigned long long spent)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
 	unsigned long rate;
 
@@ -2583,7 +2593,8 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 	else
 		rate = 0;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	add_stat_sample(&ts->bw_stat[io_u->ddir], rate);
 
@@ -2592,7 +2603,9 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 			       bytes, io_u->offset);
 
 	td->stat_io_bytes[io_u->ddir] = td->this_io_bytes[io_u->ddir];
-	td_io_u_unlock(td);
+
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
@@ -2601,6 +2614,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 			 struct io_stat *stat, struct io_log *log,
 			 bool is_kb)
 {
+	const bool needs_lock = td_async_processing(td);
 	unsigned long spent, rate;
 	enum fio_ddir ddir;
 	unsigned long next, next_log;
@@ -2611,7 +2625,8 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 	if (spent < avg_time && avg_time - spent >= LOG_MSEC_SLACK)
 		return avg_time - spent;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	/*
 	 * Compute both read and write rates for the interval.
@@ -2648,7 +2663,8 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 
 	timespec_add_msec(parent_tv, avg_time);
 
-	td_io_u_unlock(td);
+	if (needs_lock)
+		__td_io_u_unlock(td);
 
 	if (spent <= avg_time)
 		next = avg_time;
@@ -2668,9 +2684,11 @@ static int add_bw_samples(struct thread_data *td, struct timespec *t)
 void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 		     unsigned int bytes)
 {
+	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
 
-	td_io_u_lock(td);
+	if (needs_lock)
+		__td_io_u_lock(td);
 
 	add_stat_sample(&ts->iops_stat[io_u->ddir], 1);
 
@@ -2679,7 +2697,9 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 			       bytes, io_u->offset);
 
 	td->stat_io_blocks[io_u->ddir] = td->this_io_blocks[io_u->ddir];
-	td_io_u_unlock(td);
+
+	if (needs_lock)
+		__td_io_u_unlock(td);
 }
 
 static int add_iops_samples(struct thread_data *td, struct timespec *t)
diff --git a/steadystate.c b/steadystate.c
index ee1c0e5..bd2f70d 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -208,6 +208,7 @@ void steadystate_check(void)
 
 	prev_groupid = -1;
 	for_each_td(td, i) {
+		const bool needs_lock = td_async_processing(td);
 		struct steadystate_data *ss = &td->ss;
 
 		if (!ss->dur || td->runstate <= TD_SETTING_UP ||
@@ -235,12 +236,16 @@ void steadystate_check(void)
 				ss->state |= FIO_SS_RAMP_OVER;
 		}
 
-		td_io_u_lock(td);
+		if (needs_lock)
+			__td_io_u_lock(td);
+
 		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			td_iops += td->io_blocks[ddir];
 			td_bytes += td->io_bytes[ddir];
 		}
-		td_io_u_unlock(td);
+
+		if (needs_lock)
+			__td_io_u_unlock(td);
 
 		rate_time = mtime_since(&ss->prev_time, &now);
 		memcpy(&ss->prev_time, &now, sizeof(now));
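These diffs replace the conditional `td_io_u_lock()`/`td_io_u_unlock()` helpers with the unconditional `__td_io_u_lock()`/`__td_io_u_unlock()`. Each caller now evaluates `td_async_processing()` once, into `needs_lock`, before taking the lock. One effect, inferred from the code rather than stated in the commit, is that the unlock decision can no longer differ from the lock decision. The sketch below shows the pattern with a hypothetical `struct job`. Only `pthread_mutex_lock()` and `pthread_mutex_unlock()` are real POSIX calls.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical per-job state: the lock is only needed when helper
 * threads may touch the same fields concurrently.
 */
struct job {
	pthread_mutex_t lock;
	bool async_helpers;
	unsigned long counter;
};

static void job_add(struct job *j, unsigned long n)
{
	/*
	 * Decide once. The unlock below then matches the lock even if
	 * async_helpers is changed while we are inside the critical section.
	 */
	const bool needs_lock = j->async_helpers;

	if (needs_lock)
		pthread_mutex_lock(&j->lock);

	j->counter += n;

	if (needs_lock)
		pthread_mutex_unlock(&j->lock);
}
```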



* Recent changes (master)
@ 2018-08-25 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-25 12:00 UTC
  To: fio

The following changes since commit e7ff953f87d000d1de4c928493a6f67214cfcf8f:

  t/axmap: print explicit overlap ranges tested (2018-08-23 13:58:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 632b28a93154cb1be203d911f758d5932c0a8f86:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-08-24 18:22:13 -0600)

----------------------------------------------------------------
Bart Van Assche (12):
      configure: Add <linux/blkzoned.h> test
      Add the 'zbd' debug level
      Add the zonemode job option
      Introduce the io_u.post_submit callback function pointer
      Pass offset and buffer length explicitly to mark_random_map()
      Add two assert statements in mark_random_map()
      Add support for zoned block devices
      Collect and show zone reset statistics
      Make it possible to limit the number of open zones
      Add support for resetting zones periodically
      Add scripts for testing the fio zoned block device support code
      Fix the typehelp[] array

Jens Axboe (2):
      Merge branch 'zbd'
      Merge branch 'master' of https://github.com/bvanassche/fio

 HOWTO                                 |   88 ++-
 Makefile                              |    3 +
 cconv.c                               |    2 +
 configure                             |   21 +
 debug.h                               |    1 +
 file.h                                |    8 +
 filesetup.c                           |   15 +-
 fio.1                                 |   76 +-
 fio.h                                 |    2 +
 init.c                                |   29 +-
 io_u.c                                |   55 +-
 io_u.h                                |    6 +
 ioengines.c                           |   12 +
 options.c                             |   70 ++
 parse.c                               |   29 +-
 stat.c                                |   13 +-
 stat.h                                |    3 +
 t/zbd/functions                       |  106 +++
 t/zbd/run-tests-against-regular-nullb |   25 +
 t/zbd/run-tests-against-zoned-nullb   |   27 +
 t/zbd/test-zbd-support                |  817 +++++++++++++++++++++
 thread_options.h                      |   17 +
 zbd.c                                 | 1288 +++++++++++++++++++++++++++++++++
 zbd.h                                 |  142 ++++
 24 files changed, 2799 insertions(+), 56 deletions(-)
 create mode 100644 t/zbd/functions
 create mode 100755 t/zbd/run-tests-against-regular-nullb
 create mode 100755 t/zbd/run-tests-against-zoned-nullb
 create mode 100755 t/zbd/test-zbd-support
 create mode 100644 zbd.c
 create mode 100644 zbd.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 3839461..7bbd589 100644
--- a/HOWTO
+++ b/HOWTO
@@ -952,24 +952,92 @@ Target file/device
 
 	Unlink job files after each iteration or loop.  Default: false.
 
+.. option:: zonemode=str
+
+	Accepted values are:
+
+		**none**
+				The :option:`zonerange`, :option:`zonesize` and
+				:option:`zoneskip` parameters are ignored.
+		**strided**
+				I/O happens in a single zone until
+				:option:`zonesize` bytes have been transferred.
+				After that number of bytes has been
+				transferred processing of the next zone
+				starts.
+		**zbd**
+				Zoned block device mode. I/O happens
+				sequentially in each zone, even if random I/O
+				has been selected. Random I/O happens across
+				all zones instead of being restricted to a
+				single zone. The :option:`zoneskip` parameter
+				is ignored. :option:`zonerange` and
+				:option:`zonesize` must be identical.
+
 .. option:: zonerange=int
 
-	Size of a single zone in which I/O occurs. See also :option:`zonesize`
-	and :option:`zoneskip`.
+	Size of a single zone. See also :option:`zonesize` and
+	:option:`zoneskip`.
 
 .. option:: zonesize=int
 
-	Number of bytes to transfer before skipping :option:`zoneskip`
-	bytes. If this parameter is smaller than :option:`zonerange` then only
-	a fraction of each zone with :option:`zonerange` bytes will be
-	accessed.  If this parameter is larger than :option:`zonerange` then
-	each zone will be accessed multiple times before skipping
+	For :option:`zonemode` =strided, this is the number of bytes to
+	transfer before skipping :option:`zoneskip` bytes. If this parameter
+	is smaller than :option:`zonerange` then only a fraction of each zone
+	with :option:`zonerange` bytes will be accessed.  If this parameter is
+	larger than :option:`zonerange` then each zone will be accessed
+	multiple times before skipping to the next zone.
+
+	For :option:`zonemode` =zbd, this is the size of a single zone. The
+	:option:`zonerange` parameter is ignored in this mode.
 
 .. option:: zoneskip=int
 
-	Skip the specified number of bytes when :option:`zonesize` data have
-	been transferred. The three zone options can be used to do strided I/O
-	on a file.
+	For :option:`zonemode` =strided, the number of bytes to skip after
+	:option:`zonesize` bytes of data have been transferred. This parameter
+	must be zero for :option:`zonemode` =zbd.
+
+.. option:: read_beyond_wp=bool
+
+	This parameter applies to :option:`zonemode` =zbd only.
+
+	Zoned block devices are block devices that consist of multiple zones.
+	Each zone has a type, e.g. conventional or sequential. A conventional
+	zone can be written at any offset that is a multiple of the block
+	size. Sequential zones must be written sequentially. The position at
+	which a write must occur is called the write pointer. A zoned block
+	device can be either drive managed, host managed or host aware. For
+	host managed devices the host must ensure that writes happen
+	sequentially. Fio recognizes host managed devices and serializes
+	writes to sequential zones for these devices.
+
+	If a read occurs in a sequential zone beyond the write pointer then
+	the zoned block device will complete the read without reading any data
+	from the storage medium. Since such reads lead to unrealistically high
+	bandwidth and IOPS numbers fio only reads beyond the write pointer if
+	explicitly told to do so. Default: false.
+
+.. option:: max_open_zones=int
+
+	When running a random write test across an entire drive many more
+	zones will be open than in a typical application workload. Hence this
+	command line option that allows to limit the number of open zones. The
+	number of open zones is defined as the number of zones to which write
+	commands are issued.
+
+.. option:: zone_reset_threshold=float
+
+	A number between zero and one that indicates the ratio of logical
+	blocks with data to the total number of logical blocks in the test
+	above which zones should be reset periodically.
+
+.. option:: zone_reset_frequency=float
+
+	A number between zero and one that indicates how often a zone reset
+	should be issued if the zone reset threshold has been exceeded. A zone
+	reset is submitted after each (1 / zone_reset_frequency) write
+	requests. This and the previous parameter can be used to simulate
+	garbage collection activity.
 
 
 I/O type
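The reset rule described for `zone_reset_threshold` and `zone_reset_frequency` can be modeled as a countdown that only starts once enough of the test area holds data. In this sketch, `struct reset_policy` and `should_reset_zone()` are hypothetical. The `>=` comparison mirrors the `sectors_with_data >= io_size * zrt` check in the June 2022 zbd_adjust_block() hunk. fio's own write-counter handling is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the two zone reset options. */
struct reset_policy {
	double threshold;	/* zone_reset_threshold, 0..1 */
	double frequency;	/* zone_reset_frequency, 0..1 */
	unsigned int write_cnt;	/* writes left until the next reset */
};

/*
 * Called once per write. Returns true when a zone reset should be
 * issued: the data-holding fraction of the test area must have reached
 * the threshold, and then one reset is allowed every 1/frequency writes.
 */
static bool should_reset_zone(struct reset_policy *p,
			      uint64_t bytes_with_data, uint64_t io_size)
{
	if (p->frequency <= 0.0)
		return false;
	/* Below the threshold: no garbage-collection-like resets yet. */
	if ((double)bytes_with_data < (double)io_size * p->threshold)
		return false;
	/* Count down from 1/frequency; a reset fires when it reaches zero. */
	if (p->write_cnt == 0) {
		p->write_cnt = (unsigned int)(1.0 / p->frequency);
		if (p->write_cnt == 0)
			p->write_cnt = 1;
	}
	return --p->write_cnt == 0;
}
```

With `zone_reset_threshold=0.5` and `zone_reset_frequency=0.25`, no reset is requested while less than half of the area holds data. After that, every fourth write requests one.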
diff --git a/Makefile b/Makefile
index e8e15fe..7e87b2f 100644
--- a/Makefile
+++ b/Makefile
@@ -148,6 +148,9 @@ endif
 ifdef CONFIG_IME
   SOURCE += engines/ime.c
 endif
+ifdef CONFIG_LINUX_BLKZONED
+  SOURCE += zbd.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
diff --git a/cconv.c b/cconv.c
index 534bfb0..1d7f6f2 100644
--- a/cconv.c
+++ b/cconv.c
@@ -223,6 +223,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->zone_range = le64_to_cpu(top->zone_range);
 	o->zone_size = le64_to_cpu(top->zone_size);
 	o->zone_skip = le64_to_cpu(top->zone_skip);
+	o->zone_mode = le32_to_cpu(top->zone_mode);
 	o->lockmem = le64_to_cpu(top->lockmem);
 	o->offset_increment = le64_to_cpu(top->offset_increment);
 	o->number_ios = le64_to_cpu(top->number_ios);
@@ -548,6 +549,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->zone_range = __cpu_to_le64(o->zone_range);
 	top->zone_size = __cpu_to_le64(o->zone_size);
 	top->zone_skip = __cpu_to_le64(o->zone_skip);
+	top->zone_mode = __cpu_to_le32(o->zone_mode);
 	top->lockmem = __cpu_to_le64(o->lockmem);
 	top->ddir_seq_add = __cpu_to_le64(o->ddir_seq_add);
 	top->file_size_low = __cpu_to_le64(o->file_size_low);
diff --git a/configure b/configure
index fb8b243..ab89df7 100755
--- a/configure
+++ b/configure
@@ -2195,6 +2195,24 @@ if compile_prog "" "" "valgrind_dev"; then
 fi
 print_config "Valgrind headers" "$valgrind_dev"
 
+##########################################
+# <linux/blkzoned.h> probe
+if test "$linux_blkzoned" != "yes" ; then
+  linux_blkzoned="no"
+fi
+cat > $TMPC << EOF
+#include <linux/blkzoned.h>
+int main(int argc, char **argv)
+{
+  return 0;
+}
+EOF
+if compile_prog "" "" "linux_blkzoned"; then
+  linux_blkzoned="yes"
+fi
+print_config "Zoned block device support" "$linux_blkzoned"
+
+##########################################
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
   march_armv8_a_crc_crypto="no"
@@ -2519,6 +2537,9 @@ fi
 if test "$valgrind_dev" = "yes"; then
   output_sym "CONFIG_VALGRIND_DEV"
 fi
+if test "$linux_blkzoned" = "yes" ; then
+  output_sym "CONFIG_LINUX_BLKZONED"
+fi
 if test "$zlib" = "no" ; then
   echo "Consider installing zlib-dev (zlib-devel, some fio features depend on it."
   if test "$build_static" = "yes"; then
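The new probe only checks that `<linux/blkzoned.h>` compiles. If it does, `output_sym` emits `CONFIG_LINUX_BLKZONED`, and the Makefile hunk adds `zbd.c` to the build. The `#else` branch of zbd.h, visible in the June 2022 message, supplies inline stubs. Presumably that branch is guarded by the same symbol, although the guard itself is not shown. The sketch below reduces this compile-time switch to one hypothetical function, `zbd_example_init()`. The symbol is left undefined here, so the stub is compiled.

```c
/*
 * CONFIG_LINUX_BLKZONED is assumed to come from configure. It is not
 * defined in this sketch, so the inline stub below is used.
 */
#ifdef CONFIG_LINUX_BLKZONED
int zbd_example_init(void);	/* hypothetical; would be implemented in zbd.c */
#else
static inline int zbd_example_init(void)
{
	return 0;	/* no zoned block device support built in: nothing to do */
}
#endif
```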
diff --git a/debug.h b/debug.h
index e5e8040..51b18de 100644
--- a/debug.h
+++ b/debug.h
@@ -22,6 +22,7 @@ enum {
 	FD_COMPRESS,
 	FD_STEADYSTATE,
 	FD_HELPERTHREAD,
+	FD_ZBD,
 	FD_DEBUG_MAX,
 };
 
diff --git a/file.h b/file.h
index c0a547e..446a1fb 100644
--- a/file.h
+++ b/file.h
@@ -10,6 +10,9 @@
 #include "lib/lfsr.h"
 #include "lib/gauss.h"
 
+/* Forward declarations */
+struct zoned_block_device_info;
+
 /*
  * The type of object we are working on
  */
@@ -98,6 +101,11 @@ struct fio_file {
 	uint64_t io_size;
 
 	/*
+	 * Zoned block device information. See also zonemode=zbd.
+	 */
+	struct zoned_block_device_info *zbd_info;
+
+	/*
 	 * Track last end and last start of IO for a given data direction
 	 */
 	uint64_t last_pos[DDIR_RWDIR_CNT];
diff --git a/filesetup.c b/filesetup.c
index 94a025e..580403d 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -14,6 +14,7 @@
 #include "hash.h"
 #include "lib/axmap.h"
 #include "rwlock.h"
+#include "zbd.h"
 
 #ifdef CONFIG_LINUX_FALLOCATE
 #include <linux/falloc.h>
@@ -1142,9 +1143,6 @@ int setup_files(struct thread_data *td)
 	if (err)
 		goto err_out;
 
-	if (!o->zone_size)
-		o->zone_size = o->size;
-
 	/*
 	 * iolog already set the total io size, if we read back
 	 * stored entries.
@@ -1161,7 +1159,14 @@ done:
 		td->done = 1;
 
 	td_restore_runstate(td, old_state);
+
+	if (td->o.zone_mode == ZONE_MODE_ZBD) {
+		err = zbd_init(td);
+		if (err)
+			goto err_out;
+	}
 	return 0;
+
 err_offset:
 	log_err("%s: you need to specify valid offset=\n", o->name);
 err_out:
@@ -1349,6 +1354,8 @@ void close_and_free_files(struct thread_data *td)
 			td_io_unlink_file(td, f);
 		}
 
+		zbd_free_zone_info(f);
+
 		if (use_free)
 			free(f->file_name);
 		else
@@ -1873,6 +1880,8 @@ void fio_file_reset(struct thread_data *td, struct fio_file *f)
 		axmap_reset(f->io_axmap);
 	else if (fio_file_lfsr(f))
 		lfsr_reset(&f->lfsr, td->rand_seeds[FIO_RAND_BLOCK_OFF]);
+
+	zbd_file_reset(td, f);
 }
 
 bool fio_files_done(struct thread_data *td)
diff --git a/fio.1 b/fio.1
index 4071947..b555b20 100644
--- a/fio.1
+++ b/fio.1
@@ -724,21 +724,79 @@ false.
 .BI unlink_each_loop \fR=\fPbool
 Unlink job files after each iteration or loop. Default: false.
 .TP
-Fio supports strided data access. After having read \fBzonesize\fR bytes from an area that is \fBzonerange\fR bytes big, \fBzoneskip\fR bytes are skipped.
+.BI zonemode \fR=\fPstr
+Accepted values are:
+.RS
+.RS
+.TP
+.B none
+The \fBzonerange\fR, \fBzonesize\fR and \fBzoneskip\fR parameters are ignored.
+.TP
+.B strided
+I/O happens in a single zone until \fBzonesize\fR bytes have been transferred.
+After that number of bytes has been transferred, processing of the next zone
+starts.
+.TP
+.B zbd
+Zoned block device mode. I/O happens sequentially in each zone, even if random
+I/O has been selected. Random I/O happens across all zones instead of being
+restricted to a single zone.
+.RE
+.RE
 .TP
 .BI zonerange \fR=\fPint
-Size of a single zone in which I/O occurs.
+Size of a single zone. See also \fBzonesize\fR and \fBzoneskip\fR.
 .TP
 .BI zonesize \fR=\fPint
-Number of bytes to transfer before skipping \fBzoneskip\fR bytes. If this
-parameter is smaller than \fBzonerange\fR then only a fraction of each zone
-with \fBzonerange\fR bytes will be accessed.  If this parameter is larger than
-\fBzonerange\fR then each zone will be accessed multiple times before skipping
-to the next zone.
+For \fBzonemode\fR=strided, this is the number of bytes to transfer before
+skipping \fBzoneskip\fR bytes. If this parameter is smaller than
+\fBzonerange\fR then only a fraction of each zone with \fBzonerange\fR bytes
+will be accessed.  If this parameter is larger than \fBzonerange\fR then each
+zone will be accessed multiple times before skipping to the next zone.
+
+For \fBzonemode\fR=zbd, this is the size of a single zone. The \fBzonerange\fR
+parameter is ignored in this mode.
 .TP
 .BI zoneskip \fR=\fPint
-Skip the specified number of bytes after \fBzonesize\fR bytes of data have been
-transferred.
+For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR
+bytes of data have been transferred. This parameter must be zero for
+\fBzonemode\fR=zbd.
+
+.TP
+.BI read_beyond_wp \fR=\fPbool
+This parameter applies to \fBzonemode=zbd\fR only.
+
+Zoned block devices are block devices that consist of multiple zones. Each
+zone has a type, e.g. conventional or sequential. A conventional zone can be
+written at any offset that is a multiple of the block size. Sequential zones
+must be written sequentially. The position at which a write must occur is
+called the write pointer. A zoned block device can be either drive
+managed, host managed or host aware. For host managed devices the host must
+ensure that writes happen sequentially. Fio recognizes host managed devices
+and serializes writes to sequential zones for these devices.
+
+If a read occurs in a sequential zone beyond the write pointer then the zoned
+block device will complete the read without reading any data from the storage
+medium. Since such reads lead to unrealistically high bandwidth and IOPS
+numbers, fio only reads beyond the write pointer if explicitly told to do
+so. Default: false.
+.TP
+.BI max_open_zones \fR=\fPint
+When running a random write test across an entire drive, many more zones will be
+open than in a typical application workload. This option limits the number of
+open zones. The number of open zones is defined as the number of zones to which
+write commands are issued.
+.TP
+.BI zone_reset_threshold \fR=\fPfloat
+A number between zero and one that indicates the ratio of logical blocks with
+data to the total number of logical blocks in the test above which zones
+should be reset periodically.
+.TP
+.BI zone_reset_frequency \fR=\fPfloat
+A number between zero and one that indicates how often a zone reset should be
+issued if the zone reset threshold has been exceeded. A zone reset is
+submitted after each (1 / zone_reset_frequency) write requests. This and the
+previous parameter can be used to simulate garbage collection activity.
 
 .SS "I/O type"
 .TP
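For illustration, the strided mode described in the man page can be modelled in a few lines of C. This is a sketch, not fio's code: it assumes that once `zonesize` bytes have been transferred, the next zone starts `zonerange + zoneskip` bytes after the current one, and that processing wraps back to the start offset at the end of the file. `next_strided_zone()` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the start offset of the zone that follows
 * the zone beginning at @zone_start. Assumes each zone spans @zone_range
 * bytes and that @zone_skip bytes are skipped between consecutive zones.
 * A caller would invoke this after zonesize bytes of the current zone
 * have been transferred.
 */
static uint64_t next_strided_zone(uint64_t zone_start, uint64_t zone_range,
				  uint64_t zone_skip, uint64_t start_offset,
				  uint64_t file_size)
{
	uint64_t next;

	assert(zone_range > 0);
	next = zone_start + zone_range + zone_skip;

	/* Wrap back to the first zone once we run past the end of the file. */
	if (next >= file_size)
		next = start_offset;
	return next;
}
```

With `zonerange=1M`, `zoneskip=1M` and an 8 MiB file starting at offset 0, this yields zone starts of 0, 2M, 4M and 6M before wrapping back to 0.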
diff --git a/fio.h b/fio.h
index 83654bb..42015d3 100644
--- a/fio.h
+++ b/fio.h
@@ -167,6 +167,8 @@ struct zone_split_index {
 	uint64_t size_prev;
 };
 
+#define FIO_MAX_OPEN_ZBD_ZONES 128
+
 /*
  * This describes a single thread/process executing a fio job.
  */
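`FIO_MAX_OPEN_ZBD_ZONES` is also used as the `.maxval` of the new `max_open_zones` option. A minimal sketch of how such a limit could be enforced keeps a fixed-size table of the zones that writes have been issued to and refuses to open more than `max_open_zones` of them. The structure and helper below are hypothetical, ignore locking and full zones, and assume that `max_open_zones=0` (the option's default) means no limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIO_MAX_OPEN_ZBD_ZONES 128

/* Hypothetical bookkeeping for the zones that writes are issued to. */
struct open_zones {
	unsigned int max_open;	/* max_open_zones; 0 is assumed to mean no limit */
	unsigned int nr_open;
	uint32_t zone[FIO_MAX_OPEN_ZBD_ZONES];
};

/* Returns true if a write may be issued to zone number @z. */
static bool zone_try_open(struct open_zones *oz, uint32_t z)
{
	unsigned int i;

	assert(oz->max_open <= FIO_MAX_OPEN_ZBD_ZONES);
	for (i = 0; i < oz->nr_open; i++)
		if (oz->zone[i] == z)
			return true;	/* zone is already open */
	if (oz->max_open == 0)
		return true;	/* no limit configured, nothing to track */
	if (oz->nr_open >= oz->max_open)
		return false;	/* caller must pick an already open zone */
	oz->zone[oz->nr_open++] = z;
	return true;
}
```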
diff --git a/init.c b/init.c
index 3ed5757..b925b4c 100644
--- a/init.c
+++ b/init.c
@@ -618,17 +618,34 @@ static int fixup_options(struct thread_data *td)
 		ret |= warnings_fatal;
 	}
 
+	if (o->zone_mode == ZONE_MODE_NONE && o->zone_size) {
+		log_err("fio: --zonemode=none and --zonesize are not compatible.\n");
+		ret |= 1;
+	}
+
+	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_size) {
+		log_err("fio: --zonesize must be specified when using --zonemode=strided.\n");
+		ret |= 1;
+	}
+
+	if (o->zone_mode == ZONE_MODE_NOT_SPECIFIED) {
+		if (o->zone_size)
+			o->zone_mode = ZONE_MODE_STRIDED;
+		else
+			o->zone_mode = ZONE_MODE_NONE;
+	}
+
 	/*
-	 * only really works with 1 file
+	 * Strided zone mode only really works with 1 file.
 	 */
-	if (o->zone_size && o->open_files > 1)
-		o->zone_size = 0;
+	if (o->zone_mode == ZONE_MODE_STRIDED && o->open_files > 1)
+		o->zone_mode = ZONE_MODE_NONE;
 
 	/*
 	 * If zone_range isn't specified, backward compatibility dictates it
 	 * should be made equal to zone_size.
 	 */
-	if (o->zone_size && !o->zone_range)
+	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_range)
 		o->zone_range = o->zone_size;
 
 	/*
@@ -2263,6 +2280,10 @@ const struct debug_level debug_levels[] = {
 	  .help = "Helper thread logging",
 	  .shift = FD_HELPERTHREAD,
 	},
+	{ .name = "zbd",
+	  .help = "Zoned Block Device logging",
+	  .shift = FD_ZBD,
+	},
 	{ .name = NULL, },
 };
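The ordering of the checks added to `fixup_options()` matters: invalid combinations are rejected first, then an unspecified mode is derived from whether `zonesize` was given, and only then are the strided-mode fixups applied. The stand-alone sketch below mirrors that sequence on a hypothetical cut-down options struct; it is not the patch's code, and it prints to stderr instead of calling `log_err()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for fio's zone mode values (names mirror the patch). */
enum zone_mode {
	ZONE_MODE_NOT_SPECIFIED,
	ZONE_MODE_NONE,
	ZONE_MODE_STRIDED,
	ZONE_MODE_ZBD,
};

/* Hypothetical subset of struct thread_options used by the checks. */
struct zone_opts {
	enum zone_mode zone_mode;
	uint64_t zone_size;
	uint64_t zone_range;
	unsigned int open_files;
};

/* Returns 0 on success, 1 if the combination of options is invalid. */
static int fixup_zone_options(struct zone_opts *o)
{
	int ret = 0;

	assert(o != NULL);
	if (o->zone_mode == ZONE_MODE_NONE && o->zone_size) {
		fprintf(stderr, "--zonemode=none and --zonesize are not compatible\n");
		ret |= 1;
	}
	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_size) {
		fprintf(stderr, "--zonesize is required for --zonemode=strided\n");
		ret |= 1;
	}
	/* Backward compatibility: a bare zonesize implies strided mode. */
	if (o->zone_mode == ZONE_MODE_NOT_SPECIFIED)
		o->zone_mode = o->zone_size ? ZONE_MODE_STRIDED : ZONE_MODE_NONE;
	/* Strided mode only really works with a single open file. */
	if (o->zone_mode == ZONE_MODE_STRIDED && o->open_files > 1)
		o->zone_mode = ZONE_MODE_NONE;
	/* zonerange defaults to zonesize in strided mode. */
	if (o->zone_mode == ZONE_MODE_STRIDED && !o->zone_range)
		o->zone_range = o->zone_size;
	return ret;
}
```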
 
diff --git a/io_u.c b/io_u.c
index c58dcf0..3fbcf0f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -10,6 +10,7 @@
 #include "err.h"
 #include "lib/pow2.h"
 #include "minmax.h"
+#include "zbd.h"
 
 struct io_completion_data {
 	int nr;				/* input */
@@ -31,21 +32,27 @@ static bool random_map_free(struct fio_file *f, const uint64_t block)
 /*
  * Mark a given offset as used in the map.
  */
-static void mark_random_map(struct thread_data *td, struct io_u *io_u)
+static uint64_t mark_random_map(struct thread_data *td, struct io_u *io_u,
+				uint64_t offset, uint64_t buflen)
 {
 	unsigned long long min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	unsigned long long nr_blocks;
 	uint64_t block;
 
-	block = (io_u->offset - f->file_offset) / (uint64_t) min_bs;
-	nr_blocks = (io_u->buflen + min_bs - 1) / min_bs;
+	block = (offset - f->file_offset) / (uint64_t) min_bs;
+	nr_blocks = (buflen + min_bs - 1) / min_bs;
+	assert(nr_blocks > 0);
 
-	if (!(io_u->flags & IO_U_F_BUSY_OK))
+	if (!(io_u->flags & IO_U_F_BUSY_OK)) {
 		nr_blocks = axmap_set_nr(f->io_axmap, block, nr_blocks);
+		assert(nr_blocks > 0);
+	}
+
+	if ((nr_blocks * min_bs) < buflen)
+		buflen = nr_blocks * min_bs;
 
-	if ((nr_blocks * min_bs) < io_u->buflen)
-		io_u->buflen = nr_blocks * min_bs;
+	return buflen;
 }
 
 static uint64_t last_block(struct thread_data *td, struct fio_file *f,
@@ -64,7 +71,7 @@ static uint64_t last_block(struct thread_data *td, struct fio_file *f,
 	if (max_size > f->real_file_size)
 		max_size = f->real_file_size;
 
-	if (td->o.zone_range)
+	if (td->o.zone_mode == ZONE_MODE_STRIDED && td->o.zone_range)
 		max_size = td->o.zone_range;
 
 	if (td->o.min_bs[ddir] > td->o.ba[ddir])
@@ -761,6 +768,11 @@ void put_file_log(struct thread_data *td, struct fio_file *f)
 
 void put_io_u(struct thread_data *td, struct io_u *io_u)
 {
+	if (io_u->post_submit) {
+		io_u->post_submit(io_u, io_u->error == 0);
+		io_u->post_submit = NULL;
+	}
+
 	if (td->parent)
 		td = td->parent;
 
@@ -815,10 +827,14 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 	*io_u = NULL;
 }
 
-static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
+static void setup_strided_zone_mode(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 
+	assert(td->o.zone_mode == ZONE_MODE_STRIDED);
+	assert(td->o.zone_size);
+	assert(td->o.zone_range);
+
 	/*
 	 * See if it's time to switch to a new zone
 	 */
@@ -857,6 +873,8 @@ static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
 static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	bool is_random;
+	uint64_t offset;
+	enum io_u_action ret;
 
 	if (td_ioengine_flagged(td, FIO_NOIO))
 		goto out;
@@ -869,11 +887,8 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	if (!ddir_rw(io_u->ddir))
 		goto out;
 
-	/*
-	 * When file is zoned zone_range is always positive
-	 */
-	if (td->o.zone_range)
-		__fill_io_u_zone(td, io_u);
+	if (td->o.zone_mode == ZONE_MODE_STRIDED)
+		setup_strided_zone_mode(td, io_u);
 
 	/*
 	 * No log, let the seq/rand engine retrieve the next buflen and
@@ -890,6 +905,13 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 		return 1;
 	}
 
+	offset = io_u->offset;
+	if (td->o.zone_mode == ZONE_MODE_ZBD) {
+		ret = zbd_adjust_block(td, io_u);
+		if (ret == io_u_eof)
+			return 1;
+	}
+
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
 		dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%llx exceeds file size=0x%llx\n",
 			io_u,
@@ -902,7 +924,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	 * mark entry before potentially trimming io_u
 	 */
 	if (td_random(td) && file_randommap(td, io_u->file))
-		mark_random_map(td, io_u);
+		io_u->buflen = mark_random_map(td, io_u, offset, io_u->buflen);
 
 out:
 	dprint_io_u(io_u, "fill");
@@ -1303,6 +1325,11 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		if (!fill_io_u(td, io_u))
 			break;
 
+		if (io_u->post_submit) {
+			io_u->post_submit(io_u, false);
+			io_u->post_submit = NULL;
+		}
+
 		put_file_log(td, f);
 		td_io_close_file(td, f);
 		io_u->file = NULL;
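`mark_random_map()` now takes the offset and length explicitly and returns the possibly shortened length, so `fill_io_u()` can pass the offset it recorded before `zbd_adjust_block()` ran. The arithmetic can be sketched on its own as below; `trim_buflen()` is a hypothetical name, and `nr_marked` stands in for the block count that `axmap_set_nr()` returns in fio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the arithmetic in mark_random_map(): compute the
 * first block and the number of min_bs blocks covered by
 * [offset, offset + buflen), then trim buflen to the number of blocks
 * that could actually be marked (@nr_marked).
 */
static uint64_t trim_buflen(uint64_t file_offset, uint64_t offset,
			    uint64_t buflen, uint64_t min_bs,
			    uint64_t nr_marked, uint64_t *first_block)
{
	uint64_t nr_blocks;

	assert(min_bs > 0 && offset >= file_offset);
	*first_block = (offset - file_offset) / min_bs;
	nr_blocks = (buflen + min_bs - 1) / min_bs;	/* round up */
	assert(nr_blocks > 0);

	if (nr_marked < nr_blocks)
		nr_blocks = nr_marked;
	if (nr_blocks * min_bs < buflen)
		buflen = nr_blocks * min_bs;
	return buflen;
}
```

A 16 KiB request at 4 KiB granularity covers four blocks; if only two of them can still be marked, the request is trimmed to 8 KiB.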
diff --git a/io_u.h b/io_u.h
index 2e0fd3f..97270c9 100644
--- a/io_u.h
+++ b/io_u.h
@@ -93,6 +93,12 @@ struct io_u {
 	};
 
 	/*
+	 * Post-submit callback. Used by the ZBD code. @success == true means
+	 * that the I/O operation has been queued or completed successfully.
+	 */
+	void (*post_submit)(const struct io_u *, bool success);
+
+	/*
 	 * Callback for io completion
 	 */
 	int (*end_io)(struct thread_data *, struct io_u **);
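The new `post_submit` hook is called from several paths (`td_io_queue()`, `put_io_u()` and the retry loop in `set_io_u_file()`), and each call site clears the pointer afterwards so that the callback runs at most once. That pattern could be factored into a helper as sketched below; the cut-down `struct io_u`, `io_u_run_post_submit()` and the counting callback are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down io_u with only the fields this sketch needs. */
struct io_u {
	int error;
	void (*post_submit)(const struct io_u *, bool success);
};

/*
 * Invoke and clear the post-submit callback, so that it runs at most
 * once no matter which path reaches it first.
 */
static void io_u_run_post_submit(struct io_u *io_u, bool success)
{
	if (io_u->post_submit) {
		io_u->post_submit(io_u, success);
		io_u->post_submit = NULL;
	}
}

/* Example callback that records how often it ran (illustration only). */
static int nr_calls;
static bool last_success;

static void count_post_submit(const struct io_u *io_u, bool success)
{
	(void)io_u;
	nr_calls++;
	last_success = success;
}
```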
diff --git a/ioengines.c b/ioengines.c
index 433da60..ba02952 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -18,6 +18,7 @@
 
 #include "fio.h"
 #include "diskutil.h"
+#include "zbd.h"
 
 static FLIST_HEAD(engine_list);
 
@@ -319,6 +320,10 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	ret = td->io_ops->queue(td, io_u);
+	if (ret != FIO_Q_BUSY && io_u->post_submit) {
+		io_u->post_submit(io_u, io_u->error == 0);
+		io_u->post_submit = NULL;
+	}
 
 	unlock_file(td, io_u->file);
 
@@ -350,6 +355,13 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 			 "invalid block size. Try setting direct=0.\n");
 	}
 
+	if (zbd_unaligned_write(io_u->error) &&
+	    td->io_issues[io_u->ddir & 1] == 1 &&
+	    td->o.zone_mode != ZONE_MODE_ZBD) {
+		log_info("fio: first I/O failed. If %s is a zoned block device, consider --zonemode=zbd\n",
+			 io_u->file->file_name);
+	}
+
 	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
diff --git a/options.c b/options.c
index 86ab5d6..534233b 100644
--- a/options.c
+++ b/options.c
@@ -3240,6 +3240,30 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 #endif
 	{
+		.name	= "zonemode",
+		.lname	= "Zone mode",
+		.help	= "Mode for the zonesize, zonerange and zoneskip parameters",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, zone_mode),
+		.def	= "none",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_ZONE,
+		.posval	= {
+			   { .ival = "none",
+			     .oval = ZONE_MODE_NONE,
+			     .help = "no zoning",
+			   },
+			   { .ival = "strided",
+			     .oval = ZONE_MODE_STRIDED,
+			     .help = "strided mode - random I/O is restricted to a single zone",
+			   },
+			   { .ival = "zbd",
+			     .oval = ZONE_MODE_ZBD,
+			     .help = "zoned block device mode - random I/O selects one of multiple zones randomly",
+			   },
+		},
+	},
+	{
 		.name	= "zonesize",
 		.lname	= "Zone size",
 		.type	= FIO_OPT_STR_VAL,
@@ -3273,6 +3297,52 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_ZONE,
 	},
 	{
+		.name	= "read_beyond_wp",
+		.lname	= "Allow reads beyond the zone write pointer",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, read_beyond_wp),
+		.help	= "Allow reads beyond the zone write pointer",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "max_open_zones",
+		.lname	= "Maximum number of open zones",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, max_open_zones),
+		.maxval	= FIO_MAX_OPEN_ZBD_ZONES,
+		.help	= "Limit random writes to SMR drives to the specified"
+			  " number of sequential zones",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "zone_reset_threshold",
+		.lname	= "Zone reset threshold",
+		.help	= "Zoned block device reset threshold",
+		.type	= FIO_OPT_FLOAT_LIST,
+		.maxlen	= 1,
+		.off1	= offsetof(struct thread_options, zrt),
+		.minfp	= 0,
+		.maxfp	= 1,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_ZONE,
+	},
+	{
+		.name	= "zone_reset_frequency",
+		.lname	= "Zone reset frequency",
+		.help	= "Zoned block device zone reset frequency in HZ",
+		.type	= FIO_OPT_FLOAT_LIST,
+		.maxlen	= 1,
+		.off1	= offsetof(struct thread_options, zrf),
+		.minfp	= 0,
+		.maxfp	= 1,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_ZONE,
+	},
+	{
 		.name	= "lockmem",
 		.lname	= "Lock memory",
 		.type	= FIO_OPT_STR_VAL,
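Both `zone_reset_threshold` and `zone_reset_frequency` are fractions between 0 and 1 (see `minfp`/`maxfp` above). Following the man page description, resets start once the fraction of blocks holding data exceeds the threshold, and one reset is then issued per `1 / zone_reset_frequency` writes. The sketch below uses a deterministic write counter for clarity; fio's own implementation may decide differently, and `want_zone_reset()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether write request number @nr_writes
 * (counting from 1) should be accompanied by a zone reset. @threshold
 * and @frequency are fractions in [0, 1].
 */
static bool want_zone_reset(uint64_t blocks_with_data, uint64_t total_blocks,
			    double threshold, double frequency,
			    uint64_t nr_writes)
{
	uint64_t period;

	assert(total_blocks > 0 && nr_writes > 0);
	if (frequency <= 0.0)
		return false;
	if ((double)blocks_with_data / total_blocks <= threshold)
		return false;	/* not enough data written yet */
	period = (uint64_t)(1.0 / frequency);	/* writes per reset */
	if (period == 0)
		period = 1;
	return nr_writes % period == 0;
}
```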
diff --git a/parse.c b/parse.c
index 952118c..5d88d91 100644
--- a/parse.c
+++ b/parse.c
@@ -120,19 +120,22 @@ static void show_option_values(const struct fio_option *o)
 static void show_option_help(const struct fio_option *o, int is_err)
 {
 	const char *typehelp[] = {
-		"invalid",
-		"string (opt=bla)",
-		"string (opt=bla)",
-		"string with possible k/m/g postfix (opt=4k)",
-		"string with time postfix (opt=10s)",
-		"string (opt=bla)",
-		"string with dual range (opt=1k-4k,4k-8k)",
-		"integer value (opt=100)",
-		"boolean value (opt=1)",
-		"list of floating point values separated by ':' (opt=5.9:7.8)",
-		"no argument (opt)",
-		"deprecated",
-		"unsupported",
+		[FIO_OPT_INVALID]	  = "invalid",
+		[FIO_OPT_STR]		  = "string (opt=bla)",
+		[FIO_OPT_STR_ULL]	  = "string (opt=bla)",
+		[FIO_OPT_STR_MULTI]	  = "string with possible k/m/g postfix (opt=4k)",
+		[FIO_OPT_STR_VAL]	  = "string (opt=bla)",
+		[FIO_OPT_STR_VAL_TIME]	  = "string with time postfix (opt=10s)",
+		[FIO_OPT_STR_STORE]	  = "string (opt=bla)",
+		[FIO_OPT_RANGE]		  = "one to three ranges (opt=1k-4k[,4k-8k[,1k-8k]])",
+		[FIO_OPT_INT]		  = "integer value (opt=100)",
+		[FIO_OPT_ULL]		  = "integer value (opt=100)",
+		[FIO_OPT_BOOL]		  = "boolean value (opt=1)",
+		[FIO_OPT_FLOAT_LIST]	  = "list of floating point values separated by ':' (opt=5.9:7.8)",
+		[FIO_OPT_STR_SET]	  = "empty or boolean value ([0|1])",
+		[FIO_OPT_DEPRECATED]	  = "deprecated",
+		[FIO_OPT_SOFT_DEPRECATED] = "deprecated",
+		[FIO_OPT_UNSUPPORTED]	  = "unsupported",
 	};
 	ssize_t (*logger)(const char *format, ...);
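Switching `typehelp[]` to designated initializers ties each help string to its `FIO_OPT_*` value instead of to its position in the list, so adding a new option type such as `FIO_OPT_STR_ULL` cannot silently shift the strings that follow it. The self-contained sketch below shows the idea with a hypothetical `enum opt_type`; slots without an initializer are NULL and can be detected at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum standing in for fio's option types. */
enum opt_type {
	OPT_INVALID = 0,
	OPT_STR,
	OPT_INT,
	OPT_BOOL,
	OPT_UNSUPPORTED,
	OPT_TYPE_MAX,
};

/* Each entry is bound to its enum value, not to its position. */
static const char *const type_help[] = {
	[OPT_INVALID]	  = "invalid",
	[OPT_STR]	  = "string (opt=bla)",
	[OPT_BOOL]	  = "boolean value (opt=1)",
	[OPT_UNSUPPORTED] = "unsupported",
	[OPT_TYPE_MAX]	  = NULL,	/* sizes the table to cover every type */
};

static_assert(sizeof(type_help) / sizeof(type_help[0]) == OPT_TYPE_MAX + 1,
	      "one help slot per option type");

static const char *opt_type_help(enum opt_type t)
{
	/* OPT_INT was deliberately left out above, so its slot is NULL. */
	if ((unsigned int)t >= OPT_TYPE_MAX || type_help[t] == NULL)
		return "unknown";
	return type_help[t];
}
```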
 
diff --git a/stat.c b/stat.c
index 6cb704e..abdbb0e 100644
--- a/stat.c
+++ b/stat.c
@@ -14,6 +14,7 @@
 #include "lib/output_buffer.h"
 #include "helper_thread.h"
 #include "smalloc.h"
+#include "zbd.h"
 
 #define LOG_MSEC_SLACK	1
 
@@ -419,7 +420,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
-	char *io_p, *bw_p, *bw_p_alt, *iops_p;
+	char *io_p, *bw_p, *bw_p_alt, *iops_p, *zbd_w_st = NULL;
 	int i2p;
 
 	if (ddir_sync(ddir)) {
@@ -450,12 +451,16 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
 	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
+	if (ddir == DDIR_WRITE)
+		zbd_w_st = zbd_write_status(ts);
 
-	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)\n",
+	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)%s\n",
 			rs->unified_rw_rep ? "mixed" : str[ddir],
 			iops_p, bw_p, bw_p_alt, io_p,
-			(unsigned long long) ts->runtime[ddir]);
+			(unsigned long long) ts->runtime[ddir],
+			zbd_w_st ? : "");
 
+	free(zbd_w_st);
 	free(io_p);
 	free(bw_p);
 	free(bw_p_alt);
@@ -1655,6 +1660,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 	dst->total_run_time += src->total_run_time;
 	dst->total_submit += src->total_submit;
 	dst->total_complete += src->total_complete;
+	dst->nr_zone_resets += src->nr_zone_resets;
 }
 
 void init_group_run_stat(struct group_run_stats *gs)
@@ -2337,6 +2343,7 @@ void reset_io_stats(struct thread_data *td)
 
 	ts->total_submit = 0;
 	ts->total_complete = 0;
+	ts->nr_zone_resets = 0;
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
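The write status line now ends with the string returned by `zbd_write_status()`, and `t/zbd/functions` extracts the reset count with a pattern of the form `; <n> zone resets`. The patch also relies on the GNU `x ? : y` shorthand. A portable sketch of building such a suffix, assuming exactly that format, follows; `zone_reset_suffix()` and `or_empty()` are hypothetical helpers, not fio functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ZONE_RESET_FMT "; %llu zone resets"

/* Returns a malloc()ed suffix such as "; 3 zone resets", or NULL. */
static char *zone_reset_suffix(uint64_t nr_zone_resets)
{
	unsigned long long n = (unsigned long long)nr_zone_resets;
	int len = snprintf(NULL, 0, ZONE_RESET_FMT, n);
	char *buf;

	if (len < 0)
		return NULL;
	buf = malloc((size_t)len + 1);
	if (buf)
		snprintf(buf, (size_t)len + 1, ZONE_RESET_FMT, n);
	return buf;
}

/* Portable spelling of the GNU "s ? : \"\"" shorthand. */
static const char *or_empty(const char *s)
{
	return s ? s : "";
}
```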
diff --git a/stat.h b/stat.h
index 5dcaae0..98de281 100644
--- a/stat.h
+++ b/stat.h
@@ -211,6 +211,9 @@ struct thread_stat {
 	uint32_t first_error;
 	uint64_t total_err_count;
 
+	/* ZBD stats */
+	uint64_t nr_zone_resets;
+
 	uint64_t nr_block_infos;
 	uint32_t block_infos[MAX_NR_BLOCK_INFOS];
 
diff --git a/t/zbd/functions b/t/zbd/functions
new file mode 100644
index 0000000..95f9bf4
--- /dev/null
+++ b/t/zbd/functions
@@ -0,0 +1,106 @@
+#!/bin/bash
+
+# To do: switch to blkzone once blkzone reset works correctly.
+blkzone=
+#blkzone=$(type -p blkzone 2>/dev/null)
+zbc_report_zones=$(type -p zbc_report_zones 2>/dev/null)
+zbc_reset_zone=$(type -p zbc_reset_zone 2>/dev/null)
+if [ -z "${blkzone}" ] &&
+       { [ -z "${zbc_report_zones}" ] || [ -z "${zbc_reset_zone}" ]; }; then
+    echo "Error: neither blkzone nor zbc_report_zones is available"
+    exit 1
+fi
+
+# Reports the starting sector and length of the first sequential zone of device
+# $1.
+first_sequential_zone() {
+    local dev=$1
+
+    if [ -n "${blkzone}" ]; then
+	${blkzone} report "$dev" |
+	    sed -n 's/^[[:blank:]]*start:[[:blank:]]\([0-9a-zA-Z]*\),[[:blank:]]len[[:blank:]]\([0-9a-zA-Z]*\),.*type:[[:blank:]]2(.*/\1 \2/p' |
+	    {
+		read -r starting_sector length &&
+		    # Convert from hex to decimal
+		    echo $((starting_sector)) $((length))
+	    }
+    else
+	${zbc_report_zones} "$dev" |
+	    sed -n 's/^Zone [0-9]*: type 0x2 .*, sector \([0-9]*\), \([0-9]*\) sectors,.*$/\1 \2/p' |
+	    head -n1
+    fi
+}
+
+max_open_zones() {
+    local dev=$1
+
+    if [ -n "${blkzone}" ]; then
+	# To do: query the maximum number of open zones using sg_raw
+	return 1
+    else
+	${zbc_report_zones} "$dev" |
+	    sed -n 's/^[[:blank:]]*Maximum number of open sequential write required zones:[[:blank:]]*//p'
+    fi
+}
+
+# Reset the write pointer of one zone on device $1 at offset $2. The offset
+# must be specified in units of 512 byte sectors. Offset -1 means reset all
+# zones.
+reset_zone() {
+    local dev=$1 offset=$2 sectors
+
+    if [ -n "${blkzone}" ]; then
+	if [ "$offset" -lt 0 ]; then
+	    sectors=$(<"/sys/class/block/${dev#/dev/}/size")
+	    ${blkzone} reset -o 0 -l "$sectors" "$dev"
+	else
+	    ${blkzone} reset -o "${offset}" -c 1 "$dev"
+	fi
+    else
+	if [ "$offset" -lt 0 ]; then
+	    ${zbc_reset_zone} -all "$dev" "${offset}" >/dev/null
+	else
+	    ${zbc_reset_zone} -sector "$dev" "${offset}" >/dev/null
+	fi
+    fi
+}
+
+# Extract the number of bytes that have been transferred from a line like
+# READ: bw=6847KiB/s (7011kB/s), 6847KiB/s-6847KiB/s (7011kB/s-7011kB/s), io=257MiB (269MB), run=38406-38406msec
+fio_io() {
+    sed -n 's/^[[:blank:]]*'"$1"'.*, io=\([^[:blank:]]*\).*/\1/p' |
+	tail -n 1 |
+	(
+	    read -r io;
+	    # Parse <number>.<number><suffix> into n1, n2 and s. See also
+	    # num2str().
+	    shopt -s extglob
+	    n1=${io%${io##*([0-9])}}
+	    s=${io#${io%%*([a-zA-Z])}}
+	    n2=${io#${n1}}
+	    n2=${n2#.}
+	    n2=${n2%$s}000
+	    n2=${n2:0:3}
+	    case "$s" in
+		KiB) m=10;;
+		MiB) m=20;;
+		GiB) m=30;;
+		B)   m=0;;
+		*)   return 1;;
+	    esac
+	    [ -n "$n1" ] || return 1
+	    echo $(((n1 << m) + (10#$n2 << m) / 1000))
+	)
+}
+
+fio_read() {
+    fio_io 'READ:'
+}
+
+fio_written() {
+    fio_io 'WRITE:'
+}
+
+fio_reset_count() {
+    sed -n 's/^.*write:[^;]*; \([0-9]*\) zone resets$/\1/p'
+}
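`fio_io()` converts the `io=` field of fio's summary line back into bytes. The same parsing written in C is sketched below for readers checking the shell arithmetic: it keeps at most three fractional digits, treats them as decimal, and accepts only the `B`, `KiB`, `MiB` and `GiB` suffixes that the shell helper recognizes. `parse_fio_size()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a num2str() style size such as "257MiB" or "1.5KiB" into
 * bytes. Returns -1 if the string does not start with a digit or has
 * an unknown suffix.
 */
static int64_t parse_fio_size(const char *s)
{
	int64_t n1 = 0, n2 = 0;
	int digits = 0, m;

	if (!isdigit((unsigned char)*s))
		return -1;
	while (isdigit((unsigned char)*s))
		n1 = n1 * 10 + (*s++ - '0');
	if (*s == '.') {
		s++;
		while (isdigit((unsigned char)*s)) {
			if (digits < 3) {	/* keep three fractional digits */
				n2 = n2 * 10 + (*s - '0');
				digits++;
			}
			s++;
		}
	}
	for (; digits < 3; digits++)	/* pad, e.g. ".5" becomes 500 */
		n2 *= 10;

	if (strcmp(s, "KiB") == 0)
		m = 10;
	else if (strcmp(s, "MiB") == 0)
		m = 20;
	else if (strcmp(s, "GiB") == 0)
		m = 30;
	else if (strcmp(s, "B") == 0)
		m = 0;
	else
		return -1;
	return (n1 << m) + (n2 << m) / 1000;
}
```

For example, `257MiB` yields 269484032 bytes and `1.5KiB` yields 1536.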
diff --git a/t/zbd/run-tests-against-regular-nullb b/t/zbd/run-tests-against-regular-nullb
new file mode 100755
index 0000000..133c7c4
--- /dev/null
+++ b/t/zbd/run-tests-against-regular-nullb
@@ -0,0 +1,25 @@
+#!/bin/bash
+#
+# Copyright (C) 2018 Western Digital Corporation or its affiliates.
+#
+# This file is released under the GPL.
+
+for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
+modprobe -r null_blk
+modprobe null_blk nr_devices=0 || exit $?
+for d in /sys/kernel/config/nullb/*; do
+    [ -d "$d" ] && rmdir "$d"
+done
+modprobe -r null_blk
+[ -e /sys/module/null_blk ] && exit $?
+modprobe null_blk nr_devices=0 &&
+    cd /sys/kernel/config/nullb &&
+    mkdir nullb0 &&
+    cd nullb0 &&
+    echo 0 > completion_nsec &&
+    echo 4096 > blocksize &&
+    echo 1024 > size &&
+    echo 1 > memory_backed &&
+    echo 1 > power
+
+"$(dirname "$0")"/test-zbd-support "$@" /dev/nullb0
diff --git a/t/zbd/run-tests-against-zoned-nullb b/t/zbd/run-tests-against-zoned-nullb
new file mode 100755
index 0000000..7d9eb43
--- /dev/null
+++ b/t/zbd/run-tests-against-zoned-nullb
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Copyright (C) 2018 Western Digital Corporation or its affiliates.
+#
+# This file is released under the GPL.
+
+for d in /sys/kernel/config/nullb/*; do [ -d "$d" ] && rmdir "$d"; done
+modprobe -r null_blk
+modprobe null_blk nr_devices=0 || exit $?
+for d in /sys/kernel/config/nullb/*; do
+    [ -d "$d" ] && rmdir "$d"
+done
+modprobe -r null_blk
+[ -e /sys/module/null_blk ] && exit $?
+modprobe null_blk nr_devices=0 &&
+    cd /sys/kernel/config/nullb &&
+    mkdir nullb0 &&
+    cd nullb0 &&
+    echo 1 > zoned &&
+    echo 1 > zone_size &&
+    echo 0 > completion_nsec &&
+    echo 4096 > blocksize &&
+    echo 1024 > size &&
+    echo 1 > memory_backed &&
+    echo 1 > power
+
+"$(dirname "$0")"/test-zbd-support "$@" /dev/nullb0
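Both nullb scripts size the emulated device with `size=1024`, and the zoned variant adds `zone_size=1`. Assuming null_blk's configfs interface takes both values in MiB, this gives 1024 zones of 1 MiB each. The test script works in 512-byte sectors (`reset_zone` and the `sectors_per_zone` based offsets), and the helpers below sketch that arithmetic. `struct nullb_geom` and the functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of the emulated device; sizes in MiB (assumed null_blk units). */
struct nullb_geom {
	uint64_t size_mb;
	uint64_t zone_size_mb;
};

static uint64_t nr_zones(const struct nullb_geom *g)
{
	assert(g->zone_size_mb > 0);
	return g->size_mb / g->zone_size_mb;
}

/* Zone size in 512-byte sectors, the unit reset_zone() expects. */
static uint64_t sectors_per_zone(const struct nullb_geom *g)
{
	return (g->zone_size_mb << 20) / 512;
}

/* Byte offset of zone @i, counted from the zone at @first_sector. */
static uint64_t zone_byte_offset(const struct nullb_geom *g,
				 uint64_t first_sector, uint64_t i)
{
	return (first_sector + i * sectors_per_zone(g)) * 512;
}
```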
diff --git a/t/zbd/test-zbd-support b/t/zbd/test-zbd-support
new file mode 100755
index 0000000..6ee5055
--- /dev/null
+++ b/t/zbd/test-zbd-support
@@ -0,0 +1,817 @@
+#!/bin/bash
+#
+# Copyright (C) 2018 Western Digital Corporation or its affiliates.
+#
+# This file is released under the GPL.
+
+usage() {
+    echo "Usage: $(basename "$0") [-d] [-e] [-r] [-v] [-t <test>] <SMR drive device node>"
+}
+
+max() {
+    if [ "$1" -gt "$2" ]; then
+	echo "$1"
+    else
+	echo "$2"
+    fi
+}
+
+min() {
+    if [ "$1" -lt "$2" ]; then
+	echo "$1"
+    else
+	echo "$2"
+    fi
+}
+
+set_io_scheduler() {
+    local dev=$1 sched=$2
+
+    [ -e "/sys/block/$dev" ] || return $?
+    if [ -e "/sys/block/$dev/mq" ]; then
+	case "$sched" in
+	    noop)        sched=none;;
+	    deadline)    sched=mq-deadline;;
+	esac
+    else
+	case "$sched" in
+	    none)        sched=noop;;
+	    mq-deadline) sched=deadline;;
+	esac
+    fi
+
+    echo "$sched" >"/sys/block/$dev/queue/scheduler"
+}
+
+check_read() {
+    local read
+
+    read=$(fio_read <"${logfile}.${test_number}")
+    echo "read: $read <> $1" >> "${logfile}.${test_number}"
+    [ "$read" = "$1" ]
+}
+
+check_written() {
+    local written
+
+    written=$(fio_written <"${logfile}.${test_number}")
+    echo "written: $written <> $1" >> "${logfile}.${test_number}"
+    [ "$written" = "$1" ]
+}
+
+# Compare the reset count from the log file with reset count $2 using operator
+# $1 (=, -ge, -gt, -le, -lt).
+check_reset_count() {
+    local reset_count
+
+    reset_count=$(fio_reset_count <"${logfile}.${test_number}")
+    echo "reset_count: test $reset_count $1 $2" >> "${logfile}.${test_number}"
+    eval "[ '$reset_count' '$1' '$2' ]"
+}
+
+# Whether or not $1 (/dev/...) is a SCSI device.
+is_scsi_device() {
+    local d f
+
+    d=$(basename "$1")
+    for f in /sys/class/scsi_device/*/device/block/"$d"; do
+	[ -e "$f" ] && return 0
+    done
+    return 1
+}
+
+run_fio() {
+    local fio
+
+    fio=$(dirname "$0")/../../fio
+
+    { echo; echo "fio $*"; echo; } >>"${logfile}.${test_number}"
+
+    "${dynamic_analyzer[@]}" "$fio" "$@"
+}
+
+run_one_fio_job() {
+    local r
+
+    r=$(((RANDOM << 16) | RANDOM))
+    run_fio --name="$dev" --filename="$dev" "$@" --randseed="$r"	\
+	    --thread=1 --direct=1
+}
+
+# Run fio on the first four sequential zones of the disk.
+run_fio_on_seq() {
+    local opts=()
+
+    opts+=("--offset=$((first_sequential_zone_sector * 512))")
+    opts+=("--size=$((4 * zone_size))" "--zonemode=zbd")
+    if [ -z "$is_zbd" ]; then
+	opts+=("--zonesize=${zone_size}")
+    fi
+    run_one_fio_job "${opts[@]}" "$@"
+}
+
+# Check whether buffered writes are refused.
+test1() {
+    run_fio --name=job1 --filename="$dev" --rw=write --direct=0 --bs=4K	\
+	    --size="${zone_size}"					\
+	    --zonemode=zbd --zonesize="${zone_size}" 2>&1 |
+	tee -a "${logfile}.${test_number}" |
+	grep -q 'Using direct I/O is mandatory for writing to ZBD drives'
+    local fio_rc=${PIPESTATUS[0]} grep_rc=${PIPESTATUS[2]}
+    case "$fio_rc" in
+	0|1) ;;
+	*)   return "$fio_rc"
+    esac
+    if [ -n "$is_zbd" ]; then
+	[ "$grep_rc" = 0 ]
+    else
+	[ "$grep_rc" != 0 ]
+    fi
+}
+
+# Block size exceeds zone size.
+test2() {
+    local bs off opts=() rc
+
+    off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
+    bs=$((2 * zone_size))
+    opts+=("--name=job1" "--filename=$dev" "--rw=write" "--direct=1")
+    opts+=("--zonemode=zbd" "--offset=$off" "--bs=$bs" "--size=$bs")
+    if [ -z "$is_zbd" ]; then
+	opts+=("--zonesize=${zone_size}")
+    fi
+    run_fio "${opts[@]}" 2>&1 |
+	tee -a "${logfile}.${test_number}" |
+	grep -q 'No I/O performed'
+}
+
+# Run fio against an empty zone. This causes fio to report "No I/O performed".
+test3() {
+    local off opts=() rc
+
+    off=$((first_sequential_zone_sector * 512 + 128 * zone_size))
+    size=$((zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--bs=4K")
+    opts+=("--size=$size" "--zonemode=zbd")
+    opts+=("--ioengine=psync" "--rw=read" "--direct=1" "--thread=1")
+    if [ -z "$is_zbd" ]; then
+	opts+=("--zonesize=${zone_size}")
+    fi
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+    grep -q "No I/O performed" "${logfile}.${test_number}"
+    rc=$?
+    if [ -n "$is_zbd" ]; then
+	[ $rc = 0 ]
+    else
+	[ $rc != 0 ]
+    fi
+}
+
+# Run fio with --read_beyond_wp=1 against an empty zone.
+test4() {
+    local off opts=()
+
+    off=$((first_sequential_zone_sector * 512 + 129 * zone_size))
+    size=$((zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--bs=$size")
+    opts+=("--size=$size" "--thread=1" "--read_beyond_wp=1")
+    opts+=("--ioengine=psync" "--rw=read" "--direct=1" "--disable_lat=1")
+    opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_read $size || return $?
+}
+
+# Sequential write to sequential zones.
+test5() {
+    local size
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write		\
+		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
+		   --do_verify=1 --verify=md5				\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Sequential read from sequential zones. Must be run after test5.
+test6() {
+    local size
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=read		\
+		   --bs="$(max $((zone_size / 64)) "$logical_block_size")"\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, libaio, queue depth 1.
+test7() {
+    local size=$((zone_size))
+
+    run_fio_on_seq --ioengine=libaio --iodepth=1 --rw=randwrite		\
+		   --bs="$(min 16384 "${zone_size}")"			\
+		   --do_verify=1 --verify=md5 --size="$size"		\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, libaio, queue depth 64.
+test8() {
+    local size
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite	\
+		   --bs="$(min 16384 "${zone_size}")"			\
+		   --do_verify=1 --verify=md5				\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, sg, queue depth 1.
+test9() {
+    local size
+
+    if ! is_scsi_device "$dev"; then
+	echo "$dev is not a SCSI device" >>"${logfile}.${test_number}"
+	return 0
+    fi
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=sg --iodepth=1 --rw=randwrite --bs=16K	\
+		   --do_verify=1 --verify=md5				\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, sg, queue depth 64.
+test10() {
+    local size
+
+    if ! is_scsi_device "$dev"; then
+	echo "$dev is not a SCSI device" >>"${logfile}.${test_number}"
+	return 0
+    fi
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=sg --iodepth=64 --rw=randwrite --bs=16K	\
+		   --do_verify=1 --verify=md5				\
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, libaio, queue depth 64, random block size.
+test11() {
+    local size
+
+    size=$((4 * zone_size))
+    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite	\
+		   --bsrange=4K-64K --do_verify=1 --verify=md5		\
+		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, libaio, queue depth 64, max 1 open zone.
+test12() {
+    local size
+
+    size=$((8 * zone_size))
+    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K     \
+		   --max_open_zones=1 --size=$size --do_verify=1 --verify=md5 \
+		   --debug=zbd >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to sequential zones, libaio, queue depth 64, max 4 open zones.
+test13() {
+    local size
+
+    size=$((8 * zone_size))
+    run_fio_on_seq --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K     \
+		   --max_open_zones=4 --size=$size --do_verify=1 --verify=md5 \
+		   --debug=zbd						      \
+		   >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $size || return $?
+    check_read $size || return $?
+}
+
+# Random write to conventional zones.
+test14() {
+    local size
+
+    size=$((16 * 2**20)) # 16 MB
+    if [ $size -gt $((first_sequential_zone_sector * 512)) ]; then
+	echo "$dev does not have enough conventional zones" \
+	     >>"${logfile}.${test_number}"
+	return 0
+    fi
+    run_one_fio_job --ioengine=libaio --iodepth=64 --rw=randwrite --bs=16K \
+		    --zonemode=zbd --zonesize="${zone_size}" --do_verify=1 \
+		    --verify=md5 --size=$size				   \
+		    >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((size)) || return $?
+    check_read $((size)) || return $?
+}
+
+# Sequential read on a mix of empty and full zones.
+test15() {
+    local i off size
+
+    for ((i=0;i<4;i++)); do
+	[ -n "$is_zbd" ] &&
+	    reset_zone "$dev" $((first_sequential_zone_sector +
+				 i*sectors_per_zone))
+    done
+    off=$(((first_sequential_zone_sector + 2 * sectors_per_zone) * 512))
+    size=$((2 * zone_size))
+    run_one_fio_job --ioengine=psync --rw=write --bs=$((zone_size / 16))\
+		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
+		    --size=$size >>"${logfile}.${test_number}" 2>&1 ||
+	return $?
+    check_written $size || return $?
+    off=$((first_sequential_zone_sector * 512))
+    size=$((4 * zone_size))
+    run_one_fio_job --ioengine=psync --rw=read --bs=$((zone_size / 16))	\
+		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
+		    --size=$((size)) >>"${logfile}.${test_number}" 2>&1 ||
+	return $?
+    if [ -n "$is_zbd" ]; then
+	check_read $((size / 2))
+    else
+	check_read $size
+    fi
+}
+
+# Random read on a mix of empty and full zones. Must be run after test15.
+test16() {
+    local off size
+
+    off=$((first_sequential_zone_sector * 512))
+    size=$((4 * zone_size))
+    run_one_fio_job --ioengine=libaio --iodepth=64 --rw=randread --bs=16K \
+		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off \
+		    --size=$size >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_read $size || return $?
+}
+
+# Random reads and writes in the last zone.
+test17() {
+    local io off read size written
+
+    off=$(((disk_size / zone_size - 1) * zone_size))
+    size=$((disk_size - off))
+    # Overwrite the last zone so that reading from that zone does not fail.
+    if [ -n "$is_zbd" ]; then
+	reset_zone "$dev" $((off / 512)) || return $?
+    fi
+    run_one_fio_job --ioengine=psync --rw=write --offset="$off"		\
+		    --zonemode=zbd --zonesize="${zone_size}"		\
+		    --bs="$zone_size" --size="$zone_size"		\
+		    >>"${logfile}.${test_number}" 2>&1 || return $?
+    check_written "$zone_size" || return $?
+    run_one_fio_job --ioengine=libaio --iodepth=8 --rw=randrw --bs=4K	\
+		    --zonemode=zbd --zonesize="${zone_size}"		\
+		    --offset=$off --loops=2 --norandommap=1\
+		    >>"${logfile}.${test_number}" 2>&1 || return $?
+    written=$(fio_written <"${logfile}.${test_number}")
+    read=$(fio_read <"${logfile}.${test_number}")
+    io=$((written + read))
+    echo "Total number of bytes read and written: $io <> $size" \
+	 >>"${logfile}.${test_number}"
+    [ $io = $((size * 2)) ];
+}
+
+# Out-of-range zone reset threshold and frequency parameters.
+test18() {
+    run_fio_on_seq --zone_reset_threshold=-1 |&
+	tee -a "${logfile}.${test_number}"   |
+	    grep -q 'value out of range' || return $?
+}
+
+test19() {
+    run_fio_on_seq --zone_reset_threshold=2  |&
+	tee -a "${logfile}.${test_number}"   |
+	grep -q 'value out of range' || return $?
+}
+
+test20() {
+    run_fio_on_seq --zone_reset_threshold=.4:.6 |&
+	tee -a "${logfile}.${test_number}"   |
+	grep -q 'the list exceeding max length' || return $?
+}
+
+test21() {
+    run_fio_on_seq --zone_reset_frequency=-1 |&
+	tee -a "${logfile}.${test_number}"   |
+	grep -q 'value out of range' || return $?
+}
+
+test22() {
+    run_fio_on_seq --zone_reset_frequency=2  |&
+	tee -a "${logfile}.${test_number}"   |
+	grep -q 'value out of range' || return $?
+}
+
+test23() {
+    run_fio_on_seq --zone_reset_frequency=.4:.6  |&
+	tee -a "${logfile}.${test_number}"   |
+	grep -q 'the list exceeding max length' || return $?
+}
+
+test24() {
+    local bs loops=9 size=$((zone_size))
+
+    bs=$(min $((256*1024)) "$zone_size")
+    run_fio_on_seq --ioengine=psync --rw=write --bs="$bs" --size=$size	 \
+		   --loops=$loops					 \
+		   --zone_reset_frequency=.01 --zone_reset_threshold=.90 \
+		   >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((size * loops)) || return $?
+    check_reset_count -eq 8 ||
+	check_reset_count -eq 9 ||
+	check_reset_count -eq 10 || return $?
+}
+
+# Multiple non-overlapping sequential write jobs for the same drive.
+test25() {
+    local i opts=()
+
+    for ((i=0;i<16;i++)); do
+        [ -n "$is_zbd" ] &&
+	    reset_zone "$dev" $((first_sequential_zone_sector + i*sectors_per_zone))
+    done
+    for ((i=0;i<16;i++)); do
+	opts+=("--name=job$i" "--filename=$dev" "--thread=1" "--direct=1")
+	opts+=("--offset=$((first_sequential_zone_sector*512 + zone_size*i))")
+	opts+=("--size=$zone_size" "--ioengine=psync" "--rw=write" "--bs=16K")
+	opts+=("--zonemode=zbd" "--zonesize=${zone_size}" "--group_reporting=1")
+    done
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+write_to_first_seq_zone() {
+    local loops=4 r
+
+    r=$(((RANDOM << 16) | RANDOM))
+    run_fio --name="$dev" --filename="$dev" --ioengine=psync --rw="$1"	\
+	    --thread=1 --do_verify=1 --verify=md5 --direct=1 --bs=4K	\
+	    --offset=$((first_sequential_zone_sector * 512))		\
+	    "--size=$zone_size" --loops=$loops --randseed="$r"		\
+	    --zonemode=zbd --zonesize="${zone_size}" --group_reporting=1	\
+	    --gtod_reduce=1 >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((loops * zone_size)) || return $?
+}
+
+# Overwrite the first sequential zone four times sequentially.
+test26() {
+    write_to_first_seq_zone write
+}
+
+# Overwrite the first sequential zone four times using random writes.
+test27() {
+    write_to_first_seq_zone randwrite
+}
+
+# Multiple overlapping random write jobs for the same drive.
+test28() {
+    local i jobs=16 off opts
+
+    off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts=("--debug=zbd")
+    for ((i=0;i<jobs;i++)); do
+	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
+	opts+=("--size=$zone_size" "--ioengine=psync" "--rw=randwrite")
+	opts+=("--thread=1" "--direct=1" "--zonemode=zbd")
+	opts+=("--zonesize=${zone_size}" "--group_reporting=1")
+    done
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((jobs * zone_size)) || return $?
+    check_reset_count -eq $jobs ||
+	check_reset_count -eq $((jobs - 1)) ||
+	return $?
+}
+
+# Multiple overlapping random write jobs for the same drive and with a limited
+# number of open zones.
+test29() {
+    local i jobs=16 off opts=() size
+
+    off=$((first_sequential_zone_sector * 512 + 64 * zone_size))
+    size=$((16*zone_size))
+    [ -n "$is_zbd" ] && reset_zone "$dev" $((off / 512))
+    opts=("--debug=zbd")
+    for ((i=0;i<jobs;i++)); do
+	opts+=("--name=job$i" "--filename=$dev" "--offset=$off" "--bs=16K")
+	opts+=("--size=$size" "--io_size=$zone_size" "--thread=1")
+	opts+=("--ioengine=psync" "--rw=randwrite" "--direct=1")
+	opts+=("--max_open_zones=4" "--group_reporting=1")
+	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+    done
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $((jobs * zone_size)) || return $?
+}
+
+# Random reads and writes across the entire disk for 30s.
+test30() {
+    local off
+
+    off=$((first_sequential_zone_sector * 512))
+    run_one_fio_job --ioengine=libaio --iodepth=8 --rw=randrw		\
+		    --bs="$(max $((zone_size / 128)) "$logical_block_size")"\
+		    --zonemode=zbd --zonesize="${zone_size}" --offset=$off\
+		    --loops=2 --time_based --runtime=30s --norandommap=1\
+		    >>"${logfile}.${test_number}" 2>&1
+}
+
+# Random reads across all sequential zones for 30s. This is not only a fio
+# test but also allows checking the performance of a drive.
+test31() {
+    local bs inc nz off opts size
+
+    # Start with writing 128 KB to 128 sequential zones.
+    bs=128K
+    nz=128
+    # shellcheck disable=SC2017
+    inc=$(((disk_size - (first_sequential_zone_sector * 512)) / (nz * zone_size)
+	   * zone_size))
+    opts=()
+    for ((off = first_sequential_zone_sector * 512; off < disk_size;
+	  off += inc)); do
+	opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--io_size=$bs")
+	opts+=("--bs=$bs" "--size=$zone_size" "--ioengine=libaio")
+	opts+=("--rw=write" "--direct=1" "--thread=1" "--stats=0")
+	opts+=("--zonemode=zbd" "--zonesize=${zone_size}")
+    done
+    "$(dirname "$0")/../../fio" "${opts[@]}" >> "${logfile}.${test_number}" 2>&1
+    # Next, run the test.
+    off=$((first_sequential_zone_sector * 512))
+    size=$((disk_size - off))
+    opts=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
+    opts+=("--bs=$bs" "--ioengine=psync" "--rw=randread" "--direct=1")
+    opts+=("--thread=1" "--time_based" "--runtime=30" "--zonemode=zbd")
+    opts+=("--zonesize=${zone_size}")
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# Random writes across all sequential zones. This is not only a fio test but
+# also allows checking the performance of a drive.
+test32() {
+    local off opts=() size
+
+    off=$((first_sequential_zone_sector * 512))
+    size=$((disk_size - off))
+    opts+=("--name=$dev" "--filename=$dev" "--offset=$off" "--size=$size")
+    opts+=("--bs=128K" "--ioengine=psync" "--rw=randwrite" "--direct=1")
+    opts+=("--thread=1" "--time_based" "--runtime=30")
+    opts+=("--max_open_zones=$max_open_zones" "--zonemode=zbd")
+    opts+=("--zonesize=${zone_size}")
+    run_fio "${opts[@]}" >> "${logfile}.${test_number}" 2>&1 || return $?
+}
+
+# Write to sequential zones with a block size that is not a divisor of the
+# zone size.
+test33() {
+    local bs io_size size
+
+    size=$((2 * zone_size))
+    io_size=$((5 * zone_size))
+    bs=$((3 * zone_size / 4))
+    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write --size=$size	\
+		   --io_size=$io_size --bs=$bs				\
+		   >> "${logfile}.${test_number}" 2>&1 || return $?
+    check_written $(((io_size + bs - 1) / bs * bs)) || return $?
+}
+
+# Write to sequential zones with a block size that is not a divisor of the
+# zone size and with data verification enabled.
+test34() {
+    local size
+
+    size=$((2 * zone_size))
+    run_fio_on_seq --ioengine=psync --iodepth=1 --rw=write --size=$size	  \
+		   --do_verify=1 --verify=md5 --bs=$((3 * zone_size / 4)) \
+		   >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'not a divisor of' "${logfile}.${test_number}"
+}
+
+# Test 1/4 for the I/O boundary rounding code: $size < $zone_size.
+test35() {
+    local bs off io_size size
+
+    off=$(((first_sequential_zone_sector + 1) * 512))
+    size=$((zone_size - 2 * 512))
+    bs=$((zone_size / 4))
+    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
+		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
+		    --zonemode=zbd --zonesize="${zone_size}"		    \
+		    >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
+}
+
+# Test 2/4 for the I/O boundary rounding code: $size < $zone_size.
+test36() {
+    local bs off io_size size
+
+    off=$(((first_sequential_zone_sector) * 512))
+    size=$((zone_size - 512))
+    bs=$((zone_size / 4))
+    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
+		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
+		    --zonemode=zbd --zonesize="${zone_size}"		    \
+		    >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
+}
+
+# Test 3/4 for the I/O boundary rounding code: $size > $zone_size.
+test37() {
+    local bs off size
+
+    if [ "$first_sequential_zone_sector" = 0 ]; then
+	off=0
+    else
+	off=$(((first_sequential_zone_sector - 1) * 512))
+    fi
+    size=$((zone_size + 2 * 512))
+    bs=$((zone_size / 4))
+    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
+		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
+		    --zonemode=zbd --zonesize="${zone_size}"		    \
+		    >> "${logfile}.${test_number}" 2>&1
+    check_written $((zone_size)) || return $?
+}
+
+# Test 4/4 for the I/O boundary rounding code: $offset > $disk_size - $zone_size
+test38() {
+    local bs off size
+
+    size=$((logical_block_size))
+    off=$((disk_size - logical_block_size))
+    bs=$((logical_block_size))
+    run_one_fio_job --offset=$off --size=$size --ioengine=psync	--iodepth=1 \
+		    --rw=write --do_verify=1 --verify=md5 --bs=$bs	    \
+		    --zonemode=zbd --zonesize="${zone_size}"		    \
+		    >> "${logfile}.${test_number}" 2>&1 && return 1
+    grep -q 'io_size must be at least one zone' "${logfile}.${test_number}"
+}
+
+# Read one block from a block device.
+read_one_block() {
+    local bs
+
+    bs=$((logical_block_size))
+    run_one_fio_job --rw=read --ioengine=psync --bs=$bs --size=$bs "$@" 2>&1 |
+	tee -a "${logfile}.${test_number}"
+}
+
+# Check whether fio accepts --zonemode=none for zoned block devices.
+test39() {
+    [ -n "$is_zbd" ] || return 0
+    read_one_block --zonemode=none >/dev/null || return $?
+    check_read $((logical_block_size)) || return $?
+}
+
+# Check whether fio accepts --zonemode=strided for zoned block devices.
+test40() {
+    local bs
+
+    bs=$((logical_block_size))
+    [ -n "$is_zbd" ] || return 0
+    read_one_block --zonemode=strided |
+	grep -q 'fio: --zonesize must be specified when using --zonemode=strided' ||
+	return $?
+    read_one_block --zonemode=strided --zonesize=$bs >/dev/null || return $?
+    check_read $bs || return $?
+}
+
+# Check whether fio checks the zone size for zoned block devices.
+test41() {
+    [ -n "$is_zbd" ] || return 0
+    read_one_block --zonemode=zbd --zonesize=$((2 * zone_size)) |
+	grep -q 'job parameter zonesize.*does not match disk zone size'
+}
+
+# Check whether fio handles --zonesize=0 correctly for regular block devices.
+test42() {
+    [ -n "$is_zbd" ] && return 0
+    read_one_block --zonemode=zbd --zonesize=0 |
+	grep -q 'Specifying the zone size is mandatory for regular block devices with --zonemode=zbd'
+}
+
+# Check whether fio handles --zonesize=1 correctly.
+test43() {
+    read_one_block --zonemode=zbd --zonesize=1 |
+	grep -q 'zone size must be at least 512 bytes for --zonemode=zbd'
+}
+
+# Check whether fio handles --zonemode=none --zonesize=1 correctly.
+test44() {
+    read_one_block --zonemode=none --zonesize=1 |
+	grep -q 'fio: --zonemode=none and --zonesize are not compatible'
+}
+
+test45() {
+    local bs i
+
+    [ -z "$is_zbd" ] && return 0
+    bs=$((logical_block_size))
+    run_one_fio_job --ioengine=psync --iodepth=1 --rw=randwrite --bs=$bs\
+		    --offset=$((first_sequential_zone_sector * 512)) \
+		    --size="$zone_size" --do_verify=1 --verify=md5 2>&1 |
+	tee -a "${logfile}.${test_number}" |
+	grep -q "fio: first I/O failed. If .* is a zoned block device, consider --zonemode=zbd"
+}
+
+tests=()
+dynamic_analyzer=()
+reset_all_zones=
+
+while [ "${1#-}" != "$1" ]; do
+  case "$1" in
+    -d) dynamic_analyzer=(valgrind "--read-var-info=yes" "--tool=drd"
+			  "--show-confl-seg=no");
+	shift;;
+    -e) dynamic_analyzer=(valgrind "--read-var-info=yes" "--tool=helgrind");
+	shift;;
+    -r) reset_all_zones=1; shift;;
+    -t) tests+=("$2"); shift; shift;;
+    -v) dynamic_analyzer=(valgrind "--read-var-info=yes");
+	shift;;
+    --) shift; break;;
+  esac
+done
+
+if [ $# != 1 ]; then
+    usage
+    exit 1
+fi
+
+# shellcheck source=functions
+source "$(dirname "$0")/functions" || exit $?
+
+dev=$1
+realdev=$(readlink -f "$dev")
+basename=$(basename "$realdev")
+disk_size=$(($(<"/sys/block/$basename/size")*512))
+logical_block_size=$(<"/sys/block/$basename/queue/logical_block_size")
+case "$(<"/sys/class/block/$basename/queue/zoned")" in
+    host-managed|host-aware)
+	is_zbd=true
+	if ! result=($(first_sequential_zone "$dev")); then
+	    echo "Failed to determine first sequential zone"
+	    exit 1
+	fi
+	first_sequential_zone_sector=${result[0]}
+	sectors_per_zone=${result[1]}
+	zone_size=$((sectors_per_zone * 512))
+	if ! max_open_zones=$(max_open_zones "$dev"); then
+	    echo "Failed to determine maximum number of open zones"
+	    exit 1
+	fi
+	echo "First sequential zone starts at sector $first_sequential_zone_sector; zone size: $((zone_size >> 20)) MB"
+	set_io_scheduler "$basename" deadline || exit $?
+	if [ -n "$reset_all_zones" ]; then
+	    reset_zone "$dev" -1
+	fi
+	;;
+    *)
+	first_sequential_zone_sector=$(((disk_size / 2) &
+					(logical_block_size - 1)))
+	zone_size=$(max 65536 "$logical_block_size")
+	sectors_per_zone=$((zone_size / 512))
+	max_open_zones=128
+	set_io_scheduler "$basename" none || exit $?
+	;;
+esac
+
+if [ "${#tests[@]}" = 0 ]; then
+    for ((i=1;i<=45;i++)); do
+	tests+=("$i")
+    done
+fi
+
+logfile=$0.log
+
+rc=0
+for test_number in "${tests[@]}"; do
+    rm -f "${logfile}.${test_number}"
+    echo -n "Running test $test_number ... "
+    if eval "test$test_number"; then
+	status="PASS"
+    else
+	status="FAIL"
+	rc=1
+    fi
+    echo "$status"
+    echo "$status" >> "${logfile}.${test_number}"
+done
+
+exit $rc
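Tests 35 through 38 exercise the I/O boundary rounding done by zbd_verify_sizes() in the zbd.c file added further down in this patch: for a job range that includes sequential zones, a start offset that is not zone-aligned is rounded up to the next zone boundary, the end of the range is rounded down, and the job is rejected with "io_size must be at least one zone" if no complete zone remains. The C sketch below reproduces that arithmetic in isolation. It assumes, unlike zbd.c, that all zones have the same size and that zone 0 starts at byte 0; struct io_range and round_to_zones() are hypothetical names, not fio APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the byte range [offset, offset + size). */
struct io_range {
	uint64_t offset;
	uint64_t size;
};

/*
 * Shrink @r so that it starts and ends on a zone boundary, assuming every
 * zone is @zone_size bytes long and zone 0 starts at byte 0. Returns false
 * and leaves @r unchanged when no complete zone is left in the range.
 */
static bool round_to_zones(struct io_range *r, uint64_t zone_size)
{
	uint64_t end = r->offset + r->size;
	/* Round the start up to the next zone boundary (no-op if aligned). */
	uint64_t start = (r->offset + zone_size - 1) / zone_size * zone_size;
	/* Round the end down to the previous zone boundary. */
	uint64_t new_end = end / zone_size * zone_size;

	if (new_end <= start)
		return false;	/* "io_size must be at least one zone" */
	r->offset = start;
	r->size = new_end - start;
	return true;
}
```

Under these assumptions the test37 layout (start one sector before a zone boundary, length one zone plus two sectors) is trimmed to exactly one zone, which matches the check_written $((zone_size)) expectation in test37, while the test35 and test36 layouts leave no complete zone and are rejected.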
diff --git a/thread_options.h b/thread_options.h
index 8bbf54b..3931583 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -10,6 +10,14 @@
 #include "lib/pattern.h"
 #include "td_error.h"
 
+enum fio_zone_mode {
+	ZONE_MODE_NOT_SPECIFIED	= 0,
+	ZONE_MODE_NONE		= 1,
+	ZONE_MODE_STRIDED	= 2, /* perform I/O in one zone at a time */
+	/* perform I/O across multiple zones simultaneously */
+	ZONE_MODE_ZBD		= 3,
+};
+
 /*
  * What type of allocation to use for io buffers
  */
@@ -188,6 +196,7 @@ struct thread_options {
 	unsigned long long zone_range;
 	unsigned long long zone_size;
 	unsigned long long zone_skip;
+	enum fio_zone_mode zone_mode;
 	unsigned long long lockmem;
 	enum fio_memtype mem_type;
 	unsigned int mem_align;
@@ -325,6 +334,12 @@ struct thread_options {
 
 	unsigned int allow_create;
 	unsigned int allow_mounted_write;
+
+	/* Parameters that affect zonemode=zbd */
+	unsigned int read_beyond_wp;
+	int max_open_zones;
+	fio_fp64_t zrt;
+	fio_fp64_t zrf;
 };
 
 #define FIO_TOP_STR_MAX		256
@@ -601,6 +616,8 @@ struct thread_options_pack {
 
 	uint32_t allow_create;
 	uint32_t allow_mounted_write;
+
+	uint32_t zone_mode;
 } __attribute__((packed));
 
 extern void convert_thread_options_to_cpu(struct thread_options *o, struct thread_options_pack *top);
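zbd.c, whose diff follows, converts byte offsets into zone indices in zbd_zone_idx(): it shifts by zone_size_log2 when the zone size is a power of two, divides otherwise, and clamps the result to nr_zones so that offsets past the end of the device map to the sentinel entry at the end of zone_info[]. The sketch below shows the same idea with the zone size kept in bytes rather than in 512-byte sectors; struct zone_geom, zone_geom_init() and zone_idx() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-device zone geometry. */
struct zone_geom {
	uint64_t zone_size;	/* zone size in bytes */
	int zone_size_log2;	/* log2(zone_size), or -1 if not a power of 2 */
	uint32_t nr_zones;	/* zone count; index nr_zones is the sentinel */
};

/* Index of the highest set bit, or -1 for 0 (same loop as ilog2() in zbd.c). */
static int ilog2_u64(uint64_t i)
{
	int log = -1;

	while (i) {
		i >>= 1;
		log++;
	}
	return log;
}

static void zone_geom_init(struct zone_geom *g, uint64_t zone_size,
			   uint64_t disk_size)
{
	g->zone_size = zone_size;
	/* (x & (x - 1)) == 0 holds only for powers of two when x != 0. */
	g->zone_size_log2 = (zone_size & (zone_size - 1)) == 0 ?
		ilog2_u64(zone_size) : -1;
	/* Round up so that a trailing partial zone still counts as a zone. */
	g->nr_zones = (disk_size + zone_size - 1) / zone_size;
}

/* Map a byte offset to a zone index, clamped to the sentinel index. */
static uint32_t zone_idx(const struct zone_geom *g, uint64_t offset)
{
	uint64_t idx;

	if (g->zone_size_log2 > 0)
		idx = offset >> g->zone_size_log2;	/* fast path */
	else
		idx = offset / g->zone_size;		/* arbitrary zone size */
	return idx < g->nr_zones ? (uint32_t)idx : g->nr_zones;
}
```

The sketch tests zone_size_log2 > 0 rather than just non-zero, so that the -1 used to mark a non-power-of-two zone size selects the division path instead of being used as a shift count.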
diff --git a/zbd.c b/zbd.c
new file mode 100644
index 0000000..5619769
--- /dev/null
+++ b/zbd.c
@@ -0,0 +1,1288 @@
+/*
+ * Copyright (C) 2018 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ */
+
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <linux/blkzoned.h>
+#include "file.h"
+#include "fio.h"
+#include "lib/pow2.h"
+#include "log.h"
+#include "smalloc.h"
+#include "verify.h"
+#include "zbd.h"
+
+/**
+ * zbd_zone_idx - convert an offset into a zone number
+ * @f: file pointer.
+ * @offset: offset in bytes. If this offset is in the first zone_size bytes
+ *	    past the disk size then the index of the sentinel is returned.
+ */
+static uint32_t zbd_zone_idx(const struct fio_file *f, uint64_t offset)
+{
+	uint32_t zone_idx;
+
+	if (f->zbd_info->zone_size_log2)
+		zone_idx = offset >> f->zbd_info->zone_size_log2;
+	else
+		zone_idx = (offset >> 9) / f->zbd_info->zone_size;
+
+	return min(zone_idx, f->zbd_info->nr_zones);
+}
+
+/**
+ * zbd_zone_full - verify whether a minimum number of bytes remain in a zone
+ * @f: file pointer.
+ * @z: zone info pointer.
+ * @required: minimum number of bytes that must remain in a zone.
+ *
+ * The caller must hold z->mutex.
+ */
+static bool zbd_zone_full(const struct fio_file *f, struct fio_zone_info *z,
+			  uint64_t required)
+{
+	assert((required & 511) == 0);
+
+	return z->type == BLK_ZONE_TYPE_SEQWRITE_REQ &&
+		z->wp + (required >> 9) > z->start + f->zbd_info->zone_size;
+}
+
+static bool is_valid_offset(const struct fio_file *f, uint64_t offset)
+{
+	return (uint64_t)(offset - f->file_offset) < f->io_size;
+}
+
+/* Verify whether direct I/O is used for all host-managed zoned drives. */
+static bool zbd_using_direct_io(void)
+{
+	struct thread_data *td;
+	struct fio_file *f;
+	int i, j;
+
+	for_each_td(td, i) {
+		if (td->o.odirect || !(td->o.td_ddir & TD_DDIR_WRITE))
+			continue;
+		for_each_file(td, f, j) {
+			if (f->zbd_info &&
+			    f->zbd_info->model == ZBD_DM_HOST_MANAGED)
+				return false;
+		}
+	}
+
+	return true;
+}
+
+/* Whether or not the I/O range for f includes one or more sequential zones */
+static bool zbd_is_seq_job(struct fio_file *f)
+{
+	uint32_t zone_idx, zone_idx_b, zone_idx_e;
+
+	assert(f->zbd_info);
+	if (f->io_size == 0)
+		return false;
+	zone_idx_b = zbd_zone_idx(f, f->file_offset);
+	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size - 1);
+	for (zone_idx = zone_idx_b; zone_idx <= zone_idx_e; zone_idx++)
+		if (f->zbd_info->zone_info[zone_idx].type ==
+		    BLK_ZONE_TYPE_SEQWRITE_REQ)
+			return true;
+
+	return false;
+}
+
+/*
+ * Verify whether offset and size parameters are aligned with zone boundaries.
+ */
+static bool zbd_verify_sizes(void)
+{
+	const struct fio_zone_info *z;
+	struct thread_data *td;
+	struct fio_file *f;
+	uint64_t new_offset, new_end;
+	uint32_t zone_idx;
+	int i, j;
+
+	for_each_td(td, i) {
+		for_each_file(td, f, j) {
+			if (!f->zbd_info)
+				continue;
+			if (f->file_offset >= f->real_file_size)
+				continue;
+			if (!zbd_is_seq_job(f))
+				continue;
+			zone_idx = zbd_zone_idx(f, f->file_offset);
+			z = &f->zbd_info->zone_info[zone_idx];
+			if (f->file_offset != (z->start << 9)) {
+				new_offset = (z+1)->start << 9;
+				if (new_offset >= f->file_offset + f->io_size) {
+					log_info("%s: io_size must be at least one zone\n",
+						 f->file_name);
+					return false;
+				}
+				log_info("%s: rounded up offset from %lu to %lu\n",
+					 f->file_name, f->file_offset,
+					 new_offset);
+				f->io_size -= (new_offset - f->file_offset);
+				f->file_offset = new_offset;
+			}
+			zone_idx = zbd_zone_idx(f, f->file_offset + f->io_size);
+			z = &f->zbd_info->zone_info[zone_idx];
+			new_end = z->start << 9;
+			if (f->file_offset + f->io_size != new_end) {
+				if (new_end <= f->file_offset) {
+					log_info("%s: io_size must be at least one zone\n",
+						 f->file_name);
+					return false;
+				}
+				log_info("%s: rounded down io_size from %lu to %lu\n",
+					 f->file_name, f->io_size,
+					 new_end - f->file_offset);
+				f->io_size = new_end - f->file_offset;
+			}
+		}
+	}
+
+	return true;
+}
+
+static bool zbd_verify_bs(void)
+{
+	struct thread_data *td;
+	struct fio_file *f;
+	uint32_t zone_size;
+	int i, j, k;
+
+	for_each_td(td, i) {
+		for_each_file(td, f, j) {
+			if (!f->zbd_info)
+				continue;
+			zone_size = f->zbd_info->zone_size;
+			for (k = 0; k < ARRAY_SIZE(td->o.bs); k++) {
+				if (td->o.verify != VERIFY_NONE &&
+				    (zone_size << 9) % td->o.bs[k] != 0) {
+					log_info("%s: block size %llu is not a divisor of the zone size %d\n",
+						 f->file_name, td->o.bs[k],
+						 zone_size << 9);
+					return false;
+				}
+			}
+		}
+	}
+	return true;
+}
+
+/*
+ * Read zone information into @buf starting from sector @start_sector.
+ * @fd is a file descriptor that refers to a block device and @bufsz is the
+ * size of @buf.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+static int read_zone_info(int fd, uint64_t start_sector,
+			  void *buf, unsigned int bufsz)
+{
+	struct blk_zone_report *hdr = buf;
+
+	if (bufsz < sizeof(*hdr))
+		return -EINVAL;
+
+	memset(hdr, 0, sizeof(*hdr));
+
+	hdr->nr_zones = (bufsz - sizeof(*hdr)) / sizeof(struct blk_zone);
+	hdr->sector = start_sector;
+	return ioctl(fd, BLKREPORTZONE, hdr) >= 0 ? 0 : -errno;
+}
+
+/*
+ * Read up to 255 characters from the first line of a file. Strip the trailing
+ * newline.
+ */
+static char *read_file(const char *path)
+{
+	char line[256], *p = line;
+	FILE *f;
+
+	f = fopen(path, "rb");
+	if (!f)
+		return NULL;
+	if (!fgets(line, sizeof(line), f))
+		line[0] = '\0';
+	strsep(&p, "\n");
+	fclose(f);
+
+	return strdup(line);
+}
+
+static enum blk_zoned_model get_zbd_model(const char *file_name)
+{
+	enum blk_zoned_model model = ZBD_DM_NONE;
+	char *zoned_attr_path = NULL;
+	char *model_str = NULL;
+	struct stat statbuf;
+
+	if (stat(file_name, &statbuf) < 0)
+		goto out;
+	if (asprintf(&zoned_attr_path, "/sys/dev/block/%d:%d/queue/zoned",
+		     major(statbuf.st_rdev), minor(statbuf.st_rdev)) < 0)
+		goto out;
+	model_str = read_file(zoned_attr_path);
+	if (!model_str)
+		goto out;
+	dprint(FD_ZBD, "%s: zbd model string: %s\n", file_name, model_str);
+	if (strcmp(model_str, "host-aware") == 0)
+		model = ZBD_DM_HOST_AWARE;
+	else if (strcmp(model_str, "host-managed") == 0)
+		model = ZBD_DM_HOST_MANAGED;
+
+out:
+	free(model_str);
+	free(zoned_attr_path);
+	return model;
+}
+
+static int ilog2(uint64_t i)
+{
+	int log = -1;
+
+	while (i) {
+		i >>= 1;
+		log++;
+	}
+	return log;
+}
+
+/*
+ * Initialize f->zbd_info for devices that are not zoned block devices. This
+ * allows executing a ZBD workload against a non-ZBD device.
+ */
+static int init_zone_info(struct thread_data *td, struct fio_file *f)
+{
+	uint32_t nr_zones;
+	struct fio_zone_info *p;
+	uint64_t zone_size;
+	struct zoned_block_device_info *zbd_info = NULL;
+	pthread_mutexattr_t attr;
+	int i;
+
+	zone_size = td->o.zone_size >> 9;
+	assert(zone_size);
+	nr_zones = ((f->real_file_size >> 9) + zone_size - 1) / zone_size;
+	zbd_info = scalloc(1, sizeof(*zbd_info) +
+			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
+	if (!zbd_info)
+		return -ENOMEM;
+
+	pthread_mutexattr_init(&attr);
+	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+	pthread_mutexattr_setpshared(&attr, true);
+	pthread_mutex_init(&zbd_info->mutex, &attr);
+	zbd_info->refcount = 1;
+	p = &zbd_info->zone_info[0];
+	for (i = 0; i < nr_zones; i++, p++) {
+		pthread_mutex_init(&p->mutex, &attr);
+		p->start = i * zone_size;
+		p->wp = p->start + zone_size;
+		p->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
+		p->cond = BLK_ZONE_COND_EMPTY;
+	}
+	/* a sentinel */
+	p->start = nr_zones * zone_size;
+
+	f->zbd_info = zbd_info;
+	f->zbd_info->zone_size = zone_size;
+	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
+		ilog2(zone_size) + 9 : -1;
+	f->zbd_info->nr_zones = nr_zones;
+	pthread_mutexattr_destroy(&attr);
+	return 0;
+}
+
+/*
+ * Parse the BLKREPORTZONE output and store it in f->zbd_info. Must be called
+ * only for devices that support this ioctl, namely zoned block devices.
+ */
+static int parse_zone_info(struct thread_data *td, struct fio_file *f)
+{
+	const unsigned int bufsz = sizeof(struct blk_zone_report) +
+		4096 * sizeof(struct blk_zone);
+	uint32_t nr_zones;
+	struct blk_zone_report *hdr;
+	const struct blk_zone *z;
+	struct fio_zone_info *p;
+	uint64_t zone_size, start_sector;
+	struct zoned_block_device_info *zbd_info = NULL;
+	pthread_mutexattr_t attr;
+	void *buf;
+	int fd, i, j, ret = -ENOMEM;
+
+	pthread_mutexattr_init(&attr);
+	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
+	pthread_mutexattr_setpshared(&attr, true);
+
+	buf = malloc(bufsz);
+	if (!buf)
+		goto out;
+
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0) {
+		ret = -errno;
+		goto free;
+	}
+
+	ret = read_zone_info(fd, 0, buf, bufsz);
+	if (ret < 0) {
+		log_info("fio: BLKREPORTZONE(%lu) failed for %s (%d).\n",
+			 0UL, f->file_name, -ret);
+		goto close;
+	}
+	hdr = buf;
+	if (hdr->nr_zones < 1) {
+		log_info("fio: %s has invalid zone information.\n", f->file_name);
+		ret = -EIO;
+		goto close;
+	}
+	z = (void *)(hdr + 1);
+	zone_size = z->len;
+	nr_zones = ((f->real_file_size >> 9) + zone_size - 1) / zone_size;
+
+	if (td->o.zone_size == 0) {
+		td->o.zone_size = zone_size << 9;
+	} else if (td->o.zone_size != zone_size << 9) {
+		log_info("fio: %s job parameter zonesize %lld does not match disk zone size %ld.\n",
+			 f->file_name, td->o.zone_size, zone_size << 9);
+		ret = -EINVAL;
+		goto close;
+	}
+
+	dprint(FD_ZBD, "Device %s has %d zones of size %lu KB\n", f->file_name,
+	       nr_zones, zone_size / 2);
+
+	zbd_info = scalloc(1, sizeof(*zbd_info) +
+			   (nr_zones + 1) * sizeof(zbd_info->zone_info[0]));
+	ret = -ENOMEM;
+	if (!zbd_info)
+		goto close;
+	pthread_mutex_init(&zbd_info->mutex, &attr);
+	zbd_info->refcount = 1;
+	p = &zbd_info->zone_info[0];
+	for (start_sector = 0, j = 0; j < nr_zones;) {
+		z = (void *)(hdr + 1);
+		for (i = 0; i < hdr->nr_zones; i++, j++, z++, p++) {
+			pthread_mutex_init(&p->mutex, &attr);
+			p->start = z->start;
+			switch (z->cond) {
+			case BLK_ZONE_COND_NOT_WP:
+				p->wp = z->start;
+				break;
+			case BLK_ZONE_COND_FULL:
+				p->wp = z->start + zone_size;
+				break;
+			default:
+				assert(z->start <= z->wp);
+				assert(z->wp <= z->start + zone_size);
+				p->wp = z->wp;
+				break;
+			}
+			p->type = z->type;
+			p->cond = z->cond;
+			if (j > 0 && p->start != p[-1].start + zone_size) {
+				log_info("%s: invalid zone data\n",
+					 f->file_name);
+				ret = -EINVAL;
+				goto close;
+			}
+		}
+		z--;
+		start_sector = z->start + z->len;
+		if (j >= nr_zones)
+			break;
+		ret = read_zone_info(fd, start_sector, buf, bufsz);
+		if (ret < 0) {
+			log_info("fio: BLKREPORTZONE(%lu) failed for %s (%d).\n",
+				 start_sector, f->file_name, -ret);
+			goto close;
+		}
+	}
+	/* a sentinel */
+	zbd_info->zone_info[nr_zones].start = start_sector;
+
+	f->zbd_info = zbd_info;
+	f->zbd_info->zone_size = zone_size;
+	f->zbd_info->zone_size_log2 = is_power_of_2(zone_size) ?
+		ilog2(zone_size) + 9 : -1;
+	f->zbd_info->nr_zones = nr_zones;
+	zbd_info = NULL;
+	ret = 0;
+
+close:
+	sfree(zbd_info);
+	close(fd);
+free:
+	free(buf);
+out:
+	pthread_mutexattr_destroy(&attr);
+	return ret;
+}
+
+/*
+ * Allocate zone information and store it into f->zbd_info if zonemode=zbd.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+int zbd_create_zone_info(struct thread_data *td, struct fio_file *f)
+{
+	enum blk_zoned_model zbd_model;
+	int ret = 0;
+
+	assert(td->o.zone_mode == ZONE_MODE_ZBD);
+
+	zbd_model = get_zbd_model(f->file_name);
+	switch (zbd_model) {
+	case ZBD_DM_HOST_AWARE:
+	case ZBD_DM_HOST_MANAGED:
+		ret = parse_zone_info(td, f);
+		break;
+	case ZBD_DM_NONE:
+		ret = init_zone_info(td, f);
+		break;
+	}
+	if (ret == 0)
+		f->zbd_info->model = zbd_model;
+	return ret;
+}
+
+void zbd_free_zone_info(struct fio_file *f)
+{
+	uint32_t refcount;
+
+	if (!f->zbd_info)
+		return;
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	refcount = --f->zbd_info->refcount;
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+
+	assert((int32_t)refcount >= 0);
+	if (refcount == 0)
+		sfree(f->zbd_info);
+	f->zbd_info = NULL;
+}
+
+/*
+ * Initialize f->zbd_info.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ *
+ * Note: this function can only work correctly if it is called before the first
+ * fio fork() call.
+ */
+static int zbd_init_zone_info(struct thread_data *td, struct fio_file *file)
+{
+	struct thread_data *td2;
+	struct fio_file *f2;
+	int i, j, ret;
+
+	for_each_td(td2, i) {
+		for_each_file(td2, f2, j) {
+			if (td2 == td && f2 == file)
+				continue;
+			if (!f2->zbd_info ||
+			    strcmp(f2->file_name, file->file_name) != 0)
+				continue;
+			file->zbd_info = f2->zbd_info;
+			file->zbd_info->refcount++;
+			return 0;
+		}
+	}
+
+	ret = zbd_create_zone_info(td, file);
+	if (ret < 0)
+		td_verror(td, -ret, "BLKREPORTZONE failed");
+	return ret;
+}
+
+int zbd_init(struct thread_data *td)
+{
+	struct fio_file *f;
+	int i;
+
+	for_each_file(td, f, i) {
+		if (f->filetype != FIO_TYPE_BLOCK)
+			continue;
+		if (td->o.zone_size && td->o.zone_size < 512) {
+			log_err("%s: zone size must be at least 512 bytes for --zonemode=zbd\n\n",
+				f->file_name);
+			return 1;
+		}
+		if (td->o.zone_size == 0 &&
+		    get_zbd_model(f->file_name) == ZBD_DM_NONE) {
+			log_err("%s: Specifying the zone size is mandatory for regular block devices with --zonemode=zbd\n\n",
+				f->file_name);
+			return 1;
+		}
+		zbd_init_zone_info(td, f);
+	}
+
+	if (!zbd_using_direct_io()) {
+		log_err("Using direct I/O is mandatory for writing to ZBD drives\n\n");
+		return 1;
+	}
+
+	if (!zbd_verify_sizes())
+		return 1;
+
+	if (!zbd_verify_bs())
+		return 1;
+
+	return 0;
+}
+
+/**
+ * zbd_reset_range - reset zones for a range of sectors
+ * @td: FIO thread data.
+ * @f: Fio file for which to reset zones
+ * @sector: Starting sector in units of 512 bytes
+ * @nr_sectors: Number of sectors in units of 512 bytes
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+static int zbd_reset_range(struct thread_data *td, const struct fio_file *f,
+			   uint64_t sector, uint64_t nr_sectors)
+{
+	struct blk_zone_range zr = {
+		.sector         = sector,
+		.nr_sectors     = nr_sectors,
+	};
+	uint32_t zone_idx_b, zone_idx_e;
+	struct fio_zone_info *zb, *ze, *z;
+	int ret = 0;
+
+	assert(f->fd != -1);
+	assert(is_valid_offset(f, ((sector + nr_sectors) << 9) - 1));
+	switch (f->zbd_info->model) {
+	case ZBD_DM_HOST_AWARE:
+	case ZBD_DM_HOST_MANAGED:
+		ret = ioctl(f->fd, BLKRESETZONE, &zr);
+		if (ret < 0) {
+			td_verror(td, errno, "resetting wp failed");
+			log_err("%s: resetting wp for %llu sectors at sector %llu failed (%d).\n",
+				f->file_name, zr.nr_sectors, zr.sector, errno);
+			return ret;
+		}
+		break;
+	case ZBD_DM_NONE:
+		break;
+	}
+
+	zone_idx_b = zbd_zone_idx(f, sector << 9);
+	zb = &f->zbd_info->zone_info[zone_idx_b];
+	zone_idx_e = zbd_zone_idx(f, (sector + nr_sectors) << 9);
+	ze = &f->zbd_info->zone_info[zone_idx_e];
+	for (z = zb; z < ze; z++) {
+		pthread_mutex_lock(&z->mutex);
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		f->zbd_info->sectors_with_data -= z->wp - z->start;
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		z->wp = z->start;
+		z->verify_block = 0;
+		pthread_mutex_unlock(&z->mutex);
+	}
+
+	td->ts.nr_zone_resets += ze - zb;
+
+	return ret;
+}
+
+/**
+ * zbd_reset_zone - reset the write pointer of a single zone
+ * @td: FIO thread data.
+ * @f: FIO file associated with the disk for which to reset a write pointer.
+ * @z: Zone to reset.
+ *
+ * Returns 0 upon success and a negative error code upon failure.
+ */
+static int zbd_reset_zone(struct thread_data *td, const struct fio_file *f,
+			  struct fio_zone_info *z)
+{
+	int ret;
+
+	dprint(FD_ZBD, "%s: resetting wp of zone %lu.\n", f->file_name,
+	       z - f->zbd_info->zone_info);
+	ret = zbd_reset_range(td, f, z->start, (z+1)->start - z->start);
+	return ret;
+}
+
+/*
+ * Reset a range of zones. Returns 0 upon success and 1 upon failure.
+ * @td: fio thread data.
+ * @f: fio file for which to reset zones
+ * @zb: first zone to reset.
+ * @ze: first zone not to reset.
+ * @all_zones: whether to reset all zones or only those zones for which the
+ *	write pointer is not a multiple of td->o.min_bs[DDIR_WRITE].
+ */
+static int zbd_reset_zones(struct thread_data *td, struct fio_file *f,
+			   struct fio_zone_info *const zb,
+			   struct fio_zone_info *const ze, bool all_zones)
+{
+	struct fio_zone_info *z, *start_z = ze;
+	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE] >> 9;
+	bool reset_wp;
+	int res = 0;
+
+	dprint(FD_ZBD, "%s: examining zones %lu .. %lu\n", f->file_name,
+	       zb - f->zbd_info->zone_info, ze - f->zbd_info->zone_info);
+	assert(f->fd != -1);
+	for (z = zb; z < ze; z++) {
+		pthread_mutex_lock(&z->mutex);
+		switch (z->type) {
+		case BLK_ZONE_TYPE_SEQWRITE_REQ:
+			reset_wp = all_zones ? z->wp != z->start :
+					(td->o.td_ddir & TD_DDIR_WRITE) &&
+					z->wp % min_bs != 0;
+			if (start_z == ze && reset_wp) {
+				start_z = z;
+			} else if (start_z < ze && !reset_wp) {
+				dprint(FD_ZBD,
+				       "%s: resetting zones %lu .. %lu\n",
+				       f->file_name,
+				       start_z - f->zbd_info->zone_info,
+				       z - f->zbd_info->zone_info);
+				if (zbd_reset_range(td, f, start_z->start,
+						z->start - start_z->start) < 0)
+					res = 1;
+				start_z = ze;
+			}
+			break;
+		default:
+			if (start_z == ze)
+				break;
+			dprint(FD_ZBD, "%s: resetting zones %lu .. %lu\n",
+			       f->file_name, start_z - f->zbd_info->zone_info,
+			       z - f->zbd_info->zone_info);
+			if (zbd_reset_range(td, f, start_z->start,
+					    z->start - start_z->start) < 0)
+				res = 1;
+			start_z = ze;
+			break;
+		}
+	}
+	if (start_z < ze) {
+		dprint(FD_ZBD, "%s: resetting zones %lu .. %lu\n", f->file_name,
+		       start_z - f->zbd_info->zone_info,
+		       z - f->zbd_info->zone_info);
+		if (zbd_reset_range(td, f, start_z->start,
+				    z->start - start_z->start) < 0)
+			res = 1;
+	}
+	for (z = zb; z < ze; z++)
+		pthread_mutex_unlock(&z->mutex);
+
+	return res;
+}
+
+/*
+ * Reset zbd_info.write_cnt, the counter that counts down towards the next
+ * zone reset.
+ */
+static void zbd_reset_write_cnt(const struct thread_data *td,
+				const struct fio_file *f)
+{
+	assert(0 <= td->o.zrf.u.f && td->o.zrf.u.f <= 1);
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	f->zbd_info->write_cnt = td->o.zrf.u.f ?
+		min(1.0 / td->o.zrf.u.f, 0.0 + UINT_MAX) : UINT_MAX;
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+}
+
+static bool zbd_dec_and_reset_write_cnt(const struct thread_data *td,
+					const struct fio_file *f)
+{
+	uint32_t write_cnt = 0;
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	assert(f->zbd_info->write_cnt);
+	if (f->zbd_info->write_cnt)
+		write_cnt = --f->zbd_info->write_cnt;
+	if (write_cnt == 0)
+		zbd_reset_write_cnt(td, f);
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+
+	return write_cnt == 0;
+}
+
+/* Check whether the value of zbd_info.sectors_with_data is correct. */
+static void check_swd(const struct thread_data *td, const struct fio_file *f)
+{
+#if 0
+	struct fio_zone_info *zb, *ze, *z;
+	uint64_t swd;
+
+	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
+	ze = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset +
+						  f->io_size)];
+	swd = 0;
+	for (z = zb; z < ze; z++) {
+		pthread_mutex_lock(&z->mutex);
+		swd += z->wp - z->start;
+	}
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	assert(f->zbd_info->sectors_with_data == swd);
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	for (z = zb; z < ze; z++)
+		pthread_mutex_unlock(&z->mutex);
+#endif
+}
+
+void zbd_file_reset(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_zone_info *zb, *ze, *z;
+	uint32_t zone_idx_e;
+	uint64_t swd = 0;
+
+	if (!f->zbd_info)
+		return;
+
+	zb = &f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
+	zone_idx_e = zbd_zone_idx(f, f->file_offset + f->io_size);
+	ze = &f->zbd_info->zone_info[zone_idx_e];
+	for (z = zb ; z < ze; z++) {
+		pthread_mutex_lock(&z->mutex);
+		swd += z->wp - z->start;
+	}
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	f->zbd_info->sectors_with_data = swd;
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	for (z = zb ; z < ze; z++)
+		pthread_mutex_unlock(&z->mutex);
+	dprint(FD_ZBD, "%s(%s): swd = %ld\n", __func__, f->file_name, swd);
+	/*
+	 * If data verification is enabled reset the affected zones before
+	 * writing any data to avoid that a zone reset has to be issued while
+	 * writing data, which causes data loss.
+	 */
+	zbd_reset_zones(td, f, zb, ze, td->o.verify != VERIFY_NONE &&
+			(td->o.td_ddir & TD_DDIR_WRITE) &&
+			td->runstate != TD_VERIFYING);
+	zbd_reset_write_cnt(td, f);
+}
+
+/* The caller must hold f->zbd_info->mutex. */
+static bool is_zone_open(const struct thread_data *td, const struct fio_file *f,
+			 unsigned int zone_idx)
+{
+	struct zoned_block_device_info *zbdi = f->zbd_info;
+	int i;
+
+	assert(td->o.max_open_zones <= ARRAY_SIZE(zbdi->open_zones));
+	assert(zbdi->num_open_zones <= td->o.max_open_zones);
+
+	for (i = 0; i < zbdi->num_open_zones; i++)
+		if (zbdi->open_zones[i] == zone_idx)
+			return true;
+
+	return false;
+}
+
+/*
+ * Open a ZBD zone if it was not yet open. Returns true if either the zone was
+ * already open or if opening a new zone is allowed. Returns false if the zone
+ * was not yet open and opening a new zone would cause the zone limit to be
+ * exceeded.
+ */
+static bool zbd_open_zone(struct thread_data *td, const struct io_u *io_u,
+			  uint32_t zone_idx)
+{
+	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+	const struct fio_file *f = io_u->file;
+	struct fio_zone_info *z = &f->zbd_info->zone_info[zone_idx];
+	bool res = true;
+
+	if (z->cond == BLK_ZONE_COND_OFFLINE)
+		return false;
+
+	/*
+	 * Skip full zones with data verification enabled because resetting a
+	 * zone causes data loss and hence causes verification to fail.
+	 */
+	if (td->o.verify != VERIFY_NONE && zbd_zone_full(f, z, min_bs))
+		return false;
+
+	/* Zero means no limit */
+	if (!td->o.max_open_zones)
+		return true;
+
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	if (is_zone_open(td, f, zone_idx))
+		goto out;
+	res = false;
+	if (f->zbd_info->num_open_zones >= td->o.max_open_zones)
+		goto out;
+	dprint(FD_ZBD, "%s: opening zone %d\n", f->file_name, zone_idx);
+	f->zbd_info->open_zones[f->zbd_info->num_open_zones++] = zone_idx;
+	z->open = 1;
+	res = true;
+
+out:
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	return res;
+}
+
+/* The caller must hold f->zbd_info->mutex */
+static void zbd_close_zone(struct thread_data *td, const struct fio_file *f,
+			   unsigned int open_zone_idx)
+{
+	uint32_t zone_idx;
+
+	assert(open_zone_idx < f->zbd_info->num_open_zones);
+	zone_idx = f->zbd_info->open_zones[open_zone_idx];
+	memmove(f->zbd_info->open_zones + open_zone_idx,
+		f->zbd_info->open_zones + open_zone_idx + 1,
+		(FIO_MAX_OPEN_ZBD_ZONES - (open_zone_idx + 1)) *
+		sizeof(f->zbd_info->open_zones[0]));
+	f->zbd_info->num_open_zones--;
+	f->zbd_info->zone_info[zone_idx].open = 0;
+}
+
+/*
+ * Modify the offset of an I/O unit that does not refer to an open zone such
+ * that it refers to an open zone. Close an open zone and open a new zone if
+ * necessary. This algorithm can only work correctly if all write pointers are
+ * a multiple of the fio block size. The caller must neither hold z->mutex
+ * nor f->zbd_info->mutex. Returns with z->mutex held upon success.
+ */
+struct fio_zone_info *zbd_convert_to_open_zone(struct thread_data *td,
+					       struct io_u *io_u)
+{
+	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
+	const struct fio_file *f = io_u->file;
+	struct fio_zone_info *z;
+	unsigned int open_zone_idx = -1;
+	uint32_t zone_idx, new_zone_idx;
+	int i;
+
+	assert(is_valid_offset(f, io_u->offset));
+
+	if (td->o.max_open_zones) {
+		/*
+		 * This statement accesses f->zbd_info->open_zones[] on purpose
+		 * without locking.
+		 */
+		zone_idx = f->zbd_info->open_zones[(io_u->offset -
+						    f->file_offset) *
+				f->zbd_info->num_open_zones / f->io_size];
+	} else {
+		zone_idx = zbd_zone_idx(f, io_u->offset);
+	}
+	dprint(FD_ZBD, "%s(%s): starting from zone %d (offset %lld, buflen %lld)\n",
+	       __func__, f->file_name, zone_idx, io_u->offset, io_u->buflen);
+
+	/*
+	 * Since z->mutex is the outer lock and f->zbd_info->mutex the inner
+	 * lock it can happen that the state of the zone with index zone_idx
+	 * has changed after 'z' has been assigned and before f->zbd_info->mutex
+	 * has been obtained. Hence the loop.
+	 */
+	for (;;) {
+		z = &f->zbd_info->zone_info[zone_idx];
+
+		pthread_mutex_lock(&z->mutex);
+		pthread_mutex_lock(&f->zbd_info->mutex);
+		if (td->o.max_open_zones == 0)
+			goto examine_zone;
+		if (f->zbd_info->num_open_zones == 0) {
+			pthread_mutex_unlock(&f->zbd_info->mutex);
+			pthread_mutex_unlock(&z->mutex);
+			dprint(FD_ZBD, "%s(%s): no zones are open\n",
+			       __func__, f->file_name);
+			return NULL;
+		}
+		open_zone_idx = (io_u->offset - f->file_offset) *
+			f->zbd_info->num_open_zones / f->io_size;
+		assert(open_zone_idx < f->zbd_info->num_open_zones);
+		new_zone_idx = f->zbd_info->open_zones[open_zone_idx];
+		if (new_zone_idx == zone_idx)
+			break;
+		zone_idx = new_zone_idx;
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&z->mutex);
+	}
+
+	/* Both z->mutex and f->zbd_info->mutex are held. */
+
+examine_zone:
+	if ((z->wp << 9) + min_bs <= ((z+1)->start << 9)) {
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		goto out;
+	}
+	dprint(FD_ZBD, "%s(%s): closing zone %d\n", __func__, f->file_name,
+	       zone_idx);
+	if (td->o.max_open_zones)
+		zbd_close_zone(td, f, open_zone_idx);
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+
+	/* Only z->mutex is held. */
+
+	/* Zone 'z' is full, so try to open a new zone. */
+	for (i = f->io_size / f->zbd_info->zone_size; i > 0; i--) {
+		zone_idx++;
+		pthread_mutex_unlock(&z->mutex);
+		z++;
+		if (!is_valid_offset(f, z->start << 9)) {
+			/* Wrap-around. */
+			zone_idx = zbd_zone_idx(f, f->file_offset);
+			z = &f->zbd_info->zone_info[zone_idx];
+		}
+		assert(is_valid_offset(f, z->start << 9));
+		pthread_mutex_lock(&z->mutex);
+		if (z->open)
+			continue;
+		if (zbd_open_zone(td, io_u, zone_idx))
+			goto out;
+	}
+
+	/* Only z->mutex is held. */
+
+	/* Check whether the write fits in any of the already opened zones. */
+	pthread_mutex_lock(&f->zbd_info->mutex);
+	for (i = 0; i < f->zbd_info->num_open_zones; i++) {
+		zone_idx = f->zbd_info->open_zones[i];
+		pthread_mutex_unlock(&f->zbd_info->mutex);
+		pthread_mutex_unlock(&z->mutex);
+
+		z = &f->zbd_info->zone_info[zone_idx];
+
+		pthread_mutex_lock(&z->mutex);
+		if ((z->wp << 9) + min_bs <= ((z+1)->start << 9))
+			goto out;
+		pthread_mutex_lock(&f->zbd_info->mutex);
+	}
+	pthread_mutex_unlock(&f->zbd_info->mutex);
+	pthread_mutex_unlock(&z->mutex);
+	dprint(FD_ZBD, "%s(%s): did not open another zone\n", __func__,
+	       f->file_name);
+	return NULL;
+
+out:
+	dprint(FD_ZBD, "%s(%s): returning zone %d\n", __func__, f->file_name,
+	       zone_idx);
+	io_u->offset = z->start << 9;
+	return z;
+}
+
+/* The caller must hold z->mutex. */
+static struct fio_zone_info *zbd_replay_write_order(struct thread_data *td,
+						    struct io_u *io_u,
+						    struct fio_zone_info *z)
+{
+	const struct fio_file *f = io_u->file;
+	const uint32_t min_bs = td->o.min_bs[DDIR_WRITE];
+
+	if (!zbd_open_zone(td, io_u, z - f->zbd_info->zone_info)) {
+		pthread_mutex_unlock(&z->mutex);
+		z = zbd_convert_to_open_zone(td, io_u);
+		assert(z);
+	}
+
+	if (z->verify_block * min_bs >= f->zbd_info->zone_size)
+		log_err("%s: %d * %d >= %ld\n", f->file_name, z->verify_block,
+			min_bs, f->zbd_info->zone_size);
+	io_u->offset = (z->start << 9) + z->verify_block++ * min_bs;
+	return z;
+}
+
+/*
+ * Find another zone for which @io_u fits below the write pointer. Start
+ * searching in zones @zb + 1 .. @zl and continue searching in zones
+ * @zf .. @zb - 1.
+ *
+ * Either returns NULL or returns a zone pointer and holds the mutex for that
+ * zone.
+ */
+static struct fio_zone_info *
+zbd_find_zone(struct thread_data *td, struct io_u *io_u,
+	      struct fio_zone_info *zb, struct fio_zone_info *zl)
+{
+	const uint32_t min_bs = td->o.min_bs[io_u->ddir];
+	const struct fio_file *f = io_u->file;
+	struct fio_zone_info *z1, *z2;
+	const struct fio_zone_info *const zf =
+		&f->zbd_info->zone_info[zbd_zone_idx(f, f->file_offset)];
+
+	/*
+	 * Skip to the next non-empty zone in case of sequential I/O and to
+	 * the nearest non-empty zone in case of random I/O.
+	 */
+	for (z1 = zb + 1, z2 = zb - 1; z1 < zl || z2 >= zf; z1++, z2--) {
+		if (z1 < zl && z1->cond != BLK_ZONE_COND_OFFLINE) {
+			pthread_mutex_lock(&z1->mutex);
+			if (z1->start + (min_bs >> 9) <= z1->wp)
+				return z1;
+			pthread_mutex_unlock(&z1->mutex);
+		} else if (!td_random(td)) {
+			break;
+		}
+		if (td_random(td) && z2 >= zf &&
+		    z2->cond != BLK_ZONE_COND_OFFLINE) {
+			pthread_mutex_lock(&z2->mutex);
+			if (z2->start + (min_bs >> 9) <= z2->wp)
+				return z2;
+			pthread_mutex_unlock(&z2->mutex);
+		}
+	}
+	dprint(FD_ZBD, "%s: adjusting random read offset failed\n",
+	       f->file_name);
+	return NULL;
+}
+
+
+/**
+ * zbd_post_submit - update the write pointer and unlock the zone lock
+ * @io_u: I/O unit
+ * @success: Whether or not the I/O unit has been executed successfully
+ *
+ * For write and trim operations, update the write pointer of all affected
+ * zones.
+ */
+static void zbd_post_submit(const struct io_u *io_u, bool success)
+{
+	struct zoned_block_device_info *zbd_info;
+	struct fio_zone_info *z;
+	uint32_t zone_idx;
+	uint64_t end, zone_end;
+
+	zbd_info = io_u->file->zbd_info;
+	if (!zbd_info)
+		return;
+
+	zone_idx = zbd_zone_idx(io_u->file, io_u->offset);
+	end = (io_u->offset + io_u->buflen) >> 9;
+	z = &zbd_info->zone_info[zone_idx];
+	assert(zone_idx < zbd_info->nr_zones);
+	if (z->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
+		return;
+	if (!success)
+		goto unlock;
+	switch (io_u->ddir) {
+	case DDIR_WRITE:
+		zone_end = min(end, (z + 1)->start);
+		pthread_mutex_lock(&zbd_info->mutex);
+		/*
+		 * z->wp > zone_end means that one or more I/O errors
+		 * have occurred.
+		 */
+		if (z->wp <= zone_end)
+			zbd_info->sectors_with_data += zone_end - z->wp;
+		pthread_mutex_unlock(&zbd_info->mutex);
+		z->wp = zone_end;
+		break;
+	case DDIR_TRIM:
+		assert(z->wp == z->start);
+		break;
+	default:
+		break;
+	}
+unlock:
+	pthread_mutex_unlock(&z->mutex);
+}
+
+bool zbd_unaligned_write(int error_code)
+{
+	switch (error_code) {
+	case EIO:
+	case EREMOTEIO:
+		return true;
+	}
+	return false;
+}
+
+/**
+ * zbd_adjust_block - adjust the offset and length as necessary for ZBD drives
+ * @td: FIO thread data.
+ * @io_u: FIO I/O unit.
+ *
+ * Locking strategy: returns with z->mutex locked if and only if z refers
+ * to a sequential zone and if io_u_accept is returned. z is the zone that
+ * corresponds to io_u->offset at the end of this function.
+ */
+enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u)
+{
+	const struct fio_file *f = io_u->file;
+	uint32_t zone_idx_b;
+	struct fio_zone_info *zb, *zl;
+	uint32_t orig_len = io_u->buflen;
+	uint32_t min_bs = td->o.min_bs[io_u->ddir];
+	uint64_t new_len;
+	int64_t range;
+
+	if (!f->zbd_info)
+		return io_u_accept;
+
+	assert(is_valid_offset(f, io_u->offset));
+	assert(io_u->buflen);
+	zone_idx_b = zbd_zone_idx(f, io_u->offset);
+	zb = &f->zbd_info->zone_info[zone_idx_b];
+
+	/* Accept the I/O offset for conventional zones. */
+	if (zb->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return io_u_accept;
+
+	/*
+	 * Accept the I/O offset for reads if reading beyond the write pointer
+	 * is enabled.
+	 */
+	if (zb->cond != BLK_ZONE_COND_OFFLINE &&
+	    io_u->ddir == DDIR_READ && td->o.read_beyond_wp)
+		return io_u_accept;
+
+	pthread_mutex_lock(&zb->mutex);
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		if (td->runstate == TD_VERIFYING) {
+			zb = zbd_replay_write_order(td, io_u, zb);
+			goto accept;
+		}
+		/*
+		 * Avoid reads past the write pointer because such reads do not
+		 * hit the medium.
+		 */
+		range = zb->cond != BLK_ZONE_COND_OFFLINE ?
+			((zb->wp - zb->start) << 9) - io_u->buflen : 0;
+		if (td_random(td) && range >= 0) {
+			io_u->offset = (zb->start << 9) +
+				((io_u->offset - (zb->start << 9)) %
+				 (range + 1)) / min_bs * min_bs;
+			assert(zb->start << 9 <= io_u->offset);
+			assert(io_u->offset + io_u->buflen <= zb->wp << 9);
+			goto accept;
+		}
+		if (zb->cond == BLK_ZONE_COND_OFFLINE ||
+		    (io_u->offset + io_u->buflen) >> 9 > zb->wp) {
+			pthread_mutex_unlock(&zb->mutex);
+			zl = &f->zbd_info->zone_info[zbd_zone_idx(f,
+						f->file_offset + f->io_size)];
+			zb = zbd_find_zone(td, io_u, zb, zl);
+			if (!zb) {
+				dprint(FD_ZBD,
+				       "%s: zbd_find_zone(%lld, %llu) failed\n",
+				       f->file_name, io_u->offset,
+				       io_u->buflen);
+				goto eof;
+			}
+			io_u->offset = zb->start << 9;
+		}
+		if ((io_u->offset + io_u->buflen) >> 9 > zb->wp) {
+			dprint(FD_ZBD, "%s: %lld + %lld > %" PRIu64 "\n",
+			       f->file_name, io_u->offset, io_u->buflen,
+			       zb->wp);
+			goto eof;
+		}
+		goto accept;
+	case DDIR_WRITE:
+		if (io_u->buflen > (f->zbd_info->zone_size << 9))
+			goto eof;
+		if (!zbd_open_zone(td, io_u, zone_idx_b)) {
+			pthread_mutex_unlock(&zb->mutex);
+			zb = zbd_convert_to_open_zone(td, io_u);
+			if (!zb)
+				goto eof;
+			zone_idx_b = zb - f->zbd_info->zone_info;
+		}
+		/* Check whether the zone reset threshold has been exceeded */
+		if (td->o.zrf.u.f) {
+			check_swd(td, f);
+			if ((f->zbd_info->sectors_with_data << 9) >=
+			    f->io_size * td->o.zrt.u.f &&
+			    zbd_dec_and_reset_write_cnt(td, f)) {
+				zb->reset_zone = 1;
+			}
+		}
+		/* Reset the zone pointer if necessary */
+		if (zb->reset_zone || zbd_zone_full(f, zb, min_bs)) {
+			assert(td->o.verify == VERIFY_NONE);
+			/*
+			 * Since previous write requests may have been submitted
+			 * asynchronously and since we will submit the zone
+			 * reset synchronously, wait until previously submitted
+			 * write requests have completed before issuing a
+			 * zone reset.
+			 */
+			io_u_quiesce(td);
+			zb->reset_zone = 0;
+			if (zbd_reset_zone(td, f, zb) < 0)
+				goto eof;
+			check_swd(td, f);
+		}
+		/* Make writes occur at the write pointer */
+		assert(!zbd_zone_full(f, zb, min_bs));
+		io_u->offset = zb->wp << 9;
+		if (!is_valid_offset(f, io_u->offset)) {
+			dprint(FD_ZBD, "Dropped request with offset %llu\n",
+			       io_u->offset);
+			goto eof;
+		}
+		/*
+		 * Make sure that the buflen is a multiple of the minimal
+		 * block size. Give up if shrinking would make the request too
+		 * small.
+		 */
+		new_len = min((unsigned long long)io_u->buflen,
+			      ((zb + 1)->start << 9) - io_u->offset);
+		new_len = new_len / min_bs * min_bs;
+		if (new_len == io_u->buflen)
+			goto accept;
+		if (new_len >= min_bs) {
+			io_u->buflen = new_len;
+			dprint(FD_IO, "Changed length from %u into %llu\n",
+			       orig_len, io_u->buflen);
+			goto accept;
+		}
+		log_err("Zone remainder %lld smaller than minimum block size %d\n",
+			(((zb + 1)->start << 9) - io_u->offset),
+			min_bs);
+		goto eof;
+	case DDIR_TRIM:
+		/* fall-through */
+	case DDIR_SYNC:
+	case DDIR_DATASYNC:
+	case DDIR_SYNC_FILE_RANGE:
+	case DDIR_WAIT:
+	case DDIR_LAST:
+	case DDIR_INVAL:
+		goto accept;
+	}
+
+	assert(false);
+
+accept:
+	assert(zb);
+	assert(zb->cond != BLK_ZONE_COND_OFFLINE);
+	assert(!io_u->post_submit);
+	io_u->post_submit = zbd_post_submit;
+	return io_u_accept;
+
+eof:
+	if (zb)
+		pthread_mutex_unlock(&zb->mutex);
+	return io_u_eof;
+}
+
+/* Return a string with ZBD statistics */
+char *zbd_write_status(const struct thread_stat *ts)
+{
+	char *res;
+
+	if (asprintf(&res, "; %ld zone resets", ts->nr_zone_resets) < 0)
+		return NULL;
+	return res;
+}
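The length and offset arithmetic in zbd_adjust_block() above is hard to follow because sector/byte shifts are mixed in. The standalone sketch below restates the two key computations in bytes only. It is a simplification, not fio code. The helper names trim_write_len() and fold_read_offset() are hypothetical. Locking and zone conditions are ignored, and offsets are assumed to lie inside the zone (zone_start <= offset <= zone_end).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Write path (hypothetical helper): clamp a write issued at the write
 * pointer so it neither crosses the zone end nor has a length that is not
 * a multiple of min_bs. zone_end is in bytes. Returns false if the zone
 * remainder is smaller than min_bs; zbd_adjust_block() returns io_u_eof
 * in that case. *buflen is only modified on success.
 */
static bool trim_write_len(uint64_t offset, uint64_t zone_end,
			   uint64_t *buflen, uint32_t min_bs)
{
	uint64_t new_len = *buflen;

	if (zone_end - offset < new_len)
		new_len = zone_end - offset;	/* do not cross the zone end */
	new_len = new_len / min_bs * min_bs;	/* round down to min_bs */
	if (new_len < min_bs)
		return false;
	*buflen = new_len;
	return true;
}

/*
 * Random read path (hypothetical helper): fold an offset into
 * [zone_start, wp - buflen], aligned down to min_bs, so that the read
 * stays below the write pointer. All values are in bytes. Returns false
 * when the zone holds fewer than buflen bytes of data.
 */
static bool fold_read_offset(uint64_t *offset, uint64_t zone_start,
			     uint64_t wp, uint64_t buflen, uint32_t min_bs)
{
	int64_t range = (int64_t)(wp - zone_start) - (int64_t)buflen;

	if (range < 0)
		return false;
	*offset = zone_start +
		((*offset - zone_start) % (uint64_t)(range + 1)) /
		min_bs * min_bs;
	return true;
}
```

For example, with a 16 KiB zone that holds 16 KiB of data, fold_read_offset() maps a 4 KiB random read at offset 10000 to offset 8192, so offset + buflen stays at or below the write pointer.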
diff --git a/zbd.h b/zbd.h
new file mode 100644
index 0000000..08751fd
--- /dev/null
+++ b/zbd.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2018 Western Digital Corporation or its affiliates.
+ *
+ * This file is released under the GPL.
+ */
+
+#ifndef FIO_ZBD_H
+#define FIO_ZBD_H
+
+#include <inttypes.h>
+#include "fio.h"	/* FIO_MAX_OPEN_ZBD_ZONES */
+#ifdef CONFIG_LINUX_BLKZONED
+#include <linux/blkzoned.h>
+#endif
+
+struct fio_file;
+
+/*
+ * Zoned block device models.
+ */
+enum blk_zoned_model {
+	ZBD_DM_NONE,	/* Regular block device */
+	ZBD_DM_HOST_AWARE,	/* Host-aware zoned block device */
+	ZBD_DM_HOST_MANAGED,	/* Host-managed zoned block device */
+};
+
+enum io_u_action {
+	io_u_accept	= 0,
+	io_u_eof	= 1,
+};
+
+/**
+ * struct fio_zone_info - information about a single ZBD zone
+ * @start: zone start in 512 byte units
+ * @wp: zone write pointer location in 512 byte units
+ * @verify_block: number of blocks that have been verified for this zone
+ * @mutex: protects the modifiable members in this structure
+ * @type: zone type (BLK_ZONE_TYPE_*)
+ * @cond: zone state (BLK_ZONE_COND_*)
+ * @open: whether or not this zone is currently open. Only relevant if
+ *		max_open_zones > 0.
+ * @reset_zone: whether or not this zone should be reset before writing to it
+ */
+struct fio_zone_info {
+#ifdef CONFIG_LINUX_BLKZONED
+	pthread_mutex_t		mutex;
+	uint64_t		start;
+	uint64_t		wp;
+	uint32_t		verify_block;
+	enum blk_zone_type	type:2;
+	enum blk_zone_cond	cond:4;
+	unsigned int		open:1;
+	unsigned int		reset_zone:1;
+#endif
+};
+
+/**
+ * zoned_block_device_info - zoned block device characteristics
+ * @model: Device model.
+ * @mutex: Protects the modifiable members in this structure (refcount and
+ *		num_open_zones).
+ * @zone_size: size of a single zone in units of 512 bytes
+ * @sectors_with_data: total size of data in all zones in units of 512 bytes
+ * @zone_size_log2: log2 of the zone size in bytes if it is a power of 2 or 0
+ *		if the zone size is not a power of 2.
+ * @nr_zones: number of zones
+ * @refcount: number of fio files that share this structure
+ * @num_open_zones: number of open zones
+ * @write_cnt: Number of writes since the latest zone reset triggered by
+ *	       the zone_reset_frequency fio job parameter.
+ * @open_zones: zone numbers of open zones
+ * @zone_info: description of the individual zones
+ *
+ * Only devices for which all zones have the same size are supported.
+ * Note: if the capacity is not a multiple of the zone size then the last zone
+ * will be smaller than 'zone_size'.
+ */
+struct zoned_block_device_info {
+	enum blk_zoned_model	model;
+	pthread_mutex_t		mutex;
+	uint64_t		zone_size;
+	uint64_t		sectors_with_data;
+	uint32_t		zone_size_log2;
+	uint32_t		nr_zones;
+	uint32_t		refcount;
+	uint32_t		num_open_zones;
+	uint32_t		write_cnt;
+	uint32_t		open_zones[FIO_MAX_OPEN_ZBD_ZONES];
+	struct fio_zone_info	zone_info[0];
+};
+
+#ifdef CONFIG_LINUX_BLKZONED
+void zbd_free_zone_info(struct fio_file *f);
+int zbd_init(struct thread_data *td);
+void zbd_file_reset(struct thread_data *td, struct fio_file *f);
+bool zbd_unaligned_write(int error_code);
+enum io_u_action zbd_adjust_block(struct thread_data *td, struct io_u *io_u);
+int zbd_do_trim(struct thread_data *td, const struct io_u *io_u);
+void zbd_update_wp(struct thread_data *td, const struct io_u *io_u);
+char *zbd_write_status(const struct thread_stat *ts);
+#else
+static inline void zbd_free_zone_info(struct fio_file *f)
+{
+}
+
+static inline int zbd_init(struct thread_data *td)
+{
+	return 0;
+}
+
+static inline void zbd_file_reset(struct thread_data *td, struct fio_file *f)
+{
+}
+
+static inline bool zbd_unaligned_write(int error_code)
+{
+	return false;
+}
+
+static inline enum io_u_action zbd_adjust_block(struct thread_data *td,
+						struct io_u *io_u)
+{
+	return io_u_accept;
+}
+
+static inline int zbd_do_trim(struct thread_data *td, const struct io_u *io_u)
+{
+	return 1;
+}
+
+static inline void zbd_update_wp(struct thread_data *td,
+				 const struct io_u *io_u)
+{
+}
+
+static inline char *zbd_write_status(const struct thread_stat *ts)
+{
+	return NULL;
+}
+#endif
+
+#endif /* FIO_ZBD_H */
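zbd.h keeps the open zones as a compact array (open_zones[]) plus a count (num_open_zones). The minimal sketch below shows how zbd_open_zone(), zbd_close_zone() and the slot selection in zbd_convert_to_open_zone() use such a table. It makes several simplifying assumptions. The struct, the helper names and the MAX_OPEN value are hypothetical stand-ins; FIO_MAX_OPEN_ZBD_ZONES comes from fio.h, which this diff does not show. The mutexes that protect the real table and the per-zone `open` flag are also omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_OPEN 128	/* hypothetical stand-in for FIO_MAX_OPEN_ZBD_ZONES */

struct open_zone_table {
	uint32_t zones[MAX_OPEN];	/* zone indices, in opening order */
	uint32_t num;			/* number of valid entries */
	uint32_t limit;			/* max_open_zones; must be <= MAX_OPEN */
};

static bool table_contains(const struct open_zone_table *t, uint32_t zone)
{
	for (uint32_t i = 0; i < t->num; i++)
		if (t->zones[i] == zone)
			return true;
	return false;
}

/* Like zbd_open_zone(): true if already open or if it could be opened. */
static bool table_open(struct open_zone_table *t, uint32_t zone)
{
	if (table_contains(t, zone))
		return true;
	if (t->num >= t->limit)
		return false;
	t->zones[t->num++] = zone;
	return true;
}

/* Like zbd_close_zone(): drop entry 'slot' (< num), keep the rest in order. */
static void table_close(struct open_zone_table *t, uint32_t slot)
{
	memmove(&t->zones[slot], &t->zones[slot + 1],
		(t->num - slot - 1) * sizeof(t->zones[0]));
	t->num--;
}

/*
 * Like the slot choice in zbd_convert_to_open_zone(): map an offset in
 * [file_offset, file_offset + io_size) proportionally onto the open
 * slots. The caller must ensure t->num > 0.
 */
static uint32_t table_pick(const struct open_zone_table *t, uint64_t offset,
			   uint64_t file_offset, uint64_t io_size)
{
	return t->zones[(offset - file_offset) * t->num / io_size];
}
```

zbd_close_zone() moves the whole remaining tail of the fixed-size array. This sketch moves only the num - slot - 1 valid entries after the closed slot, which leaves the valid entries in the same order.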



* Recent changes (master)
@ 2018-08-24 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-24 12:00 UTC
  To: fio

The following changes since commit c08890d3a81cebb5b4bde14afb6c7778bb390ddf:

  Merge branch 'axmap' of https://github.com/bvanassche/fio (2018-08-22 20:13:45 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e7ff953f87d000d1de4c928493a6f67214cfcf8f:

  t/axmap: print explicit overlap ranges tested (2018-08-23 13:58:21 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      parse: fix bssplit option
      Merge branch '20180823-terse-remote-fix' of https://github.com/mcgrof/fio
      axmap: return early of an overlap results in 0 settable bits
      t/axmap: add regression test for overlap case resulting in 0 settable bits
      t/axmap: print explicit overlap ranges tested

Luis Chamberlain (2):
      client: respect terse output on client <--> backend relationship
      init: add semantics for all types of backends running

 backend.c   |  2 ++
 client.c    | 16 +++++++++++++---
 diskutil.c  |  2 +-
 fio.h       |  6 ++++++
 init.c      |  1 +
 lib/axmap.c |  2 ++
 parse.c     |  4 +---
 stat.c      |  4 +++-
 t/axmap.c   | 15 +++++++++------
 9 files changed, 38 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 36bde6a..8fec1ce 100644
--- a/backend.c
+++ b/backend.c
@@ -2481,6 +2481,8 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	startup_sem = fio_sem_init(FIO_SEM_LOCKED);
+	if (!sk_out)
+		is_local_backend = true;
 	if (startup_sem == NULL)
 		return 1;
 
diff --git a/client.c b/client.c
index e2525c8..bc0275b 100644
--- a/client.c
+++ b/client.c
@@ -1058,6 +1058,9 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 	struct flist_head *opt_list = NULL;
 	struct json_object *tsobj;
 
+	if (output_format & FIO_OUTPUT_TERSE)
+		return;
+
 	if (client->opt_lists && p->ts.thread_number <= client->jobs)
 		opt_list = &client->opt_lists[p->ts.thread_number - 1];
 
@@ -1094,6 +1097,9 @@ static void handle_gs(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct group_run_stats *gs = (struct group_run_stats *) cmd->payload;
 
+	if (output_format & FIO_OUTPUT_TERSE)
+		return;
+
 	if (output_format & FIO_OUTPUT_NORMAL)
 		show_group_stats(gs, NULL);
 }
@@ -1140,7 +1146,7 @@ static void handle_text(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	name = client->name ? client->name : client->hostname;
 
-	if (!client->skip_newline)
+	if (!client->skip_newline && !(output_format & FIO_OUTPUT_TERSE))
 		fprintf(f_out, "<%s> ", name);
 	ret = fwrite(buf, pdu->buf_len, 1, f_out);
 	fflush(f_out);
@@ -1184,6 +1190,9 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct cmd_du_pdu *du = (struct cmd_du_pdu *) cmd->payload;
 
+	if (output_format & FIO_OUTPUT_TERSE)
+		return;
+
 	if (!client->disk_stats_shown) {
 		client->disk_stats_shown = true;
 		log_info("\nDisk stats (read/write):\n");
@@ -1195,8 +1204,6 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 		duobj = json_array_last_value_object(du_array);
 		json_object_add_client_info(duobj, client);
 	}
-	if (output_format & FIO_OUTPUT_TERSE)
-		print_disk_util(&du->dus, &du->agg, 1, NULL);
 	if (output_format & FIO_OUTPUT_NORMAL)
 		print_disk_util(&du->dus, &du->agg, 0, NULL);
 }
@@ -1456,6 +1463,9 @@ static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd)
 	const char *os, *arch;
 	char bit[16];
 
+	if (output_format & FIO_OUTPUT_TERSE)
+		return;
+
 	os = fio_get_os_string(probe->os);
 	if (!os)
 		os = "unknown";
diff --git a/diskutil.c b/diskutil.c
index 5b4eb46..7be4c02 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -701,7 +701,7 @@ void show_disk_util(int terse, struct json_object *parent,
 	struct disk_util *du;
 	bool do_json;
 
-	if (!disk_util_sem)
+	if (!is_running_backend())
 		return;
 
 	fio_sem_down(disk_util_sem);
diff --git a/fio.h b/fio.h
index b58057f..83654bb 100644
--- a/fio.h
+++ b/fio.h
@@ -522,6 +522,7 @@ extern int fio_clock_source_set;
 extern int warnings_fatal;
 extern int terse_version;
 extern int is_backend;
+extern int is_local_backend;
 extern int nr_clients;
 extern int log_syslog;
 extern int status_interval;
@@ -534,6 +535,11 @@ extern char *aux_path;
 
 extern struct thread_data *threads;
 
+static inline bool is_running_backend(void)
+{
+	return is_backend || is_local_backend;
+}
+
 extern bool eta_time_within_slack(unsigned int time);
 
 static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
diff --git a/init.c b/init.c
index 06f6971..3ed5757 100644
--- a/init.c
+++ b/init.c
@@ -63,6 +63,7 @@ char *exec_profile = NULL;
 int warnings_fatal = 0;
 int terse_version = 3;
 int is_backend = 0;
+int is_local_backend = 0;
 int nr_clients = 0;
 int log_syslog = 0;
 
diff --git a/lib/axmap.c b/lib/axmap.c
index e194e80..03e712f 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -241,6 +241,8 @@ static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 
 	if (overlap) {
 		nr_bits = ffz(~overlap) - bit;
+		if (!nr_bits)
+			return true;
 		mask = bit_masks[nr_bits] << bit;
 	}
 
diff --git a/parse.c b/parse.c
index 194ad59..952118c 100644
--- a/parse.c
+++ b/parse.c
@@ -959,6 +959,7 @@ static int handle_option(const struct fio_option *o, const char *__ptr,
 		if (ptr &&
 		    (o->type != FIO_OPT_STR_STORE) &&
 		    (o->type != FIO_OPT_STR) &&
+		    (o->type != FIO_OPT_STR_ULL) &&
 		    (o->type != FIO_OPT_FLOAT_LIST)) {
 			ptr2 = strchr(ptr, ',');
 			if (ptr2 && *(ptr2 + 1) == '\0')
@@ -1372,9 +1373,6 @@ static void option_init(struct fio_option *o)
 		o->category = FIO_OPT_C_GENERAL;
 		o->group = FIO_OPT_G_INVALID;
 	}
-	if (o->type == FIO_OPT_STR || o->type == FIO_OPT_STR_STORE ||
-	    o->type == FIO_OPT_STR_MULTI)
-		return;
 }
 
 /*
diff --git a/stat.c b/stat.c
index 82e79df..6cb704e 100644
--- a/stat.c
+++ b/stat.c
@@ -1205,7 +1205,7 @@ static void show_thread_status_terse_all(struct thread_stat *ts,
 		log_buf(out, ";%3.2f%%", io_u_lat_m[i]);
 
 	/* disk util stats, if any */
-	if (ver >= 3)
+	if (ver >= 3 && is_running_backend())
 		show_disk_util(1, NULL, out);
 
 	/* Additional output if continue_on_error set - default off*/
@@ -1922,6 +1922,8 @@ void __show_run_stats(void)
 		if (is_backend) {
 			fio_server_send_job_options(opt_lists[i], i);
 			fio_server_send_ts(ts, rs);
+			if (output_format & FIO_OUTPUT_TERSE)
+				show_thread_status_terse(ts, rs, &output[__FIO_OUTPUT_TERSE]);
 		} else {
 			if (output_format & FIO_OUTPUT_TERSE)
 				show_thread_status_terse(ts, rs, &output[__FIO_OUTPUT_TERSE]);
diff --git a/t/axmap.c b/t/axmap.c
index 1752439..a2e6fd6 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -352,6 +352,11 @@ static int test_overlap(void)
 			.ret	= 14,
 		},
 		{
+			.start	= 22670,
+			.nr	= 60,
+			.ret	= 0,
+		},
+		{
 			.start	= -1U,
 		},
 	};
@@ -366,7 +371,7 @@ static int test_overlap(void)
 			entries = this;
 	}
 
-	printf("Test overlaps...");
+	printf("Test overlaps...\n");
 	fflush(stdout);
 
 	map = axmap_new(entries);
@@ -374,18 +379,16 @@ static int test_overlap(void)
 	for (i = 0; tests[i].start != -1U; i++) {
 		struct overlap_test *t = &tests[i];
 
+		printf("\tstart=%6u, nr=%3u: ", t->start, t->nr);
 		ret = axmap_set_nr(map, t->start, t->nr);
 		if (ret != t->ret) {
-			printf("fail\n");
-			printf("start=%u, nr=%d, ret=%d: %d\n", t->start, t->nr,
-								t->ret, ret);
+			printf("%3d (FAIL, wanted %d)\n", ret, t->ret);
 			err = 1;
 			break;
 		}
+		printf("%3d (PASS)\n", ret);
 	}
 
-	if (!err)
-		printf("pass!\n");
 	axmap_free(map);
 	return err;
 }


^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2018-08-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-08-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit daa899130fdd40f5df720ee54980b00b07903dc4:

  io_u: residiual size should be unsigned long long (2018-08-21 09:16:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c08890d3a81cebb5b4bde14afb6c7778bb390ddf:

  Merge branch 'axmap' of https://github.com/bvanassche/fio (2018-08-22 20:13:45 -0600)

----------------------------------------------------------------
Bart Van Assche (5):
      lib/axmap: Add more documentation
      lib/axmap: Inline ulog64()
      lib/axmap: Make axmap_new() more robust
      lib/axmap: Simplify axmap_set_fn()
      lib/axmap: Optimize __axmap_set()

Jens Axboe (1):
      Merge branch 'axmap' of https://github.com/bvanassche/fio

 lib/axmap.c | 96 ++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 60 insertions(+), 36 deletions(-)

---

Diff of recent changes:

diff --git a/lib/axmap.c b/lib/axmap.c
index 923aae4..e194e80 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -57,26 +57,33 @@ static const unsigned long bit_masks[] = {
 #endif
 };
 
+/**
+ * struct axmap_level - a bitmap used to implement struct axmap
+ * @level: Level index. Each map has at least one level with index zero. The
+ *	higher the level index, the fewer bits a struct axmap_level contains.
+ * @map_size: Number of elements of the @map array.
+ * @map: A bitmap with @map_size elements.
+ */
 struct axmap_level {
 	int level;
 	unsigned long map_size;
 	unsigned long *map;
 };
 
+/**
+ * struct axmap - a set that can store numbers 0 .. @nr_bits - 1
+ * @nr_levels: Number of elements of the @levels array.
+ * @levels: struct axmap_level array in which lower levels contain more bits
+ *	than higher levels.
+ * @nr_bits: One more than the highest value stored in the set.
+ */
 struct axmap {
 	unsigned int nr_levels;
 	struct axmap_level *levels;
 	uint64_t nr_bits;
 };
 
-static inline unsigned long ulog64(unsigned long val, unsigned int log)
-{
-	while (log-- && val)
-		val >>= UNIT_SHIFT;
-
-	return val;
-}
-
+/* Remove all elements from the @axmap set */
 void axmap_reset(struct axmap *axmap)
 {
 	int i;
@@ -102,6 +109,7 @@ void axmap_free(struct axmap *axmap)
 	free(axmap);
 }
 
+/* Allocate memory for a set that can store the numbers 0 .. @nr_bits - 1. */
 struct axmap *axmap_new(unsigned long nr_bits)
 {
 	struct axmap *axmap;
@@ -120,6 +128,8 @@ struct axmap *axmap_new(unsigned long nr_bits)
 
 	axmap->nr_levels = levels;
 	axmap->levels = calloc(axmap->nr_levels, sizeof(struct axmap_level));
+	if (!axmap->levels)
+		goto free_axmap;
 	axmap->nr_bits = nr_bits;
 
 	for (i = 0; i < axmap->nr_levels; i++) {
@@ -129,23 +139,30 @@ struct axmap *axmap_new(unsigned long nr_bits)
 		al->map_size = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
 		al->map = malloc(al->map_size * sizeof(unsigned long));
 		if (!al->map)
-			goto err;
+			goto free_levels;
 
 		nr_bits = (nr_bits + BLOCKS_PER_UNIT - 1) >> UNIT_SHIFT;
 	}
 
 	axmap_reset(axmap);
 	return axmap;
-err:
+
+free_levels:
 	for (i = 0; i < axmap->nr_levels; i++)
-		if (axmap->levels[i].map)
-			free(axmap->levels[i].map);
+		free(axmap->levels[i].map);
 
 	free(axmap->levels);
+
+free_axmap:
 	free(axmap);
 	return NULL;
 }
 
+/*
+ * Call @func for each level, starting at level zero, until a level is found
+ * for which @func returns true. Return false if none of the @func calls
+ * returns true.
+ */
 static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
 			  bool (*func)(struct axmap_level *, unsigned long, unsigned int,
 			  void *), void *data)
@@ -170,13 +187,18 @@ static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
 	return false;
 }
 
+/*
+ * Call @func for each level, starting at the highest level, until a level is
+ * found for which @func returns true. Return false if none of the @func calls
+ * returns true.
+ */
 static bool axmap_handler_topdown(struct axmap *axmap, uint64_t bit_nr,
 	bool (*func)(struct axmap_level *, unsigned long, unsigned int, void *))
 {
 	int i;
 
 	for (i = axmap->nr_levels - 1; i >= 0; i--) {
-		unsigned long index = ulog64(bit_nr, i);
+		unsigned long index = bit_nr >> (UNIT_SHIFT * i);
 		unsigned long offset = index >> UNIT_SHIFT;
 		unsigned int bit = index & BLOCKS_PER_UNIT_MASK;
 
@@ -192,6 +214,11 @@ struct axmap_set_data {
 	unsigned int set_bits;
 };
 
+/*
+ * Set at most @__data->nr_bits bits in @al at offset @offset. Do not exceed
+ * the boundary of the element at offset @offset. Return the number of bits
+ * that have been set in @__data->set_bits if @al->level == 0.
+ */
 static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 			 unsigned int bit, void *__data)
 {
@@ -208,18 +235,12 @@ static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 	 */
 	overlap = al->map[offset] & mask;
 	if (overlap == mask) {
-done:
 		data->set_bits = 0;
 		return true;
 	}
 
 	if (overlap) {
-		const int __bit = ffz(~overlap);
-
-		nr_bits = __bit - bit;
-		if (!nr_bits)
-			goto done;
-
+		nr_bits = ffz(~overlap) - bit;
 		mask = bit_masks[nr_bits] << bit;
 	}
 
@@ -230,36 +251,33 @@ done:
 	if (!al->level)
 		data->set_bits = nr_bits;
 
+	/* For the next level */
 	data->nr_bits = 1;
+
 	return al->map[offset] != -1UL;
 }
 
+/*
+ * Set up to @data->nr_bits starting from @bit_nr in @axmap. Start at
+ * @bit_nr. If that bit has not yet been set then set it and continue until
+ * either @data->nr_bits have been set or a 1 bit is found. Store the number
+ * of bits that have been set in @data->set_bits. It is guaranteed that all
+ * bits that have been requested to set fit in the same unsigned long word of
+ * level 0 of @axmap.
+ */
 static void __axmap_set(struct axmap *axmap, uint64_t bit_nr,
 			 struct axmap_set_data *data)
 {
-	unsigned int set_bits, nr_bits = data->nr_bits;
+	unsigned int nr_bits = data->nr_bits;
 
 	if (bit_nr > axmap->nr_bits)
 		return;
 	else if (bit_nr + nr_bits > axmap->nr_bits)
 		nr_bits = axmap->nr_bits - bit_nr;
 
-	set_bits = 0;
-	while (nr_bits) {
-		axmap_handler(axmap, bit_nr, axmap_set_fn, data);
-		set_bits += data->set_bits;
+	assert(nr_bits <= BLOCKS_PER_UNIT);
 
-		if (!data->set_bits ||
-		    data->set_bits != (BLOCKS_PER_UNIT - nr_bits))
-			break;
-
-		nr_bits -= data->set_bits;
-		bit_nr += data->set_bits;
-
-		data->nr_bits = nr_bits;
-	}
-
-	data->set_bits = set_bits;
+	axmap_handler(axmap, bit_nr, axmap_set_fn, data);
 }
 
 void axmap_set(struct axmap *axmap, uint64_t bit_nr)
@@ -269,6 +287,12 @@ void axmap_set(struct axmap *axmap, uint64_t bit_nr)
 	__axmap_set(axmap, bit_nr, &data);
 }
 
+/*
+ * Set up to @nr_bits starting from @bit in @axmap. Start at @bit. If that
+ * bit has not yet been set then set it and continue until either @nr_bits
+ * have been set or a 1 bit is found. Return the number of bits that have been
+ * set.
+ */
 unsigned int axmap_set_nr(struct axmap *axmap, uint64_t bit_nr,
 			  unsigned int nr_bits)
 {



* Recent changes (master)
@ 2018-08-22 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-08-22 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 78439a18225255f7f1b4f9efab950afcd638b606:

  Update HOWTO for read_iolog change (2018-08-20 08:34:13 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to daa899130fdd40f5df720ee54980b00b07903dc4:

  io_u: residiual size should be unsigned long long (2018-08-21 09:16:09 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      io_u: residiual size should be unsigned long long

 io_u.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.h b/io_u.h
index 9a423b2..2e0fd3f 100644
--- a/io_u.h
+++ b/io_u.h
@@ -75,7 +75,7 @@ struct io_u {
 
 	struct io_piece *ipo;
 
-	unsigned int resid;
+	unsigned long long resid;
 	unsigned int error;
 
 	/*



* Recent changes (master)
@ 2018-08-21 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-08-21 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c44d2c6e97245bf68a57f9860a1c92c7bc065f82:

  Move steady state unit test to t/ (2018-08-17 19:39:07 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 78439a18225255f7f1b4f9efab950afcd638b606:

  Update HOWTO for read_iolog change (2018-08-20 08:34:13 -0600)

----------------------------------------------------------------
Adam Kupczyk (1):
      iolog: Now --read_iolog can contain multiple replay files, separated by ':'.

Jens Axboe (2):
      Merge branch 'multiple-read_iolog' of https://github.com/aclamk/fio
      Update HOWTO for read_iolog change

 HOWTO     |  4 ++++
 fio.1     |  4 ++++
 iolog.c   | 12 ++++++++----
 options.c | 17 +++++++++++++++++
 options.h |  1 +
 5 files changed, 34 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index ff7aa09..3839461 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2409,6 +2409,10 @@ I/O replay
 	:manpage:`blktrace(8)` for how to capture such logging data. For blktrace
 	replay, the file needs to be turned into a blkparse binary data file first
 	(``blkparse <device> -o /dev/null -d file_for_fio.bin``).
+	You can specify a number of files by separating the names with a ':'
+	character. See the :option:`filename` option for information on how to
+	escape ':' and '\' characters within the file names. These files will
+	be sequentially assigned to job clones created by :option:`numjobs`.
 
 .. option:: read_iolog_chunked=bool
 
diff --git a/fio.1 b/fio.1
index cb4351f..4071947 100644
--- a/fio.1
+++ b/fio.1
@@ -2127,6 +2127,10 @@ to replay a workload captured by blktrace. See
 \fBblktrace\fR\|(8) for how to capture such logging data. For blktrace
 replay, the file needs to be turned into a blkparse binary data file first
 (`blkparse <device> \-o /dev/null \-d file_for_fio.bin').
+You can specify a number of files by separating the names with a ':' character.
+See the \fBfilename\fR option for information on how to escape ':' and '\'
+characters within the file names. These files will be sequentially assigned to
+job clones created by \fBnumjobs\fR.
 .TP
 .BI read_iolog_chunked \fR=\fPbool
 Determines how iolog is read. If false (default) entire \fBread_iolog\fR will
diff --git a/iolog.c b/iolog.c
index 0f95c60..f3eedb5 100644
--- a/iolog.c
+++ b/iolog.c
@@ -447,7 +447,7 @@ static bool read_iolog2(struct thread_data *td)
 					dprint(FD_FILE, "iolog: ignoring"
 						" re-add of file %s\n", fname);
 				} else {
-					fileno = add_file(td, fname, 0, 1);
+					fileno = add_file(td, fname, td->subjob_number, 1);
 					file_action = FIO_LOG_ADD_FILE;
 				}
 				continue;
@@ -596,13 +596,17 @@ static bool init_iolog_read(struct thread_data *td)
 	char buffer[256], *p;
 	FILE *f = NULL;
 	bool ret;
-	if (is_socket(td->o.read_iolog_file)) {
-		int fd = open_socket(td->o.read_iolog_file);
+	char* fname = get_name_by_idx(td->o.read_iolog_file, td->subjob_number);
+	dprint(FD_IO, "iolog: name=%s\n", fname);
+
+	if (is_socket(fname)) {
+		int fd = open_socket(fname);
 		if (fd >= 0) {
 			f = fdopen(fd, "r");
 		}
 	} else
-		f = fopen(td->o.read_iolog_file, "r");
+		f = fopen(fname, "r");
+	free(fname);
 	if (!f) {
 		perror("fopen read iolog");
 		return false;
diff --git a/options.c b/options.c
index 1c35acc..86ab5d6 100644
--- a/options.c
+++ b/options.c
@@ -1240,6 +1240,23 @@ int set_name_idx(char *target, size_t tlen, char *input, int index,
 	return len;
 }
 
+char* get_name_by_idx(char *input, int index)
+{
+	unsigned int cur_idx;
+	char *fname, *str, *p;
+
+	p = str = strdup(input);
+
+	index %= get_max_name_idx(input);
+	for (cur_idx = 0; cur_idx <= index; cur_idx++)
+		fname = get_next_name(&str);
+
+	fname = strdup(fname);
+	free(p);
+
+	return fname;
+}
+
 static int str_filename_cb(void *data, const char *input)
 {
 	struct thread_data *td = cb_data_to_td(data);
diff --git a/options.h b/options.h
index e53eb1b..8fdd136 100644
--- a/options.h
+++ b/options.h
@@ -16,6 +16,7 @@ void add_opt_posval(const char *, const char *, const char *);
 void del_opt_posval(const char *, const char *);
 struct thread_data;
 void fio_options_free(struct thread_data *);
+char* get_name_by_idx(char *input, int index);
 int set_name_idx(char *, size_t, char *, int, bool);
 
 extern char client_sockaddr_str[];  /* used with --client option */



* Recent changes (master)
@ 2018-08-18 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-08-18 12:00 UTC (permalink / raw)
  To: fio


The following changes since commit 9ee669faa39003c2317e5df892314bcfcee069e3:

  configure: avoid pkg-config usage for http engine (2018-08-16 18:58:27 +0200)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c44d2c6e97245bf68a57f9860a1c92c7bc065f82:

  Move steady state unit test to t/ (2018-08-17 19:39:07 -0600)

----------------------------------------------------------------
Gaëtan Bossu (1):
      Add support for DDN's Infinite Memory Engine

Jens Axboe (7):
      Merge branch 'master' of https://github.com/kelleymh/fio
      Merge branch 'ime-support' of https://github.com/DDNStorage/fio-public into ddn-ime
      engines/ime: various code and style cleanups
      Sync man page with fio for IME
      Merge branch 'ddn-ime'
      Merge branch 'master' of https://github.com/kelleymh/fio
      Move steady state unit test to t/

Michael Kelley (3):
      Reimplement axmap_next_free() to prevent distribution skew
      Add tests specifically for axmap_next_free()
      Remove unused code in lib/axmap.c

Tomohiro Kusumi (1):
      http: fix compile-time warnings

 HOWTO                                  |  16 +
 Makefile                               |   3 +
 configure                              |  29 ++
 engines/http.c                         |   4 +-
 engines/ime.c                          | 899 +++++++++++++++++++++++++++++++++
 examples/ime.fio                       |  51 ++
 fio.1                                  |  14 +
 lib/axmap.c                            | 189 ++++---
 lib/axmap.h                            |   1 -
 options.c                              |  11 +
 t/axmap.c                              | 162 +++++-
 {unit_tests => t}/steadystate_tests.py |   0
 12 files changed, 1272 insertions(+), 107 deletions(-)
 create mode 100644 engines/ime.c
 create mode 100644 examples/ime.fio
 rename {unit_tests => t}/steadystate_tests.py (100%)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 743144f..ff7aa09 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1901,6 +1901,22 @@ I/O engine
 			mounted with DAX on a persistent memory device through the PMDK
 			libpmem library.
 
+		**ime_psync**
+			Synchronous read and write using DDN's Infinite Memory Engine (IME).
+			This engine is very basic and issues calls to IME whenever an IO is
+			queued.
+
+		**ime_psyncv**
+			Synchronous read and write using DDN's Infinite Memory Engine (IME).
+			This engine uses iovecs and will try to stack as many IOs as possible
+			(if the IOs are "contiguous" and the IO depth is not exceeded)
+			before issuing a call to IME.
+
+		**ime_aio**
+			Asynchronous read and write using DDN's Infinite Memory Engine (IME).
+			This engine will try to stack as many IOs as possible by creating
+			requests for IME. FIO will then decide when to commit these requests.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index b981b45..e8e15fe 100644
--- a/Makefile
+++ b/Makefile
@@ -145,6 +145,9 @@ endif
 ifdef CONFIG_LIBPMEM
   SOURCE += engines/libpmem.c
 endif
+ifdef CONFIG_IME
+  SOURCE += engines/ime.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
diff --git a/configure b/configure
index a03f7fa..fb8b243 100755
--- a/configure
+++ b/configure
@@ -202,6 +202,8 @@ for opt do
   ;;
   --disable-native) disable_native="yes"
   ;;
+  --with-ime=*) ime_path="$optarg"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -233,6 +235,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-optimizations Don't enable compiler optimizations"
   echo "--enable-cuda           Enable GPUDirect RDMA support"
   echo "--disable-native        Don't build for native host"
+  echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   exit $exit_val
 fi
 
@@ -1963,6 +1966,29 @@ print_config "PMDK dev-dax engine" "$devdax"
 print_config "PMDK libpmem engine" "$pmem"
 
 ##########################################
+# Check whether we support DDN's IME
+if test "$libime" != "yes" ; then
+  libime="no"
+fi
+cat > $TMPC << EOF
+#include <ime_native.h>
+int main(int argc, char **argv)
+{
+  int rc;
+  ime_native_init();
+  rc = ime_native_finalize();
+  return 0;
+}
+EOF
+if compile_prog "-I${ime_path}/include" "-L${ime_path}/lib -lim_client" "libime"; then
+  libime="yes"
+  CFLAGS="-I${ime_path}/include $CFLAGS"
+  LDFLAGS="-Wl,-rpath ${ime_path}/lib -L${ime_path}/lib $LDFLAGS"
+  LIBS="-lim_client $LIBS"
+fi
+print_config "DDN's Infinite Memory Engine" "$libime"
+
+##########################################
 # Check if we have lex/yacc available
 yacc="no"
 yacc_is_bison="no"
@@ -2455,6 +2481,9 @@ fi
 if test "$pmem" = "yes" ; then
   output_sym "CONFIG_LIBPMEM"
 fi
+if test "$libime" = "yes" ; then
+  output_sym "CONFIG_IME"
+fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
diff --git a/engines/http.c b/engines/http.c
index cb66ebe..93fcd0d 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -546,7 +546,7 @@ static enum fio_q_status fio_http_queue(struct thread_data *td,
 	} else if (io_u->ddir == DDIR_TRIM) {
 		curl_easy_setopt(http->curl, CURLOPT_HTTPGET, 1L);
 		curl_easy_setopt(http->curl, CURLOPT_CUSTOMREQUEST, "DELETE");
-		curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, 0);
+		curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)0);
 		curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
 		curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, NULL);
 		res = curl_easy_perform(http->curl);
@@ -608,7 +608,7 @@ static int fio_http_setup(struct thread_data *td)
 	}
 	curl_easy_setopt(http->curl, CURLOPT_READFUNCTION, _http_read);
 	curl_easy_setopt(http->curl, CURLOPT_WRITEFUNCTION, _http_write);
-	curl_easy_setopt(http->curl, CURLOPT_SEEKFUNCTION, _http_seek);
+	curl_easy_setopt(http->curl, CURLOPT_SEEKFUNCTION, &_http_seek);
 	if (o->user && o->pass) {
 		curl_easy_setopt(http->curl, CURLOPT_USERNAME, o->user);
 		curl_easy_setopt(http->curl, CURLOPT_PASSWORD, o->pass);
diff --git a/engines/ime.c b/engines/ime.c
new file mode 100644
index 0000000..4298402
--- /dev/null
+++ b/engines/ime.c
@@ -0,0 +1,899 @@
+/*
+ * FIO engines for DDN's Infinite Memory Engine.
+ * This file defines 3 engines: ime_psync, ime_psyncv, and ime_aio
+ *
+ * Copyright (C) 2018      DataDirect Networks. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/*
+ * Some details about the new engines are given below:
+ *
+ *
+ * ime_psync:
+ * Most basic engine that issues calls to ime_native whenever an IO is queued.
+ *
+ * ime_psyncv:
+ * This engine tries to queue the IOs (by creating iovecs) if asked by FIO (via
+ * iodepth_batch). It refuses to queue when the iovecs can't be appended, and
+ * waits for FIO to issue a commit. After a call to commit and get_events, new
+ * IOs can be queued.
+ *
+ * ime_aio:
+ * This engine tries to queue the IOs (by creating iovecs) if asked by FIO (via
+ * iodepth_batch). When the iovecs can't be appended to the current request, a
+ * new request for IME is created. These requests will be issued to IME when
+ * commit is called. Contrary to ime_psyncv, there can be several requests at
+ * once. We don't need to wait for a request to terminate before creating a new
+ * one.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <linux/limits.h>
+#include <ime_native.h>
+
+#include "../fio.h"
+
+
+/**************************************************************
+ *              Types and constants definitions
+ *
+ **************************************************************/
+
+/* define constants for async IOs */
+#define FIO_IME_IN_PROGRESS -1
+#define FIO_IME_REQ_ERROR   -2
+
+/* This flag is used when some jobs were created using threads. In that
+   case, IME can't be finalized in the engine-specific cleanup function,
+   because other threads might still use IME. Instead, IME is finalized
+   in the destructor (see fio_ime_unregister), only when the flag
+   fio_ime_is_initialized is true (which means at least one thread has
+   initialized IME). */
+static bool fio_ime_is_initialized = false;
+
+struct imesio_req {
+	int 			fd;		/* File descriptor */
+	enum fio_ddir	ddir;	/* Type of IO (read or write) */
+	off_t			offset;	/* File offset */
+};
+struct imeaio_req {
+	struct ime_aiocb 	iocb;			/* IME aio request */
+	ssize_t      		status;			/* Status of the IME request */
+	enum fio_ddir		ddir;			/* Type of IO (read or write) */
+	pthread_cond_t		cond_endio;		/* Condition var to notify FIO */
+	pthread_mutex_t		status_mutex;	/* Mutex for cond_endio */
+};
+
+/* This structure will be used for 2 engines: ime_psyncv and ime_aio */
+struct ime_data {
+	union {
+		struct imeaio_req 	*aioreqs;	/* array of aio requests */
+		struct imesio_req	*sioreq;	/* pointer to the only syncio request */
+	};
+	struct iovec 	*iovecs;		/* array of queued iovecs */
+	struct io_u 	**io_us;		/* array of queued io_u pointers */
+	struct io_u 	**event_io_us;	/* array of the events retrieved after get_events */
+	unsigned int 	queued;			/* iovecs/io_us in the queue */
+	unsigned int 	events;			/* number of committed iovecs/io_us */
+
+	/* variables used to implement a "ring" queue */
+	unsigned int depth;			/* max entries in the queue */
+	unsigned int head;			/* index used to append */
+	unsigned int tail;			/* index used to pop */
+	unsigned int cur_commit;	/* index of the first uncommitted req */
+
+	/* offset used by the last iovec (used to check if the iovecs can be appended)*/
+	unsigned long long	last_offset;
+
+	/* The variables below are used for aio only */
+	struct imeaio_req	*last_req; /* last request awaiting committing */
+};
+
+
+/**************************************************************
+ *         Private functions for queueing/unqueueing
+ *
+ **************************************************************/
+
+static void fio_ime_queue_incr (struct ime_data *ime_d)
+{
+	ime_d->head = (ime_d->head + 1) % ime_d->depth;
+	ime_d->queued++;
+}
+
+static void fio_ime_queue_red (struct ime_data *ime_d)
+{
+	ime_d->tail = (ime_d->tail + 1) % ime_d->depth;
+	ime_d->queued--;
+	ime_d->events--;
+}
+
+static void fio_ime_queue_commit (struct ime_data *ime_d, int iovcnt)
+{
+	ime_d->cur_commit = (ime_d->cur_commit + iovcnt) % ime_d->depth;
+	ime_d->events += iovcnt;
+}
+
+static void fio_ime_queue_reset (struct ime_data *ime_d)
+{
+	ime_d->head = 0;
+	ime_d->tail = 0;
+	ime_d->cur_commit = 0;
+	ime_d->queued = 0;
+	ime_d->events = 0;
+}
+
+/**************************************************************
+ *                   General IME functions
+ *             (needed for both sync and async IOs)
+ **************************************************************/
+
+static char *fio_set_ime_filename(char* filename)
+{
+	static __thread char ime_filename[PATH_MAX];
+	int ret;
+
+	ret = snprintf(ime_filename, PATH_MAX, "%s%s", DEFAULT_IME_FILE_PREFIX, filename);
+	if (ret < PATH_MAX)
+		return ime_filename;
+
+	return NULL;
+}
+
+static int fio_ime_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	struct stat buf;
+	int ret;
+	char *ime_filename;
+
+	dprint(FD_FILE, "get file size %s\n", f->file_name);
+
+	ime_filename = fio_set_ime_filename(f->file_name);
+	if (ime_filename == NULL)
+		return 1;
+	ret = ime_native_stat(ime_filename, &buf);
+	if (ret == -1) {
+		td_verror(td, errno, "fstat");
+		return 1;
+	}
+
+	f->real_file_size = buf.st_size;
+	return 0;
+}
+
+/* This function mimics the generic_file_open function, but issues
+   IME native calls instead of POSIX calls. */
+static int fio_ime_open_file(struct thread_data *td, struct fio_file *f)
+{
+	int flags = 0;
+	int ret;
+	uint64_t desired_fs;
+	char *ime_filename;
+
+	dprint(FD_FILE, "fd open %s\n", f->file_name);
+
+	if (td_trim(td)) {
+		td_verror(td, EINVAL, "IME does not support TRIM operation");
+		return 1;
+	}
+
+	if (td->o.oatomic) {
+		td_verror(td, EINVAL, "IME does not support atomic IO");
+		return 1;
+	}
+	if (td->o.odirect)
+		flags |= O_DIRECT;
+	if (td->o.sync_io)
+		flags |= O_SYNC;
+	if (td->o.create_on_open && td->o.allow_create)
+		flags |= O_CREAT;
+
+	if (td_write(td)) {
+		if (!read_only)
+			flags |= O_RDWR;
+
+		if (td->o.allow_create)
+			flags |= O_CREAT;
+	} else if (td_read(td)) {
+		flags |= O_RDONLY;
+	} else {
+		/* We should never go here. */
+		td_verror(td, EINVAL, "Unsupported open mode");
+		return 1;
+	}
+
+	ime_filename = fio_set_ime_filename(f->file_name);
+	if (ime_filename == NULL)
+		return 1;
+	f->fd = ime_native_open(ime_filename, flags, 0600);
+	if (f->fd == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int __e = errno;
+
+		snprintf(buf, sizeof(buf), "open(%s)", f->file_name);
+		td_verror(td, __e, buf);
+		return 1;
+	}
+
+	/* Now we need to make sure the real file size is sufficient for FIO
+	   to do its thing. This is normally done before the file open function
+	   is called, but because FIO would use POSIX calls, we need to do it
+	   ourselves */
+	ret = fio_ime_get_file_size(td, f);
+	if (ret < 0) {
+		ime_native_close(f->fd);
+		td_verror(td, errno, "ime_get_file_size");
+		return 1;
+	}
+
+	desired_fs = f->io_size + f->file_offset;
+	if (td_write(td)) {
+		dprint(FD_FILE, "Laying out file %s%s\n",
+			DEFAULT_IME_FILE_PREFIX, f->file_name);
+		if (!td->o.create_on_open &&
+				f->real_file_size < desired_fs &&
+				ime_native_ftruncate(f->fd, desired_fs) < 0) {
+			ime_native_close(f->fd);
+			td_verror(td, errno, "ime_native_ftruncate");
+			return 1;
+		}
+		if (f->real_file_size < desired_fs)
+			f->real_file_size = desired_fs;
+	} else if (td_read(td) && f->real_file_size < desired_fs) {
+		ime_native_close(f->fd);
+		log_err("error: can't read %lu bytes from file with "
+						"%lu bytes\n", desired_fs, f->real_file_size);
+		return 1;
+	}
+
+	return 0;
+}
+
+static int fio_ime_close_file(struct thread_data fio_unused *td, struct fio_file *f)
+{
+	int ret = 0;
+
+	dprint(FD_FILE, "fd close %s\n", f->file_name);
+
+	if (ime_native_close(f->fd) < 0)
+		ret = errno;
+
+	f->fd = -1;
+	return ret;
+}
+
+static int fio_ime_unlink_file(struct thread_data *td, struct fio_file *f)
+{
+	char *ime_filename = fio_set_ime_filename(f->file_name);
+	int ret;
+
+	if (ime_filename == NULL)
+		return 1;
+
+	ret = unlink(ime_filename);
+	return ret < 0 ? errno : 0;
+}
+
+static struct io_u *fio_ime_event(struct thread_data *td, int event)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+
+	return ime_d->event_io_us[event];
+}
+
+/* Setup function used to replace get_file_sizes when setting up the file.
+   Instead, we set real_file_size to 0 for each file. This way we
+   can avoid calling ime_native_init before the forks are created. */
+static int fio_ime_setup(struct thread_data *td)
+{
+	struct fio_file *f;
+	unsigned int i;
+
+	for_each_file(td, f, i) {
+		dprint(FD_FILE, "setup: set file size to 0 for %p/%d/%s\n",
+			f, i, f->file_name);
+		f->real_file_size = 0;
+	}
+
+	return 0;
+}
+
+static int fio_ime_engine_init(struct thread_data *td)
+{
+	struct fio_file *f;
+	unsigned int i;
+
+	dprint(FD_IO, "ime engine init\n");
+	if (fio_ime_is_initialized && !td->o.use_thread) {
+		log_err("Warning: something might go wrong. Not all threads/forks were"
+				" created before the FIO jobs were initialized.\n");
+	}
+
+	ime_native_init();
+	fio_ime_is_initialized = true;
+
+	/* We have to temporarily set real_file_size so that
+	   FIO can initialize properly. It will be corrected
+	   on file open. */
+	for_each_file(td, f, i)
+		f->real_file_size = f->io_size + f->file_offset;
+
+	return 0;
+}
+
+static void fio_ime_engine_finalize(struct thread_data *td)
+{
+	/* Only finalize IME when using forks */
+	if (!td->o.use_thread) {
+		if (ime_native_finalize() < 0)
+			log_err("error in ime_native_finalize\n");
+		fio_ime_is_initialized = false;
+	}
+}
+
+
+/**************************************************************
+ *             Private functions for blocking IOs
+ *                     (without iovecs)
+ **************************************************************/
+
+/* Notice: this function comes from the sync engine */
+/* It is used by the queue function to return a proper code and fill
+   some attributes in the io_u used for the IO. */
+static int fio_ime_psync_end(struct thread_data *td, struct io_u *io_u, ssize_t ret)
+{
+	if (ret != (ssize_t) io_u->xfer_buflen) {
+		if (ret >= 0) {
+			io_u->resid = io_u->xfer_buflen - ret;
+			io_u->error = 0;
+			return FIO_Q_COMPLETED;
+		} else
+			io_u->error = errno;
+	}
+
+	if (io_u->error) {
+		io_u_log_error(td, io_u);
+		td_verror(td, io_u->error, "xfer");
+	}
+
+	return FIO_Q_COMPLETED;
+}
+
+static enum fio_q_status fio_ime_psync_queue(struct thread_data *td,
+					   struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	ssize_t ret;
+
+	fio_ro_check(td, io_u);
+
+	if (io_u->ddir == DDIR_READ)
+		ret = ime_native_pread(f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+	else if (io_u->ddir == DDIR_WRITE)
+		ret = ime_native_pwrite(f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+	else if (io_u->ddir == DDIR_SYNC)
+		ret = ime_native_fsync(f->fd);
+	else {
+		ret = io_u->xfer_buflen;
+		io_u->error = EINVAL;
+	}
+
+	return fio_ime_psync_end(td, io_u, ret);
+}
+
+
+/**************************************************************
+ *             Private functions for blocking IOs
+ *                       (with iovecs)
+ **************************************************************/
+
+static bool fio_ime_psyncv_can_queue(struct ime_data *ime_d, struct io_u *io_u)
+{
+	/* We can only queue if:
+	  - There are no queued iovecs
+	  - Or if there is at least one:
+		 - There must be no event waiting for retrieval
+		 - The offsets must be contiguous
+		 - The ddir and fd must be the same */
+	return (ime_d->queued == 0 || (
+			ime_d->events == 0 &&
+			ime_d->last_offset == io_u->offset &&
+			ime_d->sioreq->ddir == io_u->ddir &&
+			ime_d->sioreq->fd == io_u->file->fd));
+}
+
+/* Before using this function, we should have already
+   ensured that the queue is not full */
+static void fio_ime_psyncv_enqueue(struct ime_data *ime_d, struct io_u *io_u)
+{
+	struct imesio_req *ioreq = ime_d->sioreq;
+	struct iovec *iov = &ime_d->iovecs[ime_d->head];
+
+	iov->iov_base = io_u->xfer_buf;
+	iov->iov_len = io_u->xfer_buflen;
+
+	if (ime_d->queued == 0) {
+		ioreq->offset = io_u->offset;
+		ioreq->ddir = io_u->ddir;
+		ioreq->fd = io_u->file->fd;
+	}
+
+	ime_d->io_us[ime_d->head] = io_u;
+	ime_d->last_offset = io_u->offset + io_u->xfer_buflen;
+	fio_ime_queue_incr(ime_d);
+}
+
+/* Tries to queue an IO. It will fail if the IO can't be appended to the
+   current request or if the current request has been committed but not
+   yet retrieved by get_events. */
+static enum fio_q_status fio_ime_psyncv_queue(struct thread_data *td,
+	struct io_u *io_u)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+
+	fio_ro_check(td, io_u);
+
+	if (ime_d->queued == ime_d->depth)
+		return FIO_Q_BUSY;
+
+	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		if (!fio_ime_psyncv_can_queue(ime_d, io_u))
+			return FIO_Q_BUSY;
+
+		dprint(FD_IO, "queue: ddir=%d at %u commit=%u queued=%u events=%u\n",
+			io_u->ddir, ime_d->head, ime_d->cur_commit,
+			ime_d->queued, ime_d->events);
+		fio_ime_psyncv_enqueue(ime_d, io_u);
+		return FIO_Q_QUEUED;
+	} else if (io_u->ddir == DDIR_SYNC) {
+		if (ime_native_fsync(io_u->file->fd) < 0) {
+			io_u->error = errno;
+			td_verror(td, io_u->error, "fsync");
+		}
+		return FIO_Q_COMPLETED;
+	} else {
+		io_u->error = EINVAL;
+		td_verror(td, io_u->error, "wrong ddir");
+		return FIO_Q_COMPLETED;
+	}
+}
+
+/* Notice: this function comes from the sync engine */
+/* It is used by the commit function to return a proper code and fill
+   some attributes in the io_us appended to the current request. */
+static int fio_ime_psyncv_end(struct thread_data *td, ssize_t bytes)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct io_u *io_u;
+	unsigned int i;
+	int err = errno;
+
+	for (i = 0; i < ime_d->queued; i++) {
+		io_u = ime_d->io_us[i];
+
+		if (bytes == -1)
+			io_u->error = err;
+		else {
+			unsigned int this_io;
+
+			this_io = bytes;
+			if (this_io > io_u->xfer_buflen)
+				this_io = io_u->xfer_buflen;
+
+			io_u->resid = io_u->xfer_buflen - this_io;
+			io_u->error = 0;
+			bytes -= this_io;
+		}
+	}
+
+	if (bytes == -1) {
+		td_verror(td, err, "xfer psyncv");
+		return -err;
+	}
+
+	return 0;
+}
+
+/* Commits the current request by calling ime_native (with one or several
+   iovecs). After this commit, the corresponding events (one per iovec)
+   can be retrieved by get_events. */
+static int fio_ime_psyncv_commit(struct thread_data *td)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct imesio_req *ioreq;
+	int ret = 0;
+
+	/* Exit if there are no (new) events to commit
+	   or if the previously committed events haven't been retrieved */
+	if (!ime_d->queued || ime_d->events)
+		return 0;
+
+	ioreq = ime_d->sioreq;
+	ime_d->events = ime_d->queued;
+	if (ioreq->ddir == DDIR_READ)
+		ret = ime_native_preadv(ioreq->fd, ime_d->iovecs, ime_d->queued, ioreq->offset);
+	else
+		ret = ime_native_pwritev(ioreq->fd, ime_d->iovecs, ime_d->queued, ioreq->offset);
+
+	dprint(FD_IO, "committed %d iovecs\n", ime_d->queued);
+
+	return fio_ime_psyncv_end(td, ret);
+}
+
+static int fio_ime_psyncv_getevents(struct thread_data *td, unsigned int min,
+				unsigned int max, const struct timespec *t)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct io_u *io_u;
+	int events = 0;
+	unsigned int count;
+
+	if (ime_d->events) {
+		for (count = 0; count < ime_d->events; count++) {
+			io_u = ime_d->io_us[count];
+			ime_d->event_io_us[events] = io_u;
+			events++;
+		}
+		fio_ime_queue_reset(ime_d);
+	}
+
+	dprint(FD_IO, "getevents(%u,%u) ret=%d queued=%u events=%u\n",
+		min, max, events, ime_d->queued, ime_d->events);
+	return events;
+}
+
+static int fio_ime_psyncv_init(struct thread_data *td)
+{
+	struct ime_data *ime_d;
+
+	if (fio_ime_engine_init(td) < 0)
+		return 1;
+
+	ime_d = calloc(1, sizeof(*ime_d));
+
+	ime_d->sioreq = malloc(sizeof(struct imesio_req));
+	ime_d->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
+	ime_d->io_us = malloc(2 * td->o.iodepth * sizeof(struct io_u *));
+	ime_d->event_io_us = ime_d->io_us + td->o.iodepth;
+
+	ime_d->depth = td->o.iodepth;
+
+	td->io_ops_data = ime_d;
+	return 0;
+}
+
+static void fio_ime_psyncv_clean(struct thread_data *td)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+
+	if (ime_d) {
+		free(ime_d->sioreq);
+		free(ime_d->iovecs);
+		free(ime_d->io_us);
+		free(ime_d);
+		td->io_ops_data = NULL;
+	}
+
+	fio_ime_engine_finalize(td);
+}
+
+
+/**************************************************************
+ *           Private functions for non-blocking IOs
+ *
+ **************************************************************/
+
+void fio_ime_aio_complete_cb(struct ime_aiocb *aiocb, int err,
+			     ssize_t bytes)
+{
+	struct imeaio_req *ioreq = (struct imeaio_req *) aiocb->user_context;
+
+	pthread_mutex_lock(&ioreq->status_mutex);
+	ioreq->status = err == 0 ? bytes : FIO_IME_REQ_ERROR;
+	pthread_mutex_unlock(&ioreq->status_mutex);
+
+	pthread_cond_signal(&ioreq->cond_endio);
+}
+
+static bool fio_ime_aio_can_queue(struct ime_data *ime_d, struct io_u *io_u)
+{
+	/* So far we can queue in any case. */
+	return true;
+}
+static bool fio_ime_aio_can_append(struct ime_data *ime_d, struct io_u *io_u)
+{
+	/* We can only append if:
+		- The iovecs will be contiguous in the array
+		- There is already a queued iovec
+		- The offsets are contiguous
+		- The ddir and fd are the same */
+	return (ime_d->head != 0 &&
+			ime_d->queued - ime_d->events > 0 &&
+			ime_d->last_offset == io_u->offset &&
+			ime_d->last_req->ddir == io_u->ddir &&
+			ime_d->last_req->iocb.fd == io_u->file->fd);
+}
+
+/* Before using this function, we should have already
+   ensured that the queue is not full */
+static void fio_ime_aio_enqueue(struct ime_data *ime_d, struct io_u *io_u)
+{
+	struct imeaio_req *ioreq = &ime_d->aioreqs[ime_d->head];
+	struct ime_aiocb *iocb = &ioreq->iocb;
+	struct iovec *iov = &ime_d->iovecs[ime_d->head];
+
+	iov->iov_base = io_u->xfer_buf;
+	iov->iov_len = io_u->xfer_buflen;
+
+	if (fio_ime_aio_can_append(ime_d, io_u))
+		ime_d->last_req->iocb.iovcnt++;
+	else {
+		ioreq->status = FIO_IME_IN_PROGRESS;
+		ioreq->ddir = io_u->ddir;
+		ime_d->last_req = ioreq;
+
+		iocb->complete_cb = &fio_ime_aio_complete_cb;
+		iocb->fd = io_u->file->fd;
+		iocb->file_offset = io_u->offset;
+		iocb->iov = iov;
+		iocb->iovcnt = 1;
+		iocb->flags = 0;
+		iocb->user_context = (intptr_t) ioreq;
+	}
+
+	ime_d->io_us[ime_d->head] = io_u;
+	ime_d->last_offset = io_u->offset + io_u->xfer_buflen;
+	fio_ime_queue_incr(ime_d);
+}
+
+/* Tries to queue an IO. It will create a new request if the IO can't be
+   appended to the current request. It will fail if the queue can't contain
+   any more io_u/iovec. In this case, commit and then get_events need to be
+   called. */
+static enum fio_q_status fio_ime_aio_queue(struct thread_data *td,
+		struct io_u *io_u)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+
+	fio_ro_check(td, io_u);
+
+	dprint(FD_IO, "queue: ddir=%d at %u commit=%u queued=%u events=%u\n",
+		io_u->ddir, ime_d->head, ime_d->cur_commit,
+		ime_d->queued, ime_d->events);
+
+	if (ime_d->queued == ime_d->depth)
+		return FIO_Q_BUSY;
+
+	if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+		if (!fio_ime_aio_can_queue(ime_d, io_u))
+			return FIO_Q_BUSY;
+
+		fio_ime_aio_enqueue(ime_d, io_u);
+		return FIO_Q_QUEUED;
+	} else if (io_u->ddir == DDIR_SYNC) {
+		if (ime_native_fsync(io_u->file->fd) < 0) {
+			io_u->error = errno;
+			td_verror(td, io_u->error, "fsync");
+		}
+		return FIO_Q_COMPLETED;
+	} else {
+		io_u->error = EINVAL;
+		td_verror(td, io_u->error, "wrong ddir");
+		return FIO_Q_COMPLETED;
+	}
+}
+
+static int fio_ime_aio_commit(struct thread_data *td)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct imeaio_req *ioreq;
+	int ret = 0;
+
+	/* Loop while there are events to commit */
+	while (ime_d->queued - ime_d->events) {
+		ioreq = &ime_d->aioreqs[ime_d->cur_commit];
+		if (ioreq->ddir == DDIR_READ)
+			ret = ime_native_aio_read(&ioreq->iocb);
+		else
+			ret = ime_native_aio_write(&ioreq->iocb);
+
+		fio_ime_queue_commit(ime_d, ioreq->iocb.iovcnt);
+
+		/* fio needs a negative error code */
+		if (ret < 0) {
+			ioreq->status = FIO_IME_REQ_ERROR;
+			return -errno;
+		}
+
+		io_u_mark_submit(td, ioreq->iocb.iovcnt);
+		dprint(FD_IO, "committed %d iovecs commit=%u queued=%u events=%u\n",
+			ioreq->iocb.iovcnt, ime_d->cur_commit,
+			ime_d->queued, ime_d->events);
+	}
+
+	return 0;
+}
+
+static int fio_ime_aio_getevents(struct thread_data *td, unsigned int min,
+				unsigned int max, const struct timespec *t)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct imeaio_req *ioreq;
+	struct io_u *io_u;
+	int events = 0;
+	unsigned int count;
+	ssize_t bytes;
+
+	while (ime_d->events) {
+		ioreq = &ime_d->aioreqs[ime_d->tail];
+
+		/* Break if we already got events and appending the
+		   next request's events would exceed max */
+		if (events && events + ioreq->iocb.iovcnt > max)
+			break;
+
+		if (ioreq->status != FIO_IME_IN_PROGRESS) {
+
+			bytes = ioreq->status;
+			for (count = 0; count < ioreq->iocb.iovcnt; count++) {
+				io_u = ime_d->io_us[ime_d->tail];
+				ime_d->event_io_us[events] = io_u;
+				events++;
+				fio_ime_queue_red(ime_d);
+
+				if (ioreq->status == FIO_IME_REQ_ERROR)
+					io_u->error = EIO;
+				else {
+					io_u->resid = bytes > io_u->xfer_buflen ?
+									0 : io_u->xfer_buflen - bytes;
+					io_u->error = 0;
+					bytes -= io_u->xfer_buflen - io_u->resid;
+				}
+			}
+		} else {
+			pthread_mutex_lock(&ioreq->status_mutex);
+			while (ioreq->status == FIO_IME_IN_PROGRESS)
+				pthread_cond_wait(&ioreq->cond_endio, &ioreq->status_mutex);
+			pthread_mutex_unlock(&ioreq->status_mutex);
+		}
+
+	}
+
+	dprint(FD_IO, "getevents(%u,%u) ret=%d queued=%u events=%u\n", min, max,
+		events, ime_d->queued, ime_d->events);
+	return events;
+}
+
+static int fio_ime_aio_init(struct thread_data *td)
+{
+	struct ime_data *ime_d;
+	struct imeaio_req *ioreq;
+	unsigned int i;
+
+	if (fio_ime_engine_init(td) < 0)
+		return 1;
+
+	ime_d = calloc(1, sizeof(*ime_d));
+
+	ime_d->aioreqs = malloc(td->o.iodepth * sizeof(struct imeaio_req));
+	ime_d->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
+	ime_d->io_us = malloc(2 * td->o.iodepth * sizeof(struct io_u *));
+	ime_d->event_io_us = ime_d->io_us + td->o.iodepth;
+
+	ime_d->depth = td->o.iodepth;
+	for (i = 0; i < ime_d->depth; i++) {
+		ioreq = &ime_d->aioreqs[i];
+		pthread_cond_init(&ioreq->cond_endio, NULL);
+		pthread_mutex_init(&ioreq->status_mutex, NULL);
+	}
+
+	td->io_ops_data = ime_d;
+	return 0;
+}
+
+static void fio_ime_aio_clean(struct thread_data *td)
+{
+	struct ime_data *ime_d = td->io_ops_data;
+	struct imeaio_req *ioreq;
+	unsigned int i;
+
+	if (ime_d) {
+		for (i = 0; i < ime_d->depth; i++) {
+			ioreq = &ime_d->aioreqs[i];
+			pthread_cond_destroy(&ioreq->cond_endio);
+			pthread_mutex_destroy(&ioreq->status_mutex);
+		}
+		free(ime_d->aioreqs);
+		free(ime_d->iovecs);
+		free(ime_d->io_us);
+		free(ime_d);
+		td->io_ops_data = NULL;
+	}
+
+	fio_ime_engine_finalize(td);
+}
+
+
+/**************************************************************
+ *                   IO engines definitions
+ *
+ **************************************************************/
+
+/* The FIO_DISKLESSIO flag used for these engines is necessary to prevent
+   FIO from using POSIX calls. See fio_ime_open_file for more details. */
+
+static struct ioengine_ops ioengine_prw = {
+	.name		= "ime_psync",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= fio_ime_setup,
+	.init		= fio_ime_engine_init,
+	.cleanup	= fio_ime_engine_finalize,
+	.queue		= fio_ime_psync_queue,
+	.open_file	= fio_ime_open_file,
+	.close_file	= fio_ime_close_file,
+	.get_file_size	= fio_ime_get_file_size,
+	.unlink_file  	= fio_ime_unlink_file,
+	.flags	    	= FIO_SYNCIO | FIO_DISKLESSIO,
+};
+
+static struct ioengine_ops ioengine_pvrw = {
+	.name		= "ime_psyncv",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= fio_ime_setup,
+	.init		= fio_ime_psyncv_init,
+	.cleanup	= fio_ime_psyncv_clean,
+	.queue		= fio_ime_psyncv_queue,
+	.commit		= fio_ime_psyncv_commit,
+	.getevents	= fio_ime_psyncv_getevents,
+	.event		= fio_ime_event,
+	.open_file	= fio_ime_open_file,
+	.close_file	= fio_ime_close_file,
+	.get_file_size	= fio_ime_get_file_size,
+	.unlink_file  	= fio_ime_unlink_file,
+	.flags	    	= FIO_SYNCIO | FIO_DISKLESSIO,
+};
+
+static struct ioengine_ops ioengine_aio = {
+	.name		= "ime_aio",
+	.version	= FIO_IOOPS_VERSION,
+	.setup		= fio_ime_setup,
+	.init		= fio_ime_aio_init,
+	.cleanup	= fio_ime_aio_clean,
+	.queue		= fio_ime_aio_queue,
+	.commit		= fio_ime_aio_commit,
+	.getevents	= fio_ime_aio_getevents,
+	.event		= fio_ime_event,
+	.open_file	= fio_ime_open_file,
+	.close_file	= fio_ime_close_file,
+	.get_file_size	= fio_ime_get_file_size,
+	.unlink_file  	= fio_ime_unlink_file,
+	.flags       	= FIO_DISKLESSIO,
+};
+
+static void fio_init fio_ime_register(void)
+{
+	register_ioengine(&ioengine_prw);
+	register_ioengine(&ioengine_pvrw);
+	register_ioengine(&ioengine_aio);
+}
+
+static void fio_exit fio_ime_unregister(void)
+{
+	unregister_ioengine(&ioengine_prw);
+	unregister_ioengine(&ioengine_pvrw);
+	unregister_ioengine(&ioengine_aio);
+
+	if (fio_ime_is_initialized && ime_native_finalize() < 0)
+		log_err("Warning: IME did not finalize properly\n");
+}
diff --git a/examples/ime.fio b/examples/ime.fio
new file mode 100644
index 0000000..e97fd1d
--- /dev/null
+++ b/examples/ime.fio
@@ -0,0 +1,51 @@
+# This jobfile performs basic write+read operations using
+# DDN's Infinite Memory Engine.
+
+[global]
+
+# Use as much jobs as possible to maximize performance
+numjobs=8
+
+# The filename should be uniform so that "read" jobs can read what
+# the "write" jobs have written.
+filename_format=fio-test-ime.$jobnum.$filenum
+
+size=25g
+bs=128k
+
+# These settings are useful for the asynchronous ime_aio engine:
+# by setting the io depth to twice the size of a "batch", we can
+# queue IOs while other IOs are "in-flight".
+iodepth=32
+iodepth_batch=16
+iodepth_batch_complete=16
+
+[write-psync]
+stonewall
+rw=write
+ioengine=ime_psync
+
+[read-psync]
+stonewall
+rw=read
+ioengine=ime_psync
+
+[write-psyncv]
+stonewall
+rw=write
+ioengine=ime_psyncv
+
+[read-psyncv]
+stonewall
+rw=read
+ioengine=ime_psyncv
+
+[write-aio]
+stonewall
+rw=write
+ioengine=ime_aio
+
+[read-aio]
+stonewall
+rw=read
+ioengine=ime_aio
\ No newline at end of file
diff --git a/fio.1 b/fio.1
index 73a0422..cb4351f 100644
--- a/fio.1
+++ b/fio.1
@@ -1673,6 +1673,20 @@ done other than creating the file.
 Read and write using mmap I/O to a file on a filesystem
 mounted with DAX on a persistent memory device through the PMDK
 libpmem library.
+.TP
+.B ime_psync
+Synchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine is very basic and issues calls to IME whenever an IO is queued.
+.TP
+.B ime_psyncv
+Synchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine uses iovecs and will try to stack as many IOs as possible (if the IOs
+are "contiguous" and the IO depth is not exceeded) before issuing a call to IME.
+.TP
+.B ime_aio
+Asynchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine will try to stack as many IOs as possible by creating requests for IME.
+FIO will then decide when to commit these requests.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
diff --git a/lib/axmap.c b/lib/axmap.c
index 454af0b..923aae4 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -35,8 +35,6 @@
 #define BLOCKS_PER_UNIT		(1U << UNIT_SHIFT)
 #define BLOCKS_PER_UNIT_MASK	(BLOCKS_PER_UNIT - 1)
 
-#define firstfree_valid(b)	((b)->first_free != (uint64_t) -1)
-
 static const unsigned long bit_masks[] = {
 	0x0000000000000000, 0x0000000000000001, 0x0000000000000003, 0x0000000000000007,
 	0x000000000000000f, 0x000000000000001f, 0x000000000000003f, 0x000000000000007f,
@@ -68,7 +66,6 @@ struct axmap_level {
 struct axmap {
 	unsigned int nr_levels;
 	struct axmap_level *levels;
-	uint64_t first_free;
 	uint64_t nr_bits;
 };
 
@@ -89,8 +86,6 @@ void axmap_reset(struct axmap *axmap)
 
 		memset(al->map, 0, al->map_size * sizeof(unsigned long));
 	}
-
-	axmap->first_free = 0;
 }
 
 void axmap_free(struct axmap *axmap)
@@ -192,24 +187,6 @@ static bool axmap_handler_topdown(struct axmap *axmap, uint64_t bit_nr,
 	return false;
 }
 
-static bool axmap_clear_fn(struct axmap_level *al, unsigned long offset,
-			   unsigned int bit, void *unused)
-{
-	if (!(al->map[offset] & (1UL << bit)))
-		return true;
-
-	al->map[offset] &= ~(1UL << bit);
-	return false;
-}
-
-void axmap_clear(struct axmap *axmap, uint64_t bit_nr)
-{
-	axmap_handler(axmap, bit_nr, axmap_clear_fn, NULL);
-
-	if (bit_nr < axmap->first_free)
-		axmap->first_free = bit_nr;
-}
-
 struct axmap_set_data {
 	unsigned int nr_bits;
 	unsigned int set_bits;
@@ -262,10 +239,6 @@ static void __axmap_set(struct axmap *axmap, uint64_t bit_nr,
 {
 	unsigned int set_bits, nr_bits = data->nr_bits;
 
-	if (axmap->first_free >= bit_nr &&
-	    axmap->first_free < bit_nr + data->nr_bits)
-		axmap->first_free = -1ULL;
-
 	if (bit_nr > axmap->nr_bits)
 		return;
 	else if (bit_nr + nr_bits > axmap->nr_bits)
@@ -336,99 +309,119 @@ bool axmap_isset(struct axmap *axmap, uint64_t bit_nr)
 	return false;
 }
 
-static uint64_t axmap_find_first_free(struct axmap *axmap, unsigned int level,
-				       uint64_t index)
+/*
+ * Find the first free bit that is at least as large as bit_nr.  Return
+ * -1 if no free bit is found before the end of the map.
+ */
+static uint64_t axmap_find_first_free(struct axmap *axmap, uint64_t bit_nr)
 {
-	uint64_t ret = -1ULL;
-	unsigned long j;
 	int i;
+	unsigned long temp;
+	unsigned int bit;
+	uint64_t offset, base_index, index;
+	struct axmap_level *al;
 
-	/*
-	 * Start at the bottom, then converge towards first free bit at the top
-	 */
-	for (i = level; i >= 0; i--) {
-		struct axmap_level *al = &axmap->levels[i];
-
-		if (index >= al->map_size)
-			goto err;
-
-		for (j = index; j < al->map_size; j++) {
-			if (al->map[j] == -1UL)
-				continue;
+	index = 0;
+	for (i = axmap->nr_levels - 1; i >= 0; i--) {
+		al = &axmap->levels[i];
 
-			/*
-			 * First free bit here is our index into the first
-			 * free bit at the next higher level
-			 */
-			ret = index = (j << UNIT_SHIFT) + ffz(al->map[j]);
-			break;
+		/* Shift previously calculated index for next level */
+		index <<= UNIT_SHIFT;
+
+		/*
+		 * Start from an index that's at least as large as the
+		 * originally passed in bit number.
+		 */
+		base_index = bit_nr >> (UNIT_SHIFT * i);
+		if (index < base_index)
+			index = base_index;
+
+		/* Get the offset and bit for this level */
+		offset = index >> UNIT_SHIFT;
+		bit = index & BLOCKS_PER_UNIT_MASK;
+
+		/*
+		 * If the previous level had unused bits in its last
+		 * word, the offset could be bigger than the map at
+		 * this level. That means no free bits exist before the
+		 * end of the map, so return -1.
+		 */
+		if (offset >= al->map_size)
+			return -1ULL;
+
+		/* Check the first word starting with the specific bit */
+		temp = ~bit_masks[bit] & ~al->map[offset];
+		if (temp)
+			goto found;
+
+		/*
+		 * No free bit in the first word, so iterate
+		 * looking for a word with one or more free bits.
+		 */
+		for (offset++; offset < al->map_size; offset++) {
+			temp = ~al->map[offset];
+			if (temp)
+				goto found;
 		}
-	}
-
-	if (ret < axmap->nr_bits)
-		return ret;
-
-err:
-	return (uint64_t) -1ULL;
-}
-
-static uint64_t axmap_first_free(struct axmap *axmap)
-{
-	if (!firstfree_valid(axmap))
-		axmap->first_free = axmap_find_first_free(axmap, axmap->nr_levels - 1, 0);
-
-	return axmap->first_free;
-}
-
-struct axmap_next_free_data {
-	unsigned int level;
-	unsigned long offset;
-	uint64_t bit;
-};
 
-static bool axmap_next_free_fn(struct axmap_level *al, unsigned long offset,
-			       unsigned int bit, void *__data)
-{
-	struct axmap_next_free_data *data = __data;
-	uint64_t mask = ~bit_masks[(data->bit + 1) & BLOCKS_PER_UNIT_MASK];
-
-	if (!(mask & ~al->map[offset]))
-		return false;
+		/* Did not find a free bit */
+		return -1ULL;
 
-	if (al->map[offset] != -1UL) {
-		data->level = al->level;
-		data->offset = offset;
-		return true;
+found:
+		/* Compute the index of the free bit just found */
+		index = (offset << UNIT_SHIFT) + ffz(~temp);
 	}
 
-	data->bit = (data->bit + BLOCKS_PER_UNIT - 1) / BLOCKS_PER_UNIT;
-	return false;
+	/* If we found an unused bit in the last word of level 0, return -1 */
+	if (index >= axmap->nr_bits)
+		return -1ULL;
+
+	return index;
 }
 
 /*
  * 'bit_nr' is already set. Find the next free bit after this one.
+ * Return -1 if no free bits found.
  */
 uint64_t axmap_next_free(struct axmap *axmap, uint64_t bit_nr)
 {
-	struct axmap_next_free_data data = { .level = -1U, .bit = bit_nr, };
 	uint64_t ret;
+	uint64_t next_bit = bit_nr + 1;
+	unsigned long temp;
+	uint64_t offset;
+	unsigned int bit;
 
-	if (firstfree_valid(axmap) && bit_nr < axmap->first_free)
-		return axmap->first_free;
+	if (bit_nr >= axmap->nr_bits)
+		return -1ULL;
 
-	if (!axmap_handler(axmap, bit_nr, axmap_next_free_fn, &data))
-		return axmap_first_free(axmap);
+	/* If at the end of the map, wrap-around */
+	if (next_bit == axmap->nr_bits)
+		next_bit = 0;
 
-	assert(data.level != -1U);
+	offset = next_bit >> UNIT_SHIFT;
+	bit = next_bit & BLOCKS_PER_UNIT_MASK;
 
 	/*
-	 * In the rare case that the map is unaligned, we might end up
-	 * finding an offset that's beyond the valid end. For that case,
-	 * find the first free one, the map is practically full.
+	 * As an optimization, do a quick check for a free bit
+	 * in the current word at level 0. If not found, do
+	 * a topdown search.
 	 */
-	ret = axmap_find_first_free(axmap, data.level, data.offset);
-	if (ret != -1ULL)
-		return ret;
+	temp = ~bit_masks[bit] & ~axmap->levels[0].map[offset];
+	if (temp) {
+		ret = (offset << UNIT_SHIFT) + ffz(~temp);
+
+		/* Might have found an unused bit at level 0 */
+		if (ret >= axmap->nr_bits)
+			ret = -1ULL;
+	} else
+		ret = axmap_find_first_free(axmap, next_bit);
 
-	return axmap_first_free(axmap);
+	/*
+	 * If there are no free bits starting at next_bit and going
+	 * to the end of the map, wrap around by searching again
+	 * starting at bit 0.
+	 */
+	if (ret == -1ULL && next_bit != 0)
+		ret = axmap_find_first_free(axmap, 0);
+	return ret;
 }
diff --git a/lib/axmap.h b/lib/axmap.h
index a7a6f94..55349d8 100644
--- a/lib/axmap.h
+++ b/lib/axmap.h
@@ -8,7 +8,6 @@ struct axmap;
 struct axmap *axmap_new(unsigned long nr_bits);
 void axmap_free(struct axmap *bm);
 
-void axmap_clear(struct axmap *axmap, uint64_t bit_nr);
 void axmap_set(struct axmap *axmap, uint64_t bit_nr);
 unsigned int axmap_set_nr(struct axmap *axmap, uint64_t bit_nr, unsigned int nr_bits);
 bool axmap_isset(struct axmap *axmap, uint64_t bit_nr);
diff --git a/options.c b/options.c
index 9ee1ba3..1c35acc 100644
--- a/options.c
+++ b/options.c
@@ -1845,6 +1845,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 
 #endif
+#ifdef CONFIG_IME
+			  { .ival = "ime_psync",
+			    .help = "DDN's IME synchronous IO engine",
+			  },
+			  { .ival = "ime_psyncv",
+			    .help = "DDN's IME synchronous IO engine using iovecs",
+			  },
+			  { .ival = "ime_aio",
+			    .help = "DDN's IME asynchronous IO engine",
+			  },
+#endif
 #ifdef CONFIG_LINUX_DEVDAX
 			  { .ival = "dev-dax",
 			    .help = "DAX Device based IO engine",
diff --git a/t/axmap.c b/t/axmap.c
index 1512737..1752439 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -9,8 +9,6 @@ static int test_regular(size_t size, int seed)
 {
 	struct fio_lfsr lfsr;
 	struct axmap *map;
-	size_t osize;
-	uint64_t ff;
 	int err;
 
 	printf("Using %llu entries...", (unsigned long long) size);
@@ -18,7 +16,6 @@ static int test_regular(size_t size, int seed)
 
 	lfsr_init(&lfsr, size, seed, seed & 0xF);
 	map = axmap_new(size);
-	osize = size;
 	err = 0;
 
 	while (size--) {
@@ -45,11 +42,154 @@ static int test_regular(size_t size, int seed)
 	if (err)
 		return err;
 
-	ff = axmap_next_free(map, osize);
-	if (ff != (uint64_t) -1ULL) {
-		printf("axmap_next_free broken: got %llu\n", (unsigned long long) ff);
+	printf("pass!\n");
+	axmap_free(map);
+	return 0;
+}
+
+static int check_next_free(struct axmap *map, uint64_t start, uint64_t expected)
+{
+
+	uint64_t ff;
+
+	ff = axmap_next_free(map, start);
+	if (ff != expected) {
+		printf("axmap_next_free broken: Expected %llu, got %llu\n",
+				(unsigned long long)expected, (unsigned long long) ff);
 		return 1;
 	}
+	return 0;
+}
+
+static int test_next_free(size_t size, int seed)
+{
+	struct fio_lfsr lfsr;
+	struct axmap *map;
+	size_t osize;
+	uint64_t ff, lastfree;
+	int err, i;
+
+	printf("Test next_free %llu entries...", (unsigned long long) size);
+	fflush(stdout);
+
+	map = axmap_new(size);
+	err = 0;
+
+
+	/* Empty map.  Next free after 0 should be 1. */
+	if (check_next_free(map, 0, 1))
+		err = 1;
+
+	/* Empty map.  Next free after 63 should be 64. */
+	if (check_next_free(map, 63, 64))
+		err = 1;
+
+	/* Empty map.  Next free after size - 2 should be size - 1 */
+	if (check_next_free(map, size - 2, size - 1))
+		err = 1;
+
+	/* Empty map.  Next free after size - 1 should be 0 */
+	if (check_next_free(map, size - 1, 0))
+		err = 1;
+
+	/* Empty map.  Next free after 63 should be 64. */
+	if (check_next_free(map, 63, 64))
+		err = 1;
+
+
+	/* Bit 63 set.  Next free after 62 should be 64. */
+	axmap_set(map, 63);
+	if (check_next_free(map, 62, 64))
+		err = 1;
+
+	/* Last bit set.  Next free after size - 2 should be 0. */
+	axmap_set(map, size - 1);
+	if (check_next_free(map, size - 2, 0))
+		err = 1;
+
+	/* Last bit set.  Next free after size - 1 should be 0. */
+	if (check_next_free(map, size - 1, 0))
+		err = 1;
+	
+	/* Last 64 bits set.  Next free after size - 66 or size - 65 should be 0. */
+	for (i = size - 65; i < size; i++)
+		axmap_set(map, i);
+	if (check_next_free(map, size - 66, 0))
+		err = 1;
+	if (check_next_free(map, size - 65, 0))
+		err = 1;
+	
+	/* Last 64 bits set.  Next free after size - 67 should be size - 66. */
+	if (check_next_free(map, size - 67, size - 66))
+		err = 1;
+
+	axmap_free(map);
+	
+	/* Start with a fresh map and mostly fill it up */
+	lfsr_init(&lfsr, size, seed, seed & 0xF);
+	map = axmap_new(size);
+	osize = size;
+
+	/* Leave 1 entry free */
+	size--;
+	while (size--) {
+		uint64_t val;
+
+		if (lfsr_next(&lfsr, &val)) {
+			printf("lfsr: short loop\n");
+			err = 1;
+			break;
+		}
+		if (axmap_isset(map, val)) {
+			printf("bit already set\n");
+			err = 1;
+			break;
+		}
+		axmap_set(map, val);
+		if (!axmap_isset(map, val)) {
+			printf("bit not set\n");
+			err = 1;
+			break;
+		}
+	}
+
+	/* Get last free bit */
+	lastfree = axmap_next_free(map, 0);
+	if (lastfree == -1ULL) {
+		printf("axmap_next_free broken: Couldn't find last free bit\n");
+		err = 1;
+	}
+
+	/* Start with last free bit and test wrap-around */
+	ff = axmap_next_free(map, lastfree);
+	if (ff != lastfree) {
+		printf("axmap_next_free broken: wrap-around test #1 failed\n");
+		err = 1;
+	}
+
+	/* Start with last bit and test wrap-around */
+	ff = axmap_next_free(map, osize - 1);
+	if (ff != lastfree) {
+		printf("axmap_next_free broken: wrap-around test #2 failed\n");
+		err = 1;
+	}
+
+	/* Set last free bit */
+	axmap_set(map, lastfree);
+	ff = axmap_next_free(map, 0);
+	if (ff != -1ULL) {
+		printf("axmap_next_free broken: Expected -1 from full map\n");
+		err = 1;
+	}
+
+	ff = axmap_next_free(map, osize);
+	if (ff != -1ULL) {
+		printf("axmap_next_free broken: Expected -1 from out of bounds request\n");
+		err = 1;
+	}
+
+	if (err)
+		return err;
 
 	printf("pass!\n");
 	axmap_free(map);
@@ -269,6 +409,16 @@ int main(int argc, char *argv[])
 		return 3;
 	if (test_overlap())
 		return 4;
+	if (test_next_free(size, seed))
+		return 5;
+
+	/* Test 3 levels, all full:  64*64*64 */
+	if (test_next_free(64*64*64, seed))
+		return 6;
+
+	/* Test 4 levels, with 2 inner levels not full */
+	if (test_next_free(((((64*64)-63)*64)-63)*64*12, seed))
+		return 7;
 
 	return 0;
 }
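The cases in test_next_free() pin down the wrap-around contract of axmap_next_free(). It searches for the first clear bit strictly after bit_nr and continues from bit 0 if none is found. It returns -1ULL for a full map or an out-of-range bit_nr. Because of the wrap, a lone free bit is returned even when it is itself the starting point. The following is a flat reference model in Python, not fio's multi-level axmap. next_free and NO_FREE are illustrative names, and NO_FREE stands in for -1ULL.

```python
NO_FREE = -1  # illustrative stand-in for fio's -1ULL "no free bit" sentinel


def next_free(bits, bit_nr):
    """Return the first clear bit after bit_nr, wrapping around to bit 0.

    bits is a list of booleans (True = set). Returns NO_FREE when the map
    is full or bit_nr lies outside the map.
    """
    size = len(bits)
    if bit_nr >= size:
        return NO_FREE  # out-of-bounds request
    # Probe bit_nr+1 .. size-1, then 0 .. bit_nr; the final probe is bit_nr itself.
    for step in range(1, size + 1):
        candidate = (bit_nr + step) % size
        if not bits[candidate]:
            return candidate
    return NO_FREE  # every bit is set


# Only the last bit of a 200-bit map is set: the search wraps to bit 0.
bits = [False] * 200
bits[199] = True
print(next_free(bits, 199))  # 0
```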
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
new file mode 100755
index 0000000..50254dc
--- /dev/null
+++ b/t/steadystate_tests.py
@@ -0,0 +1,226 @@
+#!/usr/bin/python2.7
+# Note: this script is python2 and python 3 compatible.
+#
+# steadystate_tests.py
+#
+# Test option parsing and functonality for fio's steady state detection feature.
+#
+# steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
+#
+# REQUIREMENTS
+# Python 2.6+
+# SciPy
+#
+# KNOWN ISSUES
+# only option parsing and read tests are carried out
+# On Windows this script works under Cygwin but not from cmd.exe
+# On Windows I encounter frequent fio problems generating JSON output (nothing to decode)
+# min runtime:
+# if ss attained: min runtime = ss_dur + ss_ramp
+# if not attained: runtime = timeout
+
+from __future__ import absolute_import
+from __future__ import print_function
+import os
+import sys
+import json
+import uuid
+import pprint
+import argparse
+import subprocess
+from scipy import stats
+from six.moves import range
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('fio',
+                        help='path to fio executable')
+    parser.add_argument('--read',
+                        help='target for read testing')
+    parser.add_argument('--write',
+                        help='target for write testing')
+    args = parser.parse_args()
+
+    return args
+
+
+def check(data, iops, slope, pct, limit, dur, criterion):
+    measurement = 'iops' if iops else 'bw'
+    data = data[measurement]
+    mean = sum(data) / len(data)
+    if slope:
+        x = list(range(len(data)))
+        m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
+        m = abs(m)
+        if pct:
+            target = m / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = m
+    else:
+        maxdev = 0
+        for x in data:
+            maxdev = max(abs(mean-x), maxdev)
+        if pct:
+            target = maxdev / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = maxdev
+
+    criterion = float(criterion)
+    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    pp = pprint.PrettyPrinter(indent=4)
+
+#
+# test option parsing
+#
+    parsing = [ { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10", "--ss_ramp=5"],
+                  'output': "set steady state IOPS threshold to 10.000000" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:12", "--ss_ramp=5"],
+                  'output': "set steady state BW threshold to 12" },
+              ]
+    for test in parsing:
+        output = subprocess.check_output([args.fio] + test['args'])
+        if test['output'] in output.decode():
+            print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
+        else:
+            print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
+
+#
+# test some read workloads
+#
+# if ss active and attained,
+#   check that runtime is less than job time
+#   check criteria
+#   how to check ramp time?
+#
+# if ss inactive
+#   check that runtime is what was specified
+#
+    reads = [ {'s': True, 'timeout': 100, 'numjobs': 1, 'ss_dur': 5, 'ss_ramp': 3, 'iops': True, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': False, 'timeout': 20, 'numjobs': 2},
+              {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+            ]
+
+    if args.read == None:
+        if os.name == 'posix':
+            args.read = '/dev/zero'
+            extra = [ "--size=134217728" ]  # 128 MiB
+        else:
+            print("ERROR: file for read testing must be specified on non-posix systems")
+            sys.exit(1)
+    else:
+        extra = []
+
+    jobnum = 0
+    for job in reads:
+
+        tf = uuid.uuid4().hex
+        parameters = [ "--name=job{0}".format(jobnum) ]
+        parameters.extend(extra)
+        parameters.extend([ "--thread",
+                            "--output-format=json",
+                            "--output={0}".format(tf),
+                            "--filename={0}".format(args.read),
+                            "--rw=randrw",
+                            "--rwmixread=100",
+                            "--stonewall",
+                            "--group_reporting",
+                            "--numjobs={0}".format(job['numjobs']),
+                            "--time_based",
+                            "--runtime={0}".format(job['timeout']) ])
+        if job['s']:
+           if job['iops']:
+               ss = 'iops'
+           else:
+               ss = 'bw'
+           if job['slope']:
+               ss += "_slope"
+           ss += ":" + str(job['ss_limit'])
+           if job['pct']:
+               ss += '%'
+           parameters.extend([ '--ss_dur={0}'.format(job['ss_dur']),
+                               '--ss={0}'.format(ss),
+                               '--ss_ramp={0}'.format(job['ss_ramp']) ])
+
+        output = subprocess.call([args.fio] + parameters)
+        with open(tf, 'r') as source:
+            jsondata = json.loads(source.read())
+        os.remove(tf)
+
+        for jsonjob in jsondata['jobs']:
+            line = "job {0}".format(jsonjob['job options']['name'])
+            if job['s']:
+                if jsonjob['steadystate']['attained'] == 1:
+                    # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
+                    mintime = (job['ss_dur'] + job['ss_ramp']) * 1000
+                    actual = jsonjob['read']['runtime']
+                    if mintime > actual:
+                        line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
+                    else:
+                        line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                        else:
+                            if met:
+                                line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
+                            else:
+                                line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                else:
+                    # check runtime, confirm criterion calculation, and confirm that criterion was not met
+                    expected = job['timeout'] * 1000
+                    actual = jsonjob['read']['runtime']
+                    if abs(expected - actual) > 10:
+                        line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
+                    else:
+                        line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
+                                line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                            else:
+                                line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
+                        else:
+                            if met:
+                                line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                            else:
+                                line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
+            else:
+                expected = job['timeout'] * 1000
+                actual = jsonjob['read']['runtime']
+                if abs(expected - actual) < 10:
+                    result = 'PASSED '
+                else:
+                    result = 'FAILED '
+                line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
+            print(line)
+            if 'steadystate' in jsonjob:
+                pp.pprint(jsonjob['steadystate'])
+        jobnum += 1
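check() above recomputes fio's steady-state criterion from the per-interval samples in the JSON output. For the *_slope modes it uses the absolute least-squares slope, and otherwise the maximum deviation from the mean. Either value can be expressed as a percentage of the mean. The sketch below performs the same arithmetic without SciPy. The function names are hypothetical. The zero-criterion guard is an addition of this sketch, since the script's 0.5% relative comparison divides by the criterion fio reports.

```python
def slope(samples):
    """Least-squares slope of samples against index 0, 1, 2, ... (needs 2+ samples)."""
    n = len(samples)
    x_mean = (n - 1) / 2.0
    y_mean = sum(samples) / float(n)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def criterion(samples, use_slope, pct):
    """Recompute the steady-state criterion the way check() does."""
    mean = sum(samples) / float(len(samples))
    if use_slope:
        value = abs(slope(samples))
    else:
        value = max(abs(mean - s) for s in samples)
    return value / mean * 100 if pct else value


def criteria_match(calculated, reported, rel_tol=0.005):
    """Compare with fio's reported criterion string, e.g. '15.384615%'."""
    reported = float(reported.rstrip('%'))
    if reported == 0:
        return calculated == 0  # avoid dividing by zero
    return abs(calculated - reported) / reported < rel_tol
```

For samples 10, 12, 14 and 16, the slope is 2 and the mean is 13, so the percentage criterion is 2 / 13 * 100, roughly 15.38%.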
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
deleted file mode 100755
index 50254dc..0000000
--- a/unit_tests/steadystate_tests.py
+++ /dev/null
@@ -1,226 +0,0 @@
-#!/usr/bin/python2.7
-# Note: this script is python2 and python 3 compatible.
-#
-# steadystate_tests.py
-#
-# Test option parsing and functonality for fio's steady state detection feature.
-#
-# steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
-#
-# REQUIREMENTS
-# Python 2.6+
-# SciPy
-#
-# KNOWN ISSUES
-# only option parsing and read tests are carried out
-# On Windows this script works under Cygwin but not from cmd.exe
-# On Windows I encounter frequent fio problems generating JSON output (nothing to decode)
-# min runtime:
-# if ss attained: min runtime = ss_dur + ss_ramp
-# if not attained: runtime = timeout
-
-from __future__ import absolute_import
-from __future__ import print_function
-import os
-import sys
-import json
-import uuid
-import pprint
-import argparse
-import subprocess
-from scipy import stats
-from six.moves import range
-
-def parse_args():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('fio',
-                        help='path to fio executable')
-    parser.add_argument('--read',
-                        help='target for read testing')
-    parser.add_argument('--write',
-                        help='target for write testing')
-    args = parser.parse_args()
-
-    return args
-
-
-def check(data, iops, slope, pct, limit, dur, criterion):
-    measurement = 'iops' if iops else 'bw'
-    data = data[measurement]
-    mean = sum(data) / len(data)
-    if slope:
-        x = list(range(len(data)))
-        m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
-        m = abs(m)
-        if pct:
-            target = m / mean * 100
-            criterion = criterion[:-1]
-        else:
-            target = m
-    else:
-        maxdev = 0
-        for x in data:
-            maxdev = max(abs(mean-x), maxdev)
-        if pct:
-            target = maxdev / mean * 100
-            criterion = criterion[:-1]
-        else:
-            target = maxdev
-
-    criterion = float(criterion)
-    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
-
-
-if __name__ == '__main__':
-    args = parse_args()
-
-    pp = pprint.PrettyPrinter(indent=4)
-
-#
-# test option parsing
-#
-    parsing = [ { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10", "--ss_ramp=5"],
-                  'output': "set steady state IOPS threshold to 10.000000" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 10.000000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:.1%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 0.100000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:10%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 10.000000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:.1%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 0.100000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:12", "--ss_ramp=5"],
-                  'output': "set steady state BW threshold to 12" },
-              ]
-    for test in parsing:
-        output = subprocess.check_output([args.fio] + test['args'])
-        if test['output'] in output.decode():
-            print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
-        else:
-            print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
-
-#
-# test some read workloads
-#
-# if ss active and attained,
-#   check that runtime is less than job time
-#   check criteria
-#   how to check ramp time?
-#
-# if ss inactive
-#   check that runtime is what was specified
-#
-    reads = [ {'s': True, 'timeout': 100, 'numjobs': 1, 'ss_dur': 5, 'ss_ramp': 3, 'iops': True, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-              {'s': False, 'timeout': 20, 'numjobs': 2},
-              {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-            ]
-
-    if args.read == None:
-        if os.name == 'posix':
-            args.read = '/dev/zero'
-            extra = [ "--size=134217728" ]  # 128 MiB
-        else:
-            print("ERROR: file for read testing must be specified on non-posix systems")
-            sys.exit(1)
-    else:
-        extra = []
-
-    jobnum = 0
-    for job in reads:
-
-        tf = uuid.uuid4().hex
-        parameters = [ "--name=job{0}".format(jobnum) ]
-        parameters.extend(extra)
-        parameters.extend([ "--thread",
-                            "--output-format=json",
-                            "--output={0}".format(tf),
-                            "--filename={0}".format(args.read),
-                            "--rw=randrw",
-                            "--rwmixread=100",
-                            "--stonewall",
-                            "--group_reporting",
-                            "--numjobs={0}".format(job['numjobs']),
-                            "--time_based",
-                            "--runtime={0}".format(job['timeout']) ])
-        if job['s']:
-           if job['iops']:
-               ss = 'iops'
-           else:
-               ss = 'bw'
-           if job['slope']:
-               ss += "_slope"
-           ss += ":" + str(job['ss_limit'])
-           if job['pct']:
-               ss += '%'
-           parameters.extend([ '--ss_dur={0}'.format(job['ss_dur']),
-                               '--ss={0}'.format(ss),
-                               '--ss_ramp={0}'.format(job['ss_ramp']) ])
-
-        output = subprocess.call([args.fio] + parameters)
-        with open(tf, 'r') as source:
-            jsondata = json.loads(source.read())
-        os.remove(tf)
-
-        for jsonjob in jsondata['jobs']:
-            line = "job {0}".format(jsonjob['job options']['name'])
-            if job['s']:
-                if jsonjob['steadystate']['attained'] == 1:
-                    # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
-                    mintime = (job['ss_dur'] + job['ss_ramp']) * 1000
-                    actual = jsonjob['read']['runtime']
-                    if mintime > actual:
-                        line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
-                    else:
-                        line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
-                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
-                            iops=job['iops'],
-                            slope=job['slope'],
-                            pct=job['pct'],
-                            limit=job['ss_limit'],
-                            dur=job['ss_dur'],
-                            criterion=jsonjob['steadystate']['criterion'])
-                        if not objsame:
-                            line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
-                        else:
-                            if met:
-                                line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
-                            else:
-                                line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
-                else:
-                    # check runtime, confirm criterion calculation, and confirm that criterion was not met
-                    expected = job['timeout'] * 1000
-                    actual = jsonjob['read']['runtime']
-                    if abs(expected - actual) > 10:
-                        line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
-                    else:
-                        line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
-                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
-                            iops=job['iops'],
-                            slope=job['slope'],
-                            pct=job['pct'],
-                            limit=job['ss_limit'],
-                            dur=job['ss_dur'],
-                            criterion=jsonjob['steadystate']['criterion'])
-                        if not objsame:
-                            if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
-                                line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
-                            else:
-                                line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
-                        else:
-                            if met:
-                                line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
-                            else:
-                                line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
-            else:
-                expected = job['timeout'] * 1000
-                actual = jsonjob['read']['runtime']
-                if abs(expected - actual) < 10:
-                    result = 'PASSED '
-                else:
-                    result = 'FAILED '
-                line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
-            print(line)
-            if 'steadystate' in jsonjob:
-                pp.pprint(jsonjob['steadystate'])
-        jobnum += 1
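The read-workload loop builds the --ss value from four parts: the metric, an optional _slope suffix, a colon, and the limit with an optional %. A job that attains steady state is then expected to run for at least ss_dur + ss_ramp seconds, and fio reports runtime in milliseconds. The helpers below restate that logic in isolation. Their names are hypothetical, and they are not part of the script.

```python
def ss_args(iops, slope, limit, pct, dur, ramp):
    """Build the steady-state options the test passes to fio."""
    metric = 'iops' if iops else 'bw'
    if slope:
        metric += '_slope'
    ss = '{0}:{1}{2}'.format(metric, limit, '%' if pct else '')
    return ['--ss_dur={0}'.format(dur),
            '--ss={0}'.format(ss),
            '--ss_ramp={0}'.format(ramp)]


def min_runtime_ms(dur, ramp):
    """Shortest runtime (ms) consistent with steady state being attained."""
    return (dur + ramp) * 1000


print(ss_args(True, True, 0.1, True, 5, 3))
# ['--ss_dur=5', '--ss=iops_slope:0.1%', '--ss_ramp=3']
```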



* Recent changes (master)
@ 2018-08-17 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-17 12:00 UTC
  To: fio

The following changes since commit 35e2d88fad2151f272af60babb5e6c98922d0bcd:

  Fix compilation on centos 7 (2018-08-15 12:06:14 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9ee669faa39003c2317e5df892314bcfcee069e3:

  configure: avoid pkg-config usage for http engine (2018-08-16 18:58:27 +0200)

----------------------------------------------------------------
David Disseldorp (3):
      configure: use pkg-config to detect libcurl & openssl
      engines/http: support openssl < 1.1.0
      configure: avoid pkg-config usage for http engine

Jens Axboe (3):
      Merge branch 'http_older_openssl' of https://github.com/ddiss/fio
      engines/http: fix use of uninitialized variable
      Merge branch 'wip-http-swift' of https://github.com/l-mb/fio

Lars Marowsky-Bree (1):
      engines/http: Add support for Swift storage backend

 HOWTO                    |  17 ++++--
 configure                |  35 ++++++++++--
 engines/http.c           | 135 +++++++++++++++++++++++++++++++++++++++++------
 examples/http-s3.fio     |   4 +-
 examples/http-swift.fio  |  32 +++++++++++
 examples/http-webdav.fio |   4 +-
 fio.1                    |  16 ++++--
 7 files changed, 209 insertions(+), 34 deletions(-)
 create mode 100644 examples/http-swift.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c77dad1..743144f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2137,14 +2137,16 @@ with the caveat that when used on the command line, they must come after the
 
 	Password for HTTP authentication.
 
-.. option:: https=bool : [http]
+.. option:: https=str : [http]
 
-	Enable HTTPS instead of http. Default is **0**
+	Enable HTTPS instead of http. *on* enables HTTPS; *insecure*
+	will enable HTTPS, but disable SSL peer verification (use with
+	caution!). Default is **off**
 
-.. option:: http_s3=bool : [http]
+.. option:: http_mode=str : [http]
 
-	Enable S3 specific HTTP headers such as authenticating requests
-	with AWS Signature Version 4. Default is **0**
+	Which HTTP access mode to use: *webdav*, *swift*, or *s3*.
+	Default is **webdav**
 
 .. option:: http_s3_region=str : [http]
 
@@ -2159,6 +2161,11 @@ with the caveat that when used on the command line, they must come after the
 
 	The S3 key/access id.
 
+.. option:: http_swift_auth_token=str : [http]
+
+	The Swift auth token. See the example configuration file on how
+	to retrieve this.
+
 .. option:: http_verbose=int : [http]
 
 	Enable verbose requests from libcurl. Useful for debugging. 1
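In engine terms, the three https values choose the URL scheme and whether libcurl verifies the server. *off* builds an http:// URL. *on* builds an https:// URL and leaves verification at libcurl's defaults. *insecure* also sets CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST to 0. Each I/O targets an object named <filename>_<offset>_<length>, appended directly to the host, so the filename is expected to start with "/" as in the example job files. The Python sketch below models only this mapping. request_url and object_name are illustrative names, not part of fio.

```python
HTTPS_MODES = ('off', 'on', 'insecure')  # accepted values of https=


def object_name(file_name, offset, length):
    """Per-I/O object key, mirroring the engine's "%s_%llu_%llu" format."""
    return '%s_%d_%d' % (file_name, offset, length)


def request_url(https, host, obj):
    """Return (url, insecure) for one request.

    insecure=True corresponds to clearing CURLOPT_SSL_VERIFYPEER and
    CURLOPT_SSL_VERIFYHOST in the engine's setup code.
    """
    if https not in HTTPS_MODES:
        raise ValueError('https must be one of %s' % (HTTPS_MODES,))
    scheme = 'http' if https == 'off' else 'https'
    return '%s://%s%s' % (scheme, host, obj), https == 'insecure'
```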
diff --git a/configure b/configure
index 103ea94..a03f7fa 100755
--- a/configure
+++ b/configure
@@ -14,12 +14,13 @@ else
 fi
 
 TMPC="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.c"
+TMPC2="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}-2.c"
 TMPO="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.o"
 TMPE="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.exe"
 
 # NB: do not call "exit" in the trap handler; this is buggy with some shells;
 # see <1285349658-3122-1-git-send-email-loic.minier@linaro.org>
-trap "rm -f $TMPC $TMPO $TMPE" EXIT INT QUIT TERM
+trap "rm -f $TMPC $TMPC2 $TMPO $TMPE" EXIT INT QUIT TERM
 
 rm -rf config.log
 
@@ -1573,6 +1574,7 @@ print_config "IPv6 helpers" "$ipv6"
 if test "$http" != "yes" ; then
   http="no"
 fi
+# check for openssl >= 1.1.0, which uses an opaque HMAC_CTX pointer
 cat > $TMPC << EOF
 #include <curl/curl.h>
 #include <openssl/hmac.h>
@@ -1591,9 +1593,34 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if test "$disable_http" != "yes"  && compile_prog "" "-lcurl -lssl -lcrypto" "curl"; then
-  LIBS="-lcurl -lssl -lcrypto $LIBS"
-  http="yes"
+# openssl < 1.1.0 uses the HMAC_CTX type directly
+cat > $TMPC2 << EOF
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+
+int main(int argc, char **argv)
+{
+  CURL *curl;
+  HMAC_CTX ctx;
+
+  curl = curl_easy_init();
+  curl_easy_cleanup(curl);
+
+  HMAC_CTX_init(&ctx);
+  HMAC_CTX_cleanup(&ctx);
+  return 0;
+}
+EOF
+if test "$disable_http" != "yes"; then
+  HTTP_LIBS="-lcurl -lssl -lcrypto"
+  if compile_prog "" "$HTTP_LIBS" "curl-new-ssl"; then
+    output_sym "CONFIG_HAVE_OPAQUE_HMAC_CTX"
+    http="yes"
+    LIBS="$HTTP_LIBS $LIBS"
+  elif mv $TMPC2 $TMPC && compile_prog "" "$HTTP_LIBS" "curl-old-ssl"; then
+    http="yes"
+    LIBS="$HTTP_LIBS $LIBS"
+  fi
 fi
 print_config "http engine" "$http"
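The reworked probe first tries the program aimed at OpenSSL >= 1.1.0's opaque HMAC_CTX pointer. Only if that fails does it move the pre-1.1.0 program, which declares an HMAC_CTX on the stack, into place and try again. Only the first success defines CONFIG_HAVE_OPAQUE_HMAC_CTX, which selects the HMAC_CTX_new()/HMAC_CTX_free() path in engines/http.c. A Python sketch of that decision order follows. Here compiles is an assumed stand-in for configure's compile_prog shell function.

```python
HTTP_LIBS = ['-lcurl', '-lssl', '-lcrypto']


def probe_http(compiles, disable_http=False):
    """Return (http_enabled, defines, libs) following configure's probe order."""
    if disable_http:
        return False, [], []
    if compiles('curl-new-ssl', HTTP_LIBS):
        # OpenSSL >= 1.1.0: http.c allocates with HMAC_CTX_new()
        return True, ['CONFIG_HAVE_OPAQUE_HMAC_CTX'], HTTP_LIBS
    if compiles('curl-old-ssl', HTTP_LIBS):
        # OpenSSL < 1.1.0: stack HMAC_CTX released with HMAC_CTX_cleanup()
        return True, [], HTTP_LIBS
    return False, [], []
```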
 
diff --git a/engines/http.c b/engines/http.c
index d3fdba8..cb66ebe 100644
--- a/engines/http.c
+++ b/engines/http.c
@@ -25,25 +25,37 @@
 #include <curl/curl.h>
 #include <openssl/hmac.h>
 #include <openssl/sha.h>
+#include <openssl/md5.h>
 #include "fio.h"
 #include "../optgroup.h"
 
 
+enum {
+	FIO_HTTP_WEBDAV	    = 0,
+	FIO_HTTP_S3	    = 1,
+	FIO_HTTP_SWIFT	    = 2,
+
+	FIO_HTTPS_OFF	    = 0,
+	FIO_HTTPS_ON	    = 1,
+	FIO_HTTPS_INSECURE  = 2,
+};
+
 struct http_data {
 	CURL *curl;
 };
 
 struct http_options {
 	void *pad;
-	int  https;
+	unsigned int https;
 	char *host;
 	char *user;
 	char *pass;
 	char *s3_key;
 	char *s3_keyid;
 	char *s3_region;
+	char *swift_auth_token;
 	int verbose;
-	int s3;
+	unsigned int mode;
 };
 
 struct http_curl_stream {
@@ -56,10 +68,24 @@ static struct fio_option options[] = {
 	{
 		.name     = "https",
 		.lname    = "https",
-		.type     = FIO_OPT_BOOL,
+		.type     = FIO_OPT_STR,
 		.help     = "Enable https",
 		.off1     = offsetof(struct http_options, https),
-		.def      = "0",
+		.def      = "off",
+		.posval = {
+			  { .ival = "off",
+			    .oval = FIO_HTTPS_OFF,
+			    .help = "No HTTPS",
+			  },
+			  { .ival = "on",
+			    .oval = FIO_HTTPS_ON,
+			    .help = "Enable HTTPS",
+			  },
+			  { .ival = "insecure",
+			    .oval = FIO_HTTPS_INSECURE,
+			    .help = "Enable HTTPS, disable peer verification",
+			  },
+		},
 		.category = FIO_OPT_C_ENGINE,
 		.group    = FIO_OPT_G_HTTP,
 	},
@@ -112,6 +138,16 @@ static struct fio_option options[] = {
 		.group    = FIO_OPT_G_HTTP,
 	},
 	{
+		.name     = "http_swift_auth_token",
+		.lname    = "Swift auth token",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "OpenStack Swift auth token",
+		.off1     = offsetof(struct http_options, swift_auth_token),
+		.def	  = "",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
 		.name     = "http_s3_region",
 		.lname    = "S3 region",
 		.type     = FIO_OPT_STR_STORE,
@@ -122,18 +158,32 @@ static struct fio_option options[] = {
 		.group    = FIO_OPT_G_HTTP,
 	},
 	{
-		.name     = "http_s3",
-		.lname    = "S3 extensions",
-		.type     = FIO_OPT_BOOL,
-		.help     = "Whether to enable S3 specific headers",
-		.off1     = offsetof(struct http_options, s3),
-		.def	  = "0",
+		.name     = "http_mode",
+		.lname    = "Request mode to use",
+		.type     = FIO_OPT_STR,
+		.help     = "Whether to use WebDAV, Swift, or S3",
+		.off1     = offsetof(struct http_options, mode),
+		.def	  = "webdav",
+		.posval = {
+			  { .ival = "webdav",
+			    .oval = FIO_HTTP_WEBDAV,
+			    .help = "WebDAV server",
+			  },
+			  { .ival = "s3",
+			    .oval = FIO_HTTP_S3,
+			    .help = "S3 storage backend",
+			  },
+			  { .ival = "swift",
+			    .oval = FIO_HTTP_SWIFT,
+			    .help = "OpenStack Swift storage",
+			  },
+		},
 		.category = FIO_OPT_C_ENGINE,
 		.group    = FIO_OPT_G_HTTP,
 	},
 	{
 		.name     = "http_verbose",
-		.lname    = "CURL verbosity",
+		.lname    = "HTTP verbosity level",
 		.type     = FIO_OPT_INT,
 		.help     = "increase http engine verbosity",
 		.off1     = offsetof(struct http_options, verbose),
@@ -204,15 +254,34 @@ static char *_gen_hex_sha256(const char *p, size_t len)
 	return _conv_hex(hash, SHA256_DIGEST_LENGTH);
 }
 
+static char *_gen_hex_md5(const char *p, size_t len)
+{
+	unsigned char hash[MD5_DIGEST_LENGTH];
+
+	MD5((unsigned char*)p, len, hash);
+	return _conv_hex(hash, MD5_DIGEST_LENGTH);
+}
+
 static void _hmac(unsigned char *md, void *key, int key_len, char *data) {
+#ifndef CONFIG_HAVE_OPAQUE_HMAC_CTX
+	HMAC_CTX _ctx;
+#endif
 	HMAC_CTX *ctx;
 	unsigned int hmac_len;
 
+#ifdef CONFIG_HAVE_OPAQUE_HMAC_CTX
 	ctx = HMAC_CTX_new();
+#else
+	ctx = &_ctx;
+#endif
 	HMAC_Init_ex(ctx, key, key_len, EVP_sha256(), NULL);
 	HMAC_Update(ctx, (unsigned char*)data, strlen(data));
 	HMAC_Final(ctx, md, &hmac_len);
+#ifdef CONFIG_HAVE_OPAQUE_HMAC_CTX
 	HMAC_CTX_free(ctx);
+#else
+	HMAC_CTX_cleanup(ctx);
+#endif
 }
 
 static int _curl_trace(CURL *handle, curl_infotype type,
@@ -338,6 +407,29 @@ static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct ht
 	free(signature);
 }
 
+static void _add_swift_header(CURL *curl, struct curl_slist *slist, struct http_options *o,
+		int op, const char *uri, char *buf, size_t len)
+{
+	char *dsha = NULL;
+	char s[512];
+
+	if (op == DDIR_WRITE) {
+		dsha = _gen_hex_md5(buf, len);
+	}
+	/* Surpress automatic Accept: header */
+	slist = curl_slist_append(slist, "Accept:");
+
+	snprintf(s, sizeof(s), "etag: %s", dsha);
+	slist = curl_slist_append(slist, s);
+
+	snprintf(s, sizeof(s), "x-auth-token: %s", o->swift_auth_token);
+	slist = curl_slist_append(slist, s);
+
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
+
+	free(dsha);
+}
+
 static void fio_http_cleanup(struct thread_data *td)
 {
 	struct http_data *http = td->io_ops_data;
@@ -402,17 +494,24 @@ static enum fio_q_status fio_http_queue(struct thread_data *td,
 
 	fio_ro_check(td, io_u);
 	memset(&_curl_stream, 0, sizeof(_curl_stream));
-	snprintf(object, sizeof(object), "%s_%llu_%llu", td->files[0]->file_name, io_u->offset, io_u->xfer_buflen);
-	snprintf(url, sizeof(url), "%s://%s%s", o->https ? "https" : "http", o->host, object);
+	snprintf(object, sizeof(object), "%s_%llu_%llu", td->files[0]->file_name,
+		io_u->offset, io_u->xfer_buflen);
+	if (o->https == FIO_HTTPS_OFF)
+		snprintf(url, sizeof(url), "http://%s%s", o->host, object);
+	else
+		snprintf(url, sizeof(url), "https://%s%s", o->host, object);
 	curl_easy_setopt(http->curl, CURLOPT_URL, url);
 	_curl_stream.buf = io_u->xfer_buf;
 	_curl_stream.max = io_u->xfer_buflen;
 	curl_easy_setopt(http->curl, CURLOPT_SEEKDATA, &_curl_stream);
 	curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)io_u->xfer_buflen);
 
-	if (o->s3)
+	if (o->mode == FIO_HTTP_S3)
 		_add_aws_auth_header(http->curl, slist, o, io_u->ddir, object,
 			io_u->xfer_buf, io_u->xfer_buflen);
+	else if (o->mode == FIO_HTTP_SWIFT)
+		_add_swift_header(http->curl, slist, o, io_u->ddir, object,
+			io_u->xfer_buf, io_u->xfer_buflen);
 
 	if (io_u->ddir == DDIR_WRITE) {
 		curl_easy_setopt(http->curl, CURLOPT_READDATA, &_curl_stream);
@@ -487,7 +586,7 @@ static int fio_http_setup(struct thread_data *td)
 {
 	struct http_data *http = NULL;
 	struct http_options *o = td->eo;
-	int r;
+
 	/* allocate engine specific structure to deal with libhttp. */
 	http = calloc(1, sizeof(*http));
 	if (!http) {
@@ -503,6 +602,10 @@ static int fio_http_setup(struct thread_data *td)
 	curl_easy_setopt(http->curl, CURLOPT_NOPROGRESS, 1L);
 	curl_easy_setopt(http->curl, CURLOPT_FOLLOWLOCATION, 1L);
 	curl_easy_setopt(http->curl, CURLOPT_PROTOCOLS, CURLPROTO_HTTP|CURLPROTO_HTTPS);
+	if (o->https == FIO_HTTPS_INSECURE) {
+		curl_easy_setopt(http->curl, CURLOPT_SSL_VERIFYPEER, 0L);
+		curl_easy_setopt(http->curl, CURLOPT_SSL_VERIFYHOST, 0L);
+	}
 	curl_easy_setopt(http->curl, CURLOPT_READFUNCTION, _http_read);
 	curl_easy_setopt(http->curl, CURLOPT_WRITEFUNCTION, _http_write);
 	curl_easy_setopt(http->curl, CURLOPT_SEEKFUNCTION, _http_seek);
@@ -520,7 +623,7 @@ static int fio_http_setup(struct thread_data *td)
 	return 0;
 cleanup:
 	fio_http_cleanup(td);
-	return r;
+	return 1;
 }
 
 static int fio_http_open(struct thread_data *td, struct fio_file *f)
diff --git a/examples/http-s3.fio b/examples/http-s3.fio
index a9805da..2dcae36 100644
--- a/examples/http-s3.fio
+++ b/examples/http-s3.fio
@@ -9,8 +9,8 @@ name=test
 direct=1
 filename=/larsmb-fio-test/object
 http_verbose=0
-https=1
-http_s3=1
+https=on
+http_mode=s3
 http_s3_key=${S3_KEY}
 http_s3_keyid=${S3_ID}
 http_host=s3.eu-central-1.amazonaws.com
diff --git a/examples/http-swift.fio b/examples/http-swift.fio
new file mode 100644
index 0000000..b591adb
--- /dev/null
+++ b/examples/http-swift.fio
@@ -0,0 +1,32 @@
+[global]
+ioengine=http
+rw=randwrite
+name=test
+direct=1
+http_verbose=0
+http_mode=swift
+https=on
+# This is the hostname and port portion of the public access link for
+# the container:
+http_host=swift.srv.openstack.local:8081
+filename_format=/swift/v1/fio-test/bucket.$jobnum
+group_reporting
+bs=64k
+size=1M
+# Currently, fio cannot yet generate the Swift Auth-Token itself.
+# You need to set this prior to running fio via
+# eval $(openstack token issue -f shell --prefix SWIFT_) ; export SWIFT_id
+http_swift_auth_token=${SWIFT_id}
+
+[create]
+numjobs=1
+rw=randwrite
+io_size=256k
+verify=sha256
+
+# This will delete all created objects again
+[trim]
+stonewall
+numjobs=1
+rw=trim
+io_size=64k
diff --git a/examples/http-webdav.fio b/examples/http-webdav.fio
index c0624f8..2d1ca73 100644
--- a/examples/http-webdav.fio
+++ b/examples/http-webdav.fio
@@ -4,8 +4,8 @@ rw=randwrite
 name=test
 direct=1
 http_verbose=0
-http_s3=0
-https=0
+http_mode=webdav
+https=off
 http_host=localhost
 filename_format=/dav/bucket.$jobnum
 group_reporting
diff --git a/fio.1 b/fio.1
index 883a31b..73a0422 100644
--- a/fio.1
+++ b/fio.1
@@ -1829,12 +1829,14 @@ Username for HTTP authentication.
 .BI (http)http_pass \fR=\fPstr
 Password for HTTP authentication.
 .TP
-.BI (http)https \fR=\fPbool
-Whether to use HTTPS instead of plain HTTP. Default is \fB0\fR.
+.BI (http)https \fR=\fPstr
+Whether to use HTTPS instead of plain HTTP. \fRon\fP enables HTTPS;
+\fRinsecure\fP will enable HTTPS, but disable SSL peer verification (use
+with caution!).  Default is \fBoff\fR.
 .TP
-.BI (http)http_s3 \fR=\fPbool
-Include S3 specific HTTP headers such as authenticating requests with
-AWS Signature Version 4. Default is \fB0\fR.
+.BI (http)http_mode \fR=\fPstr
+Which HTTP access mode to use: webdav, swift, or s3. Default is
+\fBwebdav\fR.
 .TP
 .BI (http)http_s3_region \fR=\fPstr
 The S3 region/zone to include in the request. Default is \fBus-east-1\fR.
@@ -1845,6 +1847,10 @@ The S3 secret key.
 .BI (http)http_s3_keyid \fR=\fPstr
 The S3 key/access id.
 .TP
+.BI (http)http_swift_auth_token \fR=\fPstr
+The Swift auth token. See the example configuration file on how to
+retrieve this.
+.TP
 .BI (http)http_verbose \fR=\fPint
 Enable verbose requests from libcurl. Useful for debugging. 1 turns on
 verbose logging from libcurl, 2 additionally enables HTTP IO tracing.


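The diff above turns the boolean `https` option into a three-way string (`off`, `on`, `insecure`). `fio_http_queue()` then picks the URL scheme from that setting and names each object `<file_name>_<offset>_<length>`. The sketch below reproduces only those two pieces of logic outside fio, under stated assumptions. `enum https_mode`, `parse_https_mode()` and `build_url()` are hypothetical names, not fio's definitions. In the engine, the `insecure` case also turns off `CURLOPT_SSL_VERIFYPEER` and `CURLOPT_SSL_VERIFYHOST`; here that case is represented only by the enum value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the engine's https modes (names are illustrative). */
enum https_mode { HTTPS_OFF, HTTPS_ON, HTTPS_INSECURE };

/* Map the option string to a mode; returns -1 for an unknown value. */
static int parse_https_mode(const char *s, enum https_mode *mode)
{
	if (strcmp(s, "off") == 0)
		*mode = HTTPS_OFF;
	else if (strcmp(s, "on") == 0)
		*mode = HTTPS_ON;
	else if (strcmp(s, "insecure") == 0)
		*mode = HTTPS_INSECURE;
	else
		return -1;
	return 0;
}

/*
 * Build "<scheme>://<host><file_name>_<offset>_<length>" as the queue
 * function does: only HTTPS_OFF uses plain http; "on" and "insecure"
 * both use https. Returns the snprintf() result for the URL.
 */
static int build_url(char *url, size_t url_len, enum https_mode mode,
		     const char *host, const char *file_name,
		     unsigned long long offset, unsigned long long len)
{
	char object[512];

	snprintf(object, sizeof(object), "%s_%llu_%llu", file_name, offset, len);
	return snprintf(url, url_len, "%s://%s%s",
			mode == HTTPS_OFF ? "http" : "https", host, object);
}
```

For a job whose `$jobnum` expands to 0 in `examples/http-swift.fio`, a 64 KiB block at offset 0 would therefore map to an object whose name ends in `bucket.0_0_65536`. Because the offset and length are encoded in the name, every block is stored as a separate object.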

* Recent changes (master)
@ 2018-08-16 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-16 12:00 UTC
  To: fio

The following changes since commit 875e8d6fa4d443068eb1c48a29f5367e454d2a37:

  Merge branch 'wip-http-engine' of https://github.com/l-mb/fio (2018-08-14 10:09:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 35e2d88fad2151f272af60babb5e6c98922d0bcd:

  Fix compilation on centos 7 (2018-08-15 12:06:14 -0600)

----------------------------------------------------------------
Manish Dusane (1):
      Fix compilation on centos 7

 log.h | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/log.h b/log.h
index b50d448..562f3f4 100644
--- a/log.h
+++ b/log.h
@@ -3,6 +3,7 @@
 
 #include <stdio.h>
 #include <stdarg.h>
+#include <unistd.h>
 
 #include "lib/output_buffer.h"
 



* Recent changes (master)
@ 2018-08-15 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-15 12:00 UTC
  To: fio

The following changes since commit dcf6ad384149ee0b3f91c5a8127160cc291f7157:

  Merge branch 'fio-man-page' of https://github.com/bvanassche/fio (2018-08-13 21:05:33 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 875e8d6fa4d443068eb1c48a29f5367e454d2a37:

  Merge branch 'wip-http-engine' of https://github.com/l-mb/fio (2018-08-14 10:09:47 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'wip-http-engine' of https://github.com/l-mb/fio

Lars Marowsky-Bree (1):
      engines/http: Add support for WebDAV and S3

 HOWTO                    |  50 +++++
 Makefile                 |   3 +
 configure                |  34 +++
 engines/http.c           | 558 +++++++++++++++++++++++++++++++++++++++++++++++
 examples/http-s3.fio     |  34 +++
 examples/http-webdav.fio |  26 +++
 fio.1                    |  40 ++++
 optgroup.h               |   2 +
 options.c                |   5 +
 9 files changed, 752 insertions(+)
 create mode 100644 engines/http.c
 create mode 100644 examples/http-s3.fio
 create mode 100644 examples/http-webdav.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 1bec806..c77dad1 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1835,6 +1835,15 @@ I/O engine
 			(RBD) via librbd without the need to use the kernel rbd driver. This
 			ioengine defines engine specific options.
 
+		**http**
+			I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+			a WebDAV or S3 endpoint.  This ioengine defines engine specific options.
+
+			This engine only supports direct IO of iodepth=1; you need to scale this
+			via numjobs. blocksize defines the size of the objects to be created.
+
+			TRIM is translated to object deletion.
+
 		**gfapi**
 			Using GlusterFS libgfapi sync interface to direct access to
 			GlusterFS volumes without having to go through FUSE.  This ioengine
@@ -2115,6 +2124,47 @@ with the caveat that when used on the command line, they must come after the
 		transferred to the device. The writefua option is ignored with this
 		selection.
 
+.. option:: http_host=str : [http]
+
+	Hostname to connect to. For S3, this could be the bucket hostname.
+	Default is **localhost**
+
+.. option:: http_user=str : [http]
+
+	Username for HTTP authentication.
+
+.. option:: http_pass=str : [http]
+
+	Password for HTTP authentication.
+
+.. option:: https=bool : [http]
+
+	Enable HTTPS instead of http. Default is **0**
+
+.. option:: http_s3=bool : [http]
+
+	Enable S3 specific HTTP headers such as authenticating requests
+	with AWS Signature Version 4. Default is **0**
+
+.. option:: http_s3_region=str : [http]
+
+	The S3 region/zone string.
+	Default is **us-east-1**
+
+.. option:: http_s3_key=str : [http]
+
+	The S3 secret key.
+
+.. option:: http_s3_keyid=str : [http]
+
+	The S3 key/access id.
+
+.. option:: http_verbose=int : [http]
+
+	Enable verbose requests from libcurl. Useful for debugging. 1
+	turns on verbose logging from libcurl, 2 additionally enables
+	HTTP IO tracing. Default is **0**
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index f9bd787..b981b45 100644
--- a/Makefile
+++ b/Makefile
@@ -101,6 +101,9 @@ endif
 ifdef CONFIG_RBD
   SOURCE += engines/rbd.c
 endif
+ifdef CONFIG_HTTP
+  SOURCE += engines/http.c
+endif
 SOURCE += oslib/asprintf.c
 ifndef CONFIG_STRSEP
   SOURCE += oslib/strsep.c
diff --git a/configure b/configure
index 9bdc7a1..103ea94 100755
--- a/configure
+++ b/configure
@@ -181,6 +181,8 @@ for opt do
   ;;
   --disable-rbd) disable_rbd="yes"
   ;;
+  --disable-http) disable_http="yes"
+  ;;
   --disable-gfapi) disable_gfapi="yes"
   ;;
   --enable-libhdfs) libhdfs="yes"
@@ -1567,6 +1569,35 @@ fi
 print_config "IPv6 helpers" "$ipv6"
 
 ##########################################
+# check for http
+if test "$http" != "yes" ; then
+  http="no"
+fi
+cat > $TMPC << EOF
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+
+int main(int argc, char **argv)
+{
+  CURL *curl;
+  HMAC_CTX *ctx;
+
+  curl = curl_easy_init();
+  curl_easy_cleanup(curl);
+
+  ctx = HMAC_CTX_new();
+  HMAC_CTX_reset(ctx);
+  HMAC_CTX_free(ctx);
+  return 0;
+}
+EOF
+if test "$disable_http" != "yes"  && compile_prog "" "-lcurl -lssl -lcrypto" "curl"; then
+  LIBS="-lcurl -lssl -lcrypto $LIBS"
+  http="yes"
+fi
+print_config "http engine" "$http"
+
+##########################################
 # check for rados
 if test "$rados" != "yes" ; then
   rados="no"
@@ -2346,6 +2377,9 @@ fi
 if test "$ipv6" = "yes" ; then
   output_sym "CONFIG_IPV6"
 fi
+if test "$http" = "yes" ; then
+  output_sym "CONFIG_HTTP"
+fi
 if test "$rados" = "yes" ; then
   output_sym "CONFIG_RADOS"
 fi
diff --git a/engines/http.c b/engines/http.c
new file mode 100644
index 0000000..d3fdba8
--- /dev/null
+++ b/engines/http.c
@@ -0,0 +1,558 @@
+/*
+ * HTTP GET/PUT IO engine
+ *
+ * IO engine to perform HTTP(S) GET/PUT requests via libcurl-easy.
+ *
+ * Copyright (C) 2018 SUSE LLC
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation..
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+ * Boston, MA 02110-1301, USA.
+ */
+
+#include <pthread.h>
+#include <time.h>
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+#include <openssl/sha.h>
+#include "fio.h"
+#include "../optgroup.h"
+
+
+struct http_data {
+	CURL *curl;
+};
+
+struct http_options {
+	void *pad;
+	int  https;
+	char *host;
+	char *user;
+	char *pass;
+	char *s3_key;
+	char *s3_keyid;
+	char *s3_region;
+	int verbose;
+	int s3;
+};
+
+struct http_curl_stream {
+	char *buf;
+	size_t pos;
+	size_t max;
+};
+
+static struct fio_option options[] = {
+	{
+		.name     = "https",
+		.lname    = "https",
+		.type     = FIO_OPT_BOOL,
+		.help     = "Enable https",
+		.off1     = offsetof(struct http_options, https),
+		.def      = "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_host",
+		.lname    = "http_host",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "Hostname (S3 bucket)",
+		.off1     = offsetof(struct http_options, host),
+		.def	  = "localhost",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_user",
+		.lname    = "http_user",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "HTTP user name",
+		.off1     = offsetof(struct http_options, user),
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_pass",
+		.lname    = "http_pass",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "HTTP password",
+		.off1     = offsetof(struct http_options, pass),
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3_key",
+		.lname    = "S3 secret key",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 secret key",
+		.off1     = offsetof(struct http_options, s3_key),
+		.def	  = "",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3_keyid",
+		.lname    = "S3 key id",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 key id",
+		.off1     = offsetof(struct http_options, s3_keyid),
+		.def	  = "",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3_region",
+		.lname    = "S3 region",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "S3 region",
+		.off1     = offsetof(struct http_options, s3_region),
+		.def	  = "us-east-1",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_s3",
+		.lname    = "S3 extensions",
+		.type     = FIO_OPT_BOOL,
+		.help     = "Whether to enable S3 specific headers",
+		.off1     = offsetof(struct http_options, s3),
+		.def	  = "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = "http_verbose",
+		.lname    = "CURL verbosity",
+		.type     = FIO_OPT_INT,
+		.help     = "increase http engine verbosity",
+		.off1     = offsetof(struct http_options, verbose),
+		.def	  = "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_HTTP,
+	},
+	{
+		.name     = NULL,
+	},
+};
+
+static char *_aws_uriencode(const char *uri)
+{
+	size_t bufsize = 1024;
+	char *r = malloc(bufsize);
+	char c;
+	int i, n;
+	const char *hex = "0123456789ABCDEF";
+
+	if (!r) {
+		log_err("malloc failed\n");
+		return NULL;
+	}
+
+	n = 0;
+	for (i = 0; (c = uri[i]); i++) {
+		if (n > bufsize-5) {
+			log_err("encoding the URL failed\n");
+			return NULL;
+		}
+
+		if ( (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
+		|| (c >= '0' && c <= '9') || c == '_' || c == '-'
+		|| c == '~' || c == '.' || c == '/')
+			r[n++] = c;
+		else {
+			r[n++] = '%';
+			r[n++] = hex[(c >> 4 ) & 0xF];
+			r[n++] = hex[c & 0xF];
+		}
+	}
+	r[n++] = 0;
+	return r;
+}
+
+static char *_conv_hex(const unsigned char *p, size_t len)
+{
+	char *r;
+	int i,n;
+	const char *hex = "0123456789abcdef";
+	r = malloc(len * 2 + 1);
+	n = 0;
+	for (i = 0; i < len; i++) {
+		r[n++] = hex[(p[i] >> 4 ) & 0xF];
+		r[n++] = hex[p[i] & 0xF];
+	}
+	r[n] = 0;
+
+	return r;
+}
+
+static char *_gen_hex_sha256(const char *p, size_t len)
+{
+	unsigned char hash[SHA256_DIGEST_LENGTH];
+
+	SHA256((unsigned char*)p, len, hash);
+	return _conv_hex(hash, SHA256_DIGEST_LENGTH);
+}
+
+static void _hmac(unsigned char *md, void *key, int key_len, char *data) {
+	HMAC_CTX *ctx;
+	unsigned int hmac_len;
+
+	ctx = HMAC_CTX_new();
+	HMAC_Init_ex(ctx, key, key_len, EVP_sha256(), NULL);
+	HMAC_Update(ctx, (unsigned char*)data, strlen(data));
+	HMAC_Final(ctx, md, &hmac_len);
+	HMAC_CTX_free(ctx);
+}
+
+static int _curl_trace(CURL *handle, curl_infotype type,
+	     char *data, size_t size,
+	     void *userp)
+{
+	const char *text;
+	(void)handle; /* prevent compiler warning */
+	(void)userp;
+
+	switch (type) {
+	case CURLINFO_TEXT:
+	fprintf(stderr, "== Info: %s", data);
+	default:
+	case CURLINFO_SSL_DATA_OUT:
+	case CURLINFO_SSL_DATA_IN:
+		return 0;
+
+	case CURLINFO_HEADER_OUT:
+		text = "=> Send header";
+		break;
+	case CURLINFO_DATA_OUT:
+		text = "=> Send data";
+		break;
+	case CURLINFO_HEADER_IN:
+		text = "<= Recv header";
+		break;
+	case CURLINFO_DATA_IN:
+		text = "<= Recv data";
+		break;
+	}
+
+	log_info("%s: %s", text, data);
+	return 0;
+}
+
+/* https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
+ * https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#signing-request-intro
+ */
+static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct http_options *o,
+		int op, const char *uri, char *buf, size_t len)
+{
+	char date_short[16];
+	char date_iso[32];
+	char method[8];
+	char dkey[128];
+	char creq[512];
+	char sts[256];
+	char s[512];
+	char *uri_encoded = NULL;
+	char *dsha = NULL;
+	char *csha = NULL;
+	char *signature = NULL;
+	const char *service = "s3";
+	const char *aws = "aws4_request";
+	unsigned char md[SHA256_DIGEST_LENGTH];
+
+	time_t t = time(NULL);
+	struct tm *gtm = gmtime(&t);
+
+	strftime (date_short, sizeof(date_short), "%Y%m%d", gtm);
+	strftime (date_iso, sizeof(date_iso), "%Y%m%dT%H%M%SZ", gtm);
+	uri_encoded = _aws_uriencode(uri);
+
+	if (op == DDIR_WRITE) {
+		dsha = _gen_hex_sha256(buf, len);
+		sprintf(method, "PUT");
+	} else {
+		/* DDIR_READ && DDIR_TRIM supply an empty body */
+		if (op == DDIR_READ)
+			sprintf(method, "GET");
+		else
+			sprintf(method, "DELETE");
+		dsha = _gen_hex_sha256("", 0);
+	}
+
+	/* Create the canonical request first */
+	snprintf(creq, sizeof(creq),
+	"%s\n"
+	"%s\n"
+	"\n"
+	"host:%s\n"
+	"x-amz-content-sha256:%s\n"
+	"x-amz-date:%s\n"
+	"\n"
+	"host;x-amz-content-sha256;x-amz-date\n"
+	"%s"
+	, method
+	, uri_encoded, o->host, dsha, date_iso, dsha);
+
+	csha = _gen_hex_sha256(creq, strlen(creq));
+	snprintf(sts, sizeof(sts), "AWS4-HMAC-SHA256\n%s\n%s/%s/%s/%s\n%s",
+		date_iso, date_short, o->s3_region, service, aws, csha);
+
+	snprintf((char *)dkey, sizeof(dkey), "AWS4%s", o->s3_key);
+	_hmac(md, dkey, strlen(dkey), date_short);
+	_hmac(md, md, SHA256_DIGEST_LENGTH, o->s3_region);
+	_hmac(md, md, SHA256_DIGEST_LENGTH, (char*) service);
+	_hmac(md, md, SHA256_DIGEST_LENGTH, (char*) aws);
+	_hmac(md, md, SHA256_DIGEST_LENGTH, sts);
+
+	signature = _conv_hex(md, SHA256_DIGEST_LENGTH);
+
+	/* Surpress automatic Accept: header */
+	slist = curl_slist_append(slist, "Accept:");
+
+	snprintf(s, sizeof(s), "x-amz-content-sha256: %s", dsha);
+	slist = curl_slist_append(slist, s);
+
+	snprintf(s, sizeof(s), "x-amz-date: %s", date_iso);
+	slist = curl_slist_append(slist, s);
+
+	snprintf(s, sizeof(s), "Authorization: AWS4-HMAC-SHA256 Credential=%s/%s/%s/s3/aws4_request,"
+	"SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=%s",
+	o->s3_keyid, date_short, o->s3_region, signature);
+	slist = curl_slist_append(slist, s);
+
+	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
+
+	free(uri_encoded);
+	free(csha);
+	free(dsha);
+	free(signature);
+}
+
+static void fio_http_cleanup(struct thread_data *td)
+{
+	struct http_data *http = td->io_ops_data;
+
+	if (http) {
+		curl_easy_cleanup(http->curl);
+		free(http);
+	}
+}
+
+static size_t _http_read(void *ptr, size_t size, size_t nmemb, void *stream)
+{
+	struct http_curl_stream *state = stream;
+	size_t len = size * nmemb;
+	/* We're retrieving; nothing is supposed to be read locally */
+	if (!stream)
+		return 0;
+	if (len+state->pos > state->max)
+		len = state->max - state->pos;
+	memcpy(ptr, &state->buf[state->pos], len);
+	state->pos += len;
+	return len;
+}
+
+static size_t _http_write(void *ptr, size_t size, size_t nmemb, void *stream)
+{
+	struct http_curl_stream *state = stream;
+	/* We're just discarding the returned body after a PUT */
+	if (!stream)
+		return nmemb;
+	if (size != 1)
+		return CURLE_WRITE_ERROR;
+	if (nmemb + state->pos > state->max)
+		return CURLE_WRITE_ERROR;
+	memcpy(&state->buf[state->pos], ptr, nmemb);
+	state->pos += nmemb;
+	return nmemb;
+}
+
+static int _http_seek(void *stream, curl_off_t offset, int origin)
+{
+	struct http_curl_stream *state = stream;
+	if (offset < state->max && origin == SEEK_SET) {
+		state->pos = offset;
+		return CURL_SEEKFUNC_OK;
+	} else
+		return CURL_SEEKFUNC_FAIL;
+}
+
+static enum fio_q_status fio_http_queue(struct thread_data *td,
+					 struct io_u *io_u)
+{
+	struct http_data *http = td->io_ops_data;
+	struct http_options *o = td->eo;
+	struct http_curl_stream _curl_stream;
+	struct curl_slist *slist = NULL;
+	char object[512];
+	char url[1024];
+	long status;
+	CURLcode res;
+	int r = -1;
+
+	fio_ro_check(td, io_u);
+	memset(&_curl_stream, 0, sizeof(_curl_stream));
+	snprintf(object, sizeof(object), "%s_%llu_%llu", td->files[0]->file_name, io_u->offset, io_u->xfer_buflen);
+	snprintf(url, sizeof(url), "%s://%s%s", o->https ? "https" : "http", o->host, object);
+	curl_easy_setopt(http->curl, CURLOPT_URL, url);
+	_curl_stream.buf = io_u->xfer_buf;
+	_curl_stream.max = io_u->xfer_buflen;
+	curl_easy_setopt(http->curl, CURLOPT_SEEKDATA, &_curl_stream);
+	curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)io_u->xfer_buflen);
+
+	if (o->s3)
+		_add_aws_auth_header(http->curl, slist, o, io_u->ddir, object,
+			io_u->xfer_buf, io_u->xfer_buflen);
+
+	if (io_u->ddir == DDIR_WRITE) {
+		curl_easy_setopt(http->curl, CURLOPT_READDATA, &_curl_stream);
+		curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, NULL);
+		curl_easy_setopt(http->curl, CURLOPT_UPLOAD, 1L);
+		res = curl_easy_perform(http->curl);
+		if (res == CURLE_OK) {
+			curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+			if (status == 100 || (status >= 200 && status <= 204))
+				goto out;
+			log_err("DDIR_WRITE failed with HTTP status code %ld\n", status);
+			goto err;
+		}
+	} else if (io_u->ddir == DDIR_READ) {
+		curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
+		curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, &_curl_stream);
+		curl_easy_setopt(http->curl, CURLOPT_HTTPGET, 1L);
+		res = curl_easy_perform(http->curl);
+		if (res == CURLE_OK) {
+			curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+			if (status == 200)
+				goto out;
+			else if (status == 404) {
+				/* Object doesn't exist. Pretend we read
+				 * zeroes */
+				memset(io_u->xfer_buf, 0, io_u->xfer_buflen);
+				goto out;
+			}
+			log_err("DDIR_READ failed with HTTP status code %ld\n", status);
+		}
+		goto err;
+	} else if (io_u->ddir == DDIR_TRIM) {
+		curl_easy_setopt(http->curl, CURLOPT_HTTPGET, 1L);
+		curl_easy_setopt(http->curl, CURLOPT_CUSTOMREQUEST, "DELETE");
+		curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, 0);
+		curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
+		curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, NULL);
+		res = curl_easy_perform(http->curl);
+		if (res == CURLE_OK) {
+			curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+			if (status == 200 || status == 202 || status == 204 || status == 404)
+				goto out;
+			log_err("DDIR_TRIM failed with HTTP status code %ld\n", status);
+		}
+		goto err;
+	}
+
+	log_err("WARNING: Only DDIR_READ/DDIR_WRITE/DDIR_TRIM are supported!\n");
+
+err:
+	io_u->error = r;
+	td_verror(td, io_u->error, "transfer");
+out:
+	curl_slist_free_all(slist);
+	return FIO_Q_COMPLETED;
+}
+
+static struct io_u *fio_http_event(struct thread_data *td, int event)
+{
+	/* sync IO engine - never any outstanding events */
+	return NULL;
+}
+
+int fio_http_getevents(struct thread_data *td, unsigned int min,
+	unsigned int max, const struct timespec *t)
+{
+	/* sync IO engine - never any outstanding events */
+	return 0;
+}
+
+static int fio_http_setup(struct thread_data *td)
+{
+	struct http_data *http = NULL;
+	struct http_options *o = td->eo;
+	int r;
+	/* allocate engine specific structure to deal with libhttp. */
+	http = calloc(1, sizeof(*http));
+	if (!http) {
+		log_err("calloc failed.\n");
+		goto cleanup;
+	}
+
+	http->curl = curl_easy_init();
+	if (o->verbose)
+		curl_easy_setopt(http->curl, CURLOPT_VERBOSE, 1L);
+	if (o->verbose > 1)
+		curl_easy_setopt(http->curl, CURLOPT_DEBUGFUNCTION, &_curl_trace);
+	curl_easy_setopt(http->curl, CURLOPT_NOPROGRESS, 1L);
+	curl_easy_setopt(http->curl, CURLOPT_FOLLOWLOCATION, 1L);
+	curl_easy_setopt(http->curl, CURLOPT_PROTOCOLS, CURLPROTO_HTTP|CURLPROTO_HTTPS);
+	curl_easy_setopt(http->curl, CURLOPT_READFUNCTION, _http_read);
+	curl_easy_setopt(http->curl, CURLOPT_WRITEFUNCTION, _http_write);
+	curl_easy_setopt(http->curl, CURLOPT_SEEKFUNCTION, _http_seek);
+	if (o->user && o->pass) {
+		curl_easy_setopt(http->curl, CURLOPT_USERNAME, o->user);
+		curl_easy_setopt(http->curl, CURLOPT_PASSWORD, o->pass);
+		curl_easy_setopt(http->curl, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
+	}
+
+	td->io_ops_data = http;
+
+	/* Force single process mode. */
+	td->o.use_thread = 1;
+
+	return 0;
+cleanup:
+	fio_http_cleanup(td);
+	return r;
+}
+
+static int fio_http_open(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+static int fio_http_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name = "http",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_DISKLESSIO,
+	.setup			= fio_http_setup,
+	.queue			= fio_http_queue,
+	.getevents		= fio_http_getevents,
+	.event			= fio_http_event,
+	.cleanup		= fio_http_cleanup,
+	.open_file		= fio_http_open,
+	.invalidate		= fio_http_invalidate,
+	.options		= options,
+	.option_struct_size	= sizeof(struct http_options),
+};
+
+static void fio_init fio_http_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_http_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/http-s3.fio b/examples/http-s3.fio
new file mode 100644
index 0000000..a9805da
--- /dev/null
+++ b/examples/http-s3.fio
@@ -0,0 +1,34 @@
+# Example test for the HTTP engine's S3 support against Amazon AWS.
+# Obviously, you have to adjust the S3 credentials; for this example,
+# they're passed in via the environment.
+#
+
+[global]
+ioengine=http
+name=test
+direct=1
+filename=/larsmb-fio-test/object
+http_verbose=0
+https=1
+http_s3=1
+http_s3_key=${S3_KEY}
+http_s3_keyid=${S3_ID}
+http_host=s3.eu-central-1.amazonaws.com
+http_s3_region=eu-central-1
+group_reporting
+
+# With verify, this both writes and reads the object
+[create]
+rw=write
+bs=4k
+size=64k
+io_size=4k
+verify=sha256
+
+[trim]
+stonewall
+rw=trim
+bs=4k
+size=64k
+io_size=4k
+
diff --git a/examples/http-webdav.fio b/examples/http-webdav.fio
new file mode 100644
index 0000000..c0624f8
--- /dev/null
+++ b/examples/http-webdav.fio
@@ -0,0 +1,26 @@
+[global]
+ioengine=http
+rw=randwrite
+name=test
+direct=1
+http_verbose=0
+http_s3=0
+https=0
+http_host=localhost
+filename_format=/dav/bucket.$jobnum
+group_reporting
+bs=64k
+size=1M
+
+[create]
+numjobs=16
+rw=randwrite
+io_size=10M
+verify=sha256
+
+# This will delete all created objects again
+[trim]
+stonewall
+numjobs=16
+rw=trim
+io_size=1M
diff --git a/fio.1 b/fio.1
index 18bf6a2..883a31b 100644
--- a/fio.1
+++ b/fio.1
@@ -1608,6 +1608,15 @@ I/O engine supporting direct access to Ceph Rados Block Devices
 (RBD) via librbd without the need to use the kernel rbd driver. This
 ioengine defines engine specific options.
 .TP
+.B http
+I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+a WebDAV or S3 endpoint.  This ioengine defines engine specific options.
+
+This engine only supports direct IO of iodepth=1; you need to scale this
+via numjobs. blocksize defines the size of the objects to be created.
+
+TRIM is translated to object deletion.
+.TP
 .B gfapi
 Using GlusterFS libgfapi sync interface to direct access to
 GlusterFS volumes without having to go through FUSE. This ioengine
@@ -1810,6 +1819,37 @@ by default.
 Poll store instead of waiting for completion. Usually this provides better
 throughput at cost of higher(up to 100%) CPU utilization.
 .TP
+.BI (http)http_host \fR=\fPstr
+Hostname to connect to. For S3, this could be the bucket name. Default
+is \fBlocalhost\fR
+.TP
+.BI (http)http_user \fR=\fPstr
+Username for HTTP authentication.
+.TP
+.BI (http)http_pass \fR=\fPstr
+Password for HTTP authentication.
+.TP
+.BI (http)https \fR=\fPbool
+Whether to use HTTPS instead of plain HTTP. Default is \fB0\fR.
+.TP
+.BI (http)http_s3 \fR=\fPbool
+Include S3 specific HTTP headers such as authenticating requests with
+AWS Signature Version 4. Default is \fB0\fR.
+.TP
+.BI (http)http_s3_region \fR=\fPstr
+The S3 region/zone to include in the request. Default is \fBus-east-1\fR.
+.TP
+.BI (http)http_s3_key \fR=\fPstr
+The S3 secret key.
+.TP
+.BI (http)http_s3_keyid \fR=\fPstr
+The S3 key/access id.
+.TP
+.BI (http)http_verbose \fR=\fPint
+Enable verbose requests from libcurl. Useful for debugging. 1 turns on
+verbose logging from libcurl, 2 additionally enables HTTP IO tracing.
+Default is \fB0\fR
+.TP
 .BI (mtd)skip_bad \fR=\fPbool
 Skip operations against known bad blocks.
 .TP
diff --git a/optgroup.h b/optgroup.h
index d5e968d..adf4d09 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -56,6 +56,7 @@ enum opt_category_group {
 	__FIO_OPT_G_ACT,
 	__FIO_OPT_G_LATPROF,
 	__FIO_OPT_G_RBD,
+	__FIO_OPT_G_HTTP,
 	__FIO_OPT_G_GFAPI,
 	__FIO_OPT_G_MTD,
 	__FIO_OPT_G_HDFS,
@@ -91,6 +92,7 @@ enum opt_category_group {
 	FIO_OPT_G_ACT		= (1ULL << __FIO_OPT_G_ACT),
 	FIO_OPT_G_LATPROF	= (1ULL << __FIO_OPT_G_LATPROF),
 	FIO_OPT_G_RBD		= (1ULL << __FIO_OPT_G_RBD),
+	FIO_OPT_G_HTTP		= (1ULL << __FIO_OPT_G_HTTP),
 	FIO_OPT_G_GFAPI		= (1ULL << __FIO_OPT_G_GFAPI),
 	FIO_OPT_G_MTD		= (1ULL << __FIO_OPT_G_MTD),
 	FIO_OPT_G_HDFS		= (1ULL << __FIO_OPT_G_HDFS),
diff --git a/options.c b/options.c
index e53b717..9ee1ba3 100644
--- a/options.c
+++ b/options.c
@@ -1863,6 +1863,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "PMDK libpmem based IO engine",
 			  },
 #endif
+#ifdef CONFIG_HTTP
+			  { .ival = "http",
+			    .help = "HTTP (WebDAV/S3) IO engine",
+			  },
+#endif
 		},
 	},
 	{


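In the engine above, `_aws_uriencode()` percent-encodes the object path for the SigV4 canonical request. It keeps `A-Z`, `a-z`, `0-9`, `-`, `_`, `.`, `~` and `/` unchanged and writes every other byte as `%XX` with uppercase hex digits. The following is a minimal standalone sketch of the same rule, not the engine's code. `uriencode_sketch()` and its caller-supplied `bufsize` are hypothetical. It differs from the original in two deliberate ways. First, it reads bytes as `unsigned char`, which avoids relying on implementation-defined right shifts of negative `char` values. Second, it frees its buffer when the encoded string would not fit, where the original returns NULL without freeing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 unreserved characters, plus '/' which stays literal in S3 paths. */
static int keep_char(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '-' || c == '_' ||
	       c == '.' || c == '~' || c == '/';
}

/*
 * Percent-encode 'uri' into a newly allocated string of at most bufsize
 * bytes, including the terminating NUL. Returns NULL on allocation failure
 * or if the encoded form would not fit. The caller frees the result.
 */
static char *uriencode_sketch(const char *uri, size_t bufsize)
{
	static const char hex[] = "0123456789ABCDEF";
	char *r;
	size_t n = 0;

	if (bufsize == 0)
		return NULL;
	r = malloc(bufsize);
	if (!r)
		return NULL;

	for (const unsigned char *p = (const unsigned char *)uri; *p; p++) {
		if (n + 4 > bufsize) {	/* need room for "%XX" plus the NUL */
			free(r);
			return NULL;
		}
		if (keep_char(*p)) {
			r[n++] = (char)*p;
		} else {
			r[n++] = '%';
			r[n++] = hex[*p >> 4];
			r[n++] = hex[*p & 0xF];
		}
	}
	r[n] = '\0';
	return r;
}
```

Under this rule, a path such as `/bucket/my file` becomes `/bucket/my%20file`, and a UTF-8 `é` (bytes `C3 A9`) becomes `%C3%A9`. The encoded path is what goes on the second line of the canonical request that is later hashed and signed.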

* Recent changes (master)
@ 2018-08-14 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-14 12:00 UTC
  To: fio

The following changes since commit fee14ab846ef542d9bb9ebf68f11f0ecb8636f5e:

  Merge branch 'minor_fixes' of https://github.com/sitsofe/fio (2018-08-12 15:51:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dcf6ad384149ee0b3f91c5a8127160cc291f7157:

  Merge branch 'fio-man-page' of https://github.com/bvanassche/fio (2018-08-13 21:05:33 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Improve zone support documentation

Jens Axboe (5):
      gclient: bump output time buf size
      asprintf: fix indentation
      Fix double free of zone cache data
      Remove old zone gen from options
      Merge branch 'fio-man-page' of https://github.com/bvanassche/fio

 HOWTO            | 18 +++++++++-----
 Makefile         |  2 +-
 backend.c        | 13 +++-------
 fio.1            | 17 +++++++++----
 gclient.c        |  2 +-
 options.c        | 46 +----------------------------------
 oslib/asprintf.c | 38 ++++++++++++++---------------
 zone-dist.c      | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 zone-dist.h      |  7 ++++++
 9 files changed, 131 insertions(+), 86 deletions(-)
 create mode 100644 zone-dist.c
 create mode 100644 zone-dist.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4c117c2..1bec806 100644
--- a/HOWTO
+++ b/HOWTO
@@ -952,18 +952,24 @@ Target file/device
 
 	Unlink job files after each iteration or loop.  Default: false.
 
-.. option:: zonesize=int
+.. option:: zonerange=int
 
-	Divide a file into zones of the specified size. See :option:`zoneskip`.
+	Size of a single zone in which I/O occurs. See also :option:`zonesize`
+	and :option:`zoneskip`.
 
-.. option:: zonerange=int
+.. option:: zonesize=int
 
-	Give size of an I/O zone.  See :option:`zoneskip`.
+	Number of bytes to transfer before skipping :option:`zoneskip`
+	bytes. If this parameter is smaller than :option:`zonerange` then only
+	a fraction of each zone with :option:`zonerange` bytes will be
+	accessed.  If this parameter is larger than :option:`zonerange` then
+	each zone will be accessed multiple times before skipping
 
 .. option:: zoneskip=int
 
-	Skip the specified number of bytes when :option:`zonesize` data has been
-	read. The two zone options can be used to only do I/O on zones of a file.
+	Skip the specified number of bytes when :option:`zonesize` data have
+	been transferred. The three zone options can be used to do strided I/O
+	on a file.
 
 
 I/O type
diff --git a/Makefile b/Makefile
index 20d3ec1..f9bd787 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
 		workqueue.c rate-submit.c optgroup.c helper_thread.c \
-		steadystate.c
+		steadystate.c zone-dist.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
diff --git a/backend.c b/backend.c
index f6cfbdd..36bde6a 100644
--- a/backend.c
+++ b/backend.c
@@ -47,6 +47,7 @@
 #include "rate-submit.h"
 #include "helper_thread.h"
 #include "pshared.h"
+#include "zone-dist.h"
 
 static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
@@ -1592,6 +1593,8 @@ static void *thread_main(void *data)
 		goto err;
 	}
 
+	td_zone_gen_index(td);
+
 	/*
 	 * Do this early, we don't want the compress threads to be limited
 	 * to the same CPUs as the IO workers. So do this before we set
@@ -1907,15 +1910,7 @@ err:
 	close_ioengine(td);
 	cgroup_shutdown(td, cgroup_mnt);
 	verify_free_state(td);
-
-	if (td->zone_state_index) {
-		int i;
-
-		for (i = 0; i < DDIR_RWDIR_CNT; i++)
-			free(td->zone_state_index[i]);
-		free(td->zone_state_index);
-		td->zone_state_index = NULL;
-	}
+	td_zone_free_index(td);
 
 	if (fio_option_is_set(o, cpumask)) {
 		ret = fio_cpuset_exit(&o->cpumask);
diff --git a/fio.1 b/fio.1
index 0c604a6..18bf6a2 100644
--- a/fio.1
+++ b/fio.1
@@ -724,15 +724,22 @@ false.
 .BI unlink_each_loop \fR=\fPbool
 Unlink job files after each iteration or loop. Default: false.
 .TP
-.BI zonesize \fR=\fPint
-Divide a file into zones of the specified size. See \fBzoneskip\fR.
+Fio supports strided data access. After having read \fBzonesize\fR bytes from an area that is \fBzonerange\fR bytes big, \fBzoneskip\fR bytes are skipped.
 .TP
 .BI zonerange \fR=\fPint
-Give size of an I/O zone. See \fBzoneskip\fR.
+Size of a single zone in which I/O occurs.
+.TP
+.BI zonesize \fR=\fPint
+Number of bytes to transfer before skipping \fBzoneskip\fR bytes. If this
+parameter is smaller than \fBzonerange\fR then only a fraction of each zone
+with \fBzonerange\fR bytes will be accessed.  If this parameter is larger than
+\fBzonerange\fR then each zone will be accessed multiple times before skipping
+to the next zone.
 .TP
 .BI zoneskip \fR=\fPint
-Skip the specified number of bytes when \fBzonesize\fR data has been
-read. The two zone options can be used to only do I/O on zones of a file.
+Skip the specified number of bytes after \fBzonesize\fR bytes of data have been
+transferred.
+
 .SS "I/O type"
 .TP
 .BI direct \fR=\fPbool
diff --git a/gclient.c b/gclient.c
index 7e5071d..04275a1 100644
--- a/gclient.c
+++ b/gclient.c
@@ -121,7 +121,7 @@ static void gfio_text_op(struct fio_client *client, struct fio_net_cmd *cmd)
 	GtkTreeIter iter;
 	struct tm *tm;
 	time_t sec;
-	char tmp[64], timebuf[80];
+	char tmp[64], timebuf[96];
 
 	sec = p->log_sec;
 	tm = localtime(&sec);
diff --git a/options.c b/options.c
index f592027..e53b717 100644
--- a/options.c
+++ b/options.c
@@ -959,48 +959,6 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 	return 0;
 }
 
-static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
-{
-	unsigned int i, j, sprev, aprev;
-	uint64_t sprev_sz;
-
-	td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
-
-	sprev_sz = sprev = aprev = 0;
-	for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
-		struct zone_split *zsp = &td->o.zone_split[ddir][i];
-
-		for (j = aprev; j < aprev + zsp->access_perc; j++) {
-			struct zone_split_index *zsi = &td->zone_state_index[ddir][j];
-
-			zsi->size_perc = sprev + zsp->size_perc;
-			zsi->size_perc_prev = sprev;
-
-			zsi->size = sprev_sz + zsp->size;
-			zsi->size_prev = sprev_sz;
-		}
-
-		aprev += zsp->access_perc;
-		sprev += zsp->size_perc;
-		sprev_sz += zsp->size;
-	}
-}
-
-/*
- * Generate state table for indexes, so we don't have to do it inline from
- * the hot IO path
- */
-static void td_zone_gen_index(struct thread_data *td)
-{
-	int i;
-
-	td->zone_state_index = malloc(DDIR_RWDIR_CNT *
-					sizeof(struct zone_split_index *));
-
-	for (i = 0; i < DDIR_RWDIR_CNT; i++)
-		__td_zone_gen_index(td, i);
-}
-
 static int parse_zoned_distribution(struct thread_data *td, const char *input,
 				    bool absolute)
 {
@@ -1055,9 +1013,7 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input,
 		return ret;
 	}
 
-	if (!ret)
-		td_zone_gen_index(td);
-	else {
+	if (ret) {
 		for (i = 0; i < DDIR_RWDIR_CNT; i++)
 			td->o.zone_split_nr[i] = 0;
 	}
diff --git a/oslib/asprintf.c b/oslib/asprintf.c
index 969479f..ff503c5 100644
--- a/oslib/asprintf.c
+++ b/oslib/asprintf.c
@@ -6,38 +6,38 @@
 #ifndef CONFIG_HAVE_VASPRINTF
 int vasprintf(char **strp, const char *fmt, va_list ap)
 {
-    va_list ap_copy;
-    char *str;
-    int len;
+	va_list ap_copy;
+	char *str;
+	int len;
 
 #ifdef va_copy
-    va_copy(ap_copy, ap);
+	va_copy(ap_copy, ap);
 #else
-    __va_copy(ap_copy, ap);
+	__va_copy(ap_copy, ap);
 #endif
-    len = vsnprintf(NULL, 0, fmt, ap_copy);
-    va_end(ap_copy);
+	len = vsnprintf(NULL, 0, fmt, ap_copy);
+	va_end(ap_copy);
 
-    if (len < 0)
-        return len;
+	if (len < 0)
+		return len;
 
-    len++;
-    str = malloc(len);
-    *strp = str;
-    return str ? vsnprintf(str, len, fmt, ap) : -1;
+	len++;
+	str = malloc(len);
+	*strp = str;
+	return str ? vsnprintf(str, len, fmt, ap) : -1;
 }
 #endif
 
 #ifndef CONFIG_HAVE_ASPRINTF
 int asprintf(char **strp, const char *fmt, ...)
 {
-    va_list arg;
-    int done;
+	va_list arg;
+	int done;
 
-    va_start(arg, fmt);
-    done = vasprintf(strp, fmt, arg);
-    va_end(arg);
+	va_start(arg, fmt);
+	done = vasprintf(strp, fmt, arg);
+	va_end(arg);
 
-    return done;
+	return done;
 }
 #endif
diff --git a/zone-dist.c b/zone-dist.c
new file mode 100644
index 0000000..819d531
--- /dev/null
+++ b/zone-dist.c
@@ -0,0 +1,74 @@
+#include <stdlib.h>
+#include "fio.h"
+#include "zone-dist.h"
+
+static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
+{
+	unsigned int i, j, sprev, aprev;
+	uint64_t sprev_sz;
+
+	td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
+
+	sprev_sz = sprev = aprev = 0;
+	for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
+		struct zone_split *zsp = &td->o.zone_split[ddir][i];
+
+		for (j = aprev; j < aprev + zsp->access_perc; j++) {
+			struct zone_split_index *zsi = &td->zone_state_index[ddir][j];
+
+			zsi->size_perc = sprev + zsp->size_perc;
+			zsi->size_perc_prev = sprev;
+
+			zsi->size = sprev_sz + zsp->size;
+			zsi->size_prev = sprev_sz;
+		}
+
+		aprev += zsp->access_perc;
+		sprev += zsp->size_perc;
+		sprev_sz += zsp->size;
+	}
+}
+
+static bool has_zones(struct thread_data *td)
+{
+	int i, zones = 0;
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++)
+		zones += td->o.zone_split_nr[i];
+
+	return zones != 0;
+}
+
+/*
+ * Generate state table for indexes, so we don't have to do it inline from
+ * the hot IO path
+ */
+void td_zone_gen_index(struct thread_data *td)
+{
+	int i;
+
+	if (!has_zones(td))
+		return;
+
+	td->zone_state_index = malloc(DDIR_RWDIR_CNT *
+					sizeof(struct zone_split_index *));
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++)
+		__td_zone_gen_index(td, i);
+}
+
+void td_zone_free_index(struct thread_data *td)
+{
+	int i;
+
+	if (!td->zone_state_index)
+		return;
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		free(td->zone_state_index[i]);
+		td->zone_state_index[i] = NULL;
+	}
+
+	free(td->zone_state_index);
+	td->zone_state_index = NULL;
+}
diff --git a/zone-dist.h b/zone-dist.h
new file mode 100644
index 0000000..c0b2884
--- /dev/null
+++ b/zone-dist.h
@@ -0,0 +1,7 @@
+#ifndef FIO_ZONE_DIST_H
+#define FIO_ZONE_DIST_H
+
+void td_zone_gen_index(struct thread_data *td);
+void td_zone_free_index(struct thread_data *td);
+
+#endif
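The 100-entry table built by `__td_zone_gen_index()` gives, for each percent of accesses, the cumulative start and end of the zone that percent falls in, for a specification such as `random_distribution=zoned:60/10:30/20:10/70` (60% of accesses go to the first 10% of the file, and so on). As a hedged illustration of how such a table can be consumed, the standalone sketch below rebuilds it with simplified, hypothetical types (percentages only, no absolute `zoned_abs` sizes) and maps a draw in [0, 100) plus a second random value to an offset. fio's own lookup differs in detail.

```c
#include <stdint.h>

/* Hypothetical, simplified copies of the zone_split fields used above. */
struct split {
	unsigned int access_perc;	/* share of I/Os hitting this zone */
	unsigned int size_perc;		/* share of the file this zone covers */
};

struct split_index {
	unsigned int size_perc_prev;	/* zone start, percent of file */
	unsigned int size_perc;		/* zone end, percent of file */
};

/* Same cumulative walk as __td_zone_gen_index(); -1 unless accesses sum to 100. */
static int build_index(const struct split *sp, unsigned int nr,
		       struct split_index idx[100])
{
	unsigned int i, j, aprev = 0, sprev = 0;

	for (i = 0; i < nr; i++) {
		if (aprev + sp[i].access_perc > 100)
			return -1;
		for (j = aprev; j < aprev + sp[i].access_perc; j++) {
			idx[j].size_perc_prev = sprev;
			idx[j].size_perc = sprev + sp[i].size_perc;
		}
		aprev += sp[i].access_perc;
		sprev += sp[i].size_perc;
	}
	return aprev == 100 ? 0 : -1;
}

/* r is a uniform draw in [0, 100); rnd picks the offset inside the zone. */
static uint64_t pick_offset(const struct split_index idx[100], unsigned int r,
			    uint64_t rnd, uint64_t file_size)
{
	uint64_t lo = file_size * idx[r].size_perc_prev / 100;
	uint64_t hi = file_size * idx[r].size_perc / 100;

	/* a zero-sized zone degenerates to its start offset */
	return hi > lo ? lo + rnd % (hi - lo) : lo;
}
```

Precomputing the table once keeps the per-I/O work to a single array index, which is the point made in the "hot IO path" comment above.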



* Recent changes (master)
@ 2018-08-13 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-13 12:00 UTC
  To: fio

The following changes since commit 73027fb9ad6728f5b066e25a9ec923459ceab8a2:

  eta: clean up ETA string printing (2018-08-10 09:33:44 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fee14ab846ef542d9bb9ebf68f11f0ecb8636f5e:

  Merge branch 'minor_fixes' of https://github.com/sitsofe/fio (2018-08-12 15:51:34 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'armv6' of https://github.com/sitsofe/fio
      travis: include new xcode 9.4
      Merge branch 'minor_fixes' of https://github.com/sitsofe/fio

Sitsofe Wheeler (5):
      arch: fix build breakage on armv6
      minor fio.service cleanups
      doc: update Log File Formats and write_iops_log sections
      doc: rewording and add reference to --aux-path
      man: fix missing/too many backslashes

 .travis.yml       |  4 ++++
 HOWTO             | 35 ++++++++++++++++++++---------------
 arch/arch-arm.h   |  3 ++-
 fio.1             | 45 +++++++++++++++++++++++++--------------------
 tools/fio.service |  8 ++++----
 5 files changed, 55 insertions(+), 40 deletions(-)
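The `filename` documentation touched in this change states that every ':' and '\' in a path must be escaped with a backslash, so `F:\filename` becomes `filename=F\:\\filename`. A hypothetical helper that applies that rule when generating job files might look like this; the function name and its buffer contract are illustrative and not part of fio.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy path into out, prefixing every ':' and '\'
 * with a backslash as the filename= documentation requires.
 * Returns the escaped length, or -1 if out (size bytes) is too small.
 */
static long escape_fio_path(const char *path, char *out, size_t size)
{
	size_t len = 0;

	if (!size)
		return -1;
	for (; *path; path++) {
		/* ':' and '\' both need a leading backslash */
		size_t special = strchr(":\\", *path) != NULL;

		/* room for the escape, the character and the final NUL */
		if (len + special + 2 > size)
			return -1;
		if (special)
			out[len++] = '\\';
		out[len++] = *path;
	}
	out[len] = '\0';
	return (long)len;
}
```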

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 94f69fb..4a87fe6 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,6 +25,10 @@ matrix:
       compiler: clang
       osx_image: xcode8.3
       env: BUILD_ARCH="x86_64"
+    - os: osx
+      compiler: clang
+      osx_image: xcode9.4
+      env: BUILD_ARCH="x86_64"
   exclude:
     - os: osx
       compiler: gcc
diff --git a/HOWTO b/HOWTO
index 16c5ae3..4c117c2 100644
--- a/HOWTO
+++ b/HOWTO
@@ -283,7 +283,8 @@ Command line options
 
 .. option:: --aux-path=path
 
-	Use this `path` for fio state generated files.
+	Use the directory specified by `path` for generated state files instead
+	of the current working directory.
 
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
@@ -748,12 +749,15 @@ Target file/device
 	assigned equally distributed to job clones created by :option:`numjobs` as
 	long as they are using generated filenames. If specific `filename(s)` are
 	set fio will use the first listed directory, and thereby matching the
-	`filename` semantic which generates a file each clone if not specified, but
-	let all clones use the same if set.
+	`filename` semantic (which generates a file for each clone if not
+	specified, but lets all clones use the same file if set).
 
 	See the :option:`filename` option for information on how to escape "``:``" and
 	"``\``" characters within the directory path itself.
 
+	Note: To control the directory fio will use for internal state files
+	use :option:`--aux-path`.
+
 .. option:: filename=str
 
 	Fio normally makes up a `filename` based on the job name, thread number, and
@@ -2915,9 +2919,11 @@ Measurements and reporting
 .. option:: write_iops_log=str
 
 	Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-	:file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
-	details about the filename format and `Log File Formats`_ for how data
-	is structured within the file.
+	:file:`name_iops.x.log`) instead. Because fio defaults to individual
+	I/O logging, the value entry in the IOPS log will be 1 unless windowed
+	logging (see :option:`log_avg_msec`) has been enabled. See
+	:option:`write_bw_log` for details about the filename format and `Log
+	File Formats`_ for how data is structured within the file.
 
 .. option:: log_avg_msec=int
 
@@ -3802,17 +3808,16 @@ on the type of log, it will be one of the following:
 	**2**
 		I/O is a TRIM
 
-The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's *block size* is always in bytes. The *offset* is the position in bytes
+from the start of the file for that particular I/O. The logging of the offset can be
 toggled with :option:`log_offset`.
 
-Fio defaults to logging every individual I/O.  When IOPS are logged for individual
-I/Os the *value* entry will always be 1. If windowed logging is enabled through
-:option:`log_avg_msec`, fio logs the average values over the specified period of time.
-If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
-maximum values in that window instead of averages. Since *data direction*, *block
-size* and *offset* are per-I/O values, if windowed logging is enabled they
-aren't applicable and will be 0.
+Fio defaults to logging every individual I/O but when windowed logging is set
+through :option:`log_avg_msec`, either the average (by default) or the maximum
+(:option:`log_max_value` is set) *value* seen over the specified period of time
+is recorded. Each *data direction* seen within the window period will aggregate
+its values in a separate row. Further, when using windowed logging the *block
+size* and *offset* entries will always contain 0.
 
 Client/Server
 -------------
diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index dd286d0..fc1c484 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -6,7 +6,8 @@
 #if defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__) \
 	|| defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5T__) || defined (__ARM_ARCH_5E__)\
 	|| defined (__ARM_ARCH_5TE__) || defined (__ARM_ARCH_5TEJ__) \
-	|| defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__)
+	|| defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+	|| defined(__ARM_ARCH_6KZ__)
 #define nop             __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t")
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
 #define write_barrier()	__asm__ __volatile__ ("" : : : "memory")
diff --git a/fio.1 b/fio.1
index 4386f85..0c604a6 100644
--- a/fio.1
+++ b/fio.1
@@ -168,7 +168,8 @@ Set this \fIcommand\fR as local trigger.
 Set this \fIcommand\fR as remote trigger.
 .TP
 .BI \-\-aux\-path \fR=\fPpath
-Use this \fIpath\fR for fio state generated files.
+Use the directory specified by \fIpath\fP for generated state files instead
+of the current working directory.
 .SH "JOB FILE FORMAT"
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
@@ -523,12 +524,15 @@ separating the names with a ':' character. These directories will be
 assigned equally distributed to job clones created by \fBnumjobs\fR as
 long as they are using generated filenames. If specific \fBfilename\fR(s) are
 set fio will use the first listed directory, and thereby matching the
-\fBfilename\fR semantic which generates a file each clone if not specified, but
-let all clones use the same if set.
+\fBfilename\fR semantic (which generates a file for each clone if not
+specified, but lets all clones use the same file if set).
 .RS
 .P
-See the \fBfilename\fR option for information on how to escape ':' and '\'
+See the \fBfilename\fR option for information on how to escape ':' and '\\'
 characters within the directory path itself.
+.P
+Note: To control the directory fio will use for internal state files
+use \fB\-\-aux\-path\fR.
 .RE
 .TP
 .BI filename \fR=\fPstr
@@ -545,13 +549,13 @@ by this option will be \fBsize\fR divided by number of files unless an
 explicit size is specified by \fBfilesize\fR.
 .RS
 .P
-Each colon and backslash in the wanted path must be escaped with a '\'
+Each colon and backslash in the wanted path must be escaped with a '\\'
 character. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
 would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
-`F:\\\\filename' then you would use `filename=F\\:\\\\filename'.
+`F:\\filename' then you would use `filename=F\\:\\\\filename'.
 .P
-On Windows, disk devices are accessed as `\\\\\\\\.\\\\PhysicalDrive0' for
-the first device, `\\\\\\\\.\\\\PhysicalDrive1' for the second etc.
+On Windows, disk devices are accessed as `\\\\.\\PhysicalDrive0' for
+the first device, `\\\\.\\PhysicalDrive1' for the second etc.
 Note: Windows and FreeBSD prevent write access to areas
 of the disk containing in\-use data (e.g. filesystems).
 .P
@@ -2608,9 +2612,11 @@ within the file.
 .TP
 .BI write_iops_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes an IOPS file (e.g.
-`name_iops.x.log') instead. See \fBwrite_bw_log\fR for
-details about the filename format and the \fBLOG FILE FORMATS\fR section for how data
-is structured within the file.
+`name_iops.x.log`) instead. Because fio defaults to individual
+I/O logging, the value entry in the IOPS log will be 1 unless windowed
+logging (see \fBlog_avg_msec\fR) has been enabled. See
+\fBwrite_bw_log\fR for details about the filename format and \fBLOG
+FILE FORMATS\fR for how data is structured within the file.
 .TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
@@ -3527,17 +3533,16 @@ I/O is a WRITE
 I/O is a TRIM
 .RE
 .P
-The entry's `block size' is always in bytes. The `offset' is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's `block size' is always in bytes. The `offset' is the position in bytes
+from the start of the file for that particular I/O. The logging of the offset can be
 toggled with \fBlog_offset\fR.
 .P
-Fio defaults to logging every individual I/O. When IOPS are logged for individual
-I/Os the `value' entry will always be 1. If windowed logging is enabled through
-\fBlog_avg_msec\fR, fio logs the average values over the specified period of time.
-If windowed logging is enabled and \fBlog_max_value\fR is set, then fio logs
-maximum values in that window instead of averages. Since `data direction', `block size'
-and `offset' are per\-I/O values, if windowed logging is enabled they
-aren't applicable and will be 0.
+Fio defaults to logging every individual I/O but when windowed logging is set
+through \fBlog_avg_msec\fR, either the average (by default) or the maximum
+(\fBlog_max_value\fR is set) `value' seen over the specified period of time
+is recorded. Each `data direction' seen within the window period will aggregate
+its values in a separate row. Further, when using windowed logging the `block
+size' and `offset' entries will always contain 0.
 .SH CLIENT / SERVER
 Normally fio is invoked as a stand\-alone application on the machine where the
 I/O workload should be generated. However, the backend and frontend of fio can
diff --git a/tools/fio.service b/tools/fio.service
index 21de0b7..678158b 100644
--- a/tools/fio.service
+++ b/tools/fio.service
@@ -1,10 +1,10 @@
 [Unit]
-
-Description=flexible I/O tester server
+Description=Flexible I/O tester server
 After=network.target
 
 [Service]
-
 Type=simple
-PIDFile=/run/fio.pid
 ExecStart=/usr/bin/fio --server
+
+[Install]
+WantedBy=multi-user.target
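The reworded *Log File Formats* text says that with `log_avg_msec` set, fio records either the average or (with `log_max_value`) the maximum *value* per window, one row per data direction seen, with *block size* and *offset* set to 0. The sketch below models that aggregation for a generic per-I/O metric such as completion latency. The types and function are hypothetical, stamping each row with the window end time is an assumption, and this is not fio's logging code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-I/O sample: completion time, data direction, metric value. */
struct sample {
	uint64_t t_msec;
	int ddir;		/* 0 = read, 1 = write, 2 = trim */
	uint64_t value;
};

/* One windowed log row: time, value, data direction, block size, offset. */
struct log_row {
	uint64_t t_msec;
	uint64_t value;
	int ddir;
	uint64_t bs;		/* always 0 for windowed rows */
	uint64_t offset;	/* always 0 for windowed rows */
};

/*
 * Aggregate samples in [start, start + avg_msec) into at most one row per
 * data direction (average, or maximum if use_max). Returns rows written.
 */
static int window_rows(const struct sample *s, size_t nr, uint64_t start,
		       uint64_t avg_msec, int use_max, struct log_row rows[3])
{
	uint64_t sum[3] = { 0 }, max[3] = { 0 }, cnt[3] = { 0 };
	int d, n = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (s[i].t_msec < start || s[i].t_msec >= start + avg_msec)
			continue;	/* outside this window */
		if (s[i].ddir < 0 || s[i].ddir > 2)
			continue;
		d = s[i].ddir;
		sum[d] += s[i].value;
		cnt[d]++;
		if (s[i].value > max[d])
			max[d] = s[i].value;
	}
	for (d = 0; d < 3; d++) {
		if (!cnt[d])
			continue;	/* direction not seen in this window */
		rows[n].t_msec = start + avg_msec;
		rows[n].value = use_max ? max[d] : sum[d] / cnt[d];
		rows[n].ddir = d;
		rows[n].bs = 0;
		rows[n].offset = 0;
		n++;
	}
	return n;
}
```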



* Recent changes (master)
@ 2018-08-11 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-11 12:00 UTC
  To: fio

The following changes since commit ae48493039f833119126ef979dbe86aee2c05557:

  Merge branch 'realloc-io_u-buffers' of https://github.com/aclamk/fio (2018-08-09 09:31:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 73027fb9ad6728f5b066e25a9ec923459ceab8a2:

  eta: clean up ETA string printing (2018-08-10 09:33:44 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      eta: clean up ETA string printing

 eta.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/eta.c b/eta.c
index 9111f5e..970a67d 100644
--- a/eta.c
+++ b/eta.c
@@ -518,6 +518,63 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	return true;
 }
 
+static int gen_eta_str(struct jobs_eta *je, char *p, size_t left,
+		       char **rate_str, char **iops_str)
+{
+	bool has_r = je->rate[DDIR_READ] || je->iops[DDIR_READ];
+	bool has_w = je->rate[DDIR_WRITE] || je->iops[DDIR_WRITE];
+	bool has_t = je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM];
+	int l = 0;
+
+	if (!has_r && !has_w && !has_t)
+		return 0;
+
+	if (has_r) {
+		l += snprintf(p + l, left - l, "[r=%s", rate_str[DDIR_READ]);
+		if (!has_w)
+			l += snprintf(p + l, left - l, "]");
+	}
+	if (has_w) {
+		if (has_r)
+			l += snprintf(p + l, left - l, ",");
+		else
+			l += snprintf(p + l, left - l, "[");
+		l += snprintf(p + l, left - l, "w=%s", rate_str[DDIR_WRITE]);
+		if (!has_t)
+			l += snprintf(p + l, left - l, "]");
+	}
+	if (has_t) {
+		if (has_r || has_w)
+			l += snprintf(p + l, left - l, ",");
+		else if (!has_r && !has_w)
+			l += snprintf(p + l, left - l, "[");
+		l += snprintf(p + l, left - l, "t=%s]", rate_str[DDIR_TRIM]);
+	}
+	if (has_r) {
+		l += snprintf(p + l, left - l, "[r=%s", iops_str[DDIR_READ]);
+		if (!has_w)
+			l += snprintf(p + l, left - l, " IOPS]");
+	}
+	if (has_w) {
+		if (has_r)
+			l += snprintf(p + l, left - l, ",");
+		else
+			l += snprintf(p + l, left - l, "[");
+		l += snprintf(p + l, left - l, "w=%s", iops_str[DDIR_WRITE]);
+		if (!has_t)
+			l += snprintf(p + l, left - l, " IOPS]");
+	}
+	if (has_t) {
+		if (has_r || has_w)
+			l += snprintf(p + l, left - l, ",");
+		else if (!has_r && !has_w)
+			l += snprintf(p + l, left - l, "[");
+		l += snprintf(p + l, left - l, "t=%s IOPS]", iops_str[DDIR_TRIM]);
+	}
+
+	return l;
+}
+
 void display_thread_status(struct jobs_eta *je)
 {
 	static struct timespec disp_eta_new_line;
@@ -592,21 +649,10 @@ void display_thread_status(struct jobs_eta *je)
 		}
 
 		left = sizeof(output) - (p - output) - 1;
+		l = snprintf(p, left, ": [%s][%s]", je->run_str, perc_str);
+		l += gen_eta_str(je, p + l, left - l, rate_str, iops_str);
+		l += snprintf(p + l, left - l, "[eta %s]", eta_str);
 
-		if (je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM])
-			l = snprintf(p, left,
-				": [%s][%s][r=%s,w=%s,t=%s][r=%s,w=%s,t=%s IOPS][eta %s]",
-				je->run_str, perc_str, rate_str[DDIR_READ],
-				rate_str[DDIR_WRITE], rate_str[DDIR_TRIM],
-				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
-				iops_str[DDIR_TRIM], eta_str);
-		else
-			l = snprintf(p, left,
-				": [%s][%s][r=%s,w=%s][r=%s,w=%s IOPS][eta %s]",
-				je->run_str, perc_str,
-				rate_str[DDIR_READ], rate_str[DDIR_WRITE],
-				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
-				eta_str);
 		/* If truncation occurred adjust l so p is on the null */
 		if (l >= left)
 			l = left - 1;
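`gen_eta_str()` emits the read, write and trim groups with explicit bracket-and-comma cases. As written in this patch, a job with reads and trims but no writes produces `[r=...],t=...]`, because the read branch closes the bracket whenever writes are absent. Looping over the active directions avoids enumerating the cases. The following is a hedged sketch with hypothetical helper names (`append()`, `append_group()`, `gen_eta_groups()`), not fio's code; it also clamps on truncation instead of letting the write position run past the buffer.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text at buf[*len], never writing past size bytes. */
static void append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*len + 1 >= size)
		return;		/* buffer already full */
	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	*len += (size_t)n;
	if (*len >= size)
		*len = size - 1;	/* truncated: stay on the terminating NUL */
}

/* Emit "[r=A,w=B,t=C<suffix>]" for the active directions only. */
static void append_group(char *buf, size_t size, size_t *len,
			 const bool active[3], char *const strs[3],
			 const char *suffix)
{
	static const char names[3] = { 'r', 'w', 't' };
	bool first = true;
	int d;

	for (d = 0; d < 3; d++) {
		if (!active[d])
			continue;
		append(buf, size, len, "%s%c=%s", first ? "[" : ",",
		       names[d], strs[d]);
		first = false;
	}
	if (!first)
		append(buf, size, len, "%s]", suffix);
}

/* Rate group followed by IOPS group; returns the resulting string length. */
static size_t gen_eta_groups(char *buf, size_t size, const bool active[3],
			     char *const rate[3], char *const iops[3])
{
	size_t len = 0;

	if (!size)
		return 0;
	buf[0] = '\0';
	append_group(buf, size, &len, active, rate, "");
	append_group(buf, size, &len, active, iops, " IOPS");
	return len;
}
```

For reads and trims only, this yields `[r=1MiB/s,t=2MiB/s][r=10,t=20 IOPS]`.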



* Recent changes (master)
@ 2018-08-10 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-10 12:00 UTC
  To: fio

The following changes since commit 2ce58465415fc4d900c4dd89b86acbcaa51d9dfb:

  libpmem: fix type print (2018-08-07 08:09:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ae48493039f833119126ef979dbe86aee2c05557:

  Merge branch 'realloc-io_u-buffers' of https://github.com/aclamk/fio (2018-08-09 09:31:26 -0600)

----------------------------------------------------------------
Adam Kupczyk (2):
      iolog replay: Treat 'open' on file that is scheduled to close as cancel of 'close' operation.
      iolog replay: Realloc io_u buffers to adapt to operation size.

Jens Axboe (1):
      Merge branch 'realloc-io_u-buffers' of https://github.com/aclamk/fio

Udit agarwal (1):
      xxhash: fix function definition and declaration mismatch

 backend.c    | 115 ++++++++++++++++++++++++++++++++++-------------------------
 crc/xxhash.h |   8 ++---
 filesetup.c  |   6 ++++
 ioengines.c  |  13 ++++---
 iolog.c      |  11 +++++-
 iolog.h      |   1 +
 6 files changed, 95 insertions(+), 59 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 4b4ecde..f6cfbdd 100644
--- a/backend.c
+++ b/backend.c
@@ -1201,19 +1201,10 @@ static void cleanup_io_u(struct thread_data *td)
 static int init_io_u(struct thread_data *td)
 {
 	struct io_u *io_u;
-	unsigned long long max_bs, min_write;
 	int cl_align, i, max_units;
-	int data_xfer = 1, err;
-	char *p;
+	int err;
 
 	max_units = td->o.iodepth;
-	max_bs = td_max_bs(td);
-	min_write = td->o.min_bs[DDIR_WRITE];
-	td->orig_buffer_size = (unsigned long long) max_bs
-					* (unsigned long long) max_units;
-
-	if (td_ioengine_flagged(td, FIO_NOIO) || !(td_read(td) || td_write(td)))
-		data_xfer = 0;
 
 	err = 0;
 	err += !io_u_rinit(&td->io_u_requeues, td->o.iodepth);
@@ -1225,6 +1216,70 @@ static int init_io_u(struct thread_data *td)
 		return 1;
 	}
 
+	cl_align = os_cache_line_size();
+
+	for (i = 0; i < max_units; i++) {
+		void *ptr;
+
+		if (td->terminate)
+			return 1;
+
+		ptr = fio_memalign(cl_align, sizeof(*io_u));
+		if (!ptr) {
+			log_err("fio: unable to allocate aligned memory\n");
+			break;
+		}
+
+		io_u = ptr;
+		memset(io_u, 0, sizeof(*io_u));
+		INIT_FLIST_HEAD(&io_u->verify_list);
+		dprint(FD_MEM, "io_u alloc %p, index %u\n", io_u, i);
+
+		io_u->index = i;
+		io_u->flags = IO_U_F_FREE;
+		io_u_qpush(&td->io_u_freelist, io_u);
+
+		/*
+		 * io_u never leaves this stack, used for iteration of all
+		 * io_u buffers.
+		 */
+		io_u_qpush(&td->io_u_all, io_u);
+
+		if (td->io_ops->io_u_init) {
+			int ret = td->io_ops->io_u_init(td, io_u);
+
+			if (ret) {
+				log_err("fio: failed to init engine data: %d\n", ret);
+				return 1;
+			}
+		}
+	}
+
+	init_io_u_buffers(td);
+
+	if (init_file_completion_logging(td, max_units))
+		return 1;
+
+	return 0;
+}
+
+int init_io_u_buffers(struct thread_data *td)
+{
+	struct io_u *io_u;
+	unsigned long long max_bs, min_write;
+	int i, max_units;
+	int data_xfer = 1;
+	char *p;
+
+	max_units = td->o.iodepth;
+	max_bs = td_max_bs(td);
+	min_write = td->o.min_bs[DDIR_WRITE];
+	td->orig_buffer_size = (unsigned long long) max_bs
+					* (unsigned long long) max_units;
+
+	if (td_ioengine_flagged(td, FIO_NOIO) || !(td_read(td) || td_write(td)))
+		data_xfer = 0;
+
 	/*
 	 * if we may later need to do address alignment, then add any
 	 * possible adjustment here so that we don't cause a buffer
@@ -1256,23 +1311,8 @@ static int init_io_u(struct thread_data *td)
 	else
 		p = td->orig_buffer;
 
-	cl_align = os_cache_line_size();
-
 	for (i = 0; i < max_units; i++) {
-		void *ptr;
-
-		if (td->terminate)
-			return 1;
-
-		ptr = fio_memalign(cl_align, sizeof(*io_u));
-		if (!ptr) {
-			log_err("fio: unable to allocate aligned memory\n");
-			break;
-		}
-
-		io_u = ptr;
-		memset(io_u, 0, sizeof(*io_u));
-		INIT_FLIST_HEAD(&io_u->verify_list);
+		io_u = td->io_u_all.io_us[i];
 		dprint(FD_MEM, "io_u alloc %p, index %u\n", io_u, i);
 
 		if (data_xfer) {
@@ -1289,32 +1329,9 @@ static int init_io_u(struct thread_data *td)
 				fill_verify_pattern(td, io_u->buf, max_bs, io_u, 0, 0);
 			}
 		}
-
-		io_u->index = i;
-		io_u->flags = IO_U_F_FREE;
-		io_u_qpush(&td->io_u_freelist, io_u);
-
-		/*
-		 * io_u never leaves this stack, used for iteration of all
-		 * io_u buffers.
-		 */
-		io_u_qpush(&td->io_u_all, io_u);
-
-		if (td->io_ops->io_u_init) {
-			int ret = td->io_ops->io_u_init(td, io_u);
-
-			if (ret) {
-				log_err("fio: failed to init engine data: %d\n", ret);
-				return 1;
-			}
-		}
-
 		p += max_bs;
 	}
 
-	if (init_file_completion_logging(td, max_units))
-		return 1;
-
 	return 0;
 }
 
diff --git a/crc/xxhash.h b/crc/xxhash.h
index 8850d20..934c555 100644
--- a/crc/xxhash.h
+++ b/crc/xxhash.h
@@ -107,9 +107,9 @@ XXH32() :
 // Advanced Hash Functions
 //****************************
 
-void*         XXH32_init   (unsigned int seed);
+void*         XXH32_init   (uint32_t seed);
 XXH_errorcode XXH32_update (void* state, const void* input, int len);
-unsigned int  XXH32_digest (void* state);
+uint32_t XXH32_digest (void* state);
 
 /*
 These functions calculate the xxhash of an input provided in several small packets,
@@ -135,7 +135,7 @@ Memory will be freed by XXH32_digest().
 
 
 int           XXH32_sizeofState(void);
-XXH_errorcode XXH32_resetState(void* state, unsigned int seed);
+XXH_errorcode XXH32_resetState(void* state, uint32_t seed);
 
 #define       XXH32_SIZEOFSTATE 48
 typedef struct { long long ll[(XXH32_SIZEOFSTATE+(sizeof(long long)-1))/sizeof(long long)]; } XXH32_stateSpace_t;
@@ -151,7 +151,7 @@ use the structure XXH32_stateSpace_t, which will ensure that memory space is lar
 */
 
 
-unsigned int XXH32_intermediateDigest (void* state);
+uint32_t XXH32_intermediateDigest (void* state);
 /*
 This function does the same as XXH32_digest(), generating a 32-bit hash,
 but preserve memory context.
diff --git a/filesetup.c b/filesetup.c
index accb67a..94a025e 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1675,6 +1675,11 @@ int put_file(struct thread_data *td, struct fio_file *f)
 	if (--f->references)
 		return 0;
 
+	disk_util_dec(f->du);
+
+	if (td->o.file_lock_mode != FILE_LOCK_NONE)
+		unlock_file_all(td, f);
+
 	if (should_fsync(td) && td->o.fsync_on_close) {
 		f_ret = fsync(f->fd);
 		if (f_ret < 0)
@@ -1688,6 +1693,7 @@ int put_file(struct thread_data *td, struct fio_file *f)
 		ret = f_ret;
 
 	td->nr_open_files--;
+	fio_file_clear_closing(f);
 	fio_file_clear_open(f);
 	assert(f->fd == -1);
 	return ret;
diff --git a/ioengines.c b/ioengines.c
index e5fbcd4..433da60 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -431,6 +431,14 @@ void td_io_commit(struct thread_data *td)
 
 int td_io_open_file(struct thread_data *td, struct fio_file *f)
 {
+	if (fio_file_closing(f)) {
+		/*
+		 * Open translates to undo closing.
+		 */
+		fio_file_clear_closing(f);
+		get_file(f);
+		return 0;
+	}
 	assert(!fio_file_open(f));
 	assert(f->fd == -1);
 	assert(td->io_ops->open_file);
@@ -540,11 +548,6 @@ int td_io_close_file(struct thread_data *td, struct fio_file *f)
 	 */
 	fio_file_set_closing(f);
 
-	disk_util_dec(f->du);
-
-	if (td->o.file_lock_mode != FILE_LOCK_NONE)
-		unlock_file_all(td, f);
-
 	return put_file(td, f);
 }
 
diff --git a/iolog.c b/iolog.c
index bd2a214..0f95c60 100644
--- a/iolog.c
+++ b/iolog.c
@@ -389,6 +389,7 @@ static bool read_iolog2(struct thread_data *td)
 	char *rfname, *fname, *act;
 	char *str, *p;
 	enum fio_ddir rw;
+	bool realloc = false;
 	int64_t items_to_fetch = 0;
 
 	if (td->o.read_iolog_chunked) {
@@ -501,8 +502,10 @@ static bool read_iolog2(struct thread_data *td)
 			ipo_bytes_align(td->o.replay_align, ipo);
 
 			ipo->len = bytes;
-			if (rw != DDIR_INVAL && bytes > td->o.max_bs[rw])
+			if (rw != DDIR_INVAL && bytes > td->o.max_bs[rw]) {
+				realloc = true;
 				td->o.max_bs[rw] = bytes;
+			}
 			ipo->fileno = fileno;
 			ipo->file_action = file_action;
 			td->o.size += bytes;
@@ -539,6 +542,12 @@ static bool read_iolog2(struct thread_data *td)
 			return false;
 		}
 		td->o.td_ddir = TD_DDIR_RW;
+		if (realloc && td->orig_buffer)
+		{
+			io_u_quiesce(td);
+			free_io_mem(td);
+			init_io_u_buffers(td);
+		}
 		return true;
 	}
 
diff --git a/iolog.h b/iolog.h
index 3b8c901..17be908 100644
--- a/iolog.h
+++ b/iolog.h
@@ -244,6 +244,7 @@ extern void write_iolog_close(struct thread_data *);
 extern int iolog_compress_init(struct thread_data *, struct sk_out *);
 extern void iolog_compress_exit(struct thread_data *);
 extern size_t log_chunk_sizes(struct io_log *);
+extern int init_io_u_buffers(struct thread_data *);
 
 #ifdef CONFIG_ZLIB
 extern int iolog_file_inflate(const char *);


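The put_file() and td_io_open_file() hunks above move the disk-util decrement and file unlocking into the final reference drop. They also let an open that arrives while a file is still marked "closing" cancel the close by taking a fresh reference. The sketch below models that state machine in isolation. It uses a hypothetical `struct demo_file` with a reference count and a closing flag, not fio's real `struct fio_file` or its helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for fio's struct fio_file: only the fields the
 * close/reopen logic needs. */
struct demo_file {
	int references;   /* outstanding users of the file */
	bool open;        /* underlying descriptor is open */
	bool closing;     /* close requested, but users remain */
	int final_puts;   /* how many times teardown actually ran */
};

static void demo_get_file(struct demo_file *f)
{
	f->references++;
}

/* Drop one reference; tear down only when the last one goes away. */
static void demo_put_file(struct demo_file *f)
{
	assert(f->references > 0);
	if (--f->references)
		return;
	/* disk-util decrement, unlock, fsync and close would happen here */
	f->final_puts++;
	f->closing = false;
	f->open = false;
}

static void demo_close_file(struct demo_file *f)
{
	f->closing = true;
	demo_put_file(f);
}

/* An open while a close is still pending simply undoes the close. */
static int demo_open_file(struct demo_file *f)
{
	if (f->closing) {
		f->closing = false;
		demo_get_file(f);
		return 0;
	}
	assert(!f->open);
	f->open = true;
	demo_get_file(f);
	return 0;
}
```

With an I/O still in flight, a close only marks the file. A reopen then clears the mark, and teardown runs once, when the last reference is dropped.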

* Recent changes (master)
@ 2018-08-08 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-08 12:00 UTC
  To: fio

The following changes since commit b87ed299820c26e8c4271294b0c5037e8d0a3d4a:

  client: support --status-interval option in client/server mode (2018-08-05 15:12:12 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ce58465415fc4d900c4dd89b86acbcaa51d9dfb:

  libpmem: fix type print (2018-08-07 08:09:47 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      libpmem: fix type print

 engines/libpmem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/engines/libpmem.c b/engines/libpmem.c
index 4ef3094..99c7b50 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -424,10 +424,10 @@ static int fio_libpmem_prep(struct thread_data *td, struct io_u *io_u)
 	/*
 	 * It fits within existing mapping, use it
 	 */
-	dprint(FD_IO," io_u->offset %lld : fdd->libpmem_off %ld : "
-			"io_u->buflen %ld : fdd->libpmem_sz %ld\n",
-			io_u->offset, fdd->libpmem_off,
-			io_u->buflen, fdd->libpmem_sz);
+	dprint(FD_IO," io_u->offset %llu : fdd->libpmem_off %llu : "
+			"io_u->buflen %llu : fdd->libpmem_sz %llu\n",
+			io_u->offset, (unsigned long long) fdd->libpmem_off,
+			io_u->buflen, (unsigned long long) fdd->libpmem_sz);
 
 	if (io_u->offset >= fdd->libpmem_off &&
 	    (io_u->offset + io_u->buflen <=

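The libpmem fix prints every value with %llu and casts fdd->libpmem_off and fdd->libpmem_sz to unsigned long long. Each argument then matches its conversion whatever the field's declared type is. %ld only matches long, which is 32 bits on some platforms. The same pattern is shown below in isolation. The function name is hypothetical, and the parameter types (uint64_t and size_t) were chosen for illustration because the real field types are not shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a mapping offset and size. Both values are cast to
 * unsigned long long so that each %llu conversion receives exactly
 * the type it expects, independent of the original field types.
 */
static int demo_format_mapping(char *buf, size_t len,
			       uint64_t map_off, size_t map_sz)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "off %llu : sz %llu",
			(unsigned long long) map_off,
			(unsigned long long) map_sz);
}
```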

* Recent changes (master)
@ 2018-08-06 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-06 12:00 UTC
  To: fio

The following changes since commit 1d73ff2a4a8f02905cf338b2f0286d76d64e7c2a:

  iolog: move the chunked items-to-fetch logic into separate function (2018-08-03 14:40:17 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b87ed299820c26e8c4271294b0c5037e8d0a3d4a:

  client: support --status-interval option in client/server mode (2018-08-05 15:12:12 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (1):
      client: support --status-interval option in client/server mode

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index ede0a8b..06f6971 100644
--- a/init.c
+++ b/init.c
@@ -258,7 +258,7 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 	{
 		.name		= (char *) "status-interval",
 		.has_arg	= required_argument,
-		.val		= 'L',
+		.val		= 'L' | FIO_CLIENT_FLAG,
 	},
 	{
 		.name		= (char *) "trigger-file",

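The one-line change ORs FIO_CLIENT_FLAG into the `.val` that getopt_long() returns for `--status-interval`. The parser can later test that bit to recognize options relevant in client/server mode, then strip it to recover the short-option character. The sketch below illustrates the encoding. The flag value `1 << 16`, the names, and the second table entry are all hypothetical; the sketch only assumes that the bit sits above any single-byte option character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <getopt.h>

/* Hypothetical marker bit, above any single-byte short-option char. */
#define DEMO_CLIENT_FLAG (1 << 16)

/* Long-option table: .val carries the short-option character plus,
 * optionally, the client marker. "demo-local" is made up. */
static const struct option demo_opts[] = {
	{ "status-interval", required_argument, NULL, 'L' | DEMO_CLIENT_FLAG },
	{ "demo-local",      required_argument, NULL, 'x' },
	{ NULL, 0, NULL, 0 },
};

/* Split a returned option value into its character and the marker. */
static int demo_decode_option(int val, bool *forward_to_client)
{
	assert(forward_to_client != NULL);
	*forward_to_client = (val & DEMO_CLIENT_FLAG) != 0;
	return val & ~DEMO_CLIENT_FLAG;
}
```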

* Recent changes (master)
@ 2018-08-04 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-04 12:00 UTC
  To: fio

The following changes since commit 17924179519397e49a7a82fd99d860f9ef077645:

  Merge branch 'szaydel/solaris-Wincompatible-pointer-types' of https://github.com/szaydel/fio (2018-08-02 16:20:24 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1d73ff2a4a8f02905cf338b2f0286d76d64e7c2a:

  iolog: move the chunked items-to-fetch logic into separate function (2018-08-03 14:40:17 -0600)

----------------------------------------------------------------
Adam Kupczyk (4):
      iolog: Added option read_iolog_chunked. Used to avoid reading large iologs at once. Allows iologs to be infinite, generated.
      iolog: Added new option description to HOWTO
      platforms/windows: Add S_ISSOCK macro.
      iolog: allow to read_iolog from unix socket

Jens Axboe (5):
      Merge branch 'windows-s_issock' of https://github.com/aclamk/fio
      Merge branch 'read_iolog-from-unix-socket' of https://github.com/aclamk/fio
      Merge branch 'chunked-iolog-reading' of https://github.com/aclamk/fio
      iolog: fix potential div-by-zero
      iolog: move the chunked items-to-fetch logic into separate function

 HOWTO            |   6 +++
 backend.c        |   4 ++
 fio.1            |   5 +++
 fio.h            |   5 +++
 iolog.c          | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 options.c        |  11 +++++
 os/os-windows.h  |   4 ++
 thread_options.h |   1 +
 8 files changed, 147 insertions(+), 11 deletions(-)

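The chunked reader in the iolog.c diff below sizes each refill from the observed consumption rate. Going by iolog_items_to_fetch(), the estimate is (highmark - current) * 10^9 / elapsed_ns entries per second, minus the entries still queued. The next refill point (checkmark) is set to half the new high mark. The pure-function sketch below models that arithmetic. Its struct and function names are hypothetical, and the clock read is replaced by an explicit elapsed time so the function can be tested. The difference is also widened to 64 bits before multiplying, so the product cannot overflow an unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the io_log_* counters added to thread_data. */
struct demo_chunk_state {
	unsigned int current;   /* entries parsed but not yet consumed */
	unsigned int highmark;  /* queue depth right after the last refill */
	unsigned int checkmark; /* refill once current drops to this */
};

/* How many more log entries to parse, given the time since last refill. */
static int64_t demo_items_to_fetch(struct demo_chunk_state *s,
				   uint64_t elapsed_ns)
{
	int64_t items = 0;

	if (!s->highmark)
		return 10;	/* no history yet: small fixed first batch */

	if (elapsed_ns) {
		/* entries are only consumed after a refill */
		assert(s->current <= s->highmark);
		uint64_t consumed = (uint64_t) (s->highmark - s->current);
		int64_t per_second =
			(int64_t) (consumed * 1000000000ULL / elapsed_ns);

		/* top the queue up to one second's worth of entries */
		items = per_second - (int64_t) s->current;
		if (items < 0)
			items = 0;
	}

	s->highmark = s->current + (unsigned int) items;
	s->checkmark = (s->highmark + 1) / 2;
	return items;
}
```

For example, if 50 of 100 queued entries were consumed in 0.5 s, the estimated rate is 100 entries per second. The sketch therefore fetches 50 more, bringing the queue back to 100.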
---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 804d93e..16c5ae3 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2327,6 +2327,12 @@ I/O replay
 	replay, the file needs to be turned into a blkparse binary data file first
 	(``blkparse <device> -o /dev/null -d file_for_fio.bin``).
 
+.. option:: read_iolog_chunked=bool
+
+	Determines how iolog is read. If false(default) entire :option:`read_iolog`
+	will be read at once. If selected true, input from iolog will be read
+	gradually. Useful when iolog is very large, or it is generated.
+
 .. option:: replay_no_stall=bool
 
 	When replaying I/O with :option:`read_iolog` the default behavior is to
diff --git a/backend.c b/backend.c
index 3c45e78..4b4ecde 100644
--- a/backend.c
+++ b/backend.c
@@ -966,8 +966,10 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		 * Break if we exceeded the bytes. The exception is time
 		 * based runs, but we still need to break out of the loop
 		 * for those to run verification, if enabled.
+		 * Jobs read from iolog do not use this stop condition.
 		 */
 		if (bytes_issued >= total_bytes &&
+		    !td->o.read_iolog_file &&
 		    (!td->o.time_based ||
 		     (td->o.time_based && td->o.verify != VERIFY_NONE)))
 			break;
@@ -1909,6 +1911,8 @@ err:
 	 */
 	if (o->write_iolog_file)
 		write_iolog_close(td);
+	if (td->io_log_rfile)
+		fclose(td->io_log_rfile);
 
 	td_set_runstate(td, TD_EXITED);
 
diff --git a/fio.1 b/fio.1
index a446aba..4386f85 100644
--- a/fio.1
+++ b/fio.1
@@ -2057,6 +2057,11 @@ to replay a workload captured by blktrace. See
 replay, the file needs to be turned into a blkparse binary data file first
 (`blkparse <device> \-o /dev/null \-d file_for_fio.bin').
 .TP
+.BI read_iolog_chunked \fR=\fPbool
+Determines how iolog is read. If false (default) entire \fBread_iolog\fR will
+be read at once. If selected true, input from iolog will be read gradually.
+Useful when iolog is very large, or it is generated.
+.TP
 .BI replay_no_stall \fR=\fPbool
 When replaying I/O with \fBread_iolog\fR the default behavior is to
 attempt to respect the timestamps within the log and replay them with the
diff --git a/fio.h b/fio.h
index 685aab1..b58057f 100644
--- a/fio.h
+++ b/fio.h
@@ -399,6 +399,11 @@ struct thread_data {
 	 * For IO replaying
 	 */
 	struct flist_head io_log_list;
+	FILE *io_log_rfile;
+	unsigned int io_log_current;
+	unsigned int io_log_checkmark;
+	unsigned int io_log_highmark;
+	struct timespec io_log_highmark_time;
 
 	/*
 	 * For tracking/handling discards
diff --git a/iolog.c b/iolog.c
index d51e49c..bd2a214 100644
--- a/iolog.c
+++ b/iolog.c
@@ -20,6 +20,13 @@
 #include "blktrace.h"
 #include "pshared.h"
 
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <arpa/inet.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
 static int iolog_flush(struct io_log *log);
 
 static const char iolog_ver2[] = "fio version 2 iolog";
@@ -134,6 +141,8 @@ static int ipo_special(struct thread_data *td, struct io_piece *ipo)
 	return 1;
 }
 
+static bool read_iolog2(struct thread_data *td);
+
 int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 {
 	struct io_piece *ipo;
@@ -141,7 +150,13 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 
 	while (!flist_empty(&td->io_log_list)) {
 		int ret;
-
+		if (td->o.read_iolog_chunked) {
+			if (td->io_log_checkmark == td->io_log_current) {
+				if (!read_iolog2(td))
+					return 1;
+			}
+			td->io_log_current--;
+		}
 		ipo = flist_first_entry(&td->io_log_list, struct io_piece, list);
 		flist_del(&ipo->list);
 		remove_trim_entry(td, ipo);
@@ -334,11 +349,39 @@ void write_iolog_close(struct thread_data *td)
 	td->iolog_buf = NULL;
 }
 
+static int64_t iolog_items_to_fetch(struct thread_data *td)
+{
+	struct timespec now;
+	uint64_t elapsed;
+	uint64_t for_1s;
+	int64_t items_to_fetch;
+
+	if (!td->io_log_highmark)
+		return 10;
+
+
+	fio_gettime(&now, NULL);
+	elapsed = ntime_since(&td->io_log_highmark_time, &now);
+	if (elapsed) {
+		for_1s = (td->io_log_highmark - td->io_log_current) * 1000000000 / elapsed;
+		items_to_fetch = for_1s - td->io_log_current;
+		if (items_to_fetch < 0)
+			items_to_fetch = 0;
+	} else
+		items_to_fetch = 0;
+
+	td->io_log_highmark = td->io_log_current + items_to_fetch;
+	td->io_log_checkmark = (td->io_log_highmark + 1) / 2;
+	fio_gettime(&td->io_log_highmark_time, NULL);
+
+	return items_to_fetch;
+}
+
 /*
  * Read version 2 iolog data. It is enhanced to include per-file logging,
  * syncs, etc.
  */
-static bool read_iolog2(struct thread_data *td, FILE *f)
+static bool read_iolog2(struct thread_data *td)
 {
 	unsigned long long offset;
 	unsigned int bytes;
@@ -346,8 +389,13 @@ static bool read_iolog2(struct thread_data *td, FILE *f)
 	char *rfname, *fname, *act;
 	char *str, *p;
 	enum fio_ddir rw;
+	int64_t items_to_fetch = 0;
 
-	free_release_files(td);
+	if (td->o.read_iolog_chunked) {
+		items_to_fetch = iolog_items_to_fetch(td);
+		if (!items_to_fetch)
+			return true;
+	}
 
 	/*
 	 * Read in the read iolog and store it, reuse the infrastructure
@@ -358,7 +406,7 @@ static bool read_iolog2(struct thread_data *td, FILE *f)
 	act = malloc(256+16);
 
 	reads = writes = waits = 0;
-	while ((p = fgets(str, 4096, f)) != NULL) {
+	while ((p = fgets(str, 4096, td->io_log_rfile)) != NULL) {
 		struct io_piece *ipo;
 		int r;
 
@@ -461,18 +509,39 @@ static bool read_iolog2(struct thread_data *td, FILE *f)
 		}
 
 		queue_io_piece(td, ipo);
+
+		if (td->o.read_iolog_chunked) {
+			td->io_log_current++;
+			items_to_fetch--;
+			if (items_to_fetch == 0)
+				break;
+		}
 	}
 
 	free(str);
 	free(act);
 	free(rfname);
 
+	if (td->o.read_iolog_chunked) {
+		td->io_log_highmark = td->io_log_current;
+		td->io_log_checkmark = (td->io_log_highmark + 1) / 2;
+		fio_gettime(&td->io_log_highmark_time, NULL);
+	}
+
 	if (writes && read_only) {
 		log_err("fio: <%s> skips replay of %d writes due to"
 			" read-only\n", td->o.name, writes);
 		writes = 0;
 	}
 
+	if (td->o.read_iolog_chunked) {
+		if (td->io_log_current == 0) {
+			return false;
+		}
+		td->o.td_ddir = TD_DDIR_RW;
+		return true;
+	}
+
 	if (!reads && !writes && !waits)
 		return false;
 	else if (reads && !writes)
@@ -485,16 +554,46 @@ static bool read_iolog2(struct thread_data *td, FILE *f)
 	return true;
 }
 
+static bool is_socket(const char *path)
+{
+	struct stat buf;
+	int r = stat(path, &buf);
+	if (r == -1)
+		return false;
+
+	return S_ISSOCK(buf.st_mode);
+}
+
+static int open_socket(const char *path)
+{
+	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	struct sockaddr_un addr;
+	if (fd < 0)
+		return fd;
+	addr.sun_family = AF_UNIX;
+	strncpy(addr.sun_path, path, sizeof(addr.sun_path));
+	if (connect(fd, (const struct sockaddr *)&addr, strlen(path) + sizeof(addr.sun_family)) == 0)
+		return fd;
+	else
+		close(fd);
+	return -1;
+}
+
 /*
  * open iolog, check version, and call appropriate parser
  */
 static bool init_iolog_read(struct thread_data *td)
 {
 	char buffer[256], *p;
-	FILE *f;
+	FILE *f = NULL;
 	bool ret;
-
-	f = fopen(td->o.read_iolog_file, "r");
+	if (is_socket(td->o.read_iolog_file)) {
+		int fd = open_socket(td->o.read_iolog_file);
+		if (fd >= 0) {
+			f = fdopen(fd, "r");
+		}
+	} else
+		f = fopen(td->o.read_iolog_file, "r");
 	if (!f) {
 		perror("fopen read iolog");
 		return false;
@@ -507,19 +606,20 @@ static bool init_iolog_read(struct thread_data *td)
 		fclose(f);
 		return false;
 	}
-
+	td->io_log_rfile = f;
 	/*
 	 * version 2 of the iolog stores a specific string as the
 	 * first line, check for that
 	 */
-	if (!strncmp(iolog_ver2, buffer, strlen(iolog_ver2)))
-		ret = read_iolog2(td, f);
+	if (!strncmp(iolog_ver2, buffer, strlen(iolog_ver2))) {
+		free_release_files(td);
+		ret = read_iolog2(td);
+	}
 	else {
 		log_err("fio: iolog version 1 is no longer supported\n");
 		ret = false;
 	}
 
-	fclose(f);
 	return ret;
 }
 
diff --git a/options.c b/options.c
index 4b46402..f592027 100644
--- a/options.c
+++ b/options.c
@@ -3135,6 +3135,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_IOLOG,
 	},
 	{
+		.name	= "read_iolog_chunked",
+		.lname	= "Read I/O log in parts",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, read_iolog_chunked),
+		.def	= "0",
+		.parent	= "read_iolog",
+		.help	= "Parse IO pattern in chunks",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IOLOG,
+	},
+	{
 		.name	= "replay_no_stall",
 		.lname	= "Don't stall on replay",
 		.type	= FIO_OPT_BOOL,
diff --git a/os/os-windows.h b/os/os-windows.h
index 01f555e..aad446e 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -74,6 +74,10 @@ int rand_r(unsigned *);
 /* Winsock doesn't support MSG_WAIT */
 #define OS_MSG_DONTWAIT	0
 
+#ifndef S_ISSOCK
+#define S_ISSOCK(x) 0
+#endif
+
 #define SIGCONT	0
 #define SIGUSR1	1
 #define SIGUSR2 2
diff --git a/thread_options.h b/thread_options.h
index 8adba48..8bbf54b 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -247,6 +247,7 @@ struct thread_options {
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	char *read_iolog_file;
+	bool read_iolog_chunked;
 	char *write_iolog_file;
 
 	unsigned int write_bw_log;


* Recent changes (master)
@ 2018-08-03 12:00 Jens Axboe
From: Jens Axboe @ 2018-08-03 12:00 UTC
  To: fio

The following changes since commit 284226f333be2fb7d859287fd3ab3c51b3636a92:

  Merge branch 'fio-histo-fix' of https://github.com/parallel-fs-utils/fio (2018-07-30 08:24:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 17924179519397e49a7a82fd99d860f9ef077645:

  Merge branch 'szaydel/solaris-Wincompatible-pointer-types' of https://github.com/szaydel/fio (2018-08-02 16:20:24 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'szaydel/solaris-Wincompatible-pointer-types' of https://github.com/szaydel/fio

Sam Zaydel (1):
      Fix incompatible pointer types warning on Solaris with gcc 6.4

 engines/solarisaio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/solarisaio.c b/engines/solarisaio.c
index 151f31d..21e9593 100644
--- a/engines/solarisaio.c
+++ b/engines/solarisaio.c
@@ -105,7 +105,7 @@ static struct io_u *fio_solarisaio_event(struct thread_data *td, int event)
 	return sd->aio_events[event];
 }
 
-static int fio_solarisaio_queue(struct thread_data fio_unused *td,
+static enum fio_q_status fio_solarisaio_queue(struct thread_data fio_unused *td,
 			      struct io_u *io_u)
 {
 	struct solarisaio_data *sd = td->io_ops_data;


* Recent changes (master)
@ 2018-07-31 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-31 12:00 UTC
  To: fio

The following changes since commit a871240086ca6bdc52f79d7459ed283c5a359299:

  Merge branch 'sgunmap2' of https://github.com/vincentkfu/fio (2018-07-26 11:47:28 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 284226f333be2fb7d859287fd3ab3c51b3636a92:

  Merge branch 'fio-histo-fix' of https://github.com/parallel-fs-utils/fio (2018-07-30 08:24:20 -0600)

----------------------------------------------------------------
Ben England (1):
      clean up argparse usage

Jens Axboe (1):
      Merge branch 'fio-histo-fix' of https://github.com/parallel-fs-utils/fio

 tools/hist/fio-histo-log-pctiles.py | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
index bad6887..c398113 100755
--- a/tools/hist/fio-histo-log-pctiles.py
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -311,7 +311,7 @@ def compute_percentiles_from_logs():
         default="6", type=int, 
         help="fio histogram buckets-per-group bits (default=6 means 64 buckets/group)")
     parser.add_argument("--percentiles", dest="pctiles_wanted", 
-        default="0 50 95 99 100", type=float, nargs='+',
+        default=[ 0., 50., 95., 99., 100.], type=float, nargs='+',
         help="fio histogram buckets-per-group bits (default=6 means 64 buckets/group)")
     parser.add_argument("--time-quantum", dest="time_quantum", 
         default="1", type=int,
@@ -319,20 +319,17 @@ def compute_percentiles_from_logs():
     parser.add_argument("--output-unit", dest="output_unit", 
         default="usec", type=str,
         help="Latency percentile output unit: msec|usec|nsec (default usec)")
-    parser.add_argument("file_list", nargs='+')
+    parser.add_argument("file_list", nargs='+', 
+        help='list of files, preceded by " -- " if necessary')
     args = parser.parse_args()
-    print(args)
 
-    if not args.bucket_groups:
-        # default changes based on fio version
-        if fio_version == 2:
-            args.bucket_groups = 19
-        else:
-            # default in fio 3.x
-            args.bucket_groups = 29
+    # default changes based on fio version
+    if args.fio_version == 2:
+        args.bucket_groups = 19
 
     # print parameters
 
+    print('fio version = %d' % args.fio_version)
     print('bucket groups = %d' % args.bucket_groups)
     print('bucket bits = %d' % args.bucket_bits)
     print('time quantum = %d sec' % args.time_quantum)


* Recent changes (master)
@ 2018-07-27 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-27 12:00 UTC
  To: fio

The following changes since commit 40e8d8314f10a578765e20a4eb574b2603d292df:

  Merge branch 'fio-histo-log-pctiles' of https://github.com/parallel-fs-utils/fio (2018-07-25 14:42:57 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a871240086ca6bdc52f79d7459ed283c5a359299:

  Merge branch 'sgunmap2' of https://github.com/vincentkfu/fio (2018-07-26 11:47:28 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'sgunmap2' of https://github.com/vincentkfu/fio

Vincent Fu (7):
      ioengines: have ioengines with commit do own io accounting for trims
      stat: add IO submit and complete depths to JSON output
      engines/sg: support trim operations via the UNMAP command
      engines/sg: add cmdp and dxferp for trims to sg error string
      engines/sg: move asserts and debug statements behind a debug flag
      testing: add test scripts for sg ioengine
      docs: update HOWTO and manpage for sg trim support

 HOWTO             |   9 +-
 engines/libaio.c  |   2 +
 engines/sg.c      | 421 ++++++++++++++++++++++++++++++++++++++++++++++--------
 fio.1             |   7 +-
 ioengines.c       |   2 +-
 stat.c            |  45 +++++-
 t/sgunmap-perf.py | 115 +++++++++++++++
 t/sgunmap-test.py | 173 ++++++++++++++++++++++
 8 files changed, 701 insertions(+), 73 deletions(-)
 create mode 100755 t/sgunmap-perf.py
 create mode 100755 t/sgunmap-test.py

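The sg changes below batch several trim io_us into a single SCSI UNMAP (opcode 0x42). In the SBC UNMAP parameter list, an 8-byte header is followed by one 16-byte block descriptor per range. Each descriptor holds an 8-byte LBA, a 4-byte block count and 4 reserved bytes, all big-endian. The 10-byte CDB carries the parameter list length in bytes 7-8. The sketch below builds such a list from arrays of ranges. The helper names and the caller-supplied buffer are hypothetical; only the field layout is meant to follow the SBC definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v big-endian in nbytes bytes at p (a 4-byte field's top byte
 * is v >> 24, an 8-byte field's top byte is v >> 56). */
static void demo_put_be(unsigned char *p, uint64_t v, int nbytes)
{
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (unsigned char) (v & 0xff);
		v >>= 8;
	}
}

/* Fill buf with an UNMAP parameter list for n ranges plus the matching
 * 10-byte CDB. Returns the list length, or 0 if buf is too small. */
static size_t demo_build_unmap(unsigned char *buf, size_t buflen,
			       unsigned char cdb[10], const uint64_t *lba,
			       const uint32_t *nr_blocks, unsigned int n)
{
	size_t len = 8 + 16 * (size_t) n;

	assert(n == 0 || (lba != NULL && nr_blocks != NULL));
	if (buflen < len)
		return 0;

	memset(buf, 0, len);
	demo_put_be(buf, len - 2, 2);		 /* UNMAP DATA LENGTH */
	demo_put_be(buf + 2, 16 * (uint64_t) n, 2); /* descriptor data length */
	for (unsigned int i = 0; i < n; i++) {
		unsigned char *d = buf + 8 + 16 * (size_t) i;

		demo_put_be(d, lba[i], 8);	 /* bytes 0-7: LBA */
		demo_put_be(d + 8, nr_blocks[i], 4); /* bytes 8-11: count */
		/* bytes 12-15 stay reserved (zero) */
	}

	memset(cdb, 0, 10);
	cdb[0] = 0x42;				 /* UNMAP */
	demo_put_be(cdb + 7, len, 2);		 /* PARAMETER LIST LENGTH */
	return len;
}
```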
---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 70eed28..804d93e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -991,13 +991,15 @@ I/O type
 		**write**
 				Sequential writes.
 		**trim**
-				Sequential trims (Linux block devices only).
+				Sequential trims (Linux block devices and SCSI
+				character devices only).
 		**randread**
 				Random reads.
 		**randwrite**
 				Random writes.
 		**randtrim**
-				Random trims (Linux block devices only).
+				Random trims (Linux block devices and SCSI
+				character devices only).
 		**rw,readwrite**
 				Sequential mixed reads and writes.
 		**randrw**
@@ -1748,7 +1750,7 @@ I/O engine
 			ioctl, or if the target is an sg character device we use
 			:manpage:`read(2)` and :manpage:`write(2)` for asynchronous
 			I/O. Requires :option:`filename` option to specify either block or
-			character devices.
+			character devices. This engine supports trim operations.
 			The sg engine includes engine specific options.
 
 		**null**
@@ -2082,6 +2084,7 @@ with the caveat that when used on the command line, they must come after the
 	the force unit access (fua) flag. Default is 0.
 
 .. option:: sg_write_mode=str : [sg]
+
 	Specify the type of write commands to issue. This option can take three values:
 
 	**write**
diff --git a/engines/libaio.c b/engines/libaio.c
index dae2a70..7ac36b2 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -207,6 +207,8 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 			return FIO_Q_BUSY;
 
 		do_io_u_trim(td, io_u);
+		io_u_mark_submit(td, 1);
+		io_u_mark_complete(td, 1);
 		return FIO_Q_COMPLETED;
 	}
 
diff --git a/engines/sg.c b/engines/sg.c
index 06cd194..7741f83 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -3,6 +3,51 @@
  *
  * IO engine that uses the Linux SG v3 interface to talk to SCSI devices
  *
+ * This ioengine can operate in two modes:
+ *	sync	with block devices (/dev/sdX) or
+ *		with character devices (/dev/sgY) with direct=1 or sync=1
+ *	async	with character devices with direct=0 and sync=0
+ *
+ * What value does queue() return for the different cases?
+ *				queue() return value
+ * In sync mode:
+ *  /dev/sdX		RWT	FIO_Q_COMPLETED
+ *  /dev/sgY		RWT	FIO_Q_COMPLETED
+ *   with direct=1 or sync=1
+ *
+ * In async mode:
+ *  /dev/sgY		RWT	FIO_Q_QUEUED
+ *   direct=0 and sync=0
+ *
+ * Because FIO_SYNCIO is set for this ioengine td_io_queue() will fill in
+ * issue_time *before* each IO is sent to queue()
+ *
+ * Where are the IO counting functions called for the different cases?
+ *
+ * In sync mode:
+ *  /dev/sdX (commit==NULL)
+ *   RWT
+ *    io_u_mark_depth()			called in td_io_queue()
+ *    io_u_mark_submit/complete()	called in td_io_queue()
+ *    issue_time			set in td_io_queue()
+ *
+ *  /dev/sgY with direct=1 or sync=1 (commit does nothing)
+ *   RWT
+ *    io_u_mark_depth()			called in td_io_queue()
+ *    io_u_mark_submit/complete()	called in queue()
+ *    issue_time			set in td_io_queue()
+ *  
+ * In async mode:
+ *  /dev/sgY with direct=0 and sync=0
+ *   RW: read and write operations are submitted in queue()
+ *    io_u_mark_depth()			called in td_io_commit()
+ *    io_u_mark_submit()		called in queue()
+ *    issue_time			set in td_io_queue()
+ *   T: trim operations are queued in queue() and submitted in commit()
+ *    io_u_mark_depth()			called in td_io_commit()
+ *    io_u_mark_submit()		called in commit()
+ *    issue_time			set in commit()
+ *
  */
 #include <stdio.h>
 #include <stdlib.h>
@@ -81,6 +126,9 @@ static struct fio_option options[] = {
 #define MAX_10B_LBA  0xFFFFFFFFULL
 #define SCSI_TIMEOUT_MS 30000   // 30 second timeout; currently no method to override
 #define MAX_SB 64               // sense block maximum return size
+/*
+#define FIO_SGIO_DEBUG
+*/
 
 struct sgio_cmd {
 	unsigned char cdb[16];      // enhanced from 10 to support 16 byte commands
@@ -88,6 +136,12 @@ struct sgio_cmd {
 	int nr;
 };
 
+struct sgio_trim {
+	char *unmap_param;
+	unsigned int unmap_range_count;
+	struct io_u **trim_io_us;
+};
+
 struct sgio_data {
 	struct sgio_cmd *cmds;
 	struct io_u **events;
@@ -96,8 +150,18 @@ struct sgio_data {
 	void *sgbuf;
 	unsigned int bs;
 	int type_checked;
+	struct sgio_trim **trim_queues;
+	int current_queue;
+#ifdef FIO_SGIO_DEBUG
+	unsigned int *trim_queue_map;
+#endif
 };
 
+static inline bool sgio_unbuffered(struct thread_data *td)
+{
+	return (td->o.odirect || td->o.sync_io);
+}
+
 static void sgio_hdr_init(struct sgio_data *sd, struct sg_io_hdr *hdr,
 			  struct io_u *io_u, int fs)
 {
@@ -113,6 +177,7 @@ static void sgio_hdr_init(struct sgio_data *sd, struct sg_io_hdr *hdr,
 	hdr->mx_sb_len = sizeof(sc->sb);
 	hdr->pack_id = io_u->index;
 	hdr->usr_ptr = io_u;
+	hdr->timeout = SCSI_TIMEOUT_MS;
 
 	if (fs) {
 		hdr->dxferp = io_u->xfer_buf;
@@ -165,10 +230,11 @@ static int fio_sgio_getevents(struct thread_data *td, unsigned int min,
 			      const struct timespec fio_unused *t)
 {
 	struct sgio_data *sd = td->io_ops_data;
-	int left = max, eventNum, ret, r = 0;
+	int left = max, eventNum, ret, r = 0, trims = 0;
 	void *buf = sd->sgbuf;
-	unsigned int i, events;
+	unsigned int i, j, events;
 	struct fio_file *f;
+	struct io_u *io_u;
 
 	/*
 	 * Fill in the file descriptors
@@ -186,10 +252,20 @@ static int fio_sgio_getevents(struct thread_data *td, unsigned int min,
 		sd->pfds[i].events = POLLIN;
 	}
 
-	while (left) {
+	/*
+	** There are two counters here:
+	**  - number of SCSI commands completed
+	**  - number of io_us completed
+	**
+	** These are the same with reads and writes, but
+	** could differ with trim/unmap commands because
+	** a single unmap can include multiple io_us
+	*/
+
+	while (left > 0) {
 		char *p;
 
-		dprint(FD_IO, "sgio_getevents: sd %p: left=%d\n", sd, left);
+		dprint(FD_IO, "sgio_getevents: sd %p: min=%d, max=%d, left=%d\n", sd, min, max, left);
 
 		do {
 			if (!min)
@@ -217,15 +293,21 @@ re_read:
 		for_each_file(td, f, i) {
 			for (eventNum = 0; eventNum < left; eventNum++) {
 				ret = sg_fd_read(f->fd, p, sizeof(struct sg_io_hdr));
-				dprint(FD_IO, "sgio_getevents: ret: %d\n", ret);
+				dprint(FD_IO, "sgio_getevents: sg_fd_read ret: %d\n", ret);
 				if (ret) {
 					r = -ret;
 					td_verror(td, r, "sg_read");
 					break;
 				}
+				io_u = ((struct sg_io_hdr *)p)->usr_ptr;
+				if (io_u->ddir == DDIR_TRIM) {
+					events += sd->trim_queues[io_u->index]->unmap_range_count;
+					eventNum += sd->trim_queues[io_u->index]->unmap_range_count - 1;
+				} else
+					events++;
+
 				p += sizeof(struct sg_io_hdr);
-				events++;
-				dprint(FD_IO, "sgio_getevents: events: %d\n", events);
+				dprint(FD_IO, "sgio_getevents: events: %d, eventNum: %d, left: %d\n", events, eventNum, left);
 			}
 		}
 
@@ -241,14 +323,38 @@ re_read:
 
 		for (i = 0; i < events; i++) {
 			struct sg_io_hdr *hdr = (struct sg_io_hdr *) buf + i;
-			sd->events[i] = hdr->usr_ptr;
+			sd->events[i + trims] = hdr->usr_ptr;
+			io_u = (struct io_u *)(hdr->usr_ptr);
 
-			/* record if an io error occurred, ignore resid */
 			if (hdr->info & SG_INFO_CHECK) {
-				struct io_u *io_u;
-				io_u = (struct io_u *)(hdr->usr_ptr);
+				/* record if an io error occurred, ignore resid */
 				memcpy(&io_u->hdr, hdr, sizeof(struct sg_io_hdr));
-				sd->events[i]->error = EIO;
+				sd->events[i + trims]->error = EIO;
+			}
+
+			if (io_u->ddir == DDIR_TRIM) {
+				struct sgio_trim *st = sd->trim_queues[io_u->index];
+#ifdef FIO_SGIO_DEBUG
+				assert(st->trim_io_us[0] == io_u);
+				assert(sd->trim_queue_map[io_u->index] == io_u->index);
+				dprint(FD_IO, "sgio_getevents: reaping %d io_us from trim queue %d\n", st->unmap_range_count, io_u->index);
+				dprint(FD_IO, "sgio_getevents: reaped io_u %d and stored in events[%d]\n", io_u->index, i+trims);
+#endif
+				for (j = 1; j < st->unmap_range_count; j++) {
+					++trims;
+					sd->events[i + trims] = st->trim_io_us[j];
+#ifdef FIO_SGIO_DEBUG
+					dprint(FD_IO, "sgio_getevents: reaped io_u %d and stored in events[%d]\n", st->trim_io_us[j]->index, i+trims);
+					assert(sd->trim_queue_map[st->trim_io_us[j]->index] == io_u->index);
+#endif
+					if (hdr->info & SG_INFO_CHECK) {
+						/* record if an io error occurred, ignore resid */
+						memcpy(&st->trim_io_us[j]->hdr, hdr, sizeof(struct sg_io_hdr));
+						sd->events[i + trims]->error = EIO;
+					}
+				}
+				events -= st->unmap_range_count - 1;
+				st->unmap_range_count = 0;
 			}
 		}
 	}
@@ -287,7 +393,8 @@ static enum fio_q_status fio_sgio_ioctl_doio(struct thread_data *td,
 	return FIO_Q_COMPLETED;
 }
 
-static int fio_sgio_rw_doio(struct fio_file *f, struct io_u *io_u, int do_sync)
+static enum fio_q_status fio_sgio_rw_doio(struct fio_file *f,
+					  struct io_u *io_u, int do_sync)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
 	int ret;
@@ -311,10 +418,11 @@ static int fio_sgio_rw_doio(struct fio_file *f, struct io_u *io_u, int do_sync)
 	return FIO_Q_QUEUED;
 }
 
-static int fio_sgio_doio(struct thread_data *td, struct io_u *io_u, int do_sync)
+static enum fio_q_status fio_sgio_doio(struct thread_data *td,
+				       struct io_u *io_u, int do_sync)
 {
 	struct fio_file *f = io_u->file;
-	int ret;
+	enum fio_q_status ret;
 
 	if (f->filetype == FIO_TYPE_BLOCK) {
 		ret = fio_sgio_ioctl_doio(td, f, io_u);
@@ -328,12 +436,41 @@ static int fio_sgio_doio(struct thread_data *td, struct io_u *io_u, int do_sync)
 	return ret;
 }
 
+static void fio_sgio_rw_lba(struct sg_io_hdr *hdr, unsigned long long lba,
+			    unsigned long long nr_blocks)
+{
+	if (lba < MAX_10B_LBA) {
+		hdr->cmdp[2] = (unsigned char) ((lba >> 24) & 0xff);
+		hdr->cmdp[3] = (unsigned char) ((lba >> 16) & 0xff);
+		hdr->cmdp[4] = (unsigned char) ((lba >>  8) & 0xff);
+		hdr->cmdp[5] = (unsigned char) (lba & 0xff);
+		hdr->cmdp[7] = (unsigned char) ((nr_blocks >> 8) & 0xff);
+		hdr->cmdp[8] = (unsigned char) (nr_blocks & 0xff);
+	} else {
+		hdr->cmdp[2] = (unsigned char) ((lba >> 56) & 0xff);
+		hdr->cmdp[3] = (unsigned char) ((lba >> 48) & 0xff);
+		hdr->cmdp[4] = (unsigned char) ((lba >> 40) & 0xff);
+		hdr->cmdp[5] = (unsigned char) ((lba >> 32) & 0xff);
+		hdr->cmdp[6] = (unsigned char) ((lba >> 24) & 0xff);
+		hdr->cmdp[7] = (unsigned char) ((lba >> 16) & 0xff);
+		hdr->cmdp[8] = (unsigned char) ((lba >>  8) & 0xff);
+		hdr->cmdp[9] = (unsigned char) (lba & 0xff);
+		hdr->cmdp[10] = (unsigned char) ((nr_blocks >> 32) & 0xff);
+		hdr->cmdp[11] = (unsigned char) ((nr_blocks >> 16) & 0xff);
+		hdr->cmdp[12] = (unsigned char) ((nr_blocks >> 8) & 0xff);
+		hdr->cmdp[13] = (unsigned char) (nr_blocks & 0xff);
+	}
+
+	return;
+}
+
 static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
 	struct sg_options *o = td->eo;
 	struct sgio_data *sd = td->io_ops_data;
-	long long nr_blocks, lba;
+	unsigned long long nr_blocks, lba;
+	int offset;
 
 	if (io_u->xfer_buflen & (sd->bs - 1)) {
 		log_err("read/write not sector aligned\n");
@@ -355,6 +492,8 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 		if (o->readfua)
 			hdr->cmdp[1] |= 0x08;
 
+		fio_sgio_rw_lba(hdr, lba, nr_blocks);
+
 	} else if (io_u->ddir == DDIR_WRITE) {
 		sgio_hdr_init(sd, hdr, io_u, 1);
 
@@ -383,58 +522,111 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 				hdr->cmdp[0] = 0x93; // write same(16)
 			break;
 		};
-	} else {
+
+		fio_sgio_rw_lba(hdr, lba, nr_blocks);
+
+	} else if (io_u->ddir == DDIR_TRIM) {
+		struct sgio_trim *st;
+
+		if (sd->current_queue == -1) {
+			sgio_hdr_init(sd, hdr, io_u, 0);
+
+			hdr->cmd_len = 10;
+			hdr->dxfer_direction = SG_DXFER_TO_DEV;
+			hdr->cmdp[0] = 0x42; // unmap
+			sd->current_queue = io_u->index;
+			st = sd->trim_queues[sd->current_queue];
+			hdr->dxferp = st->unmap_param;
+#ifdef FIO_SGIO_DEBUG
+			assert(sd->trim_queues[io_u->index]->unmap_range_count == 0);
+			dprint(FD_IO, "sg: creating new queue based on io_u %d\n", io_u->index);
+#endif
+		}
+		else
+			st = sd->trim_queues[sd->current_queue];
+
+		dprint(FD_IO, "sg: adding io_u %d to trim queue %d\n", io_u->index, sd->current_queue);
+		st->trim_io_us[st->unmap_range_count] = io_u;
+#ifdef FIO_SGIO_DEBUG
+		sd->trim_queue_map[io_u->index] = sd->current_queue;
+#endif
+
+		offset = 8 + 16 * st->unmap_range_count;
+		st->unmap_param[offset] = (unsigned char) ((lba >> 56) & 0xff);
+		st->unmap_param[offset+1] = (unsigned char) ((lba >> 48) & 0xff);
+		st->unmap_param[offset+2] = (unsigned char) ((lba >> 40) & 0xff);
+		st->unmap_param[offset+3] = (unsigned char) ((lba >> 32) & 0xff);
+		st->unmap_param[offset+4] = (unsigned char) ((lba >> 24) & 0xff);
+		st->unmap_param[offset+5] = (unsigned char) ((lba >> 16) & 0xff);
+		st->unmap_param[offset+6] = (unsigned char) ((lba >>  8) & 0xff);
+		st->unmap_param[offset+7] = (unsigned char) (lba & 0xff);
+		st->unmap_param[offset+8] = (unsigned char) ((nr_blocks >> 24) & 0xff);
+		st->unmap_param[offset+9] = (unsigned char) ((nr_blocks >> 16) & 0xff);
+		st->unmap_param[offset+10] = (unsigned char) ((nr_blocks >> 8) & 0xff);
+		st->unmap_param[offset+11] = (unsigned char) (nr_blocks & 0xff);
+
+		st->unmap_range_count++;
+
+	} else if (ddir_sync(io_u->ddir)) {
 		sgio_hdr_init(sd, hdr, io_u, 0);
 		hdr->dxfer_direction = SG_DXFER_NONE;
 		if (lba < MAX_10B_LBA)
 			hdr->cmdp[0] = 0x35; // synccache(10)
 		else
 			hdr->cmdp[0] = 0x91; // synccache(16)
-	}
+	} else
+		assert(0);
 
-	/*
-	 * for synccache, we leave lba and length to 0 to sync all
-	 * blocks on medium.
-	 */
-	if (hdr->dxfer_direction != SG_DXFER_NONE) {
-		if (lba < MAX_10B_LBA) {
-			hdr->cmdp[2] = (unsigned char) ((lba >> 24) & 0xff);
-			hdr->cmdp[3] = (unsigned char) ((lba >> 16) & 0xff);
-			hdr->cmdp[4] = (unsigned char) ((lba >>  8) & 0xff);
-			hdr->cmdp[5] = (unsigned char) (lba & 0xff);
-			hdr->cmdp[7] = (unsigned char) ((nr_blocks >> 8) & 0xff);
-			hdr->cmdp[8] = (unsigned char) (nr_blocks & 0xff);
-		} else {
-			hdr->cmdp[2] = (unsigned char) ((lba >> 56) & 0xff);
-			hdr->cmdp[3] = (unsigned char) ((lba >> 48) & 0xff);
-			hdr->cmdp[4] = (unsigned char) ((lba >> 40) & 0xff);
-			hdr->cmdp[5] = (unsigned char) ((lba >> 32) & 0xff);
-			hdr->cmdp[6] = (unsigned char) ((lba >> 24) & 0xff);
-			hdr->cmdp[7] = (unsigned char) ((lba >> 16) & 0xff);
-			hdr->cmdp[8] = (unsigned char) ((lba >>  8) & 0xff);
-			hdr->cmdp[9] = (unsigned char) (lba & 0xff);
-			hdr->cmdp[10] = (unsigned char) ((nr_blocks >> 32) & 0xff);
-			hdr->cmdp[11] = (unsigned char) ((nr_blocks >> 16) & 0xff);
-			hdr->cmdp[12] = (unsigned char) ((nr_blocks >> 8) & 0xff);
-			hdr->cmdp[13] = (unsigned char) (nr_blocks & 0xff);
-		}
-	}
-
-	hdr->timeout = SCSI_TIMEOUT_MS;
 	return 0;
 }
 
+static void fio_sgio_unmap_setup(struct sg_io_hdr *hdr, struct sgio_trim *st)
+{
+	hdr->dxfer_len = st->unmap_range_count * 16 + 8;
+	hdr->cmdp[7] = (unsigned char) (((st->unmap_range_count * 16 + 8) >> 8) & 0xff);
+	hdr->cmdp[8] = (unsigned char) ((st->unmap_range_count * 16 + 8) & 0xff);
+
+	st->unmap_param[0] = (unsigned char) (((16 * st->unmap_range_count + 6) >> 8) & 0xff);
+	st->unmap_param[1] = (unsigned char)  ((16 * st->unmap_range_count + 6) & 0xff);
+	st->unmap_param[2] = (unsigned char) (((16 * st->unmap_range_count) >> 8) & 0xff);
+	st->unmap_param[3] = (unsigned char)  ((16 * st->unmap_range_count) & 0xff);
+
+	return;
+}
+
 static enum fio_q_status fio_sgio_queue(struct thread_data *td,
 					struct io_u *io_u)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
+	struct sgio_data *sd = td->io_ops_data;
 	int ret, do_sync = 0;
 
 	fio_ro_check(td, io_u);
 
-	if (td->o.sync_io || td->o.odirect || ddir_sync(io_u->ddir))
+	if (sgio_unbuffered(td) || ddir_sync(io_u->ddir))
 		do_sync = 1;
 
+	if (io_u->ddir == DDIR_TRIM) {
+		if (do_sync || io_u->file->filetype == FIO_TYPE_BLOCK) {
+			struct sgio_trim *st = sd->trim_queues[sd->current_queue];
+
+			/* finish cdb setup for unmap because we are
+			** doing unmap commands synchronously */
+#ifdef FIO_SGIO_DEBUG
+			assert(st->unmap_range_count == 1);
+			assert(io_u == st->trim_io_us[0]);
+#endif
+			hdr = &io_u->hdr;
+
+			fio_sgio_unmap_setup(hdr, st);
+
+			st->unmap_range_count = 0;
+			sd->current_queue = -1;
+		} else
+			/* queue up trim ranges and submit in commit() */
+			return FIO_Q_QUEUED;
+	}
+
 	ret = fio_sgio_doio(td, io_u, do_sync);
 
 	if (ret < 0)
@@ -442,6 +634,14 @@ static enum fio_q_status fio_sgio_queue(struct thread_data *td,
 	else if (hdr->status) {
 		io_u->resid = hdr->resid;
 		io_u->error = EIO;
+	} else if (td->io_ops->commit != NULL) {
+		if (do_sync && !ddir_sync(io_u->ddir)) {
+			io_u_mark_submit(td, 1);
+			io_u_mark_complete(td, 1);
+		} else if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+			io_u_mark_submit(td, 1);
+			io_u_queued(td, io_u);
+		}
 	}
 
 	if (io_u->error) {
@@ -452,6 +652,61 @@ static enum fio_q_status fio_sgio_queue(struct thread_data *td,
 	return ret;
 }
 
+static int fio_sgio_commit(struct thread_data *td)
+{
+	struct sgio_data *sd = td->io_ops_data;
+	struct sgio_trim *st;
+	struct io_u *io_u;
+	struct sg_io_hdr *hdr;
+	struct timespec now;
+	unsigned int i;
+	int ret;
+
+	if (sd->current_queue == -1)
+		return 0;
+
+	st = sd->trim_queues[sd->current_queue];
+	io_u = st->trim_io_us[0];
+	hdr = &io_u->hdr;
+
+	fio_sgio_unmap_setup(hdr, st);
+
+	sd->current_queue = -1;
+
+	ret = fio_sgio_rw_doio(io_u->file, io_u, 0);
+
+	if (ret < 0)
+		for (i = 0; i < st->unmap_range_count; i++)
+			st->trim_io_us[i]->error = errno;
+	else if (hdr->status)
+		for (i = 0; i < st->unmap_range_count; i++) {
+			st->trim_io_us[i]->resid = hdr->resid;
+			st->trim_io_us[i]->error = EIO;
+		}
+	else {
+		if (fio_fill_issue_time(td)) {
+			fio_gettime(&now, NULL);
+			for (i = 0; i < st->unmap_range_count; i++) {
+				struct io_u *io_u = st->trim_io_us[i];
+
+				memcpy(&io_u->issue_time, &now, sizeof(now));
+				io_u_queued(td, io_u);
+			}
+		}
+		io_u_mark_submit(td, st->unmap_range_count);
+	}
+
+	if (io_u->error) {
+		td_verror(td, io_u->error, "xfer");
+		return 0;
+	}
+
+	if (ret == FIO_Q_QUEUED)
+		return 0;
+	else
+		return ret;
+}
+
 static struct io_u *fio_sgio_event(struct thread_data *td, int event)
 {
 	struct sgio_data *sd = td->io_ops_data;
@@ -553,6 +808,7 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 static void fio_sgio_cleanup(struct thread_data *td)
 {
 	struct sgio_data *sd = td->io_ops_data;
+	int i;
 
 	if (sd) {
 		free(sd->events);
@@ -560,6 +816,17 @@ static void fio_sgio_cleanup(struct thread_data *td)
 		free(sd->fd_flags);
 		free(sd->pfds);
 		free(sd->sgbuf);
+#ifdef FIO_SGIO_DEBUG
+		free(sd->trim_queue_map);
+#endif
+
+		for (i = 0; i < td->o.iodepth; i++) {
+			free(sd->trim_queues[i]->unmap_param);
+			free(sd->trim_queues[i]->trim_io_us);
+			free(sd->trim_queues[i]);
+		}
+
+		free(sd->trim_queues);
 		free(sd);
 	}
 }
@@ -567,20 +834,30 @@ static void fio_sgio_cleanup(struct thread_data *td)
 static int fio_sgio_init(struct thread_data *td)
 {
 	struct sgio_data *sd;
+	struct sgio_trim *st;
+	int i;
 
-	sd = malloc(sizeof(*sd));
-	memset(sd, 0, sizeof(*sd));
-	sd->cmds = malloc(td->o.iodepth * sizeof(struct sgio_cmd));
-	memset(sd->cmds, 0, td->o.iodepth * sizeof(struct sgio_cmd));
-	sd->events = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(sd->events, 0, td->o.iodepth * sizeof(struct io_u *));
-	sd->pfds = malloc(sizeof(struct pollfd) * td->o.nr_files);
-	memset(sd->pfds, 0, sizeof(struct pollfd) * td->o.nr_files);
-	sd->fd_flags = malloc(sizeof(int) * td->o.nr_files);
-	memset(sd->fd_flags, 0, sizeof(int) * td->o.nr_files);
-	sd->sgbuf = malloc(sizeof(struct sg_io_hdr) * td->o.iodepth);
-	memset(sd->sgbuf, 0, sizeof(struct sg_io_hdr) * td->o.iodepth);
+	sd = calloc(1, sizeof(*sd));
+	sd->cmds = calloc(td->o.iodepth, sizeof(struct sgio_cmd));
+	sd->sgbuf = calloc(td->o.iodepth, sizeof(struct sg_io_hdr));
+	sd->events = calloc(td->o.iodepth, sizeof(struct io_u *));
+	sd->pfds = calloc(td->o.nr_files, sizeof(struct pollfd));
+	sd->fd_flags = calloc(td->o.nr_files, sizeof(int));
 	sd->type_checked = 0;
+
+	sd->trim_queues = calloc(td->o.iodepth, sizeof(struct sgio_trim *));
+	sd->current_queue = -1;
+#ifdef FIO_SGIO_DEBUG
+	sd->trim_queue_map = calloc(td->o.iodepth, sizeof(int));
+#endif
+	for (i = 0; i < td->o.iodepth; i++) {
+		sd->trim_queues[i] = calloc(1, sizeof(struct sgio_trim));
+		st = sd->trim_queues[i];
+		st->unmap_param = calloc(td->o.iodepth + 1, sizeof(char[16]));
+		st->unmap_range_count = 0;
+		st->trim_io_us = calloc(td->o.iodepth, sizeof(struct io_u *));
+	}
+
 	td->io_ops_data = sd;
 
 	/*
@@ -632,6 +909,12 @@ static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 	if (f->filetype == FIO_TYPE_BLOCK) {
 		td->io_ops->getevents = NULL;
 		td->io_ops->event = NULL;
+		td->io_ops->commit = NULL;
+		/*
+		** Setting these functions to null may cause problems
+		** with filename=/dev/sda:/dev/sg0 since we are only
+		** considering a single file
+		*/
 	}
 	sd->type_checked = 1;
 
@@ -848,6 +1131,23 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 			snprintf(msgchunk, MAXMSGCHUNK, "SG Driver: %d bytes out of %d not transferred. ", hdr->resid, hdr->dxfer_len);
 			strlcat(msg, msgchunk, MAXERRDETAIL);
 		}
+		if (hdr->cmdp) {
+			strlcat(msg, "cdb:", MAXERRDETAIL);
+			for (i = 0; i < hdr->cmd_len; i++) {
+				snprintf(msgchunk, MAXMSGCHUNK, " %02x", hdr->cmdp[i]);
+				strlcat(msg, msgchunk, MAXERRDETAIL);
+			}
+			strlcat(msg, ". ", MAXERRDETAIL);
+			if (io_u->ddir == DDIR_TRIM) {
+				unsigned char *param_list = hdr->dxferp;
+				strlcat(msg, "dxferp:", MAXERRDETAIL);
+				for (i = 0; i < hdr->dxfer_len; i++) {
+					snprintf(msgchunk, MAXMSGCHUNK, " %02x", param_list[i]);
+					strlcat(msg, msgchunk, MAXERRDETAIL);
+				}
+				strlcat(msg, ". ", MAXERRDETAIL);
+			}
+		}
 	}
 
 	if (!(hdr->info & SG_INFO_CHECK) && !strlen(msg))
@@ -906,6 +1206,7 @@ static struct ioengine_ops ioengine = {
 	.init		= fio_sgio_init,
 	.prep		= fio_sgio_prep,
 	.queue		= fio_sgio_queue,
+	.commit		= fio_sgio_commit,
 	.getevents	= fio_sgio_getevents,
 	.errdetails	= fio_sgio_errdetails,
 	.event		= fio_sgio_event,
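
fio_sgio_rw_lba() in the patch above spells out the big-endian packing of the LBA and block count byte by byte. The same layout can be expressed with a generic big-endian store. The sketch below assumes the SBC READ/WRITE(10) and READ/WRITE(16) layouts: a 32-bit LBA in bytes 2-5 and a 16-bit length in bytes 7-8 for the 10-byte CDB, and a 64-bit LBA in bytes 2-9 and a 32-bit length in bytes 10-13 for the 16-byte CDB. The names `put_be()` and `cdb_set_lba()` are hypothetical and not part of fio.

```c
#include <assert.h>
#include <stdint.h>

/* Store the low n bytes of v at p, most significant byte first. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
	assert(n >= 1 && n <= 8);
	for (int i = n - 1; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Hypothetical helper (not part of fio): fill in the LBA and
 * transfer-length fields of a 10- or 16-byte READ/WRITE CDB.
 */
static void cdb_set_lba(unsigned char *cdb, int cdb_len,
			uint64_t lba, uint32_t nr_blocks)
{
	if (cdb_len == 10) {
		/* the 10-byte form only holds a 32-bit LBA and 16-bit length */
		assert(lba <= 0xffffffffULL && nr_blocks <= 0xffff);
		put_be(&cdb[2], lba, 4);		/* bytes 2-5: LBA */
		put_be(&cdb[7], nr_blocks, 2);		/* bytes 7-8: length */
	} else {
		put_be(&cdb[2], lba, 8);		/* bytes 2-9: LBA */
		put_be(&cdb[10], nr_blocks, 4);		/* bytes 10-13: length */
	}
}
```

With this layout the most significant byte of the 32-bit length in a 16-byte CDB lands in byte 10, i.e. `(nr_blocks >> 24) & 0xff`.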
diff --git a/fio.1 b/fio.1
index 6d2eba6..a446aba 100644
--- a/fio.1
+++ b/fio.1
@@ -757,7 +757,7 @@ Sequential reads.
 Sequential writes.
 .TP
 .B trim
-Sequential trims (Linux block devices only).
+Sequential trims (Linux block devices and SCSI character devices only).
 .TP
 .B randread
 Random reads.
@@ -766,7 +766,7 @@ Random reads.
 Random writes.
 .TP
 .B randtrim
-Random trims (Linux block devices only).
+Random trims (Linux block devices and SCSI character devices only).
 .TP
 .B rw,readwrite
 Sequential mixed reads and writes.
@@ -1524,7 +1524,8 @@ SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
 ioctl, or if the target is an sg character device we use
 \fBread\fR\|(2) and \fBwrite\fR\|(2) for asynchronous
 I/O. Requires \fBfilename\fR option to specify either block or
-character devices. The sg engine includes engine specific options.
+character devices. This engine supports trim operations. The
+sg engine includes engine specific options.
 .TP
 .B null
 Doesn't transfer any data, just pretends to. This is mainly used to
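
The sg engine implements these trims with the SCSI UNMAP command: a 10-byte CDB with opcode 0x42 and the parameter list length in bytes 7-8, plus a parameter list made of an 8-byte header and one 16-byte block descriptor per range. A minimal sketch of building both, modelled on fio_sgio_prep() and fio_sgio_unmap_setup() in the sg.c patch and assuming the SBC descriptor layout (8-byte LBA, 4-byte block count, 4 reserved bytes). `struct unmap_range` and `build_unmap()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical range type: one queued trim request. */
struct unmap_range {
	uint64_t lba;
	uint32_t nr_blocks;
};

/* Store the low n bytes of v at p, most significant byte first. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Build an UNMAP(10) CDB and its parameter list for nr ranges.
 * Returns the parameter list length, or 0 if buf is too small.
 */
static size_t build_unmap(unsigned char cdb[10], unsigned char *buf,
			  size_t buflen, const struct unmap_range *r,
			  unsigned int nr)
{
	size_t len = 8 + 16 * (size_t)nr;	/* header + descriptors */

	assert(len <= 0xffff);		/* must fit the 16-bit length fields */
	if (len > buflen)
		return 0;
	memset(buf, 0, len);
	put_be(&buf[0], len - 2, 2);		/* UNMAP DATA LENGTH */
	put_be(&buf[2], 16 * (uint64_t)nr, 2);	/* DESCRIPTOR DATA LENGTH */
	for (unsigned int i = 0; i < nr; i++) {
		unsigned char *d = &buf[8 + 16 * i];

		put_be(&d[0], r[i].lba, 8);		/* bytes 0-7: first LBA */
		put_be(&d[8], r[i].nr_blocks, 4);	/* bytes 8-11: count */
		/* bytes 12-15 are reserved and stay zero */
	}
	memset(cdb, 0, 10);
	cdb[0] = 0x42;			/* UNMAP opcode */
	put_be(&cdb[7], len, 2);	/* PARAMETER LIST LENGTH */
	return len;
}
```

Packing several ranges into one list is what allows the engine's commit() to cover many queued trims with a single UNMAP command.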
diff --git a/ioengines.c b/ioengines.c
index bce65ea..e5fbcd4 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -350,7 +350,7 @@ enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 			 "invalid block size. Try setting direct=0.\n");
 	}
 
-	if (!td->io_ops->commit || io_u->ddir == DDIR_TRIM) {
+	if (!td->io_ops->commit) {
 		io_u_mark_submit(td, 1);
 		io_u_mark_complete(td, 1);
 	}
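
After this change, fio core no longer marks trims as submitted and completed on the spot when the engine has a `commit` hook; that accounting is left to the engine, which may now hold trims back. The pattern the sg engine follows for character devices (queue() stores the range and returns FIO_Q_QUEUED, commit() sends one UNMAP for everything gathered, and `current_queue == -1` means no batch is open) can be modelled in isolation. This is a simplified, hypothetical sketch: the types, `MAX_BATCH` and `submit_unmap()` are invented for illustration and leave out fio's per-io_u queues, timing and error handling.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BATCH 16	/* hypothetical cap, standing in for iodepth */

enum q_status { Q_COMPLETED, Q_QUEUED };

/* Hypothetical model of one trim queue. */
struct trim_batch {
	int open;			/* 0 plays the role of current_queue == -1 */
	unsigned int count;		/* ranges gathered in the open batch */
	uint64_t lba[MAX_BATCH];
	uint32_t nr_blocks[MAX_BATCH];
	unsigned int commands_sent;	/* UNMAP commands issued so far */
	unsigned int ranges_sent;	/* ranges covered by those commands */
};

/* Stand-in for building and issuing one UNMAP over the open batch. */
static void submit_unmap(struct trim_batch *b)
{
	b->commands_sent++;
	b->ranges_sent += b->count;
	b->count = 0;
	b->open = 0;
}

/* queue(): a synchronous target trims at once, otherwise defer. */
static enum q_status queue_trim(struct trim_batch *b, uint64_t lba,
				uint32_t nr_blocks, int sync)
{
	assert(b->count < MAX_BATCH);
	b->open = 1;
	b->lba[b->count] = lba;
	b->nr_blocks[b->count] = nr_blocks;
	b->count++;
	if (!sync)
		return Q_QUEUED;	/* held back until commit_trims() */
	submit_unmap(b);
	return Q_COMPLETED;
}

/* commit(): flush the open batch, if any, as a single command. */
static int commit_trims(struct trim_batch *b)
{
	if (b->open)
		submit_unmap(b);
	return 0;
}
```

In this model eight deferred trims followed by one commit produce a single command covering eight ranges, while a synchronous trim is sent on its own, roughly in line with the chardev and blockdev columns of the test matrix in t/sgunmap-test.py.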
diff --git a/stat.c b/stat.c
index 8de4835..82e79df 100644
--- a/stat.c
+++ b/stat.c
@@ -1295,13 +1295,8 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	json_object_add_value_int(root, "majf", ts->majf);
 	json_object_add_value_int(root, "minf", ts->minf);
 
-
-	/* Calc % distribution of IO depths, usecond, msecond latency */
+	/* Calc % distribution of IO depths */
 	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
-	stat_calc_lat_n(ts, io_u_lat_n);
-	stat_calc_lat_u(ts, io_u_lat_u);
-	stat_calc_lat_m(ts, io_u_lat_m);
-
 	tmp = json_create_object();
 	json_object_add_value_object(root, "iodepth_level", tmp);
 	/* Only show fixed 7 I/O depth levels*/
@@ -1314,6 +1309,44 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		json_object_add_value_float(tmp, (const char *)name, io_u_dist[i]);
 	}
 
+	/* Calc % distribution of submit IO depths */
+	stat_calc_dist(ts->io_u_submit, ts->total_submit, io_u_dist);
+	tmp = json_create_object();
+	json_object_add_value_object(root, "iodepth_submit", tmp);
+	/* Only show fixed 7 I/O depth levels*/
+	for (i = 0; i < 7; i++) {
+		char name[20];
+		if (i == 0)
+			snprintf(name, 20, "0");
+		else if (i < 6)
+			snprintf(name, 20, "%d", 1 << (i+1));
+		else
+			snprintf(name, 20, ">=%d", 1 << i);
+		json_object_add_value_float(tmp, (const char *)name, io_u_dist[i]);
+	}
+
+	/* Calc % distribution of completion IO depths */
+	stat_calc_dist(ts->io_u_complete, ts->total_complete, io_u_dist);
+	tmp = json_create_object();
+	json_object_add_value_object(root, "iodepth_complete", tmp);
+	/* Only show fixed 7 I/O depth levels*/
+	for (i = 0; i < 7; i++) {
+		char name[20];
+		if (i == 0)
+			snprintf(name, 20, "0");
+		else if (i < 6)
+			snprintf(name, 20, "%d", 1 << (i+1));
+		else
+			snprintf(name, 20, ">=%d", 1 << i);
+		json_object_add_value_float(tmp, (const char *)name, io_u_dist[i]);
+	}
+
+	/* Calc % distribution of nsecond, usecond, msecond latency */
+	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
+	stat_calc_lat_n(ts, io_u_lat_n);
+	stat_calc_lat_u(ts, io_u_lat_u);
+	stat_calc_lat_m(ts, io_u_lat_m);
+
 	/* Nanosecond latency */
 	tmp = json_create_object();
 	json_object_add_value_object(root, "latency_ns", tmp);
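
The two new JSON objects use a different key set from `iodepth_level` because submit and complete counts are kept in range buckets rather than per depth. Assuming the ranges used by fio's io_u_mark_submit() (0, 1-4, 5-8, 9-16, 17-32, 33-64 and more than 64), the loop in the patch names them "0", "4", "8", "16", "32", "64" and ">=64". A standalone copy of that naming loop, with a hypothetical reverse lookup `submit_key_index()` added:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* JSON key for bucket i (0-6) of iodepth_submit/iodepth_complete. */
static void submit_key(int i, char *name, size_t len)
{
	assert(i >= 0 && i < 7);
	if (i == 0)
		snprintf(name, len, "0");
	else if (i < 6)
		snprintf(name, len, "%d", 1 << (i + 1));
	else
		snprintf(name, len, ">=%d", 1 << i);
}

/* Reverse lookup: bucket index for a JSON key, or -1 if none matches. */
static int submit_key_index(const char *key)
{
	char name[20];

	for (int i = 0; i < 7; i++) {
		submit_key(i, name, sizeof(name));
		if (strcmp(name, key) == 0)
			return i;
	}
	return -1;
}
```

Because the last two keys both mention 64, scripts that read this output have to select keys by name, for example "64" for the 33-64 bucket and ">=64" for anything larger.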
diff --git a/t/sgunmap-perf.py b/t/sgunmap-perf.py
new file mode 100755
index 0000000..fadbb85
--- /dev/null
+++ b/t/sgunmap-perf.py
@@ -0,0 +1,115 @@
+#!/usr/bin/python2.7
+#
+# sgunmap-perf.py
+#
+# Compare IOPS of candidate and reference fio builds using the sg ioengine
+#
+# USAGE
+# sgunmap-perf.py char-device block-device candidate-fio reference-fio
+#
+# EXAMPLE
+# t/sgunmap-perf.py /dev/sg1 /dev/sdb ./fio /path/to/reference/fio
+#
+# REQUIREMENTS
+# Python 2.7+ (for argparse) and the six module
+#
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+import sys
+import json
+import argparse
+import subprocess
+from six.moves import range
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('cdev',
+                        help='character device target (e.g., /dev/sg0)')
+    parser.add_argument('bdev',
+                        help='block device target (e.g., /dev/sda)')
+    parser.add_argument('fioc',
+                        help='path to candidate fio executable (e.g., ./fio)')
+    parser.add_argument('fior',
+                        help='path to reference fio executable (e.g., ./fio)')
+    args = parser.parse_args()
+
+    return args
+
+
+def fulldevice(fio, dev, ioengine='psync', rw='trim', bs='1M'):
+    parameters = ["--name=test",
+                  "--output-format=json",
+                  "--random_generator=lfsr",
+                  "--bs={0}".format(bs),
+                  "--rw={0}".format(rw),
+                  "--ioengine={0}".format(ioengine),
+                  "--filename={0}".format(dev)]
+
+    output = subprocess.check_output([fio] + parameters)
+    jsondata = json.loads(output)
+    jobdata = jsondata['jobs'][0]
+    return jobdata
+
+
+def runtest(fio, dev, rw, qd, batch, bs='512', runtime='30s'):
+    parameters = ["--name=test",
+                  "--random_generator=tausworthe64",
+                  "--time_based",
+                  "--runtime={0}".format(runtime),
+                  "--output-format=json",
+                  "--ioengine=sg",
+                  "--blocksize={0}".format(bs),
+                  "--rw={0}".format(rw),
+                  "--filename={0}".format(dev),
+                  "--iodepth={0}".format(qd),
+                  "--iodepth_batch={0}".format(batch)]
+
+    output = subprocess.check_output([fio] + parameters)
+    jsondata = json.loads(output)
+    jobdata = jsondata['jobs'][0]
+#    print(parameters)
+
+    return jobdata
+
+
+def runtests(fio, dev, qd, batch, rw, bs='512', trials=5):
+    iops = []
+    for x in range(trials):
+        jd = runtest(fio, dev, rw, qd, batch, bs=bs)
+        total = jd['read']['iops'] + jd['write']['iops'] + jd['trim']['iops']
+#       print(total)
+        iops.extend([total])
+    return iops, (sum(iops) / trials)
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    print("Trimming full device {0}".format(args.cdev))
+    fulldevice(args.fior, args.cdev, ioengine='sg')
+
+    print("Running rand read tests on {0}"
+        " with fio candidate build {1}".format(args.cdev, args.fioc))
+    randread, rrmean = runtests(args.fioc, args.cdev, 16, 1, 'randread',
+        trials=5)
+    print("IOPS mean {0}, trials {1}".format(rrmean, randread))
+
+    print("Running rand read tests on {0}"
+        " with fio reference build {1}".format(args.cdev, args.fior))
+    randread, rrmean = runtests(args.fior, args.cdev, 16, 1, 'randread',
+        trials=5)
+    print("IOPS mean {0}, trials {1}".format(rrmean, randread))
+
+    print("Running rand write tests on {0}"
+        " with fio candidate build {1}".format(args.cdev, args.fioc))
+    randwrite, rwmean = runtests(args.fioc, args.cdev, 16, 1, 'randwrite',
+        trials=5)
+    print("IOPS mean {0}, trials {1}".format(rwmean, randwrite))
+
+    print("Running rand write tests on {0}"
+        " with fio reference build {1}".format(args.cdev, args.fior))
+    randwrite, rwmean = runtests(args.fior, args.cdev, 16, 1, 'randwrite',
+        trials=5)
+    print("IOPS mean {0}, trials {1}".format(rwmean, randwrite))
diff --git a/t/sgunmap-test.py b/t/sgunmap-test.py
new file mode 100755
index 0000000..d2caa5f
--- /dev/null
+++ b/t/sgunmap-test.py
@@ -0,0 +1,173 @@
+#!/usr/bin/python2.7
+# Note: this script is compatible with both Python 2 and Python 3.
+#
+# sgunmap-test.py
+#
+# Limited-functionality test for trim workloads using fio's sg ioengine.
+# It checks only the three reported I/O depth distributions.
+#
+# !!!WARNING!!!
+# This script carries out destructive tests. Be sure that
+# there is no data you want to keep on the supplied devices.
+#
+# USAGE
+# sgunmap-test.py char-device block-device fio-executable
+#
+# EXAMPLE
+# t/sgunmap-test.py /dev/sg1 /dev/sdb ./fio
+#
+# REQUIREMENTS
+# Python 2.7+ (for argparse) and the six module
+#
+# TEST MATRIX
+# For both char-dev and block-dev these are the expected
+# submit/complete IO depths
+#
+#                       blockdev                chardev
+#                       iodepth                 iodepth
+# R QD1                 sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# W QD1                 sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# T QD1                 sub/comp: 1-4=100%      sub/comp: 1-4=100%
+#
+# R QD16, batch8        sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# W QD16, batch8        sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# T QD16, batch8        sub/comp: 1-4=100%      sub/comp: 5-8=100%
+#
+# R QD16, batch16       sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# W QD16, batch16       sub/comp: 1-4=100%      sub/comp: 1-4=100%
+# T QD16, batch16       sub/comp: 1-4=100%      sub/comp: 9-16=100%
+#
+
+from __future__ import absolute_import
+from __future__ import print_function
+import sys
+import json
+import argparse
+import traceback
+import subprocess
+from six.moves import range
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('chardev',
+                        help='character device target (e.g., /dev/sg0)')
+    parser.add_argument('blockdev',
+                        help='block device target (e.g., /dev/sda)')
+    parser.add_argument('fio',
+                        help='path to fio executable (e.g., ./fio)')
+    args = parser.parse_args()
+
+    return args
+
+#
+# With block devices,
+#     iodepth = 1 always
+#     submit = complete = 1-4 always
+# With character devices,
+# RW
+#     iodepth = qd
+#     submit = 1-4
+#     complete = 1-4 except for the IOs in flight
+#                when the job is ending
+# T
+#     iodepth = qd
+#     submit = qdbatch
+#     complete = qdbatch except for the IOs in flight
+#                when the job is ending
+#
+
+
+def check(jsondata, parameters, block, qd, qdbatch, rw):
+    iodepth = jsondata['iodepth_level']
+    submit = jsondata['iodepth_submit']
+    complete = jsondata['iodepth_complete']
+
+    try:
+        if block:
+            assert iodepth['1'] == 100.0
+            assert submit['4'] == 100.0
+            assert complete['4'] == 100.0
+        elif 'read' in rw or 'write' in rw:
+            assert iodepth[str(qd)] > 99.9
+            assert submit['4'] == 100.0
+            assert complete['4'] > 99.9
+        else:
+            if qdbatch <= 4:
+                batchkey = '4'
+            elif qdbatch > 64:
+                batchkey = '>=64'
+            else:
+                batchkey = str(qdbatch)
+            if qd >= 64:
+                qdkey = ">=64"
+            else:
+                qdkey = str(qd)
+            assert iodepth[qdkey] > 99
+            assert submit[batchkey] == 100.0
+            assert complete[batchkey] > 99
+    except AssertionError:
+        print("Assertion failed")
+        traceback.print_exc()
+        print(jsondata)
+        return
+
+    print("**********passed*********")
+
+
+def runalltests(args, qd, batch):
+    block = False
+    for dev in [args.chardev, args.blockdev]:
+        for rw in ["randread", "randwrite", "randtrim"]:
+            parameters = ["--name=test",
+                           "--time_based",
+                           "--runtime=30s",
+                           "--output-format=json",
+                           "--ioengine=sg",
+                           "--rw={0}".format(rw),
+                           "--filename={0}".format(dev),
+                           "--iodepth={0}".format(qd),
+                           "--iodepth_batch={0}".format(batch)]
+
+            print(parameters)
+            output = subprocess.check_output([args.fio] + parameters)
+            jsondata = json.loads(output)
+            jobdata = jsondata['jobs'][0]
+            check(jobdata, parameters, block, qd, batch, rw)
+        block = True
+
+
+def runcdevtrimtest(args, qd, batch):
+    parameters = ["--name=test",
+                   "--time_based",
+                   "--runtime=30s",
+                   "--output-format=json",
+                   "--ioengine=sg",
+                   "--rw=randtrim",
+                   "--filename={0}".format(args.chardev),
+                   "--iodepth={0}".format(qd),
+                   "--iodepth_batch={0}".format(batch)]
+
+    print(parameters)
+    output = subprocess.check_output([args.fio] + parameters)
+    jsondata = json.loads(output)
+    jobdata = jsondata['jobs'][0]
+    check(jobdata, parameters, False, qd, batch, "randtrim")
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    runcdevtrimtest(args, 32, 2)
+    runcdevtrimtest(args, 32, 4)
+    runcdevtrimtest(args, 32, 8)
+    runcdevtrimtest(args, 64, 4)
+    runcdevtrimtest(args, 64, 8)
+    runcdevtrimtest(args, 64, 16)
+    runcdevtrimtest(args, 128, 8)
+    runcdevtrimtest(args, 128, 16)
+    runcdevtrimtest(args, 128, 32)
+
+    runalltests(args, 1, 1)
+    runalltests(args, 16, 2)
+    runalltests(args, 16, 16)
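
check() above derives the expected submit/complete key from the batch size with special cases for small and large values and `str(qdbatch)` otherwise, which only names an existing key when the batch size is itself 8, 16, 32 or 64 (the values this script uses). A more general mapping, written in C like the engine code and assuming the buckets 0, 1-4, 5-8, 9-16, 17-32, 33-64 and above 64, might look like this; all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Keys in bucket order, as emitted for iodepth_submit/iodepth_complete. */
static const char *const submit_keys[7] = {
	"0", "4", "8", "16", "32", "64", ">=64"
};

/* Bucket index for nr I/Os, assuming ranges 0, 1-4, 5-8, ..., 33-64, >64. */
static int depth_bucket(unsigned int nr)
{
	int idx = 1;
	unsigned int upper = 4;

	if (nr == 0)
		return 0;
	if (nr > 64)
		return 6;
	while (nr > upper) {	/* double the upper bound until nr fits */
		upper <<= 1;
		idx++;
	}
	assert(idx <= 5);
	return idx;
}

/* Nonzero if key is the JSON key that nr I/Os would be counted under. */
static int key_matches(unsigned int nr, const char *key)
{
	return strcmp(submit_keys[depth_bucket(nr)], key) == 0;
}
```

For example, key_matches(2, "4") and key_matches(12, "16") both hold, so a batch of 12 would be checked against "16" rather than a non-existent "12" key.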


* Recent changes (master)
@ 2018-07-26 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-26 12:00 UTC
  To: fio

The following changes since commit 9a496382133e8003bd56ab6f3d260c5afadae555:

  init: unify 't' time period (2018-07-24 15:23:28 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 40e8d8314f10a578765e20a4eb574b2603d292df:

  Merge branch 'fio-histo-log-pctiles' of https://github.com/parallel-fs-utils/fio (2018-07-25 14:42:57 -0600)

----------------------------------------------------------------
Ben England (4):
      get latency percentiles over time from fio histo logs
      use interpolation for more accurate percentile calculation
      switch to argparse module for CLI parsing
      design document for tools/hist/fio-histo-log-pctiles.py

Jens Axboe (2):
      Merge branch 'fio-c++-engine' of https://github.com/tchaikov/fio
      Merge branch 'fio-histo-log-pctiles' of https://github.com/parallel-fs-utils/fio

Kefu Chai (1):
      replace typeof with __typeof__

 compiler/compiler.h                 |   4 +-
 doc/fio-histo-log-pctiles.pdf       | Bin 0 -> 182996 bytes
 flist.h                             |   4 +-
 minmax.h                            |  12 +-
 oslib/libmtd_common.h               |  10 +-
 tools/hist/fio-histo-log-pctiles.py | 660 ++++++++++++++++++++++++++++++++++++
 verify.c                            |   2 +-
 7 files changed, 676 insertions(+), 16 deletions(-)
 create mode 100644 doc/fio-histo-log-pctiles.pdf
 create mode 100755 tools/hist/fio-histo-log-pctiles.py

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index dacb737..ddfbcc1 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -28,7 +28,7 @@
  */
 #define typecheck(type,x) \
 ({	type __dummy; \
-	typeof(x) __dummy2; \
+	__typeof__(x) __dummy2; \
 	(void)(&__dummy == &__dummy2); \
 	1; \
 })
@@ -70,7 +70,7 @@
 
 #ifdef FIO_INTERNAL
 #define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
-#define FIELD_SIZE(s, f) (sizeof(((typeof(s))0)->f))
+#define FIELD_SIZE(s, f) (sizeof(((__typeof__(s))0)->f))
 #endif
 
 #endif
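
GCC documents that `typeof` is disabled under strict ISO options such as `-std=c99` and recommends the `__typeof__` spelling for headers that must also compile in that mode; g++ behaves the same way in strict `-std=c++NN` modes, which is presumably why this change arrived alongside the C++ engine merge. A minimal check that the FIELD_SIZE() idiom from compiler.h works with `__typeof__`, copied under the hypothetical name `FIELD_SIZE_X` and compilable with `gcc -std=c99`; `struct sample` is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as fio's FIELD_SIZE(), under a hypothetical name. */
#define FIELD_SIZE_X(s, f) (sizeof(((__typeof__(s))0)->f))

/* Hypothetical structure used only for illustration. */
struct sample {
	uint16_t a;
	uint64_t b;
	char name[12];
};

/* s may be a pointer expression: sizeof never evaluates it. */
static size_t name_field_size(const struct sample *p)
{
	size_t sz = FIELD_SIZE_X(p, name);

	assert(sz == sizeof(p->name));
	return sz;
}
```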
diff --git a/doc/fio-histo-log-pctiles.pdf b/doc/fio-histo-log-pctiles.pdf
new file mode 100644
index 0000000..069ab99
Binary files /dev/null and b/doc/fio-histo-log-pctiles.pdf differ
diff --git a/flist.h b/flist.h
index 2ca3d77..5437cd8 100644
--- a/flist.h
+++ b/flist.h
@@ -4,8 +4,8 @@
 #include <stdlib.h>
 #include <stddef.h>
 
-#define container_of(ptr, type, member) ({			\
-	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+#define container_of(ptr, type, member)  ({			\
+	const __typeof__( ((type *)0)->member ) *__mptr = (ptr);	\
 	(type *)( (char *)__mptr - offsetof(type,member) );})
 
 /*
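
container_of() maps a pointer to an embedded member back to the structure that contains it; the `__typeof__` line only serves to check that `ptr` has the member's type. A self-contained sketch using a copy of the flist.h macro (it relies on GCC/Clang statement expressions, so it builds with gcc or clang rather than a strict ISO compiler). `struct job` and `job_from_node()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member)  ({			\
	const __typeof__( ((type *)0)->member ) *__mptr = (ptr);	\
	(type *)( (char *)__mptr - offsetof(type,member) );})

struct list_node {
	struct list_node *next;
};

/* Hypothetical container with an embedded list node. */
struct job {
	int id;
	struct list_node node;
};

/* Recover the job that owns a given list node. */
static struct job *job_from_node(struct list_node *n)
{
	assert(n != NULL);
	return container_of(n, struct job, node);
}
```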
diff --git a/minmax.h b/minmax.h
index afc78f0..ec0848c 100644
--- a/minmax.h
+++ b/minmax.h
@@ -3,23 +3,23 @@
 
 #ifndef min
 #define min(x,y) ({ \
-	typeof(x) _x = (x);	\
-	typeof(y) _y = (y);	\
+	__typeof__(x) _x = (x);	\
+	__typeof__(y) _y = (y);	\
 	(void) (&_x == &_y);		\
 	_x < _y ? _x : _y; })
 #endif
 
 #ifndef max
 #define max(x,y) ({ \
-	typeof(x) _x = (x);	\
-	typeof(y) _y = (y);	\
+	__typeof__(x) _x = (x);	\
+	__typeof__(y) _y = (y);	\
 	(void) (&_x == &_y);		\
 	_x > _y ? _x : _y; })
 #endif
 
 #define min_not_zero(x, y) ({		\
-	typeof(x) __x = (x);		\
-	typeof(y) __y = (y);		\
+	__typeof__(x) __x = (x);		\
+	__typeof__(y) __y = (y);		\
 	__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
 
 #endif
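
min_not_zero() returns the smaller of two values but treats zero as "not set", falling back to the other argument. The `(void) (&_x == &_y)` line in min() makes the compiler warn when the two arguments have different types. A copy of the minmax.h definitions with a hypothetical wrapper `effective_limit()`, where 0 means "no limit configured":

```c
#include <assert.h>

#define min(x,y) ({ \
	__typeof__(x) _x = (x);	\
	__typeof__(y) _y = (y);	\
	(void) (&_x == &_y);		\
	_x < _y ? _x : _y; })

#define min_not_zero(x, y) ({		\
	__typeof__(x) __x = (x);		\
	__typeof__(y) __y = (y);		\
	__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })

/* Hypothetical use: combine two limits where 0 means "unlimited". */
static unsigned int effective_limit(unsigned int a, unsigned int b)
{
	return min_not_zero(a, b);
}
```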
diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index 87f93b6..4ed9f0b 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -49,18 +49,18 @@ extern "C" {
 #define min(a, b) MIN(a, b) /* glue for linux kernel source */
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
 
-#define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1)
+#define ALIGN(x,a) __ALIGN_MASK(x,(__typeof__(x))(a)-1)
 #define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))
 
 #define min_t(t,x,y) ({ \
-	typeof((x)) _x = (x); \
-	typeof((y)) _y = (y); \
+	__typeof__((x)) _x = (x); \
+	__typeof__((y)) _y = (y); \
 	(_x < _y) ? _x : _y; \
 })
 
 #define max_t(t,x,y) ({ \
-	typeof((x)) _x = (x); \
-	typeof((y)) _y = (y); \
+	__typeof__((x)) _x = (x); \
+	__typeof__((y)) _y = (y); \
 	(_x > _y) ? _x : _y; \
 })
 
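
ALIGN(x, a) rounds x up to the next multiple of a, which must be a power of two. The `__typeof__(x)` cast makes the mask as wide as x, so aligning a 64-bit value with a 32-bit unsigned alignment does not clear the upper 32 bits. A copy of the two macros with a hypothetical wrapper `align_offset()`:

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN(x,a) __ALIGN_MASK(x,(__typeof__(x))(a)-1)
#define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))

/* Round a byte offset up to a power-of-two block size. */
static uint64_t align_offset(uint64_t off, unsigned int bs)
{
	assert(bs != 0 && (bs & (bs - 1)) == 0);
	return ALIGN(off, bs);
}
```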
diff --git a/tools/hist/fio-histo-log-pctiles.py b/tools/hist/fio-histo-log-pctiles.py
new file mode 100755
index 0000000..bad6887
--- /dev/null
+++ b/tools/hist/fio-histo-log-pctiles.py
@@ -0,0 +1,660 @@
+#!/usr/bin/env python
+
+# module to parse fio histogram log files, not using pandas
+# runs in python v2 or v3
+# to get help with the CLI: $ python fio-histo-log-pctiles.py -h
+# this can be run standalone as a script but is callable
+# assumes all threads run for same time duration
+# assumes all threads are doing the same thing for the entire run
+
+# percentiles:
+#  0 - min latency
+#  50 - median
+#  100 - max latency
+
+# TO-DO: 
+#   separate read and write stats for randrw mixed workload
+#   report average latency if needed
+#   prove that it works (partially done with unit tests)
+
+# to run unit tests, set UNITTEST environment variable to anything
+# if you do this, don't pass normal CLI parameters to it
+# otherwise it runs the CLI
+
+import sys, os, math, copy
+from copy import deepcopy
+import argparse
+import unittest2
+
+msec_per_sec = 1000
+nsec_per_usec = 1000
+
+class FioHistoLogExc(Exception):
+    pass
+
+# if there is an error, print message, and exit with error status
+
+def myabort(msg):
+    print('ERROR: ' + msg)
+    sys.exit(1)
+
+# convert histogram log file into a list of
+# (time_ms, direction, bsz, buckets) tuples where
+# - time_ms is the time in msec at which the log record was written
+# - direction is 0 (read) or 1 (write)
+# - bsz is block size (not used)
+# - buckets is the list of counters that make up the histogram
+# the caller passes in the number of counters expected per record
+
+
+def exception_suffix( record_num, pathname ):
+    return 'in histogram record %d file %s' % (record_num, pathname)
+
+# log file parser raises FioHistoLogExc exceptions
+# it returns histogram buckets in whatever unit fio uses
+
+def parse_hist_file(logfn, buckets_per_interval):
+    max_timestamp_ms = 0.0
+    
+    with open(logfn, 'r') as f:
+        records = [ l.strip() for l in f.readlines() ]
+    intervals = []
+    for k, r in enumerate(records):
+        if r == '':
+            continue
+        tokens = r.split(',')
+        try:
+            int_tokens = [ int(t) for t in tokens ]
+        except ValueError as e:
+            raise FioHistoLogExc('non-integer value %s' % exception_suffix(k+1, logfn))
+
+        neg_ints = list(filter( lambda tk : tk < 0, int_tokens ))
+        if len(neg_ints) > 0:
+            raise FioHistoLogExc('negative integer value %s' % exception_suffix(k+1, logfn))
+
+        if len(int_tokens) < 3:
+            raise FioHistoLogExc('too few numbers %s' % exception_suffix(k+1, logfn))
+
+        time_ms = int_tokens[0]
+        if time_ms > max_timestamp_ms:
+            max_timestamp_ms = time_ms
+
+        direction = int_tokens[1]
+        if direction != 0 and direction != 1:
+            raise FioHistoLogExc('invalid I/O direction %s' % exception_suffix(k+1, logfn))
+
+        bsz = int_tokens[2]
+        if bsz > (1 << 24):
+            raise FioHistoLogExc('block size too large %s' % exception_suffix(k+1, logfn))
+
+        buckets = int_tokens[3:]
+        if len(buckets) != buckets_per_interval:
+            raise FioHistoLogExc('%d buckets per interval but %d expected in %s' % 
+                    (len(buckets), buckets_per_interval, exception_suffix(k+1, logfn)))
+        intervals.append((time_ms, direction, bsz, buckets))
+    if len(intervals) == 0:
+        raise FioHistoLogExc('no records in %s' % logfn)
+    return (intervals, max_timestamp_ms)
+
+
+# compute time range for each bucket index in histogram record
+# see comments in https://github.com/axboe/fio/blob/master/stat.h
+# for description of bucket groups and buckets
+# fio v3 bucket ranges are in nanosec (since response times are measured in nanosec)
+# but we convert fio v3 nanosecs to floating-point microseconds
+
+def time_ranges(groups, counters_per_group, fio_version=3):
+    bucket_width = 1
+    bucket_base = 0
+    bucket_intervals = []
+    for g in range(0, groups):
+        for b in range(0, counters_per_group):
+            rmin = float(bucket_base)
+            rmax = rmin + bucket_width
+            if fio_version == 3:
+                rmin /= nsec_per_usec
+                rmax /= nsec_per_usec
+            bucket_intervals.append( [rmin, rmax] )
+            bucket_base += bucket_width
+        if g != 0:
+            bucket_width *= 2
+    return bucket_intervals
+
+
+# compute number of time quantum intervals in the test
+
+def get_time_intervals(time_quantum, max_timestamp_ms):
+    # round down to nearest second
+    max_timestamp = max_timestamp_ms // msec_per_sec
+    # end_time is the smallest multiple of time_quantum greater than max_timestamp
+    time_interval_count = (max_timestamp + time_quantum) // time_quantum
+    end_time = time_interval_count * time_quantum
+    return (end_time, time_interval_count)
+
+# align raw histogram log data to time quanta so that
+# histograms from different threads can then be combined by addition.
+# for a randrw workload we count both reads and writes in the same output bucket,
+# but we separate reads and writes when calculating
+# the end time of each histogram record.
+# alignment requires weighting each raw histogram bucket by the
+# fraction of the record's time interval that overlaps the current
+# time quantum interval.
+# for example, if a bucket holds 515 samples for the time interval
+# [ 1010, 2014 ] msec since start of test, and the time quantum is 1 sec, then
+# for the time quantum interval [ 1000, 2000 ] msec, the weight is
+# (2000 - 1010) / (2014 - 1010) = 990 / 1004 = 0.986 (approx.)
+# so the contribution of this bucket to this time quantum is
+# 515 x 0.986 = 507.8 (approx.); the remaining 7.2 go to [ 2000, 3000 ].
+
+def align_histo_log(raw_histogram_log, time_quantum, bucket_count, max_timestamp_ms):
+
+    # slice up test time into intervals of time_quantum seconds
+
+    (end_time, time_interval_count) = get_time_intervals(time_quantum, max_timestamp_ms)
+    time_qtm_ms = time_quantum * msec_per_sec
+    end_time_ms = end_time * msec_per_sec
+    aligned_intervals = []
+    for j in range(0, time_interval_count):
+        aligned_intervals.append((
+            j * time_qtm_ms,
+            [ 0.0 for j in range(0, bucket_count) ] ))
+
+    log_record_count = len(raw_histogram_log)
+    for k, record in enumerate(raw_histogram_log):
+
+        # find next record with same direction to get end-time
+        # have to avoid going past end of array
+        # for fio randrw workload, 
+        # we have read and write records on same time interval
+        # sometimes read and write records are in opposite order
+        # assertion checks that next read/write record 
+        # can be separated by at most 2 other records
+
+        (time_msec, direction, sz, interval_buckets) = record
+        if k+1 < log_record_count:
+            (time_msec_end, direction2, _, _) = raw_histogram_log[k+1]
+            if direction2 != direction:
+                if k+2 < log_record_count:
+                    (time_msec_end, direction2, _, _) = raw_histogram_log[k+2]
+                    if direction2 != direction:
+                        if k+3 < log_record_count:
+                            (time_msec_end, direction2, _, _) = raw_histogram_log[k+3]
+                            assert direction2 == direction
+                        else:
+                            time_msec_end = end_time_ms
+                else:
+                    time_msec_end = end_time_ms
+        else:
+            time_msec_end = end_time_ms
+
+        # calculate first quantum that overlaps this histogram record 
+
+        qtm_start_ms = (time_msec // time_qtm_ms) * time_qtm_ms
+        qtm_end_ms = ((time_msec + time_qtm_ms) // time_qtm_ms) * time_qtm_ms
+        qtm_index = qtm_start_ms // time_qtm_ms
+
+        # for each quantum that overlaps this histogram record's time interval
+
+        while qtm_start_ms < time_msec_end:  # while quantum overlaps record
+
+            # calculate fraction of time that this quantum 
+            # overlaps histogram record's time interval
+            
+            overlap_start = max(qtm_start_ms, time_msec)
+            overlap_end = min(qtm_end_ms, time_msec_end)
+            weight = float(overlap_end - overlap_start)
+            weight /= (time_msec_end - time_msec)
+            (_,aligned_histogram) = aligned_intervals[qtm_index]
+            for bx, b in enumerate(interval_buckets):
+                weighted_bucket = weight * b
+                aligned_histogram[bx] += weighted_bucket
+
+            # advance to the next time quantum
+
+            qtm_start_ms += time_qtm_ms
+            qtm_end_ms += time_qtm_ms
+            qtm_index += 1
+
+    return aligned_intervals
+
+# add histogram in "source" to histogram in "target"
+# it is assumed that the 2 histograms are precisely time-aligned
+
+def add_to_histo_from( target, source ):
+    for b in range(0, len(source)):
+        target[b] += source[b]
+
+# compute percentiles
+# inputs:
+#   buckets: histogram bucket array 
+#   wanted: list of floating-pt percentiles to calculate
+#   time_ranges: [tmin,tmax) time interval for each bucket
+# returns None if no I/O reported.
+# otherwise we would be dividing by zero
+# think of buckets as probability distribution function
+# and this loop is integrating to get cumulative distribution function
+
+def get_pctiles(buckets, wanted, time_ranges):
+
+    # get total of IO requests done
+    total_ios = 0
+    for io_count in buckets:
+        total_ios += io_count
+
+    # don't return percentiles if no I/O was done during interval
+    if total_ios == 0.0:
+        return None
+
+    pctile_count = len(wanted)
+
+    # results returned as dictionary keyed by percentile
+    pctile_result = {}
+
+    # index of next percentile in list
+    pctile_index = 0
+
+    # next percentile
+    next_pctile = wanted[pctile_index]
+
+    # no one is interested in percentiles bigger than this but not 100.0
+    # this prevents floating-point error from preventing loop exit
+    almost_100 = 99.9999
+
+    # pct is the percentile corresponding to 
+    # all I/O requests up through bucket b
+    pct = 0.0
+    total_so_far = 0
+    for b, io_count in enumerate(buckets):
+        if io_count == 0:
+            continue
+        total_so_far += io_count
+        # last_pct is the percentile corresponding to 
+        # all I/O requests up to, but not including, bucket b
+        last_pct = pct
+        pct = 100.0 * float(total_so_far) / total_ios
+        # a single bucket could satisfy multiple pctiles
+        # so this must be a while loop
+        # for 100-percentile (max latency) case, no bucket exceeds it 
+        # so we must stop there.
+        while ((next_pctile == 100.0 and pct >= almost_100) or
+               (next_pctile < 100.0  and pct > next_pctile)):
+            # interpolate between min and max time for bucket time interval
+            # we keep the time_ranges access inside this loop, 
+            # even though it could be above the loop,
+            # because in many cases we will not even enter
+            # the loop so we optimize out these accesses
+            range_max_time = time_ranges[b][1]
+            range_min_time = time_ranges[b][0]
+            offset_frac = (next_pctile - last_pct)/(pct - last_pct)
+            interpolation = range_min_time + (offset_frac*(range_max_time - range_min_time))
+            pctile_result[next_pctile] = interpolation
+            pctile_index += 1
+            if pctile_index == pctile_count:
+                break
+            next_pctile = wanted[pctile_index]
+        if pctile_index == pctile_count:
+            break
+    assert pctile_index == pctile_count
+    return pctile_result
+
+
+# this is really the main program
+
+def compute_percentiles_from_logs():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--fio-version", dest="fio_version", 
+        default="3", choices=[2,3], type=int, 
+        help="fio version (default=3)")
+    parser.add_argument("--bucket-groups", dest="bucket_groups", default="29", type=int, 
+        help="fio histogram bucket groups (default=29)")
+    parser.add_argument("--bucket-bits", dest="bucket_bits", 
+        default="6", type=int, 
+        help="fio histogram buckets-per-group bits (default=6 means 64 buckets/group)")
+    parser.add_argument("--percentiles", dest="pctiles_wanted", 
+        default=[0.0, 50.0, 95.0, 99.0, 100.0], type=float, nargs='+',
+        help="percentiles to compute (default: 0 50 95 99 100)")
+    parser.add_argument("--time-quantum", dest="time_quantum", 
+        default="1", type=int,
+        help="time quantum in seconds (default=1)")
+    parser.add_argument("--output-unit", dest="output_unit", 
+        default="usec", type=str,
+        help="Latency percentile output unit: msec|usec|nsec (default usec)")
+    parser.add_argument("file_list", nargs='+')
+    args = parser.parse_args()
+    print(args)
+
+    if not args.bucket_groups:
+        # default changes based on fio version
+        if args.fio_version == 2:
+            args.bucket_groups = 19
+        else:
+            # default in fio 3.x
+            args.bucket_groups = 29
+
+    # print parameters
+
+    print('bucket groups = %d' % args.bucket_groups)
+    print('bucket bits = %d' % args.bucket_bits)
+    print('time quantum = %d sec' % args.time_quantum)
+    print('percentiles = %s' % ','.join([ str(p) for p in args.pctiles_wanted ]))
+    buckets_per_group = 1 << args.bucket_bits
+    print('buckets per group = %d' % buckets_per_group)
+    buckets_per_interval = buckets_per_group * args.bucket_groups
+    print('buckets per interval = %d ' % buckets_per_interval)
+    bucket_index_range = range(0, buckets_per_interval)
+    if args.time_quantum <= 0:
+        sys.exit('ERROR: time-quantum must be a positive number of seconds')
+    print('output unit = ' + args.output_unit)
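+    # bucket times are in usec; divide by this to convert to the output unit
+    time_divisor = {'msec': 1000.0, 'usec': 1.0, 'nsec': 0.001}.get(args.output_unit)
+    if time_divisor is None:
+        sys.exit('ERROR: output unit must be msec, usec or nsec')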
+
+    # calculate response time interval associated with each histogram bucket
+
+    bucket_times = time_ranges(args.bucket_groups, buckets_per_group, fio_version=args.fio_version)
+
+    # construct template for each histogram bucket array with buckets all zeroes
+    # we just copy this for each new histogram
+
+    zeroed_buckets = [ 0.0 for r in bucket_index_range ]
+
+    # print CSV header just like fiologparser_hist does
+
+    header = 'msec, '
+    for p in args.pctiles_wanted:
+        header += '%3.1f, ' % p
+    print('time (millisec), percentiles in increasing order with values in ' + args.output_unit)
+    print(header)
+
+    # parse the histogram logs
+    # assumption: each log's records have monotonically increasing timestamps
+    # assumption: time ranges do not overlap for a single thread's records
+    # (exception: if randrw workload, then there is a read and a write 
+    # record for the same time interval)
+
+    max_timestamp_all_logs = 0
+    hist_files = {}
+    for fn in args.file_list:
+        try:
+            (hist_files[fn], max_timestamp_ms)  = parse_hist_file(fn, buckets_per_interval)
+        except FioHistoLogExc as e:
+            myabort(str(e))
+        max_timestamp_all_logs = max(max_timestamp_all_logs, max_timestamp_ms)
+
+    (end_time, time_interval_count) = get_time_intervals(args.time_quantum, max_timestamp_all_logs)
+    all_threads_histograms = [ ((j*args.time_quantum*msec_per_sec), deepcopy(zeroed_buckets))
+                                for j in range(0, time_interval_count) ]
+
+    for logfn in hist_files.keys():
+        aligned_per_thread = align_histo_log(hist_files[logfn], 
+                                             args.time_quantum, 
+                                             buckets_per_interval, 
+                                             max_timestamp_all_logs)
+        for t in range(0, time_interval_count):
+            (_, all_threads_histo_t) = all_threads_histograms[t]
+            (_, log_histo_t) = aligned_per_thread[t]
+            add_to_histo_from( all_threads_histo_t, log_histo_t )
+
+    # calculate percentiles across aggregate histogram for all threads
+
+    for (t_msec, all_threads_histo_t) in all_threads_histograms:
+        record = '%d, ' % t_msec
+        pct = get_pctiles(all_threads_histo_t, args.pctiles_wanted, bucket_times)
+        if not pct:
+            for w in args.pctiles_wanted:
+                record += ', '
+        else:
+            pct_keys = [ k for k in pct.keys() ]
+            pct_values = [ str(pct[wanted]/time_divisor) for wanted in sorted(pct_keys) ]
+            record += ', '.join(pct_values)
+        print(record)
+
+
+
+#end of MAIN PROGRAM
+
+
+
+##### below are unit tests ##############
+
+import tempfile, shutil
+from os.path import join
+should_not_get_here = False
+
+class Test(unittest2.TestCase):
+    tempdir = None
+
+    # a little less typing please
+    def A(self, boolean_val):
+        self.assertTrue(boolean_val)
+
+    # initialize unit test environment
+
+    @classmethod
+    def setUpClass(cls):
+        d = tempfile.mkdtemp()
+        Test.tempdir = d
+
+    # remove anything left by unit test environment
+    # unless user sets UNITTEST_LEAVE_FILES environment variable
+
+    @classmethod
+    def tearDownClass(cls):
+        if not os.getenv("UNITTEST_LEAVE_FILES"):
+            shutil.rmtree(cls.tempdir)
+
+    def setUp(self):
+        self.fn = join(Test.tempdir, self.id())
+
+    def test_a_add_histos(self):
+        a = [ 1.0, 2.0 ]
+        b = [ 1.5, 2.5 ]
+        add_to_histo_from( a, b )
+        self.A(a == [2.5, 4.5])
+        self.A(b == [1.5, 2.5])
+
+    def test_b1_parse_log(self):
+        with open(self.fn, 'w') as f:
+            f.write('1234, 0, 4096, 1, 2, 3, 4\n')
+            f.write('5678,1,16384,5,6,7,8 \n')
+        (raw_histo_log, max_timestamp) = parse_hist_file(self.fn, 4) # 4 buckets per interval
+        self.A(len(raw_histo_log) == 2 and max_timestamp == 5678)
+        (time_ms, direction, bsz, histo) = raw_histo_log[0]
+        self.A(time_ms == 1234 and direction == 0 and bsz == 4096 and histo == [ 1, 2, 3, 4 ])
+        (time_ms, direction, bsz, histo) = raw_histo_log[1]
+        self.A(time_ms == 5678 and direction == 1 and bsz == 16384 and histo == [ 5, 6, 7, 8 ])
+
+    def test_b2_parse_empty_log(self):
+        with open(self.fn, 'w') as f:
+            pass
+        try:
+            (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+            self.A(should_not_get_here)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('no records'))
+
+    def test_b3_parse_empty_records(self):
+        with open(self.fn, 'w') as f:
+            f.write('\n')
+            f.write('1234, 0, 4096, 1, 2, 3, 4\n')
+            f.write('5678,1,16384,5,6,7,8 \n')
+            f.write('\n')
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+        self.A(len(raw_histo_log) == 2 and max_timestamp_ms == 5678)
+        (time_ms, direction, bsz, histo) = raw_histo_log[0]
+        self.A(time_ms == 1234 and direction == 0 and bsz == 4096 and histo == [ 1, 2, 3, 4 ])
+        (time_ms, direction, bsz, histo) = raw_histo_log[1]
+        self.A(time_ms == 5678 and direction == 1 and bsz == 16384 and histo == [ 5, 6, 7, 8 ])
+
+    def test_b4_parse_non_int(self):
+        with open(self.fn, 'w') as f:
+            f.write('12, 0, 4096, 1a, 2, 3, 4\n')
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('non-integer'))
+
+    def test_b5_parse_neg_int(self):
+        with open(self.fn, 'w') as f:
+            f.write('-12, 0, 4096, 1, 2, 3, 4\n')
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('negative integer'))
+
+    def test_b6_parse_too_few_int(self):
+        with open(self.fn, 'w') as f:
+            f.write('0, 0\n')
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('too few numbers'))
+
+    def test_b7_parse_invalid_direction(self):
+        with open(self.fn, 'w') as f:
+            f.write('100, 2, 4096, 1, 2, 3, 4\n')
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('invalid I/O direction'))
+
+    def test_b8_parse_bsz_too_big(self):
+        with open(self.fn+'_good', 'w') as f:
+            f.write('100, 1, %d, 1, 2, 3, 4\n' % (1<<24))
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn+'_good', 4)
+        with open(self.fn+'_bad', 'w') as f:
+            f.write('100, 1, 20000000, 1, 2, 3, 4\n')
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn+'_bad', 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).startswith('block size too large'))
+
+    def test_b9_parse_wrong_bucket_count(self):
+        with open(self.fn, 'w') as f:
+            f.write('100, 1, %d, 1, 2, 3, 4, 5\n' % (1<<24))
+        try:
+            (raw_histo_log, _) = parse_hist_file(self.fn, 4)
+            self.A(False)
+        except FioHistoLogExc as e:
+            self.A(str(e).__contains__('buckets per interval'))
+
+    def test_c1_time_ranges(self):
+        ranges = time_ranges(3, 2)  # fio_version defaults to 3
+        expected_ranges = [ # fio_version 3 is in nanoseconds
+                [0.000, 0.001], [0.001, 0.002],   # first group
+                [0.002, 0.003], [0.003, 0.004],   # second group same width
+                [0.004, 0.006], [0.006, 0.008]]   # subsequent groups double width
+        self.A(ranges == expected_ranges)
+        ranges = time_ranges(3, 2, fio_version=3)
+        self.A(ranges == expected_ranges)
+        ranges = time_ranges(3, 2, fio_version=2)
+        expected_ranges_v2 = [ [ 1000.0 * min_or_max for min_or_max in time_range ] 
+                               for time_range in expected_ranges ]
+        self.A(ranges == expected_ranges_v2)
+        # see fio V3 stat.h for why 29 groups and 2^6 buckets/group
+        normal_ranges_v3 = time_ranges(29, 64)
+        # for v3, bucket time intervals are measured in nanoseconds
+        self.A(len(normal_ranges_v3) == 29 * 64 and normal_ranges_v3[-1][1] == 64*(1<<(29-1))/1000.0)
+        normal_ranges_v2 = time_ranges(19, 64, fio_version=2)
+        # for v2, bucket time intervals are measured in microseconds so we have fewer buckets
+        self.A(len(normal_ranges_v2) == 19 * 64 and normal_ranges_v2[-1][1] == 64*(1<<(19-1)))
+
+    def test_d1_align_histo_log_1_quantum(self):
+        with open(self.fn, 'w') as f:
+            f.write('100, 1, 4096, 1, 2, 3, 4')
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+        self.A(max_timestamp_ms == 100)
+        aligned_log = align_histo_log(raw_histo_log, 5, 4, max_timestamp_ms)
+        self.A(len(aligned_log) == 1)
+        (time_ms0, h) = aligned_log[0]
+        self.A(time_ms0 == 0 and h == [1.0, 2.0, 3.0, 4.0])
+
+    # we need this to compare 2 lists of floating point numbers for equality
+    # because of floating-point imprecision
+
+    def compare_2_floats(self, x, y):
+        if x == 0.0 or y == 0.0:
+            return (x+y) < 0.0000001
+        else:
+            return (math.fabs(x-y)/x) < 0.00001
+                
+    def is_close(self, buckets, buckets_expected):
+        if len(buckets) != len(buckets_expected):
+            return False
+        compare_buckets = lambda k: self.compare_2_floats(buckets[k], buckets_expected[k])
+        indices_close = list(filter(compare_buckets, range(0, len(buckets))))
+        return len(indices_close) == len(buckets)
+
+    def test_d2_align_histo_log_2_quantum(self):
+        with open(self.fn, 'w') as f:
+            f.write('2000, 1, 4096, 1, 2, 3, 4\n')
+            f.write('7000, 1, 4096, 1, 2, 3, 4\n')
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 4)
+        self.A(max_timestamp_ms == 7000)
+        (_, _, _, raw_buckets1) = raw_histo_log[0]
+        (_, _, _, raw_buckets2) = raw_histo_log[1]
+        aligned_log = align_histo_log(raw_histo_log, 5, 4, max_timestamp_ms)
+        self.A(len(aligned_log) == 2)
+        (time_ms1, h1) = aligned_log[0]
+        (time_ms2, h2) = aligned_log[1]
+        # because first record is from time interval [2000, 7000]
+        # we weight it accordingly: 0.6 in [0, 5000) and 0.4 in [5000, 10000)
+        expect1 = [float(b) * 0.6 for b in raw_buckets1]
+        expect2 = [float(b) * 0.4 for b in raw_buckets1]
+        for e in range(0, len(expect2)):
+            expect2[e] += raw_buckets2[e]
+        self.A(time_ms1 == 0    and self.is_close(h1, expect1))
+        self.A(time_ms2 == 5000 and self.is_close(h2, expect2))
+
+    # what to expect if histogram buckets are all equal
+    def test_e1_get_pctiles_flat_histo(self):
+        with open(self.fn, 'w') as f:
+            buckets = [ 100 for j in range(0, 128) ]
+            f.write('9000, 1, 4096, %s\n' % ', '.join([str(b) for b in buckets]))
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, 128)
+        self.A(max_timestamp_ms == 9000)
+        aligned_log = align_histo_log(raw_histo_log, 5, 128, max_timestamp_ms)
+        time_intervals = time_ranges(4, 32)
+        # since buckets are all equal, then median is halfway through time_intervals
+        # and max latency interval is at end of time_intervals
+        self.A(time_intervals[64][1] == 0.066 and time_intervals[127][1] == 0.256)
+        pctiles_wanted = [ 0, 50, 100 ]
+        pct_vs_time = []
+        for (time_ms, histo) in aligned_log:
+            pct_vs_time.append(get_pctiles(histo, pctiles_wanted, time_intervals))
+        self.A(pct_vs_time[0] == None)  # no I/O in this time interval
+        expected_pctiles = { 0:0.000, 50:0.064, 100:0.256 }
+        self.A(pct_vs_time[1] == expected_pctiles)
+
+    # what to expect if just the highest histogram bucket is used
+    def test_e2_get_pctiles_highest_pct(self):
+        fio_v3_bucket_count = 29 * 64
+        with open(self.fn, 'w') as f:
+            # make an empty fio v3 histogram
+            buckets = [ 0 for j in range(0, fio_v3_bucket_count) ]
+            # add one I/O request to last bucket
+            buckets[-1] = 1
+            f.write('9000, 1, 4096, %s\n' % ', '.join([str(b) for b in buckets]))
+        (raw_histo_log, max_timestamp_ms) = parse_hist_file(self.fn, fio_v3_bucket_count)
+        self.A(max_timestamp_ms == 9000)
+        aligned_log = align_histo_log(raw_histo_log, 5, fio_v3_bucket_count, max_timestamp_ms)
+        (time_ms, histo) = aligned_log[1]
+        time_intervals = time_ranges(29, 64)
+        expected_pctiles = { 100.0:(64*(1<<28))/1000.0 }
+        pct = get_pctiles( histo, [ 100.0 ], time_intervals )
+        self.A(pct == expected_pctiles)
+
+# we are using this module as a standalone program
+
+if __name__ == '__main__':
+    if os.getenv('UNITTEST'):
+        sys.exit(unittest2.main())
+    else:
+        compute_percentiles_from_logs()
+
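A standalone sketch of the weighting that align_histo_log() applies, useful for checking the arithmetic by hand. The helper names overlap_weight() and spread_record() are hypothetical and not part of fio; like the script, the sketch divides the overlap by the duration of the histogram record rather than by the length of the quantum.

```python
def overlap_weight(rec_start_ms, rec_end_ms, qtm_start_ms, qtm_end_ms):
    """Fraction of a histogram record's interval that falls inside one quantum.

    Like align_histo_log(), divide the overlap by the record's own duration.
    """
    overlap = min(qtm_end_ms, rec_end_ms) - max(qtm_start_ms, rec_start_ms)
    if overlap <= 0:
        return 0.0
    return overlap / float(rec_end_ms - rec_start_ms)


def spread_record(samples, rec_start_ms, rec_end_ms, quantum_ms):
    """Split one bucket's count across every quantum the record overlaps.

    Returns {quantum start time in msec: weighted count}.
    """
    shares = {}
    qtm_start = (rec_start_ms // quantum_ms) * quantum_ms
    while qtm_start < rec_end_ms:
        weight = overlap_weight(rec_start_ms, rec_end_ms,
                                qtm_start, qtm_start + quantum_ms)
        shares[qtm_start] = samples * weight
        qtm_start += quantum_ms
    return shares


# 515 samples logged for [1010, 2014] msec with a 1-second quantum
parts = spread_record(515, 1010, 2014, 1000)
for start, count in sorted(parts.items()):
    print(start, round(count, 2))
```

Run as-is, it prints `1000 507.82` and `2000 7.18`. The two weighted counts add back up to the 515 logged samples, which is the property that dividing by the record's duration (rather than the quantum's) preserves.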
diff --git a/verify.c b/verify.c
index 0f2c118..01492f2 100644
--- a/verify.c
+++ b/verify.c
@@ -1517,7 +1517,7 @@ int paste_blockoff(char *buf, unsigned int len, void *priv)
 	struct io_u *io = priv;
 	unsigned long long off;
 
-	typecheck(typeof(off), io->offset);
+	typecheck(__typeof__(off), io->offset);
 	off = cpu_to_le64((uint64_t)io->offset);
 	len = min(len, (unsigned int)sizeof(off));
 	memcpy(buf, &off, len);

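The bucket boundaries that time_ranges() in fio-histo-log-pctiles.py accumulates in a loop also have a closed form, and get_pctiles() interpolates linearly inside the bucket that crosses the requested percentile. The sketch below combines the two for fio v3 logs (nanosecond buckets reported in microseconds). bucket_bounds() and percentile() are hypothetical names, and unlike get_pctiles() this percentile() handles only a single percentile below 100.

```python
def bucket_bounds(index, buckets_per_group, ns_per_usec=1000.0):
    """Closed-form [lo, hi) of fio v3 latency bucket `index`, in usec.

    Groups 0 and 1 have 1 ns wide buckets; each later group doubles the width,
    so group g >= 1 starts at buckets_per_group * 2**(g - 1) ns.
    """
    group, offset = divmod(index, buckets_per_group)
    if group == 0:
        lo_ns, width_ns = offset, 1
    else:
        width_ns = 1 << (group - 1)
        lo_ns = (buckets_per_group + offset) * width_ns
    return lo_ns / ns_per_usec, (lo_ns + width_ns) / ns_per_usec


def percentile(buckets, wanted, buckets_per_group):
    """Interpolate one percentile (0 <= wanted < 100) from histogram counts.

    Returns None for an empty histogram, as get_pctiles() does.
    """
    total = float(sum(buckets))
    if total == 0:
        return None
    seen = 0
    for index, count in enumerate(buckets):
        if count == 0:
            continue
        last_pct = 100.0 * seen / total   # share of I/Os below this bucket
        seen += count
        pct = 100.0 * seen / total        # share of I/Os up through this bucket
        if pct > wanted:
            lo, hi = bucket_bounds(index, buckets_per_group)
            frac = (wanted - last_pct) / (pct - last_pct)
            return lo + frac * (hi - lo)
    return None


flat = [100] * 128  # every bucket equally full, 4 groups of 32 buckets
print(percentile(flat, 50, 32))  # 0.064
```

For this flat histogram (the shape used in test_e1_get_pctiles_flat_histo) the median lands on the lower edge of bucket 64, 0.064 usec, which agrees with the expectation in the script's unit test.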
* Recent changes (master)
@ 2018-07-25 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-25 12:00 UTC
  To: fio

The following changes since commit 464b6f76a69a1937d87a604346fa9c2b430f7465:

  libpmem: update print statement for bs now being ULL (2018-07-23 10:12:54 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9a496382133e8003bd56ab6f3d260c5afadae555:

  init: unify 't' time period (2018-07-24 15:23:28 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      init: unify 't' time period

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 8cb8117..ede0a8b 100644
--- a/init.c
+++ b/init.c
@@ -2158,7 +2158,7 @@ static void usage(const char *name)
 	printf("  --showcmd\t\tTurn a job file into command line options\n");
 	printf("  --eta=when\t\tWhen ETA estimate should be printed\n");
 	printf("            \t\tMay be \"always\", \"never\" or \"auto\"\n");
-	printf("  --eta-newline=time\tForce a new line for every 'time'");
+	printf("  --eta-newline=t\tForce a new line for every 't'");
 	printf(" period passed\n");
 	printf("  --status-interval=t\tForce full status dump every");
 	printf(" 't' period passed\n");

* Recent changes (master)
@ 2018-07-24 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-24 12:00 UTC
  To: fio

The following changes since commit edfca254378e729035073af6ac0d9156565afebe:

  axmap: optimize ulog64 usage in axmap_handler() (2018-07-12 08:33:14 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 464b6f76a69a1937d87a604346fa9c2b430f7465:

  libpmem: update print statement for bs now being ULL (2018-07-23 10:12:54 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      Add support for >= 4G block sizes

Jens Axboe (6):
      gfio: cleanup includes
      Fio 3.8
      Use stdlib.h instead of malloc.h
      parse: mark fall-through switch case
      parse: mark another fall-through switch case
      libpmem: update print statement for bs now being ULL

 FIO-VERSION-GEN          |  2 +-
 backend.c                | 10 +++----
 cconv.c                  | 24 +++++++--------
 client.c                 |  6 ++--
 engines/glusterfs_sync.c |  4 +--
 engines/libpmem.c        |  2 +-
 file.h                   |  2 +-
 filesetup.c              |  6 ++--
 fio.h                    |  8 ++---
 gclient.c                |  2 +-
 gerror.c                 |  2 +-
 gfio.c                   |  3 +-
 goptions.c               |  2 +-
 graph.c                  |  2 +-
 init.c                   |  4 +--
 io_u.c                   | 34 ++++++++++-----------
 io_u.h                   | 14 ++++-----
 ioengines.c              |  2 +-
 iolog.c                  | 16 +++++-----
 iolog.h                  |  2 +-
 options.c                | 10 +++----
 os/windows/posix.c       |  0
 parse.c                  | 78 +++++++++++++++++++++++++++++++++++-------------
 parse.h                  |  6 ++--
 server.c                 |  2 +-
 server.h                 |  2 +-
 stat.c                   | 20 +++++++------
 stat.h                   |  8 ++---
 thread_options.h         | 24 ++++++++-------
 tickmarks.c              |  2 +-
 verify.c                 |  2 +-
 31 files changed, 172 insertions(+), 129 deletions(-)
 mode change 100755 => 100644 os/windows/posix.c

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index b28a1f3..99261fb 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.7
+DEF_VER=fio-3.8
 
 LF='
 '
diff --git a/backend.c b/backend.c
index a7e9184..3c45e78 100644
--- a/backend.c
+++ b/backend.c
@@ -454,7 +454,7 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 			*ret = -io_u->error;
 			clear_io_u(td, io_u);
 		} else if (io_u->resid) {
-			int bytes = io_u->xfer_buflen - io_u->resid;
+			long long bytes = io_u->xfer_buflen - io_u->resid;
 			struct fio_file *f = io_u->file;
 
 			if (bytes_issued)
@@ -583,7 +583,7 @@ static bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u)
 
 			if (x1 < y2 && y1 < x2) {
 				overlap = true;
-				dprint(FD_IO, "in-flight overlap: %llu/%lu, %llu/%lu\n",
+				dprint(FD_IO, "in-flight overlap: %llu/%llu, %llu/%llu\n",
 						x1, io_u->buflen,
 						y1, check_io_u->buflen);
 				break;
@@ -1033,7 +1033,7 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 			log_io_piece(td, io_u);
 
 		if (td->o.io_submit_mode == IO_MODE_OFFLOAD) {
-			const unsigned long blen = io_u->xfer_buflen;
+			const unsigned long long blen = io_u->xfer_buflen;
 			const enum fio_ddir __ddir = acct_ddir(io_u);
 
 			if (td->error)
@@ -1199,7 +1199,7 @@ static void cleanup_io_u(struct thread_data *td)
 static int init_io_u(struct thread_data *td)
 {
 	struct io_u *io_u;
-	unsigned int max_bs, min_write;
+	unsigned long long max_bs, min_write;
 	int cl_align, i, max_units;
 	int data_xfer = 1, err;
 	char *p;
@@ -1234,7 +1234,7 @@ static int init_io_u(struct thread_data *td)
 		td->orig_buffer_size += page_mask + td->o.mem_align;
 
 	if (td->o.mem_type == MEM_SHMHUGE || td->o.mem_type == MEM_MMAPHUGE) {
-		unsigned long bs;
+		unsigned long long bs;
 
 		bs = td->orig_buffer_size + td->o.hugepage_size - 1;
 		td->orig_buffer_size = bs & ~(td->o.hugepage_size - 1);
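Why widening locals such as `bytes` in io_queue_event() and `max_bs` in init_io_u() matters once block sizes can reach 4 GiB and beyond: assigning a large length to a 32-bit `int` keeps only the low 32 bits on the usual two's-complement compilers (strictly, that out-of-range conversion is implementation-defined in C). The sketch below imitates this from Python with ctypes, whose fixed-size integer types perform no overflow checking. The 5 GiB transfer length and 4096-byte residual are made-up values, and store_in_int()/store_in_long_long() are hypothetical helpers.

```python
import ctypes

GiB = 1 << 30


def store_in_int(value):
    """Value read back after storing into a C int (32 bits on common platforms).

    ctypes does no overflow checking, so the high bits are simply dropped.
    """
    return ctypes.c_int(value).value


def store_in_long_long(value):
    """Value read back after storing into a 64-bit C long long."""
    return ctypes.c_longlong(value).value


xfer_buflen = 5 * GiB  # hypothetical transfer length
resid = 4096           # hypothetical residual byte count
bytes_done = xfer_buflen - resid

print(store_in_int(bytes_done))        # 1073737728
print(store_in_long_long(bytes_done))  # 5368705024
```

The 32-bit result reports roughly 1 GiB transferred instead of roughly 5 GiB, with no error raised; the 64-bit type keeps the full count.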
diff --git a/cconv.c b/cconv.c
index bfd699d..534bfb0 100644
--- a/cconv.c
+++ b/cconv.c
@@ -110,16 +110,16 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->start_offset_percent = le32_to_cpu(top->start_offset_percent);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		o->bs[i] = le32_to_cpu(top->bs[i]);
-		o->ba[i] = le32_to_cpu(top->ba[i]);
-		o->min_bs[i] = le32_to_cpu(top->min_bs[i]);
-		o->max_bs[i] = le32_to_cpu(top->max_bs[i]);
+		o->bs[i] = le64_to_cpu(top->bs[i]);
+		o->ba[i] = le64_to_cpu(top->ba[i]);
+		o->min_bs[i] = le64_to_cpu(top->min_bs[i]);
+		o->max_bs[i] = le64_to_cpu(top->max_bs[i]);
 		o->bssplit_nr[i] = le32_to_cpu(top->bssplit_nr[i]);
 
 		if (o->bssplit_nr[i]) {
 			o->bssplit[i] = malloc(o->bssplit_nr[i] * sizeof(struct bssplit));
 			for (j = 0; j < o->bssplit_nr[i]; j++) {
-				o->bssplit[i][j].bs = le32_to_cpu(top->bssplit[i][j].bs);
+				o->bssplit[i][j].bs = le64_to_cpu(top->bssplit[i][j].bs);
 				o->bssplit[i][j].perc = le32_to_cpu(top->bssplit[i][j].perc);
 			}
 		}
@@ -203,7 +203,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->gauss_dev.u.f = fio_uint64_to_double(le64_to_cpu(top->gauss_dev.u.i));
 	o->random_generator = le32_to_cpu(top->random_generator);
 	o->hugepage_size = le32_to_cpu(top->hugepage_size);
-	o->rw_min_bs = le32_to_cpu(top->rw_min_bs);
+	o->rw_min_bs = le64_to_cpu(top->rw_min_bs);
 	o->thinktime = le32_to_cpu(top->thinktime);
 	o->thinktime_spin = le32_to_cpu(top->thinktime_spin);
 	o->thinktime_blocks = le32_to_cpu(top->thinktime_blocks);
@@ -410,7 +410,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->gauss_dev.u.i = __cpu_to_le64(fio_double_to_uint64(o->gauss_dev.u.f));
 	top->random_generator = cpu_to_le32(o->random_generator);
 	top->hugepage_size = cpu_to_le32(o->hugepage_size);
-	top->rw_min_bs = cpu_to_le32(o->rw_min_bs);
+	top->rw_min_bs = __cpu_to_le64(o->rw_min_bs);
 	top->thinktime = cpu_to_le32(o->thinktime);
 	top->thinktime_spin = cpu_to_le32(o->thinktime_spin);
 	top->thinktime_blocks = cpu_to_le32(o->thinktime_blocks);
@@ -488,10 +488,10 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->write_hist_log = cpu_to_le32(o->write_hist_log);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		top->bs[i] = cpu_to_le32(o->bs[i]);
-		top->ba[i] = cpu_to_le32(o->ba[i]);
-		top->min_bs[i] = cpu_to_le32(o->min_bs[i]);
-		top->max_bs[i] = cpu_to_le32(o->max_bs[i]);
+		top->bs[i] = __cpu_to_le64(o->bs[i]);
+		top->ba[i] = __cpu_to_le64(o->ba[i]);
+		top->min_bs[i] = __cpu_to_le64(o->min_bs[i]);
+		top->max_bs[i] = __cpu_to_le64(o->max_bs[i]);
 		top->bssplit_nr[i] = cpu_to_le32(o->bssplit_nr[i]);
 
 		if (o->bssplit_nr[i]) {
@@ -502,7 +502,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 				bssplit_nr = BSSPLIT_MAX;
 			}
 			for (j = 0; j < bssplit_nr; j++) {
-				top->bssplit[i][j].bs = cpu_to_le32(o->bssplit[i][j].bs);
+				top->bssplit[i][j].bs = cpu_to_le64(o->bssplit[i][j].bs);
 				top->bssplit[i][j].perc = cpu_to_le32(o->bssplit[i][j].perc);
 			}
 		}
diff --git a/client.c b/client.c
index 2a86ea9..e2525c8 100644
--- a/client.c
+++ b/client.c
@@ -1357,8 +1357,8 @@ static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *sample
 		entry = s->data.plat_entry;
 		io_u_plat = entry->io_u_plat;
 
-		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
-						io_sample_ddir(s), s->bs);
+		fprintf(f, "%lu, %u, %llu, ", (unsigned long) s->time,
+						io_sample_ddir(s), (unsigned long long) s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
 			fprintf(f, "%llu, ", (unsigned long long)hist_sum(j, stride, io_u_plat, NULL));
 		}
@@ -1647,7 +1647,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 		s->time		= le64_to_cpu(s->time);
 		s->data.val	= le64_to_cpu(s->data.val);
 		s->__ddir	= le32_to_cpu(s->__ddir);
-		s->bs		= le32_to_cpu(s->bs);
+		s->bs		= le64_to_cpu(s->bs);
 
 		if (ret->log_offset) {
 			struct io_sample_offset *so = (void *) s;
diff --git a/engines/glusterfs_sync.c b/engines/glusterfs_sync.c
index a10e0ed..099a5af 100644
--- a/engines/glusterfs_sync.c
+++ b/engines/glusterfs_sync.c
@@ -34,7 +34,7 @@ static enum fio_q_status fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 	struct gf_data *g = td->io_ops_data;
 	int ret = 0;
 
-	dprint(FD_FILE, "fio queue len %lu\n", io_u->xfer_buflen);
+	dprint(FD_FILE, "fio queue len %llu\n", io_u->xfer_buflen);
 	fio_ro_check(td, io_u);
 
 	if (io_u->ddir == DDIR_READ)
@@ -50,7 +50,7 @@ static enum fio_q_status fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 		io_u->error = EINVAL;
 		return FIO_Q_COMPLETED;
 	}
-	dprint(FD_FILE, "fio len %lu ret %d\n", io_u->xfer_buflen, ret);
+	dprint(FD_FILE, "fio len %llu ret %d\n", io_u->xfer_buflen, ret);
 	if (io_u->file && ret >= 0 && ddir_rw(io_u->ddir))
 		LAST_POS(io_u->file) = io_u->offset + ret;
 
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 21ff4f6..4ef3094 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -499,7 +499,7 @@ static int fio_libpmem_init(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
 
-	dprint(FD_IO,"o->rw_min_bs %d \n o->fsync_blocks %d \n o->fdatasync_blocks %d \n",
+	dprint(FD_IO,"o->rw_min_bs %llu \n o->fsync_blocks %d \n o->fdatasync_blocks %d \n",
 			o->rw_min_bs,o->fsync_blocks,o->fdatasync_blocks);
 	dprint(FD_IO, "DEBUG fio_libpmem_init\n");
 
diff --git a/file.h b/file.h
index 8fd34b1..c0a547e 100644
--- a/file.h
+++ b/file.h
@@ -86,7 +86,7 @@ struct fio_file {
 	 */
 	unsigned int major, minor;
 	int fileno;
-	int bs;
+	unsigned long long bs;
 	char *file_name;
 
 	/*
diff --git a/filesetup.c b/filesetup.c
index a2427a1..accb67a 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -107,7 +107,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 {
 	int new_layout = 0, unlink_file = 0, flags;
 	unsigned long long left;
-	unsigned int bs;
+	unsigned long long bs;
 	char *b = NULL;
 
 	if (read_only) {
@@ -260,7 +260,7 @@ static bool pre_read_file(struct thread_data *td, struct fio_file *f)
 {
 	int r, did_open = 0, old_runstate;
 	unsigned long long left;
-	unsigned int bs;
+	unsigned long long bs;
 	bool ret = true;
 	char *b;
 
@@ -900,7 +900,7 @@ int setup_files(struct thread_data *td)
 	unsigned int i, nr_fs_extra = 0;
 	int err = 0, need_extend;
 	int old_state;
-	const unsigned int bs = td_min_bs(td);
+	const unsigned long long bs = td_min_bs(td);
 	uint64_t fs = 0;
 
 	dprint(FD_FILE, "setup files\n");
diff --git a/fio.h b/fio.h
index 3ac552b..685aab1 100644
--- a/fio.h
+++ b/fio.h
@@ -736,17 +736,17 @@ static inline bool should_check_rate(struct thread_data *td)
 	return ddir_rw_sum(td->bytes_done) != 0;
 }
 
-static inline unsigned int td_max_bs(struct thread_data *td)
+static inline unsigned long long td_max_bs(struct thread_data *td)
 {
-	unsigned int max_bs;
+	unsigned long long max_bs;
 
 	max_bs = max(td->o.max_bs[DDIR_READ], td->o.max_bs[DDIR_WRITE]);
 	return max(td->o.max_bs[DDIR_TRIM], max_bs);
 }
 
-static inline unsigned int td_min_bs(struct thread_data *td)
+static inline unsigned long long td_min_bs(struct thread_data *td)
 {
-	unsigned int min_bs;
+	unsigned long long min_bs;
 
 	min_bs = min(td->o.min_bs[DDIR_READ], td->o.min_bs[DDIR_WRITE]);
 	return min(td->o.min_bs[DDIR_TRIM], min_bs);
diff --git a/gclient.c b/gclient.c
index bcd7a88..7e5071d 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1,4 +1,4 @@
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
 
 #include <glib.h>
diff --git a/gerror.c b/gerror.c
index 43bdaba..1ebcb27 100644
--- a/gerror.c
+++ b/gerror.c
@@ -1,5 +1,5 @@
 #include <locale.h>
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
 #include <stdarg.h>
 
diff --git a/gfio.c b/gfio.c
index d222a1c..f59238c 100644
--- a/gfio.c
+++ b/gfio.c
@@ -22,8 +22,9 @@
  *
  */
 #include <locale.h>
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
+#include <libgen.h>
 
 #include <glib.h>
 #include <cairo.h>
diff --git a/goptions.c b/goptions.c
index 16938ed..f44254b 100644
--- a/goptions.c
+++ b/goptions.c
@@ -1,5 +1,5 @@
 #include <locale.h>
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
 
 #include <glib.h>
diff --git a/graph.c b/graph.c
index f82b52a..7a17417 100644
--- a/graph.c
+++ b/graph.c
@@ -21,7 +21,7 @@
  *
  */
 #include <string.h>
-#include <malloc.h>
+#include <stdlib.h>
 #include <math.h>
 #include <assert.h>
 #include <stdlib.h>
diff --git a/init.c b/init.c
index af4cc6b..8cb8117 100644
--- a/init.c
+++ b/init.c
@@ -531,7 +531,7 @@ static void put_job(struct thread_data *td)
 
 static int __setup_rate(struct thread_data *td, enum fio_ddir ddir)
 {
-	unsigned int bs = td->o.min_bs[ddir];
+	unsigned long long bs = td->o.min_bs[ddir];
 
 	assert(ddir_rw(ddir));
 
@@ -891,7 +891,7 @@ static int fixup_options(struct thread_data *td)
 	 * If size is set but less than the min block size, complain
 	 */
 	if (o->size && o->size < td_min_bs(td)) {
-		log_err("fio: size too small, must not be less than minimum block size: %llu < %u\n",
+		log_err("fio: size too small, must not be less than minimum block size: %llu < %llu\n",
 			(unsigned long long) o->size, td_min_bs(td));
 		ret |= 1;
 	}
diff --git a/io_u.c b/io_u.c
index 5221a78..c58dcf0 100644
--- a/io_u.c
+++ b/io_u.c
@@ -33,9 +33,9 @@ static bool random_map_free(struct fio_file *f, const uint64_t block)
  */
 static void mark_random_map(struct thread_data *td, struct io_u *io_u)
 {
-	unsigned int min_bs = td->o.min_bs[io_u->ddir];
+	unsigned long long min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
-	unsigned int nr_blocks;
+	unsigned long long nr_blocks;
 	uint64_t block;
 
 	block = (io_u->offset - f->file_offset) / (uint64_t) min_bs;
@@ -503,19 +503,19 @@ static int get_next_offset(struct thread_data *td, struct io_u *io_u,
 }
 
 static inline bool io_u_fits(struct thread_data *td, struct io_u *io_u,
-			     unsigned int buflen)
+			     unsigned long long buflen)
 {
 	struct fio_file *f = io_u->file;
 
 	return io_u->offset + buflen <= f->io_size + get_start_offset(td, f);
 }
 
-static unsigned int get_next_buflen(struct thread_data *td, struct io_u *io_u,
+static unsigned long long get_next_buflen(struct thread_data *td, struct io_u *io_u,
 				    bool is_random)
 {
 	int ddir = io_u->ddir;
-	unsigned int buflen = 0;
-	unsigned int minbs, maxbs;
+	unsigned long long buflen = 0;
+	unsigned long long minbs, maxbs;
 	uint64_t frand_max, r;
 	bool power_2;
 
@@ -541,7 +541,7 @@ static unsigned int get_next_buflen(struct thread_data *td, struct io_u *io_u,
 		r = __rand(&td->bsrange_state[ddir]);
 
 		if (!td->o.bssplit_nr[ddir]) {
-			buflen = minbs + (unsigned int) ((double) maxbs *
+			buflen = minbs + (unsigned long long) ((double) maxbs *
 					(r / (frand_max + 1.0)));
 		} else {
 			long long perc = 0;
@@ -891,7 +891,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
-		dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%lx exceeds file size=0x%llx\n",
+		dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%llx exceeds file size=0x%llx\n",
 			io_u,
 			(unsigned long long) io_u->offset, io_u->buflen,
 			(unsigned long long) io_u->file->real_file_size);
@@ -1582,7 +1582,7 @@ static bool check_get_verify(struct thread_data *td, struct io_u *io_u)
  */
 static void small_content_scramble(struct io_u *io_u)
 {
-	unsigned int i, nr_blocks = io_u->buflen >> 9;
+	unsigned long long i, nr_blocks = io_u->buflen >> 9;
 	unsigned int offset;
 	uint64_t boffset, *iptr;
 	char *p;
@@ -1726,7 +1726,7 @@ static void __io_u_log_error(struct thread_data *td, struct io_u *io_u)
 	if (td_non_fatal_error(td, eb, io_u->error) && !td->o.error_dump)
 		return;
 
-	log_err("fio: io_u error%s%s: %s: %s offset=%llu, buflen=%lu\n",
+	log_err("fio: io_u error%s%s: %s: %s offset=%llu, buflen=%llu\n",
 		io_u->file ? " on file " : "",
 		io_u->file ? io_u->file->file_name : "",
 		strerror(io_u->error),
@@ -1892,7 +1892,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	td->last_ddir = ddir;
 
 	if (!io_u->error && ddir_rw(ddir)) {
-		unsigned int bytes = io_u->buflen - io_u->resid;
+		unsigned long long bytes = io_u->buflen - io_u->resid;
 		int ret;
 
 		td->io_blocks[ddir]++;
@@ -2082,8 +2082,8 @@ static void save_buf_state(struct thread_data *td, struct frand_state *rs)
 		frand_copy(&td->buf_state_prev, rs);
 }
 
-void fill_io_buffer(struct thread_data *td, void *buf, unsigned int min_write,
-		    unsigned int max_bs)
+void fill_io_buffer(struct thread_data *td, void *buf, unsigned long long min_write,
+		    unsigned long long max_bs)
 {
 	struct thread_options *o = &td->o;
 
@@ -2093,8 +2093,8 @@ void fill_io_buffer(struct thread_data *td, void *buf, unsigned int min_write,
 	if (o->compress_percentage || o->dedupe_percentage) {
 		unsigned int perc = td->o.compress_percentage;
 		struct frand_state *rs;
-		unsigned int left = max_bs;
-		unsigned int this_write;
+		unsigned long long left = max_bs;
+		unsigned long long this_write;
 
 		do {
 			rs = get_buf_state(td);
@@ -2103,7 +2103,7 @@ void fill_io_buffer(struct thread_data *td, void *buf, unsigned int min_write,
 
 			if (perc) {
 				this_write = min_not_zero(min_write,
-							td->o.compress_chunk);
+							(unsigned long long) td->o.compress_chunk);
 
 				fill_random_buf_percentage(rs, buf, perc,
 					this_write, this_write,
@@ -2130,7 +2130,7 @@ void fill_io_buffer(struct thread_data *td, void *buf, unsigned int min_write,
  * "randomly" fill the buffer contents
  */
 void io_u_fill_buffer(struct thread_data *td, struct io_u *io_u,
-		      unsigned int min_write, unsigned int max_bs)
+		      unsigned long long min_write, unsigned long long max_bs)
 {
 	io_u->buf_filled_len = 0;
 	fill_io_buffer(td, io_u->buf, min_write, max_bs);
diff --git a/io_u.h b/io_u.h
index 4f433c3..9a423b2 100644
--- a/io_u.h
+++ b/io_u.h
@@ -51,7 +51,7 @@ struct io_u {
 	/*
 	 * Allocated/set buffer and length
 	 */
-	unsigned long buflen;
+	unsigned long long buflen;
 	unsigned long long offset;
 	void *buf;
 
@@ -65,13 +65,13 @@ struct io_u {
 	 * partial transfers / residual data counts
 	 */
 	void *xfer_buf;
-	unsigned long xfer_buflen;
+	unsigned long long xfer_buflen;
 
 	/*
 	 * Parameter related to pre-filled buffers and
 	 * their size to handle variable block sizes.
 	 */
-	unsigned long buf_filled_len;
+	unsigned long long buf_filled_len;
 
 	struct io_piece *ipo;
 
@@ -134,8 +134,8 @@ extern void io_u_queued(struct thread_data *, struct io_u *);
 extern int io_u_quiesce(struct thread_data *);
 extern void io_u_log_error(struct thread_data *, struct io_u *);
 extern void io_u_mark_depth(struct thread_data *, unsigned int);
-extern void fill_io_buffer(struct thread_data *, void *, unsigned int, unsigned int);
-extern void io_u_fill_buffer(struct thread_data *td, struct io_u *, unsigned int, unsigned int);
+extern void fill_io_buffer(struct thread_data *, void *, unsigned long long, unsigned long long);
+extern void io_u_fill_buffer(struct thread_data *td, struct io_u *, unsigned long long, unsigned long long);
 void io_u_mark_complete(struct thread_data *, unsigned int);
 void io_u_mark_submit(struct thread_data *, unsigned int);
 bool queue_full(const struct thread_data *);
@@ -149,13 +149,13 @@ static inline void dprint_io_u(struct io_u *io_u, const char *p)
 	struct fio_file *f = io_u->file;
 
 	if (f)
-		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d,file=%s\n",
+		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%llx,ddir=%d,file=%s\n",
 				p, io_u,
 				(unsigned long long) io_u->offset,
 				io_u->buflen, io_u->ddir,
 				f->file_name);
 	else
-		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d\n",
+		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%llx,ddir=%d\n",
 				p, io_u,
 				(unsigned long long) io_u->offset,
 				io_u->buflen, io_u->ddir);
diff --git a/ioengines.c b/ioengines.c
index d579682..bce65ea 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -279,7 +279,7 @@ out:
 enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 {
 	const enum fio_ddir ddir = acct_ddir(io_u);
-	unsigned long buflen = io_u->xfer_buflen;
+	unsigned long long buflen = io_u->xfer_buflen;
 	enum fio_q_status ret;
 
 	dprint_io_u(io_u, "queue");
diff --git a/iolog.c b/iolog.c
index 5be3e84..d51e49c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -35,7 +35,7 @@ void log_io_u(const struct thread_data *td, const struct io_u *io_u)
 	if (!td->o.write_iolog_file)
 		return;
 
-	fprintf(td->iolog_f, "%s %s %llu %lu\n", io_u->file->file_name,
+	fprintf(td->iolog_f, "%s %s %llu %llu\n", io_u->file->file_name,
 						io_ddir_name(io_u->ddir),
 						io_u->offset, io_u->buflen);
 }
@@ -161,7 +161,7 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 			io_u->buflen = ipo->len;
 			io_u->file = td->files[ipo->fileno];
 			get_file(io_u->file);
-			dprint(FD_IO, "iolog: get %llu/%lu/%s\n", io_u->offset,
+			dprint(FD_IO, "iolog: get %llu/%llu/%s\n", io_u->offset,
 						io_u->buflen, io_u->file->file_name);
 			if (ipo->delay)
 				iolog_delay(td, ipo->delay);
@@ -737,8 +737,8 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 		entry_before = flist_first_entry(&entry->list, struct io_u_plat_entry, list);
 		io_u_plat_before = entry_before->io_u_plat;
 
-		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
-						io_sample_ddir(s), s->bs);
+		fprintf(f, "%lu, %u, %llu, ", (unsigned long) s->time,
+						io_sample_ddir(s), (unsigned long long) s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
 			fprintf(f, "%llu, ", (unsigned long long)
 			        hist_sum(j, stride, io_u_plat, io_u_plat_before));
@@ -770,17 +770,17 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 		s = __get_sample(samples, log_offset, i);
 
 		if (!log_offset) {
-			fprintf(f, "%lu, %" PRId64 ", %u, %u\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %llu\n",
 					(unsigned long) s->time,
 					s->data.val,
-					io_sample_ddir(s), s->bs);
+					io_sample_ddir(s), (unsigned long long) s->bs);
 		} else {
 			struct io_sample_offset *so = (void *) s;
 
-			fprintf(f, "%lu, %" PRId64 ", %u, %u, %llu\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %llu, %llu\n",
 					(unsigned long) s->time,
 					s->data.val,
-					io_sample_ddir(s), s->bs,
+					io_sample_ddir(s), (unsigned long long) s->bs,
 					(unsigned long long) so->offset);
 		}
 	}
diff --git a/iolog.h b/iolog.h
index a4e335a..3b8c901 100644
--- a/iolog.h
+++ b/iolog.h
@@ -42,7 +42,7 @@ struct io_sample {
 	uint64_t time;
 	union io_sample_data data;
 	uint32_t __ddir;
-	uint32_t bs;
+	uint64_t bs;
 };
 
 struct io_sample_offset {
diff --git a/options.c b/options.c
index a174e2c..4b46402 100644
--- a/options.c
+++ b/options.c
@@ -52,7 +52,7 @@ static int bs_cmp(const void *p1, const void *p2)
 
 struct split {
 	unsigned int nr;
-	unsigned int val1[ZONESPLIT_MAX];
+	unsigned long long val1[ZONESPLIT_MAX];
 	unsigned long long val2[ZONESPLIT_MAX];
 };
 
@@ -119,7 +119,7 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
 			bool data)
 {
 	unsigned int i, perc, perc_missing;
-	unsigned int max_bs, min_bs;
+	unsigned long long max_bs, min_bs;
 	struct split split;
 
 	memset(&split, 0, sizeof(split));
@@ -2112,7 +2112,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "bs",
 		.lname	= "Block size",
 		.alias	= "blocksize",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_ULL,
 		.off1	= offsetof(struct thread_options, bs[DDIR_READ]),
 		.off2	= offsetof(struct thread_options, bs[DDIR_WRITE]),
 		.off3	= offsetof(struct thread_options, bs[DDIR_TRIM]),
@@ -2129,7 +2129,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "ba",
 		.lname	= "Block size align",
 		.alias	= "blockalign",
-		.type	= FIO_OPT_INT,
+		.type	= FIO_OPT_ULL,
 		.off1	= offsetof(struct thread_options, ba[DDIR_READ]),
 		.off2	= offsetof(struct thread_options, ba[DDIR_WRITE]),
 		.off3	= offsetof(struct thread_options, ba[DDIR_TRIM]),
@@ -2163,7 +2163,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "bssplit",
 		.lname	= "Block size split",
-		.type	= FIO_OPT_STR,
+		.type	= FIO_OPT_STR_ULL,
 		.cb	= str_bssplit_cb,
 		.off1	= offsetof(struct thread_options, bssplit),
 		.help	= "Set a specific mix of block sizes",
diff --git a/os/windows/posix.c b/os/windows/posix.c
old mode 100755
new mode 100644
diff --git a/parse.c b/parse.c
index 6261fca..194ad59 100644
--- a/parse.c
+++ b/parse.c
@@ -26,12 +26,14 @@
 static const char *opt_type_names[] = {
 	"OPT_INVALID",
 	"OPT_STR",
+	"OPT_STR_ULL",
 	"OPT_STR_MULTI",
 	"OPT_STR_VAL",
 	"OPT_STR_VAL_TIME",
 	"OPT_STR_STORE",
 	"OPT_RANGE",
 	"OPT_INT",
+	"OPT_ULL",
 	"OPT_BOOL",
 	"OPT_FLOAT_LIST",
 	"OPT_STR_SET",
@@ -438,7 +440,7 @@ void strip_blank_end(char *p)
 	*(s + 1) = '\0';
 }
 
-static int check_range_bytes(const char *str, long *val, void *data)
+static int check_range_bytes(const char *str, long long *val, void *data)
 {
 	long long __val;
 
@@ -507,7 +509,8 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	int il=0, *ilp;
 	fio_fp64_t *flp;
 	long long ull, *ullp;
-	long ul1, ul2;
+	long ul2;
+	long long ull1, ull2;
 	double uf;
 	char **cp = NULL;
 	int ret = 0, is_time = 0;
@@ -525,6 +528,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 
 	switch (o->type) {
 	case FIO_OPT_STR:
+	case FIO_OPT_STR_ULL:
 	case FIO_OPT_STR_MULTI: {
 		fio_opt_str_fn *fn = o->cb;
 
@@ -540,7 +544,11 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 				break;
 			if (!strncmp(vp->ival, ptr, str_match_len(vp, ptr))) {
 				ret = 0;
-				if (o->off1)
+				if (!o->off1)
+					continue;
+				if (o->type == FIO_OPT_STR_ULL)
+					val_store(ullp, vp->oval, o->off1, vp->orval, data, o);
+				else
 					val_store(ilp, vp->oval, o->off1, vp->orval, data, o);
 				continue;
 			}
@@ -554,6 +562,8 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	}
 	case FIO_OPT_STR_VAL_TIME:
 		is_time = 1;
+		/* fall through */
+	case FIO_OPT_ULL:
 	case FIO_OPT_INT:
 	case FIO_OPT_STR_VAL: {
 		fio_opt_str_val_fn *fn = o->cb;
@@ -584,7 +594,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 
 		if (o->maxval && ull > o->maxval) {
 			log_err("max value out of range: %llu"
-					" (%u max)\n", ull, o->maxval);
+					" (%llu max)\n", ull, o->maxval);
 			return 1;
 		}
 		if (o->minval && ull < o->minval) {
@@ -636,6 +646,27 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 							val_store(ilp, ull, o->off3, 0, data, o);
 					}
 				}
+			} else if (o->type == FIO_OPT_ULL) {
+				if (first)
+					val_store(ullp, ull, o->off1, 0, data, o);
+				if (curr == 1) {
+					if (o->off2)
+						val_store(ullp, ull, o->off2, 0, data, o);
+				}
+				if (curr == 2) {
+					if (o->off3)
+						val_store(ullp, ull, o->off3, 0, data, o);
+				}
+				if (!more) {
+					if (curr < 1) {
+						if (o->off2)
+							val_store(ullp, ull, o->off2, 0, data, o);
+					}
+					if (curr < 2) {
+						if (o->off3)
+							val_store(ullp, ull, o->off3, 0, data, o);
+					}
+				}
 			} else {
 				if (first)
 					val_store(ullp, ull, o->off1, 0, data, o);
@@ -790,43 +821,43 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		p1 = tmp;
 
 		ret = 1;
-		if (!check_range_bytes(p1, &ul1, data) &&
-		    !check_range_bytes(p2, &ul2, data)) {
+		if (!check_range_bytes(p1, &ull1, data) &&
+			!check_range_bytes(p2, &ull2, data)) {
 			ret = 0;
-			if (ul1 > ul2) {
-				unsigned long foo = ul1;
+			if (ull1 > ull2) {
+				unsigned long long foo = ull1;
 
-				ul1 = ul2;
-				ul2 = foo;
+				ull1 = ull2;
+				ull2 = foo;
 			}
 
 			if (first) {
-				val_store(ilp, ul1, o->off1, 0, data, o);
-				val_store(ilp, ul2, o->off2, 0, data, o);
+				val_store(ullp, ull1, o->off1, 0, data, o);
+				val_store(ullp, ull2, o->off2, 0, data, o);
 			}
 			if (curr == 1) {
 				if (o->off3 && o->off4) {
-					val_store(ilp, ul1, o->off3, 0, data, o);
-					val_store(ilp, ul2, o->off4, 0, data, o);
+					val_store(ullp, ull1, o->off3, 0, data, o);
+					val_store(ullp, ull2, o->off4, 0, data, o);
 				}
 			}
 			if (curr == 2) {
 				if (o->off5 && o->off6) {
-					val_store(ilp, ul1, o->off5, 0, data, o);
-					val_store(ilp, ul2, o->off6, 0, data, o);
+					val_store(ullp, ull1, o->off5, 0, data, o);
+					val_store(ullp, ull2, o->off6, 0, data, o);
 				}
 			}
 			if (!more) {
 				if (curr < 1) {
 					if (o->off3 && o->off4) {
-						val_store(ilp, ul1, o->off3, 0, data, o);
-						val_store(ilp, ul2, o->off4, 0, data, o);
+						val_store(ullp, ull1, o->off3, 0, data, o);
+						val_store(ullp, ull2, o->off4, 0, data, o);
 					}
 				}
 				if (curr < 2) {
 					if (o->off5 && o->off6) {
-						val_store(ilp, ul1, o->off5, 0, data, o);
-						val_store(ilp, ul2, o->off6, 0, data, o);
+						val_store(ullp, ull1, o->off5, 0, data, o);
+						val_store(ullp, ull2, o->off6, 0, data, o);
 					}
 				}
 			}
@@ -851,7 +882,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 			break;
 
 		if (o->maxval && il > (int) o->maxval) {
-			log_err("max value out of range: %d (%d max)\n",
+			log_err("max value out of range: %d (%llu max)\n",
 								il, o->maxval);
 			return 1;
 		}
@@ -878,6 +909,7 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 	}
 	case FIO_OPT_DEPRECATED:
 		ret = 1;
+		/* fall through */
 	case FIO_OPT_SOFT_DEPRECATED:
 		log_info("Option %s is deprecated\n", o->name);
 		break;
@@ -1325,6 +1357,10 @@ static void option_init(struct fio_option *o)
 		if (!o->maxval)
 			o->maxval = UINT_MAX;
 	}
+	if (o->type == FIO_OPT_ULL) {
+		if (!o->maxval)
+			o->maxval = ULLONG_MAX;
+	}
 	if (o->type == FIO_OPT_STR_SET && o->def && !o->no_warn_def) {
 		log_err("Option %s: string set option with"
 				" default will always be true\n", o->name);
diff --git a/parse.h b/parse.h
index 4de5e77..b47a02c 100644
--- a/parse.h
+++ b/parse.h
@@ -10,12 +10,14 @@
 enum fio_opt_type {
 	FIO_OPT_INVALID = 0,
 	FIO_OPT_STR,
+	FIO_OPT_STR_ULL,
 	FIO_OPT_STR_MULTI,
 	FIO_OPT_STR_VAL,
 	FIO_OPT_STR_VAL_TIME,
 	FIO_OPT_STR_STORE,
 	FIO_OPT_RANGE,
 	FIO_OPT_INT,
+	FIO_OPT_ULL,
 	FIO_OPT_BOOL,
 	FIO_OPT_FLOAT_LIST,
 	FIO_OPT_STR_SET,
@@ -29,7 +31,7 @@ enum fio_opt_type {
  */
 struct value_pair {
 	const char *ival;		/* string option */
-	unsigned int oval;		/* output value */
+	unsigned long long oval;/* output value */
 	const char *help;		/* help text for sub option */
 	int orval;			/* OR value */
 	void *cb;			/* sub-option callback */
@@ -52,7 +54,7 @@ struct fio_option {
 	unsigned int off4;
 	unsigned int off5;
 	unsigned int off6;
-	unsigned int maxval;		/* max and min value */
+	unsigned long long maxval;		/* max and min value */
 	int minval;
 	double maxfp;			/* max and min floating value */
 	double minfp;
diff --git a/server.c b/server.c
index 7e7ffed..b966c66 100644
--- a/server.c
+++ b/server.c
@@ -1985,7 +1985,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 			s->time		= cpu_to_le64(s->time);
 			s->data.val	= cpu_to_le64(s->data.val);
 			s->__ddir	= cpu_to_le32(s->__ddir);
-			s->bs		= cpu_to_le32(s->bs);
+			s->bs		= cpu_to_le64(s->bs);
 
 			if (log->log_offset) {
 				struct io_sample_offset *so = (void *) s;
diff --git a/server.h b/server.h
index b48bbe1..37d2f76 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 73,
+	FIO_SERVER_VER			= 74,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index a308eb8..8de4835 100644
--- a/stat.c
+++ b/stat.c
@@ -619,8 +619,8 @@ static int block_state_category(int block_state)
 
 static int compare_block_infos(const void *bs1, const void *bs2)
 {
-	uint32_t block1 = *(uint32_t *)bs1;
-	uint32_t block2 = *(uint32_t *)bs2;
+	uint64_t block1 = *(uint64_t *)bs1;
+	uint64_t block2 = *(uint64_t *)bs2;
 	int state1 = BLOCK_INFO_STATE(block1);
 	int state2 = BLOCK_INFO_STATE(block2);
 	int bscat1 = block_state_category(state1);
@@ -2220,7 +2220,7 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 }
 
 static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
-			     enum fio_ddir ddir, unsigned int bs,
+			     enum fio_ddir ddir, unsigned long long bs,
 			     unsigned long t, uint64_t offset)
 {
 	struct io_logs *cur_log;
@@ -2338,7 +2338,7 @@ static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 static unsigned long add_log_sample(struct thread_data *td,
 				    struct io_log *iolog,
 				    union io_sample_data data,
-				    enum fio_ddir ddir, unsigned int bs,
+				    enum fio_ddir ddir, unsigned long long bs,
 				    uint64_t offset)
 {
 	unsigned long elapsed, this_window;
@@ -2400,7 +2400,7 @@ void finalize_logs(struct thread_data *td, bool unit_logs)
 		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
 }
 
-void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned int bs)
+void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned long long bs)
 {
 	struct io_log *iolog;
 
@@ -2430,7 +2430,8 @@ static void add_clat_percentile_sample(struct thread_stat *ts,
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
-		     unsigned long long nsec, unsigned int bs, uint64_t offset)
+		     unsigned long long nsec, unsigned long long bs,
+		     uint64_t offset)
 {
 	unsigned long elapsed, this_window;
 	struct thread_stat *ts = &td->ts;
@@ -2489,7 +2490,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
-		     unsigned long usec, unsigned int bs, uint64_t offset)
+		     unsigned long usec, unsigned long long bs, uint64_t offset)
 {
 	struct thread_stat *ts = &td->ts;
 
@@ -2507,7 +2508,8 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
-		    unsigned long long nsec, unsigned int bs, uint64_t offset)
+		    unsigned long long nsec, unsigned long long bs,
+		    uint64_t offset)
 {
 	struct thread_stat *ts = &td->ts;
 
@@ -2590,7 +2592,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 		add_stat_sample(&stat[ddir], rate);
 
 		if (log) {
-			unsigned int bs = 0;
+			unsigned long long bs = 0;
 
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
diff --git a/stat.h b/stat.h
index c5b8185..5dcaae0 100644
--- a/stat.h
+++ b/stat.h
@@ -308,12 +308,12 @@ extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
 extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned int, uint64_t);
+				unsigned long long, uint64_t);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-				unsigned int, uint64_t);
+				unsigned long long, uint64_t);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
-				unsigned int, uint64_t);
-extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned int);
+				unsigned long long, uint64_t);
+extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long);
 extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
diff --git a/thread_options.h b/thread_options.h
index 8d13b79..8adba48 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -29,7 +29,7 @@ enum fio_memtype {
 #define ZONESPLIT_MAX	256
 
 struct bssplit {
-	uint32_t bs;
+	uint64_t bs;
 	uint32_t perc;
 };
 
@@ -82,10 +82,10 @@ struct thread_options {
 	unsigned long long start_offset;
 	unsigned long long start_offset_align;
 
-	unsigned int bs[DDIR_RWDIR_CNT];
-	unsigned int ba[DDIR_RWDIR_CNT];
-	unsigned int min_bs[DDIR_RWDIR_CNT];
-	unsigned int max_bs[DDIR_RWDIR_CNT];
+	unsigned long long bs[DDIR_RWDIR_CNT];
+	unsigned long long ba[DDIR_RWDIR_CNT];
+	unsigned long long min_bs[DDIR_RWDIR_CNT];
+	unsigned long long max_bs[DDIR_RWDIR_CNT];
 	struct bssplit *bssplit[DDIR_RWDIR_CNT];
 	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
 
@@ -164,7 +164,8 @@ struct thread_options {
 	unsigned int perc_rand[DDIR_RWDIR_CNT];
 
 	unsigned int hugepage_size;
-	unsigned int rw_min_bs;
+	unsigned long long rw_min_bs;
+	unsigned int pad2;
 	unsigned int thinktime;
 	unsigned int thinktime_spin;
 	unsigned int thinktime_blocks;
@@ -363,10 +364,10 @@ struct thread_options_pack {
 	uint64_t start_offset;
 	uint64_t start_offset_align;
 
-	uint32_t bs[DDIR_RWDIR_CNT];
-	uint32_t ba[DDIR_RWDIR_CNT];
-	uint32_t min_bs[DDIR_RWDIR_CNT];
-	uint32_t max_bs[DDIR_RWDIR_CNT];
+	uint64_t bs[DDIR_RWDIR_CNT];
+	uint64_t ba[DDIR_RWDIR_CNT];
+	uint64_t min_bs[DDIR_RWDIR_CNT];
+	uint64_t max_bs[DDIR_RWDIR_CNT];
 	struct bssplit bssplit[DDIR_RWDIR_CNT][BSSPLIT_MAX];
 	uint32_t bssplit_nr[DDIR_RWDIR_CNT];
 
@@ -443,7 +444,8 @@ struct thread_options_pack {
 	uint32_t perc_rand[DDIR_RWDIR_CNT];
 
 	uint32_t hugepage_size;
-	uint32_t rw_min_bs;
+	uint64_t rw_min_bs;
+	uint32_t pad2;
 	uint32_t thinktime;
 	uint32_t thinktime_spin;
 	uint32_t thinktime_blocks;
diff --git a/tickmarks.c b/tickmarks.c
index 808de67..88bace0 100644
--- a/tickmarks.c
+++ b/tickmarks.c
@@ -1,6 +1,6 @@
 #include <stdio.h>
 #include <math.h>
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
 
 /*
diff --git a/verify.c b/verify.c
index 40d484b..0f2c118 100644
--- a/verify.c
+++ b/verify.c
@@ -801,7 +801,7 @@ static int verify_trimmed_io_u(struct thread_data *td, struct io_u *io_u)
 
 	mem_is_zero_slow(io_u->buf, io_u->buflen, &offset);
 
-	log_err("trim: verify failed at file %s offset %llu, length %lu"
+	log_err("trim: verify failed at file %s offset %llu, length %llu"
 		", block offset %lu\n",
 			io_u->file->file_name, io_u->offset, io_u->buflen,
 			(unsigned long) offset);

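The hunks above widen `bs`, `ba`, `min_bs`, `max_bs` and `rw_min_bs` from 32-bit to 64-bit fields. They also switch the `buflen` conversion in verify.c from `%lu` to `%llu`, which suggests the goal is block sizes of 4 GiB and above. The minimal sketch below shows why both changes matter: a 32-bit field silently truncates such a size, and a 64-bit `unsigned long long` must be printed with a conversion that matches its type. The helpers `store_bs32()`, `store_bs64()` and `format_len()` are hypothetical and are not fio code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: what happens to a block size stored in each field width. */
static uint32_t store_bs32(unsigned long long bs)
{
	return (uint32_t) bs;	/* drops every bit above bit 31 */
}

static uint64_t store_bs64(unsigned long long bs)
{
	return (uint64_t) bs;	/* keeps the full value */
}

/* Formats a 64-bit length; %llu matches unsigned long long exactly. */
static int format_len(char *buf, size_t len, unsigned long long buflen)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", buflen);
}
```

With an 8 GiB block size (`8ULL << 30`), `store_bs32()` yields 0 and `store_bs64()` keeps 8589934592.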

* Recent changes (master)
@ 2018-07-13 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-13 12:00 UTC
  To: fio

The following changes since commit b8c7923917200c7c68c84c21549940d4be6b1398:

  t/axmap: add longer overlap test case (2018-07-11 15:32:31 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to edfca254378e729035073af6ac0d9156565afebe:

  axmap: optimize ulog64 usage in axmap_handler() (2018-07-12 08:33:14 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      axmap: optimize ulog64 usage in axmap_handler()

 lib/axmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/lib/axmap.c b/lib/axmap.c
index 4047f23..454af0b 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -156,10 +156,10 @@ static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
 			  void *), void *data)
 {
 	struct axmap_level *al;
+	uint64_t index = bit_nr;
 	int i;
 
 	for (i = 0; i < axmap->nr_levels; i++) {
-		unsigned long index = ulog64(bit_nr, i);
 		unsigned long offset = index >> UNIT_SHIFT;
 		unsigned int bit = index & BLOCKS_PER_UNIT_MASK;
 
@@ -167,6 +167,9 @@ static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
 
 		if (func(al, offset, bit, data))
 			return true;
+
+		if (index)
+			index >>= UNIT_SHIFT;
 	}
 
 	return false;

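Before this change, the handler called `ulog64(bit_nr, i)` at level `i`. That repeats up to `i` shifts, so a walk over L levels costs about L(L-1)/2 shifts. The patch keeps a running `index` and shifts it once per level instead. The sketch below checks that both approaches yield the same per-level index. It assumes `UNIT_SHIFT` is 6, the 64-bit value, and it uses `uint64_t` in place of `unsigned long`. `incremental_matches()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define UNIT_SHIFT 6	/* assumption: 64-bit longs, 2^6 bits per map unit */

/* Same loop as ulog64() in lib/axmap.c: shift right 'log' times, stop at 0. */
static uint64_t ulog64(uint64_t val, unsigned int log)
{
	while (log-- && val)
		val >>= UNIT_SHIFT;
	return val;
}

/*
 * Returns 1 if the running-index scheme of the patched axmap_handler()
 * gives the same index as a fresh ulog64() call at every level.
 */
static int incremental_matches(uint64_t bit_nr, int nr_levels)
{
	uint64_t index = bit_nr;
	int i;

	assert(nr_levels > 0);
	for (i = 0; i < nr_levels; i++) {
		if (index != ulog64(bit_nr, (unsigned int) i))
			return 0;
		if (index)	/* once zero, further shifts change nothing */
			index >>= UNIT_SHIFT;
	}
	return 1;
}
```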

* Recent changes (master)
@ 2018-07-12 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-12 12:00 UTC
  To: fio

The following changes since commit dc54a01f9a2966133f6d1a52f3718d267117b4f3:

  t/axmap: clean up overlap tests (2018-07-10 21:51:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b8c7923917200c7c68c84c21549940d4be6b1398:

  t/axmap: add longer overlap test case (2018-07-11 15:32:31 -0600)

----------------------------------------------------------------
Jens Axboe (11):
      t/axmap: a few more overlap cases
      t/axmap: don't print 'pass' on failure
      axmap: clean up 'no bits to set' case
      t/axmap: add zero return overlap cases
      axmap: code cleanups
      fio: should_fsync() returns bool
      axmap: remove unused 'data' argument to topdown handler
      axmap: a few more cleanups
      axmap: fix continued sequential bit setting
      t/axmap: add regression case for recent overlap failure case
      t/axmap: add longer overlap test case

 fio.h       |  8 +++---
 lib/axmap.c | 83 +++++++++++++++++++++++++++++--------------------------------
 t/axmap.c   | 45 +++++++++++++++++++++++++++++++--
 3 files changed, 86 insertions(+), 50 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index 51b8fdc..3ac552b 100644
--- a/fio.h
+++ b/fio.h
@@ -539,14 +539,14 @@ static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 
 #define REAL_MAX_JOBS		4096
 
-static inline int should_fsync(struct thread_data *td)
+static inline bool should_fsync(struct thread_data *td)
 {
 	if (td->last_was_sync)
-		return 0;
+		return false;
 	if (td_write(td) || td->o.override_sync)
-		return 1;
+		return true;
 
-	return 0;
+	return false;
 }
 
 /*
diff --git a/lib/axmap.c b/lib/axmap.c
index c29597f..4047f23 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -37,6 +37,28 @@
 
 #define firstfree_valid(b)	((b)->first_free != (uint64_t) -1)
 
+static const unsigned long bit_masks[] = {
+	0x0000000000000000, 0x0000000000000001, 0x0000000000000003, 0x0000000000000007,
+	0x000000000000000f, 0x000000000000001f, 0x000000000000003f, 0x000000000000007f,
+	0x00000000000000ff, 0x00000000000001ff, 0x00000000000003ff, 0x00000000000007ff,
+	0x0000000000000fff, 0x0000000000001fff, 0x0000000000003fff, 0x0000000000007fff,
+	0x000000000000ffff, 0x000000000001ffff, 0x000000000003ffff, 0x000000000007ffff,
+	0x00000000000fffff, 0x00000000001fffff, 0x00000000003fffff, 0x00000000007fffff,
+	0x0000000000ffffff, 0x0000000001ffffff, 0x0000000003ffffff, 0x0000000007ffffff,
+	0x000000000fffffff, 0x000000001fffffff, 0x000000003fffffff, 0x000000007fffffff,
+	0x00000000ffffffff,
+#if BITS_PER_LONG == 64
+	0x00000001ffffffff, 0x00000003ffffffff, 0x00000007ffffffff, 0x0000000fffffffff,
+	0x0000001fffffffff, 0x0000003fffffffff, 0x0000007fffffffff, 0x000000ffffffffff,
+	0x000001ffffffffff, 0x000003ffffffffff, 0x000007ffffffffff, 0x00000fffffffffff,
+	0x00001fffffffffff, 0x00003fffffffffff, 0x00007fffffffffff, 0x0000ffffffffffff,
+	0x0001ffffffffffff, 0x0003ffffffffffff, 0x0007ffffffffffff, 0x000fffffffffffff,
+	0x001fffffffffffff, 0x003fffffffffffff, 0x007fffffffffffff, 0x00ffffffffffffff,
+	0x01ffffffffffffff, 0x03ffffffffffffff, 0x07ffffffffffffff, 0x0fffffffffffffff,
+	0x1fffffffffffffff, 0x3fffffffffffffff, 0x7fffffffffffffff, 0xffffffffffffffff
+#endif
+};
+
 struct axmap_level {
 	int level;
 	unsigned long map_size;
@@ -50,7 +72,7 @@ struct axmap {
 	uint64_t nr_bits;
 };
 
-static unsigned long ulog64(unsigned long val, unsigned int log)
+static inline unsigned long ulog64(unsigned long val, unsigned int log)
 {
 	while (log-- && val)
 		val >>= UNIT_SHIFT;
@@ -151,20 +173,16 @@ static bool axmap_handler(struct axmap *axmap, uint64_t bit_nr,
 }
 
 static bool axmap_handler_topdown(struct axmap *axmap, uint64_t bit_nr,
-	bool (*func)(struct axmap_level *, unsigned long, unsigned int, void *),
-	void *data)
+	bool (*func)(struct axmap_level *, unsigned long, unsigned int, void *))
 {
-	struct axmap_level *al;
-	int i, level = axmap->nr_levels;
+	int i;
 
 	for (i = axmap->nr_levels - 1; i >= 0; i--) {
-		unsigned long index = ulog64(bit_nr, --level);
+		unsigned long index = ulog64(bit_nr, i);
 		unsigned long offset = index >> UNIT_SHIFT;
 		unsigned int bit = index & BLOCKS_PER_UNIT_MASK;
 
-		al = &axmap->levels[i];
-
-		if (func(al, offset, bit, data))
+		if (func(&axmap->levels[i], offset, bit, NULL))
 			return true;
 	}
 
@@ -194,28 +212,6 @@ struct axmap_set_data {
 	unsigned int set_bits;
 };
 
-static const unsigned long bit_masks[] = {
-	0x0000000000000000, 0x0000000000000001, 0x0000000000000003, 0x0000000000000007,
-	0x000000000000000f, 0x000000000000001f, 0x000000000000003f, 0x000000000000007f,
-	0x00000000000000ff, 0x00000000000001ff, 0x00000000000003ff, 0x00000000000007ff,
-	0x0000000000000fff, 0x0000000000001fff, 0x0000000000003fff, 0x0000000000007fff,
-	0x000000000000ffff, 0x000000000001ffff, 0x000000000003ffff, 0x000000000007ffff,
-	0x00000000000fffff, 0x00000000001fffff, 0x00000000003fffff, 0x00000000007fffff,
-	0x0000000000ffffff, 0x0000000001ffffff, 0x0000000003ffffff, 0x0000000007ffffff,
-	0x000000000fffffff, 0x000000001fffffff, 0x000000003fffffff, 0x000000007fffffff,
-	0x00000000ffffffff,
-#if BITS_PER_LONG == 64
-	0x00000001ffffffff, 0x00000003ffffffff, 0x00000007ffffffff, 0x0000000fffffffff,
-	0x0000001fffffffff, 0x0000003fffffffff, 0x0000007fffffffff, 0x000000ffffffffff,
-	0x000001ffffffffff, 0x000003ffffffffff, 0x000007ffffffffff, 0x00000fffffffffff,
-	0x00001fffffffffff, 0x00003fffffffffff, 0x00007fffffffffff, 0x0000ffffffffffff,
-	0x0001ffffffffffff, 0x0003ffffffffffff, 0x0007ffffffffffff, 0x000fffffffffffff,
-	0x001fffffffffffff, 0x003fffffffffffff, 0x007fffffffffffff, 0x00ffffffffffffff,
-	0x01ffffffffffffff, 0x03ffffffffffffff, 0x07ffffffffffffff, 0x0fffffffffffffff,
-	0x1fffffffffffffff, 0x3fffffffffffffff, 0x7fffffffffffffff, 0xffffffffffffffff
-#endif
-};
-
 static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 			 unsigned int bit, void *__data)
 {
@@ -231,16 +227,19 @@ static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 	 * Mask off any potential overlap, only sets contig regions
 	 */
 	overlap = al->map[offset] & mask;
-	if (overlap == mask)
+	if (overlap == mask) {
+done:
+		data->set_bits = 0;
 		return true;
+	}
 
 	if (overlap) {
 		const int __bit = ffz(~overlap);
 
-		if (__bit == bit)
-			return true;
-
 		nr_bits = __bit - bit;
+		if (!nr_bits)
+			goto done;
+
 		mask = bit_masks[nr_bits] << bit;
 	}
 
@@ -304,7 +303,7 @@ unsigned int axmap_set_nr(struct axmap *axmap, uint64_t bit_nr,
 		unsigned int max_bits, this_set;
 
 		max_bits = BLOCKS_PER_UNIT - (bit_nr & BLOCKS_PER_UNIT_MASK);
-		if (max_bits < nr_bits)
+		if (nr_bits > max_bits)
 			data.nr_bits = max_bits;
 
 		this_set = data.nr_bits;
@@ -329,7 +328,7 @@ static bool axmap_isset_fn(struct axmap_level *al, unsigned long offset,
 bool axmap_isset(struct axmap *axmap, uint64_t bit_nr)
 {
 	if (bit_nr <= axmap->nr_bits)
-		return axmap_handler_topdown(axmap, bit_nr, axmap_isset_fn, NULL);
+		return axmap_handler_topdown(axmap, bit_nr, axmap_isset_fn);
 
 	return false;
 }
@@ -347,13 +346,8 @@ static uint64_t axmap_find_first_free(struct axmap *axmap, unsigned int level,
 	for (i = level; i >= 0; i--) {
 		struct axmap_level *al = &axmap->levels[i];
 
-		/*
-		 * Clear 'ret', this is a bug condition.
-		 */
-		if (index >= al->map_size) {
-			ret = -1ULL;
-			break;
-		}
+		if (index >= al->map_size)
+			goto err;
 
 		for (j = index; j < al->map_size; j++) {
 			if (al->map[j] == -1UL)
@@ -371,6 +365,7 @@ static uint64_t axmap_find_first_free(struct axmap *axmap, unsigned int level,
 	if (ret < axmap->nr_bits)
 		return ret;
 
+err:
 	return (uint64_t) -1ULL;
 }
 
diff --git a/t/axmap.c b/t/axmap.c
index 3f6b8f1..1512737 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -117,11 +117,21 @@ static int test_overlap(void)
 {
 	struct overlap_test tests[] = {
 		{
+			.start	= 0,
+			.nr	= 0,
+			.ret	= 0,
+		},
+		{
 			.start	= 16,
 			.nr	= 16,
 			.ret	= 16,
 		},
 		{
+			.start	= 16,
+			.nr	= 0,
+			.ret	= 0,
+		},
+		{
 			.start	= 0,
 			.nr	= 32,
 			.ret	= 16,
@@ -137,6 +147,16 @@ static int test_overlap(void)
 			.ret	= 16,
 		},
 		{
+			.start	= 79,
+			.nr	= 1,
+			.ret	= 0,
+		},
+		{
+			.start	= 80,
+			.nr	= 21,
+			.ret	= 21,
+		},
+		{
 			.start	= 102,
 			.nr	= 1,
 			.ret	= 1,
@@ -172,6 +192,26 @@ static int test_overlap(void)
 			.ret	= 0,
 		},
 		{
+			.start	= 1100,
+			.nr	= 1,
+			.ret	= 1,
+		},
+		{
+			.start	= 1000,
+			.nr	= 256,
+			.ret	= 100,
+		},
+		{
+			.start	= 22684,
+			.nr	= 1,
+			.ret	= 1,
+		},
+		{
+			.start	= 22670,
+			.nr	= 60,
+			.ret	= 14,
+		},
+		{
 			.start	= -1U,
 		},
 	};
@@ -179,7 +219,7 @@ static int test_overlap(void)
 	int entries, i, ret, err = 0;
 
 	entries = 0;
-	for (i = 0; tests[i].start != 1U; i++) {
+	for (i = 0; tests[i].start != -1U; i++) {
 		unsigned int this = tests[i].start + tests[i].nr;
 
 		if (this > entries)
@@ -204,7 +244,8 @@ static int test_overlap(void)
 		}
 	}
 
-	printf("pass!\n");
+	if (!err)
+		printf("pass!\n");
 	axmap_free(map);
 	return err;
 }

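Together with the 2018-07-11 change, these hunks make `axmap_set_fn()` handle overlaps strictly sequentially. Only the bits below the first already-set bit are claimed. If the very first requested bit is already set, nothing is claimed and `set_bits` becomes 0. The sketch below models that logic on a single 64-bit word. `low_mask()` stands in for the `bit_masks[]` table, and `lowest_set()` stands in for `ffz(~overlap)`. All names are hypothetical. The cross-word and cross-level handling done by `axmap_set_nr()` and `axmap_handler()` is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 0 <= n <= 64 (like bit_masks[n]). */
static uint64_t low_mask(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Index of the lowest set bit, i.e. ffz(~v); v must be non-zero. */
static unsigned int lowest_set(uint64_t v)
{
	unsigned int i = 0;

	assert(v != 0);
	while (!(v & 1)) {
		v >>= 1;
		i++;
	}
	return i;
}

/*
 * Set up to 'nr' contiguous bits starting at 'bit', stopping at the first
 * bit that is already set. Returns how many bits were newly set.
 */
static unsigned int set_contig(uint64_t *word, unsigned int bit, unsigned int nr)
{
	uint64_t mask, overlap;

	if (!nr || bit >= 64)
		return 0;
	if (nr > 64 - bit)
		nr = 64 - bit;		/* clip to this word */

	mask = low_mask(nr) << bit;
	overlap = *word & mask;
	if (overlap) {
		nr = lowest_set(overlap) - bit;
		if (!nr)
			return 0;	/* first bit taken: claim nothing */
		mask = low_mask(nr) << bit;
	}
	*word |= mask;
	return nr;
}
```

Starting from an empty word, setting 16 bits at 16 returns 16. A following request for 32 bits at 0 returns 16, because it stops at bit 16. These values are in the spirit of the first `t/axmap.c` overlap cases.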

* Recent changes (master)
@ 2018-07-11 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-11 12:00 UTC
  To: fio

The following changes since commit 19a8064ef4a2a826ee06ed061af970d1737cf840:

  blktrace: just ignore zero byte traces (2018-07-04 14:59:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dc54a01f9a2966133f6d1a52f3718d267117b4f3:

  t/axmap: clean up overlap tests (2018-07-10 21:51:16 -0600)

----------------------------------------------------------------
Jens Axboe (7):
      io_u: fix negative offset due to wrap
      io_u: ensure we generate the full length of block sizes
      t/axmap: add overlap test cases
      axmap: ensure that overlaps are handled strictly sequential
      t/axmap: add a few more overlap test cases
      Makefile: lib/axmap no longer needs hweight
      t/axmap: clean up overlap tests

 io_u.c      |  19 +++++------
 lib/axmap.c |  13 ++++----
 t/axmap.c   | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 580c414..5221a78 100644
--- a/io_u.c
+++ b/io_u.c
@@ -541,10 +541,8 @@ static unsigned int get_next_buflen(struct thread_data *td, struct io_u *io_u,
 		r = __rand(&td->bsrange_state[ddir]);
 
 		if (!td->o.bssplit_nr[ddir]) {
-			buflen = 1 + (unsigned int) ((double) maxbs *
+			buflen = minbs + (unsigned int) ((double) maxbs *
 					(r / (frand_max + 1.0)));
-			if (buflen < minbs)
-				buflen = minbs;
 		} else {
 			long long perc = 0;
 			unsigned int i;
@@ -832,15 +830,16 @@ static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
 		 * Wrap from the beginning, if we exceed the file size
 		 */
 		if (f->file_offset >= f->real_file_size)
-			f->file_offset = f->real_file_size - f->file_offset;
+			f->file_offset = get_start_offset(td, f);
+
 		f->last_pos[io_u->ddir] = f->file_offset;
 		td->io_skip_bytes += td->o.zone_skip;
 	}
 
 	/*
- 	 * If zone_size > zone_range, then maintain the same zone until
- 	 * zone_bytes >= zone_size.
- 	 */
+	 * If zone_size > zone_range, then maintain the same zone until
+	 * zone_bytes >= zone_size.
+	 */
 	if (f->last_pos[io_u->ddir] >= (f->file_offset + td->o.zone_range)) {
 		dprint(FD_IO, "io_u maintain zone offset=%" PRIu64 "/last_pos=%" PRIu64 "\n",
 				f->file_offset, f->last_pos[io_u->ddir]);
@@ -851,9 +850,8 @@ static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
 	 * For random: if 'norandommap' is not set and zone_size > zone_range,
 	 * map needs to be reset as it's done with zone_range everytime.
 	 */
-	if ((td->zone_bytes % td->o.zone_range) == 0) {
+	if ((td->zone_bytes % td->o.zone_range) == 0)
 		fio_file_reset(td, f);
-	}
 }
 
 static int fill_io_u(struct thread_data *td, struct io_u *io_u)
@@ -874,9 +872,8 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	/*
 	 * When file is zoned zone_range is always positive
 	 */
-	if (td->o.zone_range) {
+	if (td->o.zone_range)
 		__fill_io_u_zone(td, io_u);
-	}
 
 	/*
 	 * No log, let the seq/rand engine retrieve the next buflen and
diff --git a/lib/axmap.c b/lib/axmap.c
index 3c65308..c29597f 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -234,17 +234,18 @@ static bool axmap_set_fn(struct axmap_level *al, unsigned long offset,
 	if (overlap == mask)
 		return true;
 
-	while (overlap) {
-		unsigned long clear_mask = ~(1UL << ffz(~overlap));
+	if (overlap) {
+		const int __bit = ffz(~overlap);
 
-		mask &= clear_mask;
-		overlap &= clear_mask;
-		nr_bits--;
+		if (__bit == bit)
+			return true;
+
+		nr_bits = __bit - bit;
+		mask = bit_masks[nr_bits] << bit;
 	}
 
 	assert(mask);
 	assert(!(al->map[offset] & mask));
-		
 	al->map[offset] |= mask;
 
 	if (!al->level)
diff --git a/t/axmap.c b/t/axmap.c
index eef464f..3f6b8f1 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -107,6 +107,108 @@ static int test_multi(size_t size, unsigned int bit_off)
 	return err;
 }
 
+struct overlap_test {
+	unsigned int start;
+	unsigned int nr;
+	unsigned int ret;
+};
+
+static int test_overlap(void)
+{
+	struct overlap_test tests[] = {
+		{
+			.start	= 16,
+			.nr	= 16,
+			.ret	= 16,
+		},
+		{
+			.start	= 0,
+			.nr	= 32,
+			.ret	= 16,
+		},
+		{
+			.start	= 48,
+			.nr	= 32,
+			.ret	= 32,
+		},
+		{
+			.start	= 32,
+			.nr	= 32,
+			.ret	= 16,
+		},
+		{
+			.start	= 102,
+			.nr	= 1,
+			.ret	= 1,
+		},
+		{
+			.start	= 101,
+			.nr	= 3,
+			.ret	= 1,
+		},
+		{
+			.start	= 106,
+			.nr	= 4,
+			.ret	= 4,
+		},
+		{
+			.start	= 105,
+			.nr	= 3,
+			.ret	= 1,
+		},
+		{
+			.start	= 120,
+			.nr	= 4,
+			.ret	= 4,
+		},
+		{
+			.start	= 118,
+			.nr	= 2,
+			.ret	= 2,
+		},
+		{
+			.start	= 118,
+			.nr	= 2,
+			.ret	= 0,
+		},
+		{
+			.start	= -1U,
+		},
+	};
+	struct axmap *map;
+	int entries, i, ret, err = 0;
+
+	entries = 0;
+	for (i = 0; tests[i].start != 1U; i++) {
+		unsigned int this = tests[i].start + tests[i].nr;
+
+		if (this > entries)
+			entries = this;
+	}
+
+	printf("Test overlaps...");
+	fflush(stdout);
+
+	map = axmap_new(entries);
+
+	for (i = 0; tests[i].start != -1U; i++) {
+		struct overlap_test *t = &tests[i];
+
+		ret = axmap_set_nr(map, t->start, t->nr);
+		if (ret != t->ret) {
+			printf("fail\n");
+			printf("start=%u, nr=%d, ret=%d: %d\n", t->start, t->nr,
+								t->ret, ret);
+			err = 1;
+			break;
+		}
+	}
+
+	printf("pass!\n");
+	axmap_free(map);
+	return err;
+}
+
 int main(int argc, char *argv[])
 {
 	size_t size = (1UL << 23) - 200;
@@ -124,6 +226,8 @@ int main(int argc, char *argv[])
 		return 2;
 	if (test_multi(size, 17))
 		return 3;
+	if (test_overlap())
+		return 4;
 
 	return 0;
 }

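The "fix negative offset due to wrap" hunk replaces `f->real_file_size - f->file_offset` with `get_start_offset(td, f)`. Assume the offsets are unsigned 64-bit, as the `PRIu64` format in the nearby `dprint()` suggests. Under that assumption, the old expression wraps to a huge value whenever `file_offset` exceeds `real_file_size`, and it is only harmless when the two are equal. The hypothetical helpers below contrast the two expressions. `start` stands in for whatever `get_start_offset()` returns.

```c
#include <assert.h>
#include <stdint.h>

/* Old wrap: unsigned subtraction underflows when offset > size. */
static uint64_t wrap_old(uint64_t offset, uint64_t size)
{
	return size - offset;
}

/* Patched idea: once past the end, restart at the job's start offset. */
static uint64_t wrap_new(uint64_t offset, uint64_t size, uint64_t start)
{
	return offset >= size ? start : offset;
}
```

For example, `wrap_old(120, 100)` evaluates to 2^64 - 20, which reads as -20 when interpreted as a signed value. `wrap_new(120, 100, 0)` returns 0.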

* Recent changes (master)
@ 2018-07-05 12:00 Jens Axboe
From: Jens Axboe @ 2018-07-05 12:00 UTC
  To: fio

The following changes since commit 354d50e771451f510e5886275768abb63b602798:

  Fix compilation without cgroups (2018-06-29 08:00:29 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 19a8064ef4a2a826ee06ed061af970d1737cf840:

  blktrace: just ignore zero byte traces (2018-07-04 14:59:18 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      blktrace: just ignore zero byte traces

 blktrace.c | 11 ++++++++++-
 debug.h    |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index cda111a..36a7180 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -303,6 +303,11 @@ static void handle_trace_discard(struct thread_data *td,
 	queue_io_piece(td, ipo);
 }
 
+static void dump_trace(struct blk_io_trace *t)
+{
+	log_err("blktrace: ignoring zero byte trace: action=%x\n", t->action);
+}
+
 static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 			    unsigned long long ttime, unsigned long *ios,
 			    unsigned int *rw_bs)
@@ -323,7 +328,11 @@ static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 			return;
 	}
 
-	assert(t->bytes);
+	if (!t->bytes) {
+		if (!fio_did_warn(FIO_WARN_BTRACE_ZERO))
+			dump_trace(t);
+		return;
+	}
 
 	if (t->bytes > rw_bs[rw])
 		rw_bs[rw] = t->bytes;
diff --git a/debug.h b/debug.h
index 8a8cf87..e5e8040 100644
--- a/debug.h
+++ b/debug.h
@@ -42,6 +42,7 @@ enum {
 	FIO_WARN_ZONED_BUG	= 4,
 	FIO_WARN_IOLOG_DROP	= 8,
 	FIO_WARN_FADVISE	= 16,
+	FIO_WARN_BTRACE_ZERO	= 32,
 };
 
 #ifdef FIO_INC_DEBUG

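Instead of asserting, the blktrace change now skips zero-byte traces and logs them only once, gated by `fio_did_warn(FIO_WARN_BTRACE_ZERO)`. The new flag is the next power of two after `FIO_WARN_FADVISE`, so each warning occupies one bit. The sketch below shows a minimal version of that warn-once idea. The flag names and `did_warn()` are hypothetical and may differ from fio's actual helper. This version is also not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-bit-per-warning flags, in the style of FIO_WARN_*. */
enum {
	WARN_ZONED_BUG		= 4,
	WARN_BTRACE_ZERO	= 32,
};

static unsigned int warned_mask;

/* Returns true if 'flag' was already reported; otherwise records it. */
static bool did_warn(unsigned int flag)
{
	if (warned_mask & flag)
		return true;
	warned_mask |= flag;
	return false;
}
```

A caller then writes `if (!did_warn(WARN_BTRACE_ZERO)) log_the_trace();`, so only the first zero-byte trace is reported.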

* Recent changes (master)
@ 2018-06-30 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-30 12:00 UTC
  To: fio

The following changes since commit 9caaf02b06deecf0b0a13c16b6d59dddc64b8c35:

  Merge branch 'doc-norandommap' of https://github.com/larrystevenwise/fio (2018-06-21 11:22:41 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 354d50e771451f510e5886275768abb63b602798:

  Fix compilation without cgroups (2018-06-29 08:00:29 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fix compilation without cgroups

Josef Bacik (2):
      fio: work with cgroup2 as well
      fio: add job_runtime to the thread json output

 backend.c |  5 ++---
 cgroup.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++-------------------
 cgroup.h  | 15 +++++++++++----
 stat.c    |  1 +
 4 files changed, 59 insertions(+), 26 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index c52eba5..a7e9184 100644
--- a/backend.c
+++ b/backend.c
@@ -50,7 +50,7 @@
 
 static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
-static char *cgroup_mnt;
+static struct cgroup_mnt *cgroup_mnt;
 static int exit_value;
 static volatile int fio_abort;
 static unsigned int nr_process = 0;
@@ -1886,7 +1886,7 @@ err:
 	close_and_free_files(td);
 	cleanup_io_u(td);
 	close_ioengine(td);
-	cgroup_shutdown(td, &cgroup_mnt);
+	cgroup_shutdown(td, cgroup_mnt);
 	verify_free_state(td);
 
 	if (td->zone_state_index) {
@@ -2508,7 +2508,6 @@ int fio_backend(struct sk_out *sk_out)
 		cgroup_kill(cgroup_list);
 		sfree(cgroup_list);
 	}
-	sfree(cgroup_mnt);
 
 	fio_sem_remove(startup_sem);
 	stat_exit();
diff --git a/cgroup.c b/cgroup.c
index 629047b..77e31a4 100644
--- a/cgroup.c
+++ b/cgroup.c
@@ -18,12 +18,13 @@ struct cgroup_member {
 	unsigned int cgroup_nodelete;
 };
 
-static char *find_cgroup_mnt(struct thread_data *td)
+static struct cgroup_mnt *find_cgroup_mnt(struct thread_data *td)
 {
-	char *mntpoint = NULL;
+	struct cgroup_mnt *cgroup_mnt = NULL;
 	struct mntent *mnt, dummy;
 	char buf[256] = {0};
 	FILE *f;
+	bool cgroup2 = false;
 
 	f = setmntent("/proc/mounts", "r");
 	if (!f) {
@@ -35,15 +36,29 @@ static char *find_cgroup_mnt(struct thread_data *td)
 		if (!strcmp(mnt->mnt_type, "cgroup") &&
 		    strstr(mnt->mnt_opts, "blkio"))
 			break;
+		if (!strcmp(mnt->mnt_type, "cgroup2")) {
+			cgroup2 = true;
+			break;
+		}
 	}
 
-	if (mnt)
-		mntpoint = smalloc_strdup(mnt->mnt_dir);
-	else
+	if (mnt) {
+		cgroup_mnt = smalloc(sizeof(*cgroup_mnt));
+		if (cgroup_mnt) {
+			cgroup_mnt->path = smalloc_strdup(mnt->mnt_dir);
+			if (!cgroup_mnt->path) {
+				sfree(cgroup_mnt);
+				log_err("fio: could not allocate memory\n");
+			} else {
+				cgroup_mnt->cgroup2 = cgroup2;
+			}
+		}
+	} else {
 		log_err("fio: cgroup blkio does not appear to be mounted\n");
+	}
 
 	endmntent(f);
-	return mntpoint;
+	return cgroup_mnt;
 }
 
 static void add_cgroup(struct thread_data *td, const char *name,
@@ -96,14 +111,14 @@ void cgroup_kill(struct flist_head *clist)
 	fio_sem_up(lock);
 }
 
-static char *get_cgroup_root(struct thread_data *td, char *mnt)
+static char *get_cgroup_root(struct thread_data *td, struct cgroup_mnt *mnt)
 {
 	char *str = malloc(64);
 
 	if (td->o.cgroup)
-		sprintf(str, "%s/%s", mnt, td->o.cgroup);
+		sprintf(str, "%s/%s", mnt->path, td->o.cgroup);
 	else
-		sprintf(str, "%s/%s", mnt, td->o.name);
+		sprintf(str, "%s/%s", mnt->path, td->o.name);
 
 	return str;
 }
@@ -128,22 +143,25 @@ static int write_int_to_file(struct thread_data *td, const char *path,
 
 }
 
-static int cgroup_write_pid(struct thread_data *td, const char *root)
+static int cgroup_write_pid(struct thread_data *td, char *path, bool cgroup2)
 {
 	unsigned int val = td->pid;
 
-	return write_int_to_file(td, root, "tasks", val, "cgroup write pid");
+	if (cgroup2)
+		return write_int_to_file(td, path, "cgroup.procs",
+					 val, "cgroup write pid");
+	return write_int_to_file(td, path, "tasks", val, "cgroup write pid");
 }
 
 /*
  * Move pid to root class
  */
-static int cgroup_del_pid(struct thread_data *td, char *mnt)
+static int cgroup_del_pid(struct thread_data *td, struct cgroup_mnt *mnt)
 {
-	return cgroup_write_pid(td, mnt);
+	return cgroup_write_pid(td, mnt->path, mnt->cgroup2);
 }
 
-int cgroup_setup(struct thread_data *td, struct flist_head *clist, char **mnt)
+int cgroup_setup(struct thread_data *td, struct flist_head *clist, struct cgroup_mnt **mnt)
 {
 	char *root;
 
@@ -172,13 +190,17 @@ int cgroup_setup(struct thread_data *td, struct flist_head *clist, char **mnt)
 		add_cgroup(td, root, clist);
 
 	if (td->o.cgroup_weight) {
+		if ((*mnt)->cgroup2) {
+			log_err("fio: cgroup weight doesn't work with cgroup2\n");
+			goto err;
+		}
 		if (write_int_to_file(td, root, "blkio.weight",
 					td->o.cgroup_weight,
 					"cgroup open weight"))
 			goto err;
 	}
 
-	if (!cgroup_write_pid(td, root)) {
+	if (!cgroup_write_pid(td, root, (*mnt)->cgroup2)) {
 		free(root);
 		return 0;
 	}
@@ -188,14 +210,18 @@ err:
 	return 1;
 }
 
-void cgroup_shutdown(struct thread_data *td, char **mnt)
+void cgroup_shutdown(struct thread_data *td, struct cgroup_mnt *mnt)
 {
-	if (*mnt == NULL)
+	if (mnt == NULL)
 		return;
 	if (!td->o.cgroup_weight && !td->o.cgroup)
-		return;
+		goto out;
 
-	cgroup_del_pid(td, *mnt);
+	cgroup_del_pid(td, mnt);
+out:
+	if (mnt->path)
+		sfree(mnt->path);
+	sfree(mnt);
 }
 
 static void fio_init cgroup_init(void)
diff --git a/cgroup.h b/cgroup.h
index 0bbe25a..10313b7 100644
--- a/cgroup.h
+++ b/cgroup.h
@@ -3,21 +3,28 @@
 
 #ifdef FIO_HAVE_CGROUPS
 
-int cgroup_setup(struct thread_data *, struct flist_head *, char **);
-void cgroup_shutdown(struct thread_data *, char **);
+struct cgroup_mnt {
+	char *path;
+	bool cgroup2;
+};
+
+int cgroup_setup(struct thread_data *, struct flist_head *, struct cgroup_mnt **);
+void cgroup_shutdown(struct thread_data *, struct cgroup_mnt *);
 
 void cgroup_kill(struct flist_head *list);
 
 #else
 
+struct cgroup_mnt;
+
 static inline int cgroup_setup(struct thread_data *td, struct flist_head *list,
-			       char **mnt)
+			       struct cgroup_mnt **mnt)
 {
 	td_verror(td, EINVAL, "cgroup_setup");
 	return 1;
 }
 
-static inline void cgroup_shutdown(struct thread_data *td, char **mnt)
+static inline void cgroup_shutdown(struct thread_data *td, struct cgroup_mnt *mnt)
 {
 }
 
diff --git a/stat.c b/stat.c
index d5240d9..a308eb8 100644
--- a/stat.c
+++ b/stat.c
@@ -1288,6 +1288,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		usr_cpu = 0;
 		sys_cpu = 0;
 	}
+	json_object_add_value_int(root, "job_runtime", ts->total_run_time);
 	json_object_add_value_float(root, "usr_cpu", usr_cpu);
 	json_object_add_value_float(root, "sys_cpu", sys_cpu);
 	json_object_add_value_int(root, "ctx", ts->ctx);

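With cgroup2 support, `cgroup_write_pid()` writes the PID to `cgroup.procs` on a cgroup2 mount and to `tasks` on a cgroup v1 blkio mount. `write_int_to_file()` presumably joins the directory and file name. The sketch below builds that path with a hypothetical `cgroup_pid_file()` helper. As a design choice it uses a bounded `snprintf()` and reports truncation. By contrast, `get_cgroup_root()` above uses `sprintf()` into a fixed 64-byte buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Build "<root>/cgroup.procs" (cgroup2) or "<root>/tasks" (cgroup v1).
 * Returns the path length, or -1 if it does not fit in 'buf'.
 */
static int cgroup_pid_file(char *buf, size_t len, const char *root, bool cgroup2)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s/%s", root, cgroup2 ? "cgroup.procs" : "tasks");
	if (n < 0 || (size_t) n >= len)
		return -1;
	return n;
}
```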

* Recent changes (master)
@ 2018-06-22 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-22 12:00 UTC
  To: fio

The following changes since commit 5de1d4ba1e6ae82bb4ad559463801cb6b7096ac3:

  Merge branch 'readonly-trim' of https://github.com/vincentkfu/fio (2018-06-18 13:58:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9caaf02b06deecf0b0a13c16b6d59dddc64b8c35:

  Merge branch 'doc-norandommap' of https://github.com/larrystevenwise/fio (2018-06-21 11:22:41 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'wip-single-glfs-instance' of https://github.com/zhanghuan/fio
      Merge branch 'doc-norandommap' of https://github.com/larrystevenwise/fio

Steve Wise (1):
      doc: add text about possibly verify errors with norandommap

Zhang Huan (1):
      glusterfs: capable to test with one single glfs instance

 HOWTO               |   9 ++-
 engines/gfapi.h     |   1 +
 engines/glusterfs.c | 179 +++++++++++++++++++++++++++++++++++++++++++---------
 fio.1               |   9 ++-
 4 files changed, 167 insertions(+), 31 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e1399b4..70eed28 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1329,7 +1329,9 @@ I/O type
 	and that some blocks may be read/written more than once. If this option is
 	used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
 	only intact blocks are verified, i.e., partially-overwritten blocks are
-	ignored.
+	ignored.  With an async I/O engine and an I/O depth > 1, it is possible for
+	the same block to be overwritten, which can cause verification errors.  Either
+	do not use norandommap in this case, or also use the lfsr random generator.
 
 .. option:: softrandommap=bool
 
@@ -2663,6 +2665,11 @@ Verification
 	previously written file. If the data direction includes any form of write,
 	the verify will be of the newly written data.
 
+	To avoid false verification errors, do not use the norandommap option when
+	verifying data with async I/O engines and I/O depths > 1.  Or use the
+	norandommap and the lfsr random generator together to avoid writing to the
+	same offset with multiple outstanding I/Os.
+
 .. option:: verify_offset=int
 
 	Swap the verification header with data somewhere else in the block before
diff --git a/engines/gfapi.h b/engines/gfapi.h
index 1028431..e4cdbcb 100644
--- a/engines/gfapi.h
+++ b/engines/gfapi.h
@@ -5,6 +5,7 @@ struct gf_options {
 	void *pad;
 	char *gf_vol;
 	char *gf_brick;
+	int gf_single_instance;
 };
 
 struct gf_data {
diff --git a/engines/glusterfs.c b/engines/glusterfs.c
index 981dfa3..d0250b7 100644
--- a/engines/glusterfs.c
+++ b/engines/glusterfs.c
@@ -28,16 +28,159 @@ struct fio_option gfapi_options[] = {
 	 .group = FIO_OPT_G_GFAPI,
 	 },
 	{
+	 .name = "single-instance",
+	 .lname = "Single glusterfs instance",
+	 .type = FIO_OPT_BOOL,
+	 .help = "Only one glusterfs instance",
+	 .off1 = offsetof(struct gf_options, gf_single_instance),
+	 .category = FIO_OPT_C_ENGINE,
+	 .group = FIO_OPT_G_GFAPI,
+	 },
+	{
 	 .name = NULL,
 	 },
 };
 
-int fio_gf_setup(struct thread_data *td)
+struct glfs_info {
+	struct flist_head	list;
+	char			*volume;
+	char			*brick;
+	glfs_t			*fs;
+	int			refcount;
+};
+
+static pthread_mutex_t glfs_lock = PTHREAD_MUTEX_INITIALIZER;
+static FLIST_HEAD(glfs_list_head);
+
+static glfs_t *fio_gf_new_fs(char *volume, char *brick)
 {
 	int r = 0;
+	glfs_t *fs;
+	struct stat sb = { 0, };
+
+	fs = glfs_new(volume);
+	if (!fs) {
+		log_err("glfs_new failed.\n");
+		goto out;
+	}
+	glfs_set_logging(fs, "/tmp/fio_gfapi.log", 7);
+	/* default to tcp */
+	r = glfs_set_volfile_server(fs, "tcp", brick, 0);
+	if (r) {
+		log_err("glfs_set_volfile_server failed.\n");
+		goto out;
+	}
+	r = glfs_init(fs);
+	if (r) {
+		log_err("glfs_init failed. Is glusterd running on brick?\n");
+		goto out;
+	}
+	sleep(2);
+	r = glfs_lstat(fs, ".", &sb);
+	if (r) {
+		log_err("glfs_lstat failed.\n");
+		goto out;
+	}
+
+out:
+	if (r) {
+		glfs_fini(fs);
+		fs = NULL;
+	}
+	return fs;
+}
+
+static glfs_t *fio_gf_get_glfs(struct gf_options *opt,
+			       char *volume, char *brick)
+{
+	struct glfs_info *glfs = NULL;
+	struct glfs_info *tmp;
+	struct flist_head *entry;
+
+	if (!opt->gf_single_instance)
+		return fio_gf_new_fs(volume, brick);
+
+	pthread_mutex_lock (&glfs_lock);
+
+	flist_for_each(entry, &glfs_list_head) {
+		tmp = flist_entry(entry, struct glfs_info, list);
+		if (!strcmp(volume, tmp->volume) &&
+		    !strcmp(brick, tmp->brick)) {
+			glfs = tmp;
+			break;
+		}
+	}
+
+	if (glfs) {
+		glfs->refcount++;
+	} else {
+		glfs = malloc(sizeof(*glfs));
+		if (!glfs)
+			goto out;
+		INIT_FLIST_HEAD(&glfs->list);
+		glfs->refcount = 0;
+		glfs->volume = strdup(volume);
+		glfs->brick = strdup(brick);
+		glfs->fs = fio_gf_new_fs(volume, brick);
+		if (!glfs->fs) {
+			free(glfs);
+			glfs = NULL;
+			goto out;
+		}
+
+		flist_add_tail(&glfs->list, &glfs_list_head);
+		glfs->refcount = 1;
+	}
+
+out:
+	pthread_mutex_unlock (&glfs_lock);
+
+	if (glfs)
+		return glfs->fs;
+	return NULL;
+}
+
+static void fio_gf_put_glfs(struct gf_options *opt, glfs_t *fs)
+{
+	struct glfs_info *glfs = NULL;
+	struct glfs_info *tmp;
+	struct flist_head *entry;
+
+	if (!opt->gf_single_instance) {
+		glfs_fini(fs);
+		return;
+	}
+
+	pthread_mutex_lock (&glfs_lock);
+
+	flist_for_each(entry, &glfs_list_head) {
+		tmp = flist_entry(entry, struct glfs_info, list);
+		if (tmp->fs == fs) {
+			glfs = tmp;
+			break;
+		}
+	}
+
+	if (!glfs) {
+		log_err("glfs not found to fini.\n");
+	} else {
+		glfs->refcount--;
+
+		if (glfs->refcount == 0) {
+			glfs_fini(glfs->fs);
+			free(glfs->volume);
+			free(glfs->brick);
+			flist_del(&glfs->list);
+		}
+	}
+
+	pthread_mutex_unlock (&glfs_lock);
+}
+
+int fio_gf_setup(struct thread_data *td)
+{
 	struct gf_data *g = NULL;
 	struct gf_options *opt = td->eo;
-	struct stat sb = { 0, };
 
 	dprint(FD_IO, "fio setup\n");
 
@@ -49,42 +192,20 @@ int fio_gf_setup(struct thread_data *td)
 		log_err("malloc failed.\n");
 		return -ENOMEM;
 	}
-	g->fs = NULL;
 	g->fd = NULL;
 	g->aio_events = NULL;
 
-	g->fs = glfs_new(opt->gf_vol);
-	if (!g->fs) {
-		log_err("glfs_new failed.\n");
-		goto cleanup;
-	}
-	glfs_set_logging(g->fs, "/tmp/fio_gfapi.log", 7);
-	/* default to tcp */
-	r = glfs_set_volfile_server(g->fs, "tcp", opt->gf_brick, 0);
-	if (r) {
-		log_err("glfs_set_volfile_server failed.\n");
+	g->fs = fio_gf_get_glfs(opt, opt->gf_vol, opt->gf_brick);
+	if (!g->fs)
 		goto cleanup;
-	}
-	r = glfs_init(g->fs);
-	if (r) {
-		log_err("glfs_init failed. Is glusterd running on brick?\n");
-		goto cleanup;
-	}
-	sleep(2);
-	r = glfs_lstat(g->fs, ".", &sb);
-	if (r) {
-		log_err("glfs_lstat failed.\n");
-		goto cleanup;
-	}
+
 	dprint(FD_FILE, "fio setup %p\n", g->fs);
 	td->io_ops_data = g;
 	return 0;
 cleanup:
-	if (g->fs)
-		glfs_fini(g->fs);
 	free(g);
 	td->io_ops_data = NULL;
-	return r;
+	return -EIO;
 }
 
 void fio_gf_cleanup(struct thread_data *td)
@@ -97,7 +218,7 @@ void fio_gf_cleanup(struct thread_data *td)
 		if (g->fd)
 			glfs_close(g->fd);
 		if (g->fs)
-			glfs_fini(g->fs);
+			fio_gf_put_glfs(td->eo, g->fs);
 		free(g);
 		td->io_ops_data = NULL;
 	}
diff --git a/fio.1 b/fio.1
index c744d1a..6d2eba6 100644
--- a/fio.1
+++ b/fio.1
@@ -1120,7 +1120,9 @@ at past I/O history. This means that some blocks may not be read or written,
 and that some blocks may be read/written more than once. If this option is
 used with \fBverify\fR and multiple blocksizes (via \fBbsrange\fR),
 only intact blocks are verified, i.e., partially\-overwritten blocks are
-ignored.
+ignored.  With an async I/O engine and an I/O depth > 1, it is possible for
+the same block to be overwritten, which can cause verification errors.  Either
+do not use norandommap in this case, or also use the lfsr random generator.
 .TP
 .BI softrandommap \fR=\fPbool
 See \fBnorandommap\fR. If fio runs with the random block map enabled and
@@ -2370,6 +2372,11 @@ that the written data is also correctly read back. If the data direction
 given is a read or random read, fio will assume that it should verify a
 previously written file. If the data direction includes any form of write,
 the verify will be of the newly written data.
+.P
+To avoid false verification errors, do not use the norandommap option when
+verifying data with async I/O engines and I/O depths > 1.  Or use the
+norandommap and the lfsr random generator together to avoid writing to the
+same offset with multiple outstanding I/Os.
 .RE
 .TP
 .BI verify_offset \fR=\fPint
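
When `gf_single_instance` is set, `fio_gf_get_glfs()` above hands out one shared `glfs_t` per (volume, brick) pair. `fio_gf_put_glfs()` calls `glfs_fini()` only when the last user drops its reference. Below is a minimal standalone sketch of that refcounted-cache pattern under a mutex. `struct fake_fs`, `get_fs()` and `put_fs()` are hypothetical stand-ins, not gfapi calls, and a plain singly linked list replaces fio's `flist`. The sketch differs from the patch as shown in two places: it also frees the list node when the count reaches zero, and it frees the `strdup()`ed keys on the error path.

```c
#define _POSIX_C_SOURCE 200809L	/* strdup() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an expensive handle such as glfs_t. */
struct fake_fs {
	int id;
};

struct fs_entry {
	struct fs_entry *next;
	char *volume;
	char *brick;
	struct fake_fs *fs;
	int refcount;
};

static struct fs_entry *fs_list;
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static int opened;	/* number of handles actually created */

/* Return the shared handle for (volume, brick), creating it on first use. */
static struct fake_fs *get_fs(const char *volume, const char *brick)
{
	struct fs_entry *e;
	struct fake_fs *fs = NULL;

	pthread_mutex_lock(&fs_lock);
	for (e = fs_list; e; e = e->next)
		if (!strcmp(e->volume, volume) && !strcmp(e->brick, brick))
			break;

	if (e) {			/* already open: just take a reference */
		e->refcount++;
		fs = e->fs;
		goto out;
	}

	e = calloc(1, sizeof(*e));
	if (!e)
		goto out;
	e->volume = strdup(volume);
	e->brick = strdup(brick);
	e->fs = malloc(sizeof(*e->fs));
	if (!e->volume || !e->brick || !e->fs) {
		/* undo everything, including the strdup()ed keys */
		free(e->volume);
		free(e->brick);
		free(e->fs);
		free(e);
		goto out;
	}
	e->fs->id = ++opened;
	e->refcount = 1;
	e->next = fs_list;
	fs_list = e;
	fs = e->fs;
out:
	pthread_mutex_unlock(&fs_lock);
	return fs;
}

/* Drop one reference; on the last one, close the handle and free the entry. */
static void put_fs(struct fake_fs *fs)
{
	struct fs_entry **pp, *e;

	pthread_mutex_lock(&fs_lock);
	for (pp = &fs_list; (e = *pp) != NULL; pp = &e->next) {
		if (e->fs != fs)
			continue;
		assert(e->refcount > 0);
		if (--e->refcount == 0) {
			*pp = e->next;	/* unlink before freeing */
			free(e->fs);
			free(e->volume);
			free(e->brick);
			free(e);	/* the node itself, too */
		}
		break;
	}
	pthread_mutex_unlock(&fs_lock);
}
```

Two `get_fs("vol0", "host1")` calls return the same handle and create it only once. The entry disappears after the matching second `put_fs()`.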

* Recent changes (master)
@ 2018-06-19 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-19 12:00 UTC
  To: fio

The following changes since commit af84cd66149507424814cf9c0b4950f4cf66e3b7:

  client: close dup'ed descriptor if fdopen() fails (2018-06-15 09:18:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5de1d4ba1e6ae82bb4ad559463801cb6b7096ac3:

  Merge branch 'readonly-trim' of https://github.com/vincentkfu/fio (2018-06-18 13:58:26 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'readonly-trim' of https://github.com/vincentkfu/fio

Vincent Fu (7):
      fio.h: also check trim operations in fio_ro_check
      filesetup: make trim jobs respect --readonly during file open
      init: ensure that fatal errors in fixup_options are always propogated to caller
      init: abort write and trim jobs when --readonly option is present
      options: check for conflict between trims and readonly option
      doc: improve readonly option description
      testing: add test script for readonly parameter

 HOWTO                 | 12 ++++----
 filesetup.c           |  3 +-
 fio.1                 |  9 +++---
 fio.h                 |  3 +-
 init.c                | 40 +++++++++++++-----------
 options.c             |  6 ++--
 t/jobs/readonly-r.fio |  5 +++
 t/jobs/readonly-t.fio |  5 +++
 t/jobs/readonly-w.fio |  5 +++
 t/readonly.sh         | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 139 insertions(+), 33 deletions(-)
 create mode 100644 t/jobs/readonly-r.fio
 create mode 100644 t/jobs/readonly-t.fio
 create mode 100644 t/jobs/readonly-w.fio
 create mode 100755 t/readonly.sh

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4398ffa..e1399b4 100644
--- a/HOWTO
+++ b/HOWTO
@@ -163,12 +163,12 @@ Command line options
 
 .. option:: --readonly
 
-	Turn on safety read-only checks, preventing writes.  The ``--readonly``
-	option is an extra safety guard to prevent users from accidentally starting
-	a write workload when that is not desired.  Fio will only write if
-	`rw=write/randwrite/rw/randrw` is given.  This extra safety net can be used
-	as an extra precaution as ``--readonly`` will also enable a write check in
-	the I/O engine core to prevent writes due to unknown user space bug(s).
+	Turn on safety read-only checks, preventing writes and trims.  The
+	``--readonly`` option is an extra safety guard to prevent users from
+	accidentally starting a write or trim workload when that is not desired.
+	Fio will only modify the device under test if
+	`rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite` is given.  This
+	safety net can be used as an extra precaution.
 
 .. option:: --eta=when
 
diff --git a/filesetup.c b/filesetup.c
index 75694bd..a2427a1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -674,7 +674,8 @@ open_again:
 			from_hash = file_lookup_open(f, flags);
 	} else if (td_trim(td)) {
 		assert(!td_rw(td)); /* should have matched above */
-		flags |= O_RDWR;
+		if (!read_only)
+			flags |= O_RDWR;
 		from_hash = file_lookup_open(f, flags);
 	}
 
diff --git a/fio.1 b/fio.1
index 733c740..c744d1a 100644
--- a/fio.1
+++ b/fio.1
@@ -68,12 +68,11 @@ available ioengines.
 Convert \fIjobfile\fR to a set of command\-line options.
 .TP
 .BI \-\-readonly
-Turn on safety read\-only checks, preventing writes. The \fB\-\-readonly\fR
+Turn on safety read\-only checks, preventing writes and trims. The \fB\-\-readonly\fR
 option is an extra safety guard to prevent users from accidentally starting
-a write workload when that is not desired. Fio will only write if
-`rw=write/randwrite/rw/randrw' is given. This extra safety net can be used
-as an extra precaution as \fB\-\-readonly\fR will also enable a write check in
-the I/O engine core to prevent writes due to unknown user space bug(s).
+a write or trim workload when that is not desired. Fio will only modify the
+device under test if `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite'
+is given. This safety net can be used as an extra precaution.
 .TP
 .BI \-\-eta \fR=\fPwhen
 Specifies when real\-time ETA estimate should be printed. \fIwhen\fR may
diff --git a/fio.h b/fio.h
index 9727f6c..51b8fdc 100644
--- a/fio.h
+++ b/fio.h
@@ -533,7 +533,8 @@ extern bool eta_time_within_slack(unsigned int time);
 
 static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 {
-	assert(!(io_u->ddir == DDIR_WRITE && !td_write(td)));
+	assert(!(io_u->ddir == DDIR_WRITE && !td_write(td)) &&
+	       !(io_u->ddir == DDIR_TRIM && !td_trim(td)));
 }
 
 #define REAL_MAX_JOBS		4096
diff --git a/init.c b/init.c
index 353c99b..af4cc6b 100644
--- a/init.c
+++ b/init.c
@@ -594,13 +594,19 @@ static int fixup_options(struct thread_data *td)
 	struct thread_options *o = &td->o;
 	int ret = 0;
 
+	if (read_only && (td_write(td) || td_trim(td))) {
+		log_err("fio: trim and write operations are not allowed"
+			 " with the --readonly parameter.\n");
+		ret |= 1;
+	}
+
 #ifndef CONFIG_PSHARED
 	if (!o->use_thread) {
 		log_info("fio: this platform does not support process shared"
 			 " mutexes, forcing use of threads. Use the 'thread'"
 			 " option to get rid of this warning.\n");
 		o->use_thread = 1;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 #endif
 
@@ -608,7 +614,7 @@ static int fixup_options(struct thread_data *td)
 		log_err("fio: read iolog overrides write_iolog\n");
 		free(o->write_iolog_file);
 		o->write_iolog_file = NULL;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 
 	/*
@@ -662,7 +668,7 @@ static int fixup_options(struct thread_data *td)
 	    !o->norandommap) {
 		log_err("fio: Any use of blockalign= turns off randommap\n");
 		o->norandommap = 1;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 
 	if (!o->file_size_high)
@@ -680,7 +686,7 @@ static int fixup_options(struct thread_data *td)
 	    && !fixed_block_size(o))  {
 		log_err("fio: norandommap given for variable block sizes, "
 			"verify limited\n");
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 	if (o->bs_unaligned && (o->odirect || td_ioengine_flagged(td, FIO_RAWIO)))
 		log_err("fio: bs_unaligned may not work with raw io\n");
@@ -724,7 +730,7 @@ static int fixup_options(struct thread_data *td)
 		log_err("fio: checking for in-flight overlaps when the "
 			"io_submit_mode is offload is not supported\n");
 		o->serialize_overlap = 0;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 
 	if (o->nr_files > td->files_index)
@@ -738,7 +744,7 @@ static int fixup_options(struct thread_data *td)
 	    ((o->ratemin[DDIR_READ] + o->ratemin[DDIR_WRITE] + o->ratemin[DDIR_TRIM]) &&
 	    (o->rate_iops_min[DDIR_READ] + o->rate_iops_min[DDIR_WRITE] + o->rate_iops_min[DDIR_TRIM]))) {
 		log_err("fio: rate and rate_iops are mutually exclusive\n");
-		ret = 1;
+		ret |= 1;
 	}
 	if ((o->rate[DDIR_READ] && (o->rate[DDIR_READ] < o->ratemin[DDIR_READ])) ||
 	    (o->rate[DDIR_WRITE] && (o->rate[DDIR_WRITE] < o->ratemin[DDIR_WRITE])) ||
@@ -747,13 +753,13 @@ static int fixup_options(struct thread_data *td)
 	    (o->rate_iops[DDIR_WRITE] && (o->rate_iops[DDIR_WRITE] < o->rate_iops_min[DDIR_WRITE])) ||
 	    (o->rate_iops[DDIR_TRIM] && (o->rate_iops[DDIR_TRIM] < o->rate_iops_min[DDIR_TRIM]))) {
 		log_err("fio: minimum rate exceeds rate\n");
-		ret = 1;
+		ret |= 1;
 	}
 
 	if (!o->timeout && o->time_based) {
 		log_err("fio: time_based requires a runtime/timeout setting\n");
 		o->time_based = 0;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 
 	if (o->fill_device && !o->size)
@@ -769,7 +775,7 @@ static int fixup_options(struct thread_data *td)
 			log_info("fio: multiple writers may overwrite blocks "
 				"that belong to other jobs. This can cause "
 				"verification failures.\n");
-			ret = warnings_fatal;
+			ret |= warnings_fatal;
 		}
 
 		/*
@@ -781,7 +787,7 @@ static int fixup_options(struct thread_data *td)
 			log_info("fio: verification read phase will never "
 				 "start because write phase uses all of "
 				 "runtime\n");
-			ret = warnings_fatal;
+			ret |= warnings_fatal;
 		}
 
 		if (!fio_option_is_set(o, refill_buffers))
@@ -817,7 +823,7 @@ static int fixup_options(struct thread_data *td)
 		if (td_ioengine_flagged(td, FIO_PIPEIO)) {
 			log_info("fio: cannot pre-read files with an IO engine"
 				 " that isn't seekable. Pre-read disabled.\n");
-			ret = warnings_fatal;
+			ret |= warnings_fatal;
 		}
 	}
 
@@ -841,7 +847,7 @@ static int fixup_options(struct thread_data *td)
 			 " this warning\n");
 		o->fsync_blocks = o->fdatasync_blocks;
 		o->fdatasync_blocks = 0;
-		ret = warnings_fatal;
+		ret |= warnings_fatal;
 	}
 #endif
 
@@ -854,7 +860,7 @@ static int fixup_options(struct thread_data *td)
 		log_err("fio: Windows does not support direct or non-buffered io with"
 				" the synchronous ioengines. Use the 'windowsaio' ioengine"
 				" with 'direct=1' and 'iodepth=1' instead.\n");
-		ret = 1;
+		ret |= 1;
 	}
 #endif
 
@@ -887,7 +893,7 @@ static int fixup_options(struct thread_data *td)
 	if (o->size && o->size < td_min_bs(td)) {
 		log_err("fio: size too small, must not be less than minimum block size: %llu < %u\n",
 			(unsigned long long) o->size, td_min_bs(td));
-		ret = 1;
+		ret |= 1;
 	}
 
 	/*
@@ -904,7 +910,7 @@ static int fixup_options(struct thread_data *td)
 
 	if (td_ioengine_flagged(td, FIO_NOEXTEND) && o->file_append) {
 		log_err("fio: can't append/extent with IO engine %s\n", td->io_ops->name);
-		ret = 1;
+		ret |= 1;
 	}
 
 	if (fio_option_is_set(o, gtod_cpu)) {
@@ -921,7 +927,7 @@ static int fixup_options(struct thread_data *td)
 		log_err("fio: block error histogram only available "
 			"with a single file per job, but %d files "
 			"provided\n", o->nr_files);
-		ret = 1;
+		ret |= 1;
 	}
 
 	if (fio_option_is_set(o, clat_percentiles) &&
@@ -935,7 +941,7 @@ static int fixup_options(struct thread_data *td)
 		   o->lat_percentiles && o->clat_percentiles) {
 		log_err("fio: lat_percentiles and clat_percentiles are "
 			"mutually exclusive\n");
-		ret = 1;
+		ret |= 1;
 	}
 
 	if (o->disable_lat)
diff --git a/options.c b/options.c
index 0c4f89c..a174e2c 100644
--- a/options.c
+++ b/options.c
@@ -1555,9 +1555,9 @@ static int rw_verify(const struct fio_option *o, void *data)
 {
 	struct thread_data *td = cb_data_to_td(data);
 
-	if (read_only && td_write(td)) {
-		log_err("fio: job <%s> has write bit set, but fio is in"
-			" read-only mode\n", td->o.name);
+	if (read_only && (td_write(td) || td_trim(td))) {
+		log_err("fio: job <%s> has write or trim bit set, but"
+			" fio is in read-only mode\n", td->o.name);
 		return 1;
 	}
 
diff --git a/t/jobs/readonly-r.fio b/t/jobs/readonly-r.fio
new file mode 100644
index 0000000..34ba9b5
--- /dev/null
+++ b/t/jobs/readonly-r.fio
@@ -0,0 +1,5 @@
+[test]
+filename=${DUT}
+rw=randread
+time_based
+runtime=1s
diff --git a/t/jobs/readonly-t.fio b/t/jobs/readonly-t.fio
new file mode 100644
index 0000000..f3e093c
--- /dev/null
+++ b/t/jobs/readonly-t.fio
@@ -0,0 +1,5 @@
+[test]
+filename=${DUT}
+rw=randtrim
+time_based
+runtime=1s
diff --git a/t/jobs/readonly-w.fio b/t/jobs/readonly-w.fio
new file mode 100644
index 0000000..26029ef
--- /dev/null
+++ b/t/jobs/readonly-w.fio
@@ -0,0 +1,5 @@
+[test]
+filename=${DUT}
+rw=randwrite
+time_based
+runtime=1s
diff --git a/t/readonly.sh b/t/readonly.sh
new file mode 100755
index 0000000..d709414
--- /dev/null
+++ b/t/readonly.sh
@@ -0,0 +1,84 @@
+#!/bin/bash
+#
+# Do some basic test of the --readonly parameter
+#
+# DUT should be a device that accepts read, write, and trim operations
+#
+# Example usage:
+#
+# DUT=/dev/fioa t/readonly.sh
+#
+TESTNUM=1
+
+#
+# The first parameter is the return code
+# The second parameter is 0        if the return code should be 0
+#                         positive if the return code should be positive
+#
+check () {
+	echo "********************"
+
+	if [ $2 -gt 0 ]; then
+		if [ $1 -eq 0 ]; then
+			echo "Test $TESTNUM failed"
+			echo "********************"
+			exit 1
+		else
+			echo "Test $TESTNUM passed"
+		fi
+	else
+		if [ $1 -gt 0 ]; then
+			echo "Test $TESTNUM failed"
+			echo "********************"
+			exit 1
+		else
+			echo "Test $TESTNUM passed"
+		fi
+	fi
+
+	echo "********************"
+	echo
+	TESTNUM=$((TESTNUM+1))
+}
+
+./fio --name=test --filename=$DUT --rw=randread  --readonly --time_based --runtime=1s &> /dev/null
+check $? 0
+./fio --name=test --filename=$DUT --rw=randwrite --readonly --time_based --runtime=1s &> /dev/null
+check $? 1
+./fio --name=test --filename=$DUT --rw=randtrim  --readonly --time_based --runtime=1s &> /dev/null
+check $? 1
+
+./fio --name=test --filename=$DUT --readonly --rw=randread  --time_based --runtime=1s &> /dev/null
+check $? 0
+./fio --name=test --filename=$DUT --readonly --rw=randwrite --time_based --runtime=1s &> /dev/null
+check $? 1
+./fio --name=test --filename=$DUT --readonly --rw=randtrim  --time_based --runtime=1s &> /dev/null
+check $? 1
+
+./fio --name=test --filename=$DUT --rw=randread  --time_based --runtime=1s &> /dev/null
+check $? 0
+./fio --name=test --filename=$DUT --rw=randwrite --time_based --runtime=1s &> /dev/null
+check $? 0
+./fio --name=test --filename=$DUT --rw=randtrim  --time_based --runtime=1s &> /dev/null
+check $? 0
+
+./fio t/jobs/readonly-r.fio --readonly &> /dev/null
+check $? 0
+./fio t/jobs/readonly-w.fio --readonly &> /dev/null
+check $? 1
+./fio t/jobs/readonly-t.fio --readonly &> /dev/null
+check $? 1
+
+./fio --readonly t/jobs/readonly-r.fio &> /dev/null
+check $? 0
+./fio --readonly t/jobs/readonly-w.fio &> /dev/null
+check $? 1
+./fio --readonly t/jobs/readonly-t.fio &> /dev/null
+check $? 1
+
+./fio t/jobs/readonly-r.fio &> /dev/null
+check $? 0
+./fio t/jobs/readonly-w.fio &> /dev/null
+check $? 0
+./fio t/jobs/readonly-t.fio &> /dev/null
+check $? 0
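
In `fixup_options()`, switching from `ret = warnings_fatal` to `ret |= warnings_fatal` matters because each later check used to assign `ret`. A non-fatal warning could therefore overwrite an earlier hard error, such as the new `--readonly` conflict, with 0. The sketch below reproduces that in isolation. `check_opts()`, `struct opts` and its fields are hypothetical. `warnings_fatal` is modelled as a plain global that is 0 unless warnings should abort the run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_fatal;	/* 0 unless warnings should abort the run */

/* Hypothetical subset of the job options that fixup_options() inspects. */
struct opts {
	bool read_only;
	bool writes;
	bool trims;
	bool time_based;
	unsigned long long timeout;
};

/*
 * Returns non-zero if the job must not run. With accumulate == false each
 * check overwrites 'ret' (the old behaviour); with true the results are
 * OR'ed together, so an earlier hard error is never lost.
 */
static int check_opts(struct opts *o, bool accumulate)
{
	int ret = 0;

	assert(o != NULL);
	if (o->read_only && (o->writes || o->trims)) {
		fprintf(stderr, "trim and write operations are not allowed"
			" with --readonly\n");
		ret |= 1;	/* hard error */
	}

	if (!o->timeout && o->time_based) {
		fprintf(stderr, "time_based requires a runtime\n");
		o->time_based = false;
		if (accumulate)
			ret |= warnings_fatal;
		else
			ret = warnings_fatal;	/* clobbers the hard error */
	}

	return ret;
}
```

Take a read-only job that also asks for writes and sets `time_based` without a runtime. The overwriting variant returns 0 and the job would proceed, while the OR'ed variant returns 1.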

* Recent changes (master)
@ 2018-06-16 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-16 12:00 UTC
  To: fio

The following changes since commit cdfb5a85d9743fb53f4a2b56a392e0897a333568:

  rand: make randX_upto() do the end value increment (2018-06-12 11:43:29 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to af84cd66149507424814cf9c0b4950f4cf66e3b7:

  client: close dup'ed descriptor if fdopen() fails (2018-06-15 09:18:04 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      client: check return of dup(2)
      client: close dup'ed descriptor if fdopen() fails

Tomohiro Kusumi (1):
      client: parse env variables before sending job-file contents to server

 client.c  | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fio.h     |  1 +
 options.c |  4 ++--
 3 files changed, 60 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 96cc35e..2a86ea9 100644
--- a/client.c
+++ b/client.c
@@ -118,6 +118,58 @@ static int read_data(int fd, void *data, size_t size)
 	return 0;
 }
 
+static int read_ini_data(int fd, void *data, size_t size)
+{
+	char *p = data;
+	int ret = 0;
+	FILE *fp;
+	int dupfd;
+
+	dupfd = dup(fd);
+	if (dupfd < 0)
+		return errno;
+
+	fp = fdopen(dupfd, "r");
+	if (!fp) {
+		ret = errno;
+		close(dupfd);
+		goto out;
+	}
+
+	while (1) {
+		ssize_t len;
+		char buf[OPT_LEN_MAX+1], *sub;
+
+		if (!fgets(buf, sizeof(buf), fp)) {
+			if (ferror(fp)) {
+				if (errno == EAGAIN || errno == EINTR)
+					continue;
+				ret = errno;
+			}
+			break;
+		}
+
+		sub = fio_option_dup_subs(buf);
+		len = strlen(sub);
+		if (len + 1 > size) {
+			log_err("fio: no space left to read data\n");
+			free(sub);
+			ret = ENOSPC;
+			break;
+		}
+
+		memcpy(p, sub, len);
+		free(sub);
+		p += len;
+		*p = '\0';
+		size -= len;
+	}
+
+	fclose(fp);
+out:
+	return ret;
+}
+
 static void fio_client_json_init(void)
 {
 	char time_buf[32];
@@ -763,13 +815,17 @@ static int __fio_client_send_local_ini(struct fio_client *client,
 		return ret;
 	}
 
+	/*
+	 * Add extra space for variable expansion, but doesn't guarantee.
+	 */
+	sb.st_size += OPT_LEN_MAX;
 	p_size = sb.st_size + sizeof(*pdu);
 	pdu = malloc(p_size);
 	buf = pdu->buf;
 
 	len = sb.st_size;
 	p = buf;
-	if (read_data(fd, p, len)) {
+	if (read_ini_data(fd, p, len)) {
 		log_err("fio: failed reading job file %s\n", filename);
 		close(fd);
 		free(pdu);
diff --git a/fio.h b/fio.h
index 9f3140a..9727f6c 100644
--- a/fio.h
+++ b/fio.h
@@ -567,6 +567,7 @@ extern void fio_fill_default_options(struct thread_data *);
 extern int fio_show_option_help(const char *);
 extern void fio_options_set_ioengine_opts(struct option *long_options, struct thread_data *td);
 extern void fio_options_dup_and_init(struct option *);
+extern char *fio_option_dup_subs(const char *);
 extern void fio_options_mem_dupe(struct thread_data *);
 extern void td_fill_rand_seeds(struct thread_data *);
 extern void td_fill_verify_state_seed(struct thread_data *);
diff --git a/options.c b/options.c
index 9fbcd96..0c4f89c 100644
--- a/options.c
+++ b/options.c
@@ -4790,7 +4790,7 @@ static char *bc_calc(char *str)
  * substitution always occurs, even if VARNAME is empty or the corresponding
  * environment variable undefined.
  */
-static char *option_dup_subs(const char *opt)
+char *fio_option_dup_subs(const char *opt)
 {
 	char out[OPT_LEN_MAX+1];
 	char in[OPT_LEN_MAX+1];
@@ -4895,7 +4895,7 @@ static char **dup_and_sub_options(char **opts, int num_opts)
 	int i;
 	char **opts_copy = malloc(num_opts * sizeof(*opts));
 	for (i = 0; i < num_opts; i++) {
-		opts_copy[i] = option_dup_subs(opts[i]);
+		opts_copy[i] = fio_option_dup_subs(opts[i]);
 		if (!opts_copy[i])
 			continue;
 		opts_copy[i] = fio_keyword_replace(opts_copy[i]);
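
`read_ini_data()` copies the job file one line at a time and expands each line with `fio_option_dup_subs()`. It fails with `ENOSPC` once the expanded text no longer fits. The caller pads the buffer by `OPT_LEN_MAX` because expansion can make the text longer than `st_size`. A self-contained sketch of that bounded loop follows. `expand_line()` is a hypothetical toy stand-in that only expands `${NAME}` from the environment, substituting nothing when the variable is unset. `LINE_MAX_LEN` stands in for `OPT_LEN_MAX`. The sketch also reads from any `FILE *` rather than a `dup()`ed descriptor.

```c
#define _POSIX_C_SOURCE 200809L	/* fmemopen(), setenv() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 4096	/* stand-in for fio's OPT_LEN_MAX */

/* Toy substitution: each ${NAME} becomes getenv("NAME"), or nothing. */
static char *expand_line(const char *in)
{
	char *out = malloc(LINE_MAX_LEN + 1);
	size_t o = 0;

	if (!out)
		return NULL;
	while (*in && o < LINE_MAX_LEN) {
		const char *end;

		if (in[0] == '$' && in[1] == '{' && (end = strchr(in + 2, '}'))) {
			char name[256];
			size_t n = (size_t)(end - (in + 2));
			const char *val;

			if (n >= sizeof(name))
				n = sizeof(name) - 1;
			memcpy(name, in + 2, n);
			name[n] = '\0';
			val = getenv(name);
			while (val && *val && o < LINE_MAX_LEN)
				out[o++] = *val++;
			in = end + 1;
		} else {
			out[o++] = *in++;
		}
	}
	out[o] = '\0';
	return out;
}

/*
 * Append every expanded line from fp to buf, whose capacity 'size' includes
 * the terminating NUL. Returns 0, ENOSPC, ENOMEM or EIO.
 */
static int read_expanded(FILE *fp, char *buf, size_t size)
{
	char line[LINE_MAX_LEN + 1];
	char *p = buf;

	assert(buf != NULL);
	if (size == 0)
		return ENOSPC;
	*p = '\0';
	while (fgets(line, sizeof(line), fp)) {
		char *sub = expand_line(line);
		size_t len;

		if (!sub)
			return ENOMEM;
		len = strlen(sub);
		if (len + 1 > size) {	/* same bound check as the patch */
			free(sub);
			return ENOSPC;
		}
		memcpy(p, sub, len + 1);	/* copies the NUL as well */
		free(sub);
		p += len;
		size -= len;
	}
	return ferror(fp) ? EIO : 0;
}
```

As in the patch, `size` shrinks by each line's length, so a later line that expands past the remaining space is rejected rather than truncated.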

* Recent changes (master)
@ 2018-06-13 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-13 12:00 UTC
  To: fio

The following changes since commit 49050b56681b17968256702a7a3ec0f545c7dad8:

  Fix start delay being the same across threads (2018-06-11 20:02:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cdfb5a85d9743fb53f4a2b56a392e0897a333568:

  rand: make randX_upto() do the end value increment (2018-06-12 11:43:29 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      rand: add rand64_between()
      rand: cleanup rand_between() and helpers
      init: kill get_rand_start_delay()
      init: use o-> instead of td->o
      rand: ensure that rand_between() can reach max value
      rand: make randX_upto() do the end value increment

 engines/sync.c |  2 +-
 init.c         | 20 +++-----------------
 io_u.c         | 10 +++++-----
 lib/rand.h     | 31 ++++++++++++++++++++++++++-----
 t/gen-rand.c   |  2 +-
 5 files changed, 36 insertions(+), 29 deletions(-)

---

Diff of recent changes:

diff --git a/engines/sync.c b/engines/sync.c
index 3f36da8..b3e1c9d 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -150,7 +150,7 @@ static enum fio_q_status fio_pvsyncio2_queue(struct thread_data *td,
 	fio_ro_check(td, io_u);
 
 	if (o->hipri &&
-	    (rand32_between(&sd->rand_state, 1, 100) <= o->hipri_percentage))
+	    (rand_between(&sd->rand_state, 1, 100) <= o->hipri_percentage))
 		flags |= RWF_HIPRI;
 
 	iov->iov_base = io_u->xfer_buf;
diff --git a/init.c b/init.c
index e25e5e4..353c99b 100644
--- a/init.c
+++ b/init.c
@@ -574,22 +574,6 @@ static int fixed_block_size(struct thread_options *o)
 		o->min_bs[DDIR_READ] == o->min_bs[DDIR_TRIM];
 }
 
-
-static unsigned long long get_rand_start_delay(struct thread_data *td)
-{
-	unsigned long long delayrange;
-	uint64_t r, frand_max;
-
-	delayrange = td->o.start_delay_high - td->o.start_delay;
-
-	frand_max = rand_max(&td->delay_state);
-	r = __rand(&td->delay_state);
-	delayrange = (unsigned long long) ((double) delayrange * (r / (frand_max + 1.0)));
-
-	delayrange += td->o.start_delay_orig;
-	return delayrange;
-}
-
 /*
  * <3 Johannes
  */
@@ -687,7 +671,9 @@ static int fixup_options(struct thread_data *td)
 	if (o->start_delay_high) {
 		if (!o->start_delay_orig)
 			o->start_delay_orig = o->start_delay;
-		o->start_delay = get_rand_start_delay(td);
+		o->start_delay = rand_between(&td->delay_state,
+						o->start_delay_orig,
+						o->start_delay_high);
 	}
 
 	if (o->norandommap && o->verify != VERIFY_NONE
diff --git a/io_u.c b/io_u.c
index 945aa19..580c414 100644
--- a/io_u.c
+++ b/io_u.c
@@ -168,7 +168,7 @@ bail:
 	/*
 	 * Generate a value, v, between 1 and 100, both inclusive
 	 */
-	v = rand32_between(&td->zone_state, 1, 100);
+	v = rand_between(&td->zone_state, 1, 100);
 
 	/*
 	 * Find our generated table. 'send' is the end block of this zone,
@@ -225,7 +225,7 @@ bail:
 	/*
 	 * Generate a value, v, between 1 and 100, both inclusive
 	 */
-	v = rand32_between(&td->zone_state, 1, 100);
+	v = rand_between(&td->zone_state, 1, 100);
 
 	zsi = &td->zone_state_index[ddir][v - 1];
 	stotal = zsi->size_perc_prev;
@@ -300,7 +300,7 @@ static bool should_do_random(struct thread_data *td, enum fio_ddir ddir)
 	if (td->o.perc_rand[ddir] == 100)
 		return true;
 
-	v = rand32_between(&td->seq_rand_state[ddir], 1, 100);
+	v = rand_between(&td->seq_rand_state[ddir], 1, 100);
 
 	return v <= td->o.perc_rand[ddir];
 }
@@ -589,7 +589,7 @@ static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 {
 	unsigned int v;
 
-	v = rand32_between(&td->rwmix_state, 1, 100);
+	v = rand_between(&td->rwmix_state, 1, 100);
 
 	if (v <= td->o.rwmix[DDIR_READ])
 		return DDIR_READ;
@@ -2069,7 +2069,7 @@ static struct frand_state *get_buf_state(struct thread_data *td)
 		return &td->buf_state;
 	}
 
-	v = rand32_between(&td->dedupe_state, 1, 100);
+	v = rand_between(&td->dedupe_state, 1, 100);
 
 	if (v <= td->o.dedupe_percentage)
 		return &td->buf_state_prev;
diff --git a/lib/rand.h b/lib/rand.h
index 8832c73..1676cf9 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -114,17 +114,38 @@ static inline double __rand_0_1(struct frand_state *state)
 	}
 }
 
-/*
- * Generate a random value between 'start' and 'end', both inclusive
- */
-static inline int rand32_between(struct frand_state *state, int start, int end)
+static inline uint32_t rand32_upto(struct frand_state *state, uint32_t end)
 {
 	uint32_t r;
 
 	assert(!state->use64);
 
 	r = __rand32(&state->state32);
-	return start + (int) ((double)end * (r / (FRAND32_MAX + 1.0)));
+	end++;
+	return (int) ((double)end * (r / (FRAND32_MAX + 1.0)));
+}
+
+static inline uint64_t rand64_upto(struct frand_state *state, uint64_t end)
+{
+	uint64_t r;
+
+	assert(state->use64);
+
+	r = __rand64(&state->state64);
+	end++;
+	return (uint64_t) ((double)end * (r / (FRAND64_MAX + 1.0)));
+}
+
+/*
+ * Generate a random value between 'start' and 'end', both inclusive
+ */
+static inline uint64_t rand_between(struct frand_state *state, uint64_t start,
+				    uint64_t end)
+{
+	if (state->use64)
+		return start + rand64_upto(state, end - start);
+	else
+		return start + rand32_upto(state, end - start);
 }
 
 extern void init_rand(struct frand_state *, bool);
diff --git a/t/gen-rand.c b/t/gen-rand.c
index c379053..b050bd7 100644
--- a/t/gen-rand.c
+++ b/t/gen-rand.c
@@ -34,7 +34,7 @@ int main(int argc, char *argv[])
 	init_rand(&s, false);
 
 	for (i = 0; i < nvalues; i++) {
-		int v = rand32_between(&s, start, end);
+		int v = rand_between(&s, start, end);
 
 		buckets[v - start]++;
 	}
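
Checked with concrete numbers, the old `rand32_between()` scaled by `end` rather than by the width of the interval. It therefore produced the intended range only when `start` was 1. The sketch below compares the two mappings for a given raw 32-bit draw `r`. `old_between()`, `upto()` and `new_between()` are hypothetical names. It also assumes that `FRAND32_MAX` equals `UINT32_MAX`.

```c
#include <assert.h>
#include <stdint.h>

#define RAND32_MAX_SKETCH UINT32_MAX	/* assumed value of FRAND32_MAX */

/* Pre-fix mapping: start + end * u, with u = r / 2^32 in [0, 1). */
static int old_between(uint32_t r, int start, int end)
{
	return start + (int) ((double) end * (r / (RAND32_MAX_SKETCH + 1.0)));
}

/* Post-fix rand32_upto(): scale by end + 1 so 'end' itself is reachable. */
static uint32_t upto(uint32_t r, uint32_t end)
{
	end++;
	return (uint32_t) ((double) end * (r / (RAND32_MAX_SKETCH + 1.0)));
}

/* Post-fix rand_between(): shift a draw over [0, end - start]. */
static uint64_t new_between(uint32_t r, uint64_t start, uint64_t end)
{
	assert(start <= end && end - start < UINT32_MAX);	/* end++ must not wrap */
	return start + upto(r, (uint32_t) (end - start));
}
```

For `start = 5, end = 10` the old mapping spans 5..14, while the new one spans exactly 5..10. The `1..100` percentage draws in `io_u.c` give 1..100 under both formulas, which is why those call sites only needed the rename.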

* Recent changes (master)
@ 2018-06-12 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-12 12:00 UTC
  To: fio

The following changes since commit ab5643cb04dac549b2202231d5c6e33339b7fe7d:

  stat: fix --bandwidth-log segfault (2018-06-08 08:31:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 49050b56681b17968256702a7a3ec0f545c7dad8:

  Fix start delay being the same across threads (2018-06-11 20:02:10 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fix start delay being the same across threads

 init.c           | 20 +++++++++++---------
 thread_options.h |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 9257d47..e25e5e4 100644
--- a/init.c
+++ b/init.c
@@ -578,8 +578,7 @@ static int fixed_block_size(struct thread_options *o)
 static unsigned long long get_rand_start_delay(struct thread_data *td)
 {
 	unsigned long long delayrange;
-	uint64_t frand_max;
-	unsigned long r;
+	uint64_t r, frand_max;
 
 	delayrange = td->o.start_delay_high - td->o.start_delay;
 
@@ -587,7 +586,7 @@ static unsigned long long get_rand_start_delay(struct thread_data *td)
 	r = __rand(&td->delay_state);
 	delayrange = (unsigned long long) ((double) delayrange * (r / (frand_max + 1.0)));
 
-	delayrange += td->o.start_delay;
+	delayrange += td->o.start_delay_orig;
 	return delayrange;
 }
 
@@ -685,8 +684,11 @@ static int fixup_options(struct thread_data *td)
 	if (!o->file_size_high)
 		o->file_size_high = o->file_size_low;
 
-	if (o->start_delay_high)
+	if (o->start_delay_high) {
+		if (!o->start_delay_orig)
+			o->start_delay_orig = o->start_delay;
 		o->start_delay = get_rand_start_delay(td);
+	}
 
 	if (o->norandommap && o->verify != VERIFY_NONE
 	    && !fixed_block_size(o))  {
@@ -1456,6 +1458,11 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		}
 	}
 
+	if (setup_random_seeds(td)) {
+		td_verror(td, errno, "setup_random_seeds");
+		goto err;
+	}
+
 	if (fixup_options(td))
 		goto err;
 
@@ -1511,11 +1518,6 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->groupid = groupid;
 	prev_group_jobs++;
 
-	if (setup_random_seeds(td)) {
-		td_verror(td, errno, "setup_random_seeds");
-		goto err;
-	}
-
 	if (setup_rate(td))
 		goto err;
 
diff --git a/thread_options.h b/thread_options.h
index 52026e3..8d13b79 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -172,6 +172,7 @@ struct thread_options {
 	unsigned int fdatasync_blocks;
 	unsigned int barrier_blocks;
 	unsigned long long start_delay;
+	unsigned long long start_delay_orig;
 	unsigned long long start_delay_high;
 	unsigned long long timeout;
 	unsigned long long ramp_time;
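
This change does two things. It seeds the random state with `setup_random_seeds()` before `fixup_options()` uses `delay_state`. It also remembers the user's lower bound in `start_delay_orig`, so re-running the fixup on options whose `start_delay` was already randomized still starts from the original bound. The sketch below isolates the save-the-original pattern. `struct delay_opts`, `pick()` and `fixup_delay()` are hypothetical, and `pick()` is a deterministic stand-in for `rand_between()` driven by a caller-supplied fraction. Like the 2018-06-13 change in this archive, it draws directly from `[start_delay_orig, start_delay_high]`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of thread_options used by the start-delay fixup. */
struct delay_opts {
	unsigned long long start_delay;		/* low end, later the chosen value */
	unsigned long long start_delay_orig;	/* user's low end, saved once */
	unsigned long long start_delay_high;	/* 0 means "no range given" */
	bool orig_saved;
};

/* Deterministic stand-in for rand_between(): frac in [0, 1) picks the value. */
static unsigned long long pick(double frac, unsigned long long lo,
			       unsigned long long hi)
{
	assert(frac >= 0.0 && frac < 1.0 && lo <= hi);
	return lo + (unsigned long long) ((double) (hi - lo + 1) * frac);
}

/* Randomize start_delay, always from the user's original range. */
static void fixup_delay(struct delay_opts *o, double frac)
{
	if (!o->start_delay_high)
		return;
	if (!o->orig_saved) {
		o->start_delay_orig = o->start_delay;
		o->orig_saved = true;
	}
	o->start_delay = pick(frac, o->start_delay_orig, o->start_delay_high);
}
```

The sketch tracks the saved bound with an explicit `orig_saved` flag rather than testing `start_delay_orig` for zero. A lower bound of 0 is therefore also preserved across repeated fixups.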

* Recent changes (master)
@ 2018-06-09 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-09 12:00 UTC
  To: fio

The following changes since commit 7ceefc0914b9ff721047b4dc8be54625d43f577c:

  idle-prof: Fix segment fault issue when run with '--idle-prof' and multiple output format normal,json (2018-06-07 20:27:25 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ab5643cb04dac549b2202231d5c6e33339b7fe7d:

  stat: fix --bandwidth-log segfault (2018-06-08 08:31:20 -0600)

----------------------------------------------------------------
Igor Konopko (1):
      stat: fix --bandwidth-log segfault

 stat.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 58edf04..d5240d9 100644
--- a/stat.c
+++ b/stat.c
@@ -2207,12 +2207,14 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 	 * submissions, flag 'td' as needing a log regrow and we'll take
 	 * care of it on the submission side.
 	 */
-	if (iolog->td->o.io_submit_mode == IO_MODE_OFFLOAD ||
+	if ((iolog->td && iolog->td->o.io_submit_mode == IO_MODE_OFFLOAD) ||
 	    !per_unit_log(iolog))
 		return regrow_log(iolog);
 
-	iolog->td->flags |= TD_F_REGROW_LOGS;
-	assert(iolog->pending->nr_samples < iolog->pending->max_samples);
+	if (iolog->td)
+		iolog->td->flags |= TD_F_REGROW_LOGS;
+	if (iolog->pending)
+		assert(iolog->pending->nr_samples < iolog->pending->max_samples);
 	return iolog->pending;
 }
 

* Recent changes (master)
@ 2018-06-08 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-08 12:00 UTC
  To: fio

The following changes since commit a43f4461d5716c4c46018dd45cecf0a896d05dbd:

  Fix variable shadowing (2018-06-05 13:44:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7ceefc0914b9ff721047b4dc8be54625d43f577c:

  idle-prof: Fix segment fault issue when run with '--idle-prof' and multiple output format normal,json (2018-06-07 20:27:25 -0600)

----------------------------------------------------------------
Friendy.Su@sony.com (1):
      idle-prof: Fix segment fault issue when run with '--idle-prof' and multiple output format normal,json

 idletime.c | 8 +-------
 idletime.h | 2 ++
 stat.c     | 2 ++
 3 files changed, 5 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/idletime.c b/idletime.c
index 8762c85..2f59f51 100644
--- a/idletime.c
+++ b/idletime.c
@@ -397,7 +397,7 @@ static double fio_idle_prof_cpu_stat(int cpu)
 	return p * 100.0;
 }
 
-static void fio_idle_prof_cleanup(void)
+void fio_idle_prof_cleanup(void)
 {
 	if (ipc.ipts) {
 		free(ipc.ipts);
@@ -471,10 +471,6 @@ void show_idle_prof_stats(int output, struct json_object *parent,
 			log_buf(out, " stddev=%3.2f\n", ipc.cali_stddev);
 		}
 
-		/* dynamic mem allocations can now be freed */
-		if (ipc.opt != IDLE_PROF_OPT_NONE)
-			fio_idle_prof_cleanup();
-
 		return;
 	}
 
@@ -498,7 +494,5 @@ void show_idle_prof_stats(int output, struct json_object *parent,
 
 		json_object_add_value_float(tmp, "unit_mean", ipc.cali_mean);
 		json_object_add_value_float(tmp, "unit_stddev", ipc.cali_stddev);
-		
-		fio_idle_prof_cleanup();
 	}
 }
diff --git a/idletime.h b/idletime.h
index 6c1161a..91ca95f 100644
--- a/idletime.h
+++ b/idletime.h
@@ -58,4 +58,6 @@ extern void fio_idle_prof_stop(void);
 
 extern void show_idle_prof_stats(int, struct json_object *, struct buf_output *);
 
+extern void fio_idle_prof_cleanup(void);
+
 #endif
diff --git a/stat.c b/stat.c
index 5b49ddd..58edf04 100644
--- a/stat.c
+++ b/stat.c
@@ -1934,6 +1934,8 @@ void __show_run_stats(void)
 		buf_output_free(out);
 	}
 
+	fio_idle_prof_cleanup();
+
 	log_info_flush();
 	free(runstats);
 	free(threadstats);

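The patch suggests that with --output-format=normal,json, show_idle_prof_stats() runs once per format, so freeing the idle-profiling buffers inside it left the second call reading freed memory. The fix frees them once from __show_run_stats(), after every format has been printed. Below is a minimal sketch of that ordering. The globals and function names are hypothetical stand-ins for fio's 'ipc' state.

```c
#include <stdlib.h>

/* Hypothetical model of the idle-profiling state: a heap buffer that
 * every output format reads from. */
static double *ipts;
static int nr_cpus;

static int idle_prof_init(int cpus)
{
	ipts = calloc((size_t)cpus, sizeof(*ipts));
	if (!ipts)
		return -1;
	nr_cpus = cpus;
	for (int i = 0; i < cpus; i++)
		ipts[i] = 10.0 * (i + 1);
	return 0;
}

/* Called once per requested output format; it only reads the state. */
static double show_idle_prof_stats(void)
{
	double sum = 0.0;

	for (int i = 0; i < nr_cpus; i++)
		sum += ipts[i];
	return sum;
}

/* Called exactly once, after all formats are printed. Resetting the
 * pointer also makes an accidental second call harmless. */
static void idle_prof_cleanup(void)
{
	free(ipts);
	ipts = NULL;
	nr_cpus = 0;
}
```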
* Recent changes (master)
@ 2018-06-06 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-06 12:00 UTC
  To: fio

The following changes since commit 182cab24328ab22a5b76ba1edac2c801b83aeffb:

  Fix issue with rate_process=poisson and ramp time (2018-06-04 13:51:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a43f4461d5716c4c46018dd45cecf0a896d05dbd:

  Fix variable shadowing (2018-06-05 13:44:47 -0600)

----------------------------------------------------------------
Bart Van Assche (4):
      Remove show_run_stats() because it has no callers
      Rename appveyor.yml into .appveyor.yml
      Suppress gcc 8 compiler warnings
      Make nowarn_snprintf() call va_end()

Jens Axboe (14):
      Merge branch 'remove-show-run-stats' of https://github.com/bvanassche/fio
      Merge branch 'appveyor' of https://github.com/bvanassche/fio
      Merge branch 'suppress-gcc-8-warnings' of https://github.com/bvanassche/fio
      Move nowarn_snprintf.h to lib/
      Merge branch 'make-nowarn-snprintf-call-va-end' of https://github.com/bvanassche/fio
      fallocate: use 'offset' parameter
      Cleanup should_check_rate()
      x86: don't need 'level' passed to amd/intel init functions
      gettime: remove 'is_thread' variable to local clock init
      log: fix signedness issue
      options: remove unused function parameters
      client: remove 'cmd' argument to stop handler
      iolog: remove 'td' from trim_io_piece()
      Fix variable shadowing

 appveyor.yml => .appveyor.yml |  0
 arch/arch-x86-common.h        |  8 ++++----
 backend.c                     | 24 ++++++++++--------------
 client.c                      |  6 +++---
 client.h                      |  3 ++-
 diskutil.c                    | 13 +++++++++----
 fio.h                         | 18 ++++++++----------
 gclient.c                     |  2 +-
 gettime.c                     |  4 ++--
 gettime.h                     |  2 +-
 iolog.c                       | 12 ++++++------
 iolog.h                       |  2 +-
 lib/nowarn_snprintf.h         | 27 +++++++++++++++++++++++++++
 log.c                         | 10 +++++-----
 log.h                         |  4 ++--
 options.c                     | 14 +++++---------
 os/os-linux.h                 |  2 +-
 parse.c                       |  4 ++--
 server.c                      |  6 +++---
 stat.c                        | 13 +++----------
 stat.h                        |  1 -
 verify-state.h                |  3 ++-
 22 files changed, 97 insertions(+), 81 deletions(-)
 rename appveyor.yml => .appveyor.yml (100%)
 create mode 100644 lib/nowarn_snprintf.h

---

Diff of recent changes:

diff --git a/.appveyor.yml b/.appveyor.yml
new file mode 100644
index 0000000..ca8b2ab
--- /dev/null
+++ b/.appveyor.yml
@@ -0,0 +1,30 @@
+clone_depth: 1 # NB: this stops FIO-VERSION-GEN making tag based versions
+
+environment:
+  CYG_MIRROR: http://cygwin.mirror.constant.com
+  CYG_ROOT: C:\cygwin64
+  MAKEFLAGS: -j 2
+  matrix:
+    - platform: x64
+      PACKAGE_ARCH: x86_64
+      CONFIGURE_OPTIONS:
+    - platform: x86
+      PACKAGE_ARCH: i686
+      CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
+
+install:
+  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NUL'
+  - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
+
+build_script:
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
+
+after_build:
+  - cd os\windows && dobuild.cmd %PLATFORM%
+
+test_script:
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
+
+artifacts:
+  - path: os\windows\*.msi
+    name: msi
diff --git a/appveyor.yml b/appveyor.yml
deleted file mode 100644
index ca8b2ab..0000000
--- a/appveyor.yml
+++ /dev/null
@@ -1,30 +0,0 @@
-clone_depth: 1 # NB: this stops FIO-VERSION-GEN making tag based versions
-
-environment:
-  CYG_MIRROR: http://cygwin.mirror.constant.com
-  CYG_ROOT: C:\cygwin64
-  MAKEFLAGS: -j 2
-  matrix:
-    - platform: x64
-      PACKAGE_ARCH: x86_64
-      CONFIGURE_OPTIONS:
-    - platform: x86
-      PACKAGE_ARCH: i686
-      CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
-
-install:
-  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NUL'
-  - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
-
-build_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
-
-after_build:
-  - cd os\windows && dobuild.cmd %PLATFORM%
-
-test_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
-
-artifacts:
-  - path: os\windows\*.msi
-    name: msi
diff --git a/arch/arch-x86-common.h b/arch/arch-x86-common.h
index c51c04c..5140f23 100644
--- a/arch/arch-x86-common.h
+++ b/arch/arch-x86-common.h
@@ -17,7 +17,7 @@ static inline void cpuid(unsigned int op,
 extern bool tsc_reliable;
 extern int arch_random;
 
-static inline void arch_init_intel(unsigned int level)
+static inline void arch_init_intel(void)
 {
 	unsigned int eax, ebx, ecx = 0, edx;
 
@@ -44,7 +44,7 @@ static inline void arch_init_intel(unsigned int level)
 	arch_random = (ecx & (1U << 30)) != 0;
 }
 
-static inline void arch_init_amd(unsigned int level)
+static inline void arch_init_amd(void)
 {
 	unsigned int eax, ebx, ecx, edx;
 
@@ -69,9 +69,9 @@ static inline void arch_init(char *envp[])
 
 	str[12] = '\0';
 	if (!strcmp(str, "GenuineIntel"))
-		arch_init_intel(level);
+		arch_init_intel();
 	else if (!strcmp(str, "AuthenticAMD"))
-		arch_init_amd(level);
+		arch_init_amd();
 }
 
 #endif
diff --git a/backend.c b/backend.c
index 180348f..c52eba5 100644
--- a/backend.c
+++ b/backend.c
@@ -432,9 +432,7 @@ static int wait_for_completions(struct thread_data *td, struct timespec *time)
 	if ((full && !min_evts) || !td->o.iodepth_batch_complete_min)
 		min_evts = 1;
 
-	if (time && (__should_check_rate(td, DDIR_READ) ||
-	    __should_check_rate(td, DDIR_WRITE) ||
-	    __should_check_rate(td, DDIR_TRIM)))
+	if (time && __should_check_rate(td))
 		fio_gettime(time, NULL);
 
 	do {
@@ -463,7 +461,7 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 				*bytes_issued += bytes;
 
 			if (!from_verify)
-				trim_io_piece(td, io_u);
+				trim_io_piece(io_u);
 
 			/*
 			 * zero read, fail
@@ -489,9 +487,7 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 			requeue_io_u(td, &io_u);
 		} else {
 sync_done:
-			if (comp_time && (__should_check_rate(td, DDIR_READ) ||
-			    __should_check_rate(td, DDIR_WRITE) ||
-			    __should_check_rate(td, DDIR_TRIM)))
+			if (comp_time && __should_check_rate(td))
 				fio_gettime(comp_time, NULL);
 
 			*ret = io_u_sync_complete(td, io_u);
@@ -1038,7 +1034,7 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 
 		if (td->o.io_submit_mode == IO_MODE_OFFLOAD) {
 			const unsigned long blen = io_u->xfer_buflen;
-			const enum fio_ddir ddir = acct_ddir(io_u);
+			const enum fio_ddir __ddir = acct_ddir(io_u);
 
 			if (td->error)
 				break;
@@ -1046,14 +1042,14 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 			workqueue_enqueue(&td->io_wq, &io_u->work);
 			ret = FIO_Q_QUEUED;
 
-			if (ddir_rw(ddir)) {
-				td->io_issues[ddir]++;
-				td->io_issue_bytes[ddir] += blen;
-				td->rate_io_issue_bytes[ddir] += blen;
+			if (ddir_rw(__ddir)) {
+				td->io_issues[__ddir]++;
+				td->io_issue_bytes[__ddir] += blen;
+				td->rate_io_issue_bytes[__ddir] += blen;
 			}
 
 			if (should_check_rate(td))
-				td->rate_next_io_time[ddir] = usec_for_io(td, ddir);
+				td->rate_next_io_time[__ddir] = usec_for_io(td, __ddir);
 
 		} else {
 			ret = io_u_submit(td, io_u);
@@ -1533,7 +1529,7 @@ static void *thread_main(void *data)
 	} else
 		td->pid = gettid();
 
-	fio_local_clock_init(o->use_thread);
+	fio_local_clock_init();
 
 	dprint(FD_PROCESS, "jobs pid=%d started\n", (int) td->pid);
 
diff --git a/client.c b/client.c
index ea1a4d2..96cc35e 100644
--- a/client.c
+++ b/client.c
@@ -28,7 +28,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd);
 static void handle_gs(struct fio_client *client, struct fio_net_cmd *cmd);
 static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd);
 static void handle_text(struct fio_client *client, struct fio_net_cmd *cmd);
-static void handle_stop(struct fio_client *client, struct fio_net_cmd *cmd);
+static void handle_stop(struct fio_client *client);
 static void handle_start(struct fio_client *client, struct fio_net_cmd *cmd);
 
 static void convert_text(struct fio_net_cmd *cmd);
@@ -1441,7 +1441,7 @@ static void handle_start(struct fio_client *client, struct fio_net_cmd *cmd)
 	sum_stat_clients += client->nr_stat;
 }
 
-static void handle_stop(struct fio_client *client, struct fio_net_cmd *cmd)
+static void handle_stop(struct fio_client *client)
 {
 	if (client->error)
 		log_info("client <%s>: exited with error %d\n", client->hostname, client->error);
@@ -1744,7 +1744,7 @@ int fio_handle_client(struct fio_client *client)
 		client->state = Client_stopped;
 		client->error = le32_to_cpu(pdu->error);
 		client->signal = le32_to_cpu(pdu->signal);
-		ops->stop(client, cmd);
+		ops->stop(client);
 		break;
 		}
 	case FIO_NET_CMD_ADD_JOB: {
diff --git a/client.h b/client.h
index 29e84d0..a597449 100644
--- a/client.h
+++ b/client.h
@@ -77,6 +77,7 @@ struct fio_client {
 };
 
 typedef void (client_cmd_op)(struct fio_client *, struct fio_net_cmd *);
+typedef void (client_op)(struct fio_client *);
 typedef void (client_eta_op)(struct jobs_eta *je);
 typedef void (client_timed_out_op)(struct fio_client *);
 typedef void (client_jobs_eta_op)(struct fio_client *client, struct jobs_eta *je);
@@ -95,7 +96,7 @@ struct client_ops {
 	client_cmd_op		*add_job;
 	client_cmd_op		*update_job;
 	client_timed_out_op	*timed_out;
-	client_cmd_op		*stop;
+	client_op		*stop;
 	client_cmd_op		*start;
 	client_cmd_op		*job_start;
 	client_timed_out_op	*removed;
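client.h now carries a second handler type, client_op, for callbacks that only need the client, and ops->stop is switched to it. Note that client_op names a function type rather than a pointer type, so each struct member adds the '*' itself. Below is a self-contained sketch of the same pattern. struct client, struct ops, and the 'stopped' counter are hypothetical simplifications of fio's structures.

```c
#include <stdio.h>

/* Hypothetical, simplified client: only what the stop handler uses. */
struct client { const char *hostname; int error; };

/* A typedef of a function type (not a pointer type), as client.h does. */
typedef void (client_op)(struct client *);

struct ops {
	client_op *stop;	/* handler that needs only the client */
};

static int stopped;

static void handle_stop(struct client *c)
{
	if (c->error)
		printf("client <%s>: exited with error %d\n", c->hostname, c->error);
	stopped++;
}

static const struct ops client_ops = { .stop = handle_stop };
```

Because the unused 'cmd' parameter was dropped from the type, any handler still declared with the old two-argument signature would no longer match the member without a cast.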
diff --git a/diskutil.c b/diskutil.c
index b973120..5b4eb46 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -242,7 +242,8 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		    !strcmp(dirent->d_name, ".."))
 			continue;
 
-		sprintf(temppath, "%s/%s", slavesdir, dirent->d_name);
+		nowarn_snprintf(temppath, sizeof(temppath), "%s/%s", slavesdir,
+				dirent->d_name);
 		/* Can we always assume that the slaves device entries
 		 * are links to the real directories for the slave
 		 * devices?
@@ -255,9 +256,12 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		}
 		slavepath[linklen] = '\0';
 
-		sprintf(temppath, "%s/%s/dev", slavesdir, slavepath);
+		nowarn_snprintf(temppath, sizeof(temppath), "%s/%s/dev",
+				slavesdir, slavepath);
 		if (access(temppath, F_OK) != 0)
-			sprintf(temppath, "%s/%s/device/dev", slavesdir, slavepath);
+			nowarn_snprintf(temppath, sizeof(temppath),
+					"%s/%s/device/dev", slavesdir,
+					slavepath);
 		if (read_block_dev_entry(temppath, &majdev, &mindev)) {
 			perror("Error getting slave device numbers");
 			closedir(dirhandle);
@@ -271,7 +275,8 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		if (slavedu)
 			continue;
 
-		sprintf(temppath, "%s/%s", slavesdir, slavepath);
+		nowarn_snprintf(temppath, sizeof(temppath), "%s/%s", slavesdir,
+				slavepath);
 		__init_per_file_disk_util(td, majdev, mindev, temppath);
 		slavedu = disk_util_exists(majdev, mindev);
 
diff --git a/fio.h b/fio.h
index 4ce4991..9f3140a 100644
--- a/fio.h
+++ b/fio.h
@@ -44,6 +44,7 @@
 #include "io_u_queue.h"
 #include "workqueue.h"
 #include "steadystate.h"
+#include "lib/nowarn_snprintf.h"
 
 #ifdef CONFIG_SOLARISAIO
 #include <sys/asynch.h>
@@ -468,7 +469,9 @@ enum {
 			break;						\
 		(td)->error = ____e;					\
 		if (!(td)->first_error)					\
-			snprintf(td->verror, sizeof(td->verror), "file:%s:%d, func=%s, error=%s", __FILE__, __LINE__, (func), (msg));		\
+			nowarn_snprintf(td->verror, sizeof(td->verror),	\
+					"file:%s:%d, func=%s, error=%s", \
+					__FILE__, __LINE__, (func), (msg)); \
 	} while (0)
 
 
@@ -718,22 +721,17 @@ static inline bool option_check_rate(struct thread_data *td, enum fio_ddir ddir)
 	return false;
 }
 
-static inline bool __should_check_rate(struct thread_data *td,
-				       enum fio_ddir ddir)
+static inline bool __should_check_rate(struct thread_data *td)
 {
 	return (td->flags & TD_F_CHECK_RATE) != 0;
 }
 
 static inline bool should_check_rate(struct thread_data *td)
 {
-	if (__should_check_rate(td, DDIR_READ) && td->bytes_done[DDIR_READ])
-		return true;
-	if (__should_check_rate(td, DDIR_WRITE) && td->bytes_done[DDIR_WRITE])
-		return true;
-	if (__should_check_rate(td, DDIR_TRIM) && td->bytes_done[DDIR_TRIM])
-		return true;
+	if (!__should_check_rate(td))
+		return false;
 
-	return false;
+	return ddir_rw_sum(td->bytes_done) != 0;
 }
 
 static inline unsigned int td_max_bs(struct thread_data *td)
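__should_check_rate() only tests TD_F_CHECK_RATE, and that flag does not depend on the data direction. should_check_rate() can therefore test it once and then ask whether any direction has completed bytes. The sketch below compares both shapes. The enum subset and the ddir_rw_sum() definition are assumptions written to match the intent of fio's helper, not copied from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of fio's data-direction indices. */
enum fio_ddir { DDIR_READ = 0, DDIR_WRITE, DDIR_TRIM, DDIR_RWDIR_CNT };

/* Assumed equivalent of ddir_rw_sum(): add the read, write and trim
 * counters together. */
#define ddir_rw_sum(arr) \
	((arr)[DDIR_READ] + (arr)[DDIR_WRITE] + (arr)[DDIR_TRIM])

/* Old shape: the direction-independent flag is re-tested per direction. */
static bool old_should_check(bool check_rate, const uint64_t *bytes_done)
{
	for (int d = DDIR_READ; d < DDIR_RWDIR_CNT; d++)
		if (check_rate && bytes_done[d])
			return true;
	return false;
}

/* New shape: test the flag once, then whether any direction did I/O. */
static bool new_should_check(bool check_rate, const uint64_t *bytes_done)
{
	if (!check_rate)
		return false;
	return ddir_rw_sum(bytes_done) != 0;
}
```

The two versions agree as long as the unsigned sum does not wrap around to zero, which byte counters will not do in practice.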
diff --git a/gclient.c b/gclient.c
index 5087b6b..bcd7a88 100644
--- a/gclient.c
+++ b/gclient.c
@@ -641,7 +641,7 @@ static void gfio_client_timed_out(struct fio_client *client)
 	gdk_threads_leave();
 }
 
-static void gfio_client_stop(struct fio_client *client, struct fio_net_cmd *cmd)
+static void gfio_client_stop(struct fio_client *client)
 {
 	struct gfio_client *gc = client->client_data;
 
diff --git a/gettime.c b/gettime.c
index 87fc29b..c0f2638 100644
--- a/gettime.c
+++ b/gettime.c
@@ -373,7 +373,7 @@ static int calibrate_cpu_clock(void)
 #endif // ARCH_HAVE_CPU_CLOCK
 
 #ifndef CONFIG_TLS_THREAD
-void fio_local_clock_init(int is_thread)
+void fio_local_clock_init(void)
 {
 	struct tv_valid *t;
 
@@ -389,7 +389,7 @@ static void kill_tv_tls_key(void *data)
 	free(data);
 }
 #else
-void fio_local_clock_init(int is_thread)
+void fio_local_clock_init(void)
 {
 }
 #endif
diff --git a/gettime.h b/gettime.h
index 1c4a25c..f92ee8c 100644
--- a/gettime.h
+++ b/gettime.h
@@ -20,7 +20,7 @@ extern void fio_gtod_init(void);
 extern void fio_clock_init(void);
 extern int fio_start_gtod_thread(void);
 extern int fio_monotonic_clocktest(int debug);
-extern void fio_local_clock_init(int);
+extern void fio_local_clock_init(void);
 
 extern struct timespec *fio_ts;
 
diff --git a/iolog.c b/iolog.c
index 6e44119..5be3e84 100644
--- a/iolog.c
+++ b/iolog.c
@@ -315,7 +315,7 @@ void unlog_io_piece(struct thread_data *td, struct io_u *io_u)
 	td->io_hist_len--;
 }
 
-void trim_io_piece(struct thread_data *td, const struct io_u *io_u)
+void trim_io_piece(const struct io_u *io_u)
 {
 	struct io_piece *ipo = io_u->ipo;
 
@@ -619,12 +619,12 @@ void setup_log(struct io_log **log, struct log_params *p,
 	}
 
 	if (l->td && l->td->o.io_submit_mode != IO_MODE_OFFLOAD) {
-		struct io_logs *p;
+		struct io_logs *__p;
 
-		p = calloc(1, sizeof(*l->pending));
-		p->max_samples = DEF_LOG_ENTRIES;
-		p->log = calloc(p->max_samples, log_entry_sz(l));
-		l->pending = p;
+		__p = calloc(1, sizeof(*l->pending));
+		__p->max_samples = DEF_LOG_ENTRIES;
+		__p->log = calloc(__p->max_samples, log_entry_sz(l));
+		l->pending = __p;
 	}
 
 	if (l->log_offset)
diff --git a/iolog.h b/iolog.h
index 60b4f01..a4e335a 100644
--- a/iolog.h
+++ b/iolog.h
@@ -237,7 +237,7 @@ extern void log_file(struct thread_data *, struct fio_file *, enum file_log_act)
 extern bool __must_check init_iolog(struct thread_data *td);
 extern void log_io_piece(struct thread_data *, struct io_u *);
 extern void unlog_io_piece(struct thread_data *, struct io_u *);
-extern void trim_io_piece(struct thread_data *, const struct io_u *);
+extern void trim_io_piece(const struct io_u *);
 extern void queue_io_piece(struct thread_data *, struct io_piece *);
 extern void prune_io_piece_log(struct thread_data *);
 extern void write_iolog_close(struct thread_data *);
diff --git a/lib/nowarn_snprintf.h b/lib/nowarn_snprintf.h
new file mode 100644
index 0000000..81a6d10
--- /dev/null
+++ b/lib/nowarn_snprintf.h
@@ -0,0 +1,27 @@
+#ifndef _NOWARN_SNPRINTF_H_
+#define _NOWARN_SNPRINTF_H_
+
+#include <stdio.h>
+#include <stdarg.h>
+
+static inline int nowarn_snprintf(char *str, size_t size, const char *format,
+				  ...)
+{
+	va_list args;
+	int res;
+
+	va_start(args, format);
+#if __GNUC__ -0 >= 8
+#pragma GCC diagnostic push "-Wformat-truncation"
+#pragma GCC diagnostic ignored "-Wformat-truncation"
+#endif
+	res = vsnprintf(str, size, format, args);
+#if __GNUC__ -0 >= 8
+#pragma GCC diagnostic pop "-Wformat-truncation"
+#endif
+	va_end(args);
+
+	return res;
+}
+
+#endif
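nowarn_snprintf() only silences GCC 8's -Wformat-truncation at compile time. Truncation can still happen at run time, but because vsnprintf() returns the length the complete string would have had, a caller can still detect it. Below is a minimal sketch of a wrapper built the same way (va_start, vsnprintf, va_end) that reports truncation. The name fmt_fits() is hypothetical and not part of fio.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper built like nowarn_snprintf(): forward the variadic
 * arguments to vsnprintf() and release the va_list with va_end(). It
 * returns true only if the whole formatted string fit into 'str'. */
static bool fmt_fits(char *str, size_t size, const char *format, ...)
{
	va_list args;
	int res;

	va_start(args, format);
	res = vsnprintf(str, size, format, args);
	va_end(args);

	/* vsnprintf() returns the length the full string would have had
	 * (excluding the NUL); a value >= size means it was truncated. */
	return res >= 0 && (size_t)res < size;
}
```

In both cases the output is NUL-terminated when size is non-zero. A truncated path such as the sysfs names built in diskutil.c is therefore safe to read, though it may not be the intended one.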
diff --git a/log.c b/log.c
index 46e5034..6c36813 100644
--- a/log.c
+++ b/log.c
@@ -15,7 +15,7 @@ size_t log_info_buf(const char *buf, size_t len)
 		return 0;
 
 	if (is_backend) {
-		size_t ret = fio_server_text_output(FIO_LOG_INFO, buf, len);
+		ssize_t ret = fio_server_text_output(FIO_LOG_INFO, buf, len);
 		if (ret != -1)
 			return ret;
 	}
@@ -65,10 +65,10 @@ void log_prevalist(int type, const char *fmt, va_list args)
 	free(buf2);
 }
 
-size_t log_info(const char *format, ...)
+ssize_t log_info(const char *format, ...)
 {
 	va_list args;
-	size_t ret;
+	ssize_t ret;
 
 	va_start(args, format);
 	ret = log_valist(format, args);
@@ -102,9 +102,9 @@ int log_info_flush(void)
 	return fflush(f_out);
 }
 
-size_t log_err(const char *format, ...)
+ssize_t log_err(const char *format, ...)
 {
-	size_t ret;
+	ssize_t ret;
 	int len;
 	char *buffer;
 	va_list args;
diff --git a/log.h b/log.h
index 8163f97..b50d448 100644
--- a/log.h
+++ b/log.h
@@ -9,8 +9,8 @@
 extern FILE *f_out;
 extern FILE *f_err;
 
-extern size_t log_err(const char *format, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
-extern size_t log_info(const char *format, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
+extern ssize_t log_err(const char *format, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
+extern ssize_t log_info(const char *format, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
 extern size_t __log_buf(struct buf_output *, const char *format, ...) __attribute__ ((__format__ (__printf__, 2, 3)));
 extern size_t log_valist(const char *str, va_list);
 extern void log_prevalist(int type, const char *str, va_list);
diff --git a/options.c b/options.c
index 047e493..9fbcd96 100644
--- a/options.c
+++ b/options.c
@@ -57,8 +57,7 @@ struct split {
 };
 
 static int split_parse_ddir(struct thread_options *o, struct split *split,
-			    enum fio_ddir ddir, char *str, bool absolute,
-			    unsigned int max_splits)
+			    char *str, bool absolute, unsigned int max_splits)
 {
 	unsigned long long perc;
 	unsigned int i;
@@ -125,7 +124,7 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str, data, BSSPLIT_MAX))
+	if (split_parse_ddir(o, &split, str, data, BSSPLIT_MAX))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -882,7 +881,7 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str, absolute, ZONESPLIT_MAX))
+	if (split_parse_ddir(o, &split, str, absolute, ZONESPLIT_MAX))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -1047,8 +1046,6 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input,
 	}
 
 	if (parse_dryrun()) {
-		int i;
-
 		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 			free(td->o.zone_split[i]);
 			td->o.zone_split[i] = NULL;
@@ -5162,8 +5159,7 @@ struct fio_option *fio_option_find(const char *name)
 	return find_option(fio_options, name);
 }
 
-static struct fio_option *find_next_opt(struct thread_options *o,
-					struct fio_option *from,
+static struct fio_option *find_next_opt(struct fio_option *from,
 					unsigned int off1)
 {
 	struct fio_option *opt;
@@ -5200,7 +5196,7 @@ bool __fio_option_is_set(struct thread_options *o, unsigned int off1)
 	struct fio_option *opt, *next;
 
 	next = NULL;
-	while ((opt = find_next_opt(o, next, off1)) != NULL) {
+	while ((opt = find_next_opt(next, off1)) != NULL) {
 		if (opt_is_set(o, opt))
 			return true;
 
diff --git a/os/os-linux.h b/os/os-linux.h
index a550bba..6b63d12 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -403,7 +403,7 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset,
 				 uint64_t len)
 {
 	int ret;
-	ret = fallocate(f->fd, 0, 0, len);
+	ret = fallocate(f->fd, 0, offset, len);
 	if (ret == 0)
 		return true;
 
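Before this fix, fio_fallocate() passed 0 as the offset, so the range [0, len) was allocated regardless of the offset the caller asked for. The sketch below shows the effect of honouring the offset. It uses POSIX posix_fallocate() instead of the Linux fallocate(2) call in the patch, and reserve_range() is a hypothetical helper. On an empty file, reserving [offset, offset + len) should leave the file offset + len bytes long.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reserve [offset, offset + len) in a fresh temporary file and return the
 * resulting file size, or -1 on error. */
static long long reserve_range(off_t offset, off_t len)
{
	char path[] = "/tmp/falloc-XXXXXX";
	struct stat st;
	long long size = -1;
	int fd;

	/* posix_fallocate() rejects offset < 0 or len <= 0 with EINVAL. */
	assert(offset >= 0 && len > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* posix_fallocate() returns an error number instead of setting errno. */
	if (posix_fallocate(fd, offset, len) == 0 && fstat(fd, &st) == 0)
		size = (long long)st.st_size;

	close(fd);
	return size;
}
```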
diff --git a/parse.c b/parse.c
index 9685f1e..6261fca 100644
--- a/parse.c
+++ b/parse.c
@@ -69,7 +69,7 @@ static void posval_sort(const struct fio_option *o, struct value_pair *vpmap)
 }
 
 static void show_option_range(const struct fio_option *o,
-			      size_t (*logger)(const char *format, ...))
+			      ssize_t (*logger)(const char *format, ...))
 {
 	if (o->type == FIO_OPT_FLOAT_LIST) {
 		const char *sep = "";
@@ -132,7 +132,7 @@ static void show_option_help(const struct fio_option *o, int is_err)
 		"deprecated",
 		"unsupported",
 	};
-	size_t (*logger)(const char *format, ...);
+	ssize_t (*logger)(const char *format, ...);
 
 	if (is_err)
 		logger = log_err;
diff --git a/server.c b/server.c
index 12c8d68..7e7ffed 100644
--- a/server.c
+++ b/server.c
@@ -1594,7 +1594,7 @@ void fio_server_send_gs(struct group_run_stats *rs)
 }
 
 void fio_server_send_job_options(struct flist_head *opt_list,
-				 unsigned int groupid)
+				 unsigned int gid)
 {
 	struct cmd_job_option pdu;
 	struct flist_head *entry;
@@ -1609,12 +1609,12 @@ void fio_server_send_job_options(struct flist_head *opt_list,
 		p = flist_entry(entry, struct print_option, list);
 		memset(&pdu, 0, sizeof(pdu));
 
-		if (groupid == -1U) {
+		if (gid == -1U) {
 			pdu.global = __cpu_to_le16(1);
 			pdu.groupid = 0;
 		} else {
 			pdu.global = 0;
-			pdu.groupid = cpu_to_le32(groupid);
+			pdu.groupid = cpu_to_le32(gid);
 		}
 		len = strlen(p->name);
 		if (len >= sizeof(pdu.name)) {
diff --git a/stat.c b/stat.c
index c89a7f0..5b49ddd 100644
--- a/stat.c
+++ b/stat.c
@@ -1398,7 +1398,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	if (ts->ss_dur) {
 		struct json_object *data;
 		struct json_array *iops, *bw;
-		int i, j, k;
+		int j, k, l;
 		char ss_buf[64];
 
 		snprintf(ss_buf, sizeof(ss_buf), "%s%s:%f%s",
@@ -1434,8 +1434,8 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 			j = ts->ss_head;
 		else
 			j = ts->ss_head == 0 ? ts->ss_dur - 1 : ts->ss_head - 1;
-		for (i = 0; i < ts->ss_dur; i++) {
-			k = (j + i) % ts->ss_dur;
+		for (l = 0; l < ts->ss_dur; l++) {
+			k = (j + l) % ts->ss_dur;
 			json_array_add_value_int(bw, ts->ss_bw_data[k]);
 			json_array_add_value_int(iops, ts->ss_iops_data[k]);
 		}
@@ -1940,13 +1940,6 @@ void __show_run_stats(void)
 	free(opt_lists);
 }
 
-void show_run_stats(void)
-{
-	fio_sem_down(stat_sem);
-	__show_run_stats();
-	fio_sem_up(stat_sem);
-}
-
 void __show_running_run_stats(void)
 {
 	struct thread_data *td;
diff --git a/stat.h b/stat.h
index 8e7bcdb..c5b8185 100644
--- a/stat.h
+++ b/stat.h
@@ -288,7 +288,6 @@ extern struct json_object * show_thread_status(struct thread_stat *ts, struct gr
 extern void show_group_stats(struct group_run_stats *rs, struct buf_output *);
 extern bool calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
-extern void show_run_stats(void);
 extern void __show_run_stats(void);
 extern void __show_running_run_stats(void);
 extern void show_running_run_stats(void);
diff --git a/verify-state.h b/verify-state.h
index 1586f63..6da1585 100644
--- a/verify-state.h
+++ b/verify-state.h
@@ -4,6 +4,7 @@
 #include <stdint.h>
 #include <string.h>
 #include <limits.h>
+#include "lib/nowarn_snprintf.h"
 
 struct thread_rand32_state {
 	uint32_t s[4];
@@ -101,7 +102,7 @@ static inline void verify_state_gen_name(char *out, size_t size,
 		name++;
 	} while (1);
 
-	snprintf(out, size, "%s-%s-%d-verify.state", prefix, ename, num);
+	nowarn_snprintf(out, size, "%s-%s-%d-verify.state", prefix, ename, num);
 	out[size - 1] = '\0';
 }
 

* Recent changes (master)
@ 2018-06-05 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-05 12:00 UTC
  To: fio

The following changes since commit c397ab3aac3d4ed1660d9cd4d820f31a4375306e:

  Fio 3.7 (2018-06-01 13:21:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 182cab24328ab22a5b76ba1edac2c801b83aeffb:

  Fix issue with rate_process=poisson and ramp time (2018-06-04 13:51:47 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fix issue with rate_process=poisson and ramp time

 backend.c | 2 ++
 libfio.c  | 1 +
 2 files changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 033d5a7..180348f 100644
--- a/backend.c
+++ b/backend.c
@@ -892,6 +892,8 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
 			over = (usperop - total) / usperop * -bs;
 
 		td->rate_io_issue_bytes[ddir] += (missed - over);
+		/* adjust for rate_process=poisson */
+		td->last_usec[ddir] += total;
 	}
 }
 
diff --git a/libfio.c b/libfio.c
index 6faf32a..674bc1d 100644
--- a/libfio.c
+++ b/libfio.c
@@ -92,6 +92,7 @@ static void reset_io_counters(struct thread_data *td, int all)
 			td->bytes_done[ddir] = 0;
 			td->rate_io_issue_bytes[ddir] = 0;
 			td->rate_next_io_time[ddir] = 0;
+			td->last_usec[ddir] = 0;
 		}
 	}
 

* Recent changes (master)
@ 2018-06-02 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-02 12:00 UTC
  To: fio

The following changes since commit 80f021501fda6a6244672bb89dd8221a61cee54b:

  io_u: ensure to invalidate cache on time_based random reads (2018-05-31 09:05:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c397ab3aac3d4ed1660d9cd4d820f31a4375306e:

  Fio 3.7 (2018-06-01 13:21:56 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.7

 FIO-VERSION-GEN | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index d2d095b..b28a1f3 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.6
+DEF_VER=fio-3.7
 
 LF='
 '

* Recent changes (master)
@ 2018-06-01 12:00 Jens Axboe
From: Jens Axboe @ 2018-06-01 12:00 UTC
  To: fio

The following changes since commit cf04b906fda16c6c14c420b71130a31d6580e9d8:

  Fix typo in bssplit documentation (2018-05-25 08:14:50 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 80f021501fda6a6244672bb89dd8221a61cee54b:

  io_u: ensure to invalidate cache on time_based random reads (2018-05-31 09:05:59 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      io_u: ensure to invalidate cache on time_based random reads

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 5b4c0df..945aa19 100644
--- a/io_u.c
+++ b/io_u.c
@@ -325,9 +325,9 @@ static int get_next_rand_block(struct thread_data *td, struct fio_file *f,
 	if (td->o.time_based ||
 	    (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)) {
 		fio_file_reset(td, f);
+		loop_cache_invalidate(td, f);
 		if (!get_next_rand_offset(td, f, ddir, b))
 			return 0;
-		loop_cache_invalidate(td, f);
 	}
 
 	dprint(FD_IO, "%s: rand offset failed, last=%llu, size=%llu\n",

* Recent changes (master)
@ 2018-05-26 12:00 Jens Axboe
From: Jens Axboe @ 2018-05-26 12:00 UTC
  To: fio

The following changes since commit 12223a35ac7d975d7c2950c664d8caf7f9c065bf:

  Merge branch 'sg-verify2' of https://github.com/vincentkfu/fio (2018-05-18 13:34:36 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cf04b906fda16c6c14c420b71130a31d6580e9d8:

  Fix typo in bssplit documentation (2018-05-25 08:14:50 -0600)

----------------------------------------------------------------
Bill O'Donnell (1):
      make fio scripts python3-ready (part 2)

Jens Axboe (1):
      Fix typo in bssplit documentation

 HOWTO                       |  2 +-
 fio.1                       |  2 +-
 tools/fio_jsonplus_clat2csv | 13 ++++--
 tools/plot/fio2gnuplot      | 97 ++++++++++++++++++++++++---------------------
 4 files changed, 62 insertions(+), 52 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 7680f9c..4398ffa 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1429,7 +1429,7 @@ Block size
 	If you want a workload that has 50% 2k reads and 50% 4k reads, while
 	having 90% 4k writes and 10% 8k writes, you would specify::
 
-		bssplit=2k/50:4k/50,4k/90,8k/10
+		bssplit=2k/50:4k/50,4k/90:8k/10
 
 	Fio supports defining up to 64 different weights for each data
 	direction.
diff --git a/fio.1 b/fio.1
index ce3585a..733c740 100644
--- a/fio.1
+++ b/fio.1
@@ -1227,7 +1227,7 @@ If you want a workload that has 50% 2k reads and 50% 4k reads, while having
 90% 4k writes and 10% 8k writes, you would specify:
 .RS
 .P
-bssplit=2k/50:4k/50,4k/90,8k/10
+bssplit=2k/50:4k/50,4k/90:8k/10
 .RE
 .P
 Fio supports defining up to 64 different weights for each data direction.
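In the corrected `bssplit` line, a comma separates data directions and a colon separates entries within one direction. The C sketch below splits such a string the same way so the difference is easy to check. It is not fio's parser. It handles only plain numbers with an optional `k`/`m` suffix, and it assumes the read,write,trim order for comma-separated values. `parse_bssplit()`, `parse_size()` and `struct bs_entry` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS    3   /* assumed order: read, write, trim */
#define MAX_ENTRIES 64  /* the text documents up to 64 weights per direction */

struct bs_entry {
	unsigned long long bs;  /* block size in bytes */
	unsigned int perc;      /* weight in percent */
};

/* Hypothetical size parser: "2k" -> 2048, "1m" -> 1048576, "512" -> 512. */
static unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 10);

	if (*end == 'k' || *end == 'K')
		v *= 1024;
	else if (*end == 'm' || *end == 'M')
		v *= 1024 * 1024;
	return v;
}

/* ',' ends a data direction, ':' ends an entry, '/' splits bs from percent. */
static int parse_bssplit(const char *spec,
			 struct bs_entry out[MAX_DIRS][MAX_ENTRIES],
			 int counts[MAX_DIRS])
{
	char buf[512];
	char *p;
	int d = 0;

	if (strlen(spec) >= sizeof(buf))
		return -1;
	strcpy(buf, spec);
	memset(counts, 0, MAX_DIRS * sizeof(int));

	p = buf;
	while (p) {
		char *next_dir = strchr(p, ',');

		if (d >= MAX_DIRS)
			return -1;
		if (next_dir)
			*next_dir++ = '\0';

		while (p && *p) {
			char *next_ent = strchr(p, ':');
			char *slash;

			if (next_ent)
				*next_ent++ = '\0';
			slash = strchr(p, '/');
			if (!slash || counts[d] >= MAX_ENTRIES)
				return -1;
			*slash = '\0';
			out[d][counts[d]].bs = parse_size(p);
			out[d][counts[d]].perc = (unsigned int)atoi(slash + 1);
			counts[d]++;
			p = next_ent;
		}
		d++;
		p = next_dir;
	}
	return 0;
}
```

Given the corrected string, the sketch yields two read entries and two write entries (4k/90 and 8k/10). Given the old string, it yields one write entry (4k/90) and a stray trim entry (8k/10), so the write weights no longer add up to 100%.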
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index e63d6d8..78a007e 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -1,4 +1,5 @@
 #!/usr/bin/python2.7
+# Note: this script is python2 and python3 compatible.
 #
 # fio_jsonplus_clat2csv
 #
@@ -60,9 +61,13 @@
 # 10304ns is the 100th percentile for read latency
 #
 
+from __future__ import absolute_import
+from __future__ import print_function
 import os
 import json
 import argparse
+import six
+from six.moves import range
 
 
 def parse_args():
@@ -87,7 +92,7 @@ def percentile(idx, run_total):
 
 
 def more_lines(indices, bins):
-    for key, value in indices.iteritems():
+    for key, value in six.iteritems(indices):
         if value < len(bins[key]):
             return True
 
@@ -116,8 +121,8 @@ def main():
                                    "Are you sure you are using json+ output?")
 
             bins[ddir] = [[int(key), value] for key, value in
-                          jsondata['jobs'][jobnum][ddir][bins_loc]
-                          ['bins'].iteritems()]
+                          six.iteritems(jsondata['jobs'][jobnum][ddir][bins_loc]
+                          ['bins'])]
             bins[ddir] = sorted(bins[ddir], key=lambda bin: bin[0])
 
             run_total[ddir] = [0 for x in range(0, len(bins[ddir]))]
@@ -165,7 +170,7 @@ def main():
                         output.write(", , , ")
                 output.write("\n")
 
-            print "{0} generated".format(outfile)
+            print("{0} generated".format(outfile))
 
 
 if __name__ == '__main__':
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index 5d31f13..4d1815c 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -1,4 +1,5 @@
 #!/usr/bin/python2.7
+# Note: this script is python2 and python3 compatible.
 #
 #  Copyright (C) 2013 eNovance SAS <licensing@enovance.com>
 #  Author: Erwan Velu  <erwan@enovance.com>
@@ -19,6 +20,8 @@
 #  along with this program; if not, write to the Free Software
 #  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
+from __future__ import absolute_import
+from __future__ import print_function
 import os
 import fnmatch
 import sys
@@ -26,6 +29,8 @@ import getopt
 import re
 import math
 import shutil
+from six.moves import map
+from six.moves import range
 
 def find_file(path, pattern):
 	fio_data_file=[]
@@ -39,7 +44,7 @@ def find_file(path, pattern):
 	return fio_data_file
 
 def generate_gnuplot_script(fio_data_file,title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir):
-	if verbose: print "Generating rendering scripts"
+	if verbose: print("Generating rendering scripts")
 	filename=gnuplot_output_dir+'mygraph'
 	temporary_files.append(filename)
 	f=open(filename,'w')
@@ -124,7 +129,7 @@ def generate_gnuplot_math_script(title,gnuplot_output_filename,mode,average,gnup
 	f.close()
 
 def compute_aggregated_file(fio_data_file, gnuplot_output_filename, gnuplot_output_dir):
-	if verbose: print "Processing data file 2/2"
+	if verbose: print("Processing data file 2/2")
 	temp_files=[]
 	pos=0
 
@@ -152,7 +157,7 @@ def compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir, min_time, max_
 	end_time=max_time
 	if end_time == -1:
 		end_time="infinite"
-	if verbose: print "Processing data file 1/2 with %s<time<%s" % (min_time,end_time)
+	if verbose: print("Processing data file 1/2 with %s<time<%s" % (min_time,end_time))
 	files=[]
 	temp_outfile=[]
 	blk_size=0
@@ -198,8 +203,8 @@ def compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir, min_time, max_
 				try:
 					blk_size=int(block_size)
 				except:
-					print "Error while reading the following line :"
-					print line
+					print("Error while reading the following line :")
+					print(line)
 					sys.exit(1);
 
 			# We ignore the first 500msec as it doesn't seems to be part of the real benchmark
@@ -225,7 +230,7 @@ def compute_temp_file(fio_data_file,disk_perf,gnuplot_output_dir, min_time, max_
 	return blk_size
 
 def compute_math(fio_data_file, title,gnuplot_output_filename,gnuplot_output_dir,mode,disk_perf,gpm_dir):
-	if verbose: print "Computing Maths"
+	if verbose: print("Computing Maths")
 	global_min=[]
 	global_max=[]
 	average_file=open(gnuplot_output_dir+gnuplot_output_filename+'.average', 'w')
@@ -243,14 +248,14 @@ def compute_math(fio_data_file, title,gnuplot_output_filename,gnuplot_output_dir
 	max_file.write('DiskName %s\n'% mode)
 	average_file.write('DiskName %s\n'% mode)
 	stddev_file.write('DiskName %s\n'% mode )
-	for disk in xrange(len(fio_data_file)):
+	for disk in range(len(fio_data_file)):
 #		print disk_perf[disk]
 	    	min_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
 	    	max_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
 	    	average_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
 	    	stddev_file.write("# Disk%d was coming from %s\n" % (disk,fio_data_file[disk]))
 		avg  = average(disk_perf[disk])
-		variance = map(lambda x: (x - avg)**2, disk_perf[disk])
+		variance = [(x - avg)**2 for x in disk_perf[disk]]
 		standard_deviation = math.sqrt(average(variance))
 #		print "Disk%d [ min=%.2f max=%.2f avg=%.2f stddev=%.2f \n" % (disk,min(disk_perf[disk]),max(disk_perf[disk]),avg, standard_deviation)
 		average_file.write('%d %d\n' % (disk, avg))
@@ -264,7 +269,7 @@ def compute_math(fio_data_file, title,gnuplot_output_filename,gnuplot_output_dir
 
 	global_disk_perf = sum(disk_perf, [])
 	avg  = average(global_disk_perf)
-	variance = map(lambda x: (x - avg)**2, global_disk_perf)
+	variance = [(x - avg)**2 for x in global_disk_perf]
 	standard_deviation = math.sqrt(average(variance))
 
 	global_file.write('min=%.2f\n' % min(global_disk_perf))
@@ -331,52 +336,52 @@ def parse_global_files(fio_data_file, global_search):
 					max_file=file
 	# Let's print the avg output
 	if global_search == "avg":
-		print "Biggest aggregated value of %s was %2.f in file %s\n" % (global_search, max_result, max_file)
+		print("Biggest aggregated value of %s was %2.f in file %s\n" % (global_search, max_result, max_file))
 	else:
-		print "Global search %s is not yet implemented\n" % global_search
+		print("Global search %s is not yet implemented\n" % global_search)
 
 def render_gnuplot(fio_data_file, gnuplot_output_dir):
-	print "Running gnuplot Rendering"
+	print("Running gnuplot Rendering")
 	try:
 		# Let's render all the compared files if some
 		if len(fio_data_file) > 1:
-			if verbose: print " |-> Rendering comparing traces"
+			if verbose: print(" |-> Rendering comparing traces")
 			os.system("cd %s; for i in *.gnuplot; do gnuplot $i; done" % gnuplot_output_dir)
-		if verbose: print " |-> Rendering math traces"
+		if verbose: print(" |-> Rendering math traces")
 		os.system("cd %s; gnuplot mymath" % gnuplot_output_dir)
-		if verbose: print " |-> Rendering 2D & 3D traces"
+		if verbose: print(" |-> Rendering 2D & 3D traces")
 		os.system("cd %s; gnuplot mygraph" % gnuplot_output_dir)
 
 		name_of_directory="the current"
 		if gnuplot_output_dir != "./":
 			name_of_directory=gnuplot_output_dir
-		print "\nRendering traces are available in %s directory" % name_of_directory
+		print("\nRendering traces are available in %s directory" % name_of_directory)
 		global keep_temp_files
 		keep_temp_files=False
 	except:
-		print "Could not run gnuplot on mymath or mygraph !\n"
+		print("Could not run gnuplot on mymath or mygraph !\n")
 		sys.exit(1);
 
 def print_help():
-    print 'fio2gnuplot -ghbiodvk -t <title> -o <outputfile> -p <pattern> -G <type> -m <time> -M <time>'
-    print
-    print '-h --help                           : Print this help'
-    print '-p <pattern> or --pattern <pattern> : A glob pattern to select fio input files'
-    print '-b           or --bandwidth         : A predefined pattern for selecting *_bw.log files'
-    print '-i           or --iops              : A predefined pattern for selecting *_iops.log files'
-    print '-g           or --gnuplot           : Render gnuplot traces before exiting'
-    print '-o           or --outputfile <file> : The basename for gnuplot traces'
-    print '                                       - Basename is set with the pattern if defined'
-    print '-d           or --outputdir <dir>   : The directory where gnuplot shall render files'
-    print '-t           or --title <title>     : The title of the gnuplot traces'
-    print '                                       - Title is set with the block size detected in fio traces'
-    print '-G           or --Global <type>     : Search for <type> in .global files match by a pattern'
-    print '                                       - Available types are : min, max, avg, stddev'
-    print '                                       - The .global extension is added automatically to the pattern'
-    print '-m           or --min_time <time>   : Only consider data starting from <time> seconds (default is 0)'
-    print '-M           or --max_time <time>   : Only consider data ending before <time> seconds (default is -1 aka nolimit)'
-    print '-v           or --verbose           : Increasing verbosity'
-    print '-k           or --keep              : Keep all temporary files from gnuplot\'s output dir'
+    print('fio2gnuplot -ghbiodvk -t <title> -o <outputfile> -p <pattern> -G <type> -m <time> -M <time>')
+    print()
+    print('-h --help                           : Print this help')
+    print('-p <pattern> or --pattern <pattern> : A glob pattern to select fio input files')
+    print('-b           or --bandwidth         : A predefined pattern for selecting *_bw.log files')
+    print('-i           or --iops              : A predefined pattern for selecting *_iops.log files')
+    print('-g           or --gnuplot           : Render gnuplot traces before exiting')
+    print('-o           or --outputfile <file> : The basename for gnuplot traces')
+    print('                                       - Basename is set with the pattern if defined')
+    print('-d           or --outputdir <dir>   : The directory where gnuplot shall render files')
+    print('-t           or --title <title>     : The title of the gnuplot traces')
+    print('                                       - Title is set with the block size detected in fio traces')
+    print('-G           or --Global <type>     : Search for <type> in .global files match by a pattern')
+    print('                                       - Available types are : min, max, avg, stddev')
+    print('                                       - The .global extension is added automatically to the pattern')
+    print('-m           or --min_time <time>   : Only consider data starting from <time> seconds (default is 0)')
+    print('-M           or --max_time <time>   : Only consider data ending before <time> seconds (default is -1 aka nolimit)')
+    print('-v           or --verbose           : Increasing verbosity')
+    print('-k           or --keep              : Keep all temporary files from gnuplot\'s output dir')
 
 def main(argv):
     mode='unknown'
@@ -403,14 +408,14 @@ def main(argv):
     if not os.path.isfile(gpm_dir+'math.gpm'):
 	    gpm_dir="/usr/local/share/fio/"
     	    if not os.path.isfile(gpm_dir+'math.gpm'):
-		    print "Looks like fio didn't get installed properly as no gpm files found in '/usr/share/fio' or '/usr/local/share/fio'\n"
+		    print("Looks like fio didn't get installed properly as no gpm files found in '/usr/share/fio' or '/usr/local/share/fio'\n")
 		    sys.exit(3)
 
     try:
 	    opts, args = getopt.getopt(argv[1:],"ghkbivo:d:t:p:G:m:M:",['bandwidth', 'iops', 'pattern', 'outputfile', 'outputdir', 'title', 'min_time', 'max_time', 'gnuplot', 'Global', 'help', 'verbose','keep'])
     except getopt.GetoptError:
-	 print "Error: One of the options passed to the cmdline was not supported"
-	 print "Please fix your command line or read the help (-h option)"
+	 print("Error: One of the options passed to the cmdline was not supported")
+	 print("Please fix your command line or read the help (-h option)")
          sys.exit(2)
 
     for opt, arg in opts:
@@ -457,7 +462,7 @@ def main(argv):
 
     fio_data_file=find_file('.',pattern)
     if len(fio_data_file) == 0:
-	    print "No log file found with pattern %s!" % pattern
+	    print("No log file found with pattern %s!" % pattern)
 	    # Try numjob log file format if per_numjob_logs=1
             if (pattern == '*_bw.log'):
                 fio_data_file=find_file('.','*_bw.*.log')
@@ -466,13 +471,13 @@ def main(argv):
             if len(fio_data_file) == 0:
                 sys.exit(1)
             else:
-                print "Using log file per job format instead"
+                print("Using log file per job format instead")
     else:
-	    print "%d files Selected with pattern '%s'" % (len(fio_data_file), pattern)
+	    print("%d files Selected with pattern '%s'" % (len(fio_data_file), pattern))
 
     fio_data_file=sorted(fio_data_file, key=str.lower)
     for file in fio_data_file:
-	print ' |-> %s' % file
+	print(' |-> %s' % file)
 	if "_bw.log" in file :
 		mode="Bandwidth (KB/sec)"
 	if "_iops.log" in file :
@@ -483,7 +488,7 @@ def main(argv):
 	    if "IO" in mode:
 		    title='IO benchmark with %d fio results' % len(fio_data_file)
 
-    print
+    print()
     #We need to adjust the output filename regarding the pattern required by the user
     if (pattern_set_by_user == True):
 	    gnuplot_output_filename=pattern
@@ -514,9 +519,9 @@ def main(argv):
 	# Shall we clean the temporary files ?
 	if keep_temp_files==False and force_keep_temp_files==False:
 	    	# Cleaning temporary files
-		if verbose: print "Cleaning temporary files"
+		if verbose: print("Cleaning temporary files")
 		for f in enumerate(temporary_files):
-		    	if verbose: print " -> %s"%f[1]
+			if verbose: print(" -> %s"%f[1])
 			try:
 			    os.remove(f[1])
 			except:

* Recent changes (master)
@ 2018-05-19 12:00 Jens Axboe
From: Jens Axboe @ 2018-05-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5eac3b00238b450ac0679121a76f1e566ca8f468:

  make fio scripts python3-ready (2018-05-16 11:17:55 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 12223a35ac7d975d7c2950c664d8caf7f9c065bf:

  Merge branch 'sg-verify2' of https://github.com/vincentkfu/fio (2018-05-18 13:34:36 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'sg-verify2' of https://github.com/vincentkfu/fio

Vincent Fu (2):
      engines/sg: add support for WRITE AND VERIFY, WRITE SAME
      docs: add documentation for sg ioengine WRITE SAME, WRITE AND VERIFY command support

 HOWTO        | 20 ++++++++++++++++++++
 engines/sg.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 fio.1        | 27 ++++++++++++++++++++++++++
 3 files changed, 101 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d200700..7680f9c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2079,6 +2079,26 @@ with the caveat that when used on the command line, they must come after the
 	With writefua option set to 1, write operations include
 	the force unit access (fua) flag. Default is 0.
 
+.. option:: sg_write_mode=str : [sg]
+	Specify the type of write commands to issue. This option can take three values:
+
+	**write**
+		This is the default where write opcodes are issued as usual.
+	**verify**
+		Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
+		directs the device to carry out a medium verification with no data
+		comparison. The writefua option is ignored with this selection.
+	**same**
+		Issue WRITE SAME commands. This transfers a single block to the device
+		and writes this same block of data to a contiguous sequence of LBAs
+		beginning at the specified offset. fio's block size parameter specifies
+		the amount of data written with each command. However, the amount of data
+		actually transferred to the device is equal to the device's block
+		(sector) size. For a device with 512 byte sectors, blocksize=8k will
+		write 16 sectors with each command. fio will still generate 8k of data
+		for each command but only the first 512 bytes will be used and
+		transferred to the device. The writefua option is ignored with this
+		selection.
 
 I/O depth
 ~~~~~~~~~
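The WRITE SAME arithmetic described above can be written out as a small C sketch. It assumes fio's block size is a non-zero multiple of the logical sector size and returns all zeros otherwise. `struct write_same_cost` and `compute_write_same_cost()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

struct write_same_cost {
	uint64_t blocks_written;    /* LBAs written on the medium per command */
	uint64_t bytes_transferred; /* data-out bytes actually sent */
	uint64_t bytes_unused;      /* generated by fio but never sent */
};

static struct write_same_cost compute_write_same_cost(uint64_t fio_bs,
						      uint32_t sector_size)
{
	struct write_same_cost c = { 0, 0, 0 };

	/* Assumed precondition: fio_bs is a non-zero multiple of sector_size. */
	if (sector_size == 0 || fio_bs < sector_size || fio_bs % sector_size)
		return c;
	c.blocks_written = fio_bs / sector_size;  /* LBAs covered by one command */
	c.bytes_transferred = sector_size;        /* only one block goes to the device */
	c.bytes_unused = fio_bs - sector_size;    /* rest of fio's buffer is ignored */
	return c;
}
```

For `compute_write_same_cost(8192, 512)` this gives 16 blocks written, 512 bytes transferred and 7680 generated bytes unused, matching the 8k / 512-byte example in the text.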
diff --git a/engines/sg.c b/engines/sg.c
index d4848bc..06cd194 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -15,11 +15,17 @@
 
 #ifdef FIO_HAVE_SGIO
 
+enum {
+	FIO_SG_WRITE		= 1,
+	FIO_SG_WRITE_VERIFY	= 2,
+	FIO_SG_WRITE_SAME	= 3
+};
 
 struct sg_options {
 	void *pad;
 	unsigned int readfua;
 	unsigned int writefua;
+	unsigned int write_mode;
 };
 
 static struct fio_option options[] = {
@@ -44,6 +50,30 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_SG,
 	},
 	{
+		.name	= "sg_write_mode",
+		.lname	= "specify sg write mode",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct sg_options, write_mode),
+		.help	= "Specify SCSI WRITE mode",
+		.def	= "write",
+		.posval = {
+			  { .ival = "write",
+			    .oval = FIO_SG_WRITE,
+			    .help = "Issue standard SCSI WRITE commands",
+			  },
+			  { .ival = "verify",
+			    .oval = FIO_SG_WRITE_VERIFY,
+			    .help = "Issue SCSI WRITE AND VERIFY commands",
+			  },
+			  { .ival = "same",
+			    .oval = FIO_SG_WRITE_SAME,
+			    .help = "Issue SCSI WRITE SAME commands",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_SG,
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -329,14 +359,30 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 		sgio_hdr_init(sd, hdr, io_u, 1);
 
 		hdr->dxfer_direction = SG_DXFER_TO_DEV;
-		if (lba < MAX_10B_LBA)
-			hdr->cmdp[0] = 0x2a; // write(10)
-		else
-			hdr->cmdp[0] = 0x8a; // write(16)
-
-		if (o->writefua)
-			hdr->cmdp[1] |= 0x08;
-
+		switch(o->write_mode) {
+		case FIO_SG_WRITE:
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x2a; // write(10)
+			else
+				hdr->cmdp[0] = 0x8a; // write(16)
+			if (o->writefua)
+				hdr->cmdp[1] |= 0x08;
+			break;
+		case FIO_SG_WRITE_VERIFY:
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x2e; // write and verify(10)
+			else
+				hdr->cmdp[0] = 0x8e; // write and verify(16)
+			break;
+			// BYTCHK is disabled by virtue of the memset in sgio_hdr_init
+		case FIO_SG_WRITE_SAME:
+			hdr->dxfer_len = sd->bs;
+			if (lba < MAX_10B_LBA)
+				hdr->cmdp[0] = 0x41; // write same(10)
+			else
+				hdr->cmdp[0] = 0x93; // write same(16)
+			break;
+		};
 	} else {
 		sgio_hdr_init(sd, hdr, io_u, 0);
 		hdr->dxfer_direction = SG_DXFER_NONE;
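The new switch maps each write mode to a SCSI opcode and picks the 10-byte or 16-byte CDB based on the LBA. The sketch below restates that mapping as a standalone C function so it can be checked in isolation. The cutoff constant is an assumption: a 10-byte CDB carries a 32-bit LBA, but the patch's actual `MAX_10B_LBA` definition is not shown here. `sg_fill_write_opcode()` and the `SG_MODE_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the FIO_SG_* enum added by the patch. */
enum sg_write_mode {
	SG_MODE_WRITE = 1,
	SG_MODE_WRITE_VERIFY = 2,
	SG_MODE_WRITE_SAME = 3,
};

/* Assumed cutoff: largest LBA expressible in a 10-byte CDB. */
#define CDB10_LBA_LIMIT 0xFFFFFFFFULL

/* Fill cdb[0] (opcode) and cdb[1] (flags); false for an unknown mode. */
static bool sg_fill_write_opcode(enum sg_write_mode mode, uint64_t lba,
				 bool fua, uint8_t cdb[2])
{
	bool use10 = lba < CDB10_LBA_LIMIT;

	cdb[1] = 0;	/* FUA and BYTCHK start cleared, like after a memset */
	switch (mode) {
	case SG_MODE_WRITE:
		cdb[0] = use10 ? 0x2a : 0x8a;	/* WRITE(10) / WRITE(16) */
		if (fua)
			cdb[1] |= 0x08;		/* FUA bit */
		return true;
	case SG_MODE_WRITE_VERIFY:
		cdb[0] = use10 ? 0x2e : 0x8e;	/* WRITE AND VERIFY(10/16) */
		return true;			/* BYTCHK = 0, FUA ignored */
	case SG_MODE_WRITE_SAME:
		cdb[0] = use10 ? 0x41 : 0x93;	/* WRITE SAME(10/16) */
		return true;			/* FUA ignored */
	}
	return false;
}
```

As in the patch, only the plain write mode honors `writefua`. The verify and same modes leave byte 1 cleared, which also keeps BYTCHK at 0 for WRITE AND VERIFY.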
diff --git a/fio.1 b/fio.1
index 7d5d8be..ce3585a 100644
--- a/fio.1
+++ b/fio.1
@@ -1828,6 +1828,33 @@ unit access (fua) flag. Default: 0.
 .BI (sg)writefua \fR=\fPbool
 With writefua option set to 1, write operations include the force
 unit access (fua) flag. Default: 0.
+.TP
+.BI (sg)sg_write_mode \fR=\fPstr
+Specify the type of write commands to issue. This option can take three
+values:
+.RS
+.RS
+.TP
+.B write (default)
+Write opcodes are issued as usual
+.TP
+.B verify
+Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
+directs the device to carry out a medium verification with no data
+comparison. The writefua option is ignored with this selection.
+.TP
+.B same
+Issue WRITE SAME commands. This transfers a single block to the device
+and writes this same block of data to a contiguous sequence of LBAs
+beginning at the specified offset. fio's block size parameter
+specifies the amount of data written with each command. However, the
+amount of data actually transferred to the device is equal to the
+device's block (sector) size. For a device with 512 byte sectors,
+blocksize=8k will write 16 sectors with each command. fio will still
+generate 8k of data for each command but only the first 512 bytes will
+be used and transferred to the device. The writefua option is ignored
+with this selection.
+
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint

* Recent changes (master)
@ 2018-05-17 12:00 Jens Axboe
From: Jens Axboe @ 2018-05-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 540e235dcd276e63c57ca4bd35f70a0651e2d00e:

  Merge branch 'master' of https://github.com/majianpeng/fio (2018-05-14 19:40:32 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5eac3b00238b450ac0679121a76f1e566ca8f468:

  make fio scripts python3-ready (2018-05-16 11:17:55 -0600)

----------------------------------------------------------------
Bill O'Donnell (1):
      make fio scripts python3-ready

 doc/conf.py                     |  3 +++
 tools/fiologparser.py           |  3 +++
 unit_tests/steadystate_tests.py | 20 ++++++++++++--------
 3 files changed, 18 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/doc/conf.py b/doc/conf.py
index d4dd9d2..087a9a1 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -22,6 +22,9 @@
 
 # -- General configuration ------------------------------------------------
 
+from __future__ import absolute_import
+from __future__ import print_function
+
 # If your documentation needs a minimal Sphinx version, state it here.
 #
 # needs_sphinx = '1.0'
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 8549859..cc29f1c 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -1,4 +1,5 @@
 #!/usr/bin/python2.7
+# Note: this script is python2 and python 3 compatible.
 #
 # fiologparser.py
 #
@@ -13,6 +14,8 @@
 #
 # to see per-interval average completion latency.
 
+from __future__ import absolute_import
+from __future__ import print_function
 import argparse
 import math
 
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
index 5a74f95..50254dc 100755
--- a/unit_tests/steadystate_tests.py
+++ b/unit_tests/steadystate_tests.py
@@ -1,4 +1,5 @@
 #!/usr/bin/python2.7
+# Note: this script is python2 and python 3 compatible.
 #
 # steadystate_tests.py
 #
@@ -18,6 +19,8 @@
 # if ss attained: min runtime = ss_dur + ss_ramp
 # if not attained: runtime = timeout
 
+from __future__ import absolute_import
+from __future__ import print_function
 import os
 import sys
 import json
@@ -26,11 +29,12 @@ import pprint
 import argparse
 import subprocess
 from scipy import stats
+from six.moves import range
 
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument('fio',
-                        help='path to fio executable');
+                        help='path to fio executable')
     parser.add_argument('--read',
                         help='target for read testing')
     parser.add_argument('--write',
@@ -45,7 +49,7 @@ def check(data, iops, slope, pct, limit, dur, criterion):
     data = data[measurement]
     mean = sum(data) / len(data)
     if slope:
-        x = range(len(data))
+        x = list(range(len(data)))
         m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
         m = abs(m)
         if pct:
@@ -89,11 +93,11 @@ if __name__ == '__main__':
                   'output': "set steady state BW threshold to 12" },
               ]
     for test in parsing:
-        output = subprocess.check_output([args.fio] + test['args']);
-        if test['output'] in output:
-            print "PASSED '{0}' found with arguments {1}".format(test['output'], test['args'])
+        output = subprocess.check_output([args.fio] + test['args'])
+        if test['output'] in output.decode():
+            print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
         else:
-            print "FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args'])
+            print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
 
 #
 # test some read workloads
@@ -117,7 +121,7 @@ if __name__ == '__main__':
             args.read = '/dev/zero'
             extra = [ "--size=134217728" ]  # 128 MiB
         else:
-            print "ERROR: file for read testing must be specified on non-posix systems"
+            print("ERROR: file for read testing must be specified on non-posix systems")
             sys.exit(1)
     else:
         extra = []
@@ -216,7 +220,7 @@ if __name__ == '__main__':
                 else:
                     result = 'FAILED '
                 line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
-            print line
+            print(line)
             if 'steadystate' in jsonjob:
                 pp.pprint(jsonjob['steadystate'])
         jobnum += 1

* Recent changes (master)
@ 2018-05-15 12:00 Jens Axboe
From: Jens Axboe @ 2018-05-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 43864deb3b4e24d5570ee4cb16cb626e94ec0465:

  iolog: default to good return (2018-04-26 22:45:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 540e235dcd276e63c57ca4bd35f70a0651e2d00e:

  Merge branch 'master' of https://github.com/majianpeng/fio (2018-05-14 19:40:32 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/majianpeng/fio

Jianpeng Ma (1):
      This partly revert 97bb54c9606c(add __load_ioengine() to separate ioengine loading from td context)

 ioengines.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/ioengines.c b/ioengines.c
index 6ffd27f..d579682 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -574,6 +574,7 @@ int td_io_get_file_size(struct thread_data *td, struct fio_file *f)
 int fio_show_ioengine_help(const char *engine)
 {
 	struct flist_head *entry;
+	struct thread_data td;
 	struct ioengine_ops *io_ops;
 	char *sep;
 	int ret = 1;
@@ -592,7 +593,10 @@ int fio_show_ioengine_help(const char *engine)
 		sep++;
 	}
 
-	io_ops = __load_ioengine(engine);
+	memset(&td, 0, sizeof(struct thread_data));
+	td.o.ioengine = (char *)engine;
+	io_ops = load_ioengine(&td);
+
 	if (!io_ops) {
 		log_info("IO engine %s not found\n", engine);
 		return 1;
@@ -603,5 +607,6 @@ int fio_show_ioengine_help(const char *engine)
 	else
 		log_info("IO engine %s has no options\n", io_ops->name);
 
+	free_ioengine(&td);
 	return ret;
 }
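The patched `fio_show_ioengine_help()` follows a common C pattern. It zeroes a temporary context on the stack, records the engine name in it, loads through the context-taking API, and releases whatever the load attached to that context before returning. The sketch below shows the same shape with hypothetical stand-in types (`struct ctx`, `ctx_load_engine()`, `ctx_free_engine()`). It is not fio's real `thread_data` or loader.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for fio's ioengine_ops / thread_data. */
struct engine_ops { const char *name; };
struct ctx {
	const char *engine_name;
	struct engine_ops *ops;  /* owned by the ctx once loaded */
};

/* Only "sync" exists in this sketch; anything else fails to load. */
static struct engine_ops *ctx_load_engine(struct ctx *c)
{
	if (!c->engine_name || strcmp(c->engine_name, "sync") != 0)
		return NULL;
	c->ops = malloc(sizeof(*c->ops));
	if (c->ops)
		c->ops->name = "sync";
	return c->ops;
}

static void ctx_free_engine(struct ctx *c)
{
	free(c->ops);
	c->ops = NULL;
}

static int show_engine_help(const char *engine)
{
	struct ctx c;
	struct engine_ops *ops;

	memset(&c, 0, sizeof(c));  /* no stale pointers for the loader or free */
	c.engine_name = engine;
	ops = ctx_load_engine(&c);
	if (!ops) {
		printf("IO engine %s not found\n", engine);
		return 1;
	}
	printf("IO engine %s loaded\n", ops->name);
	ctx_free_engine(&c);       /* release what the load attached to the ctx */
	return 0;
}
```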

* Recent changes (master)
@ 2018-04-27 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ca0a0b521e8650ed736246c00092094a4d5c9829:

  t/*: missing statics (2018-04-24 14:03:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 43864deb3b4e24d5570ee4cb16cb626e94ec0465:

  iolog: default to good return (2018-04-26 22:45:04 -0600)

----------------------------------------------------------------
Jens Axboe (12):
      blktrace: don't re-clear ipo
      blktrace: ignore 0 byte writes
      iolog: always use calloc() and always init both lists
      blktrace: change barrier to a flush
      blktrace: handle flush/sync replay
      blktrace: kill zero sized write test
      blktrace: add 'reply_skip' option
      Makefile: ensure we kill all object files
      Update documentation for 'replay_skip'
      blktrace: make sure to account SYNC/TRIM at load time
      iolog/blktrace: boolean conversion
      iolog: default to good return

 HOWTO            |  8 +++++
 Makefile         |  2 +-
 backend.c        |  2 +-
 blktrace.c       | 94 ++++++++++++++++++++++++++++++++++++++------------------
 blktrace.h       | 14 ++++-----
 blktrace_api.h   |  2 +-
 cconv.c          |  2 ++
 fio.1            |  6 ++++
 iolog.c          | 36 ++++++++++++----------
 iolog.h          |  4 +--
 options.c        | 48 +++++++++++++++++++++++++++++
 server.h         |  2 +-
 thread_options.h |  2 ++
 13 files changed, 162 insertions(+), 60 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 8ee00fd..d200700 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2346,6 +2346,14 @@ I/O replay
 
 	Scale sector offsets down by this factor when replaying traces.
 
+.. option:: replay_skip=str
+
+	Sometimes it's useful to skip certain IO types in a replay trace.
+	This could be, for instance, eliminating the writes in the trace.
+	Or not replaying the trims/discards, if you are redirecting to
+	a device that doesn't support them. This option takes a comma
+	separated list of read, write, trim, sync.
+
 
 Threads, processes and job synchronization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
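The new `replay_skip` option, together with the checks added to the blktrace handlers, amounts to a bitmask with one bit per data direction (`1u << DDIR_*`). The C sketch below parses the comma-separated list into such a mask, assuming only the four keywords listed above. The `enum ddir` values here are a stand-in rather than fio's definitions. `parse_replay_skip()` and `should_skip()` are hypothetical; fio's own parser lives in options.c and is not reproduced in this diff.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for fio's data-direction enum; values assumed for illustration. */
enum ddir { DDIR_READ = 0, DDIR_WRITE, DDIR_TRIM, DDIR_SYNC };

/* Turn e.g. "write,trim" into a bitmask; -1 on an unknown or empty token. */
static int parse_replay_skip(const char *list, unsigned int *mask)
{
	static const struct { const char *name; enum ddir dir; } map[] = {
		{ "read", DDIR_READ }, { "write", DDIR_WRITE },
		{ "trim", DDIR_TRIM }, { "sync", DDIR_SYNC },
	};
	const char *p = list;

	*mask = 0;
	while (*p) {
		size_t len = strcspn(p, ",");  /* length of the current token */
		size_t i;
		bool found = false;

		for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
			if (strlen(map[i].name) == len &&
			    strncmp(p, map[i].name, len) == 0) {
				*mask |= 1u << map[i].dir;
				found = true;
				break;
			}
		}
		if (!found)
			return -1;
		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}

/* The test each trace handler performs before queueing an io_piece. */
static bool should_skip(unsigned int mask, enum ddir dir)
{
	return (mask & (1u << dir)) != 0;
}
```

With `replay_skip=write,trim`, only read and sync entries would be queued. This mirrors the early returns added to `handle_trace_fs()`, `handle_trace_discard()` and `handle_trace_flush()`.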
diff --git a/Makefile b/Makefile
index 357ae98..20d3ec1 100644
--- a/Makefile
+++ b/Makefile
@@ -462,7 +462,7 @@ t/time-test: $(T_TT_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_TT_OBJS) $(LIBS)
 
 clean: FORCE
-	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio FIO-VERSION-FILE *.d lib/*.d oslib/*.d crc/*.d engines/*.d profiles/*.d t/*.d config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
+	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio FIO-VERSION-FILE *.[do] lib/*.d oslib/*.[do] crc/*.d engines/*.[do] profiles/*.[do] t/*.[do] config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -rf  doc/output
 
 distclean: clean FORCE
diff --git a/backend.c b/backend.c
index d5cb6ef..033d5a7 100644
--- a/backend.c
+++ b/backend.c
@@ -1670,7 +1670,7 @@ static void *thread_main(void *data)
 	 * May alter parameters that init_io_u() will use, so we need to
 	 * do this first.
 	 */
-	if (init_iolog(td))
+	if (!init_iolog(td))
 		goto err;
 
 	if (init_io_u(td))
diff --git a/blktrace.c b/blktrace.c
index 71ac412..cda111a 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -73,29 +73,29 @@ static int discard_pdu(struct thread_data *td, struct fifo *fifo, int fd,
  * Check if this is a blktrace binary data file. We read a single trace
  * into memory and check for the magic signature.
  */
-int is_blktrace(const char *filename, int *need_swap)
+bool is_blktrace(const char *filename, int *need_swap)
 {
 	struct blk_io_trace t;
 	int fd, ret;
 
 	fd = open(filename, O_RDONLY);
 	if (fd < 0)
-		return 0;
+		return false;
 
 	ret = read(fd, &t, sizeof(t));
 	close(fd);
 
 	if (ret < 0) {
 		perror("read blktrace");
-		return 0;
+		return false;
 	} else if (ret != sizeof(t)) {
 		log_err("fio: short read on blktrace file\n");
-		return 0;
+		return false;
 	}
 
 	if ((t.magic & 0xffffff00) == BLK_IO_TRACE_MAGIC) {
 		*need_swap = 0;
-		return 1;
+		return true;
 	}
 
 	/*
@@ -104,10 +104,10 @@ int is_blktrace(const char *filename, int *need_swap)
 	t.magic = fio_swap32(t.magic);
 	if ((t.magic & 0xffffff00) == BLK_IO_TRACE_MAGIC) {
 		*need_swap = 1;
-		return 1;
+		return true;
 	}
 
-	return 0;
+	return false;
 }
 
 #define FMINORBITS	20
@@ -222,8 +222,9 @@ static void store_ipo(struct thread_data *td, unsigned long long offset,
 		      unsigned int bytes, int rw, unsigned long long ttime,
 		      int fileno, unsigned int bs)
 {
-	struct io_piece *ipo = malloc(sizeof(*ipo));
+	struct io_piece *ipo;
 
+	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
 
 	ipo->offset = offset * bs;
@@ -268,10 +269,14 @@ static void handle_trace_discard(struct thread_data *td,
 				 unsigned long long ttime,
 				 unsigned long *ios, unsigned int *rw_bs)
 {
-	struct io_piece *ipo = malloc(sizeof(*ipo));
+	struct io_piece *ipo;
 	unsigned int bs;
 	int fileno;
 
+	if (td->o.replay_skip & (1u << DDIR_TRIM))
+		return;
+
+	ipo = calloc(1, sizeof(*ipo));
 	init_ipo(ipo);
 	fileno = trace_add_file(td, t->device, &bs);
 
@@ -281,7 +286,6 @@ static void handle_trace_discard(struct thread_data *td,
 
 	td->o.size += t->bytes;
 
-	memset(ipo, 0, sizeof(*ipo));
 	INIT_FLIST_HEAD(&ipo->list);
 
 	ipo->offset = t->sector * bs;
@@ -311,6 +315,16 @@ static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 
 	rw = (t->action & BLK_TC_ACT(BLK_TC_WRITE)) != 0;
 
+	if (rw) {
+		if (td->o.replay_skip & (1u << DDIR_WRITE))
+			return;
+	} else {
+		if (td->o.replay_skip & (1u << DDIR_READ))
+			return;
+	}
+
+	assert(t->bytes);
+
 	if (t->bytes > rw_bs[rw])
 		rw_bs[rw] = t->bytes;
 
@@ -319,6 +333,29 @@ static void handle_trace_fs(struct thread_data *td, struct blk_io_trace *t,
 	store_ipo(td, t->sector, t->bytes, rw, ttime, fileno, bs);
 }
 
+static void handle_trace_flush(struct thread_data *td, struct blk_io_trace *t,
+			       unsigned long long ttime, unsigned long *ios)
+{
+	struct io_piece *ipo;
+	unsigned int bs;
+	int fileno;
+
+	if (td->o.replay_skip & (1u << DDIR_SYNC))
+		return;
+
+	ipo = calloc(1, sizeof(*ipo));
+	init_ipo(ipo);
+	fileno = trace_add_file(td, t->device, &bs);
+
+	ipo->delay = ttime / 1000;
+	ipo->ddir = DDIR_SYNC;
+	ipo->fileno = fileno;
+
+	ios[DDIR_SYNC]++;
+	dprint(FD_BLKTRACE, "store flush delay=%lu\n", ipo->delay);
+	queue_io_piece(td, ipo);
+}
+
 /*
  * We only care for queue traces, most of the others are side effects
  * due to internal workings of the block layer.
@@ -354,6 +391,8 @@ static void handle_trace(struct thread_data *td, struct blk_io_trace *t,
 		handle_trace_notify(t);
 	else if (t->action & BLK_TC_ACT(BLK_TC_DISCARD))
 		handle_trace_discard(td, t, delay, ios, bs);
+	else if (t->action & BLK_TC_ACT(BLK_TC_FLUSH))
+		handle_trace_flush(td, t, delay, ios);
 	else
 		handle_trace_fs(td, t, delay, ios, bs);
 }
@@ -373,7 +412,7 @@ static void byteswap_trace(struct blk_io_trace *t)
 	t->pdu_len = fio_swap16(t->pdu_len);
 }
 
-static int t_is_write(struct blk_io_trace *t)
+static bool t_is_write(struct blk_io_trace *t)
 {
 	return (t->action & BLK_TC_ACT(BLK_TC_WRITE | BLK_TC_DISCARD)) != 0;
 }
@@ -423,20 +462,22 @@ static void depth_end(struct blk_io_trace *t, int *this_depth, int *depth)
  * Load a blktrace file by reading all the blk_io_trace entries, and storing
  * them as io_pieces like the fio text version would do.
  */
-int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
+bool load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 {
 	struct blk_io_trace t;
-	unsigned long ios[DDIR_RWDIR_CNT], skipped_writes;
-	unsigned int rw_bs[DDIR_RWDIR_CNT];
+	unsigned long ios[DDIR_RWDIR_SYNC_CNT] = { };
+	unsigned int rw_bs[DDIR_RWDIR_CNT] = { };
+	unsigned long skipped_writes;
 	struct fifo *fifo;
-	int fd, i, old_state;
+	int fd, i, old_state, max_depth;
 	struct fio_file *f;
-	int this_depth[DDIR_RWDIR_CNT], depth[DDIR_RWDIR_CNT], max_depth;
+	int this_depth[DDIR_RWDIR_CNT] = { };
+	int depth[DDIR_RWDIR_CNT] = { };
 
 	fd = open(filename, O_RDONLY);
 	if (fd < 0) {
 		td_verror(td, errno, "open blktrace file");
-		return 1;
+		return false;
 	}
 
 	fifo = fifo_alloc(TRACE_FIFO_SIZE);
@@ -444,14 +485,6 @@ int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 	old_state = td_bump_runstate(td, TD_SETTING_UP);
 
 	td->o.size = 0;
-
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		ios[i] = 0;
-		rw_bs[i] = 0;
-		this_depth[i] = 0;
-		depth[i] = 0;
-	}
-
 	skipped_writes = 0;
 	do {
 		int ret = trace_fifo_get(td, fifo, fd, &t, sizeof(t));
@@ -514,7 +547,7 @@ int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 
 	if (!td->files_index) {
 		log_err("fio: did not find replay device(s)\n");
-		return 1;
+		return false;
 	}
 
 	/*
@@ -534,9 +567,10 @@ int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 		log_err("fio: %s skips replay of %lu writes due to read-only\n",
 						td->o.name, skipped_writes);
 
-	if (!ios[DDIR_READ] && !ios[DDIR_WRITE]) {
+	if (!ios[DDIR_READ] && !ios[DDIR_WRITE] && !ios[DDIR_TRIM] &&
+	    !ios[DDIR_SYNC]) {
 		log_err("fio: found no ios in blktrace data\n");
-		return 1;
+		return false;
 	} else if (ios[DDIR_READ] && !ios[DDIR_WRITE]) {
 		td->o.td_ddir = TD_DDIR_READ;
 		td->o.max_bs[DDIR_READ] = rw_bs[DDIR_READ];
@@ -564,9 +598,9 @@ int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 	if (!fio_option_is_set(&td->o, iodepth))
 		td->o.iodepth = td->o.iodepth_low = max_depth;
 
-	return 0;
+	return true;
 err:
 	close(fd);
 	fifo_free(fifo);
-	return 1;
+	return false;
 }
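is_blktrace() now returns bool and decides from a single record. The top 24 bits of `t.magic` must equal BLK_IO_TRACE_MAGIC, either as read or after a 32-bit byte swap, and *need_swap records which case matched. The sketch below makes that decision on a bare magic value. It assumes the kernel's value of 0x65617400 for the constant. `magic_is_blktrace()` and `swap32()` are hypothetical names; fio itself uses fio_swap32().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of BLK_IO_TRACE_MAGIC from the kernel's blktrace_api.h. */
#define SKETCH_BLK_IO_TRACE_MAGIC 0x65617400u

/* Reverse the byte order of a 32-bit value (stand-in for fio_swap32()). */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Return true if 'magic' identifies a blktrace record. *need_swap is set
 * to 1 when the record was written in the opposite byte order.
 */
static bool magic_is_blktrace(uint32_t magic, int *need_swap)
{
	if ((magic & 0xffffff00u) == SKETCH_BLK_IO_TRACE_MAGIC) {
		*need_swap = 0;
		return true;
	}

	magic = swap32(magic);
	if ((magic & 0xffffff00u) == SKETCH_BLK_IO_TRACE_MAGIC) {
		*need_swap = 1;
		return true;
	}

	return false;
}
```

The 0xffffff00 mask ignores the low byte, which carries the trace format version, so the check accepts a record regardless of its version.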
diff --git a/blktrace.h b/blktrace.h
index 8656a95..096993e 100644
--- a/blktrace.h
+++ b/blktrace.h
@@ -3,20 +3,20 @@
 
 #ifdef FIO_HAVE_BLKTRACE
 
-int is_blktrace(const char *, int *);
-int load_blktrace(struct thread_data *, const char *, int);
+bool is_blktrace(const char *, int *);
+bool load_blktrace(struct thread_data *, const char *, int);
 
 #else
 
-static inline int is_blktrace(const char *fname, int *need_swap)
+static inline bool is_blktrace(const char *fname, int *need_swap)
 {
-	return 0;
+	return false;
 }
 
-static inline int load_blktrace(struct thread_data *td, const char *fname,
-				int need_swap)
+static inline bool load_blktrace(struct thread_data *td, const char *fname,
+				 int need_swap)
 {
-	return 1;
+	return false;
 }
 
 #endif
diff --git a/blktrace_api.h b/blktrace_api.h
index e2d8cb3..32ce1d8 100644
--- a/blktrace_api.h
+++ b/blktrace_api.h
@@ -9,7 +9,7 @@
 enum {
 	BLK_TC_READ	= 1 << 0,	/* reads */
 	BLK_TC_WRITE	= 1 << 1,	/* writes */
-	BLK_TC_BARRIER	= 1 << 2,	/* barrier */
+	BLK_TC_FLUSH	= 1 << 2,	/* flush */
 	BLK_TC_SYNC	= 1 << 3,	/* sync */
 	BLK_TC_QUEUE	= 1 << 4,	/* queueing/merging */
 	BLK_TC_REQUEUE	= 1 << 5,	/* requeueing */
diff --git a/cconv.c b/cconv.c
index 9e163b3..bfd699d 100644
--- a/cconv.c
+++ b/cconv.c
@@ -290,6 +290,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
 	o->replay_time_scale = le32_to_cpu(top->replay_time_scale);
+	o->replay_skip = le32_to_cpu(top->replay_skip);
 	o->per_job_logs = le32_to_cpu(top->per_job_logs);
 	o->write_bw_log = le32_to_cpu(top->write_bw_log);
 	o->write_lat_log = le32_to_cpu(top->write_lat_log);
@@ -479,6 +480,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
 	top->replay_time_scale = cpu_to_le32(o->replay_time_scale);
+	top->replay_skip = cpu_to_le32(o->replay_skip);
 	top->per_job_logs = cpu_to_le32(o->per_job_logs);
 	top->write_bw_log = cpu_to_le32(o->write_bw_log);
 	top->write_lat_log = cpu_to_le32(o->write_lat_log);
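Adding replay_skip touches two structs. struct thread_options holds the option in host order, and struct thread_options_pack holds the little-endian form that the client and server exchange. cconv.c needs a conversion in each direction. The patch also bumps FIO_SERVER_VER from 72 to 73, which fits the change to the packed layout. The sketch below shows the round trip for a single field. It writes explicit bytes instead of using fio's cpu_to_le32()/le32_to_cpu() helpers, and the struct names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical one-field stand-ins for thread_options / thread_options_pack. */
struct opts      { unsigned int replay_skip; };
struct opts_pack { uint8_t replay_skip[4]; };	/* little-endian on the wire */

/* Store v as four little-endian bytes, whatever the host byte order is. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}

static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t) in[0] | ((uint32_t) in[1] << 8) |
	       ((uint32_t) in[2] << 16) | ((uint32_t) in[3] << 24);
}

/* Counterpart of convert_thread_options_to_net() for this one field. */
static void opts_to_net(struct opts_pack *top, const struct opts *o)
{
	put_le32(top->replay_skip, o->replay_skip);
}

/* Counterpart of convert_thread_options_to_cpu() for this one field. */
static void opts_to_cpu(struct opts *o, const struct opts_pack *top)
{
	o->replay_skip = get_le32(top->replay_skip);
}
```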
diff --git a/fio.1 b/fio.1
index 24bdcdb..7d5d8be 100644
--- a/fio.1
+++ b/fio.1
@@ -2067,6 +2067,12 @@ value.
 Scale sector offsets down by this factor when replaying traces.
 .SS "Threads, processes and job synchronization"
 .TP
+.BI replay_skip \fR=\fPstr
+Sometimes it's useful to skip certain IO types in a replay trace. This could
+be, for instance, eliminating the writes in the trace. Or not replaying the
+trims/discards, if you are redirecting to a device that doesn't support them.
+This option takes a comma separated list of read, write, trim, sync.
+.TP
 .BI thread
 Fio defaults to creating jobs by using fork, however if this option is
 given, fio will create jobs by using POSIX Threads' function
diff --git a/iolog.c b/iolog.c
index 74c89f0..6e44119 100644
--- a/iolog.c
+++ b/iolog.c
@@ -211,7 +211,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
 	struct fio_rb_node **p, *parent;
 	struct io_piece *ipo, *__ipo;
 
-	ipo = malloc(sizeof(struct io_piece));
+	ipo = calloc(1, sizeof(struct io_piece));
 	init_ipo(ipo);
 	ipo->file = io_u->file;
 	ipo->offset = io_u->offset;
@@ -338,7 +338,7 @@ void write_iolog_close(struct thread_data *td)
  * Read version 2 iolog data. It is enhanced to include per-file logging,
  * syncs, etc.
  */
-static int read_iolog2(struct thread_data *td, FILE *f)
+static bool read_iolog2(struct thread_data *td, FILE *f)
 {
 	unsigned long long offset;
 	unsigned int bytes;
@@ -440,7 +440,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 		/*
 		 * Make note of file
 		 */
-		ipo = malloc(sizeof(*ipo));
+		ipo = calloc(1, sizeof(*ipo));
 		init_ipo(ipo);
 		ipo->ddir = rw;
 		if (rw == DDIR_WAIT) {
@@ -474,7 +474,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 	}
 
 	if (!reads && !writes && !waits)
-		return 1;
+		return false;
 	else if (reads && !writes)
 		td->o.td_ddir = TD_DDIR_READ;
 	else if (!reads && writes)
@@ -482,22 +482,22 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 	else
 		td->o.td_ddir = TD_DDIR_RW;
 
-	return 0;
+	return true;
 }
 
 /*
  * open iolog, check version, and call appropriate parser
  */
-static int init_iolog_read(struct thread_data *td)
+static bool init_iolog_read(struct thread_data *td)
 {
 	char buffer[256], *p;
 	FILE *f;
-	int ret;
+	bool ret;
 
 	f = fopen(td->o.read_iolog_file, "r");
 	if (!f) {
 		perror("fopen read iolog");
-		return 1;
+		return false;
 	}
 
 	p = fgets(buffer, sizeof(buffer), f);
@@ -505,7 +505,7 @@ static int init_iolog_read(struct thread_data *td)
 		td_verror(td, errno, "iolog read");
 		log_err("fio: unable to read iolog\n");
 		fclose(f);
-		return 1;
+		return false;
 	}
 
 	/*
@@ -516,7 +516,7 @@ static int init_iolog_read(struct thread_data *td)
 		ret = read_iolog2(td, f);
 	else {
 		log_err("fio: iolog version 1 is no longer supported\n");
-		ret = 1;
+		ret = false;
 	}
 
 	fclose(f);
@@ -526,7 +526,7 @@ static int init_iolog_read(struct thread_data *td)
 /*
  * Set up a log for storing io patterns.
  */
-static int init_iolog_write(struct thread_data *td)
+static bool init_iolog_write(struct thread_data *td)
 {
 	struct fio_file *ff;
 	FILE *f;
@@ -535,7 +535,7 @@ static int init_iolog_write(struct thread_data *td)
 	f = fopen(td->o.write_iolog_file, "a");
 	if (!f) {
 		perror("fopen write iolog");
-		return 1;
+		return false;
 	}
 
 	/*
@@ -550,7 +550,7 @@ static int init_iolog_write(struct thread_data *td)
 	 */
 	if (fprintf(f, "%s\n", iolog_ver2) < 0) {
 		perror("iolog init\n");
-		return 1;
+		return false;
 	}
 
 	/*
@@ -559,12 +559,12 @@ static int init_iolog_write(struct thread_data *td)
 	for_each_file(td, ff, i)
 		log_file(td, ff, FIO_LOG_ADD_FILE);
 
-	return 0;
+	return true;
 }
 
-int init_iolog(struct thread_data *td)
+bool init_iolog(struct thread_data *td)
 {
-	int ret = 0;
+	bool ret;
 
 	if (td->o.read_iolog_file) {
 		int need_swap;
@@ -579,8 +579,10 @@ int init_iolog(struct thread_data *td)
 			ret = init_iolog_read(td);
 	} else if (td->o.write_iolog_file)
 		ret = init_iolog_write(td);
+	else
+		ret = true;
 
-	if (ret)
+	if (!ret)
 		td_verror(td, EINVAL, "failed initializing iolog");
 
 	return ret;
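Now that init_iolog() returns bool, `true` means success. That is why the caller in backend.c changes to `if (!init_iolog(td))`. Under the old convention, the `int ret = 0` default already meant success. After the conversion, the path with no log configured needs an explicit `ret = true`, which the added `else` branch supplies. The sketch below reduces that control flow to a few lines. `struct job` is a hypothetical stand-in for the thread's options, and the two open helpers are trivial placeholders for the real read/write setup.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for td->o.read_iolog_file / write_iolog_file. */
struct job { const char *read_log; const char *write_log; };

/* Placeholder setup helpers: an empty name counts as a failure here. */
static bool open_read_log(const struct job *j)  { return j->read_log[0] != '\0'; }
static bool open_write_log(const struct job *j) { return j->write_log[0] != '\0'; }

/* Returns true on success and false on failure, like the reworked init_iolog(). */
static bool setup_log(const struct job *j)
{
	bool ok;

	if (j->read_log)
		ok = open_read_log(j);
	else if (j->write_log)
		ok = open_write_log(j);
	else
		ok = true;	/* nothing configured: succeed explicitly */

	if (!ok)
		fprintf(stderr, "failed initializing iolog\n");
	return ok;
}
```

A caller then reads `if (!setup_log(&job)) goto err;`, which matches the thread_main() change.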
diff --git a/iolog.h b/iolog.h
index f70eb61..60b4f01 100644
--- a/iolog.h
+++ b/iolog.h
@@ -234,7 +234,7 @@ struct io_u;
 extern int __must_check read_iolog_get(struct thread_data *, struct io_u *);
 extern void log_io_u(const struct thread_data *, const struct io_u *);
 extern void log_file(struct thread_data *, struct fio_file *, enum file_log_act);
-extern int __must_check init_iolog(struct thread_data *td);
+extern bool __must_check init_iolog(struct thread_data *td);
 extern void log_io_piece(struct thread_data *, struct io_u *);
 extern void unlog_io_piece(struct thread_data *, struct io_u *);
 extern void trim_io_piece(struct thread_data *, const struct io_u *);
@@ -296,7 +296,7 @@ extern int iolog_cur_flush(struct io_log *, struct io_logs *);
 
 static inline void init_ipo(struct io_piece *ipo)
 {
-	memset(ipo, 0, sizeof(*ipo));
+	INIT_FLIST_HEAD(&ipo->list);
 	INIT_FLIST_HEAD(&ipo->trim_list);
 }
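init_ipo() no longer clears the structure. It only initializes the two list heads, so every allocation site switches from malloc() to calloc() to get the zeroing. For the same reason, the patch drops the extra memset() in handle_trace_discard(). That memset() ran after init_ipo() and reset trim_list to NULL pointers. The sketch below shows the allocate-zeroed-then-init pattern. It uses a hypothetical minimal list type in place of fio's flist_head, and unlike the patched call sites it checks for allocation failure.

```c
#include <stdlib.h>

/* Hypothetical circular list head, similar in shape to fio's flist_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *h)
{
	h->next = h;
	h->prev = h;
}

struct piece {
	struct list_node list;
	struct list_node trim_list;
	unsigned long long offset;
	unsigned long delay;
	int fileno;
};

/* Like the new init_ipo(): sets up list heads only, relies on prior zeroing. */
static void piece_init(struct piece *p)
{
	list_init(&p->list);
	list_init(&p->trim_list);
}

static struct piece *piece_alloc(void)
{
	struct piece *p = calloc(1, sizeof(*p));	/* all fields start at zero */

	if (p)
		piece_init(p);
	return p;
}
```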
 
diff --git a/options.c b/options.c
index 0b3a895..047e493 100644
--- a/options.c
+++ b/options.c
@@ -342,6 +342,43 @@ static int ignore_error_type(struct thread_data *td, enum error_type_bit etype,
 
 }
 
+static int str_replay_skip_cb(void *data, const char *input)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	char *str, *p, *n;
+	int ret = 0;
+
+	if (parse_dryrun())
+		return 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	while (p) {
+		n = strchr(p, ',');
+		if (n)
+			*n++ = '\0';
+		if (!strcmp(p, "read"))
+			td->o.replay_skip |= 1u << DDIR_READ;
+		else if (!strcmp(p, "write"))
+			td->o.replay_skip |= 1u << DDIR_WRITE;
+		else if (!strcmp(p, "trim"))
+			td->o.replay_skip |= 1u << DDIR_TRIM;
+		else if (!strcmp(p, "sync"))
+			td->o.replay_skip |= 1u << DDIR_SYNC;
+		else {
+			log_err("Unknown skip type: %s\n", p);
+			ret = 1;
+			break;
+		}
+		p = n;
+	}
+	free(str);
+	return ret;
+}
+
 static int str_ignore_error_cb(void *data, const char *input)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -3159,6 +3196,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_IOLOG,
 	},
 	{
+		.name	= "replay_skip",
+		.lname	= "Replay Skip",
+		.type	= FIO_OPT_STR,
+		.cb	= str_replay_skip_cb,
+		.off1	= offsetof(struct thread_options, replay_skip),
+		.parent	= "read_iolog",
+		.help	= "Skip certain IO types (read,write,trim,flush)",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IOLOG,
+	},
+	{
 		.name	= "exec_prerun",
 		.lname	= "Pre-execute runnable",
 		.type	= FIO_OPT_STR_STORE,
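str_replay_skip_cb() turns a list such as `write,trim` into a bitmask, setting bit `1u << DDIR_x` for each named type. Each trace handler then returns early when its bit is set. The sketch below reproduces that parse-and-test logic outside fio. The SK_* indices are hypothetical stand-ins for the DDIR_* values.

Two details of the patch are relevant here. First, the option's help string lists `flush`, but the parser and the man page accept `sync`, so `flush` is rejected as unknown. Second, parsing starts from the unstripped pointer `p`, while free() receives `str`, which strip_blank_front() can advance. The sketch parses and frees the same buffer and leaves out blank stripping.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical direction indices standing in for fio's DDIR_* values. */
enum { SK_READ, SK_WRITE, SK_TRIM, SK_SYNC };

/*
 * Parse a comma-separated list such as "write,trim" into *mask.
 * Returns 0 on success and 1 on an unknown token, like str_replay_skip_cb().
 */
static int parse_skip(const char *input, unsigned int *mask)
{
	size_t len = strlen(input) + 1;
	char *buf = malloc(len);	/* private copy; freed via this pointer */
	char *p, *n;
	int ret = 0;

	if (!buf)
		return 1;
	memcpy(buf, input, len);

	for (p = buf; p; p = n) {
		n = strchr(p, ',');
		if (n)
			*n++ = '\0';	/* terminate this token, step past ',' */
		if (!strcmp(p, "read"))
			*mask |= 1u << SK_READ;
		else if (!strcmp(p, "write"))
			*mask |= 1u << SK_WRITE;
		else if (!strcmp(p, "trim"))
			*mask |= 1u << SK_TRIM;
		else if (!strcmp(p, "sync"))
			*mask |= 1u << SK_SYNC;
		else {
			fprintf(stderr, "Unknown skip type: %s\n", p);
			ret = 1;
			break;
		}
	}
	free(buf);
	return ret;
}

/* The per-record test the trace handlers perform before queueing. */
static int skip_type(unsigned int mask, int type)
{
	return (mask & (1u << type)) != 0;
}
```

The option's parent is read_iolog, so in a job file it appears next to a `read_iolog=` line, for example as `replay_skip=write,trim`.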
diff --git a/server.h b/server.h
index 4896860..b48bbe1 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 72,
+	FIO_SERVER_VER			= 73,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 4ec570d..52026e3 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -316,6 +316,7 @@ struct thread_options {
 	unsigned int replay_align;
 	unsigned int replay_scale;
 	unsigned int replay_time_scale;
+	unsigned int replay_skip;
 
 	unsigned int per_job_logs;
 
@@ -590,6 +591,7 @@ struct thread_options_pack {
 	uint32_t replay_align;
 	uint32_t replay_scale;
 	uint32_t replay_time_scale;
+	uint32_t replay_skip;
 
 	uint32_t per_job_logs;
 


* Recent changes (master)
@ 2018-04-25 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-25 12:00 UTC
  To: fio

The following changes since commit 2e4ef4fbd69eb6d4c07f2f362463e3f3df2e808c:

  engines: fixup fio_q_status style violations (2018-04-20 09:46:19 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca0a0b521e8650ed736246c00092094a4d5c9829:

  t/*: missing statics (2018-04-24 14:03:34 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      configure: use proper CONFIG_ prefix for asprintf/vasprintf
      engines/libaio: don't sleep for 0 reap return for 0 event check
      t/*: missing statics

 configure        | 4 ++--
 engines/libaio.c | 3 ++-
 oslib/asprintf.c | 4 ++--
 oslib/asprintf.h | 4 ++--
 t/lfsr-test.c    | 2 +-
 t/stest.c        | 2 +-
 6 files changed, 10 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 70cb006..9bdc7a1 100755
--- a/configure
+++ b/configure
@@ -2218,10 +2218,10 @@ if test "$posix_pshared" = "yes" ; then
   output_sym "CONFIG_PSHARED"
 fi
 if test "$have_asprintf" = "yes" ; then
-    output_sym "HAVE_ASPRINTF"
+    output_sym "CONFIG_HAVE_ASPRINTF"
 fi
 if test "$have_vasprintf" = "yes" ; then
-    output_sym "HAVE_VASPRINTF"
+    output_sym "CONFIG_HAVE_VASPRINTF"
 fi
 if test "$linux_fallocate" = "yes" ; then
   output_sym "CONFIG_LINUX_FALLOCATE"
diff --git a/engines/libaio.c b/engines/libaio.c
index f46b331..dae2a70 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -169,7 +169,8 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 			events += r;
 		else if ((min && r == 0) || r == -EAGAIN) {
 			fio_libaio_commit(td);
-			usleep(100);
+			if (actual_min)
+				usleep(10);
 		} else if (r != -EINTR)
 			break;
 	} while (events < min);
diff --git a/oslib/asprintf.c b/oslib/asprintf.c
index f1e7fd2..969479f 100644
--- a/oslib/asprintf.c
+++ b/oslib/asprintf.c
@@ -3,7 +3,7 @@
 #include <stdlib.h>
 #include "oslib/asprintf.h"
 
-#ifndef HAVE_VASPRINTF
+#ifndef CONFIG_HAVE_VASPRINTF
 int vasprintf(char **strp, const char *fmt, va_list ap)
 {
     va_list ap_copy;
@@ -28,7 +28,7 @@ int vasprintf(char **strp, const char *fmt, va_list ap)
 }
 #endif
 
-#ifndef HAVE_ASPRINTF
+#ifndef CONFIG_HAVE_ASPRINTF
 int asprintf(char **strp, const char *fmt, ...)
 {
     va_list arg;
diff --git a/oslib/asprintf.h b/oslib/asprintf.h
index 1aa076b..7425300 100644
--- a/oslib/asprintf.h
+++ b/oslib/asprintf.h
@@ -1,10 +1,10 @@
 #ifndef FIO_ASPRINTF_H
 #define FIO_ASPRINTF_H
 
-#ifndef HAVE_VASPRINTF
+#ifndef CONFIG_HAVE_VASPRINTF
 int vasprintf(char **strp, const char *fmt, va_list ap);
 #endif
-#ifndef HAVE_ASPRINTF
+#ifndef CONFIG_HAVE_ASPRINTF
 int asprintf(char **strp, const char *fmt, ...);
 #endif
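configure, oslib/asprintf.c and oslib/asprintf.h must agree on the guard name. After the rename, all three use CONFIG_HAVE_ASPRINTF and CONFIG_HAVE_VASPRINTF, so the fallbacks are compiled only when libc lacks the functions. The sketch below shows such a fallback using the measure-then-format idiom. It first calls vsnprintf(NULL, 0, ...) on a va_copy() of the arguments to get the length, allocates, and then formats. Only part of fio's own version is visible above, so this is not a copy of it. The `fallback_` prefix is hypothetical and avoids a clash with a libc definition.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate *strp and format into it; returns the length or -1 on error. */
static int fallback_vasprintf(char **strp, const char *fmt, va_list ap)
{
	va_list ap_copy;
	int len, ret;

	va_copy(ap_copy, ap);		/* the arguments are consumed twice */
	len = vsnprintf(NULL, 0, fmt, ap_copy);
	va_end(ap_copy);
	if (len < 0)
		return -1;

	*strp = malloc((size_t) len + 1);
	if (!*strp)
		return -1;

	ret = vsnprintf(*strp, (size_t) len + 1, fmt, ap);
	if (ret < 0) {
		free(*strp);
		*strp = NULL;
	}
	return ret;
}

static int fallback_asprintf(char **strp, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = fallback_vasprintf(strp, fmt, ap);
	va_end(ap);
	return ret;
}
```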
 
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index abdbafb..a01f2cf 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -7,7 +7,7 @@
 #include "../gettime.h"
 #include "../fio_time.h"
 
-void usage()
+static void usage(void)
 {
 	printf("Usage: lfsr-test 0x<numbers> [seed] [spin] [verify]\n");
 	printf("-------------------------------------------------------------\n");
diff --git a/t/stest.c b/t/stest.c
index 04df60d..b95968f 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -18,7 +18,7 @@ struct elem {
 	unsigned int magic2;
 };
 
-FLIST_HEAD(list);
+static FLIST_HEAD(list);
 
 static int do_rand_allocs(void)
 {


* Recent changes (master)
@ 2018-04-21 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-21 12:00 UTC
  To: fio

The following changes since commit 6897af4622fec753b5b76a4f2f7865dd56550ea4:

  Remove verifysort/verifysort_nr from documentation (2018-04-18 10:52:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2e4ef4fbd69eb6d4c07f2f362463e3f3df2e808c:

  engines: fixup fio_q_status style violations (2018-04-20 09:46:19 -0600)

----------------------------------------------------------------
Bart Van Assche (7):
      Declare stat_calc_lat_nu() static
      Introduce enum n2s_unit
      Simplify num2str()
      Change return type of td_io_commit() into void
      gfapi: Make fio_gf_queue() set the I/O unit error status instead of returning -EINVAL
      Remove dead code from fio_io_sync()
      Introduce enum fio_q_status

Jens Axboe (2):
      Merge branch 'master' of https://github.com/bvanassche/fio
      engines: fixup fio_q_status style violations

 backend.c                   | 29 +++++++++++------------------
 engines/cpu.c               |  3 ++-
 engines/dev-dax.c           |  3 ++-
 engines/e4defrag.c          |  3 ++-
 engines/falloc.c            |  6 ++++--
 engines/filecreate.c        |  3 ++-
 engines/ftruncate.c         |  6 ++++--
 engines/fusion-aw.c         |  2 +-
 engines/glusterfs_async.c   |  4 ++--
 engines/glusterfs_sync.c    |  5 +++--
 engines/guasi.c             |  3 ++-
 engines/libaio.c            |  3 ++-
 engines/libhdfs.c           |  3 ++-
 engines/libpmem.c           |  3 ++-
 engines/mmap.c              |  3 ++-
 engines/mtd.c               |  3 ++-
 engines/net.c               |  8 +++++---
 engines/null.c              |  7 ++++---
 engines/pmemblk.c           |  3 ++-
 engines/posixaio.c          |  4 ++--
 engines/rados.c             |  3 ++-
 engines/rbd.c               |  3 ++-
 engines/rdma.c              |  3 ++-
 engines/sg.c                |  8 +++++---
 engines/skeleton_external.c |  3 ++-
 engines/splice.c            |  3 ++-
 engines/sync.c              | 16 ++++++++++------
 engines/windowsaio.c        |  3 ++-
 init.c                      |  6 +++---
 io_u.c                      |  7 ++-----
 ioengines.c                 | 19 ++++++-------------
 ioengines.h                 | 10 +++++-----
 lib/num2str.c               | 27 ++++++++++++++++-----------
 lib/num2str.h               | 18 ++++++++++--------
 options.c                   |  6 +++---
 stat.c                      |  2 +-
 36 files changed, 131 insertions(+), 110 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index a2a0b3d..d5cb6ef 100644
--- a/backend.c
+++ b/backend.c
@@ -268,7 +268,7 @@ static void cleanup_pending_aio(struct thread_data *td)
 static bool fio_io_sync(struct thread_data *td, struct fio_file *f)
 {
 	struct io_u *io_u = __get_io_u(td);
-	int ret;
+	enum fio_q_status ret;
 
 	if (!io_u)
 		return true;
@@ -283,16 +283,13 @@ static bool fio_io_sync(struct thread_data *td, struct fio_file *f)
 
 requeue:
 	ret = td_io_queue(td, io_u);
-	if (ret < 0) {
-		td_verror(td, io_u->error, "td_io_queue");
-		put_io_u(td, io_u);
-		return true;
-	} else if (ret == FIO_Q_QUEUED) {
-		if (td_io_commit(td))
-			return true;
+	switch (ret) {
+	case FIO_Q_QUEUED:
+		td_io_commit(td);
 		if (io_u_queued_complete(td, 1) < 0)
 			return true;
-	} else if (ret == FIO_Q_COMPLETED) {
+		break;
+	case FIO_Q_COMPLETED:
 		if (io_u->error) {
 			td_verror(td, io_u->error, "td_io_queue");
 			return true;
@@ -300,9 +297,9 @@ requeue:
 
 		if (io_u_sync_complete(td, io_u) < 0)
 			return true;
-	} else if (ret == FIO_Q_BUSY) {
-		if (td_io_commit(td))
-			return true;
+		break;
+	case FIO_Q_BUSY:
+		td_io_commit(td);
 		goto requeue;
 	}
 
@@ -453,8 +450,6 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 		   enum fio_ddir ddir, uint64_t *bytes_issued, int from_verify,
 		   struct timespec *comp_time)
 {
-	int ret2;
-
 	switch (*ret) {
 	case FIO_Q_COMPLETED:
 		if (io_u->error) {
@@ -530,9 +525,7 @@ sync_done:
 		if (!from_verify)
 			unlog_io_piece(td, io_u);
 		requeue_io_u(td, &io_u);
-		ret2 = td_io_commit(td);
-		if (ret2 < 0)
-			*ret = ret2;
+		td_io_commit(td);
 		break;
 	default:
 		assert(*ret < 0);
@@ -605,7 +598,7 @@ static bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u)
 	return overlap;
 }
 
-static int io_u_submit(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status io_u_submit(struct thread_data *td, struct io_u *io_u)
 {
 	/*
 	 * Check for overlap if the user asked us to, and we have
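After this series, ->queue() returns enum fio_q_status and td_io_commit() returns void. fio_io_sync() therefore becomes a switch over the three statuses, and the commit-error branches disappear. The sketch below is a reduced model of that loop. The Q_* names mirror FIO_Q_COMPLETED, FIO_Q_QUEUED and FIO_Q_BUSY, but their numeric values and the toy `struct engine` are assumptions. Because `ret` has enum type, compilers that implement -Wswitch can flag a status the switch forgets to handle.

```c
#include <stdbool.h>

/* Assumed mirror of fio's queue statuses. */
enum q_status { Q_COMPLETED = 0, Q_QUEUED = 1, Q_BUSY = 2 };

/* Hypothetical engine that reports BUSY for the first 'busy_left' submissions. */
struct engine { int busy_left; int commits; int queued; };

static enum q_status engine_queue(struct engine *e)
{
	if (e->busy_left > 0) {
		e->busy_left--;
		return Q_BUSY;
	}
	e->queued++;
	return Q_QUEUED;
}

static void engine_commit(struct engine *e)
{
	e->commits++;
}

/* Submit one unit following the post-patch fio_io_sync() shape; true = error. */
static bool submit_one(struct engine *e, const int *io_error)
{
	enum q_status ret;

requeue:
	ret = engine_queue(e);
	switch (ret) {
	case Q_QUEUED:
		engine_commit(e);	/* push the queued unit to the backend */
		break;
	case Q_COMPLETED:
		if (*io_error)
			return true;	/* error travels in the unit, not the status */
		break;
	case Q_BUSY:
		engine_commit(e);	/* make room, then retry the same unit */
		goto requeue;
	}
	return false;
}
```

If the queue returns BUSY twice before it accepts the unit, the loop commits three times: once per BUSY retry and once after queueing.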
diff --git a/engines/cpu.c b/engines/cpu.c
index d0b4a89..0987250 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -53,7 +53,8 @@ static struct fio_option options[] = {
 };
 
 
-static int fio_cpuio_queue(struct thread_data *td, struct io_u fio_unused *io_u)
+static enum fio_q_status fio_cpuio_queue(struct thread_data *td,
+					 struct io_u fio_unused *io_u)
 {
 	struct cpu_options *co = td->eo;
 
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index caae1e0..0660bba 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -182,7 +182,8 @@ done:
 	return 0;
 }
 
-static int fio_devdax_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_devdax_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	fio_ro_check(td, io_u);
 	io_u->error = 0;
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index 3619450..8f71d02 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -127,7 +127,8 @@ static void fio_e4defrag_cleanup(struct thread_data *td)
 }
 
 
-static int fio_e4defrag_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_e4defrag_queue(struct thread_data *td,
+					    struct io_u *io_u)
 {
 
 	int ret;
diff --git a/engines/falloc.c b/engines/falloc.c
index bb3ac85..6382569 100644
--- a/engines/falloc.c
+++ b/engines/falloc.c
@@ -65,8 +65,10 @@ open_again:
 #endif
 #ifndef FALLOC_FL_PUNCH_HOLE
 #define FALLOC_FL_PUNCH_HOLE    0x02 /* de-allocates range */
-#endif 
-static int fio_fallocate_queue(struct thread_data *td, struct io_u *io_u)
+#endif
+
+static enum fio_q_status fio_fallocate_queue(struct thread_data *td,
+					     struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	int ret;
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 6fa041c..39a2950 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -55,7 +55,8 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static int queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
+static enum fio_q_status queue_io(struct thread_data *td,
+				  struct io_u fio_unused *io_u)
 {
 	return FIO_Q_COMPLETED;
 }
diff --git a/engines/ftruncate.c b/engines/ftruncate.c
index 14e115f..c7ad038 100644
--- a/engines/ftruncate.c
+++ b/engines/ftruncate.c
@@ -11,18 +11,20 @@
 
 #include "../fio.h"
 
-static int fio_ftruncate_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_ftruncate_queue(struct thread_data *td,
+					     struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	int ret;
+
 	fio_ro_check(td, io_u);
 
 	if (io_u->ddir != DDIR_WRITE) {
 		io_u->error = EINVAL;
 		return FIO_Q_COMPLETED;
 	}
-	ret = ftruncate(f->fd, io_u->offset);
 
+	ret = ftruncate(f->fd, io_u->offset);
 	if (ret)
 		io_u->error = errno;
 
diff --git a/engines/fusion-aw.c b/engines/fusion-aw.c
index 77844ff..eb5fdf5 100644
--- a/engines/fusion-aw.c
+++ b/engines/fusion-aw.c
@@ -34,7 +34,7 @@ struct fas_data {
 	size_t sector_size;
 };
 
-static int queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct fas_data *d = FILE_ENG_DATA(io_u->file);
 	int rc;
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index eb8df45..9e1c4bf 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -93,8 +93,8 @@ static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
 	iou->io_complete = 1;
 }
 
-static int fio_gf_async_queue(struct thread_data fio_unused * td,
-			      struct io_u *io_u)
+static enum fio_q_status fio_gf_async_queue(struct thread_data fio_unused * td,
+					    struct io_u *io_u)
 {
 	struct gf_data *g = td->io_ops_data;
 	int r;
diff --git a/engines/glusterfs_sync.c b/engines/glusterfs_sync.c
index 25d05b2..a10e0ed 100644
--- a/engines/glusterfs_sync.c
+++ b/engines/glusterfs_sync.c
@@ -29,7 +29,7 @@ static int fio_gf_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static int fio_gf_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct gf_data *g = td->io_ops_data;
 	int ret = 0;
@@ -47,7 +47,8 @@ static int fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 		ret = glfs_fdatasync(g->fd);
 	else {
 		log_err("unsupported operation.\n");
-		return -EINVAL;
+		io_u->error = EINVAL;
+		return FIO_Q_COMPLETED;
 	}
 	dprint(FD_FILE, "fio len %lu ret %d\n", io_u->xfer_buflen, ret);
 	if (io_u->file && ret >= 0 && ddir_rw(io_u->ddir))
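fio_gf_queue() used to return -EINVAL for an unsupported operation, which is not an fio_q_status value. The patch instead stores EINVAL in io_u->error and returns FIO_Q_COMPLETED. The caller's FIO_Q_COMPLETED branch then checks io_u->error and reports the failure, as in io_queue_event() above. The sketch below shows that convention for a synchronous engine. `struct unit`, the OP_* operations and the Q_* values are hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-ins for fio_q_status, the I/O direction and io_u. */
enum q_status { Q_COMPLETED = 0, Q_QUEUED = 1, Q_BUSY = 2 };
enum op { OP_READ, OP_WRITE, OP_SYNC, OP_TRIM };
struct unit { enum op op; int error; };

/*
 * A synchronous ->queue(): never returns a negative errno. Failures are
 * recorded in the unit, and the unit is always reported as completed.
 */
static enum q_status sync_queue(struct unit *u)
{
	u->error = 0;

	switch (u->op) {
	case OP_READ:
	case OP_WRITE:
	case OP_SYNC:
		/* the actual transfer would happen here */
		break;
	default:
		u->error = EINVAL;	/* unsupported operation */
		break;
	}
	return Q_COMPLETED;
}
```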
diff --git a/engines/guasi.c b/engines/guasi.c
index 9644ee5..cb26802 100644
--- a/engines/guasi.c
+++ b/engines/guasi.c
@@ -113,7 +113,8 @@ static int fio_guasi_getevents(struct thread_data *td, unsigned int min,
 	return n;
 }
 
-static int fio_guasi_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_guasi_queue(struct thread_data *td,
+					 struct io_u *io_u)
 {
 	struct guasi_data *ld = td->io_ops_data;
 
diff --git a/engines/libaio.c b/engines/libaio.c
index 7d59df3..f46b331 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -177,7 +177,8 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
-static int fio_libaio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_libaio_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
 
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index 96a0871..6000160 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -165,7 +165,8 @@ static int fio_hdfsio_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static int fio_hdfsio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_hdfsio_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	struct hdfsio_data *hd = td->io_ops_data;
 	struct hdfsio_options *options = td->eo;
diff --git a/engines/libpmem.c b/engines/libpmem.c
index dbb3f5c..21ff4f6 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -457,7 +457,8 @@ done:
 	return 0;
 }
 
-static int fio_libpmem_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_libpmem_queue(struct thread_data *td,
+					   struct io_u *io_u)
 {
 	fio_ro_check(td, io_u);
 	io_u->error = 0;
diff --git a/engines/mmap.c b/engines/mmap.c
index 9dbefc8..308b466 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -177,7 +177,8 @@ done:
 	return 0;
 }
 
-static int fio_mmapio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_mmapio_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
diff --git a/engines/mtd.c b/engines/mtd.c
index 5f822fc..b9f4316 100644
--- a/engines/mtd.c
+++ b/engines/mtd.c
@@ -71,7 +71,8 @@ static int fio_mtd_is_bad(struct thread_data *td,
 	return ret;
 }
 
-static int fio_mtd_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_mtd_queue(struct thread_data *td,
+				       struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_mtd_data *fmd = FILE_ENG_DATA(f);
diff --git a/engines/net.c b/engines/net.c
index 4540e0e..ca6fb34 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -642,8 +642,9 @@ static int fio_netio_recv(struct thread_data *td, struct io_u *io_u)
 	return ret;
 }
 
-static int __fio_netio_queue(struct thread_data *td, struct io_u *io_u,
-			     enum fio_ddir ddir)
+static enum fio_q_status __fio_netio_queue(struct thread_data *td,
+					   struct io_u *io_u,
+					   enum fio_ddir ddir)
 {
 	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
@@ -687,7 +688,8 @@ static int __fio_netio_queue(struct thread_data *td, struct io_u *io_u,
 	return FIO_Q_COMPLETED;
 }
 
-static int fio_netio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_netio_queue(struct thread_data *td,
+					 struct io_u *io_u)
 {
 	struct netio_options *o = td->eo;
 	int ret;
diff --git a/engines/null.c b/engines/null.c
index 8c26ad7..4cc0102 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -56,8 +56,8 @@ static int null_commit(struct thread_data *td, struct null_data *nd)
 	return 0;
 }
 
-static int null_queue(struct thread_data *td, struct null_data *nd,
-		      struct io_u *io_u)
+static enum fio_q_status null_queue(struct thread_data *td,
+				    struct null_data *nd, struct io_u *io_u)
 {
 	fio_ro_check(td, io_u);
 
@@ -118,7 +118,8 @@ static int fio_null_commit(struct thread_data *td)
 	return null_commit(td, td->io_ops_data);
 }
 
-static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_null_queue(struct thread_data *td,
+					struct io_u *io_u)
 {
 	return null_queue(td, td->io_ops_data, io_u);
 }
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 264eb71..45f6fb6 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -342,7 +342,8 @@ static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_pmemblk_queue(struct thread_data *td,
+					   struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
diff --git a/engines/posixaio.c b/engines/posixaio.c
index bddb1ec..4ac0195 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -166,8 +166,8 @@ static struct io_u *fio_posixaio_event(struct thread_data *td, int event)
 	return pd->aio_events[event];
 }
 
-static int fio_posixaio_queue(struct thread_data *td,
-			      struct io_u *io_u)
+static enum fio_q_status fio_posixaio_queue(struct thread_data *td,
+					    struct io_u *io_u)
 {
 	struct posixaio_data *pd = td->io_ops_data;
 	os_aiocb_t *aiocb = &io_u->aiocb;
diff --git a/engines/rados.c b/engines/rados.c
index dc0d7b1..c6aec73 100644
--- a/engines/rados.c
+++ b/engines/rados.c
@@ -251,7 +251,8 @@ static void fio_rados_cleanup(struct thread_data *td)
 	}
 }
 
-static int fio_rados_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_rados_queue(struct thread_data *td,
+					 struct io_u *io_u)
 {
 	struct rados_data *rados = td->io_ops_data;
 	struct fio_rados_iou *fri = io_u->engine_data;
diff --git a/engines/rbd.c b/engines/rbd.c
index 6582b06..081b4a0 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -462,7 +462,8 @@ static int fio_rbd_getevents(struct thread_data *td, unsigned int min,
 	return events;
 }
 
-static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_rbd_queue(struct thread_data *td,
+				       struct io_u *io_u)
 {
 	struct rbd_data *rbd = td->io_ops_data;
 	struct fio_rbd_iou *fri = io_u->engine_data;
diff --git a/engines/rdma.c b/engines/rdma.c
index 8def6eb..2569a8e 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -791,7 +791,8 @@ static int fio_rdmaio_recv(struct thread_data *td, struct io_u **io_us,
 	return i;
 }
 
-static int fio_rdmaio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_rdmaio_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	struct rdmaio_data *rd = td->io_ops_data;
 
diff --git a/engines/sg.c b/engines/sg.c
index c2c0de3..d4848bc 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -236,8 +236,9 @@ re_read:
 	return r;
 }
 
-static int fio_sgio_ioctl_doio(struct thread_data *td,
-			       struct fio_file *f, struct io_u *io_u)
+static enum fio_q_status fio_sgio_ioctl_doio(struct thread_data *td,
+					     struct fio_file *f,
+					     struct io_u *io_u)
 {
 	struct sgio_data *sd = td->io_ops_data;
 	struct sg_io_hdr *hdr = &io_u->hdr;
@@ -377,7 +378,8 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static int fio_sgio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_sgio_queue(struct thread_data *td,
+					struct io_u *io_u)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
 	int ret, do_sync = 0;
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 56f89f9..21a3601 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -90,7 +90,8 @@ static int fio_skeleton_cancel(struct thread_data *td, struct io_u *io_u)
  * io_u->xfer_buflen. Residual data count may be set in io_u->resid
  * for a short read/write.
  */
-static int fio_skeleton_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_skeleton_queue(struct thread_data *td,
+					    struct io_u *io_u)
 {
 	/*
 	 * Double sanity check to catch errant write on a readonly setup
diff --git a/engines/splice.c b/engines/splice.c
index 08fc857..feb764f 100644
--- a/engines/splice.c
+++ b/engines/splice.c
@@ -199,7 +199,8 @@ static int fio_splice_write(struct thread_data *td, struct io_u *io_u)
 	return io_u->xfer_buflen;
 }
 
-static int fio_spliceio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_spliceio_queue(struct thread_data *td,
+					    struct io_u *io_u)
 {
 	struct spliceio_data *sd = td->io_ops_data;
 	int ret = 0;
diff --git a/engines/sync.c b/engines/sync.c
index d5b4012..3f36da8 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -110,7 +110,8 @@ static int fio_io_end(struct thread_data *td, struct io_u *io_u, int ret)
 }
 
 #ifdef CONFIG_PWRITEV
-static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_pvsyncio_queue(struct thread_data *td,
+					    struct io_u *io_u)
 {
 	struct syncio_data *sd = td->io_ops_data;
 	struct iovec *iov = &sd->iovecs[0];
@@ -137,7 +138,8 @@ static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
 #endif
 
 #ifdef FIO_HAVE_PWRITEV2
-static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_pvsyncio2_queue(struct thread_data *td,
+					     struct io_u *io_u)
 {
 	struct syncio_data *sd = td->io_ops_data;
 	struct psyncv2_options *o = td->eo;
@@ -168,8 +170,8 @@ static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 }
 #endif
 
-
-static int fio_psyncio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_psyncio_queue(struct thread_data *td,
+					   struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	int ret;
@@ -189,7 +191,8 @@ static int fio_psyncio_queue(struct thread_data *td, struct io_u *io_u)
 	return fio_io_end(td, io_u, ret);
 }
 
-static int fio_syncio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_syncio_queue(struct thread_data *td,
+					  struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	int ret;
@@ -260,7 +263,8 @@ static void fio_vsyncio_set_iov(struct syncio_data *sd, struct io_u *io_u,
 	sd->queued++;
 }
 
-static int fio_vsyncio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_vsyncio_queue(struct thread_data *td,
+					   struct io_u *io_u)
 {
 	struct syncio_data *sd = td->io_ops_data;
 
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 9439393..13d7f19 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -354,7 +354,8 @@ static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
 	return dequeued;
 }
 
-static int fio_windowsaio_queue(struct thread_data *td, struct io_u *io_u)
+static enum fio_q_status fio_windowsaio_queue(struct thread_data *td,
+					      struct io_u *io_u)
 {
 	struct fio_overlapped *o = io_u->engine_data;
 	LPOVERLAPPED lpOvl = &o->o;
diff --git a/init.c b/init.c
index 07d1cdd..9257d47 100644
--- a/init.c
+++ b/init.c
@@ -833,11 +833,11 @@ static int fixup_options(struct thread_data *td)
 		}
 	}
 
-	if (!o->unit_base) {
+	if (o->unit_base == N2S_NONE) {
 		if (td_ioengine_flagged(td, FIO_BIT_BASED))
-			o->unit_base = 1;
+			o->unit_base = N2S_BITPERSEC;
 		else
-			o->unit_base = 8;
+			o->unit_base = N2S_BYTEPERSEC;
 	}
 
 #ifndef FIO_HAVE_ANY_FALLOCATE
diff --git a/io_u.c b/io_u.c
index 633f617..5b4c0df 100644
--- a/io_u.c
+++ b/io_u.c
@@ -610,11 +610,8 @@ int io_u_quiesce(struct thread_data *td)
 	 * io's that have been actually submitted to an async engine,
 	 * and cur_depth is meaningless for sync engines.
 	 */
-	if (td->io_u_queued || td->cur_depth) {
-		int fio_unused ret;
-
-		ret = td_io_commit(td);
-	}
+	if (td->io_u_queued || td->cur_depth)
+		td_io_commit(td);
 
 	while (td->io_u_in_flight) {
 		int ret;
diff --git a/ioengines.c b/ioengines.c
index a8ec79d..6ffd27f 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -276,11 +276,11 @@ out:
 	return r;
 }
 
-int td_io_queue(struct thread_data *td, struct io_u *io_u)
+enum fio_q_status td_io_queue(struct thread_data *td, struct io_u *io_u)
 {
 	const enum fio_ddir ddir = acct_ddir(io_u);
 	unsigned long buflen = io_u->xfer_buflen;
-	int ret;
+	enum fio_q_status ret;
 
 	dprint_io_u(io_u, "queue");
 	fio_ro_check(td, io_u);
@@ -361,18 +361,13 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 			td->ts.total_io_u[io_u->ddir]++;
 		}
 	} else if (ret == FIO_Q_QUEUED) {
-		int r;
-
 		td->io_u_queued++;
 
 		if (ddir_rw(io_u->ddir) || ddir_sync(io_u->ddir))
 			td->ts.total_io_u[io_u->ddir]++;
 
-		if (td->io_u_queued >= td->o.iodepth_batch) {
-			r = td_io_commit(td);
-			if (r < 0)
-				return r;
-		}
+		if (td->io_u_queued >= td->o.iodepth_batch)
+			td_io_commit(td);
 	}
 
 	if (!td_ioengine_flagged(td, FIO_SYNCIO)) {
@@ -410,14 +405,14 @@ int td_io_init(struct thread_data *td)
 	return ret;
 }
 
-int td_io_commit(struct thread_data *td)
+void td_io_commit(struct thread_data *td)
 {
 	int ret;
 
 	dprint(FD_IO, "calling ->commit(), depth %d\n", td->cur_depth);
 
 	if (!td->cur_depth || !td->io_u_queued)
-		return 0;
+		return;
 
 	io_u_mark_depth(td, td->io_u_queued);
 
@@ -432,8 +427,6 @@ int td_io_commit(struct thread_data *td)
 	 */
 	td->io_u_in_flight += td->io_u_queued;
 	td->io_u_queued = 0;
-
-	return 0;
 }
 
 int td_io_open_file(struct thread_data *td, struct fio_file *f)
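td_io_commit() now returns void, so io_u_quiesce() and td_io_queue() call it without checking a result. The queued/in-flight bookkeeping it performs can be modelled on its own. This is a simplified sketch, not fio code: `struct toy_td`, `toy_queue_one()` and `toy_commit()` are hypothetical names. The sketch also folds away the cur_depth check and omits the call into the engine's ->commit() hook, which sits in the part of the function this hunk does not show.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the thread_data fields that
 * td_io_queue()/td_io_commit() touch; not fio's real struct. */
struct toy_td {
	unsigned int io_u_queued;	/* queued but not yet committed */
	unsigned int io_u_in_flight;	/* committed, awaiting completion */
	unsigned int iodepth_batch;	/* commit once this many are queued */
	unsigned int commits;		/* how often commit did real work */
};

/* Models the void td_io_commit(): nothing to do with an empty queue,
 * otherwise move the queued count over to in-flight. */
static void toy_commit(struct toy_td *td)
{
	if (!td->io_u_queued)
		return;

	td->commits++;
	td->io_u_in_flight += td->io_u_queued;
	td->io_u_queued = 0;
}

/* Models the FIO_Q_QUEUED branch of td_io_queue(): count the request
 * and commit once a full batch has accumulated. */
static void toy_queue_one(struct toy_td *td)
{
	assert(td->iodepth_batch > 0);
	td->io_u_queued++;
	if (td->io_u_queued >= td->iodepth_batch)
		toy_commit(td);
}
```

With iodepth_batch set to 4, queueing ten requests commits twice and leaves two requests queued. They stay queued until something like io_u_quiesce() forces a final commit.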
diff --git a/ioengines.h b/ioengines.h
index a0674ae..feb21db 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -7,12 +7,12 @@
 #include "flist.h"
 #include "io_u.h"
 
-#define FIO_IOOPS_VERSION	23
+#define FIO_IOOPS_VERSION	24
 
 /*
  * io_ops->queue() return values
  */
-enum {
+enum fio_q_status {
 	FIO_Q_COMPLETED	= 0,		/* completed sync */
 	FIO_Q_QUEUED	= 1,		/* queued, will complete async */
 	FIO_Q_BUSY	= 2,		/* no more room, call ->commit() */
@@ -26,7 +26,7 @@ struct ioengine_ops {
 	int (*setup)(struct thread_data *);
 	int (*init)(struct thread_data *);
 	int (*prep)(struct thread_data *, struct io_u *);
-	int (*queue)(struct thread_data *, struct io_u *);
+	enum fio_q_status (*queue)(struct thread_data *, struct io_u *);
 	int (*commit)(struct thread_data *);
 	int (*getevents)(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
 	struct io_u *(*event)(struct thread_data *, int);
@@ -74,9 +74,9 @@ typedef void (*get_ioengine_t)(struct ioengine_ops **);
  */
 extern int __must_check td_io_init(struct thread_data *);
 extern int __must_check td_io_prep(struct thread_data *, struct io_u *);
-extern int __must_check td_io_queue(struct thread_data *, struct io_u *);
+extern enum fio_q_status __must_check td_io_queue(struct thread_data *, struct io_u *);
 extern int __must_check td_io_getevents(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
-extern int __must_check td_io_commit(struct thread_data *);
+extern void td_io_commit(struct thread_data *);
 extern int __must_check td_io_open_file(struct thread_data *, struct fio_file *);
 extern int td_io_close_file(struct thread_data *, struct fio_file *);
 extern int td_io_unlink_file(struct thread_data *, struct fio_file *);
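ioengines.h now gives the previously anonymous enum a tag, so every ->queue() implementation and td_io_queue() can state exactly what they return. The accompanying bump of FIO_IOOPS_VERSION from 23 to 24 presumably lets fio refuse external engines built against the old int-returning prototype. The sketch below shows one practical payoff, assuming a GCC or Clang build with -Wswitch (enabled by -Wall): a switch over the status with no default label draws a warning if a value is left unhandled. The enum is copied from the hunk; `toy_queue()` and `q_status_name()` are hypothetical.

```c
#include <assert.h>

/* Copy of the tagged enum from ioengines.h above. */
enum fio_q_status {
	FIO_Q_COMPLETED	= 0,	/* completed sync */
	FIO_Q_QUEUED	= 1,	/* queued, will complete async */
	FIO_Q_BUSY	= 2,	/* no more room, call ->commit() */
};

/* Hypothetical toy engine hook: report busy when no slot is free,
 * otherwise complete even request numbers inline and queue odd ones. */
static enum fio_q_status toy_queue(unsigned int req, unsigned int free_slots)
{
	if (!free_slots)
		return FIO_Q_BUSY;
	return (req % 2 == 0) ? FIO_Q_COMPLETED : FIO_Q_QUEUED;
}

/* No default label: with -Wswitch, adding a new status to the enum
 * makes the compiler flag this switch until the value is handled. */
static const char *q_status_name(enum fio_q_status s)
{
	switch (s) {
	case FIO_Q_COMPLETED:
		return "completed";
	case FIO_Q_QUEUED:
		return "queued";
	case FIO_Q_BUSY:
		return "busy";
	}
	assert(0 && "not a valid fio_q_status");
	return "invalid";
}
```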
diff --git a/lib/num2str.c b/lib/num2str.c
index 387c5d7..40fb3ae 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -14,22 +14,30 @@
  * @maxlen: max number of digits in the output string (not counting prefix and units, but counting .)
  * @base: multiplier for num (e.g., if num represents Ki, use 1024)
  * @pow2: select unit prefix - 0=power-of-10 decimal SI, nonzero=power-of-2 binary IEC
- * @units: select units - N2S_* macros defined in num2str.h
+ * @units: select units - N2S_* constants defined in num2str.h
  * @returns a malloc'd buffer containing "number[<unit prefix>][<units>]"
  */
-char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
+char *num2str(uint64_t num, int maxlen, int base, int pow2, enum n2s_unit units)
 {
 	const char *sistr[] = { "", "k", "M", "G", "T", "P" };
 	const char *iecstr[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi" };
 	const char **unitprefix;
-	const char *unitstr[] = { "", "/s", "B", "bit", "B/s", "bit/s" };
+	static const char *const unitstr[] = {
+		[N2S_NONE]	= "",
+		[N2S_PERSEC]	= "/s",
+		[N2S_BYTE]	= "B",
+		[N2S_BIT]	= "bit",
+		[N2S_BYTEPERSEC]= "B/s",
+		[N2S_BITPERSEC]	= "bit/s"
+	};
 	const unsigned int thousand[] = { 1000, 1024 };
 	unsigned int modulo;
-	int unit_index = 0, post_index, carry = 0;
+	int post_index, carry = 0;
 	char tmp[32], fmt[32];
 	char *buf;
 
 	compiletime_assert(sizeof(sistr) == sizeof(iecstr), "unit prefix arrays must be identical sizes");
+	assert(units < ARRAY_SIZE(unitstr));
 
 	buf = malloc(128);
 	if (!buf)
@@ -44,21 +52,18 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 		base /= thousand[!!pow2];
 
 	switch (units) {
+	case N2S_NONE:
+		break;
 	case N2S_PERSEC:
-		unit_index = 1;
 		break;
 	case N2S_BYTE:
-		unit_index = 2;
 		break;
 	case N2S_BIT:
-		unit_index = 3;
 		num *= 8;
 		break;
 	case N2S_BYTEPERSEC:
-		unit_index = 4;
 		break;
 	case N2S_BITPERSEC:
-		unit_index = 5;
 		num *= 8;
 		break;
 	}
@@ -87,7 +92,7 @@ done:
 			post_index = 0;
 
 		sprintf(buf, "%llu%s%s", (unsigned long long) num,
-			unitprefix[post_index], unitstr[unit_index]);
+			unitprefix[post_index], unitstr[units]);
 		return buf;
 	}
 
@@ -110,6 +115,6 @@ done:
 	sprintf(tmp, fmt, (double)modulo / (double)thousand[!!pow2]);
 
 	sprintf(buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
-			unitprefix[post_index], unitstr[unit_index]);
+			unitprefix[post_index], unitstr[units]);
 	return buf;
 }
diff --git a/lib/num2str.h b/lib/num2str.h
index 81358a1..797288b 100644
--- a/lib/num2str.h
+++ b/lib/num2str.h
@@ -3,13 +3,15 @@
 
 #include <inttypes.h>
 
-#define N2S_NONE	0
-#define N2S_BITPERSEC	1	/* match unit_base for bit rates */
-#define N2S_PERSEC	2
-#define N2S_BIT		3
-#define N2S_BYTE	4
-#define N2S_BYTEPERSEC	8	/* match unit_base for byte rates */
-
-extern char *num2str(uint64_t, int, int, int, int);
+enum n2s_unit {
+	N2S_NONE	= 0,
+	N2S_PERSEC	= 1,
+	N2S_BYTE	= 2,
+	N2S_BIT		= 3,
+	N2S_BYTEPERSEC	= 4,
+	N2S_BITPERSEC	= 5,
+};
+
+extern char *num2str(uint64_t, int, int, int, enum n2s_unit);
 
 #endif
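The N2S_* constants are now a dense enum starting at zero, so num2str() can index unitstr[] directly instead of translating through unit_index. The sketch below isolates that pattern. The enum and table are copied from the hunks above, while `unit_suffix()`, `scale_for_unit()` and the local ARRAY_SIZE macro are hypothetical stand-ins. The designated initializers bind each string to its constant, so the table stays correct even if the enum is reordered. The assert() catches an out-of-range value before it is used as an index.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))

/* Copy of enum n2s_unit from lib/num2str.h above. */
enum n2s_unit {
	N2S_NONE	= 0,
	N2S_PERSEC	= 1,
	N2S_BYTE	= 2,
	N2S_BIT		= 3,
	N2S_BYTEPERSEC	= 4,
	N2S_BITPERSEC	= 5,
};

/* Each string is tied to its enum constant, not to its position. */
static const char *const unitstr[] = {
	[N2S_NONE]	= "",
	[N2S_PERSEC]	= "/s",
	[N2S_BYTE]	= "B",
	[N2S_BIT]	= "bit",
	[N2S_BYTEPERSEC]= "B/s",
	[N2S_BITPERSEC]	= "bit/s",
};

/* Unit suffix lookup, guarded the same way num2str() now is. */
static const char *unit_suffix(enum n2s_unit units)
{
	assert((size_t)units < ARRAY_SIZE(unitstr));
	return unitstr[units];
}

/* Bit-based units scale a byte count by 8, as num2str() does. */
static unsigned long long scale_for_unit(unsigned long long num,
					 enum n2s_unit units)
{
	if (units == N2S_BIT || units == N2S_BITPERSEC)
		return num * 8;
	return num;
}
```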
diff --git a/options.c b/options.c
index 1b3ea04..0b3a895 100644
--- a/options.c
+++ b/options.c
@@ -4425,15 +4425,15 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.prio	= 1,
 		.posval = {
 			  { .ival = "0",
-			    .oval = 0,
+			    .oval = N2S_NONE,
 			    .help = "Auto-detect",
 			  },
 			  { .ival = "8",
-			    .oval = 8,
+			    .oval = N2S_BYTEPERSEC,
 			    .help = "Normal (byte based)",
 			  },
 			  { .ival = "1",
-			    .oval = 1,
+			    .oval = N2S_BITPERSEC,
 			    .help = "Bit based",
 			  },
 		},
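The options.c hunk keeps the job-file values "0", "8" and "1" for unit_base but now stores enum n2s_unit constants. fixup_options() in init.c then resolves "0" (auto-detect) from the engine's FIO_BIT_BASED flag. The enum renumbers the constants: N2S_BYTEPERSEC is now 4, where the old macro was 8. As a result, the job-file value and the stored value no longer coincide, which is why each .oval is spelled out. The two-step mapping can be sketched as follows. `parse_unit_base()` is hypothetical, and its bool parameter stands in for td_ioengine_flagged(td, FIO_BIT_BASED).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy of enum n2s_unit from lib/num2str.h. */
enum n2s_unit {
	N2S_NONE	= 0,
	N2S_PERSEC	= 1,
	N2S_BYTE	= 2,
	N2S_BIT		= 3,
	N2S_BYTEPERSEC	= 4,
	N2S_BITPERSEC	= 5,
};

/* Hypothetical helper combining the posval table (string -> enum) with
 * the fixup_options() fallback for "0". Unknown strings return
 * N2S_NONE with *ok set to false. */
static enum n2s_unit parse_unit_base(const char *ival, bool bit_based_engine,
				     bool *ok)
{
	enum n2s_unit u;

	assert(ival != NULL && ok != NULL);
	*ok = true;
	if (!strcmp(ival, "8"))
		u = N2S_BYTEPERSEC;
	else if (!strcmp(ival, "1"))
		u = N2S_BITPERSEC;
	else if (!strcmp(ival, "0"))
		u = N2S_NONE;
	else {
		*ok = false;
		return N2S_NONE;
	}

	/* Auto-detect: bit rates for bit-based engines, bytes otherwise. */
	if (u == N2S_NONE)
		u = bit_based_engine ? N2S_BITPERSEC : N2S_BYTEPERSEC;
	return u;
}
```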
diff --git a/stat.c b/stat.c
index 7b9dd3b..c89a7f0 100644
--- a/stat.c
+++ b/stat.c
@@ -362,7 +362,7 @@ static void stat_calc_lat(struct thread_stat *ts, double *dst,
  * To keep the terse format unaltered, add all of the ns latency
  * buckets to the first us latency bucket
  */
-void stat_calc_lat_nu(struct thread_stat *ts, double *io_u_lat_u)
+static void stat_calc_lat_nu(struct thread_stat *ts, double *io_u_lat_u)
 {
 	unsigned long ntotal = 0, total = ddir_rw_sum(ts->total_io_u);
 	int i;


* Recent changes (master)
@ 2018-04-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-04-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f31feaa21642929b6d9d5396b73669372fda9a0a:

  Deprecate verifysort and verifysort_nr (2018-04-17 21:50:55 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6897af4622fec753b5b76a4f2f7865dd56550ea4:

  Remove verifysort/verifysort_nr from documentation (2018-04-18 10:52:00 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      iolog: update stale comment
      Remove verifysort/verifysort_nr from documentation

 HOWTO   | 12 ------------
 fio.1   | 10 ----------
 iolog.c |  6 +-----
 3 files changed, 1 insertion(+), 27 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c34fdf3..8ee00fd 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2635,18 +2635,6 @@ Verification
 	previously written file. If the data direction includes any form of write,
 	the verify will be of the newly written data.
 
-.. option:: verifysort=bool
-
-	If true, fio will sort written verify blocks when it deems it faster to read
-	them back in a sorted manner. This is often the case when overwriting an
-	existing file, since the blocks are already laid out in the file system. You
-	can ignore this option unless doing huge amounts of really fast I/O where
-	the red-black tree sorting CPU time becomes significant. Default: true.
-
-.. option:: verifysort_nr=int
-
-	Pre-load and sort verify blocks for a read workload.
-
 .. option:: verify_offset=int
 
 	Swap the verification header with data somewhere else in the block before
diff --git a/fio.1 b/fio.1
index 9264855..24bdcdb 100644
--- a/fio.1
+++ b/fio.1
@@ -2340,16 +2340,6 @@ previously written file. If the data direction includes any form of write,
 the verify will be of the newly written data.
 .RE
 .TP
-.BI verifysort \fR=\fPbool
-If true, fio will sort written verify blocks when it deems it faster to read
-them back in a sorted manner. This is often the case when overwriting an
-existing file, since the blocks are already laid out in the file system. You
-can ignore this option unless doing huge amounts of really fast I/O where
-the red\-black tree sorting CPU time becomes significant. Default: true.
-.TP
-.BI verifysort_nr \fR=\fPint
-Pre\-load and sort verify blocks for a read workload.
-.TP
 .BI verify_offset \fR=\fPint
 Swap the verification header with data somewhere else in the block before
 writing. It is swapped back before verifying.
diff --git a/iolog.c b/iolog.c
index 598548d..74c89f0 100644
--- a/iolog.c
+++ b/iolog.c
@@ -227,11 +227,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
 	}
 
 	/*
-	 * We don't need to sort the entries if we only performed sequential
-	 * writes. In this case, just reading back data in the order we wrote
-	 * it out is the faster but still safe.
-	 *
-	 * One exception is if we don't have a random map in which case we need
+	 * Only sort writes if we don't have a random map in which case we need
 	 * to check for duplicate blocks and drop the old one, which we rely on
 	 * the rb insert/lookup for handling.
 	 */


* Recent changes (master)
@ 2018-04-18 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-04-18 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c7d32225c3efe61c79470bc31bb369b33d3e3a88:

  Merge branch 'nvml-to-pmdk' of https://github.com/sscargal/fio (2018-04-16 19:40:46 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f31feaa21642929b6d9d5396b73669372fda9a0a:

  Deprecate verifysort and verifysort_nr (2018-04-17 21:50:55 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      init: ensure that read/write use the same random seed for verify
      parse: add support for soft deprecated options
      Deprecate verifysort and verifysort_nr

 backend.c        |  1 -
 cconv.c          |  4 ---
 fio.h            |  2 --
 init.c           | 23 ++++++++++--------
 io_u.c           | 74 ++------------------------------------------------------
 iolog.c          |  3 +--
 options.c        | 15 ++----------
 parse.c          | 10 +++++---
 parse.h          |  1 +
 rate-submit.c    |  1 -
 thread_options.h |  4 ---
 verify.c         |  5 ++--
 12 files changed, 28 insertions(+), 115 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index b28c3db..a2a0b3d 100644
--- a/backend.c
+++ b/backend.c
@@ -1549,7 +1549,6 @@ static void *thread_main(void *data)
 	INIT_FLIST_HEAD(&td->io_hist_list);
 	INIT_FLIST_HEAD(&td->verify_list);
 	INIT_FLIST_HEAD(&td->trim_list);
-	INIT_FLIST_HEAD(&td->next_rand_list);
 	td->io_hist_tree = RB_ROOT;
 
 	ret = mutex_cond_init_pshared(&td->io_u_lock, &td->free_cond);
diff --git a/cconv.c b/cconv.c
index 585ed86..9e163b3 100644
--- a/cconv.c
+++ b/cconv.c
@@ -162,8 +162,6 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->write_hint = le32_to_cpu(top->write_hint);
 	o->verify = le32_to_cpu(top->verify);
 	o->do_verify = le32_to_cpu(top->do_verify);
-	o->verifysort = le32_to_cpu(top->verifysort);
-	o->verifysort_nr = le32_to_cpu(top->verifysort_nr);
 	o->experimental_verify = le32_to_cpu(top->experimental_verify);
 	o->verify_state = le32_to_cpu(top->verify_state);
 	o->verify_interval = le32_to_cpu(top->verify_interval);
@@ -376,8 +374,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->write_hint = cpu_to_le32(o->write_hint);
 	top->verify = cpu_to_le32(o->verify);
 	top->do_verify = cpu_to_le32(o->do_verify);
-	top->verifysort = cpu_to_le32(o->verifysort);
-	top->verifysort_nr = cpu_to_le32(o->verifysort_nr);
 	top->experimental_verify = cpu_to_le32(o->experimental_verify);
 	top->verify_state = cpu_to_le32(o->verify_state);
 	top->verify_interval = cpu_to_le32(o->verify_interval);
diff --git a/fio.h b/fio.h
index 2bfcac4..4ce4991 100644
--- a/fio.h
+++ b/fio.h
@@ -405,8 +405,6 @@ struct thread_data {
 	struct flist_head trim_list;
 	unsigned long trim_entries;
 
-	struct flist_head next_rand_list;
-
 	/*
 	 * for fileservice, how often to switch to a new file
 	 */
diff --git a/init.c b/init.c
index f5ff73d..07d1cdd 100644
--- a/init.c
+++ b/init.c
@@ -997,23 +997,26 @@ void td_fill_verify_state_seed(struct thread_data *td)
 
 static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
+	unsigned int read_seed = td->rand_seeds[FIO_RAND_BS_OFF];
+	unsigned int write_seed = td->rand_seeds[FIO_RAND_BS1_OFF];
+	unsigned int trim_seed = td->rand_seeds[FIO_RAND_BS2_OFF];
 	int i;
 
 	/*
 	 * trimwrite is special in that we need to generate the same
 	 * offsets to get the "write after trim" effect. If we are
 	 * using bssplit to set buffer length distributions, ensure that
-	 * we seed the trim and write generators identically.
+	 * we seed the trim and write generators identically. Ditto for
+	 * verify, read and writes must have the same seed, if we are doing
+	 * read verify.
 	 */
-	if (td_trimwrite(td)) {
-		init_rand_seed(&td->bsrange_state[DDIR_READ], td->rand_seeds[FIO_RAND_BS_OFF], use64);
-		init_rand_seed(&td->bsrange_state[DDIR_WRITE], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
-		init_rand_seed(&td->bsrange_state[DDIR_TRIM], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
-	} else {
-		init_rand_seed(&td->bsrange_state[DDIR_READ], td->rand_seeds[FIO_RAND_BS_OFF], use64);
-		init_rand_seed(&td->bsrange_state[DDIR_WRITE], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
-		init_rand_seed(&td->bsrange_state[DDIR_TRIM], td->rand_seeds[FIO_RAND_BS2_OFF], use64);
-	}
+	if (td->o.verify != VERIFY_NONE)
+		write_seed = read_seed;
+	if (td_trimwrite(td))
+		trim_seed = write_seed;
+	init_rand_seed(&td->bsrange_state[DDIR_READ], read_seed, use64);
+	init_rand_seed(&td->bsrange_state[DDIR_WRITE], write_seed, use64);
+	init_rand_seed(&td->bsrange_state[DDIR_TRIM], trim_seed, use64);
 
 	td_fill_verify_state_seed(td);
 	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);
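td_fill_rand_seeds_internal() now picks the block-size seeds with two overrides instead of two duplicated branches. First, writes reuse the read seed whenever verify is enabled, so a read-verify pass regenerates the same block sizes. Second, trims reuse the (possibly overridden) write seed for trimwrite. The order matters: with both verify and trimwrite set, all three directions end up on the read seed. The sketch below models only that selection. `struct bs_seeds` and `pick_bs_seeds()` are hypothetical, and the three inputs stand in for rand_seeds[FIO_RAND_BS_OFF], [FIO_RAND_BS1_OFF] and [FIO_RAND_BS2_OFF].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the per-direction block-size seeds. */
struct bs_seeds {
	unsigned int read;
	unsigned int write;
	unsigned int trim;
};

/* Seed selection from td_fill_rand_seeds_internal(), in isolation. */
static struct bs_seeds pick_bs_seeds(unsigned int bs_seed,
				     unsigned int bs1_seed,
				     unsigned int bs2_seed,
				     bool verify, bool trimwrite)
{
	struct bs_seeds s = { .read = bs_seed, .write = bs1_seed,
			      .trim = bs2_seed };

	/* Read-back verify must regenerate the sizes the writes used. */
	if (verify)
		s.write = s.read;
	/* trimwrite trims the same blocks it then writes. */
	if (trimwrite)
		s.trim = s.write;

	assert(!verify || s.write == s.read);
	return s;
}
```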
diff --git a/io_u.c b/io_u.c
index 5fbb238..633f617 100644
--- a/io_u.c
+++ b/io_u.c
@@ -77,11 +77,6 @@ static uint64_t last_block(struct thread_data *td, struct fio_file *f,
 	return max_blocks;
 }
 
-struct rand_off {
-	struct flist_head list;
-	uint64_t off;
-};
-
 static int __get_next_rand_offset(struct thread_data *td, struct fio_file *f,
 				  enum fio_ddir ddir, uint64_t *b,
 				  uint64_t lastb)
@@ -272,16 +267,8 @@ bail:
 	return 0;
 }
 
-static int flist_cmp(void *data, struct flist_head *a, struct flist_head *b)
-{
-	struct rand_off *r1 = flist_entry(a, struct rand_off, list);
-	struct rand_off *r2 = flist_entry(b, struct rand_off, list);
-
-	return r1->off - r2->off;
-}
-
-static int get_off_from_method(struct thread_data *td, struct fio_file *f,
-			       enum fio_ddir ddir, uint64_t *b)
+static int get_next_rand_offset(struct thread_data *td, struct fio_file *f,
+				enum fio_ddir ddir, uint64_t *b)
 {
 	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM) {
 		uint64_t lastb;
@@ -306,25 +293,6 @@ static int get_off_from_method(struct thread_data *td, struct fio_file *f,
 	return 1;
 }
 
-/*
- * Sort the reads for a verify phase in batches of verifysort_nr, if
- * specified.
- */
-static inline bool should_sort_io(struct thread_data *td)
-{
-	if (!td->o.verifysort_nr || !td->o.do_verify)
-		return false;
-	if (!td_random(td))
-		return false;
-	if (td->runstate != TD_VERIFYING)
-		return false;
-	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE ||
-	    td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
-		return false;
-
-	return true;
-}
-
 static bool should_do_random(struct thread_data *td, enum fio_ddir ddir)
 {
 	unsigned int v;
@@ -337,44 +305,6 @@ static bool should_do_random(struct thread_data *td, enum fio_ddir ddir)
 	return v <= td->o.perc_rand[ddir];
 }
 
-static int get_next_rand_offset(struct thread_data *td, struct fio_file *f,
-				enum fio_ddir ddir, uint64_t *b)
-{
-	struct rand_off *r;
-	int i, ret = 1;
-
-	if (!should_sort_io(td))
-		return get_off_from_method(td, f, ddir, b);
-
-	if (!flist_empty(&td->next_rand_list)) {
-fetch:
-		r = flist_first_entry(&td->next_rand_list, struct rand_off, list);
-		flist_del(&r->list);
-		*b = r->off;
-		free(r);
-		return 0;
-	}
-
-	for (i = 0; i < td->o.verifysort_nr; i++) {
-		r = malloc(sizeof(*r));
-
-		ret = get_off_from_method(td, f, ddir, &r->off);
-		if (ret) {
-			free(r);
-			break;
-		}
-
-		flist_add(&r->list, &td->next_rand_list);
-	}
-
-	if (ret && !i)
-		return ret;
-
-	assert(!flist_empty(&td->next_rand_list));
-	flist_sort(NULL, &td->next_rand_list, flist_cmp);
-	goto fetch;
-}
-
 static void loop_cache_invalidate(struct thread_data *td, struct fio_file *f)
 {
 	struct thread_options *o = &td->o;
diff --git a/iolog.c b/iolog.c
index 3f0fc22..598548d 100644
--- a/iolog.c
+++ b/iolog.c
@@ -235,8 +235,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
 	 * to check for duplicate blocks and drop the old one, which we rely on
 	 * the rb insert/lookup for handling.
 	 */
-	if (((!td->o.verifysort) || !td_random(td)) &&
-	      file_randommap(td, ipo->file)) {
+	if (file_randommap(td, ipo->file)) {
 		INIT_FLIST_HEAD(&ipo->list);
 		flist_add_tail(&ipo->list, &td->io_hist_list);
 		ipo->flags |= IP_F_ONLIST;
diff --git a/options.c b/options.c
index fb28511..1b3ea04 100644
--- a/options.c
+++ b/options.c
@@ -2846,25 +2846,14 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "verifysort",
 		.lname	= "Verify sort",
-		.type	= FIO_OPT_BOOL,
-		.off1	= offsetof(struct thread_options, verifysort),
-		.help	= "Sort written verify blocks for read back",
-		.def	= "1",
-		.parent = "verify",
-		.hide	= 1,
+		.type	= FIO_OPT_SOFT_DEPRECATED,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_VERIFY,
 	},
 	{
 		.name	= "verifysort_nr",
 		.lname	= "Verify Sort Nr",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct thread_options, verifysort_nr),
-		.help	= "Pre-load and sort verify blocks for a read workload",
-		.minval	= 0,
-		.maxval	= 131072,
-		.def	= "1024",
-		.parent = "verify",
+		.type	= FIO_OPT_SOFT_DEPRECATED,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_VERIFY,
 	},
diff --git a/parse.c b/parse.c
index 539c602..9685f1e 100644
--- a/parse.c
+++ b/parse.c
@@ -36,6 +36,7 @@ static const char *opt_type_names[] = {
 	"OPT_FLOAT_LIST",
 	"OPT_STR_SET",
 	"OPT_DEPRECATED",
+	"OPT_SOFT_DEPRECATED",
 	"OPT_UNSUPPORTED",
 };
 
@@ -876,8 +877,9 @@ static int __handle_option(const struct fio_option *o, const char *ptr,
 		break;
 	}
 	case FIO_OPT_DEPRECATED:
-		log_info("Option %s is deprecated\n", o->name);
 		ret = 1;
+	case FIO_OPT_SOFT_DEPRECATED:
+		log_info("Option %s is deprecated\n", o->name);
 		break;
 	default:
 		log_err("Bad option type %u\n", o->type);
@@ -1235,7 +1237,8 @@ int show_cmd_help(const struct fio_option *options, const char *name)
 	for (o = &options[0]; o->name; o++) {
 		int match = 0;
 
-		if (o->type == FIO_OPT_DEPRECATED)
+		if (o->type == FIO_OPT_DEPRECATED ||
+		    o->type == FIO_OPT_SOFT_DEPRECATED)
 			continue;
 		if (!exec_profile && o->prof_name)
 			continue;
@@ -1309,7 +1312,8 @@ void fill_default_options(void *data, const struct fio_option *options)
 
 static void option_init(struct fio_option *o)
 {
-	if (o->type == FIO_OPT_DEPRECATED || o->type == FIO_OPT_UNSUPPORTED)
+	if (o->type == FIO_OPT_DEPRECATED || o->type == FIO_OPT_UNSUPPORTED ||
+	    o->type == FIO_OPT_SOFT_DEPRECATED)
 		return;
 	if (o->name && !o->lname)
 		log_err("Option %s: missing long option name\n", o->name);
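In the parse.c hunk, FIO_OPT_DEPRECATED sets ret = 1 and then falls through into the new FIO_OPT_SOFT_DEPRECATED case. Both types therefore log "Option %s is deprecated", but only the hard-deprecated one returns non-zero from __handle_option(). Presumably that non-zero value makes the caller treat the option as a parse failure, while a soft-deprecated option such as verifysort is accepted and then ignored. The other two hunks exclude soft-deprecated options from help output and from option_init(). The fall-through is implicit in the patch. The sketch below marks it with a comment, which GCC's -Wimplicit-fallthrough recognises. The enum and `handle_deprecated()` are hypothetical simplifications, and the notice goes to stderr rather than through fio's log_info().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of fio's enum fio_opt_type. */
enum toy_opt_type {
	TOY_OPT_DEPRECATED,
	TOY_OPT_SOFT_DEPRECATED,
};

/* Returns non-zero for a hard-deprecated option and 0 for a soft one;
 * both print the same notice. */
static int handle_deprecated(enum toy_opt_type type, const char *name)
{
	int ret = 0;

	assert(name != NULL);
	switch (type) {
	case TOY_OPT_DEPRECATED:
		ret = 1;
		/* fall through */
	case TOY_OPT_SOFT_DEPRECATED:
		fprintf(stderr, "Option %s is deprecated\n", name);
		break;
	}
	return ret;
}
```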
diff --git a/parse.h b/parse.h
index 4ad92d9..4de5e77 100644
--- a/parse.h
+++ b/parse.h
@@ -20,6 +20,7 @@ enum fio_opt_type {
 	FIO_OPT_FLOAT_LIST,
 	FIO_OPT_STR_SET,
 	FIO_OPT_DEPRECATED,
+	FIO_OPT_SOFT_DEPRECATED,
 	FIO_OPT_UNSUPPORTED,	/* keep this last */
 };
 
diff --git a/rate-submit.c b/rate-submit.c
index fdbece6..5c77a4e 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -115,7 +115,6 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	INIT_FLIST_HEAD(&td->io_hist_list);
 	INIT_FLIST_HEAD(&td->verify_list);
 	INIT_FLIST_HEAD(&td->trim_list);
-	INIT_FLIST_HEAD(&td->next_rand_list);
 	td->io_hist_tree = RB_ROOT;
 
 	td->o.iodepth = 1;
diff --git a/thread_options.h b/thread_options.h
index 944feaf..4ec570d 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -110,8 +110,6 @@ struct thread_options {
 	unsigned int write_hint;
 	unsigned int verify;
 	unsigned int do_verify;
-	unsigned int verifysort;
-	unsigned int verifysort_nr;
 	unsigned int verify_interval;
 	unsigned int verify_offset;
 	char verify_pattern[MAX_PATTERN_SIZE];
@@ -391,8 +389,6 @@ struct thread_options_pack {
 	uint32_t write_hint;
 	uint32_t verify;
 	uint32_t do_verify;
-	uint32_t verifysort;
-	uint32_t verifysort_nr;
 	uint32_t verify_interval;
 	uint32_t verify_offset;
 	uint8_t verify_pattern[MAX_PATTERN_SIZE];
diff --git a/verify.c b/verify.c
index c5fa241..40d484b 100644
--- a/verify.c
+++ b/verify.c
@@ -919,10 +919,9 @@ int verify_io_u(struct thread_data *td, struct io_u **io_u_ptr)
 		hdr = p;
 
 		/*
-		 * Make rand_seed check pass when have verifysort or
-		 * verify_backlog.
+		 * Make rand_seed check pass when have verify_backlog.
 		 */
-		if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
+		if (!td_rw(td) || (td->flags & TD_F_VER_BACKLOG))
 			io_u->rand_seed = hdr->rand_seed;
 
 		if (td->o.verify != VERIFY_PATTERN_NO_HDR) {


* Recent changes (master)
@ 2018-04-17 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2018-04-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dd39e2d406b6be11aa5432311034761aff5d6ba8:

  cconv: add conversion for 'replay_time_scale' (2018-04-14 16:26:17 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c7d32225c3efe61c79470bc31bb369b33d3e3a88:

  Merge branch 'nvml-to-pmdk' of https://github.com/sscargal/fio (2018-04-16 19:40:46 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fio 3.6
      Merge branch 'nvml-to-pmdk' of https://github.com/sscargal/fio

sscargal (1):
      NVML references renamed to PMDK

 FIO-VERSION-GEN   | 2 +-
 HOWTO             | 6 +++---
 configure         | 6 +++---
 engines/libpmem.c | 2 +-
 engines/pmemblk.c | 2 +-
 fio.1             | 6 +++---
 options.c         | 4 ++--
 7 files changed, 14 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 7abd8ce..d2d095b 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.5
+DEF_VER=fio-3.6
 
 LF='
 '
diff --git a/HOWTO b/HOWTO
index 68b6b82..c34fdf3 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1854,12 +1854,12 @@ I/O engine
 
 		**pmemblk**
 			Read and write using filesystem DAX to a file on a filesystem
-			mounted with DAX on a persistent memory device through the NVML
+			mounted with DAX on a persistent memory device through the PMDK
 			libpmemblk library.
 
 		**dev-dax**
 			Read and write using device DAX to a persistent memory device (e.g.,
-			/dev/dax0.0) through the NVML libpmem library.
+			/dev/dax0.0) through the PMDK libpmem library.
 
 		**external**
 			Prefix to specify loading an external I/O engine object file. Append
@@ -1875,7 +1875,7 @@ I/O engine
 
 		**libpmem**
 			Read and write using mmap I/O to a file on a filesystem
-			mounted with DAX on a persistent memory device through the NVML
+			mounted with DAX on a persistent memory device through the PMDK
 			libpmem library.
 
 I/O engine specific parameters
diff --git a/configure b/configure
index 32baec6..70cb006 100755
--- a/configure
+++ b/configure
@@ -1894,15 +1894,15 @@ fi
 
 ##########################################
 # Report whether pmemblk engine is enabled
-print_config "NVML pmemblk engine" "$pmemblk"
+print_config "PMDK pmemblk engine" "$pmemblk"
 
 ##########################################
 # Report whether dev-dax engine is enabled
-print_config "NVML dev-dax engine" "$devdax"
+print_config "PMDK dev-dax engine" "$devdax"
 
 ##########################################
 # Report whether libpmem engine is enabled
-print_config "NVML libpmem engine" "$pmem"
+print_config "PMDK libpmem engine" "$pmem"
 
 ##########################################
 # Check if we have lex/yacc available
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 3038784..dbb3f5c 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -1,5 +1,5 @@
 /*
- * libpmem: IO engine that uses NVML libpmem to read and write data
+ * libpmem: IO engine that uses PMDK libpmem to read and write data
  *
  * Copyright (C) 2017 Nippon Telegraph and Telephone Corporation.
  *
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 5d21915..264eb71 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -1,5 +1,5 @@
 /*
- * pmemblk: IO engine that uses NVML libpmemblk to read and write data
+ * pmemblk: IO engine that uses PMDK libpmemblk to read and write data
  *
  * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
  *
diff --git a/fio.1 b/fio.1
index 3b5522f..9264855 100644
--- a/fio.1
+++ b/fio.1
@@ -1628,12 +1628,12 @@ constraint.
 .TP
 .B pmemblk
 Read and write using filesystem DAX to a file on a filesystem
-mounted with DAX on a persistent memory device through the NVML
+mounted with DAX on a persistent memory device through the PMDK
 libpmemblk library.
 .TP
 .B dev\-dax
 Read and write using device DAX to a persistent memory device (e.g.,
-/dev/dax0.0) through the NVML libpmem library.
+/dev/dax0.0) through the PMDK libpmem library.
 .TP
 .B external
 Prefix to specify loading an external I/O engine object file. Append
@@ -1649,7 +1649,7 @@ done other than creating the file.
 .TP
 .B libpmem
 Read and write using mmap I/O to a file on a filesystem
-mounted with DAX on a persistent memory device through the NVML
+mounted with DAX on a persistent memory device through the PMDK
 libpmem library.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
diff --git a/options.c b/options.c
index 045c62b..fb28511 100644
--- a/options.c
+++ b/options.c
@@ -1851,7 +1851,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #endif
 #ifdef CONFIG_PMEMBLK
 			  { .ival = "pmemblk",
-			    .help = "NVML libpmemblk based IO engine",
+			    .help = "PMDK libpmemblk based IO engine",
 			  },
 
 #endif
@@ -1870,7 +1870,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 #ifdef CONFIG_LIBPMEM
 			  { .ival = "libpmem",
-			    .help = "NVML libpmem based IO engine",
+			    .help = "PMDK libpmem based IO engine",
 			  },
 #endif
 		},

* Recent changes (master)
@ 2018-04-15 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c479640d6208236744f0562b1e79535eec290e2b:

  Merge branch 'proc_group' of https://github.com/sitsofe/fio (2018-04-13 17:25:35 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dd39e2d406b6be11aa5432311034761aff5d6ba8:

  cconv: add conversion for 'replay_time_scale' (2018-04-14 16:26:17 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      iolog: fix issue with replay rate
      Add 'replay_time_scale' option
      cconv: add conversion for 'replay_time_scale'

 HOWTO            |  8 ++++++++
 blktrace.c       | 14 ++++++++++----
 cconv.c          |  2 ++
 fio.1            |  6 ++++++
 iolog.c          |  5 +++--
 options.c        | 13 +++++++++++++
 server.h         |  2 +-
 thread_options.h |  2 ++
 8 files changed, 45 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 5c8623d..68b6b82 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2311,6 +2311,14 @@ I/O replay
 	still respecting ordering. The result is the same I/O pattern to a given
 	device, but different timings.
 
+.. option:: replay_time_scale=int
+
+	When replaying I/O with :option:`read_iolog`, fio will honor the
+	original timing in the trace. With this option, it's possible to scale
+	the time. It's a percentage option: if set to 50, fio runs at 50% of
+	the original I/O rate in the trace. If set to 200, it runs at twice the
+	original I/O rate. Defaults to 100.
+
 .. option:: replay_redirect=str
 
 	While replaying I/O patterns using :option:`read_iolog` the default behavior
diff --git a/blktrace.c b/blktrace.c
index 6e4d0a4..71ac412 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -333,13 +333,19 @@ static void handle_trace(struct thread_data *td, struct blk_io_trace *t,
 		return;
 
 	if (!(t->action & BLK_TC_ACT(BLK_TC_NOTIFY))) {
-		if (!last_ttime || td->o.no_stall) {
-			last_ttime = t->time;
+		if (!last_ttime || td->o.no_stall)
 			delay = 0;
-		} else {
+		else if (td->o.replay_time_scale == 100)
 			delay = t->time - last_ttime;
-			last_ttime = t->time;
+		else {
+			double tmp = t->time - last_ttime;
+			double scale;
+
+			scale = (double) 100.0 / (double) td->o.replay_time_scale;
+			tmp *= scale;
+			delay = tmp;
 		}
+		last_ttime = t->time;
 	}
 
 	t_bytes_align(&td->o, t);
diff --git a/cconv.c b/cconv.c
index dbe0071..585ed86 100644
--- a/cconv.c
+++ b/cconv.c
@@ -291,6 +291,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->block_error_hist = le32_to_cpu(top->block_error_hist);
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
+	o->replay_time_scale = le32_to_cpu(top->replay_time_scale);
 	o->per_job_logs = le32_to_cpu(top->per_job_logs);
 	o->write_bw_log = le32_to_cpu(top->write_bw_log);
 	o->write_lat_log = le32_to_cpu(top->write_lat_log);
@@ -481,6 +482,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->block_error_hist = cpu_to_le32(o->block_error_hist);
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
+	top->replay_time_scale = cpu_to_le32(o->replay_time_scale);
 	top->per_job_logs = cpu_to_le32(o->per_job_logs);
 	top->write_bw_log = cpu_to_le32(o->write_bw_log);
 	top->write_lat_log = cpu_to_le32(o->write_lat_log);
diff --git a/fio.1 b/fio.1
index dd4f9cb..3b5522f 100644
--- a/fio.1
+++ b/fio.1
@@ -2036,6 +2036,12 @@ respect the timestamps and attempt to replay them as fast as possible while
 still respecting ordering. The result is the same I/O pattern to a given
 device, but different timings.
 .TP
+.BI replay_time_scale \fR=\fPint
+When replaying I/O with \fBread_iolog\fR, fio will honor the original timing
+in the trace. With this option, it's possible to scale the time. It's a
+percentage option: if set to 50, fio runs at 50% of the original I/O rate in
+the trace. If set to 200, it runs at twice the original I/O rate. Defaults to 100.
+.TP
 .BI replay_redirect \fR=\fPstr
 While replaying I/O patterns using \fBread_iolog\fR the default behavior
 is to replay the IOPS onto the major/minor device that each IOP was recorded
diff --git a/iolog.c b/iolog.c
index bfafc03..3f0fc22 100644
--- a/iolog.c
+++ b/iolog.c
@@ -63,6 +63,7 @@ void log_file(struct thread_data *td, struct fio_file *f,
 static void iolog_delay(struct thread_data *td, unsigned long delay)
 {
 	uint64_t usec = utime_since_now(&td->last_issue);
+	unsigned long orig_delay = delay;
 	uint64_t this_delay;
 	struct timespec ts;
 
@@ -88,8 +89,8 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 	}
 
 	usec = utime_since_now(&ts);
-	if (usec > delay)
-		td->time_offset = usec - delay;
+	if (usec > orig_delay)
+		td->time_offset = usec - orig_delay;
 	else
 		td->time_offset = 0;
 }
diff --git a/options.c b/options.c
index fae3943..045c62b 100644
--- a/options.c
+++ b/options.c
@@ -3157,6 +3157,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.pow2	= 1,
 	},
 	{
+		.name	= "replay_time_scale",
+		.lname	= "Replay Time Scale",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, replay_time_scale),
+		.def	= "100",
+		.minval	= 1,
+		.parent	= "read_iolog",
+		.hide	= 1,
+		.help	= "Scale time for replay events",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IOLOG,
+	},
+	{
 		.name	= "exec_prerun",
 		.lname	= "Pre-execute runnable",
 		.type	= FIO_OPT_STR_STORE,
diff --git a/server.h b/server.h
index 1eee7dc..4896860 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 71,
+	FIO_SERVER_VER			= 72,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index dc290b0..944feaf 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -317,6 +317,7 @@ struct thread_options {
 
 	unsigned int replay_align;
 	unsigned int replay_scale;
+	unsigned int replay_time_scale;
 
 	unsigned int per_job_logs;
 
@@ -592,6 +593,7 @@ struct thread_options_pack {
 
 	uint32_t replay_align;
 	uint32_t replay_scale;
+	uint32_t replay_time_scale;
 
 	uint32_t per_job_logs;
 


* Recent changes (master)
@ 2018-04-14 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-14 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4fe721ac83e84df7c6be07394d1963fd1ec5d9a6:

  os/os-dragonfly: sync with header file changes in upstream (2018-04-10 09:17:22 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c479640d6208236744f0562b1e79535eec290e2b:

  Merge branch 'proc_group' of https://github.com/sitsofe/fio (2018-04-13 17:25:35 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'proc_group' of https://github.com/sitsofe/fio

Sitsofe Wheeler (7):
      windows: update EULA
      windows: prepare for Windows build split
      windows: target Windows 7 and add support for more than 64 CPUs
      doc: add Windows processor group behaviour and Windows target option
      configure/Makefile: make Cygwin force less
      appveyor: make 32 bit build target XP + minor fixes
      doc: add cpus_allowed reference to log_compression_cpus

 HOWTO                                |  47 +++--
 Makefile                             |   3 -
 README                               |  11 +-
 appveyor.yml                         |   4 +-
 configure                            |  29 ++-
 fio.1                                |  47 +++--
 os/os-windows-7.h                    | 367 +++++++++++++++++++++++++++++++++++
 os/os-windows-xp.h                   |  70 +++++++
 os/os-windows.h                      |  85 +-------
 os/windows/eula.rtf                  | Bin 1075 -> 1077 bytes
 os/windows/posix.c                   |   2 +
 os/windows/posix/include/arpa/inet.h |   2 +
 os/windows/posix/include/poll.h      |   9 +
 server.c                             |   6 +-
 14 files changed, 551 insertions(+), 131 deletions(-)
 create mode 100644 os/os-windows-7.h
 create mode 100644 os/os-windows-xp.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index dbbbfaa..5c8623d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2377,24 +2377,27 @@ Threads, processes and job synchronization
 
 	Set the I/O priority class. See man :manpage:`ionice(1)`.
 
-.. option:: cpumask=int
-
-	Set the CPU affinity of this job. The parameter given is a bit mask of
-	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
-	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
-	:manpage:`sched_setaffinity(2)`. This may not work on all supported
-	operating systems or kernel versions. This option doesn't work well for a
-	higher CPU count than what you can store in an integer mask, so it can only
-	control cpus 1-32. For boxes with larger CPU counts, use
-	:option:`cpus_allowed`.
-
 .. option:: cpus_allowed=str
 
 	Controls the same options as :option:`cpumask`, but accepts a textual
-	specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
-	would specify ``cpus_allowed=1,5``. This option also allows a range of CPUs
-	to be specified -- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
-	would set ``cpus_allowed=1,5,8-15``.
+	specification of the permitted CPUs instead and CPUs are indexed from 0. So
+	to use CPUs 0 and 5 you would specify ``cpus_allowed=0,5``. This option also
+	allows a range of CPUs to be specified -- say you wanted a binding to CPUs
+	0, 5, and 8 to 15, you would set ``cpus_allowed=0,5,8-15``.
+
+	On Windows, when ``cpus_allowed`` is unset only CPUs from fio's current
+	processor group will be used and affinity settings are inherited from the
+	system. An fio build configured to target Windows 7 makes options that set
+	CPUs processor group aware and values will set both the processor group
+	and a CPU from within that group. For example, on a system where processor
+	group 0 has 40 CPUs and processor group 1 has 32 CPUs, ``cpus_allowed``
+	values between 0 and 39 will bind CPUs from processor group 0 and
+	``cpus_allowed`` values between 40 and 71 will bind CPUs from processor
+	group 1. When using ``cpus_allowed_policy=shared`` all CPUs specified by a
+	single ``cpus_allowed`` option must be from the same processor group. For
+	Windows fio builds not built for Windows 7, CPUs will only be selected from
+	(and be relative to) whatever processor group fio happens to be running in
+	and CPUs from other processor groups cannot be used.
 
 .. option:: cpus_allowed_policy=str
 
@@ -2411,6 +2414,17 @@ Threads, processes and job synchronization
 	enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
 	in the set.
 
+.. option:: cpumask=int
+
+	Set the CPU affinity of this job. The parameter given is a bit mask of
+	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+	:manpage:`sched_setaffinity(2)`. This may not work on all supported
+	operating systems or kernel versions. This option doesn't work well for a
+	higher CPU count than what you can store in an integer mask, so it can only
+	control cpus 1-32. For boxes with larger CPU counts, use
+	:option:`cpus_allowed`.
+
 .. option:: numa_cpu_nodes=str
 
 	Set this job running on specified NUMA nodes' CPUs. The arguments allow
@@ -2921,7 +2935,8 @@ Measurements and reporting
 
 	Define the set of CPUs that are allowed to handle online log compression for
 	the I/O jobs. This can provide better isolation between performance
-	sensitive jobs, and background compression work.
+	sensitive jobs, and background compression work. See
+	:option:`cpus_allowed` for the format used.
 
 .. option:: log_store_compressed=bool
 
diff --git a/Makefile b/Makefile
index cc4b71f..357ae98 100644
--- a/Makefile
+++ b/Makefile
@@ -59,9 +59,6 @@ ifdef CONFIG_LIBHDFS
   SOURCE += engines/libhdfs.c
 endif
 
-ifdef CONFIG_64BIT_LLP64
-  CFLAGS += -DBITS_PER_LONG=32
-endif
 ifdef CONFIG_64BIT
   CFLAGS += -DBITS_PER_LONG=64
 endif
diff --git a/README b/README
index fba5f10..38022bb 100644
--- a/README
+++ b/README
@@ -172,15 +172,18 @@ directory.
 How to compile fio on 64-bit Windows:
 
  1. Install Cygwin (http://www.cygwin.com/). Install **make** and all
-    packages starting with **mingw64-i686** and **mingw64-x86_64**. Ensure
-    **mingw64-i686-zlib** and **mingw64-x86_64-zlib** are installed if you wish
+    packages starting with **mingw64-x86_64**. Ensure
+    **mingw64-x86_64-zlib** is installed if you wish
     to enable fio's log compression functionality.
  2. Open the Cygwin Terminal.
  3. Go to the fio directory (source files).
  4. Run ``make clean && make -j``.
 
-To build fio on 32-bit Windows, run ``./configure --build-32bit-win`` before
-``make``.
+To build fio for 32-bit Windows, ensure the -i686 versions of the previously
+mentioned -x86_64 packages are installed and run ``./configure
+--build-32bit-win`` before ``make``. To build an fio that supports versions of
+Windows below Windows 7/Windows Server 2008 R2 also add ``--target-win-ver=xp``
+to the end of the configure line that you run before doing ``make``.
 
 It's recommended that once built or installed, fio be run in a Command Prompt or
 other 'native' console such as console2, since there are known to be display and
diff --git a/appveyor.yml b/appveyor.yml
index 09ebccf..ca8b2ab 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -10,10 +10,10 @@ environment:
       CONFIGURE_OPTIONS:
     - platform: x86
       PACKAGE_ARCH: i686
-      CONFIGURE_OPTIONS: --build-32bit-win
+      CONFIGURE_OPTIONS: --build-32bit-win --target-win-ver=xp
 
 install:
-  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NULL'
+  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NUL'
   - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
 
 build_script:
diff --git a/configure b/configure
index 38706a9..32baec6 100755
--- a/configure
+++ b/configure
@@ -167,6 +167,8 @@ for opt do
   ;;
   --build-32bit-win) build_32bit_win="yes"
   ;;
+  --target-win-ver=*) target_win_ver="$optarg"
+  ;;
   --build-static) build_static="yes"
   ;;
   --enable-gfio) gfio_check="yes"
@@ -213,6 +215,7 @@ if test "$show_help" = "yes" ; then
   echo "--cc=                   Specify compiler to use"
   echo "--extra-cflags=         Specify extra CFLAGS to pass to compiler"
   echo "--build-32bit-win       Enable 32-bit build on Windows"
+  echo "--target-win-ver=       Minimum version of Windows to target (XP or 7)"
   echo "--build-static          Build a static fio"
   echo "--esx                   Configure build options for esx"
   echo "--enable-gfio           Enable building of gtk gfio"
@@ -329,20 +332,27 @@ CYGWIN*)
       cc="x86_64-w64-mingw32-gcc"
     fi
   fi
-  if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
-    output_sym "CONFIG_32BIT"
+
+  target_win_ver=$(echo "$target_win_ver" | tr '[:lower:]' '[:upper:]')
+  if test -z "$target_win_ver"; then
+    # Default Windows API target
+    target_win_ver="7"
+  fi
+  if test "$target_win_ver" = "XP"; then
+    output_sym "CONFIG_WINDOWS_XP"
+  elif test "$target_win_ver" = "7"; then
+    output_sym "CONFIG_WINDOWS_7"
+    CFLAGS="$CFLAGS -D_WIN32_WINNT=0x0601"
   else
-    output_sym "CONFIG_64BIT_LLP64"
+    fatal "Unknown target Windows version"
   fi
+
   # We need this to be output_sym'd here because this is Windows specific.
   # The regular configure path never sets this config.
   output_sym "CONFIG_WINDOWSAIO"
   # We now take the regular configuration path without having exit 0 here.
   # Flags below are still necessary mostly for MinGW.
   socklen_t="yes"
-  sfaa="yes"
-  sync_sync="yes"
-  cmp_swap="yes"
   rusage_thread="yes"
   fdatasync="yes"
   clock_gettime="yes" # clock_monotonic probe has dependency on this
@@ -350,11 +360,7 @@ CYGWIN*)
   gettimeofday="yes"
   sched_idle="yes"
   tcp_nodelay="yes"
-  tls_thread="yes"
-  static_assert="yes"
   ipv6="yes"
-  mkdir_two="no"
-  echo "BUILD_CFLAGS=$CFLAGS -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
   ;;
 esac
 
@@ -498,6 +504,9 @@ fi
 print_config "Operating system" "$targetos"
 print_config "CPU" "$cpu"
 print_config "Big endian" "$bigendian"
+if test ! -z "$target_win_ver"; then
+  print_config "Target Windows version" "$target_win_ver"
+fi
 print_config "Compiler" "$cc"
 print_config "Cross compile" "$cross_compile"
 echo
diff --git a/fio.1 b/fio.1
index 5ca57ce..dd4f9cb 100644
--- a/fio.1
+++ b/fio.1
@@ -2091,22 +2091,28 @@ systems since meaning of priority may differ.
 .BI prioclass \fR=\fPint
 Set the I/O priority class. See man \fBionice\fR\|(1).
 .TP
-.BI cpumask \fR=\fPint
-Set the CPU affinity of this job. The parameter given is a bit mask of
-allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
-and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
-\fBsched_setaffinity\fR\|(2). This may not work on all supported
-operating systems or kernel versions. This option doesn't work well for a
-higher CPU count than what you can store in an integer mask, so it can only
-control cpus 1\-32. For boxes with larger CPU counts, use
-\fBcpus_allowed\fR.
-.TP
 .BI cpus_allowed \fR=\fPstr
 Controls the same options as \fBcpumask\fR, but accepts a textual
-specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
-would specify `cpus_allowed=1,5'. This option also allows a range of CPUs
-to be specified \-\- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
-would set `cpus_allowed=1,5,8\-15'.
+specification of the permitted CPUs instead and CPUs are indexed from 0. So
+to use CPUs 0 and 5 you would specify `cpus_allowed=0,5'. This option also
+allows a range of CPUs to be specified \-\- say you wanted a binding to CPUs
+0, 5, and 8 to 15, you would set `cpus_allowed=0,5,8\-15'.
+.RS
+.P
+On Windows, when `cpus_allowed' is unset only CPUs from fio's current
+processor group will be used and affinity settings are inherited from the
+system. An fio build configured to target Windows 7 makes options that set
+CPUs processor group aware and values will set both the processor group
+and a CPU from within that group. For example, on a system where processor
+group 0 has 40 CPUs and processor group 1 has 32 CPUs, `cpus_allowed'
+values between 0 and 39 will bind CPUs from processor group 0 and
+`cpus_allowed' values between 40 and 71 will bind CPUs from processor
+group 1. When using `cpus_allowed_policy=shared' all CPUs specified by a
+single `cpus_allowed' option must be from the same processor group. For
+Windows fio builds not built for Windows 7, CPUs will only be selected from
+(and be relative to) whatever processor group fio happens to be running in
+and CPUs from other processor groups cannot be used.
+.RE
 .TP
 .BI cpus_allowed_policy \fR=\fPstr
 Set the policy of how fio distributes the CPUs specified by
@@ -2127,6 +2133,16 @@ enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
 in the set.
 .RE
 .TP
+.BI cpumask \fR=\fPint
+Set the CPU affinity of this job. The parameter given is a bit mask of
+allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+\fBsched_setaffinity\fR\|(2). This may not work on all supported
+operating systems or kernel versions. This option doesn't work well for a
+higher CPU count than what you can store in an integer mask, so it can only
+control cpus 1\-32. For boxes with larger CPU counts, use
+\fBcpus_allowed\fR.
+.TP
 .BI numa_cpu_nodes \fR=\fPstr
 Set this job running on specified NUMA nodes' CPUs. The arguments allow
 comma delimited list of cpu numbers, A\-B ranges, or `all'. Note, to enable
@@ -2603,7 +2619,8 @@ zlib.
 .BI log_compression_cpus \fR=\fPstr
 Define the set of CPUs that are allowed to handle online log compression for
 the I/O jobs. This can provide better isolation between performance
-sensitive jobs, and background compression work.
+sensitive jobs, and background compression work. See \fBcpus_allowed\fR for
+the format used.
 .TP
 .BI log_store_compressed \fR=\fPbool
 If set, fio will store the log files in a compressed format. They can be
diff --git a/os/os-windows-7.h b/os/os-windows-7.h
new file mode 100644
index 0000000..f5ddb8e
--- /dev/null
+++ b/os/os-windows-7.h
@@ -0,0 +1,367 @@
+#define FIO_MAX_CPUS		512 /* From Hyper-V 2016's max logical processors */
+#define FIO_CPU_MASK_STRIDE	64
+#define FIO_CPU_MASK_ROWS	(FIO_MAX_CPUS / FIO_CPU_MASK_STRIDE)
+
+typedef struct {
+	uint64_t row[FIO_CPU_MASK_ROWS];
+} os_cpu_mask_t;
+
+#define FIO_HAVE_CPU_ONLINE_SYSCONF
+/* Return all processors regardless of processor group */
+static inline unsigned int cpus_online(void)
+{
+	return GetMaximumProcessorCount(ALL_PROCESSOR_GROUPS);
+}
+
+static inline void print_mask(os_cpu_mask_t *cpumask)
+{
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		dprint(FD_PROCESS, "cpumask[%d]=%lu\n", i, cpumask->row[i]);
+}
+
+/* Return the index of the least significant set CPU in cpumask or -1 if no
+ * CPUs are set */
+static inline int first_set_cpu(os_cpu_mask_t *cpumask)
+{
+	int cpus_offset, mask_first_cpu, row;
+
+	cpus_offset = 0;
+	row = 0;
+	mask_first_cpu = -1;
+	while (mask_first_cpu < 0 && row < FIO_CPU_MASK_ROWS) {
+		int row_first_cpu;
+
+		row_first_cpu = __builtin_ffsll(cpumask->row[row]) - 1;
+		dprint(FD_PROCESS, "row_first_cpu=%d cpumask->row[%d]=%lu\n",
+		       row_first_cpu, row, cpumask->row[row]);
+		if (row_first_cpu > -1) {
+			mask_first_cpu = cpus_offset + row_first_cpu;
+			dprint(FD_PROCESS, "first set cpu in mask is at index %d\n",
+			       mask_first_cpu);
+		} else {
+			cpus_offset += FIO_CPU_MASK_STRIDE;
+			row++;
+		}
+	}
+
+	return mask_first_cpu;
+}
+
+/* Return the index of the most significant set CPU in cpumask or -1 if no
+ * CPUs are set */
+static inline int last_set_cpu(os_cpu_mask_t *cpumask)
+{
+	int cpus_offset, mask_last_cpu, row;
+
+	cpus_offset = (FIO_CPU_MASK_ROWS - 1) * FIO_CPU_MASK_STRIDE;
+	row = FIO_CPU_MASK_ROWS - 1;
+	mask_last_cpu = -1;
+	while (mask_last_cpu < 0 && row >= 0) {
+		int row_last_cpu;
+
+		if (cpumask->row[row] == 0)
+			row_last_cpu = -1;
+		else {
+			uint64_t tmp = cpumask->row[row];
+
+			row_last_cpu = 0;
+			while (tmp >>= 1)
+			    row_last_cpu++;
+		}
+
+		dprint(FD_PROCESS, "row_last_cpu=%d cpumask->row[%d]=%lu\n",
+		       row_last_cpu, row, cpumask->row[row]);
+		if (row_last_cpu > -1) {
+			mask_last_cpu = cpus_offset + row_last_cpu;
+			dprint(FD_PROCESS, "last set cpu in mask is at index %d\n",
+			       mask_last_cpu);
+		} else {
+			cpus_offset -= FIO_CPU_MASK_STRIDE;
+			row--;
+		}
+	}
+
+	return mask_last_cpu;
+}
+
+static inline int mask_to_group_mask(os_cpu_mask_t *cpumask, int *processor_group, uint64_t *affinity_mask)
+{
+	WORD online_groups, group, group_size;
+	bool found;
+	int cpus_offset, search_cpu, last_cpu, bit_offset, row, end;
+	uint64_t group_cpumask;
+
+	search_cpu = first_set_cpu(cpumask);
+	if (search_cpu < 0) {
+		log_info("CPU mask doesn't set any CPUs\n");
+		return 1;
+	}
+
+	/* Find processor group first set CPU applies to */
+	online_groups = GetActiveProcessorGroupCount();
+	group = 0;
+	found = false;
+	cpus_offset = 0;
+	group_size = 0;
+	while (!found && group < online_groups) {
+		group_size = GetMaximumProcessorCount(group);
+		dprint(FD_PROCESS, "group=%d group_start=%d group_size=%u search_cpu=%d\n",
+		       group, cpus_offset, group_size, search_cpu);
+		if (cpus_offset + group_size > search_cpu)
+			found = true;
+		else {
+			cpus_offset += group_size;
+			group++;
+		}
+	}
+
+	if (!found) {
+		log_err("CPU mask contains processor beyond last active processor index (%d)\n",
+			 cpus_offset - 1);
+		print_mask(cpumask);
+		return 1;
+	}
+
+	/* Check all the CPUs in the mask apply to ONLY that processor group */
+	last_cpu = last_set_cpu(cpumask);
+	if (last_cpu > (cpus_offset + group_size - 1)) {
+		log_info("CPU mask cannot bind CPUs (e.g. %d, %d) that are "
+			 "in different processor groups\n", search_cpu,
+			 last_cpu);
+		print_mask(cpumask);
+		return 1;
+	}
+
+	/* Extract the current processor group mask from the cpumask */
+	row = cpus_offset / FIO_CPU_MASK_STRIDE;
+	bit_offset = cpus_offset % FIO_CPU_MASK_STRIDE;
+	group_cpumask = cpumask->row[row] >> bit_offset;
+	end = bit_offset + group_size;
+	if (end > FIO_CPU_MASK_STRIDE && (row + 1 < FIO_CPU_MASK_ROWS)) {
+		/* Some of the next row needs to be part of the mask */
+		int needed, needed_shift, needed_mask_shift;
+		uint64_t needed_mask;
+
+		needed = end - FIO_CPU_MASK_STRIDE;
+		needed_shift = FIO_CPU_MASK_STRIDE - bit_offset;
+		needed_mask_shift = FIO_CPU_MASK_STRIDE - needed;
+		needed_mask = (uint64_t)-1 >> needed_mask_shift;
+		dprint(FD_PROCESS, "bit_offset=%d end=%d needed=%d needed_shift=%d needed_mask=%ld needed_mask_shift=%d\n", bit_offset, end, needed, needed_shift, needed_mask, needed_mask_shift);
+		group_cpumask |= (cpumask->row[row + 1] & needed_mask) << needed_shift;
+	}
+	group_cpumask &= (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - group_size);
+
+	/* Return group and mask */
+	dprint(FD_PROCESS, "Returning group=%d group_mask=%lu\n", group, group_cpumask);
+	*processor_group = group;
+	*affinity_mask = group_cpumask;
+
+	return 0;
+}
+
+static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
+{
+	HANDLE handle = NULL;
+	int group, ret;
+	uint64_t group_mask = 0;
+	GROUP_AFFINITY new_group_affinity;
+
+	ret = -1;
+
+	if (mask_to_group_mask(&cpumask, &group, &group_mask) != 0)
+		goto err;
+
+	handle = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION,
+			    TRUE, pid);
+	if (handle == NULL) {
+		log_err("fio_setaffinity: failed to get handle for pid %d\n", pid);
+		goto err;
+	}
+
+	/* Set group and mask.
+	 * Note: if the GROUP_AFFINITY struct's Reserved members are not
+	 * initialised to 0 then SetThreadGroupAffinity will fail with
+	 * GetLastError() set to ERROR_INVALID_PARAMETER */
+	new_group_affinity.Mask = (KAFFINITY) group_mask;
+	new_group_affinity.Group = group;
+	new_group_affinity.Reserved[0] = 0;
+	new_group_affinity.Reserved[1] = 0;
+	new_group_affinity.Reserved[2] = 0;
+	if (SetThreadGroupAffinity(handle, &new_group_affinity, NULL) != 0)
+		ret = 0;
+	else {
+		log_err("fio_setaffinity: failed to set thread affinity "
+			 "(pid %d, group %d, mask %" PRIx64 ", "
+			 "GetLastError=%d)\n", pid, group, group_mask,
+			 GetLastError());
+		goto err;
+	}
+
+err:
+	if (handle)
+		CloseHandle(handle);
+	return ret;
+}
+
+static inline void cpu_to_row_offset(int cpu, int *row, int *offset)
+{
+	*row = cpu / FIO_CPU_MASK_STRIDE;
+	*offset = cpu - FIO_CPU_MASK_STRIDE * *row;
+}
+
+static inline int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		mask->row[i] = 0;
+	return 0;
+}
+
+/*
+ * fio_getaffinity() should not be called once a fio_setaffinity() call has
+ * been made because fio_setaffinity() may put the process into multiple
+ * processor groups
+ */
+static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
+{
+	int ret;
+	int row, offset, end, group, group_size, group_start_cpu;
+	DWORD_PTR process_mask, system_mask;
+	HANDLE handle;
+	PUSHORT current_groups;
+	USHORT group_count;
+	WORD online_groups;
+
+	ret = -1;
+	current_groups = NULL;
+	handle = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
+	if (handle == NULL) {
+		log_err("fio_getaffinity: failed to get handle for pid %d\n",
+			pid);
+		goto err;
+	}
+
+	group_count = 1;
+	/*
+	 * GetProcessGroupAffinity() seems to expect more than the natural
+	 * alignment for a USHORT from the area pointed to by current_groups so
+	 * arrange for maximum alignment by allocating via malloc()
+	 */
+	current_groups = malloc(sizeof(USHORT));
+	if (!current_groups) {
+		log_err("fio_getaffinity: malloc failed\n");
+		goto err;
+	}
+	if (GetProcessGroupAffinity(handle, &group_count, current_groups) == 0) {
+		/* NB: we also fail here if we are a multi-group process */
+		log_err("fio_getaffinity: failed to get single group affinity for pid %d\n", pid);
+		goto err;
+	}
+	GetProcessAffinityMask(handle, &process_mask, &system_mask);
+
+	/* Convert group and group relative mask to full CPU mask */
+	online_groups = GetActiveProcessorGroupCount();
+	if (online_groups == 0) {
+		log_err("fio_getaffinity: error retrieving total processor groups\n");
+		goto err;
+	}
+
+	group = 0;
+	group_start_cpu = 0;
+	group_size = 0;
+	dprint(FD_PROCESS, "current_groups=%d group_count=%d\n",
+	       current_groups[0], group_count);
+	while (true) {
+		group_size = GetMaximumProcessorCount(group);
+		if (group_size == 0) {
+			log_err("fio_getaffinity: error retrieving size of "
+				"processor group %d\n", group);
+			goto err;
+		} else if (group >= current_groups[0] || group >= online_groups)
+			break;
+		else {
+			group_start_cpu += group_size;
+			group++;
+		}
+	}
+
+	if (group != current_groups[0]) {
+		log_err("fio_getaffinity: could not find processor group %d\n",
+			current_groups[0]);
+		goto err;
+	}
+
+	dprint(FD_PROCESS, "group_start_cpu=%d, group size=%u\n",
+	       group_start_cpu, group_size);
+	if ((group_start_cpu + group_size) >= FIO_MAX_CPUS) {
+		log_err("fio_getaffinity failed: current CPU affinity (group "
+			"%d, group_start_cpu %d, group_size %d) extends "
+			"beyond mask's highest CPU (%d)\n", group,
+			group_start_cpu, group_size, FIO_MAX_CPUS);
+		goto err;
+	}
+
+	fio_cpuset_init(mask);
+	cpu_to_row_offset(group_start_cpu, &row, &offset);
+	mask->row[row] = process_mask;
+	mask->row[row] <<= offset;
+	end = offset + group_size;
+	if (end > FIO_CPU_MASK_STRIDE) {
+		int needed;
+		uint64_t needed_mask;
+
+		needed = FIO_CPU_MASK_STRIDE - end;
+		needed_mask = (uint64_t)-1 >> (FIO_CPU_MASK_STRIDE - needed);
+		row++;
+		mask->row[row] = process_mask;
+		mask->row[row] >>= needed;
+		mask->row[row] &= needed_mask;
+	}
+	ret = 0;
+
+err:
+	if (handle)
+		CloseHandle(handle);
+	if (current_groups)
+		free(current_groups);
+
+	return ret;
+}
+
+static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	mask->row[row] &= ~(1ULL << offset);
+}
+
+static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	mask->row[row] |= 1ULL << offset;
+}
+
+static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+{
+	int row, offset;
+	cpu_to_row_offset(cpu, &row, &offset);
+
+	return (mask->row[row] & (1ULL << offset)) != 0;
+}
+
+static inline int fio_cpu_count(os_cpu_mask_t *mask)
+{
+	int count = 0;
+
+	for (int i = 0; i < FIO_CPU_MASK_ROWS; i++)
+		count += hweight64(mask->row[i]);
+
+	return count;
+}
+
+static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
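
A note on the row/offset arithmetic in os-windows-7.h: fio_cpu_set(), fio_cpu_isset() and fio_cpu_count() treat os_cpu_mask_t as an array of 64-bit rows. A global CPU number therefore has to be split into a row index and a bit offset within that row. In cpu_to_row_offset() as posted, `*offset = cpu << FIO_CPU_MASK_STRIDE * *row` parses as a shift by `FIO_CPU_MASK_STRIDE * *row`. For row 0 that is a shift by zero, which yields `cpu` as intended. Assuming FIO_CPU_MASK_STRIDE is 64 (its definition is not in this excerpt), every later row shifts an int by 64 or more, and C leaves that undefined. The sketch below computes the offset as a remainder instead. MASK_STRIDE, MASK_ROWS, cpu_mask_t and the helper names are hypothetical stand-ins, not fio's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for this sketch: 4 rows of 64 CPUs (256 CPUs total). */
#define MASK_STRIDE 64
#define MASK_ROWS   4

typedef struct {
	uint64_t row[MASK_ROWS];
} cpu_mask_t;

/* Split a global CPU number into a row index and a bit offset in that row. */
static void cpu_to_row_offset(int cpu, int *row, int *offset)
{
	assert(cpu >= 0 && cpu < MASK_STRIDE * MASK_ROWS);
	*row = cpu / MASK_STRIDE;
	*offset = cpu % MASK_STRIDE;	/* equals cpu - MASK_STRIDE * *row */
}

static void cpu_mask_set(cpu_mask_t *m, int cpu)
{
	int row, offset;

	cpu_to_row_offset(cpu, &row, &offset);
	m->row[row] |= UINT64_C(1) << offset;
}

static int cpu_mask_isset(const cpu_mask_t *m, int cpu)
{
	int row, offset;

	cpu_to_row_offset(cpu, &row, &offset);
	return (int)((m->row[row] >> offset) & 1);
}

/* Population count across all rows; each pass clears the lowest set bit. */
static int cpu_mask_count(const cpu_mask_t *m)
{
	int count = 0;

	for (int i = 0; i < MASK_ROWS; i++)
		for (uint64_t w = m->row[i]; w; w &= w - 1)
			count++;
	return count;
}
```

With this layout, CPU 70 lands in row 1 at bit 6, and CPU 255 in row 3 at bit 63.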
diff --git a/os/os-windows-xp.h b/os/os-windows-xp.h
new file mode 100644
index 0000000..1ce9ab3
--- /dev/null
+++ b/os/os-windows-xp.h
@@ -0,0 +1,70 @@
+#define FIO_MAX_CPUS	MAXIMUM_PROCESSORS
+
+typedef DWORD_PTR os_cpu_mask_t;
+
+static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
+{
+	HANDLE h;
+	BOOL bSuccess = FALSE;
+
+	h = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION, TRUE, pid);
+	if (h != NULL) {
+		bSuccess = SetThreadAffinityMask(h, cpumask);
+		if (!bSuccess)
+			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n", pid, cpumask);
+
+		CloseHandle(h);
+	} else {
+		log_err("fio_setaffinity failed: failed to get handle for pid %d\n", pid);
+	}
+
+	return (bSuccess)? 0 : -1;
+}
+
+static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
+{
+	os_cpu_mask_t systemMask;
+
+	HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
+
+	if (h != NULL) {
+		GetProcessAffinityMask(h, mask, &systemMask);
+		CloseHandle(h);
+	} else {
+		log_err("fio_getaffinity failed: failed to get handle for pid %d\n", pid);
+		return -1;
+	}
+
+	return 0;
+}
+
+static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
+{
+	*mask &= ~(1ULL << cpu);
+}
+
+static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
+{
+	*mask |= 1ULL << cpu;
+}
+
+static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+{
+	return (*mask & (1ULL << cpu)) != 0;
+}
+
+static inline int fio_cpu_count(os_cpu_mask_t *mask)
+{
+	return hweight64(*mask);
+}
+
+static inline int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	*mask = 0;
+	return 0;
+}
+
+static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
diff --git a/os/os-windows.h b/os/os-windows.h
index 9b04579..01f555e 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -13,6 +13,7 @@
 #include <stdlib.h>
 
 #include "../smalloc.h"
+#include "../debug.h"
 #include "../file.h"
 #include "../log.h"
 #include "../lib/hweight.h"
@@ -21,7 +22,7 @@
 
 #include "windows/posix.h"
 
-/* Cygwin doesn't define rand_r if C99 or newer is being used */
+/* MinGW won't declare rand_r unless _POSIX is defined */
 #if defined(WIN32) && !defined(rand_r)
 int rand_r(unsigned *);
 #endif
@@ -40,16 +41,12 @@ int rand_r(unsigned *);
 #define FIO_PREFERRED_CLOCK_SOURCE	CS_CGETTIME
 #define FIO_OS_PATH_SEPARATOR		'\\'
 
-#define FIO_MAX_CPUS	MAXIMUM_PROCESSORS
-
 #define OS_MAP_ANON		MAP_ANON
 
 #define fio_swap16(x)	_byteswap_ushort(x)
 #define fio_swap32(x)	_byteswap_ulong(x)
 #define fio_swap64(x)	_byteswap_uint64(x)
 
-typedef DWORD_PTR os_cpu_mask_t;
-
 #define _SC_PAGESIZE			0x1
 #define _SC_NPROCESSORS_ONLN	0x2
 #define _SC_PHYS_PAGES			0x4
@@ -77,11 +74,6 @@ typedef DWORD_PTR os_cpu_mask_t;
 /* Winsock doesn't support MSG_WAIT */
 #define OS_MSG_DONTWAIT	0
 
-#define POLLOUT	1
-#define POLLIN	2
-#define POLLERR	0
-#define POLLHUP	1
-
 #define SIGCONT	0
 #define SIGUSR1	1
 #define SIGUSR2 2
@@ -172,73 +164,6 @@ static inline int gettid(void)
 	return GetCurrentThreadId();
 }
 
-static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
-{
-	HANDLE h;
-	BOOL bSuccess = FALSE;
-
-	h = OpenThread(THREAD_QUERY_INFORMATION | THREAD_SET_INFORMATION, TRUE, pid);
-	if (h != NULL) {
-		bSuccess = SetThreadAffinityMask(h, cpumask);
-		if (!bSuccess)
-			log_err("fio_setaffinity failed: failed to set thread affinity (pid %d, mask %.16llx)\n", pid, cpumask);
-
-		CloseHandle(h);
-	} else {
-		log_err("fio_setaffinity failed: failed to get handle for pid %d\n", pid);
-	}
-
-	return (bSuccess)? 0 : -1;
-}
-
-static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
-{
-	os_cpu_mask_t systemMask;
-
-	HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, TRUE, pid);
-
-	if (h != NULL) {
-		GetProcessAffinityMask(h, mask, &systemMask);
-		CloseHandle(h);
-	} else {
-		log_err("fio_getaffinity failed: failed to get handle for pid %d\n", pid);
-		return -1;
-	}
-
-	return 0;
-}
-
-static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
-{
-	*mask &= ~(1ULL << cpu);
-}
-
-static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
-{
-	*mask |= 1ULL << cpu;
-}
-
-static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
-{
-	return (*mask & (1ULL << cpu)) != 0;
-}
-
-static inline int fio_cpu_count(os_cpu_mask_t *mask)
-{
-	return hweight64(*mask);
-}
-
-static inline int fio_cpuset_init(os_cpu_mask_t *mask)
-{
-	*mask = 0;
-	return 0;
-}
-
-static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
-{
-	return 0;
-}
-
 static inline int init_random_seeds(unsigned long *rand_seeds, int size)
 {
 	HCRYPTPROV hCryptProv;
@@ -261,12 +186,16 @@ static inline int init_random_seeds(unsigned long *rand_seeds, int size)
 	return 0;
 }
 
-
 static inline int fio_set_sched_idle(void)
 {
 	/* SetThreadPriority returns nonzero for success */
 	return (SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_IDLE))? 0 : -1;
 }
 
+#ifdef CONFIG_WINDOWS_XP
+#include "os-windows-xp.h"
+#else
+#include "os-windows-7.h"
+#endif
 
 #endif /* FIO_OS_WINDOWS_H */
diff --git a/os/windows/eula.rtf b/os/windows/eula.rtf
index b2798bb..01472be 100755
Binary files a/os/windows/eula.rtf and b/os/windows/eula.rtf differ
diff --git a/os/windows/posix.c b/os/windows/posix.c
index ecc8c40..d33250d 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -959,6 +959,7 @@ in_addr_t inet_network(const char *cp)
 	return hbo;
 }
 
+#ifdef CONFIG_WINDOWS_XP
 const char* inet_ntop(int af, const void *restrict src,
 		char *restrict dst, socklen_t size)
 {
@@ -1039,3 +1040,4 @@ int inet_pton(int af, const char *restrict src, void *restrict dst)
 
 	return ret;
 }
+#endif /* CONFIG_WINDOWS_XP */
diff --git a/os/windows/posix/include/arpa/inet.h b/os/windows/posix/include/arpa/inet.h
index 30498c6..056f1dd 100644
--- a/os/windows/posix/include/arpa/inet.h
+++ b/os/windows/posix/include/arpa/inet.h
@@ -12,8 +12,10 @@ typedef int in_addr_t;
 
 in_addr_t inet_network(const char *cp);
 
+#ifdef CONFIG_WINDOWS_XP
 const char *inet_ntop(int af, const void *restrict src,
         char *restrict dst, socklen_t size);
 int inet_pton(int af, const char *restrict src, void *restrict dst);
+#endif
 
 #endif /* ARPA_INET_H */
diff --git a/os/windows/posix/include/poll.h b/os/windows/posix/include/poll.h
index f064e2b..25b8183 100644
--- a/os/windows/posix/include/poll.h
+++ b/os/windows/posix/include/poll.h
@@ -1,8 +1,11 @@
 #ifndef POLL_H
 #define POLL_H
 
+#include <winsock2.h>
+
 typedef int nfds_t;
 
+#ifdef CONFIG_WINDOWS_XP
 struct pollfd
 {
 	int fd;
@@ -10,6 +13,12 @@ struct pollfd
 	short revents;
 };
 
+#define POLLOUT	1
+#define POLLIN	2
+#define POLLERR	0
+#define POLLHUP	1
+#endif /* CONFIG_WINDOWS_XP */
+
 int poll(struct pollfd fds[], nfds_t nfds, int timeout);
 
 #endif /* POLL_H */
diff --git a/server.c b/server.c
index 2e08c66..12c8d68 100644
--- a/server.c
+++ b/server.c
@@ -2144,14 +2144,14 @@ static int fio_init_server_ip(void)
 #endif
 
 	if (use_ipv6) {
-		const void *src = &saddr_in6.sin6_addr;
+		void *src = &saddr_in6.sin6_addr;
 
 		addr = (struct sockaddr *) &saddr_in6;
 		socklen = sizeof(saddr_in6);
 		saddr_in6.sin6_family = AF_INET6;
 		str = inet_ntop(AF_INET6, src, buf, sizeof(buf));
 	} else {
-		const void *src = &saddr_in.sin_addr;
+		void *src = &saddr_in.sin_addr;
 
 		addr = (struct sockaddr *) &saddr_in;
 		socklen = sizeof(saddr_in);
@@ -2219,7 +2219,7 @@ static int fio_init_server_connection(void)
 
 	if (!bind_sock) {
 		char *p, port[16];
-		const void *src;
+		void *src;
 		int af;
 
 		if (use_ipv6) {
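
fio_getaffinity() converts the group-relative mask returned by GetProcessAffinityMask() into global CPU numbers. It walks the processor groups in order and sums their sizes (GetMaximumProcessorCount()) until it reaches the process's group. That running total is group_start_cpu, and group-relative bit n becomes global CPU group_start_cpu + n. The sketch below reproduces that arithmetic, with a plain array of group sizes standing in for the Windows calls. The function names are hypothetical. Groups may differ in size, which is why the loop in the patch queries each one.

```c
#include <assert.h>
#include <stdint.h>

/* First global CPU number of 'group', or -1 if the group does not exist. */
static int group_start_cpu(const int *group_sizes, int ngroups, int group)
{
	int start = 0;

	if (group < 0 || group >= ngroups)
		return -1;
	for (int g = 0; g < group; g++)
		start += group_sizes[g];	/* skip every earlier group */
	return start;
}

/*
 * Write the global CPU numbers selected by a group-relative mask into cpus[].
 * Returns the number written, or -1 on a bad group or a too-small array.
 */
static int group_mask_to_cpus(const int *group_sizes, int ngroups, int group,
			      uint64_t group_mask, int *cpus, int max_cpus)
{
	int start = group_start_cpu(group_sizes, ngroups, group);
	int n = 0;

	if (start < 0)
		return -1;
	for (int bit = 0; bit < group_sizes[group] && bit < 64; bit++) {
		if (!(group_mask & (UINT64_C(1) << bit)))
			continue;
		if (n == max_cpus)
			return -1;
		cpus[n++] = start + bit;
	}
	return n;
}
```

For example, with groups of 40, 40 and 16 CPUs, mask 0x5 in group 1 selects global CPUs 40 and 42.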


* Recent changes (master)
@ 2018-04-11 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-11 12:00 UTC
  To: fio

The following changes since commit cb73748b9af3d678eb6ad0af7b9cea5a2ea1999e:

  stat: remove dead 'nr_uninit' assignment (2018-04-09 08:10:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4fe721ac83e84df7c6be07394d1963fd1ec5d9a6:

  os/os-dragonfly: sync with header file changes in upstream (2018-04-10 09:17:22 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (1):
      os/os-dragonfly: sync with header file changes in upstream

 os/os-dragonfly.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 713046f..e80ad8c 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -10,10 +10,17 @@
 #include <sys/sysctl.h>
 #include <sys/statvfs.h>
 #include <sys/diskslice.h>
-#include <sys/ioctl_compat.h>
 #include <sys/usched.h>
 #include <sys/resource.h>
 
+/* API changed during "5.3 development" */
+#if __DragonFly_version < 500302
+#include <sys/ioctl_compat.h>
+#define DAIOCTRIM	IOCTLTRIM
+#else
+#include <bus/cam/scsi/scsi_daio.h>
+#endif
+
 #include "../file.h"
 #include "../lib/types.h"
 
@@ -222,7 +229,7 @@ static inline int os_trim(struct fio_file *f, unsigned long long start,
 	range[0] = start;
 	range[1] = len;
 
-	if (!ioctl(f->fd, IOCTLTRIM, range))
+	if (!ioctl(f->fd, DAIOCTRIM, range))
 		return 0;
 
 	return errno;
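
The DragonFly change keeps os_trim() free of version checks. When building against headers older than version 500302, it includes <sys/ioctl_compat.h> and defines DAIOCTRIM as an alias for the old IOCTLTRIM. Otherwise, it takes DAIOCTRIM from <bus/cam/scsi/scsi_daio.h>. The following minimal, portable sketch shows the same compile-time aliasing. Every name in it is hypothetical: MYOS_VERSION stands in for __DragonFly_version, and the request numbers are made up.

```c
#include <assert.h>

/* Hypothetical version macro; a real build would get this from system headers. */
#ifndef MYOS_VERSION
#define MYOS_VERSION 500400
#endif

/* Made-up request numbers for the old and new spellings. */
#define OLD_TRIM_REQ 0x1001UL
#define NEW_TRIM_REQ 0x2002UL

/* Choose one name at build time so call sites never need their own #if. */
#if MYOS_VERSION < 500302
#define TRIM_REQ OLD_TRIM_REQ
#else
#define TRIM_REQ NEW_TRIM_REQ
#endif

/* The call site only ever mentions TRIM_REQ, as os_trim() only uses DAIOCTRIM. */
static unsigned long trim_request(void)
{
	return TRIM_REQ;
}
```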


* Recent changes (master)
@ 2018-04-10 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-10 12:00 UTC
  To: fio

The following changes since commit 9fac0db7a09bac08bbff9b213d3b1daceee07679:

  steadystate: check for division by zero in mean calculation (2018-04-08 15:54:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cb73748b9af3d678eb6ad0af7b9cea5a2ea1999e:

  stat: remove dead 'nr_uninit' assignment (2018-04-09 08:10:40 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      init: fix memory leak in error handling
      stat: remove dead 'nr_uninit' assignment

 init.c | 3 ++-
 stat.c | 1 -
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 0b6fedd..f5ff73d 100644
--- a/init.c
+++ b/init.c
@@ -1970,7 +1970,8 @@ static int __parse_jobs_ini(struct thread_data *td,
 			if (p[0] == '[') {
 				if (nested) {
 					log_err("No new sections in included files\n");
-					return 1;
+					ret = 1;
+					goto out;
 				}
 
 				skip_fgets = 1;
diff --git a/stat.c b/stat.c
index a837ed9..7b9dd3b 100644
--- a/stat.c
+++ b/stat.c
@@ -670,7 +670,6 @@ static int calc_block_percentiles(int nr_block_infos, uint32_t *block_infos,
 	if (len > 1)
 		qsort((void *)plist, len, sizeof(plist[0]), double_cmp);
 
-	nr_uninit = 0;
 	/* Start only after the uninit entries end */
 	for (nr_uninit = 0;
 	     nr_uninit < nr_block_infos
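
The init.c fix replaces an early `return 1` with `ret = 1; goto out;`, so the error path runs whatever cleanup sits at the `out` label. The commit title identifies the skipped cleanup as a memory leak. The following self-contained sketch shows that single-exit pattern, using a hypothetical parse_lines() that owns one heap buffer and rejects a new section header in a nested (included) file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, 1 on error; every path after malloc() exits via "out". */
static int parse_lines(const char **lines, int nlines, int nested)
{
	char *buf;
	int ret = 0;

	buf = malloc(256);
	if (!buf)
		return 1;	/* nothing allocated yet, so a plain return is safe */

	for (int i = 0; i < nlines; i++) {
		if (lines[i][0] == '[' && nested) {
			/* a bare "return 1" here would leak buf */
			ret = 1;
			goto out;
		}
		strncpy(buf, lines[i], 255);
		buf[255] = '\0';
	}
out:
	free(buf);
	return ret;
}
```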


* Recent changes (master)
@ 2018-04-09 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-09 12:00 UTC
  To: fio

The following changes since commit 465964a6c8ff3ccac62b92e9af57377075e04579:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-04-06 17:59:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9fac0db7a09bac08bbff9b213d3b1daceee07679:

  steadystate: check for division by zero in mean calculation (2018-04-08 15:54:26 -0600)

----------------------------------------------------------------
Jens Axboe (8):
      Remove binject engine
      axmap: use calloc() for level alloc
      server: fix dead assignment of variable
      client: fix bad shadowing of 'ret'
      filesetup: fix dead assignment of 'ret'
      parse: fix dead 'org' assignment
      eta: fix dead variable assignments
      steadystate: check for division by zero in mean calculation

 Makefile          |   2 +-
 client.c          |   6 +-
 engines/binject.c | 458 ------------------------------------------------------
 eta.c             |   4 +-
 filesetup.c       |   3 -
 io_u.h            |   3 -
 lib/axmap.c       |   2 +-
 options.c         |   5 -
 os/binject.h      |  71 ---------
 os/os-android.h   |   1 -
 os/os-linux.h     |   2 -
 parse.c           |   2 -
 server.c          |   1 -
 steadystate.c     |   6 +
 14 files changed, 13 insertions(+), 553 deletions(-)
 delete mode 100644 engines/binject.c
 delete mode 100644 os/binject.h

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index d45ba6b..cc4b71f 100644
--- a/Makefile
+++ b/Makefile
@@ -148,7 +148,7 @@ endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
-		engines/binject.c oslib/linux-dev-lookup.c
+		oslib/linux-dev-lookup.c
   LIBS += -lpthread -ldl
   LDFLAGS += -rdynamic
 endif
diff --git a/client.c b/client.c
index 970974a..ea1a4d2 100644
--- a/client.c
+++ b/client.c
@@ -1339,7 +1339,7 @@ static int fio_client_handle_iolog(struct fio_client *client,
 	sprintf(log_pathname, "%s.%s", pdu->name, client->hostname);
 
 	if (store_direct) {
-		ssize_t ret;
+		ssize_t wrote;
 		size_t sz;
 		int fd;
 
@@ -1353,10 +1353,10 @@ static int fio_client_handle_iolog(struct fio_client *client,
 		}
 
 		sz = cmd->pdu_len - sizeof(*pdu);
-		ret = write(fd, pdu->samples, sz);
+		wrote = write(fd, pdu->samples, sz);
 		close(fd);
 
-		if (ret != sz) {
+		if (wrote != sz) {
 			log_err("fio: short write on compressed log\n");
 			ret = 1;
 			goto out;
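
In the pre-fix client.c hunk, `ssize_t ret` is declared inside the store_direct block. The later `ret = 1; goto out;` therefore assigns that inner variable rather than the function's own `ret`, and the error is lost at `out`. Renaming the inner variable to `wrote` makes the assignment reach the function result. The following reduced illustration shows both versions; the functions are hypothetical, and the values 10 and 20 simulate a short and a full write.

```c
#include <assert.h>

/* Before: the inner "ret" shadows the outer one. */
static int shadowed(int short_write)
{
	int ret = 0;

	{
		long ret = short_write ? 10 : 20;	/* shadows the outer ret */

		if (ret != 20) {
			ret = 1;	/* updates the inner variable only */
			goto out;
		}
	}
out:
	return ret;	/* outer ret, still 0 */
}

/* After: a distinct name, so the error code survives to "out". */
static int renamed(int short_write)
{
	int ret = 0;

	{
		long wrote = short_write ? 10 : 20;

		if (wrote != 20) {
			ret = 1;
			goto out;
		}
	}
out:
	return ret;
}
```

GCC and Clang can flag this class of bug with -Wshadow.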
diff --git a/engines/binject.c b/engines/binject.c
deleted file mode 100644
index 49042a3..0000000
--- a/engines/binject.c
+++ /dev/null
@@ -1,458 +0,0 @@
-/*
- * binject engine
- *
- * IO engine that uses the Linux binject interface to directly inject
- * bio's to block devices.
- *
- */
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <errno.h>
-#include <assert.h>
-#include <string.h>
-#include <poll.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-
-#include "../fio.h"
-
-#ifdef FIO_HAVE_BINJECT
-
-struct binject_data {
-	struct b_user_cmd *cmds;
-	struct io_u **events;
-	struct pollfd *pfds;
-	int *fd_flags;
-};
-
-struct binject_file {
-	unsigned int bs;
-	int minor;
-	int fd;
-};
-
-static void binject_buc_init(struct binject_data *bd, struct io_u *io_u)
-{
-	struct b_user_cmd *buc = &io_u->buc;
-
-	memset(buc, 0, sizeof(*buc));
-	binject_buc_set_magic(buc);
-
-	buc->buf = (unsigned long) io_u->xfer_buf;
-	buc->len = io_u->xfer_buflen;
-	buc->offset = io_u->offset;
-	buc->usr_ptr = (unsigned long) io_u;
-
-	buc->flags = B_FLAG_NOIDLE | B_FLAG_UNPLUG;
-	assert(buc->buf);
-}
-
-static int pollin_events(struct pollfd *pfds, int fds)
-{
-	int i;
-
-	for (i = 0; i < fds; i++)
-		if (pfds[i].revents & POLLIN)
-			return 1;
-
-	return 0;
-}
-
-static unsigned int binject_read_commands(struct thread_data *td, void *buf,
-					  int left, int *err)
-{
-	struct fio_file *f;
-	int i, ret, events;
-	char *p = buf;
-
-one_more:
-	events = 0;
-	for_each_file(td, f, i) {
-		struct binject_file *bf = FILE_ENG_DATA(f);
-
-		ret = read(bf->fd, p, left * sizeof(struct b_user_cmd));
-		if (ret < 0) {
-			if (errno == EAGAIN)
-				continue;
-			*err = -errno;
-			td_verror(td, errno, "read");
-			break;
-		} else if (ret) {
-			p += ret;
-			events += ret / sizeof(struct b_user_cmd);
-		}
-	}
-
-	if (*err || events)
-		return events;
-
-	usleep(1000);
-	goto one_more;
-}
-
-static int fio_binject_getevents(struct thread_data *td, unsigned int min,
-				 unsigned int max,
-				 const struct timespec fio_unused *t)
-{
-	struct binject_data *bd = td->io_ops_data;
-	int left = max, ret, r = 0, ev_index = 0;
-	void *buf = bd->cmds;
-	unsigned int i, events;
-	struct fio_file *f;
-
-	/*
-	 * Fill in the file descriptors
-	 */
-	for_each_file(td, f, i) {
-		struct binject_file *bf = FILE_ENG_DATA(f);
-
-		/*
-		 * don't block for min events == 0
-		 */
-		if (!min)
-			bd->fd_flags[i] = fio_set_fd_nonblocking(bf->fd, "binject");
-		else
-			bd->fd_flags[i] = -1;
-
-		bd->pfds[i].fd = bf->fd;
-		bd->pfds[i].events = POLLIN;
-	}
-
-	while (left) {
-		while (!min) {
-			ret = poll(bd->pfds, td->o.nr_files, -1);
-			if (ret < 0) {
-				if (!r)
-					r = -errno;
-				td_verror(td, errno, "poll");
-				break;
-			} else if (!ret)
-				continue;
-
-			if (pollin_events(bd->pfds, td->o.nr_files))
-				break;
-		}
-
-		if (r < 0)
-			break;
-
-		events = binject_read_commands(td, buf, left, &r);
-
-		if (r < 0)
-			break;
-
-		left -= events;
-		r += events;
-
-		for (i = 0; i < events; i++) {
-			struct b_user_cmd *buc = (struct b_user_cmd *) buf + i;
-
-			bd->events[ev_index] = (struct io_u *) (unsigned long) buc->usr_ptr;
-			ev_index++;
-		}
-	}
-
-	if (!min) {
-		for_each_file(td, f, i) {
-			struct binject_file *bf = FILE_ENG_DATA(f);
-
-			if (bd->fd_flags[i] == -1)
-				continue;
-
-			if (fcntl(bf->fd, F_SETFL, bd->fd_flags[i]) < 0)
-				log_err("fio: binject failed to restore fcntl flags: %s\n", strerror(errno));
-		}
-	}
-
-	if (r > 0)
-		assert(ev_index == r);
-
-	return r;
-}
-
-static int fio_binject_doio(struct thread_data *td, struct io_u *io_u)
-{
-	struct b_user_cmd *buc = &io_u->buc;
-	struct binject_file *bf = FILE_ENG_DATA(io_u->file);
-	int ret;
-
-	ret = write(bf->fd, buc, sizeof(*buc));
-	if (ret < 0)
-		return ret;
-
-	return FIO_Q_QUEUED;
-}
-
-static int fio_binject_prep(struct thread_data *td, struct io_u *io_u)
-{
-	struct binject_data *bd = td->io_ops_data;
-	struct b_user_cmd *buc = &io_u->buc;
-	struct binject_file *bf = FILE_ENG_DATA(io_u->file);
-
-	if (io_u->xfer_buflen & (bf->bs - 1)) {
-		log_err("read/write not sector aligned\n");
-		return EINVAL;
-	}
-
-	if (io_u->ddir == DDIR_READ) {
-		binject_buc_init(bd, io_u);
-		buc->type = B_TYPE_READ;
-	} else if (io_u->ddir == DDIR_WRITE) {
-		binject_buc_init(bd, io_u);
-		if (io_u->flags & IO_U_F_BARRIER)
-			buc->type = B_TYPE_WRITEBARRIER;
-		else
-			buc->type = B_TYPE_WRITE;
-	} else if (io_u->ddir == DDIR_TRIM) {
-		binject_buc_init(bd, io_u);
-		buc->type = B_TYPE_DISCARD;
-	} else {
-		assert(0);
-	}
-
-	return 0;
-}
-
-static int fio_binject_queue(struct thread_data *td, struct io_u *io_u)
-{
-	int ret;
-
-	fio_ro_check(td, io_u);
-
-	ret = fio_binject_doio(td, io_u);
-
-	if (ret < 0)
-		io_u->error = errno;
-
-	if (io_u->error) {
-		td_verror(td, io_u->error, "xfer");
-		return FIO_Q_COMPLETED;
-	}
-
-	return ret;
-}
-
-static struct io_u *fio_binject_event(struct thread_data *td, int event)
-{
-	struct binject_data *bd = td->io_ops_data;
-
-	return bd->events[event];
-}
-
-static int binject_open_ctl(struct thread_data *td)
-{
-	int fd;
-
-	fd = open("/dev/binject-ctl", O_RDWR);
-	if (fd < 0)
-		td_verror(td, errno, "open binject-ctl");
-
-	return fd;
-}
-
-static void binject_unmap_dev(struct thread_data *td, struct binject_file *bf)
-{
-	struct b_ioctl_cmd bic;
-	int fdb;
-
-	if (bf->fd >= 0) {
-		close(bf->fd);
-		bf->fd = -1;
-	}
-
-	fdb = binject_open_ctl(td);
-	if (fdb < 0)
-		return;
-
-	bic.minor = bf->minor;
-
-	if (ioctl(fdb, B_IOCTL_DEL, &bic) < 0)
-		td_verror(td, errno, "binject dev unmap");
-
-	close(fdb);
-}
-
-static int binject_map_dev(struct thread_data *td, struct binject_file *bf,
-			   int fd)
-{
-	struct b_ioctl_cmd bic;
-	char name[80];
-	struct stat sb;
-	int fdb, dev_there, loops;
-
-	fdb = binject_open_ctl(td);
-	if (fdb < 0)
-		return 1;
-
-	bic.fd = fd;
-
-	if (ioctl(fdb, B_IOCTL_ADD, &bic) < 0) {
-		td_verror(td, errno, "binject dev map");
-		close(fdb);
-		return 1;
-	}
-
-	bf->minor = bic.minor;
-
-	sprintf(name, "/dev/binject%u", bf->minor);
-
-	/*
-	 * Wait for udev to create the node...
-	 */
-	dev_there = loops = 0;
-	do {
-		if (!stat(name, &sb)) {
-			dev_there = 1;
-			break;
-		}
-
-		usleep(10000);
-	} while (++loops < 100);
-
-	close(fdb);
-
-	if (!dev_there) {
-		log_err("fio: timed out waiting for binject dev\n");
-		goto err_unmap;
-	}
-
-	bf->fd = open(name, O_RDWR);
-	if (bf->fd < 0) {
-		td_verror(td, errno, "binject dev open");
-err_unmap:
-		binject_unmap_dev(td, bf);
-		return 1;
-	}
-
-	return 0;
-}
-
-static int fio_binject_close_file(struct thread_data *td, struct fio_file *f)
-{
-	struct binject_file *bf = FILE_ENG_DATA(f);
-
-	if (bf) {
-		binject_unmap_dev(td, bf);
-		free(bf);
-		FILE_SET_ENG_DATA(f, NULL);
-		return generic_close_file(td, f);
-	}
-
-	return 0;
-}
-
-static int fio_binject_open_file(struct thread_data *td, struct fio_file *f)
-{
-	struct binject_file *bf;
-	unsigned int bs;
-	int ret;
-
-	ret = generic_open_file(td, f);
-	if (ret)
-		return 1;
-
-	if (f->filetype != FIO_TYPE_BLOCK) {
-		log_err("fio: binject only works with block devices\n");
-		goto err_close;
-	}
-	if (ioctl(f->fd, BLKSSZGET, &bs) < 0) {
-		td_verror(td, errno, "BLKSSZGET");
-		goto err_close;
-	}
-
-	bf = malloc(sizeof(*bf));
-	bf->bs = bs;
-	bf->minor = bf->fd = -1;
-	FILE_SET_ENG_DATA(f, bf);
-
-	if (binject_map_dev(td, bf, f->fd)) {
-err_close:
-		ret = generic_close_file(td, f);
-		return 1;
-	}
-
-	return 0;
-}
-
-static void fio_binject_cleanup(struct thread_data *td)
-{
-	struct binject_data *bd = td->io_ops_data;
-
-	if (bd) {
-		free(bd->events);
-		free(bd->cmds);
-		free(bd->fd_flags);
-		free(bd->pfds);
-		free(bd);
-	}
-}
-
-static int fio_binject_init(struct thread_data *td)
-{
-	struct binject_data *bd;
-
-	bd = malloc(sizeof(*bd));
-	memset(bd, 0, sizeof(*bd));
-
-	bd->cmds = malloc(td->o.iodepth * sizeof(struct b_user_cmd));
-	memset(bd->cmds, 0, td->o.iodepth * sizeof(struct b_user_cmd));
-
-	bd->events = malloc(td->o.iodepth * sizeof(struct io_u *));
-	memset(bd->events, 0, td->o.iodepth * sizeof(struct io_u *));
-
-	bd->pfds = malloc(sizeof(struct pollfd) * td->o.nr_files);
-	memset(bd->pfds, 0, sizeof(struct pollfd) * td->o.nr_files);
-
-	bd->fd_flags = malloc(sizeof(int) * td->o.nr_files);
-	memset(bd->fd_flags, 0, sizeof(int) * td->o.nr_files);
-
-	td->io_ops_data = bd;
-	return 0;
-}
-
-static struct ioengine_ops ioengine = {
-	.name		= "binject",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= fio_binject_init,
-	.prep		= fio_binject_prep,
-	.queue		= fio_binject_queue,
-	.getevents	= fio_binject_getevents,
-	.event		= fio_binject_event,
-	.cleanup	= fio_binject_cleanup,
-	.open_file	= fio_binject_open_file,
-	.close_file	= fio_binject_close_file,
-	.get_file_size	= generic_get_file_size,
-	.flags		= FIO_RAWIO | FIO_BARRIER | FIO_MEMALIGN,
-};
-
-#else /* FIO_HAVE_BINJECT */
-
-/*
- * When we have a proper configure system in place, we simply wont build
- * and install this io engine. For now install a crippled version that
- * just complains and fails to load.
- */
-static int fio_binject_init(struct thread_data fio_unused *td)
-{
-	log_err("fio: ioengine binject not available\n");
-	return 1;
-}
-
-static struct ioengine_ops ioengine = {
-	.name		= "binject",
-	.version	= FIO_IOOPS_VERSION,
-	.init		= fio_binject_init,
-};
-
-#endif
-
-static void fio_init fio_binject_register(void)
-{
-	register_ioengine(&ioengine);
-}
-
-static void fio_exit fio_binject_unregister(void)
-{
-	unregister_ioengine(&ioengine);
-}
diff --git a/eta.c b/eta.c
index 2d549ee..9111f5e 100644
--- a/eta.c
+++ b/eta.c
@@ -149,7 +149,7 @@ void eta_to_str(char *str, unsigned long eta_sec)
 		str += sprintf(str, "%02uh:", h);
 
 	str += sprintf(str, "%02um:", m);
-	str += sprintf(str, "%02us", s);
+	sprintf(str, "%02us", s);
 }
 
 /*
@@ -621,7 +621,7 @@ void display_thread_status(struct jobs_eta *je)
 			free(iops_str[ddir]);
 		}
 	}
-	p += sprintf(p, "\r");
+	sprintf(p, "\r");
 
 	printf("%s", output);
 
diff --git a/filesetup.c b/filesetup.c
index b246e0f..75694bd 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -490,7 +490,6 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 	} else if (td_ioengine_flagged(td, FIO_DISKLESSIO)) {
 		dprint(FD_IO, "invalidate not supported by ioengine %s\n",
 		       td->io_ops->name);
-		ret = 0;
 	} else if (f->filetype == FIO_TYPE_FILE) {
 		dprint(FD_IO, "declare unneeded cache %s: %llu/%llu\n",
 			f->file_name, off, len);
@@ -517,14 +516,12 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 				log_err("fio: only root may flush block "
 					"devices. Cache flush bypassed!\n");
 			}
-			ret = 0;
 		}
 		if (ret < 0)
 			errval = errno;
 	} else if (f->filetype == FIO_TYPE_CHAR ||
 		   f->filetype == FIO_TYPE_PIPE) {
 		dprint(FD_IO, "invalidate not supported %s\n", f->file_name);
-		ret = 0;
 	}
 
 	/*
diff --git a/io_u.h b/io_u.h
index aaa7d97..4f433c3 100644
--- a/io_u.h
+++ b/io_u.h
@@ -113,9 +113,6 @@ struct io_u {
 #ifdef CONFIG_SOLARISAIO
 		aio_result_t resultp;
 #endif
-#ifdef FIO_HAVE_BINJECT
-		struct b_user_cmd buc;
-#endif
 #ifdef CONFIG_RDMA
 		struct ibv_mr *mr;
 #endif
diff --git a/lib/axmap.c b/lib/axmap.c
index bf203df..3c65308 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -102,7 +102,7 @@ struct axmap *axmap_new(unsigned long nr_bits)
 	}
 
 	axmap->nr_levels = levels;
-	axmap->levels = malloc(axmap->nr_levels * sizeof(struct axmap_level));
+	axmap->levels = calloc(axmap->nr_levels, sizeof(struct axmap_level));
 	axmap->nr_bits = nr_bits;
 
 	for (i = 0; i < axmap->nr_levels; i++) {
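
axmap_new() now allocates its level array with calloc(), so every struct axmap_level starts out zeroed instead of holding indeterminate bytes. calloc() also takes the count and element size separately, which lets common C libraries such as glibc reject a request whose total size would overflow. The sketch below illustrates a general benefit of zeroed records: a cleanup loop can free every level's buffer even if only some were allocated, because free(NULL) does nothing. struct level and both helpers are hypothetical. Treating all-bits-zero as a NULL pointer holds on mainstream platforms but is not guaranteed by ISO C.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-level record. */
struct level {
	unsigned long *map;	/* bitmap storage, may be filled in later */
	size_t map_size;
};

/* Allocate nr zeroed levels: map == NULL and map_size == 0 in each. */
static struct level *levels_new(size_t nr)
{
	return calloc(nr, sizeof(struct level));
}

/* Safe on partially initialised arrays, since free(NULL) is a no-op. */
static void levels_free(struct level *lv, size_t nr)
{
	if (!lv)
		return;
	for (size_t i = 0; i < nr; i++)
		free(lv[i].map);
	free(lv);
}
```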
diff --git a/options.c b/options.c
index 17d7245..fae3943 100644
--- a/options.c
+++ b/options.c
@@ -1816,11 +1816,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "GUASI IO engine",
 			  },
 #endif
-#ifdef FIO_HAVE_BINJECT
-			  { .ival = "binject",
-			    .help = "binject direct inject block engine",
-			  },
-#endif
 #ifdef CONFIG_RDMA
 			  { .ival = "rdma",
 			    .help = "RDMA IO engine",
diff --git a/os/binject.h b/os/binject.h
deleted file mode 100644
index 1d862c8..0000000
--- a/os/binject.h
+++ /dev/null
@@ -1,71 +0,0 @@
-#ifndef BINJECT_H
-#define BINJECT_H
-
-#include <linux/types.h>
-
-#define BINJECT_MAGIC		0x89
-#define BINJECT_VER		0x01
-#define BINJECT_MAGIC_SHIFT	8
-#define BINJECT_VER_MASK	((1 << BINJECT_MAGIC_SHIFT) - 1)
-
-struct b_user_cmd {
-	__u16 magic;	/* INPUT */
-	__u16 type;	/* INPUT */
-	__u32 error;	/* OUTPUT */
-	__u32 flags;	/* INPUT */
-	__u32 len;	/* INPUT */
-	__u64 offset;	/* INPUT */
-	__u64 buf;	/* INPUT */
-	__u64 usr_ptr;	/* PASSED THROUGH */
-	__u64 nsec;	/* OUTPUT */
-};
-
-struct b_ioctl_cmd {
-	int fd;
-	int minor;
-};
-
-#define BINJECT_IOCTL_CHR	'J'
-#define B_IOCTL_ADD		_IOWR(BINJECT_IOCTL_CHR, 1, struct b_ioctl_cmd)
-#define B_IOCTL_DEL		_IOWR(BINJECT_IOCTL_CHR, 2, struct b_ioctl_cmd)
-
-enum {
-	B_TYPE_READ		= 0,
-	B_TYPE_WRITE,
-	B_TYPE_DISCARD,
-	B_TYPE_READVOID,
-	B_TYPE_WRITEZERO,
-	B_TYPE_READBARRIER,
-	B_TYPE_WRITEBARRIER,
-	B_TYPE_NR
-};
-
-enum {
-	__B_FLAG_SYNC	= 0,
-	__B_FLAG_UNPLUG,
-	__B_FLAG_NOIDLE,
-	__B_FLAG_BARRIER,
-	__B_FLAG_META,
-	__B_FLAG_RAHEAD,
-	__B_FLAG_FAILFAST_DEV,
-	__B_FLAG_FAILFAST_TRANSPORT,
-	__B_FLAG_FAILFAST_DRIVER,
-	__B_FLAG_NR,
-
-	B_FLAG_SYNC			= 1 << __B_FLAG_SYNC,
-	B_FLAG_UNPLUG			= 1 << __B_FLAG_UNPLUG,
-	B_FLAG_NOIDLE			= 1 << __B_FLAG_NOIDLE,
-	B_FLAG_BARRIER			= 1 << __B_FLAG_BARRIER,
-	B_FLAG_META			= 1 << __B_FLAG_META,
-	B_FLAG_RAHEAD			= 1 << __B_FLAG_RAHEAD,
-	B_FLAG_FAILFAST_DEV		= 1 << __B_FLAG_FAILFAST_DEV,
-	B_FLAG_FAILFAST_TRANSPORT	= 1 << __B_FLAG_FAILFAST_TRANSPORT,
-	B_FLAG_FAILFAST_DRIVER		= 1 << __B_FLAG_FAILFAST_DRIVER,
-};
-
-static inline void binject_buc_set_magic(struct b_user_cmd *buc)
-{
-	buc->magic = (BINJECT_MAGIC << BINJECT_MAGIC_SHIFT) | BINJECT_VER;
-}
-
-#endif
diff --git a/os/os-android.h b/os/os-android.h
index bb590e4..1483275 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -18,7 +18,6 @@
 #include <asm/byteorder.h>
 
 #include "./os-linux-syscall.h"
-#include "binject.h"
 #include "../file.h"
 
 #ifndef __has_builtin         // Optional of course.
diff --git a/os/os-linux.h b/os/os-linux.h
index 1d400a0..a550bba 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -27,7 +27,6 @@
 #endif /* ARCH_HAVE_CRC_CRYPTO */
 
 #include "./os-linux-syscall.h"
-#include "binject.h"
 #include "../file.h"
 
 #ifndef __has_builtin         // Optional of course.
@@ -48,7 +47,6 @@
 #define FIO_HAVE_CGROUPS
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_TRIM
-#define FIO_HAVE_BINJECT
 #define FIO_HAVE_GETTID
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_PWRITEV2
diff --git a/parse.c b/parse.c
index 3993471..539c602 100644
--- a/parse.c
+++ b/parse.c
@@ -1185,8 +1185,6 @@ static void __print_option(const struct fio_option *o,
 
 	if (!o)
 		return;
-	if (!org)
-		org = o;
 
 	p = name;
 	depth = level;
diff --git a/server.c b/server.c
index d3f6977..2e08c66 100644
--- a/server.c
+++ b/server.c
@@ -1199,7 +1199,6 @@ static int handle_connection(struct sk_out *sk_out)
 			.events	= POLLIN,
 		};
 
-		ret = 0;
 		do {
 			int timeout = 1000;
 
diff --git a/steadystate.c b/steadystate.c
index 1e3a546..ee1c0e5 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -350,6 +350,9 @@ uint64_t steadystate_bw_mean(struct thread_stat *ts)
 	int i;
 	uint64_t sum;
 
+	if (!ts->ss_dur)
+		return 0;
+
 	for (i = 0, sum = 0; i < ts->ss_dur; i++)
 		sum += ts->ss_bw_data[i];
 
@@ -361,6 +364,9 @@ uint64_t steadystate_iops_mean(struct thread_stat *ts)
 	int i;
 	uint64_t sum;
 
+	if (!ts->ss_dur)
+		return 0;
+
 	for (i = 0, sum = 0; i < ts->ss_dur; i++)
 		sum += ts->ss_iops_data[i];
 

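The steadystate.c hunk above adds an early `return 0` to steadystate_bw_mean() and steadystate_iops_mean() when `ts->ss_dur` is zero. The hunk does not show the end of these functions. This sketch assumes they divide `sum` by `ss_dur`, which is the division the new guard protects. window_mean() is a hypothetical stand-in that uses a plain sample array instead of fio's thread_stat:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for steadystate_bw_mean()/steadystate_iops_mean():
 * average the first n samples of a window. With n == 0 the loop body never
 * runs and the final division would be sum / 0, which is undefined
 * behaviour for integers, so return 0 up front instead.
 */
static uint64_t window_mean(const uint64_t *samples, unsigned int n)
{
	uint64_t sum = 0;
	unsigned int i;

	assert(n == 0 || samples != NULL);
	if (n == 0)
		return 0;	/* no samples collected yet */

	for (i = 0; i < n; i++)
		sum += samples[i];

	return sum / n;
}
```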

* Recent changes (master)
@ 2018-04-07 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-07 12:00 UTC
  To: fio

The following changes since commit 2b8b8f0dccb4c7e97aee3b7d7e13d8528467d64e:

  Fix return value checking of fread() in iolog.c (2018-04-04 19:18:50 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 465964a6c8ff3ccac62b92e9af57377075e04579:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-04-06 17:59:20 -0600)

----------------------------------------------------------------
Bart Van Assche (1):
      Fix floating point option range formatting

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 parse.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
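Before this change, show_option_range() printed the "range" label only together with the minimum. An option that had only a maximum therefore produced a line that started with ", max=..." and had no label. The fix prints the label once and carries a separator that becomes ", " only after a minimum has been written.

The following is a minimal sketch of that pattern. It writes into a buffer instead of calling fio's logger callback. format_range() is a hypothetical name, and the label is not padded to 20 columns as the real "%20s: " format does. As in the hunk, DBL_MIN and DBL_MAX serve as "unset" markers:

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "range: min=..., max=..." into buf, emitting
 * only the bounds that are set. sep starts empty and becomes ", " only once
 * a minimum has been printed, so a lone maximum gets no leading comma.
 */
static void format_range(char *buf, size_t len, double minfp, double maxfp)
{
	const char *sep = "";
	size_t used;

	assert(buf != NULL && len > 0);
	snprintf(buf, len, "range: ");
	if (minfp != DBL_MIN) {
		used = strlen(buf);	/* always < len: snprintf terminates */
		snprintf(buf + used, len - used, "min=%f", minfp);
		sep = ", ";
	}
	if (maxfp != DBL_MAX) {
		used = strlen(buf);
		snprintf(buf + used, len - used, "%smax=%f", sep, maxfp);
	}
}
```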

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index deb4120..3993471 100644
--- a/parse.c
+++ b/parse.c
@@ -71,13 +71,17 @@ static void show_option_range(const struct fio_option *o,
 			      size_t (*logger)(const char *format, ...))
 {
 	if (o->type == FIO_OPT_FLOAT_LIST) {
+		const char *sep = "";
 		if (!o->minfp && !o->maxfp)
 			return;
 
-		if (o->minfp != DBL_MIN)
-			logger("%20s: min=%f", "range", o->minfp);
+		logger("%20s: ", "range");
+		if (o->minfp != DBL_MIN) {
+			logger("min=%f", o->minfp);
+			sep = ", ";
+		}
 		if (o->maxfp != DBL_MAX)
-			logger(", max=%f", o->maxfp);
+			logger("%smax=%f", sep, o->maxfp);
 		logger("\n");
 	} else if (!o->posval[0].ival) {
 		if (!o->minval && !o->maxval)

* Recent changes (master)
@ 2018-04-05 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-05 12:00 UTC
  To: fio

The following changes since commit 196ce5d5ff76af30cf93fd24240f12d8c4d24381:

  Merge branch 'fixbug_glfs' of https://github.com/simon-rock/fio (2018-04-03 08:19:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b8b8f0dccb4c7e97aee3b7d7e13d8528467d64e:

  Fix return value checking of fread() in iolog.c (2018-04-04 19:18:50 -0600)

----------------------------------------------------------------
Bart Van Assche (6):
      engines/sg: Make I/O error messages more informative
      Rename TD_F_VER_NONE into TD_F_DO_VERIFY
      Only populate the write buffer if necessary
      parse.h: Remove a superfluous cast
      option parsing: Mark arguments that are not modified as 'const'
      Ensure that .minfp and .maxfp are respected for FIO_OPT_FLOAT_LIST

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

Rebecca Cran (1):
      Fix return value checking of fread() in iolog.c
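The iolog.c fix replaces `if (ret < 0)` after fread() with checks based on ferror() and feof(). fread() returns a size_t count of complete items, and here it is asked for a single item, so `ret` can only be 0 or 1 and the old `ret < 0` test could never be true. As a result, a read error fell through to the "short read" branch instead of the perror() path.

The following is a minimal sketch of the checking pattern. read_exact() and its status codes are hypothetical. The sketch is also stricter than the hunk: the hunk's second condition lets a read that stops at end-of-file through, whereas this sketch reports any short read:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_ERROR, READ_SHORT };

/*
 * Read exactly one object of len bytes. fread() returns a size_t item
 * count, never a negative value, so errors and short reads have to be
 * told apart with ferror() rather than by the sign of the return value.
 */
static enum read_status read_exact(FILE *f, void *buf, size_t len)
{
	size_t ret;

	assert(f != NULL && buf != NULL);
	ret = fread(buf, len, 1, f);
	if (ret == 1)
		return READ_OK;
	if (ferror(f))
		return READ_ERROR;	/* I/O error (POSIX fread() sets errno) */
	return READ_SHORT;		/* end of file before len bytes arrived */
}
```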

 backend.c    |   4 +++
 engines/sg.c |   4 +--
 fio.h        |   4 +--
 init.c       |   2 +-
 io_u.c       |   9 ++---
 iolog.c      |   6 ++--
 options.c    |  16 ++++-----
 options.h    |   9 +++--
 parse.c      | 112 +++++++++++++++++++++++++++++++----------------------------
 parse.h      |  25 +++++++------
 10 files changed, 103 insertions(+), 88 deletions(-)
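The "Mark arguments that are not modified as 'const'" change keeps the existing non-const find_option(). It adds find_option_c(), a thin wrapper that casts const away so that read-only callers can pass a const table (see the parse.c hunk below).

The following self-contained sketch shows the same pattern with a hypothetical `struct opt` table. The cast is well defined because the wrapped lookup only reads through the pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option entry standing in for struct fio_option. */
struct opt {
	const char *name;
	int value;
};

/*
 * Existing non-const lookup: callers that need to modify the matching
 * entry keep using this. The table ends with an entry whose name is NULL.
 */
static struct opt *find_opt(struct opt *table, const char *name)
{
	assert(table != NULL && name != NULL);
	for (; table->name; table++)
		if (!strcmp(table->name, name))
			return table;
	return NULL;
}

/*
 * Const-correct wrapper for read-only callers. Casting const away is well
 * defined; only writing through the pointer to an object defined const
 * would not be, and find_opt() never writes.
 */
static const struct opt *find_opt_c(const struct opt *table, const char *name)
{
	return find_opt((struct opt *)table, name);
}
```

This design keeps one search loop. A common alternative is to make the const version the primary implementation and have the non-const one cast its result.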

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index f2d7cc3..b28c3db 100644
--- a/backend.c
+++ b/backend.c
@@ -723,6 +723,7 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 					break;
 				} else if (io_u->ddir == DDIR_WRITE) {
 					io_u->ddir = DDIR_READ;
+					populate_verify_io_u(td, io_u);
 					break;
 				} else {
 					put_io_u(td, io_u);
@@ -995,6 +996,9 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 			break;
 		}
 
+		if (io_u->ddir == DDIR_WRITE && td->flags & TD_F_DO_VERIFY)
+			populate_verify_io_u(td, io_u);
+
 		ddir = io_u->ddir;
 
 		/*
diff --git a/engines/sg.c b/engines/sg.c
index 72eed8b..c2c0de3 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -287,11 +287,11 @@ static int fio_sgio_doio(struct thread_data *td, struct io_u *io_u, int do_sync)
 
 	if (f->filetype == FIO_TYPE_BLOCK) {
 		ret = fio_sgio_ioctl_doio(td, f, io_u);
-		td->error = io_u->error;
+		td_verror(td, io_u->error, __func__);
 	} else {
 		ret = fio_sgio_rw_doio(f, io_u, do_sync);
 		if (do_sync)
-			td->error = io_u->error;
+			td_verror(td, io_u->error, __func__);
 	}
 
 	return ret;
diff --git a/fio.h b/fio.h
index 488fa9a..2bfcac4 100644
--- a/fio.h
+++ b/fio.h
@@ -79,7 +79,7 @@ enum {
 	__TD_F_READ_IOLOG,
 	__TD_F_REFILL_BUFFERS,
 	__TD_F_SCRAMBLE_BUFFERS,
-	__TD_F_VER_NONE,
+	__TD_F_DO_VERIFY,
 	__TD_F_PROFILE_OPS,
 	__TD_F_COMPRESS,
 	__TD_F_COMPRESS_LOG,
@@ -100,7 +100,7 @@ enum {
 	TD_F_READ_IOLOG		= 1U << __TD_F_READ_IOLOG,
 	TD_F_REFILL_BUFFERS	= 1U << __TD_F_REFILL_BUFFERS,
 	TD_F_SCRAMBLE_BUFFERS	= 1U << __TD_F_SCRAMBLE_BUFFERS,
-	TD_F_VER_NONE		= 1U << __TD_F_VER_NONE,
+	TD_F_DO_VERIFY		= 1U << __TD_F_DO_VERIFY,
 	TD_F_PROFILE_OPS	= 1U << __TD_F_PROFILE_OPS,
 	TD_F_COMPRESS		= 1U << __TD_F_COMPRESS,
 	TD_F_COMPRESS_LOG	= 1U << __TD_F_COMPRESS_LOG,
diff --git a/init.c b/init.c
index ab7e399..0b6fedd 100644
--- a/init.c
+++ b/init.c
@@ -1184,7 +1184,7 @@ static void init_flags(struct thread_data *td)
 	    fio_option_is_set(o, zero_buffers)))
 		td->flags |= TD_F_SCRAMBLE_BUFFERS;
 	if (o->verify != VERIFY_NONE)
-		td->flags |= TD_F_VER_NONE;
+		td->flags |= TD_F_DO_VERIFY;
 
 	if (o->verify_async || o->io_submit_mode == IO_MODE_OFFLOAD)
 		td->flags |= TD_F_NEED_LOCK;
diff --git a/io_u.c b/io_u.c
index 98a7dc5..5fbb238 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1699,7 +1699,7 @@ static void small_content_scramble(struct io_u *io_u)
 
 /*
  * Return an io_u to be processed. Gets a buflen and offset, sets direction,
- * etc. The returned io_u is fully ready to be prepped and submitted.
+ * etc. The returned io_u is fully ready to be prepped, populated and submitted.
  */
 struct io_u *get_io_u(struct thread_data *td)
 {
@@ -1760,12 +1760,9 @@ struct io_u *get_io_u(struct thread_data *td)
 					td->o.min_bs[DDIR_WRITE],
 					io_u->buflen);
 			} else if ((td->flags & TD_F_SCRAMBLE_BUFFERS) &&
-				   !(td->flags & TD_F_COMPRESS))
+				   !(td->flags & TD_F_COMPRESS) &&
+				   !(td->flags & TD_F_DO_VERIFY))
 				do_scramble = 1;
-			if (td->flags & TD_F_VER_NONE) {
-				populate_verify_io_u(td, io_u);
-				do_scramble = 0;
-			}
 		} else if (io_u->ddir == DDIR_READ) {
 			/*
 			 * Reset the buf_filled parameters so next time if the
diff --git a/iolog.c b/iolog.c
index 2b5eaf0..bfafc03 100644
--- a/iolog.c
+++ b/iolog.c
@@ -978,7 +978,7 @@ int iolog_file_inflate(const char *file)
 	struct iolog_compress ic;
 	z_stream stream;
 	struct stat sb;
-	ssize_t ret;
+	size_t ret;
 	size_t total;
 	void *buf;
 	FILE *f;
@@ -1000,12 +1000,12 @@ int iolog_file_inflate(const char *file)
 	ic.seq = 1;
 
 	ret = fread(ic.buf, ic.len, 1, f);
-	if (ret < 0) {
+	if (ret == 0 && ferror(f)) {
 		perror("fread");
 		fclose(f);
 		free(buf);
 		return 1;
-	} else if (ret != 1) {
+	} else if (ferror(f) || (!feof(f) && ret != 1)) {
 		log_err("fio: short read on reading log\n");
 		fclose(f);
 		free(buf);
diff --git a/options.c b/options.c
index 45a5b82..17d7245 100644
--- a/options.c
+++ b/options.c
@@ -1517,7 +1517,7 @@ static int str_ioengine_external_cb(void *data, const char *str)
 	return 0;
 }
 
-static int rw_verify(struct fio_option *o, void *data)
+static int rw_verify(const struct fio_option *o, void *data)
 {
 	struct thread_data *td = cb_data_to_td(data);
 
@@ -1530,7 +1530,7 @@ static int rw_verify(struct fio_option *o, void *data)
 	return 0;
 }
 
-static int gtod_cpu_verify(struct fio_option *o, void *data)
+static int gtod_cpu_verify(const struct fio_option *o, void *data)
 {
 #ifndef FIO_HAVE_CPU_AFFINITY
 	struct thread_data *td = cb_data_to_td(data);
@@ -4904,7 +4904,7 @@ int fio_options_parse(struct thread_data *td, char **opts, int num_opts)
 	opts_copy = dup_and_sub_options(opts, num_opts);
 
 	for (ret = 0, i = 0, unknown = 0; i < num_opts; i++) {
-		struct fio_option *o;
+		const struct fio_option *o;
 		int newret = parse_option(opts_copy[i], opts[i], fio_options,
 						&o, &td->o, &td->opt_list);
 
@@ -4930,7 +4930,7 @@ int fio_options_parse(struct thread_data *td, char **opts, int num_opts)
 			opts = opts_copy;
 		}
 		for (i = 0; i < num_opts; i++) {
-			struct fio_option *o = NULL;
+			const struct fio_option *o = NULL;
 			int newret = 1;
 
 			if (!opts_copy[i])
@@ -4961,9 +4961,9 @@ int fio_cmd_option_parse(struct thread_data *td, const char *opt, char *val)
 
 	ret = parse_cmd_option(opt, val, fio_options, &td->o, &td->opt_list);
 	if (!ret) {
-		struct fio_option *o;
+		const struct fio_option *o;
 
-		o = find_option(fio_options, opt);
+		o = find_option_c(fio_options, opt);
 		if (o)
 			fio_option_mark_set(&td->o, o);
 	}
@@ -5028,7 +5028,7 @@ unsigned int fio_get_kb_base(void *data)
 	return kb_base;
 }
 
-int add_option(struct fio_option *o)
+int add_option(const struct fio_option *o)
 {
 	struct fio_option *__o;
 	int opt_index = 0;
@@ -5165,7 +5165,7 @@ bool __fio_option_is_set(struct thread_options *o, unsigned int off1)
 	return false;
 }
 
-void fio_option_mark_set(struct thread_options *o, struct fio_option *opt)
+void fio_option_mark_set(struct thread_options *o, const struct fio_option *opt)
 {
 	unsigned int opt_off, index, offset;
 
diff --git a/options.h b/options.h
index 59024ef..e53eb1b 100644
--- a/options.h
+++ b/options.h
@@ -8,7 +8,7 @@
 #include "parse.h"
 #include "lib/types.h"
 
-int add_option(struct fio_option *);
+int add_option(const struct fio_option *);
 void invalidate_profile_options(const char *);
 extern char *exec_profile;
 
@@ -31,9 +31,10 @@ extern bool __fio_option_is_set(struct thread_options *, unsigned int off);
 	__r;								\
 })
 
-extern void fio_option_mark_set(struct thread_options *, struct fio_option *);
+extern void fio_option_mark_set(struct thread_options *,
+				const struct fio_option *);
 
-static inline bool o_match(struct fio_option *o, const char *opt)
+static inline bool o_match(const struct fio_option *o, const char *opt)
 {
 	if (!strcmp(o->name, opt))
 		return true;
@@ -44,6 +45,8 @@ static inline bool o_match(struct fio_option *o, const char *opt)
 }
 
 extern struct fio_option *find_option(struct fio_option *, const char *);
+extern const struct fio_option *
+find_option_c(const struct fio_option *, const char *);
 extern struct fio_option *fio_option_find(const char *);
 extern unsigned int fio_get_kb_base(void *);
 
diff --git a/parse.c b/parse.c
index 33fcf46..deb4120 100644
--- a/parse.c
+++ b/parse.c
@@ -39,7 +39,7 @@ static const char *opt_type_names[] = {
 	"OPT_UNSUPPORTED",
 };
 
-static struct fio_option *__fio_options;
+static const struct fio_option *__fio_options;
 
 static int vp_cmp(const void *p1, const void *p2)
 {
@@ -49,7 +49,7 @@ static int vp_cmp(const void *p1, const void *p2)
 	return strlen(vp2->ival) - strlen(vp1->ival);
 }
 
-static void posval_sort(struct fio_option *o, struct value_pair *vpmap)
+static void posval_sort(const struct fio_option *o, struct value_pair *vpmap)
 {
 	const struct value_pair *vp;
 	int entries;
@@ -67,14 +67,15 @@ static void posval_sort(struct fio_option *o, struct value_pair *vpmap)
 	qsort(vpmap, entries, sizeof(struct value_pair), vp_cmp);
 }
 
-static void show_option_range(struct fio_option *o,
+static void show_option_range(const struct fio_option *o,
 			      size_t (*logger)(const char *format, ...))
 {
 	if (o->type == FIO_OPT_FLOAT_LIST) {
-		if (o->minfp == DBL_MIN && o->maxfp == DBL_MAX)
+		if (!o->minfp && !o->maxfp)
 			return;
 
-		logger("%20s: min=%f", "range", o->minfp);
+		if (o->minfp != DBL_MIN)
+			logger("%20s: min=%f", "range", o->minfp);
 		if (o->maxfp != DBL_MAX)
 			logger(", max=%f", o->maxfp);
 		logger("\n");
@@ -89,7 +90,7 @@ static void show_option_range(struct fio_option *o,
 	}
 }
 
-static void show_option_values(struct fio_option *o)
+static void show_option_values(const struct fio_option *o)
 {
 	int i;
 
@@ -109,7 +110,7 @@ static void show_option_values(struct fio_option *o)
 		log_info("\n");
 }
 
-static void show_option_help(struct fio_option *o, int is_err)
+static void show_option_help(const struct fio_option *o, int is_err)
 {
 	const char *typehelp[] = {
 		"invalid",
@@ -484,7 +485,7 @@ static int str_match_len(const struct value_pair *vp, const char *str)
 			*ptr = (val);			\
 	} while (0)
 
-static const char *opt_type_name(struct fio_option *o)
+static const char *opt_type_name(const struct fio_option *o)
 {
 	compiletime_assert(ARRAY_SIZE(opt_type_names) - 1 == FIO_OPT_UNSUPPORTED,
 				"opt_type_names[] index");
@@ -495,8 +496,8 @@ static const char *opt_type_name(struct fio_option *o)
 	return "OPT_UNKNOWN?";
 }
 
-static int __handle_option(struct fio_option *o, const char *ptr, void *data,
-			   int first, int more, int curr)
+static int __handle_option(const struct fio_option *o, const char *ptr,
+			   void *data, int first, int more, int curr)
 {
 	int il=0, *ilp;
 	fio_fp64_t *flp;
@@ -668,15 +669,17 @@ static int __handle_option(struct fio_option *o, const char *ptr, void *data,
 			log_err("not a floating point value: %s\n", ptr);
 			return 1;
 		}
-		if (uf > o->maxfp) {
-			log_err("value out of range: %f"
-				" (range max: %f)\n", uf, o->maxfp);
-			return 1;
-		}
-		if (uf < o->minfp) {
-			log_err("value out of range: %f"
-				" (range min: %f)\n", uf, o->minfp);
-			return 1;
+		if (o->minfp || o->maxfp) {
+			if (uf > o->maxfp) {
+				log_err("value out of range: %f"
+					" (range max: %f)\n", uf, o->maxfp);
+				return 1;
+			}
+			if (uf < o->minfp) {
+				log_err("value out of range: %f"
+					" (range min: %f)\n", uf, o->minfp);
+				return 1;
+			}
 		}
 
 		flp = td_var(data, o, o->off1);
@@ -892,7 +895,8 @@ static int __handle_option(struct fio_option *o, const char *ptr, void *data,
 	return ret;
 }
 
-static int handle_option(struct fio_option *o, const char *__ptr, void *data)
+static int handle_option(const struct fio_option *o, const char *__ptr,
+			 void *data)
 {
 	char *o_ptr, *ptr, *ptr2;
 	int ret, done;
@@ -970,11 +974,16 @@ struct fio_option *find_option(struct fio_option *options, const char *opt)
 	return NULL;
 }
 
+const struct fio_option *
+find_option_c(const struct fio_option *options, const char *opt)
+{
+	return find_option((struct fio_option *)options, opt);
+}
 
-static struct fio_option *get_option(char *opt,
-				     struct fio_option *options, char **post)
+static const struct fio_option *
+get_option(char *opt, const struct fio_option *options, char **post)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 	char *ret;
 
 	ret = strchr(opt, '=');
@@ -984,9 +993,9 @@ static struct fio_option *get_option(char *opt,
 		ret = opt;
 		(*post)++;
 		strip_blank_end(ret);
-		o = find_option(options, ret);
+		o = find_option_c(options, ret);
 	} else {
-		o = find_option(options, opt);
+		o = find_option_c(options, opt);
 		*post = NULL;
 	}
 
@@ -995,7 +1004,7 @@ static struct fio_option *get_option(char *opt,
 
 static int opt_cmp(const void *p1, const void *p2)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 	char *s, *foo;
 	int prio1, prio2;
 
@@ -1019,15 +1028,15 @@ static int opt_cmp(const void *p1, const void *p2)
 	return prio2 - prio1;
 }
 
-void sort_options(char **opts, struct fio_option *options, int num_opts)
+void sort_options(char **opts, const struct fio_option *options, int num_opts)
 {
 	__fio_options = options;
 	qsort(opts, num_opts, sizeof(char *), opt_cmp);
 	__fio_options = NULL;
 }
 
-static void add_to_dump_list(struct fio_option *o, struct flist_head *dump_list,
-			     const char *post)
+static void add_to_dump_list(const struct fio_option *o,
+			     struct flist_head *dump_list, const char *post)
 {
 	struct print_option *p;
 
@@ -1045,12 +1054,12 @@ static void add_to_dump_list(struct fio_option *o, struct flist_head *dump_list,
 }
 
 int parse_cmd_option(const char *opt, const char *val,
-		     struct fio_option *options, void *data,
+		     const struct fio_option *options, void *data,
 		     struct flist_head *dump_list)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 
-	o = find_option(options, opt);
+	o = find_option_c(options, opt);
 	if (!o) {
 		log_err("Bad option <%s>\n", opt);
 		return 1;
@@ -1065,8 +1074,8 @@ int parse_cmd_option(const char *opt, const char *val,
 	return 0;
 }
 
-int parse_option(char *opt, const char *input,
-		 struct fio_option *options, struct fio_option **o, void *data,
+int parse_option(char *opt, const char *input, const struct fio_option *options,
+		 const struct fio_option **o, void *data,
 		 struct flist_head *dump_list)
 {
 	char *post;
@@ -1151,10 +1160,10 @@ int string_distance_ok(const char *opt, int distance)
 	return distance <= len;
 }
 
-static struct fio_option *find_child(struct fio_option *options,
-				     struct fio_option *o)
+static const struct fio_option *find_child(const struct fio_option *options,
+					   const struct fio_option *o)
 {
-	struct fio_option *__o;
+	const struct fio_option *__o;
 
 	for (__o = options + 1; __o->name; __o++)
 		if (__o->parent && !strcmp(__o->parent, o->name))
@@ -1163,7 +1172,8 @@ static struct fio_option *find_child(struct fio_option *options,
 	return NULL;
 }
 
-static void __print_option(struct fio_option *o, struct fio_option *org,
+static void __print_option(const struct fio_option *o,
+			   const struct fio_option *org,
 			   int level)
 {
 	char name[256], *p;
@@ -1184,10 +1194,10 @@ static void __print_option(struct fio_option *o, struct fio_option *org,
 	log_info("%-24s: %s\n", name, o->help);
 }
 
-static void print_option(struct fio_option *o)
+static void print_option(const struct fio_option *o)
 {
-	struct fio_option *parent;
-	struct fio_option *__o;
+	const struct fio_option *parent;
+	const struct fio_option *__o;
 	unsigned int printed;
 	unsigned int level;
 
@@ -1208,9 +1218,9 @@ static void print_option(struct fio_option *o)
 	} while (printed);
 }
 
-int show_cmd_help(struct fio_option *options, const char *name)
+int show_cmd_help(const struct fio_option *options, const char *name)
 {
-	struct fio_option *o, *closest;
+	const struct fio_option *o, *closest;
 	unsigned int best_dist = -1U;
 	int found = 0;
 	int show_all = 0;
@@ -1284,9 +1294,9 @@ int show_cmd_help(struct fio_option *options, const char *name)
 /*
  * Handle parsing of default parameters.
  */
-void fill_default_options(void *data, struct fio_option *options)
+void fill_default_options(void *data, const struct fio_option *options)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 
 	dprint(FD_PARSE, "filling default options\n");
 
@@ -1309,10 +1319,6 @@ static void option_init(struct fio_option *o)
 		if (!o->maxval)
 			o->maxval = UINT_MAX;
 	}
-	if (o->type == FIO_OPT_FLOAT_LIST) {
-		o->minfp = DBL_MIN;
-		o->maxfp = DBL_MAX;
-	}
 	if (o->type == FIO_OPT_STR_SET && o->def && !o->no_warn_def) {
 		log_err("Option %s: string set option with"
 				" default will always be true\n", o->name);
@@ -1346,9 +1352,9 @@ void options_init(struct fio_option *options)
 	}
 }
 
-void options_mem_dupe(struct fio_option *options, void *data)
+void options_mem_dupe(const struct fio_option *options, void *data)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 	char **ptr;
 
 	dprint(FD_PARSE, "dup options\n");
@@ -1363,9 +1369,9 @@ void options_mem_dupe(struct fio_option *options, void *data)
 	}
 }
 
-void options_free(struct fio_option *options, void *data)
+void options_free(const struct fio_option *options, void *data)
 {
-	struct fio_option *o;
+	const struct fio_option *o;
 	char **ptr;
 
 	dprint(FD_PARSE, "free options\n");
diff --git a/parse.h b/parse.h
index d05236b..4ad92d9 100644
--- a/parse.h
+++ b/parse.h
@@ -68,7 +68,7 @@ struct fio_option {
 	int hide_on_set;		/* hide on set, not on unset */
 	const char *inverse;		/* if set, apply opposite action to this option */
 	struct fio_option *inv_opt;	/* cached lookup */
-	int (*verify)(struct fio_option *, void *);
+	int (*verify)(const struct fio_option *, void *);
 	const char *prof_name;		/* only valid for specific profile */
 	void *prof_opts;
 	uint64_t category;		/* what type of option */
@@ -81,14 +81,18 @@ struct fio_option {
 	int no_free;
 };
 
-extern int parse_option(char *, const char *, struct fio_option *, struct fio_option **, void *, struct flist_head *);
-extern void sort_options(char **, struct fio_option *, int);
-extern int parse_cmd_option(const char *t, const char *l, struct fio_option *, void *, struct flist_head *);
-extern int show_cmd_help(struct fio_option *, const char *);
-extern void fill_default_options(void *, struct fio_option *);
+extern int parse_option(char *, const char *, const struct fio_option *,
+			const struct fio_option **, void *,
+			struct flist_head *);
+extern void sort_options(char **, const struct fio_option *, int);
+extern int parse_cmd_option(const char *t, const char *l,
+			    const struct fio_option *, void *,
+			    struct flist_head *);
+extern int show_cmd_help(const struct fio_option *, const char *);
+extern void fill_default_options(void *, const struct fio_option *);
 extern void options_init(struct fio_option *);
-extern void options_mem_dupe(struct fio_option *, void *);
-extern void options_free(struct fio_option *, void *);
+extern void options_mem_dupe(const struct fio_option *, void *);
+extern void options_free(const struct fio_option *, void *);
 
 extern void strip_blank_front(char **);
 extern void strip_blank_end(char *);
@@ -108,7 +112,8 @@ typedef int (fio_opt_str_val_fn)(void *, long long *);
 typedef int (fio_opt_int_fn)(void *, int *);
 
 struct thread_options;
-static inline void *td_var(void *to, struct fio_option *o, unsigned int offset)
+static inline void *td_var(void *to, const struct fio_option *o,
+			   unsigned int offset)
 {
 	void *ret;
 
@@ -117,7 +122,7 @@ static inline void *td_var(void *to, struct fio_option *o, unsigned int offset)
 	else
 		ret = to;
 
-	return (char *) ret + offset;
+	return ret + offset;
 }
 
 static inline int parse_is_percent(unsigned long long val)

* Recent changes (master)
@ 2018-04-04 12:00 Jens Axboe
From: Jens Axboe @ 2018-04-04 12:00 UTC
  To: fio

The following changes since commit cefd2a94b408b9c3be0300edb1270a546e7f09fe:

  Merge branch 'aarch64-crc32c' of https://github.com/sitsofe/fio (2018-03-30 10:16:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 196ce5d5ff76af30cf93fd24240f12d8c4d24381:

  Merge branch 'fixbug_glfs' of https://github.com/simon-rock/fio (2018-04-03 08:19:38 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fixbug_glfs' of https://github.com/simon-rock/fio

simon (1):
      glusterfs: always allocate io_u->engine_data

 engines/glusterfs_async.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)
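The glusterfs change makes fio_gf_io_u_init() allocate a fresh struct fio_gf_iou for every io_u, instead of allocating only when io_u->engine_data was NULL. The commit does not state its motivation.

The following sketch only shows the resulting init/free pairing for engine-private per-request data. The types struct request and struct engine_priv are hypothetical stand-ins for fio's io_u and fio_gf_iou:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct io_u and struct fio_gf_iou. */
struct request {
	void *engine_data;	/* engine-private state, owned by the engine */
};

struct engine_priv {
	int io_complete;
	struct request *req;	/* back-pointer to the owning request */
};

/*
 * Allocate private state unconditionally: whatever engine_data held
 * before is not consulted, so a stale non-NULL value cannot cause the
 * allocation to be skipped. Returns 0 on success, 1 on failure.
 */
static int priv_init(struct request *req)
{
	struct engine_priv *p = malloc(sizeof(*p));

	if (!p) {
		perror("malloc");
		return 1;
	}
	p->io_complete = 0;
	p->req = req;
	req->engine_data = p;
	return 0;
}

/* Matching teardown: release the state and clear the pointer. */
static void priv_free(struct request *req)
{
	assert(req != NULL);
	free(req->engine_data);
	req->engine_data = NULL;
}
```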

---

Diff of recent changes:

diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 97271d6..eb8df45 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -70,20 +70,17 @@ static void fio_gf_io_u_free(struct thread_data *td, struct io_u *io_u)
 
 static int fio_gf_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
+    struct fio_gf_iou *io;
 	dprint(FD_FILE, "%s\n", __FUNCTION__);
-
-	if (!io_u->engine_data) {
-		struct fio_gf_iou *io;
-
-		io = malloc(sizeof(struct fio_gf_iou));
-		if (!io) {
-			td_verror(td, errno, "malloc");
-			return 1;
-		}
-		io->io_complete = 0;
-		io->io_u = io_u;
-		io_u->engine_data = io;
-	}
+    
+    io = malloc(sizeof(struct fio_gf_iou));
+    if (!io) {
+        td_verror(td, errno, "malloc");
+        return 1;
+    }
+    io->io_complete = 0;
+    io->io_u = io_u;
+    io_u->engine_data = io;
 	return 0;
 }
 

* Recent changes (master)
@ 2018-03-31 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-31 12:00 UTC
  To: fio

The following changes since commit d6d74886759e3f268a6a3b12a47872865b867023:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-29 10:02:25 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cefd2a94b408b9c3be0300edb1270a546e7f09fe:

  Merge branch 'aarch64-crc32c' of https://github.com/sitsofe/fio (2018-03-30 10:16:27 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'aarch64-crc32c' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      aarch64: refactor HW ARM CRC32c detection
      Minor style changes

 arch/arch-aarch64.h |  4 ----
 configure           | 18 +++++++++++++-----
 crc/crc32c-arm64.c  | 21 +++++++--------------
 crc/crc32c-intel.c  |  2 +-
 crc/crc32c.h        |  6 +++---
 os/os-linux.h       | 27 +++++++++++++++++++++++++++
 os/os.h             | 11 +++++++++++
 7 files changed, 62 insertions(+), 27 deletions(-)
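This series moves the arm64 CRC32C runtime check out of crc/crc32c-arm64.c and behind an OS hook, os_cpu_has(). On Linux, the hook tests HWCAP_CRC32 in getauxval(AT_HWCAP). When FIO_HAVE_CPU_HAS is not defined, a stub that returns false is used instead.

The following standalone sketch has the same shape. cpu_has() and FEAT_ARM64_CRC32C are hypothetical names. The sketch also gates on the compiler's `__linux__`/`__aarch64__` macros, rather than on the ARCH_HAVE_CRC_CRYPTO symbol that fio's configure now emits:

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)	/* arm64 AT_HWCAP bit for the CRC32 instructions */
#endif
#endif

/* Hypothetical feature list, modelled on the cpu_features enum in os/os.h. */
typedef enum {
	FEAT_ARM64_CRC32C,
} cpu_feature;

/*
 * Runtime probe: ask the kernel, via the ELF auxiliary vector, whether the
 * CPU supports a feature. Where no probe is compiled in, report "absent"
 * so the caller falls back to the portable software implementation.
 */
static bool cpu_has(cpu_feature feature)
{
	switch (feature) {
#if defined(__linux__) && defined(__aarch64__)
	case FEAT_ARM64_CRC32C:
		return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#endif
	default:
		return false;
	}
}
```

Keeping the probe in the OS layer lets the CRC code call a single function on every platform, and non-Linux builds get the conservative `false` answer for free.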

---

Diff of recent changes:

diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 0912a86..2a86cc5 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -27,8 +27,4 @@ static inline int arch_ffz(unsigned long bitmask)
 
 #define ARCH_HAVE_FFZ
 
-#ifdef ARCH_HAVE_CRC_CRYPTO
-#define ARCH_HAVE_ARM64_CRC_CRYPTO
-#endif
-
 #endif
diff --git a/configure b/configure
index f635863..38706a9 100755
--- a/configure
+++ b/configure
@@ -600,7 +600,7 @@ int main(void)
 EOF
 if compile_prog "" "" "posixaio" ; then
   posix_aio="yes"
-elif compile_prog "" "-lrt" "posixaio"; then
+elif compile_prog "" "-lrt" "posixaio -lrt"; then
   posix_aio="yes"
   posix_aio_lrt="yes"
   LIBS="-lrt $LIBS"
@@ -2108,18 +2108,23 @@ if test "$march_armv8_a_crc_crypto" != "yes" ; then
 fi
 if test "$cpu" = "arm64" ; then
   cat > $TMPC <<EOF
-#include <sys/auxv.h>
 #include <arm_acle.h>
 #include <arm_neon.h>
+#include <sys/auxv.h>
 
 int main(void)
 {
-  return 0;
+  /* Can we also do a runtime probe? */
+#if __linux__
+  return getauxval(AT_HWCAP);
+#else
+# error "Don't know how to do runtime probe for ARM CRC32c"
+#endif
 }
 EOF
-  if compile_prog "-march=armv8-a+crc+crypto" "" ""; then
+  if compile_prog "-march=armv8-a+crc+crypto" "" "ARM CRC32c"; then
     march_armv8_a_crc_crypto="yes"
-    CFLAGS="$CFLAGS -march=armv8-a+crc+crypto -DARCH_HAVE_CRC_CRYPTO"
+    CFLAGS="$CFLAGS -march=armv8-a+crc+crypto"
     march_set="yes"
   fi
 fi
@@ -2421,6 +2426,9 @@ if test "$zlib" = "no" ; then
     echo "Note that some distros have separate packages for static libraries."
   fi
 fi
+if test "$march_armv8_a_crc_crypto" = "yes" ; then
+  output_sym "ARCH_HAVE_CRC_CRYPTO"
+fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"
 fi
diff --git a/crc/crc32c-arm64.c b/crc/crc32c-arm64.c
index 08177ba..11bfe5d 100644
--- a/crc/crc32c-arm64.c
+++ b/crc/crc32c-arm64.c
@@ -1,4 +1,9 @@
 #include "crc32c.h"
+#include "../os/os.h"
+
+bool crc32c_arm64_available = false;
+
+#ifdef ARCH_HAVE_CRC_CRYPTO
 
 #define CRC32C3X8(ITR) \
 	crc1 = __crc32cd(crc1, *((const uint64_t *)data + 42*1 + (ITR)));\
@@ -15,15 +20,6 @@
 	CRC32C3X8((ITR)*7+6) \
 	} while(0)
 
-#ifndef HWCAP_CRC32
-#define HWCAP_CRC32             (1 << 7)
-#endif /* HWCAP_CRC32 */
-
-bool crc32c_arm64_available = false;
-
-#ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
-
-#include <sys/auxv.h>
 #include <arm_acle.h>
 #include <arm_neon.h>
 
@@ -102,13 +98,10 @@ uint32_t crc32c_arm64(unsigned char const *data, unsigned long length)
 
 void crc32c_arm64_probe(void)
 {
-	unsigned long hwcap;
-
 	if (!crc32c_probed) {
-		hwcap = getauxval(AT_HWCAP);
-		crc32c_arm64_available = (hwcap & HWCAP_CRC32) != 0;
+		crc32c_arm64_available = os_cpu_has(CPU_ARM64_CRC32C);
 		crc32c_probed = true;
 	}
 }
 
-#endif /* ARCH_HAVE_ARM64_CRC_CRYPTO */
+#endif /* ARCH_HAVE_CRC_CRYPTO */
diff --git a/crc/crc32c-intel.c b/crc/crc32c-intel.c
index 9a2cefd..6e810a2 100644
--- a/crc/crc32c-intel.c
+++ b/crc/crc32c-intel.c
@@ -84,4 +84,4 @@ void crc32c_intel_probe(void)
 	}
 }
 
-#endif /* ARCH_HAVE_SSE */
+#endif /* ARCH_HAVE_SSE4_2 */
diff --git a/crc/crc32c.h b/crc/crc32c.h
index 60f6014..18f1161 100644
--- a/crc/crc32c.h
+++ b/crc/crc32c.h
@@ -27,7 +27,7 @@ extern uint32_t crc32c_sw(unsigned char const *, unsigned long);
 extern bool crc32c_arm64_available;
 extern bool crc32c_intel_available;
 
-#ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
+#ifdef ARCH_HAVE_CRC_CRYPTO
 extern uint32_t crc32c_arm64(unsigned char const *, unsigned long);
 extern void crc32c_arm64_probe(void);
 #else
@@ -35,7 +35,7 @@ extern void crc32c_arm64_probe(void);
 static inline void crc32c_arm64_probe(void)
 {
 }
-#endif
+#endif /* ARCH_HAVE_CRC_CRYPTO */
 
 #ifdef ARCH_HAVE_SSE4_2
 extern uint32_t crc32c_intel(unsigned char const *, unsigned long);
@@ -45,7 +45,7 @@ extern void crc32c_intel_probe(void);
 static inline void crc32c_intel_probe(void)
 {
 }
-#endif
+#endif /* ARCH_HAVE_SSE4_2 */
 
 static inline uint32_t fio_crc32c(unsigned char const *buf, unsigned long len)
 {
diff --git a/os/os-linux.h b/os/os-linux.h
index 894dc85..1d400a0 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -19,6 +19,13 @@
 #include <linux/fs.h>
 #include <scsi/sg.h>
 
+#ifdef ARCH_HAVE_CRC_CRYPTO
+#include <sys/auxv.h>
+#ifndef HWCAP_CRC32
+#define HWCAP_CRC32             (1 << 7)
+#endif /* HWCAP_CRC32 */
+#endif /* ARCH_HAVE_CRC_CRYPTO */
+
 #include "./os-linux-syscall.h"
 #include "binject.h"
 #include "../file.h"
@@ -410,4 +417,24 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset,
 }
 #endif
 
+#define FIO_HAVE_CPU_HAS
+static inline bool os_cpu_has(cpu_features feature)
+{
+	bool have_feature;
+	unsigned long fio_unused hwcap;
+
+	switch (feature) {
+#ifdef ARCH_HAVE_CRC_CRYPTO
+	case CPU_ARM64_CRC32C:
+		hwcap = getauxval(AT_HWCAP);
+		have_feature = (hwcap & HWCAP_CRC32) != 0;
+		break;
+#endif
+	default:
+		have_feature = false;
+	}
+
+	return have_feature;
+}
+
 #endif
diff --git a/os/os.h b/os/os.h
index 95ed7cf..becc410 100644
--- a/os/os.h
+++ b/os/os.h
@@ -27,6 +27,10 @@ enum {
 	os_nr,
 };
 
+typedef enum {
+        CPU_ARM64_CRC32C,
+} cpu_features;
+
 /* IWYU pragma: begin_exports */
 #if defined(__ANDROID__)
 #include "os-android.h"
@@ -387,4 +391,11 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t l
 # define FIO_HAVE_ANY_FALLOCATE
 #endif
 
+#ifndef FIO_HAVE_CPU_HAS
+static inline bool os_cpu_has(cpu_features feature)
+{
+	return false;
+}
+#endif
+
 #endif

* Recent changes (master)
@ 2018-03-30 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-30 12:00 UTC
  To: fio

The following changes since commit b348b7c7a1a6278d793698fc23993620ae9c588a:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-23 09:58:51 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d6d74886759e3f268a6a3b12a47872865b867023:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-29 10:02:25 -0600)

----------------------------------------------------------------
Bart Van Assche (2):
      Make it clear to Coverity that the tmp buffer in switch_ioscheduler() is \0-terminated
      switch_ioscheduler(): only remove the last character if it's a newline

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 backend.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index fc83ed1..f2d7cc3 100644
--- a/backend.c
+++ b/backend.c
@@ -1328,7 +1328,7 @@ static int init_io_u(struct thread_data *td)
 static int switch_ioscheduler(struct thread_data *td)
 {
 #ifdef FIO_HAVE_IOSCHED_SWITCH
-	char tmp[256], tmp2[128];
+	char tmp[256], tmp2[128], *p;
 	FILE *f;
 	int ret;
 
@@ -1364,17 +1364,19 @@ static int switch_ioscheduler(struct thread_data *td)
 	/*
 	 * Read back and check that the selected scheduler is now the default.
 	 */
-	memset(tmp, 0, sizeof(tmp));
-	ret = fread(tmp, sizeof(tmp), 1, f);
+	ret = fread(tmp, 1, sizeof(tmp) - 1, f);
 	if (ferror(f) || ret < 0) {
 		td_verror(td, errno, "fread");
 		fclose(f);
 		return 1;
 	}
+	tmp[ret] = '\0';
 	/*
-	 * either a list of io schedulers or "none\n" is expected.
+	 * either a list of io schedulers or "none\n" is expected. Strip the
+	 * trailing newline.
 	 */
-	tmp[strlen(tmp) - 1] = '\0';
+	p = tmp;
+	strsep(&p, "\n");
 
 	/*
 	 * Write to "none" entry doesn't fail, so check the result here.
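
The switch_ioscheduler() fix makes two changes. First, it replaces fread(tmp, sizeof(tmp), 1, f), which returns an item count rather than a byte count, with a byte-sized read that leaves room for the terminator. Second, it stops chopping the last character unconditionally: the old tmp[strlen(tmp) - 1] indexes out of bounds when nothing was read. Below is a minimal sketch of the same read, terminate and strip sequence. read_trimmed() is a hypothetical helper, and strcspn() stands in for the strsep() call in the diff because it is plain ISO C. Both cut the string at the first newline:

```c
#include <stdio.h>
#include <string.h>

/*
 * Read at most len - 1 bytes from f, NUL-terminate after the bytes actually
 * read, then cut the string at the first newline (if any).
 * Returns the number of bytes read, or -1 on error.
 */
static int read_trimmed(FILE *f, char *buf, size_t len)
{
	size_t ret;

	if (len == 0)
		return -1;

	/* Element size 1, so the return value is a byte count. */
	ret = fread(buf, 1, len - 1, f);
	if (ferror(f))
		return -1;
	buf[ret] = '\0';

	/* Replace the first '\n' with '\0'; a no-op when there is none. */
	buf[strcspn(buf, "\n")] = '\0';
	return (int) ret;
}
```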

* Recent changes (master)
@ 2018-03-24 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-24 12:00 UTC
  To: fio

The following changes since commit 87c6f22bf24da4679849ccf778451b8432c2b368:

  server: use scalloc() for sk_out allocation (2018-03-22 11:29:25 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b348b7c7a1a6278d793698fc23993620ae9c588a:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-23 09:58:51 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      smalloc: Remove Valgrind instrumentation
      Merge branch 'master' of https://github.com/bvanassche/fio

 smalloc.c | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

---

Diff of recent changes:

diff --git a/smalloc.c b/smalloc.c
index 7b1690a..a2ad25a 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -5,13 +5,6 @@
 #include <sys/mman.h>
 #include <assert.h>
 #include <string.h>
-#ifdef CONFIG_VALGRIND_DEV
-#include <valgrind/valgrind.h>
-#else
-#define RUNNING_ON_VALGRIND 0
-#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) do { } while (0)
-#define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while (0)
-#endif
 
 #include "fio.h"
 #include "fio_sem.h"
@@ -48,12 +41,6 @@ struct pool {
 	size_t mmap_size;
 };
 
-#ifdef SMALLOC_REDZONE
-#define REDZONE_SIZE sizeof(unsigned int)
-#else
-#define REDZONE_SIZE 0
-#endif
-
 struct block_hdr {
 	size_t size;
 #ifdef SMALLOC_REDZONE
@@ -263,10 +250,6 @@ static void fill_redzone(struct block_hdr *hdr)
 {
 	unsigned int *postred = postred_ptr(hdr);
 
-	/* Let Valgrind fill the red zones. */
-	if (RUNNING_ON_VALGRIND)
-		return;
-
 	hdr->prered = SMALLOC_PRE_RED;
 	*postred = SMALLOC_POST_RED;
 }
@@ -275,10 +258,6 @@ static void sfree_check_redzone(struct block_hdr *hdr)
 {
 	unsigned int *postred = postred_ptr(hdr);
 
-	/* Let Valgrind check the red zones. */
-	if (RUNNING_ON_VALGRIND)
-		return;
-
 	if (hdr->prered != SMALLOC_PRE_RED) {
 		log_err("smalloc pre redzone destroyed!\n"
 			" ptr=%p, prered=%x, expected %x\n",
@@ -346,7 +325,6 @@ void sfree(void *ptr)
 	}
 
 	if (pool) {
-		VALGRIND_FREELIKE_BLOCK(ptr, REDZONE_SIZE);
 		sfree_pool(pool, ptr);
 		return;
 	}
@@ -437,7 +415,7 @@ static void *smalloc_pool(struct pool *pool, size_t size)
 	return ptr;
 }
 
-static void *__smalloc(size_t size, bool is_zeroed)
+void *smalloc(size_t size)
 {
 	unsigned int i, end_pool;
 
@@ -453,9 +431,6 @@ static void *__smalloc(size_t size, bool is_zeroed)
 
 			if (ptr) {
 				last_pool = i;
-				VALGRIND_MALLOCLIKE_BLOCK(ptr, size,
-							  REDZONE_SIZE,
-							  is_zeroed);
 				return ptr;
 			}
 		}
@@ -473,14 +448,9 @@ static void *__smalloc(size_t size, bool is_zeroed)
 	return NULL;
 }
 
-void *smalloc(size_t size)
-{
-	return __smalloc(size, false);
-}
-
 void *scalloc(size_t nmemb, size_t size)
 {
-	return __smalloc(nmemb * size, true);
+	return smalloc(nmemb * size);
 }
 
 char *smalloc_strdup(const char *str)
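
After this change, scalloc() passes nmemb * size straight to smalloc(), so an element count large enough to make the product wrap around is not detected. Below is a minimal sketch of the usual guard for calloc-style wrappers. checked_calloc() is a hypothetical name, and malloc() plus memset() stand in for fio's shared-memory pool allocator:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative calloc-style wrapper: refuse requests whose element count
 * times element size would overflow size_t, then return zeroed memory.
 * malloc()/memset() are stand-ins for a pool allocator such as smalloc().
 */
static void *checked_calloc(size_t nmemb, size_t size)
{
	void *ptr;

	if (size && nmemb > SIZE_MAX / size)
		return NULL;	/* nmemb * size would wrap around */

	ptr = malloc(nmemb * size);
	if (ptr)
		memset(ptr, 0, nmemb * size);
	return ptr;
}
```

The division test rejects the request before the wrapped product is ever computed.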

* Recent changes (master)
@ 2018-03-23 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-23 12:00 UTC
  To: fio

The following changes since commit 7ad2ddffe2bdc5e47fb86cab276db1db9e350f1b:

  sg: fix sign extension (2018-03-21 20:09:36 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 87c6f22bf24da4679849ccf778451b8432c2b368:

  server: use scalloc() for sk_out allocation (2018-03-22 11:29:25 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      smalloc: oom cleanups
      Merge branch 'gcc' of https://github.com/sitsofe/fio
      server: use scalloc() for sk_out allocation

Sitsofe Wheeler (1):
      compiler: set minimum compiler version to GCC 4.1.0

 backend.c                |  9 ++++++---
 cgroup.c                 |  3 +++
 compiler/compiler-gcc3.h | 10 ----------
 compiler/compiler.h      |  6 ++----
 filesetup.c              |  9 +++++----
 gettime-thread.c         |  2 --
 server.c                 |  2 +-
 7 files changed, 17 insertions(+), 24 deletions(-)
 delete mode 100644 compiler/compiler-gcc3.h

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index a92b1e3..fc83ed1 100644
--- a/backend.c
+++ b/backend.c
@@ -2477,7 +2477,8 @@ int fio_backend(struct sk_out *sk_out)
 	helper_thread_create(startup_sem, sk_out);
 
 	cgroup_list = smalloc(sizeof(*cgroup_list));
-	INIT_FLIST_HEAD(cgroup_list);
+	if (cgroup_list)
+		INIT_FLIST_HEAD(cgroup_list);
 
 	run_threads(sk_out);
 
@@ -2507,8 +2508,10 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	free_disk_util();
-	cgroup_kill(cgroup_list);
-	sfree(cgroup_list);
+	if (cgroup_list) {
+		cgroup_kill(cgroup_list);
+		sfree(cgroup_list);
+	}
 	sfree(cgroup_mnt);
 
 	fio_sem_remove(startup_sem);
diff --git a/cgroup.c b/cgroup.c
index 380e37e..629047b 100644
--- a/cgroup.c
+++ b/cgroup.c
@@ -147,6 +147,9 @@ int cgroup_setup(struct thread_data *td, struct flist_head *clist, char **mnt)
 {
 	char *root;
 
+	if (!clist)
+		return 1;
+
 	if (!*mnt) {
 		*mnt = find_cgroup_mnt(td);
 		if (!*mnt)
diff --git a/compiler/compiler-gcc3.h b/compiler/compiler-gcc3.h
deleted file mode 100644
index 566987a..0000000
--- a/compiler/compiler-gcc3.h
+++ /dev/null
@@ -1,10 +0,0 @@
-#ifndef FIO_COMPILER_GCC3_H
-#define FIO_COMPILER_GCC3_H
-
-#if __GNUC_MINOR__ >= 4
-#ifndef __must_check
-#define __must_check		__attribute__((warn_unused_result))
-#endif
-#endif
-
-#endif
diff --git a/compiler/compiler.h b/compiler/compiler.h
index 4d92ac4..dacb737 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -2,12 +2,10 @@
 #define FIO_COMPILER_H
 
 /* IWYU pragma: begin_exports */
-#if __GNUC__ >= 4
+#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1)
 #include "compiler-gcc4.h"
-#elif __GNUC__ == 3
-#include "compiler-gcc3.h"
 #else
-#error Compiler too old, need gcc at least gcc 3.x
+#error Compiler too old, need at least gcc 4.1.0
 #endif
 /* IWYU pragma: end_exports */
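
The new guard in compiler.h accepts any GCC major version above 4, or a 4.x release with a minor version of at least 1. The old `__GNUC__ >= 4` test also admitted 4.0. The same condition is written below as a hypothetical runtime helper so it can be checked against a few version pairs:

```c
#include <stdbool.h>

/* Mirrors: __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1) */
static bool gcc_version_ok(int major, int minor)
{
	return major > 4 || (major == 4 && minor >= 1);
}
```

With a GCC-compatible compiler, the helper could be called as gcc_version_ok(__GNUC__, __GNUC_MINOR__). The patch level is not consulted, so every 4.1.x release passes.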
 
diff --git a/filesetup.c b/filesetup.c
index c115f7b..b246e0f 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1608,8 +1608,9 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 		f->file_name = strdup(file_name);
 	else
 		f->file_name = smalloc_strdup(file_name);
-	if (!f->file_name)
-		assert(0);
+
+	/* can't handle smalloc failure from here */
+	assert(f->file_name);
 
 	get_file_type(f);
 
@@ -1814,9 +1815,9 @@ void dup_files(struct thread_data *td, struct thread_data *org)
 				__f->file_name = strdup(f->file_name);
 			else
 				__f->file_name = smalloc_strdup(f->file_name);
-			if (!__f->file_name)
-				assert(0);
 
+			/* can't handle smalloc failure from here */
+			assert(__f->file_name);
 			__f->filetype = f->filetype;
 		}
 
diff --git a/gettime-thread.c b/gettime-thread.c
index eb535a0..0a2cc6c 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -15,8 +15,6 @@ void fio_gtod_init(void)
 		return;
 
 	fio_ts = smalloc(sizeof(*fio_ts));
-	if (!fio_ts)
-		log_err("fio: smalloc pool exhausted\n");
 }
 
 static void fio_gtod_update(void)
diff --git a/server.c b/server.c
index 15dc2c4..d3f6977 100644
--- a/server.c
+++ b/server.c
@@ -1359,7 +1359,7 @@ static int accept_loop(int listen_sk)
 
 		dprint(FD_NET, "server: connect from %s\n", from);
 
-		sk_out = smalloc(sizeof(*sk_out));
+		sk_out = scalloc(1, sizeof(*sk_out));
 		if (!sk_out) {
 			close(sk);
 			return -1;

* Recent changes (master)
@ 2018-03-22 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-22 12:00 UTC
  To: fio

The following changes since commit 69c594d81d4067fadd70fa4909e19d615efa5f1c:

  optgroup: move debug code into function (2018-03-20 11:19:19 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7ad2ddffe2bdc5e47fb86cab276db1db9e350f1b:

  sg: fix sign extension (2018-03-21 20:09:36 -0600)

----------------------------------------------------------------
Bart Van Assche (3):
      Add an asprintf() implementation
      log: Modify the implementation such that it uses asprintf()
      verify: Simplify dump_buf()

Jens Axboe (7):
      debug: remove extra parens
      Merge branch 'asprintf' of https://github.com/bvanassche/fio
      server: process STOP/QUIT commands out-of-line
      server: handle shared mem pool allocation failures
      Merge branch 'include_refactor' of https://github.com/sitsofe/fio
      server: fix error handling for shared memory handling
      sg: fix sign extension

Sitsofe Wheeler (4):
      Refactor #includes and headers
      Use POSIX path for poll.h and fcntl.h headers
      oslib: make str* compat functions more uniform
      Add include-what-you-use pragmas

 Makefile                            |  1 +
 arch/arch-x86.h                     |  2 +-
 arch/arch-x86_64.h                  |  2 +-
 arch/arch.h                         |  2 +
 backend.c                           | 12 -----
 blktrace.c                          |  2 -
 cconv.c                             |  1 +
 cgroup.c                            |  1 -
 client.c                            |  6 +--
 client.h                            |  6 +--
 compiler/compiler.h                 |  3 +-
 configure                           | 40 +++++++++++++++
 crc/crc32.c                         |  1 -
 crc/crc32.h                         |  2 +
 crc/crc32c-intel.c                  |  7 ---
 crc/crc32c.c                        |  2 -
 crc/crc32c.h                        |  2 +
 crc/md5.c                           |  1 -
 crc/sha1.h                          |  2 +
 crc/sha256.c                        |  1 -
 crc/sha256.h                        |  2 +
 crc/sha3.c                          |  1 -
 crc/sha512.c                        |  1 -
 crc/sha512.h                        |  2 +
 crc/test.c                          |  5 +-
 debug.c                             |  5 +-
 debug.h                             |  4 +-
 diskutil.c                          |  3 --
 diskutil.h                          |  1 -
 engines/binject.c                   |  2 +-
 engines/e4defrag.c                  |  4 --
 engines/falloc.c                    |  4 --
 engines/filecreate.c                |  2 -
 engines/ftruncate.c                 |  8 +--
 engines/libaio.c                    |  2 -
 engines/mmap.c                      |  1 -
 engines/mtd.c                       |  3 --
 engines/net.c                       |  4 +-
 engines/null.c                      |  3 --
 engines/rdma.c                      |  2 +-
 engines/sg.c                        |  9 ++--
 engines/splice.c                    |  3 +-
 engines/sync.c                      |  1 -
 eta.c                               |  1 -
 fifo.c                              |  1 +
 fifo.h                              |  1 -
 filesetup.c                         |  2 -
 fio.c                               |  5 --
 fio.h                               |  1 +
 fio_sem.c                           |  2 +-
 fio_time.h                          |  3 ++
 gettime-thread.c                    |  2 -
 gettime.c                           |  7 ---
 gettime.h                           |  2 +
 helpers.c                           |  9 +---
 helpers.h                           |  5 +-
 idletime.c                          |  1 +
 idletime.h                          |  5 +-
 init.c                              |  2 -
 io_u.c                              |  4 --
 io_u.h                              |  1 -
 io_u_queue.h                        |  2 +
 ioengines.c                         |  1 -
 ioengines.h                         |  5 +-
 iolog.c                             |  2 -
 iolog.h                             |  2 +
 json.c                              |  1 -
 json.h                              |  4 --
 lib/bloom.c                         |  2 -
 lib/gauss.c                         |  1 -
 lib/ieee754.c                       |  1 -
 lib/lfsr.c                          |  1 -
 lib/memalign.c                      |  3 +-
 lib/memalign.h                      |  2 +
 lib/memcpy.c                        |  3 +-
 lib/num2str.c                       |  1 +
 lib/output_buffer.c                 |  1 -
 lib/output_buffer.h                 |  2 +-
 lib/pattern.c                       |  2 -
 lib/pattern.h                       |  2 -
 lib/prio_tree.c                     |  1 +
 lib/prio_tree.h                     |  1 -
 lib/rand.c                          |  1 -
 lib/rand.h                          |  1 -
 lib/strntol.h                       |  2 +
 lib/types.h                         |  2 +-
 lib/zipf.c                          |  6 ---
 lib/zipf.h                          |  1 +
 libfio.c                            |  1 -
 log.c                               | 97 ++++++++++---------------------------
 memory.c                            |  3 +-
 options.c                           |  4 --
 options.h                           |  1 -
 os/os-hpux.h                        |  2 +-
 os/os-solaris.h                     |  2 +-
 os/os.h                             |  4 +-
 os/windows/posix.c                  |  2 +-
 os/windows/posix/include/poll.h     | 11 +++++
 os/windows/posix/include/sys/poll.h | 15 ------
 oslib/asprintf.c                    | 43 ++++++++++++++++
 oslib/asprintf.h                    | 11 +++++
 oslib/strcasestr.c                  |  5 +-
 oslib/strcasestr.h                  |  7 +--
 oslib/strlcat.c                     |  4 ++
 oslib/strlcat.h                     |  6 +++
 oslib/strndup.c                     |  5 +-
 oslib/strndup.h                     |  9 +++-
 oslib/strsep.c                      |  7 ++-
 oslib/strsep.h                      |  4 ++
 parse.c                             |  4 +-
 server.c                            | 55 +++++++++++++++------
 server.h                            |  3 --
 smalloc.c                           |  8 ---
 smalloc.h                           |  2 +
 stat.c                              |  3 --
 steadystate.c                       |  1 -
 steadystate.h                       |  2 -
 t/axmap.c                           |  3 --
 t/btrace2fio.c                      |  2 +-
 t/dedupe.c                          | 10 ++--
 t/gen-rand.c                        | 12 ++---
 t/genzipf.c                         |  1 -
 t/lfsr-test.c                       |  4 --
 td_error.h                          |  2 +
 trim.c                              |  3 --
 trim.h                              |  8 ++-
 verify.c                            | 29 +++++------
 verify.h                            |  1 +
 workqueue.h                         |  7 +++
 129 files changed, 324 insertions(+), 354 deletions(-)
 delete mode 100644 os/windows/posix/include/sys/poll.h
 create mode 100644 oslib/asprintf.c
 create mode 100644 oslib/asprintf.h

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index eb3bddd..d45ba6b 100644
--- a/Makefile
+++ b/Makefile
@@ -104,6 +104,7 @@ endif
 ifdef CONFIG_RBD
   SOURCE += engines/rbd.c
 endif
+SOURCE += oslib/asprintf.c
 ifndef CONFIG_STRSEP
   SOURCE += oslib/strsep.c
 endif
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index 457b44c..c6bcb54 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -10,7 +10,7 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 		: "memory");
 }
 
-#include "arch-x86-common.h"
+#include "arch-x86-common.h" /* IWYU pragma: export */
 
 #define FIO_ARCH	(arch_x86)
 
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index e686d10..484ea0c 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -10,7 +10,7 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 		: "memory");
 }
 
-#include "arch-x86-common.h"
+#include "arch-x86-common.h" /* IWYU pragma: export */
 
 #define FIO_ARCH	(arch_x86_64)
 
diff --git a/arch/arch.h b/arch/arch.h
index 4fb9b51..0ec3f10 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -34,6 +34,7 @@ extern unsigned long arch_flags;
 
 #define ARCH_CPU_CLOCK_WRAPS
 
+/* IWYU pragma: begin_exports */
 #if defined(__i386__)
 #include "arch-x86.h"
 #elif defined(__x86_64__)
@@ -66,6 +67,7 @@ extern unsigned long arch_flags;
 #endif
 
 #include "../lib/ffz.h"
+/* IWYU pragma: end_exports */
 
 #ifndef ARCH_HAVE_INIT
 static inline int arch_init(char *envp[])
diff --git a/backend.c b/backend.c
index d82d494..a92b1e3 100644
--- a/backend.c
+++ b/backend.c
@@ -22,29 +22,17 @@
  *
  */
 #include <unistd.h>
-#include <fcntl.h>
 #include <string.h>
-#include <limits.h>
 #include <signal.h>
-#include <time.h>
-#include <locale.h>
 #include <assert.h>
-#include <time.h>
 #include <inttypes.h>
 #include <sys/stat.h>
 #include <sys/wait.h>
-#include <sys/ipc.h>
-#include <sys/mman.h>
 #include <math.h>
 
 #include "fio.h"
-#ifndef FIO_NO_HAVE_SHM_H
-#include <sys/shm.h>
-#endif
-#include "hash.h"
 #include "smalloc.h"
 #include "verify.h"
-#include "trim.h"
 #include "diskutil.h"
 #include "cgroup.h"
 #include "profile.h"
diff --git a/blktrace.c b/blktrace.c
index 4b791d7..6e4d0a4 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -3,10 +3,8 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <sys/stat.h>
 #include <sys/ioctl.h>
 #include <linux/fs.h>
-#include <dirent.h>
 
 #include "flist.h"
 #include "fio.h"
diff --git a/cconv.c b/cconv.c
index 92996b1..dbe0071 100644
--- a/cconv.c
+++ b/cconv.c
@@ -1,5 +1,6 @@
 #include <string.h>
 
+#include "log.h"
 #include "thread_options.h"
 
 static void string_to_cpu(char **dst, const uint8_t *src)
diff --git a/cgroup.c b/cgroup.c
index 4fab977..380e37e 100644
--- a/cgroup.c
+++ b/cgroup.c
@@ -5,7 +5,6 @@
 #include <stdlib.h>
 #include <mntent.h>
 #include <sys/stat.h>
-#include <sys/types.h>
 #include "fio.h"
 #include "flist.h"
 #include "cgroup.h"
diff --git a/client.c b/client.c
index bff0adc..970974a 100644
--- a/client.c
+++ b/client.c
@@ -1,13 +1,11 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
-#include <limits.h>
 #include <errno.h>
 #include <fcntl.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/types.h>
 #include <sys/stat.h>
-#include <sys/wait.h>
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <netinet/in.h>
@@ -23,7 +21,7 @@
 #include "server.h"
 #include "flist.h"
 #include "hash.h"
-#include "verify.h"
+#include "verify-state.h"
 
 static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd);
 static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd);
diff --git a/client.h b/client.h
index 90082a3..29e84d0 100644
--- a/client.h
+++ b/client.h
@@ -1,7 +1,6 @@
 #ifndef CLIENT_H
 #define CLIENT_H
 
-#include <sys/socket.h>
 #include <sys/un.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
@@ -10,7 +9,6 @@
 #include "stat.h"
 
 struct fio_net_cmd;
-struct client_ops;
 
 enum {
 	Client_created		= 0,
@@ -83,6 +81,8 @@ typedef void (client_eta_op)(struct jobs_eta *je);
 typedef void (client_timed_out_op)(struct fio_client *);
 typedef void (client_jobs_eta_op)(struct fio_client *client, struct jobs_eta *je);
 
+extern struct client_ops fio_client_ops;
+
 struct client_ops {
 	client_cmd_op		*text;
 	client_cmd_op		*disk_util;
@@ -105,8 +105,6 @@ struct client_ops {
 	uint32_t client_type;
 };
 
-extern struct client_ops fio_client_ops;
-
 struct client_eta {
 	unsigned int pending;
 	struct jobs_eta eta;
diff --git a/compiler/compiler.h b/compiler/compiler.h
index 91a9883..4d92ac4 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -1,7 +1,7 @@
 #ifndef FIO_COMPILER_H
 #define FIO_COMPILER_H
-#include <assert.h>
 
+/* IWYU pragma: begin_exports */
 #if __GNUC__ >= 4
 #include "compiler-gcc4.h"
 #elif __GNUC__ == 3
@@ -9,6 +9,7 @@
 #else
 #error Compiler too old, need gcc at least gcc 3.x
 #endif
+/* IWYU pragma: end_exports */
 
 #ifndef __must_check
 #define __must_check
diff --git a/configure b/configure
index ddf03a6..f635863 100755
--- a/configure
+++ b/configure
@@ -784,6 +784,40 @@ fi
 print_config "rdmacm" "$rdmacm"
 
 ##########################################
+# asprintf() and vasprintf() probes
+if test "$have_asprintf" != "yes" ; then
+  have_asprintf="no"
+fi
+cat > $TMPC << EOF
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+  return asprintf(NULL, "%s", "str") == 0;
+}
+EOF
+if compile_prog "" "" "have_asprintf"; then
+    have_asprintf="yes"
+fi
+print_config "asprintf()" "$have_asprintf"
+
+if test "$have_vasprintf" != "yes" ; then
+  have_vasprintf="no"
+fi
+cat > $TMPC << EOF
+#include <stdio.h>
+
+int main(int argc, char **argv)
+{
+  return vasprintf(NULL, "%s", NULL) == 0;
+}
+EOF
+if compile_prog "" "" "have_vasprintf"; then
+    have_vasprintf="yes"
+fi
+print_config "vasprintf()" "$have_vasprintf"
+
+##########################################
 # Linux fallocate probe
 if test "$linux_fallocate" != "yes" ; then
   linux_fallocate="no"
@@ -2169,6 +2203,12 @@ fi
 if test "$posix_pshared" = "yes" ; then
   output_sym "CONFIG_PSHARED"
 fi
+if test "$have_asprintf" = "yes" ; then
+    output_sym "HAVE_ASPRINTF"
+fi
+if test "$have_vasprintf" = "yes" ; then
+    output_sym "HAVE_VASPRINTF"
+fi
 if test "$linux_fallocate" = "yes" ; then
   output_sym "CONFIG_LINUX_FALLOCATE"
 fi
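
The probes above only decide whether HAVE_ASPRINTF and HAVE_VASPRINTF are emitted. The Makefile hunk always builds oslib/asprintf.c, presumably to supply the functions when the C library lacks them. That file's contents are not shown here, so what follows is only a sketch of a common vsnprintf()-based approach, using the hypothetical names fallback_vasprintf() and fallback_asprintf(). It measures the formatted length with a va_copy() of the argument list, allocates, then formats for real:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly malloc()ed string; returns its length or -1. */
static int fallback_vasprintf(char **strp, const char *fmt, va_list ap)
{
	va_list ap_copy;
	int len;

	/* First pass: measure. vsnprintf() consumes its va_list, so use a copy. */
	va_copy(ap_copy, ap);
	len = vsnprintf(NULL, 0, fmt, ap_copy);
	va_end(ap_copy);
	if (len < 0)
		return -1;

	*strp = malloc((size_t) len + 1);
	if (!*strp)
		return -1;

	/* Second pass: format into the buffer, including the terminating NUL. */
	len = vsnprintf(*strp, (size_t) len + 1, fmt, ap);
	if (len < 0) {
		free(*strp);
		*strp = NULL;
	}
	return len;
}

static int fallback_asprintf(char **strp, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = fallback_vasprintf(strp, fmt, ap);
	va_end(ap);
	return ret;
}
```
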
diff --git a/crc/crc32.c b/crc/crc32.c
index 4140a8d..e35f5d9 100644
--- a/crc/crc32.c
+++ b/crc/crc32.c
@@ -15,7 +15,6 @@
    along with this program; if not, write to the Free Software Foundation,
    Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
 
-#include <inttypes.h>
 #include "crc32.h"
 
 static const uint32_t crctab[256] = {
diff --git a/crc/crc32.h b/crc/crc32.h
index a37d7ad..6378e81 100644
--- a/crc/crc32.h
+++ b/crc/crc32.h
@@ -18,6 +18,8 @@
 #ifndef CRC32_H
 #define CRC32_H
 
+#include <inttypes.h>
+
 extern uint32_t fio_crc32(const void * const, unsigned long);
 
 #endif
diff --git a/crc/crc32c-intel.c b/crc/crc32c-intel.c
index 05a087d..9a2cefd 100644
--- a/crc/crc32c-intel.c
+++ b/crc/crc32c-intel.c
@@ -1,10 +1,3 @@
-#include <inttypes.h>
-#include <string.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <signal.h>
-#include <sys/types.h>
-#include <sys/wait.h>
 #include "crc32c.h"
 
 /*
diff --git a/crc/crc32c.c b/crc/crc32c.c
index f6fc688..34944ae 100644
--- a/crc/crc32c.c
+++ b/crc/crc32c.c
@@ -30,8 +30,6 @@
  * any later version.
  *
  */
-#include <inttypes.h>
-
 #include "crc32c.h"
 
 /*
diff --git a/crc/crc32c.h b/crc/crc32c.h
index be03c1a..60f6014 100644
--- a/crc/crc32c.h
+++ b/crc/crc32c.h
@@ -18,6 +18,8 @@
 #ifndef CRC32C_H
 #define CRC32C_H
 
+#include <inttypes.h>
+
 #include "../arch/arch.h"
 #include "../lib/types.h"
 
diff --git a/crc/md5.c b/crc/md5.c
index 64fe48a..ade4f69 100644
--- a/crc/md5.c
+++ b/crc/md5.c
@@ -2,7 +2,6 @@
  * Shamelessly lifted from the 2.6 kernel (crypto/md5.c)
  */
 #include <string.h>
-#include <stdint.h>
 #include "md5.h"
 
 static void md5_transform(uint32_t *hash, uint32_t const *in)
diff --git a/crc/sha1.h b/crc/sha1.h
index 75317f7..416199b 100644
--- a/crc/sha1.h
+++ b/crc/sha1.h
@@ -1,6 +1,8 @@
 #ifndef FIO_SHA1
 #define FIO_SHA1
 
+#include <inttypes.h>
+
 /*
  * Based on the Mozilla SHA1 (see mozilla-sha1/sha1.h),
  * optimized to do word accesses rather than byte accesses,
diff --git a/crc/sha256.c b/crc/sha256.c
index 2fd17a3..2b39c42 100644
--- a/crc/sha256.c
+++ b/crc/sha256.c
@@ -17,7 +17,6 @@
  *
  */
 #include <string.h>
-#include <inttypes.h>
 
 #include "../lib/bswap.h"
 #include "sha256.h"
diff --git a/crc/sha256.h b/crc/sha256.h
index b636033..b904c7d 100644
--- a/crc/sha256.h
+++ b/crc/sha256.h
@@ -1,6 +1,8 @@
 #ifndef FIO_SHA256_H
 #define FIO_SHA256_H
 
+#include <inttypes.h>
+
 #define SHA256_DIGEST_SIZE	32
 #define SHA256_BLOCK_SIZE	64
 
diff --git a/crc/sha3.c b/crc/sha3.c
index 2685dce..c136550 100644
--- a/crc/sha3.c
+++ b/crc/sha3.c
@@ -13,7 +13,6 @@
  *
  */
 #include <string.h>
-#include <inttypes.h>
 
 #include "../os/os.h"
 
diff --git a/crc/sha512.c b/crc/sha512.c
index e069a44..f599cdc 100644
--- a/crc/sha512.c
+++ b/crc/sha512.c
@@ -12,7 +12,6 @@
  */
 
 #include <string.h>
-#include <inttypes.h>
 
 #include "../lib/bswap.h"
 #include "sha512.h"
diff --git a/crc/sha512.h b/crc/sha512.h
index f8b2112..5adf627 100644
--- a/crc/sha512.h
+++ b/crc/sha512.h
@@ -1,6 +1,8 @@
 #ifndef FIO_SHA512_H
 #define FIO_SHA512_H
 
+#include <inttypes.h>
+
 struct fio_sha512_ctx {
 	uint64_t state[8];
 	uint32_t count[4];
diff --git a/crc/test.c b/crc/test.c
index b119872..b57f07a 100644
--- a/crc/test.c
+++ b/crc/test.c
@@ -1,11 +1,12 @@
+#include <inttypes.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 
-#include "../fio.h"
 #include "../gettime.h"
 #include "../fio_time.h"
-#include "../verify.h"
+#include "../lib/rand.h"
+#include "../os/os.h"
 
 #include "../crc/md5.h"
 #include "../crc/crc64.h"
diff --git a/debug.c b/debug.c
index 2bee507..d1e2987 100644
--- a/debug.c
+++ b/debug.c
@@ -1,7 +1,8 @@
+#include <assert.h>
 #include <stdarg.h>
-#include <sys/types.h>
-#include <unistd.h>
+
 #include "debug.h"
+#include "log.h"
 
 #ifdef FIO_INC_DEBUG
 void __dprint(int type, const char *str, ...)
diff --git a/debug.h b/debug.h
index b8718dd..8a8cf87 100644
--- a/debug.h
+++ b/debug.h
@@ -1,9 +1,7 @@
 #ifndef FIO_DEBUG_H
 #define FIO_DEBUG_H
 
-#include <assert.h>
 #include "lib/types.h"
-#include "log.h"
 
 enum {
 	FD_PROCESS	= 0,
@@ -61,7 +59,7 @@ void __dprint(int type, const char *str, ...) __attribute__((format (printf, 2,
 
 #define dprint(type, str, args...)			\
 	do {						\
-		if ((((1 << type)) & fio_debug) == 0)	\
+		if (((1 << type) & fio_debug) == 0)	\
 			break;				\
 		__dprint((type), (str), ##args);	\
 	} while (0)					\
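
dprint() calls __dprint() only when the bit for the given debug type is set in the fio_debug mask. The change above just drops a redundant pair of parentheses around that test. Below is a self-contained sketch of the same gating pattern. It assumes a hypothetical debug_mask variable and DBG() macro, and uses C99 `__VA_ARGS__` in place of the GNU `args...` syntax:

```c
#include <stdio.h>

enum { DBG_PROCESS = 0, DBG_FILE = 1, DBG_IO = 2 };

static unsigned long debug_mask;	/* one bit per debug type */

/* True when debugging is enabled for this type. */
static int debug_enabled(int type)
{
	return ((1UL << type) & debug_mask) != 0;
}

/* Print only when the type's bit is set; do/while (0) makes it one statement. */
#define DBG(type, ...)					\
	do {						\
		if (!debug_enabled(type))		\
			break;				\
		fprintf(stderr, __VA_ARGS__);		\
	} while (0)
```
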
diff --git a/diskutil.c b/diskutil.c
index dd8fc6a..b973120 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -1,13 +1,10 @@
 #include <stdio.h>
 #include <string.h>
-#include <sys/time.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/sysmacros.h>
 #include <dirent.h>
 #include <libgen.h>
-#include <math.h>
-#include <assert.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
 #else
diff --git a/diskutil.h b/diskutil.h
index c103578..15ec681 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -3,7 +3,6 @@
 #include "json.h"
 #define FIO_DU_NAME_SZ		64
 
-#include "lib/output_buffer.h"
 #include "helper_thread.h"
 #include "fio_sem.h"
 
diff --git a/engines/binject.c b/engines/binject.c
index 792dbbd..49042a3 100644
--- a/engines/binject.c
+++ b/engines/binject.c
@@ -11,7 +11,7 @@
 #include <errno.h>
 #include <assert.h>
 #include <string.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index 4b44488..3619450 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -9,11 +9,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <sys/uio.h>
 #include <errno.h>
-#include <assert.h>
 #include <fcntl.h>
 
 #include "../fio.h"
diff --git a/engines/falloc.c b/engines/falloc.c
index 2b00d52..bb3ac85 100644
--- a/engines/falloc.c
+++ b/engines/falloc.c
@@ -9,11 +9,7 @@
  *
  */
 #include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <sys/uio.h>
 #include <errno.h>
-#include <assert.h>
 #include <fcntl.h>
 
 #include "../fio.h"
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 0c3bcdd..6fa041c 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -5,12 +5,10 @@
  * of the file creation.
  */
 #include <stdio.h>
-#include <unistd.h>
 #include <fcntl.h>
 #include <errno.h>
 
 #include "../fio.h"
-#include "../filehash.h"
 
 struct fc_data {
 	enum fio_ddir stat_ddir;
diff --git a/engines/ftruncate.c b/engines/ftruncate.c
index e86dbac..14e115f 100644
--- a/engines/ftruncate.c
+++ b/engines/ftruncate.c
@@ -6,16 +6,10 @@
  * DDIR_WRITE does ftruncate
  *
  */
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <sys/uio.h>
 #include <errno.h>
-#include <assert.h>
-#include <fcntl.h>
+#include <unistd.h>
 
 #include "../fio.h"
-#include "../filehash.h"
 
 static int fio_ftruncate_queue(struct thread_data *td, struct io_u *io_u)
 {
diff --git a/engines/libaio.c b/engines/libaio.c
index e0d7cbb..7d59df3 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -4,11 +4,9 @@
  * IO engine using the Linux native aio interface.
  *
  */
-#include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
-#include <assert.h>
 #include <libaio.h>
 
 #include "../fio.h"
diff --git a/engines/mmap.c b/engines/mmap.c
index ea7179d..9dbefc8 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -7,7 +7,6 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>
 
diff --git a/engines/mtd.c b/engines/mtd.c
index b4a6600..5f822fc 100644
--- a/engines/mtd.c
+++ b/engines/mtd.c
@@ -4,17 +4,14 @@
  * IO engine that reads/writes from MTD character devices.
  *
  */
-#include <assert.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
 #include <errno.h>
 #include <sys/ioctl.h>
 #include <mtd/mtd-user.h>
 
 #include "../fio.h"
 #include "../optgroup.h"
-#include "../verify.h"
 #include "../oslib/libmtd.h"
 
 static libmtd_t desc;
diff --git a/engines/net.c b/engines/net.c
index 37d44fd..4540e0e 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -9,13 +9,11 @@
 #include <unistd.h>
 #include <signal.h>
 #include <errno.h>
-#include <assert.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>
 #include <netdb.h>
-#include <sys/poll.h>
-#include <sys/types.h>
+#include <poll.h>
 #include <sys/stat.h>
 #include <sys/socket.h>
 #include <sys/un.h>
diff --git a/engines/null.c b/engines/null.c
index 0cfc22a..8c26ad7 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -13,10 +13,7 @@
  * LD_LIBRARY_PATH=./engines ./fio examples/cpp_null.fio
  *
  */
-#include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
-#include <errno.h>
 #include <assert.h>
 
 #include "../fio.h"
diff --git a/engines/rdma.c b/engines/rdma.c
index 6b173a8..8def6eb 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -30,7 +30,7 @@
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <netdb.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/time.h>
diff --git a/engines/sg.c b/engines/sg.c
index f240755..72eed8b 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -8,8 +8,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
-#include <assert.h>
-#include <sys/poll.h>
+#include <poll.h>
 
 #include "../fio.h"
 #include "../optgroup.h"
@@ -456,8 +455,10 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 		return ret;
 	}
 
-	*bs	 = (buf[4] << 24) | (buf[5] << 16) | (buf[6] << 8) | buf[7];
-	*max_lba = ((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]) & MAX_10B_LBA;  // for some reason max_lba is being sign extended even though unsigned.
+	*bs	 = ((unsigned long) buf[4] << 24) | ((unsigned long) buf[5] << 16) |
+		   ((unsigned long) buf[6] << 8) | (unsigned long) buf[7];
+	*max_lba = ((unsigned long) buf[0] << 24) | ((unsigned long) buf[1] << 16) |
+		   ((unsigned long) buf[2] << 8) | (unsigned long) buf[3];
 
 	/*
 	 * If max lba masked by MAX_10B_LBA equals MAX_10B_LBA,
diff --git a/engines/splice.c b/engines/splice.c
index d5d8ab0..08fc857 100644
--- a/engines/splice.c
+++ b/engines/splice.c
@@ -9,8 +9,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
-#include <assert.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/mman.h>
 
 #include "../fio.h"
diff --git a/engines/sync.c b/engines/sync.c
index 26b98b6..d5b4012 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -10,7 +10,6 @@
 #include <unistd.h>
 #include <sys/uio.h>
 #include <errno.h>
-#include <assert.h>
 
 #include "../fio.h"
 #include "../optgroup.h"
diff --git a/eta.c b/eta.c
index 3126f21..2d549ee 100644
--- a/eta.c
+++ b/eta.c
@@ -2,7 +2,6 @@
  * Status and ETA code
  */
 #include <unistd.h>
-#include <fcntl.h>
 #include <string.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
diff --git a/fifo.c b/fifo.c
index 98737e9..ac0d215 100644
--- a/fifo.c
+++ b/fifo.c
@@ -24,6 +24,7 @@
 #include <string.h>
 
 #include "fifo.h"
+#include "minmax.h"
 
 struct fifo *fifo_alloc(unsigned int size)
 {
diff --git a/fifo.h b/fifo.h
index 5e3d339..61cc5a8 100644
--- a/fifo.h
+++ b/fifo.h
@@ -20,7 +20,6 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
-#include "minmax.h"
 
 struct fifo {
 	unsigned char *buffer;	/* the buffer holding the data */
diff --git a/filesetup.c b/filesetup.c
index 7cbce13..c115f7b 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -5,8 +5,6 @@
 #include <dirent.h>
 #include <libgen.h>
 #include <sys/stat.h>
-#include <sys/mman.h>
-#include <sys/types.h>
 
 #include "fio.h"
 #include "smalloc.h"
diff --git a/fio.c b/fio.c
index 7b61ffc..f19db1b 100644
--- a/fio.c
+++ b/fio.c
@@ -21,12 +21,7 @@
  *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
-#include <unistd.h>
-#include <locale.h>
-#include <time.h>
-
 #include "fio.h"
-#include "smalloc.h"
 
 int main(int argc, char *argv[], char *envp[])
 {
diff --git a/fio.h b/fio.h
index 9551048..488fa9a 100644
--- a/fio.h
+++ b/fio.h
@@ -27,6 +27,7 @@
 #include "ioengines.h"
 #include "iolog.h"
 #include "helpers.h"
+#include "minmax.h"
 #include "options.h"
 #include "profile.h"
 #include "fio_time.h"
diff --git a/fio_sem.c b/fio_sem.c
index 20fcfcc..3b48061 100644
--- a/fio_sem.c
+++ b/fio_sem.c
@@ -1,3 +1,4 @@
+#include <stdio.h>
 #include <string.h>
 #include <sys/mman.h>
 #include <assert.h>
@@ -7,7 +8,6 @@
 #define RUNNING_ON_VALGRIND 0
 #endif
 
-#include "log.h"
 #include "fio_sem.h"
 #include "pshared.h"
 #include "os/os.h"
diff --git a/fio_time.h b/fio_time.h
index 8b4bb25..c00f8e7 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -1,8 +1,11 @@
 #ifndef FIO_TIME_H
 #define FIO_TIME_H
 
+#include <stdint.h>
+/* IWYU pragma: begin_exports */
 #include <time.h>
 #include <sys/time.h>
+/* IWYU pragma: end_exports */
 #include "lib/types.h"
 
 struct thread_data;
diff --git a/gettime-thread.c b/gettime-thread.c
index 87f5060..eb535a0 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -1,5 +1,3 @@
-#include <unistd.h>
-#include <math.h>
 #include <sys/time.h>
 #include <time.h>
 
diff --git a/gettime.c b/gettime.c
index 57c66f7..87fc29b 100644
--- a/gettime.c
+++ b/gettime.c
@@ -2,16 +2,9 @@
  * Clock functions
  */
 
-#include <unistd.h>
 #include <math.h>
-#include <sys/time.h>
-#include <time.h>
 
 #include "fio.h"
-#include "fio_sem.h"
-#include "smalloc.h"
-
-#include "hash.h"
 #include "os/os.h"
 
 #if defined(ARCH_HAVE_CPU_CLOCK)
diff --git a/gettime.h b/gettime.h
index 11e2a7b..1c4a25c 100644
--- a/gettime.h
+++ b/gettime.h
@@ -1,6 +1,8 @@
 #ifndef FIO_GETTIME_H
 #define FIO_GETTIME_H
 
+#include <sys/time.h>
+
 #include "arch/arch.h"
 
 /*
diff --git a/helpers.c b/helpers.c
index 4342b2d..a0ee370 100644
--- a/helpers.c
+++ b/helpers.c
@@ -1,13 +1,6 @@
-#include <stdlib.h>
 #include <errno.h>
-#include <sys/socket.h>
-#include <sys/time.h>
-#include <netinet/in.h>
-#include <unistd.h>
 
-#include "compiler/compiler.h"
-#include "arch/arch.h"
-#include "os/os.h"
+#include "helpers.h"
 
 #ifndef CONFIG_LINUX_FALLOCATE
 int fallocate(int fd, int mode, off_t offset, off_t len)
diff --git a/helpers.h b/helpers.h
index 5f1865b..a0b3285 100644
--- a/helpers.h
+++ b/helpers.h
@@ -1,10 +1,9 @@
 #ifndef FIO_HELPERS_H
 #define FIO_HELPERS_H
 
-#include "compiler/compiler.h"
-
 #include <sys/types.h>
-#include <time.h>
+
+#include "os/os.h"
 
 extern int fallocate(int fd, int mode, off_t offset, off_t len);
 extern int posix_fallocate(int fd, off_t offset, off_t len);
diff --git a/idletime.c b/idletime.c
index 90bc1d9..8762c85 100644
--- a/idletime.c
+++ b/idletime.c
@@ -1,4 +1,5 @@
 #include <math.h>
+#include "fio.h"
 #include "json.h"
 #include "idletime.h"
 
diff --git a/idletime.h b/idletime.h
index b8376c2..6c1161a 100644
--- a/idletime.h
+++ b/idletime.h
@@ -1,8 +1,9 @@
 #ifndef FIO_IDLETIME_H
 #define FIO_IDLETIME_H
 
-#include "fio.h"
-#include "lib/output_buffer.h"
+#include <sys/time.h>
+#include <sys/types.h>
+#include "os/os.h"
 
 #define CALIBRATE_RUNS  10
 #define CALIBRATE_SCALE 1000
diff --git a/init.c b/init.c
index e47e538..ab7e399 100644
--- a/init.c
+++ b/init.c
@@ -4,13 +4,11 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
-#include <fcntl.h>
 #include <ctype.h>
 #include <string.h>
 #include <errno.h>
 #include <sys/ipc.h>
 #include <sys/types.h>
-#include <sys/stat.h>
 #include <dlfcn.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/drd.h>
diff --git a/io_u.c b/io_u.c
index f3b5932..98a7dc5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1,12 +1,8 @@
 #include <unistd.h>
-#include <fcntl.h>
 #include <string.h>
-#include <signal.h>
-#include <time.h>
 #include <assert.h>
 
 #include "fio.h"
-#include "hash.h"
 #include "verify.h"
 #include "trim.h"
 #include "lib/rand.h"
diff --git a/io_u.h b/io_u.h
index da25efb..aaa7d97 100644
--- a/io_u.h
+++ b/io_u.h
@@ -3,7 +3,6 @@
 
 #include "compiler/compiler.h"
 #include "os/os.h"
-#include "log.h"
 #include "io_ddir.h"
 #include "debug.h"
 #include "file.h"
diff --git a/io_u_queue.h b/io_u_queue.h
index b5b8d2f..545e2c4 100644
--- a/io_u_queue.h
+++ b/io_u_queue.h
@@ -2,6 +2,8 @@
 #define FIO_IO_U_QUEUE
 
 #include <assert.h>
+#include <stddef.h>
+
 #include "lib/types.h"
 
 struct io_u;
diff --git a/ioengines.c b/ioengines.c
index 965581a..a8ec79d 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -9,7 +9,6 @@
  * generic io engine that could be used for other projects.
  *
  */
-#include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
diff --git a/ioengines.h b/ioengines.h
index 32b18ed..a0674ae 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -1,9 +1,10 @@
 #ifndef FIO_IOENGINE_H
 #define FIO_IOENGINE_H
 
+#include <stddef.h>
+
 #include "compiler/compiler.h"
-#include "os/os.h"
-#include "file.h"
+#include "flist.h"
 #include "io_u.h"
 
 #define FIO_IOOPS_VERSION	23
diff --git a/iolog.c b/iolog.c
index 460d7a2..2b5eaf0 100644
--- a/iolog.c
+++ b/iolog.c
@@ -4,7 +4,6 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <libgen.h>
 #include <assert.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -15,7 +14,6 @@
 
 #include "flist.h"
 #include "fio.h"
-#include "verify.h"
 #include "trim.h"
 #include "filelock.h"
 #include "smalloc.h"
diff --git a/iolog.h b/iolog.h
index 70981f9..f70eb61 100644
--- a/iolog.h
+++ b/iolog.h
@@ -1,6 +1,8 @@
 #ifndef FIO_IOLOG_H
 #define FIO_IOLOG_H
 
+#include <stdio.h>
+
 #include "lib/rbtree.h"
 #include "lib/ieee754.h"
 #include "flist.h"
diff --git a/json.c b/json.c
index e0227ec..75212c8 100644
--- a/json.c
+++ b/json.c
@@ -1,6 +1,5 @@
 #include <stdlib.h>
 #include <string.h>
-#include <stdio.h>
 #include <errno.h>
 #include <stdarg.h>
 #include "json.h"
diff --git a/json.h b/json.h
index d7017e0..bcc712c 100644
--- a/json.h
+++ b/json.h
@@ -3,10 +3,6 @@
 
 #include "lib/output_buffer.h"
 
-struct json_object;
-struct json_array;
-struct json_pair;
-
 #define JSON_TYPE_STRING 0
 #define JSON_TYPE_INTEGER 1
 #define JSON_TYPE_FLOAT 2
diff --git a/lib/bloom.c b/lib/bloom.c
index bb81dbb..f4f9b6b 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -1,9 +1,7 @@
 #include <stdlib.h>
-#include <inttypes.h>
 
 #include "bloom.h"
 #include "../hash.h"
-#include "../minmax.h"
 #include "../crc/xxhash.h"
 #include "../crc/murmur3.h"
 #include "../crc/crc32c.h"
diff --git a/lib/gauss.c b/lib/gauss.c
index f974490..1d24e18 100644
--- a/lib/gauss.c
+++ b/lib/gauss.c
@@ -1,6 +1,5 @@
 #include <math.h>
 #include <string.h>
-#include <stdio.h>
 #include "../hash.h"
 #include "gauss.h"
 
diff --git a/lib/ieee754.c b/lib/ieee754.c
index c7742a2..2154065 100644
--- a/lib/ieee754.c
+++ b/lib/ieee754.c
@@ -5,7 +5,6 @@
  *
  * Below code was granted to the public domain.
  */
-#include <inttypes.h>
 #include "ieee754.h"
 
 uint64_t pack754(long double f, unsigned bits, unsigned expbits)
diff --git a/lib/lfsr.c b/lib/lfsr.c
index 0c0072c..a4f1fb1 100644
--- a/lib/lfsr.c
+++ b/lib/lfsr.c
@@ -1,5 +1,4 @@
 #include <stdio.h>
-#include <math.h>
 
 #include "lfsr.h"
 #include "../compiler/compiler.h"
diff --git a/lib/memalign.c b/lib/memalign.c
index bfbd1e8..e774c19 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -1,6 +1,5 @@
-#include <stdlib.h>
 #include <assert.h>
-#include <inttypes.h>
+#include <stdlib.h>
 
 #include "memalign.h"
 
diff --git a/lib/memalign.h b/lib/memalign.h
index df412e2..c2eb170 100644
--- a/lib/memalign.h
+++ b/lib/memalign.h
@@ -1,6 +1,8 @@
 #ifndef FIO_MEMALIGN_H
 #define FIO_MEMALIGN_H
 
+#include <inttypes.h>
+
 extern void *fio_memalign(size_t alignment, size_t size);
 extern void fio_memfree(void *ptr, size_t size);
 
diff --git a/lib/memcpy.c b/lib/memcpy.c
index 00e65aa..cf8572e 100644
--- a/lib/memcpy.c
+++ b/lib/memcpy.c
@@ -1,3 +1,4 @@
+#include <inttypes.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -6,7 +7,7 @@
 #include "rand.h"
 #include "../fio_time.h"
 #include "../gettime.h"
-#include "../fio.h"
+#include "../os/os.h"
 
 #define BUF_SIZE	32 * 1024 * 1024ULL
 
diff --git a/lib/num2str.c b/lib/num2str.c
index 8d08841..387c5d7 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -1,3 +1,4 @@
+#include <assert.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>
diff --git a/lib/output_buffer.c b/lib/output_buffer.c
index f6c304b..beb8a14 100644
--- a/lib/output_buffer.c
+++ b/lib/output_buffer.c
@@ -1,4 +1,3 @@
-#include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
 
diff --git a/lib/output_buffer.h b/lib/output_buffer.h
index a235af2..389ed5b 100644
--- a/lib/output_buffer.h
+++ b/lib/output_buffer.h
@@ -1,7 +1,7 @@
 #ifndef FIO_OUTPUT_BUFFER_H
 #define FIO_OUTPUT_BUFFER_H
 
-#include <unistd.h>
+#include <stddef.h>
 
 struct buf_output {
 	char *buf;
diff --git a/lib/pattern.c b/lib/pattern.c
index 31ee4ea..2024f2e 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -4,8 +4,6 @@
 #include <limits.h>
 #include <errno.h>
 #include <assert.h>
-#include <sys/types.h>
-#include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
 
diff --git a/lib/pattern.h b/lib/pattern.h
index 9f937f0..2d655ad 100644
--- a/lib/pattern.h
+++ b/lib/pattern.h
@@ -1,8 +1,6 @@
 #ifndef FIO_PARSE_PATTERN_H
 #define FIO_PARSE_PATTERN_H
 
-struct pattern_fmt;
-
 /**
  * Pattern format description. The input for 'parse_pattern'.
  * Describes format with its name and callback, which should
diff --git a/lib/prio_tree.c b/lib/prio_tree.c
index de3fe1c..d8e1b89 100644
--- a/lib/prio_tree.c
+++ b/lib/prio_tree.c
@@ -11,6 +11,7 @@
  * 02Feb2004	Initial version
  */
 
+#include <assert.h>
 #include <stdlib.h>
 #include <limits.h>
 
diff --git a/lib/prio_tree.h b/lib/prio_tree.h
index e1491db..9bd458f 100644
--- a/lib/prio_tree.h
+++ b/lib/prio_tree.h
@@ -2,7 +2,6 @@
 #define _LINUX_PRIO_TREE_H
 
 #include <inttypes.h>
-#include "../hash.h"
 
 struct prio_tree_node {
 	struct prio_tree_node	*left;
diff --git a/lib/rand.c b/lib/rand.c
index 3f60a67..46ffe4f 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -34,7 +34,6 @@
 */
 
 #include <string.h>
-#include <assert.h>
 #include "rand.h"
 #include "pattern.h"
 #include "../hash.h"
diff --git a/lib/rand.h b/lib/rand.h
index bff4a35..8832c73 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -4,7 +4,6 @@
 #include <inttypes.h>
 #include <assert.h>
 #include "types.h"
-#include "../arch/arch.h"
 
 #define FRAND32_MAX	(-1U)
 #define FRAND64_MAX	(-1ULL)
diff --git a/lib/strntol.h b/lib/strntol.h
index 68f5d1b..59c090d 100644
--- a/lib/strntol.h
+++ b/lib/strntol.h
@@ -1,6 +1,8 @@
 #ifndef FIO_STRNTOL_H
 #define FIO_STRNTOL_H
 
+#include <stdint.h>
+
 long strntol(const char *str, size_t sz, char **end, int base);
 
 #endif
diff --git a/lib/types.h b/lib/types.h
index bb24506..236bf8a 100644
--- a/lib/types.h
+++ b/lib/types.h
@@ -10,7 +10,7 @@ typedef int bool;
 #define true	1
 #endif
 #else
-#include <stdbool.h>
+#include <stdbool.h> /* IWYU pragma: export */
 #endif
 
 #endif
diff --git a/lib/zipf.c b/lib/zipf.c
index 3d535c7..1ff8568 100644
--- a/lib/zipf.c
+++ b/lib/zipf.c
@@ -1,11 +1,5 @@
 #include <math.h>
 #include <string.h>
-#include <inttypes.h>
-#include <stdio.h>
-#include <unistd.h>
-#include <sys/types.h>
-#include <fcntl.h>
-#include "ieee754.h"
 #include "zipf.h"
 #include "../minmax.h"
 #include "../hash.h"
diff --git a/lib/zipf.h b/lib/zipf.h
index af2d0e6..a4aa163 100644
--- a/lib/zipf.h
+++ b/lib/zipf.h
@@ -3,6 +3,7 @@
 
 #include <inttypes.h>
 #include "rand.h"
+#include "types.h"
 
 struct zipf_state {
 	uint64_t nranges;
diff --git a/libfio.c b/libfio.c
index 80159b4..6faf32a 100644
--- a/libfio.c
+++ b/libfio.c
@@ -23,7 +23,6 @@
  */
 
 #include <string.h>
-#include <sys/types.h>
 #include <signal.h>
 #include <stdint.h>
 #include <locale.h>
diff --git a/log.c b/log.c
index a327f6a..46e5034 100644
--- a/log.c
+++ b/log.c
@@ -1,12 +1,10 @@
 #include <unistd.h>
-#include <fcntl.h>
 #include <string.h>
 #include <stdarg.h>
 #include <syslog.h>
 
 #include "fio.h"
-
-#define LOG_START_SZ		512
+#include "oslib/asprintf.h"
 
 size_t log_info_buf(const char *buf, size_t len)
 {
@@ -29,63 +27,14 @@ size_t log_info_buf(const char *buf, size_t len)
 		return fwrite(buf, len, 1, f_out);
 }
 
-static size_t valist_to_buf(char **buffer, const char *fmt, va_list src_args)
-{
-	size_t len, cur = LOG_START_SZ;
-	va_list args;
-
-	do {
-		*buffer = calloc(1, cur);
-		if (!*buffer)
-			return 0;
-
-		va_copy(args, src_args);
-		len = vsnprintf(*buffer, cur, fmt, args);
-		va_end(args);
-
-		if (len < cur)
-			break;
-
-		cur = len + 1;
-		free(*buffer);
-	} while (1);
-
-	return len;
-}
-
-/* allocate buffer, fill with prefix string followed by vararg string */
-static size_t prevalist_to_buf(char **buffer, const char *pre, int prelen,
-		const char *fmt, va_list src_args)
-{
-	size_t len, cur = LOG_START_SZ;
-	va_list args;
-
-	do {
-		*buffer = calloc(1, cur);
-		if (!*buffer)
-			return 0;
-
-		va_copy(args, src_args);
-		memcpy(*buffer, pre, prelen);
-		len = prelen + vsnprintf(*buffer + prelen, cur - prelen, fmt, args);
-		va_end(args);
-
-		if (len < cur)
-			break;
-
-		cur = len + 1;
-		free(*buffer);
-	} while (1);
-
-	return len;
-}
-
 size_t log_valist(const char *fmt, va_list args)
 {
 	char *buffer;
-	size_t len;
+	int len;
 
-	len = valist_to_buf(&buffer, fmt, args);
+	len = vasprintf(&buffer, fmt, args);
+	if (len < 0)
+		return 0;
 	len = log_info_buf(buffer, len);
 	free(buffer);
 
@@ -95,10 +44,8 @@ size_t log_valist(const char *fmt, va_list args)
 /* add prefix for the specified type in front of the valist */
 void log_prevalist(int type, const char *fmt, va_list args)
 {
-	char pre[32];
-	char *buffer;
-	size_t len;
-	int prelen;
+	char *buf1, *buf2;
+	int len;
 	pid_t pid;
 
 	pid = gettid();
@@ -106,12 +53,16 @@ void log_prevalist(int type, const char *fmt, va_list args)
 	    && pid != *fio_debug_jobp)
 		return;
 
-	prelen = snprintf(pre, sizeof pre, "%-8s %-5u ", debug_levels[type].name, (int) pid);
-	if (prelen > 0) {
-		len = prevalist_to_buf(&buffer, pre, prelen, fmt, args);
-		len = log_info_buf(buffer, len);
-		free(buffer);
-	}
+	len = vasprintf(&buf1, fmt, args);
+	if (len < 0)
+		return;
+	len = asprintf(&buf2, "%-8s %-5u %s", debug_levels[type].name,
+		       (int) pid, buf1);
+	free(buf1);
+	if (len < 0)
+		return;
+	len = log_info_buf(buf2, len);
+	free(buf2);
 }
 
 size_t log_info(const char *format, ...)
@@ -130,12 +81,13 @@ size_t __log_buf(struct buf_output *buf, const char *format, ...)
 {
 	char *buffer;
 	va_list args;
-	size_t len;
+	int len;
 
 	va_start(args, format);
-	len = valist_to_buf(&buffer, format, args);
+	len = vasprintf(&buffer, format, args);
 	va_end(args);
-
+	if (len < 0)
+		return 0;
 	len = buf_output_add(buf, buffer, len);
 	free(buffer);
 
@@ -152,13 +104,16 @@ int log_info_flush(void)
 
 size_t log_err(const char *format, ...)
 {
-	size_t ret, len;
+	size_t ret;
+	int len;
 	char *buffer;
 	va_list args;
 
 	va_start(args, format);
-	len = valist_to_buf(&buffer, format, args);
+	len = vasprintf(&buffer, format, args);
 	va_end(args);
+	if (len < 0)
+		return len;
 
 	if (is_backend) {
 		ret = fio_server_text_output(FIO_LOG_ERR, buffer, len);
diff --git a/memory.c b/memory.c
index 04dc3be..5f0225f 100644
--- a/memory.c
+++ b/memory.c
@@ -1,11 +1,10 @@
 /*
  * Memory helpers
  */
-#include <sys/types.h>
-#include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/mman.h>
+#include <sys/stat.h>
 
 #include "fio.h"
 #ifndef FIO_NO_HAVE_SHM_H
diff --git a/options.c b/options.c
index 6810521..45a5b82 100644
--- a/options.c
+++ b/options.c
@@ -4,16 +4,12 @@
 #include <ctype.h>
 #include <string.h>
 #include <assert.h>
-#include <libgen.h>
-#include <fcntl.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <netinet/in.h>
 
 #include "fio.h"
 #include "verify.h"
 #include "parse.h"
-#include "lib/fls.h"
 #include "lib/pattern.h"
 #include "options.h"
 #include "optgroup.h"
diff --git a/options.h b/options.h
index 83a58e2..59024ef 100644
--- a/options.h
+++ b/options.h
@@ -6,7 +6,6 @@
 #include <string.h>
 #include <inttypes.h>
 #include "parse.h"
-#include "flist.h"
 #include "lib/types.h"
 
 int add_option(struct fio_option *);
diff --git a/os/os-hpux.h b/os/os-hpux.h
index 6a240b0..515a525 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -6,7 +6,7 @@
 #include <errno.h>
 #include <unistd.h>
 #include <sys/ioctl.h>
-#include <sys/fcntl.h>
+#include <fcntl.h>
 #include <sys/fadvise.h>
 #include <sys/mman.h>
 #include <sys/mpctl.h>
diff --git a/os/os-solaris.h b/os/os-solaris.h
index db03546..2425ab9 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -7,7 +7,7 @@
 #include <malloc.h>
 #include <unistd.h>
 #include <sys/types.h>
-#include <sys/fcntl.h>
+#include <fcntl.h>
 #include <sys/pset.h>
 #include <sys/mman.h>
 #include <sys/dkio.h>
diff --git a/os/os.h b/os/os.h
index 1a4437c..95ed7cf 100644
--- a/os/os.h
+++ b/os/os.h
@@ -8,7 +8,7 @@
 #include <unistd.h>
 #include <stdlib.h>
 
-#include "../arch/arch.h"
+#include "../arch/arch.h" /* IWYU pragma: export */
 #include "../lib/types.h"
 
 enum {
@@ -27,6 +27,7 @@ enum {
 	os_nr,
 };
 
+/* IWYU pragma: begin_exports */
 #if defined(__ANDROID__)
 #include "os-android.h"
 #elif defined(__linux__)
@@ -67,6 +68,7 @@ typedef struct aiocb os_aiocb_t;
 #ifndef CONFIG_STRLCAT
 #include "../oslib/strlcat.h"
 #endif
+/* IWYU pragma: end_exports */
 
 #ifdef MSG_DONTWAIT
 #define OS_MSG_DONTWAIT	MSG_DONTWAIT
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 17e18a1..ecc8c40 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -18,7 +18,7 @@
 #include <sys/mman.h>
 #include <sys/uio.h>
 #include <sys/resource.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/wait.h>
 #include <setjmp.h>
 
diff --git a/os/windows/posix/include/poll.h b/os/windows/posix/include/poll.h
index 058e23a..f064e2b 100644
--- a/os/windows/posix/include/poll.h
+++ b/os/windows/posix/include/poll.h
@@ -1,4 +1,15 @@
 #ifndef POLL_H
 #define POLL_H
 
+typedef int nfds_t;
+
+struct pollfd
+{
+	int fd;
+	short events;
+	short revents;
+};
+
+int poll(struct pollfd fds[], nfds_t nfds, int timeout);
+
 #endif /* POLL_H */
diff --git a/os/windows/posix/include/sys/poll.h b/os/windows/posix/include/sys/poll.h
deleted file mode 100644
index f009d6e..0000000
--- a/os/windows/posix/include/sys/poll.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef SYS_POLL_H
-#define SYS_POLL_H
-
-typedef int nfds_t;
-
-struct pollfd
-{
-	int fd;
-	short events;
-	short revents;
-};
-
-int poll(struct pollfd fds[], nfds_t nfds, int timeout);
-
-#endif /* SYS_POLL_H */
diff --git a/oslib/asprintf.c b/oslib/asprintf.c
new file mode 100644
index 0000000..f1e7fd2
--- /dev/null
+++ b/oslib/asprintf.c
@@ -0,0 +1,43 @@
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "oslib/asprintf.h"
+
+#ifndef HAVE_VASPRINTF
+int vasprintf(char **strp, const char *fmt, va_list ap)
+{
+    va_list ap_copy;
+    char *str;
+    int len;
+
+#ifdef va_copy
+    va_copy(ap_copy, ap);
+#else
+    __va_copy(ap_copy, ap);
+#endif
+    len = vsnprintf(NULL, 0, fmt, ap_copy);
+    va_end(ap_copy);
+
+    if (len < 0)
+        return len;
+
+    len++;
+    str = malloc(len);
+    *strp = str;
+    return str ? vsnprintf(str, len, fmt, ap) : -1;
+}
+#endif
+
+#ifndef HAVE_ASPRINTF
+int asprintf(char **strp, const char *fmt, ...)
+{
+    va_list arg;
+    int done;
+
+    va_start(arg, fmt);
+    done = vasprintf(strp, fmt, arg);
+    va_end(arg);
+
+    return done;
+}
+#endif
diff --git a/oslib/asprintf.h b/oslib/asprintf.h
new file mode 100644
index 0000000..1aa076b
--- /dev/null
+++ b/oslib/asprintf.h
@@ -0,0 +1,11 @@
+#ifndef FIO_ASPRINTF_H
+#define FIO_ASPRINTF_H
+
+#ifndef HAVE_VASPRINTF
+int vasprintf(char **strp, const char *fmt, va_list ap);
+#endif
+#ifndef HAVE_ASPRINTF
+int asprintf(char **strp, const char *fmt, ...);
+#endif
+
+#endif /* FIO_ASPRINTF_H */
diff --git a/oslib/strcasestr.c b/oslib/strcasestr.c
index 2626609..5fa05fa 100644
--- a/oslib/strcasestr.c
+++ b/oslib/strcasestr.c
@@ -1,7 +1,8 @@
+#ifndef CONFIG_STRCASESTR
+
 #include <ctype.h>
 #include <stddef.h>
-
-#ifndef CONFIG_STRCASESTR
+#include "strcasestr.h"
 
 char *strcasestr(const char *s1, const char *s2)
 {
diff --git a/oslib/strcasestr.h b/oslib/strcasestr.h
index 43d61df..f13e929 100644
--- a/oslib/strcasestr.h
+++ b/oslib/strcasestr.h
@@ -1,8 +1,4 @@
-#ifdef CONFIG_STRCASESTR
-
-#include <string.h>
-
-#else
+#ifndef CONFIG_STRCASESTR
 
 #ifndef FIO_STRCASESTR_H
 #define FIO_STRCASESTR_H
@@ -10,4 +6,5 @@
 char *strcasestr(const char *haystack, const char *needle);
 
 #endif
+
 #endif
diff --git a/oslib/strlcat.c b/oslib/strlcat.c
index 3b33d0e..6c4c678 100644
--- a/oslib/strlcat.c
+++ b/oslib/strlcat.c
@@ -1,3 +1,5 @@
+#ifndef CONFIG_STRLCAT
+
 #include <string.h>
 #include "strlcat.h"
 
@@ -22,3 +24,5 @@ size_t strlcat(char *dst, const char *src, size_t size)
 
 	return dstlen + srclen;
 }
+
+#endif
diff --git a/oslib/strlcat.h b/oslib/strlcat.h
index baeace4..f766392 100644
--- a/oslib/strlcat.h
+++ b/oslib/strlcat.h
@@ -1,6 +1,12 @@
+#ifndef CONFIG_STRLCAT
+
 #ifndef FIO_STRLCAT_H
 #define FIO_STRLCAT_H
 
+#include <stddef.h>
+
 size_t strlcat(char *dst, const char *src, size_t size);
 
 #endif
+
+#endif
diff --git a/oslib/strndup.c b/oslib/strndup.c
index 7b0fcb5..657904a 100644
--- a/oslib/strndup.c
+++ b/oslib/strndup.c
@@ -1,8 +1,9 @@
+#ifndef CONFIG_HAVE_STRNDUP
+
 #include <stdlib.h>
+#include <string.h>
 #include "strndup.h"
 
-#ifndef CONFIG_HAVE_STRNDUP
-
 char *strndup(const char *s, size_t n)
 {
 	char *str = malloc(n + 1);
diff --git a/oslib/strndup.h b/oslib/strndup.h
index 2cb904d..2f41848 100644
--- a/oslib/strndup.h
+++ b/oslib/strndup.h
@@ -1,7 +1,12 @@
-#include <string.h>
-
 #ifndef CONFIG_HAVE_STRNDUP
 
+#ifndef FIO_STRNDUP_LIB_H
+#define FIO_STRNDUP_LIB_H
+
+#include <stddef.h>
+
 char *strndup(const char *s, size_t n);
 
 #endif
+
+#endif
diff --git a/oslib/strsep.c b/oslib/strsep.c
index b71e9f7..2d42ca0 100644
--- a/oslib/strsep.c
+++ b/oslib/strsep.c
@@ -1,4 +1,7 @@
-#include <stdio.h>
+#ifndef CONFIG_STRSEP
+
+#include <stddef.h>
+#include "strsep.h"
 
 char *strsep(char **stringp, const char *delim)
 {
@@ -27,3 +30,5 @@ char *strsep(char **stringp, const char *delim)
 		} while (sc != 0);
 	} while (1);
 }
+
+#endif
diff --git a/oslib/strsep.h b/oslib/strsep.h
index 5fea5d1..8cd9ada 100644
--- a/oslib/strsep.h
+++ b/oslib/strsep.h
@@ -1,6 +1,10 @@
+#ifndef CONFIG_STRSEP
+
 #ifndef FIO_STRSEP_LIB_H
 #define FIO_STRSEP_LIB_H
 
 char *strsep(char **, const char *);
 
 #endif
+
+#endif
diff --git a/parse.c b/parse.c
index fdb6611..33fcf46 100644
--- a/parse.c
+++ b/parse.c
@@ -3,18 +3,16 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <unistd.h>
 #include <ctype.h>
 #include <string.h>
 #include <errno.h>
 #include <limits.h>
-#include <stdlib.h>
-#include <math.h>
 #include <float.h>
 
 #include "compiler/compiler.h"
 #include "parse.h"
 #include "debug.h"
+#include "log.h"
 #include "options.h"
 #include "optgroup.h"
 #include "minmax.h"
diff --git a/server.c b/server.c
index 65d4484..15dc2c4 100644
--- a/server.c
+++ b/server.c
@@ -1,10 +1,8 @@
 #include <stdio.h>
 #include <stdlib.h>
-#include <stdarg.h>
 #include <unistd.h>
-#include <limits.h>
 #include <errno.h>
-#include <sys/poll.h>
+#include <poll.h>
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <sys/socket.h>
@@ -25,7 +23,7 @@
 #include "server.h"
 #include "crc/crc16.h"
 #include "lib/ieee754.h"
-#include "verify.h"
+#include "verify-state.h"
 #include "smalloc.h"
 
 int fio_net_port = FIO_NET_PORT;
@@ -528,6 +526,9 @@ static struct sk_entry *fio_net_prep_cmd(uint16_t opcode, void *buf,
 	struct sk_entry *entry;
 
 	entry = smalloc(sizeof(*entry));
+	if (!entry)
+		return NULL;
+
 	INIT_FLIST_HEAD(&entry->next);
 	entry->opcode = opcode;
 	if (flags & SK_F_COPY) {
@@ -616,7 +617,7 @@ static int fio_net_queue_quit(void)
 {
 	dprint(FD_NET, "server: sending quit\n");
 
-	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, NULL, SK_F_SIMPLE | SK_F_INLINE);
+	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, NULL, SK_F_SIMPLE);
 }
 
 int fio_net_send_quit(int sk)
@@ -636,7 +637,7 @@ static int fio_net_send_ack(struct fio_net_cmd *cmd, int error, int signal)
 
 	epdu.error = __cpu_to_le32(error);
 	epdu.signal = __cpu_to_le32(signal);
-	return fio_net_queue_cmd(FIO_NET_CMD_STOP, &epdu, sizeof(epdu), &tag, SK_F_COPY | SK_F_INLINE);
+	return fio_net_queue_cmd(FIO_NET_CMD_STOP, &epdu, sizeof(epdu), &tag, SK_F_COPY);
 }
 
 static int fio_net_queue_stop(int error, int signal)
@@ -1359,6 +1360,11 @@ static int accept_loop(int listen_sk)
 		dprint(FD_NET, "server: connect from %s\n", from);
 
 		sk_out = smalloc(sizeof(*sk_out));
+		if (!sk_out) {
+			close(sk);
+			return -1;
+		}
+
 		sk_out->sk = sk;
 		INIT_FLIST_HEAD(&sk_out->list);
 		__fio_sem_init(&sk_out->lock, FIO_SEM_UNLOCKED);
@@ -1695,8 +1701,8 @@ static inline void __fio_net_prep_tail(z_stream *stream, void *out_pdu,
 
 	*last_entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
 				 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
-	flist_add_tail(&(*last_entry)->list, &first->next);
-
+	if (*last_entry)
+		flist_add_tail(&(*last_entry)->list, &first->next);
 }
 
 /*
@@ -1712,9 +1718,10 @@ static int __deflate_pdu_buffer(void *next_in, unsigned int next_sz, void **out_
 	stream->next_in = next_in;
 	stream->avail_in = next_sz;
 	do {
-		if (! stream->avail_out) {
-
+		if (!stream->avail_out) {
 			__fio_net_prep_tail(stream, *out_pdu, last_entry, first);
+			if (*last_entry == NULL)
+				return 1;
 
 			*out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
 
@@ -1778,8 +1785,7 @@ static int __fio_append_iolog_gz_hist(struct sk_entry *first, struct io_log *log
 	}
 
 	__fio_net_prep_tail(stream, out_pdu, &entry, first);
-
-	return 0;
+	return entry == NULL;
 }
 
 static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
@@ -1818,6 +1824,10 @@ static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 
 		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
 					 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
+		if (!entry) {
+			free(out_pdu);
+			return 1;
+		}
 		flist_add_tail(&entry->list, &first->next);
 	} while (stream->avail_in);
 
@@ -1869,6 +1879,10 @@ static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
 
 		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
 					 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
+		if (!entry) {
+			free(out_pdu);
+			break;
+		}
 		flist_add_tail(&entry->list, &first->next);
 	} while (ret != Z_STREAM_END);
 
@@ -1889,6 +1903,7 @@ static int fio_append_gz_chunks(struct sk_entry *first, struct io_log *log)
 {
 	struct sk_entry *entry;
 	struct flist_head *node;
+	int ret = 0;
 
 	pthread_mutex_lock(&log->chunk_lock);
 	flist_for_each(node, &log->chunk_list) {
@@ -1897,16 +1912,20 @@ static int fio_append_gz_chunks(struct sk_entry *first, struct io_log *log)
 		c = flist_entry(node, struct iolog_compress, list);
 		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, c->buf, c->len,
 						NULL, SK_F_VEC | SK_F_INLINE);
+		if (!entry) {
+			ret = 1;
+			break;
+		}
 		flist_add_tail(&entry->list, &first->next);
 	}
 	pthread_mutex_unlock(&log->chunk_lock);
-
-	return 0;
+	return ret;
 }
 
 static int fio_append_text_log(struct sk_entry *first, struct io_log *log)
 {
 	struct sk_entry *entry;
+	int ret = 0;
 
 	while (!flist_empty(&log->io_logs)) {
 		struct io_logs *cur_log;
@@ -1919,10 +1938,14 @@ static int fio_append_text_log(struct sk_entry *first, struct io_log *log)
 
 		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, cur_log->log, size,
 						NULL, SK_F_VEC | SK_F_INLINE);
+		if (!entry) {
+			ret = 1;
+			break;
+		}
 		flist_add_tail(&entry->list, &first->next);
 	}
 
-	return 0;
+	return ret;
 }
 
 int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
@@ -1977,6 +2000,8 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 	 * Assemble header entry first
 	 */
 	first = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, &pdu, sizeof(pdu), NULL, SK_F_VEC | SK_F_INLINE | SK_F_COPY);
+	if (!first)
+		return 1;
 
 	/*
 	 * Now append actual log entries. If log compression was enabled on
diff --git a/server.h b/server.h
index d652d31..1eee7dc 100644
--- a/server.h
+++ b/server.h
@@ -7,7 +7,6 @@
 #include <netinet/in.h>
 
 #include "stat.h"
-#include "os/os.h"
 #include "diskutil.h"
 
 #define FIO_NET_PORT 8765
@@ -217,8 +216,6 @@ extern int fio_server_parse_host(const char *, int, struct in_addr *, struct in6
 extern const char *fio_server_op(unsigned int);
 extern void fio_server_got_signal(int);
 
-struct thread_stat;
-struct group_run_stats;
 extern void fio_server_send_ts(struct thread_stat *, struct group_run_stats *);
 extern void fio_server_send_gs(struct group_run_stats *);
 extern void fio_server_send_du(void);
diff --git a/smalloc.c b/smalloc.c
index 13995ac..7b1690a 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -3,15 +3,8 @@
  * that can be shared across processes and threads
  */
 #include <sys/mman.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <assert.h>
 #include <string.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
-#include <limits.h>
-#include <fcntl.h>
 #ifdef CONFIG_VALGRIND_DEV
 #include <valgrind/valgrind.h>
 #else
@@ -22,7 +15,6 @@
 
 #include "fio.h"
 #include "fio_sem.h"
-#include "arch/arch.h"
 #include "os/os.h"
 #include "smalloc.h"
 #include "log.h"
diff --git a/smalloc.h b/smalloc.h
index 4b551e3..8df10e6 100644
--- a/smalloc.h
+++ b/smalloc.h
@@ -1,6 +1,8 @@
 #ifndef FIO_SMALLOC_H
 #define FIO_SMALLOC_H
 
+#include <stddef.h>
+
 extern void *smalloc(size_t);
 extern void *scalloc(size_t, size_t);
 extern void sfree(void *);
diff --git a/stat.c b/stat.c
index 98ab638..a837ed9 100644
--- a/stat.c
+++ b/stat.c
@@ -1,10 +1,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <sys/time.h>
-#include <sys/types.h>
 #include <sys/stat.h>
-#include <dirent.h>
-#include <libgen.h>
 #include <math.h>
 
 #include "fio.h"
diff --git a/steadystate.c b/steadystate.c
index 2017ca6..1e3a546 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -2,7 +2,6 @@
 
 #include "fio.h"
 #include "steadystate.h"
-#include "helper_thread.h"
 
 bool steadystate_enabled = false;
 
diff --git a/steadystate.h b/steadystate.h
index 9fd88ee..51472c4 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -1,9 +1,7 @@
 #ifndef FIO_STEADYSTATE_H
 #define FIO_STEADYSTATE_H
 
-#include "stat.h"
 #include "thread_options.h"
-#include "lib/ieee754.h"
 
 extern void steadystate_free(struct thread_data *);
 extern void steadystate_check(void);
diff --git a/t/axmap.c b/t/axmap.c
index a803ce4..eef464f 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -1,8 +1,5 @@
 #include <stdio.h>
 #include <stdlib.h>
-#include <fcntl.h>
-#include <string.h>
-#include <unistd.h>
 #include <inttypes.h>
 
 #include "../lib/lfsr.h"
diff --git a/t/btrace2fio.c b/t/btrace2fio.c
index 4cdb38d..a8a9d62 100644
--- a/t/btrace2fio.c
+++ b/t/btrace2fio.c
@@ -1,5 +1,4 @@
 #include <stdio.h>
-#include <stdio.h>
 #include <unistd.h>
 #include <inttypes.h>
 #include <math.h>
@@ -12,6 +11,7 @@
 #include "../blktrace_api.h"
 #include "../os/os.h"
 #include "../log.h"
+#include "../minmax.h"
 #include "../oslib/linux-dev-lookup.h"
 
 #define TRACE_FIFO_SIZE	8192
diff --git a/t/dedupe.c b/t/dedupe.c
index 1b4277c..37120e1 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -3,16 +3,12 @@
  * just scans the filename for extents of the given size, checksums them,
  * and orders them up.
  */
+#include <fcntl.h>
+#include <inttypes.h>
 #include <stdio.h>
-#include <stdio.h>
+#include <string.h>
 #include <unistd.h>
-#include <inttypes.h>
-#include <assert.h>
-#include <sys/types.h>
 #include <sys/stat.h>
-#include <sys/ioctl.h>
-#include <fcntl.h>
-#include <string.h>
 
 #include "../flist.h"
 #include "../log.h"
diff --git a/t/gen-rand.c b/t/gen-rand.c
index 4e9d39c..c379053 100644
--- a/t/gen-rand.c
+++ b/t/gen-rand.c
@@ -1,17 +1,11 @@
+#include <math.h>
+#include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <inttypes.h>
-#include <assert.h>
-#include <math.h>
-#include <string.h>
 
 #include "../lib/types.h"
-#include "../log.h"
-#include "../lib/lfsr.h"
-#include "../lib/axmap.h"
-#include "../smalloc.h"
-#include "../minmax.h"
 #include "../lib/rand.h"
+#include "../log.h"
 
 int main(int argc, char *argv[])
 {
diff --git a/t/genzipf.c b/t/genzipf.c
index 9faec38..4fc10ae 100644
--- a/t/genzipf.c
+++ b/t/genzipf.c
@@ -14,7 +14,6 @@
  */
 #include <stdio.h>
 #include <stdlib.h>
-#include <fcntl.h>
 #include <string.h>
 #include <unistd.h>
 
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index 4009b62..abdbafb 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -1,11 +1,7 @@
 #include <stdio.h>
 #include <stdlib.h>
-#include <time.h>
 #include <math.h>
 #include <string.h>
-#include <unistd.h>
-#include <sys/types.h>
-#include <sys/stat.h>
 
 #include "../lib/lfsr.h"
 #include "../gettime.h"
diff --git a/td_error.h b/td_error.h
index 1b38a53..1cc3a75 100644
--- a/td_error.h
+++ b/td_error.h
@@ -1,6 +1,8 @@
 #ifndef FIO_TD_ERROR_H
 #define FIO_TD_ERROR_H
 
+#include "io_ddir.h"
+
 /*
  * What type of errors to continue on when continue_on_error is used,
  * and what type of errors to ignore when ignore_error is used.
diff --git a/trim.c b/trim.c
index 78cf672..bf825db 100644
--- a/trim.c
+++ b/trim.c
@@ -1,11 +1,8 @@
 /*
  * TRIM/DISCARD support
  */
-#include <unistd.h>
-#include <fcntl.h>
 #include <string.h>
 #include <assert.h>
-#include <pthread.h>
 
 #include "fio.h"
 #include "trim.h"
diff --git a/trim.h b/trim.h
index 37f5d7c..fe8f9fe 100644
--- a/trim.h
+++ b/trim.h
@@ -1,9 +1,13 @@
 #ifndef FIO_TRIM_H
 #define FIO_TRIM_H
 
-#include "fio.h"
-
 #ifdef FIO_HAVE_TRIM
+#include "flist.h"
+#include "iolog.h"
+#include "compiler/compiler.h"
+#include "lib/types.h"
+#include "os/os.h"
+
 extern bool __must_check get_next_trim(struct thread_data *td, struct io_u *io_u);
 extern bool io_u_should_trim(struct thread_data *td, struct io_u *io_u);
 
diff --git a/verify.c b/verify.c
index d10670b..c5fa241 100644
--- a/verify.c
+++ b/verify.c
@@ -245,33 +245,23 @@ struct vcont {
 static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 		     const char *type, struct fio_file *f)
 {
-	char *ptr, fname[DUMP_BUF_SZ];
-	size_t buf_left = DUMP_BUF_SZ;
+	char *ptr, *fname;
+	char sep[2] = { FIO_OS_PATH_SEPARATOR, 0 };
 	int ret, fd;
 
 	ptr = strdup(f->file_name);
 
-	memset(fname, 0, sizeof(fname));
-	if (aux_path)
-		sprintf(fname, "%s%c", aux_path, FIO_OS_PATH_SEPARATOR);
-
-	strncpy(fname + strlen(fname), basename(ptr), buf_left - 1);
-
-	buf_left -= strlen(fname);
-	if (buf_left <= 0) {
+	if (asprintf(&fname, "%s%s%s.%llu.%s", aux_path ? : "",
+		     aux_path ? sep : "", basename(ptr), offset, type) < 0) {
 		if (!fio_did_warn(FIO_WARN_VERIFY_BUF))
-			log_err("fio: verify failure dump buffer too small\n");
-		free(ptr);
-		return;
+			log_err("fio: not enough memory for dump buffer filename\n");
+		goto free_ptr;
 	}
 
-	snprintf(fname + strlen(fname), buf_left, ".%llu.%s", offset, type);
-
 	fd = open(fname, O_CREAT | O_TRUNC | O_WRONLY, 0644);
 	if (fd < 0) {
 		perror("open verify buf file");
-		free(ptr);
-		return;
+		goto free_fname;
 	}
 
 	while (len) {
@@ -288,6 +278,11 @@ static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 
 	close(fd);
 	log_err("       %s data dumped as %s\n", type, fname);
+
+free_fname:
+	free(fname);
+
+free_ptr:
 	free(ptr);
 }
 
diff --git a/verify.h b/verify.h
index 321e648..64121a5 100644
--- a/verify.h
+++ b/verify.h
@@ -2,6 +2,7 @@
 #define FIO_VERIFY_H
 
 #include <stdint.h>
+#include "compiler/compiler.h"
 #include "verify-state.h"
 
 #define FIO_HDR_MAGIC	0xacca
diff --git a/workqueue.h b/workqueue.h
index e35c181..0a62b5f 100644
--- a/workqueue.h
+++ b/workqueue.h
@@ -1,7 +1,14 @@
 #ifndef FIO_RATE_H
 #define FIO_RATE_H
 
+#include <inttypes.h>
+#include <pthread.h>
+
 #include "flist.h"
+#include "lib/types.h"
+
+struct sk_out;
+struct thread_data;
 
 struct workqueue_work {
 	struct flist_head list;

* Recent changes (master)
@ 2018-03-21 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-21 12:00 UTC
  To: fio

The following changes since commit 96344ff00349422172de6fa57899c66dc3c00391:

  optgroup: add check for optgroup bit numbers being within range (2018-03-19 15:56:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 69c594d81d4067fadd70fa4909e19d615efa5f1c:

  optgroup: move debug code into function (2018-03-20 11:19:19 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'expand_fiohistparser' of https://github.com/shimrot/fio
      Fix whitespace issues in previous commit
      optgroup: move debug code into function

krisd (1):
      Expand fiologparser_hist operations with new options

 optgroup.c                        |   5 +-
 tools/hist/fiologparser_hist.py   | 385 ++++++++++++++++++++++++++++++--------
 tools/hist/fiologparser_hist.py.1 |  23 ++-
 3 files changed, 328 insertions(+), 85 deletions(-)

---

Diff of recent changes:

diff --git a/optgroup.c b/optgroup.c
index 1c418f5..04ceec7 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -202,7 +202,8 @@ const struct opt_group *opt_group_from_mask(uint64_t *mask)
 
 const struct opt_group *opt_group_cat_from_mask(uint64_t *mask)
 {
+	compiletime_assert(__FIO_OPT_G_NR <= 8 * sizeof(uint64_t),
+				"__FIO_OPT_G_NR");
+
 	return group_from_mask(fio_opt_cat_groups, mask, FIO_OPT_G_INVALID);
 }
-
-compiletime_assert(__FIO_OPT_G_NR <= 8 * sizeof(uint64_t), "__FIO_OPT_G_NR");
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 62a4eb4..8910d5f 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -16,10 +16,57 @@
 import os
 import sys
 import pandas
+import re
 import numpy as np
 
+runascmd = False
+
 err = sys.stderr.write
 
+class HistFileRdr():
+    """ Class to read a hist file line by line, buffering
+        a value array for the latest line, and allowing a preview
+        of the next timestamp in next line
+        Note: this does not follow a generator pattern, but must explicitly
+        get next bin array.
+    """
+    def __init__(self, file):
+        self.fp = open(file, 'r')
+        self.data = self.nextData()
+
+    def close(self):
+        self.fp.close()
+        self.fp = None
+
+    def nextData(self):
+        self.data = None
+        if self.fp:
+            line = self.fp.readline()
+            if line == "":
+                self.close()
+            else:
+                self.data = [int(x) for x in line.replace(' ', '').rstrip().split(',')]
+
+        return self.data
+
+    @property
+    def curTS(self):
+        ts = None
+        if self.data:
+            ts = self.data[0]
+        return ts
+
+    @property
+    def curDir(self):
+        d = None
+        if self.data:
+            d = self.data[1]
+        return d
+
+    @property
+    def curBins(self):
+        return self.data[3:]
+
 def weighted_percentile(percs, vs, ws):
     """ Use linear interpolation to calculate the weighted percentile.
         
@@ -42,7 +89,7 @@ def weights(start_ts, end_ts, start, end):
     """ Calculate weights based on fraction of sample falling in the
         given interval [start,end]. Weights computed using vector / array
         computation instead of for-loops.
-    
+
         Note that samples with zero time length are effectively ignored
         (we set their weight to zero).
 
@@ -64,8 +111,21 @@ def weights(start_ts, end_ts, start, end):
 def weighted_average(vs, ws):
     return np.sum(vs * ws) / np.sum(ws)
 
-columns = ["end-time", "samples", "min", "avg", "median", "90%", "95%", "99%", "max"]
-percs   = [50, 90, 95, 99]
+
+percs = None
+columns = None
+
+def gen_output_columns(ctx):
+    global percs,columns
+    strpercs = re.split('[,:]', ctx.percentiles)
+    percs = [50.0]  # always print 50% in 'median' column
+    percs.extend(list(map(float,strpercs)))
+    if ctx.directions:
+        columns = ["end-time", "dir", "samples", "min", "avg", "median"]
+    else:
+        columns = ["end-time", "samples", "min", "avg", "median"]
+    columns.extend(list(map(lambda x: x+'%', strpercs)))
+    columns.append("max")
 
 def fmt_float_list(ctx, num=1):
   """ Return a comma separated list of float formatters to the required number
@@ -80,7 +140,7 @@ def fmt_float_list(ctx, num=1):
 __HIST_COLUMNS = 1216
 __NON_HIST_COLUMNS = 3
 __TOTAL_COLUMNS = __HIST_COLUMNS + __NON_HIST_COLUMNS
-    
+
 def read_chunk(rdr, sz):
     """ Read the next chunk of size sz from the given reader. """
     try:
@@ -88,15 +148,18 @@ def read_chunk(rdr, sz):
             occurs if rdr is None due to the file being empty. """
         new_arr = rdr.read().values
     except (StopIteration, AttributeError):
-        return None    
+        return None
 
-    """ Extract array of just the times, and histograms matrix without times column. """
-    times, rws, szs = new_arr[:,0], new_arr[:,1], new_arr[:,2]
-    hists = new_arr[:,__NON_HIST_COLUMNS:]
-    times = times.reshape((len(times),1))
-    arr = np.append(times, hists, axis=1)
+    # Let's leave the array as is, and let later code ignore the block size
+    return new_arr
 
-    return arr
+    #""" Extract array of the times, directions wo times, and histograms matrix without times column. """
+    #times, rws, szs = new_arr[:,0], new_arr[:,1], new_arr[:,2]
+    #hists = new_arr[:,__NON_HIST_COLUMNS:]
+    #times = times.reshape((len(times),1))
+    #dirs  = rws.reshape((len(rws),1))
+    #arr = np.append(times, hists, axis=1)
+    #return arr
 
 def get_min(fps, arrs):
     """ Find the file with the current first row with the smallest start time """
@@ -126,7 +189,8 @@ def histogram_generator(ctx, fps, sz):
         except ValueError:
             return
         arr = arrs[fp]
-        yield np.insert(arr[0], 1, fps.index(fp))
+        arri = np.insert(arr[0], 1, fps.index(fp))
+        yield arri
         arrs[fp] = arr[1:]
 
         if arrs[fp].shape[0] == 0:
@@ -172,13 +236,24 @@ def plat_idx_to_val_coarse(idx, coarseness, edge=0.5):
     upper = _plat_idx_to_val(idx + stride, edge=1.0)
     return lower + (upper - lower) * edge
 
-def print_all_stats(ctx, end, mn, ss_cnt, vs, ws, mx):
+def print_all_stats(ctx, end, mn, ss_cnt, vs, ws, mx, dir=dir):
     ps = weighted_percentile(percs, vs, ws)
 
     avg = weighted_average(vs, ws)
     values = [mn, avg] + list(ps) + [mx]
-    row = [end, ss_cnt] + [float(x) / ctx.divisor for x in values]
-    fmt = "%d, %d, %d, " + fmt_float_list(ctx, 5) + ", %d"
+    if ctx.directions:
+        row = [end, dir, ss_cnt]
+        fmt = "%d, %s, %d, "
+    else:
+        row = [end, ss_cnt]
+        fmt = "%d, %d, "
+    row = row + [float(x) / ctx.divisor for x in values]
+    if ctx.divisor > 1:
+        fmt = fmt + fmt_float_list(ctx, len(percs)+3)
+    else:
+        # max and min are decimal values if no divisor
+        fmt = fmt + "%d, " + fmt_float_list(ctx, len(percs)+1) + ", %d"
+
     print (fmt % tuple(row))
 
 def update_extreme(val, fncn, new_val):
@@ -191,40 +266,69 @@ bin_vals = []
 lower_bin_vals = [] # lower edge of each bin
 upper_bin_vals = [] # upper edge of each bin 
 
-def process_interval(ctx, samples, iStart, iEnd):
+def process_interval(ctx, iHist, iEnd, dir):
+    """ print estimated percentages for the given merged sample
+    """
+    ss_cnt = 0 # number of samples affecting this interval
+    mn_bin_val, mx_bin_val = None, None
+
+    # Update total number of samples affecting current interval histogram:
+    ss_cnt += np.sum(iHist)
+
+    # Update min and max bin values
+    idxs = np.nonzero(iHist != 0)[0]
+    if idxs.size > 0:
+        mn_bin_val = bin_vals[idxs[0]]
+        mx_bin_val = bin_vals[idxs[-1]]
+
+    if ss_cnt > 0: print_all_stats(ctx, iEnd, mn_bin_val, ss_cnt, bin_vals, iHist, mx_bin_val, dir=dir)
+
+
+dir_map = ['r', 'w', 't']  # map of directional value in log to textual representation
+def process_weighted_interval(ctx, samples, iStart, iEnd, printdirs):
     """ Construct the weighted histogram for the given interval by scanning
         through all the histograms and figuring out which of their bins have
         samples with latencies which overlap with the given interval
         [iStart,iEnd].
     """
-    
-    times, files, hists = samples[:,0], samples[:,1], samples[:,2:]
-    iHist = np.zeros(__HIST_COLUMNS)
-    ss_cnt = 0 # number of samples affecting this interval
-    mn_bin_val, mx_bin_val = None, None
 
-    for end_time,file,hist in zip(times,files,hists):
-            
+    times, files, dirs, sizes, hists = samples[:,0], samples[:,1], samples[:,2], samples[:,3], samples[:,4:]
+    iHist={}; ss_cnt = {}; mn_bin_val={}; mx_bin_val={}
+    for dir in printdirs:
+        iHist[dir] = np.zeros(__HIST_COLUMNS, dtype=float)
+        ss_cnt[dir] = 0 # number of samples affecting this interval
+        mn_bin_val[dir] = None
+        mx_bin_val[dir] = None
+
+    for end_time,file,dir,hist in zip(times,files,dirs,hists):
+
         # Only look at bins of the current histogram sample which
         # started before the end of the current time interval [start,end]
-        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / 1000.0
+        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / ctx.time_divisor
         idx = np.where(start_times < iEnd)
         s_ts, l_bvs, u_bvs, hs = start_times[idx], lower_bin_vals[idx], upper_bin_vals[idx], hist[idx]
 
-        # Increment current interval histogram by weighted values of future histogram:
+        # Increment current interval histogram by weighted values of future histogram
+        # total number of samples
+        # and min and max values as necessary
+        textdir = dir_map[dir]
         ws = hs * weights(s_ts, end_time, iStart, iEnd)
-        iHist[idx] += ws
-    
-        # Update total number of samples affecting current interval histogram:
-        ss_cnt += np.sum(hs)
-        
-        # Update min and max bin values seen if necessary:
-        idx = np.where(hs != 0)[0]
-        if idx.size > 0:
-            mn_bin_val = update_extreme(mn_bin_val, min, l_bvs[max(0,           idx[0]  - 1)])
-            mx_bin_val = update_extreme(mx_bin_val, max, u_bvs[min(len(hs) - 1, idx[-1] + 1)])
-
-    if ss_cnt > 0: print_all_stats(ctx, iEnd, mn_bin_val, ss_cnt, bin_vals, iHist, mx_bin_val)
+        mmidx = np.where(hs != 0)[0]
+        if 'm' in printdirs:
+            iHist['m'][idx] += ws
+            ss_cnt['m'] += np.sum(hs)
+            if mmidx.size > 0:
+                mn_bin_val['m'] = update_extreme(mn_bin_val['m'], min, l_bvs[max(0,           mmidx[0]  - 1)])
+                mx_bin_val['m'] = update_extreme(mx_bin_val['m'], max, u_bvs[min(len(hs) - 1, mmidx[-1] + 1)])
+        if textdir in printdirs:
+            iHist[textdir][idx] += ws
+            ss_cnt[textdir] += np.sum(hs)  # Update total number of samples affecting current interval histogram:
+            if mmidx.size > 0:
+                mn_bin_val[textdir] = update_extreme(mn_bin_val[textdir], min, l_bvs[max(0,           mmidx[0]  - 1)])
+                mx_bin_val[textdir] = update_extreme(mx_bin_val[textdir], max, u_bvs[min(len(hs) - 1, mmidx[-1] + 1)])
+
+    for textdir in sorted(printdirs):
+        if ss_cnt[textdir] > 0: print_all_stats(ctx, iEnd, mn_bin_val[textdir], ss_cnt[textdir], bin_vals, iHist[textdir], mx_bin_val[textdir], dir=textdir)
 
 def guess_max_from_bins(ctx, hist_cols):
     """ Try to guess the GROUP_NR from given # of histogram
@@ -241,7 +345,7 @@ def guess_max_from_bins(ctx, hist_cols):
     idx = np.where(arr == hist_cols)
     if len(idx[1]) == 0:
         table = repr(arr.astype(int)).replace('-10', 'N/A').replace('array','     ')
-        err("Unable to determine bin values from input clat_hist files. Namely \n"
+        errmsg = ("Unable to determine bin values from input clat_hist files. Namely \n"
             "the first line of file '%s' " % ctx.FILE[0] + "has %d \n" % (__TOTAL_COLUMNS,) +
             "columns of which we assume %d " % (hist_cols,) + "correspond to histogram bins. \n"
             "This number needs to be equal to one of the following numbers:\n\n"
@@ -250,9 +354,119 @@ def guess_max_from_bins(ctx, hist_cols):
             "  - Input file(s) does not contain histograms.\n"
             "  - You recompiled fio with a different GROUP_NR. If so please specify this\n"
             "    new GROUP_NR on the command line with --group_nr\n")
-        exit(1)
+        if runascmd:
+            err(errmsg)
+            exit(1)
+        else:
+            raise RuntimeError(errmsg) 
+
     return bins[idx[1][0]]
 
+def output_weighted_interval_data(ctx,printdirs):
+
+    fps = [open(f, 'r') for f in ctx.FILE]
+    gen = histogram_generator(ctx, fps, ctx.buff_size)
+
+    print(', '.join(columns))
+
+    try:
+        start, end = 0, ctx.interval
+        arr = np.empty(shape=(0,__TOTAL_COLUMNS + 1),dtype=int)
+        more_data = True
+        while more_data or len(arr) > 0:
+
+            # Read up to ctx.max_latency (default 20 seconds) of data from end of current interval.
+            while len(arr) == 0 or arr[-1][0] < ctx.max_latency * 1000 + end:
+                try:
+                    new_arr = next(gen)
+                except StopIteration:
+                    more_data = False
+                    break
+                nashape  = new_arr.reshape((1,__TOTAL_COLUMNS + 1))
+                arr = np.append(arr, nashape, axis=0)
+            #arr = arr.astype(int)
+            
+            if arr.size > 0:
+                # Jump immediately to the start of the input, rounding
+                # down to the nearest multiple of the interval (useful when --log_unix_epoch
+                # was used to create these histograms):
+                if start == 0 and arr[0][0] - ctx.max_latency > end:
+                    start = arr[0][0] - ctx.max_latency
+                    start = start - (start % ctx.interval)
+                    end = start + ctx.interval
+
+                process_weighted_interval(ctx, arr, start, end, printdirs)
+                
+                # Update arr to throw away samples we no longer need - samples which
+                # end before the start of the next interval, i.e. the end of the
+                # current interval:
+                idx = np.where(arr[:,0] > end)
+                arr = arr[idx]
+            
+            start += ctx.interval
+            end = start + ctx.interval
+    finally:
+        for fp in fps:
+            fp.close()
+
+def output_interval_data(ctx,directions):
+    fps = [HistFileRdr(f) for f in ctx.FILE]
+
+    print(', '.join(columns))
+
+    start = 0
+    end = ctx.interval
+    while True:
+
+        more_data = False
+
+        # add bins from all files in target intervals
+        arr = None
+        numSamples = 0
+        while True:
+            foundSamples = False
+            for fp in fps:
+                ts = fp.curTS
+                if ts and ts+10 < end:  # shift sample time when very close to an end time
+                    curdirect = fp.curDir
+                    numSamples += 1
+                    foundSamples = True
+                    if arr is None:
+                        arr = {}
+                        for d in directions:
+                            arr[d] = np.zeros(shape=(__HIST_COLUMNS), dtype=int)
+                    if 'm' in arr:
+                        arr['m'] = np.add(arr['m'], fp.curBins)
+                    if 'r' in arr and curdirect == 0:
+                        arr['r'] = np.add(arr['r'], fp.curBins)
+                    if 'w' in arr and curdirect == 1:
+                        arr['w'] = np.add(arr['w'], fp.curBins)
+                    if 't' in arr and curdirect == 2:
+                        arr['t'] = np.add(arr['t'], fp.curBins)
+
+                    more_data = True
+                    fp.nextData()
+                elif ts:
+                    more_data = True
+
+            # reached end of all files
+            # or gone through all files without finding sample in interval
+            if not more_data or not foundSamples:
+                break
+
+        if arr is not None:
+            #print("{} size({}) samples({}) nonzero({}):".format(end, arr.size, numSamples, np.count_nonzero(arr)), str(arr), )
+            for d in sorted(arr.keys()):
+                aval = arr[d]
+                process_interval(ctx, aval, end, d)
+
+        # reach end of all files
+        if not more_data:
+            break
+
+        start += ctx.interval
+        end = start + ctx.interval
+
 def main(ctx):
 
     if ctx.job_file:
@@ -275,9 +489,23 @@ def main(ctx):
                 except NoOptionError:
                     pass
 
+    if not hasattr(ctx, 'percentiles'):
+        ctx.percentiles = "90,95,99"
+
+    if ctx.directions:
+        ctx.directions = ctx.directions.lower()
+
     if ctx.interval is None:
         ctx.interval = 1000
 
+    if ctx.usbin:
+        ctx.time_divisor = 1000.0        # bins are in us
+    else:
+        ctx.time_divisor = 1000000.0     # bins are in ns
+
+    gen_output_columns(ctx)
+
+
     # Automatically detect how many columns are in the input files,
     # calculate the corresponding 'coarseness' parameter used to generate
     # those files, and calculate the appropriate bin latency values:
@@ -292,53 +520,22 @@ def main(ctx):
         lower_bin_vals = np.array([plat_idx_to_val_coarse(x, coarseness, 0.0) for x in np.arange(__HIST_COLUMNS)], dtype=float)
         upper_bin_vals = np.array([plat_idx_to_val_coarse(x, coarseness, 1.0) for x in np.arange(__HIST_COLUMNS)], dtype=float)
 
-    fps = [open(f, 'r') for f in ctx.FILE]
-    gen = histogram_generator(ctx, fps, ctx.buff_size)
-
-    print(', '.join(columns))
+    # indicate which directions to output (read(0), write(1), trim(2), mixed(3))
+    directions = set()
+    if not ctx.directions or 'm' in ctx.directions: directions.add('m')
+    if ctx.directions and 'r' in ctx.directions:    directions.add('r')
+    if ctx.directions and 'w' in ctx.directions:    directions.add('w')
+    if ctx.directions and 't' in ctx.directions:    directions.add('t')
 
-    try:
-        start, end = 0, ctx.interval
-        arr = np.empty(shape=(0,__TOTAL_COLUMNS - 1))
-        more_data = True
-        while more_data or len(arr) > 0:
-            
-            # Read up to ctx.max_latency (default 20 seconds) of data from end of current interval.
-            while len(arr) == 0 or arr[-1][0] < ctx.max_latency * 1000 + end:
-                try:
-                    new_arr = next(gen)
-                except StopIteration:
-                    more_data = False
-                    break
-                arr = np.append(arr, new_arr.reshape((1,__TOTAL_COLUMNS - 1)), axis=0)
-            arr = arr.astype(int)
-            
-            if arr.size > 0:
-                # Jump immediately to the start of the input, rounding
-                # down to the nearest multiple of the interval (useful when --log_unix_epoch
-                # was used to create these histograms):
-                if start == 0 and arr[0][0] - ctx.max_latency > end:
-                    start = arr[0][0] - ctx.max_latency
-                    start = start - (start % ctx.interval)
-                    end = start + ctx.interval
-
-                process_interval(ctx, arr, start, end)
-                
-                # Update arr to throw away samples we no longer need - samples which
-                # end before the start of the next interval, i.e. the end of the
-                # current interval:
-                idx = np.where(arr[:,0] > end)
-                arr = arr[idx]
-            
-            start += ctx.interval
-            end = start + ctx.interval
-    finally:
-        for fp in fps:
-            fp.close()
+    if ctx.noweight:
+        output_interval_data(ctx, directions)
+    else:
+        output_weighted_interval_data(ctx, directions)
 
 
 if __name__ == '__main__':
     import argparse
+    runascmd = True
     p = argparse.ArgumentParser()
     arg = p.add_argument
     arg("FILE", help='space separated list of latency log filenames', nargs='+')
@@ -356,6 +553,11 @@ if __name__ == '__main__':
         type=int,
         help='interval width (ms), default 1000 ms')
 
+    arg('--noweight',
+        action='store_true',
+        default=False,
+        help='do not perform weighting of samples between output intervals')
+
     arg('-d', '--divisor',
         required=False,
         type=int,
@@ -385,5 +587,26 @@ if __name__ == '__main__':
              'given histogram files. Useful for auto-detecting --log_hist_msec and '
              '--log_unix_epoch (in fio) values.')
 
+    arg('--percentiles',
+        default="90:95:99",
+        type=str,
+        help='Optional argument of comma or colon separated percentiles to print. '
+             'The default is "90.0:95.0:99.0".  min, median(50%%) and max percentiles are always printed')
+
+    arg('--usbin',
+        default=False,
+        action='store_true',
+        help='histogram bin latencies are in us (fio versions < 2.99; fio uses ns for versions >= 2.99)')
+
+    arg('--directions',
+        default=None,
+        type=str,
+        help='Optionally split results output by reads, writes, trims or mixed. '
+             'Value may be any combination of "rwtm" characters. '
+             'By default, only "mixed" results are output without a "dir" field. '
+             'But, specifying the --directions option '
+             'adds a "dir" field to the output content, and separate rows for each of the indicated '
+             'directions.')
+
     main(p.parse_args())
 
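A standalone sketch of how the --percentiles and --directions strings from the patch above can map to the CSV header and to the set of row types to print. It follows gen_output_columns() and the directions set built in main(); the function names build_columns() and select_directions() are hypothetical, and plain arguments stand in for the script's ctx object.

```python
import re


def build_columns(percentiles="90:95:99", directions=None):
    """Sketch: derive the percentile list and CSV header.

    The median (50%) is always computed first, and a 'dir' column is
    only added when a --directions string was given.
    """
    strpercs = re.split('[,:]', percentiles)   # comma or colon separated
    percs = [50.0] + [float(p) for p in strpercs]
    if directions:
        columns = ["end-time", "dir", "samples", "min", "avg", "median"]
    else:
        columns = ["end-time", "samples", "min", "avg", "median"]
    columns += [p + '%' for p in strpercs]
    columns.append("max")
    return percs, columns


def select_directions(directions=None):
    """Sketch: map a --directions string such as "rw" to the set of row
    types to print. 'm' (mixed) is used when nothing is requested."""
    d = directions.lower() if directions else ""
    dirs = set()
    if not d or 'm' in d:
        dirs.add('m')
    for c in "rwt":                            # read, write, trim
        if c in d:
            dirs.add(c)
    return dirs
```

For example, build_columns("99.9,99.99", "rw") yields the header end-time, dir, samples, min, avg, median, 99.9%, 99.99%, max, and select_directions("rw") yields {'r', 'w'}, so mixed rows are only printed when 'm' is also requested.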
diff --git a/tools/hist/fiologparser_hist.py.1 b/tools/hist/fiologparser_hist.py.1
index 5dfacfe..449f248 100644
--- a/tools/hist/fiologparser_hist.py.1
+++ b/tools/hist/fiologparser_hist.py.1
@@ -8,7 +8,7 @@ fiologparser_hist.py \- Calculate statistics from fio histograms
 .B fiologparser_hist.py
 is a utility for converting *_clat_hist* files
 generated by fio into a CSV of latency statistics including minimum,
-average, maximum latency, and 50th, 95th, and 99th percentiles.
+average, maximum latency, and selectable percentiles.
 .SH EXAMPLES
 .PP
 .nf
@@ -42,6 +42,9 @@ Interval at which statistics are reported. Defaults to 1000 ms. This
 should be set a minimum of the value for \fBlog_hist_msec\fR as given
 to fio.
 .TP
+.BR \-\-noweight
+Do not perform weighting of samples between output intervals. Default is False.
+.TP
 .BR \-d ", " \-\-divisor \fR=\fPint
 Divide statistics by this value. Defaults to 1. Useful if you want to
 convert latencies from milliseconds to seconds (\fBdivisor\fR=\fP1000\fR).
@@ -53,6 +56,21 @@ Enables warning messages printed to stderr, useful for debugging.
 Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in
 \fPstat.h\fR if fio has been recompiled. Defaults to 19, the
 current value used in fio. See NOTES for more details.
+.TP
+.BR \-\-percentiles \fR=\fPstr
+Pass desired list of comma or colon separated percentiles to print.
+The default is "90.0:95.0:99.0", but min, median (50%) and max are always printed.
+.TP
+.BR \-\-usbin
+Indicates to the parser that histogram bin latency values are in microseconds.
+The default is to use nanoseconds, but histogram logs from fio versions <= 2.99 are in microseconds.
+.TP
+.BR \-\-directions \fR=\fPstr
+By default, histogram bins for all directions (e.g. read and write) are combined
+producing one 'mixed' result.
+To produce independent directional results, pass some combination of
+\'rwtm\' characters with the \-\-directions\fR=\fPrwtm option.
+A \'dir\' column is added indicating the result direction for a row.
 
 .SH NOTES
 end-times are calculated to be uniform increments of the \fB\-\-interval\fR value given,
@@ -87,7 +105,8 @@ min / max seen by fio (and reported in *_clat.* with averaging turned off).
 .PP
 Average statistics use a standard weighted arithmetic mean.
 
-Percentile statistics are computed using the weighted percentile method as
+When the \fB\-\-noweight\fR option is not given (the default),
+percentile statistics are computed using the weighted percentile method as
 described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR.
 See weights() method for details on how weights are computed for individual
 samples. In process_interval() we further multiply by the height of each bin


* Recent changes (master)
@ 2018-03-20 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-20 12:00 UTC
  To: fio

The following changes since commit 69b98f11d62cb12482130fac79b8ebf00c0bb139:

  io_u: only rewind file position if it's non-zero (2018-03-13 11:49:55 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 96344ff00349422172de6fa57899c66dc3c00391:

  optgroup: add check for optgroup bit numbers being within range (2018-03-19 15:56:20 -0600)

----------------------------------------------------------------
Bart Van Assche (7):
      Split mutex.c and .h each into three files
      Rename fio_mutex into fio_sem
      Improve Valgrind instrumentation of memory allocations
      gettime: Rework the clock thread starting mechanism
      Suppress uninteresting data race reports
      Make sure that assert() expressions do not have side effects
      Signal td->free_cond with the associated mutex held

Jens Axboe (3):
      Merge branch 'pthread-cond' of https://github.com/bvanassche/fio
      Merge branch 'master' of https://github.com/bvanassche/fio
      optgroup: add check for optgroup bit numbers being within range

Kris Davis (1):
      sg: add read/write FUA options

 HOWTO            |  12 +++
 Makefile         |  16 +--
 backend.c        |  49 ++++-----
 cgroup.c         |  14 +--
 configure        |  21 ++++
 diskutil.c       |  44 ++++----
 diskutil.h       |   7 +-
 engines/sg.c     |  45 ++++++++
 eta.c            |   6 ++
 file.h           |   2 +-
 filehash.c       |  26 ++---
 filelock.c       |  38 +++----
 filesetup.c      |   7 +-
 fio.1            |  10 +-
 fio.h            |   7 +-
 fio_sem.c        | 178 ++++++++++++++++++++++++++++++
 fio_sem.h        |  31 ++++++
 flow.c           |  18 ++--
 gettime-thread.c |  22 ++--
 gettime.c        |  20 ++--
 helper_thread.c  |  23 ++--
 helper_thread.h  |   2 +-
 init.c           |  11 +-
 io_u.c           |   8 +-
 iolog.c          |   1 +
 mutex.c          | 322 -------------------------------------------------------
 mutex.h          |  47 --------
 optgroup.c       |   3 +
 optgroup.h       |   8 +-
 profiles/act.c   |  14 +--
 pshared.c        |  76 +++++++++++++
 pshared.h        |  10 ++
 rwlock.c         |  83 ++++++++++++++
 rwlock.h         |  19 ++++
 server.c         |  34 +++---
 server.h         |   6 +-
 smalloc.c        |  50 +++++++--
 stat.c           |  18 ++--
 stat.h           |   2 +-
 t/dedupe.c       |  26 ++---
 verify.c         |  11 +-
 workqueue.c      |   1 +
 42 files changed, 766 insertions(+), 582 deletions(-)
 create mode 100644 fio_sem.c
 create mode 100644 fio_sem.h
 delete mode 100644 mutex.c
 delete mode 100644 mutex.h
 create mode 100644 pshared.c
 create mode 100644 pshared.h
 create mode 100644 rwlock.c
 create mode 100644 rwlock.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index acb9e97..dbbbfaa 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1747,6 +1747,7 @@ I/O engine
 			:manpage:`read(2)` and :manpage:`write(2)` for asynchronous
 			I/O. Requires :option:`filename` option to specify either block or
 			character devices.
+			The sg engine includes engine-specific options.
 
 		**null**
 			Doesn't transfer any data, just pretends to.  This is mainly used to
@@ -2068,6 +2069,17 @@ with the caveat that when used on the command line, they must come after the
 	multiple paths exist between the client and the server or in certain loopback
 	configurations.
 
+.. option:: readfua=bool : [sg]
+
+	With the readfua option set to 1, read operations include
+	the force unit access (FUA) flag. Default is 0.
+
+.. option:: writefua=bool : [sg]
+
+	With the writefua option set to 1, write operations include
+	the force unit access (FUA) flag. Default is 0.
+
+
 I/O depth
 ~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index d73b944..eb3bddd 100644
--- a/Makefile
+++ b/Makefile
@@ -41,7 +41,8 @@ endif
 SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/lib/*.c))) \
 		gettime.c ioengines.c init.c stat.c log.c time.c filesetup.c \
-		eta.c verify.c memory.c io_u.c parse.c mutex.c options.c \
+		eta.c verify.c memory.c io_u.c parse.c fio_sem.c rwlock.c \
+		pshared.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
 		engines/ftruncate.c engines/filecreate.c \
@@ -211,7 +212,8 @@ endif
 -include $(OBJS:.o=.d)
 
 T_SMALLOC_OBJS = t/stest.o
-T_SMALLOC_OBJS += gettime.o mutex.o smalloc.o t/log.o t/debug.o t/arch.o
+T_SMALLOC_OBJS += gettime.o fio_sem.o pshared.o smalloc.o t/log.o t/debug.o \
+		  t/arch.o
 T_SMALLOC_PROGS = t/stest
 
 T_IEEE_OBJS = t/ieee754.o
@@ -229,7 +231,8 @@ T_AXMAP_OBJS += lib/lfsr.o lib/axmap.o
 T_AXMAP_PROGS = t/axmap
 
 T_LFSR_TEST_OBJS = t/lfsr-test.o
-T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o t/log.o t/debug.o t/arch.o
+T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o fio_sem.o pshared.o \
+		    t/log.o t/debug.o t/arch.o
 T_LFSR_TEST_PROGS = t/lfsr-test
 
 T_GEN_RAND_OBJS = t/gen-rand.o
@@ -244,9 +247,10 @@ T_BTRACE_FIO_PROGS = t/fio-btrace2fio
 endif
 
 T_DEDUPE_OBJS = t/dedupe.o
-T_DEDUPE_OBJS += lib/rbtree.o t/log.o mutex.o smalloc.o gettime.o crc/md5.o \
-		lib/memalign.o lib/bloom.o t/debug.o crc/xxhash.o t/arch.o \
-		crc/murmur3.o crc/crc32c.o crc/crc32c-intel.o crc/crc32c-arm64.o crc/fnv.o
+T_DEDUPE_OBJS += lib/rbtree.o t/log.o fio_sem.o pshared.o smalloc.o gettime.o \
+		crc/md5.o lib/memalign.o lib/bloom.o t/debug.o crc/xxhash.o \
+		t/arch.o crc/murmur3.o crc/crc32c.o crc/crc32c-intel.o \
+		crc/crc32c-arm64.o crc/fnv.o
 T_DEDUPE_PROGS = t/fio-dedupe
 
 T_VS_OBJS = t/verify-state.o t/log.o crc/crc32c.o crc/crc32c-intel.o crc/crc32c-arm64.o t/debug.o
diff --git a/backend.c b/backend.c
index b4a09ac..d82d494 100644
--- a/backend.c
+++ b/backend.c
@@ -58,8 +58,9 @@
 #include "lib/mountcheck.h"
 #include "rate-submit.h"
 #include "helper_thread.h"
+#include "pshared.h"
 
-static struct fio_mutex *startup_mutex;
+static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
 static char *cgroup_mnt;
 static int exit_value;
@@ -426,7 +427,7 @@ static void check_update_rusage(struct thread_data *td)
 	if (td->update_rusage) {
 		td->update_rusage = 0;
 		update_rusage_stat(td);
-		fio_mutex_up(td->rusage_sem);
+		fio_sem_up(td->rusage_sem);
 	}
 }
 
@@ -1569,11 +1570,11 @@ static void *thread_main(void *data)
 	}
 
 	td_set_runstate(td, TD_INITIALIZED);
-	dprint(FD_MUTEX, "up startup_mutex\n");
-	fio_mutex_up(startup_mutex);
-	dprint(FD_MUTEX, "wait on td->mutex\n");
-	fio_mutex_down(td->mutex);
-	dprint(FD_MUTEX, "done waiting on td->mutex\n");
+	dprint(FD_MUTEX, "up startup_sem\n");
+	fio_sem_up(startup_sem);
+	dprint(FD_MUTEX, "wait on td->sem\n");
+	fio_sem_down(td->sem);
+	dprint(FD_MUTEX, "done waiting on td->sem\n");
 
 	/*
 	 * A new gid requires privilege, so we need to do this before setting
@@ -1802,11 +1803,11 @@ static void *thread_main(void *data)
 		deadlock_loop_cnt = 0;
 		do {
 			check_update_rusage(td);
-			if (!fio_mutex_down_trylock(stat_mutex))
+			if (!fio_sem_down_trylock(stat_sem))
 				break;
 			usleep(1000);
 			if (deadlock_loop_cnt++ > 5000) {
-				log_err("fio seems to be stuck grabbing stat_mutex, forcibly exiting\n");
+				log_err("fio seems to be stuck grabbing stat_sem, forcibly exiting\n");
 				td->error = EDEADLK;
 				goto err;
 			}
@@ -1819,7 +1820,7 @@ static void *thread_main(void *data)
 		if (td_trim(td) && td->io_bytes[DDIR_TRIM])
 			update_runtime(td, elapsed_us, DDIR_TRIM);
 		fio_gettime(&td->start, NULL);
-		fio_mutex_up(stat_mutex);
+		fio_sem_up(stat_sem);
 
 		if (td->error || td->terminate)
 			break;
@@ -1843,10 +1844,10 @@ static void *thread_main(void *data)
 		 */
 		check_update_rusage(td);
 
-		fio_mutex_down(stat_mutex);
+		fio_sem_down(stat_sem);
 		update_runtime(td, elapsed_us, DDIR_READ);
 		fio_gettime(&td->start, NULL);
-		fio_mutex_up(stat_mutex);
+		fio_sem_up(stat_sem);
 
 		if (td->error || td->terminate)
 			break;
@@ -2317,7 +2318,7 @@ reap:
 
 			init_disk_util(td);
 
-			td->rusage_sem = fio_mutex_init(FIO_MUTEX_LOCKED);
+			td->rusage_sem = fio_sem_init(FIO_SEM_LOCKED);
 			td->update_rusage = 0;
 
 			/*
@@ -2362,8 +2363,8 @@ reap:
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
 			}
-			dprint(FD_MUTEX, "wait on startup_mutex\n");
-			if (fio_mutex_down_timeout(startup_mutex, 10000)) {
+			dprint(FD_MUTEX, "wait on startup_sem\n");
+			if (fio_sem_down_timeout(startup_sem, 10000)) {
 				log_err("fio: job startup hung? exiting.\n");
 				fio_terminate_threads(TERMINATE_ALL);
 				fio_abort = 1;
@@ -2371,7 +2372,7 @@ reap:
 				free(fd);
 				break;
 			}
-			dprint(FD_MUTEX, "done waiting on startup_mutex\n");
+			dprint(FD_MUTEX, "done waiting on startup_sem\n");
 		}
 
 		/*
@@ -2430,7 +2431,7 @@ reap:
 			m_rate += ddir_rw_sum(td->o.ratemin);
 			t_rate += ddir_rw_sum(td->o.rate);
 			todo--;
-			fio_mutex_up(td->mutex);
+			fio_sem_up(td->sem);
 		}
 
 		reap_threads(&nr_running, &t_rate, &m_rate);
@@ -2479,13 +2480,13 @@ int fio_backend(struct sk_out *sk_out)
 		setup_log(&agg_io_log[DDIR_TRIM], &p, "agg-trim_bw.log");
 	}
 
-	startup_mutex = fio_mutex_init(FIO_MUTEX_LOCKED);
-	if (startup_mutex == NULL)
+	startup_sem = fio_sem_init(FIO_SEM_LOCKED);
+	if (startup_sem == NULL)
 		return 1;
 
 	set_genesis_time();
 	stat_init();
-	helper_thread_create(startup_mutex, sk_out);
+	helper_thread_create(startup_sem, sk_out);
 
 	cgroup_list = smalloc(sizeof(*cgroup_list));
 	INIT_FLIST_HEAD(cgroup_list);
@@ -2510,11 +2511,11 @@ int fio_backend(struct sk_out *sk_out)
 		steadystate_free(td);
 		fio_options_free(td);
 		if (td->rusage_sem) {
-			fio_mutex_remove(td->rusage_sem);
+			fio_sem_remove(td->rusage_sem);
 			td->rusage_sem = NULL;
 		}
-		fio_mutex_remove(td->mutex);
-		td->mutex = NULL;
+		fio_sem_remove(td->sem);
+		td->sem = NULL;
 	}
 
 	free_disk_util();
@@ -2522,7 +2523,7 @@ int fio_backend(struct sk_out *sk_out)
 	sfree(cgroup_list);
 	sfree(cgroup_mnt);
 
-	fio_mutex_remove(startup_mutex);
+	fio_sem_remove(startup_sem);
 	stat_exit();
 	return exit_value;
 }
diff --git a/cgroup.c b/cgroup.c
index a297e2a..4fab977 100644
--- a/cgroup.c
+++ b/cgroup.c
@@ -11,7 +11,7 @@
 #include "cgroup.h"
 #include "smalloc.h"
 
-static struct fio_mutex *lock;
+static struct fio_sem *lock;
 
 struct cgroup_member {
 	struct flist_head list;
@@ -70,9 +70,9 @@ err:
 	}
 	if (td->o.cgroup_nodelete)
 		cm->cgroup_nodelete = 1;
-	fio_mutex_down(lock);
+	fio_sem_down(lock);
 	flist_add_tail(&cm->list, clist);
-	fio_mutex_up(lock);
+	fio_sem_up(lock);
 }
 
 void cgroup_kill(struct flist_head *clist)
@@ -83,7 +83,7 @@ void cgroup_kill(struct flist_head *clist)
 	if (!lock)
 		return;
 
-	fio_mutex_down(lock);
+	fio_sem_down(lock);
 
 	flist_for_each_safe(n, tmp, clist) {
 		cm = flist_entry(n, struct cgroup_member, list);
@@ -94,7 +94,7 @@ void cgroup_kill(struct flist_head *clist)
 		sfree(cm);
 	}
 
-	fio_mutex_up(lock);
+	fio_sem_up(lock);
 }
 
 static char *get_cgroup_root(struct thread_data *td, char *mnt)
@@ -198,12 +198,12 @@ void cgroup_shutdown(struct thread_data *td, char **mnt)
 
 static void fio_init cgroup_init(void)
 {
-	lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	lock = fio_sem_init(FIO_SEM_UNLOCKED);
 	if (!lock)
 		log_err("fio: failed to allocate cgroup lock\n");
 }
 
 static void fio_exit cgroup_exit(void)
 {
-	fio_mutex_remove(lock);
+	fio_sem_remove(lock);
 }
diff --git a/configure b/configure
index f38e9c7..ddf03a6 100755
--- a/configure
+++ b/configure
@@ -2050,6 +2050,24 @@ fi
 print_config "strndup" "$strndup"
 
 ##########################################
+# <valgrind/drd.h> probe
+# Note: presence of <valgrind/drd.h> implies that <valgrind/valgrind.h> is
+# also available but not the other way around.
+if test "$valgrind_dev" != "yes" ; then
+  valgrind_dev="no"
+fi
+cat > $TMPC << EOF
+#include <valgrind/drd.h>
+int main(int argc, char **argv)
+{
+  return 0;
+}
+EOF
+if compile_prog "" "" "valgrind_dev"; then
+  valgrind_dev="yes"
+fi
+print_config "Valgrind headers" "$valgrind_dev"
+
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
   march_armv8_a_crc_crypto="no"
@@ -2354,6 +2372,9 @@ fi
 if test "$disable_opt" = "yes" ; then
   output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
 fi
+if test "$valgrind_dev" = "yes"; then
+  output_sym "CONFIG_VALGRIND_DEV"
+fi
 if test "$zlib" = "no" ; then
   echo "Consider installing zlib-dev (zlib-devel, some fio features depend on it."
   if test "$build_static" = "yes"; then
diff --git a/diskutil.c b/diskutil.c
index 789071d..dd8fc6a 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -8,6 +8,11 @@
 #include <libgen.h>
 #include <math.h>
 #include <assert.h>
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/drd.h>
+#else
+#define DRD_IGNORE_VAR(x) do { } while (0)
+#endif
 
 #include "fio.h"
 #include "smalloc.h"
@@ -17,7 +22,7 @@
 static int last_majdev, last_mindev;
 static struct disk_util *last_du;
 
-static struct fio_mutex *disk_util_mutex;
+static struct fio_sem *disk_util_sem;
 
 static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 		int majdev, int mindev, char *path);
@@ -35,7 +40,7 @@ static void disk_util_free(struct disk_util *du)
 		slave->users--;
 	}
 
-	fio_mutex_remove(du->lock);
+	fio_sem_remove(du->lock);
 	free(du->sysfs_root);
 	sfree(du);
 }
@@ -120,7 +125,7 @@ int update_io_ticks(void)
 
 	dprint(FD_DISKUTIL, "update io ticks\n");
 
-	fio_mutex_down(disk_util_mutex);
+	fio_sem_down(disk_util_sem);
 
 	if (!helper_should_exit()) {
 		flist_for_each(entry, &disk_list) {
@@ -130,7 +135,7 @@ int update_io_ticks(void)
 	} else
 		ret = 1;
 
-	fio_mutex_up(disk_util_mutex);
+	fio_sem_up(disk_util_sem);
 	return ret;
 }
 
@@ -139,18 +144,18 @@ static struct disk_util *disk_util_exists(int major, int minor)
 	struct flist_head *entry;
 	struct disk_util *du;
 
-	fio_mutex_down(disk_util_mutex);
+	fio_sem_down(disk_util_sem);
 
 	flist_for_each(entry, &disk_list) {
 		du = flist_entry(entry, struct disk_util, list);
 
 		if (major == du->major && minor == du->minor) {
-			fio_mutex_up(disk_util_mutex);
+			fio_sem_up(disk_util_sem);
 			return du;
 		}
 	}
 
-	fio_mutex_up(disk_util_mutex);
+	fio_sem_up(disk_util_sem);
 	return NULL;
 }
 
@@ -297,6 +302,7 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 	if (!du)
 		return NULL;
 
+	DRD_IGNORE_VAR(du->users);
 	memset(du, 0, sizeof(*du));
 	INIT_FLIST_HEAD(&du->list);
 	l = snprintf(du->path, sizeof(du->path), "%s/stat", path);
@@ -312,10 +318,10 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 	du->minor = mindev;
 	INIT_FLIST_HEAD(&du->slavelist);
 	INIT_FLIST_HEAD(&du->slaves);
-	du->lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	du->lock = fio_sem_init(FIO_SEM_UNLOCKED);
 	du->users = 0;
 
-	fio_mutex_down(disk_util_mutex);
+	fio_sem_down(disk_util_sem);
 
 	flist_for_each(entry, &disk_list) {
 		__du = flist_entry(entry, struct disk_util, list);
@@ -324,7 +330,7 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 
 		if (!strcmp((char *) du->dus.name, (char *) __du->dus.name)) {
 			disk_util_free(du);
-			fio_mutex_up(disk_util_mutex);
+			fio_sem_up(disk_util_sem);
 			return __du;
 		}
 	}
@@ -335,7 +341,7 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 	get_io_ticks(du, &du->last_dus);
 
 	flist_add_tail(&du->list, &disk_list);
-	fio_mutex_up(disk_util_mutex);
+	fio_sem_up(disk_util_sem);
 
 	find_add_disk_slaves(td, path, du);
 	return du;
@@ -559,7 +565,7 @@ static void aggregate_slaves_stats(struct disk_util *masterdu)
 
 void disk_util_prune_entries(void)
 {
-	fio_mutex_down(disk_util_mutex);
+	fio_sem_down(disk_util_sem);
 
 	while (!flist_empty(&disk_list)) {
 		struct disk_util *du;
@@ -570,8 +576,8 @@ void disk_util_prune_entries(void)
 	}
 
 	last_majdev = last_mindev = -1;
-	fio_mutex_up(disk_util_mutex);
-	fio_mutex_remove(disk_util_mutex);
+	fio_sem_up(disk_util_sem);
+	fio_sem_remove(disk_util_sem);
 }
 
 void print_disk_util(struct disk_util_stat *dus, struct disk_util_agg *agg,
@@ -693,13 +699,13 @@ void show_disk_util(int terse, struct json_object *parent,
 	struct disk_util *du;
 	bool do_json;
 
-	if (!disk_util_mutex)
+	if (!disk_util_sem)
 		return;
 
-	fio_mutex_down(disk_util_mutex);
+	fio_sem_down(disk_util_sem);
 
 	if (flist_empty(&disk_list)) {
-		fio_mutex_up(disk_util_mutex);
+		fio_sem_up(disk_util_sem);
 		return;
 	}
 
@@ -722,10 +728,10 @@ void show_disk_util(int terse, struct json_object *parent,
 		}
 	}
 
-	fio_mutex_up(disk_util_mutex);
+	fio_sem_up(disk_util_sem);
 }
 
 void setup_disk_util(void)
 {
-	disk_util_mutex = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	disk_util_sem = fio_sem_init(FIO_SEM_UNLOCKED);
 }
diff --git a/diskutil.h b/diskutil.h
index 91b4202..c103578 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -5,6 +5,7 @@
 
 #include "lib/output_buffer.h"
 #include "helper_thread.h"
+#include "fio_sem.h"
 
 struct disk_util_stats {
 	uint64_t ios[2];
@@ -66,7 +67,7 @@ struct disk_util {
 
 	struct timespec time;
 
-	struct fio_mutex *lock;
+	struct fio_sem *lock;
 	unsigned long users;
 };
 
@@ -75,7 +76,7 @@ static inline void disk_util_mod(struct disk_util *du, int val)
 	if (du) {
 		struct flist_head *n;
 
-		fio_mutex_down(du->lock);
+		fio_sem_down(du->lock);
 		du->users += val;
 
 		flist_for_each(n, &du->slavelist) {
@@ -84,7 +85,7 @@ static inline void disk_util_mod(struct disk_util *du, int val)
 			slave = flist_entry(n, struct disk_util, slavelist);
 			slave->users += val;
 		}
-		fio_mutex_up(du->lock);
+		fio_sem_up(du->lock);
 	}
 }
 static inline void disk_util_inc(struct disk_util *du)
diff --git a/engines/sg.c b/engines/sg.c
index 4540b57..f240755 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -12,9 +12,43 @@
 #include <sys/poll.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 
 #ifdef FIO_HAVE_SGIO
 
+
+struct sg_options {
+	void *pad;
+	unsigned int readfua;
+	unsigned int writefua;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "readfua",
+		.lname	= "sg engine read fua flag support",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct sg_options, readfua),
+		.help	= "Set FUA flag (force unit access) for all Read operations",
+		.def	= "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_SG,
+	},
+	{
+		.name	= "writefua",
+		.lname	= "sg engine write fua flag support",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct sg_options, writefua),
+		.help	= "Set FUA flag (force unit access) for all Write operations",
+		.def	= "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_SG,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
 #define MAX_10B_LBA  0xFFFFFFFFULL
 #define SCSI_TIMEOUT_MS 30000   // 30 second timeout; currently no method to override
 #define MAX_SB 64               // sense block maximum return size
@@ -267,6 +301,7 @@ static int fio_sgio_doio(struct thread_data *td, struct io_u *io_u, int do_sync)
 static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
+	struct sg_options *o = td->eo;
 	struct sgio_data *sd = td->io_ops_data;
 	long long nr_blocks, lba;
 
@@ -286,6 +321,10 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 			hdr->cmdp[0] = 0x28; // read(10)
 		else
 			hdr->cmdp[0] = 0x88; // read(16)
+
+		if (o->readfua)
+			hdr->cmdp[1] |= 0x08;
+
 	} else if (io_u->ddir == DDIR_WRITE) {
 		sgio_hdr_init(sd, hdr, io_u, 1);
 
@@ -294,6 +333,10 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 			hdr->cmdp[0] = 0x2a; // write(10)
 		else
 			hdr->cmdp[0] = 0x8a; // write(16)
+
+		if (o->writefua)
+			hdr->cmdp[1] |= 0x08;
+
 	} else {
 		sgio_hdr_init(sd, hdr, io_u, 0);
 		hdr->dxfer_direction = SG_DXFER_NONE;
@@ -822,6 +865,8 @@ static struct ioengine_ops ioengine = {
 	.close_file	= generic_close_file,
 	.get_file_size	= fio_sgio_get_file_size,
 	.flags		= FIO_SYNCIO | FIO_RAWIO,
+	.options	= options,
+	.option_struct_size	= sizeof(struct sg_options)
 };
 
 #else /* FIO_HAVE_SGIO */
diff --git a/eta.c b/eta.c
index 0b79526..3126f21 100644
--- a/eta.c
+++ b/eta.c
@@ -4,6 +4,11 @@
 #include <unistd.h>
 #include <fcntl.h>
 #include <string.h>
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/drd.h>
+#else
+#define DRD_IGNORE_VAR(x) do { } while (0)
+#endif
 
 #include "fio.h"
 #include "lib/pow2.h"
@@ -668,6 +673,7 @@ void print_thread_status(void)
 
 void print_status_init(int thr_number)
 {
+	DRD_IGNORE_VAR(__run_str);
 	__run_str[thr_number] = 'P';
 	update_condensed_str(__run_str, run_str);
 }
diff --git a/file.h b/file.h
index cc721ee..8fd34b1 100644
--- a/file.h
+++ b/file.h
@@ -125,7 +125,7 @@ struct fio_file {
 	 * if io is protected by a semaphore, this is set
 	 */
 	union {
-		struct fio_mutex *lock;
+		struct fio_sem *lock;
 		struct fio_rwlock *rwlock;
 	};
 
diff --git a/filehash.c b/filehash.c
index edeeab4..b55ab73 100644
--- a/filehash.c
+++ b/filehash.c
@@ -16,7 +16,7 @@
 static unsigned int file_hash_size = HASH_BUCKETS * sizeof(struct flist_head);
 
 static struct flist_head *file_hash;
-static struct fio_mutex *hash_lock;
+static struct fio_sem *hash_lock;
 static struct bloom *file_bloom;
 
 static unsigned short hash(const char *name)
@@ -27,18 +27,18 @@ static unsigned short hash(const char *name)
 void fio_file_hash_lock(void)
 {
 	if (hash_lock)
-		fio_mutex_down(hash_lock);
+		fio_sem_down(hash_lock);
 }
 
 void fio_file_hash_unlock(void)
 {
 	if (hash_lock)
-		fio_mutex_up(hash_lock);
+		fio_sem_up(hash_lock);
 }
 
 void remove_file_hash(struct fio_file *f)
 {
-	fio_mutex_down(hash_lock);
+	fio_sem_down(hash_lock);
 
 	if (fio_file_hashed(f)) {
 		assert(!flist_empty(&f->hash_list));
@@ -46,7 +46,7 @@ void remove_file_hash(struct fio_file *f)
 		fio_file_clear_hashed(f);
 	}
 
-	fio_mutex_up(hash_lock);
+	fio_sem_up(hash_lock);
 }
 
 static struct fio_file *__lookup_file_hash(const char *name)
@@ -73,9 +73,9 @@ struct fio_file *lookup_file_hash(const char *name)
 {
 	struct fio_file *f;
 
-	fio_mutex_down(hash_lock);
+	fio_sem_down(hash_lock);
 	f = __lookup_file_hash(name);
-	fio_mutex_up(hash_lock);
+	fio_sem_up(hash_lock);
 	return f;
 }
 
@@ -88,7 +88,7 @@ struct fio_file *add_file_hash(struct fio_file *f)
 
 	INIT_FLIST_HEAD(&f->hash_list);
 
-	fio_mutex_down(hash_lock);
+	fio_sem_down(hash_lock);
 
 	alias = __lookup_file_hash(f->file_name);
 	if (!alias) {
@@ -96,7 +96,7 @@ struct fio_file *add_file_hash(struct fio_file *f)
 		flist_add_tail(&f->hash_list, &file_hash[hash(f->file_name)]);
 	}
 
-	fio_mutex_up(hash_lock);
+	fio_sem_up(hash_lock);
 	return alias;
 }
 
@@ -109,17 +109,17 @@ void file_hash_exit(void)
 {
 	unsigned int i, has_entries = 0;
 
-	fio_mutex_down(hash_lock);
+	fio_sem_down(hash_lock);
 	for (i = 0; i < HASH_BUCKETS; i++)
 		has_entries += !flist_empty(&file_hash[i]);
-	fio_mutex_up(hash_lock);
+	fio_sem_up(hash_lock);
 
 	if (has_entries)
 		log_err("fio: file hash not empty on exit\n");
 
 	sfree(file_hash);
 	file_hash = NULL;
-	fio_mutex_remove(hash_lock);
+	fio_sem_remove(hash_lock);
 	hash_lock = NULL;
 	bloom_free(file_bloom);
 	file_bloom = NULL;
@@ -134,6 +134,6 @@ void file_hash_init(void)
 	for (i = 0; i < HASH_BUCKETS; i++)
 		INIT_FLIST_HEAD(&file_hash[i]);
 
-	hash_lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	hash_lock = fio_sem_init(FIO_SEM_UNLOCKED);
 	file_bloom = bloom_new(BLOOM_SIZE);
 }
diff --git a/filelock.c b/filelock.c
index 6e84970..cc98aaf 100644
--- a/filelock.c
+++ b/filelock.c
@@ -11,13 +11,13 @@
 #include "flist.h"
 #include "filelock.h"
 #include "smalloc.h"
-#include "mutex.h"
+#include "fio_sem.h"
 #include "hash.h"
 #include "log.h"
 
 struct fio_filelock {
 	uint32_t hash;
-	struct fio_mutex lock;
+	struct fio_sem lock;
 	struct flist_head list;
 	unsigned int references;
 };
@@ -26,7 +26,7 @@ struct fio_filelock {
 	
 static struct filelock_data {
 	struct flist_head list;
-	struct fio_mutex lock;
+	struct fio_sem lock;
 
 	struct flist_head free_list;
 	struct fio_filelock ffs[MAX_FILELOCKS];
@@ -58,9 +58,9 @@ static struct fio_filelock *get_filelock(int trylock, int *retry)
 		if (ff || trylock)
 			break;
 
-		fio_mutex_up(&fld->lock);
+		fio_sem_up(&fld->lock);
 		usleep(1000);
-		fio_mutex_down(&fld->lock);
+		fio_sem_down(&fld->lock);
 		*retry = 1;
 	} while (1);
 
@@ -78,13 +78,13 @@ int fio_filelock_init(void)
 	INIT_FLIST_HEAD(&fld->list);
 	INIT_FLIST_HEAD(&fld->free_list);
 
-	if (__fio_mutex_init(&fld->lock, FIO_MUTEX_UNLOCKED))
+	if (__fio_sem_init(&fld->lock, FIO_SEM_UNLOCKED))
 		goto err;
 
 	for (i = 0; i < MAX_FILELOCKS; i++) {
 		struct fio_filelock *ff = &fld->ffs[i];
 
-		if (__fio_mutex_init(&ff->lock, FIO_MUTEX_UNLOCKED))
+		if (__fio_sem_init(&ff->lock, FIO_SEM_UNLOCKED))
 			goto err;
 		flist_add_tail(&ff->list, &fld->free_list);
 	}
@@ -101,7 +101,7 @@ void fio_filelock_exit(void)
 		return;
 
 	assert(flist_empty(&fld->list));
-	__fio_mutex_remove(&fld->lock);
+	__fio_sem_remove(&fld->lock);
 
 	while (!flist_empty(&fld->free_list)) {
 		struct fio_filelock *ff;
@@ -109,7 +109,7 @@ void fio_filelock_exit(void)
 		ff = flist_first_entry(&fld->free_list, struct fio_filelock, list);
 
 		flist_del_init(&ff->list);
-		__fio_mutex_remove(&ff->lock);
+		__fio_sem_remove(&ff->lock);
 	}
 
 	sfree(fld);
@@ -172,11 +172,11 @@ static bool __fio_lock_file(const char *fname, int trylock)
 
 	hash = jhash(fname, strlen(fname), 0);
 
-	fio_mutex_down(&fld->lock);
+	fio_sem_down(&fld->lock);
 	ff = fio_hash_get(hash, trylock);
 	if (ff)
 		ff->references++;
-	fio_mutex_up(&fld->lock);
+	fio_sem_up(&fld->lock);
 
 	if (!ff) {
 		assert(!trylock);
@@ -184,14 +184,14 @@ static bool __fio_lock_file(const char *fname, int trylock)
 	}
 
 	if (!trylock) {
-		fio_mutex_down(&ff->lock);
+		fio_sem_down(&ff->lock);
 		return false;
 	}
 
-	if (!fio_mutex_down_trylock(&ff->lock))
+	if (!fio_sem_down_trylock(&ff->lock))
 		return false;
 
-	fio_mutex_down(&fld->lock);
+	fio_sem_down(&fld->lock);
 
 	/*
 	 * If we raced and the only reference to the lock is us, we can
@@ -202,10 +202,10 @@ static bool __fio_lock_file(const char *fname, int trylock)
 		ff = NULL;
 	}
 
-	fio_mutex_up(&fld->lock);
+	fio_sem_up(&fld->lock);
 
 	if (ff) {
-		fio_mutex_down(&ff->lock);
+		fio_sem_down(&ff->lock);
 		return false;
 	}
 
@@ -229,12 +229,12 @@ void fio_unlock_file(const char *fname)
 
 	hash = jhash(fname, strlen(fname), 0);
 
-	fio_mutex_down(&fld->lock);
+	fio_sem_down(&fld->lock);
 
 	ff = fio_hash_find(hash);
 	if (ff) {
 		int refs = --ff->references;
-		fio_mutex_up(&ff->lock);
+		fio_sem_up(&ff->lock);
 		if (!refs) {
 			flist_del_init(&ff->list);
 			put_filelock(ff);
@@ -242,5 +242,5 @@ void fio_unlock_file(const char *fname)
 	} else
 		log_err("fio: file not found for unlocking\n");
 
-	fio_mutex_up(&fld->lock);
+	fio_sem_up(&fld->lock);
 }
diff --git a/filesetup.c b/filesetup.c
index 1a187ff..7cbce13 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -15,6 +15,7 @@
 #include "os/os.h"
 #include "hash.h"
 #include "lib/axmap.h"
+#include "rwlock.h"
 
 #ifdef CONFIG_LINUX_FALLOCATE
 #include <linux/falloc.h>
@@ -1621,7 +1622,7 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 		f->rwlock = fio_rwlock_init();
 		break;
 	case FILE_LOCK_EXCLUSIVE:
-		f->lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+		f->lock = fio_sem_init(FIO_SEM_UNLOCKED);
 		break;
 	default:
 		log_err("fio: unknown lock mode: %d\n", td->o.file_lock_mode);
@@ -1706,7 +1707,7 @@ void lock_file(struct thread_data *td, struct fio_file *f, enum fio_ddir ddir)
 		else
 			fio_rwlock_write(f->rwlock);
 	} else if (td->o.file_lock_mode == FILE_LOCK_EXCLUSIVE)
-		fio_mutex_down(f->lock);
+		fio_sem_down(f->lock);
 
 	td->file_locks[f->fileno] = td->o.file_lock_mode;
 }
@@ -1719,7 +1720,7 @@ void unlock_file(struct thread_data *td, struct fio_file *f)
 	if (td->o.file_lock_mode == FILE_LOCK_READWRITE)
 		fio_rwlock_unlock(f->rwlock);
 	else if (td->o.file_lock_mode == FILE_LOCK_EXCLUSIVE)
-		fio_mutex_up(f->lock);
+		fio_sem_up(f->lock);
 
 	td->file_locks[f->fileno] = FILE_LOCK_NONE;
 }
diff --git a/fio.1 b/fio.1
index f955167..5ca57ce 100644
--- a/fio.1
+++ b/fio.1
@@ -1523,7 +1523,7 @@ SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
 ioctl, or if the target is an sg character device we use
 \fBread\fR\|(2) and \fBwrite\fR\|(2) for asynchronous
 I/O. Requires \fBfilename\fR option to specify either block or
-character devices.
+character devices. The sg engine includes engine-specific options.
 .TP
 .B null
 Doesn't transfer any data, just pretends to. This is mainly used to
@@ -1820,6 +1820,14 @@ server side this will be passed into the rdma_bind_addr() function and
 on the client site it will be used in the rdma_resolve_add()
 function. This can be useful when multiple paths exist between the
 client and the server or in certain loopback configurations.
+.TP
+.BI (sg)readfua \fR=\fPbool
+With the readfua option set to 1, read operations include the force
+unit access (FUA) flag. Default: 0.
+.TP
+.BI (sg)writefua \fR=\fPbool
+With the writefua option set to 1, write operations include the force
+unit access (FUA) flag. Default: 0.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint
diff --git a/fio.h b/fio.h
index 85546c5..9551048 100644
--- a/fio.h
+++ b/fio.h
@@ -20,7 +20,6 @@
 #include "fifo.h"
 #include "arch/arch.h"
 #include "os/os.h"
-#include "mutex.h"
 #include "log.h"
 #include "debug.h"
 #include "file.h"
@@ -63,6 +62,8 @@
 #include <cuda.h>
 #endif
 
+struct fio_sem;
+
 /*
  * offset generator types
  */
@@ -198,7 +199,7 @@ struct thread_data {
 	struct timespec iops_sample_time;
 
 	volatile int update_rusage;
-	struct fio_mutex *rusage_sem;
+	struct fio_sem *rusage_sem;
 	struct rusage ru_start;
 	struct rusage ru_end;
 
@@ -341,7 +342,7 @@ struct thread_data {
 	uint64_t this_io_bytes[DDIR_RWDIR_CNT];
 	uint64_t io_skip_bytes;
 	uint64_t zone_bytes;
-	struct fio_mutex *mutex;
+	struct fio_sem *sem;
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 
 	/*
diff --git a/fio_sem.c b/fio_sem.c
new file mode 100644
index 0000000..20fcfcc
--- /dev/null
+++ b/fio_sem.c
@@ -0,0 +1,178 @@
+#include <string.h>
+#include <sys/mman.h>
+#include <assert.h>
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/valgrind.h>
+#else
+#define RUNNING_ON_VALGRIND 0
+#endif
+
+#include "log.h"
+#include "fio_sem.h"
+#include "pshared.h"
+#include "os/os.h"
+#include "fio_time.h"
+#include "gettime.h"
+
+void __fio_sem_remove(struct fio_sem *sem)
+{
+	assert(sem->magic == FIO_SEM_MAGIC);
+	pthread_mutex_destroy(&sem->lock);
+	pthread_cond_destroy(&sem->cond);
+
+	/*
+	 * When not running on Valgrind, ensure any subsequent attempt to grab
+	 * this semaphore will fail with an assert, instead of just silently
+	 * hanging. When running on Valgrind, let Valgrind detect
+	 * use-after-free.
+	 */
+	if (!RUNNING_ON_VALGRIND)
+		memset(sem, 0, sizeof(*sem));
+}
+
+void fio_sem_remove(struct fio_sem *sem)
+{
+	__fio_sem_remove(sem);
+	munmap((void *) sem, sizeof(*sem));
+}
+
+int __fio_sem_init(struct fio_sem *sem, int value)
+{
+	int ret;
+
+	sem->value = value;
+	/* Initialize .waiters explicitly for Valgrind. */
+	sem->waiters = 0;
+	sem->magic = FIO_SEM_MAGIC;
+
+	ret = mutex_cond_init_pshared(&sem->lock, &sem->cond);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+struct fio_sem *fio_sem_init(int value)
+{
+	struct fio_sem *sem = NULL;
+
+	sem = (void *) mmap(NULL, sizeof(struct fio_sem),
+				PROT_READ | PROT_WRITE,
+				OS_MAP_ANON | MAP_SHARED, -1, 0);
+	if (sem == MAP_FAILED) {
+		perror("mmap semaphore");
+		return NULL;
+	}
+
+	if (!__fio_sem_init(sem, value))
+		return sem;
+
+	fio_sem_remove(sem);
+	return NULL;
+}
+
+static bool sem_timed_out(struct timespec *t, unsigned int msecs)
+{
+	struct timeval tv;
+	struct timespec now;
+
+	gettimeofday(&tv, NULL);
+	now.tv_sec = tv.tv_sec;
+	now.tv_nsec = tv.tv_usec * 1000;
+
+	return mtime_since(t, &now) >= msecs;
+}
+
+int fio_sem_down_timeout(struct fio_sem *sem, unsigned int msecs)
+{
+	struct timeval tv_s;
+	struct timespec base;
+	struct timespec t;
+	int ret = 0;
+
+	assert(sem->magic == FIO_SEM_MAGIC);
+
+	gettimeofday(&tv_s, NULL);
+	base.tv_sec = t.tv_sec = tv_s.tv_sec;
+	base.tv_nsec = t.tv_nsec = tv_s.tv_usec * 1000;
+
+	t.tv_sec += msecs / 1000;
+	t.tv_nsec += ((msecs * 1000000ULL) % 1000000000);
+	if (t.tv_nsec >= 1000000000) {
+		t.tv_nsec -= 1000000000;
+		t.tv_sec++;
+	}
+
+	pthread_mutex_lock(&sem->lock);
+
+	sem->waiters++;
+	while (!sem->value && !ret) {
+		/*
+		 * Some platforms (FreeBSD 9?) seems to return timed out
+		 * way too early, double check.
+		 */
+		ret = pthread_cond_timedwait(&sem->cond, &sem->lock, &t);
+		if (ret == ETIMEDOUT && !sem_timed_out(&base, msecs))
+			ret = 0;
+	}
+	sem->waiters--;
+
+	if (!ret) {
+		sem->value--;
+		pthread_mutex_unlock(&sem->lock);
+		return 0;
+	}
+
+	pthread_mutex_unlock(&sem->lock);
+	return ret;
+}
+
+bool fio_sem_down_trylock(struct fio_sem *sem)
+{
+	bool ret = true;
+
+	assert(sem->magic == FIO_SEM_MAGIC);
+
+	pthread_mutex_lock(&sem->lock);
+	if (sem->value) {
+		sem->value--;
+		ret = false;
+	}
+	pthread_mutex_unlock(&sem->lock);
+
+	return ret;
+}
+
+void fio_sem_down(struct fio_sem *sem)
+{
+	assert(sem->magic == FIO_SEM_MAGIC);
+
+	pthread_mutex_lock(&sem->lock);
+
+	while (!sem->value) {
+		sem->waiters++;
+		pthread_cond_wait(&sem->cond, &sem->lock);
+		sem->waiters--;
+	}
+
+	sem->value--;
+	pthread_mutex_unlock(&sem->lock);
+}
+
+void fio_sem_up(struct fio_sem *sem)
+{
+	int do_wake = 0;
+
+	assert(sem->magic == FIO_SEM_MAGIC);
+
+	pthread_mutex_lock(&sem->lock);
+	read_barrier();
+	if (!sem->value && sem->waiters)
+		do_wake = 1;
+	sem->value++;
+
+	if (do_wake)
+		pthread_cond_signal(&sem->cond);
+
+	pthread_mutex_unlock(&sem->lock);
+}
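The new fio_sem.c implements a counting semaphore from a pthread mutex, a condition variable, a `value` count and a `waiters` count. The sketch below shows the same technique on its own, under these assumptions:

- All names (`demo_sem`, `demo_sem_*`) are hypothetical.
- It uses default (process-private) attributes. fio instead mmaps the object `MAP_SHARED` and initialises it with `PTHREAD_PROCESS_SHARED`, so the sketch is only valid between threads of one process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical demo_sem: same fields as struct fio_sem, minus the magic. */
struct demo_sem {
	pthread_mutex_t lock;	/* protects value and waiters */
	pthread_cond_t cond;	/* waiters sleep here until value > 0 */
	int value;		/* tokens available (0 == "locked") */
	int waiters;		/* threads currently inside demo_sem_down() */
};

static int demo_sem_init(struct demo_sem *s, int value)
{
	int ret;

	s->value = value;
	s->waiters = 0;
	ret = pthread_mutex_init(&s->lock, NULL);
	if (ret)
		return ret;
	ret = pthread_cond_init(&s->cond, NULL);
	if (ret)
		pthread_mutex_destroy(&s->lock);
	return ret;
}

static void demo_sem_destroy(struct demo_sem *s)
{
	pthread_cond_destroy(&s->cond);
	pthread_mutex_destroy(&s->lock);
}

/* Block until a token is available, then take it. */
static void demo_sem_down(struct demo_sem *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->value) {	/* loop: condvar wakeups may be spurious */
		s->waiters++;
		pthread_cond_wait(&s->cond, &s->lock);
		s->waiters--;
	}
	s->value--;
	pthread_mutex_unlock(&s->lock);
}

/* Same convention as fio_sem_down_trylock(): true means "did NOT get it". */
static bool demo_sem_down_trylock(struct demo_sem *s)
{
	bool busy = true;

	pthread_mutex_lock(&s->lock);
	if (s->value) {
		s->value--;
		busy = false;
	}
	pthread_mutex_unlock(&s->lock);
	return busy;
}

/*
 * Return a token. Signal whenever someone is waiting, not only on the
 * 0 -> 1 transition, so a count above 1 cannot strand a sleeper.
 */
static void demo_sem_up(struct demo_sem *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->waiters)
		pthread_cond_signal(&s->cond);
	s->value++;
	pthread_mutex_unlock(&s->lock);
}
```

One design difference from fio_sem_up() above is deliberate. fio_sem_up() only signals when `value` was 0. For the 0/1 (`FIO_SEM_LOCKED`/`FIO_SEM_UNLOCKED`) usage in this diff, that appears sufficient. If the count were raised twice while two threads were blocked, though, only one signal would be sent. The sketch signals whenever `waiters` is non-zero to avoid that case.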
diff --git a/fio_sem.h b/fio_sem.h
new file mode 100644
index 0000000..a796ddd
--- /dev/null
+++ b/fio_sem.h
@@ -0,0 +1,31 @@
+#ifndef FIO_SEM_H
+#define FIO_SEM_H
+
+#include <pthread.h>
+#include "lib/types.h"
+
+#define FIO_SEM_MAGIC		0x4d555445U
+
+struct fio_sem {
+	pthread_mutex_t lock;
+	pthread_cond_t cond;
+	int value;
+	int waiters;
+	int magic;
+};
+
+enum {
+	FIO_SEM_LOCKED	= 0,
+	FIO_SEM_UNLOCKED	= 1,
+};
+
+extern int __fio_sem_init(struct fio_sem *, int);
+extern struct fio_sem *fio_sem_init(int);
+extern void __fio_sem_remove(struct fio_sem *);
+extern void fio_sem_remove(struct fio_sem *);
+extern void fio_sem_up(struct fio_sem *);
+extern void fio_sem_down(struct fio_sem *);
+extern bool fio_sem_down_trylock(struct fio_sem *);
+extern int fio_sem_down_timeout(struct fio_sem *, unsigned int);
+
+#endif
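`fio_sem_down_timeout()` takes a relative timeout in milliseconds, but `pthread_cond_timedwait()` expects an absolute deadline. The implementation above therefore adds `msecs` to the current time and normalises `tv_nsec` back into `[0, 1e9)`.

The sketch below isolates that conversion. `deadline_after()` and `deadline_from_now()` are hypothetical helper names. It reads the clock with `clock_gettime(CLOCK_REALTIME)` where fio uses `gettimeofday()`; `CLOCK_REALTIME` is the default clock for `pthread_cond_timedwait()`.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: *base + msecs, keeping tv_nsec in [0, NSEC_PER_SEC). */
static struct timespec deadline_after(const struct timespec *base,
				      unsigned int msecs)
{
	struct timespec t = *base;

	t.tv_sec += msecs / 1000;
	/* < 1e9 + 999e6, so this still fits in a 32-bit long */
	t.tv_nsec += (long)(msecs % 1000) * 1000000L;
	if (t.tv_nsec >= NSEC_PER_SEC) {
		t.tv_nsec -= NSEC_PER_SEC;
		t.tv_sec++;
	}
	return t;
}

/* Absolute deadline suitable for pthread_cond_timedwait() with default attrs. */
static struct timespec deadline_from_now(unsigned int msecs)
{
	struct timespec now;

	clock_gettime(CLOCK_REALTIME, &now);
	return deadline_after(&now, msecs);
}
```

fio additionally rechecks the elapsed time with `mtime_since()` after `ETIMEDOUT`, because the in-code comment reports that some platforms time out too early. The sketch leaves that retry loop to the caller.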
diff --git a/flow.c b/flow.c
index 384187e..a8dbfb9 100644
--- a/flow.c
+++ b/flow.c
@@ -1,5 +1,5 @@
 #include "fio.h"
-#include "mutex.h"
+#include "fio_sem.h"
 #include "smalloc.h"
 #include "flist.h"
 
@@ -11,7 +11,7 @@ struct fio_flow {
 };
 
 static struct flist_head *flow_list;
-static struct fio_mutex *flow_lock;
+static struct fio_sem *flow_lock;
 
 int flow_threshold_exceeded(struct thread_data *td)
 {
@@ -49,7 +49,7 @@ static struct fio_flow *flow_get(unsigned int id)
 	if (!flow_lock)
 		return NULL;
 
-	fio_mutex_down(flow_lock);
+	fio_sem_down(flow_lock);
 
 	flist_for_each(n, flow_list) {
 		flow = flist_entry(n, struct fio_flow, list);
@@ -62,7 +62,7 @@ static struct fio_flow *flow_get(unsigned int id)
 	if (!flow) {
 		flow = smalloc(sizeof(*flow));
 		if (!flow) {
-			fio_mutex_up(flow_lock);
+			fio_sem_up(flow_lock);
 			return NULL;
 		}
 		flow->refs = 0;
@@ -74,7 +74,7 @@ static struct fio_flow *flow_get(unsigned int id)
 	}
 
 	flow->refs++;
-	fio_mutex_up(flow_lock);
+	fio_sem_up(flow_lock);
 	return flow;
 }
 
@@ -83,14 +83,14 @@ static void flow_put(struct fio_flow *flow)
 	if (!flow_lock)
 		return;
 
-	fio_mutex_down(flow_lock);
+	fio_sem_down(flow_lock);
 
 	if (!--flow->refs) {
 		flist_del(&flow->list);
 		sfree(flow);
 	}
 
-	fio_mutex_up(flow_lock);
+	fio_sem_up(flow_lock);
 }
 
 void flow_init_job(struct thread_data *td)
@@ -115,7 +115,7 @@ void flow_init(void)
 		return;
 	}
 
-	flow_lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	flow_lock = fio_sem_init(FIO_SEM_UNLOCKED);
 	if (!flow_lock) {
 		log_err("fio: failed to allocate flow lock\n");
 		sfree(flow_list);
@@ -128,7 +128,7 @@ void flow_init(void)
 void flow_exit(void)
 {
 	if (flow_lock)
-		fio_mutex_remove(flow_lock);
+		fio_sem_remove(flow_lock);
 	if (flow_list)
 		sfree(flow_list);
 }
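In flow.c, `flow_lock` is a `fio_sem` created `FIO_SEM_UNLOCKED`, so `flow_get()`/`flow_put()` use it as a plain mutex around a refcounted find-or-create list.

The sketch below shows that pattern with a pthread mutex and a singly linked list, under these assumptions:

- The names are hypothetical.
- It uses `calloc()`/`free()`, whereas fio uses `smalloc()`/`sfree()` and its `flist` helpers because jobs may be separate processes. The sketch therefore only works within one process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fio_flow: an id plus a reference count. */
struct flow {
	struct flow *next;
	unsigned int id;
	unsigned int refs;
};

static struct flow *flow_head;
static pthread_mutex_t flow_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Find the flow for @id, creating it on first use; takes a reference. */
static struct flow *flow_get_ref(unsigned int id)
{
	struct flow *f;

	pthread_mutex_lock(&flow_mtx);
	for (f = flow_head; f; f = f->next)
		if (f->id == id)
			break;
	if (!f) {
		f = calloc(1, sizeof(*f));
		if (!f) {
			/* never return with the lock still held */
			pthread_mutex_unlock(&flow_mtx);
			return NULL;
		}
		f->id = id;
		f->next = flow_head;
		flow_head = f;
	}
	f->refs++;
	pthread_mutex_unlock(&flow_mtx);
	return f;
}

/* Drop a reference; unlink and free the flow when the last one goes. */
static void flow_put_ref(struct flow *f)
{
	struct flow **pp;

	pthread_mutex_lock(&flow_mtx);
	if (!--f->refs) {
		for (pp = &flow_head; *pp; pp = &(*pp)->next) {
			if (*pp == f) {
				*pp = f->next;
				break;
			}
		}
		free(f);
	}
	pthread_mutex_unlock(&flow_mtx);
}
```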
diff --git a/gettime-thread.c b/gettime-thread.c
index fc52236..87f5060 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -35,18 +35,18 @@ static void fio_gtod_update(void)
 }
 
 struct gtod_cpu_data {
-	struct fio_mutex *mutex;
+	struct fio_sem *sem;
 	unsigned int cpu;
 };
 
 static void *gtod_thread_main(void *data)
 {
-	struct fio_mutex *mutex = data;
+	struct fio_sem *sem = data;
 	int ret;
 
 	ret = fio_setaffinity(gettid(), fio_gtod_cpumask);
 
-	fio_mutex_up(mutex);
+	fio_sem_up(sem);
 
 	if (ret == -1) {
 		log_err("gtod: setaffinity failed\n");
@@ -69,17 +69,17 @@ static void *gtod_thread_main(void *data)
 
 int fio_start_gtod_thread(void)
 {
-	struct fio_mutex *mutex;
+	struct fio_sem *sem;
 	pthread_attr_t attr;
 	int ret;
 
-	mutex = fio_mutex_init(FIO_MUTEX_LOCKED);
-	if (!mutex)
+	sem = fio_sem_init(FIO_SEM_LOCKED);
+	if (!sem)
 		return 1;
 
 	pthread_attr_init(&attr);
 	pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);
-	ret = pthread_create(&gtod_thread, &attr, gtod_thread_main, mutex);
+	ret = pthread_create(&gtod_thread, &attr, gtod_thread_main, sem);
 	pthread_attr_destroy(&attr);
 	if (ret) {
 		log_err("Can't create gtod thread: %s\n", strerror(ret));
@@ -92,11 +92,11 @@ int fio_start_gtod_thread(void)
 		goto err;
 	}
 
-	dprint(FD_MUTEX, "wait on startup_mutex\n");
-	fio_mutex_down(mutex);
-	dprint(FD_MUTEX, "done waiting on startup_mutex\n");
+	dprint(FD_MUTEX, "wait on startup_sem\n");
+	fio_sem_down(sem);
+	dprint(FD_MUTEX, "done waiting on startup_sem\n");
 err:
-	fio_mutex_remove(mutex);
+	fio_sem_remove(sem);
 	return ret;
 }
 
diff --git a/gettime.c b/gettime.c
index c256a96..57c66f7 100644
--- a/gettime.c
+++ b/gettime.c
@@ -8,6 +8,7 @@
 #include <time.h>
 
 #include "fio.h"
+#include "fio_sem.h"
 #include "smalloc.h"
 
 #include "hash.h"
@@ -563,8 +564,7 @@ struct clock_thread {
 	pthread_t thread;
 	int cpu;
 	int debug;
-	pthread_mutex_t lock;
-	pthread_mutex_t started;
+	struct fio_sem lock;
 	unsigned long nr_entries;
 	uint32_t *seq;
 	struct clock_entry *entries;
@@ -600,8 +600,7 @@ static void *clock_thread_fn(void *data)
 		goto err;
 	}
 
-	pthread_mutex_lock(&t->lock);
-	pthread_mutex_unlock(&t->started);
+	fio_sem_down(&t->lock);
 
 	first = get_cpu_clock();
 	c = &t->entries[0];
@@ -702,9 +701,7 @@ int fio_monotonic_clocktest(int debug)
 		t->seq = &seq;
 		t->nr_entries = nr_entries;
 		t->entries = &entries[i * nr_entries];
-		pthread_mutex_init(&t->lock, NULL);
-		pthread_mutex_init(&t->started, NULL);
-		pthread_mutex_lock(&t->lock);
+		__fio_sem_init(&t->lock, FIO_SEM_LOCKED);
 		if (pthread_create(&t->thread, NULL, clock_thread_fn, t)) {
 			failed++;
 			nr_cpus = i;
@@ -715,13 +712,7 @@ int fio_monotonic_clocktest(int debug)
 	for (i = 0; i < nr_cpus; i++) {
 		struct clock_thread *t = &cthreads[i];
 
-		pthread_mutex_lock(&t->started);
-	}
-
-	for (i = 0; i < nr_cpus; i++) {
-		struct clock_thread *t = &cthreads[i];
-
-		pthread_mutex_unlock(&t->lock);
+		fio_sem_up(&t->lock);
 	}
 
 	for (i = 0; i < nr_cpus; i++) {
@@ -731,6 +722,7 @@ int fio_monotonic_clocktest(int debug)
 		pthread_join(t->thread, &ret);
 		if (ret)
 			failed++;
+		__fio_sem_remove(&t->lock);
 	}
 	free(cthreads);
 
diff --git a/helper_thread.c b/helper_thread.c
index b05f821..f0c717f 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -1,7 +1,14 @@
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/drd.h>
+#else
+#define DRD_IGNORE_VAR(x) do { } while (0)
+#endif
+
 #include "fio.h"
 #include "smalloc.h"
 #include "helper_thread.h"
 #include "steadystate.h"
+#include "pshared.h"
 
 static struct helper_data {
 	volatile int exit;
@@ -11,7 +18,7 @@ static struct helper_data {
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t cond;
-	struct fio_mutex *startup_mutex;
+	struct fio_sem *startup_sem;
 } *helper_data;
 
 void helper_thread_destroy(void)
@@ -83,7 +90,7 @@ static void *helper_thread_main(void *data)
 	memcpy(&last_du, &ts, sizeof(ts));
 	memcpy(&last_ss, &ts, sizeof(ts));
 
-	fio_mutex_up(hd->startup_mutex);
+	fio_sem_up(hd->startup_sem);
 
 	msec_to_next_event = DISK_UTIL_MSEC;
 	while (!ret && !hd->exit) {
@@ -151,7 +158,7 @@ static void *helper_thread_main(void *data)
 	return NULL;
 }
 
-int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
+int helper_thread_create(struct fio_sem *startup_sem, struct sk_out *sk_out)
 {
 	struct helper_data *hd;
 	int ret;
@@ -167,7 +174,9 @@ int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
 	if (ret)
 		return 1;
 
-	hd->startup_mutex = startup_mutex;
+	hd->startup_sem = startup_sem;
+
+	DRD_IGNORE_VAR(helper_data);
 
 	ret = pthread_create(&hd->thread, NULL, helper_thread_main, hd);
 	if (ret) {
@@ -177,8 +186,8 @@ int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
 
 	helper_data = hd;
 
-	dprint(FD_MUTEX, "wait on startup_mutex\n");
-	fio_mutex_down(startup_mutex);
-	dprint(FD_MUTEX, "done waiting on startup_mutex\n");
+	dprint(FD_MUTEX, "wait on startup_sem\n");
+	fio_sem_down(startup_sem);
+	dprint(FD_MUTEX, "done waiting on startup_sem\n");
 	return 0;
 }
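`helper_thread_create()` and `fio_start_gtod_thread()` both use a `fio_sem` as a one-shot startup signal:

1. The creator hands a semaphore to the new thread. In `fio_start_gtod_thread()` it is created with `FIO_SEM_LOCKED`.
2. The thread calls `fio_sem_up()` once its setup is done.
3. The creator blocks in `fio_sem_down()` until then.

The sketch below expresses the same handshake with a mutex, a condition variable and a `done` flag, under these assumptions:

- `struct startup`, `worker()` and `start_and_wait()` are hypothetical.
- `setup_result = 42` stands in for work such as `fio_setaffinity()`.

The flag plays the role of the semaphore's stored count: the signal is not lost if the thread finishes setup before the creator starts waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot "started" signal. */
struct startup {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;		/* set once by the new thread */
	int setup_result;	/* anything the thread reports back */
};

static void *worker(void *arg)
{
	struct startup *st = arg;

	pthread_mutex_lock(&st->lock);
	st->setup_result = 42;		/* stand-in for real setup work */
	st->done = true;
	pthread_cond_signal(&st->cond);	/* the fio_sem_up() step */
	pthread_mutex_unlock(&st->lock);

	/* ... the thread's main loop would run here; it no longer touches st ... */
	return NULL;
}

/* Start worker() and return only once it has finished its setup step. */
static int start_and_wait(pthread_t *thr, struct startup *st)
{
	int ret;

	pthread_mutex_init(&st->lock, NULL);
	pthread_cond_init(&st->cond, NULL);
	st->done = false;
	st->setup_result = 0;

	ret = pthread_create(thr, NULL, worker, st);
	if (ret)
		return ret;	/* caller still owns and must destroy st */

	pthread_mutex_lock(&st->lock);
	while (!st->done)	/* the fio_sem_down() step */
		pthread_cond_wait(&st->cond, &st->lock);
	pthread_mutex_unlock(&st->lock);
	return 0;
}
```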
diff --git a/helper_thread.h b/helper_thread.h
index 78933b1..d7df6c4 100644
--- a/helper_thread.h
+++ b/helper_thread.h
@@ -6,6 +6,6 @@ extern void helper_do_stat(void);
 extern bool helper_should_exit(void);
 extern void helper_thread_destroy(void);
 extern void helper_thread_exit(void);
-extern int helper_thread_create(struct fio_mutex *, struct sk_out *);
+extern int helper_thread_create(struct fio_sem *, struct sk_out *);
 
 #endif
diff --git a/init.c b/init.c
index bb0627b..e47e538 100644
--- a/init.c
+++ b/init.c
@@ -12,6 +12,11 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <dlfcn.h>
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/drd.h>
+#else
+#define DRD_IGNORE_VAR(x) do { } while (0)
+#endif
 
 #include "fio.h"
 #ifndef FIO_NO_HAVE_SHM_H
@@ -333,6 +338,8 @@ static void free_shm(void)
  */
 static int setup_thread_area(void)
 {
+	int i;
+
 	if (threads)
 		return 0;
 
@@ -376,6 +383,8 @@ static int setup_thread_area(void)
 #endif
 
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
+	for (i = 0; i < max_jobs; i++)
+		DRD_IGNORE_VAR(threads[i]);
 	fio_debug_jobp = (unsigned int *)(threads + max_jobs);
 	*fio_debug_jobp = -1;
 	fio_warned = fio_debug_jobp + 1;
@@ -1471,7 +1480,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			f->real_file_size = -1ULL;
 	}
 
-	td->mutex = fio_mutex_init(FIO_MUTEX_LOCKED);
+	td->sem = fio_sem_init(FIO_SEM_LOCKED);
 
 	td->ts.clat_percentiles = o->clat_percentiles;
 	td->ts.lat_percentiles = o->lat_percentiles;
diff --git a/io_u.c b/io_u.c
index 01b3693..f3b5932 100644
--- a/io_u.c
+++ b/io_u.c
@@ -856,8 +856,8 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 		assert(!(td->flags & TD_F_CHILD));
 	}
 	io_u_qpush(&td->io_u_freelist, io_u);
-	td_io_u_unlock(td);
 	td_io_u_free_notify(td);
+	td_io_u_unlock(td);
 }
 
 void clear_io_u(struct thread_data *td, struct io_u *io_u)
@@ -889,8 +889,8 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 	}
 
 	io_u_rpush(&td->io_u_requeues, __io_u);
-	td_io_u_unlock(td);
 	td_io_u_free_notify(td);
+	td_io_u_unlock(td);
 	*io_u = NULL;
 }
 
@@ -1558,6 +1558,7 @@ bool queue_full(const struct thread_data *td)
 struct io_u *__get_io_u(struct thread_data *td)
 {
 	struct io_u *io_u = NULL;
+	int ret;
 
 	if (td->stop_io)
 		return NULL;
@@ -1594,7 +1595,8 @@ again:
 		 * return one
 		 */
 		assert(!(td->flags & TD_F_CHILD));
-		assert(!pthread_cond_wait(&td->free_cond, &td->io_u_lock));
+		ret = pthread_cond_wait(&td->free_cond, &td->io_u_lock);
+		assert(ret == 0);
 		goto again;
 	}
 
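The io_u.c hunks make two small changes.

The first moves `td_io_u_free_notify()`, which presumably signals `td->free_cond`, ahead of `td_io_u_unlock()`. POSIX does not require the mutex to be held when signalling. Signalling under the lock does, however, keep the wakeup ordered with the freelist update, and Valgrind's Helgrind, for example, can report condition signals sent without the associated lock held.

The second moves `pthread_cond_wait()` out of `assert()`. `assert()` expands to nothing when `NDEBUG` is defined, so the old form would silently drop the wait in such builds.

The sketch below shows both points on a hypothetical counter-based pool (`struct item_pool`, `pool_put()` and `pool_get()` are illustrative names, not fio code).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical pool: nr_free stands in for td->io_u_freelist. */
struct item_pool {
	pthread_mutex_t lock;
	pthread_cond_t free_cond;
	int nr_free;
};

/* Return one item and wake a waiter while still holding the lock. */
static void pool_put(struct item_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->nr_free++;
	pthread_cond_signal(&p->free_cond);
	pthread_mutex_unlock(&p->lock);
}

/* Take one item, sleeping until one is available. */
static void pool_get(struct item_pool *p)
{
	int ret;

	pthread_mutex_lock(&p->lock);
	while (!p->nr_free) {
		/*
		 * Keep the call outside assert(): with -DNDEBUG,
		 * assert(!pthread_cond_wait(...)) compiles to nothing
		 * and the wait disappears.
		 */
		ret = pthread_cond_wait(&p->free_cond, &p->lock);
		assert(ret == 0);
		(void) ret;	/* silence "set but not used" under NDEBUG */
	}
	p->nr_free--;
	pthread_mutex_unlock(&p->lock);
}
```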
diff --git a/iolog.c b/iolog.c
index 7d5a136..460d7a2 100644
--- a/iolog.c
+++ b/iolog.c
@@ -20,6 +20,7 @@
 #include "filelock.h"
 #include "smalloc.h"
 #include "blktrace.h"
+#include "pshared.h"
 
 static int iolog_flush(struct io_log *log);
 
diff --git a/mutex.c b/mutex.c
deleted file mode 100644
index acc88dc..0000000
--- a/mutex.c
+++ /dev/null
@@ -1,322 +0,0 @@
-#include <stdio.h>
-#include <string.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <fcntl.h>
-#include <time.h>
-#include <errno.h>
-#include <pthread.h>
-#include <sys/mman.h>
-#include <assert.h>
-
-#include "fio.h"
-#include "log.h"
-#include "mutex.h"
-#include "arch/arch.h"
-#include "os/os.h"
-#include "helpers.h"
-#include "fio_time.h"
-#include "gettime.h"
-
-void __fio_mutex_remove(struct fio_mutex *mutex)
-{
-	assert(mutex->magic == FIO_MUTEX_MAGIC);
-	pthread_cond_destroy(&mutex->cond);
-
-	/*
-	 * Ensure any subsequent attempt to grab this mutex will fail
-	 * with an assert, instead of just silently hanging.
-	 */
-	memset(mutex, 0, sizeof(*mutex));
-}
-
-void fio_mutex_remove(struct fio_mutex *mutex)
-{
-	__fio_mutex_remove(mutex);
-	munmap((void *) mutex, sizeof(*mutex));
-}
-
-int cond_init_pshared(pthread_cond_t *cond)
-{
-	pthread_condattr_t cattr;
-	int ret;
-
-	ret = pthread_condattr_init(&cattr);
-	if (ret) {
-		log_err("pthread_condattr_init: %s\n", strerror(ret));
-		return ret;
-	}
-
-#ifdef CONFIG_PSHARED
-	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
-	if (ret) {
-		log_err("pthread_condattr_setpshared: %s\n", strerror(ret));
-		return ret;
-	}
-#endif
-	ret = pthread_cond_init(cond, &cattr);
-	if (ret) {
-		log_err("pthread_cond_init: %s\n", strerror(ret));
-		return ret;
-	}
-
-	return 0;
-}
-
-int mutex_init_pshared(pthread_mutex_t *mutex)
-{
-	pthread_mutexattr_t mattr;
-	int ret;
-
-	ret = pthread_mutexattr_init(&mattr);
-	if (ret) {
-		log_err("pthread_mutexattr_init: %s\n", strerror(ret));
-		return ret;
-	}
-
-	/*
-	 * Not all platforms support process shared mutexes (FreeBSD)
-	 */
-#ifdef CONFIG_PSHARED
-	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
-	if (ret) {
-		log_err("pthread_mutexattr_setpshared: %s\n", strerror(ret));
-		return ret;
-	}
-#endif
-	ret = pthread_mutex_init(mutex, &mattr);
-	if (ret) {
-		log_err("pthread_mutex_init: %s\n", strerror(ret));
-		return ret;
-	}
-
-	return 0;
-}
-
-int mutex_cond_init_pshared(pthread_mutex_t *mutex, pthread_cond_t *cond)
-{
-	int ret;
-
-	ret = mutex_init_pshared(mutex);
-	if (ret)
-		return ret;
-
-	ret = cond_init_pshared(cond);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
-int __fio_mutex_init(struct fio_mutex *mutex, int value)
-{
-	int ret;
-
-	mutex->value = value;
-	mutex->magic = FIO_MUTEX_MAGIC;
-
-	ret = mutex_cond_init_pshared(&mutex->lock, &mutex->cond);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
-struct fio_mutex *fio_mutex_init(int value)
-{
-	struct fio_mutex *mutex = NULL;
-
-	mutex = (void *) mmap(NULL, sizeof(struct fio_mutex),
-				PROT_READ | PROT_WRITE,
-				OS_MAP_ANON | MAP_SHARED, -1, 0);
-	if (mutex == MAP_FAILED) {
-		perror("mmap mutex");
-		return NULL;
-	}
-
-	if (!__fio_mutex_init(mutex, value))
-		return mutex;
-
-	fio_mutex_remove(mutex);
-	return NULL;
-}
-
-static bool mutex_timed_out(struct timespec *t, unsigned int msecs)
-{
-	struct timeval tv;
-	struct timespec now;
-
-	gettimeofday(&tv, NULL);
-	now.tv_sec = tv.tv_sec;
-	now.tv_nsec = tv.tv_usec * 1000;
-
-	return mtime_since(t, &now) >= msecs;
-}
-
-int fio_mutex_down_timeout(struct fio_mutex *mutex, unsigned int msecs)
-{
-	struct timeval tv_s;
-	struct timespec base;
-	struct timespec t;
-	int ret = 0;
-
-	assert(mutex->magic == FIO_MUTEX_MAGIC);
-
-	gettimeofday(&tv_s, NULL);
-	base.tv_sec = t.tv_sec = tv_s.tv_sec;
-	base.tv_nsec = t.tv_nsec = tv_s.tv_usec * 1000;
-
-	t.tv_sec += msecs / 1000;
-	t.tv_nsec += ((msecs * 1000000ULL) % 1000000000);
-	if (t.tv_nsec >= 1000000000) {
-		t.tv_nsec -= 1000000000;
-		t.tv_sec++;
-	}
-
-	pthread_mutex_lock(&mutex->lock);
-
-	mutex->waiters++;
-	while (!mutex->value && !ret) {
-		/*
-		 * Some platforms (FreeBSD 9?) seems to return timed out
-		 * way too early, double check.
-		 */
-		ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
-		if (ret == ETIMEDOUT && !mutex_timed_out(&base, msecs))
-			ret = 0;
-	}
-	mutex->waiters--;
-
-	if (!ret) {
-		mutex->value--;
-		pthread_mutex_unlock(&mutex->lock);
-		return 0;
-	}
-
-	pthread_mutex_unlock(&mutex->lock);
-	return ret;
-}
-
-bool fio_mutex_down_trylock(struct fio_mutex *mutex)
-{
-	bool ret = true;
-
-	assert(mutex->magic == FIO_MUTEX_MAGIC);
-
-	pthread_mutex_lock(&mutex->lock);
-	if (mutex->value) {
-		mutex->value--;
-		ret = false;
-	}
-	pthread_mutex_unlock(&mutex->lock);
-
-	return ret;
-}
-
-void fio_mutex_down(struct fio_mutex *mutex)
-{
-	assert(mutex->magic == FIO_MUTEX_MAGIC);
-
-	pthread_mutex_lock(&mutex->lock);
-
-	while (!mutex->value) {
-		mutex->waiters++;
-		pthread_cond_wait(&mutex->cond, &mutex->lock);
-		mutex->waiters--;
-	}
-
-	mutex->value--;
-	pthread_mutex_unlock(&mutex->lock);
-}
-
-void fio_mutex_up(struct fio_mutex *mutex)
-{
-	int do_wake = 0;
-
-	assert(mutex->magic == FIO_MUTEX_MAGIC);
-
-	pthread_mutex_lock(&mutex->lock);
-	read_barrier();
-	if (!mutex->value && mutex->waiters)
-		do_wake = 1;
-	mutex->value++;
-
-	if (do_wake)
-		pthread_cond_signal(&mutex->cond);
-
-	pthread_mutex_unlock(&mutex->lock);
-}
-
-void fio_rwlock_write(struct fio_rwlock *lock)
-{
-	assert(lock->magic == FIO_RWLOCK_MAGIC);
-	pthread_rwlock_wrlock(&lock->lock);
-}
-
-void fio_rwlock_read(struct fio_rwlock *lock)
-{
-	assert(lock->magic == FIO_RWLOCK_MAGIC);
-	pthread_rwlock_rdlock(&lock->lock);
-}
-
-void fio_rwlock_unlock(struct fio_rwlock *lock)
-{
-	assert(lock->magic == FIO_RWLOCK_MAGIC);
-	pthread_rwlock_unlock(&lock->lock);
-}
-
-void fio_rwlock_remove(struct fio_rwlock *lock)
-{
-	assert(lock->magic == FIO_RWLOCK_MAGIC);
-	munmap((void *) lock, sizeof(*lock));
-}
-
-struct fio_rwlock *fio_rwlock_init(void)
-{
-	struct fio_rwlock *lock;
-	pthread_rwlockattr_t attr;
-	int ret;
-
-	lock = (void *) mmap(NULL, sizeof(struct fio_rwlock),
-				PROT_READ | PROT_WRITE,
-				OS_MAP_ANON | MAP_SHARED, -1, 0);
-	if (lock == MAP_FAILED) {
-		perror("mmap rwlock");
-		lock = NULL;
-		goto err;
-	}
-
-	lock->magic = FIO_RWLOCK_MAGIC;
-
-	ret = pthread_rwlockattr_init(&attr);
-	if (ret) {
-		log_err("pthread_rwlockattr_init: %s\n", strerror(ret));
-		goto err;
-	}
-#ifdef CONFIG_PSHARED
-	ret = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
-	if (ret) {
-		log_err("pthread_rwlockattr_setpshared: %s\n", strerror(ret));
-		goto destroy_attr;
-	}
-
-	ret = pthread_rwlock_init(&lock->lock, &attr);
-#else
-	ret = pthread_rwlock_init(&lock->lock, NULL);
-#endif
-
-	if (ret) {
-		log_err("pthread_rwlock_init: %s\n", strerror(ret));
-		goto destroy_attr;
-	}
-
-	pthread_rwlockattr_destroy(&attr);
-
-	return lock;
-destroy_attr:
-	pthread_rwlockattr_destroy(&attr);
-err:
-	if (lock)
-		fio_rwlock_remove(lock);
-	return NULL;
-}
diff --git a/mutex.h b/mutex.h
deleted file mode 100644
index 54009ba..0000000
--- a/mutex.h
+++ /dev/null
@@ -1,47 +0,0 @@
-#ifndef FIO_MUTEX_H
-#define FIO_MUTEX_H
-
-#include <pthread.h>
-#include "lib/types.h"
-
-#define FIO_MUTEX_MAGIC		0x4d555445U
-#define FIO_RWLOCK_MAGIC	0x52574c4fU
-
-struct fio_mutex {
-	pthread_mutex_t lock;
-	pthread_cond_t cond;
-	int value;
-	int waiters;
-	int magic;
-};
-
-struct fio_rwlock {
-	pthread_rwlock_t lock;
-	int magic;
-};
-
-enum {
-	FIO_MUTEX_LOCKED	= 0,
-	FIO_MUTEX_UNLOCKED	= 1,
-};
-
-extern int __fio_mutex_init(struct fio_mutex *, int);
-extern struct fio_mutex *fio_mutex_init(int);
-extern void __fio_mutex_remove(struct fio_mutex *);
-extern void fio_mutex_remove(struct fio_mutex *);
-extern void fio_mutex_up(struct fio_mutex *);
-extern void fio_mutex_down(struct fio_mutex *);
-extern bool fio_mutex_down_trylock(struct fio_mutex *);
-extern int fio_mutex_down_timeout(struct fio_mutex *, unsigned int);
-
-extern void fio_rwlock_read(struct fio_rwlock *);
-extern void fio_rwlock_write(struct fio_rwlock *);
-extern void fio_rwlock_unlock(struct fio_rwlock *);
-extern struct fio_rwlock *fio_rwlock_init(void);
-extern void fio_rwlock_remove(struct fio_rwlock *);
-
-extern int mutex_init_pshared(pthread_mutex_t *);
-extern int cond_init_pshared(pthread_cond_t *);
-extern int mutex_cond_init_pshared(pthread_mutex_t *, pthread_cond_t *);
-
-#endif
diff --git a/optgroup.c b/optgroup.c
index 122d24e..1c418f5 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -1,6 +1,7 @@
 #include <stdio.h>
 #include <inttypes.h>
 #include "optgroup.h"
+#include "compiler/compiler.h"
 
 /*
  * Option grouping
@@ -203,3 +204,5 @@ const struct opt_group *opt_group_cat_from_mask(uint64_t *mask)
 {
 	return group_from_mask(fio_opt_cat_groups, mask, FIO_OPT_G_INVALID);
 }
+
+compiletime_assert(__FIO_OPT_G_NR <= 8 * sizeof(uint64_t), "__FIO_OPT_G_NR");
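The optgroup changes add `__FIO_OPT_G_SG` and a `compiletime_assert()` check that the number of option groups still fits in the 64-bit category mask. `compiletime_assert()` comes from fio's compiler/compiler.h, whose definition is not part of this diff.

The sketch below expresses the same idea with the standard C11 `_Static_assert`, on hypothetical group names. One detail differs: fio's check is `__FIO_OPT_G_NR <= 64`, but `FIO_OPT_G_INVALID` in optgroup.h is `1ULL << __FIO_OPT_G_NR`. A strict `<` bound also keeps that shift defined, since `1ULL << 64` is undefined behaviour, so the sketch uses `<`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical group indices, mirroring the __FIO_OPT_G_* enum. */
enum demo_group_idx {
	DEMO_IDX_RATE,
	DEMO_IDX_HDFS,
	DEMO_IDX_SG,
	DEMO_IDX_NR,		/* number of groups; must stay last */
};

/* One bit per group, mirroring the FIO_OPT_G_* values. */
#define DEMO_G_RATE	(1ULL << DEMO_IDX_RATE)
#define DEMO_G_HDFS	(1ULL << DEMO_IDX_HDFS)
#define DEMO_G_SG	(1ULL << DEMO_IDX_SG)
#define DEMO_G_INVALID	(1ULL << DEMO_IDX_NR)	/* sentinel uses bit NR itself */

/* Build fails if adding a group would push the sentinel bit past bit 63. */
_Static_assert(DEMO_IDX_NR < 8 * sizeof(uint64_t),
	       "option groups no longer fit in a uint64_t mask");

/* Is group @idx set in @mask? Out-of-range indices are never set. */
static int demo_group_in_mask(uint64_t mask, unsigned int idx)
{
	return idx < DEMO_IDX_NR && (mask & (1ULL << idx)) != 0;
}
```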
diff --git a/optgroup.h b/optgroup.h
index 815ac16..d5e968d 100644
--- a/optgroup.h
+++ b/optgroup.h
@@ -55,10 +55,11 @@ enum opt_category_group {
 	__FIO_OPT_G_LIBAIO,
 	__FIO_OPT_G_ACT,
 	__FIO_OPT_G_LATPROF,
-        __FIO_OPT_G_RBD,
-        __FIO_OPT_G_GFAPI,
-        __FIO_OPT_G_MTD,
+	__FIO_OPT_G_RBD,
+	__FIO_OPT_G_GFAPI,
+	__FIO_OPT_G_MTD,
 	__FIO_OPT_G_HDFS,
+	__FIO_OPT_G_SG,
 	__FIO_OPT_G_NR,
 
 	FIO_OPT_G_RATE		= (1ULL << __FIO_OPT_G_RATE),
@@ -93,6 +94,7 @@ enum opt_category_group {
 	FIO_OPT_G_GFAPI		= (1ULL << __FIO_OPT_G_GFAPI),
 	FIO_OPT_G_MTD		= (1ULL << __FIO_OPT_G_MTD),
 	FIO_OPT_G_HDFS		= (1ULL << __FIO_OPT_G_HDFS),
+	FIO_OPT_G_SG		= (1ULL << __FIO_OPT_G_SG),
 	FIO_OPT_G_INVALID	= (1ULL << __FIO_OPT_G_NR),
 };
 
diff --git a/profiles/act.c b/profiles/act.c
index 3fa5afa..5d3bd25 100644
--- a/profiles/act.c
+++ b/profiles/act.c
@@ -38,7 +38,7 @@ struct act_slice {
 };
 
 struct act_run_data {
-	struct fio_mutex *mutex;
+	struct fio_sem *sem;
 	unsigned int pending;
 
 	struct act_slice *slices;
@@ -337,9 +337,9 @@ static int act_io_u_lat(struct thread_data *td, uint64_t nsec)
 
 static void get_act_ref(void)
 {
-	fio_mutex_down(act_run_data->mutex);
+	fio_sem_down(act_run_data->sem);
 	act_run_data->pending++;
-	fio_mutex_up(act_run_data->mutex);
+	fio_sem_up(act_run_data->sem);
 }
 
 static int show_slice(struct act_slice *slice, unsigned int slice_num)
@@ -396,7 +396,7 @@ static void put_act_ref(struct thread_data *td)
 	struct act_prof_data *apd = td->prof_data;
 	unsigned int i, slice;
 
-	fio_mutex_down(act_run_data->mutex);
+	fio_sem_down(act_run_data->sem);
 
 	if (!act_run_data->slices) {
 		act_run_data->slices = calloc(apd->nr_slices, sizeof(struct act_slice));
@@ -416,7 +416,7 @@ static void put_act_ref(struct thread_data *td)
 	if (!--act_run_data->pending)
 		act_show_all_stats();
 
-	fio_mutex_up(act_run_data->mutex);
+	fio_sem_up(act_run_data->sem);
 }
 
 static int act_td_init(struct thread_data *td)
@@ -464,7 +464,7 @@ static struct profile_ops act_profile = {
 static void fio_init act_register(void)
 {
 	act_run_data = calloc(1, sizeof(*act_run_data));
-	act_run_data->mutex = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	act_run_data->sem = fio_sem_init(FIO_SEM_UNLOCKED);
 
 	if (register_profile(&act_profile))
 		log_err("fio: failed to register profile 'act'\n");
@@ -476,7 +476,7 @@ static void fio_exit act_unregister(void)
 		free((void *) act_opts[++org_idx]);
 
 	unregister_profile(&act_profile);
-	fio_mutex_remove(act_run_data->mutex);
+	fio_sem_remove(act_run_data->sem);
 	free(act_run_data->slices);
 	free(act_run_data);
 	act_run_data = NULL;
diff --git a/pshared.c b/pshared.c
new file mode 100644
index 0000000..74812ed
--- /dev/null
+++ b/pshared.c
@@ -0,0 +1,76 @@
+#include <string.h>
+
+#include "log.h"
+#include "pshared.h"
+
+int cond_init_pshared(pthread_cond_t *cond)
+{
+	pthread_condattr_t cattr;
+	int ret;
+
+	ret = pthread_condattr_init(&cattr);
+	if (ret) {
+		log_err("pthread_condattr_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+#ifdef CONFIG_PSHARED
+	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
+	if (ret) {
+		log_err("pthread_condattr_setpshared: %s\n", strerror(ret));
+		return ret;
+	}
+#endif
+	ret = pthread_cond_init(cond, &cattr);
+	if (ret) {
+		log_err("pthread_cond_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+int mutex_init_pshared(pthread_mutex_t *mutex)
+{
+	pthread_mutexattr_t mattr;
+	int ret;
+
+	ret = pthread_mutexattr_init(&mattr);
+	if (ret) {
+		log_err("pthread_mutexattr_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+	/*
+	 * Not all platforms support process shared mutexes (FreeBSD)
+	 */
+#ifdef CONFIG_PSHARED
+	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
+	if (ret) {
+		log_err("pthread_mutexattr_setpshared: %s\n", strerror(ret));
+		return ret;
+	}
+#endif
+	ret = pthread_mutex_init(mutex, &mattr);
+	if (ret) {
+		log_err("pthread_mutex_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+	return 0;
+}
+
+int mutex_cond_init_pshared(pthread_mutex_t *mutex, pthread_cond_t *cond)
+{
+	int ret;
+
+	ret = mutex_init_pshared(mutex);
+	if (ret)
+		return ret;
+
+	ret = cond_init_pshared(cond);
+	if (ret)
+		return ret;
+
+	return 0;
+}
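pshared.c collects the helpers that initialise mutexes and condition variables with `PTHREAD_PROCESS_SHARED`. fio places them in `MAP_SHARED` anonymous mappings so that forked job processes can synchronise through them. The `CONFIG_PSHARED` guard exists because, as the comment notes, not every platform supports process-shared mutexes.

The sketch below shows the technique end to end with a shared counter incremented by forked children, under these assumptions:

- The names are hypothetical.
- It uses `MAP_ANONYMOUS` where fio uses its `OS_MAP_ANON` wrapper.
- It treats `setpshared` failure as fatal, so on a platform without support the create function returns `NULL`.
- Unlike `mutex_init_pshared()` above, it also destroys the attribute object.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared block: a counter guarded by a process-shared mutex. */
struct shared_counter {
	pthread_mutex_t lock;
	long value;
};

static struct shared_counter *shared_counter_create(void)
{
	pthread_mutexattr_t attr;
	struct shared_counter *sc;

	/* MAP_SHARED so the parent and forked children see the same pages. */
	sc = mmap(NULL, sizeof(*sc), PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (sc == MAP_FAILED)
		return NULL;

	if (pthread_mutexattr_init(&attr))
		goto fail;
	/* Without PROCESS_SHARED, using the mutex across fork() is undefined. */
	if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) ||
	    pthread_mutex_init(&sc->lock, &attr)) {
		pthread_mutexattr_destroy(&attr);
		goto fail;
	}
	pthread_mutexattr_destroy(&attr);
	sc->value = 0;
	return sc;
fail:
	munmap(sc, sizeof(*sc));
	return NULL;
}

/* Fork @nproc children that each add @per_child to the shared counter. */
static long run_children(struct shared_counter *sc, int nproc, int per_child)
{
	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;		/* fork failed: stop spawning */
		if (pid == 0) {
			for (int j = 0; j < per_child; j++) {
				pthread_mutex_lock(&sc->lock);
				sc->value++;
				pthread_mutex_unlock(&sc->lock);
			}
			_exit(0);
		}
	}
	while (wait(NULL) > 0)		/* reap every child */
		;
	return sc->value;
}
```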
diff --git a/pshared.h b/pshared.h
new file mode 100644
index 0000000..a58df6f
--- /dev/null
+++ b/pshared.h
@@ -0,0 +1,10 @@
+#ifndef FIO_PSHARED_H
+#define FIO_PSHARED_H
+
+#include <pthread.h>
+
+extern int mutex_init_pshared(pthread_mutex_t *);
+extern int cond_init_pshared(pthread_cond_t *);
+extern int mutex_cond_init_pshared(pthread_mutex_t *, pthread_cond_t *);
+
+#endif
diff --git a/rwlock.c b/rwlock.c
new file mode 100644
index 0000000..00e3809
--- /dev/null
+++ b/rwlock.c
@@ -0,0 +1,83 @@
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <assert.h>
+
+#include "log.h"
+#include "rwlock.h"
+#include "os/os.h"
+
+void fio_rwlock_write(struct fio_rwlock *lock)
+{
+	assert(lock->magic == FIO_RWLOCK_MAGIC);
+	pthread_rwlock_wrlock(&lock->lock);
+}
+
+void fio_rwlock_read(struct fio_rwlock *lock)
+{
+	assert(lock->magic == FIO_RWLOCK_MAGIC);
+	pthread_rwlock_rdlock(&lock->lock);
+}
+
+void fio_rwlock_unlock(struct fio_rwlock *lock)
+{
+	assert(lock->magic == FIO_RWLOCK_MAGIC);
+	pthread_rwlock_unlock(&lock->lock);
+}
+
+void fio_rwlock_remove(struct fio_rwlock *lock)
+{
+	assert(lock->magic == FIO_RWLOCK_MAGIC);
+	pthread_rwlock_destroy(&lock->lock);
+	munmap((void *) lock, sizeof(*lock));
+}
+
+struct fio_rwlock *fio_rwlock_init(void)
+{
+	struct fio_rwlock *lock;
+	pthread_rwlockattr_t attr;
+	int ret;
+
+	lock = (void *) mmap(NULL, sizeof(struct fio_rwlock),
+				PROT_READ | PROT_WRITE,
+				OS_MAP_ANON | MAP_SHARED, -1, 0);
+	if (lock == MAP_FAILED) {
+		perror("mmap rwlock");
+		lock = NULL;
+		goto err;
+	}
+
+	lock->magic = FIO_RWLOCK_MAGIC;
+
+	ret = pthread_rwlockattr_init(&attr);
+	if (ret) {
+		log_err("pthread_rwlockattr_init: %s\n", strerror(ret));
+		goto err;
+	}
+#ifdef CONFIG_PSHARED
+	ret = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
+	if (ret) {
+		log_err("pthread_rwlockattr_setpshared: %s\n", strerror(ret));
+		goto destroy_attr;
+	}
+
+	ret = pthread_rwlock_init(&lock->lock, &attr);
+#else
+	ret = pthread_rwlock_init(&lock->lock, NULL);
+#endif
+
+	if (ret) {
+		log_err("pthread_rwlock_init: %s\n", strerror(ret));
+		goto destroy_attr;
+	}
+
+	pthread_rwlockattr_destroy(&attr);
+
+	return lock;
+destroy_attr:
+	pthread_rwlockattr_destroy(&attr);
+err:
+	if (lock)
+		fio_rwlock_remove(lock);
+	return NULL;
+}
diff --git a/rwlock.h b/rwlock.h
new file mode 100644
index 0000000..2968eed
--- /dev/null
+++ b/rwlock.h
@@ -0,0 +1,19 @@
+#ifndef FIO_RWLOCK_H
+#define FIO_RWLOCK_H
+
+#include <pthread.h>
+
+#define FIO_RWLOCK_MAGIC	0x52574c4fU
+
+struct fio_rwlock {
+	pthread_rwlock_t lock;
+	int magic;
+};
+
+extern void fio_rwlock_read(struct fio_rwlock *);
+extern void fio_rwlock_write(struct fio_rwlock *);
+extern void fio_rwlock_unlock(struct fio_rwlock *);
+extern struct fio_rwlock *fio_rwlock_init(void);
+extern void fio_rwlock_remove(struct fio_rwlock *);
+
+#endif
diff --git a/server.c b/server.c
index 959786f..65d4484 100644
--- a/server.c
+++ b/server.c
@@ -74,7 +74,7 @@ struct fio_fork_item {
 };
 
 struct cmd_reply {
-	struct fio_mutex lock;
+	struct fio_sem lock;
 	void *data;
 	size_t size;
 	int error;
@@ -108,12 +108,12 @@ static const char *fio_server_ops[FIO_NET_CMD_NR] = {
 
 static void sk_lock(struct sk_out *sk_out)
 {
-	fio_mutex_down(&sk_out->lock);
+	fio_sem_down(&sk_out->lock);
 }
 
 static void sk_unlock(struct sk_out *sk_out)
 {
-	fio_mutex_up(&sk_out->lock);
+	fio_sem_up(&sk_out->lock);
 }
 
 void sk_out_assign(struct sk_out *sk_out)
@@ -129,9 +129,9 @@ void sk_out_assign(struct sk_out *sk_out)
 
 static void sk_out_free(struct sk_out *sk_out)
 {
-	__fio_mutex_remove(&sk_out->lock);
-	__fio_mutex_remove(&sk_out->wait);
-	__fio_mutex_remove(&sk_out->xmit);
+	__fio_sem_remove(&sk_out->lock);
+	__fio_sem_remove(&sk_out->wait);
+	__fio_sem_remove(&sk_out->xmit);
 	sfree(sk_out);
 }
 
@@ -558,7 +558,7 @@ static void fio_net_queue_entry(struct sk_entry *entry)
 		flist_add_tail(&entry->list, &sk_out->list);
 		sk_unlock(sk_out);
 
-		fio_mutex_up(&sk_out->wait);
+		fio_sem_up(&sk_out->wait);
 	}
 }
 
@@ -1039,7 +1039,7 @@ static int handle_command(struct sk_out *sk_out, struct flist_head *job_list,
 				memcpy(rep->data, in->data, in->size);
 			}
 		}
-		fio_mutex_up(&rep->lock);
+		fio_sem_up(&rep->lock);
 		break;
 		}
 	default:
@@ -1138,7 +1138,7 @@ static int handle_sk_entry(struct sk_out *sk_out, struct sk_entry *entry)
 {
 	int ret;
 
-	fio_mutex_down(&sk_out->xmit);
+	fio_sem_down(&sk_out->xmit);
 
 	if (entry->flags & SK_F_VEC)
 		ret = send_vec_entry(sk_out, entry);
@@ -1150,7 +1150,7 @@ static int handle_sk_entry(struct sk_out *sk_out, struct sk_entry *entry)
 					entry->size, &entry->tag, NULL);
 	}
 
-	fio_mutex_up(&sk_out->xmit);
+	fio_sem_up(&sk_out->xmit);
 
 	if (ret)
 		log_err("fio: failed handling cmd %s\n", fio_server_op(entry->opcode));
@@ -1215,7 +1215,7 @@ static int handle_connection(struct sk_out *sk_out)
 				break;
 			} else if (!ret) {
 				fio_server_check_jobs(&job_list);
-				fio_mutex_down_timeout(&sk_out->wait, timeout);
+				fio_sem_down_timeout(&sk_out->wait, timeout);
 				continue;
 			}
 
@@ -1361,9 +1361,9 @@ static int accept_loop(int listen_sk)
 		sk_out = smalloc(sizeof(*sk_out));
 		sk_out->sk = sk;
 		INIT_FLIST_HEAD(&sk_out->list);
-		__fio_mutex_init(&sk_out->lock, FIO_MUTEX_UNLOCKED);
-		__fio_mutex_init(&sk_out->wait, FIO_MUTEX_LOCKED);
-		__fio_mutex_init(&sk_out->xmit, FIO_MUTEX_UNLOCKED);
+		__fio_sem_init(&sk_out->lock, FIO_SEM_UNLOCKED);
+		__fio_sem_init(&sk_out->wait, FIO_SEM_LOCKED);
+		__fio_sem_init(&sk_out->xmit, FIO_SEM_UNLOCKED);
 
 		pid = fork();
 		if (pid) {
@@ -2033,7 +2033,7 @@ int fio_server_get_verify_state(const char *name, int threadnumber,
 	if (!rep)
 		return ENOMEM;
 
-	__fio_mutex_init(&rep->lock, FIO_MUTEX_LOCKED);
+	__fio_sem_init(&rep->lock, FIO_SEM_LOCKED);
 	rep->data = NULL;
 	rep->error = 0;
 
@@ -2046,7 +2046,7 @@ int fio_server_get_verify_state(const char *name, int threadnumber,
 	/*
 	 * Wait for the backend to receive the reply
 	 */
-	if (fio_mutex_down_timeout(&rep->lock, 10000)) {
+	if (fio_sem_down_timeout(&rep->lock, 10000)) {
 		log_err("fio: timed out waiting for reply\n");
 		ret = ETIMEDOUT;
 		goto fail;
@@ -2083,7 +2083,7 @@ fail:
 	*datap = data;
 
 	sfree(rep->data);
-	__fio_mutex_remove(&rep->lock);
+	__fio_sem_remove(&rep->lock);
 	sfree(rep);
 	return ret;
 }
diff --git a/server.h b/server.h
index bd892fc..d652d31 100644
--- a/server.h
+++ b/server.h
@@ -17,10 +17,10 @@ struct sk_out {
 				 * protected by below ->lock */
 
 	int sk;			/* socket fd to talk to client */
-	struct fio_mutex lock;	/* protects ref and below list */
+	struct fio_sem lock;	/* protects ref and below list */
 	struct flist_head list;	/* list of pending transmit work */
-	struct fio_mutex wait;	/* wake backend when items added to list */
-	struct fio_mutex xmit;	/* held while sending data */
+	struct fio_sem wait;	/* wake backend when items added to list */
+	struct fio_sem xmit;	/* held while sending data */
 };
 
 /*
diff --git a/smalloc.c b/smalloc.c
index cab7132..13995ac 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -12,9 +12,16 @@
 #include <sys/types.h>
 #include <limits.h>
 #include <fcntl.h>
+#ifdef CONFIG_VALGRIND_DEV
+#include <valgrind/valgrind.h>
+#else
+#define RUNNING_ON_VALGRIND 0
+#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) do { } while (0)
+#define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while (0)
+#endif
 
 #include "fio.h"
-#include "mutex.h"
+#include "fio_sem.h"
 #include "arch/arch.h"
 #include "os/os.h"
 #include "smalloc.h"
@@ -40,7 +47,7 @@ static const int int_mask = sizeof(int) - 1;
 #endif
 
 struct pool {
-	struct fio_mutex *lock;			/* protects this pool */
+	struct fio_sem *lock;			/* protects this pool */
 	void *map;				/* map of blocks */
 	unsigned int *bitmap;			/* blocks free/busy map */
 	size_t free_blocks;		/* free blocks */
@@ -49,6 +56,12 @@ struct pool {
 	size_t mmap_size;
 };
 
+#ifdef SMALLOC_REDZONE
+#define REDZONE_SIZE sizeof(unsigned int)
+#else
+#define REDZONE_SIZE 0
+#endif
+
 struct block_hdr {
 	size_t size;
 #ifdef SMALLOC_REDZONE
@@ -192,7 +205,7 @@ static bool add_pool(struct pool *pool, unsigned int alloc_size)
 	pool->bitmap = (unsigned int *)((char *) ptr + (pool->nr_blocks * SMALLOC_BPL));
 	memset(pool->bitmap, 0, bitmap_blocks * sizeof(unsigned int));
 
-	pool->lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	pool->lock = fio_sem_init(FIO_SEM_UNLOCKED);
 	if (!pool->lock)
 		goto out_fail;
 
@@ -232,7 +245,7 @@ static void cleanup_pool(struct pool *pool)
 	munmap(pool->map, pool->mmap_size);
 
 	if (pool->lock)
-		fio_mutex_remove(pool->lock);
+		fio_sem_remove(pool->lock);
 }
 
 void scleanup(void)
@@ -258,6 +271,10 @@ static void fill_redzone(struct block_hdr *hdr)
 {
 	unsigned int *postred = postred_ptr(hdr);
 
+	/* Let Valgrind fill the red zones. */
+	if (RUNNING_ON_VALGRIND)
+		return;
+
 	hdr->prered = SMALLOC_PRE_RED;
 	*postred = SMALLOC_POST_RED;
 }
@@ -266,6 +283,10 @@ static void sfree_check_redzone(struct block_hdr *hdr)
 {
 	unsigned int *postred = postred_ptr(hdr);
 
+	/* Let Valgrind check the red zones. */
+	if (RUNNING_ON_VALGRIND)
+		return;
+
 	if (hdr->prered != SMALLOC_PRE_RED) {
 		log_err("smalloc pre redzone destroyed!\n"
 			" ptr=%p, prered=%x, expected %x\n",
@@ -309,12 +330,12 @@ static void sfree_pool(struct pool *pool, void *ptr)
 	i = offset / SMALLOC_BPL;
 	idx = (offset % SMALLOC_BPL) / SMALLOC_BPB;
 
-	fio_mutex_down(pool->lock);
+	fio_sem_down(pool->lock);
 	clear_blocks(pool, i, idx, size_to_blocks(hdr->size));
 	if (i < pool->next_non_full)
 		pool->next_non_full = i;
 	pool->free_blocks += size_to_blocks(hdr->size);
-	fio_mutex_up(pool->lock);
+	fio_sem_up(pool->lock);
 }
 
 void sfree(void *ptr)
@@ -333,6 +354,7 @@ void sfree(void *ptr)
 	}
 
 	if (pool) {
+		VALGRIND_FREELIKE_BLOCK(ptr, REDZONE_SIZE);
 		sfree_pool(pool, ptr);
 		return;
 	}
@@ -348,7 +370,7 @@ static void *__smalloc_pool(struct pool *pool, size_t size)
 	unsigned int last_idx;
 	void *ret = NULL;
 
-	fio_mutex_down(pool->lock);
+	fio_sem_down(pool->lock);
 
 	nr_blocks = size_to_blocks(size);
 	if (nr_blocks > pool->free_blocks)
@@ -391,7 +413,7 @@ static void *__smalloc_pool(struct pool *pool, size_t size)
 		ret = pool->map + offset;
 	}
 fail:
-	fio_mutex_up(pool->lock);
+	fio_sem_up(pool->lock);
 	return ret;
 }
 
@@ -423,7 +445,7 @@ static void *smalloc_pool(struct pool *pool, size_t size)
 	return ptr;
 }
 
-void *smalloc(size_t size)
+static void *__smalloc(size_t size, bool is_zeroed)
 {
 	unsigned int i, end_pool;
 
@@ -439,6 +461,9 @@ void *smalloc(size_t size)
 
 			if (ptr) {
 				last_pool = i;
+				VALGRIND_MALLOCLIKE_BLOCK(ptr, size,
+							  REDZONE_SIZE,
+							  is_zeroed);
 				return ptr;
 			}
 		}
@@ -456,9 +481,14 @@ void *smalloc(size_t size)
 	return NULL;
 }
 
+void *smalloc(size_t size)
+{
+	return __smalloc(size, false);
+}
+
 void *scalloc(size_t nmemb, size_t size)
 {
-	return smalloc(nmemb * size);
+	return __smalloc(nmemb * size, true);
 }
 
 char *smalloc_strdup(const char *str)
diff --git a/stat.c b/stat.c
index 8a242c9..98ab638 100644
--- a/stat.c
+++ b/stat.c
@@ -20,7 +20,7 @@
 
 #define LOG_MSEC_SLACK	1
 
-struct fio_mutex *stat_mutex;
+struct fio_sem *stat_sem;
 
 void clear_rusage_stat(struct thread_data *td)
 {
@@ -1946,9 +1946,9 @@ void __show_run_stats(void)
 
 void show_run_stats(void)
 {
-	fio_mutex_down(stat_mutex);
+	fio_sem_down(stat_sem);
 	__show_run_stats();
-	fio_mutex_up(stat_mutex);
+	fio_sem_up(stat_sem);
 }
 
 void __show_running_run_stats(void)
@@ -1958,7 +1958,7 @@ void __show_running_run_stats(void)
 	struct timespec ts;
 	int i;
 
-	fio_mutex_down(stat_mutex);
+	fio_sem_down(stat_sem);
 
 	rt = malloc(thread_number * sizeof(unsigned long long));
 	fio_gettime(&ts, NULL);
@@ -1984,7 +1984,7 @@ void __show_running_run_stats(void)
 			continue;
 		if (td->rusage_sem) {
 			td->update_rusage = 1;
-			fio_mutex_down(td->rusage_sem);
+			fio_sem_down(td->rusage_sem);
 		}
 		td->update_rusage = 0;
 	}
@@ -2001,7 +2001,7 @@ void __show_running_run_stats(void)
 	}
 
 	free(rt);
-	fio_mutex_up(stat_mutex);
+	fio_sem_up(stat_sem);
 }
 
 static bool status_interval_init;
@@ -2690,7 +2690,7 @@ int calc_log_samples(void)
 
 void stat_init(void)
 {
-	stat_mutex = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	stat_sem = fio_sem_init(FIO_SEM_UNLOCKED);
 }
 
 void stat_exit(void)
@@ -2699,8 +2699,8 @@ void stat_exit(void)
 	 * When we have the mutex, we know out-of-band access to it
 	 * have ended.
 	 */
-	fio_mutex_down(stat_mutex);
-	fio_mutex_remove(stat_mutex);
+	fio_sem_down(stat_sem);
+	fio_sem_remove(stat_sem);
 }
 
 /*
diff --git a/stat.h b/stat.h
index 7580f0d..8e7bcdb 100644
--- a/stat.h
+++ b/stat.h
@@ -277,7 +277,7 @@ struct io_u_plat_entry {
 	uint64_t io_u_plat[FIO_IO_U_PLAT_NR];
 };
 
-extern struct fio_mutex *stat_mutex;
+extern struct fio_sem *stat_sem;
 
 extern struct jobs_eta *get_jobs_eta(bool force, size_t *size);
 
diff --git a/t/dedupe.c b/t/dedupe.c
index 9a50821..1b4277c 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -16,7 +16,7 @@
 
 #include "../flist.h"
 #include "../log.h"
-#include "../mutex.h"
+#include "../fio_sem.h"
 #include "../smalloc.h"
 #include "../minmax.h"
 #include "../crc/md5.h"
@@ -62,7 +62,7 @@ struct item {
 
 static struct rb_root rb_root;
 static struct bloom *bloom;
-static struct fio_mutex *rb_lock;
+static struct fio_sem *rb_lock;
 
 static unsigned int blocksize = 4096;
 static unsigned int num_threads;
@@ -75,7 +75,7 @@ static unsigned int use_bloom = 1;
 
 static uint64_t total_size;
 static uint64_t cur_offset;
-static struct fio_mutex *size_lock;
+static struct fio_sem *size_lock;
 
 static struct fio_file file;
 
@@ -102,7 +102,7 @@ static int get_work(uint64_t *offset, uint64_t *size)
 	uint64_t this_chunk;
 	int ret = 1;
 
-	fio_mutex_down(size_lock);
+	fio_sem_down(size_lock);
 
 	if (cur_offset < total_size) {
 		*offset = cur_offset;
@@ -112,7 +112,7 @@ static int get_work(uint64_t *offset, uint64_t *size)
 		ret = 0;
 	}
 
-	fio_mutex_up(size_lock);
+	fio_sem_up(size_lock);
 	return ret;
 }
 
@@ -215,9 +215,9 @@ static void insert_chunk(struct item *i)
 			if (!collision_check)
 				goto add;
 
-			fio_mutex_up(rb_lock);
+			fio_sem_up(rb_lock);
 			ret = col_check(c, i);
-			fio_mutex_down(rb_lock);
+			fio_sem_down(rb_lock);
 
 			if (!ret)
 				goto add;
@@ -241,7 +241,7 @@ static void insert_chunks(struct item *items, unsigned int nitems,
 {
 	int i;
 
-	fio_mutex_down(rb_lock);
+	fio_sem_down(rb_lock);
 
 	for (i = 0; i < nitems; i++) {
 		if (bloom) {
@@ -255,7 +255,7 @@ static void insert_chunks(struct item *items, unsigned int nitems,
 			insert_chunk(&items[i]);
 	}
 
-	fio_mutex_up(rb_lock);
+	fio_sem_up(rb_lock);
 }
 
 static void crc_buf(void *buf, uint32_t *hash)
@@ -383,7 +383,7 @@ static int run_dedupe_threads(struct fio_file *f, uint64_t dev_size,
 	total_size = dev_size;
 	total_items = dev_size / blocksize;
 	cur_offset = 0;
-	size_lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	size_lock = fio_sem_init(FIO_SEM_UNLOCKED);
 
 	threads = malloc(num_threads * sizeof(struct worker_thread));
 	for (i = 0; i < num_threads; i++) {
@@ -414,7 +414,7 @@ static int run_dedupe_threads(struct fio_file *f, uint64_t dev_size,
 	*nextents = nitems;
 	*nchunks = nitems - *nchunks;
 
-	fio_mutex_remove(size_lock);
+	fio_sem_remove(size_lock);
 	free(threads);
 	return err;
 }
@@ -581,7 +581,7 @@ int main(int argc, char *argv[])
 	sinit();
 
 	rb_root = RB_ROOT;
-	rb_lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	rb_lock = fio_sem_init(FIO_SEM_UNLOCKED);
 
 	ret = dedupe_check(argv[optind], &nextents, &nchunks);
 
@@ -592,7 +592,7 @@ int main(int argc, char *argv[])
 		show_stat(nextents, nchunks);
 	}
 
-	fio_mutex_remove(rb_lock);
+	fio_sem_remove(rb_lock);
 	if (bloom)
 		bloom_free(bloom);
 	scleanup();
diff --git a/verify.c b/verify.c
index 17af3bb..d10670b 100644
--- a/verify.c
+++ b/verify.c
@@ -1454,9 +1454,9 @@ static void *verify_async_thread(void *data)
 done:
 	pthread_mutex_lock(&td->io_u_lock);
 	td->nr_verify_threads--;
+	pthread_cond_signal(&td->free_cond);
 	pthread_mutex_unlock(&td->io_u_lock);
 
-	pthread_cond_signal(&td->free_cond);
 	return NULL;
 }
 
@@ -1492,9 +1492,12 @@ int verify_async_init(struct thread_data *td)
 
 	if (i != td->o.verify_async) {
 		log_err("fio: only %d verify threads started, exiting\n", i);
+
+		pthread_mutex_lock(&td->io_u_lock);
 		td->verify_thread_exit = 1;
-		write_barrier();
 		pthread_cond_broadcast(&td->verify_cond);
+		pthread_mutex_unlock(&td->io_u_lock);
+
 		return 1;
 	}
 
@@ -1503,12 +1506,10 @@ int verify_async_init(struct thread_data *td)
 
 void verify_async_exit(struct thread_data *td)
 {
+	pthread_mutex_lock(&td->io_u_lock);
 	td->verify_thread_exit = 1;
-	write_barrier();
 	pthread_cond_broadcast(&td->verify_cond);
 
-	pthread_mutex_lock(&td->io_u_lock);
-
 	while (td->nr_verify_threads)
 		pthread_cond_wait(&td->free_cond, &td->io_u_lock);
 
diff --git a/workqueue.c b/workqueue.c
index 18ec198..841dbb9 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -10,6 +10,7 @@
 #include "flist.h"
 #include "workqueue.h"
 #include "smalloc.h"
+#include "pshared.h"
 
 enum {
 	SW_F_IDLE	= 1 << 0,

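The verify.c hunks near the end of this message drop `write_barrier()`. Instead, they set `td->verify_thread_exit` and broadcast `td->verify_cond` while holding `td->io_u_lock`, and the worker's `pthread_cond_signal(&td->free_cond)` also moves inside the locked region. Below is a minimal sketch of that shutdown handshake. It assumes the workers wait on the condition variable with the same mutex held. `struct pool_ctl`, `pool_init()`, `worker()` and `pool_shutdown()` are hypothetical names, not fio functions. Because the flag is written under the mutex the workers sleep on, a worker cannot test the flag, miss the broadcast, and then block forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields verify.c keeps in thread_data. */
struct pool_ctl {
	pthread_mutex_t lock;		/* plays the role of td->io_u_lock */
	pthread_cond_t work_cond;	/* plays the role of td->verify_cond */
	pthread_cond_t free_cond;	/* plays the role of td->free_cond */
	int exit;			/* plays the role of td->verify_thread_exit */
	int nr_threads;			/* plays the role of td->nr_verify_threads */
};

static void pool_init(struct pool_ctl *ctl, int nr_threads)
{
	pthread_mutex_init(&ctl->lock, NULL);
	pthread_cond_init(&ctl->work_cond, NULL);
	pthread_cond_init(&ctl->free_cond, NULL);
	ctl->exit = 0;
	ctl->nr_threads = nr_threads;
}

static void *worker(void *data)
{
	struct pool_ctl *ctl = data;

	pthread_mutex_lock(&ctl->lock);
	/* A real worker would also drain queued items in this loop. */
	while (!ctl->exit)
		pthread_cond_wait(&ctl->work_cond, &ctl->lock);

	ctl->nr_threads--;
	/* Signal while still holding the lock, as the patched verify.c does. */
	pthread_cond_signal(&ctl->free_cond);
	pthread_mutex_unlock(&ctl->lock);
	return NULL;
}

static void pool_shutdown(struct pool_ctl *ctl)
{
	pthread_mutex_lock(&ctl->lock);
	/* Written under the lock, so no separate write barrier is needed. */
	ctl->exit = 1;
	pthread_cond_broadcast(&ctl->work_cond);

	/* Wait until every worker has announced its exit. */
	while (ctl->nr_threads)
		pthread_cond_wait(&ctl->free_cond, &ctl->lock);
	pthread_mutex_unlock(&ctl->lock);
}
```

Calling `pool_shutdown()` before any worker has started waiting is safe in this sketch, because each worker checks `exit` under the lock before it sleeps.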

* Recent changes (master)
@ 2018-03-14 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-14 12:00 UTC
  To: fio

The following changes since commit c4bf91427a4fd1fbdb662667307189eabacf45b5:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-12 18:13:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 69b98f11d62cb12482130fac79b8ebf00c0bb139:

  io_u: only rewind file position if it's non-zero (2018-03-13 11:49:55 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      configure: don't disable lex on FreeBSD
      io_u: only rewind file position if it's non-zero

 configure | 2 +-
 io_u.c    | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index a73b61e..f38e9c7 100755
--- a/configure
+++ b/configure
@@ -282,7 +282,7 @@ fi
 # cross-compiling to one of these OSes then you'll need to specify
 # the correct CPU with the --cpu option.
 case $targetos in
-AIX|*BSD)
+AIX|OpenBSD|NetBSD)
   # Unless explicitly enabled, turn off lex.
   # OpenBSD will hit syntax error when enabled.
   if test -z "$disable_lex" ; then
diff --git a/io_u.c b/io_u.c
index a37b723..01b3693 100644
--- a/io_u.c
+++ b/io_u.c
@@ -430,7 +430,11 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 	if (f->last_pos[ddir] < f->real_file_size) {
 		uint64_t pos;
 
-		if (f->last_pos[ddir] == f->file_offset && o->ddir_seq_add < 0) {
+		/*
+		 * Only rewind if we already hit the end
+		 */
+		if (f->last_pos[ddir] == f->file_offset &&
+		    f->file_offset && o->ddir_seq_add < 0) {
 			if (f->real_file_size > f->io_size)
 				f->last_pos[ddir] = f->io_size;
 			else


* Recent changes (master)
@ 2018-03-13 12:00 Jens Axboe
From: Jens Axboe @ 2018-03-13 12:00 UTC
  To: fio

The following changes since commit e0b3258bde6f39ab3a9d178b56526b65e0e32a8d:

  filesetup: Initialize all members of struct fio_file (2018-03-09 21:34:44 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c4bf91427a4fd1fbdb662667307189eabacf45b5:

  Merge branch 'master' of https://github.com/bvanassche/fio (2018-03-12 18:13:10 -0600)

----------------------------------------------------------------
Bart Van Assche (4):
      configure: Disable lex on NetBSD
      Rename struct rb_node into struct fio_rb_node
      stat: Fix a compiler warning in __show_run_stats()
      parse: Fix two compiler warnings

Jens Axboe (1):
      Merge branch 'master' of https://github.com/bvanassche/fio

 configure    |  4 +++-
 iolog.c      |  4 ++--
 iolog.h      |  2 +-
 lib/rbtree.c | 49 +++++++++++++++++++++++++------------------------
 lib/rbtree.h | 37 +++++++++++++++++++------------------
 parse.c      |  4 ++--
 stat.c       |  5 +++--
 t/dedupe.c   |  6 +++---
 verify.c     |  2 +-
 9 files changed, 59 insertions(+), 54 deletions(-)
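Two of the warning fixes in this batch follow standard C rules. First, `tolower()` takes an `int` whose value must be representable as `unsigned char` or equal `EOF`. Passing a plain `char` that holds a negative value (possible where `char` is signed) is therefore undefined behavior, and the `(unsigned char)` cast added in parse.c avoids it. Second, `&now.tv_sec` is only a valid `const time_t *` where `tv_sec` really has type `time_t`, so stat.c now copies it into a `time_t` local first. The sketch below shows both idioms. `lower_inplace()` and `format_now()` are hypothetical helpers, and POSIX `ctime_r()` stands in for fio's `os_ctime_r()` wrapper, which is assumed to behave the same way.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Lower-case a string in place without handing negative values to tolower(). */
static void lower_inplace(char *s)
{
	for (; *s; s++)
		*s = (char)tolower((unsigned char)*s);
}

/*
 * Format the current time like ctime_r(), minus the trailing newline.
 * buf must hold at least 26 bytes. Returns buf, or NULL on failure.
 */
static char *format_now(char *buf)
{
	struct timeval now;
	time_t tv_sec;
	size_t len;

	if (gettimeofday(&now, NULL))
		return NULL;

	/* Copy rather than cast &now.tv_sec, whose type may not be time_t. */
	tv_sec = now.tv_sec;
	if (!ctime_r(&tv_sec, buf))
		return NULL;

	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	return buf;
}
```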

---

Diff of recent changes:

diff --git a/configure b/configure
index aefd5bb..a73b61e 100755
--- a/configure
+++ b/configure
@@ -252,6 +252,8 @@ elif check_define __linux__ ; then
   targetos="Linux"
 elif check_define __OpenBSD__ ; then
   targetos='OpenBSD'
+elif check_define __NetBSD__ ; then
+  targetos='NetBSD'
 elif check_define __sun__ ; then
   targetos='SunOS'
   CFLAGS="$CFLAGS -D_REENTRANT"
@@ -280,7 +282,7 @@ fi
 # cross-compiling to one of these OSes then you'll need to specify
 # the correct CPU with the --cpu option.
 case $targetos in
-AIX|OpenBSD)
+AIX|*BSD)
   # Unless explicitly enabled, turn off lex.
   # OpenBSD will hit syntax error when enabled.
   if test -z "$disable_lex" ; then
diff --git a/iolog.c b/iolog.c
index fc3dade..7d5a136 100644
--- a/iolog.c
+++ b/iolog.c
@@ -184,7 +184,7 @@ int read_iolog_get(struct thread_data *td, struct io_u *io_u)
 void prune_io_piece_log(struct thread_data *td)
 {
 	struct io_piece *ipo;
-	struct rb_node *n;
+	struct fio_rb_node *n;
 
 	while ((n = rb_first(&td->io_hist_tree)) != NULL) {
 		ipo = rb_entry(n, struct io_piece, rb_node);
@@ -208,7 +208,7 @@ void prune_io_piece_log(struct thread_data *td)
  */
 void log_io_piece(struct thread_data *td, struct io_u *io_u)
 {
-	struct rb_node **p, *parent;
+	struct fio_rb_node **p, *parent;
 	struct io_piece *ipo, *__ipo;
 
 	ipo = malloc(sizeof(struct io_piece));
diff --git a/iolog.h b/iolog.h
index 2266617..70981f9 100644
--- a/iolog.h
+++ b/iolog.h
@@ -199,7 +199,7 @@ enum {
  */
 struct io_piece {
 	union {
-		struct rb_node rb_node;
+		struct fio_rb_node rb_node;
 		struct flist_head list;
 	};
 	struct flist_head trim_list;
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 00a5a90..6f0feae 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -22,10 +22,10 @@
 
 #include "rbtree.h"
 
-static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
+static void __rb_rotate_left(struct fio_rb_node *node, struct rb_root *root)
 {
-	struct rb_node *right = node->rb_right;
-	struct rb_node *parent = rb_parent(node);
+	struct fio_rb_node *right = node->rb_right;
+	struct fio_rb_node *parent = rb_parent(node);
 
 	if ((node->rb_right = right->rb_left))
 		rb_set_parent(right->rb_left, node);
@@ -45,10 +45,10 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	rb_set_parent(node, right);
 }
 
-static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
+static void __rb_rotate_right(struct fio_rb_node *node, struct rb_root *root)
 {
-	struct rb_node *left = node->rb_left;
-	struct rb_node *parent = rb_parent(node);
+	struct fio_rb_node *left = node->rb_left;
+	struct fio_rb_node *parent = rb_parent(node);
 
 	if ((node->rb_left = left->rb_right))
 		rb_set_parent(left->rb_right, node);
@@ -68,9 +68,9 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	rb_set_parent(node, left);
 }
 
-void rb_insert_color(struct rb_node *node, struct rb_root *root)
+void rb_insert_color(struct fio_rb_node *node, struct rb_root *root)
 {
-	struct rb_node *parent, *gparent;
+	struct fio_rb_node *parent, *gparent;
 
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
@@ -79,7 +79,7 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
 		if (parent == gparent->rb_left)
 		{
 			{
-				register struct rb_node *uncle = gparent->rb_right;
+				register struct fio_rb_node *uncle = gparent->rb_right;
 				if (uncle && rb_is_red(uncle))
 				{
 					rb_set_black(uncle);
@@ -92,7 +92,7 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
 
 			if (parent->rb_right == node)
 			{
-				register struct rb_node *tmp;
+				register struct fio_rb_node *tmp;
 				__rb_rotate_left(parent, root);
 				tmp = parent;
 				parent = node;
@@ -104,7 +104,7 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
 			__rb_rotate_right(gparent, root);
 		} else {
 			{
-				register struct rb_node *uncle = gparent->rb_left;
+				register struct fio_rb_node *uncle = gparent->rb_left;
 				if (uncle && rb_is_red(uncle))
 				{
 					rb_set_black(uncle);
@@ -117,7 +117,7 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
 
 			if (parent->rb_left == node)
 			{
-				register struct rb_node *tmp;
+				register struct fio_rb_node *tmp;
 				__rb_rotate_right(parent, root);
 				tmp = parent;
 				parent = node;
@@ -133,10 +133,11 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
 	rb_set_black(root->rb_node);
 }
 
-static void __rb_erase_color(struct rb_node *node, struct rb_node *parent,
+static void __rb_erase_color(struct fio_rb_node *node,
+			     struct fio_rb_node *parent,
 			     struct rb_root *root)
 {
-	struct rb_node *other;
+	struct fio_rb_node *other;
 
 	while ((!node || rb_is_black(node)) && node != root->rb_node)
 	{
@@ -161,7 +162,7 @@ static void __rb_erase_color(struct rb_node *node, struct rb_node *parent,
 			{
 				if (!other->rb_right || rb_is_black(other->rb_right))
 				{
-					struct rb_node *o_left;
+					struct fio_rb_node *o_left;
 					if ((o_left = other->rb_left))
 						rb_set_black(o_left);
 					rb_set_red(other);
@@ -198,7 +199,7 @@ static void __rb_erase_color(struct rb_node *node, struct rb_node *parent,
 			{
 				if (!other->rb_left || rb_is_black(other->rb_left))
 				{
-					register struct rb_node *o_right;
+					register struct fio_rb_node *o_right;
 					if ((o_right = other->rb_right))
 						rb_set_black(o_right);
 					rb_set_red(other);
@@ -219,9 +220,9 @@ static void __rb_erase_color(struct rb_node *node, struct rb_node *parent,
 		rb_set_black(node);
 }
 
-void rb_erase(struct rb_node *node, struct rb_root *root)
+void rb_erase(struct fio_rb_node *node, struct rb_root *root)
 {
-	struct rb_node *child, *parent;
+	struct fio_rb_node *child, *parent;
 	int color;
 
 	if (!node->rb_left)
@@ -230,7 +231,7 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		child = node->rb_left;
 	else
 	{
-		struct rb_node *old = node, *left;
+		struct fio_rb_node *old = node, *left;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
@@ -289,9 +290,9 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 /*
  * This function returns the first node (in sort order) of the tree.
  */
-struct rb_node *rb_first(struct rb_root *root)
+struct fio_rb_node *rb_first(struct rb_root *root)
 {
-	struct rb_node	*n;
+	struct fio_rb_node	*n;
 
 	n = root->rb_node;
 	if (!n)
@@ -301,9 +302,9 @@ struct rb_node *rb_first(struct rb_root *root)
 	return n;
 }
 
-struct rb_node *rb_next(const struct rb_node *node)
+struct fio_rb_node *rb_next(const struct fio_rb_node *node)
 {
-	struct rb_node *parent;
+	struct fio_rb_node *parent;
 
 	if (RB_EMPTY_NODE(node))
 		return NULL;
@@ -316,7 +317,7 @@ struct rb_node *rb_next(const struct rb_node *node)
 		node = node->rb_right; 
 		while (node->rb_left)
 			node=node->rb_left;
-		return (struct rb_node *)node;
+		return (struct fio_rb_node *)node;
 	}
 
 	/*
diff --git a/lib/rbtree.h b/lib/rbtree.h
index f31fc56..82ab97a 100644
--- a/lib/rbtree.h
+++ b/lib/rbtree.h
@@ -34,7 +34,7 @@
 static inline struct page * rb_search_page_cache(struct inode * inode,
 						 unsigned long offset)
 {
-	struct rb_node * n = inode->i_rb_page_cache.rb_node;
+	struct fio_rb_node * n = inode->i_rb_page_cache.rb_node;
 	struct page * page;
 
 	while (n)
@@ -53,10 +53,10 @@ static inline struct page * rb_search_page_cache(struct inode * inode,
 
 static inline struct page * __rb_insert_page_cache(struct inode * inode,
 						   unsigned long offset,
-						   struct rb_node * node)
+						   struct fio_rb_node * node)
 {
-	struct rb_node ** p = &inode->i_rb_page_cache.rb_node;
-	struct rb_node * parent = NULL;
+	struct fio_rb_node ** p = &inode->i_rb_page_cache.rb_node;
+	struct fio_rb_node * parent = NULL;
 	struct page * page;
 
 	while (*p)
@@ -79,7 +79,7 @@ static inline struct page * __rb_insert_page_cache(struct inode * inode,
 
 static inline struct page * rb_insert_page_cache(struct inode * inode,
 						 unsigned long offset,
-						 struct rb_node * node)
+						 struct fio_rb_node * node)
 {
 	struct page * ret;
 	if ((ret = __rb_insert_page_cache(inode, offset, node)))
@@ -97,34 +97,34 @@ static inline struct page * rb_insert_page_cache(struct inode * inode,
 #include <stdlib.h>
 #include <inttypes.h>
 
-struct rb_node
+struct fio_rb_node
 {
 	intptr_t rb_parent_color;
 #define	RB_RED		0
 #define	RB_BLACK	1
-	struct rb_node *rb_right;
-	struct rb_node *rb_left;
+	struct fio_rb_node *rb_right;
+	struct fio_rb_node *rb_left;
 } __attribute__((aligned(sizeof(long))));
     /* The alignment might seem pointless, but allegedly CRIS needs it */
 
 struct rb_root
 {
-	struct rb_node *rb_node;
+	struct fio_rb_node *rb_node;
 };
 
 
-#define rb_parent(r)   ((struct rb_node *)((r)->rb_parent_color & ~3))
+#define rb_parent(r)   ((struct fio_rb_node *)((r)->rb_parent_color & ~3))
 #define rb_color(r)   ((r)->rb_parent_color & 1)
 #define rb_is_red(r)   (!rb_color(r))
 #define rb_is_black(r) rb_color(r)
 #define rb_set_red(r)  do { (r)->rb_parent_color &= ~1; } while (0)
 #define rb_set_black(r)  do { (r)->rb_parent_color |= 1; } while (0)
 
-static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
+static inline void rb_set_parent(struct fio_rb_node *rb, struct fio_rb_node *p)
 {
 	rb->rb_parent_color = (rb->rb_parent_color & 3) | (uintptr_t)p;
 }
-static inline void rb_set_color(struct rb_node *rb, int color)
+static inline void rb_set_color(struct fio_rb_node *rb, int color)
 {
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
@@ -136,15 +136,16 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 #define RB_EMPTY_NODE(node)	(rb_parent(node) == node)
 #define RB_CLEAR_NODE(node)	(rb_set_parent(node, node))
 
-extern void rb_insert_color(struct rb_node *, struct rb_root *);
-extern void rb_erase(struct rb_node *, struct rb_root *);
+extern void rb_insert_color(struct fio_rb_node *, struct rb_root *);
+extern void rb_erase(struct fio_rb_node *, struct rb_root *);
 
 /* Find logical next and previous nodes in a tree */
-extern struct rb_node *rb_first(struct rb_root *);
-extern struct rb_node *rb_next(const struct rb_node *);
+extern struct fio_rb_node *rb_first(struct rb_root *);
+extern struct fio_rb_node *rb_next(const struct fio_rb_node *);
 
-static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
-				struct rb_node ** rb_link)
+static inline void rb_link_node(struct fio_rb_node * node,
+				struct fio_rb_node * parent,
+				struct fio_rb_node ** rb_link)
 {
 	node->rb_parent_color = (uintptr_t)parent;
 	node->rb_left = node->rb_right = NULL;
diff --git a/parse.c b/parse.c
index a9ee1ce..fdb6611 100644
--- a/parse.c
+++ b/parse.c
@@ -172,7 +172,7 @@ static unsigned long long get_mult_time(const char *str, int len,
 
 	c = strdup(p);
 	for (i = 0; i < strlen(c); i++)
-		c[i] = tolower(c[i]);
+		c[i] = tolower((unsigned char)c[i]);
 
 	if (!strncmp("us", c, 2) || !strncmp("usec", c, 4))
 		mult = 1;
@@ -218,7 +218,7 @@ static unsigned long long __get_mult_bytes(const char *p, void *data,
 	c = strdup(p);
 
 	for (i = 0; i < strlen(c); i++) {
-		c[i] = tolower(c[i]);
+		c[i] = tolower((unsigned char)c[i]);
 		if (is_separator(c[i])) {
 			c[i] = '\0';
 			break;
diff --git a/stat.c b/stat.c
index f89913b..8a242c9 100644
--- a/stat.c
+++ b/stat.c
@@ -1860,13 +1860,14 @@ void __show_run_stats(void)
 		char time_buf[32];
 		struct timeval now;
 		unsigned long long ms_since_epoch;
+		time_t tv_sec;
 
 		gettimeofday(&now, NULL);
 		ms_since_epoch = (unsigned long long)(now.tv_sec) * 1000 +
 		                 (unsigned long long)(now.tv_usec) / 1000;
 
-		os_ctime_r((const time_t *) &now.tv_sec, time_buf,
-				sizeof(time_buf));
+		tv_sec = now.tv_sec;
+		os_ctime_r(&tv_sec, time_buf, sizeof(time_buf));
 		if (time_buf[strlen(time_buf) - 1] == '\n')
 			time_buf[strlen(time_buf) - 1] = '\0';
 
diff --git a/t/dedupe.c b/t/dedupe.c
index c3b837f..9a50821 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -49,7 +49,7 @@ struct extent {
 };
 
 struct chunk {
-	struct rb_node rb_node;
+	struct fio_rb_node rb_node;
 	uint64_t count;
 	uint32_t hash[MD5_HASH_WORDS];
 	struct flist_head extent_list[0];
@@ -194,7 +194,7 @@ static struct chunk *alloc_chunk(void)
 
 static void insert_chunk(struct item *i)
 {
-	struct rb_node **p, *parent;
+	struct fio_rb_node **p, *parent;
 	struct chunk *c;
 	int diff;
 
@@ -497,7 +497,7 @@ static void show_stat(uint64_t nextents, uint64_t nchunks)
 
 static void iter_rb_tree(uint64_t *nextents, uint64_t *nchunks)
 {
-	struct rb_node *n;
+	struct fio_rb_node *n;
 
 	*nchunks = *nextents = 0;
 
diff --git a/verify.c b/verify.c
index d070f33..17af3bb 100644
--- a/verify.c
+++ b/verify.c
@@ -1307,7 +1307,7 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		return 0;
 
 	if (!RB_EMPTY_ROOT(&td->io_hist_tree)) {
-		struct rb_node *n = rb_first(&td->io_hist_tree);
+		struct fio_rb_node *n = rb_first(&td->io_hist_tree);
 
 		ipo = rb_entry(n, struct io_piece, rb_node);
 

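The renamed `struct fio_rb_node` keeps storing the parent pointer and the node color in a single word, `rb_parent_color`. `rb_parent()` clears the low two bits, and `rb_color()` reads bit 0 (`RB_RED` is 0, `RB_BLACK` is 1). This only works because node addresses are at least 4-byte aligned, which the `aligned(sizeof(long))` attribute helps guarantee. The sketch below mirrors those macros as functions on a hypothetical `struct tnode` so the bit manipulation can be checked on its own. It is an illustration of the tagging scheme, not a replacement for lib/rbtree.h.

```c
#include <assert.h>
#include <stdint.h>

#define T_RED	0
#define T_BLACK	1

/* Hypothetical node with the same layout idea as struct fio_rb_node. */
struct tnode {
	uintptr_t parent_color;		/* parent pointer | color in bit 0 */
	struct tnode *left, *right;
};

/* The two low bits are only free if nodes are at least 4-byte aligned. */
_Static_assert(_Alignof(struct tnode) >= 4, "need two spare pointer bits");

static struct tnode *t_parent(const struct tnode *n)
{
	return (struct tnode *)(n->parent_color & ~(uintptr_t)3);
}

static int t_color(const struct tnode *n)
{
	return (int)(n->parent_color & 1);
}

static void t_set_parent(struct tnode *n, struct tnode *p)
{
	/* Keep the two tag bits, replace the pointer bits. */
	n->parent_color = (n->parent_color & 3) | (uintptr_t)p;
}

static void t_set_color(struct tnode *n, int color)
{
	n->parent_color = (n->parent_color & ~(uintptr_t)1) | (uintptr_t)color;
}
```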

* Recent changes (master)
@ 2018-03-10 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-10 13:00 UTC
  To: fio

The following changes since commit 582e2fd9acae207400bed4226ceda4ee02464136:

  Merge branch 'disable_opt' of https://github.com/sitsofe/fio (2018-03-07 10:22:28 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e0b3258bde6f39ab3a9d178b56526b65e0e32a8d:

  filesetup: Initialize all members of struct fio_file (2018-03-09 21:34:44 -0700)

----------------------------------------------------------------
Bart Van Assche (5):
      Remove prof_io_ops.fill_io_u_off(), .fill_io_u_size() and .get_next_file()
      Makefile: Rerun the configure script if it has been modified
      Declare debug_levels[] const
      helper_thread: Initialize all helper_data members before using it
      filesetup: Initialize all members of struct fio_file

Jens Axboe (5):
      mutex: ensure that fio_mutex_up() holds mutex lock during wakeup
      mutex: fix other locations where we are not waking within the lock
      io_u: kill get_next_{offset,buflen} wrappers
      io_u: 'is_random' can be a boolean
      filesetup: don't round/adjust size for size_percent == 100
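The two `mutex:` commits above make `fio_mutex_up()` issue its wakeup while still holding the internal lock, and fix other places that woke waiters only after unlocking. (`fio_mutex_up()` is renamed `fio_sem_up()` by the change at the top of this section.) One hazard this avoids involves waiters such as `fio_server_get_verify_state()`, which remove and free the semaphore once the timed down returns and the reply is handled. A signal sent after the unlock could touch a condition variable that has already been destroyed. Below is a minimal sketch of a semaphore built on a pthread mutex and condition variable that follows the same rule. `struct toy_sem` and the `toy_sem_*()` functions are hypothetical, not fio's implementation, and the timeout is assumed to be in milliseconds.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical counting semaphore; not fio's struct fio_sem. */
struct toy_sem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int value;	/* 0 means "locked" */
};

static void toy_sem_init(struct toy_sem *sem, unsigned int value)
{
	pthread_mutex_init(&sem->lock, NULL);
	pthread_cond_init(&sem->cond, NULL);
	sem->value = value;
}

static void toy_sem_up(struct toy_sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	sem->value++;
	/* Signal before unlocking, so a waiter that frees sem cannot race us. */
	pthread_cond_signal(&sem->cond);
	pthread_mutex_unlock(&sem->lock);
}

static void toy_sem_down(struct toy_sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	while (!sem->value)
		pthread_cond_wait(&sem->cond, &sem->lock);
	sem->value--;
	pthread_mutex_unlock(&sem->lock);
}

/* Returns 0 once taken, else a nonzero pthread error such as ETIMEDOUT. */
static int toy_sem_down_timeout(struct toy_sem *sem, unsigned int msecs)
{
	struct timespec ts;
	int ret = 0;

	/* Build an absolute CLOCK_REALTIME deadline for pthread_cond_timedwait(). */
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += msecs / 1000;
	ts.tv_nsec += (long)(msecs % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&sem->lock);
	while (!sem->value && !ret)
		ret = pthread_cond_timedwait(&sem->cond, &sem->lock, &ts);
	if (sem->value) {
		/* Taken, even if the wait also reported a timeout. */
		sem->value--;
		ret = 0;
	}
	pthread_mutex_unlock(&sem->lock);
	return ret;
}
```

Initializing with 0 gives the "starts locked" behavior that `__fio_sem_init(&rep->lock, FIO_SEM_LOCKED)` relies on in the server.c hunks above. The requester blocks in the timed down until another thread calls up.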

 Makefile        | 23 +++++++++++------------
 debug.h         |  2 +-
 filesetup.c     |  4 ++--
 helper_thread.c |  2 +-
 init.c          |  6 +++---
 io_u.c          | 57 ++++++++++++---------------------------------------------
 mutex.c         |  3 ++-
 profile.h       |  4 ----
 verify.c        |  2 +-
 workqueue.c     |  2 +-
 10 files changed, 34 insertions(+), 71 deletions(-)
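The `helper_thread` and `filesetup` fixes in this batch switch `smalloc(sizeof(*x))` to `scalloc(1, sizeof(*x))` so every member starts out zeroed. The smalloc.c hunks at the top of this section later route both calls through `__smalloc(size, is_zeroed)`. They report each block to Valgrind with `VALGRIND_MALLOCLIKE_BLOCK()` and `VALGRIND_FREELIKE_BLOCK()`, and fall back to no-op macros when `CONFIG_VALGRIND_DEV` is unset. The sketch below shows that shape on a deliberately simple fixed-slot pool. `pool_alloc_flags()`, `pool_alloc()`, `pool_calloc()`, `pool_free()` and the slot sizes are hypothetical. Unlike fio's pools, this one has no locking and no red zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef CONFIG_VALGRIND_DEV
#include <valgrind/valgrind.h>
#else
/* No-op fallbacks, matching the ones in the smalloc.c hunk. */
#define RUNNING_ON_VALGRIND 0
#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) do { } while (0)
#define VALGRIND_FREELIKE_BLOCK(addr, rzB) do { } while (0)
#endif

#define SLOT_SIZE	64
#define NR_SLOTS	16

/* Hypothetical fixed-slot pool: one allocation per slot. */
static unsigned char pool_mem[NR_SLOTS * SLOT_SIZE];
static bool slot_used[NR_SLOTS];

static void *pool_alloc_flags(size_t size, bool is_zeroed)
{
	if (size == 0 || size > SLOT_SIZE)
		return NULL;

	for (size_t i = 0; i < NR_SLOTS; i++) {
		unsigned char *ptr = pool_mem + i * SLOT_SIZE;

		if (slot_used[i])
			continue;
		slot_used[i] = true;
		if (is_zeroed)
			memset(ptr, 0, size);
		/* Red zone size 0: this toy pool has no guard words. */
		VALGRIND_MALLOCLIKE_BLOCK(ptr, size, 0, is_zeroed);
		return ptr;
	}
	return NULL;
}

static void *pool_alloc(size_t size)
{
	return pool_alloc_flags(size, false);
}

static void *pool_calloc(size_t nmemb, size_t size)
{
	/* Reject nmemb * size overflow before multiplying. */
	if (size && nmemb > SIZE_MAX / size)
		return NULL;
	return pool_alloc_flags(nmemb * size, true);
}

static void pool_free(void *ptr)
{
	size_t i = (size_t)((unsigned char *)ptr - pool_mem) / SLOT_SIZE;

	/* Tell Valgrind the block is gone before the slot can be reused. */
	VALGRIND_FREELIKE_BLOCK(ptr, 0);
	slot_used[i] = false;
}
```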

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 19ba40a..d73b944 100644
--- a/Makefile
+++ b/Makefile
@@ -4,19 +4,18 @@ endif
 
 VPATH := $(SRCDIR)
 
-ifneq ($(wildcard config-host.mak),)
-all:
-include config-host.mak
-config-host-mak: configure
-	@echo $@ is out-of-date, running configure
-	@sed -n "/.*Configured with/s/[^:]*: //p" $@ | sh
-else
-config-host.mak:
+all: fio
+
+config-host.mak: configure
+	@if [ ! -e "$@" ]; then					\
+	  echo "Running configure ...";				\
+	  ./configure;						\
+	else							\
+	  echo "$@ is out-of-date, running configure";		\
+	  sed -n "/.*Configured with/s/[^:]*: //p" "$@" | sh;	\
+	fi
+
 ifneq ($(MAKECMDGOALS),clean)
-	@echo "Running configure for you..."
-	@./configure
-endif
-all:
 include config-host.mak
 endif
 
diff --git a/debug.h b/debug.h
index ac5f2cc..b8718dd 100644
--- a/debug.h
+++ b/debug.h
@@ -53,7 +53,7 @@ struct debug_level {
 	unsigned long shift;
 	unsigned int jobno;
 };
-extern struct debug_level debug_levels[];
+extern const struct debug_level debug_levels[];
 
 extern unsigned long fio_debug;
 
diff --git a/filesetup.c b/filesetup.c
index cced556..1a187ff 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1036,7 +1036,7 @@ int setup_files(struct thread_data *td)
 		if (f->io_size == -1ULL)
 			total_size = -1ULL;
 		else {
-                        if (o->size_percent) {
+                        if (o->size_percent && o->size_percent != 100) {
 				uint64_t file_size;
 
 				file_size = f->io_size + f->file_offset;
@@ -1481,7 +1481,7 @@ static struct fio_file *alloc_new_file(struct thread_data *td)
 	if (td_ioengine_flagged(td, FIO_NOFILEHASH))
 		f = calloc(1, sizeof(*f));
 	else
-		f = smalloc(sizeof(*f));
+		f = scalloc(1, sizeof(*f));
 	if (!f) {
 		assert(0);
 		return NULL;
diff --git a/helper_thread.c b/helper_thread.c
index 64e5a3c..b05f821 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -156,7 +156,7 @@ int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
 	struct helper_data *hd;
 	int ret;
 
-	hd = smalloc(sizeof(*hd));
+	hd = scalloc(1, sizeof(*hd));
 
 	setup_disk_util();
 	steadystate_setup();
diff --git a/init.c b/init.c
index 28061db..bb0627b 100644
--- a/init.c
+++ b/init.c
@@ -2094,7 +2094,7 @@ static int fill_def_thread(void)
 static void show_debug_categories(void)
 {
 #ifdef FIO_INC_DEBUG
-	struct debug_level *dl = &debug_levels[0];
+	const struct debug_level *dl = &debug_levels[0];
 	int curlen, first = 1;
 
 	curlen = 0;
@@ -2184,7 +2184,7 @@ static void usage(const char *name)
 }
 
 #ifdef FIO_INC_DEBUG
-struct debug_level debug_levels[] = {
+const struct debug_level debug_levels[] = {
 	{ .name = "process",
 	  .help = "Process creation/exit logging",
 	  .shift = FD_PROCESS,
@@ -2262,7 +2262,7 @@ struct debug_level debug_levels[] = {
 
 static int set_debug(const char *string)
 {
-	struct debug_level *dl;
+	const struct debug_level *dl;
 	char *p = (char *) string;
 	char *opt;
 	int i;
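The `debug.h` and `init.c` hunks make `debug_levels[]` a `const` table. That change forces every pointer that walks the table (`dl` in `show_debug_categories()` and `set_debug()`) to become `const struct debug_level *`. Below is a minimal sketch of the pattern. The struct layout, the entries, the `NULL`-name sentinel and the `lookup_shift()` helper are assumptions made for illustration; they are not fio's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fio's struct debug_level. */
struct debug_level {
	const char *name;
	const char *help;
	unsigned long shift;
};

/* A const table can be placed in read-only data and cannot be modified. */
static const struct debug_level levels[] = {
	{ .name = "process", .help = "Process creation/exit logging", .shift = 0 },
	{ .name = "file",    .help = "File related actions",          .shift = 1 },
	{ .name = NULL, },	/* assumed sentinel marking the end of the table */
};

/* Return the shift for a named category, or -1 if it is unknown. */
static long lookup_shift(const char *name)
{
	const struct debug_level *dl;	/* must be const to point into levels[] */

	for (dl = &levels[0]; dl->name; dl++)
		if (!strcmp(dl->name, name))
			return (long) dl->shift;
	return -1;
}
```

If `dl` were declared without `const`, assigning `&levels[0]` to it would discard the qualifier, and the compiler would warn about it.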
diff --git a/io_u.c b/io_u.c
index 61d09ba..a37b723 100644
--- a/io_u.c
+++ b/io_u.c
@@ -470,7 +470,7 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 
 static int get_next_block(struct thread_data *td, struct io_u *io_u,
 			  enum fio_ddir ddir, int rw_seq,
-			  unsigned int *is_random)
+			  bool *is_random)
 {
 	struct fio_file *f = io_u->file;
 	uint64_t b, offset;
@@ -484,27 +484,27 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 		if (td_random(td)) {
 			if (should_do_random(td, ddir)) {
 				ret = get_next_rand_block(td, f, ddir, &b);
-				*is_random = 1;
+				*is_random = true;
 			} else {
-				*is_random = 0;
+				*is_random = false;
 				io_u_set(td, io_u, IO_U_F_BUSY_OK);
 				ret = get_next_seq_offset(td, f, ddir, &offset);
 				if (ret)
 					ret = get_next_rand_block(td, f, ddir, &b);
 			}
 		} else {
-			*is_random = 0;
+			*is_random = false;
 			ret = get_next_seq_offset(td, f, ddir, &offset);
 		}
 	} else {
 		io_u_set(td, io_u, IO_U_F_BUSY_OK);
-		*is_random = 0;
+		*is_random = false;
 
 		if (td->o.rw_seq == RW_SEQ_SEQ) {
 			ret = get_next_seq_offset(td, f, ddir, &offset);
 			if (ret) {
 				ret = get_next_rand_block(td, f, ddir, &b);
-				*is_random = 0;
+				*is_random = false;
 			}
 		} else if (td->o.rw_seq == RW_SEQ_IDENT) {
 			if (f->last_start[ddir] != -1ULL)
@@ -537,8 +537,8 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
  * until we find a free one. For sequential io, just return the end of
  * the last io issued.
  */
-static int __get_next_offset(struct thread_data *td, struct io_u *io_u,
-			     unsigned int *is_random)
+static int get_next_offset(struct thread_data *td, struct io_u *io_u,
+			   bool *is_random)
 {
 	struct fio_file *f = io_u->file;
 	enum fio_ddir ddir = io_u->ddir;
@@ -572,19 +572,6 @@ static int __get_next_offset(struct thread_data *td, struct io_u *io_u,
 	return 0;
 }
 
-static int get_next_offset(struct thread_data *td, struct io_u *io_u,
-			   unsigned int *is_random)
-{
-	if (td->flags & TD_F_PROFILE_OPS) {
-		struct prof_io_ops *ops = &td->prof_io_ops;
-
-		if (ops->fill_io_u_off)
-			return ops->fill_io_u_off(td, io_u, is_random);
-	}
-
-	return __get_next_offset(td, io_u, is_random);
-}
-
 static inline bool io_u_fits(struct thread_data *td, struct io_u *io_u,
 			     unsigned int buflen)
 {
@@ -593,8 +580,8 @@ static inline bool io_u_fits(struct thread_data *td, struct io_u *io_u,
 	return io_u->offset + buflen <= f->io_size + get_start_offset(td, f);
 }
 
-static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
-				      unsigned int is_random)
+static unsigned int get_next_buflen(struct thread_data *td, struct io_u *io_u,
+				    bool is_random)
 {
 	int ddir = io_u->ddir;
 	unsigned int buflen = 0;
@@ -605,7 +592,7 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 	assert(ddir_rw(ddir));
 
 	if (td->o.bs_is_seq_rand)
-		ddir = is_random ? DDIR_WRITE: DDIR_READ;
+		ddir = is_random ? DDIR_WRITE : DDIR_READ;
 
 	minbs = td->o.min_bs[ddir];
 	maxbs = td->o.max_bs[ddir];
@@ -655,19 +642,6 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 	return buflen;
 }
 
-static unsigned int get_next_buflen(struct thread_data *td, struct io_u *io_u,
-				    unsigned int is_random)
-{
-	if (td->flags & TD_F_PROFILE_OPS) {
-		struct prof_io_ops *ops = &td->prof_io_ops;
-
-		if (ops->fill_io_u_size)
-			return ops->fill_io_u_size(td, io_u, is_random);
-	}
-
-	return __get_next_buflen(td, io_u, is_random);
-}
-
 static void set_rwmix_bytes(struct thread_data *td)
 {
 	unsigned int diff;
@@ -957,7 +931,7 @@ static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
 
 static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 {
-	unsigned int is_random;
+	bool is_random;
 
 	if (td_ioengine_flagged(td, FIO_NOIO))
 		goto out;
@@ -1387,13 +1361,6 @@ out:
 
 static struct fio_file *get_next_file(struct thread_data *td)
 {
-	if (td->flags & TD_F_PROFILE_OPS) {
-		struct prof_io_ops *ops = &td->prof_io_ops;
-
-		if (ops->get_next_file)
-			return ops->get_next_file(td);
-	}
-
 	return __get_next_file(td);
 }
 
diff --git a/mutex.c b/mutex.c
index 63229ed..acc88dc 100644
--- a/mutex.c
+++ b/mutex.c
@@ -240,10 +240,11 @@ void fio_mutex_up(struct fio_mutex *mutex)
 	if (!mutex->value && mutex->waiters)
 		do_wake = 1;
 	mutex->value++;
-	pthread_mutex_unlock(&mutex->lock);
 
 	if (do_wake)
 		pthread_cond_signal(&mutex->cond);
+
+	pthread_mutex_unlock(&mutex->lock);
 }
 
 void fio_rwlock_write(struct fio_rwlock *lock)
diff --git a/profile.h b/profile.h
index 8d1f757..414151e 100644
--- a/profile.h
+++ b/profile.h
@@ -10,10 +10,6 @@ struct prof_io_ops {
 	int (*td_init)(struct thread_data *);
 	void (*td_exit)(struct thread_data *);
 
-	int (*fill_io_u_off)(struct thread_data *, struct io_u *, unsigned int *);
-	int (*fill_io_u_size)(struct thread_data *, struct io_u *, unsigned int);
-	struct fio_file *(*get_next_file)(struct thread_data *);
-
 	int (*io_u_lat)(struct thread_data *, uint64_t);
 };
 
diff --git a/verify.c b/verify.c
index aeafdb5..d070f33 100644
--- a/verify.c
+++ b/verify.c
@@ -748,9 +748,9 @@ int verify_io_u_async(struct thread_data *td, struct io_u **io_u_ptr)
 	}
 	flist_add_tail(&io_u->verify_list, &td->verify_list);
 	*io_u_ptr = NULL;
-	pthread_mutex_unlock(&td->io_u_lock);
 
 	pthread_cond_signal(&td->verify_cond);
+	pthread_mutex_unlock(&td->io_u_lock);
 	return 0;
 }
 
diff --git a/workqueue.c b/workqueue.c
index 1131400..18ec198 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -109,9 +109,9 @@ void workqueue_enqueue(struct workqueue *wq, struct workqueue_work *work)
 	flist_add_tail(&work->list, &sw->work_list);
 	sw->seq = ++wq->work_seq;
 	sw->flags &= ~SW_F_IDLE;
-	pthread_mutex_unlock(&sw->lock);
 
 	pthread_cond_signal(&sw->cond);
+	pthread_mutex_unlock(&sw->lock);
 }
 
 static void handle_list(struct submit_worker *sw, struct flist_head *list)

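The `mutex.c`, `verify.c` and `workqueue.c` hunks above all move `pthread_cond_signal()` ahead of `pthread_mutex_unlock()`. POSIX permits either order. Signalling while the mutex is still held closes one window, though. Otherwise, a thread could take the mutex right after the unlock, see the updated state, and tear down the object that holds the condition variable before the signal is sent. That is a plausible motivation, but the commit messages are not shown here. Below is a minimal sketch of the resulting producer/consumer pattern. `struct work_queue` and its helpers are hypothetical, not fio's types.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting queue guarded by a mutex + condition variable. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int pending;		/* predicate: number of queued items */
};

/* Producer: update the predicate and signal before dropping the lock,
 * mirroring the reordering in the hunks above. */
static void wq_enqueue(struct work_queue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->pending++;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Consumer: wait in a loop, since wakeups may be spurious. */
static void wq_dequeue(struct work_queue *wq)
{
	pthread_mutex_lock(&wq->lock);
	while (wq->pending == 0)
		pthread_cond_wait(&wq->cond, &wq->lock);
	wq->pending--;
	pthread_mutex_unlock(&wq->lock);
}

static void *producer(void *arg)
{
	struct work_queue *wq = arg;

	for (int i = 0; i < 1000; i++)
		wq_enqueue(wq);
	return NULL;
}

/* Run one producer thread against the calling thread as consumer.
 * Returns the number of items consumed, or -1 if the thread failed. */
static int run_demo(void)
{
	struct work_queue wq;
	pthread_t t;
	int consumed = 0;

	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.cond, NULL);
	wq.pending = 0;

	if (pthread_create(&t, NULL, producer, &wq))
		return -1;
	for (int i = 0; i < 1000; i++) {
		wq_dequeue(&wq);
		consumed++;
	}
	pthread_join(t, NULL);

	/* Safe to destroy: the producer has exited and nothing can signal. */
	pthread_cond_destroy(&wq.cond);
	pthread_mutex_destroy(&wq.lock);
	return consumed;
}
```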

* Recent changes (master)
@ 2018-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 674456bf7d0f775f774c8097c8bcb98b48cccc51:

  Reduce LOG_MSEC_SLACK (2018-03-06 17:53:20 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 582e2fd9acae207400bed4226ceda4ee02464136:

  Merge branch 'disable_opt' of https://github.com/sitsofe/fio (2018-03-07 10:22:28 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      configure: don't override march if already set
      Merge branch 'disable_opt' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      configure: make --disable-optimizations disable march=native
      appveyor: disable setting compiler march

 appveyor.yml |  2 +-
 configure    | 10 +++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/appveyor.yml b/appveyor.yml
index c6c3689..09ebccf 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -17,7 +17,7 @@ install:
   - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
 
 build_script:
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --disable-native --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
 
 after_build:
   - cd os\windows && dobuild.cmd %PLATFORM%
diff --git a/configure b/configure
index 589ff3f..aefd5bb 100755
--- a/configure
+++ b/configure
@@ -1,7 +1,7 @@
 #!/bin/sh
 #
 # Fio configure script. Heavily influenced by the manual qemu configure
-# script. Sad this this is easier than autoconf and enemies.
+# script. Sad this is easier than autoconf and enemies.
 #
 
 # set temporary file name
@@ -146,6 +146,7 @@ pmem="no"
 disable_lex=""
 disable_pmem="no"
 disable_native="no"
+march_set="no"
 prefix=/usr/local
 
 # parse options
@@ -2065,6 +2066,7 @@ EOF
   if compile_prog "-march=armv8-a+crc+crypto" "" ""; then
     march_armv8_a_crc_crypto="yes"
     CFLAGS="$CFLAGS -march=armv8-a+crc+crypto -DARCH_HAVE_CRC_CRYPTO"
+    march_set="yes"
   fi
 fi
 print_config "march_armv8_a_crc_crypto" "$march_armv8_a_crc_crypto"
@@ -2112,7 +2114,8 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if test "$disable_native" = "no" && compile_prog "-march=native" "" "march=native"; then
+if test "$disable_native" = "no" && test "$disable_opt" != "yes" && \
+   compile_prog "-march=native" "" "march=native"; then
   build_native="yes"
 fi
 print_config "Build march=native" "$build_native"
@@ -2287,6 +2290,7 @@ fi
 if test "$s390_z196_facilities" = "yes" ; then
   output_sym "CONFIG_S390_Z196_FACILITIES"
   CFLAGS="$CFLAGS -march=z9-109"
+  march_set="yes"
 fi
 if test "$gfapi" = "yes" ; then
   output_sym "CONFIG_GFAPI"
@@ -2360,7 +2364,7 @@ fi
 if test "$mkdir_two" = "yes" ; then
   output_sym "CONFIG_HAVE_MKDIR_TWO"
 fi
-if test "$build_native" = "yes" ; then
+if test "$march_set" = "no" && test "$build_native" = "yes" ; then
   output_sym "CONFIG_BUILD_NATIVE"
 fi
 

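These hunks, together with the previous day's `--disable-native` change, decide whether `CONFIG_BUILD_NATIVE` is emitted, and with it `-march=native` in the Makefile. The decision can be restated as a small C predicate. This is an illustrative model of the shell logic only; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical mirror of the configure variables involved. */
struct build_opts {
	bool disable_native;	/* --disable-native */
	bool disable_opt;	/* --disable-optimizations */
	bool cc_accepts_native;	/* compile_prog "-march=native" succeeded */
	bool march_set;		/* an arch-specific -march was already added */
};

/* Mirrors the && chain: the compiler probe only matters when neither
 * --disable-native nor --disable-optimizations was given. */
static bool build_native(const struct build_opts *o)
{
	return !o->disable_native && !o->disable_opt && o->cc_accepts_native;
}

/* CONFIG_BUILD_NATIVE is emitted only if no -march was chosen earlier,
 * so -march=native never overrides e.g. -march=armv8-a+crc+crypto. */
static bool emit_config_build_native(const struct build_opts *o)
{
	return !o->march_set && build_native(o);
}
```

For example, an ARMv8 build that already picked `-march=armv8-a+crc+crypto` sets `march_set`. That build then gets no `-march=native`, even when the compiler accepts it.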

* Recent changes (master)
@ 2018-03-07 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit b979af5cba25e31dc2a8f2fd89ac5e40c24b6519:

  Merge branch 'howto_typos' of https://github.com/dirtyharrycallahan/fio (2018-03-05 10:29:39 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 674456bf7d0f775f774c8097c8bcb98b48cccc51:

  Reduce LOG_MSEC_SLACK (2018-03-06 17:53:20 -0700)

----------------------------------------------------------------
Jason Dillaman (2):
      rbd: fixed busy-loop when using eventfd polling
      rbd: remove support for blkin tracing

Jeff Furlong (1):
      Reduce LOG_MSEC_SLACK

Jens Axboe (3):
      Don't make fadvise failure fatal
      Default to building native code
      Merge branch 'wip-rbd-engine' of https://github.com/dillaman/fio

 Makefile      |  3 +++
 configure     | 56 +++++++++++++++++++++-----------------------------------
 debug.h       |  1 +
 engines/rbd.c | 41 ++++++++++++++++++-----------------------
 ioengines.c   |  4 ++--
 stat.c        | 14 ++++++++------
 6 files changed, 53 insertions(+), 66 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index c25b422..19ba40a 100644
--- a/Makefile
+++ b/Makefile
@@ -31,6 +31,9 @@ SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
 endif
+ifdef CONFIG_BUILD_NATIVE
+  CFLAGS += -march=native
+endif
 
 ifdef CONFIG_GFIO
   PROGS += gfio
diff --git a/configure b/configure
index 2e8eb18..589ff3f 100755
--- a/configure
+++ b/configure
@@ -145,6 +145,7 @@ devdax="no"
 pmem="no"
 disable_lex=""
 disable_pmem="no"
+disable_native="no"
 prefix=/usr/local
 
 # parse options
@@ -177,8 +178,6 @@ for opt do
   ;;
   --disable-rbd) disable_rbd="yes"
   ;;
-  --disable-rbd-blkin) disable_rbd_blkin="yes"
-  ;;
   --disable-gfapi) disable_gfapi="yes"
   ;;
   --enable-libhdfs) libhdfs="yes"
@@ -195,6 +194,8 @@ for opt do
   ;;
   --enable-cuda) enable_cuda="yes"
   ;;
+  --disable-native) disable_native="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -224,6 +225,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-shm           Disable SHM support"
   echo "--disable-optimizations Don't enable compiler optimizations"
   echo "--enable-cuda           Enable GPUDirect RDMA support"
+  echo "--disable-native        Don't build for native host"
   exit $exit_val
 fi
 
@@ -1629,36 +1631,6 @@ print_config "rbd_invalidate_cache" "$rbd_inval"
 fi
 
 ##########################################
-# check for blkin
-if test "$rbd_blkin" != "yes" ; then
-  rbd_blkin="no"
-fi
-cat > $TMPC << EOF
-#include <rbd/librbd.h>
-#include <zipkin_c.h>
-
-int main(int argc, char **argv)
-{
-  int r;
-  struct blkin_trace_info t_info;
-  blkin_init_trace_info(&t_info);
-  rbd_completion_t completion;
-  rbd_image_t image;
-  uint64_t off;
-  size_t len;
-  const char *buf;
-  r = rbd_aio_write_traced(image, off, len, buf, completion, &t_info);
-  return 0;
-}
-EOF
-if test "$disable_rbd" != "yes" && test "$disable_rbd_blkin" != "yes" \
- && compile_prog "" "-lrbd -lrados -lblkin" "rbd_blkin"; then
-  LIBS="-lblkin $LIBS"
-  rbd_blkin="yes"
-fi
-print_config "rbd blkin tracing" "$rbd_blkin"
-
-##########################################
 # Check whether we have setvbuf
 if test "$setvbuf" != "yes" ; then
   setvbuf="no"
@@ -2131,6 +2103,20 @@ if compile_prog "" "" "mkdir(a, b)"; then
 fi
 print_config "mkdir(a, b)" "$mkdir_two"
 
+##########################################
+# check for cc -march=native
+build_native="no"
+cat > $TMPC << EOF
+int main(int argc, char **argv)
+{
+  return 0;
+}
+EOF
+if test "$disable_native" = "no" && compile_prog "-march=native" "" "march=native"; then
+  build_native="yes"
+fi
+print_config "Build march=native" "$build_native"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2295,9 +2281,6 @@ fi
 if test "$rbd_inval" = "yes" ; then
   output_sym "CONFIG_RBD_INVAL"
 fi
-if test "$rbd_blkin" = "yes" ; then
-  output_sym "CONFIG_RBD_BLKIN"
-fi
 if test "$setvbuf" = "yes" ; then
   output_sym "CONFIG_SETVBUF"
 fi
@@ -2377,6 +2360,9 @@ fi
 if test "$mkdir_two" = "yes" ; then
   output_sym "CONFIG_HAVE_MKDIR_TWO"
 fi
+if test "$build_native" = "yes" ; then
+  output_sym "CONFIG_BUILD_NATIVE"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/debug.h b/debug.h
index ba62214..ac5f2cc 100644
--- a/debug.h
+++ b/debug.h
@@ -43,6 +43,7 @@ enum {
 	FIO_WARN_VERIFY_BUF	= 2,
 	FIO_WARN_ZONED_BUG	= 4,
 	FIO_WARN_IOLOG_DROP	= 8,
+	FIO_WARN_FADVISE	= 16,
 };
 
 #ifdef FIO_INC_DEBUG
diff --git a/engines/rbd.c b/engines/rbd.c
index 39501eb..6582b06 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -9,9 +9,6 @@
 
 #include "../fio.h"
 #include "../optgroup.h"
-#ifdef CONFIG_RBD_BLKIN
-#include <zipkin_c.h>
-#endif
 
 #ifdef CONFIG_RBD_POLL
 /* add for poll */
@@ -24,9 +21,6 @@ struct fio_rbd_iou {
 	rbd_completion_t completion;
 	int io_seen;
 	int io_complete;
-#ifdef CONFIG_RBD_BLKIN
-	struct blkin_trace_info info;
-#endif
 };
 
 struct rbd_data {
@@ -146,7 +140,7 @@ static bool _fio_rbd_setup_poll(struct rbd_data *rbd)
 	int r;
 
 	/* add for rbd poll */
-	rbd->fd = eventfd(0, EFD_NONBLOCK);
+	rbd->fd = eventfd(0, EFD_SEMAPHORE);
 	if (rbd->fd < 0) {
 		log_err("eventfd failed.\n");
 		return false;
@@ -366,25 +360,37 @@ static int rbd_iter_events(struct thread_data *td, unsigned int *events,
 	int event_num = 0;
 	struct fio_rbd_iou *fri = NULL;
 	rbd_completion_t comps[min_evts];
+	uint64_t counter;
+	bool completed;
 
 	struct pollfd pfd;
 	pfd.fd = rbd->fd;
 	pfd.events = POLLIN;
 
-	ret = poll(&pfd, 1, -1);
+	ret = poll(&pfd, 1, wait ? -1 : 0);
 	if (ret <= 0)
 		return 0;
-
-	assert(pfd.revents & POLLIN);
+	if (!(pfd.revents & POLLIN))
+		return 0;
 
 	event_num = rbd_poll_io_events(rbd->image, comps, min_evts);
 
 	for (i = 0; i < event_num; i++) {
 		fri = rbd_aio_get_arg(comps[i]);
 		io_u = fri->io_u;
+
+		/* best effort to decrement the semaphore */
+		ret = read(rbd->fd, &counter, sizeof(counter));
+		if (ret <= 0)
+			log_err("rbd_iter_events failed to decrement semaphore.\n");
+
+		completed = fri_check_complete(rbd, io_u, events);
+		assert(completed);
+
+		this_events++;
+	}
 #else
 	io_u_qiter(&td->io_u_all, io_u, i) {
-#endif
 		if (!(io_u->flags & IO_U_F_FLIGHT))
 			continue;
 		if (rbd_io_u_seen(io_u))
@@ -395,6 +401,7 @@ static int rbd_iter_events(struct thread_data *td, unsigned int *events,
 		else if (wait)
 			rbd->sort_events[sidx++] = io_u;
 	}
+#endif
 
 	if (!wait || !sidx)
 		return this_events;
@@ -474,28 +481,16 @@ static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->ddir == DDIR_WRITE) {
-#ifdef CONFIG_RBD_BLKIN
-		blkin_init_trace_info(&fri->info);
-		r = rbd_aio_write_traced(rbd->image, io_u->offset, io_u->xfer_buflen,
-					 io_u->xfer_buf, fri->completion, &fri->info);
-#else
 		r = rbd_aio_write(rbd->image, io_u->offset, io_u->xfer_buflen,
 					 io_u->xfer_buf, fri->completion);
-#endif
 		if (r < 0) {
 			log_err("rbd_aio_write failed.\n");
 			goto failed_comp;
 		}
 
 	} else if (io_u->ddir == DDIR_READ) {
-#ifdef CONFIG_RBD_BLKIN
-		blkin_init_trace_info(&fri->info);
-		r = rbd_aio_read_traced(rbd->image, io_u->offset, io_u->xfer_buflen,
-					io_u->xfer_buf, fri->completion, &fri->info);
-#else
 		r = rbd_aio_read(rbd->image, io_u->offset, io_u->xfer_buflen,
 					io_u->xfer_buf, fri->completion);
-#endif
 
 		if (r < 0) {
 			log_err("rbd_aio_read failed.\n");
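The rbd change makes three adjustments. It creates the completion eventfd with `EFD_SEMAPHORE` and has `rbd_iter_events()` read it once per reaped completion. It also polls with `wait ? -1 : 0`, so a caller that does not want to wait never blocks. In semaphore mode each `read()` returns the value 1 and decrements the counter by one, instead of returning the whole count and resetting it to zero. Below is a Linux-only sketch of those semantics. `EFD_NONBLOCK` is added here only so the sketch cannot block; the rbd hunk itself drops that flag. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post `n` completions with a single write(), then drain them one read()
 * at a time. Returns the number of successful reads, or -1 on setup error. */
static int drain_semaphore_eventfd(unsigned int n)
{
	int fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
	uint64_t val;
	int reads = 0;

	if (fd < 0)
		return -1;

	/* Producer side: one write() may add several completions at once. */
	val = n;
	if (write(fd, &val, sizeof(val)) != (ssize_t) sizeof(val)) {
		close(fd);
		return -1;
	}

	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };

		/* Zero timeout: return immediately, like poll(..., 0) when
		 * the caller does not want to wait. */
		if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN))
			break;

		/* In semaphore mode each read yields 1 and decrements by 1. */
		if (read(fd, &val, sizeof(val)) != (ssize_t) sizeof(val) || val != 1)
			break;
		reads++;
	}

	close(fd);
	return reads;
}
```

Without `EFD_SEMAPHORE`, the first `read()` would return 3 for three posted completions and reset the counter. A per-completion read loop like the one in the hunk would then find nothing left to consume.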
diff --git a/ioengines.c b/ioengines.c
index 5dd2311..965581a 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -498,8 +498,8 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 		}
 
 		if (posix_fadvise(f->fd, f->file_offset, f->io_size, flags) < 0) {
-			td_verror(td, errno, "fadvise");
-			goto err;
+			if (!fio_did_warn(FIO_WARN_FADVISE))
+				log_err("fio: fadvise hint failed\n");
 		}
 	}
 #ifdef FIO_HAVE_WRITE_HINT
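The `ioengines.c` hunk turns an `fadvise` failure from a fatal job error into a one-time warning, keyed by the new `FIO_WARN_FADVISE` bit in `debug.h`. Below is a minimal sketch of a warn-once bitmask in the spirit of `fio_did_warn()`. The names `did_warn`, `warned` and `WARN_FADVISE` are hypothetical. POSIX specifies that `posix_fadvise()` returns an error number directly (0 on success) rather than returning -1 and setting `errno`, so the sketch tests for a non-zero return.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical warning bits, one per distinct warning. */
enum {
	WARN_FADVISE = 16,
};

static unsigned int warned;

/* Returns true if this warning was already emitted; otherwise marks it. */
static bool did_warn(unsigned int mask)
{
	if (warned & mask)
		return true;
	warned |= mask;
	return false;
}

/* Apply an access-pattern hint. A failure is logged once and is never
 * fatal. Returns the posix_fadvise() result (0 or an errno value). */
static int hint_file(int fd, off_t off, off_t len, int advice)
{
	int ret = posix_fadvise(fd, off, len, advice);

	if (ret != 0 && !did_warn(WARN_FADVISE))
		fprintf(stderr, "fadvise hint failed\n");
	return ret;
}
```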
diff --git a/stat.c b/stat.c
index 5bbc056..f89913b 100644
--- a/stat.c
+++ b/stat.c
@@ -18,7 +18,7 @@
 #include "helper_thread.h"
 #include "smalloc.h"
 
-#define LOG_MSEC_SLACK	10
+#define LOG_MSEC_SLACK	1
 
 struct fio_mutex *stat_mutex;
 
@@ -2340,9 +2340,11 @@ static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 		__add_stat_to_log(iolog, ddir, elapsed, log_max);
 }
 
-static long add_log_sample(struct thread_data *td, struct io_log *iolog,
-			   union io_sample_data data, enum fio_ddir ddir,
-			   unsigned int bs, uint64_t offset)
+static unsigned long add_log_sample(struct thread_data *td,
+				    struct io_log *iolog,
+				    union io_sample_data data,
+				    enum fio_ddir ddir, unsigned int bs,
+				    uint64_t offset)
 {
 	unsigned long elapsed, this_window;
 
@@ -2373,7 +2375,7 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 	if (elapsed < iolog->avg_last[ddir])
 		return iolog->avg_last[ddir] - elapsed;
 	else if (this_window < iolog->avg_msec) {
-		int diff = iolog->avg_msec - this_window;
+		unsigned long diff = iolog->avg_msec - this_window;
 
 		if (inline_log(iolog) || diff > LOG_MSEC_SLACK)
 			return diff;
@@ -2562,7 +2564,7 @@ static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
 {
 	unsigned long spent, rate;
 	enum fio_ddir ddir;
-	unsigned int next, next_log;
+	unsigned long next, next_log;
 
 	next_log = avg_time;
 


* Recent changes (master)
@ 2018-03-06 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-06 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 316bc2d48fefa6f9dbf01e83948dfe847cf3ea0e:

  Merge branch 'mpath_nvme_diskutil' of https://github.com/sitsofe/fio (2018-03-02 12:02:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b979af5cba25e31dc2a8f2fd89ac5e40c24b6519:

  Merge branch 'howto_typos' of https://github.com/dirtyharrycallahan/fio (2018-03-05 10:29:39 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'manpage_typos' of https://github.com/dirtyharrycallahan/fio
      Merge branch 'howto_typos' of https://github.com/dirtyharrycallahan/fio

Patrick Callahan (2):
      doc: fix typos in fio man page
      doc: fix typos in HOWTO

 HOWTO | 8 ++++----
 fio.1 | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f91a22e..acb9e97 100644
--- a/HOWTO
+++ b/HOWTO
@@ -672,7 +672,7 @@ Time related parameters
 	Tell fio to terminate processing after the specified period of time.  It
 	can be quite hard to determine for how long a specified job will run, so
 	this parameter is handy to cap the total runtime to a given time.  When
-	the unit is omitted, the value is intepreted in seconds.
+	the unit is omitted, the value is interpreted in seconds.
 
 .. option:: time_based
 
@@ -1775,7 +1775,7 @@ I/O engine
 			at least one non-cpuio job.
 
 		**guasi**
-			The GUASI I/O engine is the Generic Userspace Asyncronous Syscall
+			The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
 			Interface approach to async I/O. See
 
 			http://www.xmailserver.org/guasi-lib.html
@@ -2413,7 +2413,7 @@ Threads, processes and job synchronization
 
 		<mode>[:<nodelist>]
 
-	``mode`` is one of the following memory poicies: ``default``, ``prefer``,
+	``mode`` is one of the following memory policies: ``default``, ``prefer``,
 	``bind``, ``interleave`` or ``local``. For ``default`` and ``local`` memory
 	policies, no node needs to be specified.  For ``prefer``, only one node is
 	allowed.  For ``bind`` and ``interleave`` the ``nodelist`` may be as
@@ -2537,7 +2537,7 @@ Verification
 			each block. This will automatically use hardware acceleration
 			(e.g. SSE4.2 on an x86 or CRC crypto extensions on ARM64) but will
 			fall back to software crc32c if none is found. Generally the
-			fatest checksum fio supports when hardware accelerated.
+			fastest checksum fio supports when hardware accelerated.
 
 		**crc32c-intel**
 			Synonym for crc32c.
diff --git a/fio.1 b/fio.1
index e488b01..f955167 100644
--- a/fio.1
+++ b/fio.1
@@ -454,7 +454,7 @@ See \fB\-\-max\-jobs\fR. Default: 1.
 Tell fio to terminate processing after the specified period of time. It
 can be quite hard to determine for how long a specified job will run, so
 this parameter is handy to cap the total runtime to a given time. When
-the unit is omitted, the value is intepreted in seconds.
+the unit is omitted, the value is interpreted in seconds.
 .TP
 .BI time_based
 If set, fio will run for the duration of the \fBruntime\fR specified
@@ -1552,7 +1552,7 @@ single CPU at the desired rate. A job never finishes unless there is
 at least one non\-cpuio job.
 .TP
 .B guasi
-The GUASI I/O engine is the Generic Userspace Asyncronous Syscall
+The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
 Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi\-lib.html\fR
 for more info on GUASI.
 .TP
@@ -2134,7 +2134,7 @@ arguments:
 <mode>[:<nodelist>]
 .RE
 .P
-`mode' is one of the following memory poicies: `default', `prefer',
+`mode' is one of the following memory policies: `default', `prefer',
 `bind', `interleave' or `local'. For `default' and `local' memory
 policies, no node needs to be specified. For `prefer', only one node is
 allowed. For `bind' and `interleave' the `nodelist' may be as
@@ -2244,7 +2244,7 @@ Use a crc32c sum of the data area and store it in the header of
 each block. This will automatically use hardware acceleration
 (e.g. SSE4.2 on an x86 or CRC crypto extensions on ARM64) but will
 fall back to software crc32c if none is found. Generally the
-fatest checksum fio supports when hardware accelerated.
+fastest checksum fio supports when hardware accelerated.
 .TP
 .B crc32c\-intel
 Synonym for crc32c.


* Recent changes (master)
@ 2018-03-03 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-03 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 892930839558edf72052e0471714f227c1391132:

  Merge branch 'hotfix_counter_overflow' of https://github.com/ifke/fio (2018-03-01 09:18:43 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 316bc2d48fefa6f9dbf01e83948dfe847cf3ea0e:

  Merge branch 'mpath_nvme_diskutil' of https://github.com/sitsofe/fio (2018-03-02 12:02:08 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'mpath_nvme_diskutil' of https://github.com/sitsofe/fio

Potnuri Bharat Teja (1):
      diskutil: try additional slave device path if first fails

Sitsofe Wheeler (1):
      diskutil: minor style cleanup

 diskutil.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/diskutil.c b/diskutil.c
index 618cae8..789071d 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -246,7 +246,7 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		 * devices?
 		 */
 		linklen = readlink(temppath, slavepath, PATH_MAX - 1);
-		if (linklen  < 0) {
+		if (linklen < 0) {
 			perror("readlink() for slave device.");
 			closedir(dirhandle);
 			return;
@@ -254,8 +254,10 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		slavepath[linklen] = '\0';
 
 		sprintf(temppath, "%s/%s/dev", slavesdir, slavepath);
+		if (access(temppath, F_OK) != 0)
+			sprintf(temppath, "%s/%s/device/dev", slavesdir, slavepath);
 		if (read_block_dev_entry(temppath, &majdev, &mindev)) {
-			perror("Error getting slave device numbers.");
+			perror("Error getting slave device numbers");
 			closedir(dirhandle);
 			return;
 		}

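The `diskutil.c` fix tries `<slavesdir>/<slave>/dev` first. If that file is absent, it falls back to `<slavesdir>/<slave>/device/dev`; judging by the branch name, this layout matters for multipath NVMe devices. Below is a bounded-buffer sketch of that fallback that uses `snprintf()` in place of `sprintf()`. The function name and its return convention are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Build the sysfs "dev" path for a slave device into buf.
 * Returns 0 if <slavesdir>/<slave>/dev exists, 1 if the fallback
 * <slavesdir>/<slave>/device/dev was used instead, or -1 if the
 * path does not fit in len bytes. */
static int slave_dev_path(char *buf, size_t len, const char *slavesdir,
			  const char *slave)
{
	int n = snprintf(buf, len, "%s/%s/dev", slavesdir, slave);

	if (n < 0 || (size_t) n >= len)
		return -1;
	if (access(buf, F_OK) == 0)
		return 0;

	/* Primary path missing: try the nested device/ directory. */
	n = snprintf(buf, len, "%s/%s/device/dev", slavesdir, slave);
	if (n < 0 || (size_t) n >= len)
		return -1;
	return 1;
}
```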

* Recent changes (master)
@ 2018-03-02 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-02 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e09d68c8da4ab91397490577454de928106651f5:

  Merge branch 'win_build' of https://github.com/sitsofe/fio (2018-02-28 09:44:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 892930839558edf72052e0471714f227c1391132:

  Merge branch 'hotfix_counter_overflow' of https://github.com/ifke/fio (2018-03-01 09:18:43 -0700)

----------------------------------------------------------------
Alexander Larin (1):
      Fix overflow of counters incremented on each I/O operation

Jens Axboe (1):
      Merge branch 'hotfix_counter_overflow' of https://github.com/ifke/fio

 client.c  | 20 ++++++++++----------
 gclient.c |  2 +-
 io_u.c    |  2 +-
 iolog.c   | 16 ++++++++--------
 iolog.h   |  2 +-
 server.c  | 16 ++++++++--------
 server.h  |  2 +-
 stat.c    | 12 ++++++------
 stat.h    | 23 +++++++++++------------
 9 files changed, 47 insertions(+), 48 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 6fe6d9f..bff0adc 100644
--- a/client.c
+++ b/client.c
@@ -905,21 +905,21 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	}
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
-		dst->io_u_map[i]	= le32_to_cpu(src->io_u_map[i]);
-		dst->io_u_submit[i]	= le32_to_cpu(src->io_u_submit[i]);
-		dst->io_u_complete[i]	= le32_to_cpu(src->io_u_complete[i]);
+		dst->io_u_map[i]	= le64_to_cpu(src->io_u_map[i]);
+		dst->io_u_submit[i]	= le64_to_cpu(src->io_u_submit[i]);
+		dst->io_u_complete[i]	= le64_to_cpu(src->io_u_complete[i]);
 	}
 
 	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
-		dst->io_u_lat_n[i]	= le32_to_cpu(src->io_u_lat_n[i]);
+		dst->io_u_lat_n[i]	= le64_to_cpu(src->io_u_lat_n[i]);
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
-		dst->io_u_lat_u[i]	= le32_to_cpu(src->io_u_lat_u[i]);
+		dst->io_u_lat_u[i]	= le64_to_cpu(src->io_u_lat_u[i]);
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
-		dst->io_u_lat_m[i]	= le32_to_cpu(src->io_u_lat_m[i]);
+		dst->io_u_lat_m[i]	= le64_to_cpu(src->io_u_lat_m[i]);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
-			dst->io_u_plat[i][j] = le32_to_cpu(src->io_u_plat[i][j]);
+			dst->io_u_plat[i][j] = le64_to_cpu(src->io_u_plat[i][j]);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		dst->total_io_u[i]	= le64_to_cpu(src->total_io_u[i]);
@@ -1283,7 +1283,7 @@ static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *sample
 	int log_offset;
 	uint64_t i, j, nr_samples;
 	struct io_u_plat_entry *entry;
-	unsigned int *io_u_plat;
+	uint64_t *io_u_plat;
 
 	int stride = 1 << hist_coarseness;
 
@@ -1306,9 +1306,9 @@ static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *sample
 		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
 						io_sample_ddir(s), s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
-			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat, NULL));
+			fprintf(f, "%llu, ", (unsigned long long)hist_sum(j, stride, io_u_plat, NULL));
 		}
-		fprintf(f, "%lu\n", (unsigned long)
+		fprintf(f, "%llu\n", (unsigned long long)
 			hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat, NULL));
 
 	}
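This series widens the per-I/O counters from `unsigned int` to `uint64_t`. A 32-bit counter bumped on every I/O wraps after 2^32 - 1 (about 4.29 billion) increments. At one million IOPS that takes roughly 4295 seconds, about 72 minutes. The `fprintf()` formats change as well. `unsigned long` is only 32 bits on some platforms, such as 32-bit Linux and 64-bit Windows, so the code casts to `unsigned long long` and prints with `%llu`. The small demonstration below also shows `PRIu64` from `<inttypes.h>`, an equivalent alternative to the cast. The helper names are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Increment a 32-bit and a 64-bit counter one step past UINT32_MAX. */
static void overflow_demo(uint32_t *c32, uint64_t *c64)
{
	*c32 = UINT32_MAX;
	*c64 = UINT32_MAX;
	(*c32)++;	/* unsigned wrap-around: well defined, becomes 0 */
	(*c64)++;	/* becomes 4294967296, no wrap */
}

/* Portable printing via a cast, as in the %llu hunks above. */
static int format_count(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llu", (unsigned long long) v);
}

/* Equivalent printing via the <inttypes.h> format macro. */
static int format_count_pri(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%" PRIu64, v);
}
```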
diff --git a/gclient.c b/gclient.c
index 70dda48..5087b6b 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1099,7 +1099,7 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 				       GtkWidget *vbox, struct thread_stat *ts,
 				       int ddir)
 {
-	unsigned int *io_u_plat = ts->io_u_plat[ddir];
+	uint64_t *io_u_plat = ts->io_u_plat[ddir];
 	unsigned long long nr = ts->clat_stat[ddir].samples;
 	fio_fp64_t *plist = ts->percentile_list;
 	unsigned int len, scale_down;
diff --git a/io_u.c b/io_u.c
index b54a79c..61d09ba 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1012,7 +1012,7 @@ out:
 	return 0;
 }
 
-static void __io_u_mark_map(unsigned int *map, unsigned int nr)
+static void __io_u_mark_map(uint64_t *map, unsigned int nr)
 {
 	int idx = 0;
 
diff --git a/iolog.c b/iolog.c
index 34e74a8..fc3dade 100644
--- a/iolog.c
+++ b/iolog.c
@@ -694,10 +694,10 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
-unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
-		unsigned int *io_u_plat_last)
+uint64_t hist_sum(int j, int stride, uint64_t *io_u_plat,
+		uint64_t *io_u_plat_last)
 {
-	unsigned long sum;
+	uint64_t sum;
 	int k;
 
 	if (io_u_plat_last) {
@@ -718,8 +718,8 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 	int log_offset;
 	uint64_t i, j, nr_samples;
 	struct io_u_plat_entry *entry, *entry_before;
-	unsigned int *io_u_plat;
-	unsigned int *io_u_plat_before;
+	uint64_t *io_u_plat;
+	uint64_t *io_u_plat_before;
 
 	int stride = 1 << hist_coarseness;
 	
@@ -743,10 +743,10 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
 						io_sample_ddir(s), s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
-			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat,
-						io_u_plat_before));
+			fprintf(f, "%llu, ", (unsigned long long)
+			        hist_sum(j, stride, io_u_plat, io_u_plat_before));
 		}
-		fprintf(f, "%lu\n", (unsigned long)
+		fprintf(f, "%llu\n", (unsigned long long)
 		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat,
 					io_u_plat_before));
 
diff --git a/iolog.h b/iolog.h
index bc3a0b5..2266617 100644
--- a/iolog.h
+++ b/iolog.h
@@ -286,7 +286,7 @@ extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
 extern void flush_log(struct io_log *, bool);
 extern void flush_samples(FILE *, void *, uint64_t);
-extern unsigned long hist_sum(int, int, unsigned int *, unsigned int *);
+extern uint64_t hist_sum(int, int, uint64_t *, uint64_t *);
 extern void free_log(struct io_log *);
 extern void fio_writeout_logs(bool);
 extern void td_writeout_logs(struct thread_data *, bool);
diff --git a/server.c b/server.c
index ce9dca3..959786f 100644
--- a/server.c
+++ b/server.c
@@ -1497,21 +1497,21 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	}
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
-		p.ts.io_u_map[i]	= cpu_to_le32(ts->io_u_map[i]);
-		p.ts.io_u_submit[i]	= cpu_to_le32(ts->io_u_submit[i]);
-		p.ts.io_u_complete[i]	= cpu_to_le32(ts->io_u_complete[i]);
+		p.ts.io_u_map[i]	= cpu_to_le64(ts->io_u_map[i]);
+		p.ts.io_u_submit[i]	= cpu_to_le64(ts->io_u_submit[i]);
+		p.ts.io_u_complete[i]	= cpu_to_le64(ts->io_u_complete[i]);
 	}
 
 	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
-		p.ts.io_u_lat_n[i]	= cpu_to_le32(ts->io_u_lat_n[i]);
+		p.ts.io_u_lat_n[i]	= cpu_to_le64(ts->io_u_lat_n[i]);
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
-		p.ts.io_u_lat_u[i]	= cpu_to_le32(ts->io_u_lat_u[i]);
+		p.ts.io_u_lat_u[i]	= cpu_to_le64(ts->io_u_lat_u[i]);
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
-		p.ts.io_u_lat_m[i]	= cpu_to_le32(ts->io_u_lat_m[i]);
+		p.ts.io_u_lat_m[i]	= cpu_to_le64(ts->io_u_lat_m[i]);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
-			p.ts.io_u_plat[i][j] = cpu_to_le32(ts->io_u_plat[i][j]);
+			p.ts.io_u_plat[i][j] = cpu_to_le64(ts->io_u_plat[i][j]);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		p.ts.total_io_u[i]	= cpu_to_le64(ts->total_io_u[i]);
@@ -1748,7 +1748,7 @@ static int __fio_append_iolog_gz_hist(struct sk_entry *first, struct io_log *log
 	for (i = 0; i < cur_log->nr_samples; i++) {
 		struct io_sample *s;
 		struct io_u_plat_entry *cur_plat_entry, *prev_plat_entry;
-		unsigned int *cur_plat, *prev_plat;
+		uint64_t *cur_plat, *prev_plat;
 
 		s = get_sample(log, cur_log, i);
 		ret = __deflate_pdu_buffer(s, sample_sz, &out_pdu, &entry, stream, first);
diff --git a/server.h b/server.h
index cabd447..bd892fc 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 70,
+	FIO_SERVER_VER			= 71,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index bd2c27d..5bbc056 100644
--- a/stat.c
+++ b/stat.c
@@ -135,7 +135,7 @@ static int double_cmp(const void *a, const void *b)
 	return cmp;
 }
 
-unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr,
+unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 				   fio_fp64_t *plist, unsigned long long **output,
 				   unsigned long long *maxv, unsigned long long *minv)
 {
@@ -198,7 +198,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long n
 /*
  * Find and display the p-th percentile of clat
  */
-static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr,
+static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 				  fio_fp64_t *plist, unsigned int precision,
 				  const char *pre, struct buf_output *out)
 {
@@ -323,7 +323,7 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 	}
 }
 
-void stat_calc_dist(unsigned int *map, unsigned long total, double *io_u_dist)
+void stat_calc_dist(uint64_t *map, unsigned long total, double *io_u_dist)
 {
 	int i;
 
@@ -342,7 +342,7 @@ void stat_calc_dist(unsigned int *map, unsigned long total, double *io_u_dist)
 }
 
 static void stat_calc_lat(struct thread_stat *ts, double *dst,
-			  unsigned int *src, int nr)
+			  uint64_t *src, int nr)
 {
 	unsigned long total = ddir_rw_sum(ts->total_io_u);
 	int i;
@@ -2460,7 +2460,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		this_window = elapsed - hw->hist_last;
 		
 		if (this_window >= iolog->hist_msec) {
-			unsigned int *io_u_plat;
+			uint64_t *io_u_plat;
 			struct io_u_plat_entry *dst;
 
 			/*
@@ -2470,7 +2470,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			 * located in iolog.c after printing this sample to the
 			 * log file.
 			 */
-			io_u_plat = (unsigned int *) td->ts.io_u_plat[ddir];
+			io_u_plat = (uint64_t *) td->ts.io_u_plat[ddir];
 			dst = malloc(sizeof(struct io_u_plat_entry));
 			memcpy(&(dst->io_u_plat), io_u_plat,
 				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
diff --git a/stat.h b/stat.h
index cc91dfc..7580f0d 100644
--- a/stat.h
+++ b/stat.h
@@ -182,15 +182,14 @@ struct thread_stat {
 	uint64_t percentile_precision;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
-	uint32_t io_u_map[FIO_IO_U_MAP_NR];
-	uint32_t io_u_submit[FIO_IO_U_MAP_NR];
-	uint32_t io_u_complete[FIO_IO_U_MAP_NR];
-	uint32_t io_u_lat_n[FIO_IO_U_LAT_N_NR];
-	uint32_t io_u_lat_u[FIO_IO_U_LAT_U_NR];
-	uint32_t io_u_lat_m[FIO_IO_U_LAT_M_NR];
-	uint32_t io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
-	uint32_t io_u_sync_plat[FIO_IO_U_PLAT_NR];
-	uint32_t pad;
+	uint64_t io_u_map[FIO_IO_U_MAP_NR];
+	uint64_t io_u_submit[FIO_IO_U_MAP_NR];
+	uint64_t io_u_complete[FIO_IO_U_MAP_NR];
+	uint64_t io_u_lat_n[FIO_IO_U_LAT_N_NR];
+	uint64_t io_u_lat_u[FIO_IO_U_LAT_U_NR];
+	uint64_t io_u_lat_m[FIO_IO_U_LAT_M_NR];
+	uint64_t io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
+	uint64_t io_u_sync_plat[FIO_IO_U_PLAT_NR];
 
 	uint64_t total_io_u[DDIR_RWDIR_SYNC_CNT];
 	uint64_t short_io_u[DDIR_RWDIR_CNT];
@@ -275,7 +274,7 @@ struct jobs_eta {
 
 struct io_u_plat_entry {
 	struct flist_head list;
-	unsigned int io_u_plat[FIO_IO_U_PLAT_NR];
+	uint64_t io_u_plat[FIO_IO_U_PLAT_NR];
 };
 
 extern struct fio_mutex *stat_mutex;
@@ -300,11 +299,11 @@ extern void init_thread_stat(struct thread_stat *ts);
 extern void init_group_run_stat(struct group_run_stats *gs);
 extern void eta_to_str(char *str, unsigned long eta_sec);
 extern bool calc_lat(struct io_stat *is, unsigned long long *min, unsigned long long *max, double *mean, double *dev);
-extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr, fio_fp64_t *plist, unsigned long long **output, unsigned long long *maxv, unsigned long long *minv);
+extern unsigned int calc_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr, fio_fp64_t *plist, unsigned long long **output, unsigned long long *maxv, unsigned long long *minv);
 extern void stat_calc_lat_n(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_m(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_u(struct thread_stat *ts, double *io_u_lat);
-extern void stat_calc_dist(unsigned int *map, unsigned long total, double *io_u_dist);
+extern void stat_calc_dist(uint64_t *map, unsigned long total, double *io_u_dist);
 extern void reset_io_stats(struct thread_data *);
 extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
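
The hunks above widen the latency and depth counters in struct thread_stat and struct io_u_plat_entry from 32 to 64 bits, make hist_sum() return uint64_t, and print its result with %llu after an explicit (unsigned long long) cast, because uint64_t may be either unsigned long or unsigned long long depending on the platform. Most of the body of hist_sum() is elided from the diff, so the following is only a sketch. It assumes that the function adds up `stride` adjacent bins starting at bin j and, when a previous snapshot is passed, counts only the growth since that snapshot. hist_sum_sketch() and format_count() are hypothetical names, not fio functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch (assumed semantics): add up `stride` adjacent latency bins starting
 * at bin j. If a previous snapshot is given, count only the growth since
 * that snapshot. Counters are 64-bit, like the widened io_u_plat arrays.
 */
static uint64_t hist_sum_sketch(int j, int stride, const uint64_t *plat,
				const uint64_t *plat_last)
{
	uint64_t sum = 0;
	int k;

	assert(j >= 0 && stride > 0);
	for (k = 0; k < stride; k++) {
		sum += plat[j + k];
		/* Assumes counters only grow, so each bin's delta is >= 0. */
		if (plat_last)
			sum -= plat_last[j + k];
	}
	return sum;
}

/*
 * Format a 64-bit count portably: uint64_t may be unsigned long or
 * unsigned long long, so cast before using %llu, as the patch does.
 * Returns the number of characters snprintf() wanted to write.
 */
static int format_count(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llu", (unsigned long long) v);
}
```

For example, with now = {10, 20} and last = {4, 20}, hist_sum_sketch(0, 2, now, last) returns 6, while passing NULL for the snapshot returns 30.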


* Recent changes (master)
@ 2018-03-01 13:00 Jens Axboe
From: Jens Axboe @ 2018-03-01 13:00 UTC
  To: fio


The following changes since commit 68ae273e7544ebafeef721281b9bda5d42d66f4c:

  Merge branch 'master' of https://github.com/brycepg/fio (2018-02-27 16:15:22 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e09d68c8da4ab91397490577454de928106651f5:

  Merge branch 'win_build' of https://github.com/sitsofe/fio (2018-02-28 09:44:08 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      configure: improve static zlib package warning
      Merge branch 'win_build' of https://github.com/sitsofe/fio

Sitsofe Wheeler (3):
      appveyor: minor refactoring, clarifications
      windows: minor windows installer improvements
      windows: document MinGW zlib install and remove custom zlib search

 README                 |   4 +++-
 appveyor.yml           |  12 +++++-------
 configure              |  17 +++++------------
 os/windows/dobuild.cmd |  12 +++++++++++-
 os/windows/eula.rtf    | Bin 1072 -> 1075 bytes
 os/windows/install.wxs |   8 ++++----
 6 files changed, 28 insertions(+), 25 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index fc28b16..fba5f10 100644
--- a/README
+++ b/README
@@ -172,7 +172,9 @@ directory.
 How to compile fio on 64-bit Windows:
 
  1. Install Cygwin (http://www.cygwin.com/). Install **make** and all
-    packages starting with **mingw64-i686** and **mingw64-x86_64**.
+    packages starting with **mingw64-i686** and **mingw64-x86_64**. Ensure
+    **mingw64-i686-zlib** and **mingw64-x86_64-zlib** are installed if you wish
+    to enable fio's log compression functionality.
  2. Open the Cygwin Terminal.
  3. Go to the fio directory (source files).
  4. Run ``make clean && make -j``.
diff --git a/appveyor.yml b/appveyor.yml
index 844afa5..c6c3689 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -1,30 +1,28 @@
-clone_depth: 1
+clone_depth: 1 # NB: this stops FIO-VERSION-GEN making tag based versions
+
 environment:
   CYG_MIRROR: http://cygwin.mirror.constant.com
   CYG_ROOT: C:\cygwin64
   MAKEFLAGS: -j 2
   matrix:
-    - platform: x86_64
-      BUILD_ARCH: x64
+    - platform: x64
       PACKAGE_ARCH: x86_64
       CONFIGURE_OPTIONS:
     - platform: x86
-      BUILD_ARCH: x86
       PACKAGE_ARCH: i686
       CONFIGURE_OPTIONS: --build-32bit-win
 
 install:
   - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NULL'
+  - SET PATH=%CYG_ROOT%\bin;%PATH% # NB: Changed env variables persist to later sections
 
 build_script:
-  - SET PATH=%CYG_ROOT%\bin;%PATH%
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
 
 after_build:
-  - cd os\windows && dobuild.cmd %BUILD_ARCH%
+  - cd os\windows && dobuild.cmd %PLATFORM%
 
 test_script:
-  - SET PATH=%CYG_ROOT%\bin;%PATH%
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
 
 artifacts:
diff --git a/configure b/configure
index 2b99ce9..2e8eb18 100755
--- a/configure
+++ b/configure
@@ -320,18 +320,8 @@ CYGWIN*)
   if test -z "${CC}${cross_prefix}"; then
     if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
       cc="i686-w64-mingw32-gcc"
-      if test -e "../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
-        echo "Building with zlib support"
-        output_sym "CONFIG_ZLIB"
-        echo "LIBS=../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib" >> $config_host_mak
-      fi
     else
       cc="x86_64-w64-mingw32-gcc"
-      if test -e "../zlib/contrib/vstudio/vc14/x64/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
-        echo "Building with zlib support"
-        output_sym "CONFIG_ZLIB"
-        echo "LIBS=../zlib/contrib/vstudio/vc14/x64/ZlibStatReleaseWithoutAsm/zlibstat.lib" >> $config_host_mak
-      fi
     fi
   fi
   if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
@@ -359,7 +349,7 @@ CYGWIN*)
   static_assert="yes"
   ipv6="yes"
   mkdir_two="no"
-  echo "BUILD_CFLAGS=$CFLAGS -I../zlib -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
+  echo "BUILD_CFLAGS=$CFLAGS -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
   ;;
 esac
 
@@ -2376,7 +2366,10 @@ if test "$disable_opt" = "yes" ; then
   output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
 fi
 if test "$zlib" = "no" ; then
-  echo "Consider installing zlib-dev (zlib-devel or zlib-static), some fio features depend on it."
+  echo "Consider installing zlib-dev (zlib-devel), some fio features depend on it."
+  if test "$build_static" = "yes"; then
+    echo "Note that some distros have separate packages for static libraries."
+  fi
 fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"
diff --git a/os/windows/dobuild.cmd b/os/windows/dobuild.cmd
index fd54a9c..ef12d82 100644
--- a/os/windows/dobuild.cmd
+++ b/os/windows/dobuild.cmd
@@ -6,6 +6,16 @@ for /f "tokens=3" %%i in (..\..\FIO-VERSION-FILE) do (
  set /a counter+=1
 )
 
+for /f "tokens=2 delims=-" %%i in ("%FIO_VERSION%") do (
+ set FIO_VERSION_NUMBERS=%%i
+)
+
+if not defined FIO_VERSION_NUMBERS (
+  echo Could not find version numbers in the string '%FIO_VERSION%'
+  echo Expected version to follow format 'fio-^([0-9]+.[0-9.]+^)'
+  goto end
+)
+
 if "%1"=="x86" set FIO_ARCH=x86
 if "%1"=="x64" set FIO_ARCH=x64
 
@@ -16,7 +26,7 @@ if not defined FIO_ARCH (
   goto end
 )
 
-"%WIX%bin\candle" -nologo -arch %FIO_ARCH% install.wxs
+"%WIX%bin\candle" -nologo -arch %FIO_ARCH% -dFioVersionNumbers="%FIO_VERSION_NUMBERS%" install.wxs
 @if ERRORLEVEL 1 goto end
 "%WIX%bin\candle" -nologo -arch %FIO_ARCH% examples.wxs
 @if ERRORLEVEL 1 goto end
diff --git a/os/windows/eula.rtf b/os/windows/eula.rtf
index 1c92932..b2798bb 100755
Binary files a/os/windows/eula.rtf and b/os/windows/eula.rtf differ
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 577af55..73b2810 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -1,7 +1,7 @@
 <?xml version="1.0" encoding="utf-8"?>
 <Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
 
-	<?if $(env.FIO_ARCH) = x86 ?>
+	<?if $(sys.BUILDARCH) = x86 ?>
 		<?define ProgramDirectory = ProgramFilesFolder ?>
 	<?else?>
 		<?define ProgramDirectory = ProgramFiles64Folder ?>
@@ -10,9 +10,9 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.5">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="$(var.FioVersionNumbers)">
 		<Package
-		  Description="Flexible IO Tester"
+		  Description="Flexible I/O Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
 		  Languages="1033" Manufacturer="fio"
 		  InstallScope="perMachine" InstallPrivileges="elevated" Compressed="yes"/>
@@ -48,7 +48,7 @@
 			</Directory>
 	</Directory>
 
-	<Feature Id="AlwaysInstall" Absent="disallow" ConfigurableDirectory="INSTALLDIR" Display="hidden" Level="1" Title="Flexible IO Tester">
+	<Feature Id="AlwaysInstall" Absent="disallow" ConfigurableDirectory="INSTALLDIR" Display="hidden" Level="1" Title="Flexible I/O Tester">
 		<ComponentRef Id="fio.exe"/>
 		<ComponentRef Id="HOWTO"/>
 		<ComponentRef Id="README"/>
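
The dobuild.cmd change derives the MSI version from FIO-VERSION-FILE: `for /f "tokens=2 delims=-"` keeps the second '-'-separated token, so a version string such as fio-3.5 yields 3.5. install.wxs then receives that value as $(var.FioVersionNumbers) instead of the previously hard-coded Version="3.5", since the WiX Product Version attribute expects a purely numeric dotted version. The C sketch below approximates that extraction for well-formed strings; unlike `for /f`, it does not skip leading or repeated delimiters. fio_version_numbers() is a hypothetical helper, not part of fio's build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper approximating `for /f "tokens=2 delims=-"`:
 * copy the text between the first '-' and the next '-' (or the end).
 * Returns false when there is no second token (the case dobuild.cmd
 * reports as "Could not find version numbers") or when it does not
 * fit in out.
 */
static bool fio_version_numbers(const char *version, char *out, size_t outlen)
{
	const char *start;
	size_t n;

	assert(version != NULL && out != NULL);
	start = strchr(version, '-');
	if (!start)
		return false;
	start++;
	n = strcspn(start, "-");
	if (n == 0 || n >= outlen)
		return false;
	memcpy(out, start, n);
	out[n] = '\0';
	return true;
}
```

Calling fio_version_numbers("fio-3.5", buf, sizeof(buf)) stores "3.5" in buf; a bare "fio" makes it return false.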


* Recent changes (master)
@ 2018-02-28 13:00 Jens Axboe
From: Jens Axboe @ 2018-02-28 13:00 UTC
  To: fio

The following changes since commit 8aa33bf1b4363b706f7139e15b0a3e7b14bea16b:

  Merge branch 'wip-ifed-howto-update' of https://github.com/ifed01/fio (2018-02-26 11:59:54 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 68ae273e7544ebafeef721281b9bda5d42d66f4c:

  Merge branch 'master' of https://github.com/brycepg/fio (2018-02-27 16:15:22 -0700)

----------------------------------------------------------------
Bryce Guinta (1):
      Refer to zlib-static package in zlib static build warning

Jens Axboe (1):
      Merge branch 'master' of https://github.com/brycepg/fio

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 5d283d7..2b99ce9 100755
--- a/configure
+++ b/configure
@@ -2376,7 +2376,7 @@ if test "$disable_opt" = "yes" ; then
   output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
 fi
 if test "$zlib" = "no" ; then
-  echo "Consider installing zlib-dev (zlib-devel), some fio features depend on it."
+  echo "Consider installing zlib-dev (zlib-devel or zlib-static), some fio features depend on it."
 fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"


* Recent changes (master)
@ 2018-02-27 13:00 Jens Axboe
From: Jens Axboe @ 2018-02-27 13:00 UTC
  To: fio

The following changes since commit cd174b909109e527e41ca42ac761c5f7d29f25e5:

  Fio 3.5 (2018-02-20 15:30:28 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8aa33bf1b4363b706f7139e15b0a3e7b14bea16b:

  Merge branch 'wip-ifed-howto-update' of https://github.com/ifed01/fio (2018-02-26 11:59:54 -0700)

----------------------------------------------------------------
Igor Fedotov (1):
      Update HOWTO with RADOS information

Jens Axboe (1):
      Merge branch 'wip-ifed-howto-update' of https://github.com/ifed01/fio

 HOWTO | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 307b50d..f91a22e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1810,6 +1810,11 @@ I/O engine
 			I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
 			defragment activity in request to DDIR_WRITE event.
 
+		**rados**
+			I/O engine supporting direct access to Ceph Reliable Autonomic
+			Distributed Object Store (RADOS) via librados. This ioengine
+			defines engine specific options.
+
 		**rbd**
 			I/O engine supporting direct access to Ceph Rados Block Devices
 			(RBD) via librbd without the need to use the kernel rbd driver. This
@@ -2011,7 +2016,7 @@ with the caveat that when used on the command line, they must come after the
 		Allocate space immediately inside defragment event, and free right
 		after event.
 
-.. option:: clustername=str : [rbd]
+.. option:: clustername=str : [rbd,rados]
 
 	Specifies the name of the Ceph cluster.
 
@@ -2019,17 +2024,22 @@ with the caveat that when used on the command line, they must come after the
 
 	Specifies the name of the RBD.
 
-.. option:: pool=str : [rbd]
+.. option:: pool=str : [rbd,rados]
 
-	Specifies the name of the Ceph pool containing RBD.
+	Specifies the name of the Ceph pool containing RBD or RADOS data.
 
-.. option:: clientname=str : [rbd]
+.. option:: clientname=str : [rbd,rados]
 
 	Specifies the username (without the 'client.' prefix) used to access the
 	Ceph cluster. If the *clustername* is specified, the *clientname* shall be
 	the full *type.id* string. If no type. prefix is given, fio will add
 	'client.' by default.
 
+.. option:: busy_poll=bool : [rbd,rados]
+
+	Poll the store instead of waiting for completion. Usually this provides better
+	throughput at the cost of higher (up to 100%) CPU utilization.
+
 .. option:: skip_bad=bool : [mtd]
 
 	Skip operations against known bad blocks.
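
The clientname option documented above prepends 'client.' when the given name has no type. prefix. A minimal C sketch of that naming rule follows, assuming "has a type. prefix" means "contains a '.'", which is the test the rados engine's _fio_rados_connect() performs with index(). full_ceph_client_name() is hypothetical; unlike the engine code it uses the standard strchr() and checks the allocation result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap-allocated Ceph entity name.
 * A name without a '.' gets the default "client." type prefix;
 * a name that already has one (e.g. "osd.0") is copied unchanged.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
static char *full_ceph_client_name(const char *name)
{
	static const char prefix[] = "client.";
	size_t plen, nlen;
	char *out;

	assert(name != NULL);
	/* Only prepend the prefix when no "type." part is present. */
	plen = strchr(name, '.') ? 0 : sizeof(prefix) - 1;
	nlen = strlen(name);
	out = malloc(plen + nlen + 1);
	if (!out)
		return NULL;
	memcpy(out, prefix, plen);
	memcpy(out + plen, name, nlen + 1);	/* copies the terminating NUL */
	return out;
}
```

In the engine this expansion is only applied when clustername is set, because rados_create2() takes the full type.id name, whereas rados_create() expects the bare id and assumes the client. type itself.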


* Recent changes (master)
@ 2018-02-21 13:00 Jens Axboe
From: Jens Axboe @ 2018-02-21 13:00 UTC
  To: fio

The following changes since commit 488468793d72297f7dccd42a7b50011e74de0688:

  Merge branch 'wip-ifed-rados' of https://github.com/ifed01/fio (2018-02-14 12:43:04 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cd174b909109e527e41ca42ac761c5f7d29f25e5:

  Fio 3.5 (2018-02-20 15:30:28 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.5

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 5aed535..7abd8ce 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.3
+DEF_VER=fio-3.5
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 905addc..577af55 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.3">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.5">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"


* Recent changes (master)
@ 2018-02-15 13:00 Jens Axboe
From: Jens Axboe @ 2018-02-15 13:00 UTC
  To: fio

The following changes since commit f2cd91604af170e972438c461a40230e266a57d9:

  debug: fix inverted logic in fio_did_warn() (2018-02-12 10:55:07 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 488468793d72297f7dccd42a7b50011e74de0688:

  Merge branch 'wip-ifed-rados' of https://github.com/ifed01/fio (2018-02-14 12:43:04 -0700)

----------------------------------------------------------------
Igor Fedotov (1):
      Add support for Ceph Rados benchmarking.

Jens Axboe (1):
      Merge branch 'wip-ifed-rados' of https://github.com/ifed01/fio

 Makefile           |   3 +
 configure          |  34 ++++
 engines/rados.c    | 479 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/rados.fio |  24 +++
 fio.1              |  17 +-
 5 files changed, 553 insertions(+), 4 deletions(-)
 create mode 100644 engines/rados.c
 create mode 100644 examples/rados.fio

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 3ce6064..c25b422 100644
--- a/Makefile
+++ b/Makefile
@@ -95,6 +95,9 @@ endif
 ifdef CONFIG_WINDOWSAIO
   SOURCE += engines/windowsaio.c
 endif
+ifdef CONFIG_RADOS
+  SOURCE += engines/rados.c
+endif
 ifdef CONFIG_RBD
   SOURCE += engines/rbd.c
 endif
diff --git a/configure b/configure
index d92bb0f..5d283d7 100755
--- a/configure
+++ b/configure
@@ -173,6 +173,8 @@ for opt do
   ;;
   --disable-rdma) disable_rdma="yes"
   ;;
+  --disable-rados) disable_rados="yes"
+  ;;
   --disable-rbd) disable_rbd="yes"
   ;;
   --disable-rbd-blkin) disable_rbd_blkin="yes"
@@ -1527,6 +1529,35 @@ fi
 print_config "IPv6 helpers" "$ipv6"
 
 ##########################################
+# check for rados
+if test "$rados" != "yes" ; then
+  rados="no"
+fi
+cat > $TMPC << EOF
+#include <rados/librados.h>
+
+int main(int argc, char **argv)
+{
+  rados_t cluster;
+  rados_ioctx_t io_ctx;
+  const char cluster_name[] = "ceph";
+  const char user_name[] = "client.admin";
+  const char pool[] = "rados";
+
+  /* The rados_create2 signature required was only introduced in ceph 0.65 */
+  rados_create2(&cluster, cluster_name, user_name, 0);
+  rados_ioctx_create(cluster, pool, &io_ctx);
+
+  return 0;
+}
+EOF
+if test "$disable_rados" != "yes"  && compile_prog "" "-lrados" "rados"; then
+  LIBS="-lrados $LIBS"
+  rados="yes"
+fi
+print_config "Rados engine" "$rados"
+
+##########################################
 # check for rbd
 if test "$rbd" != "yes" ; then
   rbd="no"
@@ -2262,6 +2293,9 @@ fi
 if test "$ipv6" = "yes" ; then
   output_sym "CONFIG_IPV6"
 fi
+if test "$rados" = "yes" ; then
+  output_sym "CONFIG_RADOS"
+fi
 if test "$rbd" = "yes" ; then
   output_sym "CONFIG_RBD"
 fi
diff --git a/engines/rados.c b/engines/rados.c
new file mode 100644
index 0000000..dc0d7b1
--- /dev/null
+++ b/engines/rados.c
@@ -0,0 +1,479 @@
+/*
+ *  Ceph Rados engine
+ *
+ * IO engine using Ceph's RADOS interface to test low-level performance of
+ * Ceph OSDs.
+ *
+ */
+
+#include <rados/librados.h>
+#include <pthread.h>
+#include "fio.h"
+#include "../optgroup.h"
+
+struct fio_rados_iou {
+	struct thread_data *td;
+	struct io_u *io_u;
+	rados_completion_t completion;
+	rados_write_op_t write_op;
+};
+
+struct rados_data {
+	rados_t cluster;
+	rados_ioctx_t io_ctx;
+	char **objects;
+	size_t object_count;
+	struct io_u **aio_events;
+	bool connected;
+};
+
+/* fio configuration options read from the job file */
+struct rados_options {
+	void *pad;
+	char *cluster_name;
+	char *pool_name;
+	char *client_name;
+	int busy_poll;
+};
+
+static struct fio_option options[] = {
+	{
+		.name     = "clustername",
+		.lname    = "ceph cluster name",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "Cluster name for ceph",
+		.off1     = offsetof(struct rados_options, cluster_name),
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
+	{
+		.name     = "pool",
+		.lname    = "pool name to use",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "Ceph pool name to benchmark against",
+		.off1     = offsetof(struct rados_options, pool_name),
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
+	{
+		.name     = "clientname",
+		.lname    = "rados engine clientname",
+		.type     = FIO_OPT_STR_STORE,
+		.help     = "Name of the ceph client to access RADOS engine",
+		.off1     = offsetof(struct rados_options, client_name),
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
+	{
+		.name     = "busy_poll",
+		.lname    = "busy poll mode",
+		.type     = FIO_OPT_BOOL,
+		.help     = "Busy poll for completions instead of sleeping",
+		.off1     = offsetof(struct rados_options, busy_poll),
+		.def	  = "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group    = FIO_OPT_G_RBD,
+	},
+	{
+		.name     = NULL,
+	},
+};
+
+static int _fio_setup_rados_data(struct thread_data *td,
+				struct rados_data **rados_data_ptr)
+{
+	struct rados_data *rados;
+
+	if (td->io_ops_data)
+		return 0;
+
+	rados = calloc(1, sizeof(struct rados_data));
+	if (!rados)
+		goto failed;
+
+	rados->connected = false;
+
+	rados->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
+	if (!rados->aio_events)
+		goto failed;
+
+	rados->object_count = td->o.nr_files;
+	rados->objects = calloc(rados->object_count, sizeof(char*));
+	if (!rados->objects)
+		goto failed;
+
+	*rados_data_ptr = rados;
+	return 0;
+
+failed:
+	if (rados) {
+		rados->object_count = 0;
+		if (rados->aio_events)
+			free(rados->aio_events);
+		free(rados);
+	}
+	return 1;
+}
+
+static void _fio_rados_rm_objects(struct rados_data *rados)
+{
+	size_t i;
+	for (i = 0; i < rados->object_count; ++i) {
+		if (rados->objects[i]) {
+			rados_remove(rados->io_ctx, rados->objects[i]);
+			free(rados->objects[i]);
+			rados->objects[i] = NULL;
+		}
+	}
+}
+
+static int _fio_rados_connect(struct thread_data *td)
+{
+	struct rados_data *rados = td->io_ops_data;
+	struct rados_options *o = td->eo;
+	int r;
+	const uint64_t file_size =
+		td->o.size / (td->o.nr_files ? td->o.nr_files : 1u);
+	struct fio_file *f;
+	uint32_t i;
+	size_t oname_len = 0;
+
+	if (o->cluster_name) {
+		char *client_name = NULL;
+
+		/*
+		* If we specify cluster name, the rados_create2
+		* will not assume 'client.'. name is considered
+		* as a full type.id namestr
+		*/
+		if (o->client_name) {
+			if (!index(o->client_name, '.')) {
+				client_name = calloc(1, strlen("client.") +
+					strlen(o->client_name) + 1);
+				strcat(client_name, "client.");
+				strcat(client_name, o->client_name);
+			} else {
+				client_name = o->client_name;
+			}
+		}
+
+		r = rados_create2(&rados->cluster, o->cluster_name,
+			client_name, 0);
+
+		if (client_name && !index(o->client_name, '.'))
+			free(client_name);
+	} else
+		r = rados_create(&rados->cluster, o->client_name);
+
+	if (r < 0) {
+		log_err("rados_create failed.\n");
+		goto failed_early;
+	}
+
+	r = rados_conf_read_file(rados->cluster, NULL);
+	if (r < 0) {
+		log_err("rados_conf_read_file failed.\n");
+		goto failed_early;
+	}
+
+	r = rados_connect(rados->cluster);
+	if (r < 0) {
+		log_err("rados_connect failed.\n");
+		goto failed_early;
+	}
+
+	r = rados_ioctx_create(rados->cluster, o->pool_name, &rados->io_ctx);
+	if (r < 0) {
+		log_err("rados_ioctx_create failed.\n");
+		goto failed_shutdown;
+	}
+
+	for (i = 0; i < rados->object_count; i++) {
+		f = td->files[i];
+		f->real_file_size = file_size;
+		f->engine_pos = i;
+
+		oname_len = strlen(f->file_name) + 32;
+		rados->objects[i] = malloc(oname_len);
+		/* vary objects for different jobs */
+		snprintf(rados->objects[i], oname_len - 1,
+			"fio_rados_bench.%s.%x",
+			f->file_name, td->thread_number);
+		r = rados_write(rados->io_ctx, rados->objects[i], "", 0, 0);
+		if (r < 0) {
+			free(rados->objects[i]);
+			rados->objects[i] = NULL;
+			log_err("error creating object.\n");
+			goto failed_obj_create;
+		}
+	}
+
+  return 0;
+
+failed_obj_create:
+	_fio_rados_rm_objects(rados);
+	rados_ioctx_destroy(rados->io_ctx);
+	rados->io_ctx = NULL;
+failed_shutdown:
+	rados_shutdown(rados->cluster);
+	rados->cluster = NULL;
+failed_early:
+	return 1;
+}
+
+static void _fio_rados_disconnect(struct rados_data *rados)
+{
+	if (!rados)
+		return;
+
+	_fio_rados_rm_objects(rados);
+
+	if (rados->io_ctx) {
+		rados_ioctx_destroy(rados->io_ctx);
+		rados->io_ctx = NULL;
+	}
+
+	if (rados->cluster) {
+		rados_shutdown(rados->cluster);
+		rados->cluster = NULL;
+	}
+}
+
+static void fio_rados_cleanup(struct thread_data *td)
+{
+	struct rados_data *rados = td->io_ops_data;
+
+	if (rados) {
+		_fio_rados_disconnect(rados);
+		free(rados->objects);
+		free(rados->aio_events);
+		free(rados);
+	}
+}
+
+static int fio_rados_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct rados_data *rados = td->io_ops_data;
+	struct fio_rados_iou *fri = io_u->engine_data;
+	char *object = rados->objects[io_u->file->engine_pos];
+	int r = -1;
+
+	fio_ro_check(td, io_u);
+
+	if (io_u->ddir == DDIR_WRITE) {
+		 r = rados_aio_create_completion(fri, NULL,
+			NULL, &fri->completion);
+		if (r < 0) {
+			log_err("rados_aio_create_completion failed.\n");
+			goto failed;
+		}
+
+		r = rados_aio_write(rados->io_ctx, object, fri->completion,
+			io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (r < 0) {
+			log_err("rados_write failed.\n");
+			goto failed_comp;
+		}
+		return FIO_Q_QUEUED;
+	} else if (io_u->ddir == DDIR_READ) {
+		r = rados_aio_create_completion(fri, NULL,
+			NULL, &fri->completion);
+		if (r < 0) {
+			log_err("rados_aio_create_completion failed.\n");
+			goto failed;
+		}
+		r = rados_aio_read(rados->io_ctx, object, fri->completion,
+			io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+		if (r < 0) {
+			log_err("rados_aio_read failed.\n");
+			goto failed_comp;
+		}
+		return FIO_Q_QUEUED;
+	} else if (io_u->ddir == DDIR_TRIM) {
+		r = rados_aio_create_completion(fri, NULL,
+			NULL , &fri->completion);
+		if (r < 0) {
+			log_err("rados_aio_create_completion failed.\n");
+			goto failed;
+		}
+		fri->write_op = rados_create_write_op();
+		if (fri->write_op == NULL) {
+			log_err("rados_create_write_op failed.\n");
+			goto failed_comp;
+		}
+		rados_write_op_zero(fri->write_op, io_u->offset,
+			io_u->xfer_buflen);
+		r = rados_aio_write_op_operate(fri->write_op, rados->io_ctx,
+			fri->completion, object, NULL, 0);
+		if (r < 0) {
+			log_err("rados_aio_write_op_operate failed.\n");
+			goto failed_write_op;
+		}
+		return FIO_Q_QUEUED;
+	 }
+
+	log_err("WARNING: Only DDIR_READ, DDIR_WRITE and DDIR_TRIM are supported!");
+
+failed_write_op:
+	rados_release_write_op(fri->write_op);
+failed_comp:
+	rados_aio_release(fri->completion);
+failed:
+	io_u->error = -r;
+	td_verror(td, io_u->error, "xfer");
+	return FIO_Q_COMPLETED;
+}
+
+static struct io_u *fio_rados_event(struct thread_data *td, int event)
+{
+	struct rados_data *rados = td->io_ops_data;
+	return rados->aio_events[event];
+}
+
+int fio_rados_getevents(struct thread_data *td, unsigned int min,
+	unsigned int max, const struct timespec *t)
+{
+	struct rados_data *rados = td->io_ops_data;
+	struct rados_options *o = td->eo;
+	int busy_poll = o->busy_poll;
+	unsigned int events = 0;
+	struct io_u *u;
+	struct fio_rados_iou *fri;
+	unsigned int i;
+	rados_completion_t first_unfinished;
+	int observed_new = 0;
+
+	/* loop through inflight ios until we find 'min' completions */
+	do {
+		first_unfinished = NULL;
+		io_u_qiter(&td->io_u_all, u, i) {
+			if (!(u->flags & IO_U_F_FLIGHT))
+				continue;
+
+			fri = u->engine_data;
+			if (fri->completion) {
+				if (rados_aio_is_complete(fri->completion)) {
+					if (fri->write_op != NULL) {
+						rados_release_write_op(fri->write_op);
+						fri->write_op = NULL;
+					}
+					rados_aio_release(fri->completion);
+					fri->completion = NULL;
+					rados->aio_events[events] = u;
+					events++;
+					observed_new = 1;
+				} else if (first_unfinished == NULL) {
+					first_unfinished = fri->completion;
+				}
+			}
+			if (events >= max)
+				break;
+		}
+		if (events >= min)
+			return events;
+		if (first_unfinished == NULL || busy_poll)
+			continue;
+
+		if (!observed_new)
+			rados_aio_wait_for_complete(first_unfinished);
+	} while (1);
+	return events;
+}
+
+static int fio_rados_setup(struct thread_data *td)
+{
+	struct rados_data *rados = NULL;
+	int r;
+	/* allocate engine specific structure to deal with librados. */
+	r = _fio_setup_rados_data(td, &rados);
+	if (r) {
+		log_err("fio_setup_rados_data failed.\n");
+		goto cleanup;
+	}
+	td->io_ops_data = rados;
+
+	/* Force single process mode.
+	*/
+	td->o.use_thread = 1;
+
+	/* Connect in the main thread to determine
+	 * the size of the given RADOS block device, and
+	 * disconnect later on.
+	 */
+	r = _fio_rados_connect(td);
+	if (r) {
+		log_err("fio_rados_connect failed.\n");
+		goto cleanup;
+	}
+	rados->connected = true;
+
+	return 0;
+cleanup:
+	fio_rados_cleanup(td);
+	return r;
+}
+
+/* open/invalidate are no-ops. We set the FIO_DISKLESSIO flag in ioengine_ops
+ * to prevent fio from creating the files.
+ */
+static int fio_rados_open(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+static int fio_rados_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	return 0;
+}
+
+static void fio_rados_io_u_free(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_rados_iou *fri = io_u->engine_data;
+
+	if (fri) {
+		io_u->engine_data = NULL;
+		fri->td = NULL;
+		if (fri->completion)
+			rados_aio_release(fri->completion);
+		if (fri->write_op)
+			rados_release_write_op(fri->write_op);
+		free(fri);
+	}
+}
+
+static int fio_rados_io_u_init(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_rados_iou *fri;
+	fri = calloc(1, sizeof(*fri));
+	fri->io_u = io_u;
+	fri->td = td;
+	io_u->engine_data = fri;
+	return 0;
+}
+
+/* ioengine_ops for get_ioengine() */
+static struct ioengine_ops ioengine = {
+	.name = "rados",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_DISKLESSIO,
+	.setup			= fio_rados_setup,
+	.queue			= fio_rados_queue,
+	.getevents		= fio_rados_getevents,
+	.event			= fio_rados_event,
+	.cleanup		= fio_rados_cleanup,
+	.open_file		= fio_rados_open,
+	.invalidate		= fio_rados_invalidate,
+	.options		= options,
+	.io_u_init		= fio_rados_io_u_init,
+	.io_u_free		= fio_rados_io_u_free,
+	.option_struct_size	= sizeof(struct rados_options),
+};
+
+static void fio_init fio_rados_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_rados_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/rados.fio b/examples/rados.fio
new file mode 100644
index 0000000..035cbff
--- /dev/null
+++ b/examples/rados.fio
@@ -0,0 +1,24 @@
+######################################################################
+# Example test for the RADOS engine.
+#
+# Runs a 4k random write test against RADOS via librados.
+#
+# NOTE: Make sure you either have a Ceph pool named 'rados' or change
+#       the pool parameter.
+######################################################################
+[global]
+#logging
+#write_iops_log=write_iops_log
+#write_bw_log=write_bw_log
+#write_lat_log=write_lat_log
+ioengine=rados
+clientname=admin
+pool=rados
+busy_poll=0
+rw=randwrite
+bs=4k
+
+[rbd_iodepth32]
+iodepth=32
+size=128m
+nr_files=32
diff --git a/fio.1 b/fio.1
index 91ae4a2..e488b01 100644
--- a/fio.1
+++ b/fio.1
@@ -1585,6 +1585,11 @@ size to the current block offset. \fBblocksize\fR is ignored.
 I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
 defragment activity in request to DDIR_WRITE event.
 .TP
+.B rados
+I/O engine supporting direct access to Ceph Reliable Autonomic Distributed
+Object Store (RADOS) via librados. This ioengine defines engine specific
+options.
+.TP
 .B rbd
 I/O engine supporting direct access to Ceph Rados Block Devices
 (RBD) via librbd without the need to use the kernel rbd driver. This
@@ -1773,21 +1778,25 @@ after event.
 .RE
 .RE
 .TP
-.BI (rbd)clustername \fR=\fPstr
+.BI (rbd,rados)clustername \fR=\fPstr
 Specifies the name of the Ceph cluster.
 .TP
 .BI (rbd)rbdname \fR=\fPstr
 Specifies the name of the RBD.
 .TP
-.BI (rbd)pool \fR=\fPstr
-Specifies the name of the Ceph pool containing RBD.
+.BI (rbd,rados)pool \fR=\fPstr
+Specifies the name of the Ceph pool containing RBD or RADOS data.
 .TP
-.BI (rbd)clientname \fR=\fPstr
+.BI (rbd,rados)clientname \fR=\fPstr
 Specifies the username (without the 'client.' prefix) used to access the
 Ceph cluster. If the \fBclustername\fR is specified, the \fBclientname\fR shall be
 the full *type.id* string. If no type. prefix is given, fio will add 'client.'
 by default.
 .TP
+.BI (rbd,rados)busy_poll \fR=\fPbool
+Poll the store instead of waiting for completion. Usually this provides better
+throughput at the cost of higher (up to 100%) CPU utilization.
+.TP
 .BI (mtd)skip_bad \fR=\fPbool
 Skip operations against known bad blocks.
 .TP

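The `busy_poll` option added above changes how `fio_rados_getevents()` waits: it scans the in-flight requests, reaps the finished ones, and, if fewer than `min` are done, either blocks on the first unfinished completion or, with `busy_poll=1`, rescans immediately. The sketch below models that loop without librados, under stated assumptions: `struct fake_req`, `poll_complete()`, `wait_for_complete()` and `reap()` are hypothetical stand-ins (the last two play the roles of `rados_aio_is_complete()` and `rados_aio_wait_for_complete()`), backend progress is simulated by a poll counter, and the engine's `observed_new` flag is left out. Unlike the patch, which keeps rescanning when nothing is left unfinished, the sketch returns in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one in-flight request (fio_rados_iou plus its
 * rados_completion_t in the real engine). */
struct fake_req {
	bool inflight;   /* queued and not yet reaped */
	int polls_left;  /* simulated backend progress: polls until done */
};

/* Non-blocking completion check; each poll models a little elapsed time. */
static bool poll_complete(struct fake_req *r)
{
	if (r->polls_left > 0)
		r->polls_left--;
	return r->polls_left == 0;
}

/* Blocking wait: returns only once the request has finished. */
static void wait_for_complete(struct fake_req *r, unsigned int *waits)
{
	r->polls_left = 0;
	(*waits)++;
}

/* Reap at least min (and at most max) finished requests into out[]. */
static unsigned int reap(struct fake_req *reqs, size_t n, unsigned int min,
			 unsigned int max, bool busy_poll,
			 struct fake_req **out, unsigned int *waits)
{
	unsigned int events = 0;

	assert(min <= max);
	for (;;) {
		struct fake_req *first_unfinished = NULL;

		for (size_t i = 0; i < n && events < max; i++) {
			if (!reqs[i].inflight)
				continue;
			if (poll_complete(&reqs[i])) {
				reqs[i].inflight = false;
				out[events++] = &reqs[i];
			} else if (!first_unfinished) {
				first_unfinished = &reqs[i];
			}
		}
		if (events >= min || !first_unfinished)
			return events;
		/* busy_poll: rescan at once and burn CPU; otherwise sleep */
		if (!busy_poll)
			wait_for_complete(first_unfinished, waits);
	}
}
```

For example, given three in-flight requests that need 1, 5 and 3 polls and `min=2`, the blocking variant returns after two scans and one wait, while the busy-polling variant never waits but needs a third scan, which is the CPU-for-latency trade the `busy_poll` man page entry describes.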

* Recent changes (master)
From: Jens Axboe @ 2018-02-13 13:00 UTC
  To: fio

The following changes since commit 3c0a8bc2f33ba721a97b459d5b32acbf4460450f:

  init: fixup some bad style in previous commit (2018-02-10 14:44:49 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f2cd91604af170e972438c461a40230e266a57d9:

  debug: fix inverted logic in fio_did_warn() (2018-02-12 10:55:07 -0700)

----------------------------------------------------------------
Bryce Guinta (1):
      Make fiologparser_hist compatible with python3

Jens Axboe (7):
      Merge branch 'master' of https://github.com/brycepg/fio
      init: add global 'warned' state
      filesetup: convert root flush warning to fio_did_warn()
      verify: convert verify buf too small warning to fio_did_warn()
      io_u: convert zoned bug warning to fio_did_warn()
      iolog: convert drop warning to fio_did_warn()
      debug: fix inverted logic in fio_did_warn()

 debug.h                         | 19 ++++++++++++++++++-
 filesetup.c                     |  5 +----
 init.c                          |  6 +++++-
 io_u.c                          | 10 ++--------
 iolog.c                         |  6 +-----
 tools/hist/fiologparser_hist.py | 11 ++++++-----
 verify.c                        |  5 +----
 7 files changed, 34 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/debug.h b/debug.h
index e3aa3f1..ba62214 100644
--- a/debug.h
+++ b/debug.h
@@ -2,6 +2,7 @@
 #define FIO_DEBUG_H
 
 #include <assert.h>
+#include "lib/types.h"
 #include "log.h"
 
 enum {
@@ -26,7 +27,23 @@ enum {
 	FD_DEBUG_MAX,
 };
 
-extern unsigned int fio_debug_jobno, *fio_debug_jobp;
+extern unsigned int fio_debug_jobno, *fio_debug_jobp, *fio_warned;
+
+static inline bool fio_did_warn(unsigned int mask)
+{
+	if (*fio_warned & mask)
+		return true;
+
+	*fio_warned |= mask;
+	return false;
+}
+
+enum {
+	FIO_WARN_ROOT_FLUSH	= 1,
+	FIO_WARN_VERIFY_BUF	= 2,
+	FIO_WARN_ZONED_BUG	= 4,
+	FIO_WARN_IOLOG_DROP	= 8,
+};
 
 #ifdef FIO_INC_DEBUG
 struct debug_level {
diff --git a/filesetup.c b/filesetup.c
index 3cda606..cced556 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -20,8 +20,6 @@
 #include <linux/falloc.h>
 #endif
 
-static int root_warn;
-
 static FLIST_HEAD(filename_list);
 
 /*
@@ -516,10 +514,9 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 			ret = blockdev_invalidate_cache(f);
 		}
 		if (ret < 0 && errno == EACCES && geteuid()) {
-			if (!root_warn) {
+			if (!fio_did_warn(FIO_WARN_ROOT_FLUSH)) {
 				log_err("fio: only root may flush block "
 					"devices. Cache flush bypassed!\n");
-				root_warn = 1;
 			}
 			ret = 0;
 		}
diff --git a/init.c b/init.c
index 25661be..28061db 100644
--- a/init.c
+++ b/init.c
@@ -79,6 +79,7 @@ static int prev_group_jobs;
 unsigned long fio_debug = 0;
 unsigned int fio_debug_jobno = -1;
 unsigned int *fio_debug_jobp = NULL;
+unsigned int *fio_warned = NULL;
 
 static char cmd_optstr[256];
 static bool did_arg;
@@ -309,6 +310,7 @@ static void free_shm(void)
 	if (threads) {
 		flow_exit();
 		fio_debug_jobp = NULL;
+		fio_warned = NULL;
 		free_threads_shm();
 	}
 
@@ -341,7 +343,7 @@ static int setup_thread_area(void)
 	do {
 		size_t size = max_jobs * sizeof(struct thread_data);
 
-		size += sizeof(unsigned int);
+		size += 2 * sizeof(unsigned int);
 
 #ifndef CONFIG_NO_SHM
 		shm_id = shmget(0, size, IPC_CREAT | 0600);
@@ -376,6 +378,8 @@ static int setup_thread_area(void)
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
 	fio_debug_jobp = (unsigned int *)(threads + max_jobs);
 	*fio_debug_jobp = -1;
+	fio_warned = fio_debug_jobp + 1;
+	*fio_warned = 0;
 
 	flow_init();
 
diff --git a/io_u.c b/io_u.c
index 404c75b..b54a79c 100644
--- a/io_u.c
+++ b/io_u.c
@@ -163,7 +163,6 @@ static int __get_next_rand_offset_zoned_abs(struct thread_data *td,
 {
 	struct zone_split_index *zsi;
 	uint64_t lastb, send, stotal;
-	static int warned;
 	unsigned int v;
 
 	lastb = last_block(td, f, ddir);
@@ -192,10 +191,8 @@ bail:
 	 * Should never happen
 	 */
 	if (send == -1U) {
-		if (!warned) {
+		if (!fio_did_warn(FIO_WARN_ZONED_BUG))
 			log_err("fio: bug in zoned generation\n");
-			warned = 1;
-		}
 		goto bail;
 	} else if (send > lastb) {
 		/*
@@ -223,7 +220,6 @@ static int __get_next_rand_offset_zoned(struct thread_data *td,
 {
 	unsigned int v, send, stotal;
 	uint64_t offset, lastb;
-	static int warned;
 	struct zone_split_index *zsi;
 
 	lastb = last_block(td, f, ddir);
@@ -248,10 +244,8 @@ bail:
 	 * Should never happen
 	 */
 	if (send == -1U) {
-		if (!warned) {
+		if (!fio_did_warn(FIO_WARN_ZONED_BUG))
 			log_err("fio: bug in zoned generation\n");
-			warned = 1;
-		}
 		goto bail;
 	}
 
diff --git a/iolog.c b/iolog.c
index 760d7b0..34e74a8 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1141,8 +1141,6 @@ size_t log_chunk_sizes(struct io_log *log)
 
 #ifdef CONFIG_ZLIB
 
-static bool warned_on_drop;
-
 static void iolog_put_deferred(struct io_log *log, void *ptr)
 {
 	if (!ptr)
@@ -1152,10 +1150,8 @@ static void iolog_put_deferred(struct io_log *log, void *ptr)
 	if (log->deferred < IOLOG_MAX_DEFER) {
 		log->deferred_items[log->deferred] = ptr;
 		log->deferred++;
-	} else if (!warned_on_drop) {
+	} else if (!fio_did_warn(FIO_WARN_IOLOG_DROP))
 		log_err("fio: had to drop log entry free\n");
-		warned_on_drop = true;
-	}
 	pthread_mutex_unlock(&log->deferred_free_lock);
 }
 
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 2e05b92..62a4eb4 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -177,7 +177,7 @@ def print_all_stats(ctx, end, mn, ss_cnt, vs, ws, mx):
 
     avg = weighted_average(vs, ws)
     values = [mn, avg] + list(ps) + [mx]
-    row = [end, ss_cnt] + map(lambda x: float(x) / ctx.divisor, values)
+    row = [end, ss_cnt] + [float(x) / ctx.divisor for x in values]
     fmt = "%d, %d, %d, " + fmt_float_list(ctx, 5) + ", %d"
     print (fmt % tuple(row))
 
@@ -288,9 +288,9 @@ def main(ctx):
 
         max_cols = guess_max_from_bins(ctx, __HIST_COLUMNS)
         coarseness = int(np.log2(float(max_cols) / __HIST_COLUMNS))
-        bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness), np.arange(__HIST_COLUMNS)), dtype=float)
-        lower_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 0.0), np.arange(__HIST_COLUMNS)), dtype=float)
-        upper_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 1.0), np.arange(__HIST_COLUMNS)), dtype=float)
+        bin_vals = np.array([plat_idx_to_val_coarse(x, coarseness) for x in np.arange(__HIST_COLUMNS)], dtype=float)
+        lower_bin_vals = np.array([plat_idx_to_val_coarse(x, coarseness, 0.0) for x in np.arange(__HIST_COLUMNS)], dtype=float)
+        upper_bin_vals = np.array([plat_idx_to_val_coarse(x, coarseness, 1.0) for x in np.arange(__HIST_COLUMNS)], dtype=float)
 
     fps = [open(f, 'r') for f in ctx.FILE]
     gen = histogram_generator(ctx, fps, ctx.buff_size)
@@ -333,7 +333,8 @@ def main(ctx):
             start += ctx.interval
             end = start + ctx.interval
     finally:
-        map(lambda f: f.close(), fps)
+        for fp in fps:
+            fp.close()
 
 
 if __name__ == '__main__':
diff --git a/verify.c b/verify.c
index b178450..aeafdb5 100644
--- a/verify.c
+++ b/verify.c
@@ -241,7 +241,6 @@ struct vcont {
 };
 
 #define DUMP_BUF_SZ	255
-static int dump_buf_warned;
 
 static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 		     const char *type, struct fio_file *f)
@@ -260,10 +259,8 @@ static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 
 	buf_left -= strlen(fname);
 	if (buf_left <= 0) {
-		if (!dump_buf_warned) {
+		if (!fio_did_warn(FIO_WARN_VERIFY_BUF))
 			log_err("fio: verify failure dump buffer too small\n");
-			dump_buf_warned = 1;
-		}
 		free(ptr);
 		return;
 	}

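`fio_did_warn()` replaces the per-site `static int warned` flags with a single bitmask. Because `setup_thread_area()` now reserves a second `unsigned int` after `fio_debug_jobp` in the shared thread area and points `fio_warned` at it, the bits are visible to every job, so each warning is presumably printed once per run rather than once per forked job process. The self-contained sketch below reuses the helper and enum from the `debug.h` hunk (plus a sanity `assert`); a plain static variable stands in for the shared-memory word, and `warn_root_flush()` is a hypothetical caller modelled on the `filesetup.c` change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One bit per warning site, as in the debug.h hunk. */
enum {
	FIO_WARN_ROOT_FLUSH	= 1,
	FIO_WARN_VERIFY_BUF	= 2,
	FIO_WARN_ZONED_BUG	= 4,
	FIO_WARN_IOLOG_DROP	= 8,
};

/* In fio this word lives in shared memory; a static stands in for it here. */
static unsigned int warned_storage;
static unsigned int *fio_warned = &warned_storage;

/* Returns true if the warning was already issued; otherwise records it
 * and returns false, so the caller prints it exactly once. */
static inline bool fio_did_warn(unsigned int mask)
{
	assert(mask != 0);  /* a zero mask could never be recorded */
	if (*fio_warned & mask)
		return true;

	*fio_warned |= mask;
	return false;
}

/* Hypothetical caller: returns 1 if it printed, 0 if suppressed. */
static int warn_root_flush(void)
{
	if (!fio_did_warn(FIO_WARN_ROOT_FLUSH)) {
		fprintf(stderr, "fio: only root may flush block devices. "
				"Cache flush bypassed!\n");
		return 1;
	}
	return 0;
}
```

The update is a plain read-modify-write rather than an atomic operation, so two jobs hitting the same condition at the same instant could in principle both print; for a one-off diagnostic that race looks harmless.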

* Recent changes (master)
From: Jens Axboe @ 2018-02-11 13:00 UTC
  To: fio

The following changes since commit 0cf542af81751d7b318afe4429001c1aab6baee5:

  Include 'numjobs' in global options output (2018-02-08 15:46:46 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3c0a8bc2f33ba721a97b459d5b32acbf4460450f:

  init: fixup some bad style in previous commit (2018-02-10 14:44:49 -0700)

----------------------------------------------------------------
Damian Yurzola (1):
      init: fix broken verify_interval

Jens Axboe (1):
      init: fixup some bad style in previous commit

 init.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index ae3c4f7..25661be 100644
--- a/init.c
+++ b/init.c
@@ -802,11 +802,12 @@ static int fixup_options(struct thread_data *td)
 			o->verify_interval = o->min_bs[DDIR_READ];
 
 		/*
-		 * Verify interval must be a factor or both min and max
+		 * Verify interval must be a factor of both min and max
 		 * write size
 		 */
-		if (o->verify_interval % o->min_bs[DDIR_WRITE] ||
-		    o->verify_interval % o->max_bs[DDIR_WRITE])
+		if (!o->verify_interval ||
+		    (o->min_bs[DDIR_WRITE] % o->verify_interval) ||
+		    (o->max_bs[DDIR_WRITE] % o->verify_interval))
 			o->verify_interval = gcd(o->min_bs[DDIR_WRITE],
 							o->max_bs[DDIR_WRITE]);
 	}
@@ -1585,7 +1586,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->bw_avg_time);
 		else
 			o->bw_avg_time = p.avg_msec;
-	
+
 		p.hist_msec = o->log_hist_msec;
 		p.hist_coarseness = o->log_hist_coarseness;
 
@@ -1616,7 +1617,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->iops_avg_time);
 		else
 			o->iops_avg_time = p.avg_msec;
-	
+
 		p.hist_msec = o->log_hist_msec;
 		p.hist_coarseness = o->log_hist_coarseness;
 

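The corrected check in `fixup_options()` tests whether `verify_interval` divides both the minimum and maximum write block sizes. The old expression tested the reverse (whether the interval was a multiple of each size), which rejected a valid 512-byte interval with 4k writes and accepted an 8k interval that cannot split a 4k block. The added `!verify_interval` guard also keeps the modulo from dividing by zero. A minimal sketch of both versions follows; `gcd_u()`, `old_fixup()` and `fixup_verify_interval()` are hypothetical names (fio uses its own `gcd()` helper).

```c
#include <assert.h>

/* Euclid's algorithm. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Pre-fix check: asks whether the interval is a multiple of each size. */
static unsigned int old_fixup(unsigned int interval, unsigned int min_bs,
			      unsigned int max_bs)
{
	if (interval % min_bs || interval % max_bs)
		interval = gcd_u(min_bs, max_bs);
	return interval;
}

/* Post-fix check: the interval must evenly divide both write sizes;
 * otherwise fall back to their greatest common divisor. */
static unsigned int fixup_verify_interval(unsigned int interval,
					  unsigned int min_bs,
					  unsigned int max_bs)
{
	assert(min_bs && max_bs);
	if (!interval || (min_bs % interval) || (max_bs % interval))
		interval = gcd_u(min_bs, max_bs);
	return interval;
}
```

For example, with 4k writes the fixed version keeps a 512-byte interval (`fixup_verify_interval(512, 4096, 4096)` returns 512), whereas `old_fixup(512, 4096, 4096)` replaces it with 4096.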

* Recent changes (master)
From: Jens Axboe @ 2018-02-09 13:00 UTC
  To: fio

The following changes since commit c69f6bf3ed1b413562d7aab1aa9c476101348726:

  mmap: don't include MADV_FREE in fadvise_hint check (2018-02-07 11:30:59 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0cf542af81751d7b318afe4429001c1aab6baee5:

  Include 'numjobs' in global options output (2018-02-08 15:46:46 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Include 'numjobs' in global options output

 stat.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index a980a1d..bd2c27d 100644
--- a/stat.c
+++ b/stat.c
@@ -1226,7 +1226,7 @@ static void show_thread_status_terse_all(struct thread_stat *ts,
 }
 
 static void json_add_job_opts(struct json_object *root, const char *name,
-			      struct flist_head *opt_list, bool num_jobs)
+			      struct flist_head *opt_list)
 {
 	struct json_object *dir_object;
 	struct flist_head *entry;
@@ -1242,8 +1242,6 @@ static void json_add_job_opts(struct json_object *root, const char *name,
 		const char *pos = "";
 
 		p = flist_entry(entry, struct print_option, list);
-		if (!num_jobs && !strcmp(p->name, "numjobs"))
-			continue;
 		if (p->value)
 			pos = p->value;
 		json_object_add_value_string(dir_object, p->name, pos);
@@ -1277,7 +1275,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	}
 
 	if (opt_list)
-		json_add_job_opts(root, "job options", opt_list, true);
+		json_add_job_opts(root, "job options", opt_list);
 
 	add_ddir_status_json(ts, rs, DDIR_READ, root);
 	add_ddir_status_json(ts, rs, DDIR_WRITE, root);
@@ -1878,7 +1876,7 @@ void __show_run_stats(void)
 		json_object_add_value_int(root, "timestamp_ms", ms_since_epoch);
 		json_object_add_value_string(root, "time", time_buf);
 		global = get_global_options();
-		json_add_job_opts(root, "global options", &global->opt_list, false);
+		json_add_job_opts(root, "global options", &global->opt_list);
 		array = json_create_array();
 		json_object_add_value_array(root, "jobs", array);
 	}


* Recent changes (master)
From: Jens Axboe @ 2018-02-08 13:00 UTC
  To: fio

The following changes since commit f5ec81235eaf21bd3a97556427d4a84e48a87e54:

  stat: add total fsync ios to json output (2018-01-25 16:04:20 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c69f6bf3ed1b413562d7aab1aa9c476101348726:

  mmap: don't include MADV_FREE in fadvise_hint check (2018-02-07 11:30:59 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Let fadvise_hint also apply to mmap engine and madvise
      mmap: don't include MADV_FREE in fadvise_hint check

 HOWTO          |  5 +++--
 engines/mmap.c | 38 +++++++++++++++++++++++++++-----------
 fio.1          |  4 ++--
 options.c      |  2 +-
 4 files changed, 33 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 78fa6cc..307b50d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1093,8 +1093,9 @@ I/O type
 
 .. option:: fadvise_hint=str
 
-	Use :manpage:`posix_fadvise(2)` to advise the kernel on what I/O patterns
-	are likely to be issued.  Accepted values are:
+	Use :manpage:`posix_fadvise(2)` or :manpage:`posix_madvise(2)` to
+	advise the kernel on what I/O patterns are likely to be issued.
+	Accepted values are:
 
 		**0**
 			Backwards-compatible hint for "no hint".
diff --git a/engines/mmap.c b/engines/mmap.c
index 7755658..ea7179d 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -27,6 +27,30 @@ struct fio_mmap_data {
 	off_t mmap_off;
 };
 
+static bool fio_madvise_file(struct thread_data *td, struct fio_file *f,
+			     size_t length)
+
+{
+	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
+
+	if (!td->o.fadvise_hint)
+		return true;
+
+	if (!td_random(td)) {
+		if (posix_madvise(fmd->mmap_ptr, length, POSIX_MADV_SEQUENTIAL) < 0) {
+			td_verror(td, errno, "madvise");
+			return false;
+		}
+	} else {
+		if (posix_madvise(fmd->mmap_ptr, length, POSIX_MADV_RANDOM) < 0) {
+			td_verror(td, errno, "madvise");
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 			 size_t length, off_t off)
 {
@@ -50,17 +74,9 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 		goto err;
 	}
 
-	if (!td_random(td)) {
-		if (posix_madvise(fmd->mmap_ptr, length, POSIX_MADV_SEQUENTIAL) < 0) {
-			td_verror(td, errno, "madvise");
-			goto err;
-		}
-	} else {
-		if (posix_madvise(fmd->mmap_ptr, length, POSIX_MADV_RANDOM) < 0) {
-			td_verror(td, errno, "madvise");
-			goto err;
-		}
-	}
+	if (!fio_madvise_file(td, f, length))
+		goto err;
+
 	if (posix_madvise(fmd->mmap_ptr, length, POSIX_MADV_DONTNEED) < 0) {
 		td_verror(td, errno, "madvise");
 		goto err;
diff --git a/fio.1 b/fio.1
index 70eeeb0..91ae4a2 100644
--- a/fio.1
+++ b/fio.1
@@ -870,8 +870,8 @@ pre\-allocation methods are available, \fBnone\fR if not.
 .RE
 .TP
 .BI fadvise_hint \fR=\fPstr
-Use \fBposix_fadvise\fR\|(2) to advise the kernel what I/O patterns
-are likely to be issued. Accepted values are:
+Use \fBposix_fadvise\fR\|(2) or \fBposix_madvise\fR\|(2) to advise the kernel
+what I/O patterns are likely to be issued. Accepted values are:
 .RS
 .RS
 .TP
diff --git a/options.c b/options.c
index 9a3431d..6810521 100644
--- a/options.c
+++ b/options.c
@@ -2443,7 +2443,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.posval	= {
 			  { .ival = "0",
 			    .oval = F_ADV_NONE,
-			    .help = "Don't issue fadvise",
+			    .help = "Don't issue fadvise/madvise",
 			  },
 			  { .ival = "1",
 			    .oval = F_ADV_TYPE,

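`fio_madvise_file()` maps `fadvise_hint` onto `posix_madvise(2)`: random workloads get `POSIX_MADV_RANDOM`, everything else `POSIX_MADV_SEQUENTIAL`. One detail is worth checking against the specification: POSIX defines `posix_madvise()` to return zero on success or a positive error number on failure, so a `< 0` test like the one in the `engines/mmap.c` hunk would not detect a failure. The sketch below is a hypothetical `madvise_hint()` helper, not fio code, that tests for a nonzero return instead; it assumes a glibc-style environment where `_DEFAULT_SOURCE` exposes the `POSIX_MADV_*` constants and `MAP_ANONYMOUS`.

```c
#define _DEFAULT_SOURCE  /* glibc: exposes POSIX_MADV_* and MAP_ANONYMOUS */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Apply a sequential or random access hint to an existing mapping.
 * posix_madvise() reports failure through its return value (an error
 * number), so compare against 0 rather than checking < 0 and errno.
 */
static int madvise_hint(void *addr, size_t len, bool random_io)
{
	int advice = random_io ? POSIX_MADV_RANDOM : POSIX_MADV_SEQUENTIAL;
	int err;

	assert(addr != NULL && len > 0);
	err = posix_madvise(addr, len, advice);
	if (err != 0)
		fprintf(stderr, "posix_madvise: %s\n", strerror(err));
	return err;
}
```

Called on a page-aligned anonymous `mmap()` region, either hint should return 0; passing an address that is not page-aligned typically yields `EINVAL`.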

* Recent changes (master)
From: Jens Axboe @ 2018-01-26 13:00 UTC
  To: fio

The following changes since commit 2ea93f982e728343f823c2cf63b4674a104575bf:

  Switch last_was_sync and terminate to bool and pack better (2018-01-24 20:22:50 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f5ec81235eaf21bd3a97556427d4a84e48a87e54:

  stat: add total fsync ios to json output (2018-01-25 16:04:20 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      Track fsync/fdatasync/sync_file_range issue count
      stat: ensure that we align ts->sync_stat appropriately
      io_ddir: move count values out of the enum fio_ddir
      io_ddir: revert separate ddir count change
      stat: add total fsync ios to json output

 init.c      |  1 +
 io_ddir.h   |  4 +++-
 ioengines.c |  4 ++--
 stat.c      | 13 ++++++++++---
 stat.h      |  4 ++--
 5 files changed, 18 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 8a80138..ae3c4f7 100644
--- a/init.c
+++ b/init.c
@@ -1481,6 +1481,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td->ts.bw_stat[i].min_val = ULONG_MAX;
 		td->ts.iops_stat[i].min_val = ULONG_MAX;
 	}
+	td->ts.sync_stat.min_val = ULONG_MAX;
 	td->ddir_seq_nr = o->ddir_seq_nr;
 
 	if ((o->stonewall || o->new_group) && prev_group_jobs) {
diff --git a/io_ddir.h b/io_ddir.h
index 613d5fb..deaa8b5 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -5,13 +5,15 @@ enum fio_ddir {
 	DDIR_READ = 0,
 	DDIR_WRITE = 1,
 	DDIR_TRIM = 2,
-	DDIR_RWDIR_CNT = 3,
 	DDIR_SYNC = 3,
 	DDIR_DATASYNC,
 	DDIR_SYNC_FILE_RANGE,
 	DDIR_WAIT,
 	DDIR_LAST,
 	DDIR_INVAL = -1,
+
+	DDIR_RWDIR_CNT = 3,
+	DDIR_RWDIR_SYNC_CNT = 4,
 };
 
 static inline const char *io_ddir_name(enum fio_ddir ddir)
diff --git a/ioengines.c b/ioengines.c
index fb475e9..5dd2311 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -357,7 +357,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (ret == FIO_Q_COMPLETED) {
-		if (ddir_rw(io_u->ddir)) {
+		if (ddir_rw(io_u->ddir) || ddir_sync(io_u->ddir)) {
 			io_u_mark_depth(td, 1);
 			td->ts.total_io_u[io_u->ddir]++;
 		}
@@ -366,7 +366,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 
 		td->io_u_queued++;
 
-		if (ddir_rw(io_u->ddir))
+		if (ddir_rw(io_u->ddir) || ddir_sync(io_u->ddir))
 			td->ts.total_io_u[io_u->ddir]++;
 
 		if (td->io_u_queued >= td->o.iodepth_batch) {
diff --git a/stat.c b/stat.c
index 3a014d6..a980a1d 100644
--- a/stat.c
+++ b/stat.c
@@ -854,12 +854,13 @@ static void show_thread_status_normal(struct thread_stat *ts,
 					io_u_dist[1], io_u_dist[2],
 					io_u_dist[3], io_u_dist[4],
 					io_u_dist[5], io_u_dist[6]);
-	log_buf(out, "     issued rwt: total=%llu,%llu,%llu,"
-				 " short=%llu,%llu,%llu,"
-				 " dropped=%llu,%llu,%llu\n",
+	log_buf(out, "     issued rwts: total=%llu,%llu,%llu,%llu"
+				 " short=%llu,%llu,%llu,0"
+				 " dropped=%llu,%llu,%llu,0\n",
 					(unsigned long long) ts->total_io_u[0],
 					(unsigned long long) ts->total_io_u[1],
 					(unsigned long long) ts->total_io_u[2],
+					(unsigned long long) ts->total_io_u[3],
 					(unsigned long long) ts->short_io_u[0],
 					(unsigned long long) ts->short_io_u[1],
 					(unsigned long long) ts->short_io_u[2],
@@ -1048,6 +1049,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 		tmp_object = json_create_object();
 		json_object_add_value_object(dir_object, "lat_ns", tmp_object);
+		json_object_add_value_int(dir_object, "total_ios", ts->total_io_u[DDIR_SYNC]);
 		json_object_add_value_int(tmp_object, "min", min);
 		json_object_add_value_int(tmp_object, "max", max);
 		json_object_add_value_float(tmp_object, "mean", mean);
@@ -1609,6 +1611,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		}
 	}
 
+	dst->total_io_u[DDIR_SYNC] += src->total_io_u[DDIR_SYNC];
+
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		int m;
 
@@ -1647,6 +1651,7 @@ void init_thread_stat(struct thread_stat *ts)
 		ts->bw_stat[j].min_val = -1UL;
 		ts->iops_stat[j].min_val = -1UL;
 	}
+	ts->sync_stat.min_val = -1UL;
 	ts->groupid = -1;
 }
 
@@ -2287,6 +2292,8 @@ void reset_io_stats(struct thread_data *td)
 		}
 	}
 
+	ts->total_io_u[DDIR_SYNC] = 0;
+
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
 		ts->io_u_map[i] = 0;
 		ts->io_u_submit[i] = 0;
diff --git a/stat.h b/stat.h
index e32a21e..cc91dfc 100644
--- a/stat.h
+++ b/stat.h
@@ -159,10 +159,10 @@ struct thread_stat {
 	/*
 	 * bandwidth and latency stats
 	 */
+	struct io_stat sync_stat __attribute__((aligned(8))); /* fsync etc stats */
 	struct io_stat clat_stat[DDIR_RWDIR_CNT]; /* completion latency */
 	struct io_stat slat_stat[DDIR_RWDIR_CNT]; /* submission latency */
 	struct io_stat lat_stat[DDIR_RWDIR_CNT]; /* total latency */
-	struct io_stat sync_stat;		/* fsync etc stats */
 	struct io_stat bw_stat[DDIR_RWDIR_CNT]; /* bandwidth stats */
 	struct io_stat iops_stat[DDIR_RWDIR_CNT]; /* IOPS stats */
 
@@ -192,7 +192,7 @@ struct thread_stat {
 	uint32_t io_u_sync_plat[FIO_IO_U_PLAT_NR];
 	uint32_t pad;
 
-	uint64_t total_io_u[DDIR_RWDIR_CNT];
+	uint64_t total_io_u[DDIR_RWDIR_SYNC_CNT];
 	uint64_t short_io_u[DDIR_RWDIR_CNT];
 	uint64_t drop_io_u[DDIR_RWDIR_CNT];
 	uint64_t total_submit;

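After this series, `enum fio_ddir` keeps `DDIR_RWDIR_CNT = 3` for the read/write/trim arrays and adds `DDIR_RWDIR_SYNC_CNT = 4`, so `total_io_u[]` gains one slot whose index, `DDIR_SYNC`, equals 3. The sketch below copies the post-patch enum and shows one way to count issues into that array; `struct issue_counts` and `count_issue()` are hypothetical. It deliberately folds `DDIR_DATASYNC` (4) and `DDIR_SYNC_FILE_RANGE` (5) into the `DDIR_SYNC` slot: `ddir_sync()` in `io_ddir.h` also accepts those two values, so indexing a four-element array directly by `io_u->ddir`, as the `td_io_queue()` hunk does, would appear to write past its end when `fdatasync` or `sync_file_range` is in use.

```c
#include <assert.h>
#include <stdint.h>

/* Post-patch enum fio_ddir, copied from the io_ddir.h hunk. */
enum fio_ddir {
	DDIR_READ = 0,
	DDIR_WRITE = 1,
	DDIR_TRIM = 2,
	DDIR_SYNC = 3,
	DDIR_DATASYNC,
	DDIR_SYNC_FILE_RANGE,
	DDIR_WAIT,
	DDIR_LAST,
	DDIR_INVAL = -1,

	DDIR_RWDIR_CNT = 3,
	DDIR_RWDIR_SYNC_CNT = 4,
};

/* Hypothetical counters: slots 0-2 for read/write/trim, slot 3 for syncs. */
struct issue_counts {
	uint64_t total_io_u[DDIR_RWDIR_SYNC_CNT];
};

/* Count one issued I/O; every sync flavour shares the DDIR_SYNC slot. */
static void count_issue(struct issue_counts *c, enum fio_ddir ddir)
{
	assert(ddir != DDIR_INVAL);
	if (ddir == DDIR_SYNC || ddir == DDIR_DATASYNC ||
	    ddir == DDIR_SYNC_FILE_RANGE)
		c->total_io_u[DDIR_SYNC]++;
	else if (ddir >= DDIR_READ && ddir < DDIR_RWDIR_CNT)
		c->total_io_u[ddir]++;
	/* DDIR_WAIT and DDIR_LAST are not counted */
}
```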

* Recent changes (master)
From: Jens Axboe @ 2018-01-25 13:00 UTC
  To: fio

The following changes since commit ca65714c48bcd4fc601e3c04163e2422352be9ca:

  null: drop unneeded casts from void* to non-void* (2018-01-16 08:32:39 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ea93f982e728343f823c2cf63b4674a104575bf:

  Switch last_was_sync and terminate to bool and pack better (2018-01-24 20:22:50 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Add support for logging fsync (and friends) latencies
      Switch last_was_sync and terminate to bool and pack better

 fio.h    |   4 +-
 io_u.c   |  18 +++++--
 libfio.c |   4 +-
 server.h |   2 +-
 stat.c   | 175 ++++++++++++++++++++++++++++++++++++++++++---------------------
 stat.h   |   4 ++
 6 files changed, 140 insertions(+), 67 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index 334f203..85546c5 100644
--- a/fio.h
+++ b/fio.h
@@ -228,9 +228,9 @@ struct thread_data {
 	pid_t pid;
 	char *orig_buffer;
 	size_t orig_buffer_size;
-	volatile int terminate;
 	volatile int runstate;
-	unsigned int last_was_sync;
+	volatile bool terminate;
+	bool last_was_sync;
 	enum fio_ddir last_ddir;
 
 	int mmapfd;
diff --git a/io_u.c b/io_u.c
index 1d6872e..404c75b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1921,7 +1921,8 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 
 		if (no_reduce && per_unit_log(td->iops_log))
 			add_iops_sample(td, io_u, bytes);
-	}
+	} else if (ddir_sync(idx) && !td->o.disable_clat)
+		add_sync_clat_sample(&td->ts, llnsec);
 
 	if (td->ts.nr_block_infos && io_u->ddir == DDIR_TRIM) {
 		uint32_t *info = io_u_block_info(td, io_u);
@@ -1959,6 +1960,12 @@ static void file_log_write_comp(const struct thread_data *td, struct fio_file *f
 		f->last_write_idx = 0;
 }
 
+static bool should_account(struct thread_data *td)
+{
+	return ramp_time_over(td) && (td->runstate == TD_RUNNING ||
+					   td->runstate == TD_VERIFYING);
+}
+
 static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 			 struct io_completion_data *icd)
 {
@@ -1987,15 +1994,17 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	}
 
 	if (ddir_sync(ddir)) {
-		td->last_was_sync = 1;
+		td->last_was_sync = true;
 		if (f) {
 			f->first_write = -1ULL;
 			f->last_write = -1ULL;
 		}
+		if (should_account(td))
+			account_io_completion(td, io_u, icd, ddir, io_u->buflen);
 		return;
 	}
 
-	td->last_was_sync = 0;
+	td->last_was_sync = false;
 	td->last_ddir = ddir;
 
 	if (!io_u->error && ddir_rw(ddir)) {
@@ -2013,8 +2022,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		if (ddir == DDIR_WRITE)
 			file_log_write_comp(td, f, io_u->offset, bytes);
 
-		if (ramp_time_over(td) && (td->runstate == TD_RUNNING ||
-					   td->runstate == TD_VERIFYING))
+		if (should_account(td))
 			account_io_completion(td, io_u, icd, ddir, bytes);
 
 		icd->bytes_done[ddir] += bytes;
diff --git a/libfio.c b/libfio.c
index 74de735..80159b4 100644
--- a/libfio.c
+++ b/libfio.c
@@ -98,7 +98,7 @@ static void reset_io_counters(struct thread_data *td, int all)
 
 	td->zone_bytes = 0;
 
-	td->last_was_sync = 0;
+	td->last_was_sync = false;
 	td->rwmix_issues = 0;
 
 	/*
@@ -230,7 +230,7 @@ void fio_mark_td_terminate(struct thread_data *td)
 {
 	fio_gettime(&td->terminate_time, NULL);
 	write_barrier();
-	td->terminate = 1;
+	td->terminate = true;
 }
 
 void fio_terminate_threads(unsigned int group_id)
diff --git a/server.h b/server.h
index 1a9b650..cabd447 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 69,
+	FIO_SERVER_VER			= 70,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 80f804a..3a014d6 100644
--- a/stat.c
+++ b/stat.c
@@ -200,18 +200,17 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long n
  */
 static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr,
 				  fio_fp64_t *plist, unsigned int precision,
-				  bool is_clat, struct buf_output *out)
+				  const char *pre, struct buf_output *out)
 {
 	unsigned int divisor, len, i, j = 0;
 	unsigned long long minv, maxv;
 	unsigned long long *ovals;
 	int per_line, scale_down, time_width;
-	const char *pre = is_clat ? "clat" : " lat";
 	bool is_last;
 	char fmt[32];
 
 	len = calc_clat_percentiles(io_u_plat, nr, plist, &ovals, &maxv, &minv);
-	if (!len)
+	if (!len || !ovals)
 		goto out;
 
 	/*
@@ -419,13 +418,26 @@ static void display_lat(const char *name, unsigned long long min,
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
-	const char *str[] = { " read", "write", " trim" };
+	const char *str[] = { " read", "write", " trim", "sync" };
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
 	char *io_p, *bw_p, *bw_p_alt, *iops_p;
 	int i2p;
 
+	if (ddir_sync(ddir)) {
+		if (calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
+			log_buf(out, "  %s:\n", "fsync/fdatasync/sync_file_range");
+			display_lat(str[ddir], min, max, mean, dev, out);
+			show_clat_percentiles(ts->io_u_sync_plat,
+						ts->sync_stat.samples,
+						ts->percentile_list,
+						ts->percentile_precision,
+						str[ddir], out);
+		}
+		return;
+	}
+
 	assert(ddir_rw(ddir));
 
 	if (!ts->runtime[ddir])
@@ -460,6 +472,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat(" lat", min, max, mean, dev, out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
+		const char *name = ts->clat_percentiles ? "clat" : " lat";
 		uint64_t samples;
 
 		if (ts->clat_percentiles)
@@ -470,8 +483,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		show_clat_percentiles(ts->io_u_plat[ddir],
 					samples,
 					ts->percentile_list,
-					ts->percentile_precision,
-					ts->clat_percentiles, out);
+					ts->percentile_precision, name, out);
 	}
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
@@ -803,6 +815,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 
 	show_latencies(ts, out);
 
+	if (ts->sync_stat.samples)
+		show_ddir_status(rs, ts, DDIR_SYNC, out);
+
 	runtime = ts->total_run_time;
 	if (runtime) {
 		double runt = (double) runtime;
@@ -968,12 +983,12 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	double mean, dev, iops;
 	unsigned int len;
 	int i;
-	const char *ddirname[] = {"read", "write", "trim"};
+	const char *ddirname[] = { "read", "write", "trim", "sync" };
 	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object = NULL;
 	char buf[120];
 	double p_of_agg = 100.0;
 
-	assert(ddir_rw(ddir));
+	assert(ddir_rw(ddir) || ddir_sync(ddir));
 
 	if (ts->unified_rw_rep && ddir != DDIR_READ)
 		return;
@@ -982,54 +997,76 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_object(parent,
 		ts->unified_rw_rep ? "mixed" : ddirname[ddir], dir_object);
 
-	bw_bytes = 0;
-	bw = 0;
-	iops = 0.0;
-	if (ts->runtime[ddir]) {
-		uint64_t runt = ts->runtime[ddir];
+	if (ddir_rw(ddir)) {
+		bw_bytes = 0;
+		bw = 0;
+		iops = 0.0;
+		if (ts->runtime[ddir]) {
+			uint64_t runt = ts->runtime[ddir];
 
-		bw_bytes = ((1000 * ts->io_bytes[ddir]) / runt); /* Bytes/s */
-		bw = bw_bytes / 1024; /* KiB/s */
-		iops = (1000.0 * (uint64_t) ts->total_io_u[ddir]) / runt;
-	}
+			bw_bytes = ((1000 * ts->io_bytes[ddir]) / runt); /* Bytes/s */
+			bw = bw_bytes / 1024; /* KiB/s */
+			iops = (1000.0 * (uint64_t) ts->total_io_u[ddir]) / runt;
+		}
 
-	json_object_add_value_int(dir_object, "io_bytes", ts->io_bytes[ddir]);
-	json_object_add_value_int(dir_object, "io_kbytes", ts->io_bytes[ddir] >> 10);
-	json_object_add_value_int(dir_object, "bw_bytes", bw_bytes);
-	json_object_add_value_int(dir_object, "bw", bw);
-	json_object_add_value_float(dir_object, "iops", iops);
-	json_object_add_value_int(dir_object, "runtime", ts->runtime[ddir]);
-	json_object_add_value_int(dir_object, "total_ios", ts->total_io_u[ddir]);
-	json_object_add_value_int(dir_object, "short_ios", ts->short_io_u[ddir]);
-	json_object_add_value_int(dir_object, "drop_ios", ts->drop_io_u[ddir]);
+		json_object_add_value_int(dir_object, "io_bytes", ts->io_bytes[ddir]);
+		json_object_add_value_int(dir_object, "io_kbytes", ts->io_bytes[ddir] >> 10);
+		json_object_add_value_int(dir_object, "bw_bytes", bw_bytes);
+		json_object_add_value_int(dir_object, "bw", bw);
+		json_object_add_value_float(dir_object, "iops", iops);
+		json_object_add_value_int(dir_object, "runtime", ts->runtime[ddir]);
+		json_object_add_value_int(dir_object, "total_ios", ts->total_io_u[ddir]);
+		json_object_add_value_int(dir_object, "short_ios", ts->short_io_u[ddir]);
+		json_object_add_value_int(dir_object, "drop_ios", ts->drop_io_u[ddir]);
+
+		if (!calc_lat(&ts->slat_stat[ddir], &min, &max, &mean, &dev)) {
+			min = max = 0;
+			mean = dev = 0.0;
+		}
+		tmp_object = json_create_object();
+		json_object_add_value_object(dir_object, "slat_ns", tmp_object);
+		json_object_add_value_int(tmp_object, "min", min);
+		json_object_add_value_int(tmp_object, "max", max);
+		json_object_add_value_float(tmp_object, "mean", mean);
+		json_object_add_value_float(tmp_object, "stddev", dev);
+
+		if (!calc_lat(&ts->clat_stat[ddir], &min, &max, &mean, &dev)) {
+			min = max = 0;
+			mean = dev = 0.0;
+		}
+		tmp_object = json_create_object();
+		json_object_add_value_object(dir_object, "clat_ns", tmp_object);
+		json_object_add_value_int(tmp_object, "min", min);
+		json_object_add_value_int(tmp_object, "max", max);
+		json_object_add_value_float(tmp_object, "mean", mean);
+		json_object_add_value_float(tmp_object, "stddev", dev);
+	} else {
+		if (!calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
+			min = max = 0;
+			mean = dev = 0.0;
+		}
 
-	if (!calc_lat(&ts->slat_stat[ddir], &min, &max, &mean, &dev)) {
-		min = max = 0;
-		mean = dev = 0.0;
+		tmp_object = json_create_object();
+		json_object_add_value_object(dir_object, "lat_ns", tmp_object);
+		json_object_add_value_int(tmp_object, "min", min);
+		json_object_add_value_int(tmp_object, "max", max);
+		json_object_add_value_float(tmp_object, "mean", mean);
+		json_object_add_value_float(tmp_object, "stddev", dev);
 	}
-	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "slat_ns", tmp_object);
-	json_object_add_value_int(tmp_object, "min", min);
-	json_object_add_value_int(tmp_object, "max", max);
-	json_object_add_value_float(tmp_object, "mean", mean);
-	json_object_add_value_float(tmp_object, "stddev", dev);
-
-	if (!calc_lat(&ts->clat_stat[ddir], &min, &max, &mean, &dev)) {
-		min = max = 0;
-		mean = dev = 0.0;
-	}
-	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "clat_ns", tmp_object);
-	json_object_add_value_int(tmp_object, "min", min);
-	json_object_add_value_int(tmp_object, "max", max);
-	json_object_add_value_float(tmp_object, "mean", mean);
-	json_object_add_value_float(tmp_object, "stddev", dev);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
-		len = calc_clat_percentiles(ts->io_u_plat[ddir],
+		if (ddir_rw(ddir)) {
+			len = calc_clat_percentiles(ts->io_u_plat[ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
 					&minv);
+		} else {
+			len = calc_clat_percentiles(ts->io_u_sync_plat,
+					ts->sync_stat.samples,
+					ts->percentile_list, &ovals, &maxv,
+					&minv);
+		}
+
 		if (len > FIO_IO_U_LIST_MAX_LEN)
 			len = FIO_IO_U_LIST_MAX_LEN;
 	} else
@@ -1048,13 +1085,23 @@ static void add_ddir_status_json(struct thread_stat *ts,
 			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
 
 		for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
-			if (ts->io_u_plat[ddir][i]) {
-				snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
-				json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_plat[ddir][i]);
+			if (ddir_rw(ddir)) {
+				if (ts->io_u_plat[ddir][i]) {
+					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
+					json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_plat[ddir][i]);
+				}
+			} else {
+				if (ts->io_u_sync_plat[i]) {
+					snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
+					json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_sync_plat[i]);
+				}
 			}
 		}
 	}
 
+	if (!ddir_rw(ddir))
+		return;
+
 	if (!calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev)) {
 		min = max = 0;
 		mean = dev = 0.0;
@@ -1233,6 +1280,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	add_ddir_status_json(ts, rs, DDIR_READ, root);
 	add_ddir_status_json(ts, rs, DDIR_WRITE, root);
 	add_ddir_status_json(ts, rs, DDIR_TRIM, root);
+	add_ddir_status_json(ts, rs, DDIR_SYNC, root);
 
 	/* CPU Usage */
 	if (ts->total_run_time) {
@@ -1529,24 +1577,25 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		}
 	}
 
+	sum_stat(&dst->sync_stat, &src->sync_stat, first);
 	dst->usr_time += src->usr_time;
 	dst->sys_time += src->sys_time;
 	dst->ctx += src->ctx;
 	dst->majf += src->majf;
 	dst->minf += src->minf;
 
-	for (k = 0; k < FIO_IO_U_MAP_NR; k++)
+	for (k = 0; k < FIO_IO_U_MAP_NR; k++) {
 		dst->io_u_map[k] += src->io_u_map[k];
-	for (k = 0; k < FIO_IO_U_MAP_NR; k++)
 		dst->io_u_submit[k] += src->io_u_submit[k];
-	for (k = 0; k < FIO_IO_U_MAP_NR; k++)
 		dst->io_u_complete[k] += src->io_u_complete[k];
-	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++)
+	}
+	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++) {
 		dst->io_u_lat_n[k] += src->io_u_lat_n[k];
-	for (k = 0; k < FIO_IO_U_LAT_U_NR; k++)
 		dst->io_u_lat_u[k] += src->io_u_lat_u[k];
-	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++)
 		dst->io_u_lat_m[k] += src->io_u_lat_m[k];
+	}
+	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
 
 	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
 		if (!dst->unified_rw_rep) {
@@ -2231,8 +2280,11 @@ void reset_io_stats(struct thread_data *td)
 		ts->short_io_u[i] = 0;
 		ts->drop_io_u[i] = 0;
 
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
+		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
 			ts->io_u_plat[i][j] = 0;
+			if (!i)
+				ts->io_u_sync_plat[j] = 0;
+		}
 	}
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
@@ -2357,6 +2409,15 @@ void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned int
 	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0);
 }
 
+void add_sync_clat_sample(struct thread_stat *ts, unsigned long long nsec)
+{
+	unsigned int idx = plat_val_to_idx(nsec);
+	assert(idx < FIO_IO_U_PLAT_NR);
+
+	ts->io_u_sync_plat[idx]++;
+	add_stat_sample(&ts->sync_stat, nsec);
+}
+
 static void add_clat_percentile_sample(struct thread_stat *ts,
 				unsigned long long nsec, enum fio_ddir ddir)
 {
diff --git a/stat.h b/stat.h
index ba66c40..e32a21e 100644
--- a/stat.h
+++ b/stat.h
@@ -162,6 +162,7 @@ struct thread_stat {
 	struct io_stat clat_stat[DDIR_RWDIR_CNT]; /* completion latency */
 	struct io_stat slat_stat[DDIR_RWDIR_CNT]; /* submission latency */
 	struct io_stat lat_stat[DDIR_RWDIR_CNT]; /* total latency */
+	struct io_stat sync_stat;		/* fsync etc stats */
 	struct io_stat bw_stat[DDIR_RWDIR_CNT]; /* bandwidth stats */
 	struct io_stat iops_stat[DDIR_RWDIR_CNT]; /* IOPS stats */
 
@@ -188,6 +189,7 @@ struct thread_stat {
 	uint32_t io_u_lat_u[FIO_IO_U_LAT_U_NR];
 	uint32_t io_u_lat_m[FIO_IO_U_LAT_M_NR];
 	uint32_t io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
+	uint32_t io_u_sync_plat[FIO_IO_U_PLAT_NR];
 	uint32_t pad;
 
 	uint64_t total_io_u[DDIR_RWDIR_CNT];
@@ -318,6 +320,8 @@ extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
 				unsigned int, unsigned long long);
+extern void add_sync_clat_sample(struct thread_stat *ts,
+					unsigned long long nsec);
 extern int calc_log_samples(void);
 
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
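
The sync-latency accounting added in this batch boils down to two operations. First, each fsync/fdatasync/sync_file_range completion time is dropped into a histogram bucket and into a running min/max/mean. Second, per-thread results are folded together, which is what `sum_thread_stats()` now does for `sync_stat` and `io_u_sync_plat`.

The sketch below is a simplified, hypothetical model. `struct sketch_stat`, `struct sketch_thread_stat` and the power-of-two bucketing are assumptions for illustration only. They stand in for fio's `struct io_stat` and its finer-grained `plat_val_to_idx()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for fio's struct io_stat and io_u_sync_plat[]. */
#define SKETCH_PLAT_NR 64

struct sketch_stat {
	uint64_t samples;
	uint64_t min_val;
	uint64_t max_val;
	double sum;		/* mean = sum / samples */
};

struct sketch_thread_stat {
	struct sketch_stat sync_stat;
	uint32_t sync_plat[SKETCH_PLAT_NR];
};

/* Assumed bucketing: index of the highest set bit (power-of-two buckets). */
static unsigned int sketch_val_to_idx(uint64_t nsec)
{
	unsigned int idx = 0;

	while (nsec > 1 && idx < SKETCH_PLAT_NR - 1) {
		nsec >>= 1;
		idx++;
	}
	return idx;
}

/* Record one sync completion latency, in the spirit of add_sync_clat_sample(). */
static void sketch_add_sync_sample(struct sketch_thread_stat *ts, uint64_t nsec)
{
	struct sketch_stat *s = &ts->sync_stat;
	unsigned int idx = sketch_val_to_idx(nsec);

	assert(idx < SKETCH_PLAT_NR);
	ts->sync_plat[idx]++;

	if (!s->samples || nsec < s->min_val)
		s->min_val = nsec;
	if (nsec > s->max_val)
		s->max_val = nsec;
	s->sum += (double) nsec;
	s->samples++;
}

/* Fold src into dst, mirroring the new sync lines in sum_thread_stats(). */
static void sketch_sum_sync(struct sketch_thread_stat *dst,
			    const struct sketch_thread_stat *src)
{
	const struct sketch_stat *s = &src->sync_stat;
	struct sketch_stat *d = &dst->sync_stat;
	unsigned int k;

	for (k = 0; k < SKETCH_PLAT_NR; k++)
		dst->sync_plat[k] += src->sync_plat[k];

	if (!s->samples)
		return;
	if (!d->samples || s->min_val < d->min_val)
		d->min_val = s->min_val;
	if (s->max_val > d->max_val)
		d->max_val = s->max_val;
	d->sum += s->sum;
	d->samples += s->samples;
}
```

As an example, take one thread that saw 1000 ns and 3000 ns syncs and another that saw a 2000 ns sync. They merge to three samples with min 1000, max 3000 and mean 2000.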

* Recent changes (master)
@ 2018-01-17 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-17 13:00 UTC
  To: fio

The following changes since commit 53bb5d9c037e842970b32bc828dbe809b42c144d:

  Merge branch 'diskless_invalidate' of https://github.com/sitsofe/fio (2018-01-12 10:59:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca65714c48bcd4fc601e3c04163e2422352be9ca:

  null: drop unneeded casts from void* to non-void* (2018-01-16 08:32:39 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (3):
      null: fix compile time warning on OpenBSD
      null: make *impl_ private
      null: drop unneeded casts from void* to non-void*

 engines/null.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

---

Diff of recent changes:

diff --git a/engines/null.c b/engines/null.c
index 8a4d106..0cfc22a 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -87,9 +87,9 @@ static void null_cleanup(struct null_data *nd)
 	}
 }
 
-static int null_init(struct thread_data *td, struct null_data **nd_ptr)
+static struct null_data *null_init(struct thread_data *td)
 {
-	struct null_data *nd = (struct null_data *) malloc(sizeof(**nd_ptr));
+	struct null_data *nd = (struct null_data *) malloc(sizeof(*nd));
 
 	memset(nd, 0, sizeof(*nd));
 
@@ -99,47 +99,48 @@ static int null_init(struct thread_data *td, struct null_data **nd_ptr)
 	} else
 		td->io_ops->flags |= FIO_SYNCIO;
 
-	*nd_ptr = nd;
-	return 0;
+	return nd;
 }
 
 #ifndef __cplusplus
 
 static struct io_u *fio_null_event(struct thread_data *td, int event)
 {
-	return null_event((struct null_data *)td->io_ops_data, event);
+	return null_event(td->io_ops_data, event);
 }
 
 static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
 			      unsigned int max, const struct timespec *t)
 {
-	struct null_data *nd = (struct null_data *)td->io_ops_data;
+	struct null_data *nd = td->io_ops_data;
 	return null_getevents(nd, min_events, max, t);
 }
 
 static int fio_null_commit(struct thread_data *td)
 {
-	return null_commit(td, (struct null_data *)td->io_ops_data);
+	return null_commit(td, td->io_ops_data);
 }
 
 static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
 {
-	return null_queue(td, (struct null_data *)td->io_ops_data, io_u);
+	return null_queue(td, td->io_ops_data, io_u);
 }
 
 static int fio_null_open(struct thread_data *td, struct fio_file *f)
 {
-	return null_open((struct null_data *)td->io_ops_data, f);
+	return null_open(td->io_ops_data, f);
 }
 
 static void fio_null_cleanup(struct thread_data *td)
 {
-	null_cleanup((struct null_data *)td->io_ops_data);
+	null_cleanup(td->io_ops_data);
 }
 
 static int fio_null_init(struct thread_data *td)
 {
-	return null_init(td, (struct null_data **)&td->io_ops_data);
+	td->io_ops_data = null_init(td);
+	assert(td->io_ops_data);
+	return 0;
 }
 
 static struct ioengine_ops ioengine = {
@@ -172,7 +173,8 @@ static void fio_exit fio_null_unregister(void)
 struct NullData {
 	NullData(struct thread_data *td)
 	{
-		null_init(td, &impl_);
+		impl_ = null_init(td);
+		assert(impl_);
 	}
 
 	~NullData()
@@ -211,6 +213,7 @@ struct NullData {
 		return null_open(impl_, f);
 	}
 
+private:
 	struct null_data *impl_;
 };
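
The `null` engine cleanup relies on two C idioms:

- A constructor returns the new object instead of writing through a `struct null_data **` out-parameter.
- The implicit conversion from `void *` (the type of `td->io_ops_data`) makes the casts unnecessary.

In C++ that conversion is not implicit, which is consistent with the C++ `NullData` wrapper keeping a typed `impl_` pointer.

The sketch below uses hypothetical `sketch_*` types. It swaps the patch's `malloc()` + `memset()` for the equivalent `calloc()`. Unlike the patch, which asserts on the result, it leaves the `NULL` check to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of struct null_data and struct thread_data. */
struct sketch_null_data {
	int max_events;
	int events;
};

struct sketch_td {
	void *io_ops_data;	/* engine-private data, as td->io_ops_data */
};

/* Return the new object; sizeof(*nd) stays right if nd's type changes. */
static struct sketch_null_data *sketch_null_init(int iodepth)
{
	struct sketch_null_data *nd = calloc(1, sizeof(*nd));

	if (nd)
		nd->max_events = iodepth;
	return nd;
}

static int sketch_null_pending(struct sketch_td *td)
{
	/* void * converts implicitly in C, so no cast is needed here. */
	struct sketch_null_data *nd = td->io_ops_data;

	assert(nd != NULL);
	return nd->events;
}
```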
 

* Recent changes (master)
@ 2018-01-13 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-13 13:00 UTC
  To: fio

The following changes since commit 9d5fe300085759c0f764a62be27355fe80b5fd8f:

  io/examples/fio-seq-read: use direct=1 (2018-01-10 08:30:07 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 53bb5d9c037e842970b32bc828dbe809b42c144d:

  Merge branch 'diskless_invalidate' of https://github.com/sitsofe/fio (2018-01-12 10:59:08 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'fio-issue-450' of https://github.com/gvkovai/fio
      Merge branch 'diskless_invalidate' of https://github.com/sitsofe/fio

Robert Elliott (1):
      ioengines: don't call munmap unless size boundary is exceeded

Sitsofe Wheeler (1):
      filesetup: skip fallback invalidation with diskless ioengines

gvkovai (1):
      Fix zoning issue with seq-io and randommap issue

 engines/dev-dax.c |  2 +-
 engines/libpmem.c |  2 +-
 engines/mmap.c    |  2 +-
 filesetup.c       |  4 ++++
 io_u.c            | 56 +++++++++++++++++++++++++++++++++++++++++--------------
 5 files changed, 49 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index b1f91a4..caae1e0 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -157,7 +157,7 @@ static int fio_devdax_prep(struct thread_data *td, struct io_u *io_u)
 	 * It fits within existing mapping, use it
 	 */
 	if (io_u->offset >= fdd->devdax_off &&
-	    io_u->offset + io_u->buflen < fdd->devdax_off + fdd->devdax_sz)
+	    io_u->offset + io_u->buflen <= fdd->devdax_off + fdd->devdax_sz)
 		goto done;
 
 	/*
diff --git a/engines/libpmem.c b/engines/libpmem.c
index 3f4e44f..3038784 100644
--- a/engines/libpmem.c
+++ b/engines/libpmem.c
@@ -430,7 +430,7 @@ static int fio_libpmem_prep(struct thread_data *td, struct io_u *io_u)
 			io_u->buflen, fdd->libpmem_sz);
 
 	if (io_u->offset >= fdd->libpmem_off &&
-	    (io_u->offset + io_u->buflen <
+	    (io_u->offset + io_u->buflen <=
 	     fdd->libpmem_off + fdd->libpmem_sz))
 		goto done;
 
diff --git a/engines/mmap.c b/engines/mmap.c
index 51606e1..7755658 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -137,7 +137,7 @@ static int fio_mmapio_prep(struct thread_data *td, struct io_u *io_u)
 	 * It fits within existing mapping, use it
 	 */
 	if (io_u->offset >= fmd->mmap_off &&
-	    io_u->offset + io_u->buflen < fmd->mmap_off + fmd->mmap_sz)
+	    io_u->offset + io_u->buflen <= fmd->mmap_off + fmd->mmap_sz)
 		goto done;
 
 	/*
diff --git a/filesetup.c b/filesetup.c
index 30af085..3cda606 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -490,6 +490,10 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 		ret = td->io_ops->invalidate(td, f);
 		if (ret < 0)
 			errval = -ret;
+	} else if (td_ioengine_flagged(td, FIO_DISKLESSIO)) {
+		dprint(FD_IO, "invalidate not supported by ioengine %s\n",
+		       td->io_ops->name);
+		ret = 0;
 	} else if (f->filetype == FIO_TYPE_FILE) {
 		dprint(FD_IO, "declare unneeded cache %s: %llu/%llu\n",
 			f->file_name, off, len);
diff --git a/io_u.c b/io_u.c
index 852b98e..1d6872e 100644
--- a/io_u.c
+++ b/io_u.c
@@ -922,6 +922,45 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 	*io_u = NULL;
 }
 
+static void __fill_io_u_zone(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+
+	/*
+	 * See if it's time to switch to a new zone
+	 */
+	if (td->zone_bytes >= td->o.zone_size && td->o.zone_skip) {
+		td->zone_bytes = 0;
+		f->file_offset += td->o.zone_range + td->o.zone_skip;
+
+		/*
+		 * Wrap from the beginning, if we exceed the file size
+		 */
+		if (f->file_offset >= f->real_file_size)
+			f->file_offset = f->real_file_size - f->file_offset;
+		f->last_pos[io_u->ddir] = f->file_offset;
+		td->io_skip_bytes += td->o.zone_skip;
+	}
+
+	/*
+ 	 * If zone_size > zone_range, then maintain the same zone until
+ 	 * zone_bytes >= zone_size.
+ 	 */
+	if (f->last_pos[io_u->ddir] >= (f->file_offset + td->o.zone_range)) {
+		dprint(FD_IO, "io_u maintain zone offset=%" PRIu64 "/last_pos=%" PRIu64 "\n",
+				f->file_offset, f->last_pos[io_u->ddir]);
+		f->last_pos[io_u->ddir] = f->file_offset;
+	}
+
+	/*
+	 * For random: if 'norandommap' is not set and zone_size > zone_range,
+	 * map needs to be reset as it's done with zone_range everytime.
+	 */
+	if ((td->zone_bytes % td->o.zone_range) == 0) {
+		fio_file_reset(td, f);
+	}
+}
+
 static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	unsigned int is_random;
@@ -938,21 +977,10 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 		goto out;
 
 	/*
-	 * See if it's time to switch to a new zone
+	 * When file is zoned zone_range is always positive
 	 */
-	if (td->zone_bytes >= td->o.zone_size && td->o.zone_skip) {
-		struct fio_file *f = io_u->file;
-
-		td->zone_bytes = 0;
-		f->file_offset += td->o.zone_range + td->o.zone_skip;
-
-		/*
-		 * Wrap from the beginning, if we exceed the file size
-		 */
-		if (f->file_offset >= f->real_file_size)
-			f->file_offset = f->real_file_size - f->file_offset;
-		f->last_pos[io_u->ddir] = f->file_offset;
-		td->io_skip_bytes += td->o.zone_skip;
+	if (td->o.zone_range) {
+		__fill_io_u_zone(td, io_u);
 	}
 
 	/*
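
Two of the checks in this batch can be tried in isolation:

- **Mapping check (dev-dax, libpmem, mmap).** Both the I/O and the mapping are half-open ranges. An I/O that ends exactly at the end of the mapping therefore still fits, which is what the switch from `<` to `<=` allows.
- **Zone switch (`__fill_io_u_zone()`).** Once `zone_size` bytes have been issued, `file_offset` moves forward by `zone_range + zone_skip`.

The sketch below models both with hypothetical helpers. Its modulo wrap past the end of the file is an assumption of the sketch, not a copy of the patch's wrap expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [off, off + len) lies within [map_off, map_off + map_sz). */
static bool range_fits(uint64_t off, uint64_t len,
		       uint64_t map_off, uint64_t map_sz)
{
	return off >= map_off && off + len <= map_off + map_sz;
}

/* Hypothetical, reduced per-file zone state. */
struct zone_state {
	uint64_t file_offset;	/* start of the current zone */
	uint64_t zone_bytes;	/* bytes issued in the current zone */
};

/* Switch to the next zone once zone_size bytes were issued (zone_skip != 0). */
static void zone_advance(struct zone_state *z, uint64_t zone_size,
			 uint64_t zone_range, uint64_t zone_skip,
			 uint64_t file_size)
{
	assert(file_size > 0);
	if (z->zone_bytes < zone_size || !zone_skip)
		return;

	z->zone_bytes = 0;
	z->file_offset += zone_range + zone_skip;
	/* Sketch-only wrap policy: fold the offset back into the file. */
	if (z->file_offset >= file_size)
		z->file_offset %= file_size;
}
```

With the old `<` comparison, `range_fits(0, 4096, 0, 4096)` would have returned false. The engine would then have set up a new mapping even though the I/O already fit.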

* Recent changes (master)
@ 2018-01-11 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-11 13:00 UTC
  To: fio

The following changes since commit 702bd977555105292f3d60dee896cd35ff8b11ef:

  stat: don't add duplicate clat entries for json (2018-01-06 14:47:01 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9d5fe300085759c0f764a62be27355fe80b5fd8f:

  io/examples/fio-seq-read: use direct=1 (2018-01-10 08:30:07 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io/examples/fio-seq-read: use direct=1

 examples/fio-seq-read.job | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/examples/fio-seq-read.job b/examples/fio-seq-read.job
index 74b1b30..28de93c 100644
--- a/examples/fio-seq-read.job
+++ b/examples/fio-seq-read.job
@@ -3,7 +3,7 @@ name=fio-seq-reads
 filename=fio-seq-reads
 rw=read
 bs=256K
-direct=0
+direct=1
 numjobs=1
 time_based=1
 runtime=900
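
With `direct=1`, fio uses non-buffered I/O, which on Linux typically means opening the file with `O_DIRECT` and bypassing the page cache. That mode generally requires the buffer address, file offset and transfer length to be multiples of the device's logical block size. This is commonly 512 or 4096 bytes, though the exact rule depends on the kernel and filesystem. The job's `bs=256K` satisfies either value.

The hypothetical helper below only checks those three alignments; it performs no I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check the usual O_DIRECT alignment rules against a block size 'align'. */
static bool direct_io_aligned(const void *buf, uint64_t off, uint64_t len,
			      uint64_t align)
{
	assert(align != 0);
	return ((uintptr_t) buf % align) == 0 &&
	       off % align == 0 &&
	       len % align == 0;
}
```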

* Recent changes (master)
@ 2018-01-07 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-07 13:00 UTC
  To: fio

The following changes since commit 57a61cd0e4c5f131cfe75587d8b995191d87ba57:

  verify: don't adjust verification length based on interval when unaligned (2018-01-05 13:38:40 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 702bd977555105292f3d60dee896cd35ff8b11ef:

  stat: don't add duplicate clat entries for json (2018-01-06 14:47:01 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      stat: don't add duplicate clat entries for json

 stat.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 509bd6d..80f804a 100644
--- a/stat.c
+++ b/stat.c
@@ -1030,16 +1030,14 @@ static void add_ddir_status_json(struct thread_stat *ts,
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
 					&minv);
+		if (len > FIO_IO_U_LIST_MAX_LEN)
+			len = FIO_IO_U_LIST_MAX_LEN;
 	} else
 		len = 0;
 
 	percentile_object = json_create_object();
 	json_object_add_value_object(tmp_object, "percentile", percentile_object);
-	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
-		if (i >= len) {
-			json_object_add_value_int(percentile_object, "0.00", 0);
-			continue;
-		}
+	for (i = 0; i < len; i++) {
 		snprintf(buf, sizeof(buf), "%f", ts->percentile_list[i].u.f);
 		json_object_add_value_int(percentile_object, (const char *)buf, ovals[i]);
 	}
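
The old loop always emitted `FIO_IO_U_LIST_MAX_LEN` entries and padded the tail with `"0.00"` keys, so one JSON object could repeat the same key. The fix clamps `len` and emits exactly `len` entries.

The minimal sketch below shows the clamped emission, writing the pairs into a plain string buffer. `SKETCH_LIST_MAX_LEN` and the output format are assumptions; fio itself builds the object with `json_object_add_value_int()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_LIST_MAX_LEN 20	/* stand-in for FIO_IO_U_LIST_MAX_LEN */

/* Write "pct":value pairs for the first len percentiles, clamped, never padded.
 * Returns the number of pairs written, or -1 if out is too small. */
static int emit_percentiles(char *out, size_t out_sz, const double *plist,
			    const unsigned long long *ovals, unsigned int len)
{
	size_t pos = 0;
	unsigned int i;

	if (!out_sz)
		return -1;
	out[0] = '\0';
	if (len > SKETCH_LIST_MAX_LEN)
		len = SKETCH_LIST_MAX_LEN;

	for (i = 0; i < len; i++) {
		int n = snprintf(out + pos, out_sz - pos, "%s\"%f\":%llu",
				 i ? "," : "", plist[i], ovals[i]);

		if (n < 0 || (size_t) n >= out_sz - pos)
			return -1;
		pos += (size_t) n;
	}
	return (int) len;
}
```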

* Recent changes (master)
@ 2018-01-06 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-06 13:00 UTC
  To: fio

The following changes since commit c619c0fdb28fbe043d7a7f75bba2ea82b4eca298:

  Merge branch 'percentiles' of https://github.com/sitsofe/fio (2018-01-02 09:05:44 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 57a61cd0e4c5f131cfe75587d8b995191d87ba57:

  verify: don't adjust verification length based on interval when unaligned (2018-01-05 13:38:40 -0700)

----------------------------------------------------------------
Jeff Furlong (1):
      Fix client/server "all clients" reporting

Jens Axboe (2):
      Change bluestop link to be https
      verify: don't adjust verification length based on interval when unaligned

 README    | 4 +++-
 client.c  | 1 +
 gclient.c | 1 +
 init.c    | 1 +
 server.c  | 1 +
 stat.c    | 2 ++
 verify.c  | 8 ++++++--
 7 files changed, 15 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index 72ff465..fc28b16 100644
--- a/README
+++ b/README
@@ -120,7 +120,9 @@ Solaris:
 
 Windows:
 	Rebecca Cran <rebecca+fio@bluestop.org> has fio packages for Windows at
-	http://www.bluestop.org/fio/ .
+	https://www.bluestop.org/fio/ . The latest builds for Windows can also
+	be grabbed from https://ci.appveyor.com/project/axboe/fio by clicking
+	the latest x86 or x64 build, then selecting the ARTIFACTS tab.
 
 BSDs:
 	Packages for BSDs may be available from their binary package repositories.
diff --git a/client.c b/client.c
index 18247ef..6fe6d9f 100644
--- a/client.c
+++ b/client.c
@@ -1024,6 +1024,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 	client_ts.thread_number = p->ts.thread_number;
 	client_ts.groupid = p->ts.groupid;
 	client_ts.unified_rw_rep = p->ts.unified_rw_rep;
+	client_ts.sig_figs = p->ts.sig_figs;
 
 	if (++sum_stat_nr == sum_stat_clients) {
 		strcpy(client_ts.name, "All clients");
diff --git a/gclient.c b/gclient.c
index ab7aa10..70dda48 100644
--- a/gclient.c
+++ b/gclient.c
@@ -298,6 +298,7 @@ static void gfio_thread_status_op(struct fio_client *client,
 	client_ts.members++;
 	client_ts.thread_number = p->ts.thread_number;
 	client_ts.groupid = p->ts.groupid;
+	client_ts.sig_figs = p->ts.sig_figs;
 
 	if (++sum_stat_nr == sum_stat_clients) {
 		strcpy(client_ts.name, "All clients");
diff --git a/init.c b/init.c
index decd3b4..8a80138 100644
--- a/init.c
+++ b/init.c
@@ -1472,6 +1472,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->ts.lat_percentiles = o->lat_percentiles;
 	td->ts.percentile_precision = o->percentile_precision;
 	memcpy(td->ts.percentile_list, o->percentile_list, sizeof(o->percentile_list));
+	td->ts.sig_figs = o->sig_figs;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		td->ts.clat_stat[i].min_val = ULONG_MAX;
diff --git a/server.c b/server.c
index 54d703d..ce9dca3 100644
--- a/server.c
+++ b/server.c
@@ -1443,6 +1443,7 @@ static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
 	dst->unit_base	= cpu_to_le32(src->unit_base);
 	dst->groupid	= cpu_to_le32(src->groupid);
 	dst->unified_rw_rep	= cpu_to_le32(src->unified_rw_rep);
+	dst->sig_figs	= cpu_to_le32(src->sig_figs);
 }
 
 /*
diff --git a/stat.c b/stat.c
index cc171a4..509bd6d 100644
--- a/stat.c
+++ b/stat.c
@@ -1490,6 +1490,8 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->kb_base = src->kb_base;
 	if (!dst->unit_base)
 		dst->unit_base = src->unit_base;
+	if (!dst->sig_figs)
+		dst->sig_figs = src->sig_figs;
 }
 
 void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
diff --git a/verify.c b/verify.c
index 2faeaad..b178450 100644
--- a/verify.c
+++ b/verify.c
@@ -87,8 +87,13 @@ static unsigned int get_hdr_inc(struct thread_data *td, struct io_u *io_u)
 {
 	unsigned int hdr_inc;
 
+	/*
+	 * If we use bs_unaligned, buflen can be larger than the verify
+	 * interval (which just defaults to the smallest blocksize possible).
+	 */
 	hdr_inc = io_u->buflen;
-	if (td->o.verify_interval && td->o.verify_interval <= io_u->buflen)
+	if (td->o.verify_interval && td->o.verify_interval <= io_u->buflen &&
+	    !td->o.bs_unaligned)
 		hdr_inc = td->o.verify_interval;
 
 	return hdr_inc;
@@ -1175,7 +1180,6 @@ static void fill_hdr(struct thread_data *td, struct io_u *io_u,
 		     struct verify_header *hdr, unsigned int header_num,
 		     unsigned int header_len, uint64_t rand_seed)
 {
-
 	if (td->o.verify != VERIFY_PATTERN_NO_HDR)
 		__fill_hdr(td, io_u, hdr, header_num, header_len, rand_seed);
 }
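
`get_hdr_inc()` decides how far apart verify headers are placed inside one I/O buffer. Normally they go every `verify_interval` bytes. When `bs_unaligned` is set, the spacing is the whole buffer instead, because `buflen` need not be a multiple of the interval.

The sketch restates that rule as a standalone function with plain parameters. The name `sketch_hdr_inc` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Spacing between verify headers within one buffer (mirrors get_hdr_inc()). */
static unsigned int sketch_hdr_inc(unsigned int buflen,
				   unsigned int verify_interval,
				   bool bs_unaligned)
{
	unsigned int hdr_inc = buflen;

	/* With unaligned block sizes, buflen need not be a multiple of the
	 * interval, so fall back to one header spacing per buffer. */
	if (verify_interval && verify_interval <= buflen && !bs_unaligned)
		hdr_inc = verify_interval;

	return hdr_inc;
}
```

An 8 KiB buffer with a 4 KiB interval gets a spacing of 4 KiB. A 1536-byte buffer under `bs_unaligned` gets a spacing of 1536 bytes, even with a 1 KiB interval.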

* Recent changes (master)
@ 2018-01-03 13:00 Jens Axboe
From: Jens Axboe @ 2018-01-03 13:00 UTC
  To: fio

The following changes since commit df4bf1178ed773986129da6038961388af926971:

  log: fix bad < 0 check for unsigned (2017-12-29 08:45:22 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c619c0fdb28fbe043d7a7f75bba2ea82b4eca298:

  Merge branch 'percentiles' of https://github.com/sitsofe/fio (2018-01-02 09:05:44 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'percentiles' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      stat: make lat_percentiles=1 use sample count from lat_stat
      init: disable percentiles when latency gathering is disabled

 init.c | 5 +++++
 stat.c | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index f7d79c1..decd3b4 100644
--- a/init.c
+++ b/init.c
@@ -938,6 +938,11 @@ static int fixup_options(struct thread_data *td)
 		ret = 1;
 	}
 
+	if (o->disable_lat)
+		o->lat_percentiles = 0;
+	if (o->disable_clat)
+		o->clat_percentiles = 0;
+
 	/*
 	 * Fix these up to be nsec internally
 	 */
diff --git a/stat.c b/stat.c
index 863aa45..cc171a4 100644
--- a/stat.c
+++ b/stat.c
@@ -460,8 +460,15 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat(" lat", min, max, mean, dev, out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
+		uint64_t samples;
+
+		if (ts->clat_percentiles)
+			samples = ts->clat_stat[ddir].samples;
+		else
+			samples = ts->lat_stat[ddir].samples;
+
 		show_clat_percentiles(ts->io_u_plat[ddir],
-					ts->clat_stat[ddir].samples,
+					samples,
 					ts->percentile_list,
 					ts->percentile_precision,
 					ts->clat_percentiles, out);
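
The two percentile changes work together:

- `fixup_options()` now turns off a percentile mode whose latency class is not being collected.
- `show_ddir_status()` takes its sample count from `clat_stat` when `clat_percentiles` is set, and from `lat_stat` otherwise.

A minimal sketch of that decision follows. It uses a hypothetical `struct sketch_opts` in place of fio's option and stat structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of fio's job options. */
struct sketch_opts {
	bool disable_lat, disable_clat;
	bool lat_percentiles, clat_percentiles;
};

/* No percentiles for a latency class that is not collected. */
static void sketch_fixup(struct sketch_opts *o)
{
	if (o->disable_lat)
		o->lat_percentiles = false;
	if (o->disable_clat)
		o->clat_percentiles = false;
}

/* Pick the sample count that matches the histogram being reported. */
static uint64_t sketch_percentile_samples(const struct sketch_opts *o,
					  uint64_t clat_samples,
					  uint64_t lat_samples)
{
	return o->clat_percentiles ? clat_samples : lat_samples;
}
```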

* Recent changes (master)
@ 2017-12-30 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-30 13:00 UTC
  To: fio

The following changes since commit e38feac48ffb7a45c847cabd31ab5eab7fe05a4e:

  Merge branch 'master' of https://github.com/yashi/fio (2017-12-28 08:35:11 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to df4bf1178ed773986129da6038961388af926971:

  log: fix bad < 0 check for unsigned (2017-12-29 08:45:22 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      log: fix bad < 0 check for unsigned

Robert Elliott (1):
      debug: make debug=io readable with multiple threads

Tomohiro Kusumi (1):
      lib/memcpy: fix warning on FreeBSD

 debug.c     | 11 +----------
 fio_time.h  |  1 +
 io_u.c      |  9 ++++-----
 io_u.h      | 15 ++++++++++-----
 ioengines.c |  3 ++-
 log.c       | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 log.h       |  1 +
 7 files changed, 70 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/debug.c b/debug.c
index 013cd53..2bee507 100644
--- a/debug.c
+++ b/debug.c
@@ -7,20 +7,11 @@
 void __dprint(int type, const char *str, ...)
 {
 	va_list args;
-	pid_t pid;
 
 	assert(type < FD_DEBUG_MAX);
 
-	pid = getpid();
-	if (fio_debug_jobp && *fio_debug_jobp != -1U
-	    && pid != *fio_debug_jobp)
-		return;
-
-	log_info("%-8s ", debug_levels[type].name);
-	log_info("%-5u ", (int) pid);
-
 	va_start(args, str);
-	log_valist(str, args);
+	log_prevalist(type, str, args);
 	va_end(args);
 }
 #endif
diff --git a/fio_time.h b/fio_time.h
index ee8087e..8b4bb25 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -2,6 +2,7 @@
 #define FIO_TIME_H
 
 #include <time.h>
+#include <sys/time.h>
 #include "lib/types.h"
 
 struct thread_data;
diff --git a/io_u.c b/io_u.c
index 42d98eb..852b98e 100644
--- a/io_u.c
+++ b/io_u.c
@@ -971,9 +971,8 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
-		dprint(FD_IO, "io_u %p, offset + buflen exceeds file size\n",
-			io_u);
-		dprint(FD_IO, "  offset=%llu/buflen=%lu > %llu\n",
+		dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%lx exceeds file size=0x%llx\n",
+			io_u,
 			(unsigned long long) io_u->offset, io_u->buflen,
 			(unsigned long long) io_u->file->real_file_size);
 		return 1;
@@ -986,7 +985,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 		mark_random_map(td, io_u);
 
 out:
-	dprint_io_u(io_u, "fill_io_u");
+	dprint_io_u(io_u, "fill");
 	td->zone_bytes += io_u->buflen;
 	return 0;
 }
@@ -1939,7 +1938,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	enum fio_ddir ddir = io_u->ddir;
 	struct fio_file *f = io_u->file;
 
-	dprint_io_u(io_u, "io complete");
+	dprint_io_u(io_u, "complete");
 
 	assert(io_u->flags & IO_U_F_FLIGHT);
 	io_u_clear(td, io_u, IO_U_F_FLIGHT | IO_U_F_BUSY_OK);
diff --git a/io_u.h b/io_u.h
index b228e2e..da25efb 100644
--- a/io_u.h
+++ b/io_u.h
@@ -152,12 +152,17 @@ static inline void dprint_io_u(struct io_u *io_u, const char *p)
 {
 	struct fio_file *f = io_u->file;
 
-	dprint(FD_IO, "%s: io_u %p: off=%llu/len=%lu/ddir=%d", p, io_u,
-					(unsigned long long) io_u->offset,
-					io_u->buflen, io_u->ddir);
 	if (f)
-		dprint(FD_IO, "/%s", f->file_name);
-	dprint(FD_IO, "\n");
+		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d,file=%s\n",
+				p, io_u,
+				(unsigned long long) io_u->offset,
+				io_u->buflen, io_u->ddir,
+				f->file_name);
+	else
+		dprint(FD_IO, "%s: io_u %p: off=0x%llx,len=0x%lx,ddir=%d\n",
+				p, io_u,
+				(unsigned long long) io_u->offset,
+				io_u->buflen, io_u->ddir);
 }
 #else
 #define dprint_io_u(io_u, p)
diff --git a/ioengines.c b/ioengines.c
index cec0c76..fb475e9 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -224,7 +224,8 @@ int td_io_prep(struct thread_data *td, struct io_u *io_u)
 	if (td->io_ops->prep) {
 		int ret = td->io_ops->prep(td, io_u);
 
-		dprint(FD_IO, "->prep(%p)=%d\n", io_u, ret);
+		dprint(FD_IO, "prep: io_u %p: ret=%d\n", io_u, ret);
+
 		if (ret)
 			unlock_file(td, io_u->file);
 		return ret;
diff --git a/log.c b/log.c
index 95351d5..a327f6a 100644
--- a/log.c
+++ b/log.c
@@ -36,6 +36,8 @@ static size_t valist_to_buf(char **buffer, const char *fmt, va_list src_args)
 
 	do {
 		*buffer = calloc(1, cur);
+		if (!*buffer)
+			return 0;
 
 		va_copy(args, src_args);
 		len = vsnprintf(*buffer, cur, fmt, args);
@@ -51,6 +53,33 @@ static size_t valist_to_buf(char **buffer, const char *fmt, va_list src_args)
 	return len;
 }
 
+/* allocate buffer, fill with prefix string followed by vararg string */
+static size_t prevalist_to_buf(char **buffer, const char *pre, int prelen,
+		const char *fmt, va_list src_args)
+{
+	size_t len, cur = LOG_START_SZ;
+	va_list args;
+
+	do {
+		*buffer = calloc(1, cur);
+		if (!*buffer)
+			return 0;
+
+		va_copy(args, src_args);
+		memcpy(*buffer, pre, prelen);
+		len = prelen + vsnprintf(*buffer + prelen, cur - prelen, fmt, args);
+		va_end(args);
+
+		if (len < cur)
+			break;
+
+		cur = len + 1;
+		free(*buffer);
+	} while (1);
+
+	return len;
+}
+
 size_t log_valist(const char *fmt, va_list args)
 {
 	char *buffer;
@@ -63,6 +92,28 @@ size_t log_valist(const char *fmt, va_list args)
 	return len;
 }
 
+/* add prefix for the specified type in front of the valist */
+void log_prevalist(int type, const char *fmt, va_list args)
+{
+	char pre[32];
+	char *buffer;
+	size_t len;
+	int prelen;
+	pid_t pid;
+
+	pid = gettid();
+	if (fio_debug_jobp && *fio_debug_jobp != -1U
+	    && pid != *fio_debug_jobp)
+		return;
+
+	prelen = snprintf(pre, sizeof pre, "%-8s %-5u ", debug_levels[type].name, (int) pid);
+	if (prelen > 0) {
+		len = prevalist_to_buf(&buffer, pre, prelen, fmt, args);
+		len = log_info_buf(buffer, len);
+		free(buffer);
+	}
+}
+
 size_t log_info(const char *format, ...)
 {
 	va_list args;
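The old `__dprint()` emitted each debug line through three separate calls: `log_info()` for the name, `log_info()` for the pid, then `log_valist()`. Lines from concurrent threads could therefore interleave. `prevalist_to_buf()` instead builds the prefix and the message in one heap buffer, growing it and retrying when `vsnprintf()` reports truncation.

Below is a standalone sketch of that grow-and-retry pattern. `format_with_prefix()`, `vformat_with_prefix()` and `START_SZ` are hypothetical names. Unlike the hunk, the sketch also treats a negative `vsnprintf()` return as an error.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical start size, kept small so the retry path is exercised. */
#define START_SZ 16

/*
 * Allocate *buffer and fill it with `pre` followed by the formatted
 * message. Returns the length excluding the NUL, or 0 with *buffer set
 * to NULL on allocation or formatting failure.
 */
static size_t vformat_with_prefix(char **buffer, const char *pre,
				  const char *fmt, va_list src_args)
{
	size_t prelen, cur;

	assert(pre && fmt);
	prelen = strlen(pre);
	cur = START_SZ > prelen ? START_SZ : prelen + 1;

	for (;;) {
		va_list args;
		int n;

		*buffer = calloc(1, cur);
		if (!*buffer)
			return 0;

		memcpy(*buffer, pre, prelen);

		/* Work on a copy: src_args may need to be walked again. */
		va_copy(args, src_args);
		n = vsnprintf(*buffer + prelen, cur - prelen, fmt, args);
		va_end(args);

		if (n < 0) {
			free(*buffer);
			*buffer = NULL;
			return 0;
		}
		if (prelen + (size_t) n < cur)
			return prelen + (size_t) n;

		/* Truncated: n is the length the message needs, so the next pass fits. */
		cur = prelen + (size_t) n + 1;
		free(*buffer);
	}
}

static size_t format_with_prefix(char **buffer, const char *pre,
				 const char *fmt, ...)
{
	va_list args;
	size_t len;

	va_start(args, fmt);
	len = vformat_with_prefix(buffer, pre, fmt, args);
	va_end(args);
	return len;
}
```

The finished buffer can then go to the log in a single call, which is the property the commit relies on.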
diff --git a/log.h b/log.h
index 66546c4..8163f97 100644
--- a/log.h
+++ b/log.h
@@ -13,6 +13,7 @@ extern size_t log_err(const char *format, ...) __attribute__ ((__format__ (__pri
 extern size_t log_info(const char *format, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
 extern size_t __log_buf(struct buf_output *, const char *format, ...) __attribute__ ((__format__ (__printf__, 2, 3)));
 extern size_t log_valist(const char *str, va_list);
+extern void log_prevalist(int type, const char *str, va_list);
 extern size_t log_info_buf(const char *buf, size_t len);
 extern int log_info_flush(void);
 

* Recent changes (master)
@ 2017-12-29 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-29 13:00 UTC
  To: fio

The following changes since commit f7c305464667b118b62aff9b846d1a939fbc1547:

  Merge branch 'eta_display' of https://github.com/sitsofe/fio (2017-12-27 14:05:46 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e38feac48ffb7a45c847cabd31ab5eab7fe05a4e:

  Merge branch 'master' of https://github.com/yashi/fio (2017-12-28 08:35:11 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/yashi/fio

Yasushi SHOJI (1):
      mutex: down_timeout: check against the base time

 mutex.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/mutex.c b/mutex.c
index 9fab715..63229ed 100644
--- a/mutex.c
+++ b/mutex.c
@@ -156,14 +156,15 @@ static bool mutex_timed_out(struct timespec *t, unsigned int msecs)
 int fio_mutex_down_timeout(struct fio_mutex *mutex, unsigned int msecs)
 {
 	struct timeval tv_s;
+	struct timespec base;
 	struct timespec t;
 	int ret = 0;
 
 	assert(mutex->magic == FIO_MUTEX_MAGIC);
 
 	gettimeofday(&tv_s, NULL);
-	t.tv_sec = tv_s.tv_sec;
-	t.tv_nsec = tv_s.tv_usec * 1000;
+	base.tv_sec = t.tv_sec = tv_s.tv_sec;
+	base.tv_nsec = t.tv_nsec = tv_s.tv_usec * 1000;
 
 	t.tv_sec += msecs / 1000;
 	t.tv_nsec += ((msecs * 1000000ULL) % 1000000000);
@@ -181,7 +182,7 @@ int fio_mutex_down_timeout(struct fio_mutex *mutex, unsigned int msecs)
 		 * way too early, double check.
 		 */
 		ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
-		if (ret == ETIMEDOUT && !mutex_timed_out(&t, msecs))
+		if (ret == ETIMEDOUT && !mutex_timed_out(&base, msecs))
 			ret = 0;
 	}
 	mutex->waiters--;
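This sketch assumes `mutex_timed_out()` compares the time elapsed since its first argument against `msecs`; its body is not shown in the hunk. Under that assumption, passing the deadline `t` measures from the moment the wait expired. When `ETIMEDOUT` arrives, that elapsed time is close to 0, so a genuine timeout could be reset to `ret = 0`. Measuring from `base`, the time the wait started, avoids this.

Below is a minimal sketch of the deadline arithmetic and the corrected check. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000LL
#define NSEC_PER_MSEC	1000000LL

/* Absolute deadline base + msecs, keeping tv_nsec in [0, NSEC_PER_SEC). */
static struct timespec deadline_from(const struct timespec *base,
				     unsigned int msecs)
{
	struct timespec t = *base;

	assert(base->tv_nsec >= 0 && base->tv_nsec < NSEC_PER_SEC);
	t.tv_sec += msecs / 1000;
	t.tv_nsec += (long) ((msecs % 1000) * NSEC_PER_MSEC);
	if (t.tv_nsec >= NSEC_PER_SEC) {
		t.tv_sec++;
		t.tv_nsec -= NSEC_PER_SEC;
	}
	return t;
}

/* Whole milliseconds from start to now. */
static int64_t msec_between(const struct timespec *start,
			    const struct timespec *now)
{
	int64_t ns = (int64_t) (now->tv_sec - start->tv_sec) * NSEC_PER_SEC +
		     (now->tv_nsec - start->tv_nsec);

	return ns / NSEC_PER_MSEC;
}

/* A wait for msecs has really expired once that long has passed since base. */
static bool timed_out_since(const struct timespec *base,
			    const struct timespec *now, unsigned int msecs)
{
	return msec_between(base, now) >= (int64_t) msecs;
}
```

At the deadline itself, `timed_out_since(&base, &t, msecs)` is true. Measuring from the deadline, `timed_out_since(&t, &t, msecs)` is false.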

* Recent changes (master)
@ 2017-12-28 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-28 13:00 UTC
  To: fio

The following changes since commit 2fc703f639f1fac7d1f86917ba8bf4d0e81667b9:

  Merge branch 'eta_overflow' of https://github.com/sitsofe/fio (2017-12-21 08:22:39 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f7c305464667b118b62aff9b846d1a939fbc1547:

  Merge branch 'eta_display' of https://github.com/sitsofe/fio (2017-12-27 14:05:46 -0700)

----------------------------------------------------------------
Barak Pinhas (1):
      fix verify_only when using ioengine=mmap

Jens Axboe (2):
      Merge branch 'barak/mmap_verify_only' of https://github.com/barakp/fio
      Merge branch 'eta_display' of https://github.com/sitsofe/fio

Sitsofe Wheeler (5):
      eta: adjust truncation case
      eta: fix run_str_condensed overflow with maximum jobs
      eta: skip clearing of remainder of line when starting a new line
      eta: fix previous line length calculation
      eta: show complete status line with max job states

 engines/mmap.c |  4 ++--
 eta.c          | 20 ++++++++++++--------
 2 files changed, 14 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/engines/mmap.c b/engines/mmap.c
index bc038f4..51606e1 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -33,9 +33,9 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 	struct fio_mmap_data *fmd = FILE_ENG_DATA(f);
 	int flags = 0;
 
-	if (td_rw(td))
+	if (td_rw(td) && !td->o.verify_only)
 		flags = PROT_READ | PROT_WRITE;
-	else if (td_write(td)) {
+	else if (td_write(td) && !td->o.verify_only) {
 		flags = PROT_WRITE;
 
 		if (td->o.verify != VERIFY_NONE)
diff --git a/eta.c b/eta.c
index 087f57d..0b79526 100644
--- a/eta.c
+++ b/eta.c
@@ -9,7 +9,7 @@
 #include "lib/pow2.h"
 
 static char __run_str[REAL_MAX_JOBS + 1];
-static char run_str[__THREAD_RUNSTR_SZ(REAL_MAX_JOBS)];
+static char run_str[__THREAD_RUNSTR_SZ(REAL_MAX_JOBS) + 1];
 
 static void update_condensed_str(char *rstr, char *run_str_condensed)
 {
@@ -520,7 +520,7 @@ void display_thread_status(struct jobs_eta *je)
 	static int eta_new_line_init, eta_new_line_pending;
 	static int linelen_last;
 	static int eta_good;
-	char output[REAL_MAX_JOBS + 512], *p = output;
+	char output[__THREAD_RUNSTR_SZ(REAL_MAX_JOBS) + 512], *p = output;
 	char eta_str[128];
 	double perc = 0.0;
 
@@ -531,6 +531,7 @@ void display_thread_status(struct jobs_eta *je)
 
 	if (eta_new_line_pending) {
 		eta_new_line_pending = 0;
+		linelen_last = 0;
 		p += sprintf(p, "\n");
 	}
 
@@ -564,6 +565,7 @@ void display_thread_status(struct jobs_eta *je)
 		size_t left;
 		int l;
 		int ddir;
+		int linelen;
 
 		if ((!je->eta_sec && !eta_good) || je->nr_ramp == je->nr_running ||
 		    je->eta_sec == -1)
@@ -585,7 +587,7 @@ void display_thread_status(struct jobs_eta *je)
 			iops_str[ddir] = num2str(je->iops[ddir], 4, 1, 0, N2S_NONE);
 		}
 
-		left = sizeof(output) - (p - output) - 2;
+		left = sizeof(output) - (p - output) - 1;
 
 		if (je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM])
 			l = snprintf(p, left,
@@ -601,12 +603,14 @@ void display_thread_status(struct jobs_eta *je)
 				rate_str[DDIR_READ], rate_str[DDIR_WRITE],
 				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
 				eta_str);
-		if (l > left)
-			l = left;
+		/* If truncation occurred adjust l so p is on the null */
+		if (l >= left)
+			l = left - 1;
 		p += l;
-		if (l >= 0 && l < linelen_last)
-			p += sprintf(p, "%*s", linelen_last - l, "");
-		linelen_last = l;
+		linelen = p - output;
+		if (l >= 0 && linelen < linelen_last)
+			p += sprintf(p, "%*s", linelen_last - linelen, "");
+		linelen_last = linelen;
 
 		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			free(rate_str[ddir]);
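`snprintf()` returns the length the complete string would have had, not the number of bytes it stored. After truncation, `p + l` would therefore point past the written text. Clamping to `left - 1` leaves `p` on the terminating NUL.

Below is a standalone sketch of the same clamp, using a hypothetical `append_clamped()` helper. It checks for a negative return explicitly rather than relying on the signed-to-unsigned conversion in `l >= left`, where `left` is a `size_t`.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append s at p without writing more than `left` bytes (NUL included)
 * and return the new end of the string, which always points at a NUL.
 */
static char *append_clamped(char *p, size_t left, const char *s)
{
	int l;

	assert(left > 0);
	l = snprintf(p, left, "%s", s);
	if (l < 0) {
		*p = '\0';	/* encoding error: treat as nothing written */
		return p;
	}
	/* On truncation snprintf() stored left - 1 chars plus the NUL. */
	if ((size_t) l >= left)
		l = (int) (left - 1);
	return p + l;
}
```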

* Recent changes (master)
@ 2017-12-22 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-22 13:00 UTC
  To: fio

The following changes since commit e3ccbdd5f93d33162a93000586461ac6bba5a7d3:

  Fio 3.3 (2017-12-19 13:16:36 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2fc703f639f1fac7d1f86917ba8bf4d0e81667b9:

  Merge branch 'eta_overflow' of https://github.com/sitsofe/fio (2017-12-21 08:22:39 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'eta_overflow' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      eta: fix buffer overflow in ETA output

 eta.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/eta.c b/eta.c
index 8b77daf..087f57d 100644
--- a/eta.c
+++ b/eta.c
@@ -585,7 +585,7 @@ void display_thread_status(struct jobs_eta *je)
 			iops_str[ddir] = num2str(je->iops[ddir], 4, 1, 0, N2S_NONE);
 		}
 
-		left = sizeof(output) - (p - output) - 1;
+		left = sizeof(output) - (p - output) - 2;
 
 		if (je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM])
 			l = snprintf(p, left,
@@ -601,6 +601,8 @@ void display_thread_status(struct jobs_eta *je)
 				rate_str[DDIR_READ], rate_str[DDIR_WRITE],
 				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
 				eta_str);
+		if (l > left)
+			l = left;
 		p += l;
 		if (l >= 0 && l < linelen_last)
 			p += sprintf(p, "%*s", linelen_last - l, "");

* Recent changes (master)
@ 2017-12-20 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-20 13:00 UTC
  To: fio

The following changes since commit 9b50942ecec9c79fa82050c503fe313cfd87ac96:

  ioengines: clear out ->td_ops_dlhandle if we close it (2017-12-15 13:35:56 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e3ccbdd5f93d33162a93000586461ac6bba5a7d3:

  Fio 3.3 (2017-12-19 13:16:36 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      backend: tweaks to missed rate thinktime
      Fio 3.3

 FIO-VERSION-GEN        | 2 +-
 backend.c              | 8 +++++---
 os/windows/install.wxs | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 22f4404..5aed535 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.2
+DEF_VER=fio-3.3
 
 LF='
 '
diff --git a/backend.c b/backend.c
index e248117..b4a09ac 100644
--- a/backend.c
+++ b/backend.c
@@ -899,12 +899,14 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
 	 */
 	if (total && td->rate_bps[ddir] && td->o.rate_ign_think) {
 		uint64_t missed = (td->rate_bps[ddir] * total) / 1000000ULL;
+		uint64_t bs = td->o.min_bs[ddir];
+		uint64_t usperop = bs * 1000000ULL / td->rate_bps[ddir];
 		uint64_t over;
 
-		if (total >= 1000000)
-			over = td->o.min_bs[ddir];
+		if (usperop <= total)
+			over = bs;
 		else
-			over = (td->o.min_bs[ddir] * total) / 1000000ULL;
+			over = (usperop - total) / usperop * -bs;
 
 		td->rate_io_issue_bytes[ddir] += (missed - over);
 	}
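The new `else` branch is easy to misread. With unsigned 64-bit operands and `total > 0`, `usperop - total` is smaller than `usperop` whenever that branch is taken. The division therefore yields 0, and `over` ends up 0 regardless of `-bs`.

Below is a worked sketch of the arithmetic as written; it is not a corrected version. `thinktime_credit()` is a hypothetical wrapper around the expressions in the hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The byte credit computed in the handle_thinktime() hunk above, pulled
 * out as a pure function. rate_bps is the target rate in bytes/sec,
 * total the think time in usec, bs the minimum block size in bytes.
 */
static uint64_t thinktime_credit(uint64_t rate_bps, uint64_t total, uint64_t bs)
{
	uint64_t missed, usperop, over;

	assert(rate_bps && total);
	missed = (rate_bps * total) / 1000000ULL;
	usperop = bs * 1000000ULL / rate_bps;	/* usec per block at rate_bps */

	if (usperop <= total)
		over = bs;
	else
		/* Here 0 < usperop - total < usperop, so the quotient is 0. */
		over = (usperop - total) / usperop * -bs;

	return missed - over;
}
```

Take a rate of 1048576 bytes/sec with 4096-byte blocks, so `usperop` is 3906 usec. A think time of 10000 usec credits 10485 - 4096 = 6389 bytes. A think time of 1000 usec credits the full 1048 bytes.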
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 6dfe231..905addc 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.2">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.3">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

* Recent changes (master)
@ 2017-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-16 13:00 UTC
  To: fio

The following changes since commit db37d89074ed204c9c2bd010e72f63dcf4725715:

  Allow configurable ETA intervals (2017-12-14 11:51:41 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9b50942ecec9c79fa82050c503fe313cfd87ac96:

  ioengines: clear out ->td_ops_dlhandle if we close it (2017-12-15 13:35:56 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      parse: dump option type when using --debug=parse
      ioengines: improve "is this the same IO engine" check
      parse: don't check for < 0 on an unsigned type
      init: fix missing dlhandle reference put
      ioengines: clear out ->td_ops_dlhandle if we close it

 compiler/compiler.h |  5 +++++
 fio.h               |  5 -----
 init.c              | 20 ++++++++++++++++++++
 ioengines.c         |  4 +++-
 parse.c             | 32 ++++++++++++++++++++++++++++++--
 parse.h             |  2 +-
 6 files changed, 59 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index 20df21d..91a9883 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -69,4 +69,9 @@
 
 #endif
 
+#ifdef FIO_INTERNAL
+#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
+#define FIELD_SIZE(s, f) (sizeof(((typeof(s))0)->f))
+#endif
+
 #endif
diff --git a/fio.h b/fio.h
index b3b95ef..334f203 100644
--- a/fio.h
+++ b/fio.h
@@ -800,11 +800,6 @@ static inline void td_flags_set(struct thread_data *td, unsigned int *flags,
 extern const char *fio_get_arch_string(int);
 extern const char *fio_get_os_string(int);
 
-#ifdef FIO_INTERNAL
-#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
-#define FIELD_SIZE(s, f) (sizeof(((typeof(s))0)->f))
-#endif
-
 enum {
 	__FIO_OUTPUT_TERSE	= 0,
 	__FIO_OUTPUT_JSON	= 1,
diff --git a/init.c b/init.c
index b9da713..f7d79c1 100644
--- a/init.c
+++ b/init.c
@@ -11,6 +11,7 @@
 #include <sys/ipc.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <dlfcn.h>
 
 #include "fio.h"
 #ifndef FIO_NO_HAVE_SHM_H
@@ -1064,6 +1065,9 @@ int ioengine_load(struct thread_data *td)
 	}
 
 	if (td->io_ops) {
+		struct ioengine_ops *ops;
+		void *dlhandle;
+
 		/* An engine is loaded, but the requested ioengine
 		 * may have changed.
 		 */
@@ -1072,6 +1076,22 @@ int ioengine_load(struct thread_data *td)
 			return 0;
 		}
 
+		/*
+		 * Name of file and engine may be different, load ops
+		 * for this name and see if they match. If they do, then
+		 * the engine is unchanged.
+		 */
+		dlhandle = td->io_ops_dlhandle;
+		ops = load_ioengine(td);
+		if (ops == td->io_ops && dlhandle == td->io_ops_dlhandle) {
+			if (dlhandle)
+				dlclose(dlhandle);
+			return 0;
+		}
+
+		if (dlhandle && dlhandle != td->io_ops_dlhandle)
+			dlclose(dlhandle);
+
 		/* Unload the old engine. */
 		free_ioengine(td);
 	}
diff --git a/ioengines.c b/ioengines.c
index 7951ff3..cec0c76 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -194,8 +194,10 @@ void free_ioengine(struct thread_data *td)
 		td->eo = NULL;
 	}
 
-	if (td->io_ops_dlhandle)
+	if (td->io_ops_dlhandle) {
 		dlclose(td->io_ops_dlhandle);
+		td->io_ops_dlhandle = NULL;
+	}
 
 	td->io_ops = NULL;
 }
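Clearing `td->io_ops_dlhandle` after `dlclose()` makes a repeated `free_ioengine()` call harmless, instead of closing the same handle twice.

Below is the pattern in isolation. `free()` stands in for `dlclose()`, and `struct engine_state` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the engine-handle field in struct thread_data. */
struct engine_state {
	void *handle;
	int closes;	/* counts real releases, for illustration */
};

/*
 * Release the handle and clear the pointer, so that a second call is a
 * no-op rather than a double release. free() stands in for dlclose().
 */
static void release_handle(struct engine_state *st)
{
	assert(st != NULL);
	if (st->handle) {
		free(st->handle);
		st->handle = NULL;
		st->closes++;
	}
}
```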
diff --git a/parse.c b/parse.c
index 68229d0..a9ee1ce 100644
--- a/parse.c
+++ b/parse.c
@@ -12,6 +12,7 @@
 #include <math.h>
 #include <float.h>
 
+#include "compiler/compiler.h"
 #include "parse.h"
 #include "debug.h"
 #include "options.h"
@@ -24,6 +25,22 @@
 #include "y.tab.h"
 #endif
 
+static const char *opt_type_names[] = {
+	"OPT_INVALID",
+	"OPT_STR",
+	"OPT_STR_MULTI",
+	"OPT_STR_VAL",
+	"OPT_STR_VAL_TIME",
+	"OPT_STR_STORE",
+	"OPT_RANGE",
+	"OPT_INT",
+	"OPT_BOOL",
+	"OPT_FLOAT_LIST",
+	"OPT_STR_SET",
+	"OPT_DEPRECATED",
+	"OPT_UNSUPPORTED",
+};
+
 static struct fio_option *__fio_options;
 
 static int vp_cmp(const void *p1, const void *p2)
@@ -469,6 +486,17 @@ static int str_match_len(const struct value_pair *vp, const char *str)
 			*ptr = (val);			\
 	} while (0)
 
+static const char *opt_type_name(struct fio_option *o)
+{
+	compiletime_assert(ARRAY_SIZE(opt_type_names) - 1 == FIO_OPT_UNSUPPORTED,
+				"opt_type_names[] index");
+
+	if (o->type <= FIO_OPT_UNSUPPORTED)
+		return opt_type_names[o->type];
+
+	return "OPT_UNKNOWN?";
+}
+
 static int __handle_option(struct fio_option *o, const char *ptr, void *data,
 			   int first, int more, int curr)
 {
@@ -483,8 +511,8 @@ static int __handle_option(struct fio_option *o, const char *ptr, void *data,
 	struct value_pair posval[PARSE_MAX_VP];
 	int i, all_skipped = 1;
 
-	dprint(FD_PARSE, "__handle_option=%s, type=%d, ptr=%s\n", o->name,
-							o->type, ptr);
+	dprint(FD_PARSE, "__handle_option=%s, type=%s, ptr=%s\n", o->name,
+							opt_type_name(o), ptr);
 
 	if (!ptr && o->type != FIO_OPT_STR_SET && o->type != FIO_OPT_STR) {
 		log_err("Option %s requires an argument\n", o->name);
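`opt_type_names[]` has to stay in step with `enum fio_opt_type`. The `compiletime_assert()` turns a missing entry into a build failure, and the `/* keep this last */` comment in `parse.h` protects the bound it checks.

Below is the same idea in plain C11. It uses `static_assert` from `<assert.h>` in place of fio's macro, and `enum color` is a hypothetical example type.

```c
#include <assert.h>

#define ARRAY_SIZE(x)	(sizeof(x) / sizeof((x)[0]))

/* Hypothetical enum; the last entry is the bound the check relies on. */
enum color {
	COLOR_RED,
	COLOR_GREEN,
	COLOR_BLUE,	/* keep this last */
};

static const char *color_names[] = {
	"COLOR_RED",
	"COLOR_GREEN",
	"COLOR_BLUE",
};

static const char unknown_color[] = "COLOR_UNKNOWN?";

/* Fails the build if an enum value is added without a matching name. */
static_assert(ARRAY_SIZE(color_names) - 1 == COLOR_BLUE,
	      "color_names[] out of sync with enum color");

static const char *color_name(enum color c)
{
	/* The unsigned cast also rejects negative out-of-range values. */
	if ((unsigned int) c < ARRAY_SIZE(color_names))
		return color_names[c];
	return unknown_color;
}
```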
diff --git a/parse.h b/parse.h
index dfe7f16..d05236b 100644
--- a/parse.h
+++ b/parse.h
@@ -20,7 +20,7 @@ enum fio_opt_type {
 	FIO_OPT_FLOAT_LIST,
 	FIO_OPT_STR_SET,
 	FIO_OPT_DEPRECATED,
-	FIO_OPT_UNSUPPORTED,
+	FIO_OPT_UNSUPPORTED,	/* keep this last */
 };
 
 /*

* Recent changes (master)
@ 2017-12-15 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-15 13:00 UTC
  To: fio

The following changes since commit c20c361255683ee138f0c239e48b315e25725f7e:

  server: initialize first iolog header properly (2017-12-13 08:44:34 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to db37d89074ed204c9c2bd010e72f63dcf4725715:

  Allow configurable ETA intervals (2017-12-14 11:51:41 -0700)

----------------------------------------------------------------
Jeff Furlong (1):
      Fix Windows local time

Jens Axboe (4):
      server: cleanup iolog pdu prep
      server: convert more memset to on-stack initialization
      client: respect --eta=never for networked connections
      Allow configurable ETA intervals

Robert Elliott (1):
      .gitignore: ignore tags files and additional output binaries

 .gitignore         | 12 ++++++++++
 HOWTO              | 11 ++++++++-
 client.c           |  5 +++-
 eta.c              | 13 +++++++----
 fio.1              |  9 +++++++-
 fio.h              |  3 +++
 init.c             | 29 +++++++++++++++++++++++
 os/windows/posix.c |  4 +++-
 server.c           | 68 +++++++++++++++++++++++++-----------------------------
 9 files changed, 110 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index 463b53a..0c8cb7c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,7 +8,19 @@
 /config.log
 /cscope.out
 /fio
+/gfio
+/t/axmap
+/t/fio-btrace2fio
+/t/fio-dedupe
+/t/fio-genzipf
+/t/fio-verify-state
+/t/gen-rand
+/t/ieee754
+/t/lfsr-test
+/t/stest
 y.tab.*
 lex.yy.c
 *.un~
 doc/output
+/tags
+/TAGS
diff --git a/HOWTO b/HOWTO
index 563ca93..78fa6cc 100644
--- a/HOWTO
+++ b/HOWTO
@@ -173,7 +173,16 @@ Command line options
 .. option:: --eta=when
 
 	Specifies when real-time ETA estimate should be printed.  `when` may be
-	`always`, `never` or `auto`.
+	`always`, `never` or `auto`. `auto` is the default, it prints ETA
+	when requested if the output is a TTY. `always` disregards the output
+	type, and prints ETA when requested. `never` never prints ETA.
+
+.. option:: --eta-interval=time
+
+	By default, fio requests client ETA status roughly every second. With
+	this option, the interval is configurable. Fio imposes a minimum
+	allowed time to avoid flooding the console, less than 250 msec is
+	not supported.
 
 .. option:: --eta-newline=time
 
diff --git a/client.c b/client.c
index 2b136a0..18247ef 100644
--- a/client.c
+++ b/client.c
@@ -1834,6 +1834,9 @@ static void request_client_etas(struct client_ops *ops)
 	struct client_eta *eta;
 	int skipped = 0;
 
+	if (eta_print == FIO_ETA_NEVER)
+		return;
+
 	dprint(FD_NET, "client: request eta (%d)\n", nr_clients);
 
 	eta = calloc(1, sizeof(*eta) + __THREAD_RUNSTR_SZ(REAL_MAX_JOBS));
@@ -1997,7 +2000,7 @@ int fio_handle_clients(struct client_ops *ops)
 			int timeout;
 
 			fio_gettime(&ts, NULL);
-			if (mtime_since(&eta_ts, &ts) >= 900) {
+			if (eta_time_within_slack(mtime_since(&eta_ts, &ts))) {
 				request_client_etas(ops);
 				memcpy(&eta_ts, &ts, sizeof(ts));
 
diff --git a/eta.c b/eta.c
index 1b0b000..8b77daf 100644
--- a/eta.c
+++ b/eta.c
@@ -348,6 +348,14 @@ static void calc_iops(int unified_rw_rep, unsigned long mtime,
 }
 
 /*
+ * Allow a little slack - if we're within 95% of the time, allow ETA.
+ */
+bool eta_time_within_slack(unsigned int time)
+{
+	return time > ((eta_interval_msec * 95) / 100);
+}
+
+/*
  * Print status of the jobs we know about. This includes rate estimates,
  * ETA, thread state, etc.
  */
@@ -489,10 +497,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 	disp_time = mtime_since(&disp_prev_time, &now);
 
-	/*
-	 * Allow a little slack, the target is to print it every 1000 msecs
-	 */
-	if (!force && disp_time < 900)
+	if (!force && !eta_time_within_slack(disp_time))
 		return false;
 
 	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je->rate);
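`eta_time_within_slack()` replaces the hard-coded 900 msec checks. The old check accepted 900 msec or more. At the default 1000 msec interval, the new one requires more than 950 msec. At the 250 msec minimum mentioned in the HOWTO text, the threshold is 237 msec, because integer division truncates 237.5.

Below is a standalone copy of the check. It adds an overflow assertion that the original does not have.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Default from the init.c hunk; --eta-interval overrides it. */
static unsigned int eta_interval_msec = 1000;

/* True once `time` msecs exceed 95% of the configured interval. */
static bool eta_time_within_slack(unsigned int time)
{
	/* eta_interval_msec * 95 must not wrap around */
	assert(eta_interval_msec <= UINT_MAX / 95);
	return time > ((eta_interval_msec * 95) / 100);
}
```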
diff --git a/fio.1 b/fio.1
index 57ab665..70eeeb0 100644
--- a/fio.1
+++ b/fio.1
@@ -77,7 +77,14 @@ the I/O engine core to prevent writes due to unknown user space bug(s).
 .TP
 .BI \-\-eta \fR=\fPwhen
 Specifies when real\-time ETA estimate should be printed. \fIwhen\fR may
-be `always', `never' or `auto'.
+be `always', `never' or `auto'. `auto' is the default, it prints ETA when
+requested if the output is a TTY. `always' disregards the output type, and
+prints ETA when requested. `never' never prints ETA.
+.TP
+.BI \-\-eta\-interval \fR=\fPtime
+By default, fio requests client ETA status roughly every second. With this
+option, the interval is configurable. Fio imposes a minimum allowed time to
+avoid flooding the console, less than 250 msec is not supported.
 .TP
 .BI \-\-eta\-newline \fR=\fPtime
 Force a new line for every \fItime\fR period passed. When the unit is omitted,
diff --git a/fio.h b/fio.h
index 8a65646..b3b95ef 100644
--- a/fio.h
+++ b/fio.h
@@ -505,6 +505,7 @@ extern uintptr_t page_mask, page_size;
 extern int read_only;
 extern int eta_print;
 extern int eta_new_line;
+extern unsigned int eta_interval_msec;
 extern unsigned long done_secs;
 extern int fio_gtod_offload;
 extern int fio_gtod_cpu;
@@ -525,6 +526,8 @@ extern char *aux_path;
 
 extern struct thread_data *threads;
 
+extern bool eta_time_within_slack(unsigned int time);
+
 static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 {
 	assert(!(io_u->ddir == DDIR_WRITE && !td_write(td)));
diff --git a/init.c b/init.c
index b77b299..b9da713 100644
--- a/init.c
+++ b/init.c
@@ -51,6 +51,7 @@ static int nr_job_sections;
 int exitall_on_terminate = 0;
 int output_format = FIO_OUTPUT_NORMAL;
 int eta_print = FIO_ETA_AUTO;
+unsigned int eta_interval_msec = 1000;
 int eta_new_line = 0;
 FILE *f_out = NULL;
 FILE *f_err = NULL;
@@ -154,6 +155,11 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.val		= 'e' | FIO_CLIENT_FLAG,
 	},
 	{
+		.name		= (char *) "eta-interval",
+		.has_arg	= required_argument,
+		.val		= 'O' | FIO_CLIENT_FLAG,
+	},
+	{
 		.name		= (char *) "eta-newline",
 		.has_arg	= required_argument,
 		.val		= 'E' | FIO_CLIENT_FLAG,
@@ -2504,8 +2510,31 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				log_err("fio: failed parsing eta time %s\n", optarg);
 				exit_val = 1;
 				do_exit++;
+				break;
 			}
 			eta_new_line = t / 1000;
+			if (!eta_new_line) {
+				log_err("fio: eta new line time too short\n");
+				exit_val = 1;
+				do_exit++;
+			}
+			break;
+			}
+		case 'O': {
+			long long t = 0;
+
+			if (check_str_time(optarg, &t, 1)) {
+				log_err("fio: failed parsing eta interval %s\n", optarg);
+				exit_val = 1;
+				do_exit++;
+				break;
+			}
+			eta_interval_msec = t / 1000;
+			if (eta_interval_msec < DISK_UTIL_MSEC) {
+				log_err("fio: eta interval time too short (%umsec min)\n", DISK_UTIL_MSEC);
+				exit_val = 1;
+				do_exit++;
+			}
 			break;
 			}
 		case 'd':
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 00f0335..17e18a1 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -228,12 +228,14 @@ void Time_tToSystemTime(time_t dosTime, SYSTEMTIME *systemTime)
 {
     FILETIME utcFT;
     LONGLONG jan1970;
+	SYSTEMTIME tempSystemTime;
 
     jan1970 = Int32x32To64(dosTime, 10000000) + 116444736000000000;
     utcFT.dwLowDateTime = (DWORD)jan1970;
     utcFT.dwHighDateTime = jan1970 >> 32;
 
-    FileTimeToSystemTime((FILETIME*)&utcFT, systemTime);
+    FileTimeToSystemTime((FILETIME*)&utcFT, &tempSystemTime);
+	SystemTimeToTzSpecificLocalTime(NULL, &tempSystemTime, systemTime);
 }
 
 char* ctime_r(const time_t *t, char *buf)
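`Time_tToSystemTime()` builds a UTC `FILETIME` from a Unix timestamp; a `FILETIME` counts 100 ns ticks since 1601-01-01. The fix adds `SystemTimeToTzSpecificLocalTime()`, so callers receive local time rather than UTC.

The epoch arithmetic itself can be checked portably. This sketch uses hypothetical helper names and plain `uint64_t`/`uint32_t` instead of the Win32 types.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* 100 ns ticks between 1601-01-01 and 1970-01-01, as in the hunk. */
#define EPOCH_DIFF_100NS 116444736000000000ULL

/* Seconds since 1970 -> 100 ns ticks since 1601 (UTC). */
static uint64_t unix_to_filetime_ticks(time_t t)
{
	assert(t >= 0);
	return (uint64_t) t * 10000000ULL + EPOCH_DIFF_100NS;
}

/* Split into the low/high 32-bit halves that FILETIME stores. */
static void split_ticks(uint64_t ticks, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t) ticks;
	*high = (uint32_t) (ticks >> 32);
}
```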
diff --git a/server.c b/server.c
index 6225614..54d703d 100644
--- a/server.c
+++ b/server.c
@@ -844,25 +844,24 @@ static int handle_jobline_cmd(struct fio_net_cmd *cmd)
 static int handle_probe_cmd(struct fio_net_cmd *cmd)
 {
 	struct cmd_client_probe_pdu *pdu = (struct cmd_client_probe_pdu *) cmd->payload;
-	struct cmd_probe_reply_pdu probe;
 	uint64_t tag = cmd->tag;
+	struct cmd_probe_reply_pdu probe = {
+#ifdef CONFIG_BIG_ENDIAN
+		.bigendian	= 1,
+#endif
+		.os		= FIO_OS,
+		.arch		= FIO_ARCH,
+		.bpp		= sizeof(void *),
+		.cpus		= __cpu_to_le32(cpus_online()),
+	};
 
 	dprint(FD_NET, "server: sending probe reply\n");
 
 	strcpy(me, (char *) pdu->server);
 
-	memset(&probe, 0, sizeof(probe));
 	gethostname((char *) probe.hostname, sizeof(probe.hostname));
-#ifdef CONFIG_BIG_ENDIAN
-	probe.bigendian = 1;
-#endif
 	strncpy((char *) probe.fio_version, fio_version_string, sizeof(probe.fio_version) - 1);
 
-	probe.os	= FIO_OS;
-	probe.arch	= FIO_ARCH;
-	probe.bpp	= sizeof(void *);
-	probe.cpus	= __cpu_to_le32(cpus_online());
-
 	/*
 	 * If the client supports compression and we do too, then enable it
 	 */
@@ -1826,13 +1825,12 @@ static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 
 static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
 {
+	z_stream stream = {
+		.zalloc	= Z_NULL,
+		.zfree	= Z_NULL,
+		.opaque	= Z_NULL,
+	};
 	int ret = 0;
-	z_stream stream;
-
-	memset(&stream, 0, sizeof(stream));
-	stream.zalloc = Z_NULL;
-	stream.zfree = Z_NULL;
-	stream.opaque = Z_NULL;
 
 	if (deflateInit(&stream, Z_DEFAULT_COMPRESSION) != Z_OK)
 		return 1;
@@ -1928,17 +1926,16 @@ static int fio_append_text_log(struct sk_entry *first, struct io_log *log)
 
 int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 {
-	struct cmd_iolog_pdu pdu;
+	struct cmd_iolog_pdu pdu = {
+		.nr_samples		= cpu_to_le64(iolog_nr_samples(log)),
+		.thread_number		= cpu_to_le32(td->thread_number),
+		.log_type		= cpu_to_le32(log->log_type),
+		.log_hist_coarseness	= cpu_to_le32(log->hist_coarseness),
+	};
 	struct sk_entry *first;
 	struct flist_head *entry;
 	int ret = 0;
 
-	memset(&pdu, 0, sizeof(pdu));
-	pdu.nr_samples = cpu_to_le64(iolog_nr_samples(log));
-	pdu.thread_number = cpu_to_le32(td->thread_number);
-	pdu.log_type = cpu_to_le32(log->log_type);
-	pdu.log_hist_coarseness = cpu_to_le32(log->hist_coarseness);
-
 	if (!flist_empty(&log->chunk_list))
 		pdu.compressed = __cpu_to_le32(STORE_COMPRESSED);
 	else if (use_zlib)
@@ -1999,11 +1996,11 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 
 void fio_server_send_add_job(struct thread_data *td)
 {
-	struct cmd_add_job_pdu pdu;
+	struct cmd_add_job_pdu pdu = {
+		.thread_number = cpu_to_le32(td->thread_number),
+		.groupid = cpu_to_le32(td->groupid),
+	};
 
-	memset(&pdu, 0, sizeof(pdu));
-	pdu.thread_number = cpu_to_le32(td->thread_number);
-	pdu.groupid = cpu_to_le32(td->groupid);
 	convert_thread_options_to_net(&pdu.top, &td->o);
 
 	fio_net_queue_cmd(FIO_NET_CMD_ADD_JOB, &pdu, sizeof(pdu), NULL,
@@ -2241,11 +2238,10 @@ int fio_server_parse_host(const char *host, int ipv6, struct in_addr *inp,
 		ret = inet_pton(AF_INET, host, inp);
 
 	if (ret != 1) {
-		struct addrinfo hints, *res;
-
-		memset(&hints, 0, sizeof(hints));
-		hints.ai_family = ipv6 ? AF_INET6 : AF_INET;
-		hints.ai_socktype = SOCK_STREAM;
+		struct addrinfo *res, hints = {
+			.ai_family = ipv6 ? AF_INET6 : AF_INET,
+			.ai_socktype = SOCK_STREAM,
+		};
 
 		ret = getaddrinfo(host, NULL, &hints, &res);
 		if (ret) {
@@ -2404,11 +2400,11 @@ static void sig_int(int sig)
 
 static void set_sig_handlers(void)
 {
-	struct sigaction act;
+	struct sigaction act = {
+		.sa_handler = sig_int,
+		.sa_flags = SA_RESTART,
+	};
 
-	memset(&act, 0, sizeof(act));
-	act.sa_handler = sig_int;
-	act.sa_flags = SA_RESTART;
 	sigaction(SIGINT, &act, NULL);
 }
 

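The hunks above replace `memset()` followed by field-by-field assignment with designated initializers. This works because C99 and later initialize every member that an initializer list does not name as if it had static storage duration, which means to zero. A minimal sketch of the equivalence could look like this. It uses a hypothetical `demo_pdu` struct, not fio's real `cmd_iolog_pdu`. One hedge applies: the standard is explicit about unnamed members but less clear about padding bytes. For a struct with internal padding that is sent over the network, `memset()` therefore remains the more conservative choice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical PDU layout for illustration only; not fio's struct. */
struct demo_pdu {
	uint64_t nr_samples;
	uint32_t thread_number;
	uint32_t log_type;
	uint32_t compressed;
	uint32_t flags;
};

/* Old style: clear the whole object, then assign selected fields. */
static struct demo_pdu demo_pdu_memset(uint64_t samples, uint32_t thread)
{
	struct demo_pdu pdu;

	memset(&pdu, 0, sizeof(pdu));
	pdu.nr_samples = samples;
	pdu.thread_number = thread;
	return pdu;
}

/*
 * New style: members not named in the designated initializer
 * (log_type, compressed, flags) are implicitly zero-initialized.
 */
static struct demo_pdu demo_pdu_designated(uint64_t samples, uint32_t thread)
{
	struct demo_pdu pdu = {
		.nr_samples	= samples,
		.thread_number	= thread,
	};

	return pdu;
}
```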

* Recent changes (master)
@ 2017-12-14 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e80d780108fd33350f7c4a3032a8d2d06d7b102f:

  fio: kill td->nr_normal_files (2017-12-08 12:50:28 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c20c361255683ee138f0c239e48b315e25725f7e:

  server: initialize first iolog header properly (2017-12-13 08:44:34 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      server: initialize first iolog header properly

 server.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index 76d662d..6225614 100644
--- a/server.c
+++ b/server.c
@@ -1933,6 +1933,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 	struct flist_head *entry;
 	int ret = 0;
 
+	memset(&pdu, 0, sizeof(pdu));
 	pdu.nr_samples = cpu_to_le64(iolog_nr_samples(log));
 	pdu.thread_number = cpu_to_le32(td->thread_number);
 	pdu.log_type = cpu_to_le32(log->log_type);


* Recent changes (master)
@ 2017-12-09 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-09 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 1aa39b0ce447f228460e6d0af601fee88fd5f4b4:

  rate: ensure IO issue restarts right after sleep (2017-12-07 09:06:04 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e80d780108fd33350f7c4a3032a8d2d06d7b102f:

  fio: kill td->nr_normal_files (2017-12-08 12:50:28 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      rate: fix bad math
      Remove old emails
      fio: kill td->nr_normal_files

 backend.c                  | 11 +++++++++--
 filesetup.c                |  3 ---
 fio.1                      |  3 +--
 fio.h                      |  1 -
 init.c                     |  4 +---
 tools/fio_generate_plots.1 |  3 +--
 6 files changed, 12 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 5304ddc..e248117 100644
--- a/backend.c
+++ b/backend.c
@@ -898,8 +898,15 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
 	 * start issuing immediately after the sleep.
 	 */
 	if (total && td->rate_bps[ddir] && td->o.rate_ign_think) {
-		td->rate_io_issue_bytes[ddir] += (td->rate_bps[ddir] * 1000000) / total;
-		td->rate_io_issue_bytes[ddir] -= td->o.min_bs[ddir];
+		uint64_t missed = (td->rate_bps[ddir] * total) / 1000000ULL;
+		uint64_t over;
+
+		if (total >= 1000000)
+			over = td->o.min_bs[ddir];
+		else
+			over = (td->o.min_bs[ddir] * total) / 1000000ULL;
+
+		td->rate_io_issue_bytes[ddir] += (missed - over);
 	}
 }
 
diff --git a/filesetup.c b/filesetup.c
index 1d586b1..30af085 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1628,8 +1628,6 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	}
 
 	td->files_index++;
-	if (f->filetype == FIO_TYPE_FILE)
-		td->nr_normal_files++;
 
 	if (td->o.numjobs > 1)
 		set_already_allocated(file_name);
@@ -1855,7 +1853,6 @@ void free_release_files(struct thread_data *td)
 	td->o.nr_files = 0;
 	td->o.open_files = 0;
 	td->files_index = 0;
-	td->nr_normal_files = 0;
 }
 
 void fio_file_reset(struct thread_data *td, struct fio_file *f)
diff --git a/fio.1 b/fio.1
index 80abc14..57ab665 100644
--- a/fio.1
+++ b/fio.1
@@ -3559,8 +3559,7 @@ containing two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
 .RE
 .SH AUTHORS
 .B fio
-was written by Jens Axboe <jens.axboe@oracle.com>,
-now Jens Axboe <axboe@fb.com>.
+was written by Jens Axboe <axboe@kernel.dk>.
 .br
 This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au> based
 on documentation by Jens Axboe.
diff --git a/fio.h b/fio.h
index 6b184c2..8a65646 100644
--- a/fio.h
+++ b/fio.h
@@ -208,7 +208,6 @@ struct thread_data {
 	unsigned int files_index;
 	unsigned int nr_open_files;
 	unsigned int nr_done_files;
-	unsigned int nr_normal_files;
 	union {
 		unsigned int next_file;
 		struct frand_state next_file_state;
diff --git a/init.c b/init.c
index c34bd15..b77b299 100644
--- a/init.c
+++ b/init.c
@@ -2142,9 +2142,7 @@ static void usage(const char *name)
 	printf("  --trigger=cmd\t\tSet this command as local trigger\n");
 	printf("  --trigger-remote=cmd\tSet this command as remote trigger\n");
 	printf("  --aux-path=path\tUse this path for fio state generated files\n");
-	printf("\nFio was written by Jens Axboe <jens.axboe@oracle.com>");
-	printf("\n                   Jens Axboe <jaxboe@fusionio.com>");
-	printf("\n                   Jens Axboe <axboe@fb.com>\n");
+	printf("\nFio was written by Jens Axboe <axboe@kernel.dk>\n");
 }
 
 #ifdef FIO_INC_DEBUG
diff --git a/tools/fio_generate_plots.1 b/tools/fio_generate_plots.1
index 9e3c1ff..92b2421 100644
--- a/tools/fio_generate_plots.1
+++ b/tools/fio_generate_plots.1
@@ -38,8 +38,7 @@ generated in the current directory.
 The script takes the title of the plot as only argument. It does
 not offer any additional options.
 .SH AUTHOR
-fio_generate_plots was written by Jens Axboe <jens.axboe@oracle.com>,
-now Jens Axboe <jaxboe@fusionio.com>.
+fio_generate_plots was written by Jens Axboe <axboe@kernel.dk>
 .PP
 This manual page was written by Martin Steigerwald <ms@teamix.de>,
 for the Debian project (but may be used by others).

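The "rate: fix bad math" change corrects how much rate credit is added after a thinktime sleep when `rate_ignore_thinktime` is set. `rate_bps` is in bytes per second and the sleep `total` is in microseconds. The bytes that would have been issued during the sleep are therefore `rate_bps * total / 10^6`. The previous expression, `rate_bps * 10^6 / total`, is inverted and grows as the sleep gets shorter.

The sketch below restates both versions as standalone functions so they can be compared. The function names are hypothetical. Like the patch, the fixed version does not guard against `missed < over`. That case can only happen when the rate is below one `min_bs` block per second, and it would make the unsigned subtraction wrap.

```c
#include <stdint.h>

/* Pre-fix expression: inverted, so it grows as the sleep gets shorter. */
static uint64_t catchup_bytes_old(uint64_t rate_bps, uint64_t total_usec,
				  uint64_t min_bs)
{
	return (rate_bps * 1000000) / total_usec - min_bs;
}

/*
 * Post-fix expression: bytes missed while sleeping, minus at most one
 * block (scaled down proportionally for sleeps shorter than a second).
 */
static uint64_t catchup_bytes_new(uint64_t rate_bps, uint64_t total_usec,
				  uint64_t min_bs)
{
	uint64_t missed = (rate_bps * total_usec) / 1000000ULL;
	uint64_t over;

	if (total_usec >= 1000000)
		over = min_bs;
	else
		over = (min_bs * total_usec) / 1000000ULL;

	/* Mirrors the patch: no guard for missed < over. */
	return missed - over;
}
```

Take a rate of 1 MB/s with 4 KiB blocks. The old expression credits a 0.5 s sleep with 1,995,904 bytes, which is exactly what the fixed expression credits for a 2 s sleep.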

* Recent changes (master)
@ 2017-12-08 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 395feabb53806d5cff2e0b73b6c94048f05b5aae:

  io_u: rate cleanup and spelling error (2017-12-06 12:30:20 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1aa39b0ce447f228460e6d0af601fee88fd5f4b4:

  rate: ensure IO issue restarts right after sleep (2017-12-07 09:06:04 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      rate: ensure IO issue restarts right after sleep

 backend.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 69f03dc..5304ddc 100644
--- a/backend.c
+++ b/backend.c
@@ -894,10 +894,13 @@ static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
 
 	/*
 	 * If we're ignoring thinktime for the rate, add the number of bytes
-	 * we would have done while sleeping.
+	 * we would have done while sleeping, minus one block to ensure we
+	 * start issuing immediately after the sleep.
 	 */
-	if (total && td->rate_bps[ddir] && td->o.rate_ign_think)
+	if (total && td->rate_bps[ddir] && td->o.rate_ign_think) {
 		td->rate_io_issue_bytes[ddir] += (td->rate_bps[ddir] * 1000000) / total;
+		td->rate_io_issue_bytes[ddir] -= td->o.min_bs[ddir];
+	}
 }
 
 /*


* Recent changes (master)
@ 2017-12-07 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-07 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 67bfebe6af2e6d030ec739fa45ccb211f3e50a0e:

  Merge branch 'wip-cleanup' of https://github.com/ZVampirEM77/fio (2017-12-03 10:11:53 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 395feabb53806d5cff2e0b73b6c94048f05b5aae:

  io_u: rate cleanup and spelling error (2017-12-06 12:30:20 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Add option to ignore thinktime for rated IO
      io_u: rate cleanup and spelling error

 HOWTO            |  7 +++++++
 backend.c        | 61 +++++++++++++++++++++++++++++++++++---------------------
 cconv.c          |  2 ++
 fio.1            |  6 ++++++
 io_u.c           |  9 ++++-----
 options.c        | 10 ++++++++++
 server.h         |  2 +-
 thread_options.h |  3 +++
 8 files changed, 71 insertions(+), 29 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4caaf54..563ca93 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2208,6 +2208,13 @@ I/O rate
 	(https://en.wikipedia.org/wiki/Poisson_point_process). The lambda will be
 	10^6 / IOPS for the given workload.
 
+.. option:: rate_ignore_thinktime=bool
+
+	By default, fio will attempt to catch up to the specified rate setting,
+	if any kind of thinktime setting was used. If this option is set, then
+	fio will ignore the thinktime and continue doing IO at the specified
+	rate, instead of entering a catch-up mode after thinktime is done.
+
 
 I/O latency
 ~~~~~~~~~~~
diff --git a/backend.c b/backend.c
index 6c805c7..69f03dc 100644
--- a/backend.c
+++ b/backend.c
@@ -844,14 +844,13 @@ static bool io_complete_bytes_exceeded(struct thread_data *td)
  */
 static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 {
-	uint64_t secs, remainder, bps, bytes, iops;
+	uint64_t bps = td->rate_bps[ddir];
 
 	assert(!(td->flags & TD_F_CHILD));
-	bytes = td->rate_io_issue_bytes[ddir];
-	bps = td->rate_bps[ddir];
 
 	if (td->o.rate_process == RATE_PROCESS_POISSON) {
-		uint64_t val;
+		uint64_t val, iops;
+
 		iops = bps / td->o.bs[ddir];
 		val = (int64_t) (1000000 / iops) *
 				-logf(__rand_0_1(&td->poisson_state[ddir]));
@@ -863,14 +862,44 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 		td->last_usec[ddir] += val;
 		return td->last_usec[ddir];
 	} else if (bps) {
-		secs = bytes / bps;
-		remainder = bytes % bps;
+		uint64_t bytes = td->rate_io_issue_bytes[ddir];
+		uint64_t secs = bytes / bps;
+		uint64_t remainder = bytes % bps;
+
 		return remainder * 1000000 / bps + secs * 1000000;
 	}
 
 	return 0;
 }
 
+static void handle_thinktime(struct thread_data *td, enum fio_ddir ddir)
+{
+	unsigned long long b;
+	uint64_t total;
+	int left;
+
+	b = ddir_rw_sum(td->io_blocks);
+	if (b % td->o.thinktime_blocks)
+		return;
+
+	io_u_quiesce(td);
+
+	total = 0;
+	if (td->o.thinktime_spin)
+		total = usec_spin(td->o.thinktime_spin);
+
+	left = td->o.thinktime - total;
+	if (left)
+		total += usec_sleep(td, left);
+
+	/*
+	 * If we're ignoring thinktime for the rate, add the number of bytes
+	 * we would have done while sleeping.
+	 */
+	if (total && td->rate_bps[ddir] && td->o.rate_ign_think)
+		td->rate_io_issue_bytes[ddir] += (td->rate_bps[ddir] * 1000000) / total;
+}
+
 /*
  * Main IO worker function. It retrieves io_u's to process and queues
  * and reaps them, checking for rate and errors along the way.
@@ -955,6 +984,7 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 			int err = PTR_ERR(io_u);
 
 			io_u = NULL;
+			ddir = DDIR_INVAL;
 			if (err == -EBUSY) {
 				ret = FIO_Q_BUSY;
 				goto reap;
@@ -1062,23 +1092,8 @@ reap:
 		if (!in_ramp_time(td) && td->o.latency_target)
 			lat_target_check(td);
 
-		if (td->o.thinktime) {
-			unsigned long long b;
-
-			b = ddir_rw_sum(td->io_blocks);
-			if (!(b % td->o.thinktime_blocks)) {
-				int left;
-
-				io_u_quiesce(td);
-
-				if (td->o.thinktime_spin)
-					usec_spin(td->o.thinktime_spin);
-
-				left = td->o.thinktime - td->o.thinktime_spin;
-				if (left)
-					usec_sleep(td, left);
-			}
-		}
+		if (ddir_rw(ddir) && td->o.thinktime)
+			handle_thinktime(td, ddir);
 	}
 
 	check_update_rusage(td);
diff --git a/cconv.c b/cconv.c
index 5ed4640..92996b1 100644
--- a/cconv.c
+++ b/cconv.c
@@ -298,6 +298,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 
 	o->trim_backlog = le64_to_cpu(top->trim_backlog);
 	o->rate_process = le32_to_cpu(top->rate_process);
+	o->rate_ign_think = le32_to_cpu(top->rate_ign_think);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		o->percentile_list[i].u.f = fio_uint64_to_double(le64_to_cpu(top->percentile_list[i].u.i));
@@ -557,6 +558,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->offset_increment = __cpu_to_le64(o->offset_increment);
 	top->number_ios = __cpu_to_le64(o->number_ios);
 	top->rate_process = cpu_to_le32(o->rate_process);
+	top->rate_ign_think = cpu_to_le32(o->rate_ign_think);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++)
 		top->percentile_list[i].u.i = __cpu_to_le64(fio_double_to_uint64(o->percentile_list[i].u.f));
diff --git a/fio.1 b/fio.1
index 54d1b0f..80abc14 100644
--- a/fio.1
+++ b/fio.1
@@ -1955,6 +1955,12 @@ I/Os that gets adjusted based on I/O completion rates. If this is set to
 flow, known as the Poisson process
 (\fIhttps://en.wikipedia.org/wiki/Poisson_point_process\fR). The lambda will be
 10^6 / IOPS for the given workload.
+.TP
+.BI rate_ignore_thinktime \fR=\fPbool
+By default, fio will attempt to catch up to the specified rate setting, if any
+kind of thinktime setting was used. If this option is set, then fio will
+ignore the thinktime and continue doing IO at the specified rate, instead of
+entering a catch-up mode after thinktime is done.
 .SS "I/O latency"
 .TP
 .BI latency_target \fR=\fPtime
diff --git a/io_u.c b/io_u.c
index 44933a1..42d98eb 100644
--- a/io_u.c
+++ b/io_u.c
@@ -759,11 +759,11 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 			return odir;
 
 		/*
-		 * Both directions are ahead of rate. sleep the min
-		 * switch if necissary
+		 * Both directions are ahead of rate. sleep the min,
+		 * switch if necessary
 		 */
 		if (td->rate_next_io_time[ddir] <=
-			td->rate_next_io_time[odir]) {
+		    td->rate_next_io_time[odir]) {
 			usec = td->rate_next_io_time[ddir] - now;
 		} else {
 			usec = td->rate_next_io_time[odir] - now;
@@ -775,8 +775,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 	if (td->o.io_submit_mode == IO_MODE_INLINE)
 		io_u_quiesce(td);
 
-	usec = usec_sleep(td, usec);
-
+	usec_sleep(td, usec);
 	return ddir;
 }
 
diff --git a/options.c b/options.c
index 3fa646c..9a3431d 100644
--- a/options.c
+++ b/options.c
@@ -3460,6 +3460,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_RATE,
 	},
 	{
+		.name	= "rate_ignore_thinktime",
+		.lname	= "Rate ignore thinktime",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, rate_ign_think),
+		.help	= "Rated IO ignores thinktime settings",
+		.parent = "rate",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_RATE,
+	},
+	{
 		.name	= "max_latency",
 		.lname	= "Max Latency (usec)",
 		.type	= FIO_OPT_STR_VAL_TIME,
diff --git a/server.h b/server.h
index 438a6c3..1a9b650 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 68,
+	FIO_SERVER_VER			= 69,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 793df8a..dc290b0 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -273,6 +273,7 @@ struct thread_options {
 	unsigned int rate_iops[DDIR_RWDIR_CNT];
 	unsigned int rate_iops_min[DDIR_RWDIR_CNT];
 	unsigned int rate_process;
+	unsigned int rate_ign_think;
 
 	char *ioscheduler;
 
@@ -547,6 +548,8 @@ struct thread_options_pack {
 	uint32_t rate_iops[DDIR_RWDIR_CNT];
 	uint32_t rate_iops_min[DDIR_RWDIR_CNT];
 	uint32_t rate_process;
+	uint32_t rate_ign_think;
+	uint32_t pad;
 
 	uint8_t ioscheduler[FIO_TOP_STR_MAX];
 

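In the linear (non-Poisson) branch of `usec_for_io()`, the issue time is computed as `secs * 10^6 + remainder * 10^6 / bps` instead of the more obvious `bytes * 10^6 / bps`. A likely reason is overflow. `bytes * 10^6` no longer fits in a `uint64_t` once `bytes` exceeds about 1.8 * 10^13 (roughly 18 TB), while `remainder` is always smaller than `bps`. The sketch below reproduces that arithmetic in a standalone function. The name `usec_for_bytes()` is hypothetical.

```c
#include <stdint.h>

/*
 * Microseconds needed to issue 'bytes' at 'bps' bytes/sec, computed the
 * way usec_for_io() does: whole seconds first, then the remainder, so the
 * multiplication by 10^6 never sees the full byte count.
 * Returns 0 when no rate is set, as the original does.
 */
static uint64_t usec_for_bytes(uint64_t bytes, uint64_t bps)
{
	uint64_t secs, remainder;

	if (!bps)
		return 0;

	secs = bytes / bps;
	remainder = bytes % bps;
	return remainder * 1000000 / bps + secs * 1000000;
}
```

For example, 2 * 10^16 bytes at 1 GB/s comes out as 2 * 10^13 usec. In that case the single-division form would have overflowed.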

* Recent changes (master)
@ 2017-12-04 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-04 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit af7fb4aa2835b82847701237783c9ebe8ec8239a:

  steadystate: style cleanup (2017-12-02 16:29:44 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 67bfebe6af2e6d030ec739fa45ccb211f3e50a0e:

  Merge branch 'wip-cleanup' of https://github.com/ZVampirEM77/fio (2017-12-03 10:11:53 -0700)

----------------------------------------------------------------
Enming Zhang (1):
      configure: fix typos

Jens Axboe (1):
      Merge branch 'wip-cleanup' of https://github.com/ZVampirEM77/fio

 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 31ba822..d92bb0f 100755
--- a/configure
+++ b/configure
@@ -1586,7 +1586,7 @@ print_config "rbd_poll" "$rbd_poll"
 fi
 
 ##########################################
-# check for rbd_invaidate_cache()
+# check for rbd_invalidate_cache()
 if test "$rbd_inval" != "yes" ; then
   rbd_inval="no"
 fi


* Recent changes (master)
@ 2017-12-03 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-03 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e0409d5ffe6127961ddc4c495ec32b72a65e11bf:

  thread_options: drop fadvise_stream from thread_options (2017-12-01 14:54:49 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to af7fb4aa2835b82847701237783c9ebe8ec8239a:

  steadystate: style cleanup (2017-12-02 16:29:44 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      steadystate: add free helper
      steadystate: style cleanup

 backend.c       |  7 +------
 helper_thread.c |  3 +--
 steadystate.c   | 16 +++++++++++-----
 steadystate.h   |  1 +
 4 files changed, 14 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 10eb90e..6c805c7 100644
--- a/backend.c
+++ b/backend.c
@@ -2480,12 +2480,7 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	for_each_td(td, i) {
-		if (td->ss.dur) {
-			if (td->ss.iops_data != NULL) {
-				free(td->ss.iops_data);
-				free(td->ss.bw_data);
-			}
-		}
+		steadystate_free(td);
 		fio_options_free(td);
 		if (td->rusage_sem) {
 			fio_mutex_remove(td->rusage_sem);
diff --git a/helper_thread.c b/helper_thread.c
index 9c6e0a2..64e5a3c 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -134,8 +134,7 @@ static void *helper_thread_main(void *data)
 					next_ss = STEADYSTATE_MSEC - (since_ss - STEADYSTATE_MSEC);
 				else
 					next_ss = STEADYSTATE_MSEC;
-			}
-			else
+			} else
 				next_ss = STEADYSTATE_MSEC - since_ss;
                 }
 
diff --git a/steadystate.c b/steadystate.c
index 05ce029..2017ca6 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -6,6 +6,14 @@
 
 bool steadystate_enabled = false;
 
+void steadystate_free(struct thread_data *td)
+{
+	free(td->ss.iops_data);
+	free(td->ss.bw_data);
+	td->ss.iops_data = NULL;
+	td->ss.bw_data = NULL;
+}
+
 static void steadystate_alloc(struct thread_data *td)
 {
 	td->ss.bw_data = calloc(td->ss.dur, sizeof(uint64_t));
@@ -16,8 +24,8 @@ static void steadystate_alloc(struct thread_data *td)
 
 void steadystate_setup(void)
 {
-	int i, prev_groupid;
 	struct thread_data *td, *prev_td;
+	int i, prev_groupid;
 
 	if (!steadystate_enabled)
 		return;
@@ -39,17 +47,15 @@ void steadystate_setup(void)
 		}
 
 		if (prev_groupid != td->groupid) {
-			if (prev_td != NULL) {
+			if (prev_td)
 				steadystate_alloc(prev_td);
-			}
 			prev_groupid = td->groupid;
 		}
 		prev_td = td;
 	}
 
-	if (prev_td != NULL && prev_td->o.group_reporting) {
+	if (prev_td && prev_td->o.group_reporting)
 		steadystate_alloc(prev_td);
-	}
 }
 
 static bool steadystate_slope(uint64_t iops, uint64_t bw,
diff --git a/steadystate.h b/steadystate.h
index eaba0d7..9fd88ee 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -5,6 +5,7 @@
 #include "thread_options.h"
 #include "lib/ieee754.h"
 
+extern void steadystate_free(struct thread_data *);
 extern void steadystate_check(void);
 extern void steadystate_setup(void);
 extern int td_steadystate_init(struct thread_data *);

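The new `steadystate_free()` can drop the old `td->ss.dur` and `iops_data != NULL` checks because `free(NULL)` is defined to do nothing. Resetting the pointers to NULL afterwards also makes a second call harmless. A minimal sketch of the same pattern is shown below. It uses a hypothetical `demo_ss` struct in place of fio's steady-state fields, plus a hypothetical allocator that reuses the free helper on failure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two buffers held in td->ss. */
struct demo_ss {
	uint64_t *iops_data;
	uint64_t *bw_data;
};

/* Safe to call on never-allocated or already-freed state. */
static void demo_ss_free(struct demo_ss *ss)
{
	free(ss->iops_data);	/* free(NULL) is a no-op */
	free(ss->bw_data);
	ss->iops_data = NULL;	/* a repeated call now frees NULL again */
	ss->bw_data = NULL;
}

/* Allocate both buffers; on partial failure, release whatever succeeded. */
static int demo_ss_alloc(struct demo_ss *ss, size_t dur)
{
	ss->iops_data = calloc(dur, sizeof(uint64_t));
	ss->bw_data = calloc(dur, sizeof(uint64_t));
	if (!ss->iops_data || !ss->bw_data) {
		demo_ss_free(ss);
		return 1;
	}
	return 0;
}
```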

* Recent changes (master)
@ 2017-12-02 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-02 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e1c325d25dd977c28c9489c542a51ee05dfc620e:

  io_u: don't account io issue blocks for verify backlog (2017-11-30 21:48:12 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e0409d5ffe6127961ddc4c495ec32b72a65e11bf:

  thread_options: drop fadvise_stream from thread_options (2017-12-01 14:54:49 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      Add basic memcpy test
      fio_time: should include time.h
      memcpy: use malloc
      memcpy: free buffer in case of failure
      memcpy: add hybrid

Vincent Fu (1):
      thread_options: drop fadvise_stream from thread_options

 fio_time.h       |   1 +
 init.c           |  11 +++
 lib/memcpy.c     | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/memcpy.h     |   6 ++
 server.h         |   2 +-
 thread_options.h |   3 -
 6 files changed, 305 insertions(+), 4 deletions(-)
 create mode 100644 lib/memcpy.c
 create mode 100644 lib/memcpy.h

---

Diff of recent changes:

diff --git a/fio_time.h b/fio_time.h
index c7c3dbb..ee8087e 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -1,6 +1,7 @@
 #ifndef FIO_TIME_H
 #define FIO_TIME_H
 
+#include <time.h>
 #include "lib/types.h"
 
 struct thread_data;
diff --git a/init.c b/init.c
index 607f7e0..c34bd15 100644
--- a/init.c
+++ b/init.c
@@ -32,6 +32,7 @@
 
 #include "crc/test.h"
 #include "lib/pow2.h"
+#include "lib/memcpy.h"
 
 const char fio_version_string[] = FIO_VERSION;
 
@@ -234,6 +235,11 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.val		= 'G',
 	},
 	{
+		.name		= (char *) "memcpytest",
+		.has_arg	= optional_argument,
+		.val		= 'M',
+	},
+	{
 		.name		= (char *) "idle-prof",
 		.has_arg	= required_argument,
 		.val		= 'I',
@@ -2731,6 +2737,11 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			do_exit++;
 			exit_val = fio_crctest(optarg);
 			break;
+		case 'M':
+			did_arg = true;
+			do_exit++;
+			exit_val = fio_memcpy_test(optarg);
+			break;
 		case 'L': {
 			long long val;
 
diff --git a/lib/memcpy.c b/lib/memcpy.c
new file mode 100644
index 0000000..00e65aa
--- /dev/null
+++ b/lib/memcpy.c
@@ -0,0 +1,286 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "memcpy.h"
+#include "rand.h"
+#include "../fio_time.h"
+#include "../gettime.h"
+#include "../fio.h"
+
+#define BUF_SIZE	32 * 1024 * 1024ULL
+
+#define NR_ITERS	64
+
+struct memcpy_test {
+	const char *name;
+	void *src;
+	void *dst;
+	size_t size;
+};
+
+static struct memcpy_test tests[] = {
+	{
+		.name		= "8 bytes",
+		.size		= 8,
+	},
+	{
+		.name		= "16 bytes",
+		.size		= 16,
+	},
+	{
+		.name		= "96 bytes",
+		.size		= 96,
+	},
+	{
+		.name		= "128 bytes",
+		.size		= 128,
+	},
+	{
+		.name		= "256 bytes",
+		.size		= 256,
+	},
+	{
+		.name		= "512 bytes",
+		.size		= 512,
+	},
+	{
+		.name		= "2048 bytes",
+		.size		= 2048,
+	},
+	{
+		.name		= "8192 bytes",
+		.size		= 8192,
+	},
+	{
+		.name		= "131072 bytes",
+		.size		= 131072,
+	},
+	{
+		.name		= "262144 bytes",
+		.size		= 262144,
+	},
+	{
+		.name		= "524288 bytes",
+		.size		= 524288,
+	},
+	{
+		.name		= NULL,
+	},
+};
+
+struct memcpy_type {
+	const char *name;
+	unsigned int mask;
+	void (*fn)(struct memcpy_test *);
+};
+
+enum {
+	T_MEMCPY	= 1U << 0,
+	T_MEMMOVE	= 1U << 1,
+	T_SIMPLE	= 1U << 2,
+	T_HYBRID	= 1U << 3,
+};
+
+#define do_test(test, fn)	do {					\
+	size_t left, this;						\
+	void *src, *dst;						\
+	int i;								\
+									\
+	for (i = 0; i < NR_ITERS; i++) {				\
+		left = BUF_SIZE;					\
+		src = test->src;					\
+		dst = test->dst;					\
+		while (left) {						\
+			this = test->size;				\
+			if (this > left)				\
+				this = left;				\
+			(fn)(dst, src, this);				\
+			left -= this;					\
+			src += this;					\
+			dst += this;					\
+		}							\
+	}								\
+} while (0)
+
+static void t_memcpy(struct memcpy_test *test)
+{
+	do_test(test, memcpy);
+}
+
+static void t_memmove(struct memcpy_test *test)
+{
+	do_test(test, memmove);
+}
+
+static void simple_memcpy(void *dst, void const *src, size_t len)
+{
+ 	char *d = dst;
+	const char *s = src;
+
+	while (len--)
+		*d++ = *s++;
+}
+
+static void t_simple(struct memcpy_test *test)
+{
+	do_test(test, simple_memcpy);
+}
+
+static void t_hybrid(struct memcpy_test *test)
+{
+	if (test->size >= 64)
+		do_test(test, simple_memcpy);
+	else
+		do_test(test, memcpy);
+}
+
+static struct memcpy_type t[] = {
+	{
+		.name = "memcpy",
+		.mask = T_MEMCPY,
+		.fn = t_memcpy,
+	},
+	{
+		.name = "memmove",
+		.mask = T_MEMMOVE,
+		.fn = t_memmove,
+	},
+	{
+		.name = "simple",
+		.mask = T_SIMPLE,
+		.fn = t_simple,
+	},
+	{
+		.name = "hybrid",
+		.mask = T_HYBRID,
+		.fn = t_hybrid,
+	},
+	{
+		.name = NULL,
+	},
+};
+
+static unsigned int get_test_mask(const char *type)
+{
+	char *ostr, *str = strdup(type);
+	unsigned int mask;
+	char *name;
+	int i;
+
+	ostr = str;
+	mask = 0;
+	while ((name = strsep(&str, ",")) != NULL) {
+		for (i = 0; t[i].name; i++) {
+			if (!strcmp(t[i].name, name)) {
+				mask |= t[i].mask;
+				break;
+			}
+		}
+	}
+
+	free(ostr);
+	return mask;
+}
+
+static int list_types(void)
+{
+	int i;
+
+	for (i = 0; t[i].name; i++)
+		printf("%s\n", t[i].name);
+
+	return 1;
+}
+
+static int setup_tests(void)
+{
+	struct memcpy_test *test;
+	struct frand_state state;
+	void *src, *dst;
+	int i;
+
+	src = malloc(BUF_SIZE);
+	dst = malloc(BUF_SIZE);
+	if (!src || !dst) {
+		free(src);
+		free(dst);
+		return 1;
+	}
+
+	init_rand_seed(&state, 0x8989, 0);
+	fill_random_buf(&state, src, BUF_SIZE);
+
+	for (i = 0; tests[i].name; i++) {
+		test = &tests[i];
+		test->src = src;
+		test->dst = dst;
+	}
+
+	return 0;
+}
+
+static void free_tests(void)
+{
+	free(tests[0].src);
+	free(tests[0].dst);
+}
+
+int fio_memcpy_test(const char *type)
+{
+	unsigned int test_mask = 0;
+	int j, i;
+
+	if (!type)
+		test_mask = ~0U;
+	else if (!strcmp(type, "help") || !strcmp(type, "list"))
+		return list_types();
+	else
+		test_mask = get_test_mask(type);
+
+	if (!test_mask) {
+		fprintf(stderr, "fio: unknown hash `%s`. Available:\n", type);
+		return list_types();
+	}
+
+	if (setup_tests()) {
+		fprintf(stderr, "setting up mem regions failed\n");
+		return 1;
+	}
+
+	for (i = 0; t[i].name; i++) {
+		struct timespec ts;
+		double mb_sec;
+		uint64_t usec;
+
+		if (!(t[i].mask & test_mask))
+			continue;
+
+		/*
+		 * For first run, make sure CPUs are spun up and that
+		 * we've touched the data.
+		 */
+		usec_spin(100000);
+		t[i].fn(&tests[0]);
+
+		printf("%s\n", t[i].name);
+
+		for (j = 0; tests[j].name; j++) {
+			fio_gettime(&ts, NULL);
+			t[i].fn(&tests[j]);
+			usec = utime_since_now(&ts);
+
+			if (usec) {
+				unsigned long long mb = NR_ITERS * BUF_SIZE;
+
+				mb_sec = (double) mb / (double) usec;
+				mb_sec /= (1.024 * 1.024);
+				printf("\t%s:\t%8.2f MiB/sec\n", tests[j].name, mb_sec);
+			} else
+				printf("\t%s:inf MiB/sec\n", tests[j].name);
+		}
+	}
+
+	free_tests();
+	return 0;
+}
diff --git a/lib/memcpy.h b/lib/memcpy.h
new file mode 100644
index 0000000..f61a4a0
--- /dev/null
+++ b/lib/memcpy.h
@@ -0,0 +1,6 @@
+#ifndef FIO_MEMCPY_H
+#define FIO_MEMCPY_H
+
+int fio_memcpy_test(const char *type);
+
+#endif
diff --git a/server.h b/server.h
index dbd5c27..438a6c3 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 67,
+	FIO_SERVER_VER			= 68,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index a9c3bee..793df8a 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -218,7 +218,6 @@ struct thread_options {
 	unsigned int group_reporting;
 	unsigned int stats;
 	unsigned int fadvise_hint;
-	unsigned int fadvise_stream;
 	enum fio_fallocate_mode fallocate_mode;
 	unsigned int zero_buffers;
 	unsigned int refill_buffers;
@@ -494,7 +493,6 @@ struct thread_options_pack {
 	uint32_t group_reporting;
 	uint32_t stats;
 	uint32_t fadvise_hint;
-	uint32_t fadvise_stream;
 	uint32_t fallocate_mode;
 	uint32_t zero_buffers;
 	uint32_t refill_buffers;
@@ -520,7 +518,6 @@ struct thread_options_pack {
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
 	uint32_t percentile_precision;
-	uint32_t pad;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];

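`fio_memcpy_test()` copies `NR_ITERS * BUF_SIZE` bytes in chunks of `test->size` and then reports throughput. Bytes per microsecond is the same as decimal megabytes (10^6 bytes) per second. Dividing by `1.024 * 1.024` (1.048576, i.e. 2^20 / 10^6) therefore converts the figure to MiB/sec.

The sketch below restates the chunked loop and the conversion as plain functions with hypothetical names. It steps `char *` pointers because the `do_test()` macro's arithmetic on `void *` relies on a GCC extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 'total' bytes from src to dst in chunks of at most 'chunk' bytes. */
static void chunked_copy(char *dst, const char *src, size_t total,
			 size_t chunk)
{
	while (total) {
		size_t len = chunk < total ? chunk : total;

		memcpy(dst, src, len);
		total -= len;
		src += len;
		dst += len;
	}
}

/*
 * bytes/usec equals 10^6 bytes/sec; divide by 1.048576 (2^20 / 10^6) to
 * get MiB/sec, matching the patch's "mb_sec /= (1.024 * 1.024)".
 * The caller must ensure usec > 0, as the patch does before dividing.
 */
static double mib_per_sec(uint64_t bytes, uint64_t usec)
{
	double mb_sec = (double) bytes / (double) usec;

	return mb_sec / (1.024 * 1.024);
}
```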

* Recent changes (master)
@ 2017-12-01 13:00 Jens Axboe
From: Jens Axboe @ 2017-12-01 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 6c3fb04c80c3c241162e743a54761e5e896d4ba2:

  options: correct parser type for max_latency (2017-11-29 22:27:05 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e1c325d25dd977c28c9489c542a51ee05dfc620e:

  io_u: don't account io issue blocks for verify backlog (2017-11-30 21:48:12 -0700)

----------------------------------------------------------------
Jens Axboe (20):
      io_u: use nsec value for buffer scramble
      io_u: tweak small content buffer scramble
      io_u: cleanup check_get_trim()
      io_u: speed up small_content_scramble()
      fio: add check rate flag
      client: ignore a client timeout, if the last thing we saw as a trigger
      server: process connection list before executing trigger
      steadystate: make flags conform to usual fio standard
      Bump support of zones to 256 max
      options: don't overrun bssplit array
      Documentation cleanup
      gettime-thread: fix failure to check setaffinity return value
      backend: make it clear that we passed 'fd' to the new thread
      t/verify-state: fix leak in error case
      engines/dev-dax: fix leak of 'sfile' in error case
      client: fix use-after-free for client timeout
      ioengine: don't account verify bytes
      Documentation: add note about how many bssplit and zones fio supports
      options: warn if we exceed the supported number of split entries
      io_u: don't account io issue blocks for verify backlog

 HOWTO             | 54 ++++++++++++++++++++++++------------------
 backend.c         |  1 +
 client.c          | 16 +++++++++----
 client.h          |  1 +
 engines/dev-dax.c |  1 +
 fio.1             | 11 ++++++---
 fio.h             | 21 ++++++++++------
 gettime-thread.c  |  9 ++++++-
 init.c            |  8 +++++++
 io_u.c            | 71 +++++++++++++++++++++++++++++--------------------------
 ioengines.c       |  6 +++--
 libfio.c          |  2 +-
 options.c         | 15 +++++++-----
 server.c          | 11 +++++----
 stat.c            | 20 ++++++++--------
 steadystate.c     | 52 ++++++++++++++++++++--------------------
 steadystate.h     | 35 +++++++++++++++++----------
 t/verify-state.c  |  2 ++
 thread_options.h  |  2 +-
 19 files changed, 202 insertions(+), 136 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index dc99e99..4caaf54 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1293,6 +1293,9 @@ I/O type
 
 		random_distribution=zoned_abs=60/20G:30/100G:10/500g
 
+	For both **zoned** and **zoned_abs**, fio supports defining up to
+	256 separate zones.
+
 	Similarly to how :option:`bssplit` works for setting ranges and
 	percentages of block sizes. Like :option:`bssplit`, it's possible to
 	specify separate zones for reads, writes, and trims. If just one set
@@ -1388,34 +1391,39 @@ Block size
 
 .. option:: bssplit=str[,str][,str]
 
-	Sometimes you want even finer grained control of the block sizes issued, not
-	just an even split between them.  This option allows you to weight various
-	block sizes, so that you are able to define a specific amount of block sizes
-	issued. The format for this option is::
+	Sometimes you want even finer grained control of the block sizes
+	issued, not just an even split between them.  This option allows you to
+	weight various block sizes, so that you are able to define a specific
+	amount of block sizes issued. The format for this option is::
 
 		bssplit=blocksize/percentage:blocksize/percentage
 
-	for as many block sizes as needed. So if you want to define a workload that
-	has 50% 64k blocks, 10% 4k blocks, and 40% 32k blocks, you would write::
+	for as many block sizes as needed. So if you want to define a workload
+	that has 50% 64k blocks, 10% 4k blocks, and 40% 32k blocks, you would
+	write::
 
 		bssplit=4k/10:64k/50:32k/40
 
-	Ordering does not matter. If the percentage is left blank, fio will fill in
-	the remaining values evenly. So a bssplit option like this one::
+	Ordering does not matter. If the percentage is left blank, fio will
+	fill in the remaining values evenly. So a bssplit option like this one::
 
 		bssplit=4k/50:1k/:32k/
 
-	would have 50% 4k ios, and 25% 1k and 32k ios. The percentages always add up
-	to 100, if bssplit is given a range that adds up to more, it will error out.
+	would have 50% 4k ios, and 25% 1k and 32k ios. The percentages always
+	add up to 100, if bssplit is given a range that adds up to more, it
+	will error out.
 
 	Comma-separated values may be specified for reads, writes, and trims as
 	described in :option:`blocksize`.
 
-	If you want a workload that has 50% 2k reads and 50% 4k reads, while having
-	90% 4k writes and 10% 8k writes, you would specify::
+	If you want a workload that has 50% 2k reads and 50% 4k reads, while
+	having 90% 4k writes and 10% 8k writes, you would specify::
 
 		bssplit=2k/50:4k/50,4k/90,8k/10
 
+	Fio supports defining up to 64 different weights for each data
+	direction.
+
 .. option:: blocksize_unaligned, bs_unaligned
 
 	If set, fio will issue I/O units with any size within
@@ -2950,20 +2958,20 @@ Measurements and reporting
 
 .. option:: percentile_list=float_list
 
-	Overwrite the default list of percentiles for completion latencies and the
-	block error histogram.  Each number is a floating number in the range
-	(0,100], and the maximum length of the list is 20. Use ``:`` to separate the
-	numbers, and list the numbers in ascending order. For example,
-	``--percentile_list=99.5:99.9`` will cause fio to report the values of
-	completion latency below which 99.5% and 99.9% of the observed latencies
-	fell, respectively.
+	Overwrite the default list of percentiles for completion latencies and
+	the block error histogram.  Each number is a floating number in the
+	range (0,100], and the maximum length of the list is 20. Use ``:`` to
+	separate the numbers, and list the numbers in ascending order. For
+	example, ``--percentile_list=99.5:99.9`` will cause fio to report the
+	values of completion latency below which 99.5% and 99.9% of the observed
+	latencies fell, respectively.
 
 .. option:: significant_figures=int
 
-	If using :option:`--output-format` of `normal`, set the significant figures 
-	to this	value. Higher values will yield more precise IOPS and throughput 
-	units, while lower values will round. Requires a minimum value of 1 and a 
-	maximum value of 10. Defaults to 4.
+	If using :option:`--output-format` of `normal`, set the significant
+	figures to this	value. Higher values will yield more precise IOPS and
+	throughput units, while lower values will round. Requires a minimum
+	value of 1 and a maximum value of 10. Defaults to 4.
 
 
 Error handling
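The bssplit text in the HOWTO hunk above describes a weighting rule. Blank percentages are filled evenly from whatever is left of 100, and totals over 100 are rejected. The minimal C sketch below shows that rule. It is not fio's parser: `struct bs_weight`, `fill_blank_percentages()` and `pick_bs()` are hypothetical names. Giving any integer-division remainder to the last blank entry is an assumption, made so that the filled-in total comes out to exactly 100.

```c
#include <stdint.h>

/* Hypothetical entry: a block size and its weight; perc < 0 means "left blank". */
struct bs_weight {
	uint32_t bs;
	int perc;
};

/*
 * Fill blank percentages evenly from what remains of 100.
 * Returns 0 on success, -1 if the explicit percentages exceed 100.
 */
static int fill_blank_percentages(struct bs_weight *w, int nr)
{
	int i, total = 0, blanks = 0;

	for (i = 0; i < nr; i++) {
		if (w[i].perc < 0)
			blanks++;
		else
			total += w[i].perc;
	}
	if (total > 100)
		return -1;

	if (blanks) {
		int share = (100 - total) / blanks;
		int extra = (100 - total) % blanks;

		for (i = 0; i < nr; i++) {
			if (w[i].perc >= 0)
				continue;
			w[i].perc = share;
			if (--blanks == 0)	/* last blank takes the remainder */
				w[i].perc += extra;
		}
	}
	return 0;
}

/* Map a uniform draw r in [0, 100) to a block size via cumulative weights. */
static uint32_t pick_bs(const struct bs_weight *w, int nr, unsigned int r)
{
	unsigned int cum = 0;
	int i;

	for (i = 0; i < nr; i++) {
		cum += (unsigned int) w[i].perc;
		if (r < cum)
			return w[i].bs;
	}
	return w[nr - 1].bs;
}
```

Take the HOWTO's `bssplit=4k/50:1k/:32k/` example. The two blank entries each come out at 25, and a uniform draw in [0, 100) then selects 4k half of the time.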
diff --git a/backend.c b/backend.c
index 7cf9b38..10eb90e 100644
--- a/backend.c
+++ b/backend.c
@@ -2318,6 +2318,7 @@ reap:
 					nr_started--;
 					break;
 				}
+				fd = NULL;
 				ret = pthread_detach(td->thread);
 				if (ret)
 					log_err("pthread_detach: %s",
diff --git a/client.c b/client.c
index 11fa262..2b136a0 100644
--- a/client.c
+++ b/client.c
@@ -961,7 +961,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->ss_deviation.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_deviation.u.i));
 	dst->ss_criterion.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_criterion.u.i));
 
-	if (dst->ss_state & __FIO_SS_DATA) {
+	if (dst->ss_state & FIO_SS_DATA) {
 		for (i = 0; i < dst->ss_dur; i++ ) {
 			dst->ss_iops_data[i] = le64_to_cpu(src->ss_iops_data[i]);
 			dst->ss_bw_data[i] = le64_to_cpu(src->ss_bw_data[i]);
@@ -1666,6 +1666,8 @@ int fio_handle_client(struct fio_client *client)
 	dprint(FD_NET, "client: got cmd op %s from %s (pdu=%u)\n",
 		fio_server_op(cmd->opcode), client->hostname, cmd->pdu_len);
 
+	client->last_cmd = cmd->opcode;
+
 	switch (cmd->opcode) {
 	case FIO_NET_CMD_QUIT:
 		if (ops->quit)
@@ -1689,7 +1691,7 @@ int fio_handle_client(struct fio_client *client)
 		struct cmd_ts_pdu *p = (struct cmd_ts_pdu *) cmd->payload;
 
 		dprint(FD_NET, "client: ts->ss_state = %u\n", (unsigned int) le32_to_cpu(p->ts.ss_state));
-		if (le32_to_cpu(p->ts.ss_state) & __FIO_SS_DATA) {
+		if (le32_to_cpu(p->ts.ss_state) & FIO_SS_DATA) {
 			dprint(FD_NET, "client: received steadystate ring buffers\n");
 
 			size = le64_to_cpu(p->ts.ss_dur);
@@ -1901,16 +1903,19 @@ static int client_check_cmd_timeout(struct fio_client *client,
 	int ret = 0;
 
 	flist_for_each_safe(entry, tmp, &client->cmd_list) {
+		unsigned int op;
+
 		reply = flist_entry(entry, struct fio_net_cmd_reply, list);
 
 		if (mtime_since(&reply->ts, now) < FIO_NET_CLIENT_TIMEOUT)
 			continue;
 
+		op = reply->opcode;
 		if (!handle_cmd_timeout(client, reply))
 			continue;
 
 		log_err("fio: client %s, timeout on cmd %s\n", client->hostname,
-						fio_server_op(reply->opcode));
+						fio_server_op(op));
 		ret = 1;
 	}
 
@@ -1940,7 +1945,10 @@ static int fio_check_clients_timed_out(void)
 		else
 			log_err("fio: client %s timed out\n", client->hostname);
 
-		client->error = ETIMEDOUT;
+		if (client->last_cmd != FIO_NET_CMD_VTRIGGER)
+			client->error = ETIMEDOUT;
+		else
+			log_info("fio: ignoring timeout due to vtrigger\n");
 		remove_client(client);
 		ret = 1;
 	}
diff --git a/client.h b/client.h
index 394b685..90082a3 100644
--- a/client.h
+++ b/client.h
@@ -39,6 +39,7 @@ struct fio_client {
 	int port;
 	int fd;
 	unsigned int refs;
+	unsigned int last_cmd;
 
 	char *name;
 
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index 235a31e..b1f91a4 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -307,6 +307,7 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 	if (rc < 0) {
 		log_err("%s: fscanf on %s failed (%s)\n",
 			td->o.name, spath, strerror(errno));
+		fclose(sfile);
 		return 1;
 	}
 
diff --git a/fio.1 b/fio.1
index 01b4db6..54d1b0f 100644
--- a/fio.1
+++ b/fio.1
@@ -1090,6 +1090,9 @@ we can define an absolute zoning distribution with:
 random_distribution=zoned:60/10:30/20:8/30:2/40
 .RE
 .P
+For both \fBzoned\fR and \fBzoned_abs\fR, fio supports defining up to 256
+separate zones.
+.P
 Similarly to how \fBbssplit\fR works for setting ranges and percentages
 of block sizes. Like \fBbssplit\fR, it's possible to specify separate
 zones for reads, writes, and trims. If just one set is given, it'll apply to
@@ -1219,6 +1222,8 @@ If you want a workload that has 50% 2k reads and 50% 4k reads, while having
 .P
 bssplit=2k/50:4k/50,4k/90,8k/10
 .RE
+.P
+Fio supports defining up to 64 different weights for each data direction.
 .RE
 .TP
 .BI blocksize_unaligned "\fR,\fB bs_unaligned"
@@ -2639,9 +2644,9 @@ completion latency below which 99.5% and 99.9% of the observed latencies
 fell, respectively.
 .TP
 .BI significant_figures \fR=\fPint
-If using \fB\-\-output\-format\fR of `normal', set the significant figures 
-to this value. Higher values will yield more precise IOPS and throughput 
-units, while lower values will round. Requires a minimum value of 1 and a 
+If using \fB\-\-output\-format\fR of `normal', set the significant figures
+to this value. Higher values will yield more precise IOPS and throughput
+units, while lower values will round. Requires a minimum value of 1 and a
 maximum value of 10. Defaults to 4.
 .SS "Error handling"
 .TP
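The percentile_list wording describes reporting the completion latency below which a given percentage of samples fell. The rough illustration below computes a nearest-rank percentile from raw samples. fio itself derives percentiles from its bucketed latency histogram rather than from a sorted array. Treat `percentile_nearest_rank()` as a hypothetical stand-in that shows the meaning of the number, not fio's algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *) a;
	uint64_t y = *(const uint64_t *) b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile: the smallest sample such that at least p% of
 * the samples are <= it. p is in (0, 100] and n must be > 0.
 * Sorts lat in place.
 */
static uint64_t percentile_nearest_rank(uint64_t *lat, size_t n, double p)
{
	double exact = p / 100.0 * (double) n;
	size_t rank = (size_t) exact;

	if ((double) rank < exact)	/* round the rank up */
		rank++;
	if (rank == 0)
		rank = 1;
	if (rank > n)
		rank = n;

	qsort(lat, n, sizeof(*lat), cmp_u64);
	return lat[rank - 1];
}
```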
diff --git a/fio.h b/fio.h
index a44f1aa..6b184c2 100644
--- a/fio.h
+++ b/fio.h
@@ -88,6 +88,7 @@ enum {
 	__TD_F_REGROW_LOGS,
 	__TD_F_MMAP_KEEP,
 	__TD_F_DIRS_CREATED,
+	__TD_F_CHECK_RATE,
 	__TD_F_LAST,		/* not a real bit, keep last */
 };
 
@@ -108,6 +109,7 @@ enum {
 	TD_F_REGROW_LOGS	= 1U << __TD_F_REGROW_LOGS,
 	TD_F_MMAP_KEEP		= 1U << __TD_F_MMAP_KEEP,
 	TD_F_DIRS_CREATED	= 1U << __TD_F_DIRS_CREATED,
+	TD_F_CHECK_RATE		= 1U << __TD_F_CHECK_RATE,
 };
 
 enum {
@@ -610,8 +612,8 @@ enum {
 	TD_NR,
 };
 
-#define TD_ENG_FLAG_SHIFT	16
-#define TD_ENG_FLAG_MASK	((1U << 16) - 1)
+#define TD_ENG_FLAG_SHIFT	17
+#define TD_ENG_FLAG_MASK	((1U << 17) - 1)
 
 static inline void td_set_ioengine_flags(struct thread_data *td)
 {
@@ -700,8 +702,7 @@ static inline bool fio_fill_issue_time(struct thread_data *td)
 	return false;
 }
 
-static inline bool __should_check_rate(struct thread_data *td,
-				       enum fio_ddir ddir)
+static inline bool option_check_rate(struct thread_data *td, enum fio_ddir ddir)
 {
 	struct thread_options *o = &td->o;
 
@@ -715,13 +716,19 @@ static inline bool __should_check_rate(struct thread_data *td,
 	return false;
 }
 
+static inline bool __should_check_rate(struct thread_data *td,
+				       enum fio_ddir ddir)
+{
+	return (td->flags & TD_F_CHECK_RATE) != 0;
+}
+
 static inline bool should_check_rate(struct thread_data *td)
 {
-	if (td->bytes_done[DDIR_READ] && __should_check_rate(td, DDIR_READ))
+	if (__should_check_rate(td, DDIR_READ) && td->bytes_done[DDIR_READ])
 		return true;
-	if (td->bytes_done[DDIR_WRITE] && __should_check_rate(td, DDIR_WRITE))
+	if (__should_check_rate(td, DDIR_WRITE) && td->bytes_done[DDIR_WRITE])
 		return true;
-	if (td->bytes_done[DDIR_TRIM] && __should_check_rate(td, DDIR_TRIM))
+	if (__should_check_rate(td, DDIR_TRIM) && td->bytes_done[DDIR_TRIM])
 		return true;
 
 	return false;
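The fio.h and init.c hunks move the "is any rate limit configured?" decision out of the I/O path. init_flags() now evaluates option_check_rate() once per data direction and caches the result as TD_F_CHECK_RATE. should_check_rate() then tests that flag before looking at bytes_done. The sketch below shows the same pattern in isolation. Every `sk_` name is hypothetical, and a single `rate` field stands in for the rate options that option_check_rate() inspects, since the hunk does not show its full body.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SK_DDIR_READ, SK_DDIR_WRITE, SK_DDIR_TRIM, SK_DDIR_CNT };

#define SK_F_CHECK_RATE	(1U << 0)

/* Hypothetical, much reduced stand-in for struct thread_data. */
struct sk_td {
	unsigned int flags;
	uint64_t rate[SK_DDIR_CNT];		/* configured limit, 0 = none */
	uint64_t bytes_done[SK_DDIR_CNT];
};

/* Slow path: inspect the options. */
static bool sk_option_check_rate(const struct sk_td *td, int ddir)
{
	return td->rate[ddir] != 0;
}

/* Run once at setup: cache "some direction is rate limited" as a flag. */
static void sk_init_flags(struct sk_td *td)
{
	for (int i = 0; i < SK_DDIR_CNT; i++) {
		if (sk_option_check_rate(td, i)) {
			td->flags |= SK_F_CHECK_RATE;
			break;
		}
	}
}

/* Hot path: the cheap flag test comes first, then the byte counters. */
static bool sk_should_check_rate(const struct sk_td *td)
{
	if (!(td->flags & SK_F_CHECK_RATE))
		return false;
	for (int i = 0; i < SK_DDIR_CNT; i++)
		if (td->bytes_done[i])
			return true;
	return false;
}
```

Adding a thread flag also grows `__TD_F_LAST`. That is why the same hunk bumps TD_ENG_FLAG_SHIFT from 16 to 17, so that the `__TD_F_LAST <= TD_ENG_FLAG_SHIFT` assertion in libfio.c still holds.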
diff --git a/gettime-thread.c b/gettime-thread.c
index cbb81dc..fc52236 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -42,10 +42,17 @@ struct gtod_cpu_data {
 static void *gtod_thread_main(void *data)
 {
 	struct fio_mutex *mutex = data;
+	int ret;
+
+	ret = fio_setaffinity(gettid(), fio_gtod_cpumask);
 
-	fio_setaffinity(gettid(), fio_gtod_cpumask);
 	fio_mutex_up(mutex);
 
+	if (ret == -1) {
+		log_err("gtod: setaffinity failed\n");
+		return NULL;
+	}
+
 	/*
 	 * As long as we have jobs around, update the clock. It would be nice
 	 * to have some way of NOT hammering that CPU with gettimeofday(),
diff --git a/init.c b/init.c
index 7c16b06..607f7e0 100644
--- a/init.c
+++ b/init.c
@@ -1112,6 +1112,7 @@ int ioengine_load(struct thread_data *td)
 static void init_flags(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
+	int i;
 
 	if (o->verify_backlog)
 		td->flags |= TD_F_VER_BACKLOG;
@@ -1141,6 +1142,13 @@ static void init_flags(struct thread_data *td)
 
 	if (o->mem_type == MEM_CUDA_MALLOC)
 		td->flags &= ~TD_F_SCRAMBLE_BUFFERS;
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		if (option_check_rate(td, i)) {
+			td->flags |= TD_F_CHECK_RATE;
+			break;
+		}
+	}
 }
 
 static int setup_random_seeds(struct thread_data *td)
diff --git a/io_u.c b/io_u.c
index ebe82e1..44933a1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1615,22 +1615,19 @@ static bool check_get_trim(struct thread_data *td, struct io_u *io_u)
 {
 	if (!(td->flags & TD_F_TRIM_BACKLOG))
 		return false;
+	if (!td->trim_entries)
+		return false;
 
-	if (td->trim_entries) {
-		int get_trim = 0;
-
-		if (td->trim_batch) {
-			td->trim_batch--;
-			get_trim = 1;
-		} else if (!(td->io_hist_len % td->o.trim_backlog) &&
-			 td->last_ddir != DDIR_READ) {
-			td->trim_batch = td->o.trim_batch;
-			if (!td->trim_batch)
-				td->trim_batch = td->o.trim_backlog;
-			get_trim = 1;
-		}
-
-		if (get_trim && get_next_trim(td, io_u))
+	if (td->trim_batch) {
+		td->trim_batch--;
+		if (get_next_trim(td, io_u))
+			return true;
+	} else if (!(td->io_hist_len % td->o.trim_backlog) &&
+		     td->last_ddir != DDIR_READ) {
+		td->trim_batch = td->o.trim_batch;
+		if (!td->trim_batch)
+			td->trim_batch = td->o.trim_backlog;
+		if (get_next_trim(td, io_u))
 			return true;
 	}
 
@@ -1672,35 +1669,40 @@ static bool check_get_verify(struct thread_data *td, struct io_u *io_u)
  */
 static void small_content_scramble(struct io_u *io_u)
 {
-	unsigned int i, nr_blocks = io_u->buflen / 512;
-	uint64_t boffset, usec;
+	unsigned int i, nr_blocks = io_u->buflen >> 9;
 	unsigned int offset;
-	char *p, *end;
+	uint64_t boffset, *iptr;
+	char *p;
 
 	if (!nr_blocks)
 		return;
 
 	p = io_u->xfer_buf;
 	boffset = io_u->offset;
-	io_u->buf_filled_len = 0;
 
-	/* close enough for this purpose */
-	usec = io_u->start_time.tv_nsec >> 10;
+	if (io_u->buf_filled_len)
+		io_u->buf_filled_len = 0;
+
+	/*
+	 * Generate random index between 0..7. We do chunks of 512b, if
+	 * we assume a cacheline is 64 bytes, then we have 8 of those.
+	 * Scramble content within the blocks in the same cacheline to
+	 * speed things up.
+	 */
+	offset = (io_u->start_time.tv_nsec ^ boffset) & 7;
 
 	for (i = 0; i < nr_blocks; i++) {
 		/*
-		 * Fill the byte offset into a "random" start offset of
-		 * the buffer, given by the product of the usec time
-		 * and the actual offset.
+		 * Fill offset into start of cacheline, time into end
+		 * of cacheline
 		 */
-		offset = (usec ^ boffset) & 511;
-		offset &= ~(sizeof(uint64_t) - 1);
-		if (offset >= 512 - sizeof(uint64_t))
-			offset -= sizeof(uint64_t);
-		memcpy(p + offset, &boffset, sizeof(boffset));
-
-		end = p + 512 - sizeof(io_u->start_time);
-		memcpy(end, &io_u->start_time, sizeof(io_u->start_time));
+		iptr = (void *) p + (offset << 6);
+		*iptr = boffset;
+
+		iptr = (void *) p + 64 - 2 * sizeof(uint64_t);
+		iptr[0] = io_u->start_time.tv_sec;
+		iptr[1] = io_u->start_time.tv_nsec;
+
 		p += 512;
 		boffset += 512;
 	}
@@ -1975,11 +1977,12 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		int ret;
 
 		td->io_blocks[ddir]++;
-		td->this_io_blocks[ddir]++;
 		td->io_bytes[ddir] += bytes;
 
-		if (!(io_u->flags & IO_U_F_VER_LIST))
+		if (!(io_u->flags & IO_U_F_VER_LIST)) {
+			td->this_io_blocks[ddir]++;
 			td->this_io_bytes[ddir] += bytes;
+		}
 
 		if (ddir == DDIR_WRITE)
 			file_log_write_comp(td, f, io_u->offset, bytes);
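The reworked small_content_scramble() makes several changes. It replaces the `/ 512` with a shift and picks one 64-byte cacheline index per I/O from `(tv_nsec ^ offset) & 7`. Within each 512-byte block, it writes the block's offset at the start of that cacheline and the start time (seconds, then nanoseconds) into the last 16 bytes of the block's first cacheline. The standalone sketch below reproduces that layout. `scramble_sketch()` and its parameters are hypothetical, and it uses memcpy() instead of the hunk's pointer casts so that it does not depend on buffer alignment.

```c
#include <stdint.h>
#include <string.h>

static void scramble_sketch(char *buf, unsigned int buflen, uint64_t boffset,
			    uint64_t sec, uint64_t nsec)
{
	unsigned int i, nr_blocks = buflen >> 9;	/* buflen / 512, no divide */
	unsigned int line = (unsigned int) ((nsec ^ boffset) & 7); /* 1 of 8 lines */
	char *p = buf;

	for (i = 0; i < nr_blocks; i++) {
		/* block offset at the start of the chosen cacheline */
		memcpy(p + (line << 6), &boffset, sizeof(boffset));

		/* time stamp in the last 16 bytes of the first cacheline */
		memcpy(p + 64 - 2 * sizeof(uint64_t), &sec, sizeof(sec));
		memcpy(p + 64 - sizeof(uint64_t), &nsec, sizeof(nsec));

		p += 512;
		boffset += 512;
	}
}
```

Even when the chosen index is 0, the offset occupies bytes 0..7 and the time stamp occupies bytes 48..63, so the two writes never overlap.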
diff --git a/ioengines.c b/ioengines.c
index 02eaee8..7951ff3 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -309,8 +309,10 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (ddir_rw(ddir)) {
-		td->io_issues[ddir]++;
-		td->io_issue_bytes[ddir] += buflen;
+		if (!(io_u->flags & IO_U_F_VER_LIST)) {
+			td->io_issues[ddir]++;
+			td->io_issue_bytes[ddir] += buflen;
+		}
 		td->rate_io_issue_bytes[ddir] += buflen;
 	}
 
diff --git a/libfio.c b/libfio.c
index c9bb8f3..74de735 100644
--- a/libfio.c
+++ b/libfio.c
@@ -366,7 +366,7 @@ int initialize_fio(char *envp[])
 	compiletime_assert((offsetof(struct jobs_eta, m_rate) % 8) == 0, "m_rate");
 
 	compiletime_assert(__TD_F_LAST <= TD_ENG_FLAG_SHIFT, "TD_ENG_FLAG_SHIFT");
-	compiletime_assert(BSSPLIT_MAX == ZONESPLIT_MAX, "bsssplit/zone max");
+	compiletime_assert(BSSPLIT_MAX <= ZONESPLIT_MAX, "bsssplit/zone max");
 
 	err = endian_check();
 	if (err) {
diff --git a/options.c b/options.c
index a224e7b..3fa646c 100644
--- a/options.c
+++ b/options.c
@@ -61,7 +61,8 @@ struct split {
 };
 
 static int split_parse_ddir(struct thread_options *o, struct split *split,
-			    enum fio_ddir ddir, char *str, bool absolute)
+			    enum fio_ddir ddir, char *str, bool absolute,
+			    unsigned int max_splits)
 {
 	unsigned long long perc;
 	unsigned int i;
@@ -109,8 +110,10 @@ static int split_parse_ddir(struct thread_options *o, struct split *split,
 		split->val1[i] = val;
 		split->val2[i] = perc;
 		i++;
-		if (i == ZONESPLIT_MAX)
+		if (i == max_splits) {
+			log_err("fio: hit max of %d split entries\n", i);
 			break;
+		}
 	}
 
 	split->nr = i;
@@ -126,7 +129,7 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str, data))
+	if (split_parse_ddir(o, &split, ddir, str, data, BSSPLIT_MAX))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -846,7 +849,7 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str, absolute))
+	if (split_parse_ddir(o, &split, ddir, str, absolute, ZONESPLIT_MAX))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -1127,9 +1130,9 @@ static int str_steadystate_cb(void *data, const char *str)
 		if (parse_dryrun())
 			return 0;
 
-		td->o.ss_state |= __FIO_SS_PCT;
+		td->o.ss_state |= FIO_SS_PCT;
 		td->o.ss_limit.u.f = val;
-	} else if (td->o.ss_state & __FIO_SS_IOPS) {
+	} else if (td->o.ss_state & FIO_SS_IOPS) {
 		if (!str_to_float(nr, &val, 0)) {
 			log_err("fio: steadystate IOPS threshold postfix parsing failed\n");
 			free(nr);
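In the options.c hunks, split_parse_ddir() gains a `max_splits` cap, set to BSSPLIT_MAX for bssplit and ZONESPLIT_MAX for zoned distributions, and logs an error when it hits that cap. The simplified, hypothetical `parse_splits()` below parses `val/perc:val/perc:...` under a cap. It accepts plain integers only (no `k`/`G` suffixes) and stores a blank percentage as -1. One deliberate difference from the hunk: it warns only when entries are actually dropped, whereas the hunk logs as soon as the counter reaches the cap.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse "val/perc:val/perc:..." into parallel arrays, keeping at most
 * max_splits entries. A blank percentage is stored as -1.
 * Returns the number of entries kept, or -1 on a malformed entry.
 */
static int parse_splits(const char *str, unsigned long long *vals, int *percs,
			unsigned int max_splits)
{
	const char *p = str;
	unsigned int i = 0;

	while (*p) {
		char *end;

		if (i == max_splits) {
			fprintf(stderr, "hit max of %u split entries\n", i);
			break;
		}

		vals[i] = strtoull(p, &end, 10);
		if (end == p || *end != '/')
			return -1;
		p = end + 1;

		if (*p == ':' || *p == '\0') {
			percs[i] = -1;		/* blank: filled in later */
		} else {
			long perc = strtol(p, &end, 10);

			if (end == p || perc < 0 || perc > 100)
				return -1;
			percs[i] = (int) perc;
			p = end;
		}
		i++;

		if (*p == ':')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return (int) i;
}
```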
diff --git a/server.c b/server.c
index 967cebe..76d662d 100644
--- a/server.c
+++ b/server.c
@@ -616,7 +616,7 @@ static int fio_net_queue_quit(void)
 {
 	dprint(FD_NET, "server: sending quit\n");
 
-	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, NULL, SK_F_SIMPLE);
+	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, NULL, SK_F_SIMPLE | SK_F_INLINE);
 }
 
 int fio_net_send_quit(int sk)
@@ -636,7 +636,7 @@ static int fio_net_send_ack(struct fio_net_cmd *cmd, int error, int signal)
 
 	epdu.error = __cpu_to_le32(error);
 	epdu.signal = __cpu_to_le32(signal);
-	return fio_net_queue_cmd(FIO_NET_CMD_STOP, &epdu, sizeof(epdu), &tag, SK_F_COPY);
+	return fio_net_queue_cmd(FIO_NET_CMD_STOP, &epdu, sizeof(epdu), &tag, SK_F_COPY | SK_F_INLINE);
 }
 
 static int fio_net_queue_stop(int error, int signal)
@@ -951,7 +951,7 @@ static int handle_update_job_cmd(struct fio_net_cmd *cmd)
 	return 0;
 }
 
-static int handle_trigger_cmd(struct fio_net_cmd *cmd)
+static int handle_trigger_cmd(struct fio_net_cmd *cmd, struct flist_head *job_list)
 {
 	struct cmd_vtrigger_pdu *pdu = (struct cmd_vtrigger_pdu *) cmd->payload;
 	char *buf = (char *) pdu->cmd;
@@ -971,6 +971,7 @@ static int handle_trigger_cmd(struct fio_net_cmd *cmd)
 		fio_net_queue_cmd(FIO_NET_CMD_VTRIGGER, rep, sz, NULL, SK_F_FREE | SK_F_INLINE);
 
 	fio_terminate_threads(TERMINATE_ALL);
+	fio_server_check_jobs(job_list);
 	exec_trigger(buf);
 	return 0;
 }
@@ -1014,7 +1015,7 @@ static int handle_command(struct sk_out *sk_out, struct flist_head *job_list,
 		ret = handle_update_job_cmd(cmd);
 		break;
 	case FIO_NET_CMD_VTRIGGER:
-		ret = handle_trigger_cmd(cmd);
+		ret = handle_trigger_cmd(cmd, job_list);
 		break;
 	case FIO_NET_CMD_SENDFILE: {
 		struct cmd_sendfile_reply *in;
@@ -1555,7 +1556,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	convert_gs(&p.rs, rs);
 
 	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
-	if (ts->ss_state & __FIO_SS_DATA) {
+	if (ts->ss_state & FIO_SS_DATA) {
 		dprint(FD_NET, "server sending steadystate ring buffers\n");
 
 		ss_buf = malloc(sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t));
diff --git a/stat.c b/stat.c
index 48d8e7d..863aa45 100644
--- a/stat.c
+++ b/stat.c
@@ -743,12 +743,12 @@ static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
 	p2 = num2str(iops_mean, ts->sig_figs, 1, 0, N2S_NONE);
 
 	log_buf(out, "  steadystate  : attained=%s, bw=%s (%s), iops=%s, %s%s=%.3f%s\n",
-		ts->ss_state & __FIO_SS_ATTAINED ? "yes" : "no",
+		ts->ss_state & FIO_SS_ATTAINED ? "yes" : "no",
 		p1, p1alt, p2,
-		ts->ss_state & __FIO_SS_IOPS ? "iops" : "bw",
-		ts->ss_state & __FIO_SS_SLOPE ? " slope": " mean dev",
+		ts->ss_state & FIO_SS_IOPS ? "iops" : "bw",
+		ts->ss_state & FIO_SS_SLOPE ? " slope": " mean dev",
 		ts->ss_criterion.u.f,
-		ts->ss_state & __FIO_SS_PCT ? "%" : "");
+		ts->ss_state & FIO_SS_PCT ? "%" : "");
 
 	free(p1);
 	free(p1alt);
@@ -1353,19 +1353,19 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		char ss_buf[64];
 
 		snprintf(ss_buf, sizeof(ss_buf), "%s%s:%f%s",
-			ts->ss_state & __FIO_SS_IOPS ? "iops" : "bw",
-			ts->ss_state & __FIO_SS_SLOPE ? "_slope" : "",
+			ts->ss_state & FIO_SS_IOPS ? "iops" : "bw",
+			ts->ss_state & FIO_SS_SLOPE ? "_slope" : "",
 			(float) ts->ss_limit.u.f,
-			ts->ss_state & __FIO_SS_PCT ? "%" : "");
+			ts->ss_state & FIO_SS_PCT ? "%" : "");
 
 		tmp = json_create_object();
 		json_object_add_value_object(root, "steadystate", tmp);
 		json_object_add_value_string(tmp, "ss", ss_buf);
 		json_object_add_value_int(tmp, "duration", (int)ts->ss_dur);
-		json_object_add_value_int(tmp, "attained", (ts->ss_state & __FIO_SS_ATTAINED) > 0);
+		json_object_add_value_int(tmp, "attained", (ts->ss_state & FIO_SS_ATTAINED) > 0);
 
 		snprintf(ss_buf, sizeof(ss_buf), "%f%s", (float) ts->ss_criterion.u.f,
-			ts->ss_state & __FIO_SS_PCT ? "%" : "");
+			ts->ss_state & FIO_SS_PCT ? "%" : "");
 		json_object_add_value_string(tmp, "criterion", ss_buf);
 		json_object_add_value_float(tmp, "max_deviation", ts->ss_deviation.u.f);
 		json_object_add_value_float(tmp, "slope", ts->ss_slope.u.f);
@@ -1381,7 +1381,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		** otherwise it actually points to the second element
 		** in the list
 		*/
-		if ((ts->ss_state & __FIO_SS_ATTAINED) || !(ts->ss_state & __FIO_SS_BUFFER_FULL))
+		if ((ts->ss_state & FIO_SS_ATTAINED) || !(ts->ss_state & FIO_SS_BUFFER_FULL))
 			j = ts->ss_head;
 		else
 			j = ts->ss_head == 0 ? ts->ss_dur - 1 : ts->ss_head - 1;
diff --git a/steadystate.c b/steadystate.c
index 45d4f5d..05ce029 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -11,7 +11,7 @@ static void steadystate_alloc(struct thread_data *td)
 	td->ss.bw_data = calloc(td->ss.dur, sizeof(uint64_t));
 	td->ss.iops_data = calloc(td->ss.dur, sizeof(uint64_t));
 
-	td->ss.state |= __FIO_SS_DATA;
+	td->ss.state |= FIO_SS_DATA;
 }
 
 void steadystate_setup(void)
@@ -63,33 +63,33 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 	ss->bw_data[ss->tail] = bw;
 	ss->iops_data[ss->tail] = iops;
 
-	if (ss->state & __FIO_SS_IOPS)
+	if (ss->state & FIO_SS_IOPS)
 		new_val = iops;
 	else
 		new_val = bw;
 
-	if (ss->state & __FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
-		if (!(ss->state & __FIO_SS_BUFFER_FULL)) {
+	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+		if (!(ss->state & FIO_SS_BUFFER_FULL)) {
 			/* first time through */
 			for(i = 0, ss->sum_y = 0; i < ss->dur; i++) {
-				if (ss->state & __FIO_SS_IOPS)
+				if (ss->state & FIO_SS_IOPS)
 					ss->sum_y += ss->iops_data[i];
 				else
 					ss->sum_y += ss->bw_data[i];
 				j = (ss->head + i) % ss->dur;
-				if (ss->state & __FIO_SS_IOPS)
+				if (ss->state & FIO_SS_IOPS)
 					ss->sum_xy += i * ss->iops_data[j];
 				else
 					ss->sum_xy += i * ss->bw_data[j];
 			}
-			ss->state |= __FIO_SS_BUFFER_FULL;
+			ss->state |= FIO_SS_BUFFER_FULL;
 		} else {		/* easy to update the sums */
 			ss->sum_y -= ss->oldest_y;
 			ss->sum_y += new_val;
 			ss->sum_xy = ss->sum_xy - ss->sum_y + ss->dur * new_val;
 		}
 
-		if (ss->state & __FIO_SS_IOPS)
+		if (ss->state & FIO_SS_IOPS)
 			ss->oldest_y = ss->iops_data[ss->head];
 		else
 			ss->oldest_y = ss->bw_data[ss->head];
@@ -102,7 +102,7 @@ static bool steadystate_slope(uint64_t iops, uint64_t bw,
 		 */
 		ss->slope = (ss->sum_xy - (double) ss->sum_x * ss->sum_y / ss->dur) /
 				(ss->sum_x_sq - (double) ss->sum_x * ss->sum_x / ss->dur);
-		if (ss->state & __FIO_SS_PCT)
+		if (ss->state & FIO_SS_PCT)
 			ss->criterion = 100.0 * ss->slope / (ss->sum_y / ss->dur);
 		else
 			ss->criterion = ss->slope;
@@ -137,24 +137,24 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 	ss->bw_data[ss->tail] = bw;
 	ss->iops_data[ss->tail] = iops;
 
-	if (ss->state & __FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
-		if (!(ss->state & __FIO_SS_BUFFER_FULL)) {
+	if (ss->state & FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+		if (!(ss->state & FIO_SS_BUFFER_FULL)) {
 			/* first time through */
 			for(i = 0, ss->sum_y = 0; i < ss->dur; i++)
-				if (ss->state & __FIO_SS_IOPS)
+				if (ss->state & FIO_SS_IOPS)
 					ss->sum_y += ss->iops_data[i];
 				else
 					ss->sum_y += ss->bw_data[i];
-			ss->state |= __FIO_SS_BUFFER_FULL;
+			ss->state |= FIO_SS_BUFFER_FULL;
 		} else {		/* easy to update the sum */
 			ss->sum_y -= ss->oldest_y;
-			if (ss->state & __FIO_SS_IOPS)
+			if (ss->state & FIO_SS_IOPS)
 				ss->sum_y += ss->iops_data[ss->tail];
 			else
 				ss->sum_y += ss->bw_data[ss->tail];
 		}
 
-		if (ss->state & __FIO_SS_IOPS)
+		if (ss->state & FIO_SS_IOPS)
 			ss->oldest_y = ss->iops_data[ss->head];
 		else
 			ss->oldest_y = ss->bw_data[ss->head];
@@ -163,14 +163,14 @@ static bool steadystate_deviation(uint64_t iops, uint64_t bw,
 		ss->deviation = 0.0;
 
 		for (i = 0; i < ss->dur; i++) {
-			if (ss->state & __FIO_SS_IOPS)
+			if (ss->state & FIO_SS_IOPS)
 				diff = ss->iops_data[i] - mean;
 			else
 				diff = ss->bw_data[i] - mean;
 			ss->deviation = max(ss->deviation, diff * (diff < 0.0 ? -1.0 : 1.0));
 		}
 
-		if (ss->state & __FIO_SS_PCT)
+		if (ss->state & FIO_SS_PCT)
 			ss->criterion = 100.0 * ss->deviation / mean;
 		else
 			ss->criterion = ss->deviation;
@@ -207,7 +207,7 @@ void steadystate_check(void)
 
 		if (!ss->dur || td->runstate <= TD_SETTING_UP ||
 		    td->runstate >= TD_EXITED || !ss->state ||
-		    ss->state & __FIO_SS_ATTAINED)
+		    ss->state & FIO_SS_ATTAINED)
 			continue;
 
 		td_iops = 0;
@@ -221,13 +221,13 @@ void steadystate_check(void)
 		prev_groupid = td->groupid;
 
 		fio_gettime(&now, NULL);
-		if (ss->ramp_time && !(ss->state & __FIO_SS_RAMP_OVER)) {
+		if (ss->ramp_time && !(ss->state & FIO_SS_RAMP_OVER)) {
 			/*
 			 * Begin recording data one second after ss->ramp_time
 			 * has elapsed
 			 */
 			if (utime_since(&td->epoch, &now) >= (ss->ramp_time + 1000000L))
-				ss->state |= __FIO_SS_RAMP_OVER;
+				ss->state |= FIO_SS_RAMP_OVER;
 		}
 
 		td_io_u_lock(td);
@@ -247,7 +247,7 @@ void steadystate_check(void)
 		 * prev_iops/bw the first time through after ss->ramp_time
 		 * is done.
 		 */
-		if (ss->state & __FIO_SS_RAMP_OVER) {
+		if (ss->state & FIO_SS_RAMP_OVER) {
 			group_bw += 1000 * (td_bytes - ss->prev_bytes) / rate_time;
 			group_iops += 1000 * (td_iops - ss->prev_iops) / rate_time;
 			++group_ramp_time_over;
@@ -255,7 +255,7 @@ void steadystate_check(void)
 		ss->prev_iops = td_iops;
 		ss->prev_bytes = td_bytes;
 
-		if (td->o.group_reporting && !(ss->state & __FIO_SS_DATA))
+		if (td->o.group_reporting && !(ss->state & FIO_SS_DATA))
 			continue;
 
 		/*
@@ -273,7 +273,7 @@ void steadystate_check(void)
 					(unsigned long long) group_bw,
 					ss->head, ss->tail);
 
-		if (ss->state & __FIO_SS_SLOPE)
+		if (ss->state & FIO_SS_SLOPE)
 			ret = steadystate_slope(group_iops, group_bw, td);
 		else
 			ret = steadystate_deviation(group_iops, group_bw, td);
@@ -282,12 +282,12 @@ void steadystate_check(void)
 			if (td->o.group_reporting) {
 				for_each_td(td2, j) {
 					if (td2->groupid == td->groupid) {
-						td2->ss.state |= __FIO_SS_ATTAINED;
+						td2->ss.state |= FIO_SS_ATTAINED;
 						fio_mark_td_terminate(td2);
 					}
 				}
 			} else {
-				ss->state |= __FIO_SS_ATTAINED;
+				ss->state |= FIO_SS_ATTAINED;
 				fio_mark_td_terminate(td);
 			}
 		}
@@ -314,7 +314,7 @@ int td_steadystate_init(struct thread_data *td)
 
 		ss->state = o->ss_state;
 		if (!td->ss.ramp_time)
-			ss->state |= __FIO_SS_RAMP_OVER;
+			ss->state |= FIO_SS_RAMP_OVER;
 
 		ss->sum_x = o->ss_dur * (o->ss_dur - 1) / 2;
 		ss->sum_x_sq = (o->ss_dur - 1) * (o->ss_dur) * (2*o->ss_dur - 1) / 6;
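The slope criterion in steadystate.c is an ordinary least-squares slope of the recorded IOPS or bandwidth samples against x = 0..dur-1. It uses the closed forms for the sum of x and the sum of x squared seen in td_steadystate_init(). When FIO_SS_PCT is set, the slope is also expressed as a percentage of the mean. fio updates its sums incrementally over a ring buffer. The hypothetical helpers below instead recompute everything from a plain array, so they only illustrate the arithmetic.

```c
#include <stdint.h>

/* Least-squares slope of y[0..n-1] against x = 0..n-1 (requires n >= 2). */
static double ss_slope_sketch(const uint64_t *y, unsigned int n)
{
	double sum_x = (double) n * (n - 1) / 2;		/* 0 + 1 + ... + (n-1) */
	double sum_x_sq = (double) (n - 1) * n * (2.0 * n - 1) / 6;
	double sum_y = 0, sum_xy = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		sum_y += (double) y[i];
		sum_xy += (double) i * (double) y[i];
	}
	return (sum_xy - sum_x * sum_y / n) / (sum_x_sq - sum_x * sum_x / n);
}

/* Slope as a percentage of the mean, as used when FIO_SS_PCT is set. */
static double ss_slope_pct_sketch(const uint64_t *y, unsigned int n)
{
	double sum_y = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		sum_y += (double) y[i];
	return 100.0 * ss_slope_sketch(y, n) / (sum_y / n);
}
```

For samples 10, 12, 14, 16, the slope is 2 per interval. Relative to the mean of 13, that is roughly 15.4%.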
diff --git a/steadystate.h b/steadystate.h
index bbc3945..eaba0d7 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -41,19 +41,28 @@ struct steadystate_data {
 };
 
 enum {
-	__FIO_SS_IOPS		= 1,
-	__FIO_SS_BW		= 2,
-	__FIO_SS_SLOPE		= 4,
-	__FIO_SS_ATTAINED	= 8,
-	__FIO_SS_RAMP_OVER	= 16,
-	__FIO_SS_DATA		= 32,
-	__FIO_SS_PCT		= 64,
-	__FIO_SS_BUFFER_FULL	= 128,
-
-	FIO_SS_IOPS		= __FIO_SS_IOPS,
-	FIO_SS_IOPS_SLOPE	= __FIO_SS_IOPS | __FIO_SS_SLOPE,
-	FIO_SS_BW		= __FIO_SS_BW,
-	FIO_SS_BW_SLOPE		= __FIO_SS_BW | __FIO_SS_SLOPE,
+	__FIO_SS_IOPS = 0,
+	__FIO_SS_BW,
+	__FIO_SS_SLOPE,
+	__FIO_SS_ATTAINED,
+	__FIO_SS_RAMP_OVER,
+	__FIO_SS_DATA,
+	__FIO_SS_PCT,
+	__FIO_SS_BUFFER_FULL,
+};
+
+enum {
+	FIO_SS_IOPS		= 1 << __FIO_SS_IOPS,
+	FIO_SS_BW		= 1 << __FIO_SS_BW,
+	FIO_SS_SLOPE		= 1 << __FIO_SS_SLOPE,
+	FIO_SS_ATTAINED		= 1 << __FIO_SS_ATTAINED,
+	FIO_SS_RAMP_OVER	= 1 << __FIO_SS_RAMP_OVER,
+	FIO_SS_DATA		= 1 << __FIO_SS_DATA,
+	FIO_SS_PCT		= 1 << __FIO_SS_PCT,
+	FIO_SS_BUFFER_FULL	= 1 << __FIO_SS_BUFFER_FULL,
+
+	FIO_SS_IOPS_SLOPE	= FIO_SS_IOPS | FIO_SS_SLOPE,
+	FIO_SS_BW_SLOPE		= FIO_SS_BW | FIO_SS_SLOPE,
 };
 
 #define STEADYSTATE_MSEC	1000
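The steadystate.h hunk splits the old enum into bit positions (`__FIO_SS_*`) and masks (`FIO_SS_*`). ss_state is exchanged between server and client (see the le32_to_cpu() handling of ss_state in the client.c hunk), so it presumably matters that every mask keeps its old numeric value. The sketch below copies the new definitions and checks them at compile time against the literals that the old header spelled out.

```c
/* Bit positions, copied from the new steadystate.h. */
enum {
	__FIO_SS_IOPS = 0,
	__FIO_SS_BW,
	__FIO_SS_SLOPE,
	__FIO_SS_ATTAINED,
	__FIO_SS_RAMP_OVER,
	__FIO_SS_DATA,
	__FIO_SS_PCT,
	__FIO_SS_BUFFER_FULL,
};

/* Masks derived from the positions. */
enum {
	FIO_SS_IOPS		= 1 << __FIO_SS_IOPS,
	FIO_SS_BW		= 1 << __FIO_SS_BW,
	FIO_SS_SLOPE		= 1 << __FIO_SS_SLOPE,
	FIO_SS_ATTAINED		= 1 << __FIO_SS_ATTAINED,
	FIO_SS_RAMP_OVER	= 1 << __FIO_SS_RAMP_OVER,
	FIO_SS_DATA		= 1 << __FIO_SS_DATA,
	FIO_SS_PCT		= 1 << __FIO_SS_PCT,
	FIO_SS_BUFFER_FULL	= 1 << __FIO_SS_BUFFER_FULL,

	FIO_SS_IOPS_SLOPE	= FIO_SS_IOPS | FIO_SS_SLOPE,
	FIO_SS_BW_SLOPE		= FIO_SS_BW | FIO_SS_SLOPE,
};

/* The old header used these literal values; the refactor must preserve them. */
_Static_assert(FIO_SS_IOPS == 1, "FIO_SS_IOPS");
_Static_assert(FIO_SS_BW == 2, "FIO_SS_BW");
_Static_assert(FIO_SS_SLOPE == 4, "FIO_SS_SLOPE");
_Static_assert(FIO_SS_ATTAINED == 8, "FIO_SS_ATTAINED");
_Static_assert(FIO_SS_RAMP_OVER == 16, "FIO_SS_RAMP_OVER");
_Static_assert(FIO_SS_DATA == 32, "FIO_SS_DATA");
_Static_assert(FIO_SS_PCT == 64, "FIO_SS_PCT");
_Static_assert(FIO_SS_BUFFER_FULL == 128, "FIO_SS_BUFFER_FULL");
```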
diff --git a/t/verify-state.c b/t/verify-state.c
index 78a56da..734c1e4 100644
--- a/t/verify-state.c
+++ b/t/verify-state.c
@@ -119,10 +119,12 @@ static int show_file(const char *file)
 	if (ret < 0) {
 		log_err("read: %s\n", strerror(errno));
 		close(fd);
+		free(buf);
 		return 1;
 	} else if (ret != sb.st_size) {
 		log_err("Short read\n");
 		close(fd);
+		free(buf);
 		return 1;
 	}
 
diff --git a/thread_options.h b/thread_options.h
index 3532300..a9c3bee 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -26,7 +26,7 @@ enum fio_memtype {
 #define ERROR_STR_MAX	128
 
 #define BSSPLIT_MAX	64
-#define ZONESPLIT_MAX	64
+#define ZONESPLIT_MAX	256
 
 struct bssplit {
 	uint32_t bs;

* Recent changes (master)
@ 2017-11-30 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-30 13:00 UTC
  To: fio

The following changes since commit 1201b24acd347d6daaad969e6abfe0975cb86bc8:

  init: did_arg cleanup (2017-11-28 16:00:22 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6c3fb04c80c3c241162e743a54761e5e896d4ba2:

  options: correct parser type for max_latency (2017-11-29 22:27:05 -0700)

----------------------------------------------------------------
Jens Axboe (13):
      options: don't quicksort zoned distribution series
      Add support for absolute random zones
      examples/rand-zones.fio: add zoned_abs example
      io_u: cleanup and simplify __get_next_rand_offset_zoned_abs()
      Unify max split zone support
      io_u: don't do expensive int divide for buffer scramble
      io_u: do nsec -> usec converison in one spot in account_io_completion()
      options: make it clear that max_latency is in usecs
      options: make max_latency a 64-bit variable
      Change latency targets to be in nsec values internally
      verify: kill unneeded forward declaration
      verify: convert hdr time to sec+nsec
      options: correct parser type for max_latency

Tomohiro Kusumi (1):
      Revert "Avoid irrelevant "offset extend ends" error message for chrdev"

 HOWTO                   |  24 +++++++--
 cconv.c                 |   4 +-
 examples/rand-zones.fio |   8 +++
 filesetup.c             |  26 ++++-----
 fio.1                   |  24 ++++++++-
 fio.h                   |   3 ++
 init.c                  |   7 +++
 io_u.c                  |  83 +++++++++++++++++++++++++----
 libfio.c                |   1 +
 options.c               | 136 ++++++++++++++++++++++++++++--------------------
 profiles/act.c          |   3 +-
 server.h                |   2 +-
 thread_options.h        |  14 ++---
 verify.c                |   5 +-
 verify.h                |   2 +-
 15 files changed, 240 insertions(+), 102 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 164ba2b..dc99e99 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1254,6 +1254,9 @@ I/O type
 		**zoned**
 				Zoned random distribution
 
+		**zoned_abs**
+				Zone absolute random distribution
+
 	When using a **zipf** or **pareto** distribution, an input value is also
 	needed to define the access pattern. For **zipf**, this is the `Zipf
 	theta`. For **pareto**, it's the `Pareto power`. Fio includes a test
@@ -1278,10 +1281,23 @@ I/O type
 
 		random_distribution=zoned:60/10:30/20:8/30:2/40
 
-	similarly to how :option:`bssplit` works for setting ranges and percentages
-	of block sizes. Like :option:`bssplit`, it's possible to specify separate
-	zones for reads, writes, and trims. If just one set is given, it'll apply to
-	all of them.
+	A **zoned_abs** distribution works exactly like the **zoned** one, except
+	that it takes absolute sizes. For example, let's say you wanted to
+	define access according to the following criteria:
+
+		* 60% of accesses should be to the first 20G
+		* 30% of accesses should be to the next 100G
+		* 10% of accesses should be to the next 500G
+
+	You can define an absolute zoning distribution with:
+
+		random_distribution=zoned_abs:60/20G:30/100G:10/500G
+
+	This works similarly to how :option:`bssplit` sets ranges and
+	percentages of block sizes. Like :option:`bssplit`, it's possible to
+	specify separate zones for reads, writes, and trims. If just one set
+	is given, it'll apply to all of them. This goes for both the **zoned**
+	and **zoned_abs** distributions.
 
 .. option:: percentage_random=int[,int][,int]
 
diff --git a/cconv.c b/cconv.c
index 1a41dc3..5ed4640 100644
--- a/cconv.c
+++ b/cconv.c
@@ -234,7 +234,6 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->loops = le32_to_cpu(top->loops);
 	o->mem_type = le32_to_cpu(top->mem_type);
 	o->mem_align = le32_to_cpu(top->mem_align);
-	o->max_latency = le32_to_cpu(top->max_latency);
 	o->stonewall = le32_to_cpu(top->stonewall);
 	o->new_group = le32_to_cpu(top->new_group);
 	o->numjobs = le32_to_cpu(top->numjobs);
@@ -283,6 +282,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->sync_file_range = le32_to_cpu(top->sync_file_range);
 	o->latency_target = le64_to_cpu(top->latency_target);
 	o->latency_window = le64_to_cpu(top->latency_window);
+	o->max_latency = le64_to_cpu(top->max_latency);
 	o->latency_percentile.u.f = fio_uint64_to_double(le64_to_cpu(top->latency_percentile.u.i));
 	o->compress_percentage = le32_to_cpu(top->compress_percentage);
 	o->compress_chunk = le32_to_cpu(top->compress_chunk);
@@ -423,7 +423,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->loops = cpu_to_le32(o->loops);
 	top->mem_type = cpu_to_le32(o->mem_type);
 	top->mem_align = cpu_to_le32(o->mem_align);
-	top->max_latency = cpu_to_le32(o->max_latency);
 	top->stonewall = cpu_to_le32(o->stonewall);
 	top->new_group = cpu_to_le32(o->new_group);
 	top->numjobs = cpu_to_le32(o->numjobs);
@@ -472,6 +471,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->sync_file_range = cpu_to_le32(o->sync_file_range);
 	top->latency_target = __cpu_to_le64(o->latency_target);
 	top->latency_window = __cpu_to_le64(o->latency_window);
+	top->max_latency = __cpu_to_le64(o->max_latency);
 	top->latency_percentile.u.i = __cpu_to_le64(fio_double_to_uint64(o->latency_percentile.u.f));
 	top->compress_percentage = cpu_to_le32(o->compress_percentage);
 	top->compress_chunk = cpu_to_le32(o->compress_chunk);
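
The move from 32-bit to 64-bit conversions for `max_latency` goes together with the init.c hunk further down, which stores latency limits in nanoseconds internally. A minimal C sketch (hypothetical helper names, not fio code) of why 32 bits would no longer be enough: 2^32 nsec is only about 4.29 seconds, so a 5 second limit silently wraps in a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* usec -> nsec stored in a 32-bit field: the result wraps modulo 2^32. */
static uint32_t usec_to_nsec_u32(uint32_t usec)
{
	return (uint32_t)((uint64_t) usec * 1000U);
}

/* usec -> nsec stored in a 64-bit field: no wrap for realistic limits. */
static uint64_t usec_to_nsec_u64(uint64_t usec)
{
	return usec * 1000ULL;
}
```

With a 5,000,000 usec (5 s) limit, the 64-bit path yields 5,000,000,000 nsec while the 32-bit path yields 705,032,704 nsec, i.e. a limit of roughly 0.7 s.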
diff --git a/examples/rand-zones.fio b/examples/rand-zones.fio
index da13fa3..169137d 100644
--- a/examples/rand-zones.fio
+++ b/examples/rand-zones.fio
@@ -10,6 +10,14 @@ rw=randread
 norandommap
 random_distribution=zoned:50/5:30/15:20/
 
+# It's also possible to use zoned_abs to specify absolute sizes. For
+# instance, if you do:
+#
+# random_distribution=zoned_abs:50/10G:30/100G:20/500G
+#
+# Then 50% of the accesses will go to the first 10G of the drive, 30%
+# to the next 100G, and 20% to the next 500G.
+
 # The above applies to all of reads/writes/trims. If we wanted to do
 # something differently for writes, let's say 50% for the first 10%
 # and 50% for the remaining 90%, we could do it by adding a new section
diff --git a/filesetup.c b/filesetup.c
index 4d29b70..1d586b1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -435,8 +435,12 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 		ret = bdev_size(td, f);
 	else if (f->filetype == FIO_TYPE_CHAR)
 		ret = char_size(td, f);
-	else
-		f->real_file_size = -1ULL;
+	else {
+		f->real_file_size = -1;
+		log_info("%s: failed to get file size of %s\n", td->o.name,
+					f->file_name);
+		return 1; /* avoid offset extends end error message */
+	}
 
 	/*
 	 * Leave ->real_file_size with 0 since it could be expectation
@@ -446,22 +450,10 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 		return ret;
 
 	/*
-	 * If ->real_file_size is -1, a conditional for the message
-	 * "offset extends end" is always true, but it makes no sense,
-	 * so just return the same value here.
-	 */
-	if (f->real_file_size == -1ULL) {
-		log_info("%s: failed to get file size of %s\n", td->o.name,
-					f->file_name);
-		return 1;
-	}
-
-	if (td->o.start_offset && f->file_offset == 0)
-		dprint(FD_FILE, "offset of file %s not initialized yet\n",
-					f->file_name);
-	/*
 	 * ->file_offset normally hasn't been initialized yet, so this
-	 * is basically always false.
+	 * is basically always false unless ->real_file_size is -1, but
+	 * if ->real_file_size is -1 this message doesn't make sense.
+	 * As a result, this message is basically useless.
 	 */
 	if (f->file_offset > f->real_file_size) {
 		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,
diff --git a/fio.1 b/fio.1
index a4b0ea6..01b4db6 100644
--- a/fio.1
+++ b/fio.1
@@ -1033,6 +1033,8 @@ Normal (Gaussian) distribution
 .TP
 .B zoned
 Zoned random distribution
+.B zoned_abs
+Zoned absolute random distribution
 .RE
 .P
 When using a \fBzipf\fR or \fBpareto\fR distribution, an input value is also
@@ -1068,7 +1070,27 @@ example, the user would do:
 random_distribution=zoned:60/10:30/20:8/30:2/40
 .RE
 .P
-similarly to how \fBbssplit\fR works for setting ranges and percentages
+A \fBzoned_abs\fR distribution works exactly like the \fBzoned\fR one, except that
+it takes absolute sizes. For example, let's say you wanted to define access
+according to the following criteria:
+.RS
+.P
+.PD 0
+60% of accesses should be to the first 20G
+.P
+30% of accesses should be to the next 100G
+.P
+10% of accesses should be to the next 500G
+.PD
+.RE
+.P
+You can define an absolute zoning distribution with:
+.RS
+.P
+random_distribution=zoned_abs:60/20G:30/100G:10/500G
+.RE
+.P
+This works similarly to how \fBbssplit\fR sets ranges and percentages
 of block sizes. Like \fBbssplit\fR, it's possible to specify separate
 zones for reads, writes, and trims. If just one set is given, it'll apply to
 all of them.
diff --git a/fio.h b/fio.h
index 8ca934d..a44f1aa 100644
--- a/fio.h
+++ b/fio.h
@@ -158,6 +158,8 @@ void sk_out_drop(void);
 struct zone_split_index {
 	uint8_t size_perc;
 	uint8_t size_perc_prev;
+	uint64_t size;
+	uint64_t size_prev;
 };
 
 /*
@@ -813,6 +815,7 @@ enum {
 	FIO_RAND_DIST_PARETO,
 	FIO_RAND_DIST_GAUSS,
 	FIO_RAND_DIST_ZONED,
+	FIO_RAND_DIST_ZONED_ABS,
 };
 
 #define FIO_DEF_ZIPF		1.1
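
The new `size`/`size_prev` fields let each of the 100 `zone_split_index` slots carry an absolute byte range instead of only a percentage. Below is a minimal C sketch of how such a table can be built and then turned into a block number. It is modeled loosely on `__td_zone_gen_index()` and `__get_next_rand_offset_zoned_abs()` from this patch. The types, the function names, and the caller-supplied random value `r` are hypothetical, and the modulo reduction is only for illustration (fio uses its own random state).

```c
#include <assert.h>
#include <stdint.h>

/* One parsed "access_perc/size" pair, e.g. 60/20G (hypothetical type). */
struct abs_zone {
	unsigned int access_perc;
	uint64_t size;			/* bytes */
};

/* Like zone_split_index's size_prev/size: the zone's byte range. */
struct abs_slot {
	uint64_t size_prev;		/* start of the zone */
	uint64_t size;			/* end of the zone */
};

/*
 * Fill slots[0..99] so that slot v-1 describes the zone hit by a random
 * draw v in 1..100. Assumes the access percentages sum to exactly 100.
 */
static void build_abs_index(const struct abs_zone *z, unsigned int nr,
			    struct abs_slot slots[100])
{
	unsigned int i, j, aprev = 0;
	uint64_t sprev = 0;

	for (i = 0; i < nr; i++) {
		for (j = aprev; j < aprev + z[i].access_perc && j < 100; j++) {
			slots[j].size_prev = sprev;
			slots[j].size = sprev + z[i].size;
		}
		aprev += z[i].access_perc;
		sprev += z[i].size;
	}
}

/*
 * Map draw v to a block: stotal is the zone's first block, send its end
 * block. Fails if the zone runs past lastb (as in the patch) or is empty
 * (an extra guard in this sketch). 'r' stands in for a random value.
 */
static int abs_block(const struct abs_slot slots[100], unsigned int v,
		     uint64_t ba, uint64_t lastb, uint64_t r, uint64_t *b)
{
	const struct abs_slot *s = &slots[v - 1];
	uint64_t stotal = s->size_prev / ba;
	uint64_t send = s->size / ba;

	if (send > lastb || send <= stotal)
		return 1;
	*b = stotal + r % (send - stotal);
	return 0;
}
```

With the `60/20G:30/100G:10/500G` split from the HOWTO, and assuming fio's default base-1024 size suffixes, draws 1-60 map to [0, 20G), 61-90 to [20G, 120G) and 91-100 to [120G, 620G). On a device smaller than 620G, a draw that selects the last zone takes the "zoned_abs sizes exceed file size" error path.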
diff --git a/init.c b/init.c
index acbbd48..7c16b06 100644
--- a/init.c
+++ b/init.c
@@ -925,6 +925,13 @@ static int fixup_options(struct thread_data *td)
 		ret = 1;
 	}
 
+	/*
+	 * Fix these up to be nsec internally
+	 */
+	o->max_latency *= 1000ULL;
+	o->latency_target *= 1000ULL;
+	o->latency_window *= 1000ULL;
+
 	return ret;
 }
 
diff --git a/io_u.c b/io_u.c
index 81ee724..ebe82e1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -157,6 +157,66 @@ static int __get_next_rand_offset_gauss(struct thread_data *td,
 	return 0;
 }
 
+static int __get_next_rand_offset_zoned_abs(struct thread_data *td,
+					    struct fio_file *f,
+					    enum fio_ddir ddir, uint64_t *b)
+{
+	struct zone_split_index *zsi;
+	uint64_t lastb, send, stotal;
+	static int warned;
+	unsigned int v;
+
+	lastb = last_block(td, f, ddir);
+	if (!lastb)
+		return 1;
+
+	if (!td->o.zone_split_nr[ddir]) {
+bail:
+		return __get_next_rand_offset(td, f, ddir, b, lastb);
+	}
+
+	/*
+	 * Generate a value, v, between 1 and 100, both inclusive
+	 */
+	v = rand32_between(&td->zone_state, 1, 100);
+
+	/*
+	 * Find our generated table. 'send' is the end block of this zone,
+	 * 'stotal' is our start offset.
+	 */
+	zsi = &td->zone_state_index[ddir][v - 1];
+	stotal = zsi->size_prev / td->o.ba[ddir];
+	send = zsi->size / td->o.ba[ddir];
+
+	/*
+	 * Should never happen
+	 */
+	if (send == -1U) {
+		if (!warned) {
+			log_err("fio: bug in zoned generation\n");
+			warned = 1;
+		}
+		goto bail;
+	} else if (send > lastb) {
+		/*
+		 * This happens if the user specifies ranges that exceed
+		 * the file/device size. We can't handle that gracefully,
+		 * so error and exit.
+		 */
+		log_err("fio: zoned_abs sizes exceed file size\n");
+		return 1;
+	}
+
+	/*
+	 * Generate index from 0..send-stotal
+	 */
+	if (__get_next_rand_offset(td, f, ddir, b, send - stotal) == 1)
+		return 1;
+
+	*b += stotal;
+	return 0;
+}
+
 static int __get_next_rand_offset_zoned(struct thread_data *td,
 					struct fio_file *f, enum fio_ddir ddir,
 					uint64_t *b)
@@ -249,6 +309,8 @@ static int get_off_from_method(struct thread_data *td, struct fio_file *f,
 		return __get_next_rand_offset_gauss(td, f, ddir, b);
 	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED)
 		return __get_next_rand_offset_zoned(td, f, ddir, b);
+	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED_ABS)
+		return __get_next_rand_offset_zoned_abs(td, f, ddir, b);
 
 	log_err("fio: unknown random distribution: %d\n", td->o.random_distribution);
 	return 1;
@@ -1347,10 +1409,10 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 }
 
 static void lat_fatal(struct thread_data *td, struct io_completion_data *icd,
-		      unsigned long tusec, unsigned long max_usec)
+		      unsigned long long tnsec, unsigned long long max_nsec)
 {
 	if (!td->error)
-		log_err("fio: latency of %lu usec exceeds specified max (%lu usec)\n", tusec, max_usec);
+		log_err("fio: latency of %llu nsec exceeds specified max (%llu nsec)\n", tnsec, max_nsec);
 	td_verror(td, ETIMEDOUT, "max latency exceeded");
 	icd->error = ETIMEDOUT;
 }
@@ -1611,7 +1673,7 @@ static bool check_get_verify(struct thread_data *td, struct io_u *io_u)
 static void small_content_scramble(struct io_u *io_u)
 {
 	unsigned int i, nr_blocks = io_u->buflen / 512;
-	uint64_t boffset;
+	uint64_t boffset, usec;
 	unsigned int offset;
 	char *p, *end;
 
@@ -1622,13 +1684,16 @@ static void small_content_scramble(struct io_u *io_u)
 	boffset = io_u->offset;
 	io_u->buf_filled_len = 0;
 
+	/* >> 10 divides by 1024: close enough to usec for this purpose */
+	usec = io_u->start_time.tv_nsec >> 10;
+
 	for (i = 0; i < nr_blocks; i++) {
 		/*
 		 * Fill the byte offset into a "random" start offset of
 		 * the buffer, given by the product of the usec time
 		 * and the actual offset.
 		 */
-		offset = ((io_u->start_time.tv_nsec/1000) ^ boffset) & 511;
+		offset = (usec ^ boffset) & 511;
 		offset &= ~(sizeof(uint64_t) - 1);
 		if (offset >= 512 - sizeof(uint64_t))
 			offset -= sizeof(uint64_t);
@@ -1806,14 +1871,14 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 			struct prof_io_ops *ops = &td->prof_io_ops;
 
 			if (ops->io_u_lat)
-				icd->error = ops->io_u_lat(td, tnsec/1000);
+				icd->error = ops->io_u_lat(td, tnsec);
 		}
 
-		if (td->o.max_latency && tnsec/1000 > td->o.max_latency)
-			lat_fatal(td, icd, tnsec/1000, td->o.max_latency);
-		if (td->o.latency_target && tnsec/1000 > td->o.latency_target) {
+		if (td->o.max_latency && tnsec > td->o.max_latency)
+			lat_fatal(td, icd, tnsec, td->o.max_latency);
+		if (td->o.latency_target && tnsec > td->o.latency_target) {
 			if (lat_target_failed(td))
-				lat_fatal(td, icd, tnsec/1000, td->o.latency_target);
+				lat_fatal(td, icd, tnsec, td->o.latency_target);
 		}
 	}
 
diff --git a/libfio.c b/libfio.c
index d9900ad..c9bb8f3 100644
--- a/libfio.c
+++ b/libfio.c
@@ -366,6 +366,7 @@ int initialize_fio(char *envp[])
 	compiletime_assert((offsetof(struct jobs_eta, m_rate) % 8) == 0, "m_rate");
 
 	compiletime_assert(__TD_F_LAST <= TD_ENG_FLAG_SHIFT, "TD_ENG_FLAG_SHIFT");
+	compiletime_assert(BSSPLIT_MAX == ZONESPLIT_MAX, "bssplit/zone max");
 
 	err = endian_check();
 	if (err) {
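
`split_parse_ddir()` is shared by the bssplit and zone split parsers, and its scratch arrays are now sized with `ZONESPLIT_MAX`. The new `compiletime_assert()` therefore catches the two limits drifting apart at build time. The same kind of check in plain C11 looks like this. The `EX_*` constants and `struct ex_split` are hypothetical stand-ins, not fio's actual values or types.

```c
#include <assert.h>

/* Hypothetical limits standing in for BSSPLIT_MAX / ZONESPLIT_MAX. */
#define EX_BSSPLIT_MAX		64
#define EX_ZONESPLIT_MAX	64

/* One parse buffer shared by both option types, sized by the zone limit. */
struct ex_split {
	unsigned int nr;
	unsigned int val1[EX_ZONESPLIT_MAX];
	unsigned long long val2[EX_ZONESPLIT_MAX];
};

/* Breaks the build if the two limits ever differ. */
_Static_assert(EX_BSSPLIT_MAX == EX_ZONESPLIT_MAX, "bssplit/zone max");
```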
diff --git a/options.c b/options.c
index 7caccb3..a224e7b 100644
--- a/options.c
+++ b/options.c
@@ -56,14 +56,15 @@ static int bs_cmp(const void *p1, const void *p2)
 
 struct split {
 	unsigned int nr;
-	unsigned int val1[100];
-	unsigned int val2[100];
+	unsigned int val1[ZONESPLIT_MAX];
+	unsigned long long val2[ZONESPLIT_MAX];
 };
 
 static int split_parse_ddir(struct thread_options *o, struct split *split,
-			    enum fio_ddir ddir, char *str)
+			    enum fio_ddir ddir, char *str, bool absolute)
 {
-	unsigned int i, perc;
+	unsigned long long perc;
+	unsigned int i;
 	long long val;
 	char *fname;
 
@@ -80,23 +81,35 @@ static int split_parse_ddir(struct thread_options *o, struct split *split,
 		if (perc_str) {
 			*perc_str = '\0';
 			perc_str++;
-			perc = atoi(perc_str);
-			if (perc > 100)
-				perc = 100;
-			else if (!perc)
+			if (absolute) {
+				if (str_to_decimal(perc_str, &val, 1, o, 0, 0)) {
+					log_err("fio: split conversion failed\n");
+					return 1;
+				}
+				perc = val;
+			} else {
+				perc = atoi(perc_str);
+				if (perc > 100)
+					perc = 100;
+				else if (!perc)
+					perc = -1U;
+			}
+		} else {
+			if (absolute)
+				perc = 0;
+			else
 				perc = -1U;
-		} else
-			perc = -1U;
+		}
 
 		if (str_to_decimal(fname, &val, 1, o, 0, 0)) {
-			log_err("fio: bssplit conversion failed\n");
+			log_err("fio: split conversion failed\n");
 			return 1;
 		}
 
 		split->val1[i] = val;
 		split->val2[i] = perc;
 		i++;
-		if (i == 100)
+		if (i == ZONESPLIT_MAX)
 			break;
 	}
 
@@ -104,7 +117,8 @@ static int split_parse_ddir(struct thread_options *o, struct split *split,
 	return 0;
 }
 
-static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str)
+static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str,
+			bool data)
 {
 	unsigned int i, perc, perc_missing;
 	unsigned int max_bs, min_bs;
@@ -112,7 +126,7 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str)
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str))
+	if (split_parse_ddir(o, &split, ddir, str, data))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -176,9 +190,10 @@ static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str)
 	return 0;
 }
 
-typedef int (split_parse_fn)(struct thread_options *, enum fio_ddir, char *);
+typedef int (split_parse_fn)(struct thread_options *, enum fio_ddir, char *, bool);
 
-static int str_split_parse(struct thread_data *td, char *str, split_parse_fn *fn)
+static int str_split_parse(struct thread_data *td, char *str,
+			   split_parse_fn *fn, bool data)
 {
 	char *odir, *ddir;
 	int ret = 0;
@@ -187,37 +202,37 @@ static int str_split_parse(struct thread_data *td, char *str, split_parse_fn *fn
 	if (odir) {
 		ddir = strchr(odir + 1, ',');
 		if (ddir) {
-			ret = fn(&td->o, DDIR_TRIM, ddir + 1);
+			ret = fn(&td->o, DDIR_TRIM, ddir + 1, data);
 			if (!ret)
 				*ddir = '\0';
 		} else {
 			char *op;
 
 			op = strdup(odir + 1);
-			ret = fn(&td->o, DDIR_TRIM, op);
+			ret = fn(&td->o, DDIR_TRIM, op, data);
 
 			free(op);
 		}
 		if (!ret)
-			ret = fn(&td->o, DDIR_WRITE, odir + 1);
+			ret = fn(&td->o, DDIR_WRITE, odir + 1, data);
 		if (!ret) {
 			*odir = '\0';
-			ret = fn(&td->o, DDIR_READ, str);
+			ret = fn(&td->o, DDIR_READ, str, data);
 		}
 	} else {
 		char *op;
 
 		op = strdup(str);
-		ret = fn(&td->o, DDIR_WRITE, op);
+		ret = fn(&td->o, DDIR_WRITE, op, data);
 		free(op);
 
 		if (!ret) {
 			op = strdup(str);
-			ret = fn(&td->o, DDIR_TRIM, op);
+			ret = fn(&td->o, DDIR_TRIM, op, data);
 			free(op);
 		}
 		if (!ret)
-			ret = fn(&td->o, DDIR_READ, str);
+			ret = fn(&td->o, DDIR_READ, str, data);
 	}
 
 	return ret;
@@ -234,7 +249,7 @@ static int str_bssplit_cb(void *data, const char *input)
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	ret = str_split_parse(td, str, bssplit_ddir);
+	ret = str_split_parse(td, str, bssplit_ddir, false);
 
 	if (parse_dryrun()) {
 		int i;
@@ -823,23 +838,15 @@ static int str_sfr_cb(void *data, const char *str)
 }
 #endif
 
-static int zone_cmp(const void *p1, const void *p2)
-{
-	const struct zone_split *zsp1 = p1;
-	const struct zone_split *zsp2 = p2;
-
-	return (int) zsp2->access_perc - (int) zsp1->access_perc;
-}
-
 static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
-			   char *str)
+			   char *str, bool absolute)
 {
 	unsigned int i, perc, perc_missing, sperc, sperc_missing;
 	struct split split;
 
 	memset(&split, 0, sizeof(split));
 
-	if (split_parse_ddir(o, &split, ddir, str))
+	if (split_parse_ddir(o, &split, ddir, str, absolute))
 		return 1;
 	if (!split.nr)
 		return 0;
@@ -848,7 +855,10 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 	o->zone_split_nr[ddir] = split.nr;
 	for (i = 0; i < split.nr; i++) {
 		o->zone_split[ddir][i].access_perc = split.val1[i];
-		o->zone_split[ddir][i].size_perc = split.val2[i];
+		if (absolute)
+			o->zone_split[ddir][i].size = split.val2[i];
+		else
+			o->zone_split[ddir][i].size_perc = split.val2[i];
 	}
 
 	/*
@@ -864,11 +874,12 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 		else
 			perc += zsp->access_perc;
 
-		if (zsp->size_perc == (uint8_t) -1U)
-			sperc_missing++;
-		else
-			sperc += zsp->size_perc;
-
+		if (!absolute) {
+			if (zsp->size_perc == (uint8_t) -1U)
+				sperc_missing++;
+			else
+				sperc += zsp->size_perc;
+		}
 	}
 
 	if (perc > 100 || sperc > 100) {
@@ -910,20 +921,17 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
 		}
 	}
 
-	/*
-	 * now sort based on percentages, for ease of lookup
-	 */
-	qsort(o->zone_split[ddir], o->zone_split_nr[ddir], sizeof(struct zone_split), zone_cmp);
 	return 0;
 }
 
 static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
 {
 	unsigned int i, j, sprev, aprev;
+	uint64_t sprev_sz;
 
 	td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
 
-	sprev = aprev = 0;
+	sprev_sz = sprev = aprev = 0;
 	for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
 		struct zone_split *zsp = &td->o.zone_split[ddir][i];
 
@@ -932,10 +940,14 @@ static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
 
 			zsi->size_perc = sprev + zsp->size_perc;
 			zsi->size_perc_prev = sprev;
+
+			zsi->size = sprev_sz + zsp->size;
+			zsi->size_prev = sprev_sz;
 		}
 
 		aprev += zsp->access_perc;
 		sprev += zsp->size_perc;
+		sprev_sz += zsp->size;
 	}
 }
 
@@ -954,8 +966,10 @@ static void td_zone_gen_index(struct thread_data *td)
 		__td_zone_gen_index(td, i);
 }
 
-static int parse_zoned_distribution(struct thread_data *td, const char *input)
+static int parse_zoned_distribution(struct thread_data *td, const char *input,
+				    bool absolute)
 {
+	const char *pre = absolute ? "zoned_abs:" : "zoned:";
 	char *str, *p;
 	int i, ret = 0;
 
@@ -965,14 +979,14 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input)
 	strip_blank_end(str);
 
 	/* We expect it to start like that, bail if not */
-	if (strncmp(str, "zoned:", 6)) {
+	if (strncmp(str, pre, strlen(pre))) {
 		log_err("fio: mismatch in zoned input <%s>\n", str);
 		free(p);
 		return 1;
 	}
-	str += strlen("zoned:");
+	str += strlen(pre);
 
-	ret = str_split_parse(td, str, zone_split_ddir);
+	ret = str_split_parse(td, str, zone_split_ddir, absolute);
 
 	free(p);
 
@@ -984,8 +998,15 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input)
 		for (j = 0; j < td->o.zone_split_nr[i]; j++) {
 			struct zone_split *zsp = &td->o.zone_split[i][j];
 
-			dprint(FD_PARSE, "\t%d: %u/%u\n", j, zsp->access_perc,
-								zsp->size_perc);
+			if (absolute) {
+				dprint(FD_PARSE, "\t%d: %u/%llu\n", j,
+						zsp->access_perc,
+						(unsigned long long) zsp->size);
+			} else {
+				dprint(FD_PARSE, "\t%d: %u/%u\n", j,
+						zsp->access_perc,
+						zsp->size_perc);
+			}
 		}
 	}
 
@@ -1024,7 +1045,9 @@ static int str_random_distribution_cb(void *data, const char *str)
 	else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
 		val = 0.0;
 	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED)
-		return parse_zoned_distribution(td, str);
+		return parse_zoned_distribution(td, str, false);
+	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED_ABS)
+		return parse_zoned_distribution(td, str, true);
 	else
 		return 0;
 
@@ -2253,7 +2276,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = FIO_RAND_DIST_ZONED,
 			    .help = "Zoned random distribution",
 			  },
-
+			  { .ival = "zoned_abs",
+			    .oval = FIO_RAND_DIST_ZONED_ABS,
+			    .help = "Zoned absolute random distribution",
+			  },
 		},
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_RANDOM,
@@ -3432,8 +3458,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "max_latency",
-		.lname	= "Max Latency",
-		.type	= FIO_OPT_INT,
+		.lname	= "Max Latency (usec)",
+		.type	= FIO_OPT_STR_VAL_TIME,
 		.off1	= offsetof(struct thread_options, max_latency),
 		.help	= "Maximum tolerated IO latency (usec)",
 		.is_time = 1,
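
In absolute mode, `split_parse_ddir()` runs the part after `/` through `str_to_decimal()` as a size instead of clamping it to a 0-100 percentage. The hypothetical, simplified C parser below shows the shape of the result for a single `zoned_abs` list. It assumes base-1024 K/M/G/T suffixes only and has no overflow check. It also differs from fio in one respect: fio records size 0 when the `/size` part is missing, while this sketch rejects such entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct abs_split {
	unsigned int perc[100];		/* access percentage per zone */
	uint64_t size[100];		/* absolute zone size in bytes */
	unsigned int nr;
};

/* Parse "<n>[K|M|G|T]" with base-1024 multipliers (assumed). */
static int parse_size(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 10);
	uint64_t mult = 1;

	if (end == s)
		return 1;
	switch (*end) {
	case 'k': case 'K': mult = 1ULL << 10; end++; break;
	case 'm': case 'M': mult = 1ULL << 20; end++; break;
	case 'g': case 'G': mult = 1ULL << 30; end++; break;
	case 't': case 'T': mult = 1ULL << 40; end++; break;
	}
	if (*end != '\0')
		return 1;
	*out = v * mult;
	return 0;
}

/* Split "perc/size:perc/size:..." in place; returns 0 on success. */
static int parse_abs_split(char *str, struct abs_split *sp)
{
	sp->nr = 0;
	while (str && *str) {
		char *next = strchr(str, ':');
		char *slash;

		if (next)
			*next++ = '\0';
		slash = strchr(str, '/');
		if (!slash || sp->nr == 100)
			return 1;
		*slash = '\0';
		sp->perc[sp->nr] = (unsigned int) atoi(str);
		if (parse_size(slash + 1, &sp->size[sp->nr]))
			return 1;
		sp->nr++;
		str = next;
	}
	return 0;
}
```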
diff --git a/profiles/act.c b/profiles/act.c
index 4669535..3fa5afa 100644
--- a/profiles/act.c
+++ b/profiles/act.c
@@ -288,10 +288,11 @@ static int act_prep_cmdline(void)
 	return 0;
 }
 
-static int act_io_u_lat(struct thread_data *td, uint64_t usec)
+static int act_io_u_lat(struct thread_data *td, uint64_t nsec)
 {
 	struct act_prof_data *apd = td->prof_data;
 	struct act_slice *slice;
+	uint64_t usec = nsec / 1000ULL;
 	int i, ret = 0;
 	double perm;
 
diff --git a/server.h b/server.h
index ba3abfe..dbd5c27 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 66,
+	FIO_SERVER_VER			= 67,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index ca549b5..3532300 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -36,6 +36,8 @@ struct bssplit {
 struct zone_split {
 	uint8_t access_perc;
 	uint8_t size_perc;
+	uint8_t pad[6];
+	uint64_t size;
 };
 
 #define NR_OPTS_SZ	(FIO_MAX_OPTS / (8 * sizeof(uint64_t)))
@@ -190,7 +192,7 @@ struct thread_options {
 	enum fio_memtype mem_type;
 	unsigned int mem_align;
 
-	unsigned int max_latency;
+	unsigned long long max_latency;
 
 	unsigned int stonewall;
 	unsigned int new_group;
@@ -427,7 +429,8 @@ struct thread_options_pack {
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
-	uint32_t pad;
+
+	uint32_t sync_file_range;
 
 	struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
 	uint32_t zone_split_nr[DDIR_RWDIR_CNT];
@@ -467,8 +470,6 @@ struct thread_options_pack {
 	uint32_t mem_type;
 	uint32_t mem_align;
 
-	uint32_t max_latency;
-
 	uint32_t stonewall;
 	uint32_t new_group;
 	uint32_t numjobs;
@@ -519,6 +520,7 @@ struct thread_options_pack {
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
 	uint32_t percentile_precision;
+	uint32_t pad;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
@@ -579,11 +581,9 @@ struct thread_options_pack {
 	uint64_t offset_increment;
 	uint64_t number_ios;
 
-	uint32_t sync_file_range;
-	uint32_t pad2;
-
 	uint64_t latency_target;
 	uint64_t latency_window;
+	uint64_t max_latency;
 	fio_fp64_t latency_percentile;
 
 	uint32_t sig_figs;
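
The new `pad[6]` in `struct zone_split` places `size` at offset 8 explicitly. This presumably matters because the struct is embedded in `thread_options_pack`, which is exchanged between client and server. On ABIs where `uint64_t` is only 4-byte aligned inside structs (for example 32-bit x86), implicit padding would otherwise put `size` at offset 4 instead of 8. A sketch that checks the layout at compile time with C11 `_Static_assert`, using a local copy of the struct from the hunk above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the new zone_split layout from thread_options.h. */
struct zone_split {
	uint8_t access_perc;
	uint8_t size_perc;
	uint8_t pad[6];		/* explicit padding up to the 64-bit field */
	uint64_t size;
};

/* Same layout on 32-bit and 64-bit ABIs thanks to the explicit pad. */
_Static_assert(offsetof(struct zone_split, size) == 8, "size at offset 8");
_Static_assert(sizeof(struct zone_split) == 16, "no trailing padding");
```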
diff --git a/verify.c b/verify.c
index db6e17e..2faeaad 100644
--- a/verify.c
+++ b/verify.c
@@ -30,9 +30,6 @@
 static void populate_hdr(struct thread_data *td, struct io_u *io_u,
 			 struct verify_header *hdr, unsigned int header_num,
 			 unsigned int header_len);
-static void fill_hdr(struct thread_data *td, struct io_u *io_u,
-		     struct verify_header *hdr, unsigned int header_num,
-		     unsigned int header_len, uint64_t rand_seed);
 static void __fill_hdr(struct thread_data *td, struct io_u *io_u,
 		       struct verify_header *hdr, unsigned int header_num,
 		       unsigned int header_len, uint64_t rand_seed);
@@ -1167,7 +1164,7 @@ static void __fill_hdr(struct thread_data *td, struct io_u *io_u,
 	hdr->rand_seed = rand_seed;
 	hdr->offset = io_u->offset + header_num * td->o.verify_interval;
 	hdr->time_sec = io_u->start_time.tv_sec;
-	hdr->time_usec = io_u->start_time.tv_nsec / 1000;
+	hdr->time_nsec = io_u->start_time.tv_nsec;
 	hdr->thread = td->thread_number;
 	hdr->numberio = io_u->numberio;
 	hdr->crc32 = fio_crc32c(p, offsetof(struct verify_header, crc32));
diff --git a/verify.h b/verify.h
index 5aae2e7..321e648 100644
--- a/verify.h
+++ b/verify.h
@@ -43,7 +43,7 @@ struct verify_header {
 	uint64_t rand_seed;
 	uint64_t offset;
 	uint32_t time_sec;
-	uint32_t time_usec;
+	uint32_t time_nsec;
 	uint16_t thread;
 	uint16_t numberio;
 	uint32_t crc32;
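
Storing `tv_nsec` directly in the 32-bit `time_nsec` field is safe because POSIX limits `tv_nsec` to [0, 999999999], well below 2^32 - 1 (4294967295). The small sketch below uses a hypothetical stand-in struct rather than fio's full `struct verify_header`. The 32-bit truncation of `time_sec` is unchanged by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the timestamp part of struct verify_header. */
struct hdr_time {
	uint32_t time_sec;
	uint32_t time_nsec;
};

/* tv_nsec is always below 1e9, so it fits in 32 bits without scaling. */
static void fill_hdr_time(struct hdr_time *h, const struct timespec *ts)
{
	h->time_sec = (uint32_t) ts->tv_sec;
	h->time_nsec = (uint32_t) ts->tv_nsec;
}
```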


* Recent changes (master)
@ 2017-11-29 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-29 13:00 UTC
  To: fio

The following changes since commit 609ac152d5a7d0bc5c645f8c50bf2415e7b2d4d3:

  docs: Add documention for RDMA ioengine options. (2017-11-23 22:08:48 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1201b24acd347d6daaad969e6abfe0975cb86bc8:

  init: did_arg cleanup (2017-11-28 16:00:22 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      init: check and fail status-interval settings that are too small
      init: remove dead code
      init: did_arg cleanup

 init.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index b7e9c0e..acbbd48 100644
--- a/init.c
+++ b/init.c
@@ -78,7 +78,7 @@ unsigned int fio_debug_jobno = -1;
 unsigned int *fio_debug_jobp = NULL;
 
 static char cmd_optstr[256];
-static int did_arg;
+static bool did_arg;
 
 #define FIO_CLIENT_FLAG		(1 << 16)
 
@@ -2430,35 +2430,35 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			output_format |= FIO_OUTPUT_TERSE;
 			break;
 		case 'h':
-			did_arg = 1;
+			did_arg = true;
 			if (!cur_client) {
 				usage(argv[0]);
 				do_exit++;
 			}
 			break;
 		case 'c':
-			did_arg = 1;
+			did_arg = true;
 			if (!cur_client) {
 				fio_show_option_help(optarg);
 				do_exit++;
 			}
 			break;
 		case 'i':
-			did_arg = 1;
+			did_arg = true;
 			if (!cur_client) {
 				fio_show_ioengine_help(optarg);
 				do_exit++;
 			}
 			break;
 		case 's':
-			did_arg = 1;
+			did_arg = true;
 			dump_cmdline = 1;
 			break;
 		case 'r':
 			read_only = 1;
 			break;
 		case 'v':
-			did_arg = 1;
+			did_arg = true;
 			if (!cur_client) {
 				log_info("%s\n", fio_version_string);
 				do_exit++;
@@ -2494,7 +2494,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				do_exit++;
 			break;
 		case 'P':
-			did_arg = 1;
+			did_arg = true;
 			parse_only = 1;
 			break;
 		case 'x': {
@@ -2516,12 +2516,12 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 #ifdef CONFIG_ZLIB
 		case 'X':
 			exit_val = iolog_file_inflate(optarg);
-			did_arg++;
+			did_arg = true;
 			do_exit++;
 			break;
 #endif
 		case 'p':
-			did_arg = 1;
+			did_arg = true;
 			if (exec_profile)
 				free(exec_profile);
 			exec_profile = strdup(optarg);
@@ -2535,7 +2535,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				if (ret)
 					goto out_free;
 				td = NULL;
-				did_arg = 1;
+				did_arg = true;
 			}
 			if (!td) {
 				int is_section = !strncmp(opt, "name", 4);
@@ -2610,7 +2610,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			}
 			break;
 		case 'S':
-			did_arg = 1;
+			did_arg = true;
 #ifndef CONFIG_NO_SHM
 			if (nr_clients) {
 				log_err("fio: can't be both client and server\n");
@@ -2636,14 +2636,14 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 		case 'I':
 			if ((ret = fio_idle_prof_parse_opt(optarg))) {
 				/* exit on error and calibration only */
-				did_arg = 1;
+				did_arg = true;
 				do_exit++;
 				if (ret == -1)
 					exit_val = 1;
 			}
 			break;
 		case 'C':
-			did_arg = 1;
+			did_arg = true;
 			if (is_backend) {
 				log_err("fio: can't be both client and server\n");
 				do_exit++;
@@ -2700,19 +2700,19 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			}
 			break;
 		case 'R':
-			did_arg = 1;
+			did_arg = true;
 			if (fio_client_add_ini_file(cur_client, optarg, true)) {
 				do_exit++;
 				exit_val = 1;
 			}
 			break;
 		case 'T':
-			did_arg = 1;
+			did_arg = true;
 			do_exit++;
 			exit_val = fio_monotonic_clocktest(1);
 			break;
 		case 'G':
-			did_arg = 1;
+			did_arg = true;
 			do_exit++;
 			exit_val = fio_crctest(optarg);
 			break;
@@ -2725,6 +2725,11 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				exit_val = 1;
 				break;
 			}
+			if (val < 1000) {
+				log_err("fio: status interval too small\n");
+				do_exit++;
+				exit_val = 1;
+			}
 			status_interval = val / 1000;
 			break;
 			}
@@ -2865,13 +2870,8 @@ int parse_options(int argc, char *argv[])
 			return 0;
 
 		log_err("No job(s) defined\n\n");
-
-		if (!did_arg) {
-			usage(argv[0]);
-			return 1;
-		}
-
-		return 0;
+		usage(argv[0]);
+		return 1;
 	}
 
 	if (output_format & FIO_OUTPUT_NORMAL)
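
The new check in the `status-interval` case rejects any value below 1000, that is, anything the following `val / 1000` would truncate to 0. Below, the same guard is written as a standalone function. The function name is hypothetical, and the sketch makes no assumption about the units the option's time parser returns.

```c
#include <assert.h>

/*
 * Reject values that would scale down to 0; otherwise store val / 1000
 * in *out. Returns 0 on success, 1 if the interval is too small.
 */
static int scale_status_interval(long long val, unsigned long long *out)
{
	if (val < 1000)
		return 1;
	*out = (unsigned long long) val / 1000;
	return 0;
}
```

The patch itself does not `break` after logging the error, so `status_interval` is still assigned (as 0). The `do_exit` flag is what actually stops fio.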


* Recent changes (master)
@ 2017-11-24 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-24 13:00 UTC
  To: fio

The following changes since commit b9c153b9023c3de65f01aeac4d1e993986a7107e:

  Merge branch 'cleanup' of https://github.com/sitsofe/fio (2017-11-22 19:58:21 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 609ac152d5a7d0bc5c645f8c50bf2415e7b2d4d3:

  docs: Add documention for RDMA ioengine options. (2017-11-23 22:08:48 -0700)

----------------------------------------------------------------
Stephen Bates (1):
      docs: Add documention for RDMA ioengine options.

 HOWTO | 29 +++++++++++++++++++++++++----
 fio.1 | 30 +++++++++++++++++++++++++-----
 2 files changed, 50 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4d3a8c8..164ba2b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1751,7 +1751,8 @@ I/O engine
 		**rdma**
 			The RDMA I/O engine supports both RDMA memory semantics
 			(RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
-			InfiniBand, RoCE and iWARP protocols.
+			InfiniBand, RoCE and iWARP protocols. This engine defines
+			engine-specific options.
 
 		**falloc**
 			I/O engine that does regular fallocate to simulate data transfer as
@@ -1893,10 +1894,15 @@ with the caveat that when used on the command line, they must come after the
 		this will be the starting port number since fio will use a range of
 		ports.
 
-.. option:: hostname=str : [netsplice] [net]
+   [rdma]
+
+		The port to use for RDMA-CM communication. This should be the same value
+		on the client and the server side.
+
+.. option:: hostname=str : [netsplice] [net] [rdma]
 
-	The hostname or IP address to use for TCP or UDP based I/O.  If the job is
-	a TCP listener or UDP reader, the hostname is not used and must be omitted
+	The hostname or IP address to use for TCP, UDP or RDMA-CM based I/O.  If the job
+	is a TCP listener or UDP reader, the hostname is not used and must be omitted
 	unless it is a valid UDP multicast address.
 
 .. option:: interface=str : [netsplice] [net]
@@ -2002,6 +2008,21 @@ with the caveat that when used on the command line, they must come after the
 
 	The size of the chunk to use for each file.
 
+.. option:: verb=str : [rdma]
+
+	The RDMA verb to use on this side of the RDMA ioengine connection. Valid
+	values are write, read, send and recv. These correspond to the equivalent
+	RDMA verbs (e.g. write = rdma_write etc.). Note that this only needs to be
+	specified on the client side of the connection. See the examples folder.
+
+.. option:: bindname=str : [rdma]
+
+	The name to use to bind the local RDMA-CM connection to a local RDMA device.
+	This could be a hostname or an IPv4 or IPv6 address. On the server side this
+	will be passed into the rdma_bind_addr() function and on the client side it
+	will be used in the rdma_resolve_addr() function. This can be useful when
+	multiple paths exist between the client and the server or in certain loopback
+	configurations.
 
 I/O depth
 ~~~~~~~~~
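The `verb` values documented above pair a short string with one of the two RDMA semantics: `write` and `read` are one-sided memory operations (RDMA_WRITE/RDMA_READ), while `send` and `recv` are two-sided channel operations (Send/Recv). A minimal sketch of that string-to-operation mapping follows; the enum, its constants and `parse_verb()` are hypothetical names chosen for illustration, not the identifiers used in `engines/rdma.c`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; fio's RDMA engine defines its own enum for these modes. */
enum rdma_verb {
	VERB_UNKNOWN = 0,
	VERB_RDMA_WRITE,	/* memory semantics: one-sided write */
	VERB_RDMA_READ,		/* memory semantics: one-sided read */
	VERB_SEND,		/* channel semantics: two-sided send */
	VERB_RECV,		/* channel semantics: two-sided receive */
};

/* Map the value of a verb=str option onto an RDMA operation. */
static enum rdma_verb parse_verb(const char *s)
{
	static const struct {
		const char *name;
		enum rdma_verb verb;
	} table[] = {
		{ "write", VERB_RDMA_WRITE },
		{ "read",  VERB_RDMA_READ },
		{ "send",  VERB_SEND },
		{ "recv",  VERB_RECV },
	};

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(s, table[i].name) == 0)
			return table[i].verb;
	return VERB_UNKNOWN;	/* anything else is rejected */
}

/* write/read use memory semantics; send/recv use channel semantics. */
static bool verb_is_memory_semantics(enum rdma_verb v)
{
	return v == VERB_RDMA_WRITE || v == VERB_RDMA_READ;
}
```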
diff --git a/fio.1 b/fio.1
index 3224e9a..a4b0ea6 100644
--- a/fio.1
+++ b/fio.1
@@ -1525,7 +1525,8 @@ for more info on GUASI.
 .B rdma
 The RDMA I/O engine supports both RDMA memory semantics
 (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
-InfiniBand, RoCE and iWARP protocols.
+InfiniBand, RoCE and iWARP protocols. This engine defines engine
+specific options.
 .TP
 .B falloc
 I/O engine that does regular fallocate to simulate data transfer as
@@ -1654,10 +1655,14 @@ The TCP or UDP port to bind to or connect to. If this is used with
 this will be the starting port number since fio will use a range of
 ports.
 .TP
-.BI (netsplice,net)hostname \fR=\fPstr
-The hostname or IP address to use for TCP or UDP based I/O. If the job is
-a TCP listener or UDP reader, the hostname is not used and must be omitted
-unless it is a valid UDP multicast address.
+.BI (rdma)port \fR=\fPint
+The port to use for RDMA-CM communication. This should be the same
+value on the client and the server side.
+.TP
+.BI (netsplice,net,rdma)hostname \fR=\fPstr
+The hostname or IP address to use for TCP, UDP or RDMA-CM based I/O.
+If the job is a TCP listener or UDP reader, the hostname is not used
+and must be omitted unless it is a valid UDP multicast address.
 .TP
 .BI (netsplice,net)interface \fR=\fPstr
 The IP address of the network interface used to send or receive UDP
@@ -1757,6 +1762,21 @@ libhdfs will create chunk in this HDFS directory.
 .TP
 .BI (libhdfs)chunk_size
 The size of the chunk to use for each file.
+.TP
+.BI (rdma)verb \fR=\fPstr
+The RDMA verb to use on this side of the RDMA ioengine
+connection. Valid values are write, read, send and recv. These
+correspond to the equivalent RDMA verbs (e.g. write = rdma_write
+etc.). Note that this only needs to be specified on the client side of
+the connection. See the examples folder.
+.TP
+.BI (rdma)bindname \fR=\fPstr
+The name to use to bind the local RDMA-CM connection to a local RDMA
+device. This could be a hostname or an IPv4 or IPv6 address. On the
+server side this will be passed into the rdma_bind_addr() function and
+on the client side it will be used in the rdma_resolve_addr()
+function. This can be useful when multiple paths exist between the
+client and the server or in certain loopback configurations.
 .SS "I/O depth"
 .TP
 .BI iodepth \fR=\fPint

* Recent changes (master)
@ 2017-11-23 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-23 13:00 UTC
  To: fio

The following changes since commit 40e5f1bf1aca5970528724873a4544c43712a75d:

  Merge branch 'libpmem' (2017-11-17 09:21:19 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b9c153b9023c3de65f01aeac4d1e993986a7107e:

  Merge branch 'cleanup' of https://github.com/sitsofe/fio (2017-11-22 19:58:21 -0700)

----------------------------------------------------------------
Jeff Furlong (1):
      add significant_figures parameter

Jens Axboe (1):
      Merge branch 'cleanup' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      HOWTO: fix up broken formatting in logging options
      doc: reword buffer_compress_percentage, buffer_compress_chunk, dedupe_percentage

Stephen Bates (1):
      rdma: Add bind option

 HOWTO            | 63 ++++++++++++++++++++++++++++++++++++-------------------
 cconv.c          |  2 ++
 client.c         |  4 ++++
 engines/rdma.c   | 64 +++++++++++++++++++++++++++++++++++++++++++++-----------
 eta.c            |  4 ++--
 fio.1            | 45 ++++++++++++++++++++++++++-------------
 gclient.c        | 54 +++++++++++++++++++++++------------------------
 init.c           | 12 +++++------
 options.c        | 13 ++++++++++++
 server.c         |  2 ++
 stat.c           | 32 +++++++++++++++-------------
 stat.h           |  5 +++++
 thread_options.h |  4 ++++
 13 files changed, 205 insertions(+), 99 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index dce96bc..4d3a8c8 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1434,9 +1434,12 @@ Buffers and memory
 .. option:: refill_buffers
 
 	If this option is given, fio will refill the I/O buffers on every
-	submit. The default is to only fill it at init time and reuse that
-	data. Only makes sense if zero_buffers isn't specified, naturally. If data
-	verification is enabled, `refill_buffers` is also automatically enabled.
+	submit. Only makes sense if :option:`zero_buffers` isn't specified,
+	naturally. Defaults to being unset i.e., the buffer is only filled at
+	init time and the data in it is reused when possible but if any of
+	:option:`verify`, :option:`buffer_compress_percentage` or
+	:option:`dedupe_percentage` are enabled then `refill_buffers` is also
+	automatically enabled.
 
 .. option:: scramble_buffers=bool
 
@@ -1448,23 +1451,30 @@ Buffers and memory
 
 .. option:: buffer_compress_percentage=int
 
-	If this is set, then fio will attempt to provide I/O buffer content (on
-	WRITEs) that compresses to the specified level. Fio does this by providing a
-	mix of random data and a fixed pattern. The fixed pattern is either zeros,
-	or the pattern specified by :option:`buffer_pattern`. If the pattern option
-	is used, it might skew the compression ratio slightly. Note that this is per
-	block size unit, see :option:`buffer_compress_chunk` for setting a finer
-	granularity of compression regions.
+	If this is set, then fio will attempt to provide I/O buffer content
+	(on WRITEs) that compresses to the specified level. Fio does this by
+	providing a mix of random data followed by fixed pattern data. The
+	fixed pattern is either zeros, or the pattern specified by
+	:option:`buffer_pattern`. If the `buffer_pattern` option is used, it
+	might skew the compression ratio slightly. Setting
+	`buffer_compress_percentage` to a value other than 100 will also
+	enable :option:`refill_buffers` in order to reduce the likelihood that
+	adjacent blocks are so similar that they over compress when seen
+	together. See :option:`buffer_compress_chunk` for how to set a finer or
+	coarser granularity for the random/fixed data region. Defaults to unset
+	i.e., buffer data will not adhere to any compression level.
 
 .. option:: buffer_compress_chunk=int
 
-	See :option:`buffer_compress_percentage`. This setting allows fio to manage
-	how big the ranges of random data and zeroed data is. Without this set, fio
-	will provide :option:`buffer_compress_percentage` of blocksize random data,
-	followed by the remaining zeroed. With this set to some chunk size smaller
-	than the block size, fio can alternate random and zeroed data throughout the
-	I/O buffer. This is particularly useful when bigger block sizes are used
-	for a job. Defaults to 512.
+	This setting allows fio to manage how big the random/fixed data region
+	is when using :option:`buffer_compress_percentage`. When
+	`buffer_compress_chunk` is set to some non-zero value smaller than the
+	block size, fio can repeat the random/fixed region throughout the I/O
+	buffer at the specified interval (which is particularly useful when
+	bigger block sizes are used for a job). When set to 0, fio will use a
+	chunk size that matches the block size resulting in a single
+	random/fixed region within the I/O buffer. Defaults to 512. When the
+	unit is omitted, the value is interpreted in bytes.
 
 .. option:: buffer_pattern=str
 
@@ -1501,7 +1511,9 @@ Buffers and memory
 	writing. These buffers will be naturally dedupable. The contents of the
 	buffers depend on what other buffer compression settings have been set. It's
 	possible to have the individual buffers either fully compressible, or not at
-	all. This option only controls the distribution of unique buffers.
+	all -- this option only controls the distribution of unique buffers. Setting
+	this option will also enable :option:`refill_buffers` to prevent every buffer
+	being identical.
 
 .. option:: invalidate=bool
 
@@ -2748,8 +2760,8 @@ Measurements and reporting
 .. option:: write_lat_log=str
 
 	Same as :option:`write_bw_log`, except this option creates I/O
-	submission (e.g., `file:`name_slat.x.log`), completion (e.g.,
-	`file:`name_clat.x.log`), and total (e.g., `file:`name_lat.x.log`)
+	submission (e.g., :file:`name_slat.x.log`), completion (e.g.,
+	:file:`name_clat.x.log`), and total (e.g., :file:`name_lat.x.log`)
 	latency files instead. See :option:`write_bw_log` for details about
 	the filename format and `Log File Formats`_ for how data is structured
 	within the files.
@@ -2757,7 +2769,7 @@ Measurements and reporting
 .. option:: write_hist_log=str
 
 	Same as :option:`write_bw_log` but writes an I/O completion latency
-	histogram file (e.g., `file:`name_hist.x.log`) instead. Note that this
+	histogram file (e.g., :file:`name_hist.x.log`) instead. Note that this
 	file will be empty unless :option:`log_hist_msec` has also been set.
 	See :option:`write_bw_log` for details about the filename format and
 	`Log File Formats`_ for how data is structured within the file.
@@ -2765,7 +2777,7 @@ Measurements and reporting
 .. option:: write_iops_log=str
 
 	Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-	`file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
+	:file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
 	details about the filename format and `Log File Formats`_ for how data
 	is structured within the file.
 
@@ -2909,6 +2921,13 @@ Measurements and reporting
 	completion latency below which 99.5% and 99.9% of the observed latencies
 	fell, respectively.
 
+.. option:: significant_figures=int
+
+	If using :option:`--output-format` of `normal`, set the significant figures
+	to this value. Higher values will yield more precise IOPS and throughput
+	units, while lower values will round. Requires a minimum value of 1 and a
+	maximum value of 10. Defaults to 4.
+
 
 Error handling
 ~~~~~~~~~~~~~~
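The buffer layout described for `buffer_compress_percentage` and `buffer_compress_chunk` can be sketched as follows: each chunk starts with random bytes and ends with a fixed region, with zeros standing in for `buffer_pattern`. This is a hypothetical illustration, not fio's buffer-fill code; `fill_compressible()`, the use of `rand()`, and treating the percentage as the fixed (compressible) share of each chunk are assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: fill buf (bs bytes) so that, within every chunk,
 * the first (100 - pct)% is random and the remaining pct% is zeroed.
 * chunk == 0 means a single random/fixed region spanning the whole block.
 */
static void fill_compressible(unsigned char *buf, size_t bs, size_t chunk,
			      unsigned int pct, unsigned int seed)
{
	if (pct > 100)
		pct = 100;
	if (chunk == 0 || chunk > bs)
		chunk = bs;

	srand(seed);
	for (size_t off = 0; off < bs; off += chunk) {
		/* The last chunk may be shorter than 'chunk'. */
		size_t len = bs - off < chunk ? bs - off : chunk;
		size_t rnd = len * (100 - pct) / 100;

		/* Random region: incompressible data. */
		for (size_t i = 0; i < rnd; i++)
			buf[off + i] = (unsigned char)(rand() & 0xff);
		/* Fixed region: zeros stand in for buffer_pattern here. */
		memset(buf + off + rnd, 0, len - rnd);
	}
}
```

With a 4096-byte block, a 512-byte chunk and a 25% target, each chunk gets 384 random bytes followed by 128 zero bytes; with the chunk set to 0 the block instead holds 3072 random bytes followed by 1024 zero bytes.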
diff --git a/cconv.c b/cconv.c
index dc3c4e6..1a41dc3 100644
--- a/cconv.c
+++ b/cconv.c
@@ -270,6 +270,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->clat_percentiles = le32_to_cpu(top->clat_percentiles);
 	o->lat_percentiles = le32_to_cpu(top->lat_percentiles);
 	o->percentile_precision = le32_to_cpu(top->percentile_precision);
+	o->sig_figs = le32_to_cpu(top->sig_figs);
 	o->continue_on_error = le32_to_cpu(top->continue_on_error);
 	o->cgroup_weight = le32_to_cpu(top->cgroup_weight);
 	o->cgroup_nodelete = le32_to_cpu(top->cgroup_nodelete);
@@ -458,6 +459,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->clat_percentiles = cpu_to_le32(o->clat_percentiles);
 	top->lat_percentiles = cpu_to_le32(o->lat_percentiles);
 	top->percentile_precision = cpu_to_le32(o->percentile_precision);
+	top->sig_figs = cpu_to_le32(o->sig_figs);
 	top->continue_on_error = cpu_to_le32(o->continue_on_error);
 	top->cgroup_weight = cpu_to_le32(o->cgroup_weight);
 	top->cgroup_nodelete = cpu_to_le32(o->cgroup_nodelete);
diff --git a/client.c b/client.c
index 779fb9d..11fa262 100644
--- a/client.c
+++ b/client.c
@@ -942,6 +942,8 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->kb_base		= le32_to_cpu(src->kb_base);
 	dst->unit_base		= le32_to_cpu(src->unit_base);
 
+	dst->sig_figs		= le32_to_cpu(src->sig_figs);
+
 	dst->latency_depth	= le32_to_cpu(src->latency_depth);
 	dst->latency_target	= le64_to_cpu(src->latency_target);
 	dst->latency_window	= le64_to_cpu(src->latency_window);
@@ -982,6 +984,7 @@ static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
 
 	dst->kb_base	= le32_to_cpu(src->kb_base);
 	dst->unit_base	= le32_to_cpu(src->unit_base);
+	dst->sig_figs	= le32_to_cpu(src->sig_figs);
 	dst->groupid	= le32_to_cpu(src->groupid);
 	dst->unified_rw_rep	= le32_to_cpu(src->unified_rw_rep);
 }
@@ -1167,6 +1170,7 @@ static void convert_jobs_eta(struct jobs_eta *je)
 	je->nr_threads		= le32_to_cpu(je->nr_threads);
 	je->is_pow2		= le32_to_cpu(je->is_pow2);
 	je->unit_base		= le32_to_cpu(je->unit_base);
+	je->sig_figs		= le32_to_cpu(je->sig_figs);
 }
 
 void fio_client_sum_jobs_eta(struct jobs_eta *dst, struct jobs_eta *je)
diff --git a/engines/rdma.c b/engines/rdma.c
index da00cba..6b173a8 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -59,6 +59,7 @@ struct rdmaio_options {
 	struct thread_data *td;
 	unsigned int port;
 	enum rdma_io_mode verb;
+	char *bindname;
 };
 
 static int str_hostname_cb(void *data, const char *input)
@@ -82,6 +83,16 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_RDMA,
 	},
 	{
+		.name	= "bindname",
+		.lname	= "rdma engine bindname",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= offsetof(struct rdmaio_options, bindname),
+		.help	= "Bind for RDMA IO engine",
+		.def    = "",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_RDMA,
+	},
+	{
 		.name	= "port",
 		.lname	= "rdma engine port",
 		.type	= FIO_OPT_INT,
@@ -1004,30 +1015,53 @@ static int fio_rdmaio_close_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
+static int aton(struct thread_data *td, const char *host,
+		     struct sockaddr_in *addr)
+{
+	if (inet_aton(host, &addr->sin_addr) != 1) {
+		struct hostent *hent;
+
+		hent = gethostbyname(host);
+		if (!hent) {
+			td_verror(td, errno, "gethostbyname");
+			return 1;
+		}
+
+		memcpy(&addr->sin_addr, hent->h_addr, 4);
+	}
+	return 0;
+}
+
 static int fio_rdmaio_setup_connect(struct thread_data *td, const char *host,
 				    unsigned short port)
 {
 	struct rdmaio_data *rd = td->io_ops_data;
+	struct rdmaio_options *o = td->eo;
+	struct sockaddr_storage addrb;
 	struct ibv_recv_wr *bad_wr;
 	int err;
 
 	rd->addr.sin_family = AF_INET;
 	rd->addr.sin_port = htons(port);
 
-	if (inet_aton(host, &rd->addr.sin_addr) != 1) {
-		struct hostent *hent;
+	err = aton(td, host, &rd->addr);
+	if (err)
+		return err;
 
-		hent = gethostbyname(host);
-		if (!hent) {
-			td_verror(td, errno, "gethostbyname");
-			return 1;
-		}
+	/* resolve route */
+	if (strcmp(o->bindname, "") != 0) {
+		addrb.ss_family = AF_INET;
+		err = aton(td, o->bindname, (struct sockaddr_in *)&addrb);
+		if (err)
+			return err;
+		err = rdma_resolve_addr(rd->cm_id, (struct sockaddr *)&addrb,
+					(struct sockaddr *)&rd->addr, 2000);
 
-		memcpy(&rd->addr.sin_addr, hent->h_addr, 4);
+	} else {
+		err = rdma_resolve_addr(rd->cm_id, NULL,
+					(struct sockaddr *)&rd->addr, 2000);
 	}
 
-	/* resolve route */
-	err = rdma_resolve_addr(rd->cm_id, NULL, (struct sockaddr *)&rd->addr, 2000);
 	if (err != 0) {
 		log_err("fio: rdma_resolve_addr: %d\n", err);
 		return 1;
@@ -1072,15 +1106,20 @@ static int fio_rdmaio_setup_connect(struct thread_data *td, const char *host,
 static int fio_rdmaio_setup_listen(struct thread_data *td, short port)
 {
 	struct rdmaio_data *rd = td->io_ops_data;
+	struct rdmaio_options *o = td->eo;
 	struct ibv_recv_wr *bad_wr;
 	int state = td->runstate;
 
 	td_set_runstate(td, TD_SETTING_UP);
 
 	rd->addr.sin_family = AF_INET;
-	rd->addr.sin_addr.s_addr = htonl(INADDR_ANY);
 	rd->addr.sin_port = htons(port);
 
+	if (strcmp(o->bindname, "") == 0)
+		rd->addr.sin_addr.s_addr = htonl(INADDR_ANY);
+	else
+		rd->addr.sin_addr.s_addr = htonl(*o->bindname);
+
 	/* rdma_listen */
 	if (rdma_bind_addr(rd->cm_id, (struct sockaddr *)&rd->addr) != 0) {
 		log_err("fio: rdma_bind_addr fail: %m\n");
@@ -1155,7 +1194,8 @@ static int compat_options(struct thread_data *td)
 {
 	// The original RDMA engine had an ugly / seperator
 	// on the filename for it's options. This function
-	// retains backwards compatibility with it.100
+	// retains backwards compatibility with it. Note we do not
+	// support setting the bindname option in this legacy mode.
 
 	struct rdmaio_options *o = td->eo;
 	char *modep, *portp;
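The new `aton()` helper tries `inet_aton()` first and falls back to `gethostbyname()`, which reports failures through `h_errno` rather than `errno` and was removed from POSIX.1-2008 in favour of `getaddrinfo()`. On the listen side, `htonl(*o->bindname)` converts only the numeric value of the first character of the string rather than parsing it as an address. A minimal sketch of turning a bind or connect name into a `struct sockaddr_in` with `getaddrinfo()` follows; `resolve_ipv4()` is a hypothetical helper, it handles IPv4 only to match the engine's use of `sockaddr_in`, and an empty name maps to `INADDR_ANY` as in the listen path.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill *out with an IPv4 address and port.
 * An empty name means "any local address" (INADDR_ANY), as on the
 * listen side; otherwise name may be a dotted quad or a hostname.
 * Returns 0 on success or a non-zero getaddrinfo() error (EAI_*).
 */
static int resolve_ipv4(const char *name, unsigned short port,
			struct sockaddr_in *out)
{
	struct addrinfo hints, *res;
	int err;

	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(port);

	if (name[0] == '\0') {
		out->sin_addr.s_addr = htonl(INADDR_ANY);
		return 0;
	}

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;	/* the engine stores a sockaddr_in */
	err = getaddrinfo(name, NULL, &hints, &res);
	if (err != 0)
		return err;

	/* Use the first IPv4 result, then release the list. */
	out->sin_addr = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
	freeaddrinfo(res);
	return 0;
}
```

On the client side, the same helper could fill both the destination (`hostname`) and the source (`bindname`) addresses before they are handed to `rdma_resolve_addr()`.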
diff --git a/eta.c b/eta.c
index baaa681..1b0b000 100644
--- a/eta.c
+++ b/eta.c
@@ -537,9 +537,9 @@ void display_thread_status(struct jobs_eta *je)
 		char *tr, *mr;
 
 		mr = num2str(je->m_rate[0] + je->m_rate[1] + je->m_rate[2],
-				4, 0, je->is_pow2, N2S_BYTEPERSEC);
+				je->sig_figs, 0, je->is_pow2, N2S_BYTEPERSEC);
 		tr = num2str(je->t_rate[0] + je->t_rate[1] + je->t_rate[2],
-				4, 0, je->is_pow2, N2S_BYTEPERSEC);
+				je->sig_figs, 0, je->is_pow2, N2S_BYTEPERSEC);
 
 		p += sprintf(p, ", %s-%s", mr, tr);
 		free(tr);
diff --git a/fio.1 b/fio.1
index bd7670a..3224e9a 100644
--- a/fio.1
+++ b/fio.1
@@ -1237,22 +1237,29 @@ more clever block compression attempts, but it will stop naive dedupe of
 blocks. Default: true.
 .TP
 .BI buffer_compress_percentage \fR=\fPint
-If this is set, then fio will attempt to provide I/O buffer content (on
-WRITEs) that compresses to the specified level. Fio does this by providing a
-mix of random data and a fixed pattern. The fixed pattern is either zeros,
-or the pattern specified by \fBbuffer_pattern\fR. If the pattern option
-is used, it might skew the compression ratio slightly. Note that this is per
-block size unit, see \fBbuffer_compress_chunk\fR for setting a finer granularity
-of compressible regions.
+If this is set, then fio will attempt to provide I/O buffer content
+(on WRITEs) that compresses to the specified level. Fio does this by
+providing a mix of random data followed by fixed pattern data. The
+fixed pattern is either zeros, or the pattern specified by
+\fBbuffer_pattern\fR. If the \fBbuffer_pattern\fR option is used, it
+might skew the compression ratio slightly. Setting
+\fBbuffer_compress_percentage\fR to a value other than 100 will also
+enable \fBrefill_buffers\fR in order to reduce the likelihood that
+adjacent blocks are so similar that they over compress when seen
+together. See \fBbuffer_compress_chunk\fR for how to set a finer or
+coarser granularity of the random/fixed data regions. Defaults to unset
+i.e., buffer data will not adhere to any compression level.
 .TP
 .BI buffer_compress_chunk \fR=\fPint
-See \fBbuffer_compress_percentage\fR. This setting allows fio to manage
-how big the ranges of random data and zeroed data is. Without this set, fio
-will provide \fBbuffer_compress_percentage\fR of blocksize random data,
-followed by the remaining zeroed. With this set to some chunk size smaller
-than the block size, fio can alternate random and zeroed data throughout the
-I/O buffer. This is particularly useful when bigger block sizes are used
-for a job. Defaults to 512.
+This setting allows fio to manage how big the random/fixed data region
+is when using \fBbuffer_compress_percentage\fR. When
+\fBbuffer_compress_chunk\fR is set to some non-zero value smaller than the
+block size, fio can repeat the random/fixed region throughout the I/O
+buffer at the specified interval (which is particularly useful when
+bigger block sizes are used for a job). When set to 0, fio will use a
+chunk size that matches the block size resulting in a single
+random/fixed region within the I/O buffer. Defaults to 512. When the
+unit is omitted, the value is interpreted in bytes.
 .TP
 .BI buffer_pattern \fR=\fPstr
 If set, fio will fill the I/O buffers with this pattern or with the contents
@@ -1295,7 +1302,9 @@ If set, fio will generate this percentage of identical buffers when
 writing. These buffers will be naturally dedupable. The contents of the
 buffers depend on what other buffer compression settings have been set. It's
 possible to have the individual buffers either fully compressible, or not at
-all. This option only controls the distribution of unique buffers.
+all \-\- this option only controls the distribution of unique buffers. Setting
+this option will also enable \fBrefill_buffers\fR to prevent every buffer
+being identical.
 .TP
 .BI invalidate \fR=\fPbool
 Invalidate the buffer/page cache parts of the files to be used prior to
@@ -2586,6 +2595,12 @@ numbers, and list the numbers in ascending order. For example,
 `\-\-percentile_list=99.5:99.9' will cause fio to report the values of
 completion latency below which 99.5% and 99.9% of the observed latencies
 fell, respectively.
+.TP
+.BI significant_figures \fR=\fPint
+If using \fB\-\-output\-format\fR of `normal', set the significant figures
+to this value. Higher values will yield more precise IOPS and throughput
+units, while lower values will round. Requires a minimum value of 1 and a
+maximum value of 10. Defaults to 4.
 .SS "Error handling"
 .TP
 .BI exitall_on_error
diff --git a/gclient.c b/gclient.c
index daa9153..ab7aa10 100644
--- a/gclient.c
+++ b/gclient.c
@@ -379,24 +379,24 @@ static void gfio_update_client_eta(struct fio_client *client, struct jobs_eta *j
 			sprintf(output, "%3.1f%% done", perc);
 		}
 
-		iops_str[0] = num2str(je->iops[0], 4, 1, 0, N2S_PERSEC);
-		iops_str[1] = num2str(je->iops[1], 4, 1, 0, N2S_PERSEC);
-		iops_str[2] = num2str(je->iops[2], 4, 1, 0, N2S_PERSEC);
+		iops_str[0] = num2str(je->iops[0], je->sig_figs, 1, 0, N2S_PERSEC);
+		iops_str[1] = num2str(je->iops[1], je->sig_figs, 1, 0, N2S_PERSEC);
+		iops_str[2] = num2str(je->iops[2], je->sig_figs, 1, 0, N2S_PERSEC);
 
-		rate_str[0] = num2str(je->rate[0], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[0] = num2str(je->rate[0], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[0] = num2str(je->rate[0], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[0] = num2str(je->rate[0], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[0], rate_alt[0]);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.read_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.read_iops), iops_str[0]);
 
-		rate_str[1] = num2str(je->rate[1], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[1] = num2str(je->rate[1], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[1] = num2str(je->rate[1], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[1] = num2str(je->rate[1], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[1], rate_alt[1]);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.write_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.write_iops), iops_str[1]);
 
-		rate_str[2] = num2str(je->rate[2], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[2] = num2str(je->rate[2], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[2] = num2str(je->rate[2], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[2] = num2str(je->rate[2], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[2], rate_alt[2]);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.trim_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.trim_iops), iops_str[2]);
@@ -463,24 +463,24 @@ static void gfio_update_all_eta(struct jobs_eta *je)
 			sprintf(output, "%3.1f%% done", perc);
 		}
 
-		iops_str[0] = num2str(je->iops[0], 4, 1, 0, N2S_PERSEC);
-		iops_str[1] = num2str(je->iops[1], 4, 1, 0, N2S_PERSEC);
-		iops_str[2] = num2str(je->iops[2], 4, 1, 0, N2S_PERSEC);
+		iops_str[0] = num2str(je->iops[0], je->sig_figs, 1, 0, N2S_PERSEC);
+		iops_str[1] = num2str(je->iops[1], je->sig_figs, 1, 0, N2S_PERSEC);
+		iops_str[2] = num2str(je->iops[2], je->sig_figs, 1, 0, N2S_PERSEC);
 
-		rate_str[0] = num2str(je->rate[0], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[0] = num2str(je->rate[0], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[0] = num2str(je->rate[0], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[0] = num2str(je->rate[0], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[0], rate_alt[0]);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.read_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.read_iops), iops_str[0]);
 
-		rate_str[1] = num2str(je->rate[1], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[1] = num2str(je->rate[1], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[1] = num2str(je->rate[1], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[1] = num2str(je->rate[1], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[1], rate_alt[1]);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.write_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.write_iops), iops_str[1]);
 
-		rate_str[2] = num2str(je->rate[2], 4, 10, i2p, N2S_BYTEPERSEC);
-		rate_alt[2] = num2str(je->rate[2], 4, 10, !i2p, N2S_BYTEPERSEC);
+		rate_str[2] = num2str(je->rate[2], je->sig_figs, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[2] = num2str(je->rate[2], je->sig_figs, 10, !i2p, N2S_BYTEPERSEC);
 		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[2], rate_alt[2]);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.trim_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.trim_iops), iops_str[2]);
@@ -587,10 +587,10 @@ static void gfio_add_job_op(struct fio_client *client, struct fio_net_cmd *cmd)
 	multitext_add_entry(&ge->eta.iotype, tmp);
 
 	i2p = is_power_of_2(o->kb_base);
-	c1 = num2str(o->min_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
-	c2 = num2str(o->max_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
-	c3 = num2str(o->min_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
-	c4 = num2str(o->max_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
+	c1 = num2str(o->min_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
+	c2 = num2str(o->max_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
+	c3 = num2str(o->min_bs[DDIR_WRITE], o->sig_figs, 1, i2p, N2S_BYTE);
+	c4 = num2str(o->max_bs[DDIR_WRITE], o->sig_figs, 1, i2p, N2S_BYTE);
 
 	sprintf(tmp, "%s-%s,%s-%s", c1, c2, c3, c4);
 	free(c1);
@@ -1183,7 +1183,7 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 	bw = (1000 * ts->io_bytes[ddir]) / runt;
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
-	iops_p = num2str(iops, 4, 1, 0, N2S_PERSEC);
+	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_PERSEC);
 
 	box = gtk_hbox_new(FALSE, 3);
 	gtk_box_pack_start(GTK_BOX(mbox), box, TRUE, FALSE, 3);
@@ -1198,14 +1198,14 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 	gtk_box_pack_start(GTK_BOX(main_vbox), box, TRUE, FALSE, 3);
 
 	label = new_info_label_in_frame(box, "IO");
-	io_p = num2str(ts->io_bytes[ddir], 4, 1, i2p, N2S_BYTE);
-	io_palt = num2str(ts->io_bytes[ddir], 4, 1, !i2p, N2S_BYTE);
+	io_p = num2str(ts->io_bytes[ddir], ts->sig_figs, 1, i2p, N2S_BYTE);
+	io_palt = num2str(ts->io_bytes[ddir], ts->sig_figs, 1, !i2p, N2S_BYTE);
 	snprintf(tmp, sizeof(tmp), "%s (%s)", io_p, io_palt);
 	gtk_label_set_text(GTK_LABEL(label), tmp);
 
 	label = new_info_label_in_frame(box, "Bandwidth");
-	bw_p = num2str(bw, 4, 1, i2p, ts->unit_base);
-	bw_palt = num2str(bw, 4, 1, !i2p, ts->unit_base);
+	bw_p = num2str(bw, ts->sig_figs, 1, i2p, ts->unit_base);
+	bw_palt = num2str(bw, ts->sig_figs, 1, !i2p, ts->unit_base);
 	snprintf(tmp, sizeof(tmp), "%s (%s)", bw_p, bw_palt);
 	gtk_label_set_text(GTK_LABEL(label), tmp);
 
diff --git a/init.c b/init.c
index 736c6ff..b7e9c0e 100644
--- a/init.c
+++ b/init.c
@@ -1589,14 +1589,14 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 				char *c5 = NULL, *c6 = NULL;
 				int i2p = is_power_of_2(o->kb_base);
 
-				c1 = num2str(o->min_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
-				c2 = num2str(o->max_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
-				c3 = num2str(o->min_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
-				c4 = num2str(o->max_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
+				c1 = num2str(o->min_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
+				c2 = num2str(o->max_bs[DDIR_READ], o->sig_figs, 1, i2p, N2S_BYTE);
+				c3 = num2str(o->min_bs[DDIR_WRITE], o->sig_figs, 1, i2p, N2S_BYTE);
+				c4 = num2str(o->max_bs[DDIR_WRITE], o->sig_figs, 1, i2p, N2S_BYTE);
 
 				if (!o->bs_is_seq_rand) {
-					c5 = num2str(o->min_bs[DDIR_TRIM], 4, 1, i2p, N2S_BYTE);
-					c6 = num2str(o->max_bs[DDIR_TRIM], 4, 1, i2p, N2S_BYTE);
+					c5 = num2str(o->min_bs[DDIR_TRIM], o->sig_figs, 1, i2p, N2S_BYTE);
+					c6 = num2str(o->max_bs[DDIR_TRIM], o->sig_figs, 1, i2p, N2S_BYTE);
 				}
 
 				log_info("%s: (g=%d): rw=%s, ", td->o.name,
diff --git a/options.c b/options.c
index a0fcd8f..7caccb3 100644
--- a/options.c
+++ b/options.c
@@ -4127,6 +4127,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
 	},
+	{
+		.name	= "significant_figures",
+		.lname	= "Significant figures",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, sig_figs),
+		.maxval	= 10,
+		.minval	= 1,
+		.help	= "Significant figures for output-format set to normal",
+		.def	= "4",
+		.interval = 1,
+		.category = FIO_OPT_C_STAT,
+		.group	= FIO_OPT_G_INVALID,
+	},
 
 #ifdef FIO_HAVE_DISK_UTIL
 	{
diff --git a/server.c b/server.c
index e6ea4cd..967cebe 100644
--- a/server.c
+++ b/server.c
@@ -1538,6 +1538,8 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.latency_window	= cpu_to_le64(ts->latency_window);
 	p.ts.latency_percentile.u.i = cpu_to_le64(fio_double_to_uint64(ts->latency_percentile.u.f));
 
+	p.ts.sig_figs		= cpu_to_le32(ts->sig_figs);
+
 	p.ts.nr_block_infos	= cpu_to_le64(ts->nr_block_infos);
 	for (i = 0; i < p.ts.nr_block_infos; i++)
 		p.ts.block_infos[i] = cpu_to_le32(ts->block_infos[i]);
diff --git a/stat.c b/stat.c
index 89e2e6c..48d8e7d 100644
--- a/stat.c
+++ b/stat.c
@@ -299,14 +299,14 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		if (!rs->max_run[i])
 			continue;
 
-		io = num2str(rs->iobytes[i], 4, 1, i2p, N2S_BYTE);
-		ioalt = num2str(rs->iobytes[i], 4, 1, !i2p, N2S_BYTE);
-		agg = num2str(rs->agg[i], 4, 1, i2p, rs->unit_base);
-		aggalt = num2str(rs->agg[i], 4, 1, !i2p, rs->unit_base);
-		min = num2str(rs->min_bw[i], 4, 1, i2p, rs->unit_base);
-		minalt = num2str(rs->min_bw[i], 4, 1, !i2p, rs->unit_base);
-		max = num2str(rs->max_bw[i], 4, 1, i2p, rs->unit_base);
-		maxalt = num2str(rs->max_bw[i], 4, 1, !i2p, rs->unit_base);
+		io = num2str(rs->iobytes[i], rs->sig_figs, 1, i2p, N2S_BYTE);
+		ioalt = num2str(rs->iobytes[i], rs->sig_figs, 1, !i2p, N2S_BYTE);
+		agg = num2str(rs->agg[i], rs->sig_figs, 1, i2p, rs->unit_base);
+		aggalt = num2str(rs->agg[i], rs->sig_figs, 1, !i2p, rs->unit_base);
+		min = num2str(rs->min_bw[i], rs->sig_figs, 1, i2p, rs->unit_base);
+		minalt = num2str(rs->min_bw[i], rs->sig_figs, 1, !i2p, rs->unit_base);
+		max = num2str(rs->max_bw[i], rs->sig_figs, 1, i2p, rs->unit_base);
+		maxalt = num2str(rs->max_bw[i], rs->sig_figs, 1, !i2p, rs->unit_base);
 		log_buf(out, "%s: bw=%s (%s), %s-%s (%s-%s), io=%s (%s), run=%llu-%llumsec\n",
 				rs->unified_rw_rep ? "  MIXED" : str[i],
 				agg, aggalt, min, max, minalt, maxalt, io, ioalt,
@@ -435,12 +435,12 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	runt = ts->runtime[ddir];
 
 	bw = (1000 * ts->io_bytes[ddir]) / runt;
-	io_p = num2str(ts->io_bytes[ddir], 4, 1, i2p, N2S_BYTE);
-	bw_p = num2str(bw, 4, 1, i2p, ts->unit_base);
-	bw_p_alt = num2str(bw, 4, 1, !i2p, ts->unit_base);
+	io_p = num2str(ts->io_bytes[ddir], ts->sig_figs, 1, i2p, N2S_BYTE);
+	bw_p = num2str(bw, ts->sig_figs, 1, i2p, ts->unit_base);
+	bw_p_alt = num2str(bw, ts->sig_figs, 1, !i2p, ts->unit_base);
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
-	iops_p = num2str(iops, 4, 1, 0, N2S_NONE);
+	iops_p = num2str(iops, ts->sig_figs, 1, 0, N2S_NONE);
 
 	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)\n",
 			rs->unified_rw_rep ? "mixed" : str[ddir],
@@ -738,9 +738,9 @@ static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
 	bw_mean = steadystate_bw_mean(ts);
 	iops_mean = steadystate_iops_mean(ts);
 
-	p1 = num2str(bw_mean / ts->kb_base, 4, ts->kb_base, i2p, ts->unit_base);
-	p1alt = num2str(bw_mean / ts->kb_base, 4, ts->kb_base, !i2p, ts->unit_base);
-	p2 = num2str(iops_mean, 4, 1, 0, N2S_NONE);
+	p1 = num2str(bw_mean / ts->kb_base, ts->sig_figs, ts->kb_base, i2p, ts->unit_base);
+	p1alt = num2str(bw_mean / ts->kb_base, ts->sig_figs, ts->kb_base, !i2p, ts->unit_base);
+	p2 = num2str(iops_mean, ts->sig_figs, 1, 0, N2S_NONE);
 
 	log_buf(out, "  steadystate  : attained=%s, bw=%s (%s), iops=%s, %s%s=%.3f%s\n",
 		ts->ss_state & __FIO_SS_ATTAINED ? "yes" : "no",
@@ -1690,6 +1690,7 @@ void __show_run_stats(void)
 
 			ts->kb_base = td->o.kb_base;
 			ts->unit_base = td->o.unit_base;
+			ts->sig_figs = td->o.sig_figs;
 			ts->unified_rw_rep = td->o.unified_rw_rep;
 		} else if (ts->kb_base != td->o.kb_base && !kb_base_warned) {
 			log_info("fio: kb_base differs for jobs in group, using"
@@ -1752,6 +1753,7 @@ void __show_run_stats(void)
 		rs = &runstats[ts->groupid];
 		rs->kb_base = ts->kb_base;
 		rs->unit_base = ts->unit_base;
+		rs->sig_figs = ts->sig_figs;
 		rs->unified_rw_rep += ts->unified_rw_rep;
 
 		for (j = 0; j < DDIR_RWDIR_CNT; j++) {
diff --git a/stat.h b/stat.h
index 6ddcad2..ba66c40 100644
--- a/stat.h
+++ b/stat.h
@@ -11,6 +11,7 @@ struct group_run_stats {
 	uint64_t agg[DDIR_RWDIR_CNT];
 	uint32_t kb_base;
 	uint32_t unit_base;
+	uint32_t sig_figs;
 	uint32_t groupid;
 	uint32_t unified_rw_rep;
 } __attribute__((packed));
@@ -221,6 +222,8 @@ struct thread_stat {
 	fio_fp64_t latency_percentile;
 	uint64_t latency_window;
 
+	uint32_t sig_figs;
+
 	uint64_t ss_dur;
 	uint32_t ss_state;
 	uint32_t ss_head;
@@ -257,6 +260,8 @@ struct jobs_eta {
 	uint32_t is_pow2;
 	uint32_t unit_base;
 
+	uint32_t sig_figs;
+
 	uint32_t files_open;
 
 	/*
diff --git a/thread_options.h b/thread_options.h
index 5a037bf..ca549b5 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -309,6 +309,8 @@ struct thread_options {
 	unsigned long long latency_window;
 	fio_fp64_t latency_percentile;
 
+	unsigned int sig_figs;
+
 	unsigned block_error_hist;
 
 	unsigned int replay_align;
@@ -584,6 +586,8 @@ struct thread_options_pack {
 	uint64_t latency_window;
 	fio_fp64_t latency_percentile;
 
+	uint32_t sig_figs;
+
 	uint32_t block_error_hist;
 
 	uint32_t replay_align;
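The patch above replaces the hard-coded 4 passed to num2str() with a per-job sig_figs value, carried through thread_stat, group_run_stats and jobs_eta. As a rough, standalone illustration of what a significant-figures setting controls, the sketch below scales a value by 1000 and prints it with a chosen number of significant digits. fmt_sig_figs() is a hypothetical helper, not fio's num2str(), and it ignores kb_base/unit_base handling entirely:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not fio's num2str()): divide by 1000 until the value
 * is below 1000, then print it with sig_figs significant digits plus an SI
 * prefix. %.*g drops trailing zeros, so 512 stays "512".
 */
static void fmt_sig_figs(double val, int sig_figs, char *buf, size_t len)
{
	static const char prefixes[] = " kMGTPE";
	int i = 0;

	assert(sig_figs > 0);

	while (val >= 1000.0 && prefixes[i + 1] != '\0') {
		val /= 1000.0;
		i++;
	}

	if (i == 0)
		snprintf(buf, len, "%.*g", sig_figs, val);
	else
		snprintf(buf, len, "%.*g%c", sig_figs, val, prefixes[i]);
}
```

With this sketch, fmt_sig_figs(123456.0, 4, buf, sizeof(buf)) produces "123.5k", while a precision of 6 produces "123.456k".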


* RE: Recent changes (master)
  2017-11-18 13:00 Jens Axboe
@ 2017-11-20 15:00 ` Elliott, Robert (Persistent Memory)
  0 siblings, 0 replies; 1305+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2017-11-20 15:00 UTC (permalink / raw)
  To: Jens Axboe, fio



> -----Original Message-----
> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On
> Behalf Of Jens Axboe
> Sent: Saturday, November 18, 2017 7:00 AM
...
> +/*
> + * Limits us to 1GiB of mapped files in total to model after
> + * libpmem engine behavior
> + */
> +#define MMAP_TOTAL_SZ   (1 * 1024 * 1024 * 1024UL)
> +
...
> +#define MEGABYTE ((uintptr_t)1 << 20)
> +#define GIGABYTE ((uintptr_t)1 << 30)
> +#define PROCMAXLEN 2048 /* maximum expected line length in /proc files */
> +#define roundup(x, y)   ((((x) + ((y) - 1)) / (y)) * (y))
...
> +/*
> + * util_map_hint_align -- choose the desired mapping alignment
> + *
> + * Use 2MB/1GB page alignment only if the mapping length is at least
> + * twice as big as the page size.
> + */
...
> + * Except for Windows Environment:
> + *   ALSR in 64-bit Linux kernel uses 28-bit of randomness for mmap
> + *   (bit positions 12-39), which means the base mapping address is randomized
> + *   within [0..1024GB] range, with 4KB granularity.  Assuming additional
> + *   1GB alignment, it results in 1024 possible locations.

Please use the IEC prefixes for binary units in new code:
MEBIBYTE, GIBIBYTE, MiB, GiB, etc.
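A minimal sketch of the naming suggested above, assuming the same uintptr_t shifts the patch uses for MEGABYTE and GIGABYTE. KIBIBYTE is added here only for completeness and does not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* IEC binary prefixes: 1 KiB = 2^10, 1 MiB = 2^20, 1 GiB = 2^30 bytes. */
#define KIBIBYTE ((uintptr_t)1 << 10)
#define MEBIBYTE ((uintptr_t)1 << 20)
#define GIBIBYTE ((uintptr_t)1 << 30)

/* The patch's 1 GiB total mapping cap, spelled with the IEC-named constant. */
#define MMAP_TOTAL_SZ GIBIBYTE

static_assert(MEBIBYTE == 1024 * KIBIBYTE, "1 MiB is 1024 KiB");
static_assert(GIBIBYTE == 1024 * MEBIBYTE, "1 GiB is 1024 MiB");
```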


---
Robert Elliott, HPE Persistent Memory





* Recent changes (master)
@ 2017-11-18 13:00 Jens Axboe
  2017-11-20 15:00 ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2017-11-18 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit adf075fa89a7f3bbb45237f1440de0583833bd80:

  ioengines: remove pointless list initializations (2017-11-16 20:03:15 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 40e5f1bf1aca5970528724873a4544c43712a75d:

  Merge branch 'libpmem' (2017-11-17 09:21:19 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      Merge branch 'add-libpmem-engine' of https://github.com/tishizaki/fio into libpmem
      libpmem: code cleanups
      libpmem: move mmap alignment to init time
      examples/libpmem.fio: clean up example
      Merge branch 'libpmem'

Teruaki Ishizaki (1):
      fio: add libpmem engine

 HOWTO                |   5 +
 Makefile             |   3 +
 configure            |   9 +
 engines/libpmem.c    | 591 +++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/libpmem.fio |  73 +++++++
 fio.1                |   5 +
 options.c            |   5 +
 7 files changed, 691 insertions(+)
 create mode 100644 engines/libpmem.c
 create mode 100644 examples/libpmem.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 419fa73..dce96bc 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1820,6 +1820,11 @@ I/O engine
 			set  `filesize` so that all the accounting still occurs, but no
 			actual I/O will be done other than creating the file.
 
+		**libpmem**
+			Read and write using mmap I/O to a file on a filesystem
+			mounted with DAX on a persistent memory device through the NVML
+			libpmem library.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/Makefile b/Makefile
index 2893348..3ce6064 100644
--- a/Makefile
+++ b/Makefile
@@ -135,6 +135,9 @@ endif
 ifdef CONFIG_LINUX_DEVDAX
   SOURCE += engines/dev-dax.c
 endif
+ifdef CONFIG_LIBPMEM
+  SOURCE += engines/libpmem.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
diff --git a/configure b/configure
index d34c000..31ba822 100755
--- a/configure
+++ b/configure
@@ -142,6 +142,7 @@ gfio_check="no"
 libhdfs="no"
 pmemblk="no"
 devdax="no"
+pmem="no"
 disable_lex=""
 disable_pmem="no"
 prefix=/usr/local
@@ -1845,6 +1846,7 @@ print_config "libpmemblk" "$libpmemblk"
 
 # Choose the ioengines
 if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
+  pmem="yes"
   devdax="yes"
   if test "$libpmemblk" = "yes"; then
     pmemblk="yes"
@@ -1860,6 +1862,10 @@ print_config "NVML pmemblk engine" "$pmemblk"
 print_config "NVML dev-dax engine" "$devdax"
 
 ##########################################
+# Report whether libpmem engine is enabled
+print_config "NVML libpmem engine" "$pmem"
+
+##########################################
 # Check if we have lex/yacc available
 yacc="no"
 yacc_is_bison="no"
@@ -2300,6 +2306,9 @@ fi
 if test "$devdax" = "yes" ; then
   output_sym "CONFIG_LINUX_DEVDAX"
 fi
+if test "$pmem" = "yes" ; then
+  output_sym "CONFIG_LIBPMEM"
+fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
diff --git a/engines/libpmem.c b/engines/libpmem.c
new file mode 100644
index 0000000..3f4e44f
--- /dev/null
+++ b/engines/libpmem.c
@@ -0,0 +1,591 @@
+/*
+ * libpmem: IO engine that uses NVML libpmem to read and write data
+ *
+ * Copyright (C) 2017 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation..
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+/*
+ * libpmem engine
+ *
+ * IO engine that uses libpmem to read and write data
+ *
+ * To use:
+ *   ioengine=libpmem
+ *
+ * Other relevant settings:
+ *   iodepth=1
+ *   direct=1
+ *   directory=/mnt/pmem0/
+ *   bs=4k
+ *
+ *   direct=1 means that pmem_drain() is executed for each write operation.
+ *   In contrast, direct=0 means that pmem_drain() is not executed.
+ *
+ *   The pmem device must have a DAX-capable filesystem and be mounted
+ *   with DAX enabled. directory must point to a mount point of DAX FS.
+ *
+ *   Example:
+ *     mkfs.xfs /dev/pmem0
+ *     mkdir /mnt/pmem0
+ *     mount -o dax /dev/pmem0 /mnt/pmem0
+ *
+ *
+ * See examples/libpmem.fio for more.
+ *
+ *
+ * libpmem.so
+ *   By default, the libpmem engine will let the system find the libpmem.so
+ *   that it uses. You can use an alternative libpmem by setting the
+ *   FIO_PMEM_LIB environment variable to the full path to the desired
+ *   libpmem.so.
+ */
+
+#include <stdio.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <libgen.h>
+#include <libpmem.h>
+
+#include "../fio.h"
+#include "../verify.h"
+
+/*
+ * Limits us to 1GiB of mapped files in total to model after
+ * libpmem engine behavior
+ */
+#define MMAP_TOTAL_SZ   (1 * 1024 * 1024 * 1024UL)
+
+struct fio_libpmem_data {
+	void *libpmem_ptr;
+	size_t libpmem_sz;
+	off_t libpmem_off;
+};
+
+#define MEGABYTE ((uintptr_t)1 << 20)
+#define GIGABYTE ((uintptr_t)1 << 30)
+#define PROCMAXLEN 2048 /* maximum expected line length in /proc files */
+#define roundup(x, y)   ((((x) + ((y) - 1)) / (y)) * (y))
+
+static bool Mmap_no_random;
+static void *Mmap_hint;
+static unsigned long long Mmap_align;
+
+/*
+ * util_map_hint_align -- choose the desired mapping alignment
+ *
+ * Use 2MB/1GB page alignment only if the mapping length is at least
+ * twice as big as the page size.
+ */
+static inline size_t util_map_hint_align(size_t len, size_t req_align)
+{
+	size_t align = Mmap_align;
+
+	dprint(FD_IO, "DEBUG util_map_hint_align\n" );
+
+	if (req_align)
+		align = req_align;
+	else if (len >= 2 * GIGABYTE)
+		align = GIGABYTE;
+	else if (len >= 4 * MEGABYTE)
+		align = 2 * MEGABYTE;
+
+	dprint(FD_IO, "align=%d\n", (int)align);
+	return align;
+}
+
+#ifdef __FreeBSD__
+static const char *sscanf_os = "%p %p";
+#define MAP_NORESERVE 0
+#define OS_MAPFILE "/proc/curproc/map"
+#else
+static const char *sscanf_os = "%p-%p";
+#define OS_MAPFILE "/proc/self/maps"
+#endif
+
+/*
+ * util_map_hint_unused -- use /proc to determine a hint address for mmap()
+ *
+ * This is a helper function for util_map_hint().
+ * It opens up /proc/self/maps and looks for the first unused address
+ * in the process address space that is:
+ * - greater or equal 'minaddr' argument,
+ * - large enough to hold range of given length,
+ * - aligned to the specified unit.
+ *
+ * Asking for aligned address like this will allow the DAX code to use large
+ * mappings.  It is not an error if mmap() ignores the hint and chooses
+ * a different address.
+ */
+static char *util_map_hint_unused(void *minaddr, size_t len, size_t align)
+{
+	char *lo = NULL;        /* beginning of current range in maps file */
+	char *hi = NULL;        /* end of current range in maps file */
+	char *raddr = minaddr;  /* ignore regions below 'minaddr' */
+
+#ifdef WIN32
+	MEMORY_BASIC_INFORMATION mi;
+#else
+	FILE *fp;
+	char line[PROCMAXLEN];  /* for fgets() */
+#endif
+
+	dprint(FD_IO, "DEBUG util_map_hint_unused\n");
+	assert(align > 0);
+
+	if (raddr == NULL)
+		raddr += page_size;
+
+	raddr = (char *)roundup((uintptr_t)raddr, align);
+
+#ifdef WIN32
+	while ((uintptr_t)raddr < UINTPTR_MAX - len) {
+		size_t ret = VirtualQuery(raddr, &mi, sizeof(mi));
+		if (ret == 0) {
+			ERR("VirtualQuery %p", raddr);
+			return MAP_FAILED;
+		}
+		dprint(FD_IO, "addr %p len %zu state %d",
+				mi.BaseAddress, mi.RegionSize, mi.State);
+
+		if ((mi.State != MEM_FREE) || (mi.RegionSize < len)) {
+			raddr = (char *)mi.BaseAddress + mi.RegionSize;
+			raddr = (char *)roundup((uintptr_t)raddr, align);
+			dprint(FD_IO, "nearest aligned addr %p", raddr);
+		} else {
+			dprint(FD_IO, "unused region of size %zu found at %p",
+					mi.RegionSize, mi.BaseAddress);
+			return mi.BaseAddress;
+		}
+	}
+
+	dprint(FD_IO, "end of address space reached");
+	return MAP_FAILED;
+#else
+	fp = fopen(OS_MAPFILE, "r");
+	if (!fp) {
+		log_err("!%s\n", OS_MAPFILE);
+		return MAP_FAILED;
+	}
+
+	while (fgets(line, PROCMAXLEN, fp) != NULL) {
+		/* check for range line */
+		if (sscanf(line, sscanf_os, &lo, &hi) == 2) {
+			dprint(FD_IO, "%p-%p\n", lo, hi);
+			if (lo > raddr) {
+				if ((uintptr_t)(lo - raddr) >= len) {
+					dprint(FD_IO, "unused region of size "
+							"%zu found at %p\n",
+							lo - raddr, raddr);
+					break;
+				} else {
+					dprint(FD_IO, "region is too small: "
+							"%zu < %zu\n",
+							lo - raddr, len);
+				}
+			}
+
+			if (hi > raddr) {
+				raddr = (char *)roundup((uintptr_t)hi, align);
+				dprint(FD_IO, "nearest aligned addr %p\n",
+						raddr);
+			}
+
+			if (raddr == 0) {
+				dprint(FD_IO, "end of address space reached\n");
+				break;
+			}
+		}
+	}
+
+	/*
+	 * Check for a case when this is the last unused range in the address
+	 * space, but is not large enough. (very unlikely)
+	 */
+	if ((raddr != NULL) && (UINTPTR_MAX - (uintptr_t)raddr < len)) {
+		dprint(FD_IO, "end of address space reached");
+		raddr = MAP_FAILED;
+	}
+
+	fclose(fp);
+
+	dprint(FD_IO, "returning %p", raddr);
+	return raddr;
+#endif
+}
+
+/*
+ * util_map_hint -- determine hint address for mmap()
+ *
+ * If PMEM_MMAP_HINT environment variable is not set, we let the system pick
+ * the randomized mapping address.  Otherwise, a user-defined hint address
+ * is used.
+ *
+ * Windows Environment:
+ *   XXX - Windows doesn't support large DAX pages yet, so there is
+ *   no point in aligning for the same.
+ *
+ * Except for Windows Environment:
+ *   ASLR in 64-bit Linux kernel uses 28-bit of randomness for mmap
+ *   (bit positions 12-39), which means the base mapping address is randomized
+ *   within [0..1024GB] range, with 4KB granularity.  Assuming additional
+ *   1GB alignment, it results in 1024 possible locations.
+ *
+ *   Configuring the hint address via PMEM_MMAP_HINT environment variable
+ *   disables address randomization.  In such a case, the function will search for
+ *   the first unused, properly aligned region of given size, above the
+ *   specified address.
+ */
+static char *util_map_hint(size_t len, size_t req_align)
+{
+	char *addr;
+	size_t align = 0;
+	char *e = NULL;
+
+	dprint(FD_IO, "DEBUG util_map_hint\n");
+	dprint(FD_IO, "len %zu req_align %zu\n", len, req_align);
+
+	/* choose the desired alignment based on the requested length */
+	align = util_map_hint_align(len, req_align);
+
+	e = getenv("PMEM_MMAP_HINT");
+	if (e) {
+		char *endp;
+		unsigned long long val = 0;
+
+		errno = 0;
+
+		val = strtoull(e, &endp, 16);
+		if (errno || endp == e) {
+			dprint(FD_IO, "Invalid PMEM_MMAP_HINT\n");
+		} else {
+			Mmap_hint = (void *)val;
+			Mmap_no_random = true;
+			dprint(FD_IO, "PMEM_MMAP_HINT set to %p\n", Mmap_hint);
+		}
+	}
+
+	if (Mmap_no_random) {
+		dprint(FD_IO, "user-defined hint %p\n", (void *)Mmap_hint);
+		addr = util_map_hint_unused((void *)Mmap_hint, len, align);
+	} else {
+		/*
+		 * Create dummy mapping to find an unused region of given size.
+		 * Request an increased size for later address alignment.
+		 *
+		 * Windows Environment: 
+		 *   Use MAP_NORESERVE flag to only reserve the range of pages
+		 *   rather than commit.  We don't want the pages to be actually
+		 *   backed by the operating system paging file, as the swap
+		 *   file is usually too small to handle terabyte pools.
+		 *
+		 * Except for Windows Environment:
+		 *   Use MAP_PRIVATE with read-only access to simulate
+		 *   zero cost for overcommit accounting.  Note: MAP_NORESERVE
+		 *   flag is ignored if overcommit is disabled (mode 2).
+		 */
+#ifndef WIN32
+		addr = mmap(NULL, len + align, PROT_READ,
+				MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+#else
+		addr = mmap(NULL, len + align, PROT_READ,
+				MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
+#endif
+		if (addr != MAP_FAILED) {
+			dprint(FD_IO, "system choice %p\n", addr);
+			munmap(addr, len + align);
+			addr = (char *)roundup((uintptr_t)addr, align);
+		}
+	}
+
+	dprint(FD_IO, "hint %p\n", addr);
+
+	return addr;
+}
+
+/*
+ * This is the mmap execution function
+ */
+static int fio_libpmem_file(struct thread_data *td, struct fio_file *f,
+			    size_t length, off_t off)
+{
+	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+	int flags = 0;
+	void *addr = NULL;
+
+	dprint(FD_IO, "DEBUG fio_libpmem_file\n");
+
+	if (td_rw(td))
+		flags = PROT_READ | PROT_WRITE;
+	else if (td_write(td)) {
+		flags = PROT_WRITE;
+
+		if (td->o.verify != VERIFY_NONE)
+			flags |= PROT_READ;
+	} else
+		flags = PROT_READ;
+
+	dprint(FD_IO, "f->file_name = %s  td->o.verify = %d \n", f->file_name,
+			td->o.verify);
+	dprint(FD_IO, "length = %ld  flags = %d  f->fd = %d off = %ld \n",
+			length, flags, f->fd,off);
+
+	addr = util_map_hint(length, 0);
+
+	fdd->libpmem_ptr = mmap(addr, length, flags, MAP_SHARED, f->fd, off);
+	if (fdd->libpmem_ptr == MAP_FAILED) {
+		fdd->libpmem_ptr = NULL;
+		td_verror(td, errno, "mmap");
+	}
+
+	if (td->error && fdd->libpmem_ptr)
+		munmap(fdd->libpmem_ptr, length);
+
+	return td->error;
+}
+
+/*
+ * XXX Just mmap an appropriate portion, we cannot mmap the full extent
+ */
+static int fio_libpmem_prep_limited(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+
+	dprint(FD_IO, "DEBUG fio_libpmem_prep_limited\n" );
+
+	if (io_u->buflen > f->real_file_size) {
+		log_err("libpmem: bs too big for libpmem engine\n");
+		return EIO;
+	}
+
+	fdd->libpmem_sz = min(MMAP_TOTAL_SZ, f->real_file_size);
+	if (fdd->libpmem_sz > f->io_size)
+		fdd->libpmem_sz = f->io_size;
+
+	fdd->libpmem_off = io_u->offset;
+
+	return fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
+}
+
+/*
+ * Attempt to mmap the entire file
+ */
+static int fio_libpmem_prep_full(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+	int ret;
+
+	dprint(FD_IO, "DEBUG fio_libpmem_prep_full\n" );
+
+	if (fio_file_partial_mmap(f))
+		return EINVAL;
+
+	dprint(FD_IO," f->io_size %ld : io_u->offset %lld \n",
+			f->io_size, io_u->offset);
+
+	if (io_u->offset != (size_t) io_u->offset ||
+	    f->io_size != (size_t) f->io_size) {
+		fio_file_set_partial_mmap(f);
+		return EINVAL;
+	}
+	fdd->libpmem_sz = f->io_size;
+	fdd->libpmem_off = 0;
+
+	ret = fio_libpmem_file(td, f, fdd->libpmem_sz, fdd->libpmem_off);
+	if (ret)
+		fio_file_set_partial_mmap(f);
+
+	return ret;
+}
+
+static int fio_libpmem_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+	int ret;
+
+	dprint(FD_IO, "DEBUG fio_libpmem_prep\n" );
+	/*
+	 * It fits within existing mapping, use it
+	 */
+	dprint(FD_IO," io_u->offset %lld : fdd->libpmem_off %ld : "
+			"io_u->buflen %ld : fdd->libpmem_sz %ld\n",
+			io_u->offset, fdd->libpmem_off,
+			io_u->buflen, fdd->libpmem_sz);
+
+	if (io_u->offset >= fdd->libpmem_off &&
+	    (io_u->offset + io_u->buflen <
+	     fdd->libpmem_off + fdd->libpmem_sz))
+		goto done;
+
+	/*
+	 * unmap any existing mapping
+	 */
+	if (fdd->libpmem_ptr) {
+		dprint(FD_IO,"munmap \n");
+		if (munmap(fdd->libpmem_ptr, fdd->libpmem_sz) < 0)
+			return errno;
+		fdd->libpmem_ptr = NULL;
+	}
+
+	if (fio_libpmem_prep_full(td, io_u)) {
+		td_clear_error(td);
+		ret = fio_libpmem_prep_limited(td, io_u);
+		if (ret)
+			return ret;
+	}
+
+done:
+	io_u->mmap_data = fdd->libpmem_ptr + io_u->offset - fdd->libpmem_off
+				- f->file_offset;
+	return 0;
+}
+
+static int fio_libpmem_queue(struct thread_data *td, struct io_u *io_u)
+{
+	fio_ro_check(td, io_u);
+	io_u->error = 0;
+
+	dprint(FD_IO, "DEBUG fio_libpmem_queue\n");
+
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		memcpy(io_u->xfer_buf, io_u->mmap_data, io_u->xfer_buflen);
+		break;
+	case DDIR_WRITE:
+		dprint(FD_IO, "DEBUG mmap_data=%p, xfer_buf=%p\n",
+				io_u->mmap_data, io_u->xfer_buf );
+		dprint(FD_IO,"td->o.odirect %d \n",td->o.odirect);
+		if (td->o.odirect) {
+			pmem_memcpy_persist(io_u->mmap_data,
+						io_u->xfer_buf,
+						io_u->xfer_buflen);
+		} else {
+			pmem_memcpy_nodrain(io_u->mmap_data,
+						io_u->xfer_buf,
+						io_u->xfer_buflen);
+		}
+		break;
+	case DDIR_SYNC:
+	case DDIR_DATASYNC:
+	case DDIR_SYNC_FILE_RANGE:
+		break;
+	default:
+		io_u->error = EINVAL;
+		break;
+	}
+
+	return FIO_Q_COMPLETED;
+}
+
+static int fio_libpmem_init(struct thread_data *td)
+{
+	struct thread_options *o = &td->o;
+
+	dprint(FD_IO,"o->rw_min_bs %d \n o->fsync_blocks %d \n o->fdatasync_blocks %d \n",
+			o->rw_min_bs,o->fsync_blocks,o->fdatasync_blocks);
+	dprint(FD_IO, "DEBUG fio_libpmem_init\n");
+
+	if ((o->rw_min_bs & page_mask) &&
+	    (o->fsync_blocks || o->fdatasync_blocks)) {
+		log_err("libpmem: mmap options dictate a minimum block size of "
+				"%llu bytes\n",	(unsigned long long) page_size);
+		return 1;
+	}
+	return 0;
+}
+
+static int fio_libpmem_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_libpmem_data *fdd;
+	int ret;
+
+	dprint(FD_IO,"DEBUG fio_libpmem_open_file\n");
+	dprint(FD_IO,"f->io_size=%ld \n",f->io_size);
+	dprint(FD_IO,"td->o.size=%lld \n",td->o.size);
+	dprint(FD_IO,"td->o.iodepth=%d\n",td->o.iodepth);
+	dprint(FD_IO,"td->o.iodepth_batch=%d \n",td->o.iodepth_batch);
+
+	ret = generic_open_file(td, f);
+	if (ret)
+		return ret;
+
+	fdd = calloc(1, sizeof(*fdd));
+	if (!fdd) {
+		int fio_unused __ret;
+		__ret = generic_close_file(td, f);
+		return 1;
+	}
+
+	FILE_SET_ENG_DATA(f, fdd);
+
+	return 0;
+}
+
+static int fio_libpmem_close_file(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_libpmem_data *fdd = FILE_ENG_DATA(f);
+
+	dprint(FD_IO,"DEBUG fio_libpmem_close_file\n");
+	dprint(FD_IO,"td->o.odirect %d \n",td->o.odirect);
+
+	if (!td->o.odirect) {
+		dprint(FD_IO,"pmem_drain\n");
+		pmem_drain();
+	}
+
+	FILE_SET_ENG_DATA(f, NULL);
+	free(fdd);
+	fio_file_clear_partial_mmap(f);
+
+	return generic_close_file(td, f);
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "libpmem",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= fio_libpmem_init,
+	.prep		= fio_libpmem_prep,
+	.queue		= fio_libpmem_queue,
+	.open_file	= fio_libpmem_open_file,
+	.close_file	= fio_libpmem_close_file,
+	.get_file_size	= generic_get_file_size,
+	.flags		= FIO_SYNCIO |FIO_NOEXTEND,
+};
+
+static void fio_init fio_libpmem_register(void)
+{
+#ifndef WIN32
+	Mmap_align = page_size;
+#else
+	if (Mmap_align == 0) {
+		SYSTEM_INFO si;
+
+		GetSystemInfo(&si);
+		Mmap_align = si.dwAllocationGranularity;
+	}
+#endif
+
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_libpmem_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
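The alignment policy in util_map_hint_align() and the roundup() macro can be exercised on their own. The sketch below restates them as plain functions. choose_align(), round_up() and the page_align parameter (standing in for the Mmap_align global) are illustrative names, not part of the engine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIB ((size_t)1 << 20)
#define GIB ((size_t)1 << 30)

/* Round x up to the next multiple of y, like the patch's roundup() macro. */
static uintptr_t round_up(uintptr_t x, uintptr_t y)
{
	assert(y > 0);
	return ((x + (y - 1)) / y) * y;
}

/*
 * Mirrors util_map_hint_align(): an explicit req_align wins; otherwise use
 * 1 GiB alignment for mappings of at least 2 GiB, 2 MiB alignment for
 * mappings of at least 4 MiB, and the default (page) alignment below that.
 */
static size_t choose_align(size_t len, size_t req_align, size_t page_align)
{
	if (req_align)
		return req_align;
	if (len >= 2 * GIB)
		return GIB;
	if (len >= 4 * MIB)
		return 2 * MIB;
	return page_align;
}
```

The "twice the page size" rule from the comment shows up as the 2 GiB and 4 MiB thresholds: a 3 MiB mapping keeps page alignment, a 4 MiB one gets 2 MiB alignment.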
diff --git a/examples/libpmem.fio b/examples/libpmem.fio
new file mode 100644
index 0000000..d44fcfa
--- /dev/null
+++ b/examples/libpmem.fio
@@ -0,0 +1,73 @@
+[global]
+bs=4k
+size=8g
+ioengine=libpmem
+norandommap
+time_based=1
+group_reporting
+invalidate=1
+disable_lat=1
+disable_slat=1
+disable_clat=1
+clat_percentiles=0
+
+iodepth=1
+iodepth_batch=1
+thread=1
+numjobs=1
+
+#
+# In case of 'scramble_buffers=1', the source buffer
+# is rewritten with a random value on every write operation.
+#
+# But when 'scramble_buffers=0' is set, the source buffer isn't
+# rewritten, so it is likely to stay in the CPU cache, which makes
+# the measured performance appear higher.
+#
+scramble_buffers=0
+
+#
+# direct=0:
+#   Using pmem_memcpy_nodrain() for write operation
+#
+# direct=1:
+#   Using pmem_memcpy_persist() for write operation
+#
+direct=0
+
+#
+# Setting for fio process's CPU Node and Memory Node
+#
+numa_cpu_nodes=0
+numa_mem_policy=bind:0
+
+#
+# split means that each job will get a unique CPU from the CPU set
+#
+cpus_allowed_policy=split
+
+#
+# The libpmem engine does IO to files in a DAX-mounted filesystem.
+# The filesystem should be created on an NVDIMM (e.g. /dev/pmem0)
+# and then mounted with the '-o dax' option.  Note that the engine
+# accesses the underlying NVDIMM directly, bypassing the kernel block
+# layer, so the usual filesystem/disk performance monitoring tools such
+# as iostat will not provide useful data.
+#
+directory=/mnt/pmem0
+
+[libpmem-seqwrite]
+rw=write
+stonewall
+
+#[libpmem-seqread]
+#rw=read
+#stonewall
+
+#[libpmem-randwrite]
+#rw=randwrite
+#stonewall
+
+#[libpmem-randread]
+#rw=randread
+#stonewall
diff --git a/fio.1 b/fio.1
index 1f9fffc..bd7670a 100644
--- a/fio.1
+++ b/fio.1
@@ -1597,6 +1597,11 @@ details of writing an external I/O engine.
 Simply create the files and do no I/O to them.  You still need to set
 \fBfilesize\fR so that all the accounting still occurs, but no actual I/O will be
 done other than creating the file.
+.TP
+.B libpmem
+Read and write using mmap I/O to a file on a filesystem
+mounted with DAX on a persistent memory device through the NVML
+libpmem library.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
diff --git a/options.c b/options.c
index e8d1a3a..a0fcd8f 100644
--- a/options.c
+++ b/options.c
@@ -1851,6 +1851,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Load external engine (append name)",
 			    .cb = str_ioengine_external_cb,
 			  },
+#ifdef CONFIG_LIBPMEM
+			  { .ival = "libpmem",
+			    .help = "NVML libpmem based IO engine",
+			  },
+#endif
 		},
 	},
 	{
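When PMEM_MMAP_HINT is not set, util_map_hint() uses a common trick: map an anonymous, read-only region of len + align bytes, unmap it immediately, and round the returned address up to the alignment. A Linux-oriented sketch of just that step follows. aligned_hint() is a hypothetical name, and it returns NULL on failure where the engine returns MAP_FAILED. The result is only a hint: another thread may map the range between munmap() and the real mmap(), and the kernel may ignore the hint, which, as the engine's comments note, is not an error.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Reserve len + align bytes, release them, and return the first
 * align-aligned address inside that range. Because the range was
 * len + align bytes long, [hint, hint + len) fits inside it.
 */
static void *aligned_hint(size_t len, size_t align)
{
	void *addr;

	assert(align > 0);

	addr = mmap(NULL, len + align, PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	munmap(addr, len + align);
	return (void *)((((uintptr_t)addr + align - 1) / align) * align);
}
```

The caller would then pass the hint as the first argument to mmap() for the real file mapping, exactly as fio_libpmem_file() does with the result of util_map_hint().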


* Recent changes (master)
@ 2017-11-17 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-11-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 522c29f69a31542170d94611e05e1780e4c08dbd:

  man page: fix bad case for 'pre-reading file' state (2017-11-15 09:53:14 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to adf075fa89a7f3bbb45237f1440de0583833bd80:

  ioengines: remove pointless list initializations (2017-11-16 20:03:15 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      os: make fio_cpu_isset() return a bool
      ioengines: remove pointless list initializations

Robert Elliott (2):
      .gitignore: ignore .exe files (for Windows)
      os-windows: fix cpumask operations

 .gitignore        |  1 +
 ioengines.c       |  4 +---
 os/os-dragonfly.h |  8 +++-----
 os/os-freebsd.h   |  2 +-
 os/os-linux.h     |  2 +-
 os/os-solaris.h   | 12 +++++++-----
 os/os-windows.h   |  7 ++++---
 7 files changed, 18 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index a07a324..463b53a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 *.d
 *.o
+*.exe
 /.depend
 /FIO-VERSION-FILE
 /config-host.h
diff --git a/ioengines.c b/ioengines.c
index 1bfc06f..02eaee8 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -52,14 +52,12 @@ static bool check_engine_ops(struct ioengine_ops *ops)
 void unregister_ioengine(struct ioengine_ops *ops)
 {
 	dprint(FD_IO, "ioengine %s unregistered\n", ops->name);
-	flist_del(&ops->list);
-	INIT_FLIST_HEAD(&ops->list);
+	flist_del_init(&ops->list);
 }
 
 void register_ioengine(struct ioengine_ops *ops)
 {
 	dprint(FD_IO, "ioengine %s registered\n", ops->name);
-	INIT_FLIST_HEAD(&ops->list);
 	flist_add_tail(&ops->list, &engine_list);
 }
 
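The ioengines.c change relies on the semantics of the Linux-style intrusive list that fio's flist helpers appear to follow. Adding a node overwrites both of its links, so initialising it just before flist_add_tail() is dead work, and a delete followed by a re-initialisation is exactly what a del_init helper does. A self-contained sketch with hypothetical names (struct node, list_add_tail(), list_del_init()), not fio's flist.h:

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly-linked list node; an empty list head points at itself. */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_empty(const struct node *head)
{
	return head->next == head;
}

/* Writes both n->next and n->prev, so n needs no prior initialisation. */
static void list_add_tail(struct node *n, struct node *head)
{
	assert(n != head);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Unlink n, then point it back at itself so it reads as empty and can be
 * deleted again safely: the del + re-init pair in a single call.
 */
static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}
```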
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 423b236..713046f 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -15,6 +15,7 @@
 #include <sys/resource.h>
 
 #include "../file.h"
+#include "../lib/types.h"
 
 #define FIO_HAVE_ODIRECT
 #define FIO_USE_GENERIC_RAND
@@ -107,12 +108,9 @@ static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
 	CPUMASK_ORBIT(*mask, cpu);
 }
 
-static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+static inline bool fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
 {
-	if (CPUMASK_TESTBIT(*mask, cpu))
-		return 1;
-
-	return 0;
+	return CPUMASK_TESTBIT(*mask, cpu) != 0;
 }
 
 static inline int fio_setaffinity(int pid, os_cpu_mask_t mask)
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index 4a7cdeb..97bc8ae 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -37,7 +37,7 @@ typedef cpuset_t os_cpu_mask_t;
 
 #define fio_cpu_clear(mask, cpu)        (void) CPU_CLR((cpu), (mask))
 #define fio_cpu_set(mask, cpu)          (void) CPU_SET((cpu), (mask))
-#define fio_cpu_isset(mask, cpu)	CPU_ISSET((cpu), (mask))
+#define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
 #define fio_cpu_count(mask)		CPU_COUNT((mask))
 
 static inline int fio_cpuset_init(os_cpu_mask_t *mask)
diff --git a/os/os-linux.h b/os/os-linux.h
index 1ad6ebd..894dc85 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -71,7 +71,7 @@ typedef struct drand48_data os_random_state_t;
 
 #define fio_cpu_clear(mask, cpu)	(void) CPU_CLR((cpu), (mask))
 #define fio_cpu_set(mask, cpu)		(void) CPU_SET((cpu), (mask))
-#define fio_cpu_isset(mask, cpu)	CPU_ISSET((cpu), (mask))
+#define fio_cpu_isset(mask, cpu)	(CPU_ISSET((cpu), (mask)) != 0)
 #define fio_cpu_count(mask)		CPU_COUNT((mask))
 
 static inline int fio_cpuset_init(os_cpu_mask_t *mask)
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 2f13723..db03546 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -16,6 +16,7 @@
 #include <pthread.h>
 
 #include "../file.h"
+#include "../lib/types.h"
 
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_CHARDEV_SIZE
@@ -126,24 +127,25 @@ static inline int fio_set_odirect(struct fio_file *f)
 #define fio_cpu_clear(mask, cpu)	pset_assign(PS_NONE, (cpu), NULL)
 #define fio_cpu_set(mask, cpu)		pset_assign(*(mask), (cpu), NULL)
 
-static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+static inline bool fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
 {
 	const unsigned int max_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	unsigned int num_cpus;
 	processorid_t *cpus;
-	int i, ret;
+	bool ret;
+	int i;
 
 	cpus = malloc(sizeof(*cpus) * max_cpus);
 
 	if (pset_info(*mask, NULL, &num_cpus, cpus) < 0) {
 		free(cpus);
-		return 0;
+		return false;
 	}
 
-	ret = 0;
+	ret = false;
 	for (i = 0; i < num_cpus; i++) {
 		if (cpus[i] == cpu) {
-			ret = 1;
+			ret = true;
 			break;
 		}
 	}
diff --git a/os/os-windows.h b/os/os-windows.h
index 520da19..9b04579 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -17,6 +17,7 @@
 #include "../log.h"
 #include "../lib/hweight.h"
 #include "../oslib/strcasestr.h"
+#include "../lib/types.h"
 
 #include "windows/posix.h"
 
@@ -209,17 +210,17 @@ static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 
 static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
 {
-	*mask ^= 1 << (cpu-1);
+	*mask &= ~(1ULL << cpu);
 }
 
 static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
 {
-	*mask |= 1 << cpu;
+	*mask |= 1ULL << cpu;
 }
 
 static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
 {
-	return (*mask & (1U << cpu));
+	return (*mask & (1ULL << cpu)) != 0;
 }
 
 static inline int fio_cpu_count(os_cpu_mask_t *mask)


* Recent changes (master)
@ 2017-11-16 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-16 13:00 UTC
  To: fio

The following changes since commit 3b973af558aa0802e2c8193e9afac4cd49af3ca0:

  Merge branch 'fix-libhdfs' of https://github.com/follitude/fio (2017-11-06 09:11:07 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 522c29f69a31542170d94611e05e1780e4c08dbd:

  man page: fix bad case for 'pre-reading file' state (2017-11-15 09:53:14 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      man page: fix bad case for 'pre-reading file' state

 fio.1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index 63d32a5..1f9fffc 100644
--- a/fio.1
+++ b/fio.1
@@ -2730,7 +2730,7 @@ Thread created.
 .B I
 Thread initialized, waiting or generating necessary data.
 .TP
-.B P
+.B p
 Thread running pre\-reading file(s).
 .TP
 .B /


* Recent changes (master)
@ 2017-11-07 13:00 Jens Axboe
From: Jens Axboe @ 2017-11-07 13:00 UTC
  To: fio

The following changes since commit 9f50b4106bd1d6fa1c325900d1fb286832ccc5e8:

  Fio 3.2 (2017-11-03 15:23:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3b973af558aa0802e2c8193e9afac4cd49af3ca0:

  Merge branch 'fix-libhdfs' of https://github.com/follitude/fio (2017-11-06 09:11:07 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fix-libhdfs' of https://github.com/follitude/fio

follitude (1):
      Makefile: tiny fix of libhdfs

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 76243ff..2893348 100644
--- a/Makefile
+++ b/Makefile
@@ -51,7 +51,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
-  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -ljvm $(FIO_LIBHDFS_LIB)/libhdfs.a
+  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server $(FIO_LIBHDFS_LIB)/libhdfs.a -ljvm
   CFLAGS += $(HDFSFLAGS)
   SOURCE += engines/libhdfs.c
 endif


* Recent changes (master)
@ 2017-11-04 12:00 Jens Axboe
From: Jens Axboe @ 2017-11-04 12:00 UTC
  To: fio

The following changes since commit 34851ad5ffacf9f4f8a7f23ee2edb17281b917a0:

  io_u_queue: convert rings to bool (2017-11-02 12:26:39 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9f50b4106bd1d6fa1c325900d1fb286832ccc5e8:

  Fio 3.2 (2017-11-03 15:23:49 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'json_bw_bytes' of https://github.com/sitsofe/fio
      Fio 3.2

Sitsofe Wheeler (1):
      stat: add bw_bytes JSON key

Tomohiro Kusumi (3):
      solaris: #include <pthread.h>
      solaris: add os_phys_mem() implementation
      solaris: add get_fs_free_size() implementation

 FIO-VERSION-GEN        |  2 +-
 os/os-solaris.h        | 25 ++++++++++++++++++++++++-
 os/windows/install.wxs |  2 +-
 stat.c                 |  7 +++++--
 4 files changed, 31 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 8c075cb..22f4404 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.1
+DEF_VER=fio-3.2
 
 LF='
 '
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 45268b2..2f13723 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -12,12 +12,15 @@
 #include <sys/mman.h>
 #include <sys/dkio.h>
 #include <sys/byteorder.h>
+#include <sys/statvfs.h>
+#include <pthread.h>
 
 #include "../file.h"
 
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_USE_GENERIC_BDEV_SIZE
+#define FIO_HAVE_FS_STAT
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_GETTID
 
@@ -65,7 +68,27 @@ static inline int blockdev_invalidate_cache(struct fio_file *f)
 
 static inline unsigned long long os_phys_mem(void)
 {
-	return 0;
+	long pagesize, pages;
+
+	pagesize = sysconf(_SC_PAGESIZE);
+	pages = sysconf(_SC_PHYS_PAGES);
+	if (pages == -1 || pagesize == -1)
+		return 0;
+
+	return (unsigned long long) pages * (unsigned long long) pagesize;
+}
+
+static inline unsigned long long get_fs_free_size(const char *path)
+{
+	unsigned long long ret;
+	struct statvfs s;
+
+	if (statvfs(path, &s) < 0)
+		return -1ULL;
+
+	ret = s.f_frsize;
+	ret *= (unsigned long long) s.f_bfree;
+	return ret;
 }
 
 static inline void os_random_seed(unsigned long seed, os_random_state_t *rs)
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 58244c5..6dfe231 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.1">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.2">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/stat.c b/stat.c
index c8a45db..89e2e6c 100644
--- a/stat.c
+++ b/stat.c
@@ -956,7 +956,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		struct group_run_stats *rs, int ddir, struct json_object *parent)
 {
 	unsigned long long min, max, minv, maxv;
-	unsigned long long bw;
+	unsigned long long bw_bytes, bw;
 	unsigned long long *ovals = NULL;
 	double mean, dev, iops;
 	unsigned int len;
@@ -975,17 +975,20 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_object(parent,
 		ts->unified_rw_rep ? "mixed" : ddirname[ddir], dir_object);
 
+	bw_bytes = 0;
 	bw = 0;
 	iops = 0.0;
 	if (ts->runtime[ddir]) {
 		uint64_t runt = ts->runtime[ddir];
 
-		bw = ((1000 * ts->io_bytes[ddir]) / runt) / 1024; /* KiB/s */
+		bw_bytes = ((1000 * ts->io_bytes[ddir]) / runt); /* Bytes/s */
+		bw = bw_bytes / 1024; /* KiB/s */
 		iops = (1000.0 * (uint64_t) ts->total_io_u[ddir]) / runt;
 	}
 
 	json_object_add_value_int(dir_object, "io_bytes", ts->io_bytes[ddir]);
 	json_object_add_value_int(dir_object, "io_kbytes", ts->io_bytes[ddir] >> 10);
+	json_object_add_value_int(dir_object, "bw_bytes", bw_bytes);
 	json_object_add_value_int(dir_object, "bw", bw);
 	json_object_add_value_float(dir_object, "iops", iops);
 	json_object_add_value_int(dir_object, "runtime", ts->runtime[ddir]);


* Recent changes (master)
@ 2017-11-03 12:00 Jens Axboe
From: Jens Axboe @ 2017-11-03 12:00 UTC
  To: fio

The following changes since commit c89daa4a98e6f3749ffc75b727a77cc061a0a454:

  io_u: reset file to initial offset (2017-11-01 14:51:03 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 34851ad5ffacf9f4f8a7f23ee2edb17281b917a0:

  io_u_queue: convert rings to bool (2017-11-02 12:26:39 -0600)

----------------------------------------------------------------
Jens Axboe (9):
      filesetup: don't print non-debug error on native fallocate failure
      filesetup: don't inline native_fallocate()
      filesetup: pre_read_files() can use a bool
      filesetup: __init_rand_distribution() can be void
      filesetup: change random file init to be bool based
      filesetup: create_work_dirs() can return bool
      filesetup: recurse_dir() can use bool
      filesetup: allocate 'r' locally in fallocate_file()
      io_u_queue: convert rings to bool

 backend.c    | 14 +++++-----
 file.h       |  4 +--
 filesetup.c  | 85 ++++++++++++++++++++++++++++++------------------------------
 io_u_queue.c | 12 ++++-----
 io_u_queue.h |  5 ++--
 5 files changed, 59 insertions(+), 61 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index c14f37c..7cf9b38 100644
--- a/backend.c
+++ b/backend.c
@@ -1203,9 +1203,9 @@ static int init_io_u(struct thread_data *td)
 		data_xfer = 0;
 
 	err = 0;
-	err += io_u_rinit(&td->io_u_requeues, td->o.iodepth);
-	err += io_u_qinit(&td->io_u_freelist, td->o.iodepth);
-	err += io_u_qinit(&td->io_u_all, td->o.iodepth);
+	err += !io_u_rinit(&td->io_u_requeues, td->o.iodepth);
+	err += !io_u_qinit(&td->io_u_freelist, td->o.iodepth);
+	err += !io_u_qinit(&td->io_u_all, td->o.iodepth);
 
 	if (err) {
 		log_err("fio: failed setting up IO queues\n");
@@ -1692,16 +1692,14 @@ static void *thread_main(void *data)
 	if (td_io_init(td))
 		goto err;
 
-	if (init_random_map(td))
+	if (!init_random_map(td))
 		goto err;
 
 	if (o->exec_prerun && exec_string(o, o->exec_prerun, (const char *)"prerun"))
 		goto err;
 
-	if (o->pre_read) {
-		if (pre_read_files(td) < 0)
-			goto err;
-	}
+	if (o->pre_read && !pre_read_files(td))
+		goto err;
 
 	fio_verify_init(td);
 
diff --git a/file.h b/file.h
index e3864ee..cc721ee 100644
--- a/file.h
+++ b/file.h
@@ -198,7 +198,7 @@ extern int __must_check generic_get_file_size(struct thread_data *, struct fio_f
 }
 #endif
 extern int __must_check file_lookup_open(struct fio_file *f, int flags);
-extern int __must_check pre_read_files(struct thread_data *);
+extern bool __must_check pre_read_files(struct thread_data *);
 extern unsigned long long get_rand_file_size(struct thread_data *td);
 extern int add_file(struct thread_data *, const char *, int, int);
 extern int add_file_exclusive(struct thread_data *, const char *);
@@ -209,7 +209,7 @@ extern void lock_file(struct thread_data *, struct fio_file *, enum fio_ddir);
 extern void unlock_file(struct thread_data *, struct fio_file *);
 extern void unlock_file_all(struct thread_data *, struct fio_file *);
 extern int add_dir_files(struct thread_data *, const char *);
-extern int init_random_map(struct thread_data *);
+extern bool init_random_map(struct thread_data *);
 extern void dup_files(struct thread_data *, struct thread_data *);
 extern int get_fileno(struct thread_data *, const char *);
 extern void free_release_files(struct thread_data *);
diff --git a/filesetup.c b/filesetup.c
index 5d7ea5c..4d29b70 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -38,7 +38,7 @@ static inline void clear_error(struct thread_data *td)
 	td->verror[0] = '\0';
 }
 
-static inline int native_fallocate(struct thread_data *td, struct fio_file *f)
+static int native_fallocate(struct thread_data *td, struct fio_file *f)
 {
 	bool success;
 
@@ -49,32 +49,29 @@ static inline int native_fallocate(struct thread_data *td, struct fio_file *f)
 			!success ? "un": "");
 
 	if (success)
-		return 0;
+		return false;
 
 	if (errno == ENOSYS)
 		dprint(FD_FILE, "native fallocate is not implemented\n");
 
-	return -1;
+	return true;
 }
 
 static void fallocate_file(struct thread_data *td, struct fio_file *f)
 {
-	int r;
-
 	if (td->o.fill_device)
 		return;
 
 	switch (td->o.fallocate_mode) {
 	case FIO_FALLOCATE_NATIVE:
-		r = native_fallocate(td, f);
-		if (r != 0 && errno != ENOSYS)
-			log_err("fio: native_fallocate call failed: %s\n",
-					strerror(errno));
+		native_fallocate(td, f);
 		break;
 	case FIO_FALLOCATE_NONE:
 		break;
 #ifdef CONFIG_POSIX_FALLOCATE
-	case FIO_FALLOCATE_POSIX:
+	case FIO_FALLOCATE_POSIX: {
+		int r;
+
 		dprint(FD_FILE, "posix_fallocate file %s size %llu\n",
 				 f->file_name,
 				 (unsigned long long) f->real_file_size);
@@ -83,9 +80,12 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 		if (r > 0)
 			log_err("fio: posix_fallocate fails: %s\n", strerror(r));
 		break;
+		}
 #endif /* CONFIG_POSIX_FALLOCATE */
 #ifdef CONFIG_LINUX_FALLOCATE
-	case FIO_FALLOCATE_KEEP_SIZE:
+	case FIO_FALLOCATE_KEEP_SIZE: {
+		int r;
+
 		dprint(FD_FILE, "fallocate(FALLOC_FL_KEEP_SIZE) "
 				"file %s size %llu\n", f->file_name,
 				(unsigned long long) f->real_file_size);
@@ -95,6 +95,7 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 			td_verror(td, errno, "fallocate");
 
 		break;
+		}
 #endif /* CONFIG_LINUX_FALLOCATE */
 	default:
 		log_err("fio: unknown fallocate mode: %d\n", td->o.fallocate_mode);
@@ -258,24 +259,25 @@ err:
 	return 1;
 }
 
-static int pre_read_file(struct thread_data *td, struct fio_file *f)
+static bool pre_read_file(struct thread_data *td, struct fio_file *f)
 {
-	int ret = 0, r, did_open = 0, old_runstate;
+	int r, did_open = 0, old_runstate;
 	unsigned long long left;
 	unsigned int bs;
+	bool ret = true;
 	char *b;
 
 	if (td_ioengine_flagged(td, FIO_PIPEIO) ||
 	    td_ioengine_flagged(td, FIO_NOIO))
-		return 0;
+		return true;
 
 	if (f->filetype == FIO_TYPE_CHAR)
-		return 0;
+		return true;
 
 	if (!fio_file_open(f)) {
 		if (td->io_ops->open_file(td, f)) {
 			log_err("fio: cannot pre-read, failed to open file\n");
-			return 1;
+			return false;
 		}
 		did_open = 1;
 	}
@@ -290,7 +292,7 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 	b = malloc(bs);
 	if (!b) {
 		td_verror(td, errno, "malloc");
-		ret = 1;
+		ret = false;
 		goto error;
 	}
 	memset(b, 0, bs);
@@ -298,7 +300,7 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 	if (lseek(f->fd, f->file_offset, SEEK_SET) < 0) {
 		td_verror(td, errno, "lseek");
 		log_err("fio: failed to lseek pre-read file\n");
-		ret = 1;
+		ret = false;
 		goto error;
 	}
 
@@ -1177,7 +1179,7 @@ err_out:
 	return 1;
 }
 
-int pre_read_files(struct thread_data *td)
+bool pre_read_files(struct thread_data *td)
 {
 	struct fio_file *f;
 	unsigned int i;
@@ -1185,14 +1187,14 @@ int pre_read_files(struct thread_data *td)
 	dprint(FD_FILE, "pre_read files\n");
 
 	for_each_file(td, f, i) {
-		if (pre_read_file(td, f))
-			return -1;
+		if (!pre_read_file(td, f))
+			return false;
 	}
 
-	return 0;
+	return true;
 }
 
-static int __init_rand_distribution(struct thread_data *td, struct fio_file *f)
+static void __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 {
 	unsigned int range_size, seed;
 	unsigned long nranges;
@@ -1213,18 +1215,16 @@ static int __init_rand_distribution(struct thread_data *td, struct fio_file *f)
 		pareto_init(&f->zipf, nranges, td->o.pareto_h.u.f, seed);
 	else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
 		gauss_init(&f->gauss, nranges, td->o.gauss_dev.u.f, seed);
-
-	return 1;
 }
 
-static int init_rand_distribution(struct thread_data *td)
+static bool init_rand_distribution(struct thread_data *td)
 {
 	struct fio_file *f;
 	unsigned int i;
 	int state;
 
 	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM)
-		return 0;
+		return false;
 
 	state = td_bump_runstate(td, TD_SETTING_UP);
 
@@ -1232,8 +1232,7 @@ static int init_rand_distribution(struct thread_data *td)
 		__init_rand_distribution(td, f);
 
 	td_restore_runstate(td, state);
-
-	return 1;
+	return true;
 }
 
 /*
@@ -1273,16 +1272,16 @@ static int check_rand_gen_limits(struct thread_data *td, struct fio_file *f,
 	return 0;
 }
 
-int init_random_map(struct thread_data *td)
+bool init_random_map(struct thread_data *td)
 {
 	unsigned long long blocks;
 	struct fio_file *f;
 	unsigned int i;
 
 	if (init_rand_distribution(td))
-		return 0;
+		return true;
 	if (!td_random(td))
-		return 0;
+		return true;
 
 	for_each_file(td, f, i) {
 		uint64_t fsize = min(f->real_file_size, f->io_size);
@@ -1290,7 +1289,7 @@ int init_random_map(struct thread_data *td)
 		blocks = fsize / (unsigned long long) td->o.rw_min_bs;
 
 		if (check_rand_gen_limits(td, f, blocks))
-			return 1;
+			return false;
 
 		if (td->o.random_generator == FIO_RAND_GEN_LFSR) {
 			unsigned long seed;
@@ -1315,14 +1314,14 @@ int init_random_map(struct thread_data *td)
 				" a large number of jobs, try the 'norandommap'"
 				" option or set 'softrandommap'. Or give"
 				" a larger --alloc-size to fio.\n");
-			return 1;
+			return false;
 		}
 
 		log_info("fio: file %s failed allocating random map. Running "
 			 "job without.\n", f->file_name);
 	}
 
-	return 0;
+	return true;
 }
 
 void close_files(struct thread_data *td)
@@ -1521,7 +1520,7 @@ bool exists_and_not_regfile(const char *filename)
 	return true;
 }
 
-static int create_work_dirs(struct thread_data *td, const char *fname)
+static bool create_work_dirs(struct thread_data *td, const char *fname)
 {
 	char path[PATH_MAX];
 	char *start, *end;
@@ -1548,13 +1547,13 @@ static int create_work_dirs(struct thread_data *td, const char *fname)
 #endif
 			log_err("fio: failed to create dir (%s): %d\n",
 				start, errno);
-			return 1;
+			return false;
 		}
 		*end = FIO_OS_PATH_SEPARATOR;
 		end++;
 	}
 	td->flags |= TD_F_DIRS_CREATED;
-	return 0;
+	return true;
 }
 
 int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
@@ -1574,7 +1573,7 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 
 	if (strchr(fname, FIO_OS_PATH_SEPARATOR) &&
 	    !(td->flags & TD_F_DIRS_CREATED) &&
-	    create_work_dirs(td, fname))
+	    !create_work_dirs(td, fname))
 		return 1;
 
 	/* clean cloned siblings using existing files */
@@ -1742,10 +1741,10 @@ void unlock_file_all(struct thread_data *td, struct fio_file *f)
 		unlock_file(td, f);
 }
 
-static int recurse_dir(struct thread_data *td, const char *dirname)
+static bool recurse_dir(struct thread_data *td, const char *dirname)
 {
 	struct dirent *dir;
-	int ret = 0;
+	bool ret = false;
 	DIR *D;
 
 	D = opendir(dirname);
@@ -1754,7 +1753,7 @@ static int recurse_dir(struct thread_data *td, const char *dirname)
 
 		snprintf(buf, FIO_VERROR_SIZE, "opendir(%s)", dirname);
 		td_verror(td, errno, buf);
-		return 1;
+		return true;
 	}
 
 	while ((dir = readdir(D)) != NULL) {
@@ -1769,7 +1768,7 @@ static int recurse_dir(struct thread_data *td, const char *dirname)
 		if (lstat(full_path, &sb) == -1) {
 			if (errno != ENOENT) {
 				td_verror(td, errno, "stat");
-				ret = 1;
+				ret = true;
 				break;
 			}
 		}
diff --git a/io_u_queue.c b/io_u_queue.c
index 9994c78..8cf4c8c 100644
--- a/io_u_queue.c
+++ b/io_u_queue.c
@@ -1,15 +1,15 @@
 #include <stdlib.h>
 #include "io_u_queue.h"
 
-int io_u_qinit(struct io_u_queue *q, unsigned int nr)
+bool io_u_qinit(struct io_u_queue *q, unsigned int nr)
 {
 	q->io_us = calloc(nr, sizeof(struct io_u *));
 	if (!q->io_us)
-		return 1;
+		return false;
 
 	q->nr = 0;
 	q->max = nr;
-	return 0;
+	return true;
 }
 
 void io_u_qexit(struct io_u_queue *q)
@@ -17,7 +17,7 @@ void io_u_qexit(struct io_u_queue *q)
 	free(q->io_us);
 }
 
-int io_u_rinit(struct io_u_ring *ring, unsigned int nr)
+bool io_u_rinit(struct io_u_ring *ring, unsigned int nr)
 {
 	ring->max = nr + 1;
 	if (ring->max & (ring->max - 1)) {
@@ -32,10 +32,10 @@ int io_u_rinit(struct io_u_ring *ring, unsigned int nr)
 
 	ring->ring = calloc(ring->max, sizeof(struct io_u *));
 	if (!ring->ring)
-		return 1;
+		return false;
 
 	ring->head = ring->tail = 0;
-	return 0;
+	return true;
 }
 
 void io_u_rexit(struct io_u_ring *ring)
diff --git a/io_u_queue.h b/io_u_queue.h
index 118e593..b5b8d2f 100644
--- a/io_u_queue.h
+++ b/io_u_queue.h
@@ -2,6 +2,7 @@
 #define FIO_IO_U_QUEUE
 
 #include <assert.h>
+#include "lib/types.h"
 
 struct io_u;
 
@@ -42,7 +43,7 @@ static inline int io_u_qempty(const struct io_u_queue *q)
 #define io_u_qiter(q, io_u, i)	\
 	for (i = 0; i < (q)->nr && (io_u = (q)->io_us[i]); i++)
 
-int io_u_qinit(struct io_u_queue *q, unsigned int nr);
+bool io_u_qinit(struct io_u_queue *q, unsigned int nr);
 void io_u_qexit(struct io_u_queue *q);
 
 struct io_u_ring {
@@ -52,7 +53,7 @@ struct io_u_ring {
 	struct io_u **ring;
 };
 
-int io_u_rinit(struct io_u_ring *ring, unsigned int nr);
+bool io_u_rinit(struct io_u_ring *ring, unsigned int nr);
 void io_u_rexit(struct io_u_ring *ring);
 
 static inline void io_u_rpush(struct io_u_ring *r, struct io_u *io_u)


* Recent changes (master)
@ 2017-11-02 12:00 Jens Axboe
From: Jens Axboe @ 2017-11-02 12:00 UTC
  To: fio

The following changes since commit 1633aa61a68593b4a4cc5dbb621129303a7c3049:

  engines/windowsaio: style fixups (2017-10-31 14:01:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c89daa4a98e6f3749ffc75b727a77cc061a0a454:

  io_u: reset file to initial offset (2017-11-01 14:51:03 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      io_u: wrap to beginning when end-of-file is reached for time_based
      io_u: reset file to initial offset

 io_u.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 4246edf..81ee724 100644
--- a/io_u.c
+++ b/io_u.c
@@ -361,16 +361,13 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 
 	assert(ddir_rw(ddir));
 
+	/*
+	 * If we reach the end for a time based run, reset us back to 0
+	 * and invalidate the cache, if we need to.
+	 */
 	if (f->last_pos[ddir] >= f->io_size + get_start_offset(td, f) &&
 	    o->time_based) {
-		struct thread_options *o = &td->o;
-		uint64_t io_size = f->io_size + (f->io_size % o->min_bs[ddir]);
-
-		if (io_size > f->last_pos[ddir])
-			f->last_pos[ddir] = 0;
-		else
-			f->last_pos[ddir] = f->last_pos[ddir] - io_size;
-
+		f->last_pos[ddir] = f->file_offset;
 		loop_cache_invalidate(td, f);
 	}
 


* Recent changes (master)
@ 2017-11-01 12:00 Jens Axboe
From: Jens Axboe @ 2017-11-01 12:00 UTC
  To: fio

The following changes since commit e2b7f9fb0d105de217fe97817b47c594232ac14f:

  Merge branch 'misc' of https://github.com/sitsofe/fio (2017-10-30 09:05:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1633aa61a68593b4a4cc5dbb621129303a7c3049:

  engines/windowsaio: style fixups (2017-10-31 14:01:16 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      init: make sure that compression enables refill buffers
      Update compression documentation
      Default buffer_compress_chunk to 512
      engines/windowsaio: style fixups

 HOWTO                |  7 ++++---
 engines/windowsaio.c | 25 ++++++++++++++-----------
 fio.1                |  7 ++++---
 init.c               |  4 +++-
 options.c            |  1 +
 5 files changed, 26 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f151350..419fa73 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1453,8 +1453,8 @@ Buffers and memory
 	mix of random data and a fixed pattern. The fixed pattern is either zeros,
 	or the pattern specified by :option:`buffer_pattern`. If the pattern option
 	is used, it might skew the compression ratio slightly. Note that this is per
-	block size unit, for file/disk wide compression level that matches this
-	setting, you'll also want to set :option:`refill_buffers`.
+	block size unit, see :option:`buffer_compress_chunk` for setting a finer
+	granularity of compression regions.
 
 .. option:: buffer_compress_chunk=int
 
@@ -1463,7 +1463,8 @@ Buffers and memory
 	will provide :option:`buffer_compress_percentage` of blocksize random data,
 	followed by the remaining zeroed. With this set to some chunk size smaller
 	than the block size, fio can alternate random and zeroed data throughout the
-	I/O buffer.
+	I/O buffer. This is particularly useful when bigger block sizes are used
+	for a job. Defaults to 512.
 
 .. option:: buffer_pattern=str
 
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index a66b1df..9439393 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -94,15 +94,13 @@ static int fio_windowsaio_init(struct thread_data *td)
 		if (!rc)
 			ctx = malloc(sizeof(struct thread_ctx));
 
-		if (!rc && ctx == NULL)
-		{
+		if (!rc && ctx == NULL) {
 			log_err("windowsaio: failed to allocate memory for thread context structure\n");
 			CloseHandle(hFile);
 			rc = 1;
 		}
 
-		if (!rc)
-		{
+		if (!rc) {
 			DWORD threadid;
 
 			ctx->iocp = hFile;
@@ -146,7 +144,7 @@ static int windowsaio_invalidate_cache(struct fio_file *f)
 {
 	DWORD error;
 	DWORD isharemode = (FILE_SHARE_DELETE | FILE_SHARE_READ |
-			FILE_SHARE_WRITE);
+				FILE_SHARE_WRITE);
 	HANDLE ihFile;
 	int rc = 0;
 
@@ -348,7 +346,8 @@ static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
 				break;
 		}
 
-		if (dequeued >= min || (t != NULL && timeout_expired(start_count, end_count)))
+		if (dequeued >= min ||
+		    (t != NULL && timeout_expired(start_count, end_count)))
 			break;
 	} while (1);
 
@@ -371,10 +370,12 @@ static int fio_windowsaio_queue(struct thread_data *td, struct io_u *io_u)
 
 	switch (io_u->ddir) {
 	case DDIR_WRITE:
-		success = WriteFile(io_u->file->hFile, io_u->xfer_buf, io_u->xfer_buflen, NULL, lpOvl);
+		success = WriteFile(io_u->file->hFile, io_u->xfer_buf,
+					io_u->xfer_buflen, NULL, lpOvl);
 		break;
 	case DDIR_READ:
-		success = ReadFile(io_u->file->hFile, io_u->xfer_buf, io_u->xfer_buflen, NULL, lpOvl);
+		success = ReadFile(io_u->file->hFile, io_u->xfer_buf,
+					io_u->xfer_buflen, NULL, lpOvl);
 		break;
 	case DDIR_SYNC:
 	case DDIR_DATASYNC:
@@ -386,13 +387,11 @@ static int fio_windowsaio_queue(struct thread_data *td, struct io_u *io_u)
 		}
 
 		return FIO_Q_COMPLETED;
-		break;
 	case DDIR_TRIM:
 		log_err("windowsaio: manual TRIM isn't supported on Windows\n");
 		io_u->error = 1;
 		io_u->resid = io_u->xfer_buflen;
 		return FIO_Q_COMPLETED;
-		break;
 	default:
 		assert(0);
 		break;
@@ -423,7 +422,11 @@ static DWORD WINAPI IoCompletionRoutine(LPVOID lpParameter)
 	wd = ctx->wd;
 
 	do {
-		if (!GetQueuedCompletionStatus(ctx->iocp, &bytes, &ulKey, &ovl, 250) && ovl == NULL)
+		BOOL ret;
+
+		ret = GetQueuedCompletionStatus(ctx->iocp, &bytes, &ulKey,
+						&ovl, 250);
+		if (!ret && ovl == NULL)
 			continue;
 
 		fov = CONTAINING_RECORD(ovl, struct fio_overlapped, o);
diff --git a/fio.1 b/fio.1
index 198b9d8..63d32a5 100644
--- a/fio.1
+++ b/fio.1
@@ -1242,8 +1242,8 @@ WRITEs) that compresses to the specified level. Fio does this by providing a
 mix of random data and a fixed pattern. The fixed pattern is either zeros,
 or the pattern specified by \fBbuffer_pattern\fR. If the pattern option
 is used, it might skew the compression ratio slightly. Note that this is per
-block size unit, for file/disk wide compression level that matches this
-setting, you'll also want to set \fBrefill_buffers\fR.
+block size unit, see \fBbuffer_compress_chunk\fR for setting a finer granularity
+of compressible regions.
 .TP
 .BI buffer_compress_chunk \fR=\fPint
 See \fBbuffer_compress_percentage\fR. This setting allows fio to manage
@@ -1251,7 +1251,8 @@ how big the ranges of random data and zeroed data is. Without this set, fio
 will provide \fBbuffer_compress_percentage\fR of blocksize random data,
 followed by the remaining zeroed. With this set to some chunk size smaller
 than the block size, fio can alternate random and zeroed data throughout the
-I/O buffer.
+I/O buffer. This is particularly useful when bigger block sizes are used
+for a job. Defaults to 512.
 .TP
 .BI buffer_pattern \fR=\fPstr
 If set, fio will fill the I/O buffers with this pattern or with the contents
diff --git a/init.c b/init.c
index e80aec3..736c6ff 100644
--- a/init.c
+++ b/init.c
@@ -855,8 +855,10 @@ static int fixup_options(struct thread_data *td)
 		if (o->compress_percentage == 100) {
 			o->zero_buffers = 1;
 			o->compress_percentage = 0;
-		} else if (!fio_option_is_set(o, refill_buffers))
+		} else if (!fio_option_is_set(o, refill_buffers)) {
 			o->refill_buffers = 1;
+			td->flags |= TD_F_REFILL_BUFFERS;
+		}
 	}
 
 	/*
diff --git a/options.c b/options.c
index 5813a66..e8d1a3a 100644
--- a/options.c
+++ b/options.c
@@ -4067,6 +4067,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.parent	= "buffer_compress_percentage",
 		.hide	= 1,
 		.help	= "Size of compressible region in buffer",
+		.def	= "512",
 		.interval = 256,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,


* Recent changes (master)
@ 2017-10-31 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-31 12:00 UTC
  To: fio

The following changes since commit 11fd6aa8569c55c8488020e4e315d550d121ff79:

  Fix 'nice' parameter range: should be -20 to 19, not -19 to 20. (2017-10-26 15:31:35 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e2b7f9fb0d105de217fe97817b47c594232ac14f:

  Merge branch 'misc' of https://github.com/sitsofe/fio (2017-10-30 09:05:43 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'misc' of https://github.com/sitsofe/fio

Sitsofe Wheeler (9):
      steadystate_tests.py: fix up usage comment
      tools: use /usr/bin/python2.7 as the interpreter
      fio: fix interpreter lines
      doc: fix groff line that started with a dot
      COPYING: update license file
      fio: update FSF address
      doc: IO -> I/O, sync filecreate documentation
      doc: minor formatting fixes
      doc: rewrite write_*_log sections

 COPYING                           | 39 ++++++++++----------
 HOWTO                             | 78 +++++++++++++++++++--------------------
 backend.c                         |  2 +-
 crc/crc32.c                       |  2 +-
 crc/crc32.h                       |  2 +-
 crc/crc32c.h                      |  2 +-
 doc/conf.py                       |  1 -
 engines/pmemblk.c                 |  4 +-
 exp/expression-parser.l           |  2 +-
 exp/expression-parser.y           |  2 +-
 exp/test-expression-parser.c      |  2 +-
 fifo.c                            |  2 +-
 fifo.h                            |  2 +-
 fio.1                             | 77 +++++++++++++++++++-------------------
 fio.c                             |  2 +-
 gfio.c                            |  2 +-
 graph.c                           |  2 +-
 lib/rbtree.c                      |  2 +-
 lib/rbtree.h                      |  2 +-
 libfio.c                          |  2 +-
 oslib/libmtd.c                    |  2 +-
 oslib/libmtd.h                    |  2 +-
 oslib/libmtd_common.h             |  2 +-
 oslib/libmtd_int.h                |  2 +-
 oslib/libmtd_legacy.c             |  2 +-
 oslib/libmtd_xalloc.h             |  2 +-
 tools/fio_jsonplus_clat2csv       |  2 +-
 tools/fiologparser.py             |  3 +-
 tools/genfio                      |  4 +-
 tools/hist/fiologparser_hist.py   |  2 +-
 tools/hist/fiologparser_hist.py.1 |  2 +-
 tools/hist/half-bins.py           |  3 +-
 tools/plot/fio2gnuplot            |  4 +-
 unit_tests/steadystate_tests.py   |  4 +-
 34 files changed, 131 insertions(+), 134 deletions(-)

---

Diff of recent changes:

diff --git a/COPYING b/COPYING
index 5b6e7c6..d159169 100644
--- a/COPYING
+++ b/COPYING
@@ -1,12 +1,12 @@
-		    GNU GENERAL PUBLIC LICENSE
-		       Version 2, June 1991
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 2, June 1991
 
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
-                       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  Everyone is permitted to copy and distribute verbatim copies
  of this license document, but changing it is not allowed.
 
-			    Preamble
+                            Preamble
 
   The licenses for most software are designed to take away your
 freedom to share and change it.  By contrast, the GNU General Public
@@ -15,7 +15,7 @@ software--to make sure the software is free for all its users.  This
 General Public License applies to most of the Free Software
 Foundation's software and to any other program whose authors commit to
 using it.  (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.)  You can apply it to
+the GNU Lesser General Public License instead.)  You can apply it to
 your programs, too.
 
   When we speak of free software, we are referring to freedom, not
@@ -55,8 +55,8 @@ patent must be licensed for everyone's free use or not licensed at all.
 
   The precise terms and conditions for copying, distribution and
 modification follow.
-\f
-		    GNU GENERAL PUBLIC LICENSE
+
+                    GNU GENERAL PUBLIC LICENSE
    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 
   0. This License applies to any program or other work which contains
@@ -110,7 +110,7 @@ above, provided that you also meet all of these conditions:
     License.  (Exception: if the Program itself is interactive but
     does not normally print such an announcement, your work based on
     the Program is not required to print an announcement.)
-\f
+
 These requirements apply to the modified work as a whole.  If
 identifiable sections of that work are not derived from the Program,
 and can be reasonably considered independent and separate works in
@@ -168,7 +168,7 @@ access to copy from a designated place, then offering equivalent
 access to copy the source code from the same place counts as
 distribution of the source code, even though third parties are not
 compelled to copy the source along with the object code.
-\f
+
   4. You may not copy, modify, sublicense, or distribute the Program
 except as expressly provided under this License.  Any attempt
 otherwise to copy, modify, sublicense or distribute the Program is
@@ -225,7 +225,7 @@ impose that choice.
 
 This section is intended to make thoroughly clear what is believed to
 be a consequence of the rest of this License.
-\f
+
   8. If the distribution and/or use of the Program is restricted in
 certain countries either by patents or by copyrighted interfaces, the
 original copyright holder who places the Program under this License
@@ -255,7 +255,7 @@ make exceptions for this.  Our decision will be guided by the two goals
 of preserving the free status of all derivatives of our free software and
 of promoting the sharing and reuse of software generally.
 
-			    NO WARRANTY
+                            NO WARRANTY
 
   11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
 FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
@@ -277,9 +277,9 @@ YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGES.
 
-		     END OF TERMS AND CONDITIONS
-\f
-	    How to Apply These Terms to Your New Programs
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
 
   If you develop a new program, and you want it to be of the greatest
 possible use to the public, the best way to achieve this is to make it
@@ -303,10 +303,9 @@ the "copyright" line and a pointer to where the full notice is found.
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU General Public License for more details.
 
-    You should have received a copy of the GNU General Public License
-    along with this program; if not, write to the Free Software
-    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
-
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, write to the Free Software Foundation, Inc.,
+    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 
 Also add information on how to contact you by electronic and paper mail.
 
@@ -336,5 +335,5 @@ necessary.  Here is a sample; alter the names:
 This General Public License does not permit incorporating your program into
 proprietary programs.  If your program is a subroutine library, you may
 consider it more useful to permit linking proprietary applications with the
-library.  If this is what you want to do, use the GNU Library General
+library.  If this is what you want to do, use the GNU Lesser General
 Public License instead of this License.
diff --git a/HOWTO b/HOWTO
index e7142c5..f151350 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1815,9 +1815,9 @@ I/O engine
 			details of writing an external I/O engine.
 
 		**filecreate**
-			Simply create the files and do no IO to them.  You still need to
+			Simply create the files and do no I/O to them.  You still need to
 			set  `filesize` so that all the accounting still occurs, but no
-			actual IO will be done other than creating the file.
+			actual I/O will be done other than creating the file.
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2060,7 +2060,7 @@ I/O depth
 	changing data and the overlapping region has a non-zero size. Setting
 	``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly
 	serializing in-flight I/Os that have a non-zero overlap. Note that setting
-	this option can reduce both performance and the `:option:iodepth` achieved.
+	this option can reduce both performance and the :option:`iodepth` achieved.
 	Additionally this option does not work when :option:`io_submit_mode` is set to
 	offload. Default: false.
 
@@ -2722,47 +2722,46 @@ Measurements and reporting
 .. option:: write_bw_log=str
 
 	If given, write a bandwidth log for this job. Can be used to store data of
-	the bandwidth of the jobs in their lifetime. The included
-	:command:`fio_generate_plots` script uses :command:`gnuplot` to turn these
-	text files into nice graphs. See :option:`write_lat_log` for behavior of
-	given filename. For this option, the postfix is :file:`_bw.x.log`, where `x`
-	is the index of the job (`1..N`, where `N` is the number of jobs). If
-	:option:`per_job_logs` is false, then the filename will not include the job
-	index.  See `Log File Formats`_.
+	the bandwidth of the jobs in their lifetime.
 
-.. option:: write_lat_log=str
+	If no str argument is given, the default filename of
+	:file:`jobname_type.x.log` is used. Even when the argument is given, fio
+	will still append the type of log. So if one specifies::
+
+		write_bw_log=foo
 
-	Same as :option:`write_bw_log`, except that this option stores I/O
-	submission, completion, and total latencies instead. If no filename is given
-	with this option, the default filename of :file:`jobname_type.log` is
-	used. Even if the filename is given, fio will still append the type of
-	log. So if one specifies::
+	The actual log name will be :file:`foo_bw.x.log` where `x` is the index
+	of the job (`1..N`, where `N` is the number of jobs). If
+	:option:`per_job_logs` is false, then the filename will not include the
+	`.x` job index.
 
-		write_lat_log=foo
+	The included :command:`fio_generate_plots` script uses :command:`gnuplot` to turn these
+	text files into nice graphs. See `Log File Formats`_ for how data is
+	structured within the file.
+
+.. option:: write_lat_log=str
 
-	The actual log names will be :file:`foo_slat.x.log`, :file:`foo_clat.x.log`,
-	and :file:`foo_lat.x.log`, where `x` is the index of the job (`1..N`, where `N`
-	is the number of jobs). This helps :command:`fio_generate_plots` find the
-	logs automatically. If :option:`per_job_logs` is false, then the filename
-	will not include the job index.  See `Log File Formats`_.
+	Same as :option:`write_bw_log`, except this option creates I/O
+	submission (e.g., `file:`name_slat.x.log`), completion (e.g.,
+	`file:`name_clat.x.log`), and total (e.g., `file:`name_lat.x.log`)
+	latency files instead. See :option:`write_bw_log` for details about
+	the filename format and `Log File Formats`_ for how data is structured
+	within the files.
 
 .. option:: write_hist_log=str
 
-	Same as :option:`write_lat_log`, but writes I/O completion latency
-	histograms. If no filename is given with this option, the default filename
-	of :file:`jobname_clat_hist.x.log` is used, where `x` is the index of the
-	job (`1..N`, where `N` is the number of jobs). Even if the filename is given,
-	fio will still append the type of log.  If :option:`per_job_logs` is false,
-	then the filename will not include the job index. See `Log File Formats`_.
+	Same as :option:`write_bw_log` but writes an I/O completion latency
+	histogram file (e.g., `file:`name_hist.x.log`) instead. Note that this
+	file will be empty unless :option:`log_hist_msec` has also been set.
+	See :option:`write_bw_log` for details about the filename format and
+	`Log File Formats`_ for how data is structured within the file.
 
 .. option:: write_iops_log=str
 
-	Same as :option:`write_bw_log`, but writes IOPS. If no filename is given
-	with this option, the default filename of :file:`jobname_type.x.log` is
-	used, where `x` is the index of the job (`1..N`, where `N` is the number of
-	jobs). Even if the filename is given, fio will still append the type of
-	log. If :option:`per_job_logs` is false, then the filename will not include
-	the job index. See `Log File Formats`_.
+	Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
+	`file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
+	details about the filename format and `Log File Formats`_ for how data
+	is structured within the file.
 
 .. option:: log_avg_msec=int
 
@@ -2780,15 +2779,16 @@ Measurements and reporting
 	:option:`log_avg_msec` is inaccurate. Setting this option makes fio log
 	histogram entries over the specified period of time, reducing log sizes for
 	high IOPS devices while retaining percentile accuracy.  See
-	:option:`log_hist_coarseness` as well. Defaults to 0, meaning histogram
-	logging is disabled.
+	:option:`log_hist_coarseness` and :option:`write_hist_log` as well.
+	Defaults to 0, meaning histogram logging is disabled.
 
 .. option:: log_hist_coarseness=int
 
 	Integer ranging from 0 to 6, defining the coarseness of the resolution of
 	the histogram logs enabled with :option:`log_hist_msec`. For each increment
 	in coarseness, fio outputs half as many bins. Defaults to 0, for which
-	histogram logs contain 1216 latency bins. See `Log File Formats`_.
+	histogram logs contain 1216 latency bins. See :option:`write_hist_log`
+	and `Log File Formats`_.
 
 .. option:: log_max_value=bool
 
@@ -2888,7 +2888,7 @@ Measurements and reporting
 
 .. option:: lat_percentiles=bool
 
-	Enable the reporting of percentiles of IO latencies. This is similar
+	Enable the reporting of percentiles of I/O latencies. This is similar
 	to :option:`clat_percentiles`, except that this includes the
 	submission latency. This option is mutually exclusive with
 	:option:`clat_percentiles`.
@@ -3258,7 +3258,7 @@ writes in the example above).  In the order listed, they denote:
 		short or dropped.
 
 **IO latency**
-		These values are for `--latency-target` and related options. When
+		These values are for :option:`latency_target` and related options. When
 		these options are engaged, this section describes the I/O depth required
 		to meet the specified latency target.
 
diff --git a/backend.c b/backend.c
index d98e5fe..c14f37c 100644
--- a/backend.c
+++ b/backend.c
@@ -18,7 +18,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 #include <unistd.h>
diff --git a/crc/crc32.c b/crc/crc32.c
index 657031d..4140a8d 100644
--- a/crc/crc32.c
+++ b/crc/crc32.c
@@ -13,7 +13,7 @@
 
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software Foundation,
-   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
 
 #include <inttypes.h>
 #include "crc32.h"
diff --git a/crc/crc32.h b/crc/crc32.h
index 674057b..a37d7ad 100644
--- a/crc/crc32.h
+++ b/crc/crc32.h
@@ -13,7 +13,7 @@
 
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software Foundation,
-   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
 
 #ifndef CRC32_H
 #define CRC32_H
diff --git a/crc/crc32c.h b/crc/crc32c.h
index d513f3a..be03c1a 100644
--- a/crc/crc32c.h
+++ b/crc/crc32c.h
@@ -13,7 +13,7 @@
 
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software Foundation,
-   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
 
 #ifndef CRC32C_H
 #define CRC32C_H
diff --git a/doc/conf.py b/doc/conf.py
index 4102140..d4dd9d2 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -1,4 +1,3 @@
-#!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 #
 # fio documentation build configuration file, created by
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 52af9ed..5d21915 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -14,8 +14,8 @@
  *
  * You should have received a copy of the GNU General Public
  * License along with this program; if not, write to the Free
- * Software Foundation, Inc., 59 Temple Place, Suite 330,
- * Boston, MA 02111-1307 USA
+ * Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+ * Boston, MA 02110-1301, USA.
  */
 
 /*
diff --git a/exp/expression-parser.l b/exp/expression-parser.l
index 50bd383..692c6cc 100644
--- a/exp/expression-parser.l
+++ b/exp/expression-parser.l
@@ -14,7 +14,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 
diff --git a/exp/expression-parser.y b/exp/expression-parser.y
index d664b8e..04a6e07 100644
--- a/exp/expression-parser.y
+++ b/exp/expression-parser.y
@@ -14,7 +14,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 
diff --git a/exp/test-expression-parser.c b/exp/test-expression-parser.c
index bf3fb3e..e22f24d 100644
--- a/exp/test-expression-parser.c
+++ b/exp/test-expression-parser.c
@@ -15,7 +15,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 
diff --git a/fifo.c b/fifo.c
index 81d13b5..98737e9 100644
--- a/fifo.c
+++ b/fifo.c
@@ -15,7 +15,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 
diff --git a/fifo.h b/fifo.h
index 4b775b0..5e3d339 100644
--- a/fifo.h
+++ b/fifo.h
@@ -17,7 +17,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 #include "minmax.h"
diff --git a/fio.1 b/fio.1
index 96d8f11..198b9d8 100644
--- a/fio.1
+++ b/fio.1
@@ -1593,8 +1593,9 @@ absolute or relative. See `engines/skeleton_external.c' in the fio source for
 details of writing an external I/O engine.
 .TP
 .B filecreate
-Create empty files only.  \fBfilesize\fR still needs to be specified so that fio
-will run and grab latency results, but no IO will actually be done on the files.
+Simply create the files and do no I/O to them.  You still need to set
+\fBfilesize\fR so that all the accounting still occurs, but no actual I/O will be
+done other than creating the file.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -2419,48 +2420,48 @@ the final stat output.
 .TP
 .BI write_bw_log \fR=\fPstr
 If given, write a bandwidth log for this job. Can be used to store data of
-the bandwidth of the jobs in their lifetime. The included
-\fBfio_generate_plots\fR script uses gnuplot to turn these
-text files into nice graphs. See \fBwrite_lat_log\fR for behavior of
-given filename. For this option, the postfix is `_bw.x.log', where `x'
-is the index of the job (1..N, where N is the number of jobs). If
-\fBper_job_logs\fR is false, then the filename will not include the job
-index. See \fBLOG FILE FORMATS\fR section.
-.TP
-.BI write_lat_log \fR=\fPstr
-Same as \fBwrite_bw_log\fR, except that this option stores I/O
-submission, completion, and total latencies instead. If no filename is given
-with this option, the default filename of `jobname_type.log' is
-used. Even if the filename is given, fio will still append the type of
-log. So if one specifies:
+the bandwidth of the jobs in their lifetime.
 .RS
+.P
+If no str argument is given, the default filename of
+`jobname_type.x.log' is used. Even when the argument is given, fio
+will still append the type of log. So if one specifies:
 .RS
 .P
-write_lat_log=foo
+write_bw_log=foo
 .RE
 .P
-The actual log names will be `foo_slat.x.log', `foo_clat.x.log',
-and `foo_lat.x.log', where `x' is the index of the job (1..N, where N
-is the number of jobs). This helps \fBfio_generate_plots\fR find the
-logs automatically. If \fBper_job_logs\fR is false, then the filename
-will not include the job index. See \fBLOG FILE FORMATS\fR section.
+The actual log name will be `foo_bw.x.log' where `x' is the index
+of the job (1..N, where N is the number of jobs). If
+\fBper_job_logs\fR is false, then the filename will not include the
+`.x` job index.
+.P
+The included \fBfio_generate_plots\fR script uses gnuplot to turn these
+text files into nice graphs. See the \fBLOG FILE FORMATS\fR section for how data is
+structured within the file.
 .RE
 .TP
+.BI write_lat_log \fR=\fPstr
+Same as \fBwrite_bw_log\fR, except this option creates I/O
+submission (e.g., `name_slat.x.log'), completion (e.g.,
+`name_clat.x.log'), and total (e.g., `name_lat.x.log') latency
+files instead. See \fBwrite_bw_log\fR for details about the
+filename format and the \fBLOG FILE FORMATS\fR section for how data is structured
+within the files.
+.TP
 .BI write_hist_log \fR=\fPstr
-Same as \fBwrite_lat_log\fR, but writes I/O completion latency
-histograms. If no filename is given with this option, the default filename
-of `jobname_clat_hist.x.log' is used, where `x' is the index of the
-job (1..N, where N is the number of jobs). Even if the filename is given,
-fio will still append the type of log. If \fBper_job_logs\fR is false,
-then the filename will not include the job index. See \fBLOG FILE FORMATS\fR section.
+Same as \fBwrite_bw_log\fR but writes an I/O completion latency
+histogram file (e.g., `name_hist.x.log') instead. Note that this
+file will be empty unless \fBlog_hist_msec\fR has also been set.
+See \fBwrite_bw_log\fR for details about the filename format and
+the \fBLOG FILE FORMATS\fR section for how data is structured
+within the file.
 .TP
 .BI write_iops_log \fR=\fPstr
-Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given
-with this option, the default filename of `jobname_type.x.log' is
-used, where `x' is the index of the job (1..N, where N is the number of
-jobs). Even if the filename is given, fio will still append the type of
-log. If \fBper_job_logs\fR is false, then the filename will not include
-the job index. See \fBLOG FILE FORMATS\fR section.
+Same as \fBwrite_bw_log\fR, but writes an IOPS file (e.g.
+`name_iops.x.log') instead. See \fBwrite_bw_log\fR for
+details about the filename format and the \fBLOG FILE FORMATS\fR section for how data
+is structured within the file.
 .TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
@@ -2476,8 +2477,8 @@ histograms. Computing latency percentiles from averages of intervals using
 \fBlog_avg_msec\fR is inaccurate. Setting this option makes fio log
 histogram entries over the specified period of time, reducing log sizes for
 high IOPS devices while retaining percentile accuracy. See
-\fBlog_hist_coarseness\fR as well. Defaults to 0, meaning histogram
-logging is disabled.
+\fBlog_hist_coarseness\fR and \fBwrite_hist_log\fR as well.
+Defaults to 0, meaning histogram logging is disabled.
 .TP
 .BI log_hist_coarseness \fR=\fPint
 Integer ranging from 0 to 6, defining the coarseness of the resolution of
@@ -2567,7 +2568,7 @@ Enable the reporting of percentiles of completion latencies. This option is
 mutually exclusive with \fBlat_percentiles\fR.
 .TP
 .BI lat_percentiles \fR=\fPbool
-Enable the reporting of percentiles of IO latencies. This is similar to
+Enable the reporting of percentiles of I/O latencies. This is similar to
 \fBclat_percentiles\fR, except that this includes the submission latency.
 This option is mutually exclusive with \fBclat_percentiles\fR.
 .TP
@@ -2915,7 +2916,7 @@ The number of \fBread/write/trim\fR requests issued, and how many of them were
 short or dropped.
 .TP
 .B IO latency
-These values are for \fBlatency-target\fR and related options. When
+These values are for \fBlatency_target\fR and related options. When
 these options are engaged, this section describes the I/O depth required
 to meet the specified latency target.
 .RE
diff --git a/fio.c b/fio.c
index 7b3a50b..7b61ffc 100644
--- a/fio.c
+++ b/fio.c
@@ -18,7 +18,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 #include <unistd.h>
diff --git a/gfio.c b/gfio.c
index 7160c3a..d222a1c 100644
--- a/gfio.c
+++ b/gfio.c
@@ -18,7 +18,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 #include <locale.h>
diff --git a/graph.c b/graph.c
index c45954c..f82b52a 100644
--- a/graph.c
+++ b/graph.c
@@ -17,7 +17,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 #include <string.h>
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 883bc72..00a5a90 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -15,7 +15,7 @@
 
   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
-  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
   linux/lib/rbtree.c
 */
diff --git a/lib/rbtree.h b/lib/rbtree.h
index c6cfe4a..f31fc56 100644
--- a/lib/rbtree.h
+++ b/lib/rbtree.h
@@ -14,7 +14,7 @@
 
   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
-  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
   linux/include/linux/rbtree.h
 
diff --git a/libfio.c b/libfio.c
index 830759a..d9900ad 100644
--- a/libfio.c
+++ b/libfio.c
@@ -18,7 +18,7 @@
  *
  *  You should have received a copy of the GNU General Public License
  *  along with this program; if not, write to the Free Software
- *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  */
 
diff --git a/oslib/libmtd.c b/oslib/libmtd.c
index 5d18871..385b9d2 100644
--- a/oslib/libmtd.c
+++ b/oslib/libmtd.c
@@ -14,7 +14,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  * Author: Artem Bityutskiy
  *
diff --git a/oslib/libmtd.h b/oslib/libmtd.h
index b5fd3f3..a0c90dc 100644
--- a/oslib/libmtd.h
+++ b/oslib/libmtd.h
@@ -13,7 +13,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  * Author: Artem Bityutskiy
  *
diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index 35628fe..87f93b6 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -13,7 +13,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 /* Imported from mtd-utils by dehrenberg */
diff --git a/oslib/libmtd_int.h b/oslib/libmtd_int.h
index cbe2ff5..a08e574 100644
--- a/oslib/libmtd_int.h
+++ b/oslib/libmtd_int.h
@@ -14,7 +14,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  * Author: Artem Bityutskiy
  *
diff --git a/oslib/libmtd_legacy.c b/oslib/libmtd_legacy.c
index 38dc2b7..137e80a 100644
--- a/oslib/libmtd_legacy.c
+++ b/oslib/libmtd_legacy.c
@@ -13,7 +13,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  *
  * Author: Artem Bityutskiy
  *
diff --git a/oslib/libmtd_xalloc.h b/oslib/libmtd_xalloc.h
index 532b80f..6ac595a 100644
--- a/oslib/libmtd_xalloc.h
+++ b/oslib/libmtd_xalloc.h
@@ -21,7 +21,7 @@
  *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
  */
 
 #ifndef __MTD_UTILS_XALLOC_H__
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index 64fdc9f..e63d6d8 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python2.7
 #
 # fio_jsonplus_clat2csv
 #
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 5a95009..8549859 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/python2.7
 #
 # fiologparser.py
 #
@@ -218,4 +218,3 @@ if __name__ == '__main__':
         print_all_stats(ctx, series)
     else:
         print_default(ctx, series)
-
diff --git a/tools/genfio b/tools/genfio
index 6800452..286d814 100755
--- a/tools/genfio
+++ b/tools/genfio
@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/usr/bin/bash
 #
 #  Copyright (C) 2013 eNovance SAS <licensing@enovance.com>
 #  Author: Erwan Velu  <erwan@enovance.com>
@@ -17,7 +17,7 @@
 #
 #  You should have received a copy of the GNU General Public License
 #  along with this program; if not, write to the Free Software
-#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
 BLK_SIZE=
 BLOCK_SIZE=4k
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index ad97a54..2e05b92 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python2.7
+#!/usr/bin/python2.7
 """ 
     Utility for converting *_clat_hist* files generated by fio into latency statistics.
     
diff --git a/tools/hist/fiologparser_hist.py.1 b/tools/hist/fiologparser_hist.py.1
index ed22c74..5dfacfe 100644
--- a/tools/hist/fiologparser_hist.py.1
+++ b/tools/hist/fiologparser_hist.py.1
@@ -17,7 +17,7 @@ end-time, samples, min, avg, median, 90%, 95%, 99%, max
 1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
 2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
 4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
-...
+\[char46]..
 .fi
 .PP
 
diff --git a/tools/hist/half-bins.py b/tools/hist/half-bins.py
index d592af0..1bba8ff 100755
--- a/tools/hist/half-bins.py
+++ b/tools/hist/half-bins.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python2.7
+#!/usr/bin/python2.7
 """ Cut the number bins in half in fio histogram output. Example usage:
 
         $ half-bins.py -c 2 output_clat_hist.1.log > smaller_clat_hist.1.log
@@ -35,4 +35,3 @@ if __name__ == '__main__':
             'e.g. coarseness of 4 merges each 2^4 = 16 consecutive '
             'bins.')
     main(p.parse_args())
-
diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index a703ae3..5d31f13 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/python2.7
 #
 #  Copyright (C) 2013 eNovance SAS <licensing@enovance.com>
 #  Author: Erwan Velu  <erwan@enovance.com>
@@ -17,7 +17,7 @@
 #
 #  You should have received a copy of the GNU General Public License
 #  along with this program; if not, write to the Free Software
-#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
 import os
 import fnmatch
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
index 91c79a4..5a74f95 100755
--- a/unit_tests/steadystate_tests.py
+++ b/unit_tests/steadystate_tests.py
@@ -1,10 +1,10 @@
-#!/usr/bin/python
+#!/usr/bin/python2.7
 #
 # steadystate_tests.py
 #
 # Test option parsing and functionality for fio's steady state detection feature.
 #
-# steadystate_tests.py ./fio file-for-read-testing file-for-write-testing
+# steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
 #
 # REQUIREMENTS
 # Python 2.6+

* Recent changes (master)
@ 2017-10-27 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-27 12:00 UTC
  To: fio

The following changes since commit ff523a66e5af357e67602caf33de1e2cd0521b08:

  parse: minimum options values are signed (2017-10-25 13:06:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 11fd6aa8569c55c8488020e4e315d550d121ff79:

  Fix 'nice' parameter range: should be -20 to 19, not -19 to 20. (2017-10-26 15:31:35 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      Add offset_align option

Jens Axboe (1):
      io_u: re-invalidate cache when looping around without file open/close

Rebecca Cran (1):
      Fix 'nice' parameter range: should be -20 to 19, not -19 to 20.

 HOWTO            | 11 +++++++++--
 cconv.c          |  2 ++
 filesetup.c      |  8 +++-----
 fio.1            | 10 ++++++++--
 io_u.c           | 14 ++++++++++++++
 options.c        | 15 +++++++++++++--
 thread_options.h |  2 ++
 7 files changed, 51 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 22a5849..e7142c5 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1128,13 +1128,20 @@ I/O type
 .. option:: offset=int
 
 	Start I/O at the provided offset in the file, given as either a fixed size in
-	bytes or a percentage. If a percentage is given, the next ``blockalign``-ed
-	offset will be used. Data before the given offset will not be touched. This
+	bytes or a percentage. If a percentage is given, the generated offset will be
+	aligned to the minimum ``blocksize`` or to the value of ``offset_align`` if
+	provided. Data before the given offset will not be touched. This
 	effectively caps the file size at `real_size - offset`. Can be combined with
 	:option:`size` to constrain the start and end range of the I/O workload.
 	A percentage can be specified by a number between 1 and 100 followed by '%',
 	for example, ``offset=20%`` to specify 20%.
 
+.. option:: offset_align=int
+
+	If set to a non-zero value, the byte offset generated by a percentage ``offset``
+	is aligned upwards to this value. Defaults to 0 meaning that a percentage
+	offset is aligned to the minimum block size.
+
 .. option:: offset_increment=int
 
 	If this is provided, then the real offset becomes `offset + offset_increment
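
A minimal C sketch of the rounding described for ``offset_align`` follows. It is an illustration, not fio's ``get_start_offset()``: the helper name ``percent_offset_aligned()`` is hypothetical, and it assumes the raw offset is ``file_size * percent / 100`` rounded up to the next multiple of the alignment, falling back to the minimum block size when ``offset_align`` is 0 (the documented default).

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a percentage offset into a byte offset and
 * round it up to the next multiple of the alignment. An offset_align of 0
 * falls back to min_bs, following the documented default.
 */
static uint64_t percent_offset_aligned(uint64_t file_size, unsigned int percent,
				       uint64_t offset_align, uint64_t min_bs)
{
	uint64_t align = offset_align ? offset_align : min_bs;
	uint64_t offset = file_size * percent / 100;

	/* ceil(offset / align) * align */
	return (offset + align - 1) / align * align;
}
```

For example, ``offset=20%`` on a 1,000,000-byte file gives a raw offset of 200,000 bytes. That rounds up to 200,704 with a 4 KiB minimum block size, or to 262,144 with ``offset_align=64k``.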
diff --git a/cconv.c b/cconv.c
index f809fd5..dc3c4e6 100644
--- a/cconv.c
+++ b/cconv.c
@@ -105,6 +105,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->file_size_low = le64_to_cpu(top->file_size_low);
 	o->file_size_high = le64_to_cpu(top->file_size_high);
 	o->start_offset = le64_to_cpu(top->start_offset);
+	o->start_offset_align = le64_to_cpu(top->start_offset_align);
 	o->start_offset_percent = le32_to_cpu(top->start_offset_percent);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
@@ -548,6 +549,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->file_size_low = __cpu_to_le64(o->file_size_low);
 	top->file_size_high = __cpu_to_le64(o->file_size_high);
 	top->start_offset = __cpu_to_le64(o->start_offset);
+	top->start_offset_align = __cpu_to_le64(o->start_offset_align);
 	top->start_offset_percent = __cpu_to_le32(o->start_offset_percent);
 	top->trim_backlog = __cpu_to_le64(o->trim_backlog);
 	top->offset_increment = __cpu_to_le64(o->offset_increment);
diff --git a/filesetup.c b/filesetup.c
index 7a602d4..5d7ea5c 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -869,12 +869,10 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 
 	if (o->start_offset_percent > 0) {
 		/*
-		 * if blockalign is provided, find the min across read, write,
-		 * and trim
+		 * if offset_align is provided, set initial offset
 		 */
-		if (fio_option_is_set(o, ba)) {
-			align_bs = (unsigned long long) min(o->ba[DDIR_READ], o->ba[DDIR_WRITE]);
-			align_bs = min((unsigned long long) o->ba[DDIR_TRIM], align_bs);
+		if (fio_option_is_set(o, start_offset_align)) {
+			align_bs = o->start_offset_align;
 		} else {
 			/* else take the minimum block size */
 			align_bs = td_min_bs(td);
diff --git a/fio.1 b/fio.1
index 7787ef2..96d8f11 100644
--- a/fio.1
+++ b/fio.1
@@ -913,13 +913,19 @@ should be associated with them.
 .TP
 .BI offset \fR=\fPint
 Start I/O at the provided offset in the file, given as either a fixed size in
-bytes or a percentage. If a percentage is given, the next \fBblockalign\fR\-ed
-offset will be used. Data before the given offset will not be touched. This
+bytes or a percentage. If a percentage is given, the generated offset will be
+aligned to the minimum \fBblocksize\fR or to the value of \fBoffset_align\fR if
+provided. Data before the given offset will not be touched. This
 effectively caps the file size at `real_size \- offset'. Can be combined with
 \fBsize\fR to constrain the start and end range of the I/O workload.
 A percentage can be specified by a number between 1 and 100 followed by '%',
 for example, `offset=20%' to specify 20%.
 .TP
+.BI offset_align \fR=\fPint
+If set to a non-zero value, the byte offset generated by a percentage \fBoffset\fR
+is aligned upwards to this value. Defaults to 0 meaning that a percentage
+offset is aligned to the minimum block size.
+.TP
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes `\fBoffset\fR + \fBoffset_increment\fR
 * thread_number', where the thread number is a counter that starts at 0 and
diff --git a/io_u.c b/io_u.c
index fb4180a..4246edf 100644
--- a/io_u.c
+++ b/io_u.c
@@ -323,6 +323,17 @@ fetch:
 	goto fetch;
 }
 
+static void loop_cache_invalidate(struct thread_data *td, struct fio_file *f)
+{
+	struct thread_options *o = &td->o;
+
+	if (o->invalidate_cache && !o->odirect) {
+		int fio_unused ret;
+
+		ret = file_invalidate_cache(td, f);
+	}
+}
+
 static int get_next_rand_block(struct thread_data *td, struct fio_file *f,
 			       enum fio_ddir ddir, uint64_t *b)
 {
@@ -334,6 +345,7 @@ static int get_next_rand_block(struct thread_data *td, struct fio_file *f,
 		fio_file_reset(td, f);
 		if (!get_next_rand_offset(td, f, ddir, b))
 			return 0;
+		loop_cache_invalidate(td, f);
 	}
 
 	dprint(FD_IO, "%s: rand offset failed, last=%llu, size=%llu\n",
@@ -358,6 +370,8 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 			f->last_pos[ddir] = 0;
 		else
 			f->last_pos[ddir] = f->last_pos[ddir] - io_size;
+
+		loop_cache_invalidate(td, f);
 	}
 
 	if (f->last_pos[ddir] < f->real_file_size) {
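
The new ``loop_cache_invalidate()`` only drops cached data when ``invalidate_cache`` is set and ``direct`` is off, since O_DIRECT I/O bypasses the page cache anyway. The body of ``file_invalidate_cache()`` is not part of this diff; the sketch below assumes the common POSIX approach of ``posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)`` on a regular file, and ``drop_file_cache()`` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop cached pages for a whole file.
 * Mirrors the guard in loop_cache_invalidate(): do nothing unless cache
 * invalidation is requested and O_DIRECT is not in use.
 * Returns -1 if open() fails, otherwise posix_fadvise()'s result
 * (0 on success or an error number).
 */
static int drop_file_cache(const char *path, bool invalidate_cache, bool odirect)
{
	int fd, ret;

	if (!invalidate_cache || odirect)
		return 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/*
	 * offset 0 with len 0 means "through the end of the file". Pages that
	 * are still dirty may stay cached, so callers that just wrote data may
	 * want fsync() first.
	 */
	ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return ret;
}
```

``POSIX_FADV_DONTNEED`` is advisory, so a return value of 0 means the hint was accepted, not that every page is guaranteed to be gone.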
diff --git a/options.c b/options.c
index ddcc4e5..5813a66 100644
--- a/options.c
+++ b/options.c
@@ -2019,6 +2019,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "offset_align",
+		.lname	= "IO offset alignment",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, start_offset_align),
+		.help	= "Start IO from this offset alignment",
+		.def	= "0",
+		.interval = 512,
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "offset_increment",
 		.lname	= "IO offset increment",
 		.type	= FIO_OPT_STR_VAL,
@@ -3241,8 +3252,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, nice),
 		.help	= "Set job CPU nice value",
-		.minval	= -19,
-		.maxval	= 20,
+		.minval	= -20,
+		.maxval	= 19,
 		.def	= "0",
 		.interval = 1,
 		.category = FIO_OPT_C_GENERAL,
diff --git a/thread_options.h b/thread_options.h
index 1813cdc..5a037bf 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -78,6 +78,7 @@ struct thread_options {
 	unsigned long long file_size_low;
 	unsigned long long file_size_high;
 	unsigned long long start_offset;
+	unsigned long long start_offset_align;
 
 	unsigned int bs[DDIR_RWDIR_CNT];
 	unsigned int ba[DDIR_RWDIR_CNT];
@@ -355,6 +356,7 @@ struct thread_options_pack {
 	uint64_t file_size_low;
 	uint64_t file_size_high;
 	uint64_t start_offset;
+	uint64_t start_offset_align;
 
 	uint32_t bs[DDIR_RWDIR_CNT];
 	uint32_t ba[DDIR_RWDIR_CNT];

* Recent changes (master)
@ 2017-10-26 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-26 12:00 UTC
  To: fio

The following changes since commit 71aa48eb4eed51adb719d159810ab0044b2a7154:

  doc: minor formatting fixes (2017-10-20 07:16:40 +0100)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ff523a66e5af357e67602caf33de1e2cd0521b08:

  parse: minimum options values are signed (2017-10-25 13:06:40 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      parse: minimum options values are signed

 parse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index ecce8b8..68229d0 100644
--- a/parse.c
+++ b/parse.c
@@ -556,8 +556,8 @@ static int __handle_option(struct fio_option *o, const char *ptr, void *data,
 			return 1;
 		}
 		if (o->minval && ull < o->minval) {
-			log_err("min value out of range: %llu"
-					" (%u min)\n", ull, o->minval);
+			log_err("min value out of range: %lld"
+					" (%d min)\n", ull, o->minval);
 			return 1;
 		}
 		if (o->posval[0].ival) {
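
Why the format change matters: a negative minimum, such as the one on the ``nice`` option, prints as a huge number when it goes through an unsigned conversion. The sketch below is illustrative only; ``check_min()`` and ``old_style_min()`` are hypothetical stand-ins for the range check and old message in ``__handle_option()``, and the old-style output assumes a 32-bit ``unsigned int``.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the minimum check in __handle_option().
 * As in the diff, a minval of 0 means "no minimum". The error text is
 * formatted with signed conversions, like the fixed log_err() call.
 */
static int check_min(long long val, int minval, char *msg, size_t len)
{
	if (minval && val < minval) {
		snprintf(msg, len, "min value out of range: %lld (%d min)",
			 val, minval);
		return 1;
	}
	return 0;
}

/*
 * Roughly what the old "%u" conversion displayed for a negative minval:
 * the value converted to unsigned (assumes a 32-bit unsigned int).
 */
static void old_style_min(int minval, char *msg, size_t len)
{
	snprintf(msg, len, "(%u min)", (unsigned int)minval);
}
```

With a ``minval`` of -20, the old format would have reported 4294967276 as the minimum, while the signed format reports -20.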

* Recent changes (master)
@ 2017-10-21 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-21 12:00 UTC
  To: fio

The following changes since commit 7ad86b642b6c3962177064b85b4c055ae9455032:

  Merge branch 'cpuclock-test' (2017-10-17 12:59:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 71aa48eb4eed51adb719d159810ab0044b2a7154:

  doc: minor formatting fixes (2017-10-20 07:16:40 +0100)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      doc: minor formatting fixes

 HOWTO | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a1513e1..22a5849 100644
--- a/HOWTO
+++ b/HOWTO
@@ -218,7 +218,7 @@ Command line options
 
 	Set the maximum number of threads/processes to support to `nr`.
 	NOTE: On Linux, it may be necessary to increase the shared-memory
-	limit ('/proc/sys/kernel/shmmax') if fio runs into errors while
+	limit (:file:`/proc/sys/kernel/shmmax`) if fio runs into errors while
 	creating jobs.
 
 .. option:: --server=args
@@ -233,7 +233,7 @@ Command line options
 .. option:: --client=hostname
 
 	Instead of running the jobs locally, send and run them on the given `hostname`
-	or set of `hostname`s.  See `Client/Server`_ section.
+	or set of `hostname`\s.  See `Client/Server`_ section.
 
 .. option:: --remote-config=file
 
@@ -1715,7 +1715,7 @@ I/O engine
 			Doesn't transfer any data, but burns CPU cycles according to the
 			:option:`cpuload` and :option:`cpuchunks` options. Setting
 			:option:`cpuload`\=85 will cause that job to do nothing but burn 85%
-			of the CPU. In case of SMP machines, use :option:`numjobs`=<nr_of_cpu>
+			of the CPU. In case of SMP machines, use :option:`numjobs`\=<nr_of_cpu>
 			to get desired CPU usage, as the cpuload only loads a
 			single CPU at the desired rate. A job never finishes unless there is
 			at least one non-cpuio job.

* Recent changes (master)
@ 2017-10-18 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-18 12:00 UTC
  To: fio

The following changes since commit c13a60ce72aaf5b07b93977ab86e7522d167ec28:

  flow: fix bad overflowing math (2017-10-12 10:54:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7ad86b642b6c3962177064b85b4c055ae9455032:

  Merge branch 'cpuclock-test' (2017-10-17 12:59:40 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      gettime: improve cpu clock test
      Merge branch 'cpuclock-test'

 configure | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 gettime.c | 21 ++++++++++++---------
 2 files changed, 58 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 2b46ab8..d34c000 100755
--- a/configure
+++ b/configure
@@ -343,6 +343,8 @@ CYGWIN*)
   # Flags below are still necessary mostly for MinGW.
   socklen_t="yes"
   sfaa="yes"
+  sync_sync="yes"
+  cmp_swap="yes"
   rusage_thread="yes"
   fdatasync="yes"
   clock_gettime="yes" # clock_monotonic probe has dependency on this
@@ -707,6 +709,44 @@ fi
 print_config "__sync_fetch_and_add" "$sfaa"
 
 ##########################################
+# __sync_synchronize() test
+if test "$sync_sync" != "yes" ; then
+  sync_sync="no"
+fi
+cat > $TMPC << EOF
+#include <inttypes.h>
+
+int main(int argc, char **argv)
+{
+  __sync_synchronize();
+  return 0;
+}
+EOF
+if compile_prog "" "" "__sync_synchronize()" ; then
+    sync_sync="yes"
+fi
+print_config "__sync_synchronize" "$sync_sync"
+
+##########################################
+# __sync_val_compare_and_swap() test
+if test "$cmp_swap" != "yes" ; then
+  cmp_swap="no"
+fi
+cat > $TMPC << EOF
+#include <inttypes.h>
+
+int main(int argc, char **argv)
+{
+  int x = 0;
+  return __sync_val_compare_and_swap(&x, 1, 2);
+}
+EOF
+if compile_prog "" "" "__sync_val_compare_and_swap()" ; then
+    cmp_swap="yes"
+fi
+print_config "__sync_val_compare_and_swap" "$cmp_swap"
+
+##########################################
 # libverbs probe
 if test "$libverbs" != "yes" ; then
   libverbs="no"
@@ -2108,6 +2148,12 @@ fi
 if test "$sfaa" = "yes" ; then
   output_sym "CONFIG_SFAA"
 fi
+if test "$sync_sync" = "yes" ; then
+  output_sym "CONFIG_SYNC_SYNC"
+fi
+if test "$cmp_swap" = "yes" ; then
+  output_sym "CONFIG_CMP_SWAP"
+fi
 if test "$libverbs" = "yes" -a "$rdmacm" = "yes" ; then
   output_sym "CONFIG_RDMA"
 fi
diff --git a/gettime.c b/gettime.c
index 1cbef84..c256a96 100644
--- a/gettime.c
+++ b/gettime.c
@@ -548,7 +548,7 @@ uint64_t time_since_now(const struct timespec *s)
 }
 
 #if defined(FIO_HAVE_CPU_AFFINITY) && defined(ARCH_HAVE_CPU_CLOCK)  && \
-    defined(CONFIG_SFAA)
+    defined(CONFIG_SYNC_SYNC) && defined(CONFIG_CMP_SWAP)
 
 #define CLOCK_ENTRIES_DEBUG	100000
 #define CLOCK_ENTRIES_TEST	1000
@@ -570,9 +570,10 @@ struct clock_thread {
 	struct clock_entry *entries;
 };
 
-static inline uint32_t atomic32_inc_return(uint32_t *seq)
+static inline uint32_t atomic32_compare_and_swap(uint32_t *ptr, uint32_t old,
+						 uint32_t new)
 {
-	return 1 + __sync_fetch_and_add(seq, 1);
+	return __sync_val_compare_and_swap(ptr, old, new);
 }
 
 static void *clock_thread_fn(void *data)
@@ -580,7 +581,6 @@ static void *clock_thread_fn(void *data)
 	struct clock_thread *t = data;
 	struct clock_entry *c;
 	os_cpu_mask_t cpu_mask;
-	uint32_t last_seq;
 	unsigned long long first;
 	int i;
 
@@ -604,7 +604,6 @@ static void *clock_thread_fn(void *data)
 	pthread_mutex_unlock(&t->started);
 
 	first = get_cpu_clock();
-	last_seq = 0;
 	c = &t->entries[0];
 	for (i = 0; i < t->nr_entries; i++, c++) {
 		uint32_t seq;
@@ -612,11 +611,15 @@ static void *clock_thread_fn(void *data)
 
 		c->cpu = t->cpu;
 		do {
-			seq = atomic32_inc_return(t->seq);
-			if (seq < last_seq)
+			seq = *t->seq;
+			if (seq == UINT_MAX)
 				break;
+			__sync_synchronize();
 			tsc = get_cpu_clock();
-		} while (seq != *t->seq);
+		} while (seq != atomic32_compare_and_swap(t->seq, seq, seq + 1));
+
+		if (seq == UINT_MAX)
+			break;
 
 		c->seq = seq;
 		c->tsc = tsc;
@@ -634,7 +637,7 @@ static void *clock_thread_fn(void *data)
 	 * The most common platform clock breakage is returning zero
 	 * indefinitely. Check for that and return failure.
 	 */
-	if (!t->entries[i - 1].tsc && !t->entries[0].tsc)
+	if (i > 1 && !t->entries[i - 1].tsc && !t->entries[0].tsc)
 		goto err;
 
 	fio_cpuset_exit(&cpu_mask);
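
The reworked loop claims each sequence number with a compare-and-swap instead of a blind fetch-and-add, so a thread only records a sample for a number it actually won, and ``UINT_MAX`` acts as a stop value. A self-contained sketch of that pattern follows. It assumes the GCC/Clang ``__sync`` and ``__atomic`` builtins and POSIX threads (build with ``-pthread``); the names ``claim_fn()`` and ``run_claim_test()`` and the thread and entry counts are hypothetical, and the ``get_cpu_clock()`` read is omitted.

```c
#include <limits.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NR_THREADS	4
#define NR_ENTRIES	1000

static uint32_t seq_counter;	/* shared, like *t->seq in clock_thread_fn() */
static uint32_t claimed[NR_THREADS][NR_ENTRIES];

static void *claim_fn(void *data)
{
	uint32_t *out = data;
	int i;

	for (i = 0; i < NR_ENTRIES; i++) {
		uint32_t seq;

		do {
			/* relaxed atomic load in place of fio's plain *t->seq read */
			seq = __atomic_load_n(&seq_counter, __ATOMIC_RELAXED);
			if (seq == UINT_MAX)
				break;
			__sync_synchronize();	/* full barrier before the clock read */
			/* fio reads the CPU clock here; omitted in this sketch */
		} while (seq != __sync_val_compare_and_swap(&seq_counter, seq,
							    seq + 1));

		if (seq == UINT_MAX)
			break;
		out[i] = seq;	/* the CAS succeeded, so 'seq' belongs to us */
	}
	return NULL;
}

/* Returns 0 if every sequence number 0..N-1 was claimed exactly once. */
static int run_claim_test(void)
{
	static unsigned char seen[NR_THREADS * NR_ENTRIES];
	pthread_t threads[NR_THREADS];
	int i, j, created = 0, ret = 0;

	seq_counter = 0;
	memset(seen, 0, sizeof(seen));

	for (i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&threads[i], NULL, claim_fn, claimed[i])) {
			ret = -1;
			break;
		}
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(threads[i], NULL);
	if (ret)
		return ret;

	for (i = 0; i < NR_THREADS; i++) {
		for (j = 0; j < NR_ENTRIES; j++) {
			uint32_t s = claimed[i][j];

			/* out of range or already claimed by someone else */
			if (s >= NR_THREADS * NR_ENTRIES || seen[s]++)
				return 1;
		}
	}
	return 0;
}
```

A lost race simply retries with a fresh value, which is why the old ``seq < last_seq`` bookkeeping is no longer needed.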

* Recent changes (master)
@ 2017-10-13 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-13 12:00 UTC
  To: fio

The following changes since commit 447b94e10bddab8078f35c423ca1e3c3f0b1be38:

  Fix overflow in percentile calculation for Windows (2017-10-11 16:26:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c13a60ce72aaf5b07b93977ab86e7522d167ec28:

  flow: fix bad overflowing math (2017-10-12 10:54:27 -0600)

----------------------------------------------------------------
Andrzej Jakowski (1):
      Fix more overflows in percentile calculation for Windows

Jens Axboe (3):
      Merge branch 'overflow_fix' of https://github.com/sitsofe/fio
      Merge branch 'master' of https://github.com/Venutiwa/fio
      flow: fix bad overflowing math

Sitsofe Wheeler (1):
      gettime: fix cycles_per_msec overflow when using 32 bit longs

Venu (1):
      Adding support for multiple jobs for fio test

 examples/fio-rand-RW.job    | 18 ++++++++++++++++++
 examples/fio-rand-read.job  | 16 ++++++++++++++++
 examples/fio-rand-write.job | 16 ++++++++++++++++
 examples/fio-seq-RW.job     | 18 ++++++++++++++++++
 examples/fio-seq-read.job   | 14 ++++++++++++++
 examples/fio-seq-write.job  | 16 ++++++++++++++++
 flow.c                      | 10 +++++++---
 gclient.c                   |  2 +-
 gettime.c                   |  2 +-
 stat.c                      |  4 ++--
 stat.h                      |  2 +-
 11 files changed, 110 insertions(+), 8 deletions(-)
 create mode 100644 examples/fio-rand-RW.job
 create mode 100644 examples/fio-rand-read.job
 create mode 100644 examples/fio-rand-write.job
 create mode 100644 examples/fio-seq-RW.job
 create mode 100644 examples/fio-seq-read.job
 create mode 100644 examples/fio-seq-write.job

---

Diff of recent changes:

diff --git a/examples/fio-rand-RW.job b/examples/fio-rand-RW.job
new file mode 100644
index 0000000..0df0bc1
--- /dev/null
+++ b/examples/fio-rand-RW.job
@@ -0,0 +1,18 @@
+; fio-rand-RW.job for fiotest
+
+[global]
+name=fio-rand-RW
+filename=fio-rand-RW
+rw=randrw
+rwmixread=60
+rwmixwrite=40
+bs=4K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-rand-read.job b/examples/fio-rand-read.job
new file mode 100644
index 0000000..bc15466
--- /dev/null
+++ b/examples/fio-rand-read.job
@@ -0,0 +1,16 @@
+; fio-rand-read.job for fiotest
+
+[global]
+name=fio-rand-read
+filename=fio-rand-read
+rw=randread
+bs=4K
+direct=0
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-rand-write.job b/examples/fio-rand-write.job
new file mode 100644
index 0000000..bd1b73a
--- /dev/null
+++ b/examples/fio-rand-write.job
@@ -0,0 +1,16 @@
+; fio-rand-write.job for fiotest
+
+[global]
+name=fio-rand-write
+filename=fio-rand-write
+rw=randwrite
+bs=4K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-RW.job b/examples/fio-seq-RW.job
new file mode 100644
index 0000000..8f7090f
--- /dev/null
+++ b/examples/fio-seq-RW.job
@@ -0,0 +1,18 @@
+; fio-seq-RW.job for fiotest
+
+[global]
+name=fio-seq-RW
+filename=fio-seq-RW
+rw=rw
+rwmixread=60
+rwmixwrite=40
+bs=256K
+direct=0
+numjobs=4
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-read.job b/examples/fio-seq-read.job
new file mode 100644
index 0000000..74b1b30
--- /dev/null
+++ b/examples/fio-seq-read.job
@@ -0,0 +1,14 @@
+[global]
+name=fio-seq-reads
+filename=fio-seq-reads
+rw=read
+bs=256K
+direct=0
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/examples/fio-seq-write.job b/examples/fio-seq-write.job
new file mode 100644
index 0000000..b291a15
--- /dev/null
+++ b/examples/fio-seq-write.job
@@ -0,0 +1,16 @@
+; fio-seq-write.job for fiotest
+
+[global]
+name=fio-seq-write
+filename=fio-seq-write
+rw=write
+bs=256K
+direct=0
+numjobs=1
+time_based=1
+runtime=900
+
+[file1]
+size=10G
+ioengine=libaio
+iodepth=16
diff --git a/flow.c b/flow.c
index 42b6dd7..384187e 100644
--- a/flow.c
+++ b/flow.c
@@ -16,13 +16,17 @@ static struct fio_mutex *flow_lock;
 int flow_threshold_exceeded(struct thread_data *td)
 {
 	struct fio_flow *flow = td->flow;
-	int sign;
+	long long flow_counter;
 
 	if (!flow)
 		return 0;
 
-	sign = td->o.flow > 0 ? 1 : -1;
-	if (sign * flow->flow_counter > td->o.flow_watermark) {
+	if (td->o.flow > 0)
+		flow_counter = flow->flow_counter;
+	else
+		flow_counter = -flow->flow_counter;
+
+	if (flow_counter > td->o.flow_watermark) {
 		if (td->o.flow_sleep) {
 			io_u_quiesce(td);
 			usleep(td->o.flow_sleep);
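
The flow fix computes the signed counter into a ``long long`` before comparing it with the watermark. The C rule behind this kind of fix: when a signed operand meets an unsigned operand of the same rank, the signed side is converted to unsigned, so a negative count can compare as huge. The declared types of ``flow_counter`` and ``flow_watermark`` are not shown in this diff; the sketch assumes an ``unsigned int`` watermark and a 32-bit ``unsigned int`` purely to illustrate the rule, and ``exceeds_int()``/``exceeds_ll()`` are hypothetical names.

```c
#include <stdbool.h>

/*
 * Illustrates the hazardous pattern: a signed int compared with an unsigned
 * int. The usual arithmetic conversions turn 'counter' into unsigned int;
 * the cast only makes that implicit conversion explicit.
 */
static bool exceeds_int(int counter, unsigned int watermark)
{
	return (unsigned int)counter > watermark;
}

/*
 * Illustrates the fixed pattern: with a 32-bit unsigned int, long long can
 * represent every watermark value, so the watermark is converted to
 * long long and the comparison stays signed.
 */
static bool exceeds_ll(long long counter, unsigned int watermark)
{
	return counter > watermark;
}
```

``exceeds_int(-5, 100)`` is true because -5 becomes 4294967291, whereas ``exceeds_ll(-5, 100)`` is false.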
diff --git a/gclient.c b/gclient.c
index 43c8a08..daa9153 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1099,7 +1099,7 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 				       int ddir)
 {
 	unsigned int *io_u_plat = ts->io_u_plat[ddir];
-	unsigned long nr = ts->clat_stat[ddir].samples;
+	unsigned long long nr = ts->clat_stat[ddir].samples;
 	fio_fp64_t *plist = ts->percentile_list;
 	unsigned int len, scale_down;
 	unsigned long long *ovals, minv, maxv;
diff --git a/gettime.c b/gettime.c
index 7945528..1cbef84 100644
--- a/gettime.c
+++ b/gettime.c
@@ -15,7 +15,7 @@
 
 #if defined(ARCH_HAVE_CPU_CLOCK)
 #ifndef ARCH_CPU_CLOCK_CYCLES_PER_USEC
-static unsigned long cycles_per_msec;
+static unsigned long long cycles_per_msec;
 static unsigned long long cycles_start;
 static unsigned long long clock_mult;
 static unsigned long long max_cycles_mask;
diff --git a/stat.c b/stat.c
index 5c75868..c8a45db 100644
--- a/stat.c
+++ b/stat.c
@@ -135,7 +135,7 @@ static int double_cmp(const void *a, const void *b)
 	return cmp;
 }
 
-unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
+unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr,
 				   fio_fp64_t *plist, unsigned long long **output,
 				   unsigned long long *maxv, unsigned long long *minv)
 {
@@ -198,7 +198,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 /*
  * Find and display the p-th percentile of clat
  */
-static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
+static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr,
 				  fio_fp64_t *plist, unsigned int precision,
 				  bool is_clat, struct buf_output *out)
 {
diff --git a/stat.h b/stat.h
index 3fda084..6ddcad2 100644
--- a/stat.h
+++ b/stat.h
@@ -293,7 +293,7 @@ extern void init_thread_stat(struct thread_stat *ts);
 extern void init_group_run_stat(struct group_run_stats *gs);
 extern void eta_to_str(char *str, unsigned long eta_sec);
 extern bool calc_lat(struct io_stat *is, unsigned long long *min, unsigned long long *max, double *mean, double *dev);
-extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr, fio_fp64_t *plist, unsigned long long **output, unsigned long long *maxv, unsigned long long *minv);
+extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long long nr, fio_fp64_t *plist, unsigned long long **output, unsigned long long *maxv, unsigned long long *minv);
 extern void stat_calc_lat_n(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_m(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_u(struct thread_stat *ts, double *io_u_lat);
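
These percentile changes widen ``nr`` from ``unsigned long`` to ``unsigned long long``. On 64-bit Windows ``unsigned long`` is only 32 bits, so a sample count above 4,294,967,295 would wrap when stored in it. The sketch below uses ``uint32_t`` as a stand-in for a 32-bit ``unsigned long`` so it behaves the same on every platform; ``p99_rank()`` is a hypothetical percentile index, not fio's ``calc_clat_percentiles()`` logic.

```c
#include <stdint.h>

/*
 * Stand-in for assigning a 64-bit sample count to a 32-bit 'unsigned long'
 * (as on 64-bit Windows): only the low 32 bits survive.
 */
static uint32_t store_in_32bit(uint64_t samples)
{
	return (uint32_t)samples;
}

/* Hypothetical index of the 99th-percentile sample among nr samples. */
static uint64_t p99_rank(uint64_t nr)
{
	return nr * 99 / 100;
}
```

With 5,000,000,000 samples, the 32-bit copy holds 705,032,704, so a 99th-percentile rank computed from it is 697,982,376 instead of 4,950,000,000.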

* Recent changes (master)
@ 2017-10-12 12:00 Jens Axboe
From: Jens Axboe @ 2017-10-12 12:00 UTC
  To: fio

The following changes since commit 8847ae4cd2e3d0d73dd7d7c93c5d6da96b71d174:

  backend: don't dereference ->io_ops in reap_threads() (2017-10-10 11:54:54 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 447b94e10bddab8078f35c423ca1e3c3f0b1be38:

  Fix overflow in percentile calculation for Windows (2017-10-11 16:26:00 -0600)

----------------------------------------------------------------
Andrzej Jakowski (1):
      Fix overflow in percentile calculation for Windows

Jens Axboe (10):
      Merge branch 'ci_and_configure' of https://github.com/sitsofe/fio
      Merge branch 'windowaio_invalidate' of https://github.com/sitsofe/fio
      engines/windowsaio: style
      Merge branch 'fgp_fixes' of https://github.com/sitsofe/fio
      fio: kill unused TD_F_ flag
      fio: rearrange TD_F_ flag logic
      Error if td flags overlap with engine flags
      Fix broken path separator definition on Windows
      Windows mkdir() fix
      fio: kill td_ioengine_flags()

Josef Bacik (4):
      convert FIO_OS_PATH_SEPARATOR to a character
      create subdirs if specified in the filename_format
      use mkdir instead of mkdirat
      add documentation about filename_format directory behavior

Sitsofe Wheeler (4):
      appveyor: install zlib and minor clean ups
      configure: update compiler probing
      fio_generate_plots: cope with per_job_logs filenames
      windowsaio: add best effort cache invalidation

 HOWTO                    |  7 +++++
 appveyor.yml             | 11 +++++---
 configure                | 68 ++++++++++++++++++++++++++++++++++++------------
 engines/windowsaio.c     | 43 ++++++++++++++++++++++++++++++
 filesetup.c              | 43 +++++++++++++++++++++++++++++-
 fio.1                    |  6 +++++
 fio.h                    | 58 +++++++++++++++++++++++++----------------
 libfio.c                 |  2 ++
 os/os-windows.h          |  2 +-
 os/os.h                  |  2 +-
 stat.c                   |  2 +-
 tools/fio_generate_plots | 26 +++++++++++-------
 verify.c                 |  4 +--
 13 files changed, 216 insertions(+), 58 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d3f957b..a1513e1 100644
--- a/HOWTO
+++ b/HOWTO
@@ -795,6 +795,13 @@ Target file/device
 	named :file:`testfiles.4`. The default of :file:`$jobname.$jobnum.$filenum`
 	will be used if no other format specifier is given.
 
+	If you specify a path then the directories will be created up to the
+	main directory for the file.  So for example if you specify
+	``filename_format=a/b/c/$jobnum`` then the directories a/b/c will be
+	created before the file setup part of the job.  If you specify
+	:option:`directory` then the path will be relative to that directory,
+	otherwise it is treated as the absolute path.
+
 .. option:: unique_filename=bool
 
 	To avoid collisions between networked clients, fio defaults to prefixing any
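
The ``filename_format`` text above describes creating every directory along the path before file setup, in the style of ``mkdir -p``. A minimal POSIX sketch of that idea follows; ``make_parent_dirs()`` is a hypothetical helper, not the ``create_work_dirs()`` added to ``filesetup.c`` in this series, and it assumes ``/`` as the path separator and mode ``0755``.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create each directory component leading up to the
 * final path element, e.g. "a/b/c/0" creates "a", "a/b" and "a/b/c".
 * Components that already exist are accepted. Returns 0 on success,
 * -1 with errno set on failure.
 */
static int make_parent_dirs(const char *path)
{
	char buf[4096];
	char *p;

	if (strlen(path) >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(buf, path);

	for (p = buf; *p; p++) {
		/* skip non-separators and a leading '/' of an absolute path */
		if (*p != '/' || p == buf)
			continue;
		*p = '\0';
		if (mkdir(buf, 0755) && errno != EEXIST)
			return -1;
		*p = '/';
	}
	return 0;
}
```

The final element is left alone because it names the data file, which fio lays out later during file setup.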
diff --git a/appveyor.yml b/appveyor.yml
index 39f50a8..844afa5 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -1,16 +1,21 @@
-clone_depth: 50
+clone_depth: 1
 environment:
+  CYG_MIRROR: http://cygwin.mirror.constant.com
+  CYG_ROOT: C:\cygwin64
   MAKEFLAGS: -j 2
   matrix:
     - platform: x86_64
       BUILD_ARCH: x64
-      CYG_ROOT: C:\cygwin64
+      PACKAGE_ARCH: x86_64
       CONFIGURE_OPTIONS:
     - platform: x86
       BUILD_ARCH: x86
-      CYG_ROOT: C:\cygwin
+      PACKAGE_ARCH: i686
       CONFIGURE_OPTIONS: --build-32bit-win
 
+install:
+  - '%CYG_ROOT%\setup-x86_64.exe --quiet-mode --no-shortcuts --only-site --site "%CYG_MIRROR%" --packages "mingw64-%PACKAGE_ARCH%-zlib" > NULL'
+
 build_script:
   - SET PATH=%CYG_ROOT%\bin;%PATH%
   - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
diff --git a/configure b/configure
index cefd610..2b46ab8 100755
--- a/configure
+++ b/configure
@@ -225,7 +225,20 @@ if test "$show_help" = "yes" ; then
 fi
 
 cross_prefix=${cross_prefix-${CROSS_COMPILE}}
-cc="${CC-${cross_prefix}gcc}"
+# Preferred compiler (can be overridden later after we know the platform):
+#  ${CC} (if set)
+#  ${cross_prefix}gcc (if cross-prefix specified)
+#  gcc if available
+#  clang if available
+if test -z "${CC}${cross_prefix}"; then
+  if has gcc; then
+    cc=gcc
+  elif has clang; then
+    cc=clang
+  fi
+else
+  cc="${CC-${cross_prefix}gcc}"
+fi
 
 if check_define __ANDROID__ ; then
   targetos="Android"
@@ -301,16 +314,16 @@ SunOS)
 CYGWIN*)
   # We still force some options, so keep this message here.
   echo "Forcing some known good options on Windows"
-  if test -z "$CC" ; then
+  if test -z "${CC}${cross_prefix}"; then
     if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
-      CC="i686-w64-mingw32-gcc"
+      cc="i686-w64-mingw32-gcc"
       if test -e "../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
         echo "Building with zlib support"
         output_sym "CONFIG_ZLIB"
         echo "LIBS=../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib" >> $config_host_mak
       fi
     else
-      CC="x86_64-w64-mingw32-gcc"
+      cc="x86_64-w64-mingw32-gcc"
       if test -e "../zlib/contrib/vstudio/vc14/x64/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
         echo "Building with zlib support"
         output_sym "CONFIG_ZLIB"
@@ -340,11 +353,25 @@ CYGWIN*)
   tls_thread="yes"
   static_assert="yes"
   ipv6="yes"
-  echo "CC=$CC" >> $config_host_mak
+  mkdir_two="no"
   echo "BUILD_CFLAGS=$CFLAGS -I../zlib -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
   ;;
 esac
 
+# Now we know the target platform we can have another guess at the preferred
+# compiler when it wasn't explicitly set
+if test -z "${CC}${cross_prefix}"; then
+  if test "$targetos" = "FreeBSD" || test "$targetos" = "Darwin"; then
+    if has clang; then
+      cc=clang
+    fi
+  fi
+fi
+if test -z "$cc"; then
+    echo "configure: failed to find compiler"
+    exit 1
+fi
+
 if test ! -z "$cpu" ; then
   # command line argument
   :
@@ -415,18 +442,6 @@ case "$cpu" in
   ;;
 esac
 
-if test -z "$CC" ; then
-  if test "$targetos" = "FreeBSD"; then
-    if has clang; then
-      CC=clang
-    else
-      CC=gcc
-    fi
-  fi
-fi
-
-cc="${CC-${cross_prefix}gcc}"
-
 ##########################################
 # check cross compile
 
@@ -2033,6 +2048,22 @@ if test "$enable_cuda" = "yes" && compile_prog "" "-lcuda" "cuda"; then
 fi
 print_config "cuda" "$cuda"
 
+##########################################
+# mkdir() probe. mingw apparently has a one-argument mkdir :/
+mkdir_two="no"
+cat > $TMPC << EOF
+#include <sys/stat.h>
+#include <sys/types.h>
+int main(int argc, char **argv)
+{
+  return mkdir("/tmp/bla", 0600);
+}
+EOF
+if compile_prog "" "" "mkdir(a, b)"; then
+  mkdir_two="yes"
+fi
+print_config "mkdir(a, b)" "$mkdir_two"
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
@@ -2261,6 +2292,9 @@ fi
 if test "$cuda" = "yes" ; then
   output_sym "CONFIG_CUDA"
 fi
+if test "$mkdir_two" = "yes" ; then
+  output_sym "CONFIG_HAVE_MKDIR_TWO"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 314eaad..a66b1df 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -142,6 +142,44 @@ static void fio_windowsaio_cleanup(struct thread_data *td)
 	}
 }
 
+static int windowsaio_invalidate_cache(struct fio_file *f)
+{
+	DWORD error;
+	DWORD isharemode = (FILE_SHARE_DELETE | FILE_SHARE_READ |
+			FILE_SHARE_WRITE);
+	HANDLE ihFile;
+	int rc = 0;
+
+	/*
+	 * Encourage Windows to drop cached parts of a file by temporarily
+	 * opening it for non-buffered access. Note: this will only work when
+	 * the following is the only thing with the file open on the whole
+	 * system.
+	 */
+	dprint(FD_IO, "windowsaio: attempt invalidate cache for %s\n",
+			f->file_name);
+	ihFile = CreateFile(f->file_name, 0, isharemode, NULL, OPEN_EXISTING,
+			FILE_FLAG_NO_BUFFERING, NULL);
+
+	if (ihFile != INVALID_HANDLE_VALUE) {
+		if (!CloseHandle(ihFile)) {
+			error = GetLastError();
+			log_info("windowsaio: invalidation fd close %s "
+				 "failed: error %d\n", f->file_name, error);
+			rc = 1;
+		}
+	} else {
+		error = GetLastError();
+		if (error != ERROR_FILE_NOT_FOUND) {
+			log_info("windowsaio: cache invalidation of %s failed: "
+					"error %d\n", f->file_name, error);
+			rc = 1;
+		}
+	}
+
+	return rc;
+}
+
 static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 {
 	int rc = 0;
@@ -200,6 +238,11 @@ static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 	else
 		openmode = OPEN_EXISTING;
 
+	/* If we're going to use direct I/O, Windows will try and invalidate
+	 * its cache at that point so there's no need to do it here */
+	if (td->o.invalidate_cache && !td->o.odirect)
+		windowsaio_invalidate_cache(f);
+
 	f->hFile = CreateFile(f->file_name, access, sharemode,
 		NULL, openmode, flags, NULL);
 
diff --git a/filesetup.c b/filesetup.c
index 0631a01..7a602d4 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1523,6 +1523,42 @@ bool exists_and_not_regfile(const char *filename)
 	return true;
 }
 
+static int create_work_dirs(struct thread_data *td, const char *fname)
+{
+	char path[PATH_MAX];
+	char *start, *end;
+
+	if (td->o.directory) {
+		snprintf(path, PATH_MAX, "%s%c%s", td->o.directory,
+			 FIO_OS_PATH_SEPARATOR, fname);
+		start = strstr(path, fname);
+	} else {
+		snprintf(path, PATH_MAX, "%s", fname);
+		start = path;
+	}
+
+	end = start;
+	while ((end = strchr(end, FIO_OS_PATH_SEPARATOR)) != NULL) {
+		if (end == start)
+			break;
+		*end = '\0';
+		errno = 0;
+#ifdef CONFIG_HAVE_MKDIR_TWO
+		if (mkdir(path, 0600) && errno != EEXIST) {
+#else
+		if (mkdir(path) && errno != EEXIST) {
+#endif
+			log_err("fio: failed to create dir (%s): %d\n",
+				start, errno);
+			return 1;
+		}
+		*end = FIO_OS_PATH_SEPARATOR;
+		end++;
+	}
+	td->flags |= TD_F_DIRS_CREATED;
+	return 0;
+}
+
 int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 {
 	int cur_files = td->files_index;
@@ -1538,6 +1574,11 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 
 	sprintf(file_name + len, "%s", fname);
 
+	if (strchr(fname, FIO_OS_PATH_SEPARATOR) &&
+	    !(td->flags & TD_F_DIRS_CREATED) &&
+	    create_work_dirs(td, fname))
+		return 1;
+
 	/* clean cloned siblings using existing files */
 	if (numjob && is_already_allocated(file_name) &&
 	    !exists_and_not_regfile(fname))
@@ -1725,7 +1766,7 @@ static int recurse_dir(struct thread_data *td, const char *dirname)
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
-		sprintf(full_path, "%s%s%s", dirname, FIO_OS_PATH_SEPARATOR, dir->d_name);
+		sprintf(full_path, "%s%c%s", dirname, FIO_OS_PATH_SEPARATOR, dir->d_name);
 
 		if (lstat(full_path, &sb) == -1) {
 			if (errno != ENOENT) {
diff --git a/fio.1 b/fio.1
index 6e7d1f8..7787ef2 100644
--- a/fio.1
+++ b/fio.1
@@ -578,6 +578,12 @@ fio generate filenames that are shared between the two. For instance, if
 `testfiles.$filenum' is specified, file number 4 for any job will be
 named `testfiles.4'. The default of `$jobname.$jobnum.$filenum'
 will be used if no other format specifier is given.
+.P
+If you specify a path then the directories will be created up to the main
+directory for the file.  So for example if you specify `a/b/c/$jobnum` then the
+directories a/b/c will be created before the file setup part of the job.  If you
+specify \fBdirectory\fR then the path will be relative to that directory, otherwise
+it is treated as the absolute path.
 .RE
 .TP
 .BI unique_filename \fR=\fPbool
diff --git a/fio.h b/fio.h
index 8814d84..8ca934d 100644
--- a/fio.h
+++ b/fio.h
@@ -72,22 +72,42 @@ enum {
 };
 
 enum {
-	TD_F_VER_BACKLOG	= 1U << 0,
-	TD_F_TRIM_BACKLOG	= 1U << 1,
-	TD_F_READ_IOLOG		= 1U << 2,
-	TD_F_REFILL_BUFFERS	= 1U << 3,
-	TD_F_SCRAMBLE_BUFFERS	= 1U << 4,
-	TD_F_VER_NONE		= 1U << 5,
-	TD_F_PROFILE_OPS	= 1U << 6,
-	TD_F_COMPRESS		= 1U << 7,
-	TD_F_RESERVED		= 1U << 8, /* not used */
-	TD_F_COMPRESS_LOG	= 1U << 9,
-	TD_F_VSTATE_SAVED	= 1U << 10,
-	TD_F_NEED_LOCK		= 1U << 11,
-	TD_F_CHILD		= 1U << 12,
-	TD_F_NO_PROGRESS        = 1U << 13,
-	TD_F_REGROW_LOGS	= 1U << 14,
-	TD_F_MMAP_KEEP		= 1U << 15,
+	__TD_F_VER_BACKLOG	= 0,
+	__TD_F_TRIM_BACKLOG,
+	__TD_F_READ_IOLOG,
+	__TD_F_REFILL_BUFFERS,
+	__TD_F_SCRAMBLE_BUFFERS,
+	__TD_F_VER_NONE,
+	__TD_F_PROFILE_OPS,
+	__TD_F_COMPRESS,
+	__TD_F_COMPRESS_LOG,
+	__TD_F_VSTATE_SAVED,
+	__TD_F_NEED_LOCK,
+	__TD_F_CHILD,
+	__TD_F_NO_PROGRESS,
+	__TD_F_REGROW_LOGS,
+	__TD_F_MMAP_KEEP,
+	__TD_F_DIRS_CREATED,
+	__TD_F_LAST,		/* not a real bit, keep last */
+};
+
+enum {
+	TD_F_VER_BACKLOG	= 1U << __TD_F_VER_BACKLOG,
+	TD_F_TRIM_BACKLOG	= 1U << __TD_F_TRIM_BACKLOG,
+	TD_F_READ_IOLOG		= 1U << __TD_F_READ_IOLOG,
+	TD_F_REFILL_BUFFERS	= 1U << __TD_F_REFILL_BUFFERS,
+	TD_F_SCRAMBLE_BUFFERS	= 1U << __TD_F_SCRAMBLE_BUFFERS,
+	TD_F_VER_NONE		= 1U << __TD_F_VER_NONE,
+	TD_F_PROFILE_OPS	= 1U << __TD_F_PROFILE_OPS,
+	TD_F_COMPRESS		= 1U << __TD_F_COMPRESS,
+	TD_F_COMPRESS_LOG	= 1U << __TD_F_COMPRESS_LOG,
+	TD_F_VSTATE_SAVED	= 1U << __TD_F_VSTATE_SAVED,
+	TD_F_NEED_LOCK		= 1U << __TD_F_NEED_LOCK,
+	TD_F_CHILD		= 1U << __TD_F_CHILD,
+	TD_F_NO_PROGRESS        = 1U << __TD_F_NO_PROGRESS,
+	TD_F_REGROW_LOGS	= 1U << __TD_F_REGROW_LOGS,
+	TD_F_MMAP_KEEP		= 1U << __TD_F_MMAP_KEEP,
+	TD_F_DIRS_CREATED	= 1U << __TD_F_DIRS_CREATED,
 };
 
 enum {
@@ -591,12 +611,6 @@ enum {
 #define TD_ENG_FLAG_SHIFT	16
 #define TD_ENG_FLAG_MASK	((1U << 16) - 1)
 
-static inline enum fio_ioengine_flags td_ioengine_flags(struct thread_data *td)
-{
-	return (enum fio_ioengine_flags)
-		((td->flags >> TD_ENG_FLAG_SHIFT) & TD_ENG_FLAG_MASK);
-}
-
 static inline void td_set_ioengine_flags(struct thread_data *td)
 {
 	td->flags = (~(TD_ENG_FLAG_MASK << TD_ENG_FLAG_SHIFT) & td->flags) |
diff --git a/libfio.c b/libfio.c
index 14ddc4d..830759a 100644
--- a/libfio.c
+++ b/libfio.c
@@ -365,6 +365,8 @@ int initialize_fio(char *envp[])
 	compiletime_assert((offsetof(struct thread_options_pack, latency_percentile) % 8) == 0, "latency_percentile");
 	compiletime_assert((offsetof(struct jobs_eta, m_rate) % 8) == 0, "m_rate");
 
+	compiletime_assert(__TD_F_LAST <= TD_ENG_FLAG_SHIFT, "TD_ENG_FLAG_SHIFT");
+
 	err = endian_check();
 	if (err) {
 		log_err("fio: endianness settings appear wrong.\n");
diff --git a/os/os-windows.h b/os/os-windows.h
index 36b421e..520da19 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -37,7 +37,7 @@ int rand_r(unsigned *);
 
 #define FIO_PREFERRED_ENGINE		"windowsaio"
 #define FIO_PREFERRED_CLOCK_SOURCE	CS_CGETTIME
-#define FIO_OS_PATH_SEPARATOR		"\\"
+#define FIO_OS_PATH_SEPARATOR		'\\'
 
 #define FIO_MAX_CPUS	MAXIMUM_PROCESSORS
 
diff --git a/os/os.h b/os/os.h
index f62b427..1a4437c 100644
--- a/os/os.h
+++ b/os/os.h
@@ -155,7 +155,7 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 
 #ifndef FIO_OS_PATH_SEPARATOR
-#define FIO_OS_PATH_SEPARATOR	"/"
+#define FIO_OS_PATH_SEPARATOR	'/'
 #endif
 
 #ifndef FIO_PREFERRED_CLOCK_SOURCE
diff --git a/stat.c b/stat.c
index c5a68ad..5c75868 100644
--- a/stat.c
+++ b/stat.c
@@ -139,7 +139,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 				   fio_fp64_t *plist, unsigned long long **output,
 				   unsigned long long *maxv, unsigned long long *minv)
 {
-	unsigned long sum = 0;
+	unsigned long long sum = 0;
 	unsigned int len, i, j = 0;
 	unsigned int oval_len = 0;
 	unsigned long long *ovals = NULL;
diff --git a/tools/fio_generate_plots b/tools/fio_generate_plots
index a47bfa5..8872206 100755
--- a/tools/fio_generate_plots
+++ b/tools/fio_generate_plots
@@ -93,20 +93,26 @@ plot () {
 
     i=0
     
-    for x in *_"$FILETYPE".log
+    for x in *_"$FILETYPE".log *_"$FILETYPE".*.log
     do
-        i=$((i+1))
-        PT=$(echo $x | sed s/_"$FILETYPE".log//g)
-        if [ ! -z "$PLOT_LINE" ]
-        then
-            PLOT_LINE=$PLOT_LINE", "
+        if [ -e "$x" ]; then
+            i=$((i+1))
+            PT=$(echo $x | sed 's/\(.*\)_'$FILETYPE'\(.*\).log$/\1\2/')
+            if [ ! -z "$PLOT_LINE" ]
+            then
+                PLOT_LINE=$PLOT_LINE", "
+            fi
+
+            DEPTH=$(echo $PT | cut -d "-" -f 4)
+            PLOT_LINE=$PLOT_LINE"'$x' using (\$1/1000):(\$2/$SCALE) title \"Queue depth $DEPTH\" with lines ls $i" 
         fi
-
-        DEPTH=$(echo $PT | cut -d "-" -f 4)
-	    PLOT_LINE=$PLOT_LINE"'$x' using (\$1/1000):(\$2/$SCALE) title \"Queue depth $DEPTH\" with lines ls $i" 
-        
     done
 
+    if [ $i -eq 0 ]; then
+       echo "No log files found"
+       exit 1
+    fi
+
     OUTPUT="set output \"$TITLE-$FILETYPE.svg\" "
 
     echo " $PLOT_TITLE ; $YAXIS ; $DEFAULT_OPTS ; show style lines ; $OUTPUT ; plot "  $PLOT_LINE  | $GNUPLOT -
diff --git a/verify.c b/verify.c
index 1f177d7..db6e17e 100644
--- a/verify.c
+++ b/verify.c
@@ -252,7 +252,7 @@ static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 
 	memset(fname, 0, sizeof(fname));
 	if (aux_path)
-		sprintf(fname, "%s%s", aux_path, FIO_OS_PATH_SEPARATOR);
+		sprintf(fname, "%s%c", aux_path, FIO_OS_PATH_SEPARATOR);
 
 	strncpy(fname + strlen(fname), basename(ptr), buf_left - 1);
 
@@ -1726,7 +1726,7 @@ void verify_save_state(int mask)
 		char prefix[PATH_MAX];
 
 		if (aux_path)
-			sprintf(prefix, "%s%slocal", aux_path, FIO_OS_PATH_SEPARATOR);
+			sprintf(prefix, "%s%clocal", aux_path, FIO_OS_PATH_SEPARATOR);
 		else
 			strcpy(prefix, "local");
 

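A minimal sketch of the parent-directory walk that `create_work_dirs()` performs in the filesetup.c hunk above: cut the path at each separator, `mkdir()` the prefix, ignore `EEXIST`, then restore the separator. It assumes a POSIX system with the two-argument `mkdir()` (the `CONFIG_HAVE_MKDIR_TWO` case) and `'/'` as the separator; `make_parent_dirs()` is a hypothetical name, not a fio function. Unlike the patch, it skips a leading `'/'` instead of stopping there, and it uses mode 0755 rather than 0600, since a directory without the search (execute) bit cannot be traversed by a non-root user to create the next level.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of fio: create every directory named in
 * 'fname' before its last '/', like "mkdir -p $(dirname fname)".
 * Returns 0 on success, or -1 with errno set.
 */
int make_parent_dirs(const char *fname)
{
	char path[4096];
	char *end;

	if (snprintf(path, sizeof(path), "%s", fname) >= (int)sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Skip a leading '/' so an absolute path never calls mkdir(""). */
	end = path;
	if (*end == '/')
		end++;

	while ((end = strchr(end, '/')) != NULL) {
		*end = '\0';		/* cut the path at this separator */
		if (mkdir(path, 0755) && errno != EEXIST)
			return -1;	/* errno still describes the failure */
		*end = '/';		/* restore it and look past it */
		end++;
	}
	return 0;
}
```

For example, `make_parent_dirs("a/b/c/job.0")` creates `a`, `a/b` and `a/b/c` relative to the current directory and leaves creating the file itself to the caller.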

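The fio.h hunk in that patch numbers the `TD_F_*` bits automatically and derives each mask from its index, while libfio.c adds `compiletime_assert(__TD_F_LAST <= TD_ENG_FLAG_SHIFT, ...)` because `td_set_ioengine_flags()` packs the engine flags into the bits from `TD_ENG_FLAG_SHIFT` (16) upward of the same `td->flags` word. A small sketch of that layout using C11 `_Static_assert`; every `EX_*` name and both helpers are hypothetical stand-ins, not fio's definitions (they also avoid fio's leading double underscore, which standard C reserves).

```c
#include <stdint.h>

/* Bit indices are assigned in order; a new flag is one more line before EX_FB_LAST. */
enum {
	EX_FB_VERIFY = 0,
	EX_FB_COMPRESS,
	EX_FB_DIRS_CREATED,
	EX_FB_LAST,		/* not a real bit, keep last */
};

/* Masks are derived from the indices, so they can never collide. */
enum {
	EX_F_VERIFY		= 1U << EX_FB_VERIFY,
	EX_F_COMPRESS		= 1U << EX_FB_COMPRESS,
	EX_F_DIRS_CREATED	= 1U << EX_FB_DIRS_CREATED,
};

/* Engine flags live in the upper 16 bits of the same 32-bit word. */
#define EX_ENG_FLAG_SHIFT	16
#define EX_ENG_FLAG_MASK	((1U << 16) - 1)

/* Compilation fails if the per-thread flags grow into the engine bits. */
_Static_assert(EX_FB_LAST <= EX_ENG_FLAG_SHIFT, "thread flags overlap engine flags");

static inline uint32_t set_engine_flags(uint32_t flags, uint32_t eng)
{
	/* Clear the old engine bits, keep the low thread bits, OR in the new ones. */
	return (flags & ~(EX_ENG_FLAG_MASK << EX_ENG_FLAG_SHIFT)) |
	       ((eng & EX_ENG_FLAG_MASK) << EX_ENG_FLAG_SHIFT);
}

static inline uint32_t get_engine_flags(uint32_t flags)
{
	return (flags >> EX_ENG_FLAG_SHIFT) & EX_ENG_FLAG_MASK;
}
```

Deriving the masks from an index enum means the assertion only has to check a single number, the count of thread flags, against the shift.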
* Recent changes (master)
@ 2017-10-11 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-10-11 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b18775f7b7c6c7d0a4d9b0a38e2a979e4180d14e:

  Update file creation example (2017-10-09 14:42:45 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8847ae4cd2e3d0d73dd7d7c93c5d6da96b71d174:

  backend: don't dereference ->io_ops in reap_threads() (2017-10-10 11:54:54 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'pr/note-for-shmmax' of https://github.com/taghos/fio
      HOWTO: include note about increasing shared memory limits
      engines/filecreate: set data direction for stats
      blktrace: use for_each_file()

Justin Eno (1):
      backend: don't dereference ->io_ops in reap_threads()

Ricardo Nabinger Sanchez (1):
      Add note for increasing shmmax if necessary

 HOWTO                |  3 +++
 backend.c            |  6 +-----
 blktrace.c           |  4 +---
 engines/filecreate.c | 31 ++++++++++++++++++++++++++++++-
 fio.1                |  2 ++
 5 files changed, 37 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index df79e2d..d3f957b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -217,6 +217,9 @@ Command line options
 .. option:: --max-jobs=nr
 
 	Set the maximum number of threads/processes to support to `nr`.
+	NOTE: On Linux, it may be necessary to increase the shared-memory
+	limit ('/proc/sys/kernel/shmmax') if fio runs into errors while
+	creating jobs.
 
 .. option:: --server=args
 
diff --git a/backend.c b/backend.c
index ba6f585..d98e5fe 100644
--- a/backend.c
+++ b/backend.c
@@ -1929,11 +1929,7 @@ static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
 	for_each_td(td, i) {
 		int flags = 0;
 
-		/*
-		 * ->io_ops is NULL for a thread that has closed its
-		 * io engine
-		 */
-		if (td->io_ops && !strcmp(td->io_ops->name, "cpuio"))
+		 if (!strcmp(td->o.ioengine, "cpuio"))
 			cputhreads++;
 		else
 			realthreads++;
diff --git a/blktrace.c b/blktrace.c
index 65b600f..4b791d7 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -500,10 +500,8 @@ int load_blktrace(struct thread_data *td, const char *filename, int need_swap)
 		handle_trace(td, &t, ios, rw_bs);
 	} while (1);
 
-	for (i = 0; i < td->files_index; i++) {
-		f = td->files[i];
+	for_each_file(td, f, i)
 		trace_add_open_close_event(td, f->fileno, FIO_LOG_CLOSE_FILE);
-	}
 
 	fifo_free(fifo);
 	close(fd);
diff --git a/engines/filecreate.c b/engines/filecreate.c
index c6b6597..0c3bcdd 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -12,6 +12,10 @@
 #include "../fio.h"
 #include "../filehash.h"
 
+struct fc_data {
+	enum fio_ddir stat_ddir;
+};
+
 static int open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct timespec start;
@@ -43,10 +47,11 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 	}
 
 	if (do_lat) {
+		struct fc_data *data = td->io_ops_data;
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, DDIR_READ, nsec, 0, 0);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0);
 	}
 
 	return 0;
@@ -68,9 +73,33 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
+static int init(struct thread_data *td)
+{
+	struct fc_data *data;
+
+	data = calloc(1, sizeof(*data));
+
+	if (td_read(td))
+		data->stat_ddir = DDIR_READ;
+	else if (td_write(td))
+		data->stat_ddir = DDIR_WRITE;
+
+	td->io_ops_data = data;
+	return 0;
+}
+
+static void cleanup(struct thread_data *td)
+{
+	struct fc_data *data = td->io_ops_data;
+
+	free(data);
+}
+
 static struct ioengine_ops ioengine = {
 	.name		= "filecreate",
 	.version	= FIO_IOOPS_VERSION,
+	.init		= init,
+	.cleanup	= cleanup,
 	.queue		= queue_io,
 	.get_file_size	= get_file_size,
 	.open_file	= open_file,
diff --git a/fio.1 b/fio.1
index 68ed3ba..6e7d1f8 100644
--- a/fio.1
+++ b/fio.1
@@ -113,6 +113,8 @@ All fio parser warnings are fatal, causing fio to exit with an error.
 .TP
 .BI \-\-max\-jobs \fR=\fPnr
 Set the maximum number of threads/processes to support to \fInr\fR.
+NOTE: On Linux, it may be necessary to increase the shared-memory limit
+(`/proc/sys/kernel/shmmax') if fio runs into errors while creating jobs.
 .TP
 .BI \-\-server \fR=\fPargs
 Start a backend server, with \fIargs\fR specifying what to listen to.

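The filecreate change keeps per-thread engine state in `td->io_ops_data`: `init()` allocates it and picks the data direction once, `open_file()` reads it when recording the latency sample, and `cleanup()` frees it. The `init()` in the patch does not check whether `calloc()` succeeded; this sketch shows the same pattern with that check added. The `ex_*` types and functions are hypothetical stand-ins for fio's `thread_data`, `td_read()`/`td_write()` and the engine hooks.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-ins for fio's types; not the real definitions. */
enum ex_ddir { EX_DDIR_READ = 0, EX_DDIR_WRITE = 1 };

struct ex_thread {
	bool is_read;		/* stands in for td_read(td) */
	bool is_write;		/* stands in for td_write(td) */
	void *io_ops_data;	/* engine-private pointer, like td->io_ops_data */
};

struct ex_fc_data {
	enum ex_ddir stat_ddir;	/* direction used when recording latency */
};

/* Returns 0 on success, nonzero if the private data cannot be allocated. */
int ex_fc_init(struct ex_thread *td)
{
	struct ex_fc_data *data = calloc(1, sizeof(*data));

	if (!data)
		return 1;	/* report failure instead of dereferencing NULL */

	/* calloc() zeroes the struct, so the fallback is EX_DDIR_READ. */
	if (td->is_read)
		data->stat_ddir = EX_DDIR_READ;
	else if (td->is_write)
		data->stat_ddir = EX_DDIR_WRITE;

	td->io_ops_data = data;
	return 0;
}

void ex_fc_cleanup(struct ex_thread *td)
{
	free(td->io_ops_data);	/* free(NULL) is a no-op */
	td->io_ops_data = NULL;
}
```

A write-only job ends up with `EX_DDIR_WRITE`; a job that reads, including a mixed read/write one, keeps `EX_DDIR_READ`.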

* Recent changes (master)
@ 2017-10-10 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-10-10 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f7c9bfd57232c6e11623d741be340d32f796c726:

  backend: don't complain about no IO done for create_only=1 (2017-10-06 11:41:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b18775f7b7c6c7d0a4d9b0a38e2a979e4180d14e:

  Update file creation example (2017-10-09 14:42:45 -0600)

----------------------------------------------------------------
Jens Axboe (7):
      engines/filecreate: a few fixes
      engines/filecreate: don't use file hash
      engines/filecreate: set FIO_NOSTATS flag
      filesetup: don't track file allocation for jobs == 1
      time: add ntime_since_now()
      engine/filecreate: use clat and reads for stats
      Update file creation example

Josef Bacik (3):
      add a filecreate engine
      add FIO_FILENOHASH ioengine flag
      add an filecreate example file to examples/

Vincent Fu (2):
      stat: update description of clat accounting in stat.h
      stat: update json+ output format for --lat_percentiles option

 HOWTO                            |  4 ++
 Makefile                         |  2 +-
 engines/filecreate.c             | 90 ++++++++++++++++++++++++++++++++++++++++
 examples/filecreate-ioengine.fio | 35 ++++++++++++++++
 filesetup.c                      | 29 ++++++++++---
 fio.1                            |  4 ++
 fio_time.h                       |  1 +
 gettime.c                        |  8 ++++
 io_u.c                           |  2 +-
 ioengines.h                      |  2 +
 options.c                        |  4 ++
 stat.c                           |  9 +++-
 stat.h                           | 23 +++++-----
 tools/fio_jsonplus_clat2csv      | 12 +++++-
 14 files changed, 203 insertions(+), 22 deletions(-)
 create mode 100644 engines/filecreate.c
 create mode 100644 examples/filecreate-ioengine.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 8fad2ce..df79e2d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1797,6 +1797,10 @@ I/O engine
 			absolute or relative. See :file:`engines/skeleton_external.c` for
 			details of writing an external I/O engine.
 
+		**filecreate**
+			Simply create the files and do no IO to them.  You still need to
+			set `filesize` so that all the accounting still occurs, but no
+			actual IO will be done other than creating the file.
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Makefile b/Makefile
index 3764da5..76243ff 100644
--- a/Makefile
+++ b/Makefile
@@ -42,7 +42,7 @@ SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		eta.c verify.c memory.c io_u.c parse.c mutex.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
-		engines/ftruncate.c \
+		engines/ftruncate.c engines/filecreate.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
diff --git a/engines/filecreate.c b/engines/filecreate.c
new file mode 100644
index 0000000..c6b6597
--- /dev/null
+++ b/engines/filecreate.c
@@ -0,0 +1,90 @@
+/*
+ * filecreate engine
+ *
+ * IO engine that doesn't do any IO, just creates files and tracks the latency
+ * of the file creation.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+
+#include "../fio.h"
+#include "../filehash.h"
+
+static int open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct timespec start;
+	int do_lat = !td->o.disable_lat;
+
+	dprint(FD_FILE, "fd open %s\n", f->file_name);
+
+	if (f->filetype != FIO_TYPE_FILE) {
+		log_err("fio: only files are supported\n");
+		return 1;
+	}
+	if (!strcmp(f->file_name, "-")) {
+		log_err("fio: can't read/write to stdin/out\n");
+		return 1;
+	}
+
+	if (do_lat)
+		fio_gettime(&start, NULL);
+
+	f->fd = open(f->file_name, O_CREAT|O_RDWR, 0600);
+
+	if (f->fd == -1) {
+		char buf[FIO_VERROR_SIZE];
+		int e = errno;
+
+		snprintf(buf, sizeof(buf), "open(%s)", f->file_name);
+		td_verror(td, e, buf);
+		return 1;
+	}
+
+	if (do_lat) {
+		uint64_t nsec;
+
+		nsec = ntime_since_now(&start);
+		add_clat_sample(td, DDIR_READ, nsec, 0, 0);
+	}
+
+	return 0;
+}
+
+static int queue_io(struct thread_data *td, struct io_u fio_unused *io_u)
+{
+	return FIO_Q_COMPLETED;
+}
+
+/*
+ * Ensure that we at least have a block size worth of IO to do for each
+ * file. If the job file has td->o.size < nr_files * block_size, then
+ * fio won't do anything.
+ */
+static int get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	f->real_file_size = td_min_bs(td);
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "filecreate",
+	.version	= FIO_IOOPS_VERSION,
+	.queue		= queue_io,
+	.get_file_size	= get_file_size,
+	.open_file	= open_file,
+	.close_file	= generic_close_file,
+	.flags		= FIO_DISKLESSIO | FIO_SYNCIO | FIO_FAKEIO |
+				FIO_NOSTATS | FIO_NOFILEHASH,
+};
+
+static void fio_init fio_filecreate_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_filecreate_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/filecreate-ioengine.fio b/examples/filecreate-ioengine.fio
new file mode 100644
index 0000000..ec7caad
--- /dev/null
+++ b/examples/filecreate-ioengine.fio
@@ -0,0 +1,35 @@
+# Example filecreate job
+#
+# create_on_open is needed so that the open happens during the run and not the
+# setup.
+#
+# openfiles needs to be set so that you do not exceed the maximum allowed open
+# files.
+#
+# filesize needs to be set to a non-zero value so fio will actually run, but the
+# IO will not really be done and the write latency numbers will only reflect the
+# open times.
+[global]
+create_on_open=1
+nrfiles=31250
+ioengine=filecreate
+fallocate=none
+filesize=4k
+openfiles=1
+
+[t0]
+[t1]
+[t2]
+[t3]
+[t4]
+[t5]
+[t6]
+[t7]
+[t8]
+[t9]
+[t10]
+[t11]
+[t12]
+[t13]
+[t14]
+[t15]
diff --git a/filesetup.c b/filesetup.c
index 891a55a..0631a01 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1342,6 +1342,7 @@ void close_and_free_files(struct thread_data *td)
 {
 	struct fio_file *f;
 	unsigned int i;
+	bool use_free = td_ioengine_flagged(td, FIO_NOFILEHASH);
 
 	dprint(FD_FILE, "close files\n");
 
@@ -1361,13 +1362,19 @@ void close_and_free_files(struct thread_data *td)
 			td_io_unlink_file(td, f);
 		}
 
-		sfree(f->file_name);
+		if (use_free)
+			free(f->file_name);
+		else
+			sfree(f->file_name);
 		f->file_name = NULL;
 		if (fio_file_axmap(f)) {
 			axmap_free(f->io_axmap);
 			f->io_axmap = NULL;
 		}
-		sfree(f);
+		if (use_free)
+			free(f);
+		else
+			sfree(f);
 	}
 
 	td->o.filename = NULL;
@@ -1481,7 +1488,10 @@ static struct fio_file *alloc_new_file(struct thread_data *td)
 {
 	struct fio_file *f;
 
-	f = smalloc(sizeof(*f));
+	if (td_ioengine_flagged(td, FIO_NOFILEHASH))
+		f = calloc(1, sizeof(*f));
+	else
+		f = smalloc(sizeof(*f));
 	if (!f) {
 		assert(0);
 		return NULL;
@@ -1564,7 +1574,10 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	if (td->io_ops && td_ioengine_flagged(td, FIO_DISKLESSIO))
 		f->real_file_size = -1ULL;
 
-	f->file_name = smalloc_strdup(file_name);
+	if (td_ioengine_flagged(td, FIO_NOFILEHASH))
+		f->file_name = strdup(file_name);
+	else
+		f->file_name = smalloc_strdup(file_name);
 	if (!f->file_name)
 		assert(0);
 
@@ -1588,7 +1601,8 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	if (f->filetype == FIO_TYPE_FILE)
 		td->nr_normal_files++;
 
-	set_already_allocated(file_name);
+	if (td->o.numjobs > 1)
+		set_already_allocated(file_name);
 
 	if (inc)
 		td->o.nr_files++;
@@ -1768,7 +1782,10 @@ void dup_files(struct thread_data *td, struct thread_data *org)
 		__f = alloc_new_file(td);
 
 		if (f->file_name) {
-			__f->file_name = smalloc_strdup(f->file_name);
+			if (td_ioengine_flagged(td, FIO_NOFILEHASH))
+				__f->file_name = strdup(f->file_name);
+			else
+				__f->file_name = smalloc_strdup(f->file_name);
 			if (!__f->file_name)
 				assert(0);
 
diff --git a/fio.1 b/fio.1
index b943db2..68ed3ba 100644
--- a/fio.1
+++ b/fio.1
@@ -1577,6 +1577,10 @@ the engine filename, e.g. `ioengine=external:/tmp/foo.o' to load
 ioengine `foo.o' in `/tmp'. The path can be either
 absolute or relative. See `engines/skeleton_external.c' in the fio source for
 details of writing an external I/O engine.
+.TP
+.B filecreate
+Create empty files only.  \fBfilesize\fR still needs to be specified so that fio
+will run and grab latency results, but no IO will actually be done on the files.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
diff --git a/fio_time.h b/fio_time.h
index f4eac79..c7c3dbb 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -5,6 +5,7 @@
 
 struct thread_data;
 extern uint64_t ntime_since(const struct timespec *, const struct timespec *);
+extern uint64_t ntime_since_now(const struct timespec *);
 extern uint64_t utime_since(const struct timespec *, const struct timespec *);
 extern uint64_t utime_since_now(const struct timespec *);
 extern uint64_t mtime_since(const struct timespec *, const struct timespec *);
diff --git a/gettime.c b/gettime.c
index 3dcaaf6..7945528 100644
--- a/gettime.c
+++ b/gettime.c
@@ -448,6 +448,14 @@ uint64_t ntime_since(const struct timespec *s, const struct timespec *e)
        return nsec + (sec * 1000000000LL);
 }
 
+uint64_t ntime_since_now(const struct timespec *s)
+{
+	struct timespec now;
+
+	fio_gettime(&now, NULL);
+	return ntime_since(s, &now);
+}
+
 uint64_t utime_since(const struct timespec *s, const struct timespec *e)
 {
 	int64_t sec, usec;
diff --git a/io_u.c b/io_u.c
index 58c2320..fb4180a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1779,7 +1779,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	if (td->parent)
 		td = td->parent;
 
-	if (!td->o.stats)
+	if (!td->o.stats || td_ioengine_flagged(td, FIO_NOSTATS))
 		return;
 
 	if (no_reduce)
diff --git a/ioengines.h b/ioengines.h
index 177cbc0..32b18ed 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -59,6 +59,8 @@ enum fio_ioengine_flags {
 	FIO_MEMALIGN	= 1 << 9,	/* engine wants aligned memory */
 	FIO_BIT_BASED	= 1 << 10,	/* engine uses a bit base (e.g. uses Kbit as opposed to KB) */
 	FIO_FAKEIO	= 1 << 11,	/* engine pretends to do IO */
+	FIO_NOSTATS	= 1 << 12,	/* don't do IO stats */
+	FIO_NOFILEHASH	= 1 << 13,	/* doesn't hash the files for lookup later. */
 };
 
 /*
diff --git a/options.c b/options.c
index 5c1abe9..ddcc4e5 100644
--- a/options.c
+++ b/options.c
@@ -1843,6 +1843,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "DAX Device based IO engine",
 			  },
 #endif
+			  {
+			    .ival = "filecreate",
+			    .help = "File creation engine",
+			  },
 			  { .ival = "external",
 			    .help = "Load external engine (append name)",
 			    .cb = str_ioengine_external_cb,
diff --git a/stat.c b/stat.c
index 09afa5b..c5a68ad 100644
--- a/stat.c
+++ b/stat.c
@@ -962,7 +962,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	unsigned int len;
 	int i;
 	const char *ddirname[] = {"read", "write", "trim"};
-	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object;
+	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object = NULL;
 	char buf[120];
 	double p_of_agg = 100.0;
 
@@ -1036,7 +1036,9 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 	if (output_format & FIO_OUTPUT_JSON_PLUS) {
 		clat_bins_object = json_create_object();
-		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+		if (ts->clat_percentiles)
+			json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+
 		for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
 			if (ts->io_u_plat[ddir][i]) {
 				snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
@@ -1055,6 +1057,9 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_int(tmp_object, "max", max);
 	json_object_add_value_float(tmp_object, "mean", mean);
 	json_object_add_value_float(tmp_object, "stddev", dev);
+	if (output_format & FIO_OUTPUT_JSON_PLUS && ts->lat_percentiles)
+		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
+
 	if (ovals)
 		free(ovals);
 
diff --git a/stat.h b/stat.h
index 848331b..3fda084 100644
--- a/stat.h
+++ b/stat.h
@@ -24,6 +24,16 @@ struct group_run_stats {
 #define FIO_IO_U_LAT_M_NR 12
 
 /*
+ * Constants for clat percentiles
+ */
+#define FIO_IO_U_PLAT_BITS 6
+#define FIO_IO_U_PLAT_VAL (1 << FIO_IO_U_PLAT_BITS)
+#define FIO_IO_U_PLAT_GROUP_NR 29
+#define FIO_IO_U_PLAT_NR (FIO_IO_U_PLAT_GROUP_NR * FIO_IO_U_PLAT_VAL)
+#define FIO_IO_U_LIST_MAX_LEN 20 /* The size of the default and user-specified
+					list of percentiles */
+
+/*
  * Aggregate clat samples to report percentile(s) of them.
  *
  * EXECUTIVE SUMMARY
@@ -34,7 +44,7 @@ struct group_run_stats {
  *
  * FIO_IO_U_PLAT_GROUP_NR and FIO_IO_U_PLAT_BITS determine the maximum
  * range being tracked for latency samples. The maximum value tracked
- * accurately will be 2^(GROUP_NR + PLAT_BITS -1) microseconds.
+ * accurately will be 2^(GROUP_NR + PLAT_BITS - 1) nanoseconds.
  *
  * FIO_IO_U_PLAT_GROUP_NR and FIO_IO_U_PLAT_BITS determine the memory
  * requirement of storing those aggregate counts. The memory used will
@@ -98,22 +108,15 @@ struct group_run_stats {
  *	3	8	2		[256,511]		64
  *	4	9	3		[512,1023]		64
  *	...	...	...		[...,...]		...
- *	18	23	17		[8838608,+inf]**	64
+ *	28	33	27		[8589934592,+inf]**	64
  *
  *  * Special cases: when n < (M-1) or when n == (M-1), in both cases,
  *    the value cannot be rounded off. Use all bits of the sample as
  *    index.
  *
- *  ** If a sample's MSB is greater than 23, it will be counted as 23.
+ *  ** If a sample's MSB is greater than 33, it will be counted as 33.
  */
 
-#define FIO_IO_U_PLAT_BITS 6
-#define FIO_IO_U_PLAT_VAL (1 << FIO_IO_U_PLAT_BITS)
-#define FIO_IO_U_PLAT_GROUP_NR 29
-#define FIO_IO_U_PLAT_NR (FIO_IO_U_PLAT_GROUP_NR * FIO_IO_U_PLAT_VAL)
-#define FIO_IO_U_LIST_MAX_LEN 20 /* The size of the default and user-specified
-					list of percentiles */
-
 /*
  * Trim cycle count measurements
  */
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
index d4ac16e..64fdc9f 100755
--- a/tools/fio_jsonplus_clat2csv
+++ b/tools/fio_jsonplus_clat2csv
@@ -107,8 +107,16 @@ def main():
 
         prev_ddir = None
         for ddir in ddir_set:
+            if 'bins' in jsondata['jobs'][jobnum][ddir]['clat_ns']:
+                bins_loc = 'clat_ns'
+            elif 'bins' in jsondata['jobs'][jobnum][ddir]['lat_ns']:
+                bins_loc = 'lat_ns'
+            else:
+                raise RuntimeError("Latency bins not found. "
+                                   "Are you sure you are using json+ output?")
+
             bins[ddir] = [[int(key), value] for key, value in
-                          jsondata['jobs'][jobnum][ddir]['clat_ns']
+                          jsondata['jobs'][jobnum][ddir][bins_loc]
                           ['bins'].iteritems()]
             bins[ddir] = sorted(bins[ddir], key=lambda bin: bin[0])
 
@@ -123,7 +131,7 @@ def main():
         outfile = stub + '_job' + str(jobnum) + ext
 
         with open(outfile, 'w') as output:
-            output.write("clat_nsec, ")
+            output.write("{0}ec, ".format(bins_loc))
             ddir_list = list(ddir_set)
             for ddir in ddir_list:
                 output.write("{0}_count, {0}_cumulative, {0}_percentile, ".

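The stat.h comment in that patch describes how latency samples (now in nanoseconds) are folded into `FIO_IO_U_PLAT_NR` = 29 × 64 = 1856 buckets: values whose most significant bit is at position 6 or lower get one bucket each, and every larger power-of-two range is split into 64 equal buckets by keeping only the most significant bit and the `FIO_IO_U_PLAT_BITS` bits below it. A sketch of that mapping written from the comment's table; `lat_to_bucket()` is a hypothetical name, and `__builtin_clzll` assumes GCC or Clang.

```c
#include <stdint.h>

/* Constants as defined in the stat.h hunk. */
#define FIO_IO_U_PLAT_BITS 6
#define FIO_IO_U_PLAT_VAL (1 << FIO_IO_U_PLAT_BITS)
#define FIO_IO_U_PLAT_GROUP_NR 29
#define FIO_IO_U_PLAT_NR (FIO_IO_U_PLAT_GROUP_NR * FIO_IO_U_PLAT_VAL)

/* Map a latency sample (ns) to its histogram bucket index. */
unsigned int lat_to_bucket(uint64_t val)
{
	unsigned int msb, error_bits, base, offset, idx;

	/* Position of the most significant set bit (0 for val == 0). */
	msb = val ? 63 - (unsigned int)__builtin_clzll(val) : 0;

	/* Groups 0 and 1 ([0,127]) cannot be rounded: one bucket per value. */
	if (msb <= FIO_IO_U_PLAT_BITS)
		return (unsigned int)val;

	/* Number of low bits discarded for this power-of-two range. */
	error_bits = msb - FIO_IO_U_PLAT_BITS;

	/* Group number is error_bits + 1; each earlier group holds 64 buckets. */
	base = (error_bits + 1) << FIO_IO_U_PLAT_BITS;

	/* Position inside the group: the PLAT_BITS bits just below the MSB. */
	offset = (unsigned int)((val >> error_bits) & (FIO_IO_U_PLAT_VAL - 1));

	/* Anything past the last group is clamped into the final bucket. */
	idx = base + offset;
	return idx < FIO_IO_U_PLAT_NR - 1 ? idx : FIO_IO_U_PLAT_NR - 1;
}
```

For instance, 254 and 255 share bucket 191 because the lowest bit is discarded in [128, 255], and everything from 2^33 ns (about 8.6 s) upward lands in the last group, with samples of 2^34 ns or more clamped into bucket 1855.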

* Recent changes (master)
@ 2017-10-07 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-10-07 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit e6fe02651641fc64d2fa4fcfe9b1013b2947d11b:

  Merge branch 'master' of https://github.com/dyniusz/fio (2017-10-03 11:19:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f7c9bfd57232c6e11623d741be340d32f796c726:

  backend: don't complain about no IO done for create_only=1 (2017-10-06 11:41:47 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      backend: don't complain about no IO done for create_only=1

 backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index b1995ef..ba6f585 100644
--- a/backend.c
+++ b/backend.c
@@ -1833,7 +1833,7 @@ static void *thread_main(void *data)
 	 * (Are we not missing other flags that can be ignored ?)
 	 */
 	if ((td->o.size || td->o.io_size) && !ddir_rw_sum(bytes_done) &&
-	    !did_some_io &&
+	    !did_some_io && !td->o.create_only &&
 	    !(td_ioengine_flagged(td, FIO_NOIO) ||
 	      td_ioengine_flagged(td, FIO_DISKLESSIO)))
 		log_err("%s: No I/O performed by %s, "
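
The condition patched above decides when fio logs "No I/O performed". The sketch below restates it as a standalone predicate. `struct job_state` and `should_warn_no_io()` are hypothetical names and are not part of fio's API. Their fields stand in for the `td->o.size`, `td->o.io_size`, `bytes_done`, `did_some_io`, `td->o.create_only` and engine-flag checks in `thread_main()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread_data fields the check reads. */
struct job_state {
	uint64_t size;		/* td->o.size */
	uint64_t io_size;	/* td->o.io_size */
	uint64_t bytes_done;	/* ddir_rw_sum(bytes_done) after the loop */
	bool did_some_io;	/* latched true if any loop pass did I/O */
	bool create_only;	/* td->o.create_only */
	bool engine_noio;	/* FIO_NOIO or FIO_DISKLESSIO engine flag */
};

/*
 * Warn only if I/O was requested, none was done in any pass, the job
 * was not create_only, and the engine is expected to perform I/O.
 */
static bool should_warn_no_io(const struct job_state *s)
{
	return (s->size || s->io_size) && !s->bytes_done &&
	       !s->did_some_io && !s->create_only && !s->engine_noio;
}
```

With `create_only=1` fio only runs the setup phase (laying out files), so a job that performs no I/O is expected. The extra `!create_only` term keeps that case from being reported as an error.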

* Recent changes (master)
@ 2017-10-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-10-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c16035aadd600a3a4c4b241339e3d3099f56c4b2:

  backend: fix a case where we complain about no IO being done (2017-09-28 06:58:23 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e6fe02651641fc64d2fa4fcfe9b1013b2947d11b:

  Merge branch 'master' of https://github.com/dyniusz/fio (2017-10-03 11:19:26 -0600)

----------------------------------------------------------------
Erwan Velu (7):
      io_u: Converting usec from long to uint64_t
      backend: Removing double definition of the same variable
      backend: Removing memory leak in run_threads()
      oslib/libmtd: Removing useless err assigment
      t/gen-rand: Avoid memleak of buckets()
      client:  Avoid memory leak in fio_client_handle_iolog()
      client: Fixing invalid use after free()

Jens Axboe (3):
      Merge branch 'evelu/cleanup' of https://github.com/ErwanAliasr1/fio
      client: fix pointer vs uint8_t comparison
      Merge branch 'master' of https://github.com/dyniusz/fio

dyniusz (2):
      Adjustments to support C++ engines
      null context segfault fix

 backend.c             |   2 +-
 client.c              |  37 +++++++++---
 engines/null.c        | 164 +++++++++++++++++++++++++++++++++++++++++++-------
 examples/cpp_null.fio |  10 +++
 file.h                |   6 ++
 io_u.c                |   2 +-
 ioengines.c           |   5 +-
 lib/types.h           |   2 +-
 oslib/libmtd.c        |   1 -
 t/gen-rand.c          |   2 +-
 10 files changed, 192 insertions(+), 39 deletions(-)
 create mode 100644 examples/cpp_null.fio

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index e4eff37..b1995ef 100644
--- a/backend.c
+++ b/backend.c
@@ -499,7 +499,6 @@ int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 			if (ddir_rw(io_u->ddir))
 				td->ts.short_io_u[io_u->ddir]++;
 
-			f = io_u->file;
 			if (io_u->offset == f->real_file_size)
 				goto sync_done;
 
@@ -2347,6 +2346,7 @@ reap:
 				fio_terminate_threads(TERMINATE_ALL);
 				fio_abort = 1;
 				nr_started--;
+				free(fd);
 				break;
 			}
 			dprint(FD_MUTEX, "done waiting on startup_mutex\n");
diff --git a/client.c b/client.c
index 09e810a..779fb9d 100644
--- a/client.c
+++ b/client.c
@@ -1312,14 +1312,16 @@ static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *sample
 static int fio_client_handle_iolog(struct fio_client *client,
 				   struct fio_net_cmd *cmd)
 {
-	struct cmd_iolog_pdu *pdu;
+	struct cmd_iolog_pdu *pdu = NULL;
 	bool store_direct;
-	char *log_pathname;
+	char *log_pathname = NULL;
+	int ret = 0;
 
 	pdu = convert_iolog(cmd, &store_direct);
 	if (!pdu) {
 		log_err("fio: failed converting IO log\n");
-		return 1;
+		ret = 1;
+		goto out;
 	}
 
         /* allocate buffer big enough for next sprintf() call */
@@ -1327,7 +1329,8 @@ static int fio_client_handle_iolog(struct fio_client *client,
 			strlen(client->hostname));
 	if (!log_pathname) {
 		log_err("fio: memory allocation of unique pathname failed\n");
-		return -1;
+		ret = -1;
+		goto out;
 	}
 	/* generate a unique pathname for the log file using hostname */
 	sprintf(log_pathname, "%s.%s", pdu->name, client->hostname);
@@ -1342,7 +1345,8 @@ static int fio_client_handle_iolog(struct fio_client *client,
 		if (fd < 0) {
 			log_err("fio: open log %s: %s\n",
 				log_pathname, strerror(errno));
-			return 1;
+			ret = 1;
+			goto out;
 		}
 
 		sz = cmd->pdu_len - sizeof(*pdu);
@@ -1351,17 +1355,19 @@ static int fio_client_handle_iolog(struct fio_client *client,
 
 		if (ret != sz) {
 			log_err("fio: short write on compressed log\n");
-			return 1;
+			ret = 1;
+			goto out;
 		}
 
-		return 0;
+		ret = 0;
 	} else {
 		FILE *f;
 		f = fopen((const char *) log_pathname, "w");
 		if (!f) {
 			log_err("fio: fopen log %s : %s\n",
 				log_pathname, strerror(errno));
-			return 1;
+			ret = 1;
+			goto out;
 		}
 
 		if (pdu->log_type == IO_LOG_TYPE_HIST) {
@@ -1372,8 +1378,17 @@ static int fio_client_handle_iolog(struct fio_client *client,
 					pdu->nr_samples * sizeof(struct io_sample));
 		}
 		fclose(f);
-		return 0;
+		ret = 0;
 	}
+
+out:
+	if (pdu && pdu != (void *) cmd->payload)
+		free(pdu);
+
+	if (log_pathname)
+		free(log_pathname);
+
+	return ret;
 }
 
 static void handle_probe(struct fio_client *client, struct fio_net_cmd *cmd)
@@ -1849,10 +1864,12 @@ static void request_client_etas(struct client_ops *ops)
 static int handle_cmd_timeout(struct fio_client *client,
 			      struct fio_net_cmd_reply *reply)
 {
+	uint16_t reply_opcode = reply->opcode;
+
 	flist_del(&reply->list);
 	free(reply);
 
-	if (reply->opcode != FIO_NET_CMD_SEND_ETA)
+	if (reply_opcode != FIO_NET_CMD_SEND_ETA)
 		return 1;
 
 	log_info("client <%s>: timeout on SEND_ETA\n", client->hostname);
diff --git a/engines/null.c b/engines/null.c
index 812cadf..8a4d106 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -6,7 +6,11 @@
  *
  * It also can act as external C++ engine - compiled with:
  *
- * g++ -O2 -g -shared -rdynamic -fPIC -o null.so null.c -DFIO_EXTERNAL_ENGINE
+ * g++ -O2 -g -shared -rdynamic -fPIC -o cpp_null null.c -DFIO_EXTERNAL_ENGINE
+ *
+ * to test it execute:
+ *
+ * LD_LIBRARY_PATH=./engines ./fio examples/cpp_null.fio
  *
  */
 #include <stdio.h>
@@ -23,20 +27,17 @@ struct null_data {
 	int events;
 };
 
-static struct io_u *fio_null_event(struct thread_data *td, int event)
+static struct io_u *null_event(struct null_data *nd, int event)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops_data;
-
 	return nd->io_us[event];
 }
 
-static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
-			      unsigned int fio_unused max,
-			      const struct timespec fio_unused *t)
+static int null_getevents(struct null_data *nd, unsigned int min_events,
+			  unsigned int fio_unused max,
+			  const struct timespec fio_unused *t)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops_data;
 	int ret = 0;
-	
+
 	if (min_events) {
 		ret = nd->events;
 		nd->events = 0;
@@ -45,10 +46,8 @@ static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
 	return ret;
 }
 
-static int fio_null_commit(struct thread_data *td)
+static int null_commit(struct thread_data *td, struct null_data *nd)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops_data;
-
 	if (!nd->events) {
 #ifndef FIO_EXTERNAL_ENGINE
 		io_u_mark_submit(td, nd->queued);
@@ -60,10 +59,9 @@ static int fio_null_commit(struct thread_data *td)
 	return 0;
 }
 
-static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+static int null_queue(struct thread_data *td, struct null_data *nd,
+		      struct io_u *io_u)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops_data;
-
 	fio_ro_check(td, io_u);
 
 	if (td->io_ops->flags & FIO_SYNCIO)
@@ -75,25 +73,23 @@ static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
 	return FIO_Q_QUEUED;
 }
 
-static int fio_null_open(struct thread_data fio_unused *td,
-			 struct fio_file fio_unused *f)
+static int null_open(struct null_data fio_unused *nd,
+		     struct fio_file fio_unused *f)
 {
 	return 0;
 }
 
-static void fio_null_cleanup(struct thread_data *td)
+static void null_cleanup(struct null_data *nd)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops_data;
-
 	if (nd) {
 		free(nd->io_us);
 		free(nd);
 	}
 }
 
-static int fio_null_init(struct thread_data *td)
+static int null_init(struct thread_data *td, struct null_data **nd_ptr)
 {
-	struct null_data *nd = (struct null_data *) malloc(sizeof(*nd));
+	struct null_data *nd = (struct null_data *) malloc(sizeof(**nd_ptr));
 
 	memset(nd, 0, sizeof(*nd));
 
@@ -103,11 +99,49 @@ static int fio_null_init(struct thread_data *td)
 	} else
 		td->io_ops->flags |= FIO_SYNCIO;
 
-	td->io_ops_data = nd;
+	*nd_ptr = nd;
 	return 0;
 }
 
 #ifndef __cplusplus
+
+static struct io_u *fio_null_event(struct thread_data *td, int event)
+{
+	return null_event((struct null_data *)td->io_ops_data, event);
+}
+
+static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
+			      unsigned int max, const struct timespec *t)
+{
+	struct null_data *nd = (struct null_data *)td->io_ops_data;
+	return null_getevents(nd, min_events, max, t);
+}
+
+static int fio_null_commit(struct thread_data *td)
+{
+	return null_commit(td, (struct null_data *)td->io_ops_data);
+}
+
+static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+{
+	return null_queue(td, (struct null_data *)td->io_ops_data, io_u);
+}
+
+static int fio_null_open(struct thread_data *td, struct fio_file *f)
+{
+	return null_open((struct null_data *)td->io_ops_data, f);
+}
+
+static void fio_null_cleanup(struct thread_data *td)
+{
+	null_cleanup((struct null_data *)td->io_ops_data);
+}
+
+static int fio_null_init(struct thread_data *td)
+{
+	return null_init(td, (struct null_data **)&td->io_ops_data);
+}
+
 static struct ioengine_ops ioengine = {
 	.name		= "null",
 	.version	= FIO_IOOPS_VERSION,
@@ -134,7 +168,91 @@ static void fio_exit fio_null_unregister(void)
 #else
 
 #ifdef FIO_EXTERNAL_ENGINE
+
+struct NullData {
+	NullData(struct thread_data *td)
+	{
+		null_init(td, &impl_);
+	}
+
+	~NullData()
+	{
+		null_cleanup(impl_);
+	}
+
+	static NullData *get(struct thread_data *td)
+	{
+		return reinterpret_cast<NullData *>(td->io_ops_data);
+	}
+
+	io_u *fio_null_event(struct thread_data *, int event)
+	{
+		return null_event(impl_, event);
+	}
+
+	int fio_null_getevents(struct thread_data *, unsigned int min_events,
+			       unsigned int max, const struct timespec *t)
+	{
+		return null_getevents(impl_, min_events, max, t);
+	}
+
+	int fio_null_commit(struct thread_data *td)
+	{
+		return null_commit(td, impl_);
+	}
+
+	int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+	{
+		return null_queue(td, impl_, io_u);
+	}
+
+	int fio_null_open(struct thread_data *, struct fio_file *f)
+	{
+		return null_open(impl_, f);
+	}
+
+	struct null_data *impl_;
+};
+
 extern "C" {
+
+static struct io_u *fio_null_event(struct thread_data *td, int event)
+{
+	return NullData::get(td)->fio_null_event(td, event);
+}
+
+static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
+			      unsigned int max, const struct timespec *t)
+{
+	return NullData::get(td)->fio_null_getevents(td, min_events, max, t);
+}
+
+static int fio_null_commit(struct thread_data *td)
+{
+	return NullData::get(td)->fio_null_commit(td);
+}
+
+static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
+{
+	return NullData::get(td)->fio_null_queue(td, io_u);
+}
+
+static int fio_null_open(struct thread_data *td, struct fio_file *f)
+{
+	return NullData::get(td)->fio_null_open(td, f);
+}
+
+static int fio_null_init(struct thread_data *td)
+{
+	td->io_ops_data = new NullData(td);
+	return 0;
+}
+
+static void fio_null_cleanup(struct thread_data *td)
+{
+	delete NullData::get(td);
+}
+
 static struct ioengine_ops ioengine;
 void get_ioengine(struct ioengine_ops **ioengine_ptr)
 {
diff --git a/examples/cpp_null.fio b/examples/cpp_null.fio
new file mode 100644
index 0000000..436ed90
--- /dev/null
+++ b/examples/cpp_null.fio
@@ -0,0 +1,10 @@
+[global]
+bs=4k
+gtod_reduce=1
+
+[null]
+ioengine=cpp_null
+size=100g
+rw=randread
+norandommap
+time_based=0
diff --git a/file.h b/file.h
index ad8802d..e3864ee 100644
--- a/file.h
+++ b/file.h
@@ -188,9 +188,15 @@ extern void close_and_free_files(struct thread_data *);
 extern uint64_t get_start_offset(struct thread_data *, struct fio_file *);
 extern int __must_check setup_files(struct thread_data *);
 extern int __must_check file_invalidate_cache(struct thread_data *, struct fio_file *);
+#ifdef __cplusplus
+extern "C" {
+#endif
 extern int __must_check generic_open_file(struct thread_data *, struct fio_file *);
 extern int __must_check generic_close_file(struct thread_data *, struct fio_file *);
 extern int __must_check generic_get_file_size(struct thread_data *, struct fio_file *);
+#ifdef __cplusplus
+}
+#endif
 extern int __must_check file_lookup_open(struct fio_file *f, int flags);
 extern int __must_check pre_read_files(struct thread_data *);
 extern unsigned long long get_rand_file_size(struct thread_data *td);
diff --git a/io_u.c b/io_u.c
index e98cd31..58c2320 100644
--- a/io_u.c
+++ b/io_u.c
@@ -662,7 +662,7 @@ int io_u_quiesce(struct thread_data *td)
 static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 {
 	enum fio_ddir odir = ddir ^ 1;
-	long usec;
+	uint64_t usec;
 	uint64_t now;
 
 	assert(ddir_rw(ddir));
diff --git a/ioengines.c b/ioengines.c
index 9638d80..1bfc06f 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -133,8 +133,10 @@ static struct ioengine_ops *__load_ioengine(const char *name)
 	/*
 	 * linux libaio has alias names, so convert to what we want
 	 */
-	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3))
+	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3)) {
+		dprint(FD_IO, "converting ioengine name: %s -> libaio\n", name);
 		strcpy(engine, "libaio");
+	}
 
 	dprint(FD_IO, "load ioengine %s\n", engine);
 	return find_ioengine(engine);
@@ -436,6 +438,7 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 {
 	assert(!fio_file_open(f));
 	assert(f->fd == -1);
+	assert(td->io_ops->open_file);
 
 	if (td->io_ops->open_file(td, f)) {
 		if (td->error == EINVAL && td->o.odirect)
diff --git a/lib/types.h b/lib/types.h
index 287a3b4..bb24506 100644
--- a/lib/types.h
+++ b/lib/types.h
@@ -1,7 +1,7 @@
 #ifndef FIO_TYPES_H
 #define FIO_TYPES_H
 
-#ifndef CONFIG_HAVE_BOOL
+#if !defined(CONFIG_HAVE_BOOL) && !defined(__cplusplus)
 typedef int bool;
 #ifndef false
 #define false	0
diff --git a/oslib/libmtd.c b/oslib/libmtd.c
index 24e9db9..5d18871 100644
--- a/oslib/libmtd.c
+++ b/oslib/libmtd.c
@@ -1002,7 +1002,6 @@ int mtd_torture(libmtd_t desc, const struct mtd_dev_info *mtd, int fd, int eb)
 		}
 	}
 
-	err = 0;
 	normsg("PEB %d passed torture test, do not mark it a bad", eb);
 
 out:
diff --git a/t/gen-rand.c b/t/gen-rand.c
index 6c31f92..4e9d39c 100644
--- a/t/gen-rand.c
+++ b/t/gen-rand.c
@@ -63,6 +63,6 @@ int main(int argc, char *argv[])
 	}
 
 	printf("Passes=%lu, Fail=%lu\n", pass, fail);
-
+	free(buckets);
 	return 0;
 }
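
The `handle_cmd_timeout()` hunk fixes a use-after-free: the old code read `reply->opcode` after `free(reply)`. The fix copies the field into a local before releasing the object. The sketch below shows the same pattern in isolation. `struct reply`, `CMD_SEND_ETA` and `handle_timeout()` are hypothetical stand-ins for `struct fio_net_cmd_reply`, `FIO_NET_CMD_SEND_ETA` and the fio function, and the list removal (`flist_del()`) is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

#define CMD_SEND_ETA 0x10	/* hypothetical opcode value */

struct reply {
	uint16_t opcode;
};

/*
 * Returns 0 for a SEND_ETA timeout and 1 otherwise. The opcode is
 * saved before free(), so the object is never read after release.
 */
static int handle_timeout(struct reply *r)
{
	uint16_t opcode = r->opcode;	/* copy the field first ... */

	free(r);			/* ... then release the object */

	return opcode != CMD_SEND_ETA;
}
```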

* Recent changes (master)
@ 2017-09-29 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b6b662b64c70bf043bbafd654f9bf98513ea5dc9:

  Fio 3.1 (2017-09-28 04:23:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c16035aadd600a3a4c4b241339e3d3099f56c4b2:

  backend: fix a case where we complain about no IO being done (2017-09-28 06:58:23 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      backend: fix a case where we complain about no IO being done

 backend.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 6198c3d..e4eff37 100644
--- a/backend.c
+++ b/backend.c
@@ -1505,7 +1505,7 @@ static void *thread_main(void *data)
 	struct sk_out *sk_out = fd->sk_out;
 	uint64_t bytes_done[DDIR_RWDIR_CNT];
 	int deadlock_loop_cnt;
-	int clear_state;
+	bool clear_state, did_some_io;
 	int ret;
 
 	sk_out_assign(sk_out);
@@ -1726,7 +1726,8 @@ static void *thread_main(void *data)
 	}
 
 	memset(bytes_done, 0, sizeof(bytes_done));
-	clear_state = 0;
+	clear_state = false;
+	did_some_io = false;
 
 	while (keep_running(td)) {
 		uint64_t verify_bytes;
@@ -1765,7 +1766,7 @@ static void *thread_main(void *data)
 		if (td->runstate >= TD_EXITED)
 			break;
 
-		clear_state = 1;
+		clear_state = true;
 
 		/*
 		 * Make sure we've successfully updated the rusage stats
@@ -1804,6 +1805,9 @@ static void *thread_main(void *data)
 		    td_ioengine_flagged(td, FIO_UNIDIR))
 			continue;
 
+		if (ddir_rw_sum(bytes_done))
+			did_some_io = true;
+
 		clear_io_state(td, 0);
 
 		fio_gettime(&td->start, NULL);
@@ -1830,6 +1834,7 @@ static void *thread_main(void *data)
 	 * (Are we not missing other flags that can be ignored ?)
 	 */
 	if ((td->o.size || td->o.io_size) && !ddir_rw_sum(bytes_done) &&
+	    !did_some_io &&
 	    !(td_ioengine_flagged(td, FIO_NOIO) ||
 	      td_ioengine_flagged(td, FIO_DISKLESSIO)))
 		log_err("%s: No I/O performed by %s, "
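
The `did_some_io` change latches a flag whenever a pass through the `keep_running()` loop reports a non-zero `ddir_rw_sum(bytes_done)`. As a result, the post-loop check no longer depends only on the counters left by the final pass. The sketch below shows that latch, assuming per-pass byte counts are available as an array. `any_pass_did_io()` is a hypothetical helper, not fio code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: bytes done in each pass of the job loop. Returns true
 * if any pass did I/O, even when the final pass reports zero bytes.
 */
static bool any_pass_did_io(const uint64_t *bytes_per_pass, size_t passes)
{
	bool did_some_io = false;
	uint64_t bytes_done = 0;

	for (size_t i = 0; i < passes; i++) {
		bytes_done = bytes_per_pass[i];	/* this pass only */
		if (bytes_done)
			did_some_io = true;	/* latched, never cleared */
	}

	/* Testing bytes_done here would only see the last pass. */
	return did_some_io;
}
```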

* Recent changes (master)
@ 2017-09-28 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 7595fa09fbf048c9045617668bf0159a6cb82eac:

  Remove old exp/README.md file (2017-09-26 13:38:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b6b662b64c70bf043bbafd654f9bf98513ea5dc9:

  Fio 3.1 (2017-09-28 04:23:20 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 3.1

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 31acf1c..8c075cb 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-3.0
+DEF_VER=fio-3.1
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index edfefa8..58244c5 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.0">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.1">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

* Recent changes (master)
@ 2017-09-27 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2ae7a90df2d2340c4503be8a91526f80b3b96789:

  init: typo in help output (2017-09-20 22:21:35 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7595fa09fbf048c9045617668bf0159a6cb82eac:

  Remove old exp/README.md file (2017-09-26 13:38:38 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Remove old exp/README.md file

 exp/README.md | 7 -------
 1 file changed, 7 deletions(-)
 delete mode 100644 exp/README.md

---

Diff of recent changes:

diff --git a/exp/README.md b/exp/README.md
deleted file mode 100644
index 48c11c9..0000000
--- a/exp/README.md
+++ /dev/null
@@ -1,7 +0,0 @@
-simple-expression-parser
-========================
-
-A simple expression parser for arithmetic expressions made with bison + flex
-
-To use, see the example test-expression-parser.c
-

* Recent changes (master)
@ 2017-09-21 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-21 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 51102e0d64a2ae08472ecb90a72737f08de942fb:

  add fio_set_directio() error message for platforms without direct I/O (2017-09-18 12:43:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ae7a90df2d2340c4503be8a91526f80b3b96789:

  init: typo in help output (2017-09-20 22:21:35 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      init: typo in help output

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 6ac5212..e80aec3 100644
--- a/init.c
+++ b/init.c
@@ -2115,7 +2115,7 @@ static void usage(const char *name)
 	printf("  --inflate-log=log\tInflate and output compressed log\n");
 #endif
 	printf("  --trigger-file=file\tExecute trigger cmd when file exists\n");
-	printf("  --trigger-timeout=t\tExecute trigger af this time\n");
+	printf("  --trigger-timeout=t\tExecute trigger at this time\n");
 	printf("  --trigger=cmd\t\tSet this command as local trigger\n");
 	printf("  --trigger-remote=cmd\tSet this command as remote trigger\n");
 	printf("  --aux-path=path\tUse this path for fio state generated files\n");

* Recent changes (master)
@ 2017-09-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9acb08a9957b1111a06fbca6af113fa0c98dbd7c:

  Merge branch 'doc-patches' of https://github.com/vincentkfu/fio (2017-09-14 11:37:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 51102e0d64a2ae08472ecb90a72737f08de942fb:

  add fio_set_directio() error message for platforms without direct I/O (2017-09-18 12:43:27 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (4):
      lib/memalign: don't malloc size twice
      fix strncpy(3) copy length
      add "invalid block size" to "first direct IO errored." message
      add fio_set_directio() error message for platforms without direct I/O

 filesetup.c        | 1 +
 ioengines.c        | 4 ++--
 lib/memalign.c     | 2 +-
 os/windows/posix.c | 3 ++-
 server.c           | 2 +-
 5 files changed, 7 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index b51ab35..891a55a 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1868,6 +1868,7 @@ int fio_set_directio(struct thread_data *td, struct fio_file *f)
 
 	return 0;
 #else
+	log_err("fio: direct IO is not supported on this host operating system\n");
 	return -1;
 #endif
 }
diff --git a/ioengines.c b/ioengines.c
index fa4acab..9638d80 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -342,8 +342,8 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	    td->o.odirect) {
 
 		log_info("fio: first direct IO errored. File system may not "
-			 "support direct IO, or iomem_align= is bad. Try "
-			 "setting direct=0.\n");
+			 "support direct IO, or iomem_align= is bad, or "
+			 "invalid block size. Try setting direct=0.\n");
 	}
 
 	if (!td->io_ops->commit || io_u->ddir == DDIR_TRIM) {
diff --git a/lib/memalign.c b/lib/memalign.c
index 137cc8e..bfbd1e8 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -18,7 +18,7 @@ void *fio_memalign(size_t alignment, size_t size)
 
 	assert(!(alignment & (alignment - 1)));
 
-	ptr = malloc(size + alignment + size + sizeof(*f) - 1);
+	ptr = malloc(size + alignment + sizeof(*f) - 1);
 	if (ptr) {
 		ret = PTR_ALIGN(ptr, alignment - 1);
 		f = ret + size;
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 488d0ed..00f0335 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -584,7 +584,8 @@ char *basename(char *path)
 	while (path[i] != '\\' && path[i] != '/' && i >= 0)
 		i--;
 
-	strncpy(name, path + i + 1, MAX_PATH);
+	name[MAX_PATH - 1] = '\0';
+	strncpy(name, path + i + 1, MAX_PATH - 1);
 
 	return name;
 }
diff --git a/server.c b/server.c
index 0469cea..e6ea4cd 100644
--- a/server.c
+++ b/server.c
@@ -856,7 +856,7 @@ static int handle_probe_cmd(struct fio_net_cmd *cmd)
 #ifdef CONFIG_BIG_ENDIAN
 	probe.bigendian = 1;
 #endif
-	strncpy((char *) probe.fio_version, fio_version_string, sizeof(probe.fio_version));
+	strncpy((char *) probe.fio_version, fio_version_string, sizeof(probe.fio_version) - 1);
 
 	probe.os	= FIO_OS;
 	probe.arch	= FIO_ARCH;
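
The `lib/memalign.c` fix removes a duplicated `size` term from the allocation. The allocator hands out `size` bytes aligned to a power-of-two `alignment` and stores a small footer right after the user data (the `f = ret + size` line in the hunk). In the worst case this needs `alignment - 1` bytes of slack, plus `size`, plus the footer. The sketch below implements that scheme. `struct footer`, `sketch_memalign()` and `sketch_memfree()` are hypothetical names. Recording the alignment offset in the footer is an assumption about how the original pointer is recovered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical footer recording how far the aligned pointer moved. */
struct footer {
	size_t offset;
};

/* alignment must be a non-zero power of two. */
static void *sketch_memalign(size_t alignment, size_t size)
{
	struct footer f;
	uintptr_t base, aligned;
	char *ptr;

	assert(alignment && !(alignment & (alignment - 1)));

	/* alignment - 1 bytes of slack, plus data, plus footer: size once */
	ptr = malloc(size + alignment + sizeof(f) - 1);
	if (!ptr)
		return NULL;

	base = (uintptr_t) ptr;
	aligned = (base + alignment - 1) & ~((uintptr_t) alignment - 1);

	f.offset = aligned - base;
	/* memcpy: ret + size is not necessarily aligned for struct footer */
	memcpy((char *) aligned + size, &f, sizeof(f));
	return (void *) aligned;
}

static void sketch_memfree(void *ret, size_t size)
{
	struct footer f;

	memcpy(&f, (char *) ret + size, sizeof(f));
	free((char *) ret - f.offset);
}
```

The bound holds because the aligned pointer lies at most `alignment - 1` bytes past the `malloc()` result. As a result, `aligned + size + sizeof(footer)` never runs past the end of the allocation.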

* Recent changes (master)
@ 2017-09-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-09-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit b599759ba565e7f2f573af364e6da4fe6d556a90:

  Add support for doing total latency percentiles (2017-09-13 22:07:31 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9acb08a9957b1111a06fbca6af113fa0c98dbd7c:

  Merge branch 'doc-patches' of https://github.com/vincentkfu/fio (2017-09-14 11:37:34 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      stat: some bool conversions
      Merge branch 'doc-patches' of https://github.com/vincentkfu/fio

Vincent Fu (2):
      doc: provide more detail regarding the --status-interval option
      doc: provide some documentation for the json output format

 HOWTO  | 15 +++++++++++++--
 fio.1  | 12 ++++++++++--
 stat.c | 42 ++++++++++++++++++++++--------------------
 3 files changed, 45 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index bfaa054..8fad2ce 100644
--- a/HOWTO
+++ b/HOWTO
@@ -182,8 +182,10 @@ Command line options
 
 .. option:: --status-interval=time
 
-	Force full status dump every `time` period passed.  When the unit is
-	omitted, the value is interpreted in seconds.
+	Force a full status dump of cumulative (from job start) values at `time`
+	intervals. This option does *not* provide per-period measurements. So
+	values such as bandwidth are running averages. When the time unit is omitted,
+	`time` is interpreted in seconds.
 
 .. option:: --section=name
 
@@ -3389,6 +3391,15 @@ minimal output v3, separated by semicolons::
         terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 
+JSON output
+------------
+
+The `json` output format is intended to be both human readable and convenient
+for automated parsing. For the most part its sections mirror those of the
+`normal` output. The `runtime` value is reported in msec and the `bw` value is
+reported in 1024 bytes per second units.
+
+
 JSON+ output
 ------------
 
diff --git a/fio.1 b/fio.1
index 63e1c2e..b943db2 100644
--- a/fio.1
+++ b/fio.1
@@ -84,8 +84,10 @@ Force a new line for every \fItime\fR period passed. When the unit is omitted,
 the value is interpreted in seconds.
 .TP
 .BI \-\-status\-interval \fR=\fPtime
-Force full status dump every \fItime\fR period passed. When the unit is omitted,
-the value is interpreted in seconds.
+Force a full status dump of cumulative (from job start) values at \fItime\fR
+intervals. This option does *not* provide per-period measurements. So
+values such as bandwidth are running averages. When the time unit is omitted,
+\fItime\fR is interpreted in seconds.
 .TP
 .BI \-\-section \fR=\fPname
 Only run specified section \fIname\fR in job file. Multiple sections can be specified.
@@ -3106,6 +3108,12 @@ minimal output v3, separated by semicolons:
 .nf
 		terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 .fi
+.SH JSON OUTPUT
+The \fBjson\fR output format is intended to be both human readable and convenient
+for automated parsing. For the most part its sections mirror those of the
+\fBnormal\fR output. The \fBruntime\fR value is reported in msec and the \fBbw\fR value is
+reported in 1024 bytes per second units.
+.fi
 .SH JSON+ OUTPUT
 The \fBjson+\fR output format is identical to the \fBjson\fR output format except that it
 adds a full dump of the completion latency bins. Each \fBbins\fR object contains a
diff --git a/stat.c b/stat.c
index 9828d15..09afa5b 100644
--- a/stat.c
+++ b/stat.c
@@ -143,7 +143,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 	unsigned int len, i, j = 0;
 	unsigned int oval_len = 0;
 	unsigned long long *ovals = NULL;
-	int is_last;
+	bool is_last;
 
 	*minv = -1ULL;
 	*maxv = 0;
@@ -166,7 +166,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 	/*
 	 * Calculate bucket values, note down max and min values
 	 */
-	is_last = 0;
+	is_last = false;
 	for (i = 0; i < FIO_IO_U_PLAT_NR && !is_last; i++) {
 		sum += io_u_plat[i];
 		while (sum >= (plist[j].u.f / 100.0 * nr)) {
@@ -183,7 +183,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 			if (ovals[j] > *maxv)
 				*maxv = ovals[j];
 
-			is_last = (j == len - 1);
+			is_last = (j == len - 1) != 0;
 			if (is_last)
 				break;
 
@@ -205,8 +205,9 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 	unsigned int divisor, len, i, j = 0;
 	unsigned long long minv, maxv;
 	unsigned long long *ovals;
-	int is_last, per_line, scale_down, time_width;
+	int per_line, scale_down, time_width;
 	const char *pre = is_clat ? "clat" : " lat";
+	bool is_last;
 	char fmt[32];
 
 	len = calc_clat_percentiles(io_u_plat, nr, plist, &ovals, &maxv, &minv);
@@ -244,7 +245,7 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 			log_buf(out, "     |");
 
 		/* end of the list */
-		is_last = (j == len - 1);
+		is_last = (j == len - 1) != 0;
 
 		for (i = 0; i < scale_down; i++)
 			ovals[j] = (ovals[j] + 999) / 1000;
@@ -511,20 +512,21 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	}
 }
 
-static int show_lat(double *io_u_lat, int nr, const char **ranges,
-		    const char *msg, struct buf_output *out)
+static bool show_lat(double *io_u_lat, int nr, const char **ranges,
+		     const char *msg, struct buf_output *out)
 {
-	int new_line = 1, i, line = 0, shown = 0;
+	bool new_line = true, shown = false;
+	int i, line = 0;
 
 	for (i = 0; i < nr; i++) {
 		if (io_u_lat[i] <= 0.0)
 			continue;
-		shown = 1;
+		shown = true;
 		if (new_line) {
 			if (line)
 				log_buf(out, "\n");
 			log_buf(out, "  lat (%s)   : ", msg);
-			new_line = 0;
+			new_line = false;
 			line = 0;
 		}
 		if (line)
@@ -532,13 +534,13 @@ static int show_lat(double *io_u_lat, int nr, const char **ranges,
 		log_buf(out, "%s%3.2f%%", ranges[i], io_u_lat[i]);
 		line++;
 		if (line == 5)
-			new_line = 1;
+			new_line = true;
 	}
 
 	if (shown)
 		log_buf(out, "\n");
 
-	return shown;
+	return true;
 }
 
 static void show_lat_n(double *io_u_lat_n, struct buf_output *out)
@@ -1590,8 +1592,8 @@ void __show_run_stats(void)
 	struct thread_data *td;
 	struct thread_stat *threadstats, *ts;
 	int i, j, k, nr_ts, last_ts, idx;
-	int kb_base_warned = 0;
-	int unit_base_warned = 0;
+	bool kb_base_warned = false;
+	bool unit_base_warned = false;
 	struct json_object *root = NULL;
 	struct json_array *array = NULL;
 	struct buf_output output[FIO_OUTPUT_NR];
@@ -1684,11 +1686,11 @@ void __show_run_stats(void)
 		} else if (ts->kb_base != td->o.kb_base && !kb_base_warned) {
 			log_info("fio: kb_base differs for jobs in group, using"
 				 " %u as the base\n", ts->kb_base);
-			kb_base_warned = 1;
+			kb_base_warned = true;
 		} else if (ts->unit_base != td->o.unit_base && !unit_base_warned) {
 			log_info("fio: unit_base differs for jobs in group, using"
 				 " %u as the base\n", ts->unit_base);
-			unit_base_warned = 1;
+			unit_base_warned = true;
 		}
 
 		ts->continue_on_error = td->o.continue_on_error;
@@ -1932,9 +1934,9 @@ void __show_running_run_stats(void)
 	fio_mutex_up(stat_mutex);
 }
 
-static int status_interval_init;
+static bool status_interval_init;
 static struct timespec status_time;
-static int status_file_disabled;
+static bool status_file_disabled;
 
 #define FIO_STATUS_FILE		"fio-dump-status"
 
@@ -1965,7 +1967,7 @@ static int check_status_file(void)
 		log_err("fio: failed to unlink %s: %s\n", fio_status_file_path,
 							strerror(errno));
 		log_err("fio: disabling status file updates\n");
-		status_file_disabled = 1;
+		status_file_disabled = true;
 	}
 
 	return 1;
@@ -1976,7 +1978,7 @@ void check_for_running_stats(void)
 	if (status_interval) {
 		if (!status_interval_init) {
 			fio_gettime(&status_time, NULL);
-			status_interval_init = 1;
+			status_interval_init = true;
 		} else if (mtime_since_now(&status_time) >= status_interval) {
 			show_running_run_stats();
 			fio_gettime(&status_time, NULL);
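show_lat() above skips empty latency buckets and prints the rest as "<range><percent>%" after a "  lat (<unit>)   : " prefix. It starts a new line after every five entries. Below is a standalone sketch of that wrapping logic, with three differences from fio:

- It writes into a caller-supplied buffer instead of using fio's log_buf().
- The helper names are hypothetical, and the ", " separator between entries is an assumption, because that line falls between the hunks shown.
- It returns whether anything was printed, as the pre-patch version did. The patched show_lat() now ends with return true.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at *off, truncating silently once buf is full. */
static void append(char *buf, size_t size, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*off >= size)
		return;
	va_start(ap, fmt);
	n = vsnprintf(buf + *off, size - *off, fmt, ap);
	va_end(ap);
	if (n > 0)
		*off += (size_t)n;
}

/*
 * show_lat()-style formatting: skip empty buckets, print the rest as
 * "<range><pct>%", five per line. The ", " separator is an assumption.
 * Returns true if at least one bucket was printed.
 */
bool format_lat(char *buf, size_t size, const double *lat, int nr,
		const char **ranges, const char *msg)
{
	bool new_line = true, shown = false;
	size_t off = 0;
	int i, line = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	for (i = 0; i < nr; i++) {
		if (lat[i] <= 0.0)
			continue;
		shown = true;
		if (new_line) {
			if (line)
				append(buf, size, &off, "\n");
			append(buf, size, &off, "  lat (%s)   : ", msg);
			new_line = false;
			line = 0;
		}
		if (line)
			append(buf, size, &off, ", ");
		append(buf, size, &off, "%s%3.2f%%", ranges[i], lat[i]);
		line++;
		if (line == 5)
			new_line = true;
	}

	if (shown)
		append(buf, size, &off, "\n");
	return shown;
}
```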

* Recent changes (master)
@ 2017-09-14 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-14 12:00 UTC
  To: fio

The following changes since commit 956e60eade2bddb8aadfb54b58030e0b88fd03b2:

  io_u: fix trimming of mixed block size randommap (2017-09-12 14:02:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b599759ba565e7f2f573af364e6da4fe6d556a90:

  Add support for doing total latency percentiles (2017-09-13 22:07:31 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      time: ensure that offload mode switches parent out of ramp
      time: use bool for ramp time
      init: fixup_options() cleanup
      Add support for doing total latency percentiles

 HOWTO            | 10 +++++++++-
 cconv.c          |  2 ++
 client.c         |  3 ++-
 fio.1            |  8 +++++++-
 fio.h            |  2 +-
 gclient.c        |  6 +++++-
 init.c           | 35 +++++++++++++++++++++++++----------
 options.c        | 12 ++++++++++++
 server.c         |  3 ++-
 server.h         |  2 +-
 stat.c           | 22 ++++++++++++++--------
 stat.h           |  3 ++-
 thread_options.h |  3 ++-
 time.c           | 19 ++++++++++++++-----
 14 files changed, 98 insertions(+), 32 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2a70b7c..bfaa054 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2860,7 +2860,15 @@ Measurements and reporting
 
 .. option:: clat_percentiles=bool
 
-	Enable the reporting of percentiles of completion latencies.
+	Enable the reporting of percentiles of completion latencies.  This
+	option is mutually exclusive with :option:`lat_percentiles`.
+
+.. option:: lat_percentiles=bool
+
+	Enable the reporting of percentiles of IO latencies. This is similar
+	to :option:`clat_percentiles`, except that this includes the
+	submission latency. This option is mutually exclusive with
+	:option:`clat_percentiles`.
 
 .. option:: percentile_list=float_list
 
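The new lat_percentiles option differs from clat_percentiles only in that the measured interval also covers submission. In other words, total latency is submission latency plus completion latency. Below is a minimal sketch of that relationship using hypothetical nanosecond timestamps. The struct and field names are illustrative, not fio's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-IO timestamps, in nanoseconds. */
struct io_times {
	uint64_t start_ns;	/* IO handed to the engine */
	uint64_t submit_ns;	/* submission returned */
	uint64_t complete_ns;	/* completion reaped */
};

/* Submission latency. */
uint64_t slat_ns(const struct io_times *t)
{
	assert(t->submit_ns >= t->start_ns);
	return t->submit_ns - t->start_ns;
}

/* Completion latency: what clat_percentiles reports on. */
uint64_t clat_ns(const struct io_times *t)
{
	assert(t->complete_ns >= t->submit_ns);
	return t->complete_ns - t->submit_ns;
}

/* Total latency: what lat_percentiles reports on. */
uint64_t lat_ns(const struct io_times *t)
{
	return slat_ns(t) + clat_ns(t);
}
```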
diff --git a/cconv.c b/cconv.c
index ac58705..f809fd5 100644
--- a/cconv.c
+++ b/cconv.c
@@ -267,6 +267,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->trim_batch = le32_to_cpu(top->trim_batch);
 	o->trim_zero = le32_to_cpu(top->trim_zero);
 	o->clat_percentiles = le32_to_cpu(top->clat_percentiles);
+	o->lat_percentiles = le32_to_cpu(top->lat_percentiles);
 	o->percentile_precision = le32_to_cpu(top->percentile_precision);
 	o->continue_on_error = le32_to_cpu(top->continue_on_error);
 	o->cgroup_weight = le32_to_cpu(top->cgroup_weight);
@@ -454,6 +455,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->trim_batch = cpu_to_le32(o->trim_batch);
 	top->trim_zero = cpu_to_le32(o->trim_zero);
 	top->clat_percentiles = cpu_to_le32(o->clat_percentiles);
+	top->lat_percentiles = cpu_to_le32(o->lat_percentiles);
 	top->percentile_precision = cpu_to_le32(o->percentile_precision);
 	top->continue_on_error = cpu_to_le32(o->continue_on_error);
 	top->cgroup_weight = cpu_to_le32(o->cgroup_weight);
diff --git a/client.c b/client.c
index 281d853..09e810a 100644
--- a/client.c
+++ b/client.c
@@ -893,7 +893,8 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->ctx		= le64_to_cpu(src->ctx);
 	dst->minf		= le64_to_cpu(src->minf);
 	dst->majf		= le64_to_cpu(src->majf);
-	dst->clat_percentiles	= le64_to_cpu(src->clat_percentiles);
+	dst->clat_percentiles	= le32_to_cpu(src->clat_percentiles);
+	dst->lat_percentiles	= le32_to_cpu(src->lat_percentiles);
 	dst->percentile_precision = le64_to_cpu(src->percentile_precision);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
diff --git a/fio.1 b/fio.1
index 97133da..63e1c2e 100644
--- a/fio.1
+++ b/fio.1
@@ -2543,7 +2543,13 @@ Disable measurements of throughput/bandwidth numbers. See
 \fBdisable_lat\fR.
 .TP
 .BI clat_percentiles \fR=\fPbool
-Enable the reporting of percentiles of completion latencies.
+Enable the reporting of percentiles of completion latencies. This option is
+mutually exclusive with \fBlat_percentiles\fR.
+.TP
+.BI lat_percentiles \fR=\fPbool
+Enable the reporting of percentiles of IO latencies. This is similar to
+\fBclat_percentiles\fR, except that this includes the submission latency.
+This option is mutually exclusive with \fBclat_percentiles\fR.
 .TP
 .BI percentile_list \fR=\fPfloat_list
 Overwrite the default list of percentiles for completion latencies and the
diff --git a/fio.h b/fio.h
index da950ef..8814d84 100644
--- a/fio.h
+++ b/fio.h
@@ -335,7 +335,7 @@ struct thread_data {
 	struct timespec terminate_time;
 	unsigned int ts_cache_nr;
 	unsigned int ts_cache_mask;
-	unsigned int ramp_time_over;
+	bool ramp_time_over;
 
 	/*
 	 * Time since last latency_window was started
diff --git a/gclient.c b/gclient.c
index 4eb99a0..43c8a08 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1127,7 +1127,11 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 		base = "nsec";
         }
 
-	sprintf(tmp, "Completion percentiles (%s)", base);
+	if (ts->clat_percentiles)
+		sprintf(tmp, "Completion percentiles (%s)", base);
+	else
+		sprintf(tmp, "Latency percentiles (%s)", base);
+
 	tree_view = gfio_output_clat_percentiles(ovals, plist, len, base, scale_down);
 	ge->clat_graph = setup_clat_graph(tmp, ovals, plist, len, 700.0, 300.0);
 
diff --git a/init.c b/init.c
index cf5c646..6ac5212 100644
--- a/init.c
+++ b/init.c
@@ -837,7 +837,7 @@ static int fixup_options(struct thread_data *td)
 	 * Windows doesn't support O_DIRECT or O_SYNC with the _open interface,
 	 * so fail if we're passed those flags
 	 */
-	if (td_ioengine_flagged(td, FIO_SYNCIO) && (td->o.odirect || td->o.sync_io)) {
+	if (td_ioengine_flagged(td, FIO_SYNCIO) && (o->odirect || o->sync_io)) {
 		log_err("fio: Windows does not support direct or non-buffered io with"
 				" the synchronous ioengines. Use the 'windowsaio' ioengine"
 				" with 'direct=1' and 'iodepth=1' instead.\n");
@@ -863,8 +863,8 @@ static int fixup_options(struct thread_data *td)
 	 * Using a non-uniform random distribution excludes usage of
 	 * a random map
 	 */
-	if (td->o.random_distribution != FIO_RAND_DIST_RANDOM)
-		td->o.norandommap = 1;
+	if (o->random_distribution != FIO_RAND_DIST_RANDOM)
+		o->norandommap = 1;
 
 	/*
 	 * If size is set but less than the min block size, complain
@@ -878,16 +878,16 @@ static int fixup_options(struct thread_data *td)
 	/*
 	 * O_ATOMIC implies O_DIRECT
 	 */
-	if (td->o.oatomic)
-		td->o.odirect = 1;
+	if (o->oatomic)
+		o->odirect = 1;
 
 	/*
 	 * If randseed is set, that overrides randrepeat
 	 */
-	if (fio_option_is_set(&td->o, rand_seed))
-		td->o.rand_repeatable = 0;
+	if (fio_option_is_set(o, rand_seed))
+		o->rand_repeatable = 0;
 
-	if (td_ioengine_flagged(td, FIO_NOEXTEND) && td->o.file_append) {
+	if (td_ioengine_flagged(td, FIO_NOEXTEND) && o->file_append) {
 		log_err("fio: can't append/extent with IO engine %s\n", td->io_ops->name);
 		ret = 1;
 	}
@@ -902,10 +902,24 @@ static int fixup_options(struct thread_data *td)
 	if (!td->loops)
 		td->loops = 1;
 
-	if (td->o.block_error_hist && td->o.nr_files != 1) {
+	if (o->block_error_hist && o->nr_files != 1) {
 		log_err("fio: block error histogram only available "
 			"with a single file per job, but %d files "
-			"provided\n", td->o.nr_files);
+			"provided\n", o->nr_files);
+		ret = 1;
+	}
+
+	if (fio_option_is_set(o, clat_percentiles) &&
+	    !fio_option_is_set(o, lat_percentiles)) {
+		o->lat_percentiles = !o->clat_percentiles;
+	} else if (fio_option_is_set(o, lat_percentiles) &&
+		   !fio_option_is_set(o, clat_percentiles)) {
+		o->clat_percentiles = !o->lat_percentiles;
+	} else if (fio_option_is_set(o, lat_percentiles) &&
+		   fio_option_is_set(o, clat_percentiles) &&
+		   o->lat_percentiles && o->clat_percentiles) {
+		log_err("fio: lat_percentiles and clat_percentiles are "
+			"mutually exclusive\n");
 		ret = 1;
 	}
 
@@ -1401,6 +1415,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->mutex = fio_mutex_init(FIO_MUTEX_LOCKED);
 
 	td->ts.clat_percentiles = o->clat_percentiles;
+	td->ts.lat_percentiles = o->lat_percentiles;
 	td->ts.percentile_precision = o->percentile_precision;
 	memcpy(td->ts.percentile_list, o->percentile_list, sizeof(o->percentile_list));
 
diff --git a/options.c b/options.c
index 54fa4ee..5c1abe9 100644
--- a/options.c
+++ b/options.c
@@ -4076,6 +4076,18 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.off1	= offsetof(struct thread_options, clat_percentiles),
 		.help	= "Enable the reporting of completion latency percentiles",
 		.def	= "1",
+		.inverse = "lat_percentiles",
+		.category = FIO_OPT_C_STAT,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "lat_percentiles",
+		.lname	= "IO latency percentiles",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, lat_percentiles),
+		.help	= "Enable the reporting of IO latency percentiles",
+		.def	= "0",
+		.inverse = "clat_percentiles",
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
 	},
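The option defaults above (clat_percentiles defaults to 1, lat_percentiles to 0) work together with the new block in fixup_options() to resolve the pair:

- If only one option is set explicitly, the other becomes its inverse.
- If both are set explicitly and both are true, the job is rejected.

The sketch below restates that decision table. The struct and its *_set flags are hypothetical stand-ins for fio_option_is_set().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: option values plus "set explicitly" flags, standing
 * in for fio_option_is_set(). Defaults: clat = true, lat = false. */
struct pct_opts {
	bool clat, lat;
	bool clat_set, lat_set;
};

/* Returns 0 if the combination is valid, 1 otherwise (the same
 * convention as 'ret = 1' in fixup_options()). */
int resolve_percentiles(struct pct_opts *o)
{
	assert(o != NULL);

	if (o->clat_set && !o->lat_set)
		o->lat = !o->clat;	/* only clat given: lat is its inverse */
	else if (o->lat_set && !o->clat_set)
		o->clat = !o->lat;	/* only lat given: clat is its inverse */
	else if (o->clat_set && o->lat_set && o->clat && o->lat)
		return 1;		/* both explicitly enabled */
	return 0;
}
```

When neither option is set, the defaults pass through untouched. Setting both explicitly to 0 is accepted and, given the `clat_percentiles || lat_percentiles` checks in stat.c, turns percentile output off.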
diff --git a/server.c b/server.c
index 2c08c3e..0469cea 100644
--- a/server.c
+++ b/server.c
@@ -1484,7 +1484,8 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.ctx		= cpu_to_le64(ts->ctx);
 	p.ts.minf		= cpu_to_le64(ts->minf);
 	p.ts.majf		= cpu_to_le64(ts->majf);
-	p.ts.clat_percentiles	= cpu_to_le64(ts->clat_percentiles);
+	p.ts.clat_percentiles	= cpu_to_le32(ts->clat_percentiles);
+	p.ts.lat_percentiles	= cpu_to_le32(ts->lat_percentiles);
 	p.ts.percentile_precision = cpu_to_le64(ts->percentile_precision);
 
 	for (i = 0; i < FIO_IO_U_LIST_MAX_LEN; i++) {
diff --git a/server.h b/server.h
index f63a518..ba3abfe 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 65,
+	FIO_SERVER_VER			= 66,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 63353cc..9828d15 100644
--- a/stat.c
+++ b/stat.c
@@ -200,12 +200,13 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
  */
 static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 				  fio_fp64_t *plist, unsigned int precision,
-				  struct buf_output *out)
+				  bool is_clat, struct buf_output *out)
 {
 	unsigned int divisor, len, i, j = 0;
 	unsigned long long minv, maxv;
 	unsigned long long *ovals;
 	int is_last, per_line, scale_down, time_width;
+	const char *pre = is_clat ? "clat" : " lat";
 	char fmt[32];
 
 	len = calc_clat_percentiles(io_u_plat, nr, plist, &ovals, &maxv, &minv);
@@ -219,15 +220,15 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 	if (minv > 2000000 && maxv > 99999999ULL) {
 		scale_down = 2;
 		divisor = 1000000;
-		log_buf(out, "    clat percentiles (msec):\n     |");
+		log_buf(out, "    %s percentiles (msec):\n     |", pre);
 	} else if (minv > 2000 && maxv > 99999) {
 		scale_down = 1;
 		divisor = 1000;
-		log_buf(out, "    clat percentiles (usec):\n     |");
+		log_buf(out, "    %s percentiles (usec):\n     |", pre);
 	} else {
 		scale_down = 0;
 		divisor = 1;
-		log_buf(out, "    clat percentiles (nsec):\n     |");
+		log_buf(out, "    %s percentiles (nsec):\n     |", pre);
 	}
 
 
@@ -457,11 +458,12 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
 
-	if (ts->clat_percentiles) {
+	if (ts->clat_percentiles || ts->lat_percentiles) {
 		show_clat_percentiles(ts->io_u_plat[ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list,
-					ts->percentile_precision, out);
+					ts->percentile_precision,
+					ts->clat_percentiles, out);
 	}
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
@@ -896,7 +898,7 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	else
 		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
-	if (ts->clat_percentiles) {
+	if (ts->clat_percentiles || ts->lat_percentiles) {
 		len = calc_clat_percentiles(ts->io_u_plat[ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
@@ -1011,7 +1013,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_float(tmp_object, "mean", mean);
 	json_object_add_value_float(tmp_object, "stddev", dev);
 
-	if (ts->clat_percentiles) {
+	if (ts->clat_percentiles || ts->lat_percentiles) {
 		len = calc_clat_percentiles(ts->io_u_plat[ddir],
 					ts->clat_stat[ddir].samples,
 					ts->percentile_list, &ovals, &maxv,
@@ -1645,6 +1647,7 @@ void __show_run_stats(void)
 		ts = &threadstats[j];
 
 		ts->clat_percentiles = td->o.clat_percentiles;
+		ts->lat_percentiles = td->o.lat_percentiles;
 		ts->percentile_precision = td->o.percentile_precision;
 		memcpy(ts->percentile_list, td->o.percentile_list, sizeof(td->o.percentile_list));
 		opt_lists[j] = &td->opt_list;
@@ -2437,6 +2440,9 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
 			       offset);
 
+	if (ts->lat_percentiles)
+		add_clat_percentile_sample(ts, nsec, ddir);
+
 	td_io_u_unlock(td);
 }
 
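show_clat_percentiles() chooses the display unit from the smallest and largest percentile values. After this change it also labels the header "clat" or " lat" according to its new is_clat argument. The sketch below reuses the thresholds from the hunk above. Treating the inputs as nanoseconds follows from the nsec label used when no scaling is applied. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the unit chosen for percentile output. */
struct pct_unit {
	const char *name;	/* "msec", "usec" or "nsec" */
	unsigned int divisor;	/* nsec per displayed unit */
	int scale_down;		/* number of divide-by-1000 steps */
};

/* Thresholds as in show_clat_percentiles(); minv/maxv are in nsec. */
struct pct_unit pick_pct_unit(unsigned long long minv, unsigned long long maxv)
{
	if (minv > 2000000 && maxv > 99999999ULL)
		return (struct pct_unit){ "msec", 1000000, 2 };
	if (minv > 2000 && maxv > 99999)
		return (struct pct_unit){ "usec", 1000, 1 };
	return (struct pct_unit){ "nsec", 1, 0 };
}

/* Header line, labelled "clat" or " lat" like the patched code. */
int format_pct_header(char *buf, size_t size, bool is_clat,
		      unsigned long long minv, unsigned long long maxv)
{
	const char *pre = is_clat ? "clat" : " lat";

	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "    %s percentiles (%s):\n     |",
			pre, pick_pct_unit(minv, maxv).name);
}
```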
diff --git a/stat.h b/stat.h
index 132dee3..848331b 100644
--- a/stat.h
+++ b/stat.h
@@ -172,7 +172,8 @@ struct thread_stat {
 	/*
 	 * IO depth and latency stats
 	 */
-	uint64_t clat_percentiles;
+	uint32_t clat_percentiles;
+	uint32_t lat_percentiles;
 	uint64_t percentile_precision;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
diff --git a/thread_options.h b/thread_options.h
index fd6576e..1813cdc 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -240,6 +240,7 @@ struct thread_options {
 	unsigned int trim_zero;
 	unsigned long long trim_backlog;
 	unsigned int clat_percentiles;
+	unsigned int lat_percentiles;
 	unsigned int percentile_precision;	/* digits after decimal for percentiles */
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
@@ -343,7 +344,7 @@ struct thread_options_pack {
 	uint32_t iodepth_batch_complete_min;
 	uint32_t iodepth_batch_complete_max;
 	uint32_t serialize_overlap;
-	uint32_t pad3;
+	uint32_t lat_percentiles;
 
 	uint64_t size;
 	uint64_t io_size;
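Two wire-format details in this change are easy to miss:

- In struct thread_options_pack, lat_percentiles takes over the former pad3 slot, so later fields keep their offsets.
- In struct thread_stat, clat_percentiles shrinks from uint64_t to uint32_t. That is why client.c and server.c switch those fields from 64-bit to 32-bit little-endian conversions, alongside the FIO_SERVER_VER bump from 65 to 66.

Below is a compile-time check of the first point. It uses hypothetical cut-down structs rather than fio's real definitions, plus C11 static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: a 32-bit pad ahead of a 64-bit field. */
struct pack_before {
	uint32_t serialize_overlap;
	uint32_t pad3;
	uint64_t size;
};

/* Hypothetical "after" layout: the pad slot now carries real data. */
struct pack_after {
	uint32_t serialize_overlap;
	uint32_t lat_percentiles;
	uint64_t size;
};

/* Reusing the padding must not change the size or later offsets. */
static_assert(sizeof(struct pack_before) == sizeof(struct pack_after),
	      "struct size changed");
static_assert(offsetof(struct pack_before, size) ==
	      offsetof(struct pack_after, size),
	      "offset of a later field changed");
```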
diff --git a/time.c b/time.c
index 0798419..c887682 100644
--- a/time.c
+++ b/time.c
@@ -97,16 +97,17 @@ bool in_ramp_time(struct thread_data *td)
 	return td->o.ramp_time && !td->ramp_time_over;
 }
 
-static void parent_update_ramp(struct thread_data *td)
+static bool parent_update_ramp(struct thread_data *td)
 {
 	struct thread_data *parent = td->parent;
 
 	if (!parent || parent->ramp_time_over)
-		return;
+		return false;
 
 	reset_all_stats(parent);
-	parent->ramp_time_over = 1;
+	parent->ramp_time_over = true;
 	td_set_runstate(parent, TD_RAMP);
+	return true;
 }
 
 bool ramp_time_over(struct thread_data *td)
@@ -115,10 +116,18 @@ bool ramp_time_over(struct thread_data *td)
 		return true;
 
 	if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
-		td->ramp_time_over = 1;
+		td->ramp_time_over = true;
 		reset_all_stats(td);
 		td_set_runstate(td, TD_RAMP);
-		parent_update_ramp(td);
+
+		/*
+		 * If we have a parent, the parent isn't doing IO. Hence
+		 * the parent never enters do_io(), which will switch us
+		 * from RAMP -> RUNNING. Do this manually here.
+		 */
+		if (parent_update_ramp(td))
+			td_set_runstate(td, TD_RUNNING);
+
 		return true;
 	}
 
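With this change, parent_update_ramp() reports whether it actually ended the parent's ramp period. ramp_time_over() uses that result to set the calling thread to TD_RUNNING itself. As the new comment says, a parent that does no IO never enters do_io(), so it never makes the RAMP -> RUNNING switch on its own. Below is a simplified model of that transition. The types and state names are hypothetical stand-ins for struct thread_data and its runstates, and the stats resets are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for thread_data runstates. */
enum job_state { JOB_STARTED, JOB_RAMP, JOB_RUNNING };

struct job {
	struct job *parent;
	bool ramp_time_over;
	enum job_state state;
};

/* Like parent_update_ramp(): true only if this call ended the parent's ramp. */
static bool parent_update_ramp(struct job *j)
{
	struct job *parent = j->parent;

	if (!parent || parent->ramp_time_over)
		return false;
	parent->ramp_time_over = true;	/* stats reset omitted */
	parent->state = JOB_RAMP;
	return true;
}

/* Like the ramp-expired branch of ramp_time_over(). */
void end_ramp(struct job *j)
{
	assert(j != NULL);
	j->ramp_time_over = true;	/* stats reset omitted */
	j->state = JOB_RAMP;

	/* The parent never enters do_io(), which would normally switch
	 * RAMP -> RUNNING, so do it here when the parent was just updated. */
	if (parent_update_ramp(j))
		j->state = JOB_RUNNING;
}
```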

* Recent changes (master)
@ 2017-09-13 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-13 12:00 UTC
  To: fio

The following changes since commit c6fa271e32f08f35d7fc25272e77c0f7ee17bfec:

  Merge branch 'solaris-clock-setaffinity' of https://github.com/szaydel/fio (2017-09-11 14:26:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 956e60eade2bddb8aadfb54b58030e0b88fd03b2:

  io_u: fix trimming of mixed block size randommap (2017-09-12 14:02:34 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      io_u: fix trimming of mixed block size randommap

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index db043e4..e98cd31 100644
--- a/io_u.c
+++ b/io_u.c
@@ -37,7 +37,7 @@ static bool random_map_free(struct fio_file *f, const uint64_t block)
  */
 static void mark_random_map(struct thread_data *td, struct io_u *io_u)
 {
-	unsigned int min_bs = td->o.rw_min_bs;
+	unsigned int min_bs = td->o.min_bs[io_u->ddir];
 	struct fio_file *f = io_u->file;
 	unsigned int nr_blocks;
 	uint64_t block;
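The io_u.c fix sizes random-map blocks with the minimum block size of the IO's own data direction (td->o.min_bs[io_u->ddir]) rather than the job-wide td->o.rw_min_bs. The hypothetical helper below gives a rough illustration of why that choice matters with mixed block sizes. It counts how many map blocks a byte range spans for a given block size. It is a sketch, not fio's mark_random_map() arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Number of min_bs-sized map blocks touched by [offset, offset + len). */
uint64_t map_blocks_spanned(uint64_t offset, uint64_t len, unsigned int min_bs)
{
	uint64_t first, last;

	assert(min_bs > 0 && len > 0);
	first = offset / min_bs;
	last = (offset + len - 1) / min_bs;
	return last - first + 1;
}
```

The same 64 KiB IO covers 16 blocks when measured in 4 KiB units but a single block in 64 KiB units. Using another direction's minimum can therefore mark a different number of blocks than intended.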

* Recent changes (master)
@ 2017-09-12 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-12 12:00 UTC
  To: fio

The following changes since commit 83a9e706745e5a5affd5475b884e42d0100f783f:

  Merge branch 'windows_io_hint' of https://github.com/sitsofe/fio (2017-09-05 15:37:36 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c6fa271e32f08f35d7fc25272e77c0f7ee17bfec:

  Merge branch 'solaris-clock-setaffinity' of https://github.com/szaydel/fio (2017-09-11 14:26:01 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'verify_trigger' of https://github.com/sitsofe/fio
      Merge branch 'solaris-clock-setaffinity' of https://github.com/szaydel/fio

Sam Zaydel (1):
      Fix clock setaffinity failed error which occurs on Solaris and Solaris derivatives such as Illumos.

Sitsofe Wheeler (1):
      backend: verify-trigger fixes

 backend.c       | 9 +++++++--
 os/os-solaris.h | 2 +-
 server.c        | 1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d2675b4..6198c3d 100644
--- a/backend.c
+++ b/backend.c
@@ -1391,6 +1391,8 @@ static bool keep_running(struct thread_data *td)
 
 	if (td->done)
 		return false;
+	if (td->terminate)
+		return false;
 	if (td->o.time_based)
 		return true;
 	if (td->o.loops) {
@@ -2042,7 +2044,10 @@ static bool __check_trigger_file(void)
 static bool trigger_timedout(void)
 {
 	if (trigger_timeout)
-		return time_since_genesis() >= trigger_timeout;
+		if (time_since_genesis() >= trigger_timeout) {
+			trigger_timeout = 0;
+			return true;
+		}
 
 	return false;
 }
@@ -2051,7 +2056,7 @@ void exec_trigger(const char *cmd)
 {
 	int ret;
 
-	if (!cmd)
+	if (!cmd || cmd[0] == '\0')
 		return;
 
 	ret = system(cmd);
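The backend.c change makes two behaviours explicit:

- trigger_timedout() clears trigger_timeout once it fires, so the timeout is reported only once.
- exec_trigger() treats an empty command string the same as a NULL one.

Below is a self-contained sketch of both. The elapsed time is passed in instead of coming from time_since_genesis(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fio's trigger_timeout; 0 means disarmed. */
static unsigned long long trigger_timeout;

void arm_trigger(unsigned long long timeout)
{
	assert(timeout > 0);
	trigger_timeout = timeout;
}

/* One-shot check: disarms itself the first time it reports expiry. */
bool trigger_timed_out(unsigned long long elapsed)
{
	if (trigger_timeout && elapsed >= trigger_timeout) {
		trigger_timeout = 0;
		return true;
	}
	return false;
}

/* Mirrors the new exec_trigger() guard: NULL or "" means nothing to run. */
bool trigger_cmd_runnable(const char *cmd)
{
	return cmd != NULL && cmd[0] != '\0';
}
```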
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 6af25d2..45268b2 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -97,7 +97,7 @@ static inline int fio_set_odirect(struct fio_file *f)
  * pset binding hooks for fio
  */
 #define fio_setaffinity(pid, cpumask)		\
-	pset_bind((cpumask), P_PID, (pid), NULL)
+	pset_bind((cpumask), P_LWPID, (pid), NULL)
 #define fio_getaffinity(pid, ptr)	({ 0; })
 
 #define fio_cpu_clear(mask, cpu)	pset_assign(PS_NONE, (cpu), NULL)
diff --git a/server.c b/server.c
index a640fe3..2c08c3e 100644
--- a/server.c
+++ b/server.c
@@ -970,6 +970,7 @@ static int handle_trigger_cmd(struct fio_net_cmd *cmd)
 	} else
 		fio_net_queue_cmd(FIO_NET_CMD_VTRIGGER, rep, sz, NULL, SK_F_FREE | SK_F_INLINE);
 
+	fio_terminate_threads(TERMINATE_ALL);
 	exec_trigger(buf);
 	return 0;
 }

* Recent changes (master)
@ 2017-09-06 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-06 12:00 UTC
  To: fio

The following changes since commit 07dff7d1d614b33e3a6d3e3ade38ce648b53a632:

  Merge branch 'shifted_logging' of https://github.com/sitsofe/fio (2017-09-02 17:00:24 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 83a9e706745e5a5affd5475b884e42d0100f783f:

  Merge branch 'windows_io_hint' of https://github.com/sitsofe/fio (2017-09-05 15:37:36 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      filesetup: revert O_DIRECT for layout mess
      Merge branch 'gluster_printf' of https://github.com/sitsofe/fio
      Merge branch 'travis_32bit' of https://github.com/sitsofe/fio
      Merge branch 'windows_io_hint' of https://github.com/sitsofe/fio

Sitsofe Wheeler (3):
      travis: add 32 bit build, minor updates and cleanups
      glusterfs: silence printf specifier warnings
      windowsaio: obey sequential/random I/O hinting

 .travis.yml               | 29 +++++++++++++++++++++++++----
 engines/glusterfs.c       |  8 ++++----
 engines/glusterfs_async.c |  2 +-
 engines/windowsaio.c      | 23 ++++++++++++++++++-----
 filesetup.c               | 25 ++++++-------------------
 5 files changed, 54 insertions(+), 33 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 795c0fc..94f69fb 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,12 +5,16 @@ compiler:
   - clang
   - gcc
 env:
+  matrix:
+    - BUILD_ARCH="x86"
+    - BUILD_ARCH="x86_64"
   global:
     - MAKEFLAGS="-j 2"
 matrix:
   include:
     - os: osx
       compiler: clang # Workaround travis setting CC=["clang", "gcc"]
+      env: BUILD_ARCH="x86_64"
     # Build using the 10.12 SDK but target and run on OSX 10.11
 #   - os: osx
 #     compiler: clang
@@ -19,12 +23,29 @@ matrix:
     # Build on the latest OSX version (will eventually become obsolete)
     - os: osx
       compiler: clang
-      osx_image: xcode8.2
+      osx_image: xcode8.3
+      env: BUILD_ARCH="x86_64"
   exclude:
     - os: osx
       compiler: gcc
+  exclude:
+    - os: linux
+      compiler: clang
+      env: BUILD_ARCH="x86" # Only do the gcc x86 build to reduce clutter
 before_install:
-  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get -qq update; fi
-  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev librbd-dev glusterfs-common libibverbs-dev librdmacm-dev; fi
+  - EXTRA_CFLAGS="-Werror"
+  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
+        pkgs=(libaio-dev libnuma-dev libz-dev librbd-dev libibverbs-dev librdmacm-dev);
+        if [[ "$BUILD_ARCH" == "x86" ]]; then
+            pkgs=("${pkgs[@]/%/:i386}");
+            pkgs+=(gcc-multilib);
+            EXTRA_CFLAGS="${EXTRA_CFLAGS} -m32";
+        else
+            pkgs+=(glusterfs-common);
+        fi;
+        sudo apt-get -qq update;
+        sudo apt-get install --no-install-recommends -qq -y "${pkgs[@]}";
+    fi
 script:
-  - ./configure --extra-cflags="-Werror" && make && make test
+  - ./configure --extra-cflags="${EXTRA_CFLAGS}" && make
+  - make test
diff --git a/engines/glusterfs.c b/engines/glusterfs.c
index 2abc283..981dfa3 100644
--- a/engines/glusterfs.c
+++ b/engines/glusterfs.c
@@ -165,11 +165,11 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 	if (td_read(td)) {
 		if (glfs_lstat(g->fs, f->file_name, &sb)
 		    || sb.st_size < f->real_file_size) {
-			dprint(FD_FILE, "fio extend file %s from %ld to %ld\n",
-			       f->file_name, sb.st_size, f->real_file_size);
+			dprint(FD_FILE, "fio extend file %s from %jd to %" PRIu64 "\n",
+			       f->file_name, (intmax_t) sb.st_size, f->real_file_size);
 			ret = glfs_ftruncate(g->fd, f->real_file_size);
 			if (ret) {
-				log_err("failed fio extend file %s to %ld\n",
+				log_err("failed fio extend file %s to %" PRIu64 "\n",
 					f->file_name, f->real_file_size);
 			} else {
 				unsigned long long left;
@@ -190,7 +190,7 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 
 					r = glfs_write(g->fd, b, bs, 0);
 					dprint(FD_IO,
-					       "fio write %d of %ld file %s\n",
+					       "fio write %d of %" PRIu64 " file %s\n",
 					       r, f->real_file_size,
 					       f->file_name);
 
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index f46cb26..97271d6 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -92,7 +92,7 @@ static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
 	struct io_u *io_u = data;
 	struct fio_gf_iou *iou = io_u->engine_data;
 
-	dprint(FD_IO, "%s ret %lu\n", __FUNCTION__, ret);
+	dprint(FD_IO, "%s ret %zd\n", __FUNCTION__, ret);
 	iou->io_complete = 1;
 }
 
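The glusterfs changes replace %ld with specifiers that match each argument's type:

- the PRIu64 macro from <inttypes.h> for real_file_size;
- %jd with an explicit (intmax_t) cast for st_size, an off_t whose width can differ between builds;
- %zd for the ssize_t callback result.

Below is a standalone illustration of the same pattern. The function is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Format a message with specifiers that match each argument's type. */
int format_extend_msg(char *buf, size_t size, const char *name,
		      off_t cur_size, uint64_t new_size, ssize_t ret)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size,
			"extend %s from %jd to %" PRIu64 " (ret %zd)",
			name, (intmax_t) cur_size, new_size, ret);
}
```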
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index c4c5abd..314eaad 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -169,13 +169,26 @@ static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 
 	/*
 	 * Inform Windows whether we're going to be doing sequential or
-	 * random io so it can tune the Cache Manager
+	 * random IO so it can tune the Cache Manager
 	 */
-	if (td->o.td_ddir == TD_DDIR_READ  ||
-		td->o.td_ddir == TD_DDIR_WRITE)
-		flags |= FILE_FLAG_SEQUENTIAL_SCAN;
-	else
+	switch (td->o.fadvise_hint) {
+	case F_ADV_TYPE:
+		if (td_random(td))
+			flags |= FILE_FLAG_RANDOM_ACCESS;
+		else
+			flags |= FILE_FLAG_SEQUENTIAL_SCAN;
+		break;
+	case F_ADV_RANDOM:
 		flags |= FILE_FLAG_RANDOM_ACCESS;
+		break;
+	case F_ADV_SEQUENTIAL:
+		flags |= FILE_FLAG_SEQUENTIAL_SCAN;
+		break;
+	case F_ADV_NONE:
+		break;
+	default:
+		log_err("fio: unknown fadvise type %d\n", td->o.fadvise_hint);
+	}
 
 	if (!td_write(td) || read_only)
 		access = GENERIC_READ;
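The windowsaio engine now takes its access-pattern flag from the fadvise_hint option instead of from the read/write direction alone:

- The type-based hint (F_ADV_TYPE) follows the job's random or sequential pattern.
- Explicit random and sequential hints map directly to the matching flag.
- F_ADV_NONE adds nothing.
- Any other value is logged as an error.

The mapping can be written as a pure function. The enum and flag values below are hypothetical stand-ins for fio's F_ADV_* values and Windows' FILE_FLAG_* constants, so the sketch builds without <windows.h>.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for fio's F_ADV_* and Windows' FILE_FLAG_*. */
enum adv_hint { ADV_NONE, ADV_TYPE, ADV_RANDOM, ADV_SEQUENTIAL };
#define FLAG_RANDOM_ACCESS	0x1u
#define FLAG_SEQUENTIAL_SCAN	0x2u

/* Returns the access-pattern flag to OR into the open flags. */
unsigned int hint_to_flags(enum adv_hint hint, bool random_io)
{
	switch (hint) {
	case ADV_TYPE:
		return random_io ? FLAG_RANDOM_ACCESS : FLAG_SEQUENTIAL_SCAN;
	case ADV_RANDOM:
		return FLAG_RANDOM_ACCESS;
	case ADV_SEQUENTIAL:
		return FLAG_SEQUENTIAL_SCAN;
	case ADV_NONE:
		return 0;
	}
	assert(!"unknown hint");	/* the real code logs an error instead */
	return 0;
}
```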
diff --git a/filesetup.c b/filesetup.c
index 5e8ea35..b51ab35 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -15,7 +15,6 @@
 #include "os/os.h"
 #include "hash.h"
 #include "lib/axmap.h"
-#include "lib/memalign.h"
 
 #ifdef CONFIG_LINUX_FALLOCATE
 #include <linux/falloc.h>
@@ -110,7 +109,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 {
 	int new_layout = 0, unlink_file = 0, flags;
 	unsigned long long left;
-	unsigned int bs, alloc_size = 0;
+	unsigned int bs;
 	char *b = NULL;
 
 	if (read_only) {
@@ -147,8 +146,6 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		flags |= O_CREAT;
 	if (new_layout)
 		flags |= O_TRUNC;
-	if (td->o.odirect)
-		flags |= OS_O_DIRECT;
 
 #ifdef WIN32
 	flags |= _O_BINARY;
@@ -162,14 +159,8 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		if (err == ENOENT && !td->o.allow_create)
 			log_err("fio: file creation disallowed by "
 					"allow_file_create=0\n");
-		else {
-			if (err == EINVAL && (flags & OS_O_DIRECT))
-				log_err("fio: looks like your filesystem "
-					"does not support "
-					"direct=1/buffered=0\n");
-
+		else
 			td_verror(td, err, "open");
-		}
 		return 1;
 	}
 
@@ -196,18 +187,14 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
-	if (td->o.odirect && !OS_O_DIRECT && fio_set_directio(td, f))
-		goto err;
-
 	left = f->real_file_size;
 	bs = td->o.max_bs[DDIR_WRITE];
 	if (bs > left)
 		bs = left;
 
-	alloc_size = bs;
-	b = fio_memalign(page_size, alloc_size);
+	b = malloc(bs);
 	if (!b) {
-		td_verror(td, errno, "fio_memalign");
+		td_verror(td, errno, "malloc");
 		goto err;
 	}
 
@@ -260,14 +247,14 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 			f->io_size = f->real_file_size;
 	}
 
-	fio_memfree(b, alloc_size);
+	free(b);
 done:
 	return 0;
 err:
 	close(f->fd);
 	f->fd = -1;
 	if (b)
-		fio_memfree(b, alloc_size);
+		free(b);
 	return 1;
 }
 

* Recent changes (master)
@ 2017-09-03 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-03 12:00 UTC
  To: fio

The following changes since commit ba872a0beb498740f076e19299bb7388b82ad4d6:

  revert/rework 81647a9a('fix load_ioengine() not to support no "external:" prefix') (2017-09-01 13:58:35 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 07dff7d1d614b33e3a6d3e3ade38ce648b53a632:

  Merge branch 'shifted_logging' of https://github.com/sitsofe/fio (2017-09-02 17:00:24 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'shifted_logging' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      stat: fix shifted windowed logging when using multiple directions

 iolog.h |  2 +-
 stat.c  | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.h b/iolog.h
index d157fa2..bc3a0b5 100644
--- a/iolog.h
+++ b/iolog.h
@@ -117,7 +117,7 @@ struct io_log {
 	 */
 	struct io_stat avg_window[DDIR_RWDIR_CNT];
 	unsigned long avg_msec;
-	unsigned long avg_last;
+	unsigned long avg_last[DDIR_RWDIR_CNT];
 
 	/*
 	 * Windowed latency histograms, for keeping track of when we need to
diff --git a/stat.c b/stat.c
index 91c74ab..63353cc 100644
--- a/stat.c
+++ b/stat.c
@@ -2159,7 +2159,7 @@ static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 	if (iolog->disabled)
 		return;
 	if (flist_empty(&iolog->io_logs))
-		iolog->avg_last = t;
+		iolog->avg_last[ddir] = t;
 
 	cur_log = get_cur_log(iolog);
 	if (cur_log) {
@@ -2290,9 +2290,9 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 * If period hasn't passed, adding the above sample is all we
 	 * need to do.
 	 */
-	this_window = elapsed - iolog->avg_last;
-	if (elapsed < iolog->avg_last)
-		return iolog->avg_last - elapsed;
+	this_window = elapsed - iolog->avg_last[ddir];
+	if (elapsed < iolog->avg_last[ddir])
+		return iolog->avg_last[ddir] - elapsed;
 	else if (this_window < iolog->avg_msec) {
 		int diff = iolog->avg_msec - this_window;
 
@@ -2300,9 +2300,9 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 			return diff;
 	}
 
-	_add_stat_to_log(iolog, elapsed, td->o.log_max != 0);
+	__add_stat_to_log(iolog, ddir, elapsed, td->o.log_max != 0);
 
-	iolog->avg_last = elapsed - (this_window - iolog->avg_msec);
+	iolog->avg_last[ddir] = elapsed - (this_window - iolog->avg_msec);
 	return iolog->avg_msec;
 }
 

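The shifted-logging fix replaces the single `avg_last` timestamp with one per data direction. Presumably this stops a flush in one direction from moving the averaging window of the other.

A minimal standalone sketch of that per-direction bookkeeping follows. `struct win_log`, `DIR_CNT` and `window_due()` are hypothetical names, not fio's API; only the timestamp arithmetic mirrors the `add_log_sample()` hunk.

```c
#include <assert.h>

/* Hypothetical stand-ins for fio's DDIR_READ, DDIR_WRITE and DDIR_RWDIR_CNT. */
enum { DIR_READ, DIR_WRITE, DIR_CNT };

struct win_log {
	unsigned long avg_msec;          /* averaging window length in ms */
	unsigned long avg_last[DIR_CNT]; /* window start, one per direction */
};

/*
 * Returns 0 when the window for 'dir' has closed at 'elapsed' and moves
 * that direction's window start forward; otherwise returns the ms still
 * left in the window. Other directions are never touched.
 */
static unsigned long window_due(struct win_log *wl, int dir,
				unsigned long elapsed)
{
	unsigned long this_window;

	assert(dir >= 0 && dir < DIR_CNT);

	if (elapsed < wl->avg_last[dir])
		return wl->avg_last[dir] - elapsed;

	this_window = elapsed - wl->avg_last[dir];
	if (this_window < wl->avg_msec)
		return wl->avg_msec - this_window;

	/* Same as advancing the window start by exactly avg_msec. */
	wl->avg_last[dir] = elapsed - (this_window - wl->avg_msec);
	return 0;
}
```

In this sketch, closing a read window at 600 ms with `avg_msec = 500` moves only `avg_last[DIR_READ]` to 500. A write sample at 300 ms still sees 200 ms left in its own window.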

* Recent changes (master)
@ 2017-09-02 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-02 12:00 UTC
  To: fio

The following changes since commit 83070ccd1091a1c44ac838f95bab6811cbc287f5:

  t/axmap: we don't need smalloc/sfree wrappers (2017-08-31 08:34:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ba872a0beb498740f076e19299bb7388b82ad4d6:

  revert/rework 81647a9a('fix load_ioengine() not to support no "external:" prefix') (2017-09-01 13:58:35 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (7):
      skeleton: add option example
      fix broken external ioengine option
      cleanup ioengine_load() (for the next commit)
      fix load_ioengine() not to support no "external:" prefix
      add __load_ioengine() to separate ioengine loading from td context
      fix regression by 8c43ba62('filesetup: align layout buffer')
      revert/rework 81647a9a('fix load_ioengine() not to support no "external:" prefix')

 HOWTO                       |  4 +++-
 engines/skeleton_external.c | 32 +++++++++++++++++++++++++++++++-
 filesetup.c                 |  9 +++++----
 fio.1                       |  4 +++-
 init.c                      | 21 ++-------------------
 ioengines.c                 | 41 ++++++++++++++++++++++++++++++-----------
 ioengines.h                 |  2 +-
 options.c                   | 34 ++++++++++++++++++++++++++++++++++
 thread_options.h            |  1 +
 9 files changed, 110 insertions(+), 38 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 3a720c3..2a70b7c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1791,7 +1791,9 @@ I/O engine
 		**external**
 			Prefix to specify loading an external I/O engine object file. Append
 			the engine filename, e.g. ``ioengine=external:/tmp/foo.o`` to load
-			ioengine :file:`foo.o` in :file:`/tmp`.
+			ioengine :file:`foo.o` in :file:`/tmp`. The path can be either
+			absolute or relative. See :file:`engines/skeleton_external.c` for
+			details of writing an external I/O engine.
 
 
 I/O engine specific parameters
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 4bebcc4..56f89f9 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -3,7 +3,8 @@
  *
  * Should be compiled with:
  *
- * gcc -Wall -O2 -g -shared -rdynamic -fPIC -o engine.o engine.c
+ * gcc -Wall -O2 -g -shared -rdynamic -fPIC -o skeleton_external.o skeleton_external.c
+ * (also requires -D_GNU_SOURCE -DCONFIG_STRSEP on Linux)
  *
  */
 #include <stdio.h>
@@ -13,6 +14,7 @@
 #include <assert.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 
 /*
  * The core of the module is identical to the ones included with fio,
@@ -21,6 +23,32 @@
  */
 
 /*
+ * The io engine can define its own options within the io engine source.
+ * The option member must not be at offset 0, due to the way fio parses
+ * the given option. Just add a padding pointer unless the io engine has
+ * something usable.
+ */
+struct fio_skeleton_options {
+	void *pad; /* avoid ->off1 of fio_option becomes 0 */
+	unsigned int dummy;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "dummy",
+		.lname	= "ldummy",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct fio_skeleton_options, dummy),
+		.help	= "Set dummy",
+		.category = FIO_OPT_C_ENGINE, /* always use this */
+		.group	= FIO_OPT_G_INVALID, /* this can be different */
+	},
+	{
+		.name	= NULL,
+	},
+};
+
+/*
  * The ->event() hook is called to match an event number with an io_u.
  * After the core has called ->getevents() and it has returned eg 3,
  * the ->event() hook must return the 3 events that have completed for
@@ -140,4 +168,6 @@ struct ioengine_ops ioengine = {
 	.cleanup	= fio_skeleton_cleanup,
 	.open_file	= fio_skeleton_open,
 	.close_file	= fio_skeleton_close,
+	.options	= options,
+	.option_struct_size	= sizeof(struct fio_skeleton_options),
 };
diff --git a/filesetup.c b/filesetup.c
index c4240d2..5e8ea35 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -110,7 +110,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 {
 	int new_layout = 0, unlink_file = 0, flags;
 	unsigned long long left;
-	unsigned int bs;
+	unsigned int bs, alloc_size = 0;
 	char *b = NULL;
 
 	if (read_only) {
@@ -204,7 +204,8 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	if (bs > left)
 		bs = left;
 
-	b = fio_memalign(page_size, bs);
+	alloc_size = bs;
+	b = fio_memalign(page_size, alloc_size);
 	if (!b) {
 		td_verror(td, errno, "fio_memalign");
 		goto err;
@@ -259,14 +260,14 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 			f->io_size = f->real_file_size;
 	}
 
-	fio_memfree(b, bs);
+	fio_memfree(b, alloc_size);
 done:
 	return 0;
 err:
 	close(f->fd);
 	f->fd = -1;
 	if (b)
-		fio_memfree(b, bs);
+		fio_memfree(b, alloc_size);
 	return 1;
 }
 
diff --git a/fio.1 b/fio.1
index 5b63dfd..97133da 100644
--- a/fio.1
+++ b/fio.1
@@ -1572,7 +1572,9 @@ Read and write using device DAX to a persistent memory device (e.g.,
 .B external
 Prefix to specify loading an external I/O engine object file. Append
 the engine filename, e.g. `ioengine=external:/tmp/foo.o' to load
-ioengine `foo.o' in `/tmp'.
+ioengine `foo.o' in `/tmp'. The path can be either
+absolute or relative. See `engines/skeleton_external.c' in the fio source for
+details of writing an external I/O engine.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
diff --git a/init.c b/init.c
index 625c937..cf5c646 100644
--- a/init.c
+++ b/init.c
@@ -912,20 +912,6 @@ static int fixup_options(struct thread_data *td)
 	return ret;
 }
 
-/* External engines are specified by "external:name.o") */
-static const char *get_engine_name(const char *str)
-{
-	char *p = strstr(str, ":");
-
-	if (!p)
-		return str;
-
-	p++;
-	strip_blank_front(&p);
-	strip_blank_end(p);
-	return p;
-}
-
 static void init_rand_file_service(struct thread_data *td)
 {
 	unsigned long nranges = td->o.nr_files << FIO_FSERVICE_SHIFT;
@@ -1037,8 +1023,6 @@ void td_fill_rand_seeds(struct thread_data *td)
  */
 int ioengine_load(struct thread_data *td)
 {
-	const char *engine;
-
 	if (!td->o.ioengine) {
 		log_err("fio: internal fault, no IO engine specified\n");
 		return 1;
@@ -1057,10 +1041,9 @@ int ioengine_load(struct thread_data *td)
 		free_ioengine(td);
 	}
 
-	engine = get_engine_name(td->o.ioengine);
-	td->io_ops = load_ioengine(td, engine);
+	td->io_ops = load_ioengine(td);
 	if (!td->io_ops) {
-		log_err("fio: failed to load engine %s\n", engine);
+		log_err("fio: failed to load engine\n");
 		return 1;
 	}
 
diff --git a/ioengines.c b/ioengines.c
index 919781c..fa4acab 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -123,13 +123,10 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 	return ops;
 }
 
-struct ioengine_ops *load_ioengine(struct thread_data *td, const char *name)
+static struct ioengine_ops *__load_ioengine(const char *name)
 {
-	struct ioengine_ops *ops;
 	char engine[64];
 
-	dprint(FD_IO, "load ioengine %s\n", name);
-
 	engine[sizeof(engine) - 1] = '\0';
 	strncpy(engine, name, sizeof(engine) - 1);
 
@@ -139,10 +136,37 @@ struct ioengine_ops *load_ioengine(struct thread_data *td, const char *name)
 	if (!strncmp(engine, "linuxaio", 8) || !strncmp(engine, "aio", 3))
 		strcpy(engine, "libaio");
 
-	ops = find_ioengine(engine);
+	dprint(FD_IO, "load ioengine %s\n", engine);
+	return find_ioengine(engine);
+}
+
+struct ioengine_ops *load_ioengine(struct thread_data *td)
+{
+	struct ioengine_ops *ops = NULL;
+	const char *name;
+
+	/*
+	 * Use ->ioengine_so_path if an external ioengine path is specified.
+	 * In this case, ->ioengine is "external" which also means the prefix
+	 * for external ioengines "external:" is properly used.
+	 */
+	name = td->o.ioengine_so_path ?: td->o.ioengine;
+
+	/*
+	 * Try to load ->ioengine first, and if failed try to dlopen(3) either
+	 * ->ioengine or ->ioengine_so_path.  This is redundant for an external
+	 * ioengine with prefix, and also leaves the possibility of unexpected
+	 * behavior (e.g. if the "external" ioengine exists), but we do this
+	 * so as not to break job files not using the prefix.
+	 */
+	ops = __load_ioengine(td->o.ioengine);
 	if (!ops)
 		ops = dlopen_ioengine(td, name);
 
+	/*
+	 * If ops is NULL, we failed to load ->ioengine, and also failed to
+	 * dlopen(3) either ->ioengine or ->ioengine_so_path as a path.
+	 */
 	if (!ops) {
 		log_err("fio: engine %s not loadable\n", name);
 		return NULL;
@@ -552,7 +576,6 @@ int td_io_get_file_size(struct thread_data *td, struct fio_file *f)
 int fio_show_ioengine_help(const char *engine)
 {
 	struct flist_head *entry;
-	struct thread_data td;
 	struct ioengine_ops *io_ops;
 	char *sep;
 	int ret = 1;
@@ -571,9 +594,7 @@ int fio_show_ioengine_help(const char *engine)
 		sep++;
 	}
 
-	memset(&td, 0, sizeof(td));
-
-	io_ops = load_ioengine(&td, engine);
+	io_ops = __load_ioengine(engine);
 	if (!io_ops) {
 		log_info("IO engine %s not found\n", engine);
 		return 1;
@@ -584,7 +605,5 @@ int fio_show_ioengine_help(const char *engine)
 	else
 		log_info("IO engine %s has no options\n", io_ops->name);
 
-	free_ioengine(&td);
-
 	return ret;
 }
diff --git a/ioengines.h b/ioengines.h
index f24f4df..177cbc0 100644
--- a/ioengines.h
+++ b/ioengines.h
@@ -79,7 +79,7 @@ extern int td_io_close_file(struct thread_data *, struct fio_file *);
 extern int td_io_unlink_file(struct thread_data *, struct fio_file *);
 extern int __must_check td_io_get_file_size(struct thread_data *, struct fio_file *);
 
-extern struct ioengine_ops *load_ioengine(struct thread_data *, const char *);
+extern struct ioengine_ops *load_ioengine(struct thread_data *);
 extern void register_ioengine(struct ioengine_ops *);
 extern void unregister_ioengine(struct ioengine_ops *);
 extern void free_ioengine(struct thread_data *);
diff --git a/options.c b/options.c
index 443791a..54fa4ee 100644
--- a/options.c
+++ b/options.c
@@ -1462,6 +1462,39 @@ static int str_write_hist_log_cb(void *data, const char *str)
 	return 0;
 }
 
+/*
+ * str is supposed to be a substring of the strdup'd original string,
+ * and is valid only if it's a regular file path.
+ * This function keeps the pointer to the path as needed later.
+ *
+ * "external:/path/to/so\0" <- original pointer updated with strdup'd
+ * "external\0"             <- above pointer after parsed, i.e. ->ioengine
+ *          "/path/to/so\0" <- str argument, i.e. ->ioengine_so_path
+ */
+static int str_ioengine_external_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	struct stat sb;
+	char *p;
+
+	if (!str) {
+		log_err("fio: null external ioengine path\n");
+		return 1;
+	}
+
+	p = (char *)str; /* str is mutable */
+	strip_blank_front(&p);
+	strip_blank_end(p);
+
+	if (stat(p, &sb) || !S_ISREG(sb.st_mode)) {
+		log_err("fio: invalid external ioengine path \"%s\"\n", p);
+		return 1;
+	}
+
+	td->o.ioengine_so_path = p;
+	return 0;
+}
+
 static int rw_verify(struct fio_option *o, void *data)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -1812,6 +1845,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #endif
 			  { .ival = "external",
 			    .help = "Load external engine (append name)",
+			    .cb = str_ioengine_external_cb,
 			  },
 		},
 	},
diff --git a/thread_options.h b/thread_options.h
index 26a3e0e..fd6576e 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -53,6 +53,7 @@ struct thread_options {
 	char *filename_format;
 	char *opendir;
 	char *ioengine;
+	char *ioengine_so_path;
 	char *mmapfile;
 	enum td_ddir td_ddir;
 	unsigned int rw_seq;

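The comment above `str_ioengine_external_cb()` describes how `ioengine=external:/path/to/so` becomes two strings in one strdup'd buffer:

- `->ioengine` ends up as `"external"`.
- `->ioengine_so_path` points at the trimmed path.

A rough standalone sketch of that in-place split follows. It assumes a plain `':'` separator and whitespace trimming similar to `strip_blank_front()`/`strip_blank_end()`. `split_engine_spec()` is a hypothetical helper, not part of fio, and it omits the `stat()`/`S_ISREG()` check the real callback performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a mutable "name:path" buffer in place.
 * Afterwards *name points at "name" and *path at the trimmed path,
 * or *path is NULL when the buffer contains no ':' separator.
 */
static void split_engine_spec(char *buf, char **name, char **path)
{
	char *sep, *end;

	assert(buf && name && path);
	*name = buf;
	*path = NULL;

	sep = strchr(buf, ':');
	if (!sep)
		return;

	*sep++ = '\0';                          /* terminate e.g. "external" */
	while (isspace((unsigned char)*sep))    /* trim leading blanks */
		sep++;
	end = sep + strlen(sep);
	while (end > sep && isspace((unsigned char)end[-1]))
		*--end = '\0';                  /* trim trailing blanks */
	*path = sep;
}
```

As in the fio callback, no copy is made. Both resulting pointers live inside the caller's buffer, so that buffer must outlive them.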

* Recent changes (master)
@ 2017-09-01 12:00 Jens Axboe
From: Jens Axboe @ 2017-09-01 12:00 UTC
  To: fio

The following changes since commit 2b2fa7f5ecfd5cd5b54a209934b05b770e9c9301:

  lib/axmap: a few fixes/cleanups (2017-08-30 13:03:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 83070ccd1091a1c44ac838f95bab6811cbc287f5:

  t/axmap: we don't need smalloc/sfree wrappers (2017-08-31 08:34:27 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'verify_warn' of https://github.com/sitsofe/fio
      t/axmap: we don't need smalloc/sfree wrappers

Sitsofe Wheeler (2):
      verify: make overwriting verified blocks warning more specific
      verify: warn when verify pass won't be run

 init.c    | 23 ++++++++++++++++++++---
 t/axmap.c | 10 ----------
 2 files changed, 20 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 164e411..625c937 100644
--- a/init.c
+++ b/init.c
@@ -748,13 +748,30 @@ static int fixup_options(struct thread_data *td)
 		o->size = -1ULL;
 
 	if (o->verify != VERIFY_NONE) {
-		if (td_write(td) && o->do_verify && o->numjobs > 1) {
-			log_info("Multiple writers may overwrite blocks that "
-				"belong to other jobs. This can cause "
+		if (td_write(td) && o->do_verify && o->numjobs > 1 &&
+		    (o->filename ||
+		     !(o->unique_filename &&
+		       strstr(o->filename_format, "$jobname") &&
+		       strstr(o->filename_format, "$jobnum") &&
+		       strstr(o->filename_format, "$filenum")))) {
+			log_info("fio: multiple writers may overwrite blocks "
+				"that belong to other jobs. This can cause "
 				"verification failures.\n");
 			ret = warnings_fatal;
 		}
 
+		/*
+		 * Warn if verification is requested but no verification of any
+		 * kind can be started due to time constraints
+		 */
+		if (td_write(td) && o->do_verify && o->timeout &&
+		    o->time_based && !td_read(td) && !o->verify_backlog) {
+			log_info("fio: verification read phase will never "
+				 "start because write phase uses all of "
+				 "runtime\n");
+			ret = warnings_fatal;
+		}
+
 		if (!fio_option_is_set(o, refill_buffers))
 			o->refill_buffers = 1;
 
diff --git a/t/axmap.c b/t/axmap.c
index e32ff98..a803ce4 100644
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -8,16 +8,6 @@
 #include "../lib/lfsr.h"
 #include "../lib/axmap.h"
 
-void *smalloc(size_t size)
-{
-	return malloc(size);
-}
-
-void sfree(void *ptr)
-{
-	free(ptr);
-}
-
 static int test_regular(size_t size, int seed)
 {
 	struct fio_lfsr lfsr;

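Both new checks in `fixup_options()` are plain predicates over the job options. The sketch below restates them as standalone functions so the conditions are easier to read.

`struct job_opts` and its fields are simplified, hypothetical stand-ins for fio's `thread_options` and for `td_write()`/`td_read()`. The NULL guard on `filename_format` is an addition not present in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of the options the two checks read. */
struct job_opts {
	bool writes, reads;             /* stand-ins for td_write()/td_read() */
	bool do_verify;
	unsigned int numjobs;
	const char *filename;           /* explicit filename=, or NULL */
	bool unique_filename;
	const char *filename_format;
	unsigned long long timeout;     /* runtime; 0 means unset */
	bool time_based;
	bool verify_backlog;
};

/* Several verifying writers share files unless every name is unique. */
static bool may_overwrite_other_jobs(const struct job_opts *o)
{
	bool unique_names;

	if (!(o->writes && o->do_verify && o->numjobs > 1))
		return false;

	unique_names = o->unique_filename && o->filename_format &&
		strstr(o->filename_format, "$jobname") &&
		strstr(o->filename_format, "$jobnum") &&
		strstr(o->filename_format, "$filenum");

	return o->filename != NULL || !unique_names;
}

/* A time_based write-only job spends the whole runtime writing. */
static bool verify_never_starts(const struct job_opts *o)
{
	return o->writes && o->do_verify && o->timeout && o->time_based &&
	       !o->reads && !o->verify_backlog;
}
```

For example, four verifying writers with `filename_format=$jobname.$jobnum.$filenum` do not trigger the first warning. Dropping `$filenum` from the format, or setting an explicit `filename`, does trigger it.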

* Recent changes (master)
@ 2017-08-31 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-31 12:00 UTC
  To: fio

The following changes since commit 63463983ce10e9678c5ad309608630eea873b4df:

  Merge branch 'asmfix' of https://github.com/oohal/fio (2017-08-29 15:35:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b2fa7f5ecfd5cd5b54a209934b05b770e9c9301:

  lib/axmap: a few fixes/cleanups (2017-08-30 13:03:26 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      lib/axmap: a few fixes/cleanups

 lib/axmap.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/lib/axmap.c b/lib/axmap.c
index 2ee3a25..bf203df 100644
--- a/lib/axmap.c
+++ b/lib/axmap.c
@@ -184,6 +184,9 @@ static bool axmap_clear_fn(struct axmap_level *al, unsigned long offset,
 void axmap_clear(struct axmap *axmap, uint64_t bit_nr)
 {
 	axmap_handler(axmap, bit_nr, axmap_clear_fn, NULL);
+
+	if (bit_nr < axmap->first_free)
+		axmap->first_free = bit_nr;
 }
 
 struct axmap_set_data {
@@ -191,7 +194,7 @@ struct axmap_set_data {
 	unsigned int set_bits;
 };
 
-static unsigned long bit_masks[] = {
+static const unsigned long bit_masks[] = {
 	0x0000000000000000, 0x0000000000000001, 0x0000000000000003, 0x0000000000000007,
 	0x000000000000000f, 0x000000000000001f, 0x000000000000003f, 0x000000000000007f,
 	0x00000000000000ff, 0x00000000000001ff, 0x00000000000003ff, 0x00000000000007ff,
@@ -372,10 +375,9 @@ static uint64_t axmap_find_first_free(struct axmap *axmap, unsigned int level,
 
 static uint64_t axmap_first_free(struct axmap *axmap)
 {
-	if (firstfree_valid(axmap))
-		return axmap->first_free;
+	if (!firstfree_valid(axmap))
+		axmap->first_free = axmap_find_first_free(axmap, axmap->nr_levels - 1, 0);
 
-	axmap->first_free = axmap_find_first_free(axmap, axmap->nr_levels - 1, 0);
 	return axmap->first_free;
 }
 

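The `lib/axmap` change treats `first_free` as a cached answer. Clearing a bit below the cached value lowers it, and `axmap_first_free()` only rescans when `firstfree_valid()` says the cache is stale.

The same caching idea on a single flat bitmap is sketched below. `struct flat_map`, its `first_free_valid` flag and the helper names are hypothetical. They do not reflect axmap's multi-level layout or how fio actually tracks validity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_BITS 128

/* Hypothetical flat bitmap with a cached "lowest free bit" value. */
struct flat_map {
	uint64_t words[MAP_BITS / 64];
	uint64_t first_free;
	bool first_free_valid;
};

static bool bit_is_set(const struct flat_map *m, uint64_t nr)
{
	return (m->words[nr / 64] >> (nr % 64)) & 1;
}

static void map_set(struct flat_map *m, uint64_t nr)
{
	assert(nr < MAP_BITS);
	m->words[nr / 64] |= UINT64_C(1) << (nr % 64);
	/* Filling the cached free bit makes the cache stale. */
	if (m->first_free_valid && nr == m->first_free)
		m->first_free_valid = false;
}

static void map_clear(struct flat_map *m, uint64_t nr)
{
	assert(nr < MAP_BITS);
	m->words[nr / 64] &= ~(UINT64_C(1) << (nr % 64));
	/* Like the axmap_clear() hunk: a freed bit below the cache lowers it. */
	if (m->first_free_valid && nr < m->first_free)
		m->first_free = nr;
}

/* Rescan only when the cache is stale; returns MAP_BITS if the map is full. */
static uint64_t map_first_free(struct flat_map *m)
{
	if (!m->first_free_valid) {
		uint64_t nr = 0;

		while (nr < MAP_BITS && bit_is_set(m, nr))
			nr++;
		m->first_free = nr;
		m->first_free_valid = true;
	}
	return m->first_free;
}
```

Setting any bit other than the cached one cannot change the lowest free bit, so only `map_set()` on that exact bit has to invalidate the cache.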

* Recent changes (master)
@ 2017-08-30 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-30 12:00 UTC
  To: fio

The following changes since commit a4befdc605bc253f258a99e9f0d037147775035e:

  Merge branch 'direct_layout_fix' of https://github.com/sitsofe/fio (2017-08-28 09:57:30 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63463983ce10e9678c5ad309608630eea873b4df:

  Merge branch 'asmfix' of https://github.com/oohal/fio (2017-08-29 15:35:49 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'asmfix' of https://github.com/oohal/fio

Oliver O'Halloran (1):
      arch/ppc: Fix get_cpu_clock asm clobber list

Tomohiro Kusumi (6):
      cleanup NetBSD/OpenBSD header
      HOWTO: add OpenBSD to direct I/O unsupported platform
      add ifdef __sun__ for Solaris specific code
      filesetup: add non O_DIRECT direct I/O support for initial layout setup
      change fio_set_odirect() prototype not to use int fd
      change os_trim() prototype not to use int fd

 HOWTO             |  2 +-
 arch/arch-ppc.h   |  3 ++-
 file.h            |  1 +
 filesetup.c       | 31 +++++++++++++++++++++++++++++++
 fio.1             |  2 +-
 io_u.c            |  2 +-
 ioengines.c       | 22 ++--------------------
 os/os-android.h   |  4 ++--
 os/os-dragonfly.h |  4 ++--
 os/os-freebsd.h   |  4 ++--
 os/os-linux.h     |  4 ++--
 os/os-mac.h       |  4 ++--
 os/os-netbsd.h    |  6 ++----
 os/os-openbsd.h   |  6 ++----
 os/os-solaris.h   |  4 ++--
 15 files changed, 55 insertions(+), 44 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f36d4a7..3a720c3 100644
--- a/HOWTO
+++ b/HOWTO
@@ -947,7 +947,7 @@ I/O type
 .. option:: direct=bool
 
 	If value is true, use non-buffered I/O. This is usually O_DIRECT. Note that
-	ZFS on Solaris doesn't support direct I/O.  On Windows the synchronous
+	OpenBSD and ZFS on Solaris don't support direct I/O.  On Windows the synchronous
 	ioengines don't support direct I/O.  Default: false.
 
 .. option:: atomic=bool
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index ba452b1..804d596 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -62,7 +62,8 @@ static inline unsigned long long get_cpu_clock(void)
 		"	cmpwi %0,0;\n"
 		"	beq-  90b;\n"
 	: "=r" (rval)
-	: "i" (SPRN_TBRL));
+	: "i" (SPRN_TBRL)
+	: "cr0");
 
 	return rval;
 }
diff --git a/file.h b/file.h
index 84daa5f..ad8802d 100644
--- a/file.h
+++ b/file.h
@@ -211,5 +211,6 @@ extern void filesetup_mem_free(void);
 extern void fio_file_reset(struct thread_data *, struct fio_file *);
 extern bool fio_files_done(struct thread_data *);
 extern bool exists_and_not_regfile(const char *);
+extern int fio_set_directio(struct thread_data *, struct fio_file *);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index a6a94ee..c4240d2 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -196,6 +196,9 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
+	if (td->o.odirect && !OS_O_DIRECT && fio_set_directio(td, f))
+		goto err;
+
 	left = f->real_file_size;
 	bs = td->o.max_bs[DDIR_WRITE];
 	if (bs > left)
@@ -1852,3 +1855,31 @@ void filesetup_mem_free(void)
 {
 	free_already_allocated();
 }
+
+/*
+ * This function is for platforms which support direct I/O but not O_DIRECT.
+ */
+int fio_set_directio(struct thread_data *td, struct fio_file *f)
+{
+#ifdef FIO_OS_DIRECTIO
+	int ret = fio_set_odirect(f);
+
+	if (ret) {
+		td_verror(td, ret, "fio_set_directio");
+#if defined(__sun__)
+		if (ret == ENOTTY) { /* ENOTTY suggests RAW device or ZFS */
+			log_err("fio: doing directIO to RAW devices or ZFS not supported\n");
+		} else {
+			log_err("fio: the file system does not seem to support direct IO\n");
+		}
+#else
+		log_err("fio: the file system does not seem to support direct IO\n");
+#endif
+		return -1;
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
diff --git a/fio.1 b/fio.1
index b8b3da2..5b63dfd 100644
--- a/fio.1
+++ b/fio.1
@@ -717,7 +717,7 @@ read. The two zone options can be used to only do I/O on zones of a file.
 .TP
 .BI direct \fR=\fPbool
 If value is true, use non\-buffered I/O. This is usually O_DIRECT. Note that
-ZFS on Solaris doesn't support direct I/O. On Windows the synchronous
+OpenBSD and ZFS on Solaris don't support direct I/O. On Windows the synchronous
 ioengines don't support direct I/O. Default: false.
 .TP
 .BI atomic \fR=\fPbool
diff --git a/io_u.c b/io_u.c
index ed8e84a..db043e4 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2188,7 +2188,7 @@ int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	int ret;
 
-	ret = os_trim(f->fd, io_u->offset, io_u->xfer_buflen);
+	ret = os_trim(f, io_u->offset, io_u->xfer_buflen);
 	if (!ret)
 		return io_u->xfer_buflen;
 
diff --git a/ioengines.c b/ioengines.c
index 6e6e3de..919781c 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -495,26 +495,8 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 	}
 #endif
 
-#ifdef FIO_OS_DIRECTIO
-	/*
-	 * Some OS's have a distinct call to mark the file non-buffered,
-	 * instead of using O_DIRECT (Solaris)
-	 */
-	if (td->o.odirect) {
-		int ret = fio_set_odirect(f->fd);
-
-		if (ret) {
-			td_verror(td, ret, "fio_set_odirect");
-			if (ret == ENOTTY) { /* ENOTTY suggests RAW device or ZFS */
-				log_err("fio: doing directIO to RAW devices or ZFS not supported\n");
-			} else {
-				log_err("fio: the file system does not seem to support direct IO\n");
-			}
-
-			goto err;
-		}
-	}
-#endif
+	if (td->o.odirect && !OS_O_DIRECT && fio_set_directio(td, f))
+		goto err;
 
 done:
 	log_file(td, f, FIO_LOG_OPEN_FILE);
diff --git a/os/os-android.h b/os/os-android.h
index b217daa..bb590e4 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -274,7 +274,7 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
-static inline int os_trim(int fd, unsigned long long start,
+static inline int os_trim(struct fio_file *f, unsigned long long start,
 			  unsigned long long len)
 {
 	uint64_t range[2];
@@ -282,7 +282,7 @@ static inline int os_trim(int fd, unsigned long long start,
 	range[0] = start;
 	range[1] = len;
 
-	if (!ioctl(fd, BLKDISCARD, range))
+	if (!ioctl(f->fd, BLKDISCARD, range))
 		return 0;
 
 	return errno;
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 8d15833..423b236 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -216,7 +216,7 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
-static inline int os_trim(int fd, unsigned long long start,
+static inline int os_trim(struct fio_file *f, unsigned long long start,
 			  unsigned long long len)
 {
 	off_t range[2];
@@ -224,7 +224,7 @@ static inline int os_trim(int fd, unsigned long long start,
 	range[0] = start;
 	range[1] = len;
 
-	if (!ioctl(fd, IOCTLTRIM, range))
+	if (!ioctl(f->fd, IOCTLTRIM, range))
 		return 0;
 
 	return errno;
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index e6da286..4a7cdeb 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -117,7 +117,7 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
-static inline int os_trim(int fd, unsigned long long start,
+static inline int os_trim(struct fio_file *f, unsigned long long start,
 			  unsigned long long len)
 {
 	off_t range[2];
@@ -125,7 +125,7 @@ static inline int os_trim(int fd, unsigned long long start,
 	range[0] = start;
 	range[1] = len;
 
-	if (!ioctl(fd, DIOCGDELETE, range))
+	if (!ioctl(f->fd, DIOCGDELETE, range))
 		return 0;
 
 	return errno;
diff --git a/os/os-linux.h b/os/os-linux.h
index e7d600d..1ad6ebd 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -281,7 +281,7 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
-static inline int os_trim(int fd, unsigned long long start,
+static inline int os_trim(struct fio_file *f, unsigned long long start,
 			  unsigned long long len)
 {
 	uint64_t range[2];
@@ -289,7 +289,7 @@ static inline int os_trim(int fd, unsigned long long start,
 	range[0] = start;
 	range[1] = len;
 
-	if (!ioctl(fd, BLKDISCARD, range))
+	if (!ioctl(f->fd, BLKDISCARD, range))
 		return 0;
 
 	return errno;
diff --git a/os/os-mac.h b/os/os-mac.h
index a1536c7..92a60ee 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -41,9 +41,9 @@ typedef unsigned int clockid_t;
 #endif
 
 #define FIO_OS_DIRECTIO
-static inline int fio_set_odirect(int fd)
+static inline int fio_set_odirect(struct fio_file *f)
 {
-	if (fcntl(fd, F_NOCACHE, 1) == -1)
+	if (fcntl(f->fd, F_NOCACHE, 1) == -1)
 		return errno;
 	return 0;
 }
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index eac76cf..682a11c 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -11,9 +11,9 @@
 #include <sys/dkio.h>
 #include <sys/disklabel.h>
 #include <sys/endian.h>
-/* XXX hack to avoid confilcts between rbtree.h and <sys/rb.h> */
-#define	rb_node	_rb_node
 #include <sys/sysctl.h>
+
+/* XXX hack to avoid confilcts between rbtree.h and <sys/rbtree.h> */
 #undef rb_node
 #undef rb_left
 #undef rb_right
@@ -26,8 +26,6 @@
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
 
-#undef	FIO_HAVE_CPU_AFFINITY	/* doesn't exist */
-
 #define OS_MAP_ANON		MAP_ANON
 
 #ifndef PTHREAD_STACK_MIN
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 675bf89..b4c02c9 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -11,23 +11,21 @@
 #include <sys/disklabel.h>
 #include <sys/endian.h>
 #include <sys/utsname.h>
-/* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #include <sys/sysctl.h>
+
+/* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #undef RB_BLACK
 #undef RB_RED
 #undef RB_ROOT
 
 #include "../file.h"
 
-#undef  FIO_HAVE_ODIRECT
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_SHM_ATTACH_REMOVED
 
-#undef	FIO_HAVE_CPU_AFFINITY	/* doesn't exist */
-
 #define OS_MAP_ANON		MAP_ANON
 
 #ifndef PTHREAD_STACK_MIN
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 8f8f53b..6af25d2 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -85,9 +85,9 @@ static inline long os_random_long(os_random_state_t *rs)
 
 #define FIO_OS_DIRECTIO
 extern int directio(int, int);
-static inline int fio_set_odirect(int fd)
+static inline int fio_set_odirect(struct fio_file *f)
 {
-	if (directio(fd, DIRECTIO_ON) < 0)
+	if (directio(f->fd, DIRECTIO_ON) < 0)
 		return errno;
 
 	return 0;

* Recent changes (master)
@ 2017-08-29 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-29 12:00 UTC
  To: fio

The following changes since commit 0c76294c12bdc3df90a41754b05fa30b612ea6eb:

  engines/windowsaio: kill useless forward declarations (2017-08-27 15:21:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a4befdc605bc253f258a99e9f0d037147775035e:

  Merge branch 'direct_layout_fix' of https://github.com/sitsofe/fio (2017-08-28 09:57:30 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'direct_layout_fix' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      filesetup: align layout buffer
      filesetup: add direct=1 failure warning to layout

 filesetup.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 0e5599a..a6a94ee 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -15,6 +15,7 @@
 #include "os/os.h"
 #include "hash.h"
 #include "lib/axmap.h"
+#include "lib/memalign.h"
 
 #ifdef CONFIG_LINUX_FALLOCATE
 #include <linux/falloc.h>
@@ -161,8 +162,14 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		if (err == ENOENT && !td->o.allow_create)
 			log_err("fio: file creation disallowed by "
 					"allow_file_create=0\n");
-		else
+		else {
+			if (err == EINVAL && (flags & OS_O_DIRECT))
+				log_err("fio: looks like your filesystem "
+					"does not support "
+					"direct=1/buffered=0\n");
+
 			td_verror(td, err, "open");
+		}
 		return 1;
 	}
 
@@ -194,9 +201,9 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	if (bs > left)
 		bs = left;
 
-	b = malloc(bs);
+	b = fio_memalign(page_size, bs);
 	if (!b) {
-		td_verror(td, errno, "malloc");
+		td_verror(td, errno, "fio_memalign");
 		goto err;
 	}
 
@@ -249,14 +256,14 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 			f->io_size = f->real_file_size;
 	}
 
-	free(b);
+	fio_memfree(b, bs);
 done:
 	return 0;
 err:
 	close(f->fd);
 	f->fd = -1;
 	if (b)
-		free(b);
+		fio_memfree(b, bs);
 	return 1;
 }
 

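The layout change above switches the write buffer from malloc() to a page-aligned fio_memalign() allocation. With O_DIRECT, Linux generally requires the buffer address, the transfer length and the file offset to be suitably aligned (usually to the device's logical block size), and a misaligned read or write typically fails with EINVAL. fio_memalign() is internal to fio, so the minimal sketch below uses POSIX posix_memalign() as a stand-in; alloc_page_aligned() and is_aligned() are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (stand-in for fio_memalign): allocate a zeroed buffer
 * of 'size' bytes whose start address is aligned to the system page size.
 * Returns NULL on failure; release the result with free().
 */
static void *alloc_page_aligned(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	if (page <= 0)
		return NULL;
	/* Page size is a power of two and a multiple of sizeof(void *). */
	if (posix_memalign(&buf, (size_t) page, size) != 0)
		return NULL;
	memset(buf, 0, size);
	return buf;
}

/* Returns 1 if 'p' is aligned to 'align' bytes, 0 otherwise. */
static int is_aligned(const void *p, size_t align)
{
	assert(align != 0);
	return ((uintptr_t) p % align) == 0;
}
```

A buffer from this sketch is released with plain free(), whereas fio pairs fio_memalign() with fio_memfree(), as the diff does on both the success and error paths.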

* Recent changes (master)
@ 2017-08-28 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-28 12:00 UTC
  To: fio

The following changes since commit b427f2e12beba7b0c9a655f8ecbd19187c0b6029:

  Merge branch 'stat_base_overflow' of https://github.com/football1222/fio (2017-08-23 08:34:37 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0c76294c12bdc3df90a41754b05fa30b612ea6eb:

  engines/windowsaio: kill useless forward declarations (2017-08-27 15:21:26 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'fixes' of https://github.com/sitsofe/fio
      Merge branch 'doc_runtime' of https://github.com/sitsofe/fio
      engines/windowsaio: kill useless forward declarations

Sitsofe Wheeler (4):
      doc: remove '--runtime' command line option
      fio: implement 64 bit network/big endian byte swapping macros
      configure: clean up libverbs configure test
      README: update/add mintty issue links

 HOWTO                |  3 ---
 README               |  4 +++-
 configure            |  3 +--
 engines/rdma.c       |  5 ++---
 engines/windowsaio.c | 11 -----------
 fio.1                |  3 ---
 os/os.h              | 12 ++++++++++++
 7 files changed, 18 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4192ac7..f36d4a7 100644
--- a/HOWTO
+++ b/HOWTO
@@ -111,9 +111,6 @@ Command line options
 	format.  `json+` is like `json`, except it adds a full dump of the latency
 	buckets.
 
-.. option:: --runtime
-	Limit run time to runtime seconds.
-
 .. option:: --bandwidth-log
 
 	Generate aggregate bandwidth logs.
diff --git a/README b/README
index a6eba8f..72ff465 100644
--- a/README
+++ b/README
@@ -181,7 +181,9 @@ To build fio on 32-bit Windows, run ``./configure --build-32bit-win`` before
 It's recommended that once built or installed, fio be run in a Command Prompt or
 other 'native' console such as console2, since there are known to be display and
 signal issues when running it under a Cygwin shell (see
-http://code.google.com/p/mintty/issues/detail?id=56 for details).
+https://github.com/mintty/mintty/issues/56 and
+https://github.com/mintty/mintty/wiki/Tips#inputoutput-interaction-with-alien-programs
+for details).
 
 
 Documentation
diff --git a/configure b/configure
index 59af1b6..cefd610 100755
--- a/configure
+++ b/configure
@@ -697,8 +697,7 @@ if test "$libverbs" != "yes" ; then
   libverbs="no"
 fi
 cat > $TMPC << EOF
-#include <stdio.h>
-#include <infiniband/arch.h>
+#include <infiniband/verbs.h>
 int main(int argc, char **argv)
 {
   struct ibv_pd *pd = ibv_alloc_pd(NULL);
diff --git a/engines/rdma.c b/engines/rdma.c
index 8d31ff3..da00cba 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -44,7 +44,6 @@
 #include "../optgroup.h"
 
 #include <rdma/rdma_cma.h>
-#include <infiniband/arch.h>
 
 #define FIO_RDMA_MAX_IO_DEPTH    512
 
@@ -216,7 +215,7 @@ static int client_recv(struct thread_data *td, struct ibv_wc *wc)
 		rd->rmt_nr = ntohl(rd->recv_buf.nr);
 
 		for (i = 0; i < rd->rmt_nr; i++) {
-			rd->rmt_us[i].buf = ntohll(rd->recv_buf.rmt_us[i].buf);
+			rd->rmt_us[i].buf = be64_to_cpu(rd->recv_buf.rmt_us[i].buf);
 			rd->rmt_us[i].rkey = ntohl(rd->recv_buf.rmt_us[i].rkey);
 			rd->rmt_us[i].size = ntohl(rd->recv_buf.rmt_us[i].size);
 
@@ -1300,7 +1299,7 @@ static int fio_rdmaio_init(struct thread_data *td)
 		}
 
 		rd->send_buf.rmt_us[i].buf =
-		    htonll((uint64_t) (unsigned long)io_u->buf);
+		    cpu_to_be64((uint64_t) (unsigned long)io_u->buf);
 		rd->send_buf.rmt_us[i].rkey = htonl(io_u->mr->rkey);
 		rd->send_buf.rmt_us[i].size = htonl(max_bs);
 
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index f5cb048..c4c5abd 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -35,17 +35,7 @@ struct thread_ctx {
 	struct windowsaio_data *wd;
 };
 
-static BOOL timeout_expired(DWORD start_count, DWORD end_count);
-static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
-				unsigned int max, const struct timespec *t);
-static struct io_u *fio_windowsaio_event(struct thread_data *td, int event);
-static int fio_windowsaio_queue(struct thread_data *td,
-				  struct io_u *io_u);
-static void fio_windowsaio_cleanup(struct thread_data *td);
 static DWORD WINAPI IoCompletionRoutine(LPVOID lpParameter);
-static int fio_windowsaio_init(struct thread_data *td);
-static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f);
-static int fio_windowsaio_close_file(struct thread_data fio_unused *td, struct fio_file *f);
 
 static int fio_windowsaio_init(struct thread_data *td)
 {
@@ -152,7 +142,6 @@ static void fio_windowsaio_cleanup(struct thread_data *td)
 	}
 }
 
-
 static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 {
 	int rc = 0;
diff --git a/fio.1 b/fio.1
index a0f1a24..b8b3da2 100644
--- a/fio.1
+++ b/fio.1
@@ -29,9 +29,6 @@ Set the reporting \fIformat\fR to `normal', `terse', `json', or
 is a CSV based format. `json+' is like `json', except it adds a full
 dump of the latency buckets.
 .TP
-.BI \-\-runtime \fR=\fPruntime
-Limit run time to \fIruntime\fR seconds.
-.TP
 .BI \-\-bandwidth\-log
 Generate aggregate bandwidth logs.
 .TP
diff --git a/os/os.h b/os/os.h
index 2e15529..f62b427 100644
--- a/os/os.h
+++ b/os/os.h
@@ -204,16 +204,20 @@ static inline uint64_t fio_swap64(uint64_t val)
 
 #ifndef FIO_HAVE_BYTEORDER_FUNCS
 #ifdef CONFIG_LITTLE_ENDIAN
+#define __be64_to_cpu(x)		fio_swap64(x)
 #define __le16_to_cpu(x)		(x)
 #define __le32_to_cpu(x)		(x)
 #define __le64_to_cpu(x)		(x)
+#define __cpu_to_be64(x)		fio_swap64(x)
 #define __cpu_to_le16(x)		(x)
 #define __cpu_to_le32(x)		(x)
 #define __cpu_to_le64(x)		(x)
 #else
+#define __be64_to_cpu(x)		(x)
 #define __le16_to_cpu(x)		fio_swap16(x)
 #define __le32_to_cpu(x)		fio_swap32(x)
 #define __le64_to_cpu(x)		fio_swap64(x)
+#define __cpu_to_be64(x)		(x)
 #define __cpu_to_le16(x)		fio_swap16(x)
 #define __cpu_to_le32(x)		fio_swap32(x)
 #define __cpu_to_le64(x)		fio_swap64(x)
@@ -221,6 +225,10 @@ static inline uint64_t fio_swap64(uint64_t val)
 #endif /* FIO_HAVE_BYTEORDER_FUNCS */
 
 #ifdef FIO_INTERNAL
+#define be64_to_cpu(val) ({			\
+	typecheck(uint64_t, val);		\
+	__be64_to_cpu(val);			\
+})
 #define le16_to_cpu(val) ({			\
 	typecheck(uint16_t, val);		\
 	__le16_to_cpu(val);			\
@@ -235,6 +243,10 @@ static inline uint64_t fio_swap64(uint64_t val)
 })
 #endif
 
+#define cpu_to_be64(val) ({			\
+	typecheck(uint64_t, val);		\
+	__cpu_to_be64(val);			\
+})
 #define cpu_to_le16(val) ({			\
 	typecheck(uint16_t, val);		\
 	__cpu_to_le16(val);			\

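The RDMA engine previously relied on ntohll()/htonll() from <infiniband/arch.h>. The patch drops that header and adds fio's own be64_to_cpu()/cpu_to_be64() macros, which byte-swap on little-endian builds and are no-ops on big-endian ones. The sketch below shows the same idea in portable C. Unlike fio, which picks the branch at build time via CONFIG_LITTLE_ENDIAN, it probes host byte order at run time; swap64(), host_to_be64() and be64_to_host() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit value (same idea as fio_swap64). */
static uint64_t swap64(uint64_t v)
{
	return ((v & 0x00000000000000ffULL) << 56) |
	       ((v & 0x000000000000ff00ULL) << 40) |
	       ((v & 0x0000000000ff0000ULL) << 24) |
	       ((v & 0x00000000ff000000ULL) << 8)  |
	       ((v & 0x000000ff00000000ULL) >> 8)  |
	       ((v & 0x0000ff0000000000ULL) >> 24) |
	       ((v & 0x00ff000000000000ULL) >> 40) |
	       ((v & 0xff00000000000000ULL) >> 56);
}

/* Run-time endianness probe; fio decides this at configure time instead. */
static int host_is_little_endian(void)
{
	const uint16_t one = 1;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}

/* Host order -> big endian (network order), mirroring cpu_to_be64(). */
static uint64_t host_to_be64(uint64_t v)
{
	return host_is_little_endian() ? swap64(v) : v;
}

/* Big endian -> host order, mirroring be64_to_cpu(). */
static uint64_t be64_to_host(uint64_t v)
{
	/* A byte swap is its own inverse, so the same operation works. */
	return host_to_be64(v);
}
```

Because a byte swap is its own inverse, both directions reduce to the same operation, which is why the patch can define __be64_to_cpu and __cpu_to_be64 identically in each endianness branch.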

* Recent changes (master)
@ 2017-08-24 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-24 12:00 UTC
  To: fio

The following changes since commit deeb3c11c212e99e8d1162e03e0ef734bd0d01a7:

  Merge branch 'timespec_add_msec_overflow' of https://github.com/sitsofe/fio (2017-08-22 10:32:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b427f2e12beba7b0c9a655f8ecbd19187c0b6029:

  Merge branch 'stat_base_overflow' of https://github.com/football1222/fio (2017-08-23 08:34:37 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'stat_base_overflow' of https://github.com/football1222/fio

Richard Liu (1):
      stat: increase the size of base to avoid overflow

 stat.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 4aa9cb8..91c74ab 100644
--- a/stat.c
+++ b/stat.c
@@ -100,7 +100,8 @@ static unsigned int plat_val_to_idx(unsigned long long val)
  */
 static unsigned long long plat_idx_to_val(unsigned int idx)
 {
-	unsigned int error_bits, k, base;
+	unsigned int error_bits;
+	unsigned long long k, base;
 
 	assert(idx < FIO_IO_U_PLAT_NR);
 
@@ -111,7 +112,7 @@ static unsigned long long plat_idx_to_val(unsigned int idx)
 
 	/* Find the group and compute the minimum value of that group */
 	error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1;
-	base = 1 << (error_bits + FIO_IO_U_PLAT_BITS);
+	base = ((unsigned long long) 1) << (error_bits + FIO_IO_U_PLAT_BITS);
 
 	/* Find its bucket number of the group */
 	k = idx % FIO_IO_U_PLAT_VAL;

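In plat_idx_to_val(), `base` was an unsigned int computed as `1 << (error_bits + FIO_IO_U_PLAT_BITS)`, with a plain int as the left operand. For the highest latency groups the shift count can exceed 31, and shifting a 32-bit int that far is undefined behaviour in C, so the reported percentile values could be garbage. The sketch below reproduces the fixed mapping with 64-bit arithmetic. The constants are assumptions that mirror fio's stat.h around the 3.0 release (6 bits per group, 29 groups), and idx_to_val() is a hypothetical stand-in, not fio's function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values mirroring fio's stat.h around 3.0; adjust if they differ. */
#define PLAT_BITS     6
#define PLAT_VAL      (1 << PLAT_BITS)	/* buckets per group */
#define PLAT_GROUP_NR 29
#define PLAT_NR       (PLAT_GROUP_NR * PLAT_VAL)

/*
 * Bucket index -> representative value, using the fix from the patch:
 * 'base' and 'k' are 64-bit and the shifted operand is 1ULL, so groups
 * whose minimum exceeds 2^31 no longer overflow a 32-bit int.
 */
static unsigned long long idx_to_val(unsigned int idx)
{
	unsigned int error_bits;
	unsigned long long k, base;

	assert(idx < PLAT_NR);

	/* The first two groups are exact: the value equals the index. */
	if (idx < (PLAT_VAL << 1))
		return idx;

	/* Find the group and compute the minimum value of that group. */
	error_bits = (idx >> PLAT_BITS) - 1;
	base = 1ULL << (error_bits + PLAT_BITS);

	/* Bucket number within the group. */
	k = idx % PLAT_VAL;

	/* Return the midpoint of the bucket's value range. */
	return base + ((k + 0.5) * (1ULL << error_bits));
}
```

With these assumed constants the last bucket starts at 2^33 and maps to about 1.7e10, a value that cannot be represented in 32 bits at all.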

* Recent changes (master)
@ 2017-08-23 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-23 12:00 UTC
  To: fio

The following changes since commit 168bb5875da97004163c6b755de162ad481134c4:

  doc: latency log unit is nsec (2017-08-17 15:39:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to deeb3c11c212e99e8d1162e03e0ef734bd0d01a7:

  Merge branch 'timespec_add_msec_overflow' of https://github.com/sitsofe/fio (2017-08-22 10:32:18 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'timespec_add_msec_overflow' of https://github.com/sitsofe/fio

Sitsofe Wheeler (1):
      time: fix overflow in timespec_add_msec

 time.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/time.c b/time.c
index edfe779..0798419 100644
--- a/time.c
+++ b/time.c
@@ -8,17 +8,17 @@ static unsigned long ns_granularity;
 
 void timespec_add_msec(struct timespec *ts, unsigned int msec)
 {
-	unsigned long adj_nsec = 1000000 * msec;
+	uint64_t adj_nsec = 1000000ULL * msec;
 
 	ts->tv_nsec += adj_nsec;
 	if (adj_nsec >= 1000000000) {
-		unsigned long adj_sec = adj_nsec / 1000000000UL;
+		uint64_t adj_sec = adj_nsec / 1000000000;
 
-		ts->tv_nsec -=  adj_sec * 1000000000UL;
+		ts->tv_nsec -= adj_sec * 1000000000;
 		ts->tv_sec += adj_sec;
 	}
-	if (ts->tv_nsec >= 1000000000UL){
-		ts->tv_nsec -= 1000000000UL;
+	if (ts->tv_nsec >= 1000000000){
+		ts->tv_nsec -= 1000000000;
 		ts->tv_sec++;
 	}
 }

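In the old timespec_add_msec(), both operands of `1000000 * msec` are int-sized, so the product was evaluated in 32-bit unsigned arithmetic before being widened to unsigned long, and any msec above 4294 wrapped around. The patch forces the multiplication into 64 bits with a ULL constant. The sketch below applies the same fix; ts_add_msec() is a hypothetical name, and it splits seconds and nanoseconds with / and % rather than following fio's exact sequence of statements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Add 'msec' milliseconds to *ts. The product is computed in 64 bits
 * (1000000ULL * msec), so large msec values no longer wrap at 2^32.
 */
static void ts_add_msec(struct timespec *ts, unsigned int msec)
{
	uint64_t adj_nsec = 1000000ULL * msec;

	/* Whole seconds go straight to tv_sec, the remainder to tv_nsec. */
	ts->tv_sec += (time_t) (adj_nsec / NSEC_PER_SEC);
	ts->tv_nsec += (long) (adj_nsec % NSEC_PER_SEC);

	/* Carry one second if the nanosecond field overflowed. */
	if (ts->tv_nsec >= (long) NSEC_PER_SEC) {
		ts->tv_nsec -= (long) NSEC_PER_SEC;
		ts->tv_sec++;
	}
}
```

For example, adding 5000 ms with the old 32-bit product yielded 705032704 ns (about 0.7 s) instead of 5 s, since 5000000000 mod 2^32 = 705032704.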

* Recent changes (master)
@ 2017-08-18 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-18 12:00 UTC
  To: fio

The following changes since commit bdadbb83ba3611a09888600830b8539bf3d19794:

  Fio 3.0 (2017-08-16 14:12:33 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 168bb5875da97004163c6b755de162ad481134c4:

  doc: latency log unit is nsec (2017-08-17 15:39:56 -0600)

----------------------------------------------------------------
Sitsofe Wheeler (4):
      rbd: fixup format specifier
      fio2gnuplot: minor man page heading fix
      configure: fail rbd configure check on wrong rados_create2 signature
      travis: install additional development libraries

Vincent Fu (1):
      doc: latency log unit is nsec

 .travis.yml              | 2 +-
 HOWTO                    | 2 +-
 configure                | 8 ++++++--
 engines/rbd.c            | 2 +-
 fio.1                    | 2 +-
 tools/plot/fio2gnuplot.1 | 2 +-
 6 files changed, 11 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 4cdda12..795c0fc 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,6 +25,6 @@ matrix:
       compiler: gcc
 before_install:
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get -qq update; fi
-  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev; fi
+  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev librbd-dev glusterfs-common libibverbs-dev librdmacm-dev; fi
 script:
   - ./configure --extra-cflags="-Werror" && make && make test
diff --git a/HOWTO b/HOWTO
index 16ae708..4192ac7 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3584,7 +3584,7 @@ and IOPS. The logs share a common format, which looks like this:
 on the type of log, it will be one of the following:
 
     **Latency log**
-		Value is latency in usecs
+		Value is latency in nsecs
     **Bandwidth log**
 		Value is in KiB/sec
     **IOPS log**
diff --git a/configure b/configure
index afb88ca..59af1b6 100755
--- a/configure
+++ b/configure
@@ -1483,12 +1483,16 @@ int main(int argc, char **argv)
 {
   rados_t cluster;
   rados_ioctx_t io_ctx;
+  const char cluster_name[] = "ceph";
+  const char user_name[] = "client.admin";
   const char pool[] = "rbd";
-
   int major, minor, extra;
-  rbd_version(&major, &minor, &extra);
 
+  rbd_version(&major, &minor, &extra);
+  /* The rados_create2 signature required was only introduced in ceph 0.65 */
+  rados_create2(&cluster, cluster_name, user_name, 0);
   rados_ioctx_create(cluster, pool, &io_ctx);
+
   return 0;
 }
 EOF
diff --git a/engines/rbd.c b/engines/rbd.c
index 5b51a39..39501eb 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -605,7 +605,7 @@ static int fio_rbd_setup(struct thread_data *td)
 		goto cleanup;
 	}
 
-	dprint(FD_IO, "rbd-engine: image size: %lu\n", info.size);
+	dprint(FD_IO, "rbd-engine: image size: %" PRIu64 "\n", info.size);
 
 	/* taken from "net" engine. Pretend we deal with files,
 	 * even if we do not have any ideas about files.
diff --git a/fio.1 b/fio.1
index 792bc9d..a0f1a24 100644
--- a/fio.1
+++ b/fio.1
@@ -3317,7 +3317,7 @@ on the type of log, it will be one of the following:
 .RS
 .TP
 .B Latency log
-Value is latency in usecs
+Value is latency in nsecs
 .TP
 .B Bandwidth log
 Value is in KiB/sec
diff --git a/tools/plot/fio2gnuplot.1 b/tools/plot/fio2gnuplot.1
index 1a33167..6fb1283 100644
--- a/tools/plot/fio2gnuplot.1
+++ b/tools/plot/fio2gnuplot.1
@@ -1,5 +1,5 @@
 .\" Text automatically generated by txt2man
-.TH fio2gnuplot  "07 août 2013" "" ""
+.TH fio2gnuplot 1 "August 2013"
 .SH NAME
 \fBfio2gnuplot \fP- Render fio's output files with gnuplot
 .SH SYNOPSIS

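The rbd engine printed info.size with %lu. Assuming, as the fix implies, that the field is a uint64_t, %lu only matches where unsigned long is 64 bits; on 32-bit builds it is a format mismatch that compilers typically flag under -Wformat. The <inttypes.h> macro PRIu64 expands to the correct conversion on every platform. format_image_size() below is a hypothetical helper illustrating the pattern.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit size portably. PRIu64 expands to a string literal
 * (e.g. "lu" or "llu") that is concatenated into the format string.
 */
static int format_image_size(char *buf, size_t len, uint64_t size)
{
	return snprintf(buf, len, "image size: %" PRIu64, size);
}
```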

* Recent changes (master)
@ 2017-08-17 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-17 12:00 UTC
  To: fio

The following changes since commit 29092211c1f926541db0e2863badc03d7378b31a:

  HOWTO: update and clarify description of latencies in normal output (2017-08-14 13:02:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bdadbb83ba3611a09888600830b8539bf3d19794:

  Fio 3.0 (2017-08-16 14:12:33 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      printing: use bigger on-stack buffer
      gfio: update copyright
      Fio 3.0

Tomohiro Kusumi (5):
      man: sync "JOB PARAMETERS" section with HOWTO
      man: sync "OUTPUT" section and after with HOWTO
      man: minor fixes for sections before "JOB PARAMETERS" for consistency
      HOWTO: minor fixes and backports from man page
      HOWTO: fix wrong kb_base= description

Vincent Fu (1):
      man: update description of normal output latencies

 FIO-VERSION-GEN        |    2 +-
 HOWTO                  |  213 ++-
 fio.1                  | 4300 ++++++++++++++++++++++++++++--------------------
 gfio.c                 |    2 +-
 os/windows/install.wxs |    2 +-
 printing.c             |    2 +-
 6 files changed, 2579 insertions(+), 1942 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index f82aeee..31acf1c 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.99
+DEF_VER=fio-3.0
 
 LF='
 '
diff --git a/HOWTO b/HOWTO
index 71d9fa5..16ae708 100644
--- a/HOWTO
+++ b/HOWTO
@@ -54,7 +54,7 @@ Command line options
 
 .. option:: --debug=type
 
-	Enable verbose tracing of various fio actions.  May be ``all`` for all types
+	Enable verbose tracing `type` of various fio actions.  May be ``all`` for all types
 	or individual types separated by a comma (e.g. ``--debug=file,mem`` will
 	enable file and memory debugging).  Currently, additional logging is
 	available for:
@@ -104,9 +104,9 @@ Command line options
 
 	Write output to file `filename`.
 
-.. option:: --output-format=type
+.. option:: --output-format=format
 
-	Set the reporting format to `normal`, `terse`, `json`, or `json+`.  Multiple
+	Set the reporting `format` to `normal`, `terse`, `json`, or `json+`.  Multiple
 	formats can be selected, separated by a comma.  `terse` is a CSV based
 	format.  `json+` is like `json`, except it adds a full dump of the latency
 	buckets.
@@ -128,9 +128,9 @@ Command line options
 	**Deprecated**, use :option:`--output-format` instead to select multiple
 	formats.
 
-.. option:: --terse-version=type
+.. option:: --terse-version=version
 
-	Set terse version output format (default 3, or 2 or 4 or 5).
+	Set terse `version` output format (default 3, or 2 or 4 or 5).
 
 .. option:: --version
 
@@ -156,8 +156,8 @@ Command line options
 
 .. option:: --enghelp=[ioengine[,command]]
 
-	List all commands defined by :option:`ioengine`, or print help for `command`
-	defined by :option:`ioengine`.  If no :option:`ioengine` is given, list all
+	List all commands defined by `ioengine`, or print help for `command`
+	defined by `ioengine`.  If no `ioengine` is given, list all
 	available ioengines.
 
 .. option:: --showcmd=jobfile
@@ -217,7 +217,7 @@ Command line options
 
 .. option:: --max-jobs=nr
 
-	Set the maximum number of threads/processes to support.
+	Set the maximum number of threads/processes to support to `nr`.
 
 .. option:: --server=args
 
@@ -230,12 +230,12 @@ Command line options
 
 .. option:: --client=hostname
 
-	Instead of running the jobs locally, send and run them on the given host or
-	set of hosts.  See `Client/Server`_ section.
+	Instead of running the jobs locally, send and run them on the given `hostname`
+	or set of `hostname`s.  See `Client/Server`_ section.
 
 .. option:: --remote-config=file
 
-	Tell fio server to load this local file.
+	Tell fio server to load this local `file`.
 
 .. option:: --idle-prof=option
 
@@ -252,27 +252,27 @@ Command line options
 
 .. option:: --inflate-log=log
 
-	Inflate and output compressed log.
+	Inflate and output compressed `log`.
 
 .. option:: --trigger-file=file
 
-	Execute trigger cmd when file exists.
+	Execute trigger command when `file` exists.
 
-.. option:: --trigger-timeout=t
+.. option:: --trigger-timeout=time
 
-	Execute trigger at this time.
+	Execute trigger at this `time`.
 
-.. option:: --trigger=cmd
+.. option:: --trigger=command
 
-	Set this command as local trigger.
+	Set this `command` as local trigger.
 
-.. option:: --trigger-remote=cmd
+.. option:: --trigger-remote=command
 
-	Set this command as remote trigger.
+	Set this `command` as remote trigger.
 
 .. option:: --aux-path=path
 
-	Use this path for fio state generated files.
+	Use this `path` for fio state generated files.
 
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
@@ -296,8 +296,8 @@ override a *global* section parameter, and a job file may even have several
 *global* sections if so desired. A job is only affected by a *global* section
 residing above it.
 
-The :option:`--cmdhelp` option also lists all options. If used with an `option`
-argument, :option:`--cmdhelp` will detail the given `option`.
+The :option:`--cmdhelp` option also lists all options. If used with a `command`
+argument, :option:`--cmdhelp` will detail the given `command`.
 
 See the `examples/` directory for inspiration on how to write job files.  Note
 the copyright and license requirements currently apply to `examples/` files.
@@ -505,19 +505,19 @@ Parameter types
 	prefixes.  To specify power-of-10 decimal values defined in the
 	International System of Units (SI):
 
-		* *Ki* -- means kilo (K) or 1000
-		* *Mi* -- means mega (M) or 1000**2
-		* *Gi* -- means giga (G) or 1000**3
-		* *Ti* -- means tera (T) or 1000**4
-		* *Pi* -- means peta (P) or 1000**5
+		* *K* -- means kilo (K) or 1000
+		* *M* -- means mega (M) or 1000**2
+		* *G* -- means giga (G) or 1000**3
+		* *T* -- means tera (T) or 1000**4
+		* *P* -- means peta (P) or 1000**5
 
 	To specify power-of-2 binary values defined in IEC 80000-13:
 
-		* *K* -- means kibi (Ki) or 1024
-		* *M* -- means mebi (Mi) or 1024**2
-		* *G* -- means gibi (Gi) or 1024**3
-		* *T* -- means tebi (Ti) or 1024**4
-		* *P* -- means pebi (Pi) or 1024**5
+		* *Ki* -- means kibi (Ki) or 1024
+		* *Mi* -- means mebi (Mi) or 1024**2
+		* *Gi* -- means gibi (Gi) or 1024**3
+		* *Ti* -- means tebi (Ti) or 1024**4
+		* *Pi* -- means pebi (Pi) or 1024**5
 
 	With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
 	from those specified in the SI and IEC 80000-13 standards to provide
@@ -847,7 +847,7 @@ Target file/device
 
 		**sequential**
 			Finish one file before moving on to the next. Multiple files can
-			still be open depending on 'openfiles'.
+			still be open depending on :option:`openfiles`.
 
 		**zipf**
 			Use a *Zipf* distribution to decide what file to access.
@@ -1167,9 +1167,9 @@ I/O type
 
 	Make every `N-th` write a barrier write.
 
-.. option:: sync_file_range=str:val
+.. option:: sync_file_range=str:int
 
-	Use :manpage:`sync_file_range(2)` for every `val` number of write
+	Use :manpage:`sync_file_range(2)` for every `int` number of write
 	operations. Fio will track range of writes that have happened since the last
 	:manpage:`sync_file_range(2)` call. `str` can currently be one or more of:
 
@@ -1239,9 +1239,9 @@ I/O type
 				Zoned random distribution
 
 	When using a **zipf** or **pareto** distribution, an input value is also
-	needed to define the access pattern. For **zipf**, this is the `zipf
+	needed to define the access pattern. For **zipf**, this is the `Zipf
 	theta`. For **pareto**, it's the `Pareto power`. Fio includes a test
-	program, :command:`genzipf`, that can be used visualize what the given input
+	program, :command:`fio-genzipf`, that can be used visualize what the given input
 	values will yield in terms of hit rates.  If you wanted to use **zipf** with
 	a `theta` of 1.2, you would use ``random_distribution=zipf:1.2`` as the
 	option. If a non-uniform model is used, fio will disable use of the random
@@ -1252,10 +1252,10 @@ I/O type
 	access that should fall within what range of the file or device. For
 	example, given a criteria of:
 
-	* 60% of accesses should be to the first 10%
-	* 30% of accesses should be to the next 20%
-	* 8% of accesses should be to the next 30%
-	* 2% of accesses should be to the next 40%
+		* 60% of accesses should be to the first 10%
+		* 30% of accesses should be to the next 20%
+		* 8% of accesses should be to the next 30%
+		* 2% of accesses should be to the next 40%
 
 	we can define that through zoning of the random accesses. For the above
 	example, the user would do::
@@ -1295,21 +1295,20 @@ I/O type
 
 .. option:: random_generator=str
 
-	Fio supports the following engines for generating
-	I/O offsets for random I/O:
+	Fio supports the following engines for generating I/O offsets for random I/O:
 
 		**tausworthe**
-			Strong 2^88 cycle random number generator
+			Strong 2^88 cycle random number generator.
 		**lfsr**
-			Linear feedback shift register generator
+			Linear feedback shift register generator.
 		**tausworthe64**
-			Strong 64-bit 2^258 cycle random number generator
+			Strong 64-bit 2^258 cycle random number generator.
 
 	**tausworthe** is a strong random number generator, but it requires tracking
 	on the side if we want to ensure that blocks are only read or written
-	once. **LFSR** guarantees that we never generate the same offset twice, and
+	once. **lfsr** guarantees that we never generate the same offset twice, and
 	it's also less computationally expensive. It's not a true random generator,
-	however, though for I/O purposes it's typically good enough. **LFSR** only
+	however, though for I/O purposes it's typically good enough. **lfsr** only
 	works with single block sizes, not with workloads that use multiple block
 	sizes. If used with such a workload, fio may read or write some blocks
 	multiple times. The default value is **tausworthe**, unless the required
@@ -1529,7 +1528,7 @@ Buffers and memory
 
 		**cudamalloc**
 			Use GPU memory as the buffers for GPUDirect RDMA benchmark.
-			The ioengine must be rdma.
+			The :option:`ioengine` must be `rdma`.
 
 	The area allocated is a function of the maximum allowed bs size for the job,
 	multiplied by the I/O depth given. Note that for **shmhuge** and
@@ -1548,7 +1547,7 @@ Buffers and memory
 	should point there. So if it's mounted in :file:`/huge`, you would use
 	`mem=mmaphuge:/huge/somefile`.
 
-.. option:: iomem_align=int
+.. option:: iomem_align=int, mem_align=int
 
 	This indicates the memory alignment of the I/O memory buffers.  Note that
 	the given alignment is applied to the first I/O unit buffer, if using
@@ -1683,8 +1682,8 @@ I/O engine
 			SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
 			ioctl, or if the target is an sg character device we use
 			:manpage:`read(2)` and :manpage:`write(2)` for asynchronous
-			I/O. Requires filename option to specify either block or character
-			devices.
+			I/O. Requires :option:`filename` option to specify either block or
+			character devices.
 
 		**null**
 			Doesn't transfer any data, just pretends to.  This is mainly used to
@@ -1707,8 +1706,8 @@ I/O engine
 			Doesn't transfer any data, but burns CPU cycles according to the
 			:option:`cpuload` and :option:`cpuchunks` options. Setting
 			:option:`cpuload`\=85 will cause that job to do nothing but burn 85%
-			of the CPU. In case of SMP machines, use :option:`numjobs`
-			=<no_of_cpu> to get desired CPU usage, as the cpuload only loads a
+			of the CPU. In case of SMP machines, use :option:`numjobs`=<nr_of_cpu>
+			to get desired CPU usage, as the cpuload only loads a
 			single CPU at the desired rate. A job never finishes unless there is
 			at least one non-cpuio job.
 
@@ -1741,7 +1740,7 @@ I/O engine
 		**ftruncate**
 			I/O engine that sends :manpage:`ftruncate(2)` operations in response
 			to write (DDIR_WRITE) events. Each ftruncate issued sets the file's
-			size to the current block offset. Block size is ignored.
+			size to the current block offset. :option:`blocksize` is ignored.
 
 		**e4defrag**
 			I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
@@ -1763,7 +1762,7 @@ I/O engine
 			defines engine specific options.
 
 		**libhdfs**
-			Read and write through Hadoop (HDFS).  The :file:`filename` option
+			Read and write through Hadoop (HDFS).  The :option:`filename` option
 			is used to specify host,port of the hdfs name-node to connect.  This
 			engine interprets offsets a little differently.  In HDFS, files once
 			created cannot be modified so random writes are not possible. To
@@ -1802,8 +1801,8 @@ I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In addition, there are some parameters which are only valid when a specific
-ioengine is in use. These are used identically to normal parameters, with the
-caveat that when used on the command line, they must come after the
+:option:`ioengine` is in use. These are used identically to normal parameters,
+with the caveat that when used on the command line, they must come after the
 :option:`ioengine` that defines them is selected.
 
 .. option:: userspace_reap : [libaio]
@@ -1821,7 +1820,7 @@ caveat that when used on the command line, they must come after the
 
 .. option:: hipri_percentage : [pvsync2]
 
-	When hipri is set this determines the probability of a pvsync2 IO being high
+	When hipri is set this determines the probability of a pvsync2 I/O being high
 	priority. The default is 100%.
 
 .. option:: cpuload=int : [cpuio]
@@ -1837,18 +1836,16 @@ caveat that when used on the command line, they must come after the
 
 	Detect when I/O threads are done, then exit.
 
-.. option:: hostname=str : [netsplice] [net]
-
-	The hostname or IP address to use for TCP or UDP based I/O.  If the job is
-	a TCP listener or UDP reader, the hostname is not used and must be omitted
-	unless it is a valid UDP multicast address.
-
 .. option:: namenode=str : [libhdfs]
 
 	The hostname or IP address of a HDFS cluster namenode to contact.
 
 .. option:: port=int
 
+   [libhdfs]
+
+		The listening port of the HFDS cluster namenode.
+
    [netsplice], [net]
 
 		The TCP or UDP port to bind to or connect to. If this is used with
@@ -1856,9 +1853,11 @@ caveat that when used on the command line, they must come after the
 		this will be the starting port number since fio will use a range of
 		ports.
 
-   [libhdfs]
+.. option:: hostname=str : [netsplice] [net]
 
-		The listening port of the HFDS cluster namenode.
+	The hostname or IP address to use for TCP or UDP based I/O.  If the job is
+	a TCP listener or UDP reader, the hostname is not used and must be omitted
+	unless it is a valid UDP multicast address.
 
 .. option:: interface=str : [netsplice] [net]
 
@@ -1873,9 +1872,7 @@ caveat that when used on the command line, they must come after the
 
 	Set TCP_NODELAY on TCP connections.
 
-.. option:: protocol=str : [netsplice] [net]
-
-.. option:: proto=str : [netsplice] [net]
+.. option:: protocol=str, proto=str : [netsplice] [net]
 
 	The network protocol to use. Accepted values are:
 
@@ -1892,7 +1889,7 @@ caveat that when used on the command line, they must come after the
 
 	When the protocol is TCP or UDP, the port must also be given, as well as the
 	hostname if the job is a TCP listener or UDP reader. For unix sockets, the
-	normal filename option should be used and the port is invalid.
+	normal :option:`filename` option should be used and the port is invalid.
 
 .. option:: listen : [netsplice] [net]
 
@@ -2078,10 +2075,10 @@ I/O rate
 .. option:: thinktime_blocks=int
 
 	Only valid if :option:`thinktime` is set - control how many blocks to issue,
-	before waiting `thinktime` usecs. If not set, defaults to 1 which will make
-	fio wait `thinktime` usecs after every block. This effectively makes any
+	before waiting :option:`thinktime` usecs. If not set, defaults to 1 which will make
+	fio wait :option:`thinktime` usecs after every block. This effectively makes any
 	queue depth setting redundant, since no more than 1 I/O will be queued
-	before we have to complete it and do our thinktime. In other words, this
+	before we have to complete it and do our :option:`thinktime`. In other words, this
 	setting effectively caps the queue depth if the latter is larger.
 
 .. option:: rate=int[,int][,int]
@@ -2586,7 +2583,7 @@ Verification
 	state is loaded for the verify read phase. The format of the filename is,
 	roughly::
 
-	<type>-<jobname>-<jobindex>-verify.state.
+		<type>-<jobname>-<jobindex>-verify.state.
 
 	<type> is "local" for a local run, "sock" for a client/server socket
 	connection, and "ip" (192.168.0.1, for instance) for a networked
@@ -2722,8 +2719,8 @@ Measurements and reporting
 		write_lat_log=foo
 
 	The actual log names will be :file:`foo_slat.x.log`, :file:`foo_clat.x.log`,
-	and :file:`foo_lat.x.log`, where `x` is the index of the job (1..N, where N
-	is the number of jobs). This helps :command:`fio_generate_plot` find the
+	and :file:`foo_lat.x.log`, where `x` is the index of the job (`1..N`, where `N`
+	is the number of jobs). This helps :command:`fio_generate_plots` find the
 	logs automatically. If :option:`per_job_logs` is false, then the filename
 	will not include the job index.  See `Log File Formats`_.
 
@@ -2732,7 +2729,7 @@ Measurements and reporting
 	Same as :option:`write_lat_log`, but writes I/O completion latency
 	histograms. If no filename is given with this option, the default filename
 	of :file:`jobname_clat_hist.x.log` is used, where `x` is the index of the
-	job (1..N, where `N` is the number of jobs). Even if the filename is given,
+	job (`1..N`, where `N` is the number of jobs). Even if the filename is given,
 	fio will still append the type of log.  If :option:`per_job_logs` is false,
 	then the filename will not include the job index. See `Log File Formats`_.
 
@@ -2740,7 +2737,7 @@ Measurements and reporting
 
 	Same as :option:`write_bw_log`, but writes IOPS. If no filename is given
 	with this option, the default filename of :file:`jobname_type.x.log` is
-	used,where `x` is the index of the job (1..N, where `N` is the number of
+	used, where `x` is the index of the job (`1..N`, where `N` is the number of
 	jobs). Even if the filename is given, fio will still append the type of
 	log. If :option:`per_job_logs` is false, then the filename will not include
 	the job index. See `Log File Formats`_.
@@ -2855,7 +2852,7 @@ Measurements and reporting
 .. option:: disable_slat=bool
 
 	Disable measurements of submission latency numbers. See
-	:option:`disable_slat`.
+	:option:`disable_lat`.
 
 .. option:: disable_bw_measurement=bool, disable_bw=bool
 
@@ -2959,7 +2956,7 @@ other tools.
 To view a profile's additional options use :option:`--cmdhelp` after specifying
 the profile.  For example::
 
-$ fio --profile=act --cmdhelp
+	$ fio --profile=act --cmdhelp
 
 Act profile options
 ~~~~~~~~~~~~~~~~~~~
@@ -2983,7 +2980,7 @@ Act profile options
 .. option:: threads-per-queue=int
 	:noindex:
 
-	Number of read IO threads per device.  Default: 8.
+	Number of read I/O threads per device.  Default: 8.
 
 .. option:: read-req-num-512-blocks=int
 	:noindex:
@@ -3006,7 +3003,7 @@ Tiobench profile options
 .. option:: size=str
 	:noindex:
 
-	Size in MiB
+	Size in MiB.
 
 .. option:: block=int
 	:noindex:
@@ -3109,9 +3106,9 @@ are readers and 11--20 are writers.
 The other values are fairly self explanatory -- number of threads currently
 running and doing I/O, the number of currently open files (f=), the estimated
 completion percentage, the rate of I/O since last check (read speed listed first,
-then write speed and optionally trim speed) in terms of bandwidth and IOPS, and time to completion for the current
-running group. It's impossible to estimate runtime of the following groups (if
-any).
+then write speed and optionally trim speed) in terms of bandwidth and IOPS,
+and time to completion for the current running group. It's impossible to estimate
+runtime of the following groups (if any).
 
 ..
 	Example output was based on the following:
@@ -3261,7 +3258,7 @@ For each data direction it prints:
 **run**
 		The smallest and longest runtimes of the threads in this group.
 
-And finally, the disk statistics are printed. They will look like this::
+And finally, the disk statistics are printed. This is Linux specific. They will look like this::
 
   Disk stats (read/write):
     sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
@@ -3312,7 +3309,7 @@ field was introduced or whether it's specific to some terse version):
 
     ::
 
-	terse version, fio version [v3], jobname, groupid, error
+        terse version, fio version [v3], jobname, groupid, error
 
     READ status::
 
@@ -3321,8 +3318,8 @@ field was introduced or whether it's specific to some terse version):
         Completion latency: min, max, mean, stdev (usec)
         Completion latency percentiles: 20 fields (see below)
         Total latency: min, max, mean, stdev (usec)
-	Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
-	IOPS [v5]: min, max, mean, stdev, number of samples
+        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+        IOPS [v5]: min, max, mean, stdev, number of samples
 
     WRITE status:
 
@@ -3333,12 +3330,12 @@ field was introduced or whether it's specific to some terse version):
         Completion latency: min, max, mean, stdev (usec)
         Completion latency percentiles: 20 fields (see below)
         Total latency: min, max, mean, stdev (usec)
-	Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
-	IOPS [v5]: min, max, mean, stdev, number of samples
+        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+        IOPS [v5]: min, max, mean, stdev, number of samples
 
     TRIM status [all but version 3]:
 
-	Fields are similar to READ/WRITE status.
+        Fields are similar to READ/WRITE status.
 
     CPU usage::
 
@@ -3358,10 +3355,8 @@ field was introduced or whether it's specific to some terse version):
 
     Disk utilization [v3]::
 
-        Disk name, Read ios, write ios,
-        Read merges, write merges,
-        Read ticks, write ticks,
-        Time spent in queue, disk utilization percentage
+        disk name, read ios, write ios, read merges, write merges, read ticks, write ticks,
+        time spent in queue, disk utilization percentage
 
     Additional Info (dependent on continue_on_error, default off)::
 
@@ -3374,17 +3369,17 @@ field was introduced or whether it's specific to some terse version):
 Completion latency percentiles can be a grouping of up to 20 sets, so for the
 terse output fio writes all of them. Each field will look like this::
 
-	1.00%=6112
+        1.00%=6112
 
 which is the Xth percentile, and the `usec` latency associated with it.
 
-For disk utilization, all disks used by fio are shown. So for each disk there
+For `Disk utilization`, all disks used by fio are shown. So for each disk there
 will be a disk utilization section.
 
 Below is a single line containing short names for each of the fields in the
 minimal output v3, separated by semicolons::
 
-	terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+        terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 
 JSON+ output
@@ -3405,7 +3400,7 @@ Also included with fio is a Python script `fio_jsonplus_clat2csv` that takes
 json+ output and generates CSV-formatted latency data suitable for plotting.
 
 The latency durations actually represent the midpoints of latency intervals.
-For details refer to stat.h.
+For details refer to :file:`stat.h`.
 
 
 Trace file format
@@ -3425,7 +3420,7 @@ Each line represents a single I/O action in the following format::
 
 	rw, offset, length
 
-where `rw=0/1` for read/write, and the offset and length entries being in bytes.
+where `rw=0/1` for read/write, and the `offset` and `length` entries are in bytes.
 
 This format is not supported in fio versions >= 1.20-rc3.
 
@@ -3447,15 +3442,15 @@ The file management format::
 
     filename action
 
-The filename is given as an absolute path. The action can be one of these:
+The `filename` is given as an absolute path. The `action` can be one of these:
 
 **add**
-		Add the given filename to the trace.
+		Add the given `filename` to the trace.
 **open**
-		Open the file with the given filename. The filename has to have
+		Open the file with the given `filename`. The `filename` has to have
 		been added with the **add** action before.
 **close**
-		Close the file with the given filename. The file has to have been
+		Close the file with the given `filename`. The file has to have been
 		opened before.
 
 
@@ -3538,8 +3533,8 @@ will then execute the trigger.
 Verification trigger example
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Let's say we want to run a powercut test on the remote machine 'server'.  Our
-write workload is in :file:`write-test.fio`. We want to cut power to 'server' at
+Let's say we want to run a powercut test on the remote Linux machine 'server'.
+Our write workload is in :file:`write-test.fio`. We want to cut power to 'server' at
 some point during the run, and we'll run this test from the safety or our local
 machine, 'localbox'. On the server, we'll start the fio backend normally::
 
@@ -3626,7 +3621,7 @@ Under Test" while being controlled by a client on another machine.
 
 Start the server on the machine which has access to the storage DUT::
 
-	fio --server=args
+	$ fio --server=args
 
 where `args` defines what fio listens to. The arguments are of the form
 ``type,hostname`` or ``IP,port``. *type* is either ``ip`` (or ip4) for TCP/IP
diff --git a/fio.1 b/fio.1
index 14359e6..792bc9d 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "July 2017" "User Manual"
+.TH fio 1 "August 2017" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -13,72 +13,73 @@ one wants to simulate.
 .SH OPTIONS
 .TP
 .BI \-\-debug \fR=\fPtype
-Enable verbose tracing of various fio actions. May be `all' for all types
-or individual types separated by a comma (e.g. \-\-debug=file,mem will enable
+Enable verbose tracing \fItype\fR of various fio actions. May be `all' for all \fItype\fRs
+or individual types separated by a comma (e.g. `\-\-debug=file,mem' will enable
 file and memory debugging). `help' will list all available tracing options.
 .TP
-.BI \-\-parse-only
+.BI \-\-parse\-only
 Parse options only, don't start any I/O.
 .TP
 .BI \-\-output \fR=\fPfilename
 Write output to \fIfilename\fR.
 .TP
-.BI \-\-output-format \fR=\fPformat
-Set the reporting format to \fInormal\fR, \fIterse\fR, \fIjson\fR, or
-\fIjson+\fR. Multiple formats can be selected, separate by a comma. \fIterse\fR
-is a CSV based format. \fIjson+\fR is like \fIjson\fR, except it adds a full
+.BI \-\-output\-format \fR=\fPformat
+Set the reporting \fIformat\fR to `normal', `terse', `json', or
+`json+'. Multiple formats can be selected, separated by a comma. `terse'
+is a CSV based format. `json+' is like `json', except it adds a full
 dump of the latency buckets.
 .TP
 .BI \-\-runtime \fR=\fPruntime
 Limit run time to \fIruntime\fR seconds.
 .TP
-.B \-\-bandwidth\-log
+.BI \-\-bandwidth\-log
 Generate aggregate bandwidth logs.
 .TP
-.B \-\-minimal
-Print statistics in a terse, semicolon-delimited format.
+.BI \-\-minimal
+Print statistics in a terse, semicolon\-delimited format.
 .TP
-.B \-\-append-terse
-Print statistics in selected mode AND terse, semicolon-delimited format.
-Deprecated, use \-\-output-format instead to select multiple formats.
+.BI \-\-append\-terse
+Print statistics in selected mode AND terse, semicolon\-delimited format.
+\fBDeprecated\fR, use \fB\-\-output\-format\fR instead to select multiple formats.
 .TP
 .BI \-\-terse\-version \fR=\fPversion
-Set terse version output format (default 3, or 2, 4, 5)
+Set terse \fIversion\fR output format (default `3', or `2', `4', `5').
 .TP
-.B \-\-version
+.BI \-\-version
 Print version information and exit.
 .TP
-.B \-\-help
+.BI \-\-help
 Print a summary of the command line options and exit.
 .TP
-.B \-\-cpuclock-test
+.BI \-\-cpuclock\-test
 Perform test and validation of internal CPU clock.
 .TP
 .BI \-\-crctest \fR=\fP[test]
-Test the speed of the built-in checksumming functions. If no argument is given,
+Test the speed of the built\-in checksumming functions. If no argument is given,
 all of them are tested. Alternatively, a comma separated list can be passed, in which
 case the given ones are tested.
 .TP
 .BI \-\-cmdhelp \fR=\fPcommand
 Print help information for \fIcommand\fR. May be `all' for all commands.
 .TP
-.BI \-\-enghelp \fR=\fPioengine[,command]
-List all commands defined by \fIioengine\fR, or print help for \fIcommand\fR defined by \fIioengine\fR.
-If no \fIioengine\fR is given, list all available ioengines.
+.BI \-\-enghelp \fR=\fP[ioengine[,command]]
+List all commands defined by \fIioengine\fR, or print help for \fIcommand\fR
+defined by \fIioengine\fR. If no \fIioengine\fR is given, list all
+available ioengines.
 .TP
 .BI \-\-showcmd \fR=\fPjobfile
-Convert \fIjobfile\fR to a set of command-line options.
+Convert \fIjobfile\fR to a set of command\-line options.
 .TP
 .BI \-\-readonly
-Turn on safety read-only checks, preventing writes. The \-\-readonly
+Turn on safety read\-only checks, preventing writes. The \fB\-\-readonly\fR
 option is an extra safety guard to prevent users from accidentally starting
 a write workload when that is not desired. Fio will only write if
-`rw=write/randwrite/rw/randrw` is given. This extra safety net can be used
-as an extra precaution as \-\-readonly will also enable a write check in
+`rw=write/randwrite/rw/randrw' is given. This extra safety net can be used
+as an extra precaution as \fB\-\-readonly\fR will also enable a write check in
 the I/O engine core to prevent writes due to unknown user space bug(s).
 .TP
 .BI \-\-eta \fR=\fPwhen
-Specifies when real-time ETA estimate should be printed. \fIwhen\fR may
+Specifies when real\-time ETA estimate should be printed. \fIwhen\fR may
 be `always', `never' or `auto'.
 .TP
 .BI \-\-eta\-newline \fR=\fPtime
@@ -91,43 +92,45 @@ the value is interpreted in seconds.
 .TP
 .BI \-\-section \fR=\fPname
 Only run specified section \fIname\fR in job file. Multiple sections can be specified.
-The \-\-section option allows one to combine related jobs into one file.
+The \fB\-\-section\fR option allows one to combine related jobs into one file.
 E.g. one job file could define light, moderate, and heavy sections. Tell
-fio to run only the "heavy" section by giving \-\-section=heavy
+fio to run only the "heavy" section by giving `\-\-section=heavy'
 command line option. One can also specify the "write" operations in one
-section and "verify" operation in another section. The \-\-section option
+section and "verify" operation in another section. The \fB\-\-section\fR option
 only applies to job sections. The reserved *global* section is always
 parsed and used.
 .TP
 .BI \-\-alloc\-size \fR=\fPkb
-Set the internal smalloc pool size to \fIkb\fP in KiB. The
-\-\-alloc-size switch allows one to use a larger pool size for smalloc.
+Set the internal smalloc pool size to \fIkb\fR in KiB. The
+\fB\-\-alloc\-size\fR switch allows one to use a larger pool size for smalloc.
 If running large jobs with randommap enabled, fio can run out of memory.
 Smalloc is an internal allocator for shared structures from a fixed size
 memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
-NOTE: While running .fio_smalloc.* backing store files are visible
-in /tmp.
+NOTE: While running `.fio_smalloc.*' backing store files are visible
+in `/tmp'.
 .TP
 .BI \-\-warnings\-fatal
 All fio parser warnings are fatal, causing fio to exit with an error.
 .TP
 .BI \-\-max\-jobs \fR=\fPnr
-Set the maximum number of threads/processes to support.
+Set the maximum number of threads/processes to support to \fInr\fR.
 .TP
 .BI \-\-server \fR=\fPargs
-Start a backend server, with \fIargs\fP specifying what to listen to. See Client/Server section.
+Start a backend server, with \fIargs\fR specifying what to listen to.
+See \fBCLIENT/SERVER\fR section.
 .TP
 .BI \-\-daemonize \fR=\fPpidfile
-Background a fio server, writing the pid to the given \fIpidfile\fP file.
+Background a fio server, writing the pid to the given \fIpidfile\fR file.
 .TP
 .BI \-\-client \fR=\fPhostname
-Instead of running the jobs locally, send and run them on the given host or set of hosts. See Client/Server section.
+Instead of running the jobs locally, send and run them on the given \fIhostname\fR
+or set of \fIhostname\fRs. See \fBCLIENT/SERVER\fR section.
 .TP
-.BI \-\-remote-config \fR=\fPfile
-Tell fio server to load this local file.
+.BI \-\-remote\-config \fR=\fPfile
+Tell fio server to load this local \fIfile\fR.
 .TP
 .BI \-\-idle\-prof \fR=\fPoption
-Report CPU idleness. \fIoption\fP is one of the following:
+Report CPU idleness. \fIoption\fR is one of the following:
 .RS
 .RS
 .TP
@@ -138,31 +141,31 @@ Run unit work calibration only and exit.
 Show aggregate system idleness and unit work.
 .TP
 .B percpu
-As "system" but also show per CPU idleness.
+As \fBsystem\fR but also show per CPU idleness.
 .RE
 .RE
 .TP
-.BI \-\-inflate-log \fR=\fPlog
-Inflate and output compressed log.
+.BI \-\-inflate\-log \fR=\fPlog
+Inflate and output compressed \fIlog\fR.
 .TP
-.BI \-\-trigger-file \fR=\fPfile
-Execute trigger cmd when file exists.
+.BI \-\-trigger\-file \fR=\fPfile
+Execute trigger command when \fIfile\fR exists.
 .TP
-.BI \-\-trigger-timeout \fR=\fPt
-Execute trigger at this time.
+.BI \-\-trigger\-timeout \fR=\fPtime
+Execute trigger at this \fItime\fR.
 .TP
-.BI \-\-trigger \fR=\fPcmd
-Set this command as local trigger.
+.BI \-\-trigger \fR=\fPcommand
+Set this \fIcommand\fR as local trigger.
 .TP
-.BI \-\-trigger-remote \fR=\fPcmd
-Set this command as remote trigger.
+.BI \-\-trigger\-remote \fR=\fPcommand
+Set this \fIcommand\fR as remote trigger.
 .TP
-.BI \-\-aux-path \fR=\fPpath
-Use this path for fio state generated files.
+.BI \-\-aux\-path \fR=\fPpath
+Use this \fIpath\fR for fio state generated files.
 .SH "JOB FILE FORMAT"
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
-file will be regarded as a separate group. Fio will `stonewall` execution
+file will be regarded as a separate group. Fio will \fBstonewall\fR execution
 between each group.
 
 Fio accepts one or more job files describing what it is
@@ -178,32 +181,30 @@ override a *global* section parameter, and a job file may even have several
 *global* sections if so desired. A job is only affected by a *global* section
 residing above it.
 
-The \-\-cmdhelp option also lists all options. If used with an `option`
-argument, \-\-cmdhelp will detail the given `option`.
+The \fB\-\-cmdhelp\fR option also lists all options. If used with a \fIcommand\fR
+argument, \fB\-\-cmdhelp\fR will detail the given \fIcommand\fR.
 
-See the `examples/` directory in the fio source for inspiration on how to write
-job files. Note the copyright and license requirements currently apply to
-`examples/` files.
+See the `examples/' directory for inspiration on how to write job files. Note
+the copyright and license requirements currently apply to
+`examples/' files.
 .SH "JOB FILE PARAMETERS"
 Some parameters take an option of a given type, such as an integer or a
 string. Anywhere a numeric value is required, an arithmetic expression may be
 used, provided it is surrounded by parentheses. Supported operators are:
 .RS
-.RS
-.TP
+.P
 .B addition (+)
-.TP
-.B subtraction (-)
-.TP
+.P
+.B subtraction (\-)
+.P
 .B multiplication (*)
-.TP
+.P
 .B division (/)
-.TP
+.P
 .B modulus (%)
-.TP
+.P
 .B exponentiation (^)
 .RE
-.RE
 .P
 For time values in expressions, units are microseconds by default. This is
 different than for time values not in expressions (not enclosed in
@@ -238,45 +239,41 @@ default unit is bytes. For quantities of time, the default unit is seconds
 unless otherwise specified.
 .P
 With `kb_base=1000', fio follows international standards for unit
-prefixes. To specify power-of-10 decimal values defined in the
+prefixes. To specify power\-of\-10 decimal values defined in the
 International System of Units (SI):
 .RS
 .P
-Ki means kilo (K) or 1000
-.RE
-.RS
-Mi means mega (M) or 1000**2
-.RE
-.RS
-Gi means giga (G) or 1000**3
-.RE
-.RS
-Ti means tera (T) or 1000**4
-.RE
-.RS
-Pi means peta (P) or 1000**5
-.RE
+.PD 0
+K means kilo (K) or 1000
 .P
-To specify power-of-2 binary values defined in IEC 80000-13:
-.RS
+M means mega (M) or 1000**2
 .P
-K means kibi (Ki) or 1024
-.RE
-.RS
-M means mebi (Mi) or 1024**2
-.RE
-.RS
-G means gibi (Gi) or 1024**3
-.RE
-.RS
-T means tebi (Ti) or 1024**4
+G means giga (G) or 1000**3
+.P
+T means tera (T) or 1000**4
+.P
+P means peta (P) or 1000**5
+.PD
 .RE
+.P
+To specify power\-of\-2 binary values defined in IEC 80000\-13:
 .RS
-P means pebi (Pi) or 1024**5
+.P
+.PD 0
+Ki means kibi (Ki) or 1024
+.P
+Mi means mebi (Mi) or 1024**2
+.P
+Gi means gibi (Gi) or 1024**3
+.P
+Ti means tebi (Ti) or 1024**4
+.P
+Pi means pebi (Pi) or 1024**5
+.PD
 .RE
 .P
 With `kb_base=1024' (the default), the unit prefixes are opposite
-from those specified in the SI and IEC 80000-13 standards to provide
+from those specified in the SI and IEC 80000\-13 standards to provide
 compatibility with old scripts. For example, 4k means 4096.
 .P
 For quantities of data, an optional unit of 'B' may be included
@@ -288,62 +285,55 @@ not milli). 'b' and 'B' both mean byte, not bit.
 Examples with `kb_base=1000':
 .RS
 .P
+.PD 0
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
-.RE
-.RS
+.P
 1 MiB: 1048576, 1m, 1024k
-.RE
-.RS
+.P
 1 MB: 1000000, 1mi, 1000ki
-.RE
-.RS
+.P
 1 TiB: 1073741824, 1t, 1024m, 1048576k
-.RE
-.RS
+.P
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
+.PD
 .RE
 .P
 Examples with `kb_base=1024' (default):
 .RS
 .P
+.PD 0
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
-.RE
-.RS
+.P
 1 MiB: 1048576, 1m, 1024k
-.RE
-.RS
+.P
 1 MB: 1000000, 1mi, 1000ki
-.RE
-.RS
+.P
 1 TiB: 1073741824, 1t, 1024m, 1048576k
-.RE
-.RS
+.P
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
+.PD
 .RE
 .P
 To specify times (units are not case sensitive):
 .RS
 .P
+.PD 0
 D means days
-.RE
-.RS
+.P
 H means hours
-.RE
-.RS
+.P
 M mean minutes
-.RE
-.RS
+.P
 s or sec means seconds (default)
-.RE
-.RS
+.P
 ms or msec means milliseconds
-.RE
-.RS
+.P
 us or usec means microseconds
+.PD
 .RE
 .P
 If the option accepts an upper and lower range, use a colon ':' or
-minus '-' to separate such values. See `irange` parameter type.
+minus '\-' to separate such values. See \fIirange\fR parameter type.
 If the lower value specified happens to be larger than the upper value
 the two values are swapped.
 .RE
@@ -354,63 +344,219 @@ true and false (1 and 0).
 .TP
 .I irange
 Integer range with suffix. Allows value range to be given, such as
-1024-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
+1024\-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
 option allows two sets of ranges, they can be specified with a ',' or '/'
-delimiter: 1k-4k/8k-32k. Also see `int` parameter type.
+delimiter: 1k\-4k/8k\-32k. Also see \fIint\fR parameter type.
 .TP
 .I float_list
 A list of floating point numbers, separated by a ':' character.
-.SH "JOB DESCRIPTION"
+.SH "JOB PARAMETERS"
 With the above in mind, here follows the complete list of fio job parameters.
+.SS "Units"
 .TP
-.BI name \fR=\fPstr
-May be used to override the job name.  On the command line, this parameter
-has the special purpose of signalling the start of a new job.
+.BI kb_base \fR=\fPint
+Select the interpretation of unit prefixes in input parameters.
+.RS
+.RS
 .TP
-.BI wait_for \fR=\fPstr
-Specifies the name of the already defined job to wait for. Single waitee name
-only may be specified. If set, the job won't be started until all workers of
-the waitee job are done.  Wait_for operates on the job name basis, so there are
-a few limitations. First, the waitee must be defined prior to the waiter job
-(meaning no forward references). Second, if a job is being referenced as a
-waitee, it must have a unique name (no duplicate waitees).
+.B 1000
+Inputs comply with IEC 80000\-13 and the International
+System of Units (SI). Use:
+.RS
+.P
+.PD 0
+\- power\-of\-2 values with IEC prefixes (e.g., KiB)
+.P
+\- power\-of\-10 values with SI prefixes (e.g., kB)
+.PD
+.RE
+.TP
+.B 1024
+Compatibility mode (default). To avoid breaking old scripts:
+.P
+.RS
+.PD 0
+\- power\-of\-2 values with SI prefixes
+.P
+\- power\-of\-10 values with IEC prefixes
+.PD
+.RE
+.RE
+.P
+See \fBbs\fR for more details on input parameters.
+.P
+Outputs always use correct prefixes. Most outputs include both
+side\-by\-side, like:
+.P
+.RS
+bw=2383.3kB/s (2327.4KiB/s)
+.RE
+.P
+If only one value is reported, then kb_base selects the one to use:
+.P
+.RS
+.PD 0
+1000 \-\- SI prefixes
+.P
+1024 \-\- IEC prefixes
+.PD
+.RE
+.RE
+.TP
+.BI unit_base \fR=\fPint
+Base unit for reporting. Allowed values are:
+.RS
+.RS
+.TP
+.B 0
+Use auto\-detection (default).
+.TP
+.B 8
+Byte based.
+.TP
+.B 1
+Bit based.
+.RE
+.RE
+.SS "Job description"
+.TP
+.BI name \fR=\fPstr
+ASCII name of the job. This may be used to override the name printed by fio
+for this job. Otherwise the job name is used. On the command line this
+parameter has the special purpose of also signaling the start of a new job.
 .TP
 .BI description \fR=\fPstr
-Human-readable description of the job. It is printed when the job is run, but
-otherwise has no special purpose.
+Text description of the job. Doesn't do anything except dump this text
+description when this job is run. It's not parsed.
+.TP
+.BI loops \fR=\fPint
+Run the specified number of iterations of this job. Used to repeat the same
+workload a given number of times. Defaults to 1.
+.TP
+.BI numjobs \fR=\fPint
+Create the specified number of clones of this job. Each clone of job
+is spawned as an independent thread or process. May be used to setup a
+larger number of threads/processes doing the same thing. Each thread is
+reported separately; to see statistics for all clones as a whole, use
+\fBgroup_reporting\fR in conjunction with \fBnew_group\fR.
+See \fB\-\-max\-jobs\fR. Default: 1.
+.SS "Time related parameters"
+.TP
+.BI runtime \fR=\fPtime
+Tell fio to terminate processing after the specified period of time. It
+can be quite hard to determine for how long a specified job will run, so
+this parameter is handy to cap the total runtime to a given time. When
+the unit is omitted, the value is interpreted in seconds.
+.TP
+.BI time_based
+If set, fio will run for the duration of the \fBruntime\fR specified
+even if the file(s) are completely read or written. It will simply loop over
+the same workload as many times as the \fBruntime\fR allows.
+.TP
+.BI startdelay \fR=\fPirange(int)
+Delay the start of job for the specified amount of time. Can be a single
+value or a range. When given as a range, each thread will choose a value
+randomly from within the range. Value is in seconds if a unit is omitted.
+.TP
+.BI ramp_time \fR=\fPtime
+If set, fio will run the specified workload for this amount of time before
+logging any performance numbers. Useful for letting performance settle
+before logging results, thus minimizing the runtime required for stable
+results. Note that the \fBramp_time\fR is considered lead in time for a job,
+thus it will increase the total runtime if a special timeout or
+\fBruntime\fR is specified. When the unit is omitted, the value is
+given in seconds.
+.TP
+.BI clocksource \fR=\fPstr
+Use the given clocksource as the base of timing. The supported options are:
+.RS
+.RS
+.TP
+.B gettimeofday
+\fBgettimeofday\fR\|(2)
+.TP
+.B clock_gettime
+\fBclock_gettime\fR\|(2)
+.TP
+.B cpu
+Internal CPU clock source
+.RE
+.P
+\fBcpu\fR is the preferred clocksource if it is reliable, as it is very fast (and
+fio is heavy on time calls). Fio will automatically use this clocksource if
+it's supported and considered reliable on the system it is running on,
+unless another clocksource is specifically set. For x86/x86\-64 CPUs, this
+means supporting TSC Invariant.
+.RE
+.TP
+.BI gtod_reduce \fR=\fPbool
+Enable all of the \fBgettimeofday\fR\|(2) reducing options
+(\fBdisable_clat\fR, \fBdisable_slat\fR, \fBdisable_bw_measurement\fR) plus
+reduce precision of the timeout somewhat to really shrink the
+\fBgettimeofday\fR\|(2) call count. With this option enabled, we only do
+about 0.4% of the \fBgettimeofday\fR\|(2) calls we would have done if all
+time keeping was enabled.
+.TP
+.BI gtod_cpu \fR=\fPint
+Sometimes it's cheaper to dedicate a single thread of execution to just
+getting the current time. Fio (and databases, for instance) are very
+intensive on \fBgettimeofday\fR\|(2) calls. With this option, you can set
+one CPU aside for doing nothing but logging current time to a shared memory
+location. Then the other threads/processes that run I/O workloads need only
+copy that segment, instead of entering the kernel with a
+\fBgettimeofday\fR\|(2) call. The CPU set aside for doing these time
+calls will be excluded from other uses. Fio will manually clear it from the
+CPU mask of other jobs.
+.SS "Target file/device"
 .TP
 .BI directory \fR=\fPstr
-Prefix filenames with this directory.  Used to place files in a location other
-than `./'.
-You can specify a number of directories by separating the names with a ':'
-character. These directories will be assigned equally distributed to job clones
-creates with \fInumjobs\fR as long as they are using generated filenames.
-If specific \fIfilename(s)\fR are set fio will use the first listed directory,
-and thereby matching the  \fIfilename\fR semantic which generates a file each
-clone if not specified, but let all clones use the same if set. See
-\fIfilename\fR for considerations regarding escaping certain characters on
-some platforms.
+Prefix \fBfilename\fRs with this directory. Used to place files in a different
+location than `./'. You can specify a number of directories by
+separating the names with a ':' character. These directories will be
+distributed equally among job clones created by \fBnumjobs\fR as long as
+they are using generated filenames. If specific \fBfilename\fR(s) are set,
+fio will use the first listed directory, thereby matching the \fBfilename\fR
+semantic, which generates a file for each clone if not specified but lets
+all clones use the same file if set.
+.RS
+.P
+See the \fBfilename\fR option for information on how to escape ':' and '\\'
+characters within the directory path itself.
+.RE
 .TP
 .BI filename \fR=\fPstr
-.B fio
-normally makes up a file name based on the job name, thread number, and file
-number. If you want to share files between threads in a job or several jobs,
-specify a \fIfilename\fR for each of them to override the default.
-If the I/O engine is file-based, you can specify
-a number of files by separating the names with a `:' character. `\-' is a
-reserved name, meaning stdin or stdout, depending on the read/write direction
-set. On Windows, disk devices are accessed as \\.\PhysicalDrive0 for the first
-device, \\.\PhysicalDrive1 for the second etc. Note: Windows and FreeBSD
-prevent write access to areas of the disk containing in-use data
-(e.g. filesystems). If the wanted filename does need to include a colon, then
-escape that with a '\\' character. For instance, if the filename is
-"/dev/dsk/foo@3,0:c", then you would use filename="/dev/dsk/foo@3,0\\:c".
+Fio normally makes up a \fBfilename\fR based on the job name, thread number, and
+file number (see \fBfilename_format\fR). If you want to share files
+between threads in a job or several jobs with fixed file paths, specify a
+\fBfilename\fR for each of them to override the default. If the ioengine is
+file based, you can specify a number of files by separating the names with
+a ':' character. So if you wanted a job to open
+`/dev/sda' and `/dev/sdb' as the two working files, you would use
+`filename=/dev/sda:/dev/sdb'. This also means that whenever this option is
+specified, \fBnrfiles\fR is ignored. The size of regular files specified
+by this option will be \fBsize\fR divided by the number of files unless an
+explicit size is specified by \fBfilesize\fR.
+.RS
+.P
+Each colon and backslash in the wanted path must be escaped with a '\\'
+character. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
+would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
+`F:\\\\filename' then you would use `filename=F\\:\\\\filename'.
+.P
+On Windows, disk devices are accessed as `\\\\\\\\.\\\\PhysicalDrive0' for
+the first device, `\\\\\\\\.\\\\PhysicalDrive1' for the second etc.
+Note: Windows and FreeBSD prevent write access to areas
+of the disk containing in\-use data (e.g. filesystems).
+.P
+The filename `\-' is a reserved name, meaning \fIstdin\fR or \fIstdout\fR.
+Which of the two depends on the read/write direction set.
+.RE
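The colon and backslash escaping rule above can be sketched in Python. This is not fio's own parser: split_filenames is a hypothetical helper that assumes only the two rules stated here, namely that an unescaped ':' separates names and that a backslash makes the following character literal.

```python
def split_filenames(value: str) -> list[str]:
    """Split a filename= value on unescaped ':' (hypothetical helper)."""
    names: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A backslash makes the next character (':' or '\') literal.
            current.append(value[i + 1])
            i += 2
            continue
        if ch == ":":
            # An unescaped colon ends the current name.
            names.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    names.append("".join(current))
    return names


print(split_filenames("/dev/sda:/dev/sdb"))     # ['/dev/sda', '/dev/sdb']
print(split_filenames(r"/dev/dsk/foo@3,0\:c"))  # ['/dev/dsk/foo@3,0:c']
```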
 .TP
 .BI filename_format \fR=\fPstr
-If sharing multiple files between jobs, it is usually necessary to have
-fio generate the exact names that you want. By default, fio will name a file
+If sharing multiple files between jobs, it is usually necessary to have fio
+generate the exact names that you want. By default, fio will name a file
 based on the default file format specification of
-\fBjobname.jobnumber.filenumber\fP. With this option, that can be
+`jobname.jobnumber.filenumber'. With this option, that can be
 customized. Fio will recognize and replace the following keywords in this
 string:
 .RS
@@ -426,44 +572,168 @@ The incremental number of the worker thread or process.
 The incremental number of the file for that worker thread or process.
 .RE
 .P
-To have dependent jobs share a set of files, this option can be set to
-have fio generate filenames that are shared between the two. For instance,
-if \fBtestfiles.$filenum\fR is specified, file number 4 for any job will
-be named \fBtestfiles.4\fR. The default of \fB$jobname.$jobnum.$filenum\fR
+To have dependent jobs share a set of files, this option can be set to have
+fio generate filenames that are shared between the two. For instance, if
+`testfiles.$filenum' is specified, file number 4 for any job will be
+named `testfiles.4'. The default of `$jobname.$jobnum.$filenum'
 will be used if no other format specifier is given.
 .RE
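The keyword substitution can be illustrated with a short Python sketch that covers the three keywords referenced in this section. The name expand_filename_format is hypothetical. The single-pass regular expression is one way to avoid re-expanding text that a substituted job name might itself contain.

```python
import re


def expand_filename_format(fmt: str, jobname: str, jobnum: int, filenum: int) -> str:
    """Expand $jobname, $jobnum and $filenum in a filename_format string."""
    values = {"jobname": jobname, "jobnum": str(jobnum), "filenum": str(filenum)}
    # One pass, so a job name that happens to contain "$filenum" is left alone.
    return re.sub(r"\$(jobname|jobnum|filenum)", lambda m: values[m.group(1)], fmt)


print(expand_filename_format("testfiles.$filenum", "job1", 0, 4))         # testfiles.4
print(expand_filename_format("$jobname.$jobnum.$filenum", "job1", 2, 0))  # job1.2.0
```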
-.P
 .TP
 .BI unique_filename \fR=\fPbool
-To avoid collisions between networked clients, fio defaults to prefixing
-any generated filenames (with a directory specified) with the source of
-the client connecting. To disable this behavior, set this option to 0.
+To avoid collisions between networked clients, fio defaults to prefixing any
+generated filenames (with a directory specified) with the source of the
+client connecting. To disable this behavior, set this option to 0.
+.TP
+.BI opendir \fR=\fPstr
+Recursively open any files below directory \fIstr\fR.
 .TP
 .BI lockfile \fR=\fPstr
-Fio defaults to not locking any files before it does IO to them. If a file or
-file descriptor is shared, fio can serialize IO to that file to make the end
-result consistent. This is usual for emulating real workloads that share files.
-The lock modes are:
+Fio defaults to not locking any files before it does I/O to them. If a file
+or file descriptor is shared, fio can serialize I/O to that file to make the
+end result consistent. This is typical when emulating real workloads that
+share files. The lock modes are:
 .RS
 .RS
 .TP
 .B none
-No locking. This is the default.
+No locking. The default.
 .TP
 .B exclusive
-Only one thread or process may do IO at a time, excluding all others.
+Only one thread or process may do I/O at a time, excluding all others.
 .TP
 .B readwrite
-Read-write locking on the file. Many readers may access the file at the same
-time, but writes get exclusive access.
+Read\-write locking on the file. Many readers may
+access the file at the same time, but writes get exclusive access.
+.RE
 .RE
+.TP
+.BI nrfiles \fR=\fPint
+Number of files to use for this job. Defaults to 1. The size of files
+will be \fBsize\fR divided by this unless an explicit size is specified by
+\fBfilesize\fR. Files are created for each thread separately, and each
+file will have a file number within its name by default, as explained in
+the \fBfilename\fR section.
+.TP
+.BI openfiles \fR=\fPint
+Number of files to keep open at the same time. Defaults to the same as
+\fBnrfiles\fR; can be set smaller to limit the number of simultaneous
+opens.
+.TP
+.BI file_service_type \fR=\fPstr
+Defines how fio decides which file from a job to service next. The following
+types are defined:
+.RS
+.RS
+.TP
+.B random
+Choose a file at random.
+.TP
+.B roundrobin
+Round robin over opened files. This is the default.
+.TP
+.B sequential
+Finish one file before moving on to the next. Multiple files can
+still be open depending on \fBopenfiles\fR.
+.TP
+.B zipf
+Use a Zipf distribution to decide what file to access.
+.TP
+.B pareto
+Use a Pareto distribution to decide what file to access.
+.TP
+.B normal
+Use a Gaussian (normal) distribution to decide what file to access.
+.TP
+.B gauss
+Alias for normal.
 .RE
 .P
-.BI opendir \fR=\fPstr
-Recursively open any files below directory \fIstr\fR.
+For \fBrandom\fR, \fBroundrobin\fR, and \fBsequential\fR, a postfix can be appended to
+tell fio how many I/Os to issue before switching to a new file. For example,
+specifying `file_service_type=random:8' would cause fio to issue
+8 I/Os before selecting a new file at random. For the non\-uniform
+distributions, a floating point postfix can be given to influence how the
+distribution is skewed. See \fBrandom_distribution\fR for a description
+of how that would work.
+.RE
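The effect of the ':N' postfix can be modelled as a generator that yields which file each successive I/O targets. This Python sketch is a simplified model, not fio's scheduler. It assumes all files stay open (ignoring openfiles) and that, without a postfix, fio switches files after every I/O.

```python
import random
from itertools import islice
from typing import Iterator


def roundrobin_files(nfiles: int, ios_per_file: int = 1) -> Iterator[int]:
    """Yield the file index serviced by each I/O, like roundrobin:N."""
    current = 0
    while True:
        for _ in range(ios_per_file):
            yield current
        current = (current + 1) % nfiles


def random_files(nfiles: int, ios_per_file: int = 1, seed: int = 0) -> Iterator[int]:
    """Yield file indices, picking a new random file every N I/Os, like random:N."""
    rng = random.Random(seed)
    while True:
        current = rng.randrange(nfiles)
        for _ in range(ios_per_file):
            yield current


print(list(islice(roundrobin_files(3, 2), 8)))  # [0, 0, 1, 1, 2, 2, 0, 0]
```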
+.TP
+.BI ioscheduler \fR=\fPstr
+Attempt to switch the device hosting the file to the specified I/O scheduler
+before running.
+.TP
+.BI create_serialize \fR=\fPbool
+If true, serialize the file creation for the jobs. This may be handy to
+avoid interleaving of data files, which may greatly depend on the filesystem
+used and even the number of processors in the system. Default: true.
+.TP
+.BI create_fsync \fR=\fPbool
+\fBfsync\fR\|(2) the data file after creation. This is the default.
+.TP
+.BI create_on_open \fR=\fPbool
+If true, don't pre\-create files but allow the job's open() to create a file
+when it's time to do I/O. Default: false \-\- pre\-create all necessary files
+when the job starts.
+.TP
+.BI create_only \fR=\fPbool
+If true, fio will only run the setup phase of the job. If files need to be
+laid out or updated on disk, only that will be done \-\- the actual job contents
+are not executed. Default: false.
+.TP
+.BI allow_file_create \fR=\fPbool
+If true, fio is permitted to create files as part of its workload. If this
+option is false, then fio will error out if
+the files it needs to use don't already exist. Default: true.
+.TP
+.BI allow_mounted_write \fR=\fPbool
+If this isn't set, fio will abort jobs that are destructive (e.g. that write)
+to what appears to be a mounted device or partition. This should help catch
+inadvertently destructive tests, where the user did not realize that the test
+will destroy data on the mounted file system. Note that some platforms don't allow
+writing against a mounted device regardless of this option. Default: false.
+.TP
+.BI pre_read \fR=\fPbool
+If this is given, files will be pre\-read into memory before starting the
+given I/O operation. This will also clear the \fBinvalidate\fR flag,
+since it is pointless to pre\-read and then drop the cache. This will only
+work for I/O engines that are seek\-able, since they allow you to read the
+same data multiple times. Thus it will not work on non\-seekable I/O engines
+(e.g. network, splice). Default: false.
+.TP
+.BI unlink \fR=\fPbool
+Unlink the job files when done. Not the default, as repeated runs of that
+job would then waste time recreating the file set again and again. Default:
+false.
+.TP
+.BI unlink_each_loop \fR=\fPbool
+Unlink job files after each iteration or loop. Default: false.
+.TP
+.BI zonesize \fR=\fPint
+Divide a file into zones of the specified size. See \fBzoneskip\fR.
+.TP
+.BI zonerange \fR=\fPint
+Give size of an I/O zone. See \fBzoneskip\fR.
+.TP
+.BI zoneskip \fR=\fPint
+Skip the specified number of bytes when \fBzonesize\fR data has been
+read. The two zone options can be used to only do I/O on zones of a file.
+.SS "I/O type"
+.TP
+.BI direct \fR=\fPbool
+If value is true, use non\-buffered I/O. This is usually O_DIRECT. Note that
+ZFS on Solaris doesn't support direct I/O. On Windows the synchronous
+ioengines don't support direct I/O. Default: false.
+.TP
+.BI atomic \fR=\fPbool
+If value is true, attempt to use atomic direct I/O. Atomic writes are
+guaranteed to be stable once acknowledged by the operating system. Only
+Linux supports O_ATOMIC right now.
+.TP
+.BI buffered \fR=\fPbool
+If value is true, use buffered I/O. This is the opposite of the
+\fBdirect\fR option. Defaults to true.
 .TP
 .BI readwrite \fR=\fPstr "\fR,\fP rw" \fR=\fPstr
-Type of I/O pattern.  Accepted values are:
+Type of I/O pattern. Accepted values are:
 .RS
 .RS
 .TP
@@ -485,71 +755,67 @@ Random writes.
 .B randtrim
 Random trims (Linux block devices only).
 .TP
-.B rw, readwrite
-Mixed sequential reads and writes.
+.B rw,readwrite
+Sequential mixed reads and writes.
 .TP
 .B randrw
-Mixed random reads and writes.
+Random mixed reads and writes.
 .TP
 .B trimwrite
-Sequential trim and write mixed workload. Blocks will be trimmed first, then
-the same blocks will be written to.
+Sequential trim+write sequences. Blocks will be trimmed first,
+then the same blocks will be written to.
 .RE
 .P
-Fio defaults to read if the option is not specified.
-For mixed I/O, the default split is 50/50. For certain types of io the result
-may still be skewed a bit, since the speed may be different. It is possible to
-specify a number of IOs to do before getting a new offset, this is done by
-appending a `:\fI<nr>\fR to the end of the string given. For a random read, it
-would look like \fBrw=randread:8\fR for passing in an offset modifier with a
-value of 8. If the postfix is used with a sequential IO pattern, then the value
-specified will be added to the generated offset for each IO. For instance,
-using \fBrw=write:4k\fR will skip 4k for every write. It turns sequential IO
-into sequential IO with holes. See the \fBrw_sequencer\fR option.
+Fio defaults to read if the option is not specified. For the mixed I/O
+types, the default is to split them 50/50. For certain types of I/O the
+result may still be skewed a bit, since the speed may be different.
+.P
+It is possible to specify the number of I/Os to do before getting a new
+offset by appending `:<nr>' to the end of the string given. For a
+random read, it would look like `rw=randread:8' for passing in an offset
+modifier with a value of 8. If the suffix is used with a sequential I/O
+pattern, then the `<nr>' value specified will be added to the generated
+offset for each I/O, turning sequential I/O into sequential I/O with holes.
+For instance, using `rw=write:4k' will skip 4k for every write. Also see
+the \fBrw_sequencer\fR option.
 .RE
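For a sequential pattern, the arithmetic behind 'rw=write:4k' can be checked by hand. The hypothetical helper below assumes the job starts at offset 0 with bs=4k. Each I/O advances by the block size plus the skip value, so the 4k gaps between writes are the holes.

```python
def sequential_offsets_with_holes(bs: int, skip: int, count: int, start: int = 0) -> list[int]:
    """Offsets for a sequential job with an rw=...:<skip> modifier."""
    offsets = []
    offset = start
    for _ in range(count):
        offsets.append(offset)
        offset += bs + skip  # the transferred block plus the hole after it
    return offsets


print(sequential_offsets_with_holes(4096, 4096, 4))  # [0, 8192, 16384, 24576]
```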
 .TP
 .BI rw_sequencer \fR=\fPstr
-If an offset modifier is given by appending a number to the \fBrw=<str>\fR line,
-then this option controls how that number modifies the IO offset being
-generated. Accepted values are:
+If an offset modifier is given by appending a number to the `rw=\fIstr\fR'
+line, then this option controls how that number modifies the I/O offset
+being generated. Accepted values are:
 .RS
 .RS
 .TP
 .B sequential
-Generate sequential offset
+Generate sequential offset.
 .TP
 .B identical
-Generate the same offset
+Generate the same offset.
 .RE
 .P
-\fBsequential\fR is only useful for random IO, where fio would normally
-generate a new random offset for every IO. If you append eg 8 to randread, you
-would get a new random offset for every 8 IOs. The result would be a seek for
-only every 8 IOs, instead of for every IO. Use \fBrw=randread:8\fR to specify
-that. As sequential IO is already sequential, setting \fBsequential\fR for that
-would not result in any differences.  \fBidentical\fR behaves in a similar
-fashion, except it sends the same offset 8 number of times before generating a
-new offset.
+\fBsequential\fR is only useful for random I/O, where fio would normally
+generate a new random offset for every I/O. If you append e.g. 8 to randread,
+you would get a new random offset for every 8 I/Os. The result would be a
+seek for only every 8 I/Os, instead of for every I/O. Use `rw=randread:8'
+to specify that. As sequential I/O is already sequential, setting
+\fBsequential\fR for that would not result in any differences. \fBidentical\fR
+behaves in a similar fashion, except it sends the same offset 8 times
+before generating a new offset.
 .RE
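The difference between the two sequencer modes for 'rw=randread:8' can be sketched as follows. This is a simplified Python model, not fio's offset generator. Offsets are block-aligned, the random source is Python's random.Random, and a sequential run that reaches the end of the file wraps to the start. The wrapping is an assumption made to keep the example short.

```python
import random


def randread_offsets(nblocks: int, bs: int, nr: int, sequencer: str,
                     count: int, seed: int = 0) -> list[int]:
    """Offsets for rw=randread:<nr> under rw_sequencer=sequential|identical."""
    if sequencer not in ("sequential", "identical"):
        raise ValueError(f"unknown rw_sequencer: {sequencer}")
    rng = random.Random(seed)
    offsets: list[int] = []
    while len(offsets) < count:
        block = rng.randrange(nblocks)  # one seek per group of nr I/Os
        for i in range(min(nr, count - len(offsets))):
            if sequencer == "sequential":
                offsets.append(((block + i) % nblocks) * bs)  # walk forward
            else:
                offsets.append(block * bs)  # repeat the same offset
    return offsets


print(randread_offsets(1000, 4096, 4, "identical", 8))
```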
-.P
-.TP
-.BI kb_base \fR=\fPint
-The base unit for a kilobyte. The defacto base is 2^10, 1024.  Storage
-manufacturers like to use 10^3 or 1000 as a base ten unit instead, for obvious
-reasons. Allowed values are 1024 or 1000, with 1024 being the default.
 .TP
 .BI unified_rw_reporting \fR=\fPbool
 Fio normally reports statistics on a per data direction basis, meaning that
-reads, writes, and trims are accounted and reported separately. If this option is
-set fio sums the results and reports them as "mixed" instead.
+reads, writes, and trims are accounted and reported separately. If this
+option is set, fio sums the results and reports them as "mixed" instead.
 .TP
 .BI randrepeat \fR=\fPbool
-Seed the random number generator used for random I/O patterns in a predictable
-way so the pattern is repeatable across runs.  Default: true.
+Seed the random number generator used for random I/O patterns in a
+predictable way so the pattern is repeatable across runs. Default: true.
 .TP
 .BI allrandrepeat \fR=\fPbool
 Seed all random number generators in a predictable way so results are
-repeatable across runs.  Default: false.
+repeatable across runs. Default: false.
 .TP
 .BI randseed \fR=\fPint
 Seed the random number generators based on this seed value, to be able to
@@ -557,35 +823,36 @@ control what sequence of output is being generated. If not set, the random
 sequence depends on the \fBrandrepeat\fR setting.
 .TP
 .BI fallocate \fR=\fPstr
-Whether pre-allocation is performed when laying down files. Accepted values
-are:
+Whether pre\-allocation is performed when laying down files.
+Accepted values are:
 .RS
 .RS
 .TP
 .B none
-Do not pre-allocate space.
+Do not pre\-allocate space.
 .TP
 .B native
-Use a platform's native pre-allocation call but fall back to 'none' behavior if
-it fails/is not implemented.
+Use a platform's native pre\-allocation call but fall back to
+\fBnone\fR behavior if it fails/is not implemented.
 .TP
 .B posix
-Pre-allocate via \fBposix_fallocate\fR\|(3).
+Pre\-allocate via \fBposix_fallocate\fR\|(3).
 .TP
 .B keep
-Pre-allocate via \fBfallocate\fR\|(2) with FALLOC_FL_KEEP_SIZE set.
+Pre\-allocate via \fBfallocate\fR\|(2) with
+FALLOC_FL_KEEP_SIZE set.
 .TP
 .B 0
-Backward-compatible alias for 'none'.
+Backward\-compatible alias for \fBnone\fR.
 .TP
 .B 1
-Backward-compatible alias for 'posix'.
+Backward\-compatible alias for \fBposix\fR.
 .RE
 .P
-May not be available on all supported platforms. 'keep' is only
-available on Linux. If using ZFS on Solaris this cannot be set to 'posix'
-because ZFS doesn't support it. Default: 'native' if any pre-allocation methods
-are available, 'none' if not.
+May not be available on all supported platforms. \fBkeep\fR is only available
+on Linux. If using ZFS on Solaris this cannot be set to \fBposix\fR
+because ZFS doesn't support pre\-allocation. Default: \fBnative\fR if any
+pre\-allocation methods are available, \fBnone\fR if not.
 .RE
 .TP
 .BI fadvise_hint \fR=\fPstr
@@ -599,21 +866,20 @@ Backwards compatible hint for "no hint".
 .TP
 .B 1
 Backwards compatible hint for "advise with fio workload type". This
-uses \fBFADV_RANDOM\fR for a random workload, and \fBFADV_SEQUENTIAL\fR
+uses FADV_RANDOM for a random workload, and FADV_SEQUENTIAL
 for a sequential workload.
 .TP
 .B sequential
-Advise using \fBFADV_SEQUENTIAL\fR
+Advise using FADV_SEQUENTIAL.
 .TP
 .B random
-Advise using \fBFADV_RANDOM\fR
+Advise using FADV_RANDOM.
 .RE
 .RE
 .TP
 .BI write_hint \fR=\fPstr
-Use \fBfcntl\fR\|(2) to advise the kernel what life time to expect from a write.
-Only supported on Linux, as of version 4.13. The values are all relative to
-each other, and no absolute meaning should be associated with them. Accepted
+Use \fBfcntl\fR\|(2) to advise the kernel what life time to expect
+from a write. Only supported on Linux, as of version 4.13. Accepted
 values are:
 .RS
 .RS
@@ -633,235 +899,536 @@ Data written to this file has a long life time.
 .B extreme
 Data written to this file has a very long life time.
 .RE
+.P
+The values are all relative to each other, and no absolute meaning
+should be associated with them.
 .RE
 .TP
-.BI size \fR=\fPint
-Total size of I/O for this job.  \fBfio\fR will run until this many bytes have
-been transferred, unless limited by other options (\fBruntime\fR, for instance,
-or increased/descreased by \fBio_size\fR). Unless \fBnrfiles\fR and
-\fBfilesize\fR options are given, this amount will be divided between the
-available files for the job. If not set, fio will use the full size of the
-given files or devices. If the files do not exist, size must be given. It is
-also possible to give size as a percentage between 1 and 100. If size=20% is
-given, fio will use 20% of the full size of the given files or devices.
+.BI offset \fR=\fPint
+Start I/O at the provided offset in the file, given as either a fixed size in
+bytes or a percentage. If a percentage is given, the next \fBblockalign\fR\-ed
+offset will be used. Data before the given offset will not be touched. This
+effectively caps the file size at `real_size \- offset'. Can be combined with
+\fBsize\fR to constrain the start and end range of the I/O workload.
+A percentage can be specified by a number between 1 and 100 followed by '%',
+for example, `offset=20%' to specify 20%.
 .TP
-.BI io_size \fR=\fPint "\fR,\fB io_limit \fR=\fPint
-Normally fio operates within the region set by \fBsize\fR, which means that
-the \fBsize\fR option sets both the region and size of IO to be performed.
-Sometimes that is not what you want. With this option, it is possible to
-define just the amount of IO that fio should do. For instance, if \fBsize\fR
-is set to 20G and \fBio_limit\fR is set to 5G, fio will perform IO within
-the first 20G but exit when 5G have been done. The opposite is also
-possible - if \fBsize\fR is set to 20G, and \fBio_size\fR is set to 40G, then
-fio will do 40G of IO within the 0..20G region.
+.BI offset_increment \fR=\fPint
+If this is provided, then the real offset becomes `\fBoffset\fR + \fBoffset_increment\fR
+* thread_number', where the thread number is a counter that starts at 0 and
+is incremented for each sub\-job (i.e. when \fBnumjobs\fR option is
+specified). This option is useful if there are several jobs which are
+intended to operate on a file in parallel disjoint segments, with even
+spacing between the starting points.
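Both offset formulas are simple enough to check by hand, and the Python sketch below restates them. The function names are hypothetical. Rounding a percentage offset up rather than down is an assumption based on the phrase "the next blockalign\-ed offset". With numjobs=4, offset=0 and offset_increment=10g, the four clones would start at 0, 10, 20 and 30 GiB.

```python
def percent_offset(real_size: int, percent: int, blockalign: int) -> int:
    """offset=<percent>%: round the raw byte offset up to the next blockalign multiple."""
    raw = real_size * percent // 100
    return -(-raw // blockalign) * blockalign


def job_start_offset(offset: int, offset_increment: int, thread_number: int) -> int:
    """Real starting offset of sub-job 'thread_number' (counted from 0)."""
    return offset + offset_increment * thread_number


GiB = 1 << 30
print([job_start_offset(0, 10 * GiB, n) // GiB for n in range(4)])  # [0, 10, 20, 30]
print(percent_offset(10_000_000, 20, 4096))                         # 2002944
```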
 .TP
-.BI fill_device \fR=\fPbool "\fR,\fB fill_fs" \fR=\fPbool
-Sets size to something really large and waits for ENOSPC (no space left on
-device) as the terminating condition. Only makes sense with sequential write.
-For a read workload, the mount point will be filled first then IO started on
-the result. This option doesn't make sense if operating on a raw device node,
-since the size of that is already known by the file system. Additionally,
-writing beyond end-of-device will not return ENOSPC there.
-.TP
-.BI filesize \fR=\fPirange
-Individual file sizes. May be a range, in which case \fBfio\fR will select sizes
-for files at random within the given range, limited to \fBsize\fR in total (if
-that is given). If \fBfilesize\fR is not specified, each created file is the
-same size.
+.BI number_ios \fR=\fPint
+Fio will normally perform I/Os until it has exhausted the size of the region
+set by \fBsize\fR, or until it exhausts the allocated time (or hits an error
+condition). With this setting, the range/size can be set independently of
+the number of I/Os to perform. When fio reaches this number, it will exit
+normally and report status. Note that this does not extend the amount of I/O
+that will be done, it will only stop fio if this condition is met before
+other end\-of\-job criteria.
 .TP
-.BI file_append \fR=\fPbool
-Perform IO after the end of the file. Normally fio will operate within the
-size of a file. If this option is set, then fio will append to the file
-instead. This has identical behavior to setting \fRoffset\fP to the size
-of a file. This option is ignored on non-regular files.
+.BI fsync \fR=\fPint
+If writing to a file, issue an \fBfsync\fR\|(2) (or its equivalent) of
+the dirty data for every number of blocks given. For example, if you give 32
+as a parameter, fio will sync the file after every 32 writes issued. If fio is
+using non\-buffered I/O, we may not sync the file. The exception is the sg
+I/O engine, which synchronizes the disk cache anyway. Defaults to 0, which
+means fio does not periodically issue and wait for a sync to complete. Also
+see \fBend_fsync\fR and \fBfsync_on_close\fR.
+.TP
+.BI fdatasync \fR=\fPint
+Like \fBfsync\fR but uses \fBfdatasync\fR\|(2) to only sync data and
+not metadata blocks. In Windows, FreeBSD, and DragonFlyBSD there is no
+\fBfdatasync\fR\|(2) so this falls back to using \fBfsync\fR\|(2).
+Defaults to 0, which means fio does not periodically issue and wait for a
+data\-only sync to complete.
+.TP
+.BI write_barrier \fR=\fPint
+Make every N\-th write a barrier write.
+.TP
+.BI sync_file_range \fR=\fPstr:int
+Use \fBsync_file_range\fR\|(2) for every \fIint\fR number of write
+operations. Fio will track the range of writes that have happened since the last
+\fBsync_file_range\fR\|(2) call. \fIstr\fR can currently be one or more of:
+.RS
+.RS
+.TP
+.B wait_before
+SYNC_FILE_RANGE_WAIT_BEFORE
+.TP
+.B write
+SYNC_FILE_RANGE_WRITE
+.TP
+.B wait_after
+SYNC_FILE_RANGE_WAIT_AFTER
+.RE
+.P
+So if you do `sync_file_range=wait_before,write:8', fio would use
+`SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE' for every 8
+writes. Also see the \fBsync_file_range\fR\|(2) man page. This option is
+Linux specific.
+.RE
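To make the flag combination concrete, this Python sketch parses the option value into the bitmask that would be passed to sync_file_range(2). The parser is hypothetical, since fio's own option parser is not shown here. The constant values are the ones Linux defines for these flags.

```python
# Flag values as defined by Linux for sync_file_range(2).
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4

_FLAG_NAMES = {
    "wait_before": SYNC_FILE_RANGE_WAIT_BEFORE,
    "write": SYNC_FILE_RANGE_WRITE,
    "wait_after": SYNC_FILE_RANGE_WAIT_AFTER,
}


def parse_sync_file_range(value: str) -> tuple[int, int]:
    """Turn e.g. 'wait_before,write:8' into (flag mask, writes between syncs)."""
    names, sep, nr = value.partition(":")
    if not sep:
        raise ValueError("expected <flags>:<nr>")
    flags = 0
    for name in names.split(","):
        flags |= _FLAG_NAMES[name]  # KeyError for an unknown flag name
    return flags, int(nr)


print(parse_sync_file_range("wait_before,write:8"))  # (3, 8)
```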
+.TP
+.BI overwrite \fR=\fPbool
+If true, writes to a file will always overwrite existing data. If the file
+doesn't already exist, it will be created before the write phase begins. If
+the file exists and is large enough for the specified write phase, nothing
+will be done. Default: false.
+.TP
+.BI end_fsync \fR=\fPbool
+If true, \fBfsync\fR\|(2) file contents when a write stage has completed.
+Default: false.
+.TP
+.BI fsync_on_close \fR=\fPbool
+If true, fio will \fBfsync\fR\|(2) a dirty file on close. This differs
+from \fBend_fsync\fR in that it will happen on every file close, not
+just at the end of the job. Default: false.
+.TP
+.BI rwmixread \fR=\fPint
+Percentage of a mixed workload that should be reads. Default: 50.
+.TP
+.BI rwmixwrite \fR=\fPint
+Percentage of a mixed workload that should be writes. If both
+\fBrwmixread\fR and \fBrwmixwrite\fR are given and the values do not
+add up to 100%, the latter of the two will be used to override the
+first. This may interfere with a given rate setting, if fio is asked to
+limit reads or writes to a certain rate. If that is the case, then the
+distribution may be skewed. Default: 50.
+.TP
+.BI random_distribution \fR=\fPstr:float[,str:float][,str:float]
+By default, fio will use a completely uniform random distribution when asked
+to perform random I/O. Sometimes it is useful to skew the distribution in
+specific ways, ensuring that some parts of the data are hotter than others.
+Fio includes the following distribution models:
+.RS
+.RS
+.TP
+.B random
+Uniform random distribution
+.TP
+.B zipf
+Zipf distribution
+.TP
+.B pareto
+Pareto distribution
+.TP
+.B normal
+Normal (Gaussian) distribution
+.TP
+.B zoned
+Zoned random distribution
+.RE
+.P
+When using a \fBzipf\fR or \fBpareto\fR distribution, an input value is also
+needed to define the access pattern. For \fBzipf\fR, this is the `Zipf theta'.
+For \fBpareto\fR, it's the `Pareto power'. Fio includes a test
+program, \fBfio\-genzipf\fR, that can be used to visualize what the given input
+values will yield in terms of hit rates. If you wanted to use \fBzipf\fR with
+a `theta' of 1.2, you would use `random_distribution=zipf:1.2' as the
+option. If a non\-uniform model is used, fio will disable use of the random
+map. For the \fBnormal\fR distribution, a normal (Gaussian) deviation is
+supplied as a value between 0 and 100.
+.P
+For a \fBzoned\fR distribution, fio supports specifying percentages of I/O
+access that should fall within what range of the file or device. For
+example, given the following criteria:
+.RS
+.P
+.PD 0
+60% of accesses should be to the first 10%
+.P
+30% of accesses should be to the next 20%
+.P
+8% of accesses should be to the next 30%
+.P
+2% of accesses should be to the next 40%
+.PD
+.RE
+.P
+we can define that through zoning of the random accesses. For the above
+example, the user would do:
+.RS
+.P
+random_distribution=zoned:60/10:30/20:8/30:2/40
+.RE
+.P
+similarly to how \fBbssplit\fR works for setting ranges and percentages
+of block sizes. Like \fBbssplit\fR, it's possible to specify separate
+zones for reads, writes, and trims. If just one set is given, it'll apply to
+all of them.
+.RE
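One way to read the zoned specification is as a two-step draw. First choose a zone with probability equal to its access percentage, then choose a uniform position within the part of the file that zone covers. The Python sketch below follows that reading. It is an illustrative model rather than fio's implementation. It returns byte offsets without block alignment, and it handles a single set of zones rather than separate read, write and trim sets.

```python
import random


def parse_zoned(spec: str) -> list[tuple[float, float]]:
    """'zoned:60/10:30/20' -> [(60.0, 10.0), (30.0, 20.0)] as (access %, range %)."""
    kind, _, rest = spec.partition(":")
    if kind != "zoned":
        raise ValueError("expected a zoned:... specification")
    zones = []
    for item in rest.split(":"):
        access, _, span = item.partition("/")
        zones.append((float(access), float(span)))
    return zones


def zoned_offsets(spec: str, size: int, count: int, seed: int = 0) -> list[int]:
    """Draw byte offsets: pick a zone by its access %, then a uniform point inside it."""
    zones = parse_zoned(spec)
    starts, pos = [], 0.0
    for _, span in zones:
        starts.append(pos)  # each range starts where the previous one ended
        pos += span
    weights = [access for access, _ in zones]
    rng = random.Random(seed)
    offsets = []
    for _ in range(count):
        idx = rng.choices(range(len(zones)), weights=weights)[0]
        pct = starts[idx] + rng.random() * zones[idx][1]
        offsets.append(min(int(size * pct / 100), size - 1))
    return offsets


offs = zoned_offsets("zoned:60/10:30/20:8/30:2/40", 1000, 10_000)
print(sum(o < 100 for o in offs) / len(offs))  # roughly 0.6
```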
+.TP
+.BI percentage_random \fR=\fPint[,int][,int]
+For a random workload, set how big a percentage should be random. This
+defaults to 100%, in which case the workload is fully random. It can be set
+anywhere from 0 to 100. Setting it to 0 would make the workload fully
+sequential. Any setting in between will result in a random mix of sequential
+and random I/O, at the given percentages. Comma\-separated values may be
+specified for reads, writes, and trims as described in \fBblocksize\fR.
+.TP
+.BI norandommap
+Normally fio will cover every block of the file when doing random I/O. If
+this option is given, fio will just get a new random offset without looking
+at past I/O history. This means that some blocks may not be read or written,
+and that some blocks may be read/written more than once. If this option is
+used with \fBverify\fR and multiple blocksizes (via \fBbsrange\fR),
+only intact blocks are verified, i.e., partially\-overwritten blocks are
+ignored.
+.TP
+.BI softrandommap \fR=\fPbool
+See \fBnorandommap\fR. If fio runs with the random block map enabled and
+it fails to allocate the map, if this option is set it will continue without
+a random block map. As coverage will not be as complete as with random maps,
+this option is disabled by default.
+.TP
+.BI random_generator \fR=\fPstr
+Fio supports the following engines for generating I/O offsets for random I/O:
+.RS
+.RS
+.TP
+.B tausworthe
+Strong 2^88 cycle random number generator.
+.TP
+.B lfsr
+Linear feedback shift register generator.
+.TP
+.B tausworthe64
+Strong 64\-bit 2^258 cycle random number generator.
+.RE
+.P
+\fBtausworthe\fR is a strong random number generator, but it requires tracking
+on the side if we want to ensure that blocks are only read or written
+once. \fBlfsr\fR guarantees that we never generate the same offset twice, and
+it's also less computationally expensive. It's not a true random generator,
+though for I/O purposes it's typically good enough. \fBlfsr\fR only
+works with single block sizes, not with workloads that use multiple block
+sizes. If used with such a workload, fio may read or write some blocks
+multiple times. The default value is \fBtausworthe\fR, unless the required
+space exceeds 2^32 blocks. If it does, then \fBtausworthe64\fR is
+selected automatically.
+.RE
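The no-repeat property of an LFSR comes from its period: a maximal-length n-bit LFSR steps through every non-zero n-bit state exactly once before repeating. The Python sketch below demonstrates this with a tiny 4-bit register. The width, the feedback mask and the rule of skipping out-of-range states are choices made for this example. They are not taken from fio's lfsr code, which supports much larger ranges.

```python
from typing import Iterator


def lfsr4_states(seed: int = 1) -> Iterator[int]:
    """States of a 4-bit maximal-length Galois LFSR (feedback mask 0xC).

    Starting from any non-zero 4-bit seed it visits all 15 non-zero states once.
    """
    if not 1 <= seed <= 15:
        raise ValueError("seed must be a non-zero 4-bit value")
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xC
        if state == seed:
            return


def lfsr_block_order(nblocks: int) -> list[int]:
    """Visit blocks 0..nblocks-1 exactly once, skipping states beyond the range."""
    if not 1 <= nblocks <= 15:
        raise ValueError("this 4-bit sketch supports 1..15 blocks")
    return [s - 1 for s in lfsr4_states() if s - 1 < nblocks]


print(lfsr_block_order(15))
# [0, 11, 5, 2, 12, 9, 4, 13, 6, 14, 10, 8, 7, 3, 1]
```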
+.SS "Block size"
 .TP
 .BI blocksize \fR=\fPint[,int][,int] "\fR,\fB bs" \fR=\fPint[,int][,int]
-The block size in bytes for I/O units.  Default: 4096.
-A single value applies to reads, writes, and trims.
-Comma-separated values may be specified for reads, writes, and trims.
-Empty values separated by commas use the default value. A value not
-terminated in a comma applies to subsequent types.
-.nf
-Examples:
-bs=256k    means 256k for reads, writes and trims
-bs=8k,32k  means 8k for reads, 32k for writes and trims
-bs=8k,32k, means 8k for reads, 32k for writes, and default for trims
-bs=,8k     means default for reads, 8k for writes and trims
-bs=,8k,    means default for reads, 8k for writes, and default for trims
-.fi
+The block size in bytes used for I/O units. Default: 4096. A single value
+applies to reads, writes, and trims. Comma\-separated values may be
+specified for reads, writes, and trims. A value not terminated in a comma
+applies to subsequent types. Examples:
+.RS
+.RS
+.P
+.PD 0
+bs=256k        means 256k for reads, writes and trims.
+.P
+bs=8k,32k      means 8k for reads, 32k for writes and trims.
+.P
+bs=8k,32k,     means 8k for reads, 32k for writes, and default for trims.
+.P
+bs=,8k         means default for reads, 8k for writes and trims.
+.P
+bs=,8k,        means default for reads, 8k for writes, and default for trims.
+.PD
+.RE
+.RE
 .TP
 .BI blocksize_range \fR=\fPirange[,irange][,irange] "\fR,\fB bsrange" \fR=\fPirange[,irange][,irange]
-A range of block sizes in bytes for I/O units.
-The issued I/O unit will always be a multiple of the minimum size, unless
+A range of block sizes in bytes for I/O units. The issued I/O unit will
+always be a multiple of the minimum size, unless
 \fBblocksize_unaligned\fR is set.
-Comma-separated ranges may be specified for reads, writes, and trims
-as described in \fBblocksize\fR.
-.nf
-Example: bsrange=1k-4k,2k-8k.
-.fi
+Comma\-separated ranges may be specified for reads, writes, and trims as
+described in \fBblocksize\fR. Example:
+.RS
+.RS
+.P
+bsrange=1k\-4k,2k\-8k
+.RE
+.RE
 .TP
 .BI bssplit \fR=\fPstr[,str][,str]
-This option allows even finer grained control of the block sizes issued,
-not just even splits between them. With this option, you can weight various
-block sizes for exact control of the issued IO for a job that has mixed
-block sizes. The format of the option is bssplit=blocksize/percentage,
-optionally adding as many definitions as needed separated by a colon.
-Example: bssplit=4k/10:64k/50:32k/40 would issue 50% 64k blocks, 10% 4k
-blocks and 40% 32k blocks. \fBbssplit\fR also supports giving separate
-splits to reads, writes, and trims.
-Comma-separated values may be specified for reads, writes, and trims
-as described in \fBblocksize\fR.
-.TP
-.B blocksize_unaligned\fR,\fB bs_unaligned
-If set, fio will issue I/O units with any size within \fBblocksize_range\fR,
-not just multiples of the minimum size.  This typically won't
-work with direct I/O, as that normally requires sector alignment.
+Sometimes you want even finer grained control of the block sizes issued, not
+just an even split between them. This option allows you to weight various
+block sizes, so that you can define the exact share of each block size
+issued. The format for this option is:
+.RS
+.RS
+.P
+bssplit=blocksize/percentage:blocksize/percentage
+.RE
+.P
+for as many block sizes as needed. So if you want to define a workload that
+has 50% 64k blocks, 10% 4k blocks, and 40% 32k blocks, you would write:
+.RS
+.P
+bssplit=4k/10:64k/50:32k/40
+.RE
+.P
+Ordering does not matter. If the percentage is left blank, fio will fill in
+the remaining values evenly. So a bssplit option like this one:
+.RS
+.P
+bssplit=4k/50:1k/:32k/
+.RE
+.P
+would have 50% 4k I/Os, and 25% each of 1k and 32k I/Os. The percentages always
+add up to 100; if \fBbssplit\fR is given percentages that add up to more, it will error out.
+.P
+Comma\-separated values may be specified for reads, writes, and trims as
+described in \fBblocksize\fR.
+.P
+If you want a workload that has 50% 2k reads and 50% 4k reads, while having
+90% 4k writes and 10% 8k writes, you would specify:
+.RS
+.P
+bssplit=2k/50:4k/50,4k/90:8k/10
+.RE
+.RE
+.TP
+.BI blocksize_unaligned "\fR,\fB bs_unaligned"
+If set, fio will issue I/O units with any size within
+\fBblocksize_range\fR, not just multiples of the minimum size. This
+typically won't work with direct I/O, as that normally requires sector
+alignment.
 .TP
 .BI bs_is_seq_rand \fR=\fPbool
-If this option is set, fio will use the normal read,write blocksize settings as
-sequential,random blocksize settings instead. Any random read or write will
-use the WRITE blocksize settings, and any sequential read or write will use
-the READ blocksize settings.
+If this option is set, fio will use the normal read,write blocksize settings
+as sequential,random blocksize settings instead. Any random read or write
+will use the WRITE blocksize settings, and any sequential read or write will
+use the READ blocksize settings.
 .TP
 .BI blockalign \fR=\fPint[,int][,int] "\fR,\fB ba" \fR=\fPint[,int][,int]
-Boundary to which fio will align random I/O units. Default: \fBblocksize\fR.
-Minimum alignment is typically 512b for using direct IO, though it usually
-depends on the hardware block size.  This option is mutually exclusive with
-using a random map for files, so it will turn off that option.
-Comma-separated values may be specified for reads, writes, and trims
-as described in \fBblocksize\fR.
-.TP
-.B zero_buffers
+Boundary to which fio will align random I/O units. Default:
+\fBblocksize\fR. Minimum alignment is typically 512b for using direct
+I/O, though it usually depends on the hardware block size. This option is
+mutually exclusive with using a random map for files, so it will turn off
+that option. Comma\-separated values may be specified for reads, writes, and
+trims as described in \fBblocksize\fR.
+.SS "Buffers and memory"
+.TP
+.BI zero_buffers
 Initialize buffers with all zeros. Default: fill buffers with random data.
 .TP
-.B refill_buffers
-If this option is given, fio will refill the IO buffers on every submit. The
-default is to only fill it at init time and reuse that data. Only makes sense
-if zero_buffers isn't specified, naturally. If data verification is enabled,
-refill_buffers is also automatically enabled.
+.BI refill_buffers
+If this option is given, fio will refill the I/O buffers on every
+submit. The default is to only fill it at init time and reuse that
+data. Only makes sense if \fBzero_buffers\fR isn't specified, naturally. If data
+verification is enabled, \fBrefill_buffers\fR is also automatically enabled.
 .TP
 .BI scramble_buffers \fR=\fPbool
 If \fBrefill_buffers\fR is too costly and the target is using data
-deduplication, then setting this option will slightly modify the IO buffer
-contents to defeat normal de-dupe attempts. This is not enough to defeat
-more clever block compression attempts, but it will stop naive dedupe
-of blocks. Default: true.
+deduplication, then setting this option will slightly modify the I/O buffer
+contents to defeat normal de\-dupe attempts. This is not enough to defeat
+more clever block compression attempts, but it will stop naive dedupe of
+blocks. Default: true.
 .TP
 .BI buffer_compress_percentage \fR=\fPint
-If this is set, then fio will attempt to provide IO buffer content (on WRITEs)
-that compress to the specified level. Fio does this by providing a mix of
-random data and a fixed pattern. The fixed pattern is either zeroes, or the
-pattern specified by \fBbuffer_pattern\fR. If the pattern option is used, it
-might skew the compression ratio slightly. Note that this is per block size
-unit, for file/disk wide compression level that matches this setting. Note
-that this is per block size unit, for file/disk wide compression level that
-matches this setting, you'll also want to set refill_buffers.
+If this is set, then fio will attempt to provide I/O buffer content (on
+WRITEs) that compresses to the specified level. Fio does this by providing a
+mix of random data and a fixed pattern. The fixed pattern is either zeros,
+or the pattern specified by \fBbuffer_pattern\fR. If the pattern option
+is used, it might skew the compression ratio slightly. Note that this is per
+block size unit; for a file/disk\-wide compression level that matches this
+setting, you'll also want to set \fBrefill_buffers\fR.
 .TP
 .BI buffer_compress_chunk \fR=\fPint
-See \fBbuffer_compress_percentage\fR. This setting allows fio to manage how
-big the ranges of random data and zeroed data is. Without this set, fio will
-provide \fBbuffer_compress_percentage\fR of blocksize random data, followed by
-the remaining zeroed. With this set to some chunk size smaller than the block
-size, fio can alternate random and zeroed data throughout the IO buffer.
+See \fBbuffer_compress_percentage\fR. This setting allows fio to manage
+how big the ranges of random data and zeroed data are. Without this set, fio
+will provide \fBbuffer_compress_percentage\fR of blocksize random data,
+followed by the remaining zeroed. With this set to some chunk size smaller
+than the block size, fio can alternate random and zeroed data throughout the
+I/O buffer.
 .TP
 .BI buffer_pattern \fR=\fPstr
 If set, fio will fill the I/O buffers with this pattern or with the contents
 of a file. If not set, the contents of I/O buffers are defined by the other
 options related to buffer contents. The setting can be any pattern of bytes,
 and can be prefixed with 0x for hex values. It may also be a string, where
-the string must then be wrapped with ``""``. Or it may also be a filename,
-where the filename must be wrapped with ``''`` in which case the file is
+the string must then be wrapped with "". Or it may also be a filename,
+where the filename must be wrapped with '' in which case the file is
 opened and read. Note that not all the file contents will be read if that
 would cause the buffers to overflow. So, for example:
 .RS
 .RS
-\fBbuffer_pattern\fR='filename'
-.RS
-or
-.RE
-\fBbuffer_pattern\fR="abcd"
-.RS
-or
-.RE
-\fBbuffer_pattern\fR=-12
-.RS
-or
-.RE
-\fBbuffer_pattern\fR=0xdeadface
+.P
+.PD 0
+buffer_pattern='filename'
+.P
+or:
+.P
+buffer_pattern="abcd"
+.P
+or:
+.P
+buffer_pattern=\-12
+.P
+or:
+.P
+buffer_pattern=0xdeadface
+.PD
 .RE
-.LP
+.P
 Also you can combine everything together in any order:
-.LP
 .RS
-\fBbuffer_pattern\fR=0xdeadface"abcd"-12'filename'
+.P
+buffer_pattern=0xdeadface"abcd"\-12'filename'
 .RE
 .RE
 .TP
 .BI dedupe_percentage \fR=\fPint
-If set, fio will generate this percentage of identical buffers when writing.
-These buffers will be naturally dedupable. The contents of the buffers depend
-on what other buffer compression settings have been set. It's possible to have
-the individual buffers either fully compressible, or not at all. This option
-only controls the distribution of unique buffers.
+If set, fio will generate this percentage of identical buffers when
+writing. These buffers will be naturally dedupable. The contents of the
+buffers depend on what other buffer compression settings have been set. It's
+possible to have the individual buffers either fully compressible, or not at
+all. This option only controls the distribution of unique buffers.
 .TP
-.BI nrfiles \fR=\fPint
-Number of files to use for this job.  Default: 1.
+.BI invalidate \fR=\fPbool
+Invalidate the buffer/page cache parts of the files to be used prior to
+starting I/O if the platform and file type support it. Defaults to true.
+This will be ignored if \fBpre_read\fR is also specified for the
+same job.
 .TP
-.BI openfiles \fR=\fPint
-Number of files to keep open at the same time.  Default: \fBnrfiles\fR.
+.BI sync \fR=\fPbool
+Use synchronous I/O for buffered writes. For the majority of I/O engines,
+this means using O_SYNC. Default: false.
 .TP
-.BI file_service_type \fR=\fPstr
-Defines how files to service are selected.  The following types are defined:
+.BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
+Fio can use various types of memory as the I/O unit buffer. The allowed
+values are:
 .RS
 .RS
 .TP
-.B random
-Choose a file at random.
+.B malloc
+Use memory from \fBmalloc\fR\|(3) as the buffers. Default memory type.
 .TP
-.B roundrobin
-Round robin over opened files (default).
+.B shm
+Use shared memory as the buffers. Allocated through \fBshmget\fR\|(2).
 .TP
-.B sequential
-Do each file in the set sequentially.
+.B shmhuge
+Same as \fBshm\fR, but use huge pages as backing.
 .TP
-.B zipf
-Use a zipfian distribution to decide what file to access.
+.B mmap
+Use \fBmmap\fR\|(2) to allocate buffers. May either be anonymous memory, or can
+be file backed if a filename is given after the option. The format
+is `mem=mmap:/path/to/file'.
 .TP
-.B pareto
-Use a pareto distribution to decide what file to access.
+.B mmaphuge
+Use a memory mapped huge file as the buffer backing. Append filename
+after mmaphuge, as in `mem=mmaphuge:/hugetlbfs/file'.
 .TP
-.B normal
-Use a Gaussian (normal) distribution to decide what file to access.
+.B mmapshared
+Same as \fBmmap\fR, but use a MAP_SHARED mapping.
 .TP
-.B gauss
-Alias for normal.
+.B cudamalloc
+Use GPU memory as the buffers for GPUDirect RDMA benchmarking.
+The \fBioengine\fR must be \fBrdma\fR.
 .RE
 .P
-For \fBrandom\fR, \fBroundrobin\fR, and \fBsequential\fR, a postfix can be
-appended to tell fio how many I/Os to issue before switching to a new file.
-For example, specifying \fBfile_service_type=random:8\fR would cause fio to
-issue \fI8\fR I/Os before selecting a new file at random. For the non-uniform
-distributions, a floating point postfix can be given to influence how the
-distribution is skewed. See \fBrandom_distribution\fR for a description of how
-that would work.
+The area allocated is a function of the maximum allowed bs size for the job,
+multiplied by the I/O depth given. Note that for \fBshmhuge\fR and
+\fBmmaphuge\fR to work, the system must have free huge pages allocated. This
+can normally be checked and set by reading/writing
+`/proc/sys/vm/nr_hugepages' on a Linux system. Fio assumes a huge page
+is 4MiB in size. So to calculate the number of huge pages you need for a
+given job file, add up the I/O depth of all jobs (normally one unless
+\fBiodepth\fR is used) and multiply by the maximum bs set. Then divide
+that number by the huge page size. You can see the size of the huge pages in
+`/proc/meminfo'. If no huge pages are allocated (that is, if
+`nr_hugepages' is zero), using \fBmmaphuge\fR or \fBshmhuge\fR will fail. Also
+see \fBhugepage\-size\fR.
+.P
+\fBmmaphuge\fR also needs to have hugetlbfs mounted and the file location
+should point there. So if it's mounted in `/huge', you would use
+`mem=mmaphuge:/huge/somefile'.
 .RE
 .TP
+.BI iomem_align \fR=\fPint "\fR,\fP mem_align" \fR=\fPint
+This indicates the memory alignment of the I/O memory buffers. Note that
+the given alignment is applied to the first I/O unit buffer, if using
+\fBiodepth\fR the alignment of the following buffers is given by the
+\fBbs\fR used. In other words, if using a \fBbs\fR that is a
+multiple of the page size in the system, all buffers will be aligned to
+this value. If using a \fBbs\fR that is not page aligned, the alignment
+of subsequent I/O memory buffers is the sum of the \fBiomem_align\fR and
+\fBbs\fR used.
+.TP
+.BI hugepage\-size \fR=\fPint
+Defines the size of a huge page. Must be at least equal to the system
+setting, see `/proc/meminfo'. Defaults to 4MiB. Should probably
+always be a multiple of megabytes, so using `hugepage\-size=Xm' is the
+preferred way to set this to avoid setting a non\-pow\-2 bad value.
+.TP
+.BI lockmem \fR=\fPint
+Pin the specified amount of memory with \fBmlock\fR\|(2). Can be used to
+simulate a smaller amount of memory. The amount specified is per worker.
+.SS "I/O size"
+.TP
+.BI size \fR=\fPint
+The total size of file I/O for each thread of this job. Fio will run until
+this many bytes have been transferred, unless runtime is limited by other options
+(such as \fBruntime\fR) or the amount is increased/decreased by \fBio_size\fR.
+Fio will divide this size between the available files determined by options
+such as \fBnrfiles\fR and \fBfilename\fR, unless \fBfilesize\fR is
+specified by the job. If the result of division happens to be 0, the size is
+set to the physical size of the given files or devices if they exist.
+If this option is not specified, fio will use the full size of the given
+files or devices. If the files do not exist, size must be given. It is also
+possible to give size as a percentage between 1 and 100. If `size=20%' is
+given, fio will use 20% of the full size of the given files or devices.
+Can be combined with \fBoffset\fR to constrain the start and end range
+that I/O will be done within.
+.TP
+.BI io_size \fR=\fPint "\fR,\fB io_limit" \fR=\fPint
+Normally fio operates within the region set by \fBsize\fR, which means
+that the \fBsize\fR option sets both the region and size of I/O to be
+performed. Sometimes that is not what you want. With this option, it is
+possible to define just the amount of I/O that fio should do. For instance,
+if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio
+will perform I/O within the first 20GiB but exit when 5GiB have been
+done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
+and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
+the 0..20GiB region.
+.TP
+.BI filesize \fR=\fPirange(int)
+Individual file sizes. May be a range, in which case fio will select sizes
+for files at random within the given range and limited to \fBsize\fR in
+total (if that is given). If not given, each created file is the same size.
+This option overrides \fBsize\fR in terms of file size, which means
+this value is used as a fixed size or possible range of each file.
+.TP
+.BI file_append \fR=\fPbool
+Perform I/O after the end of the file. Normally fio will operate within the
+size of a file. If this option is set, then fio will append to the file
+instead. This has identical behavior to setting \fBoffset\fR to the size
+of a file. This option is ignored on non\-regular files.
+.TP
+.BI fill_device \fR=\fPbool "\fR,\fB fill_fs" \fR=\fPbool
+Sets size to something really large and waits for ENOSPC (no space left on
+device) as the terminating condition. Only makes sense with sequential
+write. For a read workload, the mount point will be filled first then I/O
+started on the result. This option doesn't make sense if operating on a raw
+device node, since the size of that is already known by the file system.
+Additionally, writing beyond end\-of\-device will not return ENOSPC there.
+.SS "I/O engine"
+.TP
 .BI ioengine \fR=\fPstr
-Defines how the job issues I/O.  The following types are defined:
+Defines how the job issues I/O to the file. The following types are defined:
 .RS
 .RS
 .TP
 .B sync
-Basic \fBread\fR\|(2) or \fBwrite\fR\|(2) I/O.  \fBfseek\fR\|(2) is used to
-position the I/O location.
+Basic \fBread\fR\|(2) or \fBwrite\fR\|(2)
+I/O. \fBlseek\fR\|(2) is used to position the I/O location.
+See \fBfsync\fR and \fBfdatasync\fR for syncing write I/Os.
 .TP
 .B psync
-Basic \fBpread\fR\|(2) or \fBpwrite\fR\|(2) I/O.
-Default on all supported operating systems except for Windows.
+Basic \fBpread\fR\|(2) or \fBpwrite\fR\|(2) I/O. Default on
+all supported operating systems except for Windows.
 .TP
 .B vsync
-Basic \fBreadv\fR\|(2) or \fBwritev\fR\|(2) I/O. Will emulate queuing by
-coalescing adjacent IOs into a single submission.
+Basic \fBreadv\fR\|(2) or \fBwritev\fR\|(2) I/O. Will emulate
+queuing by coalescing adjacent I/Os into a single submission.
 .TP
 .B pvsync
 Basic \fBpreadv\fR\|(2) or \fBpwritev\fR\|(2) I/O.
@@ -870,10 +1437,14 @@ Basic \fBpreadv\fR\|(2) or \fBpwritev\fR\|(2) I/O.
 Basic \fBpreadv2\fR\|(2) or \fBpwritev2\fR\|(2) I/O.
 .TP
 .B libaio
-Linux native asynchronous I/O. This ioengine defines engine specific options.
+Linux native asynchronous I/O. Note that Linux may only support
+queued behavior with non\-buffered I/O (set `direct=1' or
+`buffered=0').
+This engine defines engine specific options.
 .TP
 .B posixaio
-POSIX asynchronous I/O using \fBaio_read\fR\|(3) and \fBaio_write\fR\|(3).
+POSIX asynchronous I/O using \fBaio_read\fR\|(3) and
+\fBaio_write\fR\|(3).
 .TP
 .B solarisaio
 Solaris native asynchronous I/O.
@@ -882,482 +1453,552 @@ Solaris native asynchronous I/O.
 Windows native asynchronous I/O. Default on Windows.
 .TP
 .B mmap
-File is memory mapped with \fBmmap\fR\|(2) and data copied using
-\fBmemcpy\fR\|(3).
+File is memory mapped with \fBmmap\fR\|(2) and data copied
+to/from using \fBmemcpy\fR\|(3).
 .TP
 .B splice
-\fBsplice\fR\|(2) is used to transfer the data and \fBvmsplice\fR\|(2) to
-transfer data from user-space to the kernel.
+\fBsplice\fR\|(2) is used to transfer the data and
+\fBvmsplice\fR\|(2) to transfer data from user space to the
+kernel.
 .TP
 .B sg
-SCSI generic sg v3 I/O. May be either synchronous using the SG_IO ioctl, or if
-the target is an sg character device, we use \fBread\fR\|(2) and
-\fBwrite\fR\|(2) for asynchronous I/O.
+SCSI generic sg v3 I/O. May either be synchronous using the SG_IO
+ioctl, or if the target is an sg character device we use
+\fBread\fR\|(2) and \fBwrite\fR\|(2) for asynchronous
+I/O. Requires \fBfilename\fR option to specify either block or
+character devices.
 .TP
 .B null
-Doesn't transfer any data, just pretends to.  Mainly used to exercise \fBfio\fR
-itself and for debugging and testing purposes.
+Doesn't transfer any data, just pretends to. This is mainly used to
+exercise fio itself and for debugging/testing purposes.
 .TP
 .B net
-Transfer over the network.  The protocol to be used can be defined with the
-\fBprotocol\fR parameter.  Depending on the protocol, \fBfilename\fR,
-\fBhostname\fR, \fBport\fR, or \fBlisten\fR must be specified.
-This ioengine defines engine specific options.
+Transfer over the network to the given `host:port'. Depending on the
+\fBprotocol\fR used, the \fBhostname\fR, \fBport\fR,
+\fBlisten\fR and \fBfilename\fR options are used to specify
+what sort of connection to make, while the \fBprotocol\fR option
+determines which protocol will be used. This engine defines engine
+specific options.
 .TP
 .B netsplice
-Like \fBnet\fR, but uses \fBsplice\fR\|(2) and \fBvmsplice\fR\|(2) to map data
-and send/receive. This ioengine defines engine specific options.
+Like \fBnet\fR, but uses \fBsplice\fR\|(2) and
+\fBvmsplice\fR\|(2) to map data and send/receive.
+This engine defines engine specific options.
 .TP
 .B cpuio
-Doesn't transfer any data, but burns CPU cycles according to \fBcpuload\fR and
-\fBcpuchunks\fR parameters. A job never finishes unless there is at least one
-non-cpuio job.
+Doesn't transfer any data, but burns CPU cycles according to the
+\fBcpuload\fR and \fBcpuchunks\fR options. Setting
+\fBcpuload\fR=85 will cause that job to do nothing but burn 85%
+of the CPU. On SMP machines, use `numjobs=<nr_of_cpu>'
+to get the desired CPU usage, as \fBcpuload\fR only loads a
+single CPU at the desired rate. A job never finishes unless there is
+at least one non\-cpuio job.
 .TP
 .B guasi
-The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface
-approach to asynchronous I/O.
-.br
-See <http://www.xmailserver.org/guasi\-lib.html>.
+The GUASI I/O engine is the Generic Userspace Asynchronous Syscall
+Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi\-lib.html\fR
+for more info on GUASI.
 .TP
 .B rdma
-The RDMA I/O engine supports both RDMA memory semantics (RDMA_WRITE/RDMA_READ)
-and channel semantics (Send/Recv) for the InfiniBand, RoCE and iWARP protocols.
-.TP
-.B external
-Loads an external I/O engine object file.  Append the engine filename as
-`:\fIenginepath\fR'.
+The RDMA I/O engine supports both RDMA memory semantics
+(RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the
+InfiniBand, RoCE and iWARP protocols.
 .TP
 .B falloc
-   IO engine that does regular linux native fallocate call to simulate data
-transfer as fio ioengine
-.br
-  DDIR_READ  does fallocate(,mode = FALLOC_FL_KEEP_SIZE,)
-.br
-  DIR_WRITE does fallocate(,mode = 0)
-.br
-  DDIR_TRIM does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE)
+I/O engine that does regular fallocate to simulate data transfer as
+fio ioengine.
+.RS
+.P
+.PD 0
+DDIR_READ      does fallocate(,mode = FALLOC_FL_KEEP_SIZE,).
+.P
+DDIR_WRITE     does fallocate(,mode = 0).
+.P
+DDIR_TRIM      does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
+.PD
+.RE
+.TP
+.B ftruncate
+I/O engine that sends \fBftruncate\fR\|(2) operations in response
+to write (DDIR_WRITE) events. Each ftruncate issued sets the file's
+size to the current block offset. \fBblocksize\fR is ignored.
 .TP
 .B e4defrag
-IO engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate defragment activity
-request to DDIR_WRITE event
+I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
+defragment activity in response to DDIR_WRITE events.
 .TP
 .B rbd
-IO engine supporting direct access to Ceph Rados Block Devices (RBD) via librbd
-without the need to use the kernel rbd driver. This ioengine defines engine specific
-options.
+I/O engine supporting direct access to Ceph Rados Block Devices
+(RBD) via librbd without the need to use the kernel rbd driver. This
+ioengine defines engine specific options.
 .TP
 .B gfapi
-Using Glusterfs libgfapi sync interface to direct access to Glusterfs volumes without
-having to go through FUSE. This ioengine defines engine specific
-options.
+Uses the GlusterFS libgfapi sync interface for direct access to
+GlusterFS volumes without having to go through FUSE. This ioengine
+defines engine specific options.
 .TP
 .B gfapi_async
-Using Glusterfs libgfapi async interface to direct access to Glusterfs volumes without
-having to go through FUSE. This ioengine defines engine specific
-options.
+Uses the GlusterFS libgfapi async interface for direct access to
+GlusterFS volumes without having to go through FUSE. This ioengine
+defines engine specific options.
 .TP
 .B libhdfs
-Read and write through Hadoop (HDFS).  The \fBfilename\fR option is used to
-specify host,port of the hdfs name-node to connect. This engine interprets
-offsets a little differently. In HDFS, files once created cannot be modified.
-So random writes are not possible. To imitate this, libhdfs engine expects
-bunch of small files to be created over HDFS, and engine will randomly pick a
-file out of those files based on the offset generated by fio backend. (see the
-example job file to create such files, use rw=write option). Please note, you
-might want to set necessary environment variables to work with hdfs/libhdfs
-properly.
+Read and write through Hadoop (HDFS). The \fBfilename\fR option
+is used to specify the host,port of the HDFS name\-node to connect to. This
+engine interprets offsets a little differently. In HDFS, files once
+created cannot be modified so random writes are not possible. To
+imitate this the libhdfs engine expects a bunch of small files to be
+created over HDFS and will randomly pick a file from them
+based on the offset generated by the fio backend (see the example
+job file to create such files, use `rw=write' option). Please
+note, it may be necessary to set environment variables to work
+with HDFS/libhdfs properly. Each job uses its own connection to
+HDFS.
 .TP
 .B mtd
-Read, write and erase an MTD character device (e.g., /dev/mtd0). Discards are
-treated as erases. Depending on the underlying device type, the I/O may have
-to go in a certain pattern, e.g., on NAND, writing sequentially to erase blocks
-and discarding before overwriting. The trimwrite mode works well for this
+Read, write and erase an MTD character device (e.g.,
+`/dev/mtd0'). Discards are treated as erases. Depending on the
+underlying device type, the I/O may have to go in a certain pattern,
+e.g., on NAND, writing sequentially to erase blocks and discarding
+before overwriting. The \fBtrimwrite\fR mode works well for this
 constraint.
 .TP
 .B pmemblk
-Read and write using filesystem DAX to a file on a filesystem mounted with
-DAX on a persistent memory device through the NVML libpmemblk library.
-.TP
-.B dev-dax
-Read and write using device DAX to a persistent memory device
-(e.g., /dev/dax0.0) through the NVML libpmem library.
-.RE
-.P
-.RE
+Read and write using filesystem DAX to a file on a filesystem
+mounted with DAX on a persistent memory device through the NVML
+libpmemblk library.
 .TP
-.BI iodepth \fR=\fPint
-Number of I/O units to keep in flight against the file. Note that increasing
-iodepth beyond 1 will not affect synchronous ioengines (except for small
-degress when verify_async is in use). Even async engines may impose OS
-restrictions causing the desired depth not to be achieved.  This may happen on
-Linux when using libaio and not setting \fBdirect\fR=1, since buffered IO is
-not async on that OS. Keep an eye on the IO depth distribution in the
-fio output to verify that the achieved depth is as expected. Default: 1.
-.TP
-.BI iodepth_batch \fR=\fPint "\fR,\fP iodepth_batch_submit" \fR=\fPint
-This defines how many pieces of IO to submit at once. It defaults to 1
-which means that we submit each IO as soon as it is available, but can
-be raised to submit bigger batches of IO at the time. If it is set to 0
-the \fBiodepth\fR value will be used.
+.B dev\-dax
+Read and write using device DAX to a persistent memory device (e.g.,
+`/dev/dax0.0') through the NVML libpmem library.
 .TP
-.BI iodepth_batch_complete_min \fR=\fPint "\fR,\fP iodepth_batch_complete" \fR=\fPint
-This defines how many pieces of IO to retrieve at once. It defaults to 1 which
- means that we'll ask for a minimum of 1 IO in the retrieval process from the
-kernel. The IO retrieval will go on until we hit the limit set by
-\fBiodepth_low\fR. If this variable is set to 0, then fio will always check for
-completed events before queuing more IO. This helps reduce IO latency, at the
-cost of more retrieval system calls.
+.B external
+Prefix to specify loading an external I/O engine object file. Append
+the engine filename, e.g. `ioengine=external:/tmp/foo.o' to load
+ioengine `foo.o' in `/tmp'.
+.SS "I/O engine specific parameters"
+In addition, there are some parameters which are only valid when a specific
+\fBioengine\fR is in use. These are used identically to normal parameters,
+with the caveat that when used on the command line, they must come after the
+\fBioengine\fR that defines them is selected.
 .TP
-.BI iodepth_batch_complete_max \fR=\fPint
-This defines maximum pieces of IO to
-retrieve at once. This variable should be used along with
-\fBiodepth_batch_complete_min\fR=int variable, specifying the range
-of min and max amount of IO which should be retrieved. By default
-it is equal to \fBiodepth_batch_complete_min\fR value.
-
-Example #1:
-.RS
-.RS
-\fBiodepth_batch_complete_min\fR=1
-.LP
-\fBiodepth_batch_complete_max\fR=<iodepth>
-.RE
-
-which means that we will retrieve at least 1 IO and up to the
-whole submitted queue depth. If none of IO has been completed
-yet, we will wait.
-
-Example #2:
-.RS
-\fBiodepth_batch_complete_min\fR=0
-.LP
-\fBiodepth_batch_complete_max\fR=<iodepth>
-.RE
-
-which means that we can retrieve up to the whole submitted
-queue depth, but if none of IO has been completed yet, we will
-NOT wait and immediately exit the system call. In this example
-we simply do polling.
-.RE
+.BI (libaio)userspace_reap
+Normally, with the libaio engine in use, fio will use the
+\fBio_getevents\fR\|(3) system call to reap newly returned events. With
+this flag turned on, the AIO ring will be read directly from user\-space to
+reap events. The reaping mode is only enabled when polling for a minimum of
+0 events (e.g. when `iodepth_batch_complete=0').
 .TP
-.BI iodepth_low \fR=\fPint
-Low watermark indicating when to start filling the queue again.  Default:
-\fBiodepth\fR.
+.BI (pvsync2)hipri
+Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
+than normal.
 .TP
-.BI serialize_overlap \fR=\fPbool
-Serialize in-flight I/Os that might otherwise cause or suffer from data races.
-When two or more I/Os are submitted simultaneously, there is no guarantee that
-the I/Os will be processed or completed in the submitted order. Further, if
-two or more of those I/Os are writes, any overlapping region between them can
-become indeterminate/undefined on certain storage. These issues can cause
-verification to fail erratically when at least one of the racing I/Os is
-changing data and the overlapping region has a non-zero size. Setting
-\fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly
-serializing in-flight I/Os that have a non-zero overlap. Note that setting
-this option can reduce both performance and the \fBiodepth\fR achieved.
-Additionally this option does not work when \fBio_submit_mode\fR is set to
-offload. Default: false.
+.BI (pvsync2)hipri_percentage
+When \fBhipri\fR is set, this determines the probability of a pvsync2 I/O being high
+priority. The default is 100%.
 .TP
-.BI io_submit_mode \fR=\fPstr
-This option controls how fio submits the IO to the IO engine. The default is
-\fBinline\fR, which means that the fio job threads submit and reap IO directly.
-If set to \fBoffload\fR, the job threads will offload IO submission to a
-dedicated pool of IO threads. This requires some coordination and thus has a
-bit of extra overhead, especially for lower queue depth IO where it can
-increase latencies. The benefit is that fio can manage submission rates
-independently of the device completion rates. This avoids skewed latency
-reporting if IO gets back up on the device side (the coordinated omission
-problem).
+.BI (cpuio)cpuload \fR=\fPint
+Attempt to use the specified percentage of CPU cycles. This is a mandatory
+option when using cpuio I/O engine.
 .TP
-.BI direct \fR=\fPbool
-If true, use non-buffered I/O (usually O_DIRECT).  Default: false.
+.BI (cpuio)cpuchunks \fR=\fPint
+Split the load into cycles of the given time, in microseconds.
 .TP
-.BI atomic \fR=\fPbool
-If value is true, attempt to use atomic direct IO. Atomic writes are guaranteed
-to be stable once acknowledged by the operating system. Only Linux supports
-O_ATOMIC right now.
+.BI (cpuio)exit_on_io_done \fR=\fPbool
+Detect when I/O threads are done, then exit.
 .TP
-.BI buffered \fR=\fPbool
-If true, use buffered I/O.  This is the opposite of the \fBdirect\fR parameter.
-Default: true.
+.BI (libhdfs)namenode \fR=\fPstr
+The hostname or IP address of an HDFS cluster namenode to contact.
 .TP
-.BI offset \fR=\fPint
-Start I/O at the provided offset in the file, given as either a fixed size in
-bytes or a percentage. If a percentage is given, the next \fBblockalign\fR-ed
-offset will be used. Data before the given offset will not be touched. This
-effectively caps the file size at (real_size - offset). Can be combined with
-\fBsize\fR to constrain the start and end range of the I/O workload. A percentage
-can be specified by a number between 1 and 100 followed by '%', for example,
-offset=20% to specify 20%.
+.BI (libhdfs)port
+The listening port of the HDFS cluster namenode.
 .TP
-.BI offset_increment \fR=\fPint
-If this is provided, then the real offset becomes the
-offset + offset_increment * thread_number, where the thread number is a
-counter that starts at 0 and is incremented for each sub-job (i.e. when
-numjobs option is specified). This option is useful if there are several jobs
-which are intended to operate on a file in parallel disjoint segments, with
-even spacing between the starting points.
+.BI (netsplice,net)port
+The TCP or UDP port to bind to or connect to. If this is used with
+\fBnumjobs\fR to spawn multiple instances of the same job type, then
+this will be the starting port number since fio will use a range of
+ports.
 .TP
-.BI number_ios \fR=\fPint
-Fio will normally perform IOs until it has exhausted the size of the region
-set by \fBsize\fR, or if it exhaust the allocated time (or hits an error
-condition). With this setting, the range/size can be set independently of
-the number of IOs to perform. When fio reaches this number, it will exit
-normally and report status. Note that this does not extend the amount
-of IO that will be done, it will only stop fio if this condition is met
-before other end-of-job criteria.
+.BI (netsplice,net)hostname \fR=\fPstr
+The hostname or IP address to use for TCP or UDP based I/O. If the job is
+a TCP listener or UDP reader, the hostname is not used and must be omitted
+unless it is a valid UDP multicast address.
 .TP
-.BI fsync \fR=\fPint
-How many I/Os to perform before issuing an \fBfsync\fR\|(2) of dirty data.  If
-0, don't sync.  Default: 0.
+.BI (netsplice,net)interface \fR=\fPstr
+The IP address of the network interface used to send or receive UDP
+multicast.
 .TP
-.BI fdatasync \fR=\fPint
-Like \fBfsync\fR, but uses \fBfdatasync\fR\|(2) instead to only sync the
-data parts of the file. Default: 0.
+.BI (netsplice,net)ttl \fR=\fPint
+Time\-to\-live value for outgoing UDP multicast packets. Default: 1.
 .TP
-.BI write_barrier \fR=\fPint
-Make every Nth write a barrier write.
+.BI (netsplice,net)nodelay \fR=\fPbool
+Set TCP_NODELAY on TCP connections.
 .TP
-.BI sync_file_range \fR=\fPstr:int
-Use \fBsync_file_range\fR\|(2) for every \fRval\fP number of write operations. Fio will
-track range of writes that have happened since the last \fBsync_file_range\fR\|(2) call.
-\fRstr\fP can currently be one or more of:
+.BI (netsplice,net)protocol \fR=\fPstr "\fR,\fP proto" \fR=\fPstr
+The network protocol to use. Accepted values are:
 .RS
-.TP
-.B wait_before
-SYNC_FILE_RANGE_WAIT_BEFORE
-.TP
-.B write
-SYNC_FILE_RANGE_WRITE
-.TP
-.B wait_after
-SYNC_FILE_RANGE_WAIT_AFTER
-.TP
-.RE
-.P
-So if you do sync_file_range=wait_before,write:8, fio would use
-\fBSYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE\fP for every 8 writes.
-Also see the \fBsync_file_range\fR\|(2) man page.  This option is Linux specific.
-.TP
-.BI overwrite \fR=\fPbool
-If writing, setup the file first and do overwrites.  Default: false.
-.TP
-.BI end_fsync \fR=\fPbool
-Sync file contents when a write stage has completed.  Default: false.
-.TP
-.BI fsync_on_close \fR=\fPbool
-If true, sync file contents on close.  This differs from \fBend_fsync\fR in that
-it will happen on every close, not just at the end of the job.  Default: false.
-.TP
-.BI rwmixread \fR=\fPint
-Percentage of a mixed workload that should be reads. Default: 50.
-.TP
-.BI rwmixwrite \fR=\fPint
-Percentage of a mixed workload that should be writes.  If \fBrwmixread\fR and
-\fBrwmixwrite\fR are given and do not sum to 100%, the latter of the two
-overrides the first. This may interfere with a given rate setting, if fio is
-asked to limit reads or writes to a certain rate. If that is the case, then
-the distribution may be skewed. Default: 50.
-.TP
-.BI random_distribution \fR=\fPstr:float
-By default, fio will use a completely uniform random distribution when asked
-to perform random IO. Sometimes it is useful to skew the distribution in
-specific ways, ensuring that some parts of the data is more hot than others.
-Fio includes the following distribution models:
 .RS
 .TP
-.B random
-Uniform random distribution
-.TP
-.B zipf
-Zipf distribution
-.TP
-.B pareto
-Pareto distribution
+.B tcp
+Transmission control protocol.
 .TP
-.B normal
-Normal (Gaussian) distribution
+.B tcpv6
+Transmission control protocol V6.
 .TP
-.B zoned
-Zoned random distribution
+.B udp
+User datagram protocol.
 .TP
+.B udpv6
+User datagram protocol V6.
+.TP
+.B unix
+UNIX domain socket.
 .RE
-When using a \fBzipf\fR or \fBpareto\fR distribution, an input value is also
-needed to define the access pattern. For \fBzipf\fR, this is the zipf theta.
-For \fBpareto\fR, it's the pareto power. Fio includes a test program, genzipf,
-that can be used visualize what the given input values will yield in terms of
-hit rates. If you wanted to use \fBzipf\fR with a theta of 1.2, you would use
-random_distribution=zipf:1.2 as the option. If a non-uniform model is used,
-fio will disable use of the random map. For the \fBnormal\fR distribution, a
-normal (Gaussian) deviation is supplied as a value between 0 and 100.
-.P
-.RS
-For a \fBzoned\fR distribution, fio supports specifying percentages of IO
-access that should fall within what range of the file or device. For example,
-given a criteria of:
-.P
-.RS
-60% of accesses should be to the first 10%
-.RE
-.RS
-30% of accesses should be to the next 20%
-.RE
-.RS
-8% of accesses should be to the next 30%
+.P
+When the protocol is TCP or UDP, the port must also be given, as well as the
+hostname if the job is a TCP listener or UDP reader. For unix sockets, the
+normal \fBfilename\fR option should be used and the port is invalid.
 .RE
+.TP
+.BI (netsplice,net)listen
+For TCP network connections, tell fio to listen for incoming connections
+rather than initiating an outgoing connection. The \fBhostname\fR must
+be omitted if this option is used.
+.TP
+.BI (netsplice,net)pingpong
+Normally a network writer will just continue writing data, and a network
+reader will just consume packets. If `pingpong=1' is set, a writer will
+send its normal payload to the reader, then wait for the reader to send the
+same payload back. This allows fio to measure network latencies. The
+submission latency then measures the local time spent sending or
+receiving, and the completion latency measures how long it took for the
+other end to receive and send back. For UDP multicast traffic
+`pingpong=1' should only be set for a single reader when multiple readers
+are listening to the same address.
+.TP
+.BI (netsplice,net)window_size \fR=\fPint
+Set the desired socket buffer size for the connection.
+.TP
+.BI (netsplice,net)mss \fR=\fPint
+Set the TCP maximum segment size (TCP_MAXSEG).
+.TP
+.BI (e4defrag)donorname \fR=\fPstr
+File will be used as a block donor (swap extents between files).
+.TP
+.BI (e4defrag)inplace \fR=\fPint
+Configure donor file blocks allocation strategy:
 .RS
-2% of accesses should be to the next 40%
-.RE
-.P
-we can define that through zoning of the random accesses. For the above
-example, the user would do:
-.P
 .RS
-.B random_distribution=zoned:60/10:30/20:8/30:2/40
+.TP
+.B 0
+Default. Preallocate donor's file on init.
+.TP
+.B 1
+Allocate space immediately inside defragment event, and free right
+after event.
 .RE
-.P
-similarly to how \fBbssplit\fR works for setting ranges and percentages of block
-sizes. Like \fBbssplit\fR, it's possible to specify separate zones for reads,
-writes, and trims. If just one set is given, it'll apply to all of them.
 .RE
 .TP
-.BI percentage_random \fR=\fPint[,int][,int]
-For a random workload, set how big a percentage should be random. This defaults
-to 100%, in which case the workload is fully random. It can be set from
-anywhere from 0 to 100.  Setting it to 0 would make the workload fully
-sequential. It is possible to set different values for reads, writes, and
-trim. To do so, simply use a comma separated list. See \fBblocksize\fR.
+.BI (rbd)clustername \fR=\fPstr
+Specifies the name of the Ceph cluster.
 .TP
-.B norandommap
-Normally \fBfio\fR will cover every block of the file when doing random I/O. If
-this parameter is given, a new offset will be chosen without looking at past
-I/O history.  This parameter is mutually exclusive with \fBverify\fR.
+.BI (rbd)rbdname \fR=\fPstr
+Specifies the name of the RBD.
 .TP
-.BI softrandommap \fR=\fPbool
-See \fBnorandommap\fR. If fio runs with the random block map enabled and it
-fails to allocate the map, if this option is set it will continue without a
-random block map. As coverage will not be as complete as with random maps, this
-option is disabled by default.
+.BI (rbd)pool \fR=\fPstr
+Specifies the name of the Ceph pool containing RBD.
 .TP
-.BI random_generator \fR=\fPstr
-Fio supports the following engines for generating IO offsets for random IO:
-.RS
+.BI (rbd)clientname \fR=\fPstr
+Specifies the username (without the 'client.' prefix) used to access the
+Ceph cluster. If the \fBclustername\fR is specified, the \fBclientname\fR shall be
+the full \fItype.id\fR string. If no type. prefix is given, fio will add 'client.'
+by default.
 .TP
-.B tausworthe
-Strong 2^88 cycle random number generator
+.BI (mtd)skip_bad \fR=\fPbool
+Skip operations against known bad blocks.
 .TP
-.B lfsr
-Linear feedback shift register generator
+.BI (libhdfs)hdfsdirectory
+libhdfs will create chunks in this HDFS directory.
 .TP
-.B tausworthe64
-Strong 64-bit 2^258 cycle random number generator
+.BI (libhdfs)chunk_size
+The size of the chunk to use for each file.
+.SS "I/O depth"
+.TP
+.BI iodepth \fR=\fPint
+Number of I/O units to keep in flight against the file. Note that
+increasing \fBiodepth\fR beyond 1 will not affect synchronous ioengines (except
+for small degrees when \fBverify_async\fR is in use). Even async
+engines may impose OS restrictions causing the desired depth not to be
+achieved. This may happen on Linux when using libaio and not setting
+`direct=1', since buffered I/O is not async on that OS. Keep an
+eye on the I/O depth distribution in the fio output to verify that the
+achieved depth is as expected. Default: 1.
+.TP
+.BI iodepth_batch_submit \fR=\fPint "\fR,\fP iodepth_batch" \fR=\fPint
+This defines how many pieces of I/O to submit at once. It defaults to 1
+which means that we submit each I/O as soon as it is available, but can be
+raised to submit bigger batches of I/O at a time. If it is set to 0, the
+\fBiodepth\fR value will be used.
+.TP
+.BI iodepth_batch_complete_min \fR=\fPint "\fR,\fP iodepth_batch_complete" \fR=\fPint
+This defines how many pieces of I/O to retrieve at once. It defaults to 1
+which means that we'll ask for a minimum of 1 I/O in the retrieval process
+from the kernel. The I/O retrieval will go on until we hit the limit set by
+\fBiodepth_low\fR. If this variable is set to 0, then fio will always
+check for completed events before queuing more I/O. This helps reduce I/O
+latency, at the cost of more retrieval system calls.
 .TP
+.BI iodepth_batch_complete_max \fR=\fPint
+This defines the maximum number of pieces of I/O to retrieve at once. This
+variable should be used along with \fBiodepth_batch_complete_min\fR=\fIint\fR,
+specifying the minimum and maximum number of I/Os that should be
+retrieved. By default it is equal to the \fBiodepth_batch_complete_min\fR
+value. Example #1:
+.RS
+.RS
+.P
+.PD 0
+iodepth_batch_complete_min=1
+.P
+iodepth_batch_complete_max=<iodepth>
+.PD
+.RE
+.P
+which means that we will retrieve at least 1 I/O and up to the whole
+submitted queue depth. If no I/O has been completed yet, we will wait.
+Example #2:
+.RS
+.P
+.PD 0
+iodepth_batch_complete_min=0
+.P
+iodepth_batch_complete_max=<iodepth>
+.PD
 .RE
 .P
-Tausworthe is a strong random number generator, but it requires tracking on the
-side if we want to ensure that blocks are only read or written once. LFSR
-guarantees that we never generate the same offset twice, and it's also less
-computationally expensive. It's not a true random generator, however, though
-for IO purposes it's typically good enough. LFSR only works with single block
-sizes, not with workloads that use multiple block sizes. If used with such a
-workload, fio may read or write some blocks multiple times. The default
-value is tausworthe, unless the required space exceeds 2^32 blocks. If it does,
-then tausworthe64 is selected automatically.
+which means that we can retrieve up to the whole submitted queue depth, but
+if no I/O has been completed yet, we will NOT wait but immediately exit
+the system call. In this example we simply do polling.
+.RE
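To make the min/max interaction concrete, here is a small C model of a single reaping call. It is a hypothetical sketch, not fio's code: the function name reap_completions and its convention of returning -1 for "would block" are assumptions, loosely mirroring the minimum/maximum bounds that an io_getevents call takes.

```c
/*
 * Hypothetical model of one completion-reaping call bounded by
 * iodepth_batch_complete_min (min) and iodepth_batch_complete_max (max).
 * 'ready' is how many I/Os have already completed.
 * Returns the number of completions reaped, or -1 if the call would
 * have to block until at least 'min' completions are available.
 */
int reap_completions(int min, int max, int ready)
{
    if (ready < min)
        return -1;                      /* not enough completions yet: wait */
    return ready < max ? ready : max;   /* never take more than max at once */
}
```

With min=1 and max=16 and nothing completed, the model returns -1 (Example #1 waits); with min=0 it returns 0 immediately, which corresponds to the polling case of Example #2.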
 .TP
-.BI nice \fR=\fPint
-Run job with given nice value.  See \fBnice\fR\|(2).
+.BI iodepth_low \fR=\fPint
+The low water mark indicating when to start filling the queue
+again. Defaults to the same as \fBiodepth\fR, meaning that fio will
+attempt to keep the queue full at all times. If \fBiodepth\fR is set to
+e.g. 16 and \fBiodepth_low\fR is set to 4, then after fio has filled the queue of
+16 requests, it will let the depth drain down to 4 before starting to fill
+it again.
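The high/low water mark behavior described for iodepth_low is a simple hysteresis. The C sketch below models it under the assumption that fio stops submitting once the queue reaches iodepth and resumes only after it drains to iodepth_low; the struct and function names are hypothetical and do not come from fio's sources.

```c
/* Hypothetical model of the iodepth / iodepth_low hysteresis. */
struct depth_model {
    int iodepth;     /* high water mark: stop submitting at this depth */
    int iodepth_low; /* low water mark: resume submitting at or below this */
    int inflight;    /* I/Os currently queued */
    int filling;     /* 1 while the queue is being (re)filled */
};

/* Returns 1 if another I/O may be submitted now, updating the fill state. */
int may_submit(struct depth_model *m)
{
    if (m->inflight >= m->iodepth)
        m->filling = 0;              /* queue full: let it drain */
    else if (m->inflight <= m->iodepth_low)
        m->filling = 1;              /* drained to the low mark: refill */
    return m->filling;
}
```

With iodepth=16 and iodepth_low=4, the model submits 16 I/Os, refuses further submissions while 5 or more remain in flight, and starts refilling once the depth drops to 4. Setting iodepth_low equal to iodepth makes it refill after every completion, matching the default.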
 .TP
-.BI prio \fR=\fPint
-Set I/O priority value of this job between 0 (highest) and 7 (lowest).  See
-\fBionice\fR\|(1).
+.BI serialize_overlap \fR=\fPbool
+Serialize in-flight I/Os that might otherwise cause or suffer from data races.
+When two or more I/Os are submitted simultaneously, there is no guarantee that
+the I/Os will be processed or completed in the submitted order. Further, if
+two or more of those I/Os are writes, any overlapping region between them can
+become indeterminate/undefined on certain storage. These issues can cause
+verification to fail erratically when at least one of the racing I/Os is
+changing data and the overlapping region has a non-zero size. Setting
+\fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly
+serializing in-flight I/Os that have a non-zero overlap. Note that setting
+this option can reduce both performance and the \fBiodepth\fR achieved.
+Additionally this option does not work when \fBio_submit_mode\fR is set to
+offload. Default: false.
 .TP
-.BI prioclass \fR=\fPint
-Set I/O priority class.  See \fBionice\fR\|(1).
+.BI io_submit_mode \fR=\fPstr
+This option controls how fio submits the I/O to the I/O engine. The default
+is `inline', which means that the fio job threads submit and reap I/O
+directly. If set to `offload', the job threads will offload I/O submission
+to a dedicated pool of I/O threads. This requires some coordination and thus
+has a bit of extra overhead, especially for lower queue depth I/O where it
+can increase latencies. The benefit is that fio can manage submission rates
+independently of the device completion rates. This avoids skewed latency
+reporting if I/O gets backed up on the device side (the coordinated omission
+problem).
+.SS "I/O rate"
 .TP
-.BI thinktime \fR=\fPint
-Stall job for given number of microseconds between issuing I/Os.
+.BI thinktime \fR=\fPtime
+Stall the job for the specified period of time after an I/O has completed before issuing the
+next. May be used to simulate processing being done by an application.
+When the unit is omitted, the value is interpreted in microseconds. See
+\fBthinktime_blocks\fR and \fBthinktime_spin\fR.
 .TP
-.BI thinktime_spin \fR=\fPint
-Pretend to spend CPU time for given number of microseconds, sleeping the rest
-of the time specified by \fBthinktime\fR.  Only valid if \fBthinktime\fR is set.
+.BI thinktime_spin \fR=\fPtime
+Only valid if \fBthinktime\fR is set \- pretend to spend CPU time doing
+something with the data received, before falling back to sleeping for the
+rest of the period specified by \fBthinktime\fR. When the unit is
+omitted, the value is interpreted in microseconds.
 .TP
 .BI thinktime_blocks \fR=\fPint
-Only valid if thinktime is set - control how many blocks to issue, before
-waiting \fBthinktime\fR microseconds. If not set, defaults to 1 which will
-make fio wait \fBthinktime\fR microseconds after every block. This
-effectively makes any queue depth setting redundant, since no more than 1 IO
-will be queued before we have to complete it and do our thinktime. In other
-words, this setting effectively caps the queue depth if the latter is larger.
-Default: 1.
+Only valid if \fBthinktime\fR is set \- control how many blocks to issue,
+before waiting for \fBthinktime\fR. If not set, defaults to 1, which will make
+fio wait for \fBthinktime\fR after every block. This effectively makes any
+queue depth setting redundant, since no more than 1 I/O will be queued
+before we have to complete it and do our \fBthinktime\fR. In other words, this
+setting effectively caps the queue depth if the latter is larger.
 .TP
 .BI rate \fR=\fPint[,int][,int]
-Cap bandwidth used by this job. The number is in bytes/sec, the normal postfix
-rules apply. You can use \fBrate\fR=500k to limit reads and writes to 500k each,
-or you can specify reads, write, and trim limits separately.
-Using \fBrate\fR=1m,500k would
-limit reads to 1MiB/sec and writes to 500KiB/sec. Capping only reads or writes
-can be done with \fBrate\fR=,500k or \fBrate\fR=500k,. The former will only
-limit writes (to 500KiB/sec), the latter will only limit reads.
+Cap the bandwidth used by this job. The number is in bytes/sec, the normal
+suffix rules apply. Comma\-separated values may be specified for reads,
+writes, and trims as described in \fBblocksize\fR.
+.RS
+.P
+For example, using `rate=1m,500k' would limit reads to 1MiB/sec and writes to
+500KiB/sec. Capping only reads or writes can be done with `rate=,500k' or
+`rate=500k,' where the former will only limit writes (to 500KiB/sec) and the
+latter will only limit reads.
+.RE
 .TP
 .BI rate_min \fR=\fPint[,int][,int]
-Tell \fBfio\fR to do whatever it can to maintain at least the given bandwidth.
-Failing to meet this requirement will cause the job to exit. The same format
-as \fBrate\fR is used for read vs write vs trim separation.
+Tell fio to do whatever it can to maintain at least this bandwidth. Failing
+to meet this requirement will cause the job to exit. Comma\-separated values
+may be specified for reads, writes, and trims as described in
+\fBblocksize\fR.
 .TP
 .BI rate_iops \fR=\fPint[,int][,int]
-Cap the bandwidth to this number of IOPS. Basically the same as rate, just
-specified independently of bandwidth. The same format as \fBrate\fR is used for
-read vs write vs trim separation. If \fBblocksize\fR is a range, the smallest block
-size is used as the metric.
+Cap the bandwidth to this number of IOPS. Basically the same as
+\fBrate\fR, just specified independently of bandwidth. If the job is
+given a block size range instead of a fixed value, the smallest block size
+is used as the metric. Comma\-separated values may be specified for reads,
+writes, and trims as described in \fBblocksize\fR.
 .TP
 .BI rate_iops_min \fR=\fPint[,int][,int]
-If this rate of I/O is not met, the job will exit. The same format as \fBrate\fR
-is used for read vs write vs trim separation.
+If fio doesn't meet this rate of I/O, it will cause the job to exit.
+Comma\-separated values may be specified for reads, writes, and trims as
+described in \fBblocksize\fR.
 .TP
 .BI rate_process \fR=\fPstr
-This option controls how fio manages rated IO submissions. The default is
-\fBlinear\fR, which submits IO in a linear fashion with fixed delays between
-IOs that gets adjusted based on IO completion rates. If this is set to
-\fBpoisson\fR, fio will submit IO based on a more real world random request
+This option controls how fio manages rated I/O submissions. The default is
+`linear', which submits I/O in a linear fashion with fixed delays between
+I/Os that get adjusted based on I/O completion rates. If this is set to
+`poisson', fio will submit I/O based on a more real-world random request
 flow, known as the Poisson process
-(https://en.wikipedia.org/wiki/Poisson_process). The lambda will be
+(\fIhttps://en.wikipedia.org/wiki/Poisson_point_process\fR). The lambda will be
 10^6 / IOPS for the given workload.
+.SS "I/O latency"
 .TP
-.BI rate_cycle \fR=\fPint
-Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number of
-milliseconds.  Default: 1000ms.
-.TP
-.BI latency_target \fR=\fPint
+.BI latency_target \fR=\fPtime
 If set, fio will attempt to find the max performance point that the given
-workload will run at while maintaining a latency below this target. The
-values is given in microseconds. See \fBlatency_window\fR and
-\fBlatency_percentile\fR.
+workload will run at while maintaining a latency below this target. When
+the unit is omitted, the value is interpreted in microseconds. See
+\fBlatency_window\fR and \fBlatency_percentile\fR.
 .TP
-.BI latency_window \fR=\fPint
+.BI latency_window \fR=\fPtime
 Used with \fBlatency_target\fR to specify the sample window that the job
-is run at varying queue depths to test the performance. The value is given
-in microseconds.
+is run at varying queue depths to test the performance. When the unit is
+omitted, the value is interpreted in microseconds.
 .TP
 .BI latency_percentile \fR=\fPfloat
-The percentage of IOs that must fall within the criteria specified by
-\fBlatency_target\fR and \fBlatency_window\fR. If not set, this defaults
-to 100.0, meaning that all IOs must be equal or below to the value set
-by \fBlatency_target\fR.
+The percentage of I/Os that must fall within the criteria specified by
+\fBlatency_target\fR and \fBlatency_window\fR. If not set, this
+defaults to 100.0, meaning that all I/Os must be equal to or below the value
+set by \fBlatency_target\fR.
+.TP
+.BI max_latency \fR=\fPtime
+If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
+maximum latency. When the unit is omitted, the value is interpreted in
+microseconds.
+.TP
+.BI rate_cycle \fR=\fPint
+Average bandwidth for \fBrate\fR and \fBrate_min\fR over this number
+of milliseconds. Defaults to 1000.
+.SS "I/O replay"
+.TP
+.BI write_iolog \fR=\fPstr
+Write the issued I/O patterns to the specified file. See
+\fBread_iolog\fR. Specify a separate file for each job, otherwise the
+iologs will be interspersed and the file may be corrupt.
+.TP
+.BI read_iolog \fR=\fPstr
+Open an iolog with the specified filename and replay the I/O patterns it
+contains. This can be used to store a workload and replay it sometime
+later. The iolog given may also be a blktrace binary file, which allows fio
+to replay a workload captured by blktrace. See
+\fBblktrace\fR\|(8) for how to capture such logging data. For blktrace
+replay, the file needs to be turned into a blkparse binary data file first
+(`blkparse <device> \-o /dev/null \-d file_for_fio.bin').
+.TP
+.BI replay_no_stall \fR=\fPbool
+When replaying I/O with \fBread_iolog\fR the default behavior is to
+attempt to respect the timestamps within the log and replay them with the
+appropriate delay between I/Os. By setting this variable, fio will not
+respect the timestamps and attempt to replay them as fast as possible while
+still respecting ordering. The result is the same I/O pattern to a given
+device, but different timings.
+.TP
+.BI replay_redirect \fR=\fPstr
+While replaying I/O patterns using \fBread_iolog\fR the default behavior
+is to replay the I/Os onto the major/minor device that each I/O was recorded
+from. This is sometimes undesirable because on a different machine those
+major/minor numbers can map to a different device. Changing hardware on the
+same system can also result in a different major/minor mapping.
+\fBreplay_redirect\fR causes all I/Os to be replayed onto the single specified
+device regardless of the device it was recorded
+from. For example, `replay_redirect=/dev/sdc' would cause all I/O
+in the blktrace or iolog to be replayed onto `/dev/sdc'. This means
+multiple devices will be replayed onto a single device, if the trace
+contains multiple devices. If you want multiple devices to be replayed
+concurrently to multiple redirected devices you must blkparse your trace
+into separate traces and replay them with independent fio invocations.
+Unfortunately this also breaks the strict time ordering between multiple
+device accesses.
+.TP
+.BI replay_align \fR=\fPint
+Force alignment of I/O offsets and lengths in a trace to this power of 2
+value.
+.TP
+.BI replay_scale \fR=\fPint
+Scale sector offsets down by this factor when replaying traces.
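The two replay adjustments above come down to simple integer arithmetic. The C helpers below are a hypothetical sketch: they assume that replay_align rounds offsets down to the alignment (using the usual power-of-two mask trick) and that replay_scale is a plain integer division of the offset. The helper names are illustrative and not fio functions.

```c
#include <stdint.h>

/* Returns nonzero if 'align' is a non-zero power of two. */
int is_pow2(uint64_t align)
{
    return align && !(align & (align - 1));
}

/* Round 'value' down to a multiple of 'align' (assumed to be a power of two). */
uint64_t align_down(uint64_t value, uint64_t align)
{
    return value & ~(align - 1);   /* clear the low bits below the alignment */
}

/* Scale a trace offset down by 'factor'; a factor of 0 leaves it unchanged. */
uint64_t scale_offset(uint64_t offset, uint64_t factor)
{
    return factor ? offset / factor : offset;
}
```

For example, with an alignment of 4096 an offset of 4097 becomes 4096, and with a scale factor of 4 an offset of 8000 becomes 2000.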
+.SS "Threads, processes and job synchronization"
+.TP
+.BI thread
+Fio defaults to creating jobs by using fork, however if this option is
+given, fio will create jobs by using POSIX Threads' function
+\fBpthread_create\fR\|(3) to create threads instead.
+.TP
+.BI wait_for \fR=\fPstr
+If set, the current job won't be started until all workers of the specified
+waitee job are done.
+.\" ignore blank line here from HOWTO as it looks normal without it
+\fBwait_for\fR operates on the job name basis, so there are a few
+limitations. First, the waitee must be defined prior to the waiter job
+(meaning no forward references). Second, if a job is being referenced as a
+waitee, it must have a unique name (no duplicate waitees).
+.TP
+.BI nice \fR=\fPint
+Run the job with the given nice value. See man \fBnice\fR\|(2).
+.\" ignore blank line here from HOWTO as it looks normal without it
+On Windows, values less than \-15 set the process class to "High"; \-1 through
+\-15 set "Above Normal"; 1 through 15 "Below Normal"; and above 15 "Idle"
+priority class.
+.TP
+.BI prio \fR=\fPint
+Set the I/O priority value of this job. Linux limits us to a positive value
+between 0 and 7, with 0 being the highest. See man
+\fBionice\fR\|(1). Refer to an appropriate manpage for other operating
+systems since the meaning of priority may differ.
 .TP
-.BI max_latency \fR=\fPint
-If set, fio will exit the job if it exceeds this maximum latency. It will exit
-with an ETIME error.
+.BI prioclass \fR=\fPint
+Set the I/O priority class. See man \fBionice\fR\|(1).
 .TP
 .BI cpumask \fR=\fPint
-Set CPU affinity for this job. \fIint\fR is a bitmask of allowed CPUs the job
-may run on.  See \fBsched_setaffinity\fR\|(2).
+Set the CPU affinity of this job. The parameter given is a bit mask of
+allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+\fBsched_setaffinity\fR\|(2). This may not work on all supported
+operating systems or kernel versions. This option doesn't work well for a
+higher CPU count than what you can store in an integer mask, so it can only
+control CPUs 0\-31. For boxes with larger CPU counts, use
+\fBcpus_allowed\fR.
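The cpumask value is just the bitwise OR of one bit per allowed CPU. The C helper below (a hypothetical name, not part of fio) reproduces the arithmetic from the paragraph above, assuming CPU numbers in 0..31 so that every bit fits in a 32-bit unsigned int.

```c
#include <stddef.h>

/*
 * Build a cpumask value from a list of CPU numbers. Each number must be
 * in 0..31 so that it fits in a 32-bit unsigned int.
 */
unsigned int cpumask_from_cpus(const int *cpus, size_t n)
{
    unsigned int mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= 1u << cpus[i];   /* set the bit for each allowed CPU */
    return mask;
}
```

Passing CPUs 1 and 5 yields (1 << 1 | 1 << 5) = 34, so the job file would contain `cpumask=34`.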
 .TP
 .BI cpus_allowed \fR=\fPstr
-Same as \fBcpumask\fR, but allows a comma-delimited list of CPU numbers.
+Controls the same options as \fBcpumask\fR, but accepts a textual
+specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
+would specify `cpus_allowed=1,5'. This option also allows a range of CPUs
+to be specified \-\- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
+would set `cpus_allowed=1,5,8\-15'.
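To show how the textual form maps back to a mask, here is a hypothetical C parser for strings such as `1,5,8-15`. It is a sketch under stated assumptions, not fio's parser: it accepts only comma-separated CPU numbers and A-B ranges, limits CPUs to 0..63 so the result fits in a uint64_t, and does not handle the `all' keyword that some other options accept.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list like "1,5,8-15" into a 64-bit CPU mask.
 * Returns 0 on success, -1 on malformed input or a CPU number above 63.
 */
int parse_cpu_list(const char *s, uint64_t *mask)
{
    uint64_t m = 0;
    const char *p = s;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0 || lo > 63)
            return -1;
        long hi = lo;
        if (*end == '-') {              /* A-B range */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo || hi > 63)
                return -1;
        }
        for (long c = lo; c <= hi; c++)
            m |= UINT64_C(1) << c;      /* set one bit per CPU in the range */
        if (*end == ',')
            end++;                      /* move on to the next element */
        else if (*end != '\0')
            return -1;                  /* unexpected character */
        p = end;
    }
    *mask = m;
    return 0;
}
```

For `1,5,8-15` this produces 0xFF22 (bits 1, 5 and 8 through 15), and for `1,5` it produces 34, the same value as the equivalent cpumask.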
 .TP
 .BI cpus_allowed_policy \fR=\fPstr
-Set the policy of how fio distributes the CPUs specified by \fBcpus_allowed\fR
-or \fBcpumask\fR. Two policies are supported:
+Set the policy of how fio distributes the CPUs specified by
+\fBcpus_allowed\fR or \fBcpumask\fR. Two policies are supported:
 .RS
 .RS
 .TP
@@ -1368,839 +2009,705 @@ All jobs will share the CPU set specified.
 Each job will get a unique CPU from the CPU set.
 .RE
 .P
-\fBshared\fR is the default behaviour, if the option isn't specified. If
-\fBsplit\fR is specified, then fio will assign one cpu per job. If not enough
-CPUs are given for the jobs listed, then fio will roundrobin the CPUs in
-the set.
+\fBshared\fR is the default behavior, if the option isn't specified. If
+\fBsplit\fR is specified, then fio will assign one CPU per job. If not
+enough CPUs are given for the jobs listed, then fio will round\-robin the CPUs
+in the set.
 .RE
-.P
 .TP
 .BI numa_cpu_nodes \fR=\fPstr
 Set this job running on specified NUMA nodes' CPUs. The arguments allow
-comma delimited list of cpu numbers, A-B ranges, or 'all'.
+a comma\-delimited list of CPU numbers, A\-B ranges, or `all'. Note that to enable
+NUMA options support, fio must be built on a system with libnuma\-dev(el)
+installed.
 .TP
 .BI numa_mem_policy \fR=\fPstr
-Set this job's memory policy and corresponding NUMA nodes. Format of
-the arguments:
+Set this job's memory policy and corresponding NUMA nodes. Format of the
+arguments:
 .RS
-.TP
-.B <mode>[:<nodelist>]
-.TP
-.B mode
-is one of the following memory policy:
-.TP
-.B default, prefer, bind, interleave, local
-.TP
+.RS
+.P
+<mode>[:<nodelist>]
+.RE
+.P
+`mode' is one of the following memory policies: `default', `prefer',
+`bind', `interleave' or `local'. For `default' and `local' memory
+policies, no node needs to be specified. For `prefer', only one node is
+allowed. For `bind' and `interleave' the `nodelist' may be as
+follows: a comma delimited list of numbers, A\-B ranges, or `all'.
 .RE
-For \fBdefault\fR and \fBlocal\fR memory policy, no \fBnodelist\fR is
-needed to be specified. For \fBprefer\fR, only one node is
-allowed. For \fBbind\fR and \fBinterleave\fR, \fBnodelist\fR allows
-comma delimited list of numbers, A-B ranges, or 'all'.
-.TP
-.BI startdelay \fR=\fPirange
-Delay start of job for the specified number of seconds. Supports all time
-suffixes to allow specification of hours, minutes, seconds and
-milliseconds - seconds are the default if a unit is omitted.
-Can be given as a range which causes each thread to choose randomly out of the
-range.
-.TP
-.BI runtime \fR=\fPint
-Terminate processing after the specified number of seconds.
-.TP
-.B time_based
-If given, run for the specified \fBruntime\fR duration even if the files are
-completely read or written. The same workload will be repeated as many times
-as \fBruntime\fR allows.
-.TP
-.BI ramp_time \fR=\fPint
-If set, fio will run the specified workload for this amount of time before
-logging any performance numbers. Useful for letting performance settle before
-logging results, thus minimizing the runtime required for stable results. Note
-that the \fBramp_time\fR is considered lead in time for a job, thus it will
-increase the total runtime if a special timeout or runtime is specified.
 .TP
-.BI steadystate \fR=\fPstr:float "\fR,\fP ss" \fR=\fPstr:float
-Define the criterion and limit for assessing steady state performance. The
-first parameter designates the criterion whereas the second parameter sets the
-threshold. When the criterion falls below the threshold for the specified
-duration, the job will stop. For example, iops_slope:0.1% will direct fio
-to terminate the job when the least squares regression slope falls below 0.1%
-of the mean IOPS. If group_reporting is enabled this will apply to all jobs in
-the group. All assessments are carried out using only data from the rolling
-collection window. Threshold limits can be expressed as a fixed value or as a
-percentage of the mean in the collection window. Below are the available steady
-state assessment criteria.
+.BI cgroup \fR=\fPstr
+Add job to this control group. If it doesn't exist, it will be created. The
+system must have a mounted cgroup blkio mount point for this to work. If
+your system doesn't have it mounted, you can do so with:
 .RS
 .RS
-.TP
-.B iops
-Collect IOPS data. Stop the job if all individual IOPS measurements are within
-the specified limit of the mean IOPS (e.g., iops:2 means that all individual
-IOPS values must be within 2 of the mean, whereas iops:0.2% means that all
-individual IOPS values must be within 0.2% of the mean IOPS to terminate the
-job).
-.TP
-.B iops_slope
-Collect IOPS data and calculate the least squares regression slope. Stop the
-job if the slope falls below the specified limit.
-.TP
-.B bw
-Collect bandwidth data. Stop the job if all individual bandwidth measurements
-are within the specified limit of the mean bandwidth.
-.TP
-.B bw_slope
-Collect bandwidth data and calculate the least squares regression slope. Stop
-the job if the slope falls below the specified limit.
+.P
+# mount \-t cgroup \-o blkio none /cgroup
 .RE
 .RE
 .TP
-.BI steadystate_duration \fR=\fPtime "\fR,\fP ss_dur" \fR=\fPtime
-A rolling window of this duration will be used to judge whether steady state
-has been reached. Data will be collected once per second. The default is 0
-which disables steady state detection.
-.TP
-.BI steadystate_ramp_time \fR=\fPtime "\fR,\fP ss_ramp" \fR=\fPtime
-Allow the job to run for the specified duration before beginning data collection
-for checking the steady state job termination criterion. The default is 0.
-.TP
-.BI invalidate \fR=\fPbool
-Invalidate buffer-cache for the file prior to starting I/O.  Default: true.
+.BI cgroup_weight \fR=\fPint
+Set the weight of the cgroup to this value. See the documentation that comes
+with the kernel; allowed values are in the range of 100..1000.
 .TP
-.BI sync \fR=\fPbool
-Use synchronous I/O for buffered writes.  For the majority of I/O engines,
-this means using O_SYNC.  Default: false.
+.BI cgroup_nodelete \fR=\fPbool
+Normally fio will delete the cgroups it has created after the job
+completion. To override this behavior and to leave cgroups around after the
+job completion, set `cgroup_nodelete=1'. This can be useful if one wants
+to inspect various cgroup files after job completion. Default: false.
 .TP
-.BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr
-Allocation method for I/O unit buffer.  Allowed values are:
-.RS
-.RS
+.BI flow_id \fR=\fPint
+The ID of the flow. If not specified, it defaults to being a global
+flow. See \fBflow\fR.
 .TP
-.B malloc
-Allocate memory with \fBmalloc\fR\|(3). Default memory type.
+.BI flow \fR=\fPint
+Weight in token\-based flow control. If this value is used, then there is
+a `flow counter' which is used to regulate the proportion of activity between
+two or more jobs. Fio attempts to keep this flow counter near zero. The
+\fBflow\fR parameter specifies how much is added to or subtracted from the
+flow counter on each iteration of the main I/O loop. That is, if one job has
+`flow=8' and another job has `flow=\-1', then there will be a roughly 1:8
+ratio in how much one runs vs the other.
 .TP
-.B shm
-Use shared memory buffers allocated through \fBshmget\fR\|(2).
+.BI flow_watermark \fR=\fPint
+The maximum value that the absolute value of the flow counter is allowed to
+reach before the job must wait for a lower value of the counter.
 .TP
-.B shmhuge
-Same as \fBshm\fR, but use huge pages as backing.
+.BI flow_sleep \fR=\fPint
+The period of time, in microseconds, to wait after the flow watermark has
+been exceeded before retrying operations.
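The interaction of `flow' and `flow_watermark' can be shown with a toy simulation. This is a sketch under stated assumptions, not fio's code. Two jobs share one counter. Before each I/O, a job looks at the counter from the point of view of its own sign. If that value exceeds the watermark, the job waits (in fio, for `flow_sleep' microseconds). Otherwise it adds its flow value and performs the I/O. The names `struct job`, `may_run` and `simulate_flow` are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of one job taking part in token-based flow control. */
struct job {
	long flow;	/* per-I/O contribution to the counter, e.g. 8 or -1 */
	long ios;	/* number of I/Os this job has performed */
};

/* A job may proceed unless the counter, seen from its sign, is above the watermark. */
static int may_run(const struct job *j, long counter, long watermark)
{
	long seen = j->flow > 0 ? counter : -counter;
	return seen <= watermark;
}

/* Alternate the two jobs for `ticks` rounds, starting from a zero counter. */
void simulate_flow(struct job *a, struct job *b, long watermark, int ticks)
{
	long counter = 0;

	assert(watermark >= 0);
	for (int t = 0; t < ticks; t++) {
		struct job *jobs[2] = { a, b };

		for (int i = 0; i < 2; i++) {
			if (may_run(jobs[i], counter, watermark)) {
				counter += jobs[i]->flow;
				jobs[i]->ios++;
			}
			/* otherwise the job would sleep for flow_sleep usecs */
		}
	}
}
```

With `flow=8' for one job, `flow=\-1' for the other and a watermark of 0, the second job has to run eight times to cancel each I/O of the first. Over 8000 rounds the model therefore produces 1000 and 8000 I/Os, which is the 1:8 ratio described above.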
 .TP
-.B mmap
-Use \fBmmap\fR\|(2) for allocation.  Uses anonymous memory unless a filename
-is given after the option in the format `:\fIfile\fR'.
+.BI stonewall "\fR,\fB wait_for_previous"
+Wait for preceding jobs in the job file to exit before starting this
+one. Can be used to insert serialization points in the job file. A
+stonewall also implies starting a new reporting group; see
+\fBgroup_reporting\fR.
 .TP
-.B mmaphuge
-Same as \fBmmap\fR, but use huge files as backing.
+.BI exitall
+By default, fio will continue running all other jobs when one job finishes,
+but sometimes this is not the desired action. Setting \fBexitall\fR will
+instead make fio terminate all other jobs when one job finishes.
 .TP
-.B mmapshared
-Same as \fBmmap\fR, but use a MMAP_SHARED mapping.
+.BI exec_prerun \fR=\fPstr
+Before running this job, issue the command specified through
+\fBsystem\fR\|(3). Output is redirected to a file called `jobname.prerun.txt'.
 .TP
-.B cudamalloc
-Use GPU memory as the buffers for GPUDirect RDMA benchmark. The ioengine must be \fBrdma\fR.
-.RE
-.P
-The amount of memory allocated is the maximum allowed \fBblocksize\fR for the
-job multiplied by \fBiodepth\fR.  For \fBshmhuge\fR or \fBmmaphuge\fR to work,
-the system must have free huge pages allocated.  \fBmmaphuge\fR also needs to
-have hugetlbfs mounted, and \fIfile\fR must point there. At least on Linux,
-huge pages must be manually allocated. See \fB/proc/sys/vm/nr_hugehages\fR
-and the documentation for that. Normally you just need to echo an appropriate
-number, eg echoing 8 will ensure that the OS has 8 huge pages ready for
-use.
-.RE
+.BI exec_postrun \fR=\fPstr
+After the job completes, issue the command specified through
+\fBsystem\fR\|(3). Output is redirected to a file called `jobname.postrun.txt'.
 .TP
-.BI iomem_align \fR=\fPint "\fR,\fP mem_align" \fR=\fPint
-This indicates the memory alignment of the IO memory buffers. Note that the
-given alignment is applied to the first IO unit buffer, if using \fBiodepth\fR
-the alignment of the following buffers are given by the \fBbs\fR used. In
-other words, if using a \fBbs\fR that is a multiple of the page sized in the
-system, all buffers will be aligned to this value. If using a \fBbs\fR that
-is not page aligned, the alignment of subsequent IO memory buffers is the
-sum of the \fBiomem_align\fR and \fBbs\fR used.
+.BI uid \fR=\fPint
+Instead of running as the invoking user, set the user ID to this value
+before the thread/process does any work.
 .TP
-.BI hugepage\-size \fR=\fPint
-Defines the size of a huge page.  Must be at least equal to the system setting.
-Should be a multiple of 1MiB. Default: 4MiB.
+.BI gid \fR=\fPint
+Set group ID, see \fBuid\fR.
+.SS "Verification"
 .TP
-.B exitall
-Terminate all jobs when one finishes.  Default: wait for each job to finish.
+.BI verify_only
+Do not perform the specified workload; only verify that data still matches
+the previous invocation of this workload. This option allows one to check data multiple
+times at a later date without overwriting it. This option makes sense only
+for workloads that write data, and does not support workloads with the
+\fBtime_based\fR option set.
 .TP
-.B exitall_on_error
-Terminate all jobs if one job finishes in error.  Default: wait for each job
-to finish.
+.BI do_verify \fR=\fPbool
+Run the verify phase after a write phase. Only valid if \fBverify\fR is
+set. Default: true.
 .TP
-.BI bwavgtime \fR=\fPint
-Average bandwidth calculations over the given time in milliseconds. If the job
-also does bandwidth logging through \fBwrite_bw_log\fR, then the minimum of
-this option and \fBlog_avg_msec\fR will be used.  Default: 500ms.
+.BI verify \fR=\fPstr
+If writing to a file, fio can verify the file contents after each iteration
+of the job. Each verification method also implies verification of a special
+header, which is written to the beginning of each block. This header also
+includes meta information, such as the offset of the block, the block number,
+and the timestamp when the block was written. \fBverify\fR can be combined
+with the \fBverify_pattern\fR option. The allowed values are:
+.RS
+.RS
 .TP
-.BI iopsavgtime \fR=\fPint
-Average IOPS calculations over the given time in milliseconds. If the job
-also does IOPS logging through \fBwrite_iops_log\fR, then the minimum of
-this option and \fBlog_avg_msec\fR will be used.  Default: 500ms.
+.B md5
+Use an md5 sum of the data area and store it in the header of
+each block.
 .TP
-.BI create_serialize \fR=\fPbool
-If true, serialize file creation for the jobs.  Default: true.
+.B crc64
+Use an experimental crc64 sum of the data area and store it in the
+header of each block.
 .TP
-.BI create_fsync \fR=\fPbool
-\fBfsync\fR\|(2) data file after creation.  Default: true.
+.B crc32c
+Use a crc32c sum of the data area and store it in the header of
+each block. This will automatically use hardware acceleration
+(e.g. SSE4.2 on an x86 or CRC crypto extensions on ARM64) but will
+fall back to software crc32c if none is found. Generally the
+fastest checksum fio supports when hardware accelerated.
 .TP
-.BI create_on_open \fR=\fPbool
-If true, the files are not created until they are opened for IO by the job.
+.B crc32c\-intel
+Synonym for crc32c.
 .TP
-.BI create_only \fR=\fPbool
-If true, fio will only run the setup phase of the job. If files need to be
-laid out or updated on disk, only that will be done. The actual job contents
-are not executed.
+.B crc32
+Use a crc32 sum of the data area and store it in the header of each
+block.
 .TP
-.BI allow_file_create \fR=\fPbool
-If true, fio is permitted to create files as part of its workload. This is
-the default behavior. If this option is false, then fio will error out if the
-files it needs to use don't already exist. Default: true.
+.B crc16
+Use a crc16 sum of the data area and store it in the header of each
+block.
 .TP
-.BI allow_mounted_write \fR=\fPbool
-If this isn't set, fio will abort jobs that are destructive (eg that write)
-to what appears to be a mounted device or partition. This should help catch
-creating inadvertently destructive tests, not realizing that the test will
-destroy data on the mounted file system. Default: false.
+.B crc7
+Use a crc7 sum of the data area and store it in the header of each
+block.
 .TP
-.BI pre_read \fR=\fPbool
-If this is given, files will be pre-read into memory before starting the given
-IO operation. This will also clear the \fR \fBinvalidate\fR flag, since it is
-pointless to pre-read and then drop the cache. This will only work for IO
-engines that are seekable, since they allow you to read the same data
-multiple times. Thus it will not work on eg network or splice IO.
+.B xxhash
+Use xxhash as the checksum function. Generally the fastest software
+checksum that fio supports.
 .TP
-.BI unlink \fR=\fPbool
-Unlink job files when done.  Default: false.
+.B sha512
+Use sha512 as the checksum function.
 .TP
-.BI unlink_each_loop \fR=\fPbool
-Unlink job files after each iteration or loop.  Default: false.
+.B sha256
+Use sha256 as the checksum function.
 .TP
-.BI loops \fR=\fPint
-Specifies the number of iterations (runs of the same workload) of this job.
-Default: 1.
+.B sha1
+Use optimized sha1 as the checksum function.
 .TP
-.BI verify_only
-Do not perform the specified workload, only verify data still matches previous
-invocation of this workload. This option allows one to check data multiple
-times at a later date without overwriting it. This option makes sense only for
-workloads that write data, and does not support workloads with the
-\fBtime_based\fR option set.
+.B sha3\-224
+Use optimized sha3\-224 as the checksum function.
 .TP
-.BI do_verify \fR=\fPbool
-Run the verify phase after a write phase.  Only valid if \fBverify\fR is set.
-Default: true.
+.B sha3\-256
+Use optimized sha3\-256 as the checksum function.
 .TP
-.BI verify \fR=\fPstr
-Method of verifying file contents after each iteration of the job. Each
-verification method also implies verification of special header, which is
-written to the beginning of each block. This header also includes meta
-information, like offset of the block, block number, timestamp when block
-was written, etc.  \fBverify\fR=str can be combined with \fBverify_pattern\fR=str
-option.  The allowed values are:
-.RS
-.RS
+.B sha3\-384
+Use optimized sha3\-384 as the checksum function.
 .TP
-.B md5 crc16 crc32 crc32c crc32c-intel crc64 crc7 sha256 sha512 sha1 sha3-224 sha3-256 sha3-384 sha3-512 xxhash
-Store appropriate checksum in the header of each block. crc32c-intel is
-hardware accelerated SSE4.2 driven, falls back to regular crc32c if
-not supported by the system.
+.B sha3\-512
+Use optimized sha3\-512 as the checksum function.
 .TP
 .B meta
-This option is deprecated, since now meta information is included in generic
-verification header and meta verification happens by default.  For detailed
-information see the description of the \fBverify\fR=str setting. This option
-is kept because of compatibility's sake with old configurations. Do not use it.
+This option is deprecated, since meta information is now included in the
+generic verification header and meta verification happens by
+default. For detailed information see the description of the
+\fBverify\fR setting. This option is kept for compatibility with
+old configurations. Do not use it.
 .TP
 .B pattern
-Verify a strict pattern. Normally fio includes a header with some basic
-information and checksumming, but if this option is set, only the
-specific pattern set with \fBverify_pattern\fR is verified.
+Verify a strict pattern. Normally fio includes a header with some
+basic information and checksumming, but if this option is set, only
+the specific pattern set with \fBverify_pattern\fR is verified.
 .TP
 .B null
-Pretend to verify.  Used for testing internals.
+Only pretend to verify. Useful for testing internals with
+`ioengine=null', not for much else.
 .RE
-
-This option can be used for repeated burn-in tests of a system to make sure
-that the written data is also correctly read back. If the data direction given
-is a read or random read, fio will assume that it should verify a previously
-written file. If the data direction includes any form of write, the verify will
-be of the newly written data.
+.P
+This option can be used for repeated burn\-in tests of a system to make sure
+that the written data is also correctly read back. If the data direction
+given is a read or random read, fio will assume that it should verify a
+previously written file. If the data direction includes any form of write,
+the verify will be of the newly written data.
 .RE
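The checksum methods above all follow the same pattern. A digest of the data area is stored in the block header at write time, and on read-back the digest is recomputed and compared with the stored one. As an illustration of what a software crc32c computes, here is a plain bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). It is a slow, portable sketch, not fio's implementation, and the function name `crc32c_sw` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise software CRC-32C; hypothetical helper for illustration only. */
uint32_t crc32c_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;		/* standard initial value */

	assert(data != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)	/* one bit at a time, LSB first */
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc ^ 0xFFFFFFFFu;		/* final inversion */
}
```

The standard check value of CRC-32C over the ASCII string "123456789" is 0xE3069283. A verifier built on this function would flag a block as corrupt whenever the recomputed value differs from the one saved in its header.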
 .TP
 .BI verifysort \fR=\fPbool
-If true, written verify blocks are sorted if \fBfio\fR deems it to be faster to
-read them back in a sorted manner.  Default: true.
+If true, fio will sort written verify blocks when it deems it faster to read
+them back in a sorted manner. This is often the case when overwriting an
+existing file, since the blocks are already laid out in the file system. You
+can ignore this option unless doing huge amounts of really fast I/O where
+the red\-black tree sorting CPU time becomes significant. Default: true.
 .TP
 .BI verifysort_nr \fR=\fPint
-Pre-load and sort verify blocks for a read workload.
+Pre\-load and sort verify blocks for a read workload.
 .TP
 .BI verify_offset \fR=\fPint
 Swap the verification header with data somewhere else in the block before
-writing.  It is swapped back before verifying.
+writing. It is swapped back before verifying.
 .TP
 .BI verify_interval \fR=\fPint
-Write the verification header for this number of bytes, which should divide
-\fBblocksize\fR.  Default: \fBblocksize\fR.
+Write the verification header at a finer granularity than the
+\fBblocksize\fR. It will be written for chunks the size of
+\fBverify_interval\fR, which should divide \fBblocksize\fR evenly.
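To see what a finer verification granularity means for one block, consider a block that is split into chunks of `verify_interval' bytes. Assuming each chunk carries its own header at its start, as each block does when no interval is set, the header positions follow directly from the two sizes. The C sketch below computes them. The helper name `verify_header_offsets` is hypothetical, as is the choice to reject an interval that does not divide the block size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: store the byte offset of every verification
 * header inside one block of `bs` bytes when headers are written every
 * `interval` bytes. Returns the number of headers, or -1 if the interval
 * does not evenly divide the block or `offsets` is too small.
 */
int verify_header_offsets(size_t bs, size_t interval,
			  size_t *offsets, size_t max_offsets)
{
	assert(offsets != NULL);
	if (interval == 0 || bs % interval != 0)
		return -1;

	size_t n = bs / interval;
	if (n > max_offsets)
		return -1;
	for (size_t i = 0; i < n; i++)
		offsets[i] = i * interval;	/* header at start of each chunk */
	return (int)n;
}
```

For example, a 4096\-byte block with `verify_interval=512' gets eight headers, at offsets 0, 512, ..., 3584.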
 .TP
 .BI verify_pattern \fR=\fPstr
-If set, fio will fill the io buffers with this pattern. Fio defaults to filling
-with totally random bytes, but sometimes it's interesting to fill with a known
-pattern for io verification purposes. Depending on the width of the pattern,
-fio will fill 1/2/3/4 bytes of the buffer at the time(it can be either a
-decimal or a hex number). The verify_pattern if larger than a 32-bit quantity
-has to be a hex number that starts with either "0x" or "0X". Use with
-\fBverify\fP=str. Also, verify_pattern supports %o format, which means that for
-each block offset will be written and then verified back, e.g.:
+If set, fio will fill the I/O buffers with this pattern. Fio defaults to
+filling with totally random bytes, but sometimes it's interesting to fill
+with a known pattern for I/O verification purposes. Depending on the width
+of the pattern, fio will fill 1/2/3/4 bytes of the buffer at a time (it can
+be either a decimal or a hex number). If \fBverify_pattern\fR is larger than
+a 32\-bit quantity, it has to be a hex number that starts with either "0x" or
+"0X". Use with \fBverify\fR. Also, \fBverify_pattern\fR supports the %o
+format, which means that for each block, its offset will be written and then
+verified back, e.g.:
 .RS
 .RS
-\fBverify_pattern\fR=%o
+.P
+verify_pattern=%o
 .RE
+.P
 Or use combination of everything:
-.LP
 .RS
-\fBverify_pattern\fR=0xff%o"abcd"-21
+.P
+verify_pattern=0xff%o"abcd"\-12
 .RE
 .RE
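Filling a buffer with a short fixed pattern amounts to tiling the pattern bytes across it, and checking it means comparing every byte against the corresponding pattern byte. The C sketch below shows both steps. It works on raw bytes. How fio converts a numeric pattern such as 0xff into bytes, and how it handles %o and quoted strings, is not modeled here. The names `fill_pattern` and `check_pattern` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tile `pat` (plen bytes) across buf; the last copy is truncated if needed. */
void fill_pattern(unsigned char *buf, size_t len,
		  const unsigned char *pat, size_t plen)
{
	assert(plen > 0);
	for (size_t off = 0; off < len; off += plen) {
		size_t n = len - off < plen ? len - off : plen;
		memcpy(buf + off, pat, n);
	}
}

/* Return 1 if buf still holds the tiled pattern, 0 on the first mismatch. */
int check_pattern(const unsigned char *buf, size_t len,
		  const unsigned char *pat, size_t plen)
{
	assert(plen > 0);
	for (size_t i = 0; i < len; i++)
		if (buf[i] != pat[i % plen])
			return 0;
	return 1;
}
```

With a 3\-byte pattern and a 10\-byte buffer, the buffer receives three full copies plus the first byte of a fourth. Flipping any single bit afterwards makes `check_pattern` fail.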
 .TP
 .BI verify_fatal \fR=\fPbool
-If true, exit the job on the first observed verification failure.  Default:
-false.
+Normally fio will keep checking the entire contents before quitting on a
+block verification failure. If this option is set, fio will exit the job on
+the first observed failure. Default: false.
 .TP
 .BI verify_dump \fR=\fPbool
-If set, dump the contents of both the original data block and the data block we
-read off disk to files. This allows later analysis to inspect just what kind of
-data corruption occurred. Off by default.
+If set, dump the contents of both the original data block and the data block
+we read off disk to files. This allows later analysis to inspect just what
+kind of data corruption occurred. Off by default.
 .TP
 .BI verify_async \fR=\fPint
-Fio will normally verify IO inline from the submitting thread. This option
-takes an integer describing how many async offload threads to create for IO
-verification instead, causing fio to offload the duty of verifying IO contents
-to one or more separate threads.  If using this offload option, even sync IO
-engines can benefit from using an \fBiodepth\fR setting higher than 1, as it
-allows them to have IO in flight while verifies are running.
+Fio will normally verify I/O inline from the submitting thread. This option
+takes an integer describing how many async offload threads to create for I/O
+verification instead, causing fio to offload the duty of verifying I/O
+contents to one or more separate threads. If using this offload option, even
+sync I/O engines can benefit from using an \fBiodepth\fR setting higher
+than 1, as it allows them to have I/O in flight while verifies are running.
+Defaults to 0 async threads, i.e. verification is not asynchronous.
 .TP
 .BI verify_async_cpus \fR=\fPstr
-Tell fio to set the given CPU affinity on the async IO verification threads.
-See \fBcpus_allowed\fP for the format used.
+Tell fio to set the given CPU affinity on the async I/O verification
+threads. See \fBcpus_allowed\fR for the format used.
 .TP
 .BI verify_backlog \fR=\fPint
 Fio will normally verify the written contents of a job that utilizes verify
 once that job has completed. In other words, everything is written then
 everything is read back and verified. You may want to verify continually
-instead for a variety of reasons. Fio stores the meta data associated with an
-IO block in memory, so for large verify workloads, quite a bit of memory would
-be used up holding this meta data. If this option is enabled, fio will write
-only N blocks before verifying these blocks.
+instead for a variety of reasons. Fio stores the meta data associated with
+an I/O block in memory, so for large verify workloads, quite a bit of memory
+would be used up holding this meta data. If this option is enabled, fio will
+write only N blocks before verifying these blocks.
 .TP
 .BI verify_backlog_batch \fR=\fPint
-Control how many blocks fio will verify if verify_backlog is set. If not set,
-will default to the value of \fBverify_backlog\fR (meaning the entire queue is
-read back and verified).  If \fBverify_backlog_batch\fR is less than
-\fBverify_backlog\fR then not all blocks will be verified,  if
-\fBverify_backlog_batch\fR is larger than \fBverify_backlog\fR,  some blocks
-will be verified more than once.
+Control how many blocks fio will verify if \fBverify_backlog\fR is
+set. If not set, it defaults to the value of \fBverify_backlog\fR
+(meaning the entire queue is read back and verified). If
+\fBverify_backlog_batch\fR is less than \fBverify_backlog\fR, then not all
+blocks will be verified; if \fBverify_backlog_batch\fR is larger than
+\fBverify_backlog\fR, some blocks will be verified more than once.
+.TP
+.BI verify_state_save \fR=\fPbool
+When a job exits during the write phase of a verify workload, save its
+current state. This allows fio to replay up until that point, if the verify
+state is loaded for the verify read phase. The format of the filename is,
+roughly:
+.RS
+.RS
+.P
+<type>\-<jobname>\-<jobindex>\-verify.state.
+.RE
+.P
+<type> is "local" for a local run, "sock" for a client/server socket
+connection, and "ip" (192.168.0.1, for instance) for a networked
+client/server connection. Defaults to true.
+.RE
+.TP
+.BI verify_state_load \fR=\fPbool
+If a verify termination trigger was used, fio stores the current write state
+of each thread. This can be used at verification time so that fio knows how
+far it should verify. Without this information, fio will run a full
+verification pass, according to the settings in the job file used. Default:
+false.
 .TP
 .BI trim_percentage \fR=\fPint
 Number of verify blocks to discard/trim.
 .TP
 .BI trim_verify_zero \fR=\fPbool
-Verify that trim/discarded blocks are returned as zeroes.
+Verify that trim/discarded blocks are returned as zeros.
 .TP
 .BI trim_backlog \fR=\fPint
-Trim after this number of blocks are written.
+Trim after this number of blocks have been written.
 .TP
 .BI trim_backlog_batch \fR=\fPint
-Trim this number of IO blocks.
+Trim this number of I/O blocks.
 .TP
 .BI experimental_verify \fR=\fPbool
 Enable experimental verification.
+.SS "Steady state"
 .TP
-.BI verify_state_save \fR=\fPbool
-When a job exits during the write phase of a verify workload, save its
-current state. This allows fio to replay up until that point, if the
-verify state is loaded for the verify read phase.
-.TP
-.BI verify_state_load \fR=\fPbool
-If a verify termination trigger was used, fio stores the current write
-state of each thread. This can be used at verification time so that fio
-knows how far it should verify. Without this information, fio will run
-a full verification pass, according to the settings in the job file used.
-.TP
-.B stonewall "\fR,\fP wait_for_previous"
-Wait for preceding jobs in the job file to exit before starting this one.
-\fBstonewall\fR implies \fBnew_group\fR.
-.TP
-.B new_group
-Start a new reporting group.  If not given, all jobs in a file will be part
-of the same reporting group, unless separated by a stonewall.
-.TP
-.BI stats \fR=\fPbool
-By default, fio collects and shows final output results for all jobs that run.
-If this option is set to 0, then fio will ignore it in the final stat output.
-.TP
-.BI numjobs \fR=\fPint
-Number of clones (processes/threads performing the same workload) of this job.
-Default: 1.
-.TP
-.B group_reporting
-If set, display per-group reports instead of per-job when \fBnumjobs\fR is
-specified.
-.TP
-.B thread
-Use threads created with \fBpthread_create\fR\|(3) instead of processes created
-with \fBfork\fR\|(2).
-.TP
-.BI zonesize \fR=\fPint
-Divide file into zones of the specified size in bytes.  See \fBzoneskip\fR.
-.TP
-.BI zonerange \fR=\fPint
-Give size of an IO zone.  See \fBzoneskip\fR.
-.TP
-.BI zoneskip \fR=\fPint
-Skip the specified number of bytes when \fBzonesize\fR bytes of data have been
-read.
+.BI steadystate \fR=\fPstr:float "\fR,\fP ss" \fR=\fPstr:float
+Define the criterion and limit for assessing steady state performance. The
+first parameter designates the criterion whereas the second parameter sets
+the threshold. When the criterion falls below the threshold for the
+specified duration, the job will stop. For example, `iops_slope:0.1%' will
+direct fio to terminate the job when the least squares regression slope
+falls below 0.1% of the mean IOPS. If \fBgroup_reporting\fR is enabled
+this will apply to all jobs in the group. Below is the list of available
+steady state assessment criteria. All assessments are carried out using only
+data from the rolling collection window. Threshold limits can be expressed
+as a fixed value or as a percentage of the mean in the collection window.
+.RS
+.RS
 .TP
-.BI write_iolog \fR=\fPstr
-Write the issued I/O patterns to the specified file.  Specify a separate file
-for each job, otherwise the iologs will be interspersed and the file may be
-corrupt.
+.B iops
+Collect IOPS data. Stop the job if all individual IOPS measurements
+are within the specified limit of the mean IOPS (e.g., `iops:2'
+means that all individual IOPS values must be within 2 of the mean,
+whereas `iops:0.2%' means that all individual IOPS values must be
+within 0.2% of the mean IOPS to terminate the job).
 .TP
-.BI read_iolog \fR=\fPstr
-Replay the I/O patterns contained in the specified file generated by
-\fBwrite_iolog\fR, or may be a \fBblktrace\fR binary file.
+.B iops_slope
+Collect IOPS data and calculate the least squares regression
+slope. Stop the job if the slope falls below the specified limit.
 .TP
-.BI replay_no_stall \fR=\fPbool
-While replaying I/O patterns using \fBread_iolog\fR the default behavior
-attempts to respect timing information between I/Os.  Enabling
-\fBreplay_no_stall\fR causes I/Os to be replayed as fast as possible while
-still respecting ordering.
+.B bw
+Collect bandwidth data. Stop the job if all individual bandwidth
+measurements are within the specified limit of the mean bandwidth.
 .TP
-.BI replay_redirect \fR=\fPstr
-While replaying I/O patterns using \fBread_iolog\fR the default behavior
-is to replay the IOPS onto the major/minor device that each IOP was recorded
-from.  Setting \fBreplay_redirect\fR causes all IOPS to be replayed onto the
-single specified device regardless of the device it was recorded from.
+.B bw_slope
+Collect bandwidth data and calculate the least squares regression
+slope. Stop the job if the slope falls below the specified limit.
+.RE
+.RE
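The criteria above can be expressed over the samples in the rolling window, which are collected once per second. The C sketch below fits an ordinary least-squares line to those samples for the `_slope' criteria and checks every sample against the mean for the plain `iops'/`bw' criteria. Three details are assumptions of this sketch rather than a description of fio's internals: the x values 0, 1, 2, ..., the use of the slope's absolute value, and the helper names.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double v) { return v < 0 ? -v : v; }

static double window_mean(const double *y, size_t n)
{
	double sum = 0;
	for (size_t i = 0; i < n; i++)
		sum += y[i];
	return sum / (double)n;
}

/* Least-squares slope of y[0..n-1] against x = 0, 1, ..., n-1. */
double ls_slope(const double *y, size_t n)
{
	double sx = 0, sy = 0, sxy = 0, sxx = 0;

	assert(n >= 2);
	for (size_t i = 0; i < n; i++) {
		double x = (double)i;
		sx += x;
		sy += y[i];
		sxy += x * y[i];
		sxx += x * x;
	}
	return ((double)n * sxy - sx * sy) / ((double)n * sxx - sx * sx);
}

/* iops_slope:<pct>% style check: is |slope| below pct percent of the mean? */
int slope_is_steady(const double *y, size_t n, double pct)
{
	return absd(ls_slope(y, n)) < pct / 100.0 * window_mean(y, n);
}

/* iops:<limit> style check: is every sample within limit of the mean? */
int all_within_mean(const double *y, size_t n, double limit)
{
	double mean = window_mean(y, n);

	for (size_t i = 0; i < n; i++)
		if (absd(y[i] - mean) > limit)
			return 0;
	return 1;
}
```

For the samples 100, 110, 120, 130, 140 the fitted slope is 10 IOPS per second. That is far above 0.1% of the 120 IOPS mean, so `iops_slope:0.1%' would keep the job running. A flat series has slope 0 and satisfies the criterion.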
 .TP
-.BI replay_align \fR=\fPint
-Force alignment of IO offsets and lengths in a trace to this power of 2 value.
+.BI steadystate_duration \fR=\fPtime "\fR,\fP ss_dur" \fR=\fPtime
+A rolling window of this duration will be used to judge whether steady state
+has been reached. Data will be collected once per second. The default is 0
+which disables steady state detection. When the unit is omitted, the
+value is interpreted in seconds.
 .TP
-.BI replay_scale \fR=\fPint
-Scale sector offsets down by this factor when replaying traces.
+.BI steadystate_ramp_time \fR=\fPtime "\fR,\fP ss_ramp" \fR=\fPtime
+Allow the job to run for the specified duration before beginning data
+collection for checking the steady state job termination criterion. The
+default is 0. When the unit is omitted, the value is interpreted in seconds.
+.SS "Measurements and reporting"
 .TP
 .BI per_job_logs \fR=\fPbool
 If set, this generates bw/clat/iops log with per file private filenames. If
-not set, jobs with identical names will share the log filename. Default: true.
+not set, jobs with identical names will share the log filename. Default:
+true.
+.TP
+.BI group_reporting
+It may sometimes be interesting to display statistics for groups of jobs as
+a whole instead of for each individual job. This is especially true if
+\fBnumjobs\fR is used; looking at individual thread/process output
+quickly becomes unwieldy. To see the final report per\-group instead of
+per\-job, use \fBgroup_reporting\fR. Jobs in a file will be part of the
+same reporting group, unless separated by a \fBstonewall\fR, or by
+using \fBnew_group\fR.
+.TP
+.BI new_group
+Start a new reporting group. See: \fBgroup_reporting\fR. If not given,
+all jobs in a file will be part of the same reporting group, unless
+separated by a \fBstonewall\fR.
+.TP
+.BI stats \fR=\fPbool
+By default, fio collects and shows final output results for all jobs
+that run. If this option is set to 0, then fio will ignore it in
+the final stat output.
 .TP
 .BI write_bw_log \fR=\fPstr
-If given, write a bandwidth log for this job. Can be used to store data of the
-bandwidth of the jobs in their lifetime. The included fio_generate_plots script
-uses gnuplot to turn these text files into nice graphs. See \fBwrite_lat_log\fR
-for behaviour of given filename. For this option, the postfix is _bw.x.log,
-where x is the index of the job (1..N, where N is the number of jobs). If
-\fBper_job_logs\fR is false, then the filename will not include the job index.
-See the \fBLOG FILE FORMATS\fR
-section.
+If given, write a bandwidth log for this job. Can be used to store data of
+the bandwidth of the jobs in their lifetime. The included
+\fBfio_generate_plots\fR script uses gnuplot to turn these
+text files into nice graphs. See \fBwrite_lat_log\fR for behavior of
+given filename. For this option, the postfix is `_bw.x.log', where `x'
+is the index of the job (1..N, where N is the number of jobs). If
+\fBper_job_logs\fR is false, then the filename will not include the job
+index. See \fBLOG FILE FORMATS\fR section.
 .TP
 .BI write_lat_log \fR=\fPstr
-Same as \fBwrite_bw_log\fR, but writes I/O completion latencies.  If no
-filename is given with this option, the default filename of
-"jobname_type.x.log" is used, where x is the index of the job (1..N, where
-N is the number of jobs). Even if the filename is given, fio will still
-append the type of log. If \fBper_job_logs\fR is false, then the filename will
-not include the job index. See the \fBLOG FILE FORMATS\fR section.
+Same as \fBwrite_bw_log\fR, except that this option stores I/O
+submission, completion, and total latencies instead. If no filename is given
+with this option, the default filename of `jobname_type.log' is
+used. Even if the filename is given, fio will still append the type of
+log. So if one specifies:
+.RS
+.RS
+.P
+write_lat_log=foo
+.RE
+.P
+The actual log names will be `foo_slat.x.log', `foo_clat.x.log',
+and `foo_lat.x.log', where `x' is the index of the job (1..N, where N
+is the number of jobs). This helps \fBfio_generate_plots\fR find the
+logs automatically. If \fBper_job_logs\fR is false, then the filename
+will not include the job index. See \fBLOG FILE FORMATS\fR section.
+.RE
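The naming rule described for `write_lat_log' (prefix, log type, optional job index, `.log' suffix) is straightforward to express with snprintf(). The following sketch only mirrors the scheme as documented above and does not reproduce fio's source. The function name `log_filename` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix>_<type>.<index>.log", or "<prefix>_<type>.log" when
 * per-job logs are disabled. Returns the snprintf() result.
 */
int log_filename(char *buf, size_t size, const char *prefix,
		 const char *type, int job_index, int per_job_logs)
{
	assert(buf != NULL && prefix != NULL && type != NULL);
	assert(strlen(type) > 0);	/* e.g. "slat", "clat", "lat" */

	if (per_job_logs)
		return snprintf(buf, size, "%s_%s.%d.log", prefix, type, job_index);
	return snprintf(buf, size, "%s_%s.log", prefix, type);
}
```

Calling it with the prefix "foo" and the types "slat", "clat" and "lat" for job 1 gives `foo_slat.1.log', `foo_clat.1.log' and `foo_lat.1.log'. With `per_job_logs=0' the index is dropped.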
 .TP
 .BI write_hist_log \fR=\fPstr
-Same as \fBwrite_lat_log\fR, but writes I/O completion latency histograms. If
-no filename is given with this option, the default filename of
-"jobname_clat_hist.x.log" is used, where x is the index of the job (1..N, where
-N is the number of jobs). Even if the filename is given, fio will still append
-the type of log. If \fBper_job_logs\fR is false, then the filename will not
-include the job index. See the \fBLOG FILE FORMATS\fR section.
+Same as \fBwrite_lat_log\fR, but writes I/O completion latency
+histograms. If no filename is given with this option, the default filename
+of `jobname_clat_hist.x.log' is used, where `x' is the index of the
+job (1..N, where N is the number of jobs). Even if the filename is given,
+fio will still append the type of log. If \fBper_job_logs\fR is false,
+then the filename will not include the job index. See the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI write_iops_log \fR=\fPstr
-Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given with this
-option, the default filename of "jobname_type.x.log" is used, where x is the
-index of the job (1..N, where N is the number of jobs). Even if the filename
-is given, fio will still append the type of log. If \fBper_job_logs\fR is false,
-then the filename will not include the job index. See the \fBLOG FILE FORMATS\fR
-section.
+Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given
+with this option, the default filename of `jobname_type.x.log' is
+used, where `x' is the index of the job (1..N, where N is the number of
+jobs). Even if the filename is given, fio will still append the type of
+log. If \fBper_job_logs\fR is false, then the filename will not include
+the job index. See the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
-IO that completes. When writing to the disk log, that can quickly grow to a
+I/O that completes. When writing to the disk log, that can quickly grow to a
 very large size. Setting this option makes fio average each log entry
 over the specified period of time, reducing the resolution of the log. See
-\fBlog_max_value\fR as well.  Defaults to 0, logging all entries.
-.TP
-.BI log_max_value \fR=\fPbool
-If \fBlog_avg_msec\fR is set, fio logs the average over that window. If you
-instead want to log the maximum value, set this option to 1.  Defaults to
-0, meaning that averaged values are logged.
+\fBlog_max_value\fR as well. Defaults to 0, logging all entries.
+Also see the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_hist_msec \fR=\fPint
-Same as \fBlog_avg_msec\fR, but logs entries for completion latency histograms.
-Computing latency percentiles from averages of intervals using \fBlog_avg_msec\fR
-is innacurate. Setting this option makes fio log histogram entries over the
-specified period of time, reducing log sizes for high IOPS devices while
-retaining percentile accuracy. See \fBlog_hist_coarseness\fR as well. Defaults
-to 0, meaning histogram logging is disabled.
+Same as \fBlog_avg_msec\fR, but logs entries for completion latency
+histograms. Computing latency percentiles from averages of intervals using
+\fBlog_avg_msec\fR is inaccurate. Setting this option makes fio log
+histogram entries over the specified period of time, reducing log sizes for
+high IOPS devices while retaining percentile accuracy. See
+\fBlog_hist_coarseness\fR as well. Defaults to 0, meaning histogram
+logging is disabled.
 .TP
 .BI log_hist_coarseness \fR=\fPint
-Integer ranging from 0 to 6, defining the coarseness of the resolution of the
-histogram logs enabled with \fBlog_hist_msec\fR. For each increment in
-coarseness, fio outputs half as many bins. Defaults to 0, for which histogram
-logs contain 1216 latency bins. See the \fBLOG FILE FORMATS\fR section.
+Integer ranging from 0 to 6, defining the coarseness of the resolution of
+the histogram logs enabled with \fBlog_hist_msec\fR. For each increment
+in coarseness, fio outputs half as many bins. Defaults to 0, for which
+histogram logs contain 1216 latency bins. See the \fBLOG FILE FORMATS\fR section.
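The coarseness rule is simple arithmetic: 1216 bins at 0, halved once per increment. A small sketch of that arithmetic (`hist_bins` is a hypothetical name) shows that all seven levels divide evenly, ending at 19 bins at coarseness 6.

```python
BASE_BINS = 1216  # bins at log_hist_coarseness=0, per the text above


def hist_bins(coarseness):
    """Number of histogram bins logged for a given log_hist_coarseness."""
    if not 0 <= coarseness <= 6:
        raise ValueError("log_hist_coarseness must be in 0..6")
    return BASE_BINS >> coarseness  # halve once per increment


for c in range(7):
    print(c, hist_bins(c))
```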
+.TP
+.BI log_max_value \fR=\fPbool
+If \fBlog_avg_msec\fR is set, fio logs the average over that window. If
+you instead want to log the maximum value, set this option to 1. Defaults to
+0, meaning that averaged values are logged.
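The interaction between log_avg_msec and log_max_value can be illustrated with a simplified model. This is a sketch under assumptions, not fio's implementation: samples are hypothetical (time_ms, value) pairs, windows are fixed and aligned at zero, and each entry is keyed by its window start.

```python
# Simplified model of log_avg_msec / log_max_value, not fio's implementation.
# Samples are (time_ms, value) pairs for completed I/Os.


def window_log(samples, log_avg_msec, log_max_value=False):
    """Collapse samples into one entry per log_avg_msec window."""
    if log_avg_msec == 0:
        return list(samples)  # default: one entry per completed I/O
    windows = {}
    for t, v in samples:
        windows.setdefault(t // log_avg_msec, []).append(v)
    entries = []
    for w in sorted(windows):
        vals = windows[w]
        agg = max(vals) if log_max_value else sum(vals) / len(vals)
        # Keyed by window start time; an arbitrary choice for this sketch.
        entries.append((w * log_avg_msec, agg))
    return entries


samples = [(10, 100), (400, 300), (600, 50), (900, 250)]
print(window_log(samples, 500))                      # [(0, 200.0), (500, 150.0)]
print(window_log(samples, 500, log_max_value=True))  # [(0, 300), (500, 250)]
```

Four samples become two entries either way; only the aggregate changes, which is why averaging hides short spikes that log_max_value=1 preserves.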
 .TP
 .BI log_offset \fR=\fPbool
-If this is set, the iolog options will include the byte offset for the IO
-entry as well as the other data values. Defaults to 0 meaning that offsets are
-not present in logs. See the \fBLOG FILE FORMATS\fR section.
+If this is set, the iolog options will include the byte offset for the I/O
+entry as well as the other data values. Defaults to 0 meaning that
+offsets are not present in logs. Also see the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_compression \fR=\fPint
-If this is set, fio will compress the IO logs as it goes, to keep the memory
-footprint lower. When a log reaches the specified size, that chunk is removed
-and compressed in the background. Given that IO logs are fairly highly
-compressible, this yields a nice memory savings for longer runs. The downside
-is that the compression will consume some background CPU cycles, so it may
-impact the run. This, however, is also true if the logging ends up consuming
-most of the system memory. So pick your poison. The IO logs are saved
-normally at the end of a run, by decompressing the chunks and storing them
-in the specified log file. This feature depends on the availability of zlib.
+If this is set, fio will compress the I/O logs as it goes, to keep the
+memory footprint lower. When a log reaches the specified size, that chunk is
+removed and compressed in the background. Given that I/O logs are fairly
+highly compressible, this yields nice memory savings for longer runs. The
+downside is that the compression will consume some background CPU cycles, so
+it may impact the run. This, however, is also true if the logging ends up
+consuming most of the system memory. So pick your poison. The I/O logs are
+saved normally at the end of a run, by decompressing the chunks and storing
+them in the specified log file. This feature depends on the availability of
+zlib.
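The chunk-and-compress scheme described above can be sketched with Python's zlib module. This is a toy model under stated assumptions: `CompressedLog` is a hypothetical class, compression runs inline rather than in a background thread, and the log line format is purely illustrative.

```python
import zlib


class CompressedLog:
    """Toy model of log_compression (not fio's code). Compression runs
    inline here, whereas fio does it in the background."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size  # plays the role of log_compression=int
        self.buf = bytearray()
        self.chunks = []  # compressed chunks, in order

    def append(self, line):
        self.buf += line.encode() + b"\n"
        if len(self.buf) >= self.chunk_size:
            # Chunk is full: remove it from the buffer and keep it compressed.
            self.chunks.append(zlib.compress(bytes(self.buf)))
            self.buf.clear()

    def save(self):
        # End of run: decompress chunks in order, then append the tail.
        data = b"".join(zlib.decompress(c) for c in self.chunks)
        return (data + bytes(self.buf)).decode()


log = CompressedLog(chunk_size=4096)
for i in range(1000):
    log.append(f"{i}, 4096, 1, 4096")  # illustrative entry, not fio's format
text = log.save()
print(len(log.chunks), len(text.splitlines()))
```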
 .TP
 .BI log_compression_cpus \fR=\fPstr
-Define the set of CPUs that are allowed to handle online log compression
-for the IO jobs. This can provide better isolation between performance
+Define the set of CPUs that are allowed to handle online log compression for
+the I/O jobs. This can provide better isolation between performance
 sensitive jobs, and background compression work.
 .TP
 .BI log_store_compressed \fR=\fPbool
 If set, fio will store the log files in a compressed format. They can be
-decompressed with fio, using the \fB\-\-inflate-log\fR command line parameter.
-The files will be stored with a \fB\.fz\fR suffix.
+decompressed with fio, using the \fB\-\-inflate\-log\fR command line
+parameter. The files will be stored with a `.fz' suffix.
 .TP
 .BI log_unix_epoch \fR=\fPbool
 If set, fio will log Unix timestamps to the log files produced by enabling
-\fBwrite_type_log\fR for each log type, instead of the default zero-based
+write_type_log for each log type, instead of the default zero\-based
 timestamps.
 .TP
 .BI block_error_percentiles \fR=\fPbool
-If set, record errors in trim block-sized units from writes and trims and output
-a histogram of how many trims it took to get to errors, and what kind of error
-was encountered.
+If set, record errors in trim block\-sized units from writes and trims and
+output a histogram of how many trims it took to get to errors, and what kind
+of error was encountered.
 .TP
-.BI disable_lat \fR=\fPbool
-Disable measurements of total latency numbers. Useful only for cutting
-back the number of calls to \fBgettimeofday\fR\|(2), as that does impact performance at
-really high IOPS rates.  Note that to really get rid of a large amount of these
-calls, this option must be used with disable_slat and disable_bw as well.
-.TP
-.BI disable_clat \fR=\fPbool
-Disable measurements of completion latency numbers. See \fBdisable_lat\fR.
-.TP
-.BI disable_slat \fR=\fPbool
-Disable measurements of submission latency numbers. See \fBdisable_lat\fR.
-.TP
-.BI disable_bw_measurement \fR=\fPbool
-Disable measurements of throughput/bandwidth numbers. See \fBdisable_lat\fR.
-.TP
-.BI lockmem \fR=\fPint
-Pin the specified amount of memory with \fBmlock\fR\|(2).  Can be used to
-simulate a smaller amount of memory. The amount specified is per worker.
-.TP
-.BI exec_prerun \fR=\fPstr
-Before running the job, execute the specified command with \fBsystem\fR\|(3).
-.RS
-Output is redirected in a file called \fBjobname.prerun.txt\fR
-.RE
-.TP
-.BI exec_postrun \fR=\fPstr
-Same as \fBexec_prerun\fR, but the command is executed after the job completes.
-.RS
-Output is redirected in a file called \fBjobname.postrun.txt\fR
-.RE
+.BI bwavgtime \fR=\fPint
+Average the calculated bandwidth over the given time. Value is specified in
+milliseconds. If the job also does bandwidth logging through
+\fBwrite_bw_log\fR, then the minimum of this option and
+\fBlog_avg_msec\fR will be used. Default: 500ms.
 .TP
-.BI ioscheduler \fR=\fPstr
-Attempt to switch the device hosting the file to the specified I/O scheduler.
+.BI iopsavgtime \fR=\fPint
+Average the calculated IOPS over the given time. Value is specified in
+milliseconds. If the job also does IOPS logging through
+\fBwrite_iops_log\fR, then the minimum of this option and
+\fBlog_avg_msec\fR will be used. Default: 500ms.
 .TP
 .BI disk_util \fR=\fPbool
-Generate disk utilization statistics if the platform supports it. Default: true.
-.TP
-.BI clocksource \fR=\fPstr
-Use the given clocksource as the base of timing. The supported options are:
-.RS
-.TP
-.B gettimeofday
-\fBgettimeofday\fR\|(2)
-.TP
-.B clock_gettime
-\fBclock_gettime\fR\|(2)
-.TP
-.B cpu
-Internal CPU clock source
-.TP
-.RE
-.P
-\fBcpu\fR is the preferred clocksource if it is reliable, as it is very fast
-(and fio is heavy on time calls). Fio will automatically use this clocksource
-if it's supported and considered reliable on the system it is running on,
-unless another clocksource is specifically set. For x86/x86-64 CPUs, this
-means supporting TSC Invariant.
-.TP
-.BI gtod_reduce \fR=\fPbool
-Enable all of the \fBgettimeofday\fR\|(2) reducing options (disable_clat, disable_slat,
-disable_bw) plus reduce precision of the timeout somewhat to really shrink the
-\fBgettimeofday\fR\|(2) call count. With this option enabled, we only do about 0.4% of
-the gtod() calls we would have done if all time keeping was enabled.
-.TP
-.BI gtod_cpu \fR=\fPint
-Sometimes it's cheaper to dedicate a single thread of execution to just getting
-the current time. Fio (and databases, for instance) are very intensive on
-\fBgettimeofday\fR\|(2) calls. With this option, you can set one CPU aside for doing
-nothing but logging current time to a shared memory location. Then the other
-threads/processes that run IO workloads need only copy that segment, instead of
-entering the kernel with a \fBgettimeofday\fR\|(2) call. The CPU set aside for doing
-these time calls will be excluded from other uses. Fio will manually clear it
-from the CPU mask of other jobs.
-.TP
-.BI ignore_error \fR=\fPstr
-Sometimes you want to ignore some errors during test in that case you can specify
-error list for each error type.
-.br
-ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST
-.br
-errors for given error type is separated with ':'.
-Error may be symbol ('ENOSPC', 'ENOMEM') or an integer.
-.br
-Example: ignore_error=EAGAIN,ENOSPC:122 .
-.br
-This option will ignore EAGAIN from READ, and ENOSPC and 122(EDQUOT) from WRITE.
-.TP
-.BI error_dump \fR=\fPbool
-If set dump every error even if it is non fatal, true by default. If disabled
-only fatal error will be dumped
-.TP
-.BI profile \fR=\fPstr
-Select a specific builtin performance test.
-.TP
-.BI cgroup \fR=\fPstr
-Add job to this control group. If it doesn't exist, it will be created.
-The system must have a mounted cgroup blkio mount point for this to work. If
-your system doesn't have it mounted, you can do so with:
-
-# mount \-t cgroup \-o blkio none /cgroup
-.TP
-.BI cgroup_weight \fR=\fPint
-Set the weight of the cgroup to this value. See the documentation that comes
-with the kernel, allowed values are in the range of 100..1000.
-.TP
-.BI cgroup_nodelete \fR=\fPbool
-Normally fio will delete the cgroups it has created after the job completion.
-To override this behavior and to leave cgroups around after the job completion,
-set cgroup_nodelete=1. This can be useful if one wants to inspect various
-cgroup files after job completion. Default: false
-.TP
-.BI uid \fR=\fPint
-Instead of running as the invoking user, set the user ID to this value before
-the thread/process does any work.
-.TP
-.BI gid \fR=\fPint
-Set group ID, see \fBuid\fR.
-.TP
-.BI unit_base \fR=\fPint
-Base unit for reporting.  Allowed values are:
-.RS
-.TP
-.B 0
-Use auto-detection (default).
-.TP
-.B 8
-Byte based.
-.TP
-.B 1
-Bit based.
-.RE
-.P
+Generate disk utilization statistics, if the platform supports it.
+Default: true.
 .TP
-.BI flow_id \fR=\fPint
-The ID of the flow. If not specified, it defaults to being a global flow. See
-\fBflow\fR.
+.BI disable_lat \fR=\fPbool
+Disable measurements of total latency numbers. Useful only for cutting back
+the number of calls to \fBgettimeofday\fR\|(2), as that does impact
+performance at really high IOPS rates. Note that to really get rid of a
+large number of these calls, this option must be used with
+\fBdisable_slat\fR and \fBdisable_bw_measurement\fR as well.
 .TP
-.BI flow \fR=\fPint
-Weight in token-based flow control. If this value is used, then there is a
-\fBflow counter\fR which is used to regulate the proportion of activity between
-two or more jobs. fio attempts to keep this flow counter near zero. The
-\fBflow\fR parameter stands for how much should be added or subtracted to the
-flow counter on each iteration of the main I/O loop. That is, if one job has
-\fBflow=8\fR and another job has \fBflow=-1\fR, then there will be a roughly
-1:8 ratio in how much one runs vs the other.
+.BI disable_clat \fR=\fPbool
+Disable measurements of completion latency numbers. See
+\fBdisable_lat\fR.
 .TP
-.BI flow_watermark \fR=\fPint
-The maximum value that the absolute value of the flow counter is allowed to
-reach before the job must wait for a lower value of the counter.
+.BI disable_slat \fR=\fPbool
+Disable measurements of submission latency numbers. See
+\fBdisable_lat\fR.
 .TP
-.BI flow_sleep \fR=\fPint
-The period of time, in microseconds, to wait after the flow watermark has been
-exceeded before retrying operations
+.BI disable_bw_measurement \fR=\fPbool "\fR,\fP disable_bw" \fR=\fPbool
+Disable measurements of throughput/bandwidth numbers. See
+\fBdisable_lat\fR.
 .TP
 .BI clat_percentiles \fR=\fPbool
 Enable the reporting of percentiles of completion latencies.
 .TP
 .BI percentile_list \fR=\fPfloat_list
 Overwrite the default list of percentiles for completion latencies and the
-block error histogram. Each number is a floating number in the range (0,100],
-and the maximum length of the list is 20. Use ':' to separate the
-numbers. For example, \-\-percentile_list=99.5:99.9 will cause fio to
-report the values of completion latency below which 99.5% and 99.9% of
-the observed latencies fell, respectively.
-.SS "Ioengine Parameters List"
-Some parameters are only valid when a specific ioengine is in use. These are
-used identically to normal parameters, with the caveat that when used on the
-command line, they must come after the ioengine.
+block error histogram. Each number is a floating\-point number in the range
+(0,100], and the maximum length of the list is 20. Use ':' to separate the
+numbers, and list the numbers in ascending order. For example,
+`\-\-percentile_list=99.5:99.9' will cause fio to report the values of
+completion latency below which 99.5% and 99.9% of the observed latencies
+fell, respectively.
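The constraints on percentile_list (colon-separated, each value in (0,100], at most 20 entries, ascending order) can be checked with a short validator. This is a sketch: `parse_percentile_list` is a hypothetical name, and treating a non-ascending list as an error is an assumption about how strictly the ordering rule applies.

```python
def parse_percentile_list(spec):
    """Validate a percentile_list value such as "99.5:99.9"."""
    values = [float(tok) for tok in spec.split(":")]
    if len(values) > 20:
        raise ValueError("at most 20 percentiles are allowed")
    for v in values:
        if not 0 < v <= 100:
            raise ValueError(f"{v} is outside the range (0,100]")
    if values != sorted(values):
        raise ValueError("percentiles must be in ascending order")
    return values


print(parse_percentile_list("99.5:99.9"))  # [99.5, 99.9]
```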
+.SS "Error handling"
 .TP
-.BI (cpuio)cpuload \fR=\fPint
-Attempt to use the specified percentage of CPU cycles.
-.TP
-.BI (cpuio)cpuchunks \fR=\fPint
-Split the load into cycles of the given time. In microseconds.
+.BI exitall_on_error
+When one job finishes in error, terminate the rest. The default is to wait
+for each job to finish.
 .TP
-.BI (cpuio)exit_on_io_done \fR=\fPbool
-Detect when IO threads are done, then exit.
+.BI continue_on_error \fR=\fPstr
+Normally fio will exit the job on the first observed failure. If this option
+is set, fio will continue the job when there is a 'non\-fatal error' (EIO or
+EILSEQ) until the runtime is exceeded or the I/O size specified is
+completed. If this option is used, two more stats are appended: the
+total error count and the first error. The error field given
+in the stats is the first error that was hit during the run.
+The allowed values are:
+.RS
+.RS
 .TP
-.BI (libaio)userspace_reap
-Normally, with the libaio engine in use, fio will use
-the io_getevents system call to reap newly returned events.
-With this flag turned on, the AIO ring will be read directly
-from user-space to reap events. The reaping mode is only
-enabled when polling for a minimum of 0 events (eg when
-iodepth_batch_complete=0).
+.B none
+Exit on any I/O or verify errors.
 .TP
-.BI (pvsync2)hipri
-Set RWF_HIPRI on IO, indicating to the kernel that it's of
-higher priority than normal.
+.B read
+Continue on read errors, exit on all others.
 .TP
-.BI (pvsync2)hipri_percentage
-When hipri is set this determines the probability of a pvsync2 IO being high
-priority. The default is 100%.
+.B write
+Continue on write errors, exit on all others.
 .TP
-.BI (net,netsplice)hostname \fR=\fPstr
-The host name or IP address to use for TCP or UDP based IO.
-If the job is a TCP listener or UDP reader, the hostname is not
-used and must be omitted unless it is a valid UDP multicast address.
+.B io
+Continue on any I/O error, exit on all others.
 .TP
-.BI (net,netsplice)port \fR=\fPint
-The TCP or UDP port to bind to or connect to. If this is used with
-\fBnumjobs\fR to spawn multiple instances of the same job type, then
-this will be the starting port number since fio will use a range of ports.
+.B verify
+Continue on verify errors, exit on all others.
 .TP
-.BI (net,netsplice)interface \fR=\fPstr
-The IP address of the network interface used to send or receive UDP multicast
-packets.
+.B all
+Continue on all errors.
 .TP
-.BI (net,netsplice)ttl \fR=\fPint
-Time-to-live value for outgoing UDP multicast packets. Default: 1
+.B 0
+Backward\-compatible alias for 'none'.
 .TP
-.BI (net,netsplice)nodelay \fR=\fPbool
-Set TCP_NODELAY on TCP connections.
+.B 1
+Backward\-compatible alias for 'all'.
+.RE
+.RE
 .TP
-.BI (net,netsplice)protocol \fR=\fPstr "\fR,\fP proto" \fR=\fPstr
-The network protocol to use. Accepted values are:
+.BI ignore_error \fR=\fPstr
+Sometimes you want to ignore some errors during a test. In that case you can
+specify an error list for each error type, instead of only being able to
+ignore the default 'non\-fatal error' using \fBcontinue_on_error\fR.
+The format is `ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST';
+errors for a given error type are separated with ':'. An error may be a symbol
+('ENOSPC', 'ENOMEM') or an integer. Example:
 .RS
 .RS
+.P
+ignore_error=EAGAIN,ENOSPC:122
+.RE
+.P
+This option will ignore EAGAIN from READ, and ENOSPC and 122(EDQUOT) from
+WRITE. This option works by overriding \fBcontinue_on_error\fR with
+the list of errors for each error type if any.
+.RE
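The ignore_error syntax (comma between the READ, WRITE and VERIFY lists, colon within a list, symbols or integers) can be parsed as in the sketch below. `parse_ignore_error` is a hypothetical name, symbol lookup uses Python's errno module, and errno numbers such as 122 (EDQUOT) are platform-dependent.

```python
import errno

ERROR_TYPES = ("read", "write", "verify")


def parse_ignore_error(spec):
    """Split an ignore_error value into per-type lists of errno numbers."""
    groups = spec.split(",")
    if len(groups) > len(ERROR_TYPES):
        raise ValueError("expected at most READ,WRITE,VERIFY lists")
    result = {t: [] for t in ERROR_TYPES}
    for err_type, group in zip(ERROR_TYPES, groups):
        for tok in filter(None, group.split(":")):
            if tok.isdigit():
                result[err_type].append(int(tok))
                continue
            code = getattr(errno, tok, None)  # symbolic name, e.g. ENOSPC
            if not isinstance(code, int):
                raise ValueError(f"unknown error symbol {tok!r}")
            result[err_type].append(code)
    return result


print(parse_ignore_error("EAGAIN,ENOSPC:122"))
```

For the example value, the read list holds EAGAIN, the write list holds ENOSPC and 122, and the verify list is empty.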
 .TP
-.B tcp
-Transmission control protocol
-.TP
-.B tcpv6
-Transmission control protocol V6
+.BI error_dump \fR=\fPbool
+If set, dump every error, even if it is non\-fatal. Default: true. If
+disabled, only fatal errors will be dumped.
+.SS "Running predefined workloads"
+Fio includes predefined profiles that mimic the I/O workloads generated by
+other tools.
 .TP
-.B udp
-User datagram protocol
+.BI profile \fR=\fPstr
+The predefined workload to run. Current profiles are:
+.RS
+.RS
 .TP
-.B udpv6
-User datagram protocol V6
+.B tiobench
+Threaded I/O bench (tiotest/tiobench) like workload.
 .TP
-.B unix
-UNIX domain socket
+.B act
+Aerospike Certification Tool (ACT) like workload.
+.RE
 .RE
 .P
-When the protocol is TCP or UDP, the port must also be given,
-as well as the hostname if the job is a TCP listener or UDP
-reader. For unix sockets, the normal filename option should be
-used and the port is invalid.
+To view a profile's additional options use \fB\-\-cmdhelp\fR after specifying
+the profile. For example:
+.RS
+.TP
+$ fio \-\-profile=act \-\-cmdhelp
 .RE
+.SS "Act profile options"
 .TP
-.BI (net,netsplice)listen
-For TCP network connections, tell fio to listen for incoming
-connections rather than initiating an outgoing connection. The
-hostname must be omitted if this option is used.
+.BI device\-names \fR=\fPstr
+Devices to use.
 .TP
-.BI (net,netsplice)pingpong
-Normally a network writer will just continue writing data, and a network reader
-will just consume packets. If pingpong=1 is set, a writer will send its normal
-payload to the reader, then wait for the reader to send the same payload back.
-This allows fio to measure network latencies. The submission and completion
-latencies then measure local time spent sending or receiving, and the
-completion latency measures how long it took for the other end to receive and
-send back. For UDP multicast traffic pingpong=1 should only be set for a single
-reader when multiple readers are listening to the same address.
+.BI load \fR=\fPint
+ACT load multiplier. Default: 1.
 .TP
-.BI (net,netsplice)window_size \fR=\fPint
-Set the desired socket buffer size for the connection.
+.BI test\-duration\fR=\fPtime
+How long the entire test takes to run. When the unit is omitted, the value
+is given in seconds. Default: 24h.
 .TP
-.BI (net,netsplice)mss \fR=\fPint
-Set the TCP maximum segment size (TCP_MAXSEG).
+.BI threads\-per\-queue\fR=\fPint
+Number of read I/O threads per device. Default: 8.
 .TP
-.BI (e4defrag)donorname \fR=\fPstr
-File will be used as a block donor (swap extents between files)
+.BI read\-req\-num\-512\-blocks\fR=\fPint
+Number of 512B blocks to read at a time. Default: 3.
 .TP
-.BI (e4defrag)inplace \fR=\fPint
-Configure donor file block allocation strategy
-.RS
-.BI 0(default) :
-Preallocate donor's file on init
+.BI large\-block\-op\-kbytes\fR=\fPint
+Size of large block ops in KiB (writes). Default: 131072.
 .TP
-.BI 1:
-allocate space immediately inside defragment event, and free right after event
-.RE
+.BI prep
+Set to run ACT prep phase.
+.SS "Tiobench profile options"
 .TP
-.BI (rbd)clustername \fR=\fPstr
-Specifies the name of the ceph cluster.
+.BI size\fR=\fPstr
+Size in MiB.
 .TP
-.BI (rbd)rbdname \fR=\fPstr
-Specifies the name of the RBD.
+.BI block\fR=\fPint
+Block size in bytes. Default: 4096.
 .TP
-.BI (rbd)pool \fR=\fPstr
-Specifies the name of the Ceph pool containing the RBD.
+.BI numruns\fR=\fPint
+Number of runs.
 .TP
-.BI (rbd)clientname \fR=\fPstr
-Specifies the username (without the 'client.' prefix) used to access the Ceph
-cluster. If the clustername is specified, the clientname shall be the full
-type.id string. If no type. prefix is given, fio will add 'client.' by default.
+.BI dir\fR=\fPstr
+Test directory.
 .TP
-.BI (mtd)skip_bad \fR=\fPbool
-Skip operations against known bad blocks.
+.BI threads\fR=\fPint
+Number of threads.
 .SH OUTPUT
-While running, \fBfio\fR will display the status of the created jobs.  For
-example:
-.RS
-.P
-Jobs: 1: [_r] [24.8% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s]
-.RE
+Fio spits out a lot of output. While running, fio will display the status of the
+jobs created. An example of that would be:
 .P
-The characters in the first set of brackets denote the current status of each
-threads.  The possible values are:
+.nf
+		Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
+.fi
 .P
-.PD 0
+The characters inside the first set of square brackets denote the current status of
+each thread. The first character is the first job defined in the job file, and so
+forth. The possible values (in typical life cycle order) are:
 .RS
 .TP
+.PD 0
 .B P
-Setup but not started.
+Thread setup, but not started.
 .TP
 .B C
 Thread created.
 .TP
 .B I
-Initialized, waiting.
+Thread initialized, waiting or generating necessary data.
+.TP
+.B P
+Thread running pre\-reading file(s).
+.TP
+.B /
+Thread is in ramp period.
 .TP
 .B R
 Running, doing sequential reads.
@@ -2220,96 +2727,210 @@ Running, doing mixed sequential reads/writes.
 .B m
 Running, doing mixed random reads/writes.
 .TP
+.B D
+Running, doing sequential trims.
+.TP
+.B d
+Running, doing random trims.
+.TP
 .B F
 Running, currently waiting for \fBfsync\fR\|(2).
 .TP
 .B V
-Running, verifying written data.
+Running, doing verification of written data.
+.TP
+.B f
+Thread finishing.
 .TP
 .B E
-Exited, not reaped by main thread.
+Thread exited, not reaped by main thread yet.
 .TP
 .B \-
-Exited, thread reaped.
-.RE
+Thread reaped.
+.TP
+.B X
+Thread reaped, exited with an error.
+.TP
+.B K
+Thread reaped, exited due to signal.
 .PD
+.RE
+.P
+Fio will condense the thread string so as not to take up more space on the
+command line than needed. For instance, if you have 10 readers and 10 writers running,
+the output would look like this:
+.P
+.nf
+		Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
+.fi
+.P
+Note that the status string is displayed in order, so it's possible to tell which of
+the jobs are currently doing what. In the example above this means that jobs 1\-\-10
+are readers and 11\-\-20 are writers.
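The condensed status string behaves like run-length encoding of the per-thread status characters, kept in job order. The sketch below reproduces both example strings; `condense_status` is a hypothetical name and fio's own formatting code may differ in edge cases.

```python
from itertools import groupby


def condense_status(states):
    """Run-length encode per-thread status characters, in job order."""
    return ",".join(f"{ch}({len(list(run))})" for ch, run in groupby(states))


print(condense_status("R" * 10 + "W" * 10))  # R(10),W(10)
print(condense_status("_M"))                 # _(1),M(1)
```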
 .P
-The second set of brackets shows the estimated completion percentage of
-the current group.  The third set shows the read and write I/O rate,
-respectively. Finally, the estimated run time of the job is displayed.
+The other values are fairly self\-explanatory \-\- number of threads currently
+running and doing I/O, the number of currently open files (f=), the estimated
+completion percentage, the rate of I/O since last check (read speed listed first,
+then write speed and optionally trim speed) in terms of bandwidth and IOPS,
+and time to completion for the current running group. It's impossible to estimate
+runtime of the following groups (if any).
 .P
-When \fBfio\fR completes (or is interrupted by Ctrl-C), it will show data
-for each thread, each group of threads, and each disk, in that order.
+When fio is done (or interrupted by Ctrl\-C), it will show the data for
+each thread, group of threads, and disks in that order. For each overall thread (or
+group) the output looks like:
 .P
-Per-thread statistics first show the threads client number, group-id, and
-error code.  The remaining figures are as follows:
+.nf
+		Client1: (groupid=0, jobs=1): err= 0: pid=16109: Sat Jun 24 12:07:54 2017
+		  write: IOPS=88, BW=623KiB/s (638kB/s)(30.4MiB/50032msec)
+		    slat (nsec): min=500, max=145500, avg=8318.00, stdev=4781.50
+		    clat (usec): min=170, max=78367, avg=4019.02, stdev=8293.31
+		     lat (usec): min=174, max=78375, avg=4027.34, stdev=8291.79
+		    clat percentiles (usec):
+		     |  1.00th=[  302],  5.00th=[  326], 10.00th=[  343], 20.00th=[  363],
+		     | 30.00th=[  392], 40.00th=[  404], 50.00th=[  416], 60.00th=[  445],
+		     | 70.00th=[  816], 80.00th=[ 6718], 90.00th=[12911], 95.00th=[21627],
+		     | 99.00th=[43779], 99.50th=[51643], 99.90th=[68682], 99.95th=[72877],
+		     | 99.99th=[78119]
+		   bw (  KiB/s): min=  532, max=  686, per=0.10%, avg=622.87, stdev=24.82, samples=  100
+		   iops        : min=   76, max=   98, avg=88.98, stdev= 3.54, samples=  100
+		  lat (usec)   : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
+		  lat (msec)   : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
+		  lat (msec)   : 100=0.65%
+		  cpu          : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
+		  IO depths    : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
+		     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+		     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+		     issued rwt: total=0,4450,0, short=0,0,0, dropped=0,0,0
+		     latency   : target=0, window=0, percentile=100.00%, depth=8
+.fi
+.P
+The job name (or first job's name when using \fBgroup_reporting\fR) is printed,
+along with the group id, count of jobs being aggregated, last error id seen (which
+is 0 when there are no errors), pid/tid of that thread and the time the job/group
+completed. Below are the I/O statistics for each data direction performed (showing
+writes in the example above). In the order listed, they denote:
 .RS
 .TP
-.B io
-Number of megabytes of I/O performed.
-.TP
-.B bw
-Average data rate (bandwidth).
-.TP
-.B runt
-Threads run time.
+.B read/write/trim
+The string before the colon shows the I/O direction the statistics
+are for. \fIIOPS\fR is the average I/Os performed per second. \fIBW\fR
+is the average bandwidth rate shown as: value in power of 2 format
+(value in power of 10 format). The last two values show: (total
+I/O performed in power of 2 format / \fIruntime\fR of that thread).
 .TP
 .B slat
-Submission latency minimum, maximum, average and standard deviation. This is
-the time it took to submit the I/O.
+Submission latency (\fImin\fR being the minimum, \fImax\fR being the
+maximum, \fIavg\fR being the average, \fIstdev\fR being the standard
+deviation). This is the time it took to submit the I/O. For
+sync I/O this row is not displayed as the slat is really the
+completion latency (since queue/complete is one operation there).
+This value can be in nanoseconds, microseconds or milliseconds \-\-\-
+fio will choose the most appropriate base and print that (in the
+example above nanoseconds was the best scale). Note: in \fB\-\-minimal\fR mode
+latencies are always expressed in microseconds.
 .TP
 .B clat
-Completion latency minimum, maximum, average and standard deviation.  This
-is the time between submission and completion.
+Completion latency. Same names as slat, this denotes the time from
+submission to completion of the I/O pieces. For sync I/O, clat will
+usually be equal (or very close) to 0, as the time from submit to
+complete is basically just CPU time (I/O has already been done, see slat
+explanation).
+.TP
+.B lat
+Total latency. Same names as slat and clat, this denotes the time from
+when fio created the I/O unit to completion of the I/O operation.
 .TP
 .B bw
-Bandwidth minimum, maximum, percentage of aggregate bandwidth received, average
-and standard deviation.
+Bandwidth statistics based on samples. Same names as the xlat stats,
+but also includes the number of samples taken (\fIsamples\fR) and an
+approximate percentage of total aggregate bandwidth this thread
+received in its group (\fIper\fR). This last value is only really
+useful if the threads in this group are on the same disk, since they
+are then competing for disk access.
+.TP
+.B iops
+IOPS statistics based on samples. Same names as \fBbw\fR.
+.TP
+.B lat (nsec/usec/msec)
+The distribution of I/O completion latencies. This is the time from when
+I/O leaves fio to when it gets completed. Unlike the separate
+read/write/trim sections above, the data here and in the remaining
+sections apply to all I/Os for the reporting group. 250=0.04% means that
+0.04% of the I/Os completed in under 250us. 500=64.11% means that 64.11%
+of the I/Os required 250 to 499us for completion.
 .TP
 .B cpu
-CPU usage statistics. Includes user and system time, number of context switches
-this thread went through and number of major and minor page faults. The CPU
-utilization numbers are averages for the jobs in that reporting group, while
-the context and fault counters are summed.
+CPU usage. User and system time, along with the number of context
+switches this thread went through and the number of major and minor page
+faults. The CPU utilization
+numbers are averages for the jobs in that reporting group, while the
+context and fault counters are summed.
 .TP
 .B IO depths
-Distribution of I/O depths.  Each depth includes everything less than (or equal)
-to it, but greater than the previous depth.
+The distribution of I/O depths over the job lifetime. The numbers are
+divided into powers of 2 and each entry covers depths from that value
+up to those that are lower than the next entry \-\- e.g., 16= covers
+depths from 16 to 31. Note that the range covered by a depth
+distribution entry can be different to the range covered by the
+equivalent \fBsubmit\fR/\fBcomplete\fR distribution entry.
+.TP
+.B IO submit
+How many pieces of I/O were submitted in a single submit call. Each
+entry denotes that amount and below, until the previous entry \-\- e.g.,
+16=100% means that we submitted anywhere between 9 and 16 I/Os per submit
+call. Note that the range covered by a \fBsubmit\fR distribution entry can
+be different to the range covered by the equivalent depth distribution
+entry.
 .TP
-.B IO issued
-Number of read/write requests issued, and number of short read/write requests.
+.B IO complete
+Like the above \fBsubmit\fR number, but for completions instead.
 .TP
-.B IO latencies
-Distribution of I/O completion latencies.  The numbers follow the same pattern
-as \fBIO depths\fR.
+.B IO issued rwt
+The number of \fBread/write/trim\fR requests issued, and how many of them were
+short or dropped.
+.TP
+.B IO latency
+These values are for \fBlatency-target\fR and related options. When
+these options are engaged, this section describes the I/O depth required
+to meet the specified latency target.
 .RE
 .P
-The group statistics show:
-.PD 0
+After each client has been listed, the group statistics are printed. They
+will look like this:
+.P
+.nf
+		Run status group 0 (all jobs):
+		   READ: bw=20.9MiB/s (21.9MB/s), 10.4MiB/s\-10.8MiB/s (10.9MB/s\-11.3MB/s), io=64.0MiB (67.1MB), run=2973\-3069msec
+		  WRITE: bw=1231KiB/s (1261kB/s), 616KiB/s\-621KiB/s (630kB/s\-636kB/s), io=64.0MiB (67.1MB), run=52747\-53223msec
+.fi
+.P
+For each data direction it prints:
 .RS
 .TP
-.B io
-Number of megabytes I/O performed.
-.TP
-.B aggrb
-Aggregate bandwidth of threads in the group.
-.TP
-.B minb
-Minimum average bandwidth a thread saw.
-.TP
-.B maxb
-Maximum average bandwidth a thread saw.
+.B bw
+Aggregate bandwidth of threads in this group followed by the
+minimum and maximum bandwidth of all the threads in this group.
+Values outside of brackets are power\-of\-2 format and those
+within are the equivalent value in a power\-of\-10 format.
 .TP
-.B mint
-Shortest runtime of threads in the group.
+.B io
+Aggregate I/O performed of all threads in this group. The
+format is the same as \fBbw\fR.
 .TP
-.B maxt
-Longest runtime of threads in the group.
+.B run
+The shortest and longest runtimes of the threads in this group.
 .RE
-.PD
 .P
-Finally, disk statistics are printed with reads first:
-.PD 0
+And finally, the disk statistics are printed. This is Linux specific.
+They will look like this:
+.P
+.nf
+		  Disk stats (read/write):
+		    sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
+.fi
+.P
+Each value is printed for both reads and writes, with reads first. The
+numbers denote:
 .RS
 .TP
 .B ios
@@ -2321,517 +2942,538 @@ Number of merges performed by the I/O scheduler.
 .B ticks
 Number of ticks we kept the disk busy.
 .TP
-.B io_queue
+.B in_queue
 Total time spent in the disk queue.
 .TP
 .B util
-Disk utilization.
+The disk utilization. A value of 100% means we kept the disk
+busy constantly, 50% would be a disk idling half of the time.
 .RE
-.PD
 .P
-It is also possible to get fio to dump the current output while it is
-running, without terminating the job. To do that, send fio the \fBUSR1\fR
-signal.
+It is also possible to get fio to dump the current output while it is running,
+without terminating the job. To do that, send fio the USR1 signal. You can
+also get regularly timed dumps by using the \fB\-\-status\-interval\fR
+parameter, or by creating a file in `/tmp' named
+`fio\-dump\-status'. If fio sees this file, it will unlink it and dump the
+current output status.
 .SH TERSE OUTPUT
-If the \fB\-\-minimal\fR / \fB\-\-append-terse\fR options are given, the
-results will be printed/appended in a semicolon-delimited format suitable for
-scripted use.
-A job description (if provided) follows on a new line.  Note that the first
-number in the line is the version number. If the output has to be changed
-for some reason, this number will be incremented by 1 to signify that
-change. Numbers in brackets (e.g. "[v3]") indicate which terse version
-introduced a field. The fields are:
+For scripted usage where you typically want to generate tables or graphs of the
+results, fio can output the results in a semicolon separated format. The format
+is one long line of values, such as:
 .P
-.RS
-.B terse version, fio version [v3], jobname, groupid, error
+.nf
+		2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
+		A description of this job goes here.
+.fi
 .P
-Read status:
-.RS
-.B Total I/O \fR(KiB)\fP, bandwidth \fR(KiB/s)\fP, IOPS, runtime \fR(ms)\fP
+The job description (if provided) follows on a second line.
 .P
-Submission latency:
-.RS
-.B min, max, mean, standard deviation
-.RE
-Completion latency:
-.RS
-.B min, max, mean, standard deviation
-.RE
-Completion latency percentiles (20 fields):
-.RS
-.B Xth percentile=usec
-.RE
-Total latency:
-.RS
-.B min, max, mean, standard deviation
-.RE
-Bandwidth:
-.RS
-.B min, max, aggregate percentage of total, mean, standard deviation, number of samples [v5]
-.RE
-IOPS [v5]:
-.RS
-.B min, max, mean, standard deviation, number of samples
-.RE
-.RE
+To enable terse output, use the \fB\-\-minimal\fR or
+`\-\-output\-format=terse' command line options. The
+first value is the version of the terse output format. If the output has to be
+changed for some reason, this number will be incremented by 1 to signify that
+change.
 .P
-Write status:
-.RS
-.B Total I/O \fR(KiB)\fP, bandwidth \fR(KiB/s)\fP, IOPS, runtime \fR(ms)\fP
+Split up, the format is as follows (comments in brackets denote when a
+field was introduced or whether it's specific to some terse version):
 .P
-Submission latency:
-.RS
-.B min, max, mean, standard deviation
-.RE
-Completion latency:
-.RS
-.B min, max, mean, standard deviation
-.RE
-Completion latency percentiles (20 fields):
-.RS
-.B Xth percentile=usec
-.RE
-Total latency:
+.nf
+			terse version, fio version [v3], jobname, groupid, error
+.fi
 .RS
-.B min, max, mean, standard deviation
+.P
+.B
+READ status:
 .RE
-Bandwidth:
+.P
+.nf
+			Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
+			Submission latency: min, max, mean, stdev (usec)
+			Completion latency: min, max, mean, stdev (usec)
+			Completion latency percentiles: 20 fields (see below)
+			Total latency: min, max, mean, stdev (usec)
+			Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+			IOPS [v5]: min, max, mean, stdev, number of samples
+.fi
 .RS
-.B min, max, aggregate percentage of total, mean, standard deviation, number of samples [v5]
+.P
+.B
+WRITE status:
 .RE
-IOPS [v5]:
+.P
+.nf
+			Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
+			Submission latency: min, max, mean, stdev (usec)
+			Completion latency: min, max, mean, stdev (usec)
+			Completion latency percentiles: 20 fields (see below)
+			Total latency: min, max, mean, stdev (usec)
+			Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+			IOPS [v5]: min, max, mean, stdev, number of samples
+.fi
 .RS
-.B min, max, mean, standard deviation, number of samples
-.RE
+.P
+.B
+TRIM status [all but version 3]:
 .RE
 .P
-Trim status [all but version 3]:
+.nf
+			Fields are similar to \fBREAD/WRITE\fR status.
+.fi
 .RS
-Similar to Read/Write status but for trims.
-.RE
 .P
+.B
 CPU usage:
-.RS
-.B user, system, context switches, major page faults, minor page faults
 .RE
 .P
-IO depth distribution:
+.nf
+			user, system, context switches, major faults, minor faults
+.fi
 .RS
-.B <=1, 2, 4, 8, 16, 32, >=64
+.P
+.B
+I/O depths:
 .RE
 .P
-IO latency distribution:
-.RS
-Microseconds:
+.nf
+			<=1, 2, 4, 8, 16, 32, >=64
+.fi
 .RS
-.B <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
+.P
+.B
+I/O latencies microseconds:
 .RE
-Milliseconds:
+.P
+.nf
+			<=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
+.fi
 .RS
-.B <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
-.RE
+.P
+.B
+I/O latencies milliseconds:
 .RE
 .P
-Disk utilization (1 for each disk used) [v3]:
+.nf
+			<=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
+.fi
 .RS
-.B name, read ios, write ios, read merges, write merges, read ticks, write ticks, read in-queue time, write in-queue time, disk utilization percentage
+.P
+.B
+Disk utilization [v3]:
 .RE
 .P
-Error Info (dependent on continue_on_error, default off):
+.nf
+			disk name, read ios, write ios, read merges, write merges, read ticks, write ticks, time spent in queue, disk utilization percentage
+.fi
 .RS
-.B total # errors, first error code
-.RE
 .P
-.B text description (if provided in config - appears on newline)
+.B
+Additional Info (dependent on continue_on_error, default off):
 .RE
 .P
-Below is a single line containing short names for each of the fields in
-the minimal output v3, separated by semicolons:
+.nf
+			total # errors, first error code
+.fi
 .RS
 .P
+.B
+Additional Info (dependent on description being set):
+.RE
+.P
 .nf
-terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;
 write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+			Text description
+.fi
+.P
+Completion latency percentiles can be a grouping of up to 20 sets, so for the
+terse output fio writes all of them. Each field will look like this:
+.P
+.nf
+		1.00%=6112
+.fi
+.P
+which is the Xth percentile, and the `usec' latency associated with it.
+.P
+For \fBDisk utilization\fR, all disks used by fio are shown. So for each disk there
+will be a disk utilization section.
+.P
+Below is a single line containing short names for each of the fields in the
+minimal output v3, separated by semicolons:
+.P
+.nf
+		terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 .fi
-.RE
 .SH JSON+ OUTPUT
 The \fBjson+\fR output format is identical to the \fBjson\fR output format except that it
 adds a full dump of the completion latency bins. Each \fBbins\fR object contains a
 set of (key, value) pairs where keys are latency durations and values count how
 many I/Os had completion latencies of the corresponding duration. For example,
 consider:
-
 .RS
+.P
 "bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1, "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" : 534, "105984" : 5995, "107008" : 7529, ... }
 .RE
-
+.P
 This data indicates that one I/O required 87,552ns to complete, two I/Os required
 100,864ns to complete, and 7529 I/Os required 107,008ns to complete.
-
+.P
 Also included with fio is a Python script \fBfio_jsonplus_clat2csv\fR that takes
-json+ output and generates CSV-formatted latency data suitable for plotting.
-
+json+ output and generates CSV\-formatted latency data suitable for plotting.
+.P
 The latency durations actually represent the midpoints of latency intervals.
-For details refer to stat.h.
-
-
+For details refer to `stat.h' in the fio source.
 .SH TRACE FILE FORMAT
-There are two trace file format that you can encounter. The older (v1) format
-is unsupported since version 1.20-rc3 (March 2008). It will still be described
+There are two trace file formats that you can encounter. The older (v1) format has
+been unsupported since version 1.20\-rc3 (March 2008). It will still be described
+below in case you get an old trace and want to understand it.
-
-In any case the trace is a simple text file with a single action per line.
-
 .P
+In any case the trace is a simple text file with a single action per line.
+.TP
 .B Trace file format v1
+Each line represents a single I/O action in the following format:
 .RS
-Each line represents a single io action in the following format:
-
+.RS
+.P
 rw, offset, length
-
-where rw=0/1 for read/write, and the offset and length entries being in bytes.
-
-This format is not supported in Fio versions => 1.20-rc3.
-
 .RE
 .P
+where `rw' is 0 for read and 1 for write, and the `offset' and `length' entries are in bytes.
+.P
+This format is not supported in fio versions >= 1.20\-rc3.
+.RE
+.TP
 .B Trace file format v2
+The second version of the trace file format was added in fio version 1.17. It
+allows access to more than one file per trace and has a bigger set of possible
+file actions.
 .RS
-The second version of the trace file format was added in Fio version 1.17.
-It allows one to access more then one file per trace and has a bigger set of
-possible file actions.
-
+.P
 The first line of the trace file has to be:
-
-\fBfio version 2 iolog\fR
-
+.RS
+.P
+"fio version 2 iolog"
+.RE
+.P
 Following this can be lines in two different formats, which are described below.
+.P
+.B
 The file management format:
-
-\fBfilename action\fR
-
-The filename is given as an absolute path. The action can be one of these:
-
+.RS
+filename action
 .P
-.PD 0
+The `filename' is given as an absolute path. The `action' can be one of these:
 .RS
 .TP
 .B add
-Add the given filename to the trace
+Add the given `filename' to the trace.
 .TP
 .B open
-Open the file with the given filename. The filename has to have been previously
-added with the \fBadd\fR action.
+Open the file with the given `filename'. The `filename' has to have
+been added with the \fBadd\fR action before.
 .TP
 .B close
-Close the file with the given filename. The file must have previously been
-opened.
+Close the file with the given `filename'. The file has to have been
+\fBopen\fRed before.
+.RE
 .RE
-.PD
 .P
-
-The file io action format:
-
-\fBfilename action offset length\fR
-
-The filename is given as an absolute path, and has to have been added and opened
-before it can be used with this format. The offset and length are given in
-bytes. The action can be one of these:
-
+.B
+The file I/O action format:
+.RS
+filename action offset length
 .P
-.PD 0
+The `filename' is given as an absolute path, and has to have been \fBadd\fRed and
+\fBopen\fRed before it can be used with this format. The `offset' and `length' are
+given in bytes. The `action' can be one of these:
 .RS
 .TP
 .B wait
-Wait for 'offset' microseconds. Everything below 100 is discarded.  The time is
-relative to the previous wait statement.
+Wait for `offset' microseconds. Everything below 100 is discarded.
+The time is relative to the previous `wait' statement.
 .TP
 .B read
-Read \fBlength\fR bytes beginning from \fBoffset\fR
+Read `length' bytes beginning from `offset'.
 .TP
 .B write
-Write \fBlength\fR bytes beginning from \fBoffset\fR
+Write `length' bytes beginning from `offset'.
 .TP
 .B sync
-fsync() the file
+\fBfsync\fR\|(2) the file.
 .TP
 .B datasync
-fdatasync() the file
+\fBfdatasync\fR\|(2) the file.
 .TP
 .B trim
-trim the given file from the given \fBoffset\fR for \fBlength\fR bytes
+Trim the given file from the given `offset' for `length' bytes.
+.RE
 .RE
-.PD
-.P
-
 .SH CPU IDLENESS PROFILING
-In some cases, we want to understand CPU overhead in a test. For example,
-we test patches for the specific goodness of whether they reduce CPU usage.
-fio implements a balloon approach to create a thread per CPU that runs at
-idle priority, meaning that it only runs when nobody else needs the cpu.
-By measuring the amount of work completed by the thread, idleness of each
-CPU can be derived accordingly.
-
-An unit work is defined as touching a full page of unsigned characters. Mean
-and standard deviation of time to complete an unit work is reported in "unit
-work" section. Options can be chosen to report detailed percpu idleness or
-overall system idleness by aggregating percpu stats.
-
+In some cases, we want to understand CPU overhead in a test. For example, we
+test patches to determine whether they reduce CPU usage.
+Fio implements a balloon approach to create a thread per CPU that runs at idle
+priority, meaning that it only runs when nobody else needs the CPU.
+By measuring the amount of work completed by the thread, idleness of each CPU
+can be derived accordingly.
+.P
+A unit of work is defined as touching a full page of unsigned characters. The mean
+and standard deviation of the time to complete a unit of work are reported in the
+"unit work" section. Options can be chosen to report detailed percpu idleness or
+overall system idleness by aggregating percpu stats.
 .SH VERIFICATION AND TRIGGERS
-Fio is usually run in one of two ways, when data verification is done. The
-first is a normal write job of some sort with verify enabled. When the
-write phase has completed, fio switches to reads and verifies everything
-it wrote. The second model is running just the write phase, and then later
-on running the same job (but with reads instead of writes) to repeat the
-same IO patterns and verify the contents. Both of these methods depend
-on the write phase being completed, as fio otherwise has no idea how much
-data was written.
-
-With verification triggers, fio supports dumping the current write state
-to local files. Then a subsequent read verify workload can load this state
-and know exactly where to stop. This is useful for testing cases where
-power is cut to a server in a managed fashion, for instance.
-
+When data verification is done, fio is usually run in one of two ways. The first
+is a normal write job of some sort with verify enabled. When the write phase has
+completed, fio switches to reads and verifies everything it wrote. The second
+model is running just the write phase, and then later on running the same job
+(but with reads instead of writes) to repeat the same I/O patterns and verify
+the contents. Both of these methods depend on the write phase being completed,
+as fio otherwise has no idea how much data was written.
+.P
+With verification triggers, fio supports dumping the current write state to
+local files. Then a subsequent read verify workload can load this state and know
+exactly where to stop. This is useful for testing cases where power is cut to a
+server in a managed fashion, for instance.
+.P
 A verification trigger consists of two things:
-
 .RS
-Storing the write state of each job
-.LP
-Executing a trigger command
+.P
+1) Storing the write state of each job.
+.P
+2) Executing a trigger command.
 .RE
-
-The write state is relatively small, on the order of hundreds of bytes
-to single kilobytes. It contains information on the number of completions
-done, the last X completions, etc.
-
-A trigger is invoked either through creation (\fBtouch\fR) of a specified
-file in the system, or through a timeout setting. If fio is run with
-\fB\-\-trigger\-file=/tmp/trigger-file\fR, then it will continually check for
-the existence of /tmp/trigger-file. When it sees this file, it will
-fire off the trigger (thus saving state, and executing the trigger
+.P
+The write state is relatively small, on the order of hundreds of bytes to single
+kilobytes. It contains information on the number of completions done, the last X
+completions, etc.
+.P
+A trigger is invoked either through creation ('touch') of a specified file in
+the system, or through a timeout setting. If fio is run with
+`\-\-trigger\-file=/tmp/trigger\-file', then it will continually
+check for the existence of `/tmp/trigger\-file'. When it sees this file, it
+will fire off the trigger (thus saving state, and executing the trigger
 command).
-
-For client/server runs, there's both a local and remote trigger. If
-fio is running as a server backend, it will send the job states back
-to the client for safe storage, then execute the remote trigger, if
-specified. If a local trigger is specified, the server will still send
-back the write state, but the client will then execute the trigger.
-
+.P
+For client/server runs, there's both a local and remote trigger. If fio is
+running as a server backend, it will send the job states back to the client for
+safe storage, then execute the remote trigger, if specified. If a local trigger
+is specified, the server will still send back the write state, but the client
+will then execute the trigger.
 .RE
 .P
 .B Verification trigger example
 .RS
-
-Lets say we want to run a powercut test on the remote machine 'server'.
-Our write workload is in write-test.fio. We want to cut power to 'server'
-at some point during the run, and we'll run this test from the safety
-or our local machine, 'localbox'. On the server, we'll start the fio
-backend normally:
-
-server# \fBfio \-\-server\fR
-
+Let's say we want to run a powercut test on the remote Linux machine 'server'.
+Our write workload is in `write\-test.fio'. We want to cut power to 'server' at
+some point during the run, and we'll run this test from the safety of our local
+machine, 'localbox'. On the server, we'll start the fio backend normally:
+.RS
+.P
+server# fio \-\-server
+.RE
+.P
 and on the client, we'll fire off the workload:
-
-localbox$ \fBfio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger-remote="bash \-c "echo b > /proc/sysrq-triger""\fR
-
-We set \fB/tmp/my-trigger\fR as the trigger file, and we tell fio to execute
-
-\fBecho b > /proc/sysrq-trigger\fR
-
-on the server once it has received the trigger and sent us the write
-state. This will work, but it's not \fIreally\fR cutting power to the server,
-it's merely abruptly rebooting it. If we have a remote way of cutting
-power to the server through IPMI or similar, we could do that through
-a local trigger command instead. Lets assume we have a script that does
-IPMI reboot of a given hostname, ipmi-reboot. On localbox, we could
-then have run fio with a local trigger instead:
-
-localbox$ \fBfio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger="ipmi-reboot server"\fR
-
-For this case, fio would wait for the server to send us the write state,
-then execute 'ipmi-reboot server' when that happened.
-
+.RS
+.P
+localbox$ fio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger\-remote="bash \-c 'echo b > /proc/sysrq\-trigger'"
+.RE
+.P
+We set `/tmp/my\-trigger' as the trigger file, and we tell fio to execute:
+.RS
+.P
+echo b > /proc/sysrq\-trigger
+.RE
+.P
+on the server once it has received the trigger and sent us the write state. This
+will work, but it's not really cutting power to the server; it's merely
+abruptly rebooting it. If we have a remote way of cutting power to the server
+through IPMI or similar, we could do that through a local trigger command
+instead. Let's assume we have a script that does IPMI reboot of a given hostname,
+ipmi\-reboot. On localbox, we could then have run fio with a local trigger
+instead:
+.RS
+.P
+localbox$ fio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger="ipmi\-reboot server"
+.RE
+.P
+For this case, fio would wait for the server to send us the write state, then
+execute `ipmi\-reboot server' when that happened.
 .RE
 .P
 .B Loading verify state
 .RS
-To load store write state, read verification job file must contain
-the verify_state_load option. If that is set, fio will load the previously
+To load stored write state, a read verification job file must contain the
+\fBverify_state_load\fR option. If that is set, fio will load the previously
 stored state. For a local fio run this is done by loading the files directly,
-and on a client/server run, the server backend will ask the client to send
-the files over and load them from there.
-
+and on a client/server run, the server backend will ask the client to send the
+files over and load them from there.
 .RE
-
 .SH LOG FILE FORMATS
-
 Fio supports a variety of log file formats, for logging latencies, bandwidth,
 and IOPS. The logs share a common format, which looks like this:
-
-.B time (msec), value, data direction, block size (bytes), offset (bytes)
-
-Time for the log entry is always in milliseconds. The value logged depends
-on the type of log, it will be one of the following:
-
+.RS
 .P
-.PD 0
+time (msec), value, data direction, block size (bytes), offset (bytes)
+.RE
+.P
+`Time' for the log entry is always in milliseconds. The `value' logged depends
+on the type of log; it will be one of the following:
+.RS
 .TP
 .B Latency log
-Value is in latency in usecs
+Value is latency in usecs
 .TP
 .B Bandwidth log
 Value is in KiB/sec
 .TP
 .B IOPS log
-Value is in IOPS
-.PD
-.P
-
-Data direction is one of the following:
-
+Value is IOPS
+.RE
 .P
-.PD 0
+`Data direction' is one of the following:
+.RS
 .TP
 .B 0
-IO is a READ
+I/O is a READ
 .TP
 .B 1
-IO is a WRITE
+I/O is a WRITE
 .TP
 .B 2
-IO is a TRIM
-.PD
-.P
-
-The entry's *block size* is always in bytes. The \fIoffset\fR is the offset, in
-bytes, from the start of the file, for that particular IO. The logging of the
-offset can be toggled with \fBlog_offset\fR.
-
-If windowed logging is enabled through \fBlog_avg_msec\fR, then fio doesn't log
-individual IOs. Instead of logs the average values over the specified
-period of time. Since \fIdata direction\fR, \fIblock size\fR and \fIoffset\fR
-are per-IO values, if windowed logging is enabled they aren't applicable and
-will be 0. If windowed logging is enabled and \fBlog_max_value\fR is set, then
-fio logs maximum values in that window instead of averages.
-
-For histogram logging the logs look like this:
-
-.B time (msec), data direction, block-size, bin 0, bin 1, ..., bin 1215
-
-Where 'bin i' gives the frequency of IO requests with a latency falling in
-the i-th bin. See \fBlog_hist_coarseness\fR for logging fewer bins.
-
+I/O is a TRIM
 .RE
-
+.P
+The entry's `block size' is always in bytes. The `offset' is the offset, in bytes,
+from the start of the file, for that particular I/O. The logging of the offset can be
+toggled with \fBlog_offset\fR.
+.P
+Fio defaults to logging every individual I/O. When IOPS are logged for individual
+I/Os the `value' entry will always be 1. If windowed logging is enabled through
+\fBlog_avg_msec\fR, fio logs the average values over the specified period of time.
+If windowed logging is enabled and \fBlog_max_value\fR is set, then fio logs
+maximum values in that window instead of averages. Since `data direction', `block size'
+and `offset' are per\-I/O values, if windowed logging is enabled they
+aren't applicable and will be 0.
 .SH CLIENT / SERVER
-Normally you would run fio as a stand-alone application on the machine
-where the IO workload should be generated. However, it is also possible to
-run the frontend and backend of fio separately. This makes it possible to
-have a fio server running on the machine(s) where the IO workload should
-be running, while controlling it from another machine.
-
-To start the server, you would do:
-
-\fBfio \-\-server=args\fR
-
-on that machine, where args defines what fio listens to. The arguments
-are of the form 'type:hostname or IP:port'. 'type' is either 'ip' (or ip4)
-for TCP/IP v4, 'ip6' for TCP/IP v6, or 'sock' for a local unix domain
-socket. 'hostname' is either a hostname or IP address, and 'port' is the port to
-listen to (only valid for TCP/IP, not a local socket). Some examples:
-
+Normally fio is invoked as a stand\-alone application on the machine where the
+I/O workload should be generated. However, the backend and frontend of fio can
+be run separately, i.e., the fio server can generate an I/O workload on the "Device
+Under Test" while being controlled by a client on another machine.
+.P
+Start the server on the machine which has access to the storage DUT:
+.RS
+.P
+$ fio \-\-server=args
+.RE
+.P
+where `args' defines what fio listens to. The arguments are of the form
+`type:hostname,port'. `type' is either `ip' (or ip4) for TCP/IP
+v4, `ip6' for TCP/IP v6, or `sock' for a local unix domain socket.
+`hostname' is either a hostname or IP address, and `port' is the port to listen
+to (only valid for TCP/IP, not a local socket). Some examples:
+.RS
+.TP
 1) \fBfio \-\-server\fR
-
-   Start a fio server, listening on all interfaces on the default port (8765).
-
+Start a fio server, listening on all interfaces on the default port (8765).
+.TP
 2) \fBfio \-\-server=ip:hostname,4444\fR
-
-   Start a fio server, listening on IP belonging to hostname and on port 4444.
-
+Start a fio server, listening on IP belonging to hostname and on port 4444.
+.TP
 3) \fBfio \-\-server=ip6:::1,4444\fR
-
-   Start a fio server, listening on IPv6 localhost ::1 and on port 4444.
-
+Start a fio server, listening on IPv6 localhost ::1 and on port 4444.
+.TP
 4) \fBfio \-\-server=,4444\fR
-
-   Start a fio server, listening on all interfaces on port 4444.
-
+Start a fio server, listening on all interfaces on port 4444.
+.TP
 5) \fBfio \-\-server=1.2.3.4\fR
-
-   Start a fio server, listening on IP 1.2.3.4 on the default port.
-
+Start a fio server, listening on IP 1.2.3.4 on the default port.
+.TP
 6) \fBfio \-\-server=sock:/tmp/fio.sock\fR
-
-   Start a fio server, listening on the local socket /tmp/fio.sock.
-
-When a server is running, you can connect to it from a client. The client
-is run with:
-
-\fBfio \-\-local-args \-\-client=server \-\-remote-args <job file(s)>\fR
-
-where \-\-local-args are arguments that are local to the client where it is
-running, 'server' is the connect string, and \-\-remote-args and <job file(s)>
-are sent to the server. The 'server' string follows the same format as it
-does on the server side, to allow IP/hostname/socket and port strings.
-You can connect to multiple clients as well, to do that you could run:
-
-\fBfio \-\-client=server2 \-\-client=server2 <job file(s)>\fR
-
-If the job file is located on the fio server, then you can tell the server
-to load a local file as well. This is done by using \-\-remote-config:
-
-\fBfio \-\-client=server \-\-remote-config /path/to/file.fio\fR
-
-Then fio will open this local (to the server) job file instead
-of being passed one from the client.
-
+Start a fio server, listening on the local socket `/tmp/fio.sock'.
+.RE
+.P
+Once a server is running, a "client" can connect to the fio server with:
+.RS
+.P
+$ fio <local\-args> \-\-client=<server> <remote\-args> <job file(s)>
+.RE
+.P
+where `local\-args' are arguments for the client where it is running, `server'
+is the connect string, and `remote\-args' and `job file(s)' are sent to the
+server. The `server' string follows the same format as it does on the server
+side, to allow IP/hostname/socket and port strings.
+.P
+Fio can connect to multiple servers this way:
+.RS
+.P
+$ fio \-\-client=<server1> <job file(s)> \-\-client=<server2> <job file(s)>
+.RE
+.P
+If the job file is located on the fio server, then you can tell the server to
+load a local file as well. This is done by using \fB\-\-remote\-config\fR:
+.RS
+.P
+$ fio \-\-client=server \-\-remote\-config /path/to/file.fio
+.RE
+.P
+Then fio will open this local (to the server) job file instead of being passed
+one from the client.
+.P
 If you have many servers (example: 100 VMs/containers), you can input a pathname
-of a file containing host IPs/names as the parameter value for the \-\-client option.
-For example, here is an example "host.list" file containing 2 hostnames:
-
+of a file containing host IPs/names as the parameter value for the
+\fB\-\-client\fR option. For example, here is an example `host.list'
+file containing 2 hostnames:
+.RS
+.P
+.PD 0
 host1.your.dns.domain
-.br
+.P
 host2.your.dns.domain
-
+.PD
+.RE
+.P
 The fio command would then be:
-
-\fBfio \-\-client=host.list <job file>\fR
-
-In this mode, you cannot input server-specific parameters or job files, and all
+.RS
+.P
+$ fio \-\-client=host.list <job file(s)>
+.RE
+.P
+In this mode, you cannot input server\-specific parameters or job files \-\- all
 servers receive the same job file.
-
-In order to enable fio \-\-client runs utilizing a shared filesystem from multiple hosts,
-fio \-\-client now prepends the IP address of the server to the filename. For example,
-if fio is using directory /mnt/nfs/fio and is writing filename fileio.tmp,
-with a \-\-client hostfile
-containing two hostnames h1 and h2 with IP addresses 192.168.10.120 and 192.168.10.121, then
-fio will create two files:
-
+.P
+In order to let `fio \-\-client' runs use a shared filesystem from multiple
+hosts, `fio \-\-client' now prepends the IP address of the server to the
+filename. For example, if fio is using the directory `/mnt/nfs/fio' and is
+writing filename `fileio.tmp', with a \fB\-\-client\fR `hostfile'
+containing two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
+192.168.10.121, then fio will create two files:
+.RS
+.P
+.PD 0
 /mnt/nfs/fio/192.168.10.120.fileio.tmp
-.br
+.P
 /mnt/nfs/fio/192.168.10.121.fileio.tmp
-
+.PD
+.RE
 .SH AUTHORS
-
 .B fio
 was written by Jens Axboe <jens.axboe@oracle.com>,
 now Jens Axboe <axboe@fb.com>.
 .br
 This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au> based
 on documentation by Jens Axboe.
+.br
+This man page was rewritten by Tomohiro Kusumi <tkusumi@tuxera.com> based
+on documentation by Jens Axboe.
 .SH "REPORTING BUGS"
 Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.
 .br
-See \fBREPORTING-BUGS\fR.
-
-\fBREPORTING-BUGS\fR: http://git.kernel.dk/cgit/fio/plain/REPORTING-BUGS
+See \fBREPORTING\-BUGS\fR.
+.P
+\fBREPORTING\-BUGS\fR: \fIhttp://git.kernel.dk/cgit/fio/plain/REPORTING\-BUGS\fR
 .SH "SEE ALSO"
 For further documentation see \fBHOWTO\fR and \fBREADME\fR.
 .br
-Sample jobfiles are available in the \fBexamples\fR directory.
-.br
-These are typically located under /usr/share/doc/fio.
-
-\fBHOWTO\fR:  http://git.kernel.dk/cgit/fio/plain/HOWTO
+Sample jobfiles are available in the `examples/' directory.
 .br
-\fBREADME\fR: http://git.kernel.dk/cgit/fio/plain/README
+These are typically located under `/usr/share/doc/fio'.
+.P
+\fBHOWTO\fR: \fIhttp://git.kernel.dk/cgit/fio/plain/HOWTO\fR
 .br
+\fBREADME\fR: \fIhttp://git.kernel.dk/cgit/fio/plain/README\fR
diff --git a/gfio.c b/gfio.c
index 7c92a50..7160c3a 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1243,7 +1243,7 @@ static void about_dialog(GtkWidget *w, gpointer data)
 		"website", "http://git.kernel.dk/cgit/fio/",
 		"authors", authors,
 		"version", fio_version_string,
-		"copyright", "© 2012 Jens Axboe <axboe@kernel.dk>",
+		"copyright", "© 2012-2017 Jens Axboe <axboe@kernel.dk>",
 		"logo-icon-name", "fio",
 		/* Must be last: */
 		"wrap-license", TRUE,
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 500d64c..edfefa8 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.99">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="3.0">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/printing.c b/printing.c
index 4dcc986..b58996b 100644
--- a/printing.c
+++ b/printing.c
@@ -31,7 +31,7 @@ static void results_draw_page(GtkPrintOperation *operation,
 			      gpointer data)
 {
 	cairo_t *cr;
-	char str[20];
+	char str[32];
 	double x, y;
 
 	cr = gtk_print_context_get_cairo_context(context);

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-08-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-08-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a94a977497636bdcbef7106ce3617c96c8ad66bd:

  HOWTO: fix unit type suffix in "Parameter types" section to upper case (2017-08-09 08:14:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 29092211c1f926541db0e2863badc03d7378b31a:

  HOWTO: update and clarify description of latencies in normal output (2017-08-14 13:02:49 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'serialize_overlap' of https://github.com/sitsofe/fio
      backend: cleanup overlap submission logic
      Merge branch 'ci' of https://github.com/sitsofe/fio

Sitsofe Wheeler (6):
      Makefile: modify make test to use a filesystem file
      ci: make CI builds fail on compilation warnings
      fio: add serialize_overlap option
      iolog: fix double free when verified I/O overlaps
      iolog: remove random layout verification optimisation
      iolog: tidy up log_io_piece() conditional

Vincent Fu (2):
      stat: change indentation of the lat (nsec/usec/msec) section in the normal output
      HOWTO: update and clarify description of latencies in normal output

 .travis.yml      |  2 ++
 HOWTO            | 44 ++++++++++++++++++++++++++++++++++----------
 Makefile         |  2 +-
 appveyor.yml     |  2 +-
 backend.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 cconv.c          |  2 ++
 fio.1            | 14 ++++++++++++++
 init.c           | 17 +++++++++++++++++
 iolog.c          | 24 ++++++++++--------------
 options.c        | 11 +++++++++++
 stat.c           |  2 +-
 thread_options.h |  3 +++
 12 files changed, 142 insertions(+), 29 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index e84e61f..4cdda12 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -26,3 +26,5 @@ matrix:
 before_install:
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get -qq update; fi
   - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev; fi
+script:
+  - ./configure --extra-cflags="-Werror" && make && make test
diff --git a/HOWTO b/HOWTO
index fc173f0..71d9fa5 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2030,6 +2030,21 @@ I/O depth
 	16 requests, it will let the depth drain down to 4 before starting to fill
 	it again.
 
+.. option:: serialize_overlap=bool
+
+	Serialize in-flight I/Os that might otherwise cause or suffer from data races.
+	When two or more I/Os are submitted simultaneously, there is no guarantee that
+	the I/Os will be processed or completed in the submitted order. Further, if
+	two or more of those I/Os are writes, any overlapping region between them can
+	become indeterminate/undefined on certain storage. These issues can cause
+	verification to fail erratically when at least one of the racing I/Os is
+	changing data and the overlapping region has a non-zero size. Setting
+	``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly
+	serializing in-flight I/Os that have a non-zero overlap. Note that setting
+	this option can reduce both performance and the :option:`iodepth` achieved.
+	Additionally this option does not work when :option:`io_submit_mode` is set to
+	offload. Default: false.
+
 .. option:: io_submit_mode=str
 
 	This option controls how fio submits the I/O to the I/O engine. The default
@@ -2605,7 +2620,6 @@ Verification
 
 	Enable experimental verification.
 
-
 Steady state
 ~~~~~~~~~~~~
 
@@ -3122,9 +3136,9 @@ group) the output looks like::
 	     | 99.99th=[78119]
 	   bw (  KiB/s): min=  532, max=  686, per=0.10%, avg=622.87, stdev=24.82, samples=  100
 	   iops        : min=   76, max=   98, avg=88.98, stdev= 3.54, samples=  100
-	    lat (usec) : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
-	    lat (msec) : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
-	    lat (msec) : 100=0.65%
+	  lat (usec)   : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
+	  lat (msec)   : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
+	  lat (msec)   : 100=0.65%
 	  cpu          : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
 	  IO depths    : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
 	     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
@@ -3163,6 +3177,10 @@ writes in the example above).  In the order listed, they denote:
 		complete is basically just CPU time (I/O has already been done, see slat
 		explanation).
 
+**lat**
+		Total latency. Same names as slat and clat, this denotes the time from
+		when fio created the I/O unit to completion of the I/O operation.
+
 **bw**
 		Bandwidth statistics based on samples. Same names as the xlat stats,
 		but also includes the number of samples taken (**samples**) and an
@@ -3174,6 +3192,14 @@ writes in the example above).  In the order listed, they denote:
 **iops**
 		IOPS statistics based on samples. Same names as bw.
 
+**lat (nsec/usec/msec)**
+		The distribution of I/O completion latencies. This is the time from when
+		I/O leaves fio and when it gets completed. Unlike the separate
+		read/write/trim sections above, the data here and in the remaining
+		sections apply to all I/Os for the reporting group. 250=0.04% means that
+		0.04% of the I/Os completed in under 250us. 500=64.11% means that 64.11%
+		of the I/Os required 250 to 499us for completion.
+
 **cpu**
 		CPU usage. User and system time, along with the number of context
 		switches this thread went through, usage of system and user time, and
@@ -3204,12 +3230,10 @@ writes in the example above).  In the order listed, they denote:
 		The number of read/write/trim requests issued, and how many of them were
 		short or dropped.
 
-**IO latencies**
-		The distribution of I/O completion latencies. This is the time from when
-		I/O leaves fio and when it gets completed.  The numbers follow the same
-		pattern as the I/O depths, meaning that 2=1.6% means that 1.6% of the
-		I/O completed within 2 msecs, 20=12.8% means that 12.8% of the I/O took
-		more than 10 msecs, but less than (or equal to) 20 msecs.
+**IO latency**
+		These values are for `--latency-target` and related options. When
+		these options are engaged, this section describes the I/O depth required
+		to meet the specified latency target.
 
 ..
 	Example output was based on the following:
diff --git a/Makefile b/Makefile
index 540ffb2..3764da5 100644
--- a/Makefile
+++ b/Makefile
@@ -471,7 +471,7 @@ doc: tools/plot/fio2gnuplot.1
 	@man -t tools/hist/fiologparser_hist.py.1 | ps2pdf - fiologparser_hist.pdf
 
 test: fio
-	./fio --minimal --thread --ioengine=null --runtime=1s --name=nulltest --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifynulltest --rw=write --verify=crc32c --verify_state_save=0 --size=100M
+	./fio --minimal --thread --exitall_on_error --runtime=1s --name=nulltest --ioengine=null --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifyfstest --filename=fiotestfile.tmp --unlink=1 --rw=write --verify=crc32c --verify_state_save=0 --size=16K
 
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
diff --git a/appveyor.yml b/appveyor.yml
index 7543393..39f50a8 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -13,7 +13,7 @@ environment:
 
 build_script:
   - SET PATH=%CYG_ROOT%\bin;%PATH%
-  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure ${CONFIGURE_OPTIONS} && make.exe'
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure --extra-cflags=\"-Werror\" ${CONFIGURE_OPTIONS} && make.exe'
 
 after_build:
   - cd os\windows && dobuild.cmd %BUILD_ARCH%
diff --git a/backend.c b/backend.c
index fe15997..d2675b4 100644
--- a/backend.c
+++ b/backend.c
@@ -587,6 +587,50 @@ static int unlink_all_files(struct thread_data *td)
 }
 
 /*
+ * Check if io_u will overlap an in-flight IO in the queue
+ */
+static bool in_flight_overlap(struct io_u_queue *q, struct io_u *io_u)
+{
+	bool overlap;
+	struct io_u *check_io_u;
+	unsigned long long x1, x2, y1, y2;
+	int i;
+
+	x1 = io_u->offset;
+	x2 = io_u->offset + io_u->buflen;
+	overlap = false;
+	io_u_qiter(q, check_io_u, i) {
+		if (check_io_u->flags & IO_U_F_FLIGHT) {
+			y1 = check_io_u->offset;
+			y2 = check_io_u->offset + check_io_u->buflen;
+
+			if (x1 < y2 && y1 < x2) {
+				overlap = true;
+				dprint(FD_IO, "in-flight overlap: %llu/%lu, %llu/%lu\n",
+						x1, io_u->buflen,
+						y1, check_io_u->buflen);
+				break;
+			}
+		}
+	}
+
+	return overlap;
+}
+
+static int io_u_submit(struct thread_data *td, struct io_u *io_u)
+{
+	/*
+	 * Check for overlap if the user asked us to, and we have
+	 * at least one IO in flight besides this one.
+	 */
+	if (td->o.serialize_overlap && td->cur_depth > 1 &&
+	    in_flight_overlap(&td->io_u_all, io_u))
+		return FIO_Q_BUSY;
+
+	return td_io_queue(td, io_u);
+}
+
+/*
  * The main verify engine. Runs over the writes we previously submitted,
  * reads the blocks back in, and checks the crc/md5 of the data.
  */
@@ -716,7 +760,7 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 		if (!td->o.disable_slat)
 			fio_gettime(&io_u->start_time, NULL);
 
-		ret = td_io_queue(td, io_u);
+		ret = io_u_submit(td, io_u);
 
 		if (io_queue_event(td, io_u, &ret, ddir, NULL, 1, NULL))
 			break;
@@ -983,7 +1027,7 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 				td->rate_next_io_time[ddir] = usec_for_io(td, ddir);
 
 		} else {
-			ret = td_io_queue(td, io_u);
+			ret = io_u_submit(td, io_u);
 
 			if (should_check_rate(td))
 				td->rate_next_io_time[ddir] = usec_for_io(td, ddir);
diff --git a/cconv.c b/cconv.c
index f9f2b30..ac58705 100644
--- a/cconv.c
+++ b/cconv.c
@@ -96,6 +96,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->iodepth_batch = le32_to_cpu(top->iodepth_batch);
 	o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min);
 	o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max);
+	o->serialize_overlap = le32_to_cpu(top->serialize_overlap);
 	o->size = le64_to_cpu(top->size);
 	o->io_size = le64_to_cpu(top->io_size);
 	o->size_percent = le32_to_cpu(top->size_percent);
@@ -346,6 +347,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->iodepth_batch = cpu_to_le32(o->iodepth_batch);
 	top->iodepth_batch_complete_min = cpu_to_le32(o->iodepth_batch_complete_min);
 	top->iodepth_batch_complete_max = cpu_to_le32(o->iodepth_batch_complete_max);
+	top->serialize_overlap = cpu_to_le32(o->serialize_overlap);
 	top->size_percent = cpu_to_le32(o->size_percent);
 	top->fill_device = cpu_to_le32(o->fill_device);
 	top->file_append = cpu_to_le32(o->file_append);
diff --git a/fio.1 b/fio.1
index a3fba65..14359e6 100644
--- a/fio.1
+++ b/fio.1
@@ -1044,6 +1044,20 @@ we simply do polling.
 Low watermark indicating when to start filling the queue again.  Default:
 \fBiodepth\fR.
 .TP
+.BI serialize_overlap \fR=\fPbool
+Serialize in-flight I/Os that might otherwise cause or suffer from data races.
+When two or more I/Os are submitted simultaneously, there is no guarantee that
+the I/Os will be processed or completed in the submitted order. Further, if
+two or more of those I/Os are writes, any overlapping region between them can
+become indeterminate/undefined on certain storage. These issues can cause
+verification to fail erratically when at least one of the racing I/Os is
+changing data and the overlapping region has a non-zero size. Setting
+\fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly
+serializing in-flight I/Os that have a non-zero overlap. Note that setting
+this option can reduce both performance and the \fBiodepth\fR achieved.
+Additionally this option does not work when \fBio_submit_mode\fR is set to
+offload. Default: false.
+.TP
 .BI io_submit_mode \fR=\fPstr
 This option controls how fio submits the IO to the IO engine. The default is
 \fBinline\fR, which means that the fio job threads submit and reap IO directly.
diff --git a/init.c b/init.c
index 42e7107..164e411 100644
--- a/init.c
+++ b/init.c
@@ -698,6 +698,23 @@ static int fixup_options(struct thread_data *td)
 	if (o->iodepth_batch_complete_min > o->iodepth_batch_complete_max)
 		o->iodepth_batch_complete_max = o->iodepth_batch_complete_min;
 
+	/*
+	 * There's no need to check for in-flight overlapping IOs if the job
+	 * isn't changing data or the maximum iodepth is guaranteed to be 1
+	 */
+	if (o->serialize_overlap && !(td->flags & TD_F_READ_IOLOG) &&
+	    (!(td_write(td) || td_trim(td)) || o->iodepth == 1))
+		o->serialize_overlap = 0;
+	/*
+	 * Currently can't check for overlaps in offload mode
+	 */
+	if (o->serialize_overlap && o->io_submit_mode == IO_MODE_OFFLOAD) {
+		log_err("fio: checking for in-flight overlaps when the "
+			"io_submit_mode is offload is not supported\n");
+		o->serialize_overlap = 0;
+		ret = warnings_fatal;
+	}
+
 	if (o->nr_files > td->files_index)
 		o->nr_files = td->files_index;
 
diff --git a/iolog.c b/iolog.c
index 27c14eb..760d7b0 100644
--- a/iolog.c
+++ b/iolog.c
@@ -227,21 +227,16 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
 	}
 
 	/*
-	 * We don't need to sort the entries, if:
+	 * We don't need to sort the entries if we only performed sequential
+	 * writes. In this case, just reading back data in the order we wrote
+	 * it out is faster but still safe.
 	 *
-	 *	Sequential writes, or
-	 *	Random writes that lay out the file as it goes along
-	 *
-	 * For both these cases, just reading back data in the order we
-	 * wrote it out is the fastest.
-	 *
-	 * One exception is if we don't have a random map AND we are doing
-	 * verifies, in that case we need to check for duplicate blocks and
-	 * drop the old one, which we rely on the rb insert/lookup for
-	 * handling.
+	 * One exception is if we don't have a random map in which case we need
+	 * to check for duplicate blocks and drop the old one, which we rely on
+	 * the rb insert/lookup for handling.
 	 */
-	if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite) &&
-	      (file_randommap(td, ipo->file) || td->o.verify == VERIFY_NONE)) {
+	if (((!td->o.verifysort) || !td_random(td)) &&
+	      file_randommap(td, ipo->file)) {
 		INIT_FLIST_HEAD(&ipo->list);
 		flist_add_tail(&ipo->list, &td->io_hist_list);
 		ipo->flags |= IP_F_ONLIST;
@@ -284,7 +279,8 @@ restart:
 			td->io_hist_len--;
 			rb_erase(parent, &td->io_hist_tree);
 			remove_trim_entry(td, __ipo);
-			free(__ipo);
+			if (!(__ipo->flags & IP_F_IN_FLIGHT))
+				free(__ipo);
 			goto restart;
 		}
 	}
diff --git a/options.c b/options.c
index f2b2bb9..443791a 100644
--- a/options.c
+++ b/options.c
@@ -1882,6 +1882,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_IO_BASIC,
 	},
 	{
+		.name	= "serialize_overlap",
+		.lname	= "Serialize overlap",
+		.off1	= offsetof(struct thread_options, serialize_overlap),
+		.type	= FIO_OPT_BOOL,
+		.help	= "Wait for in-flight IOs that collide to complete",
+		.parent	= "iodepth",
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_IO_BASIC,
+	},
+	{
 		.name	= "io_submit_mode",
 		.lname	= "IO submit mode",
 		.type	= FIO_OPT_STR,
diff --git a/stat.c b/stat.c
index aebd107..4aa9cb8 100644
--- a/stat.c
+++ b/stat.c
@@ -520,7 +520,7 @@ static int show_lat(double *io_u_lat, int nr, const char **ranges,
 		if (new_line) {
 			if (line)
 				log_buf(out, "\n");
-			log_buf(out, "    lat (%s) : ", msg);
+			log_buf(out, "  lat (%s)   : ", msg);
 			new_line = 0;
 			line = 0;
 		}
diff --git a/thread_options.h b/thread_options.h
index f3dfd42..26a3e0e 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -65,6 +65,7 @@ struct thread_options {
 	unsigned int iodepth_batch;
 	unsigned int iodepth_batch_complete_min;
 	unsigned int iodepth_batch_complete_max;
+	unsigned int serialize_overlap;
 
 	unsigned int unique_filename;
 
@@ -340,6 +341,8 @@ struct thread_options_pack {
 	uint32_t iodepth_batch;
 	uint32_t iodepth_batch_complete_min;
 	uint32_t iodepth_batch_complete_max;
+	uint32_t serialize_overlap;
+	uint32_t pad3;
 
 	uint64_t size;
 	uint64_t io_size;

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-08-10 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-08-10 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f9cfc7d44a80638f81810416385136c35ad34658:

  Add ability to keep memory-mapped files (2017-08-08 14:26:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a94a977497636bdcbef7106ce3617c96c8ad66bd:

  HOWTO: fix unit type suffix in "Parameter types" section to upper case (2017-08-09 08:14:18 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (1):
      HOWTO: fix unit type suffix in "Parameter types" section to upper case

 HOWTO | 20 ++++++++++----------
 fio.1 | 20 ++++++++++----------
 2 files changed, 20 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6c69a0e..fc173f0 100644
--- a/HOWTO
+++ b/HOWTO
@@ -505,19 +505,19 @@ Parameter types
 	prefixes.  To specify power-of-10 decimal values defined in the
 	International System of Units (SI):
 
-		* *ki* -- means kilo (K) or 1000
-		* *mi* -- means mega (M) or 1000**2
-		* *gi* -- means giga (G) or 1000**3
-		* *ti* -- means tera (T) or 1000**4
-		* *pi* -- means peta (P) or 1000**5
+		* *Ki* -- means kilo (K) or 1000
+		* *Mi* -- means mega (M) or 1000**2
+		* *Gi* -- means giga (G) or 1000**3
+		* *Ti* -- means tera (T) or 1000**4
+		* *Pi* -- means peta (P) or 1000**5
 
 	To specify power-of-2 binary values defined in IEC 80000-13:
 
-		* *k* -- means kibi (Ki) or 1024
-		* *m* -- means mebi (Mi) or 1024**2
-		* *g* -- means gibi (Gi) or 1024**3
-		* *t* -- means tebi (Ti) or 1024**4
-		* *p* -- means pebi (Pi) or 1024**5
+		* *K* -- means kibi (Ki) or 1024
+		* *M* -- means mebi (Mi) or 1024**2
+		* *G* -- means gibi (Gi) or 1024**3
+		* *T* -- means tebi (Ti) or 1024**4
+		* *P* -- means pebi (Pi) or 1024**5
 
 	With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
 	from those specified in the SI and IEC 80000-13 standards to provide
diff --git a/fio.1 b/fio.1
index ab978ab..a3fba65 100644
--- a/fio.1
+++ b/fio.1
@@ -242,37 +242,37 @@ prefixes. To specify power-of-10 decimal values defined in the
 International System of Units (SI):
 .RS
 .P
-ki means kilo (K) or 1000
+Ki means kilo (K) or 1000
 .RE
 .RS
-mi means mega (M) or 1000**2
+Mi means mega (M) or 1000**2
 .RE
 .RS
-gi means giga (G) or 1000**3
+Gi means giga (G) or 1000**3
 .RE
 .RS
-ti means tera (T) or 1000**4
+Ti means tera (T) or 1000**4
 .RE
 .RS
-pi means peta (P) or 1000**5
+Pi means peta (P) or 1000**5
 .RE
 .P
 To specify power-of-2 binary values defined in IEC 80000-13:
 .RS
 .P
-k means kibi (Ki) or 1024
+K means kibi (Ki) or 1024
 .RE
 .RS
-m means mebi (Mi) or 1024**2
+M means mebi (Mi) or 1024**2
 .RE
 .RS
-g means gibi (Gi) or 1024**3
+G means gibi (Gi) or 1024**3
 .RE
 .RS
-t means tebi (Ti) or 1024**4
+T means tebi (Ti) or 1024**4
 .RE
 .RS
-p means pebi (Pi) or 1024**5
+P means pebi (Pi) or 1024**5
 .RE
 .P
 With `kb_base=1024' (the default), the unit prefixes are opposite

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-08-09 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-08-09 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d4a507c17533f05bcf6d6eeb8d00f3dad1a020a1:

  Merge branch 'fio-jsonplus-patches' of https://github.com/vincentkfu/fio (2017-08-07 13:44:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f9cfc7d44a80638f81810416385136c35ad34658:

  Add ability to keep memory-mapped files (2017-08-08 14:26:56 -0600)

----------------------------------------------------------------
Stephen Bates (1):
      Add ability to keep memory-mapped files

 fio.h    | 1 +
 memory.c | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/fio.h b/fio.h
index 39d775c..da950ef 100644
--- a/fio.h
+++ b/fio.h
@@ -87,6 +87,7 @@ enum {
 	TD_F_CHILD		= 1U << 12,
 	TD_F_NO_PROGRESS        = 1U << 13,
 	TD_F_REGROW_LOGS	= 1U << 14,
+	TD_F_MMAP_KEEP		= 1U << 15,
 };
 
 enum {
diff --git a/memory.c b/memory.c
index 22a7f5d..04dc3be 100644
--- a/memory.c
+++ b/memory.c
@@ -138,6 +138,9 @@ static int alloc_mem_mmap(struct thread_data *td, size_t total_mem)
 	}
 
 	if (td->o.mmapfile) {
+		if (access(td->o.mmapfile, F_OK) == 0)
+			td->flags |= TD_F_MMAP_KEEP;
+
 		td->mmapfd = open(td->o.mmapfile, O_RDWR|O_CREAT, 0644);
 
 		if (td->mmapfd < 0) {
@@ -169,7 +172,7 @@ static int alloc_mem_mmap(struct thread_data *td, size_t total_mem)
 		td->orig_buffer = NULL;
 		if (td->mmapfd != 1 && td->mmapfd != -1) {
 			close(td->mmapfd);
-			if (td->o.mmapfile)
+			if (td->o.mmapfile && !(td->flags & TD_F_MMAP_KEEP))
 				unlink(td->o.mmapfile);
 		}
 
@@ -187,7 +190,8 @@ static void free_mem_mmap(struct thread_data *td, size_t total_mem)
 	if (td->o.mmapfile) {
 		if (td->mmapfd != -1)
 			close(td->mmapfd);
-		unlink(td->o.mmapfile);
+		if (!(td->flags & TD_F_MMAP_KEEP))
+			unlink(td->o.mmapfile);
 		free(td->o.mmapfile);
 	}
 }

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-08-08 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-08-08 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2b455dbf8038ee5998f9280fac40f8089175adf7:

  HOWTO: minor fix and backport from man page (2017-08-01 13:55:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d4a507c17533f05bcf6d6eeb8d00f3dad1a020a1:

  Merge branch 'fio-jsonplus-patches' of https://github.com/vincentkfu/fio (2017-08-07 13:44:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fio-jsonplus-patches' of https://github.com/vincentkfu/fio

Tomohiro Kusumi (4):
      man: add proper indentation to "PARAMETER TYPES" section
      HOWTO: fix unit type suffix in "Parameter types" section
      HOWTO: use proper (or drop wrong usage of) option type =bool
      move skip_bad= option to engines/mtd.c

Vincent Fu (3):
      tools: add fio_jsonplus_clat2csv
      HOWTO: add section providing details about json+ output format
      man: add section describing json+ output format

 HOWTO                       |  47 +++++++++----
 Makefile                    |   2 +-
 cconv.c                     |   2 -
 engines/mtd.c               |  28 +++++++-
 fio.1                       | 139 +++++++++++++++++++++++++++----------
 options.c                   |  11 ---
 thread_options.h            |   2 -
 tools/fio_jsonplus_clat2csv | 164 ++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 329 insertions(+), 66 deletions(-)
 create mode 100755 tools/fio_jsonplus_clat2csv

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index ee2b996..6c69a0e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -505,19 +505,19 @@ Parameter types
 	prefixes.  To specify power-of-10 decimal values defined in the
 	International System of Units (SI):
 
-		* *Ki* -- means kilo (K) or 1000
-		* *Mi* -- means mega (M) or 1000**2
-		* *Gi* -- means giga (G) or 1000**3
-		* *Ti* -- means tera (T) or 1000**4
-		* *Pi* -- means peta (P) or 1000**5
+		* *ki* -- means kilo (K) or 1000
+		* *mi* -- means mega (M) or 1000**2
+		* *gi* -- means giga (G) or 1000**3
+		* *ti* -- means tera (T) or 1000**4
+		* *pi* -- means peta (P) or 1000**5
 
 	To specify power-of-2 binary values defined in IEC 80000-13:
 
 		* *k* -- means kibi (Ki) or 1024
-		* *M* -- means mebi (Mi) or 1024**2
-		* *G* -- means gibi (Gi) or 1024**3
-		* *T* -- means tebi (Ti) or 1024**4
-		* *P* -- means pebi (Pi) or 1024**5
+		* *m* -- means mebi (Mi) or 1024**2
+		* *g* -- means gibi (Gi) or 1024**3
+		* *t* -- means tebi (Ti) or 1024**4
+		* *p* -- means pebi (Pi) or 1024**5
 
 	With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
 	from those specified in the SI and IEC 80000-13 standards to provide
@@ -1392,7 +1392,7 @@ Block size
 	typically won't work with direct I/O, as that normally requires sector
 	alignment.
 
-.. option:: bs_is_seq_rand
+.. option:: bs_is_seq_rand=bool
 
 	If this option is set, fio will use the normal read,write blocksize settings
 	as sequential,random blocksize settings instead. Any random read or write
@@ -2166,7 +2166,7 @@ I/O replay
 	replay, the file needs to be turned into a blkparse binary data file first
 	(``blkparse <device> -o /dev/null -d file_for_fio.bin``).
 
-.. option:: replay_no_stall=int
+.. option:: replay_no_stall=bool
 
 	When replaying I/O with :option:`read_iolog` the default behavior is to
 	attempt to respect the timestamps within the log and replay them with the
@@ -2680,7 +2680,7 @@ Measurements and reporting
 	all jobs in a file will be part of the same reporting group, unless
 	separated by a :option:`stonewall`.
 
-.. option:: stats
+.. option:: stats=bool
 
 	By default, fio collects and shows final output results for all jobs
 	that run. If this option is set to 0, then fio will ignore it in
@@ -2763,7 +2763,7 @@ Measurements and reporting
 	you instead want to log the maximum value, set this option to 1. Defaults to
 	0, meaning that averaged values are logged.
 
-.. option:: log_offset=int
+.. option:: log_offset=bool
 
 	If this is set, the iolog options will include the byte offset for the I/O
 	entry as well as the other data values. Defaults to 0 meaning that
@@ -3363,6 +3363,27 @@ minimal output v3, separated by semicolons::
 	terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10
 ;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 
+JSON+ output
+------------
+
+The `json+` output format is identical to the `json` output format except that it
+adds a full dump of the completion latency bins. Each `bins` object contains a
+set of (key, value) pairs where keys are latency durations and values count how
+many I/Os had completion latencies of the corresponding duration. For example,
+consider:
+
+	"bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1, "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" : 534, "105984" : 5995, "107008" : 7529, ... }
+
+This data indicates that one I/O required 87,552ns to complete, two I/Os required
+100,864ns to complete, and 7529 I/Os required 107,008ns to complete.
+
+Also included with fio is a Python script `fio_jsonplus_clat2csv` that takes
+json+ output and generates CSV-formatted latency data suitable for plotting.
+
+The latency durations actually represent the midpoints of latency intervals.
+For details refer to stat.h.
+
+
 Trace file format
 -----------------
 
diff --git a/Makefile b/Makefile
index e8ea6cb..540ffb2 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py tools/fio_jsonplus_clat2csv)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
diff --git a/cconv.c b/cconv.c
index b8d9ddc..f9f2b30 100644
--- a/cconv.c
+++ b/cconv.c
@@ -283,7 +283,6 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->compress_percentage = le32_to_cpu(top->compress_percentage);
 	o->compress_chunk = le32_to_cpu(top->compress_chunk);
 	o->dedupe_percentage = le32_to_cpu(top->dedupe_percentage);
-	o->skip_bad = le32_to_cpu(top->skip_bad);
 	o->block_error_hist = le32_to_cpu(top->block_error_hist);
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
@@ -471,7 +470,6 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->compress_chunk = cpu_to_le32(o->compress_chunk);
 	top->dedupe_percentage = cpu_to_le32(o->dedupe_percentage);
 	top->block_error_hist = cpu_to_le32(o->block_error_hist);
-	top->skip_bad = cpu_to_le32(o->skip_bad);
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
 	top->per_job_logs = cpu_to_le32(o->per_job_logs);
diff --git a/engines/mtd.c b/engines/mtd.c
index 3c22a1b..b4a6600 100644
--- a/engines/mtd.c
+++ b/engines/mtd.c
@@ -13,6 +13,7 @@
 #include <mtd/mtd-user.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 #include "../verify.h"
 #include "../oslib/libmtd.h"
 
@@ -22,6 +23,28 @@ struct fio_mtd_data {
 	struct mtd_dev_info info;
 };
 
+struct fio_mtd_options {
+	void *pad; /* avoid off1 == 0 */
+	unsigned int skip_bad;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "skip_bad",
+		.lname	= "Skip operations against bad blocks",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct fio_mtd_options, skip_bad),
+		.help	= "Skip operations against known bad blocks.",
+		.hide	= 1,
+		.def	= "0",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_MTD,
+	},
+	{
+		.name	= NULL,
+	},
+};
+
 static int fio_mtd_maybe_mark_bad(struct thread_data *td,
 				  struct fio_mtd_data *fmd,
 				  struct io_u *io_u, int eb)
@@ -55,6 +78,7 @@ static int fio_mtd_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
 	struct fio_mtd_data *fmd = FILE_ENG_DATA(f);
+	struct fio_mtd_options *o = td->eo;
 	int local_offs = 0;
 	int ret;
 
@@ -77,7 +101,7 @@ static int fio_mtd_queue(struct thread_data *td, struct io_u *io_u)
 			      (int)fmd->info.eb_size - eb_offs);
 		char *buf = ((char *)io_u->buf) + local_offs;
 
-		if (td->o.skip_bad) {
+		if (o->skip_bad) {
 			ret = fio_mtd_is_bad(td, fmd, io_u, eb);
 			if (ret == -1)
 				break;
@@ -190,6 +214,8 @@ static struct ioengine_ops ioengine = {
 	.close_file	= fio_mtd_close_file,
 	.get_file_size	= fio_mtd_get_file_size,
 	.flags		= FIO_SYNCIO | FIO_NOEXTEND,
+	.options	= options,
+	.option_struct_size	= sizeof(struct fio_mtd_options),
 };
 
 static void fio_init fio_mtd_register(void)
diff --git a/fio.1 b/fio.1
index a5ec199..ab978ab 100644
--- a/fio.1
+++ b/fio.1
@@ -223,84 +223,130 @@ hours, 'm' for minutes, 's' for seconds, 'ms' (or 'msec') for milliseconds and '
 .I int
 Integer. A whole number value, which may contain an integer prefix
 and an integer suffix.
-
+.RS
+.RS
+.P
 [*integer prefix*] **number** [*integer suffix*]
-
+.RE
+.P
 The optional *integer prefix* specifies the number's base. The default
 is decimal. *0x* specifies hexadecimal.
-
+.P
 The optional *integer suffix* specifies the number's units, and includes an
 optional unit prefix and an optional unit. For quantities of data, the
 default unit is bytes. For quantities of time, the default unit is seconds
 unless otherwise specified.
-
-With \fBkb_base=1000\fR, fio follows international standards for unit
+.P
+With `kb_base=1000', fio follows international standards for unit
 prefixes. To specify power-of-10 decimal values defined in the
 International System of Units (SI):
-
-.nf
+.RS
+.P
 ki means kilo (K) or 1000
+.RE
+.RS
 mi means mega (M) or 1000**2
+.RE
+.RS
 gi means giga (G) or 1000**3
+.RE
+.RS
 ti means tera (T) or 1000**4
+.RE
+.RS
 pi means peta (P) or 1000**5
-.fi
-
+.RE
+.P
 To specify power-of-2 binary values defined in IEC 80000-13:
-
-.nf
+.RS
+.P
 k means kibi (Ki) or 1024
+.RE
+.RS
 m means mebi (Mi) or 1024**2
+.RE
+.RS
 g means gibi (Gi) or 1024**3
+.RE
+.RS
 t means tebi (Ti) or 1024**4
+.RE
+.RS
 p means pebi (Pi) or 1024**5
-.fi
-
-With \fBkb_base=1024\fR (the default), the unit prefixes are opposite
+.RE
+.P
+With `kb_base=1024' (the default), the unit prefixes are opposite
 from those specified in the SI and IEC 80000-13 standards to provide
 compatibility with old scripts. For example, 4k means 4096.
-
+.P
 For quantities of data, an optional unit of 'B' may be included
 (e.g., 'kB' is the same as 'k').
-
+.P
 The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
 not milli). 'b' and 'B' both mean byte, not bit.
-
-Examples with \fBkb_base=1000\fR:
-
-.nf
+.P
+Examples with `kb_base=1000':
+.RS
+.P
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
+.RE
+.RS
 1 MiB: 1048576, 1m, 1024k
+.RE
+.RS
 1 MB: 1000000, 1mi, 1000ki
+.RE
+.RS
 1 TiB: 1073741824, 1t, 1024m, 1048576k
+.RE
+.RS
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
-.fi
-
-Examples with \fBkb_base=1024\fR (default):
-
-.nf
+.RE
+.P
+Examples with `kb_base=1024' (default):
+.RS
+.P
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
+.RE
+.RS
 1 MiB: 1048576, 1m, 1024k
+.RE
+.RS
 1 MB: 1000000, 1mi, 1000ki
+.RE
+.RS
 1 TiB: 1073741824, 1t, 1024m, 1048576k
+.RE
+.RS
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
-.fi
-
+.RE
+.P
 To specify times (units are not case sensitive):
-
-.nf
+.RS
+.P
 D means days
+.RE
+.RS
 H means hours
+.RE
+.RS
 M mean minutes
+.RE
+.RS
 s or sec means seconds (default)
+.RE
+.RS
 ms or msec means milliseconds
+.RE
+.RS
 us or usec means microseconds
-.fi
-
+.RE
+.P
 If the option accepts an upper and lower range, use a colon ':' or
 minus '-' to separate such values. See `irange` parameter type.
 If the lower value specified happens to be larger than the upper value
 the two values are swapped.
+.RE
 .TP
 .I bool
 Boolean. Usually parsed as an integer, however only defined for
@@ -1464,7 +1510,7 @@ Should be a multiple of 1MiB. Default: 4MiB.
 .B exitall
 Terminate all jobs when one finishes.  Default: wait for each job to finish.
 .TP
-.B exitall_on_error \fR=\fPbool
+.B exitall_on_error
 Terminate all jobs if one job finishes in error.  Default: wait for each job
 to finish.
 .TP
@@ -1520,7 +1566,7 @@ Unlink job files after each iteration or loop.  Default: false.
 Specifies the number of iterations (runs of the same workload) of this job.
 Default: 1.
 .TP
-.BI verify_only \fR=\fPbool
+.BI verify_only
 Do not perform the specified workload, only verify data still matches previous
 invocation of this workload. This option allows one to check data multiple
 times at a later date without overwriting it. This option makes sense only for
@@ -1710,7 +1756,7 @@ corrupt.
 Replay the I/O patterns contained in the specified file generated by
 \fBwrite_iolog\fR, or may be a \fBblktrace\fR binary file.
 .TP
-.BI replay_no_stall \fR=\fPint
+.BI replay_no_stall \fR=\fPbool
 While replaying I/O patterns using \fBread_iolog\fR the default behavior
 attempts to respect timing information between I/Os.  Enabling
 \fBreplay_no_stall\fR causes I/Os to be replayed as fast as possible while
@@ -2074,7 +2120,7 @@ For TCP network connections, tell fio to listen for incoming
 connections rather than initiating an outgoing connection. The
 hostname must be omitted if this option is used.
 .TP
-.BI (net,netsplice)pingpong \fR=\fPbool
+.BI (net,netsplice)pingpong
 Normally a network writer will just continue writing data, and a network reader
 will just consume packets. If pingpong=1 is set, a writer will send its normal
 payload to the reader, then wait for the reader to send the same payload back.
@@ -2117,7 +2163,7 @@ Specifies the username (without the 'client.' prefix) used to access the Ceph
 cluster. If the clustername is specified, the clientname shall be the full
 type.id string. If no type. prefix is given, fio will add 'client.' by default.
 .TP
-.BI (mtd)skipbad \fR=\fPbool
+.BI (mtd)skip_bad \fR=\fPbool
 Skip operations against known bad blocks.
 .SH OUTPUT
 While running, \fBfio\fR will display the status of the created jobs.  For
@@ -2393,6 +2439,27 @@ the minimal output v3, separated by semicolons:
 terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;
 write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 .fi
 .RE
+.SH JSON+ OUTPUT
+The \fBjson+\fR output format is identical to the \fBjson\fR output format except that it
+adds a full dump of the completion latency bins. Each \fBbins\fR object contains a
+set of (key, value) pairs where keys are latency durations and values count how
+many I/Os had completion latencies of the corresponding duration. For example,
+consider:
+
+.RS
+"bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1, "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" : 534, "105984" : 5995, "107008" : 7529, ... }
+.RE
+
+This data indicates that one I/O required 87,552ns to complete, two I/Os required
+100,864ns to complete, and 7529 I/Os required 107,008ns to complete.
+
+Also included with fio is a Python script \fBfio_jsonplus_clat2csv\fR that takes
+json+ output and generates CSV-formatted latency data suitable for plotting.
+
+The latency durations actually represent the midpoints of latency intervals.
+For details refer to stat.h.
+
+
 .SH TRACE FILE FORMAT
 There are two trace file format that you can encounter. The older (v1) format
 is unsupported since version 1.20-rc3 (March 2008). It will still be described
diff --git a/options.c b/options.c
index 5a2ab57..f2b2bb9 100644
--- a/options.c
+++ b/options.c
@@ -4379,17 +4379,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_IO_FLOW,
 	},
 	{
-		.name	= "skip_bad",
-		.lname	= "Skip operations against bad blocks",
-		.type	= FIO_OPT_BOOL,
-		.off1	= offsetof(struct thread_options, skip_bad),
-		.help	= "Skip operations against known bad blocks.",
-		.hide	= 1,
-		.def	= "0",
-		.category = FIO_OPT_C_IO,
-		.group	= FIO_OPT_G_MTD,
-	},
-	{
 		.name   = "steadystate",
 		.lname  = "Steady state threshold",
 		.alias  = "ss",
diff --git a/thread_options.h b/thread_options.h
index 72d86cf..f3dfd42 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -306,7 +306,6 @@ struct thread_options {
 	fio_fp64_t latency_percentile;
 
 	unsigned block_error_hist;
-	unsigned int skip_bad;
 
 	unsigned int replay_align;
 	unsigned int replay_scale;
@@ -579,7 +578,6 @@ struct thread_options_pack {
 	fio_fp64_t latency_percentile;
 
 	uint32_t block_error_hist;
-	uint32_t skip_bad;
 
 	uint32_t replay_align;
 	uint32_t replay_scale;
diff --git a/tools/fio_jsonplus_clat2csv b/tools/fio_jsonplus_clat2csv
new file mode 100755
index 0000000..d4ac16e
--- /dev/null
+++ b/tools/fio_jsonplus_clat2csv
@@ -0,0 +1,164 @@
+#!/usr/bin/python
+#
+# fio_jsonplus_clat2csv
+#
+# This script converts fio's json+ completion latency data to CSV format.
+#
+# For example:
+#
+# Run the following fio jobs:
+# ../fio --output=fio-jsonplus.output --output-format=json+ --name=test1
+#  	--ioengine=null --time_based --runtime=5s --size=1G --rw=randrw
+# 	--name=test2 --ioengine=null --time_based --runtime=3s --size=1G
+# 	--rw=read --name=test3 --ioengine=null --time_based --runtime=4s
+# 	--size=8G --rw=write
+#
+# Then run:
+# fio_jsonplus_clat2csv fio-jsonplus.output fio-latency.csv
+#
+# You will end up with the following 3 files
+#
+# -rw-r--r-- 1 root root  6467 Jun 27 14:57 fio-latency_job0.csv
+# -rw-r--r-- 1 root root  3985 Jun 27 14:57 fio-latency_job1.csv
+# -rw-r--r-- 1 root root  4490 Jun 27 14:57 fio-latency_job2.csv
+#
+# fio-latency_job0.csv will look something like:
+#
+# clat_nsec, read_count, read_cumulative, read_percentile, write_count,
+# 	write_cumulative, write_percentile, trim_count, trim_cumulative,
+# 	trim_percentile,
+# 25, 1, 1, 1.50870705013e-07, , , , , , ,
+# 26, 12, 13, 1.96131916517e-06, 947, 947, 0.000142955890032, , , ,
+# 27, 843677, 843690, 0.127288105112, 838347, 839294, 0.126696959629, , , ,
+# 28, 1877982, 2721672, 0.410620573454, 1870189, 2709483, 0.409014312345, , , ,
+# 29, 4471, 2726143, 0.411295116376, 7718, 2717201, 0.410179395301, , , ,
+# 30, 2142885, 4869028, 0.734593687087, 2138164, 4855365, 0.732949340025, , , ,
+# ...
+# 2544, , , , 2, 6624404, 0.999997433738, , , ,
+# 2576, 3, 6628178, 0.99999788781, 4, 6624408, 0.999998037564, , , ,
+# 2608, 4, 6628182, 0.999998491293, 4, 6624412, 0.999998641391, , , ,
+# 2640, 3, 6628185, 0.999998943905, 2, 6624414, 0.999998943304, , , ,
+# 2672, 1, 6628186, 0.999999094776, 3, 6624417, 0.999999396174, , , ,
+# 2736, 1, 6628187, 0.999999245646, 1, 6624418, 0.99999954713, , , ,
+# 2768, 2, 6628189, 0.999999547388, 1, 6624419, 0.999999698087, , , ,
+# 2800, , , , 1, 6624420, 0.999999849043, , , ,
+# 2832, 1, 6628190, 0.999999698259, , , , , , ,
+# 4192, 1, 6628191, 0.999999849129, , , , , , ,
+# 5792, , , , 1, 6624421, 1.0, , , ,
+# 10304, 1, 6628192, 1.0, , , , , , ,
+#
+# The first line says that you had one read IO with 25ns clat,
+# the cumulative number of read IOs at or below 25ns is 1, and
+# 25ns is the 0.00001509th percentile for read latency
+#
+# The job had 2 write IOs complete in 2544ns,
+# 6624404 write IOs completed in 2544ns or less,
+# and this represents the 99.99974th percentile for write latency
+#
+# The last line says that one read IO had 10304ns clat,
+# 6628192 read IOs had 10304ns or shorter clat, and
+# 10304ns is the 100th percentile for read latency
+#
+
+import os
+import json
+import argparse
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('source',
+                        help='fio json+ output file containing completion '
+                             'latency data')
+    parser.add_argument('dest',
+                        help='destination file stub for latency data in CSV '
+                             'format. job number will be appended to filename')
+    args = parser.parse_args()
+
+    return args
+
+
+def percentile(idx, run_total):
+    total = run_total[len(run_total)-1]
+    if total == 0:
+        return 0
+
+    return float(run_total[idx]) / total
+
+
+def more_lines(indices, bins):
+    for key, value in indices.iteritems():
+        if value < len(bins[key]):
+            return True
+
+    return False
+
+
+def main():
+    args = parse_args()
+
+    with open(args.source, 'r') as source:
+        jsondata = json.loads(source.read())
+
+    for jobnum in range(0, len(jsondata['jobs'])):
+        bins = {}
+        run_total = {}
+        ddir_set = set(['read', 'write', 'trim'])
+
+        prev_ddir = None
+        for ddir in ddir_set:
+            bins[ddir] = [[int(key), value] for key, value in
+                          jsondata['jobs'][jobnum][ddir]['clat_ns']
+                          ['bins'].iteritems()]
+            bins[ddir] = sorted(bins[ddir], key=lambda bin: bin[0])
+
+            run_total[ddir] = [0 for x in range(0, len(bins[ddir]))]
+            if len(bins[ddir]) > 0:
+                run_total[ddir][0] = bins[ddir][0][1]
+                for x in range(1, len(bins[ddir])):
+                    run_total[ddir][x] = run_total[ddir][x-1] + \
+                        bins[ddir][x][1]
+
+        stub, ext = os.path.splitext(args.dest)
+        outfile = stub + '_job' + str(jobnum) + ext
+
+        with open(outfile, 'w') as output:
+            output.write("clat_nsec, ")
+            ddir_list = list(ddir_set)
+            for ddir in ddir_list:
+                output.write("{0}_count, {0}_cumulative, {0}_percentile, ".
+                             format(ddir))
+            output.write("\n")
+
+#
+# Have a counter for each ddir
+# In each round, pick the shortest remaining duration
+# and output a line with any values for that duration
+#
+            indices = {x: 0 for x in ddir_list}
+            while more_lines(indices, bins):
+                min_lat = 17112760320
+                for ddir in ddir_list:
+                    if indices[ddir] < len(bins[ddir]):
+                        min_lat = min(bins[ddir][indices[ddir]][0], min_lat)
+
+                output.write("{0}, ".format(min_lat))
+
+                for ddir in ddir_list:
+                    if indices[ddir] < len(bins[ddir]) and \
+                       min_lat == bins[ddir][indices[ddir]][0]:
+                        count = bins[ddir][indices[ddir]][1]
+                        cumulative = run_total[ddir][indices[ddir]]
+                        ptile = percentile(indices[ddir], run_total[ddir])
+                        output.write("{0}, {1}, {2}, ".format(count,
+                                     cumulative, ptile))
+                        indices[ddir] += 1
+                    else:
+                        output.write(", , , ")
+                output.write("\n")
+
+            print "{0} generated".format(outfile)
+
+
+if __name__ == '__main__':
+    main()
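
The unit-prefix rules in the HOWTO and fio.1 hunks above are easy to misread. Below is a minimal Python sketch of the default `kb_base=1024` behaviour as those hunks document it: plain k/m/g/t/p are powers of 1024, ki/mi/gi/ti/pi are powers of 1000, case is ignored, and a trailing 'b'/'B' means bytes. `parse_size` is a hypothetical helper, not fio's own parser. It only handles plain decimal numbers and leaves out the `0x` prefix, time units and `kb_base=1000`.

```python
import re

# Default kb_base=1024 rules as described in the HOWTO/fio.1 text:
# plain k/m/g/t/p are powers of 1024, ki/mi/gi/ti/pi are powers of 1000.
_EXPONENT = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5}
_SIZE_RE = re.compile(r"(\d+)([kmgtp]i?)?b?")


def parse_size(text):
    """Hypothetical helper: convert '4k', '1mi', '4096B', ... to bytes."""
    s = text.strip().lower()  # the integer suffix is not case sensitive
    m = _SIZE_RE.fullmatch(s)
    if m is None:
        raise ValueError("unrecognised size: %r" % text)
    value = int(m.group(1))
    prefix = m.group(2)
    if prefix:
        base = 1000 if prefix.endswith("i") else 1024
        value *= base ** _EXPONENT[prefix[0]]
    return value


for example in ("4k", "4KB", "1m", "1024k", "1mi", "1000ki", "1g"):
    print(example, parse_size(example))
```

By the prefix table, `1t` is 1024**4 = 1099511627776 bytes. The value 1073741824, listed beside `1t` in the "1 TiB" example line of this revision, is actually 1024**3, i.e. `1g`.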

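The JSON+ section added to the HOWTO and fio.1 above says each `bins` entry maps a completion latency in nanoseconds to the number of I/Os that completed with that latency. The sketch below assumes only that shape. It shows, in Python 3, how the running totals and percentile fractions that `fio_jsonplus_clat2csv` writes can be derived from one data direction's bins. `bins_to_percentiles` is a hypothetical name and is not part of fio or of the script.

```python
import json


def bins_to_percentiles(bins):
    """Turn a json+ "bins" object ({latency_ns_string: io_count}) into
    ascending (latency_ns, count, cumulative_count, fraction) tuples."""
    # JSON object keys are strings; convert them to int so sorting is
    # numeric ("100864" would otherwise sort before "87552").
    items = sorted((int(lat), count) for lat, count in bins.items())
    total = sum(count for _, count in items)
    rows = []
    running = 0
    for lat, count in items:
        running += count
        # Fraction of I/Os that completed at or below this latency.
        fraction = running / total if total else 0.0
        rows.append((lat, count, running, fraction))
    return rows


# A few bins copied from the HOWTO example. The example is truncated
# ("..."), so these fractions are relative to this subset only.
sample = json.loads('{"87552": 1, "89600": 1, "100864": 2, "107008": 7529}')
rows = bins_to_percentiles(sample)
for row in rows:
    print(row)
```

In real json+ output the `bins` object sits under each job's per-direction `clat_ns` entry. That is the path the script reads: `jobs[n][ddir]['clat_ns']['bins']`.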

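`fio_jsonplus_clat2csv` above keeps one index per data direction. On each pass it emits the smallest latency still pending in any direction, which is effectively a k-way merge. The script also uses Python 2 only constructs (`dict.iteritems()` and the `print` statement). The Python 3 sketch below produces the same row alignment but replaces the explicit index loop with a sorted union of the latency keys. `merge_rows` is a hypothetical name, and the counts in the example come from the sample rows in the script's header comment.

```python
def merge_rows(per_dir):
    """Hypothetical helper. per_dir maps a data direction ('read', 'write',
    'trim') to a {latency_ns: count} dict. Returns one row per distinct
    latency, ascending, with None where a direction had no I/O there."""
    directions = sorted(per_dir)  # fixed column order: read, trim, write
    latencies = sorted(set().union(*per_dir.values()))
    return [[lat] + [per_dir[d].get(lat) for d in directions]
            for lat in latencies]


rows = merge_rows({
    "read": {25: 1, 2576: 3},
    "write": {2544: 2, 2576: 4},
    "trim": {},
})
for row in rows:
    print(row)
```

Sorting the direction names makes the column order deterministic. The script instead builds its columns from a `set`, so its order is not guaranteed to be read, write, trim. To match the script's full output, the cumulative and percentile columns would still need to be added for each direction.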
* Recent changes (master)
@ 2017-08-02 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-02 12:00 UTC
  To: fio

The following changes since commit 2e2b80c58afa2da6cf2e6c792db16757b2244847:

  Merge branch 'master' of https://github.com/dublio/fio (2017-07-31 08:15:17 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b455dbf8038ee5998f9280fac40f8089175adf7:

  HOWTO: minor fix and backport from man page (2017-08-01 13:55:09 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (3):
      HOWTO: fix wrong description of trim_backlog=
      HOWTO: fix wrong "here follows the complete list of fio job parameters" position
      HOWTO: minor fix and backport from man page

 HOWTO | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index caf6591..ee2b996 100644
--- a/HOWTO
+++ b/HOWTO
@@ -576,6 +576,8 @@ Parameter types
 **float_list**
 	A list of floating point numbers, separated by a ':' character.
 
+With the above in mind, here follows the complete list of fio job parameters.
+
 
 Units
 ~~~~~
@@ -622,9 +624,6 @@ Units
 		Bit based.
 
 
-With the above in mind, here follows the complete list of fio job parameters.
-
-
 Job description
 ~~~~~~~~~~~~~~~
 
@@ -1530,6 +1529,7 @@ Buffers and memory
 
 		**cudamalloc**
 			Use GPU memory as the buffers for GPUDirect RDMA benchmark.
+			The ioengine must be rdma.
 
 	The area allocated is a function of the maximum allowed bs size for the job,
 	multiplied by the I/O depth given. Note that for **shmhuge** and
@@ -1858,7 +1858,7 @@ caveat that when used on the command line, they must come after the
 
    [libhdfs]
 
-		the listening port of the HFDS cluster namenode.
+		The listening port of the HFDS cluster namenode.
 
 .. option:: interface=str : [netsplice] [net]
 
@@ -1931,7 +1931,7 @@ caveat that when used on the command line, they must come after the
 	**0**
 		Default. Preallocate donor's file on init.
 	**1**
-		Allocate space immediately inside defragment event,	and free right
+		Allocate space immediately inside defragment event, and free right
 		after event.
 
 .. option:: clustername=str : [rbd]
@@ -1963,7 +1963,7 @@ caveat that when used on the command line, they must come after the
 
 .. option:: chunk_size : [libhdfs]
 
-	the size of the chunk to use for each file.
+	The size of the chunk to use for each file.
 
 
 I/O depth
@@ -2487,7 +2487,7 @@ Verification
 
 .. option:: verifysort_nr=int
 
-   Pre-load and sort verify blocks for a read workload.
+	Pre-load and sort verify blocks for a read workload.
 
 .. option:: verify_offset=int
 
@@ -2595,7 +2595,7 @@ Verification
 
 .. option:: trim_backlog=int
 
-	Verify that trim/discarded blocks are returned as zeros.
+	Trim after this number of blocks are written.
 
 .. option:: trim_backlog_batch=int
 


* Recent changes (master)
@ 2017-08-01 12:00 Jens Axboe
From: Jens Axboe @ 2017-08-01 12:00 UTC
  To: fio

The following changes since commit f271a3f2d598b9dd8036543071cad573295d8e8e:

  don't print native_fallocate() error if ENOSYS (2017-07-27 14:44:23 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2e2b80c58afa2da6cf2e6c792db16757b2244847:

  Merge branch 'master' of https://github.com/dublio/fio (2017-07-31 08:15:17 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'master' of https://github.com/dublio/fio

weiping zhang (1):
      filesetup: keep OS_O_DIRECT flag when pre-allocating file

 filesetup.c | 2 ++
 1 file changed, 2 insertions(+)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 839aefc..0e5599a 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -146,6 +146,8 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		flags |= O_CREAT;
 	if (new_layout)
 		flags |= O_TRUNC;
+	if (td->o.odirect)
+		flags |= OS_O_DIRECT;
 
 #ifdef WIN32
 	flags |= _O_BINARY;


* Recent changes (master)
@ 2017-07-28 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-28 12:00 UTC
  To: fio

The following changes since commit e1933299ed9f1525e010e0489f0185c063d6d129:

  drop logging when blkdev invalidation failed on unsupported platforms (2017-07-25 13:56:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f271a3f2d598b9dd8036543071cad573295d8e8e:

  don't print native_fallocate() error if ENOSYS (2017-07-27 14:44:23 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'minor_fixes' of https://github.com/sitsofe/fio

Sitsofe Wheeler (10):
      HOWTO: remove unnecessary escaping
      travis: get rid of non-breaking space characters
      os: add missing include for bswap_* on BSDs
      arch: raise an error when compiling for an unknown ARM platform
      examples: add a butterfly seek job file
      time: Add chosen clocksource debug
      doc: add block size to log file format
      doc: minor grammar fixes
      init: force fallocate_mode to none when fallocate is unsupported
      fio: refactor fallocate defines

Stephen Bates (1):
      pvsync2: Add hipri_percentage option

Tomohiro Kusumi (1):
      don't print native_fallocate() error if ENOSYS

 .travis.yml            |  2 +-
 HOWTO                  | 36 ++++++++++++++++++++++--------------
 arch/arch-arm.h        |  2 ++
 doc/fio_examples.rst   | 10 ++++++++++
 engines/sync.c         | 20 +++++++++++++++++++-
 examples/butterfly.fio | 19 +++++++++++++++++++
 filesetup.c            |  3 +--
 fio.1                  | 30 ++++++++++++++++++------------
 gettime.c              |  1 +
 init.c                 |  5 +++++
 options.c              |  6 +++---
 os/os-dragonfly.h      |  1 +
 os/os-freebsd.h        |  1 +
 os/os-netbsd.h         |  1 +
 os/os-openbsd.h        |  1 +
 os/os.h                |  4 ++++
 16 files changed, 109 insertions(+), 33 deletions(-)
 create mode 100644 examples/butterfly.fio

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index ca50e22..e84e61f 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ matrix:
 #     compiler: clang
 #     osx_image: xcode8
 #     env: SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk MACOSX_DEPLOYMENT_TARGET=10.11
-    # Build on the latest OSX version (will eventually become obsolete)
+    # Build on the latest OSX version (will eventually become obsolete)
     - os: osx
       compiler: clang
       osx_image: xcode8.2
diff --git a/HOWTO b/HOWTO
index 2fa8fc2..caf6591 100644
--- a/HOWTO
+++ b/HOWTO
@@ -98,7 +98,7 @@ Command line options
 
 .. option:: --parse-only
 
-	Parse options only, don\'t start any I/O.
+	Parse options only, don't start any I/O.
 
 .. option:: --output=filename
 
@@ -1015,8 +1015,8 @@ I/O type
 
 	``sequential`` is only useful for random I/O, where fio would normally
 	generate a new random offset for every I/O. If you append e.g. 8 to randread,
-	you would get a new random offset for every 8 I/O's. The result would be a
-	seek for only every 8 I/O's, instead of for every I/O. Use ``rw=randread:8``
+	you would get a new random offset for every 8 I/Os. The result would be a
+	seek for only every 8 I/Os, instead of for every I/O. Use ``rw=randread:8``
 	to specify that. As sequential I/O is already sequential, setting
 	``sequential`` for that would not result in any differences.  ``identical``
 	behaves in a similar fashion, except it sends the same offset 8 number of
@@ -1819,6 +1819,11 @@ caveat that when used on the command line, they must come after the
 	Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
 	than normal.
 
+.. option:: hipri_percentage : [pvsync2]
+
+	When hipri is set this determines the probability of a pvsync2 IO being high
+	priority. The default is 100%.
+
 .. option:: cpuload=int : [cpuio]
 
 	Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -2761,7 +2766,8 @@ Measurements and reporting
 .. option:: log_offset=int
 
 	If this is set, the iolog options will include the byte offset for the I/O
-	entry as well as the other data values.
+	entry as well as the other data values. Defaults to 0 meaning that
+	offsets are not present in logs. Also see `Log File Formats`_.
 
 .. option:: log_compression=int
 
@@ -3242,7 +3248,7 @@ numbers denote:
 **ios**
 		Number of I/Os performed by all groups.
 **merge**
-		Number of merges I/O the I/O scheduler.
+		Number of merges performed by the I/O scheduler.
 **ticks**
 		Number of ticks we kept the disk busy.
 **in_queue**
@@ -3278,7 +3284,7 @@ changed for some reason, this number will be incremented by 1 to signify that
 change.
 
 Split up, the format is as follows (comments in brackets denote when a
-field was introduced or whether its specific to some terse version):
+field was introduced or whether it's specific to some terse version):
 
     ::
 
@@ -3531,9 +3537,10 @@ Log File Formats
 Fio supports a variety of log file formats, for logging latencies, bandwidth,
 and IOPS. The logs share a common format, which looks like this:
 
-    *time* (`msec`), *value*, *data direction*, *offset*
+    *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
+    *offset* (`bytes`)
 
-Time for the log entry is always in milliseconds. The *value* logged depends
+*Time* for the log entry is always in milliseconds. The *value* logged depends
 on the type of log, it will be one of the following:
 
     **Latency log**
@@ -3552,16 +3559,17 @@ on the type of log, it will be one of the following:
 	**2**
 		I/O is a TRIM
 
-The *offset* is the offset, in bytes, from the start of the file, for that
-particular I/O. The logging of the offset can be toggled with
-:option:`log_offset`.
+The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
+from the start of the file, for that particular I/O. The logging of the offset can be
+toggled with :option:`log_offset`.
 
 Fio defaults to logging every individual I/O.  When IOPS are logged for individual
-I/Os the value entry will always be 1.  If windowed logging is enabled through
+I/Os the *value* entry will always be 1. If windowed logging is enabled through
 :option:`log_avg_msec`, fio logs the average values over the specified period of time.
 If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
-maximum values in that window instead of averages.  Since 'data direction' and
-'offset' are per-I/O values, they aren't applicable if windowed logging is enabled.
+maximum values in that window instead of averages. Since *data direction*, *block
+size* and *offset* are per-I/O values, if windowed logging is enabled they
+aren't applicable and will be 0.
 
 Client/Server
 -------------
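To make the revised log layout concrete, here is a minimal C sketch of a reader for one log line in the "time, value, data direction, block size, offset" form described above. It assumes the offset column is simply absent when log_offset=0, as the new HOWTO text says. The struct and function names are hypothetical and are not fio's own types.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed log entry. Struct and field names are illustrative only. */
struct log_entry {
	unsigned long long time_ms;	/* always milliseconds */
	unsigned long long value;	/* latency, bandwidth or IOPS */
	unsigned int ddir;		/* 0 = read, 1 = write, 2 = trim */
	unsigned long long bs;		/* block size in bytes, 0 if windowed */
	unsigned long long offset;	/* bytes from start of file */
	int has_offset;			/* offset column present (log_offset=1) */
};

/*
 * Parse "time, value, ddir, bs[, offset]". Returns 0 on success,
 * -1 on a malformed line or an unknown data direction.
 */
static int parse_log_line(const char *line, struct log_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	memset(e, 0, sizeof(*e));
	n = sscanf(line, "%llu, %llu, %u, %llu, %llu", &e->time_ms,
		   &e->value, &e->ddir, &e->bs, &e->offset);
	if (n != 4 && n != 5)
		return -1;
	if (e->ddir > 2)
		return -1;
	e->has_offset = (n == 5);
	return 0;
}
```

With windowed logging (log_avg_msec) the ddir, block size and offset fields would all parse as 0, so a consumer should not treat them as per-I/O data in that mode.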
diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index 31671fd..dd286d0 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -14,6 +14,8 @@
 #define	nop		__asm__ __volatile__ ("nop")
 #define read_barrier()	__sync_synchronize()
 #define write_barrier()	__sync_synchronize()
+#else
+#error "unsupported ARM architecture"
 #endif
 
 #endif
diff --git a/doc/fio_examples.rst b/doc/fio_examples.rst
index ae0ef6f..cff1f39 100644
--- a/doc/fio_examples.rst
+++ b/doc/fio_examples.rst
@@ -60,3 +60,13 @@ Fixed rate submission
 
 .. literalinclude:: ../examples/fixed-rate-submission.fio
 	:language: ini
+
+Butterfly seek pattern
+-----------------------
+
+.. only:: builder_html
+
+:download:`Download butterfly.fio <../examples/butterfly.fio>`
+
+.. literalinclude:: ../examples/butterfly.fio
+	:language: ini
diff --git a/engines/sync.c b/engines/sync.c
index e76bbbb..26b98b6 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -14,6 +14,7 @@
 
 #include "../fio.h"
 #include "../optgroup.h"
+#include "../lib/rand.h"
 
 /*
  * Sync engine uses engine_data to store last offset
@@ -30,12 +31,15 @@ struct syncio_data {
 	unsigned long long last_offset;
 	struct fio_file *last_file;
 	enum fio_ddir last_ddir;
+
+	struct frand_state rand_state;
 };
 
 #ifdef FIO_HAVE_PWRITEV2
 struct psyncv2_options {
 	void *pad;
 	unsigned int hipri;
+	unsigned int hipri_percentage;
 };
 
 static struct fio_option options[] = {
@@ -49,6 +53,18 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "hipri_percentage",
+		.lname	= "RWF_HIPRI_PERCENTAGE",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct psyncv2_options, hipri_percentage),
+		.minval	= 0,
+		.maxval	= 100,
+		.def    = "100",
+		.help	= "Probabilistically set RWF_HIPRI for pwritev2/preadv2",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -132,7 +148,8 @@ static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 
 	fio_ro_check(td, io_u);
 
-	if (o->hipri)
+	if (o->hipri &&
+	    (rand32_between(&sd->rand_state, 1, 100) <= o->hipri_percentage))
 		flags |= RWF_HIPRI;
 
 	iov->iov_base = io_u->xfer_buf;
@@ -363,6 +380,7 @@ static int fio_vsyncio_init(struct thread_data *td)
 	sd->last_offset = -1ULL;
 	sd->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
 	sd->io_us = malloc(td->o.iodepth * sizeof(struct io_u *));
+	init_rand(&sd->rand_state, 0);
 
 	td->io_ops_data = sd;
 	return 0;
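The new hipri_percentage check in fio_pvsyncio2_queue() sets RWF_HIPRI only when a draw from 1..100 is at or below the configured percentage. The C sketch below models that decision and counts how often it fires. The xorshift generator is a hypothetical stand-in for fio's frand_state and rand32_between(), not fio's generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PRNG (xorshift32) standing in for fio's frand_state; not fio's generator. */
struct toy_rand {
	uint32_t s;	/* must be non-zero */
};

static uint32_t toy_rand32(struct toy_rand *r)
{
	uint32_t x = r->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	r->s = x;
	return x;
}

/* Integer in [lo, hi]; the small modulo bias is acceptable for a sketch. */
static uint32_t toy_rand_between(struct toy_rand *r, uint32_t lo, uint32_t hi)
{
	return lo + toy_rand32(r) % (hi - lo + 1);
}

/* Same test as the patched queue function: the flag is set when hipri is on
 * and a draw from 1..100 is <= hipri_percentage. */
static bool want_hipri(struct toy_rand *r, unsigned int hipri,
		       unsigned int hipri_percentage)
{
	assert(hipri_percentage <= 100);
	return hipri && toy_rand_between(r, 1, 100) <= hipri_percentage;
}

/* Count how many of n simulated I/Os would be flagged high priority. */
static unsigned int count_hipri(uint32_t seed, unsigned int n,
				unsigned int hipri, unsigned int pct)
{
	struct toy_rand r = { .s = seed ? seed : 1 };
	unsigned int hits = 0;

	for (unsigned int i = 0; i < n; i++)
		if (want_hipri(&r, hipri, pct))
			hits++;
	return hits;
}
```

Because the draw is never below 1 or above 100, a percentage of 100 (the default) keeps the old always-on behavior and 0 never sets the flag. Values in between flag roughly that share of I/Os.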
diff --git a/examples/butterfly.fio b/examples/butterfly.fio
new file mode 100644
index 0000000..42d253d
--- /dev/null
+++ b/examples/butterfly.fio
@@ -0,0 +1,19 @@
+# Perform a butterfly/funnel seek pattern. This won't always alternate ends on
+# every I/O but it will get close.
+
+[global]
+filename=/tmp/testfile
+bs=4k
+direct=1
+
+[forward]
+rw=read
+flow=2
+# Uncomment the size= and offset= lines to prevent each direction going past
+# the middle of the file
+#size=50%
+
+[backward]
+rw=read:-8k
+flow=-2
+#offset=50%
diff --git a/filesetup.c b/filesetup.c
index 3b2ebd9..839aefc 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -67,7 +67,7 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 	switch (td->o.fallocate_mode) {
 	case FIO_FALLOCATE_NATIVE:
 		r = native_fallocate(td, f);
-		if (r != 0)
+		if (r != 0 && errno != ENOSYS)
 			log_err("fio: native_fallocate call failed: %s\n",
 					strerror(errno));
 		break;
@@ -100,7 +100,6 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 		log_err("fio: unknown fallocate mode: %d\n", td->o.fallocate_mode);
 		assert(0);
 	}
-
 }
 
 /*
diff --git a/fio.1 b/fio.1
index 768b209..a5ec199 100644
--- a/fio.1
+++ b/fio.1
@@ -453,7 +453,7 @@ the same blocks will be written to.
 Fio defaults to read if the option is not specified.
 For mixed I/O, the default split is 50/50. For certain types of io the result
 may still be skewed a bit, since the speed may be different. It is possible to
-specify a number of IO's to do before getting a new offset, this is done by
+specify a number of IOs to do before getting a new offset, this is done by
 appending a `:\fI<nr>\fR to the end of the string given. For a random read, it
 would look like \fBrw=randread:8\fR for passing in an offset modifier with a
 value of 8. If the postfix is used with a sequential IO pattern, then the value
@@ -478,8 +478,8 @@ Generate the same offset
 .P
 \fBsequential\fR is only useful for random IO, where fio would normally
 generate a new random offset for every IO. If you append eg 8 to randread, you
-would get a new random offset for every 8 IO's. The result would be a seek for
-only every 8 IO's, instead of for every IO. Use \fBrw=randread:8\fR to specify
+would get a new random offset for every 8 IOs. The result would be a seek for
+only every 8 IOs, instead of for every IO. Use \fBrw=randread:8\fR to specify
 that. As sequential IO is already sequential, setting \fBsequential\fR for that
 would not result in any differences.  \fBidentical\fR behaves in a similar
 fashion, except it sends the same offset 8 number of times before generating a
@@ -1794,7 +1794,8 @@ logs contain 1216 latency bins. See the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_offset \fR=\fPbool
 If this is set, the iolog options will include the byte offset for the IO
-entry as well as the other data values.
+entry as well as the other data values. Defaults to 0 meaning that offsets are
+not present in logs. See the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI log_compression \fR=\fPint
 If this is set, fio will compress the IO logs as it goes, to keep the memory
@@ -2017,6 +2018,10 @@ iodepth_batch_complete=0).
 Set RWF_HIPRI on IO, indicating to the kernel that it's of
 higher priority than normal.
 .TP
+.BI (pvsync2)hipri_percentage
+When hipri is set this determines the probability of a pvsync2 IO being high
+priority. The default is 100%.
+.TP
 .BI (net,netsplice)hostname \fR=\fPstr
 The host name or IP address to use for TCP or UDP based IO.
 If the job is a TCP listener or UDP reader, the hostname is not
@@ -2251,7 +2256,7 @@ Finally, disk statistics are printed with reads first:
 Number of I/Os performed by all groups.
 .TP
 .B merge
-Number of merges in the I/O scheduler.
+Number of merges performed by the I/O scheduler.
 .TP
 .B ticks
 Number of ticks we kept the disk busy.
@@ -2581,7 +2586,7 @@ the files over and load them from there.
 Fio supports a variety of log file formats, for logging latencies, bandwidth,
 and IOPS. The logs share a common format, which looks like this:
 
-.B time (msec), value, data direction, offset
+.B time (msec), value, data direction, block size (bytes), offset (bytes)
 
 Time for the log entry is always in milliseconds. The value logged depends
 on the type of log, it will be one of the following:
@@ -2616,15 +2621,16 @@ IO is a TRIM
 .PD
 .P
 
-The \fIoffset\fR is the offset, in bytes, from the start of the file, for that
-particular IO. The logging of the offset can be toggled with \fBlog_offset\fR.
+The entry's *block size* is always in bytes. The \fIoffset\fR is the offset, in
+bytes, from the start of the file, for that particular IO. The logging of the
+offset can be toggled with \fBlog_offset\fR.
 
 If windowed logging is enabled through \fBlog_avg_msec\fR, then fio doesn't log
 individual IOs. Instead of logs the average values over the specified
-period of time. Since \fIdata direction\fR and \fIoffset\fR are per-IO values,
-they aren't applicable if windowed logging is enabled. If windowed logging
-is enabled and \fBlog_max_value\fR is set, then fio logs maximum values in
-that window instead of averages.
+period of time. Since \fIdata direction\fR, \fIblock size\fR and \fIoffset\fR
+are per-IO values, if windowed logging is enabled they aren't applicable and
+will be 0. If windowed logging is enabled and \fBlog_max_value\fR is set, then
+fio logs maximum values in that window instead of averages.
 
 For histogram logging the logs look like this:
 
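The fio.1 text above describes the offset modifier for rw=randread:8 with the "sequential" behavior: a new random offset every 8 I/Os, with sequential I/O in between. Below is a minimal C model of that rule, assuming the caller supplies a freshly drawn random block each time. The struct and function names are hypothetical, and wrapping at the end of the file is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative state for rw=randread:N with the "sequential" modifier.
 * Names are hypothetical; end-of-file wrapping is ignored. */
struct seq_mod_state {
	unsigned int nr;	/* N from rw=randread:N */
	unsigned long issued;	/* I/Os issued so far */
	uint64_t next_block;	/* block number for the next I/O */
};

/* Return the block number for the next I/O. 'fresh' is a newly drawn
 * random block; it is only used when a new run of N I/Os starts. */
static uint64_t next_block_no(struct seq_mod_state *s, uint64_t fresh)
{
	assert(s->nr > 0);
	if (s->issued++ % s->nr == 0)
		s->next_block = fresh;
	return s->next_block++;
}
```

The byte offset would be the returned block number times bs. With nr = 3 and random draws of 100 and then 500, the blocks come out as 100, 101, 102, 500, 501, so the device sees one seek per run instead of one per I/O.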
diff --git a/gettime.c b/gettime.c
index 9e5457e..3dcaaf6 100644
--- a/gettime.c
+++ b/gettime.c
@@ -425,6 +425,7 @@ void fio_clock_init(void)
 			fio_clock_source = CS_CPUCLOCK;
 	} else if (fio_clock_source == CS_CPUCLOCK)
 		log_info("fio: clocksource=cpu may not be reliable\n");
+	dprint(FD_TIME, "gettime: clocksource=%d\n", (int) fio_clock_source);
 }
 
 uint64_t ntime_since(const struct timespec *s, const struct timespec *e)
diff --git a/init.c b/init.c
index 90cc0bc..42e7107 100644
--- a/init.c
+++ b/init.c
@@ -781,6 +781,11 @@ static int fixup_options(struct thread_data *td)
 			o->unit_base = 8;
 	}
 
+#ifndef FIO_HAVE_ANY_FALLOCATE
+	/* Platform doesn't support any fallocate so force it to none */
+	o->fallocate_mode = FIO_FALLOCATE_NONE;
+#endif
+
 #ifndef CONFIG_FDATASYNC
 	if (o->fdatasync_blocks) {
 		log_info("fio: this platform does not support fdatasync()"
diff --git a/options.c b/options.c
index b21f09a..5a2ab57 100644
--- a/options.c
+++ b/options.c
@@ -2289,7 +2289,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.parent = "nrfiles",
 		.hide	= 1,
 	},
-#if defined(CONFIG_POSIX_FALLOCATE) || defined(FIO_HAVE_NATIVE_FALLOCATE)
+#ifdef FIO_HAVE_ANY_FALLOCATE
 	{
 		.name	= "fallocate",
 		.lname	= "Fallocate",
@@ -2333,14 +2333,14 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #endif
 		},
 	},
-#else	/* CONFIG_POSIX_FALLOCATE */
+#else	/* FIO_HAVE_ANY_FALLOCATE */
 	{
 		.name	= "fallocate",
 		.lname	= "Fallocate",
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support fallocate",
 	},
-#endif /* CONFIG_POSIX_FALLOCATE || FIO_HAVE_NATIVE_FALLOCATE */
+#endif /* FIO_HAVE_ANY_FALLOCATE */
 	{
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 8a116e6..8d15833 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -5,6 +5,7 @@
 
 #include <errno.h>
 #include <unistd.h>
+#include <sys/endian.h>
 #include <sys/param.h>
 #include <sys/sysctl.h>
 #include <sys/statvfs.h>
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index c7863b5..e6da286 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <sys/sysctl.h>
 #include <sys/disk.h>
+#include <sys/endian.h>
 #include <sys/thr.h>
 #include <sys/socket.h>
 #include <sys/param.h>
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 7be02a7..eac76cf 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -10,6 +10,7 @@
 #include <sys/ioctl.h>
 #include <sys/dkio.h>
 #include <sys/disklabel.h>
+#include <sys/endian.h>
 /* XXX hack to avoid confilcts between rbtree.h and <sys/rb.h> */
 #define	rb_node	_rb_node
 #include <sys/sysctl.h>
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index d874ee2..675bf89 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -9,6 +9,7 @@
 #include <sys/ioctl.h>
 #include <sys/dkio.h>
 #include <sys/disklabel.h>
+#include <sys/endian.h>
 #include <sys/utsname.h>
 /* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #include <sys/sysctl.h>
diff --git a/os/os.h b/os/os.h
index afee9f9..2e15529 100644
--- a/os/os.h
+++ b/os/os.h
@@ -369,4 +369,8 @@ static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t l
 }
 #endif
 
+#if defined(CONFIG_POSIX_FALLOCATE) || defined(FIO_HAVE_NATIVE_FALLOCATE)
+# define FIO_HAVE_ANY_FALLOCATE
+#endif
+
 #endif
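Taken together, the fallocate hunks in this message collapse two capability checks into one umbrella macro (os/os.h). The same macro forces the mode to "none" when neither is available (init.c) and guards the option table (options.c). Separately, filesetup.c now stays quiet when a native fallocate fails with ENOSYS. The C sketch below shows that pattern with hypothetical TOY_* names rather than fio's real enum and macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for fio's fallocate modes. */
enum toy_fallocate_mode {
	TOY_FALLOCATE_NONE,
	TOY_FALLOCATE_POSIX,
	TOY_FALLOCATE_NATIVE,
};

/* Same shape as the os/os.h hunk: one macro for "any fallocate at all". */
#if defined(CONFIG_POSIX_FALLOCATE) || defined(FIO_HAVE_NATIVE_FALLOCATE)
# define TOY_HAVE_ANY_FALLOCATE
#endif

/* Like the init.c fixup: with no fallocate support, force "none". */
static enum toy_fallocate_mode fixup_fallocate_mode(enum toy_fallocate_mode m)
{
#ifdef TOY_HAVE_ANY_FALLOCATE
	return m;
#else
	(void) m;
	return TOY_FALLOCATE_NONE;
#endif
}

/* Like the filesetup.c change: ENOSYS (not implemented) is not reported. */
static bool should_report_native_fallocate_error(int r, int err)
{
	return r != 0 && err != ENOSYS;
}
```

Defining the umbrella macro once keeps the three call sites from drifting apart, which is what the old "#if defined(...) || defined(...)" in options.c risked.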


* Recent changes (master)
@ 2017-07-26 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-26 12:00 UTC
  To: fio

The following changes since commit 800334db17a22029553488b41a5ede8af909c66d:

  Correctly detect whether ioengine_load can exit early (2017-07-19 12:27:57 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e1933299ed9f1525e010e0489f0185c063d6d129:

  drop logging when blkdev invalidation failed on unsupported platforms (2017-07-25 13:56:21 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (4):
      HOWTO: fix "should be to to"
      man: fix broken ioengine specific option format
      HOWTO: add missing [netsplice] for ioengine specific options
      drop logging when blkdev invalidation failed on unsupported platforms

 HOWTO       | 10 +++++-----
 filesetup.c |  2 --
 fio.1       | 12 ++++++------
 3 files changed, 11 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e544634..2fa8fc2 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1255,7 +1255,7 @@ I/O type
 
 	* 60% of accesses should be to the first 10%
 	* 30% of accesses should be to the next 20%
-	* 8% of accesses should be to to the next 30%
+	* 8% of accesses should be to the next 30%
 	* 2% of accesses should be to the next 40%
 
 	we can define that through zoning of the random accesses. For the above
@@ -1889,13 +1889,13 @@ caveat that when used on the command line, they must come after the
 	hostname if the job is a TCP listener or UDP reader. For unix sockets, the
 	normal filename option should be used and the port is invalid.
 
-.. option:: listen : [net]
+.. option:: listen : [netsplice] [net]
 
 	For TCP network connections, tell fio to listen for incoming connections
 	rather than initiating an outgoing connection. The :option:`hostname` must
 	be omitted if this option is used.
 
-.. option:: pingpong : [net]
+.. option:: pingpong : [netsplice] [net]
 
 	Normally a network writer will just continue writing data, and a network
 	reader will just consume packages. If ``pingpong=1`` is set, a writer will
@@ -1907,11 +1907,11 @@ caveat that when used on the command line, they must come after the
 	``pingpong=1`` should only be set for a single reader when multiple readers
 	are listening to the same address.
 
-.. option:: window_size : [net]
+.. option:: window_size : [netsplice] [net]
 
 	Set the desired socket buffer size for the connection.
 
-.. option:: mss : [net]
+.. option:: mss : [netsplice] [net]
 
 	Set the TCP maximum segment size (TCP_MAXSEG).
 
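The HOWTO hunk above uses the 60/10, 30/20, 8/30, 2/40 split as its example of zoned random access (fio's random_distribution=zoned:60/10:30/20:8/30:2/40). The C sketch below shows one simple way such a table can map two uniform random inputs to a block. It is a simplified model under that assumption, not fio's implementation, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A zone: access_pct percent of I/Os go to the next size_pct percent of the file. */
struct zone {
	unsigned int access_pct;
	unsigned int size_pct;
};

/* The split from the HOWTO text. */
static const struct zone example_zones[] = {
	{ 60, 10 }, { 30, 20 }, { 8, 30 }, { 2, 40 },
};

/*
 * Map a zone pick in [0, 100) and a position in [0, 1) to a block number in
 * a file of nr_blocks blocks. Both inputs would come from a uniform RNG.
 */
static uint64_t zoned_block(const struct zone *z, size_t nz, unsigned int pick,
			    double pos, uint64_t nr_blocks)
{
	unsigned int acc = 0, start_pct = 0;
	uint64_t zone_start, zone_len;
	size_t i;

	assert(pick < 100 && pos >= 0.0 && pos < 1.0);
	for (i = 0; i < nz; i++) {
		acc += z[i].access_pct;
		if (pick < acc)
			break;
		start_pct += z[i].size_pct;
	}
	assert(i < nz);	/* access percentages must add up to 100 */

	zone_start = nr_blocks * start_pct / 100;
	zone_len = nr_blocks * z[i].size_pct / 100;
	return zone_start + (uint64_t) (pos * (double) zone_len);
}
```

For a 1000-block file, picks 0..59 land in blocks 0..99, picks 60..89 in blocks 100..299, and so on, so 60% of accesses hit the first 10% of the file.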
diff --git a/filesetup.c b/filesetup.c
index 38ad9ed..3b2ebd9 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -528,8 +528,6 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 		}
 		if (ret < 0)
 			errval = errno;
-		else if (ret) /* probably not supported */
-			errval = ret;
 	} else if (f->filetype == FIO_TYPE_CHAR ||
 		   f->filetype == FIO_TYPE_PIPE) {
 		dprint(FD_IO, "invalidate not supported %s\n", f->file_name);
diff --git a/fio.1 b/fio.1
index 7fe96e0..768b209 100644
--- a/fio.1
+++ b/fio.1
@@ -1143,7 +1143,7 @@ given a criteria of:
 30% of accesses should be to the next 20%
 .RE
 .RS
-8% of accesses should be to to the next 30%
+8% of accesses should be to the next 30%
 .RE
 .RS
 2% of accesses should be to the next 40%
@@ -2069,7 +2069,7 @@ For TCP network connections, tell fio to listen for incoming
 connections rather than initiating an outgoing connection. The
 hostname must be omitted if this option is used.
 .TP
-.BI (net, pingpong) \fR=\fPbool
+.BI (net,netsplice)pingpong \fR=\fPbool
 Normally a network writer will just continue writing data, and a network reader
 will just consume packets. If pingpong=1 is set, a writer will send its normal
 payload to the reader, then wait for the reader to send the same payload back.
@@ -2079,16 +2079,16 @@ completion latency measures how long it took for the other end to receive and
 send back. For UDP multicast traffic pingpong=1 should only be set for a single
 reader when multiple readers are listening to the same address.
 .TP
-.BI (net, window_size) \fR=\fPint
+.BI (net,netsplice)window_size \fR=\fPint
 Set the desired socket buffer size for the connection.
 .TP
-.BI (net, mss) \fR=\fPint
+.BI (net,netsplice)mss \fR=\fPint
 Set the TCP maximum segment size (TCP_MAXSEG).
 .TP
-.BI (e4defrag,donorname) \fR=\fPstr
+.BI (e4defrag)donorname \fR=\fPstr
 File will be used as a block donor (swap extents between files)
 .TP
-.BI (e4defrag,inplace) \fR=\fPint
+.BI (e4defrag)inplace \fR=\fPint
 Configure donor file block allocation strategy
 .RS
 .BI 0(default) :


* Recent changes (master)
@ 2017-07-21 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-21 12:00 UTC
  To: fio

The following changes since commit 785e49c659023df1735bff195ad4ba133ebd23a7:

  build: Sort file list (2017-07-16 14:02:51 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 800334db17a22029553488b41a5ede8af909c66d:

  Correctly detect whether ioengine_load can exit early (2017-07-19 12:27:57 -0700)

----------------------------------------------------------------
Ben Walker (1):
      Correctly detect whether ioengine_load can exit early

 init.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 9b2b63d..90cc0bc 100644
--- a/init.c
+++ b/init.c
@@ -1000,16 +1000,24 @@ int ioengine_load(struct thread_data *td)
 {
 	const char *engine;
 
-	/*
-	 * Engine has already been loaded.
-	 */
-	if (td->io_ops)
-		return 0;
 	if (!td->o.ioengine) {
 		log_err("fio: internal fault, no IO engine specified\n");
 		return 1;
 	}
 
+	if (td->io_ops) {
+		/* An engine is loaded, but the requested ioengine
+		 * may have changed.
+		 */
+		if (!strcmp(td->io_ops->name, td->o.ioengine)) {
+			/* The right engine is already loaded */
+			return 0;
+		}
+
+		/* Unload the old engine. */
+		free_ioengine(td);
+	}
+
 	engine = get_engine_name(td->o.ioengine);
 	td->io_ops = load_ioengine(td, engine);
 	if (!td->io_ops) {
@@ -2530,7 +2538,6 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			}
 
 			if (!ret && !strcmp(opt, "ioengine")) {
-				free_ioengine(td);
 				if (ioengine_load(td)) {
 					put_job(td);
 					td = NULL;
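After this change, ioengine_load() no longer bails out just because some engine is loaded. It keeps the engine only if its name matches the requested one, and otherwise unloads it and loads the new one, so the caller in parse_cmd_line() no longer frees it first. The C sketch below reproduces that decision order with hypothetical, stripped-down stand-ins for thread_data and the engine ops.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, stripped-down stand-ins for fio's thread_data and ioengine_ops. */
struct toy_engine {
	char name[32];
};

struct toy_td {
	const char *ioengine;		/* requested engine (td->o.ioengine) */
	struct toy_engine *io_ops;	/* currently loaded engine, if any */
	int loads;			/* how many times an engine was loaded */
};

static void toy_free_engine(struct toy_td *td)
{
	free(td->io_ops);
	td->io_ops = NULL;
}

/* Same decision order as the patched ioengine_load(). */
static int toy_engine_load(struct toy_td *td)
{
	if (!td->ioengine)
		return 1;			/* nothing requested */

	if (td->io_ops) {
		if (!strcmp(td->io_ops->name, td->ioengine))
			return 0;		/* right engine already loaded */
		toy_free_engine(td);		/* request changed: unload old one */
	}

	td->io_ops = calloc(1, sizeof(*td->io_ops));
	if (!td->io_ops)
		return 1;
	strncpy(td->io_ops->name, td->ioengine, sizeof(td->io_ops->name) - 1);
	td->loads++;
	return 0;
}
```

Calling it twice with the same request loads once; changing the request swaps the engine.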


* Recent changes (master)
@ 2017-07-17 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-17 12:00 UTC
  To: fio

The following changes since commit 74c30eab68f320d73f4c7cf192699cb8bfb14eaf:

  man: fix wrong info on sync_file_range= (2017-07-14 08:25:24 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 785e49c659023df1735bff195ad4ba133ebd23a7:

  build: Sort file list (2017-07-16 14:02:51 -0600)

----------------------------------------------------------------
Bernhard M. Wiedemann (1):
      build: Sort file list

Tomohiro Kusumi (1):
      man: add missing \ for fcntl(2) in write_hint option

 Makefile | 4 ++--
 fio.1    | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index bef930f..e8ea6cb 100644
--- a/Makefile
+++ b/Makefile
@@ -36,8 +36,8 @@ ifdef CONFIG_GFIO
   PROGS += gfio
 endif
 
-SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
-		$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/lib/*.c)) \
+SOURCE :=	$(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
+		$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/lib/*.c))) \
 		gettime.c ioengines.c init.c stat.c log.c time.c filesetup.c \
 		eta.c verify.c memory.c io_u.c parse.c mutex.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
diff --git a/fio.1 b/fio.1
index 8a41321..7fe96e0 100644
--- a/fio.1
+++ b/fio.1
@@ -565,7 +565,7 @@ Advise using \fBFADV_RANDOM\fR
 .RE
 .TP
 .BI write_hint \fR=\fPstr
-Use \fBfcntl\fR|(2) to advise the kernel what life time to expect from a write.
+Use \fBfcntl\fR\|(2) to advise the kernel what life time to expect from a write.
 Only supported on Linux, as of version 4.13. The values are all relative to
 each other, and no absolute meaning should be associated with them. Accepted
 values are:


* Recent changes (master)
@ 2017-07-15 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-15 12:00 UTC
  To: fio

The following changes since commit 8f4b9f2475a4524067b6a0662aff006783919922:

  Update documentation for write_hint (2017-07-13 09:39:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 74c30eab68f320d73f4c7cf192699cb8bfb14eaf:

  man: fix wrong info on sync_file_range= (2017-07-14 08:25:24 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (1):
      man: fix wrong info on sync_file_range=

 fio.1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index 5da9ecf..8a41321 100644
--- a/fio.1
+++ b/fio.1
@@ -1071,7 +1071,7 @@ SYNC_FILE_RANGE_WAIT_BEFORE
 SYNC_FILE_RANGE_WRITE
 .TP
 .B wait_after
-SYNC_FILE_RANGE_WRITE
+SYNC_FILE_RANGE_WAIT_AFTER
 .TP
 .RE
 .P
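The man page fix above pairs each keyword with its sync_file_range(2) flag: wait_before, write and wait_after map to SYNC_FILE_RANGE_WAIT_BEFORE, SYNC_FILE_RANGE_WRITE and SYNC_FILE_RANGE_WAIT_AFTER. The C sketch below turns such a comma-separated list into a flags word. It is restricted to the three keywords visible in this hunk, and the function name is hypothetical, not fio's parser.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes SYNC_FILE_RANGE_* in glibc's <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>

#ifndef SYNC_FILE_RANGE_WAIT_BEFORE
/* Linux values, defined here only so the sketch also builds elsewhere. */
#define SYNC_FILE_RANGE_WAIT_BEFORE	1
#define SYNC_FILE_RANGE_WRITE		2
#define SYNC_FILE_RANGE_WAIT_AFTER	4
#endif

/*
 * Turn a comma-separated list such as "wait_before,write,wait_after" into
 * sync_file_range(2) flags. Returns 0 if any keyword is unknown.
 */
static unsigned int sfr_flags(const char *list)
{
	static const struct {
		const char *name;
		unsigned int flag;
	} map[] = {
		{ "wait_before", SYNC_FILE_RANGE_WAIT_BEFORE },
		{ "write", SYNC_FILE_RANGE_WRITE },
		{ "wait_after", SYNC_FILE_RANGE_WAIT_AFTER },
	};
	const size_t nmap = sizeof(map) / sizeof(map[0]);
	unsigned int flags = 0;

	assert(list != NULL);
	while (*list) {
		size_t len = strcspn(list, ",");
		size_t i;

		for (i = 0; i < nmap; i++)
			if (strlen(map[i].name) == len &&
			    !strncmp(list, map[i].name, len))
				break;
		if (i == nmap)
			return 0;
		flags |= map[i].flag;
		list += len;
		if (*list == ',')
			list++;
	}
	return flags;
}
```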


* Recent changes (master)
@ 2017-07-14 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-14 12:00 UTC
  To: fio

The following changes since commit 43f466e667a8bcfc58c1c69b0897fe0345c34841:

  parse: enable options to be marked dont-free (2017-07-12 16:44:07 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8f4b9f2475a4524067b6a0662aff006783919922:

  Update documentation for write_hint (2017-07-13 09:39:01 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Update documentation for write_hint

 HOWTO | 26 ++++++++++++++++++++++----
 fio.1 | 28 ++++++++++++++++++++++++----
 2 files changed, 46 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0b80a62..e544634 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1093,11 +1093,29 @@ I/O type
 		**random**
 			Advise using **FADV_RANDOM**.
 
-.. option:: fadvise_stream=int
+.. option:: write_hint=str
 
-	Use :manpage:`posix_fadvise(2)` to advise the kernel what stream ID the
-	writes issued belong to. Only supported on Linux. Note, this option may
-	change going forward.
+	Use :manpage:`fcntl(2)` to advise the kernel what life time to expect
+	from a write. Only supported on Linux, as of version 4.13. Accepted
+	values are:
+
+		**none**
+			No particular life time associated with this file.
+
+		**short**
+			Data written to this file has a short life time.
+
+		**medium**
+			Data written to this file has a medium life time.
+
+		**long**
+			Data written to this file has a long life time.
+
+		**extreme**
+			Data written to this file has a very long life time.
+
+	The values are all relative to each other, and no absolute meaning
+	should be associated with them.
 
 .. option:: offset=int
 
diff --git a/fio.1 b/fio.1
index bc477a2..5da9ecf 100644
--- a/fio.1
+++ b/fio.1
@@ -564,10 +564,30 @@ Advise using \fBFADV_RANDOM\fR
 .RE
 .RE
 .TP
-.BI fadvise_stream \fR=\fPint
-Use \fBposix_fadvise\fR\|(2) to advise the kernel what stream ID the
-writes issued belong to. Only supported on Linux. Note, this option
-may change going forward.
+.BI write_hint \fR=\fPstr
+Use \fBfcntl\fR|(2) to advise the kernel what life time to expect from a write.
+Only supported on Linux, as of version 4.13. The values are all relative to
+each other, and no absolute meaning should be associated with them. Accepted
+values are:
+.RS
+.RS
+.TP
+.B none
+No particular life time associated with this file.
+.TP
+.B short
+Data written to this file has a short life time.
+.TP
+.B medium
+Data written to this file has a medium life time.
+.TP
+.B long
+Data written to this file has a long life time.
+.TP
+.B extreme
+Data written to this file has a very long life time.
+.RE
+.RE
 .TP
 .BI size \fR=\fPint
 Total size of I/O for this job.  \fBfio\fR will run until this many bytes have
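The new write_hint option passes a write life-time hint to the kernel through fcntl(2), available since Linux 4.13. The C sketch below maps the documented keywords to hint values and applies one with F_SET_RW_HINT. It assumes each keyword maps to the matching RWH_WRITE_LIFE_* constant, and the helper names are hypothetical. Where the headers lack F_SET_RW_HINT, it fails with ENOTSUP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* F_SET_RW_HINT / RWH_* in glibc's <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

/* Values from the Linux uapi <linux/fcntl.h>, defined locally in case the
 * libc headers predate Linux 4.13. */
#ifndef RWH_WRITE_LIFE_NONE
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/* Map a write_hint= keyword to a hint value, or -1 if unknown. */
static int64_t write_hint_value(const char *s)
{
	static const struct {
		const char *name;
		int64_t val;
	} map[] = {
		{ "none", RWH_WRITE_LIFE_NONE },
		{ "short", RWH_WRITE_LIFE_SHORT },
		{ "medium", RWH_WRITE_LIFE_MEDIUM },
		{ "long", RWH_WRITE_LIFE_LONG },
		{ "extreme", RWH_WRITE_LIFE_EXTREME },
	};
	size_t i;

	assert(s != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(s, map[i].name))
			return map[i].val;
	return -1;
}

/* Attach the hint to an open file. Returns 0, or -1 with errno set. */
static int apply_write_hint(int fd, const char *s)
{
	int64_t v = write_hint_value(s);
	uint64_t hint;

	if (v < 0) {
		errno = EINVAL;
		return -1;
	}
	hint = (uint64_t) v;
#ifdef F_SET_RW_HINT
	/* The kernel reads the hint through a pointer to a 64-bit value. */
	return fcntl(fd, F_SET_RW_HINT, &hint);
#else
	(void) fd;
	(void) hint;
	errno = ENOTSUP;
	return -1;
#endif
}
```

As the documentation stresses, the values only order the hints relative to each other; the kernel attaches no absolute lifetime to them.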


* Recent changes (master)
@ 2017-07-13 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-13 12:00 UTC
  To: fio

The following changes since commit ef2f4a50c25b3315d8825eb5e6fdfd6d57a47b74:

  return correct error code for unhandled addr. (2017-07-11 00:28:45 +0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 43f466e667a8bcfc58c1c69b0897fe0345c34841:

  parse: enable options to be marked dont-free (2017-07-12 16:44:07 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      profiles/act: use the right options
      parse: enable options to be marked dont-free

 parse.c             |  2 +-
 parse.h             |  1 +
 profiles/act.c      | 35 +++++++++++++++--------------------
 profiles/tiobench.c |  1 +
 4 files changed, 18 insertions(+), 21 deletions(-)

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index 4d4fddd..ecce8b8 100644
--- a/parse.c
+++ b/parse.c
@@ -1345,7 +1345,7 @@ void options_free(struct fio_option *options, void *data)
 	dprint(FD_PARSE, "free options\n");
 
 	for (o = &options[0]; o->name; o++) {
-		if (o->type != FIO_OPT_STR_STORE || !o->off1)
+		if (o->type != FIO_OPT_STR_STORE || !o->off1 || o->no_free)
 			continue;
 
 		ptr = td_var(data, o, o->off1);
diff --git a/parse.h b/parse.h
index fb6abd1..dfe7f16 100644
--- a/parse.h
+++ b/parse.h
@@ -78,6 +78,7 @@ struct fio_option {
 	int is_time;			/* time based value */
 	int no_warn_def;
 	int pow2;			/* must be a power-of-2 */
+	int no_free;
 };
 
 extern int parse_option(char *, const char *, struct fio_option *, struct fio_option **, void *, struct flist_head *);
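The new no_free field lets an option opt out of the generic string cleanup in options_free(), which the act and tiobench profiles then set on their string options. The C sketch below uses a hypothetical miniature option table to show the skip. The slot pointer plays the role of td_var(data, o, o->off1).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of an option table: each string option points
 * at the storage that holds its value. */
struct toy_opt {
	const char *name;	/* NULL terminates the table */
	char **slot;		/* where the option's string lives */
	int no_free;		/* owner frees it; skip in the generic pass */
};

/* Mirrors the patched options_free() loop: entries without storage or
 * marked no_free are skipped. Returns the number of strings released. */
static int toy_options_free(struct toy_opt *opts)
{
	int freed = 0;

	assert(opts != NULL);
	for (struct toy_opt *o = opts; o->name; o++) {
		if (!o->slot || o->no_free)
			continue;
		free(*o->slot);
		*o->slot = NULL;
		freed++;
	}
	return freed;
}
```

An entry marked no_free keeps its pointer intact after the pass, so whichever code owns it must release it.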
diff --git a/profiles/act.c b/profiles/act.c
index 59e5005..4669535 100644
--- a/profiles/act.c
+++ b/profiles/act.c
@@ -53,14 +53,6 @@ struct act_prof_data {
 	unsigned int nr_slices;
 };
 
-static char *device_names;
-static unsigned int load;
-static unsigned int prep;
-static unsigned int threads_per_queue;
-static unsigned int num_read_blocks;
-static unsigned int write_size;
-static unsigned long long test_duration;
-
 #define ACT_MAX_OPTS	128
 static const char *act_opts[ACT_MAX_OPTS] = {
 	"direct=1",
@@ -97,6 +89,7 @@ static struct fio_option options[] = {
 		.help	= "Devices to use",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_ACT,
+		.no_free = true,
 	},
 	{
 		.name	= "load",
@@ -185,6 +178,8 @@ static int act_add_opt(const char *str, ...)
 
 static int act_add_rw(const char *dev, int reads)
 {
+	struct act_options *ao = &act_options;
+
 	if (act_add_opt("name=act-%s-%s", reads ? "read" : "write", dev))
 		return 1;
 	if (act_add_opt("filename=%s", dev))
@@ -192,21 +187,21 @@ static int act_add_rw(const char *dev, int reads)
 	if (act_add_opt("rw=%s", reads ? "randread" : "randwrite"))
 		return 1;
 	if (reads) {
-		int rload = load * R_LOAD / threads_per_queue;
+		int rload = ao->load * R_LOAD / ao->threads_per_queue;
 
-		if (act_add_opt("numjobs=%u", threads_per_queue))
+		if (act_add_opt("numjobs=%u", ao->threads_per_queue))
 			return 1;
 		if (act_add_opt("rate_iops=%u", rload))
 			return 1;
-		if (act_add_opt("bs=%u", num_read_blocks * 512))
+		if (act_add_opt("bs=%u", ao->num_read_blocks * 512))
 			return 1;
 	} else {
-		const int rsize = write_size / (num_read_blocks * 512);
-		int wload = (load * W_LOAD + rsize - 1) / rsize;
+		const int rsize = ao->write_size / (ao->num_read_blocks * 512);
+		int wload = (ao->load * W_LOAD + rsize - 1) / rsize;
 
 		if (act_add_opt("rate_iops=%u", wload))
 			return 1;
-		if (act_add_opt("bs=%u", write_size))
+		if (act_add_opt("bs=%u", ao->write_size))
 			return 1;
 	}
 
@@ -248,10 +243,10 @@ static int act_add_dev_prep(const char *dev)
 
 static int act_add_dev(const char *dev)
 {
-	if (prep)
+	if (act_options.prep)
 		return act_add_dev_prep(dev);
 
-	if (act_add_opt("runtime=%llus", test_duration))
+	if (act_add_opt("runtime=%llus", act_options.test_duration))
 		return 1;
 	if (act_add_opt("time_based=1"))
 		return 1;
@@ -269,7 +264,7 @@ static int act_add_dev(const char *dev)
  */
 static int act_prep_cmdline(void)
 {
-	if (!device_names) {
+	if (!act_options.device_names) {
 		log_err("act: you need to set IO target(s) with the "
 			"device-names option.\n");
 		return 1;
@@ -280,7 +275,7 @@ static int act_prep_cmdline(void)
 	do {
 		char *dev;
 
-		dev = strsep(&device_names, ",");
+		dev = strsep(&act_options.device_names, ",");
 		if (!dev)
 			break;
 
@@ -300,7 +295,7 @@ static int act_io_u_lat(struct thread_data *td, uint64_t usec)
 	int i, ret = 0;
 	double perm;
 
-	if (prep)
+	if (act_options.prep)
 		return 0;
 
 	/*
@@ -431,7 +426,7 @@ static int act_td_init(struct thread_data *td)
 	get_act_ref();
 
 	apd = calloc(1, sizeof(*apd));
-	nr_slices = (test_duration + SAMPLE_SEC - 1) / SAMPLE_SEC;
+	nr_slices = (act_options.test_duration + SAMPLE_SEC - 1) / SAMPLE_SEC;
 	apd->slices = calloc(nr_slices, sizeof(struct act_slice));
 	apd->nr_slices = nr_slices;
 	fio_gettime(&apd->sample_tv, NULL);
diff --git a/profiles/tiobench.c b/profiles/tiobench.c
index 9d9885a..f19a085 100644
--- a/profiles/tiobench.c
+++ b/profiles/tiobench.c
@@ -70,6 +70,7 @@ static struct fio_option options[] = {
 		.help	= "Test directory",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_TIOBENCH,
+		.no_free = true,
 	},
 	{
 		.name	= "threads",

* Recent changes (master)
@ 2017-07-11 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-11 12:00 UTC
  To: fio

The following changes since commit 6210cf66316e25498ab2e445731b3bb6b886c363:

  Fio 2.99 (2017-07-07 12:45:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ef2f4a50c25b3315d8825eb5e6fdfd6d57a47b74:

  return correct error code for unhandled addr. (2017-07-11 00:28:45 +0800)

----------------------------------------------------------------
Pan Liu (1):
      return correct error code for unhandled addr.

 engines/rbd.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index 4bae425..5b51a39 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -517,6 +517,7 @@ static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
 	} else {
 		dprint(FD_IO, "%s: Warning: unhandled ddir: %d\n", __func__,
 		       io_u->ddir);
+		r = -EINVAL;
 		goto failed_comp;
 	}
 

* Recent changes (master)
@ 2017-07-08 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-08 12:00 UTC
  To: fio

The following changes since commit f25f4ef6d123173d7669553ec712419f95c1a3ea:

  oslib/libmtd: kill dead code (2017-07-06 08:09:22 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6210cf66316e25498ab2e445731b3bb6b886c363:

  Fio 2.99 (2017-07-07 12:45:04 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      io_u: sequence random buflen generation individually
      init: add comment as to how we seed read/write/trim generators
      Fio 2.99

 FIO-VERSION-GEN        |  2 +-
 fio.h                  |  4 +++-
 init.c                 | 17 ++++++++++++++++-
 io_u.c                 |  4 ++--
 os/windows/install.wxs |  2 +-
 5 files changed, 23 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 3b4f206..f82aeee 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.21
+DEF_VER=fio-2.99
 
 LF='
 '
diff --git a/fio.h b/fio.h
index d5d6bfe..39d775c 100644
--- a/fio.h
+++ b/fio.h
@@ -91,6 +91,8 @@ enum {
 
 enum {
 	FIO_RAND_BS_OFF		= 0,
+	FIO_RAND_BS1_OFF,
+	FIO_RAND_BS2_OFF,
 	FIO_RAND_VER_OFF,
 	FIO_RAND_MIX_OFF,
 	FIO_RAND_FILE_OFF,
@@ -214,7 +216,7 @@ struct thread_data {
 
 	unsigned long rand_seeds[FIO_RAND_NR_OFFS];
 
-	struct frand_state bsrange_state;
+	struct frand_state bsrange_state[DDIR_RWDIR_CNT];
 	struct frand_state verify_state;
 	struct frand_state trim_state;
 	struct frand_state delay_state;
diff --git a/init.c b/init.c
index a4b5adb..9b2b63d 100644
--- a/init.c
+++ b/init.c
@@ -921,7 +921,22 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
 	int i;
 
-	init_rand_seed(&td->bsrange_state, td->rand_seeds[FIO_RAND_BS_OFF], use64);
+	/*
+	 * trimwrite is special in that we need to generate the same
+	 * offsets to get the "write after trim" effect. If we are
+	 * using bssplit to set buffer length distributions, ensure that
+	 * we seed the trim and write generators identically.
+	 */
+	if (td_trimwrite(td)) {
+		init_rand_seed(&td->bsrange_state[DDIR_READ], td->rand_seeds[FIO_RAND_BS_OFF], use64);
+		init_rand_seed(&td->bsrange_state[DDIR_WRITE], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
+		init_rand_seed(&td->bsrange_state[DDIR_TRIM], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
+	} else {
+		init_rand_seed(&td->bsrange_state[DDIR_READ], td->rand_seeds[FIO_RAND_BS_OFF], use64);
+		init_rand_seed(&td->bsrange_state[DDIR_WRITE], td->rand_seeds[FIO_RAND_BS1_OFF], use64);
+		init_rand_seed(&td->bsrange_state[DDIR_TRIM], td->rand_seeds[FIO_RAND_BS2_OFF], use64);
+	}
+
 	td_fill_verify_state_seed(td);
 	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);
 
diff --git a/io_u.c b/io_u.c
index 8d42d65..ed8e84a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -552,9 +552,9 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 	if (!io_u_fits(td, io_u, minbs))
 		return 0;
 
-	frand_max = rand_max(&td->bsrange_state);
+	frand_max = rand_max(&td->bsrange_state[ddir]);
 	do {
-		r = __rand(&td->bsrange_state);
+		r = __rand(&td->bsrange_state[ddir]);
 
 		if (!td->o.bssplit_nr[ddir]) {
 			buflen = 1 + (unsigned int) ((double) maxbs *
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 860570a..500d64c 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.21">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.99">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

* Recent changes (master)
@ 2017-07-07 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-07 12:00 UTC
  To: fio

The following changes since commit b034c0dd2cdb27d3523b300c1b4b93a1c5b84b3c:

  HOWTO: fix indentation for options and job parameters (2017-07-04 16:07:50 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f25f4ef6d123173d7669553ec712419f95c1a3ea:

  oslib/libmtd: kill dead code (2017-07-06 08:09:22 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      oslib/libmtd: kill dead code

 oslib/libmtd_common.h | 42 ------------------------------------------
 1 file changed, 42 deletions(-)

---

Diff of recent changes:

diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index 3f9e1b8..35628fe 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -119,48 +119,6 @@ extern "C" {
 	fprintf(stderr, "%s: warning!: " fmt "\n", PROGRAM_NAME, ##__VA_ARGS__); \
 } while(0)
 
-static inline int mtd_rpmatch(const char *resp)
-{
-    return (resp[0] == 'y' || resp[0] == 'Y') ? 1 :
-	(resp[0] == 'n' || resp[0] == 'N') ? 0 : -1;
-}
-
-/**
- * prompt the user for confirmation
- */
-static inline bool prompt(const char *msg, bool def)
-{
-	char *line = NULL;
-	size_t len;
-	bool ret = def;
-
-	do {
-		normsg_cont("%s (%c/%c) ", msg, def ? 'Y' : 'y', def ? 'n' : 'N');
-		fflush(stdout);
-
-		while (getline(&line, &len, stdin) == -1) {
-			printf("failed to read prompt; assuming '%s'\n",
-				def ? "yes" : "no");
-			break;
-		}
-
-		if (strcmp("\n", line) != 0) {
-			switch (mtd_rpmatch(line)) {
-			case 0: ret = false; break;
-			case 1: ret = true; break;
-			case -1:
-				puts("unknown response; please try again");
-				continue;
-			}
-		}
-		break;
-	} while (1);
-
-	free(line);
-
-	return ret;
-}
-
 static inline int is_power_of_2(unsigned long long n)
 {
 	return (n != 0 && ((n & (n - 1)) == 0));

* Recent changes (master)
@ 2017-07-05 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-05 12:00 UTC
  To: fio

The following changes since commit 1be7afd7329ebdca520d637f571d2b31c33f6ba1:

  Merge branch 'fallocate_native' of https://github.com/sitsofe/fio (2017-07-03 16:51:31 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b034c0dd2cdb27d3523b300c1b4b93a1c5b84b3c:

  HOWTO: fix indentation for options and job parameters (2017-07-04 16:07:50 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (7):
      man: sync OPTIONS section with HOWTO
      man: sync "JOB FILE FORMAT" section with HOWTO
      man: sync "JOB FILE PARAMETERS" section with HOWTO
      man: sync "PARAMETER TYPES" section with HOWTO
      man: refer to REPORTING-BUGS for bug reporting
      HOWTO: minor backports from the man page
      HOWTO: fix indentation for options and job parameters

 HOWTO  | 159 +++++++++++++++++++++--------------------
 README |   3 +-
 fio.1  | 250 ++++++++++++++++++++++++++++++++++++++++++++---------------------
 3 files changed, 251 insertions(+), 161 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 92c3b73..0b80a62 100644
--- a/HOWTO
+++ b/HOWTO
@@ -54,56 +54,66 @@ Command line options
 
 .. option:: --debug=type
 
-    Enable verbose tracing of various fio actions.  May be ``all`` for all types
-    or individual types separated by a comma (e.g. ``--debug=file,mem`` will
-    enable file and memory debugging).  Currently, additional logging is
-    available for:
+	Enable verbose tracing of various fio actions.  May be ``all`` for all types
+	or individual types separated by a comma (e.g. ``--debug=file,mem`` will
+	enable file and memory debugging).  Currently, additional logging is
+	available for:
 
-    *process*
+	*process*
 			Dump info related to processes.
-    *file*
+	*file*
 			Dump info related to file actions.
-    *io*
+	*io*
 			Dump info related to I/O queuing.
-    *mem*
+	*mem*
 			Dump info related to memory allocations.
-    *blktrace*
+	*blktrace*
 			Dump info related to blktrace setup.
-    *verify*
+	*verify*
 			Dump info related to I/O verification.
-    *all*
+	*all*
 			Enable all debug options.
-    *random*
+	*random*
 			Dump info related to random offset generation.
-    *parse*
+	*parse*
 			Dump info related to option matching and parsing.
-    *diskutil*
+	*diskutil*
 			Dump info related to disk utilization updates.
-    *job:x*
+	*job:x*
 			Dump info only related to job number x.
-    *mutex*
+	*mutex*
 			Dump info only related to mutex up/down ops.
-    *profile*
+	*profile*
 			Dump info related to profile extensions.
-    *time*
+	*time*
 			Dump info related to internal time keeping.
-    *net*
+	*net*
 			Dump info related to networking connections.
-    *rate*
+	*rate*
 			Dump info related to I/O rate switching.
-    *compress*
+	*compress*
 			Dump info related to log compress/decompress.
-    *?* or *help*
+	*?* or *help*
 			Show available debug options.
 
 .. option:: --parse-only
 
-    Parse options only, don\'t start any I/O.
+	Parse options only, don\'t start any I/O.
 
 .. option:: --output=filename
 
 	Write output to file `filename`.
 
+.. option:: --output-format=type
+
+	Set the reporting format to `normal`, `terse`, `json`, or `json+`.  Multiple
+	formats can be selected, separated by a comma.  `terse` is a CSV based
+	format.  `json+` is like `json`, except it adds a full dump of the latency
+	buckets.
+
+.. option:: --runtime
+	Limit run time to runtime seconds.
+
 .. option:: --bandwidth-log
 
 	Generate aggregate bandwidth logs.
@@ -114,16 +124,9 @@ Command line options
 
 .. option:: --append-terse
 
-    Print statistics in selected mode AND terse, semicolon-delimited format.
-    **deprecated**, use :option:`--output-format` instead to select multiple
-    formats.
-
-.. option:: --output-format=type
-
-	Set the reporting format to `normal`, `terse`, `json`, or `json+`.  Multiple
-	formats can be selected, separated by a comma.  `terse` is a CSV based
-	format.  `json+` is like `json`, except it adds a full dump of the latency
-	buckets.
+	Print statistics in selected mode AND terse, semicolon-delimited format.
+	**Deprecated**, use :option:`--output-format` instead to select multiple
+	formats.
 
 .. option:: --terse-version=type
 
@@ -131,7 +134,7 @@ Command line options
 
 .. option:: --version
 
-	Print version info and exit.
+	Print version information and exit.
 
 .. option:: --help
 
@@ -143,9 +146,9 @@ Command line options
 
 .. option:: --crctest=[test]
 
-    Test the speed of the built-in checksumming functions. If no argument is
-    given all of them are tested. Alternatively, a comma separated list can be passed, in
-    which case the given ones are tested.
+	Test the speed of the built-in checksumming functions. If no argument is
+	given, all of them are tested. Alternatively, a comma separated list can
+	be passed, in which case the given ones are tested.
 
 .. option:: --cmdhelp=command
 
@@ -153,27 +156,27 @@ Command line options
 
 .. option:: --enghelp=[ioengine[,command]]
 
-    List all commands defined by :option:`ioengine`, or print help for `command`
-    defined by :option:`ioengine`.  If no :option:`ioengine` is given, list all
-    available ioengines.
+	List all commands defined by :option:`ioengine`, or print help for `command`
+	defined by :option:`ioengine`.  If no :option:`ioengine` is given, list all
+	available ioengines.
 
 .. option:: --showcmd=jobfile
 
-	Turn a job file into command line options.
+	Convert `jobfile` to a set of command-line options.
 
 .. option:: --readonly
 
-    Turn on safety read-only checks, preventing writes.  The ``--readonly``
-    option is an extra safety guard to prevent users from accidentally starting
-    a write workload when that is not desired.  Fio will only write if
-    `rw=write/randwrite/rw/randrw` is given.  This extra safety net can be used
-    as an extra precaution as ``--readonly`` will also enable a write check in
-    the I/O engine core to prevent writes due to unknown user space bug(s).
+	Turn on safety read-only checks, preventing writes.  The ``--readonly``
+	option is an extra safety guard to prevent users from accidentally starting
+	a write workload when that is not desired.  Fio will only write if
+	`rw=write/randwrite/rw/randrw` is given.  This extra safety net can be used
+	as an extra precaution as ``--readonly`` will also enable a write check in
+	the I/O engine core to prevent writes due to unknown user space bug(s).
 
 .. option:: --eta=when
 
-	When real-time ETA estimate should be printed.  May be `always`, `never` or
-	`auto`.
+	Specifies when real-time ETA estimate should be printed.  `when` may be
+	`always`, `never` or `auto`.
 
 .. option:: --eta-newline=time
 
@@ -187,48 +190,48 @@ Command line options
 
 .. option:: --section=name
 
-    Only run specified section in job file.  Multiple sections can be specified.
-    The ``--section`` option allows one to combine related jobs into one file.
-    E.g. one job file could define light, moderate, and heavy sections. Tell
-    fio to run only the "heavy" section by giving ``--section=heavy``
-    command line option.  One can also specify the "write" operations in one
-    section and "verify" operation in another section.  The ``--section`` option
-    only applies to job sections.  The reserved *global* section is always
-    parsed and used.
+	Only run specified section `name` in job file.  Multiple sections can be specified.
+	The ``--section`` option allows one to combine related jobs into one file.
+	E.g. one job file could define light, moderate, and heavy sections. Tell
+	fio to run only the "heavy" section by giving ``--section=heavy``
+	command line option.  One can also specify the "write" operations in one
+	section and "verify" operation in another section.  The ``--section`` option
+	only applies to job sections.  The reserved *global* section is always
+	parsed and used.
 
 .. option:: --alloc-size=kb
 
-    Set the internal smalloc pool to this size in KiB.  The
-    ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
-    If running large jobs with randommap enabled, fio can run out of memory.
-    Smalloc is an internal allocator for shared structures from a fixed size
-    memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
+	Set the internal smalloc pool size to `kb` in KiB.  The
+	``--alloc-size`` switch allows one to use a larger pool size for smalloc.
+	If running large jobs with randommap enabled, fio can run out of memory.
+	Smalloc is an internal allocator for shared structures from a fixed size
+	memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
 
-    NOTE: While running :file:`.fio_smalloc.*` backing store files are visible
-    in :file:`/tmp`.
+	NOTE: While running :file:`.fio_smalloc.*` backing store files are visible
+	in :file:`/tmp`.
 
 .. option:: --warnings-fatal
 
-    All fio parser warnings are fatal, causing fio to exit with an
-    error.
+	All fio parser warnings are fatal, causing fio to exit with an
+	error.
 
 .. option:: --max-jobs=nr
 
-	Maximum number of threads/processes to support.
+	Set the maximum number of threads/processes to support.
 
 .. option:: --server=args
 
-    Start a backend server, with `args` specifying what to listen to.
-    See `Client/Server`_ section.
+	Start a backend server, with `args` specifying what to listen to.
+	See `Client/Server`_ section.
 
 .. option:: --daemonize=pidfile
 
-    Background a fio server, writing the pid to the given `pidfile` file.
+	Background a fio server, writing the pid to the given `pidfile` file.
 
 .. option:: --client=hostname
 
-    Instead of running the jobs locally, send and run them on the given host or
-    set of hosts.  See `Client/Server`_ section.
+	Instead of running the jobs locally, send and run them on the given host or
+	set of hosts.  See `Client/Server`_ section.
 
 .. option:: --remote-config=file
 
@@ -236,7 +239,7 @@ Command line options
 
 .. option:: --idle-prof=option
 
-	Report CPU idleness. *option* is one of the following:
+	Report CPU idleness. `option` is one of the following:
 
 		**calibrate**
 			Run unit work calibration only and exit.
@@ -445,7 +448,7 @@ automatically substituted with the current system values when the job is
 run. Simple math is also supported on these keywords, so you can perform actions
 like::
 
-        size=8*$mb_memory
+	size=8*$mb_memory
 
 and get that properly expanded to 8 times the size of memory in the machine.
 
@@ -474,7 +477,7 @@ Parameter types
 ~~~~~~~~~~~~~~~
 
 **str**
-    String. This is a sequence of alpha characters.
+	String: A sequence of alphanumeric characters.
 
 **time**
 	Integer with possible time suffix.  Without a unit value is interpreted as
@@ -488,7 +491,7 @@ Parameter types
 	Integer. A whole number value, which may contain an integer prefix
 	and an integer suffix:
 
-        [*integer prefix*] **number** [*integer suffix*]
+	[*integer prefix*] **number** [*integer suffix*]
 
 	The optional *integer prefix* specifies the number's base. The default
 	is decimal. *0x* specifies hexadecimal.
@@ -521,7 +524,7 @@ Parameter types
 	compatibility with old scripts.  For example, 4k means 4096.
 
 	For quantities of data, an optional unit of 'B' may be included
-	(e.g.,  'kB' is the same as 'k').
+	(e.g., 'kB' is the same as 'k').
 
 	The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
 	not milli). 'b' and 'B' both mean byte, not bit.
@@ -3542,7 +3545,7 @@ If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
 maximum values in that window instead of averages.  Since 'data direction' and
 'offset' are per-I/O values, they aren't applicable if windowed logging is enabled.
 
-Client/server
+Client/Server
 -------------
 
 Normally fio is invoked as a stand-alone application on the machine where the
diff --git a/README b/README
index ec3e9c0..a6eba8f 100644
--- a/README
+++ b/README
@@ -59,7 +59,8 @@ Mailing list
 ------------
 
 The fio project mailing list is meant for anything related to fio including
-general discussion, bug reporting, questions, and development.
+general discussion, bug reporting, questions, and development. For bug reporting,
+see REPORTING-BUGS.
 
 An automated mail detailing recent commits is automatically sent to the list at
 most daily. The list address is fio@vger.kernel.org, subscribe by sending an
diff --git a/fio.1 b/fio.1
index 9783646..bc477a2 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "June 2017" "User Manual"
+.TH fio 1 "July 2017" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -14,8 +14,11 @@ one wants to simulate.
 .TP
 .BI \-\-debug \fR=\fPtype
 Enable verbose tracing of various fio actions. May be `all' for all types
-or individual types separated by a comma (eg \-\-debug=io,file). `help' will
-list all available tracing options.
+or individual types separated by a comma (e.g. \-\-debug=file,mem will enable
+file and memory debugging). `help' will list all available tracing options.
+.TP
+.BI \-\-parse-only
+Parse options only, don't start any I/O.
 .TP
 .BI \-\-output \fR=\fPfilename
 Write output to \fIfilename\fR.
@@ -39,89 +42,152 @@ Print statistics in a terse, semicolon-delimited format.
 Print statistics in selected mode AND terse, semicolon-delimited format.
 Deprecated, use \-\-output-format instead to select multiple formats.
 .TP
-.B \-\-version
-Display version information and exit.
-.TP
 .BI \-\-terse\-version \fR=\fPversion
 Set terse version output format (default 3, or 2, 4, 5)
 .TP
+.B \-\-version
+Print version information and exit.
+.TP
 .B \-\-help
-Display usage information and exit.
+Print a summary of the command line options and exit.
 .TP
 .B \-\-cpuclock-test
-Perform test and validation of internal CPU clock
+Perform test and validation of internal CPU clock.
 .TP
-.BI \-\-crctest[\fR=\fPtest]
-Test the speed of the builtin checksumming functions. If no argument is given,
-all of them are tested. Or a comma separated list can be passed, in which
+.BI \-\-crctest \fR=\fP[test]
+Test the speed of the built-in checksumming functions. If no argument is given,
+all of them are tested. Alternatively, a comma separated list can be passed, in which
 case the given ones are tested.
 .TP
 .BI \-\-cmdhelp \fR=\fPcommand
-Print help information for \fIcommand\fR.  May be `all' for all commands.
+Print help information for \fIcommand\fR. May be `all' for all commands.
 .TP
 .BI \-\-enghelp \fR=\fPioengine[,command]
 List all commands defined by \fIioengine\fR, or print help for \fIcommand\fR defined by \fIioengine\fR.
+If no \fIioengine\fR is given, list all available ioengines.
 .TP
 .BI \-\-showcmd \fR=\fPjobfile
 Convert \fIjobfile\fR to a set of command-line options.
 .TP
+.BI \-\-readonly
+Turn on safety read-only checks, preventing writes. The \-\-readonly
+option is an extra safety guard to prevent users from accidentally starting
+a write workload when that is not desired. Fio will only write if
+`rw=write/randwrite/rw/randrw` is given. This extra safety net can be used
+as an extra precaution as \-\-readonly will also enable a write check in
+the I/O engine core to prevent writes due to unknown user space bug(s).
+.TP
 .BI \-\-eta \fR=\fPwhen
-Specifies when real-time ETA estimate should be printed.  \fIwhen\fR may
-be one of `always', `never' or `auto'.
+Specifies when real-time ETA estimate should be printed. \fIwhen\fR may
+be `always', `never' or `auto'.
 .TP
 .BI \-\-eta\-newline \fR=\fPtime
-Force an ETA newline for every `time` period passed.
+Force a new line for every \fItime\fR period passed. When the unit is omitted,
+the value is interpreted in seconds.
 .TP
 .BI \-\-status\-interval \fR=\fPtime
-Report full output status every `time` period passed.
-.TP
-.BI \-\-readonly
-Turn on safety read-only checks, preventing any attempted write.
-.TP
-.BI \-\-section \fR=\fPsec
-Only run section \fIsec\fR from job file. This option can be used multiple times to add more sections to run.
+Force full status dump every \fItime\fR period passed. When the unit is omitted,
+the value is interpreted in seconds.
+.TP
+.BI \-\-section \fR=\fPname
+Only run specified section \fIname\fR in job file. Multiple sections can be specified.
+The \-\-section option allows one to combine related jobs into one file.
+E.g. one job file could define light, moderate, and heavy sections. Tell
+fio to run only the "heavy" section by giving \-\-section=heavy
+command line option. One can also specify the "write" operations in one
+section and "verify" operation in another section. The \-\-section option
+only applies to job sections. The reserved *global* section is always
+parsed and used.
 .TP
 .BI \-\-alloc\-size \fR=\fPkb
-Set the internal smalloc pool size to \fIkb\fP kilobytes.
+Set the internal smalloc pool size to \fIkb\fP in KiB. The
+\-\-alloc-size switch allows one to use a larger pool size for smalloc.
+If running large jobs with randommap enabled, fio can run out of memory.
+Smalloc is an internal allocator for shared structures from a fixed size
+memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
+NOTE: While running .fio_smalloc.* backing store files are visible
+in /tmp.
 .TP
 .BI \-\-warnings\-fatal
 All fio parser warnings are fatal, causing fio to exit with an error.
 .TP
 .BI \-\-max\-jobs \fR=\fPnr
-Set the maximum allowed number of jobs (threads/processes) to support.
+Set the maximum number of threads/processes to support.
 .TP
 .BI \-\-server \fR=\fPargs
-Start a backend server, with \fIargs\fP specifying what to listen to. See client/server section.
+Start a backend server, with \fIargs\fP specifying what to listen to. See Client/Server section.
 .TP
 .BI \-\-daemonize \fR=\fPpidfile
-Background a fio server, writing the pid to the given pid file.
+Background a fio server, writing the pid to the given \fIpidfile\fP file.
 .TP
-.BI \-\-client \fR=\fPhost
-Instead of running the jobs locally, send and run them on the given host or set of hosts.  See client/server section.
+.BI \-\-client \fR=\fPhostname
+Instead of running the jobs locally, send and run them on the given host or set of hosts. See Client/Server section.
+.TP
+.BI \-\-remote-config \fR=\fPfile
+Tell fio server to load this local file.
 .TP
 .BI \-\-idle\-prof \fR=\fPoption
-Report cpu idleness on a system or percpu basis (\fIoption\fP=system,percpu) or run unit work calibration only (\fIoption\fP=calibrate).
+Report CPU idleness. \fIoption\fP is one of the following:
+.RS
+.RS
+.TP
+.B calibrate
+Run unit work calibration only and exit.
+.TP
+.B system
+Show aggregate system idleness and unit work.
+.TP
+.B percpu
+As "system" but also show per CPU idleness.
+.RE
+.RE
+.TP
+.BI \-\-inflate-log \fR=\fPlog
+Inflate and output compressed log.
+.TP
+.BI \-\-trigger-file \fR=\fPfile
+Execute trigger cmd when file exists.
+.TP
+.BI \-\-trigger-timeout \fR=\fPt
+Execute trigger at this time.
+.TP
+.BI \-\-trigger \fR=\fPcmd
+Set this command as local trigger.
+.TP
+.BI \-\-trigger-remote \fR=\fPcmd
+Set this command as remote trigger.
+.TP
+.BI \-\-aux-path \fR=\fPpath
+Use this path for fio state generated files.
 .SH "JOB FILE FORMAT"
-Job files are in `ini' format. They consist of one or more
-job definitions, which begin with a job name in square brackets and
-extend to the next job name.  The job name can be any ASCII string
-except `global', which has a special meaning.  Following the job name is
-a sequence of zero or more parameters, one per line, that define the
-behavior of the job.  Any line starting with a `;' or `#' character is
-considered a comment and ignored.
-.P
-If \fIjobfile\fR is specified as `-', the job file will be read from
-standard input.
-.SS "Global Section"
-The global section contains default parameters for jobs specified in the
-job file.  A job is only affected by global sections residing above it,
-and there may be any number of global sections.  Specific job definitions
-may override any parameter set in global sections.
-.SH "JOB PARAMETERS"
-.SS Types
-Some parameters may take arguments of a specific type.
-Anywhere a numeric value is required, an arithmetic expression may be used,
-provided it is surrounded by parentheses. Supported operators are:
+Any parameters following the options will be assumed to be job files, unless
+they match a job file parameter. Multiple job files can be listed and each job
+file will be regarded as a separate group. Fio will `stonewall` execution
+between each group.
+
+Fio accepts one or more job files describing what it is
+supposed to do. The job file format is the classic ini file, where the names
+enclosed in [] brackets define the job name. You are free to use any ASCII name
+you want, except *global* which has special meaning. Following the job name is
+a sequence of zero or more parameters, one per line, that define the behavior of
+the job. If the first character in a line is a ';' or a '#', the entire line is
+discarded as a comment.
+
+A *global* section sets defaults for the jobs described in that file. A job may
+override a *global* section parameter, and a job file may even have several
+*global* sections if so desired. A job is only affected by a *global* section
+residing above it.
+
+The \-\-cmdhelp option also lists all options. If used with an `option`
+argument, \-\-cmdhelp will detail the given `option`.
+
+See the `examples/` directory in the fio source for inspiration on how to write
+job files. Note the copyright and license requirements currently apply to
+`examples/` files.
+.SH "JOB FILE PARAMETERS"
+Some parameters take an option of a given type, such as an integer or a
+string. Anywhere a numeric value is required, an arithmetic expression may be
+used, provided it is surrounded by parentheses. Supported operators are:
 .RS
 .RS
 .TP
@@ -141,28 +207,37 @@ provided it is surrounded by parentheses. Supported operators are:
 .P
 For time values in expressions, units are microseconds by default. This is
 different than for time values not in expressions (not enclosed in
-parentheses). The types used are:
+parentheses).
+.SH "PARAMETER TYPES"
+The following parameter types are used.
 .TP
 .I str
-String: a sequence of alphanumeric characters.
+String. A sequence of alphanumeric characters.
+.TP
+.I time
+Integer with possible time suffix. Without a unit value is interpreted as
+seconds unless otherwise specified. Accepts a suffix of 'd' for days, 'h' for
+hours, 'm' for minutes, 's' for seconds, 'ms' (or 'msec') for milliseconds and 'us'
+(or 'usec') for microseconds. For example, use 10m for 10 minutes.
 .TP
 .I int
 Integer. A whole number value, which may contain an integer prefix
 and an integer suffix.
 
-[integer prefix]number[integer suffix]
+[*integer prefix*] **number** [*integer suffix*]
+
+The optional *integer prefix* specifies the number's base. The default
+is decimal. *0x* specifies hexadecimal.
 
-The optional integer prefix specifies the number's base. The default
-is decimal. 0x specifies hexadecimal.
+The optional *integer suffix* specifies the number's units, and includes an
+optional unit prefix and an optional unit. For quantities of data, the
+default unit is bytes. For quantities of time, the default unit is seconds
+unless otherwise specified.
 
-The optional integer suffix specifies the number's units, and includes
-an optional unit prefix and an optional unit.  For quantities
-of data, the default unit is bytes. For quantities of time,
-the default unit is seconds.
+With \fBkb_base=1000\fR, fio follows international standards for unit
+prefixes. To specify power-of-10 decimal values defined in the
+International System of Units (SI):
 
-With \fBkb_base=1000\fR, fio follows international standards for unit prefixes.
-To specify power-of-10 decimal values defined in the International
-System of Units (SI):
 .nf
 ki means kilo (K) or 1000
 mi means mega (M) or 1000**2
@@ -172,6 +247,7 @@ pi means peta (P) or 1000**5
 .fi
 
 To specify power-of-2 binary values defined in IEC 80000-13:
+
 .nf
 k means kibi (Ki) or 1024
 m means mebi (Mi) or 1024**2
@@ -180,12 +256,19 @@ t means tebi (Ti) or 1024**4
 p means pebi (Pi) or 1024**5
 .fi
 
-With \fBkb_base=1024\fR (the default), the unit prefixes are opposite from
-those specified in the SI and IEC 80000-13 standards to provide
-compatibility with old scripts.  For example, 4k means 4096.
+With \fBkb_base=1024\fR (the default), the unit prefixes are opposite
+from those specified in the SI and IEC 80000-13 standards to provide
+compatibility with old scripts. For example, 4k means 4096.
+
+For quantities of data, an optional unit of 'B' may be included
+(e.g., 'kB' is the same as 'k').
+
+The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
+not milli). 'b' and 'B' both mean byte, not bit.
 
-.nf
 Examples with \fBkb_base=1000\fR:
+
+.nf
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
 1 MiB: 1048576, 1m, 1024k
 1 MB: 1000000, 1mi, 1000ki
@@ -193,8 +276,9 @@ Examples with \fBkb_base=1000\fR:
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
 .fi
 
-.nf
 Examples with \fBkb_base=1024\fR (default):
+
+.nf
 4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
 1 MiB: 1048576, 1m, 1024k
 1 MB: 1000000, 1mi, 1000ki
@@ -202,13 +286,8 @@ Examples with \fBkb_base=1024\fR (default):
 1 TB: 1000000000, 1ti, 1000mi, 1000000ki
 .fi
 
-For quantities of data, an optional unit of 'B' may be included
-(e.g.,  'kb' is the same as 'k').
-
-The integer suffix is not case sensitive (e.g., m/mi mean mebi/mega,
-not milli). 'b' and 'B' both mean byte, not bit.
-
 To specify times (units are not case sensitive):
+
 .nf
 D means days
 H means hours
@@ -218,21 +297,25 @@ ms or msec means milliseconds
 us or usec means microseconds
 .fi
 
+If the option accepts an upper and lower range, use a colon ':' or
+minus '-' to separate such values. See `irange` parameter type.
+If the lower value specified happens to be larger than the upper value
+the two values are swapped.
 .TP
 .I bool
-Boolean: a true or false value. `0' denotes false, `1' denotes true.
+Boolean. Usually parsed as an integer, however only defined for
+true and false (1 and 0).
 .TP
 .I irange
-Integer range: a range of integers specified in the format
-\fIlower\fR:\fIupper\fR or \fIlower\fR\-\fIupper\fR. \fIlower\fR and
-\fIupper\fR may contain a suffix as described above.  If an option allows two
-sets of ranges, they are separated with a `,' or `/' character. For example:
-`8\-8k/8M\-4G'.
+Integer range with suffix. Allows value range to be given, such as
+1024-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
+option allows two sets of ranges, they can be specified with a ',' or '/'
+delimiter: 1k-4k/8k-32k. Also see `int` parameter type.
 .TP
 .I float_list
-List of floating numbers: A list of floating numbers, separated by
-a ':' character.
-.SS "Parameter List"
+A list of floating point numbers, separated by a ':' character.
+.SH "JOB DESCRIPTION"
+With the above in mind, here follows the complete list of fio job parameters.
 .TP
 .BI name \fR=\fPstr
 May be used to override the job name.  On the command line, this parameter
@@ -2630,7 +2713,10 @@ This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au> based
 on documentation by Jens Axboe.
 .SH "REPORTING BUGS"
 Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.
-See \fBREADME\fR.
+.br
+See \fBREPORTING-BUGS\fR.
+
+\fBREPORTING-BUGS\fR: http://git.kernel.dk/cgit/fio/plain/REPORTING-BUGS
 .SH "SEE ALSO"
 For further documentation see \fBHOWTO\fR and \fBREADME\fR.
 .br

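The suffix rules in the man page text above can be illustrated with a short C sketch. `parse_size()` is a hypothetical helper, not fio's own parser. It accepts an optional `0x` prefix, then one of the unit prefixes k, m, g, t or p, an optional `i`, and an optional `b`, all case-insensitive. With the default `kb_base=1024`, a bare prefix is a power of 1024 and the `i` form is a power of 1000, so `4k` is 4096 and `1mi` is 1000000. With `kb_base=1000`, the sketch assumes the mapping is swapped, so `4k` is 4000 and `4ki` is 4096.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "[0x]number[k|m|g|t|p][i][b]" following the
 * documented suffix rules. Not fio's parser. Returns 0 on success and
 * stores the byte count in *out, or -1 on a bad suffix or overflow.
 */
static int parse_size(const char *s, unsigned int kb_base,
		      unsigned long long *out)
{
	static const char prefixes[] = "kmgtp";	/* powers 1..5 */
	unsigned long long val, mult = 1;
	const char *hit = NULL;
	const char *p;
	char *end;
	int base = 10;

	assert(kb_base == 1000 || kb_base == 1024);

	/* Default base is decimal; a 0x prefix selects hexadecimal. */
	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
		base = 16;

	val = strtoull(s, &end, base);
	if (end == s)
		return -1;	/* no digits at all */

	p = end;
	if (*p != '\0')
		hit = strchr(prefixes, tolower((unsigned char)*p));
	if (hit) {
		unsigned int pow = (unsigned int)(hit - prefixes) + 1;
		/* A bare prefix uses kb_base; a trailing 'i' selects the other base. */
		unsigned int step = kb_base;

		p++;
		if (tolower((unsigned char)*p) == 'i') {
			step = (kb_base == 1024) ? 1000 : 1024;
			p++;
		}
		while (pow--)
			mult *= step;
	}

	/* Optional unit: 'b' and 'B' both mean byte, never bit. */
	if (tolower((unsigned char)*p) == 'b')
		p++;

	if (*p != '\0')
		return -1;	/* unrecognised trailing characters */
	if (val > ULLONG_MAX / mult)
		return -1;	/* result would not fit in 64 bits */

	*out = val * mult;
	return 0;
}
```

For example, `parse_size("1024k", 1024, &v)` stores 1048576 in `v`, while `parse_size("4q", 1024, &v)` returns -1 because `q` is not a recognised suffix.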
* Recent changes (master)
@ 2017-07-04 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5283741f7be708fbbb3feb2cd5ca5187f3a964d1:

  gettime: reduce test CPU clock entries to 1000 (2017-07-02 16:21:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1be7afd7329ebdca520d637f571d2b31c33f6ba1:

  Merge branch 'fallocate_native' of https://github.com/sitsofe/fio (2017-07-03 16:51:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'fallocate_native' of https://github.com/sitsofe/fio

Sitsofe Wheeler (2):
      filesetup: add native fallocate
      HOWTO: indent v3 terse output definition

 HOWTO         | 11 ++++++++---
 file.h        |  1 +
 filesetup.c   | 30 ++++++++++++++++++++++++++++--
 fio.1         |  9 +++++++--
 options.c     | 14 +++++++++++---
 os/os-linux.h | 18 ++++++++++++++++++
 os/os-mac.h   | 12 ++++++++++++
 os/os.h       |  8 ++++++++
 8 files changed, 93 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6e53cff..92c3b73 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1049,6 +1049,10 @@ I/O type
 		**none**
 			Do not pre-allocate space.
 
+		**native**
+			Use a platform's native pre-allocation call but fall back to
+			**none** behavior if it fails/is not implemented.
+
 		**posix**
 			Pre-allocate via :manpage:`posix_fallocate(3)`.
 
@@ -1063,8 +1067,9 @@ I/O type
 			Backward-compatible alias for **posix**.
 
 	May not be available on all supported platforms. **keep** is only available
-	on Linux. If using ZFS on Solaris this must be set to **none** because ZFS
-	doesn't support it. Default: **posix**.
+	on Linux. If using ZFS on Solaris this cannot be set to **posix**
+	because ZFS doesn't support pre-allocation. Default: **native** if any
+	pre-allocation methods are available, **none** if not.
 
 .. option:: fadvise_hint=str
 
@@ -3328,7 +3333,7 @@ will be a disk utilization section.
 Below is a single line containing short names for each of the fields in the
 minimal output v3, separated by semicolons::
 
-terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;
 write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+	terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10
 ;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 
 Trace file format
diff --git a/file.h b/file.h
index 9801bb5..84daa5f 100644
--- a/file.h
+++ b/file.h
@@ -63,6 +63,7 @@ enum fio_fallocate_mode {
 	FIO_FALLOCATE_NONE	= 1,
 	FIO_FALLOCATE_POSIX	= 2,
 	FIO_FALLOCATE_KEEP_SIZE	= 3,
+	FIO_FALLOCATE_NATIVE	= 4,
 };
 
 /*
diff --git a/filesetup.c b/filesetup.c
index f3e3865..38ad9ed 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -38,6 +38,25 @@ static inline void clear_error(struct thread_data *td)
 	td->verror[0] = '\0';
 }
 
+static inline int native_fallocate(struct thread_data *td, struct fio_file *f)
+{
+	bool success;
+
+	success = fio_fallocate(f, 0, f->real_file_size);
+	dprint(FD_FILE, "native fallocate of file %s size %llu was "
+			"%ssuccessful\n", f->file_name,
+			(unsigned long long) f->real_file_size,
+			!success ? "un": "");
+
+	if (success)
+		return 0;
+
+	if (errno == ENOSYS)
+		dprint(FD_FILE, "native fallocate is not implemented\n");
+
+	return -1;
+}
+
 static void fallocate_file(struct thread_data *td, struct fio_file *f)
 {
 	int r;
@@ -45,10 +64,16 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 	if (td->o.fill_device)
 		return;
 
-#ifdef CONFIG_POSIX_FALLOCATE
 	switch (td->o.fallocate_mode) {
+	case FIO_FALLOCATE_NATIVE:
+		r = native_fallocate(td, f);
+		if (r != 0)
+			log_err("fio: native_fallocate call failed: %s\n",
+					strerror(errno));
+		break;
 	case FIO_FALLOCATE_NONE:
 		break;
+#ifdef CONFIG_POSIX_FALLOCATE
 	case FIO_FALLOCATE_POSIX:
 		dprint(FD_FILE, "posix_fallocate file %s size %llu\n",
 				 f->file_name,
@@ -58,6 +83,7 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 		if (r > 0)
 			log_err("fio: posix_fallocate fails: %s\n", strerror(r));
 		break;
+#endif /* CONFIG_POSIX_FALLOCATE */
 #ifdef CONFIG_LINUX_FALLOCATE
 	case FIO_FALLOCATE_KEEP_SIZE:
 		dprint(FD_FILE, "fallocate(FALLOC_FL_KEEP_SIZE) "
@@ -74,7 +100,7 @@ static void fallocate_file(struct thread_data *td, struct fio_file *f)
 		log_err("fio: unknown fallocate mode: %d\n", td->o.fallocate_mode);
 		assert(0);
 	}
-#endif /* CONFIG_POSIX_FALLOCATE */
+
 }
 
 /*
diff --git a/fio.1 b/fio.1
index ab04208..9783646 100644
--- a/fio.1
+++ b/fio.1
@@ -436,6 +436,10 @@ are:
 .B none
 Do not pre-allocate space.
 .TP
+.B native
+Use a platform's native pre-allocation call but fall back to 'none' behavior if
+it fails/is not implemented.
+.TP
 .B posix
 Pre-allocate via \fBposix_fallocate\fR\|(3).
 .TP
@@ -450,8 +454,9 @@ Backward-compatible alias for 'posix'.
 .RE
 .P
 May not be available on all supported platforms. 'keep' is only
-available on Linux. If using ZFS on Solaris this must be set to 'none'
-because ZFS doesn't support it. Default: 'posix'.
+available on Linux. If using ZFS on Solaris this cannot be set to 'posix'
+because ZFS doesn't support it. Default: 'native' if any pre-allocation methods
+are available, 'none' if not.
 .RE
 .TP
 .BI fadvise_hint \fR=\fPstr
diff --git a/options.c b/options.c
index 09a21af..b21f09a 100644
--- a/options.c
+++ b/options.c
@@ -2289,14 +2289,14 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.parent = "nrfiles",
 		.hide	= 1,
 	},
-#ifdef CONFIG_POSIX_FALLOCATE
+#if defined(CONFIG_POSIX_FALLOCATE) || defined(FIO_HAVE_NATIVE_FALLOCATE)
 	{
 		.name	= "fallocate",
 		.lname	= "Fallocate",
 		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, fallocate_mode),
 		.help	= "Whether pre-allocation is performed when laying out files",
-		.def	= "posix",
+		.def	= "native",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 		.posval	= {
@@ -2304,10 +2304,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = FIO_FALLOCATE_NONE,
 			    .help = "Do not pre-allocate space",
 			  },
+			  { .ival = "native",
+			    .oval = FIO_FALLOCATE_NATIVE,
+			    .help = "Use native pre-allocation if possible",
+			  },
+#ifdef CONFIG_POSIX_FALLOCATE
 			  { .ival = "posix",
 			    .oval = FIO_FALLOCATE_POSIX,
 			    .help = "Use posix_fallocate()",
 			  },
+#endif
 #ifdef CONFIG_LINUX_FALLOCATE
 			  { .ival = "keep",
 			    .oval = FIO_FALLOCATE_KEEP_SIZE,
@@ -2319,10 +2325,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = FIO_FALLOCATE_NONE,
 			    .help = "Alias for 'none'",
 			  },
+#ifdef CONFIG_POSIX_FALLOCATE
 			  { .ival = "1",
 			    .oval = FIO_FALLOCATE_POSIX,
 			    .help = "Alias for 'posix'",
 			  },
+#endif
 		},
 	},
 #else	/* CONFIG_POSIX_FALLOCATE */
@@ -2332,7 +2340,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.type	= FIO_OPT_UNSUPPORTED,
 		.help	= "Your platform does not support fallocate",
 	},
-#endif /* CONFIG_POSIX_FALLOCATE */
+#endif /* CONFIG_POSIX_FALLOCATE || FIO_HAVE_NATIVE_FALLOCATE */
 	{
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
diff --git a/os/os-linux.h b/os/os-linux.h
index 8c1e93b..e7d600d 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -392,4 +392,22 @@ static inline int shm_attach_to_open_removed(void)
 	return 1;
 }
 
+#ifdef CONFIG_LINUX_FALLOCATE
+#define FIO_HAVE_NATIVE_FALLOCATE
+static inline bool fio_fallocate(struct fio_file *f, uint64_t offset,
+				 uint64_t len)
+{
+	int ret;
+	ret = fallocate(f->fd, 0, 0, len);
+	if (ret == 0)
+		return true;
+
+	/* Work around buggy old glibc versions... */
+	if (ret > 0)
+		errno = ret;
+
+	return false;
+}
+#endif
+
 #endif
diff --git a/os/os-mac.h b/os/os-mac.h
index 7de36ea..a1536c7 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -20,6 +20,7 @@
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CHARDEV_SIZE
+#define FIO_HAVE_NATIVE_FALLOCATE
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -101,4 +102,15 @@ static inline int gettid(void)
  */
 extern int fdatasync(int fd);
 
+static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t len)
+{
+	fstore_t store = {F_ALLOCATEALL, F_PEOFPOSMODE, offset, len};
+	if (fcntl(f->fd, F_PREALLOCATE, &store) != -1) {
+		if (ftruncate(f->fd, len) == 0)
+			return true;
+	}
+
+	return false;
+}
+
 #endif
diff --git a/os/os.h b/os/os.h
index 1d400c8..afee9f9 100644
--- a/os/os.h
+++ b/os/os.h
@@ -361,4 +361,12 @@ static inline int shm_attach_to_open_removed(void)
 }
 #endif
 
+#ifndef FIO_HAVE_NATIVE_FALLOCATE
+static inline bool fio_fallocate(struct fio_file *f, uint64_t offset, uint64_t len)
+{
+	errno = ENOSYS;
+	return false;
+}
+#endif
+
 #endif

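The `native` fallback described in the HOWTO and man page hunks above can be sketched in C as follows. The helper names are hypothetical and the sketch only loosely mirrors the patch. On Linux it calls fallocate(2) with mode 0, keeping the patch's workaround for old glibc versions that returned the error number instead of setting errno. On any other platform it reports ENOSYS, much like the generic stub added to os/os.h. The macOS `F_PREALLOCATE` path is left out.

```c
#define _GNU_SOURCE		/* exposes fallocate(2) in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for a platform's native pre-allocation call.
 * Returns true on success; on failure errno says why, and ENOSYS means
 * no native call exists on this platform.
 */
static bool native_prealloc(int fd, off_t len)
{
#ifdef __linux__
	/* Mode 0: allocate blocks and extend the file size to len if needed. */
	int ret = fallocate(fd, 0, 0, len);

	if (ret == 0)
		return true;
	if (ret > 0)
		errno = ret;	/* old glibc returned the error number */
	return false;
#else
	(void)fd;
	(void)len;
	errno = ENOSYS;
	return false;
#endif
}

/*
 * Documented 'native' semantics: try native pre-allocation and, if it
 * fails or is not implemented, carry on as if 'none' had been chosen.
 * Returns 0 if space was pre-allocated, -1 if it was not.
 */
static int prealloc_native_or_none(int fd, off_t len)
{
	assert(len > 0);

	if (native_prealloc(fd, len))
		return 0;

	if (errno == ENOSYS)
		fprintf(stderr, "native pre-allocation not implemented\n");
	else
		fprintf(stderr, "native pre-allocation failed: %s\n",
			strerror(errno));
	return -1;	/* not fatal: the file is simply left unallocated */
}
```

A failure is reported but not treated as fatal, which is what makes `native` usable as a default: on a file system without pre-allocation support the job proceeds as if `fallocate=none` had been given.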
* Recent changes (master)
@ 2017-07-03 12:00 Jens Axboe
From: Jens Axboe @ 2017-07-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dd3503d365f87e68079fb3e443a410743688d53b:

  fio: make gauss a duplicate of normal for file_service_type (2017-06-28 22:58:08 +0100)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5283741f7be708fbbb3feb2cd5ca5187f3a964d1:

  gettime: reduce test CPU clock entries to 1000 (2017-07-02 16:21:47 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      filesetup: abstract out fallocate helper
      gettime: reduce test CPU clock entries to 1000

 filesetup.c | 81 ++++++++++++++++++++++++++++++++-----------------------------
 gettime.c   |  2 +-
 2 files changed, 44 insertions(+), 39 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 13079e4..f3e3865 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -38,12 +38,51 @@ static inline void clear_error(struct thread_data *td)
 	td->verror[0] = '\0';
 }
 
+static void fallocate_file(struct thread_data *td, struct fio_file *f)
+{
+	int r;
+
+	if (td->o.fill_device)
+		return;
+
+#ifdef CONFIG_POSIX_FALLOCATE
+	switch (td->o.fallocate_mode) {
+	case FIO_FALLOCATE_NONE:
+		break;
+	case FIO_FALLOCATE_POSIX:
+		dprint(FD_FILE, "posix_fallocate file %s size %llu\n",
+				 f->file_name,
+				 (unsigned long long) f->real_file_size);
+
+		r = posix_fallocate(f->fd, 0, f->real_file_size);
+		if (r > 0)
+			log_err("fio: posix_fallocate fails: %s\n", strerror(r));
+		break;
+#ifdef CONFIG_LINUX_FALLOCATE
+	case FIO_FALLOCATE_KEEP_SIZE:
+		dprint(FD_FILE, "fallocate(FALLOC_FL_KEEP_SIZE) "
+				"file %s size %llu\n", f->file_name,
+				(unsigned long long) f->real_file_size);
+
+		r = fallocate(f->fd, FALLOC_FL_KEEP_SIZE, 0, f->real_file_size);
+		if (r != 0)
+			td_verror(td, errno, "fallocate");
+
+		break;
+#endif /* CONFIG_LINUX_FALLOCATE */
+	default:
+		log_err("fio: unknown fallocate mode: %d\n", td->o.fallocate_mode);
+		assert(0);
+	}
+#endif /* CONFIG_POSIX_FALLOCATE */
+}
+
 /*
  * Leaves f->fd open on success, caller must close
  */
 static int extend_file(struct thread_data *td, struct fio_file *f)
 {
-	int r, new_layout = 0, unlink_file = 0, flags;
+	int new_layout = 0, unlink_file = 0, flags;
 	unsigned long long left;
 	unsigned int bs;
 	char *b = NULL;
@@ -100,43 +139,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		return 1;
 	}
 
-#ifdef CONFIG_POSIX_FALLOCATE
-	if (!td->o.fill_device) {
-		switch (td->o.fallocate_mode) {
-		case FIO_FALLOCATE_NONE:
-			break;
-		case FIO_FALLOCATE_POSIX:
-			dprint(FD_FILE, "posix_fallocate file %s size %llu\n",
-				 f->file_name,
-				 (unsigned long long) f->real_file_size);
-
-			r = posix_fallocate(f->fd, 0, f->real_file_size);
-			if (r > 0) {
-				log_err("fio: posix_fallocate fails: %s\n",
-						strerror(r));
-			}
-			break;
-#ifdef CONFIG_LINUX_FALLOCATE
-		case FIO_FALLOCATE_KEEP_SIZE:
-			dprint(FD_FILE,
-				"fallocate(FALLOC_FL_KEEP_SIZE) "
-				"file %s size %llu\n", f->file_name,
-				(unsigned long long) f->real_file_size);
-
-			r = fallocate(f->fd, FALLOC_FL_KEEP_SIZE, 0,
-					f->real_file_size);
-			if (r != 0)
-				td_verror(td, errno, "fallocate");
-
-			break;
-#endif /* CONFIG_LINUX_FALLOCATE */
-		default:
-			log_err("fio: unknown fallocate mode: %d\n",
-				td->o.fallocate_mode);
-			assert(0);
-		}
-	}
-#endif /* CONFIG_POSIX_FALLOCATE */
+	fallocate_file(td, f);
 
 	/*
 	 * If our jobs don't require regular files initially, we're done.
@@ -171,6 +174,8 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	}
 
 	while (left && !td->terminate) {
+		ssize_t r;
+
 		if (bs > left)
 			bs = left;
 
diff --git a/gettime.c b/gettime.c
index 5741932..9e5457e 100644
--- a/gettime.c
+++ b/gettime.c
@@ -542,7 +542,7 @@ uint64_t time_since_now(const struct timespec *s)
     defined(CONFIG_SFAA)
 
 #define CLOCK_ENTRIES_DEBUG	100000
-#define CLOCK_ENTRIES_TEST	10000
+#define CLOCK_ENTRIES_TEST	1000
 
 struct clock_entry {
 	uint32_t seq;

* Recent changes (master)
@ 2017-06-29 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 867f90e36d9d1458b20e5fab4542ce6c631f2633:

  blktrace: remove unused ioctl definitions (2017-06-27 19:59:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dd3503d365f87e68079fb3e443a410743688d53b:

  fio: make gauss a duplicate of normal for file_service_type (2017-06-28 22:58:08 +0100)

----------------------------------------------------------------
Sitsofe Wheeler (3):
      stat: further group percentage fixes
      doc: fix random_distribution Gaussian parameter name
      fio: make gauss a duplicate of normal for file_service_type

Tomohiro Kusumi (2):
      HOWTO/manpage: update percentage explanation using '%'
      add FD_PARSE debug print for size= option (which exists in offset=)

 HOWTO     | 15 +++++++++------
 fio.1     | 19 +++++++++++--------
 options.c |  8 +++++++-
 stat.c    | 16 ++++++++--------
 4 files changed, 35 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 37caa3c..6e53cff 100644
--- a/HOWTO
+++ b/HOWTO
@@ -853,10 +853,13 @@ Target file/device
 		**pareto**
 			Use a *Pareto* distribution to decide what file to access.
 
-		**gauss**
+		**normal**
 			Use a *Gaussian* (normal) distribution to decide what file to
 			access.
 
+		**gauss**
+			Alias for normal.
+
 	For *random*, *roundrobin*, and *sequential*, a postfix can be appended to
 	tell fio how many I/Os to issue before switching to a new file. For example,
 	specifying ``file_service_type=random:8`` would cause fio to issue
@@ -1095,8 +1098,8 @@ I/O type
 	offset will be used. Data before the given offset will not be touched. This
 	effectively caps the file size at `real_size - offset`. Can be combined with
 	:option:`size` to constrain the start and end range of the I/O workload.
-	A percentage can be specified by the percentage number plus 1 with preceding '-'.
-	For example, -1 is parsed as 0%, -10 is parsed as 9%, -101 is parsed as 100%.
+	A percentage can be specified by a number between 1 and 100 followed by '%',
+	for example, ``offset=20%`` to specify 20%.
 
 .. option:: offset_increment=int
 
@@ -1204,7 +1207,7 @@ I/O type
 		**pareto**
 				Pareto distribution
 
-		**gauss**
+		**normal**
 				Normal (Gaussian) distribution
 
 		**zoned**
@@ -1217,8 +1220,8 @@ I/O type
 	values will yield in terms of hit rates.  If you wanted to use **zipf** with
 	a `theta` of 1.2, you would use ``random_distribution=zipf:1.2`` as the
 	option. If a non-uniform model is used, fio will disable use of the random
-	map. For the **gauss** distribution, a normal deviation is supplied as a
-	value between 0 and 100.
+	map. For the **normal** distribution, a normal (Gaussian) deviation is
+	supplied as a value between 0 and 100.
 
 	For a **zoned** distribution, fio supports specifying percentages of I/O
 	access that should fall within what range of the file or device. For
diff --git a/fio.1 b/fio.1
index ac87c9d..ab04208 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "May 2017" "User Manual"
+.TH fio 1 "June 2017" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -676,8 +676,11 @@ Use a zipfian distribution to decide what file to access.
 .B pareto
 Use a pareto distribution to decide what file to access.
 .TP
+.B normal
+Use a Gaussian (normal) distribution to decide what file to access.
+.TP
 .B gauss
-Use a gaussian (normal) distribution to decide what file to access.
+Alias for normal.
 .RE
 .P
 For \fBrandom\fR, \fBroundrobin\fR, and \fBsequential\fR, a postfix can be
@@ -916,8 +919,8 @@ bytes or a percentage. If a percentage is given, the next \fBblockalign\fR-ed
 offset will be used. Data before the given offset will not be touched. This
 effectively caps the file size at (real_size - offset). Can be combined with
 \fBsize\fR to constrain the start and end range of the I/O workload. A percentage
-can be specified by the percentage number plus 1 with preceding '-'. For example,
--1 is parsed as 0%, -10 is parsed as 9%, -101 is parsed as 100%.
+can be specified by a number between 1 and 100 followed by '%', for example,
+offset=20% to specify 20%.
 .TP
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes the
@@ -1004,8 +1007,8 @@ Zipf distribution
 .B pareto
 Pareto distribution
 .TP
-.B gauss
-Normal (gaussian) distribution
+.B normal
+Normal (Gaussian) distribution
 .TP
 .B zoned
 Zoned random distribution
@@ -1017,8 +1020,8 @@ For \fBpareto\fR, it's the pareto power. Fio includes a test program, genzipf,
 that can be used visualize what the given input values will yield in terms of
 hit rates. If you wanted to use \fBzipf\fR with a theta of 1.2, you would use
 random_distribution=zipf:1.2 as the option. If a non-uniform model is used,
-fio will disable use of the random map. For the \fBgauss\fR distribution, a
-normal deviation is supplied as a value between 0 and 100.
+fio will disable use of the random map. For the \fBnormal\fR distribution, a
+normal (Gaussian) deviation is supplied as a value between 0 and 100.
 .P
 .RS
 For a \fBzoned\fR distribution, fio supports specifying percentages of IO
diff --git a/options.c b/options.c
index 7431ed8..09a21af 100644
--- a/options.c
+++ b/options.c
@@ -1410,6 +1410,8 @@ static int str_size_cb(void *data, unsigned long long *__val)
 	if (parse_is_percent(v)) {
 		td->o.size = 0;
 		td->o.size_percent = -1ULL - v;
+		dprint(FD_PARSE, "SET size_percent %d\n",
+					td->o.size_percent);
 	} else
 		td->o.size = v;
 
@@ -2267,9 +2269,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = FIO_FSERVICE_PARETO,
 			    .help = "Pareto randomized",
 			  },
+			  { .ival = "normal",
+			    .oval = FIO_FSERVICE_GAUSS,
+			    .help = "Normal (Gaussian) randomized",
+			  },
 			  { .ival = "gauss",
 			    .oval = FIO_FSERVICE_GAUSS,
-			    .help = "Normal (Gaussian) distribution",
+			    .help = "Alias for normal",
 			  },
 			  { .ival = "roundrobin",
 			    .oval = FIO_FSERVICE_RR,
diff --git a/stat.c b/stat.c
index beec574..aebd107 100644
--- a/stat.c
+++ b/stat.c
@@ -475,6 +475,12 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		else
 			bw_str = "kB";
 
+		if (rs->agg[ddir]) {
+			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
+			if (p_of_agg > 100.0)
+				p_of_agg = 100.0;
+		}
+
 		if (rs->unit_base == 1) {
 			min *= 8.0;
 			max *= 8.0;
@@ -482,12 +488,6 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			dev *= 8.0;
 		}
 
-		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
-			if (p_of_agg > 100.0)
-				p_of_agg = 100.0;
-		}
-
 		if (mean > fkb_base * fkb_base) {
 			min /= fkb_base;
 			max /= fkb_base;
@@ -924,7 +924,7 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 		double p_of_agg = 100.0;
 
 		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) rs->agg[ddir];
+			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
 			if (p_of_agg > 100.0)
 				p_of_agg = 100.0;
 		}
@@ -1055,7 +1055,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) rs->agg[ddir];
+			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
 			if (p_of_agg > 100.0)
 				p_of_agg = 100.0;
 		}

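The percentage handling behind `offset=` and `size=` can be illustrated with a small C sketch. The diff shows `str_size_cb()` storing `td->o.size_percent = -1ULL - v` once `parse_is_percent(v)` holds, so a percentage is kept at the very top of the unsigned 64-bit range. The function names below and the exact range check in `is_percent()` are assumptions made for illustration, not fio's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* A percentage shares the 64-bit field with byte values by living at the
 * top of the unsigned range: pct is stored as -1ULL - pct. */
static unsigned long long encode_percent(unsigned int pct)
{
	assert(pct <= 100);
	return -1ULL - pct;
}

/* Hypothetical range check: the top 101 values stand for 0%..100%. */
static bool is_percent(unsigned long long v)
{
	return v >= -1ULL - 100ULL;
}

/* Inverse of encode_percent(), matching size_percent = -1ULL - v. */
static unsigned int decode_percent(unsigned long long v)
{
	assert(is_percent(v));
	return (unsigned int)(-1ULL - v);
}

/* Parse the documented "N%" syntax (N from 1 to 100) into encoded form. */
static bool parse_percent(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long n = strtoul(s, &end, 10);

	if (end == s || *end != '%' || end[1] != '\0' || n < 1 || n > 100)
		return false;
	*out = encode_percent((unsigned int)n);
	return true;
}
```

The same arithmetic explains the wording this series replaced: `offset=-10` became `(unsigned long long)-10`, which equals `-1ULL - 9`, so it was read as 9%.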
* Recent changes (master)
@ 2017-06-28 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit db84b73bd7b0c3b718596fbeb6a5f940b05a6735:

  stat: fix group percentage (2017-06-27 00:47:27 +0100)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 867f90e36d9d1458b20e5fab4542ce6c631f2633:

  blktrace: remove unused ioctl definitions (2017-06-27 19:59:49 -0600)

----------------------------------------------------------------
Daniel Verkamp (1):
      lib/ffz: remove dead store

Ido Ben-Tsion (1):
      HOWTO: fix the v3 terse output definition

Jens Axboe (3):
      Update API for file write hints
      mtd: add private rpmatch()
      blktrace: remove unused ioctl definitions

Tomohiro Kusumi (2):
      HOWTO: add offset unit info for offset= option
      man page: add offset unit info for offset= option

 HOWTO                 | 10 ++++++----
 blktrace_api.h        |  5 -----
 fio.1                 | 12 +++++++-----
 ioengines.c           | 13 ++++++++++++-
 lib/ffz.h             |  4 +---
 os/os-linux.h         | 13 ++++++++-----
 oslib/libmtd_common.h | 13 ++-----------
 7 files changed, 36 insertions(+), 34 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2007dc0..37caa3c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1090,11 +1090,13 @@ I/O type
 
 .. option:: offset=int
 
-	Start I/O at the provided offset in the file, given as either a fixed size or
-	a percentage. If a percentage is given, the next ``blockalign``-ed offset
-	will be used. Data before the given offset will not be touched. This
+	Start I/O at the provided offset in the file, given as either a fixed size in
+	bytes or a percentage. If a percentage is given, the next ``blockalign``-ed
+	offset will be used. Data before the given offset will not be touched. This
 	effectively caps the file size at `real_size - offset`. Can be combined with
 	:option:`size` to constrain the start and end range of the I/O workload.
+	A percentage can be specified by the percentage number plus 1 with preceding '-'.
+	For example, -1 is parsed as 0%, -10 is parsed as 9%, -101 is parsed as 100%.
 
 .. option:: offset_increment=int
 
@@ -3323,7 +3325,7 @@ will be a disk utilization section.
 Below is a single line containing short names for each of the fields in the
 minimal output v3, separated by semicolons::
 
-	terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10
 ;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;
 write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
 
 
 Trace file format
diff --git a/blktrace_api.h b/blktrace_api.h
index 3df3347..e2d8cb3 100644
--- a/blktrace_api.h
+++ b/blktrace_api.h
@@ -127,9 +127,4 @@ struct blk_user_trace_setup {
 	__u32 pid;
 };
 
-#define BLKTRACESETUP _IOWR(0x12,115,struct blk_user_trace_setup)
-#define BLKTRACESTART _IO(0x12,116)
-#define BLKTRACESTOP _IO(0x12,117)
-#define BLKTRACETEARDOWN _IO(0x12,118)
-
 #endif
diff --git a/fio.1 b/fio.1
index 6a6ea1b..ac87c9d 100644
--- a/fio.1
+++ b/fio.1
@@ -911,11 +911,13 @@ If true, use buffered I/O.  This is the opposite of the \fBdirect\fR parameter.
 Default: true.
 .TP
 .BI offset \fR=\fPint
-Start I/O at the provided offset in the file, given as either a fixed size or a
-percentage. If a percentage is given, the next \fBblockalign\fR-ed offset will
-be used. Data before the given offset will not be touched. This effectively
-caps the file size at (real_size - offset). Can be combined with \fBsize\fR to
-constrain the start and end range of the I/O workload.
+Start I/O at the provided offset in the file, given as either a fixed size in
+bytes or a percentage. If a percentage is given, the next \fBblockalign\fR-ed
+offset will be used. Data before the given offset will not be touched. This
+effectively caps the file size at (real_size - offset). Can be combined with
+\fBsize\fR to constrain the start and end range of the I/O workload. A percentage
+can be specified by the percentage number plus 1 with preceding '-'. For example,
+-1 is parsed as 0%, -10 is parsed as 9%, -101 is parsed as 100%.
 .TP
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes the
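The negative form described in the `offset` hunk is `-(percentage + 1)`. Below is a minimal sketch that decodes it and turns the percentage into a block-aligned byte offset. `decode_offset_pct()` and `pct_to_offset()` are hypothetical helpers, not fio's option parser. Rounding up to the next multiple of `blockalign` is an assumption about what "the next blockalign-ed offset" means.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode the "-(percentage + 1)" form.
 * Returns true and stores 0..100 in *pct when v uses that form.
 * Returns false for a plain (non-negative) byte offset or a value
 * below -101, which would mean more than 100%.
 */
bool decode_offset_pct(int64_t v, unsigned int *pct)
{
	if (v >= 0 || v < -101)
		return false;
	*pct = (unsigned int)(-v - 1);	/* -1 -> 0%, -10 -> 9%, -101 -> 100% */
	return true;
}

/*
 * Hypothetical helper: byte offset for pct percent of real_size,
 * rounded up to a multiple of align (align must be non-zero).
 * Splitting real_size into quotient and remainder of 100 avoids
 * overflowing real_size * pct for very large sizes.
 */
uint64_t pct_to_offset(uint64_t real_size, unsigned int pct, uint64_t align)
{
	uint64_t off = real_size / 100 * pct + real_size % 100 * pct / 100;

	return (off + align - 1) / align * align;
}
```

Under these assumptions, `offset=-10` on a 1000000-byte file with 4 KiB alignment decodes to 9% and starts at byte 90112.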
diff --git a/ioengines.c b/ioengines.c
index abbaa9a..6e6e3de 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -476,8 +476,19 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 	if (fio_option_is_set(&td->o, write_hint) &&
 	    (f->filetype == FIO_TYPE_BLOCK || f->filetype == FIO_TYPE_FILE)) {
 		uint64_t hint = td->o.write_hint;
+		int cmd;
 
-		if (fcntl(f->fd, F_SET_RW_HINT, &hint) < 0) {
+		/*
+		 * For direct IO, we just need/want to set the hint on
+		 * the file descriptor. For buffered IO, we need to set
+		 * it on the inode.
+		 */
+		if (td->o.odirect)
+			cmd = F_SET_FILE_RW_HINT;
+		else
+			cmd = F_SET_RW_HINT;
+
+		if (fcntl(f->fd, cmd, &hint) < 0) {
 			td_verror(td, errno, "fcntl write hint");
 			goto err;
 		}
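The hunk chooses the fcntl(2) command based on whether `direct` is set. Below is a minimal sketch of the same decision as a standalone helper. It carries fallback definitions that match the os/os-linux.h hunk (`F_LINUX_SPECIFIC_BASE` is 1024 on Linux). `write_hint_cmd()` and `set_write_hint()` are hypothetical names. Whether a given kernel accepts either command depends on its version.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallbacks matching the values in the os/os-linux.h hunk. */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef F_SET_FILE_RW_HINT
#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
#endif

/*
 * Same choice as the hunk: with O_DIRECT the hint goes on the open
 * file (descriptor); for buffered I/O it goes on the inode.
 */
int write_hint_cmd(bool odirect)
{
	return odirect ? F_SET_FILE_RW_HINT : F_SET_RW_HINT;
}

/* Returns fcntl()'s result: -1 with errno set on failure. */
int set_write_hint(int fd, bool odirect, uint64_t hint)
{
	/* The hint is passed by pointer as a 64-bit value. */
	return fcntl(fd, write_hint_cmd(odirect), &hint);
}
```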
diff --git a/lib/ffz.h b/lib/ffz.h
index e2c1b8e..16c9ae9 100644
--- a/lib/ffz.h
+++ b/lib/ffz.h
@@ -27,10 +27,8 @@ static inline int ffs64(uint64_t word)
 		word >>= 2;
 		r += 2;
 	}
-	if (!(word & 1)) {
-		word >>= 1;
+	if (!(word & 1))
 		r += 1;
-	}
 
 	return r;
 }
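The `word >>= 1` removed here was a dead store. Only `return r;` follows it, so `word` is never read again and the result is unchanged. For reference, here is a self-contained sketch of a 64-bit find-first-set written as a binary search, consistent with the tail shown in the hunk. The steps before `word >>= 2` are an assumption about the shape of the function, not a copy of lib/ffz.h. It returns the zero-based index of the lowest set bit.

```c
#include <stdint.h>

/* Zero-based index of the lowest set bit; word must be non-zero. */
static inline int ffs64(uint64_t word)
{
	int r = 0;

	/* Halve the search window each step, discarding all-zero low halves. */
	if (!(word & 0xffffffffULL)) {
		word >>= 32;
		r += 32;
	}
	if (!(word & 0xffff)) {
		word >>= 16;
		r += 16;
	}
	if (!(word & 0xff)) {
		word >>= 8;
		r += 8;
	}
	if (!(word & 0xf)) {
		word >>= 4;
		r += 4;
	}
	if (!(word & 3)) {
		word >>= 2;
		r += 2;
	}
	/* Last step: no shift needed, word is not used afterwards. */
	if (!(word & 1))
		r += 1;

	return r;
}
```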
diff --git a/os/os-linux.h b/os/os-linux.h
index 3e7a2fc..8c1e93b 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -309,14 +309,17 @@ static inline int fio_set_sched_idle(void)
 #endif
 #define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
 #define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+#define F_GET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 13)
+#define F_SET_FILE_RW_HINT	(F_LINUX_SPECIFIC_BASE + 14)
 #endif
 
 #ifndef RWH_WRITE_LIFE_NONE
-#define RWH_WRITE_LIFE_NONE	0
-#define RWH_WRITE_LIFE_SHORT	1
-#define RWH_WRITE_LIFE_MEDIUM	2
-#define RWH_WRITE_LIFE_LONG	3
-#define RWH_WRITE_LIFE_EXTREME	4
+#define RWH_WRITE_LIFE_NOT_SET	0
+#define RWH_WRITE_LIFE_NONE	1
+#define RWH_WRITE_LIFE_SHORT	2
+#define RWH_WRITE_LIFE_MEDIUM	3
+#define RWH_WRITE_LIFE_LONG	4
+#define RWH_WRITE_LIFE_EXTREME	5
 #endif
 
 #define FIO_HAVE_WRITE_HINT
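With this change the fallback values follow the kernel's numbering: 0 means "not set" and the lifetimes start at 1. Previously `RWH_WRITE_LIFE_NONE` was 0. Below is a sketch that maps option strings onto these constants. The spellings `none`, `short`, `medium`, `long` and `extreme` are assumed to be what fio's `write_hint` option accepts. `rw_hint_from_str()` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Same fallback values as the os/os-linux.h hunk. */
#ifndef RWH_WRITE_LIFE_NONE
#define RWH_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/* Hypothetical: RWH_WRITE_LIFE_* value for name, or -1 if unknown. */
int64_t rw_hint_from_str(const char *name)
{
	static const struct { const char *name; int64_t val; } map[] = {
		{ "none",    RWH_WRITE_LIFE_NONE },
		{ "short",   RWH_WRITE_LIFE_SHORT },
		{ "medium",  RWH_WRITE_LIFE_MEDIUM },
		{ "long",    RWH_WRITE_LIFE_LONG },
		{ "extreme", RWH_WRITE_LIFE_EXTREME },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(name, map[i].name) == 0)
			return map[i].val;
	return -1;
}
```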
diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index 9768066..3f9e1b8 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -119,20 +119,11 @@ extern "C" {
 	fprintf(stderr, "%s: warning!: " fmt "\n", PROGRAM_NAME, ##__VA_ARGS__); \
 } while(0)
 
-#if defined(__UCLIBC__)
-/* uClibc versions before 0.9.34 don't have rpmatch() */
-#if __UCLIBC_MAJOR__ == 0 && \
-		(__UCLIBC_MINOR__ < 9 || \
-		(__UCLIBC_MINOR__ == 9 && __UCLIBC_SUBLEVEL__ < 34))
-#undef rpmatch
-#define rpmatch __rpmatch
-static inline int __rpmatch(const char *resp)
+static inline int mtd_rpmatch(const char *resp)
 {
     return (resp[0] == 'y' || resp[0] == 'Y') ? 1 :
 	(resp[0] == 'n' || resp[0] == 'N') ? 0 : -1;
 }
-#endif
-#endif
 
 /**
  * prompt the user for confirmation
@@ -154,7 +145,7 @@ static inline bool prompt(const char *msg, bool def)
 		}
 
 		if (strcmp("\n", line) != 0) {
-			switch (rpmatch(line)) {
+			switch (mtd_rpmatch(line)) {
 			case 0: ret = false; break;
 			case 1: ret = true; break;
 			case -1:


* Recent changes (master)
@ 2017-06-27 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-27 12:00 UTC
  To: fio

The following changes since commit 036a159982656aaa98b0a0490defc36c6065aa93:

  Android: fix missing sysmacros.h include (2017-06-25 09:51:13 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to db84b73bd7b0c3b718596fbeb6a5f940b05a6735:

  stat: fix group percentage (2017-06-27 00:47:27 +0100)

----------------------------------------------------------------
Jens Axboe (1):
      stat: fix alignment of the iops stats

Sitsofe Wheeler (26):
      HOWTO: add defaults
      HOWTO: state default time unit
      HOWTO: grammar/spelling changes
      HOWTO: escape =
      HOWTO: update time specification
      HOWTO: update directory and filename option descriptions
      HOWTO: update command line option descriptions
      HOWTO: general consistency
      HOWTO: minor internal/reordering/formatting changes
      HOWTO: description rewording/fixes
      HOWTO: Fix some capitalisation
      HOWTO: make filesize syntax show it can take a typed range
      HOWTO: reword HDFS description
      HOWTO: add --output-format=terse as another way to get minimal output
      HOWTO: add rate example
      HOWTO: add some markup
      HOWTO: reorder client/server phrasing
      HOWTO: reword iodepth and submit distribution text
      HOWTO: Reword Log File Formats and add reference
      HOWTO: modernize output examples and descriptions
      HOWTO/examples: fix writetrim "typo"
      init: update --crctest help syntax
      HOWTO: note that crc32c will automatically use hw
      README: update Red Hat fio package URL
      stat: fix printf format specifier
      stat: fix group percentage

 HOWTO            | 575 +++++++++++++++++++++++++++++++------------------------
 README           |   2 +-
 examples/mtd.fio |   2 +-
 init.c           |   2 +-
 stat.c           |  12 +-
 5 files changed, 335 insertions(+), 258 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index b2db69d..2007dc0 100644
--- a/HOWTO
+++ b/HOWTO
@@ -121,7 +121,7 @@ Command line options
 .. option:: --output-format=type
 
 	Set the reporting format to `normal`, `terse`, `json`, or `json+`.  Multiple
-	formats can be selected, separate by a comma.  `terse` is a CSV based
+	formats can be selected, separated by a comma.  `terse` is a CSV based
 	format.  `json+` is like `json`, except it adds a full dump of the latency
 	buckets.
 
@@ -135,16 +135,16 @@ Command line options
 
 .. option:: --help
 
-	Print this page.
+	Print a summary of the command line options and exit.
 
 .. option:: --cpuclock-test
 
 	Perform test and validation of internal CPU clock.
 
-.. option:: --crctest=test
+.. option:: --crctest=[test]
 
-    Test the speed of the builtin checksumming functions. If no argument is
-    given, all of them are tested. Or a comma separated list can be passed, in
+    Test the speed of the built-in checksumming functions. If no argument is
+    given, all of them are tested. Alternatively, a comma separated list can be passed, in
     which case the given ones are tested.
 
 .. option:: --cmdhelp=command
@@ -177,11 +177,13 @@ Command line options
 
 .. option:: --eta-newline=time
 
-	Force a new line for every `time` period passed.
+	Force a new line for every `time` period passed.  When the unit is omitted,
+	the value is interpreted in seconds.
 
 .. option:: --status-interval=time
 
-	Force full status dump every `time` period passed.
+	Force full status dump every `time` period passed.  When the unit is
+	omitted, the value is interpreted in seconds.
 
 .. option:: --section=name
 
@@ -196,11 +198,11 @@ Command line options
 
 .. option:: --alloc-size=kb
 
-    Set the internal smalloc pool to this size in kb (def 1024).  The
+    Set the internal smalloc pool to this size in KiB.  The
     ``--alloc-size`` switch allows one to use a larger pool size for smalloc.
     If running large jobs with randommap enabled, fio can run out of memory.
     Smalloc is an internal allocator for shared structures from a fixed size
-    memory pool. The pool size defaults to 16M and can grow to 8 pools.
+    memory pool and can grow to 16 pools. The pool size defaults to 16MiB.
 
     NOTE: While running :file:`.fio_smalloc.*` backing store files are visible
     in :file:`/tmp`.
@@ -234,9 +236,16 @@ Command line options
 
 .. option:: --idle-prof=option
 
-	Report cpu idleness on a system or percpu basis
-	``--idle-prof=system,percpu`` or
-	run unit work calibration only ``--idle-prof=calibrate``.
+	Report CPU idleness. *option* is one of the following:
+
+		**calibrate**
+			Run unit work calibration only and exit.
+
+		**system**
+			Show aggregate system idleness and unit work.
+
+		**percpu**
+			As **system** but also show per CPU idleness.
 
 .. option:: --inflate-log=log
 
@@ -468,10 +477,10 @@ Parameter types
     String. This is a sequence of alpha characters.
 
 **time**
-	Integer with possible time suffix. In seconds unless otherwise
-	specified, use e.g. 10m for 10 minutes. Accepts s/m/h for seconds, minutes,
-	and hours, and accepts 'ms' (or 'msec') for milliseconds, and 'us' (or
-	'usec') for microseconds.
+	Integer with possible time suffix.  Without a unit, the value is interpreted as
+	seconds unless otherwise specified.  Accepts a suffix of 'd' for days, 'h' for
+	hours, 'm' for minutes, 's' for seconds, 'ms' (or 'msec') for milliseconds and
+	'us' (or 'usec') for microseconds.  For example, use 10m for 10 minutes.
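The time syntax above reduces to an integer times a per-suffix multiplier. Below is a minimal sketch that converts such a string to microseconds. `parse_time_us()` and `USEC_PER_SEC` are hypothetical, not fio's parser. The caller supplies the no-suffix default, because some options (such as `thinktime`) default to microseconds rather than seconds.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define USEC_PER_SEC	1000000ULL

/*
 * Hypothetical sketch: digits plus an optional, case-insensitive
 * suffix (d, h, m, s/sec, ms/msec, us/usec). For time values, 'm'
 * means minutes. default_us is the multiplier used when no suffix
 * is given. Stores microseconds in *out; returns 0, or -1 on bad input.
 */
int parse_time_us(const char *s, uint64_t default_us, uint64_t *out)
{
	static const struct { const char *sfx; uint64_t us; } units[] = {
		{ "d", 86400 * USEC_PER_SEC }, { "h", 3600 * USEC_PER_SEC },
		{ "m", 60 * USEC_PER_SEC },    { "s", USEC_PER_SEC },
		{ "sec", USEC_PER_SEC },       { "ms", 1000 },
		{ "msec", 1000 },              { "us", 1 },
		{ "usec", 1 },
	};
	uint64_t mult = 0;
	char sfx[8], *end;
	size_t n;

	if (!isdigit((unsigned char)*s))
		return -1;		/* rejects signs and leading spaces */
	errno = 0;
	unsigned long long v = strtoull(s, &end, 10);
	if (errno)
		return -1;

	/* Lower-case the suffix so "10M" and "10m" both mean minutes. */
	for (n = 0; end[n] && n < sizeof(sfx) - 1; n++)
		sfx[n] = (char)tolower((unsigned char)end[n]);
	if (end[n])
		return -1;		/* suffix too long to be valid */
	sfx[n] = '\0';

	if (n == 0)
		mult = default_us;
	for (size_t i = 0; n && i < sizeof(units) / sizeof(units[0]); i++)
		if (strcmp(sfx, units[i].sfx) == 0)
			mult = units[i].us;
	if (mult == 0 || v > UINT64_MAX / mult)
		return -1;		/* unknown suffix or overflow */
	*out = v * mult;
	return 0;
}
```

For example, `10m` yields 600000000 microseconds, and a bare `5` yields 5000000 when the default is seconds.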
 
 .. _int:
 
@@ -486,9 +495,10 @@ Parameter types
 
 	The optional *integer suffix* specifies the number's units, and includes an
 	optional unit prefix and an optional unit.  For quantities of data, the
-	default unit is bytes. For quantities of time, the default unit is seconds.
+	default unit is bytes. For quantities of time, the default unit is seconds
+	unless otherwise specified.
 
-	With :option:`kb_base` =1000, fio follows international standards for unit
+	With :option:`kb_base`\=1000, fio follows international standards for unit
 	prefixes.  To specify power-of-10 decimal values defined in the
 	International System of Units (SI):
 
@@ -506,7 +516,7 @@ Parameter types
 		* *T* -- means tebi (Ti) or 1024**4
 		* *P* -- means pebi (Pi) or 1024**5
 
-	With :option:`kb_base` =1024 (the default), the unit prefixes are opposite
+	With :option:`kb_base`\=1024 (the default), the unit prefixes are opposite
 	from those specified in the SI and IEC 80000-13 standards to provide
 	compatibility with old scripts.  For example, 4k means 4096.
 
@@ -516,7 +526,7 @@ Parameter types
 	The *integer suffix* is not case sensitive (e.g., m/mi mean mebi/mega,
 	not milli). 'b' and 'B' both mean byte, not bit.
 
-	Examples with :option:`kb_base` =1000:
+	Examples with :option:`kb_base`\=1000:
 
 		* *4 KiB*: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
 		* *1 MiB*: 1048576, 1mi, 1024ki
@@ -524,7 +534,7 @@ Parameter types
 		* *1 TiB*: 1099511627776, 1ti, 1024gi, 1048576mi
 		* *1 TB*: 1000000000000, 1t, 1000g, 1000000m
 
-	Examples with :option:`kb_base` =1024 (default):
+	Examples with :option:`kb_base`\=1024 (default):
 
 		* *4 KiB*: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
 		* *1 MiB*: 1048576, 1m, 1024k
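The `kb_base` rules above can be sketched as a small parser. With `kb_base=1000`, a bare prefix is SI and a prefix followed by `i` is IEC. With `kb_base=1024`, the two meanings swap, following the "opposite" rule stated above, so `1mi` is 1000000 in that mode. `parse_size()` is hypothetical and only handles the k/m/g/t/p prefixes listed in this section.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: digits, an optional k/m/g/t/p prefix
 * (case-insensitive), an optional 'i', then an optional 'b'/'B'
 * meaning bytes. Returns 0 and stores the byte count, or -1.
 */
int parse_size(const char *s, unsigned int kb_base, uint64_t *out)
{
	static const char prefixes[] = "kmgtp";	/* powers 1..5 */
	uint64_t mult = 1;
	char *end;

	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	unsigned long long v = strtoull(s, &end, 10);
	if (errno)
		return -1;

	if (*end && *end != 'b' && *end != 'B') {
		const char *p = strchr(prefixes, tolower((unsigned char)*end));
		int power, iec = 0;
		uint64_t base;

		if (!p)
			return -1;
		power = (int)(p - prefixes) + 1;
		end++;
		if (*end == 'i' || *end == 'I') {
			iec = 1;
			end++;
		}
		if (kb_base == 1000)
			base = iec ? 1024 : 1000;	/* SI bare, IEC with 'i' */
		else
			base = iec ? 1000 : 1024;	/* default: meanings swapped */
		while (power--)
			mult *= base;
	}
	if (*end == 'b' || *end == 'B')
		end++;
	if (*end)
		return -1;		/* trailing garbage */
	if (v > UINT64_MAX / mult)
		return -1;		/* result would overflow */
	*out = v * mult;
	return 0;
}
```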
@@ -536,15 +546,15 @@ Parameter types
 
 		* *D* -- means days
 		* *H* -- means hours
-		* *M* -- mean minutes
+		* *M* -- means minutes
 		* *s* -- or sec means seconds (default)
 		* *ms* -- or *msec* means milliseconds
 		* *us* -- or *usec* means microseconds
 
 	If the option accepts an upper and lower range, use a colon ':' or
 	minus '-' to separate such values. See :ref:`irange <irange>`.
-	If the lower value specified happens to be larger than the upper value,
-	two values are swapped.
+	If the lower value specified happens to be larger than the upper value
+	the two values are swapped.
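The range rule above, accepting `:` or `-` as a separator and swapping reversed bounds, can be sketched as follows. `parse_range()` is hypothetical. To keep it short, it accepts only plain integers, whereas real range values also take the unit suffixes described above.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse "lo:hi" or "lo-hi" (plain integers only)
 * and swap the bounds if the lower one is larger. Returns 0, or -1.
 */
int parse_range(const char *s, uint64_t *lo, uint64_t *hi)
{
	unsigned long long a, b;
	char *end;

	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	a = strtoull(s, &end, 10);
	if (errno || (*end != ':' && *end != '-'))
		return -1;
	s = end + 1;
	if (!isdigit((unsigned char)*s))
		return -1;
	b = strtoull(s, &end, 10);
	if (errno || *end)
		return -1;
	if (a > b) {		/* larger value given first: swap them */
		unsigned long long t = a;
		a = b;
		b = t;
	}
	*lo = a;
	*hi = b;
	return 0;
}
```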
 
 .. _bool:
 
@@ -638,7 +648,7 @@ Job description
 	larger number of threads/processes doing the same thing. Each thread is
 	reported separately; to see statistics for all clones as a whole, use
 	:option:`group_reporting` in conjunction with :option:`new_group`.
-	See :option:`--max-jobs`.
+	See :option:`--max-jobs`.  Default: 1.
 
 
 Time related parameters
@@ -649,7 +659,7 @@ Time related parameters
 	Tell fio to terminate processing after the specified period of time.  It
 	can be quite hard to determine for how long a specified job will run, so
 	this parameter is handy to cap the total runtime to a given time.  When
-	the unit is omitted, the value is given in seconds.
+	the unit is omitted, the value is interpreted in seconds.
 
 .. option:: time_based
 
@@ -659,10 +669,9 @@ Time related parameters
 
 .. option:: startdelay=irange(time)
 
-	Delay start of job for the specified number of seconds. Supports all time
-	suffixes to allow specification of hours, minutes, seconds and milliseconds
-	-- seconds are the default if a unit is omitted.  Can be given as a range
-	which causes each thread to choose randomly out of the range.
+	Delay the start of job for the specified amount of time.  Can be a single
+	value or a range.  When given as a range, each thread will choose a value
+	randomly from within the range.  Value is in seconds if a unit is omitted.
 
 .. option:: ramp_time=time
 
@@ -723,36 +732,41 @@ Target file/device
 	Prefix filenames with this directory. Used to place files in a different
 	location than :file:`./`.  You can specify a number of directories by
 	separating the names with a ':' character. These directories will be
-	assigned equally distributed to job clones creates with :option:`numjobs` as
+	assigned equally distributed to job clones created by :option:`numjobs` as
 	long as they are using generated filenames. If specific `filename(s)` are
 	set fio will use the first listed directory, and thereby matching the
 	`filename` semantic which generates a file each clone if not specified, but
 	let all clones use the same if set.
 
-	See the :option:`filename` option for escaping certain characters.
+	See the :option:`filename` option for information on how to escape "``:``" and
+	"``\``" characters within the directory path itself.
 
 .. option:: filename=str
 
 	Fio normally makes up a `filename` based on the job name, thread number, and
-	file number. If you want to share files between threads in a job or several
+	file number (see :option:`filename_format`). If you want to share files
+	between threads in a job or several
 	jobs with fixed file paths, specify a `filename` for each of them to override
 	the default. If the ioengine is file based, you can specify a number of files
 	by separating the names with a ':' colon. So if you wanted a job to open
 	:file:`/dev/sda` and :file:`/dev/sdb` as the two working files, you would use
 	``filename=/dev/sda:/dev/sdb``. This also means that whenever this option is
 	specified, :option:`nrfiles` is ignored. The size of regular files specified
-	by this option will be :option:`size` divided by number of files unless
+	by this option will be :option:`size` divided by number of files unless an
 	explicit size is specified by :option:`filesize`.
 
+	Each colon and backslash in the wanted path must be escaped with a ``\``
+	character.  For instance, if the path is :file:`/dev/dsk/foo@3,0:c` then you
+	would use ``filename=/dev/dsk/foo@3,0\:c`` and if the path is
+	:file:`F:\\filename` then you would use ``filename=F\:\\filename``.
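The escaping rule above is mechanical: put a backslash in front of every `:` and `\` in the path. Below is a minimal sketch that builds a `filename=` value from a raw path. `escape_filename()` is a hypothetical helper, not part of fio.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of path with each ':' and '\'
 * preceded by a '\', or NULL if allocation fails. Caller frees.
 */
char *escape_filename(const char *path)
{
	size_t len = strlen(path);
	char *out = malloc(2 * len + 1);	/* worst case: every char escaped */
	char *p = out;

	if (!out)
		return NULL;
	for (; *path; path++) {
		if (*path == ':' || *path == '\\')
			*p++ = '\\';
		*p++ = *path;
	}
	*p = '\0';
	return out;
}
```

Given `/dev/dsk/foo@3,0:c`, it returns `/dev/dsk/foo@3,0\:c`. Given `F:\filename`, it returns `F\:\\filename`, matching the two examples in the text.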
+
 	On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
 	the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
 	Note: Windows and FreeBSD prevent write access to areas
-	of the disk containing in-use data (e.g. filesystems).  If the wanted
-	`filename` does need to include a colon, then escape that with a ``\``
-	character. For instance, if the `filename` is :file:`/dev/dsk/foo@3,0:c`,
-	then you would use ``filename="/dev/dsk/foo@3,0\:c"``.  The
-	:file:`-` is a reserved name, meaning stdin or stdout.  Which of the two
-	depends on the read/write direction set.
+	of the disk containing in-use data (e.g. filesystems).
+
+	The filename "`-`" is a reserved name, meaning *stdin* or *stdout*.  Which
+	of the two depends on the read/write direction set.
 
 .. option:: filename_format=str
 
@@ -860,27 +874,28 @@ Target file/device
 
 	If true, serialize the file creation for the jobs.  This may be handy to
 	avoid interleaving of data files, which may greatly depend on the filesystem
-	used and even the number of processors in the system.
+	used and even the number of processors in the system.  Default: true.
 
 .. option:: create_fsync=bool
 
-	fsync the data file after creation. This is the default.
+	:manpage:`fsync(2)` the data file after creation. This is the default.
 
 .. option:: create_on_open=bool
 
-	Don't pre-setup the files for I/O, just create open() when it's time to do
-	I/O to that file.
+	If true, don't pre-create files but allow the job's open() to create a file
+	when it's time to do I/O.  Default: false -- pre-create all necessary files
+	when the job starts.
 
 .. option:: create_only=bool
 
 	If true, fio will only run the setup phase of the job.  If files need to be
-	laid out or updated on disk, only that will be done. The actual job contents
-	are not executed.
+	laid out or updated on disk, only that will be done -- the actual job contents
+	are not executed.  Default: false.
 
 .. option:: allow_file_create=bool
 
-	If true, fio is permitted to create files as part of its workload. This is
-	the default behavior. If this option is false, then fio will error out if
+	If true, fio is permitted to create files as part of its workload.  If this
+	option is false, then fio will error out if
 	the files it needs to use don't already exist. Default: true.
 
 .. option:: allow_mounted_write=bool
@@ -897,16 +912,18 @@ Target file/device
 	given I/O operation. This will also clear the :option:`invalidate` flag,
 	since it is pointless to pre-read and then drop the cache. This will only
 	work for I/O engines that are seek-able, since they allow you to read the
-	same data multiple times. Thus it will not work on e.g. network or splice I/O.
+	same data multiple times. Thus it will not work on non-seekable I/O engines
+	(e.g. network, splice). Default: false.
 
 .. option:: unlink=bool
 
 	Unlink the job files when done. Not the default, as repeated runs of that
-	job would then waste time recreating the file set again and again.
+	job would then waste time recreating the file set again and again. Default:
+	false.
 
 .. option:: unlink_each_loop=bool
 
-	Unlink job files after each iteration or loop.
+	Unlink job files after each iteration or loop.  Default: false.
 
 .. option:: zonesize=int
 
@@ -952,10 +969,10 @@ I/O type
 				Sequential writes.
 		**trim**
 				Sequential trims (Linux block devices only).
-		**randwrite**
-				Random writes.
 		**randread**
 				Random reads.
+		**randwrite**
+				Random writes.
 		**randtrim**
 				Random trims (Linux block devices only).
 		**rw,readwrite**
@@ -968,15 +985,16 @@ I/O type
 
 	Fio defaults to read if the option is not specified.  For the mixed I/O
 	types, the default is to split them 50/50.  For certain types of I/O the
-	result may still be skewed a bit, since the speed may be different. It is
-	possible to specify a number of I/O's to do before getting a new offset,
-	this is done by appending a ``:<nr>`` to the end of the string given.  For a
+	result may still be skewed a bit, since the speed may be different.
+
+	It is possible to specify the number of I/Os to do before getting a new
+	offset by appending ``:<nr>`` to the end of the string given.  For a
 	random read, it would look like ``rw=randread:8`` for passing in an offset
 	modifier with a value of 8. If the suffix is used with a sequential I/O
-	pattern, then the value specified will be added to the generated offset for
-	each I/O.  For instance, using ``rw=write:4k`` will skip 4k for every
-	write. It turns sequential I/O into sequential I/O with holes.  See the
-	:option:`rw_sequencer` option.
+	pattern, then the *<nr>* value specified will be **added** to the generated
+	offset for each I/O turning sequential I/O into sequential I/O with holes.
+	For instance, using ``rw=write:4k`` will skip 4k for every write.  Also see
+	the :option:`rw_sequencer` option.
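With `bs=4k` and `rw=write:4k`, each write starts 4 KiB past the end of the previous one, so the job touches every other block. Below is a sketch of that offset arithmetic. It assumes a fixed block size and ignores wrap-around at the end of the file. `seq_offsets()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill offs[0..n-1] with the start offsets of n sequential I/Os of
 * size bs, where each I/O is followed by a hole of `skip` bytes
 * (the <nr> value from rw=write:<nr>).
 */
void seq_offsets(uint64_t start, uint64_t bs, uint64_t skip,
		 uint64_t *offs, size_t n)
{
	uint64_t off = start;

	for (size_t i = 0; i < n; i++) {
		offs[i] = off;
		off += bs + skip;	/* next start = end of this I/O + hole */
	}
}
```

With `skip` set to 0, the same loop produces plain back-to-back sequential offsets.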
 
 .. option:: rw_sequencer=str
 
@@ -1099,23 +1117,25 @@ I/O type
 
 .. option:: fsync=int
 
-	If writing to a file, issue a sync of the dirty data for every number of
-	blocks given. For example, if you give 32 as a parameter, fio will sync the
-	file for every 32 writes issued. If fio is using non-buffered I/O, we may
-	not sync the file. The exception is the sg I/O engine, which synchronizes
-	the disk cache anyway. Defaults to 0, which means no sync every certain
-	number of writes.
+	If writing to a file, issue an :manpage:`fsync(2)` (or its equivalent) of
+	the dirty data for every number of blocks given. For example, if you give 32
+	as a parameter, fio will sync the file after every 32 writes issued. If fio is
+	using non-buffered I/O, we may not sync the file. The exception is the sg
+	I/O engine, which synchronizes the disk cache anyway. Defaults to 0, which
+	means fio does not periodically issue and wait for a sync to complete. Also
+	see :option:`end_fsync` and :option:`fsync_on_close`.
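Below is a rough model of `fsync=<n>` for buffered writes: count the writes and call fsync(2) whenever the count reaches a multiple of `n`. The sketch uses plain POSIX `open()`, `write()` and `fsync()`. `write_with_periodic_fsync()` and its zero-filled 4 KiB buffer limit are assumptions for illustration. It treats a short write as an error instead of retrying it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `count` zero-filled blocks of `bs` bytes (bs <= 4096) to path,
 * calling fsync(2) after every `every` writes. every == 0 disables the
 * periodic sync, like fsync=0. Returns the number of fsync(2) calls
 * made, or -1 on error.
 */
int write_with_periodic_fsync(const char *path, size_t bs,
			      unsigned int count, unsigned int every)
{
	char buf[4096];
	int fd, syncs = 0;

	if (bs > sizeof(buf))
		return -1;
	memset(buf, 0, bs);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	for (unsigned int i = 1; i <= count; i++) {
		if (write(fd, buf, bs) != (ssize_t)bs)
			goto fail;	/* error or short write */
		if (every && i % every == 0) {
			if (fsync(fd) < 0)
				goto fail;
			syncs++;
		}
	}
	close(fd);
	return syncs;
fail:
	close(fd);
	return -1;
}
```

With `every` set to 32, 100 writes produce three syncs, after writes 32, 64 and 96. In this model, writes 97 to 100 are left unsynced, which is the gap that `end_fsync` and `fsync_on_close` cover.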
 
 .. option:: fdatasync=int
 
 	Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
 	not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
-	:manpage:`fdatasync(2)`, this falls back to using :manpage:`fsync(2)`.
-	Defaults to 0, which means no sync data every certain number of writes.
+	:manpage:`fdatasync(2)` so this falls back to using :manpage:`fsync(2)`.
+	Defaults to 0, which means fio does not periodically issue and wait for a
+	data-only sync to complete.
 
 .. option:: write_barrier=int
 
-   Make every `N-th` write a barrier write.
+	Make every `N-th` write a barrier write.
 
 .. option:: sync_file_range=str:val
 
@@ -1140,17 +1160,18 @@ I/O type
 	If true, writes to a file will always overwrite existing data. If the file
 	doesn't already exist, it will be created before the write phase begins. If
 	the file exists and is large enough for the specified write phase, nothing
-	will be done.
+	will be done. Default: false.
 
 .. option:: end_fsync=bool
 
-	If true, fsync file contents when a write stage has completed.
+	If true, :manpage:`fsync(2)` file contents when a write stage has completed.
+	Default: false.
 
 .. option:: fsync_on_close=bool
 
 	If true, fio will :manpage:`fsync(2)` a dirty file on close.  This differs
-	from end_fsync in that it will happen on every file close, not just at the
-	end of the job.
+	from :option:`end_fsync` in that it will happen on every file close, not
+	just at the end of the job.  Default: false.
 
 .. option:: rwmixread=int
 
@@ -1383,8 +1404,8 @@ Buffers and memory
 .. option:: buffer_compress_percentage=int
 
 	If this is set, then fio will attempt to provide I/O buffer content (on
-	WRITEs) that compress to the specified level. Fio does this by providing a
-	mix of random data and a fixed pattern. The fixed pattern is either zeroes,
+	WRITEs) that compresses to the specified level. Fio does this by providing a
+	mix of random data and a fixed pattern. The fixed pattern is either zeros,
 	or the pattern specified by :option:`buffer_pattern`. If the pattern option
 	is used, it might skew the compression ratio slightly. Note that this is per
 	block size unit, for file/disk wide compression level that matches this
@@ -1438,8 +1459,8 @@ Buffers and memory
 
 .. option:: invalidate=bool
 
-	Invalidate the buffer/page cache parts for this file prior to starting
-	I/O if the platform and file type support it. Defaults to true.
+	Invalidate the buffer/page cache parts of the files to be used prior to
+	starting I/O if the platform and file type support it.  Defaults to true.
 	This will be ignored if :option:`pre_read` is also specified for the
 	same job.
 
@@ -1465,7 +1486,7 @@ Buffers and memory
 			Same as shm, but use huge pages as backing.
 
 		**mmap**
-			Use mmap to allocate buffers. May either be anonymous memory, or can
+			Use :manpage:`mmap(2)` to allocate buffers. May either be anonymous memory, or can
 			be file backed if a filename is given after the option. The format
 			is `mem=mmap:/path/to/file`.
 
@@ -1551,7 +1572,7 @@ I/O size
 	and :option:`io_size` is set to 40GiB, then fio will do 40GiB of I/O within
 	the 0..20GiB region.
 
-.. option:: filesize=int
+.. option:: filesize=irange(int)
 
 	Individual file sizes. May be a range, in which case fio will select sizes
 	for files at random within the given range and limited to :option:`size` in
@@ -1604,7 +1625,7 @@ I/O engine
 
 		**libaio**
 			Linux native asynchronous I/O. Note that Linux may only support
-			queued behaviour with non-buffered I/O (set ``direct=1`` or
+			queued behavior with non-buffered I/O (set ``direct=1`` or
 			``buffered=0``).
 			This engine defines engine specific options.
 
@@ -1654,7 +1675,7 @@ I/O engine
 		**cpuio**
 			Doesn't transfer any data, but burns CPU cycles according to the
 			:option:`cpuload` and :option:`cpuchunks` options. Setting
-			:option:`cpuload` =85 will cause that job to do nothing but burn 85%
+			:option:`cpuload`\=85 will cause that job to do nothing but burn 85%
 			of the CPU. In case of SMP machines, use :option:`numjobs`
 			=<no_of_cpu> to get desired CPU usage, as the cpuload only loads a
 			single CPU at the desired rate. A job never finishes unless there is
@@ -1701,26 +1722,26 @@ I/O engine
 			ioengine defines engine specific options.
 
 		**gfapi**
-			Using Glusterfs libgfapi sync interface to direct access to
-			Glusterfs volumes without having to go through FUSE.  This ioengine
+			Using GlusterFS libgfapi sync interface to direct access to
+			GlusterFS volumes without having to go through FUSE.  This ioengine
 			defines engine specific options.
 
 		**gfapi_async**
-			Using Glusterfs libgfapi async interface to direct access to
-			Glusterfs volumes without having to go through FUSE. This ioengine
+			Using GlusterFS libgfapi async interface to direct access to
+			GlusterFS volumes without having to go through FUSE. This ioengine
 			defines engine specific options.
 
 		**libhdfs**
 			Read and write through Hadoop (HDFS).  The :file:`filename` option
 			is used to specify host,port of the hdfs name-node to connect.  This
 			engine interprets offsets a little differently.  In HDFS, files once
-			created cannot be modified.  So random writes are not possible. To
-			imitate this, libhdfs engine expects bunch of small files to be
-			created over HDFS, and engine will randomly pick a file out of those
-			files based on the offset generated by fio backend. (see the example
+			created cannot be modified so random writes are not possible. To
+			imitate this the libhdfs engine expects a bunch of small files to be
+			created over HDFS and will randomly pick a file from them
+			based on the offset generated by fio backend (see the example
 			job file to create such files, use ``rw=write`` option). Please
-			note, you might want to set necessary environment variables to work
-			with hdfs/libhdfs properly.  Each job uses its own connection to
+			note, it may be necessary to set environment variables to work
+			with HDFS/libhdfs properly.  Each job uses its own connection to
 			HDFS.
 
 		**mtd**
@@ -1728,7 +1749,7 @@ I/O engine
 			:file:`/dev/mtd0`). Discards are treated as erases. Depending on the
 			underlying device type, the I/O may have to go in a certain pattern,
 			e.g., on NAND, writing sequentially to erase blocks and discarding
-			before overwriting. The writetrim mode works well for this
+			before overwriting. The `trimwrite` mode works well for this
 			constraint.
 
 		**pmemblk**
@@ -1782,13 +1803,13 @@ caveat that when used on the command line, they must come after the
 
 .. option:: hostname=str : [netsplice] [net]
 
-	The host name or IP address to use for TCP or UDP based I/O.  If the job is
-	a TCP listener or UDP reader, the host name is not used and must be omitted
+	The hostname or IP address to use for TCP or UDP based I/O.  If the job is
+	a TCP listener or UDP reader, the hostname is not used and must be omitted
 	unless it is a valid UDP multicast address.
 
 .. option:: namenode=str : [libhdfs]
 
-	The host name or IP address of a HDFS cluster namenode to contact.
+	The hostname or IP address of a HDFS cluster namenode to contact.
 
 .. option:: port=int
 
@@ -1865,7 +1886,7 @@ caveat that when used on the command line, they must come after the
 
 .. option:: donorname=str : [e4defrag]
 
-	File will be used as a block donor(swap extents between files).
+	File will be used as a block donor (swap extents between files).
 
 .. option:: inplace=int : [e4defrag]
 
@@ -1919,7 +1940,7 @@ I/O depth
 	for small degrees when :option:`verify_async` is in use).  Even async
 	engines may impose OS restrictions causing the desired depth not to be
 	achieved.  This may happen on Linux when using libaio and not setting
-	:option:`direct` =1, since buffered I/O is not async on that OS.  Keep an
+	:option:`direct`\=1, since buffered I/O is not async on that OS.  Keep an
 	eye on the I/O depth distribution in the fio output to verify that the
 	achieved depth is as expected. Default: 1.
 
@@ -1942,9 +1963,9 @@ I/O depth
 .. option:: iodepth_batch_complete_max=int
 
 	This defines maximum pieces of I/O to retrieve at once. This variable should
-	be used along with :option:`iodepth_batch_complete_min` =int variable,
+	be used along with :option:`iodepth_batch_complete_min`\=int variable,
 	specifying the range of min and max amount of I/O which should be
-	retrieved. By default it is equal to :option:`iodepth_batch_complete_min`
+	retrieved. By default it is equal to the :option:`iodepth_batch_complete_min`
 	value.
 
 	Example #1::
@@ -1982,7 +2003,7 @@ I/O depth
 	has a bit of extra overhead, especially for lower queue depth I/O where it
 	can increase latencies. The benefit is that fio can manage submission rates
 	independently of the device completion rates. This avoids skewed latency
-	reporting if I/O gets back up on the device side (the coordinated omission
+	reporting if I/O gets backed up on the device side (the coordinated omission
 	problem).
 
 
@@ -1993,7 +2014,7 @@ I/O rate
 
 	Stall the job for the specified period of time after an I/O has completed before issuing the
 	next. May be used to simulate processing being done by an application.
-	When the unit is omitted, the value is given in microseconds.  See
+	When the unit is omitted, the value is interpreted in microseconds.  See
 	:option:`thinktime_blocks` and :option:`thinktime_spin`.
 
 .. option:: thinktime_spin=time
@@ -2001,7 +2022,7 @@ I/O rate
 	Only valid if :option:`thinktime` is set - pretend to spend CPU time doing
 	something with the data received, before falling back to sleeping for the
 	rest of the period specified by :option:`thinktime`.  When the unit is
-	omitted, the value is given in microseconds.
+	omitted, the value is interpreted in microseconds.
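The "interpreted in microseconds" wording means a bare number and a number with a unit can describe very different durations. The Python sketch below illustrates the default-unit rule. The function name and the suffix table (``us``, ``ms``, ``s``, ``m``, ``h``) are assumptions for illustration and do not come from fio's own parser.

```python
# Hypothetical helper, not part of fio: convert a "time" option value to
# microseconds. A bare number uses the option's default unit
# (microseconds for thinktime and the latency options, seconds for the
# steady state options). The suffix table is an assumed subset.
USEC_PER_UNIT = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60 * 1_000_000,
    "h": 3_600 * 1_000_000,
}


def time_to_usec(value, default_unit):
    """Return the number of microseconds described by value."""
    value = value.strip().lower()
    number = value.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = value[len(number):] or default_unit
    if unit not in USEC_PER_UNIT:
        raise ValueError(f"unsupported unit {unit!r} in {value!r}")
    return int(number) * USEC_PER_UNIT[unit]


print(time_to_usec("500", "us"))   # thinktime=500 -> 500 microseconds
print(time_to_usec("30", "s"))     # ss_dur=30 -> 30 seconds in microseconds
print(time_to_usec("10ms", "us"))  # an explicit unit wins over the default
```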
 
 .. option:: thinktime_blocks=int
 
@@ -2018,6 +2039,11 @@ I/O rate
 	suffix rules apply.  Comma-separated values may be specified for reads,
 	writes, and trims as described in :option:`blocksize`.
 
+	For example, using `rate=1m,500k` would limit reads to 1MiB/sec and writes to
+	500KiB/sec.  Capping only reads or writes can be done with `rate=,500k` or
+	`rate=500k,` where the former will only limit writes (to 500KiB/sec) and the
+	latter will only limit reads.
+
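The comma rules for ``rate`` can be sketched in Python. This sketch makes two assumptions. First, suffixes are base 1024, so ``1m`` is 1MiB. Second, it follows the :option:`blocksize` convention that a value not followed by a comma also applies to the directions after it. Under that second assumption, ``rate=,500k`` would cap trims as well as writes, although the text above only discusses reads and writes. The helper names are hypothetical.

```python
# Hypothetical sketch of how rate=read,write,trim could be split into
# per-direction limits in bytes/sec. Assumptions: base-1024 suffixes
# (k, m, g) and the blocksize convention that a value not followed by a
# comma also applies to the directions after it. None means "no limit".
SUFFIX = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
DIRECTIONS = ("read", "write", "trim")


def parse_size(text):
    """Turn '500k' or '1m' into bytes; an empty field means no limit."""
    text = text.strip().lower()
    if not text:
        return None
    unit = text[-1] if text[-1] in "kmg" else ""
    number = text[:-1] if unit else text
    return int(number) * SUFFIX[unit]


def parse_rate(value):
    fields = value.split(",")[: len(DIRECTIONS)]
    limits = {d: None for d in DIRECTIONS}
    for direction, field in zip(DIRECTIONS, fields):
        limits[direction] = parse_size(field)
    # Carry the last given value forward unless the string ended in a comma.
    if not value.endswith(","):
        last = limits[DIRECTIONS[len(fields) - 1]]
        for direction in DIRECTIONS[len(fields):]:
            limits[direction] = last
    return limits


print(parse_rate("1m,500k"))  # reads 1 MiB/s, writes (and trims) 500 KiB/s
print(parse_rate(",500k"))    # reads unlimited
print(parse_rate("500k,"))    # only reads limited
```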
 .. option:: rate_min=int[,int][,int]
 
 	Tell fio to do whatever it can to maintain at least this bandwidth. Failing
@@ -2057,14 +2083,14 @@ I/O latency
 
 	If set, fio will attempt to find the max performance point that the given
 	workload will run at while maintaining a latency below this target.  When
-	the unit is omitted, the value is given in microseconds.  See
+	the unit is omitted, the value is interpreted in microseconds.  See
 	:option:`latency_window` and :option:`latency_percentile`.
 
 .. option:: latency_window=time
 
 	Used with :option:`latency_target` to specify the sample window that the job
 	is run at varying queue depths to test the performance.  When the unit is
-	omitted, the value is given in microseconds.
+	omitted, the value is interpreted in microseconds.
 
 .. option:: latency_percentile=float
 
@@ -2076,13 +2102,13 @@ I/O latency
 .. option:: max_latency=time
 
 	If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
-	maximum latency. When the unit is omitted, the value is given in
+	maximum latency. When the unit is omitted, the value is interpreted in
 	microseconds.
 
 .. option:: rate_cycle=int
 
 	Average bandwidth for :option:`rate` and :option:`rate_min` over this number
-	of milliseconds.
+	of milliseconds. Defaults to 1000.
 
 
 I/O replay
@@ -2096,7 +2122,7 @@ I/O replay
 
 .. option:: read_iolog=str
 
-	Open an iolog with the specified file name and replay the I/O patterns it
+	Open an iolog with the specified filename and replay the I/O patterns it
 	contains. This can be used to store a workload and replay it sometime
 	later. The iolog given may also be a blktrace binary file, which allows fio
 	to replay a workload captured by :command:`blktrace`. See
@@ -2107,7 +2133,7 @@ I/O replay
 .. option:: replay_no_stall=int
 
 	When replaying I/O with :option:`read_iolog` the default behavior is to
-	attempt to respect the time stamps within the log and replay them with the
+	attempt to respect the timestamps within the log and replay them with the
 	appropriate delay between IOPS. By setting this variable fio will not
 	respect the timestamps and attempt to replay them as fast as possible while
 	still respecting ordering. The result is the same I/O pattern to a given
@@ -2120,9 +2146,9 @@ I/O replay
 	from.  This is sometimes undesirable because on a different machine those
 	major/minor numbers can map to a different device.  Changing hardware on the
 	same system can also result in a different major/minor mapping.
-	``replay_redirect`` causes all IOPS to be replayed onto the single specified
+	``replay_redirect`` causes all I/Os to be replayed onto the single specified
 	device regardless of the device it was recorded
-	from. i.e. :option:`replay_redirect` = :file:`/dev/sdc` would cause all I/O
+	from. For example, :option:`replay_redirect`\= :file:`/dev/sdc` would cause all I/O
 	in the blktrace or iolog to be replayed onto :file:`/dev/sdc`.  This means
 	multiple devices will be replayed onto a single device, if the trace
 	contains multiple devices. If you want multiple devices to be replayed
@@ -2146,15 +2172,14 @@ Threads, processes and job synchronization
 
 .. option:: thread
 
-	Fio defaults to forking jobs, however if this option is given, fio will use
-	POSIX Threads function :manpage:`pthread_create(3)` to create threads instead
-	of forking processes.
+	Fio defaults to creating jobs by using fork; however, if this option is
+	given, fio will create jobs by using POSIX Threads' function
+	:manpage:`pthread_create(3)` to create threads instead.
 
 .. option:: wait_for=str
 
-	Specifies the name of the already defined job to wait for. Single waitee
-	name only may be specified. If set, the job won't be started until all
-	workers of the waitee job are done.
+	If set, the current job won't be started until all workers of the specified
+	waitee job are done.
 
 	``wait_for`` operates on the job name basis, so there are a few
 	limitations. First, the waitee must be defined prior to the waiter job
@@ -2182,8 +2207,8 @@ Threads, processes and job synchronization
 
 .. option:: cpumask=int
 
-	Set the CPU affinity of this job. The parameter given is a bitmask of
-	allowed CPU's the job may run on. So if you want the allowed CPUs to be 1
+	Set the CPU affinity of this job. The parameter given is a bit mask of
+	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
 	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
 	:manpage:`sched_setaffinity(2)`. This may not work on all supported
 	operating systems or kernel versions. This option doesn't work well for a
@@ -2193,23 +2218,23 @@ Threads, processes and job synchronization
 
 .. option:: cpus_allowed=str
 
-	Controls the same options as :option:`cpumask`, but it allows a text setting
-	of the permitted CPUs instead. So to use CPUs 1 and 5, you would specify
-	``cpus_allowed=1,5``. This options also allows a range of CPUs. Say you
-	wanted a binding to CPUs 1, 5, and 8-15, you would set
-	``cpus_allowed=1,5,8-15``.
+	Controls the same options as :option:`cpumask`, but accepts a textual
+	specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
+	would specify ``cpus_allowed=1,5``. This option also allows a range of CPUs
+	to be specified -- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
+	would set ``cpus_allowed=1,5,8-15``.
 
 .. option:: cpus_allowed_policy=str
 
 	Set the policy of how fio distributes the CPUs specified by
-	:option:`cpus_allowed` or cpumask. Two policies are supported:
+	:option:`cpus_allowed` or :option:`cpumask`. Two policies are supported:
 
 		**shared**
 			All jobs will share the CPU set specified.
 		**split**
 			Each job will get a unique CPU from the CPU set.
 
-	**shared** is the default behaviour, if the option isn't specified. If
+	**shared** is the default behavior, if the option isn't specified. If
 	**split** is specified, then fio will will assign one cpu per job. If not
 	enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs
 	in the set.
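The relationship between a :option:`cpus_allowed` string and the :option:`cpumask` bit mask can be shown with a small Python sketch. The helper is hypothetical. It handles only plain numbers and A-B ranges, and it reproduces the value 34 from the :option:`cpumask` description for CPUs 1 and 5.

```python
# Sketch, not fio's parser: convert a cpus_allowed-style list such as
# "1,5,8-15" into the integer bit mask that cpumask= expects. CPU n sets
# bit n. Only plain numbers and A-B ranges are handled.
def cpus_to_mask(spec):
    mask = 0
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            cpus = range(first, last + 1)
        else:
            cpus = [int(part)]
        for cpu in cpus:
            mask |= 1 << cpu
    return mask


print(cpus_to_mask("1,5"))            # 34, the same value as (1 << 1 | 1 << 5)
print(hex(cpus_to_mask("1,5,8-15")))  # 0xff22
```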
@@ -2218,7 +2243,7 @@ Threads, processes and job synchronization
 
 	Set this job running on specified NUMA nodes' CPUs. The arguments allow
 	comma delimited list of cpu numbers, A-B ranges, or `all`. Note, to enable
-	numa options support, fio must be built on a system with libnuma-dev(el)
+	NUMA options support, fio must be built on a system with libnuma-dev(el)
 	installed.
 
 .. option:: numa_mem_policy=str
@@ -2228,11 +2253,11 @@ Threads, processes and job synchronization
 
 		<mode>[:<nodelist>]
 
-	``mode`` is one of the following memory policy: ``default``, ``prefer``,
-	``bind``, ``interleave``, ``local`` For ``default`` and ``local`` memory
-	policy, no node is needed to be specified.  For ``prefer``, only one node is
-	allowed.  For ``bind`` and ``interleave``, it allow comma delimited list of
-	numbers, A-B ranges, or `all`.
+	``mode`` is one of the following memory policies: ``default``, ``prefer``,
+	``bind``, ``interleave`` or ``local``. For ``default`` and ``local`` memory
+	policies, no node needs to be specified.  For ``prefer``, only one node is
+	allowed.  For ``bind`` and ``interleave`` the ``nodelist`` may be as
+	follows: a comma delimited list of numbers, A-B ranges, or `all`.
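The ``<mode>[:<nodelist>]`` rules can be read as a small validation routine. The Python sketch below is written only from the description above; fio performs its own checking, which may differ in detail. Two points are our interpretation rather than documented behavior: ``prefer`` and the list modes are treated as requiring a nodelist, and a nodelist given with ``default`` or ``local`` is simply ignored.

```python
# Hypothetical validator for numa_mem_policy=<mode>[:<nodelist>], written
# from the rules above; fio does its own checking and may differ in detail.
NO_NODELIST = {"default", "local"}
LIST_MODES = {"bind", "interleave"}


def valid_nodelist(nodelist):
    """Accept 'all' or a comma delimited list of numbers and A-B ranges."""
    if nodelist == "all":
        return True
    for part in nodelist.split(","):
        bounds = part.split("-")
        if len(bounds) > 2 or not all(b.isdigit() for b in bounds):
            return False
        if len(bounds) == 2 and int(bounds[0]) > int(bounds[1]):
            return False
    return True


def check_numa_mem_policy(value):
    mode, _, nodelist = value.partition(":")
    if mode in NO_NODELIST:
        return True                 # no node needed; any nodelist is ignored
    if mode == "prefer":
        return nodelist.isdigit()   # exactly one node
    if mode in LIST_MODES:
        return valid_nodelist(nodelist)
    return False                    # unknown mode
```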
 
 .. option:: cgroup=str
 
@@ -2288,8 +2313,9 @@ Threads, processes and job synchronization
 
 .. option:: exitall
 
-	When one job finishes, terminate the rest. The default is to wait for each
-	job to finish, sometimes that is not the desired action.
+	By default, fio will continue running all other jobs when one job finishes
+	but sometimes this is not the desired action.  Setting ``exitall`` will
+	instead make fio terminate all other jobs when one job finishes.
 
 .. option:: exec_prerun=str
 
@@ -2347,13 +2373,14 @@ Verification
 			header of each block.
 
 		**crc32c**
-			Use a crc32c sum of the data area and store it in the header of each
-			block.
+			Use a crc32c sum of the data area and store it in the header of
+			each block. This will automatically use hardware acceleration
+			(e.g. SSE4.2 on an x86 or CRC crypto extensions on ARM64) but will
+			fall back to software crc32c if none is found. This is generally the
+			fastest checksum fio supports when hardware accelerated.
 
 		**crc32c-intel**
-			Use hardware assisted crc32c calculation provided on SSE4.2 enabled
-			processors. Falls back to regular software crc32c, if not supported
-			by the system.
+			Synonym for crc32c.
 
 		**crc32**
 			Use a crc32 sum of the data area and store it in the header of each
@@ -2406,7 +2433,7 @@ Verification
 
 		**null**
 			Only pretend to verify. Useful for testing internals with
-			:option:`ioengine` `=null`, not for much else.
+			:option:`ioengine`\=null, not for much else.
 
 	This option can be used for repeated burn-in tests of a system to make sure
 	that the written data is also correctly read back. If the data direction
@@ -2442,7 +2469,7 @@ Verification
 	If set, fio will fill the I/O buffers with this pattern. Fio defaults to
 	filling with totally random bytes, but sometimes it's interesting to fill
 	with a known pattern for I/O verification purposes. Depending on the width
-	of the pattern, fio will fill 1/2/3/4 bytes of the buffer at the time(it can
+	of the pattern, fio will fill 1/2/3/4 bytes of the buffer at the time (it can
 	be either a decimal or a hex number).  The ``verify_pattern`` if larger than
 	a 32-bit quantity has to be a hex number that starts with either "0x" or
 	"0X". Use with :option:`verify`. Also, ``verify_pattern`` supports %o
@@ -2519,7 +2546,8 @@ Verification
 	If a verify termination trigger was used, fio stores the current write state
 	of each thread. This can be used at verification time so that fio knows how
 	far it should verify.  Without this information, fio will run a full
-	verification pass, according to the settings in the job file used.
+	verification pass, according to the settings in the job file used.  Default
+	false.
 
 .. option:: trim_percentage=int
 
@@ -2527,11 +2555,11 @@ Verification
 
 .. option:: trim_verify_zero=bool
 
-	Verify that trim/discarded blocks are returned as zeroes.
+	Verify that trim/discarded blocks are returned as zeros.
 
 .. option:: trim_backlog=int
 
-	Verify that trim/discarded blocks are returned as zeroes.
+	Trim after this number of blocks are written.
 
 .. option:: trim_backlog_batch=int
 
@@ -2582,13 +2610,13 @@ Steady state
 	A rolling window of this duration will be used to judge whether steady state
 	has been reached. Data will be collected once per second. The default is 0
 	which disables steady state detection.  When the unit is omitted, the
-	value is given in seconds.
+	value is interpreted in seconds.
 
 .. option:: steadystate_ramp_time=time, ss_ramp=time
 
 	Allow the job to run for the specified duration before beginning data
 	collection for checking the steady state job termination criterion. The
-	default is 0.  When the unit is omitted, the value is given in seconds.
+	default is 0.  When the unit is omitted, the value is interpreted in seconds.
 
 
 Measurements and reporting
@@ -2627,7 +2655,7 @@ Measurements and reporting
 	If given, write a bandwidth log for this job. Can be used to store data of
 	the bandwidth of the jobs in their lifetime. The included
 	:command:`fio_generate_plots` script uses :command:`gnuplot` to turn these
-	text files into nice graphs. See :option:`write_lat_log` for behaviour of
+	text files into nice graphs. See :option:`write_lat_log` for behavior of
 	given filename. For this option, the postfix is :file:`_bw.x.log`, where `x`
 	is the index of the job (`1..N`, where `N` is the number of jobs). If
 	:option:`per_job_logs` is false, then the filename will not include the job
@@ -2674,6 +2702,7 @@ Measurements and reporting
 	very large size. Setting this option makes fio average the each log entry
 	over the specified period of time, reducing the resolution of the log.  See
 	:option:`log_max_value` as well. Defaults to 0, logging all entries.
+	Also see `Log File Formats`_.
 
 .. option:: log_hist_msec=int
 
@@ -2897,7 +2926,8 @@ Act profile options
 .. option:: test-duration=time
 	:noindex:
 
-	How long the entire test takes to run.  Default: 24h.
+	How long the entire test takes to run.  When the unit is omitted, the value
+	is interpreted in seconds.  Default: 24h.
 
 .. option:: threads-per-queue=int
 	:noindex:
@@ -2950,13 +2980,20 @@ Tiobench profile options
 Interpreting the output
 -----------------------
 
+..
+	Example output was based on the following:
+	TZ=UTC fio --iodepth=8 --ioengine=null --size=100M --time_based \
+		--rate=1256k --bs=14K --name=quick --runtime=1s --name=mixed \
+		--runtime=2m --rw=rw
+
 Fio spits out a lot of output. While running, fio will display the status of the
 jobs created. An example of that would be::
 
     Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
 
-The characters inside the square brackets denote the current status of each
-thread. The possible values (in typical life cycle order) are:
+The characters inside the first set of square brackets denote the current status of
+each thread.  The first character is the first job defined in the job file, and so
+forth.  The possible values (in typical life cycle order) are:
 
 +------+-----+-----------------------------------------------------------+
 | Idle | Run |                                                           |
@@ -2969,6 +3006,8 @@ thread. The possible values (in typical life cycle order) are:
 +------+-----+-----------------------------------------------------------+
 |      |  p  | Thread running pre-reading file(s).                       |
 +------+-----+-----------------------------------------------------------+
+|      |  /  | Thread is in ramp period.                                 |
++------+-----+-----------------------------------------------------------+
 |      |  R  | Running, doing sequential reads.                          |
 +------+-----+-----------------------------------------------------------+
 |      |  r  | Running, doing random reads.                              |
@@ -2981,77 +3020,103 @@ thread. The possible values (in typical life cycle order) are:
 +------+-----+-----------------------------------------------------------+
 |      |  m  | Running, doing mixed random reads/writes.                 |
 +------+-----+-----------------------------------------------------------+
-|      |  F  | Running, currently waiting for :manpage:`fsync(2)`        |
+|      |  D  | Running, doing sequential trims.                          |
++------+-----+-----------------------------------------------------------+
+|      |  d  | Running, doing random trims.                              |
++------+-----+-----------------------------------------------------------+
+|      |  F  | Running, currently waiting for :manpage:`fsync(2)`.       |
 +------+-----+-----------------------------------------------------------+
 |      |  V  | Running, doing verification of written data.              |
 +------+-----+-----------------------------------------------------------+
+| f    |     | Thread finishing.                                         |
++------+-----+-----------------------------------------------------------+
 | E    |     | Thread exited, not reaped by main thread yet.             |
 +------+-----+-----------------------------------------------------------+
-| _    |     | Thread reaped, or                                         |
+| _    |     | Thread reaped.                                            |
 +------+-----+-----------------------------------------------------------+
 | X    |     | Thread reaped, exited with an error.                      |
 +------+-----+-----------------------------------------------------------+
 | K    |     | Thread reaped, exited due to signal.                      |
 +------+-----+-----------------------------------------------------------+
 
+..
+	Example output was based on the following:
+	TZ=UTC fio --iodepth=8 --ioengine=null --size=100M --runtime=58m \
+		--time_based --rate=2512k --bs=256K --numjobs=10 \
+		--name=readers --rw=read --name=writers --rw=write
+
 Fio will condense the thread string as not to take up more space on the command
-line as is needed. For instance, if you have 10 readers and 10 writers running,
+line than needed. For instance, if you have 10 readers and 10 writers running,
 the output would look like this::
 
     Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
 
-Fio will still maintain the ordering, though. So the above means that jobs 1..10
-are readers, and 11..20 are writers.
+Note that the status string is displayed in order, so it's possible to tell which of
+the jobs are currently doing what.  In the example above this means that jobs 1--10
+are readers and 11--20 are writers.
 
 The other values are fairly self explanatory -- number of threads currently
-running and doing I/O, the number of currently open files (f=), the rate of I/O
-since last check (read speed listed first, then write speed and optionally trim
-speed), and the estimated completion percentage and time for the current
+running and doing I/O, the number of currently open files (f=), the estimated
+completion percentage, the rate of I/O since last check (read speed listed first,
+then write speed and optionally trim speed) in terms of bandwidth and IOPS, and time to completion for the current
 running group. It's impossible to estimate runtime of the following groups (if
-any). Note that the string is displayed in order, so it's possible to tell which
-of the jobs are currently doing what. The first character is the first job
-defined in the job file, and so forth.
-
-When fio is done (or interrupted by :kbd:`ctrl-c`), it will show the data for
-each thread, group of threads, and disks in that order. For each data direction,
-the output looks like::
-
-    Client1 (g=0): err= 0:
-      write: io=    32MiB, bw=   666KiB/s, iops=89 , runt= 50320msec
-        slat (msec): min=    0, max=  136, avg= 0.03, stdev= 1.92
-        clat (msec): min=    0, max=  631, avg=48.50, stdev=86.82
-        bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
-      cpu        : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
-      IO depths    : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
-         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
-         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
-         issued r/w: total=0/32768, short=0/0
-         lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
-         lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%
-
-The client number is printed, along with the group id and error of that
-thread. Below is the I/O statistics, here for writes. In the order listed, they
-denote:
-
-**io**
-		Number of megabytes I/O performed.
-
-**bw**
-		Average bandwidth rate.
-
-**iops**
-		Average I/Os performed per second.
-
-**runt**
-		The runtime of that thread.
+any).
+
+..
+	Example output was based on the following:
+	TZ=UTC fio --iodepth=16 --ioengine=posixaio --filename=/tmp/fiofile \
+		--direct=1 --size=100M --time_based --runtime=50s --rate_iops=89 \
+		--bs=7K --name=Client1 --rw=write
+
+When fio is done (or interrupted by :kbd:`Ctrl-C`), it will show the data for
+each thread, group of threads, and disks in that order. For each overall thread (or
+group) the output looks like::
+
+	Client1: (groupid=0, jobs=1): err= 0: pid=16109: Sat Jun 24 12:07:54 2017
+	  write: IOPS=88, BW=623KiB/s (638kB/s)(30.4MiB/50032msec)
+	    slat (nsec): min=500, max=145500, avg=8318.00, stdev=4781.50
+	    clat (usec): min=170, max=78367, avg=4019.02, stdev=8293.31
+	     lat (usec): min=174, max=78375, avg=4027.34, stdev=8291.79
+	    clat percentiles (usec):
+	     |  1.00th=[  302],  5.00th=[  326], 10.00th=[  343], 20.00th=[  363],
+	     | 30.00th=[  392], 40.00th=[  404], 50.00th=[  416], 60.00th=[  445],
+	     | 70.00th=[  816], 80.00th=[ 6718], 90.00th=[12911], 95.00th=[21627],
+	     | 99.00th=[43779], 99.50th=[51643], 99.90th=[68682], 99.95th=[72877],
+	     | 99.99th=[78119]
+	   bw (  KiB/s): min=  532, max=  686, per=0.10%, avg=622.87, stdev=24.82, samples=  100
+	   iops        : min=   76, max=   98, avg=88.98, stdev= 3.54, samples=  100
+	    lat (usec) : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
+	    lat (msec) : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
+	    lat (msec) : 100=0.65%
+	  cpu          : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
+	  IO depths    : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
+	     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+	     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
+	     issued rwt: total=0,4450,0, short=0,0,0, dropped=0,0,0
+	     latency   : target=0, window=0, percentile=100.00%, depth=8
+
+The job name (or first job's name when using :option:`group_reporting`) is printed,
+along with the group id, count of jobs being aggregated, last error id seen (which
+is 0 when there are no errors), pid/tid of that thread and the time the job/group
+completed.  Below are the I/O statistics for each data direction performed (showing
+writes in the example above).  In the order listed, they denote:
+
+**read/write/trim**
+		The string before the colon shows the I/O direction the statistics
+		are for.  **IOPS** is the average I/Os performed per second.  **BW**
+		is the average bandwidth rate shown as: value in power of 2 format
+		(value in power of 10 format).  The last two values show: (**total
+		I/O performed** in power of 2 format / **runtime** of that thread).
 
 **slat**
-		Submission latency (avg being the average, stdev being the standard
-		deviation). This is the time it took to submit the I/O. For sync I/O,
-		the slat is really the completion latency, since queue/complete is one
-		operation there. This value can be in milliseconds or microseconds, fio
-		will choose the most appropriate base and print that. In the example
-		above, milliseconds is the best scale. Note: in :option:`--minimal` mode
+		Submission latency (**min** being the minimum, **max** being the
+		maximum, **avg** being the average, **stdev** being the standard
+		deviation).  This is the time it took to submit the I/O.  For
+		sync I/O this row is not displayed as the slat is really the
+		completion latency (since queue/complete is one operation there).
+		This value can be in nanoseconds, microseconds or milliseconds ---
+		fio will choose the most appropriate base and print that (in the
+		example above nanoseconds was the best scale).  Note: in :option:`--minimal` mode
 		latencies are always expressed in microseconds.
 
 **clat**
@@ -3062,11 +3127,15 @@ denote:
 		explanation).
 
 **bw**
-		Bandwidth. Same names as the xlat stats, but also includes an
-		approximate percentage of total aggregate bandwidth this thread received
-		in this group. This last value is only really useful if the threads in
-		this group are on the same disk, since they are then competing for disk
-		access.
+		Bandwidth statistics based on samples. Same names as the xlat stats,
+		but also includes the number of samples taken (**samples**) and an
+		approximate percentage of total aggregate bandwidth this thread
+		received in its group (**per**). This last value is only really
+		useful if the threads in this group are on the same disk, since they
+		are then competing for disk access.
+
+**iops**
+		IOPS statistics based on samples. Same names as bw.
 
 **cpu**
 		CPU usage. User and system time, along with the number of context
@@ -3076,23 +3145,27 @@ denote:
 		context and fault counters are summed.
 
 **IO depths**
-		The distribution of I/O depths over the job life time. The numbers are
-		divided into powers of 2, so for example the 16= entries includes depths
-		up to that value but higher than the previous entry. In other words, it
-		covers the range from 16 to 31.
+		The distribution of I/O depths over the job lifetime.  The numbers are
+		divided into powers of 2 and each entry covers depths from that value
+		up to those that are lower than the next entry -- e.g., 16= covers
+		depths from 16 to 31.  Note that the range covered by a depth
+		distribution entry can be different to the range covered by the
+		equivalent submit/complete distribution entry.
 
 **IO submit**
 		How many pieces of I/O were submitting in a single submit call. Each
 		entry denotes that amount and below, until the previous entry -- e.g.,
-		8=100% mean that we submitted anywhere in between 5-8 I/Os per submit
-		call.
+		16=100% means that we submitted anywhere between 9 and 16 I/Os per submit
+		call.  Note that the range covered by a submit distribution entry can
+		be different to the range covered by the equivalent depth distribution
+		entry.
 
 **IO complete**
 		Like the above submit number, but for completions instead.
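The difference between the two bucketing schemes can be illustrated with a small Python sketch. It is not fio's code. The labels come from the example output above. Treating 32= as depths 32 to 63, and anything larger as >=64, is our reading of the description.

```python
# Sketch of the two bucketing rules described above, using the bucket
# labels from the example output. Not fio's code: treating 32= as depths
# 32 to 63 and larger values as >=64 is our reading of the text.
def depth_bucket(depth):
    """IO depths: label is the lower bound, e.g. 16= holds depths 16-31."""
    if depth >= 64:
        return ">=64"
    label = 1
    for lower in (1, 2, 4, 8, 16, 32):
        if depth >= lower:
            label = lower
    return str(label)


def submit_bucket(count):
    """IO submit/complete: label is the upper bound, e.g. 16= holds 9-16."""
    if count == 0:
        return "0"
    for upper in (4, 8, 16, 32, 64):
        if count <= upper:
            return str(upper)
    return ">=64"


print(depth_bucket(16), depth_bucket(31))    # 16 16
print(submit_bucket(9), submit_bucket(16))   # 16 16
print(depth_bucket(12), submit_bucket(12))   # 8 16
```

The last line shows the same value, 12, landing under different labels in the two distributions.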
 
-**IO issued**
-		The number of read/write requests issued, and how many of them were
-		short.
+**IO issued rwt**
+		The number of read/write/trim requests issued, and how many of them were
+		short or dropped.
 
 **IO latencies**
 		The distribution of I/O completion latencies. This is the time from when
@@ -3101,27 +3174,31 @@ denote:
 		I/O completed within 2 msecs, 20=12.8% means that 12.8% of the I/O took
 		more than 10 msecs, but less than (or equal to) 20 msecs.
 
+..
+	Example output was based on the following:
+	TZ=UTC fio --ioengine=null --iodepth=2 --size=100M --numjobs=2 \
+		--rate_process=poisson --io_limit=32M --name=read --bs=128k \
+		--rate=11M --name=write --rw=write --bs=2k --rate=700k
+
 After each client has been listed, the group statistics are printed. They
 will look like this::
 
     Run status group 0 (all jobs):
-       READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
-      WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec
+       READ: bw=20.9MiB/s (21.9MB/s), 10.4MiB/s-10.8MiB/s (10.9MB/s-11.3MB/s), io=64.0MiB (67.1MB), run=2973-3069msec
+      WRITE: bw=1231KiB/s (1261kB/s), 616KiB/s-621KiB/s (630kB/s-636kB/s), io=64.0MiB (67.1MB), run=52747-53223msec
 
-For each data direction, it prints:
+For each data direction it prints:
 
+**bw**
+		Aggregate bandwidth of threads in this group followed by the
+		minimum and maximum bandwidth of all the threads in this group.
+		Values outside of brackets are power-of-2 format and those
+		within are the equivalent value in a power-of-10 format.
 **io**
-		Number of megabytes I/O performed.
-**aggrb**
-		Aggregate bandwidth of threads in this group.
-**minb**
-		The minimum average bandwidth a thread saw.
-**maxb**
-		The maximum average bandwidth a thread saw.
-**mint**
-		The smallest runtime of the threads in that group.
-**maxt**
-		The longest runtime of the threads in that group.
+		Aggregate I/O performed by all threads in this group. The
+		format is the same as bw.
+**run**
+		The smallest and longest runtimes of the threads in this group.
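The power-of-2 and power-of-10 values in the summary differ only in the divisor, 1024 versus 1000. The Python sketch below makes that concrete with hypothetical helper names. fio chooses its own display thresholds (it prints 1231KiB/s rather than 1.2MiB/s), so only the unit arithmetic, and not the exact formatting, is meant to match.

```python
# Sketch: show one byte count in the two notations used by the group
# summary. Power-of-2 units step by 1024, power-of-10 units by 1000.
# fio picks its own display thresholds (it prints 1231KiB/s rather than
# 1.2MiB/s), so only the unit arithmetic here is meant to match.
def scale(value, base, units):
    for unit in units[:-1]:
        if value < base:
            return value, unit
        value /= base
    return value, units[-1]


def both_formats(nbytes):
    v2, u2 = scale(nbytes, 1024, ["B", "KiB", "MiB", "GiB"])
    v10, u10 = scale(nbytes, 1000, ["B", "kB", "MB", "GB"])
    return f"{v2:.1f}{u2}", f"{v10:.1f}{u10}"


print(both_formats(64 * 1024 ** 2))  # ('64.0MiB', '67.1MB'), as in io= above
```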
 
 And finally, the disk statistics are printed. They will look like this::
 
@@ -3137,7 +3214,7 @@ numbers denote:
 		Number of merges I/O the I/O scheduler.
 **ticks**
 		Number of ticks we kept the disk busy.
-**io_queue**
+**in_queue**
 		Total time spent in the disk queue.
 **util**
 		The disk utilization. A value of 100% means we kept the disk
@@ -3163,7 +3240,8 @@ is one long line of values, such as::
 
 The job description (if provided) follows on a second line.
 
-To enable terse output, use the :option:`--minimal` command line option. The
+To enable terse output, use the :option:`--minimal` or
+:option:`--output-format`\=terse command line options. The
 first value is the version of the terse output format. If the output has to be
 changed for some reason, this number will be incremented by 1 to signify that
 change.
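Because the first value is the format version, a script consuming terse output can check it before trusting field positions. The Python sketch below shows this. It uses only the first five field names from the v3 list given further below, with the first one renamed ``terse_version``. The sample line and the helper name are invented for illustration.

```python
# Sketch: pick apart the start of a terse (--minimal) line. Field names
# follow the first five entries of the v3 list given further below; the
# sample line is invented. Checking the version first lets a script notice
# when the format has been bumped.
FIRST_FIELDS = ("terse_version", "fio_version", "jobname", "groupid", "error")


def parse_terse(line, expected_version="3"):
    values = line.rstrip("\n").split(";")
    record = dict(zip(FIRST_FIELDS, values))
    if record.get("terse_version") != expected_version:
        raise ValueError(f"unexpected terse version: {record.get('terse_version')!r}")
    record["remaining_fields"] = values[len(FIRST_FIELDS):]
    return record


sample = "3;fio-3.30;quick;0;0;12345"   # made-up, truncated example line
print(parse_terse(sample)["jobname"])     # quick
```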
@@ -3243,9 +3321,9 @@ For disk utilization, all disks used by fio are shown. So for each disk there
 will be a disk utilization section.
 
 Below is a single line containing short names for each of the fields in the
-minimal output v3, separated by semicolons:
+minimal output v3, separated by semicolons::
 
-terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+	terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_lat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_lat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;disk_write_ticks;disk_queue_time;disk_util
 
 
 Trace file format
@@ -3267,7 +3345,7 @@ Each line represents a single I/O action in the following format::
 
 where `rw=0/1` for read/write, and the offset and length entries being in bytes.
 
-This format is not supported in fio versions => 1.20-rc3.
+This format is not supported in fio versions >= 1.20-rc3.
 
 
 Trace file format v2
@@ -3364,7 +3442,7 @@ completions, etc.
 
 A trigger is invoked either through creation ('touch') of a specified file in
 the system, or through a timeout setting. If fio is run with
-:option:`--trigger-file` = :file:`/tmp/trigger-file`, then it will continually
+:option:`--trigger-file`\= :file:`/tmp/trigger-file`, then it will continually
 check for the existence of :file:`/tmp/trigger-file`. When it sees this file, it
 will fire off the trigger (thus saving state, and executing the trigger
 command).
@@ -3378,7 +3456,7 @@ will then execute the trigger.
 Verification trigger example
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Lets say we want to run a powercut test on the remote machine 'server'.  Our
+Let's say we want to run a powercut test on the remote machine 'server'.  Our
 write workload is in :file:`write-test.fio`. We want to cut power to 'server' at
 some point during the run, and we'll run this test from the safety or our local
 machine, 'localbox'. On the server, we'll start the fio backend normally::
@@ -3397,7 +3475,7 @@ on the server once it has received the trigger and sent us the write state. This
 will work, but it's not **really** cutting power to the server, it's merely
 abruptly rebooting it. If we have a remote way of cutting power to the server
 through IPMI or similar, we could do that through a local trigger command
-instead. Lets assume we have a script that does IPMI reboot of a given hostname,
+instead. Let's assume we have a script that does IPMI reboot of a given hostname,
 ipmi-reboot. On localbox, we could then have run fio with a local trigger
 instead::
 
@@ -3409,7 +3487,7 @@ execute ``ipmi-reboot server`` when that happened.
 Loading verify state
 ~~~~~~~~~~~~~~~~~~~~
 
-To load store write state, read verification job file must contain the
+To load stored write state, a read verification job file must contain the
 :option:`verify_state_load` option. If that is set, fio will load the previously
 stored state. For a local fio run this is done by loading the files directly,
 and on a client/server run, the server backend will ask the client to send the
@@ -3447,27 +3525,26 @@ The *offset* is the offset, in bytes, from the start of the file, for that
 particular I/O. The logging of the offset can be toggled with
 :option:`log_offset`.
 
-If windowed logging is enabled through :option:`log_avg_msec` then fio doesn't
-log individual I/Os. Instead of logs the average values over the specified period
-of time. Since 'data direction' and 'offset' are per-I/O values, they aren't
-applicable if windowed logging is enabled. If windowed logging is enabled and
-:option:`log_max_value` is set, then fio logs maximum values in that window
-instead of averages.
-
+Fio defaults to logging every individual I/O.  When IOPS are logged for individual
+I/Os the value entry will always be 1.  If windowed logging is enabled through
+:option:`log_avg_msec`, fio logs the average values over the specified period of time.
+If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
+maximum values in that window instead of averages.  Since 'data direction' and
+'offset' are per-I/O values, they aren't applicable if windowed logging is enabled.
 
 Client/server
 -------------
 
 Normally fio is invoked as a stand-alone application on the machine where the
-I/O workload should be generated. However, the frontend and backend of fio can
-be run separately. Ie the fio server can generate an I/O workload on the "Device
-Under Test" while being controlled from another machine.
+I/O workload should be generated. However, the backend and frontend of fio can
+be run separately, i.e., the fio server can generate an I/O workload on the "Device
+Under Test" while being controlled by a client on another machine.
 
 Start the server on the machine which has access to the storage DUT::
 
 	fio --server=args
 
-where args defines what fio listens to. The arguments are of the form
+where `args` defines what fio listens to. The arguments are of the form
 ``type,hostname`` or ``IP,port``. *type* is either ``ip`` (or ip4) for TCP/IP
 v4, ``ip6`` for TCP/IP v6, or ``sock`` for a local unix domain socket.
 *hostname* is either a hostname or IP address, and *port* is the port to listen
@@ -3495,7 +3572,7 @@ to (only valid for TCP/IP, not a local socket). Some examples:
 
 6) ``fio --server=sock:/tmp/fio.sock``
 
-   Start a fio server, listening on the local socket /tmp/fio.sock.
+   Start a fio server, listening on the local socket :file:`/tmp/fio.sock`.
 
 Once a server is running, a "client" can connect to the fio server with::
 
@@ -3535,7 +3612,7 @@ servers receive the same job file.
 
 In order to let ``fio --client`` runs use a shared filesystem from multiple
 hosts, ``fio --client`` now prepends the IP address of the server to the
-filename.  For example, if fio is using directory :file:`/mnt/nfs/fio` and is
+filename.  For example, if fio is using the directory :file:`/mnt/nfs/fio` and is
 writing filename :file:`fileio.tmp`, with a :option:`--client` `hostfile`
 containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and
 192.168.10.121, then fio will create two files::
diff --git a/README b/README
index 6bff82b..ec3e9c0 100644
--- a/README
+++ b/README
@@ -102,7 +102,7 @@ Ubuntu:
 Red Hat, Fedora, CentOS & Co:
 	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
 	packages are part of the Fedora/EPEL repositories.
-	https://admin.fedoraproject.org/pkgdb/package/rpms/fio/ .
+	https://apps.fedoraproject.org/packages/fio .
 
 Mandriva:
 	Mandriva has integrated fio into their package repository, so installing
diff --git a/examples/mtd.fio b/examples/mtd.fio
index ca09735..e5dcea4 100644
--- a/examples/mtd.fio
+++ b/examples/mtd.fio
@@ -17,5 +17,5 @@ rw=write
 [write]
 stonewall
 block_error_percentiles=1
-rw=writetrim
+rw=trimwrite
 loops=4
diff --git a/init.c b/init.c
index 934b9d7..a4b5adb 100644
--- a/init.c
+++ b/init.c
@@ -2022,7 +2022,7 @@ static void usage(const char *name)
 	printf("  --version\t\tPrint version info and exit\n");
 	printf("  --help\t\tPrint this page\n");
 	printf("  --cpuclock-test\tPerform test/validation of CPU clock\n");
-	printf("  --crctest=type\tTest speed of checksum functions\n");
+	printf("  --crctest=[type]\tTest speed of checksum functions\n");
 	printf("  --cmdhelp=cmd\t\tPrint command help, \"all\" for all of"
 		" them\n");
 	printf("  --enghelp=engine\tPrint ioengine help, or list"
diff --git a/stat.c b/stat.c
index b3b2cb3..beec574 100644
--- a/stat.c
+++ b/stat.c
@@ -483,7 +483,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		}
 
 		if (rs->agg[ddir]) {
-			p_of_agg = mean * 100 / (double) rs->agg[ddir];
+			p_of_agg = mean * 100 / (double) (rs->agg[ddir] / 1024);
 			if (p_of_agg > 100.0)
 				p_of_agg = 100.0;
 		}
@@ -497,13 +497,13 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		}
 
 		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, "
-			"avg=%5.02f, stdev=%5.02f, samples=%5lu\n",
+			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
 			bw_str, min, max, p_of_agg, mean, dev,
 			(&ts->bw_stat[ddir])->samples);
 	}
 	if (calc_lat(&ts->iops_stat[ddir], &min, &max, &mean, &dev)) {
-		log_buf(out, "   iops : min=%5llu, max=%5llu, avg=%5.02f, "
-			"stdev=%5.02f, samples=%5lu\n",
+		log_buf(out, "   iops        : min=%5llu, max=%5llu, "
+			"avg=%5.02f, stdev=%5.02f, samples=%" PRIu64 "\n",
 			min, max, mean, dev, (&ts->iops_stat[ddir])->samples);
 	}
 }
@@ -935,12 +935,12 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 
 	if (ver == 5) {
 		if (bw_stat)
-			log_buf(out, ";%lu", (&ts->bw_stat[ddir])->samples);
+			log_buf(out, ";%" PRIu64, (&ts->bw_stat[ddir])->samples);
 		else
 			log_buf(out, ";%lu", 0UL);
 
 		if (calc_lat(&ts->iops_stat[ddir], &min, &max, &mean, &dev))
-			log_buf(out, ";%llu;%llu;%f;%f;%lu", min, max,
+			log_buf(out, ";%llu;%llu;%f;%f;%" PRIu64, min, max,
 				mean, dev, (&ts->iops_stat[ddir])->samples);
 		else
 			log_buf(out, ";%llu;%llu;%f;%f;%lu", 0ULL, 0ULL, 0.0, 0.0, 0UL);


* Recent changes (master)
@ 2017-06-26 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-26 12:00 UTC
  To: fio

The following changes since commit a2c95580b468a1ddd72ecb5532aca7d94f6efa5b:

  stat: Add iops stat and sample number information to terse format (2017-06-23 16:31:02 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 036a159982656aaa98b0a0490defc36c6065aa93:

  Android: fix missing sysmacros.h include (2017-06-25 09:51:13 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Android: fix missing sysmacros.h include

 diskutil.c               | 1 +
 os/os-android.h          | 1 +
 oslib/linux-dev-lookup.c | 1 +
 3 files changed, 3 insertions(+)

---

Diff of recent changes:

diff --git a/diskutil.c b/diskutil.c
index 4fe554f..618cae8 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -3,6 +3,7 @@
 #include <sys/time.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/sysmacros.h>
 #include <dirent.h>
 #include <libgen.h>
 #include <math.h>
diff --git a/os/os-android.h b/os/os-android.h
index c56d682..b217daa 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -7,6 +7,7 @@
 #include <sys/mman.h>
 #include <sys/uio.h>
 #include <sys/syscall.h>
+#include <sys/sysmacros.h>
 #include <sys/vfs.h>
 #include <unistd.h>
 #include <fcntl.h>
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 54017ff..1dda93f 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -1,5 +1,6 @@
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/sysmacros.h>
 #include <dirent.h>
 #include <string.h>
 #include <stdio.h>


* Recent changes (master)
@ 2017-06-24 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-24 12:00 UTC
  To: fio

The following changes since commit cf6b7fb4f1883af9cbc443ed2536e7454ed51215:

  t/time-test: cleanups (2017-06-22 19:29:35 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a2c95580b468a1ddd72ecb5532aca7d94f6efa5b:

  stat: Add iops stat and sample number information to terse format (2017-06-23 16:31:02 -0600)

----------------------------------------------------------------
Andreas Herrmann (4):
      stat: Print one-line iops stat
      stat: Print number of samples in bw and iops stats
      stat: Merge show_thread_status_terse_* functions
      stat: Add iops stat and sample number information to terse format

Tomohiro Kusumi (7):
      Makefile: use fmt(1) rather than tr(1) on NetBSD/etc
      server: don't use void* for pointer arithmetic (gcc)
      io_u: don't use void* for pointer arithmetic (gcc)
      init: don't use void* for pointer arithmetic (gcc)
      client: don't use void* for pointer arithmetic (gcc)
      verify: don't use void* for pointer arithmetic (gcc)
      smalloc: don't use void* for pointer arithmetic (gcc)

 HOWTO            |  19 +++++---
 Makefile         |  10 ++++
 client.c         |  13 +++---
 fio.1            |  26 ++++++++---
 init.c           |   6 +--
 io_u.c           |   2 +-
 server.c         |   6 ++-
 server.h         |   2 +-
 smalloc.c        |   2 +-
 stat.c           | 139 +++++++++++++++++++++++--------------------------------
 t/verify-state.c |   3 +-
 verify-state.h   |   2 +-
 verify.c         |   7 +--
 13 files changed, 126 insertions(+), 111 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 22c5a5b..b2db69d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -127,7 +127,7 @@ Command line options
 
 .. option:: --terse-version=type
 
-	Set terse version output format (default 3, or 2 or 4).
+	Set terse version output format (default 3, or 2 or 4 or 5).
 
 .. option:: --version
 
@@ -3168,11 +3168,12 @@ first value is the version of the terse output format. If the output has to be
 changed for some reason, this number will be incremented by 1 to signify that
 change.
 
-Split up, the format is as follows:
+Split up, the format is as follows (comments in brackets denote when a
+field was introduced or whether it's specific to some terse version):
 
     ::
 
-        terse version, fio version, jobname, groupid, error
+	terse version, fio version [v3], jobname, groupid, error
 
     READ status::
 
@@ -3181,7 +3182,8 @@ Split up, the format is as follows:
         Completion latency: min, max, mean, stdev (usec)
         Completion latency percentiles: 20 fields (see below)
         Total latency: min, max, mean, stdev (usec)
-        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
+	Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+	IOPS [v5]: min, max, mean, stdev, number of samples
 
     WRITE status:
 
@@ -3192,7 +3194,12 @@ Split up, the format is as follows:
         Completion latency: min, max, mean, stdev (usec)
         Completion latency percentiles: 20 fields (see below)
         Total latency: min, max, mean, stdev (usec)
-        Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
+	Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
+	IOPS [v5]: min, max, mean, stdev, number of samples
+
+    TRIM status [all but version 3]:
+
+	Fields are similar to READ/WRITE status.
 
     CPU usage::
 
@@ -3210,7 +3217,7 @@ Split up, the format is as follows:
 
         <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
 
-    Disk utilization::
+    Disk utilization [v3]::
 
         Disk name, Read ios, write ios,
         Read merges, write merges,
diff --git a/Makefile b/Makefile
index 64fa97a..bef930f 100644
--- a/Makefile
+++ b/Makefile
@@ -327,8 +327,13 @@ override CFLAGS += -DFIO_VERSION='"$(FIO_VERSION)"'
 	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
 	@mv -f $*.d $*.d.tmp
 	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
+ifeq ($(CONFIG_TARGET_OS), NetBSD)
+	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | tr -cs "[:graph:]" "\n" | \
+		sed -e 's/^ *//' -e '/^$$/ d' -e 's/$$/:/' >> $*.d
+else
 	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
 		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
+endif
 	@rm -f $*.d.tmp
 
 ifdef CONFIG_ARITHMETIC
@@ -366,8 +371,13 @@ init.o: init.c FIO-VERSION-FILE
 	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
 	@mv -f $*.d $*.d.tmp
 	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
+ifeq ($(CONFIG_TARGET_OS), NetBSD)
+	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | tr -cs "[:graph:]" "\n" | \
+		sed -e 's/^ *//' -e '/^$$/ d' -e 's/$$/:/' >> $*.d
+else
 	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
 		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
+endif
 	@rm -f $*.d.tmp
 
 gcompat.o: gcompat.c gcompat.h
diff --git a/client.c b/client.c
index 7a986aa..281d853 100644
--- a/client.c
+++ b/client.c
@@ -885,6 +885,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 		convert_io_stat(&dst->slat_stat[i], &src->slat_stat[i]);
 		convert_io_stat(&dst->lat_stat[i], &src->lat_stat[i]);
 		convert_io_stat(&dst->bw_stat[i], &src->bw_stat[i]);
+		convert_io_stat(&dst->iops_stat[i], &src->iops_stat[i]);
 	}
 
 	dst->usr_time		= le64_to_cpu(src->usr_time);
@@ -1452,7 +1453,7 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 	z_stream stream;
 	uint32_t nr_samples;
 	size_t total;
-	void *p;
+	char *p;
 
 	stream.zalloc = Z_NULL;
 	stream.zfree = Z_NULL;
@@ -1478,10 +1479,10 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 
 	memcpy(ret, pdu, sizeof(*pdu));
 
-	p = (void *) ret + sizeof(*pdu);
+	p = (char *) ret + sizeof(*pdu);
 
 	stream.avail_in = cmd->pdu_len - sizeof(*pdu);
-	stream.next_in = (void *) pdu + sizeof(*pdu);
+	stream.next_in = (void *)((char *) pdu + sizeof(*pdu));
 	while (stream.avail_in) {
 		unsigned int this_chunk = 65536;
 		unsigned int this_len;
@@ -1491,7 +1492,7 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 			this_chunk = total;
 
 		stream.avail_out = this_chunk;
-		stream.next_out = p;
+		stream.next_out = (void *)p;
 		err = inflate(&stream, Z_NO_FLUSH);
 		/* may be Z_OK, or Z_STREAM_END */
 		if (err < 0) {
@@ -1566,7 +1567,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 
 		s = __get_sample(samples, ret->log_offset, i);
 		if (ret->log_type == IO_LOG_TYPE_HIST)
-			s = (struct io_sample *)((void *)s + sizeof(struct io_u_plat_entry) * i);
+			s = (struct io_sample *)((char *)s + sizeof(struct io_u_plat_entry) * i);
 
 		s->time		= le64_to_cpu(s->time);
 		s->data.val	= le64_to_cpu(s->data.val);
@@ -1580,7 +1581,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 		}
 
 		if (ret->log_type == IO_LOG_TYPE_HIST) {
-			s->data.plat_entry = (struct io_u_plat_entry *)(((void *)s) + sizeof(*s));
+			s->data.plat_entry = (struct io_u_plat_entry *)(((char *)s) + sizeof(*s));
 			s->data.plat_entry->list.next = NULL;
 			s->data.plat_entry->list.prev = NULL;
 		}
diff --git a/fio.1 b/fio.1
index 96eceaf..6a6ea1b 100644
--- a/fio.1
+++ b/fio.1
@@ -43,7 +43,7 @@ Deprecated, use \-\-output-format instead to select multiple formats.
 Display version information and exit.
 .TP
 .BI \-\-terse\-version \fR=\fPversion
-Set terse version output format (default 3, or 2 or 4)
+Set terse version output format (default 3, or 2, 4, 5)
 .TP
 .B \-\-help
 Display usage information and exit.
@@ -2161,10 +2161,11 @@ scripted use.
 A job description (if provided) follows on a new line.  Note that the first
 number in the line is the version number. If the output has to be changed
 for some reason, this number will be incremented by 1 to signify that
-change.  The fields are:
+change. Numbers in brackets (e.g. "[v3]") indicate which terse version
+introduced a field. The fields are:
 .P
 .RS
-.B terse version, fio version, jobname, groupid, error
+.B terse version, fio version [v3], jobname, groupid, error
 .P
 Read status:
 .RS
@@ -2188,7 +2189,11 @@ Total latency:
 .RE
 Bandwidth:
 .RS
-.B min, max, aggregate percentage of total, mean, standard deviation
+.B min, max, aggregate percentage of total, mean, standard deviation, number of samples [v5]
+.RE
+IOPS [v5]:
+.RS
+.B min, max, mean, standard deviation, number of samples
 .RE
 .RE
 .P
@@ -2214,10 +2219,19 @@ Total latency:
 .RE
 Bandwidth:
 .RS
-.B min, max, aggregate percentage of total, mean, standard deviation
+.B min, max, aggregate percentage of total, mean, standard deviation, number of samples [v5]
+.RE
+IOPS [v5]:
+.RS
+.B min, max, mean, standard deviation, number of samples
 .RE
 .RE
 .P
+Trim status [all but version 3]:
+.RS
+Similar to Read/Write status but for trims.
+.RE
+.P
 CPU usage:
 .RS
 .B user, system, context switches, major page faults, minor page faults
@@ -2240,7 +2254,7 @@ Milliseconds:
 .RE
 .RE
 .P
-Disk utilization (1 for each disk used):
+Disk utilization (1 for each disk used) [v3]:
 .RS
 .B name, read ios, write ios, read merges, write merges, read ticks, write ticks, read in-queue time, write in-queue time, disk utilization percentage
 .RE
diff --git a/init.c b/init.c
index 2b7768a..934b9d7 100644
--- a/init.c
+++ b/init.c
@@ -361,7 +361,7 @@ static int setup_thread_area(void)
 #endif
 
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
-	fio_debug_jobp = (void *) threads + max_jobs * sizeof(struct thread_data);
+	fio_debug_jobp = (unsigned int *)(threads + max_jobs);
 	*fio_debug_jobp = -1;
 
 	flow_init();
@@ -1364,6 +1364,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td->ts.slat_stat[i].min_val = ULONG_MAX;
 		td->ts.lat_stat[i].min_val = ULONG_MAX;
 		td->ts.bw_stat[i].min_val = ULONG_MAX;
+		td->ts.iops_stat[i].min_val = ULONG_MAX;
 	}
 	td->ddir_seq_nr = o->ddir_seq_nr;
 
@@ -2403,8 +2404,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			break;
 		case 'V':
 			terse_version = atoi(optarg);
-			if (!(terse_version == 2 || terse_version == 3 ||
-			     terse_version == 4)) {
+			if (!(terse_version >= 2 && terse_version <= 5)) {
 				log_err("fio: bad terse version format\n");
 				exit_val = 1;
 				do_exit++;
diff --git a/io_u.c b/io_u.c
index 375413f..8d42d65 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1602,7 +1602,7 @@ static void small_content_scramble(struct io_u *io_u)
 	unsigned int i, nr_blocks = io_u->buflen / 512;
 	uint64_t boffset;
 	unsigned int offset;
-	void *p, *end;
+	char *p, *end;
 
 	if (!nr_blocks)
 		return;
diff --git a/server.c b/server.c
index 8b36e38..a640fe3 100644
--- a/server.c
+++ b/server.c
@@ -252,9 +252,10 @@ static int fio_send_data(int sk, const void *p, unsigned int len)
 	return fio_sendv_data(sk, &iov, 1);
 }
 
-static int fio_recv_data(int sk, void *p, unsigned int len, bool wait)
+static int fio_recv_data(int sk, void *buf, unsigned int len, bool wait)
 {
 	int flags;
+	char *p = buf;
 
 	if (wait)
 		flags = MSG_WAITALL;
@@ -377,7 +378,7 @@ struct fio_net_cmd *fio_net_recv_cmd(int sk, bool wait)
 			break;
 
 		/* There's payload, get it */
-		pdu = (void *) cmdret->payload + pdu_offset;
+		pdu = (char *) cmdret->payload + pdu_offset;
 		ret = fio_recv_data(sk, pdu, cmd.pdu_len, wait);
 		if (ret)
 			break;
@@ -1474,6 +1475,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 		convert_io_stat(&p.ts.slat_stat[i], &ts->slat_stat[i]);
 		convert_io_stat(&p.ts.lat_stat[i], &ts->lat_stat[i]);
 		convert_io_stat(&p.ts.bw_stat[i], &ts->bw_stat[i]);
+		convert_io_stat(&p.ts.iops_stat[i], &ts->iops_stat[i]);
 	}
 
 	p.ts.usr_time		= cpu_to_le64(ts->usr_time);
diff --git a/server.h b/server.h
index 7f235f3..f63a518 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 64,
+	FIO_SERVER_VER			= 65,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/smalloc.c b/smalloc.c
index e48cfe8..cab7132 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -189,7 +189,7 @@ static bool add_pool(struct pool *pool, unsigned int alloc_size)
 		goto out_fail;
 
 	pool->map = ptr;
-	pool->bitmap = (void *) ptr + (pool->nr_blocks * SMALLOC_BPL);
+	pool->bitmap = (unsigned int *)((char *) ptr + (pool->nr_blocks * SMALLOC_BPL));
 	memset(pool->bitmap, 0, bitmap_blocks * sizeof(unsigned int));
 
 	pool->lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
diff --git a/stat.c b/stat.c
index 5042650..b3b2cb3 100644
--- a/stat.c
+++ b/stat.c
@@ -496,8 +496,15 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			bw_str = (rs->unit_base == 1 ? "Mibit" : "MiB");
 		}
 
-		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, avg=%5.02f, stdev=%5.02f\n",
-			bw_str, min, max, p_of_agg, mean, dev);
+		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, "
+			"avg=%5.02f, stdev=%5.02f, samples=%5lu\n",
+			bw_str, min, max, p_of_agg, mean, dev,
+			(&ts->bw_stat[ddir])->samples);
+	}
+	if (calc_lat(&ts->iops_stat[ddir], &min, &max, &mean, &dev)) {
+		log_buf(out, "   iops : min=%5llu, max=%5llu, avg=%5.02f, "
+			"stdev=%5.02f, samples=%5lu\n",
+			min, max, mean, dev, (&ts->iops_stat[ddir])->samples);
 	}
 }
 
@@ -856,13 +863,13 @@ static void show_thread_status_normal(struct thread_stat *ts,
 
 static void show_ddir_status_terse(struct thread_stat *ts,
 				   struct group_run_stats *rs, int ddir,
-				   struct buf_output *out)
+				   int ver, struct buf_output *out)
 {
 	unsigned long long min, max, minv, maxv, bw, iops;
 	unsigned long long *ovals = NULL;
 	double mean, dev;
 	unsigned int len;
-	int i;
+	int i, bw_stat;
 
 	assert(ddir_rw(ddir));
 
@@ -912,7 +919,8 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	if (ovals)
 		free(ovals);
 
-	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
+	bw_stat = calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev);
+	if (bw_stat) {
 		double p_of_agg = 100.0;
 
 		if (rs->agg[ddir]) {
@@ -924,6 +932,19 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 		log_buf(out, ";%llu;%llu;%f%%;%f;%f", min, max, p_of_agg, mean, dev);
 	} else
 		log_buf(out, ";%llu;%llu;%f%%;%f;%f", 0ULL, 0ULL, 0.0, 0.0, 0.0);
+
+	if (ver == 5) {
+		if (bw_stat)
+			log_buf(out, ";%lu", (&ts->bw_stat[ddir])->samples);
+		else
+			log_buf(out, ";%lu", 0UL);
+
+		if (calc_lat(&ts->iops_stat[ddir], &min, &max, &mean, &dev))
+			log_buf(out, ";%llu;%llu;%f;%f;%lu", min, max,
+				mean, dev, (&ts->iops_stat[ddir])->samples);
+		else
+			log_buf(out, ";%llu;%llu;%f;%f;%lu", 0ULL, 0ULL, 0.0, 0.0, 0UL);
+	}
 }
 
 static void add_ddir_status_json(struct thread_stat *ts,
@@ -1047,74 +1068,24 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	json_object_add_value_float(dir_object, "bw_agg", p_of_agg);
 	json_object_add_value_float(dir_object, "bw_mean", mean);
 	json_object_add_value_float(dir_object, "bw_dev", dev);
-}
-
-static void show_thread_status_terse_v2(struct thread_stat *ts,
-					struct group_run_stats *rs,
-					struct buf_output *out)
-{
-	double io_u_dist[FIO_IO_U_MAP_NR];
-	double io_u_lat_u[FIO_IO_U_LAT_U_NR];
-	double io_u_lat_m[FIO_IO_U_LAT_M_NR];
-	double usr_cpu, sys_cpu;
-	int i;
-
-	/* General Info */
-	log_buf(out, "2;%s;%d;%d", ts->name, ts->groupid, ts->error);
-	/* Log Read Status */
-	show_ddir_status_terse(ts, rs, DDIR_READ, out);
-	/* Log Write Status */
-	show_ddir_status_terse(ts, rs, DDIR_WRITE, out);
-	/* Log Trim Status */
-	show_ddir_status_terse(ts, rs, DDIR_TRIM, out);
+	json_object_add_value_int(dir_object, "bw_samples",
+				(&ts->bw_stat[ddir])->samples);
 
-	/* CPU Usage */
-	if (ts->total_run_time) {
-		double runt = (double) ts->total_run_time;
-
-		usr_cpu = (double) ts->usr_time * 100 / runt;
-		sys_cpu = (double) ts->sys_time * 100 / runt;
-	} else {
-		usr_cpu = 0;
-		sys_cpu = 0;
+	if (!calc_lat(&ts->iops_stat[ddir], &min, &max, &mean, &dev)) {
+		min = max = 0;
+		mean = dev = 0.0;
 	}
-
-	log_buf(out, ";%f%%;%f%%;%llu;%llu;%llu", usr_cpu, sys_cpu,
-						(unsigned long long) ts->ctx,
-						(unsigned long long) ts->majf,
-						(unsigned long long) ts->minf);
-
-	/* Calc % distribution of IO depths, usecond, msecond latency */
-	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
-	stat_calc_lat_nu(ts, io_u_lat_u);
-	stat_calc_lat_m(ts, io_u_lat_m);
-
-	/* Only show fixed 7 I/O depth levels*/
-	log_buf(out, ";%3.1f%%;%3.1f%%;%3.1f%%;%3.1f%%;%3.1f%%;%3.1f%%;%3.1f%%",
-			io_u_dist[0], io_u_dist[1], io_u_dist[2], io_u_dist[3],
-			io_u_dist[4], io_u_dist[5], io_u_dist[6]);
-
-	/* Microsecond latency */
-	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
-		log_buf(out, ";%3.2f%%", io_u_lat_u[i]);
-	/* Millisecond latency */
-	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
-		log_buf(out, ";%3.2f%%", io_u_lat_m[i]);
-	/* Additional output if continue_on_error set - default off*/
-	if (ts->continue_on_error)
-		log_buf(out, ";%llu;%d", (unsigned long long) ts->total_err_count, ts->first_error);
-	log_buf(out, "\n");
-
-	/* Additional output if description is set */
-	if (strlen(ts->description))
-		log_buf(out, ";%s", ts->description);
-
-	log_buf(out, "\n");
+	json_object_add_value_int(dir_object, "iops_min", min);
+	json_object_add_value_int(dir_object, "iops_max", max);
+	json_object_add_value_float(dir_object, "iops_mean", mean);
+	json_object_add_value_float(dir_object, "iops_stddev", dev);
+	json_object_add_value_int(dir_object, "iops_samples",
+				(&ts->iops_stat[ddir])->samples);
 }
 
-static void show_thread_status_terse_v3_v4(struct thread_stat *ts,
-					   struct group_run_stats *rs, int ver,
-					   struct buf_output *out)
+static void show_thread_status_terse_all(struct thread_stat *ts,
+					 struct group_run_stats *rs, int ver,
+					 struct buf_output *out)
 {
 	double io_u_dist[FIO_IO_U_MAP_NR];
 	double io_u_lat_u[FIO_IO_U_LAT_U_NR];
@@ -1123,15 +1094,19 @@ static void show_thread_status_terse_v3_v4(struct thread_stat *ts,
 	int i;
 
 	/* General Info */
-	log_buf(out, "%d;%s;%s;%d;%d", ver, fio_version_string,
-					ts->name, ts->groupid, ts->error);
+	if (ver == 2)
+		log_buf(out, "2;%s;%d;%d", ts->name, ts->groupid, ts->error);
+	else
+		log_buf(out, "%d;%s;%s;%d;%d", ver, fio_version_string,
+			ts->name, ts->groupid, ts->error);
+
 	/* Log Read Status */
-	show_ddir_status_terse(ts, rs, DDIR_READ, out);
+	show_ddir_status_terse(ts, rs, DDIR_READ, ver, out);
 	/* Log Write Status */
-	show_ddir_status_terse(ts, rs, DDIR_WRITE, out);
+	show_ddir_status_terse(ts, rs, DDIR_WRITE, ver, out);
 	/* Log Trim Status */
-	if (ver == 4)
-		show_ddir_status_terse(ts, rs, DDIR_TRIM, out);
+	if (ver == 2 || ver == 4 || ver == 5)
+		show_ddir_status_terse(ts, rs, DDIR_TRIM, ver, out);
 
 	/* CPU Usage */
 	if (ts->total_run_time) {
@@ -1167,11 +1142,14 @@ static void show_thread_status_terse_v3_v4(struct thread_stat *ts,
 		log_buf(out, ";%3.2f%%", io_u_lat_m[i]);
 
 	/* disk util stats, if any */
-	show_disk_util(1, NULL, out);
+	if (ver >= 3)
+		show_disk_util(1, NULL, out);
 
 	/* Additional output if continue_on_error set - default off*/
 	if (ts->continue_on_error)
 		log_buf(out, ";%llu;%d", (unsigned long long) ts->total_err_count, ts->first_error);
+	if (ver == 2)
+		log_buf(out, "\n");
 
 	/* Additional output if description is set */
 	if (strlen(ts->description))
@@ -1412,10 +1390,8 @@ static void show_thread_status_terse(struct thread_stat *ts,
 				     struct group_run_stats *rs,
 				     struct buf_output *out)
 {
-	if (terse_version == 2)
-		show_thread_status_terse_v2(ts, rs, out);
-	else if (terse_version == 3 || terse_version == 4)
-		show_thread_status_terse_v3_v4(ts, rs, terse_version, out);
+	if (terse_version >= 2 && terse_version <= 5)
+		show_thread_status_terse_all(ts, rs, terse_version, out);
 	else
 		log_err("fio: bad terse version!? %d\n", terse_version);
 }
@@ -1507,6 +1483,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], first);
 			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], first);
 			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], first);
+			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], first);
 
 			dst->io_bytes[l] += src->io_bytes[l];
 
@@ -1517,6 +1494,7 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], first);
 			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], first);
 			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], first);
+			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], first);
 
 			dst->io_bytes[0] += src->io_bytes[l];
 
@@ -1598,6 +1576,7 @@ void init_thread_stat(struct thread_stat *ts)
 		ts->clat_stat[j].min_val = -1UL;
 		ts->slat_stat[j].min_val = -1UL;
 		ts->bw_stat[j].min_val = -1UL;
+		ts->iops_stat[j].min_val = -1UL;
 	}
 	ts->groupid = -1;
 }
diff --git a/t/verify-state.c b/t/verify-state.c
index 9a2c3df..78a56da 100644
--- a/t/verify-state.c
+++ b/t/verify-state.c
@@ -58,7 +58,8 @@ static void show(struct thread_io_list *s, size_t size)
 		show_s(s, no_s);
 		no_s++;
 		size -= __thread_io_list_sz(s->depth, s->nofiles);
-		s = (void *) s + __thread_io_list_sz(s->depth, s->nofiles);
+		s = (struct thread_io_list *)((char *) s +
+			__thread_io_list_sz(s->depth, s->nofiles));
 	} while (size != 0);
 }
 
diff --git a/verify-state.h b/verify-state.h
index e46265e..1586f63 100644
--- a/verify-state.h
+++ b/verify-state.h
@@ -77,7 +77,7 @@ static inline size_t thread_io_list_sz(struct thread_io_list *s)
 
 static inline struct thread_io_list *io_list_next(struct thread_io_list *s)
 {
-	return (void *) s + thread_io_list_sz(s);
+	return (struct thread_io_list *)((char *) s + thread_io_list_sz(s));
 }
 
 static inline void verify_state_gen_name(char *out, size_t size,
diff --git a/verify.c b/verify.c
index ffd8707..1f177d7 100644
--- a/verify.c
+++ b/verify.c
@@ -388,7 +388,7 @@ static int verify_io_u_pattern(struct verify_header *hdr, struct vcont *vc)
 	(void)paste_format_inplace(pattern, pattern_size,
 				   td->o.verify_fmt, td->o.verify_fmt_sz, io_u);
 
-	buf = (void *) hdr + header_size;
+	buf = (char *) hdr + header_size;
 	len = get_hdr_inc(td, io_u) - header_size;
 	mod = (get_hdr_inc(td, io_u) * vc->hdr_num + header_size) % pattern_size;
 
@@ -1188,9 +1188,10 @@ static void populate_hdr(struct thread_data *td, struct io_u *io_u,
 			 unsigned int header_len)
 {
 	unsigned int data_len;
-	void *data, *p;
+	void *data;
+	char *p;
 
-	p = (void *) hdr;
+	p = (char *) hdr;
 
 	fill_hdr(td, io_u, hdr, header_num, header_len, io_u->rand_seed);
 


* Recent changes (master)
@ 2017-06-23 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-23 12:00 UTC
  To: fio

The following changes since commit b2fcbe01bdac01bc5d7f8ddea94f264b9f8c2003:

  Ensure that thread_stat alignment is correct (2017-06-19 16:41:51 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cf6b7fb4f1883af9cbc443ed2536e7454ed51215:

  t/time-test: cleanups (2017-06-22 19:29:35 -0600)

----------------------------------------------------------------
Jens Axboe (10):
      Merge branch 'nanosecond-2stage' of https://github.com/vincentkfu/fio into nsec
      Fixup some style issues
      Merge branch 'nsec'
      crc32c: use bool
      arch: tsc_reliable can be a bool
      client/server: bool conversion
      iolog: get work items out of shared memory pool
      iolog: punt freeing of data back to original thread
      iolog: ensure proper flushing of compressed logs
      t/time-test: cleanups

Vincent Fu (12):
      nanosecond: initial commit changing timeval to timespec
      nanosecond: update completion latency recording and normal, json output to use nanoseconds
      nanosecond: reconcile terse output with nanosecond timing for latencies
      nanosecond: alter gfio to accommodate nanosecond timing
      nanosecond: fiologparser_hist set default --group_nr to 29 to match stat.h FIO_IO_U_PLAT_GROUP_NR
      nanosecond: fix up conversion of ticks to nsec by doing the conversion in 2 stages
      nanosecond: add test program t/time-test for experimenting with cpu clock ticks to nsec conversion
      lib/seqlock: #include "types.h" for bool type
      nanosecond: update t/time-test.c to include experiments using seqlock for conversion
      gettime: for better accuracy calculate cycles_per_msec instead of cycles_per_usec
      gettime: drop tv_valid->last_cycles and tv_valid->last_tv_valid
      server: bump server version for the change to FIO_IO_U_PLAT_GROUP_NR

 HOWTO                           |   2 +-
 Makefile                        |   7 +
 arch/arch-ia64.h                |   4 +-
 arch/arch-ppc.h                 |   4 +-
 arch/arch-s390.h                |   4 +-
 arch/arch-x86-common.h          |   2 +-
 arch/arch.h                     |   2 +
 backend.c                       |  42 ++--
 client.c                        |  36 +--
 client.h                        |  11 +-
 crc/crc32c-arm64.c              |   9 +-
 crc/crc32c-intel.c              |   6 +-
 crc/crc32c.h                    |   5 +-
 crc/test.c                      |   6 +-
 diskutil.c                      |   2 +-
 diskutil.h                      |   2 +-
 engines/guasi.c                 |   2 +-
 engines/libaio.c                |   8 +-
 engines/rdma.c                  |   2 +-
 eta.c                           |   6 +-
 fio.h                           |  24 +-
 fio_time.h                      |  16 +-
 gclient.c                       |  56 +++--
 gettime-thread.c                |  14 +-
 gettime.c                       | 193 ++++++++++----
 gettime.h                       |  14 +-
 helper_thread.c                 |  32 +--
 idletime.c                      |  14 +-
 idletime.h                      |   4 +-
 io_u.c                          |  91 +++++--
 io_u.h                          |   4 +-
 ioengines.c                     |   4 +-
 iolog.c                         |  52 +++-
 iolog.h                         |   7 +-
 lib/seqlock.h                   |   1 +
 libfio.c                        |   8 +-
 mutex.c                         |  12 +-
 options.c                       |   2 +-
 os/windows/posix.c              |   6 +-
 profiles/act.c                  |   2 +-
 server.c                        |  13 +-
 server.h                        |   6 +-
 stat.c                          | 217 ++++++++++------
 stat.h                          |  38 ++-
 steadystate.c                   |   2 +-
 steadystate.h                   |   2 +-
 t/arch.c                        |   2 +-
 t/debug.c                       |   2 +-
 t/dedupe.c                      |   2 +-
 t/lfsr-test.c                   |   2 +-
 t/time-test.c                   | 544 ++++++++++++++++++++++++++++++++++++++++
 time.c                          |  36 ++-
 tools/hist/fiologparser_hist.py |   2 +-
 verify.c                        |   2 +-
 54 files changed, 1203 insertions(+), 385 deletions(-)
 create mode 100644 t/time-test.c

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d3a5783..22c5a5b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -3189,7 +3189,7 @@ Split up, the format is as follows:
 
         Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
         Submission latency: min, max, mean, stdev (usec)
-        Completion latency: min, max, mean, stdev(usec)
+        Completion latency: min, max, mean, stdev (usec)
         Completion latency percentiles: 20 fields (see below)
         Total latency: min, max, mean, stdev (usec)
         Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
diff --git a/Makefile b/Makefile
index d7786d2..64fa97a 100644
--- a/Makefile
+++ b/Makefile
@@ -250,6 +250,9 @@ T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
 T_MEMLOCK_OBJS = t/memlock.o
 T_MEMLOCK_PROGS = t/memlock
 
+T_TT_OBJS = t/time-test.o
+T_TT_PROGS = t/time-test
+
 T_OBJS = $(T_SMALLOC_OBJS)
 T_OBJS += $(T_IEEE_OBJS)
 T_OBJS += $(T_ZIPF_OBJS)
@@ -261,6 +264,7 @@ T_OBJS += $(T_DEDUPE_OBJS)
 T_OBJS += $(T_VS_OBJS)
 T_OBJS += $(T_PIPE_ASYNC_OBJS)
 T_OBJS += $(T_MEMLOCK_OBJS)
+T_OBJS += $(T_TT_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
     T_DEDUPE_OBJS += os/windows/posix.o lib/hweight.o
@@ -434,6 +438,9 @@ t/fio-dedupe: $(T_DEDUPE_OBJS)
 t/fio-verify-state: $(T_VS_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_VS_OBJS) $(LIBS)
 
+t/time-test: $(T_TT_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_TT_OBJS) $(LIBS)
+
 clean: FORCE
 	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio FIO-VERSION-FILE *.d lib/*.d oslib/*.d crc/*.d engines/*.d profiles/*.d t/*.d config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 	@rm -rf  doc/output
diff --git a/arch/arch-ia64.h b/arch/arch-ia64.h
index 53c049f..ece3f7e 100644
--- a/arch/arch-ia64.h
+++ b/arch/arch-ia64.h
@@ -28,10 +28,10 @@ static inline unsigned long long get_cpu_clock(void)
 }
 
 #define ARCH_HAVE_INIT
-extern int tsc_reliable;
+extern bool tsc_reliable;
 static inline int arch_init(char *envp[])
 {
-	tsc_reliable = 1;
+	tsc_reliable = true;
 	return 0;
 }
 
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 4a8aa97..ba452b1 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -117,12 +117,12 @@ static void atb_clocktest(void)
 #endif
 
 #define ARCH_HAVE_INIT
-extern int tsc_reliable;
+extern bool tsc_reliable;
 
 static inline int arch_init(char *envp[])
 {
 #if 0
-	tsc_reliable = 1;
+	tsc_reliable = true;
 	atb_clocktest();
 #endif
 	return 0;
diff --git a/arch/arch-s390.h b/arch/arch-s390.h
index 2e84bf8..6bf033b 100644
--- a/arch/arch-s390.h
+++ b/arch/arch-s390.h
@@ -28,10 +28,10 @@ static inline unsigned long long get_cpu_clock(void)
 #undef ARCH_CPU_CLOCK_WRAPS
 
 #define ARCH_HAVE_INIT
-extern int tsc_reliable;
+extern bool tsc_reliable;
 static inline int arch_init(char *envp[])
 {
-	tsc_reliable = 1;
+	tsc_reliable = true;
 	return 0;
 }
 
diff --git a/arch/arch-x86-common.h b/arch/arch-x86-common.h
index cbf66b8..c51c04c 100644
--- a/arch/arch-x86-common.h
+++ b/arch/arch-x86-common.h
@@ -14,7 +14,7 @@ static inline void cpuid(unsigned int op,
 
 #define ARCH_HAVE_INIT
 
-extern int tsc_reliable;
+extern bool tsc_reliable;
 extern int arch_random;
 
 static inline void arch_init_intel(unsigned int level)
diff --git a/arch/arch.h b/arch/arch.h
index 00d247c..4fb9b51 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -1,6 +1,8 @@
 #ifndef ARCH_H
 #define ARCH_H
 
+#include "../lib/types.h"
+
 enum {
 	arch_x86_64 = 1,
 	arch_x86,
diff --git a/backend.c b/backend.c
index 9a684ed..fe15997 100644
--- a/backend.c
+++ b/backend.c
@@ -136,7 +136,7 @@ static void set_sig_handlers(void)
 /*
  * Check if we are above the minimum rate given.
  */
-static bool __check_min_rate(struct thread_data *td, struct timeval *now,
+static bool __check_min_rate(struct thread_data *td, struct timespec *now,
 			     enum fio_ddir ddir)
 {
 	unsigned long long bytes = 0;
@@ -223,7 +223,7 @@ static bool __check_min_rate(struct thread_data *td, struct timeval *now,
 	return false;
 }
 
-static bool check_min_rate(struct thread_data *td, struct timeval *now)
+static bool check_min_rate(struct thread_data *td, struct timespec *now)
 {
 	bool ret = false;
 
@@ -335,18 +335,18 @@ static int fio_file_fsync(struct thread_data *td, struct fio_file *f)
 	return ret;
 }
 
-static inline void __update_tv_cache(struct thread_data *td)
+static inline void __update_ts_cache(struct thread_data *td)
 {
-	fio_gettime(&td->tv_cache, NULL);
+	fio_gettime(&td->ts_cache, NULL);
 }
 
-static inline void update_tv_cache(struct thread_data *td)
+static inline void update_ts_cache(struct thread_data *td)
 {
-	if ((++td->tv_cache_nr & td->tv_cache_mask) == td->tv_cache_mask)
-		__update_tv_cache(td);
+	if ((++td->ts_cache_nr & td->ts_cache_mask) == td->ts_cache_mask)
+		__update_ts_cache(td);
 }
 
-static inline bool runtime_exceeded(struct thread_data *td, struct timeval *t)
+static inline bool runtime_exceeded(struct thread_data *td, struct timespec *t)
 {
 	if (in_ramp_time(td))
 		return false;
@@ -430,7 +430,7 @@ static void check_update_rusage(struct thread_data *td)
 	}
 }
 
-static int wait_for_completions(struct thread_data *td, struct timeval *time)
+static int wait_for_completions(struct thread_data *td, struct timespec *time)
 {
 	const int full = queue_full(td);
 	int min_evts = 0;
@@ -462,7 +462,7 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time)
 
 int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 		   enum fio_ddir ddir, uint64_t *bytes_issued, int from_verify,
-		   struct timeval *comp_time)
+		   struct timespec *comp_time)
 {
 	int ret2;
 
@@ -633,12 +633,12 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 		enum fio_ddir ddir;
 		int full;
 
-		update_tv_cache(td);
+		update_ts_cache(td);
 		check_update_rusage(td);
 
-		if (runtime_exceeded(td, &td->tv_cache)) {
-			__update_tv_cache(td);
-			if (runtime_exceeded(td, &td->tv_cache)) {
+		if (runtime_exceeded(td, &td->ts_cache)) {
+			__update_ts_cache(td);
+			if (runtime_exceeded(td, &td->ts_cache)) {
 				fio_mark_td_terminate(td);
 				break;
 			}
@@ -874,7 +874,7 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 	while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
 		(!flist_empty(&td->trim_list)) || !io_issue_bytes_exceeded(td) ||
 		td->o.time_based) {
-		struct timeval comp_time;
+		struct timespec comp_time;
 		struct io_u *io_u;
 		int full;
 		enum fio_ddir ddir;
@@ -884,11 +884,11 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		if (td->terminate || td->done)
 			break;
 
-		update_tv_cache(td);
+		update_ts_cache(td);
 
-		if (runtime_exceeded(td, &td->tv_cache)) {
-			__update_tv_cache(td);
-			if (runtime_exceeded(td, &td->tv_cache)) {
+		if (runtime_exceeded(td, &td->ts_cache)) {
+			__update_ts_cache(td);
+			if (runtime_exceeded(td, &td->ts_cache)) {
 				fio_mark_td_terminate(td);
 				break;
 			}
@@ -1686,7 +1686,7 @@ static void *thread_main(void *data)
 		uint64_t verify_bytes;
 
 		fio_gettime(&td->start, NULL);
-		memcpy(&td->tv_cache, &td->start, sizeof(td->start));
+		memcpy(&td->ts_cache, &td->start, sizeof(td->start));
 
 		if (clear_state) {
 			clear_io_state(td, 0);
@@ -2202,7 +2202,7 @@ reap:
 
 	while (todo) {
 		struct thread_data *map[REAL_MAX_JOBS];
-		struct timeval this_start;
+		struct timespec this_start;
 		int this_jobs = 0, left;
 		struct fork_data *fd;
 
diff --git a/client.c b/client.c
index 80096bf..7a986aa 100644
--- a/client.c
+++ b/client.c
@@ -48,7 +48,7 @@ struct client_ops fio_client_ops = {
 	.client_type	= FIO_CLIENT_TYPE_CLI,
 };
 
-static struct timeval eta_tv;
+static struct timespec eta_ts;
 
 static FLIST_HEAD(client_list);
 static FLIST_HEAD(eta_list);
@@ -318,7 +318,7 @@ struct fio_client *fio_client_add_explicit(struct client_ops *ops,
 	client->hostname = strdup(hostname);
 
 	if (type == Fio_client_socket)
-		client->is_sock = 1;
+		client->is_sock = true;
 	else {
 		int ipv6;
 
@@ -728,7 +728,7 @@ static int __fio_client_send_remote_ini(struct fio_client *client,
 	strcpy((char *) pdu->file, filename);
 	pdu->client_type = cpu_to_le16((uint16_t) client->type);
 
-	client->sent_job = 1;
+	client->sent_job = true;
 	ret = fio_net_send_cmd(client->fd, FIO_NET_CMD_LOAD_FILE, pdu, p_size,NULL, NULL);
 	free(pdu);
 	return ret;
@@ -781,7 +781,7 @@ static int __fio_client_send_local_ini(struct fio_client *client,
 	pdu->buf_len = __cpu_to_le32(sb.st_size);
 	pdu->client_type = cpu_to_le32(client->type);
 
-	client->sent_job = 1;
+	client->sent_job = true;
 	ret = fio_net_send_cmd(client->fd, FIO_NET_CMD_JOB, pdu, p_size, NULL, NULL);
 	free(pdu);
 	close(fd);
@@ -799,7 +799,7 @@ int fio_client_send_ini(struct fio_client *client, const char *filename,
 		ret = __fio_client_send_remote_ini(client, filename);
 
 	if (!ret)
-		client->sent_job = 1;
+		client->sent_job = true;
 
 	return ret;
 }
@@ -908,6 +908,8 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 		dst->io_u_complete[i]	= le32_to_cpu(src->io_u_complete[i]);
 	}
 
+	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
+		dst->io_u_lat_n[i]	= le32_to_cpu(src->io_u_lat_n[i]);
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		dst->io_u_lat_u[i]	= le32_to_cpu(src->io_u_lat_u[i]);
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
@@ -1001,7 +1003,7 @@ static void handle_ts(struct fio_client *client, struct fio_net_cmd *cmd)
 		opt_list = &client->opt_lists[p->ts.thread_number - 1];
 
 	tsobj = show_thread_status(&p->ts, &p->rs, opt_list, NULL);
-	client->did_stat = 1;
+	client->did_stat = true;
 	if (tsobj) {
 		json_object_add_client_info(tsobj, client);
 		json_array_add_value_object(clients_array, tsobj);
@@ -1123,7 +1125,7 @@ static void handle_du(struct fio_client *client, struct fio_net_cmd *cmd)
 	struct cmd_du_pdu *du = (struct cmd_du_pdu *) cmd->payload;
 
 	if (!client->disk_stats_shown) {
-		client->disk_stats_shown = 1;
+		client->disk_stats_shown = true;
 		log_info("\nDisk stats (read/write):\n");
 	}
 
@@ -1869,7 +1871,7 @@ static int handle_cmd_timeout(struct fio_client *client,
 }
 
 static int client_check_cmd_timeout(struct fio_client *client,
-				    struct timeval *now)
+				    struct timespec *now)
 {
 	struct fio_net_cmd_reply *reply;
 	struct flist_head *entry, *tmp;
@@ -1878,7 +1880,7 @@ static int client_check_cmd_timeout(struct fio_client *client,
 	flist_for_each_safe(entry, tmp, &client->cmd_list) {
 		reply = flist_entry(entry, struct fio_net_cmd_reply, list);
 
-		if (mtime_since(&reply->tv, now) < FIO_NET_CLIENT_TIMEOUT)
+		if (mtime_since(&reply->ts, now) < FIO_NET_CLIENT_TIMEOUT)
 			continue;
 
 		if (!handle_cmd_timeout(client, reply))
@@ -1896,10 +1898,10 @@ static int fio_check_clients_timed_out(void)
 {
 	struct fio_client *client;
 	struct flist_head *entry, *tmp;
-	struct timeval tv;
+	struct timespec ts;
 	int ret = 0;
 
-	fio_gettime(&tv, NULL);
+	fio_gettime(&ts, NULL);
 
 	flist_for_each_safe(entry, tmp, &client_list) {
 		client = flist_entry(entry, struct fio_client, list);
@@ -1907,7 +1909,7 @@ static int fio_check_clients_timed_out(void)
 		if (flist_empty(&client->cmd_list))
 			continue;
 
-		if (!client_check_cmd_timeout(client, &tv))
+		if (!client_check_cmd_timeout(client, &ts))
 			continue;
 
 		if (client->ops->timed_out)
@@ -1928,7 +1930,7 @@ int fio_handle_clients(struct client_ops *ops)
 	struct pollfd *pfds;
 	int i, ret = 0, retval = 0;
 
-	fio_gettime(&eta_tv, NULL);
+	fio_gettime(&eta_ts, NULL);
 
 	pfds = malloc(nr_clients * sizeof(struct pollfd));
 
@@ -1960,13 +1962,13 @@ int fio_handle_clients(struct client_ops *ops)
 		assert(i == nr_clients);
 
 		do {
-			struct timeval tv;
+			struct timespec ts;
 			int timeout;
 
-			fio_gettime(&tv, NULL);
-			if (mtime_since(&eta_tv, &tv) >= 900) {
+			fio_gettime(&ts, NULL);
+			if (mtime_since(&eta_ts, &ts) >= 900) {
 				request_client_etas(ops);
-				memcpy(&eta_tv, &tv, sizeof(tv));
+				memcpy(&eta_ts, &ts, sizeof(ts));
 
 				if (fio_check_clients_timed_out())
 					break;
diff --git a/client.h b/client.h
index fc9c196..394b685 100644
--- a/client.h
+++ b/client.h
@@ -6,6 +6,7 @@
 #include <netinet/in.h>
 #include <arpa/inet.h>
 
+#include "lib/types.h"
 #include "stat.h"
 
 struct fio_net_cmd;
@@ -45,16 +46,16 @@ struct fio_client {
 
 	int state;
 
-	int skip_newline;
-	int is_sock;
-	int disk_stats_shown;
+	bool skip_newline;
+	bool is_sock;
+	bool disk_stats_shown;
 	unsigned int jobs;
 	unsigned int nr_stat;
 	int error;
 	int signal;
 	int ipv6;
-	int sent_job;
-	int did_stat;
+	bool sent_job;
+	bool did_stat;
 	uint32_t type;
 
 	uint32_t thread_number;
diff --git a/crc/crc32c-arm64.c b/crc/crc32c-arm64.c
index c3f42c7..08177ba 100644
--- a/crc/crc32c-arm64.c
+++ b/crc/crc32c-arm64.c
@@ -19,7 +19,7 @@
 #define HWCAP_CRC32             (1 << 7)
 #endif /* HWCAP_CRC32 */
 
-int crc32c_arm64_available = 0;
+bool crc32c_arm64_available = false;
 
 #ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
 
@@ -27,7 +27,7 @@ int crc32c_arm64_available = 0;
 #include <arm_acle.h>
 #include <arm_neon.h>
 
-static int crc32c_probed;
+static bool crc32c_probed;
 
 /*
  * Function to calculate reflected crc with PMULL Instruction
@@ -106,9 +106,8 @@ void crc32c_arm64_probe(void)
 
 	if (!crc32c_probed) {
 		hwcap = getauxval(AT_HWCAP);
-		if (hwcap & HWCAP_CRC32)
-			crc32c_arm64_available = 1;
-		crc32c_probed = 1;
+		crc32c_arm64_available = (hwcap & HWCAP_CRC32) != 0;
+		crc32c_probed = true;
 	}
 }
 
diff --git a/crc/crc32c-intel.c b/crc/crc32c-intel.c
index 0b0f193..05a087d 100644
--- a/crc/crc32c-intel.c
+++ b/crc/crc32c-intel.c
@@ -18,7 +18,7 @@
  * Volume 2A: Instruction Set Reference, A-M
  */
 
-int crc32c_intel_available = 0;
+bool crc32c_intel_available = false;
 
 #ifdef ARCH_HAVE_SSE4_2
 
@@ -30,7 +30,7 @@ int crc32c_intel_available = 0;
 #define SCALE_F 4
 #endif
 
-static int crc32c_probed;
+static bool crc32c_probed;
 
 static uint32_t crc32c_intel_le_hw_byte(uint32_t crc, unsigned char const *data,
 					unsigned long length)
@@ -87,7 +87,7 @@ void crc32c_intel_probe(void)
 
 		do_cpuid(&eax, &ebx, &ecx, &edx);
 		crc32c_intel_available = (ecx & (1 << 20)) != 0;
-		crc32c_probed = 1;
+		crc32c_probed = true;
 	}
 }
 
diff --git a/crc/crc32c.h b/crc/crc32c.h
index 5d66407..d513f3a 100644
--- a/crc/crc32c.h
+++ b/crc/crc32c.h
@@ -19,10 +19,11 @@
 #define CRC32C_H
 
 #include "../arch/arch.h"
+#include "../lib/types.h"
 
 extern uint32_t crc32c_sw(unsigned char const *, unsigned long);
-extern int crc32c_arm64_available;
-extern int crc32c_intel_available;
+extern bool crc32c_arm64_available;
+extern bool crc32c_intel_available;
 
 #ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
 extern uint32_t crc32c_arm64(unsigned char const *, unsigned long);
diff --git a/crc/test.c b/crc/test.c
index 368229e..b119872 100644
--- a/crc/test.c
+++ b/crc/test.c
@@ -392,7 +392,7 @@ int fio_crctest(const char *type)
 	fill_random_buf(&state, buf, CHUNK);
 
 	for (i = 0; t[i].name; i++) {
-		struct timeval tv;
+		struct timespec ts;
 		double mb_sec;
 		uint64_t usec;
 		char pre[3];
@@ -409,9 +409,9 @@ int fio_crctest(const char *type)
 			t[i].fn(&t[i], buf, CHUNK);
 		}
 
-		fio_gettime(&tv, NULL);
+		fio_gettime(&ts, NULL);
 		t[i].fn(&t[i], buf, CHUNK);
-		usec = utime_since_now(&tv);
+		usec = utime_since_now(&ts);
 
 		if (usec) {
 			mb_sec = (double) mb / (double) usec;
diff --git a/diskutil.c b/diskutil.c
index 9767ea2..4fe554f 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -84,7 +84,7 @@ static int get_io_ticks(struct disk_util *du, struct disk_util_stat *dus)
 static void update_io_tick_disk(struct disk_util *du)
 {
 	struct disk_util_stat __dus, *dus, *ldus;
-	struct timeval t;
+	struct timespec t;
 
 	if (!du->users)
 		return;
diff --git a/diskutil.h b/diskutil.h
index f773066..91b4202 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -64,7 +64,7 @@ struct disk_util {
 	 */
 	struct flist_head slaves;
 
-	struct timeval time;
+	struct timespec time;
 
 	struct fio_mutex *lock;
 	unsigned long users;
diff --git a/engines/guasi.c b/engines/guasi.c
index eb12c89..9644ee5 100644
--- a/engines/guasi.c
+++ b/engines/guasi.c
@@ -132,7 +132,7 @@ static void fio_guasi_queued(struct thread_data *td, struct io_u **io_us, int nr
 {
 	int i;
 	struct io_u *io_u;
-	struct timeval now;
+	struct timespec now;
 
 	if (!fio_fill_issue_time(td))
 		return;
diff --git a/engines/libaio.c b/engines/libaio.c
index e15c519..e0d7cbb 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -220,7 +220,7 @@ static int fio_libaio_queue(struct thread_data *td, struct io_u *io_u)
 static void fio_libaio_queued(struct thread_data *td, struct io_u **io_us,
 			      unsigned int nr)
 {
-	struct timeval now;
+	struct timespec now;
 	unsigned int i;
 
 	if (!fio_fill_issue_time(td))
@@ -241,7 +241,7 @@ static int fio_libaio_commit(struct thread_data *td)
 	struct libaio_data *ld = td->io_ops_data;
 	struct iocb **iocbs;
 	struct io_u **io_us;
-	struct timeval tv;
+	struct timespec ts;
 	int ret, wait_start = 0;
 
 	if (!ld->queued)
@@ -282,9 +282,9 @@ static int fio_libaio_commit(struct thread_data *td)
 				break;
 			}
 			if (!wait_start) {
-				fio_gettime(&tv, NULL);
+				fio_gettime(&ts, NULL);
 				wait_start = 1;
-			} else if (mtime_since_now(&tv) > 30000) {
+			} else if (mtime_since_now(&ts) > 30000) {
 				log_err("fio: aio appears to be stalled, giving up\n");
 				break;
 			}
diff --git a/engines/rdma.c b/engines/rdma.c
index 10e60dc..8d31ff3 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -802,7 +802,7 @@ static void fio_rdmaio_queued(struct thread_data *td, struct io_u **io_us,
 			      unsigned int nr)
 {
 	struct rdmaio_data *rd = td->io_ops_data;
-	struct timeval now;
+	struct timespec now;
 	unsigned int i;
 
 	if (!fio_fill_issue_time(td))
diff --git a/eta.c b/eta.c
index adf7f94..baaa681 100644
--- a/eta.c
+++ b/eta.c
@@ -358,12 +358,12 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	uint64_t rate_time, disp_time, bw_avg_time, *eta_secs;
 	unsigned long long io_bytes[DDIR_RWDIR_CNT];
 	unsigned long long io_iops[DDIR_RWDIR_CNT];
-	struct timeval now;
+	struct timespec now;
 
 	static unsigned long long rate_io_bytes[DDIR_RWDIR_CNT];
 	static unsigned long long disp_io_bytes[DDIR_RWDIR_CNT];
 	static unsigned long long disp_io_iops[DDIR_RWDIR_CNT];
-	static struct timeval rate_prev_time, disp_prev_time;
+	static struct timespec rate_prev_time, disp_prev_time;
 
 	if (!force) {
 		if (!(output_format & FIO_OUTPUT_NORMAL) &&
@@ -511,7 +511,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 
 void display_thread_status(struct jobs_eta *je)
 {
-	static struct timeval disp_eta_new_line;
+	static struct timespec disp_eta_new_line;
 	static int eta_new_line_init, eta_new_line_pending;
 	static int linelen_last;
 	static int eta_good;
diff --git a/fio.h b/fio.h
index 6c06a0c..d5d6bfe 100644
--- a/fio.h
+++ b/fio.h
@@ -165,10 +165,10 @@ struct thread_data {
 	struct thread_data *parent;
 
 	uint64_t stat_io_bytes[DDIR_RWDIR_CNT];
-	struct timeval bw_sample_time;
+	struct timespec bw_sample_time;
 
 	uint64_t stat_io_blocks[DDIR_RWDIR_CNT];
-	struct timeval iops_sample_time;
+	struct timespec iops_sample_time;
 
 	volatile int update_rusage;
 	struct fio_mutex *rusage_sem;
@@ -287,7 +287,7 @@ struct thread_data {
 	unsigned long rate_bytes[DDIR_RWDIR_CNT];
 	unsigned long rate_blocks[DDIR_RWDIR_CNT];
 	unsigned long long rate_io_issue_bytes[DDIR_RWDIR_CNT];
-	struct timeval lastrate[DDIR_RWDIR_CNT];
+	struct timespec lastrate[DDIR_RWDIR_CNT];
 	int64_t last_usec[DDIR_RWDIR_CNT];
 	struct frand_state poisson_state[DDIR_RWDIR_CNT];
 
@@ -323,21 +323,21 @@ struct thread_data {
 	 */
 	struct frand_state random_state;
 
-	struct timeval start;	/* start of this loop */
-	struct timeval epoch;	/* time job was started */
+	struct timespec start;	/* start of this loop */
+	struct timespec epoch;	/* time job was started */
 	unsigned long long unix_epoch; /* Time job was started, unix epoch based. */
-	struct timeval last_issue;
+	struct timespec last_issue;
 	long time_offset;
-	struct timeval tv_cache;
-	struct timeval terminate_time;
-	unsigned int tv_cache_nr;
-	unsigned int tv_cache_mask;
+	struct timespec ts_cache;
+	struct timespec terminate_time;
+	unsigned int ts_cache_nr;
+	unsigned int ts_cache_mask;
 	unsigned int ramp_time_over;
 
 	/*
 	 * Time since last latency_window was started
 	 */
-	struct timeval latency_ts;
+	struct timespec latency_ts;
 	unsigned int latency_qd;
 	unsigned int latency_qd_high;
 	unsigned int latency_qd_low;
@@ -642,7 +642,7 @@ extern void reset_all_stats(struct thread_data *);
 
 extern int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 		   enum fio_ddir ddir, uint64_t *bytes_issued, int from_verify,
-		   struct timeval *comp_time);
+		   struct timespec *comp_time);
 
 /*
  * Latency target helpers
diff --git a/fio_time.h b/fio_time.h
index b49cc82..f4eac79 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -4,22 +4,24 @@
 #include "lib/types.h"
 
 struct thread_data;
-extern uint64_t utime_since(const struct timeval *,const  struct timeval *);
-extern uint64_t utime_since_now(const struct timeval *);
-extern uint64_t mtime_since(const struct timeval *, const struct timeval *);
-extern uint64_t mtime_since_now(const struct timeval *);
-extern uint64_t time_since_now(const struct timeval *);
+extern uint64_t ntime_since(const struct timespec *, const struct timespec *);
+extern uint64_t utime_since(const struct timespec *, const struct timespec *);
+extern uint64_t utime_since_now(const struct timespec *);
+extern uint64_t mtime_since(const struct timespec *, const struct timespec *);
+extern uint64_t mtime_since_now(const struct timespec *);
+extern uint64_t mtime_since_tv(const struct timeval *, const struct timeval *);
+extern uint64_t time_since_now(const struct timespec *);
 extern uint64_t time_since_genesis(void);
 extern uint64_t mtime_since_genesis(void);
 extern uint64_t utime_since_genesis(void);
 extern uint64_t usec_spin(unsigned int);
 extern uint64_t usec_sleep(struct thread_data *, unsigned long);
-extern void fill_start_time(struct timeval *);
+extern void fill_start_time(struct timespec *);
 extern void set_genesis_time(void);
 extern bool ramp_time_over(struct thread_data *);
 extern bool in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
-extern void timeval_add_msec(struct timeval *, unsigned int);
+extern void timespec_add_msec(struct timespec *, unsigned int);
 extern void set_epoch_time(struct thread_data *, int);
 
 #endif
diff --git a/gclient.c b/gclient.c
index 928a1b7..4eb99a0 100644
--- a/gclient.c
+++ b/gclient.c
@@ -930,8 +930,10 @@ static gint on_config_lat_drawing_area(GtkWidget *w, GdkEventConfigure *event,
 static void gfio_show_latency_buckets(struct gfio_client *gc, GtkWidget *vbox,
 				      struct thread_stat *ts)
 {
-	double io_u_lat[FIO_IO_U_LAT_U_NR + FIO_IO_U_LAT_M_NR];
-	const char *ranges[] = { "2us", "4us", "10us", "20us", "50us", "100us",
+	double io_u_lat[FIO_IO_U_LAT_N_NR + FIO_IO_U_LAT_U_NR + FIO_IO_U_LAT_M_NR];
+	const char *ranges[] = { "2ns", "4ns", "10ns", "20ns", "50ns", "100ns",
+				 "250ns", "500ns", "750ns", "1000ns", "2us",
+				 "4us", "10us", "20us", "50us", "100us",
 				 "250us", "500us", "750us", "1ms", "2ms",
 				 "4ms", "10ms", "20ms", "50ms", "100ms",
 				 "250ms", "500ms", "750ms", "1s", "2s", ">= 2s" };
@@ -940,8 +942,9 @@ static void gfio_show_latency_buckets(struct gfio_client *gc, GtkWidget *vbox,
 	GtkWidget *frame, *tree_view, *hbox, *completion_vbox, *drawing_area;
 	struct gui_entry *ge = gc->ge;
 
-	stat_calc_lat_u(ts, io_u_lat);
-	stat_calc_lat_m(ts, &io_u_lat[FIO_IO_U_LAT_U_NR]);
+	stat_calc_lat_n(ts, io_u_lat);
+	stat_calc_lat_u(ts, &io_u_lat[FIO_IO_U_LAT_N_NR]);
+	stat_calc_lat_m(ts, &io_u_lat[FIO_IO_U_LAT_N_NR + FIO_IO_U_LAT_U_NR]);
 
 	/*
 	 * Found out which first bucket has entries, and which last bucket
@@ -983,16 +986,18 @@ static void gfio_show_latency_buckets(struct gfio_client *gc, GtkWidget *vbox,
 	gtk_box_pack_start(GTK_BOX(hbox), tree_view, TRUE, TRUE, 3);
 }
 
-static void gfio_show_lat(GtkWidget *vbox, const char *name, unsigned long min,
-			  unsigned long max, double mean, double dev)
+static void gfio_show_lat(GtkWidget *vbox, const char *name, unsigned long long min,
+			  unsigned long long max, double mean, double dev)
 {
-	const char *base = "(usec)";
+	const char *base = "(nsec)";
 	GtkWidget *hbox, *label, *frame;
 	char *minp, *maxp;
 	char tmp[64];
 
-	if (usec_to_msec(&min, &max, &mean, &dev))
+	if (nsec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
+	else if (nsec_to_usec(&min, &max, &mean, &dev))
+		base = "(usec)";
 
 	minp = num2str(min, 6, 1, 0, N2S_NONE);
 	maxp = num2str(max, 6, 1, 0, N2S_NONE);
@@ -1019,7 +1024,7 @@ static void gfio_show_lat(GtkWidget *vbox, const char *name, unsigned long min,
 	free(maxp);
 }
 
-static GtkWidget *gfio_output_clat_percentiles(unsigned int *ovals,
+static GtkWidget *gfio_output_clat_percentiles(unsigned long long *ovals,
 					       fio_fp64_t *plist,
 					       unsigned int len,
 					       const char *base,
@@ -1030,10 +1035,10 @@ static GtkWidget *gfio_output_clat_percentiles(unsigned int *ovals,
 	GtkTreeSelection *selection;
 	GtkListStore *model;
 	GtkTreeIter iter;
-	int i;
+	int i, j;
 
 	for (i = 0; i < len; i++)
-		types[i] = G_TYPE_INT;
+		types[i] = G_TYPE_ULONG;
 
 	model = gtk_list_store_newv(len, types);
 
@@ -1056,15 +1061,15 @@ static GtkWidget *gfio_output_clat_percentiles(unsigned int *ovals,
 	gtk_list_store_append(model, &iter);
 
 	for (i = 0; i < len; i++) {
-		if (scale)
+		for (j = 0; j < scale; j++)
 			ovals[i] = (ovals[i] + 999) / 1000;
-		gtk_list_store_set(model, &iter, i, ovals[i], -1);
+		gtk_list_store_set(model, &iter, i, (unsigned long) ovals[i], -1);
 	}
 
 	return tree_view;
 }
 
-static struct graph *setup_clat_graph(char *title, unsigned int *ovals,
+static struct graph *setup_clat_graph(char *title, unsigned long long *ovals,
 				      fio_fp64_t *plist,
 				      unsigned int len,
 				      double xdim, double ydim)
@@ -1096,7 +1101,8 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 	unsigned int *io_u_plat = ts->io_u_plat[ddir];
 	unsigned long nr = ts->clat_stat[ddir].samples;
 	fio_fp64_t *plist = ts->percentile_list;
-	unsigned int *ovals, len, minv, maxv, scale_down;
+	unsigned int len, scale_down;
+	unsigned long long *ovals, minv, maxv;
 	const char *base;
 	GtkWidget *tree_view, *frame, *hbox, *drawing_area, *completion_vbox;
 	struct gui_entry *ge = gc->ge;
@@ -1107,16 +1113,19 @@ static void gfio_show_clat_percentiles(struct gfio_client *gc,
 		goto out;
 
 	/*
-	 * We default to usecs, but if the value range is such that we
-	 * should scale down to msecs, do that.
+	 * We default to nsecs, but if the value range is such that we
+	 * should scale down to usecs or msecs, do that.
 	 */
-	if (minv > 2000 && maxv > 99999) {
-		scale_down = 1;
+        if (minv > 2000000 && maxv > 99999999ULL) {
+                scale_down = 2;
 		base = "msec";
-	} else {
-		scale_down = 0;
+        } else if (minv > 2000 && maxv > 99999) {
+                scale_down = 1;
 		base = "usec";
-	}
+        } else {
+                scale_down = 0;
+		base = "nsec";
+        }
 
 	sprintf(tmp, "Completion percentiles (%s)", base);
 	tree_view = gfio_output_clat_percentiles(ovals, plist, len, base, scale_down);
@@ -1152,7 +1161,8 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 {
 	const char *ddir_label[3] = { "Read", "Write", "Trim" };
 	GtkWidget *frame, *label, *box, *vbox, *main_vbox;
-	unsigned long min[3], max[3], runt;
+	unsigned long long min[3], max[3];
+	unsigned long runt;
 	unsigned long long bw, iops;
 	unsigned int flags = 0;
 	double mean[3], dev[3];
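Unit selection in gfio_show_clat_percentiles() now has three tiers. For each tier, gfio_output_clat_percentiles() divides by 1000 once and rounds up. Below is a minimal standalone sketch of that arithmetic. The names pick_scale_down() and scale_values() are hypothetical, and the thresholds come from the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many factors of 1000 to divide nanosecond
 * percentile values by, using the thresholds from the hunk above. */
static unsigned int pick_scale_down(unsigned long long minv,
				    unsigned long long maxv,
				    const char **base)
{
	if (minv > 2000000ULL && maxv > 99999999ULL) {
		*base = "msec";
		return 2;
	} else if (minv > 2000ULL && maxv > 99999ULL) {
		*base = "usec";
		return 1;
	}
	*base = "nsec";
	return 0;
}

/* Hypothetical helper: divide each value by 1000 once per scale step,
 * rounding up each time, as the patched loop over j does. */
static void scale_values(unsigned long long *vals, size_t len,
			 unsigned int scale)
{
	for (size_t i = 0; i < len; i++)
		for (unsigned int j = 0; j < scale; j++)
			vals[i] = (vals[i] + 999) / 1000;
}
```

Because every step rounds up, a 1,500,000 ns percentile is displayed as 2 msec rather than 1.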
diff --git a/gettime-thread.c b/gettime-thread.c
index 19541b4..cbb81dc 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -6,30 +6,30 @@
 #include "fio.h"
 #include "smalloc.h"
 
-struct timeval *fio_tv = NULL;
+struct timespec *fio_ts = NULL;
 int fio_gtod_offload = 0;
 static pthread_t gtod_thread;
 static os_cpu_mask_t fio_gtod_cpumask;
 
 void fio_gtod_init(void)
 {
-	if (fio_tv)
+	if (fio_ts)
 		return;
 
-	fio_tv = smalloc(sizeof(struct timeval));
-	if (!fio_tv)
+	fio_ts = smalloc(sizeof(*fio_ts));
+	if (!fio_ts)
 		log_err("fio: smalloc pool exhausted\n");
 }
 
 static void fio_gtod_update(void)
 {
-	if (fio_tv) {
+	if (fio_ts) {
 		struct timeval __tv;
 
 		gettimeofday(&__tv, NULL);
-		fio_tv->tv_sec = __tv.tv_sec;
+		fio_ts->tv_sec = __tv.tv_sec;
 		write_barrier();
-		fio_tv->tv_usec = __tv.tv_usec;
+		fio_ts->tv_nsec = __tv.tv_usec * 1000;
 		write_barrier();
 	}
 }
diff --git a/gettime.c b/gettime.c
index 628aad6..5741932 100644
--- a/gettime.c
+++ b/gettime.c
@@ -15,19 +15,22 @@
 
 #if defined(ARCH_HAVE_CPU_CLOCK)
 #ifndef ARCH_CPU_CLOCK_CYCLES_PER_USEC
-static unsigned long cycles_per_usec;
-static unsigned long inv_cycles_per_usec;
-static uint64_t max_cycles_for_mult;
+static unsigned long cycles_per_msec;
+static unsigned long long cycles_start;
+static unsigned long long clock_mult;
+static unsigned long long max_cycles_mask;
+static unsigned long long nsecs_for_max_cycles;
+static unsigned int clock_shift;
+static unsigned int max_cycles_shift;
+#define MAX_CLOCK_SEC 60*60
 #endif
 #ifdef ARCH_CPU_CLOCK_WRAPS
-static unsigned long long cycles_start, cycles_wrap;
+static unsigned int cycles_wrap;
 #endif
 #endif
-int tsc_reliable = 0;
+bool tsc_reliable = false;
 
 struct tv_valid {
-	uint64_t last_cycles;
-	int last_tv_valid;
 	int warned;
 };
 #ifdef ARCH_HAVE_CPU_CLOCK
@@ -143,31 +146,31 @@ static int fill_clock_gettime(struct timespec *ts)
 }
 #endif
 
-static void __fio_gettime(struct timeval *tp)
+static void __fio_gettime(struct timespec *tp)
 {
 	switch (fio_clock_source) {
 #ifdef CONFIG_GETTIMEOFDAY
-	case CS_GTOD:
-		gettimeofday(tp, NULL);
+	case CS_GTOD: {
+		struct timeval tv;
+		gettimeofday(&tv, NULL);
+
+		tp->tv_sec = tv.tv_sec;
+		tp->tv_nsec = tv.tv_usec * 1000;
 		break;
+		}
 #endif
 #ifdef CONFIG_CLOCK_GETTIME
 	case CS_CGETTIME: {
-		struct timespec ts;
-
-		if (fill_clock_gettime(&ts) < 0) {
+		if (fill_clock_gettime(tp) < 0) {
 			log_err("fio: clock_gettime fails\n");
 			assert(0);
 		}
-
-		tp->tv_sec = ts.tv_sec;
-		tp->tv_usec = ts.tv_nsec / 1000;
 		break;
 		}
 #endif
 #ifdef ARCH_HAVE_CPU_CLOCK
 	case CS_CPUCLOCK: {
-		uint64_t usecs, t;
+		uint64_t nsecs, t, multiples;
 		struct tv_valid *tv;
 
 #ifdef CONFIG_TLS_THREAD
@@ -184,21 +187,17 @@ static void __fio_gettime(struct timeval *tp)
 			log_err("fio: double CPU clock wrap\n");
 			tv->warned = 1;
 		}
-
-		t -= cycles_start;
 #endif
-		tv->last_cycles = t;
-		tv->last_tv_valid = 1;
 #ifdef ARCH_CPU_CLOCK_CYCLES_PER_USEC
-		usecs = t / ARCH_CPU_CLOCK_CYCLES_PER_USEC;
+		nsecs = t / ARCH_CPU_CLOCK_CYCLES_PER_USEC * 1000;
 #else
-		if (t < max_cycles_for_mult)
-			usecs = (t * inv_cycles_per_usec) / 16777216UL;
-		else
-			usecs = t / cycles_per_usec;
+		t -= cycles_start;
+		multiples = t >> max_cycles_shift;
+		nsecs = multiples * nsecs_for_max_cycles;
+		nsecs += ((t & max_cycles_mask) * clock_mult) >> clock_shift;
 #endif
-		tp->tv_sec = usecs / 1000000;
-		tp->tv_usec = usecs % 1000000;
+		tp->tv_sec = nsecs / 1000000000ULL;
+		tp->tv_nsec = nsecs % 1000000000ULL;
 		break;
 		}
 #endif
@@ -209,9 +208,9 @@ static void __fio_gettime(struct timeval *tp)
 }
 
 #ifdef FIO_DEBUG_TIME
-void fio_gettime(struct timeval *tp, void *caller)
+void fio_gettime(struct timespec *tp, void *caller)
 #else
-void fio_gettime(struct timeval *tp, void fio_unused *caller)
+void fio_gettime(struct timespec *tp, void fio_unused *caller)
 #endif
 {
 #ifdef FIO_DEBUG_TIME
@@ -227,9 +226,9 @@ void fio_gettime(struct timeval *tp, void fio_unused *caller)
 }
 
 #if defined(ARCH_HAVE_CPU_CLOCK) && !defined(ARCH_CPU_CLOCK_CYCLES_PER_USEC)
-static unsigned long get_cycles_per_usec(void)
+static unsigned long get_cycles_per_msec(void)
 {
-	struct timeval s, e;
+	struct timespec s, e;
 	uint64_t c_s, c_e;
 	enum fio_cs old_cs = fio_clock_source;
 	uint64_t elapsed;
@@ -253,7 +252,7 @@ static unsigned long get_cycles_per_usec(void)
 	} while (1);
 
 	fio_clock_source = old_cs;
-	return (c_e - c_s) / elapsed;
+	return (c_e - c_s) * 1000 / elapsed;
 }
 
 #define NR_TIME_ITERS	50
@@ -262,12 +261,13 @@ static int calibrate_cpu_clock(void)
 {
 	double delta, mean, S;
 	uint64_t minc, maxc, avg, cycles[NR_TIME_ITERS];
-	int i, samples;
+	int i, samples, sft = 0;
+	unsigned long long tmp, max_ticks, max_mult;
 
-	cycles[0] = get_cycles_per_usec();
+	cycles[0] = get_cycles_per_msec();
 	S = delta = mean = 0.0;
 	for (i = 0; i < NR_TIME_ITERS; i++) {
-		cycles[i] = get_cycles_per_usec();
+		cycles[i] = get_cycles_per_msec();
 		delta = cycles[i] - mean;
 		if (delta) {
 			mean += delta / (i + 1.0);
@@ -304,19 +304,67 @@ static int calibrate_cpu_clock(void)
 		dprint(FD_TIME, "cycles[%d]=%llu\n", i, (unsigned long long) cycles[i]);
 
 	avg /= samples;
+	cycles_per_msec = avg;
 	dprint(FD_TIME, "avg: %llu\n", (unsigned long long) avg);
 	dprint(FD_TIME, "min=%llu, max=%llu, mean=%f, S=%f\n",
 			(unsigned long long) minc,
 			(unsigned long long) maxc, mean, S);
 
-	cycles_per_usec = avg;
-	inv_cycles_per_usec = 16777216UL / cycles_per_usec;
-	max_cycles_for_mult = ~0ULL / inv_cycles_per_usec;
-	dprint(FD_TIME, "inv_cycles_per_usec=%lu\n", inv_cycles_per_usec);
-#ifdef ARCH_CPU_CLOCK_WRAPS
+	max_ticks = MAX_CLOCK_SEC * cycles_per_msec * 1000ULL;
+	max_mult = ULLONG_MAX / max_ticks;
+	dprint(FD_TIME, "\n\nmax_ticks=%llu, __builtin_clzll=%d, "
+			"max_mult=%llu\n", max_ticks,
+			__builtin_clzll(max_ticks), max_mult);
+
+        /*
+         * Find the largest shift count that will produce
+         * a multiplier that does not exceed max_mult
+         */
+        tmp = max_mult * cycles_per_msec / 1000000;
+        while (tmp > 1) {
+                tmp >>= 1;
+                sft++;
+                dprint(FD_TIME, "tmp=%llu, sft=%u\n", tmp, sft);
+        }
+
+	clock_shift = sft;
+	clock_mult = (1ULL << sft) * 1000000 / cycles_per_msec;
+	dprint(FD_TIME, "clock_shift=%u, clock_mult=%llu\n", clock_shift,
+							clock_mult);
+
+	/*
+	 * Find the greatest power of 2 clock ticks that is less than the
+	 * ticks in MAX_CLOCK_SEC_2STAGE
+	 */
+	max_cycles_shift = max_cycles_mask = 0;
+	tmp = MAX_CLOCK_SEC * 1000ULL * cycles_per_msec;
+	dprint(FD_TIME, "tmp=%llu, max_cycles_shift=%u\n", tmp,
+							max_cycles_shift);
+	while (tmp > 1) {
+		tmp >>= 1;
+		max_cycles_shift++;
+		dprint(FD_TIME, "tmp=%llu, max_cycles_shift=%u\n", tmp, max_cycles_shift);
+	}
+	/*
+	 * if we use (1ULL << max_cycles_shift) * 1000 / cycles_per_msec
+	 * here we will have a discontinuity every
+	 * (1ULL << max_cycles_shift) cycles
+	 */
+	nsecs_for_max_cycles = ((1ULL << max_cycles_shift) * clock_mult)
+					>> clock_shift;
+
+	/* Use a bitmask to calculate ticks % (1ULL << max_cycles_shift) */
+	for (tmp = 0; tmp < max_cycles_shift; tmp++)
+		max_cycles_mask |= 1ULL << tmp;
+
+	dprint(FD_TIME, "max_cycles_shift=%u, 2^max_cycles_shift=%llu, "
+			"nsecs_for_max_cycles=%llu, "
+			"max_cycles_mask=%016llx\n",
+			max_cycles_shift, (1ULL << max_cycles_shift),
+			nsecs_for_max_cycles, max_cycles_mask);
+
 	cycles_start = get_cpu_clock();
 	dprint(FD_TIME, "cycles_start=%llu\n", cycles_start);
-#endif
 	return 0;
 }
 #else
@@ -365,7 +413,7 @@ void fio_clock_init(void)
 	fio_clock_source_inited = fio_clock_source;
 
 	if (calibrate_cpu_clock())
-		tsc_reliable = 0;
+		tsc_reliable = false;
 
 	/*
 	 * If the arch sets tsc_reliable != 0, then it must be good enough
@@ -379,12 +427,32 @@ void fio_clock_init(void)
 		log_info("fio: clocksource=cpu may not be reliable\n");
 }
 
-uint64_t utime_since(const struct timeval *s, const struct timeval *e)
+uint64_t ntime_since(const struct timespec *s, const struct timespec *e)
+{
+       int64_t sec, nsec;
+
+       sec = e->tv_sec - s->tv_sec;
+       nsec = e->tv_nsec - s->tv_nsec;
+       if (sec > 0 && nsec < 0) {
+	       sec--;
+	       nsec += 1000000000LL;
+       }
+
+       /*
+	* time warp bug on some kernels?
+	*/
+       if (sec < 0 || (sec == 0 && nsec < 0))
+	       return 0;
+
+       return nsec + (sec * 1000000000LL);
+}
+
+uint64_t utime_since(const struct timespec *s, const struct timespec *e)
 {
 	int64_t sec, usec;
 
 	sec = e->tv_sec - s->tv_sec;
-	usec = e->tv_usec - s->tv_usec;
+	usec = (e->tv_nsec - s->tv_nsec) / 1000;
 	if (sec > 0 && usec < 0) {
 		sec--;
 		usec += 1000000;
@@ -399,9 +467,9 @@ uint64_t utime_since(const struct timeval *s, const struct timeval *e)
 	return usec + (sec * 1000000);
 }
 
-uint64_t utime_since_now(const struct timeval *s)
+uint64_t utime_since_now(const struct timespec *s)
 {
-	struct timeval t;
+	struct timespec t;
 #ifdef FIO_DEBUG_TIME
 	void *p = __builtin_return_address(0);
 
@@ -413,12 +481,12 @@ uint64_t utime_since_now(const struct timeval *s)
 	return utime_since(s, &t);
 }
 
-uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
+uint64_t mtime_since_tv(const struct timeval *s, const struct timeval *e)
 {
-	long sec, usec;
+	int64_t sec, usec;
 
 	sec = e->tv_sec - s->tv_sec;
-	usec = e->tv_usec - s->tv_usec;
+	usec = (e->tv_usec - s->tv_usec);
 	if (sec > 0 && usec < 0) {
 		sec--;
 		usec += 1000000;
@@ -432,9 +500,9 @@ uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
 	return sec + usec;
 }
 
-uint64_t mtime_since_now(const struct timeval *s)
+uint64_t mtime_since_now(const struct timespec *s)
 {
-	struct timeval t;
+	struct timespec t;
 #ifdef FIO_DEBUG_TIME
 	void *p = __builtin_return_address(0);
 
@@ -446,7 +514,26 @@ uint64_t mtime_since_now(const struct timeval *s)
 	return mtime_since(s, &t);
 }
 
-uint64_t time_since_now(const struct timeval *s)
+uint64_t mtime_since(const struct timespec *s, const struct timespec *e)
+{
+	int64_t sec, usec;
+
+	sec = e->tv_sec - s->tv_sec;
+	usec = (e->tv_nsec - s->tv_nsec) / 1000;
+	if (sec > 0 && usec < 0) {
+		sec--;
+		usec += 1000000;
+	}
+
+	if (sec < 0 || (sec == 0 && usec < 0))
+		return 0;
+
+	sec *= 1000;
+	usec /= 1000;
+	return sec + usec;
+}
+
+uint64_t time_since_now(const struct timespec *s)
 {
 	return mtime_since_now(s) / 1000;
 }
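The CS_CPUCLOCK path drops the old inv_cycles_per_usec scheme in favor of a two-stage fixed-point conversion. A tick count is split into two parts:

- whole multiples of 2^max_cycles_shift, converted with the precomputed nsecs_for_max_cycles;
- the remainder, converted as (remainder * clock_mult) >> clock_shift.

The sketch below restates calibrate_cpu_clock() and the conversion in __fio_gettime() for a fixed cycles_per_msec. It is an illustration under assumptions, not fio's code:

- struct clock_calib, calib_init() and cycles_to_nsec() are hypothetical names.
- Subtracting cycles_start and wrap handling are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define MAX_CLOCK_SEC (60 * 60)	/* same span as the patch's MAX_CLOCK_SEC */

/* Hypothetical container; fio keeps these values as file-scope statics. */
struct clock_calib {
	unsigned int clock_shift;
	unsigned long long clock_mult;
	unsigned int max_cycles_shift;
	unsigned long long max_cycles_mask;
	unsigned long long nsecs_for_max_cycles;
};

static void calib_init(struct clock_calib *c,
		       unsigned long long cycles_per_msec)
{
	unsigned long long max_ticks, max_mult, tmp;
	unsigned int sft = 0;

	assert(cycles_per_msec > 0);
	max_ticks = MAX_CLOCK_SEC * 1000ULL * cycles_per_msec;
	max_mult = ULLONG_MAX / max_ticks;

	/* Largest shift whose multiplier does not exceed max_mult, so any
	 * remainder below max_ticks times clock_mult fits in 64 bits. */
	tmp = max_mult * cycles_per_msec / 1000000;
	while (tmp > 1) {
		tmp >>= 1;
		sft++;
	}
	c->clock_shift = sft;
	c->clock_mult = (1ULL << sft) * 1000000 / cycles_per_msec;

	/* Largest power of two that does not exceed max_ticks. */
	c->max_cycles_shift = 0;
	tmp = max_ticks;
	while (tmp > 1) {
		tmp >>= 1;
		c->max_cycles_shift++;
	}
	c->max_cycles_mask = (1ULL << c->max_cycles_shift) - 1;

	/* Derived from clock_mult rather than cycles_per_msec so that both
	 * stages agree at every 2^max_cycles_shift boundary. */
	c->nsecs_for_max_cycles =
		((1ULL << c->max_cycles_shift) * c->clock_mult) >> c->clock_shift;
}

/* Convert a tick count (already relative to the start) to nanoseconds. */
static uint64_t cycles_to_nsec(const struct clock_calib *c, uint64_t t)
{
	uint64_t nsecs = (t >> c->max_cycles_shift) * c->nsecs_for_max_cycles;

	nsecs += ((t & c->max_cycles_mask) * c->clock_mult) >> c->clock_shift;
	return nsecs;
}
```

Take a counter running at 3,000,000 cycles per msec (3 GHz). Calibration then yields clock_shift = 22 and clock_mult = 1398101, and one second of ticks maps to 999,999,761 ns. The patch's comment explains why nsecs_for_max_cycles is derived from clock_mult: doing so keeps the result continuous across each 2^max_cycles_shift boundary. This sketch builds the mask as (1ULL << shift) - 1, which is equivalent to the patch's bit-by-bit loop.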
diff --git a/gettime.h b/gettime.h
index 86d55bd..11e2a7b 100644
--- a/gettime.h
+++ b/gettime.h
@@ -13,27 +13,27 @@ enum fio_cs {
 	CS_INVAL,
 };
 
-extern void fio_gettime(struct timeval *, void *);
+extern void fio_gettime(struct timespec *, void *);
 extern void fio_gtod_init(void);
 extern void fio_clock_init(void);
 extern int fio_start_gtod_thread(void);
 extern int fio_monotonic_clocktest(int debug);
 extern void fio_local_clock_init(int);
 
-extern struct timeval *fio_tv;
+extern struct timespec *fio_ts;
 
-static inline int fio_gettime_offload(struct timeval *tv)
+static inline int fio_gettime_offload(struct timespec *ts)
 {
 	time_t last_sec;
 
-	if (!fio_tv)
+	if (!fio_ts)
 		return 0;
 
 	do {
 		read_barrier();
-		last_sec = tv->tv_sec = fio_tv->tv_sec;
-		tv->tv_usec = fio_tv->tv_usec;
-	} while (fio_tv->tv_sec != last_sec);
+		last_sec = ts->tv_sec = fio_ts->tv_sec;
+		ts->tv_nsec = fio_ts->tv_nsec;
+	} while (fio_ts->tv_sec != last_sec);
 
 	return 1;
 }
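fio_gettime() and the offload path now return struct timespec values, and gettime.c adds ntime_since() to take their difference. Below is a standalone sketch of that borrow-and-clamp logic; ns_between() is a hypothetical name. Like the patch, it returns 0 when the end time precedes the start (the "time warp" case).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper mirroring ntime_since(): nanoseconds from *s to *e,
 * or 0 if *e is earlier than *s. */
static uint64_t ns_between(const struct timespec *s, const struct timespec *e)
{
	int64_t sec = (int64_t)e->tv_sec - (int64_t)s->tv_sec;
	int64_t nsec = (int64_t)e->tv_nsec - (int64_t)s->tv_nsec;

	/* Borrow one second when the nanosecond field went negative. */
	if (sec > 0 && nsec < 0) {
		sec--;
		nsec += 1000000000LL;
	}

	/* Clock went backwards: report no elapsed time. */
	if (sec < 0 || (sec == 0 && nsec < 0))
		return 0;

	return (uint64_t)(sec * 1000000000LL + nsec);
}
```

For example, going from 1.9 s to 2.1 s borrows one second from the seconds field and yields 200,000,000 ns.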
diff --git a/helper_thread.c b/helper_thread.c
index 47ec728..9c6e0a2 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -71,45 +71,45 @@ static void *helper_thread_main(void *data)
 {
 	struct helper_data *hd = data;
 	unsigned int msec_to_next_event, next_log, next_ss = STEADYSTATE_MSEC;
-	struct timeval tv, last_du, last_ss;
+	struct timeval tv;
+	struct timespec ts, last_du, last_ss;
 	int ret = 0;
 
 	sk_out_assign(hd->sk_out);
 
 	gettimeofday(&tv, NULL);
-	memcpy(&last_du, &tv, sizeof(tv));
-	memcpy(&last_ss, &tv, sizeof(tv));
+	ts.tv_sec = tv.tv_sec;
+	ts.tv_nsec = tv.tv_usec * 1000;
+	memcpy(&last_du, &ts, sizeof(ts));
+	memcpy(&last_ss, &ts, sizeof(ts));
 
 	fio_mutex_up(hd->startup_mutex);
 
 	msec_to_next_event = DISK_UTIL_MSEC;
 	while (!ret && !hd->exit) {
-		struct timespec ts;
-		struct timeval now;
 		uint64_t since_du, since_ss = 0;
 
-		timeval_add_msec(&tv, msec_to_next_event);
-		ts.tv_sec = tv.tv_sec;
-		ts.tv_nsec = tv.tv_usec * 1000;
+		timespec_add_msec(&ts, msec_to_next_event);
 
 		pthread_mutex_lock(&hd->lock);
 		pthread_cond_timedwait(&hd->cond, &hd->lock, &ts);
 
-		gettimeofday(&now, NULL);
+		gettimeofday(&tv, NULL);
+		ts.tv_sec = tv.tv_sec;
+		ts.tv_nsec = tv.tv_usec * 1000;
 
 		if (hd->reset) {
-			memcpy(&tv, &now, sizeof(tv));
-			memcpy(&last_du, &now, sizeof(last_du));
-			memcpy(&last_ss, &now, sizeof(last_ss));
+			memcpy(&last_du, &ts, sizeof(ts));
+			memcpy(&last_ss, &ts, sizeof(ts));
 			hd->reset = 0;
 		}
 
 		pthread_mutex_unlock(&hd->lock);
 
-		since_du = mtime_since(&last_du, &now);
+		since_du = mtime_since(&last_du, &ts);
 		if (since_du >= DISK_UTIL_MSEC || DISK_UTIL_MSEC - since_du < 10) {
 			ret = update_io_ticks();
-			timeval_add_msec(&last_du, DISK_UTIL_MSEC);
+			timespec_add_msec(&last_du, DISK_UTIL_MSEC);
 			msec_to_next_event = DISK_UTIL_MSEC;
 			if (since_du >= DISK_UTIL_MSEC)
 				msec_to_next_event -= (since_du - DISK_UTIL_MSEC);
@@ -126,10 +126,10 @@ static void *helper_thread_main(void *data)
 			next_log = DISK_UTIL_MSEC;
 
 		if (steadystate_enabled) {
-			since_ss = mtime_since(&last_ss, &now);
+			since_ss = mtime_since(&last_ss, &ts);
 			if (since_ss >= STEADYSTATE_MSEC || STEADYSTATE_MSEC - since_ss < 10) {
 				steadystate_check();
-				timeval_add_msec(&last_ss, since_ss);
+				timespec_add_msec(&last_ss, since_ss);
 				if (since_ss > STEADYSTATE_MSEC)
 					next_ss = STEADYSTATE_MSEC - (since_ss - STEADYSTATE_MSEC);
 				else
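helper_thread_main() now keeps its wakeup deadline as a struct timespec and passes it directly to pthread_cond_timedwait(). timespec_add_msec() therefore has to keep tv_nsec within [0, 999999999], because POSIX allows pthread_cond_timedwait() to fail with EINVAL otherwise. The body of timespec_add_msec() is not part of this excerpt. The following is only a sketch of one way to write it, under the hypothetical name ts_add_msec(), and it assumes the input is already normalized.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for timespec_add_msec(): advance *ts by msec
 * milliseconds, keeping tv_nsec normalized. Assumes tv_nsec < 1e9 on entry. */
static void ts_add_msec(struct timespec *ts, unsigned int msec)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);

	ts->tv_sec += msec / 1000;
	ts->tv_nsec += (long)(msec % 1000) * 1000000L;

	/* At most one carry is needed: both parts were below one second. */
	if (ts->tv_nsec >= 1000000000L) {
		ts->tv_nsec -= 1000000000L;
		ts->tv_sec++;
	}
}
```

Splitting msec into whole seconds and a sub-second remainder keeps the nanosecond sum below 2e9. That bound fits in a 32-bit long, so the sketch needs no 64-bit intermediate.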
diff --git a/idletime.c b/idletime.c
index 4c00d80..90bc1d9 100644
--- a/idletime.c
+++ b/idletime.c
@@ -11,7 +11,7 @@ static volatile struct idle_prof_common ipc;
 static double calibrate_unit(unsigned char *data)
 {
 	unsigned long t, i, j, k;
-	struct timeval tps;
+	struct timespec tps;
 	double tunit = 0.0;
 
 	for (i = 0; i < CALIBRATE_RUNS; i++) {
@@ -183,7 +183,6 @@ static void calibration_stats(void)
 void fio_idle_prof_init(void)
 {
 	int i, ret;
-	struct timeval tp;
 	struct timespec ts;
 	pthread_attr_t tattr;
 	struct idle_prof_thread *ipt;
@@ -282,9 +281,8 @@ void fio_idle_prof_init(void)
 		pthread_mutex_lock(&ipt->init_lock);
 		while ((ipt->state != TD_EXITED) &&
 		       (ipt->state!=TD_INITIALIZED)) {
-			fio_gettime(&tp, NULL);
-			ts.tv_sec = tp.tv_sec + 1;
-			ts.tv_nsec = tp.tv_usec * 1000;
+			fio_gettime(&ts, NULL);
+			ts.tv_sec += 1;
 			pthread_cond_timedwait(&ipt->cond, &ipt->init_lock, &ts);
 		}
 		pthread_mutex_unlock(&ipt->init_lock);
@@ -325,7 +323,6 @@ void fio_idle_prof_stop(void)
 {
 	int i;
 	uint64_t runt;
-	struct timeval tp;
 	struct timespec ts;
 	struct idle_prof_thread *ipt;
 
@@ -343,9 +340,8 @@ void fio_idle_prof_stop(void)
 		pthread_mutex_lock(&ipt->start_lock);
 		while ((ipt->state != TD_EXITED) &&
 		       (ipt->state!=TD_NOT_CREATED)) {
-			fio_gettime(&tp, NULL);
-			ts.tv_sec = tp.tv_sec + 1;
-			ts.tv_nsec = tp.tv_usec * 1000;
+			fio_gettime(&ts, NULL);
+			ts.tv_sec += 1;
 			/* timed wait in case a signal is not received */
 			pthread_cond_timedwait(&ipt->cond, &ipt->start_lock, &ts);
 		}
diff --git a/idletime.h b/idletime.h
index 84c1fbb..b8376c2 100644
--- a/idletime.h
+++ b/idletime.h
@@ -26,8 +26,8 @@ struct idle_prof_thread {
 	pthread_t thread;
 	int cpu;
 	int state;
-	struct timeval tps;
-	struct timeval tpe;
+	struct timespec tps;
+	struct timespec tpe;
 	double cali_time; /* microseconds to finish a unit work */
 	double loops;
 	double idleness;
diff --git a/io_u.c b/io_u.c
index fd63119..375413f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -20,7 +20,7 @@ struct io_completion_data {
 
 	int error;			/* output */
 	uint64_t bytes_done[DDIR_RWDIR_CNT];	/* output */
-	struct timeval time;		/* output */
+	struct timespec time;		/* output */
 };
 
 /*
@@ -989,11 +989,52 @@ void io_u_mark_depth(struct thread_data *td, unsigned int nr)
 	td->ts.io_u_map[idx] += nr;
 }
 
-static void io_u_mark_lat_usec(struct thread_data *td, unsigned long usec)
+static void io_u_mark_lat_nsec(struct thread_data *td, unsigned long long nsec)
 {
 	int idx = 0;
 
-	assert(usec < 1000);
+	assert(nsec < 1000);
+
+	switch (nsec) {
+	case 750 ... 999:
+		idx = 9;
+		break;
+	case 500 ... 749:
+		idx = 8;
+		break;
+	case 250 ... 499:
+		idx = 7;
+		break;
+	case 100 ... 249:
+		idx = 6;
+		break;
+	case 50 ... 99:
+		idx = 5;
+		break;
+	case 20 ... 49:
+		idx = 4;
+		break;
+	case 10 ... 19:
+		idx = 3;
+		break;
+	case 4 ... 9:
+		idx = 2;
+		break;
+	case 2 ... 3:
+		idx = 1;
+	case 0 ... 1:
+		break;
+	}
+
+	assert(idx < FIO_IO_U_LAT_N_NR);
+	td->ts.io_u_lat_n[idx]++;
+}
+
+static void io_u_mark_lat_usec(struct thread_data *td, unsigned long long usec)
+{
+	int idx = 0;
+
+	assert(usec < 1000 && usec >= 1);
 
 	switch (usec) {
 	case 750 ... 999:
@@ -1030,10 +1071,12 @@ static void io_u_mark_lat_usec(struct thread_data *td, unsigned long usec)
 	td->ts.io_u_lat_u[idx]++;
 }
 
-static void io_u_mark_lat_msec(struct thread_data *td, unsigned long msec)
+static void io_u_mark_lat_msec(struct thread_data *td, unsigned long long msec)
 {
 	int idx = 0;
 
+	assert(msec >= 1);
+
 	switch (msec) {
 	default:
 		idx = 11;
@@ -1075,12 +1118,14 @@ static void io_u_mark_lat_msec(struct thread_data *td, unsigned long msec)
 	td->ts.io_u_lat_m[idx]++;
 }
 
-static void io_u_mark_latency(struct thread_data *td, unsigned long usec)
+static void io_u_mark_latency(struct thread_data *td, unsigned long long nsec)
 {
-	if (usec < 1000)
-		io_u_mark_lat_usec(td, usec);
+	if (nsec < 1000)
+		io_u_mark_lat_nsec(td, nsec);
+	else if (nsec < 1000000)
+		io_u_mark_lat_usec(td, nsec / 1000);
 	else
-		io_u_mark_lat_msec(td, usec / 1000);
+		io_u_mark_lat_msec(td, nsec / 1000000);
 }
 
 static unsigned int __get_next_fileno_rand(struct thread_data *td)
@@ -1572,7 +1617,7 @@ static void small_content_scramble(struct io_u *io_u)
 		 * the buffer, given by the product of the usec time
 		 * and the actual offset.
 		 */
-		offset = (io_u->start_time.tv_usec ^ boffset) & 511;
+		offset = ((io_u->start_time.tv_nsec/1000) ^ boffset) & 511;
 		offset &= ~(sizeof(uint64_t) - 1);
 		if (offset >= 512 - sizeof(uint64_t))
 			offset -= sizeof(uint64_t);
@@ -1729,7 +1774,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 				  const enum fio_ddir idx, unsigned int bytes)
 {
 	const int no_reduce = !gtod_reduce(td);
-	unsigned long lusec = 0;
+	unsigned long long llnsec = 0;
 
 	if (td->parent)
 		td = td->parent;
@@ -1738,37 +1783,37 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 		return;
 
 	if (no_reduce)
-		lusec = utime_since(&io_u->issue_time, &icd->time);
+		llnsec = ntime_since(&io_u->issue_time, &icd->time);
 
 	if (!td->o.disable_lat) {
-		unsigned long tusec;
+		unsigned long long tnsec;
 
-		tusec = utime_since(&io_u->start_time, &icd->time);
-		add_lat_sample(td, idx, tusec, bytes, io_u->offset);
+		tnsec = ntime_since(&io_u->start_time, &icd->time);
+		add_lat_sample(td, idx, tnsec, bytes, io_u->offset);
 
 		if (td->flags & TD_F_PROFILE_OPS) {
 			struct prof_io_ops *ops = &td->prof_io_ops;
 
 			if (ops->io_u_lat)
-				icd->error = ops->io_u_lat(td, tusec);
+				icd->error = ops->io_u_lat(td, tnsec/1000);
 		}
 
-		if (td->o.max_latency && tusec > td->o.max_latency)
-			lat_fatal(td, icd, tusec, td->o.max_latency);
-		if (td->o.latency_target && tusec > td->o.latency_target) {
+		if (td->o.max_latency && tnsec/1000 > td->o.max_latency)
+			lat_fatal(td, icd, tnsec/1000, td->o.max_latency);
+		if (td->o.latency_target && tnsec/1000 > td->o.latency_target) {
 			if (lat_target_failed(td))
-				lat_fatal(td, icd, tusec, td->o.latency_target);
+				lat_fatal(td, icd, tnsec/1000, td->o.latency_target);
 		}
 	}
 
 	if (ddir_rw(idx)) {
 		if (!td->o.disable_clat) {
-			add_clat_sample(td, idx, lusec, bytes, io_u->offset);
-			io_u_mark_latency(td, lusec);
+			add_clat_sample(td, idx, llnsec, bytes, io_u->offset);
+			io_u_mark_latency(td, llnsec);
 		}
 
 		if (!td->o.disable_bw && per_unit_log(td->bw_log))
-			add_bw_sample(td, io_u, bytes, lusec);
+			add_bw_sample(td, io_u, bytes, llnsec);
 
 		if (no_reduce && per_unit_log(td->iops_log))
 			add_iops_sample(td, io_u, bytes);
@@ -2000,7 +2045,7 @@ void io_u_queued(struct thread_data *td, struct io_u *io_u)
 	if (!td->o.disable_slat && ramp_time_over(td) && td->o.stats) {
 		unsigned long slat_time;
 
-		slat_time = utime_since(&io_u->start_time, &io_u->issue_time);
+		slat_time = ntime_since(&io_u->start_time, &io_u->issue_time);
 
 		if (td->parent)
 			td = td->parent;
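io_u_mark_latency() now takes nanoseconds and routes each sample to one of three histograms:

- below 1000 ns: io_u_lat_n, in nsec;
- below 1,000,000 ns: io_u_lat_u, in usec;
- everything else: io_u_lat_m, in msec.

The new nsec switch uses the bucket edges 2, 4, 10, 20, 50, 100, 250, 500 and 750. The hunk elides most of the usec switch; this sketch assumes it uses the same edges, as the "2us" through "1ms" labels in gclient.c suggest. pick_unit() and sub1000_bucket() are hypothetical helpers, and the msec histogram's own table is not modeled.

```c
#include <assert.h>

enum lat_unit { LAT_NSEC, LAT_USEC, LAT_MSEC };

/* Hypothetical helper: pick the histogram and the value in its unit,
 * following the thresholds in the patched io_u_mark_latency(). */
static enum lat_unit pick_unit(unsigned long long nsec,
			       unsigned long long *val)
{
	if (nsec < 1000) {
		*val = nsec;
		return LAT_NSEC;
	} else if (nsec < 1000000) {
		*val = nsec / 1000;
		return LAT_USEC;
	}
	*val = nsec / 1000000;
	return LAT_MSEC;
}

/* Hypothetical helper: index into a ten-entry sub-1000 histogram using
 * the lower bounds of buckets 1..9 from the nsec switch. */
static int sub1000_bucket(unsigned long long v)
{
	static const unsigned int lower[] = {
		2, 4, 10, 20, 50, 100, 250, 500, 750
	};
	int idx = 0;

	assert(v < 1000);
	while (idx < 9 && v >= lower[idx])
		idx++;
	return idx;
}
```

For example, a 42,000 ns completion is recorded as 42 usec in bucket 4, which covers 20 to 49.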
diff --git a/io_u.h b/io_u.h
index 155344d..b228e2e 100644
--- a/io_u.h
+++ b/io_u.h
@@ -31,8 +31,8 @@ enum {
  * The io unit
  */
 struct io_u {
-	struct timeval start_time;
-	struct timeval issue_time;
+	struct timespec start_time;
+	struct timespec issue_time;
 
 	struct fio_file *file;
 	unsigned int flags;
diff --git a/ioengines.c b/ioengines.c
index 2d55065..abbaa9a 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -281,7 +281,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 		 */
 		if (td->o.read_iolog_file)
 			memcpy(&td->last_issue, &io_u->issue_time,
-					sizeof(struct timeval));
+					sizeof(io_u->issue_time));
 	}
 
 	if (ddir_rw(ddir)) {
@@ -356,7 +356,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 		 */
 		if (td->o.read_iolog_file)
 			memcpy(&td->last_issue, &io_u->issue_time,
-					sizeof(struct timeval));
+					sizeof(io_u->issue_time));
 	}
 
 	return ret;
diff --git a/iolog.c b/iolog.c
index 01b82e8..27c14eb 100644
--- a/iolog.c
+++ b/iolog.c
@@ -65,7 +65,7 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 {
 	uint64_t usec = utime_since_now(&td->last_issue);
 	uint64_t this_delay;
-	struct timeval tv;
+	struct timespec ts;
 
 	if (delay < td->time_offset) {
 		td->time_offset = 0;
@@ -78,7 +78,7 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 
 	delay -= usec;
 
-	fio_gettime(&tv, NULL);
+	fio_gettime(&ts, NULL);
 	while (delay && !td->terminate) {
 		this_delay = delay;
 		if (this_delay > 500000)
@@ -88,7 +88,7 @@ static void iolog_delay(struct thread_data *td, unsigned long delay)
 		delay -= this_delay;
 	}
 
-	usec = utime_since_now(&tv);
+	usec = utime_since_now(&ts);
 	if (usec > delay)
 		td->time_offset = usec - delay;
 	else
@@ -643,6 +643,7 @@ void setup_log(struct io_log **log, struct log_params *p,
 		l->log_gz = 0;
 	else if (l->log_gz || l->log_gz_store) {
 		mutex_init_pshared(&l->chunk_lock);
+		mutex_init_pshared(&l->deferred_free_lock);
 		p->td->flags |= TD_F_COMPRESS_LOG;
 	}
 
@@ -1144,6 +1145,42 @@ size_t log_chunk_sizes(struct io_log *log)
 
 #ifdef CONFIG_ZLIB
 
+static bool warned_on_drop;
+
+static void iolog_put_deferred(struct io_log *log, void *ptr)
+{
+	if (!ptr)
+		return;
+
+	pthread_mutex_lock(&log->deferred_free_lock);
+	if (log->deferred < IOLOG_MAX_DEFER) {
+		log->deferred_items[log->deferred] = ptr;
+		log->deferred++;
+	} else if (!warned_on_drop) {
+		log_err("fio: had to drop log entry free\n");
+		warned_on_drop = true;
+	}
+	pthread_mutex_unlock(&log->deferred_free_lock);
+}
+
+static void iolog_free_deferred(struct io_log *log)
+{
+	int i;
+
+	if (!log->deferred)
+		return;
+
+	pthread_mutex_lock(&log->deferred_free_lock);
+
+	for (i = 0; i < log->deferred; i++) {
+		free(log->deferred_items[i]);
+		log->deferred_items[i] = NULL;
+	}
+
+	log->deferred = 0;
+	pthread_mutex_unlock(&log->deferred_free_lock);
+}
+
 static int gz_work(struct iolog_flush_data *data)
 {
 	struct iolog_compress *c = NULL;
@@ -1236,7 +1273,7 @@ static int gz_work(struct iolog_flush_data *data)
 	if (ret != Z_OK)
 		log_err("fio: deflateEnd %d\n", ret);
 
-	free(data->samples);
+	iolog_put_deferred(data->log, data->samples);
 
 	if (!flist_empty(&list)) {
 		pthread_mutex_lock(&data->log->chunk_lock);
@@ -1247,7 +1284,7 @@ static int gz_work(struct iolog_flush_data *data)
 	ret = 0;
 done:
 	if (data->free)
-		free(data);
+		sfree(data);
 	return ret;
 err:
 	while (!flist_empty(&list)) {
@@ -1348,7 +1385,7 @@ int iolog_cur_flush(struct io_log *log, struct io_logs *cur_log)
 {
 	struct iolog_flush_data *data;
 
-	data = malloc(sizeof(*data));
+	data = smalloc(sizeof(*data));
 	if (!data)
 		return 1;
 
@@ -1362,6 +1399,9 @@ int iolog_cur_flush(struct io_log *log, struct io_logs *cur_log)
 	cur_log->log = NULL;
 
 	workqueue_enqueue(&log->td->log_compress_wq, &data->work);
+
+	iolog_free_deferred(log);
+
 	return 0;
 }
 #else
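gz_work() runs on the log compression workqueue and used to free data->samples itself. The patch instead parks the pointer in a small per-log array guarded by deferred_free_lock. iolog_cur_flush() then frees the parked pointers after queueing the next job. The sketch below models that bounded deferred-free list with hypothetical names: struct defer_list, defer_put() and defer_free_all(). When the list is full, the patch leaves the pointer unfreed and warns once. The sketch instead reports the overflow to the caller. Build it with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DEFER 8	/* same capacity as IOLOG_MAX_DEFER in the patch */

/* Hypothetical cut-down version of the fields the patch adds to io_log. */
struct defer_list {
	pthread_mutex_t lock;
	void *items[MAX_DEFER];
	unsigned int nr;
	bool warned;
};

/* Producer side (compression worker): queue ptr instead of freeing it.
 * Returns false if the list was full; the caller then still owns ptr. */
static bool defer_put(struct defer_list *d, void *ptr)
{
	bool queued = false;

	if (!ptr)
		return true;

	pthread_mutex_lock(&d->lock);
	if (d->nr < MAX_DEFER) {
		d->items[d->nr++] = ptr;
		queued = true;
	} else {
		d->warned = true;
	}
	pthread_mutex_unlock(&d->lock);
	return queued;
}

/* Consumer side (flush path): free everything queued so far and
 * return how many pointers were released. */
static unsigned int defer_free_all(struct defer_list *d)
{
	unsigned int i, freed;

	pthread_mutex_lock(&d->lock);
	freed = d->nr;
	for (i = 0; i < d->nr; i++) {
		free(d->items[i]);
		d->items[i] = NULL;
	}
	d->nr = 0;
	pthread_mutex_unlock(&d->lock);
	return freed;
}
```

defer_free_all() always takes the lock. The patch's iolog_free_deferred() instead first reads log->deferred without the lock as a cheap fast path and only locks when there is something to free.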
diff --git a/iolog.h b/iolog.h
index 0733ad3..d157fa2 100644
--- a/iolog.h
+++ b/iolog.h
@@ -131,6 +131,11 @@ struct io_log {
 	pthread_mutex_t chunk_lock;
 	unsigned int chunk_seq;
 	struct flist_head chunk_list;
+
+	pthread_mutex_t deferred_free_lock;
+#define IOLOG_MAX_DEFER	8
+	void *deferred_items[IOLOG_MAX_DEFER];
+	unsigned int deferred;
 };
 
 /*
@@ -259,7 +264,7 @@ struct log_params {
 
 static inline bool per_unit_log(struct io_log *log)
 {
-	return log && !log->avg_msec;
+	return log && (!log->avg_msec || log->log_gz || log->log_gz_store);
 }
 
 static inline bool inline_log(struct io_log *log)
diff --git a/lib/seqlock.h b/lib/seqlock.h
index 1ac1eb6..762b6ec 100644
--- a/lib/seqlock.h
+++ b/lib/seqlock.h
@@ -1,6 +1,7 @@
 #ifndef FIO_SEQLOCK_H
 #define FIO_SEQLOCK_H
 
+#include "types.h"
 #include "../arch/arch.h"
 
 struct seqlock {
diff --git a/libfio.c b/libfio.c
index da22456..14ddc4d 100644
--- a/libfio.c
+++ b/libfio.c
@@ -144,10 +144,10 @@ void reset_all_stats(struct thread_data *td)
 	}
 
 	set_epoch_time(td, td->o.log_unix_epoch);
-	memcpy(&td->start, &td->epoch, sizeof(struct timeval));
-	memcpy(&td->iops_sample_time, &td->epoch, sizeof(struct timeval));
-	memcpy(&td->bw_sample_time, &td->epoch, sizeof(struct timeval));
-	memcpy(&td->ss.prev_time, &td->epoch, sizeof(struct timeval));
+	memcpy(&td->start, &td->epoch, sizeof(td->epoch));
+	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
+	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
+	memcpy(&td->ss.prev_time, &td->epoch, sizeof(td->epoch));
 
 	lat_target_reset(td);
 	clear_rusage_stat(td);
diff --git a/mutex.c b/mutex.c
index d8c4825..9fab715 100644
--- a/mutex.c
+++ b/mutex.c
@@ -141,11 +141,15 @@ struct fio_mutex *fio_mutex_init(int value)
 	return NULL;
 }
 
-static bool mutex_timed_out(struct timeval *t, unsigned int msecs)
+static bool mutex_timed_out(struct timespec *t, unsigned int msecs)
 {
-	struct timeval now;
+	struct timeval tv;
+	struct timespec now;
+
+	gettimeofday(&tv, NULL);
+	now.tv_sec = tv.tv_sec;
+	now.tv_nsec = tv.tv_usec * 1000;
 
-	gettimeofday(&now, NULL);
 	return mtime_since(t, &now) >= msecs;
 }
 
@@ -177,7 +181,7 @@ int fio_mutex_down_timeout(struct fio_mutex *mutex, unsigned int msecs)
 		 * way too early, double check.
 		 */
 		ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
-		if (ret == ETIMEDOUT && !mutex_timed_out(&tv_s, msecs))
+		if (ret == ETIMEDOUT && !mutex_timed_out(&t, msecs))
 			ret = 0;
 	}
 	mutex->waiters--;
diff --git a/options.c b/options.c
index a8fdde4..7431ed8 100644
--- a/options.c
+++ b/options.c
@@ -1381,7 +1381,7 @@ static int str_gtod_reduce_cb(void *data, int *il)
 	td->o.disable_bw = !!val;
 	td->o.clat_percentiles = !val;
 	if (val)
-		td->tv_cache_mask = 63;
+		td->ts_cache_mask = 63;
 
 	return 0;
 }
diff --git a/os/windows/posix.c b/os/windows/posix.c
index eae8c86..488d0ed 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -25,8 +25,8 @@
 #include "../os-windows.h"
 #include "../../lib/hweight.h"
 
-extern unsigned long mtime_since_now(struct timeval *);
-extern void fio_gettime(struct timeval *, void *);
+extern unsigned long mtime_since_now(struct timespec *);
+extern void fio_gettime(struct timespec *, void *);
 
 /* These aren't defined in the MinGW headers */
 HRESULT WINAPI StringCchCopyA(
@@ -852,7 +852,7 @@ int poll(struct pollfd fds[], nfds_t nfds, int timeout)
 
 int nanosleep(const struct timespec *rqtp, struct timespec *rmtp)
 {
-	struct timeval tv;
+	struct timespec tv;
 	DWORD ms_remaining;
 	DWORD ms_total = (rqtp->tv_sec * 1000) + (rqtp->tv_nsec / 1000000.0);
 
diff --git a/profiles/act.c b/profiles/act.c
index 643f8a8..59e5005 100644
--- a/profiles/act.c
+++ b/profiles/act.c
@@ -47,7 +47,7 @@ struct act_run_data {
 static struct act_run_data *act_run_data;
 
 struct act_prof_data {
-	struct timeval sample_tv;
+	struct timespec sample_tv;
 	struct act_slice *slices;
 	unsigned int cur_slice;
 	unsigned int nr_slices;
diff --git a/server.c b/server.c
index 8a5e75d..8b36e38 100644
--- a/server.c
+++ b/server.c
@@ -438,7 +438,7 @@ static uint64_t alloc_reply(uint64_t tag, uint16_t opcode)
 
 	reply = calloc(1, sizeof(*reply));
 	INIT_FLIST_HEAD(&reply->list);
-	fio_gettime(&reply->tv, NULL);
+	fio_gettime(&reply->ts, NULL);
 	reply->saved_tag = tag;
 	reply->opcode = opcode;
 
@@ -1497,6 +1497,8 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 		p.ts.io_u_complete[i]	= cpu_to_le32(ts->io_u_complete[i]);
 	}
 
+	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
+		p.ts.io_u_lat_n[i]	= cpu_to_le32(ts->io_u_lat_n[i]);
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		p.ts.io_u_lat_u[i]	= cpu_to_le32(ts->io_u_lat_u[i]);
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
@@ -2268,7 +2270,7 @@ int fio_server_parse_host(const char *host, int ipv6, struct in_addr *inp,
  * For local domain sockets:
  *	*ptr is the filename, *is_sock is 1.
  */
-int fio_server_parse_string(const char *str, char **ptr, int *is_sock,
+int fio_server_parse_string(const char *str, char **ptr, bool *is_sock,
 			    int *port, struct in_addr *inp,
 			    struct in6_addr *inp6, int *ipv6)
 {
@@ -2277,13 +2279,13 @@ int fio_server_parse_string(const char *str, char **ptr, int *is_sock,
 	int lport = 0;
 
 	*ptr = NULL;
-	*is_sock = 0;
+	*is_sock = false;
 	*port = fio_net_port;
 	*ipv6 = 0;
 
 	if (!strncmp(str, "sock:", 5)) {
 		*ptr = strdup(str + 5);
-		*is_sock = 1;
+		*is_sock = true;
 
 		return 0;
 	}
@@ -2362,7 +2364,8 @@ int fio_server_parse_string(const char *str, char **ptr, int *is_sock,
 static int fio_handle_server_arg(void)
 {
 	int port = fio_net_port;
-	int is_sock, ret = 0;
+	bool is_sock;
+	int ret = 0;
 
 	saddr_in.sin_addr.s_addr = htonl(INADDR_ANY);
 
diff --git a/server.h b/server.h
index f002f3b..7f235f3 100644
--- a/server.h
+++ b/server.h
@@ -43,13 +43,13 @@ struct fio_net_cmd {
 
 struct fio_net_cmd_reply {
 	struct flist_head list;
-	struct timeval tv;
+	struct timespec ts;
 	uint64_t saved_tag;
 	uint16_t opcode;
 };
 
 enum {
-	FIO_SERVER_VER			= 63,
+	FIO_SERVER_VER			= 64,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
@@ -212,7 +212,7 @@ extern int fio_server_text_output(int, const char *, size_t);
 extern int fio_net_send_cmd(int, uint16_t, const void *, off_t, uint64_t *, struct flist_head *);
 extern int fio_net_send_simple_cmd(int, uint16_t, uint64_t, struct flist_head *);
 extern void fio_server_set_arg(const char *);
-extern int fio_server_parse_string(const char *, char **, int *, int *, struct in_addr *, struct in6_addr *, int *);
+extern int fio_server_parse_string(const char *, char **, bool *, int *, struct in_addr *, struct in6_addr *, int *);
 extern int fio_server_parse_host(const char *, int, struct in_addr *, struct in6_addr *);
 extern const char *fio_server_op(unsigned int);
 extern void fio_server_got_signal(int);
diff --git a/stat.c b/stat.c
index fd3ad5a..5042650 100644
--- a/stat.c
+++ b/stat.c
@@ -37,9 +37,9 @@ void update_rusage_stat(struct thread_data *td)
 	struct thread_stat *ts = &td->ts;
 
 	fio_getrusage(&td->ru_end);
-	ts->usr_time += mtime_since(&td->ru_start.ru_utime,
+	ts->usr_time += mtime_since_tv(&td->ru_start.ru_utime,
 					&td->ru_end.ru_utime);
-	ts->sys_time += mtime_since(&td->ru_start.ru_stime,
+	ts->sys_time += mtime_since_tv(&td->ru_start.ru_stime,
 					&td->ru_end.ru_stime);
 	ts->ctx += td->ru_end.ru_nvcsw + td->ru_end.ru_nivcsw
 			- (td->ru_start.ru_nvcsw + td->ru_start.ru_nivcsw);
@@ -58,7 +58,7 @@ void update_rusage_stat(struct thread_data *td)
  * group by looking at the index bits.
  *
  */
-static unsigned int plat_val_to_idx(unsigned int val)
+static unsigned int plat_val_to_idx(unsigned long long val)
 {
 	unsigned int msb, error_bits, base, offset, idx;
 
@@ -66,7 +66,7 @@ static unsigned int plat_val_to_idx(unsigned int val)
 	if (val == 0)
 		msb = 0;
 	else
-		msb = (sizeof(val)*8) - __builtin_clz(val) - 1;
+		msb = (sizeof(val)*8) - __builtin_clzll(val) - 1;
 
 	/*
 	 * MSB <= (FIO_IO_U_PLAT_BITS-1), cannot be rounded off. Use
@@ -135,16 +135,16 @@ static int double_cmp(const void *a, const void *b)
 }
 
 unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
-				   fio_fp64_t *plist, unsigned int **output,
-				   unsigned int *maxv, unsigned int *minv)
+				   fio_fp64_t *plist, unsigned long long **output,
+				   unsigned long long *maxv, unsigned long long *minv)
 {
 	unsigned long sum = 0;
 	unsigned int len, i, j = 0;
 	unsigned int oval_len = 0;
-	unsigned int *ovals = NULL;
+	unsigned long long *ovals = NULL;
 	int is_last;
 
-	*minv = -1U;
+	*minv = -1ULL;
 	*maxv = 0;
 
 	len = 0;
@@ -173,7 +173,7 @@ unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 
 			if (j == oval_len) {
 				oval_len += 100;
-				ovals = realloc(ovals, oval_len * sizeof(unsigned int));
+				ovals = realloc(ovals, oval_len * sizeof(*ovals));
 			}
 
 			ovals[j] = plat_idx_to_val(i);
@@ -201,9 +201,10 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 				  fio_fp64_t *plist, unsigned int precision,
 				  struct buf_output *out)
 {
-	unsigned int len, j = 0, minv, maxv;
-	unsigned int *ovals;
-	int is_last, per_line, scale_down;
+	unsigned int divisor, len, i, j = 0;
+	unsigned long long minv, maxv;
+	unsigned long long *ovals;
+	int is_last, per_line, scale_down, time_width;
 	char fmt[32];
 
 	len = calc_clat_percentiles(io_u_plat, nr, plist, &ovals, &maxv, &minv);
@@ -211,23 +212,31 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 		goto out;
 
 	/*
-	 * We default to usecs, but if the value range is such that we
-	 * should scale down to msecs, do that.
+	 * We default to nsecs, but if the value range is such that we
+	 * should scale down to usecs or msecs, do that.
 	 */
-	if (minv > 2000 && maxv > 99999) {
-		scale_down = 1;
+	if (minv > 2000000 && maxv > 99999999ULL) {
+		scale_down = 2;
+		divisor = 1000000;
 		log_buf(out, "    clat percentiles (msec):\n     |");
+	} else if (minv > 2000 && maxv > 99999) {
+		scale_down = 1;
+		divisor = 1000;
+		log_buf(out, "    clat percentiles (usec):\n     |");
 	} else {
 		scale_down = 0;
-		log_buf(out, "    clat percentiles (usec):\n     |");
+		divisor = 1;
+		log_buf(out, "    clat percentiles (nsec):\n     |");
 	}
 
-	snprintf(fmt, sizeof(fmt), "%%1.%uf", precision);
-	per_line = (80 - 7) / (precision + 14);
 
-	for (j = 0; j < len; j++) {
-		char fbuf[16], *ptr = fbuf;
+	time_width = max(5, (int) (log10(maxv / divisor) + 1));
+	snprintf(fmt, sizeof(fmt), " %%%u.%ufth=[%%%dllu]%%c", precision + 3,
+			precision, time_width);
+	/* fmt will be something like " %5.2fth=[%5llu]%c" */
+	per_line = (80 - 7) / (precision + 10 + time_width);
 
+	for (j = 0; j < len; j++) {
 		/* for formatting */
 		if (j != 0 && (j % per_line) == 0)
 			log_buf(out, "     |");
@@ -235,15 +244,10 @@ static void show_clat_percentiles(unsigned int *io_u_plat, unsigned long nr,
 		/* end of the list */
 		is_last = (j == len - 1);
 
-		if (plist[j].u.f < 10.0)
-			ptr += sprintf(fbuf, " ");
-
-		snprintf(ptr, sizeof(fbuf), fmt, plist[j].u.f);
-
-		if (scale_down)
+		for (i = 0; i < scale_down; i++)
 			ovals[j] = (ovals[j] + 999) / 1000;
 
-		log_buf(out, " %sth=[%5u]%c", fbuf, ovals[j], is_last ? '\n' : ',');
+		log_buf(out, fmt, plist[j].u.f, ovals[j], is_last ? '\n' : ',');
 
 		if (is_last)
 			break;
@@ -257,8 +261,8 @@ out:
 		free(ovals);
 }
 
-bool calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max,
-	      double *mean, double *dev)
+bool calc_lat(struct io_stat *is, unsigned long long *min,
+	      unsigned long long *max, double *mean, double *dev)
 {
 	double n = (double) is->samples;
 
@@ -355,6 +359,28 @@ static void stat_calc_lat(struct thread_stat *ts, double *dst,
 	}
 }
 
+/*
+ * To keep the terse format unaltered, add all of the ns latency
+ * buckets to the first us latency bucket
+ */
+void stat_calc_lat_nu(struct thread_stat *ts, double *io_u_lat_u)
+{
+	unsigned long ntotal = 0, total = ddir_rw_sum(ts->total_io_u);
+	int i;
+
+	stat_calc_lat(ts, io_u_lat_u, ts->io_u_lat_u, FIO_IO_U_LAT_U_NR);
+
+	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
+		ntotal += ts->io_u_lat_n[i];
+
+	io_u_lat_u[0] += 100.0 * (double) ntotal / (double) total;
+}
+
+void stat_calc_lat_n(struct thread_stat *ts, double *io_u_lat)
+{
+	stat_calc_lat(ts, io_u_lat, ts->io_u_lat_n, FIO_IO_U_LAT_N_NR);
+}
+
 void stat_calc_lat_u(struct thread_stat *ts, double *io_u_lat)
 {
 	stat_calc_lat(ts, io_u_lat, ts->io_u_lat_u, FIO_IO_U_LAT_U_NR);
@@ -365,14 +391,17 @@ void stat_calc_lat_m(struct thread_stat *ts, double *io_u_lat)
 	stat_calc_lat(ts, io_u_lat, ts->io_u_lat_m, FIO_IO_U_LAT_M_NR);
 }
 
-static void display_lat(const char *name, unsigned long min, unsigned long max,
-			double mean, double dev, struct buf_output *out)
+static void display_lat(const char *name, unsigned long long min,
+			unsigned long long max, double mean, double dev,
+			struct buf_output *out)
 {
-	const char *base = "(usec)";
+	const char *base = "(nsec)";
 	char *minp, *maxp;
 
-	if (usec_to_msec(&min, &max, &mean, &dev))
+	if (nsec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
+	else if (nsec_to_usec(&min, &max, &mean, &dev))
+		base = "(usec)";
 
 	minp = num2str(min, 6, 1, 0, N2S_NONE);
 	maxp = num2str(max, 6, 1, 0, N2S_NONE);
@@ -388,8 +417,8 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
 	const char *str[] = { " read", "write", " trim" };
-	unsigned long min, max, runt;
-	unsigned long long bw, iops;
+	unsigned long runt;
+	unsigned long long min, max, bw, iops;
 	double mean, dev;
 	char *io_p, *bw_p, *bw_p_alt, *iops_p;
 	int i2p;
@@ -467,7 +496,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			bw_str = (rs->unit_base == 1 ? "Mibit" : "MiB");
 		}
 
-		log_buf(out, "   bw (%5s/s): min=%5lu, max=%5lu, per=%3.2f%%, avg=%5.02f, stdev=%5.02f\n",
+		log_buf(out, "   bw (%5s/s): min=%5llu, max=%5llu, per=%3.2f%%, avg=%5.02f, stdev=%5.02f\n",
 			bw_str, min, max, p_of_agg, mean, dev);
 	}
 }
@@ -502,6 +531,14 @@ static int show_lat(double *io_u_lat, int nr, const char **ranges,
 	return shown;
 }
 
+static void show_lat_n(double *io_u_lat_n, struct buf_output *out)
+{
+	const char *ranges[] = { "2=", "4=", "10=", "20=", "50=", "100=",
+				 "250=", "500=", "750=", "1000=", };
+
+	show_lat(io_u_lat_n, FIO_IO_U_LAT_N_NR, ranges, "nsec", out);
+}
+
 static void show_lat_u(double *io_u_lat_u, struct buf_output *out)
 {
 	const char *ranges[] = { "2=", "4=", "10=", "20=", "50=", "100=",
@@ -521,12 +558,15 @@ static void show_lat_m(double *io_u_lat_m, struct buf_output *out)
 
 static void show_latencies(struct thread_stat *ts, struct buf_output *out)
 {
+	double io_u_lat_n[FIO_IO_U_LAT_N_NR];
 	double io_u_lat_u[FIO_IO_U_LAT_U_NR];
 	double io_u_lat_m[FIO_IO_U_LAT_M_NR];
 
+	stat_calc_lat_n(ts, io_u_lat_n);
 	stat_calc_lat_u(ts, io_u_lat_u);
 	stat_calc_lat_m(ts, io_u_lat_m);
 
+	show_lat_n(io_u_lat_n, out);
 	show_lat_u(io_u_lat_u, out);
 	show_lat_m(io_u_lat_m, out);
 }
@@ -818,11 +858,10 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 				   struct group_run_stats *rs, int ddir,
 				   struct buf_output *out)
 {
-	unsigned long min, max;
-	unsigned long long bw, iops;
-	unsigned int *ovals = NULL;
+	unsigned long long min, max, minv, maxv, bw, iops;
+	unsigned long long *ovals = NULL;
 	double mean, dev;
-	unsigned int len, minv, maxv;
+	unsigned int len;
 	int i;
 
 	assert(ddir_rw(ddir));
@@ -840,14 +879,14 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 					(unsigned long long) ts->runtime[ddir]);
 
 	if (calc_lat(&ts->slat_stat[ddir], &min, &max, &mean, &dev))
-		log_buf(out, ";%lu;%lu;%f;%f", min, max, mean, dev);
+		log_buf(out, ";%llu;%llu;%f;%f", min/1000, max/1000, mean/1000, dev/1000);
 	else
-		log_buf(out, ";%lu;%lu;%f;%f", 0UL, 0UL, 0.0, 0.0);
+		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
 	if (calc_lat(&ts->clat_stat[ddir], &min, &max, &mean, &dev))
-		log_buf(out, ";%lu;%lu;%f;%f", min, max, mean, dev);
+		log_buf(out, ";%llu;%llu;%f;%f", min/1000, max/1000, mean/1000, dev/1000);
 	else
-		log_buf(out, ";%lu;%lu;%f;%f", 0UL, 0UL, 0.0, 0.0);
+		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
 	if (ts->clat_percentiles) {
 		len = calc_clat_percentiles(ts->io_u_plat[ddir],
@@ -862,13 +901,13 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 			log_buf(out, ";0%%=0");
 			continue;
 		}
-		log_buf(out, ";%f%%=%u", ts->percentile_list[i].u.f, ovals[i]);
+		log_buf(out, ";%f%%=%llu", ts->percentile_list[i].u.f, ovals[i]/1000);
 	}
 
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
-		log_buf(out, ";%lu;%lu;%f;%f", min, max, mean, dev);
+		log_buf(out, ";%llu;%llu;%f;%f", min/1000, max/1000, mean/1000, dev/1000);
 	else
-		log_buf(out, ";%lu;%lu;%f;%f", 0UL, 0UL, 0.0, 0.0);
+		log_buf(out, ";%llu;%llu;%f;%f", 0ULL, 0ULL, 0.0, 0.0);
 
 	if (ovals)
 		free(ovals);
@@ -882,19 +921,19 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 				p_of_agg = 100.0;
 		}
 
-		log_buf(out, ";%lu;%lu;%f%%;%f;%f", min, max, p_of_agg, mean, dev);
+		log_buf(out, ";%llu;%llu;%f%%;%f;%f", min, max, p_of_agg, mean, dev);
 	} else
-		log_buf(out, ";%lu;%lu;%f%%;%f;%f", 0UL, 0UL, 0.0, 0.0, 0.0);
+		log_buf(out, ";%llu;%llu;%f%%;%f;%f", 0ULL, 0ULL, 0.0, 0.0, 0.0);
 }
 
 static void add_ddir_status_json(struct thread_stat *ts,
 		struct group_run_stats *rs, int ddir, struct json_object *parent)
 {
-	unsigned long min, max;
+	unsigned long long min, max, minv, maxv;
 	unsigned long long bw;
-	unsigned int *ovals = NULL;
+	unsigned long long *ovals = NULL;
 	double mean, dev, iops;
-	unsigned int len, minv, maxv;
+	unsigned int len;
 	int i;
 	const char *ddirname[] = {"read", "write", "trim"};
 	struct json_object *dir_object, *tmp_object, *percentile_object, *clat_bins_object;
@@ -933,7 +972,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		mean = dev = 0.0;
 	}
 	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "slat", tmp_object);
+	json_object_add_value_object(dir_object, "slat_ns", tmp_object);
 	json_object_add_value_int(tmp_object, "min", min);
 	json_object_add_value_int(tmp_object, "max", max);
 	json_object_add_value_float(tmp_object, "mean", mean);
@@ -944,7 +983,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		mean = dev = 0.0;
 	}
 	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "clat", tmp_object);
+	json_object_add_value_object(dir_object, "clat_ns", tmp_object);
 	json_object_add_value_int(tmp_object, "min", min);
 	json_object_add_value_int(tmp_object, "max", max);
 	json_object_add_value_float(tmp_object, "mean", mean);
@@ -985,7 +1024,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		mean = dev = 0.0;
 	}
 	tmp_object = json_create_object();
-	json_object_add_value_object(dir_object, "lat", tmp_object);
+	json_object_add_value_object(dir_object, "lat_ns", tmp_object);
 	json_object_add_value_int(tmp_object, "min", min);
 	json_object_add_value_int(tmp_object, "max", max);
 	json_object_add_value_float(tmp_object, "mean", mean);
@@ -1047,7 +1086,7 @@ static void show_thread_status_terse_v2(struct thread_stat *ts,
 
 	/* Calc % distribution of IO depths, usecond, msecond latency */
 	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
-	stat_calc_lat_u(ts, io_u_lat_u);
+	stat_calc_lat_nu(ts, io_u_lat_u);
 	stat_calc_lat_m(ts, io_u_lat_m);
 
 	/* Only show fixed 7 I/O depth levels*/
@@ -1112,7 +1151,7 @@ static void show_thread_status_terse_v3_v4(struct thread_stat *ts,
 
 	/* Calc % distribution of IO depths, usecond, msecond latency */
 	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
-	stat_calc_lat_u(ts, io_u_lat_u);
+	stat_calc_lat_nu(ts, io_u_lat_u);
 	stat_calc_lat_m(ts, io_u_lat_m);
 
 	/* Only show fixed 7 I/O depth levels*/
@@ -1173,6 +1212,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 	struct json_object *root, *tmp;
 	struct jobs_eta *je;
 	double io_u_dist[FIO_IO_U_MAP_NR];
+	double io_u_lat_n[FIO_IO_U_LAT_N_NR];
 	double io_u_lat_u[FIO_IO_U_LAT_U_NR];
 	double io_u_lat_m[FIO_IO_U_LAT_M_NR];
 	double usr_cpu, sys_cpu;
@@ -1217,6 +1257,7 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 
 	/* Calc % distribution of IO depths, usecond, msecond latency */
 	stat_calc_dist(ts->io_u_map, ddir_rw_sum(ts->total_io_u), io_u_dist);
+	stat_calc_lat_n(ts, io_u_lat_n);
 	stat_calc_lat_u(ts, io_u_lat_u);
 	stat_calc_lat_m(ts, io_u_lat_m);
 
@@ -1232,9 +1273,17 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		json_object_add_value_float(tmp, (const char *)name, io_u_dist[i]);
 	}
 
+	/* Nanosecond latency */
 	tmp = json_create_object();
-	json_object_add_value_object(root, "latency_us", tmp);
+	json_object_add_value_object(root, "latency_ns", tmp);
+	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++) {
+		const char *ranges[] = { "2", "4", "10", "20", "50", "100",
+				 "250", "500", "750", "1000", };
+		json_object_add_value_float(tmp, ranges[i], io_u_lat_n[i]);
+	}
 	/* Microsecond latency */
+	tmp = json_create_object();
+	json_object_add_value_object(root, "latency_us", tmp);
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++) {
 		const char *ranges[] = { "2", "4", "10", "20", "50", "100",
 				 "250", "500", "750", "1000", };
@@ -1494,6 +1543,8 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src,
 		dst->io_u_submit[k] += src->io_u_submit[k];
 	for (k = 0; k < FIO_IO_U_MAP_NR; k++)
 		dst->io_u_complete[k] += src->io_u_complete[k];
+	for (k = 0; k < FIO_IO_U_LAT_N_NR; k++)
+		dst->io_u_lat_n[k] += src->io_u_lat_n[k];
 	for (k = 0; k < FIO_IO_U_LAT_U_NR; k++)
 		dst->io_u_lat_u[k] += src->io_u_lat_u[k];
 	for (k = 0; k < FIO_IO_U_LAT_M_NR; k++)
@@ -1849,22 +1900,22 @@ void __show_running_run_stats(void)
 {
 	struct thread_data *td;
 	unsigned long long *rt;
-	struct timeval tv;
+	struct timespec ts;
 	int i;
 
 	fio_mutex_down(stat_mutex);
 
 	rt = malloc(thread_number * sizeof(unsigned long long));
-	fio_gettime(&tv, NULL);
+	fio_gettime(&ts, NULL);
 
 	for_each_td(td, i) {
 		td->update_rusage = 1;
 		td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ];
 		td->ts.io_bytes[DDIR_WRITE] = td->io_bytes[DDIR_WRITE];
 		td->ts.io_bytes[DDIR_TRIM] = td->io_bytes[DDIR_TRIM];
-		td->ts.total_run_time = mtime_since(&td->epoch, &tv);
+		td->ts.total_run_time = mtime_since(&td->epoch, &ts);
 
-		rt[i] = mtime_since(&td->start, &tv);
+		rt[i] = mtime_since(&td->start, &ts);
 		if (td_read(td) && td->ts.io_bytes[DDIR_READ])
 			td->ts.runtime[DDIR_READ] += rt[i];
 		if (td_write(td) && td->ts.io_bytes[DDIR_WRITE])
@@ -1899,7 +1950,7 @@ void __show_running_run_stats(void)
 }
 
 static int status_interval_init;
-static struct timeval status_time;
+static struct timespec status_time;
 static int status_file_disabled;
 
 #define FIO_STATUS_FILE		"fio-dump-status"
@@ -1955,7 +2006,7 @@ void check_for_running_stats(void)
 	}
 }
 
-static inline void add_stat_sample(struct io_stat *is, unsigned long data)
+static inline void add_stat_sample(struct io_stat *is, unsigned long long data)
 {
 	double val = data;
 	double delta;
@@ -2188,6 +2239,8 @@ void reset_io_stats(struct thread_data *td)
 		ts->io_u_complete[i] = 0;
 	}
 
+	for (i = 0; i < FIO_IO_U_LAT_N_NR; i++)
+		ts->io_u_lat_n[i] = 0;
 	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		ts->io_u_lat_u[i] = 0;
 	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
@@ -2303,16 +2356,16 @@ void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned int
 }
 
 static void add_clat_percentile_sample(struct thread_stat *ts,
-				unsigned long usec, enum fio_ddir ddir)
+				unsigned long long nsec, enum fio_ddir ddir)
 {
-	unsigned int idx = plat_val_to_idx(usec);
+	unsigned int idx = plat_val_to_idx(nsec);
 	assert(idx < FIO_IO_U_PLAT_NR);
 
 	ts->io_u_plat[ddir][idx]++;
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
-		     unsigned long usec, unsigned int bs, uint64_t offset)
+		     unsigned long long nsec, unsigned int bs, uint64_t offset)
 {
 	unsigned long elapsed, this_window;
 	struct thread_stat *ts = &td->ts;
@@ -2320,14 +2373,14 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	td_io_u_lock(td);
 
-	add_stat_sample(&ts->clat_stat[ddir], usec);
+	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
 	if (td->clat_log)
-		add_log_sample(td, td->clat_log, sample_val(usec), ddir, bs,
+		add_log_sample(td, td->clat_log, sample_val(nsec), ddir, bs,
 			       offset);
 
 	if (ts->clat_percentiles)
-		add_clat_percentile_sample(ts, usec, ddir);
+		add_clat_percentile_sample(ts, nsec, ddir);
 
 	if (iolog && iolog->hist_msec) {
 		struct io_hist *hw = &iolog->hist_window[ddir];
@@ -2389,7 +2442,7 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 }
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
-		    unsigned long usec, unsigned int bs, uint64_t offset)
+		    unsigned long long nsec, unsigned int bs, uint64_t offset)
 {
 	struct thread_stat *ts = &td->ts;
 
@@ -2398,23 +2451,23 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 	td_io_u_lock(td);
 
-	add_stat_sample(&ts->lat_stat[ddir], usec);
+	add_stat_sample(&ts->lat_stat[ddir], nsec);
 
 	if (td->lat_log)
-		add_log_sample(td, td->lat_log, sample_val(usec), ddir, bs,
+		add_log_sample(td, td->lat_log, sample_val(nsec), ddir, bs,
 			       offset);
 
 	td_io_u_unlock(td);
 }
 
 void add_bw_sample(struct thread_data *td, struct io_u *io_u,
-		   unsigned int bytes, unsigned long spent)
+		   unsigned int bytes, unsigned long long spent)
 {
 	struct thread_stat *ts = &td->ts;
 	unsigned long rate;
 
 	if (spent)
-		rate = bytes * 1000 / spent;
+		rate = (unsigned long) (bytes * 1000000ULL / spent);
 	else
 		rate = 0;
 
@@ -2430,8 +2483,8 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 	td_io_u_unlock(td);
 }
 
-static int __add_samples(struct thread_data *td, struct timeval *parent_tv,
-			 struct timeval *t, unsigned int avg_time,
+static int __add_samples(struct thread_data *td, struct timespec *parent_tv,
+			 struct timespec *t, unsigned int avg_time,
 			 uint64_t *this_io_bytes, uint64_t *stat_io_bytes,
 			 struct io_stat *stat, struct io_log *log,
 			 bool is_kb)
@@ -2481,7 +2534,7 @@ static int __add_samples(struct thread_data *td, struct timeval *parent_tv,
 		stat_io_bytes[ddir] = this_io_bytes[ddir];
 	}
 
-	timeval_add_msec(parent_tv, avg_time);
+	timespec_add_msec(parent_tv, avg_time);
 
 	td_io_u_unlock(td);
 
@@ -2493,7 +2546,7 @@ static int __add_samples(struct thread_data *td, struct timeval *parent_tv,
 	return min(next, next_log);
 }
 
-static int add_bw_samples(struct thread_data *td, struct timeval *t)
+static int add_bw_samples(struct thread_data *td, struct timespec *t)
 {
 	return __add_samples(td, &td->bw_sample_time, t, td->o.bw_avg_time,
 				td->this_io_bytes, td->stat_io_bytes,
@@ -2517,7 +2570,7 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 	td_io_u_unlock(td);
 }
 
-static int add_iops_samples(struct thread_data *td, struct timeval *t)
+static int add_iops_samples(struct thread_data *td, struct timespec *t)
 {
 	return __add_samples(td, &td->iops_sample_time, t, td->o.iops_avg_time,
 				td->this_io_blocks, td->stat_io_blocks,
@@ -2531,7 +2584,7 @@ int calc_log_samples(void)
 {
 	struct thread_data *td;
 	unsigned int next = ~0U, tmp;
-	struct timeval now;
+	struct timespec now;
 	int i;
 
 	fio_gettime(&now, NULL);
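The reworked `show_clat_percentiles()` now builds its per-entry format at run time. It sizes the value column to the number of digits in the scaled maximum, with a minimum of 5, and fits as many entries as possible on an 80-column line. Below is a minimal sketch of that computation. `build_pct_fmt()` and `decimal_width()` are hypothetical names. The digit-counting loop replaces the patch's `(int) (log10(maxv / divisor) + 1)`. It should give the same result for typical latency values, and it also handles a zero quotient without calling `log10(0)`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count decimal digits of v (at least 1) */
static int decimal_width(unsigned long long v)
{
	int width = 1;

	while (v >= 10) {
		v /= 10;
		width++;
	}
	return width;
}

/* Build the per-entry format used for the percentile list and return
 * how many entries fit on an 80-column line. */
static int build_pct_fmt(char *fmt, size_t len, unsigned int precision,
			 unsigned long long maxv, unsigned int divisor)
{
	int time_width;

	assert(divisor > 0);
	time_width = decimal_width(maxv / divisor);
	if (time_width < 5)
		time_width = 5;	/* never narrower than the old %5u column */

	/* e.g. precision 2, width 5 -> " %5.2fth=[%5llu]%c" */
	snprintf(fmt, len, " %%%u.%ufth=[%%%dllu]%%c", precision + 3,
		 precision, time_width);
	return (80 - 7) / (int) (precision + 10 + time_width);
}
```

With precision 2 and a maximum of 12345, this produces `" %5.2fth=[%5llu]%c"` with four entries per line. A 9-digit maximum widens the column to `%9llu` and reduces the line to three entries.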
diff --git a/stat.h b/stat.h
index d8a0803..132dee3 100644
--- a/stat.h
+++ b/stat.h
@@ -19,6 +19,7 @@ struct group_run_stats {
  * How many depth levels to log
  */
 #define FIO_IO_U_MAP_NR	7
+#define FIO_IO_U_LAT_N_NR 10
 #define FIO_IO_U_LAT_U_NR 10
 #define FIO_IO_U_LAT_M_NR 12
 
@@ -108,7 +109,7 @@ struct group_run_stats {
 
 #define FIO_IO_U_PLAT_BITS 6
 #define FIO_IO_U_PLAT_VAL (1 << FIO_IO_U_PLAT_BITS)
-#define FIO_IO_U_PLAT_GROUP_NR 19
+#define FIO_IO_U_PLAT_GROUP_NR 29
 #define FIO_IO_U_PLAT_NR (FIO_IO_U_PLAT_GROUP_NR * FIO_IO_U_PLAT_VAL)
 #define FIO_IO_U_LIST_MAX_LEN 20 /* The size of the default and user-specified
 					list of percentiles */
@@ -178,6 +179,7 @@ struct thread_stat {
 	uint32_t io_u_map[FIO_IO_U_MAP_NR];
 	uint32_t io_u_submit[FIO_IO_U_MAP_NR];
 	uint32_t io_u_complete[FIO_IO_U_MAP_NR];
+	uint32_t io_u_lat_n[FIO_IO_U_LAT_N_NR];
 	uint32_t io_u_lat_u[FIO_IO_U_LAT_U_NR];
 	uint32_t io_u_lat_m[FIO_IO_U_LAT_M_NR];
 	uint32_t io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
@@ -286,8 +288,9 @@ extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats
 extern void init_thread_stat(struct thread_stat *ts);
 extern void init_group_run_stat(struct group_run_stats *gs);
 extern void eta_to_str(char *str, unsigned long eta_sec);
-extern bool calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max, double *mean, double *dev);
-extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr, fio_fp64_t *plist, unsigned int **output, unsigned int *maxv, unsigned int *minv);
+extern bool calc_lat(struct io_stat *is, unsigned long long *min, unsigned long long *max, double *mean, double *dev);
+extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr, fio_fp64_t *plist, unsigned long long **output, unsigned long long *maxv, unsigned long long *minv);
+extern void stat_calc_lat_n(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_m(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_u(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_dist(unsigned int *map, unsigned long total, double *io_u_dist);
@@ -295,9 +298,9 @@ extern void reset_io_stats(struct thread_data *);
 extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
-extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long,
+extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned int, uint64_t);
-extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long,
+extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned int, uint64_t);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
 				unsigned int, uint64_t);
@@ -305,16 +308,17 @@ extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned int);
 extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
-				unsigned int, unsigned long);
+				unsigned int, unsigned long long);
 extern int calc_log_samples(void);
 
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
 extern int write_bw_log;
 
-static inline bool usec_to_msec(unsigned long *min, unsigned long *max,
-				double *mean, double *dev)
+static inline bool nsec_to_usec(unsigned long long *min,
+				unsigned long long *max, double *mean,
+				double *dev)
 {
-	if (*min > 1000 && *max > 1000 && *mean > 1000.0 && *dev > 1000.0) {
+	if (*min > 2000 && *max > 99999 && *dev > 1000.0) {
 		*min /= 1000;
 		*max /= 1000;
 		*mean /= 1000.0;
@@ -324,6 +328,22 @@ static inline bool usec_to_msec(unsigned long *min, unsigned long *max,
 
 	return false;
 }
+
+static inline bool nsec_to_msec(unsigned long long *min,
+				unsigned long long *max, double *mean,
+				double *dev)
+{
+	if (*min > 2000000 && *max > 99999999ULL && *dev > 1000000.0) {
+		*min /= 1000000;
+		*max /= 1000000;
+		*mean /= 1000000.0;
+		*dev /= 1000000.0;
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Worst level condensing would be 1:5, so allow enough room for that
  */
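The stat.h change from 19 to 29 histogram groups goes together with `plat_val_to_idx()` now accepting 64-bit nanosecond values. Only the first lines of that function appear in the stat.c hunk above, so the body below is an illustrative reconstruction, not the committed code. Each group beyond the first keeps `FIO_IO_U_PLAT_BITS` significant bits. `plat_max_unclamped()` is a hypothetical helper that returns the largest value whose bucket is computed rather than clamped.

```c
#include <assert.h>

/* Histogram geometry as set by the patched stat.h */
#define FIO_IO_U_PLAT_BITS 6
#define FIO_IO_U_PLAT_VAL (1 << FIO_IO_U_PLAT_BITS)
#define FIO_IO_U_PLAT_GROUP_NR 29
#define FIO_IO_U_PLAT_NR (FIO_IO_U_PLAT_GROUP_NR * FIO_IO_U_PLAT_VAL)

/* Map a latency sample (nsec) to a histogram bucket index */
static unsigned int plat_val_to_idx(unsigned long long val)
{
	const unsigned int last = FIO_IO_U_PLAT_NR - 1;
	unsigned int msb, error_bits, base, offset, idx;

	/* Index of the most significant set bit (GCC/Clang builtin) */
	msb = val ? (unsigned int) (sizeof(val) * 8 - __builtin_clzll(val) - 1) : 0;

	/* Small values get one bucket each */
	if (msb <= FIO_IO_U_PLAT_BITS)
		return (unsigned int) val;

	/* Keep FIO_IO_U_PLAT_BITS significant bits, drop the rest */
	error_bits = msb - FIO_IO_U_PLAT_BITS;
	base = (error_bits + 1) << FIO_IO_U_PLAT_BITS;
	offset = (unsigned int) ((FIO_IO_U_PLAT_VAL - 1) & (val >> error_bits));

	/* Values beyond the last group are clamped into the final bucket */
	idx = base + offset;
	return idx < last ? idx : last;
}

/* base + 63 <= NR - 1 requires error_bits <= GROUP_NR - 2 */
static unsigned long long plat_max_unclamped(void)
{
	unsigned int max_msb = (FIO_IO_U_PLAT_GROUP_NR - 2) + FIO_IO_U_PLAT_BITS;

	assert(max_msb < 63);
	return (1ULL << (max_msb + 1)) - 1;
}
```

Under these assumptions, 29 groups at nanosecond resolution cover values up to 2^34 - 1 ns, about 17.2 s. The old 19 groups at microsecond resolution covered values up to 2^24 - 1 usec, about 16.8 s. The ten extra groups therefore appear to keep roughly the same range after the unit change.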
diff --git a/steadystate.c b/steadystate.c
index 98f027c..45d4f5d 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -196,7 +196,7 @@ void steadystate_check(void)
 	int i, j, ddir, prev_groupid, group_ramp_time_over = 0;
 	unsigned long rate_time;
 	struct thread_data *td, *td2;
-	struct timeval now;
+	struct timespec now;
 	uint64_t group_bw = 0, group_iops = 0;
 	uint64_t td_iops, td_bytes;
 	bool ret;
diff --git a/steadystate.h b/steadystate.h
index 20ccd30..bbc3945 100644
--- a/steadystate.h
+++ b/steadystate.h
@@ -35,7 +35,7 @@ struct steadystate_data {
 	uint64_t sum_xy;
 	uint64_t oldest_y;
 
-	struct timeval prev_time;
+	struct timespec prev_time;
 	uint64_t prev_iops;
 	uint64_t prev_bytes;
 };
diff --git a/t/arch.c b/t/arch.c
index befb7c7..bd28a84 100644
--- a/t/arch.c
+++ b/t/arch.c
@@ -1,5 +1,5 @@
 #include "../arch/arch.h"
 
 unsigned long arch_flags = 0;
-int tsc_reliable;
+bool tsc_reliable;
 int arch_random;
diff --git a/t/debug.c b/t/debug.c
index bf6f460..8965cfb 100644
--- a/t/debug.c
+++ b/t/debug.c
@@ -1,7 +1,7 @@
 #include <stdio.h>
 
 FILE *f_err;
-struct timeval *fio_tv = NULL;
+struct timespec *fio_ts = NULL;
 unsigned long fio_debug = 0;
 
 void __dprint(int type, const char *str, ...)
diff --git a/t/dedupe.c b/t/dedupe.c
index 1f172a2..c3b837f 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -334,7 +334,7 @@ static void *thread_fn(void *data)
 static void show_progress(struct worker_thread *threads, unsigned long total)
 {
 	unsigned long last_nitems = 0;
-	struct timeval last_tv;
+	struct timespec last_tv;
 
 	fio_gettime(&last_tv, NULL);
 
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index 7016f26..4009b62 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -27,7 +27,7 @@ void usage()
 int main(int argc, char *argv[])
 {
 	int r;
-	struct timeval start, end;
+	struct timespec start, end;
 	struct fio_lfsr *fl;
 	int verify = 0;
 	unsigned int spin = 0;
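The comment block at the top of the new t/time-test.c derives the multiply-and-shift conversion nsec = (ticks * clock_mult) >> clock_shift. Below is a minimal sketch of strategy (i). It assumes an integer cycles_per_usec and the same one-year window as MAX_CLOCK_SEC. `calc_clock_conv()`, `ticks_to_nsec()`, `struct clock_conv` and `SKETCH_MAX_CLOCK_SEC` are illustrative names, not the file's actual code.

```c
#include <assert.h>
#include <limits.h>

/* Assumed conversion window, matching MAX_CLOCK_SEC in t/time-test.c */
#define SKETCH_MAX_CLOCK_SEC (365ULL * 24 * 60 * 60)

struct clock_conv {
	unsigned int shift;		/* clock_shift */
	unsigned long long mult;	/* clock_mult */
};

/*
 * Choose the largest clock_shift for which max_ticks * clock_mult cannot
 * overflow, where clock_mult = 2^clock_shift / cycles_per_nsec
 *                            = 2^clock_shift * 1000 / cycles_per_usec.
 */
static struct clock_conv calc_clock_conv(unsigned long long cycles_per_usec)
{
	unsigned long long max_ticks, max_mult;
	struct clock_conv c = { 0, 0 };

	assert(cycles_per_usec > 0);
	assert(cycles_per_usec <= ULLONG_MAX / (SKETCH_MAX_CLOCK_SEC * 1000000ULL));
	max_ticks = SKETCH_MAX_CLOCK_SEC * 1000000ULL * cycles_per_usec;
	max_mult = ULLONG_MAX / max_ticks;

	/* 2^53 * 1000 still fits in 64 bits, so stop there */
	while (c.shift < 53 &&
	       (1ULL << (c.shift + 1)) * 1000ULL / cycles_per_usec <= max_mult)
		c.shift++;

	c.mult = (1ULL << c.shift) * 1000ULL / cycles_per_usec;
	return c;
}

/* nsec = (ticks * clock_mult) >> clock_shift */
static unsigned long long ticks_to_nsec(const struct clock_conv *c,
					unsigned long long ticks)
{
	return (ticks * c->mult) >> c->shift;
}
```

With an assumed 3.0 GHz clock (cycles_per_usec = 3000) and a one-year window, the sketch selects clock_shift = 9 and clock_mult = 170, where the exact value would be 512/3, about 170.67. As a result, 3000 ticks convert to 996 ns instead of 1000 ns. The two-stage and 128-bit strategies listed in the comment appear to target this truncation error of roughly 0.4%.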
diff --git a/t/time-test.c b/t/time-test.c
new file mode 100644
index 0000000..a74d920
--- /dev/null
+++ b/t/time-test.c
@@ -0,0 +1,544 @@
+/*
+ * Carry out arithmetic to explore conversion of CPU clock ticks to nsec
+ *
+ * When we use the CPU clock for timing, we do the following:
+ *
+ * 1) Calibrate the CPU clock to relate the frequency of CPU clock ticks
+ *    to actual time.
+ *
+ *    Using gettimeofday() or clock_gettime(), count how many CPU clock
+ *    ticks occur per usec
+ *
+ * 2) Calculate conversion factors so that we can ultimately convert
+ *    from clock ticks to nsec with
+ *      nsec = (ticks * clock_mult) >> clock_shift
+ *
+ *    This is equivalent to
+ *	nsec = ticks * (MULTIPLIER / cycles_per_nsec) / MULTIPLIER
+ *    where
+ *	clock_mult = MULTIPLIER / cycles_per_nsec
+ *      MULTIPLIER = 2^clock_shift
+ *
+ *    It would be simpler to just calculate nsec = ticks / cycles_per_nsec,
+ *    but all of this is necessary because of rounding when calculating
+ *    cycles_per_nsec. With a 3.0GHz CPU, cycles_per_nsec would simply
+ *    be 3. But with a 3.33GHz CPU or a 4.5GHz CPU, the fractional
+ *    portion is lost with integer arithmetic.
+ *
+ *    This multiply and shift calculation also has a performance benefit
+ *    as multiplication and bit shift operations are faster than integer
+ *    division.
+ *
+ * 3) Dynamically determine clock_shift and clock_mult at run time based
+ *    on MAX_CLOCK_SEC and cycles_per_usec. MAX_CLOCK_SEC is the maximum
+ *    duration for which the conversion will be valid.
+ *
+ *    The primary constraint is that (ticks * clock_mult) must not overflow
+ *    when ticks is at its maximum value.
+ *
+ *    So we have
+ *	max_ticks = MAX_CLOCK_SEC * 1000000000 * cycles_per_nsec
+ *	max_ticks * clock_mult <= ULLONG_MAX
+ *	max_ticks * MULTIPLIER / cycles_per_nsec <= ULLONG_MAX
+ *      MULTIPLIER <= ULLONG_MAX * cycles_per_nsec / max_ticks
+ *
+ *    Then choose the largest clock_shift that satisfies
+ *	2^clock_shift <= ULLONG_MAX * cycles_per_nsec / max_ticks
+ *
+ *    Finally calculate the appropriate clock_mult associated with clock_shift
+ *	clock_mult = 2^clock_shift / cycles_per_nsec
+ *
+ * 4) In the code below we have cycles_per_usec and use
+ *	cycles_per_nsec = cycles_per_usec / 1000
+ *
+ *
+ * The code below implements 4 of the following 5 clock tick to nsec conversion strategies
+ *
+ *   i) 64-bit arithmetic for the (ticks * clock_mult) product with the
+ *	conversion valid for at most MAX_CLOCK_SEC
+ *
+ *  ii) NOT IMPLEMENTED Use 64-bit integers to emulate 128-bit multiplication
+ *	for the (ticks * clock_mult) product
+ *
+ * iii) 64-bit arithmetic with clock ticks to nsec conversion occurring in
+ *	two stages. The first stage counts the number of discrete, large chunks
+ *	of time that have elapsed. To this is added the time represented by
+ *	the remaining clock ticks. This gives better accuracy because the
+ *	(ticks * clock_mult) product covers only the final fractional chunk,
+ *	so a larger clock_shift (a more precise clock_mult) can be used.
+ *
+ *  iv) 64-bit arithmetic with the clock ticks to nsec conversion occurring in
+ *	two stages. This is carried out using locks to update the number of
+ *	large time chunks (MAX_CLOCK_SEC_2STAGE) that have elapsed.
+ *
+ *   v) 128-bit arithmetic used for the clock ticks to nsec conversion.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <assert.h>
+#include <stdlib.h>
+#include "lib/seqlock.h"
+
+#define DEBUG 0
+#define MAX_CLOCK_SEC (365*24*60*60ULL)
+#define MAX_CLOCK_SEC_2STAGE (60*60ULL)
+#define dprintf(...) do { if (DEBUG) printf(__VA_ARGS__); } while (0)
+
+enum {
+	__CLOCK64_BIT		= 1 << 0,
+	__CLOCK128_BIT		= 1 << 1,
+	__CLOCK_MULT_SHIFT	= 1 << 2,
+	__CLOCK_EMULATE_128	= 1 << 3,
+	__CLOCK_2STAGE		= 1 << 4,
+	__CLOCK_LOCK		= 1 << 5,
+
+	CLOCK64_MULT_SHIFT	= __CLOCK64_BIT | __CLOCK_MULT_SHIFT,
+	CLOCK64_EMULATE_128	= __CLOCK64_BIT | __CLOCK_EMULATE_128,
+	CLOCK64_2STAGE		= __CLOCK64_BIT | __CLOCK_2STAGE,
+	CLOCK64_LOCK		= __CLOCK64_BIT | __CLOCK_LOCK,
+	CLOCK128_MULT_SHIFT	= __CLOCK128_BIT | __CLOCK_MULT_SHIFT,
+};
+
+static struct seqlock clock_seqlock;
+static unsigned long long cycles_start;
+static unsigned long long elapsed_nsec;
+
+static unsigned int max_cycles_shift;
+static unsigned long long max_cycles_mask;
+static unsigned long long nsecs_for_max_cycles;
+
+static unsigned int clock_shift;
+static unsigned long long clock_mult;
+
+static unsigned long long *nsecs;
+static unsigned long long clock_mult64_128[2];
+static __uint128_t clock_mult128;
+
+/*
+ * Functions for carrying out 128-bit
+ * arithmetic using 64-bit integers
+ *
+ * 128-bit integers are stored as
+ * arrays of two 64-bit integers
+ *
+ * Ordering is little endian
+ *
+ * a[0] has the less significant bits
+ * a[1] has the more significant bits
+ *
+ * NOT FULLY IMPLEMENTED
+ */
+static void do_mult(unsigned long long a[2], unsigned long long b,
+		    unsigned long long product[2])
+{
+	product[0] = product[1] = 0;
+	return;
+}
+
+static void do_div(unsigned long long a[2], unsigned long long b,
+		   unsigned long long c[2])
+{
+	return;
+}
+
+static void do_shift64(unsigned long long a[2], unsigned int count)
+{
+	a[0] = a[1] >> (count-64);
+	a[1] = 0;
+}
+
+static void do_shift(unsigned long long a[2], unsigned int count)
+{
+	if (count > 64)
+		do_shift64(a, count);
+	else {
+		while (count--) {
+			a[0] >>= 1;
+			a[0] |= a[1] << 63;
+			a[1] >>= 1;
+		}
+	}
+}
+
+static void update_clock(unsigned long long t)
+{
+	write_seqlock_begin(&clock_seqlock);
+	elapsed_nsec = (t >> max_cycles_shift) * nsecs_for_max_cycles;
+	cycles_start = t & ~max_cycles_mask;
+	write_seqlock_end(&clock_seqlock);
+}
+
+static unsigned long long _get_nsec(int mode, unsigned long long t)
+{
+	switch(mode) {
+	case CLOCK64_MULT_SHIFT:
+		return (t * clock_mult) >> clock_shift;
+	case CLOCK64_EMULATE_128: {
+		unsigned long long product[2] =  { };
+
+		do_mult(clock_mult64_128, t, product);
+		do_shift(product, clock_shift);
+		return product[0];
+		}
+	case CLOCK64_2STAGE: {
+		unsigned long long multiples, nsec;
+
+		multiples = t >> max_cycles_shift;
+		dprintf("multiples=%llu\n", multiples);
+		nsec = multiples * nsecs_for_max_cycles;
+		nsec += ((t & max_cycles_mask) * clock_mult) >> clock_shift;
+		return nsec;
+		}
+	case CLOCK64_LOCK: {
+		unsigned int seq;
+		unsigned long long nsec;
+
+		do {
+			seq = read_seqlock_begin(&clock_seqlock);
+			nsec = elapsed_nsec;
+			nsec += ((t - cycles_start) * clock_mult) >> clock_shift;
+		} while (read_seqlock_retry(&clock_seqlock, seq));
+		return nsec;
+		}
+	case CLOCK128_MULT_SHIFT:
+		return (unsigned long long)((t * clock_mult128) >> clock_shift);
+	default:
+		assert(0);
+	}
+}
+
+static unsigned long long get_nsec(int mode, unsigned long long t)
+{
+	if (mode == CLOCK64_LOCK) {
+		update_clock(t);
+	}
+
+	return _get_nsec(mode, t);
+}
+
+static void calc_mult_shift(int mode, void *mult, unsigned int *shift,
+			    unsigned long long max_sec,
+			    unsigned long long cycles_per_usec)
+{
+	unsigned long long max_ticks;
+	max_ticks = max_sec * cycles_per_usec * 1000000ULL;
+
+	switch (mode) {
+	case CLOCK64_MULT_SHIFT: {
+		unsigned long long max_mult, tmp;
+		unsigned int sft = 0;
+
+		/*
+		 * Calculate the largest multiplier that will not
+		 * produce a 64-bit overflow in the multiplication
+		 * step of the clock ticks to nsec conversion
+		 */
+		max_mult = ULLONG_MAX / max_ticks;
+		dprintf("max_ticks=%llu, __builtin_clzll=%d, max_mult=%llu\n", max_ticks, __builtin_clzll(max_ticks), max_mult);
+
+		/*
+		 * Find the largest shift count that will produce
+		 * a multiplier less than max_mult
+		 */
+		tmp = max_mult * cycles_per_usec / 1000;
+		while (tmp > 1) {
+			tmp >>= 1;
+			sft++;
+			dprintf("tmp=%llu, sft=%u\n", tmp, sft);
+		}
+
+		*shift = sft;
+		*((unsigned long long *)mult) = (unsigned long long) ((1ULL << sft) * 1000 / cycles_per_usec);
+		break;
+		}
+	case CLOCK64_EMULATE_128: {
+		unsigned long long max_mult[2], tmp[2] = { };
+		unsigned int sft = 0;
+
+		/*
+		 * Calculate the largest multiplier that will not
+		 * produce a 128-bit overflow in the multiplication
+		 * step of the clock ticks to nsec conversion,
+		 * but use only 64-bit integers in the process
+		 */
+		max_mult[0] = max_mult[1] = ULLONG_MAX;
+		do_div(max_mult, max_ticks, max_mult);
+		dprintf("max_ticks=%llu, __builtin_clzll=%d, max_mult=0x%016llx%016llx\n",
+			max_ticks, __builtin_clzll(max_ticks), max_mult[1], max_mult[0]);
+
+		/*
+		 * Find the largest shift count that will produce
+		 * a multiplier less than max_mult
+		 */
+		do_div(max_mult, cycles_per_usec, tmp);
+		do_div(tmp, 1000ULL, tmp);
+		while (tmp[0] > 1 || tmp[1] > 1) {
+			do_shift(tmp, 1);
+			sft++;
+			dprintf("tmp=0x%016llx%016llx, sft=%u\n", tmp[1], tmp[0], sft);
+		}
+
+		*shift = sft;
+//		*((unsigned long long *)mult) = (__uint128_t) (((__uint128_t)1 << sft) * 1000 / cycles_per_usec);
+		break;
+		}
+	case CLOCK64_2STAGE: {
+		unsigned long long tmp;
+/*
+ * This clock tick to nsec conversion requires two stages.
+ *
+ * Stage 1: Determine how many ~MAX_CLOCK_SEC_2STAGE periods worth of clock ticks
+ * 	have elapsed and set nsecs to the appropriate value for those
+ *	~MAX_CLOCK_SEC_2STAGE periods.
+ * Stage 2: Subtract the ticks for the elapsed ~MAX_CLOCK_SEC_2STAGE periods from
+ *	Stage 1. Convert remaining clock ticks to nsecs and add to previously
+ *	set nsec value.
+ *
+ * To optimize the arithmetic operations, use the greatest power of 2 ticks
+ * less than the number of ticks in MAX_CLOCK_SEC_2STAGE seconds.
+ *
+ */
+		// Use a period shorter than MAX_CLOCK_SEC here for better accuracy
+		calc_mult_shift(CLOCK64_MULT_SHIFT, mult, shift, MAX_CLOCK_SEC_2STAGE, cycles_per_usec);
+
+		// Find the greatest power of 2 clock ticks that is less than the ticks in MAX_CLOCK_SEC_2STAGE
+		max_cycles_shift = max_cycles_mask = 0;
+		tmp = MAX_CLOCK_SEC_2STAGE * 1000000ULL * cycles_per_usec;
+		dprintf("tmp=%llu, max_cycles_shift=%u\n", tmp, max_cycles_shift);
+		while (tmp > 1) {
+			tmp >>= 1;
+			max_cycles_shift++;
+			dprintf("tmp=%llu, max_cycles_shift=%u\n", tmp, max_cycles_shift);
+		}
+		// if we use (1ULL << max_cycles_shift) * 1000 / cycles_per_usec here we will
+		// have a discontinuity every (1ULL << max_cycles_shift) cycles
+		nsecs_for_max_cycles = (1ULL << max_cycles_shift) * *((unsigned long long *)mult) >> *shift;
+
+		// Use a bitmask to calculate ticks % (1ULL << max_cycles_shift)
+		for (tmp = 0; tmp < max_cycles_shift; tmp++)
+			max_cycles_mask |= 1ULL << tmp;
+
+		dprintf("max_cycles_shift=%u, 2^max_cycles_shift=%llu, nsecs_for_max_cycles=%llu, max_cycles_mask=%016llx\n",
+				max_cycles_shift, (1ULL << max_cycles_shift),
+				nsecs_for_max_cycles, max_cycles_mask);
+
+
+		break;
+		}
+	case CLOCK64_LOCK: {
+/*
+ * This clock tick to nsec conversion also requires two stages.
+ *
+ * Stage 1: Add to nsec the current running total of elapsed long periods
+ * Stage 2: Subtract from clock ticks the tick count corresponding to the
+ *	most recently elapsed long period. Convert the remaining ticks to
+ *	nsec and add to the previous nsec value.
+ *
+ * In practice the elapsed nsec from Stage 1 and the tick count subtracted
+ * in Stage 2 will be maintained in a separate thread.
+ *
+ */
+		calc_mult_shift(CLOCK64_2STAGE, mult, shift, MAX_CLOCK_SEC, cycles_per_usec);
+		cycles_start = 0;
+		break;
+		}
+	case CLOCK128_MULT_SHIFT: {
+		__uint128_t max_mult, tmp;
+		unsigned int sft = 0;
+
+		/*
+		 * Calculate the largest multiplier that will not
+		 * produce a 128-bit overflow in the multiplication
+		 * step of the clock ticks to nsec conversion
+		 */
+		max_mult = ((__uint128_t) ULLONG_MAX) << 64 | ULLONG_MAX;
+		max_mult /= max_ticks;
+		dprintf("max_ticks=%llu, __builtin_clzll=%d, max_mult=0x%016llx%016llx\n",
+				max_ticks, __builtin_clzll(max_ticks),
+				(unsigned long long) (max_mult >> 64),
+				(unsigned long long) max_mult);
+
+		/*
+		 * Find the largest shift count that will produce
+		 * a multiplier less than max_mult
+		 */
+		tmp = max_mult * cycles_per_usec / 1000;
+		while (tmp > 1) {
+			tmp >>= 1;
+			sft++;
+			dprintf("tmp=0x%016llx%016llx, sft=%u\n",
+					(unsigned long long) (tmp >> 64),
+					(unsigned long long) tmp, sft);
+		}
+
+		*shift = sft;
+		*((__uint128_t *)mult) = (__uint128_t) (((__uint128_t)1 << sft) * 1000 / cycles_per_usec);
+		break;
+		}
+	}
+}
+
+static int discontinuity(int mode, int delta_ticks, int delta_nsec,
+			 unsigned long long start, unsigned long len)
+{
+	int i;
+	unsigned long mismatches = 0, bad_mismatches = 0;
+	unsigned long long delta, max_mismatch = 0;
+	unsigned long long *ns = nsecs;
+
+	for (i = 0; i < len; ns++, i++) {
+		*ns = get_nsec(mode, start + i);
+		if (i - delta_ticks >= 0) {
+			if (*ns > *(ns - delta_ticks))
+				delta = *ns - *(ns - delta_ticks);
+			else
+				delta = *(ns - delta_ticks) - *ns;
+			if (delta > delta_nsec)
+				delta -= delta_nsec;
+			else
+				delta = delta_nsec - delta;
+			if (delta) {
+				mismatches++;
+				if (delta > 1)
+					bad_mismatches++;
+				if (delta > max_mismatch)
+					max_mismatch = delta;
+			}
+		}
+		if (!bad_mismatches)
+			assert(max_mismatch == 0 || max_mismatch == 1);
+		if (!mismatches)
+			assert(max_mismatch == 0);
+	}
+
+	printf("%lu discontinuities (%lu%%) (%lu errors > 1ns, max delta = %lluns) for ticks = %llu...%llu\n",
+		mismatches, (mismatches * 100) / len, bad_mismatches, max_mismatch, start,
+		start + len - 1);
+	return mismatches;
+}
+
+#define MIN_TICKS 1ULL
+#define LEN 1000000000ULL
+#define NSEC_ONE_SEC 1000000000ULL
+#define TESTLEN 9
+
+static long long test_clock(int mode, int cycles_per_usec, int fast_test,
+			    int quiet, int delta_ticks, int delta_nsec)
+{
+	int i;
+	long long delta;
+	unsigned long long max_ticks;
+	unsigned long long nsecs;
+	void *mult;
+	unsigned long long test_ns[TESTLEN] =
+			{NSEC_ONE_SEC, NSEC_ONE_SEC,
+			 NSEC_ONE_SEC, NSEC_ONE_SEC*60, NSEC_ONE_SEC*60*60,
+			 NSEC_ONE_SEC*60*60*2, NSEC_ONE_SEC*60*60*4,
+			 NSEC_ONE_SEC*60*60*8, NSEC_ONE_SEC*60*60*24};
+	unsigned long long test_ticks[TESTLEN];
+
+	max_ticks = MAX_CLOCK_SEC * (unsigned long long) cycles_per_usec * 1000000ULL;
+
+	switch(mode) {
+	case CLOCK64_MULT_SHIFT:
+		mult = &clock_mult;
+		break;
+	case CLOCK64_EMULATE_128:
+		mult = clock_mult64_128;
+		break;
+	case CLOCK64_2STAGE:
+		mult = &clock_mult;
+		break;
+	case CLOCK64_LOCK:
+		mult = &clock_mult;
+		break;
+	case CLOCK128_MULT_SHIFT:
+		mult = &clock_mult128;
+		break;
+	default:
+		assert(0);
+	}
+	calc_mult_shift(mode, mult, &clock_shift, MAX_CLOCK_SEC, cycles_per_usec);
+	nsecs = get_nsec(mode, max_ticks);
+	delta = nsecs/1000000 - MAX_CLOCK_SEC*1000;
+
+	if (mode == CLOCK64_2STAGE) {
+		test_ns[0] = nsecs_for_max_cycles - 1;
+		test_ns[1] = nsecs_for_max_cycles;
+		test_ticks[0] = (1ULL << max_cycles_shift) - 1;
+		test_ticks[1] = (1ULL << max_cycles_shift);
+
+		for (i = 2; i < TESTLEN; i++)
+			test_ticks[i] = test_ns[i] / 1000 * cycles_per_usec;
+	}
+	else {
+		for (i = 0; i < TESTLEN; i++)
+			test_ticks[i] = test_ns[i] / 1000 * cycles_per_usec;
+	}
+
+	if (!quiet) {
+		printf("cycles_per_usec=%d, delta_ticks=%d, delta_nsec=%d, max_ticks=%llu, shift=%u, 2^shift=%llu\n",
+			cycles_per_usec, delta_ticks, delta_nsec, max_ticks, clock_shift, (1ULL << clock_shift));
+		switch(mode) {
+			case CLOCK64_LOCK:
+			case CLOCK64_2STAGE:
+			case CLOCK64_MULT_SHIFT: {
+				printf("clock_mult=%llu, clock_mult / 2^clock_shift=%f\n",
+					clock_mult, (double) clock_mult / (1ULL << clock_shift));
+				break;
+			}
+			case CLOCK64_EMULATE_128: {
+				printf("clock_mult=0x%016llx%016llx\n",
+					clock_mult64_128[1], clock_mult64_128[0]);
+				break;
+			}
+			case CLOCK128_MULT_SHIFT: {
+				printf("clock_mult=0x%016llx%016llx\n",
+					(unsigned long long) (clock_mult128 >> 64),
+					(unsigned long long) clock_mult128);
+				break;
+			}
+		}
+		printf("get_nsec(max_ticks) = %lluns, should be %lluns, error<=abs(%lld)ms\n",
+			nsecs, MAX_CLOCK_SEC*1000000000ULL, delta);
+	}
+
+	for (i = 0; i < TESTLEN; i++)
+	{
+		nsecs = get_nsec(mode, test_ticks[i]);
+		delta = nsecs > test_ns[i] ? nsecs - test_ns[i] : test_ns[i] - nsecs;
+		if (!quiet || delta > 0)
+			printf("get_nsec(%llu)=%llu, expected %llu, delta=%llu\n",
+				test_ticks[i], nsecs, test_ns[i], delta);
+	}
+
+	if (!fast_test) {
+		discontinuity(mode, delta_ticks, delta_nsec, max_ticks - LEN + 1, LEN);
+		discontinuity(mode, delta_ticks, delta_nsec, MIN_TICKS, LEN);
+	}
+
+	if (!quiet)
+		printf("\n\n");
+
+	return delta;
+}
+
+int main(int argc, char *argv[])
+{
+	nsecs = malloc(LEN * sizeof(unsigned long long));
+
+	test_clock(CLOCK64_LOCK, 3333, 1, 0, 0, 0);
+	test_clock(CLOCK64_LOCK, 1000, 1, 0, 1, 1);
+	test_clock(CLOCK64_LOCK, 1100, 1, 0, 11, 10);
+	test_clock(CLOCK64_LOCK, 3000, 1, 0, 3, 1);
+	test_clock(CLOCK64_LOCK, 3333, 1, 0, 3333, 1000);
+	test_clock(CLOCK64_LOCK, 3392, 1, 0, 424, 125);
+	test_clock(CLOCK64_LOCK, 4500, 1, 0, 9, 2);
+	test_clock(CLOCK64_LOCK, 5000, 1, 0, 5, 1);
+
+	free(nsecs);
+	return 0;
+}
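
The multiply-and-shift conversion described in the t/time-test.c header comment (strategy i, CLOCK64_MULT_SHIFT) can be sketched outside the test harness as follows. This is a minimal illustration, not fio's own code: struct tick_conv, calc_conv() and ticks_to_nsec() are hypothetical names, and the arithmetic restates the CLOCK64_MULT_SHIFT case of calc_mult_shift() for a caller-supplied maximum duration and cycles_per_usec.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical container for the clock_mult/clock_shift pair. */
struct tick_conv {
	unsigned int shift;		/* clock_shift */
	unsigned long long mult;	/* clock_mult ~= 2^shift / cycles_per_nsec */
};

/*
 * Strategy (i): pick the largest clock_shift such that
 * max_ticks * clock_mult still fits in 64 bits, where max_ticks is
 * the tick count after max_sec seconds.
 */
static struct tick_conv calc_conv(unsigned long long max_sec,
				  unsigned long long cycles_per_usec)
{
	struct tick_conv c = { 0, 0 };
	unsigned long long max_ticks, max_mult, tmp;

	assert(max_sec > 0 && cycles_per_usec > 0);
	max_ticks = max_sec * cycles_per_usec * 1000000ULL;

	/* Largest multiplier that cannot overflow for ticks <= max_ticks */
	max_mult = ULLONG_MAX / max_ticks;

	/* 2^shift <= max_mult * cycles_per_nsec (cycles_per_usec / 1000) */
	tmp = max_mult * cycles_per_usec / 1000;
	while (tmp > 1) {
		tmp >>= 1;
		c.shift++;
	}

	/* clock_mult = 2^shift / cycles_per_nsec, truncated */
	c.mult = (1ULL << c.shift) * 1000 / cycles_per_usec;
	return c;
}

/* nsec = (ticks * clock_mult) >> clock_shift, valid for ticks <= max_ticks */
static unsigned long long ticks_to_nsec(const struct tick_conv *c,
					unsigned long long ticks)
{
	return (ticks * c->mult) >> c->shift;
}
```

Assuming the one-year limit used for MAX_CLOCK_SEC, a 1000 cycles/usec clock gets clock_shift = 9 and clock_mult = 512, which is exact, while a 3000 cycles/usec clock gets clock_shift = 9 and clock_mult = 170 instead of about 170.67, so 3000 ticks convert to 996 ns rather than 1000 ns. That truncation is the accuracy problem the two-stage variant addresses.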
diff --git a/time.c b/time.c
index 279ee48..edfe779 100644
--- a/time.c
+++ b/time.c
@@ -3,23 +3,23 @@
 
 #include "fio.h"
 
-static struct timeval genesis;
+static struct timespec genesis;
 static unsigned long ns_granularity;
 
-void timeval_add_msec(struct timeval *tv, unsigned int msec)
+void timespec_add_msec(struct timespec *ts, unsigned int msec)
 {
-	unsigned long adj_usec = 1000 * msec;
+	uint64_t adj_nsec = 1000000ULL * msec;
 
-	tv->tv_usec += adj_usec;
-	if (adj_usec >= 1000000) {
-		unsigned long adj_sec = adj_usec / 1000000;
+	ts->tv_nsec += adj_nsec;
+	if (adj_nsec >= 1000000000) {
+		unsigned long adj_sec = adj_nsec / 1000000000UL;
 
-		tv->tv_usec -=  adj_sec * 1000000;
-		tv->tv_sec += adj_sec;
+		ts->tv_nsec -=  adj_sec * 1000000000UL;
+		ts->tv_sec += adj_sec;
 	}
-	if (tv->tv_usec >= 1000000){
-		tv->tv_usec -= 1000000;
-		tv->tv_sec++;
+	if (ts->tv_nsec >= 1000000000UL){
+		ts->tv_nsec -= 1000000000UL;
+		ts->tv_sec++;
 	}
 }
 
@@ -28,7 +28,7 @@ void timeval_add_msec(struct timeval *tv, unsigned int msec)
  */
 uint64_t usec_spin(unsigned int usec)
 {
-	struct timeval start;
+	struct timespec start;
 	uint64_t t;
 
 	fio_gettime(&start, NULL);
@@ -41,7 +41,7 @@ uint64_t usec_spin(unsigned int usec)
 uint64_t usec_sleep(struct thread_data *td, unsigned long usec)
 {
 	struct timespec req;
-	struct timeval tv;
+	struct timespec tv;
 	uint64_t t = 0;
 
 	do {
@@ -111,13 +111,10 @@ static void parent_update_ramp(struct thread_data *td)
 
 bool ramp_time_over(struct thread_data *td)
 {
-	struct timeval tv;
-
 	if (!td->o.ramp_time || td->ramp_time_over)
 		return true;
 
-	fio_gettime(&tv, NULL);
-	if (utime_since(&td->epoch, &tv) >= td->o.ramp_time) {
+	if (utime_since_now(&td->epoch) >= td->o.ramp_time) {
 		td->ramp_time_over = 1;
 		reset_all_stats(td);
 		td_set_runstate(td, TD_RAMP);
@@ -138,8 +135,7 @@ void fio_time_init(void)
 	 * Check the granularity of the nanosleep function
 	 */
 	for (i = 0; i < 10; i++) {
-		struct timeval tv;
-		struct timespec ts;
+		struct timespec tv, ts;
 		unsigned long elapsed;
 
 		fio_gettime(&tv, NULL);
@@ -170,7 +166,7 @@ void set_epoch_time(struct thread_data *td, int log_unix_epoch)
 	}
 }
 
-void fill_start_time(struct timeval *t)
+void fill_start_time(struct timespec *t)
 {
 	memcpy(t, &genesis, sizeof(genesis));
 }
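
Multiplying an unsigned int millisecond count by a plain 1000000 constant is evaluated in unsigned int arithmetic, which wraps once msec exceeds 4294 (roughly 4.3 s) on platforms with a 32-bit unsigned int. The sketch below performs the same normalization as timespec_add_msec() but widens the intermediate to 64 bits first. ts_add_msec() is a hypothetical name, and the sketch assumes the incoming tv_nsec is already normalized.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper with the same contract as timespec_add_msec():
 * add msec milliseconds to *ts and keep tv_nsec in [0, 1e9).
 * The multiply is done in 64 bits so large msec values cannot wrap.
 */
static void ts_add_msec(struct timespec *ts, unsigned int msec)
{
	uint64_t nsec;

	assert(ts->tv_nsec >= 0 && (uint64_t) ts->tv_nsec < NSEC_PER_SEC);
	nsec = (uint64_t) ts->tv_nsec + 1000000ULL * msec;

	/* Carry whole seconds into tv_sec, keep the remainder in tv_nsec */
	ts->tv_sec += (time_t) (nsec / NSEC_PER_SEC);
	ts->tv_nsec = (long) (nsec % NSEC_PER_SEC);
}
```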
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index ead5e54..ad97a54 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -373,7 +373,7 @@ if __name__ == '__main__':
         help='print warning messages to stderr')
 
     arg('--group_nr',
-        default=19,
+        default=29,
         type=int,
         help='FIO_IO_U_PLAT_GROUP_NR as defined in stat.h')
 
diff --git a/verify.c b/verify.c
index 1c39fa2..ffd8707 100644
--- a/verify.c
+++ b/verify.c
@@ -1167,7 +1167,7 @@ static void __fill_hdr(struct thread_data *td, struct io_u *io_u,
 	hdr->rand_seed = rand_seed;
 	hdr->offset = io_u->offset + header_num * td->o.verify_interval;
 	hdr->time_sec = io_u->start_time.tv_sec;
-	hdr->time_usec = io_u->start_time.tv_usec;
+	hdr->time_usec = io_u->start_time.tv_nsec / 1000;
 	hdr->thread = td->thread_number;
 	hdr->numberio = io_u->numberio;
 	hdr->crc32 = fio_crc32c(p, offsetof(struct verify_header, crc32));
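
Strategy (iii) from the t/time-test.c header comment (the CLOCK64_2STAGE mode) can also be sketched on its own. The names struct two_stage, two_stage_init() and two_stage_nsec() are hypothetical. As the test does with MAX_CLOCK_SEC_2STAGE, the sketch sizes clock_mult and clock_shift for a short window, here assumed to be one hour. It then derives the nanoseconds per power-of-two chunk from that same multiplier so the two stages stay consistent.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical state for a CLOCK64_2STAGE-style conversion. */
struct two_stage {
	unsigned int shift;		/* clock_shift, sized for one window */
	unsigned long long mult;	/* clock_mult, sized for one window */
	unsigned int chunk_shift;	/* log2 of ticks per chunk */
	unsigned long long chunk_mask;	/* (1 << chunk_shift) - 1 */
	unsigned long long chunk_nsec;	/* nsec represented by one chunk */
};

static void two_stage_init(struct two_stage *c, unsigned long long window_sec,
			   unsigned long long cycles_per_usec)
{
	unsigned long long window_ticks, tmp;

	assert(window_sec > 0 && cycles_per_usec > 0);
	window_ticks = window_sec * 1000000ULL * cycles_per_usec;

	/* Stage 2 multiplier: only has to avoid overflow for one window */
	c->shift = 0;
	tmp = (ULLONG_MAX / window_ticks) * cycles_per_usec / 1000;
	while (tmp > 1) {
		tmp >>= 1;
		c->shift++;
	}
	c->mult = (1ULL << c->shift) * 1000 / cycles_per_usec;

	/* Largest power-of-two tick count that fits in the window */
	c->chunk_shift = 0;
	for (tmp = window_ticks; tmp > 1; tmp >>= 1)
		c->chunk_shift++;
	c->chunk_mask = (1ULL << c->chunk_shift) - 1;

	/* Derive the chunk length from mult/shift so the stages agree */
	c->chunk_nsec = ((1ULL << c->chunk_shift) * c->mult) >> c->shift;
}

static unsigned long long two_stage_nsec(const struct two_stage *c,
					 unsigned long long t)
{
	/* Stage 1: whole chunks; stage 2: leftover ticks via mult/shift */
	return (t >> c->chunk_shift) * c->chunk_nsec +
	       (((t & c->chunk_mask) * c->mult) >> c->shift);
}
```

In this sketch a 1000 cycles/usec clock converts ten hours of ticks exactly. A 3000 cycles/usec clock yields clock_shift = 22 and clock_mult = 1398101, and one hour of ticks lands within 1 ms of 3600 s. By comparison, a single-stage multiplier sized for MAX_CLOCK_SEC (one year) at the same rate works out to 170 instead of about 170.67, an error of roughly 14 s per hour.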

* Recent changes (master)
@ 2017-06-20 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-20 12:00 UTC
  To: fio

The following changes since commit 594f7abb0c3e5ea4174b1d450959305f827cbdfd:

  Merge branch 'dev/doc/pkg' of https://github.com/bobsaintcool/fio (2017-06-18 10:32:11 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b2fcbe01bdac01bc5d7f8ddea94f264b9f8c2003:

  Ensure that thread_stat alignment is correct (2017-06-19 16:41:51 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Update write_hint mechanism to latest API

Omri Mor (1):
      Ensure that thread_stat alignment is correct

 cconv.c          |  2 ++
 engines/sync.c   | 26 --------------------------
 fio.h            |  2 +-
 ioengines.c      | 10 +++++-----
 options.c        | 46 ++++++++++++++++++++++++++++------------------
 os/os-linux.h    | 18 +++++++++++++++---
 server.h         |  2 +-
 thread_options.h |  3 +++
 8 files changed, 55 insertions(+), 54 deletions(-)

---

Diff of recent changes:

diff --git a/cconv.c b/cconv.c
index bf4c517..b8d9ddc 100644
--- a/cconv.c
+++ b/cconv.c
@@ -156,6 +156,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->end_fsync = le32_to_cpu(top->end_fsync);
 	o->pre_read = le32_to_cpu(top->pre_read);
 	o->sync_io = le32_to_cpu(top->sync_io);
+	o->write_hint = le32_to_cpu(top->write_hint);
 	o->verify = le32_to_cpu(top->verify);
 	o->do_verify = le32_to_cpu(top->do_verify);
 	o->verifysort = le32_to_cpu(top->verifysort);
@@ -365,6 +366,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->end_fsync = cpu_to_le32(o->end_fsync);
 	top->pre_read = cpu_to_le32(o->pre_read);
 	top->sync_io = cpu_to_le32(o->sync_io);
+	top->write_hint = cpu_to_le32(o->write_hint);
 	top->verify = cpu_to_le32(o->verify);
 	top->do_verify = cpu_to_le32(o->do_verify);
 	top->verifysort = cpu_to_le32(o->verifysort);
diff --git a/engines/sync.c b/engines/sync.c
index 69d5e21..e76bbbb 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -36,7 +36,6 @@ struct syncio_data {
 struct psyncv2_options {
 	void *pad;
 	unsigned int hipri;
-	unsigned int stream;
 };
 
 static struct fio_option options[] = {
@@ -50,29 +49,6 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
-		.name	= "stream",
-		.lname	= "Stream ID",
-		.type	= FIO_OPT_STR,
-		.off1	= offsetof(struct psyncv2_options, stream),
-		.help	= "Set expected write life time",
-		.category = FIO_OPT_C_ENGINE,
-		.group	= FIO_OPT_G_INVALID,
-		.posval = {
-			  { .ival = "short",
-			    .oval = RWF_WRITE_LIFE_SHORT,
-			  },
-			  { .ival = "medium",
-			    .oval = RWF_WRITE_LIFE_MEDIUM,
-			  },
-			  { .ival = "long",
-			    .oval = RWF_WRITE_LIFE_LONG,
-			  },
-			  { .ival = "extreme",
-			    .oval = RWF_WRITE_LIFE_EXTREME,
-			  },
-		},
-	},
-	{
 		.name	= NULL,
 	},
 };
@@ -158,8 +134,6 @@ static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 
 	if (o->hipri)
 		flags |= RWF_HIPRI;
-	if (o->stream)
-		flags |= o->stream;
 
 	iov->iov_base = io_u->xfer_buf;
 	iov->iov_len = io_u->xfer_buflen;
diff --git a/fio.h b/fio.h
index 963cf03..6c06a0c 100644
--- a/fio.h
+++ b/fio.h
@@ -149,7 +149,7 @@ struct thread_data {
 	unsigned int thread_number;
 	unsigned int subjob_number;
 	unsigned int groupid;
-	struct thread_stat ts __attribute__ ((aligned));
+	struct thread_stat ts __attribute__ ((aligned(8)));
 
 	int client_type;
 
diff --git a/ioengines.c b/ioengines.c
index c90a2ca..2d55065 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -472,13 +472,13 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 			goto err;
 		}
 	}
-#ifdef FIO_HAVE_STREAMID
-	if (td->o.fadvise_stream &&
+#ifdef FIO_HAVE_WRITE_HINT
+	if (fio_option_is_set(&td->o, write_hint) &&
 	    (f->filetype == FIO_TYPE_BLOCK || f->filetype == FIO_TYPE_FILE)) {
-		off_t stream = td->o.fadvise_stream;
+		uint64_t hint = td->o.write_hint;
 
-		if (posix_fadvise(f->fd, stream, f->io_size, POSIX_FADV_STREAMID) < 0) {
-			td_verror(td, errno, "fadvise streamid");
+		if (fcntl(f->fd, F_SET_RW_HINT, &hint) < 0) {
+			td_verror(td, errno, "fcntl write hint");
 			goto err;
 		}
 	}
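
The fcntl(F_SET_RW_HINT) call above can be sketched as a standalone helper. parse_write_hint() and set_write_hint() are hypothetical names, and the sketch assumes Linux, where the kernel reads a 64-bit hint through the pointer argument. The fallback constants follow the uapi <linux/fcntl.h> as released in Linux 4.13, where 0 is RWH_WRITE_LIFE_NOT_SET and "none" is 1. The fallbacks that this patch adds to os/os-linux.h number the hints from 0, so check the values against your own headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

/* Fallbacks if the C library headers predate write hints (assumed values
 * taken from the Linux 4.13 uapi header). */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef RWH_WRITE_LIFE_NOT_SET
#define RWH_WRITE_LIFE_NOT_SET	0
#define RWH_WRITE_LIFE_NONE	1
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/* Hypothetical helper: map a write_hint option string to an RWH_* value. */
static int parse_write_hint(const char *s, uint64_t *hint)
{
	static const struct { const char *name; uint64_t val; } map[] = {
		{ "none", RWH_WRITE_LIFE_NONE },
		{ "short", RWH_WRITE_LIFE_SHORT },
		{ "medium", RWH_WRITE_LIFE_MEDIUM },
		{ "long", RWH_WRITE_LIFE_LONG },
		{ "extreme", RWH_WRITE_LIFE_EXTREME },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (!strcmp(s, map[i].name)) {
			*hint = map[i].val;
			return 0;
		}
	}
	return -EINVAL;
}

/* Apply the hint to the file's inode; returns 0 or -errno. */
static int set_write_hint(int fd, uint64_t hint)
{
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	return 0;
}
```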
diff --git a/options.c b/options.c
index 6d799bf..a8fdde4 100644
--- a/options.c
+++ b/options.c
@@ -2355,24 +2355,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 	},
-#ifdef FIO_HAVE_STREAMID
-	{
-		.name	= "fadvise_stream",
-		.lname	= "Fadvise stream",
-		.type	= FIO_OPT_INT,
-		.off1	= offsetof(struct thread_options, fadvise_stream),
-		.help	= "Use fadvise() to set stream ID",
-		.category = FIO_OPT_C_FILE,
-		.group	= FIO_OPT_G_INVALID,
-	},
-#else
-	{
-		.name	= "fadvise_stream",
-		.lname	= "Fadvise stream",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support fadvise stream ID",
-	},
-#endif
 	{
 		.name	= "fsync",
 		.lname	= "Fsync",
@@ -3434,6 +3416,34 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_TYPE,
 	},
+#ifdef FIO_HAVE_WRITE_HINT
+	{
+		.name	= "write_hint",
+		.lname	= "Write hint",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct thread_options, write_hint),
+		.help	= "Set expected write life time",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+		.posval = {
+			  { .ival = "none",
+			    .oval = RWH_WRITE_LIFE_NONE,
+			  },
+			  { .ival = "short",
+			    .oval = RWH_WRITE_LIFE_SHORT,
+			  },
+			  { .ival = "medium",
+			    .oval = RWH_WRITE_LIFE_MEDIUM,
+			  },
+			  { .ival = "long",
+			    .oval = RWH_WRITE_LIFE_LONG,
+			  },
+			  { .ival = "extreme",
+			    .oval = RWH_WRITE_LIFE_EXTREME,
+			  },
+		},
+	},
+#endif
 	{
 		.name	= "create_serialize",
 		.lname	= "Create serialize",
diff --git a/os/os-linux.h b/os/os-linux.h
index 09e7413..3e7a2fc 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -303,11 +303,23 @@ static inline int fio_set_sched_idle(void)
 }
 #endif
 
-#ifndef POSIX_FADV_STREAMID
-#define POSIX_FADV_STREAMID	8
+#ifndef F_GET_RW_HINT
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE	1024
+#endif
+#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
+#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
+#endif
+
+#ifndef RWH_WRITE_LIFE_NONE
+#define RWH_WRITE_LIFE_NONE	0
+#define RWH_WRITE_LIFE_SHORT	1
+#define RWH_WRITE_LIFE_MEDIUM	2
+#define RWH_WRITE_LIFE_LONG	3
+#define RWH_WRITE_LIFE_EXTREME	4
 #endif
 
-#define FIO_HAVE_STREAMID
+#define FIO_HAVE_WRITE_HINT
 
 #ifndef RWF_HIPRI
 #define RWF_HIPRI	0x00000001
diff --git a/server.h b/server.h
index fff6804..f002f3b 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 62,
+	FIO_SERVER_VER			= 63,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 493e92e..72d86cf 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -102,6 +102,7 @@ struct thread_options {
 	unsigned int end_fsync;
 	unsigned int pre_read;
 	unsigned int sync_io;
+	unsigned int write_hint;
 	unsigned int verify;
 	unsigned int do_verify;
 	unsigned int verifysort;
@@ -376,6 +377,7 @@ struct thread_options_pack {
 	uint32_t end_fsync;
 	uint32_t pre_read;
 	uint32_t sync_io;
+	uint32_t write_hint;
 	uint32_t verify;
 	uint32_t do_verify;
 	uint32_t verifysort;
@@ -417,6 +419,7 @@ struct thread_options_pack {
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
+	uint32_t pad;
 
 	struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
 	uint32_t zone_split_nr[DDIR_RWDIR_CNT];
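
The fio.h change replaces a bare __attribute__ ((aligned)) on the ts member with aligned(8). Without an argument, GCC aligns the member to the target's largest alignment, __BIGGEST_ALIGNMENT__ (16 bytes on x86-64 with default flags), so the offset of ts can vary with architecture and compiler flags. aligned(8) pins it. The commit message does not state the motivation, so this is one plausible reading. The sketch uses hypothetical stand-in structs (stat_like, td_fixed, td_max) to illustrate the difference.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for struct thread_stat */
struct stat_like {
	unsigned long long io_bytes;
	unsigned int groupid;
};

/* Member alignment pinned to 8 bytes, as in the patched fio.h */
struct td_fixed {
	unsigned int groupid;
	struct stat_like ts __attribute__ ((aligned(8)));
};

/* Bare "aligned": member gets the target's largest alignment */
struct td_max {
	unsigned int groupid;
	struct stat_like ts __attribute__ ((aligned));
};

/* With aligned(8) the offset does not depend on __BIGGEST_ALIGNMENT__ */
static_assert(offsetof(struct td_fixed, ts) == 8, "ts offset pinned to 8");
```

On x86-64 with default GCC or Clang flags, td_max places ts at offset 16 and td_fixed places it at offset 8.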

* Recent changes (master)
@ 2017-06-19 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-19 12:00 UTC
  To: fio

The following changes since commit e5aaf1e677d1413125ffaf7aae48b1e8f4ce9ebc:

  Fio 2.21 (2017-06-15 12:25:03 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 594f7abb0c3e5ea4174b1d450959305f827cbdfd:

  Merge branch 'dev/doc/pkg' of https://github.com/bobsaintcool/fio (2017-06-18 10:32:11 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'dev/doc/pkg' of https://github.com/bobsaintcool/fio

Quentin Bourgeois (1):
      <README/pkg: Add reference to Arch Linux package>

 README | 4 ++++
 1 file changed, 4 insertions(+)

---

Diff of recent changes:

diff --git a/README b/README
index 951550b..6bff82b 100644
--- a/README
+++ b/README
@@ -108,6 +108,10 @@ Mandriva:
 	Mandriva has integrated fio into their package repository, so installing
 	on that distro should be as easy as typing ``urpmi fio``.
 
+Arch Linux:
+        An Arch Linux package is provided under the Community sub-repository:
+        https://www.archlinux.org/packages/?sort=&q=fio
+
 Solaris:
 	Packages for Solaris are available from OpenCSW. Install their pkgutil
 	tool (http://www.opencsw.org/get-it/pkgutil/) and then install fio via

* Recent changes (master)
@ 2017-06-16 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-16 12:00 UTC
  To: fio

The following changes since commit 72cd4146e538450b8789095b910ffce2b397a06c:

  sync: add support for write life time hint (2017-06-14 09:55:40 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e5aaf1e677d1413125ffaf7aae48b1e8f4ce9ebc:

  Fio 2.21 (2017-06-15 12:25:03 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.21

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index a9ddb31..3b4f206 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.20
+DEF_VER=fio-2.21
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 05d2a83..860570a 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.20">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.21">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

* Recent changes (master)
@ 2017-06-15 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-15 12:00 UTC
  To: fio

The following changes since commit 052046a1e92bff19ecae277670a59fb726a66b86:

  binject: don't use void* for pointer arithmetic (gcc) (2017-06-12 14:51:01 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 72cd4146e538450b8789095b910ffce2b397a06c:

  sync: add support for write life time hint (2017-06-14 09:55:40 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      sync: add support for write life time hint

 engines/sync.c | 26 ++++++++++++++++++++++++++
 os/os-linux.h  |  8 ++++++++
 2 files changed, 34 insertions(+)

---

Diff of recent changes:

diff --git a/engines/sync.c b/engines/sync.c
index e76bbbb..69d5e21 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -36,6 +36,7 @@ struct syncio_data {
 struct psyncv2_options {
 	void *pad;
 	unsigned int hipri;
+	unsigned int stream;
 };
 
 static struct fio_option options[] = {
@@ -49,6 +50,29 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "stream",
+		.lname	= "Stream ID",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct psyncv2_options, stream),
+		.help	= "Set expected write life time",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+		.posval = {
+			  { .ival = "short",
+			    .oval = RWF_WRITE_LIFE_SHORT,
+			  },
+			  { .ival = "medium",
+			    .oval = RWF_WRITE_LIFE_MEDIUM,
+			  },
+			  { .ival = "long",
+			    .oval = RWF_WRITE_LIFE_LONG,
+			  },
+			  { .ival = "extreme",
+			    .oval = RWF_WRITE_LIFE_EXTREME,
+			  },
+		},
+	},
+	{
 		.name	= NULL,
 	},
 };
@@ -134,6 +158,8 @@ static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 
 	if (o->hipri)
 		flags |= RWF_HIPRI;
+	if (o->stream)
+		flags |= o->stream;
 
 	iov->iov_base = io_u->xfer_buf;
 	iov->iov_len = io_u->xfer_buflen;
diff --git a/os/os-linux.h b/os/os-linux.h
index 008ce2d..09e7413 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -319,6 +319,14 @@ static inline int fio_set_sched_idle(void)
 #define RWF_SYNC	0x00000004
 #endif
 
+#ifndef RWF_WRITE_LIFE_SHIFT
+#define RWF_WRITE_LIFE_SHIFT		4
+#define RWF_WRITE_LIFE_SHORT		(1 << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_MEDIUM		(2 << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_LONG		(3 << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_EXTREME		(4 << RWF_WRITE_LIFE_SHIFT)
+#endif
+
 #ifndef CONFIG_PWRITEV2
 #ifdef __NR_preadv2
 static inline void make_pos_h_l(unsigned long *pos_h, unsigned long *pos_l,
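The stream= option above stores one of the RWF_WRITE_LIFE_* values and ORs it into the pwritev2() flags alongside RWF_HIPRI. The sketch below shows that flag composition using the fallback values from the os/os-linux.h hunk. write_life_flag() and build_rw_flags() are hypothetical helpers that mirror the posval table and fio_pvsyncio2_queue(). The sketch issues no I/O, so it says nothing about whether a given kernel accepts these flags.

```c
#include <assert.h>
#include <string.h>

/* RWF_* values as in the Linux uapi; write-life fallbacks as in the hunk above. */
#define RWF_HIPRI		0x00000001
#define RWF_DSYNC		0x00000002
#define RWF_SYNC		0x00000004
#ifndef RWF_WRITE_LIFE_SHIFT
#define RWF_WRITE_LIFE_SHIFT	4
#define RWF_WRITE_LIFE_SHORT	(1 << RWF_WRITE_LIFE_SHIFT)
#define RWF_WRITE_LIFE_MEDIUM	(2 << RWF_WRITE_LIFE_SHIFT)
#define RWF_WRITE_LIFE_LONG	(3 << RWF_WRITE_LIFE_SHIFT)
#define RWF_WRITE_LIFE_EXTREME	(4 << RWF_WRITE_LIFE_SHIFT)
#endif

/* Hypothetical helper mirroring the "stream" posval table: returns the
 * flag value for a write life name, or 0 if the name is unknown. */
static int write_life_flag(const char *name)
{
	static const struct { const char *ival; int oval; } map[] = {
		{ "short",   RWF_WRITE_LIFE_SHORT },
		{ "medium",  RWF_WRITE_LIFE_MEDIUM },
		{ "long",    RWF_WRITE_LIFE_LONG },
		{ "extreme", RWF_WRITE_LIFE_EXTREME },
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(map[i].ival, name))
			return map[i].oval;
	return 0;
}

/* Hypothetical helper: builds the flags as fio_pvsyncio2_queue() does. */
static int build_rw_flags(int hipri, int stream)
{
	int flags = 0;

	if (hipri)
		flags |= RWF_HIPRI;
	if (stream)
		flags |= stream;
	return flags;
}
```

The four values form a small enumerated field starting at bit 4, not independent bits. RWF_WRITE_LIFE_LONG equals SHORT | MEDIUM, which is why the option takes exactly one value. The field does stay clear of the low RWF_HIPRI, RWF_DSYNC and RWF_SYNC bits.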


* Recent changes (master)
@ 2017-06-13 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-13 12:00 UTC
  To: fio

The following changes since commit 85c705e55c2bbeb3c74d96ef4ec1ae90203c4083:

  man: Update buffer_pattern entry in man pages (2017-06-08 12:12:36 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 052046a1e92bff19ecae277670a59fb726a66b86:

  binject: don't use void* for pointer arithmetic (gcc) (2017-06-12 14:51:01 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (6):
      sg: add missing free(msg); in ->errdetails() handler
      sg: drop unneeded strdup from ->errdetails() handler
      sg: drop redundant void* cast
      sg: don't use void* for pointer arithmetic (gcc)
      splice: don't use void* for pointer arithmetic (gcc)
      binject: don't use void* for pointer arithmetic (gcc)

 engines/binject.c |  3 ++-
 engines/sg.c      | 16 ++++++++--------
 engines/splice.c  |  5 +++--
 3 files changed, 13 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/engines/binject.c b/engines/binject.c
index 932534a..792dbbd 100644
--- a/engines/binject.c
+++ b/engines/binject.c
@@ -59,11 +59,12 @@ static int pollin_events(struct pollfd *pfds, int fds)
 	return 0;
 }
 
-static unsigned int binject_read_commands(struct thread_data *td, void *p,
+static unsigned int binject_read_commands(struct thread_data *td, void *buf,
 					  int left, int *err)
 {
 	struct fio_file *f;
 	int i, ret, events;
+	char *p = buf;
 
 one_more:
 	events = 0;
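Pointer arithmetic on void * is a GCC extension, because GCC treats sizeof(void) as 1. ISO C only permits arithmetic on pointers to complete object types, which is why these patches switch the cursors to char *. The sketch below shows the same pattern with a hypothetical record_at() helper. It takes a void * buffer and steps through it in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to record 'idx' in a buffer of
 * fixed-size records. The buffer arrives as void *, as in the engine
 * callbacks, and is converted to char * so that '+' advances by bytes
 * in standard C. */
static void *record_at(void *buf, size_t rec_size, size_t idx)
{
	char *p = buf;

	assert(buf != NULL && rec_size > 0);
	return p + rec_size * idx;
}
```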
diff --git a/engines/sg.c b/engines/sg.c
index 2148e87..4540b57 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -124,7 +124,7 @@ static int fio_sgio_getevents(struct thread_data *td, unsigned int min,
 	}
 
 	while (left) {
-		void *p;
+		char *p;
 
 		dprint(FD_IO, "sgio_getevents: sd %p: left=%d\n", sd, left);
 
@@ -184,7 +184,7 @@ re_read:
 			if (hdr->info & SG_INFO_CHECK) {
 				struct io_u *io_u;
 				io_u = (struct io_u *)(hdr->usr_ptr);
-				memcpy((void*)&(io_u->hdr), (void*)hdr, sizeof(struct sg_io_hdr));
+				memcpy(&io_u->hdr, hdr, sizeof(struct sg_io_hdr));
 				sd->events[i]->error = EIO;
 			}
 		}
@@ -572,17 +572,17 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 	struct sg_io_hdr *hdr = &io_u->hdr;
 #define MAXERRDETAIL 1024
 #define MAXMSGCHUNK  128
-	char *msg, msgchunk[MAXMSGCHUNK], *ret = NULL;
+	char *msg, msgchunk[MAXMSGCHUNK];
 	int i;
 
 	msg = calloc(1, MAXERRDETAIL);
+	strcpy(msg, "");
 
 	/*
 	 * can't seem to find sg_err.h, so I'll just echo the define values
 	 * so others can search on internet to find clearer clues of meaning.
 	 */
 	if (hdr->info & SG_INFO_CHECK) {
-		ret = msg;
 		if (hdr->host_status) {
 			snprintf(msgchunk, MAXMSGCHUNK, "SG Host Status: 0x%02x; ", hdr->host_status);
 			strlcat(msg, msgchunk, MAXERRDETAIL);
@@ -755,14 +755,14 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 		if (hdr->resid != 0) {
 			snprintf(msgchunk, MAXMSGCHUNK, "SG Driver: %d bytes out of %d not transferred. ", hdr->resid, hdr->dxfer_len);
 			strlcat(msg, msgchunk, MAXERRDETAIL);
-			ret = msg;
 		}
 	}
 
-	if (!ret)
-		ret = strdup("SG Driver did not report a Host, Driver or Device check");
+	if (!(hdr->info & SG_INFO_CHECK) && !strlen(msg))
+		strncpy(msg, "SG Driver did not report a Host, Driver or Device check",
+			MAXERRDETAIL - 1);
 
-	return ret;
+	return msg;
 }
 
 /*
diff --git a/engines/splice.c b/engines/splice.c
index eba093e..d5d8ab0 100644
--- a/engines/splice.c
+++ b/engines/splice.c
@@ -32,7 +32,7 @@ static int fio_splice_read_old(struct thread_data *td, struct io_u *io_u)
 	struct fio_file *f = io_u->file;
 	int ret, ret2, buflen;
 	off_t offset;
-	void *p;
+	char *p;
 
 	offset = io_u->offset;
 	buflen = io_u->xfer_buflen;
@@ -77,7 +77,8 @@ static int fio_splice_read(struct thread_data *td, struct io_u *io_u)
 	struct iovec iov;
 	int ret , buflen, mmap_len;
 	off_t offset;
-	void *p, *map;
+	void *map;
+	char *p;
 
 	ret = 0;
 	offset = io_u->offset;
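After this change, fio_sgio_errdetails() returns the buffer it calloc()ed on every path. Previously it returned either msg or a separate strdup() copy, and the strdup() path leaked msg. Callers can now free() the result unconditionally. The sketch below shows that single-allocation pattern. describe_status() and its parameters are hypothetical stand-ins for the SG header decoding. Because calloc() already zeroes the buffer, the sketch skips the empty-string initialisation and checks for allocation failure instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXERRDETAIL 1024

/* Hypothetical stand-in for fio_sgio_errdetails(): always returns the one
 * heap buffer it allocated, so the caller frees exactly one allocation on
 * every path. Returns NULL only if the allocation fails. */
static char *describe_status(unsigned int host_status, int check)
{
	char *msg = calloc(1, MAXERRDETAIL);	/* zeroed, so already "" */

	if (!msg)
		return NULL;

	if (check && host_status)
		snprintf(msg, MAXERRDETAIL, "SG Host Status: 0x%02x; ",
			 host_status);

	/* Fallback text goes into the same buffer, not a strdup() copy. */
	if (!check && msg[0] == '\0')
		snprintf(msg, MAXERRDETAIL, "%s",
			 "SG Driver did not report a Host, Driver or Device check");

	assert(strlen(msg) < MAXERRDETAIL);
	return msg;
}
```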


* Recent changes (master)
@ 2017-06-09 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-09 12:00 UTC
  To: fio

The following changes since commit a35ef7cb514d02671bdcb029a64785bbc288fe96:

  HOWTO: mention some details of ignore_error= option (2017-06-07 14:23:26 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 85c705e55c2bbeb3c74d96ef4ec1ae90203c4083:

  man: Update buffer_pattern entry in man pages (2017-06-08 12:12:36 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Add strndup() function, if we don't have it
      configure: remember to initialize 'strndup' to "no"
      oslib/strndup: cleanup and remember to include for aux programs

Stephen Bates (2):
      pattern: Add support for files in buffer_pattern argument.
      man: Update buffer_pattern entry in man pages

 HOWTO           | 19 +++++++++++------
 Makefile        |  8 +++++--
 configure       | 22 +++++++++++++++++++
 fio.1           | 19 +++++++++++------
 lib/pattern.c   | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 oslib/strndup.c | 18 ++++++++++++++++
 oslib/strndup.h |  7 ++++++
 7 files changed, 145 insertions(+), 14 deletions(-)
 create mode 100644 oslib/strndup.c
 create mode 100644 oslib/strndup.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 289c518..d3a5783 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1401,11 +1401,18 @@ Buffers and memory
 
 .. option:: buffer_pattern=str
 
-	If set, fio will fill the I/O buffers with this pattern. If not set, the
-	contents of I/O buffers is defined by the other options related to buffer
-	contents. The setting can be any pattern of bytes, and can be prefixed with
-	0x for hex values. It may also be a string, where the string must then be
-	wrapped with ``""``, e.g.::
+	If set, fio will fill the I/O buffers with this pattern or with the contents
+	of a file. If not set, the contents of I/O buffers are defined by the other
+	options related to buffer contents. The setting can be any pattern of bytes,
+	and can be prefixed with 0x for hex values. It may also be a string, where
+	the string must then be wrapped with ``""``. Or it may also be a filename,
+	where the filename must be wrapped with ``''`` in which case the file is
+	opened and read. Note that not all the file contents will be read if that
+	would cause the buffers to overflow. So, for example::
+
+		buffer_pattern='filename'
+
+	or::
 
 		buffer_pattern="abcd"
 
@@ -1419,7 +1426,7 @@ Buffers and memory
 
 	Also you can combine everything together in any order::
 
-		buffer_pattern=0xdeadface"abcd"-12
+		buffer_pattern=0xdeadface"abcd"-12'filename'
 
 .. option:: dedupe_percentage=int
 
diff --git a/Makefile b/Makefile
index c3e551d..d7786d2 100644
--- a/Makefile
+++ b/Makefile
@@ -107,6 +107,9 @@ endif
 ifndef CONFIG_STRLCAT
   SOURCE += oslib/strlcat.c
 endif
+ifndef CONFIG_HAVE_STRNDUP
+  SOURCE += oslib/strndup.c
+endif
 ifndef CONFIG_GETOPT_LONG_ONLY
   SOURCE += oslib/getopt_long.c
 endif
@@ -209,7 +212,8 @@ T_IEEE_PROGS = t/ieee754
 
 T_ZIPF_OBS = t/genzipf.o
 T_ZIPF_OBJS += t/log.o lib/ieee754.o lib/rand.o lib/pattern.o lib/zipf.o \
-		lib/strntol.o lib/gauss.o t/genzipf.o oslib/strcasestr.o
+		lib/strntol.o lib/gauss.o t/genzipf.o oslib/strcasestr.o \
+		oslib/strndup.o
 T_ZIPF_PROGS = t/fio-genzipf
 
 T_AXMAP_OBJS = t/axmap.o
@@ -222,7 +226,7 @@ T_LFSR_TEST_PROGS = t/lfsr-test
 
 T_GEN_RAND_OBJS = t/gen-rand.o
 T_GEN_RAND_OBJS += t/log.o t/debug.o lib/rand.o lib/pattern.o lib/strntol.o \
-			oslib/strcasestr.o
+			oslib/strcasestr.o oslib/strndup.o
 T_GEN_RAND_PROGS = t/gen-rand
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
diff --git a/configure b/configure
index 2c6bfc8..afb88ca 100755
--- a/configure
+++ b/configure
@@ -1971,6 +1971,25 @@ fi
 print_config "bool" "$have_bool"
 
 ##########################################
+# Check whether we have strndup()
+strndup="no"
+cat > $TMPC << EOF
+#include <string.h>
+#include <stdlib.h>
+int main(int argc, char **argv)
+{
+  char *res = strndup("test string", 8);
+
+  free(res);
+  return 0;
+}
+EOF
+if compile_prog "" "" "strndup"; then
+  strndup="yes"
+fi
+print_config "strndup" "$strndup"
+
+##########################################
 # check march=armv8-a+crc+crypto
 if test "$march_armv8_a_crc_crypto" != "yes" ; then
   march_armv8_a_crc_crypto="no"
@@ -2227,6 +2246,9 @@ fi
 if test "$have_bool" = "yes" ; then
   output_sym "CONFIG_HAVE_BOOL"
 fi
+if test "$strndup" = "yes" ; then
+  output_sym "CONFIG_HAVE_STRNDUP"
+fi
 if test "$disable_opt" = "yes" ; then
   output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
 fi
diff --git a/fio.1 b/fio.1
index e153d46..96eceaf 100644
--- a/fio.1
+++ b/fio.1
@@ -611,13 +611,20 @@ the remaining zeroed. With this set to some chunk size smaller than the block
 size, fio can alternate random and zeroed data throughout the IO buffer.
 .TP
 .BI buffer_pattern \fR=\fPstr
-If set, fio will fill the IO buffers with this pattern. If not set, the contents
-of IO buffers is defined by the other options related to buffer contents. The
-setting can be any pattern of bytes, and can be prefixed with 0x for hex
-values. It may also be a string, where the string must then be wrapped with
-"", e.g.:
+If set, fio will fill the I/O buffers with this pattern or with the contents
+of a file. If not set, the contents of I/O buffers are defined by the other
+options related to buffer contents. The setting can be any pattern of bytes,
+and can be prefixed with 0x for hex values. It may also be a string, where
+the string must then be wrapped with ``""``. Or it may also be a filename,
+where the filename must be wrapped with ``''`` in which case the file is
+opened and read. Note that not all the file contents will be read if that
+would cause the buffers to overflow. So, for example:
 .RS
 .RS
+\fBbuffer_pattern\fR='filename'
+.RS
+or
+.RE
 \fBbuffer_pattern\fR="abcd"
 .RS
 or
@@ -632,7 +639,7 @@ or
 Also you can combine everything together in any order:
 .LP
 .RS
-\fBbuffer_pattern\fR=0xdeadface"abcd"-12
+\fBbuffer_pattern\fR=0xdeadface"abcd"-12'filename'
 .RE
 .RE
 .TP
diff --git a/lib/pattern.c b/lib/pattern.c
index 0aeb935..31ee4ea 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -4,11 +4,74 @@
 #include <limits.h>
 #include <errno.h>
 #include <assert.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
 
 #include "strntol.h"
 #include "pattern.h"
 #include "../minmax.h"
 #include "../oslib/strcasestr.h"
+#include "../oslib/strndup.h"
+
+/**
+ * parse_file() - parses binary file to fill buffer
+ * @beg - string input, extract filename from this
+ * @out - output buffer where parsed number should be put
+ * @out_len - length of the output buffer
+ * @filled - pointer where number of bytes successfully
+ *           parsed will be put
+ *
+ * Returns the end pointer where parsing has been stopped.
+ * In case of parsing error or lack of bytes in output buffer
+ * NULL will be returned.
+ */
+static const char *parse_file(const char *beg, char *out,
+			      unsigned int out_len,
+			      unsigned int *filled)
+{
+	const char *end;
+	char *file;
+	int fd;
+	ssize_t count;
+
+	if (!out_len)
+		goto err_out;
+
+	assert(*beg == '\'');
+	beg++;
+	end = strchr(beg, '\'');
+	if (!end)
+		goto err_out;
+
+	file = strndup(beg, end - beg);
+	if (file == NULL)
+		goto err_out;
+
+	fd = open(file, O_RDONLY);
+	if (fd < 0)
+		goto err_free_out;
+
+	count = read(fd, out, out_len);
+	if (count == -1)
+		goto err_free_close_out;
+
+	*filled = count;
+	close(fd);
+	free(file);
+
+	/* Catch up quote */
+	return end + 1;
+
+err_free_close_out:
+	close(fd);
+err_free_out:
+	free(file);
+err_out:
+	return NULL;
+
+}
 
 /**
  * parse_string() - parses string in double quotes, like "abc"
@@ -271,6 +334,9 @@ int parse_and_fill_pattern(const char *in, unsigned int in_len,
 		parsed_fmt = 0;
 
 		switch (*beg) {
+		case '\'':
+			end = parse_file(beg, out, out_len, &filled);
+			break;
 		case '"':
 			end = parse_string(beg, out, out_len, &filled);
 			break;
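The new single-quote case hands off to parse_file(). That function finds the closing quote, strndup()s the name, open()s the file and performs one read() of at most out_len bytes. A file larger than the remaining buffer is therefore simply truncated. The sketch below follows the same steps under the hypothetical name fill_from_quoted_file(). Like the original, it does not loop on short reads. It frees the name right after open(), and it defines _POSIX_C_SOURCE on the assumption of a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L	/* for strndup() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical re-statement of parse_file(). 'beg' points at an opening
 * single quote. The function reads up to out_len bytes of the named file
 * into 'out' and stores the byte count in *filled. It returns the
 * position just past the closing quote, or NULL on any error. */
static const char *fill_from_quoted_file(const char *beg, char *out,
					 size_t out_len, size_t *filled)
{
	const char *end;
	char *file;
	ssize_t count;
	int fd;

	if (!out_len)
		return NULL;

	assert(*beg == '\'');
	beg++;
	end = strchr(beg, '\'');
	if (!end)
		return NULL;		/* unterminated filename */

	file = strndup(beg, end - beg);
	if (!file)
		return NULL;

	fd = open(file, O_RDONLY);
	free(file);			/* the name is not needed after open() */
	if (fd < 0)
		return NULL;

	/* Single read: bytes beyond out_len are never touched. */
	count = read(fd, out, out_len);
	close(fd);
	if (count < 0)
		return NULL;

	*filled = (size_t)count;
	return end + 1;			/* skip the closing quote */
}
```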
diff --git a/oslib/strndup.c b/oslib/strndup.c
new file mode 100644
index 0000000..7b0fcb5
--- /dev/null
+++ b/oslib/strndup.c
@@ -0,0 +1,18 @@
+#include <stdlib.h>
+#include "strndup.h"
+
+#ifndef CONFIG_HAVE_STRNDUP
+
+char *strndup(const char *s, size_t n)
+{
+	char *str = malloc(n + 1);
+
+	if (str) {
+		strncpy(str, s, n);
+		str[n] = '\0';
+	}
+
+	return str;
+}
+
+#endif
diff --git a/oslib/strndup.h b/oslib/strndup.h
new file mode 100644
index 0000000..2cb904d
--- /dev/null
+++ b/oslib/strndup.h
@@ -0,0 +1,7 @@
+#include <string.h>
+
+#ifndef CONFIG_HAVE_STRNDUP
+
+char *strndup(const char *s, size_t n);
+
+#endif
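The oslib fallback above always allocates n + 1 bytes and lets strncpy() zero-pad short inputs. That is correct, but it can over-allocate. The sketch below is a variant that sizes the buffer to the bytes actually copied. fallback_strndup() is a hypothetical name, chosen so it cannot clash with a libc declaration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback: copies at most n bytes of s into a new
 * NUL-terminated buffer sized to what is actually copied. */
static char *fallback_strndup(const char *s, size_t n)
{
	size_t len = 0;
	char *str;

	assert(s != NULL);
	/* Bounded length scan; avoids strnlen(), which may be missing on
	 * the same platforms that lack strndup(). */
	while (len < n && s[len] != '\0')
		len++;

	str = malloc(len + 1);
	if (str) {
		memcpy(str, s, len);
		str[len] = '\0';
	}
	return str;
}
```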


* Recent changes (master)
@ 2017-06-08 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-08 12:00 UTC
  To: fio

The following changes since commit c12d597ac36632a6f08c749df302135bbd339cb2:

  stat: correct json 'io_bytes' output (2017-06-05 14:05:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a35ef7cb514d02671bdcb029a64785bbc288fe96:

  HOWTO: mention some details of ignore_error= option (2017-06-07 14:23:26 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      fileset: fix double addition of file offset
      Fix up some style
      diskutil: ensure we have enough room to not write past end
      blktrace: ensure that dev loop doesn't truncate name

Tomohiro Kusumi (5):
      fix wrong malloc size for ignore_error buffer
      don't leave ignore_error_nr[etype] with 4 on blank input or error
      use enum error_type_bit for ignore_error index
      use ARRAY_SIZE() for ignore_error_nr[etype]
      HOWTO: mention some details of ignore_error= option

 HOWTO                    |  6 ++++--
 diskutil.c               |  2 +-
 filesetup.c              | 27 +++++++++++++++++----------
 options.c                | 18 ++++++++++++------
 oslib/linux-dev-lookup.c |  2 +-
 td_error.c               |  3 +--
 td_error.h               |  3 ++-
 7 files changed, 38 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index ea9466a..289c518 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2836,7 +2836,8 @@ Error handling
 .. option:: ignore_error=str
 
 	Sometimes you want to ignore some errors during test in that case you can
-	specify error list for each error type.
+	specify error list for each error type, instead of only being able to
+	ignore the default 'non-fatal error' using :option:`continue_on_error`.
 	``ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST`` errors for
 	given error type is separated with ':'. Error may be symbol ('ENOSPC',
 	'ENOMEM') or integer.  Example::
@@ -2844,7 +2845,8 @@ Error handling
 		ignore_error=EAGAIN,ENOSPC:122
 
 	This option will ignore EAGAIN from READ, and ENOSPC and 122(EDQUOT) from
-	WRITE.
+	WRITE. This option works by overriding :option:`continue_on_error` with
+	the list of errors for each error type if any.
 
 .. option:: error_dump=bool
 
diff --git a/diskutil.c b/diskutil.c
index dca3748..9767ea2 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -363,7 +363,7 @@ static int find_block_dir(int majdev, int mindev, char *path, int link_ok)
 		return 0;
 
 	while ((dir = readdir(D)) != NULL) {
-		char full_path[256];
+		char full_path[257];
 
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
diff --git a/filesetup.c b/filesetup.c
index e548d21..13079e4 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -833,19 +833,22 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 {
 	struct thread_options *o = &td->o;
-	unsigned long long align_bs;  /* align the offset to this block size */
-	unsigned long long offset;  /* align_bs-aligned offset */
+	unsigned long long align_bs;
+	unsigned long long offset;
 
 	if (o->file_append && f->filetype == FIO_TYPE_FILE)
 		return f->real_file_size;
 
 	if (o->start_offset_percent > 0) {
-
-		/* if blockalign is provided, find the min across read, write, and trim */
+		/*
+		 * if blockalign is provided, find the min across read, write,
+		 * and trim
+		 */
 		if (fio_option_is_set(o, ba)) {
 			align_bs = (unsigned long long) min(o->ba[DDIR_READ], o->ba[DDIR_WRITE]);
 			align_bs = min((unsigned long long) o->ba[DDIR_TRIM], align_bs);
-		} else {  /* else take the minimum block size */
+		} else {
+			/* else take the minimum block size */
 			align_bs = td_min_bs(td);
 		}
 
@@ -853,14 +856,18 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 		offset = (f->real_file_size * o->start_offset_percent / 100) +
 			(td->subjob_number * o->offset_increment);
 
-		/* block align the offset at the next available boundary at
-		   ceiling(offset / align_bs) * align_bs */
+		/*
+		 * block align the offset at the next available boundary at
+		 * ceiling(offset / align_bs) * align_bs
+		 */
 		offset = (offset / align_bs + (offset % align_bs != 0)) * align_bs;
 
-	} else {  /* start_offset_percent not set */
-		offset = o->start_offset + o->start_offset +
-			td->subjob_number * o->offset_increment;
+	} else {
+		/* start_offset_percent not set */
+		offset = o->start_offset +
+				td->subjob_number * o->offset_increment;
 	}
+
 	return offset;
 }
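With the duplicated start_offset term removed, the fixed-offset branch of get_start_offset() reduces to start_offset + subjob_number * offset_increment. The sketch below uses hypothetical helpers and keeps the pre-fix expression only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the fixed-offset branch of
 * get_start_offset(): each subjob is shifted by offset_increment. */
static uint64_t fixed_start_offset(uint64_t start_offset,
				   unsigned int subjob_number,
				   uint64_t offset_increment)
{
	/* Guard against 64-bit overflow of the per-subjob shift. */
	assert(subjob_number == 0 || offset_increment <=
	       (UINT64_MAX - start_offset) / subjob_number);
	return start_offset + (uint64_t)subjob_number * offset_increment;
}

/* The pre-fix expression, kept only to show what changed. */
static uint64_t buggy_start_offset(uint64_t start_offset,
				   unsigned int subjob_number,
				   uint64_t offset_increment)
{
	return start_offset + start_offset +
	       (uint64_t)subjob_number * offset_increment;
}
```

For example, offset=1m with offset_increment=512k places subjobs 0, 1 and 2 at 1 MiB, 1.5 MiB and 2 MiB. The old expression started every subjob 1 MiB further in.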
 
diff --git a/options.c b/options.c
index dcee7e5..6d799bf 100644
--- a/options.c
+++ b/options.c
@@ -270,7 +270,8 @@ static int str2error(char *str)
 	return 0;
 }
 
-static int ignore_error_type(struct thread_data *td, int etype, char *str)
+static int ignore_error_type(struct thread_data *td, enum error_type_bit etype,
+				char *str)
 {
 	unsigned int i;
 	int *error;
@@ -282,7 +283,7 @@ static int ignore_error_type(struct thread_data *td, int etype, char *str)
 	}
 
 	td->o.ignore_error_nr[etype] = 4;
-	error = malloc(4 * sizeof(struct bssplit));
+	error = calloc(4, sizeof(int));
 
 	i = 0;
 	while ((fname = strsep(&str, ":")) != NULL) {
@@ -306,8 +307,9 @@ static int ignore_error_type(struct thread_data *td, int etype, char *str)
 				error[i] = -error[i];
 		}
 		if (!error[i]) {
-			log_err("Unknown error %s, please use number value \n",
+			log_err("Unknown error %s, please use number value\n",
 				  fname);
+			td->o.ignore_error_nr[etype] = 0;
 			free(error);
 			return 1;
 		}
@@ -317,8 +319,10 @@ static int ignore_error_type(struct thread_data *td, int etype, char *str)
 		td->o.continue_on_error |= 1 << etype;
 		td->o.ignore_error_nr[etype] = i;
 		td->o.ignore_error[etype] = error;
-	} else
+	} else {
+		td->o.ignore_error_nr[etype] = 0;
 		free(error);
+	}
 
 	return 0;
 
@@ -328,7 +332,8 @@ static int str_ignore_error_cb(void *data, const char *input)
 {
 	struct thread_data *td = cb_data_to_td(data);
 	char *str, *p, *n;
-	int type = 0, ret = 1;
+	int ret = 1;
+	enum error_type_bit type = 0;
 
 	if (parse_dryrun())
 		return 0;
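As the HOWTO hunk describes, ignore_error= uses ',' to separate the READ, WRITE and VERIFY lists and ':' to separate errors within a list. Each error is either a symbol or a number. The sketch below extracts one list. parse_ignore_list() and its deliberately tiny symbol table are hypothetical and do not reproduce fio's str2error().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Tiny illustrative symbol table; a real parser would know many more. */
static int sym2errno(const char *s, size_t len)
{
	static const struct { const char *name; int val; } tbl[] = {
		{ "EAGAIN", EAGAIN }, { "ENOSPC", ENOSPC },
		{ "ENOMEM", ENOMEM }, { "EIO", EIO },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (strlen(tbl[i].name) == len && !strncmp(tbl[i].name, s, len))
			return tbl[i].val;
	return 0;
}

/* Hypothetical parser: fills 'out' with the 'type'-th ','-separated list
 * (0 = READ, 1 = WRITE, 2 = VERIFY). Returns the number of errors stored,
 * or -1 for an unknown symbol or more than 'max' entries. */
static int parse_ignore_list(const char *spec, int type, int *out, int max)
{
	const char *p = spec;
	int n = 0;

	assert(spec != NULL && type >= 0);
	while (type-- > 0) {			/* skip to the requested list */
		p = strchr(p, ',');
		if (!p)
			return 0;		/* list not given: nothing ignored */
		p++;
	}

	while (*p && *p != ',') {
		size_t len = strcspn(p, ":,");
		char *endp;
		long v = strtol(p, &endp, 10);
		/* Whole token numeric? Use it; otherwise look up the symbol. */
		int err = (len && endp == p + len) ? (int)v : sym2errno(p, len);

		if (!err || n == max)
			return -1;
		out[n++] = err;
		p += len;
		if (*p == ':')
			p++;
	}
	return n;
}
```

For ignore_error=EAGAIN,ENOSPC:122 this yields EAGAIN for READ, ENOSPC and 122 for WRITE, and an empty VERIFY list.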
@@ -1389,7 +1394,8 @@ static int str_offset_cb(void *data, unsigned long long *__val)
 	if (parse_is_percent(v)) {
 		td->o.start_offset = 0;
 		td->o.start_offset_percent = -1ULL - v;
-		dprint(FD_PARSE, "SET start_offset_percent %d\n", td->o.start_offset_percent);
+		dprint(FD_PARSE, "SET start_offset_percent %d\n",
+					td->o.start_offset_percent);
 	} else
 		td->o.start_offset = v;
 
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 5fbccd3..54017ff 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -20,7 +20,7 @@ int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 		return 0;
 
 	while ((dir = readdir(D)) != NULL) {
-		char full_path[256];
+		char full_path[257];
 
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
diff --git a/td_error.c b/td_error.c
index 903f9ea..9d58a31 100644
--- a/td_error.c
+++ b/td_error.c
@@ -20,8 +20,7 @@ int td_non_fatal_error(struct thread_data *td, enum error_type_bit etype,
 
 	if (!td->o.ignore_error[etype]) {
 		td->o.ignore_error[etype] = __NON_FATAL_ERR;
-		td->o.ignore_error_nr[etype] = sizeof(__NON_FATAL_ERR)
-			/ sizeof(int);
+		td->o.ignore_error_nr[etype] = ARRAY_SIZE(__NON_FATAL_ERR);
 	}
 
 	if (!(td->o.continue_on_error & (1 << etype)))
diff --git a/td_error.h b/td_error.h
index 1133989..1b38a53 100644
--- a/td_error.h
+++ b/td_error.h
@@ -2,7 +2,8 @@
 #define FIO_TD_ERROR_H
 
 /*
- * What type of errors to continue on when continue_on_error is used
+ * What type of errors to continue on when continue_on_error is used,
+ * and what type of errors to ignore when ignore_error is used.
  */
 enum error_type_bit {
 	ERROR_TYPE_READ_BIT = 0,


* Recent changes (master)
@ 2017-06-06 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-06 12:00 UTC
  To: fio

The following changes since commit 9af5a2450a555a725dd18e6845967cd7cf3aad64:

  use correct syscall name in log_err() (2017-06-02 13:38:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c12d597ac36632a6f08c749df302135bbd339cb2:

  stat: correct json 'io_bytes' output (2017-06-05 14:05:43 -0600)

----------------------------------------------------------------
Brantley West (1):
      allow a percent value for the offset parameter

Ian Chakeres (1):
      Added information about minimal output to documentation

Jens Axboe (4):
      Merge branch 'size_perc' of https://github.com/sitsofe/fio
      Merge branch 'master' of https://github.com/cbwest3/fio
      Merge branch 'more-minimal-info-in-docs' of https://github.com/ianchakeres/fio
      stat: correct json 'io_bytes' output

Sitsofe Wheeler (1):
      filesetup: fix size percentage calculations when using offset

 HOWTO            | 14 ++++++++++----
 cconv.c          |  2 ++
 filesetup.c      | 36 +++++++++++++++++++++++++++++++++---
 fio.1            | 15 ++++++++++++++-
 options.c        | 16 ++++++++++++++++
 stat.c           |  3 ++-
 thread_options.h |  3 ++-
 7 files changed, 79 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6c9e9a4..ea9466a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1072,10 +1072,11 @@ I/O type
 
 .. option:: offset=int
 
-	Start I/O at the given offset in the file. The data before the given offset
-	will not be touched. This effectively caps the file size at `real_size -
-	offset`. Can be combined with :option:`size` to constrain the start and
-	end range that I/O will be done within.
+	Start I/O at the provided offset in the file, given as either a fixed size or
+	a percentage. If a percentage is given, the next ``blockalign``-ed offset
+	will be used. Data before the given offset will not be touched. This
+	effectively caps the file size at `real_size - offset`. Can be combined with
+	:option:`size` to constrain the start and end range of the I/O workload.
 
 .. option:: offset_increment=int
 
@@ -3225,6 +3226,11 @@ which is the Xth percentile, and the `usec` latency associated with it.
 For disk utilization, all disks used by fio are shown. So for each disk there
 will be a disk utilization section.
 
+Below is a single line containing short names for each of the fields in the
+minimal output v3, separated by semicolons:
+
+terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+
 
 Trace file format
 -----------------
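The minimal (terse) output is a single ';'-separated line, so a field can be picked by its zero-based position in the list above. For instance, jobname is field 2 and read_bandwidth is field 6. The sketch below uses a hypothetical terse_field() helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the idx-th (0-based) ';'-separated field
 * of a terse output line into buf. Returns 0 on success, or -1 if the
 * line has fewer fields or buf is too small. Empty fields are kept. */
static int terse_field(const char *line, unsigned int idx,
		       char *buf, size_t buflen)
{
	const char *p = line;
	size_t len;

	assert(line != NULL && buf != NULL && buflen > 0);
	while (idx--) {
		p = strchr(p, ';');
		if (!p)
			return -1;
		p++;
	}

	len = strcspn(p, ";\n");	/* field ends at ';' or end of line */
	if (len >= buflen)
		return -1;
	memcpy(buf, p, len);
	buf[len] = '\0';
	return 0;
}
```

The helper does not validate the version field. A caller should check that field 0 is 3 before relying on these positions.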
diff --git a/cconv.c b/cconv.c
index 3295824..bf4c517 100644
--- a/cconv.c
+++ b/cconv.c
@@ -104,6 +104,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->file_size_low = le64_to_cpu(top->file_size_low);
 	o->file_size_high = le64_to_cpu(top->file_size_high);
 	o->start_offset = le64_to_cpu(top->start_offset);
+	o->start_offset_percent = le32_to_cpu(top->start_offset_percent);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		o->bs[i] = le32_to_cpu(top->bs[i]);
@@ -543,6 +544,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->file_size_low = __cpu_to_le64(o->file_size_low);
 	top->file_size_high = __cpu_to_le64(o->file_size_high);
 	top->start_offset = __cpu_to_le64(o->start_offset);
+	top->start_offset_percent = __cpu_to_le32(o->start_offset_percent);
 	top->trim_backlog = __cpu_to_le64(o->trim_backlog);
 	top->offset_increment = __cpu_to_le64(o->offset_increment);
 	top->number_ios = __cpu_to_le64(o->number_ios);
diff --git a/filesetup.c b/filesetup.c
index 612e794..e548d21 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -833,12 +833,35 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 {
 	struct thread_options *o = &td->o;
+	unsigned long long align_bs;  /* align the offset to this block size */
+	unsigned long long offset;  /* align_bs-aligned offset */
 
 	if (o->file_append && f->filetype == FIO_TYPE_FILE)
 		return f->real_file_size;
 
-	return td->o.start_offset +
-		td->subjob_number * td->o.offset_increment;
+	if (o->start_offset_percent > 0) {
+
+		/* if blockalign is provided, find the min across read, write, and trim */
+		if (fio_option_is_set(o, ba)) {
+			align_bs = (unsigned long long) min(o->ba[DDIR_READ], o->ba[DDIR_WRITE]);
+			align_bs = min((unsigned long long) o->ba[DDIR_TRIM], align_bs);
+		} else {  /* else take the minimum block size */
+			align_bs = td_min_bs(td);
+		}
+
+		/* calculate the raw offset */
+		offset = (f->real_file_size * o->start_offset_percent / 100) +
+			(td->subjob_number * o->offset_increment);
+
+		/* block align the offset at the next available boundary at
+		   ceiling(offset / align_bs) * align_bs */
+		offset = (offset / align_bs + (offset % align_bs != 0)) * align_bs;
+
+	} else {  /* start_offset_percent not set */
+		offset = o->start_offset + o->start_offset +
+			td->subjob_number * o->offset_increment;
+	}
+	return offset;
 }
 
 /*
@@ -986,7 +1009,14 @@ int setup_files(struct thread_data *td)
 			total_size = -1ULL;
 		else {
                         if (o->size_percent) {
-				f->io_size = (f->io_size * o->size_percent) / 100;
+				uint64_t file_size;
+
+				file_size = f->io_size + f->file_offset;
+				f->io_size = (file_size *
+					      o->size_percent) / 100;
+				if (f->io_size > (file_size - f->file_offset))
+					f->io_size = file_size - f->file_offset;
+
 				f->io_size -= (f->io_size % td_min_bs(td));
 			}
 			total_size += f->io_size;
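The size_percent fix takes the percentage of the whole file (io_size + file_offset) rather than of the region left after the offset. It then clamps the result so the range never runs past the end of the file, and finally rounds down to the minimum block size. The sketch below shows that arithmetic with a hypothetical percent_io_size() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the size_percent branch of setup_files().
 * io_size is the number of bytes after file_offset. */
static uint64_t percent_io_size(uint64_t io_size, uint64_t file_offset,
				unsigned int size_percent, unsigned int min_bs)
{
	uint64_t file_size = io_size + file_offset;
	uint64_t sz = file_size * size_percent / 100;

	assert(min_bs > 0);
	/* Never extend the I/O range past the end of the file. */
	if (sz > file_size - file_offset)
		sz = file_size - file_offset;

	/* Round down to a whole number of minimum-size blocks. */
	return sz - sz % min_bs;
}
```

Take a 1000-byte file with offset=200 and size=50% as an example. The job now covers 50% of the whole file, 500 bytes, rounded down to 384 with a 128-byte block size. It no longer covers 50% of the remaining 800 bytes.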
diff --git a/fio.1 b/fio.1
index 9956867..e153d46 100644
--- a/fio.1
+++ b/fio.1
@@ -904,7 +904,11 @@ If true, use buffered I/O.  This is the opposite of the \fBdirect\fR parameter.
 Default: true.
 .TP
 .BI offset \fR=\fPint
-Offset in the file to start I/O. Data before the offset will not be touched.
+Start I/O at the provided offset in the file, given as either a fixed size or a
+percentage. If a percentage is given, the next \fBblockalign\fR-ed offset will
+be used. Data before the given offset will not be touched. This effectively
+caps the file size at (real_size - offset). Can be combined with \fBsize\fR to
+constrain the start and end range of the I/O workload.
 .TP
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes the
@@ -2241,6 +2245,15 @@ Error Info (dependent on continue_on_error, default off):
 .P
 .B text description (if provided in config - appears on newline)
 .RE
+.P
+Below is a single line containing short names for each of the fields in
+the minimal output v3, separated by semicolons:
+.RS
+.P
+.nf
+terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_max;read_clat_min;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_max;write_clat_min;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;pu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
+.fi
+.RE
 .SH TRACE FILE FORMAT
 There are two trace file format that you can encounter. The older (v1) format
 is unsupported since version 1.20-rc3 (March 2008). It will still be described
diff --git a/options.c b/options.c
index b489e90..dcee7e5 100644
--- a/options.c
+++ b/options.c
@@ -1381,6 +1381,21 @@ static int str_gtod_reduce_cb(void *data, int *il)
 	return 0;
 }
 
+static int str_offset_cb(void *data, unsigned long long *__val)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	unsigned long long v = *__val;
+
+	if (parse_is_percent(v)) {
+		td->o.start_offset = 0;
+		td->o.start_offset_percent = -1ULL - v;
+		dprint(FD_PARSE, "SET start_offset_percent %d\n", td->o.start_offset_percent);
+	} else
+		td->o.start_offset = v;
+
+	return 0;
+}
+
 static int str_size_cb(void *data, unsigned long long *__val)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -1938,6 +1953,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "IO offset",
 		.alias	= "fileoffset",
 		.type	= FIO_OPT_STR_VAL,
+		.cb	= str_offset_cb,
 		.off1	= offsetof(struct thread_options, start_offset),
 		.help	= "Start IO from this offset",
 		.def	= "0",
diff --git a/stat.c b/stat.c
index e433c6d..fd3ad5a 100644
--- a/stat.c
+++ b/stat.c
@@ -919,7 +919,8 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		iops = (1000.0 * (uint64_t) ts->total_io_u[ddir]) / runt;
 	}
 
-	json_object_add_value_int(dir_object, "io_bytes", ts->io_bytes[ddir] >> 10);
+	json_object_add_value_int(dir_object, "io_bytes", ts->io_bytes[ddir]);
+	json_object_add_value_int(dir_object, "io_kbytes", ts->io_bytes[ddir] >> 10);
 	json_object_add_value_int(dir_object, "bw", bw);
 	json_object_add_value_float(dir_object, "iops", iops);
 	json_object_add_value_int(dir_object, "runtime", ts->runtime[ddir]);
diff --git a/thread_options.h b/thread_options.h
index d0f3fe9..493e92e 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -200,6 +200,7 @@ struct thread_options {
 	unsigned int numa_mem_prefer_node;
 	char *numa_memnodes;
 	unsigned int gpu_dev_id;
+	unsigned int start_offset_percent;
 
 	unsigned int iolog;
 	unsigned int rwmixcycle;
@@ -469,7 +470,7 @@ struct thread_options_pack {
 	uint8_t log_gz_cpumask[FIO_TOP_STR_MAX];
 #endif
 	uint32_t gpu_dev_id;
-	uint32_t pad;
+	uint32_t start_offset_percent;
 	uint32_t cpus_allowed_policy;
 	uint32_t iolog;
 	uint32_t rwmixcycle;

* Recent changes (master)
@ 2017-06-03 12:00 Jens Axboe
From: Jens Axboe @ 2017-06-03 12:00 UTC
  To: fio

The following changes since commit e5123c4ad9b0626e25d9b243f1111fa89082308b:

  manpage: update URL links to HOWTO/README (2017-05-26 13:29:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9af5a2450a555a725dd18e6845967cd7cf3aad64:

  use correct syscall name in log_err() (2017-06-02 13:38:43 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      use true/false for bool type
      use correct syscall name in log_err()

 init.c   | 8 ++++----
 server.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index d224bd6..2b7768a 100644
--- a/init.c
+++ b/init.c
@@ -909,9 +909,9 @@ void td_fill_verify_state_seed(struct thread_data *td)
 	bool use64;
 
 	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
-		use64 = 1;
+		use64 = true;
 	else
-		use64 = 0;
+		use64 = false;
 
 	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF],
 		use64);
@@ -967,9 +967,9 @@ void td_fill_rand_seeds(struct thread_data *td)
 	}
 
 	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
-		use64 = 1;
+		use64 = true;
 	else
-		use64 = 0;
+		use64 = false;
 
 	td_fill_rand_seeds_internal(td, use64);
 
diff --git a/server.c b/server.c
index 1e269c2..8a5e75d 100644
--- a/server.c
+++ b/server.c
@@ -1279,7 +1279,7 @@ static int get_my_addr_str(int sk)
 
 	ret = getsockname(sk, sockaddr_p, &len);
 	if (ret) {
-		log_err("fio: getsockaddr: %s\n", strerror(errno));
+		log_err("fio: getsockname: %s\n", strerror(errno));
 		return -1;
 	}
 

* Recent changes (master)
@ 2017-05-27 12:00 Jens Axboe
From: Jens Axboe @ 2017-05-27 12:00 UTC
  To: fio

The following changes since commit 8dd0eca30caeda7f62df4d09748286592956fcc5:

  lib/output_buffer: harden buf_output_free() and kill buf_output_clear() (2017-05-24 10:21:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e5123c4ad9b0626e25d9b243f1111fa89082308b:

  manpage: update URL links to HOWTO/README (2017-05-26 13:29:00 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (3):
      verify: add missing free(ptr);
      verify: mention some default option values
      manpage: update URL links to HOWTO/README

 HOWTO    | 3 ++-
 fio.1    | 6 +++---
 verify.c | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a899b90..6c9e9a4 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2467,6 +2467,7 @@ Verification
 	contents to one or more separate threads. If using this offload option, even
 	sync I/O engines can benefit from using an :option:`iodepth` setting higher
 	than 1, as it allows them to have I/O in flight while verifies are running.
+	Defaults to 0 async threads, i.e. verification is not asynchronous.
 
 .. option:: verify_async_cpus=str
 
@@ -2503,7 +2504,7 @@ Verification
 
 	<type> is "local" for a local run, "sock" for a client/server socket
 	connection, and "ip" (192.168.0.1, for instance) for a networked
-	client/server connection.
+	client/server connection. Defaults to true.
 
 .. option:: verify_state_load=bool
 
diff --git a/fio.1 b/fio.1
index 301a708..9956867 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "March 2017" "User Manual"
+.TH fio 1 "May 2017" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -2594,7 +2594,7 @@ Sample jobfiles are available in the \fBexamples\fR directory.
 .br
 These are typically located under /usr/share/doc/fio.
 
-\fBHOWTO\fR:  http://git.kernel.dk/?p=fio.git;a=blob_plain;f=HOWTO
+\fBHOWTO\fR:  http://git.kernel.dk/cgit/fio/plain/HOWTO
 .br
-\fBREADME\fR: http://git.kernel.dk/?p=fio.git;a=blob_plain;f=README
+\fBREADME\fR: http://git.kernel.dk/cgit/fio/plain/README
 .br
diff --git a/verify.c b/verify.c
index cadfe9c..1c39fa2 100644
--- a/verify.c
+++ b/verify.c
@@ -271,6 +271,7 @@ static void dump_buf(char *buf, unsigned int len, unsigned long long offset,
 	fd = open(fname, O_CREAT | O_TRUNC | O_WRONLY, 0644);
 	if (fd < 0) {
 		perror("open verify buf file");
+		free(ptr);
 		return;
 	}
 

* Recent changes (master)
@ 2017-05-25 12:00 Jens Axboe
From: Jens Axboe @ 2017-05-25 12:00 UTC
  To: fio

The following changes since commit c78e8496d438982157657711fbff8bedb621c1c9:

  log: ensure we don't truncate the final '\0' in the log (2017-05-23 21:51:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8dd0eca30caeda7f62df4d09748286592956fcc5:

  lib/output_buffer: harden buf_output_free() and kill buf_output_clear() (2017-05-24 10:21:27 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      log: unify the logging handlers
      lib/output_buffer: harden buf_output_free() and kill buf_output_clear()

 lib/output_buffer.c |  9 +-----
 lib/output_buffer.h |  1 -
 log.c               | 91 ++++++++++++++++++++---------------------------------
 stat.c              |  2 +-
 4 files changed, 36 insertions(+), 67 deletions(-)

---

Diff of recent changes:

diff --git a/lib/output_buffer.c b/lib/output_buffer.c
index 313536d..f6c304b 100644
--- a/lib/output_buffer.c
+++ b/lib/output_buffer.c
@@ -17,6 +17,7 @@ void buf_output_init(struct buf_output *out)
 void buf_output_free(struct buf_output *out)
 {
 	free(out->buf);
+	buf_output_init(out);
 }
 
 size_t buf_output_add(struct buf_output *out, const char *buf, size_t len)
@@ -39,11 +40,3 @@ size_t buf_output_add(struct buf_output *out, const char *buf, size_t len)
 	out->buflen += len;
 	return len;
 }
-
-void buf_output_clear(struct buf_output *out)
-{
-	if (out->buflen) {
-		memset(out->buf, 0, out->max_buflen);
-		out->buflen = 0;
-	}
-}
diff --git a/lib/output_buffer.h b/lib/output_buffer.h
index 15ee005..a235af2 100644
--- a/lib/output_buffer.h
+++ b/lib/output_buffer.h
@@ -12,6 +12,5 @@ struct buf_output {
 void buf_output_init(struct buf_output *out);
 void buf_output_free(struct buf_output *out);
 size_t buf_output_add(struct buf_output *out, const char *buf, size_t len);
-void buf_output_clear(struct buf_output *out);
 
 #endif
diff --git a/log.c b/log.c
index c7856eb..95351d5 100644
--- a/log.c
+++ b/log.c
@@ -29,78 +29,66 @@ size_t log_info_buf(const char *buf, size_t len)
 		return fwrite(buf, len, 1, f_out);
 }
 
-size_t log_valist(const char *str, va_list args)
+static size_t valist_to_buf(char **buffer, const char *fmt, va_list src_args)
 {
 	size_t len, cur = LOG_START_SZ;
-	char *buffer;
+	va_list args;
 
 	do {
-		buffer = calloc(1, cur);
+		*buffer = calloc(1, cur);
+
+		va_copy(args, src_args);
+		len = vsnprintf(*buffer, cur, fmt, args);
+		va_end(args);
 
-		len = vsnprintf(buffer, cur, str, args);
 		if (len < cur)
 			break;
 
 		cur = len + 1;
-		free(buffer);
+		free(*buffer);
 	} while (1);
 
-	cur = log_info_buf(buffer, len);
-	free(buffer);
-
-	return cur;
+	return len;
 }
 
-size_t log_info(const char *format, ...)
+size_t log_valist(const char *fmt, va_list args)
 {
-	size_t len, cur = LOG_START_SZ;
 	char *buffer;
-	va_list args;
-
-	do {
-		buffer = calloc(1, cur);
+	size_t len;
 
-		va_start(args, format);
-		len = vsnprintf(buffer, cur, format, args);
-		va_end(args);
+	len = valist_to_buf(&buffer, fmt, args);
+	len = log_info_buf(buffer, len);
+	free(buffer);
 
-		if (len < cur)
-			break;
+	return len;
+}
 
-		cur = len + 1;
-		free(buffer);
-	} while (1);
+size_t log_info(const char *format, ...)
+{
+	va_list args;
+	size_t ret;
 
-	cur = log_info_buf(buffer, len);
-	free(buffer);
+	va_start(args, format);
+	ret = log_valist(format, args);
+	va_end(args);
 
-	return cur;
+	return ret;
 }
 
 size_t __log_buf(struct buf_output *buf, const char *format, ...)
 {
-	size_t len, cur = LOG_START_SZ;
 	char *buffer;
 	va_list args;
+	size_t len;
 
-	do {
-		buffer = calloc(1, cur);
-
-		va_start(args, format);
-		len = vsnprintf(buffer, cur, format, args);
-		va_end(args);
-
-		if (len < cur)
-			break;
-
-		cur = len + 1;
-		free(buffer);
-	} while (1);
+	va_start(args, format);
+	len = valist_to_buf(&buffer, format, args);
+	va_end(args);
 
-	cur = buf_output_add(buf, buffer, len);
+	len = buf_output_add(buf, buffer, len);
 	free(buffer);
 
-	return cur;
+	return len;
 }
 
 int log_info_flush(void)
@@ -113,24 +101,13 @@ int log_info_flush(void)
 
 size_t log_err(const char *format, ...)
 {
-	size_t ret, len, cur = LOG_START_SZ;
+	size_t ret, len;
 	char *buffer;
 	va_list args;
 
-	do {
-		buffer = calloc(1, cur);
-
-		va_start(args, format);
-		len = vsnprintf(buffer, cur, format, args);
-		va_end(args);
-
-		if (len < cur)
-			break;
-
-		cur = len + 1;
-		free(buffer);
-	} while (1);
-
+	va_start(args, format);
+	len = valist_to_buf(&buffer, format, args);
+	va_end(args);
 
 	if (is_backend) {
 		ret = fio_server_text_output(FIO_LOG_ERR, buffer, len);
diff --git a/stat.c b/stat.c
index 1f124a8..e433c6d 100644
--- a/stat.c
+++ b/stat.c
@@ -1826,8 +1826,8 @@ void __show_run_stats(void)
 
 	for (i = 0; i < FIO_OUTPUT_NR; i++) {
 		struct buf_output *out = &output[i];
+
 		log_info_buf(out->buf, out->buflen);
-		buf_output_clear(out);
 		buf_output_free(out);
 	}
 

* Recent changes (master)
@ 2017-05-24 12:00 Jens Axboe
From: Jens Axboe @ 2017-05-24 12:00 UTC
  To: fio

The following changes since commit af13d1e88158d3e37940648be139d7a46fe00431:

  Merge branch 'bugfix' of https://github.com/YukiKita/fio (2017-05-22 10:23:25 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c78e8496d438982157657711fbff8bedb621c1c9:

  log: ensure we don't truncate the final '\0' in the log (2017-05-23 21:51:59 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Revert "Fixed json_print_value so that ending double quote of JSON string value will not disappear"
      log: make log_buf() return how much it wrote
      log: make the logging functions handle > 1024 bytes correctly
      log: ensure we don't truncate the final '\0' in the log

Tomohiro Kusumi (7):
      configure: Use single square brackets (POSIX)
      configure: Add print_config() for "<config>... <yes|no>" outputs
      Move {is,load}_blktrace() to a new header blktrace.h
      Drop struct thread_data dependency from os headers
      Drop circular dependency in log.c and lib/output_buffer.c
      Include sg headers in os/os-linux.h
      Move Linux/ppc64 specific cpu_online() to os/os-linux.h

 blktrace.c          |   1 +
 blktrace.h          |  23 ++++++++
 configure           | 159 +++++++++++++++++++++++++++-------------------------
 fio.h               |   8 ---
 init.c              |  10 +++-
 iolog.c             |   1 +
 json.c              |  10 +---
 lib/output_buffer.c |   8 +--
 lib/output_buffer.h |   2 +-
 log.c               | 117 ++++++++++++++++++++++++++++----------
 log.h               |  16 +++---
 os/os-linux.h       |  10 ++++
 os/os-windows.h     |   4 +-
 os/os.h             |  34 +----------
 stat.c              |   6 +-
 15 files changed, 231 insertions(+), 178 deletions(-)
 create mode 100644 blktrace.h

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index a3474cb..65b600f 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -10,6 +10,7 @@
 
 #include "flist.h"
 #include "fio.h"
+#include "blktrace.h"
 #include "blktrace_api.h"
 #include "oslib/linux-dev-lookup.h"
 
diff --git a/blktrace.h b/blktrace.h
new file mode 100644
index 0000000..8656a95
--- /dev/null
+++ b/blktrace.h
@@ -0,0 +1,23 @@
+#ifndef FIO_BLKTRACE_H
+#define FIO_BLKTRACE_H
+
+#ifdef FIO_HAVE_BLKTRACE
+
+int is_blktrace(const char *, int *);
+int load_blktrace(struct thread_data *, const char *, int);
+
+#else
+
+static inline int is_blktrace(const char *fname, int *need_swap)
+{
+	return 0;
+}
+
+static inline int load_blktrace(struct thread_data *td, const char *fname,
+				int need_swap)
+{
+	return 1;
+}
+
+#endif
+#endif
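
The configure hunks replace hand-padded echo lines with print_config(). That helper relies on printf's `%-30s` conversion to left-justify each label in a 30-column field. C's printf supports the same conversion. The sketch below is a C re-creation for illustration only, not part of fio, and shows the alignment. A label longer than 30 columns is not truncated; its value simply follows it with no gap.

```c
#include <stdio.h>

/* C analogue of the shell helper: label left-justified in 30 columns.
 * Returns printf's character count. */
static int print_config(const char *label, const char *value)
{
	return printf("%-30s%s\n", label, value);
}

int main(void)
{
	print_config("Operating system", "Linux");
	print_config("POSIX AIO support needs -lrt", "no");
	print_config("a label deliberately longer than thirty columns", "yes");
	return 0;
}
```
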
diff --git a/configure b/configure
index 0327578..2c6bfc8 100755
--- a/configure
+++ b/configure
@@ -37,6 +37,11 @@ fatal() {
   exit 1
 }
 
+# Print result for each configuration test
+print_config() {
+  printf "%-30s%s\n" "$1" "$2"
+}
+
 # Default CFLAGS
 CFLAGS="-D_GNU_SOURCE -include config-host.h"
 BUILD_CFLAGS=""
@@ -475,11 +480,11 @@ EOF
 fi
 
 
-echo "Operating system              $targetos"
-echo "CPU                           $cpu"
-echo "Big endian                    $bigendian"
-echo "Compiler                      $cc"
-echo "Cross compile                 $cross_compile"
+print_config "Operating system" "$targetos"
+print_config "CPU" "$cpu"
+print_config "Big endian" "$bigendian"
+print_config "Compiler" "$cc"
+print_config "Cross compile" "$cross_compile"
 echo
 
 ##########################################
@@ -490,7 +495,7 @@ if test "$build_static" = "yes" ; then
 else
   build_static="no"
 fi
-echo "Static build                  $build_static"
+print_config "Static build" "$build_static"
 
 ##########################################
 # check for wordsize
@@ -511,7 +516,7 @@ elif compile_prog "-DWORDSIZE=64" "" "wordsize"; then
 else
   fatal "Unknown wordsize"
 fi
-echo "Wordsize                      $wordsize"
+print_config "Wordsize" "$wordsize"
 
 ##########################################
 # zlib probe
@@ -532,7 +537,7 @@ if compile_prog "" "-lz" "zlib" ; then
   zlib=yes
   LIBS="-lz $LIBS"
 fi
-echo "zlib                          $zlib"
+print_config "zlib" "$zlib"
 
 ##########################################
 # linux-aio probe
@@ -559,7 +564,7 @@ EOF
     libaio=no
   fi
 fi
-echo "Linux AIO support             $libaio"
+print_config "Linux AIO support" "$libaio"
 
 ##########################################
 # posix aio probe
@@ -585,8 +590,8 @@ elif compile_prog "" "-lrt" "posixaio"; then
   posix_aio_lrt="yes"
   LIBS="-lrt $LIBS"
 fi
-echo "POSIX AIO support             $posix_aio"
-echo "POSIX AIO support needs -lrt  $posix_aio_lrt"
+print_config "POSIX AIO support" "$posix_aio"
+print_config "POSIX AIO support needs -lrt" "$posix_aio_lrt"
 
 ##########################################
 # posix aio fsync probe
@@ -608,7 +613,7 @@ EOF
     posix_aio_fsync=yes
   fi
 fi
-echo "POSIX AIO fsync               $posix_aio_fsync"
+print_config "POSIX AIO fsync" "$posix_aio_fsync"
 
 ##########################################
 # POSIX pshared attribute probe
@@ -638,7 +643,7 @@ EOF
 if compile_prog "" "$LIBS" "posix_pshared" ; then
   posix_pshared=yes
 fi
-echo "POSIX pshared support         $posix_pshared"
+print_config "POSIX pshared support" "$posix_pshared"
 
 ##########################################
 # solaris aio probe
@@ -660,7 +665,7 @@ if compile_prog "" "-laio" "solarisaio" ; then
   solaris_aio=yes
   LIBS="-laio $LIBS"
 fi
-echo "Solaris AIO support           $solaris_aio"
+print_config "Solaris AIO support" "$solaris_aio"
 
 ##########################################
 # __sync_fetch_and_add test
@@ -684,7 +689,7 @@ EOF
 if compile_prog "" "" "__sync_fetch_and_add()" ; then
     sfaa="yes"
 fi
-echo "__sync_fetch_and_add          $sfaa"
+print_config "__sync_fetch_and_add" "$sfaa"
 
 ##########################################
 # libverbs probe
@@ -704,7 +709,7 @@ if test "$disable_rdma" != "yes" && compile_prog "" "-libverbs" "libverbs" ; the
     libverbs="yes"
     LIBS="-libverbs $LIBS"
 fi
-echo "libverbs                      $libverbs"
+print_config "libverbs" "$libverbs"
 
 ##########################################
 # rdmacm probe
@@ -724,7 +729,7 @@ if test "$disable_rdma" != "yes" && compile_prog "" "-lrdmacm" "rdma"; then
     rdmacm="yes"
     LIBS="-lrdmacm $LIBS"
 fi
-echo "rdmacm                        $rdmacm"
+print_config "rdmacm" "$rdmacm"
 
 ##########################################
 # Linux fallocate probe
@@ -744,7 +749,7 @@ EOF
 if compile_prog "" "" "linux_fallocate"; then
     linux_fallocate="yes"
 fi
-echo "Linux fallocate               $linux_fallocate"
+print_config "Linux fallocate" "$linux_fallocate"
 
 ##########################################
 # POSIX fadvise probe
@@ -763,7 +768,7 @@ EOF
 if compile_prog "" "" "posix_fadvise"; then
     posix_fadvise="yes"
 fi
-echo "POSIX fadvise                 $posix_fadvise"
+print_config "POSIX fadvise" "$posix_fadvise"
 
 ##########################################
 # POSIX fallocate probe
@@ -782,7 +787,7 @@ EOF
 if compile_prog "" "" "posix_fallocate"; then
     posix_fallocate="yes"
 fi
-echo "POSIX fallocate               $posix_fallocate"
+print_config "POSIX fallocate" "$posix_fallocate"
 
 ##########################################
 # sched_set/getaffinity 2 or 3 argument test
@@ -815,8 +820,8 @@ EOF
     linux_2arg_affinity="yes"
   fi
 fi
-echo "sched_setaffinity(3 arg)      $linux_3arg_affinity"
-echo "sched_setaffinity(2 arg)      $linux_2arg_affinity"
+print_config "sched_setaffinity(3 arg)" "$linux_3arg_affinity"
+print_config "sched_setaffinity(2 arg)" "$linux_2arg_affinity"
 
 ##########################################
 # clock_gettime probe
@@ -837,7 +842,7 @@ elif compile_prog "" "-lrt" "clock_gettime"; then
     clock_gettime="yes"
     LIBS="-lrt $LIBS"
 fi
-echo "clock_gettime                 $clock_gettime"
+print_config "clock_gettime" "$clock_gettime"
 
 ##########################################
 # CLOCK_MONOTONIC probe
@@ -857,7 +862,7 @@ EOF
       clock_monotonic="yes"
   fi
 fi
-echo "CLOCK_MONOTONIC               $clock_monotonic"
+print_config "CLOCK_MONOTONIC" "$clock_monotonic"
 
 ##########################################
 # CLOCK_MONOTONIC_RAW probe
@@ -877,7 +882,7 @@ EOF
       clock_monotonic_raw="yes"
   fi
 fi
-echo "CLOCK_MONOTONIC_RAW           $clock_monotonic_raw"
+print_config "CLOCK_MONOTONIC_RAW" "$clock_monotonic_raw"
 
 ##########################################
 # CLOCK_MONOTONIC_PRECISE probe
@@ -897,7 +902,7 @@ EOF
       clock_monotonic_precise="yes"
   fi
 fi
-echo "CLOCK_MONOTONIC_PRECISE       $clock_monotonic_precise"
+print_config "CLOCK_MONOTONIC_PRECISE" "$clock_monotonic_precise"
 
 ##########################################
 # clockid_t probe
@@ -917,7 +922,7 @@ EOF
 if compile_prog "" "$LIBS" "clockid_t"; then
   clockid_t="yes"
 fi
-echo "clockid_t                     $clockid_t"
+print_config "clockid_t" "$clockid_t"
 
 ##########################################
 # gettimeofday() probe
@@ -936,7 +941,7 @@ EOF
 if compile_prog "" "" "gettimeofday"; then
     gettimeofday="yes"
 fi
-echo "gettimeofday                  $gettimeofday"
+print_config "gettimeofday" "$gettimeofday"
 
 ##########################################
 # fdatasync() probe
@@ -954,7 +959,7 @@ EOF
 if compile_prog "" "" "fdatasync"; then
   fdatasync="yes"
 fi
-echo "fdatasync                     $fdatasync"
+print_config "fdatasync" "$fdatasync"
 
 ##########################################
 # sync_file_range() probe
@@ -976,7 +981,7 @@ EOF
 if compile_prog "" "" "sync_file_range"; then
   sync_file_range="yes"
 fi
-echo "sync_file_range               $sync_file_range"
+print_config "sync_file_range" "$sync_file_range"
 
 ##########################################
 # ext4 move extent probe
@@ -1000,7 +1005,7 @@ elif test $targetos = "Linux" ; then
   # work. Takes a while to bubble back.
   ext4_me="yes"
 fi
-echo "EXT4 move extent              $ext4_me"
+print_config "EXT4 move extent" "$ext4_me"
 
 ##########################################
 # splice probe
@@ -1018,7 +1023,7 @@ EOF
 if compile_prog "" "" "linux splice"; then
   linux_splice="yes"
 fi
-echo "Linux splice(2)               $linux_splice"
+print_config "Linux splice(2)" "$linux_splice"
 
 ##########################################
 # GUASI probe
@@ -1037,7 +1042,7 @@ EOF
 if compile_prog "" "" "guasi"; then
   guasi="yes"
 fi
-echo "GUASI                         $guasi"
+print_config "GUASI" "$guasi"
 
 ##########################################
 # fusion-aw probe
@@ -1059,7 +1064,7 @@ if compile_prog "" "-L/usr/lib/fio -L/usr/lib/nvm -lnvm-primitives -ldl -lpthrea
   LIBS="-L/usr/lib/fio -L/usr/lib/nvm -lnvm-primitives -ldl -lpthread $LIBS"
   fusion_aw="yes"
 fi
-echo "Fusion-io atomic engine       $fusion_aw"
+print_config "Fusion-io atomic engine" "$fusion_aw"
 
 ##########################################
 # libnuma probe
@@ -1077,7 +1082,7 @@ if test "$disable_numa" != "yes"  && compile_prog "" "-lnuma" "libnuma"; then
   libnuma="yes"
   LIBS="-lnuma $LIBS"
 fi
-echo "libnuma                       $libnuma"
+print_config "libnuma" "$libnuma"
 
 ##########################################
 # libnuma 2.x version API, initialize with "no" only if $libnuma is set to "yes"
@@ -1094,7 +1099,7 @@ EOF
 if compile_prog "" "" "libnuma api"; then
   libnuma_v2="yes"
 fi
-echo "libnuma v2                    $libnuma_v2"
+print_config "libnuma v2" "$libnuma_v2"
 fi
 
 ##########################################
@@ -1114,7 +1119,7 @@ EOF
 if compile_prog "" "" "strsep"; then
   strsep="yes"
 fi
-echo "strsep                        $strsep"
+print_config "strsep" "$strsep"
 
 ##########################################
 # strcasestr() probe
@@ -1131,7 +1136,7 @@ EOF
 if compile_prog "" "" "strcasestr"; then
   strcasestr="yes"
 fi
-echo "strcasestr                    $strcasestr"
+print_config "strcasestr" "$strcasestr"
 
 ##########################################
 # strlcat() probe
@@ -1152,7 +1157,7 @@ EOF
 if compile_prog "" "" "strlcat"; then
   strlcat="yes"
 fi
-echo "strlcat                       $strlcat"
+print_config "strlcat" "$strlcat"
 
 ##########################################
 # getopt_long_only() probe
@@ -1172,7 +1177,7 @@ EOF
 if compile_prog "" "" "getopt_long_only"; then
   getopt_long_only="yes"
 fi
-echo "getopt_long_only()            $getopt_long_only"
+print_config "getopt_long_only()" "$getopt_long_only"
 
 ##########################################
 # inet_aton() probe
@@ -1192,7 +1197,7 @@ EOF
 if compile_prog "" "" "inet_aton"; then
   inet_aton="yes"
 fi
-echo "inet_aton                     $inet_aton"
+print_config "inet_aton" "$inet_aton"
 
 ##########################################
 # socklen_t probe
@@ -1210,7 +1215,7 @@ EOF
 if compile_prog "" "" "socklen_t"; then
   socklen_t="yes"
 fi
-echo "socklen_t                     $socklen_t"
+print_config "socklen_t" "$socklen_t"
 
 ##########################################
 # Whether or not __thread is supported for TLS
@@ -1228,7 +1233,7 @@ EOF
 if compile_prog "" "" "__thread"; then
   tls_thread="yes"
 fi
-echo "__thread                      $tls_thread"
+print_config "__thread" "$tls_thread"
 
 ##########################################
 # Check if we have required gtk/glib support for gfio
@@ -1278,7 +1283,7 @@ LDFLAGS=$ORG_LDFLAGS
 fi
 
 if test "$gfio_check" = "yes" ; then
-  echo "gtk 2.18 or higher            $gfio"
+  print_config "gtk 2.18 or higher" "$gfio"
 fi
 
 ##########################################
@@ -1299,7 +1304,7 @@ EOF
 if compile_prog "" "" "RUSAGE_THREAD"; then
   rusage_thread="yes"
 fi
-echo "RUSAGE_THREAD                 $rusage_thread"
+print_config "RUSAGE_THREAD" "$rusage_thread"
 
 ##########################################
 # Check whether we have SCHED_IDLE
@@ -1317,7 +1322,7 @@ EOF
 if compile_prog "" "" "SCHED_IDLE"; then
   sched_idle="yes"
 fi
-echo "SCHED_IDLE                    $sched_idle"
+print_config "SCHED_IDLE" "$sched_idle"
 
 ##########################################
 # Check whether we have TCP_NODELAY
@@ -1337,7 +1342,7 @@ EOF
 if compile_prog "" "" "TCP_NODELAY"; then
   tcp_nodelay="yes"
 fi
-echo "TCP_NODELAY                   $tcp_nodelay"
+print_config "TCP_NODELAY" "$tcp_nodelay"
 
 ##########################################
 # Check whether we have SO_SNDBUF
@@ -1358,7 +1363,7 @@ EOF
 if compile_prog "" "" "SO_SNDBUF"; then
   window_size="yes"
 fi
-echo "Net engine window_size        $window_size"
+print_config "Net engine window_size" "$window_size"
 
 ##########################################
 # Check whether we have TCP_MAXSEG
@@ -1380,7 +1385,7 @@ EOF
 if compile_prog "" "" "TCP_MAXSEG"; then
   mss="yes"
 fi
-echo "TCP_MAXSEG                    $mss"
+print_config "TCP_MAXSEG" "$mss"
 
 ##########################################
 # Check whether we have RLIMIT_MEMLOCK
@@ -1399,7 +1404,7 @@ EOF
 if compile_prog "" "" "RLIMIT_MEMLOCK"; then
   rlimit_memlock="yes"
 fi
-echo "RLIMIT_MEMLOCK                $rlimit_memlock"
+print_config "RLIMIT_MEMLOCK" "$rlimit_memlock"
 
 ##########################################
 # Check whether we have pwritev/preadv
@@ -1417,7 +1422,7 @@ EOF
 if compile_prog "" "" "pwritev"; then
   pwritev="yes"
 fi
-echo "pwritev/preadv                $pwritev"
+print_config "pwritev/preadv" "$pwritev"
 
 ##########################################
 # Check whether we have pwritev2/preadv2
@@ -1435,7 +1440,7 @@ EOF
 if compile_prog "" "" "pwritev2"; then
   pwritev2="yes"
 fi
-echo "pwritev2/preadv2              $pwritev2"
+print_config "pwritev2/preadv2" "$pwritev2"
 
 ##########################################
 # Check whether we have the required functions for ipv6
@@ -1464,7 +1469,7 @@ EOF
 if compile_prog "" "" "ipv6"; then
   ipv6="yes"
 fi
-echo "IPv6 helpers                  $ipv6"
+print_config "IPv6 helpers" "$ipv6"
 
 ##########################################
 # check for rbd
@@ -1491,7 +1496,7 @@ if test "$disable_rbd" != "yes"  && compile_prog "" "-lrbd -lrados" "rbd"; then
   LIBS="-lrbd -lrados $LIBS"
   rbd="yes"
 fi
-echo "Rados Block Device engine     $rbd"
+print_config "Rados Block Device engine" "$rbd"
 
 ##########################################
 # check for rbd_poll
@@ -1518,7 +1523,7 @@ EOF
 if compile_prog "" "-lrbd -lrados" "rbd"; then
   rbd_poll="yes"
 fi
-echo "rbd_poll                      $rbd_poll"
+print_config "rbd_poll" "$rbd_poll"
 fi
 
 ##########################################
@@ -1540,7 +1545,7 @@ EOF
 if compile_prog "" "-lrbd -lrados" "rbd"; then
   rbd_inval="yes"
 fi
-echo "rbd_invalidate_cache          $rbd_inval"
+print_config "rbd_invalidate_cache" "$rbd_inval"
 fi
 
 ##########################################
@@ -1571,7 +1576,7 @@ if test "$disable_rbd" != "yes" && test "$disable_rbd_blkin" != "yes" \
   LIBS="-lblkin $LIBS"
   rbd_blkin="yes"
 fi
-echo "rbd blkin tracing             $rbd_blkin"
+print_config "rbd blkin tracing" "$rbd_blkin"
 
 ##########################################
 # Check whether we have setvbuf
@@ -1591,7 +1596,7 @@ EOF
 if compile_prog "" "" "setvbuf"; then
   setvbuf="yes"
 fi
-echo "setvbuf                       $setvbuf"
+print_config "setvbuf" "$setvbuf"
 
 ##########################################
 # check for gfapi
@@ -1612,7 +1617,7 @@ if test "$disable_gfapi" != "yes"  && compile_prog "" "-lgfapi -lglusterfs" "gfa
   LIBS="-lgfapi -lglusterfs $LIBS"
   gfapi="yes"
 fi
- echo "Gluster API engine            $gfapi"
+print_config "Gluster API engine" "$gfapi"
 
 ##########################################
 # check for gfapi fadvise support, initialize with "no" only if $gfapi is set to "yes"
@@ -1632,7 +1637,7 @@ EOF
 if compile_prog "" "-lgfapi -lglusterfs" "gfapi"; then
   gf_fadvise="yes"
 fi
-echo "Gluster API use fadvise       $gf_fadvise"
+print_config "Gluster API use fadvise" "$gf_fadvise"
 fi
 
 ##########################################
@@ -1652,7 +1657,7 @@ EOF
 if compile_prog "" "-lgfapi -lglusterfs" "gf trim"; then
   gf_trim="yes"
 fi
-echo "Gluster API trim support      $gf_trim"
+print_config "Gluster API trim support" "$gf_trim"
 fi
 
 ##########################################
@@ -1682,11 +1687,11 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "" "s390_z196_facilities"; then
   $TMPE
-  if [[ $? -eq 0 ]]; then
+  if [ $? -eq 0 ]; then
   	s390_z196_facilities="yes"
   fi
 fi
-echo "s390_z196_facilities          $s390_z196_facilities"
+print_config "s390_z196_facilities" "$s390_z196_facilities"
 
 ##########################################
 # Check if we have required environment variables configured for libhdfs
@@ -1712,7 +1717,7 @@ if test "$libhdfs" = "yes" ; then
     FIO_HDFS_CPU="amd64"
   fi
 fi
-echo "HDFS engine                   $libhdfs"
+print_config "HDFS engine" "$libhdfs"
 
 ##########################################
 # Check whether we have MTD
@@ -1735,7 +1740,7 @@ EOF
 if compile_prog "" "" "mtd"; then
   mtd="yes"
 fi
-echo "MTD                           $mtd"
+print_config "MTD" "$mtd"
 
 ##########################################
 # Check whether we have libpmem
@@ -1755,7 +1760,7 @@ if compile_prog "" "-lpmem" "libpmem"; then
   libpmem="yes"
   LIBS="-lpmem $LIBS"
 fi
-echo "libpmem                       $libpmem"
+print_config "libpmem" "$libpmem"
 
 ##########################################
 # Check whether we have libpmemblk
@@ -1778,7 +1783,7 @@ EOF
     LIBS="-lpmemblk $LIBS"
   fi
 fi
-echo "libpmemblk                    $libpmemblk"
+print_config "libpmemblk" "$libpmemblk"
 
 # Choose the ioengines
 if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
@@ -1790,11 +1795,11 @@ fi
 
 ##########################################
 # Report whether pmemblk engine is enabled
-echo "NVML pmemblk engine           $pmemblk"
+print_config "NVML pmemblk engine" "$pmemblk"
 
 ##########################################
 # Report whether dev-dax engine is enabled
-echo "NVML dev-dax engine           $devdax"
+print_config "NVML dev-dax engine" "$devdax"
 
 ##########################################
 # Check if we have lex/yacc available
@@ -1855,7 +1860,7 @@ fi
 fi
 fi
 
-echo "lex/yacc for arithmetic       $arith"
+print_config "lex/yacc for arithmetic" "$arith"
 
 ##########################################
 # Check whether we have setmntent/getmntent
@@ -1876,7 +1881,7 @@ EOF
 if compile_prog "" "" "getmntent"; then
   getmntent="yes"
 fi
-echo "getmntent                     $getmntent"
+print_config "getmntent" "$getmntent"
 
 ##########################################
 # Check whether we have getmntinfo
@@ -1901,7 +1906,7 @@ EOF
 if compile_prog "-Werror" "" "getmntinfo"; then
   getmntinfo="yes"
 fi
-echo "getmntinfo                    $getmntinfo"
+print_config "getmntinfo" "$getmntinfo"
 
 # getmntinfo(3) for NetBSD.
 if test "$getmntinfo_statvfs" != "yes" ; then
@@ -1919,7 +1924,7 @@ EOF
 # Skip the test if the one with statfs arg is detected.
 if test "$getmntinfo" != "yes" && compile_prog "-Werror" "" "getmntinfo_statvfs"; then
   getmntinfo_statvfs="yes"
-  echo "getmntinfo_statvfs            $getmntinfo_statvfs"
+  print_config "getmntinfo_statvfs" "$getmntinfo_statvfs"
 fi
 
 ##########################################
@@ -1945,7 +1950,7 @@ EOF
 if compile_prog "" "" "static_assert"; then
     static_assert="yes"
 fi
-echo "Static Assert                 $static_assert"
+print_config "Static Assert" "$static_assert"
 
 ##########################################
 # Check whether we have bool / stdbool.h
@@ -1963,7 +1968,7 @@ EOF
 if compile_prog "" "" "bool"; then
   have_bool="yes"
 fi
-echo "bool                          $have_bool"
+print_config "bool" "$have_bool"
 
 ##########################################
 # check march=armv8-a+crc+crypto
@@ -1986,7 +1991,7 @@ EOF
     CFLAGS="$CFLAGS -march=armv8-a+crc+crypto -DARCH_HAVE_CRC_CRYPTO"
   fi
 fi
-echo "march_armv8_a_crc_crypto      $march_armv8_a_crc_crypto"
+print_config "march_armv8_a_crc_crypto" "$march_armv8_a_crc_crypto"
 
 ##########################################
 # cuda probe
@@ -2004,7 +2009,7 @@ if test "$enable_cuda" = "yes" && compile_prog "" "-lcuda" "cuda"; then
   cuda="yes"
   LIBS="-lcuda $LIBS"
 fi
-echo "cuda                          $cuda"
+print_config "cuda" "$cuda"
 
 #############################################################################
 
diff --git a/fio.h b/fio.h
index ed631bc..963cf03 100644
--- a/fio.h
+++ b/fio.h
@@ -640,14 +640,6 @@ extern void free_threads_shm(void);
  */
 extern void reset_all_stats(struct thread_data *);
 
-/*
- * blktrace support
- */
-#ifdef FIO_HAVE_BLKTRACE
-extern int is_blktrace(const char *, int *);
-extern int load_blktrace(struct thread_data *, const char *, int);
-#endif
-
 extern int io_queue_event(struct thread_data *td, struct io_u *io_u, int *ret,
 		   enum fio_ddir ddir, uint64_t *bytes_issued, int from_verify,
 		   struct timeval *comp_time);
diff --git a/init.c b/init.c
index 52a5f03..d224bd6 100644
--- a/init.c
+++ b/init.c
@@ -1080,8 +1080,12 @@ static int setup_random_seeds(struct thread_data *td)
 	unsigned long seed;
 	unsigned int i;
 
-	if (!td->o.rand_repeatable && !fio_option_is_set(&td->o, rand_seed))
-		return init_random_state(td, td->rand_seeds, sizeof(td->rand_seeds));
+	if (!td->o.rand_repeatable && !fio_option_is_set(&td->o, rand_seed)) {
+		int ret = init_random_seeds(td->rand_seeds, sizeof(td->rand_seeds));
+		if (!ret)
+			td_fill_rand_seeds(td);
+		return ret;
+	}
 
 	seed = td->o.rand_seed;
 	for (i = 0; i < 4; i++)
@@ -1376,7 +1380,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	prev_group_jobs++;
 
 	if (setup_random_seeds(td)) {
-		td_verror(td, errno, "init_random_state");
+		td_verror(td, errno, "setup_random_seeds");
 		goto err;
 	}
 
diff --git a/iolog.c b/iolog.c
index 31d674c..01b82e8 100644
--- a/iolog.c
+++ b/iolog.c
@@ -19,6 +19,7 @@
 #include "trim.h"
 #include "filelock.h"
 #include "smalloc.h"
+#include "blktrace.h"
 
 static int iolog_flush(struct io_log *log);
 
diff --git a/json.c b/json.c
index 2160d29..e0227ec 100644
--- a/json.c
+++ b/json.c
@@ -340,13 +340,9 @@ static void json_print_array(struct json_array *array, struct buf_output *out)
 static void json_print_value(struct json_value *value, struct buf_output *out)
 {
 	switch (value->type) {
-	case JSON_TYPE_STRING: {
-			const char delimiter = '"';
-			buf_output_add(out, &delimiter, sizeof(delimiter));
-			buf_output_add(out, value->string, strlen(value->string));
-			buf_output_add(out, &delimiter, sizeof(delimiter));
-			break;
-		}
+	case JSON_TYPE_STRING:
+		log_buf(out, "\"%s\"", value->string);
+		break;
 	case JSON_TYPE_INTEGER:
 		log_buf(out, "%lld", value->integer_number);
 		break;
diff --git a/lib/output_buffer.c b/lib/output_buffer.c
index c1fdfc9..313536d 100644
--- a/lib/output_buffer.c
+++ b/lib/output_buffer.c
@@ -3,7 +3,6 @@
 #include <stdlib.h>
 
 #include "output_buffer.h"
-#include "../log.h"
 #include "../minmax.h"
 
 #define BUF_INC	1024
@@ -41,15 +40,10 @@ size_t buf_output_add(struct buf_output *out, const char *buf, size_t len)
 	return len;
 }
 
-size_t buf_output_flush(struct buf_output *out)
+void buf_output_clear(struct buf_output *out)
 {
-	size_t ret = 0;
-
 	if (out->buflen) {
-		ret = log_info_buf(out->buf, out->buflen);
 		memset(out->buf, 0, out->max_buflen);
 		out->buflen = 0;
 	}
-
-	return ret;
 }
diff --git a/lib/output_buffer.h b/lib/output_buffer.h
index 396002f..15ee005 100644
--- a/lib/output_buffer.h
+++ b/lib/output_buffer.h
@@ -12,6 +12,6 @@ struct buf_output {
 void buf_output_init(struct buf_output *out);
 void buf_output_free(struct buf_output *out);
 size_t buf_output_add(struct buf_output *out, const char *buf, size_t len);
-size_t buf_output_flush(struct buf_output *out);
+void buf_output_clear(struct buf_output *out);
 
 #endif
diff --git a/log.c b/log.c
index 4eb4af5..c7856eb 100644
--- a/log.c
+++ b/log.c
@@ -6,8 +6,16 @@
 
 #include "fio.h"
 
+#define LOG_START_SZ		512
+
 size_t log_info_buf(const char *buf, size_t len)
 {
+	/*
+	 * buf could be NULL (not just "").
+	 */
+	if (!buf)
+		return 0;
+
 	if (is_backend) {
 		size_t ret = fio_server_text_output(FIO_LOG_INFO, buf, len);
 		if (ret != -1)
@@ -23,38 +31,76 @@ size_t log_info_buf(const char *buf, size_t len)
 
 size_t log_valist(const char *str, va_list args)
 {
-	char buffer[1024];
-	size_t len;
+	size_t len, cur = LOG_START_SZ;
+	char *buffer;
+
+	do {
+		buffer = calloc(1, cur);
+
+		len = vsnprintf(buffer, cur, str, args);
+		if (len < cur)
+			break;
+
+		cur = len + 1;
+		free(buffer);
+	} while (1);
 
-	len = vsnprintf(buffer, sizeof(buffer), str, args);
+	cur = log_info_buf(buffer, len);
+	free(buffer);
 
-	return log_info_buf(buffer, min(len, sizeof(buffer) - 1));
+	return cur;
 }
 
 size_t log_info(const char *format, ...)
 {
-	char buffer[1024];
+	size_t len, cur = LOG_START_SZ;
+	char *buffer;
 	va_list args;
-	size_t len;
 
-	va_start(args, format);
-	len = vsnprintf(buffer, sizeof(buffer), format, args);
-	va_end(args);
+	do {
+		buffer = calloc(1, cur);
 
-	return log_info_buf(buffer, min(len, sizeof(buffer) - 1));
+		va_start(args, format);
+		len = vsnprintf(buffer, cur, format, args);
+		va_end(args);
+
+		if (len < cur)
+			break;
+
+		cur = len + 1;
+		free(buffer);
+	} while (1);
+
+	cur = log_info_buf(buffer, len);
+	free(buffer);
+
+	return cur;
 }
 
 size_t __log_buf(struct buf_output *buf, const char *format, ...)
 {
-	char buffer[1024];
+	size_t len, cur = LOG_START_SZ;
+	char *buffer;
 	va_list args;
-	size_t len;
 
-	va_start(args, format);
-	len = vsnprintf(buffer, sizeof(buffer), format, args);
-	va_end(args);
+	do {
+		buffer = calloc(1, cur);
+
+		va_start(args, format);
+		len = vsnprintf(buffer, cur, format, args);
+		va_end(args);
 
-	return buf_output_add(buf, buffer, min(len, sizeof(buffer) - 1));
+		if (len < cur)
+			break;
+
+		cur = len + 1;
+		free(buffer);
+	} while (1);
+
+	cur = buf_output_add(buf, buffer, len);
+	free(buffer);
+
+	return cur;
 }
 
 int log_info_flush(void)
@@ -67,33 +113,44 @@ int log_info_flush(void)
 
 size_t log_err(const char *format, ...)
 {
-	char buffer[1024];
+	size_t ret, len, cur = LOG_START_SZ;
+	char *buffer;
 	va_list args;
-	size_t len;
 
-	va_start(args, format);
-	len = vsnprintf(buffer, sizeof(buffer), format, args);
-	va_end(args);
-	len = min(len, sizeof(buffer) - 1);
+	do {
+		buffer = calloc(1, cur);
+
+		va_start(args, format);
+		len = vsnprintf(buffer, cur, format, args);
+		va_end(args);
+
+		if (len < cur)
+			break;
+
+		cur = len + 1;
+		free(buffer);
+	} while (1);
+
 
 	if (is_backend) {
-		size_t ret = fio_server_text_output(FIO_LOG_ERR, buffer, len);
+		ret = fio_server_text_output(FIO_LOG_ERR, buffer, len);
 		if (ret != -1)
-			return ret;
+			goto done;
 	}
 
 	if (log_syslog) {
 		syslog(LOG_INFO, "%s", buffer);
-		return len;
+		ret = len;
 	} else {
-		if (f_err != stderr) {
-			int fio_unused ret;
-
+		if (f_err != stderr)
 			ret = fwrite(buffer, len, 1, stderr);
-		}
 
-		return fwrite(buffer, len, 1, f_err);
+		ret = fwrite(buffer, len, 1, f_err);
 	}
+
+done:
+	free(buffer);
+	return ret;
 }
 
 const char *log_get_level(int level)
diff --git a/log.h b/log.h
index a39dea6..66546c4 100644
--- a/log.h
+++ b/log.h
@@ -16,13 +16,15 @@ extern size_t log_valist(const char *str, va_list);
 extern size_t log_info_buf(const char *buf, size_t len);
 extern int log_info_flush(void);
 
-#define log_buf(buf, format, args...)		\
-do {						\
-	if ((buf) != NULL)			\
-		__log_buf(buf, format, ##args);	\
-	else					\
-		log_info(format, ##args);	\
-} while (0)
+#define log_buf(buf, format, args...)			\
+({							\
+	size_t __ret;					\
+	if ((buf) != NULL)				\
+		__ret = __log_buf(buf, format, ##args);	\
+	else						\
+		__ret = log_info(format, ##args);	\
+	__ret;						\
+})
 
 enum {
 	FIO_LOG_DEBUG	= 1,
diff --git a/os/os-linux.h b/os/os-linux.h
index ba53590..008ce2d 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -16,6 +16,8 @@
 #include <linux/unistd.h>
 #include <linux/raw.h>
 #include <linux/major.h>
+#include <linux/fs.h>
+#include <scsi/sg.h>
 
 #include "./os-linux-syscall.h"
 #include "binject.h"
@@ -258,6 +260,14 @@ static inline int arch_cache_line_size(void)
 		return atoi(size);
 }
 
+#ifdef __powerpc64__
+#define FIO_HAVE_CPU_ONLINE_SYSCONF
+static inline unsigned int cpus_online(void)
+{
+        return sysconf(_SC_NPROCESSORS_CONF);
+}
+#endif
+
 static inline unsigned long long get_fs_free_size(const char *path)
 {
 	unsigned long long ret;
diff --git a/os/os-windows.h b/os/os-windows.h
index 0c8c42d..36b421e 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -116,7 +116,6 @@ int nanosleep(const struct timespec *rqtp, struct timespec *rmtp);
 ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);
 ssize_t pwrite(int fildes, const void *buf, size_t nbyte,
 		off_t offset);
-extern void td_fill_rand_seeds(struct thread_data *);
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
@@ -239,7 +238,7 @@ static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
 	return 0;
 }
 
-static inline int init_random_state(struct thread_data *td, unsigned long *rand_seeds, int size)
+static inline int init_random_seeds(unsigned long *rand_seeds, int size)
 {
 	HCRYPTPROV hCryptProv;
 
@@ -258,7 +257,6 @@ static inline int init_random_state(struct thread_data *td, unsigned long *rand_
 	}
 
 	CryptReleaseContext(hCryptProv, 0);
-	td_fill_rand_seeds(td);
 	return 0;
 }
 
diff --git a/os/os.h b/os/os.h
index 5e3c813..1d400c8 100644
--- a/os/os.h
+++ b/os/os.h
@@ -60,11 +60,6 @@ typedef struct aiocb os_aiocb_t;
 #endif
 #endif
 
-#ifdef FIO_HAVE_SGIO
-#include <linux/fs.h>
-#include <scsi/sg.h>
-#endif
-
 #ifndef CONFIG_STRSEP
 #include "../oslib/strsep.h"
 #endif
@@ -253,19 +248,6 @@ static inline uint64_t fio_swap64(uint64_t val)
 	__cpu_to_le64(val);			\
 })
 
-#ifndef FIO_HAVE_BLKTRACE
-static inline int is_blktrace(const char *fname, int *need_swap)
-{
-	return 0;
-}
-struct thread_data;
-static inline int load_blktrace(struct thread_data *td, const char *fname,
-				int need_swap)
-{
-	return 1;
-}
-#endif
-
 #define FIO_DEF_CL_SIZE		128
 
 static inline int os_cache_line_size(void)
@@ -316,12 +298,7 @@ static inline long os_random_long(os_random_state_t *rs)
 #endif
 
 #ifdef FIO_USE_GENERIC_INIT_RANDOM_STATE
-extern void td_fill_rand_seeds(struct thread_data *td);
-/*
- * Initialize the various random states we need (random io, block size ranges,
- * read/write mix, etc).
- */
-static inline int init_random_state(struct thread_data *td, unsigned long *rand_seeds, int size)
+static inline int init_random_seeds(unsigned long *rand_seeds, int size)
 {
 	int fd;
 
@@ -336,7 +313,6 @@ static inline int init_random_state(struct thread_data *td, unsigned long *rand_
 	}
 
 	close(fd);
-	td_fill_rand_seeds(td);
 	return 0;
 }
 #endif
@@ -348,14 +324,6 @@ static inline unsigned long long get_fs_free_size(const char *path)
 }
 #endif
 
-#ifdef __powerpc64__
-#define FIO_HAVE_CPU_ONLINE_SYSCONF
-static inline unsigned int cpus_online(void)
-{
-        return sysconf(_SC_NPROCESSORS_CONF);
-}
-#endif
-
 #ifndef FIO_HAVE_CPU_ONLINE_SYSCONF
 static inline unsigned int cpus_online(void)
 {
diff --git a/stat.c b/stat.c
index 5b48413..1f124a8 100644
--- a/stat.c
+++ b/stat.c
@@ -1825,8 +1825,10 @@ void __show_run_stats(void)
 	}
 
 	for (i = 0; i < FIO_OUTPUT_NR; i++) {
-		buf_output_flush(&output[i]);
-		buf_output_free(&output[i]);
+		struct buf_output *out = &output[i];
+		log_info_buf(out->buf, out->buflen);
+		buf_output_clear(out);
+		buf_output_free(out);
 	}
 
 	log_info_flush();

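The reworked log.c helpers (`log_valist()`, `log_info()`, `__log_buf()` and `log_err()`) drop the fixed 1024-byte stack buffer and share one loop. Each allocates LOG_START_SZ (512) bytes and calls vsnprintf(). If the return value shows the output did not fit, it retries with exactly that length plus one byte for the terminating NUL.

One detail matters if you lift the loop out on its own. In `log_valist()` the same `va_list` is passed to vsnprintf() on every attempt, but a `va_list` is indeterminate once vsnprintf() has used it. A standalone version should therefore va_copy() it for each attempt.

The following is a minimal sketch under those assumptions. The names `format_valloc()` and `format_alloc()` are hypothetical. Unlike the patch, the sketch also checks for allocation failure and for a negative vsnprintf() return.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOG_START_SZ	512

/*
 * Hypothetical helper: return a heap-allocated, NUL-terminated string
 * holding the formatted output and store its length in *lenp.
 * The caller must free() the result. Returns NULL on error.
 */
static char *format_valloc(size_t *lenp, const char *fmt, va_list args)
{
	size_t cur = LOG_START_SZ;

	for (;;) {
		char *buf = malloc(cur);
		va_list copy;
		int len;

		if (!buf)
			return NULL;

		/* vsnprintf() consumes its va_list, so format from a fresh copy */
		va_copy(copy, args);
		len = vsnprintf(buf, cur, fmt, copy);
		va_end(copy);

		if (len < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t) len < cur) {
			*lenp = (size_t) len;
			return buf;
		}

		/* Truncated: retry with room for len bytes plus the NUL */
		cur = (size_t) len + 1;
		free(buf);
	}
}

/* Variadic front end, analogous to log_info() wrapping the loop */
static char *format_alloc(size_t *lenp, const char *fmt, ...)
{
	va_list args;
	char *ret;

	va_start(args, fmt);
	ret = format_valloc(lenp, fmt, args);
	va_end(args);
	return ret;
}
```

For example, formatting a 2000-character string with `format_alloc(&len, "<%s>", big)` takes two attempts: 512 bytes, then exactly `len + 1`.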
^ permalink raw reply related	[flat|nested] 1305+ messages in thread
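The log.h hunk changes `log_buf()` from a `do { } while (0)` block into a GNU C statement expression, `({ ... })`. The value of the last expression inside becomes the value of the whole macro, so callers can now see how many bytes were logged.

The following is a minimal sketch of the same construct, assuming GCC or Clang (statement expressions are a compiler extension, not ISO C). `struct sink`, `sink_add()` and `stdout_add()` are hypothetical stand-ins for `struct buf_output`, `__log_buf()` and `log_info()`.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct buf_output */
struct sink {
	char data[256];
	size_t len;
};

/* Stand-in for __log_buf(): append to the buffer, return bytes added */
static size_t sink_add(struct sink *s, const char *str)
{
	size_t n = strlen(str);
	size_t room = sizeof(s->data) - 1 - s->len;

	if (n > room)
		n = room;
	memcpy(s->data + s->len, str, n);
	s->len += n;
	s->data[s->len] = '\0';
	return n;
}

/* Stand-in for log_info(): write to stdout, return bytes written */
static size_t stdout_add(const char *str)
{
	return fwrite(str, 1, strlen(str), stdout);
}

/*
 * Statement expression: the final expression, _ret, is the value
 * of the whole macro invocation, whichever branch was taken.
 */
#define sink_log(s, str)			\
({						\
	size_t _ret;				\
	if ((s) != NULL)			\
		_ret = sink_add((s), (str));	\
	else					\
		_ret = stdout_add(str);		\
	_ret;					\
})
```

With the old `do { } while (0)` form, `size_t n = log_buf(...)` would not compile. That is why the macro had to change once callers wanted the byte count.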

* Recent changes (master)
@ 2017-05-23 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-23 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4641daa9b2e33bc63197dfec48584eaf05890a01:

  Fio 2.20 (2017-05-19 08:25:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to af13d1e88158d3e37940648be139d7a46fe00431:

  Merge branch 'bugfix' of https://github.com/YukiKita/fio (2017-05-22 10:23:25 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      Merge branch 'android-26-shm' of https://github.com/omor1/fio
      Merge branch 'alignment' of https://github.com/omor1/fio
      Fix typo in man page / HOWTO for block size defaults
      Merge branch 'alignment' of https://github.com/sitsofe/fio
      Merge branch 'android_cgroup' of https://github.com/omor1/fio
      Merge branch 'bugfix' of https://github.com/YukiKita/fio

Omri Mor (5):
      flist.h: replace offsetof macros by stddef.h include
      Android: add support for cgroups
      configure: fix _Static_assert check
      os/os-android.h: fix alignment problems in shared memory functions Fixes: #356 ("Android: SIGBUS due to unaligned access")
      os/os-android.h: fix compilation for Android O

Sitsofe Wheeler (2):
      fio: fix some struct alignment issues
      server: bump protocol version

YukiKita (1):
      Fixed json_print_value so that ending double quote of JSON string value will not disappear

 HOWTO           |  2 +-
 Makefile        |  2 +-
 configure       | 11 +----------
 fio.1           |  2 +-
 fio.h           |  2 +-
 flist.h         |  8 +-------
 json.c          | 10 +++++++---
 libfio.c        |  3 +++
 os/os-android.h | 39 ++++++++++++++++++++-------------------
 server.h        |  2 +-
 stat.h          |  6 +++---
 11 files changed, 40 insertions(+), 47 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d9e881a..a899b90 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1290,7 +1290,7 @@ Block size
 			means default for reads, 8k for writes and trims.
 
 		**bs=,8k,**
-			means default for reads, 8k for writes, and default for writes.
+			means default for reads, 8k for writes, and default for trims.
 
 .. option:: blocksize_range=irange[,irange][,irange], bsrange=irange[,irange][,irange]
 
diff --git a/Makefile b/Makefile
index 1f0f5d0..c3e551d 100644
--- a/Makefile
+++ b/Makefile
@@ -140,7 +140,7 @@ ifeq ($(CONFIG_TARGET_OS), Linux)
   LDFLAGS += -rdynamic
 endif
 ifeq ($(CONFIG_TARGET_OS), Android)
-  SOURCE += diskutil.c fifo.c blktrace.c trim.c profiles/tiobench.c \
+  SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c profiles/tiobench.c \
 		oslib/linux-dev-lookup.c
   LIBS += -ldl
   LDFLAGS += -rdynamic
diff --git a/configure b/configure
index 21bcaf4..0327578 100755
--- a/configure
+++ b/configure
@@ -1930,16 +1930,7 @@ fi
 cat > $TMPC << EOF
 #include <assert.h>
 #include <stdlib.h>
-#undef offsetof
-#ifdef __compiler_offsetof
-#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER)
-#else
-#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
-#endif
-
-#define container_of(ptr, type, member) ({			\
-	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
-	(type *)( (char *)__mptr - offsetof(type,member) );})
+#include <stddef.h>
 
 struct foo {
   int a, b;
diff --git a/fio.1 b/fio.1
index 0167c23..301a708 100644
--- a/fio.1
+++ b/fio.1
@@ -533,7 +533,7 @@ bs=256k    means 256k for reads, writes and trims
 bs=8k,32k  means 8k for reads, 32k for writes and trims
 bs=8k,32k, means 8k for reads, 32k for writes, and default for trims
 bs=,8k     means default for reads, 8k for writes and trims
-bs=,8k,    means default for reads, 8k for writes, and default for writes
+bs=,8k,    means default for reads, 8k for writes, and default for trims
 .fi
 .TP
 .BI blocksize_range \fR=\fPirange[,irange][,irange] "\fR,\fB bsrange" \fR=\fPirange[,irange][,irange]
diff --git a/fio.h b/fio.h
index e11a039..ed631bc 100644
--- a/fio.h
+++ b/fio.h
@@ -149,7 +149,7 @@ struct thread_data {
 	unsigned int thread_number;
 	unsigned int subjob_number;
 	unsigned int groupid;
-	struct thread_stat ts;
+	struct thread_stat ts __attribute__ ((aligned));
 
 	int client_type;
 
diff --git a/flist.h b/flist.h
index b4fe6e6..2ca3d77 100644
--- a/flist.h
+++ b/flist.h
@@ -2,13 +2,7 @@
 #define _LINUX_FLIST_H
 
 #include <stdlib.h>
-
-#undef offsetof
-#ifdef __compiler_offsetof
-#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER)
-#else
-#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
-#endif
+#include <stddef.h>
 
 #define container_of(ptr, type, member) ({			\
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
diff --git a/json.c b/json.c
index e0227ec..2160d29 100644
--- a/json.c
+++ b/json.c
@@ -340,9 +340,13 @@ static void json_print_array(struct json_array *array, struct buf_output *out)
 static void json_print_value(struct json_value *value, struct buf_output *out)
 {
 	switch (value->type) {
-	case JSON_TYPE_STRING:
-		log_buf(out, "\"%s\"", value->string);
-		break;
+	case JSON_TYPE_STRING: {
+			const char delimiter = '"';
+			buf_output_add(out, &delimiter, sizeof(delimiter));
+			buf_output_add(out, value->string, strlen(value->string));
+			buf_output_add(out, &delimiter, sizeof(delimiter));
+			break;
+		}
 	case JSON_TYPE_INTEGER:
 		log_buf(out, "%lld", value->integer_number);
 		break;
diff --git a/libfio.c b/libfio.c
index 8310708..da22456 100644
--- a/libfio.c
+++ b/libfio.c
@@ -353,14 +353,17 @@ int initialize_fio(char *envp[])
 	 * can run into problems on archs that fault on unaligned fp
 	 * access (ARM).
 	 */
+	compiletime_assert((offsetof(struct thread_data, ts) % sizeof(void *)) == 0, "ts");
 	compiletime_assert((offsetof(struct thread_stat, percentile_list) % 8) == 0, "stat percentile_list");
 	compiletime_assert((offsetof(struct thread_stat, total_run_time) % 8) == 0, "total_run_time");
 	compiletime_assert((offsetof(struct thread_stat, total_err_count) % 8) == 0, "total_err_count");
 	compiletime_assert((offsetof(struct thread_stat, latency_percentile) % 8) == 0, "stat latency_percentile");
+	compiletime_assert((offsetof(struct thread_data, ts.clat_stat) % 8) == 0, "ts.clat_stat");
 	compiletime_assert((offsetof(struct thread_options_pack, zipf_theta) % 8) == 0, "zipf_theta");
 	compiletime_assert((offsetof(struct thread_options_pack, pareto_h) % 8) == 0, "pareto_h");
 	compiletime_assert((offsetof(struct thread_options_pack, percentile_list) % 8) == 0, "percentile_list");
 	compiletime_assert((offsetof(struct thread_options_pack, latency_percentile) % 8) == 0, "latency_percentile");
+	compiletime_assert((offsetof(struct jobs_eta, m_rate) % 8) == 0, "m_rate");
 
 	err = endian_check();
 	if (err) {
diff --git a/os/os-android.h b/os/os-android.h
index 6c3e098..c56d682 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -32,6 +32,7 @@
 #define FIO_HAVE_HUGETLB
 #define FIO_HAVE_BLKTRACE
 #define FIO_HAVE_CL_SIZE
+#define FIO_HAVE_CGROUPS
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_TRIM
 #define FIO_HAVE_GETTID
@@ -59,19 +60,17 @@
 
 #ifndef CONFIG_NO_SHM
 /*
- * The Android NDK doesn't currently export <sys/shm.h>, so define the
- * necessary stuff here.
+ * Bionic doesn't support SysV shared memeory, so implement it using ashmem
  */
-
-#include <sys/shm.h>
-#define SHM_HUGETLB    04000
-
 #include <stdio.h>
 #include <linux/ashmem.h>
+#include <linux/shm.h>
+#define shmid_ds shmid64_ds
+#define SHM_HUGETLB    04000
 
 #define ASHMEM_DEVICE	"/dev/ashmem"
 
-static inline int shmctl (int __shmid, int __cmd, struct shmid_ds *__buf)
+static inline int shmctl(int __shmid, int __cmd, struct shmid_ds *__buf)
 {
 	int ret=0;
 	if (__cmd == IPC_RMID)
@@ -84,7 +83,7 @@ static inline int shmctl (int __shmid, int __cmd, struct shmid_ds *__buf)
 	return ret;
 }
 
-static inline int shmget (key_t __key, size_t __size, int __shmflg)
+static inline int shmget(key_t __key, size_t __size, int __shmflg)
 {
 	int fd,ret;
 	char keybuf[11];
@@ -98,7 +97,8 @@ static inline int shmget (key_t __key, size_t __size, int __shmflg)
 	if (ret < 0)
 		goto error;
 
-	ret = ioctl(fd, ASHMEM_SET_SIZE, __size);
+	/* Stores size in first 8 bytes, allocate extra space */
+	ret = ioctl(fd, ASHMEM_SET_SIZE, __size + sizeof(uint64_t));
 	if (ret < 0)
 		goto error;
 
@@ -109,21 +109,22 @@ error:
 	return ret;
 }
 
-static inline void *shmat (int __shmid, const void *__shmaddr, int __shmflg)
+static inline void *shmat(int __shmid, const void *__shmaddr, int __shmflg)
 {
-	size_t *ptr, size = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
-	ptr = mmap(NULL, size + sizeof(size_t), PROT_READ | PROT_WRITE, MAP_SHARED, __shmid, 0);
-	*ptr = size;    //save size at beginning of buffer, for use with munmap
-	return &ptr[1];
+	size_t size = ioctl(__shmid, ASHMEM_GET_SIZE, NULL);
+	/* Needs to be 8-byte aligned to prevent SIGBUS on 32-bit ARM */
+	uint64_t *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, __shmid, 0);
+	/* Save size at beginning of buffer, for use with munmap */
+	*ptr = size;
+	return ptr + 1;
 }
 
 static inline int shmdt (const void *__shmaddr)
 {
-	size_t *ptr, size;
-	ptr = (size_t *)__shmaddr;
-	ptr--;
-	size = *ptr;    //find mmap size which we stored at the beginning of the buffer
-	return munmap((void *)ptr, size + sizeof(size_t));
+	/* Find mmap size which we stored at the beginning of the buffer */
+	uint64_t *ptr = (uint64_t *)__shmaddr - 1;
+	size_t size = *ptr;
+	return munmap(ptr, size);
 }
 #endif
 
diff --git a/server.h b/server.h
index 5c720d4..fff6804 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 61,
+	FIO_SERVER_VER			= 62,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.h b/stat.h
index aa4ad80..d8a0803 100644
--- a/stat.h
+++ b/stat.h
@@ -242,17 +242,17 @@ struct jobs_eta {
 	uint32_t nr_pending;
 	uint32_t nr_setting_up;
 
-	uint32_t files_open;
-
 	uint64_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];
-	uint32_t m_iops[DDIR_RWDIR_CNT], t_iops[DDIR_RWDIR_CNT];
 	uint64_t rate[DDIR_RWDIR_CNT];
+	uint32_t m_iops[DDIR_RWDIR_CNT], t_iops[DDIR_RWDIR_CNT];
 	uint32_t iops[DDIR_RWDIR_CNT];
 	uint64_t elapsed_sec;
 	uint64_t eta_sec;
 	uint32_t is_pow2;
 	uint32_t unit_base;
 
+	uint32_t files_open;
+
 	/*
 	 * Network 'copy' of run_str[]
 	 */

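The reworked Bionic `shmat()`/`shmdt()` emulation does two things:

- It stores the mapping size in a `uint64_t` header at the start of the ashmem mapping and returns the address just past that header.
- `shmget()` reserves the extra `sizeof(uint64_t)` bytes for the header.

The old header was a `size_t`, which is 4 bytes on 32-bit ARM. The new 8-byte header keeps the returned pointer 8-byte aligned, which is what the patch comment ties to the SIGBUS.

The following is a minimal sketch of the same header trick. It assumes `malloc()` in place of the ashmem `mmap()` so that it runs anywhere. `hdr_alloc()`, `hdr_total()` and `hdr_free()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate 'size' usable bytes preceded by a uint64_t header recording
 * the total allocation size (header included), much as the patched
 * shmat() records the mapping length for a later munmap().
 */
static void *hdr_alloc(size_t size)
{
	uint64_t *ptr;
	size_t total;

	if (size > SIZE_MAX - sizeof(uint64_t))
		return NULL;
	total = size + sizeof(uint64_t);
	ptr = malloc(total);
	if (!ptr)
		return NULL;
	*ptr = total;
	/* ptr + 1 advances by 8 bytes, keeping the result 8-byte aligned */
	return ptr + 1;
}

/* Read back the total size stored just before 'p' (cf. shmdt()) */
static uint64_t hdr_total(const void *p)
{
	return *((const uint64_t *) p - 1);
}

/* Release the whole block, header included */
static void hdr_free(void *p)
{
	if (p)
		free((uint64_t *) p - 1);
}
```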

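The libfio.c hunk adds `compiletime_assert()` checks on `thread_data.ts`, `ts.clat_stat` and `jobs_eta.m_rate`. The existing comment there notes that unaligned floating-point access faults on some architectures (ARM). The stat.h hunk moves `files_open` below the 64-bit fields of `struct jobs_eta`.

The same kind of check can be written with C11 `_Static_assert` and `offsetof`. The struct below is a simplified, hypothetical stand-in, not the real `jobs_eta` layout. It keeps an even number of `uint32_t` fields between 64-bit members, so every `uint64_t` lands on a multiple of 8. That holds on ABIs that align `uint64_t` to 8 bytes and also on those that align it to only 4 bytes inside structures, such as 32-bit x86.

```c
#include <stddef.h>
#include <stdint.h>

#define DDIR_CNT	3	/* hypothetical: read, write, trim */

/* Hypothetical, simplified stand-in for struct jobs_eta */
struct eta_sketch {
	uint32_t nr_running;
	uint32_t nr_pending;

	/* 64-bit fields grouped after an even number of uint32_t */
	uint64_t m_rate[DDIR_CNT];
	uint64_t rate[DDIR_CNT];

	/* Three iops entries plus files_open: four uint32_t, 16 bytes */
	uint32_t iops[DDIR_CNT];
	uint32_t files_open;

	uint64_t elapsed_sec;
};

/* Fail the build if a 64-bit member ever drifts off an 8-byte offset */
_Static_assert(offsetof(struct eta_sketch, m_rate) % 8 == 0, "m_rate");
_Static_assert(offsetof(struct eta_sketch, rate) % 8 == 0, "rate");
_Static_assert(offsetof(struct eta_sketch, elapsed_sec) % 8 == 0,
	       "elapsed_sec");
```

Moving `files_open` above `m_rate` in this sketch would put `m_rate` at offset 12. On 32-bit x86 the build would then stop at the first assertion.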
* Recent changes (master)
@ 2017-05-20 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-20 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit dd3805d49995e59fdf61e2560c3fec5b7f5c71b6:

  Remove leftover warnings (2017-05-18 12:53:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4641daa9b2e33bc63197dfec48584eaf05890a01:

  Fio 2.20 (2017-05-19 08:25:27 -0600)

----------------------------------------------------------------
Andreas Herrmann (1):
      stat: Re-add output of basic bw information if bw_log is not written

Jens Axboe (1):
      Fio 2.20

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 stat.c                 | 8 +++++---
 3 files changed, 7 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 4cc903f..a9ddb31 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.19
+DEF_VER=fio-2.20
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index ffaed8e..05d2a83 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.19">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.20">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/stat.c b/stat.c
index 6e47c34..5b48413 100644
--- a/stat.c
+++ b/stat.c
@@ -2465,7 +2465,7 @@ static int __add_samples(struct thread_data *td, struct timeval *parent_tv,
 
 		add_stat_sample(&stat[ddir], rate);
 
-		if (td->bw_log) {
+		if (log) {
 			unsigned int bs = 0;
 
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
@@ -2541,12 +2541,14 @@ int calc_log_samples(void)
 			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
 			continue;
 		}
-		if (td->bw_log && !per_unit_log(td->bw_log)) {
+		if (!td->bw_log ||
+			(td->bw_log && !per_unit_log(td->bw_log))) {
 			tmp = add_bw_samples(td, &now);
 			if (tmp < next)
 				next = tmp;
 		}
-		if (td->iops_log && !per_unit_log(td->iops_log)) {
+		if (!td->iops_log ||
+			(td->iops_log && !per_unit_log(td->iops_log))) {
 			tmp = add_iops_samples(td, &now);
 			if (tmp < next)
 				next = tmp;

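In the `calc_log_samples()` hunk, the new test `!td->bw_log || (td->bw_log && !per_unit_log(td->bw_log))` is equivalent to `!td->bw_log || !per_unit_log(td->bw_log)`. The right-hand side of `||` is only evaluated when `td->bw_log` is non-NULL, so the inner `td->bw_log &&` is redundant. The same applies to the `iops_log` test.

Either way, samples are now gathered when no log is configured at all, not only for non-per-unit logs. The companion change in `__add_samples()` tests its `log` argument rather than `td->bw_log` before recording log entries.

The following sketch checks the equivalence case by case. `struct io_log_sketch` and `per_unit()` are hypothetical stand-ins for fio's `struct io_log` and `per_unit_log()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_log */
struct io_log_sketch {
	bool per_unit;
};

/* Hypothetical stand-in for per_unit_log(); must not get NULL */
static bool per_unit(const struct io_log_sketch *log)
{
	return log->per_unit;
}

/* Condition as written in the patch */
static bool want_samples_patch(const struct io_log_sketch *log)
{
	return !log || (log && !per_unit(log));
}

/* Equivalent form: short-circuiting already protects per_unit() */
static bool want_samples_simple(const struct io_log_sketch *log)
{
	return !log || !per_unit(log);
}
```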

* Recent changes (master)
@ 2017-05-19 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-19 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c78997bf625ffcca49c18e11ab0f5448b26d7452:

  man page: include reference to version 4 of the terse format (2017-05-09 21:08:12 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dd3805d49995e59fdf61e2560c3fec5b7f5c71b6:

  Remove leftover warnings (2017-05-18 12:53:38 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fix wrap issue with 64-bit pwritev2/preadv2
      Remove leftover warnings

 os/os-linux.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/os/os-linux.h b/os/os-linux.h
index 911f7e7..ba53590 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -314,9 +314,13 @@ static inline int fio_set_sched_idle(void)
 static inline void make_pos_h_l(unsigned long *pos_h, unsigned long *pos_l,
 				off_t offset)
 {
+#if BITS_PER_LONG == 64
+	*pos_l = offset;
+	*pos_h = 0;
+#else
 	*pos_l = offset & 0xffffffff;
 	*pos_h = ((uint64_t) offset) >> 32;
-
+#endif
 }
 static inline ssize_t preadv2(int fd, const struct iovec *iov, int iovcnt,
 			      off_t offset, unsigned int flags)

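`make_pos_h_l()` splits a file offset into the two `unsigned long` words that the raw `preadv2`/`pwritev2` system calls take. The Linux kernel rebuilds the offset in `pos_from_hilo()` (fs/read_write.c) as `((high << (BITS_PER_LONG / 2)) << (BITS_PER_LONG / 2)) | low`. On a 64-bit kernel, the high word is therefore shifted out entirely and the low word must carry the whole offset. The old code put only the low 32 bits there, so offsets at or beyond 4 GiB wrapped.

The following sketch models both sides with an explicit word size. The function names are hypothetical, and `pos_from_hilo()` here reimplements the kernel formula rather than calling it.

```c
#include <stdint.h>

/* Patched make_pos_h_l(): the split depends on the word size */
static void split_fixed(uint64_t offset, unsigned int bits_per_long,
			uint64_t *pos_h, uint64_t *pos_l)
{
	if (bits_per_long == 64) {
		*pos_l = offset;
		*pos_h = 0;
	} else {
		*pos_l = offset & 0xffffffffULL;
		*pos_h = offset >> 32;
	}
}

/* Old make_pos_h_l(): always split at bit 32 */
static void split_old(uint64_t offset, uint64_t *pos_h, uint64_t *pos_l)
{
	*pos_l = offset & 0xffffffffULL;
	*pos_h = offset >> 32;
}

/*
 * Model of the kernel's recombination: ((high << half) << half) | low.
 * With 64-bit longs the two 32-bit shifts push 'high' out completely,
 * so only 'low' survives.
 */
static uint64_t pos_from_hilo(uint64_t high, uint64_t low,
			      unsigned int bits_per_long)
{
	unsigned int half = bits_per_long / 2;

	return ((high << half) << half) | low;
}
```

For a 5 GiB offset, the old split on a 64-bit kernel yields 1 GiB: only the low 32 bits survive.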

* Recent changes (master)
@ 2017-05-10 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-10 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c55fae03f7d5c0981e55241fc9003d762f7a5fd9:

  options: force refill_buffers with pattern and any reads (2017-05-04 08:43:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c78997bf625ffcca49c18e11ab0f5448b26d7452:

  man page: include reference to version 4 of the terse format (2017-05-09 21:08:12 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      man page: include reference to version 4 of the terse format

 fio.1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index 138bcbb..0167c23 100644
--- a/fio.1
+++ b/fio.1
@@ -43,7 +43,7 @@ Deprecated, use \-\-output-format instead to select multiple formats.
 Display version information and exit.
 .TP
 .BI \-\-terse\-version \fR=\fPversion
-Set terse version output format (Current version 3, or older version 2).
+Set terse version output format (default 3, or 2 or 4)
 .TP
 .B \-\-help
 Display usage information and exit.


* Recent changes (master)
@ 2017-05-05 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-05 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit f52e919826839eaba90903b67fe02042159a0023:

  gettime: make utime_since_now and mtime_since_now consistent in how they record the caller and put this all behind FIO_DEBUG_TIME (2017-05-03 08:49:37 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c55fae03f7d5c0981e55241fc9003d762f7a5fd9:

  options: force refill_buffers with pattern and any reads (2017-05-04 08:43:27 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      options: force refill_buffers with pattern and any reads

 options.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 85574d7..b489e90 100644
--- a/options.c
+++ b/options.c
@@ -1306,8 +1306,17 @@ static int str_buffer_pattern_cb(void *data, const char *input)
 
 	assert(ret != 0);
 	td->o.buffer_pattern_bytes = ret;
-	if (!td->o.compress_percentage)
+
+	/*
+	 * If this job is doing any reading or has compression set,
+	 * ensure that we refill buffers for writes or we could be
+	 * invalidating the pattern through reads.
+	 */
+	if (!td->o.compress_percentage && !td_read(td))
 		td->o.refill_buffers = 0;
+	else
+		td->o.refill_buffers = 1;
+
 	td->o.scramble_buffers = 0;
 	td->o.zero_buffers = 0;
 

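After this change, setting `buffer_pattern` turns `refill_buffers` on whenever the job has `compress_percentage` set or does any reads (`td_read()`). It clears `refill_buffers` only for pure-write jobs without compression. Per the new comment, reads could otherwise invalidate the pattern held in the I/O buffers.

The decision reduces to a small predicate, sketched below. `struct job_sketch` and `pattern_needs_refill()` are hypothetical stand-ins for the relevant fields of fio's `thread_data`/`thread_options`.

```c
#include <stdbool.h>

/* Hypothetical subset of a job's options */
struct job_sketch {
	unsigned int compress_percentage;	/* 0 means no compression */
	bool does_reads;			/* stand-in for td_read(td) */
};

/*
 * Mirrors the patched str_buffer_pattern_cb(): refill_buffers is
 * cleared only when there is neither compression nor any reading.
 */
static bool pattern_needs_refill(const struct job_sketch *job)
{
	return job->compress_percentage != 0 || job->does_reads;
}
```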

* Recent changes (master)
@ 2017-05-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-05-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit d418fa90f6daf75bb9336182f79c0b25aa6feecd:

  configure: Add missing $val != "yes" test to override compile_prog() result (2017-05-01 14:47:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f52e919826839eaba90903b67fe02042159a0023:

  gettime: make utime_since_now and mtime_since_now consistent in how they record the caller and put this all behind FIO_DEBUG_TIME (2017-05-03 08:49:37 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'wip-remove-disconnect' of https://github.com/liupan1111/fio

Pan Liu (2):
      remove out-of-date comment
      remove redundant _fio_rbd_disconnect, which is already called in     fio_rbd_cleaup

Vincent Fu (5):
      stat: change json+ output format so that instead of printing the raw clat data structure, use actual durations instead of array indices and print only bins with nonzero counts
      Revert "tools/fio_latency2csv.py: add tool that converts json+ to CSV"
      stat: reset_io_stats: fix a problem, rearrange some code
      client/server: make sure that all elements in io_u_lat_m[] are transferred and received
      gettime: make utime_since_now and mtime_since_now consistent in how they record the caller and put this all behind FIO_DEBUG_TIME

 Makefile                 |   2 +-
 client.c                 |   4 +-
 engines/rbd.c            |   9 +----
 gettime.c                |  11 ++++++
 server.c                 |   4 +-
 stat.c                   |  28 ++++++-------
 tools/fio_latency2csv.py | 101 -----------------------------------------------
 7 files changed, 32 insertions(+), 127 deletions(-)
 delete mode 100755 tools/fio_latency2csv.py

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 66083ff..1f0f5d0 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/fio_latency2csv.py tools/hist/fiologparser_hist.py)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/hist/fiologparser_hist.py)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
diff --git a/client.c b/client.c
index 7934661..80096bf 100644
--- a/client.c
+++ b/client.c
@@ -908,10 +908,10 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 		dst->io_u_complete[i]	= le32_to_cpu(src->io_u_complete[i]);
 	}
 
-	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++) {
+	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		dst->io_u_lat_u[i]	= le32_to_cpu(src->io_u_lat_u[i]);
+	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
 		dst->io_u_lat_m[i]	= le32_to_cpu(src->io_u_lat_m[i]);
-	}
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
diff --git a/engines/rbd.c b/engines/rbd.c
index 7433879..4bae425 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -597,11 +597,11 @@ static int fio_rbd_setup(struct thread_data *td)
 	r = rbd_stat(rbd->image, &info, sizeof(info));
 	if (r < 0) {
 		log_err("rbd_status failed.\n");
-		goto disconnect;
+		goto cleanup;
 	} else if (info.size == 0) {
 		log_err("image size should be larger than zero.\n");
 		r = -EINVAL;
-		goto disconnect;
+		goto cleanup;
 	}
 
 	dprint(FD_IO, "rbd-engine: image size: %lu\n", info.size);
@@ -618,13 +618,8 @@ static int fio_rbd_setup(struct thread_data *td)
 	f = td->files[0];
 	f->real_file_size = info.size;
 
-	/* disconnect, then we were only connected to determine
-	 * the size of the RBD.
-	 */
 	return 0;
 
-disconnect:
-	_fio_rbd_disconnect(rbd);
 cleanup:
 	fio_rbd_cleanup(td);
 	return r;
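The rbd.c hunks drop the separate `disconnect:` label because `fio_rbd_cleanup()` already performs the disconnect, so error paths can jump straight to `cleanup:`. A minimal sketch of that single-exit pattern, assuming a cleanup routine that only tears down what is still connected; `struct conn`, `conn_setup()` and `conn_cleanup()` are hypothetical and model only the control flow, not librbd calls.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection state; not the librbd or fio structures. */
struct conn {
	bool connected;
	int disconnects;	/* counts teardown calls, for illustration */
};

/* The single teardown routine: disconnects only if still connected. */
static void conn_cleanup(struct conn *c)
{
	if (c->connected) {
		c->connected = false;
		c->disconnects++;
	}
}

/*
 * Setup in the post-patch style: every failure jumps to one cleanup
 * label, and conn_cleanup() is the only place that disconnects.
 */
static int conn_setup(struct conn *c, unsigned long long image_size)
{
	int r;

	c->connected = true;	/* pretend the connect step succeeded */

	if (image_size == 0) {
		r = -EINVAL;
		goto cleanup;
	}
	return 0;

cleanup:
	conn_cleanup(c);
	return r;
}
```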
diff --git a/gettime.c b/gettime.c
index 85ba7cb..628aad6 100644
--- a/gettime.c
+++ b/gettime.c
@@ -402,8 +402,14 @@ uint64_t utime_since(const struct timeval *s, const struct timeval *e)
 uint64_t utime_since_now(const struct timeval *s)
 {
 	struct timeval t;
+#ifdef FIO_DEBUG_TIME
+	void *p = __builtin_return_address(0);
 
+	fio_gettime(&t, p);
+#else
 	fio_gettime(&t, NULL);
+#endif
+
 	return utime_since(s, &t);
 }
 
@@ -429,9 +435,14 @@ uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
 uint64_t mtime_since_now(const struct timeval *s)
 {
 	struct timeval t;
+#ifdef FIO_DEBUG_TIME
 	void *p = __builtin_return_address(0);
 
 	fio_gettime(&t, p);
+#else
+	fio_gettime(&t, NULL);
+#endif
+
 	return mtime_since(s, &t);
 }
 
diff --git a/server.c b/server.c
index 1b3bc30..1e269c2 100644
--- a/server.c
+++ b/server.c
@@ -1497,10 +1497,10 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 		p.ts.io_u_complete[i]	= cpu_to_le32(ts->io_u_complete[i]);
 	}
 
-	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++) {
+	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		p.ts.io_u_lat_u[i]	= cpu_to_le32(ts->io_u_lat_u[i]);
+	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
 		p.ts.io_u_lat_m[i]	= cpu_to_le32(ts->io_u_lat_m[i]);
-	}
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++)
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
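The client.c and server.c hunks give `io_u_lat_u[]` and `io_u_lat_m[]` their own loops because the two arrays have different lengths (`FIO_IO_U_LAT_U_NR` versus `FIO_IO_U_LAT_M_NR`); a shared loop bounded by the first constant left the tail of `io_u_lat_m[]` untransferred. The sketch below shows the fixed shape with illustrative lengths. `struct lat_stats` and `copy_lat_stats()` are hypothetical, and a plain assignment stands in for `cpu_to_le32()`/`le32_to_cpu()`.

```c
#include <stdint.h>

/* Illustrative lengths; the only point is that they differ. */
#define LAT_U_NR 10
#define LAT_M_NR 12

struct lat_stats {
	uint32_t lat_u[LAT_U_NR];
	uint32_t lat_m[LAT_M_NR];
};

/*
 * Copy both latency arrays, each with a loop bounded by its own length.
 * A single loop over LAT_U_NR would silently drop
 * lat_m[LAT_U_NR] .. lat_m[LAT_M_NR - 1].
 */
static void copy_lat_stats(struct lat_stats *dst, const struct lat_stats *src)
{
	int i;

	for (i = 0; i < LAT_U_NR; i++)
		dst->lat_u[i] = src->lat_u[i];
	for (i = 0; i < LAT_M_NR; i++)
		dst->lat_m[i] = src->lat_m[i];
}
```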
diff --git a/stat.c b/stat.c
index f3b82cf..6e47c34 100644
--- a/stat.c
+++ b/stat.c
@@ -98,7 +98,7 @@ static unsigned int plat_val_to_idx(unsigned int val)
  * Convert the given index of the bucket array to the value
  * represented by the bucket
  */
-static unsigned int plat_idx_to_val(unsigned int idx)
+static unsigned long long plat_idx_to_val(unsigned int idx)
 {
 	unsigned int error_bits, k, base;
 
@@ -972,12 +972,11 @@ static void add_ddir_status_json(struct thread_stat *ts,
 		clat_bins_object = json_create_object();
 		json_object_add_value_object(tmp_object, "bins", clat_bins_object);
 		for(i = 0; i < FIO_IO_U_PLAT_NR; i++) {
-			snprintf(buf, sizeof(buf), "%d", i);
-			json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_plat[ddir][i]);
+			if (ts->io_u_plat[ddir][i]) {
+				snprintf(buf, sizeof(buf), "%llu", plat_idx_to_val(i));
+				json_object_add_value_int(clat_bins_object, (const char *)buf, ts->io_u_plat[ddir][i]);
+			}
 		}
-		json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_BITS", FIO_IO_U_PLAT_BITS);
-		json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_VAL", FIO_IO_U_PLAT_VAL);
-		json_object_add_value_int(clat_bins_object, "FIO_IO_U_PLAT_NR", FIO_IO_U_PLAT_NR);
 	}
 
 	if (!calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev)) {
@@ -2172,6 +2171,9 @@ void reset_io_stats(struct thread_data *td)
 
 		ts->io_bytes[i] = 0;
 		ts->runtime[i] = 0;
+		ts->total_io_u[i] = 0;
+		ts->short_io_u[i] = 0;
+		ts->drop_io_u[i] = 0;
 
 		for (j = 0; j < FIO_IO_U_PLAT_NR; j++)
 			ts->io_u_plat[i][j] = 0;
@@ -2181,17 +2183,15 @@ void reset_io_stats(struct thread_data *td)
 		ts->io_u_map[i] = 0;
 		ts->io_u_submit[i] = 0;
 		ts->io_u_complete[i] = 0;
+	}
+
+	for (i = 0; i < FIO_IO_U_LAT_U_NR; i++)
 		ts->io_u_lat_u[i] = 0;
+	for (i = 0; i < FIO_IO_U_LAT_M_NR; i++)
 		ts->io_u_lat_m[i] = 0;
-		ts->total_submit = 0;
-		ts->total_complete = 0;
-	}
 
-	for (i = 0; i < 3; i++) {
-		ts->total_io_u[i] = 0;
-		ts->short_io_u[i] = 0;
-		ts->drop_io_u[i] = 0;
-	}
+	ts->total_submit = 0;
+	ts->total_complete = 0;
 }
 
 static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
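After this change the json+ `bins` object is keyed by each bucket's representative latency instead of its array index, empty buckets are skipped, and the `FIO_IO_U_PLAT_*` constants are no longer emitted. A standalone sketch of the mapping and the filtering follows, using the arithmetic of `plat_idx_to_val()` (the same logic the removed `tools/fio_latency2csv.py` duplicated). The bucket geometry takes that script's defaults (`FIO_IO_U_PLAT_BITS=6`, `FIO_IO_U_PLAT_VAL=64`); the group count of 19 and the helper `collect_nonzero_bins()` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Geometry from the removed script's defaults; PLAT_GROUP_NR is assumed. */
#define PLAT_BITS	6
#define PLAT_VAL	(1 << PLAT_BITS)
#define PLAT_GROUP_NR	19
#define PLAT_NR		(PLAT_GROUP_NR * PLAT_VAL)

/* Map a histogram bucket index to the latency value it represents. */
static unsigned long long plat_idx_to_val(unsigned int idx)
{
	unsigned int error_bits;
	unsigned long long k, base;

	assert(idx < PLAT_NR);

	/* The first two groups are exact: the index is the value. */
	if (idx < (PLAT_VAL << 1))
		return idx;

	/* The group gives how many low bits were rounded away... */
	error_bits = (idx >> PLAT_BITS) - 1;
	base = 1ULL << (error_bits + PLAT_BITS);

	/* ...and the bucket within the group selects the sub-range. */
	k = idx % PLAT_VAL;

	/* Return the middle of the bucket's range. */
	return base + ((k + 0.5) * (1ULL << error_bits));
}

/*
 * Collect only non-empty buckets as (value, count) pairs, the way the
 * patched json+ output now keys "bins" by plat_idx_to_val(i).
 * Returns the number of pairs written (at most max).
 */
static size_t collect_nonzero_bins(const unsigned int *plat,
				   unsigned long long *vals,
				   unsigned int *counts, size_t max)
{
	size_t n = 0;
	unsigned int i;

	for (i = 0; i < PLAT_NR && n < max; i++) {
		if (!plat[i])
			continue;
		vals[n] = plat_idx_to_val(i);
		counts[n] = plat[i];
		n++;
	}
	return n;
}
```

Indices below 128 map to themselves; index 128 maps to 129, the centre of the bucket that starts at 128 and is 2 units wide.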
diff --git a/tools/fio_latency2csv.py b/tools/fio_latency2csv.py
deleted file mode 100755
index 93586d2..0000000
--- a/tools/fio_latency2csv.py
+++ /dev/null
@@ -1,101 +0,0 @@
-#!/usr/bin/python
-#
-# fio_latency2csv.py
-#
-# This tool converts fio's json+ completion latency data to CSV format.
-# For example:
-#
-# fio_latency2csv.py fio-jsonplus.output fio-latency.csv
-#
-
-import os
-import json
-import argparse
-
-
-def parse_args():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('source',
-                        help='fio json+ output file containing completion '
-                             'latency data')
-    parser.add_argument('dest',
-                        help='destination file stub for latency data in CSV '
-                             'format. job number will be appended to filename')
-    args = parser.parse_args()
-
-    return args
-
-
-# from stat.c
-def plat_idx_to_val(idx, FIO_IO_U_PLAT_BITS=6, FIO_IO_U_PLAT_VAL=64):
-    # MSB <= (FIO_IO_U_PLAT_BITS-1), cannot be rounded off. Use
-    # all bits of the sample as index
-    if (idx < (FIO_IO_U_PLAT_VAL << 1)):
-        return idx
-
-    # Find the group and compute the minimum value of that group
-    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1
-    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
-
-    # Find its bucket number of the group
-    k = idx % FIO_IO_U_PLAT_VAL
-
-    # Return the mean of the range of the bucket
-    return (base + ((k + 0.5) * (1 << error_bits)))
-
-
-def percentile(idx, run_total):
-    total = run_total[len(run_total)-1]
-    if total == 0:
-        return 0
-
-    return float(run_total[x]) / total
-
-
-if __name__ == '__main__':
-    args = parse_args()
-
-    with open(args.source, 'r') as source:
-        jsondata = json.loads(source.read())
-
-    bins = {}
-    bin_const = {}
-    run_total = {}
-    ddir_list = ['read', 'write', 'trim']
-    const_list = ['FIO_IO_U_PLAT_NR', 'FIO_IO_U_PLAT_BITS',
-                  'FIO_IO_U_PLAT_VAL']
-
-    for jobnum in range(0,len(jsondata['jobs'])):
-        prev_ddir = None
-        for ddir in ddir_list:
-            bins[ddir] = jsondata['jobs'][jobnum][ddir]['clat']['bins']
-
-            bin_const[ddir] = {}
-            for const in const_list:
-                bin_const[ddir][const] = bins[ddir].pop(const)
-                if prev_ddir:
-                    assert bin_const[ddir][const] == bin_const[prev_ddir][const]
-            prev_ddir = ddir
-
-            run_total[ddir] = [0 for x in
-                               range(bin_const[ddir]['FIO_IO_U_PLAT_NR'])]
-            run_total[ddir][0] = bins[ddir]['0']
-            for x in range(1, bin_const[ddir]['FIO_IO_U_PLAT_NR']):
-                run_total[ddir][x] = run_total[ddir][x-1] + bins[ddir][str(x)]
-        
-        stub, ext = os.path.splitext(args.dest)
-        outfile = stub + '_job' + str(jobnum) + ext
-
-        with open(outfile, 'w') as output:
-            output.write("clat (usec),")
-            for ddir in ddir_list:
-                output.write("{0},".format(ddir))
-            output.write("\n")
-
-            for x in range(bin_const['read']['FIO_IO_U_PLAT_NR']):
-                output.write("{0},".format(plat_idx_to_val(x,
-                                          bin_const['read']['FIO_IO_U_PLAT_BITS'],
-                                          bin_const['read']['FIO_IO_U_PLAT_VAL'])))
-                for ddir in ddir_list:
-                    output.write("{0},".format(percentile(x, run_total[ddir])))
-                output.write("\n")


* Recent changes (master)
@ 2017-05-02 12:00 Jens Axboe
From: Jens Axboe @ 2017-05-02 12:00 UTC
  To: fio

The following changes since commit adcedfb85288e86c9e70a9003485c89fa47722ce:

  Drop triple X for cpu affinity for OpenBSD (2017-04-30 16:51:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d418fa90f6daf75bb9336182f79c0b25aa6feecd:

  configure: Add missing $val != "yes" test to override compile_prog() result (2017-05-01 14:47:16 -0600)

----------------------------------------------------------------
Omri Mor (1):
      os/os-android.h: fix shared memory support

Tomohiro Kusumi (6):
      configure: output_sym CONFIG_GFIO
      configure: Add missing <string.h> to avoid bogus warning
      configure: Add void* cast to avoid bogus warning
      configure: Check gfio test result via return value (not printf)
      configure: Add missing ##########...
      configure: Add missing $val != "yes" test to override compile_prog() result

 configure       | 27 ++++++++++++++++-----------
 os/os-android.h |  6 +++---
 2 files changed, 19 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index bcb898a..21bcaf4 100755
--- a/configure
+++ b/configure
@@ -161,8 +161,7 @@ for opt do
   ;;
   --build-static) build_static="yes"
   ;;
-  --enable-gfio)
-  gfio_check="yes"
+  --enable-gfio) gfio_check="yes"
   ;;
   --disable-numa) disable_numa="yes"
   ;;
@@ -613,7 +612,9 @@ echo "POSIX AIO fsync               $posix_aio_fsync"
 
 ##########################################
 # POSIX pshared attribute probe
-posix_pshared="no"
+if test "$posix_pshared" != "yes" ; then
+  posix_pshared="no"
+fi
 cat > $TMPC <<EOF
 #include <unistd.h>
 int main(void)
@@ -905,10 +906,11 @@ if test "$clockid_t" != "yes" ; then
 fi
 cat > $TMPC << EOF
 #include <time.h>
+#include <string.h>
 int main(int argc, char **argv)
 {
   volatile clockid_t cid;
-  memset(&cid, 0, sizeof(cid));
+  memset((void*)&cid, 0, sizeof(cid));
   return 0;
 }
 EOF
@@ -1243,7 +1245,7 @@ int main(void)
   gdk_threads_enter();
   gdk_threads_leave();
 
-  printf("%d", GTK_CHECK_VERSION(2, 18, 0));
+  return GTK_CHECK_VERSION(2, 18, 0) ? 0 : 1; /* 0 on success */
 }
 EOF
 GTK_CFLAGS=$(pkg-config --cflags gtk+-2.0 gthread-2.0)
@@ -1259,8 +1261,8 @@ if test "$?" != "0" ; then
   exit 1
 fi
 if compile_prog "$GTK_CFLAGS" "$GTK_LIBS" "gfio" ; then
-  r=$($TMPE)
-  if test "$r" != "0" ; then
+  $TMPE
+  if test "$?" = "0" ; then
     gfio="yes"
     GFIO_LIBS="$LIBS $GTK_LIBS"
     CFLAGS="$CFLAGS $GTK_CFLAGS"
@@ -1279,6 +1281,7 @@ if test "$gfio_check" = "yes" ; then
   echo "gtk 2.18 or higher            $gfio"
 fi
 
+##########################################
 # Check whether we have getrusage(RUSAGE_THREAD)
 if test "$rusage_thread" != "yes" ; then
   rusage_thread="no"
@@ -1473,7 +1476,6 @@ cat > $TMPC << EOF
 
 int main(int argc, char **argv)
 {
-
   rados_t cluster;
   rados_ioctx_t io_ctx;
   const char pool[] = "rbd";
@@ -1591,6 +1593,7 @@ if compile_prog "" "" "setvbuf"; then
 fi
 echo "setvbuf                       $setvbuf"
 
+##########################################
 # check for gfapi
 if test "$gfapi" != "yes" ; then
   gfapi="no"
@@ -1600,7 +1603,6 @@ cat > $TMPC << EOF
 
 int main(int argc, char **argv)
 {
-
   glfs_t *g = glfs_new("foo");
 
   return 0;
@@ -1794,6 +1796,7 @@ echo "NVML pmemblk engine           $pmemblk"
 # Report whether dev-dax engine is enabled
 echo "NVML dev-dax engine           $devdax"
 
+##########################################
 # Check if we have lex/yacc available
 yacc="no"
 yacc_is_bison="no"
@@ -1996,7 +1999,9 @@ echo "march_armv8_a_crc_crypto      $march_armv8_a_crc_crypto"
 
 ##########################################
 # cuda probe
-cuda="no"
+if test "$cuda" != "yes" ; then
+  cuda="no"
+fi
 cat > $TMPC << EOF
 #include <cuda.h>
 int main(int argc, char **argv)
@@ -2126,7 +2131,7 @@ if test "$rusage_thread" = "yes" ; then
   output_sym "CONFIG_RUSAGE_THREAD"
 fi
 if test "$gfio" = "yes" ; then
-  echo "CONFIG_GFIO=y" >> $config_host_mak
+  output_sym "CONFIG_GFIO"
 fi
 if test "$esx" = "yes" ; then
   output_sym "CONFIG_ESX"
diff --git a/os/os-android.h b/os/os-android.h
index ba599dd..6c3e098 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -57,18 +57,17 @@
 #define MAP_HUGETLB 0x40000 /* arch specific */
 #endif
 
-
+#ifndef CONFIG_NO_SHM
 /*
  * The Android NDK doesn't currently export <sys/shm.h>, so define the
  * necessary stuff here.
  */
 
-#include <linux/shm.h>
+#include <sys/shm.h>
 #define SHM_HUGETLB    04000
 
 #include <stdio.h>
 #include <linux/ashmem.h>
-#include <sys/mman.h>
 
 #define ASHMEM_DEVICE	"/dev/ashmem"
 
@@ -126,6 +125,7 @@ static inline int shmdt (const void *__shmaddr)
 	size = *ptr;    //find mmap size which we stored at the beginning of the buffer
 	return munmap((void *)ptr, size + sizeof(size_t));
 }
+#endif
 
 #define SPLICE_DEF_SIZE	(64*1024)
 


* Recent changes (master)
@ 2017-05-01 12:00 Jens Axboe
From: Jens Axboe @ 2017-05-01 12:00 UTC
  To: fio

The following changes since commit 4bd2c8b9251a2c88f44ad52168252ce2de660bf7:

  configure: fix broken test for cuda (2017-04-26 15:24:36 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to adcedfb85288e86c9e70a9003485c89fa47722ce:

  Drop triple X for cpu affinity for OpenBSD (2017-04-30 16:51:21 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'fix-348' of https://github.com/omor1/fio
      Merge branch 'zero_ioengine_flags' of https://github.com/sitsofe/fio

Omri Mor (3):
      os/os-android.h: fix broken shmget() due to ndk-r15
      os/os-android.h: use byte swap intrinsics if available
      os/os-linux.h: fix broken byte swap intrinsics

Sitsofe Wheeler (1):
      fio.h: zero old flag bits when setting new ioengine flags

Tomohiro Kusumi (6):
      Fix "cast from pointer to integer of different size" warning on OpenBSD
      Fix "C99 inline functions are not supported" warning on OpenBSD
      Fix "'RB_ROOT' undeclared" error on OpenBSD
      Turn off lex by default on OpenBSD
      Implement shm_attach_to_open_removed() for OpenBSD
      Drop triple X for cpu affinity for OpenBSD

 configure       |  3 ++-
 fio.h           |  3 ++-
 iolog.c         |  2 +-
 os/os-android.h | 30 ++++++++++++++++++++++--------
 os/os-linux.h   | 25 +++++++++++++------------
 os/os-netbsd.h  |  2 +-
 os/os-openbsd.h | 29 ++++++++++++++++++++++++++---
 t/dedupe.c      |  2 +-
 8 files changed, 68 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 83a6702..bcb898a 100755
--- a/configure
+++ b/configure
@@ -257,8 +257,9 @@ fi
 # cross-compiling to one of these OSes then you'll need to specify
 # the correct CPU with the --cpu option.
 case $targetos in
-AIX)
+AIX|OpenBSD)
   # Unless explicitly enabled, turn off lex.
+  # OpenBSD will hit syntax error when enabled.
   if test -z "$disable_lex" ; then
     disable_lex="yes"
   else
diff --git a/fio.h b/fio.h
index 6b2b669..e11a039 100644
--- a/fio.h
+++ b/fio.h
@@ -596,7 +596,8 @@ static inline enum fio_ioengine_flags td_ioengine_flags(struct thread_data *td)
 
 static inline void td_set_ioengine_flags(struct thread_data *td)
 {
-	td->flags |= (td->io_ops->flags << TD_ENG_FLAG_SHIFT);
+	td->flags = (~(TD_ENG_FLAG_MASK << TD_ENG_FLAG_SHIFT) & td->flags) |
+		    (td->io_ops->flags << TD_ENG_FLAG_SHIFT);
 }
 
 static inline bool td_ioengine_flagged(struct thread_data *td,
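`td_set_ioengine_flags()` now clears the engine-flag field before OR-ing in the current engine's flags, so bits set by a previously loaded engine cannot survive. The clear-then-set idiom on a packed flags word is sketched below with hypothetical `ENG_FLAG_SHIFT` and `ENG_FLAG_MASK` values rather than fio's `TD_ENG_FLAG_*` definitions; unlike the fio helper, this version also masks the incoming value.

```c
#include <stdint.h>

/* Hypothetical layout: thread flags in the low 16 bits, engine flags above. */
#define ENG_FLAG_SHIFT	16
#define ENG_FLAG_MASK	0xffffU

/* Replace the engine-flag field of 'flags' with 'eng_flags'. */
static uint32_t set_engine_flags(uint32_t flags, uint32_t eng_flags)
{
	return (flags & ~(ENG_FLAG_MASK << ENG_FLAG_SHIFT)) |
	       ((eng_flags & ENG_FLAG_MASK) << ENG_FLAG_SHIFT);
}

/* Extract the engine-flag field again. */
static uint32_t get_engine_flags(uint32_t flags)
{
	return (flags >> ENG_FLAG_SHIFT) & ENG_FLAG_MASK;
}
```

With a plain `|=`, switching from engine flags `0x5` to `0x2` would leave `0x7`; clearing the field first yields `0x2`.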
diff --git a/iolog.c b/iolog.c
index 2e8da13..31d674c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -696,7 +696,7 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
-inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
+unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
 		unsigned int *io_u_plat_last)
 {
 	unsigned long sum;
diff --git a/os/os-android.h b/os/os-android.h
index b59fac1..ba599dd 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -20,6 +20,10 @@
 #include "binject.h"
 #include "../file.h"
 
+#ifndef __has_builtin         // Optional of course.
+  #define __has_builtin(x) 0  // Compatibility with non-clang compilers.
+#endif
+
 #define FIO_HAVE_DISK_UTIL
 #define FIO_HAVE_IOSCHED_SWITCH
 #define FIO_HAVE_IOPRIO
@@ -84,14 +88,14 @@ static inline int shmctl (int __shmid, int __cmd, struct shmid_ds *__buf)
 static inline int shmget (key_t __key, size_t __size, int __shmflg)
 {
 	int fd,ret;
-	char key[11];
-	
+	char keybuf[11];
+
 	fd = open(ASHMEM_DEVICE, O_RDWR);
 	if (fd < 0)
 		return fd;
 
-	sprintf(key,"%d",__key);
-	ret = ioctl(fd, ASHMEM_SET_NAME, key);
+	sprintf(keybuf,"%d",__key);
+	ret = ioctl(fd, ASHMEM_SET_NAME, keybuf);
 	if (ret < 0)
 		goto error;
 
@@ -100,7 +104,7 @@ static inline int shmget (key_t __key, size_t __size, int __shmflg)
 		goto error;
 
 	return fd;
-	
+
 error:
 	close(fd);
 	return ret;
@@ -219,9 +223,19 @@ static inline long os_random_long(os_random_state_t *rs)
 #define FIO_O_NOATIME	0
 #endif
 
-#define fio_swap16(x)	__bswap_16(x)
-#define fio_swap32(x)	__bswap_32(x)
-#define fio_swap64(x)	__bswap_64(x)
+/* Check for GCC or Clang byte swap intrinsics */
+#if (__has_builtin(__builtin_bswap16) && __has_builtin(__builtin_bswap32) \
+     && __has_builtin(__builtin_bswap64)) || (__GNUC__ > 4 \
+     || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)) /* fio_swapN */
+#define fio_swap16(x)	__builtin_bswap16(x)
+#define fio_swap32(x)	__builtin_bswap32(x)
+#define fio_swap64(x)	__builtin_bswap64(x)
+#else
+#include <byteswap.h>
+#define fio_swap16(x)	bswap_16(x)
+#define fio_swap32(x)	bswap_32(x)
+#define fio_swap64(x)	bswap_64(x)
+#endif /* fio_swapN */
 
 #define CACHE_LINE_FILE	\
 	"/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
diff --git a/os/os-linux.h b/os/os-linux.h
index 7b328dc..911f7e7 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -16,12 +16,15 @@
 #include <linux/unistd.h>
 #include <linux/raw.h>
 #include <linux/major.h>
-#include <byteswap.h>
 
 #include "./os-linux-syscall.h"
 #include "binject.h"
 #include "../file.h"
 
+#ifndef __has_builtin         // Optional of course.
+  #define __has_builtin(x) 0  // Compatibility with non-clang compilers.
+#endif
+
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_DISK_UTIL
 #define FIO_HAVE_SGIO
@@ -219,21 +222,19 @@ static inline int fio_lookup_raw(dev_t dev, int *majdev, int *mindev)
 #define FIO_MADV_FREE	MADV_REMOVE
 #endif
 
-#if defined(__builtin_bswap16)
+/* Check for GCC or Clang byte swap intrinsics */
+#if (__has_builtin(__builtin_bswap16) && __has_builtin(__builtin_bswap32) \
+     && __has_builtin(__builtin_bswap64)) || (__GNUC__ > 4 \
+     || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)) /* fio_swapN */
 #define fio_swap16(x)	__builtin_bswap16(x)
-#else
-#define fio_swap16(x)	__bswap_16(x)
-#endif
-#if defined(__builtin_bswap32)
 #define fio_swap32(x)	__builtin_bswap32(x)
-#else
-#define fio_swap32(x)	__bswap_32(x)
-#endif
-#if defined(__builtin_bswap64)
 #define fio_swap64(x)	__builtin_bswap64(x)
 #else
-#define fio_swap64(x)	__bswap_64(x)
-#endif
+#include <byteswap.h>
+#define fio_swap16(x)	bswap_16(x)
+#define fio_swap32(x)	bswap_32(x)
+#define fio_swap64(x)	bswap_64(x)
+#endif /* fio_swapN */
 
 #define CACHE_LINE_FILE	\
 	"/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
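The old `#if defined(__builtin_bswap16)` tests could never be true, since compiler builtins are not preprocessor macros. The new code asks `__has_builtin` (defined to 0 where the compiler lacks it) or checks for GCC 4.8 or later. A self-contained sketch of the same selection logic follows. The `demo_swapN` names are hypothetical, and the fallback branch uses plain shifts instead of glibc's `<byteswap.h>` so that it builds on any C99 compiler.

```c
#include <stdint.h>

#ifndef __has_builtin
#define __has_builtin(x) 0	/* compilers without __has_builtin */
#endif

#if (__has_builtin(__builtin_bswap16) && __has_builtin(__builtin_bswap32) && \
     __has_builtin(__builtin_bswap64)) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)))
/* Compiler intrinsics, as in the patched os-linux.h/os-android.h. */
#define demo_swap16(x)	__builtin_bswap16(x)
#define demo_swap32(x)	__builtin_bswap32(x)
#define demo_swap64(x)	__builtin_bswap64(x)
#else
/* Shift-based fallback; fio uses bswap_16/32/64 from <byteswap.h> here. */
static inline uint16_t demo_swap16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

static inline uint32_t demo_swap32(uint32_t x)
{
	return ((x >> 24) & 0xffU) | ((x >> 8) & 0xff00U) |
	       ((x << 8) & 0xff0000U) | (x << 24);
}

static inline uint64_t demo_swap64(uint64_t x)
{
	/* Swap each 32-bit half and exchange the halves. */
	return ((uint64_t)demo_swap32((uint32_t)x) << 32) |
	       demo_swap32((uint32_t)(x >> 32));
}
#endif
```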
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index e6ba508..7be02a7 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -25,7 +25,7 @@
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
 
-#undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
+#undef	FIO_HAVE_CPU_AFFINITY	/* doesn't exist */
 
 #define OS_MAP_ANON		MAP_ANON
 
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 7def432..d874ee2 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -9,6 +9,7 @@
 #include <sys/ioctl.h>
 #include <sys/dkio.h>
 #include <sys/disklabel.h>
+#include <sys/utsname.h>
 /* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #include <sys/sysctl.h>
 #undef RB_BLACK
@@ -24,7 +25,7 @@
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_SHM_ATTACH_REMOVED
 
-#undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
+#undef	FIO_HAVE_CPU_AFFINITY	/* doesn't exist */
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -68,7 +69,7 @@ static inline unsigned long long os_phys_mem(void)
 
 static inline int gettid(void)
 {
-	return (int) pthread_self();
+	return (int)(intptr_t) pthread_self();
 }
 
 static inline unsigned long long get_fs_free_size(const char *path)
@@ -90,9 +91,31 @@ static inline unsigned long long get_fs_free_size(const char *path)
 
 static inline int shm_attach_to_open_removed(void)
 {
+	struct utsname uts;
+	int major, minor;
+
+	if (uname(&uts) == -1)
+		return 0;
+
 	/*
-	 * XXX: Return 1 if >= OpenBSD 5.1 according to 97900ebf.
+	 * Return 1 if >= OpenBSD 5.1 according to 97900ebf,
+	 * assuming both major/minor versions are < 10.
 	 */
+	if (uts.release[0] > '9' || uts.release[0] < '0')
+		return 0;
+	if (uts.release[1] != '.')
+		return 0;
+	if (uts.release[2] > '9' || uts.release[2] < '0')
+		return 0;
+
+	major = uts.release[0] - '0';
+	minor = uts.release[2] - '0';
+
+	if (major > 5)
+		return 1;
+	if (major == 5 && minor >= 1)
+		return 1;
+
 	return 0;
 }
 
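`shm_attach_to_open_removed()` now reads the release string from `uname()` and reports support from OpenBSD 5.1 onward, assuming single-digit major and minor versions. The comparison is easier to check in isolation when it takes the string directly; `release_at_least_5_1()` is a hypothetical helper, not part of fio.

```c
#include <stdbool.h>

/*
 * Return true if an OpenBSD-style release string "M.N..." is at least
 * 5.1. Like the patched fio code, only single-digit major and minor
 * numbers are recognised; anything else yields false. The checks are
 * ordered so a short string is never read past its terminating NUL.
 */
static bool release_at_least_5_1(const char *rel)
{
	int major, minor;

	if (rel[0] < '0' || rel[0] > '9')
		return false;
	if (rel[1] != '.')
		return false;
	if (rel[2] < '0' || rel[2] > '9')
		return false;

	major = rel[0] - '0';
	minor = rel[2] - '0';

	return major > 5 || (major == 5 && minor >= 1);
}
```

On OpenBSD a caller would pass `uts.release` after a successful `uname(&uts)`. As the fio comment warns, a two-digit major such as `10.0` is rejected by this scheme and would be reported as unsupported.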
diff --git a/t/dedupe.c b/t/dedupe.c
index c0e9a69..1f172a2 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -14,7 +14,6 @@
 #include <fcntl.h>
 #include <string.h>
 
-#include "../lib/rbtree.h"
 #include "../flist.h"
 #include "../log.h"
 #include "../mutex.h"
@@ -25,6 +24,7 @@
 #include "../os/os.h"
 #include "../gettime.h"
 #include "../fio_time.h"
+#include "../lib/rbtree.h"
 
 #include "../lib/bloom.h"
 #include "debug.h"


* Recent changes (master)
@ 2017-04-27 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-27 12:00 UTC
  To: fio

The following changes since commit 306fea38fa61c305703d0269b1fc8e7da3b91a1f:

  stat: make next log time decision cleaner (2017-04-25 18:14:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4bd2c8b9251a2c88f44ad52168252ce2de660bf7:

  configure: fix broken test for cuda (2017-04-26 15:24:36 -0600)

----------------------------------------------------------------
Jens Axboe (8):
      stat: cleanup iops/bw logging functions
      seqlock: add simple user space code for sequence locks
      Merge branch 'gpudirect' of https://github.com/yufeiren/fio
      gpu: ensure that we convert gpu_dev_id options
      gpu: kill a lot of useless ifdefs
      server: bump protocol version
      thread_options: kill two unused pads
      configure: fix broken test for cuda

Tomohiro Kusumi (9):
      Fix num2str() output when modulo != -1U
      Drop the only local variable declaration within a for-loop (C99)
      Make lib/strntol.c a stand-alone library
      Make lib/pattern.c a stand-alone library
      Make lib/rand.c a stand-alone library
      Make lib/zipf.c a stand-alone library
      Make lib/mountcheck.c a stand-alone library
      Make oslib/strlcat.c a stand-alone library
      Make oslib/linux-dev-lookup.c a stand-alone library

Yufei Ren (1):
      GPUDirect RDMA support

 HOWTO                                |   3 +
 cconv.c                              |   2 +
 configure                            |  24 +++++++-
 examples/gpudirect-rdmaio-client.fio |  15 +++++
 examples/gpudirect-rdmaio-server.fio |  12 ++++
 fio.1                                |   3 +
 fio.h                                |  16 ++++++
 init.c                               |   3 +
 io_u.c                               |   5 ++
 lib/mountcheck.c                     |   2 +-
 lib/num2str.c                        |  34 ++++++-----
 lib/pattern.c                        |   9 ++-
 lib/rand.c                           |   2 +-
 lib/seqlock.h                        |  48 ++++++++++++++++
 lib/strntol.c                        |   2 +-
 lib/zipf.c                           |   1 -
 memory.c                             |  76 +++++++++++++++++++++++++
 options.c                            |  18 ++++++
 oslib/linux-dev-lookup.c             |   3 +-
 oslib/strlcat.c                      |   2 +-
 parse.c                              |   3 +-
 server.h                             |   2 +-
 stat.c                               | 107 +++++++++++------------------------
 thread_options.h                     |   7 ++-
 24 files changed, 298 insertions(+), 101 deletions(-)
 create mode 100644 examples/gpudirect-rdmaio-client.fio
 create mode 100644 examples/gpudirect-rdmaio-server.fio
 create mode 100644 lib/seqlock.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index ffdcb75..d9e881a 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1468,6 +1468,9 @@ Buffers and memory
 		**mmapshared**
 			Same as mmap, but use a MMAP_SHARED mapping.
 
+		**cudamalloc**
+			Use GPU memory as the buffers for GPUDirect RDMA benchmark.
+
 	The area allocated is a function of the maximum allowed bs size for the job,
 	multiplied by the I/O depth given. Note that for **shmhuge** and
 	**mmaphuge** to work, the system must have free huge pages allocated. This
diff --git a/cconv.c b/cconv.c
index 886140d..3295824 100644
--- a/cconv.c
+++ b/cconv.c
@@ -235,6 +235,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->new_group = le32_to_cpu(top->new_group);
 	o->numjobs = le32_to_cpu(top->numjobs);
 	o->cpus_allowed_policy = le32_to_cpu(top->cpus_allowed_policy);
+	o->gpu_dev_id = le32_to_cpu(top->gpu_dev_id);
 	o->iolog = le32_to_cpu(top->iolog);
 	o->rwmixcycle = le32_to_cpu(top->rwmixcycle);
 	o->nice = le32_to_cpu(top->nice);
@@ -420,6 +421,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->new_group = cpu_to_le32(o->new_group);
 	top->numjobs = cpu_to_le32(o->numjobs);
 	top->cpus_allowed_policy = cpu_to_le32(o->cpus_allowed_policy);
+	top->gpu_dev_id = cpu_to_le32(o->gpu_dev_id);
 	top->iolog = cpu_to_le32(o->iolog);
 	top->rwmixcycle = cpu_to_le32(o->rwmixcycle);
 	top->nice = cpu_to_le32(o->nice);
diff --git a/configure b/configure
index f42489b..83a6702 100755
--- a/configure
+++ b/configure
@@ -186,6 +186,8 @@ for opt do
   ;;
   --disable-pmem) disable_pmem="yes"
   ;;
+  --enable-cuda) enable_cuda="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -206,7 +208,7 @@ if test "$show_help" = "yes" ; then
   echo "--esx                   Configure build options for esx"
   echo "--enable-gfio           Enable building of gtk gfio"
   echo "--disable-numa          Disable libnuma even if found"
-  echo "--disable-rdma         Disable RDMA support even if found"
+  echo "--disable-rdma          Disable RDMA support even if found"
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
   echo "--disable-lex           Disable use of lex/yacc for math"
@@ -214,6 +216,7 @@ if test "$show_help" = "yes" ; then
   echo "--enable-lex            Enable use of lex/yacc for math"
   echo "--disable-shm           Disable SHM support"
   echo "--disable-optimizations Don't enable compiler optimizations"
+  echo "--enable-cuda           Enable GPUDirect RDMA support"
   exit $exit_val
 fi
 
@@ -1990,6 +1993,21 @@ EOF
 fi
 echo "march_armv8_a_crc_crypto      $march_armv8_a_crc_crypto"
 
+##########################################
+# cuda probe
+cuda="no"
+cat > $TMPC << EOF
+#include <cuda.h>
+int main(int argc, char **argv)
+{
+  return cuInit(0);
+}
+EOF
+if test "$enable_cuda" = "yes" && compile_prog "" "-lcuda" "cuda"; then
+  cuda="yes"
+  LIBS="-lcuda $LIBS"
+fi
+echo "cuda                          $cuda"
 
 #############################################################################
 
@@ -2210,10 +2228,12 @@ fi
 if test "$disable_opt" = "yes" ; then
   output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
 fi
-
 if test "$zlib" = "no" ; then
   echo "Consider installing zlib-dev (zlib-devel), some fio features depend on it."
 fi
+if test "$cuda" = "yes" ; then
+  output_sym "CONFIG_CUDA"
+fi
 
 echo "LIBS+=$LIBS" >> $config_host_mak
 echo "GFIO_LIBS+=$GFIO_LIBS" >> $config_host_mak
diff --git a/examples/gpudirect-rdmaio-client.fio b/examples/gpudirect-rdmaio-client.fio
new file mode 100644
index 0000000..1e24624
--- /dev/null
+++ b/examples/gpudirect-rdmaio-client.fio
@@ -0,0 +1,15 @@
+# Example gpudirect rdma client job
+[global]
+ioengine=rdma
+hostname=[hostname]
+port=[port]
+verb=[read/write/send/recv]
+mem=cudamalloc
+gpu_dev_id=0
+bs=1m
+size=100g
+
+[sender]
+rw=write
+iodepth=1
+iodepth_batch_complete=1
diff --git a/examples/gpudirect-rdmaio-server.fio b/examples/gpudirect-rdmaio-server.fio
new file mode 100644
index 0000000..5fc4950
--- /dev/null
+++ b/examples/gpudirect-rdmaio-server.fio
@@ -0,0 +1,12 @@
+# Example rdma server job
+[global]
+ioengine=rdma
+port=[port]
+mem=cudamalloc
+gpu_dev_id=0
+bs=1m
+size=100g
+
+[receiver]
+rw=read
+iodepth=16
diff --git a/fio.1 b/fio.1
index b59025d..138bcbb 100644
--- a/fio.1
+++ b/fio.1
@@ -1309,6 +1309,9 @@ Same as \fBmmap\fR, but use huge files as backing.
 .TP
 .B mmapshared
 Same as \fBmmap\fR, but use a MMAP_SHARED mapping.
+.TP
+.B cudamalloc
+Use GPU memory as the buffers for GPUDirect RDMA benchmark. The ioengine must be \fBrdma\fR.
 .RE
 .P
 The amount of memory allocated is the maximum allowed \fBblocksize\fR for the
diff --git a/fio.h b/fio.h
index b67613e..6b2b669 100644
--- a/fio.h
+++ b/fio.h
@@ -59,6 +59,10 @@
 #define MPOL_LOCAL MPOL_MAX
 #endif
 
+#ifdef CONFIG_CUDA
+#include <cuda.h>
+#endif
+
 /*
  * offset generator types
  */
@@ -408,6 +412,18 @@ struct thread_data {
 	struct steadystate_data ss;
 
 	char verror[FIO_VERROR_SIZE];
+
+#ifdef CONFIG_CUDA
+	/*
+	 * for GPU memory management
+	 */
+	int gpu_dev_cnt;
+	int gpu_dev_id;
+	CUdevice  cu_dev;
+	CUcontext cu_ctx;
+	CUdeviceptr dev_mem_ptr;
+#endif	
+
 };
 
 /*
diff --git a/init.c b/init.c
index 9aa452d..52a5f03 100644
--- a/init.c
+++ b/init.c
@@ -1070,6 +1070,9 @@ static void init_flags(struct thread_data *td)
 
 	if (o->verify_async || o->io_submit_mode == IO_MODE_OFFLOAD)
 		td->flags |= TD_F_NEED_LOCK;
+
+	if (o->mem_type == MEM_CUDA_MALLOC)
+		td->flags &= ~TD_F_SCRAMBLE_BUFFERS;
 }
 
 static int setup_random_seeds(struct thread_data *td)
diff --git a/io_u.c b/io_u.c
index 88f35c9..fd63119 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1674,8 +1674,10 @@ out:
 	if (!td_io_prep(td, io_u)) {
 		if (!td->o.disable_lat)
 			fio_gettime(&io_u->start_time, NULL);
+
 		if (do_scramble)
 			small_content_scramble(io_u);
+
 		return io_u;
 	}
 err_put:
@@ -2043,6 +2045,9 @@ void fill_io_buffer(struct thread_data *td, void *buf, unsigned int min_write,
 {
 	struct thread_options *o = &td->o;
 
+	if (o->mem_type == MEM_CUDA_MALLOC)
+		return;
+
 	if (o->compress_percentage || o->dedupe_percentage) {
 		unsigned int perc = td->o.compress_percentage;
 		struct frand_state *rs;
diff --git a/lib/mountcheck.c b/lib/mountcheck.c
index 0aec744..2fb6fe7 100644
--- a/lib/mountcheck.c
+++ b/lib/mountcheck.c
@@ -4,7 +4,7 @@
 #ifdef CONFIG_GETMNTENT
 #include <mntent.h>
 
-#include "lib/mountcheck.h"
+#include "mountcheck.h"
 
 #define MTAB	"/etc/mtab"
 
diff --git a/lib/num2str.c b/lib/num2str.c
index 448d3ff..8d08841 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -10,7 +10,7 @@
 /**
  * num2str() - Cheesy number->string conversion, complete with carry rounding error.
  * @num: quantity (e.g., number of blocks, bytes or bits)
- * @maxlen: max number of digits in the output string (not counting prefix and units)
+ * @maxlen: max number of digits in the output string (not counting prefix and units, but counting .)
  * @base: multiplier for num (e.g., if num represents Ki, use 1024)
  * @pow2: select unit prefix - 0=power-of-10 decimal SI, nonzero=power-of-2 binary IEC
  * @units: select units - N2S_* macros defined in num2str.h
@@ -23,9 +23,9 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 	const char **unitprefix;
 	const char *unitstr[] = { "", "/s", "B", "bit", "B/s", "bit/s" };
 	const unsigned int thousand[] = { 1000, 1024 };
-	unsigned int modulo, decimals;
+	unsigned int modulo;
 	int unit_index = 0, post_index, carry = 0;
-	char tmp[32];
+	char tmp[32], fmt[32];
 	char *buf;
 
 	compiletime_assert(sizeof(sistr) == sizeof(iecstr), "unit prefix arrays must be identical sizes");
@@ -62,6 +62,9 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 		break;
 	}
 
+	/*
+	 * Divide by K/Ki until string length of num <= maxlen.
+	 */
 	modulo = -1U;
 	while (post_index < sizeof(sistr)) {
 		sprintf(tmp, "%llu", (unsigned long long) num);
@@ -74,6 +77,9 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 		post_index++;
 	}
 
+	/*
+	 * If no modulo, then we're done.
+	 */
 	if (modulo == -1U) {
 done:
 		if (post_index >= ARRAY_SIZE(sistr))
@@ -84,23 +90,25 @@ done:
 		return buf;
 	}
 
+	/*
+	 * If no room for decimals, then we're done.
+	 */
 	sprintf(tmp, "%llu", (unsigned long long) num);
-	decimals = maxlen - strlen(tmp);
-	if ((int)decimals <= 1) {
+	if ((int)(maxlen - strlen(tmp)) <= 1) {
 		if (carry)
 			num++;
 		goto done;
 	}
 
-	do {
-		sprintf(tmp, "%u", modulo);
-		if (strlen(tmp) <= decimals - 1)
-			break;
-
-		modulo = (modulo + 9) / 10;
-	} while (1);
+	/*
+	 * Fill in everything and return the result.
+	 */
+	assert(maxlen - strlen(tmp) - 1 > 0);
+	assert(modulo < thousand[!!pow2]);
+	sprintf(fmt, "%%.%df", (int)(maxlen - strlen(tmp) - 1));
+	sprintf(tmp, fmt, (double)modulo / (double)thousand[!!pow2]);
 
-	sprintf(buf, "%llu.%u%s%s", (unsigned long long) num, modulo,
+	sprintf(buf, "%llu.%s%s%s", (unsigned long long) num, &tmp[2],
 			unitprefix[post_index], unitstr[unit_index]);
 	return buf;
 }
diff --git a/lib/pattern.c b/lib/pattern.c
index b8ae809..0aeb935 100644
--- a/lib/pattern.c
+++ b/lib/pattern.c
@@ -1,6 +1,13 @@
-#include "fio.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <errno.h>
+#include <assert.h>
+
 #include "strntol.h"
 #include "pattern.h"
+#include "../minmax.h"
 #include "../oslib/strcasestr.h"
 
 /**
diff --git a/lib/rand.c b/lib/rand.c
index 9c3e0d6..3f60a67 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -36,7 +36,7 @@
 #include <string.h>
 #include <assert.h>
 #include "rand.h"
-#include "lib/pattern.h"
+#include "pattern.h"
 #include "../hash.h"
 
 int arch_random;
diff --git a/lib/seqlock.h b/lib/seqlock.h
new file mode 100644
index 0000000..1ac1eb6
--- /dev/null
+++ b/lib/seqlock.h
@@ -0,0 +1,48 @@
+#ifndef FIO_SEQLOCK_H
+#define FIO_SEQLOCK_H
+
+#include "../arch/arch.h"
+
+struct seqlock {
+	volatile int sequence;
+};
+
+static inline void seqlock_init(struct seqlock *s)
+{
+	s->sequence = 0;
+}
+
+static inline unsigned int read_seqlock_begin(struct seqlock *s)
+{
+	unsigned int seq;
+
+	do {
+		seq = s->sequence;
+		if (!(seq & 1))
+			break;
+		nop;
+	} while (1);
+
+	read_barrier();
+	return seq;
+}
+
+static inline bool read_seqlock_retry(struct seqlock *s, unsigned int seq)
+{
+	read_barrier();
+	return s->sequence != seq;
+}
+
+static inline void write_seqlock_begin(struct seqlock *s)
+{
+	s->sequence++;
+	write_barrier();
+}
+
+static inline void write_seqlock_end(struct seqlock *s)
+{
+	write_barrier();
+	s->sequence++;
+}
+
+#endif
diff --git a/lib/strntol.c b/lib/strntol.c
index adf45bd..f622c8d 100644
--- a/lib/strntol.c
+++ b/lib/strntol.c
@@ -2,7 +2,7 @@
 #include <stdlib.h>
 #include <limits.h>
 
-#include "lib/strntol.h"
+#include "strntol.h"
 
 long strntol(const char *str, size_t sz, char **end, int base)
 {
diff --git a/lib/zipf.c b/lib/zipf.c
index 681df70..3d535c7 100644
--- a/lib/zipf.c
+++ b/lib/zipf.c
@@ -6,7 +6,6 @@
 #include <sys/types.h>
 #include <fcntl.h>
 #include "ieee754.h"
-#include "../log.h"
 #include "zipf.h"
 #include "../minmax.h"
 #include "../hash.h"
diff --git a/memory.c b/memory.c
index 9e73f10..22a7f5d 100644
--- a/memory.c
+++ b/memory.c
@@ -207,6 +207,78 @@ static void free_mem_malloc(struct thread_data *td)
 	free(td->orig_buffer);
 }
 
+static int alloc_mem_cudamalloc(struct thread_data *td, size_t total_mem)
+{
+#ifdef CONFIG_CUDA
+	CUresult ret;
+	char name[128];
+
+	ret = cuInit(0);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: failed initialize cuda driver api\n");
+		return 1;
+	}
+
+	ret = cuDeviceGetCount(&td->gpu_dev_cnt);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: failed get device count\n");
+		return 1;
+	}
+	dprint(FD_MEM, "found %d GPU devices\n", td->gpu_dev_cnt);
+
+	if (td->gpu_dev_cnt == 0) {
+		log_err("fio: no GPU device found. "
+			"Can not perform GPUDirect RDMA.\n");
+		return 1;
+	}
+
+	td->gpu_dev_id = td->o.gpu_dev_id;
+	ret = cuDeviceGet(&td->cu_dev, td->gpu_dev_id);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: failed get GPU device\n");
+		return 1;
+	}
+
+	ret = cuDeviceGetName(name, sizeof(name), td->gpu_dev_id);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: failed get device name\n");
+		return 1;
+	}
+	dprint(FD_MEM, "dev_id = [%d], device name = [%s]\n", \
+	       td->gpu_dev_id, name);
+
+	ret = cuCtxCreate(&td->cu_ctx, CU_CTX_MAP_HOST, td->cu_dev);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: failed to create cuda context: %d\n", ret);
+		return 1;
+	}
+
+	ret = cuMemAlloc(&td->dev_mem_ptr, total_mem);
+	if (ret != CUDA_SUCCESS) {
+		log_err("fio: cuMemAlloc %zu bytes failed\n", total_mem);
+		return 1;
+	}
+	td->orig_buffer = (void *) td->dev_mem_ptr;
+
+	dprint(FD_MEM, "cudaMalloc %llu %p\n",				\
+	       (unsigned long long) total_mem, td->orig_buffer);
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
+static void free_mem_cudamalloc(struct thread_data *td)
+{
+#ifdef CONFIG_CUDA
+	if (td->dev_mem_ptr != NULL)
+		cuMemFree(td->dev_mem_ptr);
+
+	if (cuCtxDestroy(td->cu_ctx) != CUDA_SUCCESS)
+		log_err("fio: failed to destroy cuda context\n");
+#endif
+}
+
 /*
  * Set up the buffer area we need for io.
  */
@@ -246,6 +318,8 @@ int allocate_io_mem(struct thread_data *td)
 	else if (td->o.mem_type == MEM_MMAP || td->o.mem_type == MEM_MMAPHUGE ||
 		 td->o.mem_type == MEM_MMAPSHARED)
 		ret = alloc_mem_mmap(td, total_mem);
+	else if (td->o.mem_type == MEM_CUDA_MALLOC)
+		ret = alloc_mem_cudamalloc(td, total_mem);
 	else {
 		log_err("fio: bad mem type: %d\n", td->o.mem_type);
 		ret = 1;
@@ -275,6 +349,8 @@ void free_io_mem(struct thread_data *td)
 	else if (td->o.mem_type == MEM_MMAP || td->o.mem_type == MEM_MMAPHUGE ||
 		 td->o.mem_type == MEM_MMAPSHARED)
 		free_mem_mmap(td, total_mem);
+	else if (td->o.mem_type == MEM_CUDA_MALLOC)
+		free_mem_cudamalloc(td);
 	else
 		log_err("Bad memory type %u\n", td->o.mem_type);
 
diff --git a/options.c b/options.c
index e0deab0..85574d7 100644
--- a/options.c
+++ b/options.c
@@ -2604,6 +2604,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Like mmap, but use huge pages",
 			  },
 #endif
+#ifdef CONFIG_CUDA
+			  { .ival = "cudamalloc",
+			    .oval = MEM_CUDA_MALLOC,
+			    .help = "Allocate GPU device memory for GPUDirect RDMA",
+			  },
+#endif
 		  },
 	},
 	{
@@ -3563,6 +3569,18 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.help	= "Build fio with libnuma-dev(el) to enable this option",
 	},
 #endif
+#ifdef CONFIG_CUDA
+	{
+		.name	= "gpu_dev_id",
+		.lname	= "GPU device ID",
+		.type	= FIO_OPT_INT,
+		.off1	= offsetof(struct thread_options, gpu_dev_id),
+		.help	= "Set GPU device ID for GPUDirect RDMA",
+		.def    = "0",
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_INVALID,
+	},
+#endif
 	{
 		.name	= "end_fsync",
 		.lname	= "End fsync",
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 2bbd14a..5fbccd3 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -5,8 +5,7 @@
 #include <stdio.h>
 #include <unistd.h>
 
-#include "../os/os.h"
-#include "oslib/linux-dev-lookup.h"
+#include "linux-dev-lookup.h"
 
 int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 			   unsigned int min)
diff --git a/oslib/strlcat.c b/oslib/strlcat.c
index 3329b83..3b33d0e 100644
--- a/oslib/strlcat.c
+++ b/oslib/strlcat.c
@@ -1,5 +1,5 @@
 #include <string.h>
-#include "oslib/strlcat.h"
+#include "strlcat.h"
 
 size_t strlcat(char *dst, const char *src, size_t size)
 {
diff --git a/parse.c b/parse.c
index fd5605f..4d4fddd 100644
--- a/parse.c
+++ b/parse.c
@@ -135,6 +135,7 @@ static unsigned long long get_mult_time(const char *str, int len,
 	const char *p = str;
 	char *c;
 	unsigned long long mult = 1;
+	int i;
 
 	/*
          * Go forward until we hit a non-digit, or +/- sign
@@ -153,7 +154,7 @@ static unsigned long long get_mult_time(const char *str, int len,
 	}
 
 	c = strdup(p);
-	for (int i = 0; i < strlen(c); i++)
+	for (i = 0; i < strlen(c); i++)
 		c[i] = tolower(c[i]);
 
 	if (!strncmp("us", c, 2) || !strncmp("usec", c, 4))
diff --git a/server.h b/server.h
index 798d5a8..5c720d4 100644
--- a/server.h
+++ b/server.h
@@ -49,7 +49,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 60,
+	FIO_SERVER_VER			= 61,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 27d1fea..f3b82cf 100644
--- a/stat.c
+++ b/stat.c
@@ -2427,19 +2427,21 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 	td_io_u_unlock(td);
 }
 
-static int add_bw_samples(struct thread_data *td, struct timeval *t)
+static int __add_samples(struct thread_data *td, struct timeval *parent_tv,
+			 struct timeval *t, unsigned int avg_time,
+			 uint64_t *this_io_bytes, uint64_t *stat_io_bytes,
+			 struct io_stat *stat, struct io_log *log,
+			 bool is_kb)
 {
-	struct thread_stat *ts = &td->ts;
 	unsigned long spent, rate;
 	enum fio_ddir ddir;
 	unsigned int next, next_log;
 
-	next_log = td->o.bw_avg_time;
+	next_log = avg_time;
 
-	spent = mtime_since(&td->bw_sample_time, t);
-	if (spent < td->o.bw_avg_time &&
-	    td->o.bw_avg_time - spent >= LOG_MSEC_SLACK)
-		return td->o.bw_avg_time - spent;
+	spent = mtime_since(parent_tv, t);
+	if (spent < avg_time && avg_time - spent >= LOG_MSEC_SLACK)
+		return avg_time - spent;
 
 	td_io_u_lock(td);
 
@@ -2449,16 +2451,19 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 		uint64_t delta;
 
-		delta = td->this_io_bytes[ddir] - td->stat_io_bytes[ddir];
+		delta = this_io_bytes[ddir] - stat_io_bytes[ddir];
 		if (!delta)
 			continue; /* No entries for interval */
 
-		if (spent)
-			rate = delta * 1000 / spent / 1024; /* KiB/s */
-		else
+		if (spent) {
+			if (is_kb)
+				rate = delta * 1000 / spent / 1024; /* KiB/s */
+			else
+				rate = (delta * 1000) / spent;
+		} else
 			rate = 0;
 
-		add_stat_sample(&ts->bw_stat[ddir], rate);
+		add_stat_sample(&stat[ddir], rate);
 
 		if (td->bw_log) {
 			unsigned int bs = 0;
@@ -2466,26 +2471,32 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			next = add_log_sample(td, td->bw_log, sample_val(rate),
-					      ddir, bs, 0);
+			next = add_log_sample(td, log, sample_val(rate), ddir, bs, 0);
 			next_log = min(next_log, next);
 		}
 
-		td->stat_io_bytes[ddir] = td->this_io_bytes[ddir];
+		stat_io_bytes[ddir] = this_io_bytes[ddir];
 	}
 
-	timeval_add_msec(&td->bw_sample_time, td->o.bw_avg_time);
+	timeval_add_msec(parent_tv, avg_time);
 
 	td_io_u_unlock(td);
 
-	if (spent <= td->o.bw_avg_time)
-		next = td->o.bw_avg_time;
+	if (spent <= avg_time)
+		next = avg_time;
 	else
-		next = td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
+		next = avg_time - (1 + spent - avg_time);
 
 	return min(next, next_log);
 }
 
+static int add_bw_samples(struct thread_data *td, struct timeval *t)
+{
+	return __add_samples(td, &td->bw_sample_time, t, td->o.bw_avg_time,
+				td->this_io_bytes, td->stat_io_bytes,
+				td->ts.bw_stat, td->bw_log, true);
+}
+
 void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 		     unsigned int bytes)
 {
@@ -2505,61 +2516,9 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 
 static int add_iops_samples(struct thread_data *td, struct timeval *t)
 {
-	struct thread_stat *ts = &td->ts;
-	unsigned long spent, iops;
-	enum fio_ddir ddir;
-	unsigned int next, next_log;
-
-	next_log = td->o.iops_avg_time;
-
-	spent = mtime_since(&td->iops_sample_time, t);
-	if (spent < td->o.iops_avg_time &&
-	    td->o.iops_avg_time - spent >= LOG_MSEC_SLACK)
-		return td->o.iops_avg_time - spent;
-
-	td_io_u_lock(td);
-
-	/*
-	 * Compute both read and write rates for the interval.
-	 */
-	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
-		uint64_t delta;
-
-		delta = td->this_io_blocks[ddir] - td->stat_io_blocks[ddir];
-		if (!delta)
-			continue; /* No entries for interval */
-
-		if (spent)
-			iops = (delta * 1000) / spent;
-		else
-			iops = 0;
-
-		add_stat_sample(&ts->iops_stat[ddir], iops);
-
-		if (td->iops_log) {
-			unsigned int bs = 0;
-
-			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
-				bs = td->o.min_bs[ddir];
-
-			next = add_log_sample(td, td->iops_log,
-					      sample_val(iops), ddir, bs, 0);
-			next_log = min(next_log, next);
-		}
-
-		td->stat_io_blocks[ddir] = td->this_io_blocks[ddir];
-	}
-
-	timeval_add_msec(&td->iops_sample_time, td->o.iops_avg_time);
-
-	td_io_u_unlock(td);
-
-	if (spent <= td->o.iops_avg_time)
-		next = td->o.iops_avg_time;
-	else
-		next = td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
-
-	return min(next, next_log);
+	return __add_samples(td, &td->iops_sample_time, t, td->o.iops_avg_time,
+				td->this_io_blocks, td->stat_io_blocks,
+				td->ts.iops_stat, td->iops_log, false);
 }
 
 /*
diff --git a/thread_options.h b/thread_options.h
index 2b2df33..d0f3fe9 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -20,6 +20,7 @@ enum fio_memtype {
 	MEM_MMAP,	/* use anonynomous mmap */
 	MEM_MMAPHUGE,	/* memory mapped huge file */
 	MEM_MMAPSHARED, /* use mmap with shared flag */
+	MEM_CUDA_MALLOC,/* use GPU memory */
 };
 
 #define ERROR_STR_MAX	128
@@ -198,6 +199,8 @@ struct thread_options {
 	unsigned short numa_mem_mode;
 	unsigned int numa_mem_prefer_node;
 	char *numa_memnodes;
+	unsigned int gpu_dev_id;
+
 	unsigned int iolog;
 	unsigned int rwmixcycle;
 	unsigned int rwmix[DDIR_RWDIR_CNT];
@@ -336,7 +339,6 @@ struct thread_options_pack {
 	uint32_t iodepth_batch;
 	uint32_t iodepth_batch_complete_min;
 	uint32_t iodepth_batch_complete_max;
-	uint32_t __proper_alignment_for_64b;
 
 	uint64_t size;
 	uint64_t io_size;
@@ -411,7 +413,6 @@ struct thread_options_pack {
 	uint32_t bs_unaligned;
 	uint32_t fsync_on_close;
 	uint32_t bs_is_seq_rand;
-	uint32_t pad1;
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
@@ -467,6 +468,8 @@ struct thread_options_pack {
 	uint8_t verify_cpumask[FIO_TOP_STR_MAX];
 	uint8_t log_gz_cpumask[FIO_TOP_STR_MAX];
 #endif
+	uint32_t gpu_dev_id;
+	uint32_t pad;
 	uint32_t cpus_allowed_policy;
 	uint32_t iolog;
 	uint32_t rwmixcycle;

* Recent changes (master)
@ 2017-04-26 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-26 12:00 UTC
  To: fio

The following changes since commit d7658cedd1f31d1285f1f1e1cfe42fabb7cd6c8f:

  Return non-negtive error in order to print right error msg (2017-04-19 20:34:19 +0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 306fea38fa61c305703d0269b1fc8e7da3b91a1f:

  stat: make next log time decision cleaner (2017-04-25 18:14:00 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      stat: make next log time decision cleaner

 stat.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index fde7af2..27d1fea 100644
--- a/stat.c
+++ b/stat.c
@@ -2479,9 +2479,10 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	td_io_u_unlock(td);
 
 	if (spent <= td->o.bw_avg_time)
-		return min(next_log, td->o.bw_avg_time);
+		next = td->o.bw_avg_time;
+	else
+		next = td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
 
-	next = td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
 	return min(next, next_log);
 }
 
@@ -2554,9 +2555,10 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 	td_io_u_unlock(td);
 
 	if (spent <= td->o.iops_avg_time)
-		return min(next_log, td->o.iops_avg_time);
+		next = td->o.iops_avg_time;
+	else
+		next = td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
 
-	next = td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
 	return min(next, next_log);
 }
 

* Recent changes (master)
@ 2017-04-20 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-20 12:00 UTC
  To: fio

The following changes since commit b94d4d75a2e474561dcda8ee852cd5e67dde884e:

  Merge branch 'pull-2' of https://github.com/dmonakhov/fio (2017-04-10 11:41:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d7658cedd1f31d1285f1f1e1cfe42fabb7cd6c8f:

  Return non-negtive error in order to print right error msg (2017-04-19 20:34:19 +0800)

----------------------------------------------------------------
mychoxin (1):
      Return non-negtive error in order to print right error msg

 engines/rbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index 829e41a..7433879 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -290,7 +290,7 @@ static void _fio_rbd_finish_aiocb(rbd_completion_t comp, void *data)
 	 */
 	ret = rbd_aio_get_return_value(fri->completion);
 	if (ret < 0) {
-		io_u->error = ret;
+		io_u->error = -ret;
 		io_u->resid = io_u->xfer_buflen;
 	} else
 		io_u->error = 0;
@@ -524,7 +524,7 @@ static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
 failed_comp:
 	rbd_aio_release(fri->completion);
 failed:
-	io_u->error = r;
+	io_u->error = -r;
 	td_verror(td, io_u->error, "xfer");
 	return FIO_Q_COMPLETED;
 }

* Recent changes (master)
@ 2017-04-11 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-11 12:00 UTC
  To: fio

The following changes since commit e5a415c889251928dd256738a8022f1eab91c73b:

  Fix num2str() output when maxlen <= strlen(tmp) (2017-04-08 11:04:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b94d4d75a2e474561dcda8ee852cd5e67dde884e:

  Merge branch 'pull-2' of https://github.com/dmonakhov/fio (2017-04-10 11:41:09 -0600)

----------------------------------------------------------------
Dmitry Monakhov (2):
      engine: e4defrag fix error reporting
      engine: add ftruncate ioengine

Jens Axboe (1):
      Merge branch 'pull-2' of https://github.com/dmonakhov/fio

Sitsofe Wheeler (1):
      doc: add ftruncate engine documentation and example jobfile

 HOWTO                  |  5 +++++
 Makefile               |  1 +
 engines/e4defrag.c     |  9 ++++++--
 engines/ftruncate.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/ftruncate.fio | 27 ++++++++++++++++++++++++
 5 files changed, 96 insertions(+), 2 deletions(-)
 create mode 100644 engines/ftruncate.c
 create mode 100644 examples/ftruncate.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 80b9e75..ffdcb75 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1675,6 +1675,11 @@ I/O engine
 			DDIR_TRIM
 				does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
 
+		**ftruncate**
+			I/O engine that sends :manpage:`ftruncate(2)` operations in response
+			to write (DDIR_WRITE) events. Each ftruncate issued sets the file's
+			size to the current block offset. Block size is ignored.
+
 		**e4defrag**
 			I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
 			defragment activity in request to DDIR_WRITE event.
diff --git a/Makefile b/Makefile
index 37150c6..66083ff 100644
--- a/Makefile
+++ b/Makefile
@@ -42,6 +42,7 @@ SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		eta.c verify.c memory.c io_u.c parse.c mutex.c options.c \
 		smalloc.c filehash.c profile.c debug.c engines/cpu.c \
 		engines/mmap.c engines/sync.c engines/null.c engines/net.c \
+		engines/ftruncate.c \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index 1e4996f..4b44488 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -172,8 +172,13 @@ static int fio_e4defrag_queue(struct thread_data *td, struct io_u *io_u)
 		len = io_u->xfer_buflen;
 
 	if (len != io_u->xfer_buflen) {
-		io_u->resid = io_u->xfer_buflen - len;
-		io_u->error = 0;
+		if (len) {
+			io_u->resid = io_u->xfer_buflen - len;
+			io_u->error = 0;
+		} else {
+			/* access beyond i_size */
+			io_u->error = EINVAL;
+		}
 	}
 	if (ret)
 		io_u->error = errno;
diff --git a/engines/ftruncate.c b/engines/ftruncate.c
new file mode 100644
index 0000000..e86dbac
--- /dev/null
+++ b/engines/ftruncate.c
@@ -0,0 +1,56 @@
+/*
+ * ftruncate: ioengine for git://git.kernel.dk/fio.git
+ *
+ * IO engine that does regular truncates to simulate data transfer
+ * as fio ioengine.
+ * DDIR_WRITE does ftruncate
+ *
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/uio.h>
+#include <errno.h>
+#include <assert.h>
+#include <fcntl.h>
+
+#include "../fio.h"
+#include "../filehash.h"
+
+static int fio_ftruncate_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	int ret;
+	fio_ro_check(td, io_u);
+
+	if (io_u->ddir != DDIR_WRITE) {
+		io_u->error = EINVAL;
+		return FIO_Q_COMPLETED;
+	}
+	ret = ftruncate(f->fd, io_u->offset);
+
+	if (ret)
+		io_u->error = errno;
+
+	return FIO_Q_COMPLETED;
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "ftruncate",
+	.version	= FIO_IOOPS_VERSION,
+	.queue		= fio_ftruncate_queue,
+	.open_file	= generic_open_file,
+	.close_file	= generic_close_file,
+	.get_file_size	= generic_get_file_size,
+	.flags		= FIO_SYNCIO | FIO_FAKEIO
+};
+
+static void fio_init fio_syncio_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_syncio_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
diff --git a/examples/ftruncate.fio b/examples/ftruncate.fio
new file mode 100644
index 0000000..a6ef457
--- /dev/null
+++ b/examples/ftruncate.fio
@@ -0,0 +1,27 @@
+# Example ftruncate engine jobs
+
+[global]
+ioengine=ftruncate
+directory=/scratch
+size=102404k ; 100Mb+4k
+stonewall
+filename=truncate
+runtime=10s
+time_based
+direct=1
+#
+# bs option is stub here. Truncation is performed on the current block offset.
+# blocksize value is ignored
+bs=4k
+
+# truncate the file to 4Kbytes then repeatedly grow the file back to just over
+# its original size using subsequent truncates
+[grow-truncate]
+rw=write
+
+# Repeatedly change a file to a random size between 0Kbytes and 100Mb
+# using truncates
+[rand-truncate]
+rw=randwrite
+norandommap
+

* Recent changes (master)
@ 2017-04-09 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-09 12:00 UTC
  To: fio

The following changes since commit 6c8611c79713fe73fddf7458ab3ab36feaeae67b:

  Split poisson rate control into read/write/trim (2017-04-07 16:04:31 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e5a415c889251928dd256738a8022f1eab91c73b:

  Fix num2str() output when maxlen <= strlen(tmp) (2017-04-08 11:04:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      backend: include data direction in debug poisson rate print

Tomohiro Kusumi (6):
      Don't silently terminate td when no I/O performed due to error
      dump_td_info() doesn't really need to be a function
      Make lib/prio_tree.c a stand-alone library
      Make lib/memalign.c a stand-alone library
      Make lib/num2str.c a stand-alone library by adding lib/num2str.h
      Fix num2str() output when maxlen <= strlen(tmp)

 backend.c       | 36 +++++++++++++++++++++++-------------
 fio.h           |  9 +--------
 io_u.c          |  5 +++--
 lib/memalign.c  |  4 +++-
 lib/num2str.c   |  9 ++++++---
 lib/num2str.h   | 15 +++++++++++++++
 lib/prio_tree.c |  5 ++++-
 7 files changed, 55 insertions(+), 28 deletions(-)
 create mode 100644 lib/num2str.h

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 86e689f..9a684ed 100644
--- a/backend.c
+++ b/backend.c
@@ -813,8 +813,9 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 		val = (int64_t) (1000000 / iops) *
 				-logf(__rand_0_1(&td->poisson_state[ddir]));
 		if (val) {
-			dprint(FD_RATE, "poisson rate iops=%llu\n",
-					(unsigned long long) 1000000 / val);
+			dprint(FD_RATE, "poisson rate iops=%llu, ddir=%d\n",
+					(unsigned long long) 1000000 / val,
+					ddir);
 		}
 		td->last_usec[ddir] += val;
 		return td->last_usec[ddir];
@@ -1456,6 +1457,7 @@ static void *thread_main(void *data)
 	struct thread_data *td = fd->td;
 	struct thread_options *o = &td->o;
 	struct sk_out *sk_out = fd->sk_out;
+	uint64_t bytes_done[DDIR_RWDIR_CNT];
 	int deadlock_loop_cnt;
 	int clear_state;
 	int ret;
@@ -1677,7 +1679,9 @@ static void *thread_main(void *data)
 					sizeof(td->bw_sample_time));
 	}
 
+	memset(bytes_done, 0, sizeof(bytes_done));
 	clear_state = 0;
+
 	while (keep_running(td)) {
 		uint64_t verify_bytes;
 
@@ -1696,8 +1700,6 @@ static void *thread_main(void *data)
 		if (td->o.verify_only && td_write(td))
 			verify_bytes = do_dry_run(td);
 		else {
-			uint64_t bytes_done[DDIR_RWDIR_CNT];
-
 			do_io(td, bytes_done);
 
 			if (!ddir_rw_sum(bytes_done)) {
@@ -1776,6 +1778,18 @@ static void *thread_main(void *data)
 			break;
 	}
 
+	/*
+	 * If td ended up with no I/O when it should have had,
+	 * then something went wrong unless FIO_NOIO or FIO_DISKLESSIO.
+	 * (Are we not missing other flags that can be ignored ?)
+	 */
+	if ((td->o.size || td->o.io_size) && !ddir_rw_sum(bytes_done) &&
+	    !(td_ioengine_flagged(td, FIO_NOIO) ||
+	      td_ioengine_flagged(td, FIO_DISKLESSIO)))
+		log_err("%s: No I/O performed by %s, "
+			 "perhaps try --debug=io option for details?\n",
+			 td->o.name, td->io_ops->name);
+
 	td_set_runstate(td, TD_FINISHING);
 
 	update_rusage_stat(td);
@@ -1848,14 +1862,6 @@ err:
 	return (void *) (uintptr_t) td->error;
 }
 
-static void dump_td_info(struct thread_data *td)
-{
-	log_err("fio: job '%s' (state=%d) hasn't exited in %lu seconds, it "
-		"appears to be stuck. Doing forceful exit of this job.\n",
-			td->o.name, td->runstate,
-			(unsigned long) time_since_now(&td->terminate_time));
-}
-
 /*
  * Run over the job map and reap the threads that have exited, if any.
  */
@@ -1940,7 +1946,11 @@ static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
 		if (td->terminate &&
 		    td->runstate < TD_FSYNCING &&
 		    time_since_now(&td->terminate_time) >= FIO_REAP_TIMEOUT) {
-			dump_td_info(td);
+			log_err("fio: job '%s' (state=%d) hasn't exited in "
+				"%lu seconds, it appears to be stuck. Doing "
+				"forceful exit of this job.\n",
+				td->o.name, td->runstate,
+				(unsigned long) time_since_now(&td->terminate_time));
 			td_set_runstate(td, TD_REAPED);
 			goto reaped;
 		}
diff --git a/fio.h b/fio.h
index 8171b9a..b67613e 100644
--- a/fio.h
+++ b/fio.h
@@ -35,6 +35,7 @@
 #include "oslib/getopt.h"
 #include "lib/rand.h"
 #include "lib/rbtree.h"
+#include "lib/num2str.h"
 #include "client.h"
 #include "server.h"
 #include "stat.h"
@@ -522,7 +523,6 @@ extern void fio_options_mem_dupe(struct thread_data *);
 extern void td_fill_rand_seeds(struct thread_data *);
 extern void td_fill_verify_state_seed(struct thread_data *);
 extern void add_job_opts(const char **, int);
-extern char *num2str(uint64_t, int, int, int, int);
 extern int ioengine_load(struct thread_data *);
 extern bool parse_dryrun(void);
 extern int fio_running_or_pending_io_threads(void);
@@ -535,13 +535,6 @@ extern uintptr_t page_size;
 extern int initialize_fio(char *envp[]);
 extern void deinitialize_fio(void);
 
-#define N2S_NONE	0
-#define N2S_BITPERSEC 	1	/* match unit_base for bit rates */
-#define N2S_PERSEC	2
-#define N2S_BIT		3
-#define N2S_BYTE	4
-#define N2S_BYTEPERSEC 	8	/* match unit_base for byte rates */
-
 #define FIO_GETOPT_JOB		0x89000000
 #define FIO_GETOPT_IOENGINE	0x98000000
 #define FIO_NR_OPTIONS		(FIO_MAX_OPTS + 128)
diff --git a/io_u.c b/io_u.c
index 363bfe1..88f35c9 100644
--- a/io_u.c
+++ b/io_u.c
@@ -899,8 +899,9 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
-		dprint(FD_IO, "io_u %p, offset too large\n", io_u);
-		dprint(FD_IO, "  off=%llu/%lu > %llu\n",
+		dprint(FD_IO, "io_u %p, offset + buflen exceeds file size\n",
+			io_u);
+		dprint(FD_IO, "  offset=%llu/buflen=%lu > %llu\n",
 			(unsigned long long) io_u->offset, io_u->buflen,
 			(unsigned long long) io_u->file->real_file_size);
 		return 1;
diff --git a/lib/memalign.c b/lib/memalign.c
index 1d1ba9b..137cc8e 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -3,7 +3,9 @@
 #include <inttypes.h>
 
 #include "memalign.h"
-#include "../fio.h"
+
+#define PTR_ALIGN(ptr, mask)   \
+	(char *)((uintptr_t)((ptr) + (mask)) & ~(mask))
 
 struct align_footer {
 	unsigned int offset;
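The `PTR_ALIGN()` macro added above rounds a pointer up to the next multiple of `mask + 1` bytes. That only works when `mask + 1` is a power of two. Below is a minimal sketch of its behavior. It assumes `mask` is passed as a `size_t`. With a 32-bit `unsigned int` mask, `~(mask)` would be zero-extended and would clear the upper half of a 64-bit address. `align_up()` is a hypothetical helper, not part of fio:

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Same definition as in lib/memalign.c above. */
#define PTR_ALIGN(ptr, mask)   \
	(char *)((uintptr_t)((ptr) + (mask)) & ~(mask))

/*
 * Hypothetical helper: round p up to the next multiple of align.
 * align must be a power of two; the mask passed on is align - 1 as a size_t.
 */
static char *align_up(char *p, size_t align)
{
	return PTR_ALIGN(p, align - 1);
}
```

An already aligned pointer is returned unchanged. Any other pointer moves forward by less than `align` bytes, so a buffer over-allocated by `align - 1` bytes still has room for the aligned region.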
diff --git a/lib/num2str.c b/lib/num2str.c
index ed3545d..448d3ff 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -2,7 +2,10 @@
 #include <stdio.h>
 #include <string.h>
 
-#include "../fio.h"
+#include "../compiler/compiler.h"
+#include "num2str.h"
+
+#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
 
 /**
  * num2str() - Cheesy number->string conversion, complete with carry rounding error.
@@ -10,7 +13,7 @@
  * @maxlen: max number of digits in the output string (not counting prefix and units)
  * @base: multiplier for num (e.g., if num represents Ki, use 1024)
  * @pow2: select unit prefix - 0=power-of-10 decimal SI, nonzero=power-of-2 binary IEC
- * @units: select units - N2S_* macros defined in fio.h
+ * @units: select units - N2S_* macros defined in num2str.h
  * @returns a malloc'd buffer containing "number[<unit prefix>][<units>]"
  */
 char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
@@ -83,7 +86,7 @@ done:
 
 	sprintf(tmp, "%llu", (unsigned long long) num);
 	decimals = maxlen - strlen(tmp);
-	if (decimals <= 1) {
+	if ((int)decimals <= 1) {
 		if (carry)
 			num++;
 		goto done;
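The `(int)` cast matters because `maxlen - strlen(tmp)` is evaluated in unsigned arithmetic. When the number has more digits than `maxlen`, the subtraction wraps to a huge value instead of going negative, so the `<= 1` branch is skipped. Below is a minimal sketch reproducing this. It assumes `decimals` is declared unsigned in `num2str()`; the cast implies this, but the declaration is outside this hunk. `no_room_for_decimals()` is hypothetical:

```c
#include <string.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Hypothetical reduction of the check in num2str(): is there room for
 * at most one more character after the integer digits?
 */
static bool no_room_for_decimals(int maxlen, const char *digits, bool cast)
{
	/* int - size_t is computed as size_t, so this wraps when digits is long */
	unsigned int decimals = maxlen - strlen(digits);

	if (cast)
		return (int)decimals <= 1;	/* fixed: wrap reads as negative (two's complement) */
	return decimals <= 1;			/* old: wrap reads as "plenty of room" */
}
```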
diff --git a/lib/num2str.h b/lib/num2str.h
new file mode 100644
index 0000000..81358a1
--- /dev/null
+++ b/lib/num2str.h
@@ -0,0 +1,15 @@
+#ifndef FIO_NUM2STR_H
+#define FIO_NUM2STR_H
+
+#include <inttypes.h>
+
+#define N2S_NONE	0
+#define N2S_BITPERSEC	1	/* match unit_base for bit rates */
+#define N2S_PERSEC	2
+#define N2S_BIT		3
+#define N2S_BYTE	4
+#define N2S_BYTEPERSEC	8	/* match unit_base for byte rates */
+
+extern char *num2str(uint64_t, int, int, int, int);
+
+#endif
diff --git a/lib/prio_tree.c b/lib/prio_tree.c
index e18ae32..de3fe1c 100644
--- a/lib/prio_tree.c
+++ b/lib/prio_tree.c
@@ -13,9 +13,12 @@
 
 #include <stdlib.h>
 #include <limits.h>
-#include "../fio.h"
+
+#include "../compiler/compiler.h"
 #include "prio_tree.h"
 
+#define ARRAY_SIZE(x)    (sizeof((x)) / (sizeof((x)[0])))
+
 /*
  * A clever mix of heap and radix trees forms a radix priority search tree (PST)
  * which is useful for storing intervals, e.g, we can consider a vma as a closed

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-04-08 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-08 12:00 UTC
  To: fio

The following changes since commit e07a8a2281c7a5a0ec4eb8e8e66601ca1f9f71bb:

  Fio 2.19 (2017-04-04 08:30:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6c8611c79713fe73fddf7458ab3ab36feaeae67b:

  Split poisson rate control into read/write/trim (2017-04-07 16:04:31 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Split poisson rate control into read/write/trim

 backend.c | 6 +++---
 fio.h     | 6 ++++--
 init.c    | 6 ++++--
 3 files changed, 11 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 38ef348..86e689f 100644
--- a/backend.c
+++ b/backend.c
@@ -811,13 +811,13 @@ static long long usec_for_io(struct thread_data *td, enum fio_ddir ddir)
 		uint64_t val;
 		iops = bps / td->o.bs[ddir];
 		val = (int64_t) (1000000 / iops) *
-				-logf(__rand_0_1(&td->poisson_state));
+				-logf(__rand_0_1(&td->poisson_state[ddir]));
 		if (val) {
 			dprint(FD_RATE, "poisson rate iops=%llu\n",
 					(unsigned long long) 1000000 / val);
 		}
-		td->last_usec += val;
-		return td->last_usec;
+		td->last_usec[ddir] += val;
+		return td->last_usec[ddir];
 	} else if (bps) {
 		secs = bytes / bps;
 		remainder = bytes % bps;
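Before this change, a single `last_usec` and a single `poisson_state` were shared by every data direction. As a result, the gaps drawn for reads also accumulated into the issue time returned for writes. Below is a minimal sketch of the per-direction bookkeeping. To keep the example free of libm, it uses a fixed gap of `1000000 / iops` microseconds instead of fio's exponentially distributed gap. The names `rate_clock`, `next_issue_usec()` and the `SK_*` directions are hypothetical:

```c
#include <stdint.h>
#include <assert.h>

enum sk_ddir { SK_READ, SK_WRITE, SK_TRIM, SK_DDIR_CNT };

/* One virtual clock per direction, like td->last_usec[DDIR_RWDIR_CNT]. */
struct rate_clock {
	int64_t last_usec[SK_DDIR_CNT];
};

/*
 * Advance only the clock of ddir by the mean gap for the given rate and
 * return the time (usec) at which the next I/O of that direction is due.
 */
static int64_t next_issue_usec(struct rate_clock *rc, enum sk_ddir ddir,
			       uint64_t iops)
{
	int64_t gap = (int64_t) (1000000 / iops);

	rc->last_usec[ddir] += gap;
	return rc->last_usec[ddir];
}
```

With separate clocks, two reads at 1000 IOPS followed by one write at 100 IOPS schedule the write at 10000 usec instead of 12000 usec.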
diff --git a/fio.h b/fio.h
index 3955a81..8171b9a 100644
--- a/fio.h
+++ b/fio.h
@@ -100,6 +100,8 @@ enum {
 	FIO_DEDUPE_OFF,
 	FIO_RAND_POISSON_OFF,
 	FIO_RAND_ZONE_OFF,
+	FIO_RAND_POISSON2_OFF,
+	FIO_RAND_POISSON3_OFF,
 	FIO_RAND_NR_OFFS,
 };
 
@@ -281,8 +283,8 @@ struct thread_data {
 	unsigned long rate_blocks[DDIR_RWDIR_CNT];
 	unsigned long long rate_io_issue_bytes[DDIR_RWDIR_CNT];
 	struct timeval lastrate[DDIR_RWDIR_CNT];
-	int64_t last_usec;
-	struct frand_state poisson_state;
+	int64_t last_usec[DDIR_RWDIR_CNT];
+	struct frand_state poisson_state[DDIR_RWDIR_CNT];
 
 	/*
 	 * Enforced rate submission/completion workqueue
diff --git a/init.c b/init.c
index 2f9433b..9aa452d 100644
--- a/init.c
+++ b/init.c
@@ -523,7 +523,7 @@ static int __setup_rate(struct thread_data *td, enum fio_ddir ddir)
 
 	td->rate_next_io_time[ddir] = 0;
 	td->rate_io_issue_bytes[ddir] = 0;
-	td->last_usec = 0;
+	td->last_usec[ddir] = 0;
 	return 0;
 }
 
@@ -933,7 +933,9 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 	init_rand_seed(&td->file_size_state, td->rand_seeds[FIO_RAND_FILE_SIZE_OFF], use64);
 	init_rand_seed(&td->trim_state, td->rand_seeds[FIO_RAND_TRIM_OFF], use64);
 	init_rand_seed(&td->delay_state, td->rand_seeds[FIO_RAND_START_DELAY], use64);
-	init_rand_seed(&td->poisson_state, td->rand_seeds[FIO_RAND_POISSON_OFF], 0);
+	init_rand_seed(&td->poisson_state[0], td->rand_seeds[FIO_RAND_POISSON_OFF], 0);
+	init_rand_seed(&td->poisson_state[1], td->rand_seeds[FIO_RAND_POISSON2_OFF], 0);
+	init_rand_seed(&td->poisson_state[2], td->rand_seeds[FIO_RAND_POISSON3_OFF], 0);
 	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], false);
 	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], false);
 

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-04-05 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-05 12:00 UTC
  To: fio

The following changes since commit 5ba01402d96e113ac451441f45d0c8b4dd281f4d:

  backend: move freeing of td->mutex to main thread (2017-04-03 08:46:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e07a8a2281c7a5a0ec4eb8e8e66601ca1f9f71bb:

  Fio 2.19 (2017-04-04 08:30:59 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.19

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 570e21f..4cc903f 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.18
+DEF_VER=fio-2.19
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index bb2b90b..ffaed8e 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.18">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.19">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-04-04 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-04 12:00 UTC
  To: fio

The following changes since commit 249ad5bfaded90c0431041b4a3816e7371c2d403:

  time: use correct size type for usecs (2017-04-02 16:01:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5ba01402d96e113ac451441f45d0c8b4dd281f4d:

  backend: move freeing of td->mutex to main thread (2017-04-03 08:46:34 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      backend: move freeing of td->mutex to main thread

 backend.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index b61de7c..38ef348 100644
--- a/backend.c
+++ b/backend.c
@@ -1836,9 +1836,6 @@ err:
 	if (o->write_iolog_file)
 		write_iolog_close(td);
 
-	fio_mutex_remove(td->mutex);
-	td->mutex = NULL;
-
 	td_set_runstate(td, TD_EXITED);
 
 	/*
@@ -2435,6 +2432,8 @@ int fio_backend(struct sk_out *sk_out)
 			fio_mutex_remove(td->rusage_sem);
 			td->rusage_sem = NULL;
 		}
+		fio_mutex_remove(td->mutex);
+		td->mutex = NULL;
 	}
 
 	free_disk_util();

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-04-03 12:00 Jens Axboe
From: Jens Axboe @ 2017-04-03 12:00 UTC
  To: fio

The following changes since commit 618ee94c319c46c670d29c7cf71538ca2ace13b7:

  Separate io_u from ioengine [3/3] - rename ioengine.h to ioengines.h (2017-03-28 15:14:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 249ad5bfaded90c0431041b4a3816e7371c2d403:

  time: use correct size type for usecs (2017-04-02 16:01:18 -0600)

----------------------------------------------------------------
Chris Taylor (1):
      time: fix overflow in timeval_add_msec()

Jens Axboe (1):
      time: use correct size type for usecs

 time.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/time.c b/time.c
index f5dc049..279ee48 100644
--- a/time.c
+++ b/time.c
@@ -8,8 +8,16 @@ static unsigned long ns_granularity;
 
 void timeval_add_msec(struct timeval *tv, unsigned int msec)
 {
-	tv->tv_usec += 1000 * msec;
-	if (tv->tv_usec >= 1000000) {
+	unsigned long adj_usec = 1000 * msec;
+
+	tv->tv_usec += adj_usec;
+	if (adj_usec >= 1000000) {
+		unsigned long adj_sec = adj_usec / 1000000;
+
+		tv->tv_usec -=  adj_sec * 1000000;
+		tv->tv_sec += adj_sec;
+	}
+	if (tv->tv_usec >= 1000000){
 		tv->tv_usec -= 1000000;
 		tv->tv_sec++;
 	}
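The old `timeval_add_msec()` carried at most one second out of `tv_usec`. Adding 1000 ms or more therefore left `tv_usec` at or above 1000000. Below is a minimal sketch of the corrected normalization on a plain `struct timeval`. Unlike the patch, it forms the microsecond count as `1000UL * msec`, so the multiplication itself happens in `unsigned long`. In the patch, `1000 * msec` is evaluated in `unsigned int` before being widened. This only helps where `unsigned long` is 64 bits. The name `tv_add_msec()` is hypothetical:

```c
#include <sys/time.h>
#include <assert.h>

static void tv_add_msec(struct timeval *tv, unsigned int msec)
{
	unsigned long adj_usec = 1000UL * msec;

	tv->tv_usec += adj_usec;
	if (adj_usec >= 1000000) {
		/* move every whole second of the adjustment into tv_sec */
		unsigned long adj_sec = adj_usec / 1000000;

		tv->tv_usec -= adj_sec * 1000000;
		tv->tv_sec += adj_sec;
	}
	/* at most one more carry can remain from the original tv_usec */
	if (tv->tv_usec >= 1000000) {
		tv->tv_usec -= 1000000;
		tv->tv_sec++;
	}
}
```

For example, adding 2500 ms to 1.999000 s yields 4.499000 s: two seconds come from the adjustment, and one more comes from the final carry.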

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-03-29 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-29 12:00 UTC
  To: fio

The following changes since commit f678f8d2aa7f6972b18e368fe42f7bc48134e66c:

  configure: add a --disable-rdma flag to control rdma deps (2017-03-21 07:20:32 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 618ee94c319c46c670d29c7cf71538ca2ace13b7:

  Separate io_u from ioengine [3/3] - rename ioengine.h to ioengines.h (2017-03-28 15:14:20 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (12):
      Fix return value of make_filename() when no filename_format
      Test malloc result when allocation size is tunable
      Don't malloc more than necessary on extending/prereading file
      HOWTO: Mention niche detail of range format options
      Drop redundant "ignore invalidate option" message from 21c1b29e
      Ignore pre-read for FIO_NOIO td
      Don't proceed with error set when failed to pre-read files/devices
      Ignore pre-read for character devices
      Drop prototype of unused function td_io_sync()
      Separate io_u from ioengine [1/3] - add io_u.h
      Separate io_u from ioengine [2/3] - move io_u functions
      Separate io_u from ioengine [3/3] - rename ioengine.h to ioengines.h

 HOWTO                |  2 ++
 filesetup.c          | 35 +++++++++++++++-----
 fio.h                |  3 +-
 init.c               |  7 ++--
 io_u.c               | 58 +++++++++++++++++++++++++++++++++
 ioengine.h => io_u.h | 86 ++-----------------------------------------------
 ioengines.c          | 58 ---------------------------------
 ioengines.h          | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 iolog.h              |  2 +-
 rate-submit.c        |  2 +-
 10 files changed, 185 insertions(+), 158 deletions(-)
 rename ioengine.h => io_u.h (52%)
 create mode 100644 ioengines.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index cae95b7..80b9e75 100644
--- a/HOWTO
+++ b/HOWTO
@@ -543,6 +543,8 @@ Parameter types
 
 	If the option accepts an upper and lower range, use a colon ':' or
 	minus '-' to separate such values. See :ref:`irange <irange>`.
+	If the lower value specified happens to be larger than the upper value,
+	two values are swapped.
 
 .. _bool:
 
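The rule added to the HOWTO means a range written as `8:4` is treated the same as `4:8`. Below is a minimal sketch of that normalization. It assumes plain unsigned integers without unit suffixes. `parse_range()` is a hypothetical helper, not fio's option parser:

```c
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>

/* Parse "lo:hi" or "lo-hi"; swap the bounds if they were given in reverse. */
static bool parse_range(const char *s, unsigned long long *lo,
			unsigned long long *hi)
{
	char sep;

	if (sscanf(s, "%llu%c%llu", lo, &sep, hi) != 3)
		return false;
	if (sep != ':' && sep != '-')
		return false;
	if (*lo > *hi) {
		unsigned long long tmp = *lo;

		*lo = *hi;
		*hi = tmp;
	}
	return true;
}
```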
diff --git a/filesetup.c b/filesetup.c
index bcf95bd..612e794 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -159,11 +159,18 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
-	b = malloc(td->o.max_bs[DDIR_WRITE]);
-
 	left = f->real_file_size;
+	bs = td->o.max_bs[DDIR_WRITE];
+	if (bs > left)
+		bs = left;
+
+	b = malloc(bs);
+	if (!b) {
+		td_verror(td, errno, "malloc");
+		goto err;
+	}
+
 	while (left && !td->terminate) {
-		bs = td->o.max_bs[DDIR_WRITE];
 		if (bs > left)
 			bs = left;
 
@@ -228,7 +235,11 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 	unsigned int bs;
 	char *b;
 
-	if (td_ioengine_flagged(td, FIO_PIPEIO))
+	if (td_ioengine_flagged(td, FIO_PIPEIO) ||
+	    td_ioengine_flagged(td, FIO_NOIO))
+		return 0;
+
+	if (f->filetype == FIO_TYPE_CHAR)
 		return 0;
 
 	if (!fio_file_open(f)) {
@@ -241,8 +252,17 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 
 	old_runstate = td_bump_runstate(td, TD_PRE_READING);
 
+	left = f->io_size;
 	bs = td->o.max_bs[DDIR_READ];
+	if (bs > left)
+		bs = left;
+
 	b = malloc(bs);
+	if (!b) {
+		td_verror(td, errno, "malloc");
+		ret = 1;
+		goto error;
+	}
 	memset(b, 0, bs);
 
 	if (lseek(f->fd, f->file_offset, SEEK_SET) < 0) {
@@ -252,8 +272,6 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 		goto error;
 	}
 
-	left = f->io_size;
-
 	while (left && !td->terminate) {
 		if (bs > left)
 			bs = left;
@@ -1104,10 +1122,11 @@ int pre_read_files(struct thread_data *td)
 	dprint(FD_FILE, "pre_read files\n");
 
 	for_each_file(td, f, i) {
-		pre_read_file(td, f);
+		if (pre_read_file(td, f))
+			return -1;
 	}
 
-	return 1;
+	return 0;
 }
 
 static int __init_rand_distribution(struct thread_data *td, struct fio_file *f)
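The reworked `extend_file()` and `pre_read_file()` size their buffer as the smaller of the maximum block size and the bytes left. They also check the `malloc()` result and report failure to the caller instead of ignoring it. Below is a minimal sketch of that pattern on a stdio stream. `pre_read_sketch()` is hypothetical and stands in for fio's file-descriptor based loop:

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/*
 * Read `left` bytes from fp in chunks of at most max_bs.
 * Returns 0 on success and 1 on failure, like pre_read_file().
 */
static int pre_read_sketch(FILE *fp, size_t max_bs, size_t left)
{
	size_t bs = max_bs;
	char *b;

	if (bs > left)		/* never allocate more than will be read */
		bs = left;
	if (!bs)
		return 0;

	b = malloc(bs);
	if (!b)			/* bs is user-tunable, so this can fail */
		return 1;

	while (left) {
		size_t got;

		if (bs > left)
			bs = left;
		got = fread(b, 1, bs, fp);
		if (!got)	/* EOF or read error before `left` reached 0 */
			break;
		left -= got;
	}

	free(b);
	return left ? 1 : 0;
}
```

A caller would follow the patched `pre_read_files()`: stop at the first failing file and return -1, and return 0 only when every file was read.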
diff --git a/fio.h b/fio.h
index 52a9b75..3955a81 100644
--- a/fio.h
+++ b/fio.h
@@ -25,7 +25,7 @@
 #include "debug.h"
 #include "file.h"
 #include "io_ddir.h"
-#include "ioengine.h"
+#include "ioengines.h"
 #include "iolog.h"
 #include "helpers.h"
 #include "options.h"
@@ -39,6 +39,7 @@
 #include "server.h"
 #include "stat.h"
 #include "flow.h"
+#include "io_u.h"
 #include "io_u_queue.h"
 #include "workqueue.h"
 #include "steadystate.h"
diff --git a/init.c b/init.c
index 4a72255..2f9433b 100644
--- a/init.c
+++ b/init.c
@@ -765,11 +765,8 @@ static int fixup_options(struct thread_data *td)
 	}
 
 	if (o->pre_read) {
-		if (o->invalidate_cache) {
-			log_info("fio: ignore invalidate option for %s\n",
-				 o->name);
+		if (o->invalidate_cache)
 			o->invalidate_cache = 0;
-		}
 		if (td_ioengine_flagged(td, FIO_PIPEIO)) {
 			log_info("fio: cannot pre-read files with an IO engine"
 				 " that isn't seekable. Pre-read disabled.\n");
@@ -1121,7 +1118,7 @@ static char *make_filename(char *buf, size_t buf_size,struct thread_options *o,
 
 	if (!o->filename_format || !strlen(o->filename_format)) {
 		sprintf(buf, "%s.%d.%d", jobname, jobnum, filenum);
-		return NULL;
+		return buf;
 	}
 
 	for (f = &fpre_keywords[0]; f->keyword; f++)
diff --git a/io_u.c b/io_u.c
index c6d814b..363bfe1 100644
--- a/io_u.c
+++ b/io_u.c
@@ -2087,3 +2087,61 @@ void io_u_fill_buffer(struct thread_data *td, struct io_u *io_u,
 	io_u->buf_filled_len = 0;
 	fill_io_buffer(td, io_u->buf, min_write, max_bs);
 }
+
+static int do_sync_file_range(const struct thread_data *td,
+			      struct fio_file *f)
+{
+	off64_t offset, nbytes;
+
+	offset = f->first_write;
+	nbytes = f->last_write - f->first_write;
+
+	if (!nbytes)
+		return 0;
+
+	return sync_file_range(f->fd, offset, nbytes, td->o.sync_file_range);
+}
+
+int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
+{
+	int ret;
+
+	if (io_u->ddir == DDIR_SYNC) {
+		ret = fsync(io_u->file->fd);
+	} else if (io_u->ddir == DDIR_DATASYNC) {
+#ifdef CONFIG_FDATASYNC
+		ret = fdatasync(io_u->file->fd);
+#else
+		ret = io_u->xfer_buflen;
+		io_u->error = EINVAL;
+#endif
+	} else if (io_u->ddir == DDIR_SYNC_FILE_RANGE)
+		ret = do_sync_file_range(td, io_u->file);
+	else {
+		ret = io_u->xfer_buflen;
+		io_u->error = EINVAL;
+	}
+
+	if (ret < 0)
+		io_u->error = errno;
+
+	return ret;
+}
+
+int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
+{
+#ifndef FIO_HAVE_TRIM
+	io_u->error = EINVAL;
+	return 0;
+#else
+	struct fio_file *f = io_u->file;
+	int ret;
+
+	ret = os_trim(f->fd, io_u->offset, io_u->xfer_buflen);
+	if (!ret)
+		return io_u->xfer_buflen;
+
+	io_u->error = ret;
+	return 0;
+#endif
+}
diff --git a/io_u.h b/io_u.h
new file mode 100644
index 0000000..155344d
--- /dev/null
+++ b/io_u.h
@@ -0,0 +1,179 @@
+#ifndef FIO_IO_U
+#define FIO_IO_U
+
+#include "compiler/compiler.h"
+#include "os/os.h"
+#include "log.h"
+#include "io_ddir.h"
+#include "debug.h"
+#include "file.h"
+#include "workqueue.h"
+
+#ifdef CONFIG_LIBAIO
+#include <libaio.h>
+#endif
+#ifdef CONFIG_GUASI
+#include <guasi.h>
+#endif
+
+enum {
+	IO_U_F_FREE		= 1 << 0,
+	IO_U_F_FLIGHT		= 1 << 1,
+	IO_U_F_NO_FILE_PUT	= 1 << 2,
+	IO_U_F_IN_CUR_DEPTH	= 1 << 3,
+	IO_U_F_BUSY_OK		= 1 << 4,
+	IO_U_F_TRIMMED		= 1 << 5,
+	IO_U_F_BARRIER		= 1 << 6,
+	IO_U_F_VER_LIST		= 1 << 7,
+};
+
+/*
+ * The io unit
+ */
+struct io_u {
+	struct timeval start_time;
+	struct timeval issue_time;
+
+	struct fio_file *file;
+	unsigned int flags;
+	enum fio_ddir ddir;
+
+	/*
+	 * For replay workloads, we may want to account as a different
+	 * IO type than what is being submitted.
+	 */
+	enum fio_ddir acct_ddir;
+
+	/*
+	 * Write generation
+	 */
+	unsigned short numberio;
+
+	/*
+	 * Allocated/set buffer and length
+	 */
+	unsigned long buflen;
+	unsigned long long offset;
+	void *buf;
+
+	/*
+	 * Initial seed for generating the buffer contents
+	 */
+	uint64_t rand_seed;
+
+	/*
+	 * IO engine state, may be different from above when we get
+	 * partial transfers / residual data counts
+	 */
+	void *xfer_buf;
+	unsigned long xfer_buflen;
+
+	/*
+	 * Parameter related to pre-filled buffers and
+	 * their size to handle variable block sizes.
+	 */
+	unsigned long buf_filled_len;
+
+	struct io_piece *ipo;
+
+	unsigned int resid;
+	unsigned int error;
+
+	/*
+	 * io engine private data
+	 */
+	union {
+		unsigned int index;
+		unsigned int seen;
+		void *engine_data;
+	};
+
+	union {
+		struct flist_head verify_list;
+		struct workqueue_work work;
+	};
+
+	/*
+	 * Callback for io completion
+	 */
+	int (*end_io)(struct thread_data *, struct io_u **);
+
+	union {
+#ifdef CONFIG_LIBAIO
+		struct iocb iocb;
+#endif
+#ifdef CONFIG_POSIXAIO
+		os_aiocb_t aiocb;
+#endif
+#ifdef FIO_HAVE_SGIO
+		struct sg_io_hdr hdr;
+#endif
+#ifdef CONFIG_GUASI
+		guasi_req_t greq;
+#endif
+#ifdef CONFIG_SOLARISAIO
+		aio_result_t resultp;
+#endif
+#ifdef FIO_HAVE_BINJECT
+		struct b_user_cmd buc;
+#endif
+#ifdef CONFIG_RDMA
+		struct ibv_mr *mr;
+#endif
+		void *mmap_data;
+	};
+};
+
+/*
+ * io unit handling
+ */
+extern struct io_u *__get_io_u(struct thread_data *);
+extern struct io_u *get_io_u(struct thread_data *);
+extern void put_io_u(struct thread_data *, struct io_u *);
+extern void clear_io_u(struct thread_data *, struct io_u *);
+extern void requeue_io_u(struct thread_data *, struct io_u **);
+extern int __must_check io_u_sync_complete(struct thread_data *, struct io_u *);
+extern int __must_check io_u_queued_complete(struct thread_data *, int);
+extern void io_u_queued(struct thread_data *, struct io_u *);
+extern int io_u_quiesce(struct thread_data *);
+extern void io_u_log_error(struct thread_data *, struct io_u *);
+extern void io_u_mark_depth(struct thread_data *, unsigned int);
+extern void fill_io_buffer(struct thread_data *, void *, unsigned int, unsigned int);
+extern void io_u_fill_buffer(struct thread_data *td, struct io_u *, unsigned int, unsigned int);
+void io_u_mark_complete(struct thread_data *, unsigned int);
+void io_u_mark_submit(struct thread_data *, unsigned int);
+bool queue_full(const struct thread_data *);
+
+int do_io_u_sync(const struct thread_data *, struct io_u *);
+int do_io_u_trim(const struct thread_data *, struct io_u *);
+
+#ifdef FIO_INC_DEBUG
+static inline void dprint_io_u(struct io_u *io_u, const char *p)
+{
+	struct fio_file *f = io_u->file;
+
+	dprint(FD_IO, "%s: io_u %p: off=%llu/len=%lu/ddir=%d", p, io_u,
+					(unsigned long long) io_u->offset,
+					io_u->buflen, io_u->ddir);
+	if (f)
+		dprint(FD_IO, "/%s", f->file_name);
+	dprint(FD_IO, "\n");
+}
+#else
+#define dprint_io_u(io_u, p)
+#endif
+
+static inline enum fio_ddir acct_ddir(struct io_u *io_u)
+{
+	if (io_u->acct_ddir != -1)
+		return io_u->acct_ddir;
+
+	return io_u->ddir;
+}
+
+#define io_u_clear(td, io_u, val)	\
+	td_flags_clear((td), &(io_u->flags), (val))
+#define io_u_set(td, io_u, val)		\
+	td_flags_set((td), &(io_u)->flags, (val))
+
+#endif
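Each `io_u` carries a bitmask built from the `IO_U_F_*` values. `io_u_set()` and `io_u_clear()` update it through fio's `td_flags_set()` and `td_flags_clear()`. Below is a minimal single-threaded sketch of the same bit operations. It leaves out whatever synchronization the real helpers apply. `flags_set()`, `flags_clear()` and `flags_test()` are hypothetical stand-ins:

```c
#include <stdbool.h>
#include <assert.h>

/* Subset of the IO_U_F_* values from io_u.h above. */
enum {
	IO_U_F_FREE		= 1 << 0,
	IO_U_F_FLIGHT		= 1 << 1,
	IO_U_F_NO_FILE_PUT	= 1 << 2,
	IO_U_F_IN_CUR_DEPTH	= 1 << 3,
};

/* Turn on every bit in val. */
static void flags_set(unsigned int *flags, unsigned int val)
{
	*flags |= val;
}

/* Turn off every bit in val, leaving the others untouched. */
static void flags_clear(unsigned int *flags, unsigned int val)
{
	*flags &= ~val;
}

/* True if any bit in val is set. */
static bool flags_test(unsigned int flags, unsigned int val)
{
	return (flags & val) != 0;
}
```

Because each flag is a distinct power of two, several flags can be set or cleared in one call by OR-ing them together.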
diff --git a/ioengine.h b/ioengine.h
deleted file mode 100644
index 7249df6..0000000
--- a/ioengine.h
+++ /dev/null
@@ -1,261 +0,0 @@
-#ifndef FIO_IOENGINE_H
-#define FIO_IOENGINE_H
-
-#include "compiler/compiler.h"
-#include "os/os.h"
-#include "log.h"
-#include "io_ddir.h"
-#include "debug.h"
-#include "file.h"
-#include "workqueue.h"
-
-#ifdef CONFIG_LIBAIO
-#include <libaio.h>
-#endif
-#ifdef CONFIG_GUASI
-#include <guasi.h>
-#endif
-
-#define FIO_IOOPS_VERSION	23
-
-enum {
-	IO_U_F_FREE		= 1 << 0,
-	IO_U_F_FLIGHT		= 1 << 1,
-	IO_U_F_NO_FILE_PUT	= 1 << 2,
-	IO_U_F_IN_CUR_DEPTH	= 1 << 3,
-	IO_U_F_BUSY_OK		= 1 << 4,
-	IO_U_F_TRIMMED		= 1 << 5,
-	IO_U_F_BARRIER		= 1 << 6,
-	IO_U_F_VER_LIST		= 1 << 7,
-};
-
-/*
- * The io unit
- */
-struct io_u {
-	struct timeval start_time;
-	struct timeval issue_time;
-
-	struct fio_file *file;
-	unsigned int flags;
-	enum fio_ddir ddir;
-
-	/*
-	 * For replay workloads, we may want to account as a different
-	 * IO type than what is being submitted.
-	 */
-	enum fio_ddir acct_ddir;
-
-	/*
-	 * Write generation
-	 */
-	unsigned short numberio;
-
-	/*
-	 * Allocated/set buffer and length
-	 */
-	unsigned long buflen;
-	unsigned long long offset;
-	void *buf;
-
-	/*
-	 * Initial seed for generating the buffer contents
-	 */
-	uint64_t rand_seed;
-
-	/*
-	 * IO engine state, may be different from above when we get
-	 * partial transfers / residual data counts
-	 */
-	void *xfer_buf;
-	unsigned long xfer_buflen;
-
-	/*
-	 * Parameter related to pre-filled buffers and
-	 * their size to handle variable block sizes.
-	 */
-	unsigned long buf_filled_len;
-
-	struct io_piece *ipo;
-
-	unsigned int resid;
-	unsigned int error;
-
-	/*
-	 * io engine private data
-	 */
-	union {
-		unsigned int index;
-		unsigned int seen;
-		void *engine_data;
-	};
-
-	union {
-		struct flist_head verify_list;
-		struct workqueue_work work;
-	};
-
-	/*
-	 * Callback for io completion
-	 */
-	int (*end_io)(struct thread_data *, struct io_u **);
-
-	union {
-#ifdef CONFIG_LIBAIO
-		struct iocb iocb;
-#endif
-#ifdef CONFIG_POSIXAIO
-		os_aiocb_t aiocb;
-#endif
-#ifdef FIO_HAVE_SGIO
-		struct sg_io_hdr hdr;
-#endif
-#ifdef CONFIG_GUASI
-		guasi_req_t greq;
-#endif
-#ifdef CONFIG_SOLARISAIO
-		aio_result_t resultp;
-#endif
-#ifdef FIO_HAVE_BINJECT
-		struct b_user_cmd buc;
-#endif
-#ifdef CONFIG_RDMA
-		struct ibv_mr *mr;
-#endif
-		void *mmap_data;
-	};
-};
-
-/*
- * io_ops->queue() return values
- */
-enum {
-	FIO_Q_COMPLETED	= 0,		/* completed sync */
-	FIO_Q_QUEUED	= 1,		/* queued, will complete async */
-	FIO_Q_BUSY	= 2,		/* no more room, call ->commit() */
-};
-
-struct ioengine_ops {
-	struct flist_head list;
-	const char *name;
-	int version;
-	int flags;
-	int (*setup)(struct thread_data *);
-	int (*init)(struct thread_data *);
-	int (*prep)(struct thread_data *, struct io_u *);
-	int (*queue)(struct thread_data *, struct io_u *);
-	int (*commit)(struct thread_data *);
-	int (*getevents)(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
-	struct io_u *(*event)(struct thread_data *, int);
-	char *(*errdetails)(struct io_u *);
-	int (*cancel)(struct thread_data *, struct io_u *);
-	void (*cleanup)(struct thread_data *);
-	int (*open_file)(struct thread_data *, struct fio_file *);
-	int (*close_file)(struct thread_data *, struct fio_file *);
-	int (*invalidate)(struct thread_data *, struct fio_file *);
-	int (*unlink_file)(struct thread_data *, struct fio_file *);
-	int (*get_file_size)(struct thread_data *, struct fio_file *);
-	void (*terminate)(struct thread_data *);
-	int (*iomem_alloc)(struct thread_data *, size_t);
-	void (*iomem_free)(struct thread_data *);
-	int (*io_u_init)(struct thread_data *, struct io_u *);
-	void (*io_u_free)(struct thread_data *, struct io_u *);
-	int option_struct_size;
-	struct fio_option *options;
-};
-
-enum fio_ioengine_flags {
-	FIO_SYNCIO	= 1 << 0,	/* io engine has synchronous ->queue */
-	FIO_RAWIO	= 1 << 1,	/* some sort of direct/raw io */
-	FIO_DISKLESSIO	= 1 << 2,	/* no disk involved */
-	FIO_NOEXTEND	= 1 << 3,	/* engine can't extend file */
-	FIO_NODISKUTIL  = 1 << 4,	/* diskutil can't handle filename */
-	FIO_UNIDIR	= 1 << 5,	/* engine is uni-directional */
-	FIO_NOIO	= 1 << 6,	/* thread does only pseudo IO */
-	FIO_PIPEIO	= 1 << 7,	/* input/output no seekable */
-	FIO_BARRIER	= 1 << 8,	/* engine supports barriers */
-	FIO_MEMALIGN	= 1 << 9,	/* engine wants aligned memory */
-	FIO_BIT_BASED	= 1 << 10,	/* engine uses a bit base (e.g. uses Kbit as opposed to KB) */
-	FIO_FAKEIO	= 1 << 11,	/* engine pretends to do IO */
-};
-
-/*
- * External engine defined symbol to fill in the engine ops structure
- */
-typedef void (*get_ioengine_t)(struct ioengine_ops **);
-
-/*
- * io engine entry points
- */
-extern int __must_check td_io_init(struct thread_data *);
-extern int __must_check td_io_prep(struct thread_data *, struct io_u *);
-extern int __must_check td_io_queue(struct thread_data *, struct io_u *);
-extern int __must_check td_io_sync(struct thread_data *, struct fio_file *);
-extern int __must_check td_io_getevents(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
-extern int __must_check td_io_commit(struct thread_data *);
-extern int __must_check td_io_open_file(struct thread_data *, struct fio_file *);
-extern int td_io_close_file(struct thread_data *, struct fio_file *);
-extern int td_io_unlink_file(struct thread_data *, struct fio_file *);
-extern int __must_check td_io_get_file_size(struct thread_data *, struct fio_file *);
-
-extern struct ioengine_ops *load_ioengine(struct thread_data *, const char *);
-extern void register_ioengine(struct ioengine_ops *);
-extern void unregister_ioengine(struct ioengine_ops *);
-extern void free_ioengine(struct thread_data *);
-extern void close_ioengine(struct thread_data *);
-
-extern int fio_show_ioengine_help(const char *engine);
-
-/*
- * io unit handling
- */
-extern struct io_u *__get_io_u(struct thread_data *);
-extern struct io_u *get_io_u(struct thread_data *);
-extern void put_io_u(struct thread_data *, struct io_u *);
-extern void clear_io_u(struct thread_data *, struct io_u *);
-extern void requeue_io_u(struct thread_data *, struct io_u **);
-extern int __must_check io_u_sync_complete(struct thread_data *, struct io_u *);
-extern int __must_check io_u_queued_complete(struct thread_data *, int);
-extern void io_u_queued(struct thread_data *, struct io_u *);
-extern int io_u_quiesce(struct thread_data *);
-extern void io_u_log_error(struct thread_data *, struct io_u *);
-extern void io_u_mark_depth(struct thread_data *, unsigned int);
-extern void fill_io_buffer(struct thread_data *, void *, unsigned int, unsigned int);
-extern void io_u_fill_buffer(struct thread_data *td, struct io_u *, unsigned int, unsigned int);
-void io_u_mark_complete(struct thread_data *, unsigned int);
-void io_u_mark_submit(struct thread_data *, unsigned int);
-bool queue_full(const struct thread_data *);
-
-int do_io_u_sync(const struct thread_data *, struct io_u *);
-int do_io_u_trim(const struct thread_data *, struct io_u *);
-
-#ifdef FIO_INC_DEBUG
-static inline void dprint_io_u(struct io_u *io_u, const char *p)
-{
-	struct fio_file *f = io_u->file;
-
-	dprint(FD_IO, "%s: io_u %p: off=%llu/len=%lu/ddir=%d", p, io_u,
-					(unsigned long long) io_u->offset,
-					io_u->buflen, io_u->ddir);
-	if (f)
-		dprint(FD_IO, "/%s", f->file_name);
-	dprint(FD_IO, "\n");
-}
-#else
-#define dprint_io_u(io_u, p)
-#endif
-
-static inline enum fio_ddir acct_ddir(struct io_u *io_u)
-{
-	if (io_u->acct_ddir != -1)
-		return io_u->acct_ddir;
-
-	return io_u->ddir;
-}
-
-#define io_u_clear(td, io_u, val)	\
-	td_flags_clear((td), &(io_u->flags), (val))
-#define io_u_set(td, io_u, val)		\
-	td_flags_set((td), &(io_u)->flags, (val))
-
-#endif
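The `->queue()` return codes in this header, carried over unchanged into `ioengines.h`, define a small contract between fio and an engine:

- `FIO_Q_COMPLETED` means the I/O finished synchronously.
- `FIO_Q_QUEUED` means it will complete later.
- `FIO_Q_BUSY` means the engine has no room until `->commit()` is called.

Below is a minimal sketch of a caller honoring that contract against a hypothetical fixed-depth engine. `toy_engine`, `toy_queue()`, `toy_commit()` and `submit_all()` are illustrative and are not fio's submission path. The enum values are copied from the header:

```c
#include <stdbool.h>
#include <assert.h>

enum {
	FIO_Q_COMPLETED	= 0,		/* completed sync */
	FIO_Q_QUEUED	= 1,		/* queued, will complete async */
	FIO_Q_BUSY	= 2,		/* no more room, call ->commit() */
};

/* Hypothetical async engine with room for `depth` uncommitted requests. */
struct toy_engine {
	unsigned int depth;		/* must be > 0 */
	unsigned int queued;		/* accepted but not yet committed */
	unsigned int committed;		/* handed to the "device" */
};

static int toy_queue(struct toy_engine *e, bool sync)
{
	if (sync)
		return FIO_Q_COMPLETED;
	if (e->queued == e->depth)
		return FIO_Q_BUSY;
	e->queued++;
	return FIO_Q_QUEUED;
}

static void toy_commit(struct toy_engine *e)
{
	e->committed += e->queued;
	e->queued = 0;
}

/* Submit n async requests, committing whenever the engine reports BUSY. */
static void submit_all(struct toy_engine *e, unsigned int n)
{
	while (n) {
		if (toy_queue(e, false) == FIO_Q_BUSY) {
			toy_commit(e);
			continue;	/* retry the same request */
		}
		n--;
	}
	toy_commit(e);			/* flush the final partial batch */
}
```

With a depth of 4, submitting 10 requests commits in batches of 4, 4 and 2.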
diff --git a/ioengines.c b/ioengines.c
index c773f2e..c90a2ca 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -556,64 +556,6 @@ int td_io_get_file_size(struct thread_data *td, struct fio_file *f)
 	return td->io_ops->get_file_size(td, f);
 }
 
-static int do_sync_file_range(const struct thread_data *td,
-			      struct fio_file *f)
-{
-	off64_t offset, nbytes;
-
-	offset = f->first_write;
-	nbytes = f->last_write - f->first_write;
-
-	if (!nbytes)
-		return 0;
-
-	return sync_file_range(f->fd, offset, nbytes, td->o.sync_file_range);
-}
-
-int do_io_u_sync(const struct thread_data *td, struct io_u *io_u)
-{
-	int ret;
-
-	if (io_u->ddir == DDIR_SYNC) {
-		ret = fsync(io_u->file->fd);
-	} else if (io_u->ddir == DDIR_DATASYNC) {
-#ifdef CONFIG_FDATASYNC
-		ret = fdatasync(io_u->file->fd);
-#else
-		ret = io_u->xfer_buflen;
-		io_u->error = EINVAL;
-#endif
-	} else if (io_u->ddir == DDIR_SYNC_FILE_RANGE)
-		ret = do_sync_file_range(td, io_u->file);
-	else {
-		ret = io_u->xfer_buflen;
-		io_u->error = EINVAL;
-	}
-
-	if (ret < 0)
-		io_u->error = errno;
-
-	return ret;
-}
-
-int do_io_u_trim(const struct thread_data *td, struct io_u *io_u)
-{
-#ifndef FIO_HAVE_TRIM
-	io_u->error = EINVAL;
-	return 0;
-#else
-	struct fio_file *f = io_u->file;
-	int ret;
-
-	ret = os_trim(f->fd, io_u->offset, io_u->xfer_buflen);
-	if (!ret)
-		return io_u->xfer_buflen;
-
-	io_u->error = ret;
-	return 0;
-#endif
-}
-
 int fio_show_ioengine_help(const char *engine)
 {
 	struct flist_head *entry;
diff --git a/ioengines.h b/ioengines.h
new file mode 100644
index 0000000..f24f4df
--- /dev/null
+++ b/ioengines.h
@@ -0,0 +1,90 @@
+#ifndef FIO_IOENGINE_H
+#define FIO_IOENGINE_H
+
+#include "compiler/compiler.h"
+#include "os/os.h"
+#include "file.h"
+#include "io_u.h"
+
+#define FIO_IOOPS_VERSION	23
+
+/*
+ * io_ops->queue() return values
+ */
+enum {
+	FIO_Q_COMPLETED	= 0,		/* completed sync */
+	FIO_Q_QUEUED	= 1,		/* queued, will complete async */
+	FIO_Q_BUSY	= 2,		/* no more room, call ->commit() */
+};
+
+struct ioengine_ops {
+	struct flist_head list;
+	const char *name;
+	int version;
+	int flags;
+	int (*setup)(struct thread_data *);
+	int (*init)(struct thread_data *);
+	int (*prep)(struct thread_data *, struct io_u *);
+	int (*queue)(struct thread_data *, struct io_u *);
+	int (*commit)(struct thread_data *);
+	int (*getevents)(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
+	struct io_u *(*event)(struct thread_data *, int);
+	char *(*errdetails)(struct io_u *);
+	int (*cancel)(struct thread_data *, struct io_u *);
+	void (*cleanup)(struct thread_data *);
+	int (*open_file)(struct thread_data *, struct fio_file *);
+	int (*close_file)(struct thread_data *, struct fio_file *);
+	int (*invalidate)(struct thread_data *, struct fio_file *);
+	int (*unlink_file)(struct thread_data *, struct fio_file *);
+	int (*get_file_size)(struct thread_data *, struct fio_file *);
+	void (*terminate)(struct thread_data *);
+	int (*iomem_alloc)(struct thread_data *, size_t);
+	void (*iomem_free)(struct thread_data *);
+	int (*io_u_init)(struct thread_data *, struct io_u *);
+	void (*io_u_free)(struct thread_data *, struct io_u *);
+	int option_struct_size;
+	struct fio_option *options;
+};
+
+enum fio_ioengine_flags {
+	FIO_SYNCIO	= 1 << 0,	/* io engine has synchronous ->queue */
+	FIO_RAWIO	= 1 << 1,	/* some sort of direct/raw io */
+	FIO_DISKLESSIO	= 1 << 2,	/* no disk involved */
+	FIO_NOEXTEND	= 1 << 3,	/* engine can't extend file */
+	FIO_NODISKUTIL  = 1 << 4,	/* diskutil can't handle filename */
+	FIO_UNIDIR	= 1 << 5,	/* engine is uni-directional */
+	FIO_NOIO	= 1 << 6,	/* thread does only pseudo IO */
+	FIO_PIPEIO	= 1 << 7,	/* input/output not seekable */
+	FIO_BARRIER	= 1 << 8,	/* engine supports barriers */
+	FIO_MEMALIGN	= 1 << 9,	/* engine wants aligned memory */
+	FIO_BIT_BASED	= 1 << 10,	/* engine uses a bit base (e.g. uses Kbit as opposed to KB) */
+	FIO_FAKEIO	= 1 << 11,	/* engine pretends to do IO */
+};
+
+/*
+ * External engine defined symbol to fill in the engine ops structure
+ */
+typedef void (*get_ioengine_t)(struct ioengine_ops **);
+
+/*
+ * io engine entry points
+ */
+extern int __must_check td_io_init(struct thread_data *);
+extern int __must_check td_io_prep(struct thread_data *, struct io_u *);
+extern int __must_check td_io_queue(struct thread_data *, struct io_u *);
+extern int __must_check td_io_getevents(struct thread_data *, unsigned int, unsigned int, const struct timespec *);
+extern int __must_check td_io_commit(struct thread_data *);
+extern int __must_check td_io_open_file(struct thread_data *, struct fio_file *);
+extern int td_io_close_file(struct thread_data *, struct fio_file *);
+extern int td_io_unlink_file(struct thread_data *, struct fio_file *);
+extern int __must_check td_io_get_file_size(struct thread_data *, struct fio_file *);
+
+extern struct ioengine_ops *load_ioengine(struct thread_data *, const char *);
+extern void register_ioengine(struct ioengine_ops *);
+extern void unregister_ioengine(struct ioengine_ops *);
+extern void free_ioengine(struct thread_data *);
+extern void close_ioengine(struct thread_data *);
+
+extern int fio_show_ioengine_help(const char *engine);
+
+#endif
diff --git a/iolog.h b/iolog.h
index 37f27bc..0733ad3 100644
--- a/iolog.h
+++ b/iolog.h
@@ -4,7 +4,7 @@
 #include "lib/rbtree.h"
 #include "lib/ieee754.h"
 #include "flist.h"
-#include "ioengine.h"
+#include "ioengines.h"
 
 /*
  * Use for maintaining statistics
diff --git a/rate-submit.c b/rate-submit.c
index 4738dc4..fdbece6 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -5,7 +5,7 @@
  *
  */
 #include "fio.h"
-#include "ioengine.h"
+#include "ioengines.h"
 #include "lib/getrusage.h"
 #include "rate-submit.h"
 


* Recent changes (master)
@ 2017-03-22 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-22 12:00 UTC
  To: fio

The following changes since commit cee3ddfee4d39ec9ba31b7329a343053af057914:

  Merge branch 'wip-fix-bs-title' of https://github.com/liupan1111/fio (2017-03-19 19:49:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f678f8d2aa7f6972b18e368fe42f7bc48134e66c:

  configure: add a --disable-rdma flag to control rdma deps (2017-03-21 07:20:32 -0600)

----------------------------------------------------------------
Mike Frysinger (1):
      configure: add a --disable-rdma flag to control rdma deps

Tomohiro Kusumi (7):
      Replace redundant TD_F_NOIO flag with td->io_ops_init
      Define struct sk_out in server.h (not server.c)
      HOWTO: Mention cpuload= is mandatory for cpuio
      HOWTO: Mention fsync=/fsyncdata= are set to 0 by default
      Fix a comment after f227e2b6
      Test uint,int before division uint/int for the next i/o
      Test fsync/fdatasync/sync_file_range for the next i/o only if should_fsync(td)

 HOWTO       |  8 ++++++--
 configure   |  7 +++++--
 filesetup.c |  9 +++++----
 fio.h       |  4 ++--
 init.c      |  1 +
 io_u.c      | 36 +++++++++++++++---------------------
 ioengines.c | 14 +++++++-------
 libfio.c    |  2 +-
 server.c    | 11 -----------
 server.h    | 11 +++++++++++
 10 files changed, 53 insertions(+), 50 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 5d378f3..cae95b7 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1100,13 +1100,15 @@ I/O type
 	blocks given. For example, if you give 32 as a parameter, fio will sync the
 	file for every 32 writes issued. If fio is using non-buffered I/O, we may
 	not sync the file. The exception is the sg I/O engine, which synchronizes
-	the disk cache anyway.
+	the disk cache anyway. Defaults to 0, which disables periodic fsync
+	after a given number of writes.
 
 .. option:: fdatasync=int
 
 	Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
 	not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
 	:manpage:`fdatasync(2)`, this falls back to using :manpage:`fsync(2)`.
+	Defaults to 0, which disables periodic fdatasync calls.
 
 .. option:: write_barrier=int
 
@@ -1571,6 +1573,7 @@ I/O engine
 		**sync**
 			Basic :manpage:`read(2)` or :manpage:`write(2)`
 			I/O. :manpage:`lseek(2)` is used to position the I/O location.
+			See :option:`fsync` and :option:`fdatasync` for syncing write I/Os.
 
 		**psync**
 			Basic :manpage:`pread(2)` or :manpage:`pwrite(2)` I/O.  Default on
@@ -1748,7 +1751,8 @@ caveat that when used on the command line, they must come after the
 
 .. option:: cpuload=int : [cpuio]
 
-	Attempt to use the specified percentage of CPU cycles.
+	Attempt to use the specified percentage of CPU cycles. This is a mandatory
+	option when using cpuio I/O engine.
 
 .. option:: cpuchunks=int : [cpuio]
 
diff --git a/configure b/configure
index 9335124..f42489b 100755
--- a/configure
+++ b/configure
@@ -166,6 +166,8 @@ for opt do
   ;;
   --disable-numa) disable_numa="yes"
   ;;
+  --disable-rdma) disable_rdma="yes"
+  ;;
   --disable-rbd) disable_rbd="yes"
   ;;
   --disable-rbd-blkin) disable_rbd_blkin="yes"
@@ -204,6 +206,7 @@ if test "$show_help" = "yes" ; then
   echo "--esx                   Configure build options for esx"
   echo "--enable-gfio           Enable building of gtk gfio"
   echo "--disable-numa          Disable libnuma even if found"
+  echo "--disable-rdma          Disable RDMA support even if found"
   echo "--disable-gfapi         Disable gfapi"
   echo "--enable-libhdfs        Enable hdfs support"
   echo "--disable-lex           Disable use of lex/yacc for math"
@@ -692,7 +695,7 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if compile_prog "" "-libverbs" "libverbs" ; then
+if test "$disable_rdma" != "yes" && compile_prog "" "-libverbs" "libverbs" ; then
     libverbs="yes"
     LIBS="-libverbs $LIBS"
 fi
@@ -712,7 +715,7 @@ int main(int argc, char **argv)
   return 0;
 }
 EOF
-if compile_prog "" "-lrdmacm" "rdma"; then
+if test "$disable_rdma" != "yes" && compile_prog "" "-lrdmacm" "rdma"; then
     rdmacm="yes"
     LIBS="-lrdmacm $LIBS"
 fi
diff --git a/filesetup.c b/filesetup.c
index c9f2b5f..bcf95bd 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -932,10 +932,11 @@ int setup_files(struct thread_data *td)
 			}
 
 			/*
-			 * We normally don't come here, but if the result is 0,
-			 * set it to the real file size. This could be size of
-			 * the existing one if it already exists, but otherwise
-			 * will be set to 0. A new file won't be created because
+			 * We normally don't come here for regular files, but
+			 * if the result is 0 for a regular file, set it to the
+			 * real file size. This could be the size of the existing
+			 * one if it already exists, but otherwise will be set
+			 * to 0. A new file won't be created because
 			 * ->io_size + ->file_offset equals ->real_file_size.
 			 */
 			if (!f->io_size) {
diff --git a/fio.h b/fio.h
index b573ac5..52a9b75 100644
--- a/fio.h
+++ b/fio.h
@@ -74,7 +74,7 @@ enum {
 	TD_F_VER_NONE		= 1U << 5,
 	TD_F_PROFILE_OPS	= 1U << 6,
 	TD_F_COMPRESS		= 1U << 7,
-	TD_F_NOIO		= 1U << 8,
+	TD_F_RESERVED		= 1U << 8, /* not used */
 	TD_F_COMPRESS_LOG	= 1U << 9,
 	TD_F_VSTATE_SAVED	= 1U << 10,
 	TD_F_NEED_LOCK		= 1U << 11,
@@ -121,7 +121,6 @@ enum {
  * Per-thread/process specific data. Only used for the network client
  * for now.
  */
-struct sk_out;
 void sk_out_assign(struct sk_out *);
 void sk_out_drop(void);
 
@@ -231,6 +230,7 @@ struct thread_data {
 	 * to any of the available IO engines.
 	 */
 	struct ioengine_ops *io_ops;
+	int io_ops_init;
 
 	/*
 	 * IO engine private data and dlhandle.
diff --git a/init.c b/init.c
index b4b0974..4a72255 100644
--- a/init.c
+++ b/init.c
@@ -459,6 +459,7 @@ static struct thread_data *get_new_job(bool global, struct thread_data *parent,
 		copy_opt_list(td, parent);
 
 	td->io_ops = NULL;
+	td->io_ops_init = 0;
 	if (!preserve_eo)
 		td->eo = NULL;
 
diff --git a/io_u.c b/io_u.c
index 5f01c1b..c6d814b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -717,28 +717,22 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 	enum fio_ddir ddir;
 
 	/*
-	 * see if it's time to fsync
+	 * See if it's time to fsync/fdatasync/sync_file_range first,
+	 * and if not then move on to check regular I/Os.
 	 */
-	if (td->o.fsync_blocks &&
-	   !(td->io_issues[DDIR_WRITE] % td->o.fsync_blocks) &&
-	     td->io_issues[DDIR_WRITE] && should_fsync(td))
-		return DDIR_SYNC;
-
-	/*
-	 * see if it's time to fdatasync
-	 */
-	if (td->o.fdatasync_blocks &&
-	   !(td->io_issues[DDIR_WRITE] % td->o.fdatasync_blocks) &&
-	     td->io_issues[DDIR_WRITE] && should_fsync(td))
-		return DDIR_DATASYNC;
-
-	/*
-	 * see if it's time to sync_file_range
-	 */
-	if (td->sync_file_range_nr &&
-	   !(td->io_issues[DDIR_WRITE] % td->sync_file_range_nr) &&
-	     td->io_issues[DDIR_WRITE] && should_fsync(td))
-		return DDIR_SYNC_FILE_RANGE;
+	if (should_fsync(td)) {
+		if (td->o.fsync_blocks && td->io_issues[DDIR_WRITE] &&
+		    !(td->io_issues[DDIR_WRITE] % td->o.fsync_blocks))
+			return DDIR_SYNC;
+
+		if (td->o.fdatasync_blocks && td->io_issues[DDIR_WRITE] &&
+		    !(td->io_issues[DDIR_WRITE] % td->o.fdatasync_blocks))
+			return DDIR_DATASYNC;
+
+		if (td->sync_file_range_nr && td->io_issues[DDIR_WRITE] &&
+		    !(td->io_issues[DDIR_WRITE] % td->sync_file_range_nr))
+			return DDIR_SYNC_FILE_RANGE;
+	}
 
 	if (td_rw(td)) {
 		/*
diff --git a/ioengines.c b/ioengines.c
index 95013d1..c773f2e 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -368,17 +368,17 @@ int td_io_init(struct thread_data *td)
 
 	if (td->io_ops->init) {
 		ret = td->io_ops->init(td);
-		if (ret && td->o.iodepth > 1) {
-			log_err("fio: io engine init failed. Perhaps try"
-				" reducing io depth?\n");
-		}
+		if (ret)
+			log_err("fio: io engine %s init failed.%s\n",
+				td->io_ops->name,
+				td->o.iodepth > 1 ?
+				" Perhaps try reducing io depth?" : "");
+		else
+			td->io_ops_init = 1;
 		if (!td->error)
 			td->error = ret;
 	}
 
-	if (!ret && td_ioengine_flagged(td, FIO_NOIO))
-		td->flags |= TD_F_NOIO;
-
 	return ret;
 }
 
diff --git a/libfio.c b/libfio.c
index 4b53c92..8310708 100644
--- a/libfio.c
+++ b/libfio.c
@@ -276,7 +276,7 @@ int fio_running_or_pending_io_threads(void)
 	int nr_io_threads = 0;
 
 	for_each_td(td, i) {
-		if (td->flags & TD_F_NOIO)
+		if (td->io_ops_init && td_ioengine_flagged(td, FIO_NOIO))
 			continue;
 		nr_io_threads++;
 		if (td->runstate < TD_EXITED)
diff --git a/server.c b/server.c
index 6d5d4ea..1b3bc30 100644
--- a/server.c
+++ b/server.c
@@ -50,17 +50,6 @@ struct sk_entry {
 	struct flist_head next;	/* Other sk_entry's, if linked command */
 };
 
-struct sk_out {
-	unsigned int refs;	/* frees sk_out when it drops to zero.
-				 * protected by below ->lock */
-
-	int sk;			/* socket fd to talk to client */
-	struct fio_mutex lock;	/* protects ref and below list */
-	struct flist_head list;	/* list of pending transmit work */
-	struct fio_mutex wait;	/* wake backend when items added to list */
-	struct fio_mutex xmit;	/* held while sending data */
-};
-
 static char *fio_server_arg;
 static char *bind_sock;
 static struct sockaddr_in saddr_in;
diff --git a/server.h b/server.h
index 3a1d0b0..798d5a8 100644
--- a/server.h
+++ b/server.h
@@ -12,6 +12,17 @@
 
 #define FIO_NET_PORT 8765
 
+struct sk_out {
+	unsigned int refs;	/* frees sk_out when it drops to zero.
+				 * protected by below ->lock */
+
+	int sk;			/* socket fd to talk to client */
+	struct fio_mutex lock;	/* protects ref and below list */
+	struct flist_head list;	/* list of pending transmit work */
+	struct fio_mutex wait;	/* wake backend when items added to list */
+	struct fio_mutex xmit;	/* held while sending data */
+};
+
 /*
  * On-wire encoding is little endian
  */


* Recent changes (master)
@ 2017-03-20 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-20 12:00 UTC
  To: fio

The following changes since commit d9bb03d475e918e08c38bd882032ff788daa297f:

  is_power_of_2() should return bool (2017-03-17 10:39:42 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to cee3ddfee4d39ec9ba31b7329a343053af057914:

  Merge branch 'wip-fix-bs-title' of https://github.com/liupan1111/fio (2017-03-19 19:49:04 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'wip-fix-bs-title' of https://github.com/liupan1111/fio

Pan Liu (1):
      make the bs info output clearer.

 init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 54fdb92..b4b0974 100644
--- a/init.c
+++ b/init.c
@@ -1534,10 +1534,10 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 							ddir_str(o->td_ddir));
 
 				if (o->bs_is_seq_rand)
-					log_info("bs=%s-%s,%s-%s, bs_is_seq_rand, ",
+					log_info("bs=(R) %s-%s, (W) %s-%s, bs_is_seq_rand, ",
 							c1, c2, c3, c4);
 				else
-					log_info("bs=%s-%s,%s-%s,%s-%s, ",
+					log_info("bs=(R) %s-%s, (W) %s-%s, (T) %s-%s, ",
 							c1, c2, c3, c4, c5, c6);
 
 				log_info("ioengine=%s, iodepth=%u\n",


* Recent changes (master)
@ 2017-03-18 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-18 12:00 UTC
  To: fio

The following changes since commit 8243be59aa35aa016fcbeee99353b08376953911:

  Add 'stats' option (2017-03-16 14:43:37 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d9bb03d475e918e08c38bd882032ff788daa297f:

  is_power_of_2() should return bool (2017-03-17 10:39:42 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      io_u: we don't need to set power_2 to false
      is_power_of_2() should return bool

Pan Liu (1):
      fixed the error=invalid argument when the lower bound of bsrange is not power of 2.

 io_u.c     | 7 +++++--
 lib/pow2.h | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index f6efae0..5f01c1b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -533,6 +533,7 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 	unsigned int buflen = 0;
 	unsigned int minbs, maxbs;
 	uint64_t frand_max, r;
+	bool power_2;
 
 	assert(ddir_rw(ddir));
 
@@ -577,9 +578,11 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 			}
 		}
 
-		if (!td->o.bs_unaligned && is_power_of_2(minbs))
+		power_2 = is_power_of_2(minbs);
+		if (!td->o.bs_unaligned && power_2)
 			buflen &= ~(minbs - 1);
-
+		else if (!td->o.bs_unaligned && !power_2)
+			buflen -= buflen % minbs;
 	} while (!io_u_fits(td, io_u, buflen));
 
 	return buflen;
diff --git a/lib/pow2.h b/lib/pow2.h
index f3ca4d7..2cbca1a 100644
--- a/lib/pow2.h
+++ b/lib/pow2.h
@@ -2,8 +2,9 @@
 #define FIO_POW2_H
 
 #include <inttypes.h>
+#include "types.h"
 
-static inline int is_power_of_2(uint64_t val)
+static inline bool is_power_of_2(uint64_t val)
 {
 	return (val != 0 && ((val & (val - 1)) == 0));
 }


* Recent changes (master)
@ 2017-03-17 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-17 12:00 UTC
  To: fio

The following changes since commit 06eac6b2318da7759a055c4a3ac01c2c1e8aa764:

  configure: add generic pshared mutex test (2017-03-14 10:52:42 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8243be59aa35aa016fcbeee99353b08376953911:

  Add 'stats' option (2017-03-16 14:43:37 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      filesetup: remove bogus message on forcing file size
      Add 'stats' option

 HOWTO            |  6 ++++++
 cconv.c          |  2 ++
 filesetup.c      |  3 ---
 fio.1            |  4 ++++
 io_u.c           |  5 ++++-
 options.c        | 10 ++++++++++
 stat.c           |  6 ++++++
 thread_options.h |  3 ++-
 8 files changed, 34 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e376ea5..5d378f3 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2593,6 +2593,12 @@ Measurements and reporting
 	all jobs in a file will be part of the same reporting group, unless
 	separated by a :option:`stonewall`.
 
+.. option:: stats
+
+	By default, fio collects and shows final output results for all jobs
+	that run. If this option is set to 0, fio leaves this job out of
+	the final stat output.
+
 .. option:: write_bw_log=str
 
 	If given, write a bandwidth log for this job. Can be used to store data of
diff --git a/cconv.c b/cconv.c
index b329bf4..886140d 100644
--- a/cconv.c
+++ b/cconv.c
@@ -242,6 +242,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->ioprio_class = le32_to_cpu(top->ioprio_class);
 	o->file_service_type = le32_to_cpu(top->file_service_type);
 	o->group_reporting = le32_to_cpu(top->group_reporting);
+	o->stats = le32_to_cpu(top->stats);
 	o->fadvise_hint = le32_to_cpu(top->fadvise_hint);
 	o->fallocate_mode = le32_to_cpu(top->fallocate_mode);
 	o->zero_buffers = le32_to_cpu(top->zero_buffers);
@@ -426,6 +427,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->ioprio_class = cpu_to_le32(o->ioprio_class);
 	top->file_service_type = cpu_to_le32(o->file_service_type);
 	top->group_reporting = cpu_to_le32(o->group_reporting);
+	top->stats = cpu_to_le32(o->stats);
 	top->fadvise_hint = cpu_to_le32(o->fadvise_hint);
 	top->fallocate_mode = cpu_to_le32(o->fallocate_mode);
 	top->zero_buffers = cpu_to_le32(o->zero_buffers);
diff --git a/filesetup.c b/filesetup.c
index f2e47b1..c9f2b5f 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -942,9 +942,6 @@ int setup_files(struct thread_data *td)
 				if (f->file_offset > f->real_file_size)
 					goto err_offset;
 				f->io_size = f->real_file_size - f->file_offset;
-				log_info("fio: forcing file %s size to %llu\n",
-					f->file_name,
-					(unsigned long long)f->io_size);
 				if (!f->io_size)
 					log_info("fio: file %s may be ignored\n",
 						f->file_name);
diff --git a/fio.1 b/fio.1
index 3348513..b59025d 100644
--- a/fio.1
+++ b/fio.1
@@ -1548,6 +1548,10 @@ Wait for preceding jobs in the job file to exit before starting this one.
 Start a new reporting group.  If not given, all jobs in a file will be part
 of the same reporting group, unless separated by a stonewall.
 .TP
+.BI stats \fR=\fPbool
+By default, fio collects and shows final output results for all jobs that run.
+If this option is set to 0, fio leaves this job out of the final stat output.
+.TP
 .BI numjobs \fR=\fPint
 Number of clones (processes/threads performing the same workload) of this job.
 Default: 1.
diff --git a/io_u.c b/io_u.c
index cb8fc4a..f6efae0 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1734,6 +1734,9 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	if (td->parent)
 		td = td->parent;
 
+	if (!td->o.stats)
+		return;
+
 	if (no_reduce)
 		lusec = utime_since(&io_u->issue_time, &icd->time);
 
@@ -1994,7 +1997,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
  */
 void io_u_queued(struct thread_data *td, struct io_u *io_u)
 {
-	if (!td->o.disable_slat && ramp_time_over(td)) {
+	if (!td->o.disable_slat && ramp_time_over(td) && td->o.stats) {
 		unsigned long slat_time;
 
 		slat_time = utime_since(&io_u->start_time, &io_u->issue_time);
diff --git a/options.c b/options.c
index dcf0eea..e0deab0 100644
--- a/options.c
+++ b/options.c
@@ -3866,6 +3866,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "stats",
+		.lname	= "Stats",
+		.type	= FIO_OPT_BOOL,
+		.off1	= offsetof(struct thread_options, stats),
+		.help	= "Enable collection of stats",
+		.def	= "1",
+		.category = FIO_OPT_C_STAT,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "zero_buffers",
 		.lname	= "Zero I/O buffers",
 		.type	= FIO_OPT_STR_SET,
diff --git a/stat.c b/stat.c
index 0bb21d0..fde7af2 100644
--- a/stat.c
+++ b/stat.c
@@ -1582,6 +1582,8 @@ void __show_run_stats(void)
 		}
 		if (last_ts == td->groupid)
 			continue;
+		if (!td->o.stats)
+			continue;
 
 		last_ts = td->groupid;
 		nr_ts++;
@@ -1599,6 +1601,8 @@ void __show_run_stats(void)
 	last_ts = -1;
 	idx = 0;
 	for_each_td(td, i) {
+		if (!td->o.stats)
+			continue;
 		if (idx && (!td->o.group_reporting ||
 		    (td->o.group_reporting && last_ts != td->groupid))) {
 			idx = 0;
@@ -2569,6 +2573,8 @@ int calc_log_samples(void)
 	fio_gettime(&now, NULL);
 
 	for_each_td(td, i) {
+		if (!td->o.stats)
+			continue;
 		if (in_ramp_time(td) ||
 		    !(td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING)) {
 			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
diff --git a/thread_options.h b/thread_options.h
index 5e72867..2b2df33 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -206,6 +206,7 @@ struct thread_options {
 	unsigned int ioprio_class;
 	unsigned int file_service_type;
 	unsigned int group_reporting;
+	unsigned int stats;
 	unsigned int fadvise_hint;
 	unsigned int fadvise_stream;
 	enum fio_fallocate_mode fallocate_mode;
@@ -475,6 +476,7 @@ struct thread_options_pack {
 	uint32_t ioprio_class;
 	uint32_t file_service_type;
 	uint32_t group_reporting;
+	uint32_t stats;
 	uint32_t fadvise_hint;
 	uint32_t fadvise_stream;
 	uint32_t fallocate_mode;
@@ -502,7 +504,6 @@ struct thread_options_pack {
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
 	uint32_t percentile_precision;
-	uint32_t padding;	/* REMOVE ME when possible to maintain alignment */
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];


* Recent changes (master)
@ 2017-03-15 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-15 12:00 UTC
  To: fio

The following changes since commit 1df28a3960734e1e00cb2e5fe0e261fcba30f7c7:

  Conditionally enable FIO_HAVE_PSHARED_MUTEX on FreeBSD (2017-03-13 12:54:18 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 06eac6b2318da7759a055c4a3ac01c2c1e8aa764:

  configure: add generic pshared mutex test (2017-03-14 10:52:42 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      configure: add generic pshared mutex test

 README          |  8 ++++----
 configure       | 31 +++++++++++++++++++++++++++++++
 init.c          |  2 +-
 mutex.c         |  6 +++---
 os/os-aix.h     |  2 --
 os/os-android.h |  1 -
 os/os-freebsd.h |  4 ----
 os/os-hpux.h    |  1 -
 os/os-linux.h   |  1 -
 os/os-solaris.h |  1 -
 10 files changed, 39 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index 9493c2a..951550b 100644
--- a/README
+++ b/README
@@ -205,10 +205,10 @@ implemented, I'd be happy to take patches for that. An example of that is disk
 utility statistics and (I think) huge page support, support for that does exist
 in FreeBSD/Solaris.
 
-Fio uses pthread mutexes for signalling and locking and FreeBSD does not
-support process shared pthread mutexes. As a result, only threads are
-supported on FreeBSD. This could be fixed with sysv ipc locking or
-other locking alternatives.
+Fio uses pthread mutexes for signalling and locking, and some platforms do not
+support process shared pthread mutexes. As a result, on such platforms only
+threads are supported. This could be fixed with sysv ipc locking or other
+locking alternatives.
 
 Other \*BSD platforms are untested, but fio should work there almost out of the
 box. Since I don't do test runs or even compiles on those platforms, your
diff --git a/configure b/configure
index a7610b1..9335124 100755
--- a/configure
+++ b/configure
@@ -605,6 +605,34 @@ fi
 echo "POSIX AIO fsync               $posix_aio_fsync"
 
 ##########################################
+# POSIX pshared attribute probe
+posix_pshared="no"
+cat > $TMPC <<EOF
+#include <unistd.h>
+int main(void)
+{
+#if defined(_POSIX_THREAD_PROCESS_SHARED) && ((_POSIX_THREAD_PROCESS_SHARED + 0) > 0)
+# if defined(__CYGWIN__)
+#  error "_POSIX_THREAD_PROCESS_SHARED is buggy on Cygwin"
+# elif defined(__APPLE__)
+#  include <AvailabilityMacros.h>
+#  include <TargetConditionals.h>
+#  if TARGET_OS_MAC && MAC_OS_X_VERSION_MIN_REQUIRED < 1070
+#   error "_POSIX_THREAD_PROCESS_SHARED is buggy/unsupported prior to OSX 10.7"
+#  endif
+# endif
+#else
+# error "_POSIX_THREAD_PROCESS_SHARED is unsupported"
+#endif
+  return 0;
+}
+EOF
+if compile_prog "" "$LIBS" "posix_pshared" ; then
+  posix_pshared=yes
+fi
+echo "POSIX pshared support         $posix_pshared"
+
+##########################################
 # solaris aio probe
 if test "$solaris_aio" != "yes" ; then
   solaris_aio="no"
@@ -1986,6 +2014,9 @@ fi
 if test "$posix_aio_fsync" = "yes" ; then
   output_sym "CONFIG_POSIXAIO_FSYNC"
 fi
+if test "$posix_pshared" = "yes" ; then
+  output_sym "CONFIG_PSHARED"
+fi
 if test "$linux_fallocate" = "yes" ; then
   output_sym "CONFIG_LINUX_FALLOCATE"
 fi
diff --git a/init.c b/init.c
index 18538de..54fdb92 100644
--- a/init.c
+++ b/init.c
@@ -586,7 +586,7 @@ static int fixup_options(struct thread_data *td)
 	struct thread_options *o = &td->o;
 	int ret = 0;
 
-#ifndef FIO_HAVE_PSHARED_MUTEX
+#ifndef CONFIG_PSHARED
 	if (!o->use_thread) {
 		log_info("fio: this platform does not support process shared"
 			 " mutexes, forcing use of threads. Use the 'thread'"
diff --git a/mutex.c b/mutex.c
index 5e5a064..d8c4825 100644
--- a/mutex.c
+++ b/mutex.c
@@ -47,7 +47,7 @@ int cond_init_pshared(pthread_cond_t *cond)
 		return ret;
 	}
 
-#ifdef FIO_HAVE_PSHARED_MUTEX
+#ifdef CONFIG_PSHARED
 	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
 	if (ret) {
 		log_err("pthread_condattr_setpshared: %s\n", strerror(ret));
@@ -77,7 +77,7 @@ int mutex_init_pshared(pthread_mutex_t *mutex)
 	/*
 	 * Not all platforms support process shared mutexes (FreeBSD)
 	 */
-#ifdef FIO_HAVE_PSHARED_MUTEX
+#ifdef CONFIG_PSHARED
 	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
 	if (ret) {
 		log_err("pthread_mutexattr_setpshared: %s\n", strerror(ret));
@@ -287,7 +287,7 @@ struct fio_rwlock *fio_rwlock_init(void)
 		log_err("pthread_rwlockattr_init: %s\n", strerror(ret));
 		goto err;
 	}
-#ifdef FIO_HAVE_PSHARED_MUTEX
+#ifdef CONFIG_PSHARED
 	ret = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
 	if (ret) {
 		log_err("pthread_rwlockattr_setpshared: %s\n", strerror(ret));
diff --git a/os/os-aix.h b/os/os-aix.h
index bdc190a..e204d6f 100644
--- a/os/os-aix.h
+++ b/os/os-aix.h
@@ -14,8 +14,6 @@
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 
-#define FIO_HAVE_PSHARED_MUTEX
-
 #define OS_MAP_ANON		MAP_ANON
 #define OS_MSG_DONTWAIT		0
 
diff --git a/os/os-android.h b/os/os-android.h
index cdae703..b59fac1 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -27,7 +27,6 @@
 #define FIO_HAVE_ODIRECT
 #define FIO_HAVE_HUGETLB
 #define FIO_HAVE_BLKTRACE
-#define FIO_HAVE_PSHARED_MUTEX
 #define FIO_HAVE_CL_SIZE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_TRIM
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index 3d7dbe6..c7863b5 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -24,10 +24,6 @@
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_SHM_ATTACH_REMOVED
 
-#if _POSIX_THREAD_PROCESS_SHARED > 0
-#define FIO_HAVE_PSHARED_MUTEX
-#endif
-
 #define OS_MAP_ANON		MAP_ANON
 
 #define fio_swap16(x)	bswap16(x)
diff --git a/os/os-hpux.h b/os/os-hpux.h
index 1707ddd..6a240b0 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -22,7 +22,6 @@
 #define FIO_HAVE_ODIRECT
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
-#define FIO_HAVE_PSHARED_MUTEX
 #define FIO_HAVE_CHARDEV_SIZE
 
 #define OS_MAP_ANON		MAP_ANONYMOUS
diff --git a/os/os-linux.h b/os/os-linux.h
index 7be833b..7b328dc 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -32,7 +32,6 @@
 #define FIO_HAVE_HUGETLB
 #define FIO_HAVE_RAWBIND
 #define FIO_HAVE_BLKTRACE
-#define FIO_HAVE_PSHARED_MUTEX
 #define FIO_HAVE_CL_SIZE
 #define FIO_HAVE_CGROUPS
 #define FIO_HAVE_FS_STAT
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 73ad84a..8f8f53b 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -16,7 +16,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_CPU_AFFINITY
-#define FIO_HAVE_PSHARED_MUTEX
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_USE_GENERIC_BDEV_SIZE
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE


* Recent changes (master)
@ 2017-03-14 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-14 12:00 UTC
  To: fio

The following changes since commit e9bd687d147d5aee710d56854524bbada5a34650:

  Makefile: make test target use thread (2017-03-12 21:04:45 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1df28a3960734e1e00cb2e5fe0e261fcba30f7c7:

  Conditionally enable FIO_HAVE_PSHARED_MUTEX on FreeBSD (2017-03-13 12:54:18 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      rbd: don't log version to stdout

Tomohiro Kusumi (7):
      Make check_mount_writes() test appropriate device types
      HOWTO: Add note/exception on allow_mounted_write=
      Minor fixup for page cache invalidation debug prints
      Use ENOTSUP if OS doesn't support blkdev page cache invalidation
      Fix errval variable to be positive errno value
      manpage: Add URL links to HOWTO/README
      Conditionally enable FIO_HAVE_PSHARED_MUTEX on FreeBSD

 HOWTO             |  3 ++-
 backend.c         |  8 ++++++++
 engines/rbd.c     |  5 -----
 filesetup.c       | 20 ++++++++++++++------
 fio.1             |  9 ++++++++-
 os/os-aix.h       |  2 +-
 os/os-dragonfly.h |  2 +-
 os/os-freebsd.h   |  6 +++++-
 os/os-hpux.h      |  2 +-
 os/os-mac.h       |  2 +-
 os/os-netbsd.h    |  2 +-
 os/os-openbsd.h   |  2 +-
 os/os-solaris.h   |  2 +-
 os/os-windows.h   |  4 +---
 14 files changed, 45 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c2c6509..e376ea5 100644
--- a/HOWTO
+++ b/HOWTO
@@ -886,7 +886,8 @@ Target file/device
 	If this isn't set, fio will abort jobs that are destructive (e.g. that write)
 	to what appears to be a mounted device or partition. This should help catch
 	creating inadvertently destructive tests, not realizing that the test will
-	destroy data on the mounted file system. Default: false.
+	destroy data on the mounted file system. Note that some platforms don't allow
+	writing against a mounted device regardless of this option. Default: false.
 
 .. option:: pre_read=bool
 
diff --git a/backend.c b/backend.c
index 2e8a994..b61de7c 100644
--- a/backend.c
+++ b/backend.c
@@ -2056,8 +2056,16 @@ static bool check_mount_writes(struct thread_data *td)
 	if (!td_write(td) || td->o.allow_mounted_write)
 		return false;
 
+	/*
+	 * If FIO_HAVE_CHARDEV_SIZE is defined, it's likely that chrdevs
+	 * are mkfs'd and mounted.
+	 */
 	for_each_file(td, f, i) {
+#ifdef FIO_HAVE_CHARDEV_SIZE
+		if (f->filetype != FIO_TYPE_BLOCK && f->filetype != FIO_TYPE_CHAR)
+#else
 		if (f->filetype != FIO_TYPE_BLOCK)
+#endif
 			continue;
 		if (device_is_mounted(f->file_name))
 			goto mounted;
diff --git a/engines/rbd.c b/engines/rbd.c
index 62f0b2e..829e41a 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -566,13 +566,8 @@ static int fio_rbd_setup(struct thread_data *td)
 	rbd_image_info_t info;
 	struct fio_file *f;
 	struct rbd_data *rbd = NULL;
-	int major, minor, extra;
 	int r;
 
-	/* log version of librbd. No cluster connection required. */
-	rbd_version(&major, &minor, &extra);
-	log_info("rbd engine: RBD version: %d.%d.%d\n", major, minor, extra);
-
 	/* allocate engine specific structure to deal with librbd. */
 	r = _fio_setup_rbd_data(td, &rbd);
 	if (r) {
diff --git a/filesetup.c b/filesetup.c
index 4d0b127..f2e47b1 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -442,20 +442,22 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 	if (len == -1ULL || off == -1ULL)
 		return 0;
 
-	dprint(FD_IO, "invalidate cache %s: %llu/%llu\n", f->file_name, off,
-								len);
-
 	if (td->io_ops->invalidate) {
+		dprint(FD_IO, "invalidate %s cache %s\n", td->io_ops->name,
+			f->file_name);
 		ret = td->io_ops->invalidate(td, f);
 		if (ret < 0)
-			errval = ret;
+			errval = -ret;
 	} else if (f->filetype == FIO_TYPE_FILE) {
+		dprint(FD_IO, "declare unneeded cache %s: %llu/%llu\n",
+			f->file_name, off, len);
 		ret = posix_fadvise(f->fd, off, len, POSIX_FADV_DONTNEED);
 		if (ret)
 			errval = ret;
 	} else if (f->filetype == FIO_TYPE_BLOCK) {
 		int retry_count = 0;
 
+		dprint(FD_IO, "drop page cache %s\n", f->file_name);
 		ret = blockdev_invalidate_cache(f);
 		while (ret < 0 && errno == EAGAIN && retry_count++ < 25) {
 			/*
@@ -477,8 +479,13 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 		}
 		if (ret < 0)
 			errval = errno;
-	} else if (f->filetype == FIO_TYPE_CHAR || f->filetype == FIO_TYPE_PIPE)
+		else if (ret) /* probably not supported */
+			errval = ret;
+	} else if (f->filetype == FIO_TYPE_CHAR ||
+		   f->filetype == FIO_TYPE_PIPE) {
+		dprint(FD_IO, "invalidate not supported %s\n", f->file_name);
 		ret = 0;
+	}
 
 	/*
 	 * Cache flushing isn't a fatal condition, and we know it will
@@ -487,7 +494,8 @@ static int __file_invalidate_cache(struct thread_data *td, struct fio_file *f,
 	 * continue on our way.
 	 */
 	if (errval)
-		log_info("fio: cache invalidation of %s failed: %s\n", f->file_name, strerror(errval));
+		log_info("fio: cache invalidation of %s failed: %s\n",
+			 f->file_name, strerror(errval));
 
 	return 0;
 
diff --git a/fio.1 b/fio.1
index cc68dee..3348513 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "December 2016" "User Manual"
+.TH fio 1 "March 2017" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -2584,3 +2584,10 @@ See \fBREADME\fR.
 For further documentation see \fBHOWTO\fR and \fBREADME\fR.
 .br
 Sample jobfiles are available in the \fBexamples\fR directory.
+.br
+These are typically located under /usr/share/doc/fio.
+
+\fBHOWTO\fR:  http://git.kernel.dk/?p=fio.git;a=blob_plain;f=HOWTO
+.br
+\fBREADME\fR: http://git.kernel.dk/?p=fio.git;a=blob_plain;f=README
+.br
diff --git a/os/os-aix.h b/os/os-aix.h
index 3d67765..bdc190a 100644
--- a/os/os-aix.h
+++ b/os/os-aix.h
@@ -23,7 +23,7 @@
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 97452ca..8a116e6 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -184,7 +184,7 @@ static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index 9d1af3b..3d7dbe6 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -24,6 +24,10 @@
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_SHM_ATTACH_REMOVED
 
+#if _POSIX_THREAD_PROCESS_SHARED > 0
+#define FIO_HAVE_PSHARED_MUTEX
+#endif
+
 #define OS_MAP_ANON		MAP_ANON
 
 #define fio_swap16(x)	bswap16(x)
@@ -82,7 +86,7 @@ static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-hpux.h b/os/os-hpux.h
index 82acd11..1707ddd 100644
--- a/os/os-hpux.h
+++ b/os/os-hpux.h
@@ -44,7 +44,7 @@ typedef struct aiocb64 os_aiocb_t;
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
diff --git a/os/os-mac.h b/os/os-mac.h
index 0903a6f..7de36ea 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -77,7 +77,7 @@ static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 2133d7a..e6ba508 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -54,7 +54,7 @@ static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 3b19483..7def432 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -53,7 +53,7 @@ static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return EINVAL;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-solaris.h b/os/os-solaris.h
index 5b78cc2..73ad84a 100644
--- a/os/os-solaris.h
+++ b/os/os-solaris.h
@@ -61,7 +61,7 @@ static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	return 0;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)
diff --git a/os/os-windows.h b/os/os-windows.h
index 616ad43..0c8c42d 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -152,9 +152,7 @@ static inline int chardev_size(struct fio_file *f, unsigned long long *bytes)
 
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
-	/* There's no way to invalidate the cache in Windows
-	 * so just pretend to succeed */
-	return 0;
+	return ENOTSUP;
 }
 
 static inline unsigned long long os_phys_mem(void)

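After these changes `__file_invalidate_cache()` handles three error conventions. The engine `->invalidate()` hook returns a negative errno. `posix_fadvise()` returns a positive errno. `blockdev_invalidate_cache()` either fails with -1 and sets `errno` (the case the EAGAIN retry loop checks), or, on the platforms patched above, returns `ENOTSUP` directly. Because `strerror()` expects a positive value, each case is folded into a positive `errval`. The sketch below isolates that mapping. `enum inval_src` and `invalidate_errval()` are hypothetical names, not fio code, and the EAGAIN retry loop is left out.

```c
#include <errno.h>

/* Hypothetical tag for where a return value came from (illustrative only). */
enum inval_src {
	INVAL_ENGINE_HOOK, /* ->invalidate(): negative errno on failure */
	INVAL_FADVISE,     /* posix_fadvise(): positive errno on failure */
	INVAL_BLOCKDEV,    /* -1 with errno set, or a positive errno such as ENOTSUP */
};

/* Map a raw return value to a positive errno suitable for strerror();
 * 0 means success. Mirrors the branch structure in the diff above. */
static int invalidate_errval(enum inval_src src, int ret, int saved_errno)
{
	switch (src) {
	case INVAL_ENGINE_HOOK:
		return ret < 0 ? -ret : 0;   /* the "errval = -ret" fix */
	case INVAL_FADVISE:
		return ret;                  /* already a positive errno or 0 */
	case INVAL_BLOCKDEV:
		if (ret < 0)
			return saved_errno;  /* failure reported through errno */
		return ret;                  /* e.g. ENOTSUP: probably not supported */
	}
	return 0;
}
```

Since cache flushing is not fatal, the resulting value is only used for the `log_info()` message, and the function still returns 0.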

* Recent changes (master)
@ 2017-03-13 12:00 Jens Axboe
From: Jens Axboe @ 2017-03-13 12:00 UTC
  To: fio

The following changes since commit ca205a752c3d6ebe7de74a3dfe81808e48a502e3:

  configure: Make Cygwin take regular configure path (2017-03-10 14:43:37 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e9bd687d147d5aee710d56854524bbada5a34650:

  Makefile: make test target use thread (2017-03-12 21:04:45 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Makefile: make test target use thread

Theodore Ts'o (1):
      Only enable arm64 CRC32 acceleration if the required header files are there

 Makefile  | 2 +-
 configure | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 4112410..37150c6 100644
--- a/Makefile
+++ b/Makefile
@@ -449,7 +449,7 @@ doc: tools/plot/fio2gnuplot.1
 	@man -t tools/hist/fiologparser_hist.py.1 | ps2pdf - fiologparser_hist.pdf
 
 test: fio
-	./fio --minimal --ioengine=null --runtime=1s --name=nulltest --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifynulltest --rw=write --verify=crc32c --verify_state_save=0 --size=100M
+	./fio --minimal --thread --ioengine=null --runtime=1s --name=nulltest --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifynulltest --rw=write --verify=crc32c --verify_state_save=0 --size=100M
 
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
diff --git a/configure b/configure
index 7b55711..a7610b1 100755
--- a/configure
+++ b/configure
@@ -1943,6 +1943,10 @@ if test "$march_armv8_a_crc_crypto" != "yes" ; then
 fi
 if test "$cpu" = "arm64" ; then
   cat > $TMPC <<EOF
+#include <sys/auxv.h>
+#include <arm_acle.h>
+#include <arm_neon.h>
+
 int main(void)
 {
   return 0;

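The configure change above enables the arm64 CRC32 acceleration only when `<sys/auxv.h>`, `<arm_acle.h>` and `<arm_neon.h>` can all be included. The following illustrative sketch shows why `<sys/auxv.h>` matters; it is not fio's implementation. A program can ask the Linux kernel at run time whether the CPU advertises the CRC32 instructions via `getauxval(AT_HWCAP)`, and otherwise fall back to a portable bitwise CRC32C. `cpu_has_crc32()` and `crc32c_sw()` are hypothetical names. The hardware path itself, which would use the ACLE intrinsics from `<arm_acle.h>`, is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP, HWCAP_CRC32 on arm64 */
#endif

/* Hypothetical helper: does the running CPU advertise ARMv8 CRC32?
 * Only meaningful on Linux/arm64; elsewhere it reports false. */
static bool cpu_has_crc32(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_CRC32)
	return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
	return false;
#endif
}

/* Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Pass 0 to start; pass a previous result to continue over more data. */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* Shift right; XOR in the polynomial if the low bit was set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

The bitwise loop is slow but easy to check: the CRC32C of the ASCII string "123456789" is the standard check value 0xE3069283.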

* Recent changes (master)
@ 2017-03-11 13:00 Jens Axboe
From: Jens Axboe @ 2017-03-11 13:00 UTC
  To: fio

The following changes since commit ae3a5accfdbe1fbfde6ba4ab583887a7d3d779ac:

  verify: add support for the sha3 variants (2017-03-08 09:13:14 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca205a752c3d6ebe7de74a3dfe81808e48a502e3:

  configure: Make Cygwin take regular configure path (2017-03-10 14:43:37 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (17):
      Add runtime handlers for 97900ebf for FreeBSD/DragonFlyBSD
      HOWTO: Add platforms without fdatasync(2)
      configure: Align help messages
      Avoid irrelevant "offset extend ends" error message for chrdev
      Fix debug print format of file ->file_name
      Fixup for a minor 0 byte file size case
      Explicitly check td_trim(td) to detect open(2) flag
      Drop redundant td_rw(td) tests
      Remove unassigned fio_unused variable
      Drop fio_unused attribute from used variable
      Fix a function name typo in debug print
      Don't set FIO_FILE_extend when create_on_open= option is set
      Minor fixup for "Layint out IO file..." message
      HOWTO: Add some details for invalidate=
      Define struct file_name as a file local structure
      Use union for per file engine private data storage
      configure: Make Cygwin take regular configure path

 HOWTO                    |   8 +-
 backend.c                |   2 +-
 configure                | 320 +++++++++++++++++++++++++++++++++--------------
 engines/glusterfs_sync.c |   2 +-
 engines/pmemblk.c        |  16 +--
 engines/sync.c           |   2 +-
 file.h                   |  17 ++-
 filesetup.c              |  76 ++++++++---
 fio.h                    |   2 +-
 init.c                   |  13 +-
 io_u.c                   |   4 +-
 os/os-dragonfly.h        |  14 ++-
 os/os-freebsd.h          |  15 ++-
 os/os-linux.h            |   5 +
 os/os-openbsd.h          |  12 +-
 os/os.h                  |   7 ++
 rate-submit.c            |   1 -
 verify.c                 |   2 +-
 18 files changed, 359 insertions(+), 159 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 15ed425..c2c6509 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1104,7 +1104,7 @@ I/O type
 .. option:: fdatasync=int
 
 	Like :option:`fsync` but uses :manpage:`fdatasync(2)` to only sync data and
-	not metadata blocks.  In FreeBSD and Windows there is no
+	not metadata blocks.  In Windows, FreeBSD, and DragonFlyBSD there is no
 	:manpage:`fdatasync(2)`, this falls back to using :manpage:`fsync(2)`.
 
 .. option:: write_barrier=int
@@ -1426,7 +1426,9 @@ Buffers and memory
 .. option:: invalidate=bool
 
 	Invalidate the buffer/page cache parts for this file prior to starting
-	I/O. Defaults to true.
+	I/O if the platform and file type support it. Defaults to true.
+	This will be ignored if :option:`pre_read` is also specified for the
+	same job.
 
 .. option:: sync=bool
 
@@ -1513,7 +1515,7 @@ I/O size
 	Fio will divide this size between the available files determined by options
 	such as :option:`nrfiles`, :option:`filename`, unless :option:`filesize` is
 	specified by the job. If the result of division happens to be 0, the size is
-	set to the physical size of the given files or devices.
+	set to the physical size of the given files or devices if they exist.
 	If this option is not specified, fio will use the full size of the given
 	files or devices.  If the files do not exist, size must be given. It is also
 	possible to give size as a percentage between 1 and 100. If ``size=20%`` is
diff --git a/backend.c b/backend.c
index 4bc00e6..2e8a994 100644
--- a/backend.c
+++ b/backend.c
@@ -1693,7 +1693,7 @@ static void *thread_main(void *data)
 
 		prune_io_piece_log(td);
 
-		if (td->o.verify_only && (td_write(td) || td_rw(td)))
+		if (td->o.verify_only && td_write(td))
 			verify_bytes = do_dry_run(td);
 		else {
 			uint64_t bytes_done[DDIR_RWDIR_CNT];
diff --git a/configure b/configure
index 15b87fa..7b55711 100755
--- a/configure
+++ b/configure
@@ -195,21 +195,21 @@ for opt do
 done
 
 if test "$show_help" = "yes" ; then
-  echo "--prefix=              Use this directory as installation prefix"
-  echo "--cpu=                 Specify target CPU if auto-detect fails"
-  echo "--cc=                  Specify compiler to use"
-  echo "--extra-cflags=        Specify extra CFLAGS to pass to compiler"
-  echo "--build-32bit-win      Enable 32-bit build on Windows"
-  echo "--build-static         Build a static fio"
-  echo "--esx                  Configure build options for esx"
-  echo "--enable-gfio          Enable building of gtk gfio"
-  echo "--disable-numa         Disable libnuma even if found"
-  echo "--disable-gfapi        Disable gfapi"
-  echo "--enable-libhdfs       Enable hdfs support"
-  echo "--disable-lex          Disable use of lex/yacc for math"
-  echo "--disable-pmem         Disable pmem based engines even if found"
-  echo "--enable-lex           Enable use of lex/yacc for math"
-  echo "--disable-shm          Disable SHM support"
+  echo "--prefix=               Use this directory as installation prefix"
+  echo "--cpu=                  Specify target CPU if auto-detect fails"
+  echo "--cc=                   Specify compiler to use"
+  echo "--extra-cflags=         Specify extra CFLAGS to pass to compiler"
+  echo "--build-32bit-win       Enable 32-bit build on Windows"
+  echo "--build-static          Build a static fio"
+  echo "--esx                   Configure build options for esx"
+  echo "--enable-gfio           Enable building of gtk gfio"
+  echo "--disable-numa          Disable libnuma even if found"
+  echo "--disable-gfapi         Disable gfapi"
+  echo "--enable-libhdfs        Enable hdfs support"
+  echo "--disable-lex           Disable use of lex/yacc for math"
+  echo "--disable-pmem          Disable pmem based engines even if found"
+  echo "--enable-lex            Enable use of lex/yacc for math"
+  echo "--disable-shm           Disable SHM support"
   echo "--disable-optimizations Don't enable compiler optimizations"
   exit $exit_val
 fi
@@ -288,7 +288,8 @@ SunOS)
   LIBS="-lnsl -lsocket"
   ;;
 CYGWIN*)
-  echo "Forcing known good options on Windows"
+  # We still force some options, so keep this message here.
+  echo "Forcing some known good options on Windows"
   if test -z "$CC" ; then
     if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
       CC="i686-w64-mingw32-gcc"
@@ -306,29 +307,30 @@ CYGWIN*)
       fi
     fi
   fi
-  output_sym "CONFIG_LITTLE_ENDIAN"
   if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
     output_sym "CONFIG_32BIT"
   else
     output_sym "CONFIG_64BIT_LLP64"
   fi
-  output_sym "CONFIG_SOCKLEN_T"
-  output_sym "CONFIG_SFAA"
-  output_sym "CONFIG_RUSAGE_THREAD"
+  # We need this to be output_sym'd here because this is Windows specific.
+  # The regular configure path never sets this config.
   output_sym "CONFIG_WINDOWSAIO"
-  output_sym "CONFIG_FDATASYNC"
-  output_sym "CONFIG_CLOCK_MONOTONIC"
-  output_sym "CONFIG_GETTIMEOFDAY"
-  output_sym "CONFIG_CLOCK_GETTIME"
-  output_sym "CONFIG_SCHED_IDLE"
-  output_sym "CONFIG_TCP_NODELAY"
-  output_sym "CONFIG_TLS_THREAD"
-  output_sym "CONFIG_STATIC_ASSERT"
-  output_sym "CONFIG_IPV6"
+  # We now take the regular configuration path without having exit 0 here.
+  # Flags below are still necessary mostly for MinGW.
+  socklen_t="yes"
+  sfaa="yes"
+  rusage_thread="yes"
+  fdatasync="yes"
+  clock_gettime="yes" # clock_monotonic probe has dependency on this
+  clock_monotonic="yes"
+  gettimeofday="yes"
+  sched_idle="yes"
+  tcp_nodelay="yes"
+  tls_thread="yes"
+  static_assert="yes"
+  ipv6="yes"
   echo "CC=$CC" >> $config_host_mak
   echo "BUILD_CFLAGS=$CFLAGS -I../zlib -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
-
-  exit 0
   ;;
 esac
 
@@ -417,7 +419,9 @@ cc="${CC-${cross_prefix}gcc}"
 ##########################################
 # check cross compile
 
-cross_compile="no"
+if test "$cross_compile" != "yes" ; then
+  cross_compile="no"
+fi
 cat > $TMPC <<EOF
 int main(void)
 {
@@ -432,7 +436,9 @@ fi
 
 ##########################################
 # check endianness
-bigendian="no"
+if test "$bigendian" != "yes" ; then
+  bigendian="no"
+fi
 if test "$cross_compile" = "no" ; then
   cat > $TMPC <<EOF
 #include <inttypes.h>
@@ -503,7 +509,9 @@ echo "Wordsize                      $wordsize"
 
 ##########################################
 # zlib probe
-zlib="no"
+if test "$zlib" != "yes" ; then
+  zlib="no"
+fi
 cat > $TMPC <<EOF
 #include <zlib.h>
 int main(void)
@@ -522,7 +530,9 @@ echo "zlib                          $zlib"
 
 ##########################################
 # linux-aio probe
-libaio="no"
+if test "$libaio" != "yes" ; then
+  libaio="no"
+fi
 if test "$esx" != "yes" ; then
   cat > $TMPC <<EOF
 #include <libaio.h>
@@ -547,8 +557,12 @@ echo "Linux AIO support             $libaio"
 
 ##########################################
 # posix aio probe
-posix_aio="no"
-posix_aio_lrt="no"
+if test "$posix_aio" != "yes" ; then
+  posix_aio="no"
+fi
+if test "$posix_aio_lrt" != "yes" ; then
+  posix_aio_lrt="no"
+fi
 cat > $TMPC <<EOF
 #include <aio.h>
 int main(void)
@@ -570,7 +584,9 @@ echo "POSIX AIO support needs -lrt  $posix_aio_lrt"
 
 ##########################################
 # posix aio fsync probe
-posix_aio_fsync="no"
+if test "$posix_aio_fsync" != "yes" ; then
+  posix_aio_fsync="no"
+fi
 if test "$posix_aio" = "yes" ; then
   cat > $TMPC <<EOF
 #include <fcntl.h>
@@ -590,7 +606,9 @@ echo "POSIX AIO fsync               $posix_aio_fsync"
 
 ##########################################
 # solaris aio probe
-solaris_aio="no"
+if test "$solaris_aio" != "yes" ; then
+  solaris_aio="no"
+fi
 cat > $TMPC <<EOF
 #include <sys/types.h>
 #include <sys/asynch.h>
@@ -610,7 +628,9 @@ echo "Solaris AIO support           $solaris_aio"
 
 ##########################################
 # __sync_fetch_and_add test
-sfaa="no"
+if test "$sfaa" != "yes" ; then
+  sfaa="no"
+fi
 cat > $TMPC << EOF
 #include <inttypes.h>
 static int sfaa(uint64_t *ptr)
@@ -632,7 +652,9 @@ echo "__sync_fetch_and_add          $sfaa"
 
 ##########################################
 # libverbs probe
-libverbs="no"
+if test "$libverbs" != "yes" ; then
+  libverbs="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <infiniband/arch.h>
@@ -650,7 +672,9 @@ echo "libverbs                      $libverbs"
 
 ##########################################
 # rdmacm probe
-rdmacm="no"
+if test "$rdmacm" != "yes" ; then
+  rdmacm="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <rdma/rdma_cma.h>
@@ -668,7 +692,9 @@ echo "rdmacm                        $rdmacm"
 
 ##########################################
 # Linux fallocate probe
-linux_fallocate="no"
+if test "$linux_fallocate" != "yes" ; then
+  linux_fallocate="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <fcntl.h>
@@ -686,7 +712,9 @@ echo "Linux fallocate               $linux_fallocate"
 
 ##########################################
 # POSIX fadvise probe
-posix_fadvise="no"
+if test "$posix_fadvise" != "yes" ; then
+  posix_fadvise="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <fcntl.h>
@@ -703,7 +731,9 @@ echo "POSIX fadvise                 $posix_fadvise"
 
 ##########################################
 # POSIX fallocate probe
-posix_fallocate="no"
+if test "$posix_fallocate" != "yes" ; then
+  posix_fallocate="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <fcntl.h>
@@ -720,8 +750,12 @@ echo "POSIX fallocate               $posix_fallocate"
 
 ##########################################
 # sched_set/getaffinity 2 or 3 argument test
-linux_2arg_affinity="no"
-linux_3arg_affinity="no"
+if test "$linux_2arg_affinity" != "yes" ; then
+  linux_2arg_affinity="no"
+fi
+if test "$linux_3arg_affinity" != "yes" ; then
+  linux_3arg_affinity="no"
+fi
 cat > $TMPC << EOF
 #include <sched.h>
 int main(int argc, char **argv)
@@ -750,7 +784,9 @@ echo "sched_setaffinity(2 arg)      $linux_2arg_affinity"
 
 ##########################################
 # clock_gettime probe
-clock_gettime="no"
+if test "$clock_gettime" != "yes" ; then
+  clock_gettime="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <time.h>
@@ -769,7 +805,9 @@ echo "clock_gettime                 $clock_gettime"
 
 ##########################################
 # CLOCK_MONOTONIC probe
-clock_monotonic="no"
+if test "$clock_monotonic" != "yes" ; then
+  clock_monotonic="no"
+fi
 if test "$clock_gettime" = "yes" ; then
   cat > $TMPC << EOF
 #include <stdio.h>
@@ -787,7 +825,9 @@ echo "CLOCK_MONOTONIC               $clock_monotonic"
 
 ##########################################
 # CLOCK_MONOTONIC_RAW probe
-clock_monotonic_raw="no"
+if test "$clock_monotonic_raw" != "yes" ; then
+  clock_monotonic_raw="no"
+fi
 if test "$clock_gettime" = "yes" ; then
   cat > $TMPC << EOF
 #include <stdio.h>
@@ -805,7 +845,9 @@ echo "CLOCK_MONOTONIC_RAW           $clock_monotonic_raw"
 
 ##########################################
 # CLOCK_MONOTONIC_PRECISE probe
-clock_monotonic_precise="no"
+if test "$clock_monotonic_precise" != "yes" ; then
+  clock_monotonic_precise="no"
+fi
 if test "$clock_gettime" = "yes" ; then
   cat > $TMPC << EOF
 #include <stdio.h>
@@ -823,7 +865,9 @@ echo "CLOCK_MONOTONIC_PRECISE       $clock_monotonic_precise"
 
 ##########################################
 # clockid_t probe
-clockid_t="no"
+if test "$clockid_t" != "yes" ; then
+  clockid_t="no"
+fi
 cat > $TMPC << EOF
 #include <time.h>
 int main(int argc, char **argv)
@@ -840,7 +884,9 @@ echo "clockid_t                     $clockid_t"
 
 ##########################################
 # gettimeofday() probe
-gettimeofday="no"
+if test "$gettimeofday" != "yes" ; then
+  gettimeofday="no"
+fi
 cat > $TMPC << EOF
 #include <sys/time.h>
 #include <stdio.h>
@@ -857,7 +903,9 @@ echo "gettimeofday                  $gettimeofday"
 
 ##########################################
 # fdatasync() probe
-fdatasync="no"
+if test "$fdatasync" != "yes" ; then
+  fdatasync="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <unistd.h>
@@ -873,7 +921,9 @@ echo "fdatasync                     $fdatasync"
 
 ##########################################
 # sync_file_range() probe
-sync_file_range="no"
+if test "$sync_file_range" != "yes" ; then
+  sync_file_range="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <unistd.h>
@@ -893,7 +943,9 @@ echo "sync_file_range               $sync_file_range"
 
 ##########################################
 # ext4 move extent probe
-ext4_me="no"
+if test "$ext4_me" != "yes" ; then
+  ext4_me="no"
+fi
 cat > $TMPC << EOF
 #include <fcntl.h>
 #include <sys/ioctl.h>
@@ -915,7 +967,9 @@ echo "EXT4 move extent              $ext4_me"
 
 ##########################################
 # splice probe
-linux_splice="no"
+if test "$linux_splice" != "yes" ; then
+  linux_splice="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <fcntl.h>
@@ -931,7 +985,9 @@ echo "Linux splice(2)               $linux_splice"
 
 ##########################################
 # GUASI probe
-guasi="no"
+if test "$guasi" != "yes" ; then
+  guasi="no"
+fi
 cat > $TMPC << EOF
 #include <guasi.h>
 #include <guasi_syscalls.h>
@@ -948,7 +1004,9 @@ echo "GUASI                         $guasi"
 
 ##########################################
 # fusion-aw probe
-fusion_aw="no"
+if test "$fusion_aw" != "yes" ; then
+  fusion_aw="no"
+fi
 cat > $TMPC << EOF
 #include <nvm/nvm_primitives.h>
 int main(int argc, char **argv)
@@ -968,7 +1026,9 @@ echo "Fusion-io atomic engine       $fusion_aw"
 
 ##########################################
 # libnuma probe
-libnuma="no"
+if test "$libnuma" != "yes" ; then
+  libnuma="no"
+fi
 cat > $TMPC << EOF
 #include <numa.h>
 int main(int argc, char **argv)
@@ -983,7 +1043,7 @@ fi
 echo "libnuma                       $libnuma"
 
 ##########################################
-# libnuma 2.x version API
+# libnuma 2.x version API, initialize with "no" only if $libnuma is set to "yes"
 if test "$libnuma" = "yes" ; then
 libnuma_v2="no"
 cat > $TMPC << EOF
@@ -1002,7 +1062,9 @@ fi
 
 ##########################################
 # strsep() probe
-strsep="no"
+if test "$strsep" != "yes" ; then
+  strsep="no"
+fi
 cat > $TMPC << EOF
 #include <string.h>
 int main(int argc, char **argv)
@@ -1019,7 +1081,9 @@ echo "strsep                        $strsep"
 
 ##########################################
 # strcasestr() probe
-strcasestr="no"
+if test "$strcasestr" != "yes" ; then
+  strcasestr="no"
+fi
 cat > $TMPC << EOF
 #include <string.h>
 int main(int argc, char **argv)
@@ -1034,7 +1098,9 @@ echo "strcasestr                    $strcasestr"
 
 ##########################################
 # strlcat() probe
-strlcat="no"
+if test "$strlcat" != "yes" ; then
+  strlcat="no"
+fi
 cat > $TMPC << EOF
 #include <string.h>
 int main(int argc, char **argv)
@@ -1053,7 +1119,9 @@ echo "strlcat                       $strlcat"
 
 ##########################################
 # getopt_long_only() probe
-getopt_long_only="no"
+if test "$getopt_long_only" != "yes" ; then
+  getopt_long_only="no"
+fi
 cat > $TMPC << EOF
 #include <unistd.h>
 #include <stdio.h>
@@ -1071,7 +1139,9 @@ echo "getopt_long_only()            $getopt_long_only"
 
 ##########################################
 # inet_aton() probe
-inet_aton="no"
+if test "$inet_aton" != "yes" ; then
+  inet_aton="no"
+fi
 cat > $TMPC << EOF
 #include <sys/socket.h>
 #include <arpa/inet.h>
@@ -1089,7 +1159,9 @@ echo "inet_aton                     $inet_aton"
 
 ##########################################
 # socklen_t probe
-socklen_t="no"
+if test "$socklen_t" != "yes" ; then
+  socklen_t="no"
+fi
 cat > $TMPC << EOF
 #include <sys/socket.h>
 int main(int argc, char **argv)
@@ -1105,7 +1177,9 @@ echo "socklen_t                     $socklen_t"
 
 ##########################################
 # Whether or not __thread is supported for TLS
-tls_thread="no"
+if test "$tls_thread" != "yes" ; then
+  tls_thread="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 static __thread int ret;
@@ -1121,7 +1195,9 @@ echo "__thread                      $tls_thread"
 
 ##########################################
 # Check if we have required gtk/glib support for gfio
-gfio="no"
+if test "$gfio" != "yes" ; then
+  gfio="no"
+fi
 if test "$gfio_check" = "yes" ; then
   cat > $TMPC << EOF
 #include <glib.h>
@@ -1169,7 +1245,9 @@ if test "$gfio_check" = "yes" ; then
 fi
 
 # Check whether we have getrusage(RUSAGE_THREAD)
-rusage_thread="no"
+if test "$rusage_thread" != "yes" ; then
+  rusage_thread="no"
+fi
 cat > $TMPC << EOF
 #include <sys/time.h>
 #include <sys/resource.h>
@@ -1187,7 +1265,9 @@ echo "RUSAGE_THREAD                 $rusage_thread"
 
 ##########################################
 # Check whether we have SCHED_IDLE
-sched_idle="no"
+if test "$sched_idle" != "yes" ; then
+  sched_idle="no"
+fi
 cat > $TMPC << EOF
 #include <sched.h>
 int main(int argc, char **argv)
@@ -1203,7 +1283,9 @@ echo "SCHED_IDLE                    $sched_idle"
 
 ##########################################
 # Check whether we have TCP_NODELAY
-tcp_nodelay="no"
+if test "$tcp_nodelay" != "yes" ; then
+  tcp_nodelay="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/types.h>
@@ -1221,7 +1303,9 @@ echo "TCP_NODELAY                   $tcp_nodelay"
 
 ##########################################
 # Check whether we have SO_SNDBUF
-window_size="no"
+if test "$window_size" != "yes" ; then
+  window_size="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/types.h>
@@ -1240,7 +1324,9 @@ echo "Net engine window_size        $window_size"
 
 ##########################################
 # Check whether we have TCP_MAXSEG
-mss="no"
+if test "$mss" != "yes" ; then
+  mss="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/types.h>
@@ -1260,7 +1346,9 @@ echo "TCP_MAXSEG                    $mss"
 
 ##########################################
 # Check whether we have RLIMIT_MEMLOCK
-rlimit_memlock="no"
+if test "$rlimit_memlock" != "yes" ; then
+  rlimit_memlock="no"
+fi
 cat > $TMPC << EOF
 #include <sys/time.h>
 #include <sys/resource.h>
@@ -1277,7 +1365,9 @@ echo "RLIMIT_MEMLOCK                $rlimit_memlock"
 
 ##########################################
 # Check whether we have pwritev/preadv
-pwritev="no"
+if test "$pwritev" != "yes" ; then
+  pwritev="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/uio.h>
@@ -1293,7 +1383,9 @@ echo "pwritev/preadv                $pwritev"
 
 ##########################################
 # Check whether we have pwritev2/preadv2
-pwritev2="no"
+if test "$pwritev2" != "yes" ; then
+  pwritev2="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/uio.h>
@@ -1309,7 +1401,9 @@ echo "pwritev2/preadv2              $pwritev2"
 
 ##########################################
 # Check whether we have the required functions for ipv6
-ipv6="no"
+if test "$ipv6" != "yes" ; then
+  ipv6="no"
+fi
 cat > $TMPC << EOF
 #include <sys/types.h>
 #include <sys/socket.h>
@@ -1336,7 +1430,9 @@ echo "IPv6 helpers                  $ipv6"
 
 ##########################################
 # check for rbd
-rbd="no"
+if test "$rbd" != "yes" ; then
+  rbd="no"
+fi
 cat > $TMPC << EOF
 #include <rbd/librbd.h>
 
@@ -1362,7 +1458,9 @@ echo "Rados Block Device engine     $rbd"
 
 ##########################################
 # check for rbd_poll
-rbd_poll="no"
+if test "$rbd_poll" != "yes" ; then
+  rbd_poll="no"
+fi
 if test "$rbd" = "yes"; then
 cat > $TMPC << EOF
 #include <rbd/librbd.h>
@@ -1388,7 +1486,9 @@ fi
 
 ##########################################
 # check for rbd_invaidate_cache()
-rbd_inval="no"
+if test "$rbd_inval" != "yes" ; then
+  rbd_inval="no"
+fi
 if test "$rbd" = "yes"; then
 cat > $TMPC << EOF
 #include <rbd/librbd.h>
@@ -1408,7 +1508,9 @@ fi
 
 ##########################################
 # check for blkin
-rbd_blkin="no"
+if test "$rbd_blkin" != "yes" ; then
+  rbd_blkin="no"
+fi
 cat > $TMPC << EOF
 #include <rbd/librbd.h>
 #include <zipkin_c.h>
@@ -1436,7 +1538,9 @@ echo "rbd blkin tracing             $rbd_blkin"
 
 ##########################################
 # Check whether we have setvbuf
-setvbuf="no"
+if test "$setvbuf" != "yes" ; then
+  setvbuf="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 int main(int argc, char **argv)
@@ -1453,7 +1557,9 @@ fi
 echo "setvbuf                       $setvbuf"
 
 # check for gfapi
-gfapi="no"
+if test "$gfapi" != "yes" ; then
+  gfapi="no"
+fi
 cat > $TMPC << EOF
 #include <glusterfs/api/glfs.h>
 
@@ -1472,7 +1578,7 @@ fi
  echo "Gluster API engine            $gfapi"
 
 ##########################################
-# check for gfapi fadvise support
+# check for gfapi fadvise support, initialize with "no" only if $gfapi is set to "yes"
 if test "$gfapi" = "yes" ; then
 gf_fadvise="no"
 cat > $TMPC << EOF
@@ -1494,7 +1600,9 @@ fi
 
 ##########################################
 # check for gfapi trim support
-gf_trim="no"
+if test "$gf_trim" != "yes" ; then
+  gf_trim="no"
+fi
 if test "$gfapi" = "yes" ; then
 cat > $TMPC << EOF
 #include <glusterfs/api/glfs.h>
@@ -1512,7 +1620,9 @@ fi
 
 ##########################################
 # Check if we support stckf on s390
-s390_z196_facilities="no"
+if test "$s390_z196_facilities" != "yes" ; then
+  s390_z196_facilities="no"
+fi
 cat > $TMPC << EOF
 #define STFLE_BITS_Z196 45 /* various z196 facilities ... */
 int main(int argc, char **argv)
@@ -1569,7 +1679,9 @@ echo "HDFS engine                   $libhdfs"
 
 ##########################################
 # Check whether we have MTD
-mtd="no"
+if test "$mtd" != "yes" ; then
+  mtd="no"
+fi
 cat > $TMPC << EOF
 #include <string.h>
 #include <mtd/mtd-user.h>
@@ -1590,7 +1702,9 @@ echo "MTD                           $mtd"
 
 ##########################################
 # Check whether we have libpmem
-libpmem="no"
+if test "$libpmem" != "yes" ; then
+  libpmem="no"
+fi
 cat > $TMPC << EOF
 #include <libpmem.h>
 int main(int argc, char **argv)
@@ -1609,7 +1723,9 @@ echo "libpmem                       $libpmem"
 ##########################################
 # Check whether we have libpmemblk
 # libpmem is a prerequisite
-libpmemblk="no"
+if test "$libpmemblk" != "yes" ; then
+  libpmemblk="no"
+fi
 if test "$libpmem" = "yes"; then
   cat > $TMPC << EOF
 #include <libpmemblk.h>
@@ -1705,7 +1821,9 @@ echo "lex/yacc for arithmetic       $arith"
 
 ##########################################
 # Check whether we have setmntent/getmntent
-getmntent="no"
+if test "$getmntent" != "yes" ; then
+  getmntent="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <mntent.h>
@@ -1729,7 +1847,9 @@ echo "getmntent                     $getmntent"
 
 # getmntinfo(3) for FreeBSD/DragonFlyBSD/OpenBSD.
 # Note that NetBSD needs -Werror to catch warning as error.
-getmntinfo="no"
+if test "$getmntinfo" != "yes" ; then
+  getmntinfo="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/param.h>
@@ -1746,7 +1866,9 @@ fi
 echo "getmntinfo                    $getmntinfo"
 
 # getmntinfo(3) for NetBSD.
-getmntinfo_statvfs="no"
+if test "$getmntinfo_statvfs" != "yes" ; then
+  getmntinfo_statvfs="no"
+fi
 cat > $TMPC << EOF
 #include <stdio.h>
 #include <sys/statvfs.h>
@@ -1764,7 +1886,9 @@ fi
 
 ##########################################
 # Check whether we have _Static_assert
-static_assert="no"
+if test "$static_assert" != "yes" ; then
+  static_assert="no"
+fi
 cat > $TMPC << EOF
 #include <assert.h>
 #include <stdlib.h>
@@ -1796,7 +1920,9 @@ echo "Static Assert                 $static_assert"
 
 ##########################################
 # Check whether we have bool / stdbool.h
-have_bool="no"
+if test "$have_bool" != "yes" ; then
+  have_bool="no"
+fi
 cat > $TMPC << EOF
 #include <stdbool.h>
 int main(int argc, char **argv)
@@ -1812,7 +1938,9 @@ echo "bool                          $have_bool"
 
 ##########################################
 # check march=armv8-a+crc+crypto
-march_armv8_a_crc_crypto="no"
+if test "$march_armv8_a_crc_crypto" != "yes" ; then
+  march_armv8_a_crc_crypto="no"
+fi
 if test "$cpu" = "arm64" ; then
   cat > $TMPC <<EOF
 int main(void)
diff --git a/engines/glusterfs_sync.c b/engines/glusterfs_sync.c
index 05e184c..25d05b2 100644
--- a/engines/glusterfs_sync.c
+++ b/engines/glusterfs_sync.c
@@ -7,7 +7,7 @@
 
 #include "gfapi.h"
 
-#define LAST_POS(f)	((f)->engine_data)
+#define LAST_POS(f)	((f)->engine_pos)
 static int fio_gf_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index e8476f9..52af9ed 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -86,10 +86,6 @@ struct fio_pmemblk_file {
 	size_t pmb_bsize;
 	size_t pmb_nblocks;
 };
-#define FIOFILEPMBSET(_f, _v)  do {                 \
-	(_f)->engine_data = (uint64_t)(uintptr_t)(_v);  \
-} while(0)
-#define FIOFILEPMBGET(_f)  ((fio_pmemblk_file_t)((_f)->engine_data))
 
 static fio_pmemblk_file_t Cache;
 
@@ -304,26 +300,26 @@ static int fio_pmemblk_open_file(struct thread_data *td, struct fio_file *f)
 	if (!pmb)
 		return 1;
 
-	FIOFILEPMBSET(f, pmb);
+	FILE_SET_ENG_DATA(f, pmb);
 	return 0;
 }
 
 static int fio_pmemblk_close_file(struct thread_data fio_unused *td,
 				  struct fio_file *f)
 {
-	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
 
 	if (pmb)
 		pmb_close(pmb, false);
 
-	FIOFILEPMBSET(f, NULL);
+	FILE_SET_ENG_DATA(f, NULL);
 	return 0;
 }
 
 static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	uint64_t flags = 0;
-	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
 
 	if (fio_file_size_known(f))
 		return 0;
@@ -340,7 +336,7 @@ static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	fio_file_set_size_known(f);
 
-	if (!FIOFILEPMBGET(f))
+	if (!FILE_ENG_DATA(f))
 		pmb_close(pmb, true);
 
 	return 0;
@@ -349,7 +345,7 @@ static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
-	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+	fio_pmemblk_file_t pmb = FILE_ENG_DATA(f);
 
 	unsigned long long off;
 	unsigned long len;
diff --git a/engines/sync.c b/engines/sync.c
index 1726b8e..e76bbbb 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -18,7 +18,7 @@
 /*
  * Sync engine uses engine_data to store last offset
  */
-#define LAST_POS(f)	((f)->engine_data)
+#define LAST_POS(f)	((f)->engine_pos)
 
 struct syncio_data {
 	struct iovec *iovecs;
diff --git a/file.h b/file.h
index 611470c..9801bb5 100644
--- a/file.h
+++ b/file.h
@@ -113,9 +113,12 @@ struct fio_file {
 	unsigned int last_write_idx;
 
 	/*
-	 * For use by the io engine
+	 * For use by the io engine for offset or private data storage
 	 */
-	uint64_t engine_data;
+	union {
+		uint64_t engine_pos;
+		void *engine_data;
+	};
 
 	/*
 	 * if io is protected by a semaphore, this is set
@@ -147,14 +150,8 @@ struct fio_file {
 	struct disk_util *du;
 };
 
-#define FILE_ENG_DATA(f)	((void *) (uintptr_t) (f)->engine_data)
-#define FILE_SET_ENG_DATA(f, data)	\
-	((f)->engine_data = (uintptr_t) (data))
-
-struct file_name {
-	struct flist_head list;
-	char *filename;
-};
+#define FILE_ENG_DATA(f)		((f)->engine_data)
+#define FILE_SET_ENG_DATA(f, data)	((f)->engine_data = (data))
 
 #define FILE_FLAG_FNS(name)						\
 static inline void fio_file_set_##name(struct fio_file *f)		\
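The file.h hunk turns engine_data into an anonymous union. Offset-tracking engines (sync and glusterfs_sync, through LAST_POS) now use engine_pos. Engines that keep per-file state (pmemblk, through FILE_SET_ENG_DATA/FILE_ENG_DATA) store a void * directly instead of round-tripping it through uintptr_t. The standalone sketch below shows that layout. struct demo_file, the DEMO_* macros and demo_advance() are hypothetical stand-ins for struct fio_file and the real helpers, and the anonymous union assumes C11 or GNU C.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct fio_file: one slot that an I/O engine
 * uses either as an offset or as a private pointer, never both. */
struct demo_file {
	const char *file_name;
	union {
		uint64_t engine_pos;	/* offset-tracking engines, cf. LAST_POS(f) */
		void *engine_data;	/* pointer-storing engines, cf. pmemblk */
	};
};

/* Same shape as FILE_ENG_DATA / FILE_SET_ENG_DATA after the patch:
 * the pointer is stored directly, with no uintptr_t cast. */
#define DEMO_ENG_DATA(f)		((f)->engine_data)
#define DEMO_SET_ENG_DATA(f, data)	((f)->engine_data = (data))

/* An offset-style engine advances the stored position after each I/O
 * and returns the new position. */
static uint64_t demo_advance(struct demo_file *f, uint64_t len)
{
	f->engine_pos += len;
	return f->engine_pos;
}
```

Each engine is expected to use only one member of the union. Reading the member that was not last written reinterprets the same bytes.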
diff --git a/filesetup.c b/filesetup.c
index 793b08d..4d0b127 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -24,6 +24,14 @@ static int root_warn;
 
 static FLIST_HEAD(filename_list);
 
+/*
+ * List entry for filename_list
+ */
+struct file_name {
+	struct flist_head list;
+	char *filename;
+};
+
 static inline void clear_error(struct thread_data *td)
 {
 	td->error = 0;
@@ -377,12 +385,8 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 		ret = bdev_size(td, f);
 	else if (f->filetype == FIO_TYPE_CHAR)
 		ret = char_size(td, f);
-	else {
-		f->real_file_size = -1;
-		log_info("%s: failed to get file size of %s\n", td->o.name,
-					f->file_name);
-		return 1; /* avoid offset extends end error message */
-	}
+	else
+		f->real_file_size = -1ULL;
 
 	/*
 	 * Leave ->real_file_size with 0 since it could be expectation
@@ -392,10 +396,22 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 		return ret;
 
 	/*
+	 * If ->real_file_size is -1, a conditional for the message
+	 * "offset extends end" is always true, but it makes no sense,
+	 * so just return the same value here.
+	 */
+	if (f->real_file_size == -1ULL) {
+		log_info("%s: failed to get file size of %s\n", td->o.name,
+					f->file_name);
+		return 1;
+	}
+
+	if (td->o.start_offset && f->file_offset == 0)
+		dprint(FD_FILE, "offset of file %s not initialized yet\n",
+					f->file_name);
+	/*
 	 * ->file_offset normally hasn't been initialized yet, so this
-	 * is basically always false unless ->real_file_size is -1, but
-	 * if ->real_file_size is -1 this message doesn't make sense.
-	 * As a result, this message is basically useless.
+	 * is basically always false.
 	 */
 	if (f->file_offset > f->real_file_size) {
 		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,
@@ -503,7 +519,7 @@ int generic_close_file(struct thread_data fio_unused *td, struct fio_file *f)
 		f->shadow_fd = -1;
 	}
 
-	f->engine_data = 0;
+	f->engine_pos = 0;
 	return ret;
 }
 
@@ -611,7 +627,8 @@ open_again:
 			f->fd = dup(STDIN_FILENO);
 		else
 			from_hash = file_lookup_open(f, flags);
-	} else { //td trim
+	} else if (td_trim(td)) {
+		assert(!td_rw(td)); /* should have matched above */
 		flags |= O_RDWR;
 		from_hash = file_lookup_open(f, flags);
 	}
@@ -685,7 +702,7 @@ static int get_file_sizes(struct thread_data *td)
 	int err = 0;
 
 	for_each_file(td, f, i) {
-		dprint(FD_FILE, "get file size for %p/%d/%p\n", f, i,
+		dprint(FD_FILE, "get file size for %p/%d/%s\n", f, i,
 								f->file_name);
 
 		if (td_io_get_file_size(td, f)) {
@@ -896,8 +913,7 @@ int setup_files(struct thread_data *td)
 		if (!o->file_size_low) {
 			/*
 			 * no file size or range given, file size is equal to
-			 * total size divided by number of files. If that is
-			 * zero, set it to the real file size. If the size
+			 * total size divided by number of files. If the size
 			 * doesn't divide nicely with the min blocksize,
 			 * make the first files bigger.
 			 */
@@ -907,8 +923,24 @@ int setup_files(struct thread_data *td)
 				f->io_size += bs;
 			}
 
-			if (!f->io_size)
+			/*
+			 * We normally don't come here, but if the result is 0,
+			 * set it to the real file size. This could be size of
+			 * the existing one if it already exists, but otherwise
+			 * will be set to 0. A new file won't be created because
+			 * ->io_size + ->file_offset equals ->real_file_size.
+			 */
+			if (!f->io_size) {
+				if (f->file_offset > f->real_file_size)
+					goto err_offset;
 				f->io_size = f->real_file_size - f->file_offset;
+				log_info("fio: forcing file %s size to %llu\n",
+					f->file_name,
+					(unsigned long long)f->io_size);
+				if (!f->io_size)
+					log_info("fio: file %s may be ignored\n",
+						f->file_name);
+			}
 		} else if (f->real_file_size < o->file_size_low ||
 			   f->real_file_size > o->file_size_high) {
 			if (f->file_offset > o->file_size_low)
@@ -942,9 +974,9 @@ int setup_files(struct thread_data *td)
 			if (!o->create_on_open) {
 				need_extend++;
 				extend_size += (f->io_size + f->file_offset);
+				fio_file_set_extend(f);
 			} else
 				f->real_file_size = f->io_size + f->file_offset;
-			fio_file_set_extend(f);
 		}
 	}
 
@@ -984,9 +1016,15 @@ int setup_files(struct thread_data *td)
 	 */
 	if (need_extend) {
 		temp_stall_ts = 1;
-		if (output_format & FIO_OUTPUT_NORMAL)
-			log_info("%s: Laying out IO file(s) (%u file(s) / %lluMiB)\n",
-				 o->name, need_extend, extend_size >> 20);
+		if (output_format & FIO_OUTPUT_NORMAL) {
+			log_info("%s: Laying out IO file%s (%u file%s / %s%lluMiB)\n",
+				 o->name,
+				 need_extend > 1 ? "s" : "",
+				 need_extend,
+				 need_extend > 1 ? "s" : "",
+				 need_extend > 1 ? "total " : "",
+				 extend_size >> 20);
+		}
 
 		for_each_file(td, f, i) {
 			unsigned long long old_len = -1ULL, extend_len = -1ULL;
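The last setup_files() hunk makes the layout message grammatical. The "s" suffixes and the word "total" now appear only when more than one file needs extending. The helper below reproduces the patched format string with snprintf() so the two cases can be compared. demo_layout_msg() is a hypothetical name. extend_size is in bytes and is shifted right by 20 to print MiB, exactly as in the patch.

```c
#include <stdio.h>

/* Builds the "Laying out IO file(s)" line the way the patched log_info()
 * call does. Plural suffixes and "total " are emitted only when more than
 * one file is being laid out. Returns snprintf()'s result. */
static int demo_layout_msg(char *buf, size_t len, const char *name,
			   unsigned int need_extend,
			   unsigned long long extend_size)
{
	return snprintf(buf, len,
			"%s: Laying out IO file%s (%u file%s / %s%lluMiB)\n",
			name,
			need_extend > 1 ? "s" : "",
			need_extend,
			need_extend > 1 ? "s" : "",
			need_extend > 1 ? "total " : "",
			extend_size >> 20);
}
```

For one 10 MiB file this yields "job1: Laying out IO file (1 file / 10MiB)". For four files totalling 256 MiB it yields "job1: Laying out IO files (4 files / total 256MiB)".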
diff --git a/fio.h b/fio.h
index b2f0e2f..b573ac5 100644
--- a/fio.h
+++ b/fio.h
@@ -490,7 +490,7 @@ static inline int should_fsync(struct thread_data *td)
 {
 	if (td->last_was_sync)
 		return 0;
-	if (td_write(td) || td_rw(td) || td->o.override_sync)
+	if (td_write(td) || td->o.override_sync)
 		return 1;
 
 	return 0;
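This hunk drops "|| td_rw(td)". The changes to fixup_options() and verify_header() in this series do the same. The reason is that a mixed read/write job has both direction bits set, so td_write() (or td_read()) already matches it. The sketch below checks that equivalence over every combination of four direction bits. The DEMO_DDIR_* values are an assumption modelled on fio's td_ddir bit flags, not copied from the fio tree.

```c
#include <stdbool.h>

/* Assumed bit layout, modelled on fio's td_ddir flags. */
enum {
	DEMO_DDIR_READ	= 1 << 0,
	DEMO_DDIR_WRITE	= 1 << 1,
	DEMO_DDIR_RAND	= 1 << 2,
	DEMO_DDIR_TRIM	= 1 << 3,
	DEMO_DDIR_RW	= DEMO_DDIR_READ | DEMO_DDIR_WRITE,
};

static bool demo_write(unsigned int ddir)
{
	return ddir & DEMO_DDIR_WRITE;
}

/* "rw" means both the read and the write bit are set. */
static bool demo_rw(unsigned int ddir)
{
	return (ddir & DEMO_DDIR_RW) == DEMO_DDIR_RW;
}

/* Old and new forms of the should_fsync() direction test. */
static bool old_cond(unsigned int ddir) { return demo_write(ddir) || demo_rw(ddir); }
static bool new_cond(unsigned int ddir) { return demo_write(ddir); }

/* True if the two conditions agree for every combination of the 4 bits. */
static bool demo_conditions_agree(void)
{
	for (unsigned int ddir = 0; ddir < 16; ddir++)
		if (old_cond(ddir) != new_cond(ddir))
			return false;
	return true;
}
```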
diff --git a/init.c b/init.c
index fabc887..18538de 100644
--- a/init.c
+++ b/init.c
@@ -356,9 +356,8 @@ static int setup_thread_area(void)
 		perror("shmat");
 		return 1;
 	}
-#ifdef FIO_HAVE_SHM_ATTACH_REMOVED
-	shmctl(shm_id, IPC_RMID, NULL);
-#endif
+	if (shm_attach_to_open_removed())
+		shmctl(shm_id, IPC_RMID, NULL);
 #endif
 
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
@@ -620,7 +619,7 @@ static int fixup_options(struct thread_data *td)
 	/*
 	 * Reads can do overwrites, we always need to pre-create the file
 	 */
-	if (td_read(td) || td_rw(td))
+	if (td_read(td))
 		o->overwrite = 1;
 
 	if (!o->min_bs[DDIR_READ])
@@ -765,7 +764,11 @@ static int fixup_options(struct thread_data *td)
 	}
 
 	if (o->pre_read) {
-		o->invalidate_cache = 0;
+		if (o->invalidate_cache) {
+			log_info("fio: ignore invalidate option for %s\n",
+				 o->name);
+			o->invalidate_cache = 0;
+		}
 		if (td_ioengine_flagged(td, FIO_PIPEIO)) {
 			log_info("fio: cannot pre-read files with an IO engine"
 				 " that isn't seekable. Pre-read disabled.\n");
diff --git a/io_u.c b/io_u.c
index e12382b..cb8fc4a 100644
--- a/io_u.c
+++ b/io_u.c
@@ -643,7 +643,7 @@ int io_u_quiesce(struct thread_data *td)
 	}
 
 	while (td->io_u_in_flight) {
-		int fio_unused ret;
+		int ret;
 
 		ret = io_u_queued_complete(td, 1);
 		if (ret > 0)
@@ -1960,7 +1960,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
 	int ret, ddir;
 	struct timespec ts = { .tv_sec = 0, .tv_nsec = 0, };
 
-	dprint(FD_IO, "io_u_queued_completed: min=%d\n", min_evts);
+	dprint(FD_IO, "io_u_queued_complete: min=%d\n", min_evts);
 
 	if (!min_evts)
 		tvp = &ts;
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 5e94855..97452ca 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -24,8 +24,7 @@
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_IOPRIO
-/* Only have attach-to-open-removed when kern.ipc.shm_allow_removed is 1 */
-#undef  FIO_HAVE_SHM_ATTACH_REMOVED
+#define FIO_HAVE_SHM_ATTACH_REMOVED
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -234,4 +233,15 @@ static inline int os_trim(int fd, unsigned long long start,
 #define FIO_MADV_FREE	MADV_FREE
 #endif
 
+static inline int shm_attach_to_open_removed(void)
+{
+	int x;
+	size_t len = sizeof(x);
+
+	if (sysctlbyname("kern.ipc.shm_allow_removed", &x, &len, NULL, 0) < 0)
+		return 0;
+
+	return x > 0 ? 1 : 0;
+}
+
 #endif
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index aa90954..9d1af3b 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -22,9 +22,7 @@
 #define FIO_HAVE_TRIM
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CPU_AFFINITY
-/* Only have attach-to-open-removed when kern.ipc.shm_allow_removed is 1 */
-#undef  FIO_HAVE_SHM_ATTACH_REMOVED
-
+#define FIO_HAVE_SHM_ATTACH_REMOVED
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -136,4 +134,15 @@ static inline int os_trim(int fd, unsigned long long start,
 #define FIO_MADV_FREE	MADV_FREE
 #endif
 
+static inline int shm_attach_to_open_removed(void)
+{
+	int x;
+	size_t len = sizeof(x);
+
+	if (sysctlbyname("kern.ipc.shm_allow_removed", &x, &len, NULL, 0) < 0)
+		return 0;
+
+	return x > 0 ? 1 : 0;
+}
+
 #endif
diff --git a/os/os-linux.h b/os/os-linux.h
index 1829829..7be833b 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -350,4 +350,9 @@ static inline ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt,
 #endif /* __NR_preadv2 */
 #endif /* CONFIG_PWRITEV2 */
 
+static inline int shm_attach_to_open_removed(void)
+{
+	return 1;
+}
+
 #endif
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 4700572..3b19483 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -22,12 +22,10 @@
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
+#define FIO_HAVE_SHM_ATTACH_REMOVED
 
 #undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
 
-/* Only OpenBSD 5.1 and above have attach-to-open-removed semantics */
-#undef  FIO_HAVE_SHM_ATTACH_REMOVED
-
 #define OS_MAP_ANON		MAP_ANON
 
 #ifndef PTHREAD_STACK_MIN
@@ -90,4 +88,12 @@ static inline unsigned long long get_fs_free_size(const char *path)
 #define FIO_MADV_FREE	MADV_FREE
 #endif
 
+static inline int shm_attach_to_open_removed(void)
+{
+	/*
+	 * XXX: Return 1 if >= OpenBSD 5.1 according to 97900ebf.
+	 */
+	return 0;
+}
+
 #endif
diff --git a/os/os.h b/os/os.h
index 4178e6f..5e3c813 100644
--- a/os/os.h
+++ b/os/os.h
@@ -386,4 +386,11 @@ static inline int gettid(void)
 }
 #endif
 
+#ifndef FIO_HAVE_SHM_ATTACH_REMOVED
+static inline int shm_attach_to_open_removed(void)
+{
+	return 0;
+}
+#endif
+
 #endif
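This series changes how the shm capability is determined. The FreeBSD and DragonFly headers now read kern.ipc.shm_allow_removed at run time instead of hard-coding #undef FIO_HAVE_SHM_ATTACH_REMOVED. Linux always returns 1, and OpenBSD returns 0 for now (with an XXX note about 5.1 and later). os/os.h supplies a 0 fallback for every other platform. setup_thread_area() then calls shmctl(shm_id, IPC_RMID, NULL) right after shmat() only when the probe returns nonzero. On Linux, a segment marked this way is destroyed only after the last attached process detaches. The sketch below isolates that decision so it can be exercised without SysV IPC. The demo_probe_fn callback type and the fake probes are hypothetical, shaped only like sysctlbyname()'s "negative on failure" convention.

```c
/* Hypothetical probe: stores the sysctl value in *value and returns a
 * negative number on failure, like sysctlbyname() does. */
typedef int (*demo_probe_fn)(int *value);

/* Mirrors the BSD shm_attach_to_open_removed(): a failed probe means
 * "not supported", and only a positive setting enables early IPC_RMID. */
static int demo_attach_to_open_removed(demo_probe_fn probe)
{
	int x;

	if (probe(&x) < 0)
		return 0;
	return x > 0 ? 1 : 0;
}

/* Fake probes standing in for different kernel configurations. */
static int probe_fail(int *value) { (void)value; return -1; }
static int probe_off(int *value) { *value = 0; return 0; }
static int probe_on(int *value) { *value = 1; return 0; }
```

Treating a failed probe as 0 is the conservative choice: it keeps the segment until fio removes it explicitly rather than relying on semantics the kernel may not provide.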
diff --git a/rate-submit.c b/rate-submit.c
index 42927ff..4738dc4 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -98,7 +98,6 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 {
 	struct thread_data *parent = sw->wq->td;
 	struct thread_data *td = sw->priv;
-	int fio_unused ret;
 
 	memcpy(&td->o, &parent->o, sizeof(td->o));
 	memcpy(&td->ts, &parent->ts, sizeof(td->ts));
diff --git a/verify.c b/verify.c
index f567ec1..cadfe9c 100644
--- a/verify.c
+++ b/verify.c
@@ -851,7 +851,7 @@ static int verify_header(struct io_u *io_u, struct thread_data *td,
 	 * state of numberio, that would have been written to each block
 	 * in a previous run of fio, has been reached.
 	 */
-	if ((td_write(td) || td_rw(td)) && (td_min_bs(td) == td_max_bs(td)) &&
+	if (td_write(td) && (td_min_bs(td) == td_max_bs(td)) &&
 	    !td->o.time_based)
 		if (!td->o.verify_only || td->o.loops == 0)
 			if (hdr->numberio != io_u->numberio) {

* Recent changes (master)
@ 2017-03-09 13:00 Jens Axboe
From: Jens Axboe @ 2017-03-09 13:00 UTC
  To: fio

The following changes since commit 8f7630813305a4f4f04a5f9ba20b2a7d486c0cfb:

  io_u: don't add slat samples if we are in ramp time (2017-03-07 10:18:53 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ae3a5accfdbe1fbfde6ba4ab583887a7d3d779ac:

  verify: add support for the sha3 variants (2017-03-08 09:13:14 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      crc: add support for sha3 variants
      verify: add support for the sha3 variants

 HOWTO      |  12 +++++
 crc/sha3.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 crc/sha3.h |  42 +++++++++++++++
 crc/test.c |  81 +++++++++++++++++++++++++++++
 fio.1      |   2 +-
 options.c  |  16 ++++++
 verify.c   | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 verify.h   |  16 ++++++
 8 files changed, 514 insertions(+), 1 deletion(-)
 create mode 100644 crc/sha3.c
 create mode 100644 crc/sha3.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a72d868..15ed425 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2355,6 +2355,18 @@ Verification
 		**sha1**
 			Use optimized sha1 as the checksum function.
 
+		**sha3-224**
+			Use optimized sha3-224 as the checksum function.
+
+		**sha3-256**
+			Use optimized sha3-256 as the checksum function.
+
+		**sha3-384**
+			Use optimized sha3-384 as the checksum function.
+
+		**sha3-512**
+			Use optimized sha3-512 as the checksum function.
+
 		**meta**
 			This option is deprecated, since now meta information is included in
 			generic verification header and meta verification happens by
diff --git a/crc/sha3.c b/crc/sha3.c
new file mode 100644
index 0000000..2685dce
--- /dev/null
+++ b/crc/sha3.c
@@ -0,0 +1,173 @@
+/*
+ * Cryptographic API.
+ *
+ * SHA-3, as specified in
+ * http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
+ *
+ * SHA-3 code by Jeff Garzik <jeff@garzik.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#include <string.h>
+#include <inttypes.h>
+
+#include "../os/os.h"
+
+#include "sha3.h"
+
+#define KECCAK_ROUNDS 24
+
+#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y))))
+
+static const uint64_t keccakf_rndc[24] = {
+	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
+	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
+	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
+	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
+	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
+	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
+	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
+	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
+};
+
+static const int keccakf_rotc[24] = {
+	1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
+	27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
+};
+
+static const int keccakf_piln[24] = {
+	10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
+	15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
+};
+
+/* update the state with given number of rounds */
+
+static void keccakf(uint64_t st[25])
+{
+	int i, j, round;
+	uint64_t t, bc[5];
+
+	for (round = 0; round < KECCAK_ROUNDS; round++) {
+
+		/* Theta */
+		for (i = 0; i < 5; i++)
+			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15]
+				^ st[i + 20];
+
+		for (i = 0; i < 5; i++) {
+			t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
+			for (j = 0; j < 25; j += 5)
+				st[j + i] ^= t;
+		}
+
+		/* Rho Pi */
+		t = st[1];
+		for (i = 0; i < 24; i++) {
+			j = keccakf_piln[i];
+			bc[0] = st[j];
+			st[j] = ROTL64(t, keccakf_rotc[i]);
+			t = bc[0];
+		}
+
+		/* Chi */
+		for (j = 0; j < 25; j += 5) {
+			for (i = 0; i < 5; i++)
+				bc[i] = st[j + i];
+			for (i = 0; i < 5; i++)
+				st[j + i] ^= (~bc[(i + 1) % 5]) &
+					     bc[(i + 2) % 5];
+		}
+
+		/* Iota */
+		st[0] ^= keccakf_rndc[round];
+	}
+}
+
+static void fio_sha3_init(struct fio_sha3_ctx *sctx, unsigned int digest_sz)
+{
+	memset(sctx->st, 0, sizeof(sctx->st));
+	sctx->md_len = digest_sz;
+	sctx->rsiz = 200 - 2 * digest_sz;
+	sctx->rsizw = sctx->rsiz / 8;
+	sctx->partial = 0;
+	memset(sctx->buf, 0, sizeof(sctx->buf));
+}
+
+void fio_sha3_224_init(struct fio_sha3_ctx *sctx)
+{
+	fio_sha3_init(sctx, SHA3_224_DIGEST_SIZE);
+}
+
+void fio_sha3_256_init(struct fio_sha3_ctx *sctx)
+{
+	fio_sha3_init(sctx, SHA3_256_DIGEST_SIZE);
+}
+
+void fio_sha3_384_init(struct fio_sha3_ctx *sctx)
+{
+	fio_sha3_init(sctx, SHA3_384_DIGEST_SIZE);
+}
+
+void fio_sha3_512_init(struct fio_sha3_ctx *sctx)
+{
+	fio_sha3_init(sctx, SHA3_512_DIGEST_SIZE);
+}
+
+int fio_sha3_update(struct fio_sha3_ctx *sctx, const uint8_t *data,
+		    unsigned int len)
+{
+	unsigned int done;
+	const uint8_t *src;
+
+	done = 0;
+	src = data;
+
+	if ((sctx->partial + len) > (sctx->rsiz - 1)) {
+		if (sctx->partial) {
+			done = -sctx->partial;
+			memcpy(sctx->buf + sctx->partial, data,
+			       done + sctx->rsiz);
+			src = sctx->buf;
+		}
+
+		do {
+			unsigned int i;
+
+			for (i = 0; i < sctx->rsizw; i++)
+				sctx->st[i] ^= ((uint64_t *) src)[i];
+			keccakf(sctx->st);
+
+			done += sctx->rsiz;
+			src = data + done;
+		} while (done + (sctx->rsiz - 1) < len);
+
+		sctx->partial = 0;
+	}
+	memcpy(sctx->buf + sctx->partial, src, len - done);
+	sctx->partial += (len - done);
+
+	return 0;
+}
+
+void fio_sha3_final(struct fio_sha3_ctx *sctx)
+{
+	unsigned int i, inlen = sctx->partial;
+
+	sctx->buf[inlen++] = 0x06;
+	memset(sctx->buf + inlen, 0, sctx->rsiz - inlen);
+	sctx->buf[sctx->rsiz - 1] |= 0x80;
+
+	for (i = 0; i < sctx->rsizw; i++)
+		sctx->st[i] ^= ((uint64_t *) sctx->buf)[i];
+
+	keccakf(sctx->st);
+
+	for (i = 0; i < sctx->rsizw; i++)
+		sctx->st[i] = cpu_to_le64(sctx->st[i]);
+
+	memcpy(sctx->sha, sctx->st, sctx->md_len);
+}
diff --git a/crc/sha3.h b/crc/sha3.h
new file mode 100644
index 0000000..9f1970a
--- /dev/null
+++ b/crc/sha3.h
@@ -0,0 +1,42 @@
+/*
+ * Common values for SHA-3 algorithms
+ */
+#ifndef __CRYPTO_SHA3_H__
+#define __CRYPTO_SHA3_H__
+
+#include <inttypes.h>
+
+#define SHA3_224_DIGEST_SIZE	(224 / 8)
+#define SHA3_224_BLOCK_SIZE	(200 - 2 * SHA3_224_DIGEST_SIZE)
+
+#define SHA3_256_DIGEST_SIZE	(256 / 8)
+#define SHA3_256_BLOCK_SIZE	(200 - 2 * SHA3_256_DIGEST_SIZE)
+
+#define SHA3_384_DIGEST_SIZE	(384 / 8)
+#define SHA3_384_BLOCK_SIZE	(200 - 2 * SHA3_384_DIGEST_SIZE)
+
+#define SHA3_512_DIGEST_SIZE	(512 / 8)
+#define SHA3_512_BLOCK_SIZE	(200 - 2 * SHA3_512_DIGEST_SIZE)
+
+struct fio_sha3_ctx {
+	uint64_t	st[25];
+	unsigned int	md_len;
+	unsigned int	rsiz;
+	unsigned int	rsizw;
+
+	unsigned int	partial;
+	uint8_t		buf[SHA3_224_BLOCK_SIZE];
+
+	uint8_t		*sha;
+};
+
+void fio_sha3_224_init(struct fio_sha3_ctx *sctx);
+void fio_sha3_256_init(struct fio_sha3_ctx *sctx);
+void fio_sha3_384_init(struct fio_sha3_ctx *sctx);
+void fio_sha3_512_init(struct fio_sha3_ctx *sctx);
+
+int fio_sha3_update(struct fio_sha3_ctx *sctx, const uint8_t *data,
+		    unsigned int len);
+void fio_sha3_final(struct fio_sha3_ctx *sctx);
+
+#endif
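crc/sha3.c derives everything from the digest length. The rate is 200 - 2 * md_len bytes, giving 144, 136, 104 and 72 bytes for SHA3-224/256/384/512. That is why fio_sha3_ctx declares buf[] with SHA3_224_BLOCK_SIZE, the largest of the four. The one-shot sketch below follows the same absorb, pad (0x06 ... 0x80) and squeeze steps with the same Keccak-f[1600] permutation and constants. Unlike the patch, it loads each 64-bit lane explicitly as little-endian rather than casting the buffer to uint64_t *, so it makes no assumption about host byte order or buffer alignment. It is an illustrative re-implementation, not fio's API, and demo_sha3() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y))))

static const uint64_t rndc[24] = {
	0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
	0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
	0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
	0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
	0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
	0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
	0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
	0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};
static const int rotc[24] = {
	1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
	27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
};
static const int piln[24] = {
	10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
	15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
};

/* Keccak-f[1600]: same round structure as keccakf() in crc/sha3.c. */
static void demo_keccakf(uint64_t st[25])
{
	uint64_t t, bc[5];
	int i, j, round;

	for (round = 0; round < 24; round++) {
		for (i = 0; i < 5; i++)			/* theta */
			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
		for (i = 0; i < 5; i++) {
			t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
			for (j = 0; j < 25; j += 5)
				st[j + i] ^= t;
		}
		t = st[1];				/* rho and pi */
		for (i = 0; i < 24; i++) {
			j = piln[i];
			bc[0] = st[j];
			st[j] = ROTL64(t, rotc[i]);
			t = bc[0];
		}
		for (j = 0; j < 25; j += 5) {		/* chi */
			for (i = 0; i < 5; i++)
				bc[i] = st[j + i];
			for (i = 0; i < 5; i++)
				st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
		}
		st[0] ^= rndc[round];			/* iota */
	}
}

/* XOR one rate-sized block into the state, reading little-endian lanes. */
static void demo_absorb(uint64_t st[25], const uint8_t *p, unsigned int rsiz)
{
	for (unsigned int i = 0; i < rsiz / 8; i++) {
		uint64_t lane = 0;
		for (unsigned int b = 0; b < 8; b++)
			lane |= (uint64_t)p[8 * i + b] << (8 * b);
		st[i] ^= lane;
	}
}

/* One-shot SHA3; md_len must be 28, 32, 48 or 64 bytes. */
static void demo_sha3(const uint8_t *in, size_t len, uint8_t *md,
		      unsigned int md_len)
{
	uint64_t st[25] = { 0 };
	uint8_t block[200 - 2 * 28];		/* largest rate (SHA3-224) */
	unsigned int rsiz = 200 - 2 * md_len;	/* rate in bytes */

	for (; len >= rsiz; in += rsiz, len -= rsiz) {	/* full blocks */
		demo_absorb(st, in, rsiz);
		demo_keccakf(st);
	}
	memset(block, 0, rsiz);			/* pad tail: 0x06 ... 0x80 */
	memcpy(block, in, len);
	block[len] ^= 0x06;
	block[rsiz - 1] ^= 0x80;
	demo_absorb(st, block, rsiz);
	demo_keccakf(st);

	for (unsigned int i = 0; i < md_len; i++)	/* squeeze */
		md[i] = (uint8_t)(st[i / 8] >> (8 * (i % 8)));
}
```

Because every supported md_len is smaller than its rate, a single squeeze of the state is enough to produce the whole digest.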
diff --git a/crc/test.c b/crc/test.c
index 78f19ac..368229e 100644
--- a/crc/test.c
+++ b/crc/test.c
@@ -16,6 +16,7 @@
 #include "../crc/sha1.h"
 #include "../crc/sha256.h"
 #include "../crc/sha512.h"
+#include "../crc/sha3.h"
 #include "../crc/xxhash.h"
 #include "../crc/murmur3.h"
 #include "../crc/fnv.h"
@@ -47,6 +48,10 @@ enum {
 	T_MURMUR3	= 1U << 10,
 	T_JHASH		= 1U << 11,
 	T_FNV		= 1U << 12,
+	T_SHA3_224	= 1U << 13,
+	T_SHA3_256	= 1U << 14,
+	T_SHA3_384	= 1U << 15,
+	T_SHA3_512	= 1U << 16,
 };
 
 static void t_md5(struct test_type *t, void *buf, size_t size)
@@ -143,6 +148,62 @@ static void t_sha512(struct test_type *t, void *buf, size_t size)
 		fio_sha512_update(&ctx, buf, size);
 }
 
+static void t_sha3_224(struct test_type *t, void *buf, size_t size)
+{
+	uint8_t sha[SHA3_224_DIGEST_SIZE];
+	struct fio_sha3_ctx ctx = { .sha = sha };
+	int i;
+
+	fio_sha3_224_init(&ctx);
+
+	for (i = 0; i < NR_CHUNKS; i++) {
+		fio_sha3_update(&ctx, buf, size);
+		fio_sha3_final(&ctx);
+	}
+}
+
+static void t_sha3_256(struct test_type *t, void *buf, size_t size)
+{
+	uint8_t sha[SHA3_256_DIGEST_SIZE];
+	struct fio_sha3_ctx ctx = { .sha = sha };
+	int i;
+
+	fio_sha3_256_init(&ctx);
+
+	for (i = 0; i < NR_CHUNKS; i++) {
+		fio_sha3_update(&ctx, buf, size);
+		fio_sha3_final(&ctx);
+	}
+}
+
+static void t_sha3_384(struct test_type *t, void *buf, size_t size)
+{
+	uint8_t sha[SHA3_384_DIGEST_SIZE];
+	struct fio_sha3_ctx ctx = { .sha = sha };
+	int i;
+
+	fio_sha3_384_init(&ctx);
+
+	for (i = 0; i < NR_CHUNKS; i++) {
+		fio_sha3_update(&ctx, buf, size);
+		fio_sha3_final(&ctx);
+	}
+}
+
+static void t_sha3_512(struct test_type *t, void *buf, size_t size)
+{
+	uint8_t sha[SHA3_512_DIGEST_SIZE];
+	struct fio_sha3_ctx ctx = { .sha = sha };
+	int i;
+
+	fio_sha3_512_init(&ctx);
+
+	for (i = 0; i < NR_CHUNKS; i++) {
+		fio_sha3_update(&ctx, buf, size);
+		fio_sha3_final(&ctx);
+	}
+}
+
 static void t_murmur3(struct test_type *t, void *buf, size_t size)
 {
 	int i;
@@ -247,6 +308,26 @@ static struct test_type t[] = {
 		.fn = t_fnv,
 	},
 	{
+		.name = "sha3-224",
+		.mask = T_SHA3_224,
+		.fn = t_sha3_224,
+	},
+	{
+		.name = "sha3-256",
+		.mask = T_SHA3_256,
+		.fn = t_sha3_256,
+	},
+	{
+		.name = "sha3-384",
+		.mask = T_SHA3_384,
+		.fn = t_sha3_384,
+	},
+	{
+		.name = "sha3-512",
+		.mask = T_SHA3_512,
+		.fn = t_sha3_512,
+	},
+	{
 		.name = NULL,
 	},
 };
diff --git a/fio.1 b/fio.1
index 56f2d11..cc68dee 100644
--- a/fio.1
+++ b/fio.1
@@ -1414,7 +1414,7 @@ option.  The allowed values are:
 .RS
 .RS
 .TP
-.B md5 crc16 crc32 crc32c crc32c-intel crc64 crc7 sha256 sha512 sha1 xxhash
+.B md5 crc16 crc32 crc32c crc32c-intel crc64 crc7 sha256 sha512 sha1 sha3-224 sha3-256 sha3-384 sha3-512 xxhash
 Store appropriate checksum in the header of each block. crc32c-intel is
 hardware accelerated SSE4.2 driven, falls back to regular crc32c if
 not supported by the system.
diff --git a/options.c b/options.c
index a543e5a..dcf0eea 100644
--- a/options.c
+++ b/options.c
@@ -2674,6 +2674,22 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = VERIFY_SHA512,
 			    .help = "Use sha512 checksums for verification",
 			  },
+			  { .ival = "sha3-224",
+			    .oval = VERIFY_SHA3_224,
+			    .help = "Use sha3-224 checksums for verification",
+			  },
+			  { .ival = "sha3-256",
+			    .oval = VERIFY_SHA3_256,
+			    .help = "Use sha3-256 checksums for verification",
+			  },
+			  { .ival = "sha3-384",
+			    .oval = VERIFY_SHA3_384,
+			    .help = "Use sha3-384 checksums for verification",
+			  },
+			  { .ival = "sha3-512",
+			    .oval = VERIFY_SHA3_512,
+			    .help = "Use sha3-512 checksums for verification",
+			  },
 			  { .ival = "xxhash",
 			    .oval = VERIFY_XXHASH,
 			    .help = "Use xxhash checksums for verification",
diff --git a/verify.c b/verify.c
index 5c7e43d..f567ec1 100644
--- a/verify.c
+++ b/verify.c
@@ -25,6 +25,7 @@
 #include "crc/sha512.h"
 #include "crc/sha1.h"
 #include "crc/xxhash.h"
+#include "crc/sha3.h"
 
 static void populate_hdr(struct thread_data *td, struct io_u *io_u,
 			 struct verify_header *hdr, unsigned int header_num,
@@ -172,6 +173,18 @@ static inline unsigned int __hdr_size(int verify_type)
 	case VERIFY_SHA512:
 		len = sizeof(struct vhdr_sha512);
 		break;
+	case VERIFY_SHA3_224:
+		len = sizeof(struct vhdr_sha3_224);
+		break;
+	case VERIFY_SHA3_256:
+		len = sizeof(struct vhdr_sha3_256);
+		break;
+	case VERIFY_SHA3_384:
+		len = sizeof(struct vhdr_sha3_384);
+		break;
+	case VERIFY_SHA3_512:
+		len = sizeof(struct vhdr_sha3_512);
+		break;
 	case VERIFY_XXHASH:
 		len = sizeof(struct vhdr_xxhash);
 		break;
@@ -431,6 +444,84 @@ static int verify_io_u_xxhash(struct verify_header *hdr, struct vcont *vc)
 	return EILSEQ;
 }
 
+static int verify_io_u_sha3(struct verify_header *hdr, struct vcont *vc,
+			    struct fio_sha3_ctx *sha3_ctx, uint8_t *sha,
+			    unsigned int sha_size, const char *name)
+{
+	void *p = io_u_verify_off(hdr, vc);
+
+	dprint(FD_VERIFY, "%s verify io_u %p, len %u\n", name, vc->io_u, hdr->len);
+
+	fio_sha3_update(sha3_ctx, p, hdr->len - hdr_size(vc->td, hdr));
+	fio_sha3_final(sha3_ctx);
+
+	if (!memcmp(sha, sha3_ctx->sha, sha_size))
+		return 0;
+
+	vc->name = name;
+	vc->good_crc = sha;
+	vc->bad_crc = sha3_ctx->sha;
+	vc->crc_len = sha_size;
+	log_verify_failure(hdr, vc);
+	return EILSEQ;
+}
+
+static int verify_io_u_sha3_224(struct verify_header *hdr, struct vcont *vc)
+{
+	struct vhdr_sha3_224 *vh = hdr_priv(hdr);
+	uint8_t sha[SHA3_224_DIGEST_SIZE];
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = sha,
+	};
+
+	fio_sha3_224_init(&sha3_ctx);
+
+	return verify_io_u_sha3(hdr, vc, &sha3_ctx, vh->sha,
+				SHA3_224_DIGEST_SIZE, "sha3-224");
+}
+
+static int verify_io_u_sha3_256(struct verify_header *hdr, struct vcont *vc)
+{
+	struct vhdr_sha3_256 *vh = hdr_priv(hdr);
+	uint8_t sha[SHA3_256_DIGEST_SIZE];
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = sha,
+	};
+
+	fio_sha3_256_init(&sha3_ctx);
+
+	return verify_io_u_sha3(hdr, vc, &sha3_ctx, vh->sha,
+				SHA3_256_DIGEST_SIZE, "sha3-256");
+}
+
+static int verify_io_u_sha3_384(struct verify_header *hdr, struct vcont *vc)
+{
+	struct vhdr_sha3_384 *vh = hdr_priv(hdr);
+	uint8_t sha[SHA3_384_DIGEST_SIZE];
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = sha,
+	};
+
+	fio_sha3_384_init(&sha3_ctx);
+
+	return verify_io_u_sha3(hdr, vc, &sha3_ctx, vh->sha,
+				SHA3_384_DIGEST_SIZE, "sha3-384");
+}
+
+static int verify_io_u_sha3_512(struct verify_header *hdr, struct vcont *vc)
+{
+	struct vhdr_sha3_512 *vh = hdr_priv(hdr);
+	uint8_t sha[SHA3_512_DIGEST_SIZE];
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = sha,
+	};
+
+	fio_sha3_512_init(&sha3_ctx);
+
+	return verify_io_u_sha3(hdr, vc, &sha3_ctx, vh->sha,
+				SHA3_512_DIGEST_SIZE, "sha3-512");
+}
+
 static int verify_io_u_sha512(struct verify_header *hdr, struct vcont *vc)
 {
 	void *p = io_u_verify_off(hdr, vc);
@@ -882,6 +973,18 @@ int verify_io_u(struct thread_data *td, struct io_u **io_u_ptr)
 		case VERIFY_SHA512:
 			ret = verify_io_u_sha512(hdr, &vc);
 			break;
+		case VERIFY_SHA3_224:
+			ret = verify_io_u_sha3_224(hdr, &vc);
+			break;
+		case VERIFY_SHA3_256:
+			ret = verify_io_u_sha3_256(hdr, &vc);
+			break;
+		case VERIFY_SHA3_384:
+			ret = verify_io_u_sha3_384(hdr, &vc);
+			break;
+		case VERIFY_SHA3_512:
+			ret = verify_io_u_sha3_512(hdr, &vc);
+			break;
 		case VERIFY_XXHASH:
 			ret = verify_io_u_xxhash(hdr, &vc);
 			break;
@@ -919,6 +1022,56 @@ static void fill_xxhash(struct verify_header *hdr, void *p, unsigned int len)
 	vh->hash = XXH32_digest(state);
 }
 
+static void fill_sha3(struct fio_sha3_ctx *sha3_ctx, void *p, unsigned int len)
+{
+	fio_sha3_update(sha3_ctx, p, len);
+	fio_sha3_final(sha3_ctx);
+}
+
+static void fill_sha3_224(struct verify_header *hdr, void *p, unsigned int len)
+{
+	struct vhdr_sha3_224 *vh = hdr_priv(hdr);
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = vh->sha,
+	};
+
+	fio_sha3_224_init(&sha3_ctx);
+	fill_sha3(&sha3_ctx, p, len);
+}
+
+static void fill_sha3_256(struct verify_header *hdr, void *p, unsigned int len)
+{
+	struct vhdr_sha3_256 *vh = hdr_priv(hdr);
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = vh->sha,
+	};
+
+	fio_sha3_256_init(&sha3_ctx);
+	fill_sha3(&sha3_ctx, p, len);
+}
+
+static void fill_sha3_384(struct verify_header *hdr, void *p, unsigned int len)
+{
+	struct vhdr_sha3_384 *vh = hdr_priv(hdr);
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = vh->sha,
+	};
+
+	fio_sha3_384_init(&sha3_ctx);
+	fill_sha3(&sha3_ctx, p, len);
+}
+
+static void fill_sha3_512(struct verify_header *hdr, void *p, unsigned int len)
+{
+	struct vhdr_sha3_512 *vh = hdr_priv(hdr);
+	struct fio_sha3_ctx sha3_ctx = {
+		.sha = vh->sha,
+	};
+
+	fio_sha3_512_init(&sha3_ctx);
+	fill_sha3(&sha3_ctx, p, len);
+}
+
 static void fill_sha512(struct verify_header *hdr, void *p, unsigned int len)
 {
 	struct vhdr_sha512 *vh = hdr_priv(hdr);
@@ -1085,6 +1238,26 @@ static void populate_hdr(struct thread_data *td, struct io_u *io_u,
 						io_u, hdr->len);
 		fill_sha512(hdr, data, data_len);
 		break;
+	case VERIFY_SHA3_224:
+		dprint(FD_VERIFY, "fill sha3-224 io_u %p, len %u\n",
+						io_u, hdr->len);
+		fill_sha3_224(hdr, data, data_len);
+		break;
+	case VERIFY_SHA3_256:
+		dprint(FD_VERIFY, "fill sha3-256 io_u %p, len %u\n",
+						io_u, hdr->len);
+		fill_sha3_256(hdr, data, data_len);
+		break;
+	case VERIFY_SHA3_384:
+		dprint(FD_VERIFY, "fill sha3-384 io_u %p, len %u\n",
+						io_u, hdr->len);
+		fill_sha3_384(hdr, data, data_len);
+		break;
+	case VERIFY_SHA3_512:
+		dprint(FD_VERIFY, "fill sha3-512 io_u %p, len %u\n",
+						io_u, hdr->len);
+		fill_sha3_512(hdr, data, data_len);
+		break;
 	case VERIFY_XXHASH:
 		dprint(FD_VERIFY, "fill xxhash io_u %p, len %u\n",
 						io_u, hdr->len);
diff --git a/verify.h b/verify.h
index deb161e..5aae2e7 100644
--- a/verify.h
+++ b/verify.h
@@ -20,6 +20,10 @@ enum {
 	VERIFY_CRC7,			/* crc7 sum data blocks */
 	VERIFY_SHA256,			/* sha256 sum data blocks */
 	VERIFY_SHA512,			/* sha512 sum data blocks */
+	VERIFY_SHA3_224,		/* sha3-224 sum data blocks */
+	VERIFY_SHA3_256,		/* sha3-256 sum data blocks */
+	VERIFY_SHA3_384,		/* sha3-384 sum data blocks */
+	VERIFY_SHA3_512,		/* sha3-512 sum data blocks */
 	VERIFY_XXHASH,			/* xxhash sum data blocks */
 	VERIFY_SHA1,			/* sha1 sum data blocks */
 	VERIFY_PATTERN,			/* verify specific patterns */
@@ -48,6 +52,18 @@ struct verify_header {
 struct vhdr_md5 {
 	uint32_t md5_digest[4];
 };
+struct vhdr_sha3_224 {
+	uint8_t sha[224 / 8];
+};
+struct vhdr_sha3_256 {
+	uint8_t sha[256 / 8];
+};
+struct vhdr_sha3_384 {
+	uint8_t sha[384 / 8];
+};
+struct vhdr_sha3_512 {
+	uint8_t sha[512 / 8];
+};
 struct vhdr_sha512 {
 	uint8_t sha512[128];
 };
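
Each of the four new vhdr_sha3_* structs sizes its digest array as bits / 8. Depending on the variant, the digest area of the verify header is therefore 28, 32, 48 or 64 bytes. The sketch below mirrors that layout and the per-type len that the new __hdr_size() cases select. It is a minimal illustration only. The SKETCH_* enum and sketch_sha3_hdr_size() are hypothetical stand-ins rather than fio's definitions, and the common verify header that fio also accounts for is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Digest holders laid out as in the patch: one byte per 8 bits of output. */
struct vhdr_sha3_224 { uint8_t sha[224 / 8]; };
struct vhdr_sha3_256 { uint8_t sha[256 / 8]; };
struct vhdr_sha3_384 { uint8_t sha[384 / 8]; };
struct vhdr_sha3_512 { uint8_t sha[512 / 8]; };

/* Hypothetical stand-in for fio's VERIFY_SHA3_* enum values. */
enum sketch_verify_type {
	SKETCH_SHA3_224,
	SKETCH_SHA3_256,
	SKETCH_SHA3_384,
	SKETCH_SHA3_512,
};

/* Digest-area size per verify type, like the cases added to __hdr_size(). */
static size_t sketch_sha3_hdr_size(enum sketch_verify_type t)
{
	switch (t) {
	case SKETCH_SHA3_224:
		return sizeof(struct vhdr_sha3_224);
	case SKETCH_SHA3_256:
		return sizeof(struct vhdr_sha3_256);
	case SKETCH_SHA3_384:
		return sizeof(struct vhdr_sha3_384);
	case SKETCH_SHA3_512:
		return sizeof(struct vhdr_sha3_512);
	}
	return 0;	/* unknown type */
}
```

The structs contain only uint8_t arrays, so they have byte alignment and no padding. sizeof therefore equals the digest length exactly.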

* Recent changes (master)
@ 2017-03-08 13:00 Jens Axboe
From: Jens Axboe @ 2017-03-08 13:00 UTC
  To: fio

The following changes since commit 4ed22fe532123f81e618269e9a77b7b41e0e9cad:

  fio: fix overflow trying to use 'd' suffix (2017-02-24 01:45:07 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8f7630813305a4f4f04a5f9ba20b2a7d486c0cfb:

  io_u: don't add slat samples if we are in ramp time (2017-03-07 10:18:53 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      io_u: don't add slat samples if we are in ramp time

 io_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 46d9731..e12382b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1994,7 +1994,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
  */
 void io_u_queued(struct thread_data *td, struct io_u *io_u)
 {
-	if (!td->o.disable_slat) {
+	if (!td->o.disable_slat && ramp_time_over(td)) {
 		unsigned long slat_time;
 
 		slat_time = utime_since(&io_u->start_time, &io_u->issue_time);
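
The change gates submission-latency (slat) sampling on ramp_time_over(). As a result, I/O issued during the ramp_time warm-up is no longer counted in the slat statistics. The following is a minimal sketch of that gating. The struct, its fields, and the sketch_* helpers are hypothetical simplifications, not fio's thread_data or its stats code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified per-job state. */
struct sketch_job {
	bool disable_slat;      /* mirrors the disable_slat option */
	uint64_t elapsed_usec;  /* time since the job started */
	uint64_t ramp_usec;     /* configured ramp_time, in microseconds */
	uint64_t slat_samples;  /* number of slat samples recorded */
	uint64_t slat_sum_usec; /* running total of recorded slat */
};

static bool sketch_ramp_time_over(const struct sketch_job *j)
{
	return j->elapsed_usec >= j->ramp_usec;
}

/* Record one submission latency unless disabled or still ramping up. */
static void sketch_io_queued(struct sketch_job *j, uint64_t slat_usec)
{
	if (!j->disable_slat && sketch_ramp_time_over(j)) {
		j->slat_samples++;
		j->slat_sum_usec += slat_usec;
	}
}
```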

* Recent changes (master)
@ 2017-02-25 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-25 13:00 UTC
  To: fio

The following changes since commit c9057434a99a85f643ce433c7fec9b8f7fad9761:

  Fio 2.18 (2017-02-23 08:44:32 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4ed22fe532123f81e618269e9a77b7b41e0e9cad:

  fio: fix overflow trying to use 'd' suffix (2017-02-24 01:45:07 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      fio: fix overflow trying to use 'd' suffix

 parse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/parse.c b/parse.c
index fc508b6..fd5605f 100644
--- a/parse.c
+++ b/parse.c
@@ -167,7 +167,7 @@ static unsigned long long get_mult_time(const char *str, int len,
 	else if (!strcmp("h", c))
 		mult = 60 * 60 * 1000000UL;
 	else if (!strcmp("d", c))
-		mult = 24 * 60 * 60 * 1000000UL;
+		mult = 24 * 60 * 60 * 1000000ULL;
 
 	free(c);
 	return mult;
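
The 'd' multiplier is 24 * 60 * 60 * 1000000 = 86,400,000,000 microseconds, which is larger than 2^32. With the UL suffix the product is computed in unsigned long. That type is only 32 bits wide on 32-bit builds and on LLP64 Windows, so the result wraps. The ULL suffix guarantees at least 64 bits. The sketch below reproduces the wrap by forcing 32-bit arithmetic with uint32_t rather than depending on the host's unsigned long. The function names are illustrative only.

```c
#include <stdint.h>

/* Microseconds per day, computed as the fixed code does (at least 64-bit). */
static unsigned long long usec_per_day_ull(void)
{
	return 24 * 60 * 60 * 1000000ULL;
}

/*
 * Same product in 32-bit unsigned arithmetic, which is what 1000000UL
 * yields where unsigned long is 32 bits wide. Unsigned overflow is well
 * defined in C: the result wraps modulo 2^32.
 */
static uint32_t usec_per_day_32bit(void)
{
	uint32_t secs = 24 * 60 * 60;

	return secs * (uint32_t)1000000;
}
```

The wrapped value is 86,400,000,000 - 20 * 4,294,967,296 = 500,654,080 microseconds. That is roughly 8.3 minutes instead of a day.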

* Recent changes (master)
@ 2017-02-24 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-24 13:00 UTC
  To: fio

The following changes since commit 484817a95764dc7ba83b9f16c417bd727813043a:

  configure: disable compile time asserts for !opt and !static_assert (2017-02-22 09:00:06 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c9057434a99a85f643ce433c7fec9b8f7fad9761:

  Fio 2.18 (2017-02-23 08:44:32 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.18

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index e2d8a43..570e21f 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.17
+DEF_VER=fio-2.18
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 9e776de..bb2b90b 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.17">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.18">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"

* Recent changes (master)
@ 2017-02-23 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-23 13:00 UTC
  To: fio

The following changes since commit 8983ce71bc3aa4076cb0c9f2b5c3b73ab7c7de93:

  init: use 'bool' for get_new_job() (2017-02-21 20:53:48 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 484817a95764dc7ba83b9f16c417bd727813043a:

  configure: disable compile time asserts for !opt and !static_assert (2017-02-22 09:00:06 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      configure: disable compile time asserts for !opt and !static_assert

 compiler/compiler.h | 8 +++++++-
 configure           | 3 +++
 2 files changed, 10 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/compiler/compiler.h b/compiler/compiler.h
index a6a7432..20df21d 100644
--- a/compiler/compiler.h
+++ b/compiler/compiler.h
@@ -38,10 +38,12 @@
 #if defined(CONFIG_STATIC_ASSERT)
 #define compiletime_assert(condition, msg) _Static_assert(condition, msg)
 
-#else
+#elif !defined(CONFIG_DISABLE_OPTIMIZATIONS)
+
 #ifndef __compiletime_error
 #define __compiletime_error(message)
 #endif
+
 #ifndef __compiletime_error_fallback
 #define __compiletime_error_fallback(condition)	do { } while (0)
 #endif
@@ -61,6 +63,10 @@
 #define compiletime_assert(condition, msg) \
 	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 
+#else
+
+#define compiletime_assert(condition, msg)	do { } while (0)
+
 #endif
 
 #endif
diff --git a/configure b/configure
index 44d215f..15b87fa 100755
--- a/configure
+++ b/configure
@@ -2041,6 +2041,9 @@ fi
 if test "$have_bool" = "yes" ; then
   output_sym "CONFIG_HAVE_BOOL"
 fi
+if test "$disable_opt" = "yes" ; then
+  output_sym "CONFIG_DISABLE_OPTIMIZATIONS"
+fi
 
 if test "$zlib" = "no" ; then
   echo "Consider installing zlib-dev (zlib-devel), some fio features depend on it."
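
After this change compiletime_assert() has three tiers:

1. C11 _Static_assert, when configure detects support for it.
2. The __compiletime_error based fallback, used otherwise.
3. A do { } while (0) no-op, used when optimizations are disabled.

The patch does not say why the fallback needs optimization. Presumably it relies on the compiler discarding the call to an error-attributed function when the condition holds, and an unoptimized build would keep that call. The sketch below shows the same tiering, but it swaps in a different, hypothetical fallback: the classic negative-size array trick. It illustrates the idea rather than reproducing fio's macro. The SKETCH_* names are made up, and SKETCH_ASSERT is meant for use inside a function body, as fio's no-op form is.

```c
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* Tier 1: C11 and later provide _Static_assert directly. */
#define SKETCH_ASSERT(cond, msg) _Static_assert(cond, msg)
#elif !defined(SKETCH_DISABLE_ASSERTS)
/* Tier 2 (hypothetical fallback): a negative array size fails to compile. */
#define SKETCH_CAT_(a, b) a##b
#define SKETCH_CAT(a, b) SKETCH_CAT_(a, b)
#define SKETCH_ASSERT(cond, msg) \
	typedef char SKETCH_CAT(sketch_assert_, __LINE__)[(cond) ? 1 : -1]
#else
/* Tier 3: give up and do nothing, like fio's no-op branch. */
#define SKETCH_ASSERT(cond, msg) do { } while (0)
#endif

/* Usage: refuse to build if a wire-format type has an unexpected size. */
static int sketch_check_sizes(void)
{
	SKETCH_ASSERT(sizeof(uint64_t) == 8, "uint64_t must be 8 bytes");
	return 1;
}
```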

* Recent changes (master)
@ 2017-02-22 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-22 13:00 UTC
  To: fio

The following changes since commit 349cdc89ffebb8f3a9cf3ff5be6a9934b94f1b05:

  appveyor: add CI building of the Windows version of fio (2017-02-20 07:19:20 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8983ce71bc3aa4076cb0c9f2b5c3b73ab7c7de93:

  init: use 'bool' for get_new_job() (2017-02-21 20:53:48 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      init: exit on failure to add all jobs
      init: use 'bool' for get_new_job()

 init.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 0a2ace1..fabc887 100644
--- a/init.c
+++ b/init.c
@@ -435,8 +435,8 @@ static void copy_opt_list(struct thread_data *dst, struct thread_data *src)
 /*
  * Return a free job structure.
  */
-static struct thread_data *get_new_job(int global, struct thread_data *parent,
-				       int preserve_eo, const char *jobname)
+static struct thread_data *get_new_job(bool global, struct thread_data *parent,
+				       bool preserve_eo, const char *jobname)
 {
 	struct thread_data *td;
 
@@ -1560,7 +1560,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	 */
 	numjobs = o->numjobs;
 	while (--numjobs) {
-		struct thread_data *td_new = get_new_job(0, td, 1, jobname);
+		struct thread_data *td_new = get_new_job(false, td, true, jobname);
 
 		if (!td_new)
 			goto err;
@@ -1621,11 +1621,11 @@ void add_job_opts(const char **o, int client_type)
 			sprintf(jobname, "%s", o[i] + 5);
 		}
 		if (in_global && !td_parent)
-			td_parent = get_new_job(1, &def_thread, 0, jobname);
+			td_parent = get_new_job(true, &def_thread, false, jobname);
 		else if (!in_global && !td) {
 			if (!td_parent)
 				td_parent = &def_thread;
-			td = get_new_job(0, td_parent, 0, jobname);
+			td = get_new_job(false, td_parent, false, jobname);
 		}
 		if (in_global)
 			fio_options_parse(td_parent, (char **) &o[i], 1);
@@ -1677,7 +1677,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 		char *file, int is_buf, int stonewall_flag, int type,
 		int nested, char *name, char ***popts, int *aopts, int *nopts)
 {
-	unsigned int global = 0;
+	bool global = false;
 	char *string;
 	FILE *f;
 	char *p;
@@ -1786,7 +1786,7 @@ static int __parse_jobs_ini(struct thread_data *td,
 				first_sect = 0;
 			}
 
-			td = get_new_job(global, &def_thread, 0, name);
+			td = get_new_job(global, &def_thread, false, name);
 			if (!td) {
 				ret = 1;
 				break;
@@ -2475,7 +2475,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				if (is_section && skip_this_section(val))
 					continue;
 
-				td = get_new_job(global, &def_thread, 1, NULL);
+				td = get_new_job(global, &def_thread, true, NULL);
 				if (!td || ioengine_load(td)) {
 					if (td) {
 						put_job(td);
@@ -2713,7 +2713,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 		if (!ret) {
 			ret = add_job(td, td->o.name ?: "fio", 0, 0, client_type);
 			if (ret)
-				did_arg = 1;
+				exit(1);
 		}
 	}
 

* Recent changes (master)
@ 2017-02-21 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-21 13:00 UTC
  To: fio

The following changes since commit dbd39049ca473de7dc031cd9bf3efe992834323f:

  Revert "configure: Drop default CONFIG_LITTLE_ENDIAN for Cygwin" (2017-02-19 17:57:43 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 349cdc89ffebb8f3a9cf3ff5be6a9934b94f1b05:

  appveyor: add CI building of the Windows version of fio (2017-02-20 07:19:20 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      appveyor: add CI building of the Windows version of fio

 appveyor.yml | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 appveyor.yml

---

Diff of recent changes:

diff --git a/appveyor.yml b/appveyor.yml
new file mode 100644
index 0000000..7543393
--- /dev/null
+++ b/appveyor.yml
@@ -0,0 +1,27 @@
+clone_depth: 50
+environment:
+  MAKEFLAGS: -j 2
+  matrix:
+    - platform: x86_64
+      BUILD_ARCH: x64
+      CYG_ROOT: C:\cygwin64
+      CONFIGURE_OPTIONS:
+    - platform: x86
+      BUILD_ARCH: x86
+      CYG_ROOT: C:\cygwin
+      CONFIGURE_OPTIONS: --build-32bit-win
+
+build_script:
+  - SET PATH=%CYG_ROOT%\bin;%PATH%
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && ./configure ${CONFIGURE_OPTIONS} && make.exe'
+
+after_build:
+  - cd os\windows && dobuild.cmd %BUILD_ARCH%
+
+test_script:
+  - SET PATH=%CYG_ROOT%\bin;%PATH%
+  - 'bash.exe -lc "cd \"${APPVEYOR_BUILD_FOLDER}\" && file.exe fio.exe && make.exe test'
+
+artifacts:
+  - path: os\windows\*.msi
+    name: msi

* Recent changes (master)
@ 2017-02-20 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-20 13:00 UTC
  To: fio

The following changes since commit a0cb220bbb28c68fab0b175d01dcfce38a6f835c:

  Revert "Always set ->real_file_size to -1 when failed to get file size" (2017-02-17 13:30:43 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dbd39049ca473de7dc031cd9bf3efe992834323f:

  Revert "configure: Drop default CONFIG_LITTLE_ENDIAN for Cygwin" (2017-02-19 17:57:43 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (6):
      Rename thread_options' ->io_limit to io_size
      Avoid irrelevant "offset extend ends" error message
      Add details of file number/size related options to HOWTO
      Remove irrelevant Cygwin config flag CONFIG_FADVISE
      Silence Cygwin warning "'vsprintf_s' redeclared without dllimport..."
      Revert "configure: Drop default CONFIG_LITTLE_ENDIAN for Cygwin"

 HOWTO              | 43 +++++++++++++++++++++++++++++--------------
 backend.c          | 14 +++++++-------
 cconv.c            |  4 ++--
 configure          |  3 +--
 file.h             |  1 +
 filesetup.c        | 54 +++++++++++++++++++++++++++++++++++++++++++++++-------
 fio.h              |  3 +++
 io_u.c             |  1 +
 options.c          |  5 ++++-
 os/windows/posix.c |  6 ------
 thread_options.h   |  4 ++--
 11 files changed, 97 insertions(+), 41 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index f44c626..a72d868 100644
--- a/HOWTO
+++ b/HOWTO
@@ -631,7 +631,8 @@ Job description
 
 .. option:: numjobs=int
 
-	Create the specified number of clones of this job. May be used to setup a
+	Create the specified number of clones of this job. Each clone of job
+	is spawned as an independent thread or process. May be used to setup a
 	larger number of threads/processes doing the same thing. Each thread is
 	reported separately; to see statistics for all clones as a whole, use
 	:option:`group_reporting` in conjunction with :option:`new_group`.
@@ -732,11 +733,15 @@ Target file/device
 
 	Fio normally makes up a `filename` based on the job name, thread number, and
 	file number. If you want to share files between threads in a job or several
-	jobs, specify a `filename` for each of them to override the default.  If the
-	ioengine is file based, you can specify a number of files by separating the
-	names with a ':' colon. So if you wanted a job to open :file:`/dev/sda` and
-	:file:`/dev/sdb` as the two working files, you would use
-	``filename=/dev/sda:/dev/sdb``.
+	jobs with fixed file paths, specify a `filename` for each of them to override
+	the default. If the ioengine is file based, you can specify a number of files
+	by separating the names with a ':' colon. So if you wanted a job to open
+	:file:`/dev/sda` and :file:`/dev/sdb` as the two working files, you would use
+	``filename=/dev/sda:/dev/sdb``. This also means that whenever this option is
+	specified, :option:`nrfiles` is ignored. The size of regular files specified
+	by this option will be :option:`size` divided by number of files unless
+	explicit size is specified by :option:`filesize`.
+
 	On Windows, disk devices are accessed as :file:`\\\\.\\PhysicalDrive0` for
 	the first device, :file:`\\\\.\\PhysicalDrive1` for the second etc.
 	Note: Windows and FreeBSD prevent write access to areas
@@ -798,7 +803,12 @@ Target file/device
 
 .. option:: nrfiles=int
 
-	Number of files to use for this job. Defaults to 1.
+	Number of files to use for this job. Defaults to 1. The size of files
+	will be :option:`size` divided by this unless explicit size is specified by
+	:option:`filesize`. Files are created for each thread separately, and each
+	file will have a file number within its name by default, as explained in
+	:option:`filename` section.
+
 
 .. option:: openfiles=int
 
@@ -1497,12 +1507,14 @@ I/O size
 
 .. option:: size=int
 
-	The total size of file I/O for this job. Fio will run until this many bytes
-	has been transferred, unless runtime is limited by other options (such as
-	:option:`runtime`, for instance, or increased/decreased by
-	:option:`io_size`). Unless specific :option:`nrfiles` and :option:`filesize`
-	options are given, fio will divide this size between the available files
-	specified by the job. If not set, fio will use the full size of the given
+	The total size of file I/O for each thread of this job. Fio will run until
+	this many bytes has been transferred, unless runtime is limited by other options
+	(such as :option:`runtime`, for instance, or increased/decreased by :option:`io_size`).
+	Fio will divide this size between the available files determined by options
+	such as :option:`nrfiles`, :option:`filename`, unless :option:`filesize` is
+	specified by the job. If the result of division happens to be 0, the size is
+	set to the physical size of the given files or devices.
+	If this option is not specified, fio will use the full size of the given
 	files or devices.  If the files do not exist, size must be given. It is also
 	possible to give size as a percentage between 1 and 100. If ``size=20%`` is
 	given, fio will use 20% of the full size of the given files or devices.
@@ -1526,6 +1538,8 @@ I/O size
 	Individual file sizes. May be a range, in which case fio will select sizes
 	for files at random within the given range and limited to :option:`size` in
 	total (if that is given). If not given, each created file is the same size.
+	This option overrides :option:`size` in terms of file size, which means
+	this value is used as a fixed size or possible range of each file.
 
 .. option:: file_append=bool
 
@@ -2108,7 +2122,8 @@ Threads, processes and job synchronization
 .. option:: thread
 
 	Fio defaults to forking jobs, however if this option is given, fio will use
-	:manpage:`pthread_create(3)` to create threads instead.
+	POSIX Threads function :manpage:`pthread_create(3)` to create threads instead
+	of forking processes.
 
 .. option:: wait_for=str
 
diff --git a/backend.c b/backend.c
index 1c1f2f9..4bc00e6 100644
--- a/backend.c
+++ b/backend.c
@@ -776,8 +776,8 @@ static bool io_bytes_exceeded(struct thread_data *td, uint64_t *this_bytes)
 	else
 		bytes = this_bytes[DDIR_TRIM];
 
-	if (td->o.io_limit)
-		limit = td->o.io_limit;
+	if (td->o.io_size)
+		limit = td->o.io_size;
 	else
 		limit = td->o.size;
 
@@ -851,11 +851,11 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 
 	total_bytes = td->o.size;
 	/*
-	* Allow random overwrite workloads to write up to io_limit
+	* Allow random overwrite workloads to write up to io_size
 	* before starting verification phase as 'size' doesn't apply.
 	*/
 	if (td_write(td) && td_random(td) && td->o.norandommap)
-		total_bytes = max(total_bytes, (uint64_t) td->o.io_limit);
+		total_bytes = max(total_bytes, (uint64_t) td->o.io_size);
 	/*
 	 * If verify_backlog is enabled, we'll run the verify in this
 	 * handler as well. For that case, we may need up to twice the
@@ -1355,8 +1355,8 @@ static bool keep_running(struct thread_data *td)
 	if (exceeds_number_ios(td))
 		return false;
 
-	if (td->o.io_limit)
-		limit = td->o.io_limit;
+	if (td->o.io_size)
+		limit = td->o.io_size;
 	else
 		limit = td->o.size;
 
@@ -1371,7 +1371,7 @@ static bool keep_running(struct thread_data *td)
 		if (diff < td_max_bs(td))
 			return false;
 
-		if (fio_files_done(td) && !td->o.io_limit)
+		if (fio_files_done(td) && !td->o.io_size)
 			return false;
 
 		return true;
diff --git a/cconv.c b/cconv.c
index 0c11629..b329bf4 100644
--- a/cconv.c
+++ b/cconv.c
@@ -97,7 +97,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min);
 	o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max);
 	o->size = le64_to_cpu(top->size);
-	o->io_limit = le64_to_cpu(top->io_limit);
+	o->io_size = le64_to_cpu(top->io_size);
 	o->size_percent = le32_to_cpu(top->size_percent);
 	o->fill_device = le32_to_cpu(top->fill_device);
 	o->file_append = le32_to_cpu(top->file_append);
@@ -521,7 +521,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	memcpy(top->buffer_pattern, o->buffer_pattern, MAX_PATTERN_SIZE);
 
 	top->size = __cpu_to_le64(o->size);
-	top->io_limit = __cpu_to_le64(o->io_limit);
+	top->io_size = __cpu_to_le64(o->io_size);
 	top->verify_backlog = __cpu_to_le64(o->verify_backlog);
 	top->start_delay = __cpu_to_le64(o->start_delay);
 	top->start_delay_high = __cpu_to_le64(o->start_delay_high);
diff --git a/configure b/configure
index be29db9..44d215f 100755
--- a/configure
+++ b/configure
@@ -306,14 +306,13 @@ CYGWIN*)
       fi
     fi
   fi
+  output_sym "CONFIG_LITTLE_ENDIAN"
   if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
     output_sym "CONFIG_32BIT"
   else
     output_sym "CONFIG_64BIT_LLP64"
   fi
-  output_sym "CONFIG_FADVISE"
   output_sym "CONFIG_SOCKLEN_T"
-  output_sym "CONFIG_FADVISE"
   output_sym "CONFIG_SFAA"
   output_sym "CONFIG_RUSAGE_THREAD"
   output_sym "CONFIG_WINDOWSAIO"
diff --git a/file.h b/file.h
index ac00ff8..611470c 100644
--- a/file.h
+++ b/file.h
@@ -90,6 +90,7 @@ struct fio_file {
 
 	/*
 	 * size of the file, offset into file, and io size from that offset
+	 * (be aware io_size is different from thread_options::io_size)
 	 */
 	uint64_t real_file_size;
 	uint64_t file_offset;
diff --git a/filesetup.c b/filesetup.c
index e2585ee..793b08d 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -130,6 +130,9 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	}
 #endif /* CONFIG_POSIX_FALLOCATE */
 
+	/*
+	 * If our jobs don't require regular files initially, we're done.
+	 */
 	if (!new_layout)
 		goto done;
 
@@ -374,12 +377,26 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 		ret = bdev_size(td, f);
 	else if (f->filetype == FIO_TYPE_CHAR)
 		ret = char_size(td, f);
-	else
+	else {
 		f->real_file_size = -1;
+		log_info("%s: failed to get file size of %s\n", td->o.name,
+					f->file_name);
+		return 1; /* avoid offset extends end error message */
+	}
 
+	/*
+	 * Leave ->real_file_size with 0 since it could be expectation
+	 * of initial setup for regular files.
+	 */
 	if (ret)
 		return ret;
 
+	/*
+	 * ->file_offset normally hasn't been initialized yet, so this
+	 * is basically always false unless ->real_file_size is -1, but
+	 * if ->real_file_size is -1 this message doesn't make sense.
+	 * As a result, this message is basically useless.
+	 */
 	if (f->file_offset > f->real_file_size) {
 		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,
 					(unsigned long long) f->file_offset,
@@ -649,6 +666,10 @@ open_again:
 	return 0;
 }
 
+/*
+ * This function i.e. get_file_size() is the default .get_file_size
+ * implementation of majority of I/O engines.
+ */
 int generic_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	return get_file_size(td, f);
@@ -676,6 +697,13 @@ static int get_file_sizes(struct thread_data *td)
 			clear_error(td);
 		}
 
+		/*
+		 * There are corner cases where we end up with -1 for
+		 * ->real_file_size due to unsupported file type, etc.
+		 * We then just set to size option value divided by number
+		 * of files, similar to the way file ->io_size is set.
+		 * stat(2) failure doesn't set ->real_file_size to -1.
+		 */
 		if (f->real_file_size == -1ULL && td->o.size)
 			f->real_file_size = td->o.size / td->o.nr_files;
 	}
@@ -792,7 +820,9 @@ int setup_files(struct thread_data *td)
 		goto done;
 
 	/*
-	 * if ioengine defines a setup() method, it's responsible for
+	 * Find out physical size of files or devices for this thread,
+	 * before we determine I/O size and range of our targets.
+	 * If ioengine defines a setup() method, it's responsible for
 	 * opening the files and setting f->real_file_size to indicate
 	 * the valid range for that file.
 	 */
@@ -833,7 +863,7 @@ int setup_files(struct thread_data *td)
 
 	/*
 	 * Calculate per-file size and potential extra size for the
-	 * first files, if needed.
+	 * first files, if needed (i.e. if we don't have a fixed size).
 	 */
 	if (!o->file_size_low && o->nr_files) {
 		uint64_t all_fs;
@@ -855,9 +885,17 @@ int setup_files(struct thread_data *td)
 	for_each_file(td, f, i) {
 		f->file_offset = get_start_offset(td, f);
 
+		/*
+		 * Update ->io_size depending on options specified.
+		 * ->file_size_low being 0 means filesize option isn't set.
+		 * Non zero ->file_size_low equals ->file_size_high means
+		 * filesize option is set in a fixed size format.
+		 * Non zero ->file_size_low not equals ->file_size_high means
+		 * filesize option is set in a range format.
+		 */
 		if (!o->file_size_low) {
 			/*
-			 * no file size range given, file size is equal to
+			 * no file size or range given, file size is equal to
 			 * total size divided by number of files. If that is
 			 * zero, set it to the real file size. If the size
 			 * doesn't divide nicely with the min blocksize,
@@ -940,7 +978,9 @@ int setup_files(struct thread_data *td)
 	}
 
 	/*
-	 * See if we need to extend some files
+	 * See if we need to extend some files, typically needed when our
+	 * target regular files don't exist yet, but our jobs require them
+	 * initially due to read I/Os.
 	 */
 	if (need_extend) {
 		temp_stall_ts = 1;
@@ -993,8 +1033,8 @@ int setup_files(struct thread_data *td)
 	 * stored entries.
 	 */
 	if (!o->read_iolog_file) {
-		if (o->io_limit)
-			td->total_io_size = o->io_limit * o->loops;
+		if (o->io_size)
+			td->total_io_size = o->io_size * o->loops;
 		else
 			td->total_io_size = o->size * o->loops;
 	}
diff --git a/fio.h b/fio.h
index 19ac0af..b2f0e2f 100644
--- a/fio.h
+++ b/fio.h
@@ -646,6 +646,9 @@ extern void lat_target_check(struct thread_data *);
 extern void lat_target_init(struct thread_data *);
 extern void lat_target_reset(struct thread_data *);
 
+/*
+ * Iterates all threads/processes within all the defined jobs
+ */
 #define for_each_td(td, i)	\
 	for ((i) = 0, (td) = &threads[0]; (i) < (int) thread_number; (i)++, (td)++)
 #define for_each_file(td, f, i)	\
diff --git a/io_u.c b/io_u.c
index 69bec4b..46d9731 100644
--- a/io_u.c
+++ b/io_u.c
@@ -62,6 +62,7 @@ static uint64_t last_block(struct thread_data *td, struct fio_file *f,
 
 	/*
 	 * Hmm, should we make sure that ->io_size <= ->real_file_size?
+	 * -> not for now since there is code assuming it could go either.
 	 */
 	max_size = f->io_size;
 	if (max_size > f->real_file_size)
diff --git a/options.c b/options.c
index 1fa99b6..a543e5a 100644
--- a/options.c
+++ b/options.c
@@ -1233,6 +1233,9 @@ static int str_filename_cb(void *data, const char *input)
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
+	/*
+	 * Ignore what we may already have from nrfiles option.
+	 */
 	if (!td->files_index)
 		td->o.nr_files = 0;
 
@@ -1882,7 +1885,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "io_limit",
 		.lname	= "IO Size",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= offsetof(struct thread_options, io_limit),
+		.off1	= offsetof(struct thread_options, io_size),
 		.help	= "Total size of I/O to be performed",
 		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
diff --git a/os/windows/posix.c b/os/windows/posix.c
index f468cbf..eae8c86 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -40,12 +40,6 @@ HRESULT WINAPI StringCchPrintfA(
   const char *pszFormat,
   ...);
 
-int vsprintf_s(
-  char *buffer,
-  size_t numberOfElements,
-  const char *format,
-  va_list argptr);
-
 int win_to_posix_error(DWORD winerr)
 {
 	switch (winerr)
diff --git a/thread_options.h b/thread_options.h
index dd5b9ef..5e72867 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -68,7 +68,7 @@ struct thread_options {
 	unsigned int unique_filename;
 
 	unsigned long long size;
-	unsigned long long io_limit;
+	unsigned long long io_size;
 	unsigned int size_percent;
 	unsigned int fill_device;
 	unsigned int file_append;
@@ -338,7 +338,7 @@ struct thread_options_pack {
 	uint32_t __proper_alignment_for_64b;
 
 	uint64_t size;
-	uint64_t io_limit;
+	uint64_t io_size;
 	uint32_t size_percent;
 	uint32_t fill_device;
 	uint32_t file_append;

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2017-02-18 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-02-18 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit b3e7e59290577696dc76c08651cce1e121cff64a:

  Merge branch 'travis_osx' of https://github.com/sitsofe/fio (2017-02-16 09:00:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a0cb220bbb28c68fab0b175d01dcfce38a6f835c:

  Revert "Always set ->real_file_size to -1 when failed to get file size" (2017-02-17 13:30:43 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Revert "Always set ->real_file_size to -1 when failed to get file size"

 filesetup.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index e9976eb..e2585ee 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -375,12 +375,10 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 	else if (f->filetype == FIO_TYPE_CHAR)
 		ret = char_size(td, f);
 	else
-		f->real_file_size = -1ULL;
+		f->real_file_size = -1;
 
-	if (ret) {
-		f->real_file_size = -1ULL;
+	if (ret)
 		return ret;
-	}
 
 	if (f->file_offset > f->real_file_size) {
 		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,


* Recent changes (master)
@ 2017-02-17 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-02-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 5c8e84cab3bca39de54a69092473f000f8a57f40:

  Explicitly check td_trim(td) for the direction of next io_u (2017-02-15 13:52:24 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b3e7e59290577696dc76c08651cce1e121cff64a:

  Merge branch 'travis_osx' of https://github.com/sitsofe/fio (2017-02-16 09:00:08 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Merge branch 'travis_osx' of https://github.com/sitsofe/fio

Sitsofe Wheeler (4):
      Makefile: add test using the null ioengine
      travis: run parallel make jobs
      travis: prepare for additional builds
      travis: add OS X builds

 .travis.yml | 25 +++++++++++++++++++++++--
 Makefile    |  5 +++--
 2 files changed, 26 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index bf0433d..ca50e22 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,7 +1,28 @@
 language: c
+os:
+  - linux
 compiler:
   - clang
   - gcc
+env:
+  global:
+    - MAKEFLAGS="-j 2"
+matrix:
+  include:
+    - os: osx
+      compiler: clang # Workaround travis setting CC=["clang", "gcc"]
+    # Build using the 10.12 SDK but target and run on OSX 10.11
+#   - os: osx
+#     compiler: clang
+#     osx_image: xcode8
+#     env: SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk MACOSX_DEPLOYMENT_TARGET=10.11
+    # Build on the latest OSX version (will eventually become obsolete)
+    - os: osx
+      compiler: clang
+      osx_image: xcode8.2
+  exclude:
+    - os: osx
+      compiler: gcc
 before_install:
-  - sudo apt-get -qq update
-  - sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev
+  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get -qq update; fi
+  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev; fi
diff --git a/Makefile b/Makefile
index a2842a0..4112410 100644
--- a/Makefile
+++ b/Makefile
@@ -303,7 +303,7 @@ endif
 
 all: $(PROGS) $(T_TEST_PROGS) $(SCRIPTS) FORCE
 
-.PHONY: all install clean
+.PHONY: all install clean test
 .PHONY: FORCE cscope
 
 FIO-VERSION-FILE: FORCE
@@ -448,7 +448,8 @@ doc: tools/plot/fio2gnuplot.1
 	@man -t tools/plot/fio2gnuplot.1 | ps2pdf - fio2gnuplot.pdf
 	@man -t tools/hist/fiologparser_hist.py.1 | ps2pdf - fiologparser_hist.pdf
 
-test:
+test: fio
+	./fio --minimal --ioengine=null --runtime=1s --name=nulltest --rw=randrw --iodepth=2 --norandommap --random_generator=tausworthe64 --size=16T --name=verifynulltest --rw=write --verify=crc32c --verify_state_save=0 --size=100M
 
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)


* Recent changes (master)
@ 2017-02-16 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-02-16 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c1f50f765a0a51931605c1fb223d166e3b3a93c6:

  Use 0 instead of DDIR_READ to iterate from 0 to DDIR_RWDIR_CNT (2017-02-14 08:24:24 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5c8e84cab3bca39de54a69092473f000f8a57f40:

  Explicitly check td_trim(td) for the direction of next io_u (2017-02-15 13:52:24 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      Drop obsolete comment on a race condition
      Explicitly check td_trim(td) for the direction of next io_u

 filesetup.c | 3 ---
 io_u.c      | 4 +++-
 2 files changed, 3 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 3fa8b32..e9976eb 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -500,9 +500,6 @@ int file_lookup_open(struct fio_file *f, int flags)
 	__f = lookup_file_hash(f->file_name);
 	if (__f) {
 		dprint(FD_FILE, "found file in hash %s\n", f->file_name);
-		/*
-		 * racy, need the __f->lock locked
-		 */
 		f->lock = __f->lock;
 		from_hash = 1;
 	} else {
diff --git a/io_u.c b/io_u.c
index f1a3916..69bec4b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -758,8 +758,10 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 		ddir = DDIR_READ;
 	else if (td_write(td))
 		ddir = DDIR_WRITE;
-	else
+	else if (td_trim(td))
 		ddir = DDIR_TRIM;
+	else
+		ddir = DDIR_INVAL;
 
 	td->rwmix_ddir = rate_ddir(td, ddir);
 	return td->rwmix_ddir;


* Recent changes (master)
@ 2017-02-15 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-02-15 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 06cbb3c71fc75dbeddebb53c8f0a2ea95dc28228:

  Windows: re-enable the mmap ioengine and fix static asserts (2017-02-13 15:38:59 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c1f50f765a0a51931605c1fb223d166e3b3a93c6:

  Use 0 instead of DDIR_READ to iterate from 0 to DDIR_RWDIR_CNT (2017-02-14 08:24:24 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (6):
      configure: Drop default CONFIG_LITTLE_ENDIAN for Cygwin
      configure: Use x86 instead of i386 for $cpu for IA32
      Drop conditional declaration of disk_list
      Always set ->real_file_size to -1 when failed to get file size
      Add missing "rand"/"trimwrite" strings to corresponding ddir slots
      Use 0 instead of DDIR_READ to iterate from 0 to DDIR_RWDIR_CNT

 configure     | 3 +--
 diskutil.c    | 2 --
 eta.c         | 6 +++---
 filesetup.c   | 6 ++++--
 io_ddir.h     | 4 ++--
 io_u.c        | 6 +++---
 libfio.c      | 5 -----
 stat.c        | 4 ++--
 steadystate.c | 2 +-
 9 files changed, 16 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 0a258bf..be29db9 100755
--- a/configure
+++ b/configure
@@ -306,7 +306,6 @@ CYGWIN*)
       fi
     fi
   fi
-  output_sym "CONFIG_LITTLE_ENDIAN"
   if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
     output_sym "CONFIG_32BIT"
   else
@@ -379,7 +378,7 @@ case "$cpu" in
     cpu="$cpu"
   ;;
   i386|i486|i586|i686|i86pc|BePC)
-    cpu="i386"
+    cpu="x86"
   ;;
   x86_64|amd64)
     cpu="x86_64"
diff --git a/diskutil.c b/diskutil.c
index c3bcec9..dca3748 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -18,8 +18,6 @@ static struct disk_util *last_du;
 
 static struct fio_mutex *disk_util_mutex;
 
-FLIST_HEAD(disk_list);
-
 static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 		int majdev, int mindev, char *path);
 
diff --git a/eta.c b/eta.c
index 1d66163..adf7f94 100644
--- a/eta.c
+++ b/eta.c
@@ -440,7 +440,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		if (td->runstate > TD_SETTING_UP) {
 			int ddir;
 
-			for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+			for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 				if (unified_rw_rep) {
 					io_bytes[0] += td->io_bytes[ddir];
 					io_iops[0] += td->io_blocks[ddir];
@@ -574,7 +574,7 @@ void display_thread_status(struct jobs_eta *je)
 			sprintf(perc_str, "%3.1f%%", perc);
 		}
 
-		for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			rate_str[ddir] = num2str(je->rate[ddir], 4,
 						1024, je->is_pow2, je->unit_base);
 			iops_str[ddir] = num2str(je->iops[ddir], 4, 1, 0, N2S_NONE);
@@ -601,7 +601,7 @@ void display_thread_status(struct jobs_eta *je)
 			p += sprintf(p, "%*s", linelen_last - l, "");
 		linelen_last = l;
 
-		for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			free(rate_str[ddir]);
 			free(iops_str[ddir]);
 		}
diff --git a/filesetup.c b/filesetup.c
index eb28826..3fa8b32 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -375,10 +375,12 @@ static int get_file_size(struct thread_data *td, struct fio_file *f)
 	else if (f->filetype == FIO_TYPE_CHAR)
 		ret = char_size(td, f);
 	else
-		f->real_file_size = -1;
+		f->real_file_size = -1ULL;
 
-	if (ret)
+	if (ret) {
+		f->real_file_size = -1ULL;
 		return ret;
+	}
 
 	if (f->file_offset > f->real_file_size) {
 		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,
diff --git a/io_ddir.h b/io_ddir.h
index 2141119..613d5fb 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -61,9 +61,9 @@ static inline int ddir_rw(enum fio_ddir ddir)
 
 static inline const char *ddir_str(enum td_ddir ddir)
 {
-	static const char *__str[] = { NULL, "read", "write", "rw", NULL,
+	static const char *__str[] = { NULL, "read", "write", "rw", "rand",
 				"randread", "randwrite", "randrw",
-				"trim", NULL, NULL, NULL, "randtrim" };
+				"trim", NULL, "trimwrite", NULL, "randtrim" };
 
 	return __str[ddir];
 }
diff --git a/io_u.c b/io_u.c
index 1daaf7b..f1a3916 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1902,7 +1902,7 @@ static void init_icd(struct thread_data *td, struct io_completion_data *icd,
 	icd->nr = nr;
 
 	icd->error = 0;
-	for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++)
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
 		icd->bytes_done[ddir] = 0;
 }
 
@@ -1941,7 +1941,7 @@ int io_u_sync_complete(struct thread_data *td, struct io_u *io_u)
 		return -1;
 	}
 
-	for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++)
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
 		td->bytes_done[ddir] += icd.bytes_done[ddir];
 
 	return 0;
@@ -1980,7 +1980,7 @@ int io_u_queued_complete(struct thread_data *td, int min_evts)
 		return -1;
 	}
 
-	for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++)
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
 		td->bytes_done[ddir] += icd.bytes_done[ddir];
 
 	return ret;
diff --git a/libfio.c b/libfio.c
index 7e0d32c..4b53c92 100644
--- a/libfio.c
+++ b/libfio.c
@@ -36,12 +36,7 @@
 #include "helper_thread.h"
 #include "filehash.h"
 
-/*
- * Just expose an empty list, if the OS does not support disk util stats
- */
-#ifndef FIO_HAVE_DISK_UTIL
 FLIST_HEAD(disk_list);
-#endif
 
 unsigned long arch_flags = 0;
 
diff --git a/stat.c b/stat.c
index f1d468c..0bb21d0 100644
--- a/stat.c
+++ b/stat.c
@@ -2442,7 +2442,7 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	/*
 	 * Compute both read and write rates for the interval.
 	 */
-	for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 		uint64_t delta;
 
 		delta = td->this_io_bytes[ddir] - td->stat_io_bytes[ddir];
@@ -2517,7 +2517,7 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 	/*
 	 * Compute both read and write rates for the interval.
 	 */
-	for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 		uint64_t delta;
 
 		delta = td->this_io_blocks[ddir] - td->stat_io_blocks[ddir];
diff --git a/steadystate.c b/steadystate.c
index 43c715c..98f027c 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -231,7 +231,7 @@ void steadystate_check(void)
 		}
 
 		td_io_u_lock(td);
-		for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			td_iops += td->io_blocks[ddir];
 			td_bytes += td->io_bytes[ddir];
 		}


* Recent changes (master)
@ 2017-02-14 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-02-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 855f03627f2bce6a7f725fe6cc92e7ebe8d39deb:

  fnv: work with non-64-bit aligned chunks of data (2017-02-07 15:11:37 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 06cbb3c71fc75dbeddebb53c8f0a2ea95dc28228:

  Windows: re-enable the mmap ioengine and fix static asserts (2017-02-13 15:38:59 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      gfio: fix git location for fio

Rebecca Cran (3):
      Windows: Update the EULA year and add more examples to the installer
      Fix the return type of log_err and log_info functions
      Windows: re-enable the mmap ioengine and fix static asserts

Sitsofe Wheeler (1):
      configure: try to disable weak linking on OSX

Tomohiro Kusumi (3):
      Add a comment to clarify 941bda94
      steadystate: Use calloc(3)
      Be more verbose on endianness detection failure

 Makefile                |   1 -
 configure               |  18 ++++++--
 diskutil.h              |   1 +
 gfio.c                  |   2 +-
 libfio.c                |  36 +++++++++++++---
 os/windows/eula.rtf     | Bin 1060 -> 1072 bytes
 os/windows/examples.wxs | 112 ++++++++++++++++++++++++++++++++++++++++++------
 os/windows/install.wxs  |   2 +-
 os/windows/posix.c      |  63 ++++++++++++++++++++-------
 steadystate.c           |   9 +---
 t/log.c                 |   4 +-
 11 files changed, 198 insertions(+), 50 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 491278e..a2842a0 100644
--- a/Makefile
+++ b/Makefile
@@ -179,7 +179,6 @@ ifeq ($(CONFIG_TARGET_OS), Darwin)
   LIBS	 += -lpthread -ldl
 endif
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
-  SOURCE := $(filter-out engines/mmap.c,$(SOURCE))
   SOURCE += os/windows/posix.c
   LIBS	 += -lpthread -lpsapi -lws2_32
   CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format -static
diff --git a/configure b/configure
index d0c2173..0a258bf 100755
--- a/configure
+++ b/configure
@@ -268,6 +268,17 @@ Darwin)
   if test -z "$cpu" && test "$(sysctl -n hw.optional.x86_64)" = "1"; then
     cpu="x86_64"
   fi
+  # Error at compile time linking of weak/partial symbols if possible...
+cat > $TMPC <<EOF
+int main(void)
+{
+  return 0;
+}
+EOF
+  if compile_prog "" "-Wl,-no_weak_imports" "disable weak symbols"; then
+    echo "Disabling weak symbols"
+    LDFLAGS="$LDFLAGS -Wl,-no_weak_imports"
+  fi
   ;;
 SunOS)
   # `uname -m` returns i86pc even on an x86_64 box, so default based on isainfo
@@ -314,6 +325,7 @@ CYGWIN*)
   output_sym "CONFIG_SCHED_IDLE"
   output_sym "CONFIG_TCP_NODELAY"
   output_sym "CONFIG_TLS_THREAD"
+  output_sym "CONFIG_STATIC_ASSERT"
   output_sym "CONFIG_IPV6"
   echo "CC=$CC" >> $config_host_mak
   echo "BUILD_CFLAGS=$CFLAGS -I../zlib -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
@@ -815,14 +827,12 @@ echo "CLOCK_MONOTONIC_PRECISE       $clock_monotonic_precise"
 # clockid_t probe
 clockid_t="no"
 cat > $TMPC << EOF
-#include <stdio.h>
-#include <string.h>
 #include <time.h>
 int main(int argc, char **argv)
 {
-  clockid_t cid;
+  volatile clockid_t cid;
   memset(&cid, 0, sizeof(cid));
-  return clock_gettime(cid, NULL);
+  return 0;
 }
 EOF
 if compile_prog "" "$LIBS" "clockid_t"; then
diff --git a/diskutil.h b/diskutil.h
index 04fdde2..f773066 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -114,6 +114,7 @@ extern int update_io_ticks(void);
 extern void setup_disk_util(void);
 extern void disk_util_prune_entries(void);
 #else
+/* keep this as a function to avoid a warning in handle_du() */
 static inline void print_disk_util(struct disk_util_stat *du,
 				   struct disk_util_agg *agg, int terse,
 				   struct buf_output *out)
diff --git a/gfio.c b/gfio.c
index 9c917cb..7c92a50 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1240,7 +1240,7 @@ static void about_dialog(GtkWidget *w, gpointer data)
 		"program-name", "gfio",
 		"comments", "Gtk2 UI for fio",
 		"license", license_trans,
-		"website", "http://git.kernel.dk/?p=fio.git;a=summary",
+		"website", "http://git.kernel.dk/cgit/fio/",
 		"authors", authors,
 		"version", fio_version_string,
 		"copyright", "© 2012 Jens Axboe <axboe@kernel.dk>",
diff --git a/libfio.c b/libfio.c
index 960daf6..7e0d32c 100644
--- a/libfio.c
+++ b/libfio.c
@@ -311,6 +311,13 @@ int fio_set_fd_nonblocking(int fd, const char *who)
 	return flags;
 }
 
+enum {
+	ENDIAN_INVALID_BE = 1,
+	ENDIAN_INVALID_LE,
+	ENDIAN_INVALID_CONFIG,
+	ENDIAN_BROKEN,
+};
+
 static int endian_check(void)
 {
 	union {
@@ -327,16 +334,16 @@ static int endian_check(void)
 
 #if defined(CONFIG_LITTLE_ENDIAN)
 	if (be)
-		return 1;
+		return ENDIAN_INVALID_BE;
 #elif defined(CONFIG_BIG_ENDIAN)
 	if (le)
-		return 1;
+		return ENDIAN_INVALID_LE;
 #else
-	return 1;
+	return ENDIAN_INVALID_CONFIG;
 #endif
 
 	if (!le && !be)
-		return 1;
+		return ENDIAN_BROKEN;
 
 	return 0;
 }
@@ -344,6 +351,7 @@ static int endian_check(void)
 int initialize_fio(char *envp[])
 {
 	long ps;
+	int err;
 
 	/*
 	 * We need these to be properly 64-bit aligned, otherwise we
@@ -359,8 +367,26 @@ int initialize_fio(char *envp[])
 	compiletime_assert((offsetof(struct thread_options_pack, percentile_list) % 8) == 0, "percentile_list");
 	compiletime_assert((offsetof(struct thread_options_pack, latency_percentile) % 8) == 0, "latency_percentile");
 
-	if (endian_check()) {
+	err = endian_check();
+	if (err) {
 		log_err("fio: endianness settings appear wrong.\n");
+		switch (err) {
+		case ENDIAN_INVALID_BE:
+			log_err("fio: got big-endian when configured for little\n");
+			break;
+		case ENDIAN_INVALID_LE:
+			log_err("fio: got little-endian when configured for big\n");
+			break;
+		case ENDIAN_INVALID_CONFIG:
+			log_err("fio: not configured to any endianness\n");
+			break;
+		case ENDIAN_BROKEN:
+			log_err("fio: failed to detect endianness\n");
+			break;
+		default:
+			assert(0);
+			break;
+		}
 		log_err("fio: please report this to fio@vger.kernel.org\n");
 		return 1;
 	}
diff --git a/os/windows/eula.rtf b/os/windows/eula.rtf
index cc7be7f..1c92932 100755
Binary files a/os/windows/eula.rtf and b/os/windows/eula.rtf differ
diff --git a/os/windows/examples.wxs b/os/windows/examples.wxs
index a21182a..cc2ff5c 100755
--- a/os/windows/examples.wxs
+++ b/os/windows/examples.wxs
@@ -9,48 +9,111 @@
                     <File Source="..\..\examples\aio-read.fio" />
                 </Component>
                 <Component>
+                    <File Source="..\..\examples\backwards-read.fio" />
+                </Component>
+                <Component>
+                    <File Source="..\..\examples\basic-verify.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\cpuio.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\dev-dax.fio" />
+                </Component>
+                <Component>
                     <File Source="..\..\examples\disk-zone-profile.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\e4defrag.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\e4defrag2.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\enospc-pressure.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\falloc.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\fixed-rate-submission.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\flow.fio" />
+                </Component>
+                <Component>
                     <File Source="..\..\examples\fsx.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\fusion-aw-sync.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\gfapi.fio" />
+                </Component>
+                <Component>
                     <File Source="..\..\examples\iometer-file-access-server.fio" />
                 </Component>
                 <Component>
+                  <File Source="..\..\examples\jesd219.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\latency-profile.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\libhdfs.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\mtd.fio" />
+                </Component>
+                <Component>
                     <File Source="..\..\examples\netio.fio" />
                 </Component>
                 <Component>
                     <File Source="..\..\examples\netio_multicast.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\ssd-test.fio" />
+                  <File Source="..\..\examples\null.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\surface-scan.fio" />
+                  <File Source="..\..\examples\numa.fio" />
                 </Component>
                 <Component>
-                    <File Source="..\..\examples\tiobench-example.fio" />
+                  <File Source="..\..\examples\pmemblk.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\null.fio" />
+                  <File Source="..\..\examples\poisson-rate-submission.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\flow.fio" />
+                  <File Source="..\..\examples\rand-zones.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\cpuio.fio" />
+                  <File Source="..\..\examples\rbd.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\falloc.fio" />
+                  <File Source="..\..\examples\rdmaio-client.fio" />
                 </Component>
                 <Component>
-                  <File Source="..\..\examples\fusion-aw-sync.fio" />
+                  <File Source="..\..\examples\rdmaio-server.fio" />
                 </Component>
                 <Component>
                   <File Source="..\..\examples\ssd-steadystate.fio" />
                 </Component>
                 <Component>
+                    <File Source="..\..\examples\ssd-test.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\steadystate.fio" />
+                </Component>
+                <Component>
+                    <File Source="..\..\examples\surface-scan.fio" />
+                </Component>
+                <Component>
+                    <File Source="..\..\examples\tiobench-example.fio" />
+                </Component>
+                <Component>
+                  <File Source="..\..\examples\waitfor.fio" />
+                </Component>
+                <Component>
                   <File Source="..\..\examples\zipf.fio" />
                 </Component>
         </DirectoryRef>
@@ -59,20 +122,41 @@
         <ComponentGroup Id="examples">
             <ComponentRef Id="_1mbs_clients.fio" />
             <ComponentRef Id="aio_read.fio" />
+            <ComponentRef Id="backwards_read.fio" />
+            <ComponentRef Id="basic_verify.fio" />
+            <ComponentRef Id="cpuio.fio" />
+            <ComponentRef Id="dev_dax.fio" />
             <ComponentRef Id="disk_zone_profile.fio" />
+            <ComponentRef Id="e4defrag.fio" />
+            <ComponentRef Id="e4defrag2.fio" />
+            <ComponentRef Id="enospc_pressure.fio" />
+            <ComponentRef Id="falloc.fio" />
+            <ComponentRef Id="fixed_rate_submission.fio" />
+            <ComponentRef Id="flow.fio" />
             <ComponentRef Id="fsx.fio" />
+            <ComponentRef Id="fusion_aw_sync.fio" />
+            <ComponentRef Id="gfapi.fio" />
             <ComponentRef Id="iometer_file_access_server.fio" />
+            <ComponentRef Id="jesd219.fio" />
+            <ComponentRef Id="latency_profile.fio" />
+            <ComponentRef Id="libhdfs.fio" />
+            <ComponentRef Id="mtd.fio" />
             <ComponentRef Id="netio.fio" />
             <ComponentRef Id="netio_multicast.fio" />
+            <ComponentRef Id="null.fio" />
+            <ComponentRef Id="numa.fio" />
+            <ComponentRef Id="pmemblk.fio" />
+            <ComponentRef Id="poisson_rate_submission.fio" />
+            <ComponentRef Id="rand_zones.fio" />
+            <ComponentRef Id="rbd.fio" />
+            <ComponentRef Id="rdmaio_client.fio" />
+            <ComponentRef Id="rdmaio_server.fio" />
+            <ComponentRef Id="ssd_steadystate.fio" />
             <ComponentRef Id="ssd_test.fio" />
+            <ComponentRef Id="steadystate.fio" />
             <ComponentRef Id="surface_scan.fio" />
             <ComponentRef Id="tiobench_example.fio" />
-            <ComponentRef Id="null.fio" />
-            <ComponentRef Id="flow.fio" />
-            <ComponentRef Id="cpuio.fio" />
-            <ComponentRef Id="falloc.fio" />
-            <ComponentRef Id="fusion_aw_sync.fio" />
-            <ComponentRef Id="ssd_steadystate.fio" />
+            <ComponentRef Id="waitfor.fio" />
             <ComponentRef Id="zipf.fio" />
         </ComponentGroup>
     </Fragment>
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 22b7f7e..9e776de 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -58,7 +58,7 @@
 		<ComponentGroupRef Id="examples"/>
 	</Feature>
 
-	<Property Id="ARPURLINFOABOUT" Value="http://git.kernel.dk/?p=fio.git" />
+	<Property Id="ARPURLINFOABOUT" Value="http://git.kernel.dk/cgit/fio/" />
 	<Property Id='ARPCONTACT'>fio@vger.kernel.org</Property>
 	<Property Id='ARPHELPLINK'>http://www.spinics.net/lists/fio/</Property>
 	<Property Id='ARPURLUPDATEINFO'>http://bluestop.org/fio/</Property>
diff --git a/os/windows/posix.c b/os/windows/posix.c
index bbd93e9..f468cbf 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -304,35 +304,76 @@ void *mmap(void *addr, size_t len, int prot, int flags,
 		int fildes, off_t off)
 {
 	DWORD vaProt = 0;
+	DWORD mapAccess = 0;
+	DWORD lenlow;
+	DWORD lenhigh;
+	HANDLE hMap;
 	void* allocAddr = NULL;
 
 	if (prot & PROT_NONE)
 		vaProt |= PAGE_NOACCESS;
 
-	if ((prot & PROT_READ) && !(prot & PROT_WRITE))
+	if ((prot & PROT_READ) && !(prot & PROT_WRITE)) {
 		vaProt |= PAGE_READONLY;
+		mapAccess = FILE_MAP_READ;
+	}
 
-	if (prot & PROT_WRITE)
+	if (prot & PROT_WRITE) {
 		vaProt |= PAGE_READWRITE;
+		mapAccess |= FILE_MAP_WRITE;
+	}
+
+	lenlow = len & 0xFFFF;
+	lenhigh = len >> 16;
+	/* If the low DWORD is zero and the high DWORD is non-zero, `CreateFileMapping`
+	   will return ERROR_INVALID_PARAMETER. To avoid this, set both to zero. */
+	if (lenlow == 0) {
+		lenhigh = 0;
+	}
 
-	if ((flags & MAP_ANON) | (flags & MAP_ANONYMOUS))
+	if (flags & MAP_ANON || flags & MAP_ANONYMOUS)
 	{
 		allocAddr = VirtualAlloc(addr, len, MEM_COMMIT, vaProt);
 		if (allocAddr == NULL)
 			errno = win_to_posix_error(GetLastError());
 	}
+	else
+	{
+		hMap = CreateFileMapping((HANDLE)_get_osfhandle(fildes), NULL, vaProt, lenhigh, lenlow, NULL);
+
+		if (hMap != NULL)
+		{
+			allocAddr = MapViewOfFile(hMap, mapAccess, off >> 16, off & 0xFFFF, len);
+		}
+
+		if (hMap == NULL || allocAddr == NULL)
+			errno = win_to_posix_error(GetLastError());
+
+	}
 
 	return allocAddr;
 }
 
 int munmap(void *addr, size_t len)
 {
-	if (!VirtualFree(addr, 0, MEM_RELEASE)) {
-		errno = win_to_posix_error(GetLastError());
-		return -1;
+	BOOL success;
+
+	/* We may have allocated the memory with either MapViewOfFile or
+		 VirtualAlloc. Therefore, try calling UnmapViewOfFile first, and if that
+		 fails, call VirtualFree. */
+	success = UnmapViewOfFile(addr);
+
+	if (!success)
+	{
+		success = VirtualFree(addr, 0, MEM_RELEASE);
 	}
 
-	return 0;
+	return !success;
+}
+
+int msync(void *addr, size_t len, int flags)
+{
+	return !FlushViewOfFile(addr, len);
 }
 
 int fork(void)
@@ -702,17 +743,9 @@ int getrusage(int who, struct rusage *r_usage)
 
 int posix_madvise(void *addr, size_t len, int advice)
 {
-	log_err("%s is not implemented\n", __func__);
 	return ENOSYS;
 }
 
-/* Windows doesn't support advice for memory pages. Just ignore it. */
-int msync(void *addr, size_t len, int flags)
-{
-	errno = ENOSYS;
-	return -1;
-}
-
 int fdatasync(int fildes)
 {
 	return fsync(fildes);
diff --git a/steadystate.c b/steadystate.c
index 951376f..43c715c 100644
--- a/steadystate.c
+++ b/steadystate.c
@@ -8,13 +8,8 @@ bool steadystate_enabled = false;
 
 static void steadystate_alloc(struct thread_data *td)
 {
-	int i;
-
-	td->ss.bw_data = malloc(td->ss.dur * sizeof(uint64_t));
-	td->ss.iops_data = malloc(td->ss.dur * sizeof(uint64_t));
-	/* initialize so that it is obvious if the cache is not full in the output */
-	for (i = 0; i < td->ss.dur; i++)
-		td->ss.iops_data[i] = td->ss.bw_data[i] = 0;
+	td->ss.bw_data = calloc(td->ss.dur, sizeof(uint64_t));
+	td->ss.iops_data = calloc(td->ss.dur, sizeof(uint64_t));
 
 	td->ss.state |= __FIO_SS_DATA;
 }
diff --git a/t/log.c b/t/log.c
index 1ed3851..929aac6 100644
--- a/t/log.c
+++ b/t/log.c
@@ -2,7 +2,7 @@
 #include <stdarg.h>
 #include "../minmax.h"
 
-int log_err(const char *format, ...)
+size_t log_err(const char *format, ...)
 {
 	char buffer[1024];
 	va_list args;
@@ -16,7 +16,7 @@ int log_err(const char *format, ...)
 	return fwrite(buffer, len, 1, stderr);
 }
 
-int log_info(const char *format, ...)
+size_t log_info(const char *format, ...)
 {
 	char buffer[1024];
 	va_list args;
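The steadystate.c hunk above replaces malloc() plus a manual zeroing loop with calloc(). The following minimal sketch uses only the C standard library and shows that the two patterns yield the same all-zero buffer. The helper names are hypothetical, and `dur` stands in for `td->ss.dur`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old pattern: malloc() followed by an explicit zeroing loop. */
static uint64_t *alloc_zeroed_loop(size_t dur)
{
	uint64_t *p = malloc(dur * sizeof(uint64_t));
	size_t i;

	if (!p)
		return NULL;
	for (i = 0; i < dur; i++)
		p[i] = 0;
	return p;
}

/* New pattern: calloc() hands back memory with every byte set to zero. */
static uint64_t *alloc_zeroed_calloc(size_t dur)
{
	return calloc(dur, sizeof(uint64_t));
}

/* Returns 1 if both helpers produce identical buffers of dur entries. */
static int same_contents(size_t dur)
{
	uint64_t *a = alloc_zeroed_loop(dur);
	uint64_t *b = alloc_zeroed_calloc(dur);
	int ok = a && b && memcmp(a, b, dur * sizeof(uint64_t)) == 0;

	free(a);
	free(b);
	return ok;
}
```

Besides being shorter, calloc() in common C libraries (glibc, musl) returns NULL when `nmemb * size` would overflow. The open-coded `malloc(dur * sizeof(uint64_t))` performs no such check.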

* Recent changes (master)
@ 2017-02-08 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-08 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9d25d068f88f1be7ce4e67654ee26f8faa1ebca4:

  doc: minor HOWTO fixes (2017-02-04 10:25:58 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 855f03627f2bce6a7f725fe6cc92e7ebe8d39deb:

  fnv: work with non-64-bit aligned chunks of data (2017-02-07 15:11:37 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      init: don't crash server on failure to open output log
      bloom: if we're not setting bits, break after first failed mask check
      fnv: work with non-64-bit aligned chunks of data

 crc/fnv.c   | 24 +++++++++++++++++++++---
 init.c      | 17 +++++++++++------
 lib/bloom.c |  4 +++-
 3 files changed, 35 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/crc/fnv.c b/crc/fnv.c
index 04c0560..4cd0650 100644
--- a/crc/fnv.c
+++ b/crc/fnv.c
@@ -2,14 +2,32 @@
 
 #define FNV_PRIME	0x100000001b3ULL
 
+/*
+ * 64-bit fnv, but don't require 64-bit multiples of data. Use bytes
+ * for the last unaligned chunk.
+ */
 uint64_t fnv(const void *buf, uint32_t len, uint64_t hval)
 {
 	const uint64_t *ptr = buf;
-	const uint64_t *end = (void *) buf + len;
 
-	while (ptr < end) {
+	while (len) {
 		hval *= FNV_PRIME;
-		hval ^= (uint64_t) *ptr++;
+		if (len >= sizeof(uint64_t)) {
+			hval ^= (uint64_t) *ptr++;
+			len -= sizeof(uint64_t);
+			continue;
+		} else {
+			const uint8_t *ptr8 = (const uint8_t *) ptr;
+			uint64_t val = 0;
+			int i;
+
+			for (i = 0; i < len; i++) {
+				val <<= 8;
+				val |= (uint8_t) *ptr8++;
+			}
+			hval ^= val;
+			break;
+		}
 	}
 
 	return hval;
diff --git a/init.c b/init.c
index 34ed20f..0a2ace1 100644
--- a/init.c
+++ b/init.c
@@ -2326,17 +2326,22 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 		case 'b':
 			write_bw_log = 1;
 			break;
-		case 'o':
+		case 'o': {
+			FILE *tmp;
+
 			if (f_out && f_out != stdout)
 				fclose(f_out);
 
-			f_out = fopen(optarg, "w+");
-			if (!f_out) {
-				perror("fopen output");
-				exit(1);
+			tmp = fopen(optarg, "w+");
+			if (!tmp) {
+				log_err("fio: output file open error: %s\n", strerror(errno));
+				exit_val = 1;
+				do_exit++;
+				break;
 			}
-			f_err = f_out;
+			f_err = f_out = tmp;
 			break;
+			}
 		case 'm':
 			output_format = FIO_OUTPUT_TERSE;
 			break;
diff --git a/lib/bloom.c b/lib/bloom.c
index 7a9ebaa..bb81dbb 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -104,8 +104,10 @@ static bool __bloom_check(struct bloom *b, const void *data, unsigned int len,
 
 		if (b->map[index] & (1U << bit))
 			was_set++;
-		if (set)
+		else if (set)
 			b->map[index] |= 1U << bit;
+		else
+			break;
 	}
 
 	return was_set == N_HASHES;
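The fnv() change fixes an over-read. Before the patch, the old loop compared `ptr` against `end = buf + len`, so a `len` that was not a multiple of 8 made the last iteration load a full 64-bit word past the end of the data. The sketch below reproduces the patched logic under a hypothetical name, `fnv64_tail`. Unlike the patch, it loads words with memcpy() so that an unaligned `buf` is also safe. That choice belongs to this sketch and is not fio's code.

```c
#include <stdint.h>
#include <string.h>

#define FNV_PRIME	0x100000001b3ULL

/*
 * Hash whole 64-bit words, then fold the trailing (< 8) bytes into one
 * final value, as the patched fnv() does. memcpy() for the word load is
 * an assumption of this sketch so that buf need not be 8-byte aligned.
 */
static uint64_t fnv64_tail(const void *buf, uint32_t len, uint64_t hval)
{
	const uint8_t *p = buf;

	while (len) {
		hval *= FNV_PRIME;
		if (len >= sizeof(uint64_t)) {
			uint64_t word;

			memcpy(&word, p, sizeof(word));
			hval ^= word;
			p += sizeof(uint64_t);
			len -= sizeof(uint64_t);
		} else {
			uint64_t val = 0;
			uint32_t i;

			/* Shift the remaining bytes in one at a time, like the patch. */
			for (i = 0; i < len; i++) {
				val <<= 8;
				val |= p[i];
			}
			hval ^= val;
			break;
		}
	}
	return hval;
}
```

Because the multiply happens before the XOR in each step, two inputs that differ only in their tail bytes always hash differently for the same seed.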

* Recent changes (master)
@ 2017-02-05 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-05 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3dd2e8aa0519225da5ccc100aa02e55306bcc2f3:

  fix to replay_align on iolog previous code was rejecting all positive alignment values and only accepting align to 0 value, opposite of what it should be doing (2017-02-02 10:37:00 -0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9d25d068f88f1be7ce4e67654ee26f8faa1ebca4:

  doc: minor HOWTO fixes (2017-02-04 10:25:58 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      doc: minor HOWTO fixes

 HOWTO | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 295dc10..f44c626 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1061,7 +1061,8 @@ I/O type
 
 	Start I/O at the given offset in the file. The data before the given offset
 	will not be touched. This effectively caps the file size at `real_size -
-	offset`.
+	offset`. Can be combined with :option:`size` to constrain the start and
+	end range that I/O will be done within.
 
 .. option:: offset_increment=int
 
@@ -1505,6 +1506,8 @@ I/O size
 	files or devices.  If the files do not exist, size must be given. It is also
 	possible to give size as a percentage between 1 and 100. If ``size=20%`` is
 	given, fio will use 20% of the full size of the given files or devices.
+	Can be combined with :option:`offset` to constrain the start and end range
+	that I/O will be done within.
 
 .. option:: io_size=int, io_limit=int
 
@@ -1679,7 +1682,7 @@ I/O engine
 			files based on the offset generated by fio backend. (see the example
 			job file to create such files, use ``rw=write`` option). Please
 			note, you might want to set necessary environment variables to work
-			with hdfs/libhdfs properly. Each jobs uses it's own connection to
+			with hdfs/libhdfs properly.  Each job uses its own connection to
 			HDFS.
 
 		**mtd**
@@ -1721,7 +1724,7 @@ caveat that when used on the command line, they must come after the
 	reap events. The reaping mode is only enabled when polling for a minimum of
 	0 events (e.g. when :option:`iodepth_batch_complete` `=0`).
 
-.. option:: hipri : [psyncv2]
+.. option:: hipri : [pvsync2]
 
 	Set RWF_HIPRI on I/O, indicating to the kernel that it's of higher priority
 	than normal.
@@ -2889,7 +2892,7 @@ Interpreting the output
 Fio spits out a lot of output. While running, fio will display the status of the
 jobs created. An example of that would be::
 
-    Jobs: 1: [_r] [24.8% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 00h:01m:31s]
+    Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
 
 The characters inside the square brackets denote the current status of each
 thread. The possible values (in typical life cycle order) are:
@@ -2934,14 +2937,15 @@ Fio will condense the thread string as not to take up more space on the command
 line as is needed. For instance, if you have 10 readers and 10 writers running,
 the output would look like this::
 
-    Jobs: 20 (f=20): [R(10),W(10)] [4.0% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 57m:36s]
+    Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
 
 Fio will still maintain the ordering, though. So the above means that jobs 1..10
 are readers, and 11..20 are writers.
 
 The other values are fairly self explanatory -- number of threads currently
-running and doing I/O, rate of I/O since last check (read speed listed first,
-then write speed), and the estimated completion percentage and time for the
+running and doing I/O, the number of currently open files (f=), the rate of I/O
+since last check (read speed listed first, then write speed and optionally trim
+speed), and the estimated completion percentage and time for the current
 running group. It's impossible to estimate runtime of the following groups (if
 any). Note that the string is displayed in order, so it's possible to tell which
 of the jobs are currently doing what. The first character is the first job
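The status examples above condense runs of identical job states. For example, ten readers followed by ten writers print as `R(10),W(10)`. The following hypothetical run-length encoder works in the same style and reproduces both example strings. It is a sketch, not fio's own implementation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: run-length encode a per-job status string such as
 * "RRRWW" into "R(3),W(2)". Writes at most outlen bytes including the NUL
 * and returns the number of characters written, or -1 if out is too small.
 */
static int condense_status(const char *in, char *out, size_t outlen)
{
	size_t pos = 0, i = 0, n = strlen(in);

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	while (i < n) {
		size_t run = 1;
		int w;

		/* Count how many consecutive jobs share this state character. */
		while (i + run < n && in[i + run] == in[i])
			run++;
		w = snprintf(out + pos, outlen - pos, "%s%c(%zu)",
			     pos ? "," : "", in[i], run);
		if (w < 0 || (size_t)w >= outlen - pos)
			return -1;
		pos += (size_t)w;
		i += run;
	}
	return (int)pos;
}
```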

* Recent changes (master)
@ 2017-02-03 13:00 Jens Axboe
From: Jens Axboe @ 2017-02-03 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit f75ede1d8fe9a255be06f1bf0bde4b99b75acef9:

  doc: document profiles, minor fixes (2017-01-30 13:25:52 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3dd2e8aa0519225da5ccc100aa02e55306bcc2f3:

  fix to replay_align on iolog previous code was rejecting all positive alignment values and only accepting align to 0 value, opposite of what it should be doing (2017-02-02 10:37:00 -0800)

----------------------------------------------------------------
Dylan Fairchild (1):
      fix to replay_align on iolog previous code was rejecting all positive alignment values and only accepting align to 0 value, opposite of what it should be doing

 iolog.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/iolog.h b/iolog.h
index 60ee3e9..37f27bc 100644
--- a/iolog.h
+++ b/iolog.h
@@ -271,7 +271,7 @@ static inline bool inline_log(struct io_log *log)
 
 static inline void ipo_bytes_align(unsigned int replay_align, struct io_piece *ipo)
 {
-	if (replay_align)
+	if (!replay_align)
 		return;
 
 	ipo->offset &= ~(replay_align - (uint64_t)1);
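Before the fix, the early return fired for every non-zero `replay_align`, so no alignment was ever applied. Meanwhile, `replay_align=0` fell through to a mask of `~(0 - 1) == 0`, which zeroed every offset. The sketch below restates the corrected logic under the hypothetical name `align_offset`. Like the mask itself, it assumes a power-of-two alignment.

```c
#include <stdint.h>

/*
 * Corrected ipo_bytes_align() logic: an alignment of 0 leaves the offset
 * untouched; otherwise round the offset down to a multiple of align.
 * The mask trick is only valid when align is a power of two.
 */
static uint64_t align_offset(uint64_t offset, unsigned int align)
{
	if (!align)
		return offset;
	return offset & ~((uint64_t)align - 1);
}
```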

* Recent changes (master)
@ 2017-01-31 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-31 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c60ebc45bccb8603a360f88c494ecca40a7becef:

  doc: minor consistency and spelling changes (2017-01-27 09:44:05 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f75ede1d8fe9a255be06f1bf0bde4b99b75acef9:

  doc: document profiles, minor fixes (2017-01-30 13:25:52 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (1):
      doc: document profiles, minor fixes

 HOWTO | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 110 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index e917e77..295dc10 100644
--- a/HOWTO
+++ b/HOWTO
@@ -643,9 +643,10 @@ Time related parameters
 
 .. option:: runtime=time
 
-	Tell fio to terminate processing after the specified number of seconds. It
+	Tell fio to terminate processing after the specified period of time.  It
 	can be quite hard to determine for how long a specified job will run, so
-	this parameter is handy to cap the total runtime to a given time.
+	this parameter is handy to cap the total runtime to a given time.  When
+	the unit is omitted, the value is given in seconds.
 
 .. option:: time_based
 
@@ -667,7 +668,8 @@ Time related parameters
 	before logging results, thus minimizing the runtime required for stable
 	results. Note that the ``ramp_time`` is considered lead in time for a job,
 	thus it will increase the total runtime if a special timeout or
-	:option:`runtime` is specified.
+	:option:`runtime` is specified.  When the unit is omitted, the value is
+	given in seconds.
 
 .. option:: clocksource=str
 
@@ -691,7 +693,7 @@ Time related parameters
 .. option:: gtod_reduce=bool
 
 	Enable all of the :manpage:`gettimeofday(2)` reducing options
-	(:option:`disable_clat`, :option:`disable_slat`, :option:`disable_bw`) plus
+	(:option:`disable_clat`, :option:`disable_slat`, :option:`disable_bw_measurement`) plus
 	reduce precision of the timeout somewhat to really shrink the
 	:manpage:`gettimeofday(2)` call count. With this option enabled, we only do
 	about 0.4% of the :manpage:`gettimeofday(2)` calls we would have done if all
@@ -1947,15 +1949,17 @@ I/O rate
 
 .. option:: thinktime=time
 
-	Stall the job x microseconds after an I/O has completed before issuing the
-	next. May be used to simulate processing being done by an application. See
+	Stall the job for the specified period of time after an I/O has completed before issuing the
+	next. May be used to simulate processing being done by an application.
+	When the unit is omitted, the value is given in microseconds.  See
 	:option:`thinktime_blocks` and :option:`thinktime_spin`.
 
 .. option:: thinktime_spin=time
 
 	Only valid if :option:`thinktime` is set - pretend to spend CPU time doing
 	something with the data received, before falling back to sleeping for the
-	rest of the period specified by :option:`thinktime`.
+	rest of the period specified by :option:`thinktime`.  When the unit is
+	omitted, the value is given in microseconds.
 
 .. option:: thinktime_blocks=int
 
@@ -2010,15 +2014,15 @@ I/O latency
 .. option:: latency_target=time
 
 	If set, fio will attempt to find the max performance point that the given
-	workload will run at while maintaining a latency below this target. The
-	values is given in microseconds.  See :option:`latency_window` and
-	:option:`latency_percentile`.
+	workload will run at while maintaining a latency below this target.  When
+	the unit is omitted, the value is given in microseconds.  See
+	:option:`latency_window` and :option:`latency_percentile`.
 
 .. option:: latency_window=time
 
 	Used with :option:`latency_target` to specify the sample window that the job
-	is run at varying queue depths to test the performance. The value is given
-	in microseconds.
+	is run at varying queue depths to test the performance.  When the unit is
+	omitted, the value is given in microseconds.
 
 .. option:: latency_percentile=float
 
@@ -2029,8 +2033,9 @@ I/O latency
 
 .. option:: max_latency=time
 
-	If set, fio will exit the job if it exceeds this maximum latency. It will
-	exit with an ETIME error.
+	If set, fio will exit the job with an ETIMEDOUT error if it exceeds this
+	maximum latency. When the unit is omitted, the value is given in
+	microseconds.
 
 .. option:: rate_cycle=int
 
@@ -2520,13 +2525,14 @@ Steady state
 
 	A rolling window of this duration will be used to judge whether steady state
 	has been reached. Data will be collected once per second. The default is 0
-	which disables steady state detection.
+	which disables steady state detection.  When the unit is omitted, the
+	value is given in seconds.
 
 .. option:: steadystate_ramp_time=time, ss_ramp=time
 
 	Allow the job to run for the specified duration before beginning data
 	collection for checking the steady state job termination criterion. The
-	default is 0.
+	default is 0.  When the unit is omitted, the value is given in seconds.
 
 
 Measurements and reporting
@@ -2697,7 +2703,7 @@ Measurements and reporting
 	the number of calls to :manpage:`gettimeofday(2)`, as that does impact
 	performance at really high IOPS rates.  Note that to really get rid of a
 	large amount of these calls, this option must be used with
-	:option:`disable_slat` and :option:`disable_bw` as well.
+	:option:`disable_slat` and :option:`disable_bw_measurement` as well.
 
 .. option:: disable_clat=bool
 
@@ -2709,7 +2715,7 @@ Measurements and reporting
 	Disable measurements of submission latency numbers. See
 	:option:`disable_slat`.
 
-.. option:: disable_bw=bool
+.. option:: disable_bw_measurement=bool, disable_bw=bool
 
 	Disable measurements of throughput/bandwidth numbers. See
 	:option:`disable_lat`.
@@ -2790,6 +2796,92 @@ Error handling
 	If set dump every error even if it is non fatal, true by default. If
 	disabled only fatal error will be dumped.
 
+Running predefined workloads
+----------------------------
+
+Fio includes predefined profiles that mimic the I/O workloads generated by
+other tools.
+
+.. option:: profile=str
+
+	The predefined workload to run.  Current profiles are:
+
+		**tiobench**
+			Threaded I/O bench (tiotest/tiobench) like workload.
+
+		**act**
+			Aerospike Certification Tool (ACT) like workload.
+
+To view a profile's additional options use :option:`--cmdhelp` after specifying
+the profile.  For example::
+
+	$ fio --profile=act --cmdhelp
+
+Act profile options
+~~~~~~~~~~~~~~~~~~~
+
+.. option:: device-names=str
+	:noindex:
+
+	Devices to use.
+
+.. option:: load=int
+	:noindex:
+
+	ACT load multiplier.  Default: 1.
+
+.. option:: test-duration=time
+	:noindex:
+
+	How long the entire test takes to run.  Default: 24h.
+
+.. option:: threads-per-queue=int
+	:noindex:
+
+	Number of read IO threads per device.  Default: 8.
+
+.. option:: read-req-num-512-blocks=int
+	:noindex:
+
+	Number of 512B blocks to read at a time.  Default: 3.
+
+.. option:: large-block-op-kbytes=int
+	:noindex:
+
+	Size of large block ops in KiB (writes).  Default: 131072.
+
+.. option:: prep
+	:noindex:
+
+	Set to run ACT prep phase.
+
+Tiobench profile options
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. option:: size=str
+	:noindex:
+
+	Size in MiB.
+
+.. option:: block=int
+	:noindex:
+
+	Block size in bytes.  Default: 4096.
+
+.. option:: numruns=int
+	:noindex:
+
+	Number of runs.
+
+.. option:: dir=str
+	:noindex:
+
+	Test directory.
+
+.. option:: threads=int
+	:noindex:
+
+	Number of threads.
 
 Interpreting the output
 -----------------------
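Several options in this change become `time` values, and each option has its own default unit. Runtime, ramp_time and the steady state options default to seconds, while thinktime and the latency options default to microseconds. The following hypothetical parser normalizes to microseconds using the suffixes listed for the **time** parameter type (s/m/h, ms/msec, us/usec). It is a sketch, not fio's parser, and it does not check the multiplication for overflow.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: convert "10m", "500ms" or a bare "20" into
 * microseconds. A bare number is scaled by default_unit_us, e.g. 1000000
 * for options that default to seconds or 1 for those that default to
 * microseconds. Returns 0 on success, -1 on malformed input.
 */
static int parse_time_us(const char *s, uint64_t default_unit_us, uint64_t *out)
{
	static const struct { const char *sfx; uint64_t mult; } units[] = {
		{ "us", 1 }, { "usec", 1 },
		{ "ms", 1000ULL }, { "msec", 1000ULL },
		{ "s", 1000000ULL },
		{ "m", 60ULL * 1000000 },
		{ "h", 3600ULL * 1000000 },
	};
	char *end;
	unsigned long long v;
	size_t i;

	/* Reject signs and leading whitespace that strtoull() would accept. */
	if (*s < '0' || *s > '9')
		return -1;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno)
		return -1;
	if (*end == '\0') {
		*out = v * default_unit_us;
		return 0;
	}
	/* Exact suffix match, so "m" (minutes) is never confused with "ms". */
	for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		if (strcmp(end, units[i].sfx) == 0) {
			*out = v * units[i].mult;
			return 0;
		}
	}
	return -1;
}
```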

* Recent changes (master)
@ 2017-01-28 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-28 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit f0ac17190989b4ada1d4d74be8d7a4ef3a76dfbb:

  Merge branch 'shm_rm' of https://github.com/sitsofe/fio (2017-01-26 10:07:48 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c60ebc45bccb8603a360f88c494ecca40a7becef:

  doc: minor consistency and spelling changes (2017-01-27 09:44:05 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (2):
      doc: minor documentation changes
      doc: minor consistency and spelling changes

 HOWTO     | 68 +++++++++++++++++++++++++++++++--------------------------------
 README    | 11 ++++++-----
 options.c |  2 +-
 3 files changed, 41 insertions(+), 40 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d63fff7..e917e77 100644
--- a/HOWTO
+++ b/HOWTO
@@ -55,7 +55,7 @@ Command line options
 .. option:: --debug=type
 
     Enable verbose tracing of various fio actions.  May be ``all`` for all types
-    or individual types separated by a comma (eg ``--debug=file,mem`` will
+    or individual types separated by a comma (e.g. ``--debug=file,mem`` will
     enable file and memory debugging).  Currently, additional logging is
     available for:
 
@@ -273,7 +273,7 @@ Job file format
 
 As previously described, fio accepts one or more job files describing what it is
 supposed to do. The job file format is the classic ini file, where the names
-enclosed in [] brackets define the job name. You are free to use any ascii name
+enclosed in [] brackets define the job name. You are free to use any ASCII name
 you want, except *global* which has special meaning.  Following the job name is
 a sequence of zero or more parameters, one per line, that define the behavior of
 the job. If the first character in a line is a ';' or a '#', the entire line is
@@ -469,7 +469,7 @@ Parameter types
 
 **time**
 	Integer with possible time suffix. In seconds unless otherwise
-	specified, use eg 10m for 10 minutes. Accepts s/m/h for seconds, minutes,
+	specified, use e.g. 10m for 10 minutes. Accepts s/m/h for seconds, minutes,
 	and hours, and accepts 'ms' (or 'msec') for milliseconds, and 'us' (or
 	'usec') for microseconds.
 
@@ -554,7 +554,7 @@ Parameter types
 
 **irange**
 	Integer range with suffix. Allows value range to be given, such as
-	1024-4096. A colon may also be used as the separator, eg 1k:4k. If the
+	1024-4096. A colon may also be used as the separator, e.g. 1k:4k. If the
 	option allows two sets of ranges, they can be specified with a ',' or '/'
 	delimiter: 1k-4k/8k-32k. Also see :ref:`int <int>`.
 
@@ -653,7 +653,7 @@ Time related parameters
 	even if the file(s) are completely read or written. It will simply loop over
 	the same workload as many times as the :option:`runtime` allows.
 
-.. option:: startdelay=irange
+.. option:: startdelay=irange(time)
 
 	Delay start of job for the specified number of seconds. Supports all time
 	suffixes to allow specification of hours, minutes, seconds and milliseconds
@@ -820,13 +820,13 @@ Target file/device
 			still be open depending on 'openfiles'.
 
 		**zipf**
-			Use a *zipfian* distribution to decide what file to access.
+			Use a *Zipf* distribution to decide what file to access.
 
 		**pareto**
-			Use a *pareto* distribution to decide what file to access.
+			Use a *Pareto* distribution to decide what file to access.
 
 		**gauss**
-			Use a *gaussian* (normal) distribution to decide what file to
+			Use a *Gaussian* (normal) distribution to decide what file to
 			access.
 
 	For *random*, *roundrobin*, and *sequential*, a postfix can be appended to
@@ -871,7 +871,7 @@ Target file/device
 
 .. option:: allow_mounted_write=bool
 
-	If this isn't set, fio will abort jobs that are destructive (eg that write)
+	If this isn't set, fio will abort jobs that are destructive (e.g. that write)
 	to what appears to be a mounted device or partition. This should help catch
 	creating inadvertently destructive tests, not realizing that the test will
 	destroy data on the mounted file system. Default: false.
@@ -882,7 +882,7 @@ Target file/device
 	given I/O operation. This will also clear the :option:`invalidate` flag,
 	since it is pointless to pre-read and then drop the cache. This will only
 	work for I/O engines that are seek-able, since they allow you to read the
-	same data multiple times. Thus it will not work on eg network or splice I/O.
+	same data multiple times. Thus it will not work on e.g. network or splice I/O.
 
 .. option:: unlink=bool
 
@@ -975,7 +975,7 @@ I/O type
 			Generate the same offset.
 
 	``sequential`` is only useful for random I/O, where fio would normally
-	generate a new random offset for every I/O. If you append eg 8 to randread,
+	generate a new random offset for every I/O. If you append e.g. 8 to randread,
 	you would get a new random offset for every 8 I/O's. The result would be a
 	seek for only every 8 I/O's, instead of for every I/O. Use ``rw=randread:8``
 	to specify that. As sequential I/O is already sequential, setting
@@ -1072,10 +1072,10 @@ I/O type
 
 .. option:: number_ios=int
 
-	Fio will normally perform IOs until it has exhausted the size of the region
+	Fio will normally perform I/Os until it has exhausted the size of the region
 	set by :option:`size`, or if it exhaust the allocated time (or hits an error
 	condition). With this setting, the range/size can be set independently of
-	the number of IOs to perform. When fio reaches this number, it will exit
+	the number of I/Os to perform. When fio reaches this number, it will exit
 	normally and report status. Note that this does not extend the amount of I/O
 	that will be done, it will only stop fio if this condition is met before
 	other end-of-job criteria.
@@ -1163,14 +1163,14 @@ I/O type
 				Pareto distribution
 
 		**gauss**
-				Normal (gaussian) distribution
+				Normal (Gaussian) distribution
 
 		**zoned**
 				Zoned random distribution
 
 	When using a **zipf** or **pareto** distribution, an input value is also
 	needed to define the access pattern. For **zipf**, this is the `zipf
-	theta`. For **pareto**, it's the `pareto power`. Fio includes a test
+	theta`. For **pareto**, it's the `Pareto power`. Fio includes a test
 	program, :command:`genzipf`, that can be used visualize what the given input
 	values will yield in terms of hit rates.  If you wanted to use **zipf** with
 	a `theta` of 1.2, you would use ``random_distribution=zipf:1.2`` as the
@@ -1556,7 +1556,7 @@ I/O engine
 
 		**vsync**
 			Basic :manpage:`readv(2)` or :manpage:`writev(2)` I/O.  Will emulate
-			queuing by coalescing adjacent IOs into a single submission.
+			queuing by coalescing adjacent I/Os into a single submission.
 
 		**pvsync**
 			Basic :manpage:`preadv(2)` or :manpage:`pwritev(2)` I/O.
@@ -1699,7 +1699,7 @@ I/O engine
 
 		**external**
 			Prefix to specify loading an external I/O engine object file. Append
-			the engine filename, eg ``ioengine=external:/tmp/foo.o`` to load
+			the engine filename, e.g. ``ioengine=external:/tmp/foo.o`` to load
 			ioengine :file:`foo.o` in :file:`/tmp`.
 
 
@@ -1717,7 +1717,7 @@ caveat that when used on the command line, they must come after the
 	:manpage:`io_getevents(2)` system call to reap newly returned events.  With
 	this flag turned on, the AIO ring will be read directly from user-space to
 	reap events. The reaping mode is only enabled when polling for a minimum of
-	0 events (eg when :option:`iodepth_batch_complete` `=0`).
+	0 events (e.g. when :option:`iodepth_batch_complete` `=0`).
 
 .. option:: hipri : [psyncv2]
 
@@ -1872,7 +1872,7 @@ I/O depth
 
 	Number of I/O units to keep in flight against the file.  Note that
 	increasing *iodepth* beyond 1 will not affect synchronous ioengines (except
-	for small degress when :option:`verify_async` is in use).  Even async
+	for small degrees when :option:`verify_async` is in use).  Even async
 	engines may impose OS restrictions causing the desired depth not to be
 	achieved.  This may happen on Linux when using libaio and not setting
 	:option:`direct` =1, since buffered I/O is not async on that OS.  Keep an
@@ -1925,7 +1925,7 @@ I/O depth
 	The low water mark indicating when to start filling the queue
 	again. Defaults to the same as :option:`iodepth`, meaning that fio will
 	attempt to keep the queue full at all times.  If :option:`iodepth` is set to
-	eg 16 and *iodepth_low* is set to 4, then after fio has filled the queue of
+	e.g. 16 and *iodepth_low* is set to 4, then after fio has filled the queue of
 	16 requests, it will let the depth drain down to 4 before starting to fill
 	it again.
 
@@ -1945,13 +1945,13 @@ I/O depth
 I/O rate
 ~~~~~~~~
 
-.. option:: thinktime=int
+.. option:: thinktime=time
 
 	Stall the job x microseconds after an I/O has completed before issuing the
 	next. May be used to simulate processing being done by an application. See
 	:option:`thinktime_blocks` and :option:`thinktime_spin`.
 
-.. option:: thinktime_spin=int
+.. option:: thinktime_spin=time
 
 	Only valid if :option:`thinktime` is set - pretend to spend CPU time doing
 	something with the data received, before falling back to sleeping for the
@@ -1997,7 +1997,7 @@ I/O rate
 
 	This option controls how fio manages rated I/O submissions. The default is
 	`linear`, which submits I/O in a linear fashion with fixed delays between
-	IOs that gets adjusted based on I/O completion rates. If this is set to
+	I/Os that get adjusted based on I/O completion rates. If this is set to
 	`poisson`, fio will submit I/O based on a more real world random request
 	flow, known as the Poisson process
 	(https://en.wikipedia.org/wiki/Poisson_point_process). The lambda will be
@@ -2007,14 +2007,14 @@ I/O rate
 I/O latency
 ~~~~~~~~~~~
 
-.. option:: latency_target=int
+.. option:: latency_target=time
 
 	If set, fio will attempt to find the max performance point that the given
 	workload will run at while maintaining a latency below this target. The
 	values is given in microseconds.  See :option:`latency_window` and
 	:option:`latency_percentile`.
 
-.. option:: latency_window=int
+.. option:: latency_window=time
 
 	Used with :option:`latency_target` to specify the sample window that the job
 	is run at varying queue depths to test the performance. The value is given
@@ -2022,12 +2022,12 @@ I/O latency
 
 .. option:: latency_percentile=float
 
-	The percentage of IOs that must fall within the criteria specified by
+	The percentage of I/Os that must fall within the criteria specified by
 	:option:`latency_target` and :option:`latency_window`. If not set, this
-	defaults to 100.0, meaning that all IOs must be equal or below to the value
+	defaults to 100.0, meaning that all I/Os must be equal to or below the value
 	set by :option:`latency_target`.
 
-.. option:: max_latency=int
+.. option:: max_latency=time
 
 	If set, fio will exit the job if it exceeds this maximum latency. It will
 	exit with an ETIME error.
@@ -2611,7 +2611,7 @@ Measurements and reporting
 
 	Same as :option:`log_avg_msec`, but logs entries for completion latency
 	histograms. Computing latency percentiles from averages of intervals using
-	:option:`log_avg_msec` is innacurate. Setting this option makes fio log
+	:option:`log_avg_msec` is inaccurate. Setting this option makes fio log
 	histogram entries over the specified period of time, reducing log sizes for
 	high IOPS devices while retaining percentile accuracy.  See
 	:option:`log_hist_coarseness` as well. Defaults to 0, meaning histogram
@@ -2883,7 +2883,7 @@ denote:
 		Average bandwidth rate.
 
 **iops**
-		Average IOs performed per second.
+		Average I/Os performed per second.
 
 **runt**
 		The runtime of that thread.
@@ -2926,8 +2926,8 @@ denote:
 
 **IO submit**
 		How many pieces of I/O were submitting in a single submit call. Each
-		entry denotes that amount and below, until the previous entry -- eg,
-		8=100% mean that we submitted anywhere in between 5-8 IOs per submit
+		entry denotes that amount and below, until the previous entry -- e.g.,
+		8=100% mean that we submitted anywhere in between 5-8 I/Os per submit
 		call.
 
 **IO complete**
@@ -2975,7 +2975,7 @@ Each value is printed for both reads and writes, with reads first. The
 numbers denote:
 
 **ios**
-		Number of ios performed by all groups.
+		Number of I/Os performed by all groups.
 **merge**
 		Number of merges I/O the I/O scheduler.
 **ticks**
@@ -3279,7 +3279,7 @@ particular I/O. The logging of the offset can be toggled with
 :option:`log_offset`.
 
 If windowed logging is enabled through :option:`log_avg_msec` then fio doesn't
-log individual IOs. Instead of logs the average values over the specified period
+log individual I/Os. Instead it logs the average values over the specified period
 of time. Since 'data direction' and 'offset' are per-I/O values, they aren't
 applicable if windowed logging is enabled. If windowed logging is enabled and
 :option:`log_max_value` is set, then fio logs maximum values in that window
diff --git a/README b/README
index 8f5385e..9493c2a 100644
--- a/README
+++ b/README
@@ -92,16 +92,17 @@ Binary packages
 
 Debian:
 	Starting with Debian "Squeeze", fio packages are part of the official
-	Debian repository. http://packages.debian.org/search?keywords=fio.
+	Debian repository. http://packages.debian.org/search?keywords=fio .
 
 Ubuntu:
 	Starting with Ubuntu 10.04 LTS (aka "Lucid Lynx"), fio packages are part
 	of the Ubuntu "universe" repository.
-	http://packages.ubuntu.com/search?keywords=fio.
+	http://packages.ubuntu.com/search?keywords=fio .
 
-Red Hat, CentOS & Co:
-	Dag Wieërs has RPMs for Red Hat related distros, find them here:
-	http://dag.wieers.com/rpm/packages/fio/.
+Red Hat, Fedora, CentOS & Co:
+	Starting with Fedora 9/Extra Packages for Enterprise Linux 4, fio
+	packages are part of the Fedora/EPEL repositories.
+	https://admin.fedoraproject.org/pkgdb/package/rpms/fio/ .
 
 Mandriva:
 	Mandriva has integrated fio into their package repository, so installing
diff --git a/options.c b/options.c
index 713112f..1fa99b6 100644
--- a/options.c
+++ b/options.c
@@ -2235,7 +2235,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 			  { .ival = "gauss",
 			    .oval = FIO_FSERVICE_GAUSS,
-			    .help = "Normal (gaussian) distribution",
+			    .help = "Normal (Gaussian) distribution",
 			  },
 			  { .ival = "roundrobin",
 			    .oval = FIO_FSERVICE_RR,
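The **irange** type described above accepts values such as `1024-4096` or `1k:4k`. The following hypothetical sketch parses a single range and assumes 1024-based `k`/`m`/`g` suffixes, taken here to be fio's default `kb_base`. It does not handle the two-range `1k-4k/8k-32k` form, and rejecting `lo > hi` is a choice made by this sketch, not by fio.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one value with an optional k/m/g suffix (1024-based, assumed). */
static const char *parse_size(const char *s, uint64_t *out)
{
	char *end;
	uint64_t mult = 1;

	if (!isdigit((unsigned char)*s))
		return NULL;
	*out = strtoull(s, &end, 10);
	switch (tolower((unsigned char)*end)) {
	case 'k': mult = 1024ULL; end++; break;
	case 'm': mult = 1024ULL * 1024; end++; break;
	case 'g': mult = 1024ULL * 1024 * 1024; end++; break;
	}
	*out *= mult;
	return end;
}

/*
 * Hypothetical parser for one irange such as "1024-4096" or "1k:4k".
 * Returns 0 and fills lo/hi on success, -1 otherwise.
 */
static int parse_irange(const char *s, uint64_t *lo, uint64_t *hi)
{
	const char *p = parse_size(s, lo);

	/* Either '-' or ':' may separate the two ends of the range. */
	if (!p || (*p != '-' && *p != ':'))
		return -1;
	p = parse_size(p + 1, hi);
	if (!p || *p != '\0' || *lo > *hi)
		return -1;
	return 0;
}
```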

* Recent changes (master)
@ 2017-01-27 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-27 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit bd4d9bdc5097c3b35b5172508e1a2828296e01c2:

  Remove/Move Linux specific sysfs_root field from thread_data (2017-01-23 08:26:12 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f0ac17190989b4ada1d4d74be8d7a4ef3a76dfbb:

  Merge branch 'shm_rm' of https://github.com/sitsofe/fio (2017-01-26 10:07:48 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'sphinx-doc' of https://github.com/termim/fio
      Merge branch 'shm_rm' of https://github.com/sitsofe/fio

Mikhail Terekhov (1):
      fix manpage heading issue

Sitsofe Wheeler (1):
      shm: have os remove shared memory if fio dies unexpectedly

 doc/fio_man.rst   | 3 ++-
 init.c            | 3 +++
 os/os-dragonfly.h | 2 ++
 os/os-freebsd.h   | 3 +++
 os/os-linux.h     | 1 +
 os/os-openbsd.h   | 3 +++
 6 files changed, 14 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/doc/fio_man.rst b/doc/fio_man.rst
index 7eae05e..c6a6438 100644
--- a/doc/fio_man.rst
+++ b/doc/fio_man.rst
@@ -1,6 +1,7 @@
 :orphan:
 
-
+Fio Manpage
+===========
 
 (rev. |release|)
 
diff --git a/init.c b/init.c
index c3cc3e5..34ed20f 100644
--- a/init.c
+++ b/init.c
@@ -356,6 +356,9 @@ static int setup_thread_area(void)
 		perror("shmat");
 		return 1;
 	}
+#ifdef FIO_HAVE_SHM_ATTACH_REMOVED
+	shmctl(shm_id, IPC_RMID, NULL);
+#endif
 #endif
 
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index c799817..5e94855 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -24,6 +24,8 @@
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CPU_AFFINITY
 #define FIO_HAVE_IOPRIO
+/* Only have attach-to-open-removed when kern.ipc.shm_allow_removed is 1 */
+#undef  FIO_HAVE_SHM_ATTACH_REMOVED
 
 #define OS_MAP_ANON		MAP_ANON
 
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index ac408c9..aa90954 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -22,6 +22,9 @@
 #define FIO_HAVE_TRIM
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CPU_AFFINITY
+/* Only have attach-to-open-removed when kern.ipc.shm_allow_removed is 1 */
+#undef  FIO_HAVE_SHM_ATTACH_REMOVED
+
 
 #define OS_MAP_ANON		MAP_ANON
 
diff --git a/os/os-linux.h b/os/os-linux.h
index 06235ab..1829829 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -41,6 +41,7 @@
 #define FIO_HAVE_GETTID
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_PWRITEV2
+#define FIO_HAVE_SHM_ATTACH_REMOVED
 
 #ifdef MAP_HUGETLB
 #define FIO_HAVE_MMAP_HUGE
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 3343cbd..4700572 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -25,6 +25,9 @@
 
 #undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
 
+/* Only OpenBSD 5.1 and above have attach-to-open-removed semantics */
+#undef  FIO_HAVE_SHM_ATTACH_REMOVED
+
 #define OS_MAP_ANON		MAP_ANON
 
 #ifndef PTHREAD_STACK_MIN

* Recent changes (master)
@ 2017-01-24 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-24 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9cf163b0b3f69df8ef70d5a0799d9452e80ee2c4:

  Add missing opt/cat group entries (2017-01-20 10:50:16 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bd4d9bdc5097c3b35b5172508e1a2828296e01c2:

  Remove/Move Linux specific sysfs_root field from thread_data (2017-01-23 08:26:12 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (8):
      Define pointer alignment macro in fio.h
      Use ARRAY_SIZE()
      Fix wrong comment on exit condition of threads/processes
      Remove unused io_u's priv union field
      Remove unused disk_util's name field
      Add missing free(td->sysfs_root);
      Fix bad pointer du->sysfs_root
      Remove/Move Linux specific sysfs_root field from thread_data

 backend.c      | 14 ++++++++------
 diskutil.c     | 12 +++---------
 diskutil.h     |  1 -
 fio.h          |  7 +++++--
 gclient.c      |  2 +-
 gfio.c         |  2 +-
 ioengine.h     |  1 -
 lib/memalign.c |  4 +---
 lib/num2str.c  |  4 +---
 smalloc.c      |  3 ++-
 10 files changed, 22 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 4570d8d..1c1f2f9 100644
--- a/backend.c
+++ b/backend.c
@@ -76,9 +76,6 @@ int shm_id = 0;
 int temp_stall_ts;
 unsigned long done_secs = 0;
 
-#define PAGE_ALIGN(buf)	\
-	(char *) (((uintptr_t) (buf) + page_mask) & ~page_mask)
-
 #define JOB_START_TIMEOUT	(5 * 1000)
 
 static void sig_int(int sig)
@@ -1198,7 +1195,7 @@ static int init_io_u(struct thread_data *td)
 
 	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
 	    td_ioengine_flagged(td, FIO_RAWIO))
-		p = PAGE_ALIGN(td->orig_buffer) + td->o.mem_align;
+		p = PTR_ALIGN(td->orig_buffer, page_mask) + td->o.mem_align;
 	else
 		p = td->orig_buffer;
 
@@ -1264,6 +1261,10 @@ static int init_io_u(struct thread_data *td)
 	return 0;
 }
 
+/*
+ * This function is Linux specific.
+ * FIO_HAVE_IOSCHED_SWITCH enabled currently means it's Linux.
+ */
 static int switch_ioscheduler(struct thread_data *td)
 {
 #ifdef FIO_HAVE_IOSCHED_SWITCH
@@ -1274,7 +1275,8 @@ static int switch_ioscheduler(struct thread_data *td)
 	if (td_ioengine_flagged(td, FIO_DISKLESSIO))
 		return 0;
 
-	sprintf(tmp, "%s/queue/scheduler", td->sysfs_root);
+	assert(td->files && td->files[0]);
+	sprintf(tmp, "%s/queue/scheduler", td->files[0]->du->sysfs_root);
 
 	f = fopen(tmp, "r+");
 	if (!f) {
@@ -1362,7 +1364,7 @@ static bool keep_running(struct thread_data *td)
 		uint64_t diff;
 
 		/*
-		 * If the difference is less than the minimum IO size, we
+		 * If the difference is less than the maximum IO size, we
 		 * are done.
 		 */
 		diff = limit - ddir_rw_sum(td->io_bytes);
diff --git a/diskutil.c b/diskutil.c
index 27ddb46..c3bcec9 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -37,6 +37,7 @@ static void disk_util_free(struct disk_util *du)
 	}
 
 	fio_mutex_remove(du->lock);
+	free(du->sysfs_root);
 	sfree(du);
 }
 
@@ -305,7 +306,7 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 		return NULL;
 	}
 	strncpy((char *) du->dus.name, basename(path), FIO_DU_NAME_SZ - 1);
-	du->sysfs_root = path;
+	du->sysfs_root = strdup(path);
 	du->major = majdev;
 	du->minor = mindev;
 	INIT_FLIST_HEAD(&du->slavelist);
@@ -430,9 +431,6 @@ static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 		sprintf(path, "%s", tmp);
 	}
 
-	if (td->o.ioscheduler && !td->sysfs_root)
-		td->sysfs_root = strdup(path);
-
 	return disk_util_add(td, majdev, mindev, path);
 }
 
@@ -451,12 +449,8 @@ static struct disk_util *init_per_file_disk_util(struct thread_data *td,
 			mindev);
 
 	du = disk_util_exists(majdev, mindev);
-	if (du) {
-		if (td->o.ioscheduler && !td->sysfs_root)
-			td->sysfs_root = strdup(du->sysfs_root);
-
+	if (du)
 		return du;
-	}
 
 	/*
 	 * for an fs without a device, we will repeatedly stat through
diff --git a/diskutil.h b/diskutil.h
index ff8a5b0..04fdde2 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -46,7 +46,6 @@ struct disk_util {
 	 */
 	struct flist_head slavelist;
 
-	char *name;
 	char *sysfs_root;
 	char path[PATH_MAX];
 	int major, minor;
diff --git a/fio.h b/fio.h
index b2dade9..19ac0af 100644
--- a/fio.h
+++ b/fio.h
@@ -205,8 +205,6 @@ struct thread_data {
 	void *iolog_buf;
 	FILE *iolog_f;
 
-	char *sysfs_root;
-
 	unsigned long rand_seeds[FIO_RAND_NR_OFFS];
 
 	struct frand_state bsrange_state;
@@ -619,6 +617,11 @@ extern int __must_check allocate_io_mem(struct thread_data *);
 extern void free_io_mem(struct thread_data *);
 extern void free_threads_shm(void);
 
+#ifdef FIO_INTERNAL
+#define PTR_ALIGN(ptr, mask)	\
+	(char *) (((uintptr_t) (ptr) + (mask)) & ~(mask))
+#endif
+
 /*
  * Reset stats after ramp time completes
  */
diff --git a/gclient.c b/gclient.c
index 5ce33d0..928a1b7 100644
--- a/gclient.c
+++ b/gclient.c
@@ -48,7 +48,7 @@ static GtkActionEntry results_menu_items[] = {
 	{ "PrintFile", GTK_STOCK_PRINT, "Print", "<Control>P", NULL, G_CALLBACK(results_print) },
 	{ "CloseFile", GTK_STOCK_CLOSE, "Close", "<Control>W", NULL, G_CALLBACK(results_close) },
 };
-static gint results_nmenu_items = sizeof(results_menu_items) / sizeof(results_menu_items[0]);
+static gint results_nmenu_items = ARRAY_SIZE(results_menu_items);
 
 static const gchar *results_ui_string = " \
 	<ui> \
diff --git a/gfio.c b/gfio.c
index 9ccf78c..9c917cb 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1271,7 +1271,7 @@ static GtkActionEntry menu_items[] = {
 	{ "Quit", GTK_STOCK_QUIT, NULL,   "<Control>Q", NULL, G_CALLBACK(quit_clicked) },
 	{ "About", GTK_STOCK_ABOUT, NULL,  NULL, NULL, G_CALLBACK(about_dialog) },
 };
-static gint nmenu_items = sizeof(menu_items) / sizeof(menu_items[0]);
+static gint nmenu_items = ARRAY_SIZE(menu_items);
 
 static const gchar *ui_string = " \
 	<ui> \
diff --git a/ioengine.h b/ioengine.h
index 89873e7..7249df6 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -123,7 +123,6 @@ struct io_u {
 		struct ibv_mr *mr;
 #endif
 		void *mmap_data;
-		uint64_t null;
 	};
 };
 
diff --git a/lib/memalign.c b/lib/memalign.c
index cfd6e46..1d1ba9b 100644
--- a/lib/memalign.c
+++ b/lib/memalign.c
@@ -3,14 +3,12 @@
 #include <inttypes.h>
 
 #include "memalign.h"
+#include "../fio.h"
 
 struct align_footer {
 	unsigned int offset;
 };
 
-#define PTR_ALIGN(ptr, mask)	\
-	(char *) (((uintptr_t) ((ptr) + (mask)) & ~(mask)))
-
 void *fio_memalign(size_t alignment, size_t size)
 {
 	struct align_footer *f;
diff --git a/lib/num2str.c b/lib/num2str.c
index 940d4a5..ed3545d 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -4,8 +4,6 @@
 
 #include "../fio.h"
 
-#define ARRAY_LENGTH(arr)	sizeof(arr) / sizeof((arr)[0])
-
 /**
  * num2str() - Cheesy number->string conversion, complete with carry rounding error.
  * @num: quantity (e.g., number of blocks, bytes or bits)
@@ -75,7 +73,7 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 
 	if (modulo == -1U) {
 done:
-		if (post_index >= ARRAY_LENGTH(sistr))
+		if (post_index >= ARRAY_SIZE(sistr))
 			post_index = 0;
 
 		sprintf(buf, "%llu%s%s", (unsigned long long) num,
diff --git a/smalloc.c b/smalloc.c
index d038ac6..e48cfe8 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -13,6 +13,7 @@
 #include <limits.h>
 #include <fcntl.h>
 
+#include "fio.h"
 #include "mutex.h"
 #include "arch/arch.h"
 #include "os/os.h"
@@ -248,7 +249,7 @@ static void *postred_ptr(struct block_hdr *hdr)
 	uintptr_t ptr;
 
 	ptr = (uintptr_t) hdr + hdr->size - sizeof(unsigned int);
-	ptr = (ptr + int_mask) & ~int_mask;
+	ptr = (uintptr_t) PTR_ALIGN(ptr, int_mask);
 
 	return (void *) ptr;
 }

* Recent changes (master)
@ 2017-01-21 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-21 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit f70afaca743f2971312d9928f069a9ea7daeccf7:

  Merge branch 'sphinx-doc' of https://github.com/termim/fio (2017-01-19 22:01:48 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9cf163b0b3f69df8ef70d5a0799d9452e80ee2c4:

  Add missing opt/cat group entries (2017-01-20 10:50:16 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      Change .category of cpuio to FIO_OPT_C_ENGINE
      Add missing opt/cat group entries

 engines/cpu.c |  6 +++---
 optgroup.c    | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 49 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/engines/cpu.c b/engines/cpu.c
index 3d855e3..d0b4a89 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -22,7 +22,7 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct cpu_options, cpuload),
 		.help	= "Use this percentage of CPU",
-		.category = FIO_OPT_C_GENERAL,
+		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
@@ -34,7 +34,7 @@ static struct fio_option options[] = {
 		.def	= "50000",
 		.parent = "cpuload",
 		.hide	= 1,
-		.category = FIO_OPT_C_GENERAL,
+		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
@@ -44,7 +44,7 @@ static struct fio_option options[] = {
 		.off1	= offsetof(struct cpu_options, exit_io_done),
 		.help	= "Exit when IO threads finish",
 		.def	= "0",
-		.category = FIO_OPT_C_GENERAL,
+		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
diff --git a/optgroup.c b/optgroup.c
index 5f9ca96..122d24e 100644
--- a/optgroup.c
+++ b/optgroup.c
@@ -31,16 +31,16 @@ static const struct opt_group fio_opt_groups[] = {
 		.mask	= FIO_OPT_C_PROFILE,
 	},
 	{
+		.name	= "I/O engines",
+		.mask	= FIO_OPT_C_ENGINE,
+	},
+	{
 		.name	= NULL,
 	},
 };
 
 static const struct opt_group fio_opt_cat_groups[] = {
 	{
-		.name	= "Latency profiling",
-		.mask	= FIO_OPT_G_LATPROF,
-	},
-	{
 		.name	= "Rate",
 		.mask	= FIO_OPT_G_RATE,
 	},
@@ -125,13 +125,52 @@ static const struct opt_group fio_opt_cat_groups[] = {
 		.mask	= FIO_OPT_G_TIOBENCH,
 	},
 	{
-		.name	= "MTD",
+		.name	= "Error handling",
+		.mask	= FIO_OPT_G_ERR,
+	},
+	{
+		.name	= "Ext4 defrag I/O engine", /* e4defrag */
+		.mask	= FIO_OPT_G_E4DEFRAG,
+	},
+	{
+		.name	= "Network I/O engine", /* net */
+		.mask	= FIO_OPT_G_NETIO,
+	},
+	{
+		.name	= "RDMA I/O engine", /* rdma */
+		.mask	= FIO_OPT_G_RDMA,
+	},
+	{
+		.name	= "libaio I/O engine", /* libaio */
+		.mask	= FIO_OPT_G_LIBAIO,
+	},
+	{
+		.name	= "ACT Aerospike like benchmark profile",
+		.mask	= FIO_OPT_G_ACT,
+	},
+	{
+		.name	= "Latency profiling",
+		.mask	= FIO_OPT_G_LATPROF,
+	},
+	{
+		.name	= "RBD I/O engine", /* rbd */
+		.mask	= FIO_OPT_G_RBD,
+	},
+	{
+		.name	= "GlusterFS I/O engine", /* gfapi,gfapi_async */
+		.mask	= FIO_OPT_G_GFAPI,
+	},
+	{
+		.name	= "MTD I/O engine", /* mtd */
 		.mask	= FIO_OPT_G_MTD,
 	},
-
+	{
+		.name	= "libhdfs I/O engine", /* libhdfs */
+		.mask	= FIO_OPT_G_HDFS,
+	},
 	{
 		.name	= NULL,
-	}
+	},
 };
 
 static const struct opt_group *group_from_mask(const struct opt_group *ogs,

* Recent changes (master)
@ 2017-01-20 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-20 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 54441e7b5f44d3786ee12b3fede90c4bb2c2c260:

  init: fix double free of pid_file (2017-01-18 08:18:28 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f70afaca743f2971312d9928f069a9ea7daeccf7:

  Merge branch 'sphinx-doc' of https://github.com/termim/fio (2017-01-19 22:01:48 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'sphinx-doc' of https://github.com/termim/fio into sphinx
      Merge branch 'sphinx-doc' of https://github.com/termim/fio
      Merge branch 'sphinx-doc' of https://github.com/termim/fio

Mikhail Terekhov (6):
      run sphinx-quickstart
      Convert documentation to reStructured text.
      add .rst suffix to README and HOWTO
      Revert README and HOWTO files renaming.
      ignore documentation output directory
      remove documentation output in clean

Tomohiro Kusumi (5):
      Rename FIO_TYPE_BD to FIO_TYPE_BLOCK
      Fix typo for "job" in plural
      Refactor fio_show_ioengine_help()
      Move options_mem_dupe() to parse.c
      Change td_var() to take void* for the first arg

 .gitignore           |    1 +
 HOWTO                | 5484 +++++++++++++++++++++++++++++---------------------
 Makefile             |    1 +
 README               |  443 ++--
 backend.c            |    2 +-
 doc/Makefile         |  225 +++
 doc/conf.py          |  360 ++++
 doc/fio_doc.rst      |   51 +
 doc/fio_examples.rst |   62 +
 doc/fio_man.rst      |   11 +
 doc/index.rst        |   25 +
 doc/make.bat         |  281 +++
 engines/binject.c    |    2 +-
 engines/mmap.c       |    2 +-
 engines/sg.c         |    8 +-
 file.h               |    2 +-
 filesetup.c          |   10 +-
 fio.h                |    1 -
 init.c               |    4 +-
 ioengines.c          |   20 +-
 options.c            |   19 +-
 parse.c              |   17 +
 parse.h              |    4 +-
 23 files changed, 4393 insertions(+), 2642 deletions(-)
 create mode 100644 doc/Makefile
 create mode 100644 doc/conf.py
 create mode 100644 doc/fio_doc.rst
 create mode 100644 doc/fio_examples.rst
 create mode 100644 doc/fio_man.rst
 create mode 100644 doc/index.rst
 create mode 100644 doc/make.bat

---

Diff of recent changes:

Too large to post


* Recent changes (master)
@ 2017-01-19 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-19 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit d7ea5a89fa2745f5bd743187b62bd05120c44c30:

  Fio 2.17 (2017-01-17 08:51:31 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 54441e7b5f44d3786ee12b3fede90c4bb2c2c260:

  init: fix double free of pid_file (2017-01-18 08:18:28 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      init: fix double free of pid_file

 init.c | 3 ---
 1 file changed, 3 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index ae20d61..324dc7b 100644
--- a/init.c
+++ b/init.c
@@ -2717,9 +2717,6 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 	}
 
 out_free:
-	if (pid_file)
-		free(pid_file);
-
 	return ini_idx;
 }
 

* Recent changes (master)
@ 2017-01-18 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-18 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit bc0c01e5d03d27e80d2a3b85ab21714bb6f32a19:

  Support zlib in the Windows build (enabled latency histogram logging) (2017-01-11 21:03:23 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d7ea5a89fa2745f5bd743187b62bd05120c44c30:

  Fio 2.17 (2017-01-17 08:51:31 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.17

wei xiao (1):
      Drop crc32c-arm64 option

 FIO-VERSION-GEN        | 2 +-
 HOWTO                  | 5 -----
 options.c              | 4 ----
 os/windows/install.wxs | 2 +-
 verify.c               | 1 -
 verify.h               | 1 -
 6 files changed, 2 insertions(+), 13 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index b324859..e2d8a43 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.16
+DEF_VER=fio-2.17
 
 LF='
 '
diff --git a/HOWTO b/HOWTO
index 9c8a837..9ba511b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1517,11 +1517,6 @@ verify=str	If writing to a file, fio can verify the file contents
 				back to regular software crc32c, if not
 				supported by the system.
 
-			crc32c-arm64 Use hardware assisted crc32c calculation
-				provided on CRC enabled ARM 64-bits processors.
-				Falls back to regular software crc32c, if not
-				supported by the system.
-
 			crc32	Use a crc32 sum of the data area and store
 				it in the header of each block.
 
diff --git a/options.c b/options.c
index 5886c50..1ca16e8 100644
--- a/options.c
+++ b/options.c
@@ -2647,10 +2647,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = VERIFY_CRC32C,
 			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
 			  },
-			  { .ival = "crc32c-arm64",
-			    .oval = VERIFY_CRC32C,
-			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
-			  },
 			  { .ival = "crc32c",
 			    .oval = VERIFY_CRC32C,
 			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index b660fc6..22b7f7e 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.16">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.17">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/verify.c b/verify.c
index 02cd3a4..5c7e43d 100644
--- a/verify.c
+++ b/verify.c
@@ -1211,7 +1211,6 @@ nothing:
 void fio_verify_init(struct thread_data *td)
 {
 	if (td->o.verify == VERIFY_CRC32C_INTEL ||
-	    td->o.verify == VERIFY_CRC32C_ARM64 ||
 	    td->o.verify == VERIFY_CRC32C) {
 		crc32c_arm64_probe();
 		crc32c_intel_probe();
diff --git a/verify.h b/verify.h
index 8d40ff6..deb161e 100644
--- a/verify.h
+++ b/verify.h
@@ -15,7 +15,6 @@ enum {
 	VERIFY_CRC64,			/* crc64 sum data blocks */
 	VERIFY_CRC32,			/* crc32 sum data blocks */
 	VERIFY_CRC32C,			/* crc32c sum data blocks */
-	VERIFY_CRC32C_ARM64,		/* crc32c sum data blocks with hw */
 	VERIFY_CRC32C_INTEL,		/* crc32c sum data blocks with hw */
 	VERIFY_CRC16,			/* crc16 sum data blocks */
 	VERIFY_CRC7,			/* crc7 sum data blocks */

* Re: Recent changes (master)
  2017-01-17 15:51   ` Jens Axboe
@ 2017-01-17 16:03     ` Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-17 16:03 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory), fio

On 01/17/2017 07:51 AM, Jens Axboe wrote:
> On 01/17/2017 06:42 AM, Elliott, Robert (Persistent Memory) wrote:
>> Could we get a tag for fio-2.17 to cover the last month's worth of changes?
> 
> Yes, I'm going to cut a release shortly, wanted to do that before
> pulling in the documentation changes anyway.

Done

-- 
Jens Axboe



* Re: Recent changes (master)
  2017-01-17 14:42 ` Elliott, Robert (Persistent Memory)
@ 2017-01-17 15:51   ` Jens Axboe
  2017-01-17 16:03     ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2017-01-17 15:51 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory), fio

On 01/17/2017 06:42 AM, Elliott, Robert (Persistent Memory) wrote:
> Could we get a tag for fio-2.17 to cover the last month's worth of changes?

Yes, I'm going to cut a release shortly, wanted to do that before
pulling in the documentation changes anyway.

-- 
Jens Axboe



* RE: Recent changes (master)
  2017-01-13 13:00 Jens Axboe
@ 2017-01-17 14:42 ` Elliott, Robert (Persistent Memory)
  2017-01-17 15:51   ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2017-01-17 14:42 UTC (permalink / raw)
  To: Jens Axboe, fio

Could we get a tag for fio-2.17 to cover the last month's worth of changes?

 


* Recent changes (master)
@ 2017-01-13 13:00 Jens Axboe
  2017-01-17 14:42 ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2017-01-13 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 0cbb3f53ebeec45478c7b361c2a84092da93e4a8:

  pmemblk: Clarify fsize is in MiB not MB (2017-01-11 21:00:01 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bc0c01e5d03d27e80d2a3b85ab21714bb6f32a19:

  Support zlib in the Windows build (enabled latency histogram logging) (2017-01-11 21:03:23 -0700)

----------------------------------------------------------------
Rebecca Cran (1):
      Support zlib in the Windows build (enabled latency histogram logging)

 configure | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index d768e9d..d0c2173 100755
--- a/configure
+++ b/configure
@@ -281,8 +281,18 @@ CYGWIN*)
   if test -z "$CC" ; then
     if test ! -z "$build_32bit_win" && test "$build_32bit_win" = "yes"; then
       CC="i686-w64-mingw32-gcc"
+      if test -e "../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
+        echo "Building with zlib support"
+        output_sym "CONFIG_ZLIB"
+        echo "LIBS=../zlib/contrib/vstudio/vc14/x86/ZlibStatReleaseWithoutAsm/zlibstat.lib" >> $config_host_mak
+      fi
     else
       CC="x86_64-w64-mingw32-gcc"
+      if test -e "../zlib/contrib/vstudio/vc14/x64/ZlibStatReleaseWithoutAsm/zlibstat.lib"; then
+        echo "Building with zlib support"
+        output_sym "CONFIG_ZLIB"
+        echo "LIBS=../zlib/contrib/vstudio/vc14/x64/ZlibStatReleaseWithoutAsm/zlibstat.lib" >> $config_host_mak
+      fi
     fi
   fi
   output_sym "CONFIG_LITTLE_ENDIAN"
@@ -306,7 +316,8 @@ CYGWIN*)
   output_sym "CONFIG_TLS_THREAD"
   output_sym "CONFIG_IPV6"
   echo "CC=$CC" >> $config_host_mak
-  echo "BUILD_CFLAGS=$CFLAGS -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
+  echo "BUILD_CFLAGS=$CFLAGS -I../zlib -include config-host.h -D_GNU_SOURCE" >> $config_host_mak
+
   exit 0
   ;;
 esac

* Recent changes (master)
@ 2017-01-12 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2017-01-12 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9bae252254d0ce37a144cb4c8d2cb4222d539a9e:

  Python style/portability fix (2017-01-10 13:20:30 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0cbb3f53ebeec45478c7b361c2a84092da93e4a8:

  pmemblk: Clarify fsize is in MiB not MB (2017-01-11 21:00:01 -0700)

----------------------------------------------------------------
Robert Elliott (4):
      pmemblk, dev-dax: load libpmem and libpmemblk at startup
      pmemblk, dev-dax: Update descriptions
      pmemblk, dev-dax: clean up error logs
      pmemblk: Clarify fsize is in MiB not MB

 HOWTO             |  10 +++--
 configure         |  20 ++++++----
 engines/dev-dax.c |  44 ++++++--------------
 engines/pmemblk.c | 117 +++++++++++++++---------------------------------------
 fio.1             |   6 ++-
 5 files changed, 66 insertions(+), 131 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 33f8718..9c8a837 100644
--- a/HOWTO
+++ b/HOWTO
@@ -904,11 +904,13 @@ ioengine=str	Defines how the job issues io to the file. The following
 				overwriting. The writetrim mode works well
 				for this constraint.
 
-			pmemblk	Read and write through the NVML libpmemblk
-				interface.
+			pmemblk	Read and write using filesystem DAX to a file
+				on a filesystem mounted with DAX on a persistent
+				memory device through the NVML libpmemblk library.
 
-			dev-dax Read and write through a DAX device exposed
-				from persistent memory.
+			dev-dax Read and write using device DAX to a persistent
+				memory device (e.g., /dev/dax0.0) through the
+				NVML libpmem library.
 
 			external Prefix to specify loading an external
 				IO engine object file. Append the engine
diff --git a/configure b/configure
index 7de88f8..d768e9d 100755
--- a/configure
+++ b/configure
@@ -1583,26 +1583,32 @@ int main(int argc, char **argv)
 EOF
 if compile_prog "" "-lpmem" "libpmem"; then
   libpmem="yes"
+  LIBS="-lpmem $LIBS"
 fi
 echo "libpmem                       $libpmem"
 
 ##########################################
 # Check whether we have libpmemblk
+# libpmem is a prerequisite
 libpmemblk="no"
-cat > $TMPC << EOF
+if test "$libpmem" = "yes"; then
+  cat > $TMPC << EOF
 #include <libpmemblk.h>
 int main(int argc, char **argv)
 {
-  int rc;
-  rc = pmemblk_open("", 0);
+  PMEMblkpool *pbp;
+  pbp = pmemblk_open("", 0);
   return 0;
 }
 EOF
-if compile_prog "" "-lpmemblk -lpmem" "libpmemblk"; then
-  libpmemblk="yes"
+  if compile_prog "" "-lpmemblk" "libpmemblk"; then
+    libpmemblk="yes"
+    LIBS="-lpmemblk $LIBS"
+  fi
 fi
 echo "libpmemblk                    $libpmemblk"
 
+# Choose the ioengines
 if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
   devdax="yes"
   if test "$libpmemblk" = "yes"; then
@@ -1612,11 +1618,11 @@ fi
 
 ##########################################
 # Report whether pmemblk engine is enabled
-echo "NVML libpmemblk engine        $pmemblk"
+echo "NVML pmemblk engine           $pmemblk"
 
 ##########################################
 # Report whether dev-dax engine is enabled
-echo "NVML Device Dax engine        $devdax"
+echo "NVML dev-dax engine           $devdax"
 
 # Check if we have lex/yacc available
 yacc="no"
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index 2516bca..235a31e 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -51,8 +51,8 @@
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <sys/sysmacros.h>
-#include <dlfcn.h>
 #include <libgen.h>
+#include <libpmem.h>
 
 #include "../fio.h"
 #include "../verify.h"
@@ -69,8 +69,6 @@ struct fio_devdax_data {
 	off_t devdax_off;
 };
 
-static void * (*pmem_memcpy_persist)(void *dest, const void *src, size_t len);
-
 static int fio_devdax_file(struct thread_data *td, struct fio_file *f,
 			   size_t length, off_t off)
 {
@@ -108,7 +106,7 @@ static int fio_devdax_prep_limited(struct thread_data *td, struct io_u *io_u)
 	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
 
 	if (io_u->buflen > f->real_file_size) {
-		log_err("fio: bs too big for dev-dax engine\n");
+		log_err("dev-dax: bs too big for dev-dax engine\n");
 		return EIO;
 	}
 
@@ -212,29 +210,11 @@ static int fio_devdax_queue(struct thread_data *td, struct io_u *io_u)
 static int fio_devdax_init(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
-	const char *path;
-	void *dl;
 
 	if ((o->rw_min_bs & page_mask) &&
 	    (o->fsync_blocks || o->fdatasync_blocks)) {
-		log_err("fio: mmap options dictate a minimum block size of "
-			"%llu bytes\n", (unsigned long long) page_size);
-		return 1;
-	}
-
-	path = getenv("FIO_PMEM_LIB");
-	if (!path)
-		path = "libpmem.so";
-
-	dl = dlopen(path, RTLD_NOW | RTLD_NODELETE);
-	if (!dl) {
-		log_err("fio: unable to open libpmem: %s\n", dlerror());
-		return 1;
-	}
-
-	pmem_memcpy_persist = dlsym(dl, "pmem_memcpy_persist");
-	if (!pmem_memcpy_persist) {
-		log_err("fio: unable to load libpmem: %s\n", dlerror());
+		log_err("dev-dax: mmap options dictate a minimum block size of %llu bytes\n",
+			(unsigned long long) page_size);
 		return 1;
 	}
 
@@ -292,8 +272,8 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	rc = stat(f->file_name, &st);
 	if (rc < 0) {
-		log_err("%s: failed to stat file %s: %d\n",
-			td->o.name, f->file_name, errno);
+		log_err("%s: failed to stat file %s (%s)\n",
+			td->o.name, f->file_name, strerror(errno));
 		return -errno;
 	}
 
@@ -302,8 +282,8 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	rpath = realpath(spath, npath);
 	if (!rpath) {
-		log_err("%s: realpath on %s failed: %d\n",
-			td->o.name, spath, errno);
+		log_err("%s: realpath on %s failed (%s)\n",
+			td->o.name, spath, strerror(errno));
 		return -errno;
 	}
 
@@ -318,15 +298,15 @@ fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	sfile = fopen(spath, "r");
 	if (!sfile) {
-		log_err("%s: fopen on %s failed: %d\n",
-			td->o.name, spath, errno);
+		log_err("%s: fopen on %s failed (%s)\n",
+			td->o.name, spath, strerror(errno));
 		return 1;
 	}
 
 	rc = fscanf(sfile, "%lu", &size);
 	if (rc < 0) {
-		log_err("%s: fscanf on %s failed: %d\n",
-			td->o.name, spath, errno);
+		log_err("%s: fscanf on %s failed (%s)\n",
+			td->o.name, spath, strerror(errno));
 		return 1;
 	}
 
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 5439da0..e8476f9 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -27,11 +27,11 @@
  *   ioengine=pmemblk
  *
  * Other relevant settings:
+ *   thread=1   REQUIRED
  *   iodepth=1
  *   direct=1
- *   thread=1   REQUIRED
  *   unlink=1
- *   filename=/pmem0/fiotestfile,BSIZE,FSIZEMB
+ *   filename=/mnt/pmem0/fiotestfile,BSIZE,FSIZEMiB
  *
  *   thread must be set to 1 for pmemblk as multiple processes cannot
  *     open the same block pool file.
@@ -39,23 +39,26 @@
  *   iodepth should be set to 1 as pmemblk is always synchronous.
  *   Use numjobs to scale up.
  *
- *   direct=1 is implied as pmemblk is always direct.
+ *   direct=1 is implied as pmemblk is always direct. A warning message
+ *   is printed if this is not specified.
+ *
+ *   unlink=1 removes the block pool file after testing, and is optional.
  *
- *   Can set unlink to 1 to remove the block pool file after testing.
+ *   The pmem device must have a DAX-capable filesystem and be mounted
+ *   with DAX enabled.  filename must point to a file on that filesystem.
+ *
+ *   Example:
+ *     mkfs.xfs /dev/pmem0
+ *     mkdir /mnt/pmem0
+ *     mount -o dax /dev/pmem0 /mnt/pmem0
  *
  *   When specifying the filename, if the block pool file does not already
- *   exist, then the pmemblk engine can create the pool file if you specify
+ *   exist, then the pmemblk engine creates the pool file if you specify
  *   the block and file sizes.  BSIZE is the block size in bytes.
- *   FSIZEMB is the pool file size in MB.
+ *   FSIZEMiB is the pool file size in MiB.
  *
  *   See examples/pmemblk.fio for more.
  *
- * libpmemblk.so
- *   By default, the pmemblk engine will let the system find the libpmemblk.so
- *   that it uses.  You can use an alternative libpmemblk by setting the
- *   FIO_PMEMBLK_LIB environment variable to the full path to the desired
- *   libpmemblk.so.
- *
  */
 
 #include <stdio.h>
@@ -64,68 +67,15 @@
 #include <sys/uio.h>
 #include <errno.h>
 #include <assert.h>
-#include <dlfcn.h>
 #include <string.h>
+#include <libpmem.h>
+#include <libpmemblk.h>
 
 #include "../fio.h"
 
 /*
  * libpmemblk
  */
-struct PMEMblkpool_s;
-typedef struct PMEMblkpool_s PMEMblkpool;
-
-static PMEMblkpool *(*pmemblk_create) (const char *, size_t, size_t, mode_t);
-static PMEMblkpool *(*pmemblk_open) (const char *, size_t);
-static void (*pmemblk_close) (PMEMblkpool *);
-static size_t(*pmemblk_nblock) (PMEMblkpool *);
-static size_t(*pmemblk_bsize) (PMEMblkpool *);
-static int (*pmemblk_read) (PMEMblkpool *, void *, off_t);
-static int (*pmemblk_write) (PMEMblkpool *, const void *, off_t);
-
-int load_libpmemblk(const char *path)
-{
-	void *dl;
-
-	if (!path)
-		path = "libpmemblk.so";
-
-	dl = dlopen(path, RTLD_NOW | RTLD_NODELETE);
-	if (!dl)
-		goto errorout;
-
-	pmemblk_create = dlsym(dl, "pmemblk_create");
-	if (!pmemblk_create)
-		goto errorout;
-	pmemblk_open = dlsym(dl, "pmemblk_open");
-	if (!pmemblk_open)
-		goto errorout;
-	pmemblk_close = dlsym(dl, "pmemblk_close");
-	if (!pmemblk_close)
-		goto errorout;
-	pmemblk_nblock = dlsym(dl, "pmemblk_nblock");
-	if (!pmemblk_nblock)
-		goto errorout;
-	pmemblk_bsize = dlsym(dl, "pmemblk_bsize");
-	if (!pmemblk_bsize)
-		goto errorout;
-	pmemblk_read = dlsym(dl, "pmemblk_read");
-	if (!pmemblk_read)
-		goto errorout;
-	pmemblk_write = dlsym(dl, "pmemblk_write");
-	if (!pmemblk_write)
-		goto errorout;
-
-	return 0;
-
-errorout:
-	log_err("fio: unable to load libpmemblk: %s\n", dlerror());
-	if (dl)
-		dlclose(dl);
-
-	return -1;
-}
-
 typedef struct fio_pmemblk_file *fio_pmemblk_file_t;
 
 struct fio_pmemblk_file {
@@ -187,7 +137,7 @@ static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
  * level, we allow the block size and file size to be appended
  * to the file name:
  *
- *   path[,bsize,fsizemb]
+ *   path[,bsize,fsizemib]
  *
  * note that we do not use the fio option "filesize" to dictate
  * the file size because we can only give libpmemblk the gross
@@ -197,7 +147,7 @@ static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
  * the final path without the parameters is returned in ppath.
  * the block size and file size are returned in pbsize and fsize.
  *
- * note that the user should specify the file size in MiB, but
+ * note that the user specifies the file size in MiB, but
  * we return bytes from here.
  */
 static void pmb_parse_path(const char *pathspec, char **ppath, uint64_t *pbsize,
@@ -206,7 +156,7 @@ static void pmb_parse_path(const char *pathspec, char **ppath, uint64_t *pbsize,
 	char *path;
 	char *s;
 	uint64_t bsize;
-	uint64_t fsizemb;
+	uint64_t fsizemib;
 
 	path = strdup(pathspec);
 	if (!path) {
@@ -216,14 +166,14 @@ static void pmb_parse_path(const char *pathspec, char **ppath, uint64_t *pbsize,
 
 	/* extract sizes, if given */
 	s = strrchr(path, ',');
-	if (s && (fsizemb = strtoull(s + 1, NULL, 10))) {
+	if (s && (fsizemib = strtoull(s + 1, NULL, 10))) {
 		*s = 0;
 		s = strrchr(path, ',');
 		if (s && (bsize = strtoull(s + 1, NULL, 10))) {
 			*s = 0;
 			*ppath = path;
 			*pbsize = bsize;
-			*pfsize = fsizemb << 20;
+			*pfsize = fsizemib << 20;
 			return;
 		}
 	}
@@ -250,11 +200,6 @@ static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
 
 	pmb = fio_pmemblk_cache_lookup(path);
 	if (!pmb) {
-		/* load libpmemblk if needed */
-		if (!pmemblk_open)
-			if (load_libpmemblk(getenv("FIO_PMEMBLK_LIB")))
-				goto error;
-
 		pmb = malloc(sizeof(*pmb));
 		if (!pmb)
 			goto error;
@@ -267,9 +212,8 @@ static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
 			    pmemblk_create(path, bsize, fsize, 0644);
 		}
 		if (!pmb->pmb_pool) {
-			log_err
-			    ("fio: enable to open pmemblk pool file (errno %d)\n",
-			     errno);
+			log_err("pmemblk: unable to open pmemblk pool file %s (%s)\n",
+			     path, strerror(errno));
 			goto error;
 		}
 
@@ -331,14 +275,14 @@ static int pmb_get_flags(struct thread_data *td, uint64_t *pflags)
 	if (!td->o.use_thread) {
 		if (!thread_warned) {
 			thread_warned = 1;
-			log_err("fio: must set thread=1 for pmemblk engine\n");
+			log_err("pmemblk: must set thread=1 for pmemblk engine\n");
 		}
 		return 1;
 	}
 
 	if (!td->o.odirect && !odirect_warned) {
 		odirect_warned = 1;
-		log_info("fio: direct == 0, but pmemblk is always direct\n");
+		log_info("pmemblk: direct == 0, but pmemblk is always direct\n");
 	}
 
 	if (td->o.allow_create)
@@ -410,14 +354,11 @@ static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 	unsigned long long off;
 	unsigned long len;
 	void *buf;
-	int (*blkop) (PMEMblkpool *, void *, off_t) = (void *)pmemblk_write;
 
 	fio_ro_check(td, io_u);
 
 	switch (io_u->ddir) {
 	case DDIR_READ:
-		blkop = pmemblk_read;
-		/* fall through */
 	case DDIR_WRITE:
 		off = io_u->offset;
 		len = io_u->xfer_buflen;
@@ -435,7 +376,11 @@ static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 		off /= pmb->pmb_bsize;
 		len /= pmb->pmb_bsize;
 		while (0 < len) {
-			if (0 != blkop(pmb->pmb_pool, buf, off)) {
+			if (io_u->ddir == DDIR_READ &&
+			   0 != pmemblk_read(pmb->pmb_pool, buf, off)) {
+				io_u->error = errno;
+				break;
+			} else if (0 != pmemblk_write(pmb->pmb_pool, buf, off)) {
 				io_u->error = errno;
 				break;
 			}
diff --git a/fio.1 b/fio.1
index f486276..56f2d11 100644
--- a/fio.1
+++ b/fio.1
@@ -811,10 +811,12 @@ and discarding before overwriting. The trimwrite mode works well for this
 constraint.
 .TP
 .B pmemblk
-Read and write through the NVML libpmemblk interface.
+Read and write using filesystem DAX to a file on a filesystem mounted with
+DAX on a persistent memory device through the NVML libpmemblk library.
 .TP
 .B dev-dax
-Read and write through a DAX device exposed from persistent memory.
+Read and write using device DAX to a persistent memory device
+(e.g., /dev/dax0.0) through the NVML libpmem library.
 .RE
 .P
 .RE
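
The pmemblk header comment above describes a filename of the form path[,bsize,fsizemib], where the pool file size is given in MiB and converted to bytes with a shift by 20. The following is a minimal standalone sketch of that parsing rule. It assumes the same right-to-left split on commas that pmb_parse_path() performs in the hunk above. The helper name parse_pool_spec() and its signature are hypothetical and are not part of fio's API.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "path,bsize,fsizemib".
 * On return *ppath is a malloc'd path without the size suffixes,
 * *pbsize is the block size in bytes and *pfsize the pool size in
 * bytes (MiB << 20). If both numeric suffixes are not present, the
 * whole spec is kept as the path and both sizes are 0.
 * Returns -1 only if the copy cannot be allocated.
 */
static int parse_pool_spec(const char *spec, char **ppath,
			   uint64_t *pbsize, uint64_t *pfsize)
{
	char *path = strdup(spec);
	char *s;
	uint64_t bsize, fsizemib;

	if (!path)
		return -1;

	*ppath = path;
	*pbsize = 0;
	*pfsize = 0;

	/* Last field: pool file size in MiB (0 or non-numeric = absent). */
	s = strrchr(path, ',');
	if (s && (fsizemib = strtoull(s + 1, NULL, 10))) {
		char *last = s;

		*last = '\0';
		/* Field before it: block size in bytes. */
		s = strrchr(path, ',');
		if (s && (bsize = strtoull(s + 1, NULL, 10))) {
			*s = '\0';
			*pbsize = bsize;
			*pfsize = fsizemib << 20;
			return 0;
		}
		/* Only one numeric suffix: treat it as part of the path. */
		*last = ',';
	}
	return 0;
}
```

The caller owns the returned path and must free() it.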

* Recent changes (master)
@ 2017-01-11 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-11 13:00 UTC
  To: fio

The following changes since commit 28c43a89ae13b648dd37269d288fbbea2550faa8:

  Fix comment on SCSI commands (2017-01-06 11:26:04 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9bae252254d0ce37a144cb4c8d2cb4222d539a9e:

  Python style/portability fix (2017-01-10 13:20:30 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (10):
      Bring in additional sg error cases from Linux kernel (and sg3) header
      Staticize pmemblk ioengine_ops
      Fix conditional/message for max lba for sg ioengine
      Non functional fixup for 16 bytes read capacity for sg ioengine
      Mention sg ioengine requires filename option
      Fix compile time errors for skeleton_external ioengine
      Partly revert 8172fe97 in 2008 (A few debug debug log fixes)
      Add missing trailing \n in dprint()
      Fix README regarding fio snapshots
      Python style/portability fix

 HOWTO                       |  3 ++-
 README                      |  5 +++--
 engines/pmemblk.c           |  2 +-
 engines/sg.c                | 48 ++++++++++++++++++++++++++++++++-------------
 engines/skeleton_external.c |  8 ++++----
 ioengine.h                  |  9 +++------
 iolog.c                     |  2 +-
 tools/fiologparser.py       | 14 ++++++-------
 8 files changed, 55 insertions(+), 36 deletions(-)
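
The sg error-detail cases extended in engines/sg.c in this series use a switch to map SCSI host status codes to names. The following minimal sketch expresses the same mapping as a lookup table with C99 designated initializers. It covers only the codes that appear in the diff (0x0d through 0x13). The table and function names are hypothetical. Any other code falls back to "Unknown", as the switch's default: case does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: only the host status codes visible in the diff. */
static const char *const sg_host_status_names[] = {
	[0x0d] = "SG_ERR_DID_REQUEUE",
	[0x0e] = "SG_ERR_DID_TRANSPORT_DISRUPTED",
	[0x0f] = "SG_ERR_DID_TRANSPORT_FAILFAST",
	[0x10] = "SG_ERR_DID_TARGET_FAILURE",
	[0x11] = "SG_ERR_DID_NEXUS_FAILURE",
	[0x12] = "SG_ERR_DID_ALLOC_FAILURE",
	[0x13] = "SG_ERR_DID_MEDIUM_ERROR",
};

static const char *sg_host_status_name(unsigned int status)
{
	size_t n = sizeof(sg_host_status_names) /
		   sizeof(sg_host_status_names[0]);

	/* Unlisted slots inside the array are NULL; out-of-range is unknown. */
	if (status < n && sg_host_status_names[status])
		return sg_host_status_names[status];
	return "Unknown";
}
```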

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4cc733f..33f8718 100644
--- a/HOWTO
+++ b/HOWTO
@@ -819,7 +819,8 @@ ioengine=str	Defines how the job issues io to the file. The following
 				synchronous using the SG_IO ioctl, or if
 				the target is an sg character device
 				we use read(2) and write(2) for asynchronous
-				io.
+				io. Requires filename option to specify either
+				block or character devices.
 
 			null	Doesn't transfer any data, just pretends
 				to. This is mainly used to exercise fio
diff --git a/README b/README
index 875d2be..31d53fe 100644
--- a/README
+++ b/README
@@ -21,7 +21,8 @@ If git:// does not work, use the http protocol instead:
 
 	http://git.kernel.dk/fio.git
 
-Snapshots are frequently generated and include the git meta data as well.
+Snapshots are frequently generated, and the fio-git-*.tar.gz snapshots
+include the git metadata as well. Other tarballs are archives of official fio releases.
 Snapshots can be downloaded from:
 
 	http://brick.kernel.dk/snaps/
@@ -262,7 +263,7 @@ the copyright and license requirements currently apply to examples/ files.
 
 
 Client/server
-------------
+-------------
 
 Normally fio is invoked as a stand-alone application on the machine
 where the IO workload should be generated. However, the frontend and
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index ca72697..5439da0 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -482,7 +482,7 @@ static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-struct ioengine_ops ioengine = {
+static struct ioengine_ops ioengine = {
 	.name = "pmemblk",
 	.version = FIO_IOOPS_VERSION,
 	.queue = fio_pmemblk_queue,
diff --git a/engines/sg.c b/engines/sg.c
index 2ad3394..3f7d911 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -20,7 +20,7 @@
 #define MAX_SB 64               // sense block maximum return size
 
 struct sgio_cmd {
-	unsigned char cdb[16];  	// increase to support 16 byte commands
+	unsigned char cdb[16];      // enhanced from 10 to support 16 byte commands
 	unsigned char sb[MAX_SB];   // add sense block to commands
 	int nr;
 };
@@ -32,7 +32,6 @@ struct sgio_data {
 	int *fd_flags;
 	void *sgbuf;
 	unsigned int bs;
-	long long max_lba;
 	int type_checked;
 };
 
@@ -309,7 +308,6 @@ static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 	 * blocks on medium.
 	 */
 	if (hdr->dxfer_direction != SG_DXFER_NONE) {
-
 		if (lba < MAX_10B_LBA) {
 			hdr->cmdp[2] = (unsigned char) ((lba >> 24) & 0xff);
 			hdr->cmdp[3] = (unsigned char) ((lba >> 16) & 0xff);
@@ -416,12 +414,11 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 	}
 
 	*bs	 = (buf[4] << 24) | (buf[5] << 16) | (buf[6] << 8) | buf[7];
-	*max_lba = ((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]) & 0x00000000FFFFFFFFULL;  // for some reason max_lba is being sign extended even though unsigned.
-
+	*max_lba = ((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3]) & MAX_10B_LBA;  // for some reason max_lba is being sign extended even though unsigned.
 
 	/*
-	 * If max lba is 0xFFFFFFFF, then need to retry with
-	 * 16 byteread capacity
+	 * If max lba masked by MAX_10B_LBA equals MAX_10B_LBA,
+	 * then need to retry with 16 byte Read Capacity command.
 	 */
 	if (*max_lba == MAX_10B_LBA) {
 		hdr.cmd_len = 16;
@@ -507,7 +504,6 @@ static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 	unsigned int bs = 0;
 	unsigned long long max_lba = 0;
 
-
 	if (f->filetype == FIO_TYPE_BD) {
 		if (ioctl(f->fd, BLKSSZGET, &bs) < 0) {
 			td_verror(td, errno, "ioctl");
@@ -529,18 +525,18 @@ static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 		}
 	} else {
 		td_verror(td, EINVAL, "wrong file type");
-		log_err("ioengine sg only works on block devices\n");
+		log_err("ioengine sg only works on block or character devices\n");
 		return 1;
 	}
 
 	sd->bs = bs;
 	// Determine size of commands needed based on max_lba
-	sd->max_lba = max_lba;
-	if (max_lba > MAX_10B_LBA) {
-		dprint(FD_IO, "sgio_type_check: using 16 byte operations: max_lba = 0x%016llx\n", max_lba);
+	if (max_lba >= MAX_10B_LBA) {
+		dprint(FD_IO, "sgio_type_check: using 16 byte read/write "
+			"commands for lba above 0x%016llx/0x%016llx\n",
+			MAX_10B_LBA, max_lba);
 	}
 
-
 	if (f->filetype == FIO_TYPE_BD) {
 		td->io_ops->getevents = NULL;
 		td->io_ops->event = NULL;
@@ -630,6 +626,24 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 			case 0x0d:
 				strlcat(msg, "SG_ERR_DID_REQUEUE", MAXERRDETAIL);
 				break;
+			case 0x0e:
+				strlcat(msg, "SG_ERR_DID_TRANSPORT_DISRUPTED", MAXERRDETAIL);
+				break;
+			case 0x0f:
+				strlcat(msg, "SG_ERR_DID_TRANSPORT_FAILFAST", MAXERRDETAIL);
+				break;
+			case 0x10:
+				strlcat(msg, "SG_ERR_DID_TARGET_FAILURE", MAXERRDETAIL);
+				break;
+			case 0x11:
+				strlcat(msg, "SG_ERR_DID_NEXUS_FAILURE", MAXERRDETAIL);
+				break;
+			case 0x12:
+				strlcat(msg, "SG_ERR_DID_ALLOC_FAILURE", MAXERRDETAIL);
+				break;
+			case 0x13:
+				strlcat(msg, "SG_ERR_DID_MEDIUM_ERROR", MAXERRDETAIL);
+				break;
 			default:
 				strlcat(msg, "Unknown", MAXERRDETAIL);
 				break;
@@ -775,6 +789,12 @@ static int fio_sgio_get_file_size(struct thread_data *td, struct fio_file *f)
 	if (fio_file_size_known(f))
 		return 0;
 
+	if (f->filetype != FIO_TYPE_BD && f->filetype != FIO_TYPE_CHAR) {
+		td_verror(td, EINVAL, "wrong file type");
+		log_err("ioengine sg only works on block or character devices\n");
+		return 1;
+	}
+
 	ret = fio_sgio_read_capacity(td, &bs, &max_lba);
 	if (ret ) {
 		td_verror(td, td->error, "fio_sgio_read_capacity");
@@ -800,7 +820,7 @@ static struct ioengine_ops ioengine = {
 	.cleanup	= fio_sgio_cleanup,
 	.open_file	= fio_sgio_open,
 	.close_file	= generic_close_file,
-	.get_file_size	= fio_sgio_get_file_size, // generic_get_file_size
+	.get_file_size	= fio_sgio_get_file_size,
 	.flags		= FIO_SYNCIO | FIO_RAWIO,
 };
 
diff --git a/engines/skeleton_external.c b/engines/skeleton_external.c
index 63a6f8d..4bebcc4 100644
--- a/engines/skeleton_external.c
+++ b/engines/skeleton_external.c
@@ -109,11 +109,11 @@ static void fio_skeleton_cleanup(struct thread_data *td)
 
 /*
  * Hook for opening the given file. Unless the engine has special
- * needs, it usually just provides generic_file_open() as the handler.
+ * needs, it usually just provides generic_open_file() as the handler.
  */
 static int fio_skeleton_open(struct thread_data *td, struct fio_file *f)
 {
-	return generic_file_open(td, f);
+	return generic_open_file(td, f);
 }
 
 /*
@@ -121,12 +121,12 @@ static int fio_skeleton_open(struct thread_data *td, struct fio_file *f)
  */
 static int fio_skeleton_close(struct thread_data *td, struct fio_file *f)
 {
-	generic_file_close(td, f);
+	return generic_close_file(td, f);
 }
 
 /*
  * Note that the structure is exported, so that fio can get it via
- * dlsym(..., "ioengine");
+ * dlsym(..., "ioengine"); for (and only for) external engines.
  */
 struct ioengine_ops ioengine = {
 	.name		= "engine_name",
diff --git a/ioengine.h b/ioengine.h
index 08e8fab..89873e7 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -238,12 +238,9 @@ static inline void dprint_io_u(struct io_u *io_u, const char *p)
 	dprint(FD_IO, "%s: io_u %p: off=%llu/len=%lu/ddir=%d", p, io_u,
 					(unsigned long long) io_u->offset,
 					io_u->buflen, io_u->ddir);
-	if (fio_debug & (1 << FD_IO)) {
-		if (f)
-			log_info("/%s", f->file_name);
-
-		log_info("\n");
-	}
+	if (f)
+		dprint(FD_IO, "/%s", f->file_name);
+	dprint(FD_IO, "\n");
 }
 #else
 #define dprint_io_u(io_u, p)
diff --git a/iolog.c b/iolog.c
index 25d8dd0..2e8da13 100644
--- a/iolog.c
+++ b/iolog.c
@@ -277,7 +277,7 @@ restart:
 			overlap = 1;
 
 		if (overlap) {
-			dprint(FD_IO, "iolog: overlap %llu/%lu, %llu/%lu",
+			dprint(FD_IO, "iolog: overlap %llu/%lu, %llu/%lu\n",
 				__ipo->offset, __ipo->len,
 				ipo->offset, ipo->len);
 			td->io_hist_len--;
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 685f419..5a95009 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -45,7 +45,7 @@ def print_full(ctx, series):
     while (start < ftime):
         end = ftime if ftime < end else end
         results = [ts.get_value(start, end) for ts in series]
-        print "%s, %s" % (end, ', '.join(["%0.3f" % i for i in results]))
+        print("%s, %s" % (end, ', '.join(["%0.3f" % i for i in results])))
         start += ctx.interval
         end += ctx.interval
 
@@ -57,7 +57,7 @@ def print_sums(ctx, series):
     while (start < ftime):
         end = ftime if ftime < end else end
         results = [ts.get_value(start, end) for ts in series]
-        print "%s, %0.3f" % (end, sum(results))
+        print("%s, %0.3f" % (end, sum(results)))
         start += ctx.interval
         end += ctx.interval
 
@@ -69,7 +69,7 @@ def print_averages(ctx, series):
     while (start < ftime):
         end = ftime if ftime < end else end
         results = [ts.get_value(start, end) for ts in series]
-        print "%s, %0.3f" % (end, float(sum(results))/len(results))
+        print("%s, %0.3f" % (end, float(sum(results))/len(results)))
         start += ctx.interval
         end += ctx.interval
 
@@ -147,11 +147,11 @@ def print_default(ctx, series):
         end += ctx.interval
 
     total = 0
-    for i in xrange(0, len(averages)):
+    for i in range(0, len(averages)):
         total += averages[i]*weights[i]
-    print '%0.3f' % (total/sum(weights))
+    print('%0.3f' % (total/sum(weights)))
  
-class TimeSeries():
+class TimeSeries(object):
     def __init__(self, ctx, fn):
         self.ctx = ctx
         self.last = None 
@@ -185,7 +185,7 @@ class TimeSeries():
             value += sample.get_contribution(start, end)
         return value
 
-class Sample():
+class Sample(object):
     def __init__(self, ctx, start, end, value):
        self.ctx = ctx
        self.start = start
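
The read_capacity hunk in engines/sg.c above masks max_lba because, in an expression like buf[0] << 24, the unsigned char is promoted to a signed int. When the top bit is set, the shift overflows into the sign bit. This is undefined behaviour in C and in practice yields a negative value that is sign-extended when widened to unsigned long long. The following minimal sketch decodes the 8-byte READ CAPACITY(10) response by casting each byte to uint32_t before shifting, so no masking is needed. It assumes MAX_10B_LBA is 0xFFFFFFFF, matching the "If max lba is 0xFFFFFFFF" comment being replaced. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: largest LBA a 10-byte CDB can report/address. */
#define MAX_10B_LBA 0xFFFFFFFFULL

/*
 * Hypothetical helper: decode a READ CAPACITY(10) response.
 * Bytes 0-3 hold the last LBA, bytes 4-7 the logical block length,
 * both big-endian. Returns 1 if the caller should retry with
 * READ CAPACITY(16), 0 otherwise.
 */
static int decode_read_capacity10(const unsigned char buf[8],
				  unsigned int *bs,
				  unsigned long long *max_lba)
{
	/* uint32_t operands: no promotion to signed int, no sign extension. */
	*bs = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
	      ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
	*max_lba = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		   ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	/* All ones means the capacity does not fit in 32 bits. */
	return *max_lba == MAX_10B_LBA;
}
```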

* Recent changes (master)
@ 2017-01-07 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-07 13:00 UTC
  To: fio

The following changes since commit 8c4e634a44ca35a21387b79ae6e701f951e2cb0c:

  init: cleaner gcd() (2017-01-05 10:38:41 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 28c43a89ae13b648dd37269d288fbbea2550faa8:

  Fix comment on SCSI commands (2017-01-06 11:26:04 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      Remove doubled ; in .c
      Fix comment on SCSI commands

 engines/net.c | 2 +-
 engines/sg.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/engines/net.c b/engines/net.c
index 3bdd5cd..37d44fd 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -1371,7 +1371,7 @@ static int fio_netio_setup(struct thread_data *td)
 	}
 
 	if (!td->io_ops_data) {
-		nd = malloc(sizeof(*nd));;
+		nd = malloc(sizeof(*nd));
 
 		memset(nd, 0, sizeof(*nd));
 		nd->listenfd = -1;
diff --git a/engines/sg.c b/engines/sg.c
index 001193d..2ad3394 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -425,8 +425,8 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 	 */
 	if (*max_lba == MAX_10B_LBA) {
 		hdr.cmd_len = 16;
-		hdr.cmdp[0] = 0x9e; // Read Capacity(16)
-		hdr.cmdp[1] = 0x10; // service action
+		hdr.cmdp[0] = 0x9e; // service action
+		hdr.cmdp[1] = 0x10; // Read Capacity(16)
 		hdr.cmdp[10] = (unsigned char) ((sizeof(buf) >> 24) & 0xff);
 		hdr.cmdp[11] = (unsigned char) ((sizeof(buf) >> 16) & 0xff);
 		hdr.cmdp[12] = (unsigned char) ((sizeof(buf) >> 8) & 0xff);
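
READ CAPACITY(16) is issued as a SERVICE ACTION IN(16) command. Byte 0 carries opcode 0x9e, byte 1 carries service action 0x10, and bytes 10-13 carry the big-endian allocation length. The following minimal sketch fills such a CDB and decodes the start of the response, assuming the SBC layout summarized here. In that layout, bytes 0-7 of the response hold the last LBA and bytes 8-11 hold the logical block length. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAI_16_OPCODE		0x9e	/* SERVICE ACTION IN(16) */
#define SA_READ_CAPACITY_16	0x10	/* service action: READ CAPACITY(16) */

/* Hypothetical helper: build a READ CAPACITY(16) CDB. */
static void build_read_capacity16(unsigned char cdb[16], uint32_t alloc_len)
{
	memset(cdb, 0, 16);
	cdb[0] = SAI_16_OPCODE;
	cdb[1] = SA_READ_CAPACITY_16;
	/* Allocation length, big-endian, bytes 10-13. */
	cdb[10] = (alloc_len >> 24) & 0xff;
	cdb[11] = (alloc_len >> 16) & 0xff;
	cdb[12] = (alloc_len >> 8) & 0xff;
	cdb[13] = alloc_len & 0xff;
}

/* Hypothetical helper: last LBA (bytes 0-7) and block length (8-11). */
static void decode_read_capacity16(const unsigned char *buf,
				   unsigned long long *max_lba,
				   unsigned int *bs)
{
	unsigned long long lba = 0;
	int i;

	for (i = 0; i < 8; i++)
		lba = (lba << 8) | buf[i];
	*max_lba = lba;
	*bs = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
	      ((uint32_t)buf[10] << 8) | (uint32_t)buf[11];
}
```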

* Recent changes (master)
@ 2017-01-06 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-06 13:00 UTC
  To: fio

The following changes since commit 847d544cce05157ec36f50b8214b26aff83aef01:

  Style cleanups for arm crc32c hw support (2017-01-04 19:44:35 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8c4e634a44ca35a21387b79ae6e701f951e2cb0c:

  init: cleaner gcd() (2017-01-05 10:38:41 -0700)

----------------------------------------------------------------
Jens Axboe (5):
      verify: use log_verify_failure() for pattern verifies
      Remove '--runtime' command line option
      verify: ensure that verify_interval is always a factor of min/max bs
      verify: fill in vc->name for pattern verify
      init: cleaner gcd()

Tomohiro Kusumi (2):
      Fix invalid ioengine initialization for cpp_null
      Don't malloc ioengine_ops for cpp_null

 README         |  1 -
 engines/null.c | 28 +++++++++++++---------------
 init.c         | 47 +++++++++++++++++++++--------------------------
 io_u.c         |  4 ----
 verify.c       |  3 ++-
 5 files changed, 36 insertions(+), 47 deletions(-)
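
The cpp_null change in engines/null.c replaces a per-call malloc() of struct ioengine_ops with a file-scope static instance. As a result, every call to get_ioengine() returns the same object and nothing is leaked. The following minimal C sketch shows that pattern with a hypothetical, trimmed-down ops structure. It is not fio's struct ioengine_ops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down operations table for illustration. */
struct demo_engine_ops {
	const char *name;
	int version;
	int (*queue)(void *td, void *io_u);
};

static int demo_queue(void *td, void *io_u)
{
	(void)td;
	(void)io_u;
	return 0;	/* pretend the I/O completed immediately */
}

/*
 * Static storage duration: no allocation, nothing to free, and
 * every caller receives a pointer to the same object.
 */
static struct demo_engine_ops demo_engine = {
	.name		= "demo_null",
	.version	= 1,
	.queue		= demo_queue,
};

void get_demo_engine(struct demo_engine_ops **ops_ptr)
{
	*ops_ptr = &demo_engine;
}
```

In C the table can be filled with designated initializers at file scope. The cpp_null engine is compiled as C++, which (before C++20) lacks designated initializers. That is presumably why the diff assigns the fields inside get_ioengine() instead.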

---

Diff of recent changes:

diff --git a/README b/README
index fdd5bec..875d2be 100644
--- a/README
+++ b/README
@@ -152,7 +152,6 @@ $ fio
 	--debug			Enable some debugging options (see below)
 	--parse-only		Parse options only, don't start any IO
 	--output		Write output to file
-	--runtime		Runtime in seconds
 	--bandwidth-log		Generate aggregate bandwidth logs
 	--minimal		Minimal (terse) output
 	--output-format=type	Output format (terse,json,json+,normal)
diff --git a/engines/null.c b/engines/null.c
index f7ba370..812cadf 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -135,23 +135,21 @@ static void fio_exit fio_null_unregister(void)
 
 #ifdef FIO_EXTERNAL_ENGINE
 extern "C" {
+static struct ioengine_ops ioengine;
 void get_ioengine(struct ioengine_ops **ioengine_ptr)
 {
-	struct ioengine_ops *ioengine;
-
-	*ioengine_ptr = (struct ioengine_ops *) malloc(sizeof(struct ioengine_ops));
-	ioengine = *ioengine_ptr;
-
-	strcpy(ioengine->name, "cpp_null");
-	ioengine->version        = FIO_IOOPS_VERSION;
-	ioengine->queue          = fio_null_queue;
-	ioengine->commit         = fio_null_commit;
-	ioengine->getevents      = fio_null_getevents;
-	ioengine->event          = fio_null_event;
-	ioengine->init           = fio_null_init;
-	ioengine->cleanup        = fio_null_cleanup;
-	ioengine->open_file      = fio_null_open;
-	ioengine->flags	         = FIO_DISKLESSIO | FIO_FAKEIO;
+	*ioengine_ptr = &ioengine;
+
+	ioengine.name           = "cpp_null";
+	ioengine.version        = FIO_IOOPS_VERSION;
+	ioengine.queue          = fio_null_queue;
+	ioengine.commit         = fio_null_commit;
+	ioengine.getevents      = fio_null_getevents;
+	ioengine.event          = fio_null_event;
+	ioengine.init           = fio_null_init;
+	ioengine.cleanup        = fio_null_cleanup;
+	ioengine.open_file      = fio_null_open;
+	ioengine.flags          = FIO_DISKLESSIO | FIO_FAKEIO;
 }
 }
 #endif /* FIO_EXTERNAL_ENGINE */
diff --git a/init.c b/init.c
index 9889949..ae20d61 100644
--- a/init.c
+++ b/init.c
@@ -40,7 +40,6 @@ const char fio_version_string[] = FIO_VERSION;
 static char **ini_file;
 static int max_jobs = FIO_MAX_JOBS;
 static int dump_cmdline;
-static long long def_timeout;
 static int parse_only;
 
 static struct thread_data def_thread;
@@ -94,11 +93,6 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.val		= 'o' | FIO_CLIENT_FLAG,
 	},
 	{
-		.name		= (char *) "runtime",
-		.has_arg	= required_argument,
-		.val		= 't' | FIO_CLIENT_FLAG,
-	},
-	{
 		.name		= (char *) "latency-log",
 		.has_arg	= required_argument,
 		.val		= 'l' | FIO_CLIENT_FLAG,
@@ -373,14 +367,6 @@ static int setup_thread_area(void)
 	return 0;
 }
 
-static void set_cmd_options(struct thread_data *td)
-{
-	struct thread_options *o = &td->o;
-
-	if (!o->timeout)
-		o->timeout = def_timeout;
-}
-
 static void dump_print_option(struct print_option *p)
 {
 	const char *delim;
@@ -451,10 +437,8 @@ static struct thread_data *get_new_job(int global, struct thread_data *parent,
 {
 	struct thread_data *td;
 
-	if (global) {
-		set_cmd_options(&def_thread);
+	if (global)
 		return &def_thread;
-	}
 	if (setup_thread_area()) {
 		log_err("error: failed to setup shm segment\n");
 		return NULL;
@@ -492,7 +476,6 @@ static struct thread_data *get_new_job(int global, struct thread_data *parent,
 	if (!parent->o.group_reporting || parent == &def_thread)
 		stat_number++;
 
-	set_cmd_options(td);
 	return td;
 }
 
@@ -582,6 +565,17 @@ static unsigned long long get_rand_start_delay(struct thread_data *td)
 }
 
 /*
+ * <3 Johannes
+ */
+static unsigned int gcd(unsigned int m, unsigned int n)
+{
+	if (!n)
+		return m;
+
+	return gcd(n, m % n);
+}
+
+/*
  * Lazy way of fixing up options that depend on each other. We could also
  * define option callback handlers, but this is easier.
  */
@@ -756,6 +750,15 @@ static int fixup_options(struct thread_data *td)
 			o->verify_interval = o->min_bs[DDIR_WRITE];
 		else if (td_read(td) && o->verify_interval > o->min_bs[DDIR_READ])
 			o->verify_interval = o->min_bs[DDIR_READ];
+
+		/*
+		 * Verify interval must be a factor of both min and max
+		 * write size
+		 */
+		if (o->verify_interval % o->min_bs[DDIR_WRITE] ||
+		    o->verify_interval % o->max_bs[DDIR_WRITE])
+			o->verify_interval = gcd(o->min_bs[DDIR_WRITE],
+							o->max_bs[DDIR_WRITE]);
 	}
 
 	if (o->pre_read) {
@@ -1997,7 +2000,6 @@ static void usage(const char *name)
 	show_debug_categories();
 	printf("  --parse-only\t\tParse options only, don't start any IO\n");
 	printf("  --output\t\tWrite output to file\n");
-	printf("  --runtime\t\tRuntime in seconds\n");
 	printf("  --bandwidth-log\tGenerate aggregate bandwidth logs\n");
 	printf("  --minimal\t\tMinimal (terse) output\n");
 	printf("  --output-format=type\tOutput format (terse,json,json+,normal)\n");
@@ -2313,13 +2315,6 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			smalloc_pool_size <<= 10;
 			sinit();
 			break;
-		case 't':
-			if (check_str_time(optarg, &def_timeout, 1)) {
-				log_err("fio: failed parsing time %s\n", optarg);
-				do_exit++;
-				exit_val = 1;
-			}
-			break;
 		case 'l':
 			log_err("fio: --latency-log is deprecated. Use per-job latency log options.\n");
 			do_exit++;
diff --git a/io_u.c b/io_u.c
index 7420629..1daaf7b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -576,10 +576,6 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 			}
 		}
 
-		if (td->o.verify != VERIFY_NONE)
-			buflen = (buflen + td->o.verify_interval - 1) &
-				~(td->o.verify_interval - 1);
-
 		if (!td->o.bs_unaligned && is_power_of_2(minbs))
 			buflen &= ~(minbs - 1);
 
diff --git a/verify.c b/verify.c
index 8733feb..02cd3a4 100644
--- a/verify.c
+++ b/verify.c
@@ -393,7 +393,8 @@ static int verify_io_u_pattern(struct verify_header *hdr, struct vcont *vc)
 				(unsigned char)pattern[mod],
 				bits);
 			log_err("fio: bad pattern block offset %u\n", i);
-			dump_verify_buffers(hdr, vc);
+			vc->name = "pattern";
+			log_verify_failure(hdr, vc);
 			return EILSEQ;
 		}
 		mod++;
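
The fixup_options() hunk in init.c states that verify_interval must be a factor of both the minimum and maximum write block size, and falls back to their greatest common divisor. The following minimal standalone sketch implements that stated rule, using an iterative form of the recursive gcd() added there. Because "factor of" means each block size must divide evenly by the interval, it tests bs % interval. The helper fix_verify_interval() is hypothetical.

```c
#include <assert.h>

/* Euclid's algorithm, iterative form of the recursive gcd() above. */
static unsigned int gcd(unsigned int m, unsigned int n)
{
	while (n) {
		unsigned int r = m % n;

		m = n;
		n = r;
	}
	return m;
}

/*
 * Hypothetical helper: keep the requested interval if it evenly
 * divides both block sizes, otherwise use gcd(min_bs, max_bs).
 */
static unsigned int fix_verify_interval(unsigned int interval,
					unsigned int min_bs,
					unsigned int max_bs)
{
	assert(min_bs && max_bs);

	if (!interval || min_bs % interval || max_bs % interval)
		return gcd(min_bs, max_bs);
	return interval;
}
```

For example, with bs=4k-6k an interval of 4096 does not divide 6144, so the interval becomes gcd(4096, 6144) = 2048.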

* Recent changes (master)
@ 2017-01-05 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-05 13:00 UTC
  To: fio

The following changes since commit 1684f7fd9047c7405264f462f76e1135c563ec33:

  Add missing .help string for io_size option (2017-01-03 10:10:58 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 847d544cce05157ec36f50b8214b26aff83aef01:

  Style cleanups for arm crc32c hw support (2017-01-04 19:44:35 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Style cleanups for arm crc32c hw support

wei xiao (1):
      Add arm64 hardware assisted crc32c support

 HOWTO               |   5 +++
 Makefile            |   4 +-
 arch/arch-aarch64.h |   4 ++
 configure           |  23 +++++++++++
 crc/crc32c-arm64.c  | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 crc/crc32c.h        |  14 +++++++
 crc/test.c          |   1 +
 lib/bloom.c         |   1 +
 options.c           |   4 ++
 verify.c            |   2 +
 verify.h            |   1 +
 11 files changed, 172 insertions(+), 2 deletions(-)
 create mode 100644 crc/crc32c-arm64.c
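
The new crc32c-arm64 verify option falls back to software crc32c when the CPU lacks the CRC instructions. For reference, the following is a minimal bitwise (table-free) sketch of the standard CRC-32C (Castagnoli) checksum. It uses the reflected polynomial 0x82F63B78, an all-ones initial value, and an all-ones final XOR. It is much slower than table-driven or hardware-assisted code and only pins down the checksum definition. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: reflected poly 0x82F63B78, init/final XOR 0xFFFFFFFF. */
static uint32_t crc32c_bitwise(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		int bit;

		crc ^= *p++;
		/* -(crc & 1u) is all ones when the low bit is set, else 0. */
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for the ASCII string "123456789" is 0xE3069283.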

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 4354e46..4cc733f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1514,6 +1514,11 @@ verify=str	If writing to a file, fio can verify the file contents
 				back to regular software crc32c, if not
 				supported by the system.
 
+			crc32c-arm64 Use hardware assisted crc32c calculation
+				provided on CRC-enabled 64-bit ARM processors.
+				Falls back to regular software crc32c, if not
+				supported by the system.
+
 			crc32	Use a crc32 sum of the data area and store
 				it in the header of each block.
 
diff --git a/Makefile b/Makefile
index 4c64168..ad02d93 100644
--- a/Makefile
+++ b/Makefile
@@ -234,10 +234,10 @@ endif
 T_DEDUPE_OBJS = t/dedupe.o
 T_DEDUPE_OBJS += lib/rbtree.o t/log.o mutex.o smalloc.o gettime.o crc/md5.o \
 		lib/memalign.o lib/bloom.o t/debug.o crc/xxhash.o t/arch.o \
-		crc/murmur3.o crc/crc32c.o crc/crc32c-intel.o crc/fnv.o
+		crc/murmur3.o crc/crc32c.o crc/crc32c-intel.o crc/crc32c-arm64.o crc/fnv.o
 T_DEDUPE_PROGS = t/fio-dedupe
 
-T_VS_OBJS = t/verify-state.o t/log.o crc/crc32c.o crc/crc32c-intel.o t/debug.o
+T_VS_OBJS = t/verify-state.o t/log.o crc/crc32c.o crc/crc32c-intel.o crc/crc32c-arm64.o t/debug.o
 T_VS_PROGS = t/fio-verify-state
 
 T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index 2a86cc5..0912a86 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -27,4 +27,8 @@ static inline int arch_ffz(unsigned long bitmask)
 
 #define ARCH_HAVE_FFZ
 
+#ifdef ARCH_HAVE_CRC_CRYPTO
+#define ARCH_HAVE_ARM64_CRC_CRYPTO
+#endif
+
 #endif
diff --git a/configure b/configure
index fc15782..7de88f8 100755
--- a/configure
+++ b/configure
@@ -342,6 +342,8 @@ elif check_define __s390__ ; then
   fi
 elif check_define __arm__ ; then
   cpu="arm"
+elif check_define __aarch64__ ; then
+  cpu="aarch64"
 elif check_define __hppa__ ; then
   cpu="hppa"
 else
@@ -362,6 +364,9 @@ case "$cpu" in
   armv*b|armv*l|arm)
     cpu="arm"
   ;;
+  aarch64)
+    cpu="arm64"
+  ;;
   hppa|parisc|parisc64)
     cpu="hppa"
   ;;
@@ -1780,6 +1785,24 @@ if compile_prog "" "" "bool"; then
 fi
 echo "bool                          $have_bool"
 
+##########################################
+# check march=armv8-a+crc+crypto
+march_armv8_a_crc_crypto="no"
+if test "$cpu" = "arm64" ; then
+  cat > $TMPC <<EOF
+int main(void)
+{
+  return 0;
+}
+EOF
+  if compile_prog "-march=armv8-a+crc+crypto" "" ""; then
+    march_armv8_a_crc_crypto="yes"
+    CFLAGS="$CFLAGS -march=armv8-a+crc+crypto -DARCH_HAVE_CRC_CRYPTO"
+  fi
+fi
+echo "march_armv8_a_crc_crypto      $march_armv8_a_crc_crypto"
+
+
 #############################################################################
 
 if test "$wordsize" = "64" ; then
diff --git a/crc/crc32c-arm64.c b/crc/crc32c-arm64.c
new file mode 100644
index 0000000..c3f42c7
--- /dev/null
+++ b/crc/crc32c-arm64.c
@@ -0,0 +1,115 @@
+#include "crc32c.h"
+
+#define CRC32C3X8(ITR) \
+	crc1 = __crc32cd(crc1, *((const uint64_t *)data + 42*1 + (ITR)));\
+	crc2 = __crc32cd(crc2, *((const uint64_t *)data + 42*2 + (ITR)));\
+	crc0 = __crc32cd(crc0, *((const uint64_t *)data + 42*0 + (ITR)));
+
+#define CRC32C7X3X8(ITR) do {\
+	CRC32C3X8((ITR)*7+0) \
+	CRC32C3X8((ITR)*7+1) \
+	CRC32C3X8((ITR)*7+2) \
+	CRC32C3X8((ITR)*7+3) \
+	CRC32C3X8((ITR)*7+4) \
+	CRC32C3X8((ITR)*7+5) \
+	CRC32C3X8((ITR)*7+6) \
+	} while(0)
+
+#ifndef HWCAP_CRC32
+#define HWCAP_CRC32             (1 << 7)
+#endif /* HWCAP_CRC32 */
+
+int crc32c_arm64_available = 0;
+
+#ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
+
+#include <sys/auxv.h>
+#include <arm_acle.h>
+#include <arm_neon.h>
+
+static int crc32c_probed;
+
+/*
+ * Function to calculate reflected crc with PMULL Instruction
+ * crc done "by 3" for fixed input block size of 1024 bytes
+ */
+uint32_t crc32c_arm64(unsigned char const *data, unsigned long length)
+{
+	signed long len = length;
+	uint32_t crc = ~0;
+	uint32_t crc0, crc1, crc2;
+
+	/* Load two consts: K1 and K2 */
+	const poly64_t k1 = 0xe417f38a, k2 = 0x8f158014;
+	uint64_t t0, t1;
+
+	while ((len -= 1024) >= 0) {
+		/* Do first 8 bytes here for better pipelining */
+		crc0 = __crc32cd(crc, *(const uint64_t *)data);
+		crc1 = 0;
+		crc2 = 0;
+		data += sizeof(uint64_t);
+
+		/* Process block inline
+		   Process crc0 last to avoid dependency with above */
+		CRC32C7X3X8(0);
+		CRC32C7X3X8(1);
+		CRC32C7X3X8(2);
+		CRC32C7X3X8(3);
+		CRC32C7X3X8(4);
+		CRC32C7X3X8(5);
+
+		data += 42*3*sizeof(uint64_t);
+
+		/* Merge crc0 and crc1 into crc2
+		   crc1 multiply by K2
+		   crc0 multiply by K1 */
+
+		t1 = (uint64_t)vmull_p64(crc1, k2);
+		t0 = (uint64_t)vmull_p64(crc0, k1);
+		crc = __crc32cd(crc2, *(const uint64_t *)data);
+		crc1 = __crc32cd(0, t1);
+		crc ^= crc1;
+		crc0 = __crc32cd(0, t0);
+		crc ^= crc0;
+
+		data += sizeof(uint64_t);
+	}
+
+	if (!(len += 1024))
+		return crc;
+
+	while ((len -= sizeof(uint64_t)) >= 0) {
+		crc = __crc32cd(crc, *(const uint64_t *)data);
+		data += sizeof(uint64_t);
+	}
+
+	/* The following is more efficient than the straight loop */
+	if (len & sizeof(uint32_t)) {
+		crc = __crc32cw(crc, *(const uint32_t *)data);
+		data += sizeof(uint32_t);
+	}
+	if (len & sizeof(uint16_t)) {
+		crc = __crc32ch(crc, *(const uint16_t *)data);
+		data += sizeof(uint16_t);
+	}
+	if (len & sizeof(uint8_t)) {
+		crc = __crc32cb(crc, *(const uint8_t *)data);
+	}
+
+	return crc;
+}
+
+void crc32c_arm64_probe(void)
+{
+	unsigned long hwcap;
+
+	if (!crc32c_probed) {
+		hwcap = getauxval(AT_HWCAP);
+		if (hwcap & HWCAP_CRC32)
+			crc32c_arm64_available = 1;
+		crc32c_probed = 1;
+	}
+}
+
+#endif /* ARCH_HAVE_ARM64_CRC_CRYPTO */
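The tail of crc32c_arm64() relies on a two's-complement detail. Once the 8-byte loop exits, len lies in [-8, -1], and its low three bits still equal the number of leftover bytes, so testing the 4, 2 and 1 bits consumes exactly that many. The portable sketch below mirrors that logic. It uses a bitwise software CRC32C (reflected polynomial 0x82F63B78, seeded with ~0 and returned without a final inversion, as crc32c_arm64() does) in place of the __crc32c* intrinsics. The helper names are hypothetical and are not part of fio.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C update for one byte (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
	int i;

	crc ^= b;
	for (i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	return crc;
}

/* Software stand-in for __crc32c{b,h,w,d}: feed n bytes in memory order,
 * which matches what the instructions do with a little-endian load. */
static uint32_t crc32c_bytes(uint32_t crc, const unsigned char *p, size_t n)
{
	while (n--)
		crc = crc32c_byte(crc, *p++);
	return crc;
}

/* Mirrors the non-1024-byte path of crc32c_arm64(). */
uint32_t crc32c_tail_sketch(const unsigned char *data, unsigned long length)
{
	signed long len = length;
	uint32_t crc = ~0u;

	/* The cast keeps the subtraction in signed arithmetic. */
	while ((len -= (signed long)sizeof(uint64_t)) >= 0) {
		crc = crc32c_bytes(crc, data, 8);
		data += 8;
	}

	/* len is now in [-8, -1]; its low three bits are the leftover count. */
	if (len & sizeof(uint32_t)) {
		crc = crc32c_bytes(crc, data, 4);
		data += 4;
	}
	if (len & sizeof(uint16_t)) {
		crc = crc32c_bytes(crc, data, 2);
		data += 2;
	}
	if (len & sizeof(uint8_t))
		crc = crc32c_bytes(crc, data, 1);

	return crc;
}
```

XOR-ing the result with 0xFFFFFFFF gives the conventional CRC-32C value, for example 0xE3069283 for the string "123456789".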
diff --git a/crc/crc32c.h b/crc/crc32c.h
index 11bcf9c..5d66407 100644
--- a/crc/crc32c.h
+++ b/crc/crc32c.h
@@ -21,8 +21,19 @@
 #include "../arch/arch.h"
 
 extern uint32_t crc32c_sw(unsigned char const *, unsigned long);
+extern int crc32c_arm64_available;
 extern int crc32c_intel_available;
 
+#ifdef ARCH_HAVE_ARM64_CRC_CRYPTO
+extern uint32_t crc32c_arm64(unsigned char const *, unsigned long);
+extern void crc32c_arm64_probe(void);
+#else
+#define crc32c_arm64 crc32c_sw
+static inline void crc32c_arm64_probe(void)
+{
+}
+#endif
+
 #ifdef ARCH_HAVE_SSE4_2
 extern uint32_t crc32c_intel(unsigned char const *, unsigned long);
 extern void crc32c_intel_probe(void);
@@ -35,6 +46,9 @@ static inline void crc32c_intel_probe(void)
 
 static inline uint32_t fio_crc32c(unsigned char const *buf, unsigned long len)
 {
+	if (crc32c_arm64_available)
+		return crc32c_arm64(buf, len);
+
 	if (crc32c_intel_available)
 		return crc32c_intel(buf, len);
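fio_crc32c() takes the ARM64 path only after a probe has set crc32c_arm64_available. That probe reads AT_HWCAP through getauxval() and tests HWCAP_CRC32 (bit 7). The sketch below shows the same probe-then-dispatch idea. It assumes Linux on aarch64, built with CRC support so that __ARM_FEATURE_CRC32 is defined, for the hardware branch, and it falls back to a bitwise software CRC32C everywhere else. It stores a function pointer instead of fio's integer flag, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#include <sys/auxv.h>
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)	/* same fallback value the patch uses */
#endif
#define SKETCH_HAVE_ARM_CRC 1
#endif

typedef uint32_t (*crc_fn)(const unsigned char *, size_t);

/* Portable fallback: bitwise CRC32C, seeded with ~0, no final inversion. */
static uint32_t crc32c_soft(const unsigned char *p, size_t n)
{
	uint32_t crc = ~0u;
	int i;

	while (n--) {
		crc ^= *p++;
		for (i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

#ifdef SKETCH_HAVE_ARM_CRC
/* Hardware path: one CRC32CB instruction per byte, for brevity. */
static uint32_t crc32c_hw(const unsigned char *p, size_t n)
{
	uint32_t crc = ~0u;

	while (n--)
		crc = __crc32cb(crc, *p++);
	return crc;
}
#endif

static crc_fn crc_impl;	/* NULL until the first probe */

static void crc32c_probe_once(void)
{
	if (crc_impl)
		return;
	crc_impl = crc32c_soft;
#ifdef SKETCH_HAVE_ARM_CRC
	if (getauxval(AT_HWCAP) & HWCAP_CRC32)
		crc_impl = crc32c_hw;
#endif
}

uint32_t crc32c_dispatch(const unsigned char *buf, size_t len)
{
	crc32c_probe_once();
	return crc_impl(buf, len);
}
```

Both paths return the same value because CRC32CB updates the raw accumulator with no built-in pre- or post-inversion. fio calls its probes from setup code. The lazy initialisation in this sketch is not thread-safe.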
 
diff --git a/crc/test.c b/crc/test.c
index 300000d..78f19ac 100644
--- a/crc/test.c
+++ b/crc/test.c
@@ -291,6 +291,7 @@ int fio_crctest(const char *type)
 	int i, first = 1;
 	void *buf;
 
+	crc32c_arm64_probe();
 	crc32c_intel_probe();
 
 	if (!type)
diff --git a/lib/bloom.c b/lib/bloom.c
index fa38db9..7a9ebaa 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -65,6 +65,7 @@ struct bloom *bloom_new(uint64_t entries)
 	struct bloom *b;
 	size_t no_uints;
 
+	crc32c_arm64_probe();
 	crc32c_intel_probe();
 
 	b = malloc(sizeof(*b));
diff --git a/options.c b/options.c
index 1ca16e8..5886c50 100644
--- a/options.c
+++ b/options.c
@@ -2647,6 +2647,10 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = VERIFY_CRC32C,
 			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
 			  },
+			  { .ival = "crc32c-arm64",
+			    .oval = VERIFY_CRC32C,
+			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
+			  },
 			  { .ival = "crc32c",
 			    .oval = VERIFY_CRC32C,
 			    .help = "Use crc32c checksums for verification (hw assisted, if available)",
diff --git a/verify.c b/verify.c
index 790ab31..8733feb 100644
--- a/verify.c
+++ b/verify.c
@@ -1210,7 +1210,9 @@ nothing:
 void fio_verify_init(struct thread_data *td)
 {
 	if (td->o.verify == VERIFY_CRC32C_INTEL ||
+	    td->o.verify == VERIFY_CRC32C_ARM64 ||
 	    td->o.verify == VERIFY_CRC32C) {
+		crc32c_arm64_probe();
 		crc32c_intel_probe();
 	}
 }
diff --git a/verify.h b/verify.h
index deb161e..8d40ff6 100644
--- a/verify.h
+++ b/verify.h
@@ -15,6 +15,7 @@ enum {
 	VERIFY_CRC64,			/* crc64 sum data blocks */
 	VERIFY_CRC32,			/* crc32 sum data blocks */
 	VERIFY_CRC32C,			/* crc32c sum data blocks */
+	VERIFY_CRC32C_ARM64,		/* crc32c sum data blocks with hw */
 	VERIFY_CRC32C_INTEL,		/* crc32c sum data blocks with hw */
 	VERIFY_CRC16,			/* crc16 sum data blocks */
 	VERIFY_CRC7,			/* crc7 sum data blocks */

* Recent changes (master)
@ 2017-01-04 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-04 13:00 UTC
  To: fio

The following changes since commit 747311bd9cb82c02bfa4622054b5142a71a6c8ec:

  t/stest: remove old test (2017-01-02 18:21:14 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1684f7fd9047c7405264f462f76e1135c563ec33:

  Add missing .help string for io_size option (2017-01-03 10:10:58 -0700)

----------------------------------------------------------------
Tomohiro Kusumi (7):
      Fix trivial calloc argument order
      Add missing trailing \n in log_err/info()
      Sync README with fio usage output
      Fix canonical name for runtime/timeout option
      Add BSD package/building info to README
      Fix README - change just type "configure" to "./configure"
      Add missing .help string for io_size option

 README             | 32 +++++++++++++++++++-------------
 backend.c          |  2 +-
 client.c           |  2 +-
 engines/e4defrag.c |  2 +-
 engines/net.c      |  2 +-
 engines/rdma.c     |  6 +++---
 engines/sg.c       |  2 +-
 init.c             | 21 ++++++++++++++-------
 iolog.c            |  2 +-
 options.c          |  1 +
 server.c           |  2 +-
 11 files changed, 44 insertions(+), 30 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index a35842e..fdd5bec 100644
--- a/README
+++ b/README
@@ -68,6 +68,10 @@ Windows:
 Rebecca Cran <rebecca+fio@bluestop.org> has fio packages for Windows at
 http://www.bluestop.org/fio/ .
 
+BSDs:
+Packages for BSDs may be available from their binary package repositories.
+Look for a package "fio" using their binary package managers.
+
 
 Mailing list
 ------------
@@ -93,11 +97,11 @@ and archives for the old list can be found here:
 Building
 --------
 
-Just type 'configure', 'make' and 'make install'.
+Just type './configure', 'make' and 'make install'.
 
-Note that GNU make is required. On BSD it's available from devel/gmake;
-on Solaris it's in the SUNWgmake package. On platforms where GNU make
-isn't the default, type 'gmake' instead of 'make'.
+Note that GNU make is required. On BSDs it's available from devel/gmake
+within the ports directory; on Solaris it's in the SUNWgmake package.
+On platforms where GNU make isn't the default, type 'gmake' instead of 'make'.
 
 Configure will print the enabled options. Note that on Linux based
 platforms, the libaio development packages must be installed to use
@@ -152,32 +156,32 @@ $ fio
 	--bandwidth-log		Generate aggregate bandwidth logs
 	--minimal		Minimal (terse) output
 	--output-format=type	Output format (terse,json,json+,normal)
-	--terse-version=type	Terse version output format (default 3, or 2 or 4).
+	--terse-version=type	Set terse version output format (default 3, or 2 or 4)
 	--version		Print version info and exit
 	--help			Print this page
 	--cpuclock-test		Perform test/validation of CPU clock
-	--crctest[=test]	Test speed of checksum functions
+	--crctest=type		Test speed of checksum functions
 	--cmdhelp=cmd		Print command help, "all" for all of them
 	--enghelp=engine	Print ioengine help, or list available ioengines
 	--enghelp=engine,cmd	Print help for an ioengine cmd
 	--showcmd		Turn a job file into command line options
-	--readonly		Turn on safety read-only checks, preventing
-				writes
 	--eta=when		When ETA estimate should be printed
 				May be "always", "never" or "auto"
 	--eta-newline=time	Force a new line for every 'time' period passed
 	--status-interval=t	Force full status dump every 't' period passed
+	--readonly		Turn on safety read-only checks, preventing writes
 	--section=name		Only run specified section in job file.
 				Multiple sections can be specified.
 	--alloc-size=kb		Set smalloc pool to this size in kb (def 16384)
 	--warnings-fatal	Fio parser warnings are fatal
-	--max-jobs		Maximum number of threads/processes to support
-	--server=args		Start backend server. See Client/Server section.
-	--client=host		Connect to specified backend(s).
-	--remote-config=file	Tell fio server to load this local file
+	--max-jobs=nr		Maximum number of threads/processes to support
+	--server=args		Start a backend fio server. See Client/Server section.
+	--client=hostname	Talk to remote backend(s) fio server at hostname
+	--daemonize=pidfile	Background fio server, write pid to file
+	--remote-config=file	Tell fio server to load this local job file
 	--idle-prof=option	Report cpu idleness on a system or percpu basis
 				(option=system,percpu) or run unit work
-				calibration only (option=calibrate).
+				calibration only (option=calibrate)
 	--inflate-log=log	Inflate and output compressed log
 	--trigger-file=file	Execute trigger cmd when file exists
 	--trigger-timeout=t	Execute trigger af this time
@@ -218,6 +222,8 @@ Currently, additional logging is available for:
 	net		Dump info related to networking connections
 	rate		Dump info related to IO rate switching
 	compress	Dump info related to log compress/decompress
+	steadystate	Dump info related to steady state detection
+	helperthread	Dump info related to helper thread
 	? or help	Show available debug options.
 
 One can specify multiple debug options: e.g. --debug=file,mem will enable
diff --git a/backend.c b/backend.c
index c8c6de6..a46101c 100644
--- a/backend.c
+++ b/backend.c
@@ -2063,7 +2063,7 @@ static bool check_mount_writes(struct thread_data *td)
 
 	return false;
 mounted:
-	log_err("fio: %s appears mounted, and 'allow_mounted_write' isn't set. Aborting.", f->file_name);
+	log_err("fio: %s appears mounted, and 'allow_mounted_write' isn't set. Aborting.\n", f->file_name);
 	return true;
 }
 
diff --git a/client.c b/client.c
index 1b4d3d7..7934661 100644
--- a/client.c
+++ b/client.c
@@ -1322,7 +1322,7 @@ static int fio_client_handle_iolog(struct fio_client *client,
 	log_pathname = malloc(10 + strlen((char *)pdu->name) +
 			strlen(client->hostname));
 	if (!log_pathname) {
-		log_err("fio: memory allocation of unique pathname failed");
+		log_err("fio: memory allocation of unique pathname failed\n");
 		return -1;
 	}
 	/* generate a unique pathname for the log file using hostname */
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index e53636e..1e4996f 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -95,7 +95,7 @@ static int fio_e4defrag_init(struct thread_data *td)
 	ed->donor_fd = open(donor_name, O_CREAT|O_WRONLY, 0644);
 	if (ed->donor_fd < 0) {
 		td_verror(td, errno, "io_queue_init");
-		log_err("Can't open donor file %s err:%d", donor_name, ed->donor_fd);
+		log_err("Can't open donor file %s err:%d\n", donor_name, ed->donor_fd);
 		free(ed);
 		return 1;
 	}
diff --git a/engines/net.c b/engines/net.c
index 5f1401c..3bdd5cd 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -1218,7 +1218,7 @@ static int fio_netio_setup_listen_inet(struct thread_data *td, short port)
 			return 1;
 		}
 		if (is_ipv6(o)) {
-			log_err("fio: IPv6 not supported for multicast network IO");
+			log_err("fio: IPv6 not supported for multicast network IO\n");
 			close(fd);
 			return 1;
 		}
diff --git a/engines/rdma.c b/engines/rdma.c
index fbe8434..10e60dc 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -881,7 +881,7 @@ static int fio_rdmaio_connect(struct thread_data *td, struct fio_file *f)
 	rd->send_buf.nr = htonl(td->o.iodepth);
 
 	if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-		log_err("fio: ibv_post_send fail: %m");
+		log_err("fio: ibv_post_send fail: %m\n");
 		return 1;
 	}
 
@@ -932,7 +932,7 @@ static int fio_rdmaio_accept(struct thread_data *td, struct fio_file *f)
 	ret = rdma_poll_wait(td, IBV_WC_RECV) < 0;
 
 	if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-		log_err("fio: ibv_post_send fail: %m");
+		log_err("fio: ibv_post_send fail: %m\n");
 		return 1;
 	}
 
@@ -965,7 +965,7 @@ static int fio_rdmaio_close_file(struct thread_data *td, struct fio_file *f)
 				     || (rd->rdma_protocol ==
 					 FIO_RDMA_MEM_READ))) {
 		if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-			log_err("fio: ibv_post_send fail: %m");
+			log_err("fio: ibv_post_send fail: %m\n");
 			return 1;
 		}
 
diff --git a/engines/sg.c b/engines/sg.c
index c1fe602..001193d 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -579,7 +579,7 @@ static char *fio_sgio_errdetails(struct io_u *io_u)
 	char *msg, msgchunk[MAXMSGCHUNK], *ret = NULL;
 	int i;
 
-	msg = calloc(MAXERRDETAIL, 1);
+	msg = calloc(1, MAXERRDETAIL);
 
 	/*
 	 * can't seem to find sg_err.h, so I'll just echo the define values
diff --git a/init.c b/init.c
index 3c925a3..9889949 100644
--- a/init.c
+++ b/init.c
@@ -94,7 +94,7 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 		.val		= 'o' | FIO_CLIENT_FLAG,
 	},
 	{
-		.name		= (char *) "timeout",
+		.name		= (char *) "runtime",
 		.has_arg	= required_argument,
 		.val		= 't' | FIO_CLIENT_FLAG,
 	},
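The runtime/timeout fix only renames a long-option entry. getopt_long() matches the text after "--" against the name field of each struct option, so that table entry determines what users type on the command line. Below is a small sketch with a one-entry table that reads --runtime. It is not fio's l_opts table, which also ORs FIO_CLIENT_FLAG into .val, and parse_runtime() is a hypothetical helper.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical helper: return the --runtime argument, or -1 if absent. */
long parse_runtime(int argc, char **argv)
{
	static const struct option opts[] = {
		{ "runtime", required_argument, NULL, 't' },
		{ NULL, 0, NULL, 0 },
	};
	long runtime = -1;
	int c;

	optind = 1;	/* scan from argv[1] */
	while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
		if (c == 't')	/* .val of the matched entry */
			runtime = strtol(optarg, NULL, 10);
	}
	return runtime;
}
```

The entry is declared required_argument, so getopt_long() accepts both "--runtime 30" and "--runtime=30".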
@@ -1984,6 +1984,11 @@ static void show_debug_categories(void)
 #endif
 }
 
+/*
+ * Following options aren't printed by usage().
+ * --append-terse - Equivalent to --output-format=terse, see f6a7df53.
+ * --latency-log - Deprecated option.
+ */
 static void usage(const char *name)
 {
 	printf("%s\n", fio_version_string);
@@ -1995,12 +2000,13 @@ static void usage(const char *name)
 	printf("  --runtime\t\tRuntime in seconds\n");
 	printf("  --bandwidth-log\tGenerate aggregate bandwidth logs\n");
 	printf("  --minimal\t\tMinimal (terse) output\n");
-	printf("  --output-format=x\tOutput format (terse,json,json+,normal)\n");
-	printf("  --terse-version=x\tSet terse version output format to 'x'\n");
+	printf("  --output-format=type\tOutput format (terse,json,json+,normal)\n");
+	printf("  --terse-version=type\tSet terse version output format"
+		" (default 3, or 2 or 4)\n");
 	printf("  --version\t\tPrint version info and exit\n");
 	printf("  --help\t\tPrint this page\n");
 	printf("  --cpuclock-test\tPerform test/validation of CPU clock\n");
-	printf("  --crctest\t\tTest speed of checksum functions\n");
+	printf("  --crctest=type\tTest speed of checksum functions\n");
 	printf("  --cmdhelp=cmd\t\tPrint command help, \"all\" for all of"
 		" them\n");
 	printf("  --enghelp=engine\tPrint ioengine help, or list"
@@ -2016,14 +2022,15 @@ static void usage(const char *name)
 	printf(" 't' period passed\n");
 	printf("  --readonly\t\tTurn on safety read-only checks, preventing"
 		" writes\n");
-	printf("  --section=name\tOnly run specified section in job file\n");
+	printf("  --section=name\tOnly run specified section in job file,"
+		" multiple sections can be specified\n");
 	printf("  --alloc-size=kb\tSet smalloc pool to this size in kb"
-		" (def 1024)\n");
+		" (def 16384)\n");
 	printf("  --warnings-fatal\tFio parser warnings are fatal\n");
 	printf("  --max-jobs=nr\t\tMaximum number of threads/processes to support\n");
 	printf("  --server=args\t\tStart a backend fio server\n");
 	printf("  --daemonize=pidfile\tBackground fio server, write pid to file\n");
-	printf("  --client=hostname\tTalk to remote backend fio server at hostname\n");
+	printf("  --client=hostname\tTalk to remote backend(s) fio server at hostname\n");
 	printf("  --remote-config=file\tTell fio server to load this local job file\n");
 	printf("  --idle-prof=option\tReport cpu idleness on a system or percpu basis\n"
 		"\t\t\t(option=system,percpu) or run unit work\n"
diff --git a/iolog.c b/iolog.c
index 9393890..25d8dd0 100644
--- a/iolog.c
+++ b/iolog.c
@@ -422,7 +422,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 				continue;
 			}
 		} else {
-			log_err("bad iolog2: %s", p);
+			log_err("bad iolog2: %s\n", p);
 			continue;
 		}
 
diff --git a/options.c b/options.c
index 0f2adcd..1ca16e8 100644
--- a/options.c
+++ b/options.c
@@ -1883,6 +1883,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "IO Size",
 		.type	= FIO_OPT_STR_VAL,
 		.off1	= offsetof(struct thread_options, io_limit),
+		.help	= "Total size of I/O to be performed",
 		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
diff --git a/server.c b/server.c
index b7ebd63..6d5d4ea 100644
--- a/server.c
+++ b/server.c
@@ -2538,7 +2538,7 @@ int fio_start_server(char *pidfile)
 
 	pid = fork();
 	if (pid < 0) {
-		log_err("fio: failed server fork: %s", strerror(errno));
+		log_err("fio: failed server fork: %s\n", strerror(errno));
 		free(pidfile);
 		return -1;
 	} else if (pid) {

* Recent changes (master)
@ 2017-01-03 13:00 Jens Axboe
From: Jens Axboe @ 2017-01-03 13:00 UTC
  To: fio

The following changes since commit 915ca9807717762e288ded3eba0fe5fc82a2ddcd:

  options: mark steadystate option parents (2016-12-29 09:07:57 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 747311bd9cb82c02bfa4622054b5142a71a6c8ec:

  t/stest: remove old test (2017-01-02 18:21:14 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      t/stest: remove old test

Rebecca Cran (1):
      Update Windows maintainer contact details

Robert Elliott (12):
      Avoid using units in option defaults
      gfio: Improve IOPS textbox labels
      Document trim workload choices and other nits
      tests, profiles: Use IEC prefixes for binary multiples
      Fix unit_base kb_base mixup in thread option conversion functions
      Line up colons across read, write, and trim thread stats
      gclient: Delete unused code
      gclient: Use proper time units in latency buckets chart
      Convert group_run_stats to use bytes instead of KiB/KB
      Clean up unit prefixes for binary multiples in comments and prints
      Improve IEC binary and SI decimal prefix handling
      Documentation for IEC binary and SI decimal prefix handling

 HOWTO                           | 299 +++++++++++++++++++++++++---------------
 README                          |   4 +-
 backend.c                       |  18 ++-
 cconv.c                         |   4 +-
 client.c                        |   2 +-
 crc/test.c                      |   4 +-
 engines/dev-dax.c               |   2 +-
 engines/mmap.c                  |   2 +-
 eta.c                           |  46 +++++--
 filesetup.c                     |   5 +-
 fio.1                           | 205 ++++++++++++++++++---------
 fio.h                           |   7 +
 gclient.c                       | 173 ++++++++++++-----------
 gfio.c                          |  10 +-
 goptions.c                      |   2 +-
 init.c                          |  39 ++----
 lib/num2str.c                   |  57 ++++++--
 memory.c                        |   4 +-
 options.c                       |  14 +-
 parse.c                         |  40 ++++--
 profiles/act.c                  |  14 +-
 profiles/tiobench.c             |   6 +-
 server.c                        |   2 +-
 stat.c                          | 114 ++++++++-------
 stat.h                          |   2 +-
 t/btrace2fio.c                  |  18 +--
 t/dedupe.c                      |   2 +-
 t/genzipf.c                     |  13 +-
 t/lfsr-test.c                   |   2 +-
 t/memlock.c                     |  14 +-
 t/read-to-pipe-async.c          |   4 +-
 t/stest.c                       |  12 --
 unit_tests/steadystate_tests.py |   2 +-
 33 files changed, 674 insertions(+), 468 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 7274c0e..4354e46 100644
--- a/HOWTO
+++ b/HOWTO
@@ -116,7 +116,7 @@ section residing above it. If the first character in a line is a ';' or a
 '#', the entire line is discarded as a comment.
 
 So let's look at a really simple job file that defines two processes, each
-randomly reading from a 128MB file.
+randomly reading from a 128MiB file.
 
 ; -- start job file --
 [global]
@@ -154,9 +154,9 @@ numjobs=4
 
 Here we have no global section, as we only have one job defined anyway.
 We want to use async io here, with a depth of 4 for each file. We also
-increased the buffer size used to 32KB and define numjobs to 4 to
+increased the buffer size used to 32KiB and define numjobs to 4 to
 fork 4 identical jobs. The result is 4 processes each randomly writing
-to their own 64MB file. Instead of using the above job file, you could
+to their own 64MiB file. Instead of using the above job file, you could
 have given the parameters on the command line. For this case, you would
 specify:
 
@@ -276,20 +276,70 @@ time	Integer with possible time suffix. In seconds unless otherwise
 	specified, use eg 10m for 10 minutes. Accepts s/m/h for seconds,
 	minutes, and hours, and accepts 'ms' (or 'msec') for milliseconds,
 	and 'us' (or 'usec') for microseconds.
-int	SI integer. A whole number value, which may contain a suffix
-	describing the base of the number. Accepted suffixes are k/m/g/t/p,
-	meaning kilo, mega, giga, tera, and peta. The suffix is not case
-	sensitive, and you may also include trailing 'b' (eg 'kb' is the same
-	as 'k'). So if you want to specify 4096, you could either write
-	out '4096' or just give 4k. The suffixes signify base 2 values, so
-	1024 is 1k and 1024k is 1m and so on, unless the suffix is explicitly
-	set to a base 10 value using 'kib', 'mib', 'gib', etc. If that is the
-	case, then 1000 is used as the multiplier. This can be handy for
-	disks, since manufacturers generally use base 10 values when listing
-	the capacity of a drive. If the option accepts an upper and lower
-	range, use a colon ':' or minus '-' to separate such values.  May also
-	include a prefix to indicate numbers base. If 0x is used, the number
-	is assumed to be hexadecimal.  See irange.
+
+int	Integer. A whole number value, which may contain an integer prefix
+	and an integer suffix.
+	[integer prefix]number[integer suffix]
+
+	The optional integer prefix specifies the number's base. The default
+	is decimal. 0x specifies hexadecimal.
+
+	The optional integer suffix specifies the number's units, and includes
+	an optional unit prefix and an optional unit.  For quantities of data,
+	the default unit is bytes. For quantities of time, the default unit
+	is seconds.
+
+	With kb_base=1000, fio follows international standards for unit prefixes.
+	To specify power-of-10 decimal values defined in the International
+	System of Units (SI):
+		K means kilo (K) or 1000
+		M means mega (M) or 1000**2
+		G means giga (G) or 1000**3
+		T means tera (T) or 1000**4
+		P means peta (P) or 1000**5
+
+	To specify power-of-2 binary values defined in IEC 80000-13:
+		Ki means kibi (Ki) or 1024
+		Mi means mebi (Mi) or 1024**2
+		Gi means gibi (Gi) or 1024**3
+		Ti means tebi (Ti) or 1024**4
+		Pi means pebi (Pi) or 1024**5
+
+	With kb_base=1024 (the default), the unit prefixes are opposite from
+	those specified in the SI and IEC 80000-13 standards to provide
+	compatibility with old scripts.  For example, 4k means 4096.
+
+	For quantities of data, an optional unit of 'B' may be included
+	(e.g., 'kB' is the same as 'k').
+
+	The integer suffix is not case sensitive (e.g., m/mi mean mebi/mega,
+	not milli). 'b' and 'B' both mean byte, not bit.
+
+	Examples with kb_base=1000:
+		4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
+		1 MiB: 1048576, 1mi, 1024ki
+		1 MB: 1000000, 1m, 1000k
+		1 GiB: 1073741824, 1gi, 1024mi, 1048576ki
+		1 GB: 1000000000, 1g, 1000m, 1000000k
+
+	Examples with kb_base=1024 (default):
+		4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
+		1 MiB: 1048576, 1m, 1024k
+		1 MB: 1000000, 1mi, 1000ki
+		1 GiB: 1073741824, 1g, 1024m, 1048576k
+		1 GB: 1000000000, 1gi, 1000mi, 1000000ki
+
+	To specify times (units are not case sensitive):
+		D means days
+		H means hours
+		M means minutes
+		s or sec means seconds (default)
+		ms or msec means milliseconds
+		us or usec means microseconds
+
+	If the option accepts an upper and lower range, use a colon ':' or
+	minus '-' to separate such values.  See irange.
+
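The suffix rules and example tables above fit in a few lines of C. parse_size() below is a hypothetical helper, not fio's parser, which lives in parse.c and also handles ranges, time units and more. It accepts an optional 0x prefix, one of k/m/g/t/p in either case, an optional 'i' and an optional 'b'. It chooses 1000 or 1024 per step the way the kb_base examples show: a bare prefix uses kb_base itself, and an 'i' selects the other base. Hex digits followed by a 'b' suffix are not disambiguated.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "[0x]number[kmgtp][i][b]" into bytes.
 * kb_base must be 1000 or 1024. Returns 0 on success, -1 on error. */
int parse_size(const char *s, unsigned kb_base, uint64_t *out)
{
	static const char units[] = "kmgtp";
	int base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;
	char *end;
	uint64_t val = strtoull(s, &end, base);
	const char *p = end;
	unsigned exp = 0, i;
	int flip = 0;		/* 'i' after the prefix selects the other base */
	uint64_t step, mult = 1;

	if (end == s)
		return -1;	/* no digits */

	if (*p) {
		const char *u = strchr(units, tolower((unsigned char)*p));

		if (u) {
			exp = (unsigned)(u - units) + 1;	/* k=1 ... p=5 */
			p++;
			if (tolower((unsigned char)*p) == 'i') {
				flip = 1;
				p++;
			}
		}
	}
	if (tolower((unsigned char)*p) == 'b')	/* optional byte unit */
		p++;
	if (*p)
		return -1;	/* trailing junk */

	/* A bare prefix follows kb_base; an 'i' picks the other base. */
	if (kb_base == 1000)
		step = flip ? 1024 : 1000;
	else
		step = flip ? 1000 : 1024;

	for (i = 0; i < exp; i++)
		mult *= step;
	*out = val * mult;
	return 0;
}
```

For instance, "1mi" yields 1000000 with kb_base=1024 and 1048576 with kb_base=1000, matching the two example tables.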
 bool	Boolean. Usually parsed as an integer, however only defined for
 	true and false (1 and 0).
 irange	Integer range with suffix. Allows value range to be given, such
@@ -398,12 +448,13 @@ rw=str		Type of io pattern. Accepted values are:
 
 			read		Sequential reads
 			write		Sequential writes
+			trim		Sequential trims
 			randwrite	Random writes
 			randread	Random reads
+			randtrim	Random trims
 			rw,readwrite	Sequential mixed reads and writes
 			randrw		Random mixed reads and writes
-			trimwrite	Mixed trims and writes. Blocks will be
-					trimmed first, then written to.
+			trimwrite	Sequential trim+write sequences
 
 		Fio defaults to read if the option is not specified.
 		For the mixed io types, the default is to split them 50/50.
@@ -438,13 +489,27 @@ rw_sequencer=str If an offset modifier is given by appending a number to
 		the same offset 8 number of times before generating a new
 		offset.
 
-kb_base=int	The base unit for a kilobyte. The defacto base is 2^10, 1024.
-		Storage manufacturers like to use 10^3 or 1000 as a base
-		ten unit instead, for obvious reasons. Allow values are
-		1024 or 1000, with 1024 being the default.
+kb_base=int	Select the interpretation of unit prefixes in input parameters.
+		1000 = Inputs comply with IEC 80000-13 and the International
+		       System of Units (SI).  Use:
+			- power-of-2 values with IEC prefixes (e.g., KiB)
+			- power-of-10 values with SI prefixes (e.g., kB)
+		1024 = Compatibility mode (default).  To avoid breaking
+		       old scripts:
+			- power-of-2 values with SI prefixes
+			- power-of-10 values with IEC prefixes
+		See bs= for more details on input parameters.
+
+		Outputs always use correct prefixes.  Most outputs include
+		both side-by-side, like:
+			bw=2383.3kB/s (2327.4KiB/s)
+		If only one value is reported, then kb_base selects the
+		one to use:
+			1000 = SI prefixes
+			1024 = IEC prefixes
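On the output side, a rate in bytes per second is divided by 1000 for the SI figure and by 1024 for the IEC figure. format_bw() below is a hypothetical helper that reproduces the sample line. The input of 2383300 bytes/s was chosen only to match that line, and fio's real reporting code is more elaborate.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print SI kB/s first, IEC KiB/s in parentheses. */
void format_bw(uint64_t bytes_per_sec, char *buf, size_t len)
{
	snprintf(buf, len, "bw=%.1fkB/s (%.1fKiB/s)",
		 bytes_per_sec / 1000.0, bytes_per_sec / 1024.0);
}
```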
 
 unified_rw_reporting=bool	Fio normally reports statistics on a per
-		data direction basis, meaning that read, write, and trim are
+		data direction basis, meaning that reads, writes, and trims are
 		accounted and reported separately. If this option is set,
 		the fio will sum the results and report them as "mixed"
 		instead.
@@ -509,11 +574,11 @@ io_limit=int	Normally fio operates within the region set by 'size', which
 		means that the 'size' option sets both the region and size of
 		IO to be performed. Sometimes that is not what you want. With
 		this option, it is possible to define just the amount of IO
-		that fio should do. For instance, if 'size' is set to 20G and
-		'io_size' is set to 5G, fio will perform IO within the first
-		20G but exit when 5G have been done. The opposite is also
-		possible - if 'size' is set to 20G, and 'io_size' is set to
-		40G, then fio will do 40G of IO within the 0..20G region.
+		that fio should do. For instance, if 'size' is set to 20GiB and
+		'io_size' is set to 5GiB, fio will perform IO within the first
+		20GiB but exit when 5GiB have been done. The opposite is also
+		possible - if 'size' is set to 20GiB, and 'io_size' is set to
+		40GiB, then fio will do 40GiB of IO within the 0..20GiB region.
 
 filesize=int	Individual file sizes. May be a range, in which case fio
 		will select sizes for files at random within the given range
@@ -536,36 +601,36 @@ fill_fs=bool	Sets size to something really large and waits for ENOSPC (no
 		Additionally, writing beyond end-of-device will not return
 		ENOSPC there.
 
-blocksize=int
-bs=int		The block size used for the io units. Defaults to 4k. Values
-		can be given for both read and writes. If a single int is
-		given, it will apply to both. If a second int is specified
-		after a comma, it will apply to writes only. In other words,
-		the format is either bs=read_and_write or bs=read,write,trim.
-		bs=4k,8k will thus use 4k blocks for reads, 8k blocks for
-		writes, and 8k for trims. You can terminate the list with
-		a trailing comma. bs=4k,8k, would use the default value for
-		trims.. If you only wish to set the write size, you
-		can do so by passing an empty read size - bs=,8k will set
-		8k for writes and leave the read default value.
-
-blockalign=int
-ba=int		At what boundary to align random IO offsets. Defaults to
-		the same as 'blocksize' the minimum blocksize given.
-		Minimum alignment is typically 512b for using direct IO,
-		though it usually depends on the hardware block size. This
-		option is mutually exclusive with using a random map for
-		files, so it will turn off that option.
-
-blocksize_range=irange
-bsrange=irange	Instead of giving a single block size, specify a range
-		and fio will mix the issued io block sizes. The issued
-		io unit will always be a multiple of the minimum value
-		given (also see bs_unaligned). Applies to both reads and
-		writes, however a second range can be given after a comma.
-		See bs=.
-
-bssplit=str	Sometimes you want even finer grained control of the
+blocksize=int[,int][,int]
+bs=int[,int][,int]
+		The block size in bytes used for I/O units. Default: 4096.
+		A single value applies to reads, writes, and trims.
+		Comma-separated values may be specified for reads, writes,
+		and trims. Empty values use the default value. A value not
+		terminated in a comma applies to subsequent types.
+
+		Examples:
+		bs=256k    means 256k for reads, writes and trims
+		bs=8k,32k  means 8k for reads, 32k for writes and trims
+		bs=8k,32k, means 8k for reads, 32k for writes, and
+		           default for trims
+		bs=,8k     means default for reads, 8k for writes and trims
+		bs=,8k,    means default for reads, 8k for writes, and
+		           default for trims
+
+blocksize_range=irange[,irange][,irange]
+bsrange=irange[,irange][,irange]
+		A range of block sizes in bytes for I/O units.
+		The issued I/O unit will always be a multiple of the minimum
+		size, unless blocksize_unaligned is set.
+
+		Comma-separated ranges may be specified for reads, writes,
+		and trims as described in 'blocksize'.
+
+		Example: bsrange=1k-4k,2k-8k
+
+bssplit=str[,str][,str]
+		Sometimes you want even finer-grained control of the
 		block sizes issued, not just an even split between them.
 		This option allows you to weight various block sizes,
 		so that you are able to define a specific amount of
@@ -589,24 +654,37 @@ bssplit=str	Sometimes you want even finer grained control of the
 		always add up to 100, if bssplit is given a range that adds
 		up to more, it will error out.
 
-		bssplit also supports giving separate splits to reads and
-		writes. The format is identical to what bs= accepts. You
-		have to separate the read and write parts with a comma. So
-		if you want a workload that has 50% 2k reads and 50% 4k reads,
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
+
+		If you want a workload that has 50% 2k reads and 50% 4k reads,
 		while having 90% 4k writes and 10% 8k writes, you would
 		specify:
 
 		bssplit=2k/50:4k/50,4k/90:8k/10
 
 blocksize_unaligned
-bs_unaligned	If this option is given, any byte size value within bsrange
-		may be used as a block range. This typically wont work with
-		direct IO, as that normally requires sector alignment.
+bs_unaligned	If set, fio will issue I/O units with any size within
+		blocksize_range, not just multiples of the minimum size.
+		This typically won't work with direct I/O, as that normally
+		requires sector alignment.
 
 bs_is_seq_rand	If this option is set, fio will use the normal read,write
-		blocksize settings as sequential,random instead. Any random
-		read or write will use the WRITE blocksize settings, and any
-		sequential read or write will use the READ blocksize setting.
+		blocksize settings as sequential,random blocksize settings
+		instead. Any random read or write will use the WRITE blocksize
+		settings, and any sequential read or write will use the READ
+		blocksize settings.
+
+blockalign=int[,int][,int]
+ba=int[,int][,int]
+		Boundary to which fio will align random I/O units.
+		Default: 'blocksize'.
+		Minimum alignment is typically 512b for using direct IO,
+		though it usually depends on the hardware block size. This
+		option is mutually exclusive with using a random map for
+		files, so it will turn off that option.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
 
 zero_buffers	If this option is given, fio will init the IO buffers to
 		all zeroes. The default is to fill them with random data.
@@ -836,7 +914,7 @@ ioengine=str	Defines how the job issues io to the file. The following
 				filename, eg ioengine=external:/tmp/foo.o
 				to load ioengine foo.o in /tmp.
 
-iodepth=int	This defines how many io units to keep in flight against
+iodepth=int	This defines how many I/O units to keep in flight against
 		the file. The default is 1 for each file defined in this
 		job, can be overridden with a larger value for higher
 		concurrency. Note that increasing iodepth beyond 1 will not
@@ -989,7 +1067,8 @@ rwmixwrite=int	How large a percentage of the mix should be writes. If both
 		if fio is asked to limit reads or writes to a certain rate.
 		If that is the case, then the distribution may be skewed.
 
-random_distribution=str:float	By default, fio will use a completely uniform
+random_distribution=str:float[,str:float][,str:float]
+		By default, fio will use a completely uniform
 		random distribution when asked to perform random IO. Sometimes
 		it is useful to skew the distribution in specific ways,
 		ensuring that some parts of the data is more hot than others.
@@ -1031,14 +1110,15 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		specify separate zones for reads, writes, and trims. If just
 		one set is given, it'll apply to all of them.
 
-percentage_random=int	For a random workload, set how big a percentage should
+percentage_random=int[,int][,int]
+		For a random workload, set how big a percentage should
 		be random. This defaults to 100%, in which case the workload
 		is fully random. It can be set from anywhere from 0 to 100.
 		Setting it to 0 would make the workload fully sequential. Any
 		setting in between will result in a random mix of sequential
-		and random IO, at the given percentages. It is possible to
-		set different values for reads, writes, and trim. To do so,
-		simply use a comma separated list. See blocksize.
+		and random IO, at the given percentages.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
 
 norandommap	Normally fio will cover every block of the file when doing
 		random IO. If this option is given, fio will just get a
@@ -1110,29 +1190,32 @@ thinktime_blocks=int
 		other words, this setting effectively caps the queue depth
 		if the latter is larger.
 
-rate=int	Cap the bandwidth used by this job. The number is in bytes/sec,
-		the normal suffix rules apply. You can use rate=500k to limit
-		reads and writes to 500k each, or you can specify read and
-		writes separately. Using rate=1m,500k would limit reads to
-		1MB/sec and writes to 500KB/sec. Capping only reads or
-		writes can be done with rate=,500k or rate=500k,. The former
-		will only limit writes (to 500KB/sec), the latter will only
-		limit reads.
-
-rate_min=int	Tell fio to do whatever it can to maintain at least this
-		bandwidth. Failing to meet this requirement, will cause
-		the job to exit. The same format as rate is used for
-		read vs write separation.
-
-rate_iops=int	Cap the bandwidth to this number of IOPS. Basically the same
+rate=int[,int][,int]
+		Cap the bandwidth used by this job. The number is in bytes/sec,
+		the normal suffix rules apply.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
+
+rate_min=int[,int][,int]
+		Tell fio to do whatever it can to maintain at least this
+		bandwidth. Failing to meet this requirement will cause
+		the job to exit.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
+
+rate_iops=int[,int][,int]
+		Cap the bandwidth to this number of IOPS. Basically the same
 		as rate, just specified independently of bandwidth. If the
 		job is given a block size range instead of a fixed value,
-		the smallest block size is used as the metric. The same format
-		as rate is used for read vs write separation.
+		the smallest block size is used as the metric.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
 
-rate_iops_min=int If fio doesn't meet this rate of IO, it will cause
-		the job to exit. The same format as rate is used for read vs
-		write separation.
+rate_iops_min=int[,int][,int]
+		If fio doesn't meet this rate of IO, it will cause
+		the job to exit.
+		Comma-separated values may be specified for reads, writes,
+		and trims as described in 'blocksize'.
 
 rate_process=str	This option controls how fio manages rated IO
 		submissions. The default is 'linear', which submits IO in a
@@ -1279,7 +1362,7 @@ sync=bool	Use sync io for buffered writes. For the majority of the
 		io engines, this means using O_SYNC.
 
 iomem=str
-mem=str		Fio can use various types of memory as the io unit buffer.
+mem=str		Fio can use various types of memory as the I/O unit buffer.
 		The allowed values are:
 
 			malloc	Use memory from malloc(3) as the buffers.
@@ -1307,7 +1390,7 @@ mem=str		Fio can use various types of memory as the io unit buffer.
 		that for shmhuge and mmaphuge to work, the system must have
 		free huge pages allocated. This can normally be checked
 		and set by reading/writing /proc/sys/vm/nr_hugepages on a
-		Linux system. Fio assumes a huge page is 4MB in size. So
+		Linux system. Fio assumes a huge page is 4MiB in size. So
 		to calculate the number of huge pages you need for a given
 		job file, add up the io depth of all jobs (normally one unless
 		iodepth= is used) and multiply by the maximum bs set. Then
@@ -1321,7 +1404,7 @@ mem=str		Fio can use various types of memory as the io unit buffer.
 		you would use mem=mmaphuge:/huge/somefile.
 
 iomem_align=int	This indicates the memory alignment of the IO memory buffers.
-		Note that the given alignment is applied to the first IO unit
+		Note that the given alignment is applied to the first I/O unit
 		buffer, if using iodepth the alignment of the following buffers
 		are given by the bs used. In other words, if using a bs that is
 		a multiple of the page sized in the system, all buffers will
@@ -1331,7 +1414,7 @@ iomem_align=int	This indicates the memory alignment of the IO memory buffers.
 
 hugepage-size=int
 		Defines the size of a huge page. Must at least be equal
-		to the system setting, see /proc/meminfo. Defaults to 4MB.
+		to the system setting, see /proc/meminfo. Defaults to 4MiB.
 		Should probably always be a multiple of megabytes, so using
 		hugepage-size=Xm is the preferred way to set this to avoid
 		setting a non-pow-2 bad value.
@@ -2023,7 +2106,7 @@ be the starting port number since fio will use a range of ports.
 fio spits out a lot of output. While running, fio will display the
 status of the jobs created. An example of that would be:
 
-Threads: 1: [_r] [24.8% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s]
+Jobs: 1: [_r] [24.8% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 00h:01m:31s]
 
 The characters inside the square brackets denote the current status of
 each thread. The possible values (in typical life cycle order) are:
@@ -2052,7 +2135,7 @@ Fio will condense the thread string as not to take up more space on the
 command line as is needed. For instance, if you have 10 readers and 10
 writers running, the output would look like this:
 
-Jobs: 20 (f=20): [R(10),W(10)] [4.0% done] [2103MB/0KB/0KB /s] [538K/0/0 iops] [eta 57m:36s]
+Jobs: 20 (f=20): [R(10),W(10)] [4.0% done] [r=20992KiB/s,w=24064KiB/s,t=0KiB/s] [r=82,w=94,t=0 iops] [eta 57m:36s]
 
 Fio will still maintain the ordering, though. So the above means that jobs
 1..10 are readers, and 11..20 are writers.
@@ -2070,10 +2153,10 @@ each thread, group of threads, and disks in that order. For each data
 direction, the output looks like:
 
 Client1 (g=0): err= 0:
-  write: io=    32MB, bw=   666KB/s, iops=89 , runt= 50320msec
+  write: io=    32MiB, bw=   666KiB/s, iops=89 , runt= 50320msec
     slat (msec): min=    0, max=  136, avg= 0.03, stdev= 1.92
     clat (msec): min=    0, max=  631, avg=48.50, stdev=86.82
-    bw (KB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
+    bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
   cpu        : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
   IO depths    : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
@@ -2192,19 +2275,19 @@ Split up, the format is as follows:
 
 	terse version, fio version, jobname, groupid, error
 	READ status:
-		Total IO (KB), bandwidth (KB/sec), IOPS, runtime (msec)
+		Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
 		Submission latency: min, max, mean, stdev (usec)
 		Completion latency: min, max, mean, stdev (usec)
 		Completion latency percentiles: 20 fields (see below)
 		Total latency: min, max, mean, stdev (usec)
-		Bw (KB/s): min, max, aggregate percentage of total, mean, stdev
+		Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
 	WRITE status:
-		Total IO (KB), bandwidth (KB/sec), IOPS, runtime (msec)
+		Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
 		Submission latency: min, max, mean, stdev (usec)
 		Completion latency: min, max, mean, stdev(usec)
 		Completion latency percentiles: 20 fields (see below)
 		Total latency: min, max, mean, stdev (usec)
-		Bw (KB/s): min, max, aggregate percentage of total, mean, stdev
+		Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev
 	CPU usage: user, system, context switches, major faults, minor faults
 	IO depths: <=1, 2, 4, 8, 16, 32, >=64
 	IO latencies microseconds: <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
@@ -2395,7 +2478,7 @@ Time for the log entry is always in milliseconds. The value logged depends
 on the type of log, it will be one of the following:
 
 	Latency log		Value is latency in usecs
-	Bandwidth log		Value is in KB/sec
+	Bandwidth log		Value is in KiB/sec
 	IOPS log		Value is IOPS
 
 Data direction is one of the following:
diff --git a/README b/README
index a8a4fdf..a35842e 100644
--- a/README
+++ b/README
@@ -65,7 +65,7 @@ tool (http://www.opencsw.org/get-it/pkgutil/) and then install fio via
 'pkgutil -i fio'.
 
 Windows:
-Bruce Cran <bruce@cran.org.uk> has fio packages for Windows at
+Rebecca Cran <rebecca+fio@bluestop.org> has fio packages for Windows at
 http://www.bluestop.org/fio/ .
 
 
@@ -233,7 +233,7 @@ sections.  The reserved 'global' section is always parsed and used.
 The --alloc-size switch allows one to use a larger pool size for smalloc.
 If running large jobs with randommap enabled, fio can run out of memory.
 Smalloc is an internal allocator for shared structures from a fixed size
-memory pool. The pool size defaults to 16M and can grow to 8 pools.
+memory pool. The pool size defaults to 16MiB and can grow to 8 pools.
 
 NOTE: While running .fio_smalloc.* backing store files are visible in /tmp.
 
diff --git a/backend.c b/backend.c
index a048452..c8c6de6 100644
--- a/backend.c
+++ b/backend.c
@@ -180,8 +180,8 @@ static bool __check_min_rate(struct thread_data *td, struct timeval *now,
 			 * check bandwidth specified rate
 			 */
 			if (bytes < td->rate_bytes[ddir]) {
-				log_err("%s: min rate %u not met\n", td->o.name,
-								ratemin);
+				log_err("%s: rate_min=%uB/s not met, only transferred %lluB\n",
+					td->o.name, ratemin, bytes);
 				return true;
 			} else {
 				if (spent)
@@ -191,9 +191,8 @@ static bool __check_min_rate(struct thread_data *td, struct timeval *now,
 
 				if (rate < ratemin ||
 				    bytes < td->rate_bytes[ddir]) {
-					log_err("%s: min rate %u not met, got"
-						" %luKB/sec\n", td->o.name,
-							ratemin, rate);
+					log_err("%s: rate_min=%uB/s not met, got %luB/s\n",
+						td->o.name, ratemin, rate);
 					return true;
 				}
 			}
@@ -202,8 +201,8 @@ static bool __check_min_rate(struct thread_data *td, struct timeval *now,
 			 * checks iops specified rate
 			 */
 			if (iops < rate_iops) {
-				log_err("%s: min iops rate %u not met\n",
-						td->o.name, rate_iops);
+				log_err("%s: rate_iops_min=%u not met, only performed %lu IOs\n",
+						td->o.name, rate_iops, iops);
 				return true;
 			} else {
 				if (spent)
@@ -213,9 +212,8 @@ static bool __check_min_rate(struct thread_data *td, struct timeval *now,
 
 				if (rate < rate_iops_min ||
 				    iops < td->rate_blocks[ddir]) {
-					log_err("%s: min iops rate %u not met,"
-						" got %lu\n", td->o.name,
-							rate_iops_min, rate);
+					log_err("%s: rate_iops_min=%u not met, got %lu IOPS\n",
+						td->o.name, rate_iops_min, rate);
 					return true;
 				}
 			}
diff --git a/cconv.c b/cconv.c
index 336805b..0c11629 100644
--- a/cconv.c
+++ b/cconv.c
@@ -88,7 +88,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->td_ddir = le32_to_cpu(top->td_ddir);
 	o->rw_seq = le32_to_cpu(top->rw_seq);
 	o->kb_base = le32_to_cpu(top->kb_base);
-	o->unit_base = le32_to_cpu(top->kb_base);
+	o->unit_base = le32_to_cpu(top->unit_base);
 	o->ddir_seq_nr = le32_to_cpu(top->ddir_seq_nr);
 	o->ddir_seq_add = le64_to_cpu(top->ddir_seq_add);
 	o->iodepth = le32_to_cpu(top->iodepth);
@@ -336,7 +336,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->td_ddir = cpu_to_le32(o->td_ddir);
 	top->rw_seq = cpu_to_le32(o->rw_seq);
 	top->kb_base = cpu_to_le32(o->kb_base);
-	top->unit_base = cpu_to_le32(o->kb_base);
+	top->unit_base = cpu_to_le32(o->unit_base);
 	top->ddir_seq_nr = cpu_to_le32(o->ddir_seq_nr);
 	top->iodepth = cpu_to_le32(o->iodepth);
 	top->iodepth_low = cpu_to_le32(o->iodepth_low);
diff --git a/client.c b/client.c
index 48d4c52..1b4d3d7 100644
--- a/client.c
+++ b/client.c
@@ -972,7 +972,7 @@ static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->min_run[i]		= le64_to_cpu(src->min_run[i]);
 		dst->max_bw[i]		= le64_to_cpu(src->max_bw[i]);
 		dst->min_bw[i]		= le64_to_cpu(src->min_bw[i]);
-		dst->io_kb[i]		= le64_to_cpu(src->io_kb[i]);
+		dst->iobytes[i]		= le64_to_cpu(src->iobytes[i]);
 		dst->agg[i]		= le64_to_cpu(src->agg[i]);
 	}
 
diff --git a/crc/test.c b/crc/test.c
index 213b5d5..300000d 100644
--- a/crc/test.c
+++ b/crc/test.c
@@ -338,9 +338,9 @@ int fio_crctest(const char *type)
 				sprintf(pre, "\t");
 			else
 				sprintf(pre, "\t\t");
-			printf("%s:%s%8.2f MB/sec\n", t[i].name, pre, mb_sec);
+			printf("%s:%s%8.2f MiB/sec\n", t[i].name, pre, mb_sec);
 		} else
-			printf("%s:inf MB/sec\n", t[i].name);
+			printf("%s:inf MiB/sec\n", t[i].name);
 		first = 0;
 	}
 
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
index 6372576..2516bca 100644
--- a/engines/dev-dax.c
+++ b/engines/dev-dax.c
@@ -58,7 +58,7 @@
 #include "../verify.h"
 
 /*
- * Limits us to 1GB of mapped files in total to model after
+ * Limits us to 1GiB of mapped files in total to model after
  * mmap engine behavior
  */
 #define MMAP_TOTAL_SZ	(1 * 1024 * 1024 * 1024UL)
diff --git a/engines/mmap.c b/engines/mmap.c
index c479ed3..99e1d6a 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -15,7 +15,7 @@
 #include "../verify.h"
 
 /*
- * Limits us to 1GB of mapped files in total
+ * Limits us to 1GiB of mapped files in total
  */
 #define MMAP_TOTAL_SZ	(1 * 1024 * 1024 * 1024UL)
 
diff --git a/eta.c b/eta.c
index 19afad5..1d66163 100644
--- a/eta.c
+++ b/eta.c
@@ -308,7 +308,7 @@ static void calc_rate(int unified_rw_rep, unsigned long mtime,
 
 		diff = io_bytes[i] - prev_io_bytes[i];
 		if (mtime)
-			this_rate = ((1000 * diff) / mtime) / 1024;
+			this_rate = ((1000 * diff) / mtime) / 1024; /* KiB/s */
 		else
 			this_rate = 0;
 
@@ -530,19 +530,28 @@ void display_thread_status(struct jobs_eta *je)
 	}
 
 	p += sprintf(p, "Jobs: %d (f=%d)", je->nr_running, je->files_open);
-	if (je->m_rate[0] || je->m_rate[1] || je->t_rate[0] || je->t_rate[1]) {
+
+	/* rate limits, if any */
+	if (je->m_rate[0] || je->m_rate[1] || je->m_rate[2] ||
+	    je->t_rate[0] || je->t_rate[1] || je->t_rate[2]) {
 		char *tr, *mr;
 
-		mr = num2str(je->m_rate[0] + je->m_rate[1], 4, 0, je->is_pow2, 8);
-		tr = num2str(je->t_rate[0] + je->t_rate[1], 4, 0, je->is_pow2, 8);
-		p += sprintf(p, ", CR=%s/%s KB/s", tr, mr);
+		mr = num2str(je->m_rate[0] + je->m_rate[1] + je->m_rate[2],
+				4, 0, je->is_pow2, N2S_BYTEPERSEC);
+		tr = num2str(je->t_rate[0] + je->t_rate[1] + je->t_rate[2],
+				4, 0, je->is_pow2, N2S_BYTEPERSEC);
+
+		p += sprintf(p, ", %s-%s", mr, tr);
 		free(tr);
 		free(mr);
-	} else if (je->m_iops[0] || je->m_iops[1] || je->t_iops[0] || je->t_iops[1]) {
-		p += sprintf(p, ", CR=%d/%d IOPS",
-					je->t_iops[0] + je->t_iops[1],
-					je->m_iops[0] + je->m_iops[1]);
+	} else if (je->m_iops[0] || je->m_iops[1] || je->m_iops[2] ||
+		   je->t_iops[0] || je->t_iops[1] || je->t_iops[2]) {
+		p += sprintf(p, ", %d-%d IOPS",
+					je->m_iops[0] + je->m_iops[1] + je->m_iops[2],
+					je->t_iops[0] + je->t_iops[1] + je->t_iops[2]);
 	}
+
+	/* current run string, % done, bandwidth, iops, eta */
 	if (je->eta_sec != INT_MAX && je->nr_running) {
 		char perc_str[32];
 		char *iops_str[DDIR_RWDIR_CNT];
@@ -553,7 +562,7 @@ void display_thread_status(struct jobs_eta *je)
 
 		if ((!je->eta_sec && !eta_good) || je->nr_ramp == je->nr_running ||
 		    je->eta_sec == -1)
-			strcpy(perc_str, "-.-% done");
+			strcpy(perc_str, "-.-%");
 		else {
 			double mult = 100.0;
 
@@ -562,22 +571,31 @@ void display_thread_status(struct jobs_eta *je)
 
 			eta_good = 1;
 			perc *= mult;
-			sprintf(perc_str, "%3.1f%% done", perc);
+			sprintf(perc_str, "%3.1f%%", perc);
 		}
 
 		for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
-			rate_str[ddir] = num2str(je->rate[ddir], 5,
+			rate_str[ddir] = num2str(je->rate[ddir], 4,
 						1024, je->is_pow2, je->unit_base);
-			iops_str[ddir] = num2str(je->iops[ddir], 4, 1, 0, 0);
+			iops_str[ddir] = num2str(je->iops[ddir], 4, 1, 0, N2S_NONE);
 		}
 
 		left = sizeof(output) - (p - output) - 1;
 
-		l = snprintf(p, left, ": [%s] [%s] [%s/%s/%s /s] [%s/%s/%s iops] [eta %s]",
+		if (je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM])
+			l = snprintf(p, left,
+				": [%s][%s][r=%s,w=%s,t=%s][r=%s,w=%s,t=%s IOPS][eta %s]",
 				je->run_str, perc_str, rate_str[DDIR_READ],
 				rate_str[DDIR_WRITE], rate_str[DDIR_TRIM],
 				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
 				iops_str[DDIR_TRIM], eta_str);
+		else
+			l = snprintf(p, left,
+				": [%s][%s][r=%s,w=%s][r=%s,w=%s IOPS][eta %s]",
+				je->run_str, perc_str,
+				rate_str[DDIR_READ], rate_str[DDIR_WRITE],
+				iops_str[DDIR_READ], iops_str[DDIR_WRITE],
+				eta_str);
 		p += l;
 		if (l >= 0 && l < linelen_last)
 			p += sprintf(p, "%*s", linelen_last - l, "");
diff --git a/filesetup.c b/filesetup.c
index 969e7cc..ef94bd2 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -948,9 +948,8 @@ int setup_files(struct thread_data *td)
 	if (need_extend) {
 		temp_stall_ts = 1;
 		if (output_format & FIO_OUTPUT_NORMAL)
-			log_info("%s: Laying out IO file(s) (%u file(s) /"
-				 " %lluMB)\n", o->name, need_extend,
-					extend_size >> 20);
+			log_info("%s: Laying out IO file(s) (%u file(s) / %lluMiB)\n",
+				 o->name, need_extend, extend_size >> 20);
 
 		for_each_file(td, f, i) {
 			unsigned long long old_len = -1ULL, extend_len = -1ULL;
diff --git a/fio.1 b/fio.1
index 6161760..f486276 100644
--- a/fio.1
+++ b/fio.1
@@ -1,4 +1,4 @@
-.TH fio 1 "December 2014" "User Manual"
+.TH fio 1 "December 2016" "User Manual"
 .SH NAME
 fio \- flexible I/O tester
 .SH SYNOPSIS
@@ -147,19 +147,77 @@ parentheses). The types used are:
 String: a sequence of alphanumeric characters.
 .TP
 .I int
-SI integer: a whole number, possibly containing a suffix denoting the base unit
-of the value.  Accepted suffixes are `k', 'M', 'G', 'T', and 'P', denoting
-kilo (1024), mega (1024^2), giga (1024^3), tera (1024^4), and peta (1024^5)
-respectively. If prefixed with '0x', the value is assumed to be base 16
-(hexadecimal). A suffix may include a trailing 'b', for instance 'kb' is
-identical to 'k'. You can specify a base 10 value by using 'KiB', 'MiB','GiB',
-etc. This is useful for disk drives where values are often given in base 10
-values. Specifying '30GiB' will get you 30*1000^3 bytes.
-When specifying times the default suffix meaning changes, still denoting the
-base unit of the value, but accepted suffixes are 'D' (days), 'H' (hours), 'M'
-(minutes), 'S' Seconds, 'ms' (or msec) milli seconds, 'us' (or 'usec') micro
-seconds. Time values without a unit specify seconds.
-The suffixes are not case sensitive.
+Integer. A whole number value, which may contain an integer prefix
+and an integer suffix.
+
+[integer prefix]number[integer suffix]
+
+The optional integer prefix specifies the number's base. The default
+is decimal. 0x specifies hexadecimal.
+
+The optional integer suffix specifies the number's units, and includes
+an optional unit prefix and an optional unit.  For quantities
+of data, the default unit is bytes. For quantities of time,
+the default unit is seconds.
+
+With \fBkb_base=1000\fR, fio follows international standards for unit prefixes.
+To specify power-of-10 decimal values defined in the International
+System of Units (SI):
+.nf
+k means kilo (K) or 1000
+m means mega (M) or 1000**2
+g means giga (G) or 1000**3
+t means tera (T) or 1000**4
+p means peta (P) or 1000**5
+.fi
+
+To specify power-of-2 binary values defined in IEC 80000-13:
+.nf
+ki means kibi (Ki) or 1024
+mi means mebi (Mi) or 1024**2
+gi means gibi (Gi) or 1024**3
+ti means tebi (Ti) or 1024**4
+pi means pebi (Pi) or 1024**5
+.fi
+
+With \fBkb_base=1024\fR (the default), the unit prefixes are opposite from
+those specified in the SI and IEC 80000-13 standards to provide
+compatibility with old scripts.  For example, 4k means 4096.
+
+.nf
+Examples with \fBkb_base=1000\fR:
+4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
+1 MiB: 1048576, 1mi, 1024ki
+1 MB: 1000000, 1m, 1000k
+1 GiB: 1073741824, 1gi, 1024mi, 1048576ki
+1 GB: 1000000000, 1g, 1000m, 1000000k
+.fi
+
+.nf
+Examples with \fBkb_base=1024\fR (default):
+4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
+1 MiB: 1048576, 1m, 1024k
+1 MB: 1000000, 1mi, 1000ki
+1 GiB: 1073741824, 1g, 1024m, 1048576k
+1 GB: 1000000000, 1gi, 1000mi, 1000000ki
+.fi
+
+For quantities of data, an optional unit of 'B' may be included
+(e.g., 'kb' is the same as 'k').
+
+The integer suffix is not case sensitive (e.g., m/mi mean mebi/mega,
+not milli). 'b' and 'B' both mean byte, not bit.
+
+To specify times (units are not case sensitive):
+.nf
+D means days
+H means hours
+M means minutes
+s or sec means seconds (default)
+ms or msec means milliseconds
+us or usec means microseconds
+.fi
+
 .TP
 .I bool
 Boolean: a true or false value. `0' denotes false, `1' denotes true.
@@ -287,7 +345,7 @@ Sequential reads.
 Sequential writes.
 .TP
 .B trim
-Sequential trim (Linux block devices only).
+Sequential trims (Linux block devices only).
 .TP
 .B randread
 Random reads.
@@ -296,7 +354,7 @@ Random reads.
 Random writes.
 .TP
 .B randtrim
-Random trim (Linux block devices only).
+Random trims (Linux block devices only).
 .TP
 .B rw, readwrite
 Mixed sequential reads and writes.
@@ -305,8 +363,8 @@ Mixed sequential reads and writes.
 Mixed random reads and writes.
 .TP
 .B trimwrite
-Trim and write mixed workload. Blocks will be trimmed first, then the same
-blocks will be written to.
+Sequential trim and write mixed workload. Blocks will be trimmed first, then
+the same blocks will be written to.
 .RE
 .P
 Fio defaults to read if the option is not specified.
@@ -353,7 +411,7 @@ reasons. Allowed values are 1024 or 1000, with 1024 being the default.
 .TP
 .BI unified_rw_reporting \fR=\fPbool
 Fio normally reports statistics on a per data direction basis, meaning that
-read, write, and trim are accounted and reported separately. If this option is
+reads, writes, and trims are accounted and reported separately. If this option is
 set fio sums the results and reports them as "mixed" instead.
 .TP
 .BI randrepeat \fR=\fPbool
@@ -463,20 +521,32 @@ size of a file. If this option is set, then fio will append to the file
 instead. This has identical behavior to setting \fRoffset\fP to the size
 of a file. This option is ignored on non-regular files.
 .TP
-.BI blocksize \fR=\fPint[,int] "\fR,\fB bs" \fR=\fPint[,int]
-Block size for I/O units.  Default: 4k.  Values for reads, writes, and trims
-can be specified separately in the format \fIread\fR,\fIwrite\fR,\fItrim\fR
-either of which may be empty to leave that value at its default. If a trailing
-comma isn't given, the remainder will inherit the last value set.
-.TP
-.BI blocksize_range \fR=\fPirange[,irange] "\fR,\fB bsrange" \fR=\fPirange[,irange]
-Specify a range of I/O block sizes.  The issued I/O unit will always be a
-multiple of the minimum size, unless \fBblocksize_unaligned\fR is set.  Applies
-to both reads and writes if only one range is given, but can be specified
-separately with a comma separating the values. Example: bsrange=1k-4k,2k-8k.
-Also (see \fBblocksize\fR).
-.TP
-.BI bssplit \fR=\fPstr
+.BI blocksize \fR=\fPint[,int][,int] "\fR,\fB bs" \fR=\fPint[,int][,int]
+The block size in bytes for I/O units.  Default: 4096.
+A single value applies to reads, writes, and trims.
+Comma-separated values may be specified for reads, writes, and trims.
+Empty values separated by commas use the default value. A value not
+terminated in a comma applies to subsequent types.
+.nf
+Examples:
+bs=256k    means 256k for reads, writes and trims
+bs=8k,32k  means 8k for reads, 32k for writes and trims
+bs=8k,32k, means 8k for reads, 32k for writes, and default for trims
+bs=,8k     means default for reads, 8k for writes and trims
+bs=,8k,    means default for reads, 8k for writes, and default for trims
+.fi
+.TP
+.BI blocksize_range \fR=\fPirange[,irange][,irange] "\fR,\fB bsrange" \fR=\fPirange[,irange][,irange]
+A range of block sizes in bytes for I/O units.
+The issued I/O unit will always be a multiple of the minimum size, unless
+\fBblocksize_unaligned\fR is set.
+Comma-separated ranges may be specified for reads, writes, and trims
+as described in \fBblocksize\fR.
+.nf
+Example: bsrange=1k-4k,2k-8k.
+.fi
+.TP
+.BI bssplit \fR=\fPstr[,str][,str]
 This option allows even finer grained control of the block sizes issued,
 not just even splits between them. With this option, you can weight various
 block sizes for exact control of the issued IO for a job that has mixed
@@ -484,26 +554,28 @@ block sizes. The format of the option is bssplit=blocksize/percentage,
 optionally adding as many definitions as needed separated by a colon.
 Example: bssplit=4k/10:64k/50:32k/40 would issue 50% 64k blocks, 10% 4k
 blocks and 40% 32k blocks. \fBbssplit\fR also supports giving separate
-splits to reads and writes. The format is identical to what the
-\fBbs\fR option accepts, the read and write parts are separated with a
-comma.
+splits to reads, writes, and trims.
+Comma-separated values may be specified for reads, writes, and trims
+as described in \fBblocksize\fR.
 .TP
-.B blocksize_unaligned\fR,\fP bs_unaligned
-If set, any size in \fBblocksize_range\fR may be used.  This typically won't
+.B blocksize_unaligned\fR,\fB bs_unaligned
+If set, fio will issue I/O units with any size within \fBblocksize_range\fR,
+not just multiples of the minimum size.  This typically won't
 work with direct I/O, as that normally requires sector alignment.
 .TP
-.BI blockalign \fR=\fPint[,int] "\fR,\fB ba" \fR=\fPint[,int]
-At what boundary to align random IO offsets. Defaults to the same as 'blocksize'
-the minimum blocksize given.  Minimum alignment is typically 512b
-for using direct IO, though it usually depends on the hardware block size.
-This option is mutually exclusive with using a random map for files, so it
-will turn off that option.
-.TP
 .BI bs_is_seq_rand \fR=\fPbool
 If this option is set, fio will use the normal read,write blocksize settings as
-sequential,random instead. Any random read or write will use the WRITE
-blocksize settings, and any sequential read or write will use the READ
-blocksize setting.
+sequential,random blocksize settings instead. Any random read or write will
+use the WRITE blocksize settings, and any sequential read or write will use
+the READ blocksize settings.
+.TP
+.BI blockalign \fR=\fPint[,int][,int] "\fR,\fB ba" \fR=\fPint[,int][,int]
+Boundary to which fio will align random I/O units. Default: \fBblocksize\fR.
+Minimum alignment is typically 512b for using direct IO, though it usually
+depends on the hardware block size.  This option is mutually exclusive with
+using a random map for files, so it will turn off that option.
+Comma-separated values may be specified for reads, writes, and trims
+as described in \fBblocksize\fR.
 .TP
 .B zero_buffers
 Initialize buffers with all zeros. Default: fill buffers with random data.
@@ -735,7 +807,7 @@ properly.
 Read, write and erase an MTD character device (e.g., /dev/mtd0). Discards are
 treated as erases. Depending on the underlying device type, the I/O may have
 to go in a certain pattern, e.g., on NAND, writing sequentially to erase blocks
-and discarding before overwriting. The writetrim mode works well for this
+and discarding before overwriting. The trimwrite mode works well for this
 constraint.
 .TP
 .B pmemblk
@@ -963,7 +1035,7 @@ sizes. Like \fBbssplit\fR, it's possible to specify separate zones for reads,
 writes, and trims. If just one set is given, it'll apply to all of them.
 .RE
 .TP
-.BI percentage_random \fR=\fPint
+.BI percentage_random \fR=\fPint[,int][,int]
 For a random workload, set how big a percentage should be random. This defaults
 to 100%, in which case the workload is fully random. It can be set from
 anywhere from 0 to 100.  Setting it to 0 would make the workload fully
@@ -1032,28 +1104,29 @@ will be queued before we have to complete it and do our thinktime. In other
 words, this setting effectively caps the queue depth if the latter is larger.
 Default: 1.
 .TP
-.BI rate \fR=\fPint
+.BI rate \fR=\fPint[,int][,int]
 Cap bandwidth used by this job. The number is in bytes/sec, the normal postfix
 rules apply. You can use \fBrate\fR=500k to limit reads and writes to 500k each,
-or you can specify read and writes separately. Using \fBrate\fR=1m,500k would
-limit reads to 1MB/sec and writes to 500KB/sec. Capping only reads or writes
+or you can specify read, write, and trim limits separately.
+Using \fBrate\fR=1m,500k would
+limit reads to 1MiB/sec and writes to 500KiB/sec. Capping only reads or writes
 can be done with \fBrate\fR=,500k or \fBrate\fR=500k,. The former will only
-limit writes (to 500KB/sec), the latter will only limit reads.
+limit writes (to 500KiB/sec), the latter will only limit reads.
 .TP
-.BI rate_min \fR=\fPint
+.BI rate_min \fR=\fPint[,int][,int]
 Tell \fBfio\fR to do whatever it can to maintain at least the given bandwidth.
 Failing to meet this requirement will cause the job to exit. The same format
-as \fBrate\fR is used for read vs write separation.
+as \fBrate\fR is used for read vs write vs trim separation.
 .TP
-.BI rate_iops \fR=\fPint
+.BI rate_iops \fR=\fPint[,int][,int]
 Cap the bandwidth to this number of IOPS. Basically the same as rate, just
 specified independently of bandwidth. The same format as \fBrate\fR is used for
-read vs write separation. If \fBblocksize\fR is a range, the smallest block
+read vs write vs trim separation. If \fBblocksize\fR is a range, the smallest block
 size is used as the metric.
 .TP
-.BI rate_iops_min \fR=\fPint
+.BI rate_iops_min \fR=\fPint[,int][,int]
 If this rate of I/O is not met, the job will exit. The same format as \fBrate\fR
-is used for read vs write separation.
+is used for read vs write vs trim separation.
 .TP
 .BI rate_process \fR=\fPstr
 This option controls how fio manages rated IO submissions. The default is
@@ -1257,7 +1330,7 @@ sum of the \fBiomem_align\fR and \fBbs\fR used.
 .TP
 .BI hugepage\-size \fR=\fPint
 Defines the size of a huge page.  Must be at least equal to the system setting.
-Should be a multiple of 1MB. Default: 4MB.
+Should be a multiple of 1MiB. Default: 4MiB.
 .TP
 .B exitall
 Terminate all jobs when one finishes.  Default: wait for each job to finish.
@@ -1891,7 +1964,7 @@ Preallocate donor's file on init
 .BI 1:
 allocate space immediately inside defragment event, and free right after event
 .RE
-.TP 
+.TP
 .BI (rbd)clustername \fR=\fPstr
 Specifies the name of the ceph cluster.
 .TP
@@ -1913,7 +1986,7 @@ While running, \fBfio\fR will display the status of the created jobs.  For
 example:
 .RS
 .P
-Threads: 1: [_r] [24.8% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s]
+Jobs: 1: [_r] [24.8% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s]
 .RE
 .P
 The characters in the first set of brackets denote the current status of each
@@ -2075,7 +2148,7 @@ change.  The fields are:
 .P
 Read status:
 .RS
-.B Total I/O \fR(KB)\fP, bandwidth \fR(KB/s)\fP, IOPS, runtime \fR(ms)\fP
+.B Total I/O \fR(KiB)\fP, bandwidth \fR(KiB/s)\fP, IOPS, runtime \fR(ms)\fP
 .P
 Submission latency:
 .RS
@@ -2101,7 +2174,7 @@ Bandwidth:
 .P
 Write status:
 .RS
-.B Total I/O \fR(KB)\fP, bandwidth \fR(KB/s)\fP, IOPS, runtime \fR(ms)\fP
+.B Total I/O \fR(KiB)\fP, bandwidth \fR(KiB/s)\fP, IOPS, runtime \fR(ms)\fP
 .P
 Submission latency:
 .RS
@@ -2364,7 +2437,7 @@ on the type of log, it will be one of the following:
 Value is in latency in usecs
 .TP
 .B Bandwidth log
-Value is in KB/sec
+Value is in KiB/sec
 .TP
 .B IOPS log
 Value is in IOPS
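The man page text above points to the comma-separated read,write,trim format described under blocksize. Below is a minimal sketch of that splitting rule. It assumes the semantics fio's HOWTO documents for bs: an empty field keeps the default, and a value not terminated by a comma also applies to the remaining directions. split_rwt() and the RWT_* names are hypothetical. The helper only splits the string and does not parse sizes such as 500k.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical direction indices: read, write, trim. */
enum { RWT_READ, RWT_WRITE, RWT_TRIM, RWT_CNT };

/*
 * Split a "read,write,trim" option value into per-direction strings.
 * An empty field yields "" (keep the default). A field that is not
 * followed by a comma is copied to all remaining directions.
 * Fields longer than 31 characters are truncated.
 */
static void split_rwt(const char *spec, char out[RWT_CNT][32])
{
	const char *p = spec;
	int i, j;

	assert(spec != NULL);

	for (i = 0; i < RWT_CNT; i++) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len > 31)
			len = 31;
		memcpy(out[i], p, len);
		out[i][len] = '\0';

		if (!comma) {
			/* Not terminated by a comma: applies to later directions too. */
			for (j = i + 1; j < RWT_CNT; j++)
				strcpy(out[j], out[i]);
			return;
		}
		p = comma + 1;
	}
	/* Any fields beyond the third are ignored. */
}
```

Under this rule, rate=500k, leaves the write and trim fields empty (default), which matches the statement above that it only limits reads.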
diff --git a/fio.h b/fio.h
index df17074..62ff7ab 100644
--- a/fio.h
+++ b/fio.h
@@ -535,6 +535,13 @@ extern uintptr_t page_size;
 extern int initialize_fio(char *envp[]);
 extern void deinitialize_fio(void);
 
+#define N2S_NONE	0
+#define N2S_BITPERSEC 	1	/* match unit_base for bit rates */
+#define N2S_PERSEC	2
+#define N2S_BIT		3
+#define N2S_BYTE	4
+#define N2S_BYTEPERSEC 	8	/* match unit_base for byte rates */
+
 #define FIO_GETOPT_JOB		0x89000000
 #define FIO_GETOPT_IOENGINE	0x98000000
 #define FIO_NR_OPTIONS		(FIO_MAX_OPTS + 128)
diff --git a/gclient.c b/gclient.c
index 23b0899..5ce33d0 100644
--- a/gclient.c
+++ b/gclient.c
@@ -364,29 +364,11 @@ static void gfio_update_client_eta(struct fio_client *client, struct jobs_eta *j
 	sprintf(tmp, "%u", je->files_open);
 	gtk_entry_set_text(GTK_ENTRY(ge->eta.files), tmp);
 
-#if 0
-	if (je->m_rate[0] || je->m_rate[1] || je->t_rate[0] || je->t_rate[1]) {
-	if (je->m_rate || je->t_rate) {
-		char *tr, *mr;
-
-		mr = num2str(je->m_rate, 4, 0, i2p);
-		tr = num2str(je->t_rate, 4, 0, i2p);
-		gtk_entry_set_text(GTK_ENTRY(ge->eta);
-		p += sprintf(p, ", CR=%s/%s KB/s", tr, mr);
-		free(tr);
-		free(mr);
-	} else if (je->m_iops || je->t_iops)
-		p += sprintf(p, ", CR=%d/%d IOPS", je->t_iops, je->m_iops);
-
-	gtk_entry_set_text(GTK_ENTRY(ge->eta.cr_bw), "---");
-	gtk_entry_set_text(GTK_ENTRY(ge->eta.cr_iops), "---");
-	gtk_entry_set_text(GTK_ENTRY(ge->eta.cw_bw), "---");
-	gtk_entry_set_text(GTK_ENTRY(ge->eta.cw_iops), "---");
-#endif
-
 	if (je->eta_sec != INT_MAX && je->nr_running) {
 		char *iops_str[DDIR_RWDIR_CNT];
 		char *rate_str[DDIR_RWDIR_CNT];
+		char *rate_alt[DDIR_RWDIR_CNT];
+		char tmp[128];
 		int i;
 
 		if ((!je->eta_sec && !eta_good) || je->nr_ramp == je->nr_running)
@@ -397,19 +379,26 @@ static void gfio_update_client_eta(struct fio_client *client, struct jobs_eta *j
 			sprintf(output, "%3.1f%% done", perc);
 		}
 
-		rate_str[0] = num2str(je->rate[0], 5, 10, i2p, 0);
-		rate_str[1] = num2str(je->rate[1], 5, 10, i2p, 0);
-		rate_str[2] = num2str(je->rate[2], 5, 10, i2p, 0);
+		iops_str[0] = num2str(je->iops[0], 4, 1, 0, N2S_PERSEC);
+		iops_str[1] = num2str(je->iops[1], 4, 1, 0, N2S_PERSEC);
+		iops_str[2] = num2str(je->iops[2], 4, 1, 0, N2S_PERSEC);
 
-		iops_str[0] = num2str(je->iops[0], 4, 1, 0, 0);
-		iops_str[1] = num2str(je->iops[1], 4, 1, 0, 0);
-		iops_str[2] = num2str(je->iops[2], 4, 1, 0, 0);
-
-		gtk_entry_set_text(GTK_ENTRY(ge->eta.read_bw), rate_str[0]);
+		rate_str[0] = num2str(je->rate[0], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[0] = num2str(je->rate[0], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[0], rate_alt[0]);
+		gtk_entry_set_text(GTK_ENTRY(ge->eta.read_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.read_iops), iops_str[0]);
-		gtk_entry_set_text(GTK_ENTRY(ge->eta.write_bw), rate_str[1]);
+
+		rate_str[1] = num2str(je->rate[1], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[1] = num2str(je->rate[1], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[1], rate_alt[1]);
+		gtk_entry_set_text(GTK_ENTRY(ge->eta.write_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.write_iops), iops_str[1]);
-		gtk_entry_set_text(GTK_ENTRY(ge->eta.trim_bw), rate_str[2]);
+
+		rate_str[2] = num2str(je->rate[2], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[2] = num2str(je->rate[2], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[2], rate_alt[2]);
+		gtk_entry_set_text(GTK_ENTRY(ge->eta.trim_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ge->eta.trim_iops), iops_str[2]);
 
 		graph_add_xy_data(ge->graphs.iops_graph, ge->graphs.read_iops, je->elapsed_sec, je->iops[0], iops_str[0]);
@@ -421,6 +410,7 @@ static void gfio_update_client_eta(struct fio_client *client, struct jobs_eta *j
 
 		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 			free(rate_str[i]);
+			free(rate_alt[i]);
 			free(iops_str[i]);
 		}
 	}
@@ -457,31 +447,13 @@ static void gfio_update_all_eta(struct jobs_eta *je)
 		eta_to_str(eta_str, je->eta_sec);
 	}
 
-#if 0
-	if (je->m_rate[0] || je->m_rate[1] || je->t_rate[0] || je->t_rate[1]) {
-	if (je->m_rate || je->t_rate) {
-		char *tr, *mr;
-
-		mr = num2str(je->m_rate, 4, 0, i2p);
-		tr = num2str(je->t_rate, 4, 0, i2p);
-		gtk_entry_set_text(GTK_ENTRY(ui->eta);
-		p += sprintf(p, ", CR=%s/%s KB/s", tr, mr);
-		free(tr);
-		free(mr);
-	} else if (je->m_iops || je->t_iops)
-		p += sprintf(p, ", CR=%d/%d IOPS", je->t_iops, je->m_iops);
-
-	gtk_entry_set_text(GTK_ENTRY(ui->eta.cr_bw), "---");
-	gtk_entry_set_text(GTK_ENTRY(ui->eta.cr_iops), "---");
-	gtk_entry_set_text(GTK_ENTRY(ui->eta.cw_bw), "---");
-	gtk_entry_set_text(GTK_ENTRY(ui->eta.cw_iops), "---");
-#endif
-
 	entry_set_int_value(ui->eta.jobs, je->nr_running);
 
 	if (je->eta_sec != INT_MAX && je->nr_running) {
-		char *iops_str[3];
-		char *rate_str[3];
+		char *iops_str[DDIR_RWDIR_CNT];
+		char *rate_str[DDIR_RWDIR_CNT];
+		char *rate_alt[DDIR_RWDIR_CNT];
+		char tmp[128];
 
 		if ((!je->eta_sec && !eta_good) || je->nr_ramp == je->nr_running)
 			strcpy(output, "-.-% done");
@@ -491,19 +463,26 @@ static void gfio_update_all_eta(struct jobs_eta *je)
 			sprintf(output, "%3.1f%% done", perc);
 		}
 
-		rate_str[0] = num2str(je->rate[0], 5, 10, i2p, 0);
-		rate_str[1] = num2str(je->rate[1], 5, 10, i2p, 0);
-		rate_str[2] = num2str(je->rate[2], 5, 10, i2p, 0);
+		iops_str[0] = num2str(je->iops[0], 4, 1, 0, N2S_PERSEC);
+		iops_str[1] = num2str(je->iops[1], 4, 1, 0, N2S_PERSEC);
+		iops_str[2] = num2str(je->iops[2], 4, 1, 0, N2S_PERSEC);
 
-		iops_str[0] = num2str(je->iops[0], 4, 1, 0, 0);
-		iops_str[1] = num2str(je->iops[1], 4, 1, 0, 0);
-		iops_str[2] = num2str(je->iops[2], 4, 1, 0, 0);
-
-		gtk_entry_set_text(GTK_ENTRY(ui->eta.read_bw), rate_str[0]);
+		rate_str[0] = num2str(je->rate[0], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[0] = num2str(je->rate[0], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[0], rate_alt[0]);
+		gtk_entry_set_text(GTK_ENTRY(ui->eta.read_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.read_iops), iops_str[0]);
-		gtk_entry_set_text(GTK_ENTRY(ui->eta.write_bw), rate_str[1]);
+
+		rate_str[1] = num2str(je->rate[1], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[1] = num2str(je->rate[1], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[1], rate_alt[1]);
+		gtk_entry_set_text(GTK_ENTRY(ui->eta.write_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.write_iops), iops_str[1]);
-		gtk_entry_set_text(GTK_ENTRY(ui->eta.trim_bw), rate_str[2]);
+
+		rate_str[2] = num2str(je->rate[2], 4, 10, i2p, N2S_BYTEPERSEC);
+		rate_alt[2] = num2str(je->rate[2], 4, 10, !i2p, N2S_BYTEPERSEC);
+		snprintf(tmp, sizeof(tmp), "%s (%s)", rate_str[2], rate_alt[2]);
+		gtk_entry_set_text(GTK_ENTRY(ui->eta.trim_bw), tmp);
 		gtk_entry_set_text(GTK_ENTRY(ui->eta.trim_iops), iops_str[2]);
 
 		graph_add_xy_data(ui->graphs.iops_graph, ui->graphs.read_iops, je->elapsed_sec, je->iops[0], iops_str[0]);
@@ -515,6 +494,7 @@ static void gfio_update_all_eta(struct jobs_eta *je)
 
 		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 			free(rate_str[i]);
+			free(rate_alt[i]);
 			free(iops_str[i]);
 		}
 	}
@@ -592,6 +572,7 @@ static void gfio_add_job_op(struct fio_client *client, struct fio_net_cmd *cmd)
 	struct thread_options *o;
 	char *c1, *c2, *c3, *c4;
 	char tmp[80];
+	int i2p;
 
 	p->thread_number = le32_to_cpu(p->thread_number);
 	p->groupid = le32_to_cpu(p->groupid);
@@ -605,11 +586,13 @@ static void gfio_add_job_op(struct fio_client *client, struct fio_net_cmd *cmd)
 	sprintf(tmp, "%s %s", o->odirect ? "direct" : "buffered", ddir_str(o->td_ddir));
 	multitext_add_entry(&ge->eta.iotype, tmp);
 
-	c1 = fio_uint_to_kmg(o->min_bs[DDIR_READ]);
-	c2 = fio_uint_to_kmg(o->max_bs[DDIR_WRITE]);
-	c3 = fio_uint_to_kmg(o->min_bs[DDIR_READ]);
-	c4 = fio_uint_to_kmg(o->max_bs[DDIR_WRITE]);
-	sprintf(tmp, "%s-%s/%s-%s", c1, c2, c3, c4);
+	i2p = is_power_of_2(o->kb_base);
+	c1 = num2str(o->min_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
+	c2 = num2str(o->max_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
+	c3 = num2str(o->min_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
+	c4 = num2str(o->max_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
+
+	sprintf(tmp, "%s-%s,%s-%s", c1, c2, c3, c4);
 	free(c1);
 	free(c2);
 	free(c3);
@@ -948,10 +931,10 @@ static void gfio_show_latency_buckets(struct gfio_client *gc, GtkWidget *vbox,
 				      struct thread_stat *ts)
 {
 	double io_u_lat[FIO_IO_U_LAT_U_NR + FIO_IO_U_LAT_M_NR];
-	const char *ranges[] = { "2u", "4u", "10u", "20u", "50u", "100u",
-				 "250u", "500u", "750u", "1m", "2m",
-				 "4m", "10m", "20m", "50m", "100m",
-				 "250m", "500m", "750m", "1s", "2s", ">= 2s" };
+	const char *ranges[] = { "2us", "4us", "10us", "20us", "50us", "100us",
+				 "250us", "500us", "750us", "1ms", "2ms",
+				 "4ms", "10ms", "20ms", "50ms", "100ms",
+				 "250ms", "500ms", "750ms", "1s", "2s", ">= 2s" };
 	int start, end, i;
 	const int total = FIO_IO_U_LAT_U_NR + FIO_IO_U_LAT_M_NR;
 	GtkWidget *frame, *tree_view, *hbox, *completion_vbox, *drawing_area;
@@ -980,7 +963,7 @@ static void gfio_show_latency_buckets(struct gfio_client *gc, GtkWidget *vbox,
 		return;
 
 	tree_view = gfio_output_lat_buckets(&io_u_lat[start], &ranges[start], end - start + 1);
-	ge->lat_bucket_graph = setup_lat_bucket_graph("Latency Buckets", &io_u_lat[start], &ranges[start], end - start + 1, 700.0, 300.0);
+	ge->lat_bucket_graph = setup_lat_bucket_graph("Latency buckets", &io_u_lat[start], &ranges[start], end - start + 1, 700.0, 300.0);
 
 	frame = gtk_frame_new("Latency buckets");
 	gtk_box_pack_start(GTK_BOX(vbox), frame, FALSE, FALSE, 5);
@@ -1011,8 +994,8 @@ static void gfio_show_lat(GtkWidget *vbox, const char *name, unsigned long min,
 	if (usec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
 
-	minp = num2str(min, 6, 1, 0, 0);
-	maxp = num2str(max, 6, 1, 0, 0);
+	minp = num2str(min, 6, 1, 0, N2S_NONE);
+	maxp = num2str(max, 6, 1, 0, N2S_NONE);
 
 	sprintf(tmp, "%s %s", name, base);
 	frame = gtk_frame_new(tmp);
@@ -1173,7 +1156,8 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 	unsigned long long bw, iops;
 	unsigned int flags = 0;
 	double mean[3], dev[3];
-	char *io_p, *bw_p, *iops_p;
+	char *io_p, *io_palt, *bw_p, *bw_palt, *iops_p;
+	char tmp[128];
 	int i2p;
 
 	if (!ts->runtime[ddir])
@@ -1183,11 +1167,9 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 	runt = ts->runtime[ddir];
 
 	bw = (1000 * ts->io_bytes[ddir]) / runt;
-	io_p = num2str(ts->io_bytes[ddir], 6, 1, i2p, 8);
-	bw_p = num2str(bw, 6, 1, i2p, ts->unit_base);
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
-	iops_p = num2str(iops, 6, 1, 0, 0);
+	iops_p = num2str(iops, 4, 1, 0, N2S_PERSEC);
 
 	box = gtk_hbox_new(FALSE, 3);
 	gtk_box_pack_start(GTK_BOX(mbox), box, TRUE, FALSE, 3);
@@ -1202,9 +1184,17 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 	gtk_box_pack_start(GTK_BOX(main_vbox), box, TRUE, FALSE, 3);
 
 	label = new_info_label_in_frame(box, "IO");
-	gtk_label_set_text(GTK_LABEL(label), io_p);
+	io_p = num2str(ts->io_bytes[ddir], 4, 1, i2p, N2S_BYTE);
+	io_palt = num2str(ts->io_bytes[ddir], 4, 1, !i2p, N2S_BYTE);
+	snprintf(tmp, sizeof(tmp), "%s (%s)", io_p, io_palt);
+	gtk_label_set_text(GTK_LABEL(label), tmp);
+
 	label = new_info_label_in_frame(box, "Bandwidth");
-	gtk_label_set_text(GTK_LABEL(label), bw_p);
+	bw_p = num2str(bw, 4, 1, i2p, ts->unit_base);
+	bw_palt = num2str(bw, 4, 1, !i2p, ts->unit_base);
+	snprintf(tmp, sizeof(tmp), "%s (%s)", bw_p, bw_palt);
+	gtk_label_set_text(GTK_LABEL(label), tmp);
+
 	label = new_info_label_in_frame(box, "IOPS");
 	gtk_label_set_text(GTK_LABEL(label), iops_p);
 	label = new_info_label_in_frame(box, "Runtime (msec)");
@@ -1212,7 +1202,7 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 
 	if (calc_lat(&ts->bw_stat[ddir], &min[0], &max[0], &mean[0], &dev[0])) {
 		double p_of_agg = 100.0;
-		const char *bw_str = "KB";
+		const char *bw_str = "KiB/s";
 		char tmp[32];
 
 		if (rs->agg[ddir]) {
@@ -1221,14 +1211,21 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				p_of_agg = 100.0;
 		}
 
-		if (mean[0] > 999999.9) {
-			min[0] /= 1000.0;
-			max[0] /= 1000.0;
-			mean[0] /= 1000.0;
-			dev[0] /= 1000.0;
-			bw_str = "MB";
+		if (mean[0] > 1073741824.9) {
+			min[0] /= 1048576.0;
+			max[0] /= 1048576.0;
+			mean[0] /= 1048576.0;
+			dev[0] /= 1048576.0;
+			bw_str = "GiB/s";
 		}
 
+		if (mean[0] > 1048575.9) {
+			min[0] /= 1024.0;
+			max[0] /= 1024.0;
+			mean[0] /= 1024.0;
+			dev[0] /= 1024.0;
+			bw_str = "MiB/s";
+		}
 		sprintf(tmp, "Bandwidth (%s)", bw_str);
 		frame = gtk_frame_new(tmp);
 		gtk_box_pack_start(GTK_BOX(main_vbox), frame, FALSE, FALSE, 5);
@@ -1278,6 +1275,8 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 
 	free(io_p);
 	free(bw_p);
+	free(io_palt);
+	free(bw_palt);
 	free(iops_p);
 }
 
diff --git a/gfio.c b/gfio.c
index ce18091..9ccf78c 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1215,7 +1215,7 @@ static void about_dialog(GtkWidget *w, gpointer data)
 {
 	const char *authors[] = {
 		"Jens Axboe <axboe@kernel.dk>",
-		"Stephen Carmeron <stephenmcameron@gmail.com>",
+		"Stephen Cameron <stephenmcameron@gmail.com>",
 		NULL
 	};
 	const char *license[] = {
@@ -1386,7 +1386,7 @@ static GtkWidget *new_client_page(struct gui_entry *ge)
 	g_signal_connect(ge->eta.names, "changed", G_CALLBACK(combo_entry_changed), ge);
 	g_signal_connect(ge->eta.names, "destroy", G_CALLBACK(combo_entry_destroy), ge);
 	ge->eta.iotype.entry = new_info_entry_in_frame(probe_box, "IO");
-	ge->eta.bs.entry = new_info_entry_in_frame(probe_box, "Blocksize (Read/Write)");
+	ge->eta.bs.entry = new_info_entry_in_frame(probe_box, "Blocksize (Read/Write/Trim)");
 	ge->eta.ioengine.entry = new_info_entry_in_frame(probe_box, "IO Engine");
 	ge->eta.iodepth.entry = new_info_entry_in_frame(probe_box, "IO Depth");
 	ge->eta.jobs = new_info_entry_in_frame(probe_box, "Jobs");
@@ -1395,11 +1395,11 @@ static GtkWidget *new_client_page(struct gui_entry *ge)
 	probe_box = gtk_hbox_new(FALSE, 3);
 	gtk_box_pack_start(GTK_BOX(probe_frame), probe_box, FALSE, FALSE, 3);
 	ge->eta.read_bw = new_info_entry_in_frame_rgb(probe_box, "Read BW", GFIO_READ_R, GFIO_READ_G, GFIO_READ_B);
-	ge->eta.read_iops = new_info_entry_in_frame_rgb(probe_box, "IOPS", GFIO_READ_R, GFIO_READ_G, GFIO_READ_B);
+	ge->eta.read_iops = new_info_entry_in_frame_rgb(probe_box, "Read IOPS", GFIO_READ_R, GFIO_READ_G, GFIO_READ_B);
 	ge->eta.write_bw = new_info_entry_in_frame_rgb(probe_box, "Write BW", GFIO_WRITE_R, GFIO_WRITE_G, GFIO_WRITE_B);
-	ge->eta.write_iops = new_info_entry_in_frame_rgb(probe_box, "IOPS", GFIO_WRITE_R, GFIO_WRITE_G, GFIO_WRITE_B);
+	ge->eta.write_iops = new_info_entry_in_frame_rgb(probe_box, "Write IOPS", GFIO_WRITE_R, GFIO_WRITE_G, GFIO_WRITE_B);
 	ge->eta.trim_bw = new_info_entry_in_frame_rgb(probe_box, "Trim BW", GFIO_TRIM_R, GFIO_TRIM_G, GFIO_TRIM_B);
-	ge->eta.trim_iops = new_info_entry_in_frame_rgb(probe_box, "IOPS", GFIO_TRIM_R, GFIO_TRIM_G, GFIO_TRIM_B);
+	ge->eta.trim_iops = new_info_entry_in_frame_rgb(probe_box, "Trim IOPS", GFIO_TRIM_R, GFIO_TRIM_G, GFIO_TRIM_B);
 
 	/*
 	 * Only add this if we have a commit rate
diff --git a/goptions.c b/goptions.c
index b3d3684..16938ed 100644
--- a/goptions.c
+++ b/goptions.c
@@ -826,7 +826,7 @@ static struct gopt *gopt_new_str_val(struct gopt_job_view *gjv,
 				     unsigned long long *p, unsigned int idx)
 {
 	struct gopt_str_val *g;
-	const gchar *postfix[] = { "B", "KB", "MB", "GB", "PB", "TB", "" };
+	const gchar *postfix[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "" };
 	GtkWidget *label;
 	int i;
 
diff --git a/init.c b/init.c
index f26f35d..3c925a3 100644
--- a/init.c
+++ b/init.c
@@ -31,6 +31,7 @@
 #include "oslib/strcasestr.h"
 
 #include "crc/test.h"
+#include "lib/pow2.h"
 
 const char fio_version_string[] = FIO_VERSION;
 
@@ -865,27 +866,6 @@ static int fixup_options(struct thread_data *td)
 	return ret;
 }
 
-/*
- * This function leaks the buffer
- */
-char *fio_uint_to_kmg(unsigned int val)
-{
-	char *buf = malloc(32);
-	char post[] = { 0, 'K', 'M', 'G', 'P', 'E', 0 };
-	char *p = post;
-
-	do {
-		if (val & 1023)
-			break;
-
-		val >>= 10;
-		p++;
-	} while (*p);
-
-	snprintf(buf, 32, "%u%c", val, *p);
-	return buf;
-}
-
 /* External engines are specified by "external:name.o") */
 static const char *get_engine_name(const char *str)
 {
@@ -1528,15 +1508,16 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			if (!td_ioengine_flagged(td, FIO_NOIO)) {
 				char *c1, *c2, *c3, *c4;
 				char *c5 = NULL, *c6 = NULL;
+				int i2p = is_power_of_2(o->kb_base);
 
-				c1 = fio_uint_to_kmg(o->min_bs[DDIR_READ]);
-				c2 = fio_uint_to_kmg(o->max_bs[DDIR_READ]);
-				c3 = fio_uint_to_kmg(o->min_bs[DDIR_WRITE]);
-				c4 = fio_uint_to_kmg(o->max_bs[DDIR_WRITE]);
+				c1 = num2str(o->min_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
+				c2 = num2str(o->max_bs[DDIR_READ], 4, 1, i2p, N2S_BYTE);
+				c3 = num2str(o->min_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
+				c4 = num2str(o->max_bs[DDIR_WRITE], 4, 1, i2p, N2S_BYTE);
 
 				if (!o->bs_is_seq_rand) {
-					c5 = fio_uint_to_kmg(o->min_bs[DDIR_TRIM]);
-					c6 = fio_uint_to_kmg(o->max_bs[DDIR_TRIM]);
+					c5 = num2str(o->min_bs[DDIR_TRIM], 4, 1, i2p, N2S_BYTE);
+					c6 = num2str(o->max_bs[DDIR_TRIM], 4, 1, i2p, N2S_BYTE);
 				}
 
 				log_info("%s: (g=%d): rw=%s, ", td->o.name,
@@ -1544,10 +1525,10 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 							ddir_str(o->td_ddir));
 
 				if (o->bs_is_seq_rand)
-					log_info("bs(seq/rand)=%s-%s/%s-%s, ",
+					log_info("bs=%s-%s,%s-%s, bs_is_seq_rand, ",
 							c1, c2, c3, c4);
 				else
-					log_info("bs=%s-%s/%s-%s/%s-%s, ",
+					log_info("bs=%s-%s,%s-%s,%s-%s, ",
 							c1, c2, c3, c4, c5, c6);
 
 				log_info("ioengine=%s, iodepth=%u\n",
diff --git a/lib/num2str.c b/lib/num2str.c
index 0ed05f3..940d4a5 100644
--- a/lib/num2str.c
+++ b/lib/num2str.c
@@ -6,36 +6,63 @@
 
 #define ARRAY_LENGTH(arr)	sizeof(arr) / sizeof((arr)[0])
 
-/*
- * Cheesy number->string conversion, complete with carry rounding error.
+/**
+ * num2str() - Cheesy number->string conversion, complete with carry rounding error.
+ * @num: quantity (e.g., number of blocks, bytes or bits)
+ * @maxlen: max number of digits in the output string (not counting prefix and units)
+ * @base: multiplier for num (e.g., if num represents Ki, use 1024)
+ * @pow2: select unit prefix - 0=power-of-10 decimal SI, nonzero=power-of-2 binary IEC
+ * @units: select units - N2S_* macros defined in fio.h
+ * @returns a malloc'd buffer containing "number[<unit prefix>][<units>]", or NULL if allocation fails
  */
-char *num2str(uint64_t num, int maxlen, int base, int pow2, int unit_base)
+char *num2str(uint64_t num, int maxlen, int base, int pow2, int units)
 {
-	const char *postfix[] = { "", "K", "M", "G", "P", "E" };
-	const char *byte_postfix[] = { "", "B", "bit" };
+	const char *sistr[] = { "", "k", "M", "G", "T", "P" };
+	const char *iecstr[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi" };
+	const char **unitprefix;
+	const char *unitstr[] = { "", "/s", "B", "bit", "B/s", "bit/s" };
 	const unsigned int thousand[] = { 1000, 1024 };
 	unsigned int modulo, decimals;
-	int byte_post_index = 0, post_index, carry = 0;
+	int unit_index = 0, post_index, carry = 0;
 	char tmp[32];
 	char *buf;
 
+	compiletime_assert(sizeof(sistr) == sizeof(iecstr), "unit prefix arrays must be identical sizes");
+
 	buf = malloc(128);
+	if (!buf)
+		return NULL;
+
+	if (pow2)
+		unitprefix = iecstr;
+	else
+		unitprefix = sistr;
 
 	for (post_index = 0; base > 1; post_index++)
 		base /= thousand[!!pow2];
 
-	switch (unit_base) {
-	case 1:
-		byte_post_index = 2;
+	switch (units) {
+	case N2S_PERSEC:
+		unit_index = 1;
+		break;
+	case N2S_BYTE:
+		unit_index = 2;
+		break;
+	case N2S_BIT:
+		unit_index = 3;
 		num *= 8;
 		break;
-	case 8:
-		byte_post_index = 1;
+	case N2S_BYTEPERSEC:
+		unit_index = 4;
+		break;
+	case N2S_BITPERSEC:
+		unit_index = 5;
+		num *= 8;
 		break;
 	}
 
 	modulo = -1U;
-	while (post_index < sizeof(postfix)) {
+	while (post_index < ARRAY_LENGTH(sistr)) {
 		sprintf(tmp, "%llu", (unsigned long long) num);
 		if (strlen(tmp) <= maxlen)
 			break;
@@ -48,11 +75,11 @@ char *num2str(uint64_t num, int maxlen, int base, int pow2, int unit_base)
 
 	if (modulo == -1U) {
 done:
-		if (post_index >= ARRAY_LENGTH(postfix))
+		if (post_index >= ARRAY_LENGTH(sistr))
 			post_index = 0;
 
 		sprintf(buf, "%llu%s%s", (unsigned long long) num,
-			postfix[post_index], byte_postfix[byte_post_index]);
+			unitprefix[post_index], unitstr[unit_index]);
 		return buf;
 	}
 
@@ -73,6 +100,6 @@ done:
 	} while (1);
 
 	sprintf(buf, "%llu.%u%s%s", (unsigned long long) num, modulo,
-			postfix[post_index], byte_postfix[byte_post_index]);
+			unitprefix[post_index], unitstr[unit_index]);
 	return buf;
 }
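The reworked num2str() above picks its prefix table (SI or IEC) from pow2 and its suffix from the N2S_* units value, then divides by 1000 or 1024 until the number fits in maxlen digits. As a rough model of that prefix selection only, the hypothetical helper fmt_units() below takes the unit suffix as a plain string and drops num2str()'s fractional digits and carry rounding, so it is a simplification rather than a replacement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified model: integer division only, no fractional part or rounding. */
static void fmt_units(char *buf, size_t len, uint64_t num, int maxlen,
		      int pow2, const char *units)
{
	static const char *sistr[] = { "", "k", "M", "G", "T", "P" };
	static const char *iecstr[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi" };
	const size_t nprefix = sizeof(sistr) / sizeof(sistr[0]);
	const char **prefix = pow2 ? iecstr : sistr;
	const uint64_t thousand = pow2 ? 1024 : 1000;
	size_t idx = 0;
	char tmp[32];

	assert(buf != NULL && units != NULL && maxlen > 0);

	/* Divide down until the value fits in maxlen digits or prefixes run out. */
	for (;;) {
		snprintf(tmp, sizeof(tmp), "%llu", (unsigned long long) num);
		if ((int) strlen(tmp) <= maxlen || idx + 1 >= nprefix)
			break;
		num /= thousand;
		idx++;
	}
	snprintf(buf, len, "%llu%s%s", (unsigned long long) num, prefix[idx], units);
}
```

For example, 131072 bytes with maxlen=4 becomes "128KiB" with binary prefixes and "131kB" with decimal ones, while 4096 already fits in four digits and stays "4096B".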
diff --git a/memory.c b/memory.c
index 9124117..9e73f10 100644
--- a/memory.c
+++ b/memory.c
@@ -33,13 +33,13 @@ int fio_pin_memory(struct thread_data *td)
 	dprint(FD_MEM, "pinning %llu bytes\n", td->o.lockmem);
 
 	/*
-	 * Don't allow mlock of more than real_mem-128MB
+	 * Don't allow mlock of more than real_mem-128MiB
 	 */
 	phys_mem = os_phys_mem();
 	if (phys_mem) {
 		if ((td->o.lockmem + 128 * 1024 * 1024) > phys_mem) {
 			td->o.lockmem = phys_mem - 128 * 1024 * 1024;
-			log_info("fio: limiting mlocked memory to %lluMB\n",
+			log_info("fio: limiting mlocked memory to %lluMiB\n",
 							td->o.lockmem >> 20);
 		}
 	}
diff --git a/options.c b/options.c
index d8b4012..0f2adcd 100644
--- a/options.c
+++ b/options.c
@@ -1965,7 +1965,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.off3	= offsetof(struct thread_options, bs[DDIR_TRIM]),
 		.minval = 1,
 		.help	= "Block size unit",
-		.def	= "4k",
+		.def	= "4096",
 		.parent = "rw",
 		.hide	= 1,
 		.interval = 512,
@@ -2885,7 +2885,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.off1	= offsetof(struct thread_options, trim_percentage),
 		.minval = 0,
 		.maxval = 100,
-		.help	= "Number of verify blocks to discard/trim",
+		.help	= "Number of verify blocks to trim (i.e., discard)",
 		.parent	= "verify",
 		.def	= "0",
 		.interval = 1,
@@ -2897,7 +2897,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "trim_verify_zero",
 		.lname	= "Verify trim zero",
 		.type	= FIO_OPT_BOOL,
-		.help	= "Verify that trim/discarded blocks are returned as zeroes",
+		.help	= "Verify that trimmed (i.e., discarded) blocks are returned as zeroes",
 		.off1	= offsetof(struct thread_options, trim_zero),
 		.parent	= "trim_percentage",
 		.hide	= 1,
@@ -4180,20 +4180,20 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.posval = {
 			  { .ival = "1024",
 			    .oval = 1024,
-			    .help = "Use 1024 as the K base",
+			    .help = "Inputs invert IEC and SI prefixes (for compatibility); outputs prefer binary",
 			  },
 			  { .ival = "1000",
 			    .oval = 1000,
-			    .help = "Use 1000 as the K base",
+			    .help = "Inputs use IEC and SI prefixes; outputs prefer SI",
 			  },
 		},
-		.help	= "How many bytes per KB for reporting (1000 or 1024)",
+		.help	= "Unit prefix interpretation for quantities of data (IEC and SI)",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
 		.name	= "unit_base",
-		.lname	= "Base unit for reporting (Bits or Bytes)",
+		.lname	= "Unit for quantities of data (Bits or Bytes)",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct thread_options, unit_base),
 		.prio	= 1,
diff --git a/parse.c b/parse.c
index 8ed4619..518c2df 100644
--- a/parse.c
+++ b/parse.c
@@ -207,32 +207,50 @@ static unsigned long long __get_mult_bytes(const char *p, void *data,
 		}
 	}
 
+	/* If kb_base is 1000, use true units.
+	 * If kb_base is 1024, use opposite units.
+	 */
 	if (!strncmp("pib", c, 3)) {
 		pow = 5;
-		mult = 1000;
+		if (kb_base == 1000)
+			mult = 1024;
+		else if (kb_base == 1024)
+			mult = 1000;
 	} else if (!strncmp("tib", c, 3)) {
 		pow = 4;
-		mult = 1000;
+		if (kb_base == 1000)
+			mult = 1024;
+		else if (kb_base == 1024)
+			mult = 1000;
 	} else if (!strncmp("gib", c, 3)) {
 		pow = 3;
-		mult = 1000;
+		if (kb_base == 1000)
+			mult = 1024;
+		else if (kb_base == 1024)
+			mult = 1000;
 	} else if (!strncmp("mib", c, 3)) {
 		pow = 2;
-		mult = 1000;
+		if (kb_base == 1000)
+			mult = 1024;
+		else if (kb_base == 1024)
+			mult = 1000;
 	} else if (!strncmp("kib", c, 3)) {
 		pow = 1;
-		mult = 1000;
-	} else if (!strncmp("p", c, 1) || !strncmp("pb", c, 2))
+		if (kb_base == 1000)
+			mult = 1024;
+		else if (kb_base == 1024)
+			mult = 1000;
+	} else if (!strncmp("p", c, 1) || !strncmp("pb", c, 2)) {
 		pow = 5;
-	else if (!strncmp("t", c, 1) || !strncmp("tb", c, 2))
+	} else if (!strncmp("t", c, 1) || !strncmp("tb", c, 2)) {
 		pow = 4;
-	else if (!strncmp("g", c, 1) || !strncmp("gb", c, 2))
+	} else if (!strncmp("g", c, 1) || !strncmp("gb", c, 2)) {
 		pow = 3;
-	else if (!strncmp("m", c, 1) || !strncmp("mb", c, 2))
+	} else if (!strncmp("m", c, 1) || !strncmp("mb", c, 2)) {
 		pow = 2;
-	else if (!strncmp("k", c, 1) || !strncmp("kb", c, 2))
+	} else if (!strncmp("k", c, 1) || !strncmp("kb", c, 2)) {
 		pow = 1;
-	else if (!strncmp("%", c, 1)) {
+	} else if (!strncmp("%", c, 1)) {
 		*percent = 1;
 		free(c);
 		return ret;
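The kb_base handling added to __get_mult_bytes() above can be modeled in isolation. This sketch uses the hypothetical helpers prefix_step() and apply_prefix(). It assumes that a plain suffix such as "k" or "kb" keeps the default multiplier of kb_base, which is set outside the hunk shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-step multiplier for a size suffix. iec is nonzero for suffixes such as
 * "kib" or "mib". Assumption: plain "k"/"kb"-style suffixes keep the default
 * multiplier, taken here to be kb_base.
 */
static unsigned int prefix_step(unsigned int kb_base, int iec)
{
	assert(kb_base == 1000 || kb_base == 1024);
	if (!iec)
		return kb_base;
	/* kb_base=1000: IEC suffixes mean 1024; kb_base=1024: inverted to 1000. */
	return kb_base == 1000 ? 1024 : 1000;
}

/* Scale val by step^npow, e.g. npow=1 for k/kib, npow=2 for m/mib. */
static uint64_t apply_prefix(uint64_t val, unsigned int kb_base, int iec, int npow)
{
	const unsigned int step = prefix_step(kb_base, iec);
	uint64_t mult = 1;

	while (npow-- > 0)
		mult *= step;
	return val * mult;
}
```

With kb_base=1024, 4k therefore means 4096 bytes while 4kib means 4000, which is the compatibility inversion the kb_base help text in options.c describes.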
diff --git a/profiles/act.c b/profiles/act.c
index 3e9238b..643f8a8 100644
--- a/profiles/act.c
+++ b/profiles/act.c
@@ -130,21 +130,21 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "read-req-num-512-blocks",
-		.lname	= "Number of 512b blocks to read",
+		.lname	= "Number of 512B blocks to read",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct act_options, num_read_blocks),
-		.help	= "Number of 512b blocks to read at the time",
+		.help	= "Number of 512B blocks to read at a time",
 		.def	= "3",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_ACT,
 	},
 	{
 		.name	= "large-block-op-kbytes",
-		.lname	= "Size of large block ops (writes)",
+		.lname	= "Size of large block ops in KiB (writes)",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct act_options, write_size),
-		.help	= "Size of large block ops (writes)",
-		.def	= "128k",
+		.help	= "Size of large block ops in KiB (writes)",
+		.def	= "131072",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_ACT,
 	},
@@ -220,7 +220,7 @@ static int act_add_dev_prep(const char *dev)
 		return 1;
 	if (act_add_opt("filename=%s", dev))
 		return 1;
-	if (act_add_opt("bs=1M"))
+	if (act_add_opt("bs=1048576"))
 		return 1;
 	if (act_add_opt("zero_buffers"))
 		return 1;
@@ -234,7 +234,7 @@ static int act_add_dev_prep(const char *dev)
 		return 1;
 	if (act_add_opt("filename=%s", dev))
 		return 1;
-	if (act_add_opt("bs=4k"))
+	if (act_add_opt("bs=4096"))
 		return 1;
 	if (act_add_opt("ioengine=libaio"))
 		return 1;
diff --git a/profiles/tiobench.c b/profiles/tiobench.c
index 8af6f4e..9d9885a 100644
--- a/profiles/tiobench.c
+++ b/profiles/tiobench.c
@@ -39,7 +39,7 @@ static struct fio_option options[] = {
 		.lname	= "Tiobench size",
 		.type	= FIO_OPT_STR_VAL,
 		.off1	= offsetof(struct tiobench_options, size),
-		.help	= "Size in MB",
+		.help	= "Size in MiB",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_TIOBENCH,
 	},
@@ -49,7 +49,7 @@ static struct fio_option options[] = {
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct tiobench_options, bs),
 		.help	= "Block size in bytes",
-		.def	= "4k",
+		.def	= "4096",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_TIOBENCH,
 	},
@@ -91,7 +91,7 @@ static struct fio_option options[] = {
 static int tb_prep_cmdline(void)
 {
 	/*
-	 * tiobench uses size as MB, so multiply up
+	 * tiobench uses size as MiB, so multiply up
 	 */
 	size *= 1024 * 1024ULL;
 	if (size)
diff --git a/server.c b/server.c
index 2e05415..b7ebd63 100644
--- a/server.c
+++ b/server.c
@@ -1444,7 +1444,7 @@ static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->min_run[i]		= cpu_to_le64(src->min_run[i]);
 		dst->max_bw[i]		= cpu_to_le64(src->max_bw[i]);
 		dst->min_bw[i]		= cpu_to_le64(src->min_bw[i]);
-		dst->io_kb[i]		= cpu_to_le64(src->io_kb[i]);
+		dst->iobytes[i]		= cpu_to_le64(src->iobytes[i]);
 		dst->agg[i]		= cpu_to_le64(src->agg[i]);
 	}
 
diff --git a/stat.c b/stat.c
index 3e57e54..f1d468c 100644
--- a/stat.c
+++ b/stat.c
@@ -279,7 +279,8 @@ bool calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max,
 
 void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 {
-	char *p1, *p2, *p3, *p4;
+	char *io, *agg, *min, *max;
+	char *ioalt, *aggalt, *minalt, *maxalt;
 	const char *str[] = { "   READ", "  WRITE" , "   TRIM"};
 	int i;
 
@@ -291,22 +292,28 @@ void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
 		if (!rs->max_run[i])
 			continue;
 
-		p1 = num2str(rs->io_kb[i], 6, rs->kb_base, i2p, 8);
-		p2 = num2str(rs->agg[i], 6, rs->kb_base, i2p, rs->unit_base);
-		p3 = num2str(rs->min_bw[i], 6, rs->kb_base, i2p, rs->unit_base);
-		p4 = num2str(rs->max_bw[i], 6, rs->kb_base, i2p, rs->unit_base);
-
-		log_buf(out, "%s: io=%s, aggrb=%s/s, minb=%s/s, maxb=%s/s,"
-			 " mint=%llumsec, maxt=%llumsec\n",
+		io = num2str(rs->iobytes[i], 4, 1, i2p, N2S_BYTE);
+		ioalt = num2str(rs->iobytes[i], 4, 1, !i2p, N2S_BYTE);
+		agg = num2str(rs->agg[i], 4, 1, i2p, rs->unit_base);
+		aggalt = num2str(rs->agg[i], 4, 1, !i2p, rs->unit_base);
+		min = num2str(rs->min_bw[i], 4, 1, i2p, rs->unit_base);
+		minalt = num2str(rs->min_bw[i], 4, 1, !i2p, rs->unit_base);
+		max = num2str(rs->max_bw[i], 4, 1, i2p, rs->unit_base);
+		maxalt = num2str(rs->max_bw[i], 4, 1, !i2p, rs->unit_base);
+		log_buf(out, "%s: bw=%s (%s), %s-%s (%s-%s), io=%s (%s), run=%llu-%llumsec\n",
 				rs->unified_rw_rep ? "  MIXED" : str[i],
-				p1, p2, p3, p4,
+				agg, aggalt, min, max, minalt, maxalt, io, ioalt,
 				(unsigned long long) rs->min_run[i],
 				(unsigned long long) rs->max_run[i]);
 
-		free(p1);
-		free(p2);
-		free(p3);
-		free(p4);
+		free(io);
+		free(agg);
+		free(min);
+		free(max);
+		free(ioalt);
+		free(aggalt);
+		free(minalt);
+		free(maxalt);
 	}
 }
 
@@ -367,8 +374,8 @@ static void display_lat(const char *name, unsigned long min, unsigned long max,
 	if (usec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
 
-	minp = num2str(min, 6, 1, 0, 0);
-	maxp = num2str(max, 6, 1, 0, 0);
+	minp = num2str(min, 6, 1, 0, N2S_NONE);
+	maxp = num2str(max, 6, 1, 0, N2S_NONE);
 
 	log_buf(out, "    %s %s: min=%s, max=%s, avg=%5.02f,"
 		 " stdev=%5.02f\n", name, base, minp, maxp, mean, dev);
@@ -380,11 +387,11 @@ static void display_lat(const char *name, unsigned long min, unsigned long max,
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			     int ddir, struct buf_output *out)
 {
-	const char *str[] = { "read ", "write", "trim" };
+	const char *str[] = { " read", "write", " trim" };
 	unsigned long min, max, runt;
 	unsigned long long bw, iops;
 	double mean, dev;
-	char *io_p, *bw_p, *iops_p;
+	char *io_p, *bw_p, *bw_p_alt, *iops_p;
 	int i2p;
 
 	assert(ddir_rw(ddir));
@@ -396,19 +403,21 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	runt = ts->runtime[ddir];
 
 	bw = (1000 * ts->io_bytes[ddir]) / runt;
-	io_p = num2str(ts->io_bytes[ddir], 6, 1, i2p, 8);
-	bw_p = num2str(bw, 6, 1, i2p, ts->unit_base);
+	io_p = num2str(ts->io_bytes[ddir], 4, 1, i2p, N2S_BYTE);
+	bw_p = num2str(bw, 4, 1, i2p, ts->unit_base);
+	bw_p_alt = num2str(bw, 4, 1, !i2p, ts->unit_base);
 
 	iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt;
-	iops_p = num2str(iops, 6, 1, 0, 0);
+	iops_p = num2str(iops, 4, 1, 0, N2S_NONE);
 
-	log_buf(out, "  %s: io=%s, bw=%s/s, iops=%s, runt=%6llumsec\n",
-				rs->unified_rw_rep ? "mixed" : str[ddir],
-				io_p, bw_p, iops_p,
-				(unsigned long long) ts->runtime[ddir]);
+	log_buf(out, "  %s: IOPS=%s, BW=%s (%s)(%s/%llumsec)\n",
+			rs->unified_rw_rep ? "mixed" : str[ddir],
+			iops_p, bw_p, bw_p_alt, io_p,
+			(unsigned long long) ts->runtime[ddir]);
 
 	free(io_p);
 	free(bw_p);
+	free(bw_p_alt);
 	free(iops_p);
 
 	if (calc_lat(&ts->slat_stat[ddir], &min, &max, &mean, &dev))
@@ -426,7 +435,16 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	}
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
 		double p_of_agg = 100.0, fkb_base = (double)rs->kb_base;
-		const char *bw_str = (rs->unit_base == 1 ? "Kbit" : "KB");
+		const char *bw_str;
+
+		if ((rs->unit_base == 1) && i2p)
+			bw_str = "Kibit";
+		else if (rs->unit_base == 1)
+			bw_str = "kbit";
+		else if (i2p)
+			bw_str = "KiB";
+		else
+			bw_str = "kB";
 
 		if (rs->unit_base == 1) {
 			min *= 8.0;
@@ -446,12 +464,11 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 			max /= fkb_base;
 			mean /= fkb_base;
 			dev /= fkb_base;
-			bw_str = (rs->unit_base == 1 ? "Mbit" : "MB");
+			bw_str = (rs->unit_base == 1 ? "Mibit" : "MiB");
 		}
 
-		log_buf(out, "    bw (%-4s/s): min=%5lu, max=%5lu, per=%3.2f%%,"
-			 " avg=%5.02f, stdev=%5.02f\n", bw_str, min, max,
-							p_of_agg, mean, dev);
+		log_buf(out, "   bw (%5s/s): min=%5lu, max=%5lu, per=%3.2f%%, avg=%5.02f, stdev=%5.02f\n",
+			bw_str, min, max, p_of_agg, mean, dev);
 	}
 }
 
@@ -659,7 +676,7 @@ static void show_block_infos(int nr_block_infos, uint32_t *block_infos,
 
 static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
 {
-	char *p1, *p2;
+	char *p1, *p1alt, *p2;
 	unsigned long long bw_mean, iops_mean;
 	const int i2p = is_power_of_2(ts->kb_base);
 
@@ -669,18 +686,20 @@ static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
 	bw_mean = steadystate_bw_mean(ts);
 	iops_mean = steadystate_iops_mean(ts);
 
-	p1 = num2str(bw_mean / ts->kb_base, 6, ts->kb_base, i2p, ts->unit_base);
-	p2 = num2str(iops_mean, 6, 1, 0, 0);
+	p1 = num2str(bw_mean / ts->kb_base, 4, ts->kb_base, i2p, ts->unit_base);
+	p1alt = num2str(bw_mean / ts->kb_base, 4, ts->kb_base, !i2p, ts->unit_base);
+	p2 = num2str(iops_mean, 4, 1, 0, N2S_NONE);
 
-	log_buf(out, "  steadystate  : attained=%s, bw=%s/s, iops=%s, %s%s=%.3f%s\n",
+	log_buf(out, "  steadystate  : attained=%s, bw=%s (%s), iops=%s, %s%s=%.3f%s\n",
 		ts->ss_state & __FIO_SS_ATTAINED ? "yes" : "no",
-		p1, p2,
+		p1, p1alt, p2,
 		ts->ss_state & __FIO_SS_IOPS ? "iops" : "bw",
 		ts->ss_state & __FIO_SS_SLOPE ? " slope": " mean dev",
 		ts->ss_criterion.u.f,
 		ts->ss_state & __FIO_SS_PCT ? "%" : "");
 
 	free(p1);
+	free(p1alt);
 	free(p2);
 }
 
@@ -761,9 +780,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 					io_u_dist[1], io_u_dist[2],
 					io_u_dist[3], io_u_dist[4],
 					io_u_dist[5], io_u_dist[6]);
-	log_buf(out, "     issued    : total=r=%llu/w=%llu/d=%llu,"
-				 " short=r=%llu/w=%llu/d=%llu,"
-				 " drop=r=%llu/w=%llu/d=%llu\n",
+	log_buf(out, "     issued rwt: total=%llu,%llu,%llu,"
+				 " short=%llu,%llu,%llu,"
+				 " dropped=%llu,%llu,%llu\n",
 					(unsigned long long) ts->total_io_u[0],
 					(unsigned long long) ts->total_io_u[1],
 					(unsigned long long) ts->total_io_u[2],
@@ -812,7 +831,7 @@ static void show_ddir_status_terse(struct thread_stat *ts,
 	if (ts->runtime[ddir]) {
 		uint64_t runt = ts->runtime[ddir];
 
-		bw = ((1000 * ts->io_bytes[ddir]) / runt) / 1024;
+		bw = ((1000 * ts->io_bytes[ddir]) / runt) / 1024; /* KiB/s */
 		iops = (1000 * (uint64_t) ts->total_io_u[ddir]) / runt;
 	}
 
@@ -896,7 +915,7 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	if (ts->runtime[ddir]) {
 		uint64_t runt = ts->runtime[ddir];
 
-		bw = ((1000 * ts->io_bytes[ddir]) / runt) / 1024;
+		bw = ((1000 * ts->io_bytes[ddir]) / runt) / 1024; /* KiB/s */
 		iops = (1000.0 * (uint64_t) ts->total_io_u[ddir]) / runt;
 	}
 
@@ -1418,7 +1437,7 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 		if (dst->min_bw[i] && dst->min_bw[i] > src->min_bw[i])
 			dst->min_bw[i] = src->min_bw[i];
 
-		dst->io_kb[i] += src->io_kb[i];
+		dst->iobytes[i] += src->iobytes[i];
 		dst->agg[i] += src->agg[i];
 	}
 
@@ -1696,19 +1715,14 @@ void __show_run_stats(void)
 				rs->max_run[j] = ts->runtime[j];
 
 			bw = 0;
-			if (ts->runtime[j]) {
-				unsigned long runt = ts->runtime[j];
-				unsigned long long kb;
-
-				kb = ts->io_bytes[j] / rs->kb_base;
-				bw = kb * 1000 / runt;
-			}
+			if (ts->runtime[j])
+				bw = ts->io_bytes[j] * 1000 / ts->runtime[j];
 			if (bw < rs->min_bw[j])
 				rs->min_bw[j] = bw;
 			if (bw > rs->max_bw[j])
 				rs->max_bw[j] = bw;
 
-			rs->io_kb[j] += ts->io_bytes[j] / rs->kb_base;
+			rs->iobytes[j] += ts->io_bytes[j];
 		}
 	}
 
@@ -1719,7 +1733,7 @@ void __show_run_stats(void)
 
 		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
 			if (rs->max_run[ddir])
-				rs->agg[ddir] = (rs->io_kb[ddir] * 1000) /
+				rs->agg[ddir] = (rs->iobytes[ddir] * 1000) /
 						rs->max_run[ddir];
 		}
 	}
@@ -2436,7 +2450,7 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 			continue; /* No entries for interval */
 
 		if (spent)
-			rate = delta * 1000 / spent / 1024;
+			rate = delta * 1000 / spent / 1024; /* KiB/s */
 		else
 			rate = 0;
 
diff --git a/stat.h b/stat.h
index 22083da..aa4ad80 100644
--- a/stat.h
+++ b/stat.h
@@ -7,7 +7,7 @@
 struct group_run_stats {
 	uint64_t max_run[DDIR_RWDIR_CNT], min_run[DDIR_RWDIR_CNT];
 	uint64_t max_bw[DDIR_RWDIR_CNT], min_bw[DDIR_RWDIR_CNT];
-	uint64_t io_kb[DDIR_RWDIR_CNT];
+	uint64_t iobytes[DDIR_RWDIR_CNT];
 	uint64_t agg[DDIR_RWDIR_CNT];
 	uint32_t kb_base;
 	uint32_t unit_base;
diff --git a/t/btrace2fio.c b/t/btrace2fio.c
index c589cea..4cdb38d 100644
--- a/t/btrace2fio.c
+++ b/t/btrace2fio.c
@@ -62,7 +62,7 @@ struct btrace_out {
 
 	uint64_t first_ttime[DDIR_RWDIR_CNT];
 	uint64_t last_ttime[DDIR_RWDIR_CNT];
-	uint64_t kb[DDIR_RWDIR_CNT];
+	uint64_t kib[DDIR_RWDIR_CNT];
 
 	uint64_t start_delay;
 };
@@ -406,7 +406,7 @@ static int handle_trace(struct blk_io_trace *t, struct btrace_pid *p)
 
 		i = inflight_find(t->sector + (t->bytes >> 9));
 		if (i) {
-			i->p->o.kb[t_to_rwdir(t)] += (t->bytes >> 10);
+			i->p->o.kib[t_to_rwdir(t)] += (t->bytes >> 10);
 			i->p->o.complete_seen = 1;
 			inflight_remove(i);
 		}
@@ -556,7 +556,7 @@ static int bs_cmp(const void *ba, const void *bb)
 	return bsb->nr - bsa->nr;
 }
 
-static unsigned long o_to_kb_rate(struct btrace_out *o, int rw)
+static unsigned long o_to_kib_rate(struct btrace_out *o, int rw)
 {
 	uint64_t usec = (o->last_ttime[rw] - o->first_ttime[rw]) / 1000ULL;
 	uint64_t val;
@@ -568,7 +568,7 @@ static unsigned long o_to_kb_rate(struct btrace_out *o, int rw)
 	if (!usec)
 		return 0;
 
-	val = o->kb[rw] * 1000ULL;
+	val = o->kib[rw] * 1000ULL;
 	return val / usec;
 }
 
@@ -623,7 +623,7 @@ static void __output_p_ascii(struct btrace_pid *p, unsigned long *ios)
 		printf("\tmerges: %lu (perc=%3.2f%%)\n", o->merges[i], perc);
 		perc = ((float) o->seq[i] * 100.0) / (float) o->ios[i];
 		printf("\tseq:    %lu (perc=%3.2f%%)\n", (unsigned long) o->seq[i], perc);
-		printf("\trate:   %lu KB/sec\n", o_to_kb_rate(o, i));
+		printf("\trate:   %lu KiB/sec\n", o_to_kib_rate(o, i));
 
 		for (j = 0; j < o->nr_bs[i]; j++) {
 			struct bs *bs = &o->bs[i][j];
@@ -746,7 +746,7 @@ static int __output_p_fio(struct btrace_pid *p, unsigned long *ios)
 		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 			unsigned long rate;
 
-			rate = o_to_kb_rate(o, i);
+			rate = o_to_kib_rate(o, i);
 			if (i)
 				printf(",");
 			if (rate)
@@ -810,7 +810,7 @@ static int prune_entry(struct btrace_out *o)
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		unsigned long this_rate;
 
-		this_rate = o_to_kb_rate(o, i);
+		this_rate = o_to_kib_rate(o, i);
 		if (this_rate < rate_threshold) {
 			remove_ddir(o, i);
 			this_rate = 0;
@@ -926,7 +926,7 @@ static int merge_entries(struct btrace_pid *pida, struct btrace_pid *pidb)
 		oa->ios[i] += ob->ios[i];
 		oa->merges[i] += ob->merges[i];
 		oa->seq[i] += ob->seq[i];
-		oa->kb[i] += ob->kb[i];
+		oa->kib[i] += ob->kib[i];
 		oa->first_ttime[i] = min(oa->first_ttime[i], ob->first_ttime[i]);
 		oa->last_ttime[i] = max(oa->last_ttime[i], ob->last_ttime[i]);
 		merge_bs(&oa->bs[i], &oa->nr_bs[i], ob->bs[i], ob->nr_bs[i]);
@@ -1021,7 +1021,7 @@ static int usage(char *argv[])
 	log_err("\t-n\tNumber IOS threshold to ignore task\n");
 	log_err("\t-f\tFio job file output\n");
 	log_err("\t-d\tUse this file/device for replay\n");
-	log_err("\t-r\tIgnore jobs with less than this KB/sec rate\n");
+	log_err("\t-r\tIgnore jobs with less than this KiB/sec rate\n");
 	log_err("\t-R\tSet rate in fio job (def=%u)\n", set_rate);
 	log_err("\t-D\tCap queue depth at this value (def=%u)\n", max_depth);
 	log_err("\t-c\tCollapse \"identical\" jobs (def=%u)\n", collapse_entries);
diff --git a/t/dedupe.c b/t/dedupe.c
index 7856da1..c0e9a69 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -363,7 +363,7 @@ static void show_progress(struct worker_thread *threads, unsigned long total)
 		tdiff = mtime_since_now(&last_tv);
 		if (tdiff) {
 			this_items = (this_items * 1000) / (tdiff * 1024);
-			printf("%3.2f%% done (%luKB/sec)\r", perc, this_items);
+			printf("%3.2f%% done (%luKiB/sec)\r", perc, this_items);
 			last_nitems = nitems;
 			fio_gettime(&last_tv, NULL);
 		} else
diff --git a/t/genzipf.c b/t/genzipf.c
index d8253c3..9faec38 100644
--- a/t/genzipf.c
+++ b/t/genzipf.c
@@ -3,8 +3,8 @@
  * what an access pattern would look like.
  *
  * For instance, the following would generate a zipf distribution
- * with theta 1.2, using 262144 (1 GB / 4096) values and split the reporting into
- * 20 buckets:
+ * with theta 1.2, using 262144 (1 GiB / 4096) values and split the
+ * reporting into 20 buckets:
  *
  *	./t/fio-genzipf -t zipf -i 1.2 -g 1 -b 4096 -o 20
  *
@@ -49,7 +49,7 @@ enum {
 };
 
 static int dist_type = TYPE_ZIPF;
-static unsigned long gb_size = 500;
+static unsigned long gib_size = 500;
 static unsigned long block_size = 4096;
 static unsigned long output_nranges = DEF_NR_OUTPUT;
 static double percentage;
@@ -131,7 +131,7 @@ static int parse_options(int argc, char *argv[])
 			}
 			break;
 		case 'g':
-			gb_size = strtoul(optarg, NULL, 10);
+			gib_size = strtoul(optarg, NULL, 10);
 			break;
 		case 'i':
 			dist_val = atof(optarg);
@@ -291,9 +291,10 @@ int main(int argc, char *argv[])
 		return 1;
 
 	if (output_type != OUTPUT_CSV)
-		printf("Generating %s distribution with %f input and %lu GB size and %lu block_size.\n", dist_types[dist_type], dist_val, gb_size, block_size);
+		printf("Generating %s distribution with %f input and %lu GiB size and %lu block_size.\n",
+		       dist_types[dist_type], dist_val, gib_size, block_size);
 
-	nranges = gb_size * 1024 * 1024 * 1024ULL;
+	nranges = gib_size * 1024 * 1024 * 1024ULL;
 	nranges /= block_size;
 
 	if (dist_type == TYPE_ZIPF)
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index bad5097..7016f26 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -80,7 +80,7 @@ int main(int argc, char *argv[])
 		v_size = numbers * sizeof(uint8_t);
 		v = malloc(v_size);
 		memset(v, 0, v_size);
-		printf("\nVerification table is %lf KBs\n", (double)(v_size) / 1024);
+		printf("\nVerification table is %lf KiB\n", (double)(v_size) / 1024);
 	}
 	v_start = v;
 
diff --git a/t/memlock.c b/t/memlock.c
index d9d586d..3d3579a 100644
--- a/t/memlock.c
+++ b/t/memlock.c
@@ -4,7 +4,7 @@
 #include <pthread.h>
 
 static struct thread_data {
-	unsigned long mb;
+	unsigned long mib;
 } td;
 
 static void *worker(void *data)
@@ -15,14 +15,14 @@ static void *worker(void *data)
 	char *buf;
 	int i, first = 1;
 
-	size = td->mb * 1024UL * 1024UL;
+	size = td->mib * 1024UL * 1024UL;
 	buf = malloc(size);
 
 	for (i = 0; i < 100000; i++) {
 		for (index = 0; index + 4096 < size; index += 4096)
 			memset(&buf[index+512], 0x89, 512);
 		if (first) {
-			printf("loop%d: did %lu MB\n", i+1, size/(1024UL*1024UL));
+			printf("loop%d: did %lu MiB\n", i+1, size/(1024UL*1024UL));
 			first = 0;
 		}
 	}
@@ -31,20 +31,20 @@ static void *worker(void *data)
 
 int main(int argc, char *argv[])
 {
-	unsigned long mb, threads;
+	unsigned long mib, threads;
 	pthread_t *pthreads;
 	int i;
 
 	if (argc < 3) {
-		printf("%s: <mb per thread> <threads>\n", argv[0]);
+		printf("%s: <MiB per thread> <threads>\n", argv[0]);
 		return 1;
 	}
 
-	mb = strtoul(argv[1], NULL, 10);
+	mib = strtoul(argv[1], NULL, 10);
 	threads = strtoul(argv[2], NULL, 10);
 
 	pthreads = calloc(threads, sizeof(pthread_t));
-	td.mb = mb;
+	td.mib = mib;
 
 	for (i = 0; i < threads; i++)
 		pthread_create(&pthreads[i], NULL, worker, &td);
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index e8bdc85..ebdd8f1 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -661,9 +661,9 @@ int main(int argc, char *argv[])
 
 	bytes /= 1024;
 	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &re);
-	fprintf(stderr, "Read rate (KB/sec) : %lu\n", rate);
+	fprintf(stderr, "Read rate (KiB/sec) : %lu\n", rate);
 	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &we);
-	fprintf(stderr, "Write rate (KB/sec): %lu\n", rate);
+	fprintf(stderr, "Write rate (KiB/sec): %lu\n", rate);
 
 	close(fd);
 	return 0;
diff --git a/t/stest.c b/t/stest.c
index 0e0d8b0..04df60d 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -59,15 +59,6 @@ static int do_rand_allocs(void)
 	return 0;
 }
 
-static int do_specific_alloc(unsigned long size)
-{
-	void *ptr;
-
-	ptr = smalloc(size);
-	sfree(ptr);
-	return 0;
-}
-
 int main(int argc, char *argv[])
 {
 	arch_init(argv);
@@ -76,9 +67,6 @@ int main(int argc, char *argv[])
 
 	do_rand_allocs();
 
-	/* smalloc bug, commit 271067a6 */
-	do_specific_alloc(671386584);
-
 	scleanup();
 	return 0;
 }
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
index a8e4e39..91c79a4 100755
--- a/unit_tests/steadystate_tests.py
+++ b/unit_tests/steadystate_tests.py
@@ -115,7 +115,7 @@ if __name__ == '__main__':
     if args.read == None:
         if os.name == 'posix':
             args.read = '/dev/zero'
-            extra = [ "--size=128M" ]
+            extra = [ "--size=134217728" ]  # 128 MiB
         else:
             print "ERROR: file for read testing must be specified on non-posix systems"
             sys.exit(1)

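The reworked show_group_stats() and show_ddir_status() output prints each rate twice, once in the primary unit base and once in the alternate one (for example `bw=%s (%s)`). __show_run_stats() now also keeps min_bw, max_bw and agg in bytes per second instead of kb_base units. The C sketch below illustrates that arithmetic under stated assumptions: fmt_bytes() and bw_bytes_per_sec() are hypothetical helpers written for this note, not fio's num2str(), and the one-decimal formatting is an arbitrary choice.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not fio's num2str()): scale a byte count by
 * 1024 (IEC: KiB, MiB, ...) or 1000 (SI: kB, MB, ...) and print it
 * with one decimal place.
 */
static void fmt_bytes(char *buf, size_t len, uint64_t bytes, int iec)
{
	static const char *iec_units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
	static const char *si_units[]  = { "B", "kB",  "MB",  "GB",  "TB"  };
	const double base = iec ? 1024.0 : 1000.0;
	double val = (double) bytes;
	int i = 0;

	/* Divide until the value fits the unit, capping at TiB/TB. */
	while (val >= base && i < 4) {
		val /= base;
		i++;
	}
	snprintf(buf, len, "%.1f%s", val, iec ? iec_units[i] : si_units[i]);
}

/* Bandwidth in bytes/sec from bytes moved and runtime in msec. */
static uint64_t bw_bytes_per_sec(uint64_t io_bytes, uint64_t runtime_ms)
{
	return runtime_ms ? io_bytes * 1000 / runtime_ms : 0;
}
```

For 134217728 bytes (128 MiB) moved in 1000 msec, the sketch yields 128.0MiB with the IEC base and 134.2MB with the SI base, the kind of pair the new output shows side by side.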

* Recent changes (master)
@ 2016-12-30 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-12-30 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 35448275f6577483f2a5f98db27f28bd3257ddb5:

  rbd: style fixups (2016-12-23 19:54:47 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 915ca9807717762e288ded3eba0fe5fc82a2ddcd:

  options: mark steadystate option parents (2016-12-29 09:07:57 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      options: mark steadystate option parents

 options.c | 2 ++
 1 file changed, 2 insertions(+)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index b81db23..d8b4012 100644
--- a/options.c
+++ b/options.c
@@ -4321,6 +4321,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name   = "steadystate_duration",
 		.lname  = "Steady state duration",
 		.alias  = "ss_dur",
+		.parent	= "steadystate",
 		.type   = FIO_OPT_STR_VAL_TIME,
 		.off1   = offsetof(struct thread_options, ss_dur),
 		.help   = "Stop workload upon attaining steady state for specified duration",
@@ -4334,6 +4335,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name   = "steadystate_ramp_time",
 		.lname  = "Steady state ramp time",
 		.alias  = "ss_ramp",
+		.parent	= "steadystate",
 		.type   = FIO_OPT_STR_VAL_TIME,
 		.off1   = offsetof(struct thread_options, ss_ramp_time),
 		.help   = "Delay before initiation of data collection for steady state job termination testing",

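The .parent field ties steadystate_duration and steadystate_ramp_time to the steadystate option, so code that walks the option table can tell which options belong under which. Below is a minimal sketch of such a walk. It assumes a trimmed-down struct opt that has only name and parent fields and a hypothetical opt_depth() helper. fio's own struct fio_option and the code that consumes .parent are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down option descriptor: only name and parent. */
struct opt {
	const char *name;
	const char *parent;	/* NULL for a top-level option */
};

/* Option names taken from the patch above. */
static const struct opt opts[] = {
	{ "steadystate", NULL },
	{ "steadystate_duration", "steadystate" },
	{ "steadystate_ramp_time", "steadystate" },
};

static const struct opt *find_opt(const char *name)
{
	for (size_t i = 0; i < sizeof(opts) / sizeof(opts[0]); i++)
		if (!strcmp(opts[i].name, name))
			return &opts[i];
	return NULL;
}

/* Nesting depth: 0 for a top-level or unknown option, 1 for a direct child. */
static int opt_depth(const char *name)
{
	const struct opt *o = find_opt(name);
	int depth = 0;

	while (o && o->parent) {
		depth++;
		o = find_opt(o->parent);
	}
	return depth;
}
```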

* Recent changes (master)
@ 2016-12-24 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-12-24 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 3ee2a3c1ae52841ecb4926ab5748e6d856fd4b2c:

  fio: Add support for auto detect dev-dax and libpmemblk engines (2016-12-20 16:43:36 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 35448275f6577483f2a5f98db27f28bd3257ddb5:

  rbd: style fixups (2016-12-23 19:54:47 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      rbd: style fixups

Pan Liu (1):
      rbd: remove duplicate _fio_rbd_connect

 engines/rbd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index ee2ce81..62f0b2e 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -36,6 +36,7 @@ struct rbd_data {
 	struct io_u **aio_events;
 	struct io_u **sort_events;
 	int fd; /* add for poll */
+	bool connected;
 };
 
 struct rbd_options {
@@ -111,6 +112,8 @@ static int _fio_setup_rbd_data(struct thread_data *td,
 	if (!rbd)
 		goto failed;
 
+	rbd->connected = false;
+
 	/* add for poll, init fd: -1 */
 	rbd->fd = -1;
 
@@ -529,6 +532,10 @@ failed:
 static int fio_rbd_init(struct thread_data *td)
 {
 	int r;
+	struct rbd_data *rbd = td->io_ops_data;
+
+	if (rbd->connected)
+		return 0;
 
 	r = _fio_rbd_connect(td);
 	if (r) {
@@ -589,6 +596,7 @@ static int fio_rbd_setup(struct thread_data *td)
 		log_err("fio_rbd_connect failed.\n");
 		goto cleanup;
 	}
+	rbd->connected = true;
 
 	/* get size of the RADOS block device */
 	r = rbd_stat(rbd->image, &info, sizeof(info));
@@ -618,7 +626,6 @@ static int fio_rbd_setup(struct thread_data *td)
 	/* disconnect, then we were only connected to determine
 	 * the size of the RBD.
 	 */
-	_fio_rbd_disconnect(rbd);
 	return 0;
 
 disconnect:

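With this change, fio_rbd_setup() keeps the connection it opened to query the image size and records that in rbd->connected. fio_rbd_init() then returns early instead of calling _fio_rbd_connect() a second time. A minimal C sketch of that connect-once guard follows. struct conn_state, do_connect(), sketch_setup() and sketch_init() are hypothetical stand-ins for struct rbd_data and the engine hooks, and the connection itself is simulated by a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct rbd_data: only the flag matters here. */
struct conn_state {
	bool connected;
	int connect_calls;	/* counts simulated connection attempts */
};

/* Stand-in for _fio_rbd_connect(); returns 0 on success. */
static int do_connect(struct conn_state *s)
{
	s->connect_calls++;
	return 0;
}

/* Setup hook: connect (e.g. to query the image size) and keep the connection. */
static int sketch_setup(struct conn_state *s)
{
	if (do_connect(s))
		return 1;
	s->connected = true;
	return 0;
}

/* Init hook: reuse the connection made during setup instead of reconnecting. */
static int sketch_init(struct conn_state *s)
{
	if (s->connected)
		return 0;
	if (do_connect(s))
		return 1;
	s->connected = true;
	return 0;
}
```

Unlike the patch, the sketch also sets the flag when sketch_init() has to connect on its own, so a repeated call stays a no-op.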

* Recent changes (master)
@ 2016-12-21 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-12-21 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4fc8c5adec635f3a0d7e6c666328e96f14a9f015:

  Fio 2.16 (2016-12-19 23:12:56 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 3ee2a3c1ae52841ecb4926ab5748e6d856fd4b2c:

  fio: Add support for auto detect dev-dax and libpmemblk engines (2016-12-20 16:43:36 -0700)

----------------------------------------------------------------
Dave Jiang (1):
      fio: Add support for auto detect dev-dax and libpmemblk engines

Jens Axboe (1):
      Fix compile on OSX

 configure | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
 os/os.h   |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 03bed3b..fc15782 100755
--- a/configure
+++ b/configure
@@ -138,6 +138,7 @@ libhdfs="no"
 pmemblk="no"
 devdax="no"
 disable_lex=""
+disable_pmem="no"
 prefix=/usr/local
 
 # parse options
@@ -173,10 +174,6 @@ for opt do
   ;;
   --enable-libhdfs) libhdfs="yes"
   ;;
-  --enable-pmemblk) pmemblk="yes"
-  ;;
-  --enable-devdax) devdax="yes"
-  ;;
   --disable-lex) disable_lex="yes"
   ;;
   --enable-lex) disable_lex="no"
@@ -185,6 +182,8 @@ for opt do
   ;;
   --disable-optimizations) disable_opt="yes"
   ;;
+  --disable-pmem) disable_pmem="yes"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -207,9 +206,8 @@ if test "$show_help" = "yes" ; then
   echo "--disable-numa         Disable libnuma even if found"
   echo "--disable-gfapi        Disable gfapi"
   echo "--enable-libhdfs       Enable hdfs support"
-  echo "--enable-pmemblk       Enable NVML libpmemblk support"
-  echo "--enable-devdax        Enable NVM Device Dax support"
   echo "--disable-lex          Disable use of lex/yacc for math"
+  echo "--disable-pmem         Disable pmem based engines even if found"
   echo "--enable-lex           Enable use of lex/yacc for math"
   echo "--disable-shm          Disable SHM support"
   echo "--disable-optimizations Don't enable compiler optimizations"
@@ -1567,12 +1565,53 @@ fi
 echo "MTD                           $mtd"
 
 ##########################################
+# Check whether we have libpmem
+libpmem="no"
+cat > $TMPC << EOF
+#include <libpmem.h>
+int main(int argc, char **argv)
+{
+  int rc;
+  rc = pmem_is_pmem(0, 0);
+  return 0;
+}
+EOF
+if compile_prog "" "-lpmem" "libpmem"; then
+  libpmem="yes"
+fi
+echo "libpmem                       $libpmem"
+
+##########################################
+# Check whether we have libpmemblk
+libpmemblk="no"
+cat > $TMPC << EOF
+#include <libpmemblk.h>
+int main(int argc, char **argv)
+{
+  int rc;
+  rc = pmemblk_open("", 0);
+  return 0;
+}
+EOF
+if compile_prog "" "-lpmemblk -lpmem" "libpmemblk"; then
+  libpmemblk="yes"
+fi
+echo "libpmemblk                    $libpmemblk"
+
+if test "$libpmem" = "yes" && test "$disable_pmem" = "no"; then
+  devdax="yes"
+  if test "$libpmemblk" = "yes"; then
+    pmemblk="yes"
+  fi
+fi
+
+##########################################
 # Report whether pmemblk engine is enabled
 echo "NVML libpmemblk engine        $pmemblk"
 
 ##########################################
 # Report whether dev-dax engine is enabled
-echo "NVM Device Dax engine        $devdax"
+echo "NVML Device Dax engine        $devdax"
 
 # Check if we have lex/yacc available
 yacc="no"
diff --git a/os/os.h b/os/os.h
index 16bca68..4178e6f 100644
--- a/os/os.h
+++ b/os/os.h
@@ -81,6 +81,7 @@ typedef struct aiocb os_aiocb_t;
 #define POSIX_FADV_DONTNEED	(0)
 #define POSIX_FADV_SEQUENTIAL	(0)
 #define POSIX_FADV_RANDOM	(0)
+#define POSIX_FADV_NORMAL	(0)
 #endif
 
 #ifndef FIO_HAVE_CPU_AFFINITY


* Recent changes (master)
@ 2016-12-20 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-12-20 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 104ee4dea7246d51d053076c55b917548dd0e7e2:

  fio: add additional support for dev-dax ioengine (2016-12-16 16:05:05 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4fc8c5adec635f3a0d7e6c666328e96f14a9f015:

  Fio 2.16 (2016-12-19 23:12:56 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      options: full control of fadvise hinting
      Fio 2.16

 FIO-VERSION-GEN        |  2 +-
 HOWTO                  | 13 +++++++++++--
 fio.1                  | 22 ++++++++++++++++++++--
 fio.h                  |  7 +++++++
 ioengines.c            | 16 +++++++++++++---
 options.c              | 20 +++++++++++++++++++-
 os/windows/install.wxs |  2 +-
 7 files changed, 72 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index eac0e00..b324859 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.15
+DEF_VER=fio-2.16
 
 LF='
 '
diff --git a/HOWTO b/HOWTO
index 6893c86..7274c0e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -476,8 +476,17 @@ fadvise_hint=bool By default, fio will use fadvise() to advise the kernel
 		on what IO patterns it is likely to issue. Sometimes you
 		want to test specific IO patterns without telling the
 		kernel about it, in which case you can disable this option.
-		If set, fio will use POSIX_FADV_SEQUENTIAL for sequential
-		IO and POSIX_FADV_RANDOM for random IO.
+		The following options are supported:
+
+			sequential	Use FADV_SEQUENTIAL
+			random		Use FADV_RANDOM
+			1		Backwards-compatible hint for basing
+					the hint on the fio workload. Will use
+					FADV_SEQUENTIAL for a sequential
+					workload, and FADV_RANDOM for a random
+					workload.
+			0		Backwards-compatible setting for not
+					issuing a fadvise hint.
 
 fadvise_stream=int Notify the kernel what write stream ID to place these
 		writes under. Only supported on Linux. Note, this option
diff --git a/fio.1 b/fio.1
index e8a327c..6161760 100644
--- a/fio.1
+++ b/fio.1
@@ -396,9 +396,27 @@ available on Linux. If using ZFS on Solaris this must be set to 'none'
 because ZFS doesn't support it. Default: 'posix'.
 .RE
 .TP
-.BI fadvise_hint \fR=\fPbool
+.BI fadvise_hint \fR=\fPstr
 Use \fBposix_fadvise\fR\|(2) to advise the kernel what I/O patterns
-are likely to be issued. Default: true.
+are likely to be issued. Accepted values are:
+.RS
+.RS
+.TP
+.B 0
+Backwards compatible hint for "no hint".
+.TP
+.B 1
+Backwards compatible hint for "advise with fio workload type". This
+uses \fBFADV_RANDOM\fR for a random workload, and \fBFADV_SEQUENTIAL\fR
+for a sequential workload.
+.TP
+.B sequential
+Advise using \fBFADV_SEQUENTIAL\fR
+.TP
+.B random
+Advise using \fBFADV_RANDOM\fR
+.RE
+.RE
 .TP
 .BI fadvise_stream \fR=\fPint
 Use \fBposix_fadvise\fR\|(2) to advise the kernel what stream ID the
diff --git a/fio.h b/fio.h
index 5726bef..df17074 100644
--- a/fio.h
+++ b/fio.h
@@ -110,6 +110,13 @@ enum {
 	RATE_PROCESS_POISSON = 1,
 };
 
+enum {
+	F_ADV_NONE = 0,
+	F_ADV_TYPE,
+	F_ADV_RANDOM,
+	F_ADV_SEQUENTIAL,
+};
+
 /*
  * Per-thread/process specific data. Only used for the network client
  * for now.
diff --git a/ioengines.c b/ioengines.c
index 1b58168..a1f8756 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -448,14 +448,24 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 	if (td->o.invalidate_cache && file_invalidate_cache(td, f))
 		goto err;
 
-	if (td->o.fadvise_hint &&
+	if (td->o.fadvise_hint != F_ADV_NONE &&
 	    (f->filetype == FIO_TYPE_BD || f->filetype == FIO_TYPE_FILE)) {
 		int flags;
 
-		if (td_random(td))
+		if (td->o.fadvise_hint == F_ADV_TYPE) {
+			if (td_random(td))
+				flags = POSIX_FADV_RANDOM;
+			else
+				flags = POSIX_FADV_SEQUENTIAL;
+		} else if (td->o.fadvise_hint == F_ADV_RANDOM)
 			flags = POSIX_FADV_RANDOM;
-		else
+		else if (td->o.fadvise_hint == F_ADV_SEQUENTIAL)
 			flags = POSIX_FADV_SEQUENTIAL;
+		else {
+			log_err("fio: unknown fadvise type %d\n",
+							td->o.fadvise_hint);
+			flags = POSIX_FADV_NORMAL;
+		}
 
 		if (posix_fadvise(f->fd, f->file_offset, f->io_size, flags) < 0) {
 			td_verror(td, errno, "fadvise");
diff --git a/options.c b/options.c
index 7638afc..b81db23 100644
--- a/options.c
+++ b/options.c
@@ -2295,8 +2295,26 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
-		.type	= FIO_OPT_BOOL,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, fadvise_hint),
+		.posval	= {
+			  { .ival = "0",
+			    .oval = F_ADV_NONE,
+			    .help = "Don't issue fadvise",
+			  },
+			  { .ival = "1",
+			    .oval = F_ADV_TYPE,
+			    .help = "Advise using fio IO pattern",
+			  },
+			  { .ival = "random",
+			    .oval = F_ADV_RANDOM,
+			    .help = "Advise using FADV_RANDOM",
+			  },
+			  { .ival = "sequential",
+			    .oval = F_ADV_SEQUENTIAL,
+			    .help = "Advise using FADV_SEQUENTIAL",
+			  },
+		},
 		.help	= "Use fadvise() to advise the kernel on IO pattern",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 25cb269..b660fc6 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.15">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.16">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
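The fadvise_hint change turns a boolean into a four-way string option. The sketch below shows the resulting mapping as a standalone function. It makes some assumptions: the name fadvise_hint_to_advice() is hypothetical, the is_random parameter stands in for td_random(td), and the F_ADV_* values are copied from the fio.h hunk. Returning -1 for "no call" is this sketch's own convention.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>	/* POSIX_FADV_* advice constants */
#include <stdbool.h>

/* Same values as the F_ADV_* enum added to fio.h in this patch. */
enum {
	F_ADV_NONE = 0,
	F_ADV_TYPE,
	F_ADV_RANDOM,
	F_ADV_SEQUENTIAL,
};

/*
 * Hypothetical helper: translate a fadvise_hint setting into the advice
 * that would be passed to posix_fadvise(). Returns -1 for F_ADV_NONE,
 * meaning posix_fadvise() should not be called at all.
 */
static int fadvise_hint_to_advice(int hint, bool is_random)
{
	switch (hint) {
	case F_ADV_NONE:
		return -1;
	case F_ADV_TYPE:
		/* "1": derive the advice from the job's I/O pattern. */
		return is_random ? POSIX_FADV_RANDOM : POSIX_FADV_SEQUENTIAL;
	case F_ADV_RANDOM:
		return POSIX_FADV_RANDOM;
	case F_ADV_SEQUENTIAL:
		return POSIX_FADV_SEQUENTIAL;
	default:
		/* Unknown value: fall back to the kernel's default behavior. */
		return POSIX_FADV_NORMAL;
	}
}
```

With this mapping, fadvise_hint=random selects POSIX_FADV_RANDOM regardless of the job's rw= pattern. fadvise_hint=1 keeps the previous pattern-based behavior.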


* Recent changes (master)
@ 2016-12-17 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-12-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 0d60927f167d318a685b9e5309bb392c624776e4:

  fio: move _FORTIFY_SOURCE to only when optimization is turned on (2016-12-15 14:55:00 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 104ee4dea7246d51d053076c55b917548dd0e7e2:

  fio: add additional support for dev-dax ioengine (2016-12-16 16:05:05 -0700)

----------------------------------------------------------------
Dave Jiang (1):
      fio: add additional support for dev-dax ioengine

Jens Axboe (11):
      Add local TODO for steady-state work
      steady-state: convert options across the wire
      TODO: man page missing
      steadystate: move into its own header
      steadystate: cleanups
      steadystate: more cleanups
      steadystate: rename 'steadystate' to 'steadystate_enabled'
      steadystate: eliminate some steadystate_data members
      steadystate: kill off ->attained
      steadystate: kill ->last_in_group and ->ramp_time_over
      Merge https://bitbucket.org/vincentfu/fio-steadystate into steady-state-2

Vincent Fu (30):
      Allow fio to terminate jobs when steady state is attained
      Fix typo, restore unintended HOWTO deletion
      Add locking around reads of td->io_blocks and io_byes.
      Change steadystate reporting:
      Steady state detection: enhance reporting of results, change memory allocation point
      JSON output, code formatting changes
      Fix bug where measurements were not printed in the correct order when steady state was not attained
      Split helper thread debug logging away from steadystate debug logging
      Update test script for new JSON steadystate['criterion'] reporting
      Clear up white space errors
      steadystate: bug fixes
      steadystate: reject job if steadystate options are not consistent within reporting group
      steadystate: add example job file
      steadystate: update man page
      steadystate: rename __FIO_SS_LAST to __FIO_SS_DATA
      steadystate: get rid of ->ss_pct and encode this information in ->state via __FIO_SS_PCT
      steadystate: add line for output-format=normal
      steadystate: implement transmission of steadystate data over the wire in client/server mode
      steadystate: instead of including ss_sum_y in thread_stat record whether ss_sum_y is nonzero in ss_state via __FIO_SS_BUFFER_FULL
      steadystate: rename TODO to STEADYSTATE-TODO
      steadystate: ensure that pointers in thread_stat occupy the same amount of storage in 32- and 64-bit platforms
      steadystate: fix alignment in stat.h
      steadystate: use uint64_t for storing bw and iops calculations and related values. Also fix an overflow with group_bw on 32-bit platforms.
      steadystate: clean up checks for when steadystate termination is not engaged
      steadystate: rename options->ss to options->ss_state since ss is used elsewhere to refer to struct steadystate_data
      steadystate: improve output of test script
      steadystate: make file for read testing optional on posix systems
      steadystate: make test script work better under Windows
      Merge git://git.kernel.dk/fio into steady-state
      use type double in creating floating point JSON values

 HOWTO                           |  47 ++++-
 Makefile                        |   3 +-
 STEADYSTATE-TODO                |  14 ++
 backend.c                       |   7 +
 cconv.c                         |   8 +
 client.c                        |  25 +++
 debug.h                         |   2 +
 examples/dev-dax.fio            |  45 +++++
 examples/steadystate.fio        |  45 +++++
 fio.1                           |  47 +++++
 fio.h                           |   3 +
 helper_thread.c                 |  27 ++-
 init.c                          |  14 ++
 json.c                          |   2 +-
 libfio.c                        |   1 +
 options.c                       | 134 +++++++++++++++
 server.c                        |  32 +++-
 server.h                        |   2 +-
 stat.c                          |  94 ++++++++++
 stat.h                          |  24 ++-
 steadystate.c                   | 368 ++++++++++++++++++++++++++++++++++++++++
 steadystate.h                   |  61 +++++++
 thread_options.h                |   9 +
 unit_tests/steadystate_tests.py | 222 ++++++++++++++++++++++++
 24 files changed, 1225 insertions(+), 11 deletions(-)
 create mode 100644 STEADYSTATE-TODO
 create mode 100644 examples/dev-dax.fio
 create mode 100644 examples/steadystate.fio
 create mode 100644 steadystate.c
 create mode 100644 steadystate.h
 create mode 100755 unit_tests/steadystate_tests.py

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 577eed9..6893c86 100644
--- a/HOWTO
+++ b/HOWTO
@@ -297,7 +297,7 @@ irange	Integer range with suffix. Allows value range to be given, such
 	1k:4k. If the option allows two sets of ranges, they can be
 	specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
 	int.
-float_list	A list of floating numbers, separated by a ':' character.
+float_list	A list of floating point numbers, separated by a ':' character.
 
 With the above in mind, here follows the complete list of fio job
 parameters.
@@ -819,6 +819,9 @@ ioengine=str	Defines how the job issues io to the file. The following
 			pmemblk	Read and write through the NVML libpmemblk
 				interface.
 
+			dev-dax Read and write through a DAX device exposed
+				from persistent memory.
+
 			external Prefix to specify loading an external
 				IO engine object file. Append the engine
 				filename, eg ioengine=external:/tmp/foo.o
@@ -1218,6 +1221,48 @@ ramp_time=time	If set, fio will run the specified workload for this amount
 		thus it will increase the total runtime if a special timeout
 		or runtime is specified.
 
+steadystate=str:float
+ss=str:float	Define the criterion and limit for assessing steady state
+		performance. The first parameter designates the criterion
+		whereas the second parameter sets the threshold. When the
+		criterion falls below the threshold for the specified duration,
+		the job will stop. For example, iops_slope:0.1% will direct fio
+		to terminate the job when the least squares regression slope
+		falls below 0.1% of the mean IOPS. If group_reporting is
+		enabled this will apply to all jobs in the group. Below is the
+		list of available steady state assessment criteria. All
+		assessments are carried out using only data from the rolling
+		collection window. Threshold limits can be expressed as a fixed
+		value or as a percentage of the mean in the collection window.
+			iops	Collect IOPS data. Stop the job if all
+				individual IOPS measurements are within the
+				specified limit of the mean IOPS (e.g., iops:2
+				means that all individual IOPS values must be
+				within 2 of the mean, whereas iops:0.2% means
+				that all individual IOPS values must be within
+				0.2% of the mean IOPS to terminate the job).
+			iops_slope
+				Collect IOPS data and calculate the least
+				squares regression slope. Stop the job if the
+				slope falls below the specified limit.
+			bw	Collect bandwidth data. Stop the job if all
+				individual bandwidth measurements are within
+				the specified limit of the mean bandwidth.
+			bw_slope
+				Collect bandwidth data and calculate the least
+				squares regression slope. Stop the job if the
+				slope falls below the specified limit.
+
+steadystate_duration=time
+ss_dur=time	A rolling window of this duration will be used to judge whether
+		steady state has been reached. Data will be collected once per
+		second. The default is 0 which disables steady state detection.
+
+steadystate_ramp_time=time
+ss_ramp=time	Allow the job to run for the specified duration before
+		beginning data collection for checking the steady state job
+		termination criterion. The default is 0.
+
 invalidate=bool	Invalidate the buffer/page cache parts for this file prior
 		to starting io. Defaults to true.
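The two families of steadystate criteria described above are a maximum deviation from the window mean (iops, bw) and a least squares slope (iops_slope, bw_slope). The sketch below is an illustrative reimplementation, not fio's steadystate.c. It assumes the following:

* The function names are hypothetical.
* Samples are taken once per second, so x = 0, 1, ..., n-1.
* The limit is compared with a strict "<".

```c
#include <stdbool.h>
#include <stddef.h>

static double abs_d(double v)
{
	return v < 0 ? -v : v;
}

static double window_mean(const double *y, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += y[i];
	return n ? sum / (double)n : 0.0;
}

/* Least squares regression slope of y[i] against x = i. */
static double ls_slope(const double *y, size_t n)
{
	double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;

	for (size_t i = 0; i < n; i++) {
		double x = (double)i;

		sx += x;
		sy += y[i];
		sxy += x * y[i];
		sxx += x * x;
	}
	double denom = (double)n * sxx - sx * sx;
	return denom == 0.0 ? 0.0 : ((double)n * sxy - sx * sy) / denom;
}

/* Largest absolute deviation of any sample from the window mean. */
static double max_mean_dev(const double *y, size_t n)
{
	double mean = window_mean(y, n), worst = 0.0;

	for (size_t i = 0; i < n; i++)
		if (abs_d(y[i] - mean) > worst)
			worst = abs_d(y[i] - mean);
	return worst;
}

/*
 * use_slope selects the *_slope criteria; pct means the limit is a
 * percentage of the window mean (e.g. "0.1%") rather than a fixed value.
 */
static bool ss_attained(const double *y, size_t n, bool use_slope,
			double limit, bool pct)
{
	double crit = use_slope ? abs_d(ls_slope(y, n)) : max_mean_dev(y, n);

	if (pct) {
		double mean = window_mean(y, n);

		if (mean == 0.0)
			return false;	/* a percentage of zero is meaningless */
		crit = 100.0 * crit / mean;
	}
	return crit < limit;
}
```

Under this sketch, iops_slope:0.1% corresponds to ss_attained(iops, n, true, 0.1, true), and iops:2 corresponds to ss_attained(iops, n, false, 2.0, false).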
 
diff --git a/Makefile b/Makefile
index d27380b..4c64168 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,8 @@ SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
-		workqueue.c rate-submit.c optgroup.c helper_thread.c
+		workqueue.c rate-submit.c optgroup.c helper_thread.c \
+		steadystate.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
diff --git a/STEADYSTATE-TODO b/STEADYSTATE-TODO
new file mode 100644
index 0000000..e4b146e
--- /dev/null
+++ b/STEADYSTATE-TODO
@@ -0,0 +1,14 @@
+Known issues/TODO (for steady-state)
+
+- Allow user to specify the frequency of measurements
+
+- Better documentation for output
+
+- Report read, write, trim IOPS/BW separately
+
+- Semantics for the ring buffer ss->head are confusing. ss->head points
+  to the beginning of the buffer up through the point where the buffer
+  is filled for the first time. Afterwards, when a new element is added,
+  ss->head is advanced to point to the second element in the buffer. If
+  steady state is attained upon adding a new element, ss->head is not
+  advanced, so it actually does point to the head of the buffer.
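For comparison with the ss->head semantics described in this TODO, here is a sketch of a ring buffer that keeps an explicit count alongside head. With that count, head always names the oldest sample, whether or not the buffer has filled. The struct and function names and the 4-slot capacity are hypothetical. This is not how steadystate.c stores its data.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4	/* hypothetical window: 4 one-second samples */

struct ring {
	uint64_t data[RING_CAP];
	size_t head;	/* index of the oldest sample */
	size_t count;	/* number of valid samples, at most RING_CAP */
};

static void ring_push(struct ring *r, uint64_t v)
{
	if (r->count < RING_CAP) {
		/* Not yet full: append after the newest sample. */
		r->data[(r->head + r->count) % RING_CAP] = v;
		r->count++;
	} else {
		/* Full: overwrite the oldest sample, then advance head. */
		r->data[r->head] = v;
		r->head = (r->head + 1) % RING_CAP;
	}
}

/* Copy samples oldest-first into out[]; returns the number copied. */
static size_t ring_ordered(const struct ring *r, uint64_t *out)
{
	for (size_t i = 0; i < r->count; i++)
		out[i] = r->data[(r->head + i) % RING_CAP];
	return r->count;
}
```

Because head never changes meaning, a reader such as the JSON output code could walk the window oldest-first without checking whether the buffer is full or steady state was attained.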
diff --git a/backend.c b/backend.c
index ac71521..a048452 100644
--- a/backend.c
+++ b/backend.c
@@ -1665,6 +1665,7 @@ static void *thread_main(void *data)
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
+	memcpy(&td->ss.prev_time, &td->epoch, sizeof(td->epoch));
 
 	if (o->ratemin[DDIR_READ] || o->ratemin[DDIR_WRITE] ||
 			o->ratemin[DDIR_TRIM]) {
@@ -2415,6 +2416,12 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	for_each_td(td, i) {
+		if (td->ss.dur) {
+			if (td->ss.iops_data != NULL) {
+				free(td->ss.iops_data);
+				free(td->ss.bw_data);
+			}
+		}
 		fio_options_free(td);
 		if (td->rusage_sem) {
 			fio_mutex_remove(td->rusage_sem);
diff --git a/cconv.c b/cconv.c
index 0032cc0..336805b 100644
--- a/cconv.c
+++ b/cconv.c
@@ -213,6 +213,10 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->start_delay_high = le64_to_cpu(top->start_delay_high);
 	o->timeout = le64_to_cpu(top->timeout);
 	o->ramp_time = le64_to_cpu(top->ramp_time);
+	o->ss_dur = le64_to_cpu(top->ss_dur);
+	o->ss_ramp_time = le64_to_cpu(top->ss_ramp_time);
+	o->ss_state = le32_to_cpu(top->ss_state);
+	o->ss_limit.u.f = fio_uint64_to_double(le64_to_cpu(top->ss_limit.u.i));
 	o->zone_range = le64_to_cpu(top->zone_range);
 	o->zone_size = le64_to_cpu(top->zone_size);
 	o->zone_skip = le64_to_cpu(top->zone_skip);
@@ -523,6 +527,10 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->start_delay_high = __cpu_to_le64(o->start_delay_high);
 	top->timeout = __cpu_to_le64(o->timeout);
 	top->ramp_time = __cpu_to_le64(o->ramp_time);
+	top->ss_dur = __cpu_to_le64(o->ss_dur);
+	top->ss_ramp_time = __cpu_to_le64(o->ss_ramp_time);
+	top->ss_state = cpu_to_le32(o->ss_state);
+	top->ss_limit.u.i = __cpu_to_le64(fio_double_to_uint64(o->ss_limit.u.f));
 	top->zone_range = __cpu_to_le64(o->zone_range);
 	top->zone_size = __cpu_to_le64(o->zone_size);
 	top->zone_skip = __cpu_to_le64(o->zone_skip);
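The steadystate limit is a double, but the packed options only carry integers. The patch therefore sends the limit as a 64-bit integer encoding via fio_double_to_uint64()/fio_uint64_to_double(), with a little-endian conversion on top. The sketch below illustrates the same round trip using a plain memcpy bit copy and byte-wise little-endian encoding. It assumes both peers use IEEE 754 binary64 doubles. The helper names are hypothetical, and this is not fio's implementation of those functions.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a uint64_t (assumes IEEE 754). */
static uint64_t double_bits(double d)
{
	uint64_t u;

	memcpy(&u, &d, sizeof(u));
	return u;
}

static double bits_double(uint64_t u)
{
	double d;

	memcpy(&d, &u, sizeof(d));
	return d;
}

/* Store a uint64_t little-endian, independent of host byte order. */
static void put_le64(unsigned char *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}
```

The conversion helpers must read from the host-order source struct and write into the packed one. Converting a field of the packed struct in place would transmit whatever value was already stored there.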
diff --git a/client.c b/client.c
index c613887..48d4c52 100644
--- a/client.c
+++ b/client.c
@@ -946,6 +946,21 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->nr_block_infos	= le64_to_cpu(src->nr_block_infos);
 	for (i = 0; i < dst->nr_block_infos; i++)
 		dst->block_infos[i] = le32_to_cpu(src->block_infos[i]);
+
+	dst->ss_dur		= le64_to_cpu(src->ss_dur);
+	dst->ss_state		= le32_to_cpu(src->ss_state);
+	dst->ss_head		= le32_to_cpu(src->ss_head);
+	dst->ss_limit.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_limit.u.i));
+	dst->ss_slope.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_slope.u.i));
+	dst->ss_deviation.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_deviation.u.i));
+	dst->ss_criterion.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_criterion.u.i));
+
+	if (dst->ss_state & __FIO_SS_DATA) {
+		for (i = 0; i < dst->ss_dur; i++) {
+			dst->ss_iops_data[i] = le64_to_cpu(src->ss_iops_data[i]);
+			dst->ss_bw_data[i] = le64_to_cpu(src->ss_bw_data[i]);
+		}
+	}
 }
 
 static void convert_gs(struct group_run_stats *dst, struct group_run_stats *src)
@@ -1617,6 +1632,7 @@ int fio_handle_client(struct fio_client *client)
 {
 	struct client_ops *ops = client->ops;
 	struct fio_net_cmd *cmd;
+	int size;
 
 	dprint(FD_NET, "client: handle %s\n", client->hostname);
 
@@ -1649,6 +1665,15 @@ int fio_handle_client(struct fio_client *client)
 	case FIO_NET_CMD_TS: {
 		struct cmd_ts_pdu *p = (struct cmd_ts_pdu *) cmd->payload;
 
+		dprint(FD_NET, "client: ts->ss_state = %u\n", (unsigned int) le32_to_cpu(p->ts.ss_state));
+		if (le32_to_cpu(p->ts.ss_state) & __FIO_SS_DATA) {
+			dprint(FD_NET, "client: received steadystate ring buffers\n");
+
+			size = le64_to_cpu(p->ts.ss_dur);
+			p->ts.ss_iops_data = (uint64_t *) ((struct cmd_ts_pdu *)cmd->payload + 1);
+			p->ts.ss_bw_data = p->ts.ss_iops_data + size;
+		}
+
 		convert_ts(&p->ts, &p->ts);
 		convert_gs(&p->rs, &p->rs);
 
diff --git a/debug.h b/debug.h
index 923fa39..e3aa3f1 100644
--- a/debug.h
+++ b/debug.h
@@ -21,6 +21,8 @@ enum {
 	FD_NET,
 	FD_RATE,
 	FD_COMPRESS,
+	FD_STEADYSTATE,
+	FD_HELPERTHREAD,
 	FD_DEBUG_MAX,
 };
 
diff --git a/examples/dev-dax.fio b/examples/dev-dax.fio
new file mode 100644
index 0000000..d9f430e
--- /dev/null
+++ b/examples/dev-dax.fio
@@ -0,0 +1,45 @@
+[global]
+bs=2m
+ioengine=dev-dax
+norandommap
+time_based=1
+runtime=30
+group_reporting
+disable_lat=1
+disable_slat=1
+disable_clat=1
+clat_percentiles=0
+cpus_allowed_policy=split
+
+# For the dev-dax engine:
+#
+#   IOs always complete immediately
+#   IOs are always direct
+#
+iodepth=1
+direct=0
+thread=1
+numjobs=16
+#
+# The dev-dax engine does IO to DAX devices, which are special character
+# devices exported by the kernel (e.g. /dev/dax0.0). The device is
+# opened normally and then the region is accessible via mmap. We do
+# not use the O_DIRECT flag because the device is naturally direct
+# access; passing O_DIRECT will result in failure. The engine
+# accesses the underlying NVDIMM directly once the mapping is set up.
+#
+# Check the alignment requirement of your DAX device. Currently the default
+# should be 2M. The block size (bs) must meet the alignment requirement.
+#
+# An example of creating a dev dax device node from pmem:
+# ndctl create-namespace --reconfig=namespace0.0 --mode=dax --force
+#
+filename=/dev/dax0.0
+
+[dev-dax-write]
+rw=randwrite
+stonewall
+
+[dev-dax-read]
+rw=randread
+stonewall
diff --git a/examples/steadystate.fio b/examples/steadystate.fio
new file mode 100644
index 0000000..26fb808
--- /dev/null
+++ b/examples/steadystate.fio
@@ -0,0 +1,45 @@
+#
+# Example job file for steady state job termination
+# Use --output-format=json for detailed information
+#
+# For Windows, change the file names
+#
+
+[global]
+threads=1
+group_reporting=1
+time_based
+size=128m
+
+[ss-write]
+filename=/dev/null
+rw=write
+bs=128k
+numjobs=4
+runtime=5m
+ss=iops:10%
+ss_dur=30s
+ss_ramp=10s
+#
+# Begin ss detection 10s after job starts
+# Terminate job when the largest deviation from mean IOPS is within 10%
+# Use a rolling 30s window for deviations
+#
+
+
+[ss-read]
+new_group
+stonewall
+filename=/dev/zero
+rw=randread
+bs=4k
+numjobs=4
+runtime=5m
+ss=bw_slope:1%
+ss_dur=10s
+ss_ramp=5s
+#
+# Begin ss detection 5s after job starts
+# Terminate job when bandwidth slope is less than 1% of avg bw
+# Use a rolling 10s window for bw measurements
+#
diff --git a/fio.1 b/fio.1
index 07480f0..e8a327c 100644
--- a/fio.1
+++ b/fio.1
@@ -722,6 +722,9 @@ constraint.
 .TP
 .B pmemblk
 Read and write through the NVML libpmemblk interface.
+.TP
+.B dev-dax
+Read and write through a DAX device exposed from persistent memory.
 .RE
 .P
 .RE
@@ -1139,6 +1142,50 @@ logging results, thus minimizing the runtime required for stable results. Note
 that the \fBramp_time\fR is considered lead in time for a job, thus it will
 increase the total runtime if a special timeout or runtime is specified.
 .TP
+.BI steadystate \fR=\fPstr:float "\fR,\fP ss" \fR=\fPstr:float
+Define the criterion and limit for assessing steady state performance. The
+first parameter designates the criterion whereas the second parameter sets the
+threshold. When the criterion falls below the threshold for the specified
+duration, the job will stop. For example, iops_slope:0.1% will direct fio
+to terminate the job when the least squares regression slope falls below 0.1%
+of the mean IOPS. If group_reporting is enabled this will apply to all jobs in
+the group. All assessments are carried out using only data from the rolling
+collection window. Threshold limits can be expressed as a fixed value or as a
+percentage of the mean in the collection window. Below are the available steady
+state assessment criteria.
+.RS
+.RS
+.TP
+.B iops
+Collect IOPS data. Stop the job if all individual IOPS measurements are within
+the specified limit of the mean IOPS (e.g., iops:2 means that all individual
+IOPS values must be within 2 of the mean, whereas iops:0.2% means that all
+individual IOPS values must be within 0.2% of the mean IOPS to terminate the
+job).
+.TP
+.B iops_slope
+Collect IOPS data and calculate the least squares regression slope. Stop the
+job if the slope falls below the specified limit.
+.TP
+.B bw
+Collect bandwidth data. Stop the job if all individual bandwidth measurements
+are within the specified limit of the mean bandwidth.
+.TP
+.B bw_slope
+Collect bandwidth data and calculate the least squares regression slope. Stop
+the job if the slope falls below the specified limit.
+.RE
+.RE
+.TP
+.BI steadystate_duration \fR=\fPtime "\fR,\fP ss_dur" \fR=\fPtime
+A rolling window of this duration will be used to judge whether steady state
+has been reached. Data will be collected once per second. The default is 0
+which disables steady state detection.
+.TP
+.BI steadystate_ramp_time \fR=\fPtime "\fR,\fP ss_ramp" \fR=\fPtime
+Allow the job to run for the specified duration before beginning data collection
+for checking the steady state job termination criterion. The default is 0.
+.TP
 .BI invalidate \fR=\fPbool
 Invalidate buffer-cache for the file prior to starting I/O.  Default: true.
 .TP
diff --git a/fio.h b/fio.h
index 7e32788..5726bef 100644
--- a/fio.h
+++ b/fio.h
@@ -41,6 +41,7 @@
 #include "flow.h"
 #include "io_u_queue.h"
 #include "workqueue.h"
+#include "steadystate.h"
 
 #ifdef CONFIG_SOLARISAIO
 #include <sys/asynch.h>
@@ -395,6 +396,8 @@ struct thread_data {
 
 	void *pinned_mem;
 
+	struct steadystate_data ss;
+
 	char verror[FIO_VERROR_SIZE];
 };
 
diff --git a/helper_thread.c b/helper_thread.c
index f031df4..47ec728 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -1,6 +1,7 @@
 #include "fio.h"
 #include "smalloc.h"
 #include "helper_thread.h"
+#include "steadystate.h"
 
 static struct helper_data {
 	volatile int exit;
@@ -69,14 +70,15 @@ void helper_thread_exit(void)
 static void *helper_thread_main(void *data)
 {
 	struct helper_data *hd = data;
-	unsigned int msec_to_next_event, next_log;
-	struct timeval tv, last_du;
+	unsigned int msec_to_next_event, next_log, next_ss = STEADYSTATE_MSEC;
+	struct timeval tv, last_du, last_ss;
 	int ret = 0;
 
 	sk_out_assign(hd->sk_out);
 
 	gettimeofday(&tv, NULL);
 	memcpy(&last_du, &tv, sizeof(tv));
+	memcpy(&last_ss, &tv, sizeof(tv));
 
 	fio_mutex_up(hd->startup_mutex);
 
@@ -84,7 +86,7 @@ static void *helper_thread_main(void *data)
 	while (!ret && !hd->exit) {
 		struct timespec ts;
 		struct timeval now;
-		uint64_t since_du;
+		uint64_t since_du, since_ss = 0;
 
 		timeval_add_msec(&tv, msec_to_next_event);
 		ts.tv_sec = tv.tv_sec;
@@ -98,6 +100,7 @@ static void *helper_thread_main(void *data)
 		if (hd->reset) {
 			memcpy(&tv, &now, sizeof(tv));
 			memcpy(&last_du, &now, sizeof(last_du));
+			memcpy(&last_ss, &now, sizeof(last_ss));
 			hd->reset = 0;
 		}
 
@@ -122,7 +125,22 @@ static void *helper_thread_main(void *data)
 		if (!next_log)
 			next_log = DISK_UTIL_MSEC;
 
-		msec_to_next_event = min(next_log, msec_to_next_event);
+		if (steadystate_enabled) {
+			since_ss = mtime_since(&last_ss, &now);
+			if (since_ss >= STEADYSTATE_MSEC || STEADYSTATE_MSEC - since_ss < 10) {
+				steadystate_check();
+				timeval_add_msec(&last_ss, since_ss);
+				if (since_ss > STEADYSTATE_MSEC)
+					next_ss = STEADYSTATE_MSEC - (since_ss - STEADYSTATE_MSEC);
+				else
+					next_ss = STEADYSTATE_MSEC;
+			}
+			else
+				next_ss = STEADYSTATE_MSEC - since_ss;
+		}
+
+		msec_to_next_event = min(min(next_log, msec_to_next_event), next_ss);
+		dprint(FD_HELPERTHREAD, "since_ss: %llu, next_ss: %u, next_log: %u, msec_to_next_event: %u\n", (unsigned long long)since_ss, next_ss, next_log, msec_to_next_event);
 
 		if (!is_backend)
 			print_thread_status();
@@ -142,6 +160,7 @@ int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
 	hd = smalloc(sizeof(*hd));
 
 	setup_disk_util();
+	steadystate_setup();
 
 	hd->sk_out = sk_out;
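The helper thread scheduling logic can be read as a small pure function. Given the milliseconds elapsed since the last check, it decides whether to call steadystate_check() now and how long to wait until the next check. The sketch makes three assumptions:

* STEADYSTATE_MSEC is 1000, matching the one-sample-per-second behavior documented for ss_dur.
* The function name is hypothetical.
* Overshoots of a full period or more are clamped. This clamp is an addition; the patch's expression assumes the overshoot stays below one period.

```c
#include <stdint.h>

#define STEADYSTATE_MSEC 1000	/* assumed sampling period in ms */

/*
 * Returns the number of ms until the next steady state check and sets
 * *do_check to 1 if a check should run now.
 */
static unsigned int next_ss_msec(uint64_t since_ss, int *do_check)
{
	/* Check now if a full period has passed or is less than 10 ms away. */
	if (since_ss >= STEADYSTATE_MSEC || STEADYSTATE_MSEC - since_ss < 10) {
		*do_check = 1;
		if (since_ss > STEADYSTATE_MSEC) {
			uint64_t late = since_ss - STEADYSTATE_MSEC;

			/* Shorten the next wait by the overshoot; clamp is this sketch's addition. */
			return late < STEADYSTATE_MSEC ?
				(unsigned int)(STEADYSTATE_MSEC - late) : 1;
		}
		return STEADYSTATE_MSEC;
	}
	*do_check = 0;
	return (unsigned int)(STEADYSTATE_MSEC - since_ss);
}
```

For example, a wakeup 1030 ms after the previous check runs a check and schedules the next one 970 ms later, which keeps samples close to one-second intervals.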
 
diff --git a/init.c b/init.c
index 36feb51..f26f35d 100644
--- a/init.c
+++ b/init.c
@@ -25,6 +25,7 @@
 #include "server.h"
 #include "idletime.h"
 #include "filelock.h"
+#include "steadystate.h"
 
 #include "oslib/getopt.h"
 #include "oslib/strcasestr.h"
@@ -1563,6 +1564,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			log_info("...\n");
 	}
 
+	if (td_steadystate_init(td))
+		goto err;
+
 	/*
 	 * recurse add identical jobs, clear numjobs and stonewall options
 	 * as they don't apply to sub-jobs
@@ -1578,6 +1582,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td_new->o.stonewall = 0;
 		td_new->o.new_group = 0;
 		td_new->subjob_number = numjobs;
+		td_new->o.ss_dur = o->ss_dur * 1000000l;
+		td_new->o.ss_limit = o->ss_limit;
 
 		if (file_alloced) {
 			if (td_new->files) {
@@ -2120,6 +2126,14 @@ struct debug_level debug_levels[] = {
 	  .help = "Log compression logging",
 	  .shift = FD_COMPRESS,
 	},
+	{ .name = "steadystate",
+	  .help = "Steady state detection logging",
+	  .shift = FD_STEADYSTATE,
+	},
+	{ .name = "helperthread",
+	  .help = "Helper thread logging",
+	  .shift = FD_HELPERTHREAD,
+	},
 	{ .name = NULL, },
 };
 
diff --git a/json.c b/json.c
index 190fa9e..e0227ec 100644
--- a/json.c
+++ b/json.c
@@ -40,7 +40,7 @@ static struct json_value *json_create_value_int(long long number)
 	return value;
 }
 
-static struct json_value *json_create_value_float(float number)
+static struct json_value *json_create_value_float(double number)
 {
 	struct json_value *value = malloc(sizeof(struct json_value));
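The json.c change widens the parameter of json_create_value_float() from float to double. A float has only a 24-bit significand, so values passed through the old signature could lose precision before they reached the JSON output. The two hypothetical helpers below model the old and new argument types to make the narrowing visible.

```c
/* Models the old json_create_value_float(float) argument narrowing. */
static double via_float(double v)
{
	return (float)v;
}

/* Models the new json_create_value_float(double) signature. */
static double via_double(double v)
{
	return v;
}
```

For example, 16777217.0 (2^24 + 1) passes through via_double() unchanged but comes back from via_float() as 16777216.0.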
 
diff --git a/libfio.c b/libfio.c
index 0f9f4e7..960daf6 100644
--- a/libfio.c
+++ b/libfio.c
@@ -152,6 +152,7 @@ void reset_all_stats(struct thread_data *td)
 	memcpy(&td->start, &td->epoch, sizeof(struct timeval));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(struct timeval));
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(struct timeval));
+	memcpy(&td->ss.prev_time, &td->epoch, sizeof(struct timeval));
 
 	lat_target_reset(td);
 	clear_rusage_stat(td);
diff --git a/options.c b/options.c
index dfecd9d..7638afc 100644
--- a/options.c
+++ b/options.c
@@ -1061,6 +1061,78 @@ static int str_random_distribution_cb(void *data, const char *str)
 	return 0;
 }
 
+static int str_steadystate_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+	double val;
+	char *nr;
+	char *pct;
+	long long ll;
+
+	if (td->o.ss_state != FIO_SS_IOPS && td->o.ss_state != FIO_SS_IOPS_SLOPE &&
+	    td->o.ss_state != FIO_SS_BW && td->o.ss_state != FIO_SS_BW_SLOPE) {
+		/* should be impossible to get here */
+		log_err("fio: unknown steady state criterion\n");
+		return 1;
+	}
+
+	nr = get_opt_postfix(str);
+	if (!nr) {
+		log_err("fio: steadystate threshold must be specified in addition to criterion\n");
+		free(nr);
+		return 1;
+	}
+
+	/* ENHANCEMENT Allow fio to understand size=10.2% and use here */
+	pct = strstr(nr, "%");
+	if (pct) {
+		*pct = '\0';
+		strip_blank_end(nr);
+		if (!str_to_float(nr, &val, 0))	{
+			log_err("fio: could not parse steadystate threshold percentage\n");
+			free(nr);
+			return 1;
+		}
+
+		dprint(FD_PARSE, "set steady state threshold to %f%%\n", val);
+		free(nr);
+		if (parse_dryrun())
+			return 0;
+
+		td->o.ss_state |= __FIO_SS_PCT;
+		td->o.ss_limit.u.f = val;
+	} else if (td->o.ss_state & __FIO_SS_IOPS) {
+		if (!str_to_float(nr, &val, 0)) {
+			log_err("fio: steadystate IOPS threshold postfix parsing failed\n");
+			free(nr);
+			return 1;
+		}
+
+		dprint(FD_PARSE, "set steady state IOPS threshold to %f\n", val);
+		free(nr);
+		if (parse_dryrun())
+			return 0;
+
+		td->o.ss_limit.u.f = val;
+	} else {	/* bandwidth criterion */
+		if (str_to_decimal(nr, &ll, 1, td, 0, 0)) {
+			log_err("fio: steadystate BW threshold postfix parsing failed\n");
+			free(nr);
+			return 1;
+		}
+
+		dprint(FD_PARSE, "set steady state BW threshold to %lld\n", ll);
+		free(nr);
+		if (parse_dryrun())
+			return 0;
+
+		td->o.ss_limit.u.f = (double) ll;
+	}
+
+	td->ss.state = td->o.ss_state;
+	return 0;
+}
+
 /*
  * Return next name in the string. Files are separated with ':'. If the ':'
  * is escaped with a '\', then that ':' is part of the filename and does not
@@ -1698,6 +1770,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 
 #endif
+#ifdef CONFIG_LINUX_DEVDAX
+			  { .ival = "dev-dax",
+			    .help = "DAX Device based IO engine",
+			  },
+#endif
 			  { .ival = "external",
 			    .help = "Load external engine (append name)",
 			  },
@@ -4192,6 +4269,63 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_MTD,
 	},
 	{
+		.name   = "steadystate",
+		.lname  = "Steady state threshold",
+		.alias  = "ss",
+		.type   = FIO_OPT_STR,
+		.off1   = offsetof(struct thread_options, ss_state),
+		.cb	= str_steadystate_cb,
+		.help   = "Define the criterion and limit to judge when a job has reached steady state",
+		.def	= "iops_slope:0.01%",
+		.posval	= {
+			  { .ival = "iops",
+			    .oval = FIO_SS_IOPS,
+			    .help = "maximum mean deviation of IOPS measurements",
+			  },
+			  { .ival = "iops_slope",
+			    .oval = FIO_SS_IOPS_SLOPE,
+			    .help = "slope calculated from IOPS measurements",
+			  },
+			  { .ival = "bw",
+			    .oval = FIO_SS_BW,
+			    .help = "maximum mean deviation of bandwidth measurements",
+			  },
+			  {
+			    .ival = "bw_slope",
+			    .oval = FIO_SS_BW_SLOPE,
+			    .help = "slope calculated from bandwidth measurements",
+			  },
+		},
+		.category = FIO_OPT_C_GENERAL,
+		.group  = FIO_OPT_G_RUNTIME,
+	},
+	{
+		.name   = "steadystate_duration",
+		.lname  = "Steady state duration",
+		.alias  = "ss_dur",
+		.type   = FIO_OPT_STR_VAL_TIME,
+		.off1   = offsetof(struct thread_options, ss_dur),
+		.help   = "Stop workload upon attaining steady state for specified duration",
+		.def    = "0",
+		.is_seconds = 1,
+		.is_time = 1,
+		.category = FIO_OPT_C_GENERAL,
+		.group  = FIO_OPT_G_RUNTIME,
+	},
+	{
+		.name   = "steadystate_ramp_time",
+		.lname  = "Steady state ramp time",
+		.alias  = "ss_ramp",
+		.type   = FIO_OPT_STR_VAL_TIME,
+		.off1   = offsetof(struct thread_options, ss_ramp_time),
+		.help   = "Delay before initiation of data collection for steady state job termination testing",
+		.def    = "0",
+		.is_seconds = 1,
+		.is_time = 1,
+		.category = FIO_OPT_C_GENERAL,
+		.group  = FIO_OPT_G_RUNTIME,
+	},
+	{
 		.name = NULL,
 	},
 };
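str_steadystate_cb() splits a value such as iops_slope:0.1% into a criterion and a threshold, and it records whether the threshold is a percentage. The standalone sketch below shows that parsing in simplified form. The parse_ss() name and the ss_crit/ss_spec types are hypothetical. Unlike the patch, the sketch does not accept size suffixes on bandwidth thresholds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum ss_crit { SS_IOPS, SS_IOPS_SLOPE, SS_BW, SS_BW_SLOPE };

struct ss_spec {
	enum ss_crit crit;
	double limit;
	bool pct;	/* limit is a percentage of the window mean */
};

/* Returns 0 on success, -1 on a malformed "criterion:threshold" string. */
static int parse_ss(const char *s, struct ss_spec *out)
{
	static const struct { const char *name; enum ss_crit crit; } names[] = {
		{ "iops", SS_IOPS }, { "iops_slope", SS_IOPS_SLOPE },
		{ "bw", SS_BW }, { "bw_slope", SS_BW_SLOPE },
	};
	const size_t nnames = sizeof(names) / sizeof(names[0]);
	const char *colon = strchr(s, ':');
	size_t len, i;
	char *end;
	double v;
	bool pct;

	if (!colon)
		return -1;	/* the threshold is mandatory */
	len = (size_t)(colon - s);
	for (i = 0; i < nnames; i++)
		if (strlen(names[i].name) == len &&
		    strncmp(s, names[i].name, len) == 0)
			break;
	if (i == nnames)
		return -1;	/* unknown criterion */

	v = strtod(colon + 1, &end);
	if (end == colon + 1)
		return -1;	/* no number after the colon */
	pct = (*end == '%');
	if (pct)
		end++;
	if (*end != '\0')
		return -1;	/* trailing garbage */

	out->crit = names[i].crit;
	out->limit = v;
	out->pct = pct;
	return 0;
}
```

For example, "iops:2" yields a fixed limit of 2, "iops_slope:0.1%" yields a percentage limit of 0.1, and a bare "iops" is rejected. This matches the patch's error path when no threshold is given.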
diff --git a/server.c b/server.c
index 172ccc0..2e05415 100644
--- a/server.c
+++ b/server.c
@@ -1462,6 +1462,8 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 {
 	struct cmd_ts_pdu p;
 	int i, j;
+	void *ss_buf;
+	uint64_t *ss_iops, *ss_bw;
 
 	dprint(FD_NET, "server sending end stats\n");
 
@@ -1545,9 +1547,37 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	for (i = 0; i < p.ts.nr_block_infos; i++)
 		p.ts.block_infos[i] = cpu_to_le32(ts->block_infos[i]);
 
+	p.ts.ss_dur		= cpu_to_le64(ts->ss_dur);
+	p.ts.ss_state		= cpu_to_le32(ts->ss_state);
+	p.ts.ss_head		= cpu_to_le32(ts->ss_head);
+	p.ts.ss_limit.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_limit.u.f));
+	p.ts.ss_slope.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_slope.u.f));
+	p.ts.ss_deviation.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_deviation.u.f));
+	p.ts.ss_criterion.u.i	= cpu_to_le64(fio_double_to_uint64(ts->ss_criterion.u.f));
+
 	convert_gs(&p.rs, rs);
 
-	fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
+	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
+	if (ts->ss_state & __FIO_SS_DATA) {
+		dprint(FD_NET, "server sending steadystate ring buffers\n");
+
+		ss_buf = malloc(sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t));
+
+		memcpy(ss_buf, &p, sizeof(p));
+
+		ss_iops = (uint64_t *) ((struct cmd_ts_pdu *)ss_buf + 1);
+		ss_bw = ss_iops + (int) ts->ss_dur;
+		for (i = 0; i < ts->ss_dur; i++) {
+			ss_iops[i] = cpu_to_le64(ts->ss_iops_data[i]);
+			ss_bw[i] = cpu_to_le64(ts->ss_bw_data[i]);
+		}
+
+		fio_net_queue_cmd(FIO_NET_CMD_TS, ss_buf, sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t), NULL, SK_F_COPY);
+
+		free(ss_buf);
+	}
+	else
+		fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
 }
 
 void fio_server_send_gs(struct group_run_stats *rs)
diff --git a/server.h b/server.h
index 4a81bcd..3a1d0b0 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 59,
+	FIO_SERVER_VER			= 60,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
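With this series, FIO_NET_CMD_TS may carry the two steadystate ring buffers directly after struct cmd_ts_pdu. The IOPS buffer comes first, then the bandwidth buffer, and each holds ss_dur 64-bit little-endian values. That protocol change is why FIO_SERVER_VER is bumped to 60.

The sketch below reduces the fixed header to a single little-endian sample count, so the layout and a bounds check can be shown in isolation. pack_ss() and unpack_ss() are hypothetical names, not fio functions. In fio itself, the count is the ss_dur field inside the header.

```c
#include <stdint.h>
#include <stdlib.h>

static void put_le64(unsigned char *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Layout: [n][iops[0..n-1]][bw[0..n-1]], every field 8 bytes little-endian. */
static unsigned char *pack_ss(const uint64_t *iops, const uint64_t *bw,
			      size_t n, size_t *len)
{
	unsigned char *buf;

	*len = 8 + 2 * n * 8;
	buf = malloc(*len);
	if (!buf)
		return NULL;
	put_le64(buf, n);
	for (size_t i = 0; i < n; i++) {
		put_le64(buf + 8 + 8 * i, iops[i]);		/* IOPS ring first */
		put_le64(buf + 8 + 8 * (n + i), bw[i]);	/* then bandwidth */
	}
	return buf;
}

/* Returns the sample count, or -1 if the payload is truncated or too large. */
static long unpack_ss(const unsigned char *buf, size_t len,
		      uint64_t *iops, uint64_t *bw, size_t max)
{
	uint64_t n;

	if (len < 8)
		return -1;
	n = get_le64(buf);
	if (n > max || len < 8 + 16 * n)
		return -1;
	for (size_t i = 0; i < n; i++) {
		iops[i] = get_le64(buf + 8 + 8 * i);
		bw[i] = get_le64(buf + 8 + 8 * (n + i));
	}
	return (long)n;
}
```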
diff --git a/stat.c b/stat.c
index 423aacd..3e57e54 100644
--- a/stat.c
+++ b/stat.c
@@ -657,6 +657,33 @@ static void show_block_infos(int nr_block_infos, uint32_t *block_infos,
 			 i == BLOCK_STATE_COUNT - 1 ? '\n' : ',');
 }
 
+static void show_ss_normal(struct thread_stat *ts, struct buf_output *out)
+{
+	char *p1, *p2;
+	unsigned long long bw_mean, iops_mean;
+	const int i2p = is_power_of_2(ts->kb_base);
+
+	if (!ts->ss_dur)
+		return;
+
+	bw_mean = steadystate_bw_mean(ts);
+	iops_mean = steadystate_iops_mean(ts);
+
+	p1 = num2str(bw_mean / ts->kb_base, 6, ts->kb_base, i2p, ts->unit_base);
+	p2 = num2str(iops_mean, 6, 1, 0, 0);
+
+	log_buf(out, "  steadystate  : attained=%s, bw=%s/s, iops=%s, %s%s=%.3f%s\n",
+		ts->ss_state & __FIO_SS_ATTAINED ? "yes" : "no",
+		p1, p2,
+		ts->ss_state & __FIO_SS_IOPS ? "iops" : "bw",
+		ts->ss_state & __FIO_SS_SLOPE ? " slope": " mean dev",
+		ts->ss_criterion.u.f,
+		ts->ss_state & __FIO_SS_PCT ? "%" : "");
+
+	free(p1);
+	free(p2);
+}
+
 static void show_thread_status_normal(struct thread_stat *ts,
 				      struct group_run_stats *rs,
 				      struct buf_output *out)
@@ -763,6 +790,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 	if (ts->nr_block_infos)
 		show_block_infos(ts->nr_block_infos, ts->block_infos,
 				  ts->percentile_list, out);
+
+	if (ts->ss_dur)
+		show_ss_normal(ts, out);
 }
 
 static void show_ddir_status_terse(struct thread_stat *ts,
@@ -1257,6 +1287,56 @@ static struct json_object *show_thread_status_json(struct thread_stat *ts,
 		}
 	}
 
+	if (ts->ss_dur) {
+		struct json_object *data;
+		struct json_array *iops, *bw;
+		int i, j, k;
+		char ss_buf[64];
+
+		snprintf(ss_buf, sizeof(ss_buf), "%s%s:%f%s",
+			ts->ss_state & __FIO_SS_IOPS ? "iops" : "bw",
+			ts->ss_state & __FIO_SS_SLOPE ? "_slope" : "",
+			(float) ts->ss_limit.u.f,
+			ts->ss_state & __FIO_SS_PCT ? "%" : "");
+
+		tmp = json_create_object();
+		json_object_add_value_object(root, "steadystate", tmp);
+		json_object_add_value_string(tmp, "ss", ss_buf);
+		json_object_add_value_int(tmp, "duration", (int)ts->ss_dur);
+		json_object_add_value_int(tmp, "attained", (ts->ss_state & __FIO_SS_ATTAINED) > 0);
+
+		snprintf(ss_buf, sizeof(ss_buf), "%f%s", (float) ts->ss_criterion.u.f,
+			ts->ss_state & __FIO_SS_PCT ? "%" : "");
+		json_object_add_value_string(tmp, "criterion", ss_buf);
+		json_object_add_value_float(tmp, "max_deviation", ts->ss_deviation.u.f);
+		json_object_add_value_float(tmp, "slope", ts->ss_slope.u.f);
+
+		data = json_create_object();
+		json_object_add_value_object(tmp, "data", data);
+		bw = json_create_array();
+		iops = json_create_array();
+
+		/*
+		** If ss was attained or the buffer is not full,
+		** ss->head points to the first element in the list.
+		** Otherwise it actually points to the second element
+		** in the list.
+		*/
+		if ((ts->ss_state & __FIO_SS_ATTAINED) || !(ts->ss_state & __FIO_SS_BUFFER_FULL))
+			j = ts->ss_head;
+		else
+			j = ts->ss_head == 0 ? ts->ss_dur - 1 : ts->ss_head - 1;
+		for (i = 0; i < ts->ss_dur; i++) {
+			k = (j + i) % ts->ss_dur;
+			json_array_add_value_int(bw, ts->ss_bw_data[k]);
+			json_array_add_value_int(iops, ts->ss_iops_data[k]);
+		}
+		json_object_add_value_int(data, "bw_mean", steadystate_bw_mean(ts));
+		json_object_add_value_int(data, "iops_mean", steadystate_iops_mean(ts));
+		json_object_add_value_array(data, "iops", iops);
+		json_object_add_value_array(data, "bw", bw);
+	}
+
 	return root;
 }
 
@@ -1580,6 +1660,20 @@ void __show_run_stats(void)
 			ts->block_infos[k] = td->ts.block_infos[k];
 
 		sum_thread_stats(ts, &td->ts, idx == 1);
+
+		if (td->o.ss_dur) {
+			ts->ss_state = td->ss.state;
+			ts->ss_dur = td->ss.dur;
+			ts->ss_head = td->ss.head;
+			ts->ss_bw_data = td->ss.bw_data;
+			ts->ss_iops_data = td->ss.iops_data;
+			ts->ss_limit.u.f = td->ss.limit;
+			ts->ss_slope.u.f = td->ss.slope;
+			ts->ss_deviation.u.f = td->ss.deviation;
+			ts->ss_criterion.u.f = td->ss.criterion;
+		}
+		else
+			ts->ss_dur = ts->ss_state = 0;
 	}
 
 	for (i = 0; i < nr_ts; i++) {
diff --git a/stat.h b/stat.h
index 75d1f4e..22083da 100644
--- a/stat.h
+++ b/stat.h
@@ -198,10 +198,10 @@ struct thread_stat {
 	 */
 	union {
 		uint16_t continue_on_error;
-		uint64_t pad2;
+		uint32_t pad2;
 	};
-	uint64_t total_err_count;
 	uint32_t first_error;
+	uint64_t total_err_count;
 
 	uint64_t nr_block_infos;
 	uint32_t block_infos[MAX_NR_BLOCK_INFOS];
@@ -210,9 +210,29 @@ struct thread_stat {
 	uint32_t unit_base;
 
 	uint32_t latency_depth;
+	uint32_t pad3;
 	uint64_t latency_target;
 	fio_fp64_t latency_percentile;
 	uint64_t latency_window;
+
+	uint64_t ss_dur;
+	uint32_t ss_state;
+	uint32_t ss_head;
+
+	fio_fp64_t ss_limit;
+	fio_fp64_t ss_slope;
+	fio_fp64_t ss_deviation;
+	fio_fp64_t ss_criterion;
+
+	union {
+		uint64_t *ss_iops_data;
+		uint64_t pad4;
+	};
+
+	union {
+		uint64_t *ss_bw_data;
+		uint64_t pad5;
+	};
 } __attribute__((packed));
 
 struct jobs_eta {
diff --git a/steadystate.c b/steadystate.c
new file mode 100644
index 0000000..951376f
--- /dev/null
+++ b/steadystate.c
@@ -0,0 +1,368 @@
+#include <stdlib.h>
+
+#include "fio.h"
+#include "steadystate.h"
+#include "helper_thread.h"
+
+bool steadystate_enabled = false;
+
+static void steadystate_alloc(struct thread_data *td)
+{
+	int i;
+
+	td->ss.bw_data = malloc(td->ss.dur * sizeof(uint64_t));
+	td->ss.iops_data = malloc(td->ss.dur * sizeof(uint64_t));
+	/* initialize so that it is obvious if the cache is not full in the output */
+	for (i = 0; i < td->ss.dur; i++)
+		td->ss.iops_data[i] = td->ss.bw_data[i] = 0;
+
+	td->ss.state |= __FIO_SS_DATA;
+}
+
+void steadystate_setup(void)
+{
+	int i, prev_groupid;
+	struct thread_data *td, *prev_td;
+
+	if (!steadystate_enabled)
+		return;
+
+	/*
+	 * if group reporting is enabled, identify the last td
+	 * for each group and use it for storing steady state
+	 * data
+	 */
+	prev_groupid = -1;
+	prev_td = NULL;
+	for_each_td(td, i) {
+		if (!td->ss.dur)
+			continue;
+
+		if (!td->o.group_reporting) {
+			steadystate_alloc(td);
+			continue;
+		}
+
+		if (prev_groupid != td->groupid) {
+			if (prev_td != NULL) {
+				steadystate_alloc(prev_td);
+			}
+			prev_groupid = td->groupid;
+		}
+		prev_td = td;
+	}
+
+	if (prev_td != NULL && prev_td->o.group_reporting) {
+		steadystate_alloc(prev_td);
+	}
+}
+
+static bool steadystate_slope(uint64_t iops, uint64_t bw,
+			      struct thread_data *td)
+{
+	int i, j;
+	double result;
+	struct steadystate_data *ss = &td->ss;
+	uint64_t new_val;
+
+	ss->bw_data[ss->tail] = bw;
+	ss->iops_data[ss->tail] = iops;
+
+	if (ss->state & __FIO_SS_IOPS)
+		new_val = iops;
+	else
+		new_val = bw;
+
+	if (ss->state & __FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+		if (!(ss->state & __FIO_SS_BUFFER_FULL)) {
+			/* first time through */
+			for(i = 0, ss->sum_y = 0; i < ss->dur; i++) {
+				if (ss->state & __FIO_SS_IOPS)
+					ss->sum_y += ss->iops_data[i];
+				else
+					ss->sum_y += ss->bw_data[i];
+				j = (ss->head + i) % ss->dur;
+				if (ss->state & __FIO_SS_IOPS)
+					ss->sum_xy += i * ss->iops_data[j];
+				else
+					ss->sum_xy += i * ss->bw_data[j];
+			}
+			ss->state |= __FIO_SS_BUFFER_FULL;
+		} else {		/* easy to update the sums */
+			ss->sum_y -= ss->oldest_y;
+			ss->sum_y += new_val;
+			ss->sum_xy = ss->sum_xy - ss->sum_y + ss->dur * new_val;
+		}
+
+		if (ss->state & __FIO_SS_IOPS)
+			ss->oldest_y = ss->iops_data[ss->head];
+		else
+			ss->oldest_y = ss->bw_data[ss->head];
+
+		/*
+		 * calculate slope as (sum_xy - sum_x * sum_y / n) / (sum_(x^2)
+		 * - (sum_x)^2 / n). This code assumes that all x values are
+		 * equally spaced, even though they are often off by a few
+		 * milliseconds. This assumption greatly simplifies the calculations.
+		 */
+		ss->slope = (ss->sum_xy - (double) ss->sum_x * ss->sum_y / ss->dur) /
+				(ss->sum_x_sq - (double) ss->sum_x * ss->sum_x / ss->dur);
+		if (ss->state & __FIO_SS_PCT)
+			ss->criterion = 100.0 * ss->slope / (ss->sum_y / ss->dur);
+		else
+			ss->criterion = ss->slope;
+
+		dprint(FD_STEADYSTATE, "sum_y: %llu, sum_xy: %llu, slope: %f, "
+					"criterion: %f, limit: %f\n",
+					(unsigned long long) ss->sum_y,
+					(unsigned long long) ss->sum_xy,
+					ss->slope, ss->criterion, ss->limit);
+
+		result = ss->criterion * (ss->criterion < 0.0 ? -1.0 : 1.0);
+		if (result < ss->limit)
+			return true;
+	}
+
+	ss->tail = (ss->tail + 1) % ss->dur;
+	if (ss->tail <= ss->head)
+		ss->head = (ss->head + 1) % ss->dur;
+
+	return false;
+}
+
+static bool steadystate_deviation(uint64_t iops, uint64_t bw,
+				  struct thread_data *td)
+{
+	int i;
+	double diff;
+	double mean;
+
+	struct steadystate_data *ss = &td->ss;
+
+	ss->bw_data[ss->tail] = bw;
+	ss->iops_data[ss->tail] = iops;
+
+	if (ss->state & __FIO_SS_BUFFER_FULL || ss->tail - ss->head == ss->dur - 1) {
+		if (!(ss->state & __FIO_SS_BUFFER_FULL)) {
+			/* first time through */
+			for(i = 0, ss->sum_y = 0; i < ss->dur; i++)
+				if (ss->state & __FIO_SS_IOPS)
+					ss->sum_y += ss->iops_data[i];
+				else
+					ss->sum_y += ss->bw_data[i];
+			ss->state |= __FIO_SS_BUFFER_FULL;
+		} else {		/* easy to update the sum */
+			ss->sum_y -= ss->oldest_y;
+			if (ss->state & __FIO_SS_IOPS)
+				ss->sum_y += ss->iops_data[ss->tail];
+			else
+				ss->sum_y += ss->bw_data[ss->tail];
+		}
+
+		if (ss->state & __FIO_SS_IOPS)
+			ss->oldest_y = ss->iops_data[ss->head];
+		else
+			ss->oldest_y = ss->bw_data[ss->head];
+
+		mean = (double) ss->sum_y / ss->dur;
+		ss->deviation = 0.0;
+
+		for (i = 0; i < ss->dur; i++) {
+			if (ss->state & __FIO_SS_IOPS)
+				diff = ss->iops_data[i] - mean;
+			else
+				diff = ss->bw_data[i] - mean;
+			ss->deviation = max(ss->deviation, diff * (diff < 0.0 ? -1.0 : 1.0));
+		}
+
+		if (ss->state & __FIO_SS_PCT)
+			ss->criterion = 100.0 * ss->deviation / mean;
+		else
+			ss->criterion = ss->deviation;
+
+		dprint(FD_STEADYSTATE, "sum_y: %llu, mean: %f, max diff: %f, "
+					"objective: %f, limit: %f\n",
+					(unsigned long long) ss->sum_y, mean,
+					ss->deviation, ss->criterion, ss->limit);
+
+		if (ss->criterion < ss->limit)
+			return true;
+	}
+
+	ss->tail = (ss->tail + 1) % ss->dur;
+	if (ss->tail <= ss->head)
+		ss->head = (ss->head + 1) % ss->dur;
+
+	return false;
+}
+
+void steadystate_check(void)
+{
+	int i, j, ddir, prev_groupid, group_ramp_time_over = 0;
+	unsigned long rate_time;
+	struct thread_data *td, *td2;
+	struct timeval now;
+	uint64_t group_bw = 0, group_iops = 0;
+	uint64_t td_iops, td_bytes;
+	bool ret;
+
+	prev_groupid = -1;
+	for_each_td(td, i) {
+		struct steadystate_data *ss = &td->ss;
+
+		if (!ss->dur || td->runstate <= TD_SETTING_UP ||
+		    td->runstate >= TD_EXITED || !ss->state ||
+		    ss->state & __FIO_SS_ATTAINED)
+			continue;
+
+		td_iops = 0;
+		td_bytes = 0;
+		if (!td->o.group_reporting ||
+		    (td->o.group_reporting && td->groupid != prev_groupid)) {
+			group_bw = 0;
+			group_iops = 0;
+			group_ramp_time_over = 0;
+		}
+		prev_groupid = td->groupid;
+
+		fio_gettime(&now, NULL);
+		if (ss->ramp_time && !(ss->state & __FIO_SS_RAMP_OVER)) {
+			/*
+			 * Begin recording data one second after ss->ramp_time
+			 * has elapsed
+			 */
+			if (utime_since(&td->epoch, &now) >= (ss->ramp_time + 1000000L))
+				ss->state |= __FIO_SS_RAMP_OVER;
+		}
+
+		td_io_u_lock(td);
+		for (ddir = DDIR_READ; ddir < DDIR_RWDIR_CNT; ddir++) {
+			td_iops += td->io_blocks[ddir];
+			td_bytes += td->io_bytes[ddir];
+		}
+		td_io_u_unlock(td);
+
+		rate_time = mtime_since(&ss->prev_time, &now);
+		memcpy(&ss->prev_time, &now, sizeof(now));
+
+		/*
+		 * Begin monitoring when job starts but don't actually use
+		 * data in checking stopping criterion until ss->ramp_time is
+		 * over. This ensures that we will have a sane value in
+		 * prev_iops/bw the first time through after ss->ramp_time
+		 * is done.
+		 */
+		if (ss->state & __FIO_SS_RAMP_OVER) {
+			group_bw += 1000 * (td_bytes - ss->prev_bytes) / rate_time;
+			group_iops += 1000 * (td_iops - ss->prev_iops) / rate_time;
+			++group_ramp_time_over;
+		}
+		ss->prev_iops = td_iops;
+		ss->prev_bytes = td_bytes;
+
+		if (td->o.group_reporting && !(ss->state & __FIO_SS_DATA))
+			continue;
+
+		/*
+		 * Don't begin checking criterion until ss->ramp_time is over
+		 * for at least one thread in group
+		 */
+		if (!group_ramp_time_over)
+			continue;
+
+		dprint(FD_STEADYSTATE, "steadystate_check() thread: %d, "
+					"groupid: %u, rate_msec: %ld, "
+					"iops: %llu, bw: %llu, head: %d, tail: %d\n",
+					i, td->groupid, rate_time,
+					(unsigned long long) group_iops,
+					(unsigned long long) group_bw,
+					ss->head, ss->tail);
+
+		if (ss->state & __FIO_SS_SLOPE)
+			ret = steadystate_slope(group_iops, group_bw, td);
+		else
+			ret = steadystate_deviation(group_iops, group_bw, td);
+
+		if (ret) {
+			if (td->o.group_reporting) {
+				for_each_td(td2, j) {
+					if (td2->groupid == td->groupid) {
+						td2->ss.state |= __FIO_SS_ATTAINED;
+						fio_mark_td_terminate(td2);
+					}
+				}
+			} else {
+				ss->state |= __FIO_SS_ATTAINED;
+				fio_mark_td_terminate(td);
+			}
+		}
+	}
+}
+
+int td_steadystate_init(struct thread_data *td)
+{
+	struct steadystate_data *ss = &td->ss;
+	struct thread_options *o = &td->o;
+	struct thread_data *td2;
+	int j;
+
+	memset(ss, 0, sizeof(*ss));
+
+	if (o->ss_dur) {
+		steadystate_enabled = true;
+		o->ss_dur /= 1000000L;
+
+		/* put all steady state info in one place */
+		ss->dur = o->ss_dur;
+		ss->limit = o->ss_limit.u.f;
+		ss->ramp_time = o->ss_ramp_time;
+
+		ss->state = o->ss_state;
+		if (!td->ss.ramp_time)
+			ss->state |= __FIO_SS_RAMP_OVER;
+
+		ss->sum_x = o->ss_dur * (o->ss_dur - 1) / 2;
+		ss->sum_x_sq = (o->ss_dur - 1) * (o->ss_dur) * (2*o->ss_dur - 1) / 6;
+	}
+
+	/* make sure that ss options are consistent within reporting group */
+	for_each_td(td2, j) {
+		if (td2->groupid == td->groupid) {
+			struct steadystate_data *ss2 = &td2->ss;
+
+			if (ss2->dur != ss->dur ||
+			    ss2->limit != ss->limit ||
+			    ss2->ramp_time != ss->ramp_time ||
+			    ss2->state != ss->state ||
+			    ss2->sum_x != ss->sum_x ||
+			    ss2->sum_x_sq != ss->sum_x_sq) {
+				td_verror(td, EINVAL, "job rejected: steadystate options must be consistent within reporting groups");
+				return 1;
+			}
+		}
+	}
+
+	return 0;
+}
+
+uint64_t steadystate_bw_mean(struct thread_stat *ts)
+{
+	int i;
+	uint64_t sum;
+
+	for (i = 0, sum = 0; i < ts->ss_dur; i++)
+		sum += ts->ss_bw_data[i];
+
+	return sum / ts->ss_dur;
+}
+
+uint64_t steadystate_iops_mean(struct thread_stat *ts)
+{
+	int i;
+	uint64_t sum;
+
+	for (i = 0, sum = 0; i < ts->ss_dur; i++)
+		sum += ts->ss_iops_data[i];
+
+	return sum / ts->ss_dur;
+}
diff --git a/steadystate.h b/steadystate.h
new file mode 100644
index 0000000..20ccd30
--- /dev/null
+++ b/steadystate.h
@@ -0,0 +1,61 @@
+#ifndef FIO_STEADYSTATE_H
+#define FIO_STEADYSTATE_H
+
+#include "stat.h"
+#include "thread_options.h"
+#include "lib/ieee754.h"
+
+extern void steadystate_check(void);
+extern void steadystate_setup(void);
+extern int td_steadystate_init(struct thread_data *);
+extern uint64_t steadystate_bw_mean(struct thread_stat *);
+extern uint64_t steadystate_iops_mean(struct thread_stat *);
+
+extern bool steadystate_enabled;
+
+struct steadystate_data {
+	double limit;
+	unsigned long long dur;
+	unsigned long long ramp_time;
+
+	uint32_t state;
+
+	unsigned int head;
+	unsigned int tail;
+	uint64_t *iops_data;
+	uint64_t *bw_data;
+
+	double slope;
+	double deviation;
+	double criterion;
+
+	uint64_t sum_y;
+	uint64_t sum_x;
+	uint64_t sum_x_sq;
+	uint64_t sum_xy;
+	uint64_t oldest_y;
+
+	struct timeval prev_time;
+	uint64_t prev_iops;
+	uint64_t prev_bytes;
+};
+
+enum {
+	__FIO_SS_IOPS		= 1,
+	__FIO_SS_BW		= 2,
+	__FIO_SS_SLOPE		= 4,
+	__FIO_SS_ATTAINED	= 8,
+	__FIO_SS_RAMP_OVER	= 16,
+	__FIO_SS_DATA		= 32,
+	__FIO_SS_PCT		= 64,
+	__FIO_SS_BUFFER_FULL	= 128,
+
+	FIO_SS_IOPS		= __FIO_SS_IOPS,
+	FIO_SS_IOPS_SLOPE	= __FIO_SS_IOPS | __FIO_SS_SLOPE,
+	FIO_SS_BW		= __FIO_SS_BW,
+	FIO_SS_BW_SLOPE		= __FIO_SS_BW | __FIO_SS_SLOPE,
+};
+
+#define STEADYSTATE_MSEC	1000
+
+#endif
diff --git a/thread_options.h b/thread_options.h
index 8ec6b97..dd5b9ef 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -170,6 +170,10 @@ struct thread_options {
 	unsigned long long start_delay_high;
 	unsigned long long timeout;
 	unsigned long long ramp_time;
+	unsigned int ss_state;
+	fio_fp64_t ss_limit;
+	unsigned long long ss_dur;
+	unsigned long long ss_ramp_time;
 	unsigned int overwrite;
 	unsigned int bw_avg_time;
 	unsigned int iops_avg_time;
@@ -434,6 +438,10 @@ struct thread_options_pack {
 	uint64_t start_delay_high;
 	uint64_t timeout;
 	uint64_t ramp_time;
+	uint64_t ss_dur;
+	uint64_t ss_ramp_time;
+	uint32_t ss_state;
+	fio_fp64_t ss_limit;
 	uint32_t overwrite;
 	uint32_t bw_avg_time;
 	uint32_t iops_avg_time;
@@ -494,6 +502,7 @@ struct thread_options_pack {
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
 	uint32_t percentile_precision;
+	uint32_t padding;	/* REMOVE ME when possible to maintain alignment */
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
new file mode 100755
index 0000000..a8e4e39
--- /dev/null
+++ b/unit_tests/steadystate_tests.py
@@ -0,0 +1,222 @@
+#!/usr/bin/python
+#
+# steadystate_tests.py
+#
+# Test option parsing and functionality for fio's steady state detection feature.
+#
+# steadystate_tests.py ./fio [--read file-for-read-testing] [--write file-for-write-testing]
+#
+# REQUIREMENTS
+# Python 2.7
+# SciPy
+#
+# KNOWN ISSUES
+# only option parsing and read tests are carried out
+# On Windows this script works under Cygwin but not from cmd.exe
+# On Windows, fio frequently fails to generate JSON output (nothing to decode)
+# min runtime:
+# if ss attained: min runtime = ss_dur + ss_ramp
+# if not attained: runtime = timeout
+
+import os
+import sys
+import json
+import uuid
+import pprint
+import argparse
+import subprocess
+from scipy import stats
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('fio',
+                        help='path to fio executable');
+    parser.add_argument('--read',
+                        help='target for read testing')
+    parser.add_argument('--write',
+                        help='target for write testing')
+    args = parser.parse_args()
+
+    return args
+
+
+def check(data, iops, slope, pct, limit, dur, criterion):
+    measurement = 'iops' if iops else 'bw'
+    data = data[measurement]
+    mean = sum(data) / len(data)
+    if slope:
+        x = range(len(data))
+        m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
+        m = abs(m)
+        if pct:
+            target = m / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = m
+    else:
+        maxdev = 0
+        for x in data:
+            maxdev = max(abs(mean-x), maxdev)
+        if pct:
+            target = maxdev / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = maxdev
+
+    criterion = float(criterion)
+    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    pp = pprint.PrettyPrinter(indent=4)
+
+#
+# test option parsing
+#
+    parsing = [ { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10", "--ss_ramp=5"],
+                  'output': "set steady state IOPS threshold to 10.000000" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:12", "--ss_ramp=5"],
+                  'output': "set steady state BW threshold to 12" },
+              ]
+    for test in parsing:
+        output = subprocess.check_output([args.fio] + test['args']);
+        if test['output'] in output:
+            print "PASSED '{0}' found with arguments {1}".format(test['output'], test['args'])
+        else:
+            print "FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args'])
+
+#
+# test some read workloads
+#
+# if ss active and attained,
+#   check that runtime is less than job time
+#   check criteria
+#   how to check ramp time?
+#
+# if ss inactive
+#   check that runtime is what was specified
+#
+    reads = [ {'s': True, 'timeout': 100, 'numjobs': 1, 'ss_dur': 5, 'ss_ramp': 3, 'iops': True, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': False, 'timeout': 20, 'numjobs': 2},
+              {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+            ]
+
+    if args.read == None:
+        if os.name == 'posix':
+            args.read = '/dev/zero'
+            extra = [ "--size=128M" ]
+        else:
+            print "ERROR: file for read testing must be specified on non-posix systems"
+            sys.exit(1)
+    else:
+        extra = []
+
+    jobnum = 0
+    for job in reads:
+
+        tf = uuid.uuid4().hex
+        parameters = [ "--name=job{0}".format(jobnum) ]
+        parameters.extend(extra)
+        parameters.extend([ "--thread",
+                            "--output-format=json",
+                            "--output={0}".format(tf),
+                            "--filename={0}".format(args.read),
+                            "--rw=randrw",
+                            "--rwmixread=100",
+                            "--stonewall",
+                            "--group_reporting",
+                            "--numjobs={0}".format(job['numjobs']),
+                            "--time_based",
+                            "--runtime={0}".format(job['timeout']) ])
+        if job['s']:
+           if job['iops']:
+               ss = 'iops'
+           else:
+               ss = 'bw'
+           if job['slope']:
+               ss += "_slope"
+           ss += ":" + str(job['ss_limit'])
+           if job['pct']:
+               ss += '%'
+           parameters.extend([ '--ss_dur={0}'.format(job['ss_dur']),
+                               '--ss={0}'.format(ss),
+                               '--ss_ramp={0}'.format(job['ss_ramp']) ])
+
+        output = subprocess.call([args.fio] + parameters)
+        with open(tf, 'r') as source:
+            jsondata = json.loads(source.read())
+        os.remove(tf)
+
+        for jsonjob in jsondata['jobs']:
+            line = "job {0}".format(jsonjob['job options']['name'])
+            if job['s']:
+                if jsonjob['steadystate']['attained'] == 1:
+                    # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
+                    mintime = (job['ss_dur'] + job['ss_ramp']) * 1000
+                    actual = jsonjob['read']['runtime']
+                    if mintime > actual:
+                        line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
+                    else:
+                        line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                        else:
+                            if met:
+                                line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
+                            else:
+                                line = 'FAILED ' + line + ' target {0} >= limit {1} but fio reports ss attained '.format(target, job['ss_limit'])
+                else:
+                    # check runtime, confirm criterion calculation, and confirm that criterion was not met
+                    expected = job['timeout'] * 1000
+                    actual = jsonjob['read']['runtime']
+                    if abs(expected - actual) > 10:
+                        line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
+                    else:
+                        line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
+                                line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                            else:
+                                line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
+                        else:
+                            if met:
+                                line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                            else:
+                                line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
+            else:
+                expected = job['timeout'] * 1000
+                actual = jsonjob['read']['runtime']
+                if abs(expected - actual) < 10:
+                    result = 'PASSED '
+                else:
+                    result = 'FAILED '
+                line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
+            print line
+            if 'steadystate' in jsonjob:
+                pp.pprint(jsonjob['steadystate'])
+        jobnum += 1


* Recent changes (master)
@ 2016-12-16 13:00 Jens Axboe
From: Jens Axboe @ 2016-12-16 13:00 UTC
  To: fio

The following changes since commit 13e0f06b805eb0bb3a100ed710c7da18684c8950:

  Change misleading error message for invalid size= value (2016-12-13 09:32:59 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0d60927f167d318a685b9e5309bb392c624776e4:

  fio: move _FORTIFY_SOURCE to only when optimization is turned on (2016-12-15 14:55:00 -0700)

----------------------------------------------------------------
Dave Jiang (2):
      fio: add device dax engine
      fio: move _FORTIFY_SOURCE to only when optimization is turned on

 Makefile          |   7 +-
 configure         |  11 ++
 engines/dev-dax.c | 368 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 384 insertions(+), 2 deletions(-)
 create mode 100644 engines/dev-dax.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index b3a12dd..d27380b 100644
--- a/Makefile
+++ b/Makefile
@@ -20,7 +20,7 @@ all:
 include config-host.mak
 endif
 
-DEBUGFLAGS = -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DFIO_INC_DEBUG
+DEBUGFLAGS = -DFIO_INC_DEBUG
 CPPFLAGS= -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL $(DEBUGFLAGS)
 OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
@@ -29,7 +29,7 @@ PROGS	= fio
 SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/fio_latency2csv.py tools/hist/fiologparser_hist.py)
 
 ifndef CONFIG_FIO_NO_OPT
-  CFLAGS += -O3
+  CFLAGS += -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2
 endif
 
 ifdef CONFIG_GFIO
@@ -127,6 +127,9 @@ endif
 ifdef CONFIG_PMEMBLK
   SOURCE += engines/pmemblk.c
 endif
+ifdef CONFIG_LINUX_DEVDAX
+  SOURCE += engines/dev-dax.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
diff --git a/configure b/configure
index 833b6d3..03bed3b 100755
--- a/configure
+++ b/configure
@@ -136,6 +136,7 @@ exit_val=0
 gfio_check="no"
 libhdfs="no"
 pmemblk="no"
+devdax="no"
 disable_lex=""
 prefix=/usr/local
 
@@ -174,6 +175,8 @@ for opt do
   ;;
   --enable-pmemblk) pmemblk="yes"
   ;;
+  --enable-devdax) devdax="yes"
+  ;;
   --disable-lex) disable_lex="yes"
   ;;
   --enable-lex) disable_lex="no"
@@ -205,6 +208,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-gfapi        Disable gfapi"
   echo "--enable-libhdfs       Enable hdfs support"
   echo "--enable-pmemblk       Enable NVML libpmemblk support"
+  echo "--enable-devdax        Enable NVM Device Dax support"
   echo "--disable-lex          Disable use of lex/yacc for math"
   echo "--enable-lex           Enable use of lex/yacc for math"
   echo "--disable-shm          Disable SHM support"
@@ -1566,6 +1570,10 @@ echo "MTD                           $mtd"
 # Report whether pmemblk engine is enabled
 echo "NVML libpmemblk engine        $pmemblk"
 
+##########################################
+# Report whether dev-dax engine is enabled
+echo "NVM Device Dax engine         $devdax"
+
 # Check if we have lex/yacc available
 yacc="no"
 yacc_is_bison="no"
@@ -1917,6 +1925,9 @@ fi
 if test "$pmemblk" = "yes" ; then
   output_sym "CONFIG_PMEMBLK"
 fi
+if test "$devdax" = "yes" ; then
+  output_sym "CONFIG_LINUX_DEVDAX"
+fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
diff --git a/engines/dev-dax.c b/engines/dev-dax.c
new file mode 100644
index 0000000..6372576
--- /dev/null
+++ b/engines/dev-dax.c
@@ -0,0 +1,368 @@
+/*
+ * device DAX engine
+ *
+ * IO engine that reads/writes from files by doing memcpy to/from
+ * a memory-mapped region of a DAX-enabled device.
+ *
+ * Copyright (C) 2016 Intel Corp
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+/*
+ * device dax engine
+ * IO engine that accesses a DAX device directly to read and write data
+ *
+ * To use:
+ *   ioengine=dev-dax
+ *
+ *   Other relevant settings:
+ *     iodepth=1
+ *     direct=0	   REQUIRED
+ *     filename=/dev/daxN.N
+ *     bs=2m
+ *
+ *     direct should be left at 0. Using dev-dax already implies that memory
+ *     access is direct, and dev-dax does not support the O_DIRECT flag by
+ *     design since it is not necessary.
+ *
+ *     bs should, at a minimum, adhere to the device dax alignment.
+ *
+ * libpmem.so
+ *   By default, the dev-dax engine will let the system find the libpmem.so
+ *   that it uses. You can use an alternative libpmem by setting the
+ *   FIO_PMEM_LIB environment variable to the full path to the desired
+ *   libpmem.so.
+ */
+
+#include <stdio.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <dlfcn.h>
+#include <libgen.h>
+
+#include "../fio.h"
+#include "../verify.h"
+
+/*
+ * Limit us to 1GB of mapped files in total, modeled after the
+ * mmap engine's behavior
+ */
+#define MMAP_TOTAL_SZ	(1 * 1024 * 1024 * 1024UL)
+
+struct fio_devdax_data {
+	void *devdax_ptr;
+	size_t devdax_sz;
+	off_t devdax_off;
+};
+
+static void * (*pmem_memcpy_persist)(void *dest, const void *src, size_t len);
+
+static int fio_devdax_file(struct thread_data *td, struct fio_file *f,
+			   size_t length, off_t off)
+{
+	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
+	int flags = 0;
+
+	if (td_rw(td))
+		flags = PROT_READ | PROT_WRITE;
+	else if (td_write(td)) {
+		flags = PROT_WRITE;
+
+		if (td->o.verify != VERIFY_NONE)
+			flags |= PROT_READ;
+	} else
+		flags = PROT_READ;
+
+	fdd->devdax_ptr = mmap(NULL, length, flags, MAP_SHARED, f->fd, off);
+	if (fdd->devdax_ptr == MAP_FAILED) {
+		fdd->devdax_ptr = NULL;
+		td_verror(td, errno, "mmap");
+	}
+
+	if (td->error && fdd->devdax_ptr)
+		munmap(fdd->devdax_ptr, length);
+
+	return td->error;
+}
+
+/*
+ * Just mmap an appropriate portion, we cannot mmap the full extent
+ */
+static int fio_devdax_prep_limited(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
+
+	if (io_u->buflen > f->real_file_size) {
+		log_err("fio: bs too big for dev-dax engine\n");
+		return EIO;
+	}
+
+	fdd->devdax_sz = min(MMAP_TOTAL_SZ, f->real_file_size);
+	if (fdd->devdax_sz > f->io_size)
+		fdd->devdax_sz = f->io_size;
+
+	fdd->devdax_off = io_u->offset;
+
+	return fio_devdax_file(td, f, fdd->devdax_sz, fdd->devdax_off);
+}
+
+/*
+ * Attempt to mmap the entire file
+ */
+static int fio_devdax_prep_full(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
+	int ret;
+
+	if (fio_file_partial_mmap(f))
+		return EINVAL;
+
+	if (io_u->offset != (size_t) io_u->offset ||
+	    f->io_size != (size_t) f->io_size) {
+		fio_file_set_partial_mmap(f);
+		return EINVAL;
+	}
+
+	fdd->devdax_sz = f->io_size;
+	fdd->devdax_off = 0;
+
+	ret = fio_devdax_file(td, f, fdd->devdax_sz, fdd->devdax_off);
+	if (ret)
+		fio_file_set_partial_mmap(f);
+
+	return ret;
+}
+
+static int fio_devdax_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
+	int ret;
+
+	/*
+	 * It fits within existing mapping, use it
+	 */
+	if (io_u->offset >= fdd->devdax_off &&
+	    io_u->offset + io_u->buflen < fdd->devdax_off + fdd->devdax_sz)
+		goto done;
+
+	/*
+	 * unmap any existing mapping
+	 */
+	if (fdd->devdax_ptr) {
+		if (munmap(fdd->devdax_ptr, fdd->devdax_sz) < 0)
+			return errno;
+		fdd->devdax_ptr = NULL;
+	}
+
+	if (fio_devdax_prep_full(td, io_u)) {
+		td_clear_error(td);
+		ret = fio_devdax_prep_limited(td, io_u);
+		if (ret)
+			return ret;
+	}
+
+done:
+	io_u->mmap_data = fdd->devdax_ptr + io_u->offset - fdd->devdax_off -
+				f->file_offset;
+	return 0;
+}
+
+static int fio_devdax_queue(struct thread_data *td, struct io_u *io_u)
+{
+	fio_ro_check(td, io_u);
+	io_u->error = 0;
+
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		memcpy(io_u->xfer_buf, io_u->mmap_data, io_u->xfer_buflen);
+		break;
+	case DDIR_WRITE:
+		pmem_memcpy_persist(io_u->mmap_data, io_u->xfer_buf,
+				    io_u->xfer_buflen);
+		break;
+	case DDIR_SYNC:
+	case DDIR_DATASYNC:
+	case DDIR_SYNC_FILE_RANGE:
+		break;
+	default:
+		io_u->error = EINVAL;
+		break;
+	}
+
+	return FIO_Q_COMPLETED;
+}
+
+static int fio_devdax_init(struct thread_data *td)
+{
+	struct thread_options *o = &td->o;
+	const char *path;
+	void *dl;
+
+	if ((o->rw_min_bs & page_mask) &&
+	    (o->fsync_blocks || o->fdatasync_blocks)) {
+		log_err("fio: mmap options dictate a minimum block size of "
+			"%llu bytes\n", (unsigned long long) page_size);
+		return 1;
+	}
+
+	path = getenv("FIO_PMEM_LIB");
+	if (!path)
+		path = "libpmem.so";
+
+	dl = dlopen(path, RTLD_NOW | RTLD_NODELETE);
+	if (!dl) {
+		log_err("fio: unable to open libpmem: %s\n", dlerror());
+		return 1;
+	}
+
+	pmem_memcpy_persist = dlsym(dl, "pmem_memcpy_persist");
+	if (!pmem_memcpy_persist) {
+		log_err("fio: unable to load libpmem: %s\n", dlerror());
+		return 1;
+	}
+
+	return 0;
+}
+
+static int fio_devdax_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_devdax_data *fdd;
+	int ret;
+
+	ret = generic_open_file(td, f);
+	if (ret)
+		return ret;
+
+	fdd = calloc(1, sizeof(*fdd));
+	if (!fdd) {
+		int fio_unused __ret;
+		__ret = generic_close_file(td, f);
+		return 1;
+	}
+
+	FILE_SET_ENG_DATA(f, fdd);
+
+	return 0;
+}
+
+static int fio_devdax_close_file(struct thread_data *td, struct fio_file *f)
+{
+	struct fio_devdax_data *fdd = FILE_ENG_DATA(f);
+
+	FILE_SET_ENG_DATA(f, NULL);
+	free(fdd);
+	fio_file_clear_partial_mmap(f);
+
+	return generic_close_file(td, f);
+}
+
+static int
+fio_devdax_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	char spath[PATH_MAX];
+	char npath[PATH_MAX];
+	char *rpath;
+	FILE *sfile;
+	uint64_t size;
+	struct stat st;
+	int rc;
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	if (f->filetype != FIO_TYPE_CHAR)
+		return -EINVAL;
+
+	rc = stat(f->file_name, &st);
+	if (rc < 0) {
+		log_err("%s: failed to stat file %s: %d\n",
+			td->o.name, f->file_name, errno);
+		return -errno;
+	}
+
+	snprintf(spath, PATH_MAX, "/sys/dev/char/%d:%d/subsystem",
+		 major(st.st_rdev), minor(st.st_rdev));
+
+	rpath = realpath(spath, npath);
+	if (!rpath) {
+		log_err("%s: realpath on %s failed: %d\n",
+			td->o.name, spath, errno);
+		return -errno;
+	}
+
+	/* check if DAX device */
+	if (strcmp("/sys/class/dax", rpath)) {
+		log_err("%s: %s not a DAX device!\n",
+			td->o.name, f->file_name);
+	}
+
+	snprintf(spath, PATH_MAX, "/sys/dev/char/%d:%d/size",
+		 major(st.st_rdev), minor(st.st_rdev));
+
+	sfile = fopen(spath, "r");
+	if (!sfile) {
+		log_err("%s: fopen on %s failed: %d\n",
+			td->o.name, spath, errno);
+		return 1;
+	}
+
+	rc = fscanf(sfile, "%" SCNu64, &size);
+	if (rc != 1) {
+		log_err("%s: fscanf on %s failed: %d\n",
+			td->o.name, spath, errno);
+		return 1;
+	}
+
+	f->real_file_size = size;
+
+	fclose(sfile);
+
+	if (f->file_offset > f->real_file_size) {
+		log_err("%s: offset extends end (%llu > %llu)\n", td->o.name,
+					(unsigned long long) f->file_offset,
+					(unsigned long long) f->real_file_size);
+		return 1;
+	}
+
+	fio_file_set_size_known(f);
+	return 0;
+}
+
+static struct ioengine_ops ioengine = {
+	.name		= "dev-dax",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= fio_devdax_init,
+	.prep		= fio_devdax_prep,
+	.queue		= fio_devdax_queue,
+	.open_file	= fio_devdax_open_file,
+	.close_file	= fio_devdax_close_file,
+	.get_file_size	= fio_devdax_get_file_size,
+	.flags		= FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
+};
+
+static void fio_init fio_devdax_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_devdax_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
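The size lookup in `fio_devdax_get_file_size()` boils down to parsing one unsigned decimal integer from a sysfs attribute such as `/sys/dev/char/MAJ:MIN/size`. The sketch below shows that step in isolation; the helper name `read_sysfs_u64()` is hypothetical. It uses the portable `SCNu64` conversion for a `uint64_t`, closes the file on every path, and treats anything other than exactly one converted item as failure, since `fscanf()` returns 0 (not a negative value) when the input does not start with a number.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <errno.h>

/*
 * Hypothetical helper: read a single unsigned 64-bit decimal value from
 * a text file such as a sysfs "size" attribute. Returns 0 on success,
 * or a negative errno-style value on failure.
 */
static int read_sysfs_u64(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	uint64_t val;
	int rc;

	if (!f)
		return -errno;

	/* fscanf() returns EOF on input failure and 0 on a matching failure */
	rc = fscanf(f, "%" SCNu64, &val);
	fclose(f);	/* close on every path, including errors */

	if (rc != 1)
		return -EINVAL;

	*out = val;
	return 0;
}
```

A caller would pass the `snprintf()`-built sysfs path and store the result in `f->real_file_size` only on a 0 return.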

* Recent changes (master)
@ 2016-12-14 13:00 Jens Axboe
From: Jens Axboe @ 2016-12-14 13:00 UTC
  To: fio

The following changes since commit a79f17bf3bfa20b83424c2301de092bdcfbaaea4:

  iolog: add support for replay_scale option (2016-12-12 22:23:28 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 13e0f06b805eb0bb3a100ed710c7da18684c8950:

  Change misleading error message for invalid size= value (2016-12-13 09:32:59 -0700)

----------------------------------------------------------------
Jakub Sitnicki (1):
      Change misleading error message for invalid size= value

 init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 382fa1f..36feb51 100644
--- a/init.c
+++ b/init.c
@@ -822,7 +822,8 @@ static int fixup_options(struct thread_data *td)
 	 * If size is set but less than the min block size, complain
 	 */
 	if (o->size && o->size < td_min_bs(td)) {
-		log_err("fio: size too small, must be larger than the IO size: %llu\n", (unsigned long long) o->size);
+		log_err("fio: size too small, must not be less than minimum block size: %llu < %u\n",
+			(unsigned long long) o->size, td_min_bs(td));
 		ret = 1;
 	}
 

* Recent changes (master)
@ 2016-12-13 13:00 Jens Axboe
From: Jens Axboe @ 2016-12-13 13:00 UTC
  To: fio

The following changes since commit 487197d9e8f3aa0f135a6d88e5f222a1a930723a:

  mmap engine: remove unused variable mmap_map_mask (2016-12-05 09:48:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a79f17bf3bfa20b83424c2301de092bdcfbaaea4:

  iolog: add support for replay_scale option (2016-12-12 22:23:28 +0000)

----------------------------------------------------------------
Sitsofe Wheeler (3):
      iolog: Ignore re-add/re-open with replay_redirect
      blktrace: Fix replay_align 32 bit truncation
      iolog: add support for replay_scale option

 blktrace.c | 13 ++-----------
 iolog.c    | 22 +++++++++++++++++++---
 iolog.h    |  8 ++++++++
 3 files changed, 29 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/blktrace.c b/blktrace.c
index deb8b2d..a3474cb 100644
--- a/blktrace.c
+++ b/blktrace.c
@@ -216,15 +216,6 @@ static void t_bytes_align(struct thread_options *o, struct blk_io_trace *t)
 	t->bytes = (t->bytes + o->replay_align - 1) & ~(o->replay_align - 1);
 }
 
-static void ipo_bytes_align(struct thread_options *o, struct io_piece *ipo)
-{
-	if (!o->replay_align)
-		return;
-
-	ipo->offset &= ~(o->replay_align - 1);
-}
-
-
 /*
  * Store blk_io_trace data in an ipo for later retrieval.
  */
@@ -239,7 +230,7 @@ static void store_ipo(struct thread_data *td, unsigned long long offset,
 	ipo->offset = offset * bs;
 	if (td->o.replay_scale)
 		ipo->offset = ipo->offset / td->o.replay_scale;
-	ipo_bytes_align(&td->o, ipo);
+	ipo_bytes_align(td->o.replay_align, ipo);
 	ipo->len = bytes;
 	ipo->delay = ttime / 1000;
 	if (rw)
@@ -297,7 +288,7 @@ static void handle_trace_discard(struct thread_data *td,
 	ipo->offset = t->sector * bs;
 	if (td->o.replay_scale)
 		ipo->offset = ipo->offset / td->o.replay_scale;
-	ipo_bytes_align(&td->o, ipo);
+	ipo_bytes_align(td->o.replay_align, ipo);
 	ipo->len = t->bytes;
 	ipo->delay = ttime / 1000;
 	ipo->ddir = DDIR_TRIM;
diff --git a/iolog.c b/iolog.c
index 2bc3e3a..9393890 100644
--- a/iolog.c
+++ b/iolog.c
@@ -109,6 +109,11 @@ static int ipo_special(struct thread_data *td, struct io_piece *ipo)
 
 	switch (ipo->file_action) {
 	case FIO_LOG_OPEN_FILE:
+		if (td->o.replay_redirect && fio_file_open(f)) {
+			dprint(FD_FILE, "iolog: ignoring re-open of file %s\n",
+					f->file_name);
+			break;
+		}
 		ret = td_io_open_file(td, f);
 		if (!ret)
 			break;
@@ -396,8 +401,14 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 		} else if (r == 2) {
 			rw = DDIR_INVAL;
 			if (!strcmp(act, "add")) {
-				fileno = add_file(td, fname, 0, 1);
-				file_action = FIO_LOG_ADD_FILE;
+				if (td->o.replay_redirect &&
+				    get_fileno(td, fname) != -1) {
+					dprint(FD_FILE, "iolog: ignoring"
+						" re-add of file %s\n", fname);
+				} else {
+					fileno = add_file(td, fname, 0, 1);
+					file_action = FIO_LOG_ADD_FILE;
+				}
 				continue;
 			} else if (!strcmp(act, "open")) {
 				fileno = get_fileno(td, fname);
@@ -443,7 +454,12 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 		if (rw == DDIR_WAIT) {
 			ipo->delay = offset;
 		} else {
-			ipo->offset = offset;
+			if (td->o.replay_scale)
+				ipo->offset = offset / td->o.replay_scale;
+			else
+				ipo->offset = offset;
+			ipo_bytes_align(td->o.replay_align, ipo);
+
 			ipo->len = bytes;
 			if (rw != DDIR_INVAL && bytes > td->o.max_bs[rw])
 				td->o.max_bs[rw] = bytes;
diff --git a/iolog.h b/iolog.h
index ee28944..60ee3e9 100644
--- a/iolog.h
+++ b/iolog.h
@@ -269,6 +269,14 @@ static inline bool inline_log(struct io_log *log)
 		log->log_type == IO_LOG_TYPE_SLAT;
 }
 
+static inline void ipo_bytes_align(unsigned int replay_align, struct io_piece *ipo)
+{
+	if (!replay_align)
+		return;
+
+	ipo->offset &= ~(replay_align - (uint64_t)1);
+}
+
 extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
 extern void flush_log(struct io_log *, bool);
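The replay path scales an offset by `replay_scale` and then rounds it down to a multiple of `replay_align`. Below is a minimal standalone sketch of that arithmetic, assuming (as the mask trick requires) that `replay_align` is a power of two; the function name `replay_offset()` is hypothetical. The early return must fire when alignment is disabled (`replay_align == 0`): if the mask were applied in that case, `~((uint64_t)0 - 1)` is 0 and every offset would collapse to 0.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the replay logic: divide by replay_scale
 * (if set), then round down to a multiple of replay_align (if set).
 * replay_align is assumed to be a power of two.
 */
static uint64_t replay_offset(uint64_t offset, unsigned int replay_scale,
			      unsigned int replay_align)
{
	if (replay_scale)
		offset /= replay_scale;

	/* Alignment disabled: leave the offset untouched */
	if (!replay_align)
		return offset;

	/* Clear the low bits; widen to 64 bits before building the mask */
	return offset & ~((uint64_t)replay_align - 1);
}
```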

* Recent changes (master)
@ 2016-12-06 13:00 Jens Axboe
From: Jens Axboe @ 2016-12-06 13:00 UTC
  To: fio

The following changes since commit 6be06c46544c19e513ff80e7b841b1de688ffc66:

  log: fix for crash with rate IO and logging (2016-12-01 21:23:47 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 487197d9e8f3aa0f135a6d88e5f222a1a930723a:

  mmap engine: remove unused variable mmap_map_mask (2016-12-05 09:48:08 -0700)

----------------------------------------------------------------
Ross Zwisler (1):
      mmap engine: remove unused variable mmap_map_mask

 engines/mmap.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

---

Diff of recent changes:

diff --git a/engines/mmap.c b/engines/mmap.c
index 14e4013..c479ed3 100644
--- a/engines/mmap.c
+++ b/engines/mmap.c
@@ -20,7 +20,6 @@
 #define MMAP_TOTAL_SZ	(1 * 1024 * 1024 * 1024UL)
 
 static unsigned long mmap_map_size;
-static unsigned long mmap_map_mask;
 
 struct fio_mmap_data {
 	void *mmap_ptr;
@@ -72,7 +71,6 @@ static int fio_mmap_file(struct thread_data *td, struct fio_file *f,
 		(void) posix_madvise(fmd->mmap_ptr, fmd->mmap_sz, FIO_MADV_FREE);
 #endif
 
-
 err:
 	if (td->error && fmd->mmap_ptr)
 		munmap(fmd->mmap_ptr, length);
@@ -208,26 +206,15 @@ static int fio_mmapio_queue(struct thread_data *td, struct io_u *io_u)
 static int fio_mmapio_init(struct thread_data *td)
 {
 	struct thread_options *o = &td->o;
-	unsigned long shift, mask;
 
-	if ((td->o.rw_min_bs & page_mask) &&
+	if ((o->rw_min_bs & page_mask) &&
 	    (o->odirect || o->fsync_blocks || o->fdatasync_blocks)) {
 		log_err("fio: mmap options dictate a minimum block size of "
 			"%llu bytes\n", (unsigned long long) page_size);
 		return 1;
 	}
 
-	mmap_map_size = MMAP_TOTAL_SZ / td->o.nr_files;
-	mask = mmap_map_size;
-	shift = 0;
-	do {
-		mask >>= 1;
-		if (!mask)
-			break;
-		shift++;
-	} while (1);
-
-	mmap_map_mask = 1UL << shift;
+	mmap_map_size = MMAP_TOTAL_SZ / o->nr_files;
 	return 0;
 }
 

* Recent changes (master)
@ 2016-12-02 13:00 Jens Axboe
From: Jens Axboe @ 2016-12-02 13:00 UTC
  To: fio

The following changes since commit 7a3b2fc3434985fa519db55e8f81734c24af274d:

  server: bump protocol version (2016-11-27 21:40:26 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6be06c46544c19e513ff80e7b841b1de688ffc66:

  log: fix for crash with rate IO and logging (2016-12-01 21:23:47 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      log: fix for crash with rate IO and logging

Vincent Fu (1):
      Fix conversion in fio_server_send_ts

 backend.c | 7 ++-----
 io_u.c    | 3 +++
 server.c  | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 8616fc2..ac71521 100644
--- a/backend.c
+++ b/backend.c
@@ -441,11 +441,8 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time)
 	int min_evts = 0;
 	int ret;
 
-	if (td->flags & TD_F_REGROW_LOGS) {
-		ret = io_u_quiesce(td);
-		regrow_logs(td);
-		return ret;
-	}
+	if (td->flags & TD_F_REGROW_LOGS)
+		return io_u_quiesce(td);
 
 	/*
 	 * if the queue is full, we MUST reap at least 1 event
diff --git a/io_u.c b/io_u.c
index 428d4b7..7420629 100644
--- a/io_u.c
+++ b/io_u.c
@@ -653,6 +653,9 @@ int io_u_quiesce(struct thread_data *td)
 			completed += ret;
 	}
 
+	if (td->flags & TD_F_REGROW_LOGS)
+		regrow_logs(td);
+
 	return completed;
 }
 
diff --git a/server.c b/server.c
index ab3e7cf..172ccc0 100644
--- a/server.c
+++ b/server.c
@@ -1541,9 +1541,9 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.latency_window	= cpu_to_le64(ts->latency_window);
 	p.ts.latency_percentile.u.i = cpu_to_le64(fio_double_to_uint64(ts->latency_percentile.u.f));
 
-	p.ts.nr_block_infos	= le64_to_cpu(ts->nr_block_infos);
+	p.ts.nr_block_infos	= cpu_to_le64(ts->nr_block_infos);
 	for (i = 0; i < p.ts.nr_block_infos; i++)
-		p.ts.block_infos[i] = le32_to_cpu(ts->block_infos[i]);
+		p.ts.block_infos[i] = cpu_to_le32(ts->block_infos[i]);
 
 	convert_gs(&p.rs, rs);
 
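The `server.c` change swaps `le64_to_cpu()`/`le32_to_cpu()` for `cpu_to_le64()`/`cpu_to_le32()` when filling an outgoing packet. Both directions perform the same byte swap (or none, on little-endian hosts), so the old code produced identical bytes; the fix states the conversion direction correctly. A portable sketch of a host-to-little-endian store and its inverse, independent of fio's macros (the helper names are hypothetical):

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical helper: serialize a host-order 64-bit value into
 * little-endian wire order, byte by byte, regardless of host endianness.
 */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (size_t i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));	/* least significant byte first */
}

/* Inverse: decode little-endian wire bytes back to a host-order value */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```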

* Recent changes (master)
@ 2016-11-28 13:00 Jens Axboe
From: Jens Axboe @ 2016-11-28 13:00 UTC
  To: fio

The following changes since commit 42f1ee68ceec87fbbfdc4972c35d3cdf7c08d9f6:

  Improve informativeness about directIO support or rather lackthereof on Solaris when errno is set to ENOTTY (2016-11-16 10:16:55 -0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7a3b2fc3434985fa519db55e8f81734c24af274d:

  server: bump protocol version (2016-11-27 21:40:26 +0000)

----------------------------------------------------------------
Sitsofe Wheeer (2):
      ioengines: Fix td->io_issues[ddir] over decrement
      Silence compiler warnings

Sitsofe Wheeler (5):
      fio: Fix (unsigned) integer overflow issues
      eta: Fix ramp time ETA
      eta: Fix ETA oddness at crossover points
      stat: Change access to io_sample union
      server: bump protocol version

 backend.c        |  7 ++++---
 cconv.c          |  8 ++++----
 client.c         | 16 ++++++++--------
 eta.c            | 38 ++++++++++++++++++++++++++------------
 fio.h            |  4 ++--
 gettime.c        |  2 +-
 init.c           |  2 +-
 io_u.c           |  3 ++-
 ioengines.c      |  1 +
 iolog.c          | 10 +++++-----
 iolog.h          | 14 ++++++++++----
 mutex.c          |  2 +-
 options.c        |  2 +-
 server.c         | 10 +++++-----
 server.h         |  2 +-
 stat.c           | 44 +++++++++++++++++++++++++-------------------
 stat.h           |  8 ++++----
 thread_options.h |  8 ++++----
 18 files changed, 105 insertions(+), 76 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 60cea3c..8616fc2 100644
--- a/backend.c
+++ b/backend.c
@@ -1864,8 +1864,8 @@ static void dump_td_info(struct thread_data *td)
 /*
  * Run over the job map and reap the threads that have exited, if any.
  */
-static void reap_threads(unsigned int *nr_running, unsigned int *t_rate,
-			 unsigned int *m_rate)
+static void reap_threads(unsigned int *nr_running, uint64_t *t_rate,
+			 uint64_t *m_rate)
 {
 	struct thread_data *td;
 	unsigned int cputhreads, realthreads, pending;
@@ -2103,7 +2103,8 @@ static bool waitee_running(struct thread_data *me)
 static void run_threads(struct sk_out *sk_out)
 {
 	struct thread_data *td;
-	unsigned int i, todo, nr_running, m_rate, t_rate, nr_started;
+	unsigned int i, todo, nr_running, nr_started;
+	uint64_t m_rate, t_rate;
 	uint64_t spent;
 
 	if (fio_gtod_offload && fio_start_gtod_thread())
diff --git a/cconv.c b/cconv.c
index 6e0f609..0032cc0 100644
--- a/cconv.c
+++ b/cconv.c
@@ -131,8 +131,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 		}
 
 		o->rwmix[i] = le32_to_cpu(top->rwmix[i]);
-		o->rate[i] = le32_to_cpu(top->rate[i]);
-		o->ratemin[i] = le32_to_cpu(top->ratemin[i]);
+		o->rate[i] = le64_to_cpu(top->rate[i]);
+		o->ratemin[i] = le64_to_cpu(top->ratemin[i]);
 		o->rate_iops[i] = le32_to_cpu(top->rate_iops[i]);
 		o->rate_iops_min[i] = le32_to_cpu(top->rate_iops_min[i]);
 
@@ -505,8 +505,8 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 		}
 
 		top->rwmix[i] = cpu_to_le32(o->rwmix[i]);
-		top->rate[i] = cpu_to_le32(o->rate[i]);
-		top->ratemin[i] = cpu_to_le32(o->ratemin[i]);
+		top->rate[i] = cpu_to_le64(o->rate[i]);
+		top->ratemin[i] = cpu_to_le64(o->ratemin[i]);
 		top->rate_iops[i] = cpu_to_le32(o->rate_iops[i]);
 		top->rate_iops_min[i] = cpu_to_le32(o->rate_iops_min[i]);
 
diff --git a/client.c b/client.c
index 9698122..c613887 100644
--- a/client.c
+++ b/client.c
@@ -1135,11 +1135,11 @@ static void convert_jobs_eta(struct jobs_eta *je)
 	je->files_open		= le32_to_cpu(je->files_open);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		je->m_rate[i]	= le32_to_cpu(je->m_rate[i]);
-		je->t_rate[i]	= le32_to_cpu(je->t_rate[i]);
+		je->m_rate[i]	= le64_to_cpu(je->m_rate[i]);
+		je->t_rate[i]	= le64_to_cpu(je->t_rate[i]);
 		je->m_iops[i]	= le32_to_cpu(je->m_iops[i]);
 		je->t_iops[i]	= le32_to_cpu(je->t_iops[i]);
-		je->rate[i]	= le32_to_cpu(je->rate[i]);
+		je->rate[i]	= le64_to_cpu(je->rate[i]);
 		je->iops[i]	= le32_to_cpu(je->iops[i]);
 	}
 
@@ -1276,7 +1276,7 @@ static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *sample
 		s = (struct io_sample *)((char *)__get_sample(samples, log_offset, i) +
 			i * sizeof(struct io_u_plat_entry));
 
-		entry = s->plat_entry;
+		entry = s->data.plat_entry;
 		io_u_plat = entry->io_u_plat;
 
 		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
@@ -1552,7 +1552,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 			s = (struct io_sample *)((void *)s + sizeof(struct io_u_plat_entry) * i);
 
 		s->time		= le64_to_cpu(s->time);
-		s->val		= le64_to_cpu(s->val);
+		s->data.val	= le64_to_cpu(s->data.val);
 		s->__ddir	= le32_to_cpu(s->__ddir);
 		s->bs		= le32_to_cpu(s->bs);
 
@@ -1563,9 +1563,9 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 		}
 
 		if (ret->log_type == IO_LOG_TYPE_HIST) {
-			s->plat_entry = (struct io_u_plat_entry *)(((void *)s) + sizeof(*s));
-			s->plat_entry->list.next = NULL;
-			s->plat_entry->list.prev = NULL;
+			s->data.plat_entry = (struct io_u_plat_entry *)(((void *)s) + sizeof(*s));
+			s->data.plat_entry->list.next = NULL;
+			s->data.plat_entry->list.prev = NULL;
 		}
 	}
 
diff --git a/eta.c b/eta.c
index 3c1aeee..19afad5 100644
--- a/eta.c
+++ b/eta.c
@@ -225,7 +225,11 @@ static unsigned long thread_eta(struct thread_data *td)
 			}
 		}
 
-		eta_sec = (unsigned long) (elapsed * (1.0 / perc)) - elapsed;
+		if (perc == 0.0) {
+			eta_sec = timeout;
+		} else {
+			eta_sec = (unsigned long) (elapsed * (1.0 / perc)) - elapsed;
+		}
 
 		if (td->o.timeout &&
 		    eta_sec > (timeout + done_secs - elapsed))
@@ -235,7 +239,7 @@ static unsigned long thread_eta(struct thread_data *td)
 			|| td->runstate == TD_SETTING_UP
 			|| td->runstate == TD_RAMP
 			|| td->runstate == TD_PRE_READING) {
-		int t_eta = 0, r_eta = 0;
+		int64_t t_eta = 0, r_eta = 0;
 		unsigned long long rate_bytes;
 
 		/*
@@ -247,7 +251,10 @@ static unsigned long thread_eta(struct thread_data *td)
 			uint64_t start_delay = td->o.start_delay;
 			uint64_t ramp_time = td->o.ramp_time;
 
-			t_eta = __timeout + start_delay + ramp_time;
+			t_eta = __timeout + start_delay;
+			if (!td->ramp_time_over) {
+				t_eta += ramp_time;
+			}
 			t_eta /= 1000000ULL;
 
 			if ((td->runstate == TD_RAMP) && in_ramp_time(td)) {
@@ -259,9 +266,16 @@ static unsigned long thread_eta(struct thread_data *td)
 					t_eta -= ramp_left;
 			}
 		}
-		rate_bytes = ddir_rw_sum(td->o.rate);
+		rate_bytes = 0;
+		if (td_read(td))
+			rate_bytes  = td->o.rate[DDIR_READ];
+		if (td_write(td))
+			rate_bytes += td->o.rate[DDIR_WRITE];
+		if (td_trim(td))
+			rate_bytes += td->o.rate[DDIR_TRIM];
+
 		if (rate_bytes) {
-			r_eta = (bytes_total / 1024) / rate_bytes;
+			r_eta = bytes_total / rate_bytes;
 			r_eta += (td->o.start_delay / 1000000ULL);
 		}
 
@@ -285,7 +299,7 @@ static unsigned long thread_eta(struct thread_data *td)
 
 static void calc_rate(int unified_rw_rep, unsigned long mtime,
 		      unsigned long long *io_bytes,
-		      unsigned long long *prev_io_bytes, unsigned int *rate)
+		      unsigned long long *prev_io_bytes, uint64_t *rate)
 {
 	int i;
 
@@ -341,7 +355,7 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 {
 	struct thread_data *td;
 	int i, unified_rw_rep;
-	unsigned long rate_time, disp_time, bw_avg_time, *eta_secs;
+	uint64_t rate_time, disp_time, bw_avg_time, *eta_secs;
 	unsigned long long io_bytes[DDIR_RWDIR_CNT];
 	unsigned long long io_iops[DDIR_RWDIR_CNT];
 	struct timeval now;
@@ -367,8 +381,8 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 	if (!ddir_rw_sum(disp_io_bytes))
 		fill_start_time(&disp_prev_time);
 
-	eta_secs = malloc(thread_number * sizeof(unsigned long));
-	memset(eta_secs, 0, thread_number * sizeof(unsigned long));
+	eta_secs = malloc(thread_number * sizeof(uint64_t));
+	memset(eta_secs, 0, thread_number * sizeof(uint64_t));
 
 	je->elapsed_sec = (mtime_since_genesis() + 999) / 1000;
 
@@ -468,9 +482,9 @@ bool calc_thread_status(struct jobs_eta *je, int force)
 		calc_rate(unified_rw_rep, rate_time, io_bytes, rate_io_bytes,
 				je->rate);
 		memcpy(&rate_prev_time, &now, sizeof(now));
-		add_agg_sample(je->rate[DDIR_READ], DDIR_READ, 0);
-		add_agg_sample(je->rate[DDIR_WRITE], DDIR_WRITE, 0);
-		add_agg_sample(je->rate[DDIR_TRIM], DDIR_TRIM, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_READ]), DDIR_READ, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_WRITE]), DDIR_WRITE, 0);
+		add_agg_sample(sample_val(je->rate[DDIR_TRIM]), DDIR_TRIM, 0);
 	}
 
 	disp_time = mtime_since(&disp_prev_time, &now);
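The `thread_eta()` changes guard the progress-based estimate against a zero completion fraction. If a fraction `perc` of the work took `elapsed` seconds, the whole job takes `elapsed / perc`, so the remaining time is that minus `elapsed`. A minimal sketch of that formula with the same fallback to `timeout`; the standalone signature and name `progress_eta()` are assumptions, not fio's API.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of the progress-based ETA.
 * perc is the completed fraction (0..1]; elapsed and timeout are seconds.
 * With no progress yet (perc == 0), fall back to the configured timeout
 * instead of dividing by zero.
 */
static uint64_t progress_eta(uint64_t elapsed, double perc, uint64_t timeout)
{
	if (perc == 0.0)
		return timeout;

	return (uint64_t)(elapsed * (1.0 / perc)) - elapsed;
}
```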
diff --git a/fio.h b/fio.h
index 74c1b30..7e32788 100644
--- a/fio.h
+++ b/fio.h
@@ -269,10 +269,10 @@ struct thread_data {
 	 * Rate state
 	 */
 	uint64_t rate_bps[DDIR_RWDIR_CNT];
-	unsigned long rate_next_io_time[DDIR_RWDIR_CNT];
+	uint64_t rate_next_io_time[DDIR_RWDIR_CNT];
 	unsigned long rate_bytes[DDIR_RWDIR_CNT];
 	unsigned long rate_blocks[DDIR_RWDIR_CNT];
-	unsigned long rate_io_issue_bytes[DDIR_RWDIR_CNT];
+	unsigned long long rate_io_issue_bytes[DDIR_RWDIR_CNT];
 	struct timeval lastrate[DDIR_RWDIR_CNT];
 	int64_t last_usec;
 	struct frand_state poisson_state;
diff --git a/gettime.c b/gettime.c
index 73b48b0..85ba7cb 100644
--- a/gettime.c
+++ b/gettime.c
@@ -381,7 +381,7 @@ void fio_clock_init(void)
 
 uint64_t utime_since(const struct timeval *s, const struct timeval *e)
 {
-	long sec, usec;
+	int64_t sec, usec;
 
 	sec = e->tv_sec - s->tv_sec;
 	usec = e->tv_usec - s->tv_usec;
diff --git a/init.c b/init.c
index d8c0bd1..382fa1f 100644
--- a/init.c
+++ b/init.c
@@ -1679,7 +1679,7 @@ static int is_empty_or_comment(char *line)
 /*
  * This is our [ini] type file parser.
  */
-int __parse_jobs_ini(struct thread_data *td,
+static int __parse_jobs_ini(struct thread_data *td,
 		char *file, int is_buf, int stonewall_flag, int type,
 		int nested, char *name, char ***popts, int *aopts, int *nopts)
 {
diff --git a/io_u.c b/io_u.c
index 7b51dd2..428d4b7 100644
--- a/io_u.c
+++ b/io_u.c
@@ -659,7 +659,8 @@ int io_u_quiesce(struct thread_data *td)
 static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 {
 	enum fio_ddir odir = ddir ^ 1;
-	long usec, now;
+	long usec;
+	uint64_t now;
 
 	assert(ddir_rw(ddir));
 	now = utime_since_now(&td->start);
diff --git a/ioengines.c b/ioengines.c
index 4c53fe5..1b58168 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -298,6 +298,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 		td->io_issues[ddir]--;
 		td->io_issue_bytes[ddir] -= buflen;
 		td->rate_io_issue_bytes[ddir] -= buflen;
+		io_u_clear(td, io_u, IO_U_F_FLIGHT);
 	}
 
 	/*
diff --git a/iolog.c b/iolog.c
index f0ce3b2..2bc3e3a 100644
--- a/iolog.c
+++ b/iolog.c
@@ -720,7 +720,7 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 	for (i = 0; i < nr_samples; i++) {
 		s = __get_sample(samples, log_offset, i);
 
-		entry = (struct io_u_plat_entry *) (uintptr_t) s->val;
+		entry = s->data.plat_entry;
 		io_u_plat = entry->io_u_plat;
 
 		entry_before = flist_first_entry(&entry->list, struct io_u_plat_entry, list);
@@ -759,16 +759,16 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 		s = __get_sample(samples, log_offset, i);
 
 		if (!log_offset) {
-			fprintf(f, "%lu, %lu, %u, %u\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %u\n",
 					(unsigned long) s->time,
-					(unsigned long) s->val,
+					s->data.val,
 					io_sample_ddir(s), s->bs);
 		} else {
 			struct io_sample_offset *so = (void *) s;
 
-			fprintf(f, "%lu, %lu, %u, %u, %llu\n",
+			fprintf(f, "%lu, %" PRId64 ", %u, %u, %llu\n",
 					(unsigned long) s->time,
-					(unsigned long) s->val,
+					s->data.val,
 					io_sample_ddir(s), s->bs,
 					(unsigned long long) so->offset);
 		}
diff --git a/iolog.h b/iolog.h
index de641d5..ee28944 100644
--- a/iolog.h
+++ b/iolog.h
@@ -24,15 +24,21 @@ struct io_hist {
 	struct flist_head list;
 };
 
+
+union io_sample_data {
+	uint64_t val;
+	struct io_u_plat_entry *plat_entry;
+};
+
+#define sample_val(value) ((union io_sample_data) { .val = value })
+#define sample_plat(plat) ((union io_sample_data) { .plat_entry = plat })
+
 /*
  * A single data sample
  */
 struct io_sample {
 	uint64_t time;
-	union {
-		uint64_t val;
-		struct io_u_plat_entry *plat_entry;
-	};
+	union io_sample_data data;
 	uint32_t __ddir;
 	uint32_t bs;
 };
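The `iolog.h` change replaces the anonymous union in `struct io_sample` with a named `union io_sample_data`, plus `sample_val()`/`sample_plat()` macros that build one with a C99 compound literal, so callers can pass either a plain value or a histogram pointer through a single parameter type. A reduced sketch of the pattern; the structs are trimmed, and `struct plat_entry` is a hypothetical stand-in for fio's `struct io_u_plat_entry`.

```c
#include <stdint.h>

/* Hypothetical stand-in for fio's struct io_u_plat_entry (trimmed) */
struct plat_entry {
	unsigned int nr;
};

union sample_data {
	uint64_t val;
	struct plat_entry *plat_entry;
};

/* Compound literals: build a union value in place from either member */
#define sample_val(value)	((union sample_data) { .val = (value) })
#define sample_plat(plat)	((union sample_data) { .plat_entry = (plat) })

struct sample {
	uint64_t time;
	union sample_data data;	/* named member: accessed as s->data.val */
};

/* One entry point accepts both kinds of payload */
static void store_sample(struct sample *s, uint64_t t, union sample_data d)
{
	s->time = t;
	s->data = d;
}
```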
diff --git a/mutex.c b/mutex.c
index e5b045e..5e5a064 100644
--- a/mutex.c
+++ b/mutex.c
@@ -162,7 +162,7 @@ int fio_mutex_down_timeout(struct fio_mutex *mutex, unsigned int msecs)
 	t.tv_nsec = tv_s.tv_usec * 1000;
 
 	t.tv_sec += msecs / 1000;
-	t.tv_nsec += ((msecs * 1000000) % 1000000000);
+	t.tv_nsec += ((msecs * 1000000ULL) % 1000000000);
 	if (t.tv_nsec >= 1000000000) {
 		t.tv_nsec -= 1000000000;
 		t.tv_sec++;
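The `mutex.c` fix matters because `msecs` is an `unsigned int`: with a 32-bit `unsigned int`, `msecs * 1000000` wraps once `msecs` exceeds 4294 (about 4.3 seconds), whereas `1000000ULL` promotes the multiplication to 64 bits. A small sketch of the same millisecond-to-`timespec` conversion, with the wrapping variant kept for comparison; both helper names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: advance a timespec by msecs milliseconds.
 * The 1000000ULL constant forces 64-bit multiplication, so the product
 * cannot wrap for any 32-bit msecs value.
 */
static void timespec_add_ms(struct timespec *t, unsigned int msecs)
{
	t->tv_sec += msecs / 1000;
	t->tv_nsec += (long)((msecs * 1000000ULL) % 1000000000ULL);
	if (t->tv_nsec >= 1000000000L) {	/* carry into seconds */
		t->tv_nsec -= 1000000000L;
		t->tv_sec++;
	}
}

/* The old computation, for comparison: the 32-bit product wraps */
static long wrapped_ns_part(unsigned int msecs)
{
	return (long)((msecs * 1000000U) % 1000000000U);
}
```

For example, with `msecs = 5000` the old expression yields 705032704 ns instead of 0, adding roughly 0.7 s of spurious delay to the timeout.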
diff --git a/options.c b/options.c
index 5937eb6..dfecd9d 100644
--- a/options.c
+++ b/options.c
@@ -22,7 +22,7 @@ char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 
 #define cb_data_to_td(data)	container_of(data, struct thread_data, o)
 
-struct pattern_fmt_desc fmt_desc[] = {
+static struct pattern_fmt_desc fmt_desc[] = {
 	{
 		.fmt   = "%o",
 		.len   = FIELD_SIZE(struct io_u *, offset),
diff --git a/server.c b/server.c
index 091c161..ab3e7cf 100644
--- a/server.c
+++ b/server.c
@@ -912,11 +912,11 @@ static int handle_send_eta_cmd(struct fio_net_cmd *cmd)
 		je->files_open		= cpu_to_le32(je->files_open);
 
 		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-			je->m_rate[i]	= cpu_to_le32(je->m_rate[i]);
-			je->t_rate[i]	= cpu_to_le32(je->t_rate[i]);
+			je->m_rate[i]	= cpu_to_le64(je->m_rate[i]);
+			je->t_rate[i]	= cpu_to_le64(je->t_rate[i]);
 			je->m_iops[i]	= cpu_to_le32(je->m_iops[i]);
 			je->t_iops[i]	= cpu_to_le32(je->t_iops[i]);
-			je->rate[i]	= cpu_to_le32(je->rate[i]);
+			je->rate[i]	= cpu_to_le64(je->rate[i]);
 			je->iops[i]	= cpu_to_le32(je->iops[i]);
 		}
 
@@ -1730,7 +1730,7 @@ static int __fio_append_iolog_gz_hist(struct sk_entry *first, struct io_log *log
 		/* Do the subtraction on server side so that client doesn't have to
 		 * reconstruct our linked list from packets.
 		 */
-		cur_plat_entry  = s->plat_entry;
+		cur_plat_entry  = s->data.plat_entry;
 		prev_plat_entry = flist_first_entry(&cur_plat_entry->list, struct io_u_plat_entry, list);
 		cur_plat  = cur_plat_entry->io_u_plat;
 		prev_plat = prev_plat_entry->io_u_plat;
@@ -1934,7 +1934,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 			struct io_sample *s = get_sample(log, cur_log, i);
 
 			s->time		= cpu_to_le64(s->time);
-			s->val		= cpu_to_le64(s->val);
+			s->data.val	= cpu_to_le64(s->data.val);
 			s->__ddir	= cpu_to_le32(s->__ddir);
 			s->bs		= cpu_to_le32(s->bs);
 
diff --git a/server.h b/server.h
index 3b592c7..4a81bcd 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 58,
+	FIO_SERVER_VER			= 59,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 1e889f5..423aacd 100644
--- a/stat.c
+++ b/stat.c
@@ -2005,7 +2005,7 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 	return iolog->pending;
 }
 
-static void __add_log_sample(struct io_log *iolog, unsigned long val,
+static void __add_log_sample(struct io_log *iolog, union io_sample_data data,
 			     enum fio_ddir ddir, unsigned int bs,
 			     unsigned long t, uint64_t offset)
 {
@@ -2022,7 +2022,7 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
 
 		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
-		s->val = val;
+		s->data = data;
 		s->time = t + (iolog->td ? iolog->td->unix_epoch : 0);
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
@@ -2091,14 +2091,14 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 	 * had actual samples done.
 	 */
 	if (iolog->avg_window[ddir].samples) {
-		unsigned long val;
+		union io_sample_data data;
 
 		if (log_max)
-			val = iolog->avg_window[ddir].max_val;
+			data.val = iolog->avg_window[ddir].max_val;
 		else
-			val = iolog->avg_window[ddir].mean.u.f + 0.50;
+			data.val = iolog->avg_window[ddir].mean.u.f + 0.50;
 
-		__add_log_sample(iolog, val, ddir, 0, elapsed, 0);
+		__add_log_sample(iolog, data, ddir, 0, elapsed, 0);
 	}
 
 	reset_io_stat(&iolog->avg_window[ddir]);
@@ -2114,7 +2114,7 @@ static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 }
 
 static long add_log_sample(struct thread_data *td, struct io_log *iolog,
-			   unsigned long val, enum fio_ddir ddir,
+			   union io_sample_data data, enum fio_ddir ddir,
 			   unsigned int bs, uint64_t offset)
 {
 	unsigned long elapsed, this_window;
@@ -2128,7 +2128,7 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 * If no time averaging, just add the log sample.
 	 */
 	if (!iolog->avg_msec) {
-		__add_log_sample(iolog, val, ddir, bs, elapsed, offset);
+		__add_log_sample(iolog, data, ddir, bs, elapsed, offset);
 		return 0;
 	}
 
@@ -2136,7 +2136,7 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 * Add the sample. If the time period has passed, then
 	 * add that entry to the log and clear.
 	 */
-	add_stat_sample(&iolog->avg_window[ddir], val);
+	add_stat_sample(&iolog->avg_window[ddir], data.val);
 
 	/*
 	 * If period hasn't passed, adding the above sample is all we
@@ -2176,7 +2176,7 @@ void finalize_logs(struct thread_data *td, bool unit_logs)
 		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
 }
 
-void add_agg_sample(unsigned long val, enum fio_ddir ddir, unsigned int bs)
+void add_agg_sample(union io_sample_data data, enum fio_ddir ddir, unsigned int bs)
 {
 	struct io_log *iolog;
 
@@ -2184,7 +2184,7 @@ void add_agg_sample(unsigned long val, enum fio_ddir ddir, unsigned int bs)
 		return;
 
 	iolog = agg_io_log[ddir];
-	__add_log_sample(iolog, val, ddir, bs, mtime_since_genesis(), 0);
+	__add_log_sample(iolog, data, ddir, bs, mtime_since_genesis(), 0);
 }
 
 static void add_clat_percentile_sample(struct thread_stat *ts,
@@ -2208,7 +2208,8 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->clat_stat[ddir], usec);
 
 	if (td->clat_log)
-		add_log_sample(td, td->clat_log, usec, ddir, bs, offset);
+		add_log_sample(td, td->clat_log, sample_val(usec), ddir, bs,
+			       offset);
 
 	if (ts->clat_percentiles)
 		add_clat_percentile_sample(ts, usec, ddir);
@@ -2238,7 +2239,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			memcpy(&(dst->io_u_plat), io_u_plat,
 				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
 			flist_add(&dst->list, &hw->list);
-			__add_log_sample(iolog, (unsigned long)dst, ddir, bs,
+			__add_log_sample(iolog, sample_plat(dst), ddir, bs,
 						elapsed, offset);
 
 			/*
@@ -2267,7 +2268,7 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->slat_stat[ddir], usec);
 
 	if (td->slat_log)
-		add_log_sample(td, td->slat_log, usec, ddir, bs, offset);
+		add_log_sample(td, td->slat_log, sample_val(usec), ddir, bs, offset);
 
 	td_io_u_unlock(td);
 }
@@ -2285,7 +2286,8 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->lat_stat[ddir], usec);
 
 	if (td->lat_log)
-		add_log_sample(td, td->lat_log, usec, ddir, bs, offset);
+		add_log_sample(td, td->lat_log, sample_val(usec), ddir, bs,
+			       offset);
 
 	td_io_u_unlock(td);
 }
@@ -2306,7 +2308,8 @@ void add_bw_sample(struct thread_data *td, struct io_u *io_u,
 	add_stat_sample(&ts->bw_stat[io_u->ddir], rate);
 
 	if (td->bw_log)
-		add_log_sample(td, td->bw_log, rate, io_u->ddir, bytes, io_u->offset);
+		add_log_sample(td, td->bw_log, sample_val(rate), io_u->ddir,
+			       bytes, io_u->offset);
 
 	td->stat_io_bytes[io_u->ddir] = td->this_io_bytes[io_u->ddir];
 	td_io_u_unlock(td);
@@ -2351,7 +2354,8 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			next = add_log_sample(td, td->bw_log, rate, ddir, bs, 0);
+			next = add_log_sample(td, td->bw_log, sample_val(rate),
+					      ddir, bs, 0);
 			next_log = min(next_log, next);
 		}
 
@@ -2379,7 +2383,8 @@ void add_iops_sample(struct thread_data *td, struct io_u *io_u,
 	add_stat_sample(&ts->iops_stat[io_u->ddir], 1);
 
 	if (td->iops_log)
-		add_log_sample(td, td->iops_log, 1, io_u->ddir, bytes, io_u->offset);
+		add_log_sample(td, td->iops_log, sample_val(1), io_u->ddir,
+			       bytes, io_u->offset);
 
 	td->stat_io_blocks[io_u->ddir] = td->this_io_blocks[io_u->ddir];
 	td_io_u_unlock(td);
@@ -2424,7 +2429,8 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			next = add_log_sample(td, td->iops_log, iops, ddir, bs, 0);
+			next = add_log_sample(td, td->iops_log,
+					      sample_val(iops), ddir, bs, 0);
 			next_log = min(next_log, next);
 		}
 
diff --git a/stat.h b/stat.h
index e6f7759..75d1f4e 100644
--- a/stat.h
+++ b/stat.h
@@ -123,7 +123,7 @@ struct group_run_stats {
 #define BLOCK_INFO_STATE(block_info)		\
 	((block_info) >> BLOCK_INFO_STATE_SHIFT)
 #define BLOCK_INFO(state, trim_cycles)	\
-	((trim_cycles) | ((state) << BLOCK_INFO_STATE_SHIFT))
+	((trim_cycles) | ((unsigned int) (state) << BLOCK_INFO_STATE_SHIFT))
 #define BLOCK_INFO_SET_STATE(block_info, state)	\
 	BLOCK_INFO(state, BLOCK_INFO_TRIMS(block_info))
 enum block_info_state {
@@ -224,9 +224,9 @@ struct jobs_eta {
 
 	uint32_t files_open;
 
-	uint32_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];
+	uint64_t m_rate[DDIR_RWDIR_CNT], t_rate[DDIR_RWDIR_CNT];
 	uint32_t m_iops[DDIR_RWDIR_CNT], t_iops[DDIR_RWDIR_CNT];
-	uint32_t rate[DDIR_RWDIR_CNT];
+	uint64_t rate[DDIR_RWDIR_CNT];
 	uint32_t iops[DDIR_RWDIR_CNT];
 	uint64_t elapsed_sec;
 	uint64_t eta_sec;
@@ -281,7 +281,7 @@ extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long,
 				unsigned int, uint64_t);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
 				unsigned int, uint64_t);
-extern void add_agg_sample(unsigned long, enum fio_ddir, unsigned int);
+extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned int);
 extern void add_iops_sample(struct thread_data *, struct io_u *,
 				unsigned int);
 extern void add_bw_sample(struct thread_data *, struct io_u *,
diff --git a/thread_options.h b/thread_options.h
index 5e379e3..8ec6b97 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -251,8 +251,8 @@ struct thread_options {
 	char *exec_prerun;
 	char *exec_postrun;
 
-	unsigned int rate[DDIR_RWDIR_CNT];
-	unsigned int ratemin[DDIR_RWDIR_CNT];
+	uint64_t rate[DDIR_RWDIR_CNT];
+	uint64_t ratemin[DDIR_RWDIR_CNT];
 	unsigned int ratecycle;
 	unsigned int io_submit_mode;
 	unsigned int rate_iops[DDIR_RWDIR_CNT];
@@ -516,8 +516,8 @@ struct thread_options_pack {
 	uint8_t exec_prerun[FIO_TOP_STR_MAX];
 	uint8_t exec_postrun[FIO_TOP_STR_MAX];
 
-	uint32_t rate[DDIR_RWDIR_CNT];
-	uint32_t ratemin[DDIR_RWDIR_CNT];
+	uint64_t rate[DDIR_RWDIR_CNT];
+	uint64_t ratemin[DDIR_RWDIR_CNT];
 	uint32_t ratecycle;
 	uint32_t io_submit_mode;
 	uint32_t rate_iops[DDIR_RWDIR_CNT];


* Recent changes (master)
@ 2016-11-17 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-17 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit effe99e4b18eb5c345629d7bbaae1879a2594b20:

  posixaio: fix bad type passed to memset (2016-11-15 10:42:58 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 42f1ee68ceec87fbbfdc4972c35d3cdf7c08d9f6:

  Improve informativeness about directIO support or rather lackthereof on Solaris when errno is set to ENOTTY (2016-11-16 10:16:55 -0800)

----------------------------------------------------------------
Sam Zaydel (1):
      Improve informativeness about directIO support or rather lackthereof on Solaris when errno is set to ENOTTY

 ioengines.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/ioengines.c b/ioengines.c
index ae55f95..4c53fe5 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -483,7 +483,12 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 
 		if (ret) {
 			td_verror(td, ret, "fio_set_odirect");
-			log_err("fio: the file system does not seem to support direct IO\n");
+			if (ret == ENOTTY) { /* ENOTTY suggests RAW device or ZFS */
+				log_err("fio: doing directIO to RAW devices or ZFS not supported\n");
+			} else {
+				log_err("fio: the file system does not seem to support direct IO\n");
+			}
+
 			goto err;
 		}
 	}


* Recent changes (master)
@ 2016-11-16 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-16 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 471bb52b2e75413d18e8def5bb7d301aab7541e9:

  Fix memory leak on tmp_buf (2016-11-14 00:05:37 +0000)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to effe99e4b18eb5c345629d7bbaae1879a2594b20:

  posixaio: fix bad type passed to memset (2016-11-15 10:42:58 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      posixaio: fix bad type passed to memset

 engines/posixaio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/posixaio.c b/engines/posixaio.c
index e5411b7..bddb1ec 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -109,7 +109,7 @@ static int fio_posixaio_getevents(struct thread_data *td, unsigned int min,
 
 	r = 0;
 restart:
-	memset(suspend_list, 0, sizeof(*suspend_list));
+	memset(suspend_list, 0, sizeof(suspend_list));
 	suspend_entries = 0;
 	io_u_qiter(&td->io_u_all, io_u, i) {
 		int err;


* Recent changes (master)
@ 2016-11-14 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 206c546d6015fe3809b8e52ea95f56114b8e9f25:

  rbd: fix crash with zero sized image (2016-11-12 08:36:23 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 471bb52b2e75413d18e8def5bb7d301aab7541e9:

  Fix memory leak on tmp_buf (2016-11-14 00:05:37 +0000)

----------------------------------------------------------------
Colin Ian King (1):
      Fix memory leak on tmp_buf

 oslib/libmtd.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/oslib/libmtd.c b/oslib/libmtd.c
index 5b22d6a..24e9db9 100644
--- a/oslib/libmtd.c
+++ b/oslib/libmtd.c
@@ -1116,6 +1116,7 @@ static int legacy_auto_oob_layout(const struct mtd_dev_info *mtd, int fd,
 		len = mtd->oob_size - start;
 		memcpy(oob + start, tmp_buf + start, len);
 	}
+	free(tmp_buf);
 
 	return 0;
 }


* Recent changes (master)
@ 2016-11-13 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-13 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4e7a881493790a3a2b970988aef4bd3603877fab:

  Fix duplicated typos from 42d97b5c in fio(1) (2016-11-02 08:05:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 206c546d6015fe3809b8e52ea95f56114b8e9f25:

  rbd: fix crash with zero sized image (2016-11-12 08:36:23 -0700)

----------------------------------------------------------------
Pan Liu (1):
      rbd: fix crash with zero sized image

 engines/rbd.c | 5 +++++
 1 file changed, 5 insertions(+)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index aa50c80..ee2ce81 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -595,7 +595,12 @@ static int fio_rbd_setup(struct thread_data *td)
 	if (r < 0) {
 		log_err("rbd_status failed.\n");
 		goto disconnect;
+	} else if (info.size == 0) {
+		log_err("image size should be larger than zero.\n");
+		r = -EINVAL;
+		goto disconnect;
 	}
+
 	dprint(FD_IO, "rbd-engine: image size: %lu\n", info.size);
 
 	/* taken from "net" engine. Pretend we deal with files,


* Recent changes (master)
@ 2016-11-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 4e795a3e0940509bd991682ec029000b6aa8881b:

  Remove extra space in tausworthe32 warning message (2016-11-01 14:24:50 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4e7a881493790a3a2b970988aef4bd3603877fab:

  Fix duplicated typos from 42d97b5c in fio(1) (2016-11-02 08:05:49 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (1):
      Fix duplicated typos from 42d97b5c in fio(1)

 fio.1     | 6 +++---
 options.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/fio.1 b/fio.1
index 48c2060..07480f0 100644
--- a/fio.1
+++ b/fio.1
@@ -764,7 +764,7 @@ Example #1:
 \fBiodepth_batch_complete_max\fR=<iodepth>
 .RE
 
-which means that we will retrieve at leat 1 IO and up to the
+which means that we will retrieve at least 1 IO and up to the
 whole submitted queue depth. If none of IO has been completed
 yet, we will wait.
 
@@ -1324,7 +1324,7 @@ fio will fill 1/2/3/4 bytes of the buffer at the time(it can be either a
 decimal or a hex number). The verify_pattern if larger than a 32-bit quantity
 has to be a hex number that starts with either "0x" or "0X". Use with
 \fBverify\fP=str. Also, verify_pattern supports %o format, which means that for
-each block offset will be written and then verifyied back, e.g.:
+each block offset will be written and then verified back, e.g.:
 .RS
 .RS
 \fBverify_pattern\fR=%o
@@ -2325,7 +2325,7 @@ IO is a TRIM
 The \fIoffset\fR is the offset, in bytes, from the start of the file, for that
 particular IO. The logging of the offset can be toggled with \fBlog_offset\fR.
 
-If windowed logging is enabled though \fBlog_avg_msec\fR, then fio doesn't log
+If windowed logging is enabled through \fBlog_avg_msec\fR, then fio doesn't log
 individual IOs. Instead of logs the average values over the specified
 period of time. Since \fIdata direction\fR and \fIoffset\fR are per-IO values,
 they aren't applicable if windowed logging is enabled. If windowed logging
diff --git a/options.c b/options.c
index 3c9adfb..5937eb6 100644
--- a/options.c
+++ b/options.c
@@ -2157,7 +2157,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 			  { .ival = "gauss",
 			    .oval = FIO_FSERVICE_GAUSS,
-			    .help = "Normal (guassian) distribution",
+			    .help = "Normal (gaussian) distribution",
 			  },
 			  { .ival = "roundrobin",
 			    .oval = FIO_FSERVICE_RR,


* Recent changes (master)
@ 2016-11-02 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-11-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a955f5297d6951517c663ac9effd94adfae6a563:

  Add blockdev_size() support for OpenBSD (2016-10-26 08:00:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4e795a3e0940509bd991682ec029000b6aa8881b:

  Remove extra space in tausworthe32 warning message (2016-11-01 14:24:50 -0600)

----------------------------------------------------------------
Bruce Cran (1):
      Remove extra space in tausworthe32 warning message

Jens Axboe (2):
      backend: cleanup check for completion/issue byte checking
      Merge branch 'HOWTO-cleanup' of https://github.com/szaydel/fio

Sam Zaydel (1):
      Fix minor spelling mistakes in HOWTO document

 HOWTO       | 33 ++++++++++++++++-----------------
 backend.c   | 34 +++++++++++-----------------------
 filesetup.c |  2 +-
 3 files changed, 28 insertions(+), 41 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 3f8acee..577eed9 100644
--- a/HOWTO
+++ b/HOWTO
@@ -803,8 +803,8 @@ ioengine=str	Defines how the job issues io to the file. The following
 				cannot be modified. So random writes are not
 				possible. To imitate this, libhdfs engine
 				creates bunch of small files, and engine will
-				pick a file out of those files based on the 
-				offset enerated by fio backend. Each jobs uses
+				pick a file out of those files based on the
+				offset generated by fio backend. Each jobs uses
 				it's own connection to HDFS.
 
 			mtd	Read, write and erase an MTD character device
@@ -864,7 +864,7 @@ iodepth_batch_complete_max=int This defines maximum pieces of IO to
 		iodepth_batch_complete_min=1
 		iodepth_batch_complete_max=<iodepth>
 
-		which means that we will retrieve at leat 1 IO and up to the
+		which means that we will retrieve at least 1 IO and up to the
 		whole submitted queue depth. If none of IO has been completed
 		yet, we will wait.
 
@@ -986,7 +986,7 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		random		Uniform random distribution
 		zipf		Zipf distribution
 		pareto		Pareto distribution
-		gauss		Normal (guassian) distribution
+		gauss		Normal (gaussian) distribution
 		zoned		Zoned random distribution
 
 		When using a zipf or pareto distribution, an input value
@@ -1027,7 +1027,7 @@ percentage_random=int	For a random workload, set how big a percentage should
 		and random IO, at the given percentages. It is possible to
 		set different values for reads, writes, and trim. To do so,
 		simply use a comma separated list. See blocksize.
-	
+
 norandommap	Normally fio will cover every block of the file when doing
 		random IO. If this option is given, fio will just get a
 		new random offset without looking at past io history. This
@@ -1185,7 +1185,7 @@ numa_cpu_nodes=str Set this job running on specified NUMA nodes' CPUs. The
 		fio must be built on a system with libnuma-dev(el) installed.
 
 numa_mem_policy=str Set this job's memory policy and corresponding NUMA
-		nodes. Format of the argements:
+		nodes. Format of the arguments:
 			<mode>[:<nodelist>]
 		`mode' is one of the following memory policy:
 			default, prefer, bind, interleave, local
@@ -1266,7 +1266,7 @@ mem=str		Fio can use various types of memory as the io unit buffer.
 		location should point there. So if it's mounted in /huge,
 		you would use mem=mmaphuge:/huge/somefile.
 
-iomem_align=int	This indiciates the memory alignment of the IO memory buffers.
+iomem_align=int	This indicates the memory alignment of the IO memory buffers.
 		Note that the given alignment is applied to the first IO unit
 		buffer, if using iodepth the alignment of the following buffers
 		are given by the bs used. In other words, if using a bs that is
@@ -1330,7 +1330,7 @@ pre_read=bool	If this is given, files will be pre-read into memory before
 		starting the given IO operation. This will also clear
 		the 'invalidate' flag, since it is pointless to pre-read
 		and then drop the cache. This will only work for IO engines
-		that are seekable, since they allow you to read the same data
+		that are seek-able, since they allow you to read the same data
 		multiple times. Thus it will not work on eg network or splice
 		IO.
 
@@ -1372,7 +1372,7 @@ verify=str	If writing to a file, fio can verify the file contents
 			crc32c	Use a crc32c sum of the data area and store
 				it in the header of each block.
 
-			crc32c-intel Use hardware assisted crc32c calcuation
+			crc32c-intel Use hardware assisted crc32c calculation
 				provided on SSE4.2 enabled processors. Falls
 				back to regular software crc32c, if not
 				supported by the system.
@@ -1447,7 +1447,7 @@ verify_pattern=str	If set, fio will fill the io buffers with this
 		be a hex number that starts with either "0x" or "0X". Use
 		with verify=str. Also, verify_pattern supports %o format,
 		which means that for each block offset will be written and
-		then verifyied back, e.g.:
+		then verified back, e.g.:
 
 		verify_pattern=%o
 
@@ -1579,7 +1579,7 @@ replay_redirect=str While replaying I/O patterns using read_iolog the
 		multiple devices to be replayed concurrently to multiple
 		redirected devices you must blkparse your trace into separate
 		traces and replay them with independent fio invocations.
-		Unfortuantely this also breaks the strict time ordering
+		Unfortunately this also breaks the strict time ordering
 		between multiple device accesses.
 
 replay_align=int	Force alignment of IO offsets and lengths in a trace
@@ -1924,7 +1924,7 @@ be the starting port number since fio will use a range of ports.
 		connections rather than initiating an outgoing connection. The
 		hostname must be omitted if this option is used.
 
-[net] pingpong	Normaly a network writer will just continue writing data, and
+[net] pingpong	Normally a network writer will just continue writing data, and
 		a network reader will just consume packages. If pingpong=1
 		is set, a writer will send its normal payload to the reader,
 		then wait for the reader to send the same payload back. This
@@ -1945,15 +1945,15 @@ be the starting port number since fio will use a range of ports.
 [e4defrag] inplace=int
 		Configure donor file blocks allocation strategy
 		0(default): Preallocate donor's file on init
-		1 	  : allocate space immidietly inside defragment event,
+		1 	  : allocate space immediately inside defragment event,
 			    and free right after event
 
 [rbd] clustername=str	Specifies the name of the Ceph cluster.
 [rbd] rbdname=str	Specifies the name of the RBD.
-[rbd] pool=str		Specifies the naem of the Ceph pool containing RBD.
+[rbd] pool=str		Specifies the name of the Ceph pool containing RBD.
 [rbd] clientname=str	Specifies the username (without the 'client.' prefix)
 			used to access the Ceph cluster. If the clustername is
-			specified, the clientmae shall be the full type.id
+			specified, the clientname shall be the full type.id
 			string. If no type. prefix is given, fio will add
 			'client.' by default.
 
@@ -2353,10 +2353,9 @@ Data direction is one of the following:
 The offset is the offset, in bytes, from the start of the file, for that
 particular IO. The logging of the offset can be toggled with 'log_offset'.
 
-If windowed logging is enabled though 'log_avg_msec', then fio doesn't log
+If windowed logging is enabled through 'log_avg_msec', then fio doesn't log
 individual IOs. Instead of logs the average values over the specified
 period of time. Since 'data direction' and 'offset' are per-IO values,
 they aren't applicable if windowed logging is enabled. If windowed logging
 is enabled and 'log_max_value' is set, then fio logs maximum values in
 that window instead of averages.
-
diff --git a/backend.c b/backend.c
index ed4f1f0..60cea3c 100644
--- a/backend.c
+++ b/backend.c
@@ -771,18 +771,18 @@ static bool exceeds_number_ios(struct thread_data *td)
 	return number_ios >= (td->o.number_ios * td->loops);
 }
 
-static bool io_issue_bytes_exceeded(struct thread_data *td)
+static bool io_bytes_exceeded(struct thread_data *td, uint64_t *this_bytes)
 {
 	unsigned long long bytes, limit;
 
 	if (td_rw(td))
-		bytes = td->io_issue_bytes[DDIR_READ] + td->io_issue_bytes[DDIR_WRITE];
+		bytes = this_bytes[DDIR_READ] + this_bytes[DDIR_WRITE];
 	else if (td_write(td))
-		bytes = td->io_issue_bytes[DDIR_WRITE];
+		bytes = this_bytes[DDIR_WRITE];
 	else if (td_read(td))
-		bytes = td->io_issue_bytes[DDIR_READ];
+		bytes = this_bytes[DDIR_READ];
 	else
-		bytes = td->io_issue_bytes[DDIR_TRIM];
+		bytes = this_bytes[DDIR_TRIM];
 
 	if (td->o.io_limit)
 		limit = td->o.io_limit;
@@ -793,26 +793,14 @@ static bool io_issue_bytes_exceeded(struct thread_data *td)
 	return bytes >= limit || exceeds_number_ios(td);
 }
 
-static bool io_complete_bytes_exceeded(struct thread_data *td)
+static bool io_issue_bytes_exceeded(struct thread_data *td)
 {
-	unsigned long long bytes, limit;
-
-	if (td_rw(td))
-		bytes = td->this_io_bytes[DDIR_READ] + td->this_io_bytes[DDIR_WRITE];
-	else if (td_write(td))
-		bytes = td->this_io_bytes[DDIR_WRITE];
-	else if (td_read(td))
-		bytes = td->this_io_bytes[DDIR_READ];
-	else
-		bytes = td->this_io_bytes[DDIR_TRIM];
-
-	if (td->o.io_limit)
-		limit = td->o.io_limit;
-	else
-		limit = td->o.size;
+	return io_bytes_exceeded(td, td->io_issue_bytes);
+}
 
-	limit *= td->loops;
-	return bytes >= limit || exceeds_number_ios(td);
+static bool io_complete_bytes_exceeded(struct thread_data *td)
+{
+	return io_bytes_exceeded(td, td->this_io_bytes);
 }
 
 /*
diff --git a/filesetup.c b/filesetup.c
index a3bbbb2..969e7cc 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1098,7 +1098,7 @@ static int check_rand_gen_limits(struct thread_data *td, struct fio_file *f,
 	if (!fio_option_is_set(&td->o, random_generator)) {
 		log_info("fio: Switching to tausworthe64. Use the "
 			 "random_generator= option to get rid of this "
-			 " warning.\n");
+			 "warning.\n");
 		td->o.random_generator = FIO_RAND_GEN_TAUSWORTHE64;
 		return 0;
 	}


* Recent changes (master)
@ 2016-10-27 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-10-27 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit a0290964d9e15aa9faa5a139effc14cfa5bcd3f1:

  Fio 2.15 (2016-10-25 12:38:13 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a955f5297d6951517c663ac9effd94adfae6a563:

  Add blockdev_size() support for OpenBSD (2016-10-26 08:00:10 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      Add blockdev_size() support for NetBSD
      Add blockdev_size() support for OpenBSD

 os/os-netbsd.h  | 17 ++++++++++++++++-
 os/os-openbsd.h | 17 ++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 1ef5866..2133d7a 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -7,6 +7,9 @@
 #include <lwp.h>
 #include <sys/param.h>
 #include <sys/statvfs.h>
+#include <sys/ioctl.h>
+#include <sys/dkio.h>
+#include <sys/disklabel.h>
 /* XXX hack to avoid confilcts between rbtree.h and <sys/rb.h> */
 #define	rb_node	_rb_node
 #include <sys/sysctl.h>
@@ -17,7 +20,6 @@
 #include "../file.h"
 
 #define FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_BDEV_SIZE
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
@@ -37,6 +39,19 @@
 
 typedef off_t off64_t;
 
+static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
+{
+	struct disklabel dl;
+
+	if (!ioctl(f->fd, DIOCGDINFO, &dl)) {
+		*bytes = ((unsigned long long)dl.d_secperunit) * dl.d_secsize;
+		return 0;
+	}
+
+	*bytes = 0;
+	return errno;
+}
+
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
 	return EINVAL;
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index 2998510..3343cbd 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -6,6 +6,9 @@
 #include <errno.h>
 #include <sys/param.h>
 #include <sys/statvfs.h>
+#include <sys/ioctl.h>
+#include <sys/dkio.h>
+#include <sys/disklabel.h>
 /* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #include <sys/sysctl.h>
 #undef RB_BLACK
@@ -15,7 +18,6 @@
 #include "../file.h"
 
 #undef  FIO_HAVE_ODIRECT
-#define FIO_USE_GENERIC_BDEV_SIZE
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
@@ -35,6 +37,19 @@
 
 typedef off_t off64_t;
 
+static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
+{
+	struct disklabel dl;
+
+	if (!ioctl(f->fd, DIOCGDINFO, &dl)) {
+		*bytes = ((unsigned long long)dl.d_secperunit) * dl.d_secsize;
+		return 0;
+	}
+
+	*bytes = 0;
+	return errno;
+}
+
 static inline int blockdev_invalidate_cache(struct fio_file *f)
 {
 	return EINVAL;


* Recent changes (master)
@ 2016-10-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-10-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 43f248c4527a49d3de0cd758ce669f7736028ea4:

  Use the POSIX `EDEADLK` instead of the Linux `EDEADLOCK` (2016-10-24 20:48:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a0290964d9e15aa9faa5a139effc14cfa5bcd3f1:

  Fio 2.15 (2016-10-25 12:38:13 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.15

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 8d4f1ef..eac0e00 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.14
+DEF_VER=fio-2.15
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index da09b9f..25cb269 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.14">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.15">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"


* Recent changes (master)
@ 2016-10-25 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-10-25 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ca0122d822ea7dd573f05ca4cf43c5d0ff9f4adb:

  backend: if we can't grab stat_mutex, report a deadlock error and exit (2016-10-23 08:35:14 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 43f248c4527a49d3de0cd758ce669f7736028ea4:

  Use the POSIX `EDEADLK` instead of the Linux `EDEADLOCK` (2016-10-24 20:48:43 -0600)

----------------------------------------------------------------
Bruce Cran (1):
      Use the POSIX `EDEADLK` instead of the Linux `EDEADLOCK`

Jens Axboe (3):
      mutex: clear mutex when removed
      fio: make job reap timeout 5 minutes
      backend: end IO loop early, if the job is marked as terminated

 backend.c | 10 +++++++++-
 fio.h     |  2 +-
 mutex.c   |  6 ++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 093b6a3..ed4f1f0 100644
--- a/backend.c
+++ b/backend.c
@@ -1723,6 +1723,14 @@ static void *thread_main(void *data)
 			}
 		}
 
+		/*
+		 * If we took too long to shut down, the main thread could
+		 * already consider us reaped/exited. If that happens, break
+		 * out and clean up.
+		 */
+		if (td->runstate >= TD_EXITED)
+			break;
+
 		clear_state = 1;
 
 		/*
@@ -1740,7 +1748,7 @@ static void *thread_main(void *data)
 			usleep(1000);
 			if (deadlock_loop_cnt++ > 5000) {
 				log_err("fio seems to be stuck grabbing stat_mutex, forcibly exiting\n");
-				td->error = EDEADLOCK;
+				td->error = EDEADLK;
 				goto err;
 			}
 		} while (1);
diff --git a/fio.h b/fio.h
index 080842a..74c1b30 100644
--- a/fio.h
+++ b/fio.h
@@ -588,7 +588,7 @@ extern const char *runstate_to_name(int runstate);
  * Allow 60 seconds for a job to quit on its own, otherwise reap with
  * a vengeance.
  */
-#define FIO_REAP_TIMEOUT	60
+#define FIO_REAP_TIMEOUT	300
 
 #define TERMINATE_ALL		(-1U)
 extern void fio_terminate_threads(unsigned int);
diff --git a/mutex.c b/mutex.c
index 7580922..e5b045e 100644
--- a/mutex.c
+++ b/mutex.c
@@ -22,6 +22,12 @@ void __fio_mutex_remove(struct fio_mutex *mutex)
 {
 	assert(mutex->magic == FIO_MUTEX_MAGIC);
 	pthread_cond_destroy(&mutex->cond);
+
+	/*
+	 * Ensure any subsequent attempt to grab this mutex will fail
+	 * with an assert, instead of just silently hanging.
+	 */
+	memset(mutex, 0, sizeof(*mutex));
 }
 
 void fio_mutex_remove(struct fio_mutex *mutex)

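The mutex.c change relies on a simple idea: once a lock object is torn down, wipe it so that the magic-number check on the lock path fails loudly instead of hanging. A minimal sketch of that pattern, assuming plain pthreads; `struct demo_mutex`, its magic value and the demo_* helpers are hypothetical stand-ins, not fio's actual `struct fio_mutex` API:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical magic value marking a live, initialised mutex. */
#define DEMO_MUTEX_MAGIC	0x4d555458u

/* Hypothetical stand-in for fio's struct fio_mutex. */
struct demo_mutex {
	pthread_mutex_t lock;
	unsigned int magic;
};

static void demo_mutex_init(struct demo_mutex *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->magic = DEMO_MUTEX_MAGIC;
}

static int demo_mutex_valid(const struct demo_mutex *m)
{
	return m->magic == DEMO_MUTEX_MAGIC;
}

static void demo_mutex_down(struct demo_mutex *m)
{
	assert(demo_mutex_valid(m));	/* aborts on a removed mutex */
	pthread_mutex_lock(&m->lock);
}

static void demo_mutex_up(struct demo_mutex *m)
{
	assert(demo_mutex_valid(m));
	pthread_mutex_unlock(&m->lock);
}

static void demo_mutex_remove(struct demo_mutex *m)
{
	assert(demo_mutex_valid(m));
	pthread_mutex_destroy(&m->lock);

	/*
	 * Zero the whole object, magic included, so any later
	 * demo_mutex_down() trips the assert instead of blocking
	 * on a destroyed lock.
	 */
	memset(m, 0, sizeof(*m));
}
```

This only helps while assertions are enabled; a build with NDEBUG defined would skip the magic check entirely.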

* Recent changes (master)
@ 2016-10-24 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-24 12:00 UTC
  To: fio

The following changes since commit e291cff14e97feb3cff711f5a5cbcb63b32f9c72:

  Use fmt -w WIDTH option instead of -WIDTH (2016-10-20 08:10:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ca0122d822ea7dd573f05ca4cf43c5d0ff9f4adb:

  backend: if we can't grab stat_mutex, report a deadlock error and exit (2016-10-23 08:35:14 -0600)

----------------------------------------------------------------
Theodore Ts'o (1):
      backend: if we can't grab stat_mutex, report a deadlock error and exit

 backend.c | 7 +++++++
 1 file changed, 7 insertions(+)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index fb2a855..093b6a3 100644
--- a/backend.c
+++ b/backend.c
@@ -1471,6 +1471,7 @@ static void *thread_main(void *data)
 	struct thread_data *td = fd->td;
 	struct thread_options *o = &td->o;
 	struct sk_out *sk_out = fd->sk_out;
+	int deadlock_loop_cnt;
 	int clear_state;
 	int ret;
 
@@ -1731,11 +1732,17 @@ static void *thread_main(void *data)
 		 * the rusage_sem, which would never get upped because
 		 * this thread is waiting for the stat mutex.
 		 */
+		deadlock_loop_cnt = 0;
 		do {
 			check_update_rusage(td);
 			if (!fio_mutex_down_trylock(stat_mutex))
 				break;
 			usleep(1000);
+			if (deadlock_loop_cnt++ > 5000) {
+				log_err("fio seems to be stuck grabbing stat_mutex, forcibly exiting\n");
+				td->error = EDEADLOCK;
+				goto err;
+			}
 		} while (1);
 
 		if (td_read(td) && td->io_bytes[DDIR_READ])

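The stat_mutex change turns an unbounded trylock/sleep loop into a bounded one: with usleep(1000) and a limit of about 5000 iterations, the thread gives up after roughly five seconds and reports a deadlock error. The sketch below shows the same shape with a plain pthread mutex. The function name and parameters are hypothetical, it omits the check_update_rusage() call made between attempts in the real loop, and it returns EDEADLK, the POSIX spelling that a later commit switched the patch to.

```c
#define _DEFAULT_SOURCE		/* for usleep() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take 'm' without blocking, sleeping
 * 'sleep_us' microseconds after each failed attempt. After
 * 'max_tries' failed attempts, give up and report EDEADLK instead
 * of spinning forever.
 */
static int lock_with_deadlock_check(pthread_mutex_t *m, int max_tries,
				    useconds_t sleep_us)
{
	for (int i = 0; i < max_tries; i++) {
		if (pthread_mutex_trylock(m) == 0)
			return 0;	/* got the lock */
		usleep(sleep_us);
	}

	return EDEADLK;
}
```

A caller would treat EDEADLK as fatal, as the patch does by setting td->error and jumping to its error path.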

* Recent changes (master)
@ 2016-10-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-21 12:00 UTC
  To: fio

The following changes since commit 99350ae471e1271cae7bb3ef68b5ee0e11c21828:

  Merge branch 'rbd-poll' (2016-10-19 09:24:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e291cff14e97feb3cff711f5a5cbcb63b32f9c72:

  Use fmt -w WIDTH option instead of -WIDTH (2016-10-20 08:10:10 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (2):
      Remove getopt_long_only macro from NetBSD header
      Use fmt -w WIDTH option instead of -WIDTH

 Makefile       | 4 ++--
 os/os-netbsd.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 6b5548a..b3a12dd 100644
--- a/Makefile
+++ b/Makefile
@@ -315,7 +315,7 @@ override CFLAGS += -DFIO_VERSION='"$(FIO_VERSION)"'
 	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
 	@mv -f $*.d $*.d.tmp
 	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -1 | \
+	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
 		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
 	@rm -f $*.d.tmp
 
@@ -354,7 +354,7 @@ init.o: init.c FIO-VERSION-FILE
 	@$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(SRCDIR)/$*.c > $*.d
 	@mv -f $*.d $*.d.tmp
 	@sed -e 's|.*:|$*.o:|' < $*.d.tmp > $*.d
-	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -1 | \
+	@sed -e 's/.*://' -e 's/\\$$//' < $*.d.tmp | fmt -w 1 | \
 		sed -e 's/^ *//' -e 's/$$/:/' >> $*.d
 	@rm -f $*.d.tmp
 
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 4c629dd..1ef5866 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -74,7 +74,4 @@ static inline unsigned long long get_fs_free_size(const char *path)
 #define FIO_MADV_FREE	MADV_FREE
 #endif
 
-/* XXX NetBSD doesn't have getopt_long_only */
-#define	getopt_long_only	getopt_long
-
 #endif


* Recent changes (master)
@ 2016-10-20 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-20 12:00 UTC
  To: fio

The following changes since commit 0228cfe7bb04b3c8329f8e77ee47e30e1a5a03cd:

  HOWTO: update to include iolog replay (2016-10-18 09:12:45 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 99350ae471e1271cae7bb3ef68b5ee0e11c21828:

  Merge branch 'rbd-poll' (2016-10-19 09:24:04 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      rbd: poll cleanups
      Merge branch 'rbd-poll'

Pan Liu (1):
      use poll() and rbd_poll_io_events to speed up io retrieval.

 configure     | 29 ++++++++++++++++++++++
 engines/rbd.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index e91ec25..833b6d3 100755
--- a/configure
+++ b/configure
@@ -1335,6 +1335,32 @@ fi
 echo "Rados Block Device engine     $rbd"
 
 ##########################################
+# check for rbd_poll
+rbd_poll="no"
+if test "$rbd" = "yes"; then
+cat > $TMPC << EOF
+#include <rbd/librbd.h>
+#include <sys/eventfd.h>
+
+int main(int argc, char **argv)
+{
+  rbd_image_t image;
+  rbd_completion_t comp;
+
+  int fd = eventfd(0, EFD_NONBLOCK);
+  rbd_set_image_notification(image, fd, EVENT_TYPE_EVENTFD);
+  rbd_poll_io_events(image, comp, 1);
+
+  return 0;
+}
+EOF
+if compile_prog "" "-lrbd -lrados" "rbd"; then
+  rbd_poll="yes"
+fi
+echo "rbd_poll                      $rbd_poll"
+fi
+
+##########################################
 # check for rbd_invaidate_cache()
 rbd_inval="no"
 if test "$rbd" = "yes"; then
@@ -1853,6 +1879,9 @@ fi
 if test "$rbd" = "yes" ; then
   output_sym "CONFIG_RBD"
 fi
+if test "$rbd_poll" = "yes" ; then
+  output_sym "CONFIG_RBD_POLL"
+fi
 if test "$rbd_inval" = "yes" ; then
   output_sym "CONFIG_RBD_INVAL"
 fi
diff --git a/engines/rbd.c b/engines/rbd.c
index 5e17fbe..aa50c80 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -13,6 +13,12 @@
 #include <zipkin_c.h>
 #endif
 
+#ifdef CONFIG_RBD_POLL
+/* add for poll */
+#include <poll.h>
+#include <sys/eventfd.h>
+#endif
+
 struct fio_rbd_iou {
 	struct io_u *io_u;
 	rbd_completion_t completion;
@@ -29,6 +35,7 @@ struct rbd_data {
 	rbd_image_t image;
 	struct io_u **aio_events;
 	struct io_u **sort_events;
+	int fd; /* add for poll */
 };
 
 struct rbd_options {
@@ -104,6 +111,9 @@ static int _fio_setup_rbd_data(struct thread_data *td,
 	if (!rbd)
 		goto failed;
 
+	/* add for poll, init fd: -1 */
+	rbd->fd = -1;
+
 	rbd->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
 	if (!rbd->aio_events)
 		goto failed;
@@ -127,6 +137,35 @@ failed:
 
 }
 
+#ifdef CONFIG_RBD_POLL
+static bool _fio_rbd_setup_poll(struct rbd_data *rbd)
+{
+	int r;
+
+	/* add for rbd poll */
+	rbd->fd = eventfd(0, EFD_NONBLOCK);
+	if (rbd->fd < 0) {
+		log_err("eventfd failed.\n");
+		return false;
+	}
+
+	r = rbd_set_image_notification(rbd->image, rbd->fd, EVENT_TYPE_EVENTFD);
+	if (r < 0) {
+		log_err("rbd_set_image_notification failed.\n");
+		close(rbd->fd);
+		rbd->fd = -1;
+		return false;
+	}
+
+	return true;
+}
+#else
+static bool _fio_rbd_setup_poll(struct rbd_data *rbd)
+{
+	return true;
+}
+#endif
+
 static int _fio_rbd_connect(struct thread_data *td)
 {
 	struct rbd_data *rbd = td->io_ops_data;
@@ -188,8 +227,15 @@ static int _fio_rbd_connect(struct thread_data *td)
 		log_err("rbd_open failed.\n");
 		goto failed_open;
 	}
+
+	if (!_fio_rbd_setup_poll(rbd))
+		goto failed_poll;
+
 	return 0;
 
+failed_poll:
+	rbd_close(rbd->image);
+	rbd->image = NULL;
 failed_open:
 	rados_ioctx_destroy(rbd->io_ctx);
 	rbd->io_ctx = NULL;
@@ -205,6 +251,12 @@ static void _fio_rbd_disconnect(struct rbd_data *rbd)
 	if (!rbd)
 		return;
 
+	/* close eventfd */
+	if (rbd->fd != -1) {
+		close(rbd->fd);
+		rbd->fd = -1;
+	}
+
 	/* shutdown everything */
 	if (rbd->image) {
 		rbd_close(rbd->image);
@@ -304,10 +356,32 @@ static int rbd_iter_events(struct thread_data *td, unsigned int *events,
 	struct rbd_data *rbd = td->io_ops_data;
 	unsigned int this_events = 0;
 	struct io_u *io_u;
-	int i, sidx;
+	int i, sidx = 0;
+
+#ifdef CONFIG_RBD_POLL
+	int ret = 0;
+	int event_num = 0;
+	struct fio_rbd_iou *fri = NULL;
+	rbd_completion_t comps[min_evts];
 
-	sidx = 0;
+	struct pollfd pfd;
+	pfd.fd = rbd->fd;
+	pfd.events = POLLIN;
+
+	ret = poll(&pfd, 1, -1);
+	if (ret <= 0)
+		return 0;
+
+	assert(pfd.revents & POLLIN);
+
+	event_num = rbd_poll_io_events(rbd->image, comps, min_evts);
+
+	for (i = 0; i < event_num; i++) {
+		fri = rbd_aio_get_arg(comps[i]);
+		io_u = fri->io_u;
+#else
 	io_u_qiter(&td->io_u_all, io_u, i) {
+#endif
 		if (!(io_u->flags & IO_U_F_FLIGHT))
 			continue;
 		if (rbd_io_u_seen(io_u))

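The rbd_poll change replaces a scan of every in-flight io_u with an eventfd that librbd signals on completion, plus a poll() on that descriptor. The notification half can be sketched on Linux without librbd: below, a plain write() to the eventfd stands in for what rbd_set_image_notification() arranges, and both helper names are hypothetical. Unlike the patch, which calls poll() with an infinite timeout, the sketch takes a timeout so it cannot block forever.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Producer side: signal that 'n' completions are ready. */
static int notify_completions(int efd, uint64_t n)
{
	return write(efd, &n, sizeof(n)) == (ssize_t) sizeof(n) ? 0 : -1;
}

/*
 * Reaper side: wait up to 'timeout_ms' for the eventfd to become
 * readable, then read and return the accumulated completion count.
 * Returns 0 on timeout, -1 on a failed read.
 */
static int64_t wait_completions(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return 0;	/* timeout or poll error: nothing to reap */
	if (!(pfd.revents & POLLIN))
		return 0;
	if (read(efd, &count, sizeof(count)) != (ssize_t) sizeof(count))
		return -1;
	return (int64_t) count;
}
```

Because the eventfd is created without EFD_SEMAPHORE, one read() returns the summed counter and resets it to zero, so several completions signalled before the reaper wakes up are collected in a single pass.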

* Recent changes (master)
@ 2016-10-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-19 12:00 UTC
  To: fio

The following changes since commit dac34079f7d8cfb0a64e4d3f086728edd7eded6d:

  iolog: enable replay_redirect on iolog replay (2016-10-17 14:51:54 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0228cfe7bb04b3c8329f8e77ee47e30e1a5a03cd:

  HOWTO: update to include iolog replay (2016-10-18 09:12:45 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      iolog: add support for 'replay_no_stall'
      HOWTO: update to include iolog replay

 HOWTO   | 19 ++++++++++---------
 iolog.c |  2 ++
 2 files changed, 12 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index cf1024c..3f8acee 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1558,10 +1558,10 @@ read_iolog=str	Open an iolog with the specified file name and replay the
 
 replay_no_stall=int When replaying I/O with read_iolog the default behavior
 		is to attempt to respect the time stamps within the log and
-		replay them with the appropriate delay between IOPS.  By
+		replay them with the appropriate delay between IOPS. By
 		setting this variable fio will not respect the timestamps and
 		attempt to replay them as fast as possible while still
-		respecting ordering.  The result is the same I/O pattern to a
+		respecting ordering. The result is the same I/O pattern to a
 		given device, but different timings.
 
 replay_redirect=str While replaying I/O patterns using read_iolog the
@@ -1573,13 +1573,14 @@ replay_redirect=str While replaying I/O patterns using read_iolog the
 		mapping.  Replay_redirect causes all IOPS to be replayed onto
 		the single specified device regardless of the device it was
 		recorded from. i.e. replay_redirect=/dev/sdc would cause all
-		IO in the blktrace to be replayed onto /dev/sdc.  This means
-		multiple devices will be replayed onto a single, if the trace
-		contains multiple devices.  If you want multiple devices to be
-		replayed concurrently to multiple redirected devices you must
-		blkparse your trace into separate traces and replay them with
-		independent fio invocations.  Unfortuantely this also breaks
-		the strict time ordering between multiple device accesses.
+		IO in the blktrace or iolog to be replayed onto /dev/sdc.
+		This means multiple devices will be replayed onto a single
+		device, if the trace contains multiple devices. If you want
+		multiple devices to be replayed concurrently to multiple
+		redirected devices you must blkparse your trace into separate
+		traces and replay them with independent fio invocations.
+		Unfortuantely this also breaks the strict time ordering
+		between multiple device accesses.
 
 replay_align=int	Force alignment of IO offsets and lengths in a trace
 		to this power of 2 value.
diff --git a/iolog.c b/iolog.c
index 686c713..f0ce3b2 100644
--- a/iolog.c
+++ b/iolog.c
@@ -425,6 +425,8 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 				continue;
 			writes++;
 		} else if (rw == DDIR_WAIT) {
+			if (td->o.no_stall)
+				continue;
 			waits++;
 		} else if (rw == DDIR_INVAL) {
 		} else if (!ddir_sync(rw)) {

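Taken together, replay_no_stall and replay_redirect only change how parsed log entries are consumed: wait entries are skipped, and the recorded file name is swapped for the redirect target. A minimal sketch of that filtering step follows. It assumes a simplified, hypothetical entry type rather than fio's struct io_piece, and apply_replay_opts() is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a parsed iolog entry. */
enum demo_ddir { DEMO_READ, DEMO_WRITE, DEMO_WAIT };

struct demo_entry {
	const char *fname;	/* device or file recorded in the log */
	enum demo_ddir ddir;
};

/*
 * Copy 'n' entries from 'in' to 'out', applying the two options:
 *  - no_stall: drop WAIT entries so I/O is issued back to back
 *  - redirect: if non-NULL, send every kept entry to this target
 * Returns the number of entries written to 'out'.
 */
static size_t apply_replay_opts(const struct demo_entry *in, size_t n,
				int no_stall, const char *redirect,
				struct demo_entry *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (no_stall && in[i].ddir == DEMO_WAIT)
			continue;
		out[kept] = in[i];
		if (redirect)
			out[kept].fname = redirect;
		kept++;
	}
	return kept;
}
```

In the real read_iolog2(), the malloc'ed name buffer is kept in rfname for this reason: with a redirect active, fname points at the option string, which must not be freed.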

* Recent changes (master)
@ 2016-10-18 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-18 12:00 UTC
  To: fio

The following changes since commit a4f581f981a2a0e19d22339d9fdf17b3aaeb12b8:

  Update REPORTING-BUGS (2016-10-14 14:07:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to dac34079f7d8cfb0a64e4d3f086728edd7eded6d:

  iolog: enable replay_redirect on iolog replay (2016-10-17 14:51:54 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      iolog: enable replay_redirect on iolog replay

Tomohiro Kusumi (4):
      Add FIO_HAVE_FS_STAT/get_fs_free_size() support for NetBSD
      Add FIO_HAVE_FS_STAT/get_fs_free_size() support for OpenBSD
      Fix e7e136da (Add device_is_mounted() support for BSDs)
      Add device_is_mounted() support for NetBSD

 configure        | 29 +++++++++++++++++++++++++++--
 iolog.c          | 12 ++++++++----
 lib/mountcheck.c | 23 ++++++++++++++++++++++-
 os/os-netbsd.h   | 15 +++++++++++++++
 os/os-openbsd.h  | 15 +++++++++++++++
 5 files changed, 87 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index a24e3ef..e91ec25 100755
--- a/configure
+++ b/configure
@@ -1621,6 +1621,11 @@ echo "getmntent                     $getmntent"
 
 ##########################################
 # Check whether we have getmntinfo
+# These are originally added for BSDs, but may also work
+# on other operating systems with getmntinfo(3).
+
+# getmntinfo(3) for FreeBSD/DragonFlyBSD/OpenBSD.
+# Note that NetBSD needs -Werror to catch warning as error.
 getmntinfo="no"
 cat > $TMPC << EOF
 #include <stdio.h>
@@ -1628,15 +1633,32 @@ cat > $TMPC << EOF
 #include <sys/mount.h>
 int main(int argc, char **argv)
 {
-  struct statfs st;
+  struct statfs *st;
   return getmntinfo(&st, MNT_NOWAIT);
 }
 EOF
-if compile_prog "" "" "getmntinfo"; then
+if compile_prog "-Werror" "" "getmntinfo"; then
   getmntinfo="yes"
 fi
 echo "getmntinfo                    $getmntinfo"
 
+# getmntinfo(3) for NetBSD.
+getmntinfo_statvfs="no"
+cat > $TMPC << EOF
+#include <stdio.h>
+#include <sys/statvfs.h>
+int main(int argc, char **argv)
+{
+  struct statvfs *st;
+  return getmntinfo(&st, MNT_NOWAIT);
+}
+EOF
+# Skip the test if the one with statfs arg is detected.
+if test "$getmntinfo" != "yes" && compile_prog "-Werror" "" "getmntinfo_statvfs"; then
+  getmntinfo_statvfs="yes"
+  echo "getmntinfo_statvfs            $getmntinfo_statvfs"
+fi
+
 ##########################################
 # Check whether we have _Static_assert
 static_assert="no"
@@ -1883,6 +1905,9 @@ fi
 if test "$getmntinfo" = "yes" ; then
   output_sym "CONFIG_GETMNTINFO"
 fi
+if test "$getmntinfo_statvfs" = "yes" ; then
+  output_sym "CONFIG_GETMNTINFO_STATVFS"
+fi
 if test "$static_assert" = "yes" ; then
   output_sym "CONFIG_STATIC_ASSERT"
 fi
diff --git a/iolog.c b/iolog.c
index ab9c878..686c713 100644
--- a/iolog.c
+++ b/iolog.c
@@ -346,7 +346,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 	unsigned long long offset;
 	unsigned int bytes;
 	int reads, writes, waits, fileno = 0, file_action = 0; /* stupid gcc */
-	char *fname, *act;
+	char *rfname, *fname, *act;
 	char *str, *p;
 	enum fio_ddir rw;
 
@@ -357,7 +357,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 	 * for doing verifications.
 	 */
 	str = malloc(4096);
-	fname = malloc(256+16);
+	rfname = fname = malloc(256+16);
 	act = malloc(256+16);
 
 	reads = writes = waits = 0;
@@ -365,8 +365,12 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 		struct io_piece *ipo;
 		int r;
 
-		r = sscanf(p, "%256s %256s %llu %u", fname, act, &offset,
+		r = sscanf(p, "%256s %256s %llu %u", rfname, act, &offset,
 									&bytes);
+
+		if (td->o.replay_redirect)
+			fname = td->o.replay_redirect;
+
 		if (r == 4) {
 			/*
 			 * Check action first
@@ -451,7 +455,7 @@ static int read_iolog2(struct thread_data *td, FILE *f)
 
 	free(str);
 	free(act);
-	free(fname);
+	free(rfname);
 
 	if (writes && read_only) {
 		log_err("fio: <%s> skips replay of %d writes due to"
diff --git a/lib/mountcheck.c b/lib/mountcheck.c
index e8780eb..0aec744 100644
--- a/lib/mountcheck.c
+++ b/lib/mountcheck.c
@@ -32,7 +32,7 @@ int device_is_mounted(const char *dev)
 }
 
 #elif defined(CONFIG_GETMNTINFO)
-/* for BSDs */
+/* for most BSDs */
 #include <sys/param.h>
 #include <sys/mount.h>
 
@@ -53,6 +53,27 @@ int device_is_mounted(const char *dev)
 	return 0;
 }
 
+#elif defined(CONFIG_GETMNTINFO_STATVFS)
+/* for NetBSD */
+#include <sys/statvfs.h>
+
+int device_is_mounted(const char *dev)
+{
+	struct statvfs *st;
+	int i, ret;
+
+	ret = getmntinfo(&st, MNT_NOWAIT);
+	if (ret <= 0)
+		return 0;
+
+	for (i = 0; i < ret; i++) {
+		if (!strcmp(st[i].f_mntfromname, dev))
+			return 1;
+	}
+
+	return 0;
+}
+
 #else
 /* others */
 
diff --git a/os/os-netbsd.h b/os/os-netbsd.h
index 4b0269e..4c629dd 100644
--- a/os/os-netbsd.h
+++ b/os/os-netbsd.h
@@ -6,6 +6,7 @@
 #include <errno.h>
 #include <lwp.h>
 #include <sys/param.h>
+#include <sys/statvfs.h>
 /* XXX hack to avoid confilcts between rbtree.h and <sys/rb.h> */
 #define	rb_node	_rb_node
 #include <sys/sysctl.h>
@@ -19,6 +20,7 @@
 #define FIO_USE_GENERIC_BDEV_SIZE
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
+#define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
 
 #undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
@@ -55,6 +57,19 @@ static inline int gettid(void)
 	return (int) _lwp_self();
 }
 
+static inline unsigned long long get_fs_free_size(const char *path)
+{
+	unsigned long long ret;
+	struct statvfs s;
+
+	if (statvfs(path, &s) < 0)
+		return -1ULL;
+
+	ret = s.f_frsize;
+	ret *= (unsigned long long) s.f_bfree;
+	return ret;
+}
+
 #ifdef MADV_FREE
 #define FIO_MADV_FREE	MADV_FREE
 #endif
diff --git a/os/os-openbsd.h b/os/os-openbsd.h
index b1d8e83..2998510 100644
--- a/os/os-openbsd.h
+++ b/os/os-openbsd.h
@@ -5,6 +5,7 @@
 
 #include <errno.h>
 #include <sys/param.h>
+#include <sys/statvfs.h>
 /* XXX hack to avoid conflicts between rbtree.h and <sys/tree.h> */
 #include <sys/sysctl.h>
 #undef RB_BLACK
@@ -17,6 +18,7 @@
 #define FIO_USE_GENERIC_BDEV_SIZE
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
+#define FIO_HAVE_FS_STAT
 #define FIO_HAVE_GETTID
 
 #undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
@@ -53,6 +55,19 @@ static inline int gettid(void)
 	return (int) pthread_self();
 }
 
+static inline unsigned long long get_fs_free_size(const char *path)
+{
+	unsigned long long ret;
+	struct statvfs s;
+
+	if (statvfs(path, &s) < 0)
+		return -1ULL;
+
+	ret = s.f_frsize;
+	ret *= (unsigned long long) s.f_bfree;
+	return ret;
+}
+
 #ifdef MADV_FREE
 #define FIO_MADV_FREE	MADV_FREE
 #endif

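The new NetBSD and OpenBSD get_fs_free_size() helpers compute free space as f_frsize * f_bfree from statvfs(3). The same calculation is plain POSIX, so it can be sketched and tried on Linux as well; the function name below is hypothetical.

```c
#include <assert.h>
#include <sys/statvfs.h>

/* Free bytes on the filesystem holding 'path', or -1ULL on error. */
static unsigned long long fs_free_bytes(const char *path)
{
	struct statvfs s;
	unsigned long long ret;

	if (statvfs(path, &s) < 0)
		return -1ULL;

	/* f_bfree is counted in units of f_frsize (fragment size) */
	ret = s.f_frsize;
	ret *= (unsigned long long) s.f_bfree;
	return ret;
}
```

f_bfree includes blocks reserved for the superuser; f_bavail is the field to use for space available to unprivileged processes.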

* Recent changes (master)
@ 2016-10-15 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-15 12:00 UTC
  To: fio

The following changes since commit afd2ceffd65514a54665493075d7957ec9d0e5fc:

  Add alias for 'disable_bw_measurement' option (2016-10-12 08:59:25 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a4f581f981a2a0e19d22339d9fdf17b3aaeb12b8:

  Update REPORTING-BUGS (2016-10-14 14:07:43 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Update REPORTING-BUGS

 REPORTING-BUGS | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/REPORTING-BUGS b/REPORTING-BUGS
index c6150d1..d8876ae 100644
--- a/REPORTING-BUGS
+++ b/REPORTING-BUGS
@@ -2,8 +2,10 @@ Reporting a bug
 ---------------
 
 If you notice anything that seems like a fio bug, please do send email
-to the list (fio@vger.kernel.org, see README) about it. You'll need
-to report at least:
+to the list (fio@vger.kernel.org, see README) about it. If you are not
+running the newest release of fio, upgrading first is recommended.
+
+When reporting a bug, you'll need to include:
 
 1) A description of what you think the bug is
 2) Environment (Linux distro version, kernel version). This is mostly
@@ -12,4 +14,8 @@ to report at least:
 4) How to reproduce. Please include a full list of the parameters
    passed to fio and the job file used (if any).
 
+A bug report can never have too much information. Any time information
+is left out and has to be asked for, it'll add to the turn-around time
+of getting to the bottom of it and committing a fix.
+
 That's it!


* Recent changes (master)
@ 2016-10-13 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-13 12:00 UTC
  To: fio

The following changes since commit d23ae82785f99927331e358d2c0deac5e53f2df1:

  Update bandwidth log documentation (2016-10-11 16:04:28 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to afd2ceffd65514a54665493075d7957ec9d0e5fc:

  Add alias for 'disable_bw_measurement' option (2016-10-12 08:59:25 -0600)

----------------------------------------------------------------
Bruce Cran (1):
      Implement nice() for Windows

Jens Axboe (1):
      Add alias for 'disable_bw_measurement' option

 HOWTO              |  4 ++++
 options.c          |  1 +
 os/windows/posix.c | 17 +++++++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

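The options.c change adds an .alias so that disable_bw is accepted as a shorter spelling of disable_bw_measurement. A lookup that honours such aliases can be sketched as below; the table and find_option() are hypothetical and far simpler than fio's real option parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_option {
	const char *name;
	const char *alias;	/* optional alternate spelling, may be NULL */
};

/* Hypothetical option table; only the first entry mirrors the patch. */
static const struct demo_option demo_options[] = {
	{ .name = "disable_bw_measurement", .alias = "disable_bw" },
	{ .name = "disable_lat", .alias = NULL },
};

/* Match 'key' against each option's name or alias. */
static const struct demo_option *find_option(const char *key)
{
	size_t n = sizeof(demo_options) / sizeof(demo_options[0]);

	for (size_t i = 0; i < n; i++) {
		const struct demo_option *o = &demo_options[i];

		if (!strcmp(o->name, key) ||
		    (o->alias && !strcmp(o->alias, key)))
			return o;
	}
	return NULL;
}
```

Both spellings resolve to the same table entry, so everything downstream of the lookup is unaffected by which one the job file used.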
---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 07419a1..cf1024c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1066,6 +1066,10 @@ random_generator=str	Fio supports the following engines for generating
 
 nice=int	Run the job with the given nice value. See man nice(2).
 
+     On Windows, values less than -15 set the process class to "High";
+     -1 through -15 set "Above Normal"; 1 through 15 "Below Normal";
+     and above 15 "Idle" priority class.
+
 prio=int	Set the io priority value of this job. Linux limits us to
 		a positive value between 0 and 7, with 0 being the highest.
 		See man ionice(1). Refer to an appropriate manpage for
diff --git a/options.c b/options.c
index bcda556..3c9adfb 100644
--- a/options.c
+++ b/options.c
@@ -3920,6 +3920,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "disable_bw_measurement",
+		.alias	= "disable_bw",
 		.lname	= "Disable bandwidth stats",
 		.type	= FIO_OPT_BOOL,
 		.off1	= offsetof(struct thread_options, disable_bw),
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 3388127..bbd93e9 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -647,10 +647,19 @@ int setgid(gid_t gid)
 
 int nice(int incr)
 {
-	if (incr != 0) {
-		errno = EINVAL;
-		return -1;
-	}
+	DWORD prioclass = NORMAL_PRIORITY_CLASS;
+	
+	if (incr < -15)
+		prioclass = HIGH_PRIORITY_CLASS;
+	else if (incr < 0)
+		prioclass = ABOVE_NORMAL_PRIORITY_CLASS;
+	else if (incr > 15)
+		prioclass = IDLE_PRIORITY_CLASS;
+	else if (incr > 0)
+		prioclass = BELOW_NORMAL_PRIORITY_CLASS;
+	
+	if (!SetPriorityClass(GetCurrentProcess(), prioclass))
+		log_err("fio: SetPriorityClass failed\n");
 
 	return 0;
 }

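The Windows nice() emulation maps the requested increment onto one of five priority classes, matching the ranges added to HOWTO. The mapping itself is easy to isolate and test; the sketch below uses a hypothetical enum in place of the Win32 *_PRIORITY_CLASS constants that the real code passes to SetPriorityClass().

```c
#include <assert.h>

/* Hypothetical stand-ins for the Win32 priority class constants. */
enum demo_prio_class {
	DEMO_IDLE,
	DEMO_BELOW_NORMAL,
	DEMO_NORMAL,
	DEMO_ABOVE_NORMAL,
	DEMO_HIGH,
};

/* Same ranges as the patch: the more negative the increment, the higher the class. */
static enum demo_prio_class nice_to_prio_class(int incr)
{
	if (incr < -15)
		return DEMO_HIGH;
	if (incr < 0)
		return DEMO_ABOVE_NORMAL;
	if (incr > 15)
		return DEMO_IDLE;
	if (incr > 0)
		return DEMO_BELOW_NORMAL;
	return DEMO_NORMAL;
}
```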

* Recent changes (master)
@ 2016-10-12 12:00 Jens Axboe
From: Jens Axboe @ 2016-10-12 12:00 UTC
  To: fio

The following changes since commit d619275cfad893e37e8c59e5c9e0bc5ca4946b82:

  filehash: fix init/exit (2016-09-27 10:28:29 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d23ae82785f99927331e358d2c0deac5e53f2df1:

  Update bandwidth log documentation (2016-10-11 16:04:28 -0600)

----------------------------------------------------------------
Bruce Cran (1):
      windowsaio: fix completion thread affinitization

Omar Sandoval (4):
      init: fix --bandwidth-log without argument
      iolog: fix --bandwidth-log segfaults
      iolog: make write_*_log prefix optional
      Update bandwidth log documentation

 README               |  2 +-
 cconv.c              |  8 ++++++++
 engines/windowsaio.c |  9 +++++++--
 fio.1                | 16 +++++++--------
 init.c               | 28 +++++++++++++++-----------
 iolog.c              |  2 +-
 options.c            | 56 ++++++++++++++++++++++++++++++++++++++++++++++++----
 server.h             |  2 +-
 stat.c               |  2 +-
 thread_options.h     | 12 +++++++++++
 10 files changed, 107 insertions(+), 30 deletions(-)

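After "iolog: make write_*_log prefix optional", whether a log is written is tracked separately (write_lat_log, write_bw_log, ...), and the file prefix falls back to the job name when none is given (pre = o->lat_log_file ? o->lat_log_file : o->name). A minimal sketch of the resulting name generation follows. make_log_name() is hypothetical, and the "<prefix>_<type>.<N>.<suffix>" layout is an assumption based on the _bw.x.log postfix documented in fio.1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a log file name. Uses 'prefix' if the user supplied one,
 * otherwise the job name. The job index is included only when
 * 'per_job' is set, mirroring the per_job_logs option.
 */
static void make_log_name(char *buf, size_t size, const char *type,
			  const char *prefix, const char *job_name,
			  unsigned int job_num, const char *suf, int per_job)
{
	const char *pre = prefix ? prefix : job_name;

	if (per_job)
		snprintf(buf, size, "%s_%s.%u.%s", pre, type, job_num, suf);
	else
		snprintf(buf, size, "%s_%s.%s", pre, type, suf);
}
```

For a job named seqread with no explicit prefix, this yields seqread_bw.1.log; with a prefix of mylog and per-job logs disabled, it yields mylog_bw.log.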
---

Diff of recent changes:

diff --git a/README b/README
index a69a578..a8a4fdf 100644
--- a/README
+++ b/README
@@ -149,7 +149,7 @@ $ fio
 	--parse-only		Parse options only, don't start any IO
 	--output		Write output to file
 	--runtime		Runtime in seconds
-	--bandwidth-log		Generate per-job bandwidth logs
+	--bandwidth-log		Generate aggregate bandwidth logs
 	--minimal		Minimal (terse) output
 	--output-format=type	Output format (terse,json,json+,normal)
 	--terse-version=type	Terse version output format (default 3, or 2 or 4).
diff --git a/cconv.c b/cconv.c
index 194e342..6e0f609 100644
--- a/cconv.c
+++ b/cconv.c
@@ -280,6 +280,10 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->replay_align = le32_to_cpu(top->replay_align);
 	o->replay_scale = le32_to_cpu(top->replay_scale);
 	o->per_job_logs = le32_to_cpu(top->per_job_logs);
+	o->write_bw_log = le32_to_cpu(top->write_bw_log);
+	o->write_lat_log = le32_to_cpu(top->write_lat_log);
+	o->write_iops_log = le32_to_cpu(top->write_iops_log);
+	o->write_hist_log = le32_to_cpu(top->write_hist_log);
 
 	o->trim_backlog = le64_to_cpu(top->trim_backlog);
 	o->rate_process = le32_to_cpu(top->rate_process);
@@ -460,6 +464,10 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->replay_align = cpu_to_le32(o->replay_align);
 	top->replay_scale = cpu_to_le32(o->replay_scale);
 	top->per_job_logs = cpu_to_le32(o->per_job_logs);
+	top->write_bw_log = cpu_to_le32(o->write_bw_log);
+	top->write_lat_log = cpu_to_le32(o->write_lat_log);
+	top->write_iops_log = cpu_to_le32(o->write_iops_log);
+	top->write_hist_log = cpu_to_le32(o->write_hist_log);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		top->bs[i] = cpu_to_le32(o->bs[i]);
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index 0e164b6..f5cb048 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -113,10 +113,15 @@ static int fio_windowsaio_init(struct thread_data *td)
 
 		if (!rc)
 		{
+			DWORD threadid;
+
 			ctx->iocp = hFile;
 			ctx->wd = wd;
-			wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, NULL);
-			if (wd->iothread == NULL)
+			wd->iothread = CreateThread(NULL, 0, IoCompletionRoutine, ctx, 0, &threadid);
+
+			if (wd->iothread != NULL)
+				fio_setaffinity(threadid, td->o.cpumask);
+			else
 				log_err("windowsaio: failed to create io completion thread\n");
 		}
 
diff --git a/fio.1 b/fio.1
index 8d596fb..48c2060 100644
--- a/fio.1
+++ b/fio.1
@@ -30,7 +30,7 @@ dump of the latency buckets.
 Limit run time to \fIruntime\fR seconds.
 .TP
 .B \-\-bandwidth\-log
-Generate per-job bandwidth logs.
+Generate aggregate bandwidth logs.
 .TP
 .B \-\-minimal
 Print statistics in a terse, semicolon-delimited format.
@@ -1462,13 +1462,13 @@ If set, this generates bw/clat/iops log with per file private filenames. If
 not set, jobs with identical names will share the log filename. Default: true.
 .TP
 .BI write_bw_log \fR=\fPstr
-If given, write a bandwidth log of the jobs in this job file. Can be used to
-store data of the bandwidth of the jobs in their lifetime. The included
-fio_generate_plots script uses gnuplot to turn these text files into nice
-graphs. See \fBwrite_lat_log\fR for behaviour of given filename. For this
-option, the postfix is _bw.x.log, where x is the index of the job (1..N,
-where N is the number of jobs). If \fBper_job_logs\fR is false, then the
-filename will not include the job index. See the \fBLOG FILE FORMATS\fR
+If given, write a bandwidth log for this job. Can be used to store data of the
+bandwidth of the jobs in their lifetime. The included fio_generate_plots script
+uses gnuplot to turn these text files into nice graphs. See \fBwrite_lat_log\fR
+for behaviour of given filename. For this option, the postfix is _bw.x.log,
+where x is the index of the job (1..N, where N is the number of jobs). If
+\fBper_job_logs\fR is false, then the filename will not include the job index.
+See the \fBLOG FILE FORMATS\fR
 section.
 .TP
 .BI write_lat_log \fR=\fPstr
diff --git a/init.c b/init.c
index c556fa2..d8c0bd1 100644
--- a/init.c
+++ b/init.c
@@ -103,7 +103,7 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 	},
 	{
 		.name		= (char *) "bandwidth-log",
-		.has_arg	= required_argument,
+		.has_arg	= no_argument,
 		.val		= 'b' | FIO_CLIENT_FLAG,
 	},
 	{
@@ -1389,7 +1389,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (setup_rate(td))
 		goto err;
 
-	if (o->lat_log_file) {
+	if (o->write_lat_log) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
@@ -1400,6 +1400,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
+		const char *pre = o->lat_log_file ? o->lat_log_file : o->name;
 		const char *suf;
 
 		if (p.log_gz_store)
@@ -1407,20 +1408,20 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		else
 			suf = "log";
 
-		gen_log_name(logname, sizeof(logname), "lat", o->lat_log_file,
+		gen_log_name(logname, sizeof(logname), "lat", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->lat_log, &p, logname);
 
-		gen_log_name(logname, sizeof(logname), "slat", o->lat_log_file,
+		gen_log_name(logname, sizeof(logname), "slat", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->slat_log, &p, logname);
 
-		gen_log_name(logname, sizeof(logname), "clat", o->lat_log_file,
+		gen_log_name(logname, sizeof(logname), "clat", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->clat_log, &p, logname);
 	}
 
-	if (o->hist_log_file) {
+	if (o->write_hist_log) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
@@ -1431,6 +1432,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
+		const char *pre = o->hist_log_file ? o->hist_log_file : o->name;
 		const char *suf;
 
 #ifndef CONFIG_ZLIB
@@ -1445,12 +1447,12 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		else
 			suf = "log";
 
-		gen_log_name(logname, sizeof(logname), "clat_hist", o->hist_log_file,
+		gen_log_name(logname, sizeof(logname), "clat_hist", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->clat_hist_log, &p, logname);
 	}
 
-	if (o->bw_log_file) {
+	if (o->write_bw_log) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
@@ -1461,6 +1463,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
+		const char *pre = o->bw_log_file ? o->bw_log_file : o->name;
 		const char *suf;
 
 		if (fio_option_is_set(o, bw_avg_time))
@@ -1476,11 +1479,11 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		else
 			suf = "log";
 
-		gen_log_name(logname, sizeof(logname), "bw", o->bw_log_file,
+		gen_log_name(logname, sizeof(logname), "bw", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->bw_log, &p, logname);
 	}
-	if (o->iops_log_file) {
+	if (o->write_iops_log) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
@@ -1491,6 +1494,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			.log_gz = o->log_gz,
 			.log_gz_store = o->log_gz_store,
 		};
+		const char *pre = o->iops_log_file ? o->iops_log_file : o->name;
 		const char *suf;
 
 		if (fio_option_is_set(o, iops_avg_time))
@@ -1506,7 +1510,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		else
 			suf = "log";
 
-		gen_log_name(logname, sizeof(logname), "iops", o->iops_log_file,
+		gen_log_name(logname, sizeof(logname), "iops", pre,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->iops_log, &p, logname);
 	}
@@ -2001,7 +2005,7 @@ static void usage(const char *name)
 	printf("  --parse-only\t\tParse options only, don't start any IO\n");
 	printf("  --output\t\tWrite output to file\n");
 	printf("  --runtime\t\tRuntime in seconds\n");
-	printf("  --bandwidth-log\tGenerate per-job bandwidth logs\n");
+	printf("  --bandwidth-log\tGenerate aggregate bandwidth logs\n");
 	printf("  --minimal\t\tMinimal (terse) output\n");
 	printf("  --output-format=x\tOutput format (terse,json,json+,normal)\n");
 	printf("  --terse-version=x\tSet terse version output format to 'x'\n");
diff --git a/iolog.c b/iolog.c
index 6576ca5..ab9c878 100644
--- a/iolog.c
+++ b/iolog.c
@@ -1065,7 +1065,7 @@ void flush_log(struct io_log *log, bool do_append)
 		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
 		flist_del_init(&cur_log->list);
 		
-		if (log == log->td->clat_hist_log)
+		if (log->td && log == log->td->clat_hist_log)
 			flush_hist_samples(f, log->hist_coarseness, cur_log->log,
 			                   log_sample_sz(log, cur_log));
 		else
diff --git a/options.c b/options.c
index 50b4d09..bcda556 100644
--- a/options.c
+++ b/options.c
@@ -1311,6 +1311,50 @@ static int str_size_cb(void *data, unsigned long long *__val)
 	return 0;
 }
 
+static int str_write_bw_log_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+
+	if (str)
+		td->o.bw_log_file = strdup(str);
+
+	td->o.write_bw_log = 1;
+	return 0;
+}
+
+static int str_write_lat_log_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+
+	if (str)
+		td->o.lat_log_file = strdup(str);
+
+	td->o.write_lat_log = 1;
+	return 0;
+}
+
+static int str_write_iops_log_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+
+	if (str)
+		td->o.iops_log_file = strdup(str);
+
+	td->o.write_iops_log = 1;
+	return 0;
+}
+
+static int str_write_hist_log_cb(void *data, const char *str)
+{
+	struct thread_data *td = cb_data_to_td(data);
+
+	if (str)
+		td->o.hist_log_file = strdup(str);
+
+	td->o.write_hist_log = 1;
+	return 0;
+}
+
 static int rw_verify(struct fio_option *o, void *data)
 {
 	struct thread_data *td = cb_data_to_td(data);
@@ -3507,8 +3551,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "write_bw_log",
 		.lname	= "Write bandwidth log",
-		.type	= FIO_OPT_STR_STORE,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, bw_log_file),
+		.cb	= str_write_bw_log_cb,
 		.help	= "Write log of bandwidth during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3516,8 +3561,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "write_lat_log",
 		.lname	= "Write latency log",
-		.type	= FIO_OPT_STR_STORE,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, lat_log_file),
+		.cb	= str_write_lat_log_cb,
 		.help	= "Write log of latency during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3525,8 +3571,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "write_iops_log",
 		.lname	= "Write IOPS log",
-		.type	= FIO_OPT_STR_STORE,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, iops_log_file),
+		.cb	= str_write_iops_log_cb,
 		.help	= "Write log of IOPS during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3566,8 +3613,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "write_hist_log",
 		.lname	= "Write latency histogram logs",
-		.type	= FIO_OPT_STR_STORE,
+		.type	= FIO_OPT_STR,
 		.off1	= offsetof(struct thread_options, hist_log_file),
+		.cb	= str_write_hist_log_cb,
 		.help	= "Write log of latency histograms during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
diff --git a/server.h b/server.h
index 6c572a1..3b592c7 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 57,
+	FIO_SERVER_VER			= 58,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index 3c72a6c..1e889f5 100644
--- a/stat.c
+++ b/stat.c
@@ -2023,7 +2023,7 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
 		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
 		s->val = val;
-		s->time = t + iolog->td->unix_epoch;
+		s->time = t + (iolog->td ? iolog->td->unix_epoch : 0);
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
 
diff --git a/thread_options.h b/thread_options.h
index 1b4590f..5e379e3 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -233,6 +233,12 @@ struct thread_options {
 
 	char *read_iolog_file;
 	char *write_iolog_file;
+
+	unsigned int write_bw_log;
+	unsigned int write_lat_log;
+	unsigned int write_iops_log;
+	unsigned int write_hist_log;
+
 	char *bw_log_file;
 	char *lat_log_file;
 	char *iops_log_file;
@@ -492,6 +498,12 @@ struct thread_options_pack {
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
 	uint8_t write_iolog_file[FIO_TOP_STR_MAX];
+
+	uint32_t write_bw_log;
+	uint32_t write_lat_log;
+	uint32_t write_iops_log;
+	uint32_t write_hist_log;
+
 	uint8_t bw_log_file[FIO_TOP_STR_MAX];
 	uint8_t lat_log_file[FIO_TOP_STR_MAX];
 	uint8_t iops_log_file[FIO_TOP_STR_MAX];

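The patch above turns write_bw_log, write_lat_log, write_iops_log and write_hist_log into flags whose filename argument is optional. The option callback records that logging was requested and stores a prefix only when one is given. add_job() then falls back to the job name when no prefix was stored. Below is a minimal C sketch of that flow for the bandwidth log. The struct, the helper names and the `<prefix>_<type>.<index>.<suffix>` layout are illustrative assumptions, not fio's actual gen_log_name(). The layout follows the man page text: the index is dropped when per_job_logs is false.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the write_bw_log/bw_log_file pair in thread_options. */
struct log_opts {
	unsigned int write_bw_log;	/* set whenever the option appears */
	char *bw_log_file;		/* optional prefix; NULL if omitted */
};

/* Option callback sketch: the argument may be absent (NULL). An empty
 * string is treated like a missing one, which is an assumption here. */
static int parse_bw_log_opt(struct log_opts *o, const char *str)
{
	if (str && *str) {
		size_t n = strlen(str) + 1;

		o->bw_log_file = malloc(n);
		if (!o->bw_log_file)
			return 1;
		memcpy(o->bw_log_file, str, n);
	}
	o->write_bw_log = 1;
	return 0;
}

/* Use the explicit prefix if one was given, else the job name. */
static const char *log_prefix(const struct log_opts *o, const char *jobname)
{
	return o->bw_log_file ? o->bw_log_file : jobname;
}

/* Build "<prefix>_<type>.<index>.<suf>", or "<prefix>_<type>.<suf>"
 * when per-job logs are disabled (assumed layout). */
static void make_log_name(char *buf, size_t size, const char *type,
			  const char *prefix, unsigned int index,
			  const char *suf, int per_job)
{
	if (per_job)
		snprintf(buf, size, "%s_%s.%u.%s", prefix, type, index, suf);
	else
		snprintf(buf, size, "%s_%s.%s", prefix, type, suf);
}
```

With no argument, a job named job1 would get job1_bw.1.log. With write_bw_log=mylog and per_job_logs=0, it would get mylog_bw.log.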

* Recent changes (master)
@ 2016-09-28 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-28 12:00 UTC
  To: fio

The following changes since commit 63a26e05622b0ced2cc685f545f493e794ccc325:

  filehash: move to separate allocation (2016-09-26 01:40:52 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d619275cfad893e37e8c59e5c9e0bc5ca4946b82:

  filehash: fix init/exit (2016-09-27 10:28:29 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      init: remove unused variable
      filehash: fix init/exit

 init.c   | 5 +----
 libfio.c | 3 +++
 2 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 5151ff1..c556fa2 100644
--- a/init.c
+++ b/init.c
@@ -298,7 +298,6 @@ void free_threads_shm(void)
 static void free_shm(void)
 {
 	if (threads) {
-		file_hash_exit();
 		flow_exit();
 		fio_debug_jobp = NULL;
 		free_threads_shm();
@@ -311,6 +310,7 @@ static void free_shm(void)
 
 	options_free(fio_options, &def_thread.o);
 	fio_filelock_exit();
+	file_hash_exit();
 	scleanup();
 }
 
@@ -322,8 +322,6 @@ static void free_shm(void)
  */
 static int setup_thread_area(void)
 {
-	void *hash;
-
 	if (threads)
 		return 0;
 
@@ -368,7 +366,6 @@ static int setup_thread_area(void)
 	fio_debug_jobp = (void *) threads + max_jobs * sizeof(struct thread_data);
 	*fio_debug_jobp = -1;
 
-	file_hash_init();
 	flow_init();
 
 	return 0;
diff --git a/libfio.c b/libfio.c
index d88ed4e..0f9f4e7 100644
--- a/libfio.c
+++ b/libfio.c
@@ -34,6 +34,7 @@
 #include "os/os.h"
 #include "filelock.h"
 #include "helper_thread.h"
+#include "filehash.h"
 
 /*
  * Just expose an empty list, if the OS does not support disk util stats
@@ -376,6 +377,8 @@ int initialize_fio(char *envp[])
 		return 1;
 	}
 
+	file_hash_init();
+
 	/*
 	 * We need locale for number printing, if it isn't set then just
 	 * go with the US format.


* Recent changes (master)
@ 2016-09-26 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-26 12:00 UTC
  To: fio

The following changes since commit 0a301e93062df3735f9bb87c445e18d98a4b6efb:

  bloom: add string version (2016-09-23 11:57:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 63a26e05622b0ced2cc685f545f493e794ccc325:

  filehash: move to separate allocation (2016-09-26 01:40:52 -0600)

----------------------------------------------------------------
Jens Axboe (8):
      bloom: allow to pass in whether to set bits for strings
      smalloc: fixup --alloc-size
      init: re-call sinit() if we change the smallc pool size
      file: add bloom filter to avoid quadratic lookup behavior
      bloom: don't enforce minimum entry count
      smalloc: OOM fixups
      fio: bump max jobs to 4k
      filehash: move to separate allocation

 README      |  4 ++--
 diskutil.c  |  4 +---
 filehash.c  | 21 ++++++++++++++++++---
 filehash.h  |  5 +++--
 filesetup.c | 21 +++++++++------------
 fio.h       |  2 +-
 flow.c      |  1 -
 init.c      |  8 ++++----
 lib/bloom.c |  8 +++-----
 lib/bloom.h |  2 +-
 os/os.h     |  2 +-
 server.c    | 12 +++++++-----
 smalloc.c   | 24 ++++++++++++++++--------
 workqueue.c |  2 ++
 14 files changed, 68 insertions(+), 48 deletions(-)

---

Diff of recent changes:

diff --git a/README b/README
index 5fa37f3..a69a578 100644
--- a/README
+++ b/README
@@ -169,7 +169,7 @@ $ fio
 	--status-interval=t	Force full status dump every 't' period passed
 	--section=name		Only run specified section in job file.
 				Multiple sections can be specified.
-	--alloc-size=kb		Set smalloc pool to this size in kb (def 1024)
+	--alloc-size=kb		Set smalloc pool to this size in kb (def 16384)
 	--warnings-fatal	Fio parser warnings are fatal
 	--max-jobs		Maximum number of threads/processes to support
 	--server=args		Start backend server. See Client/Server section.
@@ -233,7 +233,7 @@ sections.  The reserved 'global' section is always parsed and used.
 The --alloc-size switch allows one to use a larger pool size for smalloc.
 If running large jobs with randommap enabled, fio can run out of memory.
 Smalloc is an internal allocator for shared structures from a fixed size
-memory pool. The pool size defaults to 1024k and can grow to 128 pools.
+memory pool. The pool size defaults to 16M and can grow to 8 pools.
 
 NOTE: While running .fio_smalloc.* backing store files are visible in /tmp.
 
diff --git a/diskutil.c b/diskutil.c
index 0f7a642..27ddb46 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -292,10 +292,8 @@ static struct disk_util *disk_util_add(struct thread_data *td, int majdev,
 	dprint(FD_DISKUTIL, "add maj/min %d/%d: %s\n", majdev, mindev, path);
 
 	du = smalloc(sizeof(*du));
-	if (!du) {
-		log_err("fio: smalloc() pool exhausted\n");
+	if (!du)
 		return NULL;
-	}
 
 	memset(du, 0, sizeof(*du));
 	INIT_FLIST_HEAD(&du->list);
diff --git a/filehash.c b/filehash.c
index 0d61f54..edeeab4 100644
--- a/filehash.c
+++ b/filehash.c
@@ -5,14 +5,19 @@
 #include "flist.h"
 #include "hash.h"
 #include "filehash.h"
+#include "smalloc.h"
+#include "lib/bloom.h"
 
 #define HASH_BUCKETS	512
 #define HASH_MASK	(HASH_BUCKETS - 1)
 
-unsigned int file_hash_size = HASH_BUCKETS * sizeof(struct flist_head);
+#define BLOOM_SIZE	16*1024*1024
+
+static unsigned int file_hash_size = HASH_BUCKETS * sizeof(struct flist_head);
 
 static struct flist_head *file_hash;
 static struct fio_mutex *hash_lock;
+static struct bloom *file_bloom;
 
 static unsigned short hash(const char *name)
 {
@@ -95,6 +100,11 @@ struct fio_file *add_file_hash(struct fio_file *f)
 	return alias;
 }
 
+bool file_bloom_exists(const char *fname, bool set)
+{
+	return bloom_string(file_bloom, fname, strlen(fname), set);
+}
+
 void file_hash_exit(void)
 {
 	unsigned int i, has_entries = 0;
@@ -107,18 +117,23 @@ void file_hash_exit(void)
 	if (has_entries)
 		log_err("fio: file hash not empty on exit\n");
 
+	sfree(file_hash);
 	file_hash = NULL;
 	fio_mutex_remove(hash_lock);
 	hash_lock = NULL;
+	bloom_free(file_bloom);
+	file_bloom = NULL;
 }
 
-void file_hash_init(void *ptr)
+void file_hash_init(void)
 {
 	unsigned int i;
 
-	file_hash = ptr;
+	file_hash = smalloc(file_hash_size);
+
 	for (i = 0; i < HASH_BUCKETS; i++)
 		INIT_FLIST_HEAD(&file_hash[i]);
 
 	hash_lock = fio_mutex_init(FIO_MUTEX_UNLOCKED);
+	file_bloom = bloom_new(BLOOM_SIZE);
 }
diff --git a/filehash.h b/filehash.h
index f316b20..5fecc3b 100644
--- a/filehash.h
+++ b/filehash.h
@@ -1,14 +1,15 @@
 #ifndef FIO_FILE_HASH_H
 #define FIO_FILE_HASH_H
 
-extern unsigned int file_hash_size;
+#include "lib/types.h"
 
-extern void file_hash_init(void *);
+extern void file_hash_init(void);
 extern void file_hash_exit(void);
 extern struct fio_file *lookup_file_hash(const char *);
 extern struct fio_file *add_file_hash(struct fio_file *);
 extern void remove_file_hash(struct fio_file *);
 extern void fio_file_hash_lock(void);
 extern void fio_file_hash_unlock(void);
+extern bool file_bloom_exists(const char *, bool);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index c6ef3bf..a3bbbb2 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1242,12 +1242,14 @@ static void get_file_type(struct fio_file *f)
 	}
 }
 
-static bool __is_already_allocated(const char *fname)
+static bool __is_already_allocated(const char *fname, bool set)
 {
 	struct flist_head *entry;
+	bool ret;
 
-	if (flist_empty(&filename_list))
-		return false;
+	ret = file_bloom_exists(fname, set);
+	if (!ret)
+		return ret;
 
 	flist_for_each(entry, &filename_list) {
 		struct file_name *fn;
@@ -1266,7 +1268,7 @@ static bool is_already_allocated(const char *fname)
 	bool ret;
 
 	fio_file_hash_lock();
-	ret = __is_already_allocated(fname);
+	ret = __is_already_allocated(fname, false);
 	fio_file_hash_unlock();
 
 	return ret;
@@ -1280,7 +1282,7 @@ static void set_already_allocated(const char *fname)
 	fn->filename = strdup(fname);
 
 	fio_file_hash_lock();
-	if (!__is_already_allocated(fname)) {
+	if (!__is_already_allocated(fname, true)) {
 		flist_add_tail(&fn->list, &filename_list);
 		fn = NULL;
 	}
@@ -1317,7 +1319,6 @@ static struct fio_file *alloc_new_file(struct thread_data *td)
 
 	f = smalloc(sizeof(*f));
 	if (!f) {
-		log_err("fio: smalloc OOM\n");
 		assert(0);
 		return NULL;
 	}
@@ -1400,10 +1401,8 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 		f->real_file_size = -1ULL;
 
 	f->file_name = smalloc_strdup(file_name);
-	if (!f->file_name) {
-		log_err("fio: smalloc OOM\n");
+	if (!f->file_name)
 		assert(0);
-	}
 
 	get_file_type(f);
 
@@ -1606,10 +1605,8 @@ void dup_files(struct thread_data *td, struct thread_data *org)
 
 		if (f->file_name) {
 			__f->file_name = smalloc_strdup(f->file_name);
-			if (!__f->file_name) {
-				log_err("fio: smalloc OOM\n");
+			if (!__f->file_name)
 				assert(0);
-			}
 
 			__f->filetype = f->filetype;
 		}
diff --git a/fio.h b/fio.h
index df4fbb1..080842a 100644
--- a/fio.h
+++ b/fio.h
@@ -476,7 +476,7 @@ static inline void fio_ro_check(const struct thread_data *td, struct io_u *io_u)
 	assert(!(io_u->ddir == DDIR_WRITE && !td_write(td)));
 }
 
-#define REAL_MAX_JOBS		2048
+#define REAL_MAX_JOBS		4096
 
 static inline int should_fsync(struct thread_data *td)
 {
diff --git a/flow.c b/flow.c
index e0ac135..42b6dd7 100644
--- a/flow.c
+++ b/flow.c
@@ -58,7 +58,6 @@ static struct fio_flow *flow_get(unsigned int id)
 	if (!flow) {
 		flow = smalloc(sizeof(*flow));
 		if (!flow) {
-			log_err("fio: smalloc pool exhausted\n");
 			fio_mutex_up(flow_lock);
 			return NULL;
 		}
diff --git a/init.c b/init.c
index 6b6e386..5151ff1 100644
--- a/init.c
+++ b/init.c
@@ -334,7 +334,6 @@ static int setup_thread_area(void)
 	do {
 		size_t size = max_jobs * sizeof(struct thread_data);
 
-		size += file_hash_size;
 		size += sizeof(unsigned int);
 
 #ifndef CONFIG_NO_SHM
@@ -366,11 +365,10 @@ static int setup_thread_area(void)
 #endif
 
 	memset(threads, 0, max_jobs * sizeof(struct thread_data));
-	hash = (void *) threads + max_jobs * sizeof(struct thread_data);
-	fio_debug_jobp = (void *) hash + file_hash_size;
+	fio_debug_jobp = (void *) threads + max_jobs * sizeof(struct thread_data);
 	*fio_debug_jobp = -1;
-	file_hash_init(hash);
 
+	file_hash_init();
 	flow_init();
 
 	return 0;
@@ -2308,6 +2306,8 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 		switch (c) {
 		case 'a':
 			smalloc_pool_size = atoi(optarg);
+			smalloc_pool_size <<= 10;
+			sinit();
 			break;
 		case 't':
 			if (check_str_time(optarg, &def_timeout, 1)) {
diff --git a/lib/bloom.c b/lib/bloom.c
index c2e6c11..fa38db9 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -60,8 +60,6 @@ static struct bloom_hash hashes[] = {
 
 #define N_HASHES	5
 
-#define MIN_ENTRIES	1073741824UL
-
 struct bloom *bloom_new(uint64_t entries)
 {
 	struct bloom *b;
@@ -72,7 +70,6 @@ struct bloom *bloom_new(uint64_t entries)
 	b = malloc(sizeof(*b));
 	b->nentries = entries;
 	no_uints = (entries + BITS_PER_INDEX - 1) / BITS_PER_INDEX;
-	no_uints = max((unsigned long) no_uints, MIN_ENTRIES);
 	b->map = calloc(no_uints, sizeof(uint32_t));
 	if (!b->map) {
 		free(b);
@@ -118,7 +115,8 @@ bool bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords)
 	return __bloom_check(b, data, nwords * sizeof(uint32_t), true);
 }
 
-bool bloom_set_string(struct bloom *b, const char *data, unsigned int len)
+bool bloom_string(struct bloom *b, const char *data, unsigned int len,
+		  bool set)
 {
-	return __bloom_check(b, data, len, true);
+	return __bloom_check(b, data, len, set);
 }
diff --git a/lib/bloom.h b/lib/bloom.h
index d40d9f6..141ead9 100644
--- a/lib/bloom.h
+++ b/lib/bloom.h
@@ -9,6 +9,6 @@ struct bloom;
 struct bloom *bloom_new(uint64_t entries);
 void bloom_free(struct bloom *b);
 bool bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords);
-bool bloom_set_string(struct bloom *b, const char *data, unsigned int len);
+bool bloom_string(struct bloom *b, const char *data, unsigned int len, bool);
 
 #endif
diff --git a/os/os.h b/os/os.h
index 4f267c2..16bca68 100644
--- a/os/os.h
+++ b/os/os.h
@@ -171,7 +171,7 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 
 #ifndef FIO_MAX_JOBS
-#define FIO_MAX_JOBS		2048
+#define FIO_MAX_JOBS		4096
 #endif
 
 #ifndef CONFIG_SOCKLEN_T
diff --git a/server.c b/server.c
index 3862699..091c161 100644
--- a/server.c
+++ b/server.c
@@ -578,8 +578,12 @@ static int fio_net_queue_cmd(uint16_t opcode, void *buf, off_t size,
 	struct sk_entry *entry;
 
 	entry = fio_net_prep_cmd(opcode, buf, size, tagptr, flags);
-	fio_net_queue_entry(entry);
-	return 0;
+	if (entry) {
+		fio_net_queue_entry(entry);
+		return 0;
+	}
+
+	return 1;
 }
 
 static int fio_net_send_simple_stack_cmd(int sk, uint16_t opcode, uint64_t tag)
@@ -1999,10 +2003,8 @@ int fio_server_get_verify_state(const char *name, int threadnumber,
 	dprint(FD_NET, "server: request verify state\n");
 
 	rep = smalloc(sizeof(*rep));
-	if (!rep) {
-		log_err("fio: smalloc pool too small\n");
+	if (!rep)
 		return ENOMEM;
-	}
 
 	__fio_mutex_init(&rep->lock, FIO_MUTEX_LOCKED);
 	rep->data = NULL;
diff --git a/smalloc.c b/smalloc.c
index 6f647c0..d038ac6 100644
--- a/smalloc.c
+++ b/smalloc.c
@@ -26,7 +26,9 @@
 #define SMALLOC_BPL	(SMALLOC_BPB * SMALLOC_BPI)
 
 #define INITIAL_SIZE	16*1024*1024	/* new pool size */
-#define MAX_POOLS	8		/* maximum number of pools to setup */
+#define INITIAL_POOLS	8		/* maximum number of pools to setup */
+
+#define MAX_POOLS	16
 
 #define SMALLOC_PRE_RED		0xdeadbeefU
 #define SMALLOC_POST_RED	0x5aa55aa5U
@@ -149,12 +151,15 @@ static int find_next_zero(int word, int start)
 	return ffz(word) + start;
 }
 
-static int add_pool(struct pool *pool, unsigned int alloc_size)
+static bool add_pool(struct pool *pool, unsigned int alloc_size)
 {
 	int bitmap_blocks;
 	int mmap_flags;
 	void *ptr;
 
+	if (nr_pools == MAX_POOLS)
+		return false;
+
 #ifdef SMALLOC_REDZONE
 	alloc_size += sizeof(unsigned int);
 #endif
@@ -191,21 +196,22 @@ static int add_pool(struct pool *pool, unsigned int alloc_size)
 		goto out_fail;
 
 	nr_pools++;
-	return 0;
+	return true;
 out_fail:
 	log_err("smalloc: failed adding pool\n");
 	if (pool->map)
 		munmap(pool->map, pool->mmap_size);
-	return 1;
+	return false;
 }
 
 void sinit(void)
 {
-	int i, ret;
+	bool ret;
+	int i;
 
-	for (i = 0; i < MAX_POOLS; i++) {
-		ret = add_pool(&mp[i], smalloc_pool_size);
-		if (ret)
+	for (i = 0; i < INITIAL_POOLS; i++) {
+		ret = add_pool(&mp[nr_pools], smalloc_pool_size);
+		if (!ret)
 			break;
 	}
 
@@ -444,6 +450,8 @@ void *smalloc(size_t size)
 		break;
 	} while (1);
 
+	log_err("smalloc: OOM. Consider using --alloc-size to increase the "
+		"shared memory available.\n");
 	return NULL;
 }
 
diff --git a/workqueue.c b/workqueue.c
index 013087e..1131400 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -323,6 +323,8 @@ int workqueue_init(struct thread_data *td, struct workqueue *wq,
 		goto err;
 
 	wq->workers = smalloc(wq->max_workers * sizeof(struct submit_worker));
+	if (!wq->workers)
+		goto err;
 
 	for (i = 0; i < wq->max_workers; i++)
 		if (start_worker(wq, i, sk_out))

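The filesetup.c change consults a bloom filter before walking filename_list. A name that was never inserted therefore skips the linear scan. A "maybe present" answer still falls through to the exact string comparison. The sketch below illustrates that check-then-scan pattern. The filter size, the FNV-1a hash and every name in it are hypothetical stand-ins for fio's lib/bloom.c and filehash.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBITS	(1u << 16)	/* hypothetical filter size in bits */
#define NHASH	3		/* hypothetical number of hash functions */

struct bloom_sketch {
	uint32_t map[NBITS / 32];
};

struct name_node {
	const char *name;
	struct name_node *next;
};

/* Seeded FNV-1a, standing in for fio's real hash functions. */
static uint32_t fnv1a(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u ^ seed;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Returns true only if every bit was already set ("maybe present").
 * When set is true, missing bits are set as a side effect. */
static bool bloom_check(struct bloom_sketch *b, const char *s, bool set)
{
	size_t len = strlen(s);
	unsigned int was_set = 0;

	for (uint32_t i = 0; i < NHASH; i++) {
		uint32_t bit = fnv1a(s, len, i * 0x9e3779b9u) % NBITS;
		uint32_t mask = 1u << (bit % 32);

		if (b->map[bit / 32] & mask)
			was_set++;
		else if (set)
			b->map[bit / 32] |= mask;
	}
	return was_set == NHASH;
}

/* Bloom filter first; only a "maybe" answer pays for the O(n) scan. */
static bool already_allocated(struct bloom_sketch *b, struct name_node *list,
			      const char *fname, bool set)
{
	if (!bloom_check(b, fname, set))
		return false;		/* definitely not seen before */

	for (; list; list = list->next)
		if (!strcmp(list->name, fname))
			return true;
	return false;			/* bloom false positive */
}
```

A false positive only costs one list walk. A false negative cannot occur as long as every name added to the list is also set in the filter. This is why set_already_allocated() passes set=true.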

* Recent changes (master)
@ 2016-09-24 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-24 12:00 UTC
  To: fio

The following changes since commit 1d272416412b0c867224a2b667e6b6124cbc26e8:

  stat: check if ctime_r() ends in a newline before stripping (2016-09-20 21:58:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0a301e93062df3735f9bb87c445e18d98a4b6efb:

  bloom: add string version (2016-09-23 11:57:00 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      bloom: use bool
      bloom: hashes take byte lengths, not nwords
      bloom: add string version

 lib/bloom.c | 15 ++++++++++-----
 lib/bloom.h |  4 +++-
 2 files changed, 13 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/lib/bloom.c b/lib/bloom.c
index f4eff57..c2e6c11 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -88,14 +88,14 @@ void bloom_free(struct bloom *b)
 	free(b);
 }
 
-static int __bloom_check(struct bloom *b, uint32_t *data, unsigned int nwords,
-			 int set)
+static bool __bloom_check(struct bloom *b, const void *data, unsigned int len,
+			  bool set)
 {
 	uint32_t hash[N_HASHES];
 	int i, was_set;
 
 	for (i = 0; i < N_HASHES; i++) {
-		hash[i] = hashes[i].fn(data, nwords, hashes[i].seed);
+		hash[i] = hashes[i].fn(data, len, hashes[i].seed);
 		hash[i] = hash[i] % b->nentries;
 	}
 
@@ -113,7 +113,12 @@ static int __bloom_check(struct bloom *b, uint32_t *data, unsigned int nwords,
 	return was_set == N_HASHES;
 }
 
-int bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords)
+bool bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords)
 {
-	return __bloom_check(b, data, nwords, 1);
+	return __bloom_check(b, data, nwords * sizeof(uint32_t), true);
+}
+
+bool bloom_set_string(struct bloom *b, const char *data, unsigned int len)
+{
+	return __bloom_check(b, data, len, true);
 }
diff --git a/lib/bloom.h b/lib/bloom.h
index 127ed9b..d40d9f6 100644
--- a/lib/bloom.h
+++ b/lib/bloom.h
@@ -2,11 +2,13 @@
 #define FIO_BLOOM_H
 
 #include <inttypes.h>
+#include "../lib/types.h"
 
 struct bloom;
 
 struct bloom *bloom_new(uint64_t entries);
 void bloom_free(struct bloom *b);
-int bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords);
+bool bloom_set(struct bloom *b, uint32_t *data, unsigned int nwords);
+bool bloom_set_string(struct bloom *b, const char *data, unsigned int len);
 
 #endif

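The "hashes take byte lengths, not nwords" fix matters because the hash functions treat their length argument as a byte count. Passing a word count hashes only the first nwords bytes, so keys that differ later in the buffer collide. The small sketch below uses a hypothetical FNV-1a hash to show the effect. It also shows the nwords * sizeof(uint32_t) conversion that the patch adds to bloom_set().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-oriented hash (FNV-1a): len is a count of BYTES. */
static uint32_t fnv1a_bytes(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Word-array wrapper: convert the word count to a byte length first,
 * as bloom_set() now does before calling __bloom_check(). */
static uint32_t hash_words(const uint32_t *words, unsigned int nwords)
{
	return fnv1a_bytes(words, nwords * sizeof(uint32_t));
}
```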

* Recent changes (master)
@ 2016-09-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-21 12:00 UTC
  To: fio

The following changes since commit 5de1ade5a8a8dd118bdfac835a6cfb4bcf013734:

  Windows: fix Time_tToSystemTime function to be 64-bit compliant (2016-09-19 08:28:11 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1d272416412b0c867224a2b667e6b6124cbc26e8:

  stat: check if ctime_r() ends in a newline before stripping (2016-09-20 21:58:34 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      os/windows/posix.c: ensure that ctime_r() adds a newline
      stat: check if ctime_r() ends in a newline before stripping

 os/windows/posix.c | 2 +-
 stat.c             | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/os/windows/posix.c b/os/windows/posix.c
index 5830e4c..3388127 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -250,7 +250,7 @@ char* ctime_r(const time_t *t, char *buf)
 
     Time_tToSystemTime(*t, &systime);
     /* We don't know how long `buf` is, but assume it's rounded up from the minimum of 25 to 32 */
-    StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d", dayOfWeek[systime.wDayOfWeek % 7], monthOfYear[(systime.wMonth - 1) % 12],
+    StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d\n", dayOfWeek[systime.wDayOfWeek % 7], monthOfYear[(systime.wMonth - 1) % 12],
 										 systime.wDay, systime.wHour, systime.wMinute, systime.wSecond, systime.wYear);
     return buf;
 }
diff --git a/stat.c b/stat.c
index c9148ad..3c72a6c 100644
--- a/stat.c
+++ b/stat.c
@@ -1650,7 +1650,8 @@ void __show_run_stats(void)
 
 		os_ctime_r((const time_t *) &now.tv_sec, time_buf,
 				sizeof(time_buf));
-		time_buf[strlen(time_buf) - 1] = '\0';
+		if (time_buf[strlen(time_buf) - 1] == '\n')
+			time_buf[strlen(time_buf) - 1] = '\0';
 
 		root = json_create_object();
 		json_object_add_value_string(root, "fio version", fio_version_string);

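The stat.c change strips the newline only when ctime_r() actually produced one. The Windows replacement previously did not append a newline. The sketch below implements the same check as a hypothetical helper. It also guards against an empty buffer, where strlen(buf) - 1 would wrap around. The patch itself does not check for that case.

```c
#include <string.h>

/* Remove one trailing '\n' if present; leave any other string as is. */
static void strip_trailing_newline(char *buf)
{
	size_t len = strlen(buf);

	/* len == 0 would make len - 1 wrap to SIZE_MAX, so check it first */
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
}
```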

* Recent changes (master)
@ 2016-09-20 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-20 12:00 UTC
  To: fio

The following changes since commit d22042d2117b78e16b06bab0880422c417007d37:

  workqueue: kill SW_F_EXITED (2016-09-16 12:48:32 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5de1ade5a8a8dd118bdfac835a6cfb4bcf013734:

  Windows: fix Time_tToSystemTime function to be 64-bit compliant (2016-09-19 08:28:11 -0600)

----------------------------------------------------------------
Josh Sinykin (2):
      Fix garbage characters in json output caused by time_buf being uninitialized
      Windows: fix Time_tToSystemTime function to be 64-bit compliant

 os/windows/posix.c | 10 ++++++----
 stat.c             |  2 ++
 2 files changed, 8 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/os/windows/posix.c b/os/windows/posix.c
index fd3d9ab..5830e4c 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -232,10 +232,12 @@ char *dlerror(void)
 /* Copied from http://blogs.msdn.com/b/joshpoley/archive/2007/12/19/date-time-formats-and-conversions.aspx */
 void Time_tToSystemTime(time_t dosTime, SYSTEMTIME *systemTime)
 {
-    LARGE_INTEGER jan1970FT;
-    LARGE_INTEGER utcFT;
-    jan1970FT.QuadPart = 116444736000000000LL; // january 1st 1970
-    utcFT.QuadPart = ((unsigned __int64)dosTime) * 10000000 + jan1970FT.QuadPart;
+    FILETIME utcFT;
+    LONGLONG jan1970;
+
+    jan1970 = Int32x32To64(dosTime, 10000000) + 116444736000000000;
+    utcFT.dwLowDateTime = (DWORD)jan1970;
+    utcFT.dwHighDateTime = jan1970 >> 32;
 
     FileTimeToSystemTime((FILETIME*)&utcFT, systemTime);
 }
diff --git a/stat.c b/stat.c
index 74c2686..c9148ad 100644
--- a/stat.c
+++ b/stat.c
@@ -669,6 +669,8 @@ static void show_thread_status_normal(struct thread_stat *ts,
 
 	if (!ddir_rw_sum(ts->io_bytes) && !ddir_rw_sum(ts->total_io_u))
 		return;
+		
+	memset(time_buf, 0, sizeof(time_buf));
 
 	time(&time_p);
 	os_ctime_r((const time_t *) &time_p, time_buf, sizeof(time_buf));

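Time_tToSystemTime() converts Unix seconds into a FILETIME, which counts 100 ns ticks since 1601-01-01. The constant 116444736000000000 is the number of such ticks between 1601-01-01 and 1970-01-01. The portable sketch below does the same arithmetic in uint64_t and splits the result into low and high 32-bit halves. The struct and function names are hypothetical. Note that Int32x32To64(), used in the patch, is documented as multiplying two signed 32-bit values. A time_t beyond 2^31 - 1 would therefore be truncated there. The sketch avoids that by multiplying in 64 bits.

```c
#include <stdint.h>

/* 100 ns ticks between the FILETIME epoch (1601-01-01) and the Unix epoch. */
#define EPOCH_DIFF_100NS	116444736000000000ULL

/* Hypothetical mirror of FILETIME's dwLowDateTime/dwHighDateTime pair. */
struct filetime_sketch {
	uint32_t low;
	uint32_t high;
};

/* Convert Unix seconds (assumed >= 0 here) to 100 ns FILETIME ticks. */
static struct filetime_sketch unix_to_filetime(int64_t secs)
{
	uint64_t ticks = (uint64_t)secs * 10000000ULL + EPOCH_DIFF_100NS;
	struct filetime_sketch ft = { (uint32_t)ticks, (uint32_t)(ticks >> 32) };

	return ft;
}
```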

* Recent changes (master)
@ 2016-09-17 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-17 12:00 UTC
  To: fio

The following changes since commit 7903bf87725b18495a06f7199342f167147712eb:

  Merge branch 'master' of https://github.com/jan--f/fio (2016-09-15 07:57:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d22042d2117b78e16b06bab0880422c417007d37:

  workqueue: kill SW_F_EXITED (2016-09-16 12:48:32 -0600)

----------------------------------------------------------------
Ben England (3):
      generate unique pathname for each fio --client log file
      adhere to fio coding standards
      safety first!

Jens Axboe (9):
      io_u: fix overflow in 64-bit bssplit calculation
      Merge branch 'client-unique-log-names' of https://github.com/bengland2/fio into log-unique
      client: coding style fixups
      mac: fix for 10.12 having clockid_t
      configure: harden clockid_t test
      Fio 2.14
      Fixup two compile warnings
      iolog: dprint() casts for 32-bit warnings
      workqueue: kill SW_F_EXITED

 FIO-VERSION-GEN        |  2 +-
 client.c               | 22 +++++++++++++++++-----
 configure              | 22 ++++++++++++++++++++++
 init.c                 | 13 +++++++------
 io_u.c                 |  9 +++++----
 iolog.c                | 10 ++++++----
 os/os-mac.h            |  2 ++
 os/windows/install.wxs |  2 +-
 workqueue.c            | 10 +++-------
 9 files changed, 64 insertions(+), 28 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index d19dcca..8d4f1ef 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.13
+DEF_VER=fio-2.14
 
 LF='
 '
diff --git a/client.c b/client.c
index c8069a0..9698122 100644
--- a/client.c
+++ b/client.c
@@ -1295,6 +1295,7 @@ static int fio_client_handle_iolog(struct fio_client *client,
 {
 	struct cmd_iolog_pdu *pdu;
 	bool store_direct;
+	char *log_pathname;
 
 	pdu = convert_iolog(cmd, &store_direct);
 	if (!pdu) {
@@ -1302,15 +1303,26 @@ static int fio_client_handle_iolog(struct fio_client *client,
 		return 1;
 	}
 
+        /* allocate buffer big enough for next sprintf() call */
+	log_pathname = malloc(10 + strlen((char *)pdu->name) +
+			strlen(client->hostname));
+	if (!log_pathname) {
+		log_err("fio: memory allocation of unique pathname failed");
+		return -1;
+	}
+	/* generate a unique pathname for the log file using hostname */
+	sprintf(log_pathname, "%s.%s", pdu->name, client->hostname);
+
 	if (store_direct) {
 		ssize_t ret;
 		size_t sz;
 		int fd;
 
-		fd = open((const char *) pdu->name,
+		fd = open((const char *) log_pathname,
 				O_WRONLY | O_CREAT | O_TRUNC, 0644);
 		if (fd < 0) {
-			log_err("fio: open log: %s\n", strerror(errno));
+			log_err("fio: open log %s: %s\n",
+				log_pathname, strerror(errno));
 			return 1;
 		}
 
@@ -1326,10 +1338,10 @@ static int fio_client_handle_iolog(struct fio_client *client,
 		return 0;
 	} else {
 		FILE *f;
-
-		f = fopen((const char *) pdu->name, "w");
+		f = fopen((const char *) log_pathname, "w");
 		if (!f) {
-			log_err("fio: fopen log: %s\n", strerror(errno));
+			log_err("fio: fopen log %s : %s\n",
+				log_pathname, strerror(errno));
 			return 1;
 		}
 
diff --git a/configure b/configure
index 2851f54..a24e3ef 100755
--- a/configure
+++ b/configure
@@ -794,6 +794,25 @@ fi
 echo "CLOCK_MONOTONIC_PRECISE       $clock_monotonic_precise"
 
 ##########################################
+# clockid_t probe
+clockid_t="no"
+cat > $TMPC << EOF
+#include <stdio.h>
+#include <string.h>
+#include <time.h>
+int main(int argc, char **argv)
+{
+  clockid_t cid;
+  memset(&cid, 0, sizeof(cid));
+  return clock_gettime(cid, NULL);
+}
+EOF
+if compile_prog "" "$LIBS" "clockid_t"; then
+  clockid_t="yes"
+fi
+echo "clockid_t                     $clockid_t"
+
+##########################################
 # gettimeofday() probe
 gettimeofday="no"
 cat > $TMPC << EOF
@@ -1722,6 +1741,9 @@ fi
 if test "$clock_monotonic_precise" = "yes" ; then
   output_sym "CONFIG_CLOCK_MONOTONIC_PRECISE"
 fi
+if test "$clockid_t" = "yes"; then
+  output_sym "CONFIG_CLOCKID_T"
+fi
 if test "$gettimeofday" = "yes" ; then
   output_sym "CONFIG_GETTIMEOFDAY"
 fi
diff --git a/init.c b/init.c
index bc17b40..6b6e386 100644
--- a/init.c
+++ b/init.c
@@ -1426,12 +1426,6 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	}
 
 	if (o->hist_log_file) {
-#ifndef CONFIG_ZLIB
-		if (td->client_type) {
-			log_err("fio: --write_hist_log requires zlib in client/server mode\n");
-			goto err;
-		}
-#endif
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
@@ -1444,6 +1438,13 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		};
 		const char *suf;
 
+#ifndef CONFIG_ZLIB
+		if (td->client_type) {
+			log_err("fio: --write_hist_log requires zlib in client/server mode\n");
+			goto err;
+		}
+#endif
+
 		if (p.log_gz_store)
 			suf = "log.fz";
 		else
diff --git a/io_u.c b/io_u.c
index b6d530f..7b51dd2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -531,8 +531,7 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 	int ddir = io_u->ddir;
 	unsigned int buflen = 0;
 	unsigned int minbs, maxbs;
-	uint64_t frand_max;
-	unsigned long r;
+	uint64_t frand_max, r;
 
 	assert(ddir_rw(ddir));
 
@@ -561,7 +560,7 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 			if (buflen < minbs)
 				buflen = minbs;
 		} else {
-			long perc = 0;
+			long long perc = 0;
 			unsigned int i;
 
 			for (i = 0; i < td->o.bssplit_nr[ddir]; i++) {
@@ -569,7 +568,9 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 
 				buflen = bsp->bs;
 				perc += bsp->perc;
-				if ((r * 100UL <= frand_max * perc) &&
+				if (!perc)
+					break;
+				if ((r / perc <= frand_max / 100ULL) &&
 				    io_u_fits(td, io_u, buflen))
 					break;
 			}
diff --git a/iolog.c b/iolog.c
index baa4b85..6576ca5 100644
--- a/iolog.c
+++ b/iolog.c
@@ -714,7 +714,7 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 	for (i = 0; i < nr_samples; i++) {
 		s = __get_sample(samples, log_offset, i);
 
-		entry = (struct io_u_plat_entry *) s->val;
+		entry = (struct io_u_plat_entry *) (uintptr_t) s->val;
 		io_u_plat = entry->io_u_plat;
 
 		entry_before = flist_first_entry(&entry->list, struct io_u_plat_entry, list);
@@ -1153,7 +1153,8 @@ static int gz_work(struct iolog_flush_data *data)
 				data->log->filename);
 	do {
 		if (c)
-			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
+			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq,
+				(unsigned long) c->len);
 		c = get_new_chunk(seq);
 		stream.avail_out = GZ_CHUNK;
 		stream.next_out = c->buf;
@@ -1190,7 +1191,7 @@ static int gz_work(struct iolog_flush_data *data)
 	total -= c->len;
 	c->len = GZ_CHUNK - stream.avail_out;
 	total += c->len;
-	dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
+	dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, (unsigned long) c->len);
 
 	if (ret != Z_STREAM_END) {
 		do {
@@ -1201,7 +1202,8 @@ static int gz_work(struct iolog_flush_data *data)
 			c->len = GZ_CHUNK - stream.avail_out;
 			total += c->len;
 			flist_add_tail(&c->list, &list);
-			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
+			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq,
+				(unsigned long) c->len);
 		} while (ret != Z_STREAM_END);
 	}
 
diff --git a/os/os-mac.h b/os/os-mac.h
index 76d388e..0903a6f 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -35,7 +35,9 @@
 
 typedef off_t off64_t;
 
+#ifndef CONFIG_CLOCKID_T
 typedef unsigned int clockid_t;
+#endif
 
 #define FIO_OS_DIRECTIO
 static inline int fio_set_odirect(int fd)
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index f8d3773..da09b9f 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.13">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.14">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/workqueue.c b/workqueue.c
index 2e01b58..013087e 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -15,9 +15,8 @@ enum {
 	SW_F_IDLE	= 1 << 0,
 	SW_F_RUNNING	= 1 << 1,
 	SW_F_EXIT	= 1 << 2,
-	SW_F_EXITED	= 1 << 3,
-	SW_F_ACCOUNTED	= 1 << 4,
-	SW_F_ERROR	= 1 << 5,
+	SW_F_ACCOUNTED	= 1 << 3,
+	SW_F_ERROR	= 1 << 4,
 };
 
 static struct submit_worker *__get_submit_worker(struct workqueue *wq,
@@ -131,7 +130,7 @@ static void *worker_thread(void *data)
 {
 	struct submit_worker *sw = data;
 	struct workqueue *wq = sw->wq;
-	unsigned int eflags = 0, ret = 0;
+	unsigned int ret = 0;
 	FLIST_HEAD(local_list);
 
 	sk_out_assign(sw->sk_out);
@@ -206,9 +205,6 @@ handle_work:
 		wq->ops.update_acct_fn(sw);
 
 done:
-	pthread_mutex_lock(&sw->lock);
-	sw->flags |= (SW_F_EXITED | eflags);
-	pthread_mutex_unlock(&sw->lock);
 	sk_out_drop();
 	return NULL;
 }


* Recent changes (master)
@ 2016-09-16 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-16 12:00 UTC
  To: fio

The following changes since commit 308d69b5d340577b7886696f39753b7ba5ae9e11:

  client: ignore SEND_ETA, if we can't fin a reply command (2016-09-13 09:08:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7903bf87725b18495a06f7199342f167147712eb:

  Merge branch 'master' of https://github.com/jan--f/fio (2016-09-15 07:57:38 -0600)

----------------------------------------------------------------
Jan Fajerski (1):
      add simple fio.service to start fio server with systemd

Jens Axboe (1):
      Merge branch 'master' of https://github.com/jan--f/fio

 tools/fio.service | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 tools/fio.service

---

Diff of recent changes:

diff --git a/tools/fio.service b/tools/fio.service
new file mode 100644
index 0000000..21de0b7
--- /dev/null
+++ b/tools/fio.service
@@ -0,0 +1,10 @@
+[Unit]
+
+Description=flexible I/O tester server
+After=network.target
+
+[Service]
+
+Type=simple
+PIDFile=/run/fio.pid
+ExecStart=/usr/bin/fio --server


* Recent changes (master)
@ 2016-09-14 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-14 12:00 UTC
  To: fio

The following changes since commit 247e54360a9dff433e815f91b500073fe3c1a820:

  Makefile: fixup java path for libhdfs (2016-09-12 08:10:45 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 308d69b5d340577b7886696f39753b7ba5ae9e11:

  client: ignore SEND_ETA, if we can't fin a reply command (2016-09-13 09:08:00 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      client: ignore SEND_ETA, if we can't fin a reply command

 client.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 3456665..c8069a0 100644
--- a/client.c
+++ b/client.c
@@ -1183,7 +1183,7 @@ void fio_client_sum_jobs_eta(struct jobs_eta *dst, struct jobs_eta *je)
 	strcpy((char *) dst->run_str, (char *) je->run_str);
 }
 
-static void remove_reply_cmd(struct fio_client *client, struct fio_net_cmd *cmd)
+static bool remove_reply_cmd(struct fio_client *client, struct fio_net_cmd *cmd)
 {
 	struct fio_net_cmd_reply *reply = NULL;
 	struct flist_head *entry;
@@ -1199,12 +1199,13 @@ static void remove_reply_cmd(struct fio_client *client, struct fio_net_cmd *cmd)
 
 	if (!reply) {
 		log_err("fio: client: unable to find matching tag (%llx)\n", (unsigned long long) cmd->tag);
-		return;
+		return false;
 	}
 
 	flist_del(&reply->list);
 	cmd->tag = reply->saved_tag;
 	free(reply);
+	return true;
 }
 
 int fio_client_wait_for_reply(struct fio_client *client, uint64_t tag)
@@ -1653,7 +1654,8 @@ int fio_handle_client(struct fio_client *client)
 	case FIO_NET_CMD_ETA: {
 		struct jobs_eta *je = (struct jobs_eta *) cmd->payload;
 
-		remove_reply_cmd(client, cmd);
+		if (!remove_reply_cmd(client, cmd))
+			break;
 		convert_jobs_eta(je);
 		handle_eta(client, cmd);
 		break;


* Recent changes (master)
@ 2016-09-13 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-13 12:00 UTC
  To: fio

The following changes since commit 5e333ada464c885d2a33a30e98812bfb666e6052:

  init: pass in right pointer to def thread options free (2016-09-11 19:14:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 247e54360a9dff433e815f91b500073fe3c1a820:

  Makefile: fixup java path for libhdfs (2016-09-12 08:10:45 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Makefile: fixup java path for libhdfs

 Makefile  | 2 +-
 configure | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 3f67ab7..6b5548a 100644
--- a/Makefile
+++ b/Makefile
@@ -49,7 +49,7 @@ SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
-  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/`uname -m`/server -L$(JAVA_HOME)/jre/lib/`uname -m`/server -ljvm $(FIO_LIBHDFS_LIB)/libhdfs.a
+  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -L$(JAVA_HOME)/jre/lib/$(FIO_HDFS_CPU)/server -ljvm $(FIO_LIBHDFS_LIB)/libhdfs.a
   CFLAGS += $(HDFSFLAGS)
   SOURCE += engines/libhdfs.c
 endif
diff --git a/configure b/configure
index 93c3720..2851f54 100755
--- a/configure
+++ b/configure
@@ -1489,6 +1489,10 @@ if test "$libhdfs" = "yes" ; then
   if test "$hdfs_conf_error" = "1" ; then
     exit 1
   fi
+  FIO_HDFS_CPU=$cpu
+  if test "$FIO_HDFS_CPU" = "x86_64" ; then
+    FIO_HDFS_CPU="amd64"
+  fi
 fi
 echo "HDFS engine                   $libhdfs"
 
@@ -1829,6 +1833,7 @@ if test "$gf_trim" = "yes" ; then
 fi
 if test "$libhdfs" = "yes" ; then
   output_sym "CONFIG_LIBHDFS"
+  echo "FIO_HDFS_CPU=$FIO_HDFS_CPU" >> $config_host_mak
   echo "JAVA_HOME=$JAVA_HOME" >> $config_host_mak
   echo "FIO_LIBHDFS_INCLUDE=$FIO_LIBHDFS_INCLUDE" >> $config_host_mak
   echo "FIO_LIBHDFS_LIB=$FIO_LIBHDFS_LIB" >> $config_host_mak


* Recent changes (master)
@ 2016-09-12 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-12 12:00 UTC
  To: fio

The following changes since commit 27c9aab2d9a45a1babf01a3aa0123e1f5ce36a24:

  Makes use of configparser portable to older versions by: - relying on its' own NoOptionError exception - using getter method instead of dictionary overriding - and using readfp() as older version does not autodetect fp vs string types (2016-09-06 10:22:00 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5e333ada464c885d2a33a30e98812bfb666e6052:

  init: pass in right pointer to def thread options free (2016-09-11 19:14:10 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      init: pass in right pointer to def thread options free

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 4b4a86a..bc17b40 100644
--- a/init.c
+++ b/init.c
@@ -309,7 +309,7 @@ static void free_shm(void)
 	free(trigger_remote_cmd);
 	trigger_file = trigger_cmd = trigger_remote_cmd = NULL;
 
-	options_free(fio_options, &def_thread);
+	options_free(fio_options, &def_thread.o);
 	fio_filelock_exit();
 	scleanup();
 }


* Recent changes (master)
@ 2016-09-07 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-07 12:00 UTC
  To: fio

The following changes since commit b678fac65b13dabd0f78ceb338547b9acb5a4f2d:

  server: bump version (2016-09-02 11:24:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 27c9aab2d9a45a1babf01a3aa0123e1f5ce36a24:

  Makes use of configparser portable to older versions by: - relying on its' own NoOptionError exception - using getter method instead of dictionary overriding - and using readfp() as older version does not autodetect fp vs string types (2016-09-06 10:22:00 -0400)

----------------------------------------------------------------
Karl Cronburg (1):
      Makes use of configparser portable to older versions by:     - relying on its' own NoOptionError exception     - using getter method instead of dictionary overriding     - and using readfp() as older version does not autodetect fp vs string types

 tools/hist/fiologparser_hist.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 93dca01..ead5e54 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -257,22 +257,22 @@ def main(ctx):
 
     if ctx.job_file:
         try:
-            from configparser import SafeConfigParser
+            from configparser import SafeConfigParser, NoOptionError
         except ImportError:
-            from ConfigParser import SafeConfigParser
+            from ConfigParser import SafeConfigParser, NoOptionError
 
         cp = SafeConfigParser(allow_no_value=True)
         with open(ctx.job_file, 'r') as fp:
-            cp.read(fp)
+            cp.readfp(fp)
 
         if ctx.interval is None:
             # Auto detect --interval value
             for s in cp.sections():
                 try:
-                    hist_msec = cp[s]['log_hist_msec']
+                    hist_msec = cp.get(s, 'log_hist_msec')
                     if hist_msec is not None:
                         ctx.interval = int(hist_msec)
-                except KeyError:
+                except NoOptionError:
                     pass
 
     if ctx.interval is None:


* Recent changes (master)
@ 2016-09-03 12:00 Jens Axboe
From: Jens Axboe @ 2016-09-03 12:00 UTC
  To: fio

The following changes since commit 32bc83ddb4adb6b19500f2f6acec8c591feaae26:

  jesd219: fix alignment (2016-08-29 15:43:27 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b678fac65b13dabd0f78ceb338547b9acb5a4f2d:

  server: bump version (2016-09-02 11:24:59 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'histogram-client-server' of https://github.com/cronburg/fio
      server: bump version

Karl Cronburg (1):
      Client / server code for handling histograms. The server:

 client.c |  63 ++++++++++++++++++++++++++++++++++++++--
 init.c   |   6 ++++
 iolog.c  |  15 ++++++----
 iolog.h  |  13 +++++++--
 server.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 server.h |   3 +-
 6 files changed, 189 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/client.c b/client.c
index 238c93f..3456665 100644
--- a/client.c
+++ b/client.c
@@ -1251,6 +1251,44 @@ static void handle_eta(struct fio_client *client, struct fio_net_cmd *cmd)
 	fio_client_dec_jobs_eta(eta, client->ops->eta);
 }
 
+static void client_flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
+				      uint64_t sample_size)
+{
+	struct io_sample *s;
+	int log_offset;
+	uint64_t i, j, nr_samples;
+	struct io_u_plat_entry *entry;
+	unsigned int *io_u_plat;
+
+	int stride = 1 << hist_coarseness;
+
+	if (!sample_size)
+		return;
+
+	s = __get_sample(samples, 0, 0);
+	log_offset = (s->__ddir & LOG_OFFSET_SAMPLE_BIT) != 0;
+
+	nr_samples = sample_size / __log_entry_sz(log_offset);
+
+	for (i = 0; i < nr_samples; i++) {
+
+		s = (struct io_sample *)((char *)__get_sample(samples, log_offset, i) +
+			i * sizeof(struct io_u_plat_entry));
+
+		entry = s->plat_entry;
+		io_u_plat = entry->io_u_plat;
+
+		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
+						io_sample_ddir(s), s->bs);
+		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
+			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat, NULL));
+		}
+		fprintf(f, "%lu\n", (unsigned long)
+			hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat, NULL));
+
+	}
+}
+
 static int fio_client_handle_iolog(struct fio_client *client,
 				   struct fio_net_cmd *cmd)
 {
@@ -1294,8 +1332,13 @@ static int fio_client_handle_iolog(struct fio_client *client,
 			return 1;
 		}
 
-		flush_samples(f, pdu->samples,
-				pdu->nr_samples * sizeof(struct io_sample));
+		if (pdu->log_type == IO_LOG_TYPE_HIST) {
+			client_flush_hist_samples(f, pdu->log_hist_coarseness, pdu->samples,
+					   pdu->nr_samples * sizeof(struct io_sample));
+		} else {
+			flush_samples(f, pdu->samples,
+					pdu->nr_samples * sizeof(struct io_sample));
+		}
 		fclose(f);
 		return 0;
 	}
@@ -1395,7 +1438,11 @@ static struct cmd_iolog_pdu *convert_iolog_gz(struct fio_net_cmd *cmd,
 	 */
 	nr_samples = le64_to_cpu(pdu->nr_samples);
 
-	total = nr_samples * __log_entry_sz(le32_to_cpu(pdu->log_offset));
+	if (pdu->log_type == IO_LOG_TYPE_HIST)
+		total = nr_samples * (__log_entry_sz(le32_to_cpu(pdu->log_offset)) +
+					sizeof(struct io_u_plat_entry));
+	else
+		total = nr_samples * __log_entry_sz(le32_to_cpu(pdu->log_offset));
 	ret = malloc(total + sizeof(*pdu));
 	ret->nr_samples = nr_samples;
 
@@ -1478,6 +1525,7 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 	ret->log_type		= le32_to_cpu(ret->log_type);
 	ret->compressed		= le32_to_cpu(ret->compressed);
 	ret->log_offset		= le32_to_cpu(ret->log_offset);
+	ret->log_hist_coarseness = le32_to_cpu(ret->log_hist_coarseness);
 
 	if (*store_direct)
 		return ret;
@@ -1487,6 +1535,9 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 		struct io_sample *s;
 
 		s = __get_sample(samples, ret->log_offset, i);
+		if (ret->log_type == IO_LOG_TYPE_HIST)
+			s = (struct io_sample *)((void *)s + sizeof(struct io_u_plat_entry) * i);
+
 		s->time		= le64_to_cpu(s->time);
 		s->val		= le64_to_cpu(s->val);
 		s->__ddir	= le32_to_cpu(s->__ddir);
@@ -1497,6 +1548,12 @@ static struct cmd_iolog_pdu *convert_iolog(struct fio_net_cmd *cmd,
 
 			so->offset = le64_to_cpu(so->offset);
 		}
+
+		if (ret->log_type == IO_LOG_TYPE_HIST) {
+			s->plat_entry = (struct io_u_plat_entry *)(((void *)s) + sizeof(*s));
+			s->plat_entry->list.next = NULL;
+			s->plat_entry->list.prev = NULL;
+		}
 	}
 
 	return ret;
diff --git a/init.c b/init.c
index 0221ab2..4b4a86a 100644
--- a/init.c
+++ b/init.c
@@ -1426,6 +1426,12 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	}
 
 	if (o->hist_log_file) {
+#ifndef CONFIG_ZLIB
+		if (td->client_type) {
+			log_err("fio: --write_hist_log requires zlib in client/server mode\n");
+			goto err;
+		}
+#endif
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
diff --git a/iolog.c b/iolog.c
index d4213db..baa4b85 100644
--- a/iolog.c
+++ b/iolog.c
@@ -674,14 +674,19 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
-static inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
+inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
 		unsigned int *io_u_plat_last)
 {
 	unsigned long sum;
 	int k;
 
-	for (k = sum = 0; k < stride; k++)
-		sum += io_u_plat[j + k] - io_u_plat_last[j + k];
+	if (io_u_plat_last) {
+		for (k = sum = 0; k < stride; k++)
+			sum += io_u_plat[j + k] - io_u_plat_last[j + k];
+	} else {
+		for (k = sum = 0; k < stride; k++)
+			sum += io_u_plat[j + k];
+	}
 
 	return sum;
 }
@@ -1062,9 +1067,9 @@ void flush_log(struct io_log *log, bool do_append)
 		
 		if (log == log->td->clat_hist_log)
 			flush_hist_samples(f, log->hist_coarseness, cur_log->log,
-			                   cur_log->nr_samples * log_entry_sz(log));
+			                   log_sample_sz(log, cur_log));
 		else
-			flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+			flush_samples(f, cur_log->log, log_sample_sz(log, cur_log));
 		
 		sfree(cur_log);
 	}
diff --git a/iolog.h b/iolog.h
index ca344f1..de641d5 100644
--- a/iolog.h
+++ b/iolog.h
@@ -29,7 +29,10 @@ struct io_hist {
  */
 struct io_sample {
 	uint64_t time;
-	uint64_t val;
+	union {
+		uint64_t val;
+		struct io_u_plat_entry *plat_entry;
+	};
 	uint32_t __ddir;
 	uint32_t bs;
 };
@@ -117,7 +120,7 @@ struct io_log {
 	 */
 	struct io_hist hist_window[DDIR_RWDIR_CNT];
 	unsigned long hist_msec;
-	int hist_coarseness;
+	unsigned int hist_coarseness;
 
 	pthread_mutex_t chunk_lock;
 	unsigned int chunk_seq;
@@ -150,6 +153,11 @@ static inline size_t log_entry_sz(struct io_log *log)
 	return __log_entry_sz(log->log_offset);
 }
 
+static inline size_t log_sample_sz(struct io_log *log, struct io_logs *cur_log)
+{
+	return cur_log->nr_samples * log_entry_sz(log);
+}
+
 static inline struct io_sample *__get_sample(void *samples, int log_offset,
 					     uint64_t sample)
 {
@@ -259,6 +267,7 @@ extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
 extern void flush_log(struct io_log *, bool);
 extern void flush_samples(FILE *, void *, uint64_t);
+extern unsigned long hist_sum(int, int, unsigned int *, unsigned int *);
 extern void free_log(struct io_log *);
 extern void fio_writeout_logs(bool);
 extern void td_writeout_logs(struct thread_data *, bool);
diff --git a/server.c b/server.c
index 9f2220d..3862699 100644
--- a/server.c
+++ b/server.c
@@ -1654,6 +1654,102 @@ void fio_server_send_du(void)
 }
 
 #ifdef CONFIG_ZLIB
+
+static inline void __fio_net_prep_tail(z_stream *stream, void *out_pdu,
+					struct sk_entry **last_entry,
+					struct sk_entry *first)
+{
+	unsigned int this_len = FIO_SERVER_MAX_FRAGMENT_PDU - stream->avail_out;
+
+	*last_entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
+				 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
+	flist_add_tail(&(*last_entry)->list, &first->next);
+
+}
+
+/*
+ * Deflates the next input given, creating as many new packets in the
+ * linked list as necessary.
+ */
+static int __deflate_pdu_buffer(void *next_in, unsigned int next_sz, void **out_pdu,
+				struct sk_entry **last_entry, z_stream *stream,
+				struct sk_entry *first)
+{
+	int ret;
+
+	stream->next_in = next_in;
+	stream->avail_in = next_sz;
+	do {
+		if (! stream->avail_out) {
+
+			__fio_net_prep_tail(stream, *out_pdu, last_entry, first);
+
+			*out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
+
+			stream->avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
+			stream->next_out = *out_pdu;
+		}
+
+		ret = deflate(stream, Z_BLOCK);
+
+		if (ret < 0) {
+			free(*out_pdu);
+			return 1;
+		}
+	} while (stream->avail_in);
+
+	return 0;
+}
+
+static int __fio_append_iolog_gz_hist(struct sk_entry *first, struct io_log *log,
+				      struct io_logs *cur_log, z_stream *stream)
+{
+	struct sk_entry *entry;
+	void *out_pdu;
+	int ret, i, j;
+	int sample_sz = log_entry_sz(log);
+
+	out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
+	stream->avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
+	stream->next_out = out_pdu;
+
+	for (i = 0; i < cur_log->nr_samples; i++) {
+		struct io_sample *s;
+		struct io_u_plat_entry *cur_plat_entry, *prev_plat_entry;
+		unsigned int *cur_plat, *prev_plat;
+
+		s = get_sample(log, cur_log, i);
+		ret = __deflate_pdu_buffer(s, sample_sz, &out_pdu, &entry, stream, first);
+		if (ret)
+			return ret;
+
+		/* Do the subtraction on server side so that client doesn't have to
+		 * reconstruct our linked list from packets.
+		 */
+		cur_plat_entry  = s->plat_entry;
+		prev_plat_entry = flist_first_entry(&cur_plat_entry->list, struct io_u_plat_entry, list);
+		cur_plat  = cur_plat_entry->io_u_plat;
+		prev_plat = prev_plat_entry->io_u_plat;
+
+		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
+			cur_plat[j] -= prev_plat[j];
+		}
+
+		flist_del(&prev_plat_entry->list);
+		free(prev_plat_entry);
+
+		ret = __deflate_pdu_buffer(cur_plat_entry, sizeof(*cur_plat_entry),
+					   &out_pdu, &entry, stream, first);
+
+		if (ret)
+			return ret;
+	}
+
+	__fio_net_prep_tail(stream, out_pdu, &entry, first);
+
+	return 0;
+}
+
 static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 				 struct io_logs *cur_log, z_stream *stream)
 {
@@ -1661,6 +1757,9 @@ static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 	void *out_pdu;
 	int ret;
 
+	if (log->log_type == IO_LOG_TYPE_HIST)
+		return __fio_append_iolog_gz_hist(first, log, cur_log, stream);
+
 	stream->next_in = (void *) cur_log->log;
 	stream->avail_in = cur_log->nr_samples * log_entry_sz(log);
 
@@ -1805,6 +1904,7 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 	pdu.nr_samples = cpu_to_le64(iolog_nr_samples(log));
 	pdu.thread_number = cpu_to_le32(td->thread_number);
 	pdu.log_type = cpu_to_le32(log->log_type);
+	pdu.log_hist_coarseness = cpu_to_le32(log->hist_coarseness);
 
 	if (!flist_empty(&log->chunk_list))
 		pdu.compressed = __cpu_to_le32(STORE_COMPRESSED);
diff --git a/server.h b/server.h
index fb384fb..6c572a1 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 56,
+	FIO_SERVER_VER			= 57,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
@@ -183,6 +183,7 @@ struct cmd_iolog_pdu {
 	uint32_t log_type;
 	uint32_t compressed;
 	uint32_t log_offset;
+	uint32_t log_hist_coarseness;
 	uint8_t name[FIO_NET_NAME_MAX];
 	struct io_sample samples[0];
 };


* Recent changes (master)
@ 2016-08-30 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-30 12:00 UTC
  To: fio

The following changes since commit 39d13e67ef1f4b327c68431f8daf033a03920117:

  backend: check if we need to update rusage stats, if stat_mutex is busy (2016-08-26 14:39:30 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 32bc83ddb4adb6b19500f2f6acec8c591feaae26:

  jesd219: fix alignment (2016-08-29 15:43:27 -0600)

----------------------------------------------------------------
Jeff Furlong (1):
      jesd219: fix alignment

Jens Axboe (3):
      trim: convert to bool
      filelock: bool conversion
      FIO-VERSION-GEN: fix dirty repo tracking

 FIO-VERSION-GEN      |  4 +---
 examples/jesd219.fio |  1 +
 filelock.c           | 14 +++++++-------
 filelock.h           |  4 +++-
 io_u.c               |  2 +-
 trim.c               | 14 +++++++-------
 trim.h               | 12 ++++++------
 7 files changed, 26 insertions(+), 25 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 7065a57..d19dcca 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -15,7 +15,7 @@ elif test -d .git -o -f .git &&
 	VN=`git describe --match "fio-[0-9]*" --abbrev=4 HEAD 2>/dev/null` &&
 	case "$VN" in
 	*$LF*) (exit 1) ;;
-	v[0-9]*)
+	fio-[0-9]*)
 		git update-index -q --refresh
 		test -z "`git diff-index --name-only HEAD --`" ||
 		VN="$VN-dirty" ;;
@@ -38,5 +38,3 @@ test "$VN" = "$VC" || {
 	echo >&2 "FIO_VERSION = $VN"
 	echo "FIO_VERSION = $VN" >$GVF
 }
-
-
diff --git a/examples/jesd219.fio b/examples/jesd219.fio
index ab2c40e..24f16f7 100644
--- a/examples/jesd219.fio
+++ b/examples/jesd219.fio
@@ -14,6 +14,7 @@ rwmixwrite=60
 iodepth=256
 numjobs=4
 bssplit=512/4:1024/1:1536/1:2048/1:2560/1:3072/1:3584/1:4k/67:8k/10:16k/7:32k/3:64k/3
+blockalign=4k
 random_distribution=zoned:50/5:30/15:20/80
 filename=/dev/nvme0n1
 group_reporting=1
diff --git a/filelock.c b/filelock.c
index b113007..6e84970 100644
--- a/filelock.c
+++ b/filelock.c
@@ -165,7 +165,7 @@ static struct fio_filelock *fio_hash_get(uint32_t hash, int trylock)
 	return ff;
 }
 
-static int __fio_lock_file(const char *fname, int trylock)
+static bool __fio_lock_file(const char *fname, int trylock)
 {
 	struct fio_filelock *ff;
 	uint32_t hash;
@@ -180,16 +180,16 @@ static int __fio_lock_file(const char *fname, int trylock)
 
 	if (!ff) {
 		assert(!trylock);
-		return 1;
+		return true;
 	}
 
 	if (!trylock) {
 		fio_mutex_down(&ff->lock);
-		return 0;
+		return false;
 	}
 
 	if (!fio_mutex_down_trylock(&ff->lock))
-		return 0;
+		return false;
 
 	fio_mutex_down(&fld->lock);
 
@@ -206,13 +206,13 @@ static int __fio_lock_file(const char *fname, int trylock)
 
 	if (ff) {
 		fio_mutex_down(&ff->lock);
-		return 0;
+		return false;
 	}
 
-	return 1;
+	return true;
 }
 
-int fio_trylock_file(const char *fname)
+bool fio_trylock_file(const char *fname)
 {
 	return __fio_lock_file(fname, 1);
 }
diff --git a/filelock.h b/filelock.h
index 97d13b7..4551bb0 100644
--- a/filelock.h
+++ b/filelock.h
@@ -1,8 +1,10 @@
 #ifndef FIO_LOCK_FILE_H
 #define FIO_LOCK_FILE_H
 
+#include "lib/types.h"
+
 extern void fio_lock_file(const char *);
-extern int fio_trylock_file(const char *);
+extern bool fio_trylock_file(const char *);
 extern void fio_unlock_file(const char *);
 
 extern int fio_filelock_init(void);
diff --git a/io_u.c b/io_u.c
index dcf7a40..b6d530f 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1510,7 +1510,7 @@ static bool check_get_trim(struct thread_data *td, struct io_u *io_u)
 			get_trim = 1;
 		}
 
-		if (get_trim && !get_next_trim(td, io_u))
+		if (get_trim && get_next_trim(td, io_u))
 			return true;
 	}
 
diff --git a/trim.c b/trim.c
index 4345541..78cf672 100644
--- a/trim.c
+++ b/trim.c
@@ -11,7 +11,7 @@
 #include "trim.h"
 
 #ifdef FIO_HAVE_TRIM
-int get_next_trim(struct thread_data *td, struct io_u *io_u)
+bool get_next_trim(struct thread_data *td, struct io_u *io_u)
 {
 	struct io_piece *ipo;
 
@@ -19,9 +19,9 @@ int get_next_trim(struct thread_data *td, struct io_u *io_u)
 	 * this io_u is from a requeue, we already filled the offsets
 	 */
 	if (io_u->file)
-		return 0;
+		return true;
 	if (flist_empty(&td->trim_list))
-		return 1;
+		return false;
 
 	assert(td->trim_entries);
 	ipo = flist_first_entry(&td->trim_list, struct io_piece, trim_list);
@@ -53,7 +53,7 @@ int get_next_trim(struct thread_data *td, struct io_u *io_u)
 		if (r) {
 			dprint(FD_VERIFY, "failed file %s open\n",
 					io_u->file->file_name);
-			return 1;
+			return false;
 		}
 	}
 
@@ -64,17 +64,17 @@ int get_next_trim(struct thread_data *td, struct io_u *io_u)
 	io_u->xfer_buflen = io_u->buflen;
 
 	dprint(FD_VERIFY, "get_next_trim: ret io_u %p\n", io_u);
-	return 0;
+	return true;
 }
 
-int io_u_should_trim(struct thread_data *td, struct io_u *io_u)
+bool io_u_should_trim(struct thread_data *td, struct io_u *io_u)
 {
 	unsigned long long val;
 	uint64_t frand_max;
 	unsigned long r;
 
 	if (!td->o.trim_percentage)
-		return 0;
+		return false;
 
 	frand_max = rand_max(&td->trim_state);
 	r = __rand(&td->trim_state);
diff --git a/trim.h b/trim.h
index 6584606..37f5d7c 100644
--- a/trim.h
+++ b/trim.h
@@ -4,8 +4,8 @@
 #include "fio.h"
 
 #ifdef FIO_HAVE_TRIM
-extern int __must_check get_next_trim(struct thread_data *td, struct io_u *io_u);
-extern int io_u_should_trim(struct thread_data *td, struct io_u *io_u);
+extern bool __must_check get_next_trim(struct thread_data *td, struct io_u *io_u);
+extern bool io_u_should_trim(struct thread_data *td, struct io_u *io_u);
 
 /*
  * Determine whether a given io_u should be logged for verify or
@@ -20,13 +20,13 @@ static inline void remove_trim_entry(struct thread_data *td, struct io_piece *ip
 }
 
 #else
-static inline int get_next_trim(struct thread_data *td, struct io_u *io_u)
+static inline bool get_next_trim(struct thread_data *td, struct io_u *io_u)
 {
-	return 1;
+	return false;
 }
-static inline int io_u_should_trim(struct thread_data *td, struct io_u *io_u)
+static inline bool io_u_should_trim(struct thread_data *td, struct io_u *io_u)
 {
-	return 0;
+	return false;
 }
 static inline void remove_trim_entry(struct thread_data *td, struct io_piece *ipo)
 {

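The trim/filelock conversion above does more than swap `int` for `bool`. It inverts the meaning of the return value. The old `get_next_trim()` returned 0 on success and 1 on failure, while the `bool` version returns `true` on success. That is why the caller in `io_u.c` drops the `!`. The sketch below shows the pattern with hypothetical stand-in functions (`old_get_next`, `new_get_next`, `caller_old`, `caller_new`), not fio's real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two conventions; not fio code. */

/* Old style: 0 means success, non-zero means failure. */
static int old_get_next(int queued)
{
	if (!queued)
		return 1;	/* nothing queued: failure */
	return 0;		/* success */
}

/* New style: true means success, false means failure. */
static bool new_get_next(int queued)
{
	if (!queued)
		return false;
	return true;
}

/* Caller written against the old convention: success is "!ret". */
static bool caller_old(int queued)
{
	if (!old_get_next(queued))
		return true;
	return false;
}

/* The same caller after the conversion: the '!' has to go away. */
static bool caller_new(int queued)
{
	if (new_get_next(queued))
		return true;
	return false;
}
```

Both callers must agree for every input. Any call site that keeps the old `!` after such a conversion silently turns success into failure.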

* Recent changes (master)
@ 2016-08-27 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-27 12:00 UTC
  To: fio

The following changes since commit 04d6530f6ecd50520e99732b0b6bb90f71ff131a:

  file: fix numjobs > 1 and implied jobname as filename (2016-08-25 21:00:55 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 39d13e67ef1f4b327c68431f8daf033a03920117:

  backend: check if we need to update rusage stats, if stat_mutex is busy (2016-08-26 14:39:30 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      file: fio_files_done() can return bool
      backend: check if we need to update rusage stats, if stat_mutex is busy

 backend.c   | 8 ++++++--
 file.h      | 2 +-
 filesetup.c | 7 +++----
 3 files changed, 10 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d986586..fb2a855 100644
--- a/backend.c
+++ b/backend.c
@@ -1731,9 +1731,13 @@ static void *thread_main(void *data)
 		 * the rusage_sem, which would never get upped because
 		 * this thread is waiting for the stat mutex.
 		 */
-		check_update_rusage(td);
+		do {
+			check_update_rusage(td);
+			if (!fio_mutex_down_trylock(stat_mutex))
+				break;
+			usleep(1000);
+		} while (1);
 
-		fio_mutex_down(stat_mutex);
 		if (td_read(td) && td->io_bytes[DDIR_READ])
 			update_runtime(td, elapsed_us, DDIR_READ);
 		if (td_write(td) && td->io_bytes[DDIR_WRITE])
diff --git a/file.h b/file.h
index aff3ce9..6f34dd5 100644
--- a/file.h
+++ b/file.h
@@ -210,7 +210,7 @@ extern int get_fileno(struct thread_data *, const char *);
 extern void free_release_files(struct thread_data *);
 extern void filesetup_mem_free(void);
 extern void fio_file_reset(struct thread_data *, struct fio_file *);
-extern int fio_files_done(struct thread_data *);
+extern bool fio_files_done(struct thread_data *);
 extern bool exists_and_not_regfile(const char *);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index fc9f306..c6ef3bf 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1292,7 +1292,6 @@ static void set_already_allocated(const char *fname)
 	}
 }
 
-
 static void free_already_allocated(void)
 {
 	struct flist_head *entry, *tmp;
@@ -1666,16 +1665,16 @@ void fio_file_reset(struct thread_data *td, struct fio_file *f)
 		lfsr_reset(&f->lfsr, td->rand_seeds[FIO_RAND_BLOCK_OFF]);
 }
 
-int fio_files_done(struct thread_data *td)
+bool fio_files_done(struct thread_data *td)
 {
 	struct fio_file *f;
 	unsigned int i;
 
 	for_each_file(td, f, i)
 		if (!fio_file_done(f))
-			return 0;
+			return false;
 
-	return 1;
+	return true;
 }
 
 /* free memory used in initialization phase only */

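The `backend.c` change replaces a blocking `fio_mutex_down(stat_mutex)` with a loop. Each pass first calls `check_update_rusage()`, then tries the lock, then sleeps 1 ms. As the patch comment explains, this lets the thread keep servicing the rusage semaphore while it waits, so the lock holder is never stuck waiting on it.

The sketch below reproduces the poll-then-trylock pattern with POSIX threads, under these assumptions:

- `pthread_mutex_trylock()` stands in for `fio_mutex_down_trylock()`. Judging from the diff, the fio helper also returns zero when the lock is acquired.
- `service_pending_work()` is a hypothetical placeholder for `check_update_rusage()`.

```c
#define _DEFAULT_SOURCE		/* exposes usleep() in glibc */
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical placeholder for check_update_rusage(): counts its calls. */
static int serviced;

static void service_pending_work(void)
{
	serviced++;
}

/*
 * Acquire 'lock' without ever blocking inside pthread_mutex_lock():
 * pending work is serviced before every attempt, so a lock holder that
 * is waiting on that work can make progress. Returns the attempt count.
 */
static int lock_while_servicing(pthread_mutex_t *lock)
{
	int attempts = 0;

	do {
		service_pending_work();
		attempts++;
		if (pthread_mutex_trylock(lock) == 0)
			break;		/* acquired */
		usleep(1000);		/* back off 1 ms, as the patch does */
	} while (1);

	return attempts;
}
```

The 1 ms sleep trades a little latency for not spinning hot while the mutex is contended.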

* Recent changes (master)
@ 2016-08-26 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-26 12:00 UTC
  To: fio

The following changes since commit 9e4438fecd1d92b4d5221f35d5e73546f52c6ebf:

  stat: don't trust per_unit_log() if log is NULL (2016-08-22 13:23:29 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 04d6530f6ecd50520e99732b0b6bb90f71ff131a:

  file: fix numjobs > 1 and implied jobname as filename (2016-08-25 21:00:55 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Merge branch 'epoch-histograms' of https://github.com/cronburg/fio
      file: fix numjobs > 1 and implied jobname as filename

Karl Cronburg (1):
      Give job file to fiologparser_hist.py so that it can auto detect log_hist_msec. This commit also adds handling of unix epoch timestamps by fiologparser_hist.py.

 file.h                          |  1 +
 filesetup.c                     | 43 +++++++++++++++++++++++++++++++----------
 init.c                          | 20 -------------------
 tools/hist/fiologparser_hist.py | 41 +++++++++++++++++++++++++++++++++++++--
 4 files changed, 73 insertions(+), 32 deletions(-)

---

Diff of recent changes:

diff --git a/file.h b/file.h
index f7e5d20..aff3ce9 100644
--- a/file.h
+++ b/file.h
@@ -211,5 +211,6 @@ extern void free_release_files(struct thread_data *);
 extern void filesetup_mem_free(void);
 extern void fio_file_reset(struct thread_data *, struct fio_file *);
 extern int fio_files_done(struct thread_data *);
+extern bool exists_and_not_regfile(const char *);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index 5db44c2..fc9f306 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1242,31 +1242,33 @@ static void get_file_type(struct fio_file *f)
 	}
 }
 
-static int __is_already_allocated(const char *fname)
+static bool __is_already_allocated(const char *fname)
 {
 	struct flist_head *entry;
-	char *filename;
 
 	if (flist_empty(&filename_list))
-		return 0;
+		return false;
 
 	flist_for_each(entry, &filename_list) {
-		filename = flist_entry(entry, struct file_name, list)->filename;
+		struct file_name *fn;
 
-		if (strcmp(filename, fname) == 0)
-			return 1;
+		fn = flist_entry(entry, struct file_name, list);
+
+		if (!strcmp(fn->filename, fname))
+			return true;
 	}
 
-	return 0;
+	return false;
 }
 
-static int is_already_allocated(const char *fname)
+static bool is_already_allocated(const char *fname)
 {
-	int ret;
+	bool ret;
 
 	fio_file_hash_lock();
 	ret = __is_already_allocated(fname);
 	fio_file_hash_unlock();
+
 	return ret;
 }
 
@@ -1327,6 +1329,26 @@ static struct fio_file *alloc_new_file(struct thread_data *td)
 	return f;
 }
 
+bool exists_and_not_regfile(const char *filename)
+{
+	struct stat sb;
+
+	if (lstat(filename, &sb) == -1)
+		return false;
+
+#ifndef WIN32 /* NOT Windows */
+	if (S_ISREG(sb.st_mode))
+		return false;
+#else
+	/* \\.\ is the device namespace in Windows, where every file
+	 * is a device node */
+	if (S_ISREG(sb.st_mode) && strncmp(filename, "\\\\.\\", 4) != 0)
+		return false;
+#endif
+
+	return true;
+}
+
 int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 {
 	int cur_files = td->files_index;
@@ -1343,7 +1365,8 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	sprintf(file_name + len, "%s", fname);
 
 	/* clean cloned siblings using existing files */
-	if (numjob && is_already_allocated(file_name))
+	if (numjob && is_already_allocated(file_name) &&
+	    !exists_and_not_regfile(fname))
 		return 0;
 
 	f = alloc_new_file(td);
diff --git a/init.c b/init.c
index 5ff7385..0221ab2 100644
--- a/init.c
+++ b/init.c
@@ -903,26 +903,6 @@ static const char *get_engine_name(const char *str)
 	return p;
 }
 
-static int exists_and_not_regfile(const char *filename)
-{
-	struct stat sb;
-
-	if (lstat(filename, &sb) == -1)
-		return 0;
-
-#ifndef WIN32 /* NOT Windows */
-	if (S_ISREG(sb.st_mode))
-		return 0;
-#else
-	/* \\.\ is the device namespace in Windows, where every file
-	 * is a device node */
-	if (S_ISREG(sb.st_mode) && strncmp(filename, "\\\\.\\", 4) != 0)
-		return 0;
-#endif
-
-	return 1;
-}
-
 static void init_rand_file_service(struct thread_data *td)
 {
 	unsigned long nranges = td->o.nr_files << FIO_FSERVICE_SHIFT;
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 778cc00..93dca01 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -255,6 +255,29 @@ def guess_max_from_bins(ctx, hist_cols):
 
 def main(ctx):
 
+    if ctx.job_file:
+        try:
+            from configparser import SafeConfigParser
+        except ImportError:
+            from ConfigParser import SafeConfigParser
+
+        cp = SafeConfigParser(allow_no_value=True)
+        with open(ctx.job_file, 'r') as fp:
+            cp.read(fp)
+
+        if ctx.interval is None:
+            # Auto detect --interval value
+            for s in cp.sections():
+                try:
+                    hist_msec = cp[s]['log_hist_msec']
+                    if hist_msec is not None:
+                        ctx.interval = int(hist_msec)
+                except KeyError:
+                    pass
+
+    if ctx.interval is None:
+        ctx.interval = 1000
+
     # Automatically detect how many columns are in the input files,
     # calculate the corresponding 'coarseness' parameter used to generate
     # those files, and calculate the appropriate bin latency values:
@@ -291,6 +314,14 @@ def main(ctx):
             arr = arr.astype(int)
             
             if arr.size > 0:
+                # Jump immediately to the start of the input, rounding
+                # down to the nearest multiple of the interval (useful when --log_unix_epoch
+                # was used to create these histograms):
+                if start == 0 and arr[0][0] - ctx.max_latency > end:
+                    start = arr[0][0] - ctx.max_latency
+                    start = start - (start % ctx.interval)
+                    end = start + ctx.interval
+
                 process_interval(ctx, arr, start, end)
                 
                 # Update arr to throw away samples we no longer need - samples which
@@ -321,9 +352,8 @@ if __name__ == '__main__':
         help='number of seconds of data to process at a time')
 
     arg('-i', '--interval',
-        default=1000,
         type=int,
-        help='interval width (ms)')
+        help='interval width (ms), default 1000 ms')
 
     arg('-d', '--divisor',
         required=False,
@@ -347,5 +377,12 @@ if __name__ == '__main__':
         type=int,
         help='FIO_IO_U_PLAT_GROUP_NR as defined in stat.h')
 
+    arg('--job-file',
+        default=None,
+        type=str,
+        help='Optional argument pointing to the job file used to create the '
+             'given histogram files. Useful for auto-detecting --log_hist_msec and '
+             '--log_unix_epoch (in fio) values.')
+
     main(p.parse_args())
 


* Recent changes (master)
@ 2016-08-23 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-23 12:00 UTC
  To: fio

The following changes since commit a86f6d07f12141a32ccad2007d4568e612e0df10:

  verify: use proper include for PATH_MAX (2016-08-20 10:28:57 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9e4438fecd1d92b4d5221f35d5e73546f52c6ebf:

  stat: don't trust per_unit_log() if log is NULL (2016-08-22 13:23:29 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      stat: don't trust per_unit_log() if log is NULL

 stat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 552d88d..74c2686 100644
--- a/stat.c
+++ b/stat.c
@@ -2457,12 +2457,12 @@ int calc_log_samples(void)
 			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
 			continue;
 		}
-		if (!per_unit_log(td->bw_log)) {
+		if (td->bw_log && !per_unit_log(td->bw_log)) {
 			tmp = add_bw_samples(td, &now);
 			if (tmp < next)
 				next = tmp;
 		}
-		if (!per_unit_log(td->iops_log)) {
+		if (td->iops_log && !per_unit_log(td->iops_log)) {
 			tmp = add_iops_samples(td, &now);
 			if (tmp < next)
 				next = tmp;

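The `calc_log_samples()` fix guards against a subtle trap. If a predicate answers `false` for a NULL log, then `!per_unit_log(NULL)` is `true`, and the averaged-sampling branch runs for a log that does not exist. The explicit `td->bw_log &&` and `td->iops_log &&` checks close that hole.

The sketch below reproduces the trap. `struct demo_log` and `per_unit()` are hypothetical, modeled on the behaviour implied by the commit title rather than copied from fio's `per_unit_log()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log type: avg_msec == 0 means "one entry per I/O". */
struct demo_log {
	unsigned int avg_msec;
};

/* Hypothetical predicate: tolerates NULL but answers false for it. */
static bool per_unit(const struct demo_log *log)
{
	return log != NULL && log->avg_msec == 0;
}

/* Pre-fix shape of the check: wrongly true for a NULL log. */
static bool wants_averaged_sample_unguarded(const struct demo_log *log)
{
	return !per_unit(log);
}

/* Post-fix shape: a NULL log never qualifies. */
static bool wants_averaged_sample(const struct demo_log *log)
{
	return log != NULL && !per_unit(log);
}
```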

* Recent changes (master)
@ 2016-08-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-21 12:00 UTC
  To: fio

The following changes since commit d1f6fcadb7cb28a5e57a5e573395fe2deb3cfd7b:

  Manual page for fiologparser_hist.py and Makefile updates to install them. (2016-08-18 18:56:17 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a86f6d07f12141a32ccad2007d4568e612e0df10:

  verify: use proper include for PATH_MAX (2016-08-20 10:28:57 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      verify: use proper include for PATH_MAX

 verify-state.h | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/verify-state.h b/verify-state.h
index 901aa0a..e46265e 100644
--- a/verify-state.h
+++ b/verify-state.h
@@ -3,6 +3,7 @@
 
 #include <stdint.h>
 #include <string.h>
+#include <limits.h>
 
 struct thread_rand32_state {
 	uint32_t s[4];


* Recent changes (master)
@ 2016-08-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-19 12:00 UTC
  To: fio

The following changes since commit c22825bb537af1f84a18dcb4af6d8c6844f751ac:

  Fix backwards reads with --size smaller than the file size (2016-08-16 15:22:17 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d1f6fcadb7cb28a5e57a5e573395fe2deb3cfd7b:

  Manual page for fiologparser_hist.py and Makefile updates to install them. (2016-08-18 18:56:17 -0400)

----------------------------------------------------------------
Jens Axboe (4):
      fio: use the proper enum type for the shifted IO engine flags
      Add basic write/read-and-verify example job file
      parse: fix void * pointer math complaint
      Merge branch 'epoch' of https://github.com/cronburg/fio

Karl Cronburg (2):
      Option for changing log files to use Unix epoch instead of being zero-based (when fio starts) epoch.
      Manual page for fiologparser_hist.py and Makefile updates to install them.

 HOWTO                             |   4 +
 Makefile                          |   6 +-
 backend.c                         |   2 +-
 cconv.c                           |   2 +
 examples/basic-verify.fio         |  12 +++
 fio.1                             |   5 +
 fio.h                             |   9 +-
 fio_time.h                        |   1 +
 libfio.c                          |  10 +-
 options.c                         |   9 ++
 parse.h                           |   2 +-
 rate-submit.c                     |   2 +-
 stat.c                            |   2 +-
 thread_options.h                  |   3 +
 time.c                            |  11 +++
 tools/hist/fiologparser_hist.py   | 117 ++--------------------
 tools/hist/fiologparser_hist.py.1 | 201 ++++++++++++++++++++++++++++++++++++++
 17 files changed, 272 insertions(+), 126 deletions(-)
 create mode 100644 examples/basic-verify.fio
 create mode 100644 tools/hist/fiologparser_hist.py.1

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c1b768d..07419a1 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1681,6 +1681,10 @@ log_store_compressed=bool	If set, fio will store the log files in a
 		the --inflate-log command line parameter. The files will be
 		stored with a .fz suffix.
 
+log_unix_epoch=bool	If set, fio will log Unix timestamps to the log
+		files produced by enabling write_type_log for each log type, instead
+		of the default zero-based timestamps.
+
 block_error_percentiles=bool	If set, record errors in trim block-sized
 		units from writes and trims and output a histogram of
 		how many trims it took to get to errors, and what kind
diff --git a/Makefile b/Makefile
index b54f7e9..3f67ab7 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/fio_latency2csv.py)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/fio_latency2csv.py tools/hist/fiologparser_hist.py)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3
@@ -430,7 +430,7 @@ clean: FORCE
 	@rm -f .depend $(FIO_OBJS) $(GFIO_OBJS) $(OBJS) $(T_OBJS) $(PROGS) $(T_PROGS) $(T_TEST_PROGS) core.* core gfio FIO-VERSION-FILE *.d lib/*.d oslib/*.d crc/*.d engines/*.d profiles/*.d t/*.d config-host.mak config-host.h y.tab.[ch] lex.yy.c exp/*.[do] lexer.h
 
 distclean: clean FORCE
-	@rm -f cscope.out fio.pdf fio_generate_plots.pdf fio2gnuplot.pdf
+	@rm -f cscope.out fio.pdf fio_generate_plots.pdf fio2gnuplot.pdf fiologparser_hist.pdf
 
 cscope:
 	@cscope -b -R
@@ -442,6 +442,7 @@ doc: tools/plot/fio2gnuplot.1
 	@man -t ./fio.1 | ps2pdf - fio.pdf
 	@man -t tools/fio_generate_plots.1 | ps2pdf - fio_generate_plots.pdf
 	@man -t tools/plot/fio2gnuplot.1 | ps2pdf - fio2gnuplot.pdf
+	@man -t tools/hist/fiologparser_hist.py.1 | ps2pdf - fiologparser_hist.pdf
 
 test:
 
@@ -452,5 +453,6 @@ install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 644 $(SRCDIR)/fio.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 644 $(SRCDIR)/tools/fio_generate_plots.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 644 $(SRCDIR)/tools/plot/fio2gnuplot.1 $(DESTDIR)$(mandir)/man1
+	$(INSTALL) -m 644 $(SRCDIR)/tools/hist/fiologparser_hist.py.1 $(DESTDIR)$(mandir)/man1
 	$(INSTALL) -m 755 -d $(DESTDIR)$(sharedir)
 	$(INSTALL) -m 644 $(SRCDIR)/tools/plot/*gpm $(DESTDIR)$(sharedir)/
diff --git a/backend.c b/backend.c
index b43486d..d986586 100644
--- a/backend.c
+++ b/backend.c
@@ -1675,7 +1675,7 @@ static void *thread_main(void *data)
 	if (rate_submit_init(td, sk_out))
 		goto err;
 
-	fio_gettime(&td->epoch, NULL);
+	set_epoch_time(td, o->log_unix_epoch);
 	fio_getrusage(&td->ru_start);
 	memcpy(&td->bw_sample_time, &td->epoch, sizeof(td->epoch));
 	memcpy(&td->iops_sample_time, &td->epoch, sizeof(td->epoch));
diff --git a/cconv.c b/cconv.c
index 8d9a0a8..194e342 100644
--- a/cconv.c
+++ b/cconv.c
@@ -187,6 +187,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->log_offset = le32_to_cpu(top->log_offset);
 	o->log_gz = le32_to_cpu(top->log_gz);
 	o->log_gz_store = le32_to_cpu(top->log_gz_store);
+	o->log_unix_epoch = le32_to_cpu(top->log_unix_epoch);
 	o->norandommap = le32_to_cpu(top->norandommap);
 	o->softrandommap = le32_to_cpu(top->softrandommap);
 	o->bs_unaligned = le32_to_cpu(top->bs_unaligned);
@@ -379,6 +380,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->log_offset = cpu_to_le32(o->log_offset);
 	top->log_gz = cpu_to_le32(o->log_gz);
 	top->log_gz_store = cpu_to_le32(o->log_gz_store);
+	top->log_unix_epoch = cpu_to_le32(o->log_unix_epoch);
 	top->norandommap = cpu_to_le32(o->norandommap);
 	top->softrandommap = cpu_to_le32(o->softrandommap);
 	top->bs_unaligned = cpu_to_le32(o->bs_unaligned);
diff --git a/examples/basic-verify.fio b/examples/basic-verify.fio
new file mode 100644
index 0000000..7871aeb
--- /dev/null
+++ b/examples/basic-verify.fio
@@ -0,0 +1,12 @@
+# The most basic form of data verification. Write the device randomly
+# in 4K chunks, then read it back and verify the contents.
+[write-and-verify]
+rw=randwrite
+bs=4k
+direct=1
+ioengine=libaio
+iodepth=16
+verify=crc32c
+# Use /dev/XXX. For running this on a file instead, remove the filename
+# option and add a size=32G (or whatever file size you want) instead.
+filename=/dev/XXX
diff --git a/fio.1 b/fio.1
index 696664a..8d596fb 100644
--- a/fio.1
+++ b/fio.1
@@ -1546,6 +1546,11 @@ If set, fio will store the log files in a compressed format. They can be
 decompressed with fio, using the \fB\-\-inflate-log\fR command line parameter.
 The files will be stored with a \fB\.fz\fR suffix.
 .TP
+.BI log_unix_epoch \fR=\fPbool
+If set, fio will log Unix timestamps to the log files produced by enabling
+\fBwrite_type_log\fR for each log type, instead of the default zero-based
+timestamps.
+.TP
 .BI block_error_percentiles \fR=\fPbool
 If set, record errors in trim block-sized units from writes and trims and output
 a histogram of how many trims it took to get to errors, and what kind of error
diff --git a/fio.h b/fio.h
index 0da0bc5..df4fbb1 100644
--- a/fio.h
+++ b/fio.h
@@ -311,6 +311,7 @@ struct thread_data {
 
 	struct timeval start;	/* start of this loop */
 	struct timeval epoch;	/* time job was started */
+	unsigned long long unix_epoch; /* Time job was started, unix epoch based. */
 	struct timeval last_issue;
 	long time_offset;
 	struct timeval tv_cache;
@@ -563,7 +564,8 @@ enum {
 
 static inline enum fio_ioengine_flags td_ioengine_flags(struct thread_data *td)
 {
-	return (td->flags >> TD_ENG_FLAG_SHIFT) & TD_ENG_FLAG_MASK;
+	return (enum fio_ioengine_flags)
+		((td->flags >> TD_ENG_FLAG_SHIFT) & TD_ENG_FLAG_MASK);
 }
 
 static inline void td_set_ioengine_flags(struct thread_data *td)
@@ -571,9 +573,10 @@ static inline void td_set_ioengine_flags(struct thread_data *td)
 	td->flags |= (td->io_ops->flags << TD_ENG_FLAG_SHIFT);
 }
 
-static inline bool td_ioengine_flagged(struct thread_data *td, unsigned int val)
+static inline bool td_ioengine_flagged(struct thread_data *td,
+				       enum fio_ioengine_flags flags)
 {
-	return ((td->flags >> TD_ENG_FLAG_SHIFT) & val) != 0;
+	return ((td->flags >> TD_ENG_FLAG_SHIFT) & flags) != 0;
 }
 
 extern void td_set_runstate(struct thread_data *, int);
diff --git a/fio_time.h b/fio_time.h
index e31ea09..b49cc82 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -20,5 +20,6 @@ extern bool ramp_time_over(struct thread_data *);
 extern bool in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
 extern void timeval_add_msec(struct timeval *, unsigned int);
+extern void set_epoch_time(struct thread_data *, int);
 
 #endif
diff --git a/libfio.c b/libfio.c
index fb7d35a..d88ed4e 100644
--- a/libfio.c
+++ b/libfio.c
@@ -134,7 +134,6 @@ void clear_io_state(struct thread_data *td, int all)
 
 void reset_all_stats(struct thread_data *td)
 {
-	struct timeval tv;
 	int i;
 
 	reset_io_counters(td, 1);
@@ -148,11 +147,10 @@ void reset_all_stats(struct thread_data *td)
 		td->rwmix_issues = 0;
 	}
 
-	fio_gettime(&tv, NULL);
-	memcpy(&td->epoch, &tv, sizeof(tv));
-	memcpy(&td->start, &tv, sizeof(tv));
-	memcpy(&td->iops_sample_time, &tv, sizeof(tv));
-	memcpy(&td->bw_sample_time, &tv, sizeof(tv));
+	set_epoch_time(td, td->o.log_unix_epoch);
+	memcpy(&td->start, &td->epoch, sizeof(struct timeval));
+	memcpy(&td->iops_sample_time, &td->epoch, sizeof(struct timeval));
+	memcpy(&td->bw_sample_time, &td->epoch, sizeof(struct timeval));
 
 	lat_target_reset(td);
 	clear_rusage_stat(td);
diff --git a/options.c b/options.c
index 517ee68..50b4d09 100644
--- a/options.c
+++ b/options.c
@@ -3648,6 +3648,15 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 #endif
 	{
+		.name = "log_unix_epoch",
+		.lname = "Log epoch unix",
+		.type = FIO_OPT_BOOL,
+		.off1 = offsetof(struct thread_options, log_unix_epoch),
+		.help = "Use Unix time in log files",
+		.category = FIO_OPT_C_LOG,
+		.group = FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "block_error_percentiles",
 		.lname	= "Block error percentiles",
 		.type	= FIO_OPT_BOOL,
diff --git a/parse.h b/parse.h
index d852ddc..7ba4e37 100644
--- a/parse.h
+++ b/parse.h
@@ -116,7 +116,7 @@ static inline void *td_var(struct thread_options *to, struct fio_option *o,
 	else
 		ret = to;
 
-	return ret + offset;
+	return (char *) ret + offset;
 }
 
 static inline int parse_is_percent(unsigned long long val)
diff --git a/rate-submit.c b/rate-submit.c
index 2efbdcb..42927ff 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -123,7 +123,7 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	if (td_io_init(td))
 		goto err_io_init;
 
-	fio_gettime(&td->epoch, NULL);
+	set_epoch_time(td, td->o.log_unix_epoch);
 	fio_getrusage(&td->ru_start);
 	clear_io_state(td, 1);
 
diff --git a/stat.c b/stat.c
index 5e7c593..552d88d 100644
--- a/stat.c
+++ b/stat.c
@@ -2020,7 +2020,7 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
 		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
 		s->val = val;
-		s->time = t;
+		s->time = t + iolog->td->unix_epoch;
 		io_sample_set_ddir(iolog, s, ddir);
 		s->bs = bs;
 
diff --git a/thread_options.h b/thread_options.h
index d70fda3..1b4590f 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -135,6 +135,7 @@ struct thread_options {
 	unsigned int log_offset;
 	unsigned int log_gz;
 	unsigned int log_gz_store;
+	unsigned int log_unix_epoch;
 	unsigned int norandommap;
 	unsigned int softrandommap;
 	unsigned int bs_unaligned;
@@ -393,11 +394,13 @@ struct thread_options_pack {
 	uint32_t log_offset;
 	uint32_t log_gz;
 	uint32_t log_gz_store;
+	uint32_t log_unix_epoch;
 	uint32_t norandommap;
 	uint32_t softrandommap;
 	uint32_t bs_unaligned;
 	uint32_t fsync_on_close;
 	uint32_t bs_is_seq_rand;
+	uint32_t pad1;
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
diff --git a/time.c b/time.c
index f1c5d3f..f5dc049 100644
--- a/time.c
+++ b/time.c
@@ -151,6 +151,17 @@ void set_genesis_time(void)
 	fio_gettime(&genesis, NULL);
 }
 
+void set_epoch_time(struct thread_data *td, int log_unix_epoch)
+{
+	fio_gettime(&td->epoch, NULL);
+	if (log_unix_epoch) {
+		struct timeval tv;
+		gettimeofday(&tv, NULL);
+		td->unix_epoch = (unsigned long long)(tv.tv_sec) * 1000 +
+		                 (unsigned long long)(tv.tv_usec) / 1000;
+	}
+}
+
 void fill_start_time(struct timeval *t)
 {
 	memcpy(t, &genesis, sizeof(genesis));
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index 5891427..778cc00 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -11,111 +11,6 @@
             4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
             ...
     
-    Notes:
-
-    * end-times are calculated to be uniform increments of the --interval value given,
-      regardless of when histogram samples are reported. Of note:
-        
-        * Intervals with no samples are omitted. In the example above this means
-          "no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
-          of the interval from 3 to 4 seconds".
-        
-        * Intervals with a single sample will have the same value for all statistics
-        
-    * The number of samples is unweighted, corresponding to the total number of samples
-      which have any effect whatsoever on the interval.
-
-    * Min statistics are computed using value of the lower boundary of the first bin
-      (in increasing bin order) with non-zero samples in it. Similarly for max,
-      we take the upper boundary of the last bin with non-zero samples in it.
-      This is semantically identical to taking the 0th and 100th percentiles with a
-      50% bin-width buffer (because percentiles are computed using mid-points of
-      the bins). This enforces the following nice properties:
-
-        * min <= 50th <= 90th <= 95th <= 99th <= max
-
-        * min and max are strict lower and upper bounds on the actual
-          min / max seen by fio (and reported in *_clat.* with averaging turned off).
-
-    * Average statistics use a standard weighted arithmetic mean.
-
-    * Percentile statistics are computed using the weighted percentile method as
-      described here: https://en.wikipedia.org/wiki/Percentile#Weighted_percentile
-      See weights() method for details on how weights are computed for individual
-      samples. In process_interval() we further multiply by the height of each bin
-      to get weighted histograms.
-    
-    * We convert files given on the command line, assumed to be fio histogram files,
-      An individual histogram file can contain the
-      histograms for multiple different r/w directions (notably when --rw=randrw). This
-      is accounted for by tracking each r/w direction separately. In the statistics
-      reported we ultimately merge *all* histograms (regardless of r/w direction).
-
-    * The value of *_GROUP_NR in stat.h (and *_BITS) determines how many latency bins
-      fio outputs when histogramming is enabled. Namely for the current default of
-      GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
-      seconds. For certain applications this may not be sufficient. With GROUP_NR=24
-      we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
-      you expect your application to experience latencies greater than 17 seconds,
-      you will need to recompile fio with a larger GROUP_NR, e.g. with:
-        
-            sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19\n/#define FIO_IO_U_PLAT_GROUP_NR 24/g' stat.h
-            make fio
-            
-      Quick reference table for the max latency corresponding to a sampling of
-      values for GROUP_NR:
-            
-            GROUP_NR | # bins | max latency bin value
-            19       | 1216   | 16.9 sec
-            20       | 1280   | 33.8 sec
-            21       | 1344   | 67.6 sec
-            22       | 1408   | 2  min, 15 sec
-            23       | 1472   | 4  min, 32 sec
-            24       | 1536   | 9  min, 4  sec
-            25       | 1600   | 18 min, 8  sec
-            26       | 1664   | 36 min, 16 sec
-      
-    * At present this program automatically detects the number of histogram bins in
-      the log files, and adjusts the bin latency values accordingly. In particular if
-      you use the --log_hist_coarseness parameter of fio, you get output files with
-      a number of bins according to the following table (note that the first
-      row is identical to the table above):
-
-      coarse \ GROUP_NR
-                  19     20    21     22     23     24     25     26
-             -------------------------------------------------------
-            0  [[ 1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
-            1   [  608,   640,   672,   704,   736,   768,   800,   832],
-            2   [  304,   320,   336,   352,   368,   384,   400,   416],
-            3   [  152,   160,   168,   176,   184,   192,   200,   208],
-            4   [   76,    80,    84,    88,    92,    96,   100,   104],
-            5   [   38,    40,    42,    44,    46,    48,    50,    52],
-            6   [   19,    20,    21,    22,    23,    24,    25,    26],
-            7   [  N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
-            8   [  N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
-
-      For other values of GROUP_NR and coarseness, this table can be computed like this:    
-        
-            bins = [1216,1280,1344,1408,1472,1536,1600,1664]
-            max_coarse = 8
-            fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else nan, range(max_coarse + 1)))
-            np.transpose(list(map(fncn, bins)))
-      
-      Also note that you can achieve the same downsampling / log file size reduction
-      by pre-processing (before inputting into this script) with half_bins.py.
-
-    * If you have not adjusted GROUP_NR for your (high latency) application, then you
-      will see the percentiles computed by this tool max out at the max latency bin
-      value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
-      a max latency of ~16.7 seconds in the red line):
-
-            https://www.cronburg.com/fio/max_latency_bin_value_bug.png
-    
-    * Motivation for, design decisions, and the implementation process are
-      described in further detail here:
-
-            https://www.cronburg.com/fio/cloud-latency-problem-measurement/
-
     @author Karl Cronburg <karl.cronburg@gmail.com>
 """
 import os
@@ -216,7 +111,7 @@ def histogram_generator(ctx, fps, sz):
             rdrs[fp] = pandas.read_csv(fp, dtype=int, header=None, chunksize=sz)
         except ValueError as e:
             if e.message == 'No columns to parse from file':
-                if not ctx.nowarn: sys.stderr.write("WARNING: Empty input file encountered.\n")
+                if ctx.warn: sys.stderr.write("WARNING: Empty input file encountered.\n")
                 rdrs[fp] = None
             else:
                 raise(e)
@@ -441,11 +336,11 @@ if __name__ == '__main__':
         type=int,
         help='number of decimal places to print floats to')
 
-    arg('--nowarn',
-        dest='nowarn',
-        action='store_false',
-        default=True,
-        help='do not print any warning messages to stderr')
+    arg('--warn',
+        dest='warn',
+        action='store_true',
+        default=False,
+        help='print warning messages to stderr')
 
     arg('--group_nr',
         default=19,
diff --git a/tools/hist/fiologparser_hist.py.1 b/tools/hist/fiologparser_hist.py.1
new file mode 100644
index 0000000..ed22c74
--- /dev/null
+++ b/tools/hist/fiologparser_hist.py.1
@@ -0,0 +1,201 @@
+.TH fiologparser_hist.py 1 "August 18, 2016"
+.SH NAME
+fiologparser_hist.py \- Calculate statistics from fio histograms
+.SH SYNOPSIS
+.B fiologparser_hist.py
+[\fIoptions\fR] [clat_hist_files]...
+.SH DESCRIPTION
+.B fiologparser_hist.py
+is a utility for converting *_clat_hist* files
+generated by fio into a CSV of latency statistics including minimum,
+average, maximum latency, and 50th, 90th, 95th, and 99th percentiles.
+.SH EXAMPLES
+.PP
+.nf
+$ fiologparser_hist.py *_clat_hist*
+end-time, samples, min, avg, median, 90%, 95%, 99%, max
+1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
+2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
+4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
+...
+.fi
+.PP
+
+.SH OPTIONS
+.TP
+.BR \-\-help
+Print these options.
+.TP
+.BR \-\-buff_size \fR=\fPint
+Number of samples to buffer into numpy at a time. Default is 10,000.
+This can be adjusted to help performance.
+.TP
+.BR \-\-max_latency \fR=\fPint
+Number of seconds of data to process at a time. Defaults to 20 seconds,
+in order to handle the 17 second upper bound on latency in histograms
+reported by fio. This should be increased if fio has been
+run with a larger maximum latency. Lowering this when a lower maximum
+latency is known can improve performance. See NOTES for more details.
+.TP
+.BR \-i ", " \-\-interval \fR=\fPint
+Interval at which statistics are reported. Defaults to 1000 ms. This
+should be set to at least the value of \fBlog_hist_msec\fR as given
+to fio.
+.TP
+.BR \-d ", " \-\-divisor \fR=\fPint
+Divide statistics by this value. Defaults to 1. Useful if you want to
+convert latencies from milliseconds to seconds (\fBdivisor\fR=\fP1000\fR).
+.TP
+.BR \-\-warn
+Enables warning messages printed to stderr, useful for debugging.
+.TP
+.BR \-\-group_nr \fR=\fPint
+Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in
+\fIstat.h\fR if fio has been recompiled. Defaults to 19, the
+current value used in fio. See NOTES for more details.
+
+.SH NOTES
+end-times are calculated to be uniform increments of the \fB\-\-interval\fR value given,
+regardless of when histogram samples are reported. Of note:
+
+.RS
+Intervals with no samples are omitted. In the example above this means
+"no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
+of the interval from 3 to 4 seconds".
+.LP
+Intervals with a single sample will have the same value for all statistics.
+.RE
+
+.PP
+The number of samples is unweighted, corresponding to the total number of samples
+which have any effect whatsoever on the interval.
+
+Min statistics are computed using the value of the lower boundary of the first bin
+(in increasing bin order) with non-zero samples in it. Similarly for max,
+we take the upper boundary of the last bin with non-zero samples in it.
+This is semantically identical to taking the 0th and 100th percentiles with a
+50% bin-width buffer (because percentiles are computed using mid-points of
+the bins). This enforces the following nice properties:
+
+.RS
+min <= 50th <= 90th <= 95th <= 99th <= max
+.LP
+min and max are strict lower and upper bounds on the actual
+min / max seen by fio (and reported in *_clat.* with averaging turned off).
+.RE
+
+.PP
+Average statistics use a standard weighted arithmetic mean.
+
+Percentile statistics are computed using the weighted percentile method as
+described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR.
+See the weights() method for details on how weights are computed for individual
+samples. In process_interval() we further multiply by the height of each bin
+to get weighted histograms.
+
+Files given on the command line are assumed to be fio histogram files.
+An individual histogram file can contain the
+histograms for multiple different r/w directions (notably when \fB\-\-rw\fR=\fPrandrw\fR). This
+is accounted for by tracking each r/w direction separately. In the statistics
+reported we ultimately merge *all* histograms (regardless of r/w direction).
+
+The value of *_GROUP_NR in \fIstat.h\fR (and *_BITS) determines how many latency bins
+fio outputs when histogramming is enabled. Namely for the current default of
+GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
+seconds. For certain applications this may not be sufficient. With GROUP_NR=24
+we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
+you expect your application to experience latencies greater than 17 seconds,
+you will need to recompile fio with a larger GROUP_NR, e.g. with:
+
+.RS
+.PP
+.nf
+sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
+make fio
+.fi
+.PP
+.RE
+
+.PP
+Quick reference table for the max latency corresponding to a sampling of
+values for GROUP_NR:
+
+.RS
+.PP
+.nf
+GROUP_NR | # bins | max latency bin value
+19       | 1216   | 16.9 sec
+20       | 1280   | 33.8 sec
+21       | 1344   | 67.6 sec
+22       | 1408   | 2  min, 15 sec
+23       | 1472   | 4  min, 32 sec
+24       | 1536   | 9  min, 4  sec
+25       | 1600   | 18 min, 8  sec
+26       | 1664   | 36 min, 16 sec
+.fi
+.PP
+.RE
+
+.PP
+At present this program automatically detects the number of histogram bins in
+the log files, and adjusts the bin latency values accordingly. In particular if
+you use the \fB\-\-log_hist_coarseness\fR parameter of fio, you get output files with
+a number of bins according to the following table (note that the first
+row is identical to the table above):
+
+.RS
+.PP
+.nf
+coarse \\ GROUP_NR
+        19     20    21     22     23     24     25     26
+   -------------------------------------------------------
+  0  [[ 1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
+  1   [  608,   640,   672,   704,   736,   768,   800,   832],
+  2   [  304,   320,   336,   352,   368,   384,   400,   416],
+  3   [  152,   160,   168,   176,   184,   192,   200,   208],
+  4   [   76,    80,    84,    88,    92,    96,   100,   104],
+  5   [   38,    40,    42,    44,    46,    48,    50,    52],
+  6   [   19,    20,    21,    22,    23,    24,    25,    26],
+  7   [  N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
+  8   [  N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
+.fi
+.PP
+.RE
+
+.PP
+For other values of GROUP_NR and coarseness, this table can be computed like this:
+
+.RS
+.PP
+.nf
+bins = [1216,1280,1344,1408,1472,1536,1600,1664]
+max_coarse = 8
+fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else nan, range(max_coarse + 1)))
+np.transpose(list(map(fncn, bins)))
+.fi
+.PP
+.RE
+
+.PP
+If you have not adjusted GROUP_NR for your (high latency) application, then you
+will see the percentiles computed by this tool max out at the max latency bin
+value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
+a max latency of ~16.7 seconds in the red line):
+
+.RS
+\fIhttps://www.cronburg.com/fio/max_latency_bin_value_bug.png
+.RE
+
+.PP
+The motivation, design decisions, and implementation process are
+described in further detail here:
+
+.RS
+\fIhttps://www.cronburg.com/fio/cloud-latency-problem-measurement/
+.RE
+
+.SH AUTHOR
+.B fiologparser_hist.py
+and this manual page were written by Karl Cronburg <karl.cronburg@gmail.com>.
+.SH "REPORTING BUGS"
+Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
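
The fiologparser_hist.py man page in the patch above describes two computations that are easy to check by hand: the number of bins for each GROUP_NR and coarseness pair, and weighted percentiles. Its own table snippet also assumes `np` and `nan` are already in scope. What follows is a minimal, NumPy-free sketch, not the tool's code. The function names are hypothetical. It assumes 64 bins per group (FIO_IO_U_PLAT_VAL=64, the default in the tool's `_plat_idx_to_val` signature), which agrees with 19 x 64 = 1216 in the table. `weighted_percentile` follows the Wikipedia definition the man page links to.

```python
def coarse_bin_counts(group_nrs, max_coarse=8, plat_val=64):
    """Histogram bin counts per GROUP_NR at each --log_hist_coarseness level.

    Assumption: GROUP_NR * plat_val bins at coarseness 0 (plat_val=64 matches
    19 * 64 = 1216), and coarseness c sums 2**c adjacent bins. None marks the
    N/A entries where 2**c does not divide the bin count.
    """
    table = []
    for c in range(max_coarse + 1):
        stride = 2 ** c
        row = []
        for g in group_nrs:
            bins = g * plat_val
            row.append(bins // stride if bins % stride == 0 else None)
        table.append(row)
    return table


def weighted_percentile(percs, values, weights):
    """Weighted percentiles with linear interpolation (Wikipedia's
    'weighted percentile' definition). Assumes the weights sum to > 0.
    """
    pairs = sorted(zip(values, weights))
    vs = [v for v, _ in pairs]
    total = float(sum(w for _, w in pairs))
    # Each value sits at 100 * (cumulative weight - half its own weight) / total.
    cdf, running = [], 0.0
    for _, w in pairs:
        running += w
        cdf.append(100.0 * (running - w / 2.0) / total)
    result = []
    for p in percs:
        if p <= cdf[0]:
            result.append(vs[0])      # clamp below the first position
        elif p >= cdf[-1]:
            result.append(vs[-1])     # clamp above the last position
        else:
            # First position at or above p; its predecessor is strictly below p.
            i = next(k for k in range(1, len(cdf)) if cdf[k] >= p)
            frac = (p - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            result.append(vs[i - 1] + frac * (vs[i] - vs[i - 1]))
    return result
```

For example, `coarse_bin_counts(range(19, 27))[8]` returns `[None, 5, None, None, None, 6, None, None]`, which is the last row of the coarseness table. `weighted_percentile([50], [10, 20], [1, 3])` returns `[17.5]`.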

* Recent changes (master)
@ 2016-08-17 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-17 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8aa89d70f44eb3fe9d9581fd9bcc3cebca22621b:

  Various cleanups (2016-08-15 23:36:11 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c22825bb537af1f84a18dcb4af6d8c6844f751ac:

  Fix backwards reads with --size smaller than the file size (2016-08-16 15:22:17 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Merge branch 'histogram-delta' of https://github.com/cronburg/fio into histogram
      histogram: style and list fixups
      Fix backwards reads with --size smaller than the file size

Karl Cronburg (1):
      Make histogram samples non-cumulative by tracking a linked-list of the most recent histogram and differencing it when we print to the log file(s). Linked list of pointers used to minimize runtime impact on recording side, instead choosing to do subtraction on the logging (when logs get printed to file) side.

 io_u.c                          | 18 ++++++++++---
 iolog.c                         | 44 +++++++++++++++++++++++++-------
 iolog.h                         |  1 +
 stat.c                          |  9 ++++---
 stat.h                          |  5 ++++
 tools/hist/fiologparser_hist.py | 56 ++++++++++-------------------------------
 6 files changed, 73 insertions(+), 60 deletions(-)
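
How the histogram-delta change works: fio still copies the full cumulative io_u_plat array every log_hist_msec. The per-direction list started in setup_log() now lets flush_hist_samples() subtract the previous copy before writing. Each logged row therefore holds only that interval's counts, and fiologparser_hist.py drops its sequential_diffs() step. Below is a rough Python model of that arithmetic, not fio code. `hist_sum` mirrors the C helper of the same name. `interval_rows` and its list-of-lists input are illustrative assumptions, and the bin count is assumed divisible by the stride.

```python
def hist_sum(j, stride, plat, plat_last):
    """Sum `stride` adjacent bins starting at j, net of the previous snapshot."""
    return sum(plat[j + k] - plat_last[j + k] for k in range(stride))


def interval_rows(snapshots, coarseness=0):
    """Turn cumulative snapshots (one data direction) into per-interval rows.

    The first snapshot is differenced against an all-zero histogram, like the
    zeroed entry setup_log() places on each hist_window list.
    """
    stride = 1 << coarseness
    previous = [0] * len(snapshots[0])
    rows = []
    for plat in snapshots:
        rows.append([hist_sum(j, stride, plat, previous)
                     for j in range(0, len(plat), stride)])
        previous = plat  # this snapshot becomes the baseline for the next one
    return rows
```

With coarseness 1, each output column is the sum of two adjacent bins, matching `stride = 1 << hist_coarseness` in flush_hist_samples().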

---

Diff of recent changes:

diff --git a/io_u.c b/io_u.c
index 2270127..dcf7a40 100644
--- a/io_u.c
+++ b/io_u.c
@@ -362,8 +362,12 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 	if (f->last_pos[ddir] < f->real_file_size) {
 		uint64_t pos;
 
-		if (f->last_pos[ddir] == f->file_offset && o->ddir_seq_add < 0)
-			f->last_pos[ddir] = f->real_file_size;
+		if (f->last_pos[ddir] == f->file_offset && o->ddir_seq_add < 0) {
+			if (f->real_file_size > f->io_size)
+				f->last_pos[ddir] = f->io_size;
+			else
+				f->last_pos[ddir] = f->real_file_size;
+		}
 
 		pos = f->last_pos[ddir] - f->file_offset;
 		if (pos && o->ddir_seq_add) {
@@ -378,8 +382,14 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 			if (pos >= f->real_file_size) {
 				if (o->ddir_seq_add > 0)
 					pos = f->file_offset;
-				else
-					pos = f->real_file_size + o->ddir_seq_add;
+				else {
+					if (f->real_file_size > f->io_size)
+						pos = f->io_size;
+					else
+						pos = f->real_file_size;
+
+					pos += o->ddir_seq_add;
+				}
 			}
 		}
 
diff --git a/iolog.c b/iolog.c
index b0c948b..d4213db 100644
--- a/iolog.c
+++ b/iolog.c
@@ -576,6 +576,9 @@ void setup_log(struct io_log **log, struct log_params *p,
 	       const char *filename)
 {
 	struct io_log *l;
+	int i;
+	struct io_u_plat_entry *entry;
+	struct flist_head *list;
 
 	l = scalloc(1, sizeof(*l));
 	INIT_FLIST_HEAD(&l->io_logs);
@@ -589,6 +592,16 @@ void setup_log(struct io_log **log, struct log_params *p,
 	l->filename = strdup(filename);
 	l->td = p->td;
 
+	/* Initialize histogram lists for each r/w direction,
+	 * with initial io_u_plat of all zeros:
+	 */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		list = &l->hist_window[i].list;
+		INIT_FLIST_HEAD(list);
+		entry = calloc(1, sizeof(struct io_u_plat_entry));
+		flist_add(&entry->list, list);
+	}
+
 	if (l->td && l->td->o.io_submit_mode != IO_MODE_OFFLOAD) {
 		struct io_logs *p;
 
@@ -661,13 +674,14 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
-static inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat)
+static inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat,
+		unsigned int *io_u_plat_last)
 {
 	unsigned long sum;
 	int k;
 
 	for (k = sum = 0; k < stride; k++)
-		sum += io_u_plat[j + k];
+		sum += io_u_plat[j + k] - io_u_plat_last[j + k];
 
 	return sum;
 }
@@ -678,7 +692,9 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 	struct io_sample *s;
 	int log_offset;
 	uint64_t i, j, nr_samples;
+	struct io_u_plat_entry *entry, *entry_before;
 	unsigned int *io_u_plat;
+	unsigned int *io_u_plat_before;
 
 	int stride = 1 << hist_coarseness;
 	
@@ -692,15 +708,25 @@ static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 
 	for (i = 0; i < nr_samples; i++) {
 		s = __get_sample(samples, log_offset, i);
-		io_u_plat = (unsigned int *) (uintptr_t) s->val;
-		fprintf(f, "%lu, %u, %u, ", (unsigned long)s->time,
-		        io_sample_ddir(s), s->bs);
+
+		entry = (struct io_u_plat_entry *) s->val;
+		io_u_plat = entry->io_u_plat;
+
+		entry_before = flist_first_entry(&entry->list, struct io_u_plat_entry, list);
+		io_u_plat_before = entry_before->io_u_plat;
+
+		fprintf(f, "%lu, %u, %u, ", (unsigned long) s->time,
+						io_sample_ddir(s), s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
-			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat));
+			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat,
+						io_u_plat_before));
 		}
-		fprintf(f, "%lu\n", (unsigned long) 
-		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat));
-		free(io_u_plat);
+		fprintf(f, "%lu\n", (unsigned long)
+		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat,
+					io_u_plat_before));
+
+		flist_del(&entry_before->list);
+		free(entry_before);
 	}
 }
 
diff --git a/iolog.h b/iolog.h
index 93e970e..ca344f1 100644
--- a/iolog.h
+++ b/iolog.h
@@ -21,6 +21,7 @@ struct io_stat {
 struct io_hist {
 	uint64_t samples;
 	unsigned long hist_last;
+	struct flist_head list;
 };
 
 /*
diff --git a/stat.c b/stat.c
index 6f5f002..5e7c593 100644
--- a/stat.c
+++ b/stat.c
@@ -2221,7 +2221,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		
 		if (this_window >= iolog->hist_msec) {
 			unsigned int *io_u_plat;
-			unsigned int *dst;
+			struct io_u_plat_entry *dst;
 
 			/*
 			 * Make a byte-for-byte copy of the latency histogram
@@ -2231,10 +2231,11 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 			 * log file.
 			 */
 			io_u_plat = (unsigned int *) td->ts.io_u_plat[ddir];
-			dst = malloc(FIO_IO_U_PLAT_NR * sizeof(unsigned int));
-			memcpy(dst, io_u_plat,
+			dst = malloc(sizeof(struct io_u_plat_entry));
+			memcpy(&(dst->io_u_plat), io_u_plat,
 				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
-			__add_log_sample(iolog, (unsigned long )dst, ddir, bs,
+			flist_add(&dst->list, &hw->list);
+			__add_log_sample(iolog, (unsigned long)dst, ddir, bs,
 						elapsed, offset);
 
 			/*
diff --git a/stat.h b/stat.h
index c3e343d..e6f7759 100644
--- a/stat.h
+++ b/stat.h
@@ -240,6 +240,11 @@ struct jobs_eta {
 	uint8_t run_str[];
 } __attribute__((packed));
 
+struct io_u_plat_entry {
+	struct flist_head list;
+	unsigned int io_u_plat[FIO_IO_U_PLAT_NR];
+};
+
 extern struct fio_mutex *stat_mutex;
 
 extern struct jobs_eta *get_jobs_eta(bool force, size_t *size);
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
index ce98d2e..5891427 100755
--- a/tools/hist/fiologparser_hist.py
+++ b/tools/hist/fiologparser_hist.py
@@ -46,9 +46,7 @@
       to get weighted histograms.
     
     * We convert files given on the command line, assumed to be fio histogram files,
-      on-the-fly into their corresponding differenced files i.e. non-cumulative histograms
-      because fio outputs cumulative histograms, but we want histograms corresponding
-      to individual time intervals. An individual histogram file can contain the cumulative
+      An individual histogram file can contain the
       histograms for multiple different r/w directions (notably when --rw=randrw). This
       is accounted for by tracking each r/w direction separately. In the statistics
       reported we ultimately merge *all* histograms (regardless of r/w direction).
@@ -188,23 +186,8 @@ __HIST_COLUMNS = 1216
 __NON_HIST_COLUMNS = 3
 __TOTAL_COLUMNS = __HIST_COLUMNS + __NON_HIST_COLUMNS
     
-def sequential_diffs(head_row, times, rws, hists):
-    """ Take the difference of sequential (in time) histograms with the same
-        r/w direction, returning a new array of differenced histograms.  """
-    result = np.empty(shape=(0, __HIST_COLUMNS))
-    result_times = np.empty(shape=(1, 0))
-    for i in range(8):
-        idx = np.where(rws == i)
-        diff = np.diff(np.append(head_row[i], hists[idx], axis=0), axis=0).astype(int)
-        result = np.append(diff, result, axis=0)
-        result_times = np.append(times[idx], result_times)
-    idx = np.argsort(result_times)
-    return result[idx]
-
-def read_chunk(head_row, rdr, sz):
-    """ Read the next chunk of size sz from the given reader, computing the
-        differences across neighboring histogram samples.
-    """
+def read_chunk(rdr, sz):
+    """ Read the next chunk of size sz from the given reader. """
     try:
         """ StopIteration occurs when the pandas reader is empty, and AttributeError
             occurs if rdr is None due to the file being empty. """
@@ -212,32 +195,20 @@ def read_chunk(head_row, rdr, sz):
     except (StopIteration, AttributeError):
         return None    
 
-    """ Extract array of just the times, and histograms matrix without times column.
-        Then, take the sequential difference of each of the rows in the histogram
-        matrix. This is necessary because fio outputs *cumulative* histograms as
-        opposed to histograms with counts just for a particular interval. """
+    """ Extract array of just the times, and histograms matrix without times column. """
     times, rws, szs = new_arr[:,0], new_arr[:,1], new_arr[:,2]
     hists = new_arr[:,__NON_HIST_COLUMNS:]
-    hists_diff   = sequential_diffs(head_row, times, rws, hists)
     times = times.reshape((len(times),1))
-    arr = np.append(times, hists_diff, axis=1)
+    arr = np.append(times, hists, axis=1)
 
-    """ hists[-1] will be the row we need to start our differencing with the
-        next time we call read_chunk() on the same rdr """
-    return arr, hists[-1]
+    return arr
 
 def get_min(fps, arrs):
     """ Find the file with the current first row with the smallest start time """
-    return min([fp for fp in fps if not arrs[fp] is None], key=lambda fp: arrs.get(fp)[0][0][0])
+    return min([fp for fp in fps if not arrs[fp] is None], key=lambda fp: arrs.get(fp)[0][0])
 
 def histogram_generator(ctx, fps, sz):
     
-    """ head_row for a particular file keeps track of the last (cumulative)
-        histogram we read so that we have a reference point to subtract off
-        when computing sequential differences. """
-    head_row  = np.zeros(shape=(1, __HIST_COLUMNS))
-    head_rows = {fp: {i: head_row for i in range(8)} for fp in fps}
-
     # Create a chunked pandas reader for each of the files:
     rdrs = {}
     for fp in fps:
@@ -250,8 +221,8 @@ def histogram_generator(ctx, fps, sz):
             else:
                 raise(e)
 
-    # Initial histograms and corresponding head_rows:
-    arrs = {fp: read_chunk(head_rows[fp], rdr, sz) for fp,rdr in rdrs.items()}
+    # Initial histograms from disk:
+    arrs = {fp: read_chunk(rdr, sz) for fp,rdr in rdrs.items()}
     while True:
 
         try:
@@ -259,13 +230,12 @@ def histogram_generator(ctx, fps, sz):
             fp = get_min(fps, arrs)
         except ValueError:
             return
-        arr, head_row = arrs[fp]
+        arr = arrs[fp]
         yield np.insert(arr[0], 1, fps.index(fp))
-        arrs[fp] = arr[1:], head_row
-        head_rows[fp] = head_row
+        arrs[fp] = arr[1:]
 
-        if arrs[fp][0].shape[0] == 0:
-            arrs[fp] = read_chunk(head_rows[fp], rdrs[fp], sz)
+        if arrs[fp].shape[0] == 0:
+            arrs[fp] = read_chunk(rdrs[fp], sz)
 
 def _plat_idx_to_val(idx, edge=0.5, FIO_IO_U_PLAT_BITS=6, FIO_IO_U_PLAT_VAL=64):
     """ Taken from fio's stat.c for calculating the latency value of a bin

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
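
The io_u.c hunks in the message above ("Fix backwards reads with --size smaller than the file size") change where a backwards sequential pass starts and wraps. Before the fix it always used real_file_size. After the fix it uses io_size when that is smaller. Here is a hypothetical Python model of only the branch the patch touches (last_pos < real_file_size). It assumes the line elided between the two hunks is `pos += o->ddir_seq_add;`, and it emulates C uint64_t wrap-around with a modulus.

```python
U64 = 1 << 64  # emulate C uint64_t arithmetic


def backward_end(real_file_size, io_size):
    """Restart point for backwards I/O after the fix: the smaller size."""
    return io_size if real_file_size > io_size else real_file_size


def next_seq_offset(last_pos, file_offset, real_file_size, io_size, seq_add):
    """Hypothetical model of the patched get_next_seq_offset() branch."""
    if last_pos == file_offset and seq_add < 0:
        last_pos = backward_end(real_file_size, io_size)
    pos = last_pos - file_offset
    if pos and seq_add:
        pos = (pos + seq_add) % U64  # a negative result wraps to a huge value
        if pos >= real_file_size:
            if seq_add > 0:
                pos = file_offset    # forward holed I/O wraps to the start
            else:
                pos = backward_end(real_file_size, io_size) + seq_add
    return pos
```

Example: with a 1 MiB file, a 256 KiB io_size and a -8 KiB seq_add, the first backwards offset is 253952 (256 KiB - 8 KiB) rather than 1040384 (1 MiB - 8 KiB), so it stays inside the --size region.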

* Recent changes (master)
@ 2016-08-16 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-16 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 1651e4310feb3eab7c7c8cf0bd23d159cb410628:

  Only enable atomic io_u flag setting/clearing if we need it (2016-08-14 21:31:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8aa89d70f44eb3fe9d9581fd9bcc3cebca22621b:

  Various cleanups (2016-08-15 23:36:11 -0600)

----------------------------------------------------------------
Jens Axboe (11):
      options: remove dependency of 'o' being first in 'td'
      fio: move thread_options
      fio: inherit IO engine flags to 'td'
      options: pass in right pointer to options free
      Option updates
      Fixup correct sparse warnings
      parse: get rid of __td_var()
      parse: remove dead code
      gfio: fix link error
      gfio: fix auto-start of backend
      Various cleanups

 HOWTO                     |   2 +-
 backend.c                 |  12 +-
 client.c                  |   2 +-
 client.h                  |   5 +-
 diskutil.c                |   2 +-
 engines/e4defrag.c        |   2 +
 engines/glusterfs_async.c |   2 +-
 engines/libhdfs.c         |   6 +-
 engines/mtd.c             |   4 +-
 engines/net.c             |   2 +
 eta.c                     |  14 +-
 file.h                    |   4 +-
 filesetup.c               |  10 +-
 fio.c                     |   2 +
 fio.h                     |  61 +++---
 gclient.c                 |   6 +-
 gfio.c                    |   2 +
 init.c                    |  24 +--
 io_u.c                    |   6 +-
 ioengines.c               |   8 +-
 iolog.c                   |   4 +-
 iolog.h                   |   9 +-
 lib/bloom.c               |   2 +-
 lib/mountcheck.c          |   2 +
 lib/strntol.c             |   2 +
 memory.c                  |   4 +-
 options.c                 | 487 +++++++++++++++++++++++-----------------------
 options.h                 |   5 +-
 oslib/libmtd.c            |   2 +-
 oslib/linux-dev-lookup.c  |   1 +
 oslib/strlcat.c           |   1 +
 parse.c                   |   2 +-
 parse.h                   |  14 +-
 rate-submit.c             |   3 -
 server.c                  |   4 +-
 stat.c                    |  14 +-
 stat.h                    |  12 +-
 verify.c                  |   9 +-
 38 files changed, 392 insertions(+), 361 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 5bf7125..c1b768d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1951,7 +1951,7 @@ be the starting port number since fio will use a range of ports.
 [mtd] skip_bad=bool	Skip operations against known bad blocks.
 
 [libhdfs] hdfsdirectory	libhdfs will create chunk in this HDFS directory
-[libhdfs] chunck_size	the size of the chunck to use for each file.
+[libhdfs] chunk_size	the size of the chunk to use for each file.
 
 
 6.0 Interpreting the output
diff --git a/backend.c b/backend.c
index c051c13..b43486d 100644
--- a/backend.c
+++ b/backend.c
@@ -1024,7 +1024,7 @@ reap:
 		if (ret < 0)
 			break;
 		if (!ddir_rw_sum(td->bytes_done) &&
-		    !(td->io_ops->flags & FIO_NOIO))
+		    !td_ioengine_flagged(td, FIO_NOIO))
 			continue;
 
 		if (!in_ramp_time(td) && should_check_rate(td)) {
@@ -1175,7 +1175,7 @@ static int init_io_u(struct thread_data *td)
 	td->orig_buffer_size = (unsigned long long) max_bs
 					* (unsigned long long) max_units;
 
-	if ((td->io_ops->flags & FIO_NOIO) || !(td_read(td) || td_write(td)))
+	if (td_ioengine_flagged(td, FIO_NOIO) || !(td_read(td) || td_write(td)))
 		data_xfer = 0;
 
 	err = 0;
@@ -1195,7 +1195,7 @@ static int init_io_u(struct thread_data *td)
 	 * lucky and the allocator gives us an aligned address.
 	 */
 	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
-	    (td->io_ops->flags & FIO_RAWIO))
+	    td_ioengine_flagged(td, FIO_RAWIO))
 		td->orig_buffer_size += page_mask + td->o.mem_align;
 
 	if (td->o.mem_type == MEM_SHMHUGE || td->o.mem_type == MEM_MMAPHUGE) {
@@ -1214,7 +1214,7 @@ static int init_io_u(struct thread_data *td)
 		return 1;
 
 	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
-	    (td->io_ops->flags & FIO_RAWIO))
+	    td_ioengine_flagged(td, FIO_RAWIO))
 		p = PAGE_ALIGN(td->orig_buffer) + td->o.mem_align;
 	else
 		p = td->orig_buffer;
@@ -1288,7 +1288,7 @@ static int switch_ioscheduler(struct thread_data *td)
 	FILE *f;
 	int ret;
 
-	if (td->io_ops->flags & FIO_DISKLESSIO)
+	if (td_ioengine_flagged(td, FIO_DISKLESSIO))
 		return 0;
 
 	sprintf(tmp, "%s/queue/scheduler", td->sysfs_root);
@@ -1748,7 +1748,7 @@ static void *thread_main(void *data)
 
 		if (!o->do_verify ||
 		    o->verify == VERIFY_NONE ||
-		    (td->io_ops->flags & FIO_UNIDIR))
+		    td_ioengine_flagged(td, FIO_UNIDIR))
 			continue;
 
 		clear_io_state(td, 0);
diff --git a/client.c b/client.c
index d502a4b..238c93f 100644
--- a/client.c
+++ b/client.c
@@ -557,7 +557,7 @@ int fio_client_terminate(struct fio_client *client)
 	return fio_net_send_quit(client->fd);
 }
 
-void fio_clients_terminate(void)
+static void fio_clients_terminate(void)
 {
 	struct flist_head *entry;
 	struct fio_client *client;
diff --git a/client.h b/client.h
index ddacf78..fc9c196 100644
--- a/client.h
+++ b/client.h
@@ -131,7 +131,6 @@ extern struct fio_client *fio_client_add_explicit(struct client_ops *, const cha
 extern void fio_client_add_cmd_option(void *, const char *);
 extern int fio_client_add_ini_file(void *, const char *, bool);
 extern int fio_client_terminate(struct fio_client *);
-extern void fio_clients_terminate(void);
 extern struct fio_client *fio_get_client(struct fio_client *);
 extern void fio_put_client(struct fio_client *);
 extern int fio_client_update_options(struct fio_client *, struct thread_options *, uint64_t *);
@@ -145,5 +144,9 @@ enum {
 	FIO_CLIENT_TYPE_GUI		= 2,
 };
 
+extern int sum_stat_clients;
+extern struct thread_stat client_ts;
+extern struct group_run_stats client_gs;
+
 #endif
 
diff --git a/diskutil.c b/diskutil.c
index a1077d4..0f7a642 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -491,7 +491,7 @@ void init_disk_util(struct thread_data *td)
 	unsigned int i;
 
 	if (!td->o.do_disk_util ||
-	    (td->io_ops->flags & (FIO_DISKLESSIO | FIO_NODISKUTIL)))
+	    td_ioengine_flagged(td, FIO_DISKLESSIO | FIO_NODISKUTIL))
 		return;
 
 	for_each_file(td, f, i)
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index c599c98..e53636e 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -45,6 +45,7 @@ struct e4defrag_options {
 static struct fio_option options[] = {
 	{
 		.name	= "donorname",
+		.lname	= "Donor Name",
 		.type	= FIO_OPT_STR_STORE,
 		.off1	= offsetof(struct e4defrag_options, donor_name),
 		.help	= "File used as a block donor",
@@ -53,6 +54,7 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "inplace",
+		.lname	= "In Place",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct e4defrag_options, inplace),
 		.minval	= 0,
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 8e42a84..f46cb26 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -137,7 +137,7 @@ failed:
 	return FIO_Q_COMPLETED;
 }
 
-int fio_gf_async_setup(struct thread_data *td)
+static int fio_gf_async_setup(struct thread_data *td)
 {
 	struct gf_data *g;
 	int r;
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index fba17c4..96a0871 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -80,7 +80,9 @@ static struct fio_option options[] = {
 		.group	= FIO_OPT_G_HDFS,
 	},
 	{
-		.name	= "chunck_size",
+		.name	= "chunk_size",
+		.alias	= "chunck_size",
+		.lname	= "Chunk size",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct hdfsio_options, chunck_size),
 		.def    = "1048576",
@@ -90,6 +92,7 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "single_instance",
+		.lname	= "Single Instance",
 		.type	= FIO_OPT_BOOL,
 		.off1	= offsetof(struct hdfsio_options, single_instance),
 		.def    = "1",
@@ -99,6 +102,7 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "hdfs_use_direct",
+		.lname	= "HDFS Use Direct",
 		.type	= FIO_OPT_BOOL,
 		.off1	= offsetof(struct hdfsio_options, use_direct),
 		.def    = "0",
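The libhdfs hunk above renames the misspelled option "chunck_size" to "chunk_size". It keeps the old spelling as `.alias` so existing job files still parse. The sketch below shows one way a parser can honour such an alias. It assumes a hypothetical, cut-down `struct opt_desc` and a `find_opt()` helper. It is not fio's parse.c, only an illustration of matching the canonical name first and the legacy alias second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down option descriptor (not fio's struct fio_option). */
struct opt_desc {
	const char *name;	/* canonical spelling */
	const char *alias;	/* optional legacy spelling, may be NULL */
	const char *lname;	/* human-readable label, e.g. for a GUI */
};

static const struct opt_desc hdfs_opts[] = {
	{ .name = "chunk_size", .alias = "chunck_size", .lname = "Chunk size" },
	{ .name = "single_instance", .lname = "Single Instance" },
	{ .name = NULL },	/* sentinel */
};

/*
 * Match a user-supplied key against each option's canonical name, then
 * against its alias, so a renamed option keeps accepting the old name.
 */
static const struct opt_desc *find_opt(const struct opt_desc *opts,
				       const char *key)
{
	assert(key != NULL);
	for (; opts->name; opts++) {
		if (!strcmp(opts->name, key))
			return opts;
		if (opts->alias && !strcmp(opts->alias, key))
			return opts;
	}
	return NULL;
}
```

Both spellings resolve to the same descriptor, so the value still lands in the same field. This matches the hunk, which keeps `.off1` pointing at the unchanged `chunck_size` struct member.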
diff --git a/engines/mtd.c b/engines/mtd.c
index 7b92c83..3c22a1b 100644
--- a/engines/mtd.c
+++ b/engines/mtd.c
@@ -16,7 +16,7 @@
 #include "../verify.h"
 #include "../oslib/libmtd.h"
 
-libmtd_t desc;
+static libmtd_t desc;
 
 struct fio_mtd_data {
 	struct mtd_dev_info info;
@@ -168,7 +168,7 @@ static int fio_mtd_close_file(struct thread_data *td, struct fio_file *f)
 	return generic_close_file(td, f);
 }
 
-int fio_mtd_get_file_size(struct thread_data *td, struct fio_file *f)
+static int fio_mtd_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	struct mtd_dev_info info;
 
diff --git a/engines/net.c b/engines/net.c
index f24efc1..5f1401c 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -135,6 +135,7 @@ static struct fio_option options[] = {
 #ifdef CONFIG_TCP_NODELAY
 	{
 		.name	= "nodelay",
+		.lname	= "No Delay",
 		.type	= FIO_OPT_BOOL,
 		.off1	= offsetof(struct netio_options, nodelay),
 		.help	= "Use TCP_NODELAY on TCP connections",
@@ -153,6 +154,7 @@ static struct fio_option options[] = {
 	},
 	{
 		.name	= "pingpong",
+		.lname	= "Ping Pong",
 		.type	= FIO_OPT_STR_SET,
 		.off1	= offsetof(struct netio_options, pingpong),
 		.help	= "Ping-pong IO requests",
diff --git a/eta.c b/eta.c
index ffab34e..3c1aeee 100644
--- a/eta.c
+++ b/eta.c
@@ -337,7 +337,7 @@ static void calc_iops(int unified_rw_rep, unsigned long mtime,
  * Print status of the jobs we know about. This includes rate estimates,
  * ETA, thread state, etc.
  */
-int calc_thread_status(struct jobs_eta *je, int force)
+bool calc_thread_status(struct jobs_eta *je, int force)
 {
 	struct thread_data *td;
 	int i, unified_rw_rep;
@@ -354,12 +354,12 @@ int calc_thread_status(struct jobs_eta *je, int force)
 	if (!force) {
 		if (!(output_format & FIO_OUTPUT_NORMAL) &&
 		    f_out == stdout)
-			return 0;
+			return false;
 		if (temp_stall_ts || eta_print == FIO_ETA_NEVER)
-			return 0;
+			return false;
 
 		if (!isatty(STDOUT_FILENO) && (eta_print != FIO_ETA_ALWAYS))
-			return 0;
+			return false;
 	}
 
 	if (!ddir_rw_sum(rate_io_bytes))
@@ -479,7 +479,7 @@ int calc_thread_status(struct jobs_eta *je, int force)
 	 * Allow a little slack, the target is to print it every 1000 msecs
 	 */
 	if (!force && disp_time < 900)
-		return 0;
+		return false;
 
 	calc_rate(unified_rw_rep, disp_time, io_bytes, disp_io_bytes, je->rate);
 	calc_iops(unified_rw_rep, disp_time, io_iops, disp_io_iops, je->iops);
@@ -487,12 +487,12 @@ int calc_thread_status(struct jobs_eta *je, int force)
 	memcpy(&disp_prev_time, &now, sizeof(now));
 
 	if (!force && !je->nr_running && !je->nr_pending)
-		return 0;
+		return false;
 
 	je->nr_threads = thread_number;
 	update_condensed_str(__run_str, run_str);
 	memcpy(je->run_str, run_str, strlen(run_str));
-	return 1;
+	return true;
 }
 
 void display_thread_status(struct jobs_eta *je)
diff --git a/file.h b/file.h
index 0cf622f..f7e5d20 100644
--- a/file.h
+++ b/file.h
@@ -209,7 +209,7 @@ extern void dup_files(struct thread_data *, struct thread_data *);
 extern int get_fileno(struct thread_data *, const char *);
 extern void free_release_files(struct thread_data *);
 extern void filesetup_mem_free(void);
-void fio_file_reset(struct thread_data *, struct fio_file *);
-int fio_files_done(struct thread_data *);
+extern void fio_file_reset(struct thread_data *, struct fio_file *);
+extern int fio_files_done(struct thread_data *);
 
 #endif
diff --git a/filesetup.c b/filesetup.c
index 42a9f41..5db44c2 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -52,7 +52,7 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 	 */
 	if (td_read(td) ||
 	   (td_write(td) && td->o.overwrite && !td->o.file_append) ||
-	    (td_write(td) && td->io_ops->flags & FIO_NOEXTEND))
+	    (td_write(td) && td_ioengine_flagged(td, FIO_NOEXTEND)))
 		new_layout = 1;
 	if (td_write(td) && !td->o.overwrite && !td->o.file_append)
 		unlink_file = 1;
@@ -217,7 +217,7 @@ static int pre_read_file(struct thread_data *td, struct fio_file *f)
 	unsigned int bs;
 	char *b;
 
-	if (td->io_ops->flags & FIO_PIPEIO)
+	if (td_ioengine_flagged(td, FIO_PIPEIO))
 		return 0;
 
 	if (!fio_file_open(f)) {
@@ -827,7 +827,7 @@ int setup_files(struct thread_data *td)
 	 * device/file sizes are zero and no size given, punt
 	 */
 	if ((!total_size || total_size == -1ULL) && !o->size &&
-	    !(td->io_ops->flags & FIO_NOIO) && !o->fill_device &&
+	    !td_ioengine_flagged(td, FIO_NOIO) && !o->fill_device &&
 	    !(o->nr_files && (o->file_size_low || o->file_size_high))) {
 		log_err("%s: you need to specify size=\n", o->name);
 		td_verror(td, EINVAL, "total_file_size");
@@ -903,7 +903,7 @@ int setup_files(struct thread_data *td)
 
 		if (f->filetype == FIO_TYPE_FILE &&
 		    (f->io_size + f->file_offset) > f->real_file_size &&
-		    !(td->io_ops->flags & FIO_DISKLESSIO)) {
+		    !td_ioengine_flagged(td, FIO_DISKLESSIO)) {
 			if (!o->create_on_open) {
 				need_extend++;
 				extend_size += (f->io_size + f->file_offset);
@@ -1374,7 +1374,7 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	/*
 	 * init function, io engine may not be loaded yet
 	 */
-	if (td->io_ops && (td->io_ops->flags & FIO_DISKLESSIO))
+	if (td->io_ops && td_ioengine_flagged(td, FIO_DISKLESSIO))
 		f->real_file_size = -1ULL;
 
 	f->file_name = smalloc_strdup(file_name);
diff --git a/fio.c b/fio.c
index 69014dd..7b3a50b 100644
--- a/fio.c
+++ b/fio.c
@@ -32,6 +32,8 @@ int main(int argc, char *argv[], char *envp[])
 {
 	int ret = 1;
 
+	compiletime_assert(TD_NR <= TD_ENG_FLAG_SHIFT, "TD_ENG_FLAG_SHIFT");
+
 	if (initialize_fio(envp))
 		return 1;
 
diff --git a/fio.h b/fio.h
index 7f685ea..0da0bc5 100644
--- a/fio.h
+++ b/fio.h
@@ -126,11 +126,10 @@ struct zone_split_index {
  * This describes a single thread/process executing a fio job.
  */
 struct thread_data {
-	struct thread_options o;
 	struct flist_head opt_list;
 	unsigned long flags;
+	struct thread_options o;
 	void *eo;
-	char verror[FIO_VERROR_SIZE];
 	pthread_t thread;
 	unsigned int thread_number;
 	unsigned int subjob_number;
@@ -394,6 +393,8 @@ struct thread_data {
 	void *prof_data;
 
 	void *pinned_mem;
+
+	char verror[FIO_VERROR_SIZE];
 };
 
 /*
@@ -450,7 +451,6 @@ extern int read_only;
 extern int eta_print;
 extern int eta_new_line;
 extern unsigned long done_secs;
-extern char *job_section;
 extern int fio_gtod_offload;
 extern int fio_gtod_cpu;
 extern enum fio_cs fio_clock_source;
@@ -513,7 +513,7 @@ extern void td_fill_verify_state_seed(struct thread_data *);
 extern void add_job_opts(const char **, int);
 extern char *num2str(uint64_t, int, int, int, int);
 extern int ioengine_load(struct thread_data *);
-extern int parse_dryrun(void);
+extern bool parse_dryrun(void);
 extern int fio_running_or_pending_io_threads(void);
 extern int fio_set_fd_nonblocking(int, const char *);
 extern void sig_show_status(int sig);
@@ -555,8 +555,27 @@ enum {
 	TD_EXITED,
 	TD_REAPED,
 	TD_LAST,
+	TD_NR,
 };
 
+#define TD_ENG_FLAG_SHIFT	16
+#define TD_ENG_FLAG_MASK	((1U << 16) - 1)
+
+static inline enum fio_ioengine_flags td_ioengine_flags(struct thread_data *td)
+{
+	return (td->flags >> TD_ENG_FLAG_SHIFT) & TD_ENG_FLAG_MASK;
+}
+
+static inline void td_set_ioengine_flags(struct thread_data *td)
+{
+	td->flags |= (td->io_ops->flags << TD_ENG_FLAG_SHIFT);
+}
+
+static inline bool td_ioengine_flagged(struct thread_data *td, unsigned int val)
+{
+	return ((td->flags >> TD_ENG_FLAG_SHIFT) & val) != 0;
+}
+
 extern void td_set_runstate(struct thread_data *, int);
 extern int td_bump_runstate(struct thread_data *, int);
 extern void td_restore_runstate(struct thread_data *, int);
@@ -623,17 +642,17 @@ extern void lat_target_reset(struct thread_data *);
 	}	\
 } while (0)
 
-static inline int fio_fill_issue_time(struct thread_data *td)
+static inline bool fio_fill_issue_time(struct thread_data *td)
 {
 	if (td->o.read_iolog_file ||
 	    !td->o.disable_clat || !td->o.disable_slat || !td->o.disable_bw)
-		return 1;
+		return true;
 
-	return 0;
+	return false;
 }
 
-static inline int __should_check_rate(struct thread_data *td,
-				      enum fio_ddir ddir)
+static inline bool __should_check_rate(struct thread_data *td,
+				       enum fio_ddir ddir)
 {
 	struct thread_options *o = &td->o;
 
@@ -642,23 +661,21 @@ static inline int __should_check_rate(struct thread_data *td,
 	 */
 	if (o->rate[ddir] || o->ratemin[ddir] || o->rate_iops[ddir] ||
 	    o->rate_iops_min[ddir])
-		return 1;
+		return true;
 
-	return 0;
+	return false;
 }
 
-static inline int should_check_rate(struct thread_data *td)
+static inline bool should_check_rate(struct thread_data *td)
 {
-	int ret = 0;
-
-	if (td->bytes_done[DDIR_READ])
-		ret |= __should_check_rate(td, DDIR_READ);
-	if (td->bytes_done[DDIR_WRITE])
-		ret |= __should_check_rate(td, DDIR_WRITE);
-	if (td->bytes_done[DDIR_TRIM])
-		ret |= __should_check_rate(td, DDIR_TRIM);
-
-	return ret;
+	if (td->bytes_done[DDIR_READ] && __should_check_rate(td, DDIR_READ))
+		return true;
+	if (td->bytes_done[DDIR_WRITE] && __should_check_rate(td, DDIR_WRITE))
+		return true;
+	if (td->bytes_done[DDIR_TRIM] && __should_check_rate(td, DDIR_TRIM))
+		return true;
+
+	return false;
 }
 
 static inline unsigned int td_max_bs(struct thread_data *td)
diff --git a/gclient.c b/gclient.c
index 9c32474..23b0899 100644
--- a/gclient.c
+++ b/gclient.c
@@ -280,10 +280,6 @@ static void gfio_disk_util_op(struct fio_client *client, struct fio_net_cmd *cmd
 	gdk_threads_leave();
 }
 
-extern int sum_stat_clients;
-extern struct thread_stat client_ts;
-extern struct group_run_stats client_gs;
-
 static int sum_stat_nr;
 
 static void gfio_thread_status_op(struct fio_client *client,
@@ -1012,7 +1008,7 @@ static void gfio_show_lat(GtkWidget *vbox, const char *name, unsigned long min,
 	char *minp, *maxp;
 	char tmp[64];
 
-	if (!usec_to_msec(&min, &max, &mean, &dev))
+	if (usec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
 
 	minp = num2str(min, 6, 1, 0, 0);
diff --git a/gfio.c b/gfio.c
index 37c1818..ce18091 100644
--- a/gfio.c
+++ b/gfio.c
@@ -459,10 +459,12 @@ static int send_job_file(struct gui_entry *ge)
 
 static void *server_thread(void *arg)
 {
+	fio_server_create_sk_key();
 	is_backend = 1;
 	gfio_server_running = 1;
 	fio_start_server(NULL);
 	gfio_server_running = 0;
+	fio_server_destroy_sk_key();
 	return NULL;
 }
 
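In the gfio.c hunk above, `server_thread()` now brackets `fio_start_server()` with `fio_server_create_sk_key()` and `fio_server_destroy_sk_key()`. The sketch below assumes, as the names suggest, that these wrap a POSIX thread-specific data key that the server code later reads. It shows the create, use and delete lifecycle inside a thread entry point. `struct sk_ctx`, `ctx_key` and `run_server_loop()` are hypothetical placeholders, not fio code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread context stored behind the key. */
struct sk_ctx { int served; };

static pthread_key_t ctx_key;

/* Returns non-zero on failure, like the create helper in the hunk. */
static int ctx_key_create(void)
{
	return pthread_key_create(&ctx_key, NULL) != 0;
}

static void ctx_key_delete(void)
{
	pthread_key_delete(ctx_key);
}

/* Placeholder for the server loop: it relies on the key already existing. */
static void run_server_loop(void)
{
	struct sk_ctx *ctx = pthread_getspecific(ctx_key);

	assert(ctx != NULL);
	ctx->served++;
}

/* Same shape as gfio's server_thread(): set up the key, run, tear down. */
static void *server_thread(void *arg)
{
	void *ret = NULL;

	if (ctx_key_create())
		return NULL;
	if (pthread_setspecific(ctx_key, arg) == 0) {
		run_server_loop();
		ret = arg;
	}
	ctx_key_delete();
	return ret;
}
```

The point of the ordering is that any `pthread_getspecific()` / `pthread_setspecific()` call on an uncreated key is undefined. A code path that starts the server from its own thread therefore has to create the key there first, as the gfio change does.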
diff --git a/init.c b/init.c
index fb07daa..5ff7385 100644
--- a/init.c
+++ b/init.c
@@ -47,7 +47,6 @@ static char **job_sections;
 static int nr_job_sections;
 
 int exitall_on_terminate = 0;
-int exitall_on_terminate_error = 0;
 int output_format = FIO_OUTPUT_NORMAL;
 int eta_print = FIO_ETA_AUTO;
 int eta_new_line = 0;
@@ -677,7 +676,7 @@ static int fixup_options(struct thread_data *td)
 			"verify limited\n");
 		ret = warnings_fatal;
 	}
-	if (o->bs_unaligned && (o->odirect || td->io_ops->flags & FIO_RAWIO))
+	if (o->bs_unaligned && (o->odirect || td_ioengine_flagged(td, FIO_RAWIO)))
 		log_err("fio: bs_unaligned may not work with raw io\n");
 
 	/*
@@ -764,7 +763,7 @@ static int fixup_options(struct thread_data *td)
 
 	if (o->pre_read) {
 		o->invalidate_cache = 0;
-		if (td->io_ops->flags & FIO_PIPEIO) {
+		if (td_ioengine_flagged(td, FIO_PIPEIO)) {
 			log_info("fio: cannot pre-read files with an IO engine"
 				 " that isn't seekable. Pre-read disabled.\n");
 			ret = warnings_fatal;
@@ -772,7 +771,7 @@ static int fixup_options(struct thread_data *td)
 	}
 
 	if (!o->unit_base) {
-		if (td->io_ops->flags & FIO_BIT_BASED)
+		if (td_ioengine_flagged(td, FIO_BIT_BASED))
 			o->unit_base = 1;
 		else
 			o->unit_base = 8;
@@ -795,7 +794,7 @@ static int fixup_options(struct thread_data *td)
 	 * Windows doesn't support O_DIRECT or O_SYNC with the _open interface,
 	 * so fail if we're passed those flags
 	 */
-	if ((td->io_ops->flags & FIO_SYNCIO) && (td->o.odirect || td->o.sync_io)) {
+	if (td_ioengine_flagged(td, FIO_SYNCIO) && (td->o.odirect || td->o.sync_io)) {
 		log_err("fio: Windows does not support direct or non-buffered io with"
 				" the synchronous ioengines. Use the 'windowsaio' ioengine"
 				" with 'direct=1' and 'iodepth=1' instead.\n");
@@ -844,7 +843,7 @@ static int fixup_options(struct thread_data *td)
 	if (fio_option_is_set(&td->o, rand_seed))
 		td->o.rand_repeatable = 0;
 
-	if ((td->io_ops->flags & FIO_NOEXTEND) && td->o.file_append) {
+	if (td_ioengine_flagged(td, FIO_NOEXTEND) && td->o.file_append) {
 		log_err("fio: can't append/extent with IO engine %s\n", td->io_ops->name);
 		ret = 1;
 	}
@@ -1069,6 +1068,10 @@ int ioengine_load(struct thread_data *td)
 		*(struct thread_data **)td->eo = td;
 	}
 
+	if (td->o.odirect)
+		td->io_ops->flags |= FIO_RAWIO;
+
+	td_set_ioengine_flags(td);
 	return 0;
 }
 
@@ -1244,7 +1247,7 @@ static char *make_filename(char *buf, size_t buf_size,struct thread_options *o,
 	return buf;
 }
 
-int parse_dryrun(void)
+bool parse_dryrun(void)
 {
 	return dump_cmdline || parse_only;
 }
@@ -1340,9 +1343,6 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (ioengine_load(td))
 		goto err;
 
-	if (o->odirect)
-		td->io_ops->flags |= FIO_RAWIO;
-
 	file_alloced = 0;
 	if (!o->filename && !td->files_index && !o->read_iolog_file) {
 		file_alloced = 1;
@@ -1373,7 +1373,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (td->eo)
 		*(struct thread_data **)td->eo = NULL;
 
-	if (td->io_ops->flags & FIO_DISKLESSIO) {
+	if (td_ioengine_flagged(td, FIO_DISKLESSIO)) {
 		struct fio_file *f;
 
 		for_each_file(td, f, i)
@@ -1537,7 +1537,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			if (is_backend && !recursed)
 				fio_server_send_add_job(td);
 
-			if (!(td->io_ops->flags & FIO_NOIO)) {
+			if (!td_ioengine_flagged(td, FIO_NOIO)) {
 				char *c1, *c2, *c3, *c4;
 				char *c5 = NULL, *c6 = NULL;
 
diff --git a/io_u.c b/io_u.c
index 34acc56..2270127 100644
--- a/io_u.c
+++ b/io_u.c
@@ -768,7 +768,7 @@ static void set_rw_ddir(struct thread_data *td, struct io_u *io_u)
 
 	io_u->ddir = io_u->acct_ddir = ddir;
 
-	if (io_u->ddir == DDIR_WRITE && (td->io_ops->flags & FIO_BARRIER) &&
+	if (io_u->ddir == DDIR_WRITE && td_ioengine_flagged(td, FIO_BARRIER) &&
 	    td->o.barrier_blocks &&
 	   !(td->io_issues[DDIR_WRITE] % td->o.barrier_blocks) &&
 	     td->io_issues[DDIR_WRITE])
@@ -843,7 +843,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 {
 	unsigned int is_random;
 
-	if (td->io_ops->flags & FIO_NOIO)
+	if (td_ioengine_flagged(td, FIO_NOIO))
 		goto out;
 
 	set_rw_ddir(td, io_u);
@@ -1622,7 +1622,7 @@ struct io_u *get_io_u(struct thread_data *td)
 	assert(fio_file_open(f));
 
 	if (ddir_rw(io_u->ddir)) {
-		if (!io_u->buflen && !(td->io_ops->flags & FIO_NOIO)) {
+		if (!io_u->buflen && !td_ioengine_flagged(td, FIO_NOIO)) {
 			dprint(FD_IO, "get_io_u: zero buflen on %p\n", io_u);
 			goto err_put;
 		}
diff --git a/ioengines.c b/ioengines.c
index 1c7a93b..ae55f95 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -272,7 +272,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	io_u->error = 0;
 	io_u->resid = 0;
 
-	if (td->io_ops->flags & FIO_SYNCIO) {
+	if (td_ioengine_flagged(td, FIO_SYNCIO)) {
 		if (fio_fill_issue_time(td))
 			fio_gettime(&io_u->issue_time, NULL);
 
@@ -346,7 +346,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 		}
 	}
 
-	if ((td->io_ops->flags & FIO_SYNCIO) == 0) {
+	if (!td_ioengine_flagged(td, FIO_SYNCIO)) {
 		if (fio_fill_issue_time(td))
 			fio_gettime(&io_u->issue_time, NULL);
 
@@ -375,7 +375,7 @@ int td_io_init(struct thread_data *td)
 			td->error = ret;
 	}
 
-	if (!ret && (td->io_ops->flags & FIO_NOIO))
+	if (!ret && td_ioengine_flagged(td, FIO_NOIO))
 		td->flags |= TD_F_NOIO;
 
 	return ret;
@@ -441,7 +441,7 @@ int td_io_open_file(struct thread_data *td, struct fio_file *f)
 		}
 	}
 
-	if (td->io_ops->flags & FIO_DISKLESSIO)
+	if (td_ioengine_flagged(td, FIO_DISKLESSIO))
 		goto done;
 
 	if (td->o.invalidate_cache && file_invalidate_cache(td, f))
diff --git a/iolog.c b/iolog.c
index 975ce6f..b0c948b 100644
--- a/iolog.c
+++ b/iolog.c
@@ -672,8 +672,8 @@ static inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat)
 	return sum;
 }
 
-void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
-			uint64_t sample_size)
+static void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
+			       uint64_t sample_size)
 {
 	struct io_sample *s;
 	int log_offset;
diff --git a/iolog.h b/iolog.h
index 011179a..93e970e 100644
--- a/iolog.h
+++ b/iolog.h
@@ -109,10 +109,11 @@ struct io_log {
 	unsigned long avg_msec;
 	unsigned long avg_last;
 
-  /*
-   * Windowed latency histograms, for keeping track of when we need to
-   * save a copy of the histogram every approximately hist_msec milliseconds.
-   */
+	/*
+	 * Windowed latency histograms, for keeping track of when we need to
+	 * save a copy of the histogram every approximately hist_msec
+	 * milliseconds.
+	 */
 	struct io_hist hist_window[DDIR_RWDIR_CNT];
 	unsigned long hist_msec;
 	int hist_coarseness;
diff --git a/lib/bloom.c b/lib/bloom.c
index ee4ba0b..f4eff57 100644
--- a/lib/bloom.c
+++ b/lib/bloom.c
@@ -35,7 +35,7 @@ static uint32_t bloom_fnv(const void *buf, uint32_t len, uint32_t seed)
 
 #define BLOOM_SEED	0x8989
 
-struct bloom_hash hashes[] = {
+static struct bloom_hash hashes[] = {
 	{
 		.seed = BLOOM_SEED,
 		.fn = jhash,
diff --git a/lib/mountcheck.c b/lib/mountcheck.c
index e37e9f9..e8780eb 100644
--- a/lib/mountcheck.c
+++ b/lib/mountcheck.c
@@ -4,6 +4,8 @@
 #ifdef CONFIG_GETMNTENT
 #include <mntent.h>
 
+#include "lib/mountcheck.h"
+
 #define MTAB	"/etc/mtab"
 
 int device_is_mounted(const char *dev)
diff --git a/lib/strntol.c b/lib/strntol.c
index 713f63b..adf45bd 100644
--- a/lib/strntol.c
+++ b/lib/strntol.c
@@ -2,6 +2,8 @@
 #include <stdlib.h>
 #include <limits.h>
 
+#include "lib/strntol.h"
+
 long strntol(const char *str, size_t sz, char **end, int base)
 {
 	/* Expect that digit representation of LONG_MAX/MIN
diff --git a/memory.c b/memory.c
index af4d5ef..9124117 100644
--- a/memory.c
+++ b/memory.c
@@ -215,13 +215,13 @@ int allocate_io_mem(struct thread_data *td)
 	size_t total_mem;
 	int ret = 0;
 
-	if (td->io_ops->flags & FIO_NOIO)
+	if (td_ioengine_flagged(td, FIO_NOIO))
 		return 0;
 
 	total_mem = td->orig_buffer_size;
 
 	if (td->o.odirect || td->o.mem_align || td->o.oatomic ||
-	    (td->io_ops->flags & FIO_MEMALIGN)) {
+	    td_ioengine_flagged(td, FIO_MEMALIGN)) {
 		total_mem += page_mask;
 		if (td->o.mem_align && td->o.mem_align > page_size)
 			total_mem += td->o.mem_align - page_size;
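In the options.c part of this patch, option callbacks stop treating their `void *data` argument as a `struct thread_data *`. They convert it with `cb_data_to_td(data)`, defined as `container_of(data, struct thread_data, o)`, and `.off1` offsets become relative to `struct thread_options`. Together these indicate that callbacks now receive a pointer to the embedded options. Since fio.h also moves `o` after `opt_list` and `flags`, `&td->o` is no longer the same address as `td`, so a plain cast would be wrong.

The following minimal sketch uses a textbook `container_of` and hypothetical cut-down structs, not fio's definitions, to show the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of fio's structures. */
struct thread_options { unsigned int iodepth; };

struct thread_data {
	unsigned long flags;
	struct thread_options o;	/* deliberately not the first member */
};

#define cb_data_to_td(data)	container_of(data, struct thread_data, o)

/* Hypothetical option callback that is handed &td->o, not td. */
static int iodepth_cb(void *data, unsigned long long *val)
{
	struct thread_data *td = cb_data_to_td(data);

	assert(data != NULL);
	td->o.iodepth = (unsigned int)*val;
	return 0;
}
```

Here `offsetof(struct thread_data, o)` is non-zero, so `cb_data_to_td(&td.o)` subtracts it to recover `&td`. Casting `data` directly would instead make `td` point into the middle of the structure.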
diff --git a/options.c b/options.c
index 56e51fc..517ee68 100644
--- a/options.c
+++ b/options.c
@@ -20,6 +20,8 @@
 
 char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 
+#define cb_data_to_td(data)	container_of(data, struct thread_data, o)
+
 struct pattern_fmt_desc fmt_desc[] = {
 	{
 		.fmt   = "%o",
@@ -223,7 +225,7 @@ static int str_split_parse(struct thread_data *td, char *str, split_parse_fn *fn
 
 static int str_bssplit_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	char *str, *p;
 	int ret = 0;
 
@@ -324,7 +326,7 @@ static int ignore_error_type(struct thread_data *td, int etype, char *str)
 
 static int str_ignore_error_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	char *str, *p, *n;
 	int type = 0, ret = 1;
 
@@ -352,7 +354,7 @@ static int str_ignore_error_cb(void *data, const char *input)
 
 static int str_rw_cb(void *data, const char *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	struct thread_options *o = &td->o;
 	char *nr;
 
@@ -386,7 +388,7 @@ static int str_rw_cb(void *data, const char *str)
 
 static int str_mem_cb(void *data, const char *mem)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (td->o.mem_type == MEM_MMAPHUGE || td->o.mem_type == MEM_MMAP ||
 	    td->o.mem_type == MEM_MMAPSHARED)
@@ -397,7 +399,7 @@ static int str_mem_cb(void *data, const char *mem)
 
 static int fio_clock_source_cb(void *data, const char *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	fio_clock_source = td->o.clocksource;
 	fio_clock_source_set = 1;
@@ -407,7 +409,7 @@ static int fio_clock_source_cb(void *data, const char *str)
 
 static int str_rwmix_read_cb(void *data, unsigned long long *val)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	td->o.rwmix[DDIR_READ] = *val;
 	td->o.rwmix[DDIR_WRITE] = 100 - *val;
@@ -416,7 +418,7 @@ static int str_rwmix_read_cb(void *data, unsigned long long *val)
 
 static int str_rwmix_write_cb(void *data, unsigned long long *val)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	td->o.rwmix[DDIR_WRITE] = *val;
 	td->o.rwmix[DDIR_READ] = 100 - *val;
@@ -454,7 +456,7 @@ int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu_index)
 
 static int str_cpumask_cb(void *data, unsigned long long *val)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	unsigned int i;
 	long max_cpu;
 	int ret;
@@ -554,7 +556,7 @@ static int set_cpus_allowed(struct thread_data *td, os_cpu_mask_t *mask,
 
 static int str_cpus_allowed_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (parse_dryrun())
 		return 0;
@@ -564,7 +566,7 @@ static int str_cpus_allowed_cb(void *data, const char *input)
 
 static int str_verify_cpus_allowed_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (parse_dryrun())
 		return 0;
@@ -575,7 +577,7 @@ static int str_verify_cpus_allowed_cb(void *data, const char *input)
 #ifdef CONFIG_ZLIB
 static int str_log_cpus_allowed_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (parse_dryrun())
 		return 0;
@@ -589,7 +591,7 @@ static int str_log_cpus_allowed_cb(void *data, const char *input)
 #ifdef CONFIG_LIBNUMA
 static int str_numa_cpunodes_cb(void *data, char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	struct bitmask *verify_bitmask;
 
 	if (parse_dryrun())
@@ -614,7 +616,7 @@ static int str_numa_cpunodes_cb(void *data, char *input)
 
 static int str_numa_mpol_cb(void *data, char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	const char * const policy_types[] =
 		{ "default", "prefer", "bind", "interleave", "local", NULL };
 	int i;
@@ -723,7 +725,7 @@ out:
 
 static int str_fst_cb(void *data, const char *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	double val;
 	bool done = false;
 	char *nr;
@@ -803,7 +805,7 @@ static int str_fst_cb(void *data, const char *str)
 #ifdef CONFIG_SYNC_FILE_RANGE
 static int str_sfr_cb(void *data, const char *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	char *nr = get_opt_postfix(str);
 
 	td->sync_file_range_nr = 1;
@@ -1006,7 +1008,7 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input)
 
 static int str_random_distribution_cb(void *data, const char *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	double val;
 	char *nr;
 
@@ -1151,7 +1153,7 @@ int set_name_idx(char *target, size_t tlen, char *input, int index,
 
 static int str_filename_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	char *fname, *str, *p;
 
 	p = str = strdup(input);
@@ -1174,7 +1176,7 @@ static int str_filename_cb(void *data, const char *input)
 
 static int str_directory_cb(void *data, const char fio_unused *unused)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	struct stat sb;
 	char *dirname, *str, *p;
 	int ret = 0;
@@ -1205,7 +1207,7 @@ out:
 
 static int str_opendir_cb(void *data, const char fio_unused *str)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (parse_dryrun())
 		return 0;
@@ -1218,7 +1220,7 @@ static int str_opendir_cb(void *data, const char fio_unused *str)
 
 static int str_buffer_pattern_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	int ret;
 
 	/* FIXME: for now buffer pattern does not support formats */
@@ -1239,7 +1241,7 @@ static int str_buffer_pattern_cb(void *data, const char *input)
 
 static int str_buffer_compress_cb(void *data, unsigned long long *il)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	td->flags |= TD_F_COMPRESS;
 	td->o.compress_percentage = *il;
@@ -1248,7 +1250,7 @@ static int str_buffer_compress_cb(void *data, unsigned long long *il)
 
 static int str_dedupe_cb(void *data, unsigned long long *il)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	td->flags |= TD_F_COMPRESS;
 	td->o.dedupe_percentage = *il;
@@ -1258,7 +1260,7 @@ static int str_dedupe_cb(void *data, unsigned long long *il)
 
 static int str_verify_pattern_cb(void *data, const char *input)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	int ret;
 
 	td->o.verify_fmt_sz = ARRAY_SIZE(td->o.verify_fmt);
@@ -1281,7 +1283,7 @@ static int str_verify_pattern_cb(void *data, const char *input)
 
 static int str_gtod_reduce_cb(void *data, int *il)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	int val = *il;
 
 	td->o.disable_lat = !!val;
@@ -1297,7 +1299,7 @@ static int str_gtod_reduce_cb(void *data, int *il)
 
 static int str_size_cb(void *data, unsigned long long *__val)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 	unsigned long long v = *__val;
 
 	if (parse_is_percent(v)) {
@@ -1311,7 +1313,7 @@ static int str_size_cb(void *data, unsigned long long *__val)
 
 static int rw_verify(struct fio_option *o, void *data)
 {
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (read_only && td_write(td)) {
 		log_err("fio: job <%s> has write bit set, but fio is in"
@@ -1325,7 +1327,7 @@ static int rw_verify(struct fio_option *o, void *data)
 static int gtod_cpu_verify(struct fio_option *o, void *data)
 {
 #ifndef FIO_HAVE_CPU_AFFINITY
-	struct thread_data *td = data;
+	struct thread_data *td = cb_data_to_td(data);
 
 	if (td->o.gtod_cpu) {
 		log_err("fio: platform must support CPU affinity for"
@@ -1345,7 +1347,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "description",
 		.lname	= "Description of job",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(description),
+		.off1	= offsetof(struct thread_options, description),
 		.help	= "Text job description",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_DESC,
@@ -1354,7 +1356,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "name",
 		.lname	= "Job name",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(name),
+		.off1	= offsetof(struct thread_options, name),
 		.help	= "Name of this job",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_DESC,
@@ -1363,7 +1365,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "wait_for",
 		.lname	= "Waitee name",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(wait_for),
+		.off1	= offsetof(struct thread_options, wait_for),
 		.help	= "Name of the job this one wants to wait for before starting",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_DESC,
@@ -1372,7 +1374,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "filename",
 		.lname	= "Filename(s)",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(filename),
+		.off1	= offsetof(struct thread_options, filename),
 		.cb	= str_filename_cb,
 		.prio	= -1, /* must come after "directory" */
 		.help	= "File(s) to use for the workload",
@@ -1383,7 +1385,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "directory",
 		.lname	= "Directory",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(directory),
+		.off1	= offsetof(struct thread_options, directory),
 		.cb	= str_directory_cb,
 		.help	= "Directory to store files in",
 		.category = FIO_OPT_C_FILE,
@@ -1393,7 +1395,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "filename_format",
 		.lname	= "Filename Format",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(filename_format),
+		.off1	= offsetof(struct thread_options, filename_format),
 		.prio	= -1, /* must come after "directory" */
 		.help	= "Override default $jobname.$jobnum.$filenum naming",
 		.def	= "$jobname.$jobnum.$filenum",
@@ -1404,7 +1406,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "unique_filename",
 		.lname	= "Unique Filename",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(unique_filename),
+		.off1	= offsetof(struct thread_options, unique_filename),
 		.help	= "For network clients, prefix file with source IP",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
@@ -1414,7 +1416,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "lockfile",
 		.lname	= "Lockfile",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(file_lock_mode),
+		.off1	= offsetof(struct thread_options, file_lock_mode),
 		.help	= "Lock file when doing IO to it",
 		.prio	= 1,
 		.parent	= "filename",
@@ -1442,7 +1444,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "opendir",
 		.lname	= "Open directory",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(opendir),
+		.off1	= offsetof(struct thread_options, opendir),
 		.cb	= str_opendir_cb,
 		.help	= "Recursively add files from this directory and down",
 		.category = FIO_OPT_C_FILE,
@@ -1454,7 +1456,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "readwrite",
 		.type	= FIO_OPT_STR,
 		.cb	= str_rw_cb,
-		.off1	= td_var_offset(td_ddir),
+		.off1	= offsetof(struct thread_options, td_ddir),
 		.help	= "IO direction",
 		.def	= "read",
 		.verify	= rw_verify,
@@ -1507,7 +1509,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rw_sequencer",
 		.lname	= "RW Sequencer",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(rw_seq),
+		.off1	= offsetof(struct thread_options, rw_seq),
 		.help	= "IO offset generator modifier",
 		.def	= "sequential",
 		.category = FIO_OPT_C_IO,
@@ -1528,7 +1530,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "ioengine",
 		.lname	= "IO Engine",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(ioengine),
+		.off1	= offsetof(struct thread_options, ioengine),
 		.help	= "IO engine to use",
 		.def	= FIO_PREFERRED_ENGINE,
 		.category = FIO_OPT_C_IO,
@@ -1661,7 +1663,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "iodepth",
 		.lname	= "IO Depth",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iodepth),
+		.off1	= offsetof(struct thread_options, iodepth),
 		.help	= "Number of IO buffers to keep in flight",
 		.minval = 1,
 		.interval = 1,
@@ -1674,7 +1676,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "IO Depth batch",
 		.alias	= "iodepth_batch_submit",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iodepth_batch),
+		.off1	= offsetof(struct thread_options, iodepth_batch),
 		.help	= "Number of IO buffers to submit in one go",
 		.parent	= "iodepth",
 		.hide	= 1,
@@ -1688,7 +1690,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Min IO depth batch complete",
 		.alias	= "iodepth_batch_complete",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iodepth_batch_complete_min),
+		.off1	= offsetof(struct thread_options, iodepth_batch_complete_min),
 		.help	= "Min number of IO buffers to retrieve in one go",
 		.parent	= "iodepth",
 		.hide	= 1,
@@ -1702,7 +1704,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "iodepth_batch_complete_max",
 		.lname	= "Max IO depth batch complete",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iodepth_batch_complete_max),
+		.off1	= offsetof(struct thread_options, iodepth_batch_complete_max),
 		.help	= "Max number of IO buffers to retrieve in one go",
 		.parent	= "iodepth",
 		.hide	= 1,
@@ -1715,7 +1717,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "iodepth_low",
 		.lname	= "IO Depth batch low",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iodepth_low),
+		.off1	= offsetof(struct thread_options, iodepth_low),
 		.help	= "Low water mark for queuing depth",
 		.parent	= "iodepth",
 		.hide	= 1,
@@ -1727,7 +1729,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "io_submit_mode",
 		.lname	= "IO submit mode",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(io_submit_mode),
+		.off1	= offsetof(struct thread_options, io_submit_mode),
 		.help	= "How IO submissions and completions are done",
 		.def	= "inline",
 		.category = FIO_OPT_C_IO,
@@ -1748,7 +1750,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Size",
 		.type	= FIO_OPT_STR_VAL,
 		.cb	= str_size_cb,
-		.off1	= td_var_offset(size),
+		.off1	= offsetof(struct thread_options, size),
 		.help	= "Total size of device or files",
 		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
@@ -1759,7 +1761,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "io_limit",
 		.lname	= "IO Size",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(io_limit),
+		.off1	= offsetof(struct thread_options, io_limit),
 		.interval = 1024 * 1024,
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_INVALID,
@@ -1769,7 +1771,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Fill device",
 		.alias	= "fill_fs",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(fill_device),
+		.off1	= offsetof(struct thread_options, fill_device),
 		.help	= "Write until an ENOSPC error occurs",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -1779,8 +1781,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "filesize",
 		.lname	= "File size",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(file_size_low),
-		.off2	= td_var_offset(file_size_high),
+		.off1	= offsetof(struct thread_options, file_size_low),
+		.off2	= offsetof(struct thread_options, file_size_high),
 		.minval = 1,
 		.help	= "Size of individual files",
 		.interval = 1024 * 1024,
@@ -1791,7 +1793,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "file_append",
 		.lname	= "File append",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(file_append),
+		.off1	= offsetof(struct thread_options, file_append),
 		.help	= "IO will start at the end of the file(s)",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -1802,7 +1804,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "IO offset",
 		.alias	= "fileoffset",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(start_offset),
+		.off1	= offsetof(struct thread_options, start_offset),
 		.help	= "Start IO from this offset",
 		.def	= "0",
 		.interval = 1024 * 1024,
@@ -1813,7 +1815,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "offset_increment",
 		.lname	= "IO offset increment",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(offset_increment),
+		.off1	= offsetof(struct thread_options, offset_increment),
 		.help	= "What is the increment from one offset to the next",
 		.parent = "offset",
 		.hide	= 1,
@@ -1826,7 +1828,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "number_ios",
 		.lname	= "Number of IOs to perform",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(number_ios),
+		.off1	= offsetof(struct thread_options, number_ios),
 		.help	= "Force job completion after this number of IOs",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -1837,9 +1839,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Block size",
 		.alias	= "blocksize",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(bs[DDIR_READ]),
-		.off2	= td_var_offset(bs[DDIR_WRITE]),
-		.off3	= td_var_offset(bs[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, bs[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, bs[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, bs[DDIR_TRIM]),
 		.minval = 1,
 		.help	= "Block size unit",
 		.def	= "4k",
@@ -1854,9 +1856,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Block size align",
 		.alias	= "blockalign",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(ba[DDIR_READ]),
-		.off2	= td_var_offset(ba[DDIR_WRITE]),
-		.off3	= td_var_offset(ba[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, ba[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, ba[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, ba[DDIR_TRIM]),
 		.minval	= 1,
 		.help	= "IO block offset alignment",
 		.parent	= "rw",
@@ -1870,12 +1872,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Block size range",
 		.alias	= "blocksize_range",
 		.type	= FIO_OPT_RANGE,
-		.off1	= td_var_offset(min_bs[DDIR_READ]),
-		.off2	= td_var_offset(max_bs[DDIR_READ]),
-		.off3	= td_var_offset(min_bs[DDIR_WRITE]),
-		.off4	= td_var_offset(max_bs[DDIR_WRITE]),
-		.off5	= td_var_offset(min_bs[DDIR_TRIM]),
-		.off6	= td_var_offset(max_bs[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, min_bs[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, max_bs[DDIR_READ]),
+		.off3	= offsetof(struct thread_options, min_bs[DDIR_WRITE]),
+		.off4	= offsetof(struct thread_options, max_bs[DDIR_WRITE]),
+		.off5	= offsetof(struct thread_options, min_bs[DDIR_TRIM]),
+		.off6	= offsetof(struct thread_options, max_bs[DDIR_TRIM]),
 		.minval = 1,
 		.help	= "Set block size range (in more detail than bs)",
 		.parent = "rw",
@@ -1889,7 +1891,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Block size split",
 		.type	= FIO_OPT_STR,
 		.cb	= str_bssplit_cb,
-		.off1	= td_var_offset(bssplit),
+		.off1	= offsetof(struct thread_options, bssplit),
 		.help	= "Set a specific mix of block sizes",
 		.parent	= "rw",
 		.hide	= 1,
@@ -1901,7 +1903,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Block size unaligned",
 		.alias	= "blocksize_unaligned",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(bs_unaligned),
+		.off1	= offsetof(struct thread_options, bs_unaligned),
 		.help	= "Don't sector align IO buffer sizes",
 		.parent = "rw",
 		.hide	= 1,
@@ -1912,7 +1914,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "bs_is_seq_rand",
 		.lname	= "Block size division is seq/random (not read/write)",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(bs_is_seq_rand),
+		.off1	= offsetof(struct thread_options, bs_is_seq_rand),
 		.help	= "Consider any blocksize setting to be sequential,random",
 		.def	= "0",
 		.parent = "blocksize",
@@ -1923,7 +1925,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "randrepeat",
 		.lname	= "Random repeatable",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(rand_repeatable),
+		.off1	= offsetof(struct thread_options, rand_repeatable),
 		.help	= "Use repeatable random IO pattern",
 		.def	= "1",
 		.parent = "rw",
@@ -1935,7 +1937,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "randseed",
 		.lname	= "The random generator seed",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(rand_seed),
+		.off1	= offsetof(struct thread_options, rand_seed),
 		.help	= "Set the random generator seed value",
 		.def	= "0x89",
 		.parent = "rw",
@@ -1946,7 +1948,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "use_os_rand",
 		.lname	= "Use OS random",
 		.type	= FIO_OPT_DEPRECATED,
-		.off1	= td_var_offset(dep_use_os_rand),
+		.off1	= offsetof(struct thread_options, dep_use_os_rand),
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_RANDOM,
 	},
@@ -1954,7 +1956,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "norandommap",
 		.lname	= "No randommap",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(norandommap),
+		.off1	= offsetof(struct thread_options, norandommap),
 		.help	= "Accept potential duplicate random blocks",
 		.parent = "rw",
 		.hide	= 1,
@@ -1966,7 +1968,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "softrandommap",
 		.lname	= "Soft randommap",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(softrandommap),
+		.off1	= offsetof(struct thread_options, softrandommap),
 		.help	= "Set norandommap if randommap allocation fails",
 		.parent	= "norandommap",
 		.hide	= 1,
@@ -1978,7 +1980,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "random_generator",
 		.lname	= "Random Generator",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(random_generator),
+		.off1	= offsetof(struct thread_options, random_generator),
 		.help	= "Type of random number generator to use",
 		.def	= "tausworthe",
 		.posval	= {
@@ -2003,7 +2005,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "random_distribution",
 		.lname	= "Random Distribution",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(random_distribution),
+		.off1	= offsetof(struct thread_options, random_distribution),
 		.cb	= str_random_distribution_cb,
 		.help	= "Random offset distribution generator",
 		.def	= "random",
@@ -2037,9 +2039,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "percentage_random",
 		.lname	= "Percentage Random",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(perc_rand[DDIR_READ]),
-		.off2	= td_var_offset(perc_rand[DDIR_WRITE]),
-		.off3	= td_var_offset(perc_rand[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, perc_rand[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, perc_rand[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, perc_rand[DDIR_TRIM]),
 		.maxval	= 100,
 		.help	= "Percentage of seq/random mix that should be random",
 		.def	= "100,100,100",
@@ -2059,7 +2061,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "allrandrepeat",
 		.lname	= "All Random Repeat",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(allrand_repeatable),
+		.off1	= offsetof(struct thread_options, allrand_repeatable),
 		.help	= "Use repeatable random numbers for everything",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -2070,7 +2072,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Number of files",
 		.alias	= "nr_files",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(nr_files),
+		.off1	= offsetof(struct thread_options, nr_files),
 		.help	= "Split job workload between this number of files",
 		.def	= "1",
 		.interval = 1,
@@ -2081,7 +2083,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "openfiles",
 		.lname	= "Number of open files",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(open_files),
+		.off1	= offsetof(struct thread_options, open_files),
 		.help	= "Number of files to keep open at the same time",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -2091,7 +2093,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "File service type",
 		.type	= FIO_OPT_STR,
 		.cb	= str_fst_cb,
-		.off1	= td_var_offset(file_service_type),
+		.off1	= offsetof(struct thread_options, file_service_type),
 		.help	= "How to select which file to service next",
 		.def	= "roundrobin",
 		.category = FIO_OPT_C_FILE,
@@ -2130,7 +2132,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fallocate",
 		.lname	= "Fallocate",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(fallocate_mode),
+		.off1	= offsetof(struct thread_options, fallocate_mode),
 		.help	= "Whether pre-allocation is performed when laying out files",
 		.def	= "posix",
 		.category = FIO_OPT_C_FILE,
@@ -2173,7 +2175,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(fadvise_hint),
+		.off1	= offsetof(struct thread_options, fadvise_hint),
 		.help	= "Use fadvise() to advise the kernel on IO pattern",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
@@ -2184,7 +2186,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fadvise_stream",
 		.lname	= "Fadvise stream",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(fadvise_stream),
+		.off1	= offsetof(struct thread_options, fadvise_stream),
 		.help	= "Use fadvise() to set stream ID",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -2201,7 +2203,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fsync",
 		.lname	= "Fsync",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(fsync_blocks),
+		.off1	= offsetof(struct thread_options, fsync_blocks),
 		.help	= "Issue fsync for writes every given number of blocks",
 		.def	= "0",
 		.interval = 1,
@@ -2212,7 +2214,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fdatasync",
 		.lname	= "Fdatasync",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(fdatasync_blocks),
+		.off1	= offsetof(struct thread_options, fdatasync_blocks),
 		.help	= "Issue fdatasync for writes every given number of blocks",
 		.def	= "0",
 		.interval = 1,
@@ -2223,7 +2225,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_barrier",
 		.lname	= "Write barrier",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(barrier_blocks),
+		.off1	= offsetof(struct thread_options, barrier_blocks),
 		.help	= "Make every Nth write a barrier write",
 		.def	= "0",
 		.interval = 1,
@@ -2254,7 +2256,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		},
 		.type	= FIO_OPT_STR_MULTI,
 		.cb	= str_sfr_cb,
-		.off1	= td_var_offset(sync_file_range),
+		.off1	= offsetof(struct thread_options, sync_file_range),
 		.help	= "Use sync_file_range()",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -2271,7 +2273,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "direct",
 		.lname	= "Direct I/O",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(odirect),
+		.off1	= offsetof(struct thread_options, odirect),
 		.help	= "Use O_DIRECT IO (negates buffered)",
 		.def	= "0",
 		.inverse = "buffered",
@@ -2282,7 +2284,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "atomic",
 		.lname	= "Atomic I/O",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(oatomic),
+		.off1	= offsetof(struct thread_options, oatomic),
 		.help	= "Use Atomic IO with O_DIRECT (implies O_DIRECT)",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -2292,7 +2294,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "buffered",
 		.lname	= "Buffered I/O",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(odirect),
+		.off1	= offsetof(struct thread_options, odirect),
 		.neg	= 1,
 		.help	= "Use buffered IO (negates direct)",
 		.def	= "1",
@@ -2304,7 +2306,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "overwrite",
 		.lname	= "Overwrite",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(overwrite),
+		.off1	= offsetof(struct thread_options, overwrite),
 		.help	= "When writing, set whether to overwrite current data",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -2314,7 +2316,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "loops",
 		.lname	= "Loops",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(loops),
+		.off1	= offsetof(struct thread_options, loops),
 		.help	= "Number of times to run the job",
 		.def	= "1",
 		.interval = 1,
@@ -2325,7 +2327,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "numjobs",
 		.lname	= "Number of jobs",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(numjobs),
+		.off1	= offsetof(struct thread_options, numjobs),
 		.help	= "Duplicate this job this many times",
 		.def	= "1",
 		.interval = 1,
@@ -2336,8 +2338,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "startdelay",
 		.lname	= "Start delay",
 		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= td_var_offset(start_delay),
-		.off2	= td_var_offset(start_delay_high),
+		.off1	= offsetof(struct thread_options, start_delay),
+		.off2	= offsetof(struct thread_options, start_delay_high),
 		.help	= "Only start job when this period has passed",
 		.def	= "0",
 		.is_seconds = 1,
@@ -2350,7 +2352,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Runtime",
 		.alias	= "timeout",
 		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= td_var_offset(timeout),
+		.off1	= offsetof(struct thread_options, timeout),
 		.help	= "Stop workload when this amount of time has passed",
 		.def	= "0",
 		.is_seconds = 1,
@@ -2362,7 +2364,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "time_based",
 		.lname	= "Time based",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(time_based),
+		.off1	= offsetof(struct thread_options, time_based),
 		.help	= "Keep running until runtime/timeout is met",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_RUNTIME,
@@ -2371,7 +2373,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_only",
 		.lname	= "Verify only",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(verify_only),
+		.off1	= offsetof(struct thread_options, verify_only),
 		.help	= "Verifies previously written data is still valid",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_RUNTIME,
@@ -2380,7 +2382,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "ramp_time",
 		.lname	= "Ramp time",
 		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= td_var_offset(ramp_time),
+		.off1	= offsetof(struct thread_options, ramp_time),
 		.help	= "Ramp up time before measuring performance",
 		.is_seconds = 1,
 		.is_time = 1,
@@ -2392,7 +2394,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Clock source",
 		.type	= FIO_OPT_STR,
 		.cb	= fio_clock_source_cb,
-		.off1	= td_var_offset(clocksource),
+		.off1	= offsetof(struct thread_options, clocksource),
 		.help	= "What type of timing source to use",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CLOCK,
@@ -2423,7 +2425,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "I/O Memory",
 		.type	= FIO_OPT_STR,
 		.cb	= str_mem_cb,
-		.off1	= td_var_offset(mem_type),
+		.off1	= offsetof(struct thread_options, mem_type),
 		.help	= "Backing type for IO buffers",
 		.def	= "malloc",
 		.category = FIO_OPT_C_IO,
@@ -2466,7 +2468,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "mem_align",
 		.lname	= "I/O memory alignment",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(mem_align),
+		.off1	= offsetof(struct thread_options, mem_align),
 		.minval	= 0,
 		.help	= "IO memory buffer offset alignment",
 		.def	= "0",
@@ -2479,7 +2481,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify",
 		.lname	= "Verify",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(verify),
+		.off1	= offsetof(struct thread_options, verify),
 		.help	= "Verify data written",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -2556,7 +2558,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "do_verify",
 		.lname	= "Perform verify step",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(do_verify),
+		.off1	= offsetof(struct thread_options, do_verify),
 		.help	= "Run verification stage after write",
 		.def	= "1",
 		.parent = "verify",
@@ -2568,7 +2570,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verifysort",
 		.lname	= "Verify sort",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(verifysort),
+		.off1	= offsetof(struct thread_options, verifysort),
 		.help	= "Sort written verify blocks for read back",
 		.def	= "1",
 		.parent = "verify",
@@ -2580,7 +2582,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verifysort_nr",
 		.lname	= "Verify Sort Nr",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(verifysort_nr),
+		.off1	= offsetof(struct thread_options, verifysort_nr),
 		.help	= "Pre-load and sort verify blocks for a read workload",
 		.minval	= 0,
 		.maxval	= 131072,
@@ -2593,7 +2595,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name   = "verify_interval",
 		.lname	= "Verify interval",
 		.type   = FIO_OPT_INT,
-		.off1   = td_var_offset(verify_interval),
+		.off1   = offsetof(struct thread_options, verify_interval),
 		.minval	= 2 * sizeof(struct verify_header),
 		.help   = "Store verify buffer header every N bytes",
 		.parent	= "verify",
@@ -2607,7 +2609,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Verify offset",
 		.type	= FIO_OPT_INT,
 		.help	= "Offset verify header location by N bytes",
-		.off1	= td_var_offset(verify_offset),
+		.off1	= offsetof(struct thread_options, verify_offset),
 		.minval	= sizeof(struct verify_header),
 		.parent	= "verify",
 		.hide	= 1,
@@ -2619,7 +2621,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Verify pattern",
 		.type	= FIO_OPT_STR,
 		.cb	= str_verify_pattern_cb,
-		.off1	= td_var_offset(verify_pattern),
+		.off1	= offsetof(struct thread_options, verify_pattern),
 		.help	= "Fill pattern for IO buffers",
 		.parent	= "verify",
 		.hide	= 1,
@@ -2630,7 +2632,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_fatal",
 		.lname	= "Verify fatal",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(verify_fatal),
+		.off1	= offsetof(struct thread_options, verify_fatal),
 		.def	= "0",
 		.help	= "Exit on a single verify failure, don't continue",
 		.parent = "verify",
@@ -2642,7 +2644,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_dump",
 		.lname	= "Verify dump",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(verify_dump),
+		.off1	= offsetof(struct thread_options, verify_dump),
 		.def	= "0",
 		.help	= "Dump contents of good and bad blocks on failure",
 		.parent = "verify",
@@ -2654,7 +2656,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_async",
 		.lname	= "Verify asynchronously",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(verify_async),
+		.off1	= offsetof(struct thread_options, verify_async),
 		.def	= "0",
 		.help	= "Number of async verifier threads to use",
 		.parent	= "verify",
@@ -2666,7 +2668,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_backlog",
 		.lname	= "Verify backlog",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(verify_backlog),
+		.off1	= offsetof(struct thread_options, verify_backlog),
 		.help	= "Verify after this number of blocks are written",
 		.parent	= "verify",
 		.hide	= 1,
@@ -2677,7 +2679,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_backlog_batch",
 		.lname	= "Verify backlog batch",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(verify_batch),
+		.off1	= offsetof(struct thread_options, verify_batch),
 		.help	= "Verify this number of IO blocks",
 		.parent	= "verify",
 		.hide	= 1,
@@ -2690,7 +2692,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Async verify CPUs",
 		.type	= FIO_OPT_STR,
 		.cb	= str_verify_cpus_allowed_cb,
-		.off1	= td_var_offset(verify_cpumask),
+		.off1	= offsetof(struct thread_options, verify_cpumask),
 		.help	= "Set CPUs allowed for async verify threads",
 		.parent	= "verify_async",
 		.hide	= 1,
@@ -2708,7 +2710,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "experimental_verify",
 		.lname	= "Experimental Verify",
-		.off1	= td_var_offset(experimental_verify),
+		.off1	= offsetof(struct thread_options, experimental_verify),
 		.type	= FIO_OPT_BOOL,
 		.help	= "Enable experimental verification",
 		.parent	= "verify",
@@ -2718,7 +2720,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "verify_state_load",
 		.lname	= "Load verify state",
-		.off1	= td_var_offset(verify_state),
+		.off1	= offsetof(struct thread_options, verify_state),
 		.type	= FIO_OPT_BOOL,
 		.help	= "Load verify termination state",
 		.parent	= "verify",
@@ -2728,7 +2730,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "verify_state_save",
 		.lname	= "Save verify state",
-		.off1	= td_var_offset(verify_state_save),
+		.off1	= offsetof(struct thread_options, verify_state_save),
 		.type	= FIO_OPT_BOOL,
 		.def	= "1",
 		.help	= "Save verify state on termination",
@@ -2741,7 +2743,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "trim_percentage",
 		.lname	= "Trim percentage",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(trim_percentage),
+		.off1	= offsetof(struct thread_options, trim_percentage),
 		.minval = 0,
 		.maxval = 100,
 		.help	= "Number of verify blocks to discard/trim",
@@ -2757,7 +2759,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Verify trim zero",
 		.type	= FIO_OPT_BOOL,
 		.help	= "Verify that trim/discarded blocks are returned as zeroes",
-		.off1	= td_var_offset(trim_zero),
+		.off1	= offsetof(struct thread_options, trim_zero),
 		.parent	= "trim_percentage",
 		.hide	= 1,
 		.def	= "1",
@@ -2768,7 +2770,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "trim_backlog",
 		.lname	= "Trim backlog",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(trim_backlog),
+		.off1	= offsetof(struct thread_options, trim_backlog),
 		.help	= "Trim after this number of blocks are written",
 		.parent	= "trim_percentage",
 		.hide	= 1,
@@ -2780,7 +2782,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "trim_backlog_batch",
 		.lname	= "Trim backlog batch",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(trim_batch),
+		.off1	= offsetof(struct thread_options, trim_batch),
 		.help	= "Trim this number of IO blocks",
 		.parent	= "trim_percentage",
 		.hide	= 1,
@@ -2818,7 +2820,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_iolog",
 		.lname	= "Write I/O log",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(write_iolog_file),
+		.off1	= offsetof(struct thread_options, write_iolog_file),
 		.help	= "Store IO pattern to file",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IOLOG,
@@ -2827,7 +2829,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "read_iolog",
 		.lname	= "Read I/O log",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(read_iolog_file),
+		.off1	= offsetof(struct thread_options, read_iolog_file),
 		.help	= "Playback IO pattern from file",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IOLOG,
@@ -2836,7 +2838,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "replay_no_stall",
 		.lname	= "Don't stall on replay",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(no_stall),
+		.off1	= offsetof(struct thread_options, no_stall),
 		.def	= "0",
 		.parent	= "read_iolog",
 		.hide	= 1,
@@ -2848,7 +2850,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "replay_redirect",
 		.lname	= "Redirect device for replay",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(replay_redirect),
+		.off1	= offsetof(struct thread_options, replay_redirect),
 		.parent	= "read_iolog",
 		.hide	= 1,
 		.help	= "Replay all I/O onto this device, regardless of trace device",
@@ -2859,7 +2861,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "replay_scale",
 		.lname	= "Replace offset scale factor",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(replay_scale),
+		.off1	= offsetof(struct thread_options, replay_scale),
 		.parent	= "read_iolog",
 		.def	= "1",
 		.help	= "Align offsets to this blocksize",
@@ -2870,7 +2872,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "replay_align",
 		.lname	= "Replace alignment",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(replay_align),
+		.off1	= offsetof(struct thread_options, replay_align),
 		.parent	= "read_iolog",
 		.help	= "Scale offset down by this factor",
 		.category = FIO_OPT_C_IO,
@@ -2881,7 +2883,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "exec_prerun",
 		.lname	= "Pre-execute runnable",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(exec_prerun),
+		.off1	= offsetof(struct thread_options, exec_prerun),
 		.help	= "Execute this file prior to running job",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
@@ -2890,7 +2892,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "exec_postrun",
 		.lname	= "Post-execute runnable",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(exec_postrun),
+		.off1	= offsetof(struct thread_options, exec_postrun),
 		.help	= "Execute this file after running job",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
@@ -2900,7 +2902,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "ioscheduler",
 		.lname	= "I/O scheduler",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(ioscheduler),
+		.off1	= offsetof(struct thread_options, ioscheduler),
 		.help	= "Use this IO scheduler on the backing device",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -2917,7 +2919,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "zonesize",
 		.lname	= "Zone size",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(zone_size),
+		.off1	= offsetof(struct thread_options, zone_size),
 		.help	= "Amount of data to read per zone",
 		.def	= "0",
 		.interval = 1024 * 1024,
@@ -2928,7 +2930,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "zonerange",
 		.lname	= "Zone range",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(zone_range),
+		.off1	= offsetof(struct thread_options, zone_range),
 		.help	= "Give size of an IO zone",
 		.def	= "0",
 		.interval = 1024 * 1024,
@@ -2939,7 +2941,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "zoneskip",
 		.lname	= "Zone skip",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(zone_skip),
+		.off1	= offsetof(struct thread_options, zone_skip),
 		.help	= "Space between IO zones",
 		.def	= "0",
 		.interval = 1024 * 1024,
@@ -2950,7 +2952,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "lockmem",
 		.lname	= "Lock memory",
 		.type	= FIO_OPT_STR_VAL,
-		.off1	= td_var_offset(lockmem),
+		.off1	= offsetof(struct thread_options, lockmem),
 		.help	= "Lock down this amount of memory (per worker)",
 		.def	= "0",
 		.interval = 1024 * 1024,
@@ -2962,7 +2964,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Read/write mix read",
 		.type	= FIO_OPT_INT,
 		.cb	= str_rwmix_read_cb,
-		.off1	= td_var_offset(rwmix[DDIR_READ]),
+		.off1	= offsetof(struct thread_options, rwmix[DDIR_READ]),
 		.maxval	= 100,
 		.help	= "Percentage of mixed workload that is reads",
 		.def	= "50",
@@ -2976,7 +2978,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Read/write mix write",
 		.type	= FIO_OPT_INT,
 		.cb	= str_rwmix_write_cb,
-		.off1	= td_var_offset(rwmix[DDIR_WRITE]),
+		.off1	= offsetof(struct thread_options, rwmix[DDIR_WRITE]),
 		.maxval	= 100,
 		.help	= "Percentage of mixed workload that is writes",
 		.def	= "50",
@@ -2996,7 +2998,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "nice",
 		.lname	= "Nice",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(nice),
+		.off1	= offsetof(struct thread_options, nice),
 		.help	= "Set job CPU nice value",
 		.minval	= -19,
 		.maxval	= 20,
@@ -3010,7 +3012,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "prio",
 		.lname	= "I/O nice priority",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(ioprio),
+		.off1	= offsetof(struct thread_options, ioprio),
 		.help	= "Set job IO priority value",
 		.minval	= IOPRIO_MIN_PRIO,
 		.maxval	= IOPRIO_MAX_PRIO,
@@ -3034,7 +3036,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "prioclass",
 		.lname	= "I/O nice priority class",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(ioprio_class),
+		.off1	= offsetof(struct thread_options, ioprio_class),
 		.help	= "Set job IO priority class",
 		.minval	= IOPRIO_MIN_PRIO_CLASS,
 		.maxval	= IOPRIO_MAX_PRIO_CLASS,
@@ -3054,7 +3056,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "thinktime",
 		.lname	= "Thinktime",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(thinktime),
+		.off1	= offsetof(struct thread_options, thinktime),
 		.help	= "Idle time between IO buffers (usec)",
 		.def	= "0",
 		.is_time = 1,
@@ -3065,7 +3067,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "thinktime_spin",
 		.lname	= "Thinktime spin",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(thinktime_spin),
+		.off1	= offsetof(struct thread_options, thinktime_spin),
 		.help	= "Start think time by spinning this amount (usec)",
 		.def	= "0",
 		.is_time = 1,
@@ -3078,7 +3080,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "thinktime_blocks",
 		.lname	= "Thinktime blocks",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(thinktime_blocks),
+		.off1	= offsetof(struct thread_options, thinktime_blocks),
 		.help	= "IO buffer period between 'thinktime'",
 		.def	= "1",
 		.parent	= "thinktime",
@@ -3090,9 +3092,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rate",
 		.lname	= "I/O rate",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(rate[DDIR_READ]),
-		.off2	= td_var_offset(rate[DDIR_WRITE]),
-		.off3	= td_var_offset(rate[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, rate[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, rate[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, rate[DDIR_TRIM]),
 		.help	= "Set bandwidth rate",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_RATE,
@@ -3102,9 +3104,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "ratemin",
 		.lname	= "I/O min rate",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(ratemin[DDIR_READ]),
-		.off2	= td_var_offset(ratemin[DDIR_WRITE]),
-		.off3	= td_var_offset(ratemin[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, ratemin[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, ratemin[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, ratemin[DDIR_TRIM]),
 		.help	= "Job must meet this rate or it will be shutdown",
 		.parent	= "rate",
 		.hide	= 1,
@@ -3115,9 +3117,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rate_iops",
 		.lname	= "I/O rate IOPS",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(rate_iops[DDIR_READ]),
-		.off2	= td_var_offset(rate_iops[DDIR_WRITE]),
-		.off3	= td_var_offset(rate_iops[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, rate_iops[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, rate_iops[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, rate_iops[DDIR_TRIM]),
 		.help	= "Limit IO used to this number of IO operations/sec",
 		.hide	= 1,
 		.category = FIO_OPT_C_IO,
@@ -3127,9 +3129,9 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rate_iops_min",
 		.lname	= "I/O min rate IOPS",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(rate_iops_min[DDIR_READ]),
-		.off2	= td_var_offset(rate_iops_min[DDIR_WRITE]),
-		.off3	= td_var_offset(rate_iops_min[DDIR_TRIM]),
+		.off1	= offsetof(struct thread_options, rate_iops_min[DDIR_READ]),
+		.off2	= offsetof(struct thread_options, rate_iops_min[DDIR_WRITE]),
+		.off3	= offsetof(struct thread_options, rate_iops_min[DDIR_TRIM]),
 		.help	= "Job must meet this rate or it will be shut down",
 		.parent	= "rate_iops",
 		.hide	= 1,
@@ -3140,7 +3142,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "rate_process",
 		.lname	= "Rate Process",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(rate_process),
+		.off1	= offsetof(struct thread_options, rate_process),
 		.help	= "What process controls how rated IO is managed",
 		.def	= "linear",
 		.category = FIO_OPT_C_IO,
@@ -3163,7 +3165,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.alias	= "ratecycle",
 		.lname	= "I/O rate cycle",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(ratecycle),
+		.off1	= offsetof(struct thread_options, ratecycle),
 		.help	= "Window average for rate limits (msec)",
 		.def	= "1000",
 		.parent = "rate",
@@ -3175,7 +3177,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "max_latency",
 		.lname	= "Max Latency",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(max_latency),
+		.off1	= offsetof(struct thread_options, max_latency),
 		.help	= "Maximum tolerated IO latency (usec)",
 		.is_time = 1,
 		.category = FIO_OPT_C_IO,
@@ -3185,7 +3187,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "latency_target",
 		.lname	= "Latency Target (usec)",
 		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= td_var_offset(latency_target),
+		.off1	= offsetof(struct thread_options, latency_target),
 		.help	= "Ramp to max queue depth supporting this latency",
 		.is_time = 1,
 		.category = FIO_OPT_C_IO,
@@ -3195,7 +3197,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "latency_window",
 		.lname	= "Latency Window (usec)",
 		.type	= FIO_OPT_STR_VAL_TIME,
-		.off1	= td_var_offset(latency_window),
+		.off1	= offsetof(struct thread_options, latency_window),
 		.help	= "Time to sustain latency_target",
 		.is_time = 1,
 		.category = FIO_OPT_C_IO,
@@ -3205,7 +3207,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "latency_percentile",
 		.lname	= "Latency Percentile",
 		.type	= FIO_OPT_FLOAT_LIST,
-		.off1	= td_var_offset(latency_percentile),
+		.off1	= offsetof(struct thread_options, latency_percentile),
 		.help	= "Percentile of IOs must be below latency_target",
 		.def	= "100",
 		.maxlen	= 1,
@@ -3218,7 +3220,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "invalidate",
 		.lname	= "Cache invalidate",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(invalidate_cache),
+		.off1	= offsetof(struct thread_options, invalidate_cache),
 		.help	= "Invalidate buffer/page cache prior to running job",
 		.def	= "1",
 		.category = FIO_OPT_C_IO,
@@ -3228,7 +3230,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "sync",
 		.lname	= "Synchronous I/O",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(sync_io),
+		.off1	= offsetof(struct thread_options, sync_io),
 		.help	= "Use O_SYNC for buffered writes",
 		.def	= "0",
 		.parent = "buffered",
@@ -3240,7 +3242,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "create_serialize",
 		.lname	= "Create serialize",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(create_serialize),
+		.off1	= offsetof(struct thread_options, create_serialize),
 		.help	= "Serialize creation of job files",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
@@ -3250,7 +3252,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "create_fsync",
 		.lname	= "Create fsync",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(create_fsync),
+		.off1	= offsetof(struct thread_options, create_fsync),
 		.help	= "fsync file after creation",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
@@ -3260,7 +3262,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "create_on_open",
 		.lname	= "Create on open",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(create_on_open),
+		.off1	= offsetof(struct thread_options, create_on_open),
 		.help	= "Create files when they are opened for IO",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3270,7 +3272,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "create_only",
 		.lname	= "Create Only",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(create_only),
+		.off1	= offsetof(struct thread_options, create_only),
 		.help	= "Only perform file creation phase",
 		.category = FIO_OPT_C_FILE,
 		.def	= "0",
@@ -3279,7 +3281,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "allow_file_create",
 		.lname	= "Allow file create",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(allow_create),
+		.off1	= offsetof(struct thread_options, allow_create),
 		.help	= "Permit fio to create files, if they don't exist",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
@@ -3289,7 +3291,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "allow_mounted_write",
 		.lname	= "Allow mounted write",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(allow_mounted_write),
+		.off1	= offsetof(struct thread_options, allow_mounted_write),
 		.help	= "Allow writes to a mounted partition",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3299,7 +3301,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "pre_read",
 		.lname	= "Pre-read files",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(pre_read),
+		.off1	= offsetof(struct thread_options, pre_read),
 		.help	= "Pre-read files before starting official testing",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3311,7 +3313,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "CPU mask",
 		.type	= FIO_OPT_INT,
 		.cb	= str_cpumask_cb,
-		.off1	= td_var_offset(cpumask),
+		.off1	= offsetof(struct thread_options, cpumask),
 		.help	= "CPU affinity mask",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
@@ -3321,7 +3323,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "CPUs allowed",
 		.type	= FIO_OPT_STR,
 		.cb	= str_cpus_allowed_cb,
-		.off1	= td_var_offset(cpumask),
+		.off1	= offsetof(struct thread_options, cpumask),
 		.help	= "Set CPUs allowed",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
@@ -3330,7 +3332,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "cpus_allowed_policy",
 		.lname	= "CPUs allowed distribution policy",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(cpus_allowed_policy),
+		.off1	= offsetof(struct thread_options, cpus_allowed_policy),
 		.help	= "Distribution policy for cpus_allowed",
 		.parent = "cpus_allowed",
 		.prio	= 1,
@@ -3373,7 +3375,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "NUMA CPU Nodes",
 		.type	= FIO_OPT_STR,
 		.cb	= str_numa_cpunodes_cb,
-		.off1	= td_var_offset(numa_cpunodes),
+		.off1	= offsetof(struct thread_options, numa_cpunodes),
 		.help	= "NUMA CPU nodes bind",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
@@ -3383,7 +3385,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "NUMA Memory Policy",
 		.type	= FIO_OPT_STR,
 		.cb	= str_numa_mpol_cb,
-		.off1	= td_var_offset(numa_memnodes),
+		.off1	= offsetof(struct thread_options, numa_memnodes),
 		.help	= "NUMA memory policy setup",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
@@ -3406,7 +3408,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "end_fsync",
 		.lname	= "End fsync",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(end_fsync),
+		.off1	= offsetof(struct thread_options, end_fsync),
 		.help	= "Include fsync at the end of job",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3416,7 +3418,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "fsync_on_close",
 		.lname	= "Fsync on close",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(fsync_on_close),
+		.off1	= offsetof(struct thread_options, fsync_on_close),
 		.help	= "fsync files on close",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3426,7 +3428,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "unlink",
 		.lname	= "Unlink file",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(unlink),
+		.off1	= offsetof(struct thread_options, unlink),
 		.help	= "Unlink created files after job has completed",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3436,7 +3438,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "unlink_each_loop",
 		.lname	= "Unlink file after each loop of a job",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(unlink_each_loop),
+		.off1	= offsetof(struct thread_options, unlink_each_loop),
 		.help	= "Unlink created files after each loop in a job has completed",
 		.def	= "0",
 		.category = FIO_OPT_C_FILE,
@@ -3455,7 +3457,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "exitall_on_error",
 		.lname	= "Exit-all on terminate in error",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(exitall_error),
+		.off1	= offsetof(struct thread_options, exitall_error),
 		.help	= "Terminate all jobs when one exits in error",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_PROCESS,
@@ -3465,7 +3467,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Wait for previous",
 		.alias	= "wait_for_previous",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(stonewall),
+		.off1	= offsetof(struct thread_options, stonewall),
 		.help	= "Insert a hard barrier between this job and previous",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_PROCESS,
@@ -3474,7 +3476,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "new_group",
 		.lname	= "New group",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(new_group),
+		.off1	= offsetof(struct thread_options, new_group),
 		.help	= "Mark the start of a new group (for reporting)",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_PROCESS,
@@ -3483,7 +3485,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "thread",
 		.lname	= "Thread",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(use_thread),
+		.off1	= offsetof(struct thread_options, use_thread),
 		.help	= "Use threads instead of processes",
 #ifdef CONFIG_NO_SHM
 		.def	= "1",
@@ -3496,7 +3498,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "per_job_logs",
 		.lname	= "Per Job Logs",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(per_job_logs),
+		.off1	= offsetof(struct thread_options, per_job_logs),
 		.help	= "Include job number in generated log files or not",
 		.def	= "1",
 		.category = FIO_OPT_C_LOG,
@@ -3506,7 +3508,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_bw_log",
 		.lname	= "Write bandwidth log",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(bw_log_file),
+		.off1	= offsetof(struct thread_options, bw_log_file),
 		.help	= "Write log of bandwidth during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3515,7 +3517,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_lat_log",
 		.lname	= "Write latency log",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(lat_log_file),
+		.off1	= offsetof(struct thread_options, lat_log_file),
 		.help	= "Write log of latency during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3524,7 +3526,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_iops_log",
 		.lname	= "Write IOPS log",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(iops_log_file),
+		.off1	= offsetof(struct thread_options, iops_log_file),
 		.help	= "Write log of IOPS during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3533,7 +3535,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_avg_msec",
 		.lname	= "Log averaging (msec)",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(log_avg_msec),
+		.off1	= offsetof(struct thread_options, log_avg_msec),
 		.help	= "Average bw/iops/lat logs over this period of time",
 		.def	= "0",
 		.category = FIO_OPT_C_LOG,
@@ -3543,7 +3545,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_hist_msec",
 		.lname	= "Log histograms (msec)",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(log_hist_msec),
+		.off1	= offsetof(struct thread_options, log_hist_msec),
 		.help	= "Dump completion latency histograms at frequency of this time value",
 		.def	= "0",
 		.category = FIO_OPT_C_LOG,
@@ -3553,7 +3555,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_hist_coarseness",
 		.lname	= "Histogram logs coarseness",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(log_hist_coarseness),
+		.off1	= offsetof(struct thread_options, log_hist_coarseness),
 		.help	= "Integer in range [0,6]. Higher coarseness outputs"
 			" fewer histogram bins per sample. The number of bins for"
 			" these are [1216, 608, 304, 152, 76, 38, 19] respectively.",
@@ -3565,7 +3567,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "write_hist_log",
 		.lname	= "Write latency histogram logs",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(hist_log_file),
+		.off1	= offsetof(struct thread_options, hist_log_file),
 		.help	= "Write log of latency histograms during run",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3574,7 +3576,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_max_value",
 		.lname	= "Log maximum instead of average",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(log_max),
+		.off1	= offsetof(struct thread_options, log_max),
 		.help	= "Log max sample in a window instead of average",
 		.def	= "0",
 		.category = FIO_OPT_C_LOG,
@@ -3584,7 +3586,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_offset",
 		.lname	= "Log offset of IO",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(log_offset),
+		.off1	= offsetof(struct thread_options, log_offset),
 		.help	= "Include offset of IO for each log entry",
 		.def	= "0",
 		.category = FIO_OPT_C_LOG,
@@ -3595,7 +3597,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_compression",
 		.lname	= "Log compression",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(log_gz),
+		.off1	= offsetof(struct thread_options, log_gz),
 		.help	= "Log in compressed chunks of this size",
 		.minval	= 1024ULL,
 		.maxval	= 512 * 1024 * 1024ULL,
@@ -3608,7 +3610,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Log Compression CPUs",
 		.type	= FIO_OPT_STR,
 		.cb	= str_log_cpus_allowed_cb,
-		.off1	= td_var_offset(log_gz_cpumask),
+		.off1	= offsetof(struct thread_options, log_gz_cpumask),
 		.parent = "log_compression",
 		.help	= "Limit log compression to these CPUs",
 		.category = FIO_OPT_C_LOG,
@@ -3626,7 +3628,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "log_store_compressed",
 		.lname	= "Log store compressed",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(log_gz_store),
+		.off1	= offsetof(struct thread_options, log_gz_store),
 		.help	= "Store logs in a compressed format",
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
@@ -3649,7 +3651,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "block_error_percentiles",
 		.lname	= "Block error percentiles",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(block_error_hist),
+		.off1	= offsetof(struct thread_options, block_error_hist),
 		.help	= "Record trim block errors and make a histogram",
 		.def	= "0",
 		.category = FIO_OPT_C_LOG,
@@ -3659,7 +3661,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "bwavgtime",
 		.lname	= "Bandwidth average time",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(bw_avg_time),
+		.off1	= offsetof(struct thread_options, bw_avg_time),
 		.help	= "Time window over which to calculate bandwidth"
 			  " (msec)",
 		.def	= "500",
@@ -3673,7 +3675,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "iopsavgtime",
 		.lname	= "IOPS average time",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(iops_avg_time),
+		.off1	= offsetof(struct thread_options, iops_avg_time),
 		.help	= "Time window over which to calculate IOPS (msec)",
 		.def	= "500",
 		.parent	= "write_iops_log",
@@ -3686,7 +3688,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "group_reporting",
 		.lname	= "Group reporting",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(group_reporting),
+		.off1	= offsetof(struct thread_options, group_reporting),
 		.help	= "Do reporting on a per-group basis",
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
@@ -3695,7 +3697,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "zero_buffers",
 		.lname	= "Zero I/O buffers",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(zero_buffers),
+		.off1	= offsetof(struct thread_options, zero_buffers),
 		.help	= "Init IO buffers to all zeroes",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,
@@ -3704,7 +3706,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "refill_buffers",
 		.lname	= "Refill I/O buffers",
 		.type	= FIO_OPT_STR_SET,
-		.off1	= td_var_offset(refill_buffers),
+		.off1	= offsetof(struct thread_options, refill_buffers),
 		.help	= "Refill IO buffers on every IO submit",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,
@@ -3713,7 +3715,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "scramble_buffers",
 		.lname	= "Scramble I/O buffers",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(scramble_buffers),
+		.off1	= offsetof(struct thread_options, scramble_buffers),
 		.help	= "Slightly scramble buffers on every IO submit",
 		.def	= "1",
 		.category = FIO_OPT_C_IO,
@@ -3724,7 +3726,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Buffer pattern",
 		.type	= FIO_OPT_STR,
 		.cb	= str_buffer_pattern_cb,
-		.off1	= td_var_offset(buffer_pattern),
+		.off1	= offsetof(struct thread_options, buffer_pattern),
 		.help	= "Fill pattern for IO buffers",
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_IO_BUF,
@@ -3734,7 +3736,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Buffer compression percentage",
 		.type	= FIO_OPT_INT,
 		.cb	= str_buffer_compress_cb,
-		.off1	= td_var_offset(compress_percentage),
+		.off1	= offsetof(struct thread_options, compress_percentage),
 		.maxval	= 100,
 		.minval	= 0,
 		.help	= "How compressible the buffer is (approximately)",
@@ -3746,7 +3748,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "buffer_compress_chunk",
 		.lname	= "Buffer compression chunk size",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(compress_chunk),
+		.off1	= offsetof(struct thread_options, compress_chunk),
 		.parent	= "buffer_compress_percentage",
 		.hide	= 1,
 		.help	= "Size of compressible region in buffer",
@@ -3759,7 +3761,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Dedupe percentage",
 		.type	= FIO_OPT_INT,
 		.cb	= str_dedupe_cb,
-		.off1	= td_var_offset(dedupe_percentage),
+		.off1	= offsetof(struct thread_options, dedupe_percentage),
 		.maxval	= 100,
 		.minval	= 0,
 		.help	= "Percentage of buffers that are dedupable",
@@ -3771,7 +3773,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "clat_percentiles",
 		.lname	= "Completion latency percentiles",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(clat_percentiles),
+		.off1	= offsetof(struct thread_options, clat_percentiles),
 		.help	= "Enable the reporting of completion latency percentiles",
 		.def	= "1",
 		.category = FIO_OPT_C_STAT,
@@ -3781,8 +3783,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "percentile_list",
 		.lname	= "Percentile list",
 		.type	= FIO_OPT_FLOAT_LIST,
-		.off1	= td_var_offset(percentile_list),
-		.off2	= td_var_offset(percentile_precision),
+		.off1	= offsetof(struct thread_options, percentile_list),
+		.off2	= offsetof(struct thread_options, percentile_precision),
 		.help	= "Specify a custom list of percentiles to report for "
 			  "completion latency and block errors",
 		.def    = "1:5:10:20:30:40:50:60:70:80:90:95:99:99.5:99.9:99.95:99.99",
@@ -3798,7 +3800,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "disk_util",
 		.lname	= "Disk utilization",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(do_disk_util),
+		.off1	= offsetof(struct thread_options, do_disk_util),
 		.help	= "Log disk utilization statistics",
 		.def	= "1",
 		.category = FIO_OPT_C_STAT,
@@ -3827,7 +3829,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "disable_lat",
 		.lname	= "Disable all latency stats",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(disable_lat),
+		.off1	= offsetof(struct thread_options, disable_lat),
 		.help	= "Disable latency numbers",
 		.parent	= "gtod_reduce",
 		.hide	= 1,
@@ -3839,7 +3841,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "disable_clat",
 		.lname	= "Disable completion latency stats",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(disable_clat),
+		.off1	= offsetof(struct thread_options, disable_clat),
 		.help	= "Disable completion latency numbers",
 		.parent	= "gtod_reduce",
 		.hide	= 1,
@@ -3851,7 +3853,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "disable_slat",
 		.lname	= "Disable submission latency stats",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(disable_slat),
+		.off1	= offsetof(struct thread_options, disable_slat),
 		.help	= "Disable submission latency numbers",
 		.parent	= "gtod_reduce",
 		.hide	= 1,
@@ -3863,7 +3865,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "disable_bw_measurement",
 		.lname	= "Disable bandwidth stats",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(disable_bw),
+		.off1	= offsetof(struct thread_options, disable_bw),
 		.help	= "Disable bandwidth logging",
 		.parent	= "gtod_reduce",
 		.hide	= 1,
@@ -3875,7 +3877,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "gtod_cpu",
 		.lname	= "Dedicated gettimeofday() CPU",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(gtod_cpu),
+		.off1	= offsetof(struct thread_options, gtod_cpu),
 		.help	= "Set up dedicated gettimeofday() thread on this CPU",
 		.verify	= gtod_cpu_verify,
 		.category = FIO_OPT_C_GENERAL,
@@ -3885,7 +3887,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "unified_rw_reporting",
 		.lname	= "Unified RW Reporting",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(unified_rw_rep),
+		.off1	= offsetof(struct thread_options, unified_rw_rep),
 		.help	= "Unify reporting across data direction",
 		.def	= "0",
 		.category = FIO_OPT_C_GENERAL,
@@ -3895,7 +3897,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "continue_on_error",
 		.lname	= "Continue on error",
 		.type	= FIO_OPT_STR,
-		.off1	= td_var_offset(continue_on_error),
+		.off1	= offsetof(struct thread_options, continue_on_error),
 		.help	= "Continue on non-fatal errors during IO",
 		.def	= "none",
 		.category = FIO_OPT_C_GENERAL,
@@ -3940,7 +3942,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Ignore Error",
 		.type	= FIO_OPT_STR,
 		.cb	= str_ignore_error_cb,
-		.off1	= td_var_offset(ignore_error_nr),
+		.off1	= offsetof(struct thread_options, ignore_error_nr),
 		.help	= "Set a specific list of errors to ignore",
 		.parent	= "rw",
 		.category = FIO_OPT_C_GENERAL,
@@ -3950,7 +3952,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "error_dump",
 		.lname	= "Error Dump",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(error_dump),
+		.off1	= offsetof(struct thread_options, error_dump),
 		.def	= "0",
 		.help	= "Dump info on each error",
 		.category = FIO_OPT_C_GENERAL,
@@ -3960,7 +3962,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "profile",
 		.lname	= "Profile",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(profile),
+		.off1	= offsetof(struct thread_options, profile),
 		.help	= "Select a specific builtin performance test",
 		.category = FIO_OPT_C_PROFILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -3969,7 +3971,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "cgroup",
 		.lname	= "Cgroup",
 		.type	= FIO_OPT_STR_STORE,
-		.off1	= td_var_offset(cgroup),
+		.off1	= offsetof(struct thread_options, cgroup),
 		.help	= "Add job to cgroup of this name",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CGROUP,
@@ -3978,7 +3980,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "cgroup_nodelete",
 		.lname	= "Cgroup no-delete",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(cgroup_nodelete),
+		.off1	= offsetof(struct thread_options, cgroup_nodelete),
 		.help	= "Do not delete cgroups after job completion",
 		.def	= "0",
 		.parent	= "cgroup",
@@ -3989,7 +3991,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "cgroup_weight",
 		.lname	= "Cgroup weight",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(cgroup_weight),
+		.off1	= offsetof(struct thread_options, cgroup_weight),
 		.help	= "Use given weight for cgroup",
 		.minval = 100,
 		.maxval	= 1000,
@@ -4001,7 +4003,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "uid",
 		.lname	= "User ID",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(uid),
+		.off1	= offsetof(struct thread_options, uid),
 		.help	= "Run job with this user ID",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
@@ -4010,7 +4012,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "gid",
 		.lname	= "Group ID",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(gid),
+		.off1	= offsetof(struct thread_options, gid),
 		.help	= "Run job with this group ID",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
@@ -4019,7 +4021,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "kb_base",
 		.lname	= "KB Base",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(kb_base),
+		.off1	= offsetof(struct thread_options, kb_base),
 		.prio	= 1,
 		.def	= "1024",
 		.posval = {
@@ -4040,7 +4042,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "unit_base",
 		.lname	= "Base unit for reporting (Bits or Bytes)",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(unit_base),
+		.off1	= offsetof(struct thread_options, unit_base),
 		.prio	= 1,
 		.posval = {
 			  { .ival = "0",
@@ -4064,7 +4066,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "hugepage-size",
 		.lname	= "Hugepage size",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(hugepage_size),
+		.off1	= offsetof(struct thread_options, hugepage_size),
 		.help	= "When using hugepages, specify size of each page",
 		.def	= __fio_stringify(FIO_HUGE_PAGE),
 		.interval = 1024 * 1024,
@@ -4075,7 +4077,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "flow_id",
 		.lname	= "I/O flow ID",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(flow_id),
+		.off1	= offsetof(struct thread_options, flow_id),
 		.help	= "The flow index ID to use",
 		.def	= "0",
 		.category = FIO_OPT_C_IO,
@@ -4085,7 +4087,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "flow",
 		.lname	= "I/O flow weight",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(flow),
+		.off1	= offsetof(struct thread_options, flow),
 		.help	= "Weight for flow control of this job",
 		.parent	= "flow_id",
 		.hide	= 1,
@@ -4097,7 +4099,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "flow_watermark",
 		.lname	= "I/O flow watermark",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(flow_watermark),
+		.off1	= offsetof(struct thread_options, flow_watermark),
 		.help	= "High watermark for flow control. This option"
 			" should be set to the same value for all threads"
 			" with non-zero flow.",
@@ -4111,7 +4113,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "flow_sleep",
 		.lname	= "I/O flow sleep",
 		.type	= FIO_OPT_INT,
-		.off1	= td_var_offset(flow_sleep),
+		.off1	= offsetof(struct thread_options, flow_sleep),
 		.help	= "How many microseconds to sleep after being held"
 			" back by the flow control mechanism",
 		.parent	= "flow_id",
@@ -4124,7 +4126,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "skip_bad",
 		.lname	= "Skip operations against bad blocks",
 		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(skip_bad),
+		.off1	= offsetof(struct thread_options, skip_bad),
 		.help	= "Skip operations against known bad blocks.",
 		.hide	= 1,
 		.def	= "0",
@@ -4474,7 +4476,7 @@ int fio_options_parse(struct thread_data *td, char **opts, int num_opts)
 	for (ret = 0, i = 0, unknown = 0; i < num_opts; i++) {
 		struct fio_option *o;
 		int newret = parse_option(opts_copy[i], opts[i], fio_options,
-						&o, td, &td->opt_list);
+						&o, &td->o, &td->opt_list);
 
 		if (!newret && o)
 			fio_option_mark_set(&td->o, o);
@@ -4527,7 +4529,7 @@ int fio_cmd_option_parse(struct thread_data *td, const char *opt, char *val)
 {
 	int ret;
 
-	ret = parse_cmd_option(opt, val, fio_options, td, &td->opt_list);
+	ret = parse_cmd_option(opt, val, fio_options, &td->o, &td->opt_list);
 	if (!ret) {
 		struct fio_option *o;
 
@@ -4549,7 +4551,7 @@ int fio_cmd_ioengine_option_parse(struct thread_data *td, const char *opt,
 void fio_fill_default_options(struct thread_data *td)
 {
 	td->o.magic = OPT_MAGIC;
-	fill_default_options(td, fio_options);
+	fill_default_options(&td->o, fio_options);
 }
 
 int fio_show_option_help(const char *opt)
@@ -4590,7 +4592,8 @@ void fio_options_mem_dupe(struct thread_data *td)
 
 unsigned int fio_get_kb_base(void *data)
 {
-	struct thread_options *o = data;
+	struct thread_data *td = cb_data_to_td(data);
+	struct thread_options *o = &td->o;
 	unsigned int kb_base = 0;
 
 	/*
@@ -4686,7 +4689,7 @@ void del_opt_posval(const char *optname, const char *ival)
 
 void fio_options_free(struct thread_data *td)
 {
-	options_free(fio_options, td);
+	options_free(fio_options, &td->o);
 	if (td->eo && td->io_ops && td->io_ops->options) {
 		options_free(td->io_ops->options, td->eo);
 		free(td->eo);
diff --git a/options.h b/options.h
index 539a636..83a58e2 100644
--- a/options.h
+++ b/options.h
@@ -9,8 +9,6 @@
 #include "flist.h"
 #include "lib/types.h"
 
-#define td_var_offset(var)	((size_t) &((struct thread_options *)0)->var)
-
 int add_option(struct fio_option *);
 void invalidate_profile_options(const char *);
 extern char *exec_profile;
@@ -19,7 +17,6 @@ void add_opt_posval(const char *, const char *, const char *);
 void del_opt_posval(const char *, const char *);
 struct thread_data;
 void fio_options_free(struct thread_data *);
-char *get_name_idx(char *, int);
 int set_name_idx(char *, size_t, char *, int, bool);
 
 extern char client_sockaddr_str[];  /* used with --client option */
@@ -30,7 +27,7 @@ extern bool __fio_option_is_set(struct thread_options *, unsigned int off);
 
 #define fio_option_is_set(__td, name)					\
 ({									\
-	const unsigned int off = td_var_offset(name);			\
+	const unsigned int off = offsetof(struct thread_options, name);	\
 	bool __r = __fio_option_is_set((__td), off);			\
 	__r;								\
 })
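The options.h change above drops the hand-rolled `td_var_offset()` macro, a null-pointer member-address trick. In its place it uses the standard `offsetof()` from `<stddef.h>`.

The following is a minimal sketch, not fio's code. It shows how an option table can record each field's byte offset and then write through that offset generically, in the spirit of `td_var()`. The struct `demo_options`, the descriptor `demo_opt` and the helpers `demo_var()` and `demo_set_uint()` are hypothetical.

The sketch does its arithmetic on a `char *`. Adding an offset to a `void *`, as the patched `td_var()` does with `ret + offset`, relies on a GNU C extension rather than ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fio's struct thread_options. */
struct demo_options {
	unsigned int flow_id;
	unsigned int flow;
	unsigned int flow_sleep;
};

/* Hypothetical option descriptor: option name plus the field's byte offset. */
struct demo_opt {
	const char *name;
	size_t off1;
};

static const struct demo_opt demo_table[] = {
	{ "flow_id",    offsetof(struct demo_options, flow_id) },
	{ "flow",       offsetof(struct demo_options, flow) },
	{ "flow_sleep", offsetof(struct demo_options, flow_sleep) },
};

/* Return a pointer to the member that lives 'offset' bytes into 'base'. */
static void *demo_var(void *base, size_t offset)
{
	/* char * arithmetic is ISO C; void * arithmetic is a GNU extension. */
	return (char *) base + offset;
}

/* Find option 'name' and store 'val' into the field it describes. */
static int demo_set_uint(struct demo_options *o, const char *name,
			 unsigned int val)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (!strcmp(demo_table[i].name, name)) {
			unsigned int *field = demo_var(o, demo_table[i].off1);

			*field = val;
			return 0;
		}
	}
	return 1;	/* unknown option name */
}
```

For example, `demo_set_uint(&o, "flow_sleep", 250)` stores 250 in `o.flow_sleep` and returns 0. An unknown name leaves the struct untouched and returns 1.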
diff --git a/oslib/libmtd.c b/oslib/libmtd.c
index 5c9eac2..5b22d6a 100644
--- a/oslib/libmtd.c
+++ b/oslib/libmtd.c
@@ -1190,7 +1190,7 @@ int mtd_write(libmtd_t desc, const struct mtd_dev_info *mtd, int fd, int eb,
 	return 0;
 }
 
-int do_oob_op(libmtd_t desc, const struct mtd_dev_info *mtd, int fd,
+static int do_oob_op(libmtd_t desc, const struct mtd_dev_info *mtd, int fd,
 	      uint64_t start, uint64_t length, void *data, unsigned int cmd64,
 	      unsigned int cmd)
 {
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 3a415dd..2bbd14a 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -6,6 +6,7 @@
 #include <unistd.h>
 
 #include "../os/os.h"
+#include "oslib/linux-dev-lookup.h"
 
 int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 			   unsigned int min)
diff --git a/oslib/strlcat.c b/oslib/strlcat.c
index 643d496..3329b83 100644
--- a/oslib/strlcat.c
+++ b/oslib/strlcat.c
@@ -1,4 +1,5 @@
 #include <string.h>
+#include "oslib/strlcat.h"
 
 size_t strlcat(char *dst, const char *src, size_t size)
 {
diff --git a/parse.c b/parse.c
index 086f786..8ed4619 100644
--- a/parse.c
+++ b/parse.c
@@ -1250,7 +1250,7 @@ void fill_default_options(void *data, struct fio_option *options)
 			handle_option(o, o->def, data);
 }
 
-void option_init(struct fio_option *o)
+static void option_init(struct fio_option *o)
 {
 	if (o->type == FIO_OPT_DEPRECATED || o->type == FIO_OPT_UNSUPPORTED)
 		return;
diff --git a/parse.h b/parse.h
index aa00a67..d852ddc 100644
--- a/parse.h
+++ b/parse.h
@@ -80,14 +80,11 @@ struct fio_option {
 	int pow2;			/* must be a power-of-2 */
 };
 
-typedef int (str_cb_fn)(void *, char *);
-
 extern int parse_option(char *, const char *, struct fio_option *, struct fio_option **, void *, struct flist_head *);
 extern void sort_options(char **, struct fio_option *, int);
 extern int parse_cmd_option(const char *t, const char *l, struct fio_option *, void *, struct flist_head *);
 extern int show_cmd_help(struct fio_option *, const char *);
 extern void fill_default_options(void *, struct fio_option *);
-extern void option_init(struct fio_option *);
 extern void options_init(struct fio_option *);
 extern void options_free(struct fio_option *, void *);
 
@@ -107,18 +104,19 @@ extern int string_distance_ok(const char *s1, int dist);
 typedef int (fio_opt_str_fn)(void *, const char *);
 typedef int (fio_opt_str_val_fn)(void *, long long *);
 typedef int (fio_opt_int_fn)(void *, int *);
-typedef int (fio_opt_str_set_fn)(void *);
-
-#define __td_var(start, offset)	((char *) start + (offset))
 
 struct thread_options;
 static inline void *td_var(struct thread_options *to, struct fio_option *o,
 			   unsigned int offset)
 {
+	void *ret;
+
 	if (o->prof_opts)
-		return __td_var(o->prof_opts, offset);
+		ret = o->prof_opts;
+	else
+		ret = to;
 
-	return __td_var(to, offset);
+	return ret + offset;
 }
 
 static inline int parse_is_percent(unsigned long long val)
diff --git a/rate-submit.c b/rate-submit.c
index 48b7a58..2efbdcb 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -110,9 +110,6 @@ static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 	if (ioengine_load(td))
 		goto err;
 
-	if (td->o.odirect)
-		td->io_ops->flags |= FIO_RAWIO;
-
 	td->pid = gettid();
 
 	INIT_FLIST_HEAD(&td->io_log_list);
diff --git a/server.c b/server.c
index 2fd9b45..9f2220d 100644
--- a/server.c
+++ b/server.c
@@ -622,7 +622,7 @@ static int fio_net_queue_quit(void)
 {
 	dprint(FD_NET, "server: sending quit\n");
 
-	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, 0, SK_F_SIMPLE);
+	return fio_net_queue_cmd(FIO_NET_CMD_QUIT, NULL, 0, NULL, SK_F_SIMPLE);
 }
 
 int fio_net_send_quit(int sk)
@@ -1883,7 +1883,7 @@ void fio_server_send_start(struct thread_data *td)
 
 	assert(sk_out->sk != -1);
 
-	fio_net_queue_cmd(FIO_NET_CMD_SERVER_START, NULL, 0, 0, SK_F_SIMPLE);
+	fio_net_queue_cmd(FIO_NET_CMD_SERVER_START, NULL, 0, NULL, SK_F_SIMPLE);
 }
 
 int fio_server_get_verify_state(const char *name, int threadnumber,
diff --git a/stat.c b/stat.c
index ef9fe7d..6f5f002 100644
--- a/stat.c
+++ b/stat.c
@@ -257,13 +257,13 @@ out:
 		free(ovals);
 }
 
-int calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max,
-	     double *mean, double *dev)
+bool calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max,
+	      double *mean, double *dev)
 {
 	double n = (double) is->samples;
 
 	if (n == 0)
-		return 0;
+		return false;
 
 	*min = is->min_val;
 	*max = is->max_val;
@@ -274,7 +274,7 @@ int calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max,
 	else
 		*dev = 0;
 
-	return 1;
+	return true;
 }
 
 void show_group_stats(struct group_run_stats *rs, struct buf_output *out)
@@ -364,7 +364,7 @@ static void display_lat(const char *name, unsigned long min, unsigned long max,
 	const char *base = "(usec)";
 	char *minp, *maxp;
 
-	if (!usec_to_msec(&min, &max, &mean, &dev))
+	if (usec_to_msec(&min, &max, &mean, &dev))
 		base = "(msec)";
 
 	minp = num2str(min, 6, 1, 0, 0);
@@ -1090,8 +1090,8 @@ static void show_thread_status_terse_v3_v4(struct thread_stat *ts,
 	log_buf(out, "\n");
 }
 
-void json_add_job_opts(struct json_object *root, const char *name,
-		       struct flist_head *opt_list, bool num_jobs)
+static void json_add_job_opts(struct json_object *root, const char *name,
+			      struct flist_head *opt_list, bool num_jobs)
 {
 	struct json_object *dir_object;
 	struct flist_head *entry;
diff --git a/stat.h b/stat.h
index 86f1a0b..c3e343d 100644
--- a/stat.h
+++ b/stat.h
@@ -249,7 +249,7 @@ extern void stat_exit(void);
 
 extern struct json_object * show_thread_status(struct thread_stat *ts, struct group_run_stats *rs, struct flist_head *, struct buf_output *);
 extern void show_group_stats(struct group_run_stats *rs, struct buf_output *);
-extern int calc_thread_status(struct jobs_eta *je, int force);
+extern bool calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void show_run_stats(void);
 extern void __show_run_stats(void);
@@ -261,7 +261,7 @@ extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats
 extern void init_thread_stat(struct thread_stat *ts);
 extern void init_group_run_stat(struct group_run_stats *gs);
 extern void eta_to_str(char *str, unsigned long eta_sec);
-extern int calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max, double *mean, double *dev);
+extern bool calc_lat(struct io_stat *is, unsigned long *min, unsigned long *max, double *mean, double *dev);
 extern unsigned int calc_clat_percentiles(unsigned int *io_u_plat, unsigned long nr, fio_fp64_t *plist, unsigned int **output, unsigned int *maxv, unsigned int *minv);
 extern void stat_calc_lat_m(struct thread_stat *ts, double *io_u_lat);
 extern void stat_calc_lat_u(struct thread_stat *ts, double *io_u_lat);
@@ -286,18 +286,18 @@ extern int calc_log_samples(void);
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
 extern int write_bw_log;
 
-static inline int usec_to_msec(unsigned long *min, unsigned long *max,
-			       double *mean, double *dev)
+static inline bool usec_to_msec(unsigned long *min, unsigned long *max,
+				double *mean, double *dev)
 {
 	if (*min > 1000 && *max > 1000 && *mean > 1000.0 && *dev > 1000.0) {
 		*min /= 1000;
 		*max /= 1000;
 		*mean /= 1000.0;
 		*dev /= 1000.0;
-		return 0;
+		return true;
 	}
 
-	return 1;
+	return false;
 }
 /*
  * Worst level condensing would be 1:5, so allow enough room for that
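The stat changes make `usec_to_msec()` return `bool` with the natural meaning: `true` when the values were scaled to milliseconds. Before the patch it returned 0 on conversion, so `display_lat()` had to test `!usec_to_msec(...)`.

The sketch below repeats the patched function. It adds a hypothetical caller, `demo_lat_unit()`, shaped like `display_lat()`, to show how the unit label is chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Same logic as the patched usec_to_msec(): true means "values were scaled". */
static inline bool usec_to_msec(unsigned long *min, unsigned long *max,
				double *mean, double *dev)
{
	if (*min > 1000 && *max > 1000 && *mean > 1000.0 && *dev > 1000.0) {
		*min /= 1000;
		*max /= 1000;
		*mean /= 1000.0;
		*dev /= 1000.0;
		return true;
	}

	return false;
}

/* Hypothetical caller modelled on display_lat(): pick the unit label. */
static const char *demo_lat_unit(unsigned long *min, unsigned long *max,
				 double *mean, double *dev)
{
	const char *base = "(usec)";

	if (usec_to_msec(min, max, mean, dev))
		base = "(msec)";

	return base;
}
```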
diff --git a/verify.c b/verify.c
index 40cfbab..790ab31 100644
--- a/verify.c
+++ b/verify.c
@@ -41,13 +41,14 @@ void fill_buffer_pattern(struct thread_data *td, void *p, unsigned int len)
 	(void)cpy_pattern(td->o.buffer_pattern, td->o.buffer_pattern_bytes, p, len);
 }
 
-void __fill_buffer(struct thread_options *o, unsigned long seed, void *p,
-		   unsigned int len)
+static void __fill_buffer(struct thread_options *o, unsigned long seed, void *p,
+			  unsigned int len)
 {
 	__fill_random_buf_percentage(seed, p, o->compress_percentage, len, len, o->buffer_pattern, o->buffer_pattern_bytes);
 }
 
-unsigned long fill_buffer(struct thread_data *td, void *p, unsigned int len)
+static unsigned long fill_buffer(struct thread_data *td, void *p,
+				 unsigned int len)
 {
 	struct frand_state *fs = &td->verify_state;
 	struct thread_options *o = &td->o;
@@ -802,7 +803,7 @@ int verify_io_u(struct thread_data *td, struct io_u **io_u_ptr)
 	 * If the IO engine is faking IO (like null), then just pretend
 	 * we verified everything.
 	 */
-	if (td->io_ops->flags & FIO_FAKEIO)
+	if (td_ioengine_flagged(td, FIO_FAKEIO))
 		return 0;
 
 	if (io_u->flags & IO_U_F_TRIMMED) {

* Recent changes (master)
@ 2016-08-15 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-15 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 9973b0f961a57c19f885ffca05f86ae6ef85f8c7:

  iolog: silence warning on pointer cast on 32-bit compiles (2016-08-08 11:32:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 1651e4310feb3eab7c7c8cf0bd23d159cb410628:

  Only enable atomic io_u flag setting/clearing if we need it (2016-08-14 21:31:16 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Only enable atomic io_u flag setting/clearing if we need it

 backend.c     |  4 ++--
 fio.h         | 20 +++++++++++++++++++-
 io_u.c        | 20 ++++++++++----------
 ioengine.h    | 13 ++++---------
 ioengines.c   |  2 +-
 rate-submit.c |  4 ++--
 verify.c      |  8 ++++----
 7 files changed, 42 insertions(+), 29 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 6bf5d67..c051c13 100644
--- a/backend.c
+++ b/backend.c
@@ -695,7 +695,7 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 					continue;
 				} else if (io_u->ddir == DDIR_TRIM) {
 					io_u->ddir = DDIR_READ;
-					io_u_set(io_u, IO_U_F_TRIMMED);
+					io_u_set(td, io_u, IO_U_F_TRIMMED);
 					break;
 				} else if (io_u->ddir == DDIR_WRITE) {
 					io_u->ddir = DDIR_READ;
@@ -1432,7 +1432,7 @@ static uint64_t do_dry_run(struct thread_data *td)
 		if (IS_ERR_OR_NULL(io_u))
 			break;
 
-		io_u_set(io_u, IO_U_F_FLIGHT);
+		io_u_set(td, io_u, IO_U_F_FLIGHT);
 		io_u->error = 0;
 		io_u->resid = 0;
 		if (ddir_rw(acct_ddir(io_u)))
diff --git a/fio.h b/fio.h
index d929467..7f685ea 100644
--- a/fio.h
+++ b/fio.h
@@ -677,7 +677,7 @@ static inline unsigned int td_min_bs(struct thread_data *td)
 	return min(td->o.min_bs[DDIR_TRIM], min_bs);
 }
 
-static inline int td_async_processing(struct thread_data *td)
+static inline bool td_async_processing(struct thread_data *td)
 {
 	return (td->flags & TD_F_NEED_LOCK) != 0;
 }
@@ -704,6 +704,24 @@ static inline void td_io_u_free_notify(struct thread_data *td)
 		pthread_cond_signal(&td->free_cond);
 }
 
+static inline void td_flags_clear(struct thread_data *td, unsigned int *flags,
+				  unsigned int value)
+{
+	if (!td_async_processing(td))
+		*flags &= ~value;
+	else
+		__sync_fetch_and_and(flags, ~value);
+}
+
+static inline void td_flags_set(struct thread_data *td, unsigned int *flags,
+				unsigned int value)
+{
+	if (!td_async_processing(td))
+		*flags |= value;
+	else
+		__sync_fetch_and_or(flags, value);
+}
+
 extern const char *fio_get_arch_string(int);
 extern const char *fio_get_os_string(int);
 
diff --git a/io_u.c b/io_u.c
index c0790b2..34acc56 100644
--- a/io_u.c
+++ b/io_u.c
@@ -409,7 +409,7 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 				*is_random = 1;
 			} else {
 				*is_random = 0;
-				io_u_set(io_u, IO_U_F_BUSY_OK);
+				io_u_set(td, io_u, IO_U_F_BUSY_OK);
 				ret = get_next_seq_offset(td, f, ddir, &offset);
 				if (ret)
 					ret = get_next_rand_block(td, f, ddir, &b);
@@ -419,7 +419,7 @@ static int get_next_block(struct thread_data *td, struct io_u *io_u,
 			ret = get_next_seq_offset(td, f, ddir, &offset);
 		}
 	} else {
-		io_u_set(io_u, IO_U_F_BUSY_OK);
+		io_u_set(td, io_u, IO_U_F_BUSY_OK);
 		*is_random = 0;
 
 		if (td->o.rw_seq == RW_SEQ_SEQ) {
@@ -772,7 +772,7 @@ static void set_rw_ddir(struct thread_data *td, struct io_u *io_u)
 	    td->o.barrier_blocks &&
 	   !(td->io_issues[DDIR_WRITE] % td->o.barrier_blocks) &&
 	     td->io_issues[DDIR_WRITE])
-		io_u_set(io_u, IO_U_F_BARRIER);
+		io_u_set(td, io_u, IO_U_F_BARRIER);
 }
 
 void put_file_log(struct thread_data *td, struct fio_file *f)
@@ -794,7 +794,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 		put_file_log(td, io_u->file);
 
 	io_u->file = NULL;
-	io_u_set(io_u, IO_U_F_FREE);
+	io_u_set(td, io_u, IO_U_F_FREE);
 
 	if (io_u->flags & IO_U_F_IN_CUR_DEPTH) {
 		td->cur_depth--;
@@ -807,7 +807,7 @@ void put_io_u(struct thread_data *td, struct io_u *io_u)
 
 void clear_io_u(struct thread_data *td, struct io_u *io_u)
 {
-	io_u_clear(io_u, IO_U_F_FLIGHT);
+	io_u_clear(td, io_u, IO_U_F_FLIGHT);
 	put_io_u(td, io_u);
 }
 
@@ -823,11 +823,11 @@ void requeue_io_u(struct thread_data *td, struct io_u **io_u)
 
 	td_io_u_lock(td);
 
-	io_u_set(__io_u, IO_U_F_FREE);
+	io_u_set(td, __io_u, IO_U_F_FREE);
 	if ((__io_u->flags & IO_U_F_FLIGHT) && ddir_rw(ddir))
 		td->io_issues[ddir]--;
 
-	io_u_clear(__io_u, IO_U_F_FLIGHT);
+	io_u_clear(td, __io_u, IO_U_F_FLIGHT);
 	if (__io_u->flags & IO_U_F_IN_CUR_DEPTH) {
 		td->cur_depth--;
 		assert(!(td->flags & TD_F_CHILD));
@@ -1457,7 +1457,7 @@ again:
 
 	if (io_u) {
 		assert(io_u->flags & IO_U_F_FREE);
-		io_u_clear(io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
+		io_u_clear(td, io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
 				 IO_U_F_TRIMMED | IO_U_F_BARRIER |
 				 IO_U_F_VER_LIST);
 
@@ -1465,7 +1465,7 @@ again:
 		io_u->acct_ddir = -1;
 		td->cur_depth++;
 		assert(!(td->flags & TD_F_CHILD));
-		io_u_set(io_u, IO_U_F_IN_CUR_DEPTH);
+		io_u_set(td, io_u, IO_U_F_IN_CUR_DEPTH);
 		io_u->ipo = NULL;
 	} else if (td_async_processing(td)) {
 		/*
@@ -1803,7 +1803,7 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 	dprint_io_u(io_u, "io complete");
 
 	assert(io_u->flags & IO_U_F_FLIGHT);
-	io_u_clear(io_u, IO_U_F_FLIGHT | IO_U_F_BUSY_OK);
+	io_u_clear(td, io_u, IO_U_F_FLIGHT | IO_U_F_BUSY_OK);
 
 	/*
 	 * Mark IO ok to verify
diff --git a/ioengine.h b/ioengine.h
index ceed329..08e8fab 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -257,14 +257,9 @@ static inline enum fio_ddir acct_ddir(struct io_u *io_u)
 	return io_u->ddir;
 }
 
-static inline void io_u_clear(struct io_u *io_u, unsigned int flags)
-{
-	__sync_fetch_and_and(&io_u->flags, ~flags);
-}
-
-static inline void io_u_set(struct io_u *io_u, unsigned int flags)
-{
-	__sync_fetch_and_or(&io_u->flags, flags);
-}
+#define io_u_clear(td, io_u, val)	\
+	td_flags_clear((td), &(io_u->flags), (val))
+#define io_u_set(td, io_u, val)		\
+	td_flags_set((td), &(io_u)->flags, (val))
 
 #endif
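After this patch, `io_u_set()` and `io_u_clear()` expand to `td_flags_set()` and `td_flags_clear()`. These take the atomic `__sync_fetch_and_or()` / `__sync_fetch_and_and()` path only when `td_async_processing()` reports that `TD_F_NEED_LOCK` is set, that is, when another thread may touch the same `io_u`. Otherwise a plain read-modify-write is sufficient and avoids the cost of an atomic operation.

Here is a self-contained sketch of the same pattern, assuming GCC or Clang for the `__sync` builtins. The `struct demo_td` and its `async` field are hypothetical stand-ins for `struct thread_data` and the `TD_F_NEED_LOCK` test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct thread_data. */
struct demo_td {
	bool async;	/* plays the role of td_async_processing(td) */
};

static void demo_flags_set(struct demo_td *td, unsigned int *flags,
			   unsigned int value)
{
	if (!td->async)
		*flags |= value;			/* single owner: plain RMW */
	else
		__sync_fetch_and_or(flags, value);	/* shared: atomic RMW */
}

static void demo_flags_clear(struct demo_td *td, unsigned int *flags,
			     unsigned int value)
{
	if (!td->async)
		*flags &= ~value;			/* single owner: plain RMW */
	else
		__sync_fetch_and_and(flags, ~value);	/* shared: atomic RMW */
}
```

Both paths produce the same bit pattern. Only the memory-ordering guarantees differ.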
diff --git a/ioengines.c b/ioengines.c
index a06909e..1c7a93b 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -260,7 +260,7 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	fio_ro_check(td, io_u);
 
 	assert((io_u->flags & IO_U_F_FLIGHT) == 0);
-	io_u_set(io_u, IO_U_F_FLIGHT);
+	io_u_set(td, io_u, IO_U_F_FLIGHT);
 
 	assert(fio_file_open(io_u->file));
 
diff --git a/rate-submit.c b/rate-submit.c
index 0c31f29..48b7a58 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -19,7 +19,7 @@ static int io_workqueue_fn(struct submit_worker *sw,
 
 	dprint(FD_RATE, "io_u %p queued by %u\n", io_u, gettid());
 
-	io_u_set(io_u, IO_U_F_NO_FILE_PUT);
+	io_u_set(td, io_u, IO_U_F_NO_FILE_PUT);
 
 	td->cur_depth++;
 
@@ -30,7 +30,7 @@ static int io_workqueue_fn(struct submit_worker *sw,
 		ret = io_u_queued_complete(td, 1);
 		if (ret > 0)
 			td->cur_depth -= ret;
-		io_u_clear(io_u, IO_U_F_FLIGHT);
+		io_u_clear(td, io_u, IO_U_F_FLIGHT);
 	} while (1);
 
 	dprint(FD_RATE, "io_u %p ret %d by %u\n", io_u, ret, gettid());
diff --git a/verify.c b/verify.c
index 9a96fbb..40cfbab 100644
--- a/verify.c
+++ b/verify.c
@@ -651,7 +651,7 @@ int verify_io_u_async(struct thread_data *td, struct io_u **io_u_ptr)
 
 	if (io_u->flags & IO_U_F_IN_CUR_DEPTH) {
 		td->cur_depth--;
-		io_u_clear(io_u, IO_U_F_IN_CUR_DEPTH);
+		io_u_clear(td, io_u, IO_U_F_IN_CUR_DEPTH);
 	}
 	flist_add_tail(&io_u->verify_list, &td->verify_list);
 	*io_u_ptr = NULL;
@@ -1168,10 +1168,10 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		io_u->buflen = ipo->len;
 		io_u->numberio = ipo->numberio;
 		io_u->file = ipo->file;
-		io_u_set(io_u, IO_U_F_VER_LIST);
+		io_u_set(td, io_u, IO_U_F_VER_LIST);
 
 		if (ipo->flags & IP_F_TRIMMED)
-			io_u_set(io_u, IO_U_F_TRIMMED);
+			io_u_set(td, io_u, IO_U_F_TRIMMED);
 
 		if (!fio_file_open(io_u->file)) {
 			int r = td_io_open_file(td, io_u->file);
@@ -1255,7 +1255,7 @@ static void *verify_async_thread(void *data)
 			io_u = flist_first_entry(&list, struct io_u, verify_list);
 			flist_del_init(&io_u->verify_list);
 
-			io_u_set(io_u, IO_U_F_NO_FILE_PUT);
+			io_u_set(td, io_u, IO_U_F_NO_FILE_PUT);
 			ret = verify_io_u(td, &io_u);
 
 			put_io_u(td, io_u);

* Recent changes (master)
@ 2016-08-09 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-09 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 93168285bc564941d832deea172dc1f68de68666:

  stat: fixups to histogram logging (2016-08-07 15:18:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9973b0f961a57c19f885ffca05f86ae6ef85f8c7:

  iolog: silence warning on pointer cast on 32-bit compiles (2016-08-08 11:32:34 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      iolog: hist_sum() should return unsigned long
      ioengines: fixup td_io_unlink_file() error propagation
      Fix spelling error
      iolog: silence warning on pointer cast on 32-bit compiles

Tomohiro Kusumi (6):
      Use char* for pid_file path
      Change --output-format argument from optional to required
      Make local const string array static
      Add missing FIO_NET_CMD entry to fio_server_ops[]
      Use a pointer to const char* for I/O engine name (in response to aa2b823c)
      Check if sysfs ioscheduler entry is "none"

mrturtledev (1):
      Add 'unlink_each_loop' option

 HOWTO             |  4 +++-
 backend.c         | 36 +++++++++++++++++++++++++++++++++++-
 cconv.c           |  2 ++
 engines/pmemblk.c |  2 +-
 filesetup.c       |  6 +++++-
 fio.1             |  3 +++
 init.c            | 10 ++--------
 io_ddir.h         |  2 +-
 ioengine.h        |  2 +-
 ioengines.c       | 11 +++++++++--
 iolog.c           |  9 +++++----
 options.c         | 12 +++++++++++-
 server.c          |  1 +
 server.h          |  2 +-
 thread_options.h  |  3 ++-
 15 files changed, 82 insertions(+), 23 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0085b74..5bf7125 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1295,7 +1295,7 @@ iopsavgtime=int	Average the calculated IOPS over the given time. Value
 		through 'write_iops_log', then the minimum of this option and
 		'log_avg_msec' will be used.  Default: 500ms.
 
-create_serialize=bool	If true, serialize the file creating for the jobs.
+create_serialize=bool	If true, serialize the file creation for the jobs.
 			This may be handy to avoid interleaving of data
 			files, which may greatly depend on the filesystem
 			used and even the number of processors in the system.
@@ -1334,6 +1334,8 @@ unlink=bool	Unlink the job files when done. Not the default, as repeated
 		runs of that job would then waste time recreating the file
 		set again and again.
 
+unlink_each_loop=bool	Unlink job files after each iteration or loop.
+
 loops=int	Run the specified number of iterations of this job. Used
 		to repeat the same workload a given number of times. Defaults
 		to 1.
diff --git a/backend.c b/backend.c
index c3ad831..6bf5d67 100644
--- a/backend.c
+++ b/backend.c
@@ -571,6 +571,28 @@ static inline bool io_in_polling(struct thread_data *td)
 	return !td->o.iodepth_batch_complete_min &&
 		   !td->o.iodepth_batch_complete_max;
 }
+/*
+ * Unlinks files from thread data fio_file structure
+ */
+static int unlink_all_files(struct thread_data *td)
+{
+	struct fio_file *f;
+	unsigned int i;
+	int ret = 0;
+
+	for_each_file(td, f, i) {
+		if (f->filetype != FIO_TYPE_FILE)
+			continue;
+		ret = td_io_unlink_file(td, f);
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		td_verror(td, ret, "unlink_all_files");
+
+	return ret;
+}
 
 /*
  * The main verify engine. Runs over the writes we previously submitted,
@@ -1309,6 +1331,14 @@ static int switch_ioscheduler(struct thread_data *td)
 	 */
 	tmp[strlen(tmp) - 1] = '\0';
 
+	/*
+	 * Write to "none" entry doesn't fail, so check the result here.
+	 */
+	if (!strcmp(tmp, "none")) {
+		log_err("fio: io scheduler is not tunable\n");
+		fclose(f);
+		return 0;
+	}
 
 	sprintf(tmp2, "[%s]", td->o.ioscheduler);
 	if (!strstr(tmp, tmp2)) {
@@ -1667,9 +1697,13 @@ static void *thread_main(void *data)
 		fio_gettime(&td->start, NULL);
 		memcpy(&td->tv_cache, &td->start, sizeof(td->start));
 
-		if (clear_state)
+		if (clear_state) {
 			clear_io_state(td, 0);
 
+			if (o->unlink_each_loop && unlink_all_files(td))
+				break;
+		}
+
 		prune_io_piece_log(td);
 
 		if (td->o.verify_only && (td_write(td) || td_rw(td)))
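In `switch_ioscheduler()`, fio reads the device's sysfs `scheduler` file back after selecting a scheduler. It strips the trailing newline and checks that `[name]`, the bracketed active entry, is present.

The new hunk handles a queue whose file reads back as just `none`. Per the patch comment, writing to it does not fail, so the read-back result has to be inspected.

The function below is a hypothetical, string-only sketch of that read-back logic and does no sysfs access. `demo_check_sched()` and the `DEMO_SCHED_*` codes are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum demo_sched_state {
	DEMO_SCHED_NOT_TUNABLE,		/* file reads back as just "none" */
	DEMO_SCHED_SELECTED,		/* "[wanted]" appears in the list */
	DEMO_SCHED_NOT_SELECTED,	/* some other entry is active */
};

/*
 * 'line' is the text read back from the scheduler file, for example
 * "noop deadline [cfq]\n". The function works on a private copy.
 */
static enum demo_sched_state demo_check_sched(const char *line,
					      const char *wanted)
{
	char tmp[256], tmp2[64];
	size_t len;

	snprintf(tmp, sizeof(tmp), "%s", line);
	len = strlen(tmp);
	if (len && tmp[len - 1] == '\n')
		tmp[len - 1] = '\0';	/* strip the trailing newline */

	/* Writing to a "none"-only entry does not fail, so detect it here. */
	if (!strcmp(tmp, "none"))
		return DEMO_SCHED_NOT_TUNABLE;

	snprintf(tmp2, sizeof(tmp2), "[%s]", wanted);
	return strstr(tmp, tmp2) ? DEMO_SCHED_SELECTED : DEMO_SCHED_NOT_SELECTED;
}
```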
diff --git a/cconv.c b/cconv.c
index 837963d..8d9a0a8 100644
--- a/cconv.c
+++ b/cconv.c
@@ -174,6 +174,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->verify_batch = le32_to_cpu(top->verify_batch);
 	o->use_thread = le32_to_cpu(top->use_thread);
 	o->unlink = le32_to_cpu(top->unlink);
+	o->unlink_each_loop = le32_to_cpu(top->unlink_each_loop);
 	o->do_disk_util = le32_to_cpu(top->do_disk_util);
 	o->override_sync = le32_to_cpu(top->override_sync);
 	o->rand_repeatable = le32_to_cpu(top->rand_repeatable);
@@ -367,6 +368,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->verify_batch = cpu_to_le32(o->verify_batch);
 	top->use_thread = cpu_to_le32(o->use_thread);
 	top->unlink = cpu_to_le32(o->unlink);
+	top->unlink_each_loop = cpu_to_le32(o->unlink_each_loop);
 	top->do_disk_util = cpu_to_le32(o->do_disk_util);
 	top->override_sync = cpu_to_le32(o->override_sync);
 	top->rand_repeatable = cpu_to_le32(o->rand_repeatable);
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index 6d19864..ca72697 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -475,7 +475,7 @@ static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
 
 	pmb_parse_path(f->file_name, &path, &bsize, &fsize);
 	if (!path)
-		return 1;
+		return ENOENT;
 
 	unlink(path);
 	free(path);
diff --git a/filesetup.c b/filesetup.c
index 1ecdda6..42a9f41 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -58,8 +58,12 @@ static int extend_file(struct thread_data *td, struct fio_file *f)
 		unlink_file = 1;
 
 	if (unlink_file || new_layout) {
+		int ret;
+
 		dprint(FD_FILE, "layout unlink %s\n", f->file_name);
-		if ((td_io_unlink_file(td, f) < 0) && (errno != ENOENT)) {
+
+		ret = td_io_unlink_file(td, f);
+		if (ret != 0 && ret != ENOENT) {
 			td_verror(td, errno, "unlink");
 			return 1;
 		}
diff --git a/fio.1 b/fio.1
index d1acebc..696664a 100644
--- a/fio.1
+++ b/fio.1
@@ -1246,6 +1246,9 @@ multiple times. Thus it will not work on eg network or splice IO.
 .BI unlink \fR=\fPbool
 Unlink job files when done.  Default: false.
 .TP
+.BI unlink_each_loop \fR=\fPbool
+Unlink job files after each iteration or loop.  Default: false.
+.TP
 .BI loops \fR=\fPint
 Specifies the number of iterations (runs of the same workload) of this job.
 Default: 1.
diff --git a/init.c b/init.c
index 048bd5d..fb07daa 100644
--- a/init.c
+++ b/init.c
@@ -114,7 +114,7 @@ static struct option l_opts[FIO_NR_OPTIONS] = {
 	},
 	{
 		.name		= (char *) "output-format",
-		.has_arg	= optional_argument,
+		.has_arg	= required_argument,
 		.val		= 'F' | FIO_CLIENT_FLAG,
 	},
 	{
@@ -2302,7 +2302,7 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 	struct thread_data *td = NULL;
 	int c, ini_idx = 0, lidx, ret = 0, do_exit = 0, exit_val = 0;
 	char *ostr = cmd_optstr;
-	void *pid_file = NULL;
+	char *pid_file = NULL;
 	void *cur_client = NULL;
 	int backend = 0;
 
@@ -2352,12 +2352,6 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 			output_format = FIO_OUTPUT_TERSE;
 			break;
 		case 'F':
-			if (!optarg) {
-				log_err("fio: missing --output-format argument\n");
-				exit_val = 1;
-				do_exit++;
-				break;
-			}
 			if (parse_output_format(optarg)) {
 				log_err("fio: failed parsing output-format\n");
 				exit_val = 1;
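Declaring `--output-format` with `required_argument` lets `getopt_long()` enforce the value itself. If the option is given without one, it returns `'?'` (or `':'` when the short-option string starts with a colon) instead of `'F'`. That is why the `if (!optarg)` branch could be deleted.

It also means the space-separated form `--output-format json` is accepted, since an `optional_argument` only picks up a value written as `--output-format=json`.

A minimal sketch with a hypothetical one-entry table `demo_opts` and parser `demo_parse()`:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table: "--output-format" must carry a value. */
static const struct option demo_opts[] = {
	{ "output-format", required_argument, NULL, 'F' },
	{ NULL, 0, NULL, 0 },
};

/*
 * Parse argv and store the --output-format value in *fmt.
 * Returns 0 on success, 1 if getopt_long() reported a problem.
 */
static int demo_parse(int argc, char *argv[], const char **fmt)
{
	int c;

	optind = 1;	/* restart scanning at argv[1] on every call */
	opterr = 0;	/* suppress getopt's own error messages */
	while ((c = getopt_long(argc, argv, "", demo_opts, NULL)) != -1) {
		switch (c) {
		case 'F':
			/* with required_argument, optarg is never NULL here */
			*fmt = optarg;
			break;
		default:	/* '?': missing value or unknown option */
			return 1;
		}
	}
	return 0;
}
```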
diff --git a/io_ddir.h b/io_ddir.h
index 763e826..2141119 100644
--- a/io_ddir.h
+++ b/io_ddir.h
@@ -61,7 +61,7 @@ static inline int ddir_rw(enum fio_ddir ddir)
 
 static inline const char *ddir_str(enum td_ddir ddir)
 {
-	const char *__str[] = { NULL, "read", "write", "rw", NULL,
+	static const char *__str[] = { NULL, "read", "write", "rw", NULL,
 				"randread", "randwrite", "randrw",
 				"trim", NULL, NULL, NULL, "randtrim" };
 
diff --git a/ioengine.h b/ioengine.h
index 0effade..ceed329 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -138,7 +138,7 @@ enum {
 
 struct ioengine_ops {
 	struct flist_head list;
-	char name[16];
+	const char *name;
 	int version;
 	int flags;
 	int (*setup)(struct thread_data *);
diff --git a/ioengines.c b/ioengines.c
index 4129ac2..a06909e 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -521,8 +521,15 @@ int td_io_unlink_file(struct thread_data *td, struct fio_file *f)
 {
 	if (td->io_ops->unlink_file)
 		return td->io_ops->unlink_file(td, f);
-	else
-		return unlink(f->file_name);
+	else {
+		int ret;
+
+		ret = unlink(f->file_name);
+		if (ret < 0)
+			return errno;
+
+		return 0;
+	}
 }
 
 int td_io_get_file_size(struct thread_data *td, struct fio_file *f)
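The ioengines.c change makes `td_io_unlink_file()` return a positive `errno` value (0 on success) instead of `unlink()`'s raw -1. This matches the pmemblk engine's new `return ENOENT;`.

A caller such as `extend_file()` can then inspect the returned code directly and treat a file that is already gone as harmless. A sketch of that convention, using hypothetical helpers `demo_unlink()` and `demo_layout_unlink()`:

```c
#define _POSIX_C_SOURCE 200809L	/* for mkstemp() in strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* 0 on success, otherwise the positive errno reported by unlink(2). */
static int demo_unlink(const char *path)
{
	if (unlink(path) < 0)
		return errno;

	return 0;
}

/* Caller in the style of extend_file(): a missing file is not an error. */
static int demo_layout_unlink(const char *path)
{
	int ret = demo_unlink(path);

	if (ret != 0 && ret != ENOENT) {
		fprintf(stderr, "unlink %s: error %d\n", path, ret);
		return 1;
	}
	return 0;
}
```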
diff --git a/iolog.c b/iolog.c
index a9cbd5b..975ce6f 100644
--- a/iolog.c
+++ b/iolog.c
@@ -661,9 +661,10 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
-static inline int hist_sum(int j, int stride, unsigned int *io_u_plat)
+static inline unsigned long hist_sum(int j, int stride, unsigned int *io_u_plat)
 {
-	int k, sum;
+	unsigned long sum;
+	int k;
 
 	for (k = sum = 0; k < stride; k++)
 		sum += io_u_plat[j + k];
@@ -691,11 +692,11 @@ void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
 
 	for (i = 0; i < nr_samples; i++) {
 		s = __get_sample(samples, log_offset, i);
-		io_u_plat = (unsigned int *) s->val;
+		io_u_plat = (unsigned int *) (uintptr_t) s->val;
 		fprintf(f, "%lu, %u, %u, ", (unsigned long)s->time,
 		        io_sample_ddir(s), s->bs);
 		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
-			fprintf(f, "%lu, ", (unsigned long) hist_sum(j, stride, io_u_plat)); 
+			fprintf(f, "%lu, ", hist_sum(j, stride, io_u_plat));
 		}
 		fprintf(f, "%lu\n", (unsigned long) 
 		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat));
diff --git a/options.c b/options.c
index 56d3e2b..56e51fc 100644
--- a/options.c
+++ b/options.c
@@ -3241,7 +3241,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.lname	= "Create serialize",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(create_serialize),
-		.help	= "Serialize creating of job files",
+		.help	= "Serialize creation of job files",
 		.def	= "1",
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
@@ -3433,6 +3433,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "unlink_each_loop",
+		.lname	= "Unlink file after each loop of a job",
+		.type	= FIO_OPT_BOOL,
+		.off1	= td_var_offset(unlink_each_loop),
+		.help	= "Unlink created files after each loop in a job has completed",
+		.def	= "0",
+		.category = FIO_OPT_C_FILE,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "exitall",
 		.lname	= "Exit-all on terminate",
 		.type	= FIO_OPT_STR_SET,
diff --git a/server.c b/server.c
index 667a66c..2fd9b45 100644
--- a/server.c
+++ b/server.c
@@ -114,6 +114,7 @@ static const char *fio_server_ops[FIO_NET_CMD_NR] = {
 	"LOAD_FILE",
 	"VTRIGGER",
 	"SENDFILE",
+	"JOB_OPT",
 };
 
 static void sk_lock(struct sk_out *sk_out)
diff --git a/server.h b/server.h
index c17c3bb..fb384fb 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 55,
+	FIO_SERVER_VER			= 56,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 449c66f..d70fda3 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -121,6 +121,7 @@ struct thread_options {
 	unsigned int verify_state_save;
 	unsigned int use_thread;
 	unsigned int unlink;
+	unsigned int unlink_each_loop;
 	unsigned int do_disk_util;
 	unsigned int override_sync;
 	unsigned int rand_repeatable;
@@ -378,6 +379,7 @@ struct thread_options_pack {
 	uint32_t verify_state_save;
 	uint32_t use_thread;
 	uint32_t unlink;
+	uint32_t unlink_each_loop;
 	uint32_t do_disk_util;
 	uint32_t override_sync;
 	uint32_t rand_repeatable;
@@ -396,7 +398,6 @@ struct thread_options_pack {
 	uint32_t bs_unaligned;
 	uint32_t fsync_on_close;
 	uint32_t bs_is_seq_rand;
-	uint32_t pad1;
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;

* Re: Recent changes (master)
  2016-08-08 13:31 ` Erwan Velu
@ 2016-08-08 13:47   ` Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-08 13:47 UTC (permalink / raw)
  To: Erwan Velu; +Cc: fio

On 08/08/2016 07:31 AM, Erwan Velu wrote:
> Hey Jens,
>
> Isn't it dangerous to sum many unsigned integers into a signed int?
> Couldn't this trigger an overflow?
> +                sum += io_u_plat[j + k];

It'd probably be more appropriate to have it be an unsigned long at 
least, then we could kill the cast as well when we print it.
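Here is a sketch of the change suggested above. It assumes a fixed-width uint64_t accumulator, which is at least as wide as unsigned long on common ABIs and stays 64 bits even where unsigned long is only 32. Signed overflow is undefined behavior in C, so the int accumulator in the patch becomes a real problem once bin counts get large.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sum 'stride' consecutive histogram bins starting at index j. The 64-bit
 * unsigned accumulator cannot overflow for any realistic stride, unlike
 * the signed int used in the quoted patch.
 */
static inline uint64_t hist_sum(int j, int stride,
				const unsigned int *io_u_plat)
{
	uint64_t sum = 0;
	int k;

	assert(stride > 0);
	for (k = 0; k < stride; k++)
		sum += io_u_plat[j + k];

	return sum;
}
```

Because the return type is 64-bit, the caller can print the sum without a cast, using something like fprintf(f, "%" PRIu64 ", ", hist_sum(j, stride, io_u_plat)) with PRIu64 from <inttypes.h>.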

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 1305+ messages in thread

* Re: Recent changes (master)
  2016-08-08 12:00 Jens Axboe
@ 2016-08-08 13:31 ` Erwan Velu
  2016-08-08 13:47   ` Jens Axboe
  0 siblings, 1 reply; 1305+ messages in thread
From: Erwan Velu @ 2016-08-08 13:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Hey Jens,

Isn't it dangerous to sum many unsigned integers into a signed int?
Couldn't this trigger an overflow?
+                sum += io_u_plat[j + k];

----- Original Message -----
From: "Jens Axboe" <axboe@kernel.dk>
To: fio@vger.kernel.org
Sent: Monday, August 8, 2016 14:00:02
Subject: Recent changes (master)

The following changes since commit 5fd31680d0370c6b71ccfa456ade211477af81d6:

  Revert "filesetup: ensure that we catch a file flagged for extend" (2016-08-04 19:41:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 93168285bc564941d832deea172dc1f68de68666:

  stat: fixups to histogram logging (2016-08-07 15:18:38 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'histograms-PR' of https://github.com/cronburg/fio
      server: bump protocol version
      iolog: style updates
      stat: fixups to histogram logging

Karl Cronburg (1):
      This commit / feature adds completion latency histogram output to fio, piggybacking on the existing histograms recorded by stat.c and adding the following command line options:
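The stat.c hunk in the diff below decides when to emit a histogram sample. add_clat_sample() starts a window at the first completion. Once at least log_hist_msec has elapsed, it snapshots the cumulative histogram and moves the window start forward by exactly log_hist_msec (elapsed minus the overshoot), so a late completion does not shift every later window. The sketch below isolates just that bookkeeping. The names struct hist_window and hist_window_due() are hypothetical and not part of fio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hist_last field of struct io_hist. */
struct hist_window {
	unsigned long hist_last;	/* start of the current window, msec */
};

/*
 * Mirrors the windowing test in add_clat_sample(): returns true when a
 * histogram snapshot is due at 'elapsed' msec. On a snapshot the window
 * start advances by exactly hist_msec (elapsed minus the overshoot).
 */
static bool hist_window_due(struct hist_window *w, unsigned long elapsed,
			    unsigned long hist_msec)
{
	unsigned long this_window;

	assert(hist_msec > 0);
	if (!w->hist_last)
		w->hist_last = elapsed;
	this_window = elapsed - w->hist_last;

	if (this_window < hist_msec)
		return false;

	w->hist_last = elapsed - (this_window - hist_msec);
	return true;
}
```

The correction subtracts only one window's worth of overshoot. If completions stall for more than two windows, the next completion after the snapshot triggers another snapshot immediately.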

 HOWTO                           |  22 ++
 cconv.c                         |   5 +
 fio.1                           |  29 +++
 fio.h                           |   1 +
 init.c                          |  36 +++
 iolog.c                         |  73 +++++-
 iolog.h                         |  16 ++
 options.c                       |  31 +++
 server.h                        |   2 +-
 stat.c                          |  40 ++++
 thread_options.h                |   6 +
 tools/hist/.gitignore           |   3 +
 tools/hist/fiologparser_hist.py | 486 ++++++++++++++++++++++++++++++++++++++++
 tools/hist/half-bins.py         |  38 ++++
 14 files changed, 785 insertions(+), 3 deletions(-)
 create mode 100644 tools/hist/.gitignore
 create mode 100755 tools/hist/fiologparser_hist.py
 create mode 100755 tools/hist/half-bins.py
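This relates to the log_hist_coarseness option in the diff below. flush_hist_samples() sums stride = 1 << coarseness adjacent bins per output value, so each coarseness step halves the bin count. Since 1216 = 19 * 64, every coarseness from 0 to 6 divides it evenly. That gives the [1216, 608, 304, 152, 76, 38, 19] list in the option's help text. Below is a minimal sketch of that downsampling. coarsen_hist() is a hypothetical helper, not part of fio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: collapse nr_bins histogram bins into
 * nr_bins >> coarseness output bins, each the sum of 1 << coarseness
 * adjacent input bins. Returns the number of output bins written to out.
 */
static size_t coarsen_hist(const unsigned int *bins, size_t nr_bins,
			   int coarseness, uint64_t *out)
{
	size_t stride, i, k, n = 0;

	/* Mirrors the documented [0, 6] range of log_hist_coarseness. */
	assert(coarseness >= 0 && coarseness <= 6);
	stride = (size_t)1 << coarseness;
	assert(nr_bins % stride == 0);

	for (i = 0; i < nr_bins; i += stride) {
		uint64_t sum = 0;

		for (k = 0; k < stride; k++)
			sum += bins[i + k];
		out[n++] = sum;
	}
	return n;
}
```

The loop in flush_hist_samples() prints the last group separately only to avoid a trailing ", ". The grouping itself is the same as above.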

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d18d59b..0085b74 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1610,6 +1610,14 @@ write_lat_log=str Same as write_bw_log, except that this option stores io
 		the filename will not include the job index. See 'Log File
 		Formats'.
 
+write_hist_log=str Same as write_lat_log, but writes I/O completion
+		latency histograms. If no filename is given with this option, the
+		default filename of "jobname_clat_hist.x.log" is used, where x is
+		the index of the job (1..N, where N is the number of jobs). Even
+		if the filename is given, fio will still append the type of log.
+		If per_job_logs is false, then the filename will not include the
+		job index. See 'Log File Formats'.
+
 write_iops_log=str Same as write_bw_log, but writes IOPS. If no filename is
 		given with this option, the default filename of
 		"jobname_type.x.log" is used,where x is the index of the job
@@ -1625,6 +1633,20 @@ log_avg_msec=int By default, fio will log an entry in the iops, latency,
 		specified period of time, reducing the resolution of the log.
 		See log_max_value as well. Defaults to 0, logging all entries.
 
+log_hist_msec=int Same as log_avg_msec, but logs entries for completion
+		latency histograms. Computing latency percentiles from averages of
+		intervals using log_avg_msec is inaccurate. Setting this option makes
+		fio log histogram entries over the specified period of time, reducing
+		log sizes for high IOPS devices while retaining percentile accuracy.
+		See log_hist_coarseness as well. Defaults to 0, meaning histogram
+		logging is disabled.
+
+log_hist_coarseness=int Integer ranging from 0 to 6, defining the coarseness
+		of the resolution of the histogram logs enabled with log_hist_msec. For
+		each increment in coarseness, fio outputs half as many bins. Defaults to
+		0, for which histogram logs contain 1216 latency bins. See
+		'Log File Formats'.
+
 log_max_value=bool	If log_avg_msec is set, fio logs the average over that
 		window. If you instead want to log the maximum value, set this
 		option to 1. Defaults to 0, meaning that averaged values are
diff --git a/cconv.c b/cconv.c
index ac826a3..837963d 100644
--- a/cconv.c
+++ b/cconv.c
@@ -39,6 +39,7 @@ static void free_thread_options_to_cpu(struct thread_options *o)
 	free(o->bw_log_file);
 	free(o->lat_log_file);
 	free(o->iops_log_file);
+	free(o->hist_log_file);
 	free(o->replay_redirect);
 	free(o->exec_prerun);
 	free(o->exec_postrun);
@@ -74,6 +75,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	string_to_cpu(&o->bw_log_file, top->bw_log_file);
 	string_to_cpu(&o->lat_log_file, top->lat_log_file);
 	string_to_cpu(&o->iops_log_file, top->iops_log_file);
+	string_to_cpu(&o->hist_log_file, top->hist_log_file);
 	string_to_cpu(&o->replay_redirect, top->replay_redirect);
 	string_to_cpu(&o->exec_prerun, top->exec_prerun);
 	string_to_cpu(&o->exec_postrun, top->exec_postrun);
@@ -178,6 +180,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->allrand_repeatable = le32_to_cpu(top->allrand_repeatable);
 	o->rand_seed = le64_to_cpu(top->rand_seed);
 	o->log_avg_msec = le32_to_cpu(top->log_avg_msec);
+	o->log_hist_msec = le32_to_cpu(top->log_hist_msec);
+	o->log_hist_coarseness = le32_to_cpu(top->log_hist_coarseness);
 	o->log_max = le32_to_cpu(top->log_max);
 	o->log_offset = le32_to_cpu(top->log_offset);
 	o->log_gz = le32_to_cpu(top->log_gz);
@@ -309,6 +313,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	string_to_net(top->bw_log_file, o->bw_log_file);
 	string_to_net(top->lat_log_file, o->lat_log_file);
 	string_to_net(top->iops_log_file, o->iops_log_file);
+	string_to_net(top->hist_log_file, o->hist_log_file);
 	string_to_net(top->replay_redirect, o->replay_redirect);
 	string_to_net(top->exec_prerun, o->exec_prerun);
 	string_to_net(top->exec_postrun, o->exec_postrun);
diff --git a/fio.1 b/fio.1
index 85eb0fe..d1acebc 100644
--- a/fio.1
+++ b/fio.1
@@ -1476,6 +1476,14 @@ N is the number of jobs). Even if the filename is given, fio will still
 append the type of log. If \fBper_job_logs\fR is false, then the filename will
 not include the job index. See the \fBLOG FILE FORMATS\fR section.
 .TP
+.BI write_hist_log \fR=\fPstr
+Same as \fBwrite_lat_log\fR, but writes I/O completion latency histograms. If
+no filename is given with this option, the default filename of
+"jobname_clat_hist.x.log" is used, where x is the index of the job (1..N, where
+N is the number of jobs). Even if the filename is given, fio will still append
+the type of log. If \fBper_job_logs\fR is false, then the filename will not
+include the job index. See the \fBLOG FILE FORMATS\fR section.
+.TP
 .BI write_iops_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given with this
 option, the default filename of "jobname_type.x.log" is used, where x is the
@@ -1496,6 +1504,20 @@ If \fBlog_avg_msec\fR is set, fio logs the average over that window. If you
 instead want to log the maximum value, set this option to 1.  Defaults to
 0, meaning that averaged values are logged.
 .TP
+.BI log_hist_msec \fR=\fPint
+Same as \fBlog_avg_msec\fR, but logs entries for completion latency histograms.
+Computing latency percentiles from averages of intervals using \fBlog_avg_msec\fR
+is inaccurate. Setting this option makes fio log histogram entries over the
+specified period of time, reducing log sizes for high IOPS devices while
+retaining percentile accuracy. See \fBlog_hist_coarseness\fR as well. Defaults
+to 0, meaning histogram logging is disabled.
+.TP
+.BI log_hist_coarseness \fR=\fPint
+Integer ranging from 0 to 6, defining the coarseness of the resolution of the
+histogram logs enabled with \fBlog_hist_msec\fR. For each increment in
+coarseness, fio outputs half as many bins. Defaults to 0, for which histogram
+logs contain 1216 latency bins. See the \fBLOG FILE FORMATS\fR section.
+.TP
 .BI log_offset \fR=\fPbool
 If this is set, the iolog options will include the byte offset for the IO
 entry as well as the other data values.
@@ -2302,6 +2324,13 @@ they aren't applicable if windowed logging is enabled. If windowed logging
 is enabled and \fBlog_max_value\fR is set, then fio logs maximum values in
 that window instead of averages.
 
+For histogram logging the logs look like this:
+
+.B time (msec), data direction, block-size, bin 0, bin 1, ..., bin 1215
+
+Where 'bin i' gives the frequency of IO requests with a latency falling in
+the i-th bin. See \fBlog_hist_coarseness\fR for logging fewer bins.
+
 .RE
 
 .SH CLIENT / SERVER
diff --git a/fio.h b/fio.h
index 87a94f6..d929467 100644
--- a/fio.h
+++ b/fio.h
@@ -141,6 +141,7 @@ struct thread_data {
 
 	struct io_log *slat_log;
 	struct io_log *clat_log;
+	struct io_log *clat_hist_log;
 	struct io_log *lat_log;
 	struct io_log *bw_log;
 	struct io_log *iops_log;
diff --git a/init.c b/init.c
index f81db3c..048bd5d 100644
--- a/init.c
+++ b/init.c
@@ -1418,6 +1418,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_LAT,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1442,10 +1444,36 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->clat_log, &p, logname);
 	}
+
+	if (o->hist_log_file) {
+		struct log_params p = {
+			.td = td,
+			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
+			.log_type = IO_LOG_TYPE_HIST,
+			.log_offset = o->log_offset,
+			.log_gz = o->log_gz,
+			.log_gz_store = o->log_gz_store,
+		};
+		const char *suf;
+
+		if (p.log_gz_store)
+			suf = "log.fz";
+		else
+			suf = "log";
+
+		gen_log_name(logname, sizeof(logname), "clat_hist", o->hist_log_file,
+				td->thread_number, suf, o->per_job_logs);
+		setup_log(&td->clat_hist_log, &p, logname);
+	}
+
 	if (o->bw_log_file) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_BW,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1457,6 +1485,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->bw_avg_time);
 		else
 			o->bw_avg_time = p.avg_msec;
+	
+		p.hist_msec = o->log_hist_msec;
+		p.hist_coarseness = o->log_hist_coarseness;
 
 		if (p.log_gz_store)
 			suf = "log.fz";
@@ -1471,6 +1502,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_IOPS,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1482,6 +1515,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->iops_avg_time);
 		else
 			o->iops_avg_time = p.avg_msec;
+	
+		p.hist_msec = o->log_hist_msec;
+		p.hist_coarseness = o->log_hist_coarseness;
 
 		if (p.log_gz_store)
 			suf = "log.fz";
diff --git a/iolog.c b/iolog.c
index 4c87f1c..a9cbd5b 100644
--- a/iolog.c
+++ b/iolog.c
@@ -584,6 +584,8 @@ void setup_log(struct io_log **log, struct log_params *p,
 	l->log_gz = p->log_gz;
 	l->log_gz_store = p->log_gz_store;
 	l->avg_msec = p->avg_msec;
+	l->hist_msec = p->hist_msec;
+	l->hist_coarseness = p->hist_coarseness;
 	l->filename = strdup(filename);
 	l->td = p->td;
 
@@ -659,6 +661,48 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
+static inline int hist_sum(int j, int stride, unsigned int *io_u_plat)
+{
+	int k, sum;
+
+	for (k = sum = 0; k < stride; k++)
+		sum += io_u_plat[j + k];
+
+	return sum;
+}
+
+void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
+			uint64_t sample_size)
+{
+	struct io_sample *s;
+	int log_offset;
+	uint64_t i, j, nr_samples;
+	unsigned int *io_u_plat;
+
+	int stride = 1 << hist_coarseness;
+	
+	if (!sample_size)
+		return;
+
+	s = __get_sample(samples, 0, 0);
+	log_offset = (s->__ddir & LOG_OFFSET_SAMPLE_BIT) != 0;
+
+	nr_samples = sample_size / __log_entry_sz(log_offset);
+
+	for (i = 0; i < nr_samples; i++) {
+		s = __get_sample(samples, log_offset, i);
+		io_u_plat = (unsigned int *) s->val;
+		fprintf(f, "%lu, %u, %u, ", (unsigned long)s->time,
+		        io_sample_ddir(s), s->bs);
+		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
+			fprintf(f, "%lu, ", (unsigned long) hist_sum(j, stride, io_u_plat)); 
+		}
+		fprintf(f, "%lu\n", (unsigned long) 
+		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat));
+		free(io_u_plat);
+	}
+}
+
 void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 {
 	struct io_sample *s;
@@ -988,7 +1032,13 @@ void flush_log(struct io_log *log, bool do_append)
 
 		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
 		flist_del_init(&cur_log->list);
-		flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+		
+		if (log == log->td->clat_hist_log)
+			flush_hist_samples(f, log->hist_coarseness, cur_log->log,
+			                   cur_log->nr_samples * log_entry_sz(log));
+		else
+			flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+		
 		sfree(cur_log);
 	}
 
@@ -1353,6 +1403,20 @@ static int write_clat_log(struct thread_data *td, int try, bool unit_log)
 	return ret;
 }
 
+static int write_clat_hist_log(struct thread_data *td, int try, bool unit_log)
+{
+	int ret;
+
+	if (!unit_log)
+		return 0;
+
+	ret = __write_log(td, td->clat_hist_log, try);
+	if (!ret)
+		td->clat_hist_log = NULL;
+
+	return ret;
+}
+
 static int write_lat_log(struct thread_data *td, int try, bool unit_log)
 {
 	int ret;
@@ -1387,8 +1451,9 @@ enum {
 	SLAT_LOG_MASK	= 4,
 	CLAT_LOG_MASK	= 8,
 	IOPS_LOG_MASK	= 16,
+	CLAT_HIST_LOG_MASK = 32,
 
-	ALL_LOG_NR	= 5,
+	ALL_LOG_NR	= 6,
 };
 
 struct log_type {
@@ -1417,6 +1482,10 @@ static struct log_type log_types[] = {
 		.mask	= IOPS_LOG_MASK,
 		.fn	= write_iops_log,
 	},
+	{
+		.mask	= CLAT_HIST_LOG_MASK,
+		.fn	= write_clat_hist_log,
+	}
 };
 
 void td_writeout_logs(struct thread_data *td, bool unit_logs)
diff --git a/iolog.h b/iolog.h
index 0438fa7..011179a 100644
--- a/iolog.h
+++ b/iolog.h
@@ -18,6 +18,11 @@ struct io_stat {
 	fio_fp64_t S;
 };
 
+struct io_hist {
+	uint64_t samples;
+	unsigned long hist_last;
+};
+
 /*
  * A single data sample
  */
@@ -39,6 +44,7 @@ enum {
 	IO_LOG_TYPE_SLAT,
 	IO_LOG_TYPE_BW,
 	IO_LOG_TYPE_IOPS,
+	IO_LOG_TYPE_HIST,
 };
 
 #define DEF_LOG_ENTRIES		1024
@@ -103,6 +109,14 @@ struct io_log {
 	unsigned long avg_msec;
 	unsigned long avg_last;
 
+  /*
+   * Windowed latency histograms, for keeping track of when we need to
+   * save a copy of the histogram every approximately hist_msec milliseconds.
+   */
+	struct io_hist hist_window[DDIR_RWDIR_CNT];
+	unsigned long hist_msec;
+	int hist_coarseness;
+
 	pthread_mutex_t chunk_lock;
 	unsigned int chunk_seq;
 	struct flist_head chunk_list;
@@ -218,6 +232,8 @@ extern int iolog_file_inflate(const char *);
 struct log_params {
 	struct thread_data *td;
 	unsigned long avg_msec;
+	unsigned long hist_msec;
+	int hist_coarseness;
 	int log_type;
 	int log_offset;
 	int log_gz;
diff --git a/options.c b/options.c
index 4c56dbe..56d3e2b 100644
--- a/options.c
+++ b/options.c
@@ -3530,6 +3530,37 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "log_hist_msec",
+		.lname	= "Log histograms (msec)",
+		.type	= FIO_OPT_INT,
+		.off1	= td_var_offset(log_hist_msec),
+		.help	= "Dump completion latency histograms at frequency of this time value",
+		.def	= "0",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "log_hist_coarseness",
+		.lname	= "Histogram logs coarseness",
+		.type	= FIO_OPT_INT,
+		.off1	= td_var_offset(log_hist_coarseness),
+		.help	= "Integer in range [0,6]. Higher coarseness outputs"
+			" fewer histogram bins per sample. The number of bins for"
+			" these are [1216, 608, 304, 152, 76, 38, 19] respectively.",
+		.def	= "0",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "write_hist_log",
+		.lname	= "Write latency histogram logs",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= td_var_offset(hist_log_file),
+		.help	= "Write log of latency histograms during run",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "log_max_value",
 		.lname	= "Log maximum instead of average",
 		.type	= FIO_OPT_BOOL,
diff --git a/server.h b/server.h
index 79c751d..c17c3bb 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 54,
+	FIO_SERVER_VER			= 55,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index d6787b7..ef9fe7d 100644
--- a/stat.c
+++ b/stat.c
@@ -1965,6 +1965,7 @@ void regrow_logs(struct thread_data *td)
 {
 	regrow_log(td->slat_log);
 	regrow_log(td->clat_log);
+	regrow_log(td->clat_hist_log);
 	regrow_log(td->lat_log);
 	regrow_log(td->bw_log);
 	regrow_log(td->iops_log);
@@ -2195,7 +2196,9 @@ static void add_clat_percentile_sample(struct thread_stat *ts,
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long usec, unsigned int bs, uint64_t offset)
 {
+	unsigned long elapsed, this_window;
 	struct thread_stat *ts = &td->ts;
+	struct io_log *iolog = td->clat_hist_log;
 
 	td_io_u_lock(td);
 
@@ -2207,6 +2210,43 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	if (ts->clat_percentiles)
 		add_clat_percentile_sample(ts, usec, ddir);
 
+	if (iolog && iolog->hist_msec) {
+		struct io_hist *hw = &iolog->hist_window[ddir];
+
+		hw->samples++;
+		elapsed = mtime_since_now(&td->epoch);
+		if (!hw->hist_last)
+			hw->hist_last = elapsed;
+		this_window = elapsed - hw->hist_last;
+		
+		if (this_window >= iolog->hist_msec) {
+			unsigned int *io_u_plat;
+			unsigned int *dst;
+
+			/*
+			 * Make a byte-for-byte copy of the latency histogram
+			 * stored in td->ts.io_u_plat[ddir], recording it in a
+			 * log sample. Note that the matching call to free() is
+			 * located in iolog.c after printing this sample to the
+			 * log file.
+			 */
+			io_u_plat = (unsigned int *) td->ts.io_u_plat[ddir];
+			dst = malloc(FIO_IO_U_PLAT_NR * sizeof(unsigned int));
+			memcpy(dst, io_u_plat,
+				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
+			__add_log_sample(iolog, (unsigned long )dst, ddir, bs,
+						elapsed, offset);
+
+			/*
+			 * Update the last time we recorded as being now, minus
+			 * any drift in time we encountered before actually
+			 * making the record.
+			 */
+			hw->hist_last = elapsed - (this_window - iolog->hist_msec);
+			hw->samples = 0;
+		}
+	}
+
 	td_io_u_unlock(td);
 }
 
diff --git a/thread_options.h b/thread_options.h
index edf090d..449c66f 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -128,6 +128,8 @@ struct thread_options {
 	unsigned long long rand_seed;
 	unsigned int dep_use_os_rand;
 	unsigned int log_avg_msec;
+	unsigned int log_hist_msec;
+	unsigned int log_hist_coarseness;
 	unsigned int log_max;
 	unsigned int log_offset;
 	unsigned int log_gz;
@@ -232,6 +234,7 @@ struct thread_options {
 	char *bw_log_file;
 	char *lat_log_file;
 	char *iops_log_file;
+	char *hist_log_file;
 	char *replay_redirect;
 
 	/*
@@ -382,6 +385,8 @@ struct thread_options_pack {
 	uint64_t rand_seed;
 	uint32_t dep_use_os_rand;
 	uint32_t log_avg_msec;
+	uint32_t log_hist_msec;
+	uint32_t log_hist_coarseness;
 	uint32_t log_max;
 	uint32_t log_offset;
 	uint32_t log_gz;
@@ -486,6 +491,7 @@ struct thread_options_pack {
 	uint8_t bw_log_file[FIO_TOP_STR_MAX];
 	uint8_t lat_log_file[FIO_TOP_STR_MAX];
 	uint8_t iops_log_file[FIO_TOP_STR_MAX];
+	uint8_t hist_log_file[FIO_TOP_STR_MAX];
 	uint8_t replay_redirect[FIO_TOP_STR_MAX];
 
 	/*
diff --git a/tools/hist/.gitignore b/tools/hist/.gitignore
new file mode 100644
index 0000000..4f875da
--- /dev/null
+++ b/tools/hist/.gitignore
@@ -0,0 +1,3 @@
+*.pyc
+*.ipynb
+.ipynb_checkpoints
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
new file mode 100755
index 0000000..ce98d2e
--- /dev/null
+++ b/tools/hist/fiologparser_hist.py
@@ -0,0 +1,486 @@
+#!/usr/bin/env python2.7
+""" 
+    Utility for converting *_clat_hist* files generated by fio into latency statistics.
+    
+    Example usage:
+    
+            $ fiologparser_hist.py *_clat_hist*
+            end-time, samples, min, avg, median, 90%, 95%, 99%, max
+            1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
+            2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
+            4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
+            ...
+    
+    Notes:
+
+    * end-times are calculated to be uniform increments of the --interval value given,
+      regardless of when histogram samples are reported. Of note:
+        
+        * Intervals with no samples are omitted. In the example above this means
+          "no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
+          of the interval from 3 to 4 seconds".
+        
+        * Intervals with a single sample will have the same value for all statistics
+        
+    * The number of samples is unweighted, corresponding to the total number of samples
+      which have any effect whatsoever on the interval.
+
+    * Min statistics are computed using the value of the lower boundary of the first bin
+      (in increasing bin order) with non-zero samples in it. Similarly for max,
+      we take the upper boundary of the last bin with non-zero samples in it.
+      This is semantically identical to taking the 0th and 100th percentiles with a
+      50% bin-width buffer (because percentiles are computed using mid-points of
+      the bins). This enforces the following nice properties:
+
+        * min <= 50th <= 90th <= 95th <= 99th <= max
+
+        * min and max are strict lower and upper bounds on the actual
+          min / max seen by fio (and reported in *_clat.* with averaging turned off).
+
+    * Average statistics use a standard weighted arithmetic mean.
+
+    * Percentile statistics are computed using the weighted percentile method as
+      described here: https://en.wikipedia.org/wiki/Percentile#Weighted_percentile
+      See weights() method for details on how weights are computed for individual
+      samples. In process_interval() we further multiply by the height of each bin
+      to get weighted histograms.
+    
+    * We convert files given on the command line, assumed to be fio histogram files,
+      on-the-fly into their corresponding differenced files i.e. non-cumulative histograms
+      because fio outputs cumulative histograms, but we want histograms corresponding
+      to individual time intervals. An individual histogram file can contain the cumulative
+      histograms for multiple different r/w directions (notably when --rw=randrw). This
+      is accounted for by tracking each r/w direction separately. In the statistics
+      reported we ultimately merge *all* histograms (regardless of r/w direction).
+
+    * The value of *_GROUP_NR in stat.h (and *_BITS) determines how many latency bins
+      fio outputs when histogramming is enabled. Namely for the current default of
+      GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
+      seconds. For certain applications this may not be sufficient. With GROUP_NR=24
+      we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
+      you expect your application to experience latencies greater than 17 seconds,
+      you will need to recompile fio with a larger GROUP_NR, e.g. with:
+        
+            sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
+            make fio
+            
+      Quick reference table for the max latency corresponding to a sampling of
+      values for GROUP_NR:
+            
+            GROUP_NR | # bins | max latency bin value
+            19       | 1216   | 16.9 sec
+            20       | 1280   | 33.8 sec
+            21       | 1344   | 67.6 sec
+            22       | 1408   | 2  min, 15 sec
+            23       | 1472   | 4  min, 32 sec
+            24       | 1536   | 9  min, 4  sec
+            25       | 1600   | 18 min, 8  sec
+            26       | 1664   | 36 min, 16 sec
+      
+    * At present this program automatically detects the number of histogram bins in
+      the log files, and adjusts the bin latency values accordingly. In particular if
+      you use the --log_hist_coarseness parameter of fio, you get output files with
+      a number of bins according to the following table (note that the first
+      row is identical to the table above):
+
+      coarse \ GROUP_NR
+                  19     20    21     22     23     24     25     26
+             -------------------------------------------------------
+            0  [[ 1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
+            1   [  608,   640,   672,   704,   736,   768,   800,   832],
+            2   [  304,   320,   336,   352,   368,   384,   400,   416],
+            3   [  152,   160,   168,   176,   184,   192,   200,   208],
+            4   [   76,    80,    84,    88,    92,    96,   100,   104],
+            5   [   38,    40,    42,    44,    46,    48,    50,    52],
+            6   [   19,    20,    21,    22,    23,    24,    25,    26],
+            7   [  N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
+            8   [  N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
+
+      For other values of GROUP_NR and coarseness, this table can be computed like this:    
+        
+            bins = [1216,1280,1344,1408,1472,1536,1600,1664]
+            max_coarse = 8
+            fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else np.nan, range(max_coarse + 1)))
+            np.transpose(list(map(fncn, bins)))
+      
+      Also note that you can achieve the same downsampling / log file size reduction
+      by pre-processing (before inputting into this script) with half-bins.py.
+
+    * If you have not adjusted GROUP_NR for your (high latency) application, then you
+      will see the percentiles computed by this tool max out at the max latency bin
+      value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
+      a max latency of ~16.7 seconds in the red line):
+
+            https://www.cronburg.com/fio/max_latency_bin_value_bug.png
+    
+    * The motivation, design decisions, and implementation process are
+      described in further detail here:
+
+            https://www.cronburg.com/fio/cloud-latency-problem-measurement/
+
+    @author Karl Cronburg <karl.cronburg@gmail.com>
+"""
+import os
+import sys
+import pandas
+import numpy as np
+
+err = sys.stderr.write
+
+def weighted_percentile(percs, vs, ws):
+    """ Use linear interpolation to calculate the weighted percentile.
+        
+        Value and weight arrays are first sorted by value. The cumulative
+        distribution function (cdf) is then computed, after which np.interp
+        finds the two values closest to our desired weighted percentile(s)
+        and linearly interpolates them.
+        
+        percs  :: List of percentiles we want to calculate
+        vs     :: Array of values we are computing the percentile of
+        ws     :: Array of weights for our corresponding values
+        return :: Array of percentiles
+    """
+    idx = np.argsort(vs)
+    vs, ws = vs[idx], ws[idx] # weights and values sorted by value
+    cdf = 100 * (ws.cumsum() - ws / 2.0) / ws.sum()
+    return np.interp(percs, cdf, vs) # linear interpolation
+
+def weights(start_ts, end_ts, start, end):
+    """ Calculate weights based on fraction of sample falling in the
+        given interval [start,end]. Weights computed using vector / array
+        computation instead of for-loops.
+    
+        Note that samples with zero time length are effectively ignored
+        (we set their weight to zero).
+
+        start_ts :: Array of start times for a set of samples
+        end_ts   :: Array of end times for a set of samples
+        start    :: int
+        end      :: int
+        return   :: Array of weights
+    """
+    sbounds = np.maximum(start_ts, start).astype(float)
+    ebounds = np.minimum(end_ts,   end).astype(float)
+    ws = (ebounds - sbounds) / (end_ts - start_ts)
+    if np.any(np.isnan(ws)):
+      err("WARNING: zero-length sample(s) detected. Log file corrupt"
+          " / bad time values? Ignoring these samples.\n")
+    ws[np.where(np.isnan(ws))] = 0.0;
+    return ws
+
+def weighted_average(vs, ws):
+    return np.sum(vs * ws) / np.sum(ws)
+
+columns = ["end-time", "samples", "min", "avg", "median", "90%", "95%", "99%", "max"]
+percs   = [50, 90, 95, 99]
+
+def fmt_float_list(ctx, num=1):
+  """ Return a comma separated list of float formatters to the required number
+      of decimal places. For instance:
+
+        fmt_float_list(ctx.decimals=4, num=3) == "%.4f, %.4f, %.4f"
+  """
+  return ', '.join(["%%.%df" % ctx.decimals] * num)
+
+# Default values - see the beginning of main() for how we detect the number of columns in
+# the input files:
+__HIST_COLUMNS = 1216
+__NON_HIST_COLUMNS = 3
+__TOTAL_COLUMNS = __HIST_COLUMNS + __NON_HIST_COLUMNS
+    
+def sequential_diffs(head_row, times, rws, hists):
+    """ Take the difference of sequential (in time) histograms with the same
+        r/w direction, returning a new array of differenced histograms.  """
+    result = np.empty(shape=(0, __HIST_COLUMNS))
+    result_times = np.empty(shape=(1, 0))
+    for i in range(8):
+        idx = np.where(rws == i)
+        diff = np.diff(np.append(head_row[i], hists[idx], axis=0), axis=0).astype(int)
+        result = np.append(diff, result, axis=0)
+        result_times = np.append(times[idx], result_times)
+    idx = np.argsort(result_times)
+    return result[idx]
+
+def read_chunk(head_row, rdr, sz):
+    """ Read the next chunk of size sz from the given reader, computing the
+        differences across neighboring histogram samples.
+    """
+    try:
+        """ StopIteration occurs when the pandas reader is empty, and AttributeError
+            occurs if rdr is None due to the file being empty. """
+        new_arr = rdr.read().values
+    except (StopIteration, AttributeError):
+        return None    
+
+    """ Extract array of just the times, and histograms matrix without times column.
+        Then, take the sequential difference of each of the rows in the histogram
+        matrix. This is necessary because fio outputs *cumulative* histograms as
+        opposed to histograms with counts just for a particular interval. """
+    times, rws, szs = new_arr[:,0], new_arr[:,1], new_arr[:,2]
+    hists = new_arr[:,__NON_HIST_COLUMNS:]
+    hists_diff   = sequential_diffs(head_row, times, rws, hists)
+    times = times.reshape((len(times),1))
+    arr = np.append(times, hists_diff, axis=1)
+
+    """ hists[-1] will be the row we need to start our differencing with the
+        next time we call read_chunk() on the same rdr """
+    return arr, hists[-1]
+
+def get_min(fps, arrs):
+    """ Find the file with the current first row with the smallest start time """
+    return min([fp for fp in fps if not arrs[fp] is None], key=lambda fp: arrs.get(fp)[0][0][0])
+
+def histogram_generator(ctx, fps, sz):
+    
+    """ head_row for a particular file keeps track of the last (cumulative)
+        histogram we read so that we have a reference point to subtract off
+        when computing sequential differences. """
+    head_row  = np.zeros(shape=(1, __HIST_COLUMNS))
+    head_rows = {fp: {i: head_row for i in range(8)} for fp in fps}
+
+    # Create a chunked pandas reader for each of the files:
+    rdrs = {}
+    for fp in fps:
+        try:
+            rdrs[fp] = pandas.read_csv(fp, dtype=int, header=None, chunksize=sz)
+        except ValueError as e:
+            if e.message == 'No columns to parse from file':
+                if not ctx.nowarn: sys.stderr.write("WARNING: Empty input file encountered.\n")
+                rdrs[fp] = None
+            else:
+                raise(e)
+
+    # Initial histograms and corresponding head_rows:
+    arrs = {fp: read_chunk(head_rows[fp], rdr, sz) for fp,rdr in rdrs.items()}
+    while True:
+
+        try:
+            """ ValueError occurs when nothing more to read """
+            fp = get_min(fps, arrs)
+        except ValueError:
+            return
+        arr, head_row = arrs[fp]
+        yield np.insert(arr[0], 1, fps.index(fp))
+        arrs[fp] = arr[1:], head_row
+        head_rows[fp] = head_row
+
+        if arrs[fp][0].shape[0] == 0:
+            arrs[fp] = read_chunk(head_rows[fp], rdrs[fp], sz)
+
+def _plat_idx_to_val(idx, edge=0.5, FIO_IO_U_PLAT_BITS=6, FIO_IO_U_PLAT_VAL=64):
+    """ Taken from fio's stat.c for calculating the latency value of a bin
+        from that bin's index.
+        
+            idx  : the value of the index into the histogram bins
+            edge : fractional value in the range [0,1]** indicating how far into
+            the bin we wish to compute the latency value of.
+        
+        ** edge = 0.0 and 1.0 computes the lower and upper latency bounds
+           respectively of the given bin index. """
+
+    # MSB <= (FIO_IO_U_PLAT_BITS-1), cannot be rounded off. Use
+    # all bits of the sample as index
+    if (idx < (FIO_IO_U_PLAT_VAL << 1)):
+        return idx 
+
+    # Find the group and compute the minimum value of that group
+    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1 
+    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
+
+    # Find its bucket number of the group
+    k = idx % FIO_IO_U_PLAT_VAL
+
+    # Return the mean (if edge=0.5) of the range of the bucket
+    return base + ((k + edge) * (1 << error_bits))
+    
+def plat_idx_to_val_coarse(idx, coarseness, edge=0.5):
+    """ Converts the given *coarse* index into a non-coarse index as used by fio
+        in stat.h:plat_idx_to_val(), subsequently computing the appropriate
+        latency value for that bin.
+        """
+
+    # Multiply the index by the power of 2 coarseness to get the bin
+    # index with a max of 1536 bins (FIO_IO_U_PLAT_GROUP_NR = 24 in stat.h)
+    stride = 1 << coarseness
+    idx = idx * stride
+    lower = _plat_idx_to_val(idx, edge=0.0)
+    upper = _plat_idx_to_val(idx + stride, edge=1.0)
+    return lower + (upper - lower) * edge
+
+def print_all_stats(ctx, end, mn, ss_cnt, vs, ws, mx):
+    ps = weighted_percentile(percs, vs, ws)
+
+    avg = weighted_average(vs, ws)
+    values = [mn, avg] + list(ps) + [mx]
+    row = [end, ss_cnt] + map(lambda x: float(x) / ctx.divisor, values)
+    fmt = "%d, %d, %d, " + fmt_float_list(ctx, 5) + ", %d"
+    print (fmt % tuple(row))
+
+def update_extreme(val, fncn, new_val):
+    """ Calculate min / max in the presence of None values """
+    if val is None: return new_val
+    else: return fncn(val, new_val)
+
+# See beginning of main() for how bin_vals are computed
+bin_vals = []
+lower_bin_vals = [] # lower edge of each bin
+upper_bin_vals = [] # upper edge of each bin 
+
+def process_interval(ctx, samples, iStart, iEnd):
+    """ Construct the weighted histogram for the given interval by scanning
+        through all the histograms and figuring out which of their bins have
+        samples with latencies which overlap with the given interval
+        [iStart,iEnd].
+    """
+    
+    times, files, hists = samples[:,0], samples[:,1], samples[:,2:]
+    iHist = np.zeros(__HIST_COLUMNS)
+    ss_cnt = 0 # number of samples affecting this interval
+    mn_bin_val, mx_bin_val = None, None
+
+    for end_time,file,hist in zip(times,files,hists):
+            
+        # Only look at bins of the current histogram sample which
+        # started before the end of the current time interval [start,end]
+        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / 1000.0
+        idx = np.where(start_times < iEnd)
+        s_ts, l_bvs, u_bvs, hs = start_times[idx], lower_bin_vals[idx], upper_bin_vals[idx], hist[idx]
+
+        # Increment current interval histogram by weighted values of future histogram:
+        ws = hs * weights(s_ts, end_time, iStart, iEnd)
+        iHist[idx] += ws
+    
+        # Update total number of samples affecting current interval histogram:
+        ss_cnt += np.sum(hs)
+        
+        # Update min and max bin values seen if necessary:
+        idx = np.where(hs != 0)[0]
+        if idx.size > 0:
+            mn_bin_val = update_extreme(mn_bin_val, min, l_bvs[max(0,           idx[0]  - 1)])
+            mx_bin_val = update_extreme(mx_bin_val, max, u_bvs[min(len(hs) - 1, idx[-1] + 1)])
+
+    if ss_cnt > 0: print_all_stats(ctx, iEnd, mn_bin_val, ss_cnt, bin_vals, iHist, mx_bin_val)
+
+def guess_max_from_bins(ctx, hist_cols):
+    """ Try to guess the GROUP_NR from given # of histogram
+        columns seen in an input file """
+    max_coarse = 8
+    if ctx.group_nr < 19 or ctx.group_nr > 26:
+        bins = [ctx.group_nr * (1 << 6)]
+    else:
+        bins = [1216,1280,1344,1408,1472,1536,1600,1664]
+    coarses = range(max_coarse + 1)
+    fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else -10, coarses))
+    
+    arr = np.transpose(list(map(fncn, bins)))
+    idx = np.where(arr == hist_cols)
+    if len(idx[1]) == 0:
+        table = repr(arr.astype(int)).replace('-10', 'N/A').replace('array','     ')
+        err("Unable to determine bin values from input clat_hist files. Namely \n"
+            "the first line of file '%s' " % ctx.FILE[0] + "has %d \n" % (__TOTAL_COLUMNS,) +
+            "columns of which we assume %d " % (hist_cols,) + "correspond to histogram bins. \n"
+            "This number needs to be equal to one of the following numbers:\n\n"
+            + table + "\n\n"
+            "Possible reasons and corresponding solutions:\n"
+            "  - Input file(s) does not contain histograms.\n"
+            "  - You recompiled fio with a different GROUP_NR. If so please specify this\n"
+            "    new GROUP_NR on the command line with --group_nr\n")
+        exit(1)
+    return bins[idx[1][0]]
+
+def main(ctx):
+
+    # Automatically detect how many columns are in the input files,
+    # calculate the corresponding 'coarseness' parameter used to generate
+    # those files, and calculate the appropriate bin latency values:
+    with open(ctx.FILE[0], 'r') as fp:
+        global bin_vals,lower_bin_vals,upper_bin_vals,__HIST_COLUMNS,__TOTAL_COLUMNS
+        __TOTAL_COLUMNS = len(fp.readline().split(','))
+        __HIST_COLUMNS = __TOTAL_COLUMNS - __NON_HIST_COLUMNS
+
+        max_cols = guess_max_from_bins(ctx, __HIST_COLUMNS)
+        coarseness = int(np.log2(float(max_cols) / __HIST_COLUMNS))
+        bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness), np.arange(__HIST_COLUMNS)), dtype=float)
+        lower_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 0.0), np.arange(__HIST_COLUMNS)), dtype=float)
+        upper_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 1.0), np.arange(__HIST_COLUMNS)), dtype=float)
+
+    fps = [open(f, 'r') for f in ctx.FILE]
+    gen = histogram_generator(ctx, fps, ctx.buff_size)
+
+    print(', '.join(columns))
+
+    try:
+        start, end = 0, ctx.interval
+        arr = np.empty(shape=(0,__TOTAL_COLUMNS - 1))
+        more_data = True
+        while more_data or len(arr) > 0:
+            
+            # Read up to ctx.max_latency (default 20 seconds) of data from end of current interval.
+            while len(arr) == 0 or arr[-1][0] < ctx.max_latency * 1000 + end:
+                try:
+                    new_arr = next(gen)
+                except StopIteration:
+                    more_data = False
+                    break
+                arr = np.append(arr, new_arr.reshape((1,__TOTAL_COLUMNS - 1)), axis=0)
+            arr = arr.astype(int)
+            
+            if arr.size > 0:
+                process_interval(ctx, arr, start, end)
+                
+                # Update arr to throw away samples we no longer need - samples which
+                # end before the start of the next interval, i.e. the end of the
+                # current interval:
+                idx = np.where(arr[:,0] > end)
+                arr = arr[idx]
+            
+            start += ctx.interval
+            end = start + ctx.interval
+    finally:
+        map(lambda f: f.close(), fps)
+
+
+if __name__ == '__main__':
+    import argparse
+    p = argparse.ArgumentParser()
+    arg = p.add_argument
+    arg("FILE", help='space separated list of latency log filenames', nargs='+')
+    arg('--buff_size',
+        default=10000,
+        type=int,
+        help='number of samples to buffer into numpy at a time')
+
+    arg('--max_latency',
+        default=20,
+        type=float,
+        help='number of seconds of data to process at a time')
+
+    arg('-i', '--interval',
+        default=1000,
+        type=int,
+        help='interval width (ms)')
+
+    arg('-d', '--divisor',
+        required=False,
+        type=int,
+        default=1,
+        help='divide the results by this value.')
+
+    arg('--decimals',
+        default=3,
+        type=int,
+        help='number of decimal places to print floats to')
+
+    arg('--nowarn',
+        dest='nowarn',
+        action='store_true',
+        default=False,
+        help='do not print any warning messages to stderr')
+
+    arg('--group_nr',
+        default=19,
+        type=int,
+        help='FIO_IO_U_PLAT_GROUP_NR as defined in stat.h')
+
+    main(p.parse_args())
+
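A minimal sketch for sanity-checking what a histogram column means. Each column is one fine-grained io_u_plat bin, and _plat_idx_to_val() above converts a bin index to a latency. The standalone Python 3 version below restates that mapping. It assumes FIO_IO_U_PLAT_BITS = 6 and FIO_IO_U_PLAT_GROUP_NR = 19 (1216 bins), with values in usec as in this version of stat.c. `bin_edges` is a hypothetical helper name.

```python
PLAT_BITS = 6              # assumed FIO_IO_U_PLAT_BITS for this fio version
PLAT_VAL = 1 << PLAT_BITS  # 64 buckets per group


def plat_idx_to_val(idx, edge=0.5):
    """Latency (usec) at fractional position `edge` inside fine bin `idx`.

    Bins below 2 * PLAT_VAL hold exact values. Above that, each group of
    PLAT_VAL bins covers a power-of-two range split into equal-width buckets.
    """
    if idx < (PLAT_VAL << 1):
        return idx
    error_bits = (idx >> PLAT_BITS) - 1     # group number, minus one
    base = 1 << (error_bits + PLAT_BITS)    # smallest latency in the group
    k = idx % PLAT_VAL                      # bucket within the group
    return base + (k + edge) * (1 << error_bits)


def bin_edges(idx):
    """Hypothetical helper: (lower, upper) latency bounds of fine bin idx."""
    return plat_idx_to_val(idx, 0.0), plat_idx_to_val(idx, 1.0)
```

Under these assumptions, bins 0 to 127 hold exact values, and each later group of 64 bins doubles the bucket width. The upper edge of bin 1215 is 2**24 usec, about 16.8 s.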
diff --git a/tools/hist/half-bins.py b/tools/hist/half-bins.py
new file mode 100755
index 0000000..d592af0
--- /dev/null
+++ b/tools/hist/half-bins.py
@@ -0,0 +1,38 @@
+#!/usr/bin/env python2.7
+""" Cut the number of bins in half in fio histogram output. Example usage:
+
+        $ half-bins.py -c 2 output_clat_hist.1.log > smaller_clat_hist.1.log
+
+    Which merges e.g. bins [0 .. 3], [4 .. 7], ..., [1212 .. 1215] resulting in
+    304 = 1216 / (2**2) merged bins per histogram sample.
+
+    @author Karl Cronburg <karl.cronburg@gmail.com>
+"""
+import sys
+
+def main(ctx):
+    stride = 1 << ctx.coarseness
+    with open(ctx.FILENAME, 'r') as fp:
+        for line in fp.readlines():
+            vals = line.split(', ')
+            sys.stdout.write("%s, %s, %s, " % tuple(vals[:3]))
+
+            hist = list(map(int, vals[3:]))
+            for i in range(0, len(hist) - stride, stride):
+                sys.stdout.write("%d, " % sum(hist[i : i + stride],))
+            sys.stdout.write("%d\n" % sum(hist[len(hist) - stride:]))
+
+if __name__ == '__main__':
+    import argparse
+    p = argparse.ArgumentParser()
+    arg = p.add_argument
+    arg( 'FILENAME', help='clat_hist file for which we will reduce'
+                         ' (by half or more) the number of bins.')
+    arg('-c', '--coarseness',
+       default=1,
+       type=int,
+       help='number of times to reduce number of bins by half, '
+            'e.g. coarseness of 4 merges each 2^4 = 16 consecutive '
+            'bins.')
+    main(p.parse_args())
+
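half-bins.py reduces the bin count by summing each run of 2**coarseness adjacent bins. flush_hist_samples() in iolog.c does the same in C when log_hist_coarseness is set. The minimal Python 3 sketch below shows that reduction. It assumes comma-separated lines laid out as 'time, ddir, bs, bin0, ...'. The names `merge_bins` and `merge_line` are hypothetical.

```python
def merge_bins(hist, coarseness):
    """Sum each run of 2**coarseness consecutive bins of one histogram sample."""
    stride = 1 << coarseness
    if len(hist) % stride:
        raise ValueError("bin count %d is not a multiple of %d" % (len(hist), stride))
    return [sum(hist[i:i + stride]) for i in range(0, len(hist), stride)]


def merge_line(line, coarseness):
    """Rewrite one 'time, ddir, bs, bin0, ...' log line with merged bins."""
    vals = [v.strip() for v in line.split(",")]
    head, hist = vals[:3], [int(v) for v in vals[3:]]
    return ", ".join(head + [str(c) for c in merge_bins(hist, coarseness)])
```

This sketch requires the bin count to be a multiple of the stride. The loop in half-bins.py instead sums the final 2**coarseness bins separately, which overlaps the previous group when the count does not divide evenly. With 1216 bins and coarseness 0 to 6, the counts always divide evenly, giving 1216, 608, 304, 152, 76, 38 and 19 bins.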


* Recent changes (master)
@ 2016-08-08 12:00 Jens Axboe
  2016-08-08 13:31 ` Erwan Velu
From: Jens Axboe @ 2016-08-08 12:00 UTC
  To: fio

The following changes since commit 5fd31680d0370c6b71ccfa456ade211477af81d6:

  Revert "filesetup: ensure that we catch a file flagged for extend" (2016-08-04 19:41:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 93168285bc564941d832deea172dc1f68de68666:

  stat: fixups to histogram logging (2016-08-07 15:18:38 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Merge branch 'histograms-PR' of https://github.com/cronburg/fio
      server: bump protocol version
      iolog: style updates
      stat: fixups to histogram logging

Karl Cronburg (1):
      This commit / feature adds completion latency histogram output to fio, piggybacking on the existing histograms recorded by stat.c and adding the following command line options:

 HOWTO                           |  22 ++
 cconv.c                         |   5 +
 fio.1                           |  29 +++
 fio.h                           |   1 +
 init.c                          |  36 +++
 iolog.c                         |  73 +++++-
 iolog.h                         |  16 ++
 options.c                       |  31 +++
 server.h                        |   2 +-
 stat.c                          |  40 ++++
 thread_options.h                |   6 +
 tools/hist/.gitignore           |   3 +
 tools/hist/fiologparser_hist.py | 486 ++++++++++++++++++++++++++++++++++++++++
 tools/hist/half-bins.py         |  38 ++++
 14 files changed, 785 insertions(+), 3 deletions(-)
 create mode 100644 tools/hist/.gitignore
 create mode 100755 tools/hist/fiologparser_hist.py
 create mode 100755 tools/hist/half-bins.py

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d18d59b..0085b74 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1610,6 +1610,14 @@ write_lat_log=str Same as write_bw_log, except that this option stores io
 		the filename will not include the job index. See 'Log File
 		Formats'.
 
+write_hist_log=str Same as write_lat_log, but writes I/O completion
+		latency histograms. If no filename is given with this option, the
+		default filename of "jobname_clat_hist.x.log" is used, where x is
+		the index of the job (1..N, where N is the number of jobs). Even
+		if the filename is given, fio will still append the type of log.
+		If per_job_logs is false, then the filename will not include the
+		job index. See 'Log File Formats'.
+
 write_iops_log=str Same as write_bw_log, but writes IOPS. If no filename is
 		given with this option, the default filename of
 		"jobname_type.x.log" is used,where x is the index of the job
@@ -1625,6 +1633,20 @@ log_avg_msec=int By default, fio will log an entry in the iops, latency,
 		specified period of time, reducing the resolution of the log.
 		See log_max_value as well. Defaults to 0, logging all entries.
 
+log_hist_msec=int Same as log_avg_msec, but logs entries for completion
+		latency histograms. Computing latency percentiles from averages of
+		intervals using log_avg_msec is inaccurate. Setting this option makes
+		fio log histogram entries over the specified period of time, reducing
+		log sizes for high IOPS devices while retaining percentile accuracy.
+		See log_hist_coarseness as well. Defaults to 0, meaning histogram
+		logging is disabled.
+
+log_hist_coarseness=int Integer ranging from 0 to 6, defining the coarseness
+		of the resolution of the histogram logs enabled with log_hist_msec. For
+		each increment in coarseness, fio outputs half as many bins. Defaults to
+		0, for which histogram logs contain 1216 latency bins. See
+		'Log File Formats'.
+
 log_max_value=bool	If log_avg_msec is set, fio logs the average over that
 		window. If you instead want to log the maximum value, set this
 		option to 1. Defaults to 0, meaning that averaged values are
diff --git a/cconv.c b/cconv.c
index ac826a3..837963d 100644
--- a/cconv.c
+++ b/cconv.c
@@ -39,6 +39,7 @@ static void free_thread_options_to_cpu(struct thread_options *o)
 	free(o->bw_log_file);
 	free(o->lat_log_file);
 	free(o->iops_log_file);
+	free(o->hist_log_file);
 	free(o->replay_redirect);
 	free(o->exec_prerun);
 	free(o->exec_postrun);
@@ -74,6 +75,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	string_to_cpu(&o->bw_log_file, top->bw_log_file);
 	string_to_cpu(&o->lat_log_file, top->lat_log_file);
 	string_to_cpu(&o->iops_log_file, top->iops_log_file);
+	string_to_cpu(&o->hist_log_file, top->hist_log_file);
 	string_to_cpu(&o->replay_redirect, top->replay_redirect);
 	string_to_cpu(&o->exec_prerun, top->exec_prerun);
 	string_to_cpu(&o->exec_postrun, top->exec_postrun);
@@ -178,6 +180,8 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->allrand_repeatable = le32_to_cpu(top->allrand_repeatable);
 	o->rand_seed = le64_to_cpu(top->rand_seed);
 	o->log_avg_msec = le32_to_cpu(top->log_avg_msec);
+	o->log_hist_msec = le32_to_cpu(top->log_hist_msec);
+	o->log_hist_coarseness = le32_to_cpu(top->log_hist_coarseness);
 	o->log_max = le32_to_cpu(top->log_max);
 	o->log_offset = le32_to_cpu(top->log_offset);
 	o->log_gz = le32_to_cpu(top->log_gz);
@@ -309,6 +313,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	string_to_net(top->bw_log_file, o->bw_log_file);
 	string_to_net(top->lat_log_file, o->lat_log_file);
 	string_to_net(top->iops_log_file, o->iops_log_file);
+	string_to_net(top->hist_log_file, o->hist_log_file);
 	string_to_net(top->replay_redirect, o->replay_redirect);
 	string_to_net(top->exec_prerun, o->exec_prerun);
 	string_to_net(top->exec_postrun, o->exec_postrun);
diff --git a/fio.1 b/fio.1
index 85eb0fe..d1acebc 100644
--- a/fio.1
+++ b/fio.1
@@ -1476,6 +1476,14 @@ N is the number of jobs). Even if the filename is given, fio will still
 append the type of log. If \fBper_job_logs\fR is false, then the filename will
 not include the job index. See the \fBLOG FILE FORMATS\fR section.
 .TP
+.BI write_hist_log \fR=\fPstr
+Same as \fBwrite_lat_log\fR, but writes I/O completion latency histograms. If
+no filename is given with this option, the default filename of
+"jobname_clat_hist.x.log" is used, where x is the index of the job (1..N, where
+N is the number of jobs). Even if the filename is given, fio will still append
+the type of log. If \fBper_job_logs\fR is false, then the filename will not
+include the job index. See the \fBLOG FILE FORMATS\fR section.
+.TP
 .BI write_iops_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given with this
 option, the default filename of "jobname_type.x.log" is used, where x is the
@@ -1496,6 +1504,20 @@ If \fBlog_avg_msec\fR is set, fio logs the average over that window. If you
 instead want to log the maximum value, set this option to 1.  Defaults to
 0, meaning that averaged values are logged.
 .TP
+.BI log_hist_msec \fR=\fPint
+Same as \fBlog_avg_msec\fR, but logs entries for completion latency histograms.
+Computing latency percentiles from averages of intervals using \fBlog_avg_msec\fR
+is inaccurate. Setting this option makes fio log histogram entries over the
+specified period of time, reducing log sizes for high IOPS devices while
+retaining percentile accuracy. See \fBlog_hist_coarseness\fR as well. Defaults
+to 0, meaning histogram logging is disabled.
+.TP
+.BI log_hist_coarseness \fR=\fPint
+Integer ranging from 0 to 6, defining the coarseness of the resolution of the
+histogram logs enabled with \fBlog_hist_msec\fR. For each increment in
+coarseness, fio outputs half as many bins. Defaults to 0, for which histogram
+logs contain 1216 latency bins. See the \fBLOG FILE FORMATS\fR section.
+.TP
 .BI log_offset \fR=\fPbool
 If this is set, the iolog options will include the byte offset for the IO
 entry as well as the other data values.
@@ -2302,6 +2324,13 @@ they aren't applicable if windowed logging is enabled. If windowed logging
 is enabled and \fBlog_max_value\fR is set, then fio logs maximum values in
 that window instead of averages.
 
+For histogram logging the logs look like this:
+
+.B time (msec), data direction, block-size, bin 0, bin 1, ..., bin 1215
+
+Where 'bin i' gives the frequency of IO requests with a latency falling in
+the i-th bin. See \fBlog_hist_coarseness\fR for logging fewer bins.
+
 .RE
 
 .SH CLIENT / SERVER
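Each histogram log line is a copy of fio's running io_u_plat counts for one data direction: add_clat_sample() in stat.c copies td->ts.io_u_plat[ddir]. Per-interval counts therefore come from differencing consecutive lines of the same direction, as fiologparser_hist.py does. The minimal Python 3 sketch below shows that step, followed by a simple nearest-rank percentile. The function names are hypothetical. The caller supplies one latency value per bin, for example bin midpoints.

```python
def parse_hist_line(line):
    """Split 'time, ddir, bs, bin0, ..., binN' into (time, ddir, bs, counts)."""
    fields = [int(v) for v in line.split(",")]  # int() tolerates surrounding spaces
    return fields[0], fields[1], fields[2], fields[3:]


def interval_hists(lines):
    """Yield (time, ddir, per-interval counts) by differencing the cumulative
    counts of consecutive lines that share a data direction."""
    last = {}  # ddir -> previous cumulative counts
    for line in lines:
        time, ddir, _bs, counts = parse_hist_line(line)
        prev = last.get(ddir, [0] * len(counts))
        yield time, ddir, [c - p for c, p in zip(counts, prev)]
        last[ddir] = counts


def hist_percentile(counts, bin_vals, pct):
    """Nearest-rank percentile: value of the first bin whose running count
    reaches pct% of the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples in this interval")
    threshold = total * pct / 100.0
    running = 0
    for count, val in zip(counts, bin_vals):
        running += count
        if running >= threshold:
            return val
    return bin_vals[-1]
```

This percentile rule is simpler than the weighted interpolation in fiologparser_hist.py. It is only meant to show that bin counts keep tail information. For example, counts [99, 0, 1] over values [100, 1000, 10000] have a mean of 199, but their 99.5th percentile is 10000.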
diff --git a/fio.h b/fio.h
index 87a94f6..d929467 100644
--- a/fio.h
+++ b/fio.h
@@ -141,6 +141,7 @@ struct thread_data {
 
 	struct io_log *slat_log;
 	struct io_log *clat_log;
+	struct io_log *clat_hist_log;
 	struct io_log *lat_log;
 	struct io_log *bw_log;
 	struct io_log *iops_log;
diff --git a/init.c b/init.c
index f81db3c..048bd5d 100644
--- a/init.c
+++ b/init.c
@@ -1418,6 +1418,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_LAT,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1442,10 +1444,36 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 				td->thread_number, suf, o->per_job_logs);
 		setup_log(&td->clat_log, &p, logname);
 	}
+
+	if (o->hist_log_file) {
+		struct log_params p = {
+			.td = td,
+			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
+			.log_type = IO_LOG_TYPE_HIST,
+			.log_offset = o->log_offset,
+			.log_gz = o->log_gz,
+			.log_gz_store = o->log_gz_store,
+		};
+		const char *suf;
+
+		if (p.log_gz_store)
+			suf = "log.fz";
+		else
+			suf = "log";
+
+		gen_log_name(logname, sizeof(logname), "clat_hist", o->hist_log_file,
+				td->thread_number, suf, o->per_job_logs);
+		setup_log(&td->clat_hist_log, &p, logname);
+	}
+
 	if (o->bw_log_file) {
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_BW,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1457,6 +1485,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->bw_avg_time);
 		else
 			o->bw_avg_time = p.avg_msec;
+	
+		p.hist_msec = o->log_hist_msec;
+		p.hist_coarseness = o->log_hist_coarseness;
 
 		if (p.log_gz_store)
 			suf = "log.fz";
@@ -1471,6 +1502,8 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		struct log_params p = {
 			.td = td,
 			.avg_msec = o->log_avg_msec,
+			.hist_msec = o->log_hist_msec,
+			.hist_coarseness = o->log_hist_coarseness,
 			.log_type = IO_LOG_TYPE_IOPS,
 			.log_offset = o->log_offset,
 			.log_gz = o->log_gz,
@@ -1482,6 +1515,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 			p.avg_msec = min(o->log_avg_msec, o->iops_avg_time);
 		else
 			o->iops_avg_time = p.avg_msec;
+	
+		p.hist_msec = o->log_hist_msec;
+		p.hist_coarseness = o->log_hist_coarseness;
 
 		if (p.log_gz_store)
 			suf = "log.fz";
diff --git a/iolog.c b/iolog.c
index 4c87f1c..a9cbd5b 100644
--- a/iolog.c
+++ b/iolog.c
@@ -584,6 +584,8 @@ void setup_log(struct io_log **log, struct log_params *p,
 	l->log_gz = p->log_gz;
 	l->log_gz_store = p->log_gz_store;
 	l->avg_msec = p->avg_msec;
+	l->hist_msec = p->hist_msec;
+	l->hist_coarseness = p->hist_coarseness;
 	l->filename = strdup(filename);
 	l->td = p->td;
 
@@ -659,6 +661,48 @@ void free_log(struct io_log *log)
 	sfree(log);
 }
 
+static inline int hist_sum(int j, int stride, unsigned int *io_u_plat)
+{
+	int k, sum;
+
+	for (k = sum = 0; k < stride; k++)
+		sum += io_u_plat[j + k];
+
+	return sum;
+}
+
+void flush_hist_samples(FILE *f, int hist_coarseness, void *samples,
+			uint64_t sample_size)
+{
+	struct io_sample *s;
+	int log_offset;
+	uint64_t i, j, nr_samples;
+	unsigned int *io_u_plat;
+
+	int stride = 1 << hist_coarseness;
+	
+	if (!sample_size)
+		return;
+
+	s = __get_sample(samples, 0, 0);
+	log_offset = (s->__ddir & LOG_OFFSET_SAMPLE_BIT) != 0;
+
+	nr_samples = sample_size / __log_entry_sz(log_offset);
+
+	for (i = 0; i < nr_samples; i++) {
+		s = __get_sample(samples, log_offset, i);
+		io_u_plat = (unsigned int *) s->val;
+		fprintf(f, "%lu, %u, %u, ", (unsigned long)s->time,
+		        io_sample_ddir(s), s->bs);
+		for (j = 0; j < FIO_IO_U_PLAT_NR - stride; j += stride) {
+			fprintf(f, "%lu, ", (unsigned long) hist_sum(j, stride, io_u_plat)); 
+		}
+		fprintf(f, "%lu\n", (unsigned long) 
+		        hist_sum(FIO_IO_U_PLAT_NR - stride, stride, io_u_plat));
+		free(io_u_plat);
+	}
+}
+
 void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 {
 	struct io_sample *s;
@@ -988,7 +1032,13 @@ void flush_log(struct io_log *log, bool do_append)
 
 		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
 		flist_del_init(&cur_log->list);
-		flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+		
+		if (log == log->td->clat_hist_log)
+			flush_hist_samples(f, log->hist_coarseness, cur_log->log,
+			                   cur_log->nr_samples * log_entry_sz(log));
+		else
+			flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+		
 		sfree(cur_log);
 	}
 
@@ -1353,6 +1403,20 @@ static int write_clat_log(struct thread_data *td, int try, bool unit_log)
 	return ret;
 }
 
+static int write_clat_hist_log(struct thread_data *td, int try, bool unit_log)
+{
+	int ret;
+
+	if (!unit_log)
+		return 0;
+
+	ret = __write_log(td, td->clat_hist_log, try);
+	if (!ret)
+		td->clat_hist_log = NULL;
+
+	return ret;
+}
+
 static int write_lat_log(struct thread_data *td, int try, bool unit_log)
 {
 	int ret;
@@ -1387,8 +1451,9 @@ enum {
 	SLAT_LOG_MASK	= 4,
 	CLAT_LOG_MASK	= 8,
 	IOPS_LOG_MASK	= 16,
+	CLAT_HIST_LOG_MASK = 32,
 
-	ALL_LOG_NR	= 5,
+	ALL_LOG_NR	= 6,
 };
 
 struct log_type {
@@ -1417,6 +1482,10 @@ static struct log_type log_types[] = {
 		.mask	= IOPS_LOG_MASK,
 		.fn	= write_iops_log,
 	},
+	{
+		.mask	= CLAT_HIST_LOG_MASK,
+		.fn	= write_clat_hist_log,
+	}
 };
 
 void td_writeout_logs(struct thread_data *td, bool unit_logs)
diff --git a/iolog.h b/iolog.h
index 0438fa7..011179a 100644
--- a/iolog.h
+++ b/iolog.h
@@ -18,6 +18,11 @@ struct io_stat {
 	fio_fp64_t S;
 };
 
+struct io_hist {
+	uint64_t samples;
+	unsigned long hist_last;
+};
+
 /*
  * A single data sample
  */
@@ -39,6 +44,7 @@ enum {
 	IO_LOG_TYPE_SLAT,
 	IO_LOG_TYPE_BW,
 	IO_LOG_TYPE_IOPS,
+	IO_LOG_TYPE_HIST,
 };
 
 #define DEF_LOG_ENTRIES		1024
@@ -103,6 +109,14 @@ struct io_log {
 	unsigned long avg_msec;
 	unsigned long avg_last;
 
+  /*
+   * Windowed latency histograms, for keeping track of when we need to
+   * save a copy of the histogram every approximately hist_msec milliseconds.
+   */
+	struct io_hist hist_window[DDIR_RWDIR_CNT];
+	unsigned long hist_msec;
+	int hist_coarseness;
+
 	pthread_mutex_t chunk_lock;
 	unsigned int chunk_seq;
 	struct flist_head chunk_list;
@@ -218,6 +232,8 @@ extern int iolog_file_inflate(const char *);
 struct log_params {
 	struct thread_data *td;
 	unsigned long avg_msec;
+	unsigned long hist_msec;
+	int hist_coarseness;
 	int log_type;
 	int log_offset;
 	int log_gz;
diff --git a/options.c b/options.c
index 4c56dbe..56d3e2b 100644
--- a/options.c
+++ b/options.c
@@ -3530,6 +3530,37 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "log_hist_msec",
+		.lname	= "Log histograms (msec)",
+		.type	= FIO_OPT_INT,
+		.off1	= td_var_offset(log_hist_msec),
+		.help	= "Dump completion latency histograms at frequency of this time value",
+		.def	= "0",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "log_hist_coarseness",
+		.lname	= "Histogram logs coarseness",
+		.type	= FIO_OPT_INT,
+		.off1	= td_var_offset(log_hist_coarseness),
+		.help	= "Integer in range [0,6]. Higher coarseness outputs"
+			" fewer histogram bins per sample. The number of bins for"
+			" these are [1216, 608, 304, 152, 76, 38, 19] respectively.",
+		.def	= "0",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= "write_hist_log",
+		.lname	= "Write latency histogram logs",
+		.type	= FIO_OPT_STR_STORE,
+		.off1	= td_var_offset(hist_log_file),
+		.help	= "Write log of latency histograms during run",
+		.category = FIO_OPT_C_LOG,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "log_max_value",
 		.lname	= "Log maximum instead of average",
 		.type	= FIO_OPT_BOOL,
diff --git a/server.h b/server.h
index 79c751d..c17c3bb 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 54,
+	FIO_SERVER_VER			= 55,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index d6787b7..ef9fe7d 100644
--- a/stat.c
+++ b/stat.c
@@ -1965,6 +1965,7 @@ void regrow_logs(struct thread_data *td)
 {
 	regrow_log(td->slat_log);
 	regrow_log(td->clat_log);
+	regrow_log(td->clat_hist_log);
 	regrow_log(td->lat_log);
 	regrow_log(td->bw_log);
 	regrow_log(td->iops_log);
@@ -2195,7 +2196,9 @@ static void add_clat_percentile_sample(struct thread_stat *ts,
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long usec, unsigned int bs, uint64_t offset)
 {
+	unsigned long elapsed, this_window;
 	struct thread_stat *ts = &td->ts;
+	struct io_log *iolog = td->clat_hist_log;
 
 	td_io_u_lock(td);
 
@@ -2207,6 +2210,43 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	if (ts->clat_percentiles)
 		add_clat_percentile_sample(ts, usec, ddir);
 
+	if (iolog && iolog->hist_msec) {
+		struct io_hist *hw = &iolog->hist_window[ddir];
+
+		hw->samples++;
+		elapsed = mtime_since_now(&td->epoch);
+		if (!hw->hist_last)
+			hw->hist_last = elapsed;
+		this_window = elapsed - hw->hist_last;
+		
+		if (this_window >= iolog->hist_msec) {
+			unsigned int *io_u_plat;
+			unsigned int *dst;
+
+			/*
+			 * Make a byte-for-byte copy of the latency histogram
+			 * stored in td->ts.io_u_plat[ddir], recording it in a
+			 * log sample. Note that the matching call to free() is
+			 * located in iolog.c after printing this sample to the
+			 * log file.
+			 */
+			io_u_plat = (unsigned int *) td->ts.io_u_plat[ddir];
+			dst = malloc(FIO_IO_U_PLAT_NR * sizeof(unsigned int));
+			memcpy(dst, io_u_plat,
+				FIO_IO_U_PLAT_NR * sizeof(unsigned int));
+			__add_log_sample(iolog, (unsigned long )dst, ddir, bs,
+						elapsed, offset);
+
+			/*
+			 * Update the last time we recorded as being now, minus
+			 * any drift in time we encountered before actually
+			 * making the record.
+			 */
+			hw->hist_last = elapsed - (this_window - iolog->hist_msec);
+			hw->samples = 0;
+		}
+	}
+
 	td_io_u_unlock(td);
 }
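The windowing in add_clat_sample() can be traced in isolation. Setting hist_last to elapsed - (this_window - hist_msec) simplifies to hist_last + hist_msec. Window boundaries therefore advance by exactly hist_msec from the first completion, rather than drifting with completion timing. The minimal Python 3 sketch below models that bookkeeping for one data direction. It takes a list of elapsed-ms completion times. `snapshot_times` is a hypothetical name.

```python
def snapshot_times(elapsed_ms, hist_msec):
    """Return the elapsed times (ms) at which a histogram copy would be logged
    for one data direction, following add_clat_sample()'s bookkeeping."""
    hist_last = 0
    logged = []
    for elapsed in elapsed_ms:
        if not hist_last:          # 0 doubles as "not started yet", as in the C code
            hist_last = elapsed
        this_window = elapsed - hist_last
        if this_window >= hist_msec:
            logged.append(elapsed)
            # Same as hist_last += hist_msec: the window boundary does not drift.
            hist_last = elapsed - (this_window - hist_msec)
    return logged
```

The sketch also shows one consequence of this design. After a gap longer than one window, the boundary advances by only one hist_msec per logged sample. Each of the next few completions then logs a snapshot until the boundary catches up.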
 
diff --git a/thread_options.h b/thread_options.h
index edf090d..449c66f 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -128,6 +128,8 @@ struct thread_options {
 	unsigned long long rand_seed;
 	unsigned int dep_use_os_rand;
 	unsigned int log_avg_msec;
+	unsigned int log_hist_msec;
+	unsigned int log_hist_coarseness;
 	unsigned int log_max;
 	unsigned int log_offset;
 	unsigned int log_gz;
@@ -232,6 +234,7 @@ struct thread_options {
 	char *bw_log_file;
 	char *lat_log_file;
 	char *iops_log_file;
+	char *hist_log_file;
 	char *replay_redirect;
 
 	/*
@@ -382,6 +385,8 @@ struct thread_options_pack {
 	uint64_t rand_seed;
 	uint32_t dep_use_os_rand;
 	uint32_t log_avg_msec;
+	uint32_t log_hist_msec;
+	uint32_t log_hist_coarseness;
 	uint32_t log_max;
 	uint32_t log_offset;
 	uint32_t log_gz;
@@ -486,6 +491,7 @@ struct thread_options_pack {
 	uint8_t bw_log_file[FIO_TOP_STR_MAX];
 	uint8_t lat_log_file[FIO_TOP_STR_MAX];
 	uint8_t iops_log_file[FIO_TOP_STR_MAX];
+	uint8_t hist_log_file[FIO_TOP_STR_MAX];
 	uint8_t replay_redirect[FIO_TOP_STR_MAX];
 
 	/*
diff --git a/tools/hist/.gitignore b/tools/hist/.gitignore
new file mode 100644
index 0000000..4f875da
--- /dev/null
+++ b/tools/hist/.gitignore
@@ -0,0 +1,3 @@
+*.pyc
+*.ipynb
+.ipynb_checkpoints
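fiologparser_hist.py treats each logged row as a cumulative histogram. Each row is a copy of td->ts.io_u_plat[ddir], which keeps accumulating across log entries, so the script's sequential_diffs() subtracts the previous row of the same direction. The following is a simplified standalone sketch of that differencing step, not the script's own code. The function name is hypothetical, and the input is assumed to be already split into timestamp, direction and bin columns:

```python
import numpy as np


def difference_per_direction(times, ddirs, hists):
    """Convert cumulative histogram rows into per-interval bin counts.

    times : (n,) log timestamps in ms
    ddirs : (n,) data direction of each row
    hists : (n, bins) cumulative counts, as written to the *_clat_hist log
    """
    out = np.zeros_like(hists)
    for d in np.unique(ddirs):
        idx = np.flatnonzero(ddirs == d)
        # The first row of each direction is differenced against zeros.
        prev = np.vstack([np.zeros((1, hists.shape[1]), dtype=hists.dtype),
                          hists[idx[:-1]]])
        out[idx] = hists[idx] - prev
    order = np.argsort(times, kind="stable")
    return times[order], out[order]


times = np.array([1000, 1000, 2000, 2000])
ddirs = np.array([0, 1, 0, 1])            # read, write, read, write
hists = np.array([[1, 2], [0, 3], [4, 2], [1, 5]])
t, diffs = difference_per_direction(times, ddirs, hists)
```

Directions are tracked separately, so the second read row becomes [3, 0] (that is, [4, 2] - [1, 2]) and is never differenced against a write row.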
diff --git a/tools/hist/fiologparser_hist.py b/tools/hist/fiologparser_hist.py
new file mode 100755
index 0000000..ce98d2e
--- /dev/null
+++ b/tools/hist/fiologparser_hist.py
@@ -0,0 +1,486 @@
+#!/usr/bin/env python2.7
+""" 
+    Utility for converting *_clat_hist* files generated by fio into latency statistics.
+    
+    Example usage:
+    
+            $ fiologparser_hist.py *_clat_hist*
+            end-time, samples, min, avg, median, 90%, 95%, 99%, max
+            1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
+            2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
+            4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
+            ...
+    
+    Notes:
+
+    * end-times are calculated to be uniform increments of the --interval value given,
+      regardless of when histogram samples are reported. Of note:
+        
+        * Intervals with no samples are omitted. In the example above this means
+          "no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
+          of the interval from 3 to 4 seconds".
+        
+        * Intervals with a single sample will have the same value for all statistics
+        
+    * The number of samples is unweighted, corresponding to the total number of samples
+      which have any effect whatsoever on the interval.
+
+    * Min statistics are computed using the value of the lower boundary of the first bin
+      (in increasing bin order) with non-zero samples in it. Similarly for max,
+      we take the upper boundary of the last bin with non-zero samples in it.
+      This is semantically identical to taking the 0th and 100th percentiles with a
+      50% bin-width buffer (because percentiles are computed using mid-points of
+      the bins). This enforces the following nice properties:
+
+        * min <= 50th <= 90th <= 95th <= 99th <= max
+
+        * min and max are strict lower and upper bounds on the actual
+          min / max seen by fio (and reported in *_clat.* with averaging turned off).
+
+    * Average statistics use a standard weighted arithmetic mean.
+
+    * Percentile statistics are computed using the weighted percentile method as
+      described here: https://en.wikipedia.org/wiki/Percentile#Weighted_percentile
+      See weights() method for details on how weights are computed for individual
+      samples. In process_interval() we further multiply by the height of each bin
+      to get weighted histograms.
+    
+    * We convert files given on the command line, assumed to be fio histogram files,
+      on-the-fly into their corresponding differenced files i.e. non-cumulative histograms
+      because fio outputs cumulative histograms, but we want histograms corresponding
+      to individual time intervals. An individual histogram file can contain the cumulative
+      histograms for multiple different r/w directions (notably when --rw=randrw). This
+      is accounted for by tracking each r/w direction separately. In the statistics
+      reported we ultimately merge *all* histograms (regardless of r/w direction).
+
+    * The value of *_GROUP_NR in stat.h (and *_BITS) determines how many latency bins
+      fio outputs when histogramming is enabled. Namely for the current default of
+      GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
+      seconds. For certain applications this may not be sufficient. With GROUP_NR=24
+      we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
+      you expect your application to experience latencies greater than 17 seconds,
+      you will need to recompile fio with a larger GROUP_NR, e.g. with:
+        
+            sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19/#define FIO_IO_U_PLAT_GROUP_NR 24/g' stat.h
+            make fio
+            
+      Quick reference table for the max latency corresponding to a sampling of
+      values for GROUP_NR:
+            
+            GROUP_NR | # bins | max latency bin value
+            19       | 1216   | 16.9 sec
+            20       | 1280   | 33.8 sec
+            21       | 1344   | 67.6 sec
+            22       | 1408   | 2  min, 15 sec
+            23       | 1472   | 4  min, 32 sec
+            24       | 1536   | 9  min, 4  sec
+            25       | 1600   | 18 min, 8  sec
+            26       | 1664   | 36 min, 16 sec
+      
+    * At present this program automatically detects the number of histogram bins in
+      the log files, and adjusts the bin latency values accordingly. In particular if
+      you use the --log_hist_coarseness parameter of fio, you get output files with
+      a number of bins according to the following table (note that the first
+      row is identical to the table above):
+
+      coarse \ GROUP_NR
+                  19     20    21     22     23     24     25     26
+             -------------------------------------------------------
+            0  [[ 1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
+            1   [  608,   640,   672,   704,   736,   768,   800,   832],
+            2   [  304,   320,   336,   352,   368,   384,   400,   416],
+            3   [  152,   160,   168,   176,   184,   192,   200,   208],
+            4   [   76,    80,    84,    88,    92,    96,   100,   104],
+            5   [   38,    40,    42,    44,    46,    48,    50,    52],
+            6   [   19,    20,    21,    22,    23,    24,    25,    26],
+            7   [  N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
+            8   [  N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
+
+      For other values of GROUP_NR and coarseness, this table can be computed like this:    
+        
+            bins = [1216,1280,1344,1408,1472,1536,1600,1664]
+            max_coarse = 8
+            fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else nan, range(max_coarse + 1)))
+            np.transpose(list(map(fncn, bins)))
+      
+      Also note that you can achieve the same downsampling / log file size reduction
+      by pre-processing (before inputting into this script) with half-bins.py.
+
+    * If you have not adjusted GROUP_NR for your (high latency) application, then you
+      will see the percentiles computed by this tool max out at the max latency bin
+      value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
+      a max latency of ~16.7 seconds in the red line):
+
+            https://www.cronburg.com/fio/max_latency_bin_value_bug.png
+    
+    * The motivation, design decisions, and implementation process are
+      described in further detail here:
+
+            https://www.cronburg.com/fio/cloud-latency-problem-measurement/
+
+    @author Karl Cronburg <karl.cronburg@gmail.com>
+"""
+import os
+import sys
+import pandas
+import numpy as np
+
+err = sys.stderr.write
+
+def weighted_percentile(percs, vs, ws):
+    """ Use linear interpolation to calculate the weighted percentile.
+        
+        Value and weight arrays are first sorted by value. The cumulative
+        distribution function (cdf) is then computed, after which np.interp
+        finds the two values closest to our desired weighted percentile(s)
+        and linearly interpolates them.
+        
+        percs  :: List of percentiles we want to calculate
+        vs     :: Array of values we are computing the percentile of
+        ws     :: Array of weights for our corresponding values
+        return :: Array of percentiles
+    """
+    idx = np.argsort(vs)
+    vs, ws = vs[idx], ws[idx] # weights and values sorted by value
+    cdf = 100 * (ws.cumsum() - ws / 2.0) / ws.sum()
+    return np.interp(percs, cdf, vs) # linear interpolation
+
+def weights(start_ts, end_ts, start, end):
+    """ Calculate weights based on fraction of sample falling in the
+        given interval [start,end]. Weights computed using vector / array
+        computation instead of for-loops.
+    
+        Note that samples with zero time length are effectively ignored
+        (we set their weight to zero).
+
+        start_ts :: Array of start times for a set of samples
+        end_ts   :: Array of end times for a set of samples
+        start    :: int
+        end      :: int
+        return   :: Array of weights
+    """
+    sbounds = np.maximum(start_ts, start).astype(float)
+    ebounds = np.minimum(end_ts,   end).astype(float)
+    ws = (ebounds - sbounds) / (end_ts - start_ts)
+    if np.any(np.isnan(ws)):
+      err("WARNING: zero-length sample(s) detected. Log file corrupt"
+          " / bad time values? Ignoring these samples.\n")
+    ws[np.where(np.isnan(ws))] = 0.0;
+    return ws
+
+def weighted_average(vs, ws):
+    return np.sum(vs * ws) / np.sum(ws)
+
+columns = ["end-time", "samples", "min", "avg", "median", "90%", "95%", "99%", "max"]
+percs   = [50, 90, 95, 99]
+
+def fmt_float_list(ctx, num=1):
+  """ Return a comma separated list of float formatters to the required number
+      of decimal places. For instance:
+
+        fmt_float_list(ctx.decimals=4, num=3) == "%.4f, %.4f, %.4f"
+  """
+  return ', '.join(["%%.%df" % ctx.decimals] * num)
+
+# Default values - see beginning of main() for how we detect number columns in
+# the input files:
+__HIST_COLUMNS = 1216
+__NON_HIST_COLUMNS = 3
+__TOTAL_COLUMNS = __HIST_COLUMNS + __NON_HIST_COLUMNS
+    
+def sequential_diffs(head_row, times, rws, hists):
+    """ Take the difference of sequential (in time) histograms with the same
+        r/w direction, returning a new array of differenced histograms.  """
+    result = np.empty(shape=(0, __HIST_COLUMNS))
+    result_times = np.empty(shape=(1, 0))
+    for i in range(8):
+        idx = np.where(rws == i)
+        diff = np.diff(np.append(head_row[i], hists[idx], axis=0), axis=0).astype(int)
+        result = np.append(diff, result, axis=0)
+        result_times = np.append(times[idx], result_times)
+    idx = np.argsort(result_times)
+    return result[idx]
+
+def read_chunk(head_row, rdr, sz):
+    """ Read the next chunk of size sz from the given reader, computing the
+        differences across neighboring histogram samples.
+    """
+    try:
+        """ StopIteration occurs when the pandas reader is empty, and AttributeError
+            occurs if rdr is None due to the file being empty. """
+        new_arr = rdr.read().values
+    except (StopIteration, AttributeError):
+        return None    
+
+    """ Extract array of just the times, and histograms matrix without times column.
+        Then, take the sequential difference of each of the rows in the histogram
+        matrix. This is necessary because fio outputs *cumulative* histograms as
+        opposed to histograms with counts just for a particular interval. """
+    times, rws, szs = new_arr[:,0], new_arr[:,1], new_arr[:,2]
+    hists = new_arr[:,__NON_HIST_COLUMNS:]
+    hists_diff   = sequential_diffs(head_row, times, rws, hists)
+    times = times.reshape((len(times),1))
+    arr = np.append(times, hists_diff, axis=1)
+
+    """ hists[-1] will be the row we need to start our differencing with the
+        next time we call read_chunk() on the same rdr """
+    return arr, hists[-1]
+
+def get_min(fps, arrs):
+    """ Find the file with the current first row with the smallest start time """
+    return min([fp for fp in fps if not arrs[fp] is None], key=lambda fp: arrs.get(fp)[0][0][0])
+
+def histogram_generator(ctx, fps, sz):
+    
+    """ head_row for a particular file keeps track of the last (cumulative)
+        histogram we read so that we have a reference point to subtract off
+        when computing sequential differences. """
+    head_row  = np.zeros(shape=(1, __HIST_COLUMNS))
+    head_rows = {fp: {i: head_row for i in range(8)} for fp in fps}
+
+    # Create a chunked pandas reader for each of the files:
+    rdrs = {}
+    for fp in fps:
+        try:
+            rdrs[fp] = pandas.read_csv(fp, dtype=int, header=None, chunksize=sz)
+        except ValueError as e:
+            if e.message == 'No columns to parse from file':
+                if not ctx.nowarn: sys.stderr.write("WARNING: Empty input file encountered.\n")
+                rdrs[fp] = None
+            else:
+                raise(e)
+
+    # Initial histograms and corresponding head_rows:
+    arrs = {fp: read_chunk(head_rows[fp], rdr, sz) for fp,rdr in rdrs.items()}
+    while True:
+
+        try:
+            """ ValueError occurs when nothing more to read """
+            fp = get_min(fps, arrs)
+        except ValueError:
+            return
+        arr, head_row = arrs[fp]
+        yield np.insert(arr[0], 1, fps.index(fp))
+        arrs[fp] = arr[1:], head_row
+        head_rows[fp] = head_row
+
+        if arrs[fp][0].shape[0] == 0:
+            arrs[fp] = read_chunk(head_rows[fp], rdrs[fp], sz)
+
+def _plat_idx_to_val(idx, edge=0.5, FIO_IO_U_PLAT_BITS=6, FIO_IO_U_PLAT_VAL=64):
+    """ Taken from fio's stat.c for calculating the latency value of a bin
+        from that bin's index.
+        
+            idx  : the value of the index into the histogram bins
+            edge : fractional value in the range [0,1]** indicating how far into
+            the bin we wish to compute the latency value of.
+        
+        ** edge = 0.0 and 1.0 computes the lower and upper latency bounds
+           respectively of the given bin index. """
+
+    # MSB <= (FIO_IO_U_PLAT_BITS-1), cannot be rounded off. Use
+    # all bits of the sample as index
+    if (idx < (FIO_IO_U_PLAT_VAL << 1)):
+        return idx 
+
+    # Find the group and compute the minimum value of that group
+    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1 
+    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
+
+    # Find its bucket number of the group
+    k = idx % FIO_IO_U_PLAT_VAL
+
+    # Return the mean (if edge=0.5) of the range of the bucket
+    return base + ((k + edge) * (1 << error_bits))
+    
+def plat_idx_to_val_coarse(idx, coarseness, edge=0.5):
+    """ Converts the given *coarse* index into a non-coarse index as used by fio
+        in stat.h:plat_idx_to_val(), subsequently computing the appropriate
+        latency value for that bin.
+        """
+
+    # Multiply the index by the power of 2 coarseness to get the
+    # bin index with a max of 1536 bins (FIO_IO_U_PLAT_GROUP_NR = 24 in stat.h)
+    stride = 1 << coarseness
+    idx = idx * stride
+    lower = _plat_idx_to_val(idx, edge=0.0)
+    upper = _plat_idx_to_val(idx + stride, edge=1.0)
+    return lower + (upper - lower) * edge
+
+def print_all_stats(ctx, end, mn, ss_cnt, vs, ws, mx):
+    ps = weighted_percentile(percs, vs, ws)
+
+    avg = weighted_average(vs, ws)
+    values = [mn, avg] + list(ps) + [mx]
+    row = [end, ss_cnt] + map(lambda x: float(x) / ctx.divisor, values)
+    fmt = "%d, %d, %d, " + fmt_float_list(ctx, 5) + ", %d"
+    print (fmt % tuple(row))
+
+def update_extreme(val, fncn, new_val):
+    """ Calculate min / max in the presence of None values """
+    if val is None: return new_val
+    else: return fncn(val, new_val)
+
+# See beginning of main() for how bin_vals are computed
+bin_vals = []
+lower_bin_vals = [] # lower edge of each bin
+upper_bin_vals = [] # upper edge of each bin 
+
+def process_interval(ctx, samples, iStart, iEnd):
+    """ Construct the weighted histogram for the given interval by scanning
+        through all the histograms and figuring out which of their bins have
+        samples with latencies which overlap with the given interval
+        [iStart,iEnd].
+    """
+    
+    times, files, hists = samples[:,0], samples[:,1], samples[:,2:]
+    iHist = np.zeros(__HIST_COLUMNS)
+    ss_cnt = 0 # number of samples affecting this interval
+    mn_bin_val, mx_bin_val = None, None
+
+    for end_time,file,hist in zip(times,files,hists):
+            
+        # Only look at bins of the current histogram sample which
+        # started before the end of the current time interval [start,end]
+        start_times = (end_time - 0.5 * ctx.interval) - bin_vals / 1000.0
+        idx = np.where(start_times < iEnd)
+        s_ts, l_bvs, u_bvs, hs = start_times[idx], lower_bin_vals[idx], upper_bin_vals[idx], hist[idx]
+
+        # Increment current interval histogram by weighted values of future histogram:
+        ws = hs * weights(s_ts, end_time, iStart, iEnd)
+        iHist[idx] += ws
+    
+        # Update total number of samples affecting current interval histogram:
+        ss_cnt += np.sum(hs)
+        
+        # Update min and max bin values seen if necessary:
+        idx = np.where(hs != 0)[0]
+        if idx.size > 0:
+            mn_bin_val = update_extreme(mn_bin_val, min, l_bvs[max(0,           idx[0]  - 1)])
+            mx_bin_val = update_extreme(mx_bin_val, max, u_bvs[min(len(hs) - 1, idx[-1] + 1)])
+
+    if ss_cnt > 0: print_all_stats(ctx, iEnd, mn_bin_val, ss_cnt, bin_vals, iHist, mx_bin_val)
+
+def guess_max_from_bins(ctx, hist_cols):
+    """ Try to guess the GROUP_NR from given # of histogram
+        columns seen in an input file """
+    max_coarse = 8
+    if ctx.group_nr < 19 or ctx.group_nr > 26:
+        bins = [ctx.group_nr * (1 << 6)]
+    else:
+        bins = [1216,1280,1344,1408,1472,1536,1600,1664]
+    coarses = range(max_coarse + 1)
+    fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else -10, coarses))
+    
+    arr = np.transpose(list(map(fncn, bins)))
+    idx = np.where(arr == hist_cols)
+    if len(idx[1]) == 0:
+        table = repr(arr.astype(int)).replace('-10', 'N/A').replace('array','     ')
+        err("Unable to determine bin values from input clat_hist files. Namely \n"
+            "the first line of file '%s' " % ctx.FILE[0] + "has %d \n" % (__TOTAL_COLUMNS,) +
+            "columns of which we assume %d " % (hist_cols,) + "correspond to histogram bins. \n"
+            "This number needs to be equal to one of the following numbers:\n\n"
+            + table + "\n\n"
+            "Possible reasons and corresponding solutions:\n"
+            "  - Input file(s) does not contain histograms.\n"
+            "  - You recompiled fio with a different GROUP_NR. If so please specify this\n"
+            "    new GROUP_NR on the command line with --group_nr\n")
+        exit(1)
+    return bins[idx[1][0]]
+
+def main(ctx):
+
+    # Automatically detect how many columns are in the input files,
+    # calculate the corresponding 'coarseness' parameter used to generate
+    # those files, and calculate the appropriate bin latency values:
+    with open(ctx.FILE[0], 'r') as fp:
+        global bin_vals,lower_bin_vals,upper_bin_vals,__HIST_COLUMNS,__TOTAL_COLUMNS
+        __TOTAL_COLUMNS = len(fp.readline().split(','))
+        __HIST_COLUMNS = __TOTAL_COLUMNS - __NON_HIST_COLUMNS
+
+        max_cols = guess_max_from_bins(ctx, __HIST_COLUMNS)
+        coarseness = int(np.log2(float(max_cols) / __HIST_COLUMNS))
+        bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness), np.arange(__HIST_COLUMNS)), dtype=float)
+        lower_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 0.0), np.arange(__HIST_COLUMNS)), dtype=float)
+        upper_bin_vals = np.array(map(lambda x: plat_idx_to_val_coarse(x, coarseness, 1.0), np.arange(__HIST_COLUMNS)), dtype=float)
+
+    fps = [open(f, 'r') for f in ctx.FILE]
+    gen = histogram_generator(ctx, fps, ctx.buff_size)
+
+    print(', '.join(columns))
+
+    try:
+        start, end = 0, ctx.interval
+        arr = np.empty(shape=(0,__TOTAL_COLUMNS - 1))
+        more_data = True
+        while more_data or len(arr) > 0:
+            
+            # Read up to ctx.max_latency (default 20 seconds) of data from end of current interval.
+            while len(arr) == 0 or arr[-1][0] < ctx.max_latency * 1000 + end:
+                try:
+                    new_arr = next(gen)
+                except StopIteration:
+                    more_data = False
+                    break
+                arr = np.append(arr, new_arr.reshape((1,__TOTAL_COLUMNS - 1)), axis=0)
+            arr = arr.astype(int)
+            
+            if arr.size > 0:
+                process_interval(ctx, arr, start, end)
+                
+                # Update arr to throw away samples we no longer need - samples which
+                # end before the start of the next interval, i.e. the end of the
+                # current interval:
+                idx = np.where(arr[:,0] > end)
+                arr = arr[idx]
+            
+            start += ctx.interval
+            end = start + ctx.interval
+    finally:
+        map(lambda f: f.close(), fps)
+
+
+if __name__ == '__main__':
+    import argparse
+    p = argparse.ArgumentParser()
+    arg = p.add_argument
+    arg("FILE", help='space separated list of latency log filenames', nargs='+')
+    arg('--buff_size',
+        default=10000,
+        type=int,
+        help='number of samples to buffer into numpy at a time')
+
+    arg('--max_latency',
+        default=20,
+        type=float,
+        help='number of seconds of data to process at a time')
+
+    arg('-i', '--interval',
+        default=1000,
+        type=int,
+        help='interval width (ms)')
+
+    arg('-d', '--divisor',
+        required=False,
+        type=int,
+        default=1,
+        help='divide the results by this value.')
+
+    arg('--decimals',
+        default=3,
+        type=int,
+        help='number of decimal places to print floats to')
+
+    arg('--nowarn',
+        dest='nowarn',
+        action='store_false',
+        default=True,
+        help='do not print any warning messages to stderr')
+
+    arg('--group_nr',
+        default=19,
+        type=int,
+        help='FIO_IO_U_PLAT_GROUP_NR as defined in stat.h')
+
+    main(p.parse_args())
+
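process_interval() weights each bin count by how much of the sample's time span overlaps the reporting interval. weighted_percentile() then interpolates on a midpoint CDF. The sketch below reproduces weighted_percentile() and a variant of weights() for experimentation. overlap_weights() is a hypothetical name. Unlike the script's weights(), it returns zero for zero-length spans without printing a warning, and it clips disjoint spans to zero:

```python
import numpy as np


def weighted_percentile(percs, vs, ws):
    """Interpolate percentiles on a midpoint CDF (as in the script)."""
    idx = np.argsort(vs)
    vs, ws = vs[idx], ws[idx]
    cdf = 100 * (ws.cumsum() - ws / 2.0) / ws.sum()
    return np.interp(percs, cdf, vs)


def overlap_weights(start_ts, end_ts, start, end):
    """Fraction of each span [start_ts, end_ts] that lies inside [start, end]."""
    start_ts = np.asarray(start_ts, dtype=float)
    s = np.maximum(start_ts, start)
    e = np.minimum(end_ts, end)
    length = end_ts - start_ts
    w = np.divide(e - s, length, out=np.zeros_like(length), where=length > 0)
    return np.clip(w, 0.0, None)   # disjoint spans get weight 0


vals = np.array([100.0, 200.0, 300.0])   # bin latencies in usec
wts = np.array([1.0, 1.0, 2.0])          # weighted bin counts
median = weighted_percentile([50], vals, wts)[0]
fractions = overlap_weights([0.0, 500.0, 1500.0], 2000.0, 1000.0, 2000.0)
```

For bin values of 100, 200 and 300 usec with weights 1, 1 and 2, the midpoint CDF is 12.5, 37.5 and 75. The median therefore interpolates to about 233.3 usec.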
diff --git a/tools/hist/half-bins.py b/tools/hist/half-bins.py
new file mode 100755
index 0000000..d592af0
--- /dev/null
+++ b/tools/hist/half-bins.py
@@ -0,0 +1,38 @@
+#!/usr/bin/env python2.7
+""" Cut the number of bins in half in fio histogram output. Example usage:
+
+        $ half-bins.py -c 2 output_clat_hist.1.log > smaller_clat_hist.1.log
+
+    Which merges e.g. bins [0 .. 3], [4 .. 7], ..., [1212 .. 1215] resulting in
+    304 = 1216 / (2**2) merged bins per histogram sample.
+
+    @author Karl Cronburg <karl.cronburg@gmail.com>
+"""
+import sys
+
+def main(ctx):
+    stride = 1 << ctx.coarseness
+    with open(ctx.FILENAME, 'r') as fp:
+        for line in fp.readlines():
+            vals = line.split(', ')
+            sys.stdout.write("%s, %s, %s, " % tuple(vals[:3]))
+
+            hist = list(map(int, vals[3:]))
+            for i in range(0, len(hist) - stride, stride):
+                sys.stdout.write("%d, " % sum(hist[i : i + stride],))
+            sys.stdout.write("%d\n" % sum(hist[len(hist) - stride:]))
+
+if __name__ == '__main__':
+    import argparse
+    p = argparse.ArgumentParser()
+    arg = p.add_argument
+    arg( 'FILENAME', help='clat_hist file for which we will reduce'
+                         ' (by half or more) the number of bins.')
+    arg('-c', '--coarseness',
+       default=1,
+       type=int,
+       help='number of times to reduce number of bins by half, '
+            'e.g. coarseness of 4 merges each 2^4 = 16 consecutive '
+            'bins.')
+    main(p.parse_args())
+
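half-bins.py sums each run of 2**coarseness adjacent bins. When that stride divides the bin count, the same reduction is a reshape in NumPy. The sketch below refuses other bin counts. coarse_bin_count() is a hypothetical helper that reproduces the bins-per-row table from the fiologparser_hist.py docstring, returning None for the N/A cells:

```python
import numpy as np


def merge_bins(hist, coarseness):
    """Sum each run of 2**coarseness adjacent bins, like half-bins.py -c N."""
    stride = 1 << coarseness
    hist = np.asarray(hist)
    if hist.size % stride:
        raise ValueError("%d bins not divisible by %d" % (hist.size, stride))
    return hist.reshape(-1, stride).sum(axis=1)


def coarse_bin_count(group_nr, coarseness, plat_bits=6):
    """Bins per row for a GROUP_NR/coarseness pair, None where it is N/A."""
    bins = group_nr << plat_bits          # e.g. 19 * 64 = 1216
    stride = 1 << coarseness
    return bins // stride if bins % stride == 0 else None


print(merge_bins(range(8), 2))            # two merged bins
print([coarse_bin_count(g, 8) for g in range(19, 27)])
```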

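The bin-to-latency mapping in fiologparser_hist.py's _plat_idx_to_val() also gives the largest latency that a build can resolve. The restatement below uses values in usec and the script's default FIO_IO_U_PLAT_BITS = 6. By this formula, the upper edge of the last bin is 2**(GROUP_NR + 5) usec: about 16.78 s for GROUP_NR = 19 and 536.9 s for GROUP_NR = 24. These are slightly below the rounded figures quoted in the docstring. max_latency_usec() is a hypothetical helper:

```python
FIO_IO_U_PLAT_BITS = 6
FIO_IO_U_PLAT_VAL = 1 << FIO_IO_U_PLAT_BITS   # 64 bins per group


def plat_idx_to_val(idx, edge=0.5):
    """Latency (usec) at fraction `edge` of bin `idx`."""
    if idx < (FIO_IO_U_PLAT_VAL << 1):
        return idx                              # first 128 bins are exact
    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1
    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
    k = idx % FIO_IO_U_PLAT_VAL                 # bucket within the group
    return base + (k + edge) * (1 << error_bits)


def max_latency_usec(group_nr):
    """Upper edge of the last bin for a given FIO_IO_U_PLAT_GROUP_NR."""
    last_idx = group_nr * FIO_IO_U_PLAT_VAL - 1
    return plat_idx_to_val(last_idx, edge=1.0)


for g in (19, 24):
    print("GROUP_NR=%d: %d bins, max %.2f s" % (g, g * FIO_IO_U_PLAT_VAL,
                                                max_latency_usec(g) / 1e6))
```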
^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-08-05 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-05 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit ff56a4e2e0a87e4e3b1cc1e74547d55b295967a6:

  travis: don't enable rbd (2016-08-03 10:18:42 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5fd31680d0370c6b71ccfa456ade211477af81d6:

  Revert "filesetup: ensure that we catch a file flagged for extend" (2016-08-04 19:41:09 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Revert "filesetup: ensure that we catch a file flagged for extend"

 filesetup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index f32d874..1ecdda6 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -900,12 +900,11 @@ int setup_files(struct thread_data *td)
 		if (f->filetype == FIO_TYPE_FILE &&
 		    (f->io_size + f->file_offset) > f->real_file_size &&
 		    !(td->io_ops->flags & FIO_DISKLESSIO)) {
-			if (!o->create_on_open)
+			if (!o->create_on_open) {
+				need_extend++;
 				extend_size += (f->io_size + f->file_offset);
-			else
+			} else
 				f->real_file_size = f->io_size + f->file_offset;
-
-			need_extend++;
 			fio_file_set_extend(f);
 		}
 	}

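Both versions of this setup_files() block call fio_file_set_extend() on the file. They differ only in which files increment need_extend. Here is a small model of that difference. It assumes every file has already passed the FIO_TYPE_FILE and FIO_DISKLESSIO checks. File and plan_extends() are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class File:
    io_size: int
    file_offset: int
    real_file_size: int


def plan_extends(files, create_on_open, count_all=False):
    """Return (need_extend, extend_size) for the setup_files() extend check.

    count_all=False models the restored code; count_all=True models the
    reverted commit, which also counted create_on_open files.
    """
    need_extend, extend_size = 0, 0
    for f in files:
        wanted = f.io_size + f.file_offset
        if wanted <= f.real_file_size:
            continue                        # file is already big enough
        if not create_on_open:
            need_extend += 1
            extend_size += wanted
        else:
            f.real_file_size = wanted       # only record the target size
            if count_all:
                need_extend += 1
    return need_extend, extend_size
```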

* Recent changes (master)
@ 2016-08-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 550beaad94a11beef70dfb4057797ff8800c8a72:

  engines/rbd: fix compile without blkin support (2016-08-02 15:23:43 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ff56a4e2e0a87e4e3b1cc1e74547d55b295967a6:

  travis: don't enable rbd (2016-08-03 10:18:42 -0600)

----------------------------------------------------------------
Jan Fajerski (2):
      fix typo in HOWTO
      fix typo in HOWTO

Jens Axboe (4):
      iolog: prevent early entry from skewing entire logging run
      filesetup: ensure that we catch a file flagged for extend
      travis: add rbd/zlib
      travis: don't enable rbd

 .travis.yml | 2 +-
 HOWTO       | 4 ++--
 filesetup.c | 7 ++++---
 stat.c      | 4 +++-
 4 files changed, 10 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
index 9bef750..bf0433d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -4,4 +4,4 @@ compiler:
   - gcc
 before_install:
   - sudo apt-get -qq update
-  - sudo apt-get install -y libaio-dev libnuma-dev
+  - sudo apt-get install -qq -y libaio-dev libnuma-dev libz-dev
diff --git a/HOWTO b/HOWTO
index 2c5896d..d18d59b 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1175,7 +1175,7 @@ cpus_allowed_policy=str Set the policy of how fio distributes the CPUs
 		one cpu per job. If not enough CPUs are given for the jobs
 		listed, then fio will roundrobin the CPUs in the set.
 
-numa_cpu_nodes=str Set this job running on spcified NUMA nodes' CPUs. The
+numa_cpu_nodes=str Set this job running on specified NUMA nodes' CPUs. The
 		arguments allow comma delimited list of cpu numbers,
 		A-B ranges, or 'all'. Note, to enable numa options support,
 		fio must be built on a system with libnuma-dev(el) installed.
@@ -1606,7 +1606,7 @@ write_lat_log=str Same as write_bw_log, except that this option stores io
 		The actual log names will be foo_slat.x.log, foo_clat.x.log,
 		and foo_lat.x.log, where x is the index of the job (1..N,
 		where N is the number of jobs). This helps fio_generate_plot
-		fine the logs automatically. If 'per_job_logs' is false, then
+		find the logs automatically. If 'per_job_logs' is false, then
 		the filename will not include the job index. See 'Log File
 		Formats'.
 
diff --git a/filesetup.c b/filesetup.c
index 1ecdda6..f32d874 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -900,11 +900,12 @@ int setup_files(struct thread_data *td)
 		if (f->filetype == FIO_TYPE_FILE &&
 		    (f->io_size + f->file_offset) > f->real_file_size &&
 		    !(td->io_ops->flags & FIO_DISKLESSIO)) {
-			if (!o->create_on_open) {
-				need_extend++;
+			if (!o->create_on_open)
 				extend_size += (f->io_size + f->file_offset);
-			} else
+			else
 				f->real_file_size = f->io_size + f->file_offset;
+
+			need_extend++;
 			fio_file_set_extend(f);
 		}
 	}
diff --git a/stat.c b/stat.c
index 7a35117..d6787b7 100644
--- a/stat.c
+++ b/stat.c
@@ -2139,7 +2139,9 @@ static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 * need to do.
 	 */
 	this_window = elapsed - iolog->avg_last;
-	if (this_window < iolog->avg_msec) {
+	if (elapsed < iolog->avg_last)
+		return iolog->avg_last - elapsed;
+	else if (this_window < iolog->avg_msec) {
 		int diff = iolog->avg_msec - this_window;
 
 		if (inline_log(iolog) || diff > LOG_MSEC_SLACK)

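The new early return in add_log_sample() matters if elapsed, avg_last and this_window are unsigned, as elapsed and this_window are in add_clat_sample() above. That typing is an assumption here, since this hunk does not show the declarations. If a sample arrives with elapsed < avg_last, elapsed - avg_last wraps to a huge value. That value fails the this_window < avg_msec test, so the code would fall through to the logging path. The model below emulates 64-bit unsigned subtraction and ignores the LOG_MSEC_SLACK / inline_log() refinement. A return value of 0 means "log now":

```python
ULONG_MASK = (1 << 64) - 1   # assumes a 64-bit unsigned long


def ms_until_next_sample(elapsed, avg_last, avg_msec, guarded=True):
    """Simplified model of the window check in add_log_sample().

    Returns how many ms remain in the averaging window, or 0 when the
    sample would be logged now. guarded=False drops the new early return.
    """
    this_window = (elapsed - avg_last) & ULONG_MASK   # C unsigned subtraction
    if guarded and elapsed < avg_last:
        return avg_last - elapsed
    if this_window < avg_msec:
        return avg_msec - this_window
    return 0
```

With elapsed = 900 and avg_last = 1000, the guarded model waits another 100 ms. The unguarded model wraps to 2**64 - 100 and treats the window as already closed.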

* Recent changes (master)
@ 2016-08-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-08-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 059b61f219b15db434eddc2207b876c6a0bad6c0:

  backend: do_verify() cleanup (2016-08-01 13:46:17 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 550beaad94a11beef70dfb4057797ff8800c8a72:

  engines/rbd: fix compile without blkin support (2016-08-02 15:23:43 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Revert "filesetup: ensure that we align file starting offset"
      Merge branch 'wip-traceinfo' of https://github.com/vears91/fio
      engines/rbd: fix compile without blkin support

vears91 (1):
      Add support for blkin tracing in rbd engine

 configure     | 33 +++++++++++++++++++++++++++++++++
 engines/rbd.c | 18 ++++++++++++++++++
 filesetup.c   |  8 ++------
 3 files changed, 53 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/configure b/configure
index 5f6bca3..93c3720 100755
--- a/configure
+++ b/configure
@@ -166,6 +166,8 @@ for opt do
   ;;
   --disable-rbd) disable_rbd="yes"
   ;;
+  --disable-rbd-blkin) disable_rbd_blkin="yes"
+  ;;
   --disable-gfapi) disable_gfapi="yes"
   ;;
   --enable-libhdfs) libhdfs="yes"
@@ -1334,6 +1336,34 @@ echo "rbd_invalidate_cache          $rbd_inval"
 fi
 
 ##########################################
+# check for blkin
+rbd_blkin="no"
+cat > $TMPC << EOF
+#include <rbd/librbd.h>
+#include <zipkin_c.h>
+
+int main(int argc, char **argv)
+{
+  int r;
+  struct blkin_trace_info t_info;
+  blkin_init_trace_info(&t_info);
+  rbd_completion_t completion;
+  rbd_image_t image;
+  uint64_t off;
+  size_t len;
+  const char *buf;
+  r = rbd_aio_write_traced(image, off, len, buf, completion, &t_info);
+  return 0;
+}
+EOF
+if test "$disable_rbd" != "yes" && test "$disable_rbd_blkin" != "yes" \
+ && compile_prog "" "-lrbd -lrados -lblkin" "rbd_blkin"; then
+  LIBS="-lblkin $LIBS"
+  rbd_blkin="yes"
+fi
+echo "rbd blkin tracing             $rbd_blkin"
+
+##########################################
 # Check whether we have setvbuf
 setvbuf="no"
 cat > $TMPC << EOF
@@ -1778,6 +1808,9 @@ fi
 if test "$rbd_inval" = "yes" ; then
   output_sym "CONFIG_RBD_INVAL"
 fi
+if test "$rbd_blkin" = "yes" ; then
+  output_sym "CONFIG_RBD_BLKIN"
+fi
 if test "$setvbuf" = "yes" ; then
   output_sym "CONFIG_SETVBUF"
 fi
diff --git a/engines/rbd.c b/engines/rbd.c
index c85645a..5e17fbe 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -9,12 +9,18 @@
 
 #include "../fio.h"
 #include "../optgroup.h"
+#ifdef CONFIG_RBD_BLKIN
+#include <zipkin_c.h>
+#endif
 
 struct fio_rbd_iou {
 	struct io_u *io_u;
 	rbd_completion_t completion;
 	int io_seen;
 	int io_complete;
+#ifdef CONFIG_RBD_BLKIN
+	struct blkin_trace_info info;
+#endif
 };
 
 struct rbd_data {
@@ -391,16 +397,28 @@ static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	if (io_u->ddir == DDIR_WRITE) {
+#ifdef CONFIG_RBD_BLKIN
+		blkin_init_trace_info(&fri->info);
+		r = rbd_aio_write_traced(rbd->image, io_u->offset, io_u->xfer_buflen,
+					 io_u->xfer_buf, fri->completion, &fri->info);
+#else
 		r = rbd_aio_write(rbd->image, io_u->offset, io_u->xfer_buflen,
 					 io_u->xfer_buf, fri->completion);
+#endif
 		if (r < 0) {
 			log_err("rbd_aio_write failed.\n");
 			goto failed_comp;
 		}
 
 	} else if (io_u->ddir == DDIR_READ) {
+#ifdef CONFIG_RBD_BLKIN
+		blkin_init_trace_info(&fri->info);
+		r = rbd_aio_read_traced(rbd->image, io_u->offset, io_u->xfer_buflen,
+					io_u->xfer_buf, fri->completion, &fri->info);
+#else
 		r = rbd_aio_read(rbd->image, io_u->offset, io_u->xfer_buflen,
 					io_u->xfer_buf, fri->completion);
+#endif
 
 		if (r < 0) {
 			log_err("rbd_aio_read failed.\n");
diff --git a/filesetup.c b/filesetup.c
index a48faf5..1ecdda6 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -761,16 +761,12 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 {
 	struct thread_options *o = &td->o;
-	uint64_t offset;
 
 	if (o->file_append && f->filetype == FIO_TYPE_FILE)
 		return f->real_file_size;
 
-	offset = td->o.start_offset + td->subjob_number * td->o.offset_increment;
-	if (offset % td_max_bs(td))
-		offset -= (offset % td_max_bs(td));
-
-	return offset;
+	return td->o.start_offset +
+		td->subjob_number * td->o.offset_increment;
 }
 
 /*


* Recent changes (master)
@ 2016-08-02 12:00 Jens Axboe
From: Jens Axboe @ 2016-08-02 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 99955d3d3e290ccb06583a821a8112210e4b332d:

  backend: do_dry_run(): get_io_u() can return an error pointer (2016-07-29 09:59:38 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 059b61f219b15db434eddc2207b876c6a0bad6c0:

  backend: do_verify() cleanup (2016-08-01 13:46:17 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      backend: do_verify() cleanup

 backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 8a71490..c3ad831 100644
--- a/backend.c
+++ b/backend.c
@@ -652,7 +652,7 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 				break;
 
 			while ((io_u = get_io_u(td)) != NULL) {
-				if (IS_ERR(io_u)) {
+				if (IS_ERR_OR_NULL(io_u)) {
 					io_u = NULL;
 					ret = FIO_Q_BUSY;
 					goto reap;


* Recent changes (master)
@ 2016-07-30 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-30 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit be6bb2b72608d7efbec13d06c67446e229136afa:

  Fix overflow caused by signed long division by unsigned long. The over flow seems to occurr when the value of 'log_avg_msec' option is relatively large. (2016-07-29 09:25:17 +0900)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 99955d3d3e290ccb06583a821a8112210e4b332d:

  backend: do_dry_run(): get_io_u() can return an error pointer (2016-07-29 09:59:38 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      gettime: remove unneeded 'ret' in {utime,mtime}_since()
      backend: do_dry_run(): get_io_u() can return an error pointer

Tomohiro Kusumi (4):
      Use in-place path separator "/" for Linux specific code
      Make switch_ioscheduler() return 0 if FIO_HAVE_IOSCHED_SWITCH is undefined
      Null terminate before (or after) strncpy(3)
      Use larger local buffer for I/O engine name

 backend.c                |  6 +++++-
 cgroup.c                 |  6 +++---
 diskutil.c               |  8 +++++---
 gettime.c                | 11 +++--------
 ioengines.c              |  3 ++-
 oslib/linux-dev-lookup.c |  2 +-
 6 files changed, 19 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index ad2d7da..8a71490 100644
--- a/backend.c
+++ b/backend.c
@@ -1261,6 +1261,7 @@ static int init_io_u(struct thread_data *td)
 
 static int switch_ioscheduler(struct thread_data *td)
 {
+#ifdef FIO_HAVE_IOSCHED_SWITCH
 	char tmp[256], tmp2[128];
 	FILE *f;
 	int ret;
@@ -1319,6 +1320,9 @@ static int switch_ioscheduler(struct thread_data *td)
 
 	fclose(f);
 	return 0;
+#else
+	return 0;
+#endif
 }
 
 static bool keep_running(struct thread_data *td)
@@ -1395,7 +1399,7 @@ static uint64_t do_dry_run(struct thread_data *td)
 			break;
 
 		io_u = get_io_u(td);
-		if (!io_u)
+		if (IS_ERR_OR_NULL(io_u))
 			break;
 
 		io_u_set(io_u, IO_U_F_FLIGHT);
diff --git a/cgroup.c b/cgroup.c
index 34b61de..a297e2a 100644
--- a/cgroup.c
+++ b/cgroup.c
@@ -102,9 +102,9 @@ static char *get_cgroup_root(struct thread_data *td, char *mnt)
 	char *str = malloc(64);
 
 	if (td->o.cgroup)
-		sprintf(str, "%s%s%s", mnt, FIO_OS_PATH_SEPARATOR, td->o.cgroup);
+		sprintf(str, "%s/%s", mnt, td->o.cgroup);
 	else
-		sprintf(str, "%s%s%s", mnt, FIO_OS_PATH_SEPARATOR, td->o.name);
+		sprintf(str, "%s/%s", mnt, td->o.name);
 
 	return str;
 }
@@ -116,7 +116,7 @@ static int write_int_to_file(struct thread_data *td, const char *path,
 	char tmp[256];
 	FILE *f;
 
-	sprintf(tmp, "%s%s%s", path, FIO_OS_PATH_SEPARATOR, filename);
+	sprintf(tmp, "%s/%s", path, filename);
 	f = fopen(tmp, "w");
 	if (!f) {
 		td_verror(td, errno, onerr);
diff --git a/diskutil.c b/diskutil.c
index 8031d5d..a1077d4 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -179,6 +179,7 @@ static int get_device_numbers(char *file_name, int *maj, int *min)
 		/*
 		 * must be a file, open "." in that path
 		 */
+		tempname[PATH_MAX - 1] = '\0';
 		strncpy(tempname, file_name, PATH_MAX - 1);
 		p = dirname(tempname);
 		if (stat(p, &st)) {
@@ -239,7 +240,7 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		    !strcmp(dirent->d_name, ".."))
 			continue;
 
-		sprintf(temppath, "%s%s%s", slavesdir, FIO_OS_PATH_SEPARATOR, dirent->d_name);
+		sprintf(temppath, "%s/%s", slavesdir, dirent->d_name);
 		/* Can we always assume that the slaves device entries
 		 * are links to the real directories for the slave
 		 * devices?
@@ -266,7 +267,7 @@ static void find_add_disk_slaves(struct thread_data *td, char *path,
 		if (slavedu)
 			continue;
 
-		sprintf(temppath, "%s%s%s", slavesdir, FIO_OS_PATH_SEPARATOR, slavepath);
+		sprintf(temppath, "%s/%s", slavesdir, slavepath);
 		__init_per_file_disk_util(td, majdev, mindev, temppath);
 		slavedu = disk_util_exists(majdev, mindev);
 
@@ -370,7 +371,7 @@ static int find_block_dir(int majdev, int mindev, char *path, int link_ok)
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
-		sprintf(full_path, "%s%s%s", path, FIO_OS_PATH_SEPARATOR, dir->d_name);
+		sprintf(full_path, "%s/%s", path, dir->d_name);
 
 		if (!strcmp(dir->d_name, "dev")) {
 			if (!check_dev_match(majdev, mindev, full_path)) {
@@ -426,6 +427,7 @@ static struct disk_util *__init_per_file_disk_util(struct thread_data *td,
 			log_err("unknown sysfs layout\n");
 			return NULL;
 		}
+		tmp[PATH_MAX - 1] = '\0';
 		strncpy(tmp, p, PATH_MAX - 1);
 		sprintf(path, "%s", tmp);
 	}
diff --git a/gettime.c b/gettime.c
index 964a52f..73b48b0 100644
--- a/gettime.c
+++ b/gettime.c
@@ -382,7 +382,6 @@ void fio_clock_init(void)
 uint64_t utime_since(const struct timeval *s, const struct timeval *e)
 {
 	long sec, usec;
-	uint64_t ret;
 
 	sec = e->tv_sec - s->tv_sec;
 	usec = e->tv_usec - s->tv_usec;
@@ -397,9 +396,7 @@ uint64_t utime_since(const struct timeval *s, const struct timeval *e)
 	if (sec < 0 || (sec == 0 && usec < 0))
 		return 0;
 
-	ret = sec * 1000000ULL + usec;
-
-	return ret;
+	return usec + (sec * 1000000);
 }
 
 uint64_t utime_since_now(const struct timeval *s)
@@ -412,7 +409,7 @@ uint64_t utime_since_now(const struct timeval *s)
 
 uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
 {
-	long sec, usec, ret;
+	long sec, usec;
 
 	sec = e->tv_sec - s->tv_sec;
 	usec = e->tv_usec - s->tv_usec;
@@ -426,9 +423,7 @@ uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
 
 	sec *= 1000;
 	usec /= 1000;
-	ret = sec + usec;
-
-	return ret;
+	return sec + usec;
 }
 
 uint64_t mtime_since_now(const struct timeval *s)
diff --git a/ioengines.c b/ioengines.c
index 918b50a..4129ac2 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -126,10 +126,11 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 struct ioengine_ops *load_ioengine(struct thread_data *td, const char *name)
 {
 	struct ioengine_ops *ops;
-	char engine[16];
+	char engine[64];
 
 	dprint(FD_IO, "load ioengine %s\n", name);
 
+	engine[sizeof(engine) - 1] = '\0';
 	strncpy(engine, name, sizeof(engine) - 1);
 
 	/*
diff --git a/oslib/linux-dev-lookup.c b/oslib/linux-dev-lookup.c
index 4d5f356..3a415dd 100644
--- a/oslib/linux-dev-lookup.c
+++ b/oslib/linux-dev-lookup.c
@@ -25,7 +25,7 @@ int blktrace_lookup_device(const char *redirect, char *path, unsigned int maj,
 		if (!strcmp(dir->d_name, ".") || !strcmp(dir->d_name, ".."))
 			continue;
 
-		sprintf(full_path, "%s%s%s", path, FIO_OS_PATH_SEPARATOR, dir->d_name);
+		sprintf(full_path, "%s/%s", path, dir->d_name);
 		if (lstat(full_path, &st) == -1) {
 			perror("lstat");
 			break;


* Recent changes (master)
@ 2016-07-29 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-29 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit c915aa3b56b72c3c9eac3b92f89f22a18ccf0047:

  examples/backwards-read.fio: add size (2016-07-27 08:33:21 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to be6bb2b72608d7efbec13d06c67446e229136afa:

  Fix overflow caused by signed long division by unsigned long. The over flow seems to occurr when the value of 'log_avg_msec' option is relatively large. (2016-07-29 09:25:17 +0900)

----------------------------------------------------------------
Jevon Qiao (2):
      Fix segmentation fault while specifying clustername with rbd engine
      Fix memory leak in _fio_rbd_connect()

YukiKita (1):
      Fix overflow caused by signed long division by unsigned long.     The over flow seems to occurr when the value of 'log_avg_msec' option is relatively large.

 engines/rbd.c | 22 +++++++++++++++-------
 gettime.c     |  4 ++--
 2 files changed, 17 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rbd.c b/engines/rbd.c
index 7a109ee..c85645a 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -131,18 +131,26 @@ static int _fio_rbd_connect(struct thread_data *td)
 		char *client_name = NULL; 
 
 		/*
-		 * If we specify cluser name, the rados_creat2
+		 * If we specify cluser name, the rados_create2
 		 * will not assume 'client.'. name is considered
 		 * as a full type.id namestr
 		 */
-		if (!index(o->client_name, '.')) {
-			client_name = calloc(1, strlen("client.") +
-						strlen(o->client_name) + 1);
-			strcat(client_name, "client.");
-			o->client_name = strcat(client_name, o->client_name);
+		if (o->client_name) {
+			if (!index(o->client_name, '.')) {
+				client_name = calloc(1, strlen("client.") +
+						    strlen(o->client_name) + 1);
+				strcat(client_name, "client.");
+				strcat(client_name, o->client_name);
+			} else {
+				client_name = o->client_name;
+			}
 		}
+
 		r = rados_create2(&rbd->cluster, o->cluster_name,
-					o->client_name, 0);
+				 client_name, 0);
+
+		if (client_name && !index(o->client_name, '.'))
+			free(client_name);
 	} else
 		r = rados_create(&rbd->cluster, o->client_name);
 	
diff --git a/gettime.c b/gettime.c
index b896b5b..964a52f 100644
--- a/gettime.c
+++ b/gettime.c
@@ -424,8 +424,8 @@ uint64_t mtime_since(const struct timeval *s, const struct timeval *e)
 	if (sec < 0 || (sec == 0 && usec < 0))
 		return 0;
 
-	sec *= 1000UL;
-	usec /= 1000UL;
+	sec *= 1000;
+	usec /= 1000;
 	ret = sec + usec;
 
 	return ret;


* Recent changes (master)
@ 2016-07-28 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit fa3cdbb7a46e42186e8fa62d33b82b92c7c0e310:

  Add sample job file showing how to read backwards (2016-07-26 14:50:02 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c915aa3b56b72c3c9eac3b92f89f22a18ccf0047:

  examples/backwards-read.fio: add size (2016-07-27 08:33:21 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      examples/backwards-read.fio: add size

Jevon Qiao (1):
      Fix memory leak in _fio_setup_rbd_data()

Tomohiro Kusumi (14):
      Make return value type of fio_getaffinity() consistent
      Fix typos in log_err() message
      Use default CPU_COUNT() function in DragonFlyBSD
      Use sizeof(char*) instead of sizeof(void*)
      Mention default values for readwrite=/ioengine=/mem= in documentation
      Mention cpuio never finishes without real I/O in documentation
      Ignore exit_io_done= option if no I/O threads are configured
      Use correct I/O engine name "cpuio" instead of "cpu"
      Add missing --cmdhelp type string for FIO_OPT_UNSUPPORTED
      Don't malloc/memcpy ioengine_ops on td initialization
      Fix stat(2) related bugs introduced by changes made for Windows
      Rename exists_and_not_file() to exists_and_not_regfile()
      Add missing archs in fio_arch_strings[]
      Change arch_i386 to arch_x86

 HOWTO                       | 15 +++++++++-----
 arch/arch-x86.h             |  2 +-
 arch/arch.h                 |  2 +-
 engines/binject.c           | 10 +++++-----
 engines/cpu.c               |  4 ++--
 engines/e4defrag.c          |  6 +++---
 engines/glusterfs.c         | 20 +++++++++----------
 engines/glusterfs_async.c   |  8 ++++----
 engines/glusterfs_sync.c    |  4 ++--
 engines/guasi.c             | 12 ++++++------
 engines/libaio.c            | 14 ++++++-------
 engines/libhdfs.c           | 16 +++++++--------
 engines/net.c               | 42 +++++++++++++++++++--------------------
 engines/null.c              | 12 ++++++------
 engines/posixaio.c          | 10 +++++-----
 engines/rbd.c               | 23 +++++++++++++---------
 engines/rdma.c              | 48 ++++++++++++++++++++++-----------------------
 engines/sg.c                | 16 +++++++--------
 engines/solarisaio.c        | 12 ++++++------
 engines/splice.c            | 12 ++++++------
 engines/sync.c              | 20 +++++++++----------
 engines/windowsaio.c        | 16 +++++++--------
 examples/backwards-read.fio |  1 +
 filesetup.c                 |  4 +++-
 fio.1                       | 15 ++++++++------
 fio.h                       |  6 ++++++
 init.c                      | 11 ++++++++---
 ioengine.h                  |  2 --
 ioengines.c                 | 17 ++++++----------
 libfio.c                    |  8 ++++++++
 options.c                   |  2 +-
 os/os-dragonfly.h           | 24 +++++++++--------------
 os/os-linux-syscall.h       |  2 +-
 os/os-windows.h             |  5 ++++-
 os/os.h                     |  6 +++++-
 parse.c                     |  1 +
 36 files changed, 229 insertions(+), 199 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index ab25cb2..2c5896d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -405,6 +405,7 @@ rw=str		Type of io pattern. Accepted values are:
 			trimwrite	Mixed trims and writes. Blocks will be
 					trimmed first, then written to.
 
+		Fio defaults to read if the option is not specified.
 		For the mixed io types, the default is to split them 50/50.
 		For certain types of io the result may still be skewed a bit,
 		since the speed may be different. It is possible to specify
@@ -699,7 +700,8 @@ ioengine=str	Defines how the job issues io to the file. The following
 			sync	Basic read(2) or write(2) io. lseek(2) is
 				used to position the io location.
 
-			psync 	Basic pread(2) or pwrite(2) io.
+			psync 	Basic pread(2) or pwrite(2) io. Default on all
+				supported operating systems except for Windows.
 
 			vsync	Basic readv(2) or writev(2) IO.
 
@@ -717,6 +719,7 @@ ioengine=str	Defines how the job issues io to the file. The following
 			solarisaio Solaris native asynchronous io.
 
 			windowsaio Windows native asynchronous io.
+				Default on Windows.
 
 			mmap	File is memory mapped and data copied
 				to/from using memcpy(3).
@@ -754,7 +757,8 @@ ioengine=str	Defines how the job issues io to the file. The following
 				85% of the CPU. In case of SMP machines,
 				use numjobs=<no_of_cpu> to get desired CPU
 				usage, as the cpuload only loads a single
-				CPU at the desired rate.
+				CPU at the desired rate. A job never finishes
+				unless there is at least one non-cpuio job.
 
 			guasi	The GUASI IO engine is the Generic Userspace
 				Asyncronous Syscall Interface approach
@@ -1221,6 +1225,7 @@ mem=str		Fio can use various types of memory as the io unit buffer.
 		The allowed values are:
 
 			malloc	Use memory from malloc(3) as the buffers.
+				Default memory type.
 
 			shm	Use shared memory as the buffers. Allocated
 				through shmget(2).
@@ -1835,12 +1840,12 @@ that defines them is selected.
 [psyncv2] hipri		Set RWF_HIPRI on IO, indicating to the kernel that
 			it's of higher priority than normal.
 
-[cpu] cpuload=int Attempt to use the specified percentage of CPU cycles.
+[cpuio] cpuload=int Attempt to use the specified percentage of CPU cycles.
 
-[cpu] cpuchunks=int Split the load into cycles of the given time. In
+[cpuio] cpuchunks=int Split the load into cycles of the given time. In
 		microseconds.
 
-[cpu] exit_on_io_done=bool Detect when IO threads are done, then exit.
+[cpuio] exit_on_io_done=bool Detect when IO threads are done, then exit.
 
 [netsplice] hostname=str
 [net] hostname=str The host name or IP address to use for TCP or UDP based IO.
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index d3b8985..457b44c 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -12,7 +12,7 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 
 #include "arch-x86-common.h"
 
-#define FIO_ARCH	(arch_i386)
+#define FIO_ARCH	(arch_x86)
 
 #define	FIO_HUGE_PAGE		4194304
 
diff --git a/arch/arch.h b/arch/arch.h
index 043e283..00d247c 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -3,7 +3,7 @@
 
 enum {
 	arch_x86_64 = 1,
-	arch_i386,
+	arch_x86,
 	arch_ppc,
 	arch_ia64,
 	arch_s390,
diff --git a/engines/binject.c b/engines/binject.c
index f8e83cd..7d20a3f 100644
--- a/engines/binject.c
+++ b/engines/binject.c
@@ -94,7 +94,7 @@ static int fio_binject_getevents(struct thread_data *td, unsigned int min,
 				 unsigned int max,
 				 const struct timespec fio_unused *t)
 {
-	struct binject_data *bd = td->io_ops->data;
+	struct binject_data *bd = td->io_ops_data;
 	int left = max, ret, r = 0, ev_index = 0;
 	void *buf = bd->cmds;
 	unsigned int i, events;
@@ -185,7 +185,7 @@ static int fio_binject_doio(struct thread_data *td, struct io_u *io_u)
 
 static int fio_binject_prep(struct thread_data *td, struct io_u *io_u)
 {
-	struct binject_data *bd = td->io_ops->data;
+	struct binject_data *bd = td->io_ops_data;
 	struct b_user_cmd *buc = &io_u->buc;
 	struct binject_file *bf = FILE_ENG_DATA(io_u->file);
 
@@ -234,7 +234,7 @@ static int fio_binject_queue(struct thread_data *td, struct io_u *io_u)
 
 static struct io_u *fio_binject_event(struct thread_data *td, int event)
 {
-	struct binject_data *bd = td->io_ops->data;
+	struct binject_data *bd = td->io_ops_data;
 
 	return bd->events[event];
 }
@@ -376,7 +376,7 @@ err_close:
 
 static void fio_binject_cleanup(struct thread_data *td)
 {
-	struct binject_data *bd = td->io_ops->data;
+	struct binject_data *bd = td->io_ops_data;
 
 	if (bd) {
 		free(bd->events);
@@ -406,7 +406,7 @@ static int fio_binject_init(struct thread_data *td)
 	bd->fd_flags = malloc(sizeof(int) * td->o.nr_files);
 	memset(bd->fd_flags, 0, sizeof(int) * td->o.nr_files);
 
-	td->io_ops->data = bd;
+	td->io_ops_data = bd;
 	return 0;
 }
 
diff --git a/engines/cpu.c b/engines/cpu.c
index 7643a8c..3d855e3 100644
--- a/engines/cpu.c
+++ b/engines/cpu.c
@@ -88,8 +88,8 @@ static int fio_cpuio_init(struct thread_data *td)
 
 	o->nr_files = o->open_files = 1;
 
-	log_info("%s: ioengine=cpu, cpuload=%u, cpucycle=%u\n", td->o.name,
-						co->cpuload, co->cpucycle);
+	log_info("%s: ioengine=%s, cpuload=%u, cpucycle=%u\n",
+		td->o.name, td->io_ops->name, co->cpuload, co->cpucycle);
 
 	return 0;
 }
diff --git a/engines/e4defrag.c b/engines/e4defrag.c
index c0667fe..c599c98 100644
--- a/engines/e4defrag.c
+++ b/engines/e4defrag.c
@@ -109,7 +109,7 @@ static int fio_e4defrag_init(struct thread_data *td)
 		goto err;
 
 	ed->bsz = stub.st_blksize;
-	td->io_ops->data = ed;
+	td->io_ops_data = ed;
 	return 0;
 err:
 	td_verror(td, errno, "io_queue_init");
@@ -120,7 +120,7 @@ err:
 
 static void fio_e4defrag_cleanup(struct thread_data *td)
 {
-	struct e4defrag_data *ed = td->io_ops->data;
+	struct e4defrag_data *ed = td->io_ops_data;
 	if (ed) {
 		if (ed->donor_fd >= 0)
 			close(ed->donor_fd);
@@ -136,7 +136,7 @@ static int fio_e4defrag_queue(struct thread_data *td, struct io_u *io_u)
 	unsigned long long len;
 	struct move_extent me;
 	struct fio_file *f = io_u->file;
-	struct e4defrag_data *ed = td->io_ops->data;
+	struct e4defrag_data *ed = td->io_ops_data;
 	struct e4defrag_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
diff --git a/engines/glusterfs.c b/engines/glusterfs.c
index dec9fb5..2abc283 100644
--- a/engines/glusterfs.c
+++ b/engines/glusterfs.c
@@ -41,7 +41,7 @@ int fio_gf_setup(struct thread_data *td)
 
 	dprint(FD_IO, "fio setup\n");
 
-	if (td->io_ops->data)
+	if (td->io_ops_data)
 		return 0;
 
 	g = malloc(sizeof(struct gf_data));
@@ -77,19 +77,19 @@ int fio_gf_setup(struct thread_data *td)
 		goto cleanup;
 	}
 	dprint(FD_FILE, "fio setup %p\n", g->fs);
-	td->io_ops->data = g;
+	td->io_ops_data = g;
 	return 0;
 cleanup:
 	if (g->fs)
 		glfs_fini(g->fs);
 	free(g);
-	td->io_ops->data = NULL;
+	td->io_ops_data = NULL;
 	return r;
 }
 
 void fio_gf_cleanup(struct thread_data *td)
 {
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 
 	if (g) {
 		if (g->aio_events)
@@ -99,7 +99,7 @@ void fio_gf_cleanup(struct thread_data *td)
 		if (g->fs)
 			glfs_fini(g->fs);
 		free(g);
-		td->io_ops->data = NULL;
+		td->io_ops_data = NULL;
 	}
 }
 
@@ -107,7 +107,7 @@ int fio_gf_get_file_size(struct thread_data *td, struct fio_file *f)
 {
 	struct stat buf;
 	int ret;
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 
 	dprint(FD_FILE, "get file size %s\n", f->file_name);
 
@@ -135,7 +135,7 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 
 	int flags = 0;
 	int ret = 0;
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 	struct stat sb = { 0, };
 
 	if (td_write(td)) {
@@ -268,7 +268,7 @@ int fio_gf_open_file(struct thread_data *td, struct fio_file *f)
 int fio_gf_close_file(struct thread_data *td, struct fio_file *f)
 {
 	int ret = 0;
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 
 	dprint(FD_FILE, "fd close %s\n", f->file_name);
 
@@ -284,7 +284,7 @@ int fio_gf_close_file(struct thread_data *td, struct fio_file *f)
 int fio_gf_unlink_file(struct thread_data *td, struct fio_file *f)
 {
 	int ret = 0;
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 
 	dprint(FD_FILE, "fd unlink %s\n", f->file_name);
 
@@ -300,7 +300,7 @@ int fio_gf_unlink_file(struct thread_data *td, struct fio_file *f)
 		g->fd = NULL;
 		free(g);
 	}
-	td->io_ops->data = NULL;
+	td->io_ops_data = NULL;
 
 	return ret;
 }
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 7c2c139..8e42a84 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -13,7 +13,7 @@ struct fio_gf_iou {
 
 static struct io_u *fio_gf_event(struct thread_data *td, int event)
 {
-	struct gf_data *gf_data = td->io_ops->data;
+	struct gf_data *gf_data = td->io_ops_data;
 
 	dprint(FD_IO, "%s\n", __FUNCTION__);
 	return gf_data->aio_events[event];
@@ -22,7 +22,7 @@ static struct io_u *fio_gf_event(struct thread_data *td, int event)
 static int fio_gf_getevents(struct thread_data *td, unsigned int min,
 			    unsigned int max, const struct timespec *t)
 {
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 	unsigned int events = 0;
 	struct io_u *io_u;
 	int i;
@@ -99,7 +99,7 @@ static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
 static int fio_gf_async_queue(struct thread_data fio_unused * td,
 			      struct io_u *io_u)
 {
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 	int r;
 
 	dprint(FD_IO, "%s op %s\n", __FUNCTION__, io_ddir_name(io_u->ddir));
@@ -150,7 +150,7 @@ int fio_gf_async_setup(struct thread_data *td)
 		return r;
 
 	td->o.use_thread = 1;
-	g = td->io_ops->data;
+	g = td->io_ops_data;
 	g->aio_events = calloc(td->o.iodepth, sizeof(struct io_u *));
 	if (!g->aio_events) {
 		r = -ENOMEM;
diff --git a/engines/glusterfs_sync.c b/engines/glusterfs_sync.c
index 6de4ee2..05e184c 100644
--- a/engines/glusterfs_sync.c
+++ b/engines/glusterfs_sync.c
@@ -11,7 +11,7 @@
 static int fio_gf_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 
 	dprint(FD_FILE, "fio prep\n");
 
@@ -31,7 +31,7 @@ static int fio_gf_prep(struct thread_data *td, struct io_u *io_u)
 
 static int fio_gf_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct gf_data *g = td->io_ops->data;
+	struct gf_data *g = td->io_ops_data;
 	int ret = 0;
 
 	dprint(FD_FILE, "fio queue len %lu\n", io_u->xfer_buflen);
diff --git a/engines/guasi.c b/engines/guasi.c
index c586f09..eb12c89 100644
--- a/engines/guasi.c
+++ b/engines/guasi.c
@@ -50,7 +50,7 @@ static int fio_guasi_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 
 static struct io_u *fio_guasi_event(struct thread_data *td, int event)
 {
-	struct guasi_data *ld = td->io_ops->data;
+	struct guasi_data *ld = td->io_ops_data;
 	struct io_u *io_u;
 	struct guasi_reqinfo rinf;
 
@@ -82,7 +82,7 @@ static struct io_u *fio_guasi_event(struct thread_data *td, int event)
 static int fio_guasi_getevents(struct thread_data *td, unsigned int min,
 			       unsigned int max, const struct timespec *t)
 {
-	struct guasi_data *ld = td->io_ops->data;
+	struct guasi_data *ld = td->io_ops_data;
 	int n, r;
 	long timeo = -1;
 
@@ -115,7 +115,7 @@ static int fio_guasi_getevents(struct thread_data *td, unsigned int min,
 
 static int fio_guasi_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct guasi_data *ld = td->io_ops->data;
+	struct guasi_data *ld = td->io_ops_data;
 
 	fio_ro_check(td, io_u);
 
@@ -148,7 +148,7 @@ static void fio_guasi_queued(struct thread_data *td, struct io_u **io_us, int nr
 
 static int fio_guasi_commit(struct thread_data *td)
 {
-	struct guasi_data *ld = td->io_ops->data;
+	struct guasi_data *ld = td->io_ops_data;
 	int i;
 	struct io_u *io_u;
 	struct fio_file *f;
@@ -198,7 +198,7 @@ static int fio_guasi_cancel(struct thread_data fio_unused *td,
 
 static void fio_guasi_cleanup(struct thread_data *td)
 {
-	struct guasi_data *ld = td->io_ops->data;
+	struct guasi_data *ld = td->io_ops_data;
 	int n;
 
 	GDBG_PRINT(("fio_guasi_cleanup(%p)\n", ld));
@@ -235,7 +235,7 @@ static int fio_guasi_init(struct thread_data *td)
 	ld->queued_nr = 0;
 	ld->reqs_nr = 0;
 
-	td->io_ops->data = ld;
+	td->io_ops_data = ld;
 	GDBG_PRINT(("fio_guasi_init(): depth=%d -> %p\n", td->o.iodepth, ld));
 
 	return 0;
diff --git a/engines/libaio.c b/engines/libaio.c
index 9d562bb..e15c519 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -83,7 +83,7 @@ static int fio_libaio_prep(struct thread_data fio_unused *td, struct io_u *io_u)
 
 static struct io_u *fio_libaio_event(struct thread_data *td, int event)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 	struct io_event *ev;
 	struct io_u *io_u;
 
@@ -145,7 +145,7 @@ static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
 static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 				unsigned int max, const struct timespec *t)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 	struct libaio_options *o = td->eo;
 	unsigned actual_min = td->o.iodepth_batch_complete_min == 0 ? 0 : min;
 	struct timespec __lt, *lt = NULL;
@@ -181,7 +181,7 @@ static int fio_libaio_getevents(struct thread_data *td, unsigned int min,
 
 static int fio_libaio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 
 	fio_ro_check(td, io_u);
 
@@ -238,7 +238,7 @@ static void fio_libaio_queued(struct thread_data *td, struct io_u **io_us,
 
 static int fio_libaio_commit(struct thread_data *td)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 	struct iocb **iocbs;
 	struct io_u **io_us;
 	struct timeval tv;
@@ -308,14 +308,14 @@ static int fio_libaio_commit(struct thread_data *td)
 
 static int fio_libaio_cancel(struct thread_data *td, struct io_u *io_u)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 
 	return io_cancel(ld->aio_ctx, &io_u->iocb, ld->aio_events);
 }
 
 static void fio_libaio_cleanup(struct thread_data *td)
 {
-	struct libaio_data *ld = td->io_ops->data;
+	struct libaio_data *ld = td->io_ops_data;
 
 	if (ld) {
 		/*
@@ -363,7 +363,7 @@ static int fio_libaio_init(struct thread_data *td)
 	ld->iocbs = calloc(ld->entries, sizeof(struct iocb *));
 	ld->io_us = calloc(ld->entries, sizeof(struct io_u *));
 
-	td->io_ops->data = ld;
+	td->io_ops_data = ld;
 	return 0;
 }
 
diff --git a/engines/libhdfs.c b/engines/libhdfs.c
index faad3f8..fba17c4 100644
--- a/engines/libhdfs.c
+++ b/engines/libhdfs.c
@@ -119,7 +119,7 @@ static int get_chunck_name(char *dest, char *file_name, uint64_t chunk_id) {
 static int fio_hdfsio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct hdfsio_options *options = td->eo;
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 	unsigned long f_id;
 	char fname[CHUNCK_NAME_LENGTH_MAX];
 	int open_flags;
@@ -163,7 +163,7 @@ static int fio_hdfsio_prep(struct thread_data *td, struct io_u *io_u)
 
 static int fio_hdfsio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 	struct hdfsio_options *options = td->eo;
 	int ret;
 	unsigned long offset;
@@ -223,7 +223,7 @@ int fio_hdfsio_open_file(struct thread_data *td, struct fio_file *f)
 
 int fio_hdfsio_close_file(struct thread_data *td, struct fio_file *f)
 {
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 
 	if (hd->curr_file_id != -1) {
 		if ( hdfsCloseFile(hd->fs, hd->fp) == -1) {
@@ -238,7 +238,7 @@ int fio_hdfsio_close_file(struct thread_data *td, struct fio_file *f)
 static int fio_hdfsio_init(struct thread_data *td)
 {
 	struct hdfsio_options *options = td->eo;
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 	struct fio_file *f;
 	uint64_t j,k;
 	int i, failure = 0;
@@ -309,13 +309,13 @@ static int fio_hdfsio_setup(struct thread_data *td)
 	int i;
 	uint64_t file_size, total_file_size;
 
-	if (!td->io_ops->data) {
+	if (!td->io_ops_data) {
 		hd = malloc(sizeof(*hd));
 		memset(hd, 0, sizeof(*hd));
 		
 		hd->curr_file_id = -1;
 
-		td->io_ops->data = hd;
+		td->io_ops_data = hd;
 	}
 	
 	total_file_size = 0;
@@ -346,7 +346,7 @@ static int fio_hdfsio_setup(struct thread_data *td)
 
 static int fio_hdfsio_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 	struct hdfsio_options *options = td->eo;
 	int failure;
 	struct hdfsBuilder *bld;
@@ -381,7 +381,7 @@ static int fio_hdfsio_io_u_init(struct thread_data *td, struct io_u *io_u)
 
 static void fio_hdfsio_io_u_free(struct thread_data *td, struct io_u *io_u)
 {
-	struct hdfsio_data *hd = td->io_ops->data;
+	struct hdfsio_data *hd = td->io_ops_data;
 
 	if (hd->fs && hdfsDisconnect(hd->fs) < 0) {
 		log_err("hdfs: disconnect failed: %d\n", errno);
diff --git a/engines/net.c b/engines/net.c
index 9301ccf..f24efc1 100644
--- a/engines/net.c
+++ b/engines/net.c
@@ -374,7 +374,7 @@ static int splice_io_u(int fdin, int fdout, unsigned int len)
  */
 static int splice_in(struct thread_data *td, struct io_u *io_u)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 
 	return splice_io_u(io_u->file->fd, nd->pipes[1], io_u->xfer_buflen);
 }
@@ -385,7 +385,7 @@ static int splice_in(struct thread_data *td, struct io_u *io_u)
 static int splice_out(struct thread_data *td, struct io_u *io_u,
 		      unsigned int len)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 
 	return splice_io_u(nd->pipes[0], io_u->file->fd, len);
 }
@@ -423,7 +423,7 @@ static int vmsplice_io_u(struct io_u *io_u, int fd, unsigned int len)
 static int vmsplice_io_u_out(struct thread_data *td, struct io_u *io_u,
 			     unsigned int len)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 
 	return vmsplice_io_u(io_u, nd->pipes[0], len);
 }
@@ -433,7 +433,7 @@ static int vmsplice_io_u_out(struct thread_data *td, struct io_u *io_u,
  */
 static int vmsplice_io_u_in(struct thread_data *td, struct io_u *io_u)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 
 	return vmsplice_io_u(io_u, nd->pipes[1], io_u->xfer_buflen);
 }
@@ -524,7 +524,7 @@ static void verify_udp_seq(struct thread_data *td, struct netio_data *nd,
 
 static int fio_netio_send(struct thread_data *td, struct io_u *io_u)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	int ret, flags = 0;
 
@@ -587,7 +587,7 @@ static int is_close_msg(struct io_u *io_u, int len)
 
 static int fio_netio_recv(struct thread_data *td, struct io_u *io_u)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	int ret, flags = 0;
 
@@ -645,7 +645,7 @@ static int fio_netio_recv(struct thread_data *td, struct io_u *io_u)
 static int __fio_netio_queue(struct thread_data *td, struct io_u *io_u,
 			     enum fio_ddir ddir)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	int ret;
 
@@ -711,7 +711,7 @@ static int fio_netio_queue(struct thread_data *td, struct io_u *io_u)
 
 static int fio_netio_connect(struct thread_data *td, struct fio_file *f)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	int type, domain;
 
@@ -826,7 +826,7 @@ static int fio_netio_connect(struct thread_data *td, struct fio_file *f)
 
 static int fio_netio_accept(struct thread_data *td, struct fio_file *f)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	socklen_t socklen;
 	int state;
@@ -878,7 +878,7 @@ err:
 
 static void fio_netio_send_close(struct thread_data *td, struct fio_file *f)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	struct udp_close_msg msg;
 	struct sockaddr *to;
@@ -913,7 +913,7 @@ static int fio_netio_close_file(struct thread_data *td, struct fio_file *f)
 
 static int fio_netio_udp_recv_open(struct thread_data *td, struct fio_file *f)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	struct udp_close_msg msg;
 	struct sockaddr *to;
@@ -947,7 +947,7 @@ static int fio_netio_udp_recv_open(struct thread_data *td, struct fio_file *f)
 
 static int fio_netio_send_open(struct thread_data *td, struct fio_file *f)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	struct udp_close_msg msg;
 	struct sockaddr *to;
@@ -1049,7 +1049,7 @@ static int fio_fill_addr(struct thread_data *td, const char *host, int af,
 static int fio_netio_setup_connect_inet(struct thread_data *td,
 					const char *host, unsigned short port)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	struct addrinfo *res = NULL;
 	void *dst, *src;
@@ -1099,7 +1099,7 @@ static int fio_netio_setup_connect_inet(struct thread_data *td,
 static int fio_netio_setup_connect_unix(struct thread_data *td,
 					const char *path)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct sockaddr_un *soun = &nd->addr_un;
 
 	soun->sun_family = AF_UNIX;
@@ -1120,7 +1120,7 @@ static int fio_netio_setup_connect(struct thread_data *td)
 
 static int fio_netio_setup_listen_unix(struct thread_data *td, const char *path)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct sockaddr_un *addr = &nd->addr_un;
 	mode_t mode;
 	int len, fd;
@@ -1153,7 +1153,7 @@ static int fio_netio_setup_listen_unix(struct thread_data *td, const char *path)
 
 static int fio_netio_setup_listen_inet(struct thread_data *td, short port)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	struct ip_mreq mr;
 	struct sockaddr_in sin;
@@ -1269,7 +1269,7 @@ static int fio_netio_setup_listen_inet(struct thread_data *td, short port)
 
 static int fio_netio_setup_listen(struct thread_data *td)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 	struct netio_options *o = td->eo;
 	int ret;
 
@@ -1344,7 +1344,7 @@ static int fio_netio_init(struct thread_data *td)
 
 static void fio_netio_cleanup(struct thread_data *td)
 {
-	struct netio_data *nd = td->io_ops->data;
+	struct netio_data *nd = td->io_ops_data;
 
 	if (nd) {
 		if (nd->listenfd != -1)
@@ -1368,13 +1368,13 @@ static int fio_netio_setup(struct thread_data *td)
 		td->o.open_files++;
 	}
 
-	if (!td->io_ops->data) {
+	if (!td->io_ops_data) {
 		nd = malloc(sizeof(*nd));;
 
 		memset(nd, 0, sizeof(*nd));
 		nd->listenfd = -1;
 		nd->pipes[0] = nd->pipes[1] = -1;
-		td->io_ops->data = nd;
+		td->io_ops_data = nd;
 	}
 
 	return 0;
@@ -1392,7 +1392,7 @@ static int fio_netio_setup_splice(struct thread_data *td)
 
 	fio_netio_setup(td);
 
-	nd = td->io_ops->data;
+	nd = td->io_ops_data;
 	if (nd) {
 		if (pipe(nd->pipes) < 0)
 			return 1;
diff --git a/engines/null.c b/engines/null.c
index 41d42e0..f7ba370 100644
--- a/engines/null.c
+++ b/engines/null.c
@@ -25,7 +25,7 @@ struct null_data {
 
 static struct io_u *fio_null_event(struct thread_data *td, int event)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops->data;
+	struct null_data *nd = (struct null_data *) td->io_ops_data;
 
 	return nd->io_us[event];
 }
@@ -34,7 +34,7 @@ static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
 			      unsigned int fio_unused max,
 			      const struct timespec fio_unused *t)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops->data;
+	struct null_data *nd = (struct null_data *) td->io_ops_data;
 	int ret = 0;
 	
 	if (min_events) {
@@ -47,7 +47,7 @@ static int fio_null_getevents(struct thread_data *td, unsigned int min_events,
 
 static int fio_null_commit(struct thread_data *td)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops->data;
+	struct null_data *nd = (struct null_data *) td->io_ops_data;
 
 	if (!nd->events) {
 #ifndef FIO_EXTERNAL_ENGINE
@@ -62,7 +62,7 @@ static int fio_null_commit(struct thread_data *td)
 
 static int fio_null_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops->data;
+	struct null_data *nd = (struct null_data *) td->io_ops_data;
 
 	fio_ro_check(td, io_u);
 
@@ -83,7 +83,7 @@ static int fio_null_open(struct thread_data fio_unused *td,
 
 static void fio_null_cleanup(struct thread_data *td)
 {
-	struct null_data *nd = (struct null_data *) td->io_ops->data;
+	struct null_data *nd = (struct null_data *) td->io_ops_data;
 
 	if (nd) {
 		free(nd->io_us);
@@ -103,7 +103,7 @@ static int fio_null_init(struct thread_data *td)
 	} else
 		td->io_ops->flags |= FIO_SYNCIO;
 
-	td->io_ops->data = nd;
+	td->io_ops_data = nd;
 	return 0;
 }
 
diff --git a/engines/posixaio.c b/engines/posixaio.c
index 29bcc5a..e5411b7 100644
--- a/engines/posixaio.c
+++ b/engines/posixaio.c
@@ -93,7 +93,7 @@ static int fio_posixaio_prep(struct thread_data fio_unused *td,
 static int fio_posixaio_getevents(struct thread_data *td, unsigned int min,
 				  unsigned int max, const struct timespec *t)
 {
-	struct posixaio_data *pd = td->io_ops->data;
+	struct posixaio_data *pd = td->io_ops_data;
 	os_aiocb_t *suspend_list[SUSPEND_ENTRIES];
 	struct timespec start;
 	int have_timeout = 0;
@@ -161,7 +161,7 @@ restart:
 
 static struct io_u *fio_posixaio_event(struct thread_data *td, int event)
 {
-	struct posixaio_data *pd = td->io_ops->data;
+	struct posixaio_data *pd = td->io_ops_data;
 
 	return pd->aio_events[event];
 }
@@ -169,7 +169,7 @@ static struct io_u *fio_posixaio_event(struct thread_data *td, int event)
 static int fio_posixaio_queue(struct thread_data *td,
 			      struct io_u *io_u)
 {
-	struct posixaio_data *pd = td->io_ops->data;
+	struct posixaio_data *pd = td->io_ops_data;
 	os_aiocb_t *aiocb = &io_u->aiocb;
 	int ret;
 
@@ -220,7 +220,7 @@ static int fio_posixaio_queue(struct thread_data *td,
 
 static void fio_posixaio_cleanup(struct thread_data *td)
 {
-	struct posixaio_data *pd = td->io_ops->data;
+	struct posixaio_data *pd = td->io_ops_data;
 
 	if (pd) {
 		free(pd->aio_events);
@@ -236,7 +236,7 @@ static int fio_posixaio_init(struct thread_data *td)
 	pd->aio_events = malloc(td->o.iodepth * sizeof(struct io_u *));
 	memset(pd->aio_events, 0, td->o.iodepth * sizeof(struct io_u *));
 
-	td->io_ops->data = pd;
+	td->io_ops_data = pd;
 	return 0;
 }
 
diff --git a/engines/rbd.c b/engines/rbd.c
index 87ed360..7a109ee 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -91,7 +91,7 @@ static int _fio_setup_rbd_data(struct thread_data *td,
 {
 	struct rbd_data *rbd;
 
-	if (td->io_ops->data)
+	if (td->io_ops_data)
 		return 0;
 
 	rbd = calloc(1, sizeof(struct rbd_data));
@@ -110,15 +110,20 @@ static int _fio_setup_rbd_data(struct thread_data *td,
 	return 0;
 
 failed:
-	if (rbd)
+	if (rbd) {
+		if (rbd->aio_events) 
+			free(rbd->aio_events);
+		if (rbd->sort_events)
+			free(rbd->sort_events);
 		free(rbd);
+	}
 	return 1;
 
 }
 
 static int _fio_rbd_connect(struct thread_data *td)
 {
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 	struct rbd_options *o = td->eo;
 	int r;
 
@@ -226,7 +231,7 @@ static void _fio_rbd_finish_aiocb(rbd_completion_t comp, void *data)
 
 static struct io_u *fio_rbd_event(struct thread_data *td, int event)
 {
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 
 	return rbd->aio_events[event];
 }
@@ -282,7 +287,7 @@ static int rbd_io_u_cmp(const void *p1, const void *p2)
 static int rbd_iter_events(struct thread_data *td, unsigned int *events,
 			   unsigned int min_evts, int wait)
 {
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 	unsigned int this_events = 0;
 	struct io_u *io_u;
 	int i, sidx;
@@ -361,7 +366,7 @@ static int fio_rbd_getevents(struct thread_data *td, unsigned int min,
 
 static int fio_rbd_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 	struct fio_rbd_iou *fri = io_u->engine_data;
 	int r = -1;
 
@@ -439,7 +444,7 @@ failed:
 
 static void fio_rbd_cleanup(struct thread_data *td)
 {
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 
 	if (rbd) {
 		_fio_rbd_disconnect(rbd);
@@ -467,7 +472,7 @@ static int fio_rbd_setup(struct thread_data *td)
 		log_err("fio_setup_rbd_data failed.\n");
 		goto cleanup;
 	}
-	td->io_ops->data = rbd;
+	td->io_ops_data = rbd;
 
 	/* librbd does not allow us to run first in the main thread and later
 	 * in a fork child. It needs to be the same process context all the
@@ -526,7 +531,7 @@ static int fio_rbd_open(struct thread_data *td, struct fio_file *f)
 static int fio_rbd_invalidate(struct thread_data *td, struct fio_file *f)
 {
 #if defined(CONFIG_RBD_INVAL)
-	struct rbd_data *rbd = td->io_ops->data;
+	struct rbd_data *rbd = td->io_ops_data;
 
 	return rbd_invalidate_cache(rbd->image);
 #else
diff --git a/engines/rdma.c b/engines/rdma.c
index 7fbfad9..fbe8434 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -191,7 +191,7 @@ struct rdmaio_data {
 
 static int client_recv(struct thread_data *td, struct ibv_wc *wc)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	unsigned int max_bs;
 
 	if (wc->byte_len != sizeof(rd->recv_buf)) {
@@ -232,7 +232,7 @@ static int client_recv(struct thread_data *td, struct ibv_wc *wc)
 
 static int server_recv(struct thread_data *td, struct ibv_wc *wc)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	unsigned int max_bs;
 
 	if (wc->wr_id == FIO_RDMA_MAX_IO_DEPTH) {
@@ -257,7 +257,7 @@ static int server_recv(struct thread_data *td, struct ibv_wc *wc)
 
 static int cq_event_handler(struct thread_data *td, enum ibv_wc_opcode opcode)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_wc wc;
 	struct rdma_io_u_data *r_io_u_d;
 	int ret;
@@ -368,7 +368,7 @@ static int cq_event_handler(struct thread_data *td, enum ibv_wc_opcode opcode)
  */
 static int rdma_poll_wait(struct thread_data *td, enum ibv_wc_opcode opcode)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_cq *ev_cq;
 	void *ev_ctx;
 	int ret;
@@ -405,7 +405,7 @@ again:
 
 static int fio_rdmaio_setup_qp(struct thread_data *td)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_qp_init_attr init_attr;
 	int qp_depth = td->o.iodepth * 2;	/* 2 times of io depth */
 
@@ -485,7 +485,7 @@ err1:
 
 static int fio_rdmaio_setup_control_msg_buffers(struct thread_data *td)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 
 	rd->recv_mr = ibv_reg_mr(rd->pd, &rd->recv_buf, sizeof(rd->recv_buf),
 				 IBV_ACCESS_LOCAL_WRITE);
@@ -529,7 +529,7 @@ static int get_next_channel_event(struct thread_data *td,
 				  struct rdma_event_channel *channel,
 				  enum rdma_cm_event_type wait_event)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdma_cm_event *event;
 	int ret;
 
@@ -561,7 +561,7 @@ static int get_next_channel_event(struct thread_data *td,
 
 static int fio_rdmaio_prep(struct thread_data *td, struct io_u *io_u)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdma_io_u_data *r_io_u_d;
 
 	r_io_u_d = io_u->engine_data;
@@ -604,7 +604,7 @@ static int fio_rdmaio_prep(struct thread_data *td, struct io_u *io_u)
 
 static struct io_u *fio_rdmaio_event(struct thread_data *td, int event)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct io_u *io_u;
 	int i;
 
@@ -622,7 +622,7 @@ static struct io_u *fio_rdmaio_event(struct thread_data *td, int event)
 static int fio_rdmaio_getevents(struct thread_data *td, unsigned int min,
 				unsigned int max, const struct timespec *t)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	enum ibv_wc_opcode comp_opcode;
 	struct ibv_cq *ev_cq;
 	void *ev_ctx;
@@ -684,7 +684,7 @@ again:
 static int fio_rdmaio_send(struct thread_data *td, struct io_u **io_us,
 			   unsigned int nr)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_send_wr *bad_wr;
 #if 0
 	enum ibv_wc_opcode comp_opcode;
@@ -747,7 +747,7 @@ static int fio_rdmaio_send(struct thread_data *td, struct io_u **io_us,
 static int fio_rdmaio_recv(struct thread_data *td, struct io_u **io_us,
 			   unsigned int nr)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_recv_wr *bad_wr;
 	struct rdma_io_u_data *r_io_u_d;
 	int i;
@@ -783,7 +783,7 @@ static int fio_rdmaio_recv(struct thread_data *td, struct io_u **io_us,
 
 static int fio_rdmaio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 
 	fio_ro_check(td, io_u);
 
@@ -801,7 +801,7 @@ static int fio_rdmaio_queue(struct thread_data *td, struct io_u *io_u)
 static void fio_rdmaio_queued(struct thread_data *td, struct io_u **io_us,
 			      unsigned int nr)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct timeval now;
 	unsigned int i;
 
@@ -824,7 +824,7 @@ static void fio_rdmaio_queued(struct thread_data *td, struct io_u **io_us,
 
 static int fio_rdmaio_commit(struct thread_data *td)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct io_u **io_us;
 	int ret;
 
@@ -856,7 +856,7 @@ static int fio_rdmaio_commit(struct thread_data *td)
 
 static int fio_rdmaio_connect(struct thread_data *td, struct fio_file *f)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdma_conn_param conn_param;
 	struct ibv_send_wr *bad_wr;
 
@@ -907,7 +907,7 @@ static int fio_rdmaio_connect(struct thread_data *td, struct fio_file *f)
 
 static int fio_rdmaio_accept(struct thread_data *td, struct fio_file *f)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdma_conn_param conn_param;
 	struct ibv_send_wr *bad_wr;
 	int ret = 0;
@@ -952,7 +952,7 @@ static int fio_rdmaio_open_file(struct thread_data *td, struct fio_file *f)
 
 static int fio_rdmaio_close_file(struct thread_data *td, struct fio_file *f)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_send_wr *bad_wr;
 
 	/* unregister rdma buffer */
@@ -1008,7 +1008,7 @@ static int fio_rdmaio_close_file(struct thread_data *td, struct fio_file *f)
 static int fio_rdmaio_setup_connect(struct thread_data *td, const char *host,
 				    unsigned short port)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_recv_wr *bad_wr;
 	int err;
 
@@ -1072,7 +1072,7 @@ static int fio_rdmaio_setup_connect(struct thread_data *td, const char *host,
 
 static int fio_rdmaio_setup_listen(struct thread_data *td, short port)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct ibv_recv_wr *bad_wr;
 	int state = td->runstate;
 
@@ -1207,7 +1207,7 @@ bad_host:
 
 static int fio_rdmaio_init(struct thread_data *td)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 	struct rdmaio_options *o = td->eo;
 	unsigned int max_bs;
 	int ret, i;
@@ -1316,7 +1316,7 @@ static int fio_rdmaio_init(struct thread_data *td)
 
 static void fio_rdmaio_cleanup(struct thread_data *td)
 {
-	struct rdmaio_data *rd = td->io_ops->data;
+	struct rdmaio_data *rd = td->io_ops_data;
 
 	if (rd)
 		free(rd);
@@ -1332,12 +1332,12 @@ static int fio_rdmaio_setup(struct thread_data *td)
 		td->o.open_files++;
 	}
 
-	if (!td->io_ops->data) {
+	if (!td->io_ops_data) {
 		rd = malloc(sizeof(*rd));
 
 		memset(rd, 0, sizeof(*rd));
 		init_rand_seed(&rd->rand_state, (unsigned int) GOLDEN_RATIO_PRIME, 0);
-		td->io_ops->data = rd;
+		td->io_ops_data = rd;
 	}
 
 	return 0;
diff --git a/engines/sg.c b/engines/sg.c
index 360775f..c1fe602 100644
--- a/engines/sg.c
+++ b/engines/sg.c
@@ -102,7 +102,7 @@ static int fio_sgio_getevents(struct thread_data *td, unsigned int min,
 			      unsigned int max,
 			      const struct timespec fio_unused *t)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 	int left = max, eventNum, ret, r = 0;
 	void *buf = sd->sgbuf;
 	unsigned int i, events;
@@ -207,7 +207,7 @@ re_read:
 static int fio_sgio_ioctl_doio(struct thread_data *td,
 			       struct fio_file *f, struct io_u *io_u)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 	struct sg_io_hdr *hdr = &io_u->hdr;
 	int ret;
 
@@ -268,7 +268,7 @@ static int fio_sgio_doio(struct thread_data *td, struct io_u *io_u, int do_sync)
 static int fio_sgio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct sg_io_hdr *hdr = &io_u->hdr;
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 	long long nr_blocks, lba;
 
 	if (io_u->xfer_buflen & (sd->bs - 1)) {
@@ -366,7 +366,7 @@ static int fio_sgio_queue(struct thread_data *td, struct io_u *io_u)
 
 static struct io_u *fio_sgio_event(struct thread_data *td, int event)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 
 	return sd->events[event];
 }
@@ -463,7 +463,7 @@ static int fio_sgio_read_capacity(struct thread_data *td, unsigned int *bs,
 
 static void fio_sgio_cleanup(struct thread_data *td)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 
 	if (sd) {
 		free(sd->events);
@@ -492,7 +492,7 @@ static int fio_sgio_init(struct thread_data *td)
 	sd->sgbuf = malloc(sizeof(struct sg_io_hdr) * td->o.iodepth);
 	memset(sd->sgbuf, 0, sizeof(struct sg_io_hdr) * td->o.iodepth);
 	sd->type_checked = 0;
-	td->io_ops->data = sd;
+	td->io_ops_data = sd;
 
 	/*
 	 * we want to do it, regardless of whether odirect is set or not
@@ -503,7 +503,7 @@ static int fio_sgio_init(struct thread_data *td)
 
 static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 	unsigned int bs = 0;
 	unsigned long long max_lba = 0;
 
@@ -552,7 +552,7 @@ static int fio_sgio_type_check(struct thread_data *td, struct fio_file *f)
 
 static int fio_sgio_open(struct thread_data *td, struct fio_file *f)
 {
-	struct sgio_data *sd = td->io_ops->data;
+	struct sgio_data *sd = td->io_ops_data;
 	int ret;
 
 	ret = generic_open_file(td, f);
diff --git a/engines/solarisaio.c b/engines/solarisaio.c
index 55a0cb9..151f31d 100644
--- a/engines/solarisaio.c
+++ b/engines/solarisaio.c
@@ -28,7 +28,7 @@ static int fio_solarisaio_cancel(struct thread_data fio_unused *td,
 static int fio_solarisaio_prep(struct thread_data fio_unused *td,
 			    struct io_u *io_u)
 {
-	struct solarisaio_data *sd = td->io_ops->data;
+	struct solarisaio_data *sd = td->io_ops_data;
 
 	io_u->resultp.aio_return = AIO_INPROGRESS;
 	io_u->engine_data = sd;
@@ -75,7 +75,7 @@ static void wait_for_event(struct timeval *tv)
 static int fio_solarisaio_getevents(struct thread_data *td, unsigned int min,
 				    unsigned int max, const struct timespec *t)
 {
-	struct solarisaio_data *sd = td->io_ops->data;
+	struct solarisaio_data *sd = td->io_ops_data;
 	struct timeval tv;
 	int ret;
 
@@ -100,7 +100,7 @@ static int fio_solarisaio_getevents(struct thread_data *td, unsigned int min,
 
 static struct io_u *fio_solarisaio_event(struct thread_data *td, int event)
 {
-	struct solarisaio_data *sd = td->io_ops->data;
+	struct solarisaio_data *sd = td->io_ops_data;
 
 	return sd->aio_events[event];
 }
@@ -108,7 +108,7 @@ static struct io_u *fio_solarisaio_event(struct thread_data *td, int event)
 static int fio_solarisaio_queue(struct thread_data fio_unused *td,
 			      struct io_u *io_u)
 {
-	struct solarisaio_data *sd = td->io_ops->data;
+	struct solarisaio_data *sd = td->io_ops_data;
 	struct fio_file *f = io_u->file;
 	off_t off;
 	int ret;
@@ -155,7 +155,7 @@ static int fio_solarisaio_queue(struct thread_data fio_unused *td,
 
 static void fio_solarisaio_cleanup(struct thread_data *td)
 {
-	struct solarisaio_data *sd = td->io_ops->data;
+	struct solarisaio_data *sd = td->io_ops_data;
 
 	if (sd) {
 		free(sd->aio_events);
@@ -204,7 +204,7 @@ static int fio_solarisaio_init(struct thread_data *td)
 	fio_solarisaio_init_sigio();
 #endif
 
-	td->io_ops->data = sd;
+	td->io_ops_data = sd;
 	return 0;
 }
 
diff --git a/engines/splice.c b/engines/splice.c
index f35ae17..eba093e 100644
--- a/engines/splice.c
+++ b/engines/splice.c
@@ -28,7 +28,7 @@ struct spliceio_data {
  */
 static int fio_splice_read_old(struct thread_data *td, struct io_u *io_u)
 {
-	struct spliceio_data *sd = td->io_ops->data;
+	struct spliceio_data *sd = td->io_ops_data;
 	struct fio_file *f = io_u->file;
 	int ret, ret2, buflen;
 	off_t offset;
@@ -72,7 +72,7 @@ static int fio_splice_read_old(struct thread_data *td, struct io_u *io_u)
  */
 static int fio_splice_read(struct thread_data *td, struct io_u *io_u)
 {
-	struct spliceio_data *sd = td->io_ops->data;
+	struct spliceio_data *sd = td->io_ops_data;
 	struct fio_file *f = io_u->file;
 	struct iovec iov;
 	int ret , buflen, mmap_len;
@@ -166,7 +166,7 @@ static int fio_splice_read(struct thread_data *td, struct io_u *io_u)
  */
 static int fio_splice_write(struct thread_data *td, struct io_u *io_u)
 {
-	struct spliceio_data *sd = td->io_ops->data;
+	struct spliceio_data *sd = td->io_ops_data;
 	struct iovec iov = {
 		.iov_base = io_u->xfer_buf,
 		.iov_len = io_u->xfer_buflen,
@@ -201,7 +201,7 @@ static int fio_splice_write(struct thread_data *td, struct io_u *io_u)
 
 static int fio_spliceio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct spliceio_data *sd = td->io_ops->data;
+	struct spliceio_data *sd = td->io_ops_data;
 	int ret = 0;
 
 	fio_ro_check(td, io_u);
@@ -247,7 +247,7 @@ static int fio_spliceio_queue(struct thread_data *td, struct io_u *io_u)
 
 static void fio_spliceio_cleanup(struct thread_data *td)
 {
-	struct spliceio_data *sd = td->io_ops->data;
+	struct spliceio_data *sd = td->io_ops_data;
 
 	if (sd) {
 		close(sd->pipe[0]);
@@ -284,7 +284,7 @@ static int fio_spliceio_init(struct thread_data *td)
 	if (td_read(td))
 		td->o.mem_align = 1;
 
-	td->io_ops->data = sd;
+	td->io_ops_data = sd;
 	return 0;
 }
 
diff --git a/engines/sync.c b/engines/sync.c
index 433e4fa..1726b8e 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -97,7 +97,7 @@ static int fio_io_end(struct thread_data *td, struct io_u *io_u, int ret)
 #ifdef CONFIG_PWRITEV
 static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 	struct iovec *iov = &sd->iovecs[0];
 	struct fio_file *f = io_u->file;
 	int ret;
@@ -124,7 +124,7 @@ static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
 #ifdef FIO_HAVE_PWRITEV2
 static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 	struct psyncv2_options *o = td->eo;
 	struct iovec *iov = &sd->iovecs[0];
 	struct fio_file *f = io_u->file;
@@ -197,7 +197,7 @@ static int fio_vsyncio_getevents(struct thread_data *td, unsigned int min,
 				 unsigned int max,
 				 const struct timespec fio_unused *t)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 	int ret;
 
 	if (min) {
@@ -212,14 +212,14 @@ static int fio_vsyncio_getevents(struct thread_data *td, unsigned int min,
 
 static struct io_u *fio_vsyncio_event(struct thread_data *td, int event)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 
 	return sd->io_us[event];
 }
 
 static int fio_vsyncio_append(struct thread_data *td, struct io_u *io_u)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 
 	if (ddir_sync(io_u->ddir))
 		return 0;
@@ -246,7 +246,7 @@ static void fio_vsyncio_set_iov(struct syncio_data *sd, struct io_u *io_u,
 
 static int fio_vsyncio_queue(struct thread_data *td, struct io_u *io_u)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 
 	fio_ro_check(td, io_u);
 
@@ -286,7 +286,7 @@ static int fio_vsyncio_queue(struct thread_data *td, struct io_u *io_u)
  */
 static int fio_vsyncio_end(struct thread_data *td, ssize_t bytes)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 	struct io_u *io_u;
 	unsigned int i;
 	int err;
@@ -326,7 +326,7 @@ static int fio_vsyncio_end(struct thread_data *td, ssize_t bytes)
 
 static int fio_vsyncio_commit(struct thread_data *td)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 	struct fio_file *f;
 	ssize_t ret;
 
@@ -364,13 +364,13 @@ static int fio_vsyncio_init(struct thread_data *td)
 	sd->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
 	sd->io_us = malloc(td->o.iodepth * sizeof(struct io_u *));
 
-	td->io_ops->data = sd;
+	td->io_ops_data = sd;
 	return 0;
 }
 
 static void fio_vsyncio_cleanup(struct thread_data *td)
 {
-	struct syncio_data *sd = td->io_ops->data;
+	struct syncio_data *sd = td->io_ops_data;
 
 	if (sd) {
 		free(sd->iovecs);
diff --git a/engines/windowsaio.c b/engines/windowsaio.c
index cbbed6a..0e164b6 100644
--- a/engines/windowsaio.c
+++ b/engines/windowsaio.c
@@ -84,7 +84,7 @@ static int fio_windowsaio_init(struct thread_data *td)
 		}
 	}
 
-	td->io_ops->data = wd;
+	td->io_ops_data = wd;
 
 	if (!rc) {
 		struct thread_ctx *ctx;
@@ -97,7 +97,7 @@ static int fio_windowsaio_init(struct thread_data *td)
 			rc = 1;
 		}
 
-		wd = td->io_ops->data;
+		wd = td->io_ops_data;
 		wd->iothread_running = TRUE;
 		wd->iocp = hFile;
 
@@ -131,7 +131,7 @@ static void fio_windowsaio_cleanup(struct thread_data *td)
 {
 	struct windowsaio_data *wd;
 
-	wd = td->io_ops->data;
+	wd = td->io_ops_data;
 
 	if (wd != NULL) {
 		wd->iothread_running = FALSE;
@@ -143,7 +143,7 @@ static void fio_windowsaio_cleanup(struct thread_data *td)
 		free(wd->aio_events);
 		free(wd);
 
-		td->io_ops->data = NULL;
+		td->io_ops_data = NULL;
 	}
 }
 
@@ -203,10 +203,10 @@ static int fio_windowsaio_open_file(struct thread_data *td, struct fio_file *f)
 
 	/* Only set up the completion port and thread if we're not just
 	 * querying the device size */
-	if (!rc && td->io_ops->data != NULL) {
+	if (!rc && td->io_ops_data != NULL) {
 		struct windowsaio_data *wd;
 
-		wd = td->io_ops->data;
+		wd = td->io_ops_data;
 
 		if (CreateIoCompletionPort(f->hFile, wd->iocp, 0, 0) == NULL) {
 			log_err("windowsaio: failed to create io completion port\n");
@@ -251,7 +251,7 @@ static BOOL timeout_expired(DWORD start_count, DWORD end_count)
 
 static struct io_u* fio_windowsaio_event(struct thread_data *td, int event)
 {
-	struct windowsaio_data *wd = td->io_ops->data;
+	struct windowsaio_data *wd = td->io_ops_data;
 	return wd->aio_events[event];
 }
 
@@ -259,7 +259,7 @@ static int fio_windowsaio_getevents(struct thread_data *td, unsigned int min,
 				    unsigned int max,
 				    const struct timespec *t)
 {
-	struct windowsaio_data *wd = td->io_ops->data;
+	struct windowsaio_data *wd = td->io_ops_data;
 	unsigned int dequeued = 0;
 	struct io_u *io_u;
 	int i;
diff --git a/examples/backwards-read.fio b/examples/backwards-read.fio
index ddd47e4..0fe35a2 100644
--- a/examples/backwards-read.fio
+++ b/examples/backwards-read.fio
@@ -5,3 +5,4 @@ bs=4k
 # seek -8k back for every IO
 rw=read:-8k
 filename=128m
+size=128m
diff --git a/filesetup.c b/filesetup.c
index 012773b..a48faf5 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -329,7 +329,7 @@ static int char_size(struct thread_data *td, struct fio_file *f)
 	int r;
 
 	if (td->io_ops->open_file(td, f)) {
-		log_err("fio: failed opening blockdev %s for size check\n",
+		log_err("fio: failed opening chardev %s for size check\n",
 			f->file_name);
 		return 1;
 	}
@@ -1225,10 +1225,12 @@ static void get_file_type(struct fio_file *f)
 	else
 		f->filetype = FIO_TYPE_FILE;
 
+#ifdef WIN32
 	/* \\.\ is the device namespace in Windows, where every file is
 	 * a block device */
 	if (strncmp(f->file_name, "\\\\.\\", 4) == 0)
 		f->filetype = FIO_TYPE_BD;
+#endif
 
 	if (!stat(f->file_name, &sb)) {
 		if (S_ISBLK(sb.st_mode))
diff --git a/fio.1 b/fio.1
index e89c3d1..85eb0fe 100644
--- a/fio.1
+++ b/fio.1
@@ -309,6 +309,7 @@ Trim and write mixed workload. Blocks will be trimmed first, then the same
 blocks will be written to.
 .RE
 .P
+Fio defaults to read if the option is not specified.
 For mixed I/O, the default split is 50/50. For certain types of io the result
 may still be skewed a bit, since the speed may be different. It is possible to
 specify a number of IO's to do before getting a new offset, this is done by
@@ -602,6 +603,7 @@ position the I/O location.
 .TP
 .B psync
 Basic \fBpread\fR\|(2) or \fBpwrite\fR\|(2) I/O.
+Default on all supported operating systems except for Windows.
 .TP
 .B vsync
 Basic \fBreadv\fR\|(2) or \fBwritev\fR\|(2) I/O. Will emulate queuing by
@@ -623,7 +625,7 @@ POSIX asynchronous I/O using \fBaio_read\fR\|(3) and \fBaio_write\fR\|(3).
 Solaris native asynchronous I/O.
 .TP
 .B windowsaio
-Windows native asynchronous I/O.
+Windows native asynchronous I/O. Default on Windows.
 .TP
 .B mmap
 File is memory mapped with \fBmmap\fR\|(2) and data copied using
@@ -654,7 +656,8 @@ and send/receive. This ioengine defines engine specific options.
 .TP
 .B cpuio
 Doesn't transfer any data, but burns CPU cycles according to \fBcpuload\fR and
-\fBcpuchunks\fR parameters.
+\fBcpuchunks\fR parameters. A job never finishes unless there is at least one
+non-cpuio job.
 .TP
 .B guasi
 The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface
@@ -1149,7 +1152,7 @@ Allocation method for I/O unit buffer.  Allowed values are:
 .RS
 .TP
 .B malloc
-Allocate memory with \fBmalloc\fR\|(3).
+Allocate memory with \fBmalloc\fR\|(3). Default memory type.
 .TP
 .B shm
 Use shared memory buffers allocated through \fBshmget\fR\|(2).
@@ -1692,13 +1695,13 @@ Some parameters are only valid when a specific ioengine is in use. These are
 used identically to normal parameters, with the caveat that when used on the
 command line, they must come after the ioengine.
 .TP
-.BI (cpu)cpuload \fR=\fPint
+.BI (cpuio)cpuload \fR=\fPint
 Attempt to use the specified percentage of CPU cycles.
 .TP
-.BI (cpu)cpuchunks \fR=\fPint
+.BI (cpuio)cpuchunks \fR=\fPint
 Split the load into cycles of the given time. In microseconds.
 .TP
-.BI (cpu)exit_on_io_done \fR=\fPbool
+.BI (cpuio)exit_on_io_done \fR=\fPbool
 Detect when IO threads are done, then exit.
 .TP
 .BI (libaio)userspace_reap
diff --git a/fio.h b/fio.h
index 8a0ebe3..87a94f6 100644
--- a/fio.h
+++ b/fio.h
@@ -227,6 +227,12 @@ struct thread_data {
 	struct ioengine_ops *io_ops;
 
 	/*
+	 * IO engine private data and dlhandle.
+	 */
+	void *io_ops_data;
+	void *io_ops_dlhandle;
+
+	/*
 	 * Queue depth of io_u's that fio MIGHT do
 	 */
 	unsigned int cur_depth;
diff --git a/init.c b/init.c
index 065a71a..f81db3c 100644
--- a/init.c
+++ b/init.c
@@ -860,7 +860,7 @@ static int fixup_options(struct thread_data *td)
 		td->loops = 1;
 
 	if (td->o.block_error_hist && td->o.nr_files != 1) {
-		log_err("fio: block error histogram only available with "
+		log_err("fio: block error histogram only available "
 			"with a single file per job, but %d files "
 			"provided\n", td->o.nr_files);
 		ret = 1;
@@ -904,17 +904,22 @@ static const char *get_engine_name(const char *str)
 	return p;
 }
 
-static int exists_and_not_file(const char *filename)
+static int exists_and_not_regfile(const char *filename)
 {
 	struct stat sb;
 
 	if (lstat(filename, &sb) == -1)
 		return 0;
 
+#ifndef WIN32 /* NOT Windows */
+	if (S_ISREG(sb.st_mode))
+		return 0;
+#else
 	/* \\.\ is the device namespace in Windows, where every file
 	 * is a device node */
 	if (S_ISREG(sb.st_mode) && strncmp(filename, "\\\\.\\", 4) != 0)
 		return 0;
+#endif
 
 	return 1;
 }
@@ -1342,7 +1347,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	if (!o->filename && !td->files_index && !o->read_iolog_file) {
 		file_alloced = 1;
 
-		if (o->nr_files == 1 && exists_and_not_file(jobname))
+		if (o->nr_files == 1 && exists_and_not_regfile(jobname))
 			add_file(td, jobname, job_add_num, 0);
 		else {
 			for (i = 0; i < o->nr_files; i++)
diff --git a/ioengine.h b/ioengine.h
index 161acf5..0effade 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -163,8 +163,6 @@ struct ioengine_ops {
 	void (*io_u_free)(struct thread_data *, struct io_u *);
 	int option_struct_size;
 	struct fio_option *options;
-	void *data;
-	void *dlhandle;
 };
 
 enum fio_ioengine_flags {
diff --git a/ioengines.c b/ioengines.c
index e2e7280..918b50a 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -119,13 +119,13 @@ static struct ioengine_ops *dlopen_ioengine(struct thread_data *td,
 		return NULL;
 	}
 
-	ops->dlhandle = dlhandle;
+	td->io_ops_dlhandle = dlhandle;
 	return ops;
 }
 
 struct ioengine_ops *load_ioengine(struct thread_data *td, const char *name)
 {
-	struct ioengine_ops *ops, *ret;
+	struct ioengine_ops *ops;
 	char engine[16];
 
 	dprint(FD_IO, "load ioengine %s\n", name);
@@ -153,11 +153,7 @@ struct ioengine_ops *load_ioengine(struct thread_data *td, const char *name)
 	if (check_engine_ops(ops))
 		return NULL;
 
-	ret = malloc(sizeof(*ret));
-	memcpy(ret, ops, sizeof(*ret));
-	ret->data = NULL;
-
-	return ret;
+	return ops;
 }
 
 /*
@@ -173,10 +169,9 @@ void free_ioengine(struct thread_data *td)
 		td->eo = NULL;
 	}
 
-	if (td->io_ops->dlhandle)
-		dlclose(td->io_ops->dlhandle);
+	if (td->io_ops_dlhandle)
+		dlclose(td->io_ops_dlhandle);
 
-	free(td->io_ops);
 	td->io_ops = NULL;
 }
 
@@ -186,7 +181,7 @@ void close_ioengine(struct thread_data *td)
 
 	if (td->io_ops->cleanup) {
 		td->io_ops->cleanup(td);
-		td->io_ops->data = NULL;
+		td->io_ops_data = NULL;
 	}
 
 	free_ioengine(td);
diff --git a/libfio.c b/libfio.c
index 55762d7..fb7d35a 100644
--- a/libfio.c
+++ b/libfio.c
@@ -47,6 +47,7 @@ unsigned long arch_flags = 0;
 uintptr_t page_mask = 0;
 uintptr_t page_size = 0;
 
+/* see os/os.h */
 static const char *fio_os_strings[os_nr] = {
 	"Invalid",
 	"Linux",
@@ -62,6 +63,7 @@ static const char *fio_os_strings[os_nr] = {
 	"DragonFly",
 };
 
+/* see arch/arch.h */
 static const char *fio_arch_strings[arch_nr] = {
 	"Invalid",
 	"x86-64",
@@ -75,6 +77,8 @@ static const char *fio_arch_strings[arch_nr] = {
 	"arm",
 	"sh",
 	"hppa",
+	"mips",
+	"aarch64",
 	"generic"
 };
 
@@ -274,14 +278,18 @@ int fio_running_or_pending_io_threads(void)
 {
 	struct thread_data *td;
 	int i;
+	int nr_io_threads = 0;
 
 	for_each_td(td, i) {
 		if (td->flags & TD_F_NOIO)
 			continue;
+		nr_io_threads++;
 		if (td->runstate < TD_EXITED)
 			return 1;
 	}
 
+	if (!nr_io_threads)
+		return -1; /* we only had cpuio threads to begin with */
 	return 0;
 }
 
diff --git a/options.c b/options.c
index 4461643..4c56dbe 100644
--- a/options.c
+++ b/options.c
@@ -258,7 +258,7 @@ static int str2error(char *str)
 			    "EINVAL", "ENFILE", "EMFILE", "ENOTTY",
 			    "ETXTBSY","EFBIG", "ENOSPC", "ESPIPE",
 			    "EROFS","EMLINK", "EPIPE", "EDOM", "ERANGE" };
-	int i = 0, num = sizeof(err) / sizeof(void *);
+	int i = 0, num = sizeof(err) / sizeof(char *);
 
 	while (i < num) {
 		if (!strcmp(err[i], str))
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index 187330b..c799817 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -70,8 +70,7 @@ typedef cpumask_t os_cpu_mask_t;
 
 /*
  * Define USCHED_GET_CPUMASK as the macro didn't exist until release 4.5.
- * usched_set(2) returns EINVAL if the kernel doesn't support it, though
- * fio_getaffinity() returns void.
+ * usched_set(2) returns EINVAL if the kernel doesn't support it.
  *
  * Also note usched_set(2) works only for the current thread regardless of
  * the command type. It doesn't work against another thread regardless of
@@ -82,6 +81,9 @@ typedef cpumask_t os_cpu_mask_t;
 #define USCHED_GET_CPUMASK	5
 #endif
 
+/* No CPU_COUNT(), but use the default function defined in os/os.h */
+#define fio_cpu_count(mask)             CPU_COUNT((mask))
+
 static inline int fio_cpuset_init(os_cpu_mask_t *mask)
 {
 	CPUMASK_ASSZERO(*mask);
@@ -111,17 +113,6 @@ static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
 	return 0;
 }
 
-static inline int fio_cpu_count(os_cpu_mask_t *mask)
-{
-	int i, n = 0;
-
-	for (i = 0; i < FIO_MAX_CPUS; i++)
-		if (CPUMASK_TESTBIT(*mask, i))
-			n++;
-
-	return n;
-}
-
 static inline int fio_setaffinity(int pid, os_cpu_mask_t mask)
 {
 	int i, firstcall = 1;
@@ -145,12 +136,15 @@ static inline int fio_setaffinity(int pid, os_cpu_mask_t mask)
 	return 0;
 }
 
-static inline void fio_getaffinity(int pid, os_cpu_mask_t *mask)
+static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 {
 	/* 0 for the current thread, see BUGS in usched_set(2) */
 	pid = 0;
 
-	usched_set(pid, USCHED_GET_CPUMASK, mask, sizeof(*mask));
+	if (usched_set(pid, USCHED_GET_CPUMASK, mask, sizeof(*mask)))
+		return -1;
+
+	return 0;
 }
 
 /* fio code is Linux based, so rename macros to Linux style */
diff --git a/os/os-linux-syscall.h b/os/os-linux-syscall.h
index 2de02f1..c399b2f 100644
--- a/os/os-linux-syscall.h
+++ b/os/os-linux-syscall.h
@@ -3,7 +3,7 @@
 
 #include "../arch/arch.h"
 
-/* Linux syscalls for i386 */
+/* Linux syscalls for x86 */
 #if defined(ARCH_X86_H)
 #ifndef __NR_ioprio_set
 #define __NR_ioprio_set		289
diff --git a/os/os-windows.h b/os/os-windows.h
index d049531..616ad43 100644
--- a/os/os-windows.h
+++ b/os/os-windows.h
@@ -193,7 +193,7 @@ static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
 	return (bSuccess)? 0 : -1;
 }
 
-static inline void fio_getaffinity(int pid, os_cpu_mask_t *mask)
+static inline int fio_getaffinity(int pid, os_cpu_mask_t *mask)
 {
 	os_cpu_mask_t systemMask;
 
@@ -204,7 +204,10 @@ static inline void fio_getaffinity(int pid, os_cpu_mask_t *mask)
 		CloseHandle(h);
 	} else {
 		log_err("fio_getaffinity failed: failed to get handle for pid %d\n", pid);
+		return -1;
 	}
+
+	return 0;
 }
 
 static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
diff --git a/os/os.h b/os/os.h
index 9877383..4f267c2 100644
--- a/os/os.h
+++ b/os/os.h
@@ -84,7 +84,6 @@ typedef struct aiocb os_aiocb_t;
 #endif
 
 #ifndef FIO_HAVE_CPU_AFFINITY
-#define fio_getaffinity(pid, mask)	do { } while (0)
 #define fio_cpu_clear(mask, cpu)	do { } while (0)
 typedef unsigned long os_cpu_mask_t;
 
@@ -93,6 +92,11 @@ static inline int fio_setaffinity(int pid, os_cpu_mask_t cpumask)
 	return 0;
 }
 
+static inline int fio_getaffinity(int pid, os_cpu_mask_t *cpumask)
+{
+	return -1;
+}
+
 static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
 {
 	return -1;
diff --git a/parse.c b/parse.c
index bb16bc1..086f786 100644
--- a/parse.c
+++ b/parse.c
@@ -109,6 +109,7 @@ static void show_option_help(struct fio_option *o, int is_err)
 		"list of floating point values separated by ':' (opt=5.9:7.8)",
 		"no argument (opt)",
 		"deprecated",
+		"unsupported",
 	};
 	size_t (*logger)(const char *format, ...);
 


* Recent changes (master)
@ 2016-07-27 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-27 12:00 UTC
  To: fio

The following changes since commit 79baf7f48a6e680e1e746e150b60c165542fdf6c:

  Fio 2.13 (2016-07-22 13:43:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fa3cdbb7a46e42186e8fa62d33b82b92c7c0e310:

  Add sample job file showing how to read backwards (2016-07-26 14:50:02 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Revert "Revert "fio: Simplify forking of processes""
      Add sample job file showing how to read backwards

 backend.c                   | 46 +++++++--------------------------------------
 examples/backwards-read.fio |  7 +++++++
 2 files changed, 14 insertions(+), 39 deletions(-)
 create mode 100644 examples/backwards-read.fio

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 58c77cb..ad2d7da 100644
--- a/backend.c
+++ b/backend.c
@@ -1808,39 +1808,6 @@ err:
 	return (void *) (uintptr_t) td->error;
 }
 
-
-/*
- * We cannot pass the td data into a forked process, so attach the td and
- * pass it to the thread worker.
- */
-static int fork_main(struct sk_out *sk_out, int shmid, int offset)
-{
-	struct fork_data *fd;
-	void *data, *ret;
-
-#if !defined(__hpux) && !defined(CONFIG_NO_SHM)
-	data = shmat(shmid, NULL, 0);
-	if (data == (void *) -1) {
-		int __err = errno;
-
-		perror("shmat");
-		return __err;
-	}
-#else
-	/*
-	 * HP-UX inherits shm mappings?
-	 */
-	data = threads;
-#endif
-
-	fd = calloc(1, sizeof(*fd));
-	fd->td = data + offset * sizeof(struct thread_data);
-	fd->sk_out = sk_out;
-	ret = thread_main(fd);
-	shmdt(data);
-	return (int) (uintptr_t) ret;
-}
-
 static void dump_td_info(struct thread_data *td)
 {
 	log_err("fio: job '%s' (state=%d) hasn't exited in %lu seconds, it "
@@ -2178,6 +2145,7 @@ reap:
 		struct thread_data *map[REAL_MAX_JOBS];
 		struct timeval this_start;
 		int this_jobs = 0, left;
+		struct fork_data *fd;
 
 		/*
 		 * create threads (TD_NOT_CREATED -> TD_CREATED)
@@ -2227,14 +2195,13 @@ reap:
 			map[this_jobs++] = td;
 			nr_started++;
 
+			fd = calloc(1, sizeof(*fd));
+			fd->td = td;
+			fd->sk_out = sk_out;
+
 			if (td->o.use_thread) {
-				struct fork_data *fd;
 				int ret;
 
-				fd = calloc(1, sizeof(*fd));
-				fd->td = td;
-				fd->sk_out = sk_out;
-
 				dprint(FD_PROCESS, "will pthread_create\n");
 				ret = pthread_create(&td->thread, NULL,
 							thread_main, fd);
@@ -2254,8 +2221,9 @@ reap:
 				dprint(FD_PROCESS, "will fork\n");
 				pid = fork();
 				if (!pid) {
-					int ret = fork_main(sk_out, shm_id, i);
+					int ret;
 
+					ret = (int)(uintptr_t)thread_main(fd);
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
diff --git a/examples/backwards-read.fio b/examples/backwards-read.fio
new file mode 100644
index 0000000..ddd47e4
--- /dev/null
+++ b/examples/backwards-read.fio
@@ -0,0 +1,7 @@
+# Demonstrates how to read backwards in a file.
+
+[backwards-read]
+bs=4k
+# seek -8k back for every IO
+rw=read:-8k
+filename=128m


* Recent changes (master)
@ 2016-07-23 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-23 12:00 UTC
  To: fio

The following changes since commit c16556af62cd1cd1ae31b6ee8706efc43c137f77:

  drifting in output of interval-averaged values was eventually causing IOP samples to be dropped. (2016-07-20 16:21:55 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 79baf7f48a6e680e1e746e150b60c165542fdf6c:

  Fio 2.13 (2016-07-22 13:43:56 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      log: fix missing entries
      log: fix averaged latency logging
      Fio 2.13

YukiKita (1):
      Fix "exitall_on_error" option     "exitall_on_error" option should be enabled without any argument.

 FIO-VERSION-GEN        |  2 +-
 iolog.h                |  7 +++++++
 options.c              |  4 ++--
 os/windows/install.wxs |  2 +-
 stat.c                 | 43 ++++++++++++++++++++++++++++++-------------
 5 files changed, 41 insertions(+), 17 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 04802dd..7065a57 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.12
+DEF_VER=fio-2.13
 
 LF='
 '
diff --git a/iolog.h b/iolog.h
index a58e3f0..0438fa7 100644
--- a/iolog.h
+++ b/iolog.h
@@ -230,6 +230,13 @@ static inline bool per_unit_log(struct io_log *log)
 	return log && !log->avg_msec;
 }
 
+static inline bool inline_log(struct io_log *log)
+{
+	return log->log_type == IO_LOG_TYPE_LAT ||
+		log->log_type == IO_LOG_TYPE_CLAT ||
+		log->log_type == IO_LOG_TYPE_SLAT;
+}
+
 extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
 extern void flush_log(struct io_log *, bool);
diff --git a/options.c b/options.c
index 4723e41..4461643 100644
--- a/options.c
+++ b/options.c
@@ -3444,8 +3444,8 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	{
 		.name	= "exitall_on_error",
 		.lname	= "Exit-all on terminate in error",
-		.type	= FIO_OPT_BOOL,
-		.off1	= td_var_offset(unlink),
+		.type	= FIO_OPT_STR_SET,
+		.off1	= td_var_offset(exitall_error),
 		.help	= "Terminate all jobs when one exits in error",
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_PROCESS,
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 1e8022d..f8d3773 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.12">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.13">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/stat.c b/stat.c
index 08a402a..7a35117 100644
--- a/stat.c
+++ b/stat.c
@@ -18,6 +18,8 @@
 #include "helper_thread.h"
 #include "smalloc.h"
 
+#define LOG_MSEC_SLACK	10
+
 struct fio_mutex *stat_mutex;
 
 void clear_rusage_stat(struct thread_data *td)
@@ -2107,14 +2109,14 @@ static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 		__add_stat_to_log(iolog, ddir, elapsed, log_max);
 }
 
-static void add_log_sample(struct thread_data *td, struct io_log *iolog,
+static long add_log_sample(struct thread_data *td, struct io_log *iolog,
 			   unsigned long val, enum fio_ddir ddir,
 			   unsigned int bs, uint64_t offset)
 {
 	unsigned long elapsed, this_window;
 
 	if (!ddir_rw(ddir))
-		return;
+		return 0;
 
 	elapsed = mtime_since_now(&td->epoch);
 
@@ -2123,7 +2125,7 @@ static void add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 */
 	if (!iolog->avg_msec) {
 		__add_log_sample(iolog, val, ddir, bs, elapsed, offset);
-		return;
+		return 0;
 	}
 
 	/*
@@ -2137,12 +2139,17 @@ static void add_log_sample(struct thread_data *td, struct io_log *iolog,
 	 * need to do.
 	 */
 	this_window = elapsed - iolog->avg_last;
-	if (this_window < iolog->avg_msec)
-		return;
+	if (this_window < iolog->avg_msec) {
+		int diff = iolog->avg_msec - this_window;
+
+		if (inline_log(iolog) || diff > LOG_MSEC_SLACK)
+			return diff;
+	}
 
 	_add_stat_to_log(iolog, elapsed, td->o.log_max != 0);
 
 	iolog->avg_last = elapsed - (this_window - iolog->avg_msec);
+	return iolog->avg_msec;
 }
 
 void finalize_logs(struct thread_data *td, bool unit_logs)
@@ -2264,10 +2271,13 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	struct thread_stat *ts = &td->ts;
 	unsigned long spent, rate;
 	enum fio_ddir ddir;
+	unsigned int next, next_log;
+
+	next_log = td->o.bw_avg_time;
 
 	spent = mtime_since(&td->bw_sample_time, t);
 	if (spent < td->o.bw_avg_time &&
-	    td->o.bw_avg_time - spent >= 10)
+	    td->o.bw_avg_time - spent >= LOG_MSEC_SLACK)
 		return td->o.bw_avg_time - spent;
 
 	td_io_u_lock(td);
@@ -2295,7 +2305,8 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			add_log_sample(td, td->bw_log, rate, ddir, bs, 0);
+			next = add_log_sample(td, td->bw_log, rate, ddir, bs, 0);
+			next_log = min(next_log, next);
 		}
 
 		td->stat_io_bytes[ddir] = td->this_io_bytes[ddir];
@@ -2306,9 +2317,10 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	td_io_u_unlock(td);
 
 	if (spent <= td->o.bw_avg_time)
-		return td->o.bw_avg_time;
+		return min(next_log, td->o.bw_avg_time);
 
-	return td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
+	next = td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
+	return min(next, next_log);
 }
 
 void add_iops_sample(struct thread_data *td, struct io_u *io_u,
@@ -2332,10 +2344,13 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 	struct thread_stat *ts = &td->ts;
 	unsigned long spent, iops;
 	enum fio_ddir ddir;
+	unsigned int next, next_log;
+
+	next_log = td->o.iops_avg_time;
 
 	spent = mtime_since(&td->iops_sample_time, t);
 	if (spent < td->o.iops_avg_time &&
-	    td->o.iops_avg_time - spent >= 10)
+	    td->o.iops_avg_time - spent >= LOG_MSEC_SLACK)
 		return td->o.iops_avg_time - spent;
 
 	td_io_u_lock(td);
@@ -2363,7 +2378,8 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
 				bs = td->o.min_bs[ddir];
 
-			add_log_sample(td, td->iops_log, iops, ddir, bs, 0);
+			next = add_log_sample(td, td->iops_log, iops, ddir, bs, 0);
+			next_log = min(next_log, next);
 		}
 
 		td->stat_io_blocks[ddir] = td->this_io_blocks[ddir];
@@ -2374,9 +2390,10 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 	td_io_u_unlock(td);
 
 	if (spent <= td->o.iops_avg_time)
-		return td->o.iops_avg_time;
+		return min(next_log, td->o.iops_avg_time);
 
-	return td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
+	next = td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
+	return min(next, next_log);
 }
 
 /*


* Recent changes (master)
@ 2016-07-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-21 12:00 UTC
  To: fio

The following changes since commit eec97b935abb714bf498c96a8f18ec1104c75fd4:

  Add missing header inclusion for Android from 1c764dbe (2016-07-19 16:20:02 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c16556af62cd1cd1ae31b6ee8706efc43c137f77:

  drifting in output of interval-averaged values was eventually causing IOP samples to be dropped. (2016-07-20 16:21:55 -0400)

----------------------------------------------------------------
Karl Cronburg (1):
      drifting in output of interval-averaged values was eventually causing IOP samples to be dropped.

 stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index 96cd764..08a402a 100644
--- a/stat.c
+++ b/stat.c
@@ -2142,7 +2142,7 @@ static void add_log_sample(struct thread_data *td, struct io_log *iolog,
 
 	_add_stat_to_log(iolog, elapsed, td->o.log_max != 0);
 
-	iolog->avg_last = elapsed;
+	iolog->avg_last = elapsed - (this_window - iolog->avg_msec);
 }
 
 void finalize_logs(struct thread_data *td, bool unit_logs)


* Recent changes (master)
@ 2016-07-20 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-20 12:00 UTC
  To: fio

The following changes since commit e4d03925dca33524be25d883febf9d3b83155a9f:

  plot: indicate that the pattern is a glob (2016-07-17 09:03:48 +0100)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to eec97b935abb714bf498c96a8f18ec1104c75fd4:

  Add missing header inclusion for Android from 1c764dbe (2016-07-19 16:20:02 -0600)

----------------------------------------------------------------
Tomohiro Kusumi (8):
      Fix wrong cpuio option name in documentation
      Add CPU affinity support for DragonFlyBSD
      Make I/O priority option generic for non-Linux environment [1/2]
      Make I/O priority option generic for non-Linux environment [2/2]
      Add ioprio_set() support for DragonFlyBSD
      Change ARCH_X86_64_h to ARCH_X86_64_H
      Add os/os-linux-syscall.h to separate syscall NR from arch headers
      Add missing header inclusion for Android from 1c764dbe

 HOWTO                 |   5 +-
 arch/arch-aarch64.h   |   5 -
 arch/arch-alpha.h     |  15 ---
 arch/arch-arm.h       |  22 ----
 arch/arch-hppa.h      |  15 ---
 arch/arch-ia64.h      |  22 ----
 arch/arch-mips.h      |  15 ---
 arch/arch-ppc.h       |  15 ---
 arch/arch-s390.h      |  22 ----
 arch/arch-sh.h        |  15 ---
 arch/arch-sparc.h     |  22 ----
 arch/arch-sparc64.h   |  22 ----
 arch/arch-x86.h       |  22 ----
 arch/arch-x86_64.h    |  34 +------
 fio.1                 |   2 +-
 options.c             |  28 +++--
 os/os-android.h       |   8 ++
 os/os-dragonfly.h     | 138 ++++++++++++++++++++++++-
 os/os-linux-syscall.h | 277 ++++++++++++++++++++++++++++++++++++++++++++++++++
 os/os-linux.h         |   8 ++
 20 files changed, 452 insertions(+), 260 deletions(-)
 create mode 100644 os/os-linux-syscall.h

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index a50f93e..ab25cb2 100644
--- a/HOWTO
+++ b/HOWTO
@@ -749,7 +749,7 @@ ioengine=str	Defines how the job issues io to the file. The following
 
 			cpuio	Doesn't transfer any data, but burns CPU
 				cycles according to the cpuload= and
-				cpucycle= options. Setting cpuload=85
+				cpuchunks= options. Setting cpuload=85
 				will cause that job to do nothing but burn
 				85% of the CPU. In case of SMP machines,
 				use numjobs=<no_of_cpu> to get desired CPU
@@ -1064,7 +1064,8 @@ nice=int	Run the job with the given nice value. See man nice(2).
 
 prio=int	Set the io priority value of this job. Linux limits us to
 		a positive value between 0 and 7, with 0 being the highest.
-		See man ionice(1).
+		See man ionice(1). Refer to an appropriate manpage for
+		other operating systems since meaning of priority may differ.
 
 prioclass=int	Set the io priority class. See man ionice(1).
 
diff --git a/arch/arch-aarch64.h b/arch/arch-aarch64.h
index a6cfaf2..2a86cc5 100644
--- a/arch/arch-aarch64.h
+++ b/arch/arch-aarch64.h
@@ -8,11 +8,6 @@
 
 #define FIO_ARCH	(arch_aarch64)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		30
-#define __NR_ioprio_get		31
-#endif
-
 #define nop		do { __asm__ __volatile__ ("yield"); } while (0)
 #define read_barrier()	do { __sync_synchronize(); } while (0)
 #define write_barrier()	do { __sync_synchronize(); } while (0)
diff --git a/arch/arch-alpha.h b/arch/arch-alpha.h
index c0f784f..9318e15 100644
--- a/arch/arch-alpha.h
+++ b/arch/arch-alpha.h
@@ -3,21 +3,6 @@
 
 #define FIO_ARCH	(arch_alpha)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		442
-#define __NR_ioprio_get		443
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		413
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		468
-#define __NR_sys_tee		470
-#define __NR_sys_vmsplice	471
-#endif
-
 #define nop			do { } while (0)
 #define read_barrier()		__asm__ __volatile__("mb": : :"memory")
 #define write_barrier()		__asm__ __volatile__("wmb": : :"memory")
diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index 57d9488..31671fd 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -3,28 +3,6 @@
 
 #define FIO_ARCH	(arch_arm)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		314
-#define __NR_ioprio_get		315
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		270
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		340
-#define __NR_sys_tee		342
-#define __NR_sys_vmsplice	343
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		392
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		393
-#endif
-
 #if defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__) \
 	|| defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5T__) || defined (__ARM_ARCH_5E__)\
 	|| defined (__ARM_ARCH_5TE__) || defined (__ARM_ARCH_5TEJ__) \
diff --git a/arch/arch-hppa.h b/arch/arch-hppa.h
index c1c079e..eb4fc33 100644
--- a/arch/arch-hppa.h
+++ b/arch/arch-hppa.h
@@ -3,21 +3,6 @@
 
 #define FIO_ARCH	(arch_hppa)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		267
-#define __NR_ioprio_get		268
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		236
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		291
-#define __NR_sys_tee		293
-#define __NR_sys_vmsplice	294
-#endif
-
 #define nop	do { } while (0)
 
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
diff --git a/arch/arch-ia64.h b/arch/arch-ia64.h
index 7cdeefc..53c049f 100644
--- a/arch/arch-ia64.h
+++ b/arch/arch-ia64.h
@@ -3,28 +3,6 @@
 
 #define FIO_ARCH	(arch_ia64)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		1274
-#define __NR_ioprio_get		1275
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		1234
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		1297
-#define __NR_sys_tee		1301
-#define __NR_sys_vmsplice	1302
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		1348
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		1349
-#endif
-
 #define nop		asm volatile ("hint @pause" ::: "memory");
 #define read_barrier()	asm volatile ("mf" ::: "memory")
 #define write_barrier()	asm volatile ("mf" ::: "memory")
diff --git a/arch/arch-mips.h b/arch/arch-mips.h
index 0b781d1..6f157fb 100644
--- a/arch/arch-mips.h
+++ b/arch/arch-mips.h
@@ -3,21 +3,6 @@
 
 #define FIO_ARCH	(arch_mips)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		314
-#define __NR_ioprio_get		315
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		215
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		263
-#define __NR_sys_tee		265
-#define __NR_sys_vmsplice	266
-#endif
-
 #define read_barrier()		__asm__ __volatile__("": : :"memory")
 #define write_barrier()		__asm__ __volatile__("": : :"memory")
 #define nop			__asm__ __volatile__("": : :"memory")
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 161c39c..4a8aa97 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -8,21 +8,6 @@
 
 #define FIO_ARCH	(arch_ppc)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		273
-#define __NR_ioprio_get		274
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		233
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		283
-#define __NR_sys_tee		284
-#define __NR_sys_vmsplice	285
-#endif
-
 #define nop	do { } while (0)
 
 #ifdef __powerpc64__
diff --git a/arch/arch-s390.h b/arch/arch-s390.h
index 71beb7d..2e84bf8 100644
--- a/arch/arch-s390.h
+++ b/arch/arch-s390.h
@@ -3,28 +3,6 @@
 
 #define FIO_ARCH	(arch_s390)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		282
-#define __NR_ioprio_get		283
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		253
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		306
-#define __NR_sys_tee		308
-#define __NR_sys_vmsplice	309
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		376
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		377
-#endif
-
 #define nop		asm volatile("nop" : : : "memory")
 #define read_barrier()	asm volatile("bcr 15,0" : : : "memory")
 #define write_barrier()	asm volatile("bcr 15,0" : : : "memory")
diff --git a/arch/arch-sh.h b/arch/arch-sh.h
index 9acbbbe..58ff226 100644
--- a/arch/arch-sh.h
+++ b/arch/arch-sh.h
@@ -5,21 +5,6 @@
 
 #define FIO_ARCH	(arch_sh)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set	288
-#define __NR_ioprio_get	289
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64	250
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		313
-#define __NR_sys_tee		315
-#define __NR_sys_vmsplice	316
-#endif
-
 #define nop             __asm__ __volatile__ ("nop": : :"memory")
 
 #define mb()								\
diff --git a/arch/arch-sparc.h b/arch/arch-sparc.h
index d0df883..f82a1f2 100644
--- a/arch/arch-sparc.h
+++ b/arch/arch-sparc.h
@@ -3,28 +3,6 @@
 
 #define FIO_ARCH	(arch_sparc)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		196
-#define __NR_ioprio_get		218
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		209
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		232
-#define __NR_sys_tee		280
-#define __NR_sys_vmsplice	25
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		358
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		359
-#endif
-
 #define nop	do { } while (0)
 
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
diff --git a/arch/arch-sparc64.h b/arch/arch-sparc64.h
index 5c4e649..80c697b 100644
--- a/arch/arch-sparc64.h
+++ b/arch/arch-sparc64.h
@@ -3,28 +3,6 @@
 
 #define FIO_ARCH	(arch_sparc64)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		196
-#define __NR_ioprio_get		218
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		209
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		232
-#define __NR_sys_tee		280
-#define __NR_sys_vmsplice	25
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		358
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		359
-#endif
-
 #define nop	do { } while (0)
 
 #define membar_safe(type) \
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index 9471a89..d3b8985 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -14,28 +14,6 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 
 #define FIO_ARCH	(arch_i386)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		289
-#define __NR_ioprio_get		290
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		250
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		313
-#define __NR_sys_tee		315
-#define __NR_sys_vmsplice	316
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		378
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		379
-#endif
-
 #define	FIO_HUGE_PAGE		4194304
 
 #define nop		__asm__ __volatile__("rep;nop": : :"memory")
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 21da412..e686d10 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -1,5 +1,5 @@
-#ifndef ARCH_X86_64_h
-#define ARCH_X86_64_h
+#ifndef ARCH_X86_64_H
+#define ARCH_X86_64_H
 
 static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 			    unsigned int *ecx, unsigned int *edx)
@@ -14,36 +14,6 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 
 #define FIO_ARCH	(arch_x86_64)
 
-#ifndef __NR_ioprio_set
-#define __NR_ioprio_set		251
-#define __NR_ioprio_get		252
-#endif
-
-#ifndef __NR_fadvise64
-#define __NR_fadvise64		221
-#endif
-
-#ifndef __NR_sys_splice
-#define __NR_sys_splice		275
-#define __NR_sys_tee		276
-#define __NR_sys_vmsplice	278
-#endif
-
-#ifndef __NR_shmget
-#define __NR_shmget		 29
-#define __NR_shmat		 30
-#define __NR_shmctl		 31
-#define __NR_shmdt		 67
-#endif
-
-#ifndef __NR_preadv2
-#define __NR_preadv2		327
-#endif
-#ifndef __NR_pwritev2
-#define __NR_pwritev2		328
-#endif
-
-
 #define	FIO_HUGE_PAGE		2097152
 
 #define nop		__asm__ __volatile__("rep;nop": : :"memory")
diff --git a/fio.1 b/fio.1
index 353f8ff..e89c3d1 100644
--- a/fio.1
+++ b/fio.1
@@ -654,7 +654,7 @@ and send/receive. This ioengine defines engine specific options.
 .TP
 .B cpuio
 Doesn't transfer any data, but burns CPU cycles according to \fBcpuload\fR and
-\fBcpucycles\fR parameters.
+\fBcpuchunks\fR parameters.
 .TP
 .B guasi
 The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface
diff --git a/options.c b/options.c
index 5199823..4723e41 100644
--- a/options.c
+++ b/options.c
@@ -3012,36 +3012,42 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.type	= FIO_OPT_INT,
 		.off1	= td_var_offset(ioprio),
 		.help	= "Set job IO priority value",
-		.minval	= 0,
-		.maxval	= 7,
+		.minval	= IOPRIO_MIN_PRIO,
+		.maxval	= IOPRIO_MAX_PRIO,
 		.interval = 1,
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
 	},
+#else
+	{
+		.name	= "prio",
+		.lname	= "I/O nice priority",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support IO priorities",
+	},
+#endif
+#ifdef FIO_HAVE_IOPRIO_CLASS
+#ifndef FIO_HAVE_IOPRIO
+#error "FIO_HAVE_IOPRIO_CLASS requires FIO_HAVE_IOPRIO"
+#endif
 	{
 		.name	= "prioclass",
 		.lname	= "I/O nice priority class",
 		.type	= FIO_OPT_INT,
 		.off1	= td_var_offset(ioprio_class),
 		.help	= "Set job IO priority class",
-		.minval	= 0,
-		.maxval	= 3,
+		.minval	= IOPRIO_MIN_PRIO_CLASS,
+		.maxval	= IOPRIO_MAX_PRIO_CLASS,
 		.interval = 1,
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
 	},
 #else
 	{
-		.name	= "prio",
-		.lname	= "I/O nice priority",
-		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support IO priorities",
-	},
-	{
 		.name	= "prioclass",
 		.lname	= "I/O nice priority class",
 		.type	= FIO_OPT_UNSUPPORTED,
-		.help	= "Your platform does not support IO priorities",
+		.help	= "Your platform does not support IO priority classes",
 	},
 #endif
 	{
diff --git a/os/os-android.h b/os/os-android.h
index 1699539..cdae703 100644
--- a/os/os-android.h
+++ b/os/os-android.h
@@ -16,12 +16,14 @@
 #include <linux/major.h>
 #include <asm/byteorder.h>
 
+#include "./os-linux-syscall.h"
 #include "binject.h"
 #include "../file.h"
 
 #define FIO_HAVE_DISK_UTIL
 #define FIO_HAVE_IOSCHED_SWITCH
 #define FIO_HAVE_IOPRIO
+#define FIO_HAVE_IOPRIO_CLASS
 #define FIO_HAVE_ODIRECT
 #define FIO_HAVE_HUGETLB
 #define FIO_HAVE_BLKTRACE
@@ -140,6 +142,12 @@ enum {
 #define IOPRIO_BITS		16
 #define IOPRIO_CLASS_SHIFT	13
 
+#define IOPRIO_MIN_PRIO		0	/* highest priority */
+#define IOPRIO_MAX_PRIO		7	/* lowest priority */
+
+#define IOPRIO_MIN_PRIO_CLASS	0
+#define IOPRIO_MAX_PRIO_CLASS	3
+
 static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
 {
 	/*
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index d776d1f..187330b 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -10,6 +10,8 @@
 #include <sys/statvfs.h>
 #include <sys/diskslice.h>
 #include <sys/ioctl_compat.h>
+#include <sys/usched.h>
+#include <sys/resource.h>
 
 #include "../file.h"
 
@@ -20,8 +22,8 @@
 #define FIO_HAVE_TRIM
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_GETTID
-
-#undef	FIO_HAVE_CPU_AFFINITY	/* XXX notyet */
+#define FIO_HAVE_CPU_AFFINITY
+#define FIO_HAVE_IOPRIO
 
 #define OS_MAP_ANON		MAP_ANON
 
@@ -33,7 +35,139 @@
 #define fio_swap32(x)	bswap32(x)
 #define fio_swap64(x)	bswap64(x)
 
+/* This is supposed to equal (sizeof(cpumask_t)*8) */
+#define FIO_MAX_CPUS	SMP_MAXCPU
+
 typedef off_t off64_t;
+typedef cpumask_t os_cpu_mask_t;
+
+/*
+ * These macros are copied from sys/cpu/x86_64/include/types.h.
+ * It's okay to copy from arch dependent header because x86_64 is the only
+ * supported arch, and no other arch is going to be supported any time soon.
+ *
+ * These are supposed to be able to be included from userspace by defining
+ * _KERNEL_STRUCTURES, however this scheme is badly broken that enabling it
+ * causes compile-time conflicts with other headers. Although the current
+ * upstream code no longer requires _KERNEL_STRUCTURES, they should be kept
+ * here for compatibility with older versions.
+ */
+#ifndef CPUMASK_SIMPLE
+#define CPUMASK_SIMPLE(cpu)		((uint64_t)1 << (cpu))
+#define CPUMASK_TESTBIT(val, i)		((val).ary[((i) >> 6) & 3] & \
+					 CPUMASK_SIMPLE((i) & 63))
+#define CPUMASK_ORBIT(mask, i)		((mask).ary[((i) >> 6) & 3] |= \
+					 CPUMASK_SIMPLE((i) & 63))
+#define CPUMASK_NANDBIT(mask, i)	((mask).ary[((i) >> 6) & 3] &= \
+					 ~CPUMASK_SIMPLE((i) & 63))
+#define CPUMASK_ASSZERO(mask)		do {				\
+					(mask).ary[0] = 0;		\
+					(mask).ary[1] = 0;		\
+					(mask).ary[2] = 0;		\
+					(mask).ary[3] = 0;		\
+					} while(0)
+#endif
+
+/*
+ * Define USCHED_GET_CPUMASK as the macro didn't exist until release 4.5.
+ * usched_set(2) returns EINVAL if the kernel doesn't support it, though
+ * fio_getaffinity() returns void.
+ *
+ * Also note usched_set(2) works only for the current thread regardless of
+ * the command type. It doesn't work against another thread regardless of
+ * a caller's privilege. A caller would generally specify 0 for pid for the
+ * current thread though that's the only choice. See BUGS in usched_set(2).
+ */
+#ifndef USCHED_GET_CPUMASK
+#define USCHED_GET_CPUMASK	5
+#endif
+
+static inline int fio_cpuset_init(os_cpu_mask_t *mask)
+{
+	CPUMASK_ASSZERO(*mask);
+	return 0;
+}
+
+static inline int fio_cpuset_exit(os_cpu_mask_t *mask)
+{
+	return 0;
+}
+
+static inline void fio_cpu_clear(os_cpu_mask_t *mask, int cpu)
+{
+	CPUMASK_NANDBIT(*mask, cpu);
+}
+
+static inline void fio_cpu_set(os_cpu_mask_t *mask, int cpu)
+{
+	CPUMASK_ORBIT(*mask, cpu);
+}
+
+static inline int fio_cpu_isset(os_cpu_mask_t *mask, int cpu)
+{
+	if (CPUMASK_TESTBIT(*mask, cpu))
+		return 1;
+
+	return 0;
+}
+
+static inline int fio_cpu_count(os_cpu_mask_t *mask)
+{
+	int i, n = 0;
+
+	for (i = 0; i < FIO_MAX_CPUS; i++)
+		if (CPUMASK_TESTBIT(*mask, i))
+			n++;
+
+	return n;
+}
+
+static inline int fio_setaffinity(int pid, os_cpu_mask_t mask)
+{
+	int i, firstcall = 1;
+
+	/* 0 for the current thread, see BUGS in usched_set(2) */
+	pid = 0;
+
+	for (i = 0; i < FIO_MAX_CPUS; i++) {
+		if (!CPUMASK_TESTBIT(mask, i))
+			continue;
+		if (firstcall) {
+			if (usched_set(pid, USCHED_SET_CPU, &i, sizeof(int)))
+				return -1;
+			firstcall = 0;
+		} else {
+			if (usched_set(pid, USCHED_ADD_CPU, &i, sizeof(int)))
+				return -1;
+		}
+	}
+
+	return 0;
+}
+
+static inline void fio_getaffinity(int pid, os_cpu_mask_t *mask)
+{
+	/* 0 for the current thread, see BUGS in usched_set(2) */
+	pid = 0;
+
+	usched_set(pid, USCHED_GET_CPUMASK, mask, sizeof(*mask));
+}
+
+/* fio code is Linux based, so rename macros to Linux style */
+#define IOPRIO_WHO_PROCESS	PRIO_PROCESS
+#define IOPRIO_WHO_PGRP		PRIO_PGRP
+#define IOPRIO_WHO_USER		PRIO_USER
+
+#define IOPRIO_MIN_PRIO		1	/* lowest priority */
+#define IOPRIO_MAX_PRIO		10	/* highest priority */
+
+/*
+ * Prototypes declared in sys/sys/resource.h are preventing from defining
+ * ioprio_set() with 4 arguments, so define fio's ioprio_set() as a macro.
+ * Note that there is no idea of class within ioprio_set(2) unlike Linux.
+ */
+#define ioprio_set(which, who, ioprio_class, ioprio)	\
+	ioprio_set(which, who, ioprio)
 
 static inline int blockdev_size(struct fio_file *f, unsigned long long *bytes)
 {
diff --git a/os/os-linux-syscall.h b/os/os-linux-syscall.h
new file mode 100644
index 0000000..2de02f1
--- /dev/null
+++ b/os/os-linux-syscall.h
@@ -0,0 +1,277 @@
+#ifndef FIO_OS_LINUX_SYSCALL_H
+#define FIO_OS_LINUX_SYSCALL_H
+
+#include "../arch/arch.h"
+
+/* Linux syscalls for i386 */
+#if defined(ARCH_X86_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		289
+#define __NR_ioprio_get		290
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		250
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		313
+#define __NR_sys_tee		315
+#define __NR_sys_vmsplice	316
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		378
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		379
+#endif
+
+/* Linux syscalls for x86_64 */
+#elif defined(ARCH_X86_64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		251
+#define __NR_ioprio_get		252
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		221
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		275
+#define __NR_sys_tee		276
+#define __NR_sys_vmsplice	278
+#endif
+
+#ifndef __NR_shmget
+#define __NR_shmget		 29
+#define __NR_shmat		 30
+#define __NR_shmctl		 31
+#define __NR_shmdt		 67
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		327
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		328
+#endif
+
+/* Linux syscalls for ppc */
+#elif defined(ARCH_PPC_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		273
+#define __NR_ioprio_get		274
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		233
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		283
+#define __NR_sys_tee		284
+#define __NR_sys_vmsplice	285
+#endif
+
+/* Linux syscalls for ia64 */
+#elif defined(ARCH_IA64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		1274
+#define __NR_ioprio_get		1275
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		1234
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		1297
+#define __NR_sys_tee		1301
+#define __NR_sys_vmsplice	1302
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		1348
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		1349
+#endif
+
+/* Linux syscalls for alpha */
+#elif defined(ARCH_ALPHA_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		442
+#define __NR_ioprio_get		443
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		413
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		468
+#define __NR_sys_tee		470
+#define __NR_sys_vmsplice	471
+#endif
+
+/* Linux syscalls for s390 */
+#elif defined(ARCH_S390_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		282
+#define __NR_ioprio_get		283
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		253
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		306
+#define __NR_sys_tee		308
+#define __NR_sys_vmsplice	309
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		376
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		377
+#endif
+
+/* Linux syscalls for sparc */
+#elif defined(ARCH_SPARC_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		196
+#define __NR_ioprio_get		218
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		209
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		232
+#define __NR_sys_tee		280
+#define __NR_sys_vmsplice	25
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		358
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		359
+#endif
+
+/* Linux syscalls for sparc64 */
+#elif defined(ARCH_SPARC64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		196
+#define __NR_ioprio_get		218
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		209
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		232
+#define __NR_sys_tee		280
+#define __NR_sys_vmsplice	25
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		358
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		359
+#endif
+
+/* Linux syscalls for arm */
+#elif defined(ARCH_ARM_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		314
+#define __NR_ioprio_get		315
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		270
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		340
+#define __NR_sys_tee		342
+#define __NR_sys_vmsplice	343
+#endif
+
+#ifndef __NR_preadv2
+#define __NR_preadv2		392
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		393
+#endif
+
+/* Linux syscalls for mips */
+#elif defined(ARCH_MIPS64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		314
+#define __NR_ioprio_get		315
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		215
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		263
+#define __NR_sys_tee		265
+#define __NR_sys_vmsplice	266
+#endif
+
+/* Linux syscalls for sh */
+#elif defined(ARCH_SH_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		288
+#define __NR_ioprio_get		289
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		250
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		313
+#define __NR_sys_tee		315
+#define __NR_sys_vmsplice	316
+#endif
+
+/* Linux syscalls for hppa */
+#elif defined(ARCH_HPPA_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		267
+#define __NR_ioprio_get		268
+#endif
+
+#ifndef __NR_fadvise64
+#define __NR_fadvise64		236
+#endif
+
+#ifndef __NR_sys_splice
+#define __NR_sys_splice		291
+#define __NR_sys_tee		293
+#define __NR_sys_vmsplice	294
+#endif
+
+/* Linux syscalls for aarch64 */
+#elif defined(ARCH_AARCH64_H)
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set		30
+#define __NR_ioprio_get		31
+#endif
+
+#else
+#warning "Unknown architecture"
+#endif
+
+#endif /* FIO_OS_LINUX_SYSCALL_H */
diff --git a/os/os-linux.h b/os/os-linux.h
index b36d33c..06235ab 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -18,6 +18,7 @@
 #include <linux/major.h>
 #include <byteswap.h>
 
+#include "./os-linux-syscall.h"
 #include "binject.h"
 #include "../file.h"
 
@@ -25,6 +26,7 @@
 #define FIO_HAVE_DISK_UTIL
 #define FIO_HAVE_SGIO
 #define FIO_HAVE_IOPRIO
+#define FIO_HAVE_IOPRIO_CLASS
 #define FIO_HAVE_IOSCHED_SWITCH
 #define FIO_HAVE_ODIRECT
 #define FIO_HAVE_HUGETLB
@@ -96,6 +98,12 @@ enum {
 #define IOPRIO_BITS		16
 #define IOPRIO_CLASS_SHIFT	13
 
+#define IOPRIO_MIN_PRIO		0	/* highest priority */
+#define IOPRIO_MAX_PRIO		7	/* lowest priority */
+
+#define IOPRIO_MIN_PRIO_CLASS	0
+#define IOPRIO_MAX_PRIO_CLASS	3
+
 static inline int ioprio_set(int which, int who, int ioprio_class, int ioprio)
 {
 	/*


* Recent changes (master)
@ 2016-07-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-19 12:00 UTC
  To: fio

The following changes since commit 45213f1b15f820e6791118b7200a1185e2af7d87:

  pthread: bump min stack size (2016-07-14 10:36:12 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e4d03925dca33524be25d883febf9d3b83155a9f:

  plot: indicate that the pattern is a glob (2016-07-17 09:03:48 +0100)

----------------------------------------------------------------
Sitsofe Wheeler (2):
      plot: add gnuplot 5 support
      plot: indicate that the pattern is a glob

 tools/plot/fio2gnuplot |  6 +++---
 tools/plot/graph2D.gpm | 37 +++++++++++++++++++++++++++++--------
 tools/plot/graph3D.gpm | 33 ++++++++++++++++++++++++---------
 tools/plot/math.gpm    | 25 +++++++++++++++++++++----
 4 files changed, 77 insertions(+), 24 deletions(-)

---

Diff of recent changes:

diff --git a/tools/plot/fio2gnuplot b/tools/plot/fio2gnuplot
index 1009ae0..a703ae3 100755
--- a/tools/plot/fio2gnuplot
+++ b/tools/plot/fio2gnuplot
@@ -31,7 +31,7 @@ def find_file(path, pattern):
 	fio_data_file=[]
 	# For all the local files
 	for file in os.listdir(path):
-	    # If the file math the regexp
+	    # If the file matches the glob
 	    if fnmatch.fnmatch(file, pattern):
 		# Let's consider this file
 		fio_data_file.append(file)
@@ -361,7 +361,7 @@ def print_help():
     print 'fio2gnuplot -ghbiodvk -t <title> -o <outputfile> -p <pattern> -G <type> -m <time> -M <time>'
     print
     print '-h --help                           : Print this help'
-    print '-p <pattern> or --pattern <pattern> : A pattern in regexp to select fio input files'
+    print '-p <pattern> or --pattern <pattern> : A glob pattern to select fio input files'
     print '-b           or --bandwidth         : A predefined pattern for selecting *_bw.log files'
     print '-i           or --iops              : A predefined pattern for selecting *_iops.log files'
     print '-g           or --gnuplot           : Render gnuplot traces before exiting'
@@ -487,7 +487,7 @@ def main(argv):
     #We need to adjust the output filename regarding the pattern required by the user
     if (pattern_set_by_user == True):
 	    gnuplot_output_filename=pattern
-	    # As we do have some regexp in the pattern, let's make this simpliest
+	    # As we do have some glob in the pattern, let's make this simpliest
 	    # We do remove the simpliest parts of the expression to get a clear file name
 	    gnuplot_output_filename=gnuplot_output_filename.replace('-*-','-')
 	    gnuplot_output_filename=gnuplot_output_filename.replace('*','-')
diff --git a/tools/plot/graph2D.gpm b/tools/plot/graph2D.gpm
index 5cd6ff3..769b754 100644
--- a/tools/plot/graph2D.gpm
+++ b/tools/plot/graph2D.gpm
@@ -1,9 +1,30 @@
 # This Gnuplot file has been generated by eNovance
 
-set title '$0'
+needed_args = 8
+if (exists("ARGC") && ARGC >= needed_args) \
+	found_args = 1; \
+else if (strlen("$$#") < 3 && "$#" >= needed_args) \
+	found_args = 1; \
+	ARG1 = "$0"; \
+	ARG2 = "$1"; \
+	ARG3 = "$2"; \
+	ARG4 = "$3"; \
+	ARG5 = "$4"; \
+	ARG6 = "$5"; \
+	ARG7 = "$6"; \
+	ARG8 = "$7"; \
+else \
+	found_args = 0; \
+	print "Aborting: could not find all arguments"; \
+	exit
+
+avg_num = ARG8 + 0
+avg_str = sprintf("%g", avg_num)
+
+set title ARG1
 
 set terminal png size 1280,1024
-set output '$3.png'
+set output ARG4 . '.png'
 #set terminal x11
 
 #Preparing Axes
@@ -12,7 +33,7 @@ set ytics axis out auto
 #set data style lines
 set key top left reverse
 set xlabel "Time (Seconds)"
-set ylabel '$4'
+set ylabel ARG5
 set xrange [0:]
 set yrange [0:]
 
@@ -22,13 +43,13 @@ set yrange [0:]
 set style line 100 lt 7 lw 0.5
 set style line 1 lt 1 lw 3 pt 3 linecolor rgb "green"
 
-plot '$1' using 2:3 with linespoints title '$2', $7 w l ls 1 ti 'Global average value ($7)'
+plot ARG2 using 2:3 with linespoints title ARG3, avg_num w l ls 1 ti 'Global average value (' . avg_str . ')'
 
-set output '$5.png'
-plot '$1' using 2:3 smooth csplines title '$2', $7 w l ls 1 ti 'Global average value ($7)'
+set output ARG6 . '.png'
+plot ARG2 using 2:3 smooth csplines title ARG3, avg_num w l ls 1 ti 'Global average value (' . avg_str . ')'
 
-set output '$6.png'
-plot '$1' using 2:3 smooth bezier title '$2', $7 w l ls 1 ti 'Global average value ($7)'
+set output ARG7 . '.png'
+plot ARG2 using 2:3 smooth bezier title ARG3, avg_num w l ls 1 ti 'Global average value (' . avg_str .')'
 
 #pause -1
 #The End
diff --git a/tools/plot/graph3D.gpm b/tools/plot/graph3D.gpm
index 93f7a4d..ac2cdf6 100644
--- a/tools/plot/graph3D.gpm
+++ b/tools/plot/graph3D.gpm
@@ -1,9 +1,24 @@
 # This Gnuplot file has been generated by eNovance
 
-set title '$0'
+needed_args = 5
+if (exists("ARGC") && ARGC >= needed_args) \
+	found_args = 1; \
+else if (strlen("$$#") < 3 && "$#" >= needed_args) \
+	found_args = 1; \
+	ARG1 = "$0"; \
+	ARG2 = "$1"; \
+	ARG3 = "$2"; \
+	ARG4 = "$3"; \
+	ARG5 = "$4"; \
+else \
+	found_args = 0; \
+	print "Aborting: could not find all arguments"; \
+	exit
+
+set title ARG1
 
 set terminal png size 1280,1024
-set output '$3.png'
+set output ARG4 . '.png'
 #set terminal x11
 #3D Config
 set isosamples 30
@@ -19,7 +34,7 @@ set grid back
 set key top left reverse
 set ylabel "Disk"
 set xlabel "Time (Seconds)"
-set zlabel '$4'
+set zlabel ARG5
 set cbrange [0:]
 set zrange [0:]
 
@@ -35,7 +50,7 @@ set multiplot
 set size 0.5,0.5
 set view 64,216
 set origin 0,0.5
-splot '$1' using 2:1:3 with linespoints title '$2'
+splot ARG2 using 2:1:3 with linespoints title ARG3
 
 #Top Right View
 set size 0.5,0.5
@@ -43,7 +58,7 @@ set origin 0.5,0.5
 set view 90,0
 set pm3d at s solid hidden3d 100 scansbackward
 set pm3d depthorder
-splot '$1' using 2:1:3 with linespoints title '$2'
+splot ARG2 using 2:1:3 with linespoints title ARG3
 
 #Bottom Right View
 set size 0.5,0.5
@@ -51,13 +66,13 @@ set origin 0.5,0
 set view 63,161
 set pm3d at s solid hidden3d 100 scansbackward
 set pm3d depthorder
-splot '$1' using 2:1:3 with linespoints title '$2'
+splot ARG2 using 2:1:3 with linespoints title ARG3
 
 #Bottom Left View
 set size 0.5,0.5
 set origin 0,0
 set pm3d map
-splot '$1' using 2:1:3 with linespoints title '$2'
+splot ARG2 using 2:1:3 with linespoints title ARG3
 
 #Unsetting multiplotting
 unset multiplot
@@ -66,7 +81,7 @@ unset multiplot
 #Preparing 3D Interactive view
 set mouse
 set terminal png size 1024,768
-set output '$3-3D.png'
+set output ARG4 . '-3D.png'
 
 #set term x11
 set view 64,216
@@ -74,7 +89,7 @@ set origin 0,0
 set size 1,1
 set pm3d at bs solid hidden3d 100 scansbackward
 set pm3d depthorder
-splot '$1' using 2:1:3 with linespoints title '$2'
+splot ARG2 using 2:1:3 with linespoints title ARG3
 
 #pause -1
 #The End
diff --git a/tools/plot/math.gpm b/tools/plot/math.gpm
index a01f5a0..0a2aff5 100644
--- a/tools/plot/math.gpm
+++ b/tools/plot/math.gpm
@@ -1,15 +1,32 @@
 # This Gnuplot file has been generated by eNovance
+if (exists("ARGC") && ARGC > 5) \
+	found_args = 1; \
+else if (strlen("$$#") < 3 && "$#" > 5) \
+	found_args = 1; \
+	ARG1 = "$0"; \
+	ARG2 = "$1"; \
+	ARG3 = "$2"; \
+	ARG4 = "$3"; \
+	ARG5 = "$4"; \
+	ARG6 = "$5"; \
+else \
+	found_args = 0; \
+	print "Aborting: could not find all arguments"; \
+	exit
 
-set title '$0'
+avg_num = ARG6 + 0
+avg_str = sprintf("%g", avg_num)
+
+set title ARG1
 
 set terminal png size 1280,1024
-set output '$3.png'
+set output ARG4 . '.png'
 
 set palette rgbformulae 7,5,15
 set style line 100 lt 7 lw 0.5
 set style fill transparent solid 0.9 noborder
 set auto x
-set ylabel '$4'
+set ylabel ARG5
 set xlabel "Disk"
 set yrange [0:]
 set style data histogram
@@ -22,4 +39,4 @@ set xtics axis out
 set xtic rotate by 45 scale 0 font ",8" autojustify
 set xtics offset 0,-1 border -5,1,5
 set style line 1 lt 1 lw 3 pt 3 linecolor rgb "green"
-plot '$1' using 2:xtic(1) ti col, $5 w l ls 1 ti 'Global average value ($5)'
+plot ARG2 using 2:xtic(1) ti col, avg_num w l ls 1 ti 'Global average value (' . avg_str . ')'


* Recent changes (master)
@ 2016-07-15 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-15 12:00 UTC
  To: fio

The following changes since commit 883e4841d5466955ad464ee3a6b37e009cfa80ef:

  Merge branch 'fix_verify' of https://github.com/charles-jacobsen/fio (2016-07-13 09:13:46 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 45213f1b15f820e6791118b7200a1185e2af7d87:

  pthread: bump min stack size (2016-07-14 10:36:12 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      pthread: bump min stack size

Tomohiro Kusumi (2):
      Add os_trim() support for DragonFlyBSD
      Add os_trim() support for FreeBSD

 Makefile          |  2 ++
 gettime-thread.c  |  2 +-
 os/os-dragonfly.h | 16 ++++++++++++++++
 os/os-freebsd.h   | 15 +++++++++++++++
 verify.c          |  2 +-
 5 files changed, 35 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index c617d6f..b54f7e9 100644
--- a/Makefile
+++ b/Makefile
@@ -145,6 +145,7 @@ ifeq ($(CONFIG_TARGET_OS), SunOS)
   CPPFLAGS += -D__EXTENSIONS__
 endif
 ifeq ($(CONFIG_TARGET_OS), FreeBSD)
+  SOURCE += trim.c
   LIBS	 += -lpthread -lrt
   LDFLAGS += -rdynamic
 endif
@@ -157,6 +158,7 @@ ifeq ($(CONFIG_TARGET_OS), NetBSD)
   LDFLAGS += -rdynamic
 endif
 ifeq ($(CONFIG_TARGET_OS), DragonFly)
+  SOURCE += trim.c
   LIBS	 += -lpthread -lrt
   LDFLAGS += -rdynamic
 endif
diff --git a/gettime-thread.c b/gettime-thread.c
index 6dc1486..19541b4 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -71,7 +71,7 @@ int fio_start_gtod_thread(void)
 		return 1;
 
 	pthread_attr_init(&attr);
-	pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
+	pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);
 	ret = pthread_create(&gtod_thread, &attr, gtod_thread_main, mutex);
 	pthread_attr_destroy(&attr);
 	if (ret) {
diff --git a/os/os-dragonfly.h b/os/os-dragonfly.h
index b67c660..d776d1f 100644
--- a/os/os-dragonfly.h
+++ b/os/os-dragonfly.h
@@ -9,6 +9,7 @@
 #include <sys/sysctl.h>
 #include <sys/statvfs.h>
 #include <sys/diskslice.h>
+#include <sys/ioctl_compat.h>
 
 #include "../file.h"
 
@@ -16,6 +17,7 @@
 #define FIO_USE_GENERIC_RAND
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_FS_STAT
+#define FIO_HAVE_TRIM
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_GETTID
 
@@ -84,6 +86,20 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
+static inline int os_trim(int fd, unsigned long long start,
+			  unsigned long long len)
+{
+	off_t range[2];
+
+	range[0] = start;
+	range[1] = len;
+
+	if (!ioctl(fd, IOCTLTRIM, range))
+		return 0;
+
+	return errno;
+}
+
 #ifdef MADV_FREE
 #define FIO_MADV_FREE	MADV_FREE
 #endif
diff --git a/os/os-freebsd.h b/os/os-freebsd.h
index fa00bb8..ac408c9 100644
--- a/os/os-freebsd.h
+++ b/os/os-freebsd.h
@@ -19,6 +19,7 @@
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
 #define FIO_HAVE_CHARDEV_SIZE
 #define FIO_HAVE_FS_STAT
+#define FIO_HAVE_TRIM
 #define FIO_HAVE_GETTID
 #define FIO_HAVE_CPU_AFFINITY
 
@@ -114,6 +115,20 @@ static inline unsigned long long get_fs_free_size(const char *path)
 	return ret;
 }
 
+static inline int os_trim(int fd, unsigned long long start,
+			  unsigned long long len)
+{
+	off_t range[2];
+
+	range[0] = start;
+	range[1] = len;
+
+	if (!ioctl(fd, DIOCGDELETE, range))
+		return 0;
+
+	return errno;
+}
+
 #ifdef MADV_FREE
 #define FIO_MADV_FREE	MADV_FREE
 #endif
diff --git a/verify.c b/verify.c
index 58f37ae..9a96fbb 100644
--- a/verify.c
+++ b/verify.c
@@ -1290,7 +1290,7 @@ int verify_async_init(struct thread_data *td)
 	pthread_attr_t attr;
 
 	pthread_attr_init(&attr);
-	pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
+	pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);
 
 	td->verify_thread_exit = 0;
 


* Recent changes (master)
@ 2016-07-14 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-14 12:00 UTC
  To: fio

The following changes since commit b86ad8f1c3845419742715e94526f60e1e2bf596:

  workqueue: rename private to priv for compiling as c++ (2016-07-11 16:52:34 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 883e4841d5466955ad464ee3a6b37e009cfa80ef:

  Merge branch 'fix_verify' of https://github.com/charles-jacobsen/fio (2016-07-13 09:13:46 -0700)

----------------------------------------------------------------
Charlie Jacobsen (1):
      verify: Reset verify_state before verification phase.

Jens Axboe (1):
      Merge branch 'fix_verify' of https://github.com/charles-jacobsen/fio

 backend.c |  9 +++++++++
 fio.h     |  1 +
 init.c    | 15 ++++++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index dc6f530..58c77cb 100644
--- a/backend.c
+++ b/backend.c
@@ -603,6 +603,15 @@ static void do_verify(struct thread_data *td, uint64_t verify_bytes)
 	if (td->error)
 		return;
 
+	/*
+	 * verify_state needs to be reset before verification
+	 * proceeds so that expected random seeds match actual
+	 * random seeds in headers. The main loop will reset
+	 * all random number generators if randrepeat is set.
+	 */
+	if (!td->o.rand_repeatable)
+		td_fill_verify_state_seed(td);
+
 	td_set_runstate(td, TD_VERIFYING);
 
 	io_u = NULL;
diff --git a/fio.h b/fio.h
index 7e6311c..8a0ebe3 100644
--- a/fio.h
+++ b/fio.h
@@ -502,6 +502,7 @@ extern void fio_options_dup_and_init(struct option *);
 extern void fio_options_mem_dupe(struct thread_data *);
 extern void options_mem_dupe(void *data, struct fio_option *options);
 extern void td_fill_rand_seeds(struct thread_data *);
+extern void td_fill_verify_state_seed(struct thread_data *);
 extern void add_job_opts(const char **, int);
 extern char *num2str(uint64_t, int, int, int, int);
 extern int ioengine_load(struct thread_data *);
diff --git a/init.c b/init.c
index 7166ea7..065a71a 100644
--- a/init.c
+++ b/init.c
@@ -936,12 +936,25 @@ static void init_rand_file_service(struct thread_data *td)
 	}
 }
 
+void td_fill_verify_state_seed(struct thread_data *td)
+{
+	bool use64;
+
+	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64)
+		use64 = 1;
+	else
+		use64 = 0;
+
+	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF],
+		use64);
+}
+
 static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
 	int i;
 
 	init_rand_seed(&td->bsrange_state, td->rand_seeds[FIO_RAND_BS_OFF], use64);
-	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF], use64);
+	td_fill_verify_state_seed(td);
 	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);
 
 	if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
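The new comment in `do_verify()` explains the fix. The verification pass must regenerate the same seeds that were stored in the block headers at write time, but by then the write phase has already advanced `verify_state`. The sketch below illustrates why re-seeding matters. It uses a hypothetical xorshift64 generator, not fio's Tausworthe generators, and a hypothetical `verify_matches()` driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for fio's random state (xorshift64, not Tausworthe). */
struct rand_state {
	uint64_t s;
};

static void seed_state(struct rand_state *st, uint64_t seed)
{
	/* xorshift must never hold an all-zero state */
	st->s = seed ? seed : 0x9e3779b97f4a7c15ULL;
}

static uint64_t next_rand(struct rand_state *st)
{
	uint64_t x = st->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return st->s = x;
}

/*
 * "Write" nr_blocks seeds, then "verify" them by regenerating the
 * sequence. Returns true only if every regenerated seed matches.
 */
static bool verify_matches(uint64_t seed, int nr_blocks, bool reseed_before_verify)
{
	struct rand_state st;
	uint64_t written[64];
	int i;

	if (nr_blocks > 64)
		return false;

	seed_state(&st, seed);
	for (i = 0; i < nr_blocks; i++)
		written[i] = next_rand(&st);

	/* The equivalent of calling td_fill_verify_state_seed() */
	if (reseed_before_verify)
		seed_state(&st, seed);

	for (i = 0; i < nr_blocks; i++)
		if (next_rand(&st) != written[i])
			return false;
	return true;
}
```

Without the re-seed, the verify pass continues from where the write pass stopped, so the very first comparison fails.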


* Recent changes (master)
@ 2016-07-13 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-13 12:00 UTC
  To: fio

The following changes since commit 60a257279f249d65d9905e77d3a2fa54ac5aa881:

  iolog: flush_log() can be bool (2016-07-11 11:51:58 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to b86ad8f1c3845419742715e94526f60e1e2bf596:

  workqueue: rename private to priv for compiling as c++ (2016-07-11 16:52:34 -0400)

----------------------------------------------------------------
Casey Bodley (1):
      workqueue: rename private to priv for compiling as c++

 rate-submit.c | 18 +++++++++---------
 workqueue.h   |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

---

Diff of recent changes:

diff --git a/rate-submit.c b/rate-submit.c
index 92cb622..0c31f29 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -14,7 +14,7 @@ static int io_workqueue_fn(struct submit_worker *sw,
 {
 	struct io_u *io_u = container_of(work, struct io_u, work);
 	const enum fio_ddir ddir = io_u->ddir;
-	struct thread_data *td = sw->private;
+	struct thread_data *td = sw->priv;
 	int ret;
 
 	dprint(FD_RATE, "io_u %p queued by %u\n", io_u, gettid());
@@ -61,7 +61,7 @@ static int io_workqueue_fn(struct submit_worker *sw,
 
 static bool io_workqueue_pre_sleep_flush_fn(struct submit_worker *sw)
 {
-	struct thread_data *td = sw->private;
+	struct thread_data *td = sw->priv;
 
 	if (td->io_u_queued || td->cur_depth || td->io_u_in_flight)
 		return true;
@@ -71,7 +71,7 @@ static bool io_workqueue_pre_sleep_flush_fn(struct submit_worker *sw)
 
 static void io_workqueue_pre_sleep_fn(struct submit_worker *sw)
 {
-	struct thread_data *td = sw->private;
+	struct thread_data *td = sw->priv;
 	int ret;
 
 	ret = io_u_quiesce(td);
@@ -84,20 +84,20 @@ static int io_workqueue_alloc_fn(struct submit_worker *sw)
 	struct thread_data *td;
 
 	td = calloc(1, sizeof(*td));
-	sw->private = td;
+	sw->priv = td;
 	return 0;
 }
 
 static void io_workqueue_free_fn(struct submit_worker *sw)
 {
-	free(sw->private);
-	sw->private = NULL;
+	free(sw->priv);
+	sw->priv = NULL;
 }
 
 static int io_workqueue_init_worker_fn(struct submit_worker *sw)
 {
 	struct thread_data *parent = sw->wq->td;
-	struct thread_data *td = sw->private;
+	struct thread_data *td = sw->priv;
 	int fio_unused ret;
 
 	memcpy(&td->o, &parent->o, sizeof(td->o));
@@ -145,7 +145,7 @@ err:
 static void io_workqueue_exit_worker_fn(struct submit_worker *sw,
 					unsigned int *sum_cnt)
 {
-	struct thread_data *td = sw->private;
+	struct thread_data *td = sw->priv;
 
 	(*sum_cnt)++;
 	sum_thread_stats(&sw->wq->td->ts, &td->ts, *sum_cnt == 1);
@@ -213,7 +213,7 @@ static void sum_ddir(struct thread_data *dst, struct thread_data *src,
 
 static void io_workqueue_update_acct_fn(struct submit_worker *sw)
 {
-	struct thread_data *src = sw->private;
+	struct thread_data *src = sw->priv;
 	struct thread_data *dst = sw->wq->td;
 
 	if (td_read(src))
diff --git a/workqueue.h b/workqueue.h
index 1961b2a..e35c181 100644
--- a/workqueue.h
+++ b/workqueue.h
@@ -16,7 +16,7 @@ struct submit_worker {
 	unsigned int index;
 	uint64_t seq;
 	struct workqueue *wq;
-	void *private;
+	void *priv;
 	struct sk_out *sk_out;
 };
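`private` is a reserved keyword in C++, so a C header that declares a member with that name cannot be included from a C++ translation unit. The rename to `priv` avoids the problem. The following is a minimal sketch of the same alloc/free pair that compiles as either C or C++. The `struct worker` and helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Member is named priv, not private, so this also compiles as C++. */
struct worker {
	unsigned int index;
	void *priv;
};

static int worker_alloc(struct worker *w, size_t sz)
{
	assert(w != NULL);
	w->priv = calloc(1, sz);
	return w->priv ? 0 : -1;
}

static void worker_free(struct worker *w)
{
	free(w->priv);
	w->priv = NULL;
}
```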
 


* Recent changes (master)
@ 2016-07-12 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-12 12:00 UTC
  To: fio

The following changes since commit 0b2eef4940d9818f91f455d0cdb4f37db4fbb158:

  samples being added to the pending log were silently dropped because we failed to set nr_samples in the new log they get copied into (2016-07-06 15:54:10 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 60a257279f249d65d9905e77d3a2fa54ac5aa881:

  iolog: flush_log() can be bool (2016-07-11 11:51:58 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      iolog: flush_log() can be bool

 backend.c | 2 +-
 iolog.c   | 2 +-
 iolog.h   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index bb0200b..dc6f530 100644
--- a/backend.c
+++ b/backend.c
@@ -2388,7 +2388,7 @@ int fio_backend(struct sk_out *sk_out)
 			for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 				struct io_log *log = agg_io_log[i];
 
-				flush_log(log, 0);
+				flush_log(log, false);
 				free_log(log);
 			}
 		}
diff --git a/iolog.c b/iolog.c
index ff521df..4c87f1c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -965,7 +965,7 @@ int iolog_file_inflate(const char *file)
 
 #endif
 
-void flush_log(struct io_log *log, int do_append)
+void flush_log(struct io_log *log, bool do_append)
 {
 	void *buf;
 	FILE *f;
diff --git a/iolog.h b/iolog.h
index 0da7067..a58e3f0 100644
--- a/iolog.h
+++ b/iolog.h
@@ -232,7 +232,7 @@ static inline bool per_unit_log(struct io_log *log)
 
 extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
-extern void flush_log(struct io_log *, int);
+extern void flush_log(struct io_log *, bool);
 extern void flush_samples(FILE *, void *, uint64_t);
 extern void free_log(struct io_log *);
 extern void fio_writeout_logs(bool);


* Recent changes (master)
@ 2016-07-07 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-07 12:00 UTC
  To: fio

The following changes since commit f1c4b3727386bd8da3617a6730ad55cf2ba04ec8:

  gfio: call g_thread_init() for <= 2.31.0 (2016-07-05 14:23:56 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 0b2eef4940d9818f91f455d0cdb4f37db4fbb158:

  samples being added to the pending log were silently dropped because we failed to set nr_samples in the new log they get copied into (2016-07-06 15:54:10 -0400)

----------------------------------------------------------------
Karl Cronburg (1):
      samples being added to the pending log were silently dropped because we failed to set nr_samples in the new log they get copied into

 stat.c | 1 +
 1 file changed, 1 insertion(+)

---

Diff of recent changes:

diff --git a/stat.c b/stat.c
index e0e97cd..96cd764 100644
--- a/stat.c
+++ b/stat.c
@@ -1949,6 +1949,7 @@ static struct io_logs *regrow_log(struct io_log *iolog)
 		dst = get_sample(iolog, cur_log, i);
 		memcpy(dst, src, log_entry_sz(iolog));
 	}
+	cur_log->nr_samples = iolog->pending->nr_samples;
 
 	iolog->pending->nr_samples = 0;
 	return cur_log;
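The one-line fix matters because the flush path writes `cur_log->nr_samples * log_entry_sz(log)` bytes (see `flush_log()` in the 2016-06-12 diff further down). A chunk whose count was never set is therefore written out as empty, even though its buffer holds the copied data. The following is a simplified sketch of moving pending samples into a fresh chunk, using a hypothetical fixed-size `struct log_chunk` in place of fio's `struct io_logs`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a log chunk. */
struct log_chunk {
	unsigned long samples[16];
	size_t nr_samples;
	size_t max_samples;
};

/* Move all pending samples into an empty destination chunk. */
static struct log_chunk *move_pending(struct log_chunk *dst,
				      struct log_chunk *pending)
{
	assert(pending->nr_samples <= dst->max_samples);
	memcpy(dst->samples, pending->samples,
	       pending->nr_samples * sizeof(pending->samples[0]));
	/* The step that was missing: without it dst still reports 0 samples */
	dst->nr_samples = pending->nr_samples;

	pending->nr_samples = 0;
	return dst;
}
```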


* Recent changes (master)
@ 2016-07-06 12:00 Jens Axboe
From: Jens Axboe @ 2016-07-06 12:00 UTC
  To: fio

The following changes since commit 8a09277d18f942ed35354e31e38df50d991a595a:

  HOWTO: remove old use cases for the net IO engine (2016-06-29 13:09:15 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to f1c4b3727386bd8da3617a6730ad55cf2ba04ec8:

  gfio: call g_thread_init() for <= 2.31.0 (2016-07-05 14:23:56 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      gfio: call g_thread_init() for <= 2.31.0

 gfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/gfio.c b/gfio.c
index e3bcbdf..37c1818 100644
--- a/gfio.c
+++ b/gfio.c
@@ -1677,7 +1677,7 @@ static void init_ui(int *argc, char **argv[], struct gui *ui)
 	 * Without it, the update that happens in gfio_update_thread_status
 	 * doesn't really happen in a timely fashion, you need expose events
 	 */
-#if !GTK_CHECK_VERSION(2, 24, 0)
+#if !GLIB_CHECK_VERSION(2, 31, 0)
 	if (!g_thread_supported())
 		g_thread_init(NULL);
 #endif
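The old guard tested the GTK version, but `g_thread_init()` is a GLib function. It was deprecated in GLib 2.32, where the thread system is initialised automatically. `GLIB_CHECK_VERSION(major, minor, micro)` is true when the GLib headers being compiled against are at least that version, so `!GLIB_CHECK_VERSION(2, 31, 0)` selects only older GLib. The sketch below reproduces the shape of that check with hypothetical `MY_*` version macros, so it builds without GLib installed.

```c
#include <assert.h>

/* Hypothetical version numbers standing in for GLIB_MAJOR_VERSION etc. */
#define MY_MAJOR 2
#define MY_MINOR 28
#define MY_MICRO 6

/* Same comparison shape as GLIB_CHECK_VERSION(): compiled-against >= given */
#define MY_CHECK_VERSION(major, minor, micro)				\
	(MY_MAJOR > (major) ||						\
	 (MY_MAJOR == (major) && MY_MINOR > (minor)) ||			\
	 (MY_MAJOR == (major) && MY_MINOR == (minor) &&			\
	  MY_MICRO >= (micro)))

/* Returns 1 when the caller would still have to initialise threading. */
static int needs_explicit_thread_init(void)
{
#if !MY_CHECK_VERSION(2, 31, 0)
	return 1;
#else
	return 0;
#endif
}
```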


* Recent changes (master)
@ 2016-06-30 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-30 12:00 UTC
  To: fio

The following changes since commit bccdc0d0c9d41749515131226aab71baa59e03cd:

  Fio 2.12 (2016-06-13 15:42:44 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8a09277d18f942ed35354e31e38df50d991a595a:

  HOWTO: remove old use cases for the net IO engine (2016-06-29 13:09:15 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      HOWTO: remove old use cases for the net IO engine

Vincent Fu (2):
      Remove hard-coded precision for printing JSON float values
      helper_thread: remove impossible branch

 HOWTO           | 21 +++++++++------------
 helper_thread.c |  8 ++------
 json.c          |  2 +-
 3 files changed, 12 insertions(+), 19 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 67fd833..a50f93e 100644
--- a/HOWTO
+++ b/HOWTO
@@ -329,18 +329,15 @@ directory=str	Prefix filenames with this directory. Used to place files
 filename=str	Fio normally makes up a filename based on the job name,
 		thread number, and file number. If you want to share
 		files between threads in a job or several jobs, specify
-		a filename for each of them to override the default. If
-		the ioengine used is 'net', the filename is the host, port,
-		and protocol to use in the format of =host,port,protocol.
-		See ioengine=net for more. If the ioengine is file based, you
-		can specify a number of files by separating the names with a
-		':' colon. So if you wanted a job to open /dev/sda and /dev/sdb
-		as the two working files, you would use
-		filename=/dev/sda:/dev/sdb. On Windows, disk devices are
-		accessed as \\.\PhysicalDrive0 for the first device,
-		\\.\PhysicalDrive1 for the second etc. Note: Windows and
-		FreeBSD prevent write access to areas of the disk containing
-		in-use data (e.g. filesystems).
+		a filename for each of them to override the default.
+		If the ioengine is file based, you can specify a number of
+		files by separating the names with a ':' colon. So if you
+		wanted a job to open /dev/sda and /dev/sdb as the two working
+		files, you would use filename=/dev/sda:/dev/sdb. On Windows,
+		disk devices are accessed as \\.\PhysicalDrive0 for the first
+		device, \\.\PhysicalDrive1 for the second etc. Note: Windows
+		and FreeBSD prevent write access to areas of the disk
+		containing in-use data (e.g. filesystems).
 		If the wanted filename does need to include a colon, then
 		escape that with a '\' character. For instance, if the filename
 		is "/dev/dsk/foo@3,0:c", then you would use
diff --git a/helper_thread.c b/helper_thread.c
index e788af5..f031df4 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -110,12 +110,8 @@ static void *helper_thread_main(void *data)
 			msec_to_next_event = DISK_UTIL_MSEC;
 			if (since_du >= DISK_UTIL_MSEC)
 				msec_to_next_event -= (since_du - DISK_UTIL_MSEC);
-		} else {
-			if (since_du >= DISK_UTIL_MSEC)
-				msec_to_next_event = DISK_UTIL_MSEC - (DISK_UTIL_MSEC - since_du);
-			else
-				msec_to_next_event = DISK_UTIL_MSEC;
-		}
+		} else
+			msec_to_next_event = DISK_UTIL_MSEC - since_du;
 
 		if (hd->do_stat) {
 			hd->do_stat = 0;
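Inside the removed `else`, the inner `since_du >= DISK_UTIL_MSEC` arm is only dead code if the outer `if` (not shown in the hunk) already handles every case where `since_du >= DISK_UTIL_MSEC`. The sketch below assumes that. Even if the arm were reachable, it simplifies to `since_du`. Note that the old reachable path waited a full `DISK_UTIL_MSEC`, while the new code waits only for the time remaining in the period. The value 250 is an assumed placeholder for illustration.

```c
#include <assert.h>

/* Assumed value, for illustration only. */
#define DISK_UTIL_MSEC 250u

/* Old else-branch, kept only to compare results. */
static unsigned int old_else_branch(unsigned int since_du)
{
	if (since_du >= DISK_UTIL_MSEC)		/* assumed unreachable here */
		return DISK_UTIL_MSEC - (DISK_UTIL_MSEC - since_du);
	return DISK_UTIL_MSEC;
}

/* New else-branch: sleep only for what is left of the period. */
static unsigned int new_else_branch(unsigned int since_du)
{
	assert(since_du < DISK_UTIL_MSEC);
	return DISK_UTIL_MSEC - since_du;
}
```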
diff --git a/json.c b/json.c
index f3ec0bb..190fa9e 100644
--- a/json.c
+++ b/json.c
@@ -347,7 +347,7 @@ static void json_print_value(struct json_value *value, struct buf_output *out)
 		log_buf(out, "%lld", value->integer_number);
 		break;
 	case JSON_TYPE_FLOAT:
-		log_buf(out, "%.2f", value->float_number);
+		log_buf(out, "%f", value->float_number);
 		break;
 	case JSON_TYPE_OBJECT:
 		json_print_object(value->object, out);
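With `"%.2f"`, any float below 0.005 is printed as `0.00` in the JSON output. Plain `"%f"` uses printf's default precision of six digits after the decimal point, so such values survive. Values below 5e-7 still round to zero. The following sketch compares the two formats using a hypothetical `fmt_float()` helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format v with the old ("%.2f") or new ("%f") JSON float format. */
static void fmt_float(char *buf, size_t len, double v, int old_format)
{
	if (old_format)
		snprintf(buf, len, "%.2f", v);
	else
		snprintf(buf, len, "%f", v);
}
```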


* Recent changes (master)
@ 2016-06-14 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-14 12:00 UTC
  To: fio

The following changes since commit 2b7625e25e32783272b8e6ffbc1546fa50b9386c:

  iolog: fix 'cur_log' leaks (2016-06-11 21:41:13 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bccdc0d0c9d41749515131226aab71baa59e03cd:

  Fio 2.12 (2016-06-13 15:42:44 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      stat: treat !per_unit_logs() like IO offload mode
      Fio 2.12

 FIO-VERSION-GEN        |  2 +-
 os/windows/install.wxs |  2 +-
 stat.c                 | 11 +++++++----
 3 files changed, 9 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index ea65ea8..04802dd 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.11
+DEF_VER=fio-2.12
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 45084e6..1e8022d 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.11">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.12">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/stat.c b/stat.c
index 26d8d53..e0e97cd 100644
--- a/stat.c
+++ b/stat.c
@@ -1983,11 +1983,14 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 		return cur_log;
 
 	/*
-	 * Out of space. If we're in IO offload mode, add a new log chunk
-	 * inline. If we're doing inline submissions, flag 'td' as needing
-	 * a log regrow and we'll take care of it on the submission side.
+	 * Out of space. If we're in IO offload mode, or we're not doing
+	 * per unit logging (hence logging happens outside of the IO thread
+	 * as well), add a new log chunk inline. If we're doing inline
+	 * submissions, flag 'td' as needing a log regrow and we'll take
+	 * care of it on the submission side.
 	 */
-	if (iolog->td->o.io_submit_mode == IO_MODE_OFFLOAD)
+	if (iolog->td->o.io_submit_mode == IO_MODE_OFFLOAD ||
+	    !per_unit_log(iolog))
 		return regrow_log(iolog);
 
 	iolog->td->flags |= TD_F_REGROW_LOGS;
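The revised comment in `get_cur_log()` describes a decision on what to do when a log chunk fills up. If the job is in IO offload mode, or the log is not per-unit (so samples may be added from outside the IO thread), a new chunk is added inline. Otherwise the job is flagged and the submission side grows the log later. The sketch below expresses that as a truth table. The enums and `on_log_full()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum submit_mode { MODE_INLINE, MODE_OFFLOAD };
enum regrow_action { REGROW_NOW, REGROW_DEFERRED };

static enum regrow_action on_log_full(enum submit_mode mode, bool per_unit)
{
	/* Samples may arrive from outside the IO thread: grow right away */
	if (mode == MODE_OFFLOAD || !per_unit)
		return REGROW_NOW;

	/* Otherwise flag the job (TD_F_REGROW_LOGS in fio) and defer */
	return REGROW_DEFERRED;
}
```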


* Recent changes (master)
@ 2016-06-12 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-12 12:00 UTC
  To: fio

The following changes since commit 54d0a3150d44adca3ee4047fabd85651c6ea2db1:

  options: fix typos (2016-06-09 13:30:52 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2b7625e25e32783272b8e6ffbc1546fa50b9386c:

  iolog: fix 'cur_log' leaks (2016-06-11 21:41:13 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Revert "fio: Simplify forking of processes"
      iolog: allocate 'cur_log's out of shared pool
      iolog: fix 'cur_log' leaks

 backend.c | 46 +++++++++++++++++++++++++++++++++++++++-------
 iolog.c   |  6 +++---
 stat.c    |  5 +++--
 3 files changed, 45 insertions(+), 12 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index d8f4f4c..bb0200b 100644
--- a/backend.c
+++ b/backend.c
@@ -1799,6 +1799,39 @@ err:
 	return (void *) (uintptr_t) td->error;
 }
 
+
+/*
+ * We cannot pass the td data into a forked process, so attach the td and
+ * pass it to the thread worker.
+ */
+static int fork_main(struct sk_out *sk_out, int shmid, int offset)
+{
+	struct fork_data *fd;
+	void *data, *ret;
+
+#if !defined(__hpux) && !defined(CONFIG_NO_SHM)
+	data = shmat(shmid, NULL, 0);
+	if (data == (void *) -1) {
+		int __err = errno;
+
+		perror("shmat");
+		return __err;
+	}
+#else
+	/*
+	 * HP-UX inherits shm mappings?
+	 */
+	data = threads;
+#endif
+
+	fd = calloc(1, sizeof(*fd));
+	fd->td = data + offset * sizeof(struct thread_data);
+	fd->sk_out = sk_out;
+	ret = thread_main(fd);
+	shmdt(data);
+	return (int) (uintptr_t) ret;
+}
+
 static void dump_td_info(struct thread_data *td)
 {
 	log_err("fio: job '%s' (state=%d) hasn't exited in %lu seconds, it "
@@ -2136,7 +2169,6 @@ reap:
 		struct thread_data *map[REAL_MAX_JOBS];
 		struct timeval this_start;
 		int this_jobs = 0, left;
-		struct fork_data *fd;
 
 		/*
 		 * create threads (TD_NOT_CREATED -> TD_CREATED)
@@ -2186,13 +2218,14 @@ reap:
 			map[this_jobs++] = td;
 			nr_started++;
 
-			fd = calloc(1, sizeof(*fd));
-			fd->td = td;
-			fd->sk_out = sk_out;
-
 			if (td->o.use_thread) {
+				struct fork_data *fd;
 				int ret;
 
+				fd = calloc(1, sizeof(*fd));
+				fd->td = td;
+				fd->sk_out = sk_out;
+
 				dprint(FD_PROCESS, "will pthread_create\n");
 				ret = pthread_create(&td->thread, NULL,
 							thread_main, fd);
@@ -2212,9 +2245,8 @@ reap:
 				dprint(FD_PROCESS, "will fork\n");
 				pid = fork();
 				if (!pid) {
-					int ret;
+					int ret = fork_main(sk_out, shm_id, i);
 
-					ret = (int)(uintptr_t)thread_main(fd);
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
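The restored `fork_main()` attaches the shared segment in the child by id and finds its own job by index (`offset * sizeof(struct thread_data)`). The sketch below shows the same pattern in isolation with SysV shared memory. It uses a hypothetical `struct job_slot` in place of `struct thread_data`, and typed indexing instead of `void *` arithmetic, which is a GNU extension. POSIX `fork()` also attaches the parent's existing segments in the child, so the explicit `shmat()` here only mirrors the shape of `fork_main()`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one job's struct thread_data. */
struct job_slot {
	int input;
	int result;
};

/* Child side: attach by id, pick our slot, do the "job", detach. */
static int child_main(int shmid, size_t offset)
{
	struct job_slot *base = shmat(shmid, NULL, 0);

	if (base == (void *) -1)
		return 1;
	base[offset].result = base[offset].input + 1;
	shmdt(base);
	return 0;
}

/* Parent side: run slot `offset` in a forked child, return its result or -1. */
static int run_slot_in_child(size_t nr_slots, size_t offset, int input)
{
	struct job_slot *slots;
	int shmid, status, ret = -1;
	pid_t pid;

	assert(offset < nr_slots);
	shmid = shmget(IPC_PRIVATE, nr_slots * sizeof(*slots), IPC_CREAT | 0600);
	if (shmid < 0)
		return -1;
	slots = shmat(shmid, NULL, 0);
	if (slots == (void *) -1)
		goto out_rm;

	slots[offset].input = input;
	pid = fork();
	if (pid == 0)
		_exit(child_main(shmid, offset));
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		ret = slots[offset].result;

	shmdt(slots);
out_rm:
	shmctl(shmid, IPC_RMID, NULL);
	return ret;
}
```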
diff --git a/iolog.c b/iolog.c
index 9391507..ff521df 100644
--- a/iolog.c
+++ b/iolog.c
@@ -645,6 +645,7 @@ void free_log(struct io_log *log)
 		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
 		flist_del_init(&cur_log->list);
 		free(cur_log->log);
+		sfree(cur_log);
 	}
 
 	if (log->pending) {
@@ -988,6 +989,7 @@ void flush_log(struct io_log *log, int do_append)
 		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
 		flist_del_init(&cur_log->list);
 		flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+		sfree(cur_log);
 	}
 
 	fclose(f);
@@ -1226,9 +1228,7 @@ static int iolog_flush(struct io_log *log)
 		data->samples = cur_log->log;
 		data->nr_samples = cur_log->nr_samples;
 
-		cur_log->nr_samples = 0;
-		cur_log->max_samples = 0;
-		cur_log->log = NULL;
+		sfree(cur_log);
 
 		gz_work(data);
 	}
diff --git a/stat.c b/stat.c
index a8ccd9a..26d8d53 100644
--- a/stat.c
+++ b/stat.c
@@ -16,6 +16,7 @@
 #include "lib/pow2.h"
 #include "lib/output_buffer.h"
 #include "helper_thread.h"
+#include "smalloc.h"
 
 struct fio_mutex *stat_mutex;
 
@@ -1877,7 +1878,7 @@ static struct io_logs *get_new_log(struct io_log *iolog)
 
 	new_size = new_samples * log_entry_sz(iolog);
 
-	cur_log = malloc(sizeof(*cur_log));
+	cur_log = smalloc(sizeof(*cur_log));
 	if (cur_log) {
 		INIT_FLIST_HEAD(&cur_log->list);
 		cur_log->log = malloc(new_size);
@@ -1888,7 +1889,7 @@ static struct io_logs *get_new_log(struct io_log *iolog)
 			iolog->cur_log_max = new_samples;
 			return cur_log;
 		}
-		free(cur_log);
+		sfree(cur_log);
 	}
 
 	return NULL;
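Before the leak fix, the drain loop in `free_log()` freed each chunk's sample buffer (`cur_log->log`) but not the chunk node itself. In fio the node comes from `smalloc()` and must be returned with `sfree()`. The sketch below shows the correct drain pattern. It uses plain `malloc()`/`free()` with a hypothetical allocation counter in place of fio's shared pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical leak counter standing in for the shared pool. */
static int live_allocs;

static void *tracked_alloc(size_t sz)
{
	void *p = malloc(sz);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct chunk {
	struct chunk *next;
	unsigned long *samples;		/* separately allocated buffer */
};

static struct chunk *make_chunks(int n)
{
	struct chunk *head = NULL;

	while (n-- > 0) {
		struct chunk *c = tracked_alloc(sizeof(*c));

		if (!c)
			break;
		c->samples = tracked_alloc(16 * sizeof(*c->samples));
		c->next = head;
		head = c;
	}
	return head;
}

/* Free the buffer AND the node; freeing only the buffer leaks every node. */
static void drain_chunks(struct chunk *head)
{
	while (head) {
		struct chunk *next = head->next;

		tracked_free(head->samples);
		tracked_free(head);
		head = next;
	}
}
```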


* Recent changes (master)
@ 2016-06-10 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-10 12:00 UTC
  To: fio

The following changes since commit a275c37ab0001b62b1961e3430e58a2d42ee3dc9:

  options: mark unsupported options as such (2016-06-08 11:13:08 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 54d0a3150d44adca3ee4047fabd85651c6ea2db1:

  options: fix typos (2016-06-09 13:30:52 -0600)

----------------------------------------------------------------
Vincent Fu (1):
      options: fix typos

 options.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index e8c0b7f..5199823 100644
--- a/options.c
+++ b/options.c
@@ -2190,6 +2190,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 #else
+	{
 		.name	= "fadvise_stream",
 		.lname	= "Fadvise stream",
 		.type	= FIO_OPT_UNSUPPORTED,
@@ -2259,6 +2260,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_INVALID,
 	},
 #else
+	{
 		.name	= "sync_file_range",
 		.lname	= "Sync file range",
 		.type	= FIO_OPT_UNSUPPORTED,
@@ -2700,7 +2702,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.name	= "verify_async_cpus",
 		.lname	= "Async verify CPUs",
 		.type	= FIO_OPT_UNSUPPORTED,
-		.help	"Your platform does not support CPU affinities",
+		.help	= "Your platform does not support CPU affinities",
 	},
 #endif
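This typo fix completes the `#else` arms added in the 2016-06-09 batch. The `fadvise_stream` and `sync_file_range` entries lacked their opening `{`, and `verify_async_cpus` wrote `.help "..."` without the `=`. Each arm of an `#ifdef`/`#else` must contribute one complete `{ ... },` element with designated initializers. The following is a self-contained sketch of the pattern. `HAVE_SYNC_FILE_RANGE`, `struct opt_desc` and `opt_supported()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum opt_type { OPT_STR, OPT_BOOL, OPT_UNSUPPORTED };

struct opt_desc {
	const char *name;
	enum opt_type type;
	const char *help;
};

static const struct opt_desc opts[] = {
#ifdef HAVE_SYNC_FILE_RANGE	/* hypothetical configure macro */
	{
		.name	= "sync_file_range",
		.type	= OPT_STR,
		.help	= "Use sync_file_range()",
	},
#else
	{
		.name	= "sync_file_range",
		.type	= OPT_UNSUPPORTED,
		.help	= "Your platform does not support sync_file_range",
	},
#endif
	{
		.name	= "direct",
		.type	= OPT_BOOL,
		.help	= "Use O_DIRECT IO",
	},
	{ .name = NULL },	/* table terminator */
};

/* True if the option exists and is usable on this build. */
static bool opt_supported(const char *name)
{
	const struct opt_desc *o;

	for (o = opts; o->name; o++)
		if (!strcmp(o->name, name))
			return o->type != OPT_UNSUPPORTED;
	return false;
}
```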
 	{


* Recent changes (master)
@ 2016-06-09 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-09 12:00 UTC
  To: fio

The following changes since commit 6a89b401289ef823c51760c71018b43d0c17532b:

  stat: fix reversed check for ramp time (2016-06-06 21:23:53 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a275c37ab0001b62b1961e3430e58a2d42ee3dc9:

  options: mark unsupported options as such (2016-06-08 11:13:08 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      parse: add support for unsupported options
      options: mark unsupported options as such

 options.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 options.h |  15 +------
 parse.c   |  21 +++++++++-
 parse.h   |   1 +
 4 files changed, 156 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/options.c b/options.c
index 7a22fe4..e8c0b7f 100644
--- a/options.c
+++ b/options.c
@@ -2161,7 +2161,14 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 		},
 	},
-#endif	/* CONFIG_POSIX_FALLOCATE */
+#else	/* CONFIG_POSIX_FALLOCATE */
+	{
+		.name	= "fallocate",
+		.lname	= "Fallocate",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support fallocate",
+	},
+#endif /* CONFIG_POSIX_FALLOCATE */
 	{
 		.name	= "fadvise_hint",
 		.lname	= "Fadvise hint",
@@ -2182,6 +2189,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+		.name	= "fadvise_stream",
+		.lname	= "Fadvise stream",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support fadvise stream ID",
+	},
 #endif
 	{
 		.name	= "fsync",
@@ -2245,6 +2258,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+		.name	= "sync_file_range",
+		.lname	= "Sync file range",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support sync_file_range",
+	},
 #endif
 	{
 		.name	= "direct",
@@ -2676,6 +2695,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_VERIFY,
 	},
+#else
+	{
+		.name	= "verify_async_cpus",
+		.lname	= "Async verify CPUs",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	"Your platform does not support CPU affinities",
+	},
 #endif
 	{
 		.name	= "experimental_verify",
@@ -2760,6 +2786,31 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_TRIM,
 	},
+#else
+	{
+		.name	= "trim_percentage",
+		.lname	= "Trim percentage",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Fio does not support TRIM on your platform",
+	},
+	{
+		.name	= "trim_verify_zero",
+		.lname	= "Verify trim zero",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Fio does not support TRIM on your platform",
+	},
+	{
+		.name	= "trim_backlog",
+		.lname	= "Trim backlog",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Fio does not support TRIM on your platform",
+	},
+	{
+		.name	= "trim_backlog_batch",
+		.lname	= "Trim backlog batch",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Fio does not support TRIM on your platform",
+	},
 #endif
 	{
 		.name	= "write_iolog",
@@ -2852,6 +2903,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_FILE,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+	{
+		.name	= "ioscheduler",
+		.lname	= "I/O scheduler",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support IO scheduler switching",
+	},
 #endif
 	{
 		.name	= "zonesize",
@@ -2970,6 +3028,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
 	},
+#else
+	{
+		.name	= "prio",
+		.lname	= "I/O nice priority",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support IO priorities",
+	},
+	{
+		.name	= "prioclass",
+		.lname	= "I/O nice priority class",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support IO priorities",
+	},
 #endif
 	{
 		.name	= "thinktime",
@@ -3268,6 +3339,25 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_CRED,
 	},
+#else
+	{
+		.name	= "cpumask",
+		.lname	= "CPU mask",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support CPU affinities",
+	},
+	{
+		.name	= "cpus_allowed",
+		.lname	= "CPUs allowed",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support CPU affinities",
+	},
+	{
+		.name	= "cpus_allowed_policy",
+		.lname	= "CPUs allowed distribution policy",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support CPU affinities",
+	},
 #endif
 #ifdef CONFIG_LIBNUMA
 	{
@@ -3290,6 +3380,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_GENERAL,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+	{
+		.name	= "numa_cpu_nodes",
+		.lname	= "NUMA CPU Nodes",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Build fio with libnuma-dev(el) to enable this option",
+	},
+	{
+		.name	= "numa_mem_policy",
+		.lname	= "NUMA Memory Policy",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Build fio with libnuma-dev(el) to enable this option",
+	},
 #endif
 	{
 		.name	= "end_fsync",
@@ -3462,6 +3565,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+	{
+		.name	= "log_compression_cpus",
+		.lname	= "Log Compression CPUs",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support CPU affinities",
+	},
 #endif
 	{
 		.name	= "log_store_compressed",
@@ -3472,6 +3582,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_LOG,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+	{
+		.name	= "log_compression",
+		.lname	= "Log compression",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Install libz-dev(el) to get compression support",
+	},
+	{
+		.name	= "log_store_compressed",
+		.lname	= "Log store compressed",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Install libz-dev(el) to get compression support",
+	},
 #endif
 	{
 		.name	= "block_error_percentiles",
@@ -3632,6 +3755,13 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.category = FIO_OPT_C_STAT,
 		.group	= FIO_OPT_G_INVALID,
 	},
+#else
+	{
+		.name	= "disk_util",
+		.lname	= "Disk utilization",
+		.type	= FIO_OPT_UNSUPPORTED,
+		.help	= "Your platform does not support disk utilization",
+	},
 #endif
 	{
 		.name	= "gtod_reduce",
@@ -4277,7 +4407,8 @@ static void show_closest_option(const char *opt)
 		i++;
 	}
 
-	if (best_option != -1 && string_distance_ok(name, best_distance))
+	if (best_option != -1 && string_distance_ok(name, best_distance) &&
+	    fio_options[best_option].type != FIO_OPT_UNSUPPORTED)
 		log_err("Did you mean %s?\n", fio_options[best_option].name);
 
 	free(name);
diff --git a/options.h b/options.h
index 4727bac..539a636 100644
--- a/options.h
+++ b/options.h
@@ -47,19 +47,8 @@ static inline bool o_match(struct fio_option *o, const char *opt)
 	return false;
 }
 
-static inline struct fio_option *find_option(struct fio_option *options,
-					     const char *opt)
-{
-	struct fio_option *o;
-
-	for (o = &options[0]; o->name; o++)
-		if (o_match(o, opt))
-			return o;
-
-	return NULL;
-}
-
-extern struct fio_option *fio_option_find(const char *name);
+extern struct fio_option *find_option(struct fio_option *, const char *);
+extern struct fio_option *fio_option_find(const char *);
 extern unsigned int fio_get_kb_base(void *);
 
 #endif
diff --git a/parse.c b/parse.c
index 963f1f8..bb16bc1 100644
--- a/parse.c
+++ b/parse.c
@@ -906,6 +906,25 @@ static int handle_option(struct fio_option *o, const char *__ptr, void *data)
 	return ret;
 }
 
+struct fio_option *find_option(struct fio_option *options, const char *opt)
+{
+	struct fio_option *o;
+
+	for (o = &options[0]; o->name; o++) {
+		if (!o_match(o, opt))
+			continue;
+		if (o->type == FIO_OPT_UNSUPPORTED) {
+			log_err("Option <%s>: %s\n", o->name, o->help);
+			continue;
+		}
+
+		return o;
+	}
+
+	return NULL;
+}
+
+
 static struct fio_option *get_option(char *opt,
 				     struct fio_option *options, char **post)
 {
@@ -1232,7 +1251,7 @@ void fill_default_options(void *data, struct fio_option *options)
 
 void option_init(struct fio_option *o)
 {
-	if (o->type == FIO_OPT_DEPRECATED)
+	if (o->type == FIO_OPT_DEPRECATED || o->type == FIO_OPT_UNSUPPORTED)
 		return;
 	if (o->name && !o->lname)
 		log_err("Option %s: missing long option name\n", o->name);
diff --git a/parse.h b/parse.h
index 77450ef..aa00a67 100644
--- a/parse.h
+++ b/parse.h
@@ -20,6 +20,7 @@ enum fio_opt_type {
 	FIO_OPT_FLOAT_LIST,
 	FIO_OPT_STR_SET,
 	FIO_OPT_DEPRECATED,
+	FIO_OPT_UNSUPPORTED,
 };
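Taken together, the `parse.c` and `options.c` changes give unsupported options two behaviours. `find_option()` reports the option's help text and treats it as not found. `show_closest_option()` stays silent when the nearest match is itself unsupported. The sketch below shows both behaviours with a hypothetical table type. It uses a plain Levenshtein distance, which is an assumption and not necessarily what fio's `string_distance()` computes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum opt_type { OPT_STR, OPT_UNSUPPORTED };

struct opt {
	const char *name;
	enum opt_type type;
	const char *help;
};

/* Same control flow as the new find_option(). */
static const struct opt *find_opt(const struct opt *opts, const char *name)
{
	const struct opt *o;

	for (o = opts; o->name; o++) {
		if (strcmp(o->name, name))
			continue;
		if (o->type == OPT_UNSUPPORTED) {
			fprintf(stderr, "Option <%s>: %s\n", o->name, o->help);
			continue;
		}
		return o;
	}
	return NULL;
}

/* Levenshtein distance; inputs of 64+ chars count as "infinitely far". */
static size_t edit_distance(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), i, j;
	size_t prev[64], cur[64];

	if (lb >= 64)
		return (size_t) -1;
	for (j = 0; j <= lb; j++)
		prev[j] = j;
	for (i = 1; i <= la; i++) {
		cur[0] = i;
		for (j = 1; j <= lb; j++) {
			size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
			size_t del = prev[j] + 1, ins = cur[j - 1] + 1;
			size_t m = sub < del ? sub : del;

			cur[j] = m < ins ? m : ins;
		}
		memcpy(prev, cur, (lb + 1) * sizeof(prev[0]));
	}
	return prev[lb];
}

/* Nearest name overall; suppressed if too far or if it is unsupported. */
static const char *closest_supported(const struct opt *opts, const char *name,
				     size_t max_dist)
{
	const struct opt *o, *best = NULL;
	size_t best_d = (size_t) -1;

	for (o = opts; o->name; o++) {
		size_t d = edit_distance(o->name, name);

		if (d < best_d) {
			best_d = d;
			best = o;
		}
	}
	if (best && best_d <= max_dist && best->type != OPT_UNSUPPORTED)
		return best->name;
	return NULL;
}
```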
 
 /*


* Recent changes (master)
@ 2016-06-07 12:00 Jens Axboe
From: Jens Axboe @ 2016-06-07 12:00 UTC
  To: fio

The following changes since commit 82e65aecd90a35171eb9930ba7b08d27fee95640:

  Documentation: fix psyncv2 typo (2016-06-03 09:00:49 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6a89b401289ef823c51760c71018b43d0c17532b:

  stat: fix reversed check for ramp time (2016-06-06 21:23:53 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Fix ramp time breakage
      stat: remove redundant unit log check
      stat: fix reversed check for ramp time

 fio_time.h |  6 ++++--
 stat.c     |  8 +-------
 time.c     | 10 +++++-----
 3 files changed, 10 insertions(+), 14 deletions(-)

---

Diff of recent changes:

diff --git a/fio_time.h b/fio_time.h
index cb271c2..e31ea09 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -1,6 +1,8 @@
 #ifndef FIO_TIME_H
 #define FIO_TIME_H
 
+#include "lib/types.h"
+
 struct thread_data;
 extern uint64_t utime_since(const struct timeval *,const  struct timeval *);
 extern uint64_t utime_since_now(const struct timeval *);
@@ -14,8 +16,8 @@ extern uint64_t usec_spin(unsigned int);
 extern uint64_t usec_sleep(struct thread_data *, unsigned long);
 extern void fill_start_time(struct timeval *);
 extern void set_genesis_time(void);
-extern int ramp_time_over(struct thread_data *);
-extern int in_ramp_time(struct thread_data *);
+extern bool ramp_time_over(struct thread_data *);
+extern bool in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
 extern void timeval_add_msec(struct timeval *, unsigned int);
 
diff --git a/stat.c b/stat.c
index 5384884..a8ccd9a 100644
--- a/stat.c
+++ b/stat.c
@@ -2260,9 +2260,6 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 	unsigned long spent, rate;
 	enum fio_ddir ddir;
 
-	if (per_unit_log(td->bw_log))
-		return 0;
-
 	spent = mtime_since(&td->bw_sample_time, t);
 	if (spent < td->o.bw_avg_time &&
 	    td->o.bw_avg_time - spent >= 10)
@@ -2331,9 +2328,6 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 	unsigned long spent, iops;
 	enum fio_ddir ddir;
 
-	if (per_unit_log(td->iops_log))
-		return 0;
-
 	spent = mtime_since(&td->iops_sample_time, t);
 	if (spent < td->o.iops_avg_time &&
 	    td->o.iops_avg_time - spent >= 10)
@@ -2393,7 +2387,7 @@ int calc_log_samples(void)
 	fio_gettime(&now, NULL);
 
 	for_each_td(td, i) {
-		if (!ramp_time_over(td) ||
+		if (in_ramp_time(td) ||
 		    !(td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING)) {
 			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
 			continue;
diff --git a/time.c b/time.c
index 0e64af5..f1c5d3f 100644
--- a/time.c
+++ b/time.c
@@ -84,7 +84,7 @@ uint64_t utime_since_genesis(void)
 	return utime_since_now(&genesis);
 }
 
-int in_ramp_time(struct thread_data *td)
+bool in_ramp_time(struct thread_data *td)
 {
 	return td->o.ramp_time && !td->ramp_time_over;
 }
@@ -101,12 +101,12 @@ static void parent_update_ramp(struct thread_data *td)
 	td_set_runstate(parent, TD_RAMP);
 }
 
-int ramp_time_over(struct thread_data *td)
+bool ramp_time_over(struct thread_data *td)
 {
 	struct timeval tv;
 
 	if (!td->o.ramp_time || td->ramp_time_over)
-		return 1;
+		return true;
 
 	fio_gettime(&tv, NULL);
 	if (utime_since(&td->epoch, &tv) >= td->o.ramp_time) {
@@ -114,10 +114,10 @@ int ramp_time_over(struct thread_data *td)
 		reset_all_stats(td);
 		td_set_runstate(td, TD_RAMP);
 		parent_update_ramp(td);
-		return 1;
+		return true;
 	}
 
-	return 0;
+	return false;
 }
 
 void fio_time_init(void)
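
The two helpers differ in a way that matters for calc_log_samples(). ramp_time_over() is not a pure query: once the ramp period has elapsed it calls reset_all_stats(), td_set_runstate() and parent_update_ramp(). in_ramp_time() only reads td->o.ramp_time and td->ramp_time_over. Below is a simplified, hypothetical model of that split. struct ramp_state, in_ramp() and ramp_over() are illustrative names, not fio code. The model assumes the transition also latches the flag that the pure query reads, and it counts stat resets instead of performing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the ramp-time fields in thread_data. */
struct ramp_state {
	uint64_t ramp_usec;	/* configured ramp time in usec; 0 = no ramp */
	bool over;		/* latched once the ramp period has elapsed */
	unsigned int resets;	/* stands in for reset_all_stats() calls */
};

/* Pure query in the spirit of in_ramp_time(): never modifies state. */
static bool in_ramp(const struct ramp_state *s)
{
	return s->ramp_usec && !s->over;
}

/*
 * Query plus one-time transition in the spirit of ramp_time_over():
 * the first call that sees elapsed_usec >= ramp_usec latches 'over'
 * and performs the (modelled) stats reset.
 */
static bool ramp_over(struct ramp_state *s, uint64_t elapsed_usec)
{
	if (!s->ramp_usec || s->over)
		return true;

	if (elapsed_usec >= s->ramp_usec) {
		s->over = true;
		s->resets++;
		return true;
	}

	return false;
}
```

You can call in_ramp() any number of times and resets stays the same. Only ramp_over() can trigger the one-time reset, so a periodic sampler that just needs to know "still ramping?" can use the pure query safely.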

* Recent changes (master)
@ 2016-06-04 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-06-04 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 385e1da6468bc951a0bf7ae60d890bb4d4a55ded:

  Documentation update (2016-06-02 16:57:20 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 82e65aecd90a35171eb9930ba7b08d27fee95640:

  Documentation: fix psyncv2 typo (2016-06-03 09:00:49 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Documentation: fix psyncv2 typo

Vincent Fu (1):
      tools/fio_latency2csv.py: add tool that converts json+ to CSV

 HOWTO                    |   2 +-
 Makefile                 |   2 +-
 fio.1                    |   2 +-
 tools/fio_latency2csv.py | 101 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 104 insertions(+), 3 deletions(-)
 create mode 100755 tools/fio_latency2csv.py

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 1d4e46c..67fd833 100644
--- a/HOWTO
+++ b/HOWTO
@@ -708,7 +708,7 @@ ioengine=str	Defines how the job issues io to the file. The following
 
 			pvsync	Basic preadv(2) or pwritev(2) IO.
 
-			psync2	Basic preadv2(2) or pwritev2(2) IO.
+			pvsync2	Basic preadv2(2) or pwritev2(2) IO.
 
 			libaio	Linux native asynchronous io. Note that Linux
 				may only support queued behaviour with
diff --git a/Makefile b/Makefile
index 108e6ee..c617d6f 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py tools/fio_latency2csv.py)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3
diff --git a/fio.1 b/fio.1
index 7f053d4..353f8ff 100644
--- a/fio.1
+++ b/fio.1
@@ -1709,7 +1709,7 @@ from user-space to reap events. The reaping mode is only
 enabled when polling for a minimum of 0 events (eg when
 iodepth_batch_complete=0).
 .TP
-.BI (psyncv2)hipri
+.BI (pvsync2)hipri
 Set RWF_HIPRI on IO, indicating to the kernel that it's of
 higher priority than normal.
 .TP
diff --git a/tools/fio_latency2csv.py b/tools/fio_latency2csv.py
new file mode 100755
index 0000000..93586d2
--- /dev/null
+++ b/tools/fio_latency2csv.py
@@ -0,0 +1,101 @@
+#!/usr/bin/python
+#
+# fio_latency2csv.py
+#
+# This tool converts fio's json+ completion latency data to CSV format.
+# For example:
+#
+# fio_latency2csv.py fio-jsonplus.output fio-latency.csv
+#
+
+import os
+import json
+import argparse
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('source',
+                        help='fio json+ output file containing completion '
+                             'latency data')
+    parser.add_argument('dest',
+                        help='destination file stub for latency data in CSV '
+                             'format. job number will be appended to filename')
+    args = parser.parse_args()
+
+    return args
+
+
+# from stat.c
+def plat_idx_to_val(idx, FIO_IO_U_PLAT_BITS=6, FIO_IO_U_PLAT_VAL=64):
+    # MSB <= (FIO_IO_U_PLAT_BITS-1), cannot be rounded off. Use
+    # all bits of the sample as index
+    if (idx < (FIO_IO_U_PLAT_VAL << 1)):
+        return idx
+
+    # Find the group and compute the minimum value of that group
+    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1
+    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
+
+    # Find its bucket number of the group
+    k = idx % FIO_IO_U_PLAT_VAL
+
+    # Return the mean of the range of the bucket
+    return (base + ((k + 0.5) * (1 << error_bits)))
+
+
+def percentile(idx, run_total):
+    total = run_total[len(run_total)-1]
+    if total == 0:
+        return 0
+
+    return float(run_total[idx]) / total
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    with open(args.source, 'r') as source:
+        jsondata = json.loads(source.read())
+
+    bins = {}
+    bin_const = {}
+    run_total = {}
+    ddir_list = ['read', 'write', 'trim']
+    const_list = ['FIO_IO_U_PLAT_NR', 'FIO_IO_U_PLAT_BITS',
+                  'FIO_IO_U_PLAT_VAL']
+
+    for jobnum in range(0,len(jsondata['jobs'])):
+        prev_ddir = None
+        for ddir in ddir_list:
+            bins[ddir] = jsondata['jobs'][jobnum][ddir]['clat']['bins']
+
+            bin_const[ddir] = {}
+            for const in const_list:
+                bin_const[ddir][const] = bins[ddir].pop(const)
+                if prev_ddir:
+                    assert bin_const[ddir][const] == bin_const[prev_ddir][const]
+            prev_ddir = ddir
+
+            run_total[ddir] = [0 for x in
+                               range(bin_const[ddir]['FIO_IO_U_PLAT_NR'])]
+            run_total[ddir][0] = bins[ddir]['0']
+            for x in range(1, bin_const[ddir]['FIO_IO_U_PLAT_NR']):
+                run_total[ddir][x] = run_total[ddir][x-1] + bins[ddir][str(x)]
+
+        stub, ext = os.path.splitext(args.dest)
+        outfile = stub + '_job' + str(jobnum) + ext
+
+        with open(outfile, 'w') as output:
+            output.write("clat (usec),")
+            for ddir in ddir_list:
+                output.write("{0},".format(ddir))
+            output.write("\n")
+
+            for x in range(bin_const['read']['FIO_IO_U_PLAT_NR']):
+                output.write("{0},".format(plat_idx_to_val(x,
+                                          bin_const['read']['FIO_IO_U_PLAT_BITS'],
+                                          bin_const['read']['FIO_IO_U_PLAT_VAL'])))
+                for ddir in ddir_list:
+                    output.write("{0},".format(percentile(x, run_total[ddir])))
+                output.write("\n")


* Recent changes (master)
@ 2016-06-03 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-06-03 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit aa7d2ef092d5a8e417fcddaf8808fb0d48f1064b:

  Added millisecond-accurate timestamp to JSON output (2016-05-27 14:26:16 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 385e1da6468bc951a0bf7ae60d890bb4d4a55ded:

  Documentation update (2016-06-02 16:57:20 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Enable preadv2/pwritev2 engines by default on Linux
      arch: wire up preadv2/pwritev2 for more architectures
      Documentation update

Omar Sandoval (1):
      Fix iodepth_batch=0

 HOWTO               | 11 +++++------
 arch/arch-arm.h     |  7 +++++++
 arch/arch-ia64.h    |  7 +++++++
 arch/arch-s390.h    |  7 +++++++
 arch/arch-sparc.h   |  7 +++++++
 arch/arch-sparc64.h |  7 +++++++
 arch/arch-x86.h     |  7 +++++++
 arch/arch-x86_64.h  |  8 ++++++++
 engines/sync.c      | 10 +++++-----
 fio.1               |  3 ---
 options.c           |  3 +--
 os/os-linux.h       | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 12 files changed, 113 insertions(+), 16 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index cec4e42..1d4e46c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -53,8 +53,8 @@ bottom, it contains the following basic parameters:
 
 	IO engine	How do we issue io? We could be memory mapping the
 			file, we could be using regular read/write, we
-			could be using splice, async io, syslet, or even
-			SG (SCSI generic sg).
+			could be using splice, async io, or even SG
+			(SCSI generic sg).
 
 	IO depth	If the io engine is async, how large a queuing
 			depth do we want to maintain?
@@ -706,7 +706,9 @@ ioengine=str	Defines how the job issues io to the file. The following
 
 			vsync	Basic readv(2) or writev(2) IO.
 
-			psyncv	Basic preadv(2) or pwritev(2) IO.
+			pvsync	Basic preadv(2) or pwritev(2) IO.
+
+			psync2	Basic preadv2(2) or pwritev2(2) IO.
 
 			libaio	Linux native asynchronous io. Note that Linux
 				may only support queued behaviour with
@@ -726,9 +728,6 @@ ioengine=str	Defines how the job issues io to the file. The following
 				vmsplice(2) to transfer data from user
 				space to the kernel.
 
-			syslet-rw Use the syslet system calls to make
-				regular read/write async.
-
 			sg	SCSI generic sg v3 io. May either be
 				synchronous using the SG_IO ioctl, or if
 				the target is an sg character device
diff --git a/arch/arch-arm.h b/arch/arch-arm.h
index 93268d2..57d9488 100644
--- a/arch/arch-arm.h
+++ b/arch/arch-arm.h
@@ -18,6 +18,13 @@
 #define __NR_sys_vmsplice	343
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		392
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		393
+#endif
+
 #if defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__) \
 	|| defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5T__) || defined (__ARM_ARCH_5E__)\
 	|| defined (__ARM_ARCH_5TE__) || defined (__ARM_ARCH_5TEJ__) \
diff --git a/arch/arch-ia64.h b/arch/arch-ia64.h
index 8e8dd7f..7cdeefc 100644
--- a/arch/arch-ia64.h
+++ b/arch/arch-ia64.h
@@ -18,6 +18,13 @@
 #define __NR_sys_vmsplice	1302
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		1348
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		1349
+#endif
+
 #define nop		asm volatile ("hint @pause" ::: "memory");
 #define read_barrier()	asm volatile ("mf" ::: "memory")
 #define write_barrier()	asm volatile ("mf" ::: "memory")
diff --git a/arch/arch-s390.h b/arch/arch-s390.h
index cc7a1d1..71beb7d 100644
--- a/arch/arch-s390.h
+++ b/arch/arch-s390.h
@@ -18,6 +18,13 @@
 #define __NR_sys_vmsplice	309
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		376
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		377
+#endif
+
 #define nop		asm volatile("nop" : : : "memory")
 #define read_barrier()	asm volatile("bcr 15,0" : : : "memory")
 #define write_barrier()	asm volatile("bcr 15,0" : : : "memory")
diff --git a/arch/arch-sparc.h b/arch/arch-sparc.h
index fe47b80..d0df883 100644
--- a/arch/arch-sparc.h
+++ b/arch/arch-sparc.h
@@ -18,6 +18,13 @@
 #define __NR_sys_vmsplice	25
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		358
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		359
+#endif
+
 #define nop	do { } while (0)
 
 #define read_barrier()	__asm__ __volatile__ ("" : : : "memory")
diff --git a/arch/arch-sparc64.h b/arch/arch-sparc64.h
index e793ae5..5c4e649 100644
--- a/arch/arch-sparc64.h
+++ b/arch/arch-sparc64.h
@@ -18,6 +18,13 @@
 #define __NR_sys_vmsplice	25
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		358
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		359
+#endif
+
 #define nop	do { } while (0)
 
 #define membar_safe(type) \
diff --git a/arch/arch-x86.h b/arch/arch-x86.h
index 385a912..9471a89 100644
--- a/arch/arch-x86.h
+++ b/arch/arch-x86.h
@@ -29,6 +29,13 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 #define __NR_sys_vmsplice	316
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		378
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		379
+#endif
+
 #define	FIO_HUGE_PAGE		4194304
 
 #define nop		__asm__ __volatile__("rep;nop": : :"memory")
diff --git a/arch/arch-x86_64.h b/arch/arch-x86_64.h
index 8f33fc5..21da412 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -36,6 +36,14 @@ static inline void do_cpuid(unsigned int *eax, unsigned int *ebx,
 #define __NR_shmdt		 67
 #endif
 
+#ifndef __NR_preadv2
+#define __NR_preadv2		327
+#endif
+#ifndef __NR_pwritev2
+#define __NR_pwritev2		328
+#endif
+
+
 #define	FIO_HUGE_PAGE		2097152
 
 #define nop		__asm__ __volatile__("rep;nop": : :"memory")
diff --git a/engines/sync.c b/engines/sync.c
index 260ef66..433e4fa 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -32,7 +32,7 @@ struct syncio_data {
 	enum fio_ddir last_ddir;
 };
 
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 struct psyncv2_options {
 	void *pad;
 	unsigned int hipri;
@@ -121,7 +121,7 @@ static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
 }
 #endif
 
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct syncio_data *sd = td->io_ops->data;
@@ -429,7 +429,7 @@ static struct ioengine_ops ioengine_pvrw = {
 };
 #endif
 
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 static struct ioengine_ops ioengine_pvrw2 = {
 	.name		= "pvsync2",
 	.version	= FIO_IOOPS_VERSION,
@@ -453,7 +453,7 @@ static void fio_init fio_syncio_register(void)
 #ifdef CONFIG_PWRITEV
 	register_ioengine(&ioengine_pvrw);
 #endif
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 	register_ioengine(&ioengine_pvrw2);
 #endif
 }
@@ -466,7 +466,7 @@ static void fio_exit fio_syncio_unregister(void)
 #ifdef CONFIG_PWRITEV
 	unregister_ioengine(&ioengine_pvrw);
 #endif
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 	unregister_ioengine(&ioengine_pvrw2);
 #endif
 }
diff --git a/fio.1 b/fio.1
index f521c9d..7f053d4 100644
--- a/fio.1
+++ b/fio.1
@@ -633,9 +633,6 @@ File is memory mapped with \fBmmap\fR\|(2) and data copied using
 \fBsplice\fR\|(2) is used to transfer the data and \fBvmsplice\fR\|(2) to
 transfer data from user-space to the kernel.
 .TP
-.B syslet-rw
-Use the syslet system calls to make regular read/write asynchronous.
-.TP
 .B sg
 SCSI generic sg v3 I/O. May be either synchronous using the SG_IO ioctl, or if
 the target is an sg character device, we use \fBread\fR\|(2) and
diff --git a/options.c b/options.c
index 3360784..7a22fe4 100644
--- a/options.c
+++ b/options.c
@@ -1548,7 +1548,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Use preadv/pwritev",
 			  },
 #endif
-#ifdef CONFIG_PWRITEV2
+#ifdef FIO_HAVE_PWRITEV2
 			  { .ival = "pvsync2",
 			    .help = "Use preadv2/pwritev2",
 			  },
@@ -1678,7 +1678,6 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.help	= "Number of IO buffers to submit in one go",
 		.parent	= "iodepth",
 		.hide	= 1,
-		.minval	= 1,
 		.interval = 1,
 		.def	= "1",
 		.category = FIO_OPT_C_IO,
diff --git a/os/os-linux.h b/os/os-linux.h
index 23c16b6..b36d33c 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -38,6 +38,7 @@
 #define FIO_HAVE_BINJECT
 #define FIO_HAVE_GETTID
 #define FIO_USE_GENERIC_INIT_RANDOM_STATE
+#define FIO_HAVE_PWRITEV2
 
 #ifdef MAP_HUGETLB
 #define FIO_HAVE_MMAP_HUGE
@@ -289,4 +290,55 @@ static inline int fio_set_sched_idle(void)
 
 #define FIO_HAVE_STREAMID
 
+#ifndef RWF_HIPRI
+#define RWF_HIPRI	0x00000001
+#endif
+#ifndef RWF_DSYNC
+#define RWF_DSYNC	0x00000002
+#endif
+#ifndef RWF_SYNC
+#define RWF_SYNC	0x00000004
+#endif
+
+#ifndef CONFIG_PWRITEV2
+#ifdef __NR_preadv2
+static inline void make_pos_h_l(unsigned long *pos_h, unsigned long *pos_l,
+				off_t offset)
+{
+	*pos_l = offset & 0xffffffff;
+	*pos_h = ((uint64_t) offset) >> 32;
+
+}
+static inline ssize_t preadv2(int fd, const struct iovec *iov, int iovcnt,
+			      off_t offset, unsigned int flags)
+{
+	unsigned long pos_l, pos_h;
+
+	make_pos_h_l(&pos_h, &pos_l, offset);
+	return syscall(__NR_preadv2, fd, iov, iovcnt, pos_l, pos_h, flags);
+}
+static inline ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt,
+			       off_t offset, unsigned int flags)
+{
+	unsigned long pos_l, pos_h;
+
+	make_pos_h_l(&pos_h, &pos_l, offset);
+	return syscall(__NR_pwritev2, fd, iov, iovcnt, pos_l, pos_h, flags);
+}
+#else
+static inline ssize_t preadv2(int fd, const struct iovec *iov, int iovcnt,
+			      off_t offset, unsigned int flags)
+{
+	errno = ENOSYS;
+	return -1;
+}
+static inline ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt,
+			       off_t offset, unsigned int flags)
+{
+	errno = ENOSYS;
+	return -1;
+}
+#endif /* __NR_preadv2 */
+#endif /* CONFIG_PWRITEV2 */
+
 #endif
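
The preadv2/pwritev2 syscalls take the offset as a (pos_l, pos_h) pair because, on 32-bit ABIs, a 64-bit offset does not fit in one argument register. The sketch below models both sides of that exchange. It assumes the kernel recombines the halves the way pos_from_hilo() in fs/read_write.c does, shifting pos_h left by half a long, twice. Under that model a 64-bit build discards pos_h entirely. That suggests that masking the offset to 32 bits, as make_pos_h_l() above does, would truncate offsets of 4 GiB and beyond on 64-bit builds. split_pos(), join_pos() and LONG_BITS are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Width of unsigned long on this build (32 or 64 on Linux ABIs). */
#define LONG_BITS	(CHAR_BIT * sizeof(unsigned long))

/*
 * Split a file offset into the (pos_l, pos_h) pair for preadv2/pwritev2.
 * Under the model in join_pos(), a 64-bit build ignores pos_h, so the
 * whole offset has to travel in pos_l.
 */
static void split_pos(uint64_t offset, unsigned long *pos_l,
		      unsigned long *pos_h)
{
	if (LONG_BITS == 64) {
		*pos_l = (unsigned long)offset;
		*pos_h = 0;
	} else {
		*pos_l = (unsigned long)(offset & 0xffffffffu);
		*pos_h = (unsigned long)(offset >> 32);
	}
}

/*
 * Assumed model of the kernel's recombination (pos_from_hilo()):
 * pos_h is shifted by half a long, twice, so on a 64-bit build it is
 * shifted out of the 64-bit result entirely.
 */
static uint64_t join_pos(unsigned long pos_h, unsigned long pos_l)
{
	const unsigned int half = LONG_BITS / 2;

	return (((uint64_t)pos_h << half) << half) | pos_l;
}
```

A round trip through split_pos() and join_pos() preserves any offset on both 32-bit and 64-bit builds. On a 64-bit build, feeding join_pos() a 32-bit-masked low word loses everything above bit 31.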

* Recent changes (master)
@ 2016-05-28 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-05-28 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 34febb23fa9c7b9b0d54c324effff1a808a8fe6e:

  mutex: abstract out cond/lock pshared init (2016-05-25 13:55:48 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to aa7d2ef092d5a8e417fcddaf8808fb0d48f1064b:

  Added millisecond-accurate timestamp to JSON output (2016-05-27 14:26:16 -0400)

----------------------------------------------------------------
Jens Axboe (1):
      server: ensure that we flush compressed logs correctly

Ryan Hardin (1):
      Added millisecond-accurate timestamp to JSON output

 server.c | 38 +++++++++++++++++++++++++++++++++-----
 stat.c   | 13 +++++++++----
 2 files changed, 42 insertions(+), 9 deletions(-)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index d36c511..667a66c 100644
--- a/server.c
+++ b/server.c
@@ -1656,7 +1656,7 @@ void fio_server_send_du(void)
 static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 				 struct io_logs *cur_log, z_stream *stream)
 {
-	struct sk_entry *entry;
+	unsigned int this_len;
 	void *out_pdu;
 	int ret;
 
@@ -1664,7 +1664,7 @@ static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 	stream->avail_in = cur_log->nr_samples * log_entry_sz(log);
 
 	do {
-		unsigned int this_len;
+		struct sk_entry *entry;
 
 		/*
 		 * Dirty - since the log is potentially huge, compress it into
@@ -1675,7 +1675,7 @@ static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
 
 		stream->avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
 		stream->next_out = out_pdu;
-		ret = deflate(stream, Z_FINISH);
+		ret = deflate(stream, Z_BLOCK);
 		/* may be Z_OK, or Z_STREAM_END */
 		if (ret < 0) {
 			free(out_pdu);
@@ -1716,8 +1716,36 @@ static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
 			break;
 	}
 
-	deflateEnd(&stream);
-	return ret;
+	ret = deflate(&stream, Z_FINISH);
+
+	while (ret != Z_STREAM_END) {
+		struct sk_entry *entry;
+		unsigned int this_len;
+		void *out_pdu;
+
+		out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
+		stream.avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
+		stream.next_out = out_pdu;
+
+		ret = deflate(&stream, Z_FINISH);
+		/* may be Z_OK, or Z_STREAM_END */
+		if (ret < 0) {
+			free(out_pdu);
+			break;
+		}
+
+		this_len = FIO_SERVER_MAX_FRAGMENT_PDU - stream.avail_out;
+
+		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
+					 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
+		flist_add_tail(&entry->list, &first->next);
+	}
+
+	ret = deflateEnd(&stream);
+	if (ret == Z_OK)
+		return 0;
+
+	return 1;
 }
 #else
 static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
diff --git a/stat.c b/stat.c
index fc1efd4..5384884 100644
--- a/stat.c
+++ b/stat.c
@@ -1636,16 +1636,21 @@ void __show_run_stats(void)
 	if (output_format & FIO_OUTPUT_JSON) {
 		struct thread_data *global;
 		char time_buf[32];
-		time_t time_p;
+		struct timeval now;
+		unsigned long long ms_since_epoch;
 
-		time(&time_p);
-		os_ctime_r((const time_t *) &time_p, time_buf,
+		gettimeofday(&now, NULL);
+		ms_since_epoch = (unsigned long long)(now.tv_sec) * 1000 +
+		                 (unsigned long long)(now.tv_usec) / 1000;
+
+		os_ctime_r((const time_t *) &now.tv_sec, time_buf,
 				sizeof(time_buf));
 		time_buf[strlen(time_buf) - 1] = '\0';
 
 		root = json_create_object();
 		json_object_add_value_string(root, "fio version", fio_version_string);
-		json_object_add_value_int(root, "timestamp", time_p);
+		json_object_add_value_int(root, "timestamp", now.tv_sec);
+		json_object_add_value_int(root, "timestamp_ms", ms_since_epoch);
 		json_object_add_value_string(root, "time", time_buf);
 		global = get_global_options();
 		json_add_job_opts(root, "global options", &global->opt_list, false);
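
The JSON output now carries both "timestamp" (seconds) and "timestamp_ms", taken from a single gettimeofday() sample. Integer division therefore makes timestamp_ms / 1000 always equal timestamp. Below is a minimal sketch of that conversion. timeval_to_ms() and now_ms() are hypothetical helper names. The microsecond part is truncated, not rounded, just as in the diff.

```c
#define _DEFAULT_SOURCE	/* for gettimeofday() under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Whole milliseconds since the epoch; the sub-millisecond part is dropped. */
static unsigned long long timeval_to_ms(const struct timeval *tv)
{
	return (unsigned long long)tv->tv_sec * 1000 +
	       (unsigned long long)tv->tv_usec / 1000;
}

/* Current time in milliseconds, from one gettimeofday() sample. */
static unsigned long long now_ms(void)
{
	struct timeval now;

	gettimeofday(&now, NULL);
	return timeval_to_ms(&now);
}
```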

* Recent changes (master)
@ 2016-05-26 12:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-05-26 12:00 UTC (permalink / raw)
  To: fio

The following changes since commit 97e1fe78db572a48a44c2a8511f8393a8643fc28:

  Fio 2.11 (2016-05-24 18:42:04 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 34febb23fa9c7b9b0d54c324effff1a808a8fe6e:

  mutex: abstract out cond/lock pshared init (2016-05-25 13:55:48 -0600)

----------------------------------------------------------------
Jan Kara (2):
      fio: Simplify forking of processes
      Fix occasional hangs on mutexes

Jens Axboe (2):
      hash: make 64-bit even on 32-bit
      mutex: abstract out cond/lock pshared init

 backend.c       | 61 +++++++++++++--------------------------------
 hash.h          |  6 ++---
 helper_thread.c |  7 ++++--
 iolog.c         |  2 +-
 mutex.c         | 77 ++++++++++++++++++++++++++++++++++++++++++++-------------
 mutex.h         |  4 +++
 workqueue.c     | 18 ++++++++++----
 7 files changed, 103 insertions(+), 72 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index f132222..d8f4f4c 100644
--- a/backend.c
+++ b/backend.c
@@ -1428,7 +1428,6 @@ static void *thread_main(void *data)
 	struct thread_data *td = fd->td;
 	struct thread_options *o = &td->o;
 	struct sk_out *sk_out = fd->sk_out;
-	pthread_condattr_t attr;
 	int clear_state;
 	int ret;
 
@@ -1453,12 +1452,18 @@ static void *thread_main(void *data)
 	INIT_FLIST_HEAD(&td->verify_list);
 	INIT_FLIST_HEAD(&td->trim_list);
 	INIT_FLIST_HEAD(&td->next_rand_list);
-	pthread_mutex_init(&td->io_u_lock, NULL);
 	td->io_hist_tree = RB_ROOT;
 
-	pthread_condattr_init(&attr);
-	pthread_cond_init(&td->verify_cond, &attr);
-	pthread_cond_init(&td->free_cond, &attr);
+	ret = mutex_cond_init_pshared(&td->io_u_lock, &td->free_cond);
+	if (ret) {
+		td_verror(td, ret, "mutex_cond_init_pshared");
+		goto err;
+	}
+	ret = cond_init_pshared(&td->verify_cond);
+	if (ret) {
+		td_verror(td, ret, "cond_init_pshared");
+		goto err;
+	}
 
 	td_set_runstate(td, TD_INITIALIZED);
 	dprint(FD_MUTEX, "up startup_mutex\n");
@@ -1794,39 +1799,6 @@ err:
 	return (void *) (uintptr_t) td->error;
 }
 
-
-/*
- * We cannot pass the td data into a forked process, so attach the td and
- * pass it to the thread worker.
- */
-static int fork_main(struct sk_out *sk_out, int shmid, int offset)
-{
-	struct fork_data *fd;
-	void *data, *ret;
-
-#if !defined(__hpux) && !defined(CONFIG_NO_SHM)
-	data = shmat(shmid, NULL, 0);
-	if (data == (void *) -1) {
-		int __err = errno;
-
-		perror("shmat");
-		return __err;
-	}
-#else
-	/*
-	 * HP-UX inherits shm mappings?
-	 */
-	data = threads;
-#endif
-
-	fd = calloc(1, sizeof(*fd));
-	fd->td = data + offset * sizeof(struct thread_data);
-	fd->sk_out = sk_out;
-	ret = thread_main(fd);
-	shmdt(data);
-	return (int) (uintptr_t) ret;
-}
-
 static void dump_td_info(struct thread_data *td)
 {
 	log_err("fio: job '%s' (state=%d) hasn't exited in %lu seconds, it "
@@ -2164,6 +2136,7 @@ reap:
 		struct thread_data *map[REAL_MAX_JOBS];
 		struct timeval this_start;
 		int this_jobs = 0, left;
+		struct fork_data *fd;
 
 		/*
 		 * create threads (TD_NOT_CREATED -> TD_CREATED)
@@ -2213,14 +2186,13 @@ reap:
 			map[this_jobs++] = td;
 			nr_started++;
 
+			fd = calloc(1, sizeof(*fd));
+			fd->td = td;
+			fd->sk_out = sk_out;
+
 			if (td->o.use_thread) {
-				struct fork_data *fd;
 				int ret;
 
-				fd = calloc(1, sizeof(*fd));
-				fd->td = td;
-				fd->sk_out = sk_out;
-
 				dprint(FD_PROCESS, "will pthread_create\n");
 				ret = pthread_create(&td->thread, NULL,
 							thread_main, fd);
@@ -2240,8 +2212,9 @@ reap:
 				dprint(FD_PROCESS, "will fork\n");
 				pid = fork();
 				if (!pid) {
-					int ret = fork_main(sk_out, shm_id, i);
+					int ret;
 
+					ret = (int)(uintptr_t)thread_main(fd);
 					_exit(ret);
 				} else if (i == fio_debug_jobno)
 					*fio_debug_jobp = pid;
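
The fork simplification relies on fork() giving the child a copy of the parent's address space. The calloc()ed fork_data and the td pointer it holds are valid in the child as-is. A shared memory segment the parent attached before forking also stays attached in the child, so re-attaching it with shmat() is no longer needed. Below is a minimal sketch of that pattern. It assumes a POSIX system with MAP_ANONYMOUS, and struct job, struct fork_arg, worker() and run_in_child() are hypothetical stand-ins for thread_data, fork_data and thread_main().

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for thread_data and fork_data. */
struct job {
	int id;
	int result;
};

struct fork_arg {
	struct job *job;
};

/* Plays the role of thread_main(): runs in the child, writes its result. */
static int worker(struct fork_arg *arg)
{
	arg->job->result = arg->job->id * 2;
	return 0;
}

/* Fork a child that runs worker(); return the result the parent sees, or -1. */
static int run_in_child(int id)
{
	struct fork_arg *arg;
	struct job *job;
	int status, result = -1;
	pid_t pid;

	/* Shared mapping: the child's writes are visible to the parent. */
	job = mmap(NULL, sizeof(*job), PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (job == MAP_FAILED)
		return -1;
	job->id = id;
	job->result = 0;

	/* Private heap allocation: the child simply inherits a copy. */
	arg = calloc(1, sizeof(*arg));
	if (!arg) {
		munmap(job, sizeof(*job));
		return -1;
	}
	arg->job = job;

	pid = fork();
	if (pid == 0)
		_exit(worker(arg));	/* child: arg and job are inherited */

	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0)
		result = job->result;

	free(arg);
	munmap(job, sizeof(*job));
	return result;
}
```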
diff --git a/hash.h b/hash.h
index 1d7608b..d227b93 100644
--- a/hash.h
+++ b/hash.h
@@ -44,15 +44,15 @@
 #define GOLDEN_RATIO_32 0x61C88647
 #define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
-static inline unsigned long __hash_long(unsigned long val)
+static inline unsigned long __hash_long(uint64_t val)
 {
-	unsigned long hash = val;
+	uint64_t hash = val;
 
 #if BITS_PER_LONG == 64
 	hash *= GOLDEN_RATIO_64;
 #else
 	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
-	unsigned long n = hash;
+	uint64_t n = hash;
 	n <<= 18;
 	hash -= n;
 	n <<= 33;
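
On a 32-bit build unsigned long is only 32 bits wide, so the old __hash_long() truncated its input. It could therefore hash the same value differently than a 64-bit build. Switching the argument and intermediates to uint64_t makes the multiply 64-bit everywhere. Below is a sketch of a multiplicative hash in the style of the Linux kernel's hash_64(), which keeps the top bits of val * GOLDEN_RATIO_64. hash_u64() is a hypothetical name. The final shift by (64 - bits) is an assumption about how the result is reduced, since the hunk above is cut off before that point.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Multiplicative hash: keep the top 'bits' bits of val * GOLDEN_RATIO_64.
 * All arithmetic is on uint64_t, so the result does not depend on the
 * width of unsigned long.
 */
static inline uint32_t hash_u64(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

Bits above bit 31 of the input affect the result. For example, hashing 1 << 32 yields the low word of the golden ratio constant rather than 0, which is what a truncated 32-bit input would give.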
diff --git a/helper_thread.c b/helper_thread.c
index 1befabf..e788af5 100644
--- a/helper_thread.c
+++ b/helper_thread.c
@@ -148,8 +148,11 @@ int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
 	setup_disk_util();
 
 	hd->sk_out = sk_out;
-	pthread_cond_init(&hd->cond, NULL);
-	pthread_mutex_init(&hd->lock, NULL);
+
+	ret = mutex_cond_init_pshared(&hd->lock, &hd->cond);
+	if (ret)
+		return 1;
+
 	hd->startup_mutex = startup_mutex;
 
 	ret = pthread_create(&hd->thread, NULL, helper_thread_main, hd);
diff --git a/iolog.c b/iolog.c
index d9a17a5..9391507 100644
--- a/iolog.c
+++ b/iolog.c
@@ -604,7 +604,7 @@ void setup_log(struct io_log **log, struct log_params *p,
 	if (l->log_gz && !p->td)
 		l->log_gz = 0;
 	else if (l->log_gz || l->log_gz_store) {
-		pthread_mutex_init(&l->chunk_lock, NULL);
+		mutex_init_pshared(&l->chunk_lock);
 		p->td->flags |= TD_F_COMPRESS_LOG;
 	}
 
diff --git a/mutex.c b/mutex.c
index 16107dd..7580922 100644
--- a/mutex.c
+++ b/mutex.c
@@ -30,16 +30,39 @@ void fio_mutex_remove(struct fio_mutex *mutex)
 	munmap((void *) mutex, sizeof(*mutex));
 }
 
-int __fio_mutex_init(struct fio_mutex *mutex, int value)
+int cond_init_pshared(pthread_cond_t *cond)
 {
-	pthread_mutexattr_t attr;
-	pthread_condattr_t cond;
+	pthread_condattr_t cattr;
 	int ret;
 
-	mutex->value = value;
-	mutex->magic = FIO_MUTEX_MAGIC;
+	ret = pthread_condattr_init(&cattr);
+	if (ret) {
+		log_err("pthread_condattr_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+#ifdef FIO_HAVE_PSHARED_MUTEX
+	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
+	if (ret) {
+		log_err("pthread_condattr_setpshared: %s\n", strerror(ret));
+		return ret;
+	}
+#endif
+	ret = pthread_cond_init(cond, &cattr);
+	if (ret) {
+		log_err("pthread_cond_init: %s\n", strerror(ret));
+		return ret;
+	}
+
+	return 0;
+}
 
-	ret = pthread_mutexattr_init(&attr);
+int mutex_init_pshared(pthread_mutex_t *mutex)
+{
+	pthread_mutexattr_t mattr;
+	int ret;
+
+	ret = pthread_mutexattr_init(&mattr);
 	if (ret) {
 		log_err("pthread_mutexattr_init: %s\n", strerror(ret));
 		return ret;
@@ -49,27 +72,47 @@ int __fio_mutex_init(struct fio_mutex *mutex, int value)
 	 * Not all platforms support process shared mutexes (FreeBSD)
 	 */
 #ifdef FIO_HAVE_PSHARED_MUTEX
-	ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
+	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
 	if (ret) {
 		log_err("pthread_mutexattr_setpshared: %s\n", strerror(ret));
 		return ret;
 	}
 #endif
-
-	pthread_condattr_init(&cond);
-#ifdef FIO_HAVE_PSHARED_MUTEX
-	pthread_condattr_setpshared(&cond, PTHREAD_PROCESS_SHARED);
-#endif
-	pthread_cond_init(&mutex->cond, &cond);
-
-	ret = pthread_mutex_init(&mutex->lock, &attr);
+	ret = pthread_mutex_init(mutex, &mattr);
 	if (ret) {
 		log_err("pthread_mutex_init: %s\n", strerror(ret));
 		return ret;
 	}
 
-	pthread_condattr_destroy(&cond);
-	pthread_mutexattr_destroy(&attr);
+	return 0;
+}
+
+int mutex_cond_init_pshared(pthread_mutex_t *mutex, pthread_cond_t *cond)
+{
+	int ret;
+
+	ret = mutex_init_pshared(mutex);
+	if (ret)
+		return ret;
+
+	ret = cond_init_pshared(cond);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+int __fio_mutex_init(struct fio_mutex *mutex, int value)
+{
+	int ret;
+
+	mutex->value = value;
+	mutex->magic = FIO_MUTEX_MAGIC;
+
+	ret = mutex_cond_init_pshared(&mutex->lock, &mutex->cond);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
diff --git a/mutex.h b/mutex.h
index 8c1a711..54009ba 100644
--- a/mutex.h
+++ b/mutex.h
@@ -40,4 +40,8 @@ extern void fio_rwlock_unlock(struct fio_rwlock *);
 extern struct fio_rwlock *fio_rwlock_init(void);
 extern void fio_rwlock_remove(struct fio_rwlock *);
 
+extern int mutex_init_pshared(pthread_mutex_t *);
+extern int cond_init_pshared(pthread_cond_t *);
+extern int mutex_cond_init_pshared(pthread_mutex_t *, pthread_cond_t *);
+
 #endif
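
The new helpers set PTHREAD_PROCESS_SHARED on both the mutex and the condition variable. That setting lets the objects be used from several processes when they live in shared memory, for example a structure inherited across fork(). Below is a minimal sketch of the same pattern. It assumes a platform that supports process-shared pthread objects, which the diff guards with FIO_HAVE_PSHARED_MUTEX because, per the source comment, not all platforms do (FreeBSD). Unlike cond_init_pshared() and mutex_init_pshared() above, it also destroys the attribute objects once the mutex and condition variable are initialized. init_pshared_pair(), struct shared and handshake() are hypothetical names.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Initialize a mutex/cond pair usable across processes; 0 on success. */
static int init_pshared_pair(pthread_mutex_t *m, pthread_cond_t *c)
{
	pthread_mutexattr_t mattr;
	pthread_condattr_t cattr;
	int ret;

	ret = pthread_mutexattr_init(&mattr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
	if (!ret)
		ret = pthread_mutex_init(m, &mattr);
	/* The attribute object is not needed once the mutex exists. */
	pthread_mutexattr_destroy(&mattr);
	if (ret)
		return ret;

	ret = pthread_condattr_init(&cattr);
	if (ret) {
		pthread_mutex_destroy(m);
		return ret;
	}
	ret = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
	if (!ret)
		ret = pthread_cond_init(c, &cattr);
	pthread_condattr_destroy(&cattr);
	if (ret)
		pthread_mutex_destroy(m);

	return ret;
}

struct shared {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int ready;
};

/* Child signals the parent through a shared mutex/cond; returns 'ready' or -1. */
static int handshake(void)
{
	struct shared *s;
	int seen = -1;
	pid_t pid;

	s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (s == MAP_FAILED)
		return -1;
	s->ready = 0;
	if (init_pshared_pair(&s->lock, &s->cond)) {
		munmap(s, sizeof(*s));
		return -1;
	}

	pid = fork();
	if (pid == 0) {
		pthread_mutex_lock(&s->lock);
		s->ready = 1;
		pthread_cond_signal(&s->cond);
		pthread_mutex_unlock(&s->lock);
		_exit(0);
	}
	if (pid > 0) {
		pthread_mutex_lock(&s->lock);
		while (!s->ready)
			pthread_cond_wait(&s->cond, &s->lock);
		seen = s->ready;
		pthread_mutex_unlock(&s->lock);
		waitpid(pid, NULL, 0);
	}

	pthread_cond_destroy(&s->cond);
	pthread_mutex_destroy(&s->lock);
	munmap(s, sizeof(*s));
	return seen;
}
```

On older glibc versions this may need to be built with -pthread.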
diff --git a/workqueue.c b/workqueue.c
index 4f9c414..2e01b58 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -278,8 +278,11 @@ static int start_worker(struct workqueue *wq, unsigned int index,
 	int ret;
 
 	INIT_FLIST_HEAD(&sw->work_list);
-	pthread_cond_init(&sw->cond, NULL);
-	pthread_mutex_init(&sw->lock, NULL);
+
+	ret = mutex_cond_init_pshared(&sw->lock, &sw->cond);
+	if (ret)
+		return ret;
+
 	sw->wq = wq;
 	sw->index = index;
 	sw->sk_out = sk_out;
@@ -308,15 +311,20 @@ int workqueue_init(struct thread_data *td, struct workqueue *wq,
 {
 	unsigned int running;
 	int i, error;
+	int ret;
 
 	wq->max_workers = max_workers;
 	wq->td = td;
 	wq->ops = *ops;
 	wq->work_seq = 0;
 	wq->next_free_worker = 0;
-	pthread_cond_init(&wq->flush_cond, NULL);
-	pthread_mutex_init(&wq->flush_lock, NULL);
-	pthread_mutex_init(&wq->stat_lock, NULL);
+
+	ret = mutex_cond_init_pshared(&wq->flush_lock, &wq->flush_cond);
+	if (ret)
+		goto err;
+	ret = mutex_init_pshared(&wq->stat_lock);
+	if (ret)
+		goto err;
 
 	wq->workers = smalloc(wq->max_workers * sizeof(struct submit_worker));
 

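The refactor above splits initialization into `mutex_init_pshared()`, `cond_init_pshared()` and `mutex_cond_init_pshared()`, and the workqueue now calls them instead of default-attribute `pthread_*_init()`. This hunk shows only the tail of the new code, so the following is a minimal standalone sketch of the same pattern. It assumes a POSIX system that supports `PTHREAD_PROCESS_SHARED`, and the `sketch_*` names are hypothetical, not fio's.

```c
#include <assert.h>
#include <pthread.h>

/* Initialize a mutex that threads in different processes may share. */
static int sketch_mutex_init_pshared(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret;

	assert(m != NULL);
	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);
	/* The attribute object is no longer needed once the mutex exists. */
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Same idea for a condition variable. */
static int sketch_cond_init_pshared(pthread_cond_t *c)
{
	pthread_condattr_t attr;
	int ret;

	assert(c != NULL);
	ret = pthread_condattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (!ret)
		ret = pthread_cond_init(c, &attr);
	pthread_condattr_destroy(&attr);
	return ret;
}

/*
 * Initialize a lock/condvar pair. Returns 0 or an errno value (usable
 * with strerror()); on failure neither object is left initialized.
 */
static int sketch_mutex_cond_init_pshared(pthread_mutex_t *m,
					  pthread_cond_t *c)
{
	int ret = sketch_mutex_init_pshared(m);

	if (ret)
		return ret;
	ret = sketch_cond_init_pshared(c);
	if (ret)
		pthread_mutex_destroy(m);
	return ret;
}
```

One deliberate difference from the patch: if the condition variable fails to initialize, the sketch destroys the mutex it already created, so callers never see a half-initialized pair. Process-shared objects only pay off when they live in memory mapped into every participating process (for example an `mmap(MAP_SHARED)` region).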

* Recent changes (master)
@ 2016-05-25 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-25 12:00 UTC
  To: fio


The following changes since commit bbcacb72ac5f81b77a96981e6d00d9134360e7c5:

  parse: warn if option is missing a long option variant (2016-05-23 10:39:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 97e1fe78db572a48a44c2a8511f8393a8643fc28:

  Fio 2.11 (2016-05-24 18:42:04 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      backend: regrow logs for sync IO engines as well
      Fio 2.11

Mark Nelson (1):
      remove numpy and scipy dependency

Martin Steigerwald (5):
      Spelling fix. Reported by Debian’s lintian.
      Spelling fix. Reported by Debian’s lintian.
      Spelling fix. Reported by Debian’s lintian.
      Spelling fix.
      Spelling fix. Reported by Debian’s lintian.

 FIO-VERSION-GEN        |  2 +-
 backend.c              |  3 +++
 fio.1                  |  2 +-
 gettime-thread.c       |  2 +-
 idletime.c             |  2 +-
 memory.c               |  2 +-
 options.c              |  2 +-
 os/windows/install.wxs |  2 +-
 tools/fiologparser.py  | 35 ++++++++++++++++++++---------------
 9 files changed, 30 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index d1ba7ca..ea65ea8 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.10
+DEF_VER=fio-2.11
 
 LF='
 '
diff --git a/backend.c b/backend.c
index 6d50360..f132222 100644
--- a/backend.c
+++ b/backend.c
@@ -524,6 +524,9 @@ sync_done:
 				break;
 		}
 
+		if (td->flags & TD_F_REGROW_LOGS)
+			regrow_logs(td);
+
 		/*
 		 * when doing I/O (not when verifying),
 		 * check for any errors that are to be ignored
diff --git a/fio.1 b/fio.1
index 839a359..f521c9d 100644
--- a/fio.1
+++ b/fio.1
@@ -2087,7 +2087,7 @@ This format is not supported in Fio versions => 1.20-rc3.
 .B Trace file format v2
 .RS
 The second version of the trace file format was added in Fio version 1.17.
-It allows to access more then one file per trace and has a bigger set of
+It allows one to access more then one file per trace and has a bigger set of
 possible file actions.
 
 The first line of the trace file has to be:
diff --git a/gettime-thread.c b/gettime-thread.c
index 9bf85f0..6dc1486 100644
--- a/gettime-thread.c
+++ b/gettime-thread.c
@@ -81,7 +81,7 @@ int fio_start_gtod_thread(void)
 
 	ret = pthread_detach(gtod_thread);
 	if (ret) {
-		log_err("Can't detatch gtod thread: %s\n", strerror(ret));
+		log_err("Can't detach gtod thread: %s\n", strerror(ret));
 		goto err;
 	}
 
diff --git a/idletime.c b/idletime.c
index fab43c5..4c00d80 100644
--- a/idletime.c
+++ b/idletime.c
@@ -260,7 +260,7 @@ void fio_idle_prof_init(void)
 
 		if ((ret = pthread_detach(ipt->thread))) {
 			/* log error and let the thread spin */
-			log_err("fio: pthread_detatch %s\n", strerror(ret));
+			log_err("fio: pthread_detach %s\n", strerror(ret));
 		}
 	}
 
diff --git a/memory.c b/memory.c
index c04d7df..af4d5ef 100644
--- a/memory.c
+++ b/memory.c
@@ -89,7 +89,7 @@ static int alloc_mem_shm(struct thread_data *td, unsigned int total_mem)
 					" support huge pages.\n");
 			} else if (errno == ENOMEM) {
 				log_err("fio: no huge pages available, do you"
-					" need to alocate some? See HOWTO.\n");
+					" need to allocate some? See HOWTO.\n");
 			}
 		}
 
diff --git a/options.c b/options.c
index 7f0f2c0..3360784 100644
--- a/options.c
+++ b/options.c
@@ -2023,7 +2023,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			  },
 			  { .ival = "normal",
 			    .oval = FIO_RAND_DIST_GAUSS,
-			    .help = "Normal (gaussian) distribution",
+			    .help = "Normal (Gaussian) distribution",
 			  },
 			  { .ival = "zoned",
 			    .oval = FIO_RAND_DIST_ZONED,
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 8ae1394..45084e6 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.10">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.11">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 00e4d30..685f419 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -14,8 +14,7 @@
 # to see per-interval average completion latency.
 
 import argparse
-import numpy
-import scipy
+import math
 
 def parse_args():
     parser = argparse.ArgumentParser()
@@ -82,7 +81,6 @@ def print_averages(ctx, series):
 # to debug this routine, use
 #   # sort -n -t ',' -k 2 small.log
 # on your input.
-# Sometimes scipy interpolates between two values to get a percentile
 
 def my_extend( vlist, val ):
     vlist.extend(val)
@@ -102,21 +100,16 @@ def print_all_stats(ctx, series):
         for sample_array in sample_arrays:
             samplevalue_arrays.append( 
                 [ sample.value for sample in sample_array ] )
-        #print('samplevalue_arrays len: %d' % len(samplevalue_arrays))
-        #print('samplevalue_arrays elements len: ' + \
-               #str(map( lambda l: len(l), samplevalue_arrays)))
         # collapse list of lists of sample values into list of sample values
         samplevalues = reduce( array_collapser, samplevalue_arrays, [] )
-        #print('samplevalues: ' + str(sorted(samplevalues)))
         # compute all stats and print them
-        myarray = scipy.fromiter(samplevalues, float)
-        mymin = scipy.amin(myarray)
-        myavg = scipy.average(myarray)
-        mymedian = scipy.median(myarray)
-        my90th = scipy.percentile(myarray, 90)
-        my95th = scipy.percentile(myarray, 95)
-        my99th = scipy.percentile(myarray, 99)
-        mymax = scipy.amax(myarray)
+        mymin = min(samplevalues)
+        myavg = sum(samplevalues) / float(len(samplevalues))
+        mymedian = median(samplevalues)
+        my90th = percentile(samplevalues, 0.90) 
+        my95th = percentile(samplevalues, 0.95)
+        my99th = percentile(samplevalues, 0.99)
+        mymax = max(samplevalues)
         print( '%f, %d, %f, %f, %f, %f, %f, %f, %f' % (
             start, len(samplevalues), 
             mymin, myavg, mymedian, my90th, my95th, my99th, mymax))
@@ -125,6 +118,18 @@ def print_all_stats(ctx, series):
         start += ctx.interval
         end += ctx.interval
 
+def median(values):
+    s=sorted(values)
+    return float(s[(len(s)-1)/2]+s[(len(s)/2)])/2
+
+def percentile(values, p):
+    s = sorted(values)
+    k = (len(s)-1) * p
+    f = math.floor(k)
+    c = math.ceil(k)
+    if f == c:
+        return s[int(k)]
+    return (s[int(f)] * (c-k)) + (s[int(c)] * (k-f))
 
 def print_default(ctx, series):
     ftime = get_ftime(series)

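The rewritten `tools/fiologparser.py` drops numpy/scipy in favor of hand-written `median()` and `percentile()`; the latter interpolates linearly between the two closest ranks at `k = (n - 1) * p`. Below is a minimal C sketch of the same two formulas (function names are hypothetical) that sorts once and reuses the sorted array, whereas the Python helpers re-sort on every call. Note that the script as shown assumes Python 2: `reduce()` is used without an import, and `median()` indexes with `/`, which yields a float under Python 3 (`//` would be needed there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Sort once; every statistic below expects an ascending array. */
void sort_samples(double *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_double);
}

/*
 * Linear interpolation between the two closest ranks, k = (n - 1) * p,
 * as in the Python percentile() above. p is a fraction in [0, 1].
 */
double percentile_sorted(const double *s, size_t n, double p)
{
	double k;
	size_t lo, hi;

	assert(n > 0 && p >= 0.0 && p <= 1.0);
	k = (double)(n - 1) * p;
	lo = (size_t)k;				/* floor(k), since k >= 0 */
	hi = (k > (double)lo) ? lo + 1 : lo;	/* ceil(k) */
	if (lo == hi)
		return s[lo];
	return s[lo] * ((double)hi - k) + s[hi] * (k - (double)lo);
}

/* Mean of the two middle samples; they coincide when n is odd. */
double median_sorted(const double *s, size_t n)
{
	assert(n > 0);
	return (s[(n - 1) / 2] + s[n / 2]) / 2.0;
}
```

For the samples 1, 2, 3, 4 the 50th percentile lands at k = 1.5, halfway between 2 and 3, which matches the median of 2.5.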

* Recent changes (master)
@ 2016-05-24 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-24 12:00 UTC
  To: fio

The following changes since commit ac32c6a20a37e850d721aee082493838b184fed1:

  Fio 2.10 (2016-05-21 09:00:54 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bbcacb72ac5f81b77a96981e6d00d9134360e7c5:

  parse: warn if option is missing a long option variant (2016-05-23 10:39:16 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      options: add 'unique_filename'
      cconv: wire up conversion of unique_filename
      options: add missing long option names
      parse: warn if option is missing a long option variant

 HOWTO            |  5 +++++
 cconv.c          |  2 ++
 filesetup.c      |  3 ++-
 fio.1            |  5 +++++
 options.c        | 29 +++++++++++++++++++++++++++--
 options.h        |  2 +-
 parse.c          |  2 ++
 server.h         |  2 +-
 thread_options.h |  8 +++++---
 9 files changed, 50 insertions(+), 8 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 9ed2c5f..cec4e42 100644
--- a/HOWTO
+++ b/HOWTO
@@ -374,6 +374,11 @@ filename_format=str
 		default of $jobname.$jobnum.$filenum will be used if
 		no other format specifier is given.
 
+unique_filename=bool	To avoid collisions between networked clients, fio
+		defaults to prefixing any generated filenames (with a directory
+		specified) with the source of the client connecting. To disable
+		this behavior, set this option to 0.
+
 opendir=str	Tell fio to recursively add any file it can find in this
 		directory and down the file system tree.
 
diff --git a/cconv.c b/cconv.c
index 0c3a36c..ac826a3 100644
--- a/cconv.c
+++ b/cconv.c
@@ -139,6 +139,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 
 	o->ratecycle = le32_to_cpu(top->ratecycle);
 	o->io_submit_mode = le32_to_cpu(top->io_submit_mode);
+	o->unique_filename = le32_to_cpu(top->unique_filename);
 	o->nr_files = le32_to_cpu(top->nr_files);
 	o->open_files = le32_to_cpu(top->open_files);
 	o->file_lock_mode = le32_to_cpu(top->file_lock_mode);
@@ -333,6 +334,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->ratecycle = cpu_to_le32(o->ratecycle);
 	top->io_submit_mode = cpu_to_le32(o->io_submit_mode);
 	top->nr_files = cpu_to_le32(o->nr_files);
+	top->unique_filename = cpu_to_le32(o->unique_filename);
 	top->open_files = cpu_to_le32(o->open_files);
 	top->file_lock_mode = cpu_to_le32(o->file_lock_mode);
 	top->odirect = cpu_to_le32(o->odirect);
diff --git a/filesetup.c b/filesetup.c
index f721c36..012773b 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1335,7 +1335,8 @@ int add_file(struct thread_data *td, const char *fname, int numjob, int inc)
 	dprint(FD_FILE, "add file %s\n", fname);
 
 	if (td->o.directory)
-		len = set_name_idx(file_name, PATH_MAX, td->o.directory, numjob);
+		len = set_name_idx(file_name, PATH_MAX, td->o.directory, numjob,
+					td->o.unique_filename);
 
 	sprintf(file_name + len, "%s", fname);
 
diff --git a/fio.1 b/fio.1
index 5e4cd4f..839a359 100644
--- a/fio.1
+++ b/fio.1
@@ -247,6 +247,11 @@ will be used if no other format specifier is given.
 .RE
 .P
 .TP
+.BI unique_filename \fR=\fPbool
+To avoid collisions between networked clients, fio defaults to prefixing
+any generated filenames (with a directory specified) with the source of
+the client connecting. To disable this behavior, set this option to 0.
+.TP
 .BI lockfile \fR=\fPstr
 Fio defaults to not locking any files before it does IO to them. If a file or
 file descriptor is shared, fio can serialize IO to that file to make the end
diff --git a/options.c b/options.c
index 07589c4..7f0f2c0 100644
--- a/options.c
+++ b/options.c
@@ -1124,7 +1124,8 @@ static int get_max_name_idx(char *input)
  * Returns the directory at the index, indexes > entires will be
  * assigned via modulo division of the index
  */
-int set_name_idx(char *target, size_t tlen, char *input, int index)
+int set_name_idx(char *target, size_t tlen, char *input, int index,
+		 bool unique_filename)
 {
 	unsigned int cur_idx;
 	int len;
@@ -1136,7 +1137,7 @@ int set_name_idx(char *target, size_t tlen, char *input, int index)
 	for (cur_idx = 0; cur_idx <= index; cur_idx++)
 		fname = get_next_name(&str);
 
-	if (client_sockaddr_str[0]) {
+	if (client_sockaddr_str[0] && unique_filename) {
 		len = snprintf(target, tlen, "%s/%s.", fname,
 				client_sockaddr_str);
 	} else
@@ -1390,6 +1391,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "filename_format",
+		.lname	= "Filename Format",
 		.type	= FIO_OPT_STR_STORE,
 		.off1	= td_var_offset(filename_format),
 		.prio	= -1, /* must come after "directory" */
@@ -1399,6 +1401,16 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_FILENAME,
 	},
 	{
+		.name	= "unique_filename",
+		.lname	= "Unique Filename",
+		.type	= FIO_OPT_BOOL,
+		.off1	= td_var_offset(unique_filename),
+		.help	= "For network clients, prefix file with source IP",
+		.def	= "1",
+		.category = FIO_OPT_C_FILE,
+		.group	= FIO_OPT_G_FILENAME,
+	},
+	{
 		.name	= "lockfile",
 		.lname	= "Lockfile",
 		.type	= FIO_OPT_STR,
@@ -1965,6 +1977,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "random_generator",
+		.lname	= "Random Generator",
 		.type	= FIO_OPT_STR,
 		.off1	= td_var_offset(random_generator),
 		.help	= "Type of random number generator to use",
@@ -1989,6 +2002,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "random_distribution",
+		.lname	= "Random Distribution",
 		.type	= FIO_OPT_STR,
 		.off1	= td_var_offset(random_distribution),
 		.cb	= str_random_distribution_cb,
@@ -2044,6 +2058,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "allrandrepeat",
+		.lname	= "All Random Repeat",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(allrand_repeatable),
 		.help	= "Use repeatable random numbers for everything",
@@ -2543,6 +2558,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "verifysort_nr",
+		.lname	= "Verify Sort Nr",
 		.type	= FIO_OPT_INT,
 		.off1	= td_var_offset(verifysort_nr),
 		.help	= "Pre-load and sort verify blocks for a read workload",
@@ -2664,6 +2680,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #endif
 	{
 		.name	= "experimental_verify",
+		.lname	= "Experimental Verify",
 		.off1	= td_var_offset(experimental_verify),
 		.type	= FIO_OPT_BOOL,
 		.help	= "Enable experimental verification",
@@ -3078,6 +3095,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "max_latency",
+		.lname	= "Max Latency",
 		.type	= FIO_OPT_INT,
 		.off1	= td_var_offset(max_latency),
 		.help	= "Maximum tolerated IO latency (usec)",
@@ -3172,6 +3190,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "create_only",
+		.lname	= "Create Only",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(create_only),
 		.help	= "Only perform file creation phase",
@@ -3254,6 +3273,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 #ifdef CONFIG_LIBNUMA
 	{
 		.name	= "numa_cpu_nodes",
+		.lname	= "NUMA CPU Nodes",
 		.type	= FIO_OPT_STR,
 		.cb	= str_numa_cpunodes_cb,
 		.off1	= td_var_offset(numa_cpunodes),
@@ -3263,6 +3283,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "numa_mem_policy",
+		.lname	= "NUMA Memory Policy",
 		.type	= FIO_OPT_STR,
 		.cb	= str_numa_mpol_cb,
 		.off1	= td_var_offset(numa_memnodes),
@@ -3353,6 +3374,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "per_job_logs",
+		.lname	= "Per Job Logs",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(per_job_logs),
 		.help	= "Include job number in generated log files or not",
@@ -3683,6 +3705,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "unified_rw_reporting",
+		.lname	= "Unified RW Reporting",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(unified_rw_rep),
 		.help	= "Unify reporting across data direction",
@@ -3736,6 +3759,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "ignore_error",
+		.lname	= "Ignore Error",
 		.type	= FIO_OPT_STR,
 		.cb	= str_ignore_error_cb,
 		.off1	= td_var_offset(ignore_error_nr),
@@ -3746,6 +3770,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	},
 	{
 		.name	= "error_dump",
+		.lname	= "Error Dump",
 		.type	= FIO_OPT_BOOL,
 		.off1	= td_var_offset(error_dump),
 		.def	= "0",
diff --git a/options.h b/options.h
index 6a5db07..4727bac 100644
--- a/options.h
+++ b/options.h
@@ -20,7 +20,7 @@ void del_opt_posval(const char *, const char *);
 struct thread_data;
 void fio_options_free(struct thread_data *);
 char *get_name_idx(char *, int);
-int set_name_idx(char *, size_t, char *, int);
+int set_name_idx(char *, size_t, char *, int, bool);
 
 extern char client_sockaddr_str[];  /* used with --client option */
 
diff --git a/parse.c b/parse.c
index ec0f870..963f1f8 100644
--- a/parse.c
+++ b/parse.c
@@ -1234,6 +1234,8 @@ void option_init(struct fio_option *o)
 {
 	if (o->type == FIO_OPT_DEPRECATED)
 		return;
+	if (o->name && !o->lname)
+		log_err("Option %s: missing long option name\n", o->name);
 	if (o->type == FIO_OPT_BOOL) {
 		o->minval = 0;
 		o->maxval = 1;
diff --git a/server.h b/server.h
index 7fc3ec6..79c751d 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 53,
+	FIO_SERVER_VER			= 54,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 10d7ba6..edf090d 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -65,6 +65,8 @@ struct thread_options {
 	unsigned int iodepth_batch_complete_min;
 	unsigned int iodepth_batch_complete_max;
 
+	unsigned int unique_filename;
+
 	unsigned long long size;
 	unsigned long long io_limit;
 	unsigned int size_percent;
@@ -325,6 +327,7 @@ struct thread_options_pack {
 	uint32_t size_percent;
 	uint32_t fill_device;
 	uint32_t file_append;
+	uint32_t unique_filename;
 	uint64_t file_size_low;
 	uint64_t file_size_high;
 	uint64_t start_offset;
@@ -388,6 +391,7 @@ struct thread_options_pack {
 	uint32_t bs_unaligned;
 	uint32_t fsync_on_close;
 	uint32_t bs_is_seq_rand;
+	uint32_t pad1;
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
@@ -411,7 +415,6 @@ struct thread_options_pack {
 	uint32_t fsync_blocks;
 	uint32_t fdatasync_blocks;
 	uint32_t barrier_blocks;
-	uint32_t pad1;
 	uint64_t start_delay;
 	uint64_t start_delay_high;
 	uint64_t timeout;
@@ -476,7 +479,6 @@ struct thread_options_pack {
 	uint64_t trim_backlog;
 	uint32_t clat_percentiles;
 	uint32_t percentile_precision;
-	uint32_t pad2;
 	fio_fp64_t percentile_list[FIO_IO_U_LIST_MAX_LEN];
 
 	uint8_t read_iolog_file[FIO_TOP_STR_MAX];
@@ -531,7 +533,7 @@ struct thread_options_pack {
 	uint64_t number_ios;
 
 	uint32_t sync_file_range;
-	uint32_t pad3;
+	uint32_t pad2;
 
 	uint64_t latency_target;
 	uint64_t latency_window;

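With `unique_filename` left at its default of 1, a client connected to a fio server gets its generated filenames (under `directory=`) prefixed with its source address; setting the option to 0 restores plain names. The hunk shows only the prefixed branch of `set_name_idx()`, so this sketch assumes the other branch simply writes `<dir>/`. `build_job_filename()` is a hypothetical helper, not fio code, and the paths in the checks are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: "<dir>/<client>.<name>" when a client address is
 * known and 'unique' is set, otherwise "<dir>/<name>" (assumed behavior).
 * Returns the length written, or -1 on error or truncation.
 */
static int build_job_filename(char *buf, size_t len, const char *dir,
			      const char *client, bool unique,
			      const char *name)
{
	int n;

	assert(buf && dir && name);
	if (client && client[0] && unique)
		n = snprintf(buf, len, "%s/%s.%s", dir, client, name);
	else
		n = snprintf(buf, len, "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

Unlike the `sprintf()` append in `add_file()`, the sketch checks the `snprintf()` result so a too-small buffer is reported instead of silently truncating the name.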

* Recent changes (master)
@ 2016-05-22 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-22 12:00 UTC
  To: fio

The following changes since commit 2ab71dc4d39e29764f0f80a3559a0119247e1eb1:

  iolog: fix potential oops in iolog disabling (2016-05-20 14:36:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ac32c6a20a37e850d721aee082493838b184fed1:

  Fio 2.10 (2016-05-21 09:00:54 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.10

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index fcdbd98..d1ba7ca 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.9
+DEF_VER=fio-2.10
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 44cc938..8ae1394 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.9">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.10">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"


* Recent changes (master)
@ 2016-05-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-21 12:00 UTC
  To: fio

The following changes since commit 6fa3ad511a61666b492ed8126330db9f876359bc:

  iolog: fix duplicate handling of compression end (2016-05-19 15:49:57 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2ab71dc4d39e29764f0f80a3559a0119247e1eb1:

  iolog: fix potential oops in iolog disabling (2016-05-20 14:36:34 -0600)

----------------------------------------------------------------
David Zeng (1):
      The fixed CPU architecture in the Makefile will make failure on ppc64le.

Jens Axboe (5):
      iolog: regrow log out-of-line
      iolog: remove dead define
      Merge branch 'master' of https://github.com/davidzengxhsh/fio
      iolog: fix two bugs in deferred growing
      iolog: fix potential oops in iolog disabling

 Makefile  |  2 +-
 backend.c |  6 ++++++
 fio.h     |  1 +
 iolog.c   | 16 +++++++++++++++
 iolog.h   |  9 ++++++--
 stat.c    | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 6 files changed, 98 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 0133ac4..108e6ee 100644
--- a/Makefile
+++ b/Makefile
@@ -49,7 +49,7 @@ SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
-  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/amd64/server -L$(JAVA_HOME)/jre/lib/amd64/server -ljvm $(FIO_LIBHDFS_LIB)/libhdfs.a
+  HDFSLIB= -Wl,-rpath $(JAVA_HOME)/jre/lib/`uname -m`/server -L$(JAVA_HOME)/jre/lib/`uname -m`/server -ljvm $(FIO_LIBHDFS_LIB)/libhdfs.a
   CFLAGS += $(HDFSFLAGS)
   SOURCE += engines/libhdfs.c
 endif
diff --git a/backend.c b/backend.c
index f830040..6d50360 100644
--- a/backend.c
+++ b/backend.c
@@ -441,6 +441,12 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time)
 	int min_evts = 0;
 	int ret;
 
+	if (td->flags & TD_F_REGROW_LOGS) {
+		ret = io_u_quiesce(td);
+		regrow_logs(td);
+		return ret;
+	}
+
 	/*
 	 * if the queue is full, we MUST reap at least 1 event
 	 */
diff --git a/fio.h b/fio.h
index 8b6a272..7e6311c 100644
--- a/fio.h
+++ b/fio.h
@@ -79,6 +79,7 @@ enum {
 	TD_F_NEED_LOCK		= 1U << 11,
 	TD_F_CHILD		= 1U << 12,
 	TD_F_NO_PROGRESS        = 1U << 13,
+	TD_F_REGROW_LOGS	= 1U << 14,
 };
 
 enum {
diff --git a/iolog.c b/iolog.c
index aec0881..d9a17a5 100644
--- a/iolog.c
+++ b/iolog.c
@@ -587,6 +587,15 @@ void setup_log(struct io_log **log, struct log_params *p,
 	l->filename = strdup(filename);
 	l->td = p->td;
 
+	if (l->td && l->td->o.io_submit_mode != IO_MODE_OFFLOAD) {
+		struct io_logs *p;
+
+		p = calloc(1, sizeof(*l->pending));
+		p->max_samples = DEF_LOG_ENTRIES;
+		p->log = calloc(p->max_samples, log_entry_sz(l));
+		l->pending = p;
+	}
+
 	if (l->log_offset)
 		l->log_ddir_mask = LOG_OFFSET_SAMPLE_BIT;
 
@@ -638,6 +647,13 @@ void free_log(struct io_log *log)
 		free(cur_log->log);
 	}
 
+	if (log->pending) {
+		free(log->pending->log);
+		free(log->pending);
+		log->pending = NULL;
+	}
+
+	free(log->pending);
 	free(log->filename);
 	sfree(log);
 }
diff --git a/iolog.h b/iolog.h
index 2b7813b..0da7067 100644
--- a/iolog.h
+++ b/iolog.h
@@ -44,8 +44,6 @@ enum {
 #define DEF_LOG_ENTRIES		1024
 #define MAX_LOG_ENTRIES		(1024 * DEF_LOG_ENTRIES)
 
-#define LOG_QUIESCE_SZ		(64 * 1024 * 1024)
-
 struct io_logs {
 	struct flist_head list;
 	uint64_t nr_samples;
@@ -63,6 +61,12 @@ struct io_log {
 	struct flist_head io_logs;
 	uint32_t cur_log_max;
 
+	/*
+	 * When the current log runs out of space, store events here until
+	 * we have a chance to regrow
+	 */
+	struct io_logs *pending;
+
 	unsigned int log_ddir_mask;
 
 	char *filename;
@@ -139,6 +143,7 @@ static inline struct io_sample *__get_sample(void *samples, int log_offset,
 
 struct io_logs *iolog_cur_log(struct io_log *);
 uint64_t iolog_nr_samples(struct io_log *);
+void regrow_logs(struct thread_data *);
 
 static inline struct io_sample *get_sample(struct io_log *iolog,
 					   struct io_logs *cur_log,
diff --git a/stat.c b/stat.c
index 5eb1aab..fc1efd4 100644
--- a/stat.c
+++ b/stat.c
@@ -1889,9 +1889,16 @@ static struct io_logs *get_new_log(struct io_log *iolog)
 	return NULL;
 }
 
-static struct io_logs *get_cur_log(struct io_log *iolog)
+/*
+ * Add and return a new log chunk, or return current log if big enough
+ */
+static struct io_logs *regrow_log(struct io_log *iolog)
 {
 	struct io_logs *cur_log;
+	int i;
+
+	if (!iolog || iolog->disabled)
+		goto disable;
 
 	cur_log = iolog_cur_log(iolog);
 	if (!cur_log) {
@@ -1918,13 +1925,70 @@ static struct io_logs *get_cur_log(struct io_log *iolog)
 	 * Get a new log array, and add to our list
 	 */
 	cur_log = get_new_log(iolog);
-	if (cur_log)
+	if (!cur_log) {
+		log_err("fio: failed extending iolog! Will stop logging.\n");
+		return NULL;
+	}
+
+	if (!iolog->pending || !iolog->pending->nr_samples)
 		return cur_log;
 
-	log_err("fio: failed extending iolog! Will stop logging.\n");
+	/*
+	 * Flush pending items to new log
+	 */
+	for (i = 0; i < iolog->pending->nr_samples; i++) {
+		struct io_sample *src, *dst;
+
+		src = get_sample(iolog, iolog->pending, i);
+		dst = get_sample(iolog, cur_log, i);
+		memcpy(dst, src, log_entry_sz(iolog));
+	}
+
+	iolog->pending->nr_samples = 0;
+	return cur_log;
+disable:
+	if (iolog)
+		iolog->disabled = true;
 	return NULL;
 }
 
+void regrow_logs(struct thread_data *td)
+{
+	regrow_log(td->slat_log);
+	regrow_log(td->clat_log);
+	regrow_log(td->lat_log);
+	regrow_log(td->bw_log);
+	regrow_log(td->iops_log);
+	td->flags &= ~TD_F_REGROW_LOGS;
+}
+
+static struct io_logs *get_cur_log(struct io_log *iolog)
+{
+	struct io_logs *cur_log;
+
+	cur_log = iolog_cur_log(iolog);
+	if (!cur_log) {
+		cur_log = get_new_log(iolog);
+		if (!cur_log)
+			return NULL;
+	}
+
+	if (cur_log->nr_samples < cur_log->max_samples)
+		return cur_log;
+
+	/*
+	 * Out of space. If we're in IO offload mode, add a new log chunk
+	 * inline. If we're doing inline submissions, flag 'td' as needing
+	 * a log regrow and we'll take care of it on the submission side.
+	 */
+	if (iolog->td->o.io_submit_mode == IO_MODE_OFFLOAD)
+		return regrow_log(iolog);
+
+	iolog->td->flags |= TD_F_REGROW_LOGS;
+	assert(iolog->pending->nr_samples < iolog->pending->max_samples);
+	return iolog->pending;
+}
+
 static void __add_log_sample(struct io_log *iolog, unsigned long val,
 			     enum fio_ddir ddir, unsigned int bs,
 			     unsigned long t, uint64_t offset)

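The deferred-regrow change keeps allocation off the completion path when fio submits I/O inline. Once the current chunk is full, samples spill into a preallocated `pending` area and `TD_F_REGROW_LOGS` is set; the submission side later calls `regrow_logs()` to append a fresh chunk and move the pending samples into it. The following is a simplified single-log sketch of that hand-off under assumed names (`struct sample_log`, `log_add()`, `log_regrow()`) and a deliberately tiny chunk size. It omits the offload mode, in which fio grows the log inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SAMPLES	4	/* deliberately tiny; fio starts at DEF_LOG_ENTRIES */

struct chunk {
	struct chunk *next;
	size_t nr, max;
	unsigned long *samples;
};

struct sample_log {
	struct chunk *head, *tail;	/* chunks in arrival order */
	struct chunk pending;		/* preallocated overflow area */
	bool need_regrow;		/* plays the role of TD_F_REGROW_LOGS */
};

static int chunk_alloc(struct chunk *c, size_t max)
{
	c->next = NULL;
	c->nr = 0;
	c->max = max;
	c->samples = calloc(max, sizeof(*c->samples));
	return c->samples ? 0 : -1;
}

/* Append an empty chunk at the tail; this is the step that allocates. */
static struct chunk *log_new_chunk(struct sample_log *l)
{
	struct chunk *c = malloc(sizeof(*c));

	if (!c || chunk_alloc(c, CHUNK_SAMPLES)) {
		free(c);
		return NULL;
	}
	if (l->tail)
		l->tail->next = c;
	else
		l->head = c;
	l->tail = c;
	return c;
}

int log_init(struct sample_log *l)
{
	memset(l, 0, sizeof(*l));
	if (chunk_alloc(&l->pending, CHUNK_SAMPLES))
		return -1;
	if (!log_new_chunk(l)) {
		free(l->pending.samples);
		return -1;
	}
	return 0;
}

/* Completion side: never allocates, spills into 'pending' when full. */
int log_add(struct sample_log *l, unsigned long val)
{
	struct chunk *c = l->tail;

	if (c->nr < c->max) {
		c->samples[c->nr++] = val;
		return 0;
	}
	l->need_regrow = true;
	if (l->pending.nr == l->pending.max)
		return -1;	/* no regrow happened in time: drop the sample */
	l->pending.samples[l->pending.nr++] = val;
	return 0;
}

/* Submission side: add a chunk and move pending samples into it, in order. */
int log_regrow(struct sample_log *l)
{
	struct chunk *c;

	if (!l->need_regrow)
		return 0;
	c = log_new_chunk(l);
	if (!c)
		return -1;
	assert(l->pending.nr <= c->max);
	memcpy(c->samples, l->pending.samples,
	       l->pending.nr * sizeof(*c->samples));
	c->nr = l->pending.nr;		/* make the copied samples count */
	l->pending.nr = 0;
	l->need_regrow = false;
	return 0;
}

void log_free(struct sample_log *l)
{
	struct chunk *c = l->head;

	while (c) {
		struct chunk *next = c->next;

		free(c->samples);
		free(c);
		c = next;
	}
	free(l->pending.samples);
	memset(l, 0, sizeof(*l));
}
```

Where fio asserts that the pending area still has room, the sketch returns -1 so the caller can decide to drop the sample. It also sets the new chunk's sample count right after the copy; the copy loop shown in `regrow_log()` does not itself update `cur_log->nr_samples`.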

* Recent changes (master)
@ 2016-05-20 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-20 12:00 UTC
  To: fio

The following changes since commit e391c70489d9f63612bce419d7fa0df5d15abf16:

  filesetup: align a size given as a percentage to the block size (2016-05-18 15:06:05 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 6fa3ad511a61666b492ed8126330db9f876359bc:

  iolog: fix duplicate handling of compression end (2016-05-19 15:49:57 -0600)

----------------------------------------------------------------
Jens Axboe (13):
      flist: add flist_last_entry()
      backend: only do forceful timeout exit if the job isn't actively finishing
      backend: mark the thread as finishing, when we are out of the IO loop
      backend: dump state of stuck thread
      iolog: switch to list based scheme
      iolog: don't quiesce on completion
      backend: move iolog compression init before CPU affinity settings
      iolog: fix missing new-line in inflate debug statement
      iolog: memset() zstream at init time
      iolog: sum last chunk length to total
      iolog: more compression debugging/fixes
      iolog: fix bug with ret != Z_STREAM_END
      iolog: fix duplicate handling of compression end

 backend.c |  19 ++++++--
 flist.h   |   3 ++
 iolog.c   | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
 iolog.h   |  25 +++++++---
 server.c  | 139 ++++++++++++++++++++++++++++++++++++--------------------
 stat.c    | 137 +++++++++++++++++++++++++++++++++++++------------------
 6 files changed, 342 insertions(+), 135 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index 7de6f65..f830040 100644
--- a/backend.c
+++ b/backend.c
@@ -1472,6 +1472,14 @@ static void *thread_main(void *data)
 	}
 
 	/*
+	 * Do this early, we don't want the compress threads to be limited
+	 * to the same CPUs as the IO workers. So do this before we set
+	 * any potential CPU affinity
+	 */
+	if (iolog_compress_init(td, sk_out))
+		goto err;
+
+	/*
 	 * If we have a gettimeofday() thread, make sure we exclude that
 	 * thread from this job
 	 */
@@ -1605,9 +1613,6 @@ static void *thread_main(void *data)
 			goto err;
 	}
 
-	if (iolog_compress_init(td, sk_out))
-		goto err;
-
 	fio_verify_init(td);
 
 	if (rate_submit_init(td, sk_out))
@@ -1705,6 +1710,8 @@ static void *thread_main(void *data)
 			break;
 	}
 
+	td_set_runstate(td, TD_FINISHING);
+
 	update_rusage_stat(td);
 	td->ts.total_run_time = mtime_since_now(&td->epoch);
 	td->ts.io_bytes[DDIR_READ] = td->io_bytes[DDIR_READ];
@@ -1813,8 +1820,9 @@ static int fork_main(struct sk_out *sk_out, int shmid, int offset)
 
 static void dump_td_info(struct thread_data *td)
 {
-	log_err("fio: job '%s' hasn't exited in %lu seconds, it appears to "
-		"be stuck. Doing forceful exit of this job.\n", td->o.name,
+	log_err("fio: job '%s' (state=%d) hasn't exited in %lu seconds, it "
+		"appears to be stuck. Doing forceful exit of this job.\n",
+			td->o.name, td->runstate,
 			(unsigned long) time_since_now(&td->terminate_time));
 }
 
@@ -1900,6 +1908,7 @@ static void reap_threads(unsigned int *nr_running, unsigned int *t_rate,
 		 * move on.
 		 */
 		if (td->terminate &&
+		    td->runstate < TD_FSYNCING &&
 		    time_since_now(&td->terminate_time) >= FIO_REAP_TIMEOUT) {
 			dump_td_info(td);
 			td_set_runstate(td, TD_REAPED);
diff --git a/flist.h b/flist.h
index d453e79..b4fe6e6 100644
--- a/flist.h
+++ b/flist.h
@@ -177,6 +177,9 @@ static inline void flist_splice_init(struct flist_head *list,
 #define flist_first_entry(ptr, type, member) \
 	flist_entry((ptr)->next, type, member)
 
+#define flist_last_entry(ptr, type, member) \
+	flist_entry((ptr)->prev, type, member)
+
 /**
  * flist_for_each	-	iterate over a list
  * @pos:	the &struct flist_head to use as a loop counter.
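`flist_last_entry()` mirrors `flist_first_entry()` but follows the head's `prev` pointer, which in a circular doubly linked list is the most recently appended element. A list-based log such as the one introduced in this series would typically reach its newest chunk that way. Here is a minimal standalone sketch of the idea, using hypothetical `list_*` names rather than fio's `flist` API:

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member)	container_of(ptr, type, member)
/* In a circular list the head's next is the first item, its prev the last. */
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_last_entry(head, type, member) \
	list_entry((head)->prev, type, member)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Insert n just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct item {
	int value;
	struct list_head list;
};

/* Value of the most recently appended item; the list must not be empty. */
static int newest_value(const struct list_head *h)
{
	assert(h->next != h);
	return list_last_entry(h, struct item, list)->value;
}
```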
diff --git a/iolog.c b/iolog.c
index 71afe86..aec0881 100644
--- a/iolog.c
+++ b/iolog.c
@@ -20,6 +20,8 @@
 #include "filelock.h"
 #include "smalloc.h"
 
+static int iolog_flush(struct io_log *log);
+
 static const char iolog_ver2[] = "fio version 2 iolog";
 
 void queue_io_piece(struct thread_data *td, struct io_piece *ipo)
@@ -575,8 +577,8 @@ void setup_log(struct io_log **log, struct log_params *p,
 {
 	struct io_log *l;
 
-	l = smalloc(sizeof(*l));
-	l->nr_samples = 0;
+	l = scalloc(1, sizeof(*l));
+	INIT_FLIST_HEAD(&l->io_logs);
 	l->log_type = p->log_type;
 	l->log_offset = p->log_offset;
 	l->log_gz = p->log_gz;
@@ -628,7 +630,14 @@ static void clear_file_buffer(void *buf)
 
 void free_log(struct io_log *log)
 {
-	free(log->log);
+	while (!flist_empty(&log->io_logs)) {
+		struct io_logs *cur_log;
+
+		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
+		flist_del_init(&cur_log->list);
+		free(cur_log->log);
+	}
+
 	free(log->filename);
 	sfree(log);
 }
@@ -673,7 +682,8 @@ struct iolog_flush_data {
 	struct workqueue_work work;
 	struct io_log *log;
 	void *samples;
-	uint64_t nr_samples;
+	uint32_t nr_samples;
+	bool free;
 };
 
 #define GZ_CHUNK	131072
@@ -700,6 +710,7 @@ static int z_stream_init(z_stream *stream, int gz_hdr)
 {
 	int wbits = 15;
 
+	memset(stream, 0, sizeof(*stream));
 	stream->zalloc = Z_NULL;
 	stream->zfree = Z_NULL;
 	stream->opaque = Z_NULL;
@@ -734,7 +745,8 @@ static void finish_chunk(z_stream *stream, FILE *f,
 
 	ret = inflateEnd(stream);
 	if (ret != Z_OK)
-		log_err("fio: failed to end log inflation (%d)\n", ret);
+		log_err("fio: failed to end log inflation seq %d (%d)\n",
+				iter->seq, ret);
 
 	flush_samples(f, iter->buf, iter->buf_used);
 	free(iter->buf);
@@ -751,7 +763,7 @@ static size_t inflate_chunk(struct iolog_compress *ic, int gz_hdr, FILE *f,
 {
 	size_t ret;
 
-	dprint(FD_COMPRESS, "inflate chunk size=%lu, seq=%u",
+	dprint(FD_COMPRESS, "inflate chunk size=%lu, seq=%u\n",
 				(unsigned long) ic->len, ic->seq);
 
 	if (ic->seq != iter->seq) {
@@ -798,7 +810,7 @@ static size_t inflate_chunk(struct iolog_compress *ic, int gz_hdr, FILE *f,
 
 	ret = (void *) stream->next_in - ic->buf;
 
-	dprint(FD_COMPRESS, "inflated to size=%lu\n", (unsigned long) ret);
+	dprint(FD_COMPRESS, "inflated to size=%lu\n", (unsigned long) iter->buf_size);
 
 	return ret;
 }
@@ -954,7 +966,13 @@ void flush_log(struct io_log *log, int do_append)
 
 	inflate_gz_chunks(log, f);
 
-	flush_samples(f, log->log, log->nr_samples * log_entry_sz(log));
+	while (!flist_empty(&log->io_logs)) {
+		struct io_logs *cur_log;
+
+		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
+		flist_del_init(&cur_log->list);
+		flush_samples(f, cur_log->log, cur_log->nr_samples * log_entry_sz(log));
+	}
 
 	fclose(f);
 	clear_file_buffer(buf);
@@ -963,7 +981,7 @@ void flush_log(struct io_log *log, int do_append)
 static int finish_log(struct thread_data *td, struct io_log *log, int trylock)
 {
 	if (td->flags & TD_F_COMPRESS_LOG)
-		iolog_flush(log, 1);
+		iolog_flush(log);
 
 	if (trylock) {
 		if (fio_trylock_file(log->filename))
@@ -1005,7 +1023,7 @@ size_t log_chunk_sizes(struct io_log *log)
 
 static int gz_work(struct iolog_flush_data *data)
 {
-	struct iolog_compress *c;
+	struct iolog_compress *c = NULL;
 	struct flist_head list;
 	unsigned int seq;
 	z_stream stream;
@@ -1014,6 +1032,7 @@ static int gz_work(struct iolog_flush_data *data)
 
 	INIT_FLIST_HEAD(&list);
 
+	memset(&stream, 0, sizeof(stream));
 	stream.zalloc = Z_NULL;
 	stream.zfree = Z_NULL;
 	stream.opaque = Z_NULL;
@@ -1029,9 +1048,12 @@ static int gz_work(struct iolog_flush_data *data)
 	stream.next_in = (void *) data->samples;
 	stream.avail_in = data->nr_samples * log_entry_sz(data->log);
 
-	dprint(FD_COMPRESS, "deflate input size=%lu, seq=%u\n",
-				(unsigned long) stream.avail_in, seq);
+	dprint(FD_COMPRESS, "deflate input size=%lu, seq=%u, log=%s\n",
+				(unsigned long) stream.avail_in, seq,
+				data->log->filename);
 	do {
+		if (c)
+			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
 		c = get_new_chunk(seq);
 		stream.avail_out = GZ_CHUNK;
 		stream.next_out = c->buf;
@@ -1051,9 +1073,26 @@ static int gz_work(struct iolog_flush_data *data)
 	stream.avail_out = GZ_CHUNK - c->len;
 
 	ret = deflate(&stream, Z_FINISH);
-	if (ret == Z_STREAM_END)
-		c->len = GZ_CHUNK - stream.avail_out;
-	else {
+	if (ret < 0) {
+		/*
+		 * Z_BUF_ERROR is special, it just means we need more
+		 * output space. We'll handle that below. Treat any other
+		 * error as fatal.
+		 */
+		if (ret != Z_BUF_ERROR) {
+			log_err("fio: deflate log (%d)\n", ret);
+			flist_del(&c->list);
+			free_chunk(c);
+			goto err;
+		}
+	}
+
+	total -= c->len;
+	c->len = GZ_CHUNK - stream.avail_out;
+	total += c->len;
+	dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
+
+	if (ret != Z_STREAM_END) {
 		do {
 			c = get_new_chunk(seq);
 			stream.avail_out = GZ_CHUNK;
@@ -1062,6 +1101,7 @@ static int gz_work(struct iolog_flush_data *data)
 			c->len = GZ_CHUNK - stream.avail_out;
 			total += c->len;
 			flist_add_tail(&c->list, &list);
+			dprint(FD_COMPRESS, "seq=%d, chunk=%lu\n", seq, c->len);
 		} while (ret != Z_STREAM_END);
 	}
 
@@ -1081,7 +1121,8 @@ static int gz_work(struct iolog_flush_data *data)
 
 	ret = 0;
 done:
-	free(data);
+	if (data->free)
+		free(data);
 	return ret;
 err:
 	while (!flist_empty(&list)) {
@@ -1145,39 +1186,69 @@ void iolog_compress_exit(struct thread_data *td)
  * Queue work item to compress the existing log entries. We reset the
  * current log to a small size, and reference the existing log in the
  * data that we queue for compression. Once compression has been done,
- * this old log is freed. If called with wait == 1, will not return until
- * the log compression has completed.
+ * this old log is freed. If called with finish == true, will not return
+ * until the log compression has completed, and will flush all previous
+ * logs too
  */
-int iolog_flush(struct io_log *log, int wait)
+static int iolog_flush(struct io_log *log)
 {
 	struct iolog_flush_data *data;
 
-	io_u_quiesce(log->td);
-
 	data = malloc(sizeof(*data));
 	if (!data)
 		return 1;
 
 	data->log = log;
+	data->free = false;
 
-	data->samples = log->log;
-	data->nr_samples = log->nr_samples;
+	while (!flist_empty(&log->io_logs)) {
+		struct io_logs *cur_log;
 
-	log->nr_samples = 0;
-	log->max_samples = DEF_LOG_ENTRIES;
-	log->log = malloc(log->max_samples * log_entry_sz(log));
+		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
+		flist_del_init(&cur_log->list);
+
+		data->samples = cur_log->log;
+		data->nr_samples = cur_log->nr_samples;
+
+		cur_log->nr_samples = 0;
+		cur_log->max_samples = 0;
+		cur_log->log = NULL;
 
-	if (!wait)
-		workqueue_enqueue(&log->td->log_compress_wq, &data->work);
-	else
 		gz_work(data);
+	}
 
+	free(data);
 	return 0;
 }
 
+int iolog_cur_flush(struct io_log *log, struct io_logs *cur_log)
+{
+	struct iolog_flush_data *data;
+
+	data = malloc(sizeof(*data));
+	if (!data)
+		return 1;
+
+	data->log = log;
+
+	data->samples = cur_log->log;
+	data->nr_samples = cur_log->nr_samples;
+	data->free = true;
+
+	cur_log->nr_samples = cur_log->max_samples = 0;
+	cur_log->log = NULL;
+
+	workqueue_enqueue(&log->td->log_compress_wq, &data->work);
+	return 0;
+}
 #else
 
-int iolog_flush(struct io_log *log, int wait)
+static int iolog_flush(struct io_log *log)
+{
+	return 1;
+}
+
+int iolog_cur_flush(struct io_log *log, struct io_logs *cur_log)
 {
 	return 1;
 }
@@ -1193,6 +1264,29 @@ void iolog_compress_exit(struct thread_data *td)
 
 #endif
 
+struct io_logs *iolog_cur_log(struct io_log *log)
+{
+	if (flist_empty(&log->io_logs))
+		return NULL;
+
+	return flist_last_entry(&log->io_logs, struct io_logs, list);
+}
+
+uint64_t iolog_nr_samples(struct io_log *iolog)
+{
+	struct flist_head *entry;
+	uint64_t ret = 0;
+
+	flist_for_each(entry, &iolog->io_logs) {
+		struct io_logs *cur_log;
+
+		cur_log = flist_entry(entry, struct io_logs, list);
+		ret += cur_log->nr_samples;
+	}
+
+	return ret;
+}
+
 static int __write_log(struct thread_data *td, struct io_log *log, int try)
 {
 	if (log)
diff --git a/iolog.h b/iolog.h
index 739a7c8..2b7813b 100644
--- a/iolog.h
+++ b/iolog.h
@@ -42,6 +42,16 @@ enum {
 };
 
 #define DEF_LOG_ENTRIES		1024
+#define MAX_LOG_ENTRIES		(1024 * DEF_LOG_ENTRIES)
+
+#define LOG_QUIESCE_SZ		(64 * 1024 * 1024)
+
+struct io_logs {
+	struct flist_head list;
+	uint64_t nr_samples;
+	uint64_t max_samples;
+	void *log;
+};
 
 /*
  * Dynamically growing data sample log
@@ -50,9 +60,8 @@ struct io_log {
 	/*
 	 * Entries already logged
 	 */
-	uint64_t nr_samples;
-	uint64_t max_samples;
-	void *log;
+	struct flist_head io_logs;
+	uint32_t cur_log_max;
 
 	unsigned int log_ddir_mask;
 
@@ -65,7 +74,7 @@ struct io_log {
 	/*
 	 * If we fail extending the log, stop collecting more entries.
 	 */
-	unsigned int disabled;
+	bool disabled;
 
 	/*
 	 * Log offsets
@@ -128,10 +137,14 @@ static inline struct io_sample *__get_sample(void *samples, int log_offset,
 	return (struct io_sample *) ((char *) samples + sample_offset);
 }
 
+struct io_logs *iolog_cur_log(struct io_log *);
+uint64_t iolog_nr_samples(struct io_log *);
+
 static inline struct io_sample *get_sample(struct io_log *iolog,
+					   struct io_logs *cur_log,
 					   uint64_t sample)
 {
-	return __get_sample(iolog->log, iolog->log_offset, sample);
+	return __get_sample(cur_log->log, iolog->log_offset, sample);
 }
 
 enum {
@@ -219,7 +232,7 @@ extern void flush_samples(FILE *, void *, uint64_t);
 extern void free_log(struct io_log *);
 extern void fio_writeout_logs(bool);
 extern void td_writeout_logs(struct thread_data *, bool);
-extern int iolog_flush(struct io_log *, int);
+extern int iolog_cur_flush(struct io_log *, struct io_logs *);
 
 static inline void init_ipo(struct io_piece *ipo)
 {
diff --git a/server.c b/server.c
index dcb7c2d..d36c511 100644
--- a/server.c
+++ b/server.c
@@ -1652,58 +1652,79 @@ void fio_server_send_du(void)
 	}
 }
 
-static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
-{
-	int ret = 0;
 #ifdef CONFIG_ZLIB
+static int __fio_append_iolog_gz(struct sk_entry *first, struct io_log *log,
+				 struct io_logs *cur_log, z_stream *stream)
+{
 	struct sk_entry *entry;
-	z_stream stream;
 	void *out_pdu;
+	int ret;
 
-	/*
-	 * Dirty - since the log is potentially huge, compress it into
-	 * FIO_SERVER_MAX_FRAGMENT_PDU chunks and let the receiving
-	 * side defragment it.
-	 */
-	out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
-
-	stream.zalloc = Z_NULL;
-	stream.zfree = Z_NULL;
-	stream.opaque = Z_NULL;
-
-	if (deflateInit(&stream, Z_DEFAULT_COMPRESSION) != Z_OK) {
-		ret = 1;
-		goto err;
-	}
-
-	stream.next_in = (void *) log->log;
-	stream.avail_in = log->nr_samples * log_entry_sz(log);
+	stream->next_in = (void *) cur_log->log;
+	stream->avail_in = cur_log->nr_samples * log_entry_sz(log);
 
 	do {
 		unsigned int this_len;
 
-		stream.avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
-		stream.next_out = out_pdu;
-		ret = deflate(&stream, Z_FINISH);
+		/*
+		 * Dirty - since the log is potentially huge, compress it into
+		 * FIO_SERVER_MAX_FRAGMENT_PDU chunks and let the receiving
+		 * side defragment it.
+		 */
+		out_pdu = malloc(FIO_SERVER_MAX_FRAGMENT_PDU);
+
+		stream->avail_out = FIO_SERVER_MAX_FRAGMENT_PDU;
+		stream->next_out = out_pdu;
+		ret = deflate(stream, Z_FINISH);
 		/* may be Z_OK, or Z_STREAM_END */
-		if (ret < 0)
-			goto err_zlib;
+		if (ret < 0) {
+			free(out_pdu);
+			return 1;
+		}
 
-		this_len = FIO_SERVER_MAX_FRAGMENT_PDU - stream.avail_out;
+		this_len = FIO_SERVER_MAX_FRAGMENT_PDU - stream->avail_out;
 
 		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, out_pdu, this_len,
-						NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
-		out_pdu = NULL;
+					 NULL, SK_F_VEC | SK_F_INLINE | SK_F_FREE);
 		flist_add_tail(&entry->list, &first->next);
-	} while (stream.avail_in);
+	} while (stream->avail_in);
+
+	return 0;
+}
+
+static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
+{
+	int ret = 0;
+	z_stream stream;
+
+	memset(&stream, 0, sizeof(stream));
+	stream.zalloc = Z_NULL;
+	stream.zfree = Z_NULL;
+	stream.opaque = Z_NULL;
+
+	if (deflateInit(&stream, Z_DEFAULT_COMPRESSION) != Z_OK)
+		return 1;
+
+	while (!flist_empty(&log->io_logs)) {
+		struct io_logs *cur_log;
+
+		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
+		flist_del_init(&cur_log->list);
+
+		ret = __fio_append_iolog_gz(first, log, cur_log, &stream);
+		if (ret)
+			break;
+	}
 
-err_zlib:
 	deflateEnd(&stream);
-err:
-	free(out_pdu);
-#endif
 	return ret;
 }
+#else
+static int fio_append_iolog_gz(struct sk_entry *first, struct io_log *log)
+{
+	return 1;
+}
+#endif
 
 static int fio_append_gz_chunks(struct sk_entry *first, struct io_log *log)
 {
@@ -1727,11 +1748,21 @@ static int fio_append_gz_chunks(struct sk_entry *first, struct io_log *log)
 static int fio_append_text_log(struct sk_entry *first, struct io_log *log)
 {
 	struct sk_entry *entry;
-	size_t size = log->nr_samples * log_entry_sz(log);
 
-	entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, log->log, size,
-					NULL, SK_F_VEC | SK_F_INLINE);
-	flist_add_tail(&entry->list, &first->next);
+	while (!flist_empty(&log->io_logs)) {
+		struct io_logs *cur_log;
+		size_t size;
+
+		cur_log = flist_first_entry(&log->io_logs, struct io_logs, list);
+		flist_del_init(&cur_log->list);
+
+		size = cur_log->nr_samples * log_entry_sz(log);
+
+		entry = fio_net_prep_cmd(FIO_NET_CMD_IOLOG, cur_log->log, size,
+						NULL, SK_F_VEC | SK_F_INLINE);
+		flist_add_tail(&entry->list, &first->next);
+	}
+
 	return 0;
 }
 
@@ -1739,9 +1770,10 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 {
 	struct cmd_iolog_pdu pdu;
 	struct sk_entry *first;
-	int i, ret = 0;
+	struct flist_head *entry;
+	int ret = 0;
 
-	pdu.nr_samples = cpu_to_le64(log->nr_samples);
+	pdu.nr_samples = cpu_to_le64(iolog_nr_samples(log));
 	pdu.thread_number = cpu_to_le32(td->thread_number);
 	pdu.log_type = cpu_to_le32(log->log_type);
 
@@ -1759,18 +1791,25 @@ int fio_send_iolog(struct thread_data *td, struct io_log *log, const char *name)
 	 * We can't do this for a pre-compressed log, but for that case,
 	 * log->nr_samples is zero anyway.
 	 */
-	for (i = 0; i < log->nr_samples; i++) {
-		struct io_sample *s = get_sample(log, i);
+	flist_for_each(entry, &log->io_logs) {
+		struct io_logs *cur_log;
+		int i;
 
-		s->time		= cpu_to_le64(s->time);
-		s->val		= cpu_to_le64(s->val);
-		s->__ddir	= cpu_to_le32(s->__ddir);
-		s->bs		= cpu_to_le32(s->bs);
+		cur_log = flist_entry(entry, struct io_logs, list);
 
-		if (log->log_offset) {
-			struct io_sample_offset *so = (void *) s;
+		for (i = 0; i < cur_log->nr_samples; i++) {
+			struct io_sample *s = get_sample(log, cur_log, i);
 
-			so->offset = cpu_to_le64(so->offset);
+			s->time		= cpu_to_le64(s->time);
+			s->val		= cpu_to_le64(s->val);
+			s->__ddir	= cpu_to_le32(s->__ddir);
+			s->bs		= cpu_to_le32(s->bs);
+
+			if (log->log_offset) {
+				struct io_sample_offset *so = (void *) s;
+
+				so->offset = cpu_to_le64(so->offset);
+			}
 		}
 	}
 
diff --git a/stat.c b/stat.c
index 4d87c29..5eb1aab 100644
--- a/stat.c
+++ b/stat.c
@@ -1849,66 +1849,115 @@ static inline void add_stat_sample(struct io_stat *is, unsigned long data)
 	is->samples++;
 }
 
+/*
+ * Return a struct io_logs, which is added to the tail of the log
+ * list for 'iolog'.
+ */
+static struct io_logs *get_new_log(struct io_log *iolog)
+{
+	size_t new_size, new_samples;
+	struct io_logs *cur_log;
+
+	/*
+	 * Cap the size at MAX_LOG_ENTRIES, so we don't keep doubling
+	 * forever
+	 */
+	if (!iolog->cur_log_max)
+		new_samples = DEF_LOG_ENTRIES;
+	else {
+		new_samples = iolog->cur_log_max * 2;
+		if (new_samples > MAX_LOG_ENTRIES)
+			new_samples = MAX_LOG_ENTRIES;
+	}
+
+	new_size = new_samples * log_entry_sz(iolog);
+
+	cur_log = malloc(sizeof(*cur_log));
+	if (cur_log) {
+		INIT_FLIST_HEAD(&cur_log->list);
+		cur_log->log = malloc(new_size);
+		if (cur_log->log) {
+			cur_log->nr_samples = 0;
+			cur_log->max_samples = new_samples;
+			flist_add_tail(&cur_log->list, &iolog->io_logs);
+			iolog->cur_log_max = new_samples;
+			return cur_log;
+		}
+		free(cur_log);
+	}
+
+	return NULL;
+}
+
+static struct io_logs *get_cur_log(struct io_log *iolog)
+{
+	struct io_logs *cur_log;
+
+	cur_log = iolog_cur_log(iolog);
+	if (!cur_log) {
+		cur_log = get_new_log(iolog);
+		if (!cur_log)
+			return NULL;
+	}
+
+	if (cur_log->nr_samples < cur_log->max_samples)
+		return cur_log;
+
+	/*
+	 * No room for a new sample. If we're compressing on the fly, flush
+	 * out the current chunk
+	 */
+	if (iolog->log_gz) {
+		if (iolog_cur_flush(iolog, cur_log)) {
+			log_err("fio: failed flushing iolog! Will stop logging.\n");
+			return NULL;
+		}
+	}
+
+	/*
+	 * Get a new log array, and add to our list
+	 */
+	cur_log = get_new_log(iolog);
+	if (cur_log)
+		return cur_log;
+
+	log_err("fio: failed extending iolog! Will stop logging.\n");
+	return NULL;
+}
+
 static void __add_log_sample(struct io_log *iolog, unsigned long val,
 			     enum fio_ddir ddir, unsigned int bs,
 			     unsigned long t, uint64_t offset)
 {
-	uint64_t nr_samples = iolog->nr_samples;
-	struct io_sample *s;
+	struct io_logs *cur_log;
 
 	if (iolog->disabled)
 		return;
-
-	if (!iolog->nr_samples)
+	if (flist_empty(&iolog->io_logs))
 		iolog->avg_last = t;
 
-	if (iolog->nr_samples == iolog->max_samples) {
-		size_t new_size, new_samples;
-		void *new_log;
+	cur_log = get_cur_log(iolog);
+	if (cur_log) {
+		struct io_sample *s;
 
-		if (!iolog->max_samples)
-			new_samples = DEF_LOG_ENTRIES;
-		else
-			new_samples = iolog->max_samples * 2;
-
-		new_size = new_samples * log_entry_sz(iolog);
-
-		if (iolog->log_gz && (new_size > iolog->log_gz)) {
-			if (!iolog->log) {
-				iolog->log = malloc(new_size);
-				iolog->max_samples = new_samples;
-			} else if (iolog_flush(iolog, 0)) {
-				log_err("fio: failed flushing iolog! Will stop logging.\n");
-				iolog->disabled = 1;
-				return;
-			}
-			nr_samples = iolog->nr_samples;
-		} else {
-			new_log = realloc(iolog->log, new_size);
-			if (!new_log) {
-				log_err("fio: failed extending iolog! Will stop logging.\n");
-				iolog->disabled = 1;
-				return;
-			}
-			iolog->log = new_log;
-			iolog->max_samples = new_samples;
-		}
-	}
+		s = get_sample(iolog, cur_log, cur_log->nr_samples);
 
-	s = get_sample(iolog, nr_samples);
+		s->val = val;
+		s->time = t;
+		io_sample_set_ddir(iolog, s, ddir);
+		s->bs = bs;
 
-	s->val = val;
-	s->time = t;
-	io_sample_set_ddir(iolog, s, ddir);
-	s->bs = bs;
+		if (iolog->log_offset) {
+			struct io_sample_offset *so = (void *) s;
 
-	if (iolog->log_offset) {
-		struct io_sample_offset *so = (void *) s;
+			so->offset = offset;
+		}
 
-		so->offset = offset;
+		cur_log->nr_samples++;
+		return;
 	}
 
-	iolog->nr_samples++;
+	iolog->disabled = true;
 }
 
 static inline void reset_io_stat(struct io_stat *ios)

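The iolog.c, iolog.h and stat.c changes above replace the single realloc()'d sample array in struct io_log with a list of struct io_logs chunks. get_new_log() sizes the first chunk at DEF_LOG_ENTRIES and doubles each later one, capping at MAX_LOG_ENTRIES. __add_log_sample() always writes into the last chunk, so filled chunks are never moved.

The following is a minimal standalone sketch of that growth policy, not fio's code. It makes several simplifications:

- The `chunk`/`chunked_log` types and all helper names are hypothetical.
- A plain singly linked list stands in for flist.h.
- Samples are bare `unsigned long` values rather than struct io_sample.
- The on-the-fly compression flush (iolog_cur_flush()) is omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for DEF_LOG_ENTRIES and MAX_LOG_ENTRIES. */
#define SKETCH_DEF_ENTRIES	1024
#define SKETCH_MAX_ENTRIES	(1024 * SKETCH_DEF_ENTRIES)

/* One fixed-size sample array; plays the role of struct io_logs. */
struct chunk {
	struct chunk *next;
	uint64_t nr_samples;
	uint64_t max_samples;
	unsigned long *samples;
};

/* Plays the role of struct io_log: chunks in allocation order. */
struct chunked_log {
	struct chunk *head, *tail;
	uint64_t cur_max;	/* capacity of the newest chunk */
	bool disabled;		/* set once an allocation fails */
};

/* Append a chunk twice the size of the previous one, capped at the max. */
static struct chunk *new_chunk(struct chunked_log *log)
{
	uint64_t n = log->cur_max ? log->cur_max * 2 : SKETCH_DEF_ENTRIES;
	struct chunk *c;

	if (n > SKETCH_MAX_ENTRIES)
		n = SKETCH_MAX_ENTRIES;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->samples = malloc(n * sizeof(*c->samples));
	if (!c->samples) {
		free(c);
		return NULL;
	}
	c->max_samples = n;

	if (log->tail)
		log->tail->next = c;
	else
		log->head = c;
	log->tail = c;
	log->cur_max = n;
	return c;
}

/* Store one sample in the newest chunk; full chunks are never resized. */
static void add_sample(struct chunked_log *log, unsigned long val)
{
	struct chunk *c = log->tail;

	if (log->disabled)
		return;
	if (!c || c->nr_samples == c->max_samples)
		c = new_chunk(log);
	if (!c) {
		log->disabled = true;	/* stop collecting, like fio */
		return;
	}
	c->samples[c->nr_samples++] = val;
}

/* Sum of samples over all chunks (cf. iolog_nr_samples()). */
static uint64_t total_samples(const struct chunked_log *log)
{
	uint64_t n = 0;

	for (const struct chunk *c = log->head; c; c = c->next)
		n += c->nr_samples;
	return n;
}

/* Release each chunk descriptor together with its sample array. */
static void free_chunked_log(struct chunked_log *log)
{
	struct chunk *c = log->head;

	while (c) {
		struct chunk *next = c->next;

		free(c->samples);
		free(c);
		c = next;
	}
	log->head = log->tail = NULL;
	log->cur_max = 0;
}
```

A full chunk is never written again, so it can be handed off whole instead of copied. That is what iolog_cur_flush() does when it queues cur_log->log to the compression workqueue.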

* Recent changes (master)
@ 2016-05-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-19 12:00 UTC
  To: fio

The following changes since commit 8c4693e2e578613f517dc42b38e204bf77fdab1d:

  add -A option for better stats (2016-05-17 18:48:30 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e391c70489d9f63612bce419d7fa0df5d15abf16:

  filesetup: align a size given as a percentage to the block size (2016-05-18 15:06:05 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      filesetup: align a size given as a percentage to the block size

 filesetup.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 9c37ae5..f721c36 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -894,8 +894,10 @@ int setup_files(struct thread_data *td)
 		if (f->io_size == -1ULL)
 			total_size = -1ULL;
 		else {
-                        if (o->size_percent)
-                                f->io_size = (f->io_size * o->size_percent) / 100;
+                        if (o->size_percent) {
+				f->io_size = (f->io_size * o->size_percent) / 100;
+				f->io_size -= (f->io_size % td_min_bs(td));
+			}
 			total_size += f->io_size;
 		}
 

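The filesetup.c change first scales io_size by size_percent, then rounds it down to a multiple of the job's minimum block size. Here is a minimal sketch of that arithmetic. The helper name is hypothetical, and min_bs is passed in directly rather than coming from td_min_bs(td) as it does in fio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the size_percent handling in setup_files(). */
static uint64_t percent_io_size(uint64_t io_size, unsigned int size_percent,
				unsigned int min_bs)
{
	assert(min_bs > 0);

	io_size = (io_size * size_percent) / 100;
	/* Round down to a whole number of min_bs-sized blocks. */
	io_size -= io_size % min_bs;
	return io_size;
}
```

As a worked example, take a 1,000,000-byte file with size=50% and 4096-byte blocks. The scaled size is 500,000, which rounds down to 499,712, or 122 full blocks.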

* Recent changes (master)
@ 2016-05-18 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-18 12:00 UTC
  To: fio

The following changes since commit 15a0c8ee4e1a5434075ebc2c9f48e96e5e892196:

  Windows crash in ctime_r() (2016-05-16 19:25:48 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8c4693e2e578613f517dc42b38e204bf77fdab1d:

  add -A option for better stats (2016-05-17 18:48:30 -0400)

----------------------------------------------------------------
Ben England (1):
      add -A option for better stats

Jens Axboe (1):
      init: cleanup random inits

 init.c                | 35 +++++++++++++++-------------
 options.c             |  4 ++--
 tools/fiologparser.py | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+), 18 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index e8c8afb..7166ea7 100644
--- a/init.c
+++ b/init.c
@@ -919,6 +919,23 @@ static int exists_and_not_file(const char *filename)
 	return 1;
 }
 
+static void init_rand_file_service(struct thread_data *td)
+{
+	unsigned long nranges = td->o.nr_files << FIO_FSERVICE_SHIFT;
+	const unsigned int seed = td->rand_seeds[FIO_RAND_FILE_OFF];
+
+	if (td->o.file_service_type == FIO_FSERVICE_ZIPF) {
+		zipf_init(&td->next_file_zipf, nranges, td->zipf_theta, seed);
+		zipf_disable_hash(&td->next_file_zipf);
+	} else if (td->o.file_service_type == FIO_FSERVICE_PARETO) {
+		pareto_init(&td->next_file_zipf, nranges, td->pareto_h, seed);
+		zipf_disable_hash(&td->next_file_zipf);
+	} else if (td->o.file_service_type == FIO_FSERVICE_GAUSS) {
+		gauss_init(&td->next_file_gauss, nranges, td->gauss_dev, seed);
+		gauss_disable_hash(&td->next_file_gauss);
+	}
+}
+
 static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
 	int i;
@@ -929,22 +946,8 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 
 	if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
 		init_rand_seed(&td->next_file_state, td->rand_seeds[FIO_RAND_FILE_OFF], use64);
-	else if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM) {
-		unsigned long nranges;
-
-		nranges = td->o.nr_files << FIO_FSERVICE_SHIFT;
-
-		if (td->o.file_service_type == FIO_FSERVICE_ZIPF) {
-			zipf_init(&td->next_file_zipf, nranges, td->zipf_theta, td->rand_seeds[FIO_RAND_FILE_OFF]);
-			zipf_disable_hash(&td->next_file_zipf);
-		} else if (td->o.file_service_type == FIO_FSERVICE_PARETO) {
-			pareto_init(&td->next_file_zipf, nranges, td->pareto_h, td->rand_seeds[FIO_RAND_FILE_OFF]);
-			zipf_disable_hash(&td->next_file_zipf);
-		} else if (td->o.file_service_type == FIO_FSERVICE_GAUSS) {
-			gauss_init(&td->next_file_gauss, nranges, td->gauss_dev, td->rand_seeds[FIO_RAND_FILE_OFF]);
-			gauss_disable_hash(&td->next_file_gauss);
-		}
-	}
+	else if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)
+		init_rand_file_service(td);
 
 	init_rand_seed(&td->file_size_state, td->rand_seeds[FIO_RAND_FILE_SIZE_OFF], use64);
 	init_rand_seed(&td->trim_state, td->rand_seeds[FIO_RAND_TRIM_OFF], use64);
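The refactor keeps the dispatch on file_service_type unchanged. Zipf, pareto and gauss all carry the __FIO_FSERVICE_NONUNIFORM bit (0x100), so a single `&` test routes them to init_rand_file_service(). That function then sizes the distribution over nr_files << FIO_FSERVICE_SHIFT ranges.

The small sketch below illustrates this encoding. The enum values are copied from the file.h hunk in the earlier batch of changes further down, and the helper names are hypothetical. fservice_file_index() assumes that a drawn value is mapped back to a file index by shifting right by FIO_FSERVICE_SHIFT; the io_u.c side of that change is not shown in this excerpt.

```c
#include <stdbool.h>

/* Values copied from the file.h hunk. */
enum {
	FIO_FSERVICE_RANDOM		= 1,
	FIO_FSERVICE_RR			= 2,
	FIO_FSERVICE_SEQ		= 3,
	__FIO_FSERVICE_NONUNIFORM	= 0x100,
	FIO_FSERVICE_ZIPF		= __FIO_FSERVICE_NONUNIFORM | 4,
	FIO_FSERVICE_PARETO		= __FIO_FSERVICE_NONUNIFORM | 5,
	FIO_FSERVICE_GAUSS		= __FIO_FSERVICE_NONUNIFORM | 6,

	FIO_FSERVICE_SHIFT		= 10,
};

/* True for zipf, pareto and gauss: the same bit test used in init.c. */
static bool fservice_is_nonuniform(int type)
{
	return (type & __FIO_FSERVICE_NONUNIFORM) != 0;
}

/* Number of ranges the distribution is initialised over. */
static unsigned long fservice_nranges(unsigned int nr_files)
{
	return (unsigned long) nr_files << FIO_FSERVICE_SHIFT;
}

/* ASSUMPTION: map a drawn value in [0, nranges) back to a file index. */
static unsigned int fservice_file_index(unsigned long draw)
{
	return (unsigned int) (draw >> FIO_FSERVICE_SHIFT);
}
```

With four files, nranges is 4096. Under the stated assumption, draws 0 through 1023 all select file 0, so each file gets 1024 slots of resolution in the skewed distribution.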
diff --git a/options.c b/options.c
index a925663..07589c4 100644
--- a/options.c
+++ b/options.c
@@ -788,7 +788,7 @@ static int str_fst_cb(void *data, const char *str)
 		break;
 	case FIO_FSERVICE_GAUSS:
 		if (val < 0.00 || val >= 100.00) {
-                          log_err("fio: normal deviation out of range (0 < input < 100.0  )\n");
+                          log_err("fio: normal deviation out of range (0 <= input < 100.0)\n");
                           return 1;
 		}
 		if (parse_dryrun())
@@ -1048,7 +1048,7 @@ static int str_random_distribution_cb(void *data, const char *str)
 		td->o.pareto_h.u.f = val;
 	} else {
 		if (val < 0.00 || val >= 100.0) {
-			log_err("fio: normal deviation out of range (0 < input < 100.0)\n");
+			log_err("fio: normal deviation out of range (0 <= input < 100.0)\n");
 			return 1;
 		}
 		if (parse_dryrun())
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
index 0574099..00e4d30 100755
--- a/tools/fiologparser.py
+++ b/tools/fiologparser.py
@@ -14,12 +14,16 @@
 # to see per-interval average completion latency.
 
 import argparse
+import numpy
+import scipy
 
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument('-i', '--interval', required=False, type=int, default=1000, help='interval of time in seconds.')
     parser.add_argument('-d', '--divisor', required=False, type=int, default=1, help='divide the results by this value.')
     parser.add_argument('-f', '--full', dest='full', action='store_true', default=False, help='print full output.')
+    parser.add_argument('-A', '--all', dest='allstats', action='store_true', default=False, 
+                        help='print all stats for each interval.')
     parser.add_argument('-a', '--average', dest='average', action='store_true', default=False, help='print the average for each interval.')
     parser.add_argument('-s', '--sum', dest='sum', action='store_true', default=False, help='print the sum for each interval.')
     parser.add_argument("FILE", help="collectl log output files to parse", nargs="+")
@@ -70,6 +74,57 @@ def print_averages(ctx, series):
         start += ctx.interval
         end += ctx.interval
 
+# FIXME: this routine is computationally inefficient
+# and has O(N^2) behavior
+# it would be better to make one pass through samples
+# to segment them into a series of time intervals, and
+# then compute stats on each time interval instead.
+# to debug this routine, use
+#   # sort -n -t ',' -k 2 small.log
+# on your input.
+# Sometimes scipy interpolates between two values to get a percentile
+
+def my_extend( vlist, val ):
+    vlist.extend(val)
+    return vlist
+
+array_collapser = lambda vlist, val:  my_extend(vlist, val) 
+
+def print_all_stats(ctx, series):
+    ftime = get_ftime(series)
+    start = 0 
+    end = ctx.interval
+    print('start-time, samples, min, avg, median, 90%, 95%, 99%, max')
+    while (start < ftime):  # for each time interval
+        end = ftime if ftime < end else end
+        sample_arrays = [ s.get_samples(start, end) for s in series ]
+        samplevalue_arrays = []
+        for sample_array in sample_arrays:
+            samplevalue_arrays.append( 
+                [ sample.value for sample in sample_array ] )
+        #print('samplevalue_arrays len: %d' % len(samplevalue_arrays))
+        #print('samplevalue_arrays elements len: ' + \
+               #str(map( lambda l: len(l), samplevalue_arrays)))
+        # collapse list of lists of sample values into list of sample values
+        samplevalues = reduce( array_collapser, samplevalue_arrays, [] )
+        #print('samplevalues: ' + str(sorted(samplevalues)))
+        # compute all stats and print them
+        myarray = scipy.fromiter(samplevalues, float)
+        mymin = scipy.amin(myarray)
+        myavg = scipy.average(myarray)
+        mymedian = scipy.median(myarray)
+        my90th = scipy.percentile(myarray, 90)
+        my95th = scipy.percentile(myarray, 95)
+        my99th = scipy.percentile(myarray, 99)
+        mymax = scipy.amax(myarray)
+        print( '%f, %d, %f, %f, %f, %f, %f, %f, %f' % (
+            start, len(samplevalues), 
+            mymin, myavg, mymedian, my90th, my95th, my99th, mymax))
+
+        # advance to next interval
+        start += ctx.interval
+        end += ctx.interval
+
 
 def print_default(ctx, series):
     ftime = get_ftime(series)
@@ -112,6 +167,13 @@ class TimeSeries():
             self.last = sample
         self.samples.append(sample)
 
+    def get_samples(self, start, end):
+        sample_list = []
+        for s in self.samples:
+            if s.start >= start and s.end <= end:
+                sample_list.append(s)
+        return sample_list
+
     def get_value(self, start, end):
         value = 0
         for sample in self.samples:
@@ -147,6 +209,8 @@ if __name__ == '__main__':
         print_averages(ctx, series)
     elif ctx.full:
         print_full(ctx, series)
+    elif ctx.allstats:
+        print_all_stats(ctx, series)
     else:
         print_default(ctx, series)
 

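For each interval, print_all_stats() prints the min, average, median and the 90th, 95th and 99th percentiles. As its FIXME comment notes, a percentile may be interpolated between two samples. Below is a C sketch of linear interpolation between the two nearest ranks, which is the default rule of numpy.percentile. The scipy.percentile call in the script is assumed to behave the same way. The function names are hypothetical, and the input must be sorted first.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *) a;
	double y = *(const double *) b;

	return (x > y) - (x < y);
}

/*
 * p-th percentile (0 <= p <= 100) of n > 0 ascending values, linearly
 * interpolating between the two nearest ranks.
 */
static double percentile_sorted(const double *v, size_t n, double p)
{
	double rank = (p / 100.0) * (double) (n - 1);
	size_t lo = (size_t) rank;	/* floor, since rank >= 0 */
	double frac = rank - (double) lo;

	if (lo + 1 >= n)
		return v[n - 1];
	return v[lo] + frac * (v[lo + 1] - v[lo]);
}
```

To use it, sort with `qsort(v, n, sizeof(*v), cmp_double)` and then call percentile_sorted() once per requested percentile. The median is the 50th percentile. For the samples {4, 1, 3, 2}, the median is 2.5 and the 90th percentile is about 3.7.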

* Recent changes (master)
@ 2016-05-17 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-17 12:00 UTC
  To: fio

The following changes since commit 5c8f0ba56837a0b848cbbbc5a8673589d099ded3:

  verify: unroll string copy (2016-05-10 19:50:00 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 15a0c8ee4e1a5434075ebc2c9f48e96e5e892196:

  Windows crash in ctime_r() (2016-05-16 19:25:48 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      stat: add blocksize to averaged log, if it's consistent
      zipf/pareto/gauss: add option to disable hashing
      Add support for non-uniformly random file service type
      options: 0.00 is a valid gauss dev
      zipf/pareto/gauss: hash cleanup

Michael Schoberg (mschoberg) (1):
      Windows crash in ctime_r()

 HOWTO              | 21 ++++++++++---
 file.h             | 17 +++++++----
 fio.1              | 18 +++++++++--
 fio.h              |  9 ++++++
 init.c             | 16 ++++++++++
 io_u.c             | 47 +++++++++++++++++++++++------
 lib/gauss.c        | 10 ++++++-
 lib/gauss.h        |  2 ++
 lib/zipf.c         | 21 +++++++++++--
 lib/zipf.h         |  2 ++
 options.c          | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 os/windows/posix.c |  6 ++--
 stat.c             | 20 ++++++++++---
 13 files changed, 240 insertions(+), 36 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 88d10a1..9ed2c5f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -673,10 +673,23 @@ file_service_type=str  Defines how fio decides which file from a job to
 				the next. Multiple files can still be
 				open depending on 'openfiles'.
 
-		The string can have a number appended, indicating how
-		often to switch to a new file. So if option random:4 is
-		given, fio will switch to a new random file after 4 ios
-		have been issued.
+			zipf	Use a zipfian distribution to decide what file
+				to access.
+
+			pareto	Use a pareto distribution to decide what file
+				to access.
+
+			gauss	Use a gaussian (normal) distribution to decide
+				what file to access.
+
+		For random, roundrobin, and sequential, a postfix can be
+		appended to tell fio how many I/Os to issue before switching
+		to a new file. For example, specifying
+		'file_service_type=random:8' would cause fio to issue 8 I/Os
+		before selecting a new file at random. For the non-uniform
+		distributions, a floating point postfix can be given to
+		influence how the distribution is skewed. See
+		'random_distribution' for a description of how that would work.
 
 ioengine=str	Defines how the job issues io to the file. The following
 		types are defined:
diff --git a/file.h b/file.h
index e7563b8..0cf622f 100644
--- a/file.h
+++ b/file.h
@@ -39,13 +39,20 @@ enum file_lock_mode {
 };
 
 /*
- * roundrobin available files, or choose one at random, or do each one
- * serially.
+ * How fio chooses what file to service next. Choice of uniformly random, or
+ * some skewed random variants, or just sequentially go through them or
+ * roundrobing.
  */
 enum {
-	FIO_FSERVICE_RANDOM	= 1,
-	FIO_FSERVICE_RR		= 2,
-	FIO_FSERVICE_SEQ	= 3,
+	FIO_FSERVICE_RANDOM		= 1,
+	FIO_FSERVICE_RR			= 2,
+	FIO_FSERVICE_SEQ		= 3,
+	__FIO_FSERVICE_NONUNIFORM	= 0x100,
+	FIO_FSERVICE_ZIPF		= __FIO_FSERVICE_NONUNIFORM | 4,
+	FIO_FSERVICE_PARETO		= __FIO_FSERVICE_NONUNIFORM | 5,
+	FIO_FSERVICE_GAUSS		= __FIO_FSERVICE_NONUNIFORM | 6,
+
+	FIO_FSERVICE_SHIFT		= 10,
 };
 
 /*
diff --git a/fio.1 b/fio.1
index ebb4899..5e4cd4f 100644
--- a/fio.1
+++ b/fio.1
@@ -566,10 +566,24 @@ Round robin over opened files (default).
 .TP
 .B sequential
 Do each file in the set sequentially.
+.TP
+.B zipf
+Use a zipfian distribution to decide what file to access.
+.TP
+.B pareto
+Use a pareto distribution to decide what file to access.
+.TP
+.B gauss
+Use a gaussian (normal) distribution to decide what file to access.
 .RE
 .P
-The number of I/Os to issue before switching to a new file can be specified by
-appending `:\fIint\fR' to the service type.
+For \fBrandom\fR, \fBroundrobin\fR, and \fBsequential\fR, a postfix can be
+appended to tell fio how many I/Os to issue before switching to a new file.
+For example, specifying \fBfile_service_type=random:8\fR would cause fio to
+issue \fI8\fR I/Os before selecting a new file at random. For the non-uniform
+distributions, a floating point postfix can be given to influence how the
+distribution is skewed. See \fBrandom_distribution\fR for a description of how
+that would work.
 .RE
 .TP
 .BI ioengine \fR=\fPstr
diff --git a/fio.h b/fio.h
index 6a244c3..8b6a272 100644
--- a/fio.h
+++ b/fio.h
@@ -170,6 +170,15 @@ struct thread_data {
 		unsigned int next_file;
 		struct frand_state next_file_state;
 	};
+	union {
+		struct zipf_state next_file_zipf;
+		struct gauss_state next_file_gauss;
+	};
+	union {
+		double zipf_theta;
+		double pareto_h;
+		double gauss_dev;
+	};
 	int error;
 	int sig;
 	int done;
diff --git a/init.c b/init.c
index c579d5c..e8c8afb 100644
--- a/init.c
+++ b/init.c
@@ -929,6 +929,22 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 
 	if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
 		init_rand_seed(&td->next_file_state, td->rand_seeds[FIO_RAND_FILE_OFF], use64);
+	else if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM) {
+		unsigned long nranges;
+
+		nranges = td->o.nr_files << FIO_FSERVICE_SHIFT;
+
+		if (td->o.file_service_type == FIO_FSERVICE_ZIPF) {
+			zipf_init(&td->next_file_zipf, nranges, td->zipf_theta, td->rand_seeds[FIO_RAND_FILE_OFF]);
+			zipf_disable_hash(&td->next_file_zipf);
+		} else if (td->o.file_service_type == FIO_FSERVICE_PARETO) {
+			pareto_init(&td->next_file_zipf, nranges, td->pareto_h, td->rand_seeds[FIO_RAND_FILE_OFF]);
+			zipf_disable_hash(&td->next_file_zipf);
+		} else if (td->o.file_service_type == FIO_FSERVICE_GAUSS) {
+			gauss_init(&td->next_file_gauss, nranges, td->gauss_dev, td->rand_seeds[FIO_RAND_FILE_OFF]);
+			gauss_disable_hash(&td->next_file_gauss);
+		}
+	}
 
 	init_rand_seed(&td->file_size_state, td->rand_seeds[FIO_RAND_FILE_SIZE_OFF], use64);
 	init_rand_seed(&td->trim_state, td->rand_seeds[FIO_RAND_TRIM_OFF], use64);
diff --git a/io_u.c b/io_u.c
index f9870e7..c0790b2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -328,7 +328,8 @@ static int get_next_rand_block(struct thread_data *td, struct fio_file *f,
 	if (!get_next_rand_offset(td, f, ddir, b))
 		return 0;
 
-	if (td->o.time_based) {
+	if (td->o.time_based ||
+	    (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)) {
 		fio_file_reset(td, f);
 		if (!get_next_rand_offset(td, f, ddir, b))
 			return 0;
@@ -1070,6 +1071,34 @@ static void io_u_mark_latency(struct thread_data *td, unsigned long usec)
 		io_u_mark_lat_msec(td, usec / 1000);
 }
 
+static unsigned int __get_next_fileno_rand(struct thread_data *td)
+{
+	unsigned long fileno;
+
+	if (td->o.file_service_type == FIO_FSERVICE_RANDOM) {
+		uint64_t frand_max = rand_max(&td->next_file_state);
+		unsigned long r;
+
+		r = __rand(&td->next_file_state);
+		return (unsigned int) ((double) td->o.nr_files
+				* (r / (frand_max + 1.0)));
+	}
+
+	if (td->o.file_service_type == FIO_FSERVICE_ZIPF)
+		fileno = zipf_next(&td->next_file_zipf);
+	else if (td->o.file_service_type == FIO_FSERVICE_PARETO)
+		fileno = pareto_next(&td->next_file_zipf);
+	else if (td->o.file_service_type == FIO_FSERVICE_GAUSS)
+		fileno = gauss_next(&td->next_file_gauss);
+	else {
+		log_err("fio: bad file service type: %d\n", td->o.file_service_type);
+		assert(0);
+		return 0;
+	}
+
+	return fileno >> FIO_FSERVICE_SHIFT;
+}
+
 /*
  * Get next file to service by choosing one at random
  */
@@ -1077,17 +1106,13 @@ static struct fio_file *get_next_file_rand(struct thread_data *td,
 					   enum fio_file_flags goodf,
 					   enum fio_file_flags badf)
 {
-	uint64_t frand_max = rand_max(&td->next_file_state);
 	struct fio_file *f;
 	int fno;
 
 	do {
 		int opened = 0;
-		unsigned long r;
 
-		r = __rand(&td->next_file_state);
-		fno = (unsigned int) ((double) td->o.nr_files
-				* (r / (frand_max + 1.0)));
+		fno = __get_next_fileno_rand(td);
 
 		f = td->files[fno];
 		if (fio_file_done(f))
@@ -1240,10 +1265,14 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
 		put_file_log(td, f);
 		td_io_close_file(td, f);
 		io_u->file = NULL;
-		fio_file_set_done(f);
-		td->nr_done_files++;
-		dprint(FD_FILE, "%s: is done (%d of %d)\n", f->file_name,
+		if (td->o.file_service_type & __FIO_FSERVICE_NONUNIFORM)
+			fio_file_reset(td, f);
+		else {
+			fio_file_set_done(f);
+			td->nr_done_files++;
+			dprint(FD_FILE, "%s: is done (%d of %d)\n", f->file_name,
 					td->nr_done_files, td->o.nr_files);
+		}
 	} while (1);
 
 	return 0;
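For the non-uniform file_service_type modes, the patch asks the generator for nr_files << FIO_FSERVICE_SHIFT values (init.c) and shifts each sample back down in __get_next_fileno_rand(). As a result, every file owns 2^FIO_FSERVICE_SHIFT consecutive sample values. With the hash disabled, neighbouring samples land on the same or adjacent files, so the skew stays visible at file granularity. zipf and pareto still add their fixed rand_off before the modulo. The sketch below shows only that arithmetic; fservice_nranges(), fservice_sample_to_file() and the local FSERVICE_SHIFT are hypothetical stand-ins, not fio code:

```c
#include <stdint.h>

/* Hypothetical stand-in for FIO_FSERVICE_SHIFT (10 in file.h above). */
#define FSERVICE_SHIFT 10

/* Value range handed to the skewed generator: nr_files * 2^SHIFT. */
static uint64_t fservice_nranges(unsigned int nr_files)
{
	return (uint64_t) nr_files << FSERVICE_SHIFT;
}

/* Collapse a raw sample in [0, nranges) onto a file index. */
static unsigned int fservice_sample_to_file(uint64_t sample)
{
	return (unsigned int) (sample >> FSERVICE_SHIFT);
}
```

With 4 files the generator works over 4096 values: samples 0..1023 select file 0, samples 1024..2047 select file 1, and so on.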
diff --git a/lib/gauss.c b/lib/gauss.c
index afd0490..f974490 100644
--- a/lib/gauss.c
+++ b/lib/gauss.c
@@ -38,7 +38,10 @@ unsigned long long gauss_next(struct gauss_state *gs)
 		sum += dev;
 	}
 
-	return __hash_u64(sum) % gs->nranges;
+	if (!gs->disable_hash)
+		sum = __hash_u64(sum);
+
+	return sum % gs->nranges;
 }
 
 void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
@@ -54,3 +57,8 @@ void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
 			gs->stddev = nranges / 2;
 	}
 }
+
+void gauss_disable_hash(struct gauss_state *gs)
+{
+	gs->disable_hash = true;
+}
diff --git a/lib/gauss.h b/lib/gauss.h
index a76df3f..478aa14 100644
--- a/lib/gauss.h
+++ b/lib/gauss.h
@@ -8,10 +8,12 @@ struct gauss_state {
 	struct frand_state r;
 	uint64_t nranges;
 	unsigned int stddev;
+	bool disable_hash;
 };
 
 void gauss_init(struct gauss_state *gs, unsigned long nranges, double dev,
 		unsigned int seed);
 unsigned long long gauss_next(struct gauss_state *gs);
+void gauss_disable_hash(struct gauss_state *gs);
 
 #endif
diff --git a/lib/zipf.c b/lib/zipf.c
index d8e72b1..681df70 100644
--- a/lib/zipf.c
+++ b/lib/zipf.c
@@ -69,7 +69,12 @@ unsigned long long zipf_next(struct zipf_state *zs)
 	else
 		val = 1 + (unsigned long long)(n * pow(eta*rand_uni - eta + 1.0, alpha));
 
-	return (__hash_u64(val - 1) + zs->rand_off) % zs->nranges;
+	val--;
+
+	if (!zs->disable_hash)
+		val = __hash_u64(val);
+
+	return (val + zs->rand_off) % zs->nranges;
 }
 
 void pareto_init(struct zipf_state *zs, unsigned long nranges, double h,
@@ -82,7 +87,17 @@ void pareto_init(struct zipf_state *zs, unsigned long nranges, double h,
 unsigned long long pareto_next(struct zipf_state *zs)
 {
 	double rand = (double) __rand(&zs->rand) / (double) FRAND32_MAX;
-	unsigned long long n = zs->nranges - 1;
+	unsigned long long n;
+
+	n = (zs->nranges - 1) * pow(rand, zs->pareto_pow);
+
+	if (!zs->disable_hash)
+		n = __hash_u64(n);
 
-	return (__hash_u64(n * pow(rand, zs->pareto_pow)) + zs->rand_off) % zs->nranges;
+	return (n + zs->rand_off)  % zs->nranges;
+}
+
+void zipf_disable_hash(struct zipf_state *zs)
+{
+	zs->disable_hash = true;
 }
diff --git a/lib/zipf.h b/lib/zipf.h
index f98ad81..af2d0e6 100644
--- a/lib/zipf.h
+++ b/lib/zipf.h
@@ -12,6 +12,7 @@ struct zipf_state {
 	double pareto_pow;
 	struct frand_state rand;
 	uint64_t rand_off;
+	bool disable_hash;
 };
 
 void zipf_init(struct zipf_state *zs, unsigned long nranges, double theta, unsigned int seed);
@@ -19,5 +20,6 @@ unsigned long long zipf_next(struct zipf_state *zs);
 
 void pareto_init(struct zipf_state *zs, unsigned long nranges, double h, unsigned int seed);
 unsigned long long pareto_next(struct zipf_state *zs);
+void zipf_disable_hash(struct zipf_state *zs);
 
 #endif
diff --git a/options.c b/options.c
index 980b7e5..a925663 100644
--- a/options.c
+++ b/options.c
@@ -724,12 +724,77 @@ out:
 static int str_fst_cb(void *data, const char *str)
 {
 	struct thread_data *td = data;
-	char *nr = get_opt_postfix(str);
+	double val;
+	bool done = false;
+	char *nr;
 
 	td->file_service_nr = 1;
-	if (nr) {
-		td->file_service_nr = atoi(nr);
+
+	switch (td->o.file_service_type) {
+	case FIO_FSERVICE_RANDOM:
+	case FIO_FSERVICE_RR:
+	case FIO_FSERVICE_SEQ:
+		nr = get_opt_postfix(str);
+		if (nr) {
+			td->file_service_nr = atoi(nr);
+			free(nr);
+		}
+		done = true;
+		break;
+	case FIO_FSERVICE_ZIPF:
+		val = FIO_DEF_ZIPF;
+		break;
+	case FIO_FSERVICE_PARETO:
+		val = FIO_DEF_PARETO;
+		break;
+	case FIO_FSERVICE_GAUSS:
+		val = 0.0;
+		break;
+	default:
+		log_err("fio: bad file service type: %d\n", td->o.file_service_type);
+		return 1;
+	}
+
+	if (done)
+		return 0;
+
+	nr = get_opt_postfix(str);
+	if (nr && !str_to_float(nr, &val, 0)) {
+		log_err("fio: file service type random postfix parsing failed\n");
 		free(nr);
+		return 1;
+	}
+
+	free(nr);
+
+	switch (td->o.file_service_type) {
+	case FIO_FSERVICE_ZIPF:
+		if (val == 1.00) {
+			log_err("fio: zipf theta must be different than 1.0\n");
+			return 1;
+		}
+		if (parse_dryrun())
+			return 0;
+		td->zipf_theta = val;
+		break;
+	case FIO_FSERVICE_PARETO:
+		if (val <= 0.00 || val >= 1.00) {
+                          log_err("fio: pareto input out of range (0 < input < 1.0)\n");
+                          return 1;
+		}
+		if (parse_dryrun())
+			return 0;
+		td->pareto_h = val;
+		break;
+	case FIO_FSERVICE_GAUSS:
+		if (val < 0.00 || val >= 100.00) {
+                          log_err("fio: normal deviation out of range (0 < input < 100.0  )\n");
+                          return 1;
+		}
+		if (parse_dryrun())
+			return 0;
+		td->gauss_dev = val;
+		break;
 	}
 
 	return 0;
@@ -982,7 +1047,7 @@ static int str_random_distribution_cb(void *data, const char *str)
 			return 0;
 		td->o.pareto_h.u.f = val;
 	} else {
-		if (val <= 0.00 || val >= 100.0) {
+		if (val < 0.00 || val >= 100.0) {
 			log_err("fio: normal deviation out of range (0 < input < 100.0)\n");
 			return 1;
 		}
@@ -2020,7 +2085,19 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.posval	= {
 			  { .ival = "random",
 			    .oval = FIO_FSERVICE_RANDOM,
-			    .help = "Choose a file at random",
+			    .help = "Choose a file at random (uniform)",
+			  },
+			  { .ival = "zipf",
+			    .oval = FIO_FSERVICE_ZIPF,
+			    .help = "Zipf randomized",
+			  },
+			  { .ival = "pareto",
+			    .oval = FIO_FSERVICE_PARETO,
+			    .help = "Pareto randomized",
+			  },
+			  { .ival = "gauss",
+			    .oval = FIO_FSERVICE_GAUSS,
+			    .help = "Normal (gaussian) distribution",
 			  },
 			  { .ival = "roundrobin",
 			    .oval = FIO_FSERVICE_RR,
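The new str_fst_cb() takes an integer postfix for random/roundrobin/sequential and a floating point postfix for the skewed types. It then range-checks that postfix: zipf theta must not be 1.0, pareto h must be strictly between 0 and 1, and the gauss deviation must be in [0, 100). The sketch below reproduces only the float path. It assumes strtod() as a stand-in for fio's get_opt_postfix()/str_to_float(). The caller passes the default, because the values of FIO_DEF_ZIPF and FIO_DEF_PARETO are not shown in this patch. parse_fst_postfix() and enum fst_dist are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the three non-uniform file_service_type values. */
enum fst_dist { FST_ZIPF, FST_PARETO, FST_GAUSS };

/*
 * Parse the optional ":<float>" postfix of e.g. "zipf:1.2" and apply the
 * same range checks as str_fst_cb(). 'def' is used when no postfix is
 * given. Returns 0 on success (value stored in *out), 1 on error.
 */
static int parse_fst_postfix(const char *str, enum fst_dist dist, double def,
			     double *out)
{
	const char *colon = strchr(str, ':');
	double val = def;

	if (colon) {
		char *end;

		val = strtod(colon + 1, &end);
		if (end == colon + 1 || *end != '\0')
			return 1;	/* postfix is not a clean number */
	}

	switch (dist) {
	case FST_ZIPF:
		if (val == 1.0)		/* theta of exactly 1.0 is rejected */
			return 1;
		break;
	case FST_PARETO:
		if (val <= 0.0 || val >= 1.0)
			return 1;
		break;
	case FST_GAUSS:
		if (val < 0.0 || val >= 100.0)
			return 1;
		break;
	}

	*out = val;
	return 0;
}
```

For example, "zipf:1.2" yields 1.2, "zipf:1.0" is rejected, and a bare "gauss" falls back to the default deviation of 0.0, matching the val = 0.0 default in the patch.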
diff --git a/os/windows/posix.c b/os/windows/posix.c
index 41fc480..fd3d9ab 100755
--- a/os/windows/posix.c
+++ b/os/windows/posix.c
@@ -243,12 +243,12 @@ void Time_tToSystemTime(time_t dosTime, SYSTEMTIME *systemTime)
 char* ctime_r(const time_t *t, char *buf)
 {
     SYSTEMTIME systime;
-    const char * const dayOfWeek[] = { "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun" };
+    const char * const dayOfWeek[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
     const char * const monthOfYear[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
 
     Time_tToSystemTime(*t, &systime);
     /* We don't know how long `buf` is, but assume it's rounded up from the minimum of 25 to 32 */
-    StringCchPrintfA(buf, 32, "%s %s %d %02d:%02d:%02d %04d", dayOfWeek[systime.wDayOfWeek - 1], monthOfYear[systime.wMonth - 1],
+    StringCchPrintfA(buf, 31, "%s %s %d %02d:%02d:%02d %04d", dayOfWeek[systime.wDayOfWeek % 7], monthOfYear[(systime.wMonth - 1) % 12],
 										 systime.wDay, systime.wHour, systime.wMinute, systime.wSecond, systime.wYear);
     return buf;
 }
@@ -888,7 +888,7 @@ struct dirent *readdir(DIR *dirp)
 
 	if (dirp->find_handle == INVALID_HANDLE_VALUE) {
 		char search_pattern[MAX_PATH];
-		StringCchPrintfA(search_pattern, MAX_PATH, "%s\\*", dirp->dirname);
+		StringCchPrintfA(search_pattern, MAX_PATH-1, "%s\\*", dirp->dirname);
 		dirp->find_handle = FindFirstFileA(search_pattern, &find_data);
 		if (dirp->find_handle == INVALID_HANDLE_VALUE)
 			return NULL;
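SYSTEMTIME's wDayOfWeek runs from 0 (Sunday) to 6 (Saturday). The old Monday-first table indexed with wDayOfWeek - 1 was therefore correct on weekdays and Saturday, but it read element -1 on Sundays. The fixed Sunday-first table indexed directly by the day number is correct for all seven days. The same convention holds for standard C's struct tm, which allows a portable sketch of the formatting. This uses gmtime() instead of the Windows-only Time_tToSystemTime(), and format_utc() is a hypothetical name:

```c
#include <stdio.h>
#include <time.h>

/*
 * Format a UTC timestamp in the same "%s %s %d %02d:%02d:%02d %04d" layout
 * as the Windows ctime_r() shim. tm_wday counts from 0 = Sunday, so the
 * table starts at "Sun". gmtime() is not thread-safe; it only keeps the
 * sketch short.
 */
static char *format_utc(time_t t, char *buf, size_t len)
{
	static const char *const day_of_week[] = {
		"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
	};
	static const char *const month_of_year[] = {
		"Jan", "Feb", "Mar", "Apr", "May", "Jun",
		"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
	};
	const struct tm *tm = gmtime(&t);

	if (!tm)
		return NULL;

	/* tm_mon is 0-based, unlike SYSTEMTIME's 1-based wMonth */
	snprintf(buf, len, "%s %s %d %02d:%02d:%02d %04d",
		 day_of_week[tm->tm_wday % 7], month_of_year[tm->tm_mon % 12],
		 tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
		 tm->tm_year + 1900);
	return buf;
}
```

The epoch (time 0) formats as "Thu Jan 1 00:00:00 1970", and three days later the output starts with "Sun", the case the old table got wrong.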
diff --git a/stat.c b/stat.c
index 95f206e..4d87c29 100644
--- a/stat.c
+++ b/stat.c
@@ -2169,8 +2169,14 @@ static int add_bw_samples(struct thread_data *td, struct timeval *t)
 
 		add_stat_sample(&ts->bw_stat[ddir], rate);
 
-		if (td->bw_log)
-			add_log_sample(td, td->bw_log, rate, ddir, 0, 0);
+		if (td->bw_log) {
+			unsigned int bs = 0;
+
+			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
+				bs = td->o.min_bs[ddir];
+
+			add_log_sample(td, td->bw_log, rate, ddir, bs, 0);
+		}
 
 		td->stat_io_bytes[ddir] = td->this_io_bytes[ddir];
 	}
@@ -2234,8 +2240,14 @@ static int add_iops_samples(struct thread_data *td, struct timeval *t)
 
 		add_stat_sample(&ts->iops_stat[ddir], iops);
 
-		if (td->iops_log)
-			add_log_sample(td, td->iops_log, iops, ddir, 0, 0);
+		if (td->iops_log) {
+			unsigned int bs = 0;
+
+			if (td->o.min_bs[ddir] == td->o.max_bs[ddir])
+				bs = td->o.min_bs[ddir];
+
+			add_log_sample(td, td->iops_log, iops, ddir, bs, 0);
+		}
 
 		td->stat_io_blocks[ddir] = td->this_io_blocks[ddir];
 	}

* Recent changes (master)
@ 2016-05-11 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-11 12:00 UTC
  To: fio

The following changes since commit df96a39edc86394bca0643e9aa2a8f4dfc76c7c9:

  Change default IO engine from sync to psync (2016-05-09 13:35:09 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5c8f0ba56837a0b848cbbbc5a8673589d099ded3:

  verify: unroll string copy (2016-05-10 19:50:00 -0600)

----------------------------------------------------------------
Jens Axboe (6):
      engines/pmemblk: more cleanups
      ioengines: cleanup
      verify: increase state file name and log error on failure
      verify: escape '/' in state file name to '.'
      verify: only escape 'name', not prefix+name
      verify: unroll string copy

 engines/pmemblk.c | 165 ++++++++++++++++++++++++------------------------------
 ioengines.c       |  20 +++----
 verify-state.h    |  20 ++++++-
 verify.c          |   3 +-
 4 files changed, 102 insertions(+), 106 deletions(-)

---

Diff of recent changes:

diff --git a/engines/pmemblk.c b/engines/pmemblk.c
index ab4b769..6d19864 100644
--- a/engines/pmemblk.c
+++ b/engines/pmemblk.c
@@ -75,52 +75,59 @@
 struct PMEMblkpool_s;
 typedef struct PMEMblkpool_s PMEMblkpool;
 
-PMEMblkpool *(*pmemblk_create) (const char *, size_t, size_t, mode_t) = NULL;
-PMEMblkpool *(*pmemblk_open) (const char *, size_t) = NULL;
-void (*pmemblk_close) (PMEMblkpool *) = NULL;
-size_t(*pmemblk_nblock) (PMEMblkpool *) = NULL;
-size_t(*pmemblk_bsize) (PMEMblkpool *) = NULL;
-int (*pmemblk_read) (PMEMblkpool *, void *, off_t) = NULL;
-int (*pmemblk_write) (PMEMblkpool *, const void *, off_t) = NULL;
+static PMEMblkpool *(*pmemblk_create) (const char *, size_t, size_t, mode_t);
+static PMEMblkpool *(*pmemblk_open) (const char *, size_t);
+static void (*pmemblk_close) (PMEMblkpool *);
+static size_t(*pmemblk_nblock) (PMEMblkpool *);
+static size_t(*pmemblk_bsize) (PMEMblkpool *);
+static int (*pmemblk_read) (PMEMblkpool *, void *, off_t);
+static int (*pmemblk_write) (PMEMblkpool *, const void *, off_t);
 
 int load_libpmemblk(const char *path)
 {
 	void *dl;
 
-	if (NULL == path)
+	if (!path)
 		path = "libpmemblk.so";
 
 	dl = dlopen(path, RTLD_NOW | RTLD_NODELETE);
-	if (NULL == dl)
+	if (!dl)
 		goto errorout;
 
-	if (NULL == (pmemblk_create = dlsym(dl, "pmemblk_create")))
+	pmemblk_create = dlsym(dl, "pmemblk_create");
+	if (!pmemblk_create)
 		goto errorout;
-	if (NULL == (pmemblk_open = dlsym(dl, "pmemblk_open")))
+	pmemblk_open = dlsym(dl, "pmemblk_open");
+	if (!pmemblk_open)
 		goto errorout;
-	if (NULL == (pmemblk_close = dlsym(dl, "pmemblk_close")))
+	pmemblk_close = dlsym(dl, "pmemblk_close");
+	if (!pmemblk_close)
 		goto errorout;
-	if (NULL == (pmemblk_nblock = dlsym(dl, "pmemblk_nblock")))
+	pmemblk_nblock = dlsym(dl, "pmemblk_nblock");
+	if (!pmemblk_nblock)
 		goto errorout;
-	if (NULL == (pmemblk_bsize = dlsym(dl, "pmemblk_bsize")))
+	pmemblk_bsize = dlsym(dl, "pmemblk_bsize");
+	if (!pmemblk_bsize)
 		goto errorout;
-	if (NULL == (pmemblk_read = dlsym(dl, "pmemblk_read")))
+	pmemblk_read = dlsym(dl, "pmemblk_read");
+	if (!pmemblk_read)
 		goto errorout;
-	if (NULL == (pmemblk_write = dlsym(dl, "pmemblk_write")))
+	pmemblk_write = dlsym(dl, "pmemblk_write");
+	if (!pmemblk_write)
 		goto errorout;
 
 	return 0;
 
 errorout:
 	log_err("fio: unable to load libpmemblk: %s\n", dlerror());
-	if (NULL != dl)
+	if (dl)
 		dlclose(dl);
 
-	return (-1);
-
-}				/* load_libpmemblk() */
+	return -1;
+}
 
 typedef struct fio_pmemblk_file *fio_pmemblk_file_t;
+
 struct fio_pmemblk_file {
 	fio_pmemblk_file_t pmb_next;
 	char *pmb_filename;
@@ -134,7 +141,7 @@ struct fio_pmemblk_file {
 } while(0)
 #define FIOFILEPMBGET(_f)  ((fio_pmemblk_file_t)((_f)->engine_data))
 
-static fio_pmemblk_file_t Cache = NULL;
+static fio_pmemblk_file_t Cache;
 
 static pthread_mutex_t CacheLock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -145,21 +152,17 @@ fio_pmemblk_file_t fio_pmemblk_cache_lookup(const char *filename)
 	fio_pmemblk_file_t i;
 
 	for (i = Cache; i != NULL; i = i->pmb_next)
-		if (0 == strcmp(filename, i->pmb_filename))
+		if (!strcmp(filename, i->pmb_filename))
 			return i;
 
 	return NULL;
-
-}				/* fio_pmemblk_cache_lookup() */
+}
 
 static void fio_pmemblk_cache_insert(fio_pmemblk_file_t pmb)
 {
 	pmb->pmb_next = Cache;
 	Cache = pmb;
-
-	return;
-
-}				/* fio_pmemblk_cache_insert() */
+}
 
 static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
 {
@@ -177,10 +180,7 @@ static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
 			pmb->pmb_next = NULL;
 			return;
 		}
-
-	return;
-
-}				/* fio_pmemblk_cache_remove() */
+}
 
 /*
  * to control block size and gross file size at the libpmemblk
@@ -200,9 +200,8 @@ static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
  * note that the user should specify the file size in MiB, but
  * we return bytes from here.
  */
-static void
-pmb_parse_path(const char *pathspec,
-	       char **ppath, uint64_t * pbsize, uint64_t * pfsize)
+static void pmb_parse_path(const char *pathspec, char **ppath, uint64_t *pbsize,
+			   uint64_t *pfsize)
 {
 	char *path;
 	char *s;
@@ -210,7 +209,7 @@ pmb_parse_path(const char *pathspec,
 	uint64_t fsizemb;
 
 	path = strdup(pathspec);
-	if (NULL == path) {
+	if (!path) {
 		*ppath = NULL;
 		return;
 	}
@@ -234,12 +233,9 @@ pmb_parse_path(const char *pathspec,
 	*ppath = path;
 	*pbsize = 0;
 	*pfsize = 0;
-	return;
-
-}				/* pmb_parse_path() */
+}
 
-static
- fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
+static fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
 {
 	fio_pmemblk_file_t pmb;
 	char *path = NULL;
@@ -247,32 +243,30 @@ static
 	uint64_t fsize = 0;
 
 	pmb_parse_path(pathspec, &path, &bsize, &fsize);
-	if (NULL == path)
+	if (!path)
 		return NULL;
 
 	pthread_mutex_lock(&CacheLock);
 
 	pmb = fio_pmemblk_cache_lookup(path);
-
-	if (NULL == pmb) {
+	if (!pmb) {
 		/* load libpmemblk if needed */
-		if (NULL == pmemblk_open)
-			if (0 != load_libpmemblk(getenv("FIO_PMEMBLK_LIB")))
+		if (!pmemblk_open)
+			if (load_libpmemblk(getenv("FIO_PMEMBLK_LIB")))
 				goto error;
 
 		pmb = malloc(sizeof(*pmb));
-		if (NULL == pmb)
+		if (!pmb)
 			goto error;
 
 		/* try opening existing first, create it if needed */
 		pmb->pmb_pool = pmemblk_open(path, bsize);
-		if ((NULL == pmb->pmb_pool) &&
-		    (ENOENT == errno) &&
+		if (!pmb->pmb_pool && (errno == ENOENT) &&
 		    (flags & PMB_CREATE) && (0 < fsize) && (0 < bsize)) {
 			pmb->pmb_pool =
 			    pmemblk_create(path, bsize, fsize, 0644);
 		}
-		if (NULL == pmb->pmb_pool) {
+		if (!pmb->pmb_pool) {
 			log_err
 			    ("fio: enable to open pmemblk pool file (errno %d)\n",
 			     errno);
@@ -295,28 +289,27 @@ static
 	return pmb;
 
 error:
-	if (NULL != pmb) {
-		if (NULL != pmb->pmb_pool)
+	if (pmb) {
+		if (pmb->pmb_pool)
 			pmemblk_close(pmb->pmb_pool);
 		pmb->pmb_pool = NULL;
 		pmb->pmb_filename = NULL;
 		free(pmb);
 	}
-	if (NULL != path)
+	if (path)
 		free(path);
 
 	pthread_mutex_unlock(&CacheLock);
 	return NULL;
+}
 
-}				/* pmb_open() */
-
-static void pmb_close(fio_pmemblk_file_t pmb, const int keep)
+static void pmb_close(fio_pmemblk_file_t pmb, const bool keep)
 {
 	pthread_mutex_lock(&CacheLock);
 
 	pmb->pmb_refcnt--;
 
-	if (!keep && (0 == pmb->pmb_refcnt)) {
+	if (!keep && !pmb->pmb_refcnt) {
 		pmemblk_close(pmb->pmb_pool);
 		pmb->pmb_pool = NULL;
 		free(pmb->pmb_filename);
@@ -326,10 +319,9 @@ static void pmb_close(fio_pmemblk_file_t pmb, const int keep)
 	}
 
 	pthread_mutex_unlock(&CacheLock);
+}
 
-}				/* pmb_close() */
-
-static int pmb_get_flags(struct thread_data *td, uint64_t * pflags)
+static int pmb_get_flags(struct thread_data *td, uint64_t *pflags)
 {
 	static int thread_warned = 0;
 	static int odirect_warned = 0;
@@ -354,40 +346,35 @@ static int pmb_get_flags(struct thread_data *td, uint64_t * pflags)
 
 	(*pflags) = flags;
 	return 0;
-
-}				/* pmb_get_flags() */
+}
 
 static int fio_pmemblk_open_file(struct thread_data *td, struct fio_file *f)
 {
 	uint64_t flags = 0;
 	fio_pmemblk_file_t pmb;
 
-	if (0 != pmb_get_flags(td, &flags))
+	if (pmb_get_flags(td, &flags))
 		return 1;
 
 	pmb = pmb_open(f->file_name, flags);
-	if (NULL == pmb)
+	if (!pmb)
 		return 1;
 
 	FIOFILEPMBSET(f, pmb);
-
 	return 0;
+}
 
-}				/* fio_pmemblk_open_file() */
-
-static int
-fio_pmemblk_close_file(struct thread_data fio_unused * td, struct fio_file *f)
+static int fio_pmemblk_close_file(struct thread_data fio_unused *td,
+				  struct fio_file *f)
 {
 	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
 
 	if (pmb)
-		pmb_close(pmb, 0);
+		pmb_close(pmb, false);
 
 	FIOFILEPMBSET(f, NULL);
-
 	return 0;
-
-}				/* fio_pmemblk_close_file() */
+}
 
 static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 {
@@ -397,11 +384,11 @@ static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 	if (fio_file_size_known(f))
 		return 0;
 
-	if (NULL == pmb) {
-		if (0 != pmb_get_flags(td, &flags))
+	if (!pmb) {
+		if (pmb_get_flags(td, &flags))
 			return 1;
 		pmb = pmb_open(f->file_name, flags);
-		if (NULL == pmb)
+		if (!pmb)
 			return 1;
 	}
 
@@ -409,12 +396,11 @@ static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
 
 	fio_file_set_size_known(f);
 
-	if (NULL == FIOFILEPMBGET(f))
-		pmb_close(pmb, 1);
+	if (!FIOFILEPMBGET(f))
+		pmb_close(pmb, true);
 
 	return 0;
-
-}				/* fio_pmemblk_get_file_size() */
+}
 
 static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 {
@@ -437,9 +423,9 @@ static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 		len = io_u->xfer_buflen;
 
 		io_u->error = EINVAL;
-		if (0 != (off % pmb->pmb_bsize))
+		if (off % pmb->pmb_bsize)
 			break;
-		if (0 != (len % pmb->pmb_bsize))
+		if (len % pmb->pmb_bsize)
 			break;
 		if ((off + len) / pmb->pmb_bsize > pmb->pmb_nblocks)
 			break;
@@ -473,8 +459,7 @@ static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
 	}
 
 	return FIO_Q_COMPLETED;
-
-}				/* fio_pmemblk_queue() */
+}
 
 static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
 {
@@ -489,15 +474,13 @@ static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
 	 */
 
 	pmb_parse_path(f->file_name, &path, &bsize, &fsize);
-	if (NULL == path)
+	if (!path)
 		return 1;
 
 	unlink(path);
 	free(path);
-
 	return 0;
-
-}				/* fio_pmemblk_unlink_file() */
+}
 
 struct ioengine_ops ioengine = {
 	.name = "pmemblk",
@@ -510,14 +493,12 @@ struct ioengine_ops ioengine = {
 	.flags = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
 };
 
-static void
-fio_init fio_pmemblk_register(void)
+static void fio_init fio_pmemblk_register(void)
 {
 	register_ioengine(&ioengine);
 }
 
-static void
-fio_exit fio_pmemblk_unregister(void)
+static void fio_exit fio_pmemblk_unregister(void)
 {
 	unregister_ioengine(&ioengine);
 }
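The cleanup leaves the request validation in fio_pmemblk_queue() unchanged. An I/O is failed with EINVAL unless both its offset and its length are whole multiples of the pool block size, and unless its last block lies within the pool. The helper below is a hypothetical, standalone restatement of those three checks, not code from the engine:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if [off, off + len) is a valid pmemblk request for a pool of
 * 'nblocks' blocks of 'bsize' bytes each, EINVAL otherwise.
 */
static int pmb_check_request(uint64_t off, uint64_t len, uint64_t bsize,
			     uint64_t nblocks)
{
	if (off % bsize)			/* offset must be block aligned */
		return EINVAL;
	if (len % bsize)			/* length must be whole blocks */
		return EINVAL;
	if ((off + len) / bsize > nblocks)	/* must end inside the pool */
		return EINVAL;
	return 0;
}
```

With 512-byte blocks and an 8-block pool, a 4096-byte request at offset 0 fits exactly. The same request at offset 512 would end in block 9 and is rejected.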
diff --git a/ioengines.c b/ioengines.c
index b89a121..e2e7280 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -22,35 +22,31 @@
 
 static FLIST_HEAD(engine_list);
 
-static int check_engine_ops(struct ioengine_ops *ops)
+static bool check_engine_ops(struct ioengine_ops *ops)
 {
 	if (ops->version != FIO_IOOPS_VERSION) {
 		log_err("bad ioops version %d (want %d)\n", ops->version,
 							FIO_IOOPS_VERSION);
-		return 1;
+		return true;
 	}
 
 	if (!ops->queue) {
 		log_err("%s: no queue handler\n", ops->name);
-		return 1;
+		return true;
 	}
 
 	/*
 	 * sync engines only need a ->queue()
 	 */
 	if (ops->flags & FIO_SYNCIO)
-		return 0;
+		return false;
 
-	if (!ops->event) {
-		log_err("%s: no event handler\n", ops->name);
-		return 1;
-	}
-	if (!ops->getevents) {
-		log_err("%s: no getevents handler\n", ops->name);
-		return 1;
+	if (!ops->event || !ops->getevents) {
+		log_err("%s: no event/getevents handler\n", ops->name);
+		return true;
 	}
 
-	return 0;
+	return false;
 }
 
 void unregister_ioengine(struct ioengine_ops *ops)
diff --git a/verify-state.h b/verify-state.h
index f1dc069..901aa0a 100644
--- a/verify-state.h
+++ b/verify-state.h
@@ -2,6 +2,7 @@
 #define FIO_VERIFY_STATE_H
 
 #include <stdint.h>
+#include <string.h>
 
 struct thread_rand32_state {
 	uint32_t s[4];
@@ -82,7 +83,24 @@ static inline void verify_state_gen_name(char *out, size_t size,
 					 const char *name, const char *prefix,
 					 int num)
 {
-	snprintf(out, size, "%s-%s-%d-verify.state", prefix, name, num);
+	char ename[PATH_MAX];
+	char *ptr;
+
+	/*
+	 * Escape '/', just turn them into '.'
+	 */
+	ptr = ename;
+	do {
+		*ptr = *name;
+		if (*ptr == '\0')
+			break;
+		else if (*ptr == '/')
+			*ptr = '.';
+		ptr++;
+		name++;
+	} while (1);
+
+	snprintf(out, size, "%s-%s-%d-verify.state", prefix, ename, num);
 	out[size - 1] = '\0';
 }
 
diff --git a/verify.c b/verify.c
index 838db10..58f37ae 100644
--- a/verify.c
+++ b/verify.c
@@ -1471,7 +1471,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 static int open_state_file(const char *name, const char *prefix, int num,
 			   int for_write)
 {
-	char out[64];
+	char out[PATH_MAX];
 	int flags;
 	int fd;
 
@@ -1485,6 +1485,7 @@ static int open_state_file(const char *name, const char *prefix, int num,
 	fd = open(out, flags, 0644);
 	if (fd == -1) {
 		perror("fio: open state file");
+		log_err("fio: state file: %s (for_write=%d)\n", out, for_write);
 		return -1;
 	}
 

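verify_state_gen_name() now rewrites every '/' in the job or file name to '.', so a name such as /dev/sda can no longer introduce directory components into the state file name. open_state_file() also grows its buffer from 64 bytes to PATH_MAX. The copy loop in the patch assumes the name fits in ename[PATH_MAX]. The sketch below bounds that copy explicitly. It is a hypothetical variant with an assumed fixed 256-byte scratch buffer, not fio's function:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<prefix>-<name>-<num>-verify.state" with every '/' in 'name'
 * replaced by '.'. Names longer than the scratch buffer are truncated
 * rather than overflowing it; snprintf() always NUL-terminates 'out'.
 */
static void state_name(char *out, size_t size, const char *name,
		       const char *prefix, int num)
{
	char ename[256];	/* assumed size; the patch uses PATH_MAX */
	size_t i;

	for (i = 0; name[i] != '\0' && i < sizeof(ename) - 1; i++)
		ename[i] = (name[i] == '/') ? '.' : name[i];
	ename[i] = '\0';

	snprintf(out, size, "%s-%s-%d-verify.state", prefix, ename, num);
}
```

For example, name "/dev/sda" with prefix "local" and num 0 becomes "local-.dev.sda-0-verify.state".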
* Recent changes (master)
@ 2016-05-10 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-10 12:00 UTC
  To: fio

The following changes since commit d5bdff69e877a3f65928278df9d252d8881ff864:

  Makefile: fix path to tools/fiologparser.py (2016-05-06 17:10:33 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to df96a39edc86394bca0643e9aa2a8f4dfc76c7c9:

  Change default IO engine from sync to psync (2016-05-09 13:35:09 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      filesetup: ensure that we align file starting offset
      io_u: if we're doing backwards IO, wrap to end (not start)
      Change default IO engine from sync to psync

 filesetup.c |  8 ++++++--
 io_u.c      | 11 ++++++++---
 os/os.h     |  2 +-
 3 files changed, 15 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/filesetup.c b/filesetup.c
index 3fc1464..9c37ae5 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -761,12 +761,16 @@ static unsigned long long get_fs_free_counts(struct thread_data *td)
 uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 {
 	struct thread_options *o = &td->o;
+	uint64_t offset;
 
 	if (o->file_append && f->filetype == FIO_TYPE_FILE)
 		return f->real_file_size;
 
-	return td->o.start_offset +
-		td->subjob_number * td->o.offset_increment;
+	offset = td->o.start_offset + td->subjob_number * td->o.offset_increment;
+	if (offset % td_max_bs(td))
+		offset -= (offset % td_max_bs(td));
+
+	return offset;
 }
 
 /*
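get_start_offset() now rounds start_offset + subjob_number * offset_increment down to a multiple of the job's maximum block size. The patch guards the subtraction with `if (offset % td_max_bs(td))`, but subtracting a zero remainder is a no-op, so the rounding can be written as a single expression. A hypothetical standalone version follows; max_bs must be non-zero:

```c
#include <stdint.h>

/*
 * Starting offset for a subjob, rounded down to a multiple of max_bs
 * (mirrors the patched get_start_offset(), minus the file_append case).
 */
static uint64_t align_start_offset(uint64_t start_offset,
				   unsigned int subjob_number,
				   uint64_t offset_increment,
				   unsigned int max_bs)
{
	uint64_t offset = start_offset + subjob_number * offset_increment;

	return offset - (offset % max_bs);
}
```

With offset_increment=1000 and a 4 KiB maximum block size, subjob 3 starts at 0 rather than 3000, and start_offset=1000 with subjob 5 starts at 4096 rather than 6000.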
diff --git a/io_u.c b/io_u.c
index eb15dc2..f9870e7 100644
--- a/io_u.c
+++ b/io_u.c
@@ -371,10 +371,15 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
 			/*
 			 * If we reach beyond the end of the file
 			 * with holed IO, wrap around to the
-			 * beginning again.
+			 * beginning again. If we're doing backwards IO,
+			 * wrap to the end.
 			 */
-			if (pos >= f->real_file_size)
-				pos = f->file_offset;
+			if (pos >= f->real_file_size) {
+				if (o->ddir_seq_add > 0)
+					pos = f->file_offset;
+				else
+					pos = f->real_file_size + o->ddir_seq_add;
+			}
 		}
 
 		*offset = pos;
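The wrap test works for both directions because the position is unsigned. A negative ddir_seq_add that steps back past zero wraps to a huge value and trips the same `pos >= f->real_file_size` check as running off the end. The patch then restarts a forward job at the start of the file and a backward job one stride before the end. The model below is simplified and hypothetical; it ignores how fio derives pos from last_pos and file_offset:

```c
#include <stdint.h>

/*
 * Advance 'pos' by the signed stride 'seq_add' for holed sequential I/O,
 * wrapping like the patched get_next_seq_offset(). Unsigned addition is
 * modular in C, so going below 0 yields a value >= file_size.
 */
static uint64_t next_holed_pos(uint64_t pos, int64_t seq_add,
			       uint64_t file_size, uint64_t file_offset)
{
	pos += (uint64_t) seq_add;

	if (pos >= file_size) {
		if (seq_add > 0)
			pos = file_offset;			/* forward: restart at the beginning */
		else
			pos = file_size + (uint64_t) seq_add;	/* backward: one stride before the end */
	}
	return pos;
}
```

In a 1 MiB file with a stride of -8192, position 4096 wraps to 1040384. Moving forward by 8192 from 1044480 wraps back to 0.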
diff --git a/os/os.h b/os/os.h
index 02ab40d..9877383 100644
--- a/os/os.h
+++ b/os/os.h
@@ -151,7 +151,7 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 #endif
 
 #ifndef FIO_PREFERRED_ENGINE
-#define FIO_PREFERRED_ENGINE	"sync"
+#define FIO_PREFERRED_ENGINE	"psync"
 #endif
 
 #ifndef FIO_OS_PATH_SEPARATOR

* Recent changes (master)
@ 2016-05-07 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-07 12:00 UTC
  To: fio

The following changes since commit 604577f1329b617d724d6712868d344a5adf5251:

  libfio: clear iops/bw sample times on stats reset (2016-05-05 10:55:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to d5bdff69e877a3f65928278df9d252d8881ff864:

  Makefile: fix path to tools/fiologparser.py (2016-05-06 17:10:33 -0600)

----------------------------------------------------------------
Brian Boylston (4):
      add pmemblk engine
      add an example job file for pmemblk
      pmemblk: remove comments about an external engine
      pmemblk: don't use #defines for the pmemblk_* functions

Jens Axboe (13):
      Improve logging accuracy
      stat: remove debug statement
      Makefile: add tools/fiologpaser.py
      Merge branch 'libpmemblk' of https://github.com/bgbhpe/fio
      engines/pmeblk: fixup coding style
      engines/pmemblk: get rid of CACHE_LOCK/UNLOCK defines
      Wire up pmemblk
      Merge branch 'logging'
      helper_thread: split into separate file
      os/os-mac: kill unused code
      diskutil: adapt to new helper_thread functions
      Fix typo in tools/fiologparser.py
      Makefile: fix path to tools/fiologparser.py

Mark Nelson (1):
      added fio log parser tool.

 HOWTO                 |  11 +-
 Makefile              |   7 +-
 backend.c             |  87 +--------
 configure             |  11 ++
 diskutil.c            |   3 +-
 diskutil.h            |   5 +-
 engines/pmemblk.c     | 523 ++++++++++++++++++++++++++++++++++++++++++++++++++
 examples/pmemblk.fio  |  71 +++++++
 fio.1                 |  13 +-
 fio.h                 |   2 -
 fio_time.h            |   1 +
 helper_thread.c       | 167 ++++++++++++++++
 helper_thread.h       |  11 ++
 init.c                |  10 +
 io_u.c                |  18 +-
 iolog.c               |  89 +++++++--
 iolog.h               |  10 +-
 libfio.c              |   2 +
 options.c             |   6 +
 os/os-mac.h           |  69 -------
 stat.c                | 152 ++++++++++++---
 stat.h                |   9 +-
 time.c                |   9 +
 tools/fiologparser.py | 152 +++++++++++++++
 workqueue.c           |   5 +-
 25 files changed, 1213 insertions(+), 230 deletions(-)
 create mode 100644 engines/pmemblk.c
 create mode 100644 examples/pmemblk.fio
 create mode 100644 helper_thread.c
 create mode 100644 helper_thread.h
 create mode 100755 tools/fiologparser.py

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 1f523d3..88d10a1 100644
--- a/HOWTO
+++ b/HOWTO
@@ -798,6 +798,9 @@ ioengine=str	Defines how the job issues io to the file. The following
 				overwriting. The writetrim mode works well
 				for this constraint.
 
+			pmemblk	Read and write through the NVML libpmemblk
+				interface.
+
 			external Prefix to specify loading an external
 				IO engine object file. Append the engine
 				filename, eg ioengine=external:/tmp/foo.o
@@ -1263,10 +1266,14 @@ exitall_on_error	When one job finishes in error, terminate the rest. The
 		default is to wait for each job to finish.
 
 bwavgtime=int	Average the calculated bandwidth over the given time. Value
-		is specified in milliseconds.
+		is specified in milliseconds. If the job also does bandwidth
+		logging through 'write_bw_log', then the minimum of this option
+		and 'log_avg_msec' will be used.  Default: 500ms.
 
 iopsavgtime=int	Average the calculated IOPS over the given time. Value
-		is specified in milliseconds.
+		is specified in milliseconds. If the job also does IOPS logging
+		through 'write_iops_log', then the minimum of this option and
+		'log_avg_msec' will be used.  Default: 500ms.
 
 create_serialize=bool	If true, serialize the file creating for the jobs.
 			This may be handy to avoid interleaving of data
diff --git a/Makefile b/Makefile
index 007ae40..0133ac4 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ OPTFLAGS= -g -ffast-math
 CFLAGS	= -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement $(OPTFLAGS) $(EXTFLAGS) $(BUILD_CFLAGS) -I. -I$(SRCDIR)
 LIBS	+= -lm $(EXTLIBS)
 PROGS	= fio
-SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio)
+SCRIPTS = $(addprefix $(SRCDIR)/,tools/fio_generate_plots tools/plot/fio2gnuplot tools/genfio tools/fiologparser.py)
 
 ifndef CONFIG_FIO_NO_OPT
   CFLAGS += -O3
@@ -45,7 +45,7 @@ SOURCE :=	$(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
 		server.c client.c iolog.c backend.c libfio.c flow.c cconv.c \
 		gettime-thread.c helpers.c json.c idletime.c td_error.c \
 		profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
-		workqueue.c rate-submit.c optgroup.c
+		workqueue.c rate-submit.c optgroup.c helper_thread.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
@@ -124,6 +124,9 @@ ifdef CONFIG_MTD
   SOURCE += oslib/libmtd.c
   SOURCE += oslib/libmtd_legacy.c
 endif
+ifdef CONFIG_PMEMBLK
+  SOURCE += engines/pmemblk.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
diff --git a/backend.c b/backend.c
index 1723b8f..7de6f65 100644
--- a/backend.c
+++ b/backend.c
@@ -57,11 +57,7 @@
 #include "workqueue.h"
 #include "lib/mountcheck.h"
 #include "rate-submit.h"
-
-static pthread_t helper_thread;
-static pthread_mutex_t helper_lock;
-pthread_cond_t helper_cond;
-int helper_do_stat = 0;
+#include "helper_thread.h"
 
 static struct fio_mutex *startup_mutex;
 static struct flist_head *cgroup_list;
@@ -79,7 +75,6 @@ unsigned int stat_number = 0;
 int shm_id = 0;
 int temp_stall_ts;
 unsigned long done_secs = 0;
-volatile int helper_exit = 0;
 
 #define PAGE_ALIGN(buf)	\
 	(char *) (((uintptr_t) (buf) + page_mask) & ~page_mask)
@@ -1722,7 +1717,7 @@ static void *thread_main(void *data)
 
 	fio_unpin_memory(td);
 
-	fio_writeout_logs(td);
+	td_writeout_logs(td, true);
 
 	iolog_compress_exit(td);
 	rate_submit_exit(td);
@@ -2319,82 +2314,10 @@ reap:
 	update_io_ticks();
 }
 
-static void wait_for_helper_thread_exit(void)
-{
-	void *ret;
-
-	helper_exit = 1;
-	pthread_cond_signal(&helper_cond);
-	pthread_join(helper_thread, &ret);
-}
-
 static void free_disk_util(void)
 {
 	disk_util_prune_entries();
-
-	pthread_cond_destroy(&helper_cond);
-}
-
-static void *helper_thread_main(void *data)
-{
-	struct sk_out *sk_out = data;
-	int ret = 0;
-
-	sk_out_assign(sk_out);
-
-	fio_mutex_up(startup_mutex);
-
-	while (!ret) {
-		uint64_t sec = DISK_UTIL_MSEC / 1000;
-		uint64_t nsec = (DISK_UTIL_MSEC % 1000) * 1000000;
-		struct timespec ts;
-		struct timeval tv;
-
-		gettimeofday(&tv, NULL);
-		ts.tv_sec = tv.tv_sec + sec;
-		ts.tv_nsec = (tv.tv_usec * 1000) + nsec;
-
-		if (ts.tv_nsec >= 1000000000ULL) {
-			ts.tv_nsec -= 1000000000ULL;
-			ts.tv_sec++;
-		}
-
-		pthread_cond_timedwait(&helper_cond, &helper_lock, &ts);
-
-		ret = update_io_ticks();
-
-		if (helper_do_stat) {
-			helper_do_stat = 0;
-			__show_running_run_stats();
-		}
-
-		if (!is_backend)
-			print_thread_status();
-	}
-
-	sk_out_drop();
-	return NULL;
-}
-
-static int create_helper_thread(struct sk_out *sk_out)
-{
-	int ret;
-
-	setup_disk_util();
-
-	pthread_cond_init(&helper_cond, NULL);
-	pthread_mutex_init(&helper_lock, NULL);
-
-	ret = pthread_create(&helper_thread, NULL, helper_thread_main, sk_out);
-	if (ret) {
-		log_err("Can't create helper thread: %s\n", strerror(ret));
-		return 1;
-	}
-
-	dprint(FD_MUTEX, "wait on startup_mutex\n");
-	fio_mutex_down(startup_mutex);
-	dprint(FD_MUTEX, "done waiting on startup_mutex\n");
-	return 0;
+	helper_thread_destroy();
 }
 
 int fio_backend(struct sk_out *sk_out)
@@ -2427,14 +2350,14 @@ int fio_backend(struct sk_out *sk_out)
 
 	set_genesis_time();
 	stat_init();
-	create_helper_thread(sk_out);
+	helper_thread_create(startup_mutex, sk_out);
 
 	cgroup_list = smalloc(sizeof(*cgroup_list));
 	INIT_FLIST_HEAD(cgroup_list);
 
 	run_threads(sk_out);
 
-	wait_for_helper_thread_exit();
+	helper_thread_exit();
 
 	if (!fio_abort) {
 		__show_run_stats();
diff --git a/configure b/configure
index 6e2488c..5f6bca3 100755
--- a/configure
+++ b/configure
@@ -135,6 +135,7 @@ show_help="no"
 exit_val=0
 gfio_check="no"
 libhdfs="no"
+pmemblk="no"
 disable_lex=""
 prefix=/usr/local
 
@@ -169,6 +170,8 @@ for opt do
   ;;
   --enable-libhdfs) libhdfs="yes"
   ;;
+  --enable-pmemblk) pmemblk="yes"
+  ;;
   --disable-lex) disable_lex="yes"
   ;;
   --enable-lex) disable_lex="no"
@@ -199,6 +202,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-numa         Disable libnuma even if found"
   echo "--disable-gfapi        Disable gfapi"
   echo "--enable-libhdfs       Enable hdfs support"
+  echo "--enable-pmemblk       Enable NVML libpmemblk support"
   echo "--disable-lex          Disable use of lex/yacc for math"
   echo "--enable-lex           Enable use of lex/yacc for math"
   echo "--disable-shm          Disable SHM support"
@@ -1479,6 +1483,10 @@ if compile_prog "" "" "mtd"; then
 fi
 echo "MTD                           $mtd"
 
+##########################################
+# Report whether pmemblk engine is enabled
+echo "NVML libpmemblk engine        $pmemblk"
+
 # Check if we have lex/yacc available
 yacc="no"
 yacc_is_bison="no"
@@ -1795,6 +1803,9 @@ if test "$libhdfs" = "yes" ; then
 if test "$mtd" = "yes" ; then
   output_sym "CONFIG_MTD"
 fi
+if test "$pmemblk" = "yes" ; then
+  output_sym "CONFIG_PMEMBLK"
+fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
diff --git a/diskutil.c b/diskutil.c
index c25c5c9..8031d5d 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -11,6 +11,7 @@
 #include "fio.h"
 #include "smalloc.h"
 #include "diskutil.h"
+#include "helper_thread.h"
 
 static int last_majdev, last_mindev;
 static struct disk_util *last_du;
@@ -121,7 +122,7 @@ int update_io_ticks(void)
 
 	fio_mutex_down(disk_util_mutex);
 
-	if (!helper_exit) {
+	if (!helper_should_exit()) {
 		flist_for_each(entry, &disk_list) {
 			du = flist_entry(entry, struct disk_util, list);
 			update_io_tick_disk(du);
diff --git a/diskutil.h b/diskutil.h
index 25d0beb..ff8a5b0 100644
--- a/diskutil.h
+++ b/diskutil.h
@@ -4,8 +4,7 @@
 #define FIO_DU_NAME_SZ		64
 
 #include "lib/output_buffer.h"
-
-extern volatile int helper_exit;
+#include "helper_thread.h"
 
 struct disk_util_stats {
 	uint64_t ios[2];
@@ -129,7 +128,7 @@ static inline void print_disk_util(struct disk_util_stat *du,
 
 static inline int update_io_ticks(void)
 {
-	return helper_exit;
+	return helper_should_exit();
 }
 #endif
 
diff --git a/engines/pmemblk.c b/engines/pmemblk.c
new file mode 100644
index 0000000..ab4b769
--- /dev/null
+++ b/engines/pmemblk.c
@@ -0,0 +1,523 @@
+/*
+ * pmemblk: IO engine that uses NVML libpmemblk to read and write data
+ *
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the Free
+ * Software Foundation, Inc., 59 Temple Place, Suite 330,
+ * Boston, MA 02111-1307 USA
+ */
+
+/*
+ * pmemblk engine
+ *
+ * IO engine that uses libpmemblk to read and write data
+ *
+ * To use:
+ *   ioengine=pmemblk
+ *
+ * Other relevant settings:
+ *   iodepth=1
+ *   direct=1
+ *   thread=1   REQUIRED
+ *   unlink=1
+ *   filename=/pmem0/fiotestfile,BSIZE,FSIZEMB
+ *
+ *   thread must be set to 1 for pmemblk as multiple processes cannot
+ *     open the same block pool file.
+ *
+ *   iodepth should be set to 1 as pmemblk is always synchronous.
+ *   Use numjobs to scale up.
+ *
+ *   direct=1 is implied as pmemblk is always direct.
+ *
+ *   Can set unlink to 1 to remove the block pool file after testing.
+ *
+ *   When specifying the filename, if the block pool file does not already
+ *   exist, then the pmemblk engine can create the pool file if you specify
+ *   the block and file sizes.  BSIZE is the block size in bytes.
+ *   FSIZEMB is the pool file size in MB.
+ *
+ *   See examples/pmemblk.fio for more.
+ *
+ * libpmemblk.so
+ *   By default, the pmemblk engine will let the system find the libpmemblk.so
+ *   that it uses.  You can use an alternative libpmemblk by setting the
+ *   FIO_PMEMBLK_LIB environment variable to the full path to the desired
+ *   libpmemblk.so.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/uio.h>
+#include <errno.h>
+#include <assert.h>
+#include <dlfcn.h>
+#include <string.h>
+
+#include "../fio.h"
+
+/*
+ * libpmemblk
+ */
+struct PMEMblkpool_s;
+typedef struct PMEMblkpool_s PMEMblkpool;
+
+PMEMblkpool *(*pmemblk_create) (const char *, size_t, size_t, mode_t) = NULL;
+PMEMblkpool *(*pmemblk_open) (const char *, size_t) = NULL;
+void (*pmemblk_close) (PMEMblkpool *) = NULL;
+size_t(*pmemblk_nblock) (PMEMblkpool *) = NULL;
+size_t(*pmemblk_bsize) (PMEMblkpool *) = NULL;
+int (*pmemblk_read) (PMEMblkpool *, void *, off_t) = NULL;
+int (*pmemblk_write) (PMEMblkpool *, const void *, off_t) = NULL;
+
+int load_libpmemblk(const char *path)
+{
+	void *dl;
+
+	if (NULL == path)
+		path = "libpmemblk.so";
+
+	dl = dlopen(path, RTLD_NOW | RTLD_NODELETE);
+	if (NULL == dl)
+		goto errorout;
+
+	if (NULL == (pmemblk_create = dlsym(dl, "pmemblk_create")))
+		goto errorout;
+	if (NULL == (pmemblk_open = dlsym(dl, "pmemblk_open")))
+		goto errorout;
+	if (NULL == (pmemblk_close = dlsym(dl, "pmemblk_close")))
+		goto errorout;
+	if (NULL == (pmemblk_nblock = dlsym(dl, "pmemblk_nblock")))
+		goto errorout;
+	if (NULL == (pmemblk_bsize = dlsym(dl, "pmemblk_bsize")))
+		goto errorout;
+	if (NULL == (pmemblk_read = dlsym(dl, "pmemblk_read")))
+		goto errorout;
+	if (NULL == (pmemblk_write = dlsym(dl, "pmemblk_write")))
+		goto errorout;
+
+	return 0;
+
+errorout:
+	log_err("fio: unable to load libpmemblk: %s\n", dlerror());
+	if (NULL != dl)
+		dlclose(dl);
+
+	return (-1);
+
+}				/* load_libpmemblk() */
+
+typedef struct fio_pmemblk_file *fio_pmemblk_file_t;
+struct fio_pmemblk_file {
+	fio_pmemblk_file_t pmb_next;
+	char *pmb_filename;
+	uint64_t pmb_refcnt;
+	PMEMblkpool *pmb_pool;
+	size_t pmb_bsize;
+	size_t pmb_nblocks;
+};
+#define FIOFILEPMBSET(_f, _v)  do {                 \
+	(_f)->engine_data = (uint64_t)(uintptr_t)(_v);  \
+} while(0)
+#define FIOFILEPMBGET(_f)  ((fio_pmemblk_file_t)((_f)->engine_data))
+
+static fio_pmemblk_file_t Cache = NULL;
+
+static pthread_mutex_t CacheLock = PTHREAD_MUTEX_INITIALIZER;
+
+#define PMB_CREATE   (0x0001)	/* should create file */
+
+fio_pmemblk_file_t fio_pmemblk_cache_lookup(const char *filename)
+{
+	fio_pmemblk_file_t i;
+
+	for (i = Cache; i != NULL; i = i->pmb_next)
+		if (0 == strcmp(filename, i->pmb_filename))
+			return i;
+
+	return NULL;
+
+}				/* fio_pmemblk_cache_lookup() */
+
+static void fio_pmemblk_cache_insert(fio_pmemblk_file_t pmb)
+{
+	pmb->pmb_next = Cache;
+	Cache = pmb;
+
+	return;
+
+}				/* fio_pmemblk_cache_insert() */
+
+static void fio_pmemblk_cache_remove(fio_pmemblk_file_t pmb)
+{
+	fio_pmemblk_file_t i;
+
+	if (pmb == Cache) {
+		Cache = Cache->pmb_next;
+		pmb->pmb_next = NULL;
+		return;
+	}
+
+	for (i = Cache; i != NULL; i = i->pmb_next)
+		if (pmb == i->pmb_next) {
+			i->pmb_next = i->pmb_next->pmb_next;
+			pmb->pmb_next = NULL;
+			return;
+		}
+
+	return;
+
+}				/* fio_pmemblk_cache_remove() */
+
+/*
+ * to control block size and gross file size at the libpmemblk
+ * level, we allow the block size and file size to be appended
+ * to the file name:
+ *
+ *   path[,bsize,fsizemb]
+ *
+ * note that we do not use the fio option "filesize" to dictate
+ * the file size because we can only give libpmemblk the gross
+ * file size, which is different from the net or usable file
+ * size (which is probably what fio wants).
+ *
+ * the final path without the parameters is returned in ppath.
+ * the block size and file size are returned in pbsize and fsize.
+ *
+ * note that the user should specify the file size in MiB, but
+ * we return bytes from here.
+ */
+static void
+pmb_parse_path(const char *pathspec,
+	       char **ppath, uint64_t * pbsize, uint64_t * pfsize)
+{
+	char *path;
+	char *s;
+	uint64_t bsize;
+	uint64_t fsizemb;
+
+	path = strdup(pathspec);
+	if (NULL == path) {
+		*ppath = NULL;
+		return;
+	}
+
+	/* extract sizes, if given */
+	s = strrchr(path, ',');
+	if (s && (fsizemb = strtoull(s + 1, NULL, 10))) {
+		*s = 0;
+		s = strrchr(path, ',');
+		if (s && (bsize = strtoull(s + 1, NULL, 10))) {
+			*s = 0;
+			*ppath = path;
+			*pbsize = bsize;
+			*pfsize = fsizemb << 20;
+			return;
+		}
+	}
+
+	/* size specs not found */
+	strcpy(path, pathspec);
+	*ppath = path;
+	*pbsize = 0;
+	*pfsize = 0;
+	return;
+
+}				/* pmb_parse_path() */
+
+static
+ fio_pmemblk_file_t pmb_open(const char *pathspec, int flags)
+{
+	fio_pmemblk_file_t pmb;
+	char *path = NULL;
+	uint64_t bsize = 0;
+	uint64_t fsize = 0;
+
+	pmb_parse_path(pathspec, &path, &bsize, &fsize);
+	if (NULL == path)
+		return NULL;
+
+	pthread_mutex_lock(&CacheLock);
+
+	pmb = fio_pmemblk_cache_lookup(path);
+
+	if (NULL == pmb) {
+		/* load libpmemblk if needed */
+		if (NULL == pmemblk_open)
+			if (0 != load_libpmemblk(getenv("FIO_PMEMBLK_LIB")))
+				goto error;
+
+		pmb = malloc(sizeof(*pmb));
+		if (NULL == pmb)
+			goto error;
+
+		/* try opening existing first, create it if needed */
+		pmb->pmb_pool = pmemblk_open(path, bsize);
+		if ((NULL == pmb->pmb_pool) &&
+		    (ENOENT == errno) &&
+		    (flags & PMB_CREATE) && (0 < fsize) && (0 < bsize)) {
+			pmb->pmb_pool =
+			    pmemblk_create(path, bsize, fsize, 0644);
+		}
+		if (NULL == pmb->pmb_pool) {
+			log_err
+			    ("fio: unable to open pmemblk pool file (errno %d)\n",
+			     errno);
+			goto error;
+		}
+
+		pmb->pmb_filename = path;
+		pmb->pmb_next = NULL;
+		pmb->pmb_refcnt = 0;
+		pmb->pmb_bsize = pmemblk_bsize(pmb->pmb_pool);
+		pmb->pmb_nblocks = pmemblk_nblock(pmb->pmb_pool);
+
+		fio_pmemblk_cache_insert(pmb);
+	}
+
+	pmb->pmb_refcnt += 1;
+
+	pthread_mutex_unlock(&CacheLock);
+
+	return pmb;
+
+error:
+	if (NULL != pmb) {
+		if (NULL != pmb->pmb_pool)
+			pmemblk_close(pmb->pmb_pool);
+		pmb->pmb_pool = NULL;
+		pmb->pmb_filename = NULL;
+		free(pmb);
+	}
+	if (NULL != path)
+		free(path);
+
+	pthread_mutex_unlock(&CacheLock);
+	return NULL;
+
+}				/* pmb_open() */
+
+static void pmb_close(fio_pmemblk_file_t pmb, const int keep)
+{
+	pthread_mutex_lock(&CacheLock);
+
+	pmb->pmb_refcnt--;
+
+	if (!keep && (0 == pmb->pmb_refcnt)) {
+		pmemblk_close(pmb->pmb_pool);
+		pmb->pmb_pool = NULL;
+		free(pmb->pmb_filename);
+		pmb->pmb_filename = NULL;
+		fio_pmemblk_cache_remove(pmb);
+		free(pmb);
+	}
+
+	pthread_mutex_unlock(&CacheLock);
+
+}				/* pmb_close() */
+
+static int pmb_get_flags(struct thread_data *td, uint64_t * pflags)
+{
+	static int thread_warned = 0;
+	static int odirect_warned = 0;
+
+	uint64_t flags = 0;
+
+	if (!td->o.use_thread) {
+		if (!thread_warned) {
+			thread_warned = 1;
+			log_err("fio: must set thread=1 for pmemblk engine\n");
+		}
+		return 1;
+	}
+
+	if (!td->o.odirect && !odirect_warned) {
+		odirect_warned = 1;
+		log_info("fio: direct == 0, but pmemblk is always direct\n");
+	}
+
+	if (td->o.allow_create)
+		flags |= PMB_CREATE;
+
+	(*pflags) = flags;
+	return 0;
+
+}				/* pmb_get_flags() */
+
+static int fio_pmemblk_open_file(struct thread_data *td, struct fio_file *f)
+{
+	uint64_t flags = 0;
+	fio_pmemblk_file_t pmb;
+
+	if (0 != pmb_get_flags(td, &flags))
+		return 1;
+
+	pmb = pmb_open(f->file_name, flags);
+	if (NULL == pmb)
+		return 1;
+
+	FIOFILEPMBSET(f, pmb);
+
+	return 0;
+
+}				/* fio_pmemblk_open_file() */
+
+static int
+fio_pmemblk_close_file(struct thread_data fio_unused * td, struct fio_file *f)
+{
+	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+
+	if (pmb)
+		pmb_close(pmb, 0);
+
+	FIOFILEPMBSET(f, NULL);
+
+	return 0;
+
+}				/* fio_pmemblk_close_file() */
+
+static int fio_pmemblk_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+	uint64_t flags = 0;
+	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	if (NULL == pmb) {
+		if (0 != pmb_get_flags(td, &flags))
+			return 1;
+		pmb = pmb_open(f->file_name, flags);
+		if (NULL == pmb)
+			return 1;
+	}
+
+	f->real_file_size = pmb->pmb_bsize * pmb->pmb_nblocks;
+
+	fio_file_set_size_known(f);
+
+	if (NULL == FIOFILEPMBGET(f))
+		pmb_close(pmb, 1);
+
+	return 0;
+
+}				/* fio_pmemblk_get_file_size() */
+
+static int fio_pmemblk_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct fio_file *f = io_u->file;
+	fio_pmemblk_file_t pmb = FIOFILEPMBGET(f);
+
+	unsigned long long off;
+	unsigned long len;
+	void *buf;
+	int (*blkop) (PMEMblkpool *, void *, off_t) = (void *)pmemblk_write;
+
+	fio_ro_check(td, io_u);
+
+	switch (io_u->ddir) {
+	case DDIR_READ:
+		blkop = pmemblk_read;
+		/* fall through */
+	case DDIR_WRITE:
+		off = io_u->offset;
+		len = io_u->xfer_buflen;
+
+		io_u->error = EINVAL;
+		if (0 != (off % pmb->pmb_bsize))
+			break;
+		if (0 != (len % pmb->pmb_bsize))
+			break;
+		if ((off + len) / pmb->pmb_bsize > pmb->pmb_nblocks)
+			break;
+
+		io_u->error = 0;
+		buf = io_u->xfer_buf;
+		off /= pmb->pmb_bsize;
+		len /= pmb->pmb_bsize;
+		while (0 < len) {
+			if (0 != blkop(pmb->pmb_pool, buf, off)) {
+				io_u->error = errno;
+				break;
+			}
+			buf += pmb->pmb_bsize;
+			off++;
+			len--;
+		}
+		off *= pmb->pmb_bsize;
+		len *= pmb->pmb_bsize;
+		io_u->resid = io_u->xfer_buflen - (off - io_u->offset);
+		break;
+	case DDIR_SYNC:
+	case DDIR_DATASYNC:
+	case DDIR_SYNC_FILE_RANGE:
+		/* we're always sync'd */
+		io_u->error = 0;
+		break;
+	default:
+		io_u->error = EINVAL;
+		break;
+	}
+
+	return FIO_Q_COMPLETED;
+
+}				/* fio_pmemblk_queue() */
+
+static int fio_pmemblk_unlink_file(struct thread_data *td, struct fio_file *f)
+{
+	char *path = NULL;
+	uint64_t bsize = 0;
+	uint64_t fsize = 0;
+
+	/*
+	 * we need our own unlink in case the user has specified
+	 * the block and file sizes in the path name.  we parse
+	 * the file_name to determine the file name we actually used.
+	 */
+
+	pmb_parse_path(f->file_name, &path, &bsize, &fsize);
+	if (NULL == path)
+		return 1;
+
+	unlink(path);
+	free(path);
+
+	return 0;
+
+}				/* fio_pmemblk_unlink_file() */
+
+struct ioengine_ops ioengine = {
+	.name = "pmemblk",
+	.version = FIO_IOOPS_VERSION,
+	.queue = fio_pmemblk_queue,
+	.open_file = fio_pmemblk_open_file,
+	.close_file = fio_pmemblk_close_file,
+	.get_file_size = fio_pmemblk_get_file_size,
+	.unlink_file = fio_pmemblk_unlink_file,
+	.flags = FIO_SYNCIO | FIO_DISKLESSIO | FIO_NOEXTEND | FIO_NODISKUTIL,
+};
+
+static void
+fio_init fio_pmemblk_register(void)
+{
+	register_ioengine(&ioengine);
+}
+
+static void
+fio_exit fio_pmemblk_unregister(void)
+{
+	unregister_ioengine(&ioengine);
+}
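The offset and length checks in fio_pmemblk_queue() can be modeled on their own. The following is a minimal sketch, not part of the patch: struct blk_range and pmb_map_range() are hypothetical names, and the function only reproduces the arithmetic. The byte offset and length must both be multiples of the pool block size, the range must end within the pool, and the request is then issued as a run of whole-block operations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result of mapping a byte range onto libpmemblk block numbers. */
struct blk_range {
	unsigned long long first_block; /* index of the first block to transfer */
	unsigned long long nr_blocks;   /* number of whole blocks to transfer */
};

/*
 * Map a byte offset/length onto whole blocks, mirroring the checks in
 * fio_pmemblk_queue(): returns 0 on success, or EINVAL if the range is
 * misaligned or extends past the end of the pool.
 */
static int pmb_map_range(unsigned long long off, unsigned long len,
			 size_t bsize, size_t nblocks, struct blk_range *out)
{
	assert(out != NULL);

	if (bsize == 0)
		return EINVAL;
	if (off % bsize != 0 || len % bsize != 0)
		return EINVAL;
	if ((off + len) / bsize > nblocks)
		return EINVAL;

	out->first_block = off / bsize;
	out->nr_blocks = len / bsize;
	return 0;
}
```

With 4096-byte blocks and a 10-block pool, an 8192-byte request at offset 32768 maps to blocks 8 and 9. The same request at offset 36864 would end past the last block, so it is rejected.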
diff --git a/examples/pmemblk.fio b/examples/pmemblk.fio
new file mode 100644
index 0000000..2d5ecfc
--- /dev/null
+++ b/examples/pmemblk.fio
@@ -0,0 +1,71 @@
+[global]
+bs=1m
+ioengine=pmemblk
+norandommap
+time_based=1
+runtime=30
+group_reporting
+disable_lat=1
+disable_slat=1
+disable_clat=1
+clat_percentiles=0
+cpus_allowed_policy=split
+
+# For the pmemblk engine:
+#
+#   IOs always complete immediately
+#   IOs are always direct
+#   Must use threads
+#
+iodepth=1
+direct=1
+thread=1
+numjobs=16
+#
+# Unlink can be used to remove the files when done, but if you are
+# using serial runs with stonewall, and you want the files to be created
+# only once and unlinked only at the very end, then put the unlink=1
+# in the last group.  This is the method demonstrated here.
+#
+# Note that if you have a read-only group and if the files will be
+# newly created, then all of the data will read back as zero and the
+# read will be optimized, yielding performance that is different from
+# that of reading non-zero blocks (or unoptimized zero blocks).
+#
+unlink=0
+#
+# The pmemblk engine does IO to files in a DAX-mounted filesystem.
+# The filesystem should be created on an NVDIMM (e.g. /dev/pmem0)
+# and then mounted with the '-o dax' option.  Note that the engine
+# accesses the underlying NVDIMM directly, bypassing the kernel block
+# layer, so the usual filesystem/disk performance monitoring tools such
+# as iostat will not provide useful data.
+#
+# Here we specify a test file on each of two NVDIMMs.  The first
+# number after the file name is the block size in bytes (4096 bytes
+# in this example).  The second number is the size of the file to
+# create in MiB (1 GiB in this example); note that the actual usable
+# space available to fio will be less than this as libpmemblk requires
+# some space for metadata.
+#
+# Currently, the minimum block size is 512 bytes and the minimum file
+# size is about 17 MiB (these are libpmemblk requirements).
+#
+# While both files in this example have the same block size and file
+# size, this is not required.
+#
+filename=/pmem0/fio-test,4096,1024
+filename=/pmem1/fio-test,4096,1024
+
+[pmemblk-write]
+rw=randwrite
+stonewall
+
+[pmemblk-read]
+rw=randread
+stonewall
+#
+# We're done, so unlink the files:
+#
+unlink=1
+
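The filename= lines above append the block size in bytes and the pool size in MiB to the path. The sketch below models that path[,bsize,fsizemb] convention, which pmb_parse_path() in engines/pmemblk.c parses, as a standalone C function. The names parse_pmemblk_spec() and struct pmemblk_spec are hypothetical. Like the engine, the function strips the suffix only when both trailing numbers are present and non-zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "path[,bsize,fsizemb]". */
struct pmemblk_spec {
	char *path;     /* heap-allocated; caller frees */
	uint64_t bsize; /* block size in bytes, 0 if not given */
	uint64_t fsize; /* pool size in bytes (MiB << 20), 0 if not given */
};

/* Returns 0 on success, -1 if memory allocation fails. */
static int parse_pmemblk_spec(const char *spec, struct pmemblk_spec *out)
{
	char *path = strdup(spec);
	char *s;
	uint64_t bsize, fsizemb;

	assert(out != NULL);
	if (path == NULL)
		return -1;

	out->bsize = 0;
	out->fsize = 0;

	/* The last comma-separated field is the pool size in MiB. */
	s = strrchr(path, ',');
	if (s && (fsizemb = strtoull(s + 1, NULL, 10)) != 0) {
		*s = '\0';
		/* The field before it is the block size in bytes. */
		s = strrchr(path, ',');
		if (s && (bsize = strtoull(s + 1, NULL, 10)) != 0) {
			*s = '\0';
			out->bsize = bsize;
			out->fsize = fsizemb << 20;
			out->path = path;
			return 0;
		}
	}

	/* No complete size suffix: restore the original spec as the path. */
	strcpy(path, spec);
	out->path = path;
	return 0;
}
```

With filename=/pmem0/fio-test,4096,1024 the function yields the path /pmem0/fio-test, 4096-byte blocks, and a 1 GiB (1073741824-byte) gross pool size. A plain path comes back unchanged with both sizes zero.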
diff --git a/fio.1 b/fio.1
index 73fdee6..ebb4899 100644
--- a/fio.1
+++ b/fio.1
@@ -700,6 +700,9 @@ treated as erases. Depending on the underlying device type, the I/O may have
 to go in a certain pattern, e.g., on NAND, writing sequentially to erase blocks
 and discarding before overwriting. The writetrim mode works well for this
 constraint.
+.TP
+.B pmemblk
+Read and write through the NVML libpmemblk interface.
 .RE
 .P
 .RE
@@ -1180,12 +1183,14 @@ Terminate all jobs if one job finishes in error.  Default: wait for each job
 to finish.
 .TP
 .BI bwavgtime \fR=\fPint
-Average bandwidth calculations over the given time in milliseconds.  Default:
-500ms.
+Average bandwidth calculations over the given time in milliseconds. If the job
+also does bandwidth logging through \fBwrite_bw_log\fR, then the minimum of
+this option and \fBlog_avg_msec\fR will be used.  Default: 500ms.
 .TP
 .BI iopsavgtime \fR=\fPint
-Average IOPS calculations over the given time in milliseconds.  Default:
-500ms.
+Average IOPS calculations over the given time in milliseconds. If the job
+also does IOPS logging through \fBwrite_iops_log\fR, then the minimum of
+this option and \fBlog_avg_msec\fR will be used.  Default: 500ms.
 .TP
 .BI create_serialize \fR=\fPbool
 If true, serialize file creation for the jobs.  Default: true.
diff --git a/fio.h b/fio.h
index 829cc81..6a244c3 100644
--- a/fio.h
+++ b/fio.h
@@ -445,8 +445,6 @@ extern int nr_clients;
 extern int log_syslog;
 extern int status_interval;
 extern const char fio_version_string[];
-extern int helper_do_stat;
-extern pthread_cond_t helper_cond;
 extern char *trigger_file;
 extern char *trigger_cmd;
 extern char *trigger_remote_cmd;
diff --git a/fio_time.h b/fio_time.h
index 79f324a..cb271c2 100644
--- a/fio_time.h
+++ b/fio_time.h
@@ -17,5 +17,6 @@ extern void set_genesis_time(void);
 extern int ramp_time_over(struct thread_data *);
 extern int in_ramp_time(struct thread_data *);
 extern void fio_time_init(void);
+extern void timeval_add_msec(struct timeval *, unsigned int);
 
 #endif
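The new timeval_add_msec() declaration replaces the open-coded carry handling that the old helper loop in backend.c used to build its pthread_cond_timedwait() deadline. Its definition is in time.c, which this excerpt does not include. The following is therefore only a sketch of what such a helper has to do, under the hypothetical name add_msec_to_timeval(). It splits the milliseconds into seconds and microseconds and carries once if tv_usec overflows.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical sketch: advance *tv by msec milliseconds, keeping tv_usec normalized. */
static void add_msec_to_timeval(struct timeval *tv, unsigned int msec)
{
	assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);

	tv->tv_sec += msec / 1000;
	tv->tv_usec += (msec % 1000) * 1000;

	/* Both parts were below one second, so at most one carry is needed. */
	if (tv->tv_usec >= 1000000) {
		tv->tv_usec -= 1000000;
		tv->tv_sec++;
	}
}
```

helper_thread_main() then converts the result into the struct timespec that pthread_cond_timedwait() expects by multiplying tv_usec by 1000.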
diff --git a/helper_thread.c b/helper_thread.c
new file mode 100644
index 0000000..1befabf
--- /dev/null
+++ b/helper_thread.c
@@ -0,0 +1,167 @@
+#include "fio.h"
+#include "smalloc.h"
+#include "helper_thread.h"
+
+static struct helper_data {
+	volatile int exit;
+	volatile int reset;
+	volatile int do_stat;
+	struct sk_out *sk_out;
+	pthread_t thread;
+	pthread_mutex_t lock;
+	pthread_cond_t cond;
+	struct fio_mutex *startup_mutex;
+} *helper_data;
+
+void helper_thread_destroy(void)
+{
+	pthread_cond_destroy(&helper_data->cond);
+	pthread_mutex_destroy(&helper_data->lock);
+	sfree(helper_data);
+}
+
+void helper_reset(void)
+{
+	if (!helper_data)
+		return;
+
+	pthread_mutex_lock(&helper_data->lock);
+
+	if (!helper_data->reset) {
+		helper_data->reset = 1;
+		pthread_cond_signal(&helper_data->cond);
+	}
+
+	pthread_mutex_unlock(&helper_data->lock);
+}
+
+void helper_do_stat(void)
+{
+	if (!helper_data)
+		return;
+
+	pthread_mutex_lock(&helper_data->lock);
+	helper_data->do_stat = 1;
+	pthread_cond_signal(&helper_data->cond);
+	pthread_mutex_unlock(&helper_data->lock);
+}
+
+bool helper_should_exit(void)
+{
+	if (!helper_data)
+		return true;
+
+	return helper_data->exit;
+}
+
+void helper_thread_exit(void)
+{
+	void *ret;
+
+	pthread_mutex_lock(&helper_data->lock);
+	helper_data->exit = 1;
+	pthread_cond_signal(&helper_data->cond);
+	pthread_mutex_unlock(&helper_data->lock);
+
+	pthread_join(helper_data->thread, &ret);
+}
+
+static void *helper_thread_main(void *data)
+{
+	struct helper_data *hd = data;
+	unsigned int msec_to_next_event, next_log;
+	struct timeval tv, last_du;
+	int ret = 0;
+
+	sk_out_assign(hd->sk_out);
+
+	gettimeofday(&tv, NULL);
+	memcpy(&last_du, &tv, sizeof(tv));
+
+	fio_mutex_up(hd->startup_mutex);
+
+	msec_to_next_event = DISK_UTIL_MSEC;
+	while (!ret && !hd->exit) {
+		struct timespec ts;
+		struct timeval now;
+		uint64_t since_du;
+
+		timeval_add_msec(&tv, msec_to_next_event);
+		ts.tv_sec = tv.tv_sec;
+		ts.tv_nsec = tv.tv_usec * 1000;
+
+		pthread_mutex_lock(&hd->lock);
+		pthread_cond_timedwait(&hd->cond, &hd->lock, &ts);
+
+		gettimeofday(&now, NULL);
+
+		if (hd->reset) {
+			memcpy(&tv, &now, sizeof(tv));
+			memcpy(&last_du, &now, sizeof(last_du));
+			hd->reset = 0;
+		}
+
+		pthread_mutex_unlock(&hd->lock);
+
+		since_du = mtime_since(&last_du, &now);
+		if (since_du >= DISK_UTIL_MSEC || DISK_UTIL_MSEC - since_du < 10) {
+			ret = update_io_ticks();
+			timeval_add_msec(&last_du, DISK_UTIL_MSEC);
+			msec_to_next_event = DISK_UTIL_MSEC;
+			if (since_du >= DISK_UTIL_MSEC)
+				msec_to_next_event -= (since_du - DISK_UTIL_MSEC);
+		} else {
+			if (since_du >= DISK_UTIL_MSEC)
+				msec_to_next_event = DISK_UTIL_MSEC - (DISK_UTIL_MSEC - since_du);
+			else
+				msec_to_next_event = DISK_UTIL_MSEC;
+		}
+
+		if (hd->do_stat) {
+			hd->do_stat = 0;
+			__show_running_run_stats();
+		}
+
+		next_log = calc_log_samples();
+		if (!next_log)
+			next_log = DISK_UTIL_MSEC;
+
+		msec_to_next_event = min(next_log, msec_to_next_event);
+
+		if (!is_backend)
+			print_thread_status();
+	}
+
+	fio_writeout_logs(false);
+
+	sk_out_drop();
+	return NULL;
+}
+
+int helper_thread_create(struct fio_mutex *startup_mutex, struct sk_out *sk_out)
+{
+	struct helper_data *hd;
+	int ret;
+
+	hd = smalloc(sizeof(*hd));
+
+	setup_disk_util();
+
+	hd->sk_out = sk_out;
+	pthread_cond_init(&hd->cond, NULL);
+	pthread_mutex_init(&hd->lock, NULL);
+	hd->startup_mutex = startup_mutex;
+
+	ret = pthread_create(&hd->thread, NULL, helper_thread_main, hd);
+	if (ret) {
+		log_err("Can't create helper thread: %s\n", strerror(ret));
+		return 1;
+	}
+
+	helper_data = hd;
+
+	dprint(FD_MUTEX, "wait on startup_mutex\n");
+	fio_mutex_down(startup_mutex);
+	dprint(FD_MUTEX, "done waiting on startup_mutex\n");
+	return 0;
+}
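helper_do_stat(), helper_reset() and helper_thread_exit() share one pattern. Each sets a flag while holding helper_data->lock and signals the condition variable, and the helper thread acts on the flag once pthread_cond_timedwait() returns. The following self-contained sketch shows that pattern with POSIX threads. The names (struct helper, helper_start(), helper_request_stat(), helper_stop()) are hypothetical, and the 250 ms timeout is arbitrary. Unlike fio's loop, the sketch keeps the mutex held while servicing flags, which is acceptable only because its work is trivial.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper state; every field after 'cond' is protected by 'lock'. */
struct helper {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool exit;
	bool do_stat;
	int stats_shown; /* how many stat requests the helper has serviced */
};

static void *helper_main(void *arg)
{
	struct helper *h = arg;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		struct timespec ts;

		/* Service pending requests before honoring exit, so none are lost. */
		if (h->do_stat) {
			h->do_stat = false;
			h->stats_shown++;
		}
		if (h->exit)
			break;

		/* Absolute deadline 250 ms from now, normalized for tv_nsec overflow. */
		clock_gettime(CLOCK_REALTIME, &ts);
		ts.tv_nsec += 250L * 1000000L;
		if (ts.tv_nsec >= 1000000000L) {
			ts.tv_nsec -= 1000000000L;
			ts.tv_sec++;
		}
		/* Returns on signal, timeout or spurious wakeup; the loop re-checks flags. */
		pthread_cond_timedwait(&h->cond, &h->lock, &ts);
	}
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

static int helper_start(struct helper *h)
{
	h->exit = false;
	h->do_stat = false;
	h->stats_shown = 0;
	pthread_mutex_init(&h->lock, NULL);
	pthread_cond_init(&h->cond, NULL);
	return pthread_create(&h->thread, NULL, helper_main, h);
}

/* Set the flag under the lock, then wake the helper. */
static void helper_request_stat(struct helper *h)
{
	pthread_mutex_lock(&h->lock);
	h->do_stat = true;
	pthread_cond_signal(&h->cond);
	pthread_mutex_unlock(&h->lock);
}

/* Ask the helper to exit, wait for it, then release the primitives. */
static void helper_stop(struct helper *h)
{
	pthread_mutex_lock(&h->lock);
	h->exit = true;
	pthread_cond_signal(&h->cond);
	pthread_mutex_unlock(&h->lock);

	pthread_join(h->thread, NULL);
	pthread_cond_destroy(&h->cond);
	pthread_mutex_destroy(&h->lock);
}
```

Because the worker checks do_stat before exit, a stat request issued immediately before helper_stop() is still serviced exactly once, regardless of thread scheduling.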
diff --git a/helper_thread.h b/helper_thread.h
new file mode 100644
index 0000000..78933b1
--- /dev/null
+++ b/helper_thread.h
@@ -0,0 +1,11 @@
+#ifndef FIO_HELPER_THREAD_H
+#define FIO_HELPER_THREAD_H
+
+extern void helper_reset(void);
+extern void helper_do_stat(void);
+extern bool helper_should_exit(void);
+extern void helper_thread_destroy(void);
+extern void helper_thread_exit(void);
+extern int helper_thread_create(struct fio_mutex *, struct sk_out *);
+
+#endif
diff --git a/init.c b/init.c
index 89e05c0..c579d5c 100644
--- a/init.c
+++ b/init.c
@@ -1416,6 +1416,11 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		};
 		const char *suf;
 
+		if (fio_option_is_set(o, bw_avg_time))
+			p.avg_msec = min(o->log_avg_msec, o->bw_avg_time);
+		else
+			o->bw_avg_time = p.avg_msec;
+
 		if (p.log_gz_store)
 			suf = "log.fz";
 		else
@@ -1436,6 +1441,11 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		};
 		const char *suf;
 
+		if (fio_option_is_set(o, iops_avg_time))
+			p.avg_msec = min(o->log_avg_msec, o->iops_avg_time);
+		else
+			o->iops_avg_time = p.avg_msec;
+
 		if (p.log_gz_store)
 			suf = "log.fz";
 		else
diff --git a/io_u.c b/io_u.c
index 6622bc0..eb15dc2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1710,16 +1710,18 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 		}
 	}
 
-	if (!td->o.disable_clat) {
-		add_clat_sample(td, idx, lusec, bytes, io_u->offset);
-		io_u_mark_latency(td, lusec);
-	}
+	if (ddir_rw(idx)) {
+		if (!td->o.disable_clat) {
+			add_clat_sample(td, idx, lusec, bytes, io_u->offset);
+			io_u_mark_latency(td, lusec);
+		}
 
-	if (!td->o.disable_bw)
-		add_bw_sample(td, idx, bytes, &icd->time);
+		if (!td->o.disable_bw && per_unit_log(td->bw_log))
+			add_bw_sample(td, io_u, bytes, lusec);
 
-	if (no_reduce)
-		add_iops_sample(td, idx, bytes, &icd->time);
+		if (no_reduce && per_unit_log(td->iops_log))
+			add_iops_sample(td, io_u, bytes);
+	}
 
 	if (td->ts.nr_block_infos && io_u->ddir == DDIR_TRIM) {
 		uint32_t *info = io_u_block_info(td, io_u);
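After the io_u.c hunk, an I/O completion records bandwidth and IOPS samples directly only when the log is per-unit. Averaged logs are left to the helper thread's calc_log_samples() call. per_unit_log() is defined outside this excerpt, so the sketch below assumes it means "the log exists and has no averaging window (avg_msec == 0)". struct log_stub, log_is_per_unit() and completion_adds_bw_sample() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_log: only the averaging window matters here. */
struct log_stub {
	unsigned int avg_msec; /* 0 = one entry per I/O, >0 = averaged over this window */
};

/* Assumed meaning of per_unit_log(): the log exists and is not averaged. */
static bool log_is_per_unit(const struct log_stub *log)
{
	return log != NULL && log->avg_msec == 0;
}

/*
 * Decide whether an I/O completion adds a bandwidth sample itself,
 * mirroring "!td->o.disable_bw && per_unit_log(td->bw_log)" in the hunk.
 */
static bool completion_adds_bw_sample(bool disable_bw, const struct log_stub *bw_log)
{
	return !disable_bw && log_is_per_unit(bw_log);
}
```

Under this assumption, setting log_avg_msec moves bandwidth and IOPS sampling off the I/O completion path and onto the helper thread.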
diff --git a/iolog.c b/iolog.c
index 94d3f3c..71afe86 100644
--- a/iolog.c
+++ b/iolog.c
@@ -18,6 +18,7 @@
 #include "verify.h"
 #include "trim.h"
 #include "filelock.h"
+#include "smalloc.h"
 
 static const char iolog_ver2[] = "fio version 2 iolog";
 
@@ -574,14 +575,12 @@ void setup_log(struct io_log **log, struct log_params *p,
 {
 	struct io_log *l;
 
-	l = calloc(1, sizeof(*l));
+	l = smalloc(sizeof(*l));
 	l->nr_samples = 0;
-	l->max_samples = DEF_LOG_ENTRIES;
 	l->log_type = p->log_type;
 	l->log_offset = p->log_offset;
 	l->log_gz = p->log_gz;
 	l->log_gz_store = p->log_gz_store;
-	l->log = malloc(l->max_samples * log_entry_sz(l));
 	l->avg_msec = p->avg_msec;
 	l->filename = strdup(filename);
 	l->td = p->td;
@@ -631,7 +630,7 @@ void free_log(struct io_log *log)
 {
 	free(log->log);
 	free(log->filename);
-	free(log);
+	sfree(log);
 }
 
 void flush_samples(FILE *f, void *samples, uint64_t sample_size)
@@ -1202,29 +1201,74 @@ static int __write_log(struct thread_data *td, struct io_log *log, int try)
 	return 0;
 }
 
-static int write_iops_log(struct thread_data *td, int try)
+static int write_iops_log(struct thread_data *td, int try, bool unit_log)
 {
-	return __write_log(td, td->iops_log, try);
+	int ret;
+
+	if (per_unit_log(td->iops_log) != unit_log)
+		return 0;
+
+	ret = __write_log(td, td->iops_log, try);
+	if (!ret)
+		td->iops_log = NULL;
+
+	return ret;
 }
 
-static int write_slat_log(struct thread_data *td, int try)
+static int write_slat_log(struct thread_data *td, int try, bool unit_log)
 {
-	return __write_log(td, td->slat_log, try);
+	int ret;
+
+	if (!unit_log)
+		return 0;
+
+	ret = __write_log(td, td->slat_log, try);
+	if (!ret)
+		td->slat_log = NULL;
+
+	return ret;
 }
 
-static int write_clat_log(struct thread_data *td, int try)
+static int write_clat_log(struct thread_data *td, int try, bool unit_log)
 {
-	return __write_log(td, td->clat_log, try);
+	int ret;
+
+	if (!unit_log)
+		return 0;
+
+	ret = __write_log(td, td->clat_log, try);
+	if (!ret)
+		td->clat_log = NULL;
+
+	return ret;
 }
 
-static int write_lat_log(struct thread_data *td, int try)
+static int write_lat_log(struct thread_data *td, int try, bool unit_log)
 {
-	return __write_log(td, td->lat_log, try);
+	int ret;
+
+	if (!unit_log)
+		return 0;
+
+	ret = __write_log(td, td->lat_log, try);
+	if (!ret)
+		td->lat_log = NULL;
+
+	return ret;
 }
 
-static int write_bandw_log(struct thread_data *td, int try)
+static int write_bandw_log(struct thread_data *td, int try, bool unit_log)
 {
-	return __write_log(td, td->bw_log, try);
+	int ret;
+
+	if (per_unit_log(td->bw_log) != unit_log)
+		return 0;
+
+	ret = __write_log(td, td->bw_log, try);
+	if (!ret)
+		td->bw_log = NULL;
+
+	return ret;
 }
 
 enum {
@@ -1239,7 +1283,7 @@ enum {
 
 struct log_type {
 	unsigned int mask;
-	int (*fn)(struct thread_data *, int);
+	int (*fn)(struct thread_data *, int, bool);
 };
 
 static struct log_type log_types[] = {
@@ -1265,7 +1309,7 @@ static struct log_type log_types[] = {
 	},
 };
 
-void fio_writeout_logs(struct thread_data *td)
+void td_writeout_logs(struct thread_data *td, bool unit_logs)
 {
 	unsigned int log_mask = 0;
 	unsigned int log_left = ALL_LOG_NR;
@@ -1273,7 +1317,7 @@ void fio_writeout_logs(struct thread_data *td)
 
 	old_state = td_bump_runstate(td, TD_FINISHING);
 
-	finalize_logs(td);
+	finalize_logs(td, unit_logs);
 
 	while (log_left) {
 		int prev_log_left = log_left;
@@ -1283,7 +1327,7 @@ void fio_writeout_logs(struct thread_data *td)
 			int ret;
 
 			if (!(log_mask & lt->mask)) {
-				ret = lt->fn(td, log_left != 1);
+				ret = lt->fn(td, log_left != 1, unit_logs);
 				if (!ret) {
 					log_left--;
 					log_mask |= lt->mask;
@@ -1297,3 +1341,12 @@ void fio_writeout_logs(struct thread_data *td)
 
 	td_restore_runstate(td, old_state);
 }
+
+void fio_writeout_logs(bool unit_logs)
+{
+	struct thread_data *td;
+	int i;
+
+	for_each_td(td, i)
+		td_writeout_logs(td, unit_logs);
+}
diff --git a/iolog.h b/iolog.h
index 74f2170..739a7c8 100644
--- a/iolog.h
+++ b/iolog.h
@@ -207,12 +207,18 @@ struct log_params {
 	int log_compress;
 };
 
-extern void finalize_logs(struct thread_data *td);
+static inline bool per_unit_log(struct io_log *log)
+{
+	return log && !log->avg_msec;
+}
+
+extern void finalize_logs(struct thread_data *td, bool);
 extern void setup_log(struct io_log **, struct log_params *, const char *);
 extern void flush_log(struct io_log *, int);
 extern void flush_samples(FILE *, void *, uint64_t);
 extern void free_log(struct io_log *);
-extern void fio_writeout_logs(struct thread_data *);
+extern void fio_writeout_logs(bool);
+extern void td_writeout_logs(struct thread_data *, bool);
 extern int iolog_flush(struct io_log *, int);
 
 static inline void init_ipo(struct io_piece *ipo)
diff --git a/libfio.c b/libfio.c
index b17f148..55762d7 100644
--- a/libfio.c
+++ b/libfio.c
@@ -33,6 +33,7 @@
 #include "smalloc.h"
 #include "os/os.h"
 #include "filelock.h"
+#include "helper_thread.h"
 
 /*
  * Just expose an empty list, if the OS does not support disk util stats
@@ -151,6 +152,7 @@ void reset_all_stats(struct thread_data *td)
 
 	lat_target_reset(td);
 	clear_rusage_stat(td);
+	helper_reset();
 }
 
 void reset_fio_state(void)
diff --git a/options.c b/options.c
index b6c980e..980b7e5 100644
--- a/options.c
+++ b/options.c
@@ -1569,6 +1569,12 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Hadoop Distributed Filesystem (HDFS) engine"
 			  },
 #endif
+#ifdef CONFIG_PMEMBLK
+			  { .ival = "pmemblk",
+			    .help = "NVML libpmemblk based IO engine",
+			  },
+
+#endif
 			  { .ival = "external",
 			    .help = "Load external engine (append name)",
 			  },
diff --git a/os/os-mac.h b/os/os-mac.h
index d202e99..76d388e 100644
--- a/os/os-mac.h
+++ b/os/os-mac.h
@@ -35,76 +35,7 @@
 
 typedef off_t off64_t;
 
-/* OS X as of 10.6 doesn't have the timer_* functions. 
- * Emulate the functionality using setitimer and sigaction here
- */
-
-#define MAX_TIMERS 64
-
 typedef unsigned int clockid_t;
-typedef unsigned int timer_t;
-
-struct itimerspec {
-	struct timespec it_value;
-	struct timespec it_interval;
-};
-
-static struct sigevent fio_timers[MAX_TIMERS];
-static unsigned int num_timers = 0;
-
-static void sig_alrm(int signum)
-{
-	union sigval sv;
-	
-	for (int i = 0; i < num_timers; i++) {
-		if (fio_timers[i].sigev_notify_function == NULL)
-			continue;
-		
-		if (fio_timers[i].sigev_notify == SIGEV_THREAD)
-			fio_timers[i].sigev_notify_function(sv);
-		else if (fio_timers[i].sigev_notify == SIGEV_SIGNAL)
-			kill(getpid(), fio_timers[i].sigev_signo);
-	}
-}
-
-static inline int timer_settime(timer_t timerid, int flags,
-				const struct itimerspec *value,
-				struct itimerspec *ovalue)
-{
-	struct sigaction sa;
-	struct itimerval tv;
-	struct itimerval tv_out;
-	int rc;
-	
-	tv.it_interval.tv_sec = value->it_interval.tv_sec;
-	tv.it_interval.tv_usec = value->it_interval.tv_nsec / 1000;
-
-	tv.it_value.tv_sec = value->it_value.tv_sec;
-	tv.it_value.tv_usec = value->it_value.tv_nsec / 1000;
-
-	sa.sa_handler = sig_alrm;
-	sigemptyset(&sa.sa_mask);
-	sa.sa_flags = 0;
-	
-	rc = sigaction(SIGALRM, &sa, NULL);
-
-	if (!rc)
-		rc = setitimer(ITIMER_REAL, &tv, &tv_out);
-	
-	if (!rc && ovalue != NULL) {
-		ovalue->it_interval.tv_sec = tv_out.it_interval.tv_sec;
-		ovalue->it_interval.tv_nsec = tv_out.it_interval.tv_usec * 1000;
-		ovalue->it_value.tv_sec = tv_out.it_value.tv_sec;
-		ovalue->it_value.tv_nsec = tv_out.it_value.tv_usec * 1000;
-	}
-
-	return rc;
-}
-
-static inline int timer_delete(timer_t timer)
-{
-	return 0;
-}
 
 #define FIO_OS_DIRECTIO
 static inline int fio_set_odirect(int fd)
diff --git a/stat.c b/stat.c
index 6d8d4d0..95f206e 100644
--- a/stat.c
+++ b/stat.c
@@ -15,6 +15,7 @@
 #include "idletime.h"
 #include "lib/pow2.h"
 #include "lib/output_buffer.h"
+#include "helper_thread.h"
 
 struct fio_mutex *stat_mutex;
 
@@ -1862,13 +1863,21 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
 		iolog->avg_last = t;
 
 	if (iolog->nr_samples == iolog->max_samples) {
-		size_t new_size;
+		size_t new_size, new_samples;
 		void *new_log;
 
-		new_size = 2 * iolog->max_samples * log_entry_sz(iolog);
+		if (!iolog->max_samples)
+			new_samples = DEF_LOG_ENTRIES;
+		else
+			new_samples = iolog->max_samples * 2;
+
+		new_size = new_samples * log_entry_sz(iolog);
 
 		if (iolog->log_gz && (new_size > iolog->log_gz)) {
-			if (iolog_flush(iolog, 0)) {
+			if (!iolog->log) {
+				iolog->log = malloc(new_size);
+				iolog->max_samples = new_samples;
+			} else if (iolog_flush(iolog, 0)) {
 				log_err("fio: failed flushing iolog! Will stop logging.\n");
 				iolog->disabled = 1;
 				return;
@@ -1882,7 +1891,7 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
 				return;
 			}
 			iolog->log = new_log;
-			iolog->max_samples <<= 1;
+			iolog->max_samples = new_samples;
 		}
 	}
 
@@ -2013,21 +2022,21 @@ static void add_log_sample(struct thread_data *td, struct io_log *iolog,
 	iolog->avg_last = elapsed;
 }
 
-void finalize_logs(struct thread_data *td)
+void finalize_logs(struct thread_data *td, bool unit_logs)
 {
 	unsigned long elapsed;
 
 	elapsed = mtime_since_now(&td->epoch);
 
-	if (td->clat_log)
+	if (td->clat_log && unit_logs)
 		_add_stat_to_log(td->clat_log, elapsed, td->o.log_max != 0);
-	if (td->slat_log)
+	if (td->slat_log && unit_logs)
 		_add_stat_to_log(td->slat_log, elapsed, td->o.log_max != 0);
-	if (td->lat_log)
+	if (td->lat_log && unit_logs)
 		_add_stat_to_log(td->lat_log, elapsed, td->o.log_max != 0);
-	if (td->bw_log)
+	if (td->bw_log && (unit_logs == per_unit_log(td->bw_log)))
 		_add_stat_to_log(td->bw_log, elapsed, td->o.log_max != 0);
-	if (td->iops_log)
+	if (td->iops_log && (unit_logs == per_unit_log(td->iops_log)))
 		_add_stat_to_log(td->iops_log, elapsed, td->o.log_max != 0);
 }
 
@@ -2056,9 +2065,6 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 {
 	struct thread_stat *ts = &td->ts;
 
-	if (!ddir_rw(ddir))
-		return;
-
 	td_io_u_lock(td);
 
 	add_stat_sample(&ts->clat_stat[ddir], usec);
@@ -2108,18 +2114,41 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 	td_io_u_unlock(td);
 }
 
-void add_bw_sample(struct thread_data *td, enum fio_ddir ddir, unsigned int bs,
-		   struct timeval *t)
+void add_bw_sample(struct thread_data *td, struct io_u *io_u,
+		   unsigned int bytes, unsigned long spent)
+{
+	struct thread_stat *ts = &td->ts;
+	unsigned long rate;
+
+	if (spent)
+		rate = bytes * 1000 / spent;
+	else
+		rate = 0;
+
+	td_io_u_lock(td);
+
+	add_stat_sample(&ts->bw_stat[io_u->ddir], rate);
+
+	if (td->bw_log)
+		add_log_sample(td, td->bw_log, rate, io_u->ddir, bytes, io_u->offset);
+
+	td->stat_io_bytes[io_u->ddir] = td->this_io_bytes[io_u->ddir];
+	td_io_u_unlock(td);
+}
+
+static int add_bw_samples(struct thread_data *td, struct timeval *t)
 {
 	struct thread_stat *ts = &td->ts;
 	unsigned long spent, rate;
+	enum fio_ddir ddir;
 
-	if (!ddir_rw(ddir))
-		return;
+	if (per_unit_log(td->bw_log))
+		return 0;
 
 	spent = mtime_since(&td->bw_sample_time, t);
-	if (spent < td->o.bw_avg_time)
-		return;
+	if (spent < td->o.bw_avg_time &&
+	    td->o.bw_avg_time - spent >= 10)
+		return td->o.bw_avg_time - spent;
 
 	td_io_u_lock(td);
 
@@ -2141,27 +2170,50 @@ void add_bw_sample(struct thread_data *td, enum fio_ddir ddir, unsigned int bs,
 		add_stat_sample(&ts->bw_stat[ddir], rate);
 
 		if (td->bw_log)
-			add_log_sample(td, td->bw_log, rate, ddir, bs, 0);
+			add_log_sample(td, td->bw_log, rate, ddir, 0, 0);
 
 		td->stat_io_bytes[ddir] = td->this_io_bytes[ddir];
 	}
 
-	fio_gettime(&td->bw_sample_time, NULL);
+	timeval_add_msec(&td->bw_sample_time, td->o.bw_avg_time);
+
+	td_io_u_unlock(td);
+
+	if (spent <= td->o.bw_avg_time)
+		return td->o.bw_avg_time;
+
+	return td->o.bw_avg_time - (1 + spent - td->o.bw_avg_time);
+}
+
+void add_iops_sample(struct thread_data *td, struct io_u *io_u,
+		     unsigned int bytes)
+{
+	struct thread_stat *ts = &td->ts;
+
+	td_io_u_lock(td);
+
+	add_stat_sample(&ts->iops_stat[io_u->ddir], 1);
+
+	if (td->iops_log)
+		add_log_sample(td, td->iops_log, 1, io_u->ddir, bytes, io_u->offset);
+
+	td->stat_io_blocks[io_u->ddir] = td->this_io_blocks[io_u->ddir];
 	td_io_u_unlock(td);
 }
 
-void add_iops_sample(struct thread_data *td, enum fio_ddir ddir, unsigned int bs,
-		     struct timeval *t)
+static int add_iops_samples(struct thread_data *td, struct timeval *t)
 {
 	struct thread_stat *ts = &td->ts;
 	unsigned long spent, iops;
+	enum fio_ddir ddir;
 
-	if (!ddir_rw(ddir))
-		return;
+	if (per_unit_log(td->iops_log))
+		return 0;
 
 	spent = mtime_since(&td->iops_sample_time, t);
-	if (spent < td->o.iops_avg_time)
-		return;
+	if (spent < td->o.iops_avg_time &&
+	    td->o.iops_avg_time - spent >= 10)
+		return td->o.iops_avg_time - spent;
 
 	td_io_u_lock(td);
 
@@ -2183,13 +2235,52 @@ void add_iops_sample(struct thread_data *td, enum fio_ddir ddir, unsigned int bs
 		add_stat_sample(&ts->iops_stat[ddir], iops);
 
 		if (td->iops_log)
-			add_log_sample(td, td->iops_log, iops, ddir, bs, 0);
+			add_log_sample(td, td->iops_log, iops, ddir, 0, 0);
 
 		td->stat_io_blocks[ddir] = td->this_io_blocks[ddir];
 	}
 
-	fio_gettime(&td->iops_sample_time, NULL);
+	timeval_add_msec(&td->iops_sample_time, td->o.iops_avg_time);
+
 	td_io_u_unlock(td);
+
+	if (spent <= td->o.iops_avg_time)
+		return td->o.iops_avg_time;
+
+	return td->o.iops_avg_time - (1 + spent - td->o.iops_avg_time);
+}
+
+/*
+ * Returns msecs to next event
+ */
+int calc_log_samples(void)
+{
+	struct thread_data *td;
+	unsigned int next = ~0U, tmp;
+	struct timeval now;
+	int i;
+
+	fio_gettime(&now, NULL);
+
+	for_each_td(td, i) {
+		if (!ramp_time_over(td) ||
+		    !(td->runstate == TD_RUNNING || td->runstate == TD_VERIFYING)) {
+			next = min(td->o.iops_avg_time, td->o.bw_avg_time);
+			continue;
+		}
+		if (!per_unit_log(td->bw_log)) {
+			tmp = add_bw_samples(td, &now);
+			if (tmp < next)
+				next = tmp;
+		}
+		if (!per_unit_log(td->iops_log)) {
+			tmp = add_iops_samples(td, &now);
+			if (tmp < next)
+				next = tmp;
+		}
+	}
+
+	return next == ~0U ? 0 : next;
 }
 
 void stat_init(void)
@@ -2212,8 +2303,7 @@ void stat_exit(void)
  */
 void show_running_run_stats(void)
 {
-	helper_do_stat = 1;
-	pthread_cond_signal(&helper_cond);
+	helper_do_stat();
 }
 
 uint32_t *io_u_block_info(struct thread_data *td, struct io_u *io_u)
diff --git a/stat.h b/stat.h
index 9c3f192..86f1a0b 100644
--- a/stat.h
+++ b/stat.h
@@ -276,11 +276,12 @@ extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long,
 				unsigned int, uint64_t);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long,
 				unsigned int, uint64_t);
-extern void add_bw_sample(struct thread_data *, enum fio_ddir, unsigned int,
-				struct timeval *);
-extern void add_iops_sample(struct thread_data *, enum fio_ddir, unsigned int,
-				struct timeval *);
 extern void add_agg_sample(unsigned long, enum fio_ddir, unsigned int);
+extern void add_iops_sample(struct thread_data *, struct io_u *,
+				unsigned int);
+extern void add_bw_sample(struct thread_data *, struct io_u *,
+				unsigned int, unsigned long);
+extern int calc_log_samples(void);
 
 extern struct io_log *agg_io_log[DDIR_RWDIR_CNT];
 extern int write_bw_log;
diff --git a/time.c b/time.c
index b145e90..0e64af5 100644
--- a/time.c
+++ b/time.c
@@ -6,6 +6,15 @@
 static struct timeval genesis;
 static unsigned long ns_granularity;
 
+void timeval_add_msec(struct timeval *tv, unsigned int msec)
+{
+	tv->tv_usec += 1000 * msec;
+	if (tv->tv_usec >= 1000000) {
+		tv->tv_usec -= 1000000;
+		tv->tv_sec++;
+	}
+}
+
 /*
  * busy looping version for the last few usec
  */
diff --git a/tools/fiologparser.py b/tools/fiologparser.py
new file mode 100755
index 0000000..0574099
--- /dev/null
+++ b/tools/fiologparser.py
@@ -0,0 +1,152 @@
+#!/usr/bin/python
+#
+# fiologparser.py
+#
+# This tool lets you parse multiple fio log files and look at interaval
+# statistics even when samples are non-uniform.  For instance:
+#
+# fiologparser.py -s *bw*
+#
+# to see per-interval sums for all bandwidth logs or:
+#
+# fiologparser.py -a *clat*
+#
+# to see per-interval average completion latency.
+
+import argparse
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-i', '--interval', required=False, type=int, default=1000, help='interval of time in seconds.')
+    parser.add_argument('-d', '--divisor', required=False, type=int, default=1, help='divide the results by this value.')
+    parser.add_argument('-f', '--full', dest='full', action='store_true', default=False, help='print full output.')
+    parser.add_argument('-a', '--average', dest='average', action='store_true', default=False, help='print the average for each interval.')
+    parser.add_argument('-s', '--sum', dest='sum', action='store_true', default=False, help='print the sum for each interval.')
+    parser.add_argument("FILE", help="collectl log output files to parse", nargs="+")
+    args = parser.parse_args()
+
+    return args
+
+def get_ftime(series):
+    ftime = 0
+    for ts in series:
+        if ftime == 0 or ts.last.end < ftime:
+            ftime = ts.last.end
+    return ftime
+
+def print_full(ctx, series):
+    ftime = get_ftime(series)
+    start = 0 
+    end = ctx.interval
+
+    while (start < ftime):
+        end = ftime if ftime < end else end
+        results = [ts.get_value(start, end) for ts in series]
+        print "%s, %s" % (end, ', '.join(["%0.3f" % i for i in results]))
+        start += ctx.interval
+        end += ctx.interval
+
+def print_sums(ctx, series):
+    ftime = get_ftime(series)
+    start = 0
+    end = ctx.interval
+
+    while (start < ftime):
+        end = ftime if ftime < end else end
+        results = [ts.get_value(start, end) for ts in series]
+        print "%s, %0.3f" % (end, sum(results))
+        start += ctx.interval
+        end += ctx.interval
+
+def print_averages(ctx, series):
+    ftime = get_ftime(series)
+    start = 0
+    end = ctx.interval
+
+    while (start < ftime):
+        end = ftime if ftime < end else end
+        results = [ts.get_value(start, end) for ts in series]
+        print "%s, %0.3f" % (end, float(sum(results))/len(results))
+        start += ctx.interval
+        end += ctx.interval
+
+
+def print_default(ctx, series):
+    ftime = get_ftime(series)
+    start = 0
+    end = ctx.interval
+    averages = []
+    weights = []
+
+    while (start < ftime):
+        end = ftime if ftime < end else end
+        results = [ts.get_value(start, end) for ts in series]
+        averages.append(sum(results)) 
+        weights.append(end-start)
+        start += ctx.interval
+        end += ctx.interval
+
+    total = 0
+    for i in xrange(0, len(averages)):
+        total += averages[i]*weights[i]
+    print '%0.3f' % (total/sum(weights))
+ 
+class TimeSeries():
+    def __init__(self, ctx, fn):
+        self.ctx = ctx
+        self.last = None 
+        self.samples = []
+        self.read_data(fn)
+
+    def read_data(self, fn):
+        f = open(fn, 'r')
+        p_time = 0
+        for line in f:
+            (time, value, foo, bar) = line.rstrip('\r\n').rsplit(', ')
+            self.add_sample(p_time, int(time), int(value))
+            p_time = int(time)
+ 
+    def add_sample(self, start, end, value):
+        sample = Sample(ctx, start, end, value)
+        if not self.last or self.last.end < end:
+            self.last = sample
+        self.samples.append(sample)
+
+    def get_value(self, start, end):
+        value = 0
+        for sample in self.samples:
+            value += sample.get_contribution(start, end)
+        return value
+
+class Sample():
+    def __init__(self, ctx, start, end, value):
+       self.ctx = ctx
+       self.start = start
+       self.end = end
+       self.value = value
+
+    def get_contribution(self, start, end):
+       # short circuit if not within the bound
+       if (end < self.start or start > self.end):
+           return 0 
+
+       sbound = self.start if start < self.start else start
+       ebound = self.end if end > self.end else end
+       ratio = float(ebound-sbound) / (end-start) 
+       return self.value*ratio/ctx.divisor
+
+
+if __name__ == '__main__':
+    ctx = parse_args()
+    series = []
+    for fn in ctx.FILE:
+       series.append(TimeSeries(ctx, fn)) 
+    if ctx.sum:
+        print_sums(ctx, series)
+    elif ctx.average:
+        print_averages(ctx, series)
+    elif ctx.full:
+        print_full(ctx, series)
+    else:
+        print_default(ctx, series)
+
diff --git a/workqueue.c b/workqueue.c
index 6e67f3e..4f9c414 100644
--- a/workqueue.c
+++ b/workqueue.c
@@ -9,6 +9,7 @@
 #include "fio.h"
 #include "flist.h"
 #include "workqueue.h"
+#include "smalloc.h"
 
 enum {
 	SW_F_IDLE	= 1 << 0,
@@ -263,7 +264,7 @@ void workqueue_exit(struct workqueue *wq)
 		}
 	} while (shutdown && shutdown != wq->max_workers);
 
-	free(wq->workers);
+	sfree(wq->workers);
 	wq->workers = NULL;
 	pthread_mutex_destroy(&wq->flush_lock);
 	pthread_cond_destroy(&wq->flush_cond);
@@ -317,7 +318,7 @@ int workqueue_init(struct thread_data *td, struct workqueue *wq,
 	pthread_mutex_init(&wq->flush_lock, NULL);
 	pthread_mutex_init(&wq->stat_lock, NULL);
 
-	wq->workers = calloc(wq->max_workers, sizeof(struct submit_worker));
+	wq->workers = smalloc(wq->max_workers * sizeof(struct submit_worker));
 
 	for (i = 0; i < wq->max_workers; i++)
 		if (start_worker(wq, i, sk_out))

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-05-06 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-06 12:00 UTC
  To: fio

The following changes since commit 2078c1369f993b4d760d3fe1f90cb6ffa4389fe5:

  hash: import Linux sparse hash fix (2016-05-03 13:56:47 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 604577f1329b617d724d6712868d344a5adf5251:

  libfio: clear iops/bw sample times on stats reset (2016-05-05 10:55:47 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      iolog: fix use-after-free of iolog_flush_data
      iolog: don't use the workqueue for sync work
      iolog: free memory on gz init failure
      iolog: add define for default number of log entries
      libfio: clear iops/bw sample times on stats reset

 iolog.c  | 77 +++++++++++++++++-----------------------------------------------
 iolog.h  |  2 ++
 libfio.c |  2 ++
 3 files changed, 24 insertions(+), 57 deletions(-)

---

Diff of recent changes:

diff --git a/iolog.c b/iolog.c
index feda9ed..94d3f3c 100644
--- a/iolog.c
+++ b/iolog.c
@@ -576,7 +576,7 @@ void setup_log(struct io_log **log, struct log_params *p,
 
 	l = calloc(1, sizeof(*l));
 	l->nr_samples = 0;
-	l->max_samples = 1024;
+	l->max_samples = DEF_LOG_ENTRIES;
 	l->log_type = p->log_type;
 	l->log_offset = p->log_offset;
 	l->log_gz = p->log_gz;
@@ -672,11 +672,6 @@ void flush_samples(FILE *f, void *samples, uint64_t sample_size)
 
 struct iolog_flush_data {
 	struct workqueue_work work;
-	pthread_mutex_t lock;
-	pthread_cond_t cv;
-	int wait;
-	volatile int done;
-	volatile int refs;
 	struct io_log *log;
 	void *samples;
 	uint64_t nr_samples;
@@ -1009,28 +1004,8 @@ size_t log_chunk_sizes(struct io_log *log)
 
 #ifdef CONFIG_ZLIB
 
-static void drop_data_unlock(struct iolog_flush_data *data)
+static int gz_work(struct iolog_flush_data *data)
 {
-	int refs;
-
-	refs = --data->refs;
-	pthread_mutex_unlock(&data->lock);
-
-	if (!refs) {
-		free(data);
-		pthread_mutex_destroy(&data->lock);
-		pthread_cond_destroy(&data->cv);
-	}
-}
-
-/*
- * Invoked from our compress helper thread, when logging would have exceeded
- * the specified memory limitation. Compresses the previously stored
- * entries.
- */
-static int gz_work(struct submit_worker *sw, struct workqueue_work *work)
-{
-	struct iolog_flush_data *data;
 	struct iolog_compress *c;
 	struct flist_head list;
 	unsigned int seq;
@@ -1040,8 +1015,6 @@ static int gz_work(struct submit_worker *sw, struct workqueue_work *work)
 
 	INIT_FLIST_HEAD(&list);
 
-	data = container_of(work, struct iolog_flush_data, work);
-
 	stream.zalloc = Z_NULL;
 	stream.zfree = Z_NULL;
 	stream.opaque = Z_NULL;
@@ -1049,7 +1022,7 @@ static int gz_work(struct submit_worker *sw, struct workqueue_work *work)
 	ret = deflateInit(&stream, Z_DEFAULT_COMPRESSION);
 	if (ret != Z_OK) {
 		log_err("fio: failed to init gz stream\n");
-		return 0;
+		goto err;
 	}
 
 	seq = ++data->log->chunk_seq;
@@ -1109,14 +1082,7 @@ static int gz_work(struct submit_worker *sw, struct workqueue_work *work)
 
 	ret = 0;
 done:
-	if (data->wait) {
-		pthread_mutex_lock(&data->lock);
-		data->done = 1;
-		pthread_cond_signal(&data->cv);
-
-		drop_data_unlock(data);
-	} else
-		free(data);
+	free(data);
 	return ret;
 err:
 	while (!flist_empty(&list)) {
@@ -1128,6 +1094,16 @@ err:
 	goto done;
 }
 
+/*
+ * Invoked from our compress helper thread, when logging would have exceeded
+ * the specified memory limitation. Compresses the previously stored
+ * entries.
+ */
+static int gz_work_async(struct submit_worker *sw, struct workqueue_work *work)
+{
+	return gz_work(container_of(work, struct iolog_flush_data, work));
+}
+
 static int gz_init_worker(struct submit_worker *sw)
 {
 	struct thread_data *td = sw->wq->td;
@@ -1144,7 +1120,7 @@ static int gz_init_worker(struct submit_worker *sw)
 }
 
 static struct workqueue_ops log_compress_wq_ops = {
-	.fn		= gz_work,
+	.fn		= gz_work_async,
 	.init_worker_fn	= gz_init_worker,
 	.nice		= 1,
 };
@@ -1189,26 +1165,13 @@ int iolog_flush(struct io_log *log, int wait)
 	data->nr_samples = log->nr_samples;
 
 	log->nr_samples = 0;
-	log->max_samples = 128;
+	log->max_samples = DEF_LOG_ENTRIES;
 	log->log = malloc(log->max_samples * log_entry_sz(log));
 
-	data->wait = wait;
-	if (data->wait) {
-		pthread_mutex_init(&data->lock, NULL);
-		pthread_cond_init(&data->cv, NULL);
-		data->done = 0;
-		data->refs = 2;
-	}
-
-	workqueue_enqueue(&log->td->log_compress_wq, &data->work);
-
-	if (wait) {
-		pthread_mutex_lock(&data->lock);
-		while (!data->done)
-			pthread_cond_wait(&data->cv, &data->lock);
-
-		drop_data_unlock(data);
-	}
+	if (!wait)
+		workqueue_enqueue(&log->td->log_compress_wq, &data->work);
+	else
+		gz_work(data);
 
 	return 0;
 }
diff --git a/iolog.h b/iolog.h
index 297daf5..74f2170 100644
--- a/iolog.h
+++ b/iolog.h
@@ -41,6 +41,8 @@ enum {
 	IO_LOG_TYPE_IOPS,
 };
 
+#define DEF_LOG_ENTRIES		1024
+
 /*
  * Dynamically growing data sample log
  */
diff --git a/libfio.c b/libfio.c
index c626d15..b17f148 100644
--- a/libfio.c
+++ b/libfio.c
@@ -146,6 +146,8 @@ void reset_all_stats(struct thread_data *td)
 	fio_gettime(&tv, NULL);
 	memcpy(&td->epoch, &tv, sizeof(tv));
 	memcpy(&td->start, &tv, sizeof(tv));
+	memcpy(&td->iops_sample_time, &tv, sizeof(tv));
+	memcpy(&td->bw_sample_time, &tv, sizeof(tv));
 
 	lat_target_reset(td);
 	clear_rusage_stat(td);

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-05-04 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-04 12:00 UTC
  To: fio

The following changes since commit 08a2cbf64720d9371ea4649b1bdc00257916a326:

  Update RBD documentation (2016-05-02 08:25:16 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2078c1369f993b4d760d3fe1f90cb6ffa4389fe5:

  hash: import Linux sparse hash fix (2016-05-03 13:56:47 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      hash: import Linux sparse hash fix

 hash.h | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/hash.h b/hash.h
index 02b0614..1d7608b 100644
--- a/hash.h
+++ b/hash.h
@@ -28,13 +28,29 @@
 #error Define GOLDEN_RATIO_PRIME for your wordsize.
 #endif
 
-#define GR_PRIME_64	0x9e37fffffffc0001ULL
+/*
+ * The above primes are actively bad for hashing, since they are
+ * too sparse. The 32-bit one is mostly ok, the 64-bit one causes
+ * real problems. Besides, the "prime" part is pointless for the
+ * multiplicative hash.
+ *
+ * Although a random odd number will do, it turns out that the golden
+ * ratio phi = (sqrt(5)-1)/2, or its negative, has particularly nice
+ * properties.
+ *
+ * These are the negative, (1 - phi) = (phi^2) = (3 - sqrt(5))/2.
+ * (See Knuth vol 3, section 6.4, exercise 9.)
+ */
+#define GOLDEN_RATIO_32 0x61C88647
+#define GOLDEN_RATIO_64 0x61C8864680B583EBull
 
 static inline unsigned long __hash_long(unsigned long val)
 {
 	unsigned long hash = val;
 
 #if BITS_PER_LONG == 64
+	hash *= GOLDEN_RATIO_64;
+#else
 	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
 	unsigned long n = hash;
 	n <<= 18;
@@ -49,9 +65,6 @@ static inline unsigned long __hash_long(unsigned long val)
 	hash += n;
 	n <<= 2;
 	hash += n;
-#else
-	/* On some cpus multiply is faster, on others gcc will do shifts */
-	hash *= GOLDEN_RATIO_PRIME;
 #endif
 
 	return hash;
@@ -65,7 +78,7 @@ static inline unsigned long hash_long(unsigned long val, unsigned int bits)
 
 static inline uint64_t __hash_u64(uint64_t val)
 {
-	return val * GR_PRIME_64;
+	return val * GOLDEN_RATIO_64;
 }
 	
 static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
@@ -77,7 +90,7 @@ static inline unsigned long hash_ptr(void *ptr, unsigned int bits)
  * Bob Jenkins jhash
  */
 
-#define JHASH_INITVAL	GOLDEN_RATIO_PRIME
+#define JHASH_INITVAL	GOLDEN_RATIO_32
 
 static inline uint32_t rol32(uint32_t word, uint32_t shift)
 {

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-05-03 12:00 Jens Axboe
From: Jens Axboe @ 2016-05-03 12:00 UTC
  To: fio

The following changes since commit fe8d0f4c54f0c308c9a02a4e3c2f5084e8bf5461:

  Fio 2.9 (2016-04-28 16:23:10 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 08a2cbf64720d9371ea4649b1bdc00257916a326:

  Update RBD documentation (2016-05-02 08:25:16 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Update RBD documentation

Tianqing (1):
      rbd: add clustername option

 HOWTO         |  9 +++++++++
 engines/rbd.c | 30 +++++++++++++++++++++++++++++-
 fio.1         |  7 ++++++-
 3 files changed, 44 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 2be1648..1f523d3 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1888,6 +1888,15 @@ be the starting port number since fio will use a range of ports.
 		1 	  : allocate space immidietly inside defragment event,
 			    and free right after event
 
+[rbd] clustername=str	Specifies the name of the Ceph cluster.
+[rbd] rbdname=str	Specifies the name of the RBD.
+[rbd] pool=str		Specifies the naem of the Ceph pool containing RBD.
+[rbd] clientname=str	Specifies the username (without the 'client.' prefix)
+			used to access the Ceph cluster. If the clustername is
+			specified, the clientmae shall be the full type.id
+			string. If no type. prefix is given, fio will add
+			'client.' by default.
+
 [mtd] skip_bad=bool	Skip operations against known bad blocks.
 
 [libhdfs] hdfsdirectory	libhdfs will create chunk in this HDFS directory
diff --git a/engines/rbd.c b/engines/rbd.c
index 8252d27..87ed360 100644
--- a/engines/rbd.c
+++ b/engines/rbd.c
@@ -27,6 +27,7 @@ struct rbd_data {
 
 struct rbd_options {
 	void *pad;
+	char *cluster_name;
 	char *rbd_name;
 	char *pool_name;
 	char *client_name;
@@ -34,6 +35,15 @@ struct rbd_options {
 };
 
 static struct fio_option options[] = {
+        {
+		.name		= "clustername",
+		.lname		= "ceph cluster name",
+		.type		= FIO_OPT_STR_STORE,
+		.help		= "Cluster name for ceph",
+		.off1		= offsetof(struct rbd_options, cluster_name),
+		.category	= FIO_OPT_C_ENGINE,
+		.group		= FIO_OPT_G_RBD,
+        },
 	{
 		.name		= "rbdname",
 		.lname		= "rbd engine rbdname",
@@ -112,7 +122,25 @@ static int _fio_rbd_connect(struct thread_data *td)
 	struct rbd_options *o = td->eo;
 	int r;
 
-	r = rados_create(&rbd->cluster, o->client_name);
+	if (o->cluster_name) {
+		char *client_name = NULL; 
+
+		/*
+		 * If we specify cluser name, the rados_creat2
+		 * will not assume 'client.'. name is considered
+		 * as a full type.id namestr
+		 */
+		if (!index(o->client_name, '.')) {
+			client_name = calloc(1, strlen("client.") +
+						strlen(o->client_name) + 1);
+			strcat(client_name, "client.");
+			o->client_name = strcat(client_name, o->client_name);
+		}
+		r = rados_create2(&rbd->cluster, o->cluster_name,
+					o->client_name, 0);
+	} else
+		r = rados_create(&rbd->cluster, o->client_name);
+	
 	if (r < 0) {
 		log_err("rados_create failed.\n");
 		goto failed_early;
diff --git a/fio.1 b/fio.1
index e502dfe..73fdee6 100644
--- a/fio.1
+++ b/fio.1
@@ -1772,6 +1772,9 @@ Preallocate donor's file on init
 .BI 1:
 allocate space immediately inside defragment event, and free right after event
 .RE
+.TP 
+.BI (rbd)clustername \fR=\fPstr
+Specifies the name of the Ceph cluster.
 .TP
 .BI (rbd)rbdname \fR=\fPstr
 Specifies the name of the RBD.
@@ -1780,7 +1783,9 @@ Specifies the name of the RBD.
 Specifies the name of the Ceph pool containing the RBD.
 .TP
 .BI (rbd)clientname \fR=\fPstr
-Specifies the username (without the 'client.' prefix) used to access the Ceph cluster.
+Specifies the username (without the 'client.' prefix) used to access the Ceph
+cluster. If the clustername is specified, the clientname shall be the full
+type.id string. If no type. prefix is given, fio will add 'client.' by default.
 .TP
 .BI (mtd)skipbad \fR=\fPbool
 Skip operations against known bad blocks.


* Recent changes (master)
@ 2016-04-29 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-29 12:00 UTC
  To: fio

The following changes since commit 603e604eb6d9b3ba9f201a6bff0a18da1a6c0967:

  oslib/getopt_long: allow (unique) short match (2016-04-22 18:14:24 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to fe8d0f4c54f0c308c9a02a4e3c2f5084e8bf5461:

  Fio 2.9 (2016-04-28 16:23:10 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      Fio 2.9

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 502d4fe..fcdbd98 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.8
+DEF_VER=fio-2.9
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 366547d..44cc938 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.8">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.9">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"


* Recent changes (master)
@ 2016-04-24 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-24 12:00 UTC
  To: fio

The following changes since commit 250e878ab5f26b32facbb6e134f3738aa1aa0120:

  include sys/sysmacros.h for major/minor (2016-04-21 07:47:26 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 603e604eb6d9b3ba9f201a6bff0a18da1a6c0967:

  oslib/getopt_long: allow (unique) short match (2016-04-22 18:14:24 -0400)

----------------------------------------------------------------
Jens Axboe (1):
      oslib/getopt_long: allow (unique) short match

 oslib/getopt_long.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/oslib/getopt_long.c b/oslib/getopt_long.c
index 11d879a..8ec7741 100644
--- a/oslib/getopt_long.c
+++ b/oslib/getopt_long.c
@@ -26,14 +26,14 @@ static struct getopt_private_state {
 } pvt;
 
 static inline const char *option_matches(const char *arg_str,
-					 const char *opt_name)
+					 const char *opt_name, int smatch)
 {
 	while (*arg_str != '\0' && *arg_str != '=') {
 		if (*arg_str++ != *opt_name++)
 			return NULL;
 	}
 
-	if (*opt_name)
+	if (*opt_name && !smatch)
 		return NULL;
 
 	return arg_str;
@@ -84,11 +84,37 @@ int getopt_long_only(int argc, char *const *argv, const char *optstring,
 		}
 
 		for (lo = longopts; lo->name; lo++) {
-			if ((opt_end = option_matches(carg+2, lo->name)))
+			opt_end = option_matches(carg+2, lo->name, 0);
+			if (opt_end)
 			    break;
 		}
-		if (!opt_end)
-			return '?';
+		/*
+		 * The GNU getopt_long_only() apparently allows a short match,
+		 * if it's unique and if we don't have a full match. Let's
+		 * do the same here, search and see if there is one (and only
+		 * one) short match.
+		 */
+		if (!opt_end) {
+			const struct option *lo_match = NULL;
+
+			for (lo = longopts; lo->name; lo++) {
+				const char *ret;
+
+				ret = option_matches(carg+2, lo->name, 1);
+				if (!ret)
+					continue;
+				if (!opt_end) {
+					opt_end = ret;
+					lo_match = lo;
+				} else {
+					opt_end = NULL;
+					break;
+				}
+			}
+			if (!opt_end)
+				return '?';
+			lo = lo_match;
+		}
 
 		if (longindex)
 			*longindex = lo-longopts;


* Recent changes (master)
@ 2016-04-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-21 12:00 UTC
  To: fio

The following changes since commit 4b1ddb7afd332431e83ef5f1b9ee0216aef197c4:

  Documentation: it's 'log_max_value', not 'log_max' (2016-04-18 16:54:12 -0400)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 250e878ab5f26b32facbb6e134f3738aa1aa0120:

  include sys/sysmacros.h for major/minor (2016-04-21 07:47:26 -0400)

----------------------------------------------------------------
Mike Frysinger (1):
      include sys/sysmacros.h for major/minor

 os/os-linux.h         | 1 +
 oslib/libmtd_common.h | 1 +
 2 files changed, 2 insertions(+)

---

Diff of recent changes:

diff --git a/os/os-linux.h b/os/os-linux.h
index 9e708f0..23c16b6 100644
--- a/os/os-linux.h
+++ b/os/os-linux.h
@@ -6,6 +6,7 @@
 #include <sys/ioctl.h>
 #include <sys/uio.h>
 #include <sys/syscall.h>
+#include <sys/sysmacros.h>
 #include <sys/vfs.h>
 #include <sys/mman.h>
 #include <unistd.h>
diff --git a/oslib/libmtd_common.h b/oslib/libmtd_common.h
index a123323..9768066 100644
--- a/oslib/libmtd_common.h
+++ b/oslib/libmtd_common.h
@@ -30,6 +30,7 @@
 #include <errno.h>
 #include <features.h>
 #include <inttypes.h>
+#include <sys/sysmacros.h>
 
 #ifndef PROGRAM_NAME
 # error "You must define PROGRAM_NAME before including this header"


* Recent changes (master)
@ 2016-04-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-19 12:00 UTC
  To: fio

The following changes since commit eeea64a6ad6c1fed6c477a49bd5398559cf667d3:

  t/fio-verify-state: show completions in order (2016-04-15 09:03:34 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4b1ddb7afd332431e83ef5f1b9ee0216aef197c4:

  Documentation: it's 'log_max_value', not 'log_max' (2016-04-18 16:54:12 -0400)

----------------------------------------------------------------
Jens Axboe (1):
      Documentation: it's 'log_max_value', not 'log_max'

 HOWTO | 11 ++++++-----
 fio.1 |  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6e052f5..2be1648 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1596,12 +1596,13 @@ log_avg_msec=int By default, fio will log an entry in the iops, latency,
 		disk log, that can quickly grow to a very large size. Setting
 		this option makes fio average the each log entry over the
 		specified period of time, reducing the resolution of the log.
-		See log_max as well. Defaults to 0, logging all entries.
+		See log_max_value as well. Defaults to 0, logging all entries.
+
+log_max_value=bool	If log_avg_msec is set, fio logs the average over that
+		window. If you instead want to log the maximum value, set this
+		option to 1. Defaults to 0, meaning that averaged values are
+		logged.
 
-log_max=bool	If log_avg_msec is set, fio logs the average over that window.
-		If you instead want to log the maximum value, set this option
-		to 1. Defaults to 0, meaning that averaged values are logged.
-.
 log_offset=int	If this is set, the iolog options will include the byte
 		offset for the IO entry as well as the other data values.
 
diff --git a/fio.1 b/fio.1
index b54f568..e502dfe 100644
--- a/fio.1
+++ b/fio.1
@@ -1465,9 +1465,9 @@ By default, fio will log an entry in the iops, latency, or bw log for every
 IO that completes. When writing to the disk log, that can quickly grow to a
 very large size. Setting this option makes fio average the each log entry
 over the specified period of time, reducing the resolution of the log. See
-\fBlog_max\fR as well.  Defaults to 0, logging all entries.
+\fBlog_max_value\fR as well.  Defaults to 0, logging all entries.
 .TP
-.BI log_max \fR=\fPbool
+.BI log_max_value \fR=\fPbool
 If \fBlog_avg_msec\fR is set, fio logs the average over that window. If you
 instead want to log the maximum value, set this option to 1.  Defaults to
 0, meaning that averaged values are logged.


* Recent changes (master)
@ 2016-04-14 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-14 12:00 UTC
  To: fio

The following changes since commit 2a442a30385a3b3a338148be46661f4ea3d9eed1:

  Modify RDMA engine to print strerror messages. (2016-04-04 10:06:03 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to bb7816439c44f6f98debd8afd167da14d612574c:

  client: bool conversion (2016-04-13 15:53:06 -0600)

----------------------------------------------------------------
Jens Axboe (4):
      Fix verify state for multiple files
      Cleanup last write logging
      t/fio-verify-state: pretty up output a bit
      client: bool conversion

 backend.c        |  52 +++++++++++++-----
 client.c         |   4 +-
 client.h         |   6 +--
 file.h           |   7 +++
 fio.h            |   7 ---
 gfio.c           |   2 +-
 init.c           |   4 +-
 io_u.c           |  41 ++++++++------
 server.c         |   4 +-
 server.h         |   2 +-
 t/verify-state.c |  77 ++++++++++++++------------
 verify-state.h   |  34 +++++-------
 verify.c         | 162 ++++++++++++++++++++++++++-----------------------------
 13 files changed, 216 insertions(+), 186 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index e093f75..1723b8f 100644
--- a/backend.c
+++ b/backend.c
@@ -1070,6 +1070,41 @@ reap:
 		bytes_done[i] = td->bytes_done[i] - bytes_done[i];
 }
 
+static void free_file_completion_logging(struct thread_data *td)
+{
+	struct fio_file *f;
+	unsigned int i;
+
+	for_each_file(td, f, i) {
+		if (!f->last_write_comp)
+			break;
+		sfree(f->last_write_comp);
+	}
+}
+
+static int init_file_completion_logging(struct thread_data *td,
+					unsigned int depth)
+{
+	struct fio_file *f;
+	unsigned int i;
+
+	if (td->o.verify == VERIFY_NONE || !td->o.verify_state_save)
+		return 0;
+
+	for_each_file(td, f, i) {
+		f->last_write_comp = scalloc(depth, sizeof(uint64_t));
+		if (!f->last_write_comp)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	free_file_completion_logging(td);
+	log_err("fio: failed to alloc write comp data\n");
+	return 1;
+}
+
 static void cleanup_io_u(struct thread_data *td)
 {
 	struct io_u *io_u;
@@ -1088,8 +1123,7 @@ static void cleanup_io_u(struct thread_data *td)
 	io_u_qexit(&td->io_u_freelist);
 	io_u_qexit(&td->io_u_all);
 
-	if (td->last_write_comp)
-		sfree(td->last_write_comp);
+	free_file_completion_logging(td);
 }
 
 static int init_io_u(struct thread_data *td)
@@ -1206,13 +1240,8 @@ static int init_io_u(struct thread_data *td)
 		p += max_bs;
 	}
 
-	if (td->o.verify != VERIFY_NONE) {
-		td->last_write_comp = scalloc(max_units, sizeof(uint64_t));
-		if (!td->last_write_comp) {
-			log_err("fio: failed to alloc write comp data\n");
-			return 1;
-		}
-	}
+	if (init_file_completion_logging(td, max_units))
+		return 1;
 
 	return 0;
 }
@@ -1964,12 +1993,11 @@ static int fio_verify_load_state(struct thread_data *td)
 
 	if (is_backend) {
 		void *data;
-		int ver;
 
 		ret = fio_server_get_verify_state(td->o.name,
-					td->thread_number - 1, &data, &ver);
+					td->thread_number - 1, &data);
 		if (!ret)
-			verify_convert_assign_state(td, data, ver);
+			verify_assign_state(td, data);
 	} else
 		ret = verify_load_state(td, "local");
 
diff --git a/client.c b/client.c
index 6bc1145..d502a4b 100644
--- a/client.c
+++ b/client.c
@@ -347,7 +347,7 @@ err:
 	return NULL;
 }
 
-int fio_client_add_ini_file(void *cookie, const char *ini_file, int remote)
+int fio_client_add_ini_file(void *cookie, const char *ini_file, bool remote)
 {
 	struct fio_client *client = cookie;
 	struct client_file *cf;
@@ -789,7 +789,7 @@ static int __fio_client_send_local_ini(struct fio_client *client,
 }
 
 int fio_client_send_ini(struct fio_client *client, const char *filename,
-			int remote)
+			bool remote)
 {
 	int ret;
 
diff --git a/client.h b/client.h
index 7fe09d1..ddacf78 100644
--- a/client.h
+++ b/client.h
@@ -22,7 +22,7 @@ enum {
 
 struct client_file {
 	char *file;
-	int remote;
+	bool remote;
 };
 
 struct fio_client {
@@ -124,12 +124,12 @@ extern int fio_clients_connect(void);
 extern int fio_start_client(struct fio_client *);
 extern int fio_start_all_clients(void);
 extern int fio_clients_send_ini(const char *);
-extern int fio_client_send_ini(struct fio_client *, const char *, int);
+extern int fio_client_send_ini(struct fio_client *, const char *, bool);
 extern int fio_handle_clients(struct client_ops *);
 extern int fio_client_add(struct client_ops *, const char *, void **);
 extern struct fio_client *fio_client_add_explicit(struct client_ops *, const char *, int, int);
 extern void fio_client_add_cmd_option(void *, const char *);
-extern int fio_client_add_ini_file(void *, const char *, int);
+extern int fio_client_add_ini_file(void *, const char *, bool);
 extern int fio_client_terminate(struct fio_client *);
 extern void fio_clients_terminate(void);
 extern struct fio_client *fio_get_client(struct fio_client *);
diff --git a/file.h b/file.h
index a631766..e7563b8 100644
--- a/file.h
+++ b/file.h
@@ -98,6 +98,13 @@ struct fio_file {
 	uint64_t last_write;
 
 	/*
+	 * Tracks the last iodepth number of completed writes, if data
+	 * verification is enabled
+	 */
+	uint64_t *last_write_comp;
+	unsigned int last_write_idx;
+
+	/*
 	 * For use by the io engine
 	 */
 	uint64_t engine_data;
diff --git a/fio.h b/fio.h
index 30fbde0..829cc81 100644
--- a/fio.h
+++ b/fio.h
@@ -154,13 +154,6 @@ struct thread_data {
 	uint64_t stat_io_blocks[DDIR_RWDIR_CNT];
 	struct timeval iops_sample_time;
 
-	/*
-	 * Tracks the last iodepth number of completed writes, if data
-	 * verification is enabled
-	 */
-	uint64_t *last_write_comp;
-	unsigned int last_write_idx;
-
 	volatile int update_rusage;
 	struct fio_mutex *rusage_sem;
 	struct rusage ru_start;
diff --git a/gfio.c b/gfio.c
index 42d536e..e3bcbdf 100644
--- a/gfio.c
+++ b/gfio.c
@@ -449,7 +449,7 @@ static int send_job_file(struct gui_entry *ge)
 		free(gco);
 	}
 
-	ret = fio_client_send_ini(gc->client, ge->job_file, 0);
+	ret = fio_client_send_ini(gc->client, ge->job_file, false);
 	if (!ret)
 		return 0;
 
diff --git a/init.c b/init.c
index cc33bf0..89e05c0 100644
--- a/init.c
+++ b/init.c
@@ -2552,14 +2552,14 @@ int parse_cmd_line(int argc, char *argv[], int client_type)
 				    !strncmp(argv[optind], "-", 1))
 					break;
 
-				if (fio_client_add_ini_file(cur_client, argv[optind], 0))
+				if (fio_client_add_ini_file(cur_client, argv[optind], false))
 					break;
 				optind++;
 			}
 			break;
 		case 'R':
 			did_arg = 1;
-			if (fio_client_add_ini_file(cur_client, optarg, 1)) {
+			if (fio_client_add_ini_file(cur_client, optarg, true)) {
 				do_exit++;
 				exit_val = 1;
 			}
diff --git a/io_u.c b/io_u.c
index ea08c92..6622bc0 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1735,6 +1735,28 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	}
 }
 
+static void file_log_write_comp(const struct thread_data *td, struct fio_file *f,
+				uint64_t offset, unsigned int bytes)
+{
+	int idx;
+
+	if (!f)
+		return;
+
+	if (f->first_write == -1ULL || offset < f->first_write)
+		f->first_write = offset;
+	if (f->last_write == -1ULL || ((offset + bytes) > f->last_write))
+		f->last_write = offset + bytes;
+
+	if (!f->last_write_comp)
+		return;
+
+	idx = f->last_write_idx++;
+	f->last_write_comp[idx] = offset;
+	if (f->last_write_idx == td->o.iodepth)
+		f->last_write_idx = 0;
+}
+
 static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 			 struct io_completion_data *icd)
 {
@@ -1785,23 +1807,8 @@ static void io_completed(struct thread_data *td, struct io_u **io_u_ptr,
 		if (!(io_u->flags & IO_U_F_VER_LIST))
 			td->this_io_bytes[ddir] += bytes;
 
-		if (ddir == DDIR_WRITE) {
-			if (f) {
-				if (f->first_write == -1ULL ||
-				    io_u->offset < f->first_write)
-					f->first_write = io_u->offset;
-				if (f->last_write == -1ULL ||
-				    ((io_u->offset + bytes) > f->last_write))
-					f->last_write = io_u->offset + bytes;
-			}
-			if (td->last_write_comp) {
-				int idx = td->last_write_idx++;
-
-				td->last_write_comp[idx] = io_u->offset;
-				if (td->last_write_idx == td->o.iodepth)
-					td->last_write_idx = 0;
-			}
-		}
+		if (ddir == DDIR_WRITE)
+			file_log_write_comp(td, f, io_u->offset, bytes);
 
 		if (ramp_time_over(td) && (td->runstate == TD_RUNNING ||
 					   td->runstate == TD_VERIFYING))
diff --git a/server.c b/server.c
index 6416a5c..dcb7c2d 100644
--- a/server.c
+++ b/server.c
@@ -1819,7 +1819,7 @@ void fio_server_send_start(struct thread_data *td)
 }
 
 int fio_server_get_verify_state(const char *name, int threadnumber,
-				void **datap, int *version)
+				void **datap)
 {
 	struct thread_io_list *s;
 	struct cmd_sendfile out;
@@ -1871,7 +1871,7 @@ fail:
 	 * the header, and the thread_io_list checksum
 	 */
 	s = rep->data + sizeof(struct verify_state_hdr);
-	if (verify_state_hdr(rep->data, s, version)) {
+	if (verify_state_hdr(rep->data, s)) {
 		ret = EILSEQ;
 		goto fail;
 	}
diff --git a/server.h b/server.h
index fd0a0ce..7fc3ec6 100644
--- a/server.h
+++ b/server.h
@@ -211,7 +211,7 @@ extern void fio_server_send_ts(struct thread_stat *, struct group_run_stats *);
 extern void fio_server_send_gs(struct group_run_stats *);
 extern void fio_server_send_du(void);
 extern void fio_server_send_job_options(struct flist_head *, unsigned int);
-extern int fio_server_get_verify_state(const char *, int, void **, int *);
+extern int fio_server_get_verify_state(const char *, int, void **);
 
 extern struct fio_net_cmd *fio_net_recv_cmd(int sk, bool wait);
 
diff --git a/t/verify-state.c b/t/verify-state.c
index 6e8cd35..95dcf3a 100644
--- a/t/verify-state.c
+++ b/t/verify-state.c
@@ -19,15 +19,45 @@ static void show_s(struct thread_io_list *s, unsigned int no_s)
 {
 	int i;
 
-	printf("Thread %u, %s\n", no_s, s->name);
-	printf("Completions: %llu\n", (unsigned long long) s->no_comps);
-	printf("Depth: %llu\n", (unsigned long long) s->depth);
-	printf("Number IOs: %llu\n", (unsigned long long) s->numberio);
-	printf("Index: %llu\n", (unsigned long long) s->index);
+	printf("Thread:\t\t%u\n", no_s);
+	printf("Name:\t\t%s\n", s->name);
+	printf("Completions:\t%llu\n", (unsigned long long) s->no_comps);
+	printf("Depth:\t\t%llu\n", (unsigned long long) s->depth);
+	printf("Number IOs:\t%llu\n", (unsigned long long) s->numberio);
+	printf("Index:\t\t%llu\n", (unsigned long long) s->index);
 
 	printf("Completions:\n");
-	for (i = 0; i < s->no_comps; i++)
-		printf("\t%llu\n", (unsigned long long) s->offsets[i]);
+	for (i = 0; i < s->no_comps; i++) {
+		printf("\t(file=%2llu) %llu\n",
+				(unsigned long long) s->comps[i].fileno,
+				(unsigned long long) s->comps[i].offset);
+	}
+}
+
+static void show(struct thread_io_list *s, size_t size)
+{
+	int no_s;
+
+	no_s = 0;
+	do {
+		int i;
+
+		s->no_comps = le64_to_cpu(s->no_comps);
+		s->depth = le32_to_cpu(s->depth);
+		s->nofiles = le32_to_cpu(s->nofiles);
+		s->numberio = le64_to_cpu(s->numberio);
+		s->index = le64_to_cpu(s->index);
+
+		for (i = 0; i < s->no_comps; i++) {
+			s->comps[i].fileno = le64_to_cpu(s->comps[i].fileno);
+			s->comps[i].offset = le64_to_cpu(s->comps[i].offset);
+		}
+
+		show_s(s, no_s);
+		no_s++;
+		size -= __thread_io_list_sz(s->depth, s->nofiles);
+		s = (void *) s + __thread_io_list_sz(s->depth, s->nofiles);
+	} while (size != 0);
 }
 
 static void show_verify_state(void *buf, size_t size)
@@ -35,15 +65,14 @@ static void show_verify_state(void *buf, size_t size)
 	struct verify_state_hdr *hdr = buf;
 	struct thread_io_list *s;
 	uint32_t crc;
-	int no_s;
 
 	hdr->version = le64_to_cpu(hdr->version);
 	hdr->size = le64_to_cpu(hdr->size);
 	hdr->crc = le64_to_cpu(hdr->crc);
 
-	printf("Version: %x, Size %u, crc %x\n", (unsigned int) hdr->version,
-						(unsigned int) hdr->size,
-						(unsigned int) hdr->crc);
+	printf("Version:\t0x%x\n", (unsigned int) hdr->version);
+	printf("Size:\t\t%u\n", (unsigned int) hdr->size);
+	printf("CRC:\t\t0x%x\n", (unsigned int) hdr->crc);
 
 	size -= sizeof(*hdr);
 	if (hdr->size != size) {
@@ -58,28 +87,10 @@ static void show_verify_state(void *buf, size_t size)
 		return;
 	}
 
-	if (hdr->version != 0x02) {
-		log_err("Can only handle version 2 headers\n");
-		return;
-	}
-
-	no_s = 0;
-	do {
-		int i;
-
-		s->no_comps = le64_to_cpu(s->no_comps);
-		s->depth = le64_to_cpu(s->depth);
-		s->numberio = le64_to_cpu(s->numberio);
-		s->index = le64_to_cpu(s->index);
-
-		for (i = 0; i < s->no_comps; i++)
-			s->offsets[i] = le64_to_cpu(s->offsets[i]);
-
-		show_s(s, no_s);
-		no_s++;
-		size -= __thread_io_list_sz(s->depth);
-		s = (void *) s + __thread_io_list_sz(s->depth);
-	} while (size != 0);
+	if (hdr->version == 0x03)
+		show(s, size);
+	else
+		log_err("Unsupported version %d\n", (int) hdr->version);
 }
 
 int main(int argc, char *argv[])
diff --git a/verify-state.h b/verify-state.h
index 0d004b0..f1dc069 100644
--- a/verify-state.h
+++ b/verify-state.h
@@ -22,24 +22,20 @@ struct thread_rand_state {
 /*
  * For dumping current write state
  */
-struct thread_io_list {
-	uint64_t no_comps;
-	uint64_t depth;
-	uint64_t numberio;
-	uint64_t index;
-	struct thread_rand_state rand;
-	uint8_t name[64];
-	uint64_t offsets[0];
+struct file_comp {
+	uint64_t fileno;
+	uint64_t offset;
 };
 
-struct thread_io_list_v1 {
+struct thread_io_list {
 	uint64_t no_comps;
-	uint64_t depth;
+	uint32_t depth;
+	uint32_t nofiles;
 	uint64_t numberio;
 	uint64_t index;
-	struct thread_rand32_state rand;
+	struct thread_rand_state rand;
 	uint8_t name[64];
-	uint64_t offsets[0];
+	struct file_comp comps[0];
 };
 
 struct all_io_list {
@@ -47,8 +43,7 @@ struct all_io_list {
 	struct thread_io_list state[0];
 };
 
-#define VSTATE_HDR_VERSION_V1	0x01
-#define VSTATE_HDR_VERSION	0x02
+#define VSTATE_HDR_VERSION	0x03
 
 struct verify_state_hdr {
 	uint64_t version;
@@ -65,18 +60,17 @@ extern void verify_save_state(int mask);
 extern int verify_load_state(struct thread_data *, const char *);
 extern void verify_free_state(struct thread_data *);
 extern int verify_state_should_stop(struct thread_data *, struct io_u *);
-extern void verify_convert_assign_state(struct thread_data *, void *, int);
-extern int verify_state_hdr(struct verify_state_hdr *, struct thread_io_list *,
-				int *);
+extern void verify_assign_state(struct thread_data *, void *);
+extern int verify_state_hdr(struct verify_state_hdr *, struct thread_io_list *);
 
-static inline size_t __thread_io_list_sz(uint64_t depth)
+static inline size_t __thread_io_list_sz(uint32_t depth, uint32_t nofiles)
 {
-	return sizeof(struct thread_io_list) + depth * sizeof(uint64_t);
+	return sizeof(struct thread_io_list) + depth * nofiles * sizeof(struct file_comp);
 }
 
 static inline size_t thread_io_list_sz(struct thread_io_list *s)
 {
-	return __thread_io_list_sz(le64_to_cpu(s->depth));
+	return __thread_io_list_sz(le32_to_cpu(s->depth), le32_to_cpu(s->nofiles));
 }
 
 static inline struct thread_io_list *io_list_next(struct thread_io_list *s)
diff --git a/verify.c b/verify.c
index 0f43a3e..838db10 100644
--- a/verify.c
+++ b/verify.c
@@ -1353,6 +1353,47 @@ int paste_blockoff(char *buf, unsigned int len, void *priv)
 	return 0;
 }
 
+static int __fill_file_completions(struct thread_data *td,
+				   struct thread_io_list *s,
+				   struct fio_file *f, unsigned int *index)
+{
+	unsigned int comps;
+	int i, j;
+
+	if (!f->last_write_comp)
+		return 0;
+
+	if (td->io_blocks[DDIR_WRITE] < td->o.iodepth)
+		comps = td->io_blocks[DDIR_WRITE];
+	else
+		comps = td->o.iodepth;
+
+	j = f->last_write_idx - 1;
+	for (i = 0; i < comps; i++) {
+		if (j == -1)
+			j = td->o.iodepth - 1;
+		s->comps[*index].fileno = __cpu_to_le64(f->fileno);
+		s->comps[*index].offset = cpu_to_le64(f->last_write_comp[j]);
+		(*index)++;
+		j--;
+	}
+
+	return comps;
+}
+
+static int fill_file_completions(struct thread_data *td,
+				 struct thread_io_list *s, unsigned int *index)
+{
+	struct fio_file *f;
+	unsigned int i;
+	int comps = 0;
+
+	for_each_file(td, f, i)
+		comps += __fill_file_completions(td, s, f, index);
+
+	return comps;
+}
+
 struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 {
 	struct all_io_list *rep;
@@ -1374,7 +1415,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 			continue;
 		td->stop_io = 1;
 		td->flags |= TD_F_VSTATE_SAVED;
-		depth += td->o.iodepth;
+		depth += (td->o.iodepth * td->o.nr_files);
 		nr++;
 	}
 
@@ -1383,7 +1424,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 
 	*sz = sizeof(*rep);
 	*sz += nr * sizeof(struct thread_io_list);
-	*sz += depth * sizeof(uint64_t);
+	*sz += depth * sizeof(struct file_comp);
 	rep = malloc(*sz);
 	memset(rep, 0, *sz);
 
@@ -1392,31 +1433,16 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 	next = &rep->state[0];
 	for_each_td(td, i) {
 		struct thread_io_list *s = next;
-		unsigned int comps;
+		unsigned int comps, index = 0;
 
 		if (save_mask != IO_LIST_ALL && (i + 1) != save_mask)
 			continue;
 
-		if (td->last_write_comp) {
-			int j, k;
-
-			if (td->io_blocks[DDIR_WRITE] < td->o.iodepth)
-				comps = td->io_blocks[DDIR_WRITE];
-			else
-				comps = td->o.iodepth;
-
-			k = td->last_write_idx - 1;
-			for (j = 0; j < comps; j++) {
-				if (k == -1)
-					k = td->o.iodepth - 1;
-				s->offsets[j] = cpu_to_le64(td->last_write_comp[k]);
-				k--;
-			}
-		} else
-			comps = 0;
+		comps = fill_file_completions(td, s, &index);
 
 		s->no_comps = cpu_to_le64((uint64_t) comps);
 		s->depth = cpu_to_le64((uint64_t) td->o.iodepth);
+		s->nofiles = cpu_to_le64((uint64_t) td->o.nr_files);
 		s->numberio = cpu_to_le64((uint64_t) td->io_issues[DDIR_WRITE]);
 		s->index = cpu_to_le64((uint64_t) i);
 		if (td->random_state.use64) {
@@ -1536,72 +1562,34 @@ void verify_free_state(struct thread_data *td)
 		free(td->vstate);
 }
 
-static struct thread_io_list *convert_v1_list(struct thread_io_list_v1 *s)
+void verify_assign_state(struct thread_data *td, void *p)
 {
-	struct thread_io_list *til;
+	struct thread_io_list *s = p;
 	int i;
 
-	til = malloc(__thread_io_list_sz(s->no_comps));
-	til->no_comps = s->no_comps;
-	til->depth = s->depth;
-	til->numberio = s->numberio;
-	til->index = s->index;
-	memcpy(til->name, s->name, sizeof(til->name));
-
-	til->rand.use64 = 0;
-	for (i = 0; i < 4; i++)
-		til->rand.state32.s[i] = s->rand.s[i];
+	s->no_comps = le64_to_cpu(s->no_comps);
+	s->depth = le32_to_cpu(s->depth);
+	s->nofiles = le32_to_cpu(s->nofiles);
+	s->numberio = le64_to_cpu(s->numberio);
+	s->rand.use64 = le64_to_cpu(s->rand.use64);
 
-	for (i = 0; i < s->no_comps; i++)
-		til->offsets[i] = s->offsets[i];
-
-	return til;
-}
-
-void verify_convert_assign_state(struct thread_data *td, void *p, int version)
-{
-	struct thread_io_list *til;
-	int i;
-
-	if (version == 1) {
-		struct thread_io_list_v1 *s = p;
-
-		s->no_comps = le64_to_cpu(s->no_comps);
-		s->depth = le64_to_cpu(s->depth);
-		s->numberio = le64_to_cpu(s->numberio);
-		for (i = 0; i < 4; i++)
-			s->rand.s[i] = le32_to_cpu(s->rand.s[i]);
-		for (i = 0; i < s->no_comps; i++)
-			s->offsets[i] = le64_to_cpu(s->offsets[i]);
-
-		til = convert_v1_list(s);
-		free(s);
+	if (s->rand.use64) {
+		for (i = 0; i < 6; i++)
+			s->rand.state64.s[i] = le64_to_cpu(s->rand.state64.s[i]);
 	} else {
-		struct thread_io_list *s = p;
-
-		s->no_comps = le64_to_cpu(s->no_comps);
-		s->depth = le64_to_cpu(s->depth);
-		s->numberio = le64_to_cpu(s->numberio);
-		s->rand.use64 = le64_to_cpu(s->rand.use64);
-
-		if (s->rand.use64) {
-			for (i = 0; i < 6; i++)
-				s->rand.state64.s[i] = le64_to_cpu(s->rand.state64.s[i]);
-		} else {
-			for (i = 0; i < 4; i++)
-				s->rand.state32.s[i] = le32_to_cpu(s->rand.state32.s[i]);
-		}
-		for (i = 0; i < s->no_comps; i++)
-			s->offsets[i] = le64_to_cpu(s->offsets[i]);
+		for (i = 0; i < 4; i++)
+			s->rand.state32.s[i] = le32_to_cpu(s->rand.state32.s[i]);
+	}
 
-		til = p;
+	for (i = 0; i < s->no_comps; i++) {
+		s->comps[i].fileno = le64_to_cpu(s->comps[i].fileno);
+		s->comps[i].offset = le64_to_cpu(s->comps[i].offset);
 	}
 
-	td->vstate = til;
+	td->vstate = p;
 }
 
-int verify_state_hdr(struct verify_state_hdr *hdr, struct thread_io_list *s,
-		     int *version)
+int verify_state_hdr(struct verify_state_hdr *hdr, struct thread_io_list *s)
 {
 	uint64_t crc;
 
@@ -1609,15 +1597,13 @@ int verify_state_hdr(struct verify_state_hdr *hdr, struct thread_io_list *s,
 	hdr->size = le64_to_cpu(hdr->size);
 	hdr->crc = le64_to_cpu(hdr->crc);
 
-	if (hdr->version != VSTATE_HDR_VERSION &&
-	    hdr->version != VSTATE_HDR_VERSION_V1)
+	if (hdr->version != VSTATE_HDR_VERSION)
 		return 1;
 
 	crc = fio_crc32c((void *)s, hdr->size);
 	if (crc != hdr->crc)
 		return 1;
 
-	*version = hdr->version;
 	return 0;
 }
 
@@ -1648,9 +1634,9 @@ int verify_load_state(struct thread_data *td, const char *prefix)
 	hdr.size = le64_to_cpu(hdr.size);
 	hdr.crc = le64_to_cpu(hdr.crc);
 
-	if (hdr.version != VSTATE_HDR_VERSION &&
-	    hdr.version != VSTATE_HDR_VERSION_V1) {
-		log_err("fio: bad version in verify state header\n");
+	if (hdr.version != VSTATE_HDR_VERSION) {
+		log_err("fio: unsupported (%d) version in verify state header\n",
+				(unsigned int) hdr.version);
 		goto err;
 	}
 
@@ -1671,7 +1657,7 @@ int verify_load_state(struct thread_data *td, const char *prefix)
 
 	close(fd);
 
-	verify_convert_assign_state(td, s, hdr.version);
+	verify_assign_state(td, s);
 	return 0;
 err:
 	if (s)
@@ -1686,9 +1672,10 @@ err:
 int verify_state_should_stop(struct thread_data *td, struct io_u *io_u)
 {
 	struct thread_io_list *s = td->vstate;
+	struct fio_file *f = io_u->file;
 	int i;
 
-	if (!s)
+	if (!s || !f)
 		return 0;
 
 	/*
@@ -1705,9 +1692,12 @@ int verify_state_should_stop(struct thread_data *td, struct io_u *io_u)
 	 * completed or not. If the IO was seen as completed, then
 	 * lets verify it.
 	 */
-	for (i = 0; i < s->no_comps; i++)
-		if (io_u->offset == s->offsets[i])
+	for (i = 0; i < s->no_comps; i++) {
+		if (s->comps[i].fileno != f->fileno)
+			continue;
+		if (io_u->offset == s->comps[i].offset)
 			return 0;
+	}
 
 	/*
 	 * Not found, we have to stop


* Recent changes (master)
@ 2016-04-05 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-05 12:00 UTC
  To: fio

The following changes since commit 526c403dc9b892ae1dda6ebb2a0f2d5883795d17:

  fio: register pvsync2 engine correctly (2016-04-01 20:56:52 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2a442a30385a3b3a338148be46661f4ea3d9eed1:

  Modify RDMA engine to print strerror messages. (2016-04-04 10:06:03 -0600)

----------------------------------------------------------------
Logan Gunthorpe (1):
      Modify RDMA engine to print strerror messages.

 engines/rdma.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

---

Diff of recent changes:

diff --git a/engines/rdma.c b/engines/rdma.c
index 87ba465..7fbfad9 100644
--- a/engines/rdma.c
+++ b/engines/rdma.c
@@ -415,7 +415,7 @@ static int fio_rdmaio_setup_qp(struct thread_data *td)
 		rd->pd = ibv_alloc_pd(rd->cm_id->verbs);
 
 	if (rd->pd == NULL) {
-		log_err("fio: ibv_alloc_pd fail\n");
+		log_err("fio: ibv_alloc_pd fail: %m\n");
 		return 1;
 	}
 
@@ -424,7 +424,7 @@ static int fio_rdmaio_setup_qp(struct thread_data *td)
 	else
 		rd->channel = ibv_create_comp_channel(rd->cm_id->verbs);
 	if (rd->channel == NULL) {
-		log_err("fio: ibv_create_comp_channel fail\n");
+		log_err("fio: ibv_create_comp_channel fail: %m\n");
 		goto err1;
 	}
 
@@ -438,12 +438,12 @@ static int fio_rdmaio_setup_qp(struct thread_data *td)
 		rd->cq = ibv_create_cq(rd->cm_id->verbs,
 				       qp_depth, rd, rd->channel, 0);
 	if (rd->cq == NULL) {
-		log_err("fio: ibv_create_cq failed\n");
+		log_err("fio: ibv_create_cq failed: %m\n");
 		goto err2;
 	}
 
 	if (ibv_req_notify_cq(rd->cq, 0) != 0) {
-		log_err("fio: ibv_create_cq failed\n");
+		log_err("fio: ibv_req_notify_cq failed: %m\n");
 		goto err3;
 	}
 
@@ -459,13 +459,13 @@ static int fio_rdmaio_setup_qp(struct thread_data *td)
 
 	if (rd->is_client == 0) {
 		if (rdma_create_qp(rd->child_cm_id, rd->pd, &init_attr) != 0) {
-			log_err("fio: rdma_create_qp failed\n");
+			log_err("fio: rdma_create_qp failed: %m\n");
 			goto err3;
 		}
 		rd->qp = rd->child_cm_id->qp;
 	} else {
 		if (rdma_create_qp(rd->cm_id, rd->pd, &init_attr) != 0) {
-			log_err("fio: rdma_create_qp failed\n");
+			log_err("fio: rdma_create_qp failed: %m\n");
 			goto err3;
 		}
 		rd->qp = rd->cm_id->qp;
@@ -490,14 +490,14 @@ static int fio_rdmaio_setup_control_msg_buffers(struct thread_data *td)
 	rd->recv_mr = ibv_reg_mr(rd->pd, &rd->recv_buf, sizeof(rd->recv_buf),
 				 IBV_ACCESS_LOCAL_WRITE);
 	if (rd->recv_mr == NULL) {
-		log_err("fio: recv_buf reg_mr failed\n");
+		log_err("fio: recv_buf reg_mr failed: %m\n");
 		return 1;
 	}
 
 	rd->send_mr = ibv_reg_mr(rd->pd, &rd->send_buf, sizeof(rd->send_buf),
 				 0);
 	if (rd->send_mr == NULL) {
-		log_err("fio: send_buf reg_mr failed\n");
+		log_err("fio: send_buf reg_mr failed: %m\n");
 		ibv_dereg_mr(rd->recv_mr);
 		return 1;
 	}
@@ -731,7 +731,7 @@ static int fio_rdmaio_send(struct thread_data *td, struct io_u **io_us,
 		}
 
 		if (ibv_post_send(rd->qp, &r_io_u_d->sq_wr, &bad_wr) != 0) {
-			log_err("fio: ibv_post_send fail\n");
+			log_err("fio: ibv_post_send fail: %m\n");
 			return -1;
 		}
 
@@ -759,7 +759,7 @@ static int fio_rdmaio_recv(struct thread_data *td, struct io_u **io_us,
 			r_io_u_d = io_us[i]->engine_data;
 			if (ibv_post_recv(rd->qp, &r_io_u_d->rq_wr, &bad_wr) !=
 			    0) {
-				log_err("fio: ibv_post_recv fail\n");
+				log_err("fio: ibv_post_recv fail: %m\n");
 				return 1;
 			}
 		}
@@ -767,7 +767,7 @@ static int fio_rdmaio_recv(struct thread_data *td, struct io_u **io_us,
 		   || (rd->rdma_protocol == FIO_RDMA_MEM_WRITE)) {
 		/* re-post the rq_wr */
 		if (ibv_post_recv(rd->qp, &rd->rq_wr, &bad_wr) != 0) {
-			log_err("fio: ibv_post_recv fail\n");
+			log_err("fio: ibv_post_recv fail: %m\n");
 			return 1;
 		}
 
@@ -866,7 +866,7 @@ static int fio_rdmaio_connect(struct thread_data *td, struct fio_file *f)
 	conn_param.retry_count = 10;
 
 	if (rdma_connect(rd->cm_id, &conn_param) != 0) {
-		log_err("fio: rdma_connect fail\n");
+		log_err("fio: rdma_connect fail: %m\n");
 		return 1;
 	}
 
@@ -881,7 +881,7 @@ static int fio_rdmaio_connect(struct thread_data *td, struct fio_file *f)
 	rd->send_buf.nr = htonl(td->o.iodepth);
 
 	if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-		log_err("fio: ibv_post_send fail");
+		log_err("fio: ibv_post_send fail: %m");
 		return 1;
 	}
 
@@ -918,7 +918,7 @@ static int fio_rdmaio_accept(struct thread_data *td, struct fio_file *f)
 	conn_param.initiator_depth = 1;
 
 	if (rdma_accept(rd->child_cm_id, &conn_param) != 0) {
-		log_err("fio: rdma_accept\n");
+		log_err("fio: rdma_accept: %m\n");
 		return 1;
 	}
 
@@ -932,7 +932,7 @@ static int fio_rdmaio_accept(struct thread_data *td, struct fio_file *f)
 	ret = rdma_poll_wait(td, IBV_WC_RECV) < 0;
 
 	if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-		log_err("fio: ibv_post_send fail");
+		log_err("fio: ibv_post_send fail: %m");
 		return 1;
 	}
 
@@ -965,7 +965,7 @@ static int fio_rdmaio_close_file(struct thread_data *td, struct fio_file *f)
 				     || (rd->rdma_protocol ==
 					 FIO_RDMA_MEM_READ))) {
 		if (ibv_post_send(rd->qp, &rd->sq_wr, &bad_wr) != 0) {
-			log_err("fio: ibv_post_send fail");
+			log_err("fio: ibv_post_send fail: %m");
 			return 1;
 		}
 
@@ -1084,12 +1084,12 @@ static int fio_rdmaio_setup_listen(struct thread_data *td, short port)
 
 	/* rdma_listen */
 	if (rdma_bind_addr(rd->cm_id, (struct sockaddr *)&rd->addr) != 0) {
-		log_err("fio: rdma_bind_addr fail\n");
+		log_err("fio: rdma_bind_addr fail: %m\n");
 		return 1;
 	}
 
 	if (rdma_listen(rd->cm_id, 3) != 0) {
-		log_err("fio: rdma_listen fail\n");
+		log_err("fio: rdma_listen fail: %m\n");
 		return 1;
 	}
 
@@ -1110,7 +1110,7 @@ static int fio_rdmaio_setup_listen(struct thread_data *td, short port)
 
 	/* post recv buf */
 	if (ibv_post_recv(rd->qp, &rd->rq_wr, &bad_wr) != 0) {
-		log_err("fio: ibv_post_recv fail\n");
+		log_err("fio: ibv_post_recv fail: %m\n");
 		return 1;
 	}
 
@@ -1238,13 +1238,13 @@ static int fio_rdmaio_init(struct thread_data *td)
 
 	rd->cm_channel = rdma_create_event_channel();
 	if (!rd->cm_channel) {
-		log_err("fio: rdma_create_event_channel fail\n");
+		log_err("fio: rdma_create_event_channel fail: %m\n");
 		return 1;
 	}
 
 	ret = rdma_create_id(rd->cm_channel, &rd->cm_id, rd, RDMA_PS_TCP);
 	if (ret) {
-		log_err("fio: rdma_create_id fail\n");
+		log_err("fio: rdma_create_id fail: %m\n");
 		return 1;
 	}
 
@@ -1295,7 +1295,7 @@ static int fio_rdmaio_init(struct thread_data *td)
 				      IBV_ACCESS_REMOTE_READ |
 				      IBV_ACCESS_REMOTE_WRITE);
 		if (io_u->mr == NULL) {
-			log_err("fio: ibv_reg_mr io_u failed\n");
+			log_err("fio: ibv_reg_mr io_u failed: %m\n");
 			return 1;
 		}
 

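The RDMA change appends %m to each log_err() format. %m is a glibc printf extension, not standard C. It expands to strerror(errno) and consumes no argument. The printed text is therefore only meaningful when the failing call reports its error through errno and nothing has overwritten errno before the message is formatted. Where %m is unavailable, the portable equivalent is to format strerror(errno) explicitly. A minimal sketch follows; format_fail() is a hypothetical helper, not part of fio.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Portable spelling of "fio: <what> fail: %m\n": errno is copied first,
 * because any library call made before formatting could clobber it.
 */
static int format_fail(char *buf, size_t len, const char *what)
{
	int err = errno;

	return snprintf(buf, len, "fio: %s fail: %s\n", what, strerror(err));
}
```

A caller would pass the resulting buffer to its logger immediately after the failing call returns.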

* Recent changes (master)
@ 2016-04-02 12:00 Jens Axboe
From: Jens Axboe @ 2016-04-02 12:00 UTC
  To: fio

The following changes since commit 23a8e176c3725f3640eaaf31a0a4c7497366c40f:

  HOWTO/man: clarify that the usr/sys utilization numbers are averages (2016-03-29 08:34:06 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 526c403dc9b892ae1dda6ebb2a0f2d5883795d17:

  fio: register pvsync2 engine correctly (2016-04-01 20:56:52 -0600)

----------------------------------------------------------------
Jon Derrick (1):
      fio: register pvsync2 engine correctly

 engines/sync.c | 6 ++++++
 options.c      | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/engines/sync.c b/engines/sync.c
index 0b0d1a7..260ef66 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -453,6 +453,9 @@ static void fio_init fio_syncio_register(void)
 #ifdef CONFIG_PWRITEV
 	register_ioengine(&ioengine_pvrw);
 #endif
+#ifdef CONFIG_PWRITEV2
+	register_ioengine(&ioengine_pvrw2);
+#endif
 }
 
 static void fio_exit fio_syncio_unregister(void)
@@ -463,4 +466,7 @@ static void fio_exit fio_syncio_unregister(void)
 #ifdef CONFIG_PWRITEV
 	unregister_ioengine(&ioengine_pvrw);
 #endif
+#ifdef CONFIG_PWRITEV2
+	unregister_ioengine(&ioengine_pvrw2);
+#endif
 }
diff --git a/options.c b/options.c
index 062abb4..b6c980e 100644
--- a/options.c
+++ b/options.c
@@ -1471,7 +1471,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Use preadv/pwritev",
 			  },
 #endif
-#ifdef CONFIG_PWRITEV
+#ifdef CONFIG_PWRITEV2
 			  { .ival = "pvsync2",
 			    .help = "Use preadv2/pwritev2",
 			  },

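After this fix, both the registration of the pvsync2 engine ("Use preadv2/pwritev2") and its entry in the ioengine option list depend on CONFIG_PWRITEV2 instead of CONFIG_PWRITEV. This matters because a system can provide preadv() without preadv2(). For reference, here is a minimal sketch of a single positional preadv2() call. It assumes Linux with glibc 2.26 or later. read_at_v2() is a hypothetical helper and does not reproduce the engine's code.

```c
#define _GNU_SOURCE		/* preadv2() is a GNU/Linux extension */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read len bytes at offset off through preadv2(). With flags == 0 this
 * behaves like preadv(); per-call flags such as RWF_HIPRI are what the
 * v2 variant adds.
 */
static ssize_t read_at_v2(int fd, void *buf, size_t len, off_t off, int flags)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off, flags);
}
```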

* Recent changes (master)
@ 2016-03-30 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-30 12:00 UTC
  To: fio

The following changes since commit 53280a1dc785ef0447aa5cf8e32e899ecfc22978:

  t/read-to-pipe-async: use gettimeofday() instead of clock_gettime() (2016-03-25 09:40:41 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 23a8e176c3725f3640eaaf31a0a4c7497366c40f:

  HOWTO/man: clarify that the usr/sys utilization numbers are averages (2016-03-29 08:34:06 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      Fix crash with group_reporting and one group member that never started
      HOWTO/man: clarify that the usr/sys utilization numbers are averages

 HOWTO  | 4 +++-
 fio.1  | 4 +++-
 stat.c | 2 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 0d3d2fb..6e052f5 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1987,7 +1987,9 @@ runt=		The runtime of that thread
 cpu=		CPU usage. User and system time, along with the number
 		of context switches this thread went through, usage of
 		system and user time, and finally the number of major
-		and minor page faults.
+		and minor page faults. The CPU utilization numbers are
+		averages for the jobs in that reporting group, while the
+		context and fault counters are summed.
 IO depths=	The distribution of io depths over the job life time. The
 		numbers are divided into powers of 2, so for example the
 		16= entries includes depths up to that value but higher
diff --git a/fio.1 b/fio.1
index df140cf..b54f568 100644
--- a/fio.1
+++ b/fio.1
@@ -1873,7 +1873,9 @@ and standard deviation.
 .TP
 .B cpu
 CPU usage statistics. Includes user and system time, number of context switches
-this thread went through and number of major and minor page faults.
+this thread went through and number of major and minor page faults. The CPU
+utilization numbers are averages for the jobs in that reporting group, while
+the context and fault counters are summed.
 .TP
 .B IO depths
 Distribution of I/O depths.  Each depth includes everything less than (or equal)
diff --git a/stat.c b/stat.c
index d2720a4..6d8d4d0 100644
--- a/stat.c
+++ b/stat.c
@@ -1580,6 +1580,8 @@ void __show_run_stats(void)
 		unsigned long long bw;
 
 		ts = &threadstats[i];
+		if (ts->groupid == -1)
+			continue;
 		rs = &runstats[ts->groupid];
 		rs->kb_base = ts->kb_base;
 		rs->unit_base = ts->unit_base;

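The HOWTO and fio.1 text spells out how the cpu= line is aggregated across a reporting group: utilization is averaged over the jobs, while context switches and page faults are summed. The stat.c hunk skips any thread whose groupid is -1, the case the commit subject describes as a group member that never started. It does this because ts->groupid is used directly as an index into runstats[]. Below is a sketch of both rules under assumed types. struct job_cpu, struct group_cpu, NR_GROUPS, and the helper names are hypothetical and are not fio's.

```c
#include <assert.h>
#include <string.h>

#define NR_GROUPS	4	/* hypothetical cap on reporting groups */

struct job_cpu {		/* hypothetical per-job sample */
	int groupid;		/* -1 if the job never started */
	double usr_pct, sys_pct;
	unsigned long ctx, majf, minf;
};

struct group_cpu {		/* hypothetical per-group totals */
	int nr_jobs;
	double usr_sum, sys_sum;
	unsigned long ctx, majf, minf;
};

static void sum_groups(const struct job_cpu *jobs, int n,
		       struct group_cpu groups[NR_GROUPS])
{
	int i;

	memset(groups, 0, NR_GROUPS * sizeof(*groups));
	for (i = 0; i < n; i++) {
		struct group_cpu *g;

		/* Same guard as the stat.c fix: -1 is not a valid index. */
		if (jobs[i].groupid == -1)
			continue;
		assert(jobs[i].groupid >= 0 && jobs[i].groupid < NR_GROUPS);
		g = &groups[jobs[i].groupid];
		g->nr_jobs++;
		g->usr_sum += jobs[i].usr_pct;
		g->sys_sum += jobs[i].sys_pct;
		/* Counters are summed across the group. */
		g->ctx += jobs[i].ctx;
		g->majf += jobs[i].majf;
		g->minf += jobs[i].minf;
	}
}

/* Utilization is reported as the per-job average within the group. */
static double group_usr_avg(const struct group_cpu *g)
{
	return g->nr_jobs ? g->usr_sum / g->nr_jobs : 0.0;
}

static double group_sys_avg(const struct group_cpu *g)
{
	return g->nr_jobs ? g->sys_sum / g->nr_jobs : 0.0;
}
```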

* Recent changes (master)
@ 2016-03-26 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-26 12:00 UTC
  To: fio

The following changes since commit ae46d0f5618d1d2e63d0a733b79f136d88ccac90:

  t/memlock: sample utility to use X memory from Y threads (2016-03-24 17:07:05 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 53280a1dc785ef0447aa5cf8e32e899ecfc22978:

  t/read-to-pipe-async: use gettimeofday() instead of clock_gettime() (2016-03-25 09:40:41 -0600)

----------------------------------------------------------------
Jens Axboe (2):
      t/read-to-pipe-async: synchronization fixes
      t/read-to-pipe-async: use gettimeofday() instead of clock_gettime()

 t/read-to-pipe-async.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

---

Diff of recent changes:

diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
index 30a7631..e8bdc85 100644
--- a/t/read-to-pipe-async.c
+++ b/t/read-to-pipe-async.c
@@ -32,7 +32,6 @@
 #include <pthread.h>
 #include <errno.h>
 #include <assert.h>
-#include <time.h>
 
 #include "../flist.h"
 
@@ -231,6 +230,12 @@ static int write_work(struct work_item *work)
 	return work->seq + 1;
 }
 
+static void thread_exiting(struct thread_data *thread)
+{
+	__sync_fetch_and_add(&thread->done, 1);
+	pthread_cond_signal(&thread->done_cond);
+}
+
 static void *writer_fn(void *data)
 {
 	struct writer_thread *wt = data;
@@ -258,8 +263,7 @@ static void *writer_fn(void *data)
 			seq = write_work(work);
 	}
 
-	wt->thread.done = 1;
-	pthread_cond_signal(&wt->thread.done_cond);
+	thread_exiting(&wt->thread);
 	return NULL;
 }
 
@@ -361,14 +365,13 @@ static void *reader_fn(void *data)
 		pthread_mutex_unlock(&rt->thread.lock);
 
 		if (work) {
-			rt->busy = 1;
+			__sync_fetch_and_add(&rt->busy, 1);
 			reader_work(work);
-			rt->busy = 0;
+			__sync_fetch_and_sub(&rt->busy, 1);
 		}
 	}
 
-	rt->thread.done = 1;
-	pthread_cond_signal(&rt->thread.done_cond);
+	thread_exiting(&rt->thread);
 	return NULL;
 }
 
@@ -469,20 +472,21 @@ static void exit_thread(struct thread_data *thread,
 			void fn(struct writer_thread *),
 			struct writer_thread *wt)
 {
-	thread->exit = 1;
+	__sync_fetch_and_add(&thread->exit, 1);
 	pthread_cond_signal(&thread->cond);
 
 	while (!thread->done) {
 		pthread_mutex_lock(&thread->done_lock);
 
 		if (fn) {
-			struct timespec t;
-
-			clock_gettime(CLOCK_REALTIME, &t);
-			t.tv_sec++;
+			struct timeval tv;
+			struct timespec ts;
 
+			gettimeofday(&tv, NULL);
+			ts.tv_sec = tv.tv_sec + 1;
+			ts.tv_nsec = tv.tv_usec * 1000ULL;
 
-			pthread_cond_timedwait(&thread->done_cond, &thread->done_lock, &t);
+			pthread_cond_timedwait(&thread->done_cond, &thread->done_lock, &ts);
 			fn(wt);
 		} else
 			pthread_cond_wait(&thread->done_cond, &thread->done_lock);
@@ -606,7 +610,8 @@ int main(int argc, char *argv[])
 	while (sb.st_size) {
 		struct work_item *work;
 		size_t this_len;
-		struct timespec t;
+		struct timespec ts;
+		struct timeval tv;
 
 		prune_done_entries(wt);
 
@@ -627,15 +632,17 @@ int main(int argc, char *argv[])
 
 		queue_work(rt, work);
 
-		clock_gettime(CLOCK_REALTIME, &t);
-		t.tv_nsec += max_us * 1000ULL;
-		if (t.tv_nsec >= 1000000000ULL) {
-			t.tv_nsec -= 1000000000ULL;
-			t.tv_sec++;
+		gettimeofday(&tv, NULL);
+		ts.tv_sec = tv.tv_sec;
+		ts.tv_nsec = tv.tv_usec * 1000ULL;
+		ts.tv_nsec += max_us * 1000ULL;
+		if (ts.tv_nsec >= 1000000000ULL) {
+			ts.tv_nsec -= 1000000000ULL;
+			ts.tv_sec++;
 		}
 
 		pthread_mutex_lock(&work->lock);
-		pthread_cond_timedwait(&work->cond, &work->lock, &t);
+		pthread_cond_timedwait(&work->cond, &work->lock, &ts);
 		pthread_mutex_unlock(&work->lock);
 
 		off += this_len;

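In this push, exit_thread() waits on done_cond with a deadline of now + 1 s, and main() waits on work->cond with a deadline of now + max_us. Both deadlines are now derived from gettimeofday() and converted by hand into the absolute struct timespec that pthread_cond_timedwait() expects. By default, a condition variable measures that timespec against CLOCK_REALTIME, the same wall clock gettimeofday() reads.

The sketch below factors the conversion into deadline_after_us(), which carries whole seconds for any microsecond value. It pairs this with a done-flag handshake in which the flag is both written and tested under the mutex, so a signal sent before the waiter blocks cannot be lost. struct done_flag and all helper names are hypothetical and are not taken from read-to-pipe-async.c.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/time.h>
#include <time.h>

/* Absolute deadline 'us' microseconds after 'now' (timeval -> timespec). */
static struct timespec deadline_after_us(const struct timeval *now,
					 unsigned long long us)
{
	struct timespec ts;
	/* Add the sub-second parts in nanoseconds, then carry whole seconds. */
	unsigned long long nsec = now->tv_usec * 1000ULL +
				  (us % 1000000ULL) * 1000ULL;

	ts.tv_sec = now->tv_sec + us / 1000000ULL + nsec / 1000000000ULL;
	ts.tv_nsec = nsec % 1000000000ULL;
	return ts;
}

struct done_flag {		/* hypothetical exit handshake */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void done_flag_init(struct done_flag *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->cond, NULL);
	d->done = 0;
}

/* Exiting side: set the flag and signal while holding the mutex. */
static void done_flag_set(struct done_flag *d)
{
	pthread_mutex_lock(&d->lock);
	d->done = 1;
	pthread_cond_signal(&d->cond);
	pthread_mutex_unlock(&d->lock);
}

/*
 * Waiting side: returns 1 once the flag is set, 0 if 'us' microseconds
 * pass first. The flag is re-tested under the mutex after every wakeup;
 * ETIMEDOUT (or any other error) from the timed wait ends the loop.
 */
static int done_flag_wait_us(struct done_flag *d, unsigned long long us)
{
	struct timeval tv;
	struct timespec ts;
	int ret = 0, done;

	gettimeofday(&tv, NULL);	/* wall clock, like CLOCK_REALTIME */
	ts = deadline_after_us(&tv, us);

	pthread_mutex_lock(&d->lock);
	while (!d->done && ret == 0)
		ret = pthread_cond_timedwait(&d->cond, &d->lock, &ts);
	done = d->done;
	pthread_mutex_unlock(&d->lock);
	return done;
}
```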

* Recent changes (master)
@ 2016-03-25 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-25 12:00 UTC
  To: fio

The following changes since commit 546f51ce7a5667127f0ebb10fd4db8a4b817dea0:

  travis.yml: ensure we have libaio-dev and numa dev libs (2016-03-23 21:38:59 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to ae46d0f5618d1d2e63d0a733b79f136d88ccac90:

  t/memlock: sample utility to use X memory from Y threads (2016-03-24 17:07:05 -0600)

----------------------------------------------------------------
Jens Axboe (5):
      backend: ensure that we run verification for short time based jobs
      t/read-to-pipe-async: standalone test app
      t/read-to-pipe-async: needs time.h on some platforms
      Makefile: disable build of t/read-to-pipe-async by default
      t/memlock: sample utility to use X memory from Y threads

 Makefile               |  14 ++
 backend.c              |   9 +-
 t/memlock.c            |  58 +++++
 t/read-to-pipe-async.c | 663 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 743 insertions(+), 1 deletion(-)
 create mode 100644 t/memlock.c
 create mode 100644 t/read-to-pipe-async.c

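The backend.c hunk in this push rewrites the loop-exit test in do_io(). Because `!td->o.time_based || (td->o.time_based && td->o.verify != VERIFY_NONE)` reduces to `!td->o.time_based || td->o.verify != VERIFY_NONE`, the rule fits in a small predicate. In this sketch, should_break() and its bool parameters are hypothetical stand-ins for the td->o.time_based and td->o.verify fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Break once the byte budget is used up, except for time_based jobs,
 * which keep going unless a verify pass still has to run.
 */
static bool should_break(uint64_t issued, uint64_t total,
			 bool time_based, bool verify)
{
	if (issued < total)
		return false;
	return !time_based || verify;
}
```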
---

Diff of recent changes:

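The new t/read-to-pipe-async.c records latencies in a log-linear histogram through its plat_val_to_idx() and plat_idx_to_val() helpers. Values below 2^(PLAT_BITS + 1), which is 512 here, get one bucket each. Each larger power-of-two range [2^m, 2^(m+1)) is split into PLAT_VAL equal buckets. Mapping a bucket back to its midpoint is therefore off by at most 1/512 of the value, roughly 0.2%. Values beyond the last group are clamped into the final bucket.

The sketch below restates the same scheme with the same constants under hypothetical names. One difference: it computes the midpoint with integer shifts instead of the original's floating-point (k + 0.5) multiply. The result is the same, because error_bits is at least 1 whenever that branch runs. `__builtin_clz` is a GCC/Clang builtin, as in the original.

```c
#include <assert.h>

#define PLAT_BITS	8
#define PLAT_VAL	(1 << PLAT_BITS)
#define PLAT_GROUP_NR	19
#define PLAT_NR		(PLAT_GROUP_NR * PLAT_VAL)

/* Map a latency sample (usec) to its bucket index. */
static unsigned int lat_to_idx(unsigned int val)
{
	unsigned int msb, error_bits, idx;

	if (val == 0)
		return 0;
	msb = sizeof(val) * 8 - __builtin_clz(val) - 1;
	/* Small values are exact: one bucket per value. */
	if (msb <= PLAT_BITS)
		return val;
	/* Keep PLAT_BITS bits below the MSB; the rest are rounding error. */
	error_bits = msb - PLAT_BITS;
	idx = ((error_bits + 1) << PLAT_BITS) +
	      ((PLAT_VAL - 1) & (val >> error_bits));
	return idx < PLAT_NR - 1 ? idx : PLAT_NR - 1;
}

/* Inverse: the midpoint of the value range a bucket covers. */
static unsigned int idx_to_lat(unsigned int idx)
{
	unsigned int error_bits;

	assert(idx < PLAT_NR);
	if (idx < (PLAT_VAL << 1))
		return idx;
	error_bits = (idx >> PLAT_BITS) - 1;
	return (1U << (error_bits + PLAT_BITS)) +	/* group start */
	       (idx % PLAT_VAL) * (1U << error_bits) +	/* bucket start */
	       (1U << (error_bits - 1));		/* half a bucket */
}
```

For example, 1000 usec lands in bucket 756, which reports back as 1001 usec.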
diff --git a/Makefile b/Makefile
index 1d761c9..007ae40 100644
--- a/Makefile
+++ b/Makefile
@@ -231,6 +231,12 @@ T_DEDUPE_PROGS = t/fio-dedupe
 T_VS_OBJS = t/verify-state.o t/log.o crc/crc32c.o crc/crc32c-intel.o t/debug.o
 T_VS_PROGS = t/fio-verify-state
 
+T_PIPE_ASYNC_OBJS = t/read-to-pipe-async.o
+T_PIPE_ASYNC_PROGS = t/read-to-pipe-async
+
+T_MEMLOCK_OBJS = t/memlock.o
+T_MEMLOCK_PROGS = t/memlock
+
 T_OBJS = $(T_SMALLOC_OBJS)
 T_OBJS += $(T_IEEE_OBJS)
 T_OBJS += $(T_ZIPF_OBJS)
@@ -240,6 +246,8 @@ T_OBJS += $(T_GEN_RAND_OBJS)
 T_OBJS += $(T_BTRACE_FIO_OBJS)
 T_OBJS += $(T_DEDUPE_OBJS)
 T_OBJS += $(T_VS_OBJS)
+T_OBJS += $(T_PIPE_ASYNC_OBJS)
+T_OBJS += $(T_MEMLOCK_OBJS)
 
 ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
     T_DEDUPE_OBJS += os/windows/posix.o lib/hweight.o
@@ -372,6 +380,12 @@ cairo_text_helpers.o: cairo_text_helpers.c cairo_text_helpers.h
 printing.o: printing.c printing.h
 	$(QUIET_CC)$(CC) $(CFLAGS) $(GTK_CFLAGS) $(CPPFLAGS) -c $<
 
+t/read-to-pipe-async: $(T_PIPE_ASYNC_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_PIPE_ASYNC_OBJS) $(LIBS)
+
+t/memlock: $(T_MEMLOCK_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_MEMLOCK_OBJS) $(LIBS)
+
 t/stest: $(T_SMALLOC_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_SMALLOC_OBJS) $(LIBS)
 
diff --git a/backend.c b/backend.c
index 7f57c65..e093f75 100644
--- a/backend.c
+++ b/backend.c
@@ -881,7 +881,14 @@ static void do_io(struct thread_data *td, uint64_t *bytes_done)
 		if (flow_threshold_exceeded(td))
 			continue;
 
-		if (!td->o.time_based && bytes_issued >= total_bytes)
+		/*
+		 * Break if we exceeded the bytes. The exception is time
+		 * based runs, but we still need to break out of the loop
+		 * for those to run verification, if enabled.
+		 */
+		if (bytes_issued >= total_bytes &&
+		    (!td->o.time_based ||
+		     (td->o.time_based && td->o.verify != VERIFY_NONE)))
 			break;
 
 		io_u = get_io_u(td);
diff --git a/t/memlock.c b/t/memlock.c
new file mode 100644
index 0000000..d9d586d
--- /dev/null
+++ b/t/memlock.c
@@ -0,0 +1,58 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+static struct thread_data {
+	unsigned long mb;
+} td;
+
+static void *worker(void *data)
+{
+	struct thread_data *td = data;
+	unsigned long index;
+	size_t size;
+	char *buf;
+	int i, first = 1;
+
+	size = td->mb * 1024UL * 1024UL;
+	buf = malloc(size);
+
+	for (i = 0; i < 100000; i++) {
+		for (index = 0; index + 4096 < size; index += 4096)
+			memset(&buf[index+512], 0x89, 512);
+		if (first) {
+			printf("loop%d: did %lu MB\n", i+1, size/(1024UL*1024UL));
+			first = 0;
+		}
+	}
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long mb, threads;
+	pthread_t *pthreads;
+	int i;
+
+	if (argc < 3) {
+		printf("%s: <mb per thread> <threads>\n", argv[0]);
+		return 1;
+	}
+
+	mb = strtoul(argv[1], NULL, 10);
+	threads = strtoul(argv[2], NULL, 10);
+
+	pthreads = calloc(threads, sizeof(pthread_t));
+	td.mb = mb;
+
+	for (i = 0; i < threads; i++)
+		pthread_create(&pthreads[i], NULL, worker, &td);
+
+	for (i = 0; i < threads; i++) {
+		void *ret;
+
+		pthread_join(pthreads[i], &ret);
+	}
+	return 0;
+}
diff --git a/t/read-to-pipe-async.c b/t/read-to-pipe-async.c
new file mode 100644
index 0000000..30a7631
--- /dev/null
+++ b/t/read-to-pipe-async.c
@@ -0,0 +1,663 @@
+/*
+ * Read a file and write the contents to stdout. If a given read takes
+ * longer than 'max_us' time, then we schedule a new thread to handle
+ * the next read. This avoids the coordinated omission problem, where
+ * one request appears to take a long time, but in reality a lot of
+ * requests would have been slow, but we don't notice since new submissions
+ * are not being issued if just 1 is held up.
+ *
+ * One test case:
+ *
+ * $ time (./read-to-pipe-async -f randfile.gz | gzip -dc > outfile; sync)
+ *
+ * This will read randfile.gz and log the latencies of doing so, while
+ * piping the output to gzip to decompress it. Any latencies over max_us
+ * are logged when they happen, and latency buckets are displayed at the
+ * end of the run
+ *
+ * gcc -Wall -g -O2 -o read-to-pipe-async read-to-pipe-async.c -lpthread
+ *
+ * Copyright (C) 2016 Jens Axboe
+ *
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <inttypes.h>
+#include <string.h>
+#include <pthread.h>
+#include <errno.h>
+#include <assert.h>
+#include <time.h>
+
+#include "../flist.h"
+
+static int bs = 4096;
+static int max_us = 10000;
+static char *file;
+static int separate_writer = 1;
+
+#define PLAT_BITS	8
+#define PLAT_VAL	(1 << PLAT_BITS)
+#define PLAT_GROUP_NR	19
+#define PLAT_NR		(PLAT_GROUP_NR * PLAT_VAL)
+#define PLAT_LIST_MAX	20
+
+struct stats {
+	unsigned int plat[PLAT_NR];
+	unsigned int nr_samples;
+	unsigned int max;
+	unsigned int min;
+	unsigned int over;
+};
+
+static double plist[PLAT_LIST_MAX] = { 50.0, 75.0, 90.0, 95.0, 99.0, 99.5, 99.9, 99.99, 99.999, 99.9999, };
+
+struct thread_data {
+	int exit;
+	int done;
+	pthread_mutex_t lock;
+	pthread_cond_t cond;
+	pthread_mutex_t done_lock;
+	pthread_cond_t done_cond;
+	pthread_t thread;
+};
+
+struct writer_thread {
+	struct flist_head list;
+	struct flist_head done_list;
+	struct stats s;
+	struct thread_data thread;
+};
+
+struct reader_thread {
+	struct flist_head list;
+	struct flist_head done_list;
+	int started;
+	int busy;
+	int write_seq;
+	struct stats s;
+	struct thread_data thread;
+};
+
+struct work_item {
+	struct flist_head list;
+	void *buf;
+	size_t buf_size;
+	off_t off;
+	int fd;
+	int seq;
+	struct writer_thread *writer;
+	struct reader_thread *reader;
+	pthread_mutex_t lock;
+	pthread_cond_t cond;
+	pthread_t thread;
+};
+
+static struct reader_thread reader_thread;
+static struct writer_thread writer_thread;
+
+uint64_t utime_since(const struct timeval *s, const struct timeval *e)
+{
+	long sec, usec;
+	uint64_t ret;
+
+	sec = e->tv_sec - s->tv_sec;
+	usec = e->tv_usec - s->tv_usec;
+	if (sec > 0 && usec < 0) {
+		sec--;
+		usec += 1000000;
+	}
+
+	if (sec < 0 || (sec == 0 && usec < 0))
+		return 0;
+
+	ret = sec * 1000000ULL + usec;
+
+	return ret;
+}
+
+static struct work_item *find_seq(struct writer_thread *w, unsigned int seq)
+{
+	struct work_item *work;
+	struct flist_head *entry;
+
+	if (flist_empty(&w->list))
+		return NULL;
+
+	flist_for_each(entry, &w->list) {
+		work = flist_entry(entry, struct work_item, list);
+		if (work->seq == seq)
+			return work;
+	}
+
+	return NULL;
+}
+
+static unsigned int plat_val_to_idx(unsigned int val)
+{
+	unsigned int msb, error_bits, base, offset;
+
+	/* Find MSB starting from bit 0 */
+	if (val == 0)
+		msb = 0;
+	else
+		msb = sizeof(val)*8 - __builtin_clz(val) - 1;
+
+	/*
+	 * MSB <= (PLAT_BITS-1), cannot be rounded off. Use
+	 * all bits of the sample as index
+	 */
+	if (msb <= PLAT_BITS)
+		return val;
+
+	/* Compute the number of error bits to discard*/
+	error_bits = msb - PLAT_BITS;
+
+	/* Compute the number of buckets before the group */
+	base = (error_bits + 1) << PLAT_BITS;
+
+	/*
+	 * Discard the error bits and apply the mask to find the
+	 * index for the buckets in the group
+	 */
+	offset = (PLAT_VAL - 1) & (val >> error_bits);
+
+	/* Make sure the index does not exceed (array size - 1) */
+	return (base + offset) < (PLAT_NR - 1) ?
+		(base + offset) : (PLAT_NR - 1);
+}
+
+/*
+ * Convert the given index of the bucket array to the value
+ * represented by the bucket
+ */
+static unsigned int plat_idx_to_val(unsigned int idx)
+{
+	unsigned int error_bits, k, base;
+
+	assert(idx < PLAT_NR);
+
+	/* MSB <= (PLAT_BITS-1), cannot be rounded off. Use
+	 * all bits of the sample as index */
+	if (idx < (PLAT_VAL << 1))
+		return idx;
+
+	/* Find the group and compute the minimum value of that group */
+	error_bits = (idx >> PLAT_BITS) - 1;
+	base = 1 << (error_bits + PLAT_BITS);
+
+	/* Find its bucket number of the group */
+	k = idx % PLAT_VAL;
+
+	/* Return the mean of the range of the bucket */
+	return base + ((k + 0.5) * (1 << error_bits));
+}
+
+static void add_lat(struct stats *s, unsigned int us, const char *name)
+{
+	int lat_index = 0;
+
+	if (us > s->max)
+		s->max = us;
+	if (us < s->min)
+		s->min = us;
+
+	if (us > max_us) {
+		fprintf(stderr, "%s latency=%u usec\n", name, us);
+		s->over++;
+	}
+
+	lat_index = plat_val_to_idx(us);
+	__sync_fetch_and_add(&s->plat[lat_index], 1);
+	__sync_fetch_and_add(&s->nr_samples, 1);
+}
+
+static int write_work(struct work_item *work)
+{
+	struct timeval s, e;
+	ssize_t ret;
+
+	gettimeofday(&s, NULL);
+	ret = write(STDOUT_FILENO, work->buf, work->buf_size);
+	gettimeofday(&e, NULL);
+	assert(ret == work->buf_size);
+
+	add_lat(&work->writer->s, utime_since(&s, &e), "write");
+	return work->seq + 1;
+}
+
+static void *writer_fn(void *data)
+{
+	struct writer_thread *wt = data;
+	struct work_item *work;
+	unsigned int seq = 1;
+
+	work = NULL;
+	while (!wt->thread.exit || !flist_empty(&wt->list)) {
+		pthread_mutex_lock(&wt->thread.lock);
+
+		if (work) {
+			flist_add_tail(&work->list, &wt->done_list);
+			work = NULL;
+		}
+	
+		work = find_seq(wt, seq);
+		if (work)
+			flist_del_init(&work->list);
+		else
+			pthread_cond_wait(&wt->thread.cond, &wt->thread.lock);
+
+		pthread_mutex_unlock(&wt->thread.lock);
+
+		if (work)
+			seq = write_work(work);
+	}
+
+	wt->thread.done = 1;
+	pthread_cond_signal(&wt->thread.done_cond);
+	return NULL;
+}
+
+static void reader_work(struct work_item *work)
+{
+	struct timeval s, e;
+	ssize_t ret;
+	size_t left;
+	void *buf;
+	off_t off;
+
+	gettimeofday(&s, NULL);
+
+	left = work->buf_size;
+	buf = work->buf;
+	off = work->off;
+	while (left) {
+		ret = pread(work->fd, buf, left, off);
+		if (!ret) {
+			fprintf(stderr, "zero read\n");
+			break;
+		} else if (ret < 0) {
+			fprintf(stderr, "errno=%d\n", errno);
+			break;
+		}
+		left -= ret;
+		off += ret;
+		buf += ret;
+	}
+
+	gettimeofday(&e, NULL);
+
+	add_lat(&work->reader->s, utime_since(&s, &e), "read");
+
+	pthread_cond_signal(&work->cond);
+
+	if (separate_writer) {
+		pthread_mutex_lock(&work->writer->thread.lock);
+		flist_add_tail(&work->list, &work->writer->list);
+		pthread_mutex_unlock(&work->writer->thread.lock);
+		pthread_cond_signal(&work->writer->thread.cond);
+	} else {
+		struct reader_thread *rt = work->reader;
+		struct work_item *next = NULL;
+		struct flist_head *entry;
+
+		/*
+		 * Write current work if it matches in sequence.
+		 */
+		if (work->seq == rt->write_seq)
+			goto write_it;
+
+		pthread_mutex_lock(&rt->thread.lock);
+
+		flist_add_tail(&work->list, &rt->done_list);
+
+		/*
+		 * See if the next work item is here, if so, write it
+		 */
+		work = NULL;
+		flist_for_each(entry, &rt->done_list) {
+			next = flist_entry(entry, struct work_item, list);
+			if (next->seq == rt->write_seq) {
+				work = next;
+				flist_del(&work->list);
+				break;
+			}
+		}
+
+		pthread_mutex_unlock(&rt->thread.lock);
+	
+		if (work) {
+write_it:
+			write_work(work);
+			__sync_fetch_and_add(&rt->write_seq, 1);
+		}
+	}
+}
+
+static void *reader_one_off(void *data)
+{
+	reader_work(data);
+	return NULL;
+}
+
+static void *reader_fn(void *data)
+{
+	struct reader_thread *rt = data;
+	struct work_item *work;
+
+	while (!rt->thread.exit || !flist_empty(&rt->list)) {
+		work = NULL;
+		pthread_mutex_lock(&rt->thread.lock);
+		if (!flist_empty(&rt->list)) {
+			work = flist_first_entry(&rt->list, struct work_item, list);
+			flist_del_init(&work->list);
+		} else
+			pthread_cond_wait(&rt->thread.cond, &rt->thread.lock);
+		pthread_mutex_unlock(&rt->thread.lock);
+
+		if (work) {
+			rt->busy = 1;
+			reader_work(work);
+			rt->busy = 0;
+		}
+	}
+
+	rt->thread.done = 1;
+	pthread_cond_signal(&rt->thread.done_cond);
+	return NULL;
+}
+
+static void queue_work(struct reader_thread *rt, struct work_item *work)
+{
+	if (!rt->started) {
+		pthread_mutex_lock(&rt->thread.lock);
+		flist_add_tail(&work->list, &rt->list);
+		pthread_mutex_unlock(&rt->thread.lock);
+
+		rt->started = 1;
+		pthread_create(&rt->thread.thread, NULL, reader_fn, rt);
+	} else if (!rt->busy && !pthread_mutex_trylock(&rt->thread.lock)) {
+		flist_add_tail(&work->list, &rt->list);
+		pthread_mutex_unlock(&rt->thread.lock);
+
+		pthread_cond_signal(&rt->thread.cond);
+	} else {
+		int ret = pthread_create(&work->thread, NULL, reader_one_off, work);
+		if (ret)
+			fprintf(stderr, "pthread_create=%d\n", ret);
+		else
+			pthread_detach(work->thread);
+	}
+}
+
+static unsigned int calc_percentiles(unsigned int *io_u_plat, unsigned long nr,
+				     unsigned int **output)
+{
+	unsigned long sum = 0;
+	unsigned int len, i, j = 0;
+	unsigned int oval_len = 0;
+	unsigned int *ovals = NULL;
+	int is_last;
+
+	len = 0;
+	while (len < PLAT_LIST_MAX && plist[len] != 0.0)
+		len++;
+
+	if (!len)
+		return 0;
+
+	/*
+	 * Calculate bucket values, note down max and min values
+	 */
+	is_last = 0;
+	for (i = 0; i < PLAT_NR && !is_last; i++) {
+		sum += io_u_plat[i];
+		while (sum >= (plist[j] / 100.0 * nr)) {
+			assert(plist[j] <= 100.0);
+
+			if (j == oval_len) {
+				oval_len += 100;
+				ovals = realloc(ovals, oval_len * sizeof(unsigned int));
+			}
+
+			ovals[j] = plat_idx_to_val(i);
+			is_last = (j == len - 1);
+			if (is_last)
+				break;
+
+			j++;
+		}
+	}
+
+	*output = ovals;
+	return len;
+}
+
+static void show_latencies(struct stats *s, const char *msg)
+{
+	unsigned int *ovals = NULL;
+	unsigned int len, i;
+
+	len = calc_percentiles(s->plat, s->nr_samples, &ovals);
+	if (len) {
+		fprintf(stderr, "Latency percentiles (usec) (%s)\n", msg);
+		for (i = 0; i < len; i++)
+			fprintf(stderr, "\t%2.4fth: %u\n", plist[i], ovals[i]);
+	}
+
+	if (ovals)
+		free(ovals);
+
+	fprintf(stderr, "\tOver=%u, min=%u, max=%u\n", s->over, s->min, s->max);
+}
+
+static void init_thread(struct thread_data *thread)
+{
+	pthread_cond_init(&thread->cond, NULL);
+	pthread_cond_init(&thread->done_cond, NULL);
+	pthread_mutex_init(&thread->lock, NULL);
+	pthread_mutex_init(&thread->done_lock, NULL);
+	thread->exit = 0;
+}
+
+static void exit_thread(struct thread_data *thread,
+			void fn(struct writer_thread *),
+			struct writer_thread *wt)
+{
+	thread->exit = 1;
+	pthread_cond_signal(&thread->cond);
+
+	while (!thread->done) {
+		pthread_mutex_lock(&thread->done_lock);
+
+		if (fn) {
+			struct timespec t;
+
+			clock_gettime(CLOCK_REALTIME, &t);
+			t.tv_sec++;
+
+
+			pthread_cond_timedwait(&thread->done_cond, &thread->done_lock, &t);
+			fn(wt);
+		} else
+			pthread_cond_wait(&thread->done_cond, &thread->done_lock);
+
+		pthread_mutex_unlock(&thread->done_lock);
+	}
+}
+
+static int usage(char *argv[])
+{
+	fprintf(stderr, "%s: [-b blocksize] [-t max usec] [-w separate writer] -f file\n", argv[0]);
+	return 1;
+}
+
+static int parse_options(int argc, char *argv[])
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "f:b:t:w:")) != -1) {
+		switch (c) {
+		case 'f':
+			file = strdup(optarg);
+			break;
+		case 'b':
+			bs = atoi(optarg);
+			break;
+		case 't':
+			max_us = atoi(optarg);
+			break;
+		case 'w':
+			separate_writer = atoi(optarg);
+			if (!separate_writer)
+				fprintf(stderr, "inline writing is broken\n");
+			break;
+		case '?':
+		default:
+			return usage(argv);
+		}
+	}
+
+	if (!file)
+		return usage(argv);
+
+	return 0;
+}
+
+static void prune_done_entries(struct writer_thread *wt)
+{
+	FLIST_HEAD(list);
+
+	if (flist_empty(&wt->done_list))
+		return;
+
+	if (pthread_mutex_trylock(&wt->thread.lock))
+		return;
+
+	if (!flist_empty(&wt->done_list))
+		flist_splice_init(&wt->done_list, &list);
+	pthread_mutex_unlock(&wt->thread.lock);
+
+	while (!flist_empty(&list)) {
+		struct work_item *work;
+
+		work = flist_first_entry(&list, struct work_item, list);
+		flist_del(&work->list);
+
+		pthread_cond_destroy(&work->cond);
+		pthread_mutex_destroy(&work->lock);
+		free(work->buf);
+		free(work);
+	}
+}
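prune_done_entries() above follows a common pattern. It takes the lock only with pthread_mutex_trylock(), so a contended lock means "prune next time" rather than blocking the submitting loop. It then splices the whole done list onto a private head and frees the entries after unlocking. The following is a minimal self-contained sketch of that pattern. `struct item`, `struct done_queue`, `done_push()` and `prune_done()` are hypothetical stand-ins for fio's work_item and flist helpers, and a singly linked list replaces flist_splice_init(). The original's unlocked flist_empty() peek is left out, because an unsynchronized read of a shared pointer is formally a data race under the C11 memory model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical completed work item; stands in for fio's struct work_item. */
struct item {
	struct item *next;
};

/* Hypothetical done list shared between a producer and a pruner. */
struct done_queue {
	pthread_mutex_t lock;
	struct item *head;	/* completed items, protected by lock */
};

/* Producer side: publish a finished item under the lock. */
static void done_push(struct done_queue *q, struct item *it)
{
	pthread_mutex_lock(&q->lock);
	it->next = q->head;
	q->head = it;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Pruner side: trylock so that a contended lock means "try again next
 * time" instead of blocking. The whole list is detached in O(1) while
 * locked and freed after the lock is dropped. Returns the number freed.
 */
static int prune_done(struct done_queue *q)
{
	struct item *list;
	int n = 0;

	assert(q != NULL);
	if (pthread_mutex_trylock(&q->lock))
		return 0;
	list = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	while (list) {
		struct item *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}
```

Freeing outside the lock keeps the critical section to a couple of pointer moves, so the producer is never held up by free().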
+
+int main(int argc, char *argv[])
+{
+	struct timeval s, re, we;
+	struct reader_thread *rt;
+	struct writer_thread *wt;
+	unsigned long rate;
+	struct stat sb;
+	size_t bytes;
+	off_t off;
+	int fd, seq;
+
+	if (parse_options(argc, argv))
+		return 1;
+
+	fd = open(file, O_RDONLY);
+	if (fd < 0) {
+		perror("open");
+		return 2;
+	}
+
+	if (fstat(fd, &sb) < 0) {
+		perror("stat");
+		return 3;
+	}
+
+	wt = &writer_thread;
+	init_thread(&wt->thread);
+	INIT_FLIST_HEAD(&wt->list);
+	INIT_FLIST_HEAD(&wt->done_list);
+	wt->s.max = 0;
+	wt->s.min = -1U;
+	pthread_create(&wt->thread.thread, NULL, writer_fn, wt);
+
+	rt = &reader_thread;
+	init_thread(&rt->thread);
+	INIT_FLIST_HEAD(&rt->list);
+	INIT_FLIST_HEAD(&rt->done_list);
+	rt->s.max = 0;
+	rt->s.min = -1U;
+	rt->write_seq = 1;
+
+	off = 0;
+	seq = 0;
+	bytes = 0;
+
+	gettimeofday(&s, NULL);
+
+	while (sb.st_size) {
+		struct work_item *work;
+		size_t this_len;
+		struct timespec t;
+
+		prune_done_entries(wt);
+
+		this_len = sb.st_size;
+		if (this_len > bs)
+			this_len = bs;
+
+		work = calloc(1, sizeof(*work));
+		work->buf = malloc(this_len);
+		work->buf_size = this_len;
+		work->off = off;
+		work->fd = fd;
+		work->seq = ++seq;
+		work->writer = wt;
+		work->reader = rt;
+		pthread_cond_init(&work->cond, NULL);
+		pthread_mutex_init(&work->lock, NULL);
+
+		queue_work(rt, work);
+
+		clock_gettime(CLOCK_REALTIME, &t);
+		t.tv_nsec += max_us * 1000ULL;
+		if (t.tv_nsec >= 1000000000ULL) {
+			t.tv_nsec -= 1000000000ULL;
+			t.tv_sec++;
+		}
+
+		pthread_mutex_lock(&work->lock);
+		pthread_cond_timedwait(&work->cond, &work->lock, &t);
+		pthread_mutex_unlock(&work->lock);
+
+		off += this_len;
+		sb.st_size -= this_len;
+		bytes += this_len;
+	}
+
+	exit_thread(&rt->thread, NULL, NULL);
+	gettimeofday(&re, NULL);
+
+	exit_thread(&wt->thread, prune_done_entries, wt);
+	gettimeofday(&we, NULL);
+
+	show_latencies(&rt->s, "READERS");
+	show_latencies(&wt->s, "WRITERS");
+
+	bytes /= 1024;
+	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &re);
+	fprintf(stderr, "Read rate (KB/sec) : %lu\n", rate);
+	rate = (bytes * 1000UL * 1000UL) / utime_since(&s, &we);
+	fprintf(stderr, "Write rate (KB/sec): %lu\n", rate);
+
+	close(fd);
+	return 0;
+}
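main() turns the relative timeout of max_us microseconds into the absolute CLOCK_REALTIME deadline that pthread_cond_timedwait() expects. It carries tv_nsec overflow into tv_sec with a single subtraction. That single carry is enough only while max_us stays below 1,000,000 (a -t value under one second). A larger value can leave tv_nsec at or above one billion, and pthread_cond_timedwait() can then fail with EINVAL instead of waiting. A minimal sketch of a helper that handles any timeout follows. `timespec_add_us()` and `deadline_after_us()` are hypothetical names, not fio functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Add 'us' microseconds to *ts, keeping 0 <= tv_nsec < NSEC_PER_SEC as
 * pthread_cond_timedwait() requires. Whole seconds are added directly, so
 * at most one carry out of tv_nsec is ever needed, even for long timeouts.
 */
static void timespec_add_us(struct timespec *ts, uint64_t us)
{
	long nsec;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
	ts->tv_sec += (time_t) (us / 1000000);
	nsec = ts->tv_nsec + (long) (us % 1000000) * 1000;
	if (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
	ts->tv_nsec = nsec;
}

/* Absolute CLOCK_REALTIME deadline 'us' microseconds from now. */
static struct timespec deadline_after_us(uint64_t us)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	timespec_add_us(&ts, us);
	return ts;
}
```

A caller would hold the mutex and pass the result straight to pthread_cond_timedwait(&work->cond, &work->lock, &deadline), which returns ETIMEDOUT once the deadline has passed.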

* Recent changes (master)
@ 2016-03-24 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-24 12:00 UTC
  To: fio

The following changes since commit e91ba49acb8ff9f2e82046fec4f42b790e8644b1:

  gen-rand: fix dependency on strcasestr.o (2016-03-20 09:38:57 -0600)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 546f51ce7a5667127f0ebb10fd4db8a4b817dea0:

  travis.yml: ensure we have libaio-dev and numa dev libs (2016-03-23 21:38:59 -0600)

----------------------------------------------------------------
Jens Axboe (3):
      Add .travis.yml
      Makefile: add empty 'test' target
      travis.yml: ensure we have libaio-dev and numa dev libs

 .travis.yml | 7 +++++++
 Makefile    | 2 ++
 2 files changed, 9 insertions(+)
 create mode 100644 .travis.yml

---

Diff of recent changes:

diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..9bef750
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,7 @@
+language: c
+compiler:
+  - clang
+  - gcc
+before_install:
+  - sudo apt-get -qq update
+  - sudo apt-get install -y libaio-dev libnuma-dev
diff --git a/Makefile b/Makefile
index 749a508..1d761c9 100644
--- a/Makefile
+++ b/Makefile
@@ -424,6 +424,8 @@ doc: tools/plot/fio2gnuplot.1
 	@man -t tools/fio_generate_plots.1 | ps2pdf - fio_generate_plots.pdf
 	@man -t tools/plot/fio2gnuplot.1 | ps2pdf - fio2gnuplot.pdf
 
+test:
+
 install: $(PROGS) $(SCRIPTS) tools/plot/fio2gnuplot.1 FORCE
 	$(INSTALL) -m 755 -d $(DESTDIR)$(bindir)
 	$(INSTALL) $(PROGS) $(SCRIPTS) $(DESTDIR)$(bindir)

* Recent changes (master)
@ 2016-03-21 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-21 12:00 UTC
  To: fio

The following changes since commit 87a0ea3b58ef3d126148cb0279e015177ef9c62f:

  init: seed repeatable jobs differently (2016-03-18 09:07:16 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e91ba49acb8ff9f2e82046fec4f42b790e8644b1:

  gen-rand: fix dependency on strcasestr.o (2016-03-20 09:38:57 -0600)

----------------------------------------------------------------
Jens Axboe (1):
      gen-rand: fix dependency on strcasestr.o

 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index a2502dc..749a508 100644
--- a/Makefile
+++ b/Makefile
@@ -212,7 +212,8 @@ T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o t/log.o t/debug.o t/arch.o
 T_LFSR_TEST_PROGS = t/lfsr-test
 
 T_GEN_RAND_OBJS = t/gen-rand.o
-T_GEN_RAND_OBJS += t/log.o t/debug.o lib/rand.o lib/pattern.o lib/strntol.o
+T_GEN_RAND_OBJS += t/log.o t/debug.o lib/rand.o lib/pattern.o lib/strntol.o \
+			oslib/strcasestr.o
 T_GEN_RAND_PROGS = t/gen-rand
 
 ifeq ($(CONFIG_TARGET_OS), Linux)

* Recent changes (master)
@ 2016-03-19 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-19 12:00 UTC
  To: fio

The following changes since commit a7ef38c4f0bb5ff11f46968e7fa5fa9a54c16de0:

  Fio 2.8 (2016-03-15 09:10:37 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 87a0ea3b58ef3d126148cb0279e015177ef9c62f:

  init: seed repeatable jobs differently (2016-03-18 09:07:16 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      init: seed repeatable jobs differently

 init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
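The single commit in this pull makes init.c derive each job's random seed slots from its thread number. Before it, every slot was set to the running `seed` alone, so jobs that started from the same base seed got identical slots. The sketch below lifts the patched loop into a hypothetical `fill_seeds()` with a caller-supplied base seed and a hypothetical `NR_SEEDS` in place of FIO_RAND_NR_OFFS. It uses uint64_t where fio uses unsigned long, so its results match fio's only where unsigned long is 64 bits (as on LP64 Linux).

```c
#include <assert.h>
#include <stdint.h>

#define NR_SEEDS 4	/* hypothetical; fio sizes this with FIO_RAND_NR_OFFS */

/*
 * Same arithmetic as the patched loop: slot i gets the running seed times
 * the job's thread number, plus i. The running seed is then advanced by
 * the multiplicative constant 0x9e370001 used in the diff. Unsigned
 * overflow simply wraps modulo 2^64.
 */
static void fill_seeds(uint64_t seed, unsigned int thread_number,
		       uint64_t seeds[NR_SEEDS])
{
	int i;

	/* a zero multiplier would collapse every slot to its index */
	assert(thread_number != 0);
	for (i = 0; i < NR_SEEDS; i++) {
		seeds[i] = seed * thread_number + i;
		seed *= 0x9e370001UL;
	}
}
```

With the same base seed, thread 1 and thread 2 now get different values in every slot, so their random streams no longer coincide.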

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 9052add..cc33bf0 100644
--- a/init.c
+++ b/init.c
@@ -1079,7 +1079,7 @@ static int setup_random_seeds(struct thread_data *td)
 		seed *= 0x9e370001UL;
 
 	for (i = 0; i < FIO_RAND_NR_OFFS; i++) {
-		td->rand_seeds[i] = seed;
+		td->rand_seeds[i] = seed * td->thread_number + i;
 		seed *= 0x9e370001UL;
 	}
 

* Recent changes (master)
@ 2016-03-16 12:00 Jens Axboe
From: Jens Axboe @ 2016-03-16 12:00 UTC
  To: fio

The following changes since commit 5f2f35697b1559cc4fff47c7c94cb983e6f2a460:

  lib/rand: make __init_randX() static (2016-03-10 12:12:09 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a7ef38c4f0bb5ff11f46968e7fa5fa9a54c16de0:

  Fio 2.8 (2016-03-15 09:10:37 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Revert "options: move pattern_fmt_desc where we need it"
      verify: clear IO verify state all_io_list before writing
      Fio 2.8

 FIO-VERSION-GEN        |  2 +-
 options.c              | 19 ++++++++++---------
 os/windows/install.wxs |  2 +-
 verify.c               |  1 +
 4 files changed, 13 insertions(+), 11 deletions(-)
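Among the changes in this pull, the verify.c fix adds a memset() immediately after malloc() in get_all_io_list(). The buffer holding the verify state that is later written out then starts as all zeros. Without it, any field or padding byte the following code does not set would hold whatever the allocator returned. A minimal sketch of the same idea follows. `struct state_hdr` and `alloc_state()` are hypothetical, and calloc(1, sz) achieves the same in one call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fio's struct all_io_list header. */
struct state_hdr {
	uint64_t threads;
	uint8_t flag;		/* compilers typically pad after this field */
	uint64_t depth;
};

/*
 * Allocate 'sz' bytes (at least one header) and return them fully zeroed,
 * so any field or padding byte the caller never sets reads as 0 rather
 * than leftover heap contents.
 */
static struct state_hdr *alloc_state(size_t sz)
{
	struct state_hdr *rep;

	assert(sz >= sizeof(*rep));
	rep = malloc(sz);
	if (!rep)
		return NULL;
	memset(rep, 0, sz);	/* equivalent: rep = calloc(1, sz); */
	return rep;
}
```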

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index a4ff012..502d4fe 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.7
+DEF_VER=fio-2.8
 
 LF='
 '
diff --git a/options.c b/options.c
index 7075d84..062abb4 100644
--- a/options.c
+++ b/options.c
@@ -20,6 +20,14 @@
 
 char client_sockaddr_str[INET6_ADDRSTRLEN] = { 0 };
 
+struct pattern_fmt_desc fmt_desc[] = {
+	{
+		.fmt   = "%o",
+		.len   = FIELD_SIZE(struct io_u *, offset),
+		.paste = paste_blockoff
+	}
+};
+
 /*
  * Check if mmap/mmaphuge has a :/foo/bar/file at the end. If so, return that.
  */
@@ -1184,20 +1192,13 @@ static int str_dedupe_cb(void *data, unsigned long long *il)
 
 static int str_verify_pattern_cb(void *data, const char *input)
 {
-	struct pattern_fmt_desc fmt_desc[] = {
-		{
-			.fmt   = "%o",
-			.len   = FIELD_SIZE(struct io_u *, offset),
-			.paste = paste_blockoff
-		}
-	};
 	struct thread_data *td = data;
 	int ret;
 
 	td->o.verify_fmt_sz = ARRAY_SIZE(td->o.verify_fmt);
 	ret = parse_and_fill_pattern(input, strlen(input), td->o.verify_pattern,
-			MAX_PATTERN_SIZE, fmt_desc, sizeof(fmt_desc),
-			td->o.verify_fmt, &td->o.verify_fmt_sz);
+				     MAX_PATTERN_SIZE, fmt_desc, sizeof(fmt_desc),
+				     td->o.verify_fmt, &td->o.verify_fmt_sz);
 	if (ret < 0)
 		return 1;
 
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index ed9f98b..366547d 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.7">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.8">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/verify.c b/verify.c
index 5d491d7..0f43a3e 100644
--- a/verify.c
+++ b/verify.c
@@ -1385,6 +1385,7 @@ struct all_io_list *get_all_io_list(int save_mask, size_t *sz)
 	*sz += nr * sizeof(struct thread_io_list);
 	*sz += depth * sizeof(uint64_t);
 	rep = malloc(*sz);
+	memset(rep, 0, *sz);
 
 	rep->threads = cpu_to_le64((uint64_t) nr);
 

* Recent changes (master)
@ 2016-03-11 13:00 Jens Axboe
From: Jens Axboe @ 2016-03-11 13:00 UTC
  To: fio

The following changes since commit 9d0ad2a56d63e7f59473f31708358a4b65d2a5e3:

  t/gen-rand: remove compile warning on 32-bit (2016-03-09 14:15:11 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 5f2f35697b1559cc4fff47c7c94cb983e6f2a460:

  lib/rand: make __init_randX() static (2016-03-10 12:12:09 -0700)

----------------------------------------------------------------
Jens Axboe (4):
      Fix compile of test programs on archs that use arch_flags at runtime
      t/gen-rand: use 32-bit random generator
      Use 32-bit rand for parts that use rand_between()
      lib/rand: make __init_randX() static

 Makefile      |  6 +++---
 init.c        | 22 +++++++++++++---------
 io_u.c        |  8 ++++----
 lib/rand.h    | 15 +++++++--------
 t/arch.c      |  5 +++++
 t/dedupe.c    |  1 +
 t/gen-rand.c  |  4 ++--
 t/lfsr-test.c |  2 ++
 t/stest.c     |  2 ++
 9 files changed, 39 insertions(+), 26 deletions(-)
 create mode 100644 t/arch.c

---

Diff of recent changes:

diff --git a/Makefile b/Makefile
index 6b4c9db..a2502dc 100644
--- a/Makefile
+++ b/Makefile
@@ -191,7 +191,7 @@ endif
 -include $(OBJS:.o=.d)
 
 T_SMALLOC_OBJS = t/stest.o
-T_SMALLOC_OBJS += gettime.o mutex.o smalloc.o t/log.o t/debug.o
+T_SMALLOC_OBJS += gettime.o mutex.o smalloc.o t/log.o t/debug.o t/arch.o
 T_SMALLOC_PROGS = t/stest
 
 T_IEEE_OBJS = t/ieee754.o
@@ -208,7 +208,7 @@ T_AXMAP_OBJS += lib/lfsr.o lib/axmap.o
 T_AXMAP_PROGS = t/axmap
 
 T_LFSR_TEST_OBJS = t/lfsr-test.o
-T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o t/log.o t/debug.o
+T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o t/log.o t/debug.o t/arch.o
 T_LFSR_TEST_PROGS = t/lfsr-test
 
 T_GEN_RAND_OBJS = t/gen-rand.o
@@ -223,7 +223,7 @@ endif
 
 T_DEDUPE_OBJS = t/dedupe.o
 T_DEDUPE_OBJS += lib/rbtree.o t/log.o mutex.o smalloc.o gettime.o crc/md5.o \
-		lib/memalign.o lib/bloom.o t/debug.o crc/xxhash.o \
+		lib/memalign.o lib/bloom.o t/debug.o crc/xxhash.o t/arch.o \
 		crc/murmur3.o crc/crc32c.o crc/crc32c-intel.o crc/fnv.o
 T_DEDUPE_PROGS = t/fio-dedupe
 
diff --git a/init.c b/init.c
index 149029a..9052add 100644
--- a/init.c
+++ b/init.c
@@ -919,11 +919,13 @@ static int exists_and_not_file(const char *filename)
 	return 1;
 }
 
-static void td_fill_rand_seeds_internal(struct thread_data *td, int use64)
+static void td_fill_rand_seeds_internal(struct thread_data *td, bool use64)
 {
+	int i;
+
 	init_rand_seed(&td->bsrange_state, td->rand_seeds[FIO_RAND_BS_OFF], use64);
 	init_rand_seed(&td->verify_state, td->rand_seeds[FIO_RAND_VER_OFF], use64);
-	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], use64);
+	init_rand_seed(&td->rwmix_state, td->rand_seeds[FIO_RAND_MIX_OFF], false);
 
 	if (td->o.file_service_type == FIO_FSERVICE_RANDOM)
 		init_rand_seed(&td->next_file_state, td->rand_seeds[FIO_RAND_FILE_OFF], use64);
@@ -932,6 +934,8 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, int use64)
 	init_rand_seed(&td->trim_state, td->rand_seeds[FIO_RAND_TRIM_OFF], use64);
 	init_rand_seed(&td->delay_state, td->rand_seeds[FIO_RAND_START_DELAY], use64);
 	init_rand_seed(&td->poisson_state, td->rand_seeds[FIO_RAND_POISSON_OFF], 0);
+	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], false);
+	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], false);
 
 	if (!td_random(td))
 		return;
@@ -940,14 +944,17 @@ static void td_fill_rand_seeds_internal(struct thread_data *td, int use64)
 		td->rand_seeds[FIO_RAND_BLOCK_OFF] = FIO_RANDSEED * td->thread_number;
 
 	init_rand_seed(&td->random_state, td->rand_seeds[FIO_RAND_BLOCK_OFF], use64);
-	init_rand_seed(&td->seq_rand_state[DDIR_READ], td->rand_seeds[FIO_RAND_SEQ_RAND_READ_OFF], use64);
-	init_rand_seed(&td->seq_rand_state[DDIR_WRITE], td->rand_seeds[FIO_RAND_SEQ_RAND_WRITE_OFF], use64);
-	init_rand_seed(&td->seq_rand_state[DDIR_TRIM], td->rand_seeds[FIO_RAND_SEQ_RAND_TRIM_OFF], use64);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		struct frand_state *s = &td->seq_rand_state[i];
+
+		init_rand_seed(s, td->rand_seeds[FIO_RAND_SEQ_RAND_READ_OFF], false);
+	}
 }
 
 void td_fill_rand_seeds(struct thread_data *td)
 {
-	int use64;
+	bool use64;
 
 	if (td->o.allrand_repeatable) {
 		unsigned int i;
@@ -966,9 +973,6 @@ void td_fill_rand_seeds(struct thread_data *td)
 
 	init_rand_seed(&td->buf_state, td->rand_seeds[FIO_RAND_BUF_OFF], use64);
 	frand_copy(&td->buf_state_prev, &td->buf_state);
-
-	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], use64);
-	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], use64);
 }
 
 /*
diff --git a/io_u.c b/io_u.c
index 3299f29..ea08c92 100644
--- a/io_u.c
+++ b/io_u.c
@@ -177,7 +177,7 @@ bail:
 	/*
 	 * Generate a value, v, between 1 and 100, both inclusive
 	 */
-	v = rand_between(&td->zone_state, 1, 100);
+	v = rand32_between(&td->zone_state, 1, 100);
 
 	zsi = &td->zone_state_index[ddir][v - 1];
 	stotal = zsi->size_perc_prev;
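The zone code above draws v in 1..100 and looks up `zone_state_index[ddir][v - 1]`. That reads as a 100-entry table mapping each access percentile to a zone, with `size_perc_prev` giving where that zone starts on the device. The sketch below builds such a table from a zoned split like the `random_distribution=zoned:50/5:30/15:20/80` line in the JESD219 example further down. Only part of fio's table layout is visible in this hunk, so `struct zone_split`, `build_zone_table()` and `zone_start()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical: one "access%/size%" entry of random_distribution=zoned. */
struct zone_split {
	unsigned int access_perc;	/* % of I/Os that land in this zone */
	unsigned int size_perc;		/* % of the device this zone covers */
};

/*
 * Fill table[0..99] so that table[v - 1] is the zone for a uniform draw
 * v in 1..100; zone i owns access_perc consecutive slots. Returns 0 on
 * success, -1 if the access percentages do not sum to exactly 100.
 */
static int build_zone_table(const struct zone_split *z, int nr, int table[100])
{
	int slot = 0, i, j;

	for (i = 0; i < nr; i++) {
		for (j = 0; j < (int) z[i].access_perc; j++) {
			if (slot >= 100)
				return -1;
			table[slot++] = i;
		}
	}
	return slot == 100 ? 0 : -1;
}

/* Start offset of zone 'zone': the size share of all earlier zones. */
static unsigned long long zone_start(const struct zone_split *z, int zone,
				     unsigned long long dev_size)
{
	unsigned int perc = 0;
	int i;

	assert(zone >= 0);
	for (i = 0; i < zone; i++)
		perc += z[i].size_perc;
	return dev_size * perc / 100;
}
```

A precomputed table makes the per-I/O lookup a single array index, which matters on a path that runs for every random I/O.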
@@ -279,7 +279,7 @@ static bool should_do_random(struct thread_data *td, enum fio_ddir ddir)
 	if (td->o.perc_rand[ddir] == 100)
 		return true;
 
-	v = rand_between(&td->seq_rand_state[ddir], 1, 100);
+	v = rand32_between(&td->seq_rand_state[ddir], 1, 100);
 
 	return v <= td->o.perc_rand[ddir];
 }
@@ -601,7 +601,7 @@ static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 {
 	unsigned int v;
 
-	v = rand_between(&td->rwmix_state, 1, 100);
+	v = rand32_between(&td->rwmix_state, 1, 100);
 
 	if (v <= td->o.rwmix[DDIR_READ])
 		return DDIR_READ;
@@ -1964,7 +1964,7 @@ static struct frand_state *get_buf_state(struct thread_data *td)
 		return &td->buf_state;
 	}
 
-	v = rand_between(&td->dedupe_state, 1, 100);
+	v = rand32_between(&td->dedupe_state, 1, 100);
 
 	if (v <= td->o.dedupe_percentage)
 		return &td->buf_state_prev;
diff --git a/lib/rand.h b/lib/rand.h
index 24fac23..bff4a35 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -2,6 +2,7 @@
 #define FIO_RAND_H
 
 #include <inttypes.h>
+#include <assert.h>
 #include "types.h"
 #include "../arch/arch.h"
 
@@ -24,10 +25,6 @@ struct frand_state {
 	};
 };
 
-struct frand64_state {
-	uint64_t s1, s2, s3, s4, s5;
-};
-
 static inline uint64_t rand_max(struct frand_state *state)
 {
 	if (state->use64)
@@ -121,12 +118,14 @@ static inline double __rand_0_1(struct frand_state *state)
 /*
  * Generate a random value between 'start' and 'end', both inclusive
  */
-static inline int rand_between(struct frand_state *state, int start, int end)
+static inline int rand32_between(struct frand_state *state, int start, int end)
 {
-	uint64_t r;
+	uint32_t r;
+
+	assert(!state->use64);
 
-	r = __rand(state);
-	return start + (int) ((double)end * (r / (rand_max(state) + 1.0)));
+	r = __rand32(&state->state32);
+	return start + (int) ((double)end * (r / (FRAND32_MAX + 1.0)));
 }
 
 extern void init_rand(struct frand_state *, bool);
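rand32_between() maps a 32-bit draw onto a range by scaling r / (FRAND32_MAX + 1.0), a value in [0, 1), by `end`. Because it scales by `end` rather than by the width `end - start + 1`, the result covers start..start + end - 1. That equals start..end only when start is 1. Every caller touched in this pull asks for 1..100, so those callers are unaffected. t/gen-rand, however, accepts an arbitrary start, and for start > 1 it could index past the end of its bucket array. A sketch of the general mapping follows. `map_between()` is a hypothetical name, and it assumes the generator's maximum is UINT32_MAX (FRAND32_MAX's definition is not shown in this diff).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a uniform 32-bit value r onto [start, end], both inclusive, by
 * scaling r / 2^32 (which lies in [0, 1)) by the width of the range.
 * Assumes r is uniform over 0..UINT32_MAX.
 */
static int map_between(uint32_t r, int start, int end)
{
	double width;

	assert(start <= end);
	width = (double) end - start + 1.0;
	return start + (int) (width * (r / ((double) UINT32_MAX + 1.0)));
}
```

Scaling by the width keeps both endpoints reachable for any start, at the cost of one extra subtraction.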
diff --git a/t/arch.c b/t/arch.c
new file mode 100644
index 0000000..befb7c7
--- /dev/null
+++ b/t/arch.c
@@ -0,0 +1,5 @@
+#include "../arch/arch.h"
+
+unsigned long arch_flags = 0;
+int tsc_reliable;
+int arch_random;
diff --git a/t/dedupe.c b/t/dedupe.c
index 3a66820..7856da1 100644
--- a/t/dedupe.c
+++ b/t/dedupe.c
@@ -537,6 +537,7 @@ int main(int argc, char *argv[])
 	uint64_t nextents = 0, nchunks = 0;
 	int c, ret;
 
+	arch_init(argv);
 	debug_init();
 
 	while ((c = getopt(argc, argv, "b:t:d:o:c:p:B:")) != -1) {
diff --git a/t/gen-rand.c b/t/gen-rand.c
index a03646a..6c31f92 100644
--- a/t/gen-rand.c
+++ b/t/gen-rand.c
@@ -37,10 +37,10 @@ int main(int argc, char *argv[])
 
 	nvalues = strtoul(argv[3], NULL, 10);
 
-	init_rand(&s, true);
+	init_rand(&s, false);
 
 	for (i = 0; i < nvalues; i++) {
-		int v = rand_between(&s, start, end);
+		int v = rand32_between(&s, start, end);
 
 		buckets[v - start]++;
 	}
diff --git a/t/lfsr-test.c b/t/lfsr-test.c
index 4352b89..bad5097 100644
--- a/t/lfsr-test.c
+++ b/t/lfsr-test.c
@@ -38,6 +38,8 @@ int main(int argc, char *argv[])
 	void *v = NULL, *v_start;
 	double total, mean;
 
+	arch_init(argv);
+
 	/* Read arguments */
 	switch (argc) {
 		case 5: if (strncmp(argv[4], "verify", 7) == 0)
diff --git a/t/stest.c b/t/stest.c
index fb51989..0e0d8b0 100644
--- a/t/stest.c
+++ b/t/stest.c
@@ -4,6 +4,7 @@
 
 #include "../smalloc.h"
 #include "../flist.h"
+#include "../arch/arch.h"
 #include "debug.h"
 
 #define MAGIC1	0xa9b1c8d2
@@ -69,6 +70,7 @@ static int do_specific_alloc(unsigned long size)
 
 int main(int argc, char *argv[])
 {
+	arch_init(argv);
 	sinit();
 	debug_init();
 

* Recent changes (master)
@ 2016-03-10 13:00 Jens Axboe
From: Jens Axboe @ 2016-03-10 13:00 UTC
  To: fio

The following changes since commit 4f126cad8fc5ed71c7ac8155726e38395c689905:

  .gitignore: ignore vim undo files (2016-03-08 11:19:17 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 9d0ad2a56d63e7f59473f31708358a4b65d2a5e3:

  t/gen-rand: remove compile warning on 32-bit (2016-03-09 14:15:11 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Fio 2.7
      t/gen-rand: remove compile warning on 32-bit

 FIO-VERSION-GEN        | 2 +-
 os/windows/install.wxs | 2 +-
 t/gen-rand.c           | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

---

Diff of recent changes:

diff --git a/FIO-VERSION-GEN b/FIO-VERSION-GEN
index 597e615..a4ff012 100755
--- a/FIO-VERSION-GEN
+++ b/FIO-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=FIO-VERSION-FILE
-DEF_VER=fio-2.6
+DEF_VER=fio-2.7
 
 LF='
 '
diff --git a/os/windows/install.wxs b/os/windows/install.wxs
index 011c1eb..ed9f98b 100755
--- a/os/windows/install.wxs
+++ b/os/windows/install.wxs
@@ -10,7 +10,7 @@
 	<Product Id="*"
 	  Codepage="1252" Language="1033"
 	  Manufacturer="fio" Name="fio"
-	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.6">
+	  UpgradeCode="2338A332-5511-43CF-B9BD-5C60496CCFCC" Version="2.7">
 		<Package
 		  Description="Flexible IO Tester"
 		  InstallerVersion="301" Keywords="Installer,MSI,Database"
diff --git a/t/gen-rand.c b/t/gen-rand.c
index c2a31bc..a03646a 100644
--- a/t/gen-rand.c
+++ b/t/gen-rand.c
@@ -54,10 +54,10 @@ int main(int argc, char *argv[])
 	pass = fail = 0;
 	for (i = 0; i < index; i++) {
 		if (buckets[i] < vmin || buckets[i] > vmax) {
-			printf("FAIL bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", i + 1, buckets[i], vmin, mean, vmax);
+			printf("FAIL bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", (unsigned long) i + 1, buckets[i], vmin, mean, vmax);
 			fail++;
 		} else {
-			printf("PASS bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", i + 1, buckets[i], vmin, mean, vmax);
+			printf("PASS bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", (unsigned long) i + 1, buckets[i], vmin, mean, vmax);
 			pass++;
 		}
 	}
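The gen-rand fix casts the uint64_t loop counter to unsigned long so that it matches %lu. This silences the warning on 32-bit builds, where uint64_t is not unsigned long. An alternative sketch avoids the cast by using the PRIu64 macro from <inttypes.h>, which expands to the correct conversion for uint64_t on each platform. `format_bucket()` is a hypothetical helper, not part of gen-rand.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format one bucket line into buf. PRIu64 supplies the right length
 * modifier for uint64_t ("lu" on LP64, "llu" on 32-bit targets), so the
 * index needs no cast. Returns snprintf()'s result.
 */
static int format_bucket(char *buf, size_t len, uint64_t idx,
			 unsigned long val, int pass)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s bucket%4" PRIu64 ": val=%8lu",
			pass ? "PASS" : "FAIL", idx, val);
}
```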

* Recent changes (master)
@ 2016-03-09 13:00 Jens Axboe
From: Jens Axboe @ 2016-03-09 13:00 UTC
  To: fio

The following changes since commit de4096e8682a064ed9125af7ac30a3fe4021167b:

  rand: use bools (2016-03-07 15:38:44 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 4f126cad8fc5ed71c7ac8155726e38395c689905:

  .gitignore: ignore vim undo files (2016-03-08 11:19:17 -0700)

----------------------------------------------------------------
Jens Axboe (3):
      Update documentation on log file formats
      Add t/gen-rand to test random generator
      .gitignore: ignore vim undo files

 .gitignore   |  1 +
 HOWTO        | 43 ++++++++++++++++++++++++++++++++++----
 Makefile     |  9 ++++++++
 fio.1        | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 t/gen-rand.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 174 insertions(+), 7 deletions(-)
 create mode 100644 t/gen-rand.c

---

Diff of recent changes:

diff --git a/.gitignore b/.gitignore
index c9d90fb..bd9b032 100644
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,4 @@
 /fio
 y.tab.*
 lex.yy.c
+*.un~
diff --git a/HOWTO b/HOWTO
index e2a4b15..0d3d2fb 100644
--- a/HOWTO
+++ b/HOWTO
@@ -11,6 +11,8 @@ Table of contents
 8. Trace file format
 9. CPU idleness profiling
 10. Verification and triggers
+11. Log File Formats
+
 
 1.0 Overview and history
 ------------------------
@@ -1564,7 +1566,7 @@ write_bw_log=str If given, write a bandwidth log of the jobs in this job
 		filename. For this option, the suffix is _bw.x.log, where
 		x is the index of the job (1..N, where N is the number of
 		jobs). If 'per_job_logs' is false, then the filename will not
-		include the job index.
+		include the job index. See 'Log File Formats'.
 
 write_lat_log=str Same as write_bw_log, except that this option stores io
 		submission, completion, and total latencies instead. If no
@@ -1578,8 +1580,8 @@ write_lat_log=str Same as write_bw_log, except that this option stores io
 		and foo_lat.x.log, where x is the index of the job (1..N,
 		where N is the number of jobs). This helps fio_generate_plot
 		fine the logs automatically. If 'per_job_logs' is false, then
-		the filename will not include the job index.
-
+		the filename will not include the job index. See 'Log File
+		Formats'.
 
 write_iops_log=str Same as write_bw_log, but writes IOPS. If no filename is
 		given with this option, the default filename of
@@ -1587,7 +1589,7 @@ write_iops_log=str Same as write_bw_log, but writes IOPS. If no filename is
 		(1..N, where N is the number of jobs). Even if the filename
 		is given, fio will still append the type of log. If
 		'per_job_logs' is false, then the filename will not include
-		the job index.
+		the job index. See 'Log File Formats'.
 
 log_avg_msec=int By default, fio will log an entry in the iops, latency,
 		or bw log for every IO that completes. When writing to the
@@ -2253,3 +2255,36 @@ the verify_state_load option. If that is set, fio will load the previously
 stored state. For a local fio run this is done by loading the files directly,
 and on a client/server run, the server backend will ask the client to send
 the files over and load them from there.
+
+
+11.0 Log File Formats
+---------------------
+
+Fio supports a variety of log file formats, for logging latencies, bandwidth,
+and IOPS. The logs share a common format, which looks like this:
+
+time (msec), value, data direction, offset
+
+Time for the log entry is always in milliseconds. The value logged depends
+on the type of log and will be one of the following:
+
+	Latency log		Value is latency in usecs
+	Bandwidth log		Value is in KB/sec
+	IOPS log		Value is IOPS
+
+Data direction is one of the following:
+
+	0			IO is a READ
+	1			IO is a WRITE
+	2			IO is a TRIM
+
+The offset is the offset, in bytes, from the start of the file, for that
+particular IO. The logging of the offset can be toggled with 'log_offset'.
+
+If windowed logging is enabled through 'log_avg_msec', then fio doesn't log
+individual IOs. Instead, it logs the average values over the specified
+period of time. Since 'data direction' and 'offset' are per-IO values,
+they aren't applicable if windowed logging is enabled. If windowed logging
+is enabled and 'log_max_value' is set, then fio logs maximum values in
+that window instead of averages.
+
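Each entry described above is a comma-separated line of `time (msec), value, data direction, offset`, with the offset present only when 'log_offset' is enabled. The following is a minimal parser sketch for that layout exactly as documented here. `struct log_entry` and `parse_log_line()` are hypothetical. Logs written by other fio versions may carry additional columns, which this sketch would misread, so check a sample log first.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one log entry. */
struct log_entry {
	unsigned long long msec;
	unsigned long long value;	/* usec, KB/sec or IOPS, by log type */
	unsigned int ddir;		/* 0 = read, 1 = write, 2 = trim */
	unsigned long long offset;	/* bytes; valid only if has_offset */
	int has_offset;
};

/*
 * Parse "time, value, data direction[, offset]". Returns 0 on success,
 * -1 if fewer than three fields are present or the direction is unknown.
 */
static int parse_log_line(const char *line, struct log_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	n = sscanf(line, "%llu , %llu , %u , %llu",
		   &e->msec, &e->value, &e->ddir, &e->offset);
	if (n < 3 || e->ddir > 2)
		return -1;
	e->has_offset = (n == 4);
	if (!e->has_offset)
		e->offset = 0;
	return 0;
}
```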
diff --git a/Makefile b/Makefile
index 684b565..6b4c9db 100644
--- a/Makefile
+++ b/Makefile
@@ -211,6 +211,10 @@ T_LFSR_TEST_OBJS = t/lfsr-test.o
 T_LFSR_TEST_OBJS += lib/lfsr.o gettime.o t/log.o t/debug.o
 T_LFSR_TEST_PROGS = t/lfsr-test
 
+T_GEN_RAND_OBJS = t/gen-rand.o
+T_GEN_RAND_OBJS += t/log.o t/debug.o lib/rand.o lib/pattern.o lib/strntol.o
+T_GEN_RAND_PROGS = t/gen-rand
+
 ifeq ($(CONFIG_TARGET_OS), Linux)
 T_BTRACE_FIO_OBJS = t/btrace2fio.o
 T_BTRACE_FIO_OBJS += fifo.o lib/flist_sort.o t/log.o oslib/linux-dev-lookup.o
@@ -231,6 +235,7 @@ T_OBJS += $(T_IEEE_OBJS)
 T_OBJS += $(T_ZIPF_OBJS)
 T_OBJS += $(T_AXMAP_OBJS)
 T_OBJS += $(T_LFSR_TEST_OBJS)
+T_OBJS += $(T_GEN_RAND_OBJS)
 T_OBJS += $(T_BTRACE_FIO_OBJS)
 T_OBJS += $(T_DEDUPE_OBJS)
 T_OBJS += $(T_VS_OBJS)
@@ -246,6 +251,7 @@ T_TEST_PROGS += $(T_IEEE_PROGS)
 T_PROGS += $(T_ZIPF_PROGS)
 T_TEST_PROGS += $(T_AXMAP_PROGS)
 T_TEST_PROGS += $(T_LFSR_TEST_PROGS)
+T_TEST_PROGS += $(T_GEN_RAND_PROGS)
 T_PROGS += $(T_BTRACE_FIO_PROGS)
 T_PROGS += $(T_DEDUPE_PROGS)
 T_PROGS += $(T_VS_PROGS)
@@ -386,6 +392,9 @@ t/axmap: $(T_AXMAP_OBJS)
 t/lfsr-test: $(T_LFSR_TEST_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_LFSR_TEST_OBJS) $(LIBS)
 
+t/gen-rand: $(T_GEN_RAND_OBJS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_GEN_RAND_OBJS) $(LIBS)
+
 ifeq ($(CONFIG_TARGET_OS), Linux)
 t/fio-btrace2fio: $(T_BTRACE_FIO_OBJS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(T_BTRACE_FIO_OBJS) $(LIBS)
diff --git a/fio.1 b/fio.1
index 87404c1..df140cf 100644
--- a/fio.1
+++ b/fio.1
@@ -1441,7 +1441,8 @@ fio_generate_plots script uses gnuplot to turn these text files into nice
 graphs. See \fBwrite_lat_log\fR for behaviour of given filename. For this
 option, the postfix is _bw.x.log, where x is the index of the job (1..N,
 where N is the number of jobs). If \fBper_job_logs\fR is false, then the
-filename will not include the job index.
+filename will not include the job index. See the \fBLOG FILE FORMATS\fR
+section.
 .TP
 .BI write_lat_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes I/O completion latencies.  If no
@@ -1449,14 +1450,15 @@ filename is given with this option, the default filename of
 "jobname_type.x.log" is used, where x is the index of the job (1..N, where
 N is the number of jobs). Even if the filename is given, fio will still
 append the type of log. If \fBper_job_logs\fR is false, then the filename will
-not include the job index.
+not include the job index. See the \fBLOG FILE FORMATS\fR section.
 .TP
 .BI write_iops_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes IOPS. If no filename is given with this
 option, the default filename of "jobname_type.x.log" is used, where x is the
 index of the job (1..N, where N is the number of jobs). Even if the filename
 is given, fio will still append the type of log. If \fBper_job_logs\fR is false,
-then the filename will not include the job index.
+then the filename will not include the job index. See the \fBLOG FILE FORMATS\fR
+section.
 .TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
@@ -2219,6 +2221,58 @@ the files over and load them from there.
 
 .RE
 
+.SH LOG FILE FORMATS
+
+Fio supports a variety of log file formats, for logging latencies, bandwidth,
+and IOPS. The logs share a common format, which looks like this:
+
+.B time (msec), value, data direction, offset
+
+Time for the log entry is always in milliseconds. The value logged depends
+on the type of log and will be one of the following:
+
+.P
+.PD 0
+.TP
+.B Latency log
+Value is latency in usecs
+.TP
+.B Bandwidth log
+Value is in KB/sec
+.TP
+.B IOPS log
+Value is in IOPS
+.PD
+.P
+
+Data direction is one of the following:
+
+.P
+.PD 0
+.TP
+.B 0
+IO is a READ
+.TP
+.B 1
+IO is a WRITE
+.TP
+.B 2
+IO is a TRIM
+.PD
+.P
+
+The \fIoffset\fR is the offset, in bytes, from the start of the file, for that
+particular IO. The logging of the offset can be toggled with \fBlog_offset\fR.
+
+If windowed logging is enabled through \fBlog_avg_msec\fR, then fio doesn't log
+individual IOs. Instead, it logs the average values over the specified
+period of time. Since \fIdata direction\fR and \fIoffset\fR are per-IO values,
+they aren't applicable if windowed logging is enabled. If windowed logging
+is enabled and \fBlog_max_value\fR is set, then fio logs maximum values in
+that window instead of averages.
+
+.RE
+
 .SH CLIENT / SERVER
 Normally you would run fio as a stand-alone application on the machine
 where the IO workload should be generated. However, it is also possible to
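With log_avg_msec set, the log format text in this pull says fio writes one value per window instead of one entry per I/O. That value is the average by default, or the maximum when log_max_value is set. A minimal sketch of that reduction over in-memory samples follows. `struct sample` and `window_reduce()` are hypothetical. The window boundaries (window k covering times from k*w up to, but not including, (k+1)*w msec) are an assumption about how samples are grouped, not taken from fio's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-I/O sample: completion time and logged value. */
struct sample {
	unsigned long long msec;
	unsigned long long value;
};

/*
 * Reduce time-ordered samples to one value per window of 'win_msec'
 * milliseconds: the integer mean, or the maximum when use_max is set.
 * Writes at most 'max_out' values and returns how many were written.
 * Windows without samples produce no output.
 */
static size_t window_reduce(const struct sample *s, size_t n,
			    unsigned long long win_msec, int use_max,
			    unsigned long long *out, size_t max_out)
{
	size_t i = 0, nout = 0;

	assert(win_msec > 0);
	while (i < n && nout < max_out) {
		unsigned long long win = s[i].msec / win_msec;
		unsigned long long sum = 0, max = 0, cnt = 0;

		/* consume every sample that falls into this window */
		for (; i < n && s[i].msec / win_msec == win; i++) {
			sum += s[i].value;
			if (s[i].value > max)
				max = s[i].value;
			cnt++;
		}
		out[nout++] = use_max ? max : sum / cnt;
	}
	return nout;
}
```

Because one output stands for many I/Os, per-I/O fields such as data direction and offset have no single value to report, which is why the text says they do not apply in this mode.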
diff --git a/t/gen-rand.c b/t/gen-rand.c
new file mode 100644
index 0000000..c2a31bc
--- /dev/null
+++ b/t/gen-rand.c
@@ -0,0 +1,68 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <assert.h>
+#include <math.h>
+#include <string.h>
+
+#include "../lib/types.h"
+#include "../log.h"
+#include "../lib/lfsr.h"
+#include "../lib/axmap.h"
+#include "../smalloc.h"
+#include "../minmax.h"
+#include "../lib/rand.h"
+
+int main(int argc, char *argv[])
+{
+	struct frand_state s;
+	uint64_t i, start, end, nvalues;
+	unsigned long *buckets, index, pass, fail;
+	double p, dev, mean, vmin, vmax;
+
+	if (argc < 4) {
+		log_err("%s: start end nvalues\n", argv[0]);
+		return 1;
+	}
+
+	start = strtoul(argv[1], NULL, 10);
+	end = strtoul(argv[2], NULL, 10);
+
+	if (start >= end) {
+		log_err("%s: start must be smaller than end\n", argv[0]);
+		return 1;
+	}
+	index = 1 + end - start;
+	buckets = calloc(index, sizeof(unsigned long));
+
+	nvalues = strtoul(argv[3], NULL, 10);
+
+	init_rand(&s, true);
+
+	for (i = 0; i < nvalues; i++) {
+		int v = rand_between(&s, start, end);
+
+		buckets[v - start]++;
+	}
+
+	p = 1.0 / index;
+	dev = sqrt(nvalues * p * (1.0 - p));
+	mean = nvalues * p;
+	vmin = mean - dev;
+	vmax = mean + dev;
+
+	pass = fail = 0;
+	for (i = 0; i < index; i++) {
+		if (buckets[i] < vmin || buckets[i] > vmax) {
+			printf("FAIL bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", i + 1, buckets[i], vmin, mean, vmax);
+			fail++;
+		} else {
+			printf("PASS bucket%4lu: val=%8lu (%.1f < %.1f > %.1f)\n", i + 1, buckets[i], vmin, mean, vmax);
+			pass++;
+		}
+	}
+
+	printf("Passes=%lu, Fail=%lu\n", pass, fail);
+
+	return 0;
+}
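t/gen-rand treats each bucket count as Binomial(nvalues, p) with p = 1/index. The expected count is therefore nvalues·p and the standard deviation sqrt(nvalues·p·(1 − p)). A bucket is reported PASS when its count lies within one standard deviation of that mean. For a roughly normal count, only about 68% of buckets are expected inside ±1σ. Some FAIL lines are therefore normal even for a good generator, and the Passes/Fail ratio is the more telling number. The following is a minimal sketch of the same per-bucket check. `bucket_in_band()` is a hypothetical name. It compares the squared distance from the mean with the variance, which gives the same verdict as the original mean ± dev comparison without needing sqrt().

```c
#include <assert.h>

/*
 * Return 1 if 'count' lies within one standard deviation of the expected
 * count for one of 'nbuckets' equally likely buckets after 'n' draws.
 * The count is Binomial(n, p) with p = 1/nbuckets: mean n*p, variance
 * n*p*(1-p). |count - mean| <= sqrt(var) is tested as d*d <= var.
 */
static int bucket_in_band(unsigned long count, unsigned long n,
			  unsigned long nbuckets)
{
	double p, mean, var, d;

	assert(nbuckets > 0);
	p = 1.0 / nbuckets;
	mean = n * p;
	var = n * p * (1.0 - p);
	d = count - mean;
	return d * d <= var;
}
```

For 1000 draws over 10 buckets, the mean is 100 and the variance is 90 (a standard deviation of about 9.49). Counts of 91 through 109 pass, while 90 and 110 fail.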

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-03-08 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-03-08 13:00 UTC
  To: fio

The following changes since commit 7c4d0fb7e9e038000978b43b5674f9ad049d36b9:

  Fix double free of td zone state index (2016-03-04 19:50:41 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to de4096e8682a064ed9125af7ac30a3fe4021167b:

  rand: use bools (2016-03-07 15:38:44 -0700)

----------------------------------------------------------------
Andrey Kuzmin (1):
      io_u: speed up __get_next_buflen()

Jens Axboe (5):
      options: unify the bssplit/zone split code
      options: finish merge of bssplit/rand zone code
      options: improvements to parse dry run
      Add the sample JESD219 job file
      rand: use bools

 examples/jesd219.fio |  19 ++++
 io_u.c               |   2 +-
 lib/rand.c           |   4 +-
 lib/rand.h           |   5 +-
 options.c            | 267 ++++++++++++++++++++++++---------------------------
 5 files changed, 152 insertions(+), 145 deletions(-)
 create mode 100644 examples/jesd219.fio

---

Diff of recent changes:

diff --git a/examples/jesd219.fio b/examples/jesd219.fio
new file mode 100644
index 0000000..ab2c40e
--- /dev/null
+++ b/examples/jesd219.fio
@@ -0,0 +1,19 @@
+# Sample implementation of the JESD219 workload for SSD endurance
+# testing. It uses a specific distribution of block sizes and
+# read/write mix, as well as a specific distribution of where on
+# the device the IO accesses will land. Based on a posting from
+# Jeff Furlong <jeff.furlong@hgst.com>
+[JESD219]
+ioengine=libaio
+direct=1
+rw=randrw
+norandommap
+randrepeat=0
+rwmixread=40
+rwmixwrite=60
+iodepth=256
+numjobs=4
+bssplit=512/4:1024/1:1536/1:2048/1:2560/1:3072/1:3584/1:4k/67:8k/10:16k/7:32k/3:64k/3
+random_distribution=zoned:50/5:30/15:20/80
+filename=/dev/nvme0n1
+group_reporting=1
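The bssplit string above uses the same value/percentage:value/percentage syntax as random_distribution=zoned. The options.c change below parses both through one helper, split_parse_ddir(), into a struct split that holds up to 100 pairs, and records a missing or zero percentage as -1U so that callers can fill it in later. The following is a simplified sketch of that parsing, applied to the JESD219 bssplit to confirm that its percentages sum to 100. Assumptions: only a k/K suffix is handled (taken as 1024), whereas fio uses its own str_to_decimal(); malformed values are not reported; parse_val() and split_parse() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SPLIT_MAX 100

struct split {
	unsigned int nr;
	unsigned int val1[SPLIT_MAX];	/* block size (bssplit) or access % (zoned) */
	unsigned int val2[SPLIT_MAX];	/* percentage, -1U when not given */
};

/* Simplified value parser: decimal number with an optional k/K (x1024). */
static unsigned int parse_val(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 10);

	if (*end == 'k' || *end == 'K')
		v *= 1024;
	return (unsigned int)v;
}

/* Parse "v/p:v/p:..." without modifying the input; stop at an empty entry. */
static void split_parse(struct split *split, const char *str)
{
	char tok[64];

	memset(split, 0, sizeof(*split));
	while (*str && split->nr < SPLIT_MAX) {
		const char *colon = strchr(str, ':');
		size_t len = colon ? (size_t)(colon - str) : strlen(str);
		unsigned int perc = -1U;
		char *slash;

		if (!len || len >= sizeof(tok))
			break;
		memcpy(tok, str, len);
		tok[len] = '\0';

		slash = strchr(tok, '/');
		if (slash) {
			*slash = '\0';
			perc = atoi(slash + 1);
			if (perc > 100)
				perc = 100;
			else if (!perc)
				perc = -1U;	/* "20/" or "20/0": treat as missing */
		}
		split->val1[split->nr] = parse_val(tok);
		split->val2[split->nr] = perc;
		split->nr++;

		if (!colon)
			break;
		str = colon + 1;
	}
}
```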
diff --git a/io_u.c b/io_u.c
index 0a39886..3299f29 100644
--- a/io_u.c
+++ b/io_u.c
@@ -553,7 +553,7 @@ static unsigned int __get_next_buflen(struct thread_data *td, struct io_u *io_u,
 
 				buflen = bsp->bs;
 				perc += bsp->perc;
-				if ((r <= ((frand_max / 100L) * perc)) &&
+				if ((r * 100UL <= frand_max * perc) &&
 				    io_u_fits(td, io_u, buflen))
 					break;
 			}
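The bssplit comparison was rewritten because (frand_max / 100L) truncates before it is multiplied back by perc. As a result, the threshold for a cumulative 100% sits slightly below frand_max. With the 32-bit maximum FRAND32_MAX (-1U, i.e. 0xFFFFFFFF), the threshold is 4294967200, and any r above that fails every entry's check. Scaling r by 100 instead keeps the comparison exact. The sketch below shows both forms. It assumes 32-bit generator values held in uint64_t, so neither product can wrap. Whether the same headroom exists at the real call site depends on the types there, which this hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old form: frand_max / 100 truncates before being multiplied by perc. */
static bool old_check(uint64_t r, uint64_t frand_max, unsigned int perc)
{
	return r <= (frand_max / 100) * perc;
}

/* New form: scale r instead, so no precision is lost to the division. */
static bool new_check(uint64_t r, uint64_t frand_max, unsigned int perc)
{
	return r * 100 <= frand_max * perc;
}
```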
diff --git a/lib/rand.c b/lib/rand.c
index 1b661a8..9c3e0d6 100644
--- a/lib/rand.c
+++ b/lib/rand.c
@@ -76,7 +76,7 @@ static void __init_rand64(struct taus258_state *state, uint64_t seed)
 		__rand64(state);
 }
 
-void init_rand(struct frand_state *state, int use64)
+void init_rand(struct frand_state *state, bool use64)
 {
 	state->use64 = use64;
 
@@ -86,7 +86,7 @@ void init_rand(struct frand_state *state, int use64)
 		__init_rand64(&state->state64, 1);
 }
 
-void init_rand_seed(struct frand_state *state, unsigned int seed, int use64)
+void init_rand_seed(struct frand_state *state, unsigned int seed, bool use64)
 {
 	state->use64 = use64;
 
diff --git a/lib/rand.h b/lib/rand.h
index 49773b0..24fac23 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -2,6 +2,7 @@
 #define FIO_RAND_H
 
 #include <inttypes.h>
+#include "types.h"
 #include "../arch/arch.h"
 
 #define FRAND32_MAX	(-1U)
@@ -128,8 +129,8 @@ static inline int rand_between(struct frand_state *state, int start, int end)
 	return start + (int) ((double)end * (r / (rand_max(state) + 1.0)));
 }
 
-extern void init_rand(struct frand_state *, int);
-extern void init_rand_seed(struct frand_state *, unsigned int seed, int);
+extern void init_rand(struct frand_state *, bool);
+extern void init_rand_seed(struct frand_state *, unsigned int seed, bool);
 extern void __fill_random_buf(void *buf, unsigned int len, unsigned long seed);
 extern unsigned long fill_random_buf(struct frand_state *, void *buf, unsigned int len);
 extern void __fill_random_buf_percentage(unsigned long, void *, unsigned int, unsigned int, unsigned int, char *, unsigned int);
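In the options.c diff that follows, str_split_parse() factors out the per-direction handling that bssplit and zoned distributions share. A value without a comma applies to reads, writes, and trims. "R,W" sets reads to R and both writes and trims to W. "R,W,T" sets all three separately. The sketch below covers only that splitting step. The names are hypothetical, and the output buffers are a fixed 128 bytes, so longer parts are truncated.

```c
#include <assert.h>
#include <string.h>

enum split_dir { DIR_READ, DIR_WRITE, DIR_TRIM, DIR_CNT };

#define PART_MAX 128

/* Copy len bytes of src into dst as a NUL-terminated string, truncating. */
static void copy_part(char *dst, const char *src, size_t len)
{
	if (len > PART_MAX - 1)
		len = PART_MAX - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

static void split_by_dir(const char *in, char out[DIR_CNT][PART_MAX])
{
	const char *c1 = strchr(in, ',');
	const char *c2 = c1 ? strchr(c1 + 1, ',') : NULL;
	int d;

	if (!c1) {
		/* One set: the same string applies to all three directions. */
		for (d = 0; d < DIR_CNT; d++)
			copy_part(out[d], in, strlen(in));
		return;
	}

	copy_part(out[DIR_READ], in, (size_t)(c1 - in));
	if (c2) {
		/* "R,W,T": everything after the second comma is for trims. */
		copy_part(out[DIR_WRITE], c1 + 1, (size_t)(c2 - c1 - 1));
		copy_part(out[DIR_TRIM], c2 + 1, strlen(c2 + 1));
	} else {
		/* "R,W": trims reuse the write part. */
		copy_part(out[DIR_WRITE], c1 + 1, strlen(c1 + 1));
		copy_part(out[DIR_TRIM], c1 + 1, strlen(c1 + 1));
	}
}
```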
diff --git a/options.c b/options.c
index bc2d0ed..7075d84 100644
--- a/options.c
+++ b/options.c
@@ -44,35 +44,28 @@ static int bs_cmp(const void *p1, const void *p2)
 	return (int) bsp1->perc - (int) bsp2->perc;
 }
 
-static int bssplit_ddir(struct thread_options *o, int ddir, char *str)
+struct split {
+	unsigned int nr;
+	unsigned int val1[100];
+	unsigned int val2[100];
+};
+
+static int split_parse_ddir(struct thread_options *o, struct split *split,
+			    enum fio_ddir ddir, char *str)
 {
-	struct bssplit *bssplit;
-	unsigned int i, perc, perc_missing;
-	unsigned int max_bs, min_bs;
+	unsigned int i, perc;
 	long long val;
 	char *fname;
 
-	o->bssplit_nr[ddir] = 4;
-	bssplit = malloc(4 * sizeof(struct bssplit));
+	split->nr = 0;
 
 	i = 0;
-	max_bs = 0;
-	min_bs = -1;
 	while ((fname = strsep(&str, ":")) != NULL) {
 		char *perc_str;
 
 		if (!strlen(fname))
 			break;
 
-		/*
-		 * grow struct buffer, if needed
-		 */
-		if (i == o->bssplit_nr[ddir]) {
-			o->bssplit_nr[ddir] <<= 1;
-			bssplit = realloc(bssplit, o->bssplit_nr[ddir]
-						  * sizeof(struct bssplit));
-		}
-
 		perc_str = strstr(fname, "/");
 		if (perc_str) {
 			*perc_str = '\0';
@@ -87,28 +80,53 @@ static int bssplit_ddir(struct thread_options *o, int ddir, char *str)
 
 		if (str_to_decimal(fname, &val, 1, o, 0, 0)) {
 			log_err("fio: bssplit conversion failed\n");
-			free(bssplit);
 			return 1;
 		}
 
-		if (val > max_bs)
-			max_bs = val;
-		if (val < min_bs)
-			min_bs = val;
-
-		bssplit[i].bs = val;
-		bssplit[i].perc = perc;
+		split->val1[i] = val;
+		split->val2[i] = perc;
 		i++;
+		if (i == 100)
+			break;
 	}
 
-	o->bssplit_nr[ddir] = i;
+	split->nr = i;
+	return 0;
+}
+
+static int bssplit_ddir(struct thread_options *o, enum fio_ddir ddir, char *str)
+{
+	unsigned int i, perc, perc_missing;
+	unsigned int max_bs, min_bs;
+	struct split split;
+
+	memset(&split, 0, sizeof(split));
+
+	if (split_parse_ddir(o, &split, ddir, str))
+		return 1;
+	if (!split.nr)
+		return 0;
+
+	max_bs = 0;
+	min_bs = -1;
+	o->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
+	o->bssplit_nr[ddir] = split.nr;
+	for (i = 0; i < split.nr; i++) {
+		if (split.val1[i] > max_bs)
+			max_bs = split.val1[i];
+		if (split.val1[i] < min_bs)
+			min_bs = split.val1[i];
+
+		o->bssplit[ddir][i].bs = split.val1[i];
+		o->bssplit[ddir][i].perc =split.val2[i];
+	}
 
 	/*
 	 * Now check if the percentages add up, and how much is missing
 	 */
 	perc = perc_missing = 0;
 	for (i = 0; i < o->bssplit_nr[ddir]; i++) {
-		struct bssplit *bsp = &bssplit[i];
+		struct bssplit *bsp = &o->bssplit[ddir][i];
 
 		if (bsp->perc == -1U)
 			perc_missing++;
@@ -118,7 +136,8 @@ static int bssplit_ddir(struct thread_options *o, int ddir, char *str)
 
 	if (perc > 100 && perc_missing > 1) {
 		log_err("fio: bssplit percentages add to more than 100%%\n");
-		free(bssplit);
+		free(o->bssplit[ddir]);
+		o->bssplit[ddir] = NULL;
 		return 1;
 	}
 
@@ -130,7 +149,7 @@ static int bssplit_ddir(struct thread_options *o, int ddir, char *str)
 		if (perc_missing == 1 && o->bssplit_nr[ddir] == 1)
 			perc = 100;
 		for (i = 0; i < o->bssplit_nr[ddir]; i++) {
-			struct bssplit *bsp = &bssplit[i];
+			struct bssplit *bsp = &o->bssplit[ddir][i];
 
 			if (bsp->perc == -1U)
 				bsp->perc = (100 - perc) / perc_missing;
@@ -143,60 +162,78 @@ static int bssplit_ddir(struct thread_options *o, int ddir, char *str)
 	/*
 	 * now sort based on percentages, for ease of lookup
 	 */
-	qsort(bssplit, o->bssplit_nr[ddir], sizeof(struct bssplit), bs_cmp);
-	o->bssplit[ddir] = bssplit;
+	qsort(o->bssplit[ddir], o->bssplit_nr[ddir], sizeof(struct bssplit), bs_cmp);
 	return 0;
 }
 
-static int str_bssplit_cb(void *data, const char *input)
+typedef int (split_parse_fn)(struct thread_options *, enum fio_ddir, char *);
+
+static int str_split_parse(struct thread_data *td, char *str, split_parse_fn *fn)
 {
-	struct thread_data *td = data;
-	char *str, *p, *odir, *ddir;
+	char *odir, *ddir;
 	int ret = 0;
 
-	if (parse_dryrun())
-		return 0;
-
-	p = str = strdup(input);
-
-	strip_blank_front(&str);
-	strip_blank_end(str);
-
 	odir = strchr(str, ',');
 	if (odir) {
 		ddir = strchr(odir + 1, ',');
 		if (ddir) {
-			ret = bssplit_ddir(&td->o, DDIR_TRIM, ddir + 1);
+			ret = fn(&td->o, DDIR_TRIM, ddir + 1);
 			if (!ret)
 				*ddir = '\0';
 		} else {
 			char *op;
 
 			op = strdup(odir + 1);
-			ret = bssplit_ddir(&td->o, DDIR_TRIM, op);
+			ret = fn(&td->o, DDIR_TRIM, op);
 
 			free(op);
 		}
 		if (!ret)
-			ret = bssplit_ddir(&td->o, DDIR_WRITE, odir + 1);
+			ret = fn(&td->o, DDIR_WRITE, odir + 1);
 		if (!ret) {
 			*odir = '\0';
-			ret = bssplit_ddir(&td->o, DDIR_READ, str);
+			ret = fn(&td->o, DDIR_READ, str);
 		}
 	} else {
 		char *op;
 
 		op = strdup(str);
-		ret = bssplit_ddir(&td->o, DDIR_WRITE, op);
+		ret = fn(&td->o, DDIR_WRITE, op);
 		free(op);
 
 		if (!ret) {
 			op = strdup(str);
-			ret = bssplit_ddir(&td->o, DDIR_TRIM, op);
+			ret = fn(&td->o, DDIR_TRIM, op);
 			free(op);
 		}
 		if (!ret)
-			ret = bssplit_ddir(&td->o, DDIR_READ, str);
+			ret = fn(&td->o, DDIR_READ, str);
+	}
+
+	return ret;
+}
+
+static int str_bssplit_cb(void *data, const char *input)
+{
+	struct thread_data *td = data;
+	char *str, *p;
+	int ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	ret = str_split_parse(td, str, bssplit_ddir);
+
+	if (parse_dryrun()) {
+		int i;
+
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			free(td->o.bssplit[i]);
+			td->o.bssplit[i] = NULL;
+			td->o.bssplit_nr[i] = 0;
+		}
 	}
 
 	free(p);
@@ -714,64 +751,33 @@ static int zone_cmp(const void *p1, const void *p2)
 	return (int) zsp2->access_perc - (int) zsp1->access_perc;
 }
 
-static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
+static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
+			   char *str)
 {
-	struct zone_split *zsplit;
 	unsigned int i, perc, perc_missing, sperc, sperc_missing;
-	long long val;
-	char *fname;
+	struct split split;
 
-	o->zone_split_nr[ddir] = 4;
-	zsplit = malloc(4 * sizeof(struct zone_split));
-
-	i = 0;
-	while ((fname = strsep(&str, ":")) != NULL) {
-		char *perc_str;
+	memset(&split, 0, sizeof(split));
 
-		if (!strlen(fname))
-			break;
-
-		/*
-		 * grow struct buffer, if needed
-		 */
-		if (i == o->zone_split_nr[ddir]) {
-			o->zone_split_nr[ddir] <<= 1;
-			zsplit = realloc(zsplit, o->zone_split_nr[ddir]
-						  * sizeof(struct zone_split));
-		}
-
-		perc_str = strstr(fname, "/");
-		if (perc_str) {
-			*perc_str = '\0';
-			perc_str++;
-			perc = atoi(perc_str);
-			if (perc > 100)
-				perc = 100;
-			else if (!perc)
-				perc = -1U;
-		} else
-			perc = -1U;
-
-		if (str_to_decimal(fname, &val, 1, o, 0, 0)) {
-			log_err("fio: zone_split conversion failed\n");
-			free(zsplit);
-			return 1;
-		}
+	if (split_parse_ddir(o, &split, ddir, str))
+		return 1;
+	if (!split.nr)
+		return 0;
 
-		zsplit[i].access_perc = val;
-		zsplit[i].size_perc = perc;
-		i++;
+	o->zone_split[ddir] = malloc(split.nr * sizeof(struct zone_split));
+	o->zone_split_nr[ddir] = split.nr;
+	for (i = 0; i < split.nr; i++) {
+		o->zone_split[ddir][i].access_perc = split.val1[i];
+		o->zone_split[ddir][i].size_perc = split.val2[i];
 	}
 
-	o->zone_split_nr[ddir] = i;
-
 	/*
 	 * Now check if the percentages add up, and how much is missing
 	 */
 	perc = perc_missing = 0;
 	sperc = sperc_missing = 0;
 	for (i = 0; i < o->zone_split_nr[ddir]; i++) {
-		struct zone_split *zsp = &zsplit[i];
+		struct zone_split *zsp = &o->zone_split[ddir][i];
 
 		if (zsp->access_perc == (uint8_t) -1U)
 			perc_missing++;
@@ -787,13 +793,15 @@ static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
 
 	if (perc > 100 || sperc > 100) {
 		log_err("fio: zone_split percentages add to more than 100%%\n");
-		free(zsplit);
+		free(o->zone_split[ddir]);
+		o->zone_split[ddir] = NULL;
 		return 1;
 	}
 	if (perc < 100) {
 		log_err("fio: access percentage don't add up to 100 for zoned "
 			"random distribution (got=%u)\n", perc);
-		free(zsplit);
+		free(o->zone_split[ddir]);
+		o->zone_split[ddir] = NULL;
 		return 1;
 	}
 
@@ -805,7 +813,7 @@ static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
 		if (perc_missing == 1 && o->zone_split_nr[ddir] == 1)
 			perc = 100;
 		for (i = 0; i < o->zone_split_nr[ddir]; i++) {
-			struct zone_split *zsp = &zsplit[i];
+			struct zone_split *zsp = &o->zone_split[ddir][i];
 
 			if (zsp->access_perc == (uint8_t) -1U)
 				zsp->access_perc = (100 - perc) / perc_missing;
@@ -815,7 +823,7 @@ static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
 		if (sperc_missing == 1 && o->zone_split_nr[ddir] == 1)
 			sperc = 100;
 		for (i = 0; i < o->zone_split_nr[ddir]; i++) {
-			struct zone_split *zsp = &zsplit[i];
+			struct zone_split *zsp = &o->zone_split[ddir][i];
 
 			if (zsp->size_perc == (uint8_t) -1U)
 				zsp->size_perc = (100 - sperc) / sperc_missing;
@@ -825,8 +833,7 @@ static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
 	/*
 	 * now sort based on percentages, for ease of lookup
 	 */
-	qsort(zsplit, o->zone_split_nr[ddir], sizeof(struct zone_split), zone_cmp);
-	o->zone_split[ddir] = zsplit;
+	qsort(o->zone_split[ddir], o->zone_split_nr[ddir], sizeof(struct zone_split), zone_cmp);
 	return 0;
 }
 
@@ -869,7 +876,7 @@ static void td_zone_gen_index(struct thread_data *td)
 
 static int parse_zoned_distribution(struct thread_data *td, const char *input)
 {
-	char *str, *p, *odir, *ddir;
+	char *str, *p;
 	int i, ret = 0;
 
 	p = str = strdup(input);
@@ -885,42 +892,7 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input)
 	}
 	str += strlen("zoned:");
 
-	odir = strchr(str, ',');
-	if (odir) {
-		ddir = strchr(odir + 1, ',');
-		if (ddir) {
-			ret = zone_split_ddir(&td->o, DDIR_TRIM, ddir + 1);
-			if (!ret)
-				*ddir = '\0';
-		} else {
-			char *op;
-
-			op = strdup(odir + 1);
-			ret = zone_split_ddir(&td->o, DDIR_TRIM, op);
-
-			free(op);
-		}
-		if (!ret)
-			ret = zone_split_ddir(&td->o, DDIR_WRITE, odir + 1);
-		if (!ret) {
-			*odir = '\0';
-			ret = zone_split_ddir(&td->o, DDIR_READ, str);
-		}
-	} else {
-		char *op;
-
-		op = strdup(str);
-		ret = zone_split_ddir(&td->o, DDIR_WRITE, op);
-		free(op);
-
-		if (!ret) {
-			op = strdup(str);
-			ret = zone_split_ddir(&td->o, DDIR_TRIM, op);
-			free(op);
-		}
-		if (!ret)
-			ret = zone_split_ddir(&td->o, DDIR_READ, str);
-	}
+	ret = str_split_parse(td, str, zone_split_ddir);
 
 	free(p);
 
@@ -937,6 +909,18 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input)
 		}
 	}
 
+	if (parse_dryrun()) {
+		int i;
+
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			free(td->o.zone_split[i]);
+			td->o.zone_split[i] = NULL;
+			td->o.zone_split_nr[i] = 0;
+		}
+
+		return ret;
+	}
+
 	if (!ret)
 		td_zone_gen_index(td);
 	else {
@@ -953,9 +937,6 @@ static int str_random_distribution_cb(void *data, const char *str)
 	double val;
 	char *nr;
 
-	if (parse_dryrun())
-		return 0;
-
 	if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
 		val = FIO_DEF_ZIPF;
 	else if (td->o.random_distribution == FIO_RAND_DIST_PARETO)
@@ -981,18 +962,24 @@ static int str_random_distribution_cb(void *data, const char *str)
 			log_err("fio: zipf theta must different than 1.0\n");
 			return 1;
 		}
+		if (parse_dryrun())
+			return 0;
 		td->o.zipf_theta.u.f = val;
 	} else if (td->o.random_distribution == FIO_RAND_DIST_PARETO) {
 		if (val <= 0.00 || val >= 1.00) {
 			log_err("fio: pareto input out of range (0 < input < 1.0)\n");
 			return 1;
 		}
+		if (parse_dryrun())
+			return 0;
 		td->o.pareto_h.u.f = val;
 	} else {
 		if (val <= 0.00 || val >= 100.0) {
 			log_err("fio: normal deviation out of range (0 < input < 100.0)\n");
 			return 1;
 		}
+		if (parse_dryrun())
+			return 0;
 		td->o.gauss_dev.u.f = val;
 	}
 


* Recent changes (master)
@ 2016-03-05 13:00 Jens Axboe
From: Jens Axboe @ 2016-03-05 13:00 UTC
  To: fio

The following changes since commit 8116fd24b737c9d878ccb6a4cc13cc4f974dc2dc:

  Update documentation for random_distribution=gauss (2016-03-03 14:00:54 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 7c4d0fb7e9e038000978b43b5674f9ad049d36b9:

  Fix double free of td zone state index (2016-03-04 19:50:41 -0700)

----------------------------------------------------------------
Alan C (1):
      Log to parent instead of twice to child td

Jens Axboe (3):
      Add support for zones of random IO, with varying frequency of access
      options: clean number of zones if we fail parsing
      Fix double free of td zone state index

 HOWTO                   |  20 ++++
 backend.c               |   9 ++
 cconv.c                 |  32 +++++++
 examples/rand-zones.fio |  18 ++++
 fio.1                   |  34 ++++++-
 fio.h                   |  10 ++
 init.c                  |   1 +
 io_u.c                  | 108 ++++++++++++++++-----
 lib/rand.h              |  11 +++
 options.c               | 248 ++++++++++++++++++++++++++++++++++++++++++++++++
 server.h                |   2 +-
 thread_options.h        |  13 ++-
 12 files changed, 477 insertions(+), 29 deletions(-)
 create mode 100644 examples/rand-zones.fio

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index d70c9af..e2a4b15 100644
--- a/HOWTO
+++ b/HOWTO
@@ -964,6 +964,7 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		zipf		Zipf distribution
 		pareto		Pareto distribution
 		gauss		Normal (gaussian) distribution
+		zoned		Zoned random distribution
 
 		When using a zipf or pareto distribution, an input value
 		is also needed to define the access pattern. For zipf, this
@@ -976,6 +977,25 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		the gauss distribution, a normal deviation is supplied as
 		a value between 0 and 100.
 
+		For a zoned distribution, fio supports specifying percentages
+		of IO access that should fall within what range of the file or
+		device. For example, given the following criteria:
+
+			60% of accesses should be to the first 10%
+			30% of accesses should be to the next 20%
+			8% of accesses should be to the next 30%
+			2% of accesses should be to the next 40%
+
+		we can define that through zoning of the random accesses. For
+		the above example, the user would do:
+
+			random_distribution=zoned:60/10:30/20:8/30:2/40
+
+		similarly to how bssplit works for setting ranges and
+		percentages of block sizes. Like bssplit, it's possible to
+		specify separate zones for reads, writes, and trims. If just
+		one set is given, it'll apply to all of them.
+
 percentage_random=int	For a random workload, set how big a percentage should
 		be random. This defaults to 100%, in which case the workload
 		is fully random. It can be set from anywhere from 0 to 100.
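The io_u.c diff below shows how a zoned specification turns into offsets. __get_next_rand_offset_zoned() draws v uniformly from 1..100 and looks up td->zone_state_index[ddir][v - 1]. The size_perc_prev and size_perc fields of that entry give the start and end of the target range, in percent. td_zone_gen_index(), which fills the table, is not part of this excerpt, so the following is a hypothetical reconstruction for the HOWTO example above. It keeps the zones in the order given. zone_split_ddir() sorts them by descending access percentage, which for this example is the same order.

```c
#include <assert.h>

/* One access%/size% pair, as in "60/10". */
struct zone {
	unsigned int access_perc;
	unsigned int size_perc;
};

/* Same fields as struct zone_split_index in fio.h (widened to unsigned int). */
struct zone_index {
	unsigned int size_perc_prev;	/* start of the range, percent */
	unsigned int size_perc;		/* end of the range, percent */
};

/*
 * Fill idx[0..99] so that idx[v - 1] gives the file range to use when a
 * uniform draw v in 1..100 falls into a zone's share of the accesses.
 * Assumes the access percentages add up to 100, as zone_split_ddir()
 * requires; otherwise trailing entries are left untouched.
 */
static void build_zone_index(const struct zone *z, unsigned int nr,
			     struct zone_index idx[100])
{
	unsigned int i, j, v = 0, size_start = 0;

	for (i = 0; i < nr; i++) {
		unsigned int size_end = size_start + z[i].size_perc;

		for (j = 0; j < z[i].access_perc && v < 100; j++, v++) {
			idx[v].size_perc_prev = size_start;
			idx[v].size_perc = size_end;
		}
		size_start = size_end;
	}
}
```

For example, a draw of v = 95 lands in idx[94], which belongs to the 8% zone, so the access goes somewhere between 30% and 60% of the file.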
diff --git a/backend.c b/backend.c
index 6083a51..7f57c65 100644
--- a/backend.c
+++ b/backend.c
@@ -1711,6 +1711,15 @@ err:
 	cgroup_shutdown(td, &cgroup_mnt);
 	verify_free_state(td);
 
+	if (td->zone_state_index) {
+		int i;
+
+		for (i = 0; i < DDIR_RWDIR_CNT; i++)
+			free(td->zone_state_index[i]);
+		free(td->zone_state_index);
+		td->zone_state_index = NULL;
+	}
+
 	if (fio_option_is_set(o, cpumask)) {
 		ret = fio_cpuset_exit(&o->cpumask);
 		if (ret)
diff --git a/cconv.c b/cconv.c
index 6f57d90..0c3a36c 100644
--- a/cconv.c
+++ b/cconv.c
@@ -23,6 +23,8 @@ static void __string_to_net(uint8_t *dst, const char *src, size_t dst_size)
 
 static void free_thread_options_to_cpu(struct thread_options *o)
 {
+	int i;
+
 	free(o->description);
 	free(o->name);
 	free(o->wait_for);
@@ -43,6 +45,11 @@ static void free_thread_options_to_cpu(struct thread_options *o)
 	free(o->ioscheduler);
 	free(o->profile);
 	free(o->cgroup);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		free(o->bssplit[i]);
+		free(o->zone_split[i]);
+	}
 }
 
 void convert_thread_options_to_cpu(struct thread_options *o,
@@ -111,6 +118,16 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 			}
 		}
 
+		o->zone_split_nr[i] = le32_to_cpu(top->zone_split_nr[i]);
+
+		if (o->zone_split_nr[i]) {
+			o->zone_split[i] = malloc(o->zone_split_nr[i] * sizeof(struct zone_split));
+			for (j = 0; j < o->zone_split_nr[i]; j++) {
+				o->zone_split[i][j].access_perc = top->zone_split[i][j].access_perc;
+				o->zone_split[i][j].size_perc = top->zone_split[i][j].size_perc;
+			}
+		}
+
 		o->rwmix[i] = le32_to_cpu(top->rwmix[i]);
 		o->rate[i] = le32_to_cpu(top->rate[i]);
 		o->ratemin[i] = le32_to_cpu(top->ratemin[i]);
@@ -453,6 +470,21 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 			}
 		}
 
+		top->zone_split_nr[i] = cpu_to_le32(o->zone_split_nr[i]);
+
+		if (o->zone_split_nr[i]) {
+			unsigned int zone_split_nr = o->zone_split_nr[i];
+
+			if (zone_split_nr > ZONESPLIT_MAX) {
+				log_err("fio: ZONESPLIT_MAX is too small\n");
+				zone_split_nr = ZONESPLIT_MAX;
+			}
+			for (j = 0; j < zone_split_nr; j++) {
+				top->zone_split[i][j].access_perc = o->zone_split[i][j].access_perc;
+				top->zone_split[i][j].size_perc = o->zone_split[i][j].size_perc;
+			}
+		}
+
 		top->rwmix[i] = cpu_to_le32(o->rwmix[i]);
 		top->rate[i] = cpu_to_le32(o->rate[i]);
 		top->ratemin[i] = cpu_to_le32(o->ratemin[i]);
diff --git a/examples/rand-zones.fio b/examples/rand-zones.fio
new file mode 100644
index 0000000..da13fa3
--- /dev/null
+++ b/examples/rand-zones.fio
@@ -0,0 +1,18 @@
+# Sample job file demonstrating how to use zoned random distributions
+# to have skewed random accesses. This example has 50% of the accesses
+# to the first 5% of the file (50/5), 30% to the next 15% (30/15), and
+# finally 20% of the IO will end up in the remaining 80%.
+[zones]
+size=2g
+direct=1
+bs=4k
+rw=randread
+norandommap
+random_distribution=zoned:50/5:30/15:20/
+
+# The above applies to all of reads/writes/trims. If we wanted to do
+# something different for writes, let's say 50% for the first 10%
+# and 50% for the remaining 90%, we could do it by adding a new section
+# after a comma.
+
+# random_distribution=zoned:50/5:30/15:20/,50/10:50/90
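In "20/", the percentage after the slash is empty. atoi() returns 0 for it, and zone_split_ddir() records a zero percentage as missing, which the code then checks by comparing against (uint8_t) -1U. Missing size percentages share whatever is left of 100% equally, which is how the last zone ends up covering the remaining 80%. Below is a sketch of that fill step, using a hypothetical fill_missing() helper. The single-entry special case in zone_split_ddir() is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PERC_MISSING ((uint8_t) -1U)	/* how an unset percentage is stored */

/*
 * Give every unset size percentage an equal share of what is left of
 * 100%. Returns the sum of the explicitly given values.
 */
static unsigned int fill_missing(uint8_t *size_perc, unsigned int nr)
{
	unsigned int i, sum = 0, missing = 0;

	for (i = 0; i < nr; i++) {
		if (size_perc[i] == PERC_MISSING)
			missing++;
		else
			sum += size_perc[i];
	}
	if (!missing || sum > 100)
		return sum;
	for (i = 0; i < nr; i++)
		if (size_perc[i] == PERC_MISSING)
			size_perc[i] = (100 - sum) / missing;
	return sum;
}
```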
diff --git a/fio.1 b/fio.1
index 81c266b..87404c1 100644
--- a/fio.1
+++ b/fio.1
@@ -877,8 +877,10 @@ Pareto distribution
 .B gauss
 Normal (gaussian) distribution
 .TP
+.B zoned
+Zoned random distribution
+.TP
 .RE
-.P
 When using a \fBzipf\fR or \fBpareto\fR distribution, an input value is also
 needed to define the access pattern. For \fBzipf\fR, this is the zipf theta.
 For \fBpareto\fR, it's the pareto power. Fio includes a test program, genzipf,
@@ -887,6 +889,36 @@ hit rates. If you wanted to use \fBzipf\fR with a theta of 1.2, you would use
 random_distribution=zipf:1.2 as the option. If a non-uniform model is used,
 fio will disable use of the random map. For the \fBgauss\fR distribution, a
 normal deviation is supplied as a value between 0 and 100.
+.P
+.RS
+For a \fBzoned\fR distribution, fio supports specifying percentages of IO
+access that should fall within what range of the file or device. For example,
+given the following criteria:
+.P
+.RS
+60% of accesses should be to the first 10%
+.RE
+.RS
+30% of accesses should be to the next 20%
+.RE
+.RS
+8% of accesses should be to the next 30%
+.RE
+.RS
+2% of accesses should be to the next 40%
+.RE
+.P
+we can define that through zoning of the random accesses. For the above
+example, the user would do:
+.P
+.RS
+.B random_distribution=zoned:60/10:30/20:8/30:2/40
+.RE
+.P
+similarly to how \fBbssplit\fR works for setting ranges and percentages of block
+sizes. Like \fBbssplit\fR, it's possible to specify separate zones for reads,
+writes, and trims. If just one set is given, it'll apply to all of them.
+.RE
 .TP
 .BI percentage_random \fR=\fPint
 For a random workload, set how big a percentage should be random. This defaults
diff --git a/fio.h b/fio.h
index b71a486..30fbde0 100644
--- a/fio.h
+++ b/fio.h
@@ -96,6 +96,7 @@ enum {
 	FIO_RAND_START_DELAY,
 	FIO_DEDUPE_OFF,
 	FIO_RAND_POISSON_OFF,
+	FIO_RAND_ZONE_OFF,
 	FIO_RAND_NR_OFFS,
 };
 
@@ -115,6 +116,11 @@ struct sk_out;
 void sk_out_assign(struct sk_out *);
 void sk_out_drop(void);
 
+struct zone_split_index {
+	uint8_t size_perc;
+	uint8_t size_perc_prev;
+};
+
 /*
  * This describes a single thread/process executing a fio job.
  */
@@ -200,6 +206,9 @@ struct thread_data {
 	struct frand_state buf_state;
 	struct frand_state buf_state_prev;
 	struct frand_state dedupe_state;
+	struct frand_state zone_state;
+
+	struct zone_split_index **zone_state_index;
 
 	unsigned int verify_batch;
 	unsigned int trim_batch;
@@ -712,6 +721,7 @@ enum {
 	FIO_RAND_DIST_ZIPF,
 	FIO_RAND_DIST_PARETO,
 	FIO_RAND_DIST_GAUSS,
+	FIO_RAND_DIST_ZONED,
 };
 
 #define FIO_DEF_ZIPF		1.1
diff --git a/init.c b/init.c
index c7ce2cc..149029a 100644
--- a/init.c
+++ b/init.c
@@ -968,6 +968,7 @@ void td_fill_rand_seeds(struct thread_data *td)
 	frand_copy(&td->buf_state_prev, &td->buf_state);
 
 	init_rand_seed(&td->dedupe_state, td->rand_seeds[FIO_DEDUPE_OFF], use64);
+	init_rand_seed(&td->zone_state, td->rand_seeds[FIO_RAND_ZONE_OFF], use64);
 }
 
 /*
diff --git a/io_u.c b/io_u.c
index 8d34912..0a39886 100644
--- a/io_u.c
+++ b/io_u.c
@@ -86,24 +86,19 @@ struct rand_off {
 };
 
 static int __get_next_rand_offset(struct thread_data *td, struct fio_file *f,
-				  enum fio_ddir ddir, uint64_t *b)
+				  enum fio_ddir ddir, uint64_t *b,
+				  uint64_t lastb)
 {
 	uint64_t r;
 
 	if (td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE ||
 	    td->o.random_generator == FIO_RAND_GEN_TAUSWORTHE64) {
-		uint64_t frand_max, lastb;
 
-		lastb = last_block(td, f, ddir);
-		if (!lastb)
-			return 1;
-
-		frand_max = rand_max(&td->random_state);
 		r = __rand(&td->random_state);
 
 		dprint(FD_RANDOM, "off rand %llu\n", (unsigned long long) r);
 
-		*b = lastb * (r / ((uint64_t) frand_max + 1.0));
+		*b = lastb * (r / (rand_max(&td->random_state) + 1.0));
 	} else {
 		uint64_t off = 0;
 
@@ -161,6 +156,70 @@ static int __get_next_rand_offset_gauss(struct thread_data *td,
 	return 0;
 }
 
+static int __get_next_rand_offset_zoned(struct thread_data *td,
+					struct fio_file *f, enum fio_ddir ddir,
+					uint64_t *b)
+{
+	unsigned int v, send, stotal;
+	uint64_t offset, lastb;
+	static int warned;
+	struct zone_split_index *zsi;
+
+	lastb = last_block(td, f, ddir);
+	if (!lastb)
+		return 1;
+
+	if (!td->o.zone_split_nr[ddir]) {
+bail:
+		return __get_next_rand_offset(td, f, ddir, b, lastb);
+	}
+
+	/*
+	 * Generate a value, v, between 1 and 100, both inclusive
+	 */
+	v = rand_between(&td->zone_state, 1, 100);
+
+	zsi = &td->zone_state_index[ddir][v - 1];
+	stotal = zsi->size_perc_prev;
+	send = zsi->size_perc;
+
+	/*
+	 * Should never happen
+	 */
+	if (send == -1U) {
+		if (!warned) {
+			log_err("fio: bug in zoned generation\n");
+			warned = 1;
+		}
+		goto bail;
+	}
+
+	/*
+	 * 'send' is some percentage below or equal to 100 that
+	 * marks the end of the current IO range. 'stotal' marks
+	 * the start, in percent.
+	 */
+	if (stotal)
+		offset = stotal * lastb / 100ULL;
+	else
+		offset = 0;
+
+	lastb = lastb * (send - stotal) / 100ULL;
+
+	/*
+	 * Generate index from 0..send-of-lastb
+	 */
+	if (__get_next_rand_offset(td, f, ddir, b, lastb) == 1)
+		return 1;
+
+	/*
+	 * Add our start offset, if any
+	 */
+	if (offset)
+		*b += offset;
+
+	return 0;
+}
 
 static int flist_cmp(void *data, struct flist_head *a, struct flist_head *b)
 {
@@ -173,14 +232,22 @@ static int flist_cmp(void *data, struct flist_head *a, struct flist_head *b)
 static int get_off_from_method(struct thread_data *td, struct fio_file *f,
 			       enum fio_ddir ddir, uint64_t *b)
 {
-	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM)
-		return __get_next_rand_offset(td, f, ddir, b);
-	else if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
+	if (td->o.random_distribution == FIO_RAND_DIST_RANDOM) {
+		uint64_t lastb;
+
+		lastb = last_block(td, f, ddir);
+		if (!lastb)
+			return 1;
+
+		return __get_next_rand_offset(td, f, ddir, b, lastb);
+	} else if (td->o.random_distribution == FIO_RAND_DIST_ZIPF)
 		return __get_next_rand_offset_zipf(td, f, ddir, b);
 	else if (td->o.random_distribution == FIO_RAND_DIST_PARETO)
 		return __get_next_rand_offset_pareto(td, f, ddir, b);
 	else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
 		return __get_next_rand_offset_gauss(td, f, ddir, b);
+	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED)
+		return __get_next_rand_offset_zoned(td, f, ddir, b);
 
 	log_err("fio: unknown random distribution: %d\n", td->o.random_distribution);
 	return 1;
@@ -207,16 +274,12 @@ static inline bool should_sort_io(struct thread_data *td)
 
 static bool should_do_random(struct thread_data *td, enum fio_ddir ddir)
 {
-	uint64_t frand_max;
 	unsigned int v;
-	unsigned long r;
 
 	if (td->o.perc_rand[ddir] == 100)
 		return true;
 
-	frand_max = rand_max(&td->seq_rand_state[ddir]);
-	r = __rand(&td->seq_rand_state[ddir]);
-	v = 1 + (int) (100.0 * (r / (frand_max + 1.0)));
+	v = rand_between(&td->seq_rand_state[ddir], 1, 100);
 
 	return v <= td->o.perc_rand[ddir];
 }
@@ -536,12 +599,9 @@ static void set_rwmix_bytes(struct thread_data *td)
 
 static inline enum fio_ddir get_rand_ddir(struct thread_data *td)
 {
-	uint64_t frand_max = rand_max(&td->rwmix_state);
 	unsigned int v;
-	unsigned long r;
 
-	r = __rand(&td->rwmix_state);
-	v = 1 + (int) (100.0 * (r / (frand_max + 1.0)));
+	v = rand_between(&td->rwmix_state, 1, 100);
 
 	if (v <= td->o.rwmix[DDIR_READ])
 		return DDIR_READ;
@@ -1607,7 +1667,7 @@ void io_u_log_error(struct thread_data *td, struct io_u *io_u)
 {
 	__io_u_log_error(td, io_u);
 	if (td->parent)
-		__io_u_log_error(td, io_u);
+		__io_u_log_error(td->parent, io_u);
 }
 
 static inline bool gtod_reduce(struct thread_data *td)
@@ -1895,9 +1955,7 @@ void io_u_queued(struct thread_data *td, struct io_u *io_u)
  */
 static struct frand_state *get_buf_state(struct thread_data *td)
 {
-	uint64_t frand_max;
 	unsigned int v;
-	unsigned long r;
 
 	if (!td->o.dedupe_percentage)
 		return &td->buf_state;
@@ -1906,9 +1964,7 @@ static struct frand_state *get_buf_state(struct thread_data *td)
 		return &td->buf_state;
 	}
 
-	frand_max = rand_max(&td->dedupe_state);
-	r = __rand(&td->dedupe_state);
-	v = 1 + (int) (100.0 * (r / (frand_max + 1.0)));
+	v = rand_between(&td->dedupe_state, 1, 100);
 
 	if (v <= td->o.dedupe_percentage)
 		return &td->buf_state_prev;
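The offset arithmetic in __get_next_rand_offset_zoned() reduces to two steps. It starts at stotal% of lastb, then draws uniformly within a span of (send - stotal)% of lastb. In the sketch below, the generator is replaced by a fraction u in [0, 1), which stands in for r / (rand_max + 1.0). zoned_block() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'stotal' and 'send' are the zone's start and end, in percent of lastb;
 * 'u' in [0, 1) plays the role of r / (rand_max + 1.0).
 */
static uint64_t zoned_block(uint64_t lastb, unsigned int stotal,
			    unsigned int send, double u)
{
	uint64_t offset = stotal * lastb / 100ULL;		/* zone start */
	uint64_t span = lastb * (send - stotal) / 100ULL;	/* zone length */

	return offset + (uint64_t)(span * u);
}
```

The hunk also contains a guard that cannot work as written. send is an unsigned int loaded from the uint8_t size_perc, so the send == -1U guard can never be true. Comparing against (uint8_t) -1U, as options.c does, would let it fire.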
diff --git a/lib/rand.h b/lib/rand.h
index a95bd28..49773b0 100644
--- a/lib/rand.h
+++ b/lib/rand.h
@@ -117,6 +117,17 @@ static inline double __rand_0_1(struct frand_state *state)
 	}
 }
 
+/*
+ * Generate a random value between 'start' and 'end', both inclusive
+ */
+static inline int rand_between(struct frand_state *state, int start, int end)
+{
+	uint64_t r;
+
+	r = __rand(state);
+	return start + (int) ((double)end * (r / (rand_max(state) + 1.0)));
+}
+
 extern void init_rand(struct frand_state *, int);
 extern void init_rand_seed(struct frand_state *, unsigned int seed, int);
 extern void __fill_random_buf(void *buf, unsigned int len, unsigned long seed);
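rand_between() is documented as inclusive of both ends, but it scales by end rather than by the number of outcomes. It therefore returns values in start .. start + end - 1. That matches [start, end] only when start is 1, which holds for every call in this series that uses (1, 100). t/gen-rand.c, however, takes start from the command line and indexes buckets[v - start] over end - start + 1 buckets. A start of 0 therefore leaves the top bucket empty, and a start above 1 indexes past the array. The sketch below isolates the mapping from the uniform fraction u = r / (rand_max + 1.0). The function names are hypothetical.

```c
#include <assert.h>

/* Mapping used by rand_between() in this patch; u is in [0, 1). */
static int between_as_patched(double u, int start, int end)
{
	return start + (int)((double)end * u);
}

/* Inclusive mapping for any start <= end: end - start + 1 outcomes. */
static int between_inclusive(double u, int start, int end)
{
	return start + (int)((double)(end - start + 1) * u);
}
```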
diff --git a/options.c b/options.c
index ac2da71..bc2d0ed 100644
--- a/options.c
+++ b/options.c
@@ -706,6 +706,247 @@ static int str_sfr_cb(void *data, const char *str)
 }
 #endif
 
+static int zone_cmp(const void *p1, const void *p2)
+{
+	const struct zone_split *zsp1 = p1;
+	const struct zone_split *zsp2 = p2;
+
+	return (int) zsp2->access_perc - (int) zsp1->access_perc;
+}
+
+static int zone_split_ddir(struct thread_options *o, int ddir, char *str)
+{
+	struct zone_split *zsplit;
+	unsigned int i, perc, perc_missing, sperc, sperc_missing;
+	long long val;
+	char *fname;
+
+	o->zone_split_nr[ddir] = 4;
+	zsplit = malloc(4 * sizeof(struct zone_split));
+
+	i = 0;
+	while ((fname = strsep(&str, ":")) != NULL) {
+		char *perc_str;
+
+		if (!strlen(fname))
+			break;
+
+		/*
+		 * grow struct buffer, if needed
+		 */
+		if (i == o->zone_split_nr[ddir]) {
+			o->zone_split_nr[ddir] <<= 1;
+			zsplit = realloc(zsplit, o->zone_split_nr[ddir]
+						  * sizeof(struct zone_split));
+		}
+
+		perc_str = strstr(fname, "/");
+		if (perc_str) {
+			*perc_str = '\0';
+			perc_str++;
+			perc = atoi(perc_str);
+			if (perc > 100)
+				perc = 100;
+			else if (!perc)
+				perc = -1U;
+		} else
+			perc = -1U;
+
+		if (str_to_decimal(fname, &val, 1, o, 0, 0)) {
+			log_err("fio: zone_split conversion failed\n");
+			free(zsplit);
+			return 1;
+		}
+
+		zsplit[i].access_perc = val;
+		zsplit[i].size_perc = perc;
+		i++;
+	}
+
+	o->zone_split_nr[ddir] = i;
+
+	/*
+	 * Now check if the percentages add up, and how much is missing
+	 */
+	perc = perc_missing = 0;
+	sperc = sperc_missing = 0;
+	for (i = 0; i < o->zone_split_nr[ddir]; i++) {
+		struct zone_split *zsp = &zsplit[i];
+
+		if (zsp->access_perc == (uint8_t) -1U)
+			perc_missing++;
+		else
+			perc += zsp->access_perc;
+
+		if (zsp->size_perc == (uint8_t) -1U)
+			sperc_missing++;
+		else
+			sperc += zsp->size_perc;
+
+	}
+
+	if (perc > 100 || sperc > 100) {
+		log_err("fio: zone_split percentages add to more than 100%%\n");
+		free(zsplit);
+		return 1;
+	}
+	if (perc < 100) {
+		log_err("fio: access percentage don't add up to 100 for zoned "
+			"random distribution (got=%u)\n", perc);
+		free(zsplit);
+		return 1;
+	}
+
+	/*
+	 * If values didn't have a percentage set, divide the remains between
+	 * them.
+	 */
+	if (perc_missing) {
+		if (perc_missing == 1 && o->zone_split_nr[ddir] == 1)
+			perc = 100;
+		for (i = 0; i < o->zone_split_nr[ddir]; i++) {
+			struct zone_split *zsp = &zsplit[i];
+
+			if (zsp->access_perc == (uint8_t) -1U)
+				zsp->access_perc = (100 - perc) / perc_missing;
+		}
+	}
+	if (sperc_missing) {
+		if (sperc_missing == 1 && o->zone_split_nr[ddir] == 1)
+			sperc = 100;
+		for (i = 0; i < o->zone_split_nr[ddir]; i++) {
+			struct zone_split *zsp = &zsplit[i];
+
+			if (zsp->size_perc == (uint8_t) -1U)
+				zsp->size_perc = (100 - sperc) / sperc_missing;
+		}
+	}
+
+	/*
+	 * now sort based on percentages, for ease of lookup
+	 */
+	qsort(zsplit, o->zone_split_nr[ddir], sizeof(struct zone_split), zone_cmp);
+	o->zone_split[ddir] = zsplit;
+	return 0;
+}
+
+static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
+{
+	unsigned int i, j, sprev, aprev;
+
+	td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
+
+	sprev = aprev = 0;
+	for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
+		struct zone_split *zsp = &td->o.zone_split[ddir][i];
+
+		for (j = aprev; j < aprev + zsp->access_perc; j++) {
+			struct zone_split_index *zsi = &td->zone_state_index[ddir][j];
+
+			zsi->size_perc = sprev + zsp->size_perc;
+			zsi->size_perc_prev = sprev;
+		}
+
+		aprev += zsp->access_perc;
+		sprev += zsp->size_perc;
+	}
+}
+
+/*
+ * Generate state table for indexes, so we don't have to do it inline from
+ * the hot IO path
+ */
+static void td_zone_gen_index(struct thread_data *td)
+{
+	int i;
+
+	td->zone_state_index = malloc(DDIR_RWDIR_CNT *
+					sizeof(struct zone_split_index *));
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++)
+		__td_zone_gen_index(td, i);
+}
+
+static int parse_zoned_distribution(struct thread_data *td, const char *input)
+{
+	char *str, *p, *odir, *ddir;
+	int i, ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	/* We expect it to start like that, bail if not */
+	if (strncmp(str, "zoned:", 6)) {
+		log_err("fio: mismatch in zoned input <%s>\n", str);
+		free(p);
+		return 1;
+	}
+	str += strlen("zoned:");
+
+	odir = strchr(str, ',');
+	if (odir) {
+		ddir = strchr(odir + 1, ',');
+		if (ddir) {
+			ret = zone_split_ddir(&td->o, DDIR_TRIM, ddir + 1);
+			if (!ret)
+				*ddir = '\0';
+		} else {
+			char *op;
+
+			op = strdup(odir + 1);
+			ret = zone_split_ddir(&td->o, DDIR_TRIM, op);
+
+			free(op);
+		}
+		if (!ret)
+			ret = zone_split_ddir(&td->o, DDIR_WRITE, odir + 1);
+		if (!ret) {
+			*odir = '\0';
+			ret = zone_split_ddir(&td->o, DDIR_READ, str);
+		}
+	} else {
+		char *op;
+
+		op = strdup(str);
+		ret = zone_split_ddir(&td->o, DDIR_WRITE, op);
+		free(op);
+
+		if (!ret) {
+			op = strdup(str);
+			ret = zone_split_ddir(&td->o, DDIR_TRIM, op);
+			free(op);
+		}
+		if (!ret)
+			ret = zone_split_ddir(&td->o, DDIR_READ, str);
+	}
+
+	free(p);
+
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		int j;
+
+		dprint(FD_PARSE, "zone ddir %d (nr=%u): \n", i, td->o.zone_split_nr[i]);
+
+		for (j = 0; j < td->o.zone_split_nr[i]; j++) {
+			struct zone_split *zsp = &td->o.zone_split[i][j];
+
+			dprint(FD_PARSE, "\t%d: %u/%u\n", j, zsp->access_perc,
+								zsp->size_perc);
+		}
+	}
+
+	if (!ret)
+		td_zone_gen_index(td);
+	else {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++)
+			td->o.zone_split_nr[i] = 0;
+	}
+
+	return ret;
+}
+
 static int str_random_distribution_cb(void *data, const char *str)
 {
 	struct thread_data *td = data;
@@ -721,6 +962,8 @@ static int str_random_distribution_cb(void *data, const char *str)
 		val = FIO_DEF_PARETO;
 	else if (td->o.random_distribution == FIO_RAND_DIST_GAUSS)
 		val = 0.0;
+	else if (td->o.random_distribution == FIO_RAND_DIST_ZONED)
+		return parse_zoned_distribution(td, str);
 	else
 		return 0;
 
@@ -1709,6 +1952,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .oval = FIO_RAND_DIST_GAUSS,
 			    .help = "Normal (gaussian) distribution",
 			  },
+			  { .ival = "zoned",
+			    .oval = FIO_RAND_DIST_ZONED,
+			    .help = "Zoned random distribution",
+			  },
+
 		},
 		.category = FIO_OPT_C_IO,
 		.group	= FIO_OPT_G_RANDOM,
diff --git a/server.h b/server.h
index a726894..fd0a0ce 100644
--- a/server.h
+++ b/server.h
@@ -38,7 +38,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 52,
+	FIO_SERVER_VER			= 53,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/thread_options.h b/thread_options.h
index 384534a..10d7ba6 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -25,12 +25,18 @@ enum fio_memtype {
 #define ERROR_STR_MAX	128
 
 #define BSSPLIT_MAX	64
+#define ZONESPLIT_MAX	64
 
 struct bssplit {
 	uint32_t bs;
 	uint32_t perc;
 };
 
+struct zone_split {
+	uint8_t access_perc;
+	uint8_t size_perc;
+};
+
 #define NR_OPTS_SZ	(FIO_MAX_OPTS / (8 * sizeof(uint64_t)))
 
 #define OPT_MAGIC	0x4f50544e
@@ -135,6 +141,9 @@ struct thread_options {
 	unsigned int random_distribution;
 	unsigned int exitall_error;
 
+	struct zone_split *zone_split[DDIR_RWDIR_CNT];
+	unsigned int zone_split_nr[DDIR_RWDIR_CNT];
+
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
 	fio_fp64_t gauss_dev;
@@ -382,7 +391,9 @@ struct thread_options_pack {
 
 	uint32_t random_distribution;
 	uint32_t exitall_error;
-	uint32_t pad0;
+
+	struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
+	uint32_t zone_split_nr[DDIR_RWDIR_CNT];
 
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
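In the lib/rand.h hunk above, `rand_between()` turns a raw generator value `r` in `[0, rand_max]` into an integer by scaling the fraction `r / (rand_max + 1)`, which lies in `[0, 1)`. The patch scales by `end` rather than by the width of the range, so the result lies in `[start, start + end - 1]`. That matches the documented inclusive `[start, end]` only when `start == 1`, as in the dedupe caller `rand_between(&td->dedupe_state, 1, 100)`. The sketch below is a minimal illustration of the general inclusive mapping next to the patched one. `SKETCH_RAND_MAX`, `map_between()` and `map_as_patched()` are hypothetical names, and a fixed 32-bit maximum stands in for fio's `frand_state`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rand_max(state): a full 32-bit generator. */
#define SKETCH_RAND_MAX 0xffffffffULL

/*
 * Map a raw value r in [0, SKETCH_RAND_MAX] onto [start, end], both
 * inclusive: r / (max + 1.0) lies in [0, 1), so scaling it by the range
 * width (end - start + 1) and truncating gives an offset in [0, width).
 */
static int map_between(uint64_t r, int start, int end)
{
	assert(start <= end && r <= SKETCH_RAND_MAX);
	return start + (int) ((double) (end - start + 1) *
			      (r / (SKETCH_RAND_MAX + 1.0)));
}

/* The scaling used by rand_between() in the patch, kept for comparison. */
static int map_as_patched(uint64_t r, int start, int end)
{
	return start + (int) ((double) end * (r / (SKETCH_RAND_MAX + 1.0)));
}
```

For example, with a raw value at the generator maximum, `map_between(SKETCH_RAND_MAX, 5, 10)` returns 10, whereas `map_as_patched(SKETCH_RAND_MAX, 5, 10)` returns 14. For `1..100` both stay within `[1, 100]`.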

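The zoned distribution added to options.c parses strings such as `zoned:60/10:30/20:10/70`. Per `zone_split_ddir()`, each `access/size` pair gives a percentage of accesses and the percentage of the file size they should fall in. Comma-separated groups apply to reads, writes and trims in that order, and a missing group reuses the preceding one. `__td_zone_gen_index()` then expands the sorted pairs into a 100-entry table, so the hot path can map a uniform value in `[0, 99]` straight to a size band. The sketch below mirrors that table build with local copies of the two structs. The lookup function `zone_range()` is hypothetical, because the hot-path code that consumes `zone_state_index` is not part of this excerpt.

```c
#include <assert.h>

/* Local copy of struct zone_split from thread_options.h in the patch. */
struct zone_split {
	unsigned char access_perc;
	unsigned char size_perc;
};

/* Local copy of struct zone_split_index: the size band [prev, end). */
struct zone_split_index {
	unsigned char size_perc;
	unsigned char size_perc_prev;
};

/*
 * Same walk as __td_zone_gen_index(): each of the 100 access-percent
 * slots records the cumulative size band of the zone that owns it.
 * Access percentages must sum to 100, as zone_split_ddir() enforces.
 */
static void gen_index(const struct zone_split *zs, unsigned int nr,
		      struct zone_split_index idx[100])
{
	unsigned int i, j, sprev = 0, aprev = 0;

	for (i = 0; i < nr; i++) {
		assert(aprev + zs[i].access_perc <= 100);
		for (j = aprev; j < aprev + zs[i].access_perc; j++) {
			idx[j].size_perc = sprev + zs[i].size_perc;
			idx[j].size_perc_prev = sprev;
		}
		aprev += zs[i].access_perc;
		sprev += zs[i].size_perc;
	}
	assert(aprev == 100);
}

/*
 * Hypothetical hot-path lookup: given a uniform value v in [0, 99] and a
 * file size in bytes, return the byte range [*lo, *hi) that the next
 * random offset should be drawn from.
 */
static void zone_range(const struct zone_split_index idx[100], unsigned int v,
		       unsigned long long size,
		       unsigned long long *lo, unsigned long long *hi)
{
	assert(v < 100);
	*lo = size * idx[v].size_perc_prev / 100;
	*hi = size * idx[v].size_perc / 100;
}
```

With the example string, slots 0 to 59 map to the first 10% of the file, slots 60 to 89 to the next 20%, and slots 90 to 99 to the remaining 70%. Roughly 60% of random I/O therefore lands in the first tenth of the file.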
* Recent changes (master)
@ 2016-03-04 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-03-04 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 2cafffbea5d2ed2f20d73efa0d82baa9046e0b12:

  Add support for preadv2/pwritev2 (2016-02-26 11:02:54 -0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8116fd24b737c9d878ccb6a4cc13cc4f974dc2dc:

  Update documentation for random_distribution=gauss (2016-03-03 14:00:54 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Update documentation for random_distribution=gauss

 HOWTO |  5 ++++-
 fio.1 | 16 ++++++++++------
 2 files changed, 14 insertions(+), 7 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index c37a9e0..d70c9af 100644
--- a/HOWTO
+++ b/HOWTO
@@ -963,6 +963,7 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		random		Uniform random distribution
 		zipf		Zipf distribution
 		pareto		Pareto distribution
+		gauss		Normal (gaussian) distribution
 
 		When using a zipf or pareto distribution, an input value
 		is also needed to define the access pattern. For zipf, this
@@ -971,7 +972,9 @@ random_distribution=str:float	By default, fio will use a completely uniform
 		what the given input values will yield in terms of hit rates.
 		If you wanted to use zipf with a theta of 1.2, you would use
 		random_distribution=zipf:1.2 as the option. If a non-uniform
-		model is used, fio will disable use of the random map.
+		model is used, fio will disable use of the random map. For
+		the gauss distribution, a normal deviation is supplied as
+		a value between 0 and 100.
 
 percentage_random=int	For a random workload, set how big a percentage should
 		be random. This defaults to 100%, in which case the workload
diff --git a/fio.1 b/fio.1
index f98802a..81c266b 100644
--- a/fio.1
+++ b/fio.1
@@ -874,15 +874,19 @@ Zipf distribution
 .B pareto
 Pareto distribution
 .TP
+.B gauss
+Normal (gaussian) distribution
+.TP
 .RE
 .P
-When using a zipf or pareto distribution, an input value is also needed to
-define the access pattern. For zipf, this is the zipf theta. For pareto,
-it's the pareto power. Fio includes a test program, genzipf, that can be
-used visualize what the given input values will yield in terms of hit rates.
-If you wanted to use zipf with a theta of 1.2, you would use
+When using a \fBzipf\fR or \fBpareto\fR distribution, an input value is also
+needed to define the access pattern. For \fBzipf\fR, this is the zipf theta.
+For \fBpareto\fR, it's the pareto power. Fio includes a test program, genzipf,
+that can be used to visualize what the given input values will yield in terms of
+hit rates. If you wanted to use \fBzipf\fR with a theta of 1.2, you would use
 random_distribution=zipf:1.2 as the option. If a non-uniform model is used,
-fio will disable use of the random map.
+fio will disable use of the random map. For the \fBgauss\fR distribution, a
+normal deviation is supplied as a value between 0 and 100.
 .TP
 .BI percentage_random \fR=\fPint
 For a random workload, set how big a percentage should be random. This defaults

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-02-27 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-02-27 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e6727cbdd34a6ec30b437311724a6aa60e6b06fd:

  ioengines: account any queued IO on the engine side (2016-02-25 12:24:47 -0800)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 2cafffbea5d2ed2f20d73efa0d82baa9046e0b12:

  Add support for preadv2/pwritev2 (2016-02-26 11:02:54 -0800)

----------------------------------------------------------------
Jens Axboe (1):
      Add support for preadv2/pwritev2

 HOWTO          |  3 +++
 configure      | 19 ++++++++++++++++
 engines/sync.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fio.1          |  7 ++++++
 options.c      |  5 +++++
 5 files changed, 105 insertions(+)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 5e765d4..c37a9e0 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1784,6 +1784,9 @@ that defines them is selected.
 		enabled when polling for a minimum of 0 events (eg when
 		iodepth_batch_complete=0).
 
+[pvsync2] hipri		Set RWF_HIPRI on IO, indicating to the kernel that
+			it's of higher priority than normal.
+
 [cpu] cpuload=int Attempt to use the specified percentage of CPU cycles.
 
 [cpu] cpuchunks=int Split the load into cycles of the given time. In
diff --git a/configure b/configure
index cbd4d30..6e2488c 100755
--- a/configure
+++ b/configure
@@ -1241,6 +1241,22 @@ fi
 echo "pwritev/preadv                $pwritev"
 
 ##########################################
+# Check whether we have pwritev2/preadv2
+pwritev2="no"
+cat > $TMPC << EOF
+#include <stdio.h>
+#include <sys/uio.h>
+int main(int argc, char **argv)
+{
+  return pwritev2(0, NULL, 1, 0, 0) + preadv2(0, NULL, 1, 0, 0);
+}
+EOF
+if compile_prog "" "" "pwritev2"; then
+  pwritev2="yes"
+fi
+echo "pwritev2/preadv2              $pwritev2"
+
+##########################################
 # Check whether we have the required functions for ipv6
 ipv6="no"
 cat > $TMPC << EOF
@@ -1742,6 +1758,9 @@ fi
 if test "$pwritev" = "yes" ; then
   output_sym "CONFIG_PWRITEV"
 fi
+if test "$pwritev2" = "yes" ; then
+  output_sym "CONFIG_PWRITEV2"
+fi
 if test "$ipv6" = "yes" ; then
   output_sym "CONFIG_IPV6"
 fi
diff --git a/engines/sync.c b/engines/sync.c
index f5801fe..0b0d1a7 100644
--- a/engines/sync.c
+++ b/engines/sync.c
@@ -13,6 +13,7 @@
 #include <assert.h>
 
 #include "../fio.h"
+#include "../optgroup.h"
 
 /*
  * Sync engine uses engine_data to store last offset
@@ -31,6 +32,28 @@ struct syncio_data {
 	enum fio_ddir last_ddir;
 };
 
+#ifdef CONFIG_PWRITEV2
+struct psyncv2_options {
+	void *pad;
+	unsigned int hipri;
+};
+
+static struct fio_option options[] = {
+	{
+		.name	= "hipri",
+		.lname	= "RWF_HIPRI",
+		.type	= FIO_OPT_STR_SET,
+		.off1	= offsetof(struct psyncv2_options, hipri),
+		.help	= "Set RWF_HIPRI for pwritev2/preadv2",
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
+		.name	= NULL,
+	},
+};
+#endif
+
 static int fio_syncio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
@@ -98,6 +121,38 @@ static int fio_pvsyncio_queue(struct thread_data *td, struct io_u *io_u)
 }
 #endif
 
+#ifdef CONFIG_PWRITEV2
+static int fio_pvsyncio2_queue(struct thread_data *td, struct io_u *io_u)
+{
+	struct syncio_data *sd = td->io_ops->data;
+	struct psyncv2_options *o = td->eo;
+	struct iovec *iov = &sd->iovecs[0];
+	struct fio_file *f = io_u->file;
+	int ret, flags = 0;
+
+	fio_ro_check(td, io_u);
+
+	if (o->hipri)
+		flags |= RWF_HIPRI;
+
+	iov->iov_base = io_u->xfer_buf;
+	iov->iov_len = io_u->xfer_buflen;
+
+	if (io_u->ddir == DDIR_READ)
+		ret = preadv2(f->fd, iov, 1, io_u->offset, flags);
+	else if (io_u->ddir == DDIR_WRITE)
+		ret = pwritev2(f->fd, iov, 1, io_u->offset, flags);
+	else if (io_u->ddir == DDIR_TRIM) {
+		do_io_u_trim(td, io_u);
+		return FIO_Q_COMPLETED;
+	} else
+		ret = do_io_u_sync(td, io_u);
+
+	return fio_io_end(td, io_u, ret);
+}
+#endif
+
+
 static int fio_psyncio_queue(struct thread_data *td, struct io_u *io_u)
 {
 	struct fio_file *f = io_u->file;
@@ -374,6 +429,22 @@ static struct ioengine_ops ioengine_pvrw = {
 };
 #endif
 
+#ifdef CONFIG_PWRITEV2
+static struct ioengine_ops ioengine_pvrw2 = {
+	.name		= "pvsync2",
+	.version	= FIO_IOOPS_VERSION,
+	.init		= fio_vsyncio_init,
+	.cleanup	= fio_vsyncio_cleanup,
+	.queue		= fio_pvsyncio2_queue,
+	.open_file	= generic_open_file,
+	.close_file	= generic_close_file,
+	.get_file_size	= generic_get_file_size,
+	.flags		= FIO_SYNCIO,
+	.options	= options,
+	.option_struct_size	= sizeof(struct psyncv2_options),
+};
+#endif
+
 static void fio_init fio_syncio_register(void)
 {
 	register_ioengine(&ioengine_rw);
diff --git a/fio.1 b/fio.1
index 690c8f4..f98802a 100644
--- a/fio.1
+++ b/fio.1
@@ -591,6 +591,9 @@ coalescing adjacent IOs into a single submission.
 .B pvsync
 Basic \fBpreadv\fR\|(2) or \fBpwritev\fR\|(2) I/O.
 .TP
+.B pvsync2
+Basic \fBpreadv2\fR\|(2) or \fBpwritev2\fR\|(2) I/O.
+.TP
 .B libaio
 Linux native asynchronous I/O. This ioengine defines engine specific options.
 .TP
@@ -1647,6 +1650,10 @@ from user-space to reap events. The reaping mode is only
 enabled when polling for a minimum of 0 events (eg when
 iodepth_batch_complete=0).
 .TP
+.BI (pvsync2)hipri
+Set RWF_HIPRI on IO, indicating to the kernel that it's of
+higher priority than normal.
+.TP
 .BI (net,netsplice)hostname \fR=\fPstr
 The host name or IP address to use for TCP or UDP based IO.
 If the job is a TCP listener or UDP reader, the hostname is not
diff --git a/options.c b/options.c
index 3902087..ac2da71 100644
--- a/options.c
+++ b/options.c
@@ -1240,6 +1240,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 			    .help = "Use preadv/pwritev",
 			  },
 #endif
+#ifdef CONFIG_PWRITEV2
+			  { .ival = "pvsync2",
+			    .help = "Use preadv2/pwritev2",
+			  },
+#endif
 #ifdef CONFIG_LIBAIO
 			  { .ival = "libaio",
 			    .help = "Linux native asynchronous IO",

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
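The `pvsync2` engine's queue hook, `fio_pvsyncio2_queue()`, issues one `preadv2()` or `pwritev2()` per I/O on a single iovec and adds `RWF_HIPRI` only when the `hipri` option is set. Below is a minimal standalone sketch of that flag handling. It assumes glibc 2.26 or later, which declares `preadv2()`, `pwritev2()` and `RWF_HIPRI` under `_GNU_SOURCE`, and a kernel of 4.6 or later. `pv2_io()` and `pv2_roundtrip()` are hypothetical helpers. The retry without the flag when the kernel rejects it is an assumption of this sketch; the fio engine does not retry.

```c
#define _GNU_SOURCE		/* preadv2()/pwritev2() and RWF_* in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001	/* value from <linux/fs.h> */
#endif

/*
 * One synchronous transfer through a single iovec. As in
 * fio_pvsyncio2_queue(), RWF_HIPRI is added only when 'hipri' is set.
 * Returns the number of bytes transferred, or -1 with errno set.
 */
static ssize_t pv2_io(int fd, int is_write, void *buf, size_t len,
		      off_t off, int hipri)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	int flags = hipri ? RWF_HIPRI : 0;
	ssize_t ret;

	ret = is_write ? pwritev2(fd, &iov, 1, off, flags)
		       : preadv2(fd, &iov, 1, off, flags);

	/* Sketch-only assumption: retry once without a rejected flag. */
	if (ret < 0 && flags && (errno == EOPNOTSUPP || errno == EINVAL))
		ret = is_write ? pwritev2(fd, &iov, 1, off, 0)
			       : preadv2(fd, &iov, 1, off, 0);
	return ret;
}

/*
 * Usage demo: write 'msg' at offset 0 of a scratch file, read it back
 * the same way and compare. Returns 0 on success, -1 on any failure.
 */
static int pv2_roundtrip(const char *path, const char *msg, int hipri)
{
	char buf[64] = { 0 };
	size_t len = strlen(msg);
	int fd, ok;

	assert(len < sizeof(buf));
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	ok = pv2_io(fd, 1, (void *) msg, len, 0, hipri) == (ssize_t) len &&
	     pv2_io(fd, 0, buf, len, 0, hipri) == (ssize_t) len &&
	     memcmp(buf, msg, len) == 0;

	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```

For example, `pv2_roundtrip("/tmp/pv2-sketch", "hello", 1)` writes five bytes with `RWF_HIPRI` requested, reads them back and returns 0 if they match. The patch's `configure` probe only checks that `pwritev2()`/`preadv2()` compile and link. Whether the flag has any effect generally depends on the kernel, the device, and on the I/O being `O_DIRECT`.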

* Recent changes (master)
@ 2016-02-26 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-02-26 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit a6cb85e25b31553aab043fe763ce4c999197dcff:

  Allow IO engine driven allocations of IO structures (2016-02-24 16:29:08 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e6727cbdd34a6ec30b437311724a6aa60e6b06fd:

  ioengines: account any queued IO on the engine side (2016-02-25 12:24:47 -0800)

----------------------------------------------------------------
Jens Axboe (2):
      backend: ensure that fio_io_sync() commits IN on queued status
      ioengines: account any queued IO on the engine side

 backend.c   | 2 ++
 ioengines.c | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/backend.c b/backend.c
index fd81849..6083a51 100644
--- a/backend.c
+++ b/backend.c
@@ -309,6 +309,8 @@ requeue:
 		put_io_u(td, io_u);
 		return true;
 	} else if (ret == FIO_Q_QUEUED) {
+		if (td_io_commit(td))
+			return true;
 		if (io_u_queued_complete(td, 1) < 0)
 			return true;
 	} else if (ret == FIO_Q_COMPLETED) {
diff --git a/ioengines.c b/ioengines.c
index decc9cb..b89a121 100644
--- a/ioengines.c
+++ b/ioengines.c
@@ -342,10 +342,10 @@ int td_io_queue(struct thread_data *td, struct io_u *io_u)
 	} else if (ret == FIO_Q_QUEUED) {
 		int r;
 
-		if (ddir_rw(io_u->ddir)) {
-			td->io_u_queued++;
+		td->io_u_queued++;
+
+		if (ddir_rw(io_u->ddir))
 			td->ts.total_io_u[io_u->ddir]++;
-		}
 
 		if (td->io_u_queued >= td->o.iodepth_batch) {
 			r = td_io_commit(td);

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
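The two patches above change how queued I/O is accounted. In `td_io_queue()`, every unit an engine reports as `FIO_Q_QUEUED` now increments `td->io_u_queued`, while the per-direction `total_io_u[]` counters are still only updated for data directions. Separately, `fio_io_sync()` now commits after a queued sync before waiting for its completion. A hypothetical, self-contained model of the counting change follows. `struct sk_td`, the `sk_*` names and the reset of the counter on commit are assumptions standing in for fio's `thread_data` and `td_io_commit()`. Data directions are assumed to be read, write and trim.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical directions; the first three model fio's data directions. */
enum sk_ddir { SK_READ, SK_WRITE, SK_TRIM, SK_SYNC };

struct sk_td {
	unsigned int io_u_queued;	/* queued since the last commit */
	unsigned long total_io_u[3];	/* per-direction totals, data only */
	unsigned int iodepth_batch;	/* commit threshold */
};

static bool sk_is_data(enum sk_ddir d)
{
	return d == SK_READ || d == SK_WRITE || d == SK_TRIM;
}

/* Returns true when the batch threshold is reached and a commit is due. */
static bool sk_commit_due(struct sk_td *td)
{
	if (td->io_u_queued >= td->iodepth_batch) {
		td->io_u_queued = 0;	/* assumed: committing resets the count */
		return true;
	}
	return false;
}

/* After the patch: every queued unit counts toward the batch. */
static bool sk_account_queued(struct sk_td *td, enum sk_ddir d)
{
	td->io_u_queued++;
	if (sk_is_data(d))
		td->total_io_u[d]++;
	return sk_commit_due(td);
}

/* Before the patch: only data directions counted toward the batch. */
static bool sk_account_queued_old(struct sk_td *td, enum sk_ddir d)
{
	if (sk_is_data(d)) {
		td->io_u_queued++;
		td->total_io_u[d]++;
	}
	return sk_commit_due(td);
}
```

With `iodepth_batch = 2`, a queued write followed by a queued sync now reaches the threshold and triggers a commit. Under the old rule the sync was not counted, so the batch stayed one short.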

* Recent changes (master)
@ 2016-02-25 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-02-25 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit c0821a08db2fd3e37e16db73c3c4c9c7ff601e88:

  Allow for the include file specification to be relative. (2016-02-13 12:31:00 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to a6cb85e25b31553aab043fe763ce4c999197dcff:

  Allow IO engine driven allocations of IO structures (2016-02-24 16:29:08 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      Allow IO engine driven allocations of IO structures

 ioengine.h |  4 +++-
 memory.c   | 17 +++++++++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)

---

Diff of recent changes:

diff --git a/ioengine.h b/ioengine.h
index 6734c7b..161acf5 100644
--- a/ioengine.h
+++ b/ioengine.h
@@ -16,7 +16,7 @@
 #include <guasi.h>
 #endif
 
-#define FIO_IOOPS_VERSION	22
+#define FIO_IOOPS_VERSION	23
 
 enum {
 	IO_U_F_FREE		= 1 << 0,
@@ -157,6 +157,8 @@ struct ioengine_ops {
 	int (*unlink_file)(struct thread_data *, struct fio_file *);
 	int (*get_file_size)(struct thread_data *, struct fio_file *);
 	void (*terminate)(struct thread_data *);
+	int (*iomem_alloc)(struct thread_data *, size_t);
+	void (*iomem_free)(struct thread_data *);
 	int (*io_u_init)(struct thread_data *, struct io_u *);
 	void (*io_u_free)(struct thread_data *, struct io_u *);
 	int option_struct_size;
diff --git a/memory.c b/memory.c
index 5060223..c04d7df 100644
--- a/memory.c
+++ b/memory.c
@@ -229,7 +229,17 @@ int allocate_io_mem(struct thread_data *td)
 
 	dprint(FD_MEM, "Alloc %llu for buffers\n", (unsigned long long) total_mem);
 
-	if (td->o.mem_type == MEM_MALLOC)
+	/*
+	 * If the IO engine has hooks to allocate/free memory, use those. But
+	 * error out if the user explicitly asked for something else.
+	 */
+	if (td->io_ops->iomem_alloc) {
+		if (fio_option_is_set(&td->o, mem_type)) {
+			log_err("fio: option 'mem/iomem' conflicts with specified IO engine\n");
+			ret = 1;
+		} else
+			ret = td->io_ops->iomem_alloc(td, total_mem);
+	} else if (td->o.mem_type == MEM_MALLOC)
 		ret = alloc_mem_malloc(td, total_mem);
 	else if (td->o.mem_type == MEM_SHM || td->o.mem_type == MEM_SHMHUGE)
 		ret = alloc_mem_shm(td, total_mem);
@@ -255,7 +265,10 @@ void free_io_mem(struct thread_data *td)
 	if (td->o.odirect || td->o.oatomic)
 		total_mem += page_mask;
 
-	if (td->o.mem_type == MEM_MALLOC)
+	if (td->io_ops->iomem_alloc) {
+		if (td->io_ops->iomem_free)
+			td->io_ops->iomem_free(td);
+	} else if (td->o.mem_type == MEM_MALLOC)
 		free_mem_malloc(td);
 	else if (td->o.mem_type == MEM_SHM || td->o.mem_type == MEM_SHMHUGE)
 		free_mem_shm(td);

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
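With the new `iomem_alloc`/`iomem_free` hooks, `allocate_io_mem()` hands buffer allocation to the I/O engine when it provides `iomem_alloc`. It refuses the job if the user also set `mem`/`iomem` explicitly. `free_io_mem()` keys off the same `iomem_alloc` pointer. The sketch below models that dispatch with hypothetical stand-ins: `struct sk_ops`, `struct sk_td`, and a `mem_type_set` flag in place of `fio_option_is_set(&td->o, mem_type)`. Plain `malloc()` stands in for fio's `mem_type` switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sk_td;

/* Hypothetical subset of struct ioengine_ops: only the new hooks. */
struct sk_ops {
	int (*iomem_alloc)(struct sk_td *, size_t);
	void (*iomem_free)(struct sk_td *);
};

struct sk_td {
	const struct sk_ops *io_ops;
	bool mem_type_set;	/* stands in for fio_option_is_set(..., mem_type) */
	void *buf;
};

/* Returns 0 on success, 1 on failure or on an option conflict. */
static int sk_allocate_io_mem(struct sk_td *td, size_t total)
{
	if (td->io_ops->iomem_alloc) {
		if (td->mem_type_set)
			return 1;	/* 'mem/iomem' conflicts with the engine */
		return td->io_ops->iomem_alloc(td, total);
	}
	td->buf = malloc(total);	/* default path, like MEM_MALLOC */
	return td->buf ? 0 : 1;
}

static void sk_free_io_mem(struct sk_td *td)
{
	if (td->io_ops->iomem_alloc) {
		/* Engine-owned memory: only the engine may release it. */
		if (td->io_ops->iomem_free)
			td->io_ops->iomem_free(td);
	} else {
		free(td->buf);
	}
	td->buf = NULL;
}

/* A hypothetical engine that owns its buffers via calloc(). */
static int demo_alloc(struct sk_td *td, size_t total)
{
	td->buf = calloc(1, total);
	return td->buf ? 0 : 1;
}

static void demo_free(struct sk_td *td)
{
	free(td->buf);
}

static const struct sk_ops demo_engine = { demo_alloc, demo_free };
static const struct sk_ops plain_engine = { NULL, NULL };
```

This sketch mirrors one consequence of the patch: an engine that supplies `iomem_alloc` but no `iomem_free` gets no default cleanup, so it must release its buffers some other way.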

* Re: Recent changes (master)
  2016-02-14 13:00 Jens Axboe
@ 2016-02-15 14:26 ` Erwan Velu
  0 siblings, 0 replies; 1305+ messages in thread
From: Erwan Velu @ 2016-02-15 14:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Thanks, Jens, for clarifying this use case.

The part I don't really get is _when_ the log is reported.
From a quick read of the source code, I don't see how the committed IO is reported on the server side.
I did find some code where we fire the IO, but not where the IO is completed. In a power-failure case, it's pretty important to report only the completed ones.
Thanks,
Erwan

^ permalink raw reply related	[flat|nested] 1305+ messages in thread

* Recent changes (master)
@ 2016-02-14 13:00 Jens Axboe
  2016-02-15 14:26 ` Erwan Velu
  0 siblings, 1 reply; 1305+ messages in thread
From: Jens Axboe @ 2016-02-14 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit e0ee7a8ba4e6badc6cb73814315aa11c15d86ef9:

  fio.1: man page fixes (2016-02-12 15:00:39 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to c0821a08db2fd3e37e16db73c3c4c9c7ff601e88:

  Allow for the include file specification to be relative. (2016-02-13 12:31:00 -0700)

----------------------------------------------------------------
Andrey Kuzmin (1):
      Allow for the include file specification to be relative.

 init.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

---

Diff of recent changes:

diff --git a/init.c b/init.c
index 5ee4082..c7ce2cc 100644
--- a/init.c
+++ b/init.c
@@ -1773,15 +1773,48 @@ int __parse_jobs_ini(struct thread_data *td,
 			strip_blank_end(p);
 
 			if (!strncmp(p, "include", strlen("include"))) {
-				char *filename = p + strlen("include") + 1;
+				char *filename = p + strlen("include") + 1,
+					*ts, *full_fn = NULL;
 
-				if ((ret = __parse_jobs_ini(td, filename,
-						is_buf, stonewall_flag, type, 1,
-						name, &opts, &alloc_opts, &num_opts))) {
-					log_err("Error %d while parsing include file %s\n",
+				/*
+				 * Allow for the include filename
+				 * specification to be relative.
+				 */
+				if (access(filename, F_OK) &&
+				    (ts = strrchr(file, '/'))) {
+					int len = ts - file +
+						strlen(filename) + 2;
+
+					if (!(full_fn = calloc(1, len))) {
+						ret = ENOMEM;
+						break;
+					}
+
+					strncpy(full_fn,
+						file, (ts - file) + 1);
+					strncpy(full_fn + (ts - file) + 1,
+						filename, strlen(filename));
+					full_fn[len - 1] = 0;
+					filename = full_fn;
+				}
+
+				ret = __parse_jobs_ini(td, filename, is_buf,
+						       stonewall_flag, type, 1,
+						       name, &opts,
+						       &alloc_opts, &num_opts);
+
+				if (ret) {
+					log_err("Error %d while parsing "
+						"include file %s\n",
 						ret, filename);
-					break;
 				}
+
+				if (full_fn)
+					free(full_fn);
+
+				if (ret)
+					break;
+
 				continue;
 			}
 

^ permalink raw reply related	[flat|nested] 1305+ messages in thread
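The include change in `__parse_jobs_ini()` retries a missing include name relative to the directory of the job file being parsed. If `access(filename, F_OK)` fails and the including path contains a `/`, it copies everything up to and including the last `/` and appends the include name. A minimal sketch of the same rule follows. `resolve_include()` is a hypothetical helper. Unlike the patch, it always returns a freshly allocated string so that the caller can free it unconditionally.

```c
#define _POSIX_C_SOURCE 200809L	/* strdup(), access() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve an include name the way the patch does: use it as given if it
 * exists, or if the including file has no directory part; otherwise
 * prefix the including file's directory (keeping its trailing '/').
 * Returns a malloc()ed string the caller must free, or NULL on ENOMEM.
 */
static char *resolve_include(const char *including, const char *name)
{
	const char *slash = strrchr(including, '/');
	size_t dirlen;
	char *full;

	if (!access(name, F_OK) || !slash)
		return strdup(name);

	dirlen = (size_t) (slash - including) + 1;
	full = malloc(dirlen + strlen(name) + 1);
	if (!full)
		return NULL;

	memcpy(full, including, dirlen);
	strcpy(full + dirlen, name);
	return full;
}
```

For example, suppose `jobs/main.fio` contains `include common.fio`, fio is started from another directory, and that directory has no `common.fio`. Then `resolve_include("jobs/main.fio", "common.fio")` yields `jobs/common.fio`.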

* Recent changes (master)
@ 2016-02-13 13:00 Jens Axboe
  0 siblings, 0 replies; 1305+ messages in thread
From: Jens Axboe @ 2016-02-13 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 8a768c2e725d6a527b904570949f6099c3f1434a:

  server: don't make SO_REUSEPORT errors fatal (2016-02-10 08:32:13 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to e0ee7a8ba4e6badc6cb73814315aa11c15d86ef9:

  fio.1: man page fixes (2016-02-12 15:00:39 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      Update documentation
      fio.1: man page fixes

 HOWTO |   3 +-
 fio.1 | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 200 insertions(+), 11 deletions(-)

---

Diff of recent changes:

diff --git a/HOWTO b/HOWTO
index 6d80360..5e765d4 100644
--- a/HOWTO
+++ b/HOWTO
@@ -10,6 +10,7 @@ Table of contents
 7. Terse output
 8. Trace file format
 9. CPU idleness profiling
+10. Verification and triggers
 
 1.0 Overview and history
 ------------------------
@@ -2219,7 +2220,7 @@ localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reb
 For this case, fio would wait for the server to send us the write state,
 then execute 'ipmi-reboot server' when that happened.
 
-10.1 Loading verify state
+10.2 Loading verify state
 -------------------------
 To load store write state, read verification job file must contain
 the verify_state_load option. If that is set, fio will load the previously
diff --git a/fio.1 b/fio.1
index 246bcd2..690c8f4 100644
--- a/fio.1
+++ b/fio.1
@@ -1988,6 +1988,194 @@ Error Info (dependent on continue_on_error, default off):
 .P
 .B text description (if provided in config - appears on newline)
 .RE
+.SH TRACE FILE FORMAT
+There are two trace file formats that you may encounter. The older (v1) format
+has been unsupported since version 1.20-rc3 (March 2008). It is still described
+below in case you encounter an old trace and want to understand it.
+
+In either case, the trace is a simple text file with a single action per line.
+
+.P
+.B Trace file format v1
+.RS
+Each line represents a single io action in the following format:
+
+rw, offset, length
+
+where rw=0/1 for read/write, and the offset and length entries are in bytes.
+
+This format is not supported in Fio versions >= 1.20-rc3.
+
+.RE
+.P
+.B Trace file format v2
+.RS
+The second version of the trace file format was added in Fio version 1.17.
+It allows access to more than one file per trace and has a larger set of
+possible file actions.
+
+The first line of the trace file has to be:
+
+\fBfio version 2 iolog\fR
+
+This can be followed by lines in two different formats, which are described below.
+The file management format:
+
+\fBfilename action\fR
+
+The filename is given as an absolute path. The action can be one of these:
+
+.P
+.PD 0
+.RS
+.TP
+.B add
+Add the given filename to the trace
+.TP
+.B open
+Open the file with the given filename. The filename has to have been previously
+added with the \fBadd\fR action.
+.TP
+.B close
+Close the file with the given filename. The file must have previously been
+opened.
+.RE
+.PD
+.P
+
+The file io action format:
+
+\fBfilename action offset length\fR
+
+The filename is given as an absolute path, and has to have been added and opened
+before it can be used with this format. The offset and length are given in
+bytes. The action can be one of these:
+
+.P
+.PD 0
+.RS
+.TP
+.B wait
+Wait for 'offset' microseconds. Everything below 100 is discarded.  The time is
+relative to the previous wait statement.
+.TP
+.B read
+Read \fBlength\fR bytes beginning from \fBoffset\fR
+.TP
+.B write
+Write \fBlength\fR bytes beginning from \fBoffset\fR
+.TP
+.B sync
+fsync() the file
+.TP
+.B datasync
+fdatasync() the file
+.TP
+.B trim
+trim the given file from the given \fBoffset\fR for \fBlength\fR bytes
+.RE
+.PD
+.P
+
+.SH CPU IDLENESS PROFILING
+In some cases, we want to understand CPU overhead in a test. For example,
+when testing patches, we may want to know whether they reduce CPU usage.
+fio implements a balloon approach to create a thread per CPU that runs at
+idle priority, meaning that it only runs when nobody else needs the CPU.
+By measuring the amount of work completed by the thread, idleness of each
+CPU can be derived accordingly.
+
+A unit of work is defined as touching a full page of unsigned characters. The
+mean and standard deviation of the time to complete a unit of work are reported
+in the "unit work" section. Options can be chosen to report detailed per-CPU
+idleness or overall system idleness by aggregating per-CPU stats.
+
+.SH VERIFICATION AND TRIGGERS
+When data verification is done, fio is usually run in one of two ways. The
+first is a normal write job of some sort with verify enabled. When the
+write phase has completed, fio switches to reads and verifies everything
+it wrote. The second model is running just the write phase, and then later
+on running the same job (but with reads instead of writes) to repeat the
+same IO patterns and verify the contents. Both of these methods depend
+on the write phase being completed, as fio otherwise has no idea how much
+data was written.
+
+With verification triggers, fio supports dumping the current write state
+to local files. Then a subsequent read verify workload can load this state
+and know exactly where to stop. This is useful for testing cases where
+power is cut to a server in a managed fashion, for instance.
+
+A verification trigger consists of two things:
+
+.RS
+Storing the write state of each job
+.LP
+Executing a trigger command
+.RE
+
+The write state is relatively small, on the order of hundreds of bytes
+to single kilobytes. It contains information on the number of completions
+done, the last X completions, etc.
+
+A trigger is invoked either through creation (\fBtouch\fR) of a specified
+file in the system, or through a timeout setting. If fio is run with
+\fB\-\-trigger\-file=/tmp/trigger-file\fR, then it will continually check for
+the existence of /tmp/trigger-file. When it sees this file, it will
+fire off the trigger (thus saving state, and executing the trigger
+command).
+
+For client/server runs, there are both local and remote triggers. If
+fio is running as a server backend, it will send the job states back
+to the client for safe storage, then execute the remote trigger, if
+specified. If a local trigger is specified, the server will still send
+back the write state, but the client will then execute the trigger.
+
+.RE
+.P
+.B Verification trigger example
+.RS
+
+Let's say we want to run a powercut test on the remote machine 'server'.
+Our write workload is in write-test.fio. We want to cut power to 'server'
+at some point during the run, and we'll run this test from the safety
+of our local machine, 'localbox'. On the server, we'll start the fio
+backend normally:
+
+server# \fBfio \-\-server\fR
+
+and on the client, we'll fire off the workload:
+
+localbox$ \fBfio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger\-remote="bash \-c 'echo b > /proc/sysrq\-trigger'" write\-test.fio\fR
+
+We set \fB/tmp/my-trigger\fR as the trigger file, and we tell fio to execute
+
+\fBecho b > /proc/sysrq-trigger\fR
+
+on the server once it has received the trigger and sent us the write
+state. This will work, but it's not \fIreally\fR cutting power to the server;
+it's merely abruptly rebooting it. If we have a remote way of cutting
+power to the server through IPMI or similar, we could do that through
+a local trigger command instead. Let's assume we have a script, ipmi-reboot,
+that performs an IPMI reboot of a given hostname. On localbox, we could
+then run fio with a local trigger instead:
+
+localbox$ \fBfio \-\-client=server \-\-trigger\-file=/tmp/my\-trigger \-\-trigger="ipmi-reboot server" write\-test.fio\fR
+
+In this case, fio waits for the server to send us the write state,
+then executes 'ipmi-reboot server' locally once it has arrived.
+
+.RE
+.P
+.B Loading verify state
+.RS
+To load the stored write state, the read verification job file must contain
+the \fBverify_state_load\fR option. If that is set, fio will load the previously
+stored state. For a local fio run this is done by loading the files directly,
+and on a client/server run, the server backend will ask the client to send
+the files over and load them from there.
+
+.RE
+
 .SH CLIENT / SERVER
 Normally you would run fio as a stand-alone application on the machine
 where the IO workload should be generated. However, it is also possible to
@@ -2005,34 +2193,34 @@ for TCP/IP v4, 'ip6' for TCP/IP v6, or 'sock' for a local unix domain
 socket. 'hostname' is either a hostname or IP address, and 'port' is the port to
 listen to (only valid for TCP/IP, not a local socket). Some examples:
 
-1) fio \-\-server
+1) \fBfio \-\-server\fR
 
    Start a fio server, listening on all interfaces on the default port (8765).
 
-2) fio \-\-server=ip:hostname,4444
+2) \fBfio \-\-server=ip:hostname,4444\fR
 
    Start a fio server, listening on IP belonging to hostname and on port 4444.
 
-3) fio \-\-server=ip6:::1,4444
+3) \fBfio \-\-server=ip6:::1,4444\fR
 
    Start a fio server, listening on IPv6 localhost ::1 and on port 4444.
 
-4) fio \-\-server=,4444
+4) \fBfio \-\-server=,4444\fR
 
    Start a fio server, listening on all interfaces on port 4444.
 
-5) fio \-\-server=1.2.3.4
+5) \fBfio \-\-server=1.2.3.4\fR
 
    Start a fio server, listening on IP 1.2.3.4 on the default port.
 
-6) fio \-\-server=sock:/tmp/fio.sock
+6) \fBfio \-\-server=sock:/tmp/fio.sock\fR
 
    Start a fio server, listening on the local socket /tmp/fio.sock.
 
 When a server is running, you can connect to it from a client. The client
 is run with:
 
-fio \-\-local-args \-\-client=server \-\-remote-args <job file(s)>
+\fBfio \-\-local-args \-\-client=server \-\-remote-args <job file(s)>\fR
 
 where \-\-local-args are arguments that are local to the client where it is
 running, 'server' is the connect string, and \-\-remote-args and <job file(s)>
@@ -2040,12 +2228,12 @@ are sent to the server. The 'server' string follows the same format as it
 does on the server side, to allow IP/hostname/socket and port strings.
 You can connect to multiple servers as well; to do that, you could run:
 
-fio \-\-client=server2 \-\-client=server2 <job file(s)>
+\fBfio \-\-client=server1 \-\-client=server2 <job file(s)>\fR
 
 If the job file is located on the fio server, then you can tell the server
 to load a local file as well. This is done by using \-\-remote-config:
 
-fio \-\-client=server \-\-remote-config /path/to/file.fio
+\fBfio \-\-client=server \-\-remote-config /path/to/file.fio\fR
 
 Then fio will open this local (to the server) job file instead
 of being passed one from the client.
@@ -2060,7 +2248,7 @@ host2.your.dns.domain
 
 The fio command would then be:
 
-fio \-\-client=host.list <job file>
+\fBfio \-\-client=host.list <job file>\fR
 
 In this mode, you cannot input server-specific parameters or job files, and all
 servers receive the same job file.
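The iolog replay actions documented in the man page hunk above (wait, read, write, sync, datasync, trim) can be illustrated with a small parser. This is a minimal C sketch, not fio's own iolog.c. It assumes the version 2 line layout "filename action offset length" with all four fields present on every I/O line. The names `parse_iolog_line()`, `wait_usec()` and `struct iolog_entry` are hypothetical. It also reads "delays below 100 microseconds are discarded" as "such waits are treated as zero".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical action codes, in the order the man page lists them. */
enum iolog_action {
	ACT_INVALID = -1,
	ACT_WAIT,
	ACT_READ,
	ACT_WRITE,
	ACT_SYNC,
	ACT_DATASYNC,
	ACT_TRIM,
};

/* One parsed "filename action offset length" line. */
struct iolog_entry {
	char fname[256];
	enum iolog_action act;
	unsigned long long offset; /* byte offset, or microseconds for "wait" */
	unsigned int length;
};

static enum iolog_action parse_action(const char *s)
{
	static const char *const names[] = {
		"wait", "read", "write", "sync", "datasync", "trim",
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (!strcmp(s, names[i]))
			return (enum iolog_action)i;
	return ACT_INVALID;
}

/* Returns 0 on success, -1 if the line is malformed or the action unknown. */
static int parse_iolog_line(const char *line, struct iolog_entry *e)
{
	char act[16];

	if (sscanf(line, "%255s %15s %llu %u", e->fname, act,
		   &e->offset, &e->length) != 4)
		return -1;
	e->act = parse_action(act);
	return e->act == ACT_INVALID ? -1 : 0;
}

/* Delay to apply for a "wait" entry; waits under 100 usec are dropped. */
static unsigned long long wait_usec(const struct iolog_entry *e)
{
	if (e->act != ACT_WAIT || e->offset < 100)
		return 0;
	return e->offset;
}
```

A replay loop would call `parse_iolog_line()` per line and sleep for `wait_usec()` before issuing the next I/O.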

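The verification trigger flow described in the man page text (check for the trigger file, store the write state, then run the trigger command) can be sketched in C. Everything here is hypothetical: the helper names, the one-line text state file, and the choice to delete the trigger file once it has fired so that one touch fires one trigger are assumptions for illustration. The real write state also carries more than a completion count; the text mentions the last X completions as well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical check: returns 1 and removes the trigger file if it
 * exists, 0 otherwise, so a single touch fires a single trigger.
 */
static int check_trigger_file(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) < 0)
		return 0;
	if (unlink(path) < 0)
		return 0;
	return 1;
}

/* Hypothetical stand-in for "storing the write state of each job". */
static int save_write_state(const char *state_path,
			    unsigned long long completions)
{
	FILE *f = fopen(state_path, "w");

	if (!f)
		return -1;
	fprintf(f, "completions=%llu\n", completions);
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Poll once. If triggered, save state first, then execute the trigger
 * command. Only a failure to launch the shell is treated as an error;
 * the command's own exit status is ignored.
 */
static int poll_trigger(const char *trigger_path, const char *state_path,
			unsigned long long completions, const char *cmd)
{
	if (!check_trigger_file(trigger_path))
		return 0;
	if (save_write_state(state_path, completions) < 0)
		return -1;
	if (cmd && system(cmd) == -1)
		return -1;
	return 1;
}
```

Saving the state before running the command matters: the command may well cut power, after which nothing else runs.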

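The server connect strings in the client/server examples follow a 'type:hostname,port' pattern in which every part is optional. The following is a hypothetical C parser that covers those example forms. It is not fio's parser. It splits host from port at the last comma, treats an empty host as "all interfaces", and uses the default port 8765 named in the text when no port is given.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_SERVER_PORT 8765 /* default port named in the text */

enum conn_type { CONN_IP4, CONN_IP6, CONN_SOCK };

/* Hypothetical parsed form of a server connect string. */
struct conn_spec {
	enum conn_type type;
	char host[256]; /* host, address or socket path; "" = all interfaces */
	int port;       /* ignored for CONN_SOCK */
};

/* Returns 0 on success, -1 on a malformed string. */
static int parse_conn_string(const char *str, struct conn_spec *cs)
{
	const char *comma;
	size_t len;

	cs->type = CONN_IP4;
	cs->host[0] = '\0';
	cs->port = DEFAULT_SERVER_PORT;

	if (!strncmp(str, "sock:", 5)) {
		/* Everything after "sock:" is the local socket path. */
		str += 5;
		if (strlen(str) >= sizeof(cs->host))
			return -1;
		cs->type = CONN_SOCK;
		strcpy(cs->host, str);
		return 0;
	}
	if (!strncmp(str, "ip6:", 4)) {
		cs->type = CONN_IP6;
		str += 4;
	} else if (!strncmp(str, "ip:", 3)) {
		str += 3;
	}

	/* Host and port are comma-separated, so the colons in ::1 are left alone. */
	comma = strrchr(str, ',');
	len = comma ? (size_t)(comma - str) : strlen(str);
	if (len >= sizeof(cs->host))
		return -1;
	memcpy(cs->host, str, len);
	cs->host[len] = '\0';

	if (comma) {
		char *end;
		long port = strtol(comma + 1, &end, 10);

		if (end == comma + 1 || *end != '\0' || port <= 0 || port > 65535)
			return -1;
		cs->port = (int)port;
	}
	return 0;
}
```

For example, "ip6:::1,4444" yields type CONN_IP6, host "::1" and port 4444, while "1.2.3.4" keeps the default port.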
* Recent changes (master)
@ 2016-02-11 13:00 Jens Axboe
From: Jens Axboe @ 2016-02-11 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 03d189b7d77f66ce4f4dba9885d34c7a55ff4e53:

  diskutil: don't print terse disk util twice for json,terse output format (2016-02-09 13:45:50 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 8a768c2e725d6a527b904570949f6099c3f1434a:

  server: don't make SO_REUSEPORT errors fatal (2016-02-10 08:32:13 -0700)

----------------------------------------------------------------
Jens Axboe (1):
      server: don't make SO_REUSEPORT errors fatal

 server.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/server.c b/server.c
index c3e034c..6416a5c 100644
--- a/server.c
+++ b/server.c
@@ -1916,11 +1916,10 @@ static int fio_init_server_ip(void)
 		return -1;
 	}
 #ifdef SO_REUSEPORT
-	if (setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)) < 0) {
-		log_err("fio: setsockopt(REUSEPORT): %s\n", strerror(errno));
-		close(sk);
-		return -1;
-	}
+	/*
+	 * Not fatal if fails, so just ignore it if that happens
+	 */
+	setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));
 #endif
 
 	if (use_ipv6) {

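The patch above turns a failed SO_REUSEPORT setsockopt() into a no-op. Below is a self-contained C sketch of the same pattern, written under a few assumptions. `open_listen_socket()` is a hypothetical helper. It binds to the IPv4 loopback address rather than fio's configured address. It treats SO_REUSEADDR as mandatory: the hunk's leading context suggests a preceding fatal check, but that code is not shown here. Unlike the patch, it prints a warning instead of silently ignoring the error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns a listening socket on 127.0.0.1:port, or -1 on error. */
static int open_listen_socket(unsigned short port)
{
	struct sockaddr_in sin;
	int opt = 1;
	int sk;

	sk = socket(AF_INET, SOCK_STREAM, 0);
	if (sk < 0) {
		perror("socket");
		return -1;
	}
	/* Required: failure here is fatal. */
	if (setsockopt(sk, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
		perror("setsockopt(SO_REUSEADDR)");
		close(sk);
		return -1;
	}
#ifdef SO_REUSEPORT
	/*
	 * Best effort: the option can be defined at build time yet be
	 * rejected by the running kernel, so only warn.
	 */
	if (setsockopt(sk, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)) < 0)
		fprintf(stderr, "warning: SO_REUSEPORT: %s\n", strerror(errno));
#endif
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	if (bind(sk, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(sk, 4) < 0) {
		perror("bind/listen");
		close(sk);
		return -1;
	}
	return sk;
}
```

Passing port 0 lets the kernel choose a free ephemeral port, which is convenient for a quick local check.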

* Recent changes (master)
@ 2016-02-10 13:00 Jens Axboe
From: Jens Axboe @ 2016-02-10 13:00 UTC (permalink / raw)
  To: fio

The following changes since commit 820ba1f9c31bbe92e9b8f71d587907819919c0f8:

  io_ddir: io_ddir_name array should be static (2016-02-05 08:42:26 -0700)

are available in the git repository at:

  git://git.kernel.dk/fio.git master

for you to fetch changes up to 03d189b7d77f66ce4f4dba9885d34c7a55ff4e53:

  diskutil: don't print terse disk util twice for json,terse output format (2016-02-09 13:45:50 -0700)

----------------------------------------------------------------
Jens Axboe (2):
      diskutil: fix segfault for both json and terse output
      diskutil: don't print terse disk util twice for json,terse output format

 diskutil.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

---

Diff of recent changes:

diff --git a/diskutil.c b/diskutil.c
index c3181b5..c25c5c9 100644
--- a/diskutil.c
+++ b/diskutil.c
@@ -695,6 +695,7 @@ void show_disk_util(int terse, struct json_object *parent,
 {
 	struct flist_head *entry;
 	struct disk_util *du;
+	bool do_json;
 
 	if (!disk_util_mutex)
 		return;
@@ -706,15 +707,17 @@ void show_disk_util(int terse, struct json_object *parent,
 		return;
 	}
 
-	if (output_format & FIO_OUTPUT_JSON)
-		assert(parent);
+	if ((output_format & FIO_OUTPUT_JSON) && parent)
+		do_json = true;
+	else
+		do_json = false;
 
-	if (!terse && !(output_format & FIO_OUTPUT_JSON))
+	if (!terse && !do_json)
 		log_buf(out, "\nDisk stats (read/write):\n");
 
-	if (output_format & FIO_OUTPUT_JSON)
+	if (do_json)
 		json_object_add_disk_utils(parent, &disk_list);
-	if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
+	else if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS)) {
 		flist_for_each(entry, &disk_list) {
 			du = flist_entry(entry, struct disk_util, list);
 

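The effect of the diskutil.c change can be isolated into a small decision function. In this C sketch, the FIO_OUTPUT_* bit values and the names `pick_disk_util_path()` and `enum disk_util_path` are assumptions for illustration. The sketch shows two things. First, the JSON path is only taken when a parent object exists, which replaces the old assert(). Second, the `else if` makes the JSON and text paths mutually exclusive. A json,terse run that presumably calls show_disk_util() once with a parent and once without therefore emits the terse disk stats once rather than twice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit values chosen for illustration; fio's headers define the real ones. */
#define FIO_OUTPUT_TERSE	0x1
#define FIO_OUTPUT_JSON		0x2
#define FIO_OUTPUT_NORMAL	0x4
#define FIO_OUTPUT_JSON_PLUS	0x8

enum disk_util_path { DU_SKIP, DU_JSON, DU_TEXT };

/* Mirrors the post-patch branch structure of show_disk_util(). */
static enum disk_util_path pick_disk_util_path(int output_format,
					       const void *parent)
{
	/* JSON only when requested AND a parent object was supplied. */
	bool do_json = (output_format & FIO_OUTPUT_JSON) && parent;

	if (do_json)
		return DU_JSON;
	/* Any non-JSON format (terse, normal) takes the text path. */
	if (output_format & ~(FIO_OUTPUT_JSON | FIO_OUTPUT_JSON_PLUS))
		return DU_TEXT;
	return DU_SKIP;
}
```

With the pre-patch code, the json,terse call that had a parent took both branches, and the call without a parent tripped the assert.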

2017-01-05 13:00 Jens Axboe
2017-01-04 13:00 Jens Axboe
2017-01-03 13:00 Jens Axboe
2016-12-30 13:00 Jens Axboe
2016-12-24 13:00 Jens Axboe
2016-12-21 13:00 Jens Axboe
2016-12-20 13:00 Jens Axboe
2016-12-17 13:00 Jens Axboe
2016-12-16 13:00 Jens Axboe
2016-12-14 13:00 Jens Axboe
2016-12-13 13:00 Jens Axboe
2016-12-06 13:00 Jens Axboe
2016-12-02 13:00 Jens Axboe
2016-11-28 13:00 Jens Axboe
2016-11-17 13:00 Jens Axboe
2016-11-16 13:00 Jens Axboe
2016-11-14 13:00 Jens Axboe
2016-11-13 13:00 Jens Axboe
2016-11-03 12:00 Jens Axboe
2016-11-02 12:00 Jens Axboe
2016-10-27 12:00 Jens Axboe
2016-10-26 12:00 Jens Axboe
2016-10-25 12:00 Jens Axboe
2016-10-24 12:00 Jens Axboe
2016-10-21 12:00 Jens Axboe
2016-10-20 12:00 Jens Axboe
2016-10-19 12:00 Jens Axboe
2016-10-18 12:00 Jens Axboe
2016-10-15 12:00 Jens Axboe
2016-10-13 12:00 Jens Axboe
2016-10-12 12:00 Jens Axboe
2016-09-28 12:00 Jens Axboe
2016-09-26 12:00 Jens Axboe
2016-09-24 12:00 Jens Axboe
2016-09-21 12:00 Jens Axboe
2016-09-20 12:00 Jens Axboe
2016-09-17 12:00 Jens Axboe
2016-09-16 12:00 Jens Axboe
2016-09-14 12:00 Jens Axboe
2016-09-13 12:00 Jens Axboe
2016-09-12 12:00 Jens Axboe
2016-09-07 12:00 Jens Axboe
2016-09-03 12:00 Jens Axboe
2016-08-30 12:00 Jens Axboe
2016-08-27 12:00 Jens Axboe
2016-08-26 12:00 Jens Axboe
2016-08-23 12:00 Jens Axboe
2016-08-21 12:00 Jens Axboe
2016-08-19 12:00 Jens Axboe
2016-08-17 12:00 Jens Axboe
2016-08-16 12:00 Jens Axboe
2016-08-15 12:00 Jens Axboe
2016-08-09 12:00 Jens Axboe
2016-08-08 12:00 Jens Axboe
2016-08-08 13:31 ` Erwan Velu
2016-08-08 13:47   ` Jens Axboe
2016-08-05 12:00 Jens Axboe
2016-08-04 12:00 Jens Axboe
2016-08-03 12:00 Jens Axboe
2016-08-02 12:00 Jens Axboe
2016-07-30 12:00 Jens Axboe
2016-07-29 12:00 Jens Axboe
2016-07-28 12:00 Jens Axboe
2016-07-27 12:00 Jens Axboe
2016-07-23 12:00 Jens Axboe
2016-07-21 12:00 Jens Axboe
2016-07-20 12:00 Jens Axboe
2016-07-19 12:00 Jens Axboe
2016-07-15 12:00 Jens Axboe
2016-07-14 12:00 Jens Axboe
2016-07-13 12:00 Jens Axboe
2016-07-12 12:00 Jens Axboe
2016-07-07 12:00 Jens Axboe
2016-07-06 12:00 Jens Axboe
2016-06-30 12:00 Jens Axboe
2016-06-14 12:00 Jens Axboe
2016-06-12 12:00 Jens Axboe
2016-06-10 12:00 Jens Axboe
2016-06-09 12:00 Jens Axboe
2016-06-07 12:00 Jens Axboe
2016-06-04 12:00 Jens Axboe
2016-06-03 12:00 Jens Axboe
2016-05-28 12:00 Jens Axboe
2016-05-26 12:00 Jens Axboe
2016-05-25 12:00 Jens Axboe
2016-05-24 12:00 Jens Axboe
2016-05-22 12:00 Jens Axboe
2016-05-21 12:00 Jens Axboe
2016-05-20 12:00 Jens Axboe
2016-05-19 12:00 Jens Axboe
2016-05-18 12:00 Jens Axboe
2016-05-17 12:00 Jens Axboe
2016-05-11 12:00 Jens Axboe
2016-05-10 12:00 Jens Axboe
2016-05-07 12:00 Jens Axboe
2016-05-06 12:00 Jens Axboe
2016-05-04 12:00 Jens Axboe
2016-05-03 12:00 Jens Axboe
2016-04-29 12:00 Jens Axboe
2016-04-24 12:00 Jens Axboe
2016-04-21 12:00 Jens Axboe
2016-04-19 12:00 Jens Axboe
2016-04-14 12:00 Jens Axboe
2016-04-05 12:00 Jens Axboe
2016-04-02 12:00 Jens Axboe
2016-03-30 12:00 Jens Axboe
2016-03-26 12:00 Jens Axboe
2016-03-25 12:00 Jens Axboe
2016-03-24 12:00 Jens Axboe
2016-03-21 12:00 Jens Axboe
2016-03-19 12:00 Jens Axboe
2016-03-16 12:00 Jens Axboe
2016-03-11 13:00 Jens Axboe
2016-03-10 13:00 Jens Axboe
2016-03-09 13:00 Jens Axboe
2016-03-08 13:00 Jens Axboe
2016-03-05 13:00 Jens Axboe
2016-03-04 13:00 Jens Axboe
2016-02-27 13:00 Jens Axboe
2016-02-26 13:00 Jens Axboe
2016-02-25 13:00 Jens Axboe
2016-02-14 13:00 Jens Axboe
2016-02-15 14:26 ` Erwan Velu
2016-02-13 13:00 Jens Axboe
2016-02-11 13:00 Jens Axboe
2016-02-10 13:00 Jens Axboe